Abstract
In this paper we propose an approach for deriving a global representation of data sources having different formats and structures. The approach relies on a particular conceptual model both to represent such data sources uniformly and to reconstruct their intra-source and inter-source semantics. Along with the global representation, our approach returns two support structures that improve the transparency of access to stored information: a set of mappings, which encode the transformations carried out during the construction of the global representation, and a set of views, which allow instances of the concepts of the global representation to be obtained from instances of the concepts of the input data sources. The paper also describes a prototype that implements the proposed approach.
Additional information
A preliminary and partial version of this paper appears under the title ‘A semi-automatic technique for constructing a global representation of information sources having different formats and structure’ in the Proceedings of the Conference ‘Database and Expert System Applications (DEXA 2001)’, Munich, September 2001.
Cite this article
Rosaci, D., Terracina, G. & Ursino, D. An Approach for Deriving a Global Representation of Data Sources Having Different Formats and Structures. Knowledge and Information Systems 6, 42–82 (2004). https://doi.org/10.1007/s10115-003-0095-8
DOI: https://doi.org/10.1007/s10115-003-0095-8