Abstract
Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation, on both synthetic and real-world datasets, indicating that the system is both viable and effective.
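As a rough illustration of what "example-driven search in a space of transformations" can mean, the following minimal Python sketch searches for a sequence of operators that turns a small source example instance into a target example instance. It is not Tupelo's algorithm: the abstract does not specify the operator set or the search strategy, so the two operators used here (`rename` and `drop`), the breadth-first strategy, and every function name are assumptions chosen only to keep the example short.

```python
from collections import deque


def make_relation(rows):
    """Freeze a list of dicts into a hashable relation (a set of rows)."""
    return frozenset(frozenset(r.items()) for r in rows)


def attributes(rel):
    """Return the set of attribute names used in a relation."""
    return {a for row in rel for a, _ in row}


def rename(rel, old, new):
    """Hypothetical operator: rename attribute `old` to `new`."""
    return frozenset(
        frozenset((new if a == old else a, v) for a, v in row) for row in rel
    )


def drop(rel, attr):
    """Hypothetical operator: project away attribute `attr`."""
    return frozenset(
        frozenset((a, v) for a, v in row if a != attr) for row in rel
    )


def successors(rel, target_attrs):
    """Enumerate one-step transformations of `rel`.

    Only attributes that do not already appear in the target are touched.
    """
    src_attrs = attributes(rel)
    for a in sorted(src_attrs):
        if a not in target_attrs:
            yield ("drop", a), drop(rel, a)
            for b in sorted(target_attrs - src_attrs):
                yield ("rename", a, b), rename(rel, a, b)


def discover_mapping(source, target, max_depth=4):
    """Breadth-first search for operators mapping `source` to `target`.

    Returns a list of operator tuples, or None if no mapping is found
    within `max_depth` steps.
    """
    target_attrs = attributes(target)
    frontier = deque([(source, [])])
    seen = {source}
    while frontier:
        rel, path = frontier.popleft()
        if rel == target:
            return path
        if len(path) >= max_depth:
            continue
        for op, nxt in successors(rel, target_attrs):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [op]))
    return None


def apply_ops(rel, ops):
    """Replay a discovered operator sequence on a relation."""
    for op in ops:
        rel = drop(rel, op[1]) if op[0] == "drop" else rename(rel, op[1], op[2])
    return rel


if __name__ == "__main__":
    src = make_relation([
        {"name": "Ann", "tel": "555", "tmp": 1},
        {"name": "Bob", "tel": "777", "tmp": 2},
    ])
    tgt = make_relation([
        {"person": "Ann", "phone": "555"},
        {"person": "Bob", "phone": "777"},
    ])
    print(discover_mapping(src, tgt))
```

The user supplies only the two example instances; the discovered operator sequence plays the role of the mapping query and can be replayed on other instances of the source schema with `apply_ops`. A realistic system would need a much richer operator set and a guided search to cope with the size of the search space.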
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fletcher, G.H.L., Wyss, C.M. (2009). Towards a General Framework for Effective Solutions to the Data Mapping Problem. In: Spaccapietra, S., Delcambre, L. (eds) Journal on Data Semantics XIV. Lecture Notes in Computer Science, vol 5880. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10562-3_2
DOI: https://doi.org/10.1007/978-3-642-10562-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10561-6
Online ISBN: 978-3-642-10562-3
eBook Packages: Computer Science, Computer Science (R0)