Adaptive Similarity of XML Data

Jílková, Eva; Polák, Marek; Holubová, Irena

doi:10.1007/978-3-662-45563-0_32

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8841)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

In this work we explore the application of XML schema similarity mapping to the conceptual modeling of XML schemas. We extend our previous work on mapping XML schemas to a common platform-independent schema, in which similarity is evaluated using a decision tree. In particular, this paper implements a more versatile method in which the decision tree is trained on a large set of user-annotated mapping decision samples. We propose several variations of the training that could improve the mapping results. The approach is implemented within eXolutio, a framework for modeling and evolution management, and its variations are evaluated in a wide range of experiments.

Author information

Authors and Affiliations

Department of Software Engineering, Charles University in Prague, Czech Republic
Eva Jílková, Marek Polák & Irena Holubová

Editor information

Editors and Affiliations

TU Graz, Rechbauerstraße 12, 8010, Graz, Austria
Robert Meersman
CRAN, Campus Sciences, University of Lorraine, BP 70239, 54506, Vandoevre-les-Nancy, France
Hervé Panetto
Computer Science and Computer Engineering, La Trobe University, 3086, Melbourne, VIC, Australia
Tharam Dillon
Laboratory for Enterprise Knowledge and Systems (LEKS), IASI - CNR, Viale Manzoni 30, 00185, Rome, Italy
Michele Missikoff
School of Software, Tsinghua University, 100084, Beijing, China
Lin Liu
PROS Research Center, Camino de Vera, Universidad Politècnica de València, 46022, Valencia, Spain
Oscar Pastor
Department of Electronics, Computer Science and Systems, University of Calabria, Via P. Bucci, 41C, 87036, Rende, Cosenza, Italy
Alfredo Cuzzocrea
School of Computer Science and Information Technology, RMIT, Australia
Timos Sellis

Cite this paper

Jílková, E., Polák, M., Holubová, I. (2014). Adaptive Similarity of XML Data. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2014 Conferences. OTM 2014. Lecture Notes in Computer Science, vol 8841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45563-0_32

DOI: https://doi.org/10.1007/978-3-662-45563-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45562-3
Online ISBN: 978-3-662-45563-0
eBook Packages: Computer Science, Computer Science (R0)
