Abstract
In this work we explore the application of XML schema similarity mapping to the conceptual modeling of XML schemas. We extend our previous work on mapping XML schemas to a common platform-independent schema, in which similarity is evaluated by a decision tree. In this paper we implement a more versatile method and train the decision tree on a large set of user-annotated mapping decision samples. We propose several variations of training that could improve the mapping results. The approach is implemented within eXolutio, a framework for modeling and evolution management, and its variations are evaluated in a wide range of experiments.
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jílková, E., Polák, M., Holubová, I. (2014). Adaptive Similarity of XML Data. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2014 Conferences. OTM 2014. Lecture Notes in Computer Science, vol 8841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45563-0_32
DOI: https://doi.org/10.1007/978-3-662-45563-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45562-3
Online ISBN: 978-3-662-45563-0
eBook Packages: Computer Science, Computer Science (R0)