Abstract
Entity matching is the task of identifying records that refer to the same real-world entity within a single database or across multiple databases or data sources. This chapter presents the theoretical foundations of entity matching and shows how it is applied in various data science tasks. It also focuses on a specific problem that many data science companies in the tourism sector face: correctly mapping hotel entities across different sources, such as review websites and client and internal databases.
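As a rough illustration of the hotel-mapping task described above, the following Python sketch links hotel records from two hypothetical sources. It normalizes hotel names and compares them with a normalized Levenshtein similarity, one of the string distance measures discussed in the references (Cohen et al., 2003; Hyyrö, 2003). The `HotelRecord` fields, the city-based blocking, and the 0.8 threshold are illustrative assumptions, not the method used in the chapter.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HotelRecord:
    """A hotel entry from one source (hypothetical fields for illustration)."""
    source_id: str
    name: str
    city: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Scale edit distance to [0, 1]; 1.0 means identical normalized names."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def match_hotels(left, right, threshold=0.8):
    """Pair each left record with its most similar right record in the same city.

    Only records sharing a normalized city are compared (a simple blocking
    step, assumed here). The result is not forced to be one-to-one.
    """
    matches = []
    for rec in left:
        candidates = [r for r in right if normalize(r.city) == normalize(rec.city)]
        best = max(candidates, key=lambda r: name_similarity(rec.name, r.name), default=None)
        if best is not None and name_similarity(rec.name, best.name) >= threshold:
            matches.append((rec.source_id, best.source_id))
    return matches


# Hypothetical records from a review website and an internal database.
reviews = [
    HotelRecord("rv-1", "Hotel Sacher Wien", "Vienna"),
    HotelRecord("rv-2", "Grand Hotel Europa", "Innsbruck"),
]
internal = [
    HotelRecord("db-7", "Hotel Sacher, Wien", "vienna"),
    HotelRecord("db-9", "Hotel Europa Grand", "Innsbruck"),
]
print(match_hotels(reviews, internal))  # → [('rv-1', 'db-7')]
```

The second pair is missed because reordering words inflates a character-level edit distance. Token-based or hybrid measures, of the kind compared by Cohen et al. (2003), or learned representations such as those explored by Mudgal et al. (2018) are less sensitive to this.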
References
Bayrak, A. T., Özbek, E. E., Kestepe, S., & Yildiz, O. T. (2019). Intelligent mapping for hotel records representing the same entity. In 2019 4th International Conference on Computer Science and Engineering (UBMK) (pp. 560–563).
Christen, P. (2012). Data matching: Concepts and techniques for record linkage, entity resolution, and duplicate detection. Springer Publishing Company, Incorporated.
Cohen, W. W., Ravikumar, P., & Fienberg, S. E. (2003, August). A comparison of string distance metrics for name-matching tasks. IIWeb, 3, 73–78.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Hyyrö, H. (2003). A bit-vector algorithm for computing Levenshtein and Damerau edit distances. Nordic Journal of Computing, 10(1), 29–39.
Joulin, A., Grave, E., Bojanowski, P., Douze, M., Jégou, H., & Mikolov, T. (2016). FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651.
Keil, J. M. (2019). Efficient bounded Jaro-Winkler similarity based search. In T. Grust, F. Naumann, A. Böhm, W. Lehner, T. Härder, E. Rahm, A. Heuer, M. Klettke, & H. Meyer (Eds.), BTW 2019. Gesellschaft für Informatik.
Kirsten, T., Kolb, L., Hartung, M., Groß, A., Köpcke, H., & Rahm, E. (2010). Data partitioning for parallel entity matching. arXiv preprint arXiv:1006.5309.
Kong, C., Gao, M., Xu, C., Qian, W., & Zhou, A. (2016, April). Entity matching across multiple heterogeneous data sources. In International conference on database systems for advanced applications (pp. 133–146). Springer.
Kozhevnikov, I., & Gorovoy, V. (2016). Comparison of different approaches for hotels deduplication. In A.-C. N. Ngomo & P. Křemen (Eds.), Knowledge engineering and semantic web. Springer Nature.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G., … Raghavendra, V. (2018). Deep learning for entity matching: A design space exploration. SIGMOD’18 (pp. 19–34). Association for Computing Machinery. https://doi.org/10.1145/3183713.3196926
Tai, X. (2018). Record linkage and matching problems in forensics. In 2018 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 510–517). IEEE. https://doi.org/10.1109/ICDMW.2018.00081
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., … Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (Vol. 30). Curran Associates.
Zhao, C., & He, Y. (2019). Auto-EM: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning (pp. 2413–2424). Association for Computing Machinery. https://doi.org/10.1145/3308558.3313578
Further Readings and Other Sources
- Book: Data matching: Concepts and techniques for record linkage, entity resolution, and duplicate detection by Christen (2012).
- Video: Deep learning for entity matching: A design space exploration https://www.youtube.com/watch?v=plaONS-Lr8U
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bilan, I. (2022). Entity Matching: Matching Entities Between Multiple Data Sources. In: Egger, R. (eds) Applied Data Science in Tourism. Tourism on the Verge. Springer, Cham. https://doi.org/10.1007/978-3-030-88389-8_19
DOI: https://doi.org/10.1007/978-3-030-88389-8_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88388-1
Online ISBN: 978-3-030-88389-8
eBook Packages: Business and Management, Business and Management (R0)