Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing

Duan, Songyun; Fokoue, Achille; Hassanzadeh, Oktie; Kementsietsidis, Anastasios; Srinivas, Kavitha; Ward, Michael J.

doi:10.1007/978-3-642-35176-1_4


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7649)

Included in the following conference series:

International Semantic Web Conference


Abstract

In this paper, we describe a mechanism for ontology alignment using instance-based matching of types (or classes). Instance-based matching is known to be a useful technique for matching ontologies that have different names and different structures. A key problem in instance matching of types, however, is scaling the matching algorithm to (a) handle types with a large number of instances, and (b) efficiently match a large number of type pairs. We propose the use of state-of-the-art locality-sensitive hashing (LSH) techniques to vastly improve the scalability of instance matching across multiple types. We show the feasibility of our approach with DBpedia and Freebase, two different type systems with hundreds and thousands of types, respectively. We describe how these techniques can be used to estimate containment or equivalence relations between two type systems, and we compare two different LSH techniques for computing instance similarity.
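As a rough illustration of how an LSH sketch can stand in for a full comparison of instance sets, the following Python snippet uses MinHash to estimate the Jaccard similarity of two types' instance sets and then derives a containment estimate from it. The abstract does not say which two LSH techniques the paper compares, so the choice of MinHash is an assumption, and all function names, the signature length, and the linear hash family are hypothetical choices made for this sketch rather than the authors' implementation.

```python
import random
import zlib

# Large Mersenne prime used as the modulus of the hash family (an assumption).
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes, seed=42):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(instances, params):
    """Return the MinHash signature of a non-empty set of instance labels."""
    # crc32 gives a stable integer id per label (Python's hash() is salted).
    ids = [zlib.crc32(label.encode("utf-8")) for label in instances]
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two types agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def estimate_containment(jaccard, size_a, size_b):
    """Estimate |A n B| / |A| from the Jaccard estimate and the set sizes.

    Uses |A n B| = J * (|A| + |B|) / (1 + J), which follows from
    J = |A n B| / (|A| + |B| - |A n B|).
    """
    if size_a == 0:
        return 0.0
    intersection = jaccard * (size_a + size_b) / (1.0 + jaccard)
    return min(1.0, intersection / size_a)


if __name__ == "__main__":
    # Hypothetical instance sets for one type from each type system.
    type_a = {f"entity_{i}" for i in range(200)}
    type_b = {f"entity_{i}" for i in range(100, 300)}
    params = make_hash_params(256)
    sig_a = minhash_signature(type_a, params)
    sig_b = minhash_signature(type_b, params)
    j = estimate_jaccard(sig_a, sig_b)
    print("estimated Jaccard:", j)
    print("estimated containment of A in B:",
          estimate_containment(j, len(type_a), len(type_b)))
```

Each type is reduced to a fixed-length signature once, so comparing a pair of types costs time proportional to the signature length rather than to the number of instances. A high Jaccard estimate suggests equivalence between two types, while a high containment estimate with a lower Jaccard estimate suggests that one type is contained in the other.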



Author information

Authors and Affiliations

IBM T.J. Watson Research, 19 Skyline Drive, Hawthorne, NY, 10532, USA
Songyun Duan, Achille Fokoue, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas & Michael J. Ward


Editor information

Editors and Affiliations

University of Fribourg, Switzerland
Philippe Cudré-Mauroux
Lehigh University, 18015, Bethlehem, PA, USA
Jeff Heflin
Clark & Parsia, 20001, Washington, DC, USA
Evren Sirin
Stanford University, CA, USA
Tania Tudorache
INRIA & LIG, Le Chesnay Cedex, France
Jérôme Euzenat
National University of Ireland, DERI, Galway, Ireland
Manfred Hauswirth & Josiane Xavier Parreira
Rensselaer Polytechnic Institute (RPI), Troy, NY, USA
Jim Hendler
VU University Amsterdam, The Netherlands
Guus Schreiber
University of Zurich, Switzerland
Abraham Bernstein
Linköping University, Sweden
Eva Blomqvist


About this paper

Cite this paper

Duan, S., Fokoue, A., Hassanzadeh, O., Kementsietsidis, A., Srinivas, K., Ward, M.J. (2012). Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing. In: Cudré-Mauroux, P., et al. The Semantic Web – ISWC 2012. ISWC 2012. Lecture Notes in Computer Science, vol 7649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35176-1_4


DOI: https://doi.org/10.1007/978-3-642-35176-1_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35175-4
Online ISBN: 978-3-642-35176-1
eBook Packages: Computer Science, Computer Science (R0)
