Abstract
Different parties are likely to denote the same object on the Semantic Web with several URIs. Object coreferencing is the process of identifying “equivalent” URIs of objects in order to achieve a better Data Web. In this paper, we propose a bootstrapping approach to object coreferencing on the Semantic Web. For an object URI, we first establish a kernel consisting of semantically equivalent URIs derived from same-as relations, (inverse) functional properties and (max-)cardinalities, and then extend the kernel with respect to the textual descriptions (e.g., labels and local names) of URIs. We also propose a trustworthiness-based method for ranking the coreferent URIs in the kernel, as well as a similarity-based method for ranking the URIs in the extension of the kernel. We implement the proposed approach, called ObjectCoref, on a large-scale dataset containing 76 million URIs collected by the Falcons search engine up to 2008. An evaluation of precision, relative recall and response time demonstrates the feasibility of our approach. Additionally, we apply the proposed approach to investigate how widespread the URI alias phenomenon is on the current Semantic Web.
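The two stages named in the abstract (a kernel of semantically equivalent URIs, then a text-based extension) can be illustrated with a minimal sketch. This is not the ObjectCoref implementation: it assumes triples are plain `(subject, predicate, object)` string tuples whose terms are all URIs, it uses a union-find structure to merge URIs linked by same-as statements and by (inverse) functional properties, and it swaps in simple Jaccard overlap of label tokens for the paper's similarity measure. Cardinality handling and the trustworthiness-based ranking are omitted, and every function name here is hypothetical.

```python
SAME_AS = "owl:sameAs"  # assumed prefixed name for the same-as property


class UnionFind:
    """Disjoint sets of URIs; each set is one candidate kernel."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_kernels(triples, functional=(), inverse_functional=()):
    """Merge URIs implied equal by same-as and (inverse) functional properties."""
    uf = UnionFind()
    for s, p, o in triples:
        uf.find(s)
        if p == SAME_AS:
            uf.union(s, o)
    changed = True
    while changed:  # repeat until no further merge is implied
        changed = False
        seen = {}
        for s, p, o in triples:
            candidates = []
            if p in functional:  # same subject and property: objects are equal
                candidates.append((("fp", uf.find(s), p), o))
            if p in inverse_functional:  # same property and object: subjects are equal
                candidates.append((("ifp", p, uf.find(o)), s))
            for key, member in candidates:
                first = seen.setdefault(key, member)
                if uf.find(first) != uf.find(member):
                    uf.union(first, member)
                    changed = True
    return uf


def kernel(uf, uri):
    """Return every URI merged into the same set as `uri`."""
    root = uf.find(uri)
    return {u for u in list(uf.parent) if uf.find(u) == root}


def tokens(text):
    return set(text.lower().split())


def rank_extension(uf, uri, labels):
    """Rank URIs outside the kernel by Jaccard overlap of label tokens."""
    core = kernel(uf, uri)
    core_tokens = set().union(*(tokens(labels.get(u, "")) for u in core))
    scored = []
    for cand, label in labels.items():
        if cand in core:
            continue
        cand_tokens = tokens(label)
        all_tokens = core_tokens | cand_tokens
        if all_tokens:
            score = len(core_tokens & cand_tokens) / len(all_tokens)
            if score > 0:
                scored.append((score, cand))
    return [cand for _, cand in sorted(scored, key=lambda sc: (-sc[0], sc[1]))]


# Illustrative data only; the URIs and labels below are made up.
triples = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "foaf:mbox", "mailto:x@example.org"),
    ("ex:c", "foaf:mbox", "mailto:x@example.org"),
]
uf = build_kernels(triples, inverse_functional={"foaf:mbox"})
labels = {"ex:a": "Tim Berners-Lee", "ex:d": "tim berners-lee", "ex:e": "Semantic Web"}
print(sorted(kernel(uf, "ex:a")))
print(rank_extension(uf, "ex:a", labels))
```

In this toy run, `ex:c` joins the kernel of `ex:a` because it shares an inverse functional property value with `ex:b`, while `ex:d` is only a ranked extension candidate because the evidence for it is textual rather than semantic.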
Additional information
This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61003018 and 60973024, in part by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20100091120041, and also in part by the IBM CRL UR Joint Project.
About this article
Cite this article
Hu, W., Qu, YZ. & Sun, XZ. Bootstrapping Object Coreferencing on the Semantic Web. J. Comput. Sci. Technol. 26, 663–675 (2011). https://doi.org/10.1007/s11390-011-1166-z
DOI: https://doi.org/10.1007/s11390-011-1166-z