Abstract
With major search engines now offering partial support, hundreds of millions of documents embedding semi-structured data in RDF, RDFa and Microformats are being published on the web. This scenario calls for novel information search systems that provide effective means of retrieving relevant semi-structured information. In this paper, we present an “entity retrieval system” designed to provide entity search capabilities over datasets as large as the entire Web of Data. Our system supports full-text search, semi-structural queries and top-k query results while exhibiting a concise index and efficient incremental updates. We advocate the use of a node indexing scheme and show that it offers a good compromise between query expressiveness, query processing time and update complexity in comparison to three other indexing techniques. We then demonstrate how such a system can effectively answer queries over 10 billion triples on a single commodity machine.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Delbru, R., Toupikov, N., Catasta, M., Tummarello, G. (2010). A Node Indexing Scheme for Web Entity Retrieval. In: Aroyo, L., et al. The Semantic Web: Research and Applications. ESWC 2010. Lecture Notes in Computer Science, vol 6089. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13489-0_17
DOI: https://doi.org/10.1007/978-3-642-13489-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13488-3
Online ISBN: 978-3-642-13489-0
eBook Packages: Computer Science, Computer Science (R0)