Abstract
In the design of a relational database, the administrator has to decide, given a fixed or estimated workload, which indexes should be created. This so called index selection problem is an non-trivial optimization problem in relational databases. In this paper we describe a novel approach for index selection on RDF data sets. We propose an algorithm to automatically suggest a set of indexes as materialized views based on a workload of SPARQL queries. The selected set of indexes aims to decrease the cost of the workload. We provide a cost model to estimate the potential impact of candidate indexes on query performance and an algorithm to select an optimal set of indexes. This algorithm may be integrated into an existing SPARQL query engine. We experimentally evaluate our approach on a standard query processor. We claim that our approach is the first comprehensive suggestion for the index selection problem in RDF.
Chapter PDF
Similar content being viewed by others
References
Comer, D.: The difficulty of optimum index selection. ACM Trans. Database Syst. 3(4), 440–445 (1978)
Manola, F., Miller, E.: RDF Primer, W3C Recommendation (February 2004)
Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF, W3C Recommendation (April 2008)
Castillo, R., Leser, U., Rothe, C.: RDFMatView: Indexing RDF Data for SPARQL Queries. Technical report, Humboldt University of Berlin (2010)
UniProt: RDF Dataset, http://dev.isb-sib.ch/projects/uniprot-rdf/
W3C SWEO Community Project, Linking open data on the semantic web, http://esw.w3.org/topic/sweoig/taskforces/communityprojects/linkingopendata/
Stenzhorn, H., Srinivas, K., Samwald, M., Ruttenberg, A.: Simplifying access to large-scale health care and life sciences datasets. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds.) ESWC 2008. LNCS, vol. 5021, pp. 864–868. Springer, Heidelberg (2008)
Caprara, A., Fischetti, M., Maio, D.: Exact and approximate algorithms for the index selection problem in physical database design. IEEE Transactions on Knowledge and Data Engineering 7(6), 955–967 (1995)
Chaudhuri, S., Narasayya, V.R.: An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In: VLDB 1997: Proceedings of the 23rd International Conference on Very Large Data Bases, pp. 146–155. Morgan Kaufmann Publishers Inc., San Francisco (1997)
Agrawal, S., Chaudhuri, S., Narasayya, V.R.: Automated Selection of Materialized Views and Indexes in SQL Databases. In: VLDB 2000: Proceedings of the 26th International Conference on Very Large Data Bases, pp. 496–505. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Groppe, J., Groppe, S., Ebers, S., Linnemann, V.: Efficient Processing of SPARQL Joins in Memory by Dynamically Restricting Triple Patterns. In: Proceedings of the 24th ACM Symposium on Applied Computing (ACM SAC 2009), Honolulu, Hawaii, March 8-12, pp. 1231–1238. ACM, New York (2009)
Groppe, S., Groppe, J., Linnemann, V.: Using an Index of Precomputed Joins in order to speed up SPARQL Processing. In: Cardoso, J., Cordeiro, J., Filipe, J. (eds.) Proceedings 9th International Conference on Enterprise Information Systems, ICEIS 2007 (1), Funchal, Madeira, Portugal, June 12-16, DISI, pp. 13–20. INSTICC (2007)
Jinghua Groppe, S.G., Kolbaum, J.: Optimization of SPARQL by Using coreSPARQL. In: Cordeiro, J., Filipe, J. (eds.) Proceedings of the 11th International Conference on Enterprise Information Systems (ICEIS 2009), Milano, Italien, Mai 6 - 10, DISI, pp. 107–112. INSTICC (2009)
Olaf Hartig, R.H.: The SPARQL Query Graph Model for Query Optimization. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 564–578. Springer, Heidelberg (2007)
Matono, A., Amagasa, T., Yoshikawa, M., Uemura, S.: An Indexing Scheme for RDF and RDF Schema based on Suffix Arrays. In: Proceedings of SWDB 2003, pp. 151–168 (2003)
Udrea, O., Pugliese, A., Subrahmanian, V.S.: GRIN: A Graph Based RDF Index. In: AAAI, pp. 1465–1470 (2007)
Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1(1), 1008–1019 (2008)
Yan, X., Yu, P.S., Han, J.: Graph indexing based on discriminative frequent structure analysis. ACM Trans. Database Syst. 30(4), 960–993 (2005)
Neumann, T., Weikum, G.: RDF-3X: a RISC-style engine for RDF. Proc. VLDB Endow. 1(1), 647–659 (2008)
Halevy, A.Y.: Answering queries using views: A survey. The VLDB Journal 10(4), 270–294 (2001)
Dantzig, G.: Linear Programming and Extensions. Princeton University Press, Princeton (August 1998)
Bizer, C., Schultz, A.: The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems - Special Issue on Scalability and Performance of Semantic Web Systems (2009)
ARQJena: ARQ - A SPARQL Processor for Jena (2010), http://jena.sourceforge.net/ARQ/
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Castillo, R., Leser, U. (2010). Selecting Materialized Views for RDF Data. In: Daniel, F., Facca, F.M. (eds) Current Trends in Web Engineering. ICWE 2010. Lecture Notes in Computer Science, vol 6385. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16985-4_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-16985-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16984-7
Online ISBN: 978-3-642-16985-4
eBook Packages: Computer ScienceComputer Science (R0)