Abstract
The amount of Resource Description Framework (RDF) data on the web has increased rapidly. Processing large volumes of RDF data requires an efficient storage and query-processing engine that scales well with the volume of data. In the past two and a half years, however, heavy users of big data systems, such as Facebook, have noted limitations in the query performance of these systems and have begun to develop new distributed query engines for big data that do not rely on MapReduce. Facebook’s Presto is one such example. This paper proposes an architecture based on Presto, called Presto-RDF, that can be used to process big RDF data. An evaluation of the performance of Presto in processing big RDF data, compared against Apache Hive, is also presented. The experimental results show that the Presto-RDF framework performs much better than both Apache Hive and the native RDF store 4Store, and that it can be used to process big RDF data.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mammo, M., Bansal, S.K. (2015). Presto-RDF: SPARQL Querying over Big RDF Data. In: Sharaf, M., Cheema, M., Qi, J. (eds) Databases Theory and Applications. ADC 2015. Lecture Notes in Computer Science, vol 9093. Springer, Cham. https://doi.org/10.1007/978-3-319-19548-3_23
DOI: https://doi.org/10.1007/978-3-319-19548-3_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19547-6
Online ISBN: 978-3-319-19548-3
eBook Packages: Computer Science (R0)