Abstract

Resource Description Framework (RDF) is a widespread standard and a flexible method of data representation. RDF storage systems are actively used for storing, sharing, and publishing RDF data on the Internet. RDF models are used in business applications and by research teams who wish to share their data with the community. Most RDF stores are optimized for queries, but this is usually achieved at the cost of increased disk space consumption. Renting a dedicated server with a large volume of local storage is quite expensive, especially for small research teams and business startups, which makes disk space consumption an important factor in choosing a data store. In this study, we compared the disk space usage of four popular triple stores that can serve as SPARQL (an SQL-like query language) endpoints, depending on the amount and structure of the loaded RDF data. To the best of our knowledge, no previous work has compared the disk space occupied by triple stores. We found that all of the compared open-source solutions, including Apache Jena Fuseki, consume large amounts of hard disk space and should be used with caution in resource-limited environments. The data structure (one large graph or many smaller named graphs) strongly affected Parliament's disk space usage, so it should also be taken into account when selecting an RDF store. Free versions of commercial systems show adequate disk consumption and appear to be weakly dependent on data structure, but Ontotext GraphDB is deliberately limited in performance, and Stardog is limited in license term and may need additional manual maintenance.
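
The abstract does not describe how disk space was measured, so the following is only a minimal sketch of one way such a measurement could be taken, not the authors' procedure. It assumes that a triple store keeps all of its data in a single directory on disk; the helper names `dir_size_bytes` and `dir_size_mib` are hypothetical. It sums apparent file sizes, which can differ from the number of blocks actually allocated by the file system (for example, for sparse files).

```python
import os
import tempfile

MIB = 1024 ** 2  # 1 mebibyte = 1,048,576 bytes


def dir_size_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under `path`.

    Symbolic links are skipped so that linked data is not counted twice.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def dir_size_mib(path: str) -> float:
    """Return the size of the directory tree at `path` in MiB."""
    return dir_size_bytes(path) / MIB


if __name__ == "__main__":
    # Stand-in for a store's data directory (hypothetical layout):
    # one 3 MiB file that plays the role of an index file.
    with tempfile.TemporaryDirectory() as store_dir:
        with open(os.path.join(store_dir, "index.dat"), "wb") as f:
            f.write(b"\0" * (3 * MIB))
        print(f"{dir_size_mib(store_dir):.2f} MiB")
```

Calling `dir_size_mib` on the same data directory before and after loading a dataset would give the growth attributable to that dataset, under the single-directory assumption stated above.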

The reported study was funded by RFBR, project number 20-07-00764.

Notes

  1. https://jena.apache.org/documentation/fuseki2.

  2. https://docs.stardog.com.

  3. https://graphdb.ontotext.com.

  4. https://github.com/SemWebCentral/parliament/releases.

  5. https://graphdb.ontotext.com/documentation/9.11/pdf/GraphDB-Free.pdf.

  6. MiB: 1 mebibyte equals \(1024^2\) bytes.

  7. RDF serialization format: https://www.w3.org/TR/turtle.

  8. https://www.w3.org/wiki/RdfStoreBenchmarking.

Author information

Corresponding author

Correspondence to Oleg Sychev.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Prokudin, A., Denisov, M., Sychev, O. (2023). Disk Space Consumption by Triple Storage Systems. In: Krouska, A., Troussas, C., Caro, J. (eds) Novel & Intelligent Digital Systems: Proceedings of the 2nd International Conference (NiDS 2022). NiDS 2022. Lecture Notes in Networks and Systems, vol 556. Springer, Cham. https://doi.org/10.1007/978-3-031-17601-2_26
