Abstract
OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets). These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption and as XML through a web service interface. CSV is generated internally as an intermediate format to facilitate statistical computations. To interlink the OpenAIRE data with related data on the Web, we aim to export them as Linked Open Data (LOD). The LOD export must integrate into the overall data processing workflow, in which derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of three ways of creating LOD: running a MapReduce job on top of HBase, mapping the intermediate CSV files, and mapping the XML output.
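To make the CSV-based route concrete, the following is a minimal sketch of mapping rows to RDF serialized as N-Triples. It is not the implementation evaluated in the paper: the column names (`id`, `title`, `project`), the `http://example.org/openaire/` namespace and the `producedBy` property are assumptions chosen for illustration only.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and property; the actual OpenAIRE LOD vocabulary
# is not given in the text above.
BASE = "http://example.org/openaire/"
TITLE = "http://purl.org/dc/terms/title"
PRODUCED_BY = BASE + "vocab/producedBy"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Map each CSV row (assumed columns: id, title, project) to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode identifiers so they are safe inside an IRI.
        subject = f"<{BASE}result/{quote(row['id'], safe='')}>"
        triples.append(f'{subject} <{TITLE}> "{escape_literal(row["title"])}" .')
        if row["project"]:  # emit the link only when a project id is present
            project = f"<{BASE}project/{quote(row['project'], safe='')}>"
            triples.append(f"{subject} <{PRODUCED_BY}> {project} .")
    return triples


if __name__ == "__main__":
    sample = 'id,title,project\nr1,"A ""quoted"" title",p9\nr2,No project,\n'
    for line in csv_to_ntriples(sample):
        print(line)
```

Because each row is mapped independently, a conversion of this shape can be rerun in full whenever the CSV files are regenerated, which fits the daily regeneration requirement described above.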
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vahdati, S., Karim, F., Huang, JY., Lange, C. (2015). Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML. In: Garoufallou, E., Hartley, R., Gaitanou, P. (eds) Metadata and Semantics Research. MTSR 2015. Communications in Computer and Information Science, vol 544. Springer, Cham. https://doi.org/10.1007/978-3-319-24129-6_23
DOI: https://doi.org/10.1007/978-3-319-24129-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24128-9
Online ISBN: 978-3-319-24129-6
eBook Packages: Computer Science (R0)