1 Introduction

The FAIR data principles emphasize the crucial role of machine-processable metadata in finding, accessing, interoperating with, and reusing data with minimal human intervention [3]. The Leibniz Data Manager (LDM) builds on Semantic Web technologies to support researchers in documenting, analyzing, and sharing research datasets. LDM addresses interoperability across repositories and integrates datasets published in other repositories. To present dataset metadata, it relies on existing vocabularies, e.g., DCAT and DataCite. In addition, data services implemented as Jupyter notebooks enable the execution of live code over LDM repositories. Configurable access privileges facilitate the access and management of LDM datasets and data services. Lastly, a wide variety of data visualizations allows users to preview the main characteristics of a dataset without downloading it. This demonstration showcases the LDM features across the whole lifecycle of research data management [2]. First, attendees will collect and describe a dataset and generate a Digital Object Identifier (DOI) that persistently and globally identifies it. Next, they will explore the metadata describing the defined datasets in various RDF serializations. Previews of the uploaded data will be visualized using a variety of plots. Jupyter notebooks will be included as data services to demonstrate on-the-fly analyses. Lastly, datasets from other data repositories or data providers will be integrated; attendees will be able to set up different synchronization schedules to keep datasets up to date.
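As a concrete illustration of machine-processable metadata, the sketch below consumes a DCAT description of a dataset with the rdflib library; the dataset URL is a placeholder, and the exact RDF endpoint depends on the LDM installation.

```python
# Minimal sketch: reading DCAT metadata of a dataset with rdflib.
# The URL is a placeholder; ckanext-dcat-based catalogs typically expose
# dataset metadata in several RDF serializations, e.g., Turtle.
from rdflib import Graph
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.parse("https://example.org/dataset/my-dataset.ttl", format="turtle")

for ds in g.subjects(RDF.type, DCAT.Dataset):
    print("Dataset:", g.value(ds, DCTERMS.title))
    for dist in g.objects(ds, DCAT.distribution):
        print("  Distribution:", g.value(dist, DCAT.downloadURL))
```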

Fig. 1. The Leibniz Data Manager main components.

2 The Leibniz Data Manager Architecture

Leibniz Data Manager aims to support the lifecycle of research data management: a) planning research; b) collecting data; c) processing and analyzing data; d) publishing and sharing; e) preserving data; and f) re-using data. Figure 1 depicts the main components for research data management and analysis. Data is collected from datasets in heterogeneous formats; additionally, data catalogs can be integrated from existing repositories (e.g., the data repository of the Leibniz University Hannover). Metadata describing a dataset is collected from the data provider, and existing vocabularies, e.g., DCAT and DataCite, are utilized to describe the metadata following the Linked Data and FAIR principles. The newly created dataset is uniquely and persistently identified by generating a DOI. Moreover, the user can define a scheduler for synchronizing the dataset with other dataset providers [1]. Lastly, the user can describe the dataset access regulations. Once a dataset is part of the LDM catalog, its data and metadata are kept synchronized according to the schedule defined during the dataset creation step.

At the analysis level, LDM enables users to explore the datasets based on keyword queries or searches defined on DCAT properties (e.g., object types, formats, licenses). Metadata is presented in various RDF serializations and described using DCAT or DataCite. Datasets can be explored using multiple plots or visualized in 2D or 3D. Lastly, data services allow for the analysis of datasets through interactive programming with Jupyter notebooks.

LDM is implemented as open-source software and extends the open data repository system CKAN with extra features developed on top of CKAN extensions, e.g., ckanext-dcat. LDM is available as a Docker container to facilitate the installation of LDM distributions. LDM is a publicly available resource maintained by the TIB – Leibniz Information Center for Science and Technology in Hannover. TIB is one of the largest libraries for science and technology in the world, and actively promotes open access to digital research artifacts, e.g., research data, scientific literature, non-textual material, and software. Like other TIB services, LDM is regularly maintained and supported.
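Since LDM extends CKAN, its catalog can also be queried programmatically through the standard CKAN Action API. The following sketch illustrates a keyword search; the host name is a placeholder, and the available result fields depend on the installed metadata schema.

```python
# Minimal sketch: keyword search against an LDM/CKAN catalog via the
# CKAN Action API. "https://example-ldm.org" is a placeholder host.
import requests

LDM_URL = "https://example-ldm.org"

resp = requests.get(
    f"{LDM_URL}/api/3/action/package_search",
    params={"q": "climate", "rows": 5},  # keyword query, limit to 5 results
    timeout=10,
)
resp.raise_for_status()
for pkg in resp.json()["result"]["results"]:
    print(pkg["name"], "-", pkg.get("title", ""))
```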

Fig. 2. Dataset creation step. Metadata is collected to describe datasets, licenses, authors, data services, and semantic annotations.

3 Demonstration of Use Cases

The demonstration illustrates the main LDM functionalities and the support provided in each step of the research data management lifecycle. During the demonstration, attendees will be able to interact with LDM and experiment with the tasks of dataset creation and management, and dataset analysis.

3.1 Dataset Creation and Management

Attendees will go through dataset creation and specify the metadata that characterizes the defined dataset, including title, description, tags, and license. Additionally, dataset authors can be uniquely identified using their ORCID identifiers. Similarly, attendees will define data services for the datasets and use controlled vocabularies to express the meaning of the published data. Figure 2 illustrates the part of the interface used to collect this metadata and create a dataset. Attendees will create two types of datasets, i.e., local datasets and datasets imported from other repositories, and will define different schedulers for data synchronization, which keep imported datasets up to date. They will explore metadata in various vocabularies, e.g., DataCite, DCAT, or Dublin Core, to analyze machine-readable descriptions of the defined datasets (Fig. 3). Furthermore, attendees will be able to explore and search the datasets based on metadata represented using these vocabularies.
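Dataset creation can also be scripted. The sketch below uses CKAN's package_create action, which LDM inherits from CKAN; the host, API token, and metadata fields are placeholders, and an actual LDM instance may require additional schema-specific fields.

```python
# Minimal sketch: creating a dataset via CKAN's package_create action.
# Host and API token are placeholders; the metadata fields shown are
# CKAN defaults and may differ from the LDM schema.
import requests

LDM_URL = "https://example-ldm.org"  # placeholder instance
API_TOKEN = "<your-api-token>"       # issued per user by the catalog

dataset = {
    "name": "demo-sensor-readings",
    "title": "Demo Sensor Readings",
    "notes": "Example dataset created during the LDM demonstration.",
    "license_id": "cc-by",
    "tags": [{"name": "demo"}, {"name": "sensors"}],
}

resp = requests.post(
    f"{LDM_URL}/api/3/action/package_create",
    json=dataset,
    headers={"Authorization": API_TOKEN},
    timeout=10,
)
resp.raise_for_status()
print("Created dataset:", resp.json()["result"]["id"])
```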

Fig. 3. Semantically describing datasets.

3.2 Dataset Analysis

Three types of datasets are available: (i) local datasets, including data resources in various formats (e.g., CSV, JSON, text, or MP4); (ii) imported datasets collected from existing data repositories on the Web; and (iii) data services running on top of datasets and providing data processing results in the form of non-alterable Python code. Attendees will publish these three types of data resources (Fig. 4) and analyze their main properties using live code implemented as Jupyter notebook services.
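As a hypothetical example of such a data service, a notebook cell might load a CSV distribution of a dataset and compute summary statistics over it; the resource URL below is a placeholder.

```python
# Minimal sketch: the kind of live analysis a Jupyter notebook service
# could run over a CSV resource of an LDM dataset. The URL is a
# placeholder for the download URL of an actual distribution.
import pandas as pd

CSV_URL = "https://example-ldm.org/dataset/demo/resource/readings.csv"

df = pd.read_csv(CSV_URL)
print(df.head())      # quick preview of the first rows
print(df.describe())  # summary statistics without a manual download
```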

Fig. 4. Jupyter notebook integrated over a dataset for live code analysis.

4 Conclusions

We demonstrate a data management system that supports the lifecycle of research data management, i.e., data creation, documentation, analysis, preservation, and sharing. Datasets from other repositories can be imported and kept up to date. The LDM demo highlights the crucial role of Semantic Web technologies and W3C-recommended vocabularies in the generation of machine-readable metadata following the Linked Data and FAIR principles.