Abstract
The FAIR principles aim to make research data management more machine-actionable and to enable data consumers and providers to scale to the incoming avalanche of data. This demo paper describes Leibniz Data Manager (LDM), a research data management repository that relies on Semantic Web technologies to support the FAIR principles. During the demonstration, attendees will create various digital objects and observe the crucial role of metadata in the efficient and effective management and analysis of research data. LDM is publicly available: https://service.tib.eu/ldmservice/.
1 Introduction
The FAIR data principles emphasize the crucial role of machine-processable metadata in finding, accessing, interoperating with, and reusing data with minimal human intervention [3]. Leibniz Data Manager (LDM) is built on Semantic Web technologies to support researchers in documenting, analyzing, and sharing research datasets. LDM addresses interoperability across repositories and integrates datasets published in other repositories. To present dataset metadata, it relies on existing vocabularies, e.g., DCAT and DataCite. In addition, data services implemented as Jupyter notebooks enable the execution of live code over LDM repositories. Configurable access privileges control who can access and manage the LDM datasets and data services. Lastly, a wide variety of data visualizations make it possible to preview the main characteristics of a dataset without downloading it.
This demo presents the LDM features across the whole lifecycle of research data management [2]. First, attendees will collect and describe a dataset, and generate a Digital Object Identifier (DOI) that persistently and globally identifies it. Next, they will explore the metadata describing the defined datasets in various RDF serializations. Previews of the uploaded data will be visualized using a variety of plots. Jupyter notebooks will be included as data services to demonstrate on-the-fly analyses. Lastly, datasets from other data repositories or data providers will be integrated; attendees will be able to set up different synchronization schedules to keep datasets up to date.
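To make the role of a vocabulary such as DCAT concrete, the following minimal sketch builds a JSON-LD description of a dataset using DCAT and Dublin Core terms. It is an illustration only and does not reproduce LDM's export code; the function name `build_dcat_record`, the DOI `10.1234/example-dataset`, and the `example.org` download URL are hypothetical placeholders.

```python
import json


def build_dcat_record(title, description, keywords, license_url, doi, download_url):
    """Return a JSON-LD dictionary describing one dataset with DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",   # W3C DCAT namespace
            "dct": "http://purl.org/dc/terms/",     # Dublin Core terms namespace
        },
        "@id": f"https://doi.org/{doi}",            # the DOI doubles as the dataset IRI
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dct:license": {"@id": license_url},
        "dct:identifier": doi,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
            }
        ],
    }


# Hypothetical example values, chosen only for illustration.
record = build_dcat_record(
    title="Example measurements",
    description="A small table of example measurements.",
    keywords=["example", "measurements"],
    license_url="https://creativecommons.org/licenses/by/4.0/",
    doi="10.1234/example-dataset",
    download_url="https://example.org/data/measurements.csv",
)
print(json.dumps(record, indent=2))
```

Because the record is plain JSON-LD, any RDF toolkit can convert it to other serializations such as Turtle or RDF/XML.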
2 The Leibniz Data Manager Architecture
Leibniz Data Manager aims to support the lifecycle of research data management: a) planning research; b) collecting data; c) processing and analyzing data; d) publishing and sharing; e) preserving data; and f) re-using data. Figure 1 depicts the main components for research data management and analysis. Data is collected from datasets in heterogeneous formats; data catalogs can also be integrated from existing repositories (e.g., the data repository of the Leibniz University Hannover). Metadata describing a dataset is collected from the data provider, and existing vocabularies, e.g., DCAT and DataCite, are used to describe the metadata following the Linked Data and FAIR principles. The newly created dataset is uniquely and persistently identified by generating a DOI. Moreover, the user can define a scheduler for synchronizing the dataset with other dataset providers [1]. Lastly, the user can describe the dataset access regulations. Once a dataset is part of the LDM catalog, data and metadata are created and synchronized according to the schedule defined during the data creation step.
At the analysis level, LDM enables users to explore the datasets based on keyword queries or searches defined on DCAT properties (e.g., object types, formats, licenses). Metadata is presented in various RDF serializations and described using DCAT or DataCite. Datasets can be explored using multiple plots or visualized in 2D or 3D. Lastly, data services allow datasets to be analyzed interactively via Jupyter notebooks.
LDM is open source and extends the open data repository system CKAN with extra features developed on top of CKAN extensions, e.g., ckanext-dcat. LDM is available as a Docker container to facilitate the installation of LDM distributions.
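Since LDM extends CKAN, keyword and property-based searches of the kind described above can be sketched as a request to CKAN's Action API. The example below only builds the request URL and sends nothing over the network. It assumes that the public LDM instance exposes CKAN's standard `package_search` action under its base URL, which the paper does not state; the helper name `build_search_url` and the filter values are hypothetical.

```python
from urllib.parse import urlencode, urljoin

# Public LDM instance named in the paper.
LDM_BASE_URL = "https://service.tib.eu/ldmservice/"


def build_search_url(keywords, res_format=None, license_id=None, rows=10):
    """Build a CKAN package_search URL for a keyword query with optional filters."""
    filters = []
    if res_format:
        # res_format is the CKAN search field for resource formats.
        filters.append(f'res_format:"{res_format}"')
    if license_id:
        # license_id is the CKAN search field for dataset licenses.
        filters.append(f'license_id:"{license_id}"')

    params = {"q": keywords, "rows": rows}
    if filters:
        # Combine filters explicitly so that every one of them must match.
        params["fq"] = " AND ".join(filters)

    # Assumption: the CKAN Action API lives at <base>/api/3/action/.
    endpoint = urljoin(LDM_BASE_URL, "api/3/action/package_search")
    return endpoint + "?" + urlencode(params)


url = build_search_url("climate", res_format="CSV")
print(url)
```

The response of such a request, if the endpoint is available, is a JSON document whose `result` field lists the matching datasets.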
LDM is a publicly available resource maintained by the TIB – Leibniz Information Center for Science and Technology in Hannover. TIB is one of the largest libraries for Science and Technology in the world, and actively promotes open access to digital research artifacts, e.g., research data, scientific literature, non-textual material, and software. Similar to other TIB services, LDM is regularly maintained and supported.
3 Demonstration of Use Cases
The demonstration illustrates the main LDM functionalities and the support provided in each step of the research data management lifecycle. During the demonstration, attendees will be able to interact with LDM and experiment with dataset creation, management, and analysis.
3.1 Dataset Creation and Management
Attendees will go through dataset creation and specify the metadata that characterizes the defined dataset, including title, description, tags, and license. Additionally, the dataset authors can be uniquely identified using their ORCID identifiers. Similarly, attendees will define data services for the datasets and use controlled vocabularies to express the meaning of the published data. Figure 2 illustrates the part of the interface used to collect this metadata and create a dataset. Attendees will create two types of datasets, i.e., local datasets and datasets imported from other repositories. For imported datasets, they will define different schedulers for data synchronization, which allow LDM to adapt to changes in the source repositories. They will explore metadata in various vocabularies, e.g., DataCite, DCAT, or DublinCore, to analyze machine-readable descriptions of the defined datasets (Fig. 3). Furthermore, attendees will be able to explore and search the datasets based on metadata represented using these vocabularies.
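ORCID identifiers carry a check character computed with the ISO 7064 MOD 11-2 algorithm, so a repository can reject mistyped author identifiers before storing them. The sketch below shows that check. It is not taken from LDM, and the function names are hypothetical; the two sample identifiers are the well-known examples from the ORCID documentation.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the hyphenated ORCID iD has 16 characters and a valid checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if any(ch not in "0123456789" for ch in base):
        return False
    return orcid_check_character(base) == check


print(is_valid_orcid("0000-0002-1825-0097"))  # valid sample identifier
print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
```

A passing checksum only shows that the identifier is well formed; it does not confirm that the ORCID record exists or belongs to the named author.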
3.2 Dataset Analysis
Three types of datasets are available: (i) Local datasets, including data resources presented in various formats (e.g., CSV, JSON, text, or MP4). (ii) Imported datasets collected from existing data repositories on the Web. (iii) Data services running on top of datasets and providing data processing results in the form of non-alterable Python code. Attendees will publish these three types of data resources (Fig. 4) and analyze their main properties using live code implemented as Jupyter notebook services.
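As an illustration of the kind of live code a notebook-based data service might run over a CSV resource, the following sketch summarizes the columns of a small table using only the Python standard library. The function `summarize_csv` and the sample data are hypothetical and are not part of LDM.

```python
import csv
import io
import statistics


def summarize_csv(text):
    """Return the row count and a per-column summary for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary = {"rows": len(rows), "columns": {}}
    for name in reader.fieldnames or []:
        # Ignore missing and empty cells when profiling a column.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        if not values:
            summary["columns"][name] = {"type": "empty"}
            continue
        try:
            numbers = [float(value) for value in values]
        except ValueError:
            # At least one cell is not a number: treat the column as text.
            summary["columns"][name] = {"type": "text", "distinct": len(set(values))}
        else:
            summary["columns"][name] = {
                "type": "numeric",
                "min": min(numbers),
                "max": max(numbers),
                "mean": statistics.mean(numbers),
            }
    return summary


# Hypothetical sample resource.
sample = "station,temperature\nA,10.0\nB,14.0\nA,12.0\n"
report = summarize_csv(sample)
print(report)
```

In a notebook, the CSV text would come from the resource attached to the dataset rather than from an inline string.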
4 Conclusions
We demonstrate a data management system that supports the lifecycle of research data management, i.e., data creation, documentation, analysis, preservation, and sharing. Datasets from other repositories can be imported and kept up to date. The LDM demo highlights the crucial role of Semantic Web technologies and W3C-recommended vocabularies in generating machine-readable metadata that respects the Linked Data and FAIR principles.
References
1. Chamanara, J., Kraft, A., Auer, S., Koepler, O.: Towards semantic integration of federated research data. Datenbank-Spektrum 19(2), 87–94 (2019). https://doi.org/10.1007/s13222-019-00315-w
2. Mosconi, G., et al.: Three gaps in opening science. Comput. Support. Coop. Work 28(3–4), 749–789 (2019). https://doi.org/10.1007/s10606-019-09354-z
3. Wilkinson, M., et al.: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3(1), 1–9 (2016). https://doi.org/10.1038/sdata.2016.18
Acknowledgements
The project is funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) in the LIS Funding Programme e-Research Technologies (grant no. 438302423).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Beer, A., Brunet, M., Srivastava, V., Vidal, ME. (2022). Leibniz Data Manager – A Research Data Management System. In: Groth, P., et al. The Semantic Web: ESWC 2022 Satellite Events. ESWC 2022. Lecture Notes in Computer Science, vol 13384. Springer, Cham. https://doi.org/10.1007/978-3-031-11609-4_14
DOI: https://doi.org/10.1007/978-3-031-11609-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11608-7
Online ISBN: 978-3-031-11609-4
eBook Packages: Computer Science (R0)