1 Introduction

The FAIR data principles emphasize the crucial role of machine-processable metadata in finding, accessing, interoperating with, and reusing data with minimal human intervention [3]. The Leibniz Data Manager (LDM) builds on Semantic Web technologies to support researchers in documenting, analyzing, and sharing research datasets. LDM addresses interoperability across repositories and integrates datasets published in other repositories. To present dataset metadata, it relies on existing vocabularies, e.g., DCAT and DataCite. In addition, data services implemented as Jupyter notebooks enable the execution of live code over LDM repositories. Configurable access privileges facilitate the access and management of LDM datasets and data services. Lastly, a wide variety of data visualizations allows users to preview the main characteristics of a dataset without downloading it. This demonstration showcases the LDM features across the whole lifecycle of research data management [2]. First, attendees will collect and describe a dataset and generate a Digital Object Identifier (DOI) that persistently and globally identifies it. Next, they will explore the metadata describing the defined datasets in various RDF serializations. Previews of the uploaded data will be visualized using a variety of plots. Jupyter notebooks will be included as data services to demonstrate on-the-fly analyses. Lastly, datasets from other data repositories or data providers will be integrated; attendees will be able to set up different synchronization schedules to keep datasets up to date.
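As a concrete illustration of machine-processable metadata, the sketch below consumes a DCAT description of a dataset with the rdflib library; the dataset URL is a placeholder, and the exact RDF endpoint depends on the LDM installation.

```python
# Minimal sketch: reading DCAT metadata of a dataset with rdflib.
# The URL is a placeholder; ckanext-dcat-based catalogs typically expose
# dataset metadata in several RDF serializations, e.g., Turtle.
from rdflib import Graph
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
g.parse("https://example.org/dataset/my-dataset.ttl", format="turtle")

for ds in g.subjects(RDF.type, DCAT.Dataset):
    print("Dataset:", g.value(ds, DCTERMS.title))
    for dist in g.objects(ds, DCAT.distribution):
        print("  Distribution:", g.value(dist, DCAT.downloadURL))
```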

Fig. 1. The Leibniz Data Manager main components.

2 The Leibniz Data Manager Architecture

Leibniz Data Manager aims to support the lifecycle of research data management: a) planning research; b) collecting data; c) processing and analyzing data; d) publishing and sharing; e) preserving data; and f) re-using data. Figure 1 depicts the main components for research data management and analysis. Data is collected from datasets in heterogeneous formats; additionally, data catalogs can be integrated from existing repositories (e.g., the data repository of the Leibniz University Hannover). Metadata describing a dataset is collected from the data provider, and existing vocabularies, e.g., DCAT and DataCite, are utilized to describe the metadata following the Linked Data and FAIR principles. The newly created dataset is uniquely and persistently identified by generating a DOI. Moreover, the user can define a scheduler for synchronizing the dataset with other dataset providers [1]. Lastly, the user can describe the dataset access regulations. Once a dataset is part of the LDM catalog, its data and metadata are kept synchronized according to the schedule defined during the dataset creation step.

At the analysis level, LDM enables users to explore the datasets based on keyword queries or searches defined on DCAT properties (e.g., object types, formats, licenses). Metadata is presented in various RDF serializations and described using DCAT or DataCite. Datasets can be explored using multiple plots or visualized in 2D or 3D. Lastly, data services allow for the analysis of datasets through interactive programming with Jupyter notebooks.

LDM is implemented as open-source software and extends the open data repository system CKAN with extra features developed on top of CKAN extensions, e.g., ckanext-dcat. LDM is available as a Docker container to facilitate the installation of LDM distributions. LDM is a publicly available resource maintained by the TIB – Leibniz Information Center for Science and Technology in Hannover. TIB is one of the largest libraries for science and technology in the world, and actively promotes open access to digital research artifacts, e.g., research data, scientific literature, non-textual material, and software. Like other TIB services, LDM is regularly maintained and supported.
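Since LDM extends CKAN, its catalog can also be queried programmatically through the standard CKAN Action API. The following sketch illustrates a keyword search; the host name is a placeholder, and the available result fields depend on the installed metadata schema.

```python
# Minimal sketch: keyword search against an LDM/CKAN catalog via the
# CKAN Action API. "https://example-ldm.org" is a placeholder host.
import requests

LDM_URL = "https://example-ldm.org"

resp = requests.get(
    f"{LDM_URL}/api/3/action/package_search",
    params={"q": "climate", "rows": 5},  # keyword query, limit to 5 results
    timeout=10,
)
resp.raise_for_status()
for pkg in resp.json()["result"]["results"]:
    print(pkg["name"], "-", pkg.get("title", ""))
```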

Fig. 2. Dataset creation step. Metadata is collected to describe datasets, licenses, authors, data services, and semantic annotations.

3 Demonstration of Use Cases

The demonstration illustrates the main LDM functionalities and the support provided in each step of the research data management lifecycle. During the demonstration, attendees will be able to interact with LDM and experiment with the tasks of dataset creation and management, and dataset analysis.

3.1 Dataset Creation and Management

Attendees will go through dataset creation and specify the metadata that characterizes the defined dataset, including title, description, tags, and license. Additionally, dataset authors can be uniquely identified using their ORCID identifiers. Similarly, attendees will define data services for the datasets and use controlled vocabularies to express the meaning of the published data. Figure 2 illustrates the part of the interface used to collect this metadata and create a dataset. Attendees will create two types of datasets, i.e., local datasets and datasets imported from other repositories, and will define different schedulers for data synchronization, which keep imported datasets up to date. They will explore metadata in various vocabularies, e.g., DataCite, DCAT, or Dublin Core, to analyze machine-readable descriptions of the defined datasets (Fig. 3). Furthermore, attendees will be able to explore and search the datasets based on metadata represented using these vocabularies.
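Dataset creation can also be scripted. The sketch below uses CKAN's package_create action, which LDM inherits from CKAN; the host, API token, and metadata fields are placeholders, and an actual LDM instance may require additional schema-specific fields.

```python
# Minimal sketch: creating a dataset via CKAN's package_create action.
# Host and API token are placeholders; the metadata fields shown are
# CKAN defaults and may differ from the LDM schema.
import requests

LDM_URL = "https://example-ldm.org"  # placeholder instance
API_TOKEN = "<your-api-token>"       # issued per user by the catalog

dataset = {
    "name": "demo-sensor-readings",
    "title": "Demo Sensor Readings",
    "notes": "Example dataset created during the LDM demonstration.",
    "license_id": "cc-by",
    "tags": [{"name": "demo"}, {"name": "sensors"}],
}

resp = requests.post(
    f"{LDM_URL}/api/3/action/package_create",
    json=dataset,
    headers={"Authorization": API_TOKEN},
    timeout=10,
)
resp.raise_for_status()
print("Created dataset:", resp.json()["result"]["id"])
```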

Fig. 3. Semantically describing datasets.

3.2 Dataset Analysis

Three types of datasets are available: (i) local datasets, including data resources in various formats (e.g., CSV, JSON, text, or MP4); (ii) imported datasets collected from existing data repositories on the Web; and (iii) data services running on top of datasets and providing data processing results in the form of non-alterable Python code. Attendees will publish these three types of data resources (Fig. 4) and analyze their main properties using live code implemented as Jupyter notebook services.
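As a hypothetical example of such a data service, a notebook cell might load a CSV distribution of a dataset and compute summary statistics over it; the resource URL below is a placeholder.

```python
# Minimal sketch: the kind of live analysis a Jupyter notebook service
# could run over a CSV resource of an LDM dataset. The URL is a
# placeholder for the download URL of an actual distribution.
import pandas as pd

CSV_URL = "https://example-ldm.org/dataset/demo/resource/readings.csv"

df = pd.read_csv(CSV_URL)
print(df.head())      # quick preview of the first rows
print(df.describe())  # summary statistics without a manual download
```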

Fig. 4. Jupyter notebook integrated over a dataset for live code analysis.

4 Conclusions

We demonstrate a data management system that supports the lifecycle of research data management, i.e., data creation, documentation, analysis, preservation, and sharing. Datasets from other repositories can be imported and kept up to date. The LDM demo highlights the crucial role of Semantic Web technologies and W3C-recommended vocabularies in the generation of machine-readable metadata following the Linked Data and FAIR principles.