1 Introduction

The Resource Description Framework (RDF) is a standard directed, labeled graph data format for representing information on the Semantic Web, an extension of the web. RDF data is expressed as triples, each consisting of a subject, a predicate, and an object [9]. The RDF concept and its related specifications were introduced, and are maintained, by the World Wide Web Consortium (W3C). Important serialization formats for RDF include RDF/XML, RDFa, JSON-LD, and Turtle. The main difference between RDF and other data formats is that RDF uses a directed graph model, which allows for more flexible and powerful data representations. The advantages of RDF include its flexibility, extensibility, and interoperability. RDF is important to the modern web because it provides a standard way to represent data that can be shared across different applications and platforms.
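To make the triple structure concrete, the following is a minimal sketch in Python. The `Triple` type, the `to_ntriples` helper, and the `example.org` IRIs are assumptions introduced only for illustration; the escaping is deliberately simplified and does not cover every case in the N-Triples grammar.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type, not part of any library)."""
    subject: str              # IRI of the resource being described
    predicate: str            # IRI of the property
    obj: str                  # IRI or plain literal value
    obj_is_iri: bool = False  # True when obj is an IRI rather than a literal


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line (simplified escaping)."""
    if t.obj_is_iri:
        obj = f"<{t.obj}>"
    else:
        # Escape only backslashes and double quotes; real data may need more.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


triples = [
    Triple("http://example.org/book/1",
           "http://purl.org/dc/terms/title",
           "RDF Primer"),
    Triple("http://example.org/book/1",
           "http://purl.org/dc/terms/creator",
           "http://example.org/person/7",
           obj_is_iri=True),
]

ntriples = "\n".join(to_ntriples(t) for t in triples)
print(ntriples)
```

Both statements share the same subject, which is how a set of triples forms a graph around a node.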

The difference between RDF and other data formats such as CSV and JSON is that RDF is a graph-based data format. This means that data is represented as a set of interconnected nodes in a graph, as opposed to being represented as a table or a set of key-value pairs. RDF is also a standard format, which means that there are well-defined rules for how data should be represented in RDF. This makes it easier to exchange data between different systems and to query data using standard tools.
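The contrast between a table and a graph can be sketched as follows. This is an illustrative assumption, not a prescribed conversion: the column names, the `BASE` IRI stem, and the `VOCAB` namespace are hypothetical, and every non-ID cell simply becomes one (subject, predicate, object) tuple.

```python
import csv
import io

# Hypothetical tabular input: one row per book.
CSV_TEXT = "id,title,year\n1,RDF Primer,2014\n2,Linked Data,2011\n"

BASE = "http://example.org/book/"    # hypothetical IRI stem for subjects
VOCAB = "http://example.org/vocab/"  # hypothetical namespace for properties


def rows_to_triples(csv_text, id_column="id"):
    """Turn each non-empty cell into a triple keyed by the row's ID."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + row[id_column]
        for column, value in row.items():
            if column != id_column and value:
                triples.append((subject, VOCAB + column, value))
    return triples


triples = rows_to_triples(CSV_TEXT)
for triple in triples:
    print(triple)
```

Each row becomes a node, and each column becomes a labeled edge from that node to a value.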

YAMA Mapping Language (YAMAML) is proposed to improve 5-star-level (Footnote 1) data publication, on the premise that profile-driven RDF generation can streamline the process by mapping multiple non-RDF sources to an RDF application profile of varying complexity.

1.1 Application Profiles, DCAP and DSP

Application profiles, often called Metadata Application Profiles, are a combination of vocabularies, which are mixed and matched from different namespaces and optimized for a particular local application [3]. Application profiles express the terms taken from other namespaces and the structural use of those terms in the local instance data. Application profiles also express constraints on those terms so that the data can be validated as well.

A Dublin Core Application Profile (DCAP) specifies how metadata description sets are constructed. It includes information on the terms used in the description sets, how they are deployed, and constraints on the values and datatypes of the properties used. A DCAP is a declaration specifying which metadata terms an organization, information provider, or user community uses in its metadata. DCAPs can be used to document the semantics and constraints used for a set of metadata records or instance data. A DCAP can promote interoperability between different metadata models and help communities of implementers harmonize metadata practice, both among themselves and with other communities.

The Singapore Framework for Dublin Core Application Profiles [7] is a set of standards for designing metadata applications that are interoperable and reusable. The standards form a basis for reviewing Application Profiles for documentary completeness and conformance with Web-architectural principles. The standards define a set of descriptive components that are necessary or useful for documenting an Application Profile. The standards also describe how these documentary standards relate to standard domain models and Semantic Web foundation standards.

Description Set Profiles (DSP) is a constraint language for Dublin Core Application Profiles [6]. DSP is based on the DCMI Abstract Model (DCAM), which defines Description Set, Description, and Statement [8]. A DSP defines constraints on Description Sets, Descriptions, and Statements. Description Set Templates hold one or more Description Templates composed of Statement Constraints. DSP supports the RDF-oriented data design with properties and datatypes.
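The nesting described above can be sketched as a small data structure. The class and field names below (including `min_occurs` and `max_occurs`) and the `check_description` helper are assumptions chosen for illustration; they are not the normative DSP syntax, and only a cardinality check is shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StatementConstraint:
    property_iri: str
    min_occurs: int = 0
    max_occurs: Optional[int] = None  # None means unbounded


@dataclass
class DescriptionTemplate:
    template_id: str
    statements: list = field(default_factory=list)


@dataclass
class DescriptionSetTemplate:
    descriptions: list = field(default_factory=list)


def check_description(template, description):
    """Check cardinality only.

    `description` maps a property IRI to its list of values.
    Returns a list of human-readable violation messages.
    """
    problems = []
    for sc in template.statements:
        count = len(description.get(sc.property_iri, []))
        if count < sc.min_occurs:
            problems.append(
                f"{sc.property_iri}: expected at least {sc.min_occurs}, found {count}")
        if sc.max_occurs is not None and count > sc.max_occurs:
            problems.append(
                f"{sc.property_iri}: expected at most {sc.max_occurs}, found {count}")
    return problems


TITLE = "http://purl.org/dc/terms/title"
book = DescriptionTemplate("book", [StatementConstraint(TITLE, 1, 1)])
profile = DescriptionSetTemplate([book])

print(check_description(book, {}))
print(check_description(book, {TITLE: ["RDF Primer"]}))
```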

1.2 Yet Another Metadata Application Profile (YAMA)

Yet Another Metadata Application Profile (YAMA) is a user-friendly, interoperable preprocessor for creating, maintaining, and publishing Metadata Application Profiles [12], developed as a direct adaptation of Dublin Core DSP. It is heavily inspired by the Simple-DSP (SDSP) format [5] of the MetaBridge project (Footnote 2). Even though it helps to produce various formats and standards for expressing application profiles, YAMA is not defined as a new standard for application profiles but as an easy-to-use preprocessor for creating standard application profile formats. It is extensible [11] with custom elements and structure, and its syntax is based on the YAML 1.2 specification (Footnote 3). Although a profile is parsable with any YAML 1.2 parser, its processing capabilities depend on the implementation.

YAMAML is built on a minimal YAMA application profile concept, namely that standard RDF can be expressed in an application profile with descriptions and statements.

1.3 Related Works

There have been several attempts at application profile formats and RDF mapping languages. A brief overview of the state of the art is provided below.

DC Tabular Application Profiles (DC TAP) is a way to create application profiles in the form of tables. These tables can be read by humans and saved in a CSV format, which can be read by a computer program (Footnote 4).

Tarql: SPARQL for Tables is a command-line tool that uses SPARQL 1.1 syntax to convert CSV files to RDF (Footnote 5).

LinkML is a flexible modeling language that allows authors to create schemas in YAML that describe the structure of data. LinkML is also a framework for working with and validating data in a variety of formats (JSON, RDF, TSV) and can be used to compile LinkML schemas to other frameworks (Footnote 6).

R2RML is a language for expressing customized mappings from relational databases to RDF datasets [1]. This language allows different mapping implementations, such as creating a virtual SPARQL endpoint over the mapped relational data, generating RDF dumps, or offering a Linked Data interface.

RDF Mapping Language (RML) is a mapping language that can express customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML is defined as a superset of the W3C-standardized mapping language R2RML [2].

YARRRML is a human-readable text-based representation for declarative Linked Data generation rules. It is a subset of YAML that can be used to represent R2RML and RML rules (Footnote 7).

CSV2RDF defines the procedures and rules for converting tabular data into RDF, including how metadata annotations can describe the structure, meaning, and interrelation of tabular data [10].

RDF Transform is an extension for OpenRefine (Footnote 8) that allows users to transform data into RDF formats. The RDF Transform extension provides a graphical user interface (GUI) for transforming OpenRefine project data to RDF-based formats. The transform maps the data with a template graph designed using the GUI (Footnote 9).

Among these attempts, DC TAP aims to provide a method of creating application profiles in tabular format. The other mapping languages generate RDF data from different types of input sources, but they are not oriented to application profiles. Considering these facts, YAMAML is a novel approach to devising an RDF mapping language based on application profiles.

2 Methods

The major goals of this attempt are:

  1. Derive a subset of YAMA to express minimal RDF as an application profile.

  2. Use the derived subset as a general purpose RDF data mapping language, suitable for both RDF generation and minimal profiling.

  3. Use data mapping in a descriptive and opinionated format to make RDF mapping easier.

  4. Develop a set of ready-to-use and simplified tooling for basic RDF generation.

2.1 Modeling Application Profile of RDF with Data Mapping

Application profiles both constrain and explain data. A typical YAMA application profile includes constraining options to help generate data validation formats and ensure data quality. These constraining elements, such as cardinality and value constraints, are not part of modeling or generating RDF, but they help generate RDF validation formats such as Shape Expressions (ShEx) (Footnote 10) and Shapes Constraint Language (SHACL) [4]. To model a minimal application profile for RDF, the YAMA constraining elements were omitted from the YAMAML subset. An application profile is also intended to generate human-readable documentation of the profile; explainer elements such as labels and notes, which serve this purpose, were likewise removed from the YAMA profile to create the subset. The subset required to express a minimal RDF application profile structure is therefore limited to ‘descriptions’ and ‘statements’ with essential parameters.
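The idea of deriving the subset can be sketched as a filter over a profile held in memory. Everything in this sketch is an assumption for illustration: the key names (`min`, `max`, `values`, `label`, `note`, `property`, `type`, `a`) and the dictionary layout are hypothetical and do not reproduce the actual YAMA or YAMAML syntax.

```python
# Hypothetical key names; the real YAMA vocabulary may differ.
CONSTRAINT_KEYS = {"min", "max", "values"}  # constraining elements
EXPLAINER_KEYS = {"label", "note"}          # documentation elements


def minimal_subset(profile):
    """Keep only descriptions and statements with their essential parameters."""
    dropped = CONSTRAINT_KEYS | EXPLAINER_KEYS
    minimal = {"descriptions": {}}
    for desc_id, desc in profile["descriptions"].items():
        statements = {
            st_id: {k: v for k, v in st.items() if k not in dropped}
            for st_id, st in desc.get("statements", {}).items()
        }
        kept = {k: v for k, v in desc.items()
                if k not in dropped and k != "statements"}
        kept["statements"] = statements
        minimal["descriptions"][desc_id] = kept
    return minimal


full_profile = {
    "descriptions": {
        "book": {
            "label": "Book",
            "a": "schema:Book",
            "statements": {
                "title": {
                    "label": "Title",
                    "property": "dc:title",
                    "min": 1,
                    "max": 1,
                    "type": "literal",
                },
            },
        },
    },
}

minimal = minimal_subset(full_profile)
print(minimal)
```

The input profile is left unmodified, so the same source could still drive validation and documentation output.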

The minimal YAMA application profile, which is based on DSP, is adapted for YAMAML as well. A basic overview of how YAMAML maps the application profile to data is given in Figure 1.

Fig. 1. YAMAML mapping overview - basic application profile and data mapping.

2.2 Data Mapping

YAMAML’s data mapping is designed to be multi-source capable, so users can combine many data sources to generate a single RDF output. The sources can be heterogeneous and require only a proper mapping to an ID, so various data sources can be mixed and matched to create RDF of any level of complexity. A detailed description of all mapping elements is provided in Table 1.
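The role of the ID in combining sources can be sketched as follows. This is not YAMAML syntax or its implementation: the `MAPPING` layout, its key names (`source`, `id`, `stem`, `column`, `description`), the inline CSV data, and the `generate` function are all assumptions made to illustrate how two sources can be linked through a shared identifier.

```python
import csv
import io

# Two hypothetical sources, linked through the shared author_id column.
SOURCES = {
    "books": "book_id,title,author_id\nb1,RDF Primer,a1\n",
    "authors": "author_id,name\na1,Ada Example\n",
}

# Hypothetical mapping structure: one entry per description.
MAPPING = {
    "book": {
        "source": "books",
        "id": "book_id",
        "stem": "http://example.org/book/",
        "statements": [
            {"property": "http://purl.org/dc/terms/title", "column": "title"},
            # This statement points at another description via its ID.
            {"property": "http://purl.org/dc/terms/creator",
             "column": "author_id", "description": "author"},
        ],
    },
    "author": {
        "source": "authors",
        "id": "author_id",
        "stem": "http://example.org/person/",
        "statements": [
            {"property": "http://xmlns.com/foaf/0.1/name", "column": "name"},
        ],
    },
}


def generate(mapping, sources):
    """Emit (subject, predicate, object) tuples for every mapped row."""
    triples = []
    for desc in mapping.values():
        reader = csv.DictReader(io.StringIO(sources[desc["source"]]))
        for row in reader:
            subject = desc["stem"] + row[desc["id"]]
            for st in desc["statements"]:
                value = row[st["column"]]
                if "description" in st:
                    # Resolve the foreign key to the target description's IRI.
                    value = mapping[st["description"]]["stem"] + value
                triples.append((subject, st["property"], value))
    return triples


triples = generate(MAPPING, SOURCES)
for triple in triples:
    print(triple)
```

The book's creator value and the author's subject resolve to the same IRI, so the two sources end up connected in one graph.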

YAMAML adapts all basic YAML collections from YAMA. In addition to the YAMA collections (base, namespaces, descriptions, and defaults), a special data collection is defined as a data holder. This optional collection container can store structured data as YAML or JSON. Since YAML is a superset of JSON, valid JSON is treated as valid YAML. All basic collection containers are explained in Table 2.

3 Results

YAMAML basic tooling and documentation are published at https://yamaml.org. The command-line (CLI) toolkit can be used to generate relatively large and complex RDF. The playground web app is an in-browser environment, so it may not be sufficient for generating massive RDF datasets, but it helps in understanding the basic implementation. A simple example of converting a basic CSV dataset is illustrated in Fig. 2. The command-line tools for YAMAML are written in JavaScript for the Deno runtime.

YAMA is an extensible application profile authoring environment. It was extended to cover many use-cases, such as versioning, application profile change log management, and provenance [13]. YAMA can be used to generate application profile expressions, documentation, and validation schemas, and now, with the data mapping, it can also be an RDF generator. This attempt to subset YAMA as an RDF mapping language helps YAMA become an application profile ecosystem for the Semantic Web and Linked Data.

Table 1. Elements in YAMAML data mapping

3.1 Comparison with State of the Art

DC TAP focuses on authoring application profiles in a tabular way and is not as extensible as YAMA, so it may not cover the use-case of RDF generation. Though DC TAP and YAMA are both primarily based on Dublin Core application profiles, YAMA follows the DC-DSP approach in modeling profiles; thus, YAMAML is modeled on a basic DC-DSP structure. Tarql requires practical knowledge of SPARQL to map CSV to RDF. This is a potential limitation, whereas authoring a mapping in YAMAML is relatively simple. Though LinkML uses YAML as its serialization format, it has a steep learning curve. YARRRML poses the same challenge, as it demands proper knowledge of R2RML and RML concepts. CSV2RDF requires CSVW to map the data, and modeling complex RDF structures with CSVW would be challenging. OpenRefine is the easiest option for RDF conversion, with powerful data transformation capabilities, a user-friendly GUI, and a sound reconciliation system with Wikidata support. However, the OpenRefine RDF add-on requires modeling the data in a certain way, which is not exactly equivalent to an application profile. In short, despite its limitations, YAMAML tries to be a minimal, easy-to-use, application profile-based RDF mapping language.

Table 2. YAMAML Containers

4 Discussion

YAMA is proposed as an extensible format [11] so that it can be extended to cover more use-cases and specific needs. With YAML’s flexibility, it can be a handy format for various RDF and linked data-related projects. Most of these requirements need custom tooling, which can help grow YAMA as an application profile ecosystem. So the authors believe that an RDF generation mapping language will be an added advantage in adapting YAMA for many real-world scenarios.

Fig. 2. YAMAML mapping example - a simplified example to demonstrate application profile mapping to a single CSV file with minimal data.

4.1 Limitations of This Approach

YAMAML mapping depends on declaring IDs for the descriptions; at least the main or initial description requires an ID mapping. An ID can be any common data element, similar to the primary key and foreign key concepts of traditional relational database systems. This dependence is a significant limitation, as it forces users to pre-process their data so that it carries the proper relationships. Another issue is that modeling complex RDF still requires the necessary skills and time. Since YAMAML tries to be simple enough for basic uses, many advanced use-cases and edge cases were not considered in the design decisions. Though YAMAML can be used to map linked data using IRI stems, it is not intended to reconcile entities against any linked dataset. Tools like OpenRefine can perform both reconciliation and RDF generation from a GUI environment.
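One possible form of such pre-processing is sketched below, assuming a source that has a natural key (here a name) but no ID column. The `make_id` and `add_id_column` helpers are hypothetical and not part of the YAMAML tooling. Hashing a normalized key is only one option, and it treats rows with the same normalized name as the same entity, which may not hold for real data.

```python
import hashlib


def make_id(value, prefix="p"):
    """Derive a stable ID from a natural key (case and spacing normalized)."""
    normalized = " ".join(value.lower().split())
    digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:10]
    return f"{prefix}{digest}"


def add_id_column(rows, key_column, id_column="id"):
    """Return copies of the rows with a derived ID column added."""
    return [{**row, id_column: make_id(row[key_column])} for row in rows]


# Two hypothetical sources that refer to the same person by name only.
authors = add_id_column([{"name": "Ada Example"}], "name")
books = add_id_column(
    [{"title": "RDF Primer", "author": "ada  example"}],
    "author", id_column="author_id")

# After pre-processing, both sources carry a matching key.
print(authors[0]["id"] == books[0]["author_id"])
```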

5 Conclusion

There are various tools and mapping languages for RDF, though some have steep learning curves and some are too complex for basic use-cases. GUI-oriented tools like OpenRefine help novice users convert their data to RDF. The adoption of RDF will be less painstaking and more widespread if there are many tools from which users can freely choose something that suits their needs. Many of these attempts have overlapping feature sets but still provide many unique features and options. The authors are optimistic that more accessible tools will eventually help to grow Semantic Web-oriented data sharing and expand the linked data cloud with more 5-star open data.