1 Introduction

Metadata application profiles (MAPs) are data element schemas drawn from various namespaces, mixed and customized for a specific application [9]. MAPs are the best mechanism to express consensus on any metadata instance by documenting the elements, policies, guidelines, and vocabularies for that particular implementation, along with the schemas and applicable constraints. Application profiles also provide term usage specifications and support interoperability by representing domain consensus, alignment, and the structure of the data [1, 10].

Provenance information is vital for application profiles. Changelogs of application profile versions help to ensure the longevity of metadata instances, because the longevity of the schema is essential for the longevity of the metadata. Metadata schema provenance should be documented and maintained for the preservation of metadata [13]. An application profile should provide sufficient administrative information, such as creator, date of release, version, and usage rights. Versioning of the application profile is crucial, as it is a record of changes to the application profile as well as to the metadata. Keeping changelogs can help to migrate datasets to new profiles or to create crosswalks to upgrade the instances. For Linked Open Data (LOD), changelogs also help to update linked datasets.

Through this paper, the authors are attempting to:

  1. Extend and clarify a previously proposed [25] extensible authoring format [24] to include a structured and actionable changeset, with the notion that the source of the MAP can include an actionable timeline of its development.

  2. Use a lightweight ontology to distinguish and point to the source of the MAP as well as the published MAP resources [4].

  3. Use the same ontology to notate the provenance information with identifiable roles for authoring, publishing, and contributing in collaborative MAP development.

The anticipated outcomes of these proposals are:

  1. The source of the application profile is distinguished from the published versions, which baselines the concepts of authoring formats and expression formats for application profiles.

  2. Identifying and retrieving application profiles and their versions, including changelogs, can be automated with the help of semantic linking of MAP resources.

  3. A MAP source in an interoperable authoring format that includes an actionable timeline can help to maintain the longevity of the schema. Declared roles of contribution can act as a means of provenance for MAP resources.

1.1 Application Profile Expression Formats

The Dublin Core Metadata Initiative (DCMI) defined one of the earliest guidelines for expressing application profiles, which can take various formats, as Description Set Profiles (DSP). DSP is a constraint language for Dublin Core Application Profiles (DCAP) based on the Singapore Framework for application profiles [18]. XML or RDF can be used as an expression format for DSP.

The Singapore Framework recommends publishing application profiles in human-readable expression formats as documentation, with detailed usage guidelines aimed at maximizing reusability and interoperability. Expressing an application profile in a human-readable format requires many more components than textual descriptions of first-order elements such as properties and classes. As a result, the human-readable expression of an application profile is expected to include a schematic representation, changelogs, and detailed administrative information.

For machine-actionable expressions, new standards beyond XML and RDF are emerging and gaining wide acceptance. The evolution of Linked Data encourages expressing application profiles in Semantic Web friendly formats such as JSON-LD and OWL. Given the developments in data linking and reuse, compelling use cases for expressing application profiles in data validation formats such as ShEx or SHACL are increasing. Including these newer expression formats in application profile publishing will expand the scope of their usage and help to assure long-term usability.

1.2 Current Status and Availability of Application Profiles

Application profiles are not standardized in terms of availability, maintenance, and distribution. Identifying MAPs requires human involvement [15]. Because of this manual effort, curating and archiving MAPs is difficult and costly. In addition to automated methods, numerous registry initiatives also rely on manual contributions. Most application profiles are available only in human-friendly formats, so distinguishing them from other types of documents also requires human involvement. It is challenging to extract structured application profile data from spreadsheets or PDF documents. The lack of versioning, change logs, and access to previous versions has a substantial impact on the longevity and provenance of metadata. The absence of unified publication formats limits the automated processing of application profiles, thereby limiting the number of application profiles accessible to the various attempts to register and curate them. The limited number of application profiles also restricts the primary purpose of metadata registries, which is to use application profiles to promote interoperability and reuse [17]. There is also no standardized way to link data to the MAP it is based on [21, 22].

1.3 Challenges in Application Profile Development

To develop and manage application profiles, there are recommendations such as Me4DCAP, which provides a set of guidelines to define, construct, and validate MAPs [14]. However, authoring tools and formats dedicated to MAPs are few in number. Usually, application profile maintainers have to use different tools to create different expression formats, which makes the whole process tedious for most domain experts. As a result, a large number of application profiles were authored and published only in human-readable document formats. The availability of previous versions is not ensured, and most MAPs do not maintain older versions in a publicly accessible format.

Different communities have different levels of experience in the technical aspects of application profile expression formats. There is a severe lack of guidance for developing and publishing metadata application profiles. A further barrier is the limited number of well-defined samples and initiatives for archiving and curating application profiles. There are also not many well-accepted authoring formats or pre-processors for creating application profiles.

1.4 Yet Another Metadata Application Profile (YAMA) as an Application Profile Authoring Format

Source formats used for application profile publication can be considered authoring formats. These source formats can be processed with tools such as a format converter or a parsing system to generate different expression formats of the application profile. A format used to author the application profile cannot be considered an expression of the application profile in all situations if the format is not a standard expression. The expression capabilities of such formats depend on their processors or conversion tools. This clear separation between authoring and expression formats is illustrated in Fig. 1.

Fig. 1. Authoring formats and expression formats for application profiles

For application profiles, the authors propose Yet Another Metadata Application Profile (YAMA) as an extensible authoring format to address the shortcomings of previous proposals [23]. YAMA is meant to be simple enough that domain experts can use it without extensive knowledge of MAPs. YAMA uses YAML Ain’t Markup Language (YAML), a robust, human-friendly data serialization format that has implementations in most popular programming languages and is considered to be a superset of JSON [2]. The basic structure of the YAMA MAP section is shown in Fig. 2.

YAMA is also an attempt to resolve the lack of a workflow for authoring metadata application profiles. Given the increasing popularity of GitHub-based workflows, YAMA’s different output formats and its extensibility to various proposals such as ShEx, DCAT, and PROV remove the need for repetitive tasks in the maintenance of metadata application profiles. YAMA is an intermediate MAP format from which different standard application profile expression formats can be produced or converted.

Fig. 2. Structure of YAMA MAP

YAMA is extensible with custom elements and structure. For example, custom elements can be added to the document tree as the use case demands. The only restriction is that custom elements cannot come from the reserved element sets. The capabilities of YAMA can be extended without any large-scale implementation changes, within the scope of the YAML specification. YAMA is based on DC-DSP, and a minimal DC-DSP is mandatory to express a MAP in YAMA. In addition to extensible key-values and structure, YAMA includes a structured syntax, named change-sets, to record modifications of a YAMA document. YAMA change-sets can be used to record the changes of a MAP relative to any other version. Change-sets are adapted from RFC 6902 JavaScript Object Notation (JSON) Patch [19], with each change marked as an action on a path. Every change uses ‘status’ as a reserved value to indicate status changes such as ‘deprecation’. This extensible nature of YAMA documents is illustrated in Fig. 3.

Fig. 3. Extensibility of a YAMA application profile

2 Related Work

As an application profile format, DCMI proposed a constraint language for Dublin Core Application Profiles named Description Set Profile (DSP). As an authoring format for DSP, a MoinMoin wiki syntax was introduced to embed application profiles in web pages. Later, Simple DSP (SDSP) [7], a simplified form of DSP using spreadsheets as an authoring format, was developed as part of the Metabridge project [17]. Recently, the DCMI application profile Special Interest Group has been working on improving DSP [6]. The Library of Congress BIBFRAME project developed a web-based editor for BIBFRAME Profiles [5]. The Linked Data for Production 2 (LD4P2) project modified the BIBFRAME editor and released it for general application profile creation as the Sinopia Profile Editor [11].

None of these authoring formats is extensible. A format’s extensibility is critical to its acceptance, as it helps different communities to adopt a simple base format and introduce specific domain requirements. It also helps to create different standard formats from the same source document without relying only on the common elements. The authors previously proposed an extensible authoring format named Yet Another Metadata Application Profile (YAMA) [25] using YAML (footnote 1) syntax and validated its extensibility over existing similar proposals [24].

Li and Sugimoto proposed a provenance model named DSP-PROV [13] to keep track of structural changes of metadata schemas. The DSP-PROV model applies PROV to the Dublin Core Application Profile. In contrast to that proposal, this paper treats application profile documents as digital resources and attempts to use a lightweight ontology to map the different versions of the published MAP and its provenance.

3 Methodology

The authors are attempting to extend a previously proposed MAP authoring format with an actionable timeline [23]. With the consideration that the format is to be a complete source of MAP authoring and versioning, a lightweight ontology is introduced to notate the authoring and versioning of MAP. The ontology is introduced with a notion that it can express different versions of the MAP as well as stakeholders and authoring source of the MAP.

3.1 Actionable Changesets as Timeline of MAP

YAMA is extended with two different sets of change mapping options. The first is an actionable change record named ‘changesets’: a collection of changes declared using a custom adaptation of JSON Patch, along with minimal metadata for the set of changes. Changesets are declared within the ‘changes’ path of the YAMA document. JSON Patch was originally intended for use with the HTTP PATCH method on JavaScript Object Notation (JSON) [RFC4627] (footnote 2), a standard format for storing and exchanging structured data. The HTTP PATCH method [RFC5789] (footnote 3) extends the Hypertext Transfer Protocol (HTTP) [RFC2616] (footnote 4) with a method to perform partial modifications to resources. A simple JSON Patch is shown in Fig. 4.

Adopting JSON Patch as a means of recording changes within the application profile authoring environment helps to make the changes actionable without any lock-in, as JSON Patch is widely adopted and there are plenty of implementations in every popular programming language. This acceptance helps implementors to keep the format open for independent development and tooling within the workflow of MAP development.

A JSON Patch consists of sequential operations applied to a JSON document, each with exactly one operation (op) element. As per RFC 6902, the valid operations are add, remove, replace, move, copy, and test. Each operation must declare one path element, a JSON Pointer as defined in RFC 6901, which points to the location to modify within the given JSON document. A JSON Pointer is composed of a string of tokens separated by ‘/’ characters. These tokens can be specific keys in objects or indexes of arrays.

The remaining part of a JSON Patch operation consists of further elements that depend on the specific type of operation.

Fig. 4. A basic JSON-Patch object indicating a removal operation
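The pointer resolution and operation semantics described above can be illustrated with a minimal Python sketch. It is not a complete RFC 6902 implementation: it covers only add, remove, and replace, and the sample `profile` document with its keys is hypothetical, chosen only to resemble an application profile.

```python
import copy


def parse_pointer(pointer):
    """Split an RFC 6901 JSON Pointer into its unescaped reference tokens."""
    if pointer == "":
        return []  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    # RFC 6901 escaping: '~1' stands for '/', '~0' stands for '~'.
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_operation(document, operation):
    """Return a copy of `document` with one add, remove, or replace applied."""
    kind = operation["op"]
    if kind not in ("add", "remove", "replace"):
        raise ValueError(f"operation not covered by this sketch: {kind}")
    tokens = parse_pointer(operation["path"])
    if not tokens:
        raise ValueError("whole-document targets are not covered by this sketch")
    result = copy.deepcopy(document)
    parent = result
    for token in tokens[:-1]:
        # A token is an array index inside lists and a key inside objects.
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        if kind == "add":
            # '-' addresses the position after the last array element.
            parent.insert(len(parent) if last == "-" else int(last), operation["value"])
        elif kind == "remove":
            del parent[int(last)]
        else:
            parent[int(last)] = operation["value"]
    elif kind == "add":
        parent[last] = operation["value"]
    elif kind == "remove":
        del parent[last]  # raises KeyError when the target does not exist
    else:
        if last not in parent:
            raise KeyError(last)  # replace requires an existing target
        parent[last] = operation["value"]
    return result


def apply_patch(document, patch):
    """Apply the operations of a patch in order."""
    for operation in patch:
        document = apply_operation(document, operation)
    return document


# Hypothetical profile fragment, for illustration only.
profile = {
    "statements": {
        "title": {"property": "dct:title", "max": 1},
        "note": {"property": "skos:note"},
    },
    "tags": ["a", "b"],
}
patch = [
    {"op": "remove", "path": "/statements/note"},
    {"op": "replace", "path": "/statements/title/max", "value": 2},
    {"op": "add", "path": "/tags/-", "value": "c"},
]
patched = apply_patch(profile, patch)
```

Each operation works on a deep copy, so the original `profile` stays available as the pre-change version while `patched` holds the post-change version.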

In theory, a YAMA document is a constrained YAML expression of a MAP that can be abstracted or converted into a valid JSON structure. The JSON Patch is applied to this JSON structure instead of the YAML document. To make the JSON Patch actionable for generating the pre-change or post-change versions of a MAP, the authors extended JSON Patch with a new optional element that is applicable only to remove and replace operations. Another additional element proposed in the context of an application profile is status, which can notate changes in status such as deprecated, proposed, reserved, and obsolete. An example changeset expressed within YAMA is shown in Fig. 5. The minimal mandatory metadata elements for a YAMA changeset are given in Table 1.

Fig. 5. Example of YAMA ChangeSet

Table 1. Metadata for the changeset
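To illustrate why remove and replace operations need extra information before a changeset can regenerate the pre-change version, the following Python sketch inverts a list of operations. The member name `previous` is an assumption made for this example and is not taken from the YAMA specification; the sample documents are likewise hypothetical, and the helper handles nested objects only.

```python
import copy

# Hypothetical name for the optional member that stores the value a
# remove or replace operation overwrote; not taken from YAMA itself.
PRIOR_KEY = "previous"


def invert_changeset(operations):
    """Build operations that undo `operations`, in reverse order."""
    inverse = []
    for op in reversed(operations):
        kind, path = op["op"], op["path"]
        if kind == "add":
            inverse.append({"op": "remove", "path": path, PRIOR_KEY: op["value"]})
        elif kind == "remove":
            # Without the stored prior value a removal cannot be undone.
            inverse.append({"op": "add", "path": path, "value": op[PRIOR_KEY]})
        elif kind == "replace":
            inverse.append({"op": "replace", "path": path,
                            "value": op[PRIOR_KEY], PRIOR_KEY: op["value"]})
        else:
            raise ValueError(f"operation not covered by this sketch: {kind}")
    return inverse


def apply_to_objects(document, operations):
    """Apply add, remove, and replace to nested objects (no arrays, no escapes)."""
    result = copy.deepcopy(document)
    for op in operations:
        *parents, last = op["path"].lstrip("/").split("/")
        node = result
        for key in parents:
            node = node[key]
        if op["op"] == "remove":
            del node[last]
        else:
            node[last] = op["value"]
    return result


# Hypothetical pre-change document and changeset.
before = {"statements": {"title": {"max": 1}, "note": {"min": 0}}}
changeset = [
    {"op": "replace", "path": "/statements/title/max", "value": 2, PRIOR_KEY: 1},
    {"op": "remove", "path": "/statements/note", PRIOR_KEY: {"min": 0}},
    {"op": "add", "path": "/statements/creator", "value": {"min": 1}},
]
after = apply_to_objects(before, changeset)
restored = apply_to_objects(after, invert_changeset(changeset))
```

Under these assumptions, applying the changeset yields the post-change document, and applying the inverted changeset to that result recovers the pre-change document, which is the sense in which such a record can act as a timeline.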

Along with changesets, YAMA is extended with an optional changelog section, which is a human-readable list of changes with minimal metadata. Changelogs are not meant to be actionable; they act as a structured collection of human-readable descriptions of changes, including changes that are intended to be documented but do not have any impact on the structure of the MAP document. This section can also serve as an alternative but meaningful textual representation of changes instead of the changeset. The changelog section is proposed for authors who prefer another means of change management, such as a version control system, or for authors with minimal technical expertise in creating an actionable JSON Patch. A schematic representation of YAMA’s provenance components and their outcomes is shown in Fig. 6.

Fig. 6. YAMA with actionable changesets and changelogs mapped to their expected outputs

3.2 PAV Ontology as a Means of MAP Provenance

The Provenance, Authoring and Versioning ontology (PAV) (footnote 5) [4] was developed as a lightweight ontology for notating the minimal information that is essential for documenting the provenance, authoring, and versioning of resources published on the web. PAV clearly distinguishes between contributors, authors, and publishers of digital resources. PAV is capable of representing the provenance of originating resources that have been accessed, transformed, and consumed.

PAV utilizes the W3C provenance ontology PROV-O in order to describe the authoring, publishing, and digital maintenance of online resources. PAV does not define any explicit classes, domains, or ranges; instead, every property is meant to be used directly in describing an online resource. This direct usage minimizes the effort required for expressing resources using an ontology. Being lightweight compared to PROV-O is the main reason for considering PAV as a means of expressing MAP resources [4].

There are vocabularies similar to PAV, such as Dublin Core Terms (DC Terms) [3], PROV-O [12], OPM [16], and Provenance Vocabulary [8]. Among these, PROV-O is the most suitable and has been considered in many previous studies to express MAP provenance. PROV-O serves as a generic framework for describing provenance in a wide range of applications. However, PROV-O alone may not be suitable for expressing the details needed for provenance that specifically involves authoring and versioning. PAV can be considered a specialization of PROV-O that offers more straightforward relationships for expressing common provenance for digital resources on the web [4]. PROV-O implements terms useful in tracing the origin of a resource, its derivations, and the relationships between these different resources. PROV-O is also capable of expressing the different entities that contributed to the resource. In short, PROV-O can be considered a general provenance data model that is extendable for domain-specific provenance information. For example, PROV-O does not distinguish between authors, editors, and contributors, which is a noticeable distinction in use cases such as collaborative MAP authoring and publishing based on public repositories such as GitHub.

Table 2. Subset of PAV authoring properties mapped to YAMA MAP metadata elements
Table 3. Subset of PAV provenance properties mapped to YAMA MAP metadata elements
Table 4. Subset of PAV versioning properties mapped to YAMA MAP metadata elements

A PAV-based framework is proposed in the context of MAP authoring and publishing with the following intentions:

  1. Identify the persons, organizations, or agents involved in the application profile development, and distinguish their roles as contributor or creator of the published MAP.

  2. Map MAP versions, releases, and updates by distinguishing between published and last modified dates.

  3. Track and distinguish the versions and source of the MAP, such as differentiating the provenance of the published versions of the application profile from that of the source repositories, version control systems, or authoring environment.

A detailed schematic of MAP versioning expressed with PAV is given in Fig. 7. Tables 2, 3, and 4 show a possible mapping of YAMA metadata elements to a subset of the PAV ontology.

Fig. 7. MAP publication is expressed in PAV ontology

4 Validation

To validate the proposal, a popular public application profile, the DCAT Application Profile for data portals in Europe (DCAT-AP), can be used. DCAT-AP is an application profile based on W3C’s Data Catalog Vocabulary (DCAT). DCAT-AP is intended for describing public sector datasets in Europe, to enable cross-portal search for open datasets and make them more discoverable. DCAT-AP is published on the Joinup portal (footnote 6), but the sources are maintained in a GitHub repository (footnote 7). The DCAT-AP repository does not use any authoring format or preprocessor but maintains and releases the MAP in individual expression formats. As a well-maintained MAP, the repository holds three different versions: v1.1, v1.2, and v1.2.1. The RDF expression of the MAP points to the previous version, but the whole version history is not mapped within the RDF [20]. A minimal expression of DCAT-AP provenance with PAV in RDF is demonstrated below.

[Listing: DCAT-AP provenance expressed with PAV in RDF]
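As a rough illustration of how the three DCAT-AP versions named above could be linked with PAV, the following Python sketch generates a small Turtle document. The `example.org` URIs are placeholders rather than the real DCAT-AP identifiers, and the use of `pav:derivedFrom` to point at the source repository is an assumption made here, not necessarily the property chosen by the authors. The properties `pav:version`, `pav:previousVersion`, `pav:hasVersion`, and `pav:hasCurrentVersion` are defined in the PAV namespace.

```python
PAV = "http://purl.org/pav/"

# Placeholder identifiers for illustration; not the real DCAT-AP URIs.
BASE = "https://example.org/dcat-ap/"
SOURCE = "https://example.org/source/dcat-ap"

# Versions held in the repository, oldest first, as listed in the text.
VERSIONS = ["1.1", "1.2", "1.2.1"]


def version_uri(base, version):
    """Return the Turtle IRI reference for one published version."""
    return f"<{base}{version}>"


def pav_turtle(base, source, versions):
    """Serialize a version chain with PAV properties as Turtle text."""
    blocks = [f"@prefix pav: <{PAV}> ."]
    listed = ", ".join(version_uri(base, v) for v in versions)
    # The unversioned resource lists every version, marks the latest one,
    # and points to its source (derivedFrom is an assumption, see above).
    blocks.append(
        f"<{base}>\n"
        f"    pav:hasVersion {listed} ;\n"
        f"    pav:hasCurrentVersion {version_uri(base, versions[-1])} ;\n"
        f"    pav:derivedFrom <{source}> ."
    )
    previous = None
    for version in versions:
        statements = [f'pav:version "{version}"']
        if previous is not None:
            statements.append(f"pav:previousVersion {version_uri(base, previous)}")
        blocks.append(
            f"{version_uri(base, version)}\n    " + " ;\n    ".join(statements) + " ."
        )
        previous = version
    return "\n\n".join(blocks) + "\n"


turtle = pav_turtle(BASE, SOURCE, VERSIONS)
```

Printing `turtle` gives one block for the unversioned profile and one block per version, with each version after the first pointing to its predecessor, so the whole version chain can be followed rather than only the most recent step.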

5 Limitations and Future Work

As an authoring format, YAMA can be extended to include actionable changesets and a parsable changelog. The PAV ontology can be used to point to the source of the MAP, in which the YAMA changeset can be exposed as the timeline of the application profile. The main limitation of this approach is its inability to point to a standard format for the actionable changeset. A processor or system capable of understanding YAMA’s YAML format as well as JSON Patch is required to parse the changeset and develop the timeline of the application profile from it. It is therefore recommended that authors or publishers make the effort to expose the changesets in other standard actionable formats as well. Even though YAML and JSON Patch are comparatively simple concepts for structured data, they require authors to have the skills to deal with these formats. In particular, these formats need to be generated or modified using a ‘real text editor’, as there is not yet any known dedicated graphical editor implementation for YAMA.

The PAV ontology is capable of pointing to versions and sources of application profiles. The authors made this recommendation purely on the notion that MAPs are published as a package of expression formats and documentation. PAV is not directly usable for differentiating these formats within the application profile package or even for pointing to an individual format. For example, PAV may not be sufficient for distinguishing and pointing to the individual files representing the human-readable documentation or machine-actionable expressions such as RDF and ShEx. Also, the PAV mapping needs to be implemented in the templates or generators used to produce expression formats from YAMA. Web pages linked to the application profiles need to use RDFa or JSON-LD to include the ontology when expressing the versions and source with PAV.

Future work is to adopt ontologies to cover YAMA changesets, with the capability of describing changes in an actionable and semantic way. Notating the relations between individual expression formats inside the publishable application profile package is also being investigated.

6 Conclusion

Providing a simplified authoring format can substantially promote application profile creation efforts. Utilizing the extensibility of this authoring format to include an actionable changelog as the timeline of MAP creation can help in ensuring longevity. The authors have attempted to explain the possibility of extending a previously proposed extensible authoring format for application profiles with an advanced changeset. This paper also demonstrates the adoption of a lightweight ontology to notate the versioning of these application profiles while distinguishing their source from the published expressions. Any attempt to ensure the provenance and longevity of the metadata application profile will also help to ensure the provenance of the schema. Schema maintenance will help to achieve better data interoperability and seamless linking of data with automated techniques.