1 Introduction

Documents are central to scholarly communication. Virtually all research findings are now communicated as electronic scholarly articles. Scholarly knowledge communicated in this form is barely accessible to computers, and machine support is largely limited to traditional full-text search. As such, the current scholarly infrastructure does not exploit modern information systems and technologies to their full potential [6].

We argue that there is an urgent need for a more flexible, fine-grained, context-sensitive representation of scholarly knowledge and, correspondingly, for infrastructure that supports knowledge curation, publishing, and processing. Furthermore, we suggest that representing scholarly knowledge as structured, interlinked, and semantically rich knowledge graphs is a key element of such a technical infrastructure [3].

While some important conceptual foundations have been developed over several decades [1, 6], knowledge graph infrastructure for science has recently gained momentum in the literature and community. The Research Graph [2] is a prominent example of an effort that aims to link publications, datasets, and researchers. The Scholix project [4] standardized the information about the links between scholarly literature and data exchanged among (primarily) publishers and data repositories. More recently, the FREYA H2020 project has released information on their work towards a PID Graph [5]. The key distinguishing factor between these systems and the ORKG is the granularity of captured scholarly knowledge (article bibliographic metadata vs. materials, methods, and results communicated in articles).

2 Architecture and Features

The ORKG leverages knowledge graph technologies to represent, store, link, and process scholarly knowledge. It has two main components: the back end, which contains the logic to handle requests by client applications, and the front end, through which users create, curate, or explore scholarly knowledge.

The concept of ResearchContribution is central to the ORKG as it represents key aspects of scholarly knowledge in structured, machine-actionable form. A ResearchContribution is an information object that relates the ResearchProblem addressed by the contribution to a ResearchMethod and at least one ResearchResult.
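The relation described above can be sketched as a small data structure. This is a minimal illustration, not the ORKG schema: the class layout, the field names, and the placeholder result string are assumptions, while the problem and method values are taken from the use case in Sect. 3.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResearchContribution:
    """Hypothetical in-memory model; field names are assumptions."""

    problem: str              # the ResearchProblem addressed
    method: str               # the ResearchMethod used
    results: Tuple[str, ...]  # one or more ResearchResult entries

    def __post_init__(self):
        # The text requires at least one ResearchResult.
        if not self.results:
            raise ValueError("a ResearchContribution needs at least one ResearchResult")


contribution = ResearchContribution(
    problem="Collaborative question answering",
    method="Generate optimal QA pipelines",
    results=("placeholder result",),  # placeholder, not taken from the paper
)
print(contribution.problem)
```

Rejecting an empty `results` tuple at construction time is one way to enforce the "at least one ResearchResult" constraint; a real back end could equally enforce it at the API or storage layer.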

Fig. 1. The ORKG architecture showing the main infrastructure components.

The ORKG back end represents descriptions by means of a graph data model. Similarly to the Resource Description Framework (RDF), the data model is centered around the concept of a statement, a triple consisting of two nodes (resources) connected by a directed edge. In contrast to RDF, it allows annotating edges and statements. Provenance information, e.g., when and by whom a statement was created, is a concrete and relevant application of such annotations as statement metadata.
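To make the idea of annotated statements concrete, the following sketch models a statement as a triple that carries provenance metadata. It is an illustration under stated assumptions: the `Statement` class, its field names, the user names, and the timestamps are hypothetical and do not reflect the actual ORKG back end.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Statement:
    """A triple plus provenance annotations (hypothetical layout)."""

    subject: str
    predicate: str
    object: str
    created_by: str       # provenance: who created the statement
    created_at: datetime  # provenance: when it was created


def statements_by(statements, user):
    """Return the statements created by the given user."""
    return [s for s in statements if s.created_by == user]


# User names and timestamps below are made up for illustration.
statements = [
    Statement("FRANKENSTEIN", "programming language", "Java",
              "alice", datetime(2019, 1, 1, tzinfo=timezone.utc)),
    Statement("FRANKENSTEIN", "programming language", "Python",
              "bob", datetime(2019, 1, 2, tzinfo=timezone.utc)),
]

for s in statements_by(statements, "alice"):
    print(s.subject, s.predicate, s.object, s.created_at.isoformat())
```

Keeping provenance on the statement itself, rather than on the nodes, means that two users can assert different facts about the same resource and each assertion remains attributable.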

ORKG users interact with the front end (UI), which guides them through the process of creating research contribution descriptions in a step-by-step manner. More advanced features of the infrastructure include the ability to directly find similar contributions (and related papers), thus enabling efficient state-of-the-art comparison and literature review. Figure 1 depicts the ORKG system architecture.

Fig. 2. ORKG UI curation wizard step (3) depicting the auto-completion feature that enables linking existing resources (here, Java).

3 Use Case

Consider the following research contribution: FRANKENSTEIN [8] is a collaborative question answering (QA) framework written in Java and Python. It generates QA pipelines based on predictions for the best performing pipelines obtained via a supervised learning model. FRANKENSTEIN evaluates the results against QALD and LC-Quad datasets using the f1-score and accuracy@k metrics. We can identify the following instances of relevant concepts:

  • Problem: Collaborative question answering

  • Programming Language: Python, Java

  • Approach: Generate optimal QA pipelines

  • Datasets: QALD, LC-Quad

  • Evaluation Metrics: f1-score, accuracy@k.
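The concepts listed above can be written down as a property-to-values mapping and flattened into triples. This is only a sketch: the dictionary layout, the use of the string "FRANKENSTEIN" as subject identifier, and the helper `to_triples` are assumptions for illustration, not the ORKG representation.

```python
# Property -> values, copied from the list above.
description = {
    "Problem": ["Collaborative question answering"],
    "Programming Language": ["Python", "Java"],
    "Approach": ["Generate optimal QA pipelines"],
    "Datasets": ["QALD", "LC-Quad"],
    "Evaluation Metrics": ["f1-score", "accuracy@k"],
}


def to_triples(subject, description):
    """Flatten a property -> values mapping into (subject, property, value) triples."""
    return [
        (subject, prop, value)
        for prop, values in description.items()
        for value in values
    ]


triples = to_triples("FRANKENSTEIN", description)
for triple in triples:
    print(triple)
```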

Using the “Add paper” wizard (Fig. 2), we can create structured descriptions that encode, in a machine-actionable manner, the key information of research contributions. This process is also straightforward for non-technical users. First, bibliographic metadata is collected, either via DOI lookup using the Crossref API or manually. Second, users can classify their paper according to the research domain. Finally, the research contributions described in the paper are collected using a flexible and dynamic interface.
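The first wizard step, DOI lookup, can be sketched against the public Crossref REST API, which serves work metadata at `https://api.crossref.org/works/{doi}` and wraps the record in a `message` object. How the ORKG front end actually performs the lookup is not described here, so the function names and the selection of fields below are assumptions, and the sample payload uses a made-up DOI and placeholder values so that the parsing logic can be run offline.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the Crossref lookup URL for a DOI."""
    return CROSSREF_WORKS + quote(doi)


def parse_work(payload):
    """Extract DOI, title, and author names from a Crossref works response."""
    message = payload["message"]
    titles = message.get("title") or [""]
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in message.get("author", [])
    ]
    return {"doi": message.get("DOI"), "title": titles[0], "authors": authors}


def fetch_metadata(doi):
    """Perform the network lookup (not called in this sketch)."""
    with urlopen(crossref_url(doi)) as response:
        return parse_work(json.load(response))


# Placeholder payload with a made-up DOI, shaped like a Crossref response.
sample = {
    "message": {
        "DOI": "10.0000/example",
        "title": ["An Example Title"],
        "author": [{"given": "Ada", "family": "Lovelace"}],
    }
}
metadata = parse_work(sample)
print(metadata)
```

Separating `parse_work` from `fetch_metadata` keeps the parsing testable without network access; manual entry, the alternative mentioned above, would simply produce the same dictionary by hand.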

4 Conclusion and Future Work

We presented the Open Research Knowledge Graph, an infrastructure that takes the first steps of a larger research and development agenda that aims to transition document-based scholarly communication to a knowledge-based information representation. In future work, we will include additional techniques that provide machine support for content creation and curation (such as NLP tools to suggest or annotate relevant concepts on behalf of users). Furthermore, we will further develop novel features such as state-of-the-art comparisons (Fig. 3). Such features will underscore the possibilities enabled by machine-actionable scholarly knowledge and corresponding infrastructure.

Fig. 3. ORKG UI state-of-the-art comparison for research contributions, showing a subset of shared properties between two articles.
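A comparison over shared properties, as depicted in Fig. 3, can be approximated by intersecting the property sets of two contribution descriptions. The sketch below is an assumption about how such a table could be derived, not the ORKG implementation: the `compare` helper is hypothetical, and the second contribution is invented purely for illustration.

```python
def compare(a, b):
    """Return {property: (values_in_a, values_in_b)} for properties both share."""
    return {prop: (a[prop], b[prop]) for prop in a if prop in b}


# Values from the use case in Sect. 3.
frankenstein = {
    "Problem": ["Collaborative question answering"],
    "Programming Language": ["Python", "Java"],
    "Approach": ["Generate optimal QA pipelines"],
    "Datasets": ["QALD", "LC-Quad"],
    "Evaluation Metrics": ["f1-score", "accuracy@k"],
}

# Invented second contribution, for illustration only.
other = {
    "Problem": ["Question answering"],
    "Datasets": ["QALD"],
    "Evaluation Metrics": ["f1-score"],
}

table = compare(frankenstein, other)
for prop, (left, right) in table.items():
    print(f"{prop}: {', '.join(left)} | {', '.join(right)}")
```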