1 Introduction

Knowledge graphs have gained increasing attention in recent years, particularly in settings that involve integrating and making sense of diverse data at large scale. Much of this attention stems from the 2012 announcement of the Google Knowledge Graph [63], which was later followed by further announcements of knowledge graphs by eBay, Facebook, IBM, Microsoft [53], and many more besides [35]. These enterprise knowledge graphs are typically internal to a company, where they aim to serve – per the words of Noy et al. [53] – as a common substrate of knowledge within the organisation, expanding and evolving over time. Other open knowledge graphs – such as BabelNet [49], DBpedia [44], Freebase [14], Wikidata [66], and YAGO [33] – are made available to the public.

On a more technical level, there are varying perspectives on how knowledge graphs should be formally defined (if at all) [35]. This stems from the diversity of communities and organisations that have adopted the phrase. However, underlying all such perspectives is the foundational idea of representing knowledge using a graph abstraction, with nodes representing entities of interest in a given domain, and edges representing relations between those entities. Typically a knowledge graph will model different types of relations, which can be captured by labelled edges, with a different label used for each type of relation. Knowledge can consist of simple assertions, such as Charon orbits Pluto, which can be represented as directed labelled edges in a graph. Knowledge may also consist of quantified assertions, which require a more expressive formalism to capture, such as an ontology or rule language.

A reader familiar with related concepts may then question: what is the difference between a graph database and a knowledge graph, or an ontology and a knowledge graph? Per the previous definition, both graph databases and ontologies can be considered knowledge graphs. It is then natural to ask: what, then, is new about knowledge graphs? What distinguishes research on knowledge graphs from research on graph databases, or ontologies, etc.?

If we so wish, we can choose to see the “vagueness” surrounding knowledge graphs as a feature, not a bug: this vagueness offers flexibility for different communities to adapt the concept to their interests. No single research community can “lay claim” to the topic. Knowledge graphs can then become a commons for researchers from different areas to share and exchange techniques relating to the use of graphs to represent data and knowledge. In any case, most of the different choices – what graph model to use, what graph query language to use, whether to use rules or ontologies – end up being quite superficial.

Anecdotally, many of the most influential papers on knowledge graphs have emerged from the machine learning community, particularly relating to two main techniques: knowledge graph embeddings [67] and graph neural networks [70] (which we will discuss later). These works are not particularly related to graph databases (they do not consider regular expressions on paths, graph pattern matching, etc.) nor are they particularly related to ontologies (they do not consider interpretations, entailments, etc.), but by exploiting knowledge in the context of large-scale graph-structured data, they are related to knowledge graphs. Subsequently, one can find relations between these ostensibly orthogonal techniques; for example, while graph neural networks are used to inductively classify nodes in the graph based on numerical methods, ontologies can be used to deductively classify nodes in the graph based on symbolic methods, which raises natural questions about how they might compare.

Knowledge graphs, if we so choose, can be seen as an opportunity for researchers to bring together traditionally disparate techniques – relating to the use of graphs to represent and exploit knowledge – from different communities of Computer Science: to share and discuss them, to compare and contrast them, to unify and hybridise them. The goal of these notes will be to introduce examples of some of the principal directions along which such research may proceed.

Recently we have published a tutorial paper online that offers a much more extensive (and generally less technical, more example-based) introduction to knowledge graphs than would be possible here, to which we refer readers new to the topic [35]. In the interest of keeping the current notes self-contained, and in order to establish notation, we will re-introduce some of the key concepts underlying knowledge graphs. However, our focus herein is to introduce specific open questions that we think may become of interest for knowledge graphs in the coming years. Specifically, we introduce six main concepts underlying knowledge graphs: data models, queries, ontologies, rules, embeddings and graph neural networks. Most of the questions we introduce intersect these topics.

2 Data Model

Knowledge graphs assume a graph-structured data model. The high-level benefits of modelling data as graphs are as follows:

  • Graphs offer a more intuitive abstraction of certain domains than alternative data models; for example, metro maps, flight routes, social networks, protein pathways, etc., are often visualised as graphs.

  • When compared with (normalised) relational schemas, graphs offer a simpler and more flexible way to represent diverse, incomplete, evolving data, which can be defined independently of a particular schema.

  • Graphs from different sources can be initially combined by taking the union of the constituent elements (nodes, edges, etc.).

  • Unlike other semi-structured data models based on trees, graphs allow for representing and working with cyclic relations and do not assume a hierarchy.

  • Graphs offer a more intuitive representation for querying and working with paths that represent indirect relations between entities.

A number of graph data models have become popular in practice, principally directed edge-labelled graphs, and property graphs [4]. The simpler of the two models is that of directed edge-labelled graphs, or simply del graphs henceforth. To build a del graph we will assume a set of constants \(\mathbf {C} \).

Definition 1 (Directed edge-labelled graph)

A directed edge-labelled graph (or del graph) is a set of triples \(G \subseteq \mathbf {C} \times \mathbf {C} \times \mathbf {C} \).

We provide an example of a del graph in Fig. 1, describing various astronomic bodies. The data are incomplete, but more data can be added incrementally by adding new nodes and edges. Each triple \((s,p,o) \in G\) denotes a directed labelled edge of the form \(s \xrightarrow {p} o\). Nodes in the graph denote entities of interest, and edges between them denote (binary) relations. We see both directed and undirected cycles formed from the relationships present between the entities.

Fig. 1. Del graph describing astronomic bodies
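Concretely, a del graph can be represented in code as a set of triples; the following sketch (a hypothetical fragment loosely inspired by Fig. 1) illustrates how incomplete data can be extended incrementally:

```python
# A del graph is just a set of (subject, predicate, object) triples.
# The constants below form a hypothetical fragment inspired by Fig. 1.
G = {
    ("Charon", "orbits", "Pluto"),
    ("Pluto", "orbits", "Sun"),
    ("Earth", "orbits", "Sun"),
    ("Luna", "orbits", "Earth"),
    ("Pluto", "type", "Dwarf Planet"),
}

# Incomplete data can be extended incrementally by adding triples:
G.add(("Sun", "type", "G-type Star"))

# A simple assertion corresponds to a membership check:
assert ("Charon", "orbits", "Pluto") in G
```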

The property graph model is more ornate, and allows for adding labels and property–value pairs on both nodes and edges [4].

Definition 2 (Property graph)

A property graph G is a tuple \(G = (V,E,L,P,U,e,l,p)\),

where \(V \subseteq \mathbf {C} \) is a set of node ids, \(E \subseteq \mathbf {C} \) is a set of edge ids, \(L \subseteq \mathbf {C} \) is a set of labels, \(P \subseteq \mathbf {C} \) is a set of properties, \(U \subseteq \mathbf {C} \) is a set of values, \(e : E \rightarrow V \times V\) maps an edge id to a pair of node ids, \(l : V \cup E \rightarrow 2^L\) maps a node or edge id to a set of labels, and \(p : V \cup E \rightarrow 2^{P \times U}\) maps a node or edge id to a set of property–value pairs.

In Fig. 2 we provide an example of a property graph alongside a del graph representing analogous information.

While del graphs form the basis of the RDF data model [18], property graphs are used in a variety of popular graph database systems, such as Neo4j [48]. Comparing the two models, del graphs are simpler, while property graphs are more flexible, particularly in terms of the ability to add property–value pairs to edges. However, property graphs can be represented as del graphs (and vice versa), as illustrated in Fig. 2: edges with property–value pairs in the property graph can be converted to nodes in the del graph, as seen for Pb orbit, while edges without property–value pairs can be represented directly as edges in the del graph.
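One possible such conversion can be sketched as follows, promoting an edge id to a node whose outgoing edges carry the label and the property–value pairs. This is only one of several conventions, and the edge id e1 and property period used below are hypothetical (the exact convention in Fig. 2 may differ):

```python
# Convert a property-graph edge (with property-value pairs) into del
# triples by promoting the edge id to a node. One of several possible
# conventions; names here are illustrative.
def edge_to_del(edge_id, src, label, tgt, props):
    triples = {
        (src, label, edge_id),   # source node to the new edge-node
        (edge_id, label, tgt),   # edge-node to the target node
    }
    for prop, value in props.items():
        triples.add((edge_id, prop, value))  # property-value pairs become edges
    return triples

# The "Pb orbit" edge of Fig. 2, with a hypothetical period property:
triples = edge_to_del("e1", "Pb", "orbit", "Proxima Centauri",
                      {"period": "11.2 days"})
```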

Fig. 2. Del graph (above) and property graph (below) with analogous information describing a planet orbiting Proxima Centauri

We identify two particular topics relating to modelling data as graphs.

Fig. 3. Del graph pattern looking for trinary star systems related to Sol

Henceforth we will focus on del graphs, as the simpler of the two models.

3 Queries

We may wish to pose questions over the data represented by the graph. While a range of query languages have been proposed for querying graphs – including SPARQL for RDF graphs [29]; Cypher [23], Gremlin [56], and G-CORE [3] for property graphs – the same foundational elements underlie all such languages: graph patterns, relational algebra, and path expressions [4]. Herein we will assume data represented as a del graph, though the concepts we present generalise quite straightforwardly to other graph models, as appropriate [35].

At the core of graph queries is the notion of graph patterns. The most basic form of graph pattern is a graph that additionally allows variables to replace constants in any position. We refer to the set of variables as \(\mathbf {V} \). We refer to the union of constants \(\mathbf {C} \) and variables \(\mathbf {V} \) as terms, denoted \(\mathbf {T} \) (i.e., \(\mathbf {T} = \mathbf {C} \cup \mathbf {V} \)).

Definition 3 (Directed edge-labelled graph pattern)

A directed edge-labelled graph pattern (or del graph pattern) is a set of triples \(Q \subseteq \mathbf {T} \times \mathbf {T} \times \mathbf {T} \).

We provide an example in Fig. 3 of a del graph pattern looking for trinary star systems related (in some way) to Sol. Variables are prefixed with “?”. A graph pattern is then evaluated over a data graph by mapping the variables of the graph pattern to constants such that the image of the graph pattern under the mapping is a sub-graph of the data graph. More formally, let \(\mu : \mathbf {V} \rightarrow \mathbf {C} \) denote a partial mapping from variables to constants, and let \(\mathrm {dom} (\mu )\) denote the set of variables for which \(\mu \) is defined. Further let \(\mathbf {V} (Q)\) denote the set of variables used in Q. We can then define the evaluation of Q over G as:

Definition 4 (Evaluation of a del graph pattern)

Let Q be a del graph pattern and let G be a del graph. We then define the evaluation of the graph pattern Q over G as \(Q(G) = \{ \mu \mid \mathrm {dom} (\mu ) = \mathbf {V} (Q) \text { and } \mu (Q) \subseteq G \}\).

Graph patterns of this form can be seen as transforming graphs into tables. In Table 1 we present the results of evaluating the del graph pattern of Fig. 3 over the del graph of Fig. 1, where each row indicates a solution given by a mapping \(\mu \) per the previous definition. Graph patterns can be evaluated under different semantics by placing further restrictions on \(\mu \). Without further restrictions, a homomorphism-based semantics is considered where two or more variables in an individual solution can be mapped to a single constant in the data. On the other hand, if we restrict \(\mu \) to be one-to-one, a (sub-graph) isomorphism-based semantics is considered, where each variable in an individual solution must map to a different constant. All of the solutions shown in Table 1 are returned under the unrestricted homomorphism-based semantics, while only the first solution is returned under the restricted isomorphism-based semantics.
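Definition 4 can be implemented naively by enumerating candidate mappings, as in the following sketch (variables are marked with a leading "?"; the data are hypothetical):

```python
from itertools import product

def is_var(term):
    return term.startswith("?")

def evaluate(Q, G):
    """Brute-force evaluation of a del graph pattern Q over a del graph G
    under homomorphism-based semantics (Definition 4)."""
    variables = sorted({t for triple in Q for t in triple if is_var(t)})
    constants = sorted({t for triple in G for t in triple})
    solutions = []
    for values in product(constants, repeat=len(variables)):
        mu = dict(zip(variables, values))
        image = {tuple(mu.get(t, t) for t in triple) for triple in Q}
        if image <= G:  # the image of Q under mu is a sub-graph of G
            solutions.append(mu)
    return solutions

G = {("Earth", "orbits", "Sun"), ("Pluto", "orbits", "Sun"),
     ("Luna", "orbits", "Earth")}
Q = {("?x", "orbits", "Sun")}
solutions = evaluate(Q, G)   # mappings for ?x: Earth and Pluto
```

Additionally restricting each mapping to be one-to-one (e.g., checking `len(set(mu.values())) == len(mu)`) would instead yield the isomorphism-based semantics.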

Given that graph patterns return tables, a natural idea is to introduce the relational algebra to allow the solutions of a graph pattern to be transformed, and the solutions of several graph patterns to be combined: using \(\pi \) to project variables or \(\sigma \) to select rows returned from a single graph pattern, or using \(\cup \), −, or \(\bowtie \) to apply a union, difference, or join (respectively) across the results of two graph patterns. We can also introduce further syntactic operators, such as left-joins (aka optionals), anti-joins (aka not exists), etc. As a simple example, given the graph pattern Q of Fig. 3, we may define \(\pi _{\textsf {?tstar}}(\sigma _{\textsf {?bstar1} \ne \textsf {?bstar2}}(Q))\) to select only solutions where ?bstar1 and ?bstar2 map to different constants, and subsequently project only the results for the ?tstar variable. Graph patterns extended with the relational algebra are called complex graph patterns [3]. Graph patterns extended with projection alone are called conjunctive queries.
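Over solution tables represented as lists of mappings, the \(\sigma \) and \(\pi \) operators can be sketched as follows (the sample solutions are hypothetical, loosely echoing the trinary-star example):

```python
# sigma: filter rows of a solution table; pi: project variables.
def select(condition, solutions):
    return [mu for mu in solutions if condition(mu)]

def project(variables, solutions):
    return [{v: mu[v] for v in variables} for mu in solutions]

solutions = [  # hypothetical solutions for the pattern Q of Fig. 3
    {"?tstar": "Alpha Centauri", "?bstar1": "Rigil Kentaurus", "?bstar2": "Toliman"},
    {"?tstar": "Alpha Centauri", "?bstar1": "Toliman", "?bstar2": "Toliman"},
]

# pi_{?tstar}(sigma_{?bstar1 != ?bstar2}(Q)):
distinct = select(lambda mu: mu["?bstar1"] != mu["?bstar2"], solutions)
result = project(["?tstar"], distinct)
```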

Table 1. Evaluation of the del graph pattern of Fig. 3 over the del graph of Fig. 1 under homomorphism-based semantics

The features we have seen thus far can only match bounded sub-graphs of the data graph (more formally, we can say that they are Gaifman-local [45]). However, we may be interested to find pairs of nodes that are connected by paths of arbitrary lengths. In order to express such queries, we can introduce path expressions and regular path queries.

Definition 5 (Path expression)

A constant \(c \in \mathbf {C} \) is a path expression. Furthermore:

  • If r is a path expression, then \(r^-\) (inverse) and \(r^*\) (Kleene star) are path expressions.

  • If \(r_1\) and \(r_2\) are path expressions, then \(r_1 \cdot r_2\) (concatenation) and \(r_1 \mid r_2\) (disjunction) are path expressions.

In the following we assume that the evaluation of a path expression returns pairs of nodes connected by some path that satisfies the path expression [29]. Other query languages may support returning paths [3, 23]. We will further introduce the notation \(N_G\) to denote the nodes of G; more formally, we define the nodes of G as \(N_G = \{ x \mid \exists p, y : (x,p,y) \in G \text { or } (y,p,x) \in G \}\).

Definition 6 (Path expression evaluation)

Given a del graph G and a path expression r, we define the evaluation of r over G, denoted r[G], as follows:

$$\begin{aligned} c[G]&= \{(s,o) \mid (s,c,o) \in G\} \\ r^-[G]&= \{(s,o) \mid (o,s) \in r[G]\} \\ (r_1 \mid r_2)[G]&= r_1[G] \cup r_2[G] \\ (r_1 \cdot r_2)[G]&= \{(s,o) \mid \exists x : (s,x) \in r_1[G] \text { and } (x,o) \in r_2[G]\} \\ r^*[G]&= \{(x,x) \mid x \in N_G\} \cup \bigcup _{n \ge 1} r^n[G] \end{aligned}$$

where by \(r^n\) we denote the nth-concatenation of r (e.g., \(r^3 = r \cdot r \cdot r\)).
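These evaluation rules can be implemented directly, computing the Kleene star as a fixpoint over the nodes of G. A sketch follows, encoding path expressions as nested tuples (the encoding itself is a hypothetical convention):

```python
def nodes(G):
    """The nodes N_G of a del graph G."""
    return {s for s, _, _ in G} | {o for _, _, o in G}

def ev(r, G):
    """Evaluate a path expression r over a del graph G (Definition 6),
    returning the pairs of nodes connected by a matching path.
    Expressions are nested tuples, e.g. ("*", ("inv", "orbits"))."""
    if isinstance(r, str):          # a constant c
        return {(s, o) for s, p, o in G if p == r}
    op = r[0]
    if op == "inv":                 # r^- (inverse)
        return {(o, s) for s, o in ev(r[1], G)}
    if op == "alt":                 # r1 | r2 (disjunction)
        return ev(r[1], G) | ev(r[2], G)
    if op == "seq":                 # r1 . r2 (concatenation)
        r1, r2 = ev(r[1], G), ev(r[2], G)
        return {(s, o) for s, x in r1 for y, o in r2 if x == y}
    if op == "*":                   # r* (Kleene star), via a fixpoint
        result = {(n, n) for n in nodes(G)}
        step = ev(r[1], G)
        while True:
            new = result | {(s, o) for s, x in result
                            for y, o in step if x == y}
            if new == result:
                return result
            result = new

G = {("Luna", "orbits", "Earth"), ("Earth", "orbits", "Sun")}
pairs = ev(("*", "orbits"), G)   # contains e.g. ("Luna", "Sun")
```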

Let \(\mathbf {R} \) denote the set of all path expressions. We next define a regular path query and a navigational graph pattern [4].

Definition 7 (Regular path query)

A regular path query is a triple \((x,r,y) \in \mathbf {T} \times (\mathbf {R} \cup \mathbf {V}) \times \mathbf {T} \).

Definition 8 (Navigational graph pattern)

A navigational graph pattern Q is a set of regular path queries.

We provide an example of a navigational graph pattern in Fig. 4, searching for planemos in the Milky Way, along with the star(s) they orbit. The evaluation of a navigational graph pattern is then defined analogously to the evaluation of graph patterns, but where pairs of nodes connected by arbitrary-length paths – rather than simply edges – can now be matched. The navigational graph pattern of Fig. 4 will return four results, with ?star mapped to Sun in each, and ?planemo mapped to Earth, Pluto, Charon and Luna.

Fig. 4. Navigational graph pattern looking for planemos in the Milky Way and the stars that they orbit

Finally, noting that navigational graph patterns again transform graphs into tables, it is natural to define complex navigational graph patterns, which support applying relational algebra over the results of one or more navigational graph patterns. Navigational graph patterns extended with (only) projection are known as conjunctive two-way regular path queries (C2RPQs) [9].

The task of finding solutions for a query over a graph is known as query answering. While query answering has been well studied in the literature [4, 9], there are a number of interesting problems that arise when considering the evaluation of (complex) navigational graph patterns. We now discuss two topics.


4 Ontologies

Looking more closely at the graph of Fig. 1, we may be able to deduce additional knowledge beyond what is explicitly stated in the graph. For example, we might conclude that Charon and Luna are instances of Moon; that Toliman and Rigil Kentaurus are instances of Binary Stars; that Luna, Earth, Sun, etc., are all part of the Milky Way, etc. In order to be able to draw such conclusions automatically from a graph, we require a formal representation of the semantics of the terms used in the graph. While there are a number of alternatives that we might consider for this task, we initially focus on ontologies, which have long been studied in the context of defining semantics over graphs.

Ontologies centre around three main types of elements: individuals, concepts (aka classes) and roles (aka properties). They allow for making formal, well-defined claims about such elements, which in turn opens up the possibility to perform automated reasoning. In terms of the semantics of ontologies, there are many well-defined syntaxes we can potentially use, but perhaps the most established formalism is taken from Description Logics (DL) [8, 41, 57], which studies logics centring around (unary and) binary relations, and logical primitives for which reasoning tasks remain decidable.

Table 2 defines the typical DL constructs one can find in the literature. The syntax column denotes how each construct is expressed in DL. A DL ontology then consists of an A-Box (assertional box), a T-Box (terminological box), and an R-Box (role box), each of which consists of a set of axioms that describe formal claims about individuals, concepts and roles, respectively.

Table 2. Description Logic syntax and semantics

Definition 9 (DL ontology)

A DL ontology \(\mathsf {O}\) is defined as a tuple \((\mathsf {A},\mathsf {T},\mathsf {R})\), where \(\mathsf {A}\) is the A-Box: a set of assertional axioms; \(\mathsf {T}\) is the T-Box: a set of concept (aka class/terminological) axioms; and \(\mathsf {R}\) is the R-Box: a set of role (aka property/relation) axioms.

Syntactically, DL ontologies can be serialised as (del) graphs. Such a serialisation is defined by the Web Ontology Language (OWL) [31], which draws inspiration from the DL area towards standardising languages for which common reasoning tasks are decidable (or even tractable), and which defines serialisations of OWL ontologies as RDF graphs. In this context, looking at Fig. 1, for example, an edge stating that Planet is a subclass of Planemo can be read as a concept axiom \(\texttt {Planet} \sqsubseteq \texttt {Planemo}\), an edge stating that Pluto has child Charon can be read as a role assertion \(\texttt {child}(\texttt {Pluto},\texttt {Charon})\), while an edge stating that Charon has type Planemo can be read as a concept assertion \(\texttt {Planemo}(\texttt {Charon})\).

Regarding the formal meaning of these axioms, the semantics column of Table 2 defines axioms using interpretations.

Definition 10 (DL interpretation)

A DL interpretation I is defined as a pair \((\varDelta ^I,\cdot ^I)\), where \(\varDelta ^I\) is the interpretation domain, and \(\cdot ^I\) is the interpretation function. The interpretation domain is a set of individuals. The interpretation function maps an individual a, a concept C, or a role R, respectively, to an element of the domain (\(a^I \in \varDelta ^I\)), a subset of the domain (\(C^I \subseteq \varDelta ^I\)), or a set of pairs from the domain (\(R^I \subseteq \varDelta ^I\times \varDelta ^I\)).

An interpretation I satisfies an ontology \(\mathsf {O}\) if and only if, for all axioms in \(\mathsf {O}\), the corresponding semantic conditions in Table 2 hold for I. In this case, we call I a model of \(\mathsf {O}\). This notion of a model gives rise to the key notion of entailment.
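For finite interpretations, checking these semantic conditions is mechanical. The following sketch (with a small hypothetical interpretation, unrelated to the example below) checks a concept inclusion and a concept assertion against their set-theoretic semantics:

```python
# A finite DL interpretation: concepts as sets of domain elements,
# roles as sets of pairs. The data here are hypothetical.
interpretation = {
    "concepts": {
        "Planet":  {"earth", "mars"},
        "Planemo": {"earth", "mars", "luna"},
    },
    "roles": {
        "orbits": {("luna", "earth"), ("earth", "sun")},
    },
}

def satisfies_inclusion(I, c, d):
    """A concept inclusion C ⊑ D is satisfied iff C^I ⊆ D^I."""
    return I["concepts"][c] <= I["concepts"][d]

def satisfies_assertion(I, c, a):
    """A concept assertion C(a) is satisfied iff a^I ∈ C^I
    (each name interpreted as itself here)."""
    return a in I["concepts"][c]
```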

Definition 11 (Entailment)

Given two DL ontologies \(\mathsf {O}_1\) and \(\mathsf {O}_2\), we define that \(\mathsf {O}_1\) entails \(\mathsf {O}_2\), denoted \(\mathsf {O}_1 \models \mathsf {O}_2\), if and only if every model of \(\mathsf {O}_1\) is a model of \(\mathsf {O}_2\).

As an example of a DL ontology, for \(\mathsf {O} = (\mathsf {A},\mathsf {T},\mathsf {R})\), let:

  • ;

  • ;

  • .

For \(I = (\varDelta ^I,\cdot ^I)\), let:

  • ;

  • , , , ;

  • , ;

  • , .

The interpretation I is not a model of \(\mathsf {O}\) since it does not have that is an instance of Moon, nor that orbits , nor that orbits . However, if we extend the interpretation I with the following:

  • ;

  • .

then the interpretation I will satisfy – i.e., will be a model of – the ontology \(\mathsf {O}\). Notably, even though the ontology \(\mathsf {O}\) does not entail that \(\texttt {Planemo(Pluto)}\), it is still the case that I satisfies \(\mathsf {O}\); this is often referred to as the Open World Assumption, whereby ontologies are not assumed to completely describe the world, but rather are assumed to be incomplete. Now given the ontology:

we may – with a bit of thought – convince ourselves of the fact that any model I of \(\mathsf {O}\) must also be a model of \(\mathsf {O}'\), and hence that \(\mathsf {O} \models \mathsf {O}'\).

Unfortunately, deciding entailment for DL ontologies using all of the axioms of Table 2 in an unrestricted manner is undecidable. Hence, different DLs then apply different restrictions to the use of these axioms in order to achieve particular guarantees with respect to the complexity of deciding entailment. Most DLs are founded on one of the following base DLs (we use indentation to denote that the child DL extends the parent DL):

  • \(\mathcal {ALC}\) (\(\mathcal {A}\)ttributive \(\mathcal {L}\)anguage with \(\mathcal {C}\)omplement [60]), supports atomic concepts, the top and bottom concepts, concept intersection, concept union, concept negation, universal restrictions and existential restrictions. Role and concept assertions are also supported.

    • \(\mathcal {S}\) extends \(\mathcal {ALC}\) with transitive roles (role transitivity).

These base languages can be extended as follows:

  • \(\mathcal {H}\) adds role inclusion.

    • \(\mathcal {R}\) adds (limited) complex role inclusion, as well as role reflexivity, role irreflexivity, role disjointness and the universal role.

  • \(\mathcal {O}\) adds nominals.

  • \(\mathcal {I}\) adds inverse roles.

  • \(\mathcal {F}\) adds (limited) functional properties.

    • \(\mathcal {N}\) adds (limited) number restrictions.

      • \(\mathcal {Q}\) adds (limited) qualified number restrictions (subsuming \(\mathcal {N}\) given \(\top \)).

We use “(limited)” to indicate that such features are often only allowed under certain restrictions to ensure decidability; for example, complex roles (chains) typically cannot be combined with cardinality restrictions. DLs are then typically named per the following scheme, where [a|b] denotes an alternative between a and b and [c][d] denotes a concatenation cd:

$$\begin{aligned}{}[\mathcal {ALC}|\mathcal {S}][\mathcal {H}|\mathcal {R}][\mathcal {O}][\mathcal {I}][\mathcal {F}|\mathcal {N}|\mathcal {Q}] \end{aligned}$$

Examples include \(\mathcal {ALCO}\), \(\mathcal {ALCHI}\), \(\mathcal {SHIF}\), \(\mathcal {SROIQ}\), etc. These languages often apply additional restrictions on concept and role axioms to ensure decidability, which we do not discuss here. For further details on Description Logics, we refer to the recent book by Baader et al. [8].

We now discuss a research topic relating to the combination of ontology entailment and graph querying.


5 Rules

Another way to define the semantics of knowledge graphs is to use rules. Rules can be formalised in many ways – as Horn clauses, as Datalog, etc. – but in essence they consist of if-then statements, where if some condition holds, then some entailment holds. Here we define rules based on graph patterns.

Definition 12 (Rule)

A rule is a pair such that B and H are graph patterns. The graph pattern B is called the body of the rule while H is called the head of the rule.

Given a graph G and a rule \(R = (B,H)\), we can then apply R over G by taking the union of \(\mu (H)\) for all \(\mu \in B(G)\). Typically we will assume that the set of variables used in H will be a subset of those used in B (\(\mathbf {V} (H) \subseteq \mathbf {V} (B)\)), which assures us that \(\mu (H)\) will always result in a graph without variables. Given a set of rules, we can then apply each rule recursively, accumulating the results in the graph until a fixpoint is reached, thus enriching our graph with entailments.
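This fixpoint procedure can be sketched as follows, reusing a brute-force pattern matcher in the style of Section 3 (the transitivity rule and the data are hypothetical):

```python
from itertools import product

def evaluate(Q, G):
    """Brute-force graph pattern matching (variables start with "?")."""
    variables = sorted({t for triple in Q for t in triple if t.startswith("?")})
    constants = sorted({t for triple in G for t in triple})
    for values in product(constants, repeat=len(variables)):
        mu = dict(zip(variables, values))
        if {tuple(mu.get(t, t) for t in triple) for triple in Q} <= G:
            yield mu

def apply_rules(G, rules):
    """Apply rules (body, head) recursively, accumulating entailments
    in the graph until a fixpoint is reached."""
    G = set(G)
    while True:
        new = set(G)
        for body, head in rules:
            for mu in evaluate(body, G):
                new |= {tuple(mu.get(t, t) for t in triple) for triple in head}
        if new == G:
            return G
        G = new

# A rule making orbits transitive, applied to a small chain:
rules = [({("?x", "orbits", "?y"), ("?y", "orbits", "?z")},
          {("?x", "orbits", "?z")})]
G = {("Luna", "orbits", "Earth"), ("Earth", "orbits", "Sun")}
closure = apply_rules(G, rules)   # adds ("Luna", "orbits", "Sun")
```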

There is a large intersection between the types of semantics we can define with rules and with DL-based ontologies [40]. For example, we can capture the role-inclusion axiom \(\textsf {parent} \sqsubseteq \textsf {orbits}\) as a rule (written \(B \rightarrow H\)):

$$\begin{aligned} \{ (\texttt {?x},\texttt {parent},\texttt {?y}) \} \rightarrow \{ (\texttt {?x},\texttt {orbits},\texttt {?y}) \} \end{aligned}$$

or we can capture the concept inclusion \(\texttt {Planet} \sqsubseteq \forall \texttt {child}.\texttt {Moon}\) as:

$$\begin{aligned} \{ (\texttt {?x},\texttt {type},\texttt {Planet}), (\texttt {?x},\texttt {child},\texttt {?y}) \} \rightarrow \{ (\texttt {?y},\texttt {type},\texttt {Moon}) \}\ . \end{aligned}$$

However, we cannot capture axioms such as \(\texttt {BinaryStar} \sqsubseteq \exists \texttt {orbits}.\texttt {BinaryStar}\) with the types of rules we have seen. Such an axiom would need a rule like:

$$\begin{aligned} \{ (\texttt {?x},\texttt {type},\texttt {BinaryStar}) \} \rightarrow \exists y:\{ (\texttt {?x},\texttt {orbits},y),(y,\texttt {type},\texttt {BinaryStar}) \} \end{aligned}$$

where the head introduces a fresh existential variable y for each binary star, which itself is a binary star. (Note that if left unrestricted, such a rule, applied recursively on a single instance of a binary star, might end up creating an infinite chain of existential binary stars, each orbiting their existential successor!)

Additionally, rules as we have defined them previously cannot capture axioms of the form \(\texttt {Planemo} \sqsubseteq \texttt {Moon} \sqcup \texttt {Planet} \sqcup \texttt {DwarfPlanet}\) either. For this we would need a rule along the lines of the following:

$$\begin{aligned} \{ (\texttt {?x},\texttt {type},\texttt {Planemo}) \} \rightarrow&\{ (\texttt {?x},\texttt {type},\texttt {Moon}) \} \vee \\&\{ (\texttt {?x},\texttt {type},\texttt {Planet}) \} \vee \\&\{ (\texttt {?x},\texttt {type},\texttt {DwarfPlanet}) \} \ . \end{aligned}$$

Here we introduce the disjunctive connective “\(\vee \)” to state that when the body is matched, then at least one of the heads holds true.

Rules extended with existentials and disjunction have, however, been explored as extensions of Datalog. Datalog\(^\pm \) variants support existential rules (aka tuple generating dependencies (tgds)) that allow for generating existentials in the head of the rule, per the BinaryStar example, but typically in a restricted way to ensure termination. On the other hand, Disjunctive Datalog allows for using disjunction in the head of rules, as seen in the Planemo example. These extensions allow rules to capture semantics similar to more expressive DLs [26, 27, 58].
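Existential rules are commonly implemented by a chase-style procedure that invents fresh nulls (blank nodes) for the existential variables. The following is a minimal sketch of the BinaryStar example, firing the rule only when no witness exists yet and bounding the number of rounds, since unrestricted application would not terminate (the star name is hypothetical):

```python
from itertools import count

fresh = count()

def chase_binary_star(G, max_rounds=3):
    """Apply {(?x, type, BinaryStar)} -> ∃y: {(?x, orbits, y),
    (y, type, BinaryStar)}, inventing a fresh null for y. The rule only
    fires for stars with no witness yet, bounded by max_rounds."""
    G = set(G)
    for _ in range(max_rounds):
        stars = {s for s, p, o in G if p == "type" and o == "BinaryStar"}
        unsatisfied = {s for s in stars
                       if not any((s, "orbits", o) in G for o in stars)}
        if not unsatisfied:
            break
        for s in unsatisfied:
            y = f"_:b{next(fresh)}"          # a fresh existential null
            G |= {(s, "orbits", y), (y, "type", "BinaryStar")}
    return G

G = chase_binary_star({("SomeStar", "type", "BinaryStar")})
```

Each round leaves the newest null unsatisfied, illustrating the infinite chain that the bound (or a termination-ensuring restriction, as in Datalog\(^\pm \)) cuts off.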

On the other hand, rules can express entailments not typically supported in DLs, where a simple example of such a rule is:

$$\begin{aligned} \{ (\texttt {?x},\texttt {orbits},\texttt {?y}), (\texttt {?y},\texttt {orbits},\texttt {?x}) \} \rightarrow&\{ (\texttt {?x},\texttt {sibling},\texttt {?y}) \}\ . \end{aligned}$$

This type of entailment would require role intersection, which is not typically supported by DLs. More generally, cyclical graph patterns that entail role assertions are typically not supported in DLs, though easily captured by rules.

Efficient reasoning systems have been developed to support such rules over knowledge graphs, including, most recently, Vadalog [11] and VLog [17].


6 Context

All of the data represented in Fig. 1 holds true with respect to a given context: a given scope of truth. As an example of temporal context, the Sun will cease being a G-type Star and will eventually become a White Dwarf in its later years. We may also have spatial context (e.g., to state that something holds true in a particular region of space, such as a solar eclipse affecting parts of Earth), provenance (e.g., to state that something holds true with respect to a given source, such as the mass of a given planet), and so forth.

There are a variety of syntactic ways to embed contextual meta-data in a knowledge graph. Within del graphs, for example, the options include reification [18], n-ary relations [18], and singleton properties [52]. Context can also be represented syntactically using higher-arity models, such as property graphs [4], RDF* [30], or named graphs [18]. For further details on these options for representing context in graphs, we refer to the extended tutorial [35].

Aside from syntactic conventions for representing context in graphs, an important issue is how the semantics of context should be defined. Zimmermann et al. [75] have proposed a general annotation-based framework for contextualising graphs, centred on the notion of an annotation domain.

Definition 13 (Annotation domain)

Let A be a set of annotation values. An annotation domain is defined as an idempotent, commutative semi-ring \(D = \langle A,\oplus ,\otimes ,\mathbf {0},\mathbf {1} \rangle \).

For example, we may define A to be the powerset of \(\{1,\ldots ,365\}\), indicating different days of the year. We may then annotate a triple such as \((\texttt {Luna},\texttt {occults},\texttt {Sun})\) with a set of days in a given year – say \(\{13,245,301\}\) – on which the given occultation will occur. Given another similar such triple – say \((\texttt {Luna},\texttt {occults},\texttt {Jupiter})\), annotated with \(\{46,245,323\}\) – we can define \(\oplus \) to be \(\cup \) and use it to find the annotation for days when Luna occults the Sun or Jupiter (\(\{13,46,245,301,323\}\)); we can further define \(\otimes \) to be \(\cap \) and use it to find the annotation for days when Luna occults the Sun and Jupiter (\(\{245\}\)). In this scenario, \(\mathbf {0}\) refers to the empty set, while \(\mathbf {1}\) refers to the set of all days of the year. Custom annotation domains can be defined for custom forms of context, and can be used in querying and rules for computing the annotations corresponding to particular results and entailments.
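The occultation example can be reproduced directly in code, with annotation values as sets of days, \(\oplus \) as union and \(\otimes \) as intersection:

```python
# An annotation domain over sets of days: oplus = union ("or"),
# otimes = intersection ("and"); 0 is the empty set, 1 is all 365 days.
ONE = frozenset(range(1, 366))
ZERO = frozenset()

annotations = {
    ("Luna", "occults", "Sun"):     {13, 245, 301},
    ("Luna", "occults", "Jupiter"): {46, 245, 323},
}

sun = annotations[("Luna", "occults", "Sun")]
jupiter = annotations[("Luna", "occults", "Jupiter")]

either = sun | jupiter   # days when Luna occults the Sun or Jupiter
both = sun & jupiter     # days when Luna occults the Sun and Jupiter
```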

In the context of ontologies, a more declarative framework for handling context – called contextual knowledge repositories – was proposed by Homola and Serafini [37] based on Description Logics, with a related framework more recently proposed by Schuetz et al. [61] based on an OLAP-style abstraction. These frameworks can assign graphs into different hierarchical contexts, which can then be aggregated into higher-level contexts.

Still, there are a number of questions that remain open regarding context.


7 Embeddings

Machine learning has gained increasing attention in recent years, particularly due to impressive advances in sub-areas such as Deep Learning, where multi-layer architectures such as Convolutional Neural Networks (CNNs) have led to major strides in tasks involving multi-dimensional inputs (such as image classification), while architectures such as Recurrent Neural Networks (RNNs) have likewise led to major strides in tasks involving sequential data (such as natural language translation). An obvious question then is: what kinds of learning architectures could be applied to graphs? The answer, unfortunately, is not so obvious. While machine learning architectures typically assume numeric inputs in the form of tensors (multi-dimensional arrays), graphs – unlike, say, images – are not naturally conceptualised in such terms.

A first attempt to encode a graph numerically would be a “one-hot encoding”. Recalling the notation of \(N_G\) for the nodes of a del graph, we introduce a similar notation \(E_G\) to denote the edge-labels (i.e., predicates or properties) of a del graph; formally, \(E_G = \{ p \mid \exists s, o : (s,p,o) \in G \}\). We could consider creating a 3-mode tensor of size \(|N_G| \cdot |E_G| \cdot |N_G|\), where the element at (i, j, k) is 1 if \((n_i,e_j,n_k) \in G\), or 0 otherwise; here \(n_i\) and \(n_k\) denote the ith and kth nodes in an indexing of \(N_G\), while \(e_j\) indicates the jth edge label in an indexing of \(E_G\). Though we now have a tensor representing the graph, for most graphs in practice, the tensor would be very sparse, with few 1’s relative to 0’s. As a result, using the tensor for the purposes of machine learning would be impractical.
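The one-hot encoding can be sketched as follows; even for a tiny hypothetical graph, the 1 entries are already sparse relative to the tensor's size:

```python
# One-hot encoding of a del graph as a 3-mode tensor (nested lists).
G = {("Luna", "orbits", "Earth"), ("Earth", "orbits", "Sun"),
     ("Luna", "type", "Moon")}

node_list = sorted({s for s, _, _ in G} | {o for _, _, o in G})  # N_G
label_list = sorted({p for _, p, _ in G})                        # E_G
n = {node: i for i, node in enumerate(node_list)}
e = {label: j for j, label in enumerate(label_list)}

T = [[[0] * len(node_list) for _ in label_list] for _ in node_list]
for s, p, o in G:
    T[n[s]][e[p]][n[o]] = 1

ones = sum(x for mat in T for row in mat for x in row)   # = |G|
size = len(node_list) ** 2 * len(label_list)             # |N_G|*|E_G|*|N_G|
```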

In order to create more practical numeric representations of graphs for machine learning applications, knowledge graph embeddings aim to embed graphs within a low-dimensional, dense, numerical representation. There are many ways in which embeddings may be defined, but typically an embedding will map each node and each edge-label of a graph to an independent vector or matrix. For simplicity, we will assume embeddings that associate nodes and edge-labels to real-valued vectors of fixed dimension d, which we denote by the set \(\mathbb {R}^d\).Footnote 5

Definition 14 (Knowledge graph embedding)

Given a del graph G, a knowledge graph embedding of G is a pair of mappings \((\varepsilon ,\rho )\) such that \(\varepsilon : N_G \rightarrow \mathbb {R}^d\) and \(\rho : E_G \rightarrow \mathbb {R}^d\).

Typically \(\varepsilon \) is known as an entity embedding, while \(\rho \) is known as a relation embedding. The knowledge graph embedding then consists of (typically low-dimensional) numeric representations of the node and edge-labels of the graph G, typically extracted such that the graph G can be (approximately) reconstructed from the embedding. Towards this goal, the graph can be conceptualised as a function \(\gamma : \mathbf {C} \times \mathbf {C} \times \mathbf {C} \rightarrow \mathbb {R}_{[0,1]}\), where \(\gamma (s,p,o) = 1\) if \((s,p,o) \in G\) or \(\gamma (s,p,o) = 0\) if \((s,p,o) \not \in G\). Instead of accepting the constants s, p, o, however, we could rather consider accepting the embeddings of those concepts: \(\varepsilon (s)\), \(\rho (p)\), \(\varepsilon (o)\). This gives rise to the notion of a plausibility scoring function.

Definition 15 (Plausibility)

A plausibility scoring function is a partial function \(\phi : \mathbb {R}^d \times \mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}_{[0,1]}\). Given a del graph \(G = (V,E,L)\), a triple \((s,p,o) \in N_G \times E_G \times N_G\), and a knowledge graph embedding \((\varepsilon ,\rho )\) of G, the plausibility of (s, p, o) is given as \(\phi (\varepsilon (s),\rho (p),\varepsilon (o))\).

Triples with scores closer to 1 are considered more plausible, while triples with scores closer to 0 are considered less plausible. We can then learn embeddings that yield a score close to 1 for the triples of G, and a score close to 0 for a sample of negative triples not in G. Given that G is assumed to be incomplete, we cannot be certain that triples not in G are false. A common heuristic to generate negative examples is to apply the Partial Completeness Assumption (PCA): taking a triple (s, p, o) from G and replacing a term (often o) with another term appearing in G such that the resulting triple does not appear in G. In practice, the dimensions of the embeddings are fixed to a low number so that, rather than “remembering” G, the knowledge graph embedding is forced to generalise patterns in how G is connected. Different knowledge graph embeddings instantiate the types of embedding considered and the plausibility scoring function in different ways [67]. We refer to the extended tutorial for details [35].
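As a concrete (and much simplified) illustration, the following Python sketch instantiates Definitions 14 and 15 with a TransE-style translational scoring function and PCA-style negative sampling over an invented toy graph; the logistic squashing into [0, 1], the margin, the dimensions and the learning rate are all illustrative choices, not a definitive implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy del graph over indexed nodes and edge-labels:
# 0 -[0]-> 1 -[0]-> 2, and 2 -[1]-> 0.
triples = {(0, 0, 1), (1, 0, 2), (2, 1, 0)}
num_nodes, num_labels, d = 3, 2, 8

# Entity embedding (eps) and relation embedding (rho), per Definition 14.
eps = rng.normal(size=(num_nodes, d))
eps /= np.linalg.norm(eps, axis=1, keepdims=True)
rho = rng.normal(scale=0.1, size=(num_labels, d))

def plausibility(s, p, o):
    """TransE-style scoring: (s, p, o) is plausible if eps[s] + rho[p]
    lands near eps[o]; the distance is squashed into [0, 1]."""
    dist = np.linalg.norm(eps[s] + rho[p] - eps[o])
    return 1.0 / (1.0 + np.exp(dist - 1.0))

lr, margin = 0.05, 1.0
for _ in range(500):
    for (s, p, o) in list(triples):
        # PCA-style negative sampling: corrupt the object of a known triple.
        o_neg = int(rng.integers(num_nodes))
        if (s, p, o_neg) in triples:
            continue  # the corrupted triple must not appear in G
        diff_pos = eps[s] + rho[p] - eps[o]
        diff_neg = eps[s] + rho[p] - eps[o_neg]
        # Hinge loss: only update while the positive triple is not yet
        # separated from the negative one by the given margin.
        if diff_pos @ diff_pos + margin > diff_neg @ diff_neg:
            eps[s] -= lr * (diff_pos - diff_neg)
            rho[p] -= lr * (diff_pos - diff_neg)
            eps[o] += lr * diff_pos
            eps[o_neg] -= lr * diff_neg
            for i in (s, o, o_neg):  # renormalise entities (as in TransE)
                eps[i] /= np.linalg.norm(eps[i])
```

After training, a triple of G such as (0, 0, 1) should receive a higher plausibility than a corrupted triple such as (0, 0, 2).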

Once trained over G, knowledge graph embeddings directly allow for estimating the plausibility of triples of the form \((s,p,o) \in N_G \times E_G \times N_G\) (i.e., triples using some combination of terms found in G). This can be used for the purposes of link prediction: considering the graph G of Fig. 1, and given a partial triple – with, say, a given subject and predicate but a missing object – the embedding of G can be used to predict likely completions of the triple, such as Rigil Kentaurus (already in the graph), or perhaps Proxima Centauri, or even Toliman itself. The embeddings themselves can also be useful independently of the plausibility scoring function as numerical representations of the elements of the graph, where, for example, similar numerical values will typically be assigned to similar nodes and edge-labels based on how they are connected in the graph.
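The link-prediction use of the scoring function can be sketched as follows: given a partial triple with a missing object, every candidate object is ranked by plausibility. The predicate name and the (untrained, random) embeddings below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented embeddings for a handful of entities and one invented predicate.
d = 4
eps = {name: rng.normal(size=d) for name in
       ["Toliman", "Rigil Kentaurus", "Proxima Centauri", "Pluto"]}
rho = {"nearest star": rng.normal(size=d)}

def plausibility(s, p, o):
    # Logistic squash of a TransE-style translational distance into [0, 1].
    dist = np.linalg.norm(eps[s] + rho[p] - eps[o])
    return 1.0 / (1.0 + np.exp(dist - 1.0))

# Link prediction for the partial triple (Toliman, nearest star, ?):
# rank every candidate object from most to least plausible.
ranking = sorted(eps, key=lambda o: plausibility("Toliman", "nearest star", o),
                 reverse=True)
```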

In terms of open questions, a natural topic to explore is the combination of knowledge graph embeddings with ontologies and notions of entailment.

8 Graph Neural Networks

Rather than encoding the structure of graphs numerically, an alternative to enable learning over graphs is to design machine learning architectures that operate directly over the structure of a graph. A natural starting point is to consider neural networks, which already form graphs where nodes are (artificial) neurons, and edges are (artificial) axons that pass signals between neurons. Unlike graphs used to represent data, however, traditional neural networks tend to have a much more regular structure, being organised into layers, where all the nodes of one layer are connected pairwise to all the nodes of the next layer.

To enable learning over graphs of data, an alternative approach is to define the structure of the neural network in terms of the structure of the graph. In this case, the nodes of the graph are interpreted as neurons, while edges are interpreted as axons. Thus nodes can pass signals to each other through edges towards solving a given task. In a supervised setting, we may label a subset of nodes in the graph and then learn functions that aggregate information from the neighbourhood of each node, computing a new state for the node; this may continue recursively until a fixpoint, or for a fixed number of steps. Typically nodes and edges in the input graph will be associated with numerical feature vectors that encode pertinent information for the supervised task. Such a learning architecture is known as a graph neural network (GNN) [59].

As an example, assuming a large del graph similar to that shown in Fig. 2, we may label a subset of planets that are known to be in a habitable zone that could support life. We could then associate nodes with vectors that numerically encode their type, their mass, their density, their composition, etc.; we may then further associate edges with vectors indicating the type of relation, the astronomical distance between bodies, etc. Based on labelled examples, a graph neural network can then learn aggregation functions that take the information surrounding the neighbourhood of each node – for example, numerical data about the moon(s) of a particular planet, the star(s) of the planet, the distances involved, etc. – and generate the expected output; the same functions can then be applied to unlabelled nodes to generate predicted classifications.

We first define a vector-labelled del graph, which serves as input to a GNN:

Definition 16 (Vector-labelled graph)

A vector-labelled del graph \(G_\lambda \) is a pair \((G,\lambda )\), where G is a del graph, and \(\lambda : N_G \cup G \rightarrow \mathbb {R}^a\) is a vector labelling function.

For simplicity, we will assume that nodes and triples are labelled with vectors of the same dimension (a). Thereafter, there are two principal architectures for GNNs: recursive and non-recursive graph neural networks. Here we focus on non-recursive graph neural networks; details of the recursive architecture can be found in the extended tutorial [35]. Note that in the following definition, we use \(S \rightarrow \mathbb {N}\) to denote bags (aka multisets) formed from the set S.

Definition 17 (Non-recursive graph neural network)

A non-recursive graph neural network (NRecGNN) \(\mathfrak {N}\) with l layers is an l-tuple of functions \(( \mathrm {\textsc {Agg}}^{(1)},\ldots ,\mathrm {\textsc {Agg}}^{(l)} )\), where \(\textsc {Agg}^{(k)}: \mathbb {R}^{a} \times 2^{(\mathbb {R}^{a} \times \mathbb {R}^a) \rightarrow \mathbb {N}} \rightarrow \mathbb {R}^{a}\) for \(1 \le k \le l\).

Each aggregation function \(\textsc {Agg}^{(k)}\) computes a new feature vector for a node, given its previous feature vector and the feature vectors of the nodes and edges forming its neighbourhood. We assume for simplicity that the dimensions of the vectors remain the same throughout, though this is not necessary in practice.Footnote 6 Given an NRecGNN \(\mathfrak {N} = ( \mathrm {\textsc {Agg}}^{(1)},\ldots ,\mathrm {\textsc {Agg}}^{(l)} )\), a vector-labelled graph \(G_\lambda \), and a node \(u \in N_G\), we define the output vector assigned to node u in \(G_\lambda \) by \(\mathfrak {N}\) (written \(\mathfrak {N}(G_\lambda ,u)\)) as follows. First let \(\mathbf {n}_u^{(0)} := \lambda (u)\). For all \(i \ge 1\), let:

\(\mathbf {n}_u^{(i)} := \textsc {Agg}^{(i)}\big (\mathbf {n}_u^{(i-1)}, \{\!\!\{ (\mathbf {n}_v^{(i-1)}, \lambda (v,p,u)) \mid (v,p,u) \in G \}\!\!\} \big )\)

Then \(\mathfrak {N}(G_\lambda ,u) := \mathbf {n}_u^{(l)}\).

In an l-layer NRecGNN, a different aggregation function is applied at each step (i.e., in each layer), up to a fixed number of steps l. These aggregation functions are typically parametrised and learnt based on labelled training data. When the aggregation functions are based on a convolutional operator, the result is a convolutional graph neural network (ConvGNN).
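A minimal sketch of an NRecGNN forward pass in Python, assuming a mean-based aggregation followed by a learned linear map and a ReLU activation (one common instantiation of the aggregation functions, not the only one); the graph, feature vectors and dimensions are invented for illustration.

```python
import numpy as np

# Invented vector-labelled del graph (cf. Definition 16): four nodes, three
# triples, with every node and triple labelled by a vector in R^a.
a = 4
triples = [(0, 0, 1), (2, 0, 1), (3, 1, 1)]
node_feat = {u: np.full(a, float(u)) for u in range(4)}   # lambda on nodes
edge_feat = {t: np.ones(a) for t in triples}              # lambda on triples

def agg_mean(prev, neighbourhood, W, V, b):
    """One choice of aggregation function: mean the bag of (neighbour vector,
    edge vector) pairs, then apply a learned linear map and a ReLU."""
    if neighbourhood:
        m = np.mean([np.concatenate([nv, ev]) for nv, ev in neighbourhood],
                    axis=0)
    else:
        m = np.zeros(2 * a)  # node with no incoming edges
    return np.maximum(0.0, W @ prev + V @ m + b)

def nrec_gnn(layers, node_feat, triples, u):
    """Apply an l-layer NRecGNN (cf. Definition 17); return the output for u."""
    state = dict(node_feat)  # n^(0) := lambda(v) for every node v
    for (W, V, b) in layers:
        # Each layer recomputes every node's vector from its neighbourhood.
        state = {v: agg_mean(state[v],
                             [(state[s], edge_feat[(s, p, o)])
                              for (s, p, o) in triples if o == v],
                             W, V, b)
                 for v in state}
    return state[u]

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(a, a)), rng.normal(size=(a, 2 * a)), np.zeros(a))
          for _ in range(2)]   # l = 2 layers with randomly initialised weights
out = nrec_gnn(layers, node_feat, triples, 1)
```

In practice the weight matrices would be learnt from labelled examples (e.g., the habitable-zone labels above) rather than randomly initialised.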

Though initially it may seem as if GNNs and ontologies are orthogonal concepts for graphs, there are some correspondences between the two.

9 Conclusions

The growing popularity of knowledge graphs presents an opportunity for research that combines various technical perspectives on how graphs can be used to represent and exploit knowledge at large scale. Within the intersection of these varying perspectives lie interesting research questions in terms of how concepts from graph databases and knowledge representation can be fruitfully combined, how techniques from machine learning relate to logical languages, and so forth.

Here we have mentioned a number of key concepts relating to knowledge graphs – specifically data models, querying, ontologies, rules, embeddings and graph neural networks – as well as a number of research topics that arise from their combination. We have only scratched the surface. Knowledge graphs encompass an even wider range of topics, where we have not touched upon concepts such as shapes and validation, graph algorithms and analytics, privacy and anonymisation, data quality, knowledge graph creation and completion, knowledge graph refinement, etc. Likewise there are many interesting research questions that arise when considering combinations of these concepts. For discussion on these and other topics we refer to the extended tutorial [35].