1 Introduction

We survey the work we have done on developing \(\text {FunDL}\), a family of description logics that can be used to address a number of problems in querying structured data sources, with a particular focus on data sources that have an underlying object-relational schema. All member dialects of this family have two properties in common. First, each is feature based: the usual notion of roles in description logic that are interpreted as binary relations is replaced with the notion of features that are interpreted as unary functions. We have found features to be a better fit with object-relational schema, e.g., for capturing the ubiquitous notion of attributes. And second, each dialect includes a concept constructor for capturing a variety of equality generating dependencies: so-called path functional dependencies (PFDs) that generalize the notions of primary keys, uniqueness constraints and functional dependencies that are again ubiquitous in object-relational schema. PFDs also ensure member dialects do not forgo the ability to capture roles or indeed n-ary relations in general. This can be accomplished by the simple expedient of reification via features, and then by employing PFDs to ensure a set semantics for reified relations. Indeed, the dialect \(\mathcal{DLFD}\), introduced in the first part of our survey, can capture very expressive role-based dialects of description logics, including dialects with so-called qualified number restrictions, inverse roles, role hierarchies, and so on [29].

Our survey consists of three general parts, with the first two parts focusing on the problem of logical implication for \(\text {FunDL}\) dialects with EXPTIME and PTIME complexity, respectively, and in which the dialects assume features are interpreted as total functions. In the third part of our survey, we begin with a review of more recent work on how such dialects may be adapted to support features that are instead partial functions. We then consider how role hierarchies can be captured as concept hierarchies in which the concepts are introduced as reifications of roles. Part three concludes with a review of other reasoning problems, in particular, on knowledge base consistency for \(\text {FunDL}\) dialects, and on query answering for dialects surveyed in part two.

We begin in the next section with a general introduction to \(\text {FunDL}\): what features are, what the various concept constructors are, basic notational conventions, the grammar protocol we follow to define the various dialects, and so on. Our survey concludes with a brief overview of related work.

2 Background and Definitions

Here, we define a nameless, all-inclusive member dialect of the \(\text {FunDL}\) family for the purpose of introducing a space of concept constructors that we then use for defining all remaining dialects in our survey. We also say how a theory is defined by a so-called terminology (or TBox) consisting of a finite set of sentences expressing inclusion dependencies, and introduce the problem of logical implication of an inclusion dependency by a TBox. Indeed, we focus exclusively on the problem of logical implication throughout the first two parts of our survey.

Definition 1

(Feature-Based DLs). Let \(\mathsf {F}\) and \(\mathsf {PC}\) be sets of feature names and primitive concept names, respectively. A path expression is defined by the grammar \(\mathop {\mathsf {Pf}}\nolimits \,{:}{:}= f.\mathop {\mathsf {Pf}}\nolimits | \mathop { id }\nolimits \), for \(f\in \mathsf {F}\). We define derived concept descriptions by the grammar on the left-hand-side of Fig. 1.

An inclusion dependency \(\mathcal C\) is an expression of the form \(C_1\sqsubseteq C_2\). A terminology (TBox) \(\mathcal T\) consists of a finite set of inclusion dependencies. A posed question \(\mathcal{Q}\) is a single inclusion dependency.

Fig. 1. Concept constructors in feature-based description logics.

The semantics of expressions is defined with respect to a structure \(\mathcal{I}=(\triangle , \cdot ^\mathcal{I})\), where \(\triangle \) is a domain of objects or entities and \((\cdot )^\mathcal{I}\) an interpretation function that fixes the interpretations of primitive concept names A to be subsets of \(\triangle \) and feature names f to be total functions \((f)^\mathcal{I}:\triangle \rightarrow \triangle \). The interpretation is extended to path expressions, \((\mathop { id }\nolimits )^\mathcal{I} = \lambda x.x\), \((f.\mathop {\mathsf {Pf}}\nolimits )^\mathcal{I} = (\mathop {\mathsf {Pf}}\nolimits )^\mathcal{I}\circ (f)^\mathcal{I}\) and derived concept descriptions C as defined in the centre column of Fig. 1.

An interpretation \(\mathcal{I}\) satisfies an inclusion dependency \(C_1\sqsubseteq C_2\) if \((C_1)^\mathcal{I}\subseteq (C_2)^\mathcal{I}\) and is a model of \(\mathcal {T}\) (\(\mathcal{I}\models \mathcal {T}\)) if it satisfies all inclusion dependencies in \(\mathcal {T}\). The logical implication problem asks if \(\mathcal {T}\models \mathcal {Q}\) holds, that is, if \(\mathcal{Q}\) is satisfied in all models of \(\mathcal {T}\). \(\Box \)
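To make these definitions concrete, the following minimal sketch evaluates path expressions and tests an inclusion dependency over a small finite structure. The encoding is an illustrative assumption and is not taken from the cited work: paths are tuples of feature names, the empty tuple stands for \(\mathop { id }\nolimits \), and the names `eval_path`, `value_restriction` and `satisfies` are hypothetical. The value restriction is read as in the last formula of \((*)\) in Sect. 3.1, i.e., an object belongs to \(\forall f.C\) when its f value belongs to C.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Obj = Hashable
Path = Tuple[str, ...]  # () plays the role of id; ("f", "g") encodes the path f.g
Features = Dict[str, Dict[Obj, Obj]]


def eval_path(features: Features, path: Path, x: Obj) -> Obj:
    """Interpret a path expression: f.Pf means 'apply f first, then Pf'."""
    for f in path:
        x = features[f][x]  # features are total, so every lookup succeeds
    return x


def value_restriction(domain: Iterable[Obj], features: Features,
                      path: Path, concept: Set[Obj]) -> Set[Obj]:
    """Objects whose value along `path` belongs to `concept`."""
    return {x for x in domain if eval_path(features, path, x) in concept}


def satisfies(lhs: Set[Obj], rhs: Set[Obj]) -> bool:
    """An inclusion dependency holds when the left extension is a subset of the right."""
    return lhs <= rhs


# Hypothetical structure: two persons sharing one string-valued name.
domain = {"p1", "p2", "s"}
features = {"name": {"p1": "s", "p2": "s", "s": "s"}}
PERSON, STRING = {"p1", "p2"}, {"s"}

# Does this structure satisfy PERSON [= forall name.STRING ?
typing_holds = satisfies(PERSON, value_restriction(domain, features, ("name",), STRING))
```

Such a check only tests one given finite structure; logical implication quantifies over all models of a TBox and therefore needs the decision procedures surveyed below.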

We shall see that the logical implication problem for this logic is undecidable for a variety of reasons. For example, the value restriction, top and same-as concept constructors are all that are needed to encode the uniform word problem [24]. Thus, each dialect of the \(\text {FunDL}\) family in our survey will correspond to some fragment of this logic. Grammars defining a dialect use the non-terminals C and D to characterize concept constructors permitted on left-hand-sides and right-hand-sides of inclusion dependencies occurring in a TBox, respectively, and the non-terminal E to characterize concept constructors permitted in posed questions. We also assume, when an explicit definition of non-terminal D (resp. E) is missing, that D concept descriptions align with C concept descriptions (resp. E concept descriptions align with D concept descriptions).

Fig. 2. An object-relational schema.

To see how \(\text {FunDL}\) dialects are useful in capturing structured data sources, consider a visualization of a hypothetical object-relational university schema in Fig. 2. Here, nodes are classes, labelled directed edges are attributes, thick edges denote inheritance, and underlined attributes denote primary keys. Introducing a primitive concept and a feature for each class and attribute then enables attribute typing, inheritance, primary keys and a variety of other data dependencies to be captured as inclusion dependencies in a university TBox:

  1. (disjoint classes) \(\text {PERSON} \sqsubseteq \lnot \text {DEPT}\),

  2. (attribute typing) \(\text {PERSON} \sqsubseteq \forall \textit{name}.\text {STRING}\),

  3. (unary primary key) \(\text {PERSON} \sqsubseteq \text {PERSON}:\textit{name}\rightarrow \mathop { id }\nolimits \),

  4. (disjoint attribute values) \(\text {PERSON} \sqsubseteq \text {DEPT}:\textit{name}\rightarrow \mathop { id }\nolimits \),

  5. (inheritance) \(\text {PROF} \sqsubseteq {\text {PERSON}}\),

  6. (views) \(\forall reports .\text {CHAIR} \sqsubseteq \text {PROF}\),

  7. (mandatory participation) \(\exists { head }^{-1} \sqsubseteq \text {CHAIR}\),

  8. (binary primary key) \(\text {CLASS} \sqsubseteq \text {CLASS}:\textit{dept},\textit{num}\rightarrow \mathop { id }\nolimits \), and

  9. (cover) \(\text {PERSON} \sqsubseteq (\text {STUDENT} \sqcup \text {PROF})\).

Allowing path expressions to occur in PFD concepts turns out to be quite useful in capturing additional varieties of equality generating dependencies, as in the following:

$$ \text {TAKES} \sqsubseteq \text {TAKES}:\textit{student}, \textit{class}.\textit{room}, \textit{class}.\textit{time}\rightarrow \textit{class}. $$

This inclusion dependency expresses a constraint induced by the interaction of time and space, that no student can take two different classes in the same room at the same time or, to paraphrase, that no pair of classes with at least one student taking them can be in the same room at the same time. The second reading illustrates how so-called identification constraints in DL-Lite dialects can also be captured [11].
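A PFD of this kind can be checked directly against finite data. The sketch below assumes the usual reading of a PFD \(C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits \), namely that an object satisfies it if every C object agreeing with it on all \(\mathop {\mathsf {Pf}}\nolimits _i\) also agrees with it on \(\mathop {\mathsf {Pf}}\nolimits \). The function names and the sample enrolment data are hypothetical, and the dictionaries list only the feature values the example needs, whereas features in the logic are total.

```python
def eval_path(features, path, x):
    """Follow the features of `path` (a tuple of names) starting from x."""
    for f in path:
        x = features[f][x]
    return x


def pfd_holds(features, lhs_objects, concept, lhs_paths, rhs_path):
    """Check 'lhs_objects [= concept : lhs_paths -> rhs_path' on finite data."""
    for x in lhs_objects:
        for y in concept:
            agree = all(eval_path(features, p, x) == eval_path(features, p, y)
                        for p in lhs_paths)
            if agree and eval_path(features, rhs_path, x) != eval_path(features, rhs_path, y):
                return False  # x and y agree on every left path but differ on the right one
    return True


# Hypothetical data: one student enrolled in two classes, same room, same time.
features = {
    "student": {"t1": "ann", "t2": "ann"},
    "class": {"t1": "c1", "t2": "c2"},
    "room": {"c1": "r101", "c2": "r101"},
    "time": {"c1": "9am", "c2": "9am"},
}
TAKES = {"t1", "t2"}
lhs = [("student",), ("class", "room"), ("class", "time")]

before = pfd_holds(features, TAKES, TAKES, lhs, ("class",))

features["time"]["c2"] = "10am"  # move the second class to another slot
after = pfd_holds(features, TAKES, TAKES, lhs, ("class",))
```

Here `before` reports a violation, since the two enrolments agree on student, room and time but name different classes, and `after` reports that the dependency holds once the second class is rescheduled.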

In the third part of our survey, we review work on how features may be interpreted as partial functions. This leads to the addition of the concept constructor \(\exists {f}\) for capturing domain elements for which feature f is defined. Consequently, it becomes possible to say, e.g., that a \(\text {DEPT}\) does not have a \(\textit{gpa}\) by adding the inclusion dependency

$$\mathrm {DEPT} \sqsubseteq \lnot \exists {\textit{gpa}}$$

to the university TBox.

Note that any logical implication problem for the university TBox defined thus far can be solved by appeal to one of the expressive \(\text {FunDL}\) dialects, and, notwithstanding cover constraints, can be solved by one of the tractable dialects in PTIME. An ability to do this has many applications in information systems technology. For example, early work on \(\text {FunDL}\) has shown how to reduce the problem of determining when a SQL query can be reformulated without mentioning the DISTINCT keyword to a logical consequence problem [20]. More recent applications allow one to resolve fundamental issues in reasoning about identity in conceptual modelling and SQL programming [6], and in ontology-based data access [7, 26].

2.1 Ackermann Decision Problems

Our complexity reductions are tied to the classical Ackermann case of the decision problem [1].

Definition 2

(Monadic Ackermann Formulae). Let \(P_i\) be monadic predicate symbols and \(x,y_i,z_i\) variables. A monadic first-order formula in the Ackermann class is a formula of the form \(\exists z_1\ldots \exists z_k \forall x \exists y_1\ldots \exists y_l.\varphi \) where \(\varphi \) is a quantifier-free formula over the symbols \(P_i\). \(\Box \)

Every formula with the Ackermann prefix can be converted to Skolem normal form by replacing variables \(z_i\) by Skolem constants and \(y_i\) by unary Skolem functions not appearing in the original formula. This, together with standard Boolean equivalences, yields a finite set of universally-quantified clauses containing at most one variable (x).
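As a small illustration (the formula is a made-up example and the Skolem symbols c and g are assumed names), consider the Ackermann formula

$$\exists z\, \forall x\, \exists y.\; P_1(z) \wedge (P_1(x) \rightarrow P_2(y)).$$

Replacing z by a Skolem constant c and y by a unary Skolem function g, and rewriting the implication as a disjunction, gives the two clauses

$$P_1(c) \qquad \text{and} \qquad \forall x.\; \lnot P_1(x) \vee P_2(g(x)),$$

neither of which mentions a variable other than x.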

Proposition 3

([16]). The Ackermann decision problem is complete for EXPTIME.

The lower bound holds even for the Horn fragment of the decision problem [15]. A program in this fragment is a finite set of definite Horn clauses. A recognition problem for a program \(\varPi \) and a ground atom Q is to determine if Q is true in all models of \(\varPi \) (i.e., if \(\varPi \cup \{\lnot Q\}\) is unsatisfiable).

3 Expressive \(\text {FunDL}\) Dialects

In this first part of our survey, we consider the logical implication problem for an expressive Boolean complete dialect with value restrictions on features. We begin by presenting a lower bound for a fragment of this dialect and then follow with upper bounds. We subsequently consider extensions to the dialect that admit additional concept constructors, namely PFDs and inverse features.

3.1 Logical Implication in \(\mathcal{DLF}\)

The dialect \(\mathcal{DLF}_0\) of \(\text {FunDL}\) is defined by the following grammar (and recall our protocol whereby right-hand-sides of inclusion dependencies and posed questions are also defined by non-terminal C):

$$\begin{array}{lcl} C&\,{:}{:}=&A \mid C_1\sqcap C_2 \mid \forall f.C \end{array}$$

Observe that \(\mathcal{DLF}_0\) is a Horn fragment that only allows primitive concepts, conjunctions and value restrictions. We show that every such recognition problem can be simulated by a \(\mathcal{DLF}_0\) implication problem [29]. For this reduction, each monadic predicate symbol is assumed to also qualify as a primitive concept name in \(\mathcal{DLF}_0\). Given an instance of a recognition problem in the form of a program \(\varPi \) and a ground goal atom \(G = P(\mathop {\overline{\mathsf {Pf}}}\nolimits (0))\), we construct an implication problem for \(\mathcal{DLF}_0\) as follows:

$$\begin{array}{lll} \mathcal{T}_{\varPi } &{}=&{}\{ \forall \mathop {\mathsf {Pf}}\nolimits _1'.Q_1'\sqcap \ldots \sqcap \forall \mathop {\mathsf {Pf}}\nolimits _k'.Q_k' \sqsubseteq \forall \mathop {\mathsf {Pf}}\nolimits '.P':\\ &{}&{}~~~~~~P'(\mathop {\overline{\mathsf {Pf}}}\nolimits '(x))\leftarrow Q_1'(\mathop {\overline{\mathsf {Pf}}}\nolimits _1'(x)),\ldots ,Q_k'(\mathop {\overline{\mathsf {Pf}}}\nolimits _k'(x))\in \varPi \},\\[1mm] \mathcal{Q}_{\varPi ,G} &{}=&{} \forall \mathop {\mathsf {Pf}}\nolimits _1.Q_1\sqcap \cdots \sqcap \forall \mathop {\mathsf {Pf}}\nolimits _k.Q_k\sqsubseteq \forall \mathop {\mathsf {Pf}}\nolimits .P, \end{array}$$

where the \(\mathop {\overline{\mathsf {Pf}}}\nolimits (x)\) terms of the program naturally correspond to path functions \(\mathop {\mathsf {Pf}}\nolimits \) in \(\mathcal{DLF}_0\), and where the posed question \(\mathcal{Q}_{\varPi ,G}\) is formed from ground facts \(Q_i(\mathop {\overline{\mathsf {Pf}}}\nolimits _i(0))\in \varPi \), and the ground goal atom \(G=P(\mathop {\overline{\mathsf {Pf}}}\nolimits (0))\).

Theorem 4

([30]). Let \(\varPi \) be a program and G a ground atom. Then

$$\varPi \models G \iff \mathcal{T}_{\varPi }\models \mathcal{Q}_{\varPi ,G}.$$

For the reduction to work, one needs two features. (Unlike the case with \(\mathcal{ALC}\) style logics, the problem becomes PSPACE-complete with one feature.) This result was later used to show EXPTIME-hardness for \(\mathcal{FL}_0\) [3].
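The shape of the translation can be sketched in a few lines. The encoding below is an assumption made for illustration only: an atom \(P(\mathop {\overline{\mathsf {Pf}}}\nolimits (x))\) is written as a pair of a predicate name and a path, where the path is a tuple listing features in the order they are applied to x, and inclusion dependencies are rendered as plain text with `[=` standing for \(\sqsubseteq \). The function names are hypothetical.

```python
def forall(path, concept):
    """Render 'forall Pf.C'; the empty path () stands for id and adds nothing."""
    return f"forall {'.'.join(path)}.{concept}" if path else concept


def inclusion(body, head):
    """Translate atoms (predicate, path) into one inclusion dependency, as text."""
    if not body:
        raise ValueError("this sketch assumes a non-empty clause body")
    lhs = " and ".join(forall(path, pred) for pred, path in body)
    pred, path = head
    return f"{lhs} [= {forall(path, pred)}"


# Clause  P(f(x)) <- Q(x), R(g(x))  contributes one dependency to the TBox.
tbox = [inclusion([("Q", ()), ("R", ("g",))], ("P", ("f",)))]

# Ground facts Q(0), R(g(0)) and goal P(f(0)) form the posed question.
question = inclusion([("Q", ()), ("R", ("g",))], ("P", ("f",)))
```

In this toy instance the goal follows from the program in one step, and correspondingly the posed question is itself a member of the TBox.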

We now show a matching upper bound for \(\mathcal{DLF}\), the Boolean complete dialect with value restrictions defined by the following grammar:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2 \mid C_1\sqcup C_2 \mid \forall f.C \mid \lnot C\\ \end{array}$$

We first show how the semantics of \(\mathcal{DLF}\) constructors can be captured by Ackermann formulae: let C, \(C_1\), and \(C_2\) range over concept descriptions and f over attribute names. We introduce a unary predicate subscripted by a description that simulates that description in our reduction:

$$\begin{array}{l} \forall x.(P_C(x) \vee P_{\lnot C}(x)), \forall x.\lnot (P_C(x) \wedge P_{\lnot C}(x))\\ \forall x.P_{C_1\sqcap C_2}(x) \leftrightarrow (P_{C_1}(x) \wedge P_{C_2}(x)) \\ \forall x.P_{C_1\sqcup C_2}(x) \leftrightarrow (P_{C_1}(x) \vee P_{C_2}(x)) \\ \forall x.P_{\forall f.C}(x) \leftrightarrow P_{C}(f(x)) \end{array}{~~~~~~~~~~~~~~~~~~(*)}$$

To complete the translation of a \(\mathcal{DLF}\) implication problem \(\mathcal{T}\models \mathcal{Q}\), for \(\mathcal{Q}\) of the form \(C\sqsubseteq D\), what remains is the translation of the inclusion dependencies in \(\mathcal T \cup \{\mathcal{Q}\}\):

  • \(\varPhi _{\mathcal{DLF}} = \bigwedge _{\varphi \in \mathrm {Semantics}(\mathcal {T},\mathcal {Q})}\varphi \),

  • \(\varPhi _\mathcal{T}= \bigwedge _{C'\sqsubseteq D'\in \mathcal{T}} \forall x.P_{C'}(x) \rightarrow P_{D'}(x)\), and

  • \(\varPhi _\mathcal{Q}= P_C(0) \wedge P_{\lnot D}(0)\) (a Skolemized negation of the posed question \(\mathcal Q\)),

where \(\mathrm {Semantics}(\mathcal {T},\mathcal {Q})\) is the set of all formulae \((*)\) whose subscripts range over concepts and subconcepts that appear in \(\mathcal T\cup \{\mathcal{Q}\}\).

Theorem 5

([30]). Let \(\mathcal{T}\) and \(\mathcal{Q} = C\sqsubseteq D\) be a terminology and inclusion dependency in \(\mathcal{DLF}\), respectively. Then \(\mathcal{T} \models \mathcal{Q}\) iff \(\varPhi _{\mathcal{DLF}}\wedge \varPhi _\mathcal{T}\wedge \varPhi _\mathcal{Q}\) is not satisfiable.

Theorems 4 and 5 establish a tight EXPTIME complexity bound for the \(\mathcal{DLF}\) logical implication problem.
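The translation behind Theorem 5 is mechanical, as the following sketch suggests. It is a minimal illustration under stated assumptions rather than the construction of [30]: concepts are nested tuples such as `("all", "f", ("atom", "B"))`, formulae are produced as plain strings with `P[C](x)` standing for \(P_C(x)\), and the function names are hypothetical. The sketch also adds \(\lnot D\) to the concepts whose semantics is axiomatized so that the predicate for \(\lnot D\) in the negated question is tied to the predicate for D.

```python
def subconcepts(c):
    """Yield c and every concept nested inside it."""
    yield c
    if c[0] in ("and", "or"):
        yield from subconcepts(c[1])
        yield from subconcepts(c[2])
    elif c[0] == "all":
        yield from subconcepts(c[2])
    elif c[0] == "not":
        yield from subconcepts(c[1])


def show(c):
    tag = c[0]
    if tag == "atom":
        return c[1]
    if tag == "not":
        return f"not {show(c[1])}"
    if tag == "all":
        return f"all {c[1]}.{show(c[2])}"
    op = " and " if tag == "and" else " or "
    return f"({show(c[1])}{op}{show(c[2])})"


def pred(c, arg="x"):
    return f"P[{show(c)}]({arg})"


def semantics(concepts):
    """One group of (*) formulae per distinct non-atomic subconcept."""
    out, seen = [], set()
    for root in concepts:
        for c in subconcepts(root):
            if c in seen:
                continue
            seen.add(c)
            tag = c[0]
            if tag == "not":
                out.append(f"forall x. {pred(c[1])} | {pred(c)}")
                out.append(f"forall x. ~({pred(c[1])} & {pred(c)})")
            elif tag == "and":
                out.append(f"forall x. {pred(c)} <-> ({pred(c[1])} & {pred(c[2])})")
            elif tag == "or":
                out.append(f"forall x. {pred(c)} <-> ({pred(c[1])} | {pred(c[2])})")
            elif tag == "all":
                f_of_x = f"{c[1]}(x)"
                out.append(f"forall x. {pred(c)} <-> {pred(c[2], f_of_x)}")
    return out


def translate(tbox, question):
    """Formulae whose conjunction is unsatisfiable iff the TBox implies the question."""
    c, d = question
    not_d = ("not", d)
    concepts = [x for pair in tbox for x in pair] + [c, not_d]
    phi_t = [f"forall x. {pred(l)} -> {pred(r)}" for l, r in tbox]
    phi_q = [f"{pred(c, '0')} & {pred(not_d, '0')}"]
    return semantics(concepts) + phi_t + phi_q


A, B = ("atom", "A"), ("atom", "B")
formulas = translate([(A, ("all", "f", B))], (A, ("all", "f", B)))
```

Every generated formula uses only monadic predicates, the single variable x, the constant 0 and unary function symbols, which is what places the result in the Skolemized Ackermann class of Sect. 2.1.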

3.2 Adding Path Functional Dependencies to \(\mathcal{DLF}\)

Allowing unrestricted use of the PFD concept constructor leads to undecidable implication problems, as in the case of a description logic defined by the following grammar:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2 \mid C_1\sqcup C_2 \mid \forall f.C \mid \lnot C \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ \end{array}$$

This remains true even for very simple varieties of PFD concept constructors.

The undecidability results are based on a reduction of the unrestricted tiling problem [4, 5] to the logical implication problem. The crux of the reduction is the use of the PFD constructor under negation or, equivalently, on the left-hand-side of inclusion dependencies. For example, the dependency

$$ A\sqsubseteq {\lnot (B:f,g\rightarrow \mathop { id }\nolimits )} $$

states that, for each A object, there must be a distinct B object that agrees with this A object on features f and g, i.e., there must be a square in a model of the above inclusion dependency. Such squares can then be connected into a grid using additional PFDs and the Boolean structure of the logic in a way that enables tiling to be simulated.
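Unfolding the negation makes the square explicit. Assuming the usual PFD semantics, under which an object x belongs to \(B:f,g\rightarrow \mathop { id }\nolimits \) when every B object agreeing with x on f and g is x itself, membership of x in the negated concept amounts to

$$\exists y\in B^\mathcal{I}.\; f^\mathcal{I}(x)=f^\mathcal{I}(y) \;\wedge \; g^\mathcal{I}(x)=g^\mathcal{I}(y) \;\wedge \; x\ne y.$$

The objects x and y, together with their shared f value and shared g value, are the four corners of the square.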

This idea can be sharpened to the following three borderline cases, where simple, unary and key refer, respectively, to conditions in which path expressions correspond to individual features or to \(\mathop { id }\nolimits \), in which left-hand-sides of PFDs consist of a single path expression, and in which the right-hand-side is \(\mathop { id }\nolimits \) [35]:

  1. PFDs are simple and key, and therefore resemble

     $$C:f_1,\ldots , f_k\rightarrow \mathop { id }\nolimits $$

     (i.e., the standard notion of relational keys);

  2. PFDs are simple and non-key, and therefore resemble

     $$C:f_1,\ldots ,f_k\rightarrow f$$

     (i.e., the standard notion of relational functional dependencies); and

  3. PFDs are simple and unary, and therefore resemble either of the following:

     $$C:f\rightarrow g \text{ or } C:f\rightarrow \mathop { id }\nolimits .$$

Observe that the three cases are exhaustive: the only possibility not covered happens when all PFDs have the form \(C:\mathop {\mathsf {Pf}}\nolimits \rightarrow \mathop { id }\nolimits \), i.e., are unary and key. However, it is a straightforward exercise in this case to map logical implication problems to alternative formulations in decidable DL dialects with inverses and functional restrictions. Notably, the reductions make no use of attribute value restrictions in the first two of these cases; they rely solely on PFDs and the standard Boolean constructors.

On Regaining Decidability. It turns out that undecidability is indeed a consequence of allowing PFDs to occur within the scope of negation (and, as a consequence, all \(\text {FunDL}\) dialects disallow this possibility). Among the first expressive and decidable dialects is \(\mathcal{DLFD}\), the description logic defined by the following grammar rules:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2 \mid C_1\sqcup C_2 \mid \forall f.C \mid \lnot C \mid \top \\ D&{}\,{:}{:}=&{} C \mid D_1\sqcap D_2 \mid D_1\sqcup D_2 \mid \forall f.D \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ \end{array}$$

Observe that PFDs must now occur on right-hand-sides of inclusion dependencies at either the top level or within the scope of monotone concept constructors. (Allowing PFDs on left-hand-sides is equivalent to allowing PFDs in the scope of negation: \(D_1\sqsubseteq \lnot (D_2:f\rightarrow g)\) is equivalent to \(D_1\sqcap (D_2:f\rightarrow g)\sqsubseteq \bot \).)

To establish the complexity lower bound, we first study the problem for a subset of \(\mathcal{DLFD}\) in which all inclusion dependencies are of the form

$$\top \sqsubseteq \top :\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits .$$

An implication problem in this subset is called a PFD membership problem. It will simplify matters to assume that each monadic predicate symbol P of a program maps to a distinct feature p in \(\mathcal{DLFD}\), and that each such p differs from the features corresponding to the unary function symbols of that program.

We proceed similarly to the \(\mathcal{DLF}\) case: Let \(\varPi \) be an arbitrary program and \(G=P(\mathop {\overline{\mathsf {Pf}}}\nolimits (0))\) a ground atom. We construct an implication problem for \(\mathcal{DLFD}\) as follows:

[figure a: definitions of \(\mathcal{T}_{\varPi }\) and \(\mathcal{C}_{\varPi ,G}\)]

where \(P_1(\mathop {\overline{\mathsf {Pf}}}\nolimits _1(0)),\ldots ,P_k(\mathop {\overline{\mathsf {Pf}}}\nolimits _k(0))\) are the ground facts in \(\varPi \).

Theorem 6

([30]). Let \(\varPi \) be an arbitrary program and \(G=P(\mathop {\overline{\mathsf {Pf}}}\nolimits (0))\) a ground atom. Then \(\varPi \models G \iff \mathcal{T}_{\varPi }\models \mathcal{C}_{\varPi ,G}\).

The reduction establishes another source of EXPTIME-hardness for our \(\mathcal{DLFD}\) fragment that originates from the PFDs only.

To establish the upper bound, we reduce logical implication in \(\mathcal{DLFD}\) to logical implication in \(\mathcal{DLF}\). The reduction is based on the following observations:

  1. If the posed question does not contain the PFD concept constructor then the implication problem reduces to the implication problem in \(\mathcal{DLF}\) since, due to the tree model property of the logic, the PFD inclusion dependencies in the TBox are satisfied vacuously;

  2. Otherwise the posed question contains a PFD, e.g., has the form

     $$A\sqsubseteq B:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits .$$

     To falsify the posed question in this case, we need to construct a model consisting of two trees respectively rooted by A and B that obey the TBox inclusion dependencies, that agree on paths \(\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\) originating from the respective roots, and that disagree on \(\mathop {\mathsf {Pf}}\nolimits \). Since the two trees are identical up to node labels and the agreements always equate corresponding nodes in the two trees, the model can be simulated in \(\mathcal{DLF}\) by doubling the primitive concepts (one for simulating concept membership in each of the two trees) and by introducing an auxiliary primitive concept to simulate path agreements. This two trees idea can then be generalized to account for posed questions having (possibly multiple) PFDs nested in other monotone concept constructors.

The above assumes that PFDs are not nested in other constructors in a TBox; this can be achieved by a simple conservative extension of the given TBox and appropriate reformulation of the posed question [35].

Theorem 7

([30]). The implication problem for \(\mathcal{DLFD}\) can be reduced to an implication problem for \(\mathcal{DLF}\) with only a linear increase in size.

Theorems 6 and 7 establish a tight EXPTIME complexity bound for the \(\mathcal{DLFD}\) implication problem.

3.3 Adding Inverse Features

Allowing right-hand-sides of inclusion dependencies to now employ inverse features together with PFDs, as in \(\mathcal{DLFDI}\), a \(\text {FunDL}\) dialect defined by the following grammar:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2 \mid C_1\sqcup C_2 \mid \forall f.C \mid \lnot C \mid \top \\ D&{}\,{:}{:}=&{} C \mid D_1\sqcap D_2 \mid D_1\sqcup D_2 \mid \forall f.D \mid \exists {f}^{-1}.{C} \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ \end{array}$$

leads immediately to undecidability, similarly to [14]. Again, the reduction is from the unrestricted tiling problem in which an initial square is generated by the constraints

$$ A \sqsubseteq \exists {f}^{-1}.{B}\sqcap \exists {f}^{-1}.{C},~~ B\sqcap C\sqsubseteq \bot , \text{ and } B\sqsubseteq C:f\rightarrow g, $$

and further inclusion dependencies then extend it to a properly tiled grid.

Theorem 8

([31]). Logical implication for \(\mathcal{DLFDI}\) is undecidable.

On Regaining Decidability with Inverses. We review two approaches to restricting either the PFD constructor or the way inverses are allowed to be qualified to regain decidability of the logical implication problem.

Prefix-restricted PFDs. The first approach syntactically restricts the PFD constructor as follows:

Definition 9

(Prefix Restricted Terminologies). Let \(D:\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits '\) be an arbitrary PFD where \(\mathop {\mathsf {Pf}}\nolimits \) is the maximal common prefix of the path expressions \(\{\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _k\}\). The PFD is prefix-restricted if either \(\mathop {\mathsf {Pf}}\nolimits '\) is a prefix of \(\mathop {\mathsf {Pf}}\nolimits \) or \(\mathop {\mathsf {Pf}}\nolimits \) is a prefix of \(\mathop {\mathsf {Pf}}\nolimits '\). \(\Box \)

This condition applies to the argument PFDs occurring in a terminology and strengthens the results in [14]. Note that, because of accidental common prefixes, it is not sufficient to simply require that unary PFDs resemble keys since, for example, a k-ary PFD \(A_1\sqsubseteq A_2:f.a_1,\ldots ,f.a_k\rightarrow h\) has a logical consequence \(A_1\sqsubseteq A_2:f\rightarrow h\), thus yielding the ability to construct a tiling similar to the one outlined above.
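The prefix restriction is a purely syntactic test on a PFD. A minimal sketch follows, assuming paths are encoded as tuples of feature names with the empty tuple standing for \(\mathop { id }\nolimits \); the function names are hypothetical.

```python
def common_prefix(paths):
    """Longest sequence of features shared as a prefix by all given paths."""
    prefix = []
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def is_prefix(p, q):
    """True when path p is a prefix of path q."""
    return q[:len(p)] == p


def is_prefix_restricted(lhs_paths, rhs_path):
    """Definition 9: compare the right path with the maximal common left prefix."""
    if not lhs_paths:
        raise ValueError("this sketch assumes at least one left-hand path")
    pf = common_prefix(lhs_paths)
    return is_prefix(rhs_path, pf) or is_prefix(pf, rhs_path)


# The k-ary PFD from the text, with k = 2: f.a1, f.a2 -> h
accidental = is_prefix_restricted([("f", "a1"), ("f", "a2")], ("h",))

# A relational key f, g -> id: the empty path is a prefix of everything
key = is_prefix_restricted([("f",), ("g",)], ())
```

The first PFD is rejected because its left-hand paths share the prefix f, which is unrelated to h, whereas any key PFD passes.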

Theorem 10

([31]). Let \(\mathcal{T}\) be a \(\mathcal{DLFDI}\) terminology with prefix-restricted PFDs. Then the implication problem \(\mathcal{T}\models \mathcal{Q}\) is decidable and EXPTIME-complete.

Coherent Terminologies. The second of our conditions for recovering decidability is to impose a coherency condition on terminologies themselves. The main advantage of this approach is that we thereby regain the ability for unrestricted use of PFDs in terminologies. The disadvantage is roughly that there is a single use restriction on using feature inversions in terminologies.

Definition 11

(Coherent Terminologies). A terminology \(\mathcal{T}\) is coherent if

$$\mathcal{T}\models (\exists {f}^{-1}.D)\sqcap (\exists {f}^{-1}.E)\sqsubseteq \exists {f}^{-1}.(D\sqcap E)$$

for all descriptions D and E that appear as subconcepts of concepts that appear in \(\mathcal{T}\), or their negations. \(\Box \)

Note that we can syntactically guarantee that \(\mathcal{T}\) is coherent by adding inclusion dependencies of the form \((\exists {f}^{-1}.D)\sqcap (\exists {f}^{-1}.E)\sqsubseteq \exists {f}^{-1}.(D\sqcap E)\) to \(\mathcal{T}\) for all concept descriptions D and E appearing in \(\mathcal{T}\). This restriction allows us to construct interpretations of non-PFD descriptions in which objects do not have more than one f predecessor (for all \(f\in \mathsf {F}\)) and thus satisfy all PFDs vacuously.

By restricting logical implication problems for \(\mathcal{DLFDI}\) to cases in which terminologies are coherent, it becomes possible to apply reductions to satisfiability problems for Ackermann formulae.

Theorem 12

([31]). Let \(\mathcal{T}\) be a coherent \(\mathcal{DLFDI}\) terminology. Then the implication problem \(\mathcal{T}\models \mathcal{Q}\) is decidable and EXPTIME-complete.

Note that unqualified inverse features of the form \(\exists {f}^{-1}\) immediately imply coherency. Moreover, one can qualify an f predecessor by concept C by asserting

$$ A\sqsubseteq \exists {f}^{-1}, ~~ \forall f.A\sqsubseteq C.$$

Thus, the restriction to unqualified inverses does not rule out cases in which qualified inverses might be useful, and avoids the problem of allowing multiple f predecessors (that could then interact with the PFD constructs). Hence, for the remainder of the survey, we assume unqualified inverse features in \(\text {FunDL}\) dialects.

3.4 Equational Constraints

As pointed out in our introductory comments, allowing equational (same-as) concepts in TBoxes leads immediately to undecidability via a reduction from the uniform word problem [24]. Conversely, allowing equational concepts in posed questions extends the capabilities of the logics, in particular allowing for capturing factual assertions (called an ABox, see Sect. 5.3). To this end we introduce the \(\text {FunDL}\) dialect \(\mathcal{DLFDE}\) defined as follows:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2 \mid C_1\sqcup C_2 \mid \forall f.C \mid \lnot C \mid \top \\ D&{}\,{:}{:}=&{} C \mid D_1\sqcap D_2 \mid D_1\sqcup D_2 \mid \forall f.D \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ E &{}\,{:}{:}=&{} C \mid E_1\sqcap E_2 \mid \bot \mid \lnot E \mid \forall f.E \mid (\mathop {\mathsf {Pf}}\nolimits _1=\mathop {\mathsf {Pf}}\nolimits _2)\\ \end{array}$$

Undecidability. It is easy to see that the following two restricted cases have decidable decision problems:

  • allowing arbitrary PFDs in terminologies, and

  • allowing equational concepts in the posed question.

Unfortunately, the combination of the two cases leads again to undecidability. One can use the equational concept to create a seed square for a tiling problem (although a triangle is actually sufficient in this case, as in \(A \sqsubseteq (f.g=g) \sqcap \forall f.B \) [35]) that can then be extended into an infinite grid using PFDs in a TBox (e.g., \(A\sqsubseteq (B:g\rightarrow f.h)\sqcap (B:g\rightarrow k.g)\) for the triangle seed case), and ultimately to an instance of a tiling problem. Hence:

Theorem 13

([35]). Let \(\mathcal{T}\) be a \(\mathcal{DLFD}\) terminology and E an equational concept. Then the problem \(\mathcal{T}\models E\sqsubseteq \bot \) is undecidable.

3.4.1 Decidability and a Boundary Condition.

To regain decidability, we restrict the PFD constructor to adhere to a boundary condition, in particular, to have either of the following two forms:

  • \(C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _i,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits \); and

  • \(C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _i,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits .f\), for some primitive feature f.

We call the resulting fragment \(\mathcal{DLFDE}^-\). The condition distinguishes, e.g., the PFDs \(f \rightarrow \mathop { id }\nolimits \) and \(f \rightarrow g\) from the PFD \(f \rightarrow g.f\). Intuitively, a simple saturation procedure that fires PFDs on a hypothetical database is now guaranteed to terminate as a consequence.

Notice that the boundary condition still admits PFDs that express arbitrary keys or functional dependencies in the sense of the relational model, including those occurring in all our examples. Thus, restricting PFDs in this manner does not sacrifice any ability to capture database schema for legacy data sources.
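The boundary condition can likewise be tested syntactically. The sketch below assumes the same tuple encoding of paths as elsewhere in these examples (the empty tuple stands for \(\mathop { id }\nolimits \)) and uses hypothetical function names: the right-hand path must either be a prefix of some left-hand path, or become one after dropping its last feature.

```python
def is_prefix(p, q):
    """True when path p is a prefix of path q."""
    return q[:len(p)] == p


def satisfies_boundary(lhs_paths, rhs_path):
    """Right path is Pf or Pf.f for some prefix Pf of a left-hand path."""
    candidates = [rhs_path]
    if rhs_path:
        candidates.append(rhs_path[:-1])  # strip the trailing feature f of Pf.f
    return any(is_prefix(c, p) for c in candidates for p in lhs_paths)


f_to_id = satisfies_boundary([("f",)], ())           # f -> id
f_to_g = satisfies_boundary([("f",)], ("g",))        # f -> g
f_to_gf = satisfies_boundary([("f",)], ("g", "f"))   # f -> g.f

# The TAKES dependency of Sect. 2: student, class.room, class.time -> class
takes = satisfies_boundary(
    [("student",), ("class", "room"), ("class", "time")], ("class",))
```

The three unary cases reproduce the distinction drawn above, with only \(f \rightarrow g.f\) rejected, and the university examples pass the test.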

Theorem 14

([21]). Let \(\mathcal{T}\) and \(\mathcal{T}'\) be respective \(\mathcal{DLF}\) and \(\mathcal{DLFD}\) terminologies in which the latter contains only PFD inclusion dependencies, and let E be an equational concept. Then there is a concept \(E'\) such that

$$ \mathcal{T}\cup \mathcal{T}' \models E\sqsubseteq \bot \text { iff } \mathcal{T} \models (E\sqcap E')\sqsubseteq \bot .$$

Moreover, \(E'\) can be constructed from \(\mathcal{T}'\) and E effectively and in time polynomial in \(|\mathcal{T}'|\).

The boundary condition on PFDs is essential for the above theorem to hold. If unrestricted PFDs are combined with either equations or an ABox, there is no limit on the length of paths participating in path agreements when measured from an initial object \(o\in E\sqcap E'\) in the associated satisfiability problem. Moreover, any minimal relaxation of this condition, i.e., allowing only non-key PFDs of the form \(C:f\rightarrow g.h\), already leads to undecidability [32, 35]. With the boundary condition in place, we have:

Theorem 15

([21]). \(\mathcal{DLFDE}^-\) logical implication and the problem of ABox consistency defined in Sect. 5.3 are decidable and complete for EXPTIME.

The construction essentially generates a pattern (part of a model) that satisfies E (which already contains the effects of all PFDs due to the boundary condition) and then tests if this pattern can be extended to a full model using the decision procedure for \(\mathcal{DLF}\). Note also that posed questions containing PFDs can be rewritten to equivalent posed questions replacing the PFDs with their semantic definitions via path agreements and disagreements.

Inverses. Finally, we conjecture that adding the unqualified inverse constructor to \(\mathcal{DLFDE}\) under the restrictions outlined in Sect. 4.4 preserves all the results.

4 Tractable \(\text {FunDL}\) Dialects

In this second part of our survey, we consider \(\text {FunDL}\) dialects for which the logical implication problem can be solved in PTIME. We begin by reviewing \(\mathcal{CFD}\), chronologically the first member of the \(\text {FunDL}\) family and, so far as we are aware, the first DL dialect to introduce a type constructor, PFDs, for capturing equality generating dependencies [8, 20].

Ensuring tractability requires that we somehow evade Theorems 4 and 6. This is generally achieved by requiring a TBox to satisfy the following additional conditions:

  1. Interaction between value restrictions and conjunctions on the left-hand-sides of inclusion dependencies must somehow be controlled,

  2. Inclusion dependencies must be Horn (which effectively disallows the use of disjunction),\(^{1}\) and

  3. PFDs must satisfy an additional syntactic boundary condition in addition to being disallowed on the left-hand-side of inclusion dependencies.

We shall see that violating any of these conditions leads to intractability of logical implication.

4.1 Horn Inclusion Dependencies

The first way of limiting the interactions between value restrictions and conjunctions on the left-hand-sides of inclusion dependencies is by simply disallowing value restrictions entirely, and by no longer permitting posed questions to mention either negations or disjunctions. This approach underlies the \(\text {FunDL}\) dialect called \(\mathcal{CFD}\) given by the following grammar:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid C_1\sqcap C_2\\ D&{}\,{:}{:}=&{} C \mid D_1\sqcap D_2 \mid \forall f.D \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ E &{}\,{:}{:}=&{} C \mid \bot \mid E_1\sqcap E_2 \mid \forall f.E \mid (\mathop {\mathsf {Pf}}\nolimits _1=\mathop {\mathsf {Pf}}\nolimits _2)\\ \end{array}$$

The main idea behind decidability and complexity of the logical implication problem is similar to the idea in Theorem 15. However, we no longer need to use the \(\mathcal{DLF}\) decision procedure to verify that the partial model can be completed to a full model since, in \(\mathcal{CFD}\), one can always employ complete \(\mathsf {F}\)-trees whose nodes belong to all primitive concepts (without having to check for their existence [20, 36]). Hence, the complexity reduces to the construction of the initial part of the model. This, with the help of the restrictions on E concepts, can be done in PTIME.

Theorem 16

([36]). The logical implication problem for \(\mathcal{CFD}\) is complete for PTIME.

The hardness follows from the fact that the PFDs alone can simulate HornSAT.
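The flavour of this correspondence can be seen in a minimal sketch, which is an analogy only and does not reproduce the reduction in [36]. We assume that each Horn clause \(p_1\wedge \cdots \wedge p_n\rightarrow q\) is read as a dependency \(p_1,\ldots ,p_n\rightarrow q\) over features; computing the set of derivable atoms is then the familiar closure computation by forward chaining. The names `Rule` and `closure` are ours.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumption: a rule (body, head) stands both for the Horn clause
# body_1 & ... & body_n -> head and for a dependency body -> head.
Rule = Tuple[FrozenSet[str], str]


def closure(start: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Forward chaining: add heads of rules whose bodies are already derived."""
    derived = set(start)
    rule_list = list(rules)
    changed = True
    while changed:
        changed = False
        for body, head in rule_list:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


horn = [
    (frozenset(), "p"),            # fact: p
    (frozenset({"p"}), "q"),       # p -> q
    (frozenset({"p", "q"}), "r"),  # p & q -> r
]
print(sorted(closure([], horn)))   # ['p', 'q', 'r']
```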

Extensions Versus Tractability. Unfortunately, extending this fragment while maintaining tractability is essentially infeasible. The following table summarizes the effects of allowing additional concept constructors in the TBox on the right-hand-side of inclusion dependencies, reading down, and in the posed question, reading across [36]:

$$\begin{array}{l|l|l} \mathcal{T}/\mathcal{Q} &{} \mathcal{CFD} \text{ or } \mathcal{CFD}_{\not =,(\lnot )} &{} \mathcal{CFD}_{\not =,\sqcup } \text{ or } \mathcal{CFD}_{\lnot }\\ \hline \mathcal{CFD} &{} \text{P-c / in P} &{} \text{P-c / coNP-c}\\ \mathcal{CFD}^{\sqcup } &{} \text{coNP-c / coNP-c} &{} \text{coNP-c / coNP-c}\\ \mathcal{CFD}^{\bot } &{} \text{PSPACE-c / in P} &{} \text{PSPACE-c / coNP-c}\\ \mathcal{CFD}^{\sqcup ,\bot } &{} \text{EXPTIME-c / coNP-c} &{} \text{EXPTIME-c / coNP-c}\\ \end{array}$$

The complexities listed in the table are with respect to the size of the TBox and the size of the posed question. Note in particular that concept disjointness, in which \(\bot \) is allowed on right-hand-sides of inclusion dependencies, leads to PSPACE-completeness. This is due to the need for checking whether a partial model can be completed, which in turn requires testing for reachability in an implicit but exponentially-sized graph.

4.2 Value Restrictions Instead of Conjunctions

An alternative that allows us to evade the ramifications of Theorem 4 is disallowing conjunctions on the left-hand-sides of inclusion dependencies, yielding the dialect \(\mathcal{CFD}_\textit{nc}\) [37] given by the following:

$$\begin{array}{lcl} C&{}\,{:}{:}=&{} A \mid \forall f.C\\ D&{}\,{:}{:}=&{} C \mid \lnot C \mid D_1\sqcap D_2 \mid \forall f.D \mid C:\mathop {\mathsf {Pf}}\nolimits _{1}, ... ,\mathop {\mathsf {Pf}}\nolimits _{k}\rightarrow \mathop {\mathsf {Pf}}\nolimits \\ E &{}\,{:}{:}=&{} C \mid \bot \mid E_1\sqcap E_2 \mid \forall f.E \mid (\mathop {\mathsf {Pf}}\nolimits _1=\mathop {\mathsf {Pf}}\nolimits _2)\\ \end{array}$$

The main idea behind tractability of \(\mathcal{CFD}_\textit{nc}\) relies on the fact that left-hand-sides of inclusion dependencies can only observe object membership in a single atomic concept (as opposed to a conjunction of concepts). Hence, while models of this logic require exponentially many objects labelled by conjunctions of primitive concepts in general, they can be abstracted in a polynomial way. The construction of the actual model is then similar to the standard NFA to DFA construction followed by unfolding of the resulting DFA.

Theorem 17

([37]). The logical implication problem for \(\mathcal{CFD}_\textit{nc}\) is complete for PTIME.

As with the dialect \(\mathcal{CFD}\), hardness follows from reducing HornSAT to reasoning with PFDs.

4.3 Value Restrictions and Limited Conjunctions

The above has shown that allowing an arbitrary use of concept conjunction on the left-hand-sides of inclusion dependencies in a \(\mathcal{CFD}_\textit{nc}\) TBox immediately leads to hardness for EXPTIME (a consequence of Theorem 4). The complexity can be traced to the need for exponentially many objects labelled by different sets of primitive concepts to be generated. The following definition provides a way of controlling this need for all such objects:

Definition 18

(Restricted Conjunction). Let \(k>0\) be a constant. We say that TBox \(\mathcal {T}\) is a \(\mathcal{CFD}_{k\textit{c}}\) TBox if, whenever \(\mathcal {T}\models (A_1\sqcap \,\cdots \,\sqcap A_n)\sqsubseteq B\) for some set of primitive concepts \(\{A_1,\ldots ,A_n\} \cup \{B\}\), with \(n>k\), then \(\mathcal {T}\models (A_{i_1}\sqcap \,\cdots \,\sqcap A_{i_k})\sqsubseteq B\) for some k-sized subset \(\{A_{i_1},\ldots ,A_{i_k}\}\) of the primitive concepts \(\{A_1,\ldots ,A_n\}\). \(\Box \)

A saturation-style procedure based on this definition can be implemented to generate all implied inclusion dependencies with at most k primitive concepts (value restrictions) on left-hand-sides of inclusion dependencies [26]. The decision procedure essentially follows the procedure for \(\mathcal{CFD}_\textit{nc}\) but is exponential in k due to the need to consider sets of concepts up to size k (essentially by determining all implied inclusion dependencies that are not a trivial weakening of other inclusion dependencies) and leads to the following:

Theorem 19

([26]). The logical implication problem for \(\mathcal{CFD}_{k\textit{c}}\) is complete for PTIME for a fixed value of k; the decision procedure is exponential in k.

In addition, the procedure enables an incremental means of determining the minimum k for which a given TBox is a \(\mathcal{CFD}_{k\textit{c}}\) TBox, that is, allows for testing if a given parameter k suffices:

Theorem 20

(Testing for k [26]). A TBox \(\mathcal {T}\) is not a \(\mathcal{CFD}_{k\textit{c}}\) TBox if and only if there is an additional single-step inference that infers a non-trivial inclusion dependency (i.e., one that is not a weakening of an already discovered dependency) with \(k+1\) conjuncts on the left-hand-side.

An algorithm based on iterative deepening allows one to determine the value of k for a given TBox in a pay-as-you-go manner. Hence the decision procedure also runs within the optimal time bound, exponential in k and polynomial in \(|\mathcal {T}|+|\mathcal {Q}|\), even when k is not part of the input.
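To make Definition 18 and the iterative search for k concrete, the following sketch tests the definition by brute force. It is an illustration under strong simplifying assumptions and not the saturation procedure of [26]: the TBox contains only inclusion dependencies of the form \(A_1\sqcap \cdots \sqcap A_n\sqsubseteq B\) over primitive concepts (no features, value restrictions or PFDs), and all subsets of concepts are enumerated, which is exponential in the number of concepts. The names `is_kc` and `minimum_k` are ours.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

# Assumption: a rule (body, head) stands for A1 ⊓ ... ⊓ An ⊑ B
# over primitive concepts only.
Rule = Tuple[FrozenSet[str], str]


def closure(start: Iterable[str], rules: List[Rule]) -> Set[str]:
    """All primitive concepts implied by the conjunction of `start`."""
    derived = set(start)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


def is_kc(concepts: List[str], rules: List[Rule], k: int) -> bool:
    """Brute-force test of Definition 18 for a fixed k."""
    for n in range(k + 1, len(concepts) + 1):
        for subset in combinations(concepts, n):
            for b in closure(subset, rules):
                # Some k-sized subset must already imply b.
                if not any(b in closure(small, rules)
                           for small in combinations(subset, k)):
                    return False
    return True


def minimum_k(concepts: List[str], rules: List[Rule]) -> int:
    """Iterative deepening: try k = 1, 2, ... until the definition holds."""
    k = 1
    while not is_kc(concepts, rules, k):
        k += 1
    return k


print(minimum_k(["A", "B", "C"], [(frozenset({"A", "B"}), "C")]))  # 2
print(minimum_k(["A", "B", "C"], [(frozenset({"A"}), "B")]))       # 1
```

The loop in `minimum_k` always terminates, since the condition holds vacuously once k reaches the number of primitive concepts.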

4.4 Adding Inverse Features

Recall from Sect. 3.3 that we consider only the (unqualified) inverse feature constructor, \(\exists {f}^{-1}\), to be added to the D grammar rules of \(\mathcal{CFD}_\textit{nc}\) and \(\mathcal{CFD}_{k\textit{c}}\), yielding the respective logics \(\mathcal{CFDI}_\textit{nc}\) and \(\mathcal{CFDI}_{k\textit{c}}\). However, additional restrictions are still required to guarantee tractability of logical consequence [38]. We introduce the restrictions by examples:

  1. Inverses and Value Restrictions. Interactions between these two concept constructors can be illustrated by the following inference:

    $$ \{A\sqsubseteq \exists {f}^{-1}, \forall f.A'\sqsubseteq \forall f.B\} \models A\sqcap A'\sqsubseteq B. $$

    This cannot be allowed since unrestricted use of this construction yields hardness for EXPTIME (see Theorem 4). \(\mathcal{CFDI}_\textit{nc}\) syntactically restricts TBoxes to avoid the above situation by requiring additional inclusion dependencies of the form \(A\sqsubseteq A'\), \(A'\sqsubseteq A\), or \(A\sqcap A'\sqsubseteq \bot \) to be present in a TBox whenever the above pattern appears. Note that \(\mathcal{CFDI}_{k\textit{c}}\) does not require this restriction since the testing for k procedure we have outlined will detect the above situation (thus determining the price).

  2. Inverses and PFDs. The second interaction that hinders tractability is between inverses and PFDs. In particular, a logical consequence problem of the form

    $$ \{A\sqsubseteq \exists {f}^{-1}, \forall f.A\sqsubseteq A , \ldots \}\models (\forall h_1.A)\sqcap (\forall h_2.A)\sqcap (h_1.f=h_2.f)\sqsubseteq h_1=h_2 $$

    will force two infinite f anti-chains starting from two A objects created by the left-hand-side of the posed question. We have shown how to use these anti-chains and additional PFDs in the TBox to reduce linearly bounded DTM acceptance [19] to logical implication in this case, yielding PSPACE-hardness, and how to repair this by further limiting the syntax of PFDs in a way that disables this kind of interaction with inverse features [38]. In particular, PFDs in a TBox must now have one of the following two forms:

    • \(C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .\mathop {\mathsf {Pf}}\nolimits _i,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits \); and

    • \(C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits .g,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits .f\), for some primitive features f and g.

Inverses obeying these two restrictions can then be added to both the \(\text {FunDL}\) dialects \(\mathcal{CFD}_\textit{nc}\) and \(\mathcal{CFD}_{k\textit{c}}\) while maintaining tractability:

Theorem 21

([38]). The logical implication problems for \(\mathcal{CFDI}_\textit{nc}\) and \(\mathcal{CFDI}_{k\textit{c}}\) are complete for PTIME, in the latter case for a fixed value of k.

5 Partial Features, Roles, ABoxes and Query Answering

The third part of our survey considers how partial features and role hierarchies can be accommodated in \(\text {FunDL}\) dialects, and how to check for knowledge base consistency and to evaluate queries over \(\text {FunDL}\) knowledge bases consisting of a so-called ABox in addition to a TBox.

5.1 Partial Features

We first consider the impact of changing the semantics of features in the \(\text {FunDL}\) family to partial features [25, 26, 40, 41]. The changes can be summarized as follows:

  1. Features \(f\in \mathsf {F}\) are now interpreted as partial functions on \(\triangle \) (i.e., the result can be undefined for some elements of \(\triangle \));

  2. A path function \(\mathop {\mathsf {Pf}}\nolimits \) now denotes a partial function resulting from the composition of partial functions;

  3. The syntax of C in feature-based DLs is extended with an additional concept constructor, \(\exists {f}\), called an existential restriction that can now appear on both sides of inclusion dependencies;

  4. The \(\exists {f}\) concept constructor is interpreted as \(\{x \mid \exists y\in \triangle . (f)^\mathcal{I}(x)=y\}\); and

  5. We adopt a strict interpretation of set membership and equality. This means that set membership holds only when the value exists; and equality holds only when both sides are defined and denote the same object.

In the light of these changes, we need to consider their impact on concept constructors that involve features or feature paths:

Value Restrictions. Our definition of value restriction \(\forall f.C\) (see Definition 1) assumes features are total. For partial features, there is now a choice:

  1. keeping the original semantics, i.e., objects in the interpretation of \(\forall f.C\) must have a feature f defined and leading to a C object, or

  2. altering the semantics to match \(\mathcal{ALC}\)-style semantics, i.e., the f value of objects in the interpretation of such a value restriction must be a C object, if such a value exists; we denote this variant \(\widetilde{\forall } f.C\).

While not equivalent, it is easy to see that many inclusion dependencies can be expressed using either variant of the value restriction, for example

$$ A\sqsubseteq \widetilde{\forall } f.B \text{ can } \text{ be } \text{ expressed } \text{ as } A\sqcap \forall f.\top \sqsubseteq \forall f.B.$$

Note that when the original semantics is used, the existential restriction \(\exists {f}\) is simply a synonym for \(\forall f.\top \). Also, since features are still functional, the so-called qualified existential restrictions of the form \(\exists f.C\), with semantics given by \((\exists {f}.C)^\mathcal{I} = \{x \mid \exists y\in \triangle . (f)^\mathcal{I}(x)=y\wedge y\in (C)^\mathcal{I}\},\) can be simulated by expansion to \(\exists {f}\sqcap \forall f.C\). Indeed, hereon we write \(\exists {\mathop {\mathsf {Pf}}\nolimits }\) as shorthand for \(\exists {f_1}\sqcap \forall f_1.(\exists {f_2}\sqcap \forall f_2.( \ldots (\exists {f_k})\ldots ))\).
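The two readings of value restrictions can be compared on a small finite interpretation. The sketch below is an illustration under our own assumptions: a partial feature is a Python dictionary whose missing keys are the objects on which the feature is undefined, concepts are sets of objects, and the function names are hypothetical.

```python
from typing import Dict, Set

Obj = str
# Assumption: a partial feature is a dict; objects absent from the
# dict are those on which the feature is undefined.
Feature = Dict[Obj, Obj]


def exists(f: Feature, domain: Set[Obj]) -> Set[Obj]:
    """Interpretation of the existential restriction ∃f."""
    return {x for x in domain if x in f}


def forall_strict(f: Feature, c: Set[Obj], domain: Set[Obj]) -> Set[Obj]:
    """Original semantics of ∀f.C: f must be defined and lead to a C object."""
    return {x for x in domain if x in f and f[x] in c}


def forall_alc(f: Feature, c: Set[Obj], domain: Set[Obj]) -> Set[Obj]:
    """ALC-style semantics: the f value must be in C if it exists."""
    return {x for x in domain if x not in f or f[x] in c}


domain = {"a", "b", "c"}
f = {"a": "b"}        # f is undefined on b and c
A = {"a", "c"}
B = {"b"}

# A ⊑ ∀~f.B compared with A ⊓ ∀f.⊤ ⊑ ∀f.B in this interpretation.
alc_holds = A <= forall_alc(f, B, domain)
strict_holds = (A & forall_strict(f, domain, domain)) <= forall_strict(f, B, domain)
print(alc_holds, strict_holds)                                # True True
# Under the original semantics, ∃f coincides with ∀f.⊤.
print(exists(f, domain) == forall_strict(f, domain, domain))  # True
```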

PFDs. Our PFDs agree with the definition of identity constraints in [11], where \(\mathop {\mathsf {Pf}}\nolimits _0=id\), which also require path values to exist. To further clarify the impact of this observation, note that a PFD inclusion dependency of the form \(C_1 \sqsubseteq C_2:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits _0\) is violated when (a) all path functions \(\mathop {\mathsf {Pf}}\nolimits _0,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\) are defined for a \(C_1\) object \(e_1\) and a \(C_2\) object \(e_2\), and (b) \((\mathop {\mathsf {Pf}}\nolimits _i)^\mathcal{I}(e_1) = (\mathop {\mathsf {Pf}}\nolimits _i)^\mathcal{I}(e_2)\) holds only for \(1 \le i \le k\). Formally, and more explicitly, this leads to the following interpretation of PFDs in the presence of partial features:

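$$ (C:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits _0)^\mathcal{I} = \{x \mid \forall y\in (C)^\mathcal{I}. (\textstyle \bigwedge _{i=0}^{k} x\in (\exists {\mathop {\mathsf {Pf}}\nolimits _i})^\mathcal{I}\wedge y\in (\exists {\mathop {\mathsf {Pf}}\nolimits _i})^\mathcal{I}) \wedge (\textstyle \bigwedge _{i=1}^{k} (\mathop {\mathsf {Pf}}\nolimits _i)^\mathcal{I}(x)=(\mathop {\mathsf {Pf}}\nolimits _i)^\mathcal{I}(y)) \rightarrow (\mathop {\mathsf {Pf}}\nolimits _0)^\mathcal{I}(x)=(\mathop {\mathsf {Pf}}\nolimits _0)^\mathcal{I}(y)\} $$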
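Conditions (a) and (b) can be checked directly on a finite interpretation. The following sketch is a simple illustration under assumptions of our own: partial features are dictionaries, a path function is a tuple of feature names in order of application (the empty tuple standing for \(\mathop { id }\nolimits \)), objects are strings, and `violates_pfd` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Obj = str
Path = Tuple[str, ...]               # () stands for the identity path id
Interp = Dict[str, Dict[Obj, Obj]]   # feature name -> partial function


def apply_path(interp: Interp, path: Path, x: Obj) -> Optional[Obj]:
    """Value of the path on x, or None if some feature is undefined."""
    for f in path:
        if x not in interp.get(f, {}):
            return None
        x = interp[f][x]
    return x


def violates_pfd(interp: Interp, c1: Set[Obj], c2: Set[Obj],
                 lhs: Sequence[Path], rhs: Path) -> bool:
    """True if C1 ⊑ C2 : lhs -> rhs is violated in the interpretation."""
    paths: List[Path] = [rhs, *lhs]
    for e1 in c1:
        v1 = [apply_path(interp, p, e1) for p in paths]
        if None in v1:               # (a) fails: some path undefined on e1
            continue
        for e2 in c2:
            v2 = [apply_path(interp, p, e2) for p in paths]
            if None in v2:           # (a) fails: some path undefined on e2
                continue
            # (b): agreement on all left-hand paths, disagreement on rhs.
            if v1[1:] == v2[1:] and v1[0] != v2[0]:
                return True
    return False


emp = {"e1", "e2"}
both = {"ssn": {"e1": "s1", "e2": "s1"}}   # two objects share an ssn value
one = {"ssn": {"e1": "s1"}}                # ssn is undefined on e2
print(violates_pfd(both, emp, emp, [("ssn",)], ()))  # True
print(violates_pfd(one, emp, emp, [("ssn",)], ()))   # False
```

In the second interpretation the key \(\textit{ssn}\rightarrow \mathop { id }\nolimits \) is not violated, since the path is undefined on one of the two objects.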

Equational Concepts. Similarly to PFDs, we assume the strict interpretation of equalities, i.e., an object belongs to \((\mathop {\mathsf {Pf}}\nolimits _1=\mathop {\mathsf {Pf}}\nolimits _2)\) if and only if both \(\mathop {\mathsf {Pf}}\nolimits _1\) and \(\mathop {\mathsf {Pf}}\nolimits _2\) are defined for the object and agree.

Partiality in Expressive FunDL. In expressive \(\text {FunDL}\) dialects, partiality can be simulated by introducing an auxiliary primitive concept G that will stand for the domain of existing objects. Depending on our choice of semantics for value restrictions we get a mapping of a TBox under the partial semantics to a TBox under the total semantics. We first define a way to modify concept descriptions to capture the desired semantics of partiality:

  1. \(\mathop {\mathsf {PtoT}}(C) = C[\forall f.C\mapsto \lnot G\sqcup \forall f.(C\sqcap G) \text{ for } f\in F]\), for the original semantics,

  2. \(\mathop {\mathsf {PtoT}}(C) = C[\exists {f}\mapsto \lnot G\sqcup \forall f.G, \text{ for } f\in F]\) for the \(\mathcal{ALC}\)-style semantics.

Now we can define a partial to total TBox mapping

$$ \mathcal {T}_{\mathrm {total}} = \{ G \sqsubseteq \mathop {\mathsf {PtoT}}(C) \mid \top \sqsubseteq C\in \mathcal {T}_{\mathrm {partial}}\}\cup \{\forall f.G\sqsubseteq G \mid f\in F\}, $$

and show:

Theorem 22

([41]). Let \(\mathcal {T}_{\mathrm {partial}}\) be a \(\textit{partial-}\mathcal{DLFI}\) TBox in which all inclusion dependencies are of the form \(\top \sqsubseteq C\) with C in negation normal form. Then

$$\mathcal {T}_{\mathrm {partial}}\models \top \sqsubseteq C \iff \mathcal {T}_{\mathrm {total}}\models G\sqsubseteq \mathop {\mathsf {PtoT}}(C),$$

for G a fresh primitive concept.

To extend this construction to the full \(\textit{partial-}\mathcal{DLFDI}\) logic, it is sufficient to encode the path function existence preconditions of PFDs in terms of the auxiliary concept G as follows: if \(A\sqsubseteq B:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits _0\in \mathcal {T}_{\mathrm {partial}}\) then

$$\begin{aligned} A\sqcap (\bigsqcap _{i=0}^k\forall \mathop {\mathsf {Pf}}\nolimits _i.G)\sqsubseteq B\sqcap (\bigsqcap _{i=0}^k\forall \mathop {\mathsf {Pf}}\nolimits _i.G):\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits _0 \end{aligned}\tag{1}$$

is added to \(\mathcal {T}_{\mathrm {total}}\). Here, we are assuming w.l.o.g. that A and B are primitive concept names (\(\mathcal{DLFD}\) allows one to give such names to complex concepts).

Theorem 23

([41]). Let \(\mathcal {T}_{\mathrm {partial}}\) be a \(\textit{partial-}\mathcal{DLFDI}\) TBox in which all inclusion dependencies are of the form \(\top \sqsubseteq C\) or \(A\sqsubseteq B:\mathop {\mathsf {Pf}}\nolimits _1,\ldots ,\mathop {\mathsf {Pf}}\nolimits _k\rightarrow \mathop {\mathsf {Pf}}\nolimits _0\). Then

figure c

for G a fresh primitive concept.

This result can also be extended to the logic \(\mathcal{DLFDE}^-\) by appropriately transforming the posed question with respect to the strict interpretation of equational constraints.

Partiality in Tractable FunDL. A similar construction can be used to accommodate partial features in tractable \(\text {FunDL}\) dialects. However, there is a need to accommodate the various restrictions in these logics that guarantee tractability. Hence, we assume that we will be given a \(\textit{partial-}\mathcal{CFDI}_{k\textit{c}}\) TBox \(\mathcal {T}_{\mathrm {partial}}\) in a normal form, and that the semantics of value restrictions is the same in both the partial and the total logic. We then derive a \(\mathcal{CFDI}_{(k+1)\textit{c}}\) TBox \(\mathcal {T}_{\mathrm {total}}\) by applying the following rules:

figure d

and by adding the inclusion dependency \(\forall f.G\sqsubseteq G\) to \(\mathcal {T}_{\mathrm {total}}\) for each feature.

Conversely, value restrictions in more traditional role-based description logics, such as \(\mathcal{ALC}\), also cover the vacuous cases, containing objects for which f is undefined (in addition to the above). This definition unfortunately leads to computational difficulties: the disjunctive nature of such a value restriction, when used on left-hand-sides of inclusion dependencies, destroys the canonical model property of the logic. This leads to intractability of query answering as shown by Calvanese et al. [12]. To regain tractability, it becomes necessary to restrict the use of value restrictions on the left-hand-side of inclusion dependencies. In a normal form, the C grammar for left-hand-side concepts must replace \(\forall f.A\) with \(\forall f.A\sqcap \exists {f}\). This leads to alternative rules when simulating the partial-feature logic in the total-feature counterpart, i.e.,

figure e

The technique for treating posed questions [40] extends to \(\textit{partial-}\mathcal{CFDI}_{k\textit{c}}\) and yields the following:

Theorem 24

([40]). Let \(\mathcal {T}_{\mathrm {partial}}\) be a \(\textit{partial-}\mathcal{CFDI}_{k\textit{c}}\) TBox, \(\mathcal {Q}_{\mathrm {partial}}\) a posed question, and \(\mathcal {T}_{\mathrm {total}}\) be defined as above. Then \(\mathcal {T}_{\mathrm {total}}\) is a \(\mathcal{CFDI}_{(k+1)\textit{c}}\) TBox and

$$\mathcal {T}_{\mathrm {partial}}\models \mathcal {Q}_{\mathrm {partial}} \iff \mathcal {T}_{\mathrm {total}}\models \mathcal {Q}_{\mathrm {total}},$$

where \(\mathcal {Q}_{\mathrm {total}}\) is effectively constructed from \(\mathcal {Q}_{\mathrm {partial}}\) by adding appropriate conjunctions with G concepts.

Since \(|\mathcal {Q}_{\mathrm {total}}|\) is linear in \(|\mathcal {Q}_{\mathrm {partial}}|\), this provides a tractable decision procedure for logical implication in \(\textit{partial-}\mathcal{CFDI}_{k\textit{c}}\). An analogous result involving \(\textit{partial-}\mathcal{CFDI}_{k\textit{c}}\) knowledge base reasoning was studied in [26].

5.2 Simulating Roles and Role Constructors

It is well known that unrestricted use of role functionality with role hierarchies, e.g., DL-Lite\(^\mathcal{HF}_{\mathrm {core}}\), leads to intractability [2, 10]. Conversely, the ability to reify roles would seem to enable capturing a limited variety of role hierarchies.\(^{2}\)

Consider roles R and S and the corresponding primitive concepts \(C_{R}\) and \(C_{S}\), respectively, and assume that the domains and ranges of the reified roles are captured by the features \(\text{ dom }\) and \(\text{ ran }\) common to both the reified roles. Subsumption and disjointness of these roles can then be captured as follows:

figure f

assuming that the reified role R (and analogously S) also satisfies the key constraint \(C_{R}\sqsubseteq C_{R}:\text{ dom },\text{ ran }\rightarrow \mathop { id }\nolimits \). Such a reduction does not lend itself to capturing role hierarchies between roles and inverses of roles (due to fixing the names of the features \(\text{ dom }\) and \(\text{ ran }\)).
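The role of the key constraint in reification can be illustrated on finite data. The sketch below is an illustration under assumptions of our own: each instance of a role R becomes a fresh object with \(\text{ dom }\) and \(\text{ ran }\) features stored as dictionaries, and the helper names `reify` and `satisfies_key` are hypothetical.

```python
from typing import Dict, List, Tuple

Feature = Dict[str, str]


def reify(pairs: List[Tuple[str, str]]) -> Tuple[Feature, Feature]:
    """Create one fresh object per listed pair, with dom and ran features."""
    dom: Feature = {}
    ran: Feature = {}
    for i, (x, y) in enumerate(pairs):
        obj = f"r{i}"                 # hypothetical name of the reified object
        dom[obj], ran[obj] = x, y
    return dom, ran


def satisfies_key(dom: Feature, ran: Feature) -> bool:
    """C_R ⊑ C_R : dom, ran -> id: no two distinct objects agree on both."""
    seen = set()
    for obj in dom:
        key = (dom[obj], ran[obj])
        if key in seen:
            return False
        seen.add(key)
    return True


print(satisfies_key(*reify([("a", "b"), ("a", "c")])))  # True
print(satisfies_key(*reify([("a", "b"), ("a", "b")])))  # False
```

The second call reifies the same pair twice, which the key constraint rules out; this is what ensures a set semantics for the reified role.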

Moreover, for tractable fragments of \(\text {FunDL}\), a condition introduced earlier, governing the interactions between inverse features and value restrictions, introduces additional interactions that interfere with (simulating) role hierarchies, in particular in cases when mandatory participation constraints are present. Consider again roles \(R_1\) and \(R_2\) and the corresponding primitive concepts \(C_{R_1}\) and \(C_{R_2}\), respectively, and associated constraints that declare typing for the roles,

figure g

originating, e.g., from an ER diagram postulating that entity sets \(A_i\) and \(B_i\) participate in a relationship \(R_i\) (for \(i=1,2\)). Now consider a situation where the participation of \(A_i\) in \(R_i\) is mandatory (expressed, e.g., as \(A_i\sqsubseteq \exists R_i\) in DL-Lite). This leads to the following constraints:

figure h

The earlier condition governing the use of inverse roles then requires that one of

figure i

is present in the TBox. The first (and second) conditions imply that \(C_{R_1}\sqsubseteq C_{R_2}\) (\(C_{R_2}\sqsubseteq C_{R_1}\), respectively). The third condition states that the domains of (the reified versions of) \(R_1\) and \(R_2\) are disjoint, hence the roles themselves must also be disjoint. Hence, in the presence of \(C_{R_1}\sqsubseteq C_{R_2}:\text{ dom },\text{ ran }\rightarrow \mathop { id }\nolimits \), the concepts \(C_{R_1}\) and \(C_{R_2}\) must also be disjoint.

All this shows that some form of role hierarchies can be accommodated in \(\text {FunDL}\) dialects. However:

  1. only primitive roles can be captured (i.e., capturing inverse roles will not be possible), and

  2. when tractability is required, only role forests can be captured, that is, for each pair of roles participating in the same role hierarchy, one must be a super-role of the other or their domain and range features must be distinct.

The first restriction originates in the way (binary) roles are reified—by assigning canonically-named features. This prevents modelling constraints such as \(R\sqsubseteq R^-\) (which would seem to require simple equational constraints for feature renaming). The second condition is essential to maintaining tractability of reasoning [38]. Note, however, that no such restriction is needed for roles that do not participate in the same role hierarchy; this is achieved by appropriate choice of names for the features \(\text{ dom }\) and \(\text{ ran }\).

Last, our approach to role hierarchies can easily be extended to handling hierarchies of higher-arity non-homogeneous relationships (again, via reification and appropriate naming of features) that originate, e.g., from relating the aggregation constructs via inheritance in the EER model [27, 28]. The reification-based approach differs from approaches to modelling higher-arity relationships directly in the underlying description logic, such as \(\mathcal{DLR}\) [13, 14], in which only homogeneous relationships can be related in hierarchies. This is due to the positional nature of referring to components of such relationships in lieu of using arguably more flexible keywords (realized by features in \(\text {FunDL}\)).

5.3 ABoxes, Knowledge Bases, and Consistency

First we consider the issue of knowledge bases, combinations of terminological knowledge (TBoxes) with factual assertions about particular objects (ABoxes).

Definition 25

(ABoxes and Knowledge Bases). A knowledge base \(\mathcal K\) is defined by a TBox \(\mathcal T\) and an ABox \(\mathcal A\) consisting of a finite set of facts in the form of concept assertions \(A(a)\), basic function assertions \(f(a)=b\) and path function assertions \(\mathop {\mathsf {Pf}}\nolimits _1(a) = \mathop {\mathsf {Pf}}\nolimits _2(b)\). \(\mathcal A\) is called a primitive ABox if it consists only of concept and basic function assertions. Semantics is extended to interpret individuals a to be elements of \(\triangle \). An interpretation \(\mathcal I\) satisfies a concept assertion \(A(a)\) if \((a)^\mathcal{I} \in (A)^\mathcal{I}\), a basic function assertion \(f(a)=b\) if \((f)^\mathcal{I}((a)^\mathcal{I}) = (b)^\mathcal{I}\) and a path function assertion \(\mathop {\mathsf {Pf}}\nolimits _1(a) = \mathop {\mathsf {Pf}}\nolimits _2(b)\) if \((\mathop {\mathsf {Pf}}\nolimits _1)^\mathcal{I}((a)^\mathcal{I}) = (\mathop {\mathsf {Pf}}\nolimits _2)^\mathcal{I}((b)^\mathcal{I})\). \(\mathcal I\) satisfies a knowledge base \(\mathcal K\) if it satisfies each inclusion dependency and assertion in \(\mathcal K\), and also satisfies UNA if, for any two distinct individuals a and b occurring in \(\mathcal K\), \((a)^\mathcal{I} \not = (b)^\mathcal{I}\). \(\Box \)

A standard reasoning problem for knowledge bases is the consistency problem, the question whether a knowledge base has a model. We relate this problem to the logical implication problems for \(\text {FunDL}\) dialects that admit equational constructs in the posed questions. It turns out that either capacity alone is sufficient: each is able to effectively simulate the other [21].

ABoxes vs. Equalities in Posed Questions. Intuitively, path equations can enforce that an arbitrary finite graph (with feature-labeled edges and concept description-labeled nodes) is a part of any model that satisfies the equations. Such a graph can equivalently be enforced by an ABox. Hence we have:

Theorem 26

([21]). Let \(\mathcal{T}\) be a \(\mathcal{DLFD}\) terminology and \(\mathcal{A}\) an ABox. Then there is a concept E such that \( \mathcal{T}\cup \mathcal{A}\text { is not consistent if and only if } \mathcal{T}\models E\sqsubseteq \bot .\)

Conversely, it is also possible to show that ABox reasoning can be used for reasoning about equational constraints in the posed questions. However, as the equational concepts are closed under Boolean constructors, a single equational problem may need to map to several ABox consistency problems.

Theorem 27

([21]). Let \(\mathcal{T}\) be a \(\mathcal{DLFD}\) terminology and E an equational concept. Then there is a finite set of ABoxes \(\{\mathcal{A}_i: 0<i\le k \}\) such that

$$ \mathcal{T}\models E\sqsubseteq \bot \text { iff } \mathcal{T}\cup \mathcal{A}_i \text { is not consistent for all }0<i\le k.$$

Theorems 26 and 27 hold even when the terminology \(\mathcal{T}\) is a \(\mathcal{DLF}\) TBox (i.e., does not contain any occurrences of the PFD concept constructor), and they extend to the tractable \(\text {FunDL}\) dialects \(\mathcal{CFD}\) and \(\mathcal{CFDI}_{k\textit{c}}\). In the tractable case, posed question E concepts must be limited to retain a PTIME upper bound in the size of the posed question (Sect. 4 has the details).

5.4 Query Answering

Conjunctive queries (CQ) are, as usual, formed from atomic queries (or atoms) of the form C(x) and \(x.\mathop {\mathsf {Pf}}\nolimits _1=y.\mathop {\mathsf {Pf}}\nolimits _2\), where x and y are variables, using conjunction and existential quantification. To simplify notation, we conflate a conjunctive query with the set of its constituent atoms and a set of answer variables. Given a knowledge base (KB) consisting of a TBox and ABox expressed in terms of a tractable \(\text {FunDL}\) dialect, our goal is to compute the so-called certain answers:

Definition 28

(Certain Answer). Let \(\mathcal {K}\) be a KB over a tractable \(\text {FunDL}\) dialect and \(Q = \{ \bar{x}\mid \varphi \}\) a CQ. A certain answer to Q over \(\mathcal {K}\) is a substitution of constant symbols \(\bar{a}\), \([\bar{x}\mapsto \bar{a}]\), such that \(\mathcal {K}\models Q[\bar{x}\mapsto \bar{a}]\). \(\Box \)

Computing certain answers in this case requires a combination of perfect rewriting [10] and of the combined approach [22, 36]. The latter is necessary because tractable \(\text {FunDL}\) dialects are complete for PTIME and first-order rewriting alone followed by evaluating the rewritten query over the ABox will not suffice. The former is necessary to avoid the need for exponentially many anonymous objects in an ABox completion (unlike \(\mathcal{EL}\) logics in which there is a need for only polynomially many such objects).

This approach was introduced for \(\mathcal{CFDI}_\textit{nc}\) in [17, 18] and the two steps are realized by two procedures:

  1. \(\mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A})\): this procedure applies consequences of the TBox \(\mathcal {T}\) to the ABox \(\mathcal {A}\). In particular, concept membership is fully determined for all ABox individuals. For example, if \(\{A(a), f(a)=b, f(b)=c,\ldots \}\subseteq \mathcal {A}\) and \(\mathcal {T}\models A\sqsubseteq \forall f.A\), we require \(\{A(b),A(c),\ldots \}\subseteq \mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A})\). (Indeed, propagating concepts along paths that exist in an ABox is the reason why perfect rewriting alone will not suffice in tractable \(\text {FunDL}\) dialects.)

  2. 2.

    \(\mathsf {Fold}_\mathcal {T}(Q)\): this procedure rewrites an input CQ to an union of CQs that account for the constraints in \(\mathcal {T}\) that postulate existence of anonymous objects in all models of the knowledge base. A (slight simplification of a) typical rule applied during such a rewriting looks as follows:

    If \(\{y.f=x, A(y)\}\subseteq \psi \) and y does not appear elsewhere in \(\psi \) nor is an answer variable, then \(\mathsf {Fold}(Q):= \mathsf {Fold}(Q) \cup \{\{\bar{y}\mid \psi _i\}\}\) for all \(\psi _i = \psi -\{y.f=x, A(y)\} \cup \{B_{i}(x)\}\), where \(B_{i}\) are all maximal primitive concepts w.r.t. \(\sqsubseteq \) satisfying the logical implication conditions \(\mathcal {T}\models B_{i}\sqsubseteq \exists {f}^{-1}\) and \(\mathcal {T}\models \forall f.B_{i}\sqsubseteq {A}\).

    The rule states that whenever the variable y is connected to the rest of the query via a single feature f, it may be mapped to an anonymous individual. This is accommodated by the query \(\psi _i\) that no longer uses the variable y, but implies \(\psi \) since the existence of the necessary individual is implied by the TBox \(\mathcal {T}\) and the \(B_i(x)\) atom in \(\psi _i\).
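The folding rule above can be illustrated with a minimal Python sketch. The tuple encoding of query atoms, the function name `fold_step`, and the `witnesses` table are all assumptions made for illustration. In particular, the table is assumed to already list, for each pair \((f, A)\), the maximal primitive concepts \(B_i\) satisfying the two implication conditions; computing it requires TBox reasoning that the sketch does not attempt.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding of query atoms (not from the source):
#   ("feat", y, f, x)   stands for  y.f = x
#   ("concept", A, y)   stands for  A(y)
Atom = Tuple[str, ...]
Body = FrozenSet[Atom]


def terms_of(atom: Atom) -> Set[str]:
    """Return the variables or constants mentioned by an atom."""
    return {atom[1], atom[3]} if atom[0] == "feat" else {atom[2]}


def fold_step(body: Body, answer_vars: Set[str],
              witnesses: Dict[Tuple[str, str], List[str]]) -> Set[Body]:
    """Apply the simplified folding rule once, in every possible way.

    witnesses[(f, A)] is assumed to list the maximal primitive concepts B_i
    with T |= B_i [= exists f^-1 and T |= forall f.B_i [= A.
    """
    folded: Set[Body] = set()
    for feat in body:
        if feat[0] != "feat":
            continue
        _, y, f, x = feat
        if y in answer_vars or y == x:
            continue
        for con in body:
            if con[0] != "concept" or con[2] != y:
                continue
            rest = body - {feat, con}
            # y must not occur anywhere else in the query body.
            if any(y in terms_of(atom) for atom in rest):
                continue
            for b in witnesses.get((f, con[1]), []):
                folded.add(rest | {("concept", b, x)})
    return folded


# Q = { x | y.f = x, A(y), C(x) } with two hypothetical witnesses B1 and B2.
query = frozenset({("feat", "y", "f", "x"),
                   ("concept", "A", "y"),
                   ("concept", "C", "x")})
disjuncts = fold_step(query, {"x"}, {("f", "A"): ["B1", "B2"]})
```

Here `disjuncts` holds the two bodies \(\{C(x), B_1(x)\}\) and \(\{C(x), B_2(x)\}\), neither of which mentions y. A full rewriting would iterate such steps to a fixpoint and keep the original query in the union.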

Note that the query rewriting assumes a completed ABox. As a consequence, the rewriting produces fewer disjuncts, since only maximal concepts need to be retained.
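To make the notion of a completed ABox concrete, the following Python sketch propagates concept membership along feature assertions, as in the example with \(A\sqsubseteq \forall f.A\) given for the completion procedure. It is only a fragment of what a real completion must do: the function name `complete` and the data encoding are hypothetical, and only inclusions of the form \(A\sqsubseteq \forall f.B\) are handled (no PFDs, no other TBox consequences).

```python
from typing import Dict, Set, Tuple

ConceptFact = Tuple[str, str]            # (A, a) stands for A(a)
FeatureMap = Dict[Tuple[str, str], str]  # (a, f) -> b stands for f(a) = b
ValueRestriction = Tuple[str, str, str]  # (A, f, B) stands for A [= forall f.B


def complete(concepts: Set[ConceptFact], features: FeatureMap,
             value_restrictions: Set[ValueRestriction]) -> Set[ConceptFact]:
    """Close the concept assertions under the given value restrictions."""
    done = set(concepts)
    worklist = list(done)
    while worklist:
        concept, individual = worklist.pop()
        for lhs, f, rhs in value_restrictions:
            if lhs != concept:
                continue
            target = features.get((individual, f))
            # Add rhs(target) once; the 'done' set guarantees termination,
            # even when the feature assertions form a cycle.
            if target is not None and (rhs, target) not in done:
                done.add((rhs, target))
                worklist.append((rhs, target))
    return done


# A(a), f(a) = b, f(b) = c and the inclusion A [= forall f.A.
completed = complete({("A", "a")},
                     {("a", "f"): "b", ("b", "f"): "c"},
                     {("A", "f", "A")})
```

The result contains \(A(a)\), \(A(b)\) and \(A(c)\). Each concept assertion is added at most once, so the output stays polynomial in the size of the input ABox for a fixed TBox.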

Theorem 29

([40]). Let \(\mathcal {K}=(\mathcal {T},\mathcal {A})\) be a \(\mathcal{CFDI}_\textit{nc}\) knowledge base and Q a conjunctive query. Then

$$\mathcal {K}\models Q[\bar{x}\mapsto \bar{a}] \iff (\emptyset ,\mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A}))\models \mathsf {Fold}_\mathcal {T}(Q)[\bar{x}\mapsto \bar{a}].$$

Note that \((\emptyset ,\mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A}))\models \mathsf {Fold}_\mathcal {T}(Q)[\bar{x}\mapsto \bar{a}]\) reduces to evaluating the query \(\mathsf {Fold}_\mathcal {T}(Q)\) over a finite relational structure \(\mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A})\). Tractability (in \(|\mathcal {K}|\)) then follows from \(|\mathop {\mathsf {Completion}}\nolimits _{\mathcal {T}}(\mathcal {A})|\) being polynomial in \(|\mathcal {A}|\) and the fact that reasoning in \(\mathcal {K}\) is in PTIME. This approach was later extended to other tractable dialects of \(\text {FunDL}\) including logics with partial features up to and including \(\textit{partial-}\mathcal{CFDI}_\textit{kc}\) [26].
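The right-hand side of Theorem 29 is ordinary query evaluation, which the following Python sketch illustrates with a deliberately naive evaluator for unions of CQs over a finite structure. The atom encoding, the function names, and the sample data are hypothetical. The second disjunct stands in for a folded query of the kind the rewriting would produce, and the sketch assumes that the union retains the original query.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# ("feat", y, f, x) stands for y.f = x; ("concept", A, y) stands for A(y).
Atom = Tuple[str, ...]
Body = FrozenSet[Atom]


def holds(atom: Atom, env: Dict[str, str], concepts, features) -> bool:
    """Check one atom under an assignment; constants denote themselves."""
    if atom[0] == "feat":
        _, y, f, x = atom
        return features.get((env.get(y, y), f)) == env.get(x, x)
    _, name, y = atom
    return (name, env.get(y, y)) in concepts


def eval_cq(body: Body, answer_vars: Tuple[str, ...], concepts, features,
            individuals: Set[str]) -> Set[Tuple[str, ...]]:
    """Try every assignment of variables to individuals (brute force)."""
    terms: Set[str] = set()
    for atom in body:
        terms |= {atom[1], atom[3]} if atom[0] == "feat" else {atom[2]}
    variables = sorted(terms - individuals)
    answers = set()
    for values in product(sorted(individuals), repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(holds(atom, env, concepts, features) for atom in body):
            answers.add(tuple(env.get(v, v) for v in answer_vars))
    return answers


def eval_ucq(bodies: Iterable[Body], answer_vars, concepts, features,
             individuals) -> Set[Tuple[str, ...]]:
    """A union of CQs is answered by the union of the disjuncts' answers."""
    result = set()
    for body in bodies:
        result |= eval_cq(body, answer_vars, concepts, features, individuals)
    return result


# A hypothetical completed ABox: A(a), A(b), A(c), B(d), f(a) = b, f(b) = c.
concepts = {("A", "a"), ("A", "b"), ("A", "c"), ("B", "d")}
features = {("a", "f"): "b", ("b", "f"): "c"}
individuals = {"a", "b", "c", "d"}

original = frozenset({("feat", "y", "f", "x"), ("concept", "A", "y")})
folded = frozenset({("concept", "B", "x")})  # y mapped to an anonymous object
answers = eval_ucq([original, folded], ("x",), concepts, features, individuals)
```

The individual d is returned even though no ABox individual has d as its f-value: the folded disjunct accounts for the anonymous predecessor. The brute-force loop is exponential in the number of query variables but polynomial in the size of the structure for a fixed query.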

6 Related Work

Recall that \(\mathcal{CFDI}_\textit{nc}\) is a tractable \(\text {FunDL}\) dialect in which left-hand-sides of inclusion dependencies exclude the use of negation as well as conjunction. The possibility of the Krom extension of this dialect, that readmits negation, has also been explored [39]. Tractability is still possible, but requires TBoxes to be free of non-key PFDs, requires ABoxes to be primitive, and requires the adoption of UNA. (Relaxing any of these conditions leads to intractability.)

We have also considered how concepts in \(\text {FunDL}\) dialects can replace constants in an ABox as a way of referring to entities or objects. Indeed, the judicious adoption of features instead of roles in these dialects makes it easy for an ABox to be a window on factual data in backend object-relational data sources. Coupled with the notion of referring expression types, this overall development pays off nicely in ontology-based data access and in relating conceptual and object-relational database design in information systems [6, 7].

A short review of ways in which PFDs themselves have been generalized completes our survey.

Path Order Dependencies. PFDs can be viewed as a variety of tuple generating dependencies in which equality is the only predicate occurring on the right-hand-side. The possibility that any comparison operator can be used instead has also been investigated. In particular, so-called guarded order dependencies can be added to the expressive \(\text {FunDL}\) dialect \(\mathcal{DLF}\) without impacting the complexity of logical implication [29]. For our introductory university TBox, a correlation between \(\textit{gpa}\) and \(\textit{mark}\) can be expressed by such a dependency:

$$ \text {TAKES} \sqsubseteq \text {TAKES}:\textit{class}^=, \textit{mark}^<\rightarrow \textit{student}.\textit{gpa}^\le $$

The dependency asserts that the grade point average of a student is never greater than that of another student when there is some class they have both taken in which the latter student obtained a better grade.
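As an illustration of what this dependency demands of data, the following Python sketch checks a finite set of TAKES objects for violations. It is a check over given data rather than a decision procedure for logical implication. The record type `Takes`, its field names, and the sample values are hypothetical, and the `gpa` field stores the value of the path \(\textit{student}.\textit{gpa}\) directly.

```python
from itertools import permutations
from typing import List, NamedTuple, Tuple


class Takes(NamedTuple):
    student: str
    klass: str   # value of the feature 'class'
    mark: int
    gpa: float   # value of the path student.gpa


def violations(takes: List[Takes]) -> List[Tuple[Takes, Takes]]:
    """Return ordered pairs (t1, t2) that agree on class, have
    t1.mark < t2.mark, and yet fail t1.student.gpa <= t2.student.gpa."""
    return [(t1, t2) for t1, t2 in permutations(takes, 2)
            if t1.klass == t2.klass
            and t1.mark < t2.mark
            and not t1.gpa <= t2.gpa]


consistent = [Takes("s1", "cs101", 70, 2.9), Takes("s2", "cs101", 85, 3.4)]
inconsistent = consistent + [Takes("s3", "cs101", 90, 3.0)]

ok = violations(consistent)      # no violating pair
bad = violations(inconsistent)   # s3 beat s2 in cs101 but has the lower gpa
```

Only pairs that agree on the class are compared, which mirrors the \(\textit{class}^=\) component of the dependency.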

Regular Path Functional Dependencies. Left and right-hand-sides of PFDs can be viewed as instances of finite regular languages. The possibility of allowing these languages to be defined by regular expressions admitting the Kleene closure operator has also been investigated. In particular, regular path functional dependencies were introduced in [30], and more general regular path order dependencies in [33], and, in both cases, were shown to not impact the complexity of logical implication when added to \(\mathcal{DLF}\). This remains the case when value restrictions are also generalized by allowing component path expressions to be given by regular expressions. For example, to ensure that every professor eventually reports to a dean, one can now add the inclusion dependencies

$$ \text {DEAN} \sqsubseteq \text {CHAIR} \text{ and } \text {PROF} \sqsubseteq \lnot \forall \textit{reports}*.\lnot \text {DEAN} $$

to the university TBox.
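The eventuality expressed by \(\lnot \forall \textit{reports}*.\lnot \text {DEAN}\) can be illustrated over finite data with the following Python sketch. The function name, the dictionary encoding of the \(\textit{reports}\) feature, and the sample individuals are hypothetical. The sketch also treats a missing \(\textit{reports}\) value as the end of the chain, which is an assumption made only for finite data, since features in \(\mathcal{DLF}\) are total.

```python
from typing import Dict, Set


def eventually_reports_to_dean(start: str, reports: Dict[str, str],
                               deans: Set[str]) -> bool:
    """Follow the 'reports' feature from 'start' until a dean is reached.

    The Kleene star admits the empty path, so an individual who is a dean
    satisfies the condition immediately.
    """
    seen: Set[str] = set()
    current = start
    while current is not None and current not in seen:
        if current in deans:
            return True
        seen.add(current)               # guards against reporting cycles
        current = reports.get(current)  # None once the chain runs out
    return False


reports = {"p1": "c1", "c1": "d1", "p2": "p3", "p3": "p2"}
deans = {"d1"}

reaches = eventually_reports_to_dean("p1", reports, deans)  # p1 -> c1 -> d1
loops = eventually_reports_to_dean("p2", reports, deans)    # p2 <-> p3
```

Because a feature is a function, each individual has at most one chain to follow, so the check visits every individual at most once.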

Temporal Path Functional Dependencies. Finally, adding both a temporal variety of PFDs and a global modal operator (\(\Box \)) to \(\mathcal{DLF}\) is also possible without impact on the complexity of logical implication [34]. This enables adding the inclusion dependency

$$ \text {PERSON} \sqsubseteq (\Box _{\text {forever}} \text {PERSON}) \sqcap (\text {PERSON}: \mathop { id }\nolimits \rightarrow _{\text {forever}} \textit{name}) $$

to the university TBox to ensure that a person is always a person and that the name of a person never changes. Adding the inclusion dependency

$$ \text {DEPT} \sqsubseteq (\text {DEPT}: \mathop { id }\nolimits \rightarrow _{\text {term}} \textit{head}) \sqcap (\text {DEPT}: \textit{head} \rightarrow _{\text {term}} \mathop { id }\nolimits ) $$

would ensure that a professor is the unique head of a department for a fixed term. However, it is not possible to add any form of eventuality together with temporal PFDs to \(\mathcal{DLF}\) (e.g., by also adding regular PFDs) and at the same time retain EXPTIME complexity of logical implication for \(\mathcal{DLF}\) itself [34].