1 Introduction

Ontology-mediated query answering (OMQA) has generated a lot of interest in the last years as a promising way of facilitating access to data (see [4] for a recent survey). In the OMQA approach, the ontology serves to define a conceptual view of an application domain, introducing a convenient vocabulary for query formulation and providing background knowledge that is exploited at query time to obtain the complete set of answers. So far, the vast majority of research on OMQA has considered user queries in the form of conjunctive queries (CQs), which are a standard query language for relational databases. However, in numerous application scenarios, data can naturally be seen as graphs, in which case so-called navigational queries are considered more suitable. The basic navigational query language is regular path queries (RPQs) [11], which allow one to find paths whose labels conform to a given regular language.

In recent years, the problem of answering navigational queries in the setting of OMQA has begun to be explored, first for ontologies formulated in highly expressive description logics (DLs) of the \(\mathcal {Z}\) family [810], then for rich Horn DLs like Horn-\(\mathcal {SROIQ}\) [18], and more recently, for lightweight DLs like DL-Lite\(_\mathcal {R}\) and \(\mathcal {EL}\) [5, 19]. The latter DLs, which underlie the OWL 2 QL and EL profiles, are the most relevant for OMQA due to their favourable computational properties. In addition to plain RPQs, this line of work has also considered richer navigational languages like conjunctive RPQs (which extend both RPQs and CQs) and extensions with nesting and/or negation [3, 6, 15]. Although much work remains to be done in developing and implementing efficient algorithms, the complexity landscape for answering various forms of path queries over DL knowledge bases is now rather well understood. The same cannot be said for ontologies formulated by means of decidable classes of existential rules (like linear and guarded rulesets), which constitute another important class of ontology languages [1, 7]. A key feature that distinguishes existential rules from DLs is the possibility of using predicates of arity greater than two. Since regular path queries are defined only with respect to unary and binary predicates, one might wonder whether they make sense in higher arity settings. We argue however that unary and binary predicates form the backbone of real-world ontologies (irrespective of the choice of ontology language), and it is desirable to be able to use some higher-arity predicates without losing any expressivity in the query language.

In this paper, we take a step towards a better understanding of the combination of navigational query languages and existential rules by studying the complexity of answering two-way RPQs in the presence of linear rules, a well-studied class of existential rules that are a natural generalization of the DL-Lite description logics. After introducing the necessary background, we show how to adapt the RPQ algorithm for DL-Lite proposed in [5] to the setting of linear rules. Unfortunately, our adaptation incurs an exponential blow-up with respect to the maximum predicate arity. We can nevertheless show that the obtained algorithm is worst-case optimal, as RPQ answering is ExpTime-complete in combined complexity.

2 Preliminaries

We adopt the notation of [13]. The notions of constants, function symbols and predicate symbols are standard. Each function or predicate symbol is associated with a nonnegative integer arity. Variables, terms, substitutions, atoms, first-order formulae, sentences, interpretations (i.e., structures), and models are defined as usual. By a slight abuse of notation, we often identify a conjunction with the set of its conjuncts. Furthermore, we often abbreviate a vector of terms \(t_1,\ldots ,t_n\) as \(\mathbf {t}\), and define \(|\mathbf {t}|=n\). By \(\varphi \sigma \) we denote the result of applying a substitution \(\sigma \) to \(\varphi \). A term, atom, or formula is ground if it does not contain variables; a fact is a ground atom. A term \(t'\) is a subterm of a term t if \(t'=t\) or \(t = f(\mathbf {s})\) where f is a function and \(t'\) is a subterm of some \(s_i \in \mathbf {s}\). A term s is contained in an atom \(p(\mathbf {t})\) is \(s \in \mathbf {t}\), and s occurs in \(p(\mathbf {t})\) if s is a subterm of some term \(t_i \in \mathbf {t}\); thus, if s is contained in \(p(\mathbf {t})\), s occurs in \(p(\mathbf {t})\), but the converse may not hold. A term s is contained (resp. occurs) in a set of atoms I if s is contained (resp. occurs) in some atom in I. Let \(T = \{t_1,\ldots ,t_n\}\) be a set of terms. A term t is generated by T if (i) \(t \in T\) or (ii) \(t = f(x_1,\ldots ,x_k)\) and all the \(x_k\) are generated by T. An instance is a finite set of function-free facts. The terms appearing in an instance (resp. atom) are denoted by \(\textit{terms}{(I)}\) (resp. \(\textit{terms}{(\alpha )}\)).

Existential Rules. An existential rule (or just rule) takes the form:

$$\begin{aligned} \forall \mathbf {x}\forall \mathbf {z}.[\varphi (\mathbf {x},\mathbf {z}) \rightarrow \exists \mathbf {y}.\psi (\mathbf {x},\mathbf {y})], \end{aligned}$$

where \(\varphi (\mathbf {x},\mathbf {z})\) and \(\psi (\mathbf {x},\mathbf {y})\) are non-empty conjunctions of function-free atoms, and tuples of variables \(\mathbf {x}, \mathbf {y}\) and \(\mathbf {z}\) are pairwise disjoint. We call \(\varphi \) the body and \(\psi \) the head of the rule. For brevity, quantifiers are often omitted.

We frequently use Skolemisation to interpret rules in Herbrand interpretations, which are defined as possibly infinite sets of facts. In particular, for each rule \(\rho \) and each variable \(y_i \in \mathbf {y}\), let \(f_\rho ^i\) be a function symbol globally unique for \(\rho \) and \(y_i\) of arity \(|\mathbf {x}|\); furthermore, let \(\theta _{sk}\) be the substitution such that \(\theta _{sk}(y_i) = f_\rho ^i(\mathbf {x})\) for each \(y_i \in \mathbf {y}\). Then, the Skolemisation \(\textit{sk}{(\rho )}\) of \(\rho \) is the following rule: \(\varphi (\mathbf {x},\mathbf {z}) \rightarrow \psi (\mathbf {x},\mathbf {y})\theta _{sk}.\)

A linear rule is an existential rule whose body is restricted to a single atom. For ease of presentation, we will consider only rules without any constants. As usual, we also assume that rules have only a single atom in the head. This can be done without loss of generality.

Skolem Chase. The chase [14, 16] (or canonical model) is a classical tool in OMQA. In this paper, we use the Skolem chase variant [17]. Let \(\rho = \varphi \rightarrow \psi \) be a Skolemised rule, and let I be a set of facts. A set of facts S is a consequence of \(\rho \) on I if a substitution \(\sigma \) exists that maps the variables in \(\rho \) to the terms occurring in I (denoted by \(\textit{terms}{(I)}\)) such that \(\varphi \sigma \subseteq I\) and \(S \subseteq \psi \sigma \). The result of applying \(\rho \) to I, written \(\rho (I)\), is the union of all consequences of \(\rho \) on I. If \(\varOmega \) is a set of Skolemised rules, we set \(\varOmega (I) = \bigcup _{\rho \in \varOmega } \rho (I)\). Let I be a finite set of facts, let \(\mathcal {R}\) be a set of rules, let \(\mathcal {R}' = \textit{sk}{(\mathcal {R})}\), and let \(\mathcal {R}'_f\) and \(\mathcal {R}'_n\) be the subsets of \(\mathcal {R}'\) containing rules with and without function symbols, respectively. The chase sequence for I and \(\mathcal {R}\) is a sequence of sets of facts \(I_\mathcal {R}^0,I_\mathcal {R}^1,\ldots ,\) where \(I_\mathcal {R}^0 = I\) and for each \(i > 0\), set \(I_\mathcal {R}^i\) is defined as follows:

  • if \(\mathcal {R}'_n(I_\mathcal {R}^{i-1}) \not \subseteq I_\mathcal {R}^{i-1}\), then \(I_\mathcal {R}^i = I_\mathcal {R}^{i-1} \cup \mathcal {R}'_n(I_\mathcal {R}^{i-1})\)

  • otherwise \(I_\mathcal {R}^{i} = I_\mathcal {R}^{i-1} \cup \mathcal {R}'_f(I_\mathcal {R}^{i-1})\)

The chase of I and \(\mathcal {R}\), written \(\textit{chase}(I,\mathcal {R})\), is defined as \(\bigcup _i I_\mathcal {R}^i\); note that \(\textit{chase}(I,\mathcal {R})\) can be infinite. However, the chase has a simple structure when linear rules are considered: each atom can be “chased” independently.

Property 1

(Decomposition of the Chase). Let \(\mathcal {R}\) be a set of linear rules and I be an instance. It holds that:

$$\begin{aligned} \textit{chase}(I,\mathcal {R}) = \cup _{\alpha \in I} \,\,\textit{chase}(\{\alpha \},\mathcal {R}) \end{aligned}$$

Regular Languages. A regular language can be represented either by a regular expression or by a non-deterministic finite automaton (NFA). Let \(\varSigma \) be a finite set of symbols. A regular expression over \(\varSigma \) is defined by the grammar: \(\mathcal {E} \rightarrow \varepsilon \mid a\mid \mathcal {E}\cdot \mathcal {E} \mid \mathcal {E} + \mathcal {E} \mid \mathcal {E}^*,\) where \(a\in \varSigma \) and \(\varepsilon \) denotes the empty word. We use \(L(\mathcal {E})\) to denote the language defined by \(\mathcal {E}\). An NFA over \(\varSigma \) is a tuple \(\mathbb {A}= (S,\varSigma ,\delta ,s_0,F)\), where S is a finite set of states, \(\delta \subseteq S \times \varSigma \times S\) is the transition relation, \(s_0 \in S\) is the initial state and \(F \subseteq S\) is the set of final states. If \(\mathbb {A}\) is an automaton and s and \(s'\) are two states of \(\mathbb {A}\), we denote by \(\mathcal {L}_{\mathbb {A}}(s,s')\) the set of words w for which there is path from s to \(s'\) in \(\mathbb {A}\) labeled by w.

Regular Path Queries. Let \(\mathcal {P}\) be a set of predicates. Let us define \(\mathcal {P}_2^\pm = \mathcal {P}_2 \cup \{r^- \mid r \in \mathcal {P}_2\}\) and \(\mathcal {P}_r = \mathcal {P}_2^\pm \cup \mathcal {P}_1\), where \(\mathcal {P}_i\) (\(i \in \{1,2\}\)) denotes the predicates of arity i. A two-way regular path query (RPQFootnote 1) is a query of the form \(q(x,x') = \mathcal {E}(x,x')\), where \(\mathcal {E}\) is a regular expression defining a language over \(\mathcal {P}_r\).

Given an interpretation \(\mathcal {I}\), a path from \(a_0\) to \(a_n\) in \(\mathcal {I}\) is a sequence \(a_0r_1a_1r_2\ldots r_na_n\) such that for any i such that \(1\le i \le n\), \(a_i\) is an element of the domain \(\varDelta ^\mathcal {I}\) of \(\mathcal {I}\), every \(r_i\) is a symbol from \(\mathcal {P}_r\) and:

  • if \(r_i = a \in \mathcal {P}_1\), then \(a_i = a_{i-1} \in a^\mathcal {I}\);

  • if \(r_i \in \mathcal {P}_2\), then \((a_{i-1},a_i) \in r_i^\mathcal {I}\);

  • if \(r_i = r^-\) with \(r\in \mathcal {P}_2\), then \((a_i,a_{i-1}) \in r^\mathcal {I}\).

The label \(\lambda (p)\) of path \(p=a_0r_1a_1r_2\ldots r_na_n\) is the word \(r_1r_2\ldots r_n\). For any language L over \(\mathcal {P}_r\), the semantics of L with respect to an interpretation \(\mathcal {I}\) is defined by:

$$\begin{aligned} L^\mathcal {I}= \{(a_0,a_n) \mid \text {there is some path} \,\, p\,\, \text {from } a_0 \text { to } a_n \text { such that } \lambda (p) \in L\}. \end{aligned}$$

A match for an RPQ \(q(x,x') = \mathcal {E}(x,x')\) in an interpretation \(\mathcal {I}\) is a mapping \(\pi \) from the variables of q to elements of \(\varDelta ^\mathcal {I}\) such that \((\pi (x),\pi (x')) \in L(\mathcal {E})^\mathcal {I}\).

A certain answer to \(q(x_1,x_2)\) with respect to \((I,\mathcal {R})\) is a pair of constants \((a_1,a_2)\) such that for every model \(\mathcal {I}\) of \((I,\mathcal {R})\), there is a match \(\pi \) for q such that \(\pi (x_1) = a_1^\mathcal {I}\) and \(\pi (x_2) = a_2^\mathcal {I}\). As matches are preserved under homomorphisms, it holds that \((a_1,a_2)\) is a certain answer to \(q(x_1,x_2)\) w.r.t. \((I,\mathcal {R})\) if and only if there is a match for \((a_1^\mathcal {I},a_2^\mathcal {I})\) in \(\mathcal {I}= \textit{chase}(I,\mathcal {R})\). The RPQ Answering problem asks, given an RPQ \(q(x_1,x_2)\), an instance \(I\), a set of existential rules \(\mathcal {R}\), and two constants \((a_1,a_2) \in \textit{terms}{(\mathcal {I})}\times \textit{terms}{(\mathcal {I})}\), whether \((a_1,a_2)\) is a certain answer to \(q(x_1,x_2)\).

Computational Complexity and Turing Machines. We assume the reader to be familiar with standard complexity classes. In particular, we will consider P, NP, PSpace, APSpace (alternating PSpace), and ExpTime. We recall that APSpace = ExpTime.

To fix notations, we recall that an alternating Turing machine (TM) is given by a 5-tuple \(\mathcal {M}= (Q,\varGamma ,\delta ,q_0,g)\) where:

  • \(Q\) is the finite set of states;

  • \(\varGamma \) is the finite tape alphabet;

  • \(\delta : Q\times \varGamma \rightarrow (Q\times \varGamma \times \{L,R\})^2\) is the transition function;

  • \(q_0\in Q\) is the initial state;

  • \(g: Q\rightarrow \{\wedge ,\vee ,\text {accept, reject}\}\) specifies the type of each state.

Note that without loss of generality, we consider TMs having the following properties:

  • for every universal (\(\wedge \)) or existential (\(\vee \)) configuration, there exist exactly two applicable transitions;

  • the machine directly accepts any configuration whose state s is such that \(g(s) = \text {accept}\);

  • the TM never tries to go to the left of the initial position.

We say \(\mathcal {M}\) is polynomially space-bounded (\(\mathcal {M}\) is a PSpace TM) if there exists a polynomial p such that on input x, \(\mathcal {M}\) visits only the first p(|x|) tape cells. We assume w.l.o.g. that the alternating PSpace TMs we consider terminate on every input.

3 Evaluating Regular Path Queries over Linear Rules

We consider the problem of computing the certain answers to a regular path query and show how to adapt the construction in [5] to the case of linear rules. There are two main ingredients in the original algorithm for DL-Lite:

  • a path in the chase is guessed step by step, keeping in memory only the current constant of the instance and current state of the automaton;

  • when a path goes through the Skolem part of the chase, these constants are not guessed, but the state in which the automaton is when the path returns to constants of the instance is guessed, thanks to a precomputed table.

3.1 Additional Challenges with Linear Rules

There are two main differences between DL-Lite and linear rules that need to be handled. First, in DL-Lite, it is enough to know the predicate of the atom in which an constant has been created during the chase and the position at which it appeared in that atom to determine all the atoms that contain that constant in the chase. This is not true if we consider general linear rules, as illustrated by the following example:

Example 1

(More Complex Types are Needed). Let us consider the following rules:

$$\begin{aligned} h(x,y,z) \rightarrow h(z,x,y) \qquad h(x,x,y) \rightarrow q(y) \end{aligned}$$

and instance \(I= \{h(a,b,b),h(c,d,e)\}\). Observe that while a and c occur in the same position of atoms with the same predicate, q(a) is in \(\textit{chase}(I,\mathcal {R})\), while q(c) is not.

Second, the following looping property is central to the algorithm from [5].

Definition 1

(Looping Property). An ontology \(\mathcal {R}\) fulfills the looping property if it holds that for any instance \(I\), for any path \(a_0r_1a_1\ldots r_na_n\) in \(\textit{chase}(I,\mathcal {R})\) such that (i) \(a_i\) and \(a_{i+1}\) are Skolem terms, (ii) \(a_i\) is a subterm of \(a_{i+1}\), and (iii) \(a_1\) and \(a_n\) are original constants, there exists \(k \ge i\) such that \(a_k = a_i\).

Indeed, DL-Lite\(_\mathcal {R}\) fulfills the looping property (as do many other DLs). However, linear rules do not, as is witnessed by Example 2.

Example 2

(Failure of Looping Property). Consider the instance \(I_e = \{t(a,b)\}\) and the ruleset \(\mathcal {R}_e\) consisting of the following rules:

$$\begin{aligned}&t(x,y) \rightarrow r(y,z)&q(x,y,z) \rightarrow p(y,z)\\&r(x,y) \rightarrow q(x,y,z)&q(x,y,z) \rightarrow p(z,x) \end{aligned}$$

The chase for \(I_e\) and \(\mathcal {R}_e\) contains the following atoms:

$$\begin{aligned} r(b,f_1(b)) \quad q(b,f_1(b),f_2(b,f_1(b))) \quad p(f_1(b),f_2(b,f_1(b))) \quad p(f_2(b,f_1(b)),b) \end{aligned}$$

There is thus a path \(b\, r \, f_1(b)\, p\, f_2(b,f_1(b)) \, p\,\, b\) going from the initial constant b to b, that passes by \(f_1(b)\) but does not return via \(f_1(b)\).

3.2 Adapting the DL-Lite\(_\mathcal {R}\) Algorithm

To take care of the first difficulty, we utilize a finer notion of type, which has similar properties to the one used in [5].

Definition 2

(Type). A type is a pair \((r,\mathcal {P})\) where \(r\) is a predicate of arity k and \(\mathcal {P}\) is a partition of \(\{1,\ldots ,k\}\).

With each atom, we can associate a type, representing the way terms are repeated in the atom.

Definition 3

(Type of an Atom). Let \(\alpha \) be an atom, whose arity is k. The type of \(\alpha \) is the pair \((r,\mathcal P)\) where p is the predicate of \(\alpha \) and \(\mathcal {P}\) is the partition of \(\{1,\ldots ,k\}\) such that i and j belong to the same partition iff the \(i^\mathrm {th}\) and the \(j^\mathrm {th}\) arguments of \(\alpha \) are equal.

Note that if two atoms \(\alpha _1\) and \(\alpha _2\) are of same type, there exists an injective substitution \(\theta _{12}\) such that \(\alpha _2 = \alpha _1\theta _{12}\).

Property 2

Let \(I\) be an instance, and \(\mathcal {R}\) be a set of linear rules. Let \(\alpha _1\) and \(\alpha _2\) be two atoms of \(I\) of same type and \(\theta _{12}\) such that \(\alpha _2 = \alpha _1\theta _{12}\). Then for every atom \(\beta \) such that \(\beta \in \textit{chase}(\{\alpha _1\},\mathcal {R})\), \(\beta \theta _{12}\in \textit{chase}(\{\alpha _2\},\mathcal {R})\).

Let us define for any atom \(\alpha \in \textit{chase}(I,\mathcal {R})\), the restriction of \(\textit{chase}(I,\mathcal {R})\) to \(\alpha \), denoted \(\textit{chase}(I,\mathcal {R})_{\mid \alpha }\), as the subset of \(\textit{chase}(I,\mathcal {R})\) consisting of those atoms whose terms are generated by \(\textit{terms}{(\alpha )}\). Observe that by the preceding property, if \(\textit{type}{(\alpha )} = \textit{type}{(\beta )}\), then \(\textit{chase}(I,\mathcal {R})_{\mid \alpha }\) is isomorphic to \(\textit{chase}(\{\beta \},\mathcal {R})\).

We can overcome the second difficulty by generalizing the Loop table introduced in [5], which keeps track of the paths that occur ‘below’ a given type. Intuitively, a type T is in the cell indexed by \((s_i,j,s'_i,j')\) if and only if below any atom of type T, there is a path going from the term in position j to the term in position \(j'\) labeled by a word that takes \(\mathbb {A}\) from state \(s_i\) to state \(s'_i\).

Definition 4

( Loop ). Let \(\mathcal {R}\) be a set of linear rules and \(\mathbb {A}\) be an NFA. A Loop table has cells indexed by tuples \((s_i,j,s_{i'},j')\) such that \(s_i\) and \(s_{i'}\) are states of \(\mathbb {A}\) and j and \(j'\) are integers between 1 and \(w\), where \(w\) is the maximum arity appearing in the ruleset. Cells contain types. A Loop table is:

  • sound if for every \(T \in (s_i,j,s_{i'},j')\) it holds that for every atom \(\alpha \) of type T appearing in some \(\textit{chase}(\{\alpha '\},\mathcal {R})\) (with the predicate of \(\alpha '\) appearing in \(\mathcal {R}\)), there is a path p in the restriction of \(\textit{chase}(I,\mathcal {R})\) to \(\alpha \) that goes from argument j of \(\alpha \) to argument \(j'\) of \(\alpha \) such that \(\lambda (p) \in \mathcal {L}_{\mathbb {A}}(s_i,s_{i'})\).

  • complete if for every atom \(\alpha \) of type T (whose predicate appears in \(\mathcal {R}\)), if there is path p from argument j to argument \(j'\) of \(\alpha \) in \(\textit{chase}(\{\alpha \},\mathcal {R})\) such that \(\lambda (p) \in \mathcal {L}_{\mathbb {A}}(s_i,s_{i'})\), then \(T \in (s_i,j,s_{i'},j')\).

It is direct from the definition that there exists a unique sound and complete Loop table, and in what follows, we use Loop to denote this table.

The table Loop can be constructed using Algorithm 1. Line 5 initializes the table by stating than one can go from a position to the same position without reading any word (and thus not moving in the automaton). Lines 8 and 10 correspond to going through a single edge, reading its label either as an r or an \(r^-\), in the case where both terms are distinct. Lines 13 to 16 do the same thing when both arguments are equal. Line 19 deals with unary predicates. Finally, Lines 23 and 26 saturate the table through respectively transitive closure and propagation of paths from a child to its parent.

figure a

Property 3

Let \(\mathcal {R}\) be a set of linear rules, \(I\) be an instance and \(\alpha \in I\). The following are equivalent:

  1. 1.

    \(\textit{type}{(\alpha )} \in \texttt {Loop}(s,i,s',j)\)

  2. 2.

    there is a path \(p = a_0r_1a_1\ldots r_na_n\) in \(\textit{chase}(I,\mathcal {R})_{\mid \alpha }\) with \(a_0\) appearing at position i in \(\alpha \), \(a_n\) appearing at position j in \(\alpha \), and \(\lambda (p) \in \mathcal {L}_{\mathbb {A}}(s,s')\).

Proof

(\(\Rightarrow \)) We prove, by induction on the order of addition of types that whenever a type is added to a cell in \(\texttt {Loop}(s,i,s',j)\), the second condition is fulfilled as well. If \(\textit{type}{(\alpha )}\) is added to \(\texttt {Loop}(s_i,j,s_i,j)\) at Line 5, the empty word defines a trivial path from any position existing in \(\alpha \) to itself, and takes the automaton from any state to itself. If \(\textit{type}{(\alpha )}\) is added to \(\texttt {Loop}(s_1,1,s_2,2)\) at Line 8, \(\alpha \) is a binary atom of the form \(r(e_1,e_2)\), and there is indeed a path from \(e_1\) to \(e_2\) labeled r. Moreover, there is a transition in \(\mathbb {A}\) from \(s_1\) to \(s_2\) labeled by r, which concludes this case. The reasoning is similar for types added via Line 10 and Lines 13 to 16. If \(\textit{type}{(\alpha )}\) is added at Line 23, it must have already been added to \(\texttt {Loop}(s_1,j_1,s_2,j_2)\) and \(\texttt {Loop}(s_2,j_2,s_3,j_3)\). By the induction assumption, there is a word \(w_1\) (resp. \(w_2\)) in \(\mathcal {L}_{\mathbb {A}}(s_1,s_2)\) (resp. \(\mathcal {L}_{\mathbb {A}}(s_2,s_3)\)) that labels a path from the position \(j_1\) (resp. \(j_2\)) of an atom \(\alpha \) of type T to the position \(j_2\) (resp. \(j_3\)). Thus \(w_1\!\cdot \! w_2\) labels a path from position \(j_1\) in \(\alpha \) to position \(j_3\) in \(\alpha \) and belongs to \(\mathcal {L}_{\mathbb {A}}(s_1,s_3)\). Finally, let us assume that \(\textit{type}{(\alpha )}\) is added to \(\texttt {Loop}(s_1,i_{\alpha '},s_2,j_{\alpha '})\) at Line 26. By assumption, there is a rule \(\alpha '\rightarrow \beta '\) in \(\mathcal {R}\) such that \(\alpha \) and \(\alpha '\) have the same type, \(\textit{type}{(\beta ')}\) is in \(\texttt {Loop}(s_1,i_{\beta '},s_2,j_{\beta '})\), and the same variable appears at position \(i_{\alpha '}\) (resp. \(j_{\alpha '}\)) in \(\alpha '\) and \(i_{\beta '}\) (res. \(j_{\beta '})\) in \(\beta '\). By the induction assumption, there is a word \(w \in \mathcal {L}_{\mathbb {A}}(s_1,s_2)\) that labels a path from \(i_{\beta '}\) to \(j_{\beta '}\). Now, let us observe that any two terms that are at positions \(i_{\alpha '}\) and \(j_{\alpha '}\) of the same atom of type \(\textit{type}{(\alpha ')}\) are also at position \(i_{\beta '}\) and \(j_{\beta '}\) of an atom of type \(\textit{type}{(\beta ')}\) in \(\textit{chase}(D,\mathcal {R})_{\mid \alpha }\) because it is a model of \(\alpha ' \rightarrow \beta '\). Thus, w is also the label of a path from the term at position \(i_\alpha '\) to the term at position \(j_\alpha '\), which concludes the proof.

(\(\Leftarrow \)) We suppose that the second statement holds and reason by induction on the length n of the path \(p = a_0r_1a_1\ldots r_na_n\).

Base case, path of length 0: both states and database constants are thus equal, and the type is added by the initialization in Line 5.

Base case, path of length 1: \(\alpha ' = r_1(a_0,a_1)\) belongs to \(\textit{chase}(I,\mathcal {R})_{\mid \alpha }\), and \(r_1 \in \mathcal {L}_{\mathbb {A}}(s,s')\). If \(a_0 \ne a_1\), then \(\textit{type}{(\alpha ')}\) is added to the cells \((s,1,s',2)\) and \((s,1,s',2)\) in Lines 8 and 10. If \(a_0=a_1\), then \(\textit{type}{(\alpha ')}\) is added to the four cells \((s,i',s',j')\) with \(i',j' \in \{1,2\}\) (Lines 13–16). As \(\alpha '\) belongs to \(\textit{chase}(I,\mathcal {R})_{\mid \alpha }\), there exists a finite sequence of atoms \(\alpha = \alpha _0,\ldots ,\alpha _m = \alpha '\) such that \(\alpha _{i+1}\) belongs to \(\rho _i(\alpha _{i})\) for some rule \(\rho _i \in \mathcal {R}\). By using m applications of Line 26, we obtain \(\textit{type}{(\alpha )} \in \texttt {Loop}(s,i,s',j)\).

Induction step: let us assume that the result holds for any path of length up to \(n-1, n\ge 2\), and consider the path \(p = a_0r_1a_1\ldots r_na_n\). First consider the case in which \(a_k\) is contained in \(\alpha \) for some \(1 \le k < n\), and let l be a position of \(a_k\) in \(\alpha \). There exists a path from \(a_0\) to \(a_k\) of length strictly smaller than n, and similarly from \(a_k\) to \(a_n\). By the induction assumption, \(\textit{type}{(\alpha )}\) is in both \(\texttt {Loop}(s,i,s'',l)\) and \(\texttt {Loop}(s'',l,s',j)\) for some state \(s''\). An application of Line 23 yields \(\textit{type}{(\alpha )} \in \texttt {Loop}(s,i,s',j)\). Next suppose there is no \(a_k\) (\(1 \le k < n\)) that occurs in \(\alpha \), and let \(\beta \) be the atom in which \(a_1\) is created (at position \(k'\)). This atom is well defined as we consider rules with atomic head. We know that \(a_0\) (resp. \(a_n\)) must occur in \(\beta \), let us say at position \(i'\) (resp. \(j'\)). Indeed, if it was not the case, \(\alpha \) should contain a term among \(a_1,\ldots ,a_{n-1}\) which contradicts our earlier assumption. By the induction hypothesis, \(\textit{type}{(\beta )}\) belongs to \(\texttt {Loop}(s,i',s'',k')\) and to \(\texttt {Loop}(s'',k',s',j')\) for some state \(s''\). Hence, by Line 23, \(\textit{type}{(\beta )}\) is in the cell \(\texttt {Loop}(s,i',s',j')\). By (repeated) application of Line 26, \(\textit{type}{(\alpha )}\) is in the cell \(\texttt {Loop}(s,i,s',j)\), which concludes the proof.   \(\square \)

Property 4

Algorithm 1 runs in exponential time, and in polynomial time if the predicate arity is bounded.

Proof

There are polynomially many cells in the table, each of which can contain at most all types. The number \(n_t\) of distinct types is single exponential (and polynomial for bounded-arity predicates). The first for loop runs in \(\mathcal {O}(n_t)\), the next two run in polynomial time, and the while loop is performed at most \(n_t\) times.    \(\square \)

The remainder of the decision procedure is very close to the original algorithm for DL-Lite\(_\mathcal {R}\), but we recall it here (Algorithm 2) in the interest of self-containment. The idea is as follows: starting from a constant a and the initial state of \(\mathbb {A}\), we guess the next constant in \(I\) on a path from a to b and the state of \(\mathbb {A}\) after taking this step (Line 7). We then check that this choice is valid, i.e., there is indeed a path from a to the guessed constant which takes the automaton from the initial state to the current guessed state. This can be done either by a checking that a corresponding unary or binary atom is entailed (Lines 9 and 10), or by checking that a path going through the Skolem part of the chase allows us to reach the next constant in the required state, using the Loop table (Lines 12 to 14). We repeat this procedure until we reach the constant b in a final state, or hit the maximal path length. Note that at Line 12, \(\alpha \) is uniquely defined if it exists (it may not exist e.g., if c and d are different but are at positions that should have identical terms according to T).

figure b

The following property will be used to establish correctness of the algorithm.

Property 5

At the beginning of each iteration of the while loop of Algorithm 2, it holds that there is a path from a to the first element of current that takes the NFA \(\mathbb {A}\) from the initial state \(s_0\) to the state in the second argument of current.

Proof

At the beginning of the first iteration of the while loop, current is equal to \((a,s_0)\). Thus, the path a, whose label is \(\varepsilon \), goes from a to a and \(\varepsilon \in \mathcal {L}_{\mathbb {A}}(s_0,s_0)\).

Let \((a_i,s_i)\) be the content of current at the beginning of the \(i^\text {th}\) iteration of the while loop. Let \(w_i\) be the label of a path from \(a_0\) to \(a_i\) such that \(w_i \in \mathcal {L}_{\mathbb {A}}(s_0,s_i)\). If there is an \((i+1)^\text {th}\) iteration, either \((s,\sigma ,s')\) or \((T,i_c,i_d)\) has been guessed, and the corresponding check was successful. Let us consider each case:

  • if \((s,\sigma ,s')\) has been guessed and checked, we have two cases:

    • \(\sigma \in \mathcal {P}_2^\pm \), and there is a path from \(a_i\) to \(a_{i+1}\) in \(\textit{chase}(I,\mathcal {R})\) labeled by \(\sigma \). Moreover, \(\sigma \) labels an edge from s to \(s'\) in \(\mathbb {A}\). We can thus define \(w_{i+1} = w_i.\sigma \)

    • \(\sigma = A\), and \(I, \mathcal {R}\models A(c)\). As \(c=d\), we can again define \(w_{i+1} = w_i.\sigma \)

  • if \((T,i_c,i_d)\) has been guessed, it means that T belongs to \(\texttt {Loop}(s_i,i_c,s_{i+1},i_d)\). By the definition of \(\texttt {Loop}\), there is a path p (in the Skolem part) from any term at position \(i_c\) of an atom of type T to the position \(i_d\) of an atom of type T such that \(\lambda (p) \in \mathcal {L}_{\mathbb {A}}(s,s')\). Let \(\alpha \) be as defined Line 12. As \(I, \mathcal {R}\models \alpha \), where \(\textit{type}{(\alpha )} = T\), \(a_i\) appears at position \(i_c\) of \(\alpha \), and \(a_{i+1}\) appears at position \(i_d\) of \(\alpha \), there is such a path from \(a_i\) to \(a_{i+1}\). We can thus set \(w_{i+1} = w_i.p\).    \(\square \)

Property 6

There is an execution of Algorithm 2 that outputs Yes iff the RPQ given by \(\mathbb {A}\) is entailed from \((I, \mathcal {R})\).

Proof

(\(\Rightarrow \)) If the algorithm outputs Yes, the while loop has been exited with current equal to \((b,s_f)\), with \(s_f\) a final state of \(\mathbb {A}\). By Property 5, this means that there is a path from a to b whose label takes \(\mathbb {A}\) from \(s_0\) to \(s_f\), hence is accepted by \(\mathbb {A}\). This show that whenever Algorithm 2 accepts, (ab) is a certain answer to the RPQ given by \(\mathbb {A}\).

(\(\Leftarrow \)) If (ab) is a certain answer to the RPQ based upon \(\mathbb {A}\), then there is path of minimal length \(p = a'_0r_1a'_1\ldots r_na'_n\) from \(a=a'_0\) to \(b=a'_n\) in \(\textit{chase}(I,\mathcal {R})\) such that \(\lambda (p) = r_1 \ldots r_n \in \mathcal {L}_{\mathbb {A}}(s_0,s_f)\) for some final state \(s_f\). Let \(s'_0s'_1\ldots s'_n\) be a sequence of states of \(\mathbb {A}\) such that \(s'_n\) is a final state of \(\mathbb {A}\) and for every \(1 \le i \le n\), \((s_{i-1}, r_i, s_i) \in \delta \). Since p is of minimal length, there is no pair (ij) with \(i \not = j\) such that \((a_i,s_i) = (a_j,s_j)\). Let us consider the sequence \(p' = ((a_i,s_i))_i\) such that:

  • for any i, \(a_i\) is the \(i^\text {th}\) constant, say \(a'_{k_i}\), in p belonging to \(\textit{terms}{(I)}\);

  • for any i, \(s_i = s'_{k_i}\).

Moreover, for any i, if \(k_{i+1} = k_i+1\), we define \(\textit{aux}_i = (s_i,r_{i+1},s_{i+1})\). Otherwise, let \(\textit{aux}_i = (\textit{type}{(\alpha )},i_c,i_d)\),where:

  • \(\alpha \) is such that \(\alpha \in I\) and \(\textit{type}{(\alpha )} \in \texttt {Loop}(s_i,i_c,s_{i+1},i_d)\);

  • \(a_{k_{i}}\) appears at position \(i_c\) of \(\alpha \) and \(a_{k_{i+1}}\) appears at position \(i_d\) of \(\alpha \).

In the second case, it is possible to define \(\textit{aux}_i\) in such a way, as the path \(p_s = a'_{k_i}r_{k_i+1}\ldots a'_{k_{i+1}}\) goes from \(a_{k_i}\) to \(a_{k_{i+1}}\) and belongs to \(\mathcal {L}_{\mathbb {A}}(s_i,s_{i+1})\) by definition of \(s_i\). We show that the sequence of guesses \((a_i,s_i,\textit{aux}_i)\) leads Algorithm 2 to accept. Since p is minimal, the length of \(p'\) is less than \(|\mathbb {A}|\times |I|\). Moreover, \(a_n = b\) and \(s_f\) is a final state. Thus, the only way for Algorithm 2 to reject with this sequence of guesses is to reject during checks, i.e., one of the checks performed at Lines 9, 10, 12 or 14 fails. Let \((a_i,s_i,\textit{aux}_i)\) be the guess at one of the steps. If \(\textit{aux}_i\) is of the form \((s_i, r_{i+1}, s_{i+1})\), then \(a_{k_i}\) and \(a_{k_{i+1}}\) are consecutive elements in p, and there is an atom \(r_{i+1}(a_{k_i},a_{k_{i+1}})\) in \(\textit{chase}(I,\mathcal {R})\). Thus, \(r_{i+1}(a_{k_i},a_{k_{i+1}})\) is entailed by \(I\) and \(\mathcal {R}\), and the check at Line 9 or 10 (depending on \(r_{i+1}\) being a binary or unary atom) is successful. If \(\textit{aux}_i\) is of the form \((\textit{type}{(\alpha )},i_c,i_d)\), then there is \(\alpha \in I\) such that \(\textit{type}{(\alpha )} \in \texttt {Loop}(s_i,i_c,s_{i+1},i_d)\), and with \(a_{k_i}\) (resp. \(a_{k_{i+1}}\)) appearing at position \(i_c\) (resp. \(i_d\)) of \(\alpha \). The atom \(\alpha \) fulfills the conditions of Lines 12 and 14. Thus the defined sequence never triggers a rejection from Algorithm 2, which concludes the proof.   \(\square \)

Theorem 1

RPQ Answering in the presence of linear existential rules is:

  • in NL in data complexity

  • in PTime in combined complexity with bounded arity

  • in ExpTime in combined complexity with unbounded arity

Proof

Algorithm 2 is a non-deterministic algorithm that needs to keep in memory the current state, the current constant, and the number of iterations done so far. It performs two types of operations: entailment checks and accessing the contents of the Loop table (more precisely, deciding whether \(T \in \texttt {Loop}(s,i_c,s',i_d)\)). Hence, it can be seen as an NL algorithm making oracle calls whenever an entailment check is performed or a cell of Loop is retrieved. Entailment checks are in NL in data complexity, and Loop is independent from the data: the overall algorithm thus runs in NL in data complexity. In combined complexity with bounded arity, entailment checks can be performed in PTime, while Loop can be computed in polynomial time: the overall algorithm is thus in PTime with bounded arity. In the unbounded arity case, the entailment checks can be performed in PSpace, while the Loop table can be computed in ExpTime: the algorithm thus runs in ExpTime.    \(\square \)

4 Lower Bound

It is already known that the data complexity (resp. combined complexity) of RPQs under linear rules (resp. linear rules with bounded arity) is NL-hard (resp. PTime-hard) [5], which matches the upper bounds obtained in the preceding section. We thus focus on providing a matching ExpTime lower bound for the combined complexity of evaluating RPQs under linear rules of unbounded arity. The proof is done by simulating an alternating PSpace TM. It is already known that PSpace TMs can be simulated by means of linear rules [12]. In the following, we explain how to adapt this construction to simulate alternating TMs. Note that in this section, we will use rules with multiple atoms in the head: this is done to simplify the presentation, and a classical transformation allows us to get the same lower bound for rules with atomic heads.

The intuition is as follows: the construction in [12] represents the configuration of a TM \(\mathcal {M}\) by a single atom of polynomial arity. The initial configuration can thus be represented by an instance \(I_\mathcal {M}\) containing a single atom. Then, for each transition of the TM, polynomially many linear rules are created, each one representing the action of the transition on a cell at a given position. All these rules are part of \(\mathcal {R}_\mathcal {M}\). The initial configuration of the TM is accepted if and only if an atom encoding a configuration having an accepting state is entailed by \(I_\mathcal {M}\) and \(\mathcal {R}_\mathcal {M}\).

We modify this construction in the following way to deal with alternating Turing machines: to each atom, we add two positions, that will act as “input” and “output” positions. Moreover, we will maintain the following property: there is a path, whose edges are all labeled by the same predicate \(p\), from the input position of \(\alpha \) to the output position of \(\alpha \) entailed by \(\textit{chase}(I_{\{\alpha \}},\mathcal {R}_\mathcal {M})\) if and only if the configuration represented by \(\alpha \) is accepted by \(\mathcal {M}\). This is true in the following cases:

  • the state of the current configuration is accepting. It is then enough to add a \(p\)-edge from \(i_c\) to \(o_c\); this is possible as the Turing machine is assumed to never leave an accepting state;

  • the current state is existential and one of the two successor configurations is accepting: we thus add \(p\)-edges from the input of the current configuration to the input of the two children, and from the output of the two children to the output of the current configuration;

  • the current state is universal, and both successor configurations are accepting: we thus add \(p\)-edges from the input of the current configuration to the input of the first successor configuration, then from the output of that configuration to the input of the other successor, and lastly from the output of the second successor to the output of the current configuration.

We now formalize the construction sketched above, staying as close as possible to the notations in [12].

Turing Machine. Given an alternating PSpace TM and an input x, we can represent a configuration c reached during the computation by storing the content of the first p(|x|) cells, as well as the position of the head of the tape and the current state of the TM. Adding input and output positions, this can be encoded by a predicate \(\textit{conf}\) of arity \(2p(|x|) + 3\):

$$\begin{aligned} \textit{conf}(i_c,state,cell_1,cur_1,cell_2,cur_2,\ldots ,cell_{p(|x|)},cur_{p(|x|)},o_c), \end{aligned}$$

where state contains the state identifier, \(cell_i\) represents the content of the \(i^\text {th}\) cell, \(cur_i\) is equal to 1 if the head of the Turing machine is on cell i and 0 otherwise, and \(i_c\) and \(o_c\) are the input and output terms of this atom. We say that the above atom represents configuration c. Given an atom \(\alpha \), the term at its input (resp. output) position is denoted by \(i(\alpha )\) (resp. \(o(\alpha )\)). We denote by \(I_{\mathcal {M},x}\) the instance containing a single atom representing the initial configuration of \(\mathcal {M}\) on input x.

For every state \(q_f\) with \(g(q_f) = \text {accept}\), we create the following rule:

$$\begin{aligned} \textit{conf}(i_c,q_f,\ldots ,o_c) \rightarrow p(i_c,o_c). \end{aligned}$$
(1)

For each transition \(\delta (q,\gamma ) = \{(q',\gamma ',L),(q'',\gamma '',L)\}\) such that \(g(q) = \vee \), we create the rule

$$\begin{aligned}&\textit{conf}(i_c,q,cell_1,cur_1,\ldots ,cell_{i-1},0,\gamma ,1,\ldots ,o_c) \rightarrow \nonumber \\&\qquad \exists i_{c'},o_{c'},i_{c''},o_{c''} ~ \textit{conf}(i_{c'},q',cell_1,cur_1,\ldots ,cell_{i-1},1,\gamma ',0,\ldots ,o_{c'}), \nonumber \\&\qquad \qquad \quad \textit{conf}(i_{c''},q'',cell_1,cur_1,\ldots ,cell_{i-1},1,\gamma ',0,\ldots ,o_{c''}), \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad p(i_c,i_{c'}), p(o_{c'},o_c),p(i_c,i_{c''}), p(o_{c''},o_c). \end{aligned}$$
(2)

for each position i on the tape, and similarly when the head is moving to the right.

When \(g(q) = \wedge \), we associate with each transition \(\delta (q,\gamma ) = \{(q',\gamma ',L),(q'',\gamma '',L)\}\) the following rule:

$$\begin{aligned}&\textit{conf}(i_c,q,cell_1,cur_1,\ldots ,cell_i,0,\gamma ,1,\ldots ,o_c) \rightarrow \nonumber \\&\qquad \exists i_{c'},o_{c'},i_{c''},o_{c''}~ \textit{conf}(i_{c'},q',cell_1,cur_1,\ldots ,cell_i,1,\gamma ',0,\ldots ,o_{c'}), \nonumber \\&\qquad \qquad \quad \qquad \qquad \,\, \textit{conf}(i_{c''},q'',cell_1,cur_1,\ldots ,cell_i,1,\gamma '',0,\ldots ,o_{c''}), \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad p(i_c,i_{c'}), p(o_{c'},i_{c''}), p(o_{c''},o_c). \end{aligned}$$
(3)

Figure 1 illustrates the functioning of rules of types (2) and (3). We denote by \(\mathcal {R}_{\mathcal {M},x}\) the set containing all the rules defined aboveFootnote 2. The above rules (where input and output positions are removed) simulate the run of a PSpace TM [12].

The following property formalizes the reduction and establishes its correctness.

Fig. 1.
figure 1

Existential (left) and universal (right) gadgets

Property 7

Let \(\mathcal {M}\) be an alternating PSpace Turing machine, and let \(\alpha \) be an atom of \(\textit{chase}\)(\(I_{\mathcal {M},x}\),\(\mathcal {R}_{\mathcal {M},x}\)) representing a configuration \(c(\alpha )\). Then \(c(\alpha )\) is an accepting configuration of \(\mathcal {M}\) if and only if there is a path in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\) from \(i(\alpha )\) to \(o(\alpha )\) whose label belongs to \(p^*\).

Proof

(\(\Leftarrow \)) Let \(\alpha \in \textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\) represent a configuration \(c(\alpha )\), and let \(C_\alpha \) be the restriction of \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\) to \(\alpha \). We show by induction on the number of atoms of \(C_\alpha \) that the required path exists. Note that the induction is well-founded as the Skolem chase is finite (recall that the considered Turing machines terminate).

  • If \(C_\alpha \) contains one atom, then there can be no path in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\) witnessing \(p^*(i(\alpha ),o(\alpha ))\). Suppose then that \(C_\alpha \) contains two atoms. In this case, the only atom in \(C_\alpha \) other than \(\alpha \) must be \(p(i(\alpha ),o(\alpha ))\). The only way to derive such an atom is to apply a rule of the form (1), which is applied if and only if \(c(\alpha )\) is in an accepting state, hence \(c(\alpha )\) is an accepting configuration of \(\mathcal {M}\).

  • Next assume that the result holds for any atom \(\alpha \) such that \(C_\alpha \) has less than n atoms, and let \(\alpha \) be an atom such that \(C_\alpha \) contains n atoms. We distinguish two cases:

    • Case 1: the state of \(c(\alpha )\) is existential. Then, since the rules of type (2) must be satisfied, \(C_\alpha \) contains atoms \(\alpha _1\) and \(\alpha _2\) representing the successor configurations of \(c(\alpha )\). The existence of a path from \(i(\alpha )\) to \(o(\alpha )\) implies that there is either a path from \(i(\alpha _1)\) to \(o(\alpha _1)\) or a path from \(i(\alpha _2)\) to \(o(\alpha _2)\). To see why, observe that every \(p\)-atom involving \(i(\alpha )\) or \(o(\alpha )\) is added either by the same rule application as created \(\alpha \) or by a rule of type (2) applied to \(\alpha \). Only atoms of the second kind (refer to Fig. 1, left) can belong to a shortest path from \(i(\alpha )\) to \(o(\alpha )\), as atoms of the first kind have \(i(\alpha )\) (resp. \(o(\alpha )\)) as second (resp. first) argument. If we have a path from \(i(\alpha _1)\) to \(o(\alpha _1)\), then we can apply the induction assumption to \(\alpha _1\) to get that \(c(\alpha _1)\) is an accepting configuration, which implies that \(c(\alpha )\) is also accepting. We can proceed analogously if we have path from \(i(\alpha _2)\) to \(o(\alpha _2)\).

    • Case 2: the state of \(c(\alpha )\) is universal. As the rules of type (3) must be satisfied, the existence of a path from \(i(\alpha )\) to \(o(\alpha )\) implies the existence of a path from \(i(\alpha _1)\) to \(o(\alpha _1)\) and a path from \(i(\alpha _2)\) to \(o(\alpha _2)\), where \(\alpha _1\) and \(\alpha _2\) represent the successor configurations of \(c(\alpha )\) (refer to Fig. 1, right). By the induction assumption, \(c(\alpha _1)\) and \(c(\alpha _2)\) are both accepting configurations, which means that \(c(\alpha )\) is also accepting.

(\(\Rightarrow \)) We prove the other direction by induction on the number of transitions that need to be performed to prove that \(c(\alpha )\) is accepted by \(\mathcal {M}\).

  • If no transitions are required, this means that \(c(\alpha )\) is in an accepting state. Thus, Rule (1) is applicable, and \(p(i(\alpha ),o(\alpha ))\) is present in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\).

  • Assume the result holds up to n required transitions. We distinguish two cases:

    • Case 1: the state of \(c(\alpha )\) is existential. As \(c(\alpha )\) is accepting, this means that one of its two successor configurations, say \(c(\alpha _1)\), is accepting. Moreover, the number of transitions required to accept \(c(\alpha _1)\) is strictly smaller than for \(c(\alpha )\). By the induction assumption, \(p^*(i(\alpha _1),o(\alpha _1))\) is present in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\). As \(p(i(\alpha ),i(\alpha _1))\) and \(p(o(\alpha _1),o(\alpha ))\) are also present (since the rules of the form (2) generate them), this proves that \(p^*(i(\alpha ),o(\alpha ))\) is present as well.

    • Case 2: the state of \(c(\alpha )\) is universal. As \(c(\alpha )\) is accepting, this means that its two successor configuration are also accepting. By the induction assumption, this means that \(p^*(i(\alpha _1),o(\alpha _1))\) and \(p^*(i(\alpha _2),o(\alpha _2))\) are present in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\). As the rules of the form (3) also generate \(p(i(\alpha ),i(\alpha _1))\), \(p(o(\alpha _1),i(\alpha _2))\), and \(p(o(\alpha _2),o(\alpha ))\), this proves that \(p^*(i(\alpha ),o(\alpha ))\) is present in \(\textit{chase}(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x})\).    \(\square \)

Now let \(\mathcal {M}\) be an alternating PSpace Turing machine, x be an input to \(\mathcal {M}\), and \(\alpha \) be the unique atom in \(I_{\mathcal {M},x}\). Then by Property 7, \(c(\alpha )\) is an accepting configuration of \(\mathcal {M}\) if and only if \(I_{\mathcal {M},x},\mathcal {R}_{\mathcal {M},x} \models p^*(i(\alpha ),o(\alpha )).\) This, together with known results, yields the following lower bounds:

Theorem 2

RPQ Answering in the presence of linear existential rules is NL-hard in data complexity, PTime-hard in combined complexity with bounded arity and ExpTime-hard in combined complexity without arity bound, even for a fixed RPQ.

Note that the preceding reduction can be easily adapted to show that atomic query answering under rulesets containing linear rules and transitivity rules is ExpTime-hard. Assuming ExpTime \(\ne \) PSpace, this result is in contradiction with Theorem 5 in [2], which purports to show a Pspace upper bound. Indeed, after reexamining the proofs, the authors of the latter work have identified the flaw, which occurs in the analysis of the combined complexity of their rewriting-based decision procedure. It turns out that the procedure runs in exponential time, rather than in polynomial space (the NL upper bound in data complexity remains valid). Combining our lower bound with their procedure shows that the problem is ExpTime-complete in combined complexity.

5 Conclusion and Future Work

In this paper, we have investigated the complexity of evaluating regular path queries under linear existential rules. We have shown that it is NL-complete in data complexity, PTime-complete in combined complexity when the predicate arity is bounded, and ExpTime-complete otherwise. This behavior is somewhat surprising with respect to prior work: indeed, for DL-Lite\(_\mathcal {R}\), the combined complexity of RPQ answering is lower than for CQs, whereas we observe just the opposite in the linear case (recall CQ answering is PSpace-complete under linear rules). The upper bound was shown by adapting an existing decision procedure for DL-Lite, using a refined definition of type. The lower bound builds upon a PSpace-hardness result for CQ answering under linear rules.

There are two natural ways to extend the present work: either investigate more expressive forms of path queries (with conjunction and/or nesting) over linear rules, or consider the effect of moving to more expressive decidable classes of existential rules.