1 Introduction

Description Logics (DLs) are a family of knowledge representation languages with unambiguous syntax and well-defined semantics that are widely used to represent the conceptual knowledge of an application domain in a structured and formally well-understood manner.

DLs have been successfully employed to formulate ontologies for a range of knowledge domains, in particular for the bio-medical sciences. Prominent examples of ontologies in these areas are the Gene Ontology,Footnote 1 and the ontology Snomed CT.Footnote 2 Arguably the largest success of DLs to date is that they provide the formal foundation for the standard web ontology language OWL, a milestone for the Semantic Web. More precisely, the current version of the web ontology language, OWL 2, is based on the very expressive DL \({\mathcal {SROIQ}}\).Footnote 3

In DLs, knowledge is represented through concepts that describe collections of objects (that is, correspond to unary predicates from first-order logic), and roles that define relations between pairs of objects (binary predicates). To encode the actual knowledge of the domain, DLs employ different kinds of axioms. These axioms restrict the possible interpretations of the concepts and roles.

For example, we can express the fact that cpuA is an overused CPU, and that every server that has a part that is overused is a server with limited resources through the axioms

$$\begin{aligned}&(\mathsf {CPU}\sqcap \mathsf {Overused})({\textsf {cpuA}}) \end{aligned}$$
(1)
$$\begin{aligned}&\mathsf {Server} \sqcap \exists {\textsf {hasPart}}. \mathsf {Overused} \nonumber \\&\qquad \sqsubseteq \mathsf {ServerWithLimitedResources} \end{aligned}$$
(2)

An axiom of the form (1) is called an assertion, while (2) is a general concept inclusion (GCI).

It has been widely argued that many application domains require the representation of vague concepts, for which it is impossible to precisely characterize the objects that belong to these concepts, and distinguish them from those who do not belong to them [40]. A simple example of such a concept is that of an overused CPU. While it is easy to state that a CPU that is running permanently at its maximum capacity is overused, and one that is not being used at all is not overused, there is no precise usage point where a CPU starts (or stops) being overused. Fuzzy Description Logics have been proposed to alleviate this problem. In these logics, objects are assigned a membership degree, typically a number between 0 and 1, expressing “how much” they belong to a given concept.

In general, the higher the degree of an object, the more it belongs to the concept. To represent vague knowledge, axioms are also extended to restrict the possible degrees that the interpretations may use.

Thus, one can express that cpuA is overused with degree at least 0.8 through the assertion \(\mathsf {Overused}({\textsf {cpuA}})\geqslant 0.8\).

Formally, fuzzy DLs generalize classical DLs by interpreting concepts and roles as fuzzy unary predicates and fuzzy binary predicates, respectively. Hence, fuzzy DLs can be seen as sublogics of fuzzy first-order logic. Adopting this view, one can use a triangular norm (t-norm) and its associated operators to interpret the different logical constructors. Each t-norm then defines a specific family of fuzzy DLs.

It has been shown that reasoning in fuzzy DLs easily becomes undecidable, if infinitely many membership degrees are allowed [1, 12]. In fact, these undecidability results hold even for relatively inexpressive fuzzy DLs. This has motivated the study of finitely valued fuzzy DLs.

It is known that the complexity of standard reasoning tasks in expressive DLs is not affected by the use of t-norm-based finitely valued semantics [13, 14]. Unfortunately, the automata-based techniques exploited in [13] cannot be easily adapted to obtain complexity bounds for the problem of answering conjunctive queries in these logics. Moreover, despite providing optimal complexity bounds, automata-based methods are not used in practice due to their bad best-case behavior.

A different approach for reasoning in the presence of finitely many membership degrees is crispifying; i.e., transforming a finitely valued ontology into an “equivalent” classical ontology, from which the relevant membership degrees can be read [2, 4, 11, 37].

Reasoning in finitely valued fuzzy DLs is thus reduced to reasoning in classical DLs, for which very efficient methods have already been developed and implemented. The main drawback of the translation described in [4, 11] is that it may introduce an exponential blow-up of the ontology, thus affecting the efficiency of the overall method.

Table 1 Families of fuzzy logic operators

In this paper, we adapt the crispification approach for answering conjunctive queries in expressive finitely valued fuzzy DLs. The problem of answering conjunctive queries has recently received much attention as a powerful means to access facts encoded in an ontology. For example, using a fuzzy conjunctive query it is possible to ask for all pairs of servers and CPUs such that the CPU is an overused part (to degree at least 0.6) of the server as follows:

$$\begin{aligned}&\{\mathsf {Server}(x)\geqslant 1 ,~ {\textsf {hasPart}}(x,y)\geqslant 1 ,~ \mathsf {CPU}(y)\geqslant 1 ,\\&\quad \mathsf {Overused}(y)\geqslant 0.6\}. \end{aligned}$$

The crispification approach allows us to effectively answer conjunctive queries over finitely valued ontologies, by reusing the methods developed for the classical case. Once the ontology is crispified, this approach calls a classical conjunctive query answering engine as a black-box procedure. Thus, any optimization developed for the classical case automatically improves the performance for the finitely valued scenario. What remains to be addressed is the exponential blow-up of the ontology, if done according to [4, 11]. We strengthen our results by providing a linear preprocessing step that avoids the exponential blow-up produced by this crispification. Using this preprocessing of the finitely valued ontology, we can guarantee that the classical ontology produced is only polynomially larger than the original input. In particular, this means that the classical query answering engine becomes able to provide answers more efficiently over classical ontologies of lesser size.

The contributions of this paper are the following:

  • We prove that some of the previous crispification algorithms [8, 10, 28] are incorrect for qualified number restrictions (indicated by the letter \(\mathcal {Q}\) in the name of DLs) by means of a counter-example (see Example 8).

  • We discuss a possible way to reduce such qualified number restrictions, but which depends on the presence of so-called Boolean role constructors [34] in the DL (see Sect. 3.1).

  • We improve the reduction from finitely valued \({\mathcal {SROIN}}\) ontologies to classical \({\mathcal {SROIN}}\) ontologies by introducing a linear normalization step (\(\mathcal {N}\) stands for unqualified number restrictions, see Sect. 3).

  • We extend the crispification approach to answering different types of fuzzy conjunctive queries in the finitely valued setting and we prove correctness of the obtained methods (see Sect. 4). This approach works for any known crispification algorithm, in particular specialized ones that correctly reduce number restrictions.

  • We assess the complexity of the presented conjunctive query answering technique for a family of fuzzy extensions of (sublogics of) \({\mathcal {SROIQ}}\) with regard to two types of conjunctive queries that use membership degrees.

  • We provide an evaluation of a prototype implementation of our methods over the LUBM ontology benchmark [23] based on the reduction-based DeLorean reasoner [5] for fuzzy ontologies and a standard query answering reasoner for crisp ontologies (PAGOdA  [42]).

A preliminary version of this paper can be found in [28], where the incorrect reduction of number restrictions was still used. We also extend here that earlier paper by full proofs for the correctness of the crispification procedure, something which has not been done before in the literature. Finally, we optimize the crispification procedure described [28] to eliminate an exponential blow-up inherent in some of the previous crispification proposals.

The rest of the paper is structured as follows: Sect. 2 introduces the syntax and semantics of finitely valued fuzzy DLs based on \({\mathcal {SROIQ}}\). Section 3 describes our improved reduction procedure from fuzzy to classical ontologies. Section 4 presents the actual reduction from fuzzy to classical conjunctive query answering. Section 5 provides an evaluation of a prototype implementation over (fuzzified versions of) the LUBM ontology benchmark [23]. Finally, Sect. 6 presents the current literature on reduction techniques and conjunctive query answering for fuzzy DLs and Sect. 7 summarizes the paper and mentions directions for future work.

2 Preliminaries

We first introduce a class of finite chains, together with some basic operations over them. Afterwards, we formally define the fuzzy extension of \({\mathcal {SROIQ}}\), whose semantics is based on these chains.

2.1 Finite Fuzzy Logics

The semantics of fuzzy DLs is based on truth structures endowed with additional operators for interpreting the logical constructors. We consider arbitrary finite total orders (or chains). Since the names of the truth degrees in a chain are not relevant, we consider in the following only the canonical chain of n elements \({\mathscr {C}}:=\left\{ 0,\frac{1}{n-1},\dots ,\frac{n-2}{n-1},1\right\} \), in the usual order. We denote \({\mathscr {C}} {\setminus }\{0\}\) by \({\mathscr {C}}_{>0}\). We use the notation \(d_{\mathrm{next}}\) to refer to the direct upper neighbour of d in \({\mathscr {C}}\), which is the unique smallest element strictly larger than d. We now consider tuples of the form \(({\mathscr {C}},\otimes ,\Rightarrow ,\ominus ,\oplus )\) that specify the chain together with the operators used for interpreting conjunction, implication, negation, and disjunction, respectively.

The largest family of operators used for fuzzy semantics is based on t-norms, which are associative, commutative binary operators that are monotonic in both arguments and have identity 1. These binary operators, denoted by \(\otimes \), are used in mathematical fuzzy logic to interpret conjunction. The residuum \(\Rightarrow \) is a binary operator which is used to interpret implication. It is uniquely defined by the property that

$$\begin{aligned} (x\otimes y)\leqslant z \text { iff } y\leqslant (x\Rightarrow z) \text { for all } x,y,z\in {\mathscr {C}}. \end{aligned}$$

The residual negation \(\ominus \) is defined simply as \(\ominus x:=x\Rightarrow 0\). Finally, the t-conorm, used for the disjunction, is defined as \(x\oplus y:=1-((1-x)\otimes (1-y))\) for all \(x,y\in {\mathscr {C}} \). Two prominent families of operators are based on the finite Gödel t-norm and the finite Łukasiewicz t-norm (see Table 1). Note, however, that we do not restrict our considerations to only those logics listed in Table 1; our results are valid for any semantics based on a finite t-norm.

An alternative to t-norm-based approaches for interpreting the logical connectives in fuzzy logics is the so-called Zadeh family of operators, shown in the first row of Table 1. Intuitively, the Zadeh family can be seen as a combination of the Gödel and the Łukasiewicz operators.

2.2 The Fuzzy DL \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\)

We introduce finitely valued, fuzzy extensions of the classical description logic \({\mathcal {SROIQ}}\)  [25]—one of the most expressive decidable DLs which provides the direct model-theoretic semantics of the standardized ontology language for the Semantic Web OWL 2. It has been shown that reasoning in finitely valued fuzzy extensions of \({\mathcal {SROIQ}}\) can be reduced to reasoning in classical \({\mathcal {SROIQ}}\)  [3, 10, 37]. This reduction technique will be considered in detail in Sect. 3.

Consider three countable and pairwise disjoint sets of individual names \(\mathsf{N_I}\), concept names \(\mathsf{N_C}\), and role names \(\mathsf{N_R}\). Individual names refer to single elements of an application domain, concept names describe sets of elements, and role names binary relations between elements. Based on these, complex concepts and roles can be built using different constructors. More precisely, a (complex) role is either of the form r or \(r^-\) (inverse role), for \(r\in \mathsf{N_R} \), or it is the universal role u. Similarly, (complex) concepts are built inductively from concept names using the following constructors:

  • \(\top \)   (top concept),

  • \(\bot \)   (bottom concept),

  • \(C\sqcap D\)   (conjunction),

  • \(C\sqcup D\)   (disjunction),

  • \(\lnot C\)   (negation),

  • \(\forall r.C\)   (value restriction),

  • \(\exists r.C\)   (existential restriction),

  • \(\left\{ d_1/a_1,\ldots ,d_m/a_m\right\} \)   (fuzzy nominal),

  • \({\ge }m\,r.C\)   (at-least restriction),

  • \({\le }m\,r.C\)   (at-most restriction), and

  • \(\exists r.\text {Self} \)   (local reflexivity),

where CD are concepts, r is a role, m is a natural number, \(a_1,\dots ,a_m\in \mathsf{N_I} \), and \(d_1,\dots ,d_m\in {\mathscr {C}}_{>0} \).

An ontology \(\mathcal {O} \) consists of the intensional and the extensional knowledge related to an application domain. The intensional knowledge, i.e., the general knowledge about the application domain, is expressed through

  • a TBox \(\mathcal {T}\), a set of finitely many (fuzzy) general concept inclusion (GCIs) axioms of the form \(\langle C\sqsubseteq D\geqslant d\rangle \), where \(d\in {\mathscr {C}}_{>0} \), and

  • an RBox \(\mathcal {R}\), a finite set of role axioms, which are statements of the following form:

    • \(\langle r_1\dots r_m\sqsubseteq r\geqslant d\rangle \)   ((fuzzy) complex role inclusion),

    • \(\text {trans}(r)\)   (transitivity),

    • \(\text {dis}(r_1,r_2)\)   (disjointness),

    • \(\text {ref}(r)\)   (reflexivity),

    • \(\text {irr}(r)\)   (irreflexivity),

    • \(\text {sym}(r)\)   (symmetry), or

    • \(\text {asy}(r)\)   (asymmetry),

where \(r,r_1,\dots ,r_m\) are roles and again \(d\in {\mathscr {C}}_{>0} \).

The extensional knowledge, which refers to the particular knowledge about specific facts or situations, is expressed by an ABox \(\mathcal {A}\) containing a finite set of statements about individuals of the form:

  • \(\langle C(a)\bowtie d\rangle \)   (concept assertion),

  • \(\langle r(a,b)\bowtie d\rangle \)   (role assertion),

  • \(a\ne b\)   (individual inequality assertion), or

  • \(a=b\)   (individual equality assertion),

where \(a,b\in \mathsf{N_I} \), C is a concept, r is a role, \(d\in {\mathscr {C}} \), and \({\bowtie }\in \{\leqslant ,\geqslant \}\). If \(\bowtie \) is \(\geqslant \), then we again consider only values \(d>0\); dually, for assertions using \(\leqslant \) we assume that \(d<1\).

For any axiom of the form \(\langle \alpha \geqslant 1\rangle \), we may simply write \(\alpha \). Finally, an ontology is a tuple \(\mathcal {O} =(\mathcal {A},\mathcal {T},\mathcal {R})\) consisting of an ABox \(\mathcal {A}\), a TBox \(\mathcal {T}\), and an RBox \(\mathcal {R}\).

Table 2 Syntax and semantics of concepts in \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\)

To ensure decidability of classical \({\mathcal {SROIQ}}\), a set of restrictions regarding the use of roles is imposed. For example, transitive roles are not allowed to occur in number restrictions (for more details see [25]). The same restrictions are also adopted for fuzzy extensions of \({\mathcal {SROIQ}}\)  [3, 7, 10]. However, they are not essential for our purposes, as all results presented in this paper hold regardless of these restrictions (except, of course, for the complexity results of Sect. 4.2).

Table 3 Syntax and semantics of axioms in \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\)

The semantics of \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\) is defined via interpretations. A (fuzzy) interpretation is a pair \(\mathcal {I} =(\Delta ^\mathcal {I},\cdot ^\mathcal {I})\), consisting of a non-empty set \(\Delta ^\mathcal {I} \) (called the domain) and an interpretation function \(\cdot ^\mathcal {I} \) that maps every individual name \(a\in \mathsf{N_I} \) to an element \(a^\mathcal {I} \in \Delta ^\mathcal {I} \), every concept name \(A\in \mathsf{N_C} \) to a fuzzy set \(A^\mathcal {I} :\Delta ^\mathcal {I} \rightarrow {\mathscr {C}} \), and every role name \(r\in \mathsf{N_R} \) to a fuzzy binary relation \(r^\mathcal {I} :\Delta ^\mathcal {I} \times \Delta ^\mathcal {I} \rightarrow {\mathscr {C}} \). This function is extended to complex roles and complex concepts as described in Table 2. Note that the usual fuzzy semantics of existential and value restrictions, number restrictions, and role inclusions formally require the computation of an infimum or supremum over all domain elements. However, since \({\mathscr {C}}\) is a finite chain, in our case these are actually minima or maxima, respectively.

For the two-valued chain \({\mathscr {C}} =\{0,1\}\), we obtain the semantics of classical \({\mathcal {SROIQ}}\), since then all fuzzy operators correspond to their classical counterparts. In this setting, it is more natural to treat \(C^\mathcal {I} \) as a subset of \(\Delta ^\mathcal {I} \), given by its characteristic function \(C^\mathcal {I} :\Delta ^\mathcal {I} \rightarrow \{0,1\}\) (and analogously for roles). We call such an interpretation a classical interpretation.

An ontology is satisfied by a fuzzy interpretation \(\mathcal {I}\) if all of its axioms are satisfied, as defined in Table 3.

In this case, \(\mathcal {I}\) is called a model of the ontology. An ontology is consistent iff it has a model. In fuzzy extensions of \({\mathcal {SROIQ}}\), axioms are often allowed to express also strict inequalities (\(<\) and \(>\)). However, in the finitely valued setting an axiom \(\langle \alpha >d\rangle \) with \(d<1\) can be expressed as \(\langle \alpha \geqslant d_{\mathrm{next}}\rangle \) (and similarly for \(<\) and the direct lower neighbour of d).

In the literature, it is also common to find negated role assertions of the form \(\langle \lnot r(a,b)\bowtie d\rangle \) [4, 8]. However, in our setting \(\langle \lnot r(a,b)\leqslant d\rangle \) is equivalent to an assertion of the form \(\langle r(a,b)\geqslant d'\rangle \), and similarly for \(\langle \lnot r(a,b)\geqslant d\rangle \).

Example 1

Suppose that we have a cloud computing environment consisting of multiple servers with their own internal memory and CPU. To model such an environment, we use

  • the individual names: \(\mathsf {serverA}\), \(\mathsf {serverB}\), \(\mathsf {memA}\), \(\mathsf {memB}\), cpuA, cpuB;

  • the concept names: \(\mathsf {CPU}\), \(\mathsf {Memory}\), \(\mathsf {Overused}\), \(\mathsf {Server}\), \(\mathsf {ServerWithLimitedResources}\), and \(\mathsf {ServerWithAvailableResources}\); and

  • the role names: hasPart and \(\mathsf {isConnectedTo}\).

The assertional knowledge of this domain is modeled via the ABox \(\mathcal {A}\)

$$\begin{aligned} \big \{&\mathsf {Server}(\mathsf {serverA}), \mathsf {CPU}({\textsf {cpuA}}), \mathsf {Memory}(\mathsf {memA}), \\&\langle \mathsf {Overused}({\textsf {cpuA}})\geqslant 0.8\rangle , \mathsf {Overused}(\mathsf {memA}), \\&{\textsf {hasPart}}(\mathsf {serverA},{\textsf {cpuA}}), {\textsf {hasPart}}(\mathsf {serverA},\mathsf {memA}), \\&\langle \mathsf {ServerWithAvailableResources}(\mathsf {serverB}) \geqslant 0.6\rangle , \\&\langle \mathsf {isConnectedTo}(\mathsf {serverA},\mathsf {serverB}) \geqslant 0.8\rangle \big \}, \end{aligned}$$

which, for example, states that cpuA is overused with degree at least 0.8, and that the memory \(\mathsf {memA}\) is also overused with degree 1. The terminological knowledge of this domain can be modeled via a TBox \(\mathcal {T}\) containing axioms like

$$\begin{aligned} \langle \mathsf {Server}\sqcap {}&\exists {\textsf {hasPart}}.(\mathsf {Overused}\sqcap \mathsf {CPU})\sqcap {} \\&\exists {\textsf {hasPart}}.(\mathsf {Overused}\sqcap \mathsf {Memory}) \\&\sqsubseteq \mathsf {ServerWithLimitedResources}\geqslant 0.8\rangle , \end{aligned}$$

stating that a server with an overused memory and CPU is a server with limited resources. This implication must hold with a degree of at least 0.8.

It should be noted that the concepts \(\mathsf {CPU}\), \(\mathsf {Memory}\), and \(\mathsf {Server}\) and the role hasPart are essentially crisp; i.e., they can only take values in \(\{0,1\}\). This information can be easily modeled as part of a fuzzy ontology and handled by the reduction algorithm in [3]. In contrast to this, the concepts \(\mathsf {ServerWithLimitedResources}\) and \(\mathsf {Overused}\) have a vague nature; that is, they are fuzzy, and the degree to which a server has limited resources and the degree to which a CPU or a memory card is overused can take values strictly between 0 and 1. The role \(\mathsf {isConnectedTo}\) is also fuzzy and it is used to declare the connection between two servers. The higher the connection degree between two servers, the larger is the bandwidth they use in their communication.

2.3 Conjunctive Queries

Based on the semantics, other reasoning services than consistency of ontologies can be defined. In this paper, we are interested in conjunctive query answering. We give the definition of classical conjunctive queries next.

Definition 2

(Conjunctive Query) Let \(\mathsf {N_V}\) be a countably infinite set of variables disjoint from \(\mathsf{N_C}\), \(\mathsf{N_R}\), and \(\mathsf{N_I}\). An atom is a concept atom of the form A(x), a role atom of the form r(xy), or an equality atom of the form \(x\approx y\), where \(x,y\in \mathsf {N_V} \cup \mathsf{N_I} \), \(A\in \mathsf{N_C} \), and \(r\in \mathsf{N_R} \).

A (k-ary) conjunctive query (CQ) q is a statement of the form

$$\begin{aligned} (x_1,\dots ,x_k)\leftarrow \alpha _1,\dots ,\alpha _m, \end{aligned}$$

where \(\alpha _1,\dots ,\alpha _m\) are atoms, and \(x_1,\dots ,x_k\) are (not necessarily distinct) variables occurring in these atoms. We call \(x_1,\dots ,x_k\) the distinguished variables of q. \(\textsf {VarInds}(q) \) denotes the set of all variables and individual names occurring in q. If \(k=0\), we call q a Boolean conjunctive query.

Let \(\mathcal {I} \) be a classical interpretation, q a Boolean CQ, and \(\pi :\textsf {VarInds}(q) \rightarrow \Delta ^\mathcal {I} \) a function such that \(\pi (a)=a^\mathcal {I} \) for all \(a\in \mathsf{N_I} \). If \(\pi (x)\in A^\mathcal {I} \), then we write \(\mathcal {I} \models ^\pi A(x)\), and \(\mathcal {I} \models ^\pi r(x,y)\) whenever \((\pi (x),\pi (y))\in r^\mathcal {I} \), and \(\mathcal {I} \models ^\pi x\approx y\) if \(\pi (x)=\pi (y)\). If \(\mathcal {I} \models ^\pi \alpha \) for all atoms \(\alpha \) in q, we write \(\mathcal {I} \models ^\pi q\) and call \(\pi \) a match for \(\mathcal {I} \) and q. We say that \(\mathcal {I} \) satisfies q and write \(\mathcal {I} \models q\) if there is a match \(\pi \) for \(\mathcal {I} \) and q.

A (k-ary) union of conjunctive queries (UCQ) \(q_\text {UCQ}\) is a set of k-ary conjunctive queries. An interpretation \(\mathcal {I}\) satisfies a Boolean UCQ \(q_\text {UCQ}\), written \(\mathcal {I} \models q_\text {UCQ}\) if \(\mathcal {I} \models q\) for some \(q\in q_\text {UCQ}\). For a Boolean (U)CQ q and an ontology \(\mathcal {O}\), we write \(\mathcal {O} \models q\) and say that \(\mathcal {O} \) entails q if \(\mathcal {I} \models q\) holds for all models \(\mathcal {I}\) of \(\mathcal {O}\).

Consider now an arbitrary k-ary (U)CQ q and a k-tuple \(\mathfrak {a} \in \mathsf{N_I} ^k\) of individual names. We say that \(\mathfrak {a} \) is an answer to q w.r.t. an ontology \(\mathcal {O}\) if \(\mathcal {O}\) entails the Boolean (U)CQ \(\mathfrak {a} (q)\) resulting from q by replacing all distinguished variables according to \(\mathfrak {a}\) (and possibly introducing new equality atoms if some of the distinguished variables in an answer tuple are equal).

The problem of query answering is to compute all answers of a (U)CQ w.r.t. a given ontology. Query answering can be reduced to query entailment by testing all possible tuples \(a\in \mathsf{N_I} ^k\), which yields an exponential blow-up. It is well known that query entailment and query answering can be mutually reduced and that decidability and complexity results carry over modulo the mentioned blow-up [15].

Example 3

Consider the UCQ \(\mathsf {isMonitoredBy}\), consisting of the following CQs:

$$\begin{aligned} (y,x)&\leftarrow \mathsf {monitors}(x,y), \\ (x,x)&\leftarrow \mathsf {SelfMonitored}(x). \end{aligned}$$

To obtain all answers of this UCQ, we consider all possible tuples \((a,b)\in \mathsf{N_I} \) and instantiate the CQs as follows:

$$\begin{aligned} ()&\leftarrow \mathsf {monitors}(b,a), \\ ()&\leftarrow \mathsf {SelfMonitored}(a),\ a\approx b, \end{aligned}$$

which results in a Boolean UCQ. The latter is entailed by an ontology \(\mathcal {O}\) if one can derive that in all models of \(\mathcal {O}\) either the assertion \(\mathsf {monitors}(b,a)\) holds, or else both \(a\approx b\) and \(\mathsf {SelfMonitored}(a)\) are satisfied.

If the distinguished variables are clear from the context, we may also omit them and write a CQ simply as a set of atoms.

In fuzzy DLs, conjunctive queries can be of two different types: threshold conjunctive queries or general fuzzy queries [32, 40, 41].Footnote 4

Threshold queries ask for tuples of individuals that satisfy a set of assertions to at least some given degree. For example, the threshold query

$$\begin{aligned}&\{\mathsf {Server}(x)\geqslant 1, {\textsf {hasPart}}(x,y)\geqslant 1, \mathsf {CPU}(y)\geqslant 1, \\&\qquad \mathsf {Overused}(y)\geqslant 0.6\} \end{aligned}$$

asks for all pairs of servers and CPUs such that the CPU is a part of the server and is also overused to a degree of at least 0.6.

Definition 4

(Threshold Conjunctive Query) A degree atom is an expression of the form \(\alpha \geqslant d\), where \(\alpha \) is an atom and \(d\in {\mathscr {C}}_{>0} \). A (k -ary) threshold conjunctive query \(q_{\mathsf {t}}\) is of the form

$$\begin{aligned} (x_1,\dots ,x_k) \leftarrow \alpha _1\geqslant d_1,\dots ,\alpha _m\geqslant d_m, \end{aligned}$$

where \(\alpha _1\geqslant d_1,\dots ,\alpha _m\geqslant d_m\) are degree atoms and \(x_1,\dots ,x_k\) are variables. As before, \(\textsf {VarInds}({q_{\mathsf {t}}}) \) denotes the set of variables and individuals occurring in the threshold CQ \({q_{\mathsf {t}}} \).

Let \(\mathcal {I} \) be an interpretation, \(q_{\mathsf {t}}\) a Boolean threshold CQ, and \(\pi :\textsf {VarInds}({q_{\mathsf {t}}}) \rightarrow \Delta ^\mathcal {I} \) a function that maps each \(a\in \mathsf{N_I} \) to \(a^\mathcal {I} \). The degree of an atom \(\alpha =A(x)\) w.r.t. \(\pi \) is defined as \(\alpha ^\mathcal {I} (\pi ):=A^\mathcal {I} (\pi (x))\), and we set \(\alpha ^\mathcal {I} (\pi ):=r^\mathcal {I} (\pi (x),\pi (y))\) for \(\alpha =r(x,y)\); finally, for \(\alpha =x\approx y\) we define \(\alpha ^\mathcal {I} (\pi ):=1\) if \(\pi (x)=\pi (y)\), and \(\alpha ^\mathcal {I} (\pi ):=0\) otherwise. If \(\alpha ^\mathcal {I} (\pi )\geqslant d\) holds for all degree atoms \(\alpha \geqslant d\) in \(q_{\mathsf {t}}\), then we write \(\mathcal {I} \models ^{\pi }{q_{\mathsf {t}}} \) and call \(\pi \) a match for \(\mathcal {I} \) and \({q_{\mathsf {t}}} \). The notions of satisfaction, entailment, and answers are defined as for classical CQs.

General fuzzy CQs, in contrast, have the same syntax as classical conjunctive queries. Their answers are the tuples of individuals satisfying them to a degree greater than 0, together with the degree to which the query is satisfied. For example,

$$\begin{aligned} \{\mathsf {Server}(x),{\textsf {hasPart}}(x,y),\mathsf {CPU}(y),\mathsf {Overused}(y)\} \end{aligned}$$
(3)

asks for all overused CPUs that belong to a server, along with the degree to which these CPUs are overused. To obtain the degree of the query from the individual degrees of the atoms, the fuzzy operator interpreting the conjunction is used.

Definition 5

(Fuzzy Conjunctive Query) A (k -ary) fuzzy conjunctive query \(q_{\mathsf {f}}\) is of the form

$$\begin{aligned} (x_1,\dots ,x_k)\leftarrow \alpha _1,\dots ,\alpha _m, \end{aligned}$$

where \(\alpha _1,\dots ,\alpha _m\) are atoms and \(x_1,\dots ,x_k\) are variables. Let \(\mathcal {I} \) be an interpretation, \({q_{\mathsf {f}}} \) a Boolean fuzzy \( CQ \), and \(\pi \) a mapping as in Definition 4. If \(\bigotimes _{\alpha \in {q_{\mathsf {f}}}}\alpha ^\mathcal {I} (\pi )\geqslant d>0\), then we write \(\mathcal {I} \models ^{\pi }{q_{\mathsf {f}}} \geqslant d\) and call \(\pi \) a match for \(\mathcal {I} \) and \({q_{\mathsf {f}}} \) with a degree of at least d. We say that \(\mathcal {I} \) satisfies \({q_{\mathsf {f}}} \) with a degree of at least d and write \(\mathcal {I} \models {q_{\mathsf {f}}} \geqslant d\) if there is such a match. If \(\mathcal {I} \models {q_{\mathsf {f}}} \geqslant d\) for all models \(\mathcal {I} \) of an ontology \(\mathcal {O}\), we write \(\mathcal {O} \models {q_{\mathsf {f}}} \geqslant d\) and say that \(\mathcal {O} \) entails \({q_{\mathsf {f}}} \) with a degree of at least d. Finally, a tuple \(\mathfrak {a} \in \mathsf{N_I} ^k\) is an answer to a k-ary fuzzy CQ \(q_{\mathsf {f}}\) w.r.t. \(\mathcal {O}\) with a degree of at least d if \(\mathcal {O}\) entails \(\mathfrak {a} ({q_{\mathsf {f}}})\) with a degree of at least d.

The query entailment problem for a (Boolean) threshold \( CQ \) is to decide whether \(\mathcal {O} \models {q_{\mathsf {t}}} \). For fuzzy \( CQ \)s, we may consider two variants of the query entailment problem, namely

  • to decide whether \(\mathcal {O} \models {q_{\mathsf {f}}} \geqslant d\) for a given \(d\in {\mathscr {C}}_{>0} \), or

  • to find the best entailment degree \(\max \{d\mid \mathcal {O} \models {q_{\mathsf {f}}} \geqslant d\}\).

Since we consider only finitely valued semantics over the chain \({\mathscr {C}}\), these two problems can be polynomially reduced to each other. As for classical query answering, it suffices to analyze the complexity of query entailment; the results can then be transferred to query answering [15].

Example 6

Consider the following queries:

$$\begin{aligned} {q_{\mathsf {t}}}&:= \{{\textsf {hasPart}}(x,y)\geqslant 1,\mathsf {Overused}(y)\geqslant 0.9\},\\ {q_{\mathsf {f}}}&:= \{{\textsf {hasPart}}(x,y),\mathsf {Overused}(y)\}, \end{aligned}$$

and the ontology from Example 1. An answer to the query \(q_{\mathsf {t}}\) is \((\mathsf {serverA},\mathsf {memA})\), but not \((\mathsf {serverA},{\textsf {cpuA}})\) since \({\textsf {cpuA}}\) is only overused to degree 0.8. The answers to \(q_{\mathsf {f}}\) are the pairs \((\mathsf {serverA},{\textsf {cpuA}})\) with degree \(\geqslant 0.8\) and \((\mathsf {serverA},\mathsf {memA})\) to degree 1.

Remark 7

A threshold CQ with inequalities using \(\leqslant \) would correspond to a classical CQ containing negated role atoms, for which query answering is undecidable even in very inexpressive DLs [24, 33]. Similarly, upper bounds for fuzzy conjunctive queries \(q_{\mathsf {f}}\), i.e., asking whether \(\mathcal {O} \models {q_{\mathsf {f}}} \leqslant d\), can be seen as a generalized form of disjunction of (negated) query atoms. For these reasons, we consider only inequalities using \(\geqslant \).

Before we turn to answering such queries over fuzzy ontologies, we describe the reduction of expressive finitely valued fuzzy ontologies to classical ones.

3 Reduction of Finitely Valued Fuzzy Ontologies to Classical Ontologies

A popular reasoning technique for fuzzy DLs based on finite chains is the reduction of the fuzzy ontology to a classical one. This allows to use existing DL systems to reason in the fuzzy description logic. However, a major drawback of existing approaches for finite chains using arbitrary t-norms (see [8, 10, 28]) is that this reduction introduces an exponential blow-up in the size of the fuzzy ontology. While this handicap can be remedied by our normalization step described in Sect. 3.2 (see also the experiments in Sect. 5.2), another obstacle needs to be addressed first: the reduction proposed in [8, 10, 28] is not correct for number restrictions.

In the following, we describe this problem in detail and propose a (partial) solution.

3.1 Treating Number Restrictions

The reduction in [8, 10, 28] is based on the idea to simulate number restrictions by existential restrictions in the following way. For a number restriction \({\ge }m\,r.C \), the new concept names \(B_1,\dots ,B_m\) and the axioms

  • \(\top \sqsubseteq B_1\sqcup \dots \sqcup B_m \text { and}\)

  • \(B_i\sqcap B_j\sqsubseteq \bot \text { for all }i,j,\ 1\leqslant i<j\leqslant m.\)

are introduced, which require them to form a partition. Subsequently, the number restriction \({\ge }m\,r.C \) is replaced by the concept \(\exists r.(C\sqcap B_1)\sqcap \ldots \sqcap \exists r.(C\sqcap B_m)\). The following classical example shows that this replacement does not preserve the semantics of the number restrictions, and thus cannot be correct in the fuzzy case, either.

Example 8

Consider the following ABox and TBox:

$$\begin{aligned} \mathcal {A}:= & {} \{ r(a,a),\,r(a,b),\,r(b,a),\,r(b,c),\,r(c,b),\,r(c,c), \\&a\ne b,\,b\ne c,\,a\ne c \}, \\ \mathcal {T}:= & {} \{ \top \sqsubseteq {\le }2\,r.\top ,\, \top \sqsubseteq {\ge }2\,r.\top \}. \end{aligned}$$

A simple model for \(\mathcal {O} =(\mathcal {A},\mathcal {T}, \emptyset )\) is given by \(\Delta ^\mathcal {I} :=\{a,b,c\}\) and \(r^\mathcal {I}:=\{(a,a),(a,b),(b,a),(b,c),(c,b),(c,c)\}\). Thus, the ontology \(\mathcal {O} \) is consistent. By replacing \({\ge }2\,r.\top \) according to the method described above, we obtain the TBox

$$\begin{aligned} \mathcal {T} ' := \{&\top \sqsubseteq {\le }2\,r.\top ,\, \top \sqsubseteq \exists r.B_1\sqcap \exists r.B_2, \\&B_1\sqcap B_2\sqsubseteq \bot ,\, \top \sqsubseteq B_1\sqcup B_2 \}. \end{aligned}$$

We show that the resulting ontology \((\mathcal {A},\mathcal {T} ',\emptyset )\) is inconsistent. Assume to the contrary that there exists a model \(\mathcal {I} '\) of the ontology \(\mathcal {O} '= (\mathcal {A}, \mathcal {T} ', \emptyset )\). Without loss of generality, suppose that it interprets the individual names abc as themselves. Thus, we must have \(r^\mathcal {I} \subseteq r^{\mathcal {I} '}\). There are only eight possible combinations for abc belonging to either \(B_1\) or \(B_2\). Suppose first that \(a,b\in B_1^{\mathcal {I} '}\) and \(c\in B_2^{\mathcal {I} '}\). Then, by the axiom \(\top \sqsubseteq \exists r.B_1\sqcap \exists r.B_2\) the individual a must have yet another r-successor \(x\in B_2^{\mathcal {I} '}\). However, this contradicts the GCI \(\top \sqsubseteq {\le }2\,r.\top \). Similar arguments apply for all other combinations and, therefore, the ontology is inconsistent.

It should be noted that for Gödel and Zadeh semantics alternative (correct) reductions of number restrictions exist [7, 8].

We now propose an alternative encoding of number restrictions when using other fuzzy semantics, avoiding the problem exhibited by Example 8. Intuitively, instead of using a partition of the target concept C of a restriction \({\ge }m\,r.C \), we will partition the role r. Note first that at-most restrictions can be expressed using negation and at-least restrictions; that is, \({\le }m\,r.C \) has the same semantics as \(\lnot ({\ge }(m+1)\,r.C)\) (cf. Table 2). Hence, in the following, we focus on methods for handling at-least number restrictions. Furthermore, we can assume without loss of generality that they only occur in axioms of the forms

$$\begin{aligned} \langle A\sqsubseteq {\ge }m\,r.B \geqslant d\rangle \quad \text {and} \quad \langle {\ge }m\,r.B \sqsubseteq A\geqslant d\rangle , \end{aligned}$$

where A and B are concept names (cf. Sect. 3.2).

Axioms of the first kind can be equivalently expressed using m fresh role names \(r_1,\dots ,r_m\) in the following axioms:

$$\begin{aligned} \langle A\sqsubseteq \exists r_i.C \geqslant d\rangle , \quad r_i\sqsubseteq r, \quad \text {dis}(r_j,r_k), \end{aligned}$$

for all \(i,j,k\in \{1,\dots ,m\}\) with \(j<k\). This is correct due to the minimum used in the semantics of at-least restrictions. More precisely, every model of the original axiom can be extended by a suitable interpretation of the new role names to a model of the resulting axioms, and every model of the latter is immediately a model of the former. Hence, we can eliminate all at-least restrictions that occur on the right-hand side of GCIs (and all at-most restrictions that occur on the left-hand side of GCIs).

Unfortunately, this approach does not work for at-least restrictions occurring on the left-hand side of GCIs. The reason is that the presence of m many r-successors satisfying C does not imply that these successors can be reached using one of the disjoint roles \(r_1,\dots ,r_m\). However, this can be expressed using the additional role axiom

$$\begin{aligned} r\sqsubseteq r_1\sqcup \dots \sqcup r_m, \end{aligned}$$
(4)

which involves a role disjunction that is interpreted using the maximum, i.e.,

$$\begin{aligned} (r_1\sqcup \dots \sqcup r_m)^\mathcal {I} (x,y):=\max _{i=1}^m r_i^\mathcal {I} (x,y). \end{aligned}$$

Role disjunction is an example of a (safe) Boolean role constructor, which can be added to most classical DLs without increasing the complexity of reasoning [34]. Moreover, some query answering procedures for classical DLs even work in the presence of such constructors [16]. Unfortunately, to the best of our knowledge, role disjunctions are not yet supported by any classical DL reasoner.

Table 4 Normalization rules for \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) ontologies

In the presence of axiom (4) and the role disjointness axioms from above, the GCI \(\langle {\ge }m\,r.C \sqsubseteq A \geqslant d\rangle \) can now be equivalently expressed as:

$$\begin{aligned} \langle \exists r_1.C\sqcap \dots \sqcap \exists r_m.C \sqsubseteq A \geqslant d\rangle . \end{aligned}$$

Unlike the incorrect reduction for number restrictions that was first proposed in [10], our approach does not partition the range of the role r in the number restriction, but rather the role itself, and hence it correctly treats the case where a domain element is an r-successor of two different elements that are subject to the same number restriction on r (recall Example 8). However, like the approach of [8, 10, 28], this incurs an exponential blow-up in the largest number occurring in number restrictions, if these numbers are represented in the ontology using a binary encoding. The reduction is polynomial if we assume unary encoding of numbers.

Since role disjunctions are not supported by \({\mathcal {SROIQ}}\) or OWL 2, we will restrict the following investigation to unqualified number restrictions of the form \({\ge }m\,r:={\ge }m\,r.\top \) and \({\le }m\,r:={\le }m\,r.\top \), i.e., to the fuzzy logic \({\mathscr {C}}\)-\({\mathcal {SROIN}}\). However, we want to emphasize that we can easily treat qualified number restrictions in the following reduction if the classical target language supports role disjunctions. It is straightforward to extend the reduction to deal even with \({\mathcal {SROIQB}}_{\mathrm{s}}\), the extension of \({\mathcal {SROIQ}}\) with full Boolean role expressions (which satisfy a safety condition) [34]; the reduction of the role constructors is similar to the one for concepts.

3.2 Ontology Normalization for \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) Ontologies

The reason that the reductions described in [7, 8, 10] can cause an exponential blow-up in the size of the ontology is that concept constructors may be nested to arbitrary depths. In this subsection, we propose a normalization step to ensure that each GCI and concept assertion contains at most one concept constructor, and that each complex role inclusion contains at most two roles on the left-hand side. Because of this, the subsequent reduction of a \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) ontology \(\mathcal {O}\) to a classical \({\mathcal {SROIN}}\) ontology \(\mathcal {O} _\mathsf {c} \) causes only a linear blow-up in the size of \(\mathcal {O}\) (and a quadratic blow-up in the size of \({\mathscr {C}}\)). For an experimental evaluation of the resulting difference in ontology size and reasoning performance, see Sect. 5.2.

The normalization proceeds by exhaustively replacing each axiom by a set of axioms according to Table 4. In that table, \(A,A_C,A_D\) denote concept names, \(\top \), or \(\bot \); CD are complex concepts that are neither concept names, \(\top \), nor \(\bot \); and \(r_1,\dots ,r_m\), r, and \(r_{r_1r_2}\) are roles. \(A_C\) and \(A_D\) are fresh concept names that abbreviate the concepts C and D, respectively. In the last rule, \(r_{r_1r_2}\) is a fresh role name that stands for the role composition of \(r_1\) and \(r_2\). For simplicity, we have given the rules for conjunctions and disjunctions only for the case where both operands are complex concepts. However, if only one of them is a complex concept, we would not introduce a new concept name for the other operand. Note that nominals, unqualified number restrictions, and local reflexivity concepts do not need to be normalized.

It should be noted that this reduction is not correct under Zadeh semantics due to the properties of the implication function. However, [4] provides a different reduction for this case that does not exhibit an exponential blow-up even without normalization. Hence, we consider in the following result only semantics that are based on finitely valued t-norms and their induced operators \(\Rightarrow \), \(\ominus \), and \(\oplus \).

Proposition 9

Let \(\mathcal {O} '\) be the ontology resulting from the exhaustive application of the rules in Table 4 to a \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) ontology \(\mathcal {O}\).

Fig. 1
figure 1

ABox and TBox for Example 14

Under t-norm-based semantics, every model of \(\mathcal {O}\) can be extended to a model of \(\mathcal {O} '\) by interpreting the new concept names \(A_C\) like C and \(r_{r_1r_2}\) like \(r_1r_2\).Footnote 5 Moreover, every model of \(\mathcal {O} '\) is already a model of \(\mathcal {O}\).

This simple observation immediately shows that \(\mathcal {O} '\) is consistent iff \(\mathcal {O}\) is consistent. Moreover, it allows us to prove correctness of the normalization procedure also with respect to the other reasoning tasks we will consider in the following sections. Furthermore, it is easy to see that the normalization could be extended to deal also with qualified number restrictions (\(\mathcal {Q}\)).

While this procedure involves the introduction of linearly many new concept names, it allows us to circumvent the exponential blow-up exhibited by previous reductions.

Remark 10

The reason why this normalization reduces the complexity of the following reduction is that it ensures that each axiom contains at most three occurrences of concept or role names. However, we will see in the following subsection that concept and role names that are interpreted classically, i.e., can take only the values 0 and 1, do not take part in the reduction (see also [2, 5]). Hence, it is enough to ensure that each axiom contains at most three occurrences of fuzzy concept or role names. Such axioms do not need to be reduced any further. Nevertheless, all complexity results concerning the reduction in the following section remain valid.

Example 11

The normalized form of the TBox \(\mathcal {T}\) containing the GCI from Example 1 is as follows:

$$\begin{aligned} \{&\mathsf {Overused}\sqcap \mathsf {CPU}\sqsubseteq A, \\&\mathsf {Overused}\sqcap \mathsf {Memory}\sqsubseteq B, \\&\exists {\textsf {hasPart}}.A \sqsubseteq C, \\&\exists {\textsf {hasPart}}.B \sqsubseteq D, \\&\mathsf {Server}\sqcap C\sqsubseteq E, \\&\langle E\sqcap D \sqsubseteq \mathsf {ServerWithLimitedResources} \geqslant 0.8\rangle \} \end{aligned}$$

However, since the GCI in \(\mathcal {T}\) contains only three occurrences of names of fuzzy concepts (two times \(\mathsf {Overused}\) and once \(\mathsf {ServerWithLimitedResources}\)), we can use \(\mathcal {T}\) as it is in the following reduction.

We will assume in the following that \(\mathcal {O}\) is already normalized. The remainder of the reduction is very similar to the one described in [8] (except for number restrictions).

3.3 The Reduction Algorithm

Each concept name and role name in \(\mathcal {O}\) is mapped onto a set of concepts and roles corresponding to their \(\alpha \)-cuts, which are crisp sets containing all elements that belong to a fuzzy set to at least a given degree \(\alpha \). For example, if the concept name \(\mathsf {Overused}\) describes the degree to which a CPU is overused, then \(\mathsf {Overused}_{\geqslant 0.6}\) represents the set of CPUs that are overused to a degree of at least 0.6. It is clear that we do not need to consider the value 0 for such cuts, as \(A_{\geqslant 0}\) always describes the whole domain. We may also refer to concept names of the form \(A_{>d}\) for \(d\in {\mathscr {C}} \) and \(d<1\), which is a short-hand notation for \(A_{\ge d_{\mathrm{next}}}\), and similarly for role names.

The \({\mathcal {SROIN}} \) ontology \(\mathcal {O} _\mathsf {c} \) obtained from the reduction has the following form:

  • To preserve the semantics of \(\alpha \)-cuts of concept and role names, the following axioms are added to \(\mathcal {O} _\mathsf {c} \) for all \(A\in \mathsf{N_C} \), \(r\in \mathsf{N_R} \), and \(d\in {\mathscr {C}} \) with \(0<d<1\):

    $$\begin{aligned} A_{>d}\sqsubseteq A_{\geqslant d},\ r_{>d}\sqsubseteq r_{\geqslant d}. \end{aligned}$$
  • Each complex concept C appearing in \(\mathcal {O}\) is mapped to the complex concept \(\rho (C,\geqslant d)\) that represents its \(\alpha \)-cut regarding degree d, as defined in the first part of Table 7 in the appendix.

  • Each axiom in \(\mathcal {O}\) is then mapped to a classical axiom or set of axioms in \(\mathcal {O} _\mathsf {c} \) according to the mapping \(\kappa \) defined in the second part of Table 7.

For a more detailed analysis of the reduction rules, the interested reader may refer to [7, 8, 10].

We provide a detailed proof of correctness in the appendix.

Theorem 12

Let \(\mathcal {O}\) be a \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) ontology. Then, \(\mathcal {O}\) has a fuzzy model iff its reduced form \(\mathcal {O} _\mathsf {c} \) has a classical model.

Our normalization procedure allows us to show the following improved complexity bounds. The proof of the following lemma can be found in Appendix 1.

Lemma 13

For a normalized \({\mathscr {C}}\)-\({\mathcal {SROIN}}\) ontology \(\mathcal {O}\), the size of \(\mathcal {O} _\mathsf {c} \) is linear in the size of \(\mathcal {O}\) and quadratic in the size of \({\mathscr {C}}\).

This means that, by simply introducing the normalization step, we can avoid the exponential blow-up of the crispification approach. In particular, we greatly improve the exponential bounds shown in [8, 10].

Example 14

Figure 1 contains the reduced form of the ontology from Example 1 w.r.t. Łukasiewicz semantics over the chain with six elements \(\mathscr {C}=\{0,0.2,0.4,0.6,0.8,1\}\). We have taken into account that one does not need to consider \(\alpha \)-cuts of classical concept and role names. This nicely illustrates how classical concepts and roles help to reduce the size of the reduction [2, 5]. Not only do we have 1 crisp concept instead of \(n-1\) cut concepts, but the number of disjunctions and conjunctions introduced can be reduced dramatically (cf. Table 7).

4 Conjunctive Query Answering for Fuzzy DLs

In this section, we show how to solve the problem of answering threshold and fuzzy CQs in finitely valued fuzzy DLs by taking advantage of existing algorithms for answering unions of conjunctive queries in classical DLs. Our solution is based on the reduction technique described in Sect. 3 .

4.1 Translating Fuzzy and Threshold CQs

In the following, we define a function \(\kappa \) that maps each threshold CQ and fuzzy CQ to a (U)CQ in a classical DL. The idea is that we can then evaluate these classical queries over \(\mathcal {O} _\mathsf {c} \) to answer the original queries over \(\mathcal {O}\). The shape of the mapping \(\kappa \) depends on the type of query, and uses \(\alpha \)-cuts for reducing fuzzy concept and role names to classical ones (cf. Sect. 3).

We first define the function \(\kappa \) for degree atoms of queries analogously as it was done for assertions:

$$\begin{aligned}&\kappa (A(x)\geqslant d) := A_{\geqslant d}(x), \\&\kappa (r(x,y)\geqslant d) := r_{\geqslant d}(x,y). \end{aligned}$$

This definition is then lifted to threshold CQs \(q_{\mathsf {t}}\) in the obvious way: \(\kappa ({q_{\mathsf {t}}}):=\{\kappa (\alpha ) \mid \alpha \in {q_{\mathsf {t}}} \}\).

For transforming fuzzy \( CQ \)s, we use \(\alpha \)-cuts as in the translation of concepts. Thus, in the case of fuzzy \( CQ \)s, \(\kappa \) receives as input also a membership degree from \({\mathscr {C}}_{>0}\), which will be a lower bound for the degree of the query. Recall that the answers depend on the operator \(\otimes \) that interprets the conjunction in the logic under consideration. Since there can be several different combinations of degrees for the different atoms that result in the same degree, fuzzy \( CQ \)s are translated into unions of (classical) conjunctive queries, representing each of these combinations.

Formally, for a fuzzy CQ \({q_{\mathsf {f}}} =\{\alpha _1,\dots ,\alpha _n\}\) and \(d\in {\mathscr {C}}_{>0} \), \(\kappa ({q_{\mathsf {f}}},\geqslant d)\) is the set of all conjunctive queries

$$\begin{aligned} \{\kappa (\alpha _1\geqslant d_1),\dots ,\kappa (\alpha _n\geqslant d_n)\}, \end{aligned}$$

where \(d_1,\ldots ,d_n\in {\mathscr {C}} \) are such that \(\bigotimes _{i=1}^n d_i\geqslant d\).

Example 15

Consider a threshold CQ asking for all pairs of connected servers such that the first one has limited and the second one has available resources:

$$\begin{aligned}&\{\mathsf {ServerWithLimitedResources}(x)\geqslant 0.8, \\&\qquad \mathsf {isConnectedTo}(x,y)\geqslant 0.6, \\&\qquad \mathsf {ServerWithAvailableResources}(y)\geqslant 0.6\}. \end{aligned}$$

This threshold \( CQ \) is reduced to the following classical \( CQ \):

$$\begin{aligned}&\{\mathsf {ServerWithLimitedResources}_{\geqslant 0.8}(x), \nonumber \\&\qquad \mathsf {isConnectedTo}_{\geqslant 0.6}(x,y), \nonumber \\&\qquad \mathsf {ServerWithAvailableResources}_{\geqslant 0.6}(y)\}. \end{aligned}$$

A fuzzy \( CQ \) asking for the same information, but without the thresholds, is the following:

$$\begin{aligned}&\{\mathsf {ServerWithLimitedResources}(x), \\&\qquad \mathsf {isConnectedTo}(x,y), \\&\qquad \mathsf {ServerWithAvailableResources}(y)\}. \end{aligned}$$

To acquire all the pairs (xy) that satisfy this query with degree at least 0.8, for the Łukasiewicz t-norm over the chain with 6 membership degrees, the query is reduced to the following union of classical \( CQ \) s:

$$\begin{aligned} \big \{&\{ \mathsf {ServerWithLimitedResources}_{\geqslant 0.8}(x), \\&\qquad \mathsf {isConnectedTo}_{\geqslant 1}(x,y), \\&\qquad \mathsf {ServerWithAvailableResources}_{\geqslant 1}(y)\}, \\&\{ \mathsf {ServerWithLimitedResources}_{\geqslant 1}(x), \\&\qquad \mathsf {isConnectedTo}_{\geqslant 0.8}(x,y), \\&\qquad \mathsf {ServerWithAvailableResources}_{\geqslant 1}(y) \},\\&\{\mathsf {ServerWithLimitedResources}_{\geqslant 1}(x), \\&\qquad \mathsf {isConnectedTo}_{\geqslant 1}(x,y), \\&\qquad \mathsf {ServerWithAvailableResources}_{\geqslant 0.8}(y)\}\big \}. \end{aligned}$$

The following theorem states that our query reduction is sound and complete.

Theorem 16

Let \(\mathcal {O} _\mathsf {c} \) be the classical version of the fuzzy ontology \(\mathcal {O} \), \({q_{\mathsf {t}}} \) be a threshold \( CQ \), \({q_{\mathsf {f}}} \) be a fuzzy \( CQ \), and \(d\in {\mathscr {C}} \). Then, the following equivalences hold:

  1. 1.

    \(\mathcal {O} \models {q_{\mathsf {t}}} ~\Leftrightarrow ~\mathcal {O} _\mathsf {c} \models \kappa ({q_{\mathsf {t}}})\)

  2. 2.

    \(\mathcal {O} \models {q_{\mathsf {f}}} \geqslant d~\Leftrightarrow ~\mathcal {O} _\mathsf {c} \models \kappa ({q_{\mathsf {f}}},\geqslant d)\).

Proof

To prove that \( \mathcal {O} _\mathsf {c} \models \kappa ({q_{\mathsf {t}}})\) implies \(\mathcal {O} \models {q_{\mathsf {t}}} \), consider any fuzzy model \(\mathcal {I} \) of \(\mathcal {O} \). By Proposition 9, \(\mathcal {I}\) can be extended to a model of the normalized ontology \(\mathcal {O} '\). We now define the classical interpretation \(\mathcal {J} =\{\Delta ^\mathcal {J},\cdot ^\mathcal {J} \}\) as follows (cf. Appendix 1):

$$\begin{aligned}&\Delta ^{\mathcal {J}} := \Delta ^{\mathcal {I}} \nonumber \\&a^\mathcal {J}:= a^\mathcal {I} \nonumber \\&A_{\geqslant d}^{\mathcal {J}} := \left\{ \beta \mid A^\mathcal {I} (\beta )\geqslant d \right\} \nonumber \\&r_{\geqslant d}^{\mathcal {J}} := \left\{ (\beta ,\gamma ) \mid r^\mathcal {I} (\beta ,\gamma )\geqslant d \right\} . \end{aligned}$$
(5)

By Lemma 22, \(\mathcal {J} \) is a classical model of \(\mathcal {O} _\mathsf {c} \). Since \(\mathcal {J} \models \mathcal {O} _\mathsf {c} \) and \( \mathcal {O} _\mathsf {c} \models \kappa ({q_{\mathsf {t}}})\), it follows that \(\mathcal {J} \models \kappa ({q_{\mathsf {t}}})\). By the construction of \(\mathcal {J} \), it can be easily verified that \(\mathcal {I} \models {q_{\mathsf {t}}} \).

It can be shown in a similar way that \(\mathcal {O} \models {q_{\mathsf {f}}} \geqslant d\) implies \(\mathcal {O} _\mathsf {c} \models \kappa ({q_{\mathsf {f}}},\geqslant d)\). To prove the opposite direction we build for each classical model \(\mathcal {J}\) of \(\mathcal {O} _\mathsf {c} \), the fuzzy interpretation \(\mathcal {I} =(\Delta ^\mathcal {I},\cdot ^\mathcal {I})\), where

$$\begin{aligned}&\Delta ^{\mathcal {I}} := \Delta ^{\mathcal {J}} \nonumber \\&a^\mathcal {I}:= a^\mathcal {J} \nonumber \\&A^\mathcal {I} (\beta ) := \max \big \{ d \mid \beta \in A_{\geqslant d}^\mathcal {J} \big \} \nonumber \\&r^{\mathcal {I}}(\beta ,\gamma ) := \max \big \{ d \mid (\beta ,\gamma )\in r_{\geqslant d}^\mathcal {J} \big \}. \end{aligned}$$
(6)

(cf. Appendix 1). By Proposition 9 and Lemma 20, \(\mathcal {I}\) is a fuzzy model of the original ontology \(\mathcal {O}\).

It is straightforward to show that \(\mathcal {J} \models \kappa ({q_{\mathsf {t}}})\) whenever \(\mathcal {I} \models {q_{\mathsf {t}}} \), and similarly for fuzzy CQs. \(\square \)

It is easy to see that this result applies not only to the reduction described in Sect. 3, but also to any reduction that can be shown correct using the definitions in (5) and (6), e.g., the one optimized for \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\) under Gödel and Zadeh semantics in [8].

4.2 Complexity Results

By Lemma 13, the size of the crispified ontology \(\mathcal {O} _\mathsf {c} \) is polynomial in the size of \(\mathcal {O}\). Therefore, we can transfer all complexity results for answering classical CQs over classical sublogics of \({\mathcal {SROIN}}\) directly to the query answering problem for threshold CQs. In particular, recall that CQ entailment in \({\mathcal {SROIN}}\) can be decided in 3-ExpTime  [16], is 2-ExpTime-complete for \({\mathcal {SHIN}}\), and ExpTime-complete for \({\mathcal {ALCHN}}\)  [20, 27].

Moreover, for fuzzy DLs where correct crispification algorithms exist even for \({\mathcal {SROIQ}}\), e.g., for Gödel and Zadeh semantics [8] or in the presence of role disjunctions (see Sect. 3.1), any classical CQ answering technique that is able to handle number restrictions can be applied also for threshold CQs. Under this condition, answering threshold CQs in finitely valued extensions of \({\mathcal {SHIQ}}\) and \(\mathcal{SHOQ}\) is 2-ExpTime-complete [20, 22], for \({\mathcal {SROQ}}\) and \({\mathcal {SRIQ}}\) it can be done in 3-ExpTime  [16], while for \({\mathcal {SHQ}}\) it becomes ExpTime-complete if queries are restricted to simple roles, i.e., roles that do not have transitive subroles [27]. Notice, however, that none of these query answering approaches has been implemented so far.

For fuzzy CQs, the complexity increases by an exponential factor due to the blow-up in the translation \(\kappa \). It is possible to eliminate this blow-up, however, when the minimum t-norm is used, i.e., under Gödel and Zadeh semantics. Then, we can obviously define \(\kappa ({q_{\mathsf {f}}},\geqslant d):=\{\kappa (\alpha \geqslant d) \mid \alpha \in {q_{\mathsf {f}}} \}\) for any fuzzy CQ \(q_{\mathsf {f}}\), and thus obtain the same complexity results as for threshold CQs.

It should also be noted that the data complexity of all these problems is the same as that for classical CQs, as the size of the ABox is not increased by the reduction from \(\mathcal {O}\) to \(\mathcal {O} _\mathsf {c} \). Since for many applications the TBox remains unchanged, while the ABox changes frequently; the reduction to the crisp ontology \(\mathcal {O} _\mathsf {c} \) need not be computed when queries are answered, but need to be computed only once “off-line” beforehand.

4.3 Generalizing the Query Component

So far, we have examined the reduction technique for answering threshold CQs and fuzzy CQs. These two types of queries are immediate extensions of classical CQs, and usually considered in the literature. Nevertheless, the existence of degrees may lead to more general forms of fuzzy CQs in which the score of a query is computed via a monotone scoring function, as described in the following.

Definition 17

A (k-ary) scoring query \(q_{\mathsf {s}}\) is an expression of the form

$$\begin{aligned} (x_1,\dots ,x_k) \leftarrow f(\alpha _1,\dots ,\alpha _n), \end{aligned}$$

where f is a monotonically increasing scoring function with n arguments, \(\alpha _1,\dots ,\alpha _n\) are atoms, and \(x_1,\dots ,x_k\) are variables. Let \(\mathcal {I} \) be an interpretation, \({q_{\mathsf {s}}} \) a Boolean scoring query, and \(\pi \) a mapping as in Definition 4. If

$$\begin{aligned} f(\alpha _1^\mathcal {I} (\pi ),\dots ,\alpha _n^\mathcal {I} (\pi ))\geqslant d, \end{aligned}$$

then we write \(\mathcal {I} \models ^{\pi }{q_{\mathsf {s}}} \geqslant d\) and call \(\pi \) a match for \(\mathcal {I} \) and \({q_{\mathsf {s}}} \) with a score of at least d. We say that \(\mathcal {I} \) satisfies \({q_{\mathsf {s}}} \) with a score of at least d and write \(\mathcal {I} \models {q_{\mathsf {s}}} \geqslant d\) if there is such a match. If \(\mathcal {I} \models {q_{\mathsf {s}}} \geqslant d\) for all models \(\mathcal {I} \) of an ontology \(\mathcal {O}\), we write \(\mathcal {O} \models {q_{\mathsf {s}}} \geqslant d\) and say that \(\mathcal {O} \) entails \({q_{\mathsf {s}}} \) with a score of at least d. Finally, a tuple \(\mathfrak {a} \in \mathsf{N_I} ^k\) is an answer to a k-ary scoring query \(q_{\mathsf {s}}\) w.r.t. \(\mathcal {O}\) with a score of at least d if \(\mathcal {O}\) entails \(\mathfrak {a} ({q_{\mathsf {s}}})\) with a score of at least d.Footnote 6

It should be noted that the score may take an arbitrary value in \(\mathbb {R}\). This kind of queries has already been considered in the literature [32, 40, 41]. Fuzzy CQs can be seen as special scoring queries of the form \(\alpha _1\otimes \dots \otimes \alpha _n\). Since \({\mathscr {C}}\) is finite, the same technique as for fuzzy CQs can be applied here, i.e., considering all possible combinations of degrees in \({\mathscr {C}} \).

Example 18

Suppose that we are interested in finding all servers that have overused CPU and memory, but the excessive use of CPU should be considered of greater importance than the use of memory. To achieve this, we formulate the following query to include a weighting factor on the degrees of overuse for the different components. For instance, we can use the query

$$\begin{aligned}&\mathsf {Server}(x) \cdot {\textsf {hasPart}}(x,y) \cdot \mathsf {CPU}(y) \cdot {\textsf {hasPart}}(x,z) \cdot {} \nonumber \\&\quad \mathsf {Memory}(z) \cdot \frac{3\cdot \mathsf {Overused}(y)+2\cdot \mathsf {Overused}(z)}{5}\quad \end{aligned}$$
(7)

where the fraction in the last factor takes into account the degrees of overuse of CPU and memory with weights 0.6 and 0.4, respectively.

Assume that \({\mathscr {C}} =\{0,0.25,0.5,0.75,1\}\) and that the concepts \(\mathsf {Server}\), \(\mathsf {CPU}\), and \(\mathsf {Memory}\) and the role \({\textsf {hasPart}}\) behave classically. If we want to find all answers that satisfy this query to degree at least 0.25, then we can translate it into a union of classical conjunctive queries that contains, e.g.,

$$\begin{aligned}&\left\{ \mathsf {Server}_{\geqslant 1}(x), \right. \left. {\textsf {hasPart}}_{\geqslant 1}(x,y), \mathsf {CPU}_{\geqslant 1}(y), \right. \\&\quad \left. {\textsf {hasPart}}_{\geqslant 1}(x,z), \mathsf {Memory}_{\geqslant 1}(z), \right. \\&\quad \left. \mathsf {Overused}_{\geqslant 0.25}(y), \mathsf {Overused}_{\geqslant 0.75}(z) \right\} . \end{aligned}$$

When evaluated over the reduced ontology, this query returns all triples of elements from \(\mathsf {Server},\mathsf {CPU}\), and \(\mathsf {Memory}\), respectively, where the CPU is overused to degree 0.25 and the memory is overused to degree 0.75. These tuples satisfy the query (7) with a degree of at least

$$\begin{aligned} 0.6\cdot 0.25+0.4\cdot 0.75=0.45\geqslant 0.25, \end{aligned}$$

as desired.

Another interesting problem, specific to weighted logics, is the top-k query answering problem presented in [3941]. This variation of the fuzzy query answering problem focuses on the k answers with the highest degrees of satisfaction. In a naive approach to solve this problem, the translation function \(\kappa \) for fuzzy or scoring \( CQ \) s can be iteratively applied starting from the highest to the lowest degrees in \({\mathscr {C}} \) until the limit of k answers is reached. It has to be investigated if a more sophisticated approach can be adopted to solve this problem.

5 Practical Evaluation

We have proved that, by introducing the normalization step, we can avoid the exponential blow-up of the earlier crispification approach (Sect. 3). We have additionally extended the reduction process to handle the problem of CQ answering (Sect. 4). The main objectives of this section are to

  1. 1.

    Evaluate how the ontology normalization preprocessing improves the total execution time for CQ answering, and

  2. 2.

    Study the practical limitations of the reduction approach for the problem of CQ answering in the finitely valued fuzzy setting.

Rather than implementing a full fuzzy CQ answering system from scratch, we have modified the existing reasoner DeLorean [5].Footnote 7 DeLorean is a reduction-based fuzzy DL reasoner that supports fuzzy variants of the description logics \(\mathcal {SROIQ}(\mathcal {D})\) and \(\mathcal {SHOIN}(\mathcal {D})\) under the finite Gödel, finite Łukasiewicz, and Zadeh semantics. We use DeLorean to transform an input fuzzy ontology into a crisp ontology through the transformation rules described earlier in this paper. Additionally, we have implemented the reduction for threshold queries described in Sect. 4. After the crisp ontology and query are obtained, an arbitrary crisp reasoner can be employed for query answering. For our experiments, we used the PAGOdA systemFootnote 8 [42, 43]. See [2] for an evaluation of the performance of DeLorean on classical reasoning problems in tractable DLs.

Other reasoners that support query answering for expressive DLs could also have been adopted. However, most other systems can only correctly answer CQs that can be directly expressed in the DL, e.g., tree-shaped CQs or instance queries, or employ the simplifying restriction that the non-distinguished variables can only be bound to named individuals [31]. Although in theory PAGOdA also depends on the query answering capabilities of HermiT  [21], it was shown in [42] that it correctly answers the standard queries of the LUBM benchmark (which we use in the following), without having to rely on HermiT. This is also the case for the normalized, fuzzified, and crispified versions of the LUBM TBox we use in the following experiments.

5.1 Test Data and Test Set-up

To evaluate our approach, we used the LUBMFootnote 9 ontology benchmark [23]. The LUBM benchmark contains terminological knowledge that describes the domain of a university, and generates synthetic OWL data (ABox) over the ontology specifying individuals belonging to the university. The ABox corresponding to a single university contains approximately 1300 concept assertions and 2450 role assertions and can be scaled by a factor of k, producing information for k different universities. In our experiments, we used different ABoxes covering 1, 15, and 30 universities, respectively.

Table 5 Time needed for preprocessing by PAGOdA (in s) and query execution time (in ms) for instance queries over the original LUBM TBox vs. its normalized version

To use LUBM as a benchmark for fuzzy reasoning, we have extended all ABox axioms with random degrees chosen from a fixed finite chain of cardinality 3, 7, or 11, which are interpreted using Łukasiewicz semantics. The axioms in the TBox are always required to hold with degree 1. Note that the original LUBM TBox is formulated in \(\mathcal {ELHI}_{R^+}\) (we ignored all datatype axioms), and hence the same holds for our fuzzified variants. However, due to the introduction of disjunctions in the reduction to a classical ontology, the resulting TBox is formulated in \(\mathcal {SHI}\) (see Table 7 and [2]).

The queries of the LUBM benchmark were translated to threshold queries by asking each atom to hold with the smallest non-zero degree (\(\frac{1}{2}\), \(\frac{1}{6}\), and \(\frac{1}{10}\) for 3, 7, and 11 truth degrees, respectively).

For all experiments, we report the average runtime over 5 separate runs; the coefficient of variation was mostly below 25 %, which means that the runtimes did not differ much between runs. All the tests were performed on a machine powered by an Intel Core i7 2.6 GHz processor and equipped with 8 GB DDR3 1600MHz main memory.

5.2 Evaluating the Normalization and Its Effect on Query Answering

For the first experiment, we applied the normalization from Sect. 3.2 to the LUBM TBox, and compared the query answering times for a single challenging threshold CQ (based on query #9 of the LUBM benchmark). The results of this evaluation, based on the Łukasiewicz semantics, are presented in Table 5.

Fig. 2
figure 2

The sizes of the crispified TBoxes (the number of occurrences of concept and role names) compared between the original LUBM TBox and its normalized variant

The table compares the sizes of the crispified TBoxes for both the original TBox and the normalized one (counted as the number of occurrences of concept and role names), as well as the running times of PAGOdA. It shows both the time needed by PAGOdA for its own preprocessing of the ontology and the time needed to perform query answering. The preprocessing clearly dominates the overall runtime. In comparison to this, the time needed by PAGOdA to actually answer the queries is almost negligible.

The preprocessing time is mostly determined by the normalization and by the number of degrees, i.e., the size of \({\mathscr {C}}\), but it also increases slightly with increased ABox size. One can also see that, although the query answering times are not much affected, the size of the TBox, and hence the time PAGOdA needs for preprocessing, can be reduced significantly by normalization. This can also be seen in the graph in Fig. 2, where the normalized variant of the TBox is clearly smaller w.r.t. increasing numbers of degrees than the original TBox.

However, for only 3 degrees of membership, the avoided exponential blow-up does not outweigh the overhead of introducing auxiliary concept names for the normalization in practice.

Finally, it should be noted that the query answering time actually decreases when the number of truth degrees is increased (for both the normalized and the original TBox). This is not relevant for our experiments, but is simply a consequence of the Łukasiewicz semantics: When more fine-grained truth degrees are randomly assigned to all ABox assertions, the chance is larger that the degree to which a query atom is satisfied is reduced to 0, and hence does not yield an answer tuple (see Table 1). And with a decreased number of answers, the queries need less time to be answered.

5.3 Evaluating Conjunctive Query Answering

To analyze the practical usefulness of the reduction approach for CQ answering, we consider several CQs from the LUBM benchmark (namely #5, #7, and #9) over the normalized LUBM TBox.

An additional parameter that is evaluated here is the percentage of crisp concepts and roles appearing in the ontology; 0, 20, 80, or 100 % of all names are randomly chosen to be crisp.

Recall that a crisp concept or role name can only take the values 0 or 1. For such concepts and roles, the reduction process is more light-weight, since it results in a smaller number of concept names in the reduced ontology [2, 5].

For each chain size, number of universities, and percentage of crisp symbols, we evaluated three threshold CQs Footnote 10 that extend the CQs in the LUBM benchmark.

The evaluation was performed again w.r.t. the Łukasiewicz semantics. The results are depicted in Table 6.

Table 6 Time needed for preprocessing by PAGOdA (in s) and query execution time (in ms) for three queries and different percentages of crisp names

We again see a counter-intuitive behavior, namely that the query answering time decreases with smaller percentage of crisp elements (and hence increased TBox size). The reason lies again with the Łukasiewicz semantics: queries over “more fuzzy” TBoxes will have less answers, and hence less effort is required to answer them. Nevertheless, we can see that the times required to answer threshold CQs are of the same order of magnitude as for classical CQs. Again the total runtime is dominated by the preprocessing time of PAGOdA, and hence mainly by the TBox size. This is good news, since the preprocessing step only needs to be performed once as long as the input TBox does not change.

Moreover, our approach is scalable for large ABoxes (as much as this is the case for a crisp ontology) since the crispification mainly affects the size of the TBox, but there is nearly no difference in size between a fuzzy ABox and its crispified version (see also Table 7).

As a final note, in practical applications of fuzzy ontologies one would hardly expect the ontology to have only fuzzy concept and role names, but rather that a smaller number of fuzzy names complements a large number of crisp names that describe precise knowledge. This is closer to our 80 % scenario, which does not differ much from the purely crisp case in terms of runtime.

6 Related Work

Non-fuzzy representations of fuzzy DLs have been extensively studied for several families of languages. The methods can be classified based on their fuzzy and DL expressivity used. Based on the Zadeh family of fuzzy logic operators, reduction techniques and optimizations have been examined for the fuzzy extensions of the \({\mathcal {ALCH}}\) [37], \({\mathcal {SHOIN}}\) [36], and \({\mathcal {SROIQ}}\) [3] languages, while an experimental evaluation of the reduction technique for the fuzzy version of \({\mathcal {SHIN}}\) is presented in [19]. For Gödel semantics, a reduction procedure for the DL \({\mathcal {SROIQ}}\) is considered in [7]. This procedure is extended in [8] for a language combining Gödel and Zadeh semantics. The reduction technique for the Łukasiewicz t-norm over \({\mathcal {SROIQ}}\) is studied in [10], and for arbitrary finite t-norms in [8, 11]. It should be noted, as we show in Example 8, that the reduction of qualified number restrictions proposed in [8, 10, 28] is incorrect. This family of algorithms has been implemented in the DeLorean reasoner [5, 6, 9]. Based on a different approach, a family of fuzzy DLs using \(\alpha \)-cuts as atomic concepts and roles is considered in [26].

Conjunctive query answering for fuzzy DLs has mostly been studied for the DL-Lite family of languages. In [38, 39], the problem of evaluating top-k queries in fuzzy DL-Lite is considered. In [32], the authors present a variety of query languages that can be used for querying fuzzy DL-Lite ontologies and adapt classical query rewriting techniques for answering these queries. A similar approach is taken in [30].

A tableaux algorithm for conjunctive query answering for fuzzy CARIN, a language combining the DL \({\mathcal {ALCNR}}\) with Horn rules under Zadeh semantics, is provided in [29]. Another algorithm for answering expressive fuzzy conjunctive queries is presented in [17, 18]. The algorithm allows the occurrence of both lower bound and the upper bound of thresholds in a query atom over the DL \({\mathcal {SHIN}}\) extended with the Zadeh semantics. Finally, practical approaches for storing and querying fuzzy knowledge in the Semantic Web have been also investigated [35].

7 Conclusions and Future Work

This paper focuses on how a classical representation of ontologies written in finitely valued fuzzy DLs can be adopted to solve the threshold and fuzzy conjunctive query answering problems. These problems are reduced to equivalent (U)CQ answering problems in classical DLs. The correctness of the suggested technique is proved and its complexity is studied for different variants of \({\mathscr {C}}\)-\({\mathcal {SROIQ}}\). As far as we know, no similar theoretical results have been presented. The proofs rely on the fact that each model of a fuzzy ontology \(\mathcal {O} \) can be mapped to a model of its reduced crisp form \(\mathcal {O} _\mathsf {c} \) and vice versa, thus showing the soundness and completeness of the reduction technique.

To verify the correctness of our approach, we have corrected and extended the proofs sketched in [10], providing the first full proof for this result in the literature. Additionally, we have improved the size of the reduction from fuzzy \({\mathcal {SROIN}}\) ontologies to classical \({\mathcal {SROIN}}\) ontologies by introducing a linear normalization step. After this step, the obtained classical ontology can be exponentially smaller than the one obtained by previous crispification approaches. The smaller size promises lower runtimes of the query answering procedure. We have also suggested a reduction of qualified number restrictions, which uses role disjunctions to express a partitioning of a role.

In our evaluation of a prototype implementation of this approach, based on the DeLorean system, we have demonstrated that the overhead involved in using a finitely valued fuzzzy description logic instead of a classical DL is reasonable. In a scenario where only the data and queries change frequently, while the TBox remains fixed, the normalization and the reduction of a finitely valued TBox can be computed “off-line”. It is thus sufficient to consider the time required for query answering, which does not increase significantly when using more than two membership degrees.

Future work involves examining if available optimization techniques for fuzzy and classical DLs can be applied to improve the performance of these algorithms. On the theoretical side, we will investigate the possibility of extending the reduction to deal with qualified number restrictions without resorting to non-standard role constructors, obtaining a polynomial reduction for any finitely valued fuzzy description logic extending OWL 2 DL.