1 Introduction

Today, we are experiencing an unprecedented production of resources published as Linked Open Data (LOD). This is leading to the creation of a global data space with billions of assertions [9]. RDF [24] provides formal ways to build these assertions. Most of the RDF links connecting resources coming from different data sources are identity links, also called sameAs statements. They are defined using the owl:sameAs property, expressing that two URIs actually refer to the same thing [1]. Unfortunately, many existing identity links do not reflect genuine identity [15, 16] and therefore may lead to inconsistencies. Over the years, inconsistency-tolerant semantics (e.g. [7, 8, 26, 27]) have been proposed for query answering over potentially inconsistent data, thus overcoming the inconsistencies within it.

In this work, we formalize explanation dialogues that provide argument-based explanations grounded in inconsistency-tolerant semantics. Our explanation dialogue supports a domain expert in discovering inconsistencies and in (possibly) correcting erroneous data, revising the logical rules used for the invalidation, or even deciding a (potential) redesign of the initial linking strategy.

The explanation dialogue relies on a method for invalidating sameAs links that computes repairs so that, when a sameAs statement is not entailed under the chosen semantics, an explanation of the reasons against this entailment is provided.

This is the first work that uses argumentation for sameAs link invalidation together with a formalization of a general explanation framework supporting the dialogue between user and reasoner. The salient point of this paper is to show how inconsistency-tolerant semantics can represent a first step towards the design of new interactive paradigms for assessing the quality of sameAs statements.

The paper is organized as follows. Section 2 discusses related work, while in Sect. 3 we provide background notions of argumentation theory. Section 4 is devoted to the presentation of our argumentation problem for sameAs invalidation. Section 5 formally introduces the novel explanation dialogue and Sect. 6 provides an example of the overall strategy implemented in a prototype. Finally, Sect. 7 draws some concluding remarks and possible future directions.

2 Related Work

To the best of our knowledge, the work presented here is the first attempt to combine argumentation theory, identity link evaluation and explanation dialogues. Related work can nevertheless be found in the context of sameAs evaluation and, more generally, in approaches that use argumentation in the Semantic Web.

The sameAs validation problem is very recent and few methods exist. In [17] an approach is presented where the structural properties of large graphs of sameAs links are analyzed, without assessing their quality. In [22] a framework dedicated to the assessment of sameAs links using network metrics is described, while in [23] the authors report on the quality of sameAs links in the LOD using a manual method. In [15], the author illustrates how to assess the quality of sameAs links using a constraint-based method which, in the end, considers only one property (the name of the entity), while in [29] an ontology-based logical invalidation method is presented which discovers invalid sameAs links through contextual graphs built around the resources, thus using more properties. Finally, the recent work presented in [14] evaluates a sameAs link by using the position and relevance of each resource involved with regard to the associated DBpedia categories, modeled through two probabilistic category distribution and selection functions. We should also recall that many linking methods exist (see [21] for a survey) that, during their process of sameAs discovery, include a strategy for evaluating the reliability of the sameAs links just computed.

Regarding argumentation in the Semantic Web, several works exist that mainly address ontology alignment agreement based on argumentation theory (e.g. [18, 19, 25]). Basically, all of them use argumentation to provide a final agreement (or a final answer), and do not exploit argumentation as a form of explanation of the answer to a query. Recently, in [10] the problem of data fusion in Linked Data was addressed by adopting a bipolar argumentation theory (with fuzzy labeling) to reason over inconsistent information sets and to provide a unique answer.

This last method has points in common with our line of work, namely the use of argumentation theory to detect inconsistencies, but the scenarios in which the approach is exploited are different, as is its general aim. This naturally leads to different addressed issues and proposed solutions.

3 Background Notions

There exist two major approaches for representing an ontology for the OBDA (Ontology-Based Data Access) problem: (i) Description Logics (DL) (such as the \(\mathscr {EL}\) [4] and DL\(_{Lite}\) [12] families) and (ii) rule-based languages (such as the Datalog\(+/-\) [11] language). Although conjunctive query answering in Datalog\(+/-\) is undecidable in general, different decidable fragments have been studied in the literature [6]; these fragments overcome the limitations of DLs by allowing n-ary predicates and cyclic structures. We consider the positive existential conjunctive fragment of first-order logic, denoted by FOL(\(\wedge \),\(\exists \)), which is composed of formulas built with the connectors \((\wedge ,\rightarrow )\) and the quantifiers \((\exists ,\forall )\).

We consider first-order vocabularies with constants but no other function symbols. A term t is a constant or a variable. Different constants represent different values (unique name assumption). An atomic formula (or atom) is of the form \(p(t_1,\ldots ,t_n)\) where p is an n-ary predicate and \(t_1,\ldots ,t_n\) are terms. A ground atom is an atom with no variables. A variable in a formula is free if it is not in the scope of a quantifier. A formula is closed if it has no free variables. We denote by X (bold font) sequences of variables \(X_1,\ldots ,X_k\) with \(k \ge 1\). A conjunct \(C[\mathbf X ]\) is a finite conjunction of atoms, where X is the sequence of variables occurring in C. Given an atom or a set of atoms A, vars(A), consts(A) and terms(A) denote its set of variables, constants and terms, respectively.

An existential rule is a first-order formula of the form \(R=\forall \mathbf X \forall \mathbf Y (H[\mathbf X ,\mathbf Y ]) \rightarrow \exists \mathbf Z C[\mathbf Z ,\mathbf Y ]\), with \(vars(H)=\mathbf X \cup \mathbf Y \) and \(vars(C)= \mathbf Z \cup \mathbf Y \), where H and C are conjuncts called the hypothesis and the conclusion of R, respectively. \(R=(H,C)\) is a contracted form for R. An existential rule with an empty hypothesis is called a fact; a fact is thus an existentially closed (with no free variable) conjunct. A rule \(r=(H,C)\) is applicable to a set of facts F iff there exists \(F' \subseteq F\) such that there is a homomorphism \(\pi \) from H to the conjunction of the elements of \(F'\). If a rule r is applicable to a set F, its application according to \(\pi \) produces the set \(F \cup \{\pi (C)\}\), also denoted by r(F) and called an immediate derivation of F by r. Finally, we say that a set of facts \(F \subseteq {\mathscr {F}}\) and a set of rules \(\mathscr {R}\) entail a fact f (written \(F, \mathscr {R} \,\models \, f\)) iff the closure of F by all the rules entails f (i.e. \({\mathtt {Cl_{{\mathscr {R}}}}}(F) \,\models \, f\)).

A negative constraint is a first-order formula \(n=\forall \mathbf X H[\mathbf X ] \rightarrow \perp \), where \(H[\mathbf X ]\) is a conjunct called the hypothesis of n and X is the sequence of variables appearing in it. A knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) is composed of a finite set of facts \({\mathscr {F}}\), a finite set of existential rules \({\mathscr {R}}\) and a finite set of negative constraints \({\mathscr {N}}\). Given a knowledge base \({\mathscr {K}}= ({\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}})\), a set \(F \subseteq {\mathscr {F}}\) is said to be inconsistent iff there exists a constraint \(n \in {\mathscr {N}}\) such that \(F \,\models \, H_{n}\), where \(H_{n}\) is the hypothesis of the constraint n. A set of facts is consistent iff it is not inconsistent. A conjunctive query (CQ) has the form \(Q(\mathbf X )=\exists \mathbf Y \varPhi [\mathbf X ,\mathbf Y ]\) where \(\varPhi [\mathbf X ,\mathbf Y ]\) is a conjunct such that X and Y are the variables in \(\varPhi \). A Boolean CQ (BCQ) is a CQ with yes or no as answer.
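As a minimal illustration of these notions, consider the following toy instance (ours, introduced purely for exposition):

$$\begin{aligned} F=\{p(a,b)\},\qquad R: \forall x \forall y\, (p(x,y) \rightarrow \exists z\, q(y,z)),\qquad n: \forall y \forall z\, (q(y,z) \wedge s(y) \rightarrow \perp ) \end{aligned}$$

R is applicable to F via the homomorphism \(\pi =\{x \mapsto a, y \mapsto b\}\), so its application produces \(r(F)= F \cup \{q(b,z_0)\}\) for a fresh existential \(z_0\), and hence \(F,\{R\} \,\models \, \exists z\, q(b,z)\). If the fact s(b) were added to F, the set \(\{p(a,b), s(b)\}\) would be inconsistent, since its closure entails the hypothesis of n.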

Inconsistency Handling. If a knowledge base \({\mathscr {K}}= ({\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}})\) is inconsistent, then everything is entailed from it. A common way to face inconsistency [7, 26] is to construct the maximal (with respect to set inclusion) consistent subsets of \(\mathscr {F}\), called repairs and denoted by \({\mathscr {R}epair}({\mathscr {K}})\). In this paper, we consider a fragment of our language where the deduction method (the chase) halts, so that the closure \({\mathtt {Cl_{{\mathscr {R}}}}}(F)\) of any set of facts F is finite. Once the repairs are computed, different semantics can be used for query answering over the knowledge base. Here we focus on brave-semantics [26] and ICR-semantics [7].

The brave-semantics accepts a query if it is entailed by at least one repair. This kind of semantics has been criticized because it allows conflicting answers. Let \(\mathscr {K}=(\mathscr {F}, \mathscr {R}, \mathscr {N})\) be a knowledge base and let Q be a query. Q is brave-entailed from \(\mathscr {K}\), written \(\mathscr {K} \,\models \,_{brave} Q\), if and only if \(\exists {\mathscr {A}}\in {\mathscr {R}epair}({\mathscr {K}})\) such that \({\mathtt {Cl_{{\mathscr {R}}}}}({\mathscr {A}}) \,\models \, Q\). A more prudent and conservative semantics has been proposed in [7]. Let \( \mathscr {K}=(\mathscr {F}, \mathscr {R}, \mathscr {N})\) be a knowledge base and let Q be a query. Q is ICR-entailed from \(\mathscr {K}\), written \(\mathscr {K} \,\models \,_{ICR} Q\), if \(\bigcap _{{\mathscr {A}}\in {\mathscr {R}epair}({\mathscr {K}})} {\mathtt {Cl_{{\mathscr {R}}}}}({\mathscr {A}}) \,\models \, Q\).
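To make the two semantics concrete, here is a minimal Python sketch (ours, not part of the original prototype): facts are ground atoms encoded as strings, a single rule and a single negative constraint are hard-coded, and repairs are enumerated by brute force. The toy facts anticipate the sameAs setting of Sect. 4.

```python
from itertools import combinations

# Toy knowledge base: facts are ground atoms encoded as strings.
facts = {"sameAs(r1,r2)", "title(r1,t1)", "title(r2,t2)", "isDiff(t1,t2)"}

def closure(fs):
    """Single toy rule: sameAs plus the two titles entail isEquiv(t1,t2)."""
    fs = set(fs)
    if {"sameAs(r1,r2)", "title(r1,t1)", "title(r2,t2)"} <= fs:
        fs.add("isEquiv(t1,t2)")
    return fs

def consistent(fs):
    """Single negative constraint: isEquiv(x,y) and isDiff(x,y) clash."""
    return not ({"isEquiv(t1,t2)", "isDiff(t1,t2)"} <= closure(fs))

def repairs(fs):
    """Maximal (w.r.t. set inclusion) consistent subsets of the facts."""
    reps = []
    for k in range(len(fs), -1, -1):          # largest subsets first
        for c in combinations(sorted(fs), k):
            s = set(c)
            if consistent(s) and not any(s < r for r in reps):
                reps.append(s)
    return reps

def brave(q, fs):   # entailed by at least one repair
    return any(q in closure(r) for r in repairs(fs))

def icr(q, fs):     # entailed by the intersection of the closed repairs
    return q in set.intersection(*[closure(r) for r in repairs(fs)])

print(brave("sameAs(r1,r2)", facts))  # True: some repair keeps the link
print(icr("sameAs(r1,r2)", facts))    # False: the link is not ICR-entailed
```

The query is brave-entailed because the repair dropping isDiff(t1,t2) still contains the link, but it is not ICR-entailed because the repair dropping sameAs(r1,r2) does not.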

An alternative method for handling inconsistency is the use of argumentation. Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\), the corresponding argumentation framework \({\mathscr {AF}_{\mathscr {K}}}\) is a pair \(({\mathtt {Arg}}, {\mathtt {Att}})\) where \({\mathtt {Arg}}\) is the set of arguments that can be constructed from \({\mathscr {F}}\) and \({\mathtt {Att}}\) is an asymmetric binary relation called attack defined over \({\mathtt {Arg}}\times {\mathtt {Arg}}\) (as defined in [13]). Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\), an argument a is a tuple \(\mathbf a =(F_0, F_1, \ldots ,F_{n}, C)\) where: \((F_0, \ldots , F_{n})\) is an \({\mathscr {R}}\)-derivation of \(F_0\) in \({\mathscr {K}}\), such that (i) \(F_0\) is \({\mathscr {R}}\)-consistent and (ii) C is an atom, a conjunction of atoms, the existential closure of an atom or the existential closure of a conjunction of atoms such that \(F_{n} \,\models \, C\). \(F_0\) is the support of the argument \(\mathbf a \) (\({\mathtt {Supp}}(\mathbf a )\)) and C is its conclusion (\({\mathtt {Conc}}(\mathbf a )\)).

An argument \(\mathbf a \) supports a query Q if \({\mathtt {Conc}}(\mathbf a )\) entails Q and \(\mathbf a \) is against Q if it attacks at least one argument that supports Q. An attack between two arguments \(\mathbf a \) and \(\mathbf b \) expresses the conflict between their conclusions and supports. Thus, \(\mathbf a \) attacks \(\mathbf b \) iff there exists \(f \in {\mathtt {Supp}}(\mathbf a )\) (f is a fact) such that the set \(\{{\mathtt {Conc}}(\mathbf b ), f\}\) is \(\mathscr {R}\)-inconsistent.
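Continuing the toy instance above (ours, for illustration): if \({\mathtt {Conc}}(\mathbf b )=\exists z\, q(b,z)\) and \(s(b) \in {\mathtt {Supp}}(\mathbf a )\), then \(\{{\mathtt {Conc}}(\mathbf b ), s(b)\}\) entails the hypothesis of the constraint n and is therefore \(\mathscr {R}\)-inconsistent, so \(\mathbf a \) attacks \(\mathbf b \).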

Let \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) be a knowledge base and \({\mathscr {AF}_{\mathscr {K}}}\) be its corresponding argumentation framework. Let \(E \subseteq {\mathtt {Arg}}\) be a set of arguments. We say that E is conflict-free iff there exist no arguments \(a, b \in E\) such that \((a,b) \in {\mathtt {Att}}\). E defends an argument a iff for every argument \(b \in {\mathtt {Arg}}\), if \((b,a) \in {\mathtt {Att}}\) then there exists \(c \in E\) such that \((c,b) \in {\mathtt {Att}}\). E is admissible iff it is conflict-free and defends all its arguments. E is a preferred extension iff it is a maximal (with respect to set inclusion) admissible set (see [20] for other types of semantics). We denote by \({\mathtt {Ext}}({\mathscr {AF}_{\mathscr {K}}})\) the set of all extensions of \({\mathscr {AF}_{\mathscr {K}}}\). An argument a is sceptically accepted if it is in all extensions, credulously accepted if it is in at least one extension, and not accepted if it is in no extension.
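These acceptability notions can be computed by brute force on small frameworks; the following Python sketch (ours) uses an abstract framework where c reinstates a by attacking its attacker b:

```python
from itertools import combinations

# Abstract framework: arguments a, b, c with b attacking a and c attacking b.
Arg = {"a", "b", "c"}
Att = {("b", "a"), ("c", "b")}

def conflict_free(E):
    return not any((x, y) in Att for x in E for y in E)

def defends(E, a):
    """E defends a iff every attacker of a is attacked by some member of E."""
    return all(any((c, b) in Att for c in E) for (b, x) in Att if x == a)

def admissible(E):
    return conflict_free(E) and all(defends(E, a) for a in E)

# Preferred extensions: maximal admissible sets (w.r.t. set inclusion).
adm = [set(E) for k in range(len(Arg) + 1)
       for E in combinations(sorted(Arg), k) if admissible(set(E))]
preferred = [E for E in adm if not any(E < F for F in adm)]

sceptical = set.intersection(*preferred) if preferred else set()
print(preferred)   # [{'a', 'c'}]: c defends a against b
print(sceptical)   # {'a', 'c'}: both arguments are sceptically accepted
```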

The equivalence between sceptical acceptance under preferred semantics and ICR-entailment has been proved in [13]. This allows us to use the argumentation approach in our explanation dialogue (Sect. 5) while ensuring its correctness and completeness w.r.t. ICR query explanation and failure.

Given a knowledge base \({\mathscr {K}}\) and a query Q, the general problem is to explain whether Q is entailed by \({\mathscr {K}}\) or not. Let \({\mathscr {K}}\) be an inconsistent knowledge base and Q a Boolean conjunctive query. \(\varPi =\langle {\mathscr {K}},Q\rangle \) is a query result explanation problem (QREP) iff (i) \({\mathscr {K}}\) is inconsistent, and (ii) \({\mathscr {K}}\,\models \,_{brave} Q\) [3]. Using ICR semantics we distinguish:

  1. The Query Failure Explanation Problem (QFEP): in the ICR setting, a QREP \(\varPi \) is a QFEP iff \({\mathscr {K}}\not \,\models \,_{ICR} Q\).

  2. The Query Acceptance Explanation Problem (QAEP): in the ICR setting, a QREP \(\varPi \) is a QAEP iff \({\mathscr {K}}\,\models \,_{ICR} Q\).

The first one refers to the case when the query fails (no answer) due to contradictions; the second refers to the case when the query is accepted, so a yes answer is obtained.

4 QFEP for SameAs Invalidation

Let \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) be a knowledge base and \({\mathscr {AF}_{\mathscr {K}}}\) its corresponding argumentation framework. We now define the main components of \({\mathscr {K}}\) for a QFEP in the case of sameAs invalidation.

Defining the Facts, the Rules and the Negative Constraints. \({{\mathscr {F}}}\) is a set of facts including (i) RDF triples, coming from RDF graphs representing the knowledge described in (possibly) different inter-connected datasets, and (ii) facts asserting similarity values between specific literals. This second type of fact has the form

$$\begin{aligned} is\text {[prop]}Diff\text {[SimFunction]}(x,y,\sigma ) \end{aligned}$$

where (i) [prop] is the name of a datatype functional (or inverse-functional) property, (ii) [SimFunction] is a similarity measure (e.g. Jaccard, Levenshtein, ...), (iii) x, y are literals and (iv) \(\sigma \) is the similarity value between x and y. These facts are asserted only when \(\sigma \) is less than a given threshold \(\epsilon \), defined for the similarity measure [SimFunction] of a given property [prop].
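As an illustration of how such facts can be materialised, consider the following Python sketch (ours; the word-level Jaccard tokenisation and the threshold value are assumptions, and a real pipeline would plug in whichever measure is chosen for each property):

```python
def jaccard(x: str, y: str) -> float:
    """Jaccard similarity over the word tokens of the two literals."""
    a, b = set(x.lower().split()), set(y.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 1.0

def diff_fact(prop: str, x: str, y: str, eps: float):
    """Emit an is[prop]DiffJaccard fact only when sigma is below eps."""
    sigma = jaccard(x, y)
    if sigma < eps:
        return f"is{prop[0].upper() + prop[1:]}DiffJaccard({x!r},{y!r},{sigma:.2f})"
    return None  # similar enough: no fact is asserted

# The two titles share no token, so a dissimilarity fact is produced:
print(diff_fact("confName", "proceedings aaai-98", "machine learning journal", 0.3))
# isConfNameDiffJaccard('proceedings aaai-98','machine learning journal',0.00)
```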

There are several kinds of logical rules that we consider. First, there are rules defined by the W3C standards: for instance, we exploit the OWL2 RL rules which define the owl:sameAs predicate as being reflexive, symmetric and transitive, and the rules that axiomatize the standard replacement properties. We also use rules that are declared or discovered using mining techniques on RDF triples. For this kind of rule we consider here two types of properties: functional and inverse-functional properties [1].

When a property p is a datatype functional property, it can be expressed via the following logical rule: \(p(r,v) \wedge p(r,v^{'}) \rightarrow isEquiv(v,v^{'})\), where isEquiv expresses equivalence of two literals. If the property p is an object functional property, the following logical rule can be used: \(p(r,v) \wedge p(r,v^{'}) \rightarrow sameAs(v,v^{'})\). Instead, when p is an inverse-functional property, the logical rule is \(p(w_1,x) \wedge p(w_2,x) \rightarrow sameAs(w_1, w_2)\).
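For instance, if confYear is declared as a datatype functional property and hasISBN (a hypothetical property of ours, chosen purely for illustration) as an inverse-functional one, the schemas above instantiate to:

$$\begin{aligned} confYear(r,v) \wedge confYear(r,v^{'})&\rightarrow isEquiv(v,v^{'})\\ hasISBN(w_1,x) \wedge hasISBN(w_2,x)&\rightarrow sameAs(w_1,w_2) \end{aligned}$$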

We also add a set of rules, all of which have the following form:

$$\begin{aligned} is\text {[prop]}Diff\text {[SimFunction]}(x,y,\sigma ) \rightarrow isDiff(x,y) \end{aligned}$$

A rule of this kind asserts that, when two literals x and y have a low similarity value \(\sigma \) for a specific property [prop], they are declared to be different (thus the fact isDiff(x, y) is added to \({{\mathscr {F}}}\)).
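For instance, with the property confName compared via the Jaccard measure (our illustrative instance), the schema instantiates to:

$$\begin{aligned} isConfNameDiffJaccard(x,y,\sigma ) \rightarrow isDiff(x,y) \end{aligned}$$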

In our setting, the negative constraints are very simple. The only necessary negative constraints have the following form: \(isEquiv(x,y) \wedge isDiff(x,y) \rightarrow \perp \), where isEquiv(x, y) comes from the rules defined for the datatype functional properties and isDiff(x, y) comes from the similarity values between literals. Note that all the other negative constraints that are meaningful for discovering inconsistencies for a given sameAs can be logically derived from the rules defined before. In the case of a datatype functional property title, the following leads to an inconsistency:

$$\begin{aligned} sameAs(s,o) \wedge title(s,w) \wedge title(o,w_{1}) \wedge isDiff(w,w_{1}) \rightarrow \perp \end{aligned}$$

This can be derived from one rule and a negative constraint, namely:

  1. \(sameAs(s,o) \wedge title(s,w) \wedge title(o,w_{1}) \rightarrow isEquiv(w,w_{1})\)

  2. \(isEquiv(w,w_{1}) \wedge isDiff(w,w_{1}) \rightarrow \perp \)

The problem QFEP. To complete the components, and thus the instantiation of the QFEP under ICR semantics in the setting of sameAs invalidation, we need to define the query Q, which is simply a sameAs(x, y) statement. The problem becomes:

Query Failure Explanation Problem ( \(QFEP_{sameAs}\) ). Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) with \({{\mathscr {F}}}\), \({{\mathscr {R}}}\), \({{\mathscr {N}}}\) defined above and the query Q as a sameAs(x, y) statement, the \(QFEP_{sameAs}\), in the ICR setting (which is equivalent to the setting of the argumentation framework \({\mathscr {AF}_{\mathscr {K}}}\)), is a QREP where \({\mathscr {K}}\not \,\models \,_{ICR} Q\).

At this point, we have formally instantiated a QFEP as a sameAs invalidation problem. Given a sameAs statement (as query), by the use of the facts, rules and negative constraints described above, we are able to discover whether the sameAs is not entailed with respect to the given knowledge base (under ICR semantics). This proves that a sameAs invalidation method can be seen as an instantiation of a QFEP under ICR. By itself, this represents an interesting result when searching for effective methods for evaluating the quality of sameAs statements. However, we also need interactions with the domain experts to explain the problems encountered and to support the corrective actions. In the following, we define our explanation framework (and dialogues), which provides these interactive functionalities.
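As a concrete toy instance (ours, for exposition): take \({\mathscr {F}}=\{sameAs(r_1,r_2),\ title(r_1,t_1),\ title(r_2,t_2),\ isDiff(t_1,t_2)\}\) together with rule 1. and constraint 2. above. The whole of \({\mathscr {F}}\) is inconsistent, and its repairs are obtained by dropping exactly one fact:

$$\begin{aligned} {\mathscr {R}epair}({\mathscr {K}})=\{\,&{\mathscr {F}}\setminus \{sameAs(r_1,r_2)\},\ {\mathscr {F}}\setminus \{title(r_1,t_1)\},\\ &{\mathscr {F}}\setminus \{title(r_2,t_2)\},\ {\mathscr {F}}\setminus \{isDiff(t_1,t_2)\}\,\} \end{aligned}$$

Since the first repair does not entail \(sameAs(r_1,r_2)\), the query is brave- but not ICR-entailed, i.e. \(\varPi \) is a \(QFEP_{sameAs}\).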

5 The Explanation Framework

It is clear that, if a sameAs statement is problematic, it makes sense to show the experts what kinds of actions and negative conditions lead to this answer. Here we introduce our explanation framework, which is custom-tailored to the Query Failure Explanation Problem under ICR semantics in inconsistent knowledge bases.

Example 1

(Motivating Example). Let us consider the case of a QFEP \(\varPi =\langle {\mathscr {K}},Q\rangle \) with the query \(Q=worksIn(Linda,Statistics)\). The dialogue we would like to support is similar to the following:

\({\mathtt {User}}\): Why not worksIn(Linda, Statistics)?

\({\mathtt {Reasoner}}\): Because Linda works in Accounting.

\({\mathtt {User}}\): Clarify?

\({\mathtt {Reasoner}}\): Because Linda uses office \(o_1\) and \(o_1\) is located in Accounting, so Linda works in Accounting.

\({\mathtt {User}}\): How’s that a problem?

\({\mathtt {Reasoner}}\): The following negative constraint is violated: \(\forall x \forall y \forall z\,\,(worksIn(x,y)\wedge worksIn(x,z) \wedge y \ne z)\rightarrow \bot \).

\({\mathtt {User}}\): Understood.

This simple example (not explicitly related to sameAs) only serves to clarify that, in our ideal explanation framework, each iteration needs to respect certain rules and some predefined locutions must be used (like understood, clarify, why, etc.). In addition, all the information must be represented as arguments and/or elaborations of arguments. Finally, our dialogue uses a turn-taking mechanism where the \({\mathtt {User}}\) and the \({\mathtt {Reasoner}}\) switch turns at each stage.

In the following, we formalize the dialogue system and the notion of a legal dialogue for our explanation framework and, to do so, we define the necessary syntax and semantics. The formalization is based on a very preliminary work [3], where the idea of the dialogue was first introduced. The novelty here is the full formalization of the dialogue, with specific references and custom definitions for the problem at hand.

5.1 Syntax

Definition 1

(Dialogue System). Given a QFEP \(\varPi =\langle {\mathscr {K}},Q\rangle \), a dialogue system for \(\varPi \) is a tuple composed of the topic \(\varPi \), the set of participants \({\mathscr {P}r}\), a finite set of allowed utterances, and an irreflexive binary relation \({\mathbb {R}}\) defined over the allowed utterances, called the reply relation.

The definition above is intentionally general; the reader should note that, in the case of this work, the topic of the dialogue is a discussion that aims to get the \({\mathtt {User}}\) to understand the refusal of a query Q (sameAs(x, y)) in \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) with \({{\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}}}\) defined in the previous section. The participants \({\mathscr {P}r}=\{{\mathtt {Reasoner}},{\mathtt {User}}\}\) are (i) the \({\mathtt {User}}\), namely the domain expert who is analysing the quality of a set of sameAs links, and (ii) the \({\mathtt {Reasoner}}\), an agent providing explanations in case of refusal.

The set of allowed utterances and the reply relation \({\mathbb {R}}\) for our dialogue system \({\mathscr {D}}\) are given in Table 1. Note that a, \(a'\), t, \(t'\) and Q in the table represent metavariables for arbitrary well-formed syntactical objects (e.g. queries, arguments, integers, etc.) of an arbitrary formal language.

Table 1. The set of allowed utterances. In the table \(\mathbf U \) is the \({\mathtt {User}}\) and \(\mathbf R \) is the \({\mathtt {Reasoner}}\).

A dialogue D is a potentially infinite sequence of legal utterances. An utterance is considered a legal reply to another utterance iff it is a correct reply with respect to the reply relation \({\mathbb {R}}\) and it is the turn of the participant x to talk. We provide here a simple explanatory example.

Example 2

(Legal/Illegal Reply). Consider the dialogue: \(\langle {\textsc {explain}}(1,{\mathtt {User}},Q), {\textsc {attempt}}(2,{\mathtt {Reasoner}},a),{\textsc {clarify}}(3,{\mathtt {User}},a),{\textsc {negative}}(4,{\mathtt {User}},a') \rangle \). As one may notice, replying with \({\textsc {negative}}\) to \({\textsc {clarify}}\) is illegal because this pair is not in \({\mathbb {R}}\). A legal reply would be \({\textsc {clarification}}(4,{\mathtt {Reasoner}},a')\).

At this point, it is necessary to define the protocol which will decide if a dialogue is legal or not. We introduce the following definition for a Legal Dialogue.

Definition 2

(Legal Dialogues). Given a dialogue \(D_n\) at stage n, \(n \ge 0\), the dialogue \(D_n\) is legal iff:

  • Empty dialogue rule: if \(n=0\) then \(D_0\) is legal.

  • Commencement rule: if \(n=1\) then \(D_1=\langle u_1\rangle \) is legal iff \(u_1 = {\textsc {explain}}(1,{\mathtt {User}},Q)\).

  • Dialogue rules: if \(n > 1\) then \(D_n\) is legal iff \(D_{n-1}\) is legal, \(u_{n}\) is a legal reply to \(u_{n-1}\) and there is no \(u_i \in D_{n-1}\), \(i<n\), such that \(u_{n}\) equals \(u_i\).

Our definition indicates that an empty dialogue is legal. Furthermore, a legal dialogue always starts with an explanation request made by the \({\mathtt {User}}\). Finally, the protocol defines a legal dialogue as a sequence of utterances which legally reply to each other and in which no utterance is repeated twice.
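To make the protocol concrete, here is a small Python sketch (ours): the reply relation REPLIES stands in for Table 1 and is an assumption covering only the utterance types used in this paper's examples.

```python
# Hypothetical reply relation (our guess at Table 1): who may reply to what.
REPLIES = {
    "explain": {"attempt"},
    "attempt": {"clarify", "deepen", "positive", "negative"},
    "clarify": {"clarification"},
    "clarification": {"deepen", "positive", "negative"},
    "deepen": {"deepening"},
    "deepening": {"positive", "negative"},
    "negative": {"attempt"},
}
SPEAKER = {"explain": "User", "clarify": "User", "deepen": "User",
           "positive": "User", "negative": "User",
           "attempt": "Reasoner", "clarification": "Reasoner",
           "deepening": "Reasoner"}

def legal(dialogue):
    """dialogue: list of (utterance_type, content) pairs, e.g. ('explain', 'Q')."""
    if not dialogue:                        # empty dialogue rule
        return True
    if dialogue[0][0] != "explain":         # commencement rule
        return False
    seen = set()
    for i, (u, content) in enumerate(dialogue):
        if (u, content) in seen:            # no utterance repeated twice
            return False
        seen.add((u, content))
        if i > 0:
            prev = dialogue[i - 1][0]
            if u not in REPLIES.get(prev, set()):   # reply relation
                return False
            if SPEAKER[u] == SPEAKER[prev]:         # turn taking
                return False
    return True

print(legal([("explain", "Q"), ("attempt", "a"), ("clarify", "a"),
             ("negative", "a'")]))          # False: the dialogue of Example 2
print(legal([("explain", "Q"), ("attempt", "a"), ("clarify", "a"),
             ("clarification", "C_a")]))    # True: the legal variant
```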

5.2 Semantics

Now we shift to the semantic aspects of the dialogue, where we deal with the content of the utterances. For instance, the utterance \({\textsc {explain}}(2,{\mathtt {User}},Q)\) is legal (syntactically correct) but it will not be semantically legal if \(\varPi =\langle {\mathscr {K}},Q\rangle \) is not a query result explanation problem (or, in our more specific case, a QFEP). The same applies to the utterance \({\textsc {attempt}}(2,{\mathtt {Reasoner}},a)\) if a is not an argument or a combination of arguments in our argumentation framework.

In Table 2 we give the conditions under which a given utterance or reply is considered semantically legal in our setting. Here a deepening of an argument a explains the conflict between a and another argument b by showing the set of violated constraints. A clarification, instead, unfolds the knowledge (rules) used in the argument a, to exhibit the line of reasoning that drives its conclusion.

Table 2. The utterances and their semantical conditions. \({\mathscr {K}}\) is an inconsistent knowledge base defined as in Sect. 4 and \({\mathscr {AF}_{\mathscr {K}}}\) is the corresponding argumentation framework.

Semantic legality must also be considered in context, where replies are taken into account. Table 3 indicates the conditions under which a reply is semantically legal. For instance, replying with the utterance \({\textsc {attempt}}(2,{\mathtt {Reasoner}},a)\) to the utterance \({\textsc {explain}}(1,{\mathtt {User}},Q)\) is legal, but it will not be semantically legal if a is not a proponent (opponent) argument for the query Q.

Table 3. The replies and their semantical conditions. Here U is for \({\mathtt {User}}\) and R is for \({\mathtt {Reasoner}}\).

A dialogue is then defined as a finite sequence of semantically legal moves. An explanation dialogue is typed, depending on its topic. Here the topic is a QFEP, so our explanation dialogue \(D_n\) is called a Query Failure Explanation Dialogue (QFED): the \({\mathtt {Reasoner}}\) will show, by presenting opponent arguments, why a query Q has failed.

6 First Results and Discussion

To verify our strategy, we have implemented a prototype of the explanation dialogue that communicates with Graal [5], a Datalog\(+/-\) rule-based reasoner. For the knowledge base, we considered facts from the CORA dataset [28] and sameAs links computed using the SILK framework [2]. We provide an example of sameAs invalidation, explaining what was obtained while running the dialogues, and we discuss these results. Due to space limitations, we present a single example and provide only a meaningful portion of the set of facts, rules and negative constraints (only those related to the sameAs links used in the query or in the dialogue).

Let us consider a query Q as \(sameAs(r_1, r_2)\), where \(r_1, r_2\) are URIs describing two resources in CORA. We show our explanation framework in the form of a QFED, where the \({\mathtt {User}}\) and the \({\mathtt {Reasoner}}\) interact in order to explain why Q is invalid. In Table 4 we report a subset of the knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) we used. This subset provides sufficient details to discuss over the results.

Table 4. A portion of the facts \({\mathscr {F}}\), rules \({\mathscr {R}}\) and negative constraints \({\mathscr {N}}\) used to build our knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\).

More precisely, the query \(Q=sameAs(r_1,r_2)\) involves two resources which describe two ‘conferences’ with title (confName) ‘proceedings aaai-98’ and ‘in proceedings of aaai’, respectively. The query Q is not entailed according to the inconsistency-tolerant semantics corresponding to \({\mathscr {AF}_{\mathscr {K}}}\): the two conferences are not the same. In Table 5 we show our explanation dialogue, providing details on the reasons why Q is not entailed.

Table 5. A query failure explanation dialogue for a sameAs query involving the resources \(r_1\) and \(r_2\). For each dialogue we outline the formalism and the utterances involved.
Table 6. A new portion of the failure explanation dialogue for an invalid sameAs involving the resources \(r_1\) and \(r_2\). In this case, the user asks for further explanations by providing an argument against the reasoner conclusion.

As mentioned in the formal specification of the dialogue in Sect. 5, the succession of utterances respects certain constraints: in step \(\mathbf 1. \) the \({\mathtt {User}}\) is the one who is allowed to make the opening move (\({\textsc {explain}}\)), not the \({\mathtt {Reasoner}}\). At step \(\mathbf 2. \) the \({\mathtt {Reasoner}}\) responds by providing an argument against the query (\({\textsc {attempt}}\)), and the request for clarification (\({\textsc {clarify}}\)) made by the \({\mathtt {User}}\) at step \(\mathbf 3. \) is followed by a response from the \({\mathtt {Reasoner}}\) (\({\textsc {clarification}}\)). Note that, after this clarification, the possible utterances can be: (i) a deepening request (\({\textsc {deepen}}\)), followed immediately by a deepening response (\({\textsc {deepening}}\)), or (ii) a \({\textsc {negative}}\) (understanding dis-acknowledgment) since, according to the semantical conditions we provided in Table 3, another deepening request is prohibited.

Another interesting property of our explanation dialogue is that it gives the domain expert (\({\mathtt {User}}\)) the possibility to ask additional follow-ups. In the portion of dialogue described in Table 6, we report an extension of the previous dialogue (Table 5), where the \({\mathtt {User}}\) inputs additional arguments supporting her query Q and thus asks for further explanations. We continue from step 7 of Table 5 and, instead of declaring ‘understood’ (\({\textsc {positive}}\)), the \({\mathtt {User}}\) disacknowledges the explanation by providing feedback in the form of an argument.

Finally, to better illustrate the explanation dialogue, we present here the sequences of utterances in terms of the formal model defined before. The dialogue \(D_i\) (\(i=7\)) depicted in Table 5 is the following, where a is an argument and \(C_a, D_a\) are a clarification and a deepening of a, respectively:

\(\langle {\textsc {explain}}(1,{\mathtt {User}},Q), {\textsc {attempt}}(2,{\mathtt {Reasoner}},a), {\textsc {clarify}}(3,{\mathtt {User}},a), {\textsc {clarification}}(4,{\mathtt {Reasoner}},C_a), {\textsc {deepen}}(5,{\mathtt {User}},a), {\textsc {deepening}}(6,{\mathtt {Reasoner}},D_a), {\textsc {positive}}(7,{\mathtt {User}})\rangle \)

The second dialogue (Table 6) is composed of 9 steps. Its formal representation as a sequence of utterances is:

\(\langle {\textsc {explain}}(1,{\mathtt {User}},Q), {\textsc {attempt}}(2,{\mathtt {Reasoner}},a), {\textsc {clarify}}(3,{\mathtt {User}},a), {\textsc {clarification}}(4,{\mathtt {Reasoner}},C_a), {\textsc {deepen}}(5,{\mathtt {User}},a), {\textsc {deepening}}(6,{\mathtt {Reasoner}},D_a), {\textsc {negative}}(7,{\mathtt {User}},a'), {\textsc {attempt}}(8,{\mathtt {Reasoner}},b), {\textsc {positive}}(9,{\mathtt {User}})\rangle \), where \(a'\) is the argument provided by the \({\mathtt {User}}\) and b is a further argument presented by the \({\mathtt {Reasoner}}\).

It is worth making a remark on the semantics of the utterance \({\textsc {negative}}\), which has two goals. First, it declares that the \({\mathtt {User}}\) has not understood the last explanation; second, it provides feedback to the \({\mathtt {Reasoner}}\). This feedback is in the form of an argument \(a'\). Thus, if the \({\mathtt {User}}\) has an expectation about a query and this expectation is endorsed by an argument, she can present this argument in the utterance. Hence, \({\textsc {negative}}(7,{\mathtt {User}},a')\) can be read as “I do not understand why Q is not entailed, given that the argument \(a'\) supports it”. When \(a'\) is empty, the user has no argument to propose.

6.1 Discussion

Our tests on the prototype have shown that running dialogues on various sameAs statements (computed externally and considered potentially problematic) supported different corrective actions. In some cases, errors in the data were found (e.g. resource 100001135 has confYear property value 0, while its correct sameAs resources are conferences of the year 1995, and resource 100000021 has pageFrom set to 24.1, which is again an error since it should be 24). Thanks to the dialogues with the reasoner, the expert easily located these problems. In some other tests, the explanation dialogue helped the expert understand that an update of the similarity functions used for specific properties was necessary (e.g. Levenshtein instead of Jaccard for confName), or that the threshold \(\epsilon \) used to determine “dissimilar literals” had to be lowered for some properties (e.g. title). Finally, in the very first run, we used a set of sameAs links computed loosely (full of erroneous links). Thanks to the explanation dialogue, it was clear that every sameAs query had strong inconsistencies over fundamental properties and values, and this supported the idea of redoing the linking process with a different strategy (in our case, using composite keys in the linkage phase).

An important question may arise at this point: “what happens if the \({\mathtt {Reasoner}}\) has multiple explanations (several potential arguments against/for the query)?”.

In this case, we adopt a selection strategy: we choose each time which argument must be presented. In this work we aim at providing a general account of this process, so we use the concept of a selection function \(\mathbb {S}\) over a set of arguments. Note that \(\mathbb {S}\) can be instantiated to express preferences with respect to criteria that can possibly be defined by the expert \({\mathtt {User}}\), such as “the property confName is very important (high weight \(w_{confName}\))” or “the property year may contain errors, thus it has lower importance (low weight \(w_{year}\))”. To order the sameAs links presented to the expert, we used Graal to compute all the conflicts in the knowledge base. Then, we highlighted those sameAs statements that were more involved in conflicts (and consequently more present in attacks in the corresponding argumentation framework). These sameAs links were compared with the gold standard of the CORA dataset, and they were used to define the order in which the dialogue should propose the sameAs links to the \({\mathtt {User}}\). The sameAs links with the most attacks, thus the most debatable ones, were shown first. The procedure we used to compute the conflicts is expensive from a computational point of view. Such an approach can be further improved in future work, by suitably adapting the conflict computation in order to obtain an incremental any-time algorithm with better computational properties.
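A possible instantiation of \(\mathbb {S}\) as code (a sketch of ours; the attack lists and the weight values are illustrative assumptions, not values from CORA):

```python
def select_order(candidates, attacks, weights):
    """Order sameAs statements by weighted attack count, most debated first.

    candidates: sameAs identifiers; attacks: maps each sameAs to the list of
    properties involved in attacks against it; weights: expert-defined
    importance of each property (defaults to 1.0 when unspecified).
    """
    def score(s):
        return sum(weights.get(p, 1.0) for p in attacks.get(s, []))
    return sorted(candidates, key=score, reverse=True)

attacks = {"sameAs(r1,r2)": ["confName", "year"], "sameAs(r3,r4)": ["year"]}
weights = {"confName": 2.0, "year": 0.5}  # expert: confName matters more
print(select_order(["sameAs(r1,r2)", "sameAs(r3,r4)"], attacks, weights))
# ['sameAs(r1,r2)', 'sameAs(r3,r4)']: proposed to the User in this order
```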

7 Concluding Remarks

In this paper we presented an explanation dialogue based on argumentation theory where a domain expert can interact with the reasoner regarding a problematic sameAs statement.

The paper demonstrates the significance of the explanation framework through a real-world example. All the dialogue moves are detailed, so that the reader can comprehend the types of interactions allowed. To the best of our knowledge, the work presented in this paper is the first attempt to use argumentation for sameAs link invalidation and for providing an explanation framework.

The results we obtained with the first prototype are very promising, motivating us to continue this research activity. We are currently conducting tests using synthetic datasets of different sizes and quality (e.g. OAEI) and, in the immediate future, we plan to analyze and evaluate sameAs links coming directly from the LOD. In parallel, we are studying suitable improvements and strategies to ensure the scalability of the approach when dealing with big datasets.

Different interesting long-term research directions can be explored. For example, it could be interesting to study how to design innovative methods for modeling and combining contextual weights associated with each property used in the QFEP. Such weights could depend on different factors, such as the reliability (automatically acquired or computed) of each property in the initial dataset. In addition, these weights could incorporate suggestions (or restrictions) provided directly by the expert/user (something like ‘I trust this data, please consider it true over all other computations’), and so on.

Another interesting future research direction is the study of suitable user interfaces (by the use of innovative interactive systems) for the explanation of the inconsistencies and the properties involved, such that both the types of interaction and the way in which the arguments are presented could be more ‘user-friendly’ and supported by graphical representations.