Abstract
Due to the impressive growth of the LOD graph in recent years, assuring the quality of its content has become a very important issue. It is thus crucial to design techniques for supporting experts in validating facts and links in complex data sources. Here, we focus on identity links (sameAs) and apply argumentation semantics to (i) detect inconsistencies in sameAs statements and (ii) explain them to the experts using dialogues. We formalize the framework and explain its purposes. Finally, we provide a promising preliminary evaluation and discuss some interesting future directions we foresee.
1 Introduction
Today, we are experiencing an unprecedented production of resources, published as Linked Open Data (LOD). This is leading to the creation of a global data space with billions of assertions [9]. RDF [24] provides formal ways to build these assertions. Most of the RDF links, connecting resources coming from different data sources, are identity links, also called sameAs statements. They are defined using the owl:sameAs property, expressing that two URIs actually refer to the same thing [1]. Unfortunately, many existing identity links do not reflect genuine real identity [15, 16] and therefore might lead to inconsistencies. Over the years, inconsistency-tolerant semantics (e.g. [7, 8, 26, 27]) have been proposed for query answering over potentially inconsistent existing data (and thus overcoming inconsistencies within the data).
In this work, we formalize explanation dialogues that use argument-based explanations built on inconsistency-tolerant semantics. Our explanation dialogue supports a domain expert in discovering inconsistencies as well as in (eventually) correcting erroneous data, revising the logical rules used for the invalidation, or even deciding on a (potential) redesign of the initial linking strategy.
The explanation dialogue relies on a sameAs invalidation method that computes repairs so that, when a sameAs statement is not entailed under the chosen semantics, an explanation of the reasons against this entailment is provided.
This is the first work that uses argumentation for sameAs link invalidation together with a formalization of a general explanation framework supporting the dialogue between the user and the reasoner. The salient point of this paper is to show how inconsistency-tolerant semantics can represent a first step towards the design of new interactive paradigms for assessing the quality of sameAs statements.
The paper is organized as follows. Section 2 discusses related work, while Sect. 3 provides background notions of argumentation theory. Section 4 is devoted to the presentation of our argumentation problem for sameAs invalidation. Section 5 formally introduces the novel Explanation Dialogue, and Sect. 6 provides an example of the overall strategy implemented in a prototype. Finally, Sect. 7 draws some concluding remarks and outlines possible future directions.
2 Related Work
To the best of our knowledge, the work presented here is the first attempt to combine argumentation theory, identity link evaluation, and explanation dialogues. However, related work can be found in the context of sameAs evaluation and, more generally, in approaches that use argumentation in the Semantic Web.
The sameAs validation problem is very recent, and few methods exist. In [17], the structural properties of large graphs of sameAs links are analyzed, without assessing their quality. In [22], a framework dedicated to the assessment of sameAs links using network metrics is described, while in [23] the authors report on the quality of sameAs links in the LOD using a manual method. In [15], the author illustrates how to assess the quality of sameAs links using a constraint-based method which, in the end, considers only one property (the name of the entity), while in [29] an ontology-based logical invalidation method is presented which discovers invalid sameAs links through contextual graphs built around the resources, thus using more properties. Finally, the recent work presented in [14] evaluates a sameAs link by using the position and relevance of each resource involved with regard to the associated DBpedia categories, modeled through two probabilistic category distribution and selection functions. We should also recall that many linking methods exist (see [21] for a survey) that, during their process of sameAs discovery, include a strategy for evaluating the reliability of the sameAs links just computed.
Regarding argumentation in the Semantic Web, several works exist that mainly address ontology alignment agreement based on argumentation theory (e.g. [18, 19, 25]). Basically, all of them use argumentation to provide a final agreement (or a final answer), and do not exploit argumentation as a form of explanation of the answer to a query. Recently, in [10] the problem of data fusion in Linked Data was addressed by adopting bipolar argumentation theory (with fuzzy labeling) to reason over inconsistent information sets and to provide a unique answer.
This last method has points in common with our line of work, namely the use of argumentation theory to detect inconsistencies, but the scenarios in which the approaches are exploited differ, as do their general aims. This naturally leads to different addressed issues and proposed solutions.
3 Background Notions
There exist two major approaches to representing an ontology for the OBDA (Ontology-Based Data Access) problem: (i) Description Logics (DL) (such as the \(\mathscr {EL}\) [4] and DL\(_{Lite}\) [12] families) and (ii) rule-based languages (such as the Datalog\(+/-\) [11] language). Although answering conjunctive queries over Datalog\(+/-\) is undecidable in general, different decidable fragments have been studied in the literature [6]; they overcome the limitations of DLs by allowing n-ary predicates and cyclic structures. We consider the positive existential conjunctive fragment of first-order logic, denoted by FOL(\(\wedge \),\(\exists \)), which is composed of formulas built with the connectors \((\wedge ,\rightarrow )\) and the quantifiers \((\exists ,\forall )\).
We consider first-order vocabularies with constants but no other function symbols. A term t is a constant or a variable; different constants represent different values (unique name assumption). An atomic formula (or atom) is of the form \(p(t_1,\ldots ,t_n)\) where p is an n-ary predicate and \(t_1,\ldots ,t_n\) are terms. A ground atom is an atom with no variables. A variable in a formula is free if it is not in the scope of a quantifier; a formula is closed if it has no free variables. We denote by X (bold font) sequences of variables \(X_1,\ldots ,X_k\) with \(k \ge 1\). A conjunct \(C[\mathbf X ]\) is a finite conjunction of atoms, where X is the sequence of variables occurring in C. Given an atom or a set of atoms A, vars(A), consts(A) and terms(A) denote its sets of variables, constants and terms, respectively.
An existential rule is a first-order formula of the form \(R=\forall \mathbf X \forall \mathbf Y (H[\mathbf X ,\mathbf Y ]) \rightarrow \exists \mathbf Z C[\mathbf Z ,\mathbf Y ]\), with \(vars(H)=\mathbf X \cup \mathbf Y \) and \(vars(C)= \mathbf Z \cup \mathbf Y \), where H and C are conjuncts called the hypothesis and the conclusion of R, respectively. \(R=(H,C)\) is a contracted form for R. An existential rule with an empty hypothesis is called a fact; a fact is an existentially closed (with no free variable) conjunct. A rule \(r=(H,C)\) is applicable to a set of facts F iff there exists \(F' \subseteq F\) such that there is a homomorphism \(\pi \) from H to the conjunction of the elements of \(F'\). If a rule r is applicable to a set F, its application according to \(\pi \) produces the set \(F \cup \{\pi (C)\}\). The new set \(F \cup \{\pi (C)\}\), also denoted by r(F), is called an immediate derivation of F by r. Finally, we say that a set of facts \(F \subseteq {\mathscr {F}}\) and a set of rules \(\mathscr {R}\) entail a fact f (and we write \(F, \mathscr {R} \,\models \, f\)) iff the closure of F by all the rules entails f (i.e. \({\mathtt {Cl_{{\mathscr {R}}}}}(F) \,\models \, f\)).

A negative constraint is a first-order formula \(n=\forall \mathbf X H[\mathbf X ] \rightarrow \perp \) where \(H[\mathbf X ]\) is a conjunct called the hypothesis of n and X is the sequence of variables appearing in the hypothesis. A knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) is composed of a finite set of facts \({\mathscr {F}}\), a finite set of existential rules \({\mathscr {R}}\) and a finite set of negative constraints \({\mathscr {N}}\). Given a knowledge base \({\mathscr {K}}= ({\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}})\), a set \(F \subseteq {\mathscr {F}}\) is said to be inconsistent iff there exists a constraint \(n \in {\mathscr {N}}\) such that \(F \,\models \, H_{n}\), where \(H_{n}\) is the hypothesis of the constraint n.
A set of facts is consistent iff it is not inconsistent. A conjunctive query (CQ) has the form \(Q(\mathbf X )=\exists \mathbf Y \varPhi [\mathbf X ,\mathbf Y ]\) where \(\varPhi [\mathbf X ,\mathbf Y ]\) is a conjunct such that X and Y are the variables in \(\varPhi \). A Boolean CQ (BCQ) is a CQ with yes or no as its answer.
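To make these notions concrete, the following is a minimal sketch (ours, not part of the paper) of rule application via homomorphism and of the closure \({\mathtt {Cl_{{\mathscr {R}}}}}(F)\), restricted to existential-free ground rules; the tuple encoding of atoms and the uppercase-variable convention are illustrative assumptions.

```python
# Sketch (illustrative, not the paper's implementation): atoms are tuples
# ("pred", t1, ..., tn); terms starting with an uppercase letter are
# variables, the rest are constants (unique name assumption).

def match(atom, fact, subst):
    """Extend the substitution so that the pattern atom maps onto the ground fact."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    s = dict(subst)
    for t, c in zip(atom[1:], fact[1:]):
        if t[0].isupper():                  # variable
            if s.get(t, c) != c:
                return None
            s[t] = c
        elif t != c:                        # constant mismatch
            return None
    return s

def homomorphisms(hypothesis, facts, subst=None):
    """All homomorphisms from the hypothesis conjunct into the fact set."""
    subst = subst or {}
    if not hypothesis:
        yield subst
        return
    for f in facts:
        s = match(hypothesis[0], f, subst)
        if s is not None:
            yield from homomorphisms(hypothesis[1:], facts, s)

def closure(facts, rules):
    """Cl_R(F): saturate F by immediate derivations until a fixpoint
    (the chase halts here because the rules have no existential variables)."""
    facts = set(facts)
    while True:
        new = set()
        for hyp, concl in rules:
            for s in homomorphisms(hyp, facts):
                for c in concl:
                    g = (c[0],) + tuple(s.get(t, t) for t in c[1:])
                    if g not in facts:
                        new.add(g)
        if not new:
            return facts
        facts |= new
```

For instance, with the transitivity rule for sameAs, the closure of {sameAs(a, b), sameAs(b, c)} also contains sameAs(a, c).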
Inconsistency Handling. If a knowledge base \({\mathscr {K}}= ({\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}})\) is inconsistent, then everything is entailed from it. A common way to cope with inconsistency [7, 26] is to construct the maximal (with respect to set inclusion) consistent subsets of \(\mathscr {F}\), called repairs and denoted by \({\mathscr {R}epair}({\mathscr {K}})\). In this paper, we consider a fragment of our language where the deduction method (the chase) halts, so the closure \({\mathtt {Cl_{{\mathscr {R}}}}}(F)\) of any set of facts F is finite. Once the repairs are computed, different semantics can be used for query answering over the knowledge base. Here we focus on the brave semantics [26] and the ICR semantics [7].
The brave semantics accepts a query if it is entailed from at least one repair. This kind of semantics has been criticized because it allows conflicting answers. Let \(\mathscr {K}=(\mathscr {F}, \mathscr {R}, \mathscr {N})\) be a knowledge base and let Q be a query. Q is brave-entailed from \(\mathscr {K}\), written \(\mathscr {K} \,\models \,_{brave} Q\), if and only if \(\exists {\mathscr {A}}\in {\mathscr {R}epair}({\mathscr {K}})\) such that \({\mathtt {Cl_{{\mathscr {R}}}}}({\mathscr {A}}) \,\models \, Q\). A more prudent and conservative semantics has been proposed in [7]. Let \( \mathscr {K}=(\mathscr {F}, \mathscr {R}, \mathscr {N})\) be a knowledge base and let Q be a query. Q is ICR-entailed from \(\mathscr {K}\), written \(\mathscr {K} \,\models \,_{ICR} Q\), if \(\bigcap _{{\mathscr {A}}\in {\mathscr {R}epair}({\mathscr {K}})} {\mathtt {Cl_{{\mathscr {R}}}}}({\mathscr {A}}) \,\models \, Q\).
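As a hedged illustration (ours, not the paper's code) of the two semantics, the sketch below enumerates repairs by brute force and checks brave and ICR entailment of a ground fact; it assumes the closure has already been applied, so constraints are given directly as sets of mutually conflicting facts.

```python
from itertools import combinations

def consistent(facts, constraints):
    """A fact set is consistent iff no constraint hypothesis holds in it."""
    return not any(h <= facts for h in constraints)

def repairs(facts, constraints):
    """Maximal (w.r.t. set inclusion) consistent subsets of the facts."""
    facts, out = set(facts), []
    for k in range(len(facts), -1, -1):       # large subsets first
        for sub in map(set, combinations(facts, k)):
            if consistent(sub, constraints) and not any(sub < r for r in out):
                out.append(sub)
    return out

def brave_entails(q, facts, constraints):
    """Brave semantics: q holds in at least one repair."""
    return any(q in r for r in repairs(facts, constraints))

def icr_entails(q, facts, constraints):
    """ICR semantics: q holds in the intersection of all repairs."""
    rs = repairs(facts, constraints)
    return bool(rs) and q in set.intersection(*rs)
```

With facts {p(a), q(a)} and a constraint forbidding them together, the repairs are {p(a)} and {q(a)}: both facts are brave-entailed but neither is ICR-entailed, which shows why the brave semantics admits conflicting answers.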
An alternative method for handling inconsistency is the use of argumentation. Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\), the corresponding argumentation framework \({\mathscr {AF}_{\mathscr {K}}}\) is a pair \(({\mathtt {Arg}}, {\mathtt {Att}})\) where \({\mathtt {Arg}}\) is the set of arguments that can be constructed from \({\mathscr {F}}\) and \({\mathtt {Att}}\) is an asymmetric binary relation called attack defined over \({\mathtt {Arg}}\times {\mathtt {Arg}}\) (as defined in [13]). Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\), an argument a is a tuple \(\mathbf a =(F_0, F_1, \ldots ,F_{n}, C)\) where: \((F_0, \ldots , F_{n})\) is an \({\mathscr {R}}\)-derivation of \(F_0\) in \({\mathscr {K}}\), such that (i) \(F_0\) is \({\mathscr {R}}\)-consistent and (ii) C is an atom, a conjunction of atoms, the existential closure of an atom or the existential closure of a conjunction of atoms such that \(F_{n} \,\models \, C\). \(F_0\) is the support of the argument \(\mathbf a \) (\({\mathtt {Supp}}(\mathbf a )\)) and C is its conclusion (\({\mathtt {Conc}}(\mathbf a )\)).
An argument \(\mathbf a \) supports a query Q if \({\mathtt {Conc}}(\mathbf a )\) entails Q and \(\mathbf a \) is against Q if it attacks at least one argument that supports Q. An attack between two arguments \(\mathbf a \) and \(\mathbf b \) expresses the conflict between their conclusions and supports. Thus, \(\mathbf a \) attacks \(\mathbf b \) iff there exists \(f \in {\mathtt {Supp}}(\mathbf a )\) (f is a fact) such that the set \(\{{\mathtt {Conc}}(\mathbf b ), f\}\) is \(\mathscr {R}\)-inconsistent.
Let \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) be a knowledge base and \({\mathscr {AF}_{\mathscr {K}}}\) its corresponding argumentation framework, and let \(E \subseteq {\mathtt {Arg}}\) be a set of arguments. We say that E is conflict-free iff there exist no arguments \(a, b \in E\) such that \((a,b) \in {\mathtt {Att}}\). E defends an argument a iff for every argument \(b \in {\mathtt {Arg}}\), if \((b,a) \in {\mathtt {Att}}\) then there exists \(c \in E\) such that \((c,b) \in {\mathtt {Att}}\). E is admissible iff it is conflict-free and defends all its arguments. E is a preferred extension iff it is a maximal (with respect to set inclusion) admissible set (see [20] for other types of semantics). We denote by \({\mathtt {Ext}}({\mathscr {AF}_{\mathscr {K}}})\) the set of all extensions of \({\mathscr {AF}_{\mathscr {K}}}\). An argument a is sceptically accepted if it is in all extensions, credulously accepted if it is in at least one extension, and not accepted if it is in no extension.
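The acceptance notions above can be sketched as follows (our illustrative naive enumeration, feasible only for small frameworks; arguments are plain strings and attacks are ordered pairs):

```python
from itertools import combinations

def conflict_free(E, att):
    """No attack holds between any two members of E."""
    return not any((a, b) in att for a in E for b in E)

def defends(E, a, att, args):
    """E counter-attacks every attacker of a."""
    return all(any((c, b) in att for c in E)
               for b in args if (b, a) in att)

def admissible(E, att, args):
    return conflict_free(E, att) and all(defends(E, x, att, args) for x in E)

def preferred_extensions(args, att):
    """Maximal (w.r.t. set inclusion) admissible sets."""
    adm = [set(E) for k in range(len(args) + 1)
           for E in combinations(sorted(args), k)
           if admissible(set(E), att, args)]
    return [E for E in adm if not any(E < F for F in adm)]

def sceptically_accepted(a, args, att):
    return all(a in E for E in preferred_extensions(args, att))
```

For instance, with arguments {a, b, c} and attacks a→b and b→c, the only preferred extension is {a, c}: a defends c against b, so a and c are sceptically accepted while b is not.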
In [13], the equivalence between sceptical acceptance under preferred semantics and ICR-entailment was proved. This allows us to use the argumentation approach in our explanation dialogue (Sect. 5) and to ensure its correctness and completeness w.r.t. ICR query explanation and failure.
Given a knowledge base \({\mathscr {K}}\) and a query Q, the general problem is to explain whether Q is entailed by \({\mathscr {K}}\) or not. Let \({\mathscr {K}}\) be an inconsistent knowledge base and Q a Boolean conjunctive query. \(\varPi =\langle {\mathscr {K}},Q\rangle \) is a query result explanation problem (QREP) iff (i) \({\mathscr {K}}\) is inconsistent, and (ii) \({\mathscr {K}}\,\models \,_{brave} Q\) [3]. Using ICR semantics, we distinguish:
1. The Query Failure Explanation Problem (QFEP): in the ICR setting, a QREP \(\varPi \) is a QFEP iff \({\mathscr {K}}\not \,\models \,_{ICR} Q\).

2. The Query Acceptance Explanation Problem (QAEP): in the ICR setting, a QREP \(\varPi \) is a QAEP iff \({\mathscr {K}}\,\models \,_{ICR} Q\).
The first refers to the case in which the query fails (no answer) due to contradictions; the second to the case in which the query is accepted, so a yes answer is obtained.
4 QFEP for SameAs Invalidation
Let \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) be a knowledge base and \({\mathscr {AF}_{\mathscr {K}}}\) its corresponding argumentation framework. We define now the main components of \({\mathscr {K}}\) for a QFEP in case of a sameAs invalidation.
Defining the Facts, the Rules and the Negative Constraints. \({{\mathscr {F}}}\) is a set of facts including (i) RDF triples, coming from RDF graphs representing the knowledge described in (possibly) different inter-connected datasets, and (ii) facts asserting similarity values between specific literals. Facts of the second type are of the following form:
where (i) [prop] is the name of a datatype functional (or inverse-functional) property, (ii) [SimFunction] is a similarity measure (e.g. Jaccard, Levenshtein, ...), (iii) x, y are literals and (iv) \(\sigma \) is the similarity value between x and y. These facts are considered when \(\sigma \) is less than a given threshold \(\epsilon \), defined for the similarity measure [SimFunction] of a given property [prop].
There are several kinds of logical rules that we consider. Some are defined by the W3C standards: for instance, we exploit the OWL 2 RL rules which define the owl:sameAs predicate as reflexive, symmetric, and transitive, and the rules that axiomatize the standard replacement properties. We also use rules that are declared or discovered using mining techniques on RDF triples. For this kind of rule, we consider here two types of properties: functional and inverse-functional properties [1].
When a property p is a datatype functional property, it can be expressed via the following logical rule: \(p(r,v) \wedge p(r,v^{'}) \rightarrow isEquiv(v,v^{'})\), where isEquiv expresses equivalence of two literals. If the property p is an object functional property, the following logical rule can be used: \(p(r,v) \wedge p(r,v^{'}) \rightarrow sameAs(v,v^{'})\). Instead, when p is an inverse-functional property, the logical rule is \(p(w_1,x) \wedge p(w_2,x) \rightarrow sameAs(w_1, w_2)\).
We also add a set of rules, each of the following form:
A rule of this form basically asserts that, when two literals x and y have a low similarity value \(\sigma \) for a specific property [prop], they are declared different (thus the fact isDiff(x, y) is added to \({{\mathscr {F}}}\)).
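A minimal sketch of how such similarity facts can be materialized (our illustration; the token-level Jaccard measure and the threshold value are assumptions, not the paper's configuration):

```python
def jaccard(x, y):
    """Token-level Jaccard similarity between two literal strings."""
    a, b = set(x.lower().split()), set(y.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def isdiff_facts(pairs, prop, sim=jaccard, eps=0.5):
    """Emit isDiff facts for literal pairs of property `prop` whose
    similarity value sigma falls below the threshold eps."""
    facts = []
    for x, y in pairs:
        sigma = sim(x, y)
        if sigma < eps:
            facts.append(("isDiff", x, y, prop, sigma))
    return facts
```

On the confName literals of the example in Sect. 6, 'proceedings aaai-98' and 'in proceedings of aaai' share one token out of five, giving \(\sigma = 0.2 < 0.5\), so isDiff would be asserted.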
In our setting, the negative constraints are very simple. The only necessary negative constraints have the following form: \(isEquiv(x,y) \wedge isDiff(x,y) \rightarrow \perp \), where isEquiv(x, y) comes from the rules defined for the datatype functional properties and isDiff(x, y) comes from the similarity values between literals. Note that all the other negative constraints that are meaningful for discovering inconsistencies for a given sameAs can be logically derived from the rules defined before. In the case of a datatype functional property title, the following leads to an inconsistency:
This can be derived from one rule and a negative constraint, namely:
1. \(sameAs(s,o) \wedge title(s,w) \wedge title(o,w_{1}) \rightarrow isEquiv(w,w_{1})\)

2. \(isEquiv(w,w_{1}) \wedge isDiff(w,w_{1}) \rightarrow \perp \)
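Combining the functional-property rule and the negative constraint above, a sameAs(s, o) candidate can be checked against the functional-property values of s and o. A hedged sketch (ours; the dictionary encoding of triples is an assumption):

```python
def check_sameas(s, o, triples, func_props, isdiff):
    """List the constraint violations that sameAs(s, o) would trigger:
    for each datatype functional property p, the values p(s, w) and
    p(o, w1) become isEquiv(w, w1), which clashes with isDiff(w, w1)."""
    violations = []
    for p in func_props:
        for w in triples.get((s, p), []):
            for w1 in triples.get((o, p), []):
                if (w, w1) in isdiff or (w1, w) in isdiff:
                    violations.append((p, w, w1))
    return violations
```

An empty result does not validate the link; it only means that no constraint of this form is violated.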
The QFEP Problem. To complete the components, and thus the instantiation of the QFEP using ICR semantics in the setting of sameAs invalidation, we need to define the query Q, which is basically a sameAs(x, y) statement. The problem becomes:
Query Failure Explanation Problem ( \(QFEP_{sameAs}\) ). Given a knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) with \({{\mathscr {F}}}\), \({{\mathscr {R}}}\), \({{\mathscr {N}}}\) defined above and a query Q given as a sameAs(x, y) statement, the \(QFEP_{sameAs}\) in the ICR setting (which is equivalent to \({\mathscr {AF}_{\mathscr {K}}}\) as argumentation framework) is a QREP where \({\mathscr {K}}\not \,\models \,_{ICR} Q\).
At this point, we have formally instantiated a QFEP as a sameAs invalidation problem. Given a sameAs statement (as query), by using the facts, rules and negative constraints described above, we are able to discover whether the sameAs is not entailed with respect to the given knowledge base (under ICR semantics). This proves that a sameAs invalidation method can be seen as an instantiation of a QFEP in ICR. By itself, this represents an interesting result in the search for effective methods for evaluating the quality of sameAs statements. However, we also need interactions with the domain experts, to explain the problems encountered and to support the corrective actions. In the following, we define our explanation framework (and dialogues), which provides these interactive functionalities.
5 The Explanation Framework
It is clear that, if a sameAs has problems, it makes sense to show the experts what kind of actions and negative conditions lead to this answer. Here we introduce our explanation framework, which is custom-tailored to the Query Failure Explanation Problem under ICR semantics in inconsistent knowledge bases.
Example 1
(Motivating Example). Let us consider the case of a QFEP \(\varPi =\langle {\mathscr {K}},Q\rangle \) with a query as \(Q=worksIn(Linda,Statistics)\). The dialogue we would like is similar to the following:
Actor | Dialogue expression |
---|---|
\({\mathtt {User}}\) | Why not worksIn(Linda, Statistics)? |
\({\mathtt {Reasoner}}\) | Because Linda works in Accounting. |
\({\mathtt {User}}\) | Clarify? |
\({\mathtt {Reasoner}}\) | Because Linda uses office \(o_1\) and \(o_1\) is located in Accounting, so Linda works in Accounting. |
\({\mathtt {User}}\) | How’s that a problem? |
\({\mathtt {Reasoner}}\) | The following negative constraint is violated \(\forall x \forall y \forall z\,\,(worksIn(x,y)\wedge worksIn(x,z) \wedge y \ne z)\rightarrow \bot \). |
\({\mathtt {User}}\) | Understood. |
This simple example (not explicitly related to sameAs) only serves to clarify that, in our ideal explanation framework, each iteration needs to respect certain rules and some predefined locutions must be used (like understood, clarify, why, etc.). In addition, all the information must be represented as arguments and/or elaborations of arguments. Finally, our dialogue uses a turn-taking mechanism where the \({\mathtt {User}}\) and the \({\mathtt {Reasoner}}\) switch turns at each stage.
In the following, we formalize the dialogue system and a legal dialogue for our explanation framework and, to do so, we define the necessary syntax and semantics. The formalization is based on a very preliminary work [3], where the idea of the dialogue was first introduced. The novelty here is the full formalization of the dialogue, with specific references and definitions tailored to the problem at hand.
5.1 Syntax
Definition 1
(Dialogue System). Given a QFEP \(\varPi =\langle {\mathscr {K}},Q\rangle \). A dialogue system for \(\varPi \) is a tuple , where \(\varPi \) is the topic, \({\mathscr {P}r}\) is the set of participants, is a finite set of the allowed utterances, \({\mathbb {R}}\) is an irreflexive binary relation defined over called the reply relation.
The definition above is intentionally general. The reader should note that, in the case of this work, the topic of the dialogue is a discussion that aims to make the \({\mathtt {User}}\) understand the refusal of a query Q (sameAs(x, y)) in \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\), with \({{\mathscr {F}}, {\mathscr {R}}, {\mathscr {N}}}\) defined in the previous section. The participants \({\mathscr {P}r}=\{{\mathtt {Reasoner}},{\mathtt {User}}\}\) are (i) the \({\mathtt {User}}\), namely the domain expert who is analysing the quality of a set of sameAs links, and (ii) the \({\mathtt {Reasoner}}\), an agent providing explanations in case of refusal.
The set of allowed utterances and the reply relation \({\mathbb {R}}\) for our dialogue system \({\mathscr {D}}\) is given in Table 1. Note that a, \(a'\), t, \(t'\) and Q in the table represent metavariables of arbitrary well-formed syntactical objects (e.g. queries, arguments, integers, etc.) of an arbitrary formal language.
A dialogue D is a potentially infinite sequence of legal utterances. An utterance is considered a legal reply to another utterance iff it is a correct reply with respect to the reply relation \({\mathbb {R}}\) and it is the turn of the participant x to talk. We provide here a simple explanatory example.
Example 2
(Legal/Illegal Reply). Consider the dialogue: \(\langle {\textsc {explain}}(1,{\mathtt {User}},Q), {\textsc {attempt}}(2,{\mathtt {Reasoner}},a),{\textsc {clarify}}(3,{\mathtt {User}},a),{\textsc {negative}}(4,{\mathtt {User}},a') \rangle \) As one may notice, the reply \({\textsc {negative}}\) to \({\textsc {clarify}}\) is illegal because it is not in \({\mathbb {R}}\). A legal reply would be \({\textsc {clarification}}(4,{\mathtt {Reasoner}},a')\).
At this point, it is necessary to define the protocol which decides whether a dialogue is legal or not. We introduce the following definition of a legal dialogue.
Definition 2
(Legal Dialogues). Given a dialogue \(D_n\) at stage n, \(n \ge 0\), the dialogue \(D_n\) is legal iff:

- Empty dialogue rule: if \(n=0\) then \(D_0\) is legal.

- Commencement rule: if \(n=1\) then \(D_1=\langle u_1\rangle \) is legal iff \(u_1 = {\textsc {explain}}(1,{\mathtt {User}},Q)\).

- Dialogue rules: if \(n > 1\) then \(D_n\) is legal iff \(D_{n-1}\) is legal, \(u_{n}\) is a legal reply to \(u_{n-1}\), and there is no \(u_i \in D_{n-1}\), \(i<n\), such that \(u_{n}\) equals \(u_i\).
Our definition indicates that an empty dialogue is legal. Furthermore, a legal dialogue always starts with an explanation request made by the \({\mathtt {User}}\). Finally, the protocol defines a legal dialogue as a sequence of utterances, each one a legal reply to the previous, in which no utterance is repeated.
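The protocol of Definition 2, together with the turn-taking mechanism, can be sketched as a simple checker (ours; the reply relation below is an illustrative subset, since the full Table 1 is not reproduced here):

```python
# Illustrative subset of the reply relation R (the full set of allowed
# utterances is given in Table 1 of the paper).
REPLIES = {
    "explain": {"attempt"},
    "attempt": {"clarify", "deepen", "positive", "negative"},
    "clarify": {"clarification"},
    "clarification": {"deepen", "positive", "negative"},
    "deepen": {"deepening"},
    "deepening": {"positive", "negative"},
}

def legal(dialogue):
    """Definition 2 as code: the empty dialogue is legal; the first move is
    the User's explain; each utterance legally replies to its predecessor,
    participants alternate, and no utterance is repeated.
    Utterances are triples (locution, actor, content)."""
    if not dialogue:
        return True                            # empty dialogue rule
    if dialogue[0][:2] != ("explain", "User"):
        return False                           # commencement rule
    for i in range(1, len(dialogue)):
        (prev_loc, prev_actor, _), (loc, actor, _) = dialogue[i - 1], dialogue[i]
        if loc not in REPLIES.get(prev_loc, set()):
            return False                       # not in the reply relation
        if actor == prev_actor:
            return False                       # turn taking violated
        if dialogue[i] in dialogue[:i]:
            return False                       # repeated utterance
    return True
```

On the dialogue of Example 2, replacing the illegal negative(4, User, a') with clarification(4, Reasoner, a') makes the sequence legal.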
5.2 Semantics
Now we shift to the semantic aspect of the dialogue where we deal with the content of the utterances. For instance, the utterance \({\textsc {explain}}(2,{\mathtt {User}},Q)\) is legal (syntactically correct) but it will not be semantically legal if \(\varPi =\langle {\mathscr {K}},Q\rangle \) is not a query result explanation problem (or, in our more specific case a QFEP). The same applies to the utterance \({\textsc {attempt}}(2,{\mathtt {Reasoner}},a)\) if a is not an argument or a combination of arguments in our argumentation framework.
In Table 2 we give the conditions under which a given utterance or reply is considered semantically legal in our setting. Here, a deepening of an argument a explains the conflict between a and another argument b by showing the set of violated constraints. A clarification, instead, unfolds the knowledge (rules) used in the argument a, to exhibit the line of reasoning that drives the conclusion.
The semantical legality must also be considered within a context where replies are taken into account. Table 3 indicates the conditions under which a reply is semantically legal. For instance, a reply by the utterance \({\textsc {attempt}}(2,{\mathtt {Reasoner}},a)\) to the utterance \({\textsc {explain}}(1,{\mathtt {User}},Q)\) is legal but it will not be semantically legal if a is not a proponent (opponent) argument of the query Q.
The dialogue is defined as a finite set of semantically legal moves. An explanation dialogue is typed, depending on its topic. Here, the topic is a QFEP, thus our explanation dialogue \(D_n\) is called a Query Failure Explanation Dialogue (QFED): the \({\mathtt {Reasoner}}\) shows, by presenting opponent arguments, why a query Q has failed.
6 First Results and Discussion
To verify our strategy, we have implemented a prototype of the explanation dialogue that communicates with a Datalog\(+/-\) rule-based reasoner called Graal [5]. For the knowledge base, we considered facts from the CORA dataset [28] and sameAs links computed using the SILK framework [2]. We provide an example of sameAs invalidation, explaining what was obtained while running dialogues, and we discuss these results. Due to space limitations, we present a single example here and provide only a meaningful portion of the set of facts, rules and negative constraints (only those related to the sameAs links used in the query or in the dialogue).
Let us consider a query Q as \(sameAs(r_1, r_2)\), where \(r_1, r_2\) are URIs describing two resources in CORA. We show our explanation framework in the form of a QFED, where the \({\mathtt {User}}\) and the \({\mathtt {Reasoner}}\) interact in order to explain why Q is invalid. In Table 4 we report the subset of the knowledge base \({{\mathscr {K}}=({\mathscr {F}},{\mathscr {R}},{\mathscr {N}})}\) we used; this subset provides sufficient details to discuss the results.
To be clearer, the query \(Q=sameAs(r_1,r_2)\) involves two resources which describe two ‘conferences’ with titles (confName) ‘proceedings aaai-98’ and ‘in proceedings of aaai’, respectively. The query Q is not entailed according to the inconsistency-tolerant semantics of \({\mathscr {AF}_{\mathscr {K}}}\): the two conferences are not the same. In Table 5 we show our explanation dialogue, providing details on the reasons why Q is not entailed.
As mentioned in the formal specification of the dialogue in Sect. 5, the succession of utterances respects certain constraints: in step \(\mathbf 1. \) the \({\mathtt {User}}\) is the one who is allowed to make the opening move (\({\textsc {explain}}\)), not the \({\mathtt {Reasoner}}\). At step \(\mathbf 2. \) the \({\mathtt {Reasoner}}\) responds by providing an argument against the query (\({\textsc {attempt}}\)), and the request for clarification (\({\textsc {clarify}}\)) at step \(\mathbf 3. \) made by the \({\mathtt {User}}\) is followed by a response made by the \({\mathtt {Reasoner}}\) (\({\textsc {clarification}}\)). Note that, after this clarification, the possible utterances can be: (i) a deepening request (\({\textsc {deepen}}\)), followed immediately by a deepening response (\({\textsc {deepening}}\)), or (ii) a \({\textsc {negative}}\) (understanding dis-acknowledgment) since, according to the semantic conditions we provided in Table 3, another deepening request is prohibited.
Another interesting property of our explanation dialogue is that it provides the domain expert (\({\mathtt {User}}\)) with the possibility of asking additional follow-ups. In the portion of dialogue described in Table 6, we report an extension of the previous dialogue (Table 5), where the \({\mathtt {User}}\) inputs additional arguments supporting her query Q and thus asks for further explanations. We continue from step 7 of Table 5 and, instead of declaring ‘understood’ (\({\textsc {positive}}\)), we disacknowledge the dialogue by providing feedback in the form of an argument.
Finally, to better illustrate the explanation dialogue, we present here the sequence of utterances, in terms of the formal model defined before. The dialogue \(D_i\) (\(i=7\)) depicted in Table 5 is the following, where a is an argument and \(C_a, D_a\) are a clarification and a deepening of a, respectively.
The second dialogue (Table 6) is composed of 9 steps. Its formal representation as a sequence of utterances is:
It is worth commenting on the semantics of the utterance \({\textsc {negative}}\), which has two goals. First, it declares that the \({\mathtt {User}}\) has not understood the last explanation; second, it provides feedback to the \({\mathtt {Reasoner}}\), in the form of an argument \(a'\). Thus, if the \({\mathtt {User}}\) has an expectation about a query and her expectation is endorsed by an argument, she can present this argument in this utterance. Hence, \({\textsc {negative}}(7,{\mathtt {User}},a')\) can be read as “I do not understand why Q is not entailed, given that the argument \(a'\) supports it”. When \(a'\) is empty, the user has no argument to propose.
6.1 Discussion
Our tests on the prototype have shown that running dialogues on various sameAs statements (computed externally and considered potentially problematic) supports different corrective actions. In some cases, errors in the data have been found (e.g. resource 100001135 has confYear property value 0, while its correct sameAs resources are conferences of the year 1995, and resource 100000021 has pageFrom set to 24.1, which is again an error, since it should be 24). Thanks to the dialogues with the reasoner, the expert easily located these problems. In some other tests, the explanation dialogue supported the expert in understanding that an update of the similarity functions used for specific properties was necessary (e.g. Levenshtein instead of Jaccard for confName), or that the threshold \(\epsilon \) used to determine “dissimilar literals” had to be lowered for some properties (e.g. title). Finally, at the very first run, we used a set of sameAs links computed loosely (full of erroneous links). Thanks to the explanation dialogue, it was clear that every sameAs query had strong inconsistencies over fundamental properties and values, and this supported the idea of redoing the linking process with a different strategy (in our case, using composite keys in the linkage phase).
An important question arises at this point: “what happens if the \({\mathtt {Reasoner}}\) has multiple explanations (several potential arguments for/against the query)?”.
In this case, we adopt a selection strategy: each time, we choose which argument must be presented. Since in this work we aim at providing a general account of this process, we use the concept of a selection function \(\mathbb {S}\) over a set of arguments. Note that \(\mathbb {S}\) can be instantiated to express preferences with respect to criteria possibly defined by the expert \({\mathtt {User}}\), such as “the property confName is very important (high weight \(w_{confName}\))” or “the property year may contain errors, thus it has lower importance (low weight \(w_{year}\))”. To order the sameAs statements presented to the expert, we used Graal to compute all the conflicts in the knowledge base. We then highlighted the sameAs statements most involved in conflicts (and consequently most present in attacks in the corresponding argumentation framework). These sameAs statements were compared against the gold standard of the CORA dataset and used to define the order in which the dialogue proposes the sameAs links to the \({\mathtt {User}}\): the sameAs links involved in the most attacks, i.e. the most debatable ones, were shown first. The procedure we used to compute the conflicts is computationally expensiveFootnote 2. This can be improved in future work by suitably adapting the conflict computation to obtain an incremental anytime algorithm with better computational properties.
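A minimal sketch of this ordering step, with invented link and conflict literals (in our setting, the actual conflicts are computed by Graal):

```python
from collections import Counter

def order_sameas_by_conflicts(sameas_links, conflicts):
    """Rank sameAs statements by how many conflicts they take part in
    (a proxy for the number of attacks in the argumentation framework):
    the most debatable links come first."""
    involvement = Counter()
    for conflict in conflicts:            # each conflict is a set of facts
        for link in sameas_links:
            if link in conflict:
                involvement[link] += 1
    # Stable sort: ties keep their original order.
    return sorted(sameas_links, key=lambda link: -involvement[link])

links = ["sameAs(a,b)", "sameAs(c,d)", "sameAs(e,f)"]
conflicts = [
    {"sameAs(a,b)", "confYear(a,1995)", "confYear(b,0)"},
    {"sameAs(a,b)", "pageFrom(a,24)", "pageFrom(b,24.1)"},
    {"sameAs(c,d)", "title(c,t1)", "title(d,t2)"},
]
print(order_sameas_by_conflicts(links, conflicts))
# sameAs(a,b) takes part in two conflicts, so the dialogue proposes it first
```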
7 Concluding Remarks
In this paper we presented an explanation dialogue based on argumentation theory, in which a domain expert can interact with the reasoner about a problematic sameAs statement.
The paper demonstrates the significance of the explanation framework through a real-world example. All the dialogue moves are detailed, so that the reader can understand the types of interactions allowed. To the best of our knowledge, the work presented in this paper is the first attempt to use argumentation for sameAs link invalidation and for providing an explanation framework.
The results obtained with the first prototype are very promising, motivating us to continue this line of research. We are currently conducting tests on synthetic datasets of different sizes and quality (e.g. OAEI) and, in the immediate future, we plan to analyze and evaluate sameAs links coming directly from the LOD. In parallel, we are studying suitable improvements and strategies to ensure the scalability of the approach when dealing with big datasets.
Several interesting long-term research directions can be explored. For example, it would be interesting to design innovative methods for modeling and combining contextual weights associated with each property used in the QFEP. Such weights could depend on different factors, such as the (automatically acquired or computed) reliability of each property in the initial dataset. In addition, these weights could incorporate suggestions (or restrictions) provided directly by the expert/user (something like ‘I trust this data, please consider it true over all other computations’), and so on.
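One possible instantiation of the selection function \(\mathbb {S}\) with such contextual weights could look as follows; all names, data structures, and weight values are hypothetical:

```python
def make_selection_function(weights, default=1.0):
    """Build a selection function S that, among candidate arguments,
    prefers the one whose support touches the most trusted properties
    (highest total weight). float('inf') could encode an expert override
    such as 'consider this data true over all other computations'."""
    def score(argument):
        # An argument's support is modeled here as a set of (property,
        # subject, value) triples; unknown properties get the default weight.
        return sum(weights.get(prop, default) for prop, _, _ in argument)
    def S(candidates):
        return max(candidates, key=score)
    return S

# Hypothetical expert-supplied weights:
weights = {"confName": 5.0,  # very important property: high weight
           "year": 0.5}      # error-prone property: low weight
S = make_selection_function(weights)
a1 = {("year", "d1", "1995"), ("year", "d2", "0")}
a2 = {("confName", "d1", "ECAI"), ("confName", "d2", "IJCAI")}
# The confName-based argument outweighs the year-based one, so S picks a2.
assert S([a1, a2]) == a2
```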
Another interesting future research direction is the study of suitable user interfaces (using innovative interactive systems) for explaining the inconsistencies and the properties involved, so that both the types of interaction and the way the arguments are presented are more ‘user-friendly’ and supported by graphical representations.
Notes
- 1. At this stage, the experiments were conducted with one domain expert.
- 2. Exponential in the number of facts in the knowledge base.
References
OWL 2 Web Ontology Language: Primer. www.w3.org/TR/owl2-primer
Silk - The Linked Data Integration Framework. http://silk-framework.com/
Arioua, A., Tamani, N., Croitoru, M., Buche, P.: Query failure explanation in inconsistent knowledge bases: a dialogical approach. In: Bramer, M., Petridis, M. (eds.) Research and Development in Intelligent Systems XXXI. Springer, Heidelberg (2014)
Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: Proceedings of IJCAI 2005 (2005)
Baget, J.-F., Leclère, M., Mugnier, M.-L., Rocher, S., Sipieter, C.: Graal: a toolkit for query answering with existential rules. In: Bassiliades, N., Gottlob, G., Sadri, F., Paschke, A., Roman, D. (eds.) RuleML 2015. LNCS, vol. 9202, pp. 328–344. Springer, Heidelberg (2015)
Baget, J.-F., Mugnier, M.-L., Rudolph, S., Thomazo, M.: Walking the complexity lines for generalized guarded existential rules. In: Proceedings of IJCAI 2011 (2011)
Bienvenu, M.: On the complexity of consistent query answering in the presence of simple ontologies. In: Proceedings of AAAI (2012)
Bienvenu, M., Rosati, R.: Tractable approximations of consistent query answering for robust ontology-based data access. In: Proceedings of IJCAI 2013. AAAI Press (2013)
Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3) (2009)
Cabrio, E., Cojan, J., Villata, S., Gandon, F.: Argumentation-based inconsistencies detection for question-answering over dbpedia. In: Proceedings of the NLP&DBpedia Workshop (2013)
Calì, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query answering over ontologies. Web Semant. Sci. Serv. Agents World Wide Web 14, 57–83 (2012)
Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: the DL-Lite family. J. Autom. Reasoning 39(3), 385–429 (2007)
Croitoru, M., Vesic, S.: What can argumentation do for inconsistent ontology query answering? In: Liu, W., Subrahmanian, V.S., Wijsen, J. (eds.) SUM 2013. LNCS, vol. 8078, pp. 15–29. Springer, Heidelberg (2013)
Cuzzola, J., Bagheri, E., Jovanovic, J.: Filtering inaccurate entity co-references on the linked open data. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds.) DEXA 2015. LNCS, vol. 9261, pp. 128–143. Springer, Heidelberg (2015)
de Melo, G.: Not quite the same: Identity constraints for the web of linked data. In: Proceedings of the Conference on Artificial Intelligence. AAAI Press (2013)
Ding, L., Shinavier, J., Finin, T., McGuinness, D.: owl:sameAs and linked data: an empirical study. In: International Web Science Conference (2010)
Ding, L., Shinavier, J., Shangguan, Z., McGuinness, D.L.: SameAs networks and beyond: analyzing deployment status and implications of owl:sameAs in linked data. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 145–160. Springer, Heidelberg (2010)
Doran, P., Tamma, V., Palmisano, I., Payne, T.: Efficient argumentation over ontology correspondences. In: International Conference on AAMAS (2009)
dos Santos, C., Euzenat, J.: Consistency-driven argumentation for alignment agreement. In: International Workshop on Ontology Matching (OM-2010) (2010)
Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. AI 77(2), 321–357 (1995)
Ferrara, A., Nikolov, A., Scharffe, F.: Data linking. J. Web Semant. 23 (2013)
Guéret, C., Groth, P., Stadler, C., Lehmann, J.: Assessing linked data mappings using network measures. In: Simperl, E., Cimiano, P., Polleres, A., Corcho, O., Presutti, V. (eds.) ESWC 2012. LNCS, vol. 7295, pp. 87–102. Springer, Heidelberg (2012)
Halpin, H., Hayes, P., Thompson, H.: When owl:sameAs isn’t the same: a preliminary theory of identity and inference on the semantic web. In: International Workshop LHD (2011)
Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space, 1st edn. Morgan & Claypool, Palo Alto (2011)
Laera, L., Blacoe, I., Tamma, V., Payne, T., Euzenat, J., Bench-Capon, T.: Argumentation over ontology correspondences in MAS. In: International Conference on AAMAS (2007)
Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D.: Inconsistency-tolerant semantics for description logics. In: Proceedings of International Conference on Web Reasoning Rule Systems (2010)
Lukasiewicz, T., Martinez, M.V., Simari, G.I.: Complexity of inconsistency-tolerant query answering in datalog\(+/-\). In: Meersman, R., Panetto, H., Dillon, T., Eder, J., Bellahsene, Z., Ritter, N., De Leenheer, P., Dou, D. (eds.) ODBASE 2013. LNCS, vol. 8185, pp. 488–500. Springer, Heidelberg (2013)
McCallum, A. (ed.): Cora Research Paper Dataset. http://people.cs.umass.edu/~mccallum/data.html
Papaleo, L., Pernelle, N., Saïs, F., Dumont, C.: Logical detection of invalid SameAs statements in RDF data. In: Janowicz, K., Schlobach, S., Lambrix, P., Hyvönen, E. (eds.) EKAW 2014. LNCS, vol. 8876, pp. 373–384. Springer, Heidelberg (2014)
Acknowledgments
The authors acknowledge the support of ANR grants ASPIQ (ANR-12-BS02-0003), QUALINCA (ANR-12-0012) and DURDUR (ANR-13-ALID-0002). The work of the second author has been carried out partly during a research delegation at INRA MISTEA Montpellier and INRA IATE CEPIA Axe 5 Montpellier.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Arioua, A., Croitoru, M., Papaleo, L., Pernelle, N., Rocher, S. (2016). On the Explanation of SameAs Statements Using Argumentation. In: Schockaert, S., Senellart, P. (eds) Scalable Uncertainty Management. SUM 2016. Lecture Notes in Computer Science(), vol 9858. Springer, Cham. https://doi.org/10.1007/978-3-319-45856-4_4
Print ISBN: 978-3-319-45855-7
Online ISBN: 978-3-319-45856-4