Abstract
Expert knowledge can often be represented using default rules of the form “if A then typically B”. In a probabilistic framework, such default rules can be seen as constraints on what should be derivable by MAP-inference. We exploit this idea for constructing a Markov logic network \(\mathcal {M}\) from a set of first-order default rules D, such that MAP inference from \(\mathcal {M}\) exactly corresponds to default reasoning from D, where we view first-order default rules as templates for the construction of propositional default rules. In particular, to construct appropriate Markov logic networks, we lift three standard methods for default reasoning. The resulting Markov logic networks could then be refined based on available training data. Our method thus offers a convenient way of using expert knowledge for constraining or guiding the process of learning Markov logic networks.
1 Introduction
Markov logic is a popular framework for statistical relational learning [20]. Formulas in Markov logic essentially correspond to weighted first-order formulas, which act as soft constraints on possible worlds. In current applications, the weights are typically learned from data, while the first-order formulas are either hand crafted or obtained using standard rule learning methods.
The fact that a domain expert could manually specify (some of) the formulas, or could inspect learned formulas, is an important strength of Markov logic. Unfortunately, the weights associated with these formulas do not have an easily interpretable meaning. This limits the potential of Markov logic, as it means that domain experts cannot offer much guidance on how the weights should be set (e.g. in applications with little or no training data) and cannot verify the quality of learned weights (e.g. in applications where the quality of the training data is in doubt). Often, however, Markov logic networks (MLNs) are not used for evaluating probabilities but for finding the most likely truth assignment of unobserved variables, given the available evidence, i.e. for maximum a posteriori (MAP) reasoning. In such cases, the precise values of the weights are only relevant inasmuch as they influence the result of MAP queries. In this setting, instead of asking a domain expert to specify weights, we can ask for constraints on how MAP reasoning should behave. For example, the expert could specify constraints such as “if all we know is that x is a bird, then using MAP reasoning we should be able to conclude that x can fly”, which is in agreement with the semantics of the default rule “birds can typically fly” in System P [2, 13].
Thus, a domain expert could be involved in the process of learning an MLN by providing a set of defaults, which are interpreted as constraints on the ranking of possible worlds induced by the MLN. Taking this idea one step further, in this paper, we show how a specific MLN can be constructed from the default rules provided by the expert. Constructing this MLN requires us to select a specific probability distribution that is compatible with the defaults. This selection problem is closely related to the problem of defining the closure of a set of defaults, which has been widely studied in the field of non-monotonic reasoning [9, 10, 14]. In particular, several proposals to define this closure are based on constructing a specific probability distribution [4, 10]. As we will show, it is possible to lift these approaches and thus obtain an efficient and principled way to construct MLNs that are compatible with a given set of defaults.
To date, the use of expert knowledge for guiding or even replacing weight learning has only received limited attention. One exception is [17], which constructs an MLN based on (potentially inconsistent) conditional probabilities provided by domain experts. While this can be useful in some applications, it relies on the ability of experts to provide meaningful probability estimates. However, humans are notoriously poor at judging likelihood. For example, properties that are common among the typical elements of a class of objects are often assumed to be likely in general [22]. Moreover, experts may be able to specify which properties are most likely to hold, in a given context, without being able to quantify their likelihood. In such situations, our default-rule-based approach would be more natural than approaches that force experts to estimate probabilities. On the other hand, our approach will only provide meaningful results for MAP queries: numerical input will be difficult to avoid if we want the constructed MLN to produce satisfactory conditional probability estimates.
This paper is structured as follows. The next section recalls some preliminaries from Markov logic and the non-monotonic reasoning literature. Then in Sect. 3 we show how three well-known approaches to non-monotonic reasoning can be implemented as MAP inference in a particular MLN. By lifting the constructions from Sect. 3, in Sect. 4 we show how MLNs can be constructed whose MAP-consequences are compatible with a given set of first-order default rules. Finally, Sect. 5 evaluates the performance of the resulting MLNs in a standard classification task.
2 Background
2.1 Markov Logic Networks
A Markov logic network (MLN) [20] is a set of weighted formulas \((F, w_F)\), where F is a classical first-order formula and \(w_F\) is a real number, intuitively reflecting a penalty that is applied to possible worlds that violate F. We will sometimes also use the notation \(w_F:\, F\) to denote the formula \((F,w_F)\). Given a set of constants C, an MLN \(\mathcal {M}\) induces the following probability distribution on the set of possible worlds \(\omega \):

$$\begin{aligned} p_{\mathcal {M}}(\omega ) = \frac{1}{Z} \exp \bigg (\sum _{(F,w_F) \in \mathcal {M}} w_F \cdot n_F(\omega )\bigg ) \end{aligned}$$
(1)

where \(n_F(\omega )\) is the number of true groundings of F in the possible world \(\omega \), and \(Z = \sum _{\omega } \exp \big (\sum _{(F,w_F) \in \mathcal {M}} w_F \cdot n_F(\omega )\big )\) is a normalization constant which ensures that \(p_{\mathcal {M}}\) is a probability distribution. Sometimes, formulas \((F,w_F)\) with \(w_F = +\infty \) are considered to represent hard constraints. In such cases, we define \(p_{\mathcal {M}}(\omega )=0\) for all possible worlds that do not satisfy all of the hard constraints, and only formulas with a real-valued weight are considered in (1) for the possible worlds that do.
The main inference task for MLNs which we will consider is full MAP inference. Given a set of ground literals (the evidence), MAP inference aims to compute the most probable configuration of all unobserved variables (the queries). Standard approaches for performing MAP inference include a strategy based on MaxWalkSAT [20] and a cutting plane based strategy [16, 21]. Given a set of ground formulas E, we write \(\max (\mathcal {M},E)\) for the set of most probable worlds of the MLN that satisfy E. We will also consider the following inference relation, initially proposed for penalty logic in [7]:

$$\begin{aligned} (\mathcal {M},E) \vdash _{\textit{MAP}} \alpha \quad \text {iff} \quad \omega \models \alpha \text { for every } \omega \in \max (\mathcal {M},E) \end{aligned}$$
(2)

with \(\mathcal {M}\) an MLN, \(\alpha \) a ground formula and E a set of ground formulas. Note that \((\mathcal {M},E) \vdash _{\textit{MAP}} \alpha \) means that the formula \(\alpha \) is satisfied in all the most probable worlds which are compatible with the available evidence.
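To make the \(\vdash _{\textit{MAP}}\) relation concrete, the following is a minimal brute-force sketch for the propositional (ground) case; it is not the MaxWalkSAT or cutting-plane strategy mentioned above, and the representation of formulas as Boolean functions on worlds is purely an assumption of this illustration.

```python
from itertools import product

def worlds(atoms):
    """Enumerate all truth assignments (possible worlds) over the atoms."""
    for bits in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def most_probable_worlds(soft, hard, atoms, evidence):
    """Most probable worlds of a ground MLN that satisfy the hard
    constraints and the evidence E.  soft is a list of (formula, weight)
    pairs; hard and evidence are lists of formulas; each formula is a
    Boolean function on worlds (an assumption of this sketch)."""
    best, best_score = [], None
    for w in worlds(atoms):
        if not all(f(w) for f in hard + evidence):
            continue
        # total weight of satisfied soft formulas; maximizing this is
        # equivalent to maximizing the probability of the world
        score = sum(wt for f, wt in soft if f(w))
        if best_score is None or score > best_score:
            best, best_score = [w], score
        elif score == best_score:
            best.append(w)
    return best

def map_entails(soft, hard, atoms, evidence, query):
    """(M, E) |-MAP alpha: alpha holds in every most probable world given E."""
    models = most_probable_worlds(soft, hard, atoms, evidence)
    return bool(models) and all(query(w) for w in models)
```

Enumerating all \(2^n\) worlds is of course only feasible for tiny ground theories; the practical solvers cited above avoid this.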
2.2 Reasoning About Default Rules in System P
A variety of approaches have been proposed to reason about default rules of the form “if \(\alpha \) then typically \(\beta \) holds”, which we will denote as \(\alpha {\,\mid \!\sim \,}\beta \). Most approaches are based on the idea of defining a preference order over possible worlds and insisting that \(\beta \) is true in the most preferred (i.e. the most normal) of the worlds in which \(\alpha \) is true [3, 9, 13, 18, 19]. The axioms of System P [13] capture a set of desirable properties for an inference relation for default rules:
- Reflexivity. \(\alpha {\,\mid \!\sim \,}\alpha \)
- Left logical equivalence. If \(\alpha \equiv \alpha '\) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha '{\,\mid \!\sim \,}\beta \)
- Right weakening. If \(\beta \models \beta '\) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha {\,\mid \!\sim \,}\beta '\)
- OR. If \(\alpha {\,\mid \!\sim \,}\gamma \) and \(\beta {\,\mid \!\sim \,}\gamma \) then \(\alpha \vee \beta {\,\mid \!\sim \,}\gamma \)
- Cautious monotonicity. If \(\alpha {\,\mid \!\sim \,}\beta \) and \(\alpha {\,\mid \!\sim \,}\gamma \) then \(\alpha \wedge \beta {\,\mid \!\sim \,}\gamma \)
- CUT. If \(\alpha \wedge \beta {\,\mid \!\sim \,}\gamma \) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha {\,\mid \!\sim \,}\gamma \)
where \(\alpha \equiv \alpha '\) and \(\beta \models \beta '\) refer to equivalence and entailment from classical logic. Note that applying the axioms of System P to a set of defaults \(\varDelta = \{\alpha _1 {\,\mid \!\sim \,}\beta _1,...,\alpha _n{\,\mid \!\sim \,}\beta _n\}\) corresponds to a form of monotonic reasoning about defaults. However, as the set of consequences that can be obtained in this way is limited, it is common to consider a non-monotonic inference relation whose set of consequences is closed under the axioms of System P as well as the following property:
- Rational monotonicity. If \(\alpha {\,\mid \!\sim \,}\beta \) and we cannot derive \(\alpha {\,\mid \!\sim \,}\lnot \gamma \) then \(\alpha \wedge \gamma {\,\mid \!\sim \,}\beta \).
In this paper, we will consider three such inference relations: the rational closure [19], the lexicographic closure and the maximum-entropy closure. A default \(\alpha {\,\mid \!\sim \,}\beta \) is tolerated by a set of defaults \(\gamma _1{\,\mid \!\sim \,}\delta _1,...,\gamma _m{\,\mid \!\sim \,}\delta _m\) if the classical formula \(\alpha \wedge \beta \wedge \bigwedge _i (\lnot \gamma _i \vee \delta _i)\) is consistent. The rational closure is based on a stratification \(\varDelta _1,...,\varDelta _k\) of \(\varDelta \), where each \(\varDelta _j\) contains all defaults \(\alpha {\,\mid \!\sim \,}\beta \) from \(\varDelta \setminus (\varDelta _{1}\cup ... \cup \varDelta _{j-1})\) which are tolerated by \(\varDelta \setminus (\varDelta _1\cup ... \cup \varDelta _{j-1})\). It can be shown that such a stratification always exists when \(\varDelta \) satisfies some natural consistency properties (see [19] for details). Intuitively, \(\varDelta _1\) contains the most general default rules, \(\varDelta _2\) contains exceptions to the rules in \(\varDelta _1\), \(\varDelta _3\) contains exceptions to the rules in \(\varDelta _2\), etc. This stratification is known as the Z-ordering. Let j be the smallest index for which \(\varDelta ^{\textit{rat}}_{\alpha } = \{\lnot \alpha \vee \beta \,|\, \alpha {\,\mid \!\sim \,}\beta \in \varDelta _j \cup ... \cup \varDelta _k \} \cup \{\alpha \}\) is consistent. Then \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \(\varDelta \) if \(\varDelta ^{\textit{rat}}_{\alpha }\models \beta \). When a set of hard rules \(\varGamma \) needs to be enforced, the Z-ordering can be generalized as follows [3]. Each set \(\varDelta _j\) then contains those defaults \(\alpha {\,\mid \!\sim \,}\beta \) for which \(\varGamma \cup \{\alpha \wedge \beta \} \cup \{\lnot \alpha _i\vee \beta _i :(\alpha _i{\,\mid \!\sim \,}\beta _i) \in \varDelta \setminus (\varDelta _1\cup ... \cup \varDelta _{j-1})\}\) is consistent. 
Finally, we define \(\varDelta _k = \varGamma \), where \(\varDelta _1, ..., \varDelta _{k-1}\) is the stratification of \(\varDelta \) that was obtained.
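The Z-ordering described above can be sketched as follows for the propositional case. Representing a default as a pair of Boolean functions on worlds, and checking consistency by brute-force enumeration, are assumptions of this illustration, not part of the original construction.

```python
from itertools import product

def consistent(formulas, atoms):
    """Brute-force propositional satisfiability over the given atoms."""
    for bits in product([False, True], repeat=len(atoms)):
        w = dict(zip(atoms, bits))
        if all(f(w) for f in formulas):
            return True
    return False

def z_ordering(defaults, atoms):
    """Stratify a set of defaults: each stratum collects the rules that
    are tolerated by the rules remaining at that step.  defaults is a
    list of (antecedent, consequent) pairs of Boolean functions."""
    remaining, strata = list(defaults), []
    while remaining:
        # material implications of all remaining rules
        material = [lambda w, a=a, b=b: (not a(w)) or b(w) for a, b in remaining]
        # a rule is tolerated if antecedent AND consequent AND all materials
        # are jointly satisfiable
        stratum = [(a, b) for a, b in remaining
                   if consistent([a, b] + material, atoms)]
        if not stratum:
            raise ValueError("inconsistent set of defaults")
        strata.append(stratum)
        remaining = [d for d in remaining if d not in stratum]
    return strata
```

On the birds example, this puts “birds fly” in \(\varDelta _1\) and the more specific “antarctic birds do not fly” in \(\varDelta _2\).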
The rational closure encodes the intuition that in case of conflict, specific rules should have priority over more generic ones. However, it requires us to ignore all the defaults in \(\varDelta _1\cup ... \cup \varDelta _{j-1}\), even defaults which are intuitively unrelated to this conflict. The lexicographic closure [1] addresses this issue as follows. For a propositional interpretation \(\omega \), we write \(\textit{sat}(\omega ,\varDelta _j)\) for the number of defaults from \(\varDelta _j\) satisfied by \(\omega \), i.e. \(\textit{sat}(\omega ,\varDelta _j) = |\{\alpha {\,\mid \!\sim \,}\beta :(\alpha {\,\mid \!\sim \,}\beta )\in \varDelta _j, \omega \models \lnot \alpha \vee \beta \}|\). We say that an interpretation \(\omega _1\) is lex-preferred over an interpretation \(\omega _2\), written \(\omega _1 \prec \omega _2\), if there exists a j such that \(\textit{sat}(\omega _1,\varDelta _j) > \textit{sat}(\omega _2,\varDelta _j)\) while \(\textit{sat}(\omega _1,\varDelta _i) = \textit{sat}(\omega _2,\varDelta _i)\) for all \(i>j\). The default \(\alpha {\,\mid \!\sim \,}\beta \) is in the lexicographic closure of \(\varDelta \) if \(\beta \) is satisfied in all the most lex-preferred models of \(\alpha \), i.e. \(\forall \omega \in \llbracket \alpha \rrbracket : (\omega \not \models \beta ) \Rightarrow \exists \omega '\in \llbracket \alpha \rrbracket : \omega ' \prec \omega \), where \(\llbracket \alpha \rrbracket \) is a shorthand for \(\{\omega :\omega \models \alpha \}\).
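The lex-preference order can be transcribed directly from this definition; as in the earlier sketches, defaults are assumed to be pairs of Boolean functions on worlds, which is an assumption of this illustration.

```python
def lex_preferred(w1, w2, strata):
    """True iff w1 is lex-preferred over w2: there is a stratum j where w1
    satisfies strictly more defaults, with ties on every more specific
    stratum i > j.  strata is a list of lists of (antecedent, consequent)
    pairs of Boolean functions, ordered from most general to most specific."""
    def sat(w, stratum):
        # number of defaults of the stratum whose material implication holds
        return sum(1 for a, b in stratum if (not a(w)) or b(w))
    # compare the most specific strata first; the first difference decides
    for j in range(len(strata) - 1, -1, -1):
        s1, s2 = sat(w1, strata[j]), sat(w2, strata[j])
        if s1 != s2:
            return s1 > s2
    return False  # the two worlds are lex-equivalent
```

For the birds example, a world where a non-antarctic bird flies is lex-preferred over one where it does not, since the two worlds tie on the stratum of exceptions.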
Another approach to default reasoning is based on the principle of maximum entropy [10]. To describe how the maximum-entropy ranking of possible worlds can be computed, we need some additional terminology. A possible world \(\omega \) is said to falsify a rule \(\alpha {\,\mid \!\sim \,}\beta \) if \(\omega \models \alpha \wedge \lnot \beta \) and said to verify it if \(\omega \models \alpha \wedge \beta \). A set of default rules \(\varDelta \) is said to be a minimal core if for any rule \(\alpha {\,\mid \!\sim \,}\beta \), the set \(\{ \alpha {\,\mid \!\sim \,}\lnot \beta \} \cup (\varDelta \setminus \{ \alpha {\,\mid \!\sim \,}\beta \}) \) is a consistent set of default rules, meaning that a Z-ordering of this set exists. Given a minimal core set of defaults \(\varDelta \), the maximum-entropy ranking is obtained as follows [10]. Let \(\varGamma \) be the set of rules tolerated by \(\varDelta \). For each rule \(r \in \varGamma \), we set \(\kappa _{ME}(r) = 1\). While \(\varGamma \ne \varDelta \) we repeat the following steps. Let \(\varOmega \) be the set of models \(\omega \) which do not falsify any of the rules in \(\varDelta \setminus \varGamma \) and verify at least one of these rules. For each model \(\omega \in \varOmega \), we compute \(\kappa _{ME}(\omega ) = \sum \{\kappa _{ME}(\alpha {\,\mid \!\sim \,}\beta ) :(\alpha {\,\mid \!\sim \,}\beta ) \in \varGamma , \omega \models \alpha \wedge \lnot \beta \}\). Let \(\omega ^*\) be the model in \(\varOmega \) with minimum rank. Each rule \(\alpha {\,\mid \!\sim \,}\beta \) that is verified by \(\omega ^*\) is added to \(\varGamma \), and its rank is computed as \(\kappa _{ME}(\alpha {\,\mid \!\sim \,}\beta ) = 1+\kappa _{ME}(\omega ^*)\).
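The iterative procedure above can be sketched as follows for the propositional case. The brute-force enumeration of worlds and the encoding of defaults as pairs of Boolean functions are assumptions of this illustration; the sketch also assumes the input is a minimal core, as the procedure requires.

```python
from itertools import product

def maxent_ranks(defaults, atoms):
    """kappa_ME weights for a minimal-core set of propositional defaults,
    following the iterative procedure described above.  defaults is a
    list of (antecedent, consequent) pairs of Boolean functions."""
    def worlds():
        for bits in product([False, True], repeat=len(atoms)):
            yield dict(zip(atoms, bits))

    def falsifies(w, d):
        return d[0](w) and not d[1](w)

    def verifies(w, d):
        return d[0](w) and d[1](w)

    # rules tolerated by the whole set Delta get rank 1
    def tolerated(d):
        return any(verifies(w, d) and not any(falsifies(w, e) for e in defaults)
                   for w in worlds())

    kappa = {i: 1 for i, d in enumerate(defaults) if tolerated(d)}
    while len(kappa) < len(defaults):
        out = [i for i in range(len(defaults)) if i not in kappa]
        best_rank, best_world = None, None
        # Omega: worlds that falsify no unranked rule and verify at least one
        for w in worlds():
            if any(falsifies(w, defaults[i]) for i in out):
                continue
            if not any(verifies(w, defaults[i]) for i in out):
                continue
            # rank of the world: sum of ranks of the ranked rules it falsifies
            rank = sum(k for i, k in kappa.items() if falsifies(w, defaults[i]))
            if best_rank is None or rank < best_rank:
                best_rank, best_world = rank, w
        # every unranked rule verified by the minimum-rank world enters Gamma
        for i in out:
            if verifies(best_world, defaults[i]):
                kappa[i] = 1 + best_rank
    return [kappa[i] for i in range(len(defaults))]
```

On the birds example this yields the ranks 1 for “birds fly” and 2 for “antarctic birds do not fly”, matching the weights used in Transformation 3 below.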
3 Encoding Ground Default Theories in Markov Logic
It is well-known [14] that any set of defaults \(\varDelta \) which is closed under the axioms of System P and rational monotonicity corresponds to a linear ranking \(\kappa \) of possible worlds, such that \(\alpha {\,\mid \!\sim \,}\beta \) iff \(\kappa (\alpha \wedge \beta ) > \kappa (\alpha \wedge \lnot \beta )\), where we write \(\kappa (\gamma )\) for a formula \(\gamma \) as an abbreviation for \(\max \{\kappa (\omega ) :\omega \models \gamma \}\). Since the ranking \(\kappa \) can be encoded as a probability distribution, and every probability distribution on possible worlds can be represented as an MLN, it is clear that there must exist an MLN \(\mathcal {M}\) such that \((\alpha {\,\mid \!\sim \,}\beta ) \in \varDelta \) iff \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \). More generally, for any (i.e. not necessarily closed) set of defaults \(\varDelta \), there exists an MLN \(\mathcal {M}\) such that \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \(\varDelta \), and similarly for the lexicographic and maximum-entropy closures. We now show how the MLNs corresponding to these three closures can be constructed.
Transformation 1
(Rational closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\varDelta _1,...,\varDelta _k\) be the Z-ordering of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k (\{ (\lnot a_i \vee \lnot \alpha \vee \beta , \infty ) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i \} \cup \{ (a_i, 1) \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}) \cup \bigcup _{i=2}^k \{ (a_i \vee \lnot a_{i-1}, \infty )\}\) where \(a_i\) are auxiliary literals. Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \((\varDelta ,\varTheta )\).
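Transformation 1 can be sketched as a simple MLN-construction routine. The string-based clause syntax (`v` for disjunction, `!` for negation) and the names `a1, a2, ...` for the auxiliary atoms are assumptions of this illustration.

```python
def rational_closure_mln(strata, hard):
    """Build the MLN of Transformation 1 from a Z-ordering.
    strata: list of strata, each a list of (antecedent, consequent)
    formula strings; hard: list of hard clause strings.
    Returns a list of (formula, weight) pairs, float('inf') = hard."""
    INF = float("inf")
    mln = [(phi, INF) for phi in hard]
    for i, stratum in enumerate(strata, start=1):
        a = f"a{i}"
        for ant, cons in stratum:
            # hard clause: the auxiliary atom a_i forces the material rule
            mln.append((f"!{a} v !({ant}) v {cons}", INF))
        # soft unit clause rewarding worlds in which a_i holds
        mln.append((a, 1.0))
        if i >= 2:
            # chain constraint a_i v !a_{i-1}
            mln.append((f"{a} v !a{i-1}", INF))
    return mln
```

Applied to the Z-ordering of the birds example, this reproduces the five formulas of \(\mathcal {M}_1\) in Example 1 below.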
Transformation 2
(Lexicographic closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\varDelta _1,...,\varDelta _k\) be the Z-ordering of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k \{ (\lnot \alpha \vee \beta , \lambda _i) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\) where \(\lambda _i = 1+\sum _{j=1}^{i-1} |\varDelta _j| \cdot \lambda _j\) for \(i > 1\) and \(\lambda _1 = 1\). Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the lexicographic closure of \((\varDelta ,\varTheta )\).
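The weights \(\lambda _i\) of Transformation 2 are a direct recurrence on the stratum sizes and can be computed as follows.

```python
def lex_weights(stratum_sizes):
    """lambda_1 = 1 and lambda_i = 1 + sum_{j<i} |Delta_j| * lambda_j,
    so that violating a single rule of stratum i costs more than
    violating all rules of the lower strata together."""
    lambdas = []
    for i in range(len(stratum_sizes)):
        lambdas.append(1 + sum(stratum_sizes[j] * lambdas[j] for j in range(i)))
    return lambdas
```

For the birds example, whose strata each contain one rule, `lex_weights([1, 1])` gives the weights 1 and 2 used for \(\mathcal {M}_2\) in Example 1 below.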
Transformation 3
(Maximum-entropy closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\kappa \) be the rule weights corresponding to the maximum-entropy closure of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\{ (\lnot \alpha \vee \beta , \kappa (\alpha {\,\mid \!\sim \,}\beta )) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\). Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the maximum-entropy closure of \((\varDelta ,\varTheta )\).
Example 1
Consider the default rules \(\varDelta = \{\textit{bird} {\,\mid \!\sim \,}\textit{flies}, \textit{antarctic}\wedge \textit{bird} {\,\mid \!\sim \,}\) \( \lnot \textit{flies}\}\). Then \(\mathcal {M}_1\) \(=\) \(\{ (\lnot a_1\) \(\vee \) \(\lnot \textit{bird}\) \(\vee \) \(\textit{flies}, \infty )\), \((\lnot a_2\) \(\vee \) \(\lnot \textit{antarctic}\) \(\vee \) \(\lnot \textit{bird}\) \(\vee \) \(\lnot \textit{flies}\), \(\infty )\), \( (a_1, 1),\) \((a_2,1),\) \((a_2 \vee \lnot a_1, \infty ) \}\) is the result of Transformation 1, and \(\mathcal {M}_2 = \{ (\lnot \textit{bird} \vee \textit{flies}, 1), (\lnot \textit{antarctic} \vee \lnot \textit{bird} \vee \lnot \textit{flies}, 2) \}\) is the result of Transformation 2, which in this example coincides with the result of Transformation 3.
4 Encoding Non-ground Default Theories in Markov Logic
While reasoning with default rules has mostly been studied at the propositional level, a few authors have considered first-order default rules [8, 12]. As with probabilistic first-order rules [11], two rather different semantics for first-order defaults can be considered. On the one hand, a default such as \(P(x) {\,\mid \!\sim \,}Q(x)\) could mean that the most typical objects that have the property P also have the property Q. On the other hand, this default could also mean that whenever P(x) holds for a given x, in the most normal worlds Q(x) will also hold. In other words, first-order defaults can either model typicality of objects or normality of worlds [8]. In this paper, we will consider the latter interpretation. Given that we only consider finite universes (as is usual in the context of MLNs), we can then see a first-order default as a template for propositional defaults. For example, \(P(x) {\,\mid \!\sim \,}Q(x)\) can be seen as a compact notation for the set of defaults \(\{P(c_1) {\,\mid \!\sim \,}Q(c_1),...,P(c_n){\,\mid \!\sim \,}Q(c_n)\}\). Note that this approach would not be possible for first-order defaults that model the typicality of objects.
In particular, the first-order default theories we will consider consist of first-order logic formulas (hard rules) and default rules of the form \(\alpha {\,\mid \!\sim \,}\beta \), where \(\alpha \) is a conjunction of literals and \(\beta \) is a disjunction of literals. Our approach can be straightforwardly extended to quantified default rules, where the scope of a quantifier may be the whole default rule rather than just the antecedent or the consequent of a rule. While this could be of interest, we do not consider it here for ease of presentation.
Definition 1
(Markov logic model of a first-order default theory). Let \((\varDelta ,\varTheta )\) be a first-order default theory with \(\varDelta \) the set of default rules and \(\varTheta \) the set of hard rules. A Markov logic network \(\mathcal {M}\) is a model of the default logic theory \(\varDelta \cup \varTheta \) if it holds that: (i) \(P\left[ X = \omega \right] = 0\) whenever \(\omega \not \models \varTheta \), and (ii) for any default rule \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta \) and any grounding substitution \(\theta \) of the unquantified variables of \(\alpha {\,\mid \!\sim \,}\beta \), either \(\{ \alpha \theta \} \cup \varTheta \vdash \bot \) or \((\mathcal {M},\alpha \theta ) \vdash _{\textit{MAP}}\beta \theta .\) We say that \((\varDelta ,\varTheta )\) is satisfiable if it has at least one model.
Below we will describe three methods for constructing Markov logic models of first-order default theories, generalizing Transformations 1–3. For convenience, we will use typed formulas (for simplicity, however, we assume that the default rules given as input are untyped). For instance, when we have the formula \(\alpha = \textit{owns}(\textit{person}:X,\textit{thing}:Y)\), the set of constants of the type person is \(\{\textit{alice}, \textit{bob} \}\) and the set of constants of the type thing is \(\{ \textit{car} \}\), then \(\alpha \) corresponds to the ground formulas \(\textit{owns}(alice,car)\) and \(\textit{owns}(bob,car)\). In cases where there is only one type, we will not write it explicitly. For a constant or variable x, we write \(\tau (x)\) to denote its type. Two formulas \(F_1\) and \(F_2\) (either both conjunctions or both disjunctions of literals) are said to be isomorphic when there is an injective substitution \(\theta \) of the variables of \(F_1\) such that \(F_1\theta \equiv F_2\) (where \(\equiv \) denotes logical equivalence). Two default rules \(D_1 = \alpha _1 {\,\mid \!\sim \,}\beta _1\) and \(D_2 = \alpha _2 {\,\mid \!\sim \,}\beta _2\) are said to be isomorphic, denoted \(D_1 \approx D_2\), if there exists a substitution \(\theta \) of the variables of \(D_1\) such that \(\alpha _1\theta \equiv \alpha _2\) and \(\beta _1\theta \equiv \beta _2\). Two default theories \(\varDelta _1 \cup \varTheta _1\) and \(\varDelta _2 \cup \varTheta _2\) are said to be isomorphic, denoted by \(\varDelta _1 \cup \varTheta _1 \approx \varDelta _2 \cup \varTheta _2\), if there is a bijection i from elements of \(\varDelta _1 \cup \varTheta _1\) to elements of \(\varDelta _2 \cup \varTheta _2\) such that for any \(F \in \varDelta _1 \cup \varTheta _1\), \(i(F) \approx F\). 
When j is a permutation of a subset of constants from \(\varDelta \cup \varTheta \) then \(j(\varDelta \cup \varTheta )\) denotes the default theory obtained by replacing any constant c from the subset by its image j(c).
Definition 2
(Interchangeable constants). Let \(\varDelta \cup \varTheta \) be a non-ground default theory. A set of constants \(\mathcal {C}\) is said to be interchangeable in \(\varDelta \cup \varTheta \) if \(j(\varDelta \cup \varTheta ) \approx \varDelta \cup \varTheta \) for any permutation j of the constants in \(\mathcal {C}\).
The set of maximal interchangeable subsets of a set of constants is the uniquely defined partition of this set and will be called the interchangeable partition. To check whether a set of constants \(\mathcal {C}\) is interchangeable, it is sufficient to check that \(j(\varDelta \cup \varTheta ) \approx \varDelta \cup \varTheta \) for those permutations which swap just two constants from \(\mathcal {C}\). Note that the constants do not actually need to appear in \(\varDelta \cup \varTheta \). It trivially holds that constants which do not appear in \(\varDelta \cup \varTheta \) are interchangeable. When \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) is the interchangeable partition of a set of constants then we may introduce a new type \(\textit{type}_{\textit{lexmin}{(\mathcal {C}_i)}}\) for every \(\mathcal {C}_i \in \mathcal {I}\) (where \(\textit{lexmin}{(\mathcal {C})}\) denotes the lexically smallest element from \(\mathcal {C}\)). When \(D = \alpha {\,\mid \!\sim \,}\beta \) is a ground default rule, we write \(\textit{variabilize}(D)\) to denote the following default rule: \(\bigwedge \{V_c \ne V_d :c,d\in \textit{const}(D), \tau (c)=\tau (d)\} \wedge \alpha ' {\,\mid \!\sim \,}\beta '\) where \(\textit{const}(D)\) is the set of constants appearing in D and \(\alpha '\) and \(\beta '\) are obtained from \(\alpha \) and \(\beta \) by respectively replacing all constants c by a new variable \(V_c\) of type \(\tau (c)\). Here \(\ne \) is treated as a binary predicate which is defined in the set of hard rules \(\varTheta \).
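The \(\textit{variabilize}\) operation can be sketched as follows. The literal representation (predicate, argument tuple, polarity), the map from constants to their types, and the fresh-variable naming scheme `V_c` are assumptions of this illustration.

```python
def variabilize(antecedent_lits, consequent_lits, const_type):
    """Sketch of variabilize(D): replace every constant c by a fresh
    variable V_c of type tau(c), and add an inequality V_c != V_d for
    every pair of distinct same-typed constants.  Literals are
    (predicate, args, positive) triples; const_type maps constants to
    their types (anything not in const_type is treated as a variable)."""
    consts = sorted({a for _, args, _ in antecedent_lits + consequent_lits
                     for a in args if a in const_type})
    var = {c: "V_" + c for c in consts}

    def rename(lits):
        return [(p, tuple(var.get(a, a) for a in args), pos)
                for p, args, pos in lits]

    # inequality atoms for same-typed constant pairs, added to the antecedent
    ineqs = [("!=", (var[c], var[d]), True)
             for i, c in enumerate(consts) for d in consts[i + 1:]
             if const_type[c] == const_type[d]]
    return rename(antecedent_lits) + ineqs, rename(consequent_lits)
```

As in the text, the `!=` atoms are understood as a binary predicate axiomatized in the hard rules \(\varTheta \).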
Let \(\mathcal {C}\) be a set of constants and let \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) be the interchangeable partition of the constants from \(\mathcal {C}\). Two ground default rules \(\alpha _1 {\,\mid \!\sim \,}\beta _1\) and \(\alpha _2 {\,\mid \!\sim \,}\beta _2\) are said to be weakly isomorphic w.r.t. \(\mathcal {I}\) if \(\textit{variabilize}(\alpha _1 {\,\mid \!\sim \,}\beta _1)\) and \(\textit{variabilize}(\alpha _2 {\,\mid \!\sim \,}\beta _2)\) are isomorphic.
Definition 3
(Ground representatives). Let \(D = \alpha {\,\mid \!\sim \,}\beta \) be a default rule and let \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) be the interchangeable partition of constants. A set of ground representatives of D w.r.t. \(\mathcal {I}\) is a maximal set of groundings of D by constants from \(\bigcup _{\mathcal {C}_i \in \mathcal {I}}\mathcal {C}_i\) such that no two of these groundings are weakly isomorphic w.r.t. \(\mathcal {I}\). (If \(\alpha {\,\mid \!\sim \,}\beta \) is typed then we only consider groundings which respect the typing of variables.)
A set of ground representatives of a default rule \(D = \alpha {\,\mid \!\sim \,}\beta \) can be constructed in time \(O(|\mathcal {I}|^{|D|})\). While this is exponential in the size of the default rule (which is usually small), it is only polynomial in the number of classes in the interchangeable partition \(\mathcal {I}\) and does not depend on the total number of constants.
Let \(\varDelta \cup \varTheta \) be a first-order default theory and \(\mathcal {C}\) a set of constants. Let \(\mathcal {R} = \bigcup _{\alpha {\,\mid \!\sim \,}\beta \in \varDelta } \mathcal {R}_{\alpha {\,\mid \!\sim \,}\beta }\) where \(\mathcal {R}_{\alpha {\,\mid \!\sim \,}\beta }\) denotes a set of ground representatives of \(\alpha {\,\mid \!\sim \,}\beta \). The rational closure for the first-order default theory is based on the partition \(\varDelta _1^* \cup ... \cup \varDelta _k^*\) of the set

$$\begin{aligned} \varDelta ^* = \{ \textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta ) :(\alpha {\,\mid \!\sim \,}\beta ) \in \mathcal {R} \} \end{aligned}$$
(3)
where \(\varDelta ^*_j\) is the set of default rules \(\textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta ) \in \varDelta ^* \setminus (\varDelta _1^* \cup \dots \cup \varDelta _{j-1}^*)\) such that

$$\begin{aligned} \varTheta \cup \{ \alpha \wedge \beta \} \cup \{ \lnot \alpha ' \vee \beta ' :(\alpha ' {\,\mid \!\sim \,}\beta ') \in \varDelta ^* \setminus (\varDelta _1^* \cup \dots \cup \varDelta _{j-1}^*) \} \end{aligned}$$
has a model with Herbrand universe \(\mathcal {C}\). When all rules \(\alpha ' {\,\mid \!\sim \,}\beta '\) from the set

$$\begin{aligned} \varDelta ^*_{\alpha {\,\mid \!\sim \,}\beta } = \{ \textit{variabilize}(\alpha ' {\,\mid \!\sim \,}\beta ') :(\alpha ' {\,\mid \!\sim \,}\beta ') \in \mathcal {R}_{\alpha {\,\mid \!\sim \,}\beta } \} \end{aligned}$$
are contained in the same partition class \(\varDelta _j^*\) then we can simplify \(\varDelta _j^*\) by setting \(\varDelta _j^* := \varDelta _j^* \cup \{ \alpha {\,\mid \!\sim \,}\beta \} \setminus \varDelta ^*_{\alpha {\,\mid \!\sim \,}\beta }\). Furthermore, note that checking the existence of Herbrand models can be carried out using cutting-plane inference, which means that it is seldom necessary to ground the set of default rules completely. We can now present the lifted counterparts to Transformations 1–3.
Transformation 4
(Lifted rational closure). Let \(\varDelta \) be a set of default rules and let \(\varTheta \) be a set of hard constraints. Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \), defined by (3). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k (\{ (\lnot a_i \vee \lnot \alpha \vee \beta , \infty ) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i^* \} \cup \{ (a_i, 1) \}) \cup \{ (\phi , \infty ) :\phi \in \varTheta \} \cup \bigcup _{i=2}^k \{ (a_i \vee \lnot a_{i-1}, \infty ) \}\) where \(a_i\) are auxiliary (ground) literals. If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).
Transformation 5
(Lifted lexicographic entailment). Let \(\varDelta \) be a set of default rules, let \(\varTheta \) be a set of hard constraints, and let \(\mathcal {U}\) be the considered set of constants. Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \), defined by (3). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k \{ (\lnot \alpha \vee \beta , \lambda _i) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i^* \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\) where \(\lambda _i = 1+\sum _{j=1}^{i-1} \sum _{\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^*_j} |\mathcal {U}|^{|\textit{vars}(\alpha {\,\mid \!\sim \,}\beta )|} \cdot \lambda _j\) for \(i > 1\) and \(\lambda _1 = 1\). If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).
Note that lexicographic entailment may lead to MLNs with very large weights.
Next, we describe a lifted variant of maximum-entropy entailment. Let \(\varDelta \cup \varTheta \) be a first-order default theory and \(\mathcal {I}\) the interchangeable partition of constants from a given set \(\mathcal {C}\). Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \), defined as in (3) (without the simplification of merging default rules), and let \(\varGamma := \varDelta ^*_1\). First, we construct an MLN \(\mathcal {M}\) containing the rules from \(\varGamma \) and set their weights equal to 1. For every \(\varDelta ^*_j\) with \(j > 1\), while \(\varDelta _j^* \not \subseteq \varGamma \), we repeat the following steps. We construct a new MLN \(\mathcal {M}'\) by adding to the MLN \(\mathcal {M}\) all rules from the set \(\{\lnot \alpha \vee \beta :\alpha {\,\mid \!\sim \,}\beta \in (\varDelta ^*_j \cup \dots \cup \varDelta _k^*)\setminus \varGamma \}\) as hard constraints (i.e. with infinite weights). For every \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta _j^* \setminus \varGamma \), we construct its ground representative \(\alpha ' {\,\mid \!\sim \,}\beta '\) (note that there is only one ground representative up to isomorphism for any rule in \(\varDelta _j^*\), which follows from the construction of \(\varDelta _j^*\)) and we find a most probable world \(\omega _{\alpha {\,\mid \!\sim \,}\beta }\) of \((\mathcal {M}',\alpha ')\); let us write \(p_{\alpha {\,\mid \!\sim \,}\beta }\) for its penalty, i.e. the sum of the weights of the violated rules. Note that \(\omega _{\alpha {\,\mid \!\sim \,}\beta }\) verifies the default rule \(\alpha {\,\mid \!\sim \,}\beta \) and only falsifies rules in \(\varGamma \), exactly as in the propositional version of the maximum-entropy transformation. 
We then select the rules \(\alpha {\,\mid \!\sim \,}\beta \) with minimum penalty \(p_{\alpha {\,\mid \!\sim \,}\beta }\), add them to \(\varGamma \) and to the MLN \(\mathcal {M}\) with the weight set to \(1+p_{\alpha {\,\mid \!\sim \,}\beta }\). If \(\mathcal {M}'\) does not have any models, the initial set of defaults cannot be satisfiable, and we end the procedure.
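The iterative construction just described can be summarized in code. This is a sketch under assumptions: `strata` holds \(\varDelta _1^*, \dots , \varDelta _k^*\) as lists of rules, and `map_penalty` is an assumed oracle returning the penalty of a most probable world of \((\mathcal {M}',\alpha ')\) for a rule's ground representative; a real implementation would call a MAP solver here.

```python
def max_ent_weights(strata, map_penalty):
    """Lifted maximum-entropy weight construction (sketch).

    map_penalty(mln, hard_rules, rule) -> penalty of a most probable world
    of (M', alpha'), where M' extends mln with hard_rules as hard
    constraints (assumed oracle, e.g. a call to a MAP solver)."""
    mln = {rule: 1 for rule in strata[0]}  # Gamma := Delta*_1 with weight 1
    processed = set(strata[0])
    for j in range(1, len(strata)):
        while not set(strata[j]) <= processed:
            # rules not yet in Gamma enter M' as hard constraints
            hard = [r for s in strata[j:] for r in s if r not in processed]
            penalties = {r: map_penalty(mln, hard, r)
                         for r in strata[j] if r not in processed}
            p_min = min(penalties.values())
            for r, p in penalties.items():
                if p == p_min:  # minimum-penalty rules get weight 1 + penalty
                    mln[r] = 1 + p
                    processed.add(r)
    return mln
```

For instance, a toy oracle that assigns penalties 1 and 2 to the two rules of a second stratum yields the weights \(\{1, 2, 3\}\), the same pattern as the maximum-entropy weights in Example 2 below.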
Transformation 6
(Lifted maximum-entropy entailment). Let \(\varDelta \) be a set of default rules, let \(\varTheta \) be a set of hard constraints, and let \(\mathcal {U}\) be the considered set of constants. Let \(\mathcal {M}\) be the MLN obtained in the last iteration of the procedure described above. If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).
Example 2
Let us consider the following defaults:
Let the set of constants be given by \(\mathcal {C}=\{ \textit{tweety},\textit{donald},\textit{beeper}\}\). Then the lexicographic transformation yields the MLN \(\{(\phi _1, 1),(\phi _2 , 4),(\phi _3, 4)\}\) while the maximum-entropy transformation yields \(\{(\phi _1, 1),(\phi _2, 2),(\phi _3, 3) \}\), where \(\phi _1 =\lnot \textit{bird}(X) \vee \textit{flies}(X)\), \(\phi _2 = \lnot \textit{bird}(X)\vee \lnot \textit{sameSpecies}(X, Y)\vee \lnot (X \ne Y)\vee \lnot \textit{bird}(Y)\vee \lnot \textit{antarctic}(X)\vee \textit{antarctic}(Y)\) and \(\phi _3 = \lnot \textit{bird}(X) \vee \lnot \textit{antarctic}(X) \vee \lnot \textit{flies}(X)\).
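As a sanity check of the lexicographic weights: the first stratum contains only the rule underlying \(\phi _1\), which has a single variable, so the weight formula of the lifted lexicographic transformation gives \(\lambda _2 = 1 + |\mathcal {U}|^{1} \cdot \lambda _1 = 1 + 3 \cdot 1 = 4\) for the second stratum, matching the weights of \(\phi _2\) and \(\phi _3\). The maximum-entropy weights are instead of the form \(1 + p_{\alpha {\,\mid \!\sim \,}\beta }\), where \(p_{\alpha {\,\mid \!\sim \,}\beta }\) is the penalty of the corresponding verifying world.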
As the next example illustrates, it is sometimes necessary to split the initial default rules into several typed specializations.
Example 3
Consider the following defaults: \(bird(X) \wedge (X \ne \textit{tweety}) {\,\mid \!\sim \,}\textit{flies}(X)\), \(\textit{bird}(X) \wedge \textit{antarctic}(X) {\,\mid \!\sim \,}\lnot \textit{flies}(X)\) and \(\textit{bird}(X) \wedge \textit{antarctic}(X) \wedge (X \ne Y) \wedge \textit{sameSpecies}(X,Y) {\,\mid \!\sim \,}\textit{antarctic}(Y)\). Then the lexicographic transformation yields the MLN \(\{(\phi _1, 1), (\phi _2, 1), (\phi _3, 7), (\phi _4, 7)\}\), where:
Note that the transformation had to introduce new types corresponding to the interchangeable sets of constants \(\{\{ \textit{tweety} \}, \{ \textit{beeper}, \textit{donald} \} \}\). The rule \(\phi _4\) was created by merging rules with different typing, which was made possible by the fact that all the respective differently typed rules ended up with the same weights. The maximum entropy transformation leads to six such rules.
5 Evaluation
In this section, we describe the experimental evaluation of the methods presented in this paper. The implementation of the described methods is available for download.Footnote 5
We have evaluated the proposed methods using the well-known UW-CSE dataset, which describes the Department of Computer Science and Engineering at the University of Washington [20]. The usual task is to predict the \(\textit{advisedBy}(\textit{person},\) \(\textit{person})\) predicate from the other predicates. A set of rules for this domain has previously been collected for the experiments in [20]. These rules, however, cannot be used as default rules because they are not satisfiable in the sense of Definition 1. Therefore, in order to evaluate our method, we have used the following consistent set of default rules.
Recall that default rules \(\alpha {\,\mid \!\sim \,}\beta \) in our setting correspond to statements of the form: for any grounding substitution \(\theta \), \(\beta \theta \) is true in all most probable worlds of \((\mathcal {M},\alpha \theta )\). Thus the default rules \(\alpha {\,\mid \!\sim \,}\beta \) we consider should be such that an expert believes that, if \(\alpha \theta \) were the only evidence, it would make sense to conclude \(\beta \theta \). With this perspective in mind, rule \(D_1\) states that, in the absence of any knowledge, we assume that persons S and P are not in the \(\textit{advisedBy}\) relationship. Rule \(D_2\) states that if we only know that S has an advisor, then we conclude that S does not have a temporary advisor. Rule \(D_3\) states that advisors are typically co-authors of their students' papers. Rule \(D_4\) states that students typically have only one advisor. The remaining rules can be interpreted similarly. Note that rules \(D_7\) and \(D_9\) encode exceptions to rules \(D_6\) and \(D_8\).
We computed the lexicographic and maximum-entropy transformations of these rules using our implementation of the described methods.Footnote 6 We evaluated the obtained MLNs on the five different subject areas of the UW-CSE dataset, which is the standard methodology. Specifically, we computed the average number of true positives and false positivesFootnote 7 for the advisedBy predicate over 10 runs of MAP inference, noting that the results can depend on the specific MAP state that is returned. For comparison, we used an MLN with the same setFootnote 8 of rules but with weights learned discriminatively using Tuffy [15] (LEARNED), and an MLN with the same set of rules but with all weights set to 1 (ONES). The results are shown in Table 1. The maximum-entropy and lexicographic entailment methods have the highest recall, but at the cost of a higher number of false positives. Note that the number of pairs which can potentially be in the \(\textit{advisedBy}\) relationship is on the order of hundreds or even thousands, while the true number of pairs of people in this relationship is on the order of tens. The baseline method ONES has the largest variance.
6 Conclusion
We have discussed the problem of constructing a Markov logic network (MLN) from a set of first-order default rules, where default rules are seen as constraints on what should be derivable using MAP inference. The proposed construction methods have been obtained by lifting three well-known methods for non-monotonic reasoning about propositional default rules: the rational closure, the lexicographic closure and the maximum-entropy closure. As our evaluation with the UW-CSE dataset illustrates, our method can be used to construct useful MLNs in scenarios where no training data is available. In the future, we would like to explore the connections between our proposed lifted transformations and the lifted inference literature. For example, identifying interchangeable constants is known as shattering in lifted inference [6].
Notes
- 1.
Here, we are just ordering the constants by the lexical ordering of their names.
- 2.
We will omit “w.r.t. \(\mathcal {I}\)” when it is clear from the context.
- 3.
With a slight abuse of terminology, we will call \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) the partition of \(\varDelta \cup \varTheta \) even though it is strictly speaking only a partition of \(\varDelta ^*\).
- 4.
Due to numerical issues, existing MLN systems are not able to work with weights as large as those that are sometimes produced. We have therefore implemented an MLN system based on cutting-plane MAP inference which can work with arbitrarily large weights.
- 5.
- 6.
- 7.
Note that using AUC as an evaluation metric would not make sense in this case because of the way the MLNs are constructed by our approach. The construction can produce MLNs which make sensible predictions when used together with MAP inference, but which need not be meaningful as probability distributions for the given datasets. After all, our MLN construction methods do not assume any information from which the probabilities could be inferred, except qualitative information on rankings of possible worlds expressed by default rules.
- 8.
We had to remove \(D_3\) for efficiency reasons, though.
- 9.
For brevity, we omit hard rules here; the generalizations of the proofs to hard rules are straightforward but rather verbose.
- 10.
Recall that the formulas in \(\varDelta ^*\) are typed according to interchangeability of constants. The groundings must respect the typing information. This will be the case whenever we speak of groundings in this section.
- 11.
However, this does not mean that we need to ground this theory completely in order to solve it, e.g. by using cutting plane inference we can avoid the need to ground it completely.
References
Benferhat, S., Bonnefon, J.F., da Silva Neves, R.: An overview of possibilistic handling of default reasoning, with experimental studies. Synthese 146(1–2), 53–70 (2005)
Benferhat, S., Dubois, D., Prade, H.: Nonmonotonic reasoning, conditional objects and possibility theory. Artif. Intell. 92(1–2), 259–276 (1997)
Benferhat, S., Dubois, D., Prade, H.: Practical handling of exception-tainted rules and independence information in possibilistic logic. Appl. Intell. 9(2), 101–127 (1998)
Benferhat, S., Dubois, D., Prade, H.: Possibilistic and standard probabilistic semantics of conditional knowledge bases. J. Log. Comput. 9(6), 873–895 (1999)
Le Berre, D., Parrain, A.: The Sat4j library, release 2.2. J. Satisfiability, Boolean Model. Comput. 7, 50–64 (2010)
de Salvo Braz, R., Amir, E., Roth, D.: Lifted first-order probabilistic inference. In: Kaelbling, L.P., Saffiotti, A. (eds.) Proceedings of the 19th International Joint Conference on Artificial Intelligence, p. 1319 (2005)
de Saint-Cyr, F.D., Lang, J., Schiex, T.: Penalty logic and its link with Dempster-Shafer theory. In: Proceedings of the 10th International Conference on Uncertainty in Artificial Intelligence, pp. 204–211 (1994)
Friedman, N., Halpern, J.Y., Koller, D.: First-order conditional logic for default reasoning revisited. ACM Trans. Comput. Log. 1(2), 175–207 (2000)
Geffner, H., Pearl, J.: Conditional entailment: bridging two approaches to default reasoning. Artif. Intell. 53(2), 209–244 (1992)
Goldszmidt, M., Morris, P., Pearl, J.: A maximum entropy approach to nonmonotonic reasoning. IEEE Trans. Pattern Anal. Mach. Intell. 15(3), 220–232 (1993)
Halpern, J.Y.: An analysis of first-order logics of probability. Artif. Intell. 46(3), 311–350 (1990)
Kern-Isberner, G., Thimm, M.: A ranking semantics for first-order conditionals. In: Proceedings of the 20th European Conference on Artificial Intelligence, pp. 456–461 (2012)
Kraus, S., Lehmann, D., Magidor, M.: Nonmonotonic reasoning, preferential models and cumulative logics. Artif. Intell. 44(1–2), 167–207 (1990)
Lehmann, D., Magidor, M.: What does a conditional knowledge base entail? Artif. Intell. 55(1), 1–60 (1992)
Niu, F., Ré, C., Doan, A., Shavlik, J.W.: Tuffy: scaling up statistical inference in Markov logic networks using an RDBMS. PVLDB 4(6), 373–384 (2011)
Noessner, J., Niepert, M., Stuckenschmidt, H.: RockIt: exploiting parallelism and symmetry for MAP inference in statistical relational models. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence (2013)
Pápai, T., Ghosh, S., Kautz, H.: Combining subjective probabilities and data in training Markov logic networks. In: Flach, P.A., De Bie, T., Cristianini, N. (eds.) ECML PKDD 2012, Part I. LNCS, vol. 7523, pp. 90–105. Springer, Heidelberg (2012)
Pearl, J.: Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, Burlington (1988)
Pearl, J.: System Z: a natural ordering of defaults with tractable applications to nonmonotonic reasoning. In: Proceedings of the 3rd Conference on Theoretical Aspects of Reasoning about Knowledge, pp. 121–135 (1990)
Richardson, M., Domingos, P.: Markov logic networks. Mach. Learn. 62(1–2), 107–136 (2006)
Riedel, S.: Improving the accuracy and efficiency of MAP inference for Markov logic. In: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, pp. 468–475 (2008)
Tversky, A., Kahneman, D.: Extensional versus intuitive reasoning: the conjunction fallacy in probability judgment. Psychol. Rev. 90(4), 293 (1983)
Acknowledgement
We thank the anonymous reviewers for their detailed comments. This work has been supported by a grant from the Leverhulme Trust (RPG-2014-164).
A Proofs
Here we provide formal justifications for the transformations presented in this paperFootnote 9. We start by proving correctness of the ground transformations.
Proposition 1
Let \(\varDelta \) be a set of default rules. Let \(\varDelta ^{rat}\), \(\varDelta ^{lex}\) and \(\varDelta ^{ent}\) be the rational, lexicographic and maximum entropy closure of \(\varDelta \), respectively. Let \(\mathcal {M}^{rat}\), \(\mathcal {M}^{lex}\) and \(\mathcal {M}^{ent}\) be Markov logic networks obtained from \(\varDelta \) by Transformations 1, 2 and 3, respectively. Then the following holds for any default rule \(\alpha {\,\mid \!\sim \,}\beta \):
-
1.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{rat}\) if and only if \((\mathcal {M}^{rat}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \),
-
2.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{lex}\) if and only if \((\mathcal {M}^{lex}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \),
-
3.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{ent}\) if and only if \((\mathcal {M}^{ent}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \).
Proof
Throughout the proof, let \(\varDelta = \varDelta _1 \cup \varDelta _2 \cup \dots \cup \varDelta _k\) be the Z-ordering of \(\varDelta \).
1. Let \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{rat}\) be a default rule. Let j be the smallest index such that \(\varDelta _{\alpha }^{rat} = \{\lnot \gamma \vee \delta :\gamma {\,\mid \!\sim \,}\delta \in \varDelta _j \cup \dots \cup \varDelta _k \} \cup \{ \alpha \}\) is consistent. Recall that \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{rat}\) if and only if \(\varDelta _\alpha ^{rat} \models \beta \). By the construction of the MLN \(\mathcal {M}^{rat}\) it must hold that \((\mathcal {M}^{rat},\{ \alpha \}) \vdash _{\textit{MAP}}\lnot a_i\) for every \(i < j\) and also \((\mathcal {M}^{rat},\{ \alpha \}) \vdash _{\textit{MAP}}a_i\) for all \(i \ge j\). Therefore all \(\lnot \gamma \vee \delta \), such that \(\gamma {\,\mid \!\sim \,}\delta \in \varDelta _i\) where \(i \ge j\), must be true in all most probable worlds of \((\mathcal {M}^{rat},\{\alpha \})\). But then necessarily, if \(\varDelta _\alpha ^{rat} \models \beta \) then \((\mathcal {M}^{rat},\{ \alpha \}) \vdash _{\textit{MAP}}\beta \). To show the other direction, assume that \((\mathcal {M}^{rat},\{ \alpha \}) \vdash _{\textit{MAP}}\beta \). Using essentially the same reasoning as for the first direction, we can show that the set of formulas \(\lnot \gamma \vee \delta \) which must be satisfied in all most probable worlds of \((\mathcal {M}^{rat},\{\alpha \})\) is equivalent to the set of formulas in \(\varDelta _\alpha ^{rat}\).
2. It holds that \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{lex}\) if and only if \(\beta \) is true in all lex-preferred models of \(\alpha \), i.e. \(\forall \omega \in \llbracket \alpha \rrbracket : (\omega \not \models \beta ) \Rightarrow \exists \omega '\in \llbracket \alpha \rrbracket : \omega ' \prec \omega \), where \(\prec \) is the lex-preference relation based on the Z-ordering defined in Sect. 2.2. What we need to show is that for any possible worlds \(\omega \), \(\omega '\) it holds that \(\omega \prec \omega '\) if and only if \(P_{\mathcal {M}^{lex}}(\omega ) > P_{\mathcal {M}^{lex}}(\omega ')\), where \(P_{\mathcal {M}^{lex}}\) is the probability given by the MLN \(\mathcal {M}^{lex}\); correctness of the lexicographic transformation then follows. This follows immediately from the way we set the weights in Transformation 2, as the penalty for not satisfying one formula corresponding to a default rule in \(\varDelta _i\) is greater than the sum of the penalties for not satisfying all formulas corresponding to the default rules in \(\bigcup _{j < i} \varDelta _j\).
3. This follows directly from the results of [10], in which the maximum entropy closure was introduced; an explicit ranking function on possible worlds was derived there, and it is exactly this ranking that we use in the transformation.
Next we show correctness of the non-ground transformations. We start by proving properties of the non-ground counterpart of Z-ordering.
Proposition 2
Let \(\varDelta ^*\) be a default theory and \(\mathcal {C}\) a set of constants (the universe). Let \(\varDelta \) be the set of groundingsFootnote 10 of the default rules from \(\varDelta ^*\). Let \(\varDelta = \varDelta _1 \cup \dots \cup \varDelta _k\) be the Z-ordering of the set of ground default rules \(\varDelta \). Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be as defined by Eq. 3. Then a ground default rule \(\alpha {\,\mid \!\sim \,}\beta \) is in \(\varDelta _i\) if and only if a rule isomorphic to \(\textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta )\) is in \(\varDelta _i^*\).
Proof
(Sketch) This proposition follows from the observation that Eq. 3 is equivalent to checking whether the ground default rule \(\alpha {\,\mid \!\sim \,}\beta \) is tolerated by the set of groundings of the default rules \(\gamma {\,\mid \!\sim \,}\delta \in \varDelta ^* \setminus (\varDelta ^*_1 \cup \dots \cup \varDelta ^*_{j-1})\), because Eq. 3 explicitly asks about the existence of a Herbrand modelFootnote 11 with universe \(\mathcal {C}\). Since the answer, i.e. whether the rule is tolerated or not, must be the same for every default rule weakly isomorphic to \(\alpha {\,\mid \!\sim \,}\beta \), this is equivalent to checking the condition for all groundings of \(\textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta )\), which necessarily gives a result equivalent to what we would obtain by the Z-ordering performed on the explicitly enumerated groundings. The statement of the proposition then follows.
In other words, the above proposition states that if we replace the non-ground rules in the individual \(\varDelta _i^*\)'s by all their groundings, then the resulting partitioning of the ground default rules is equivalent to what we would obtain by directly Z-ordering the set of ground default rules \(\varDelta \).
Proposition 3
Let \(\varDelta ^*\) be a set of non-ground default rules and \(\mathcal {C}\) be a set of constants (universe). Let \(\varDelta ^{rat}\), \(\varDelta ^{lex}\) and \(\varDelta ^{ent}\) be the rational, lexicographic and maximum entropy closure, respectively, of the set of default rules obtained by grounding \(\varDelta ^*\). Let \(\mathcal {M}^{rat}\), \(\mathcal {M}^{lex}\) and \(\mathcal {M}^{ent}\) be Markov logic networks obtained from \(\varDelta ^*\) by Transformations 4, 5 and 6, respectively. Then the following holds for any ground default rule \(\alpha {\,\mid \!\sim \,}\beta \):
-
1.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{rat}\) if and only if \((\mathcal {M}^{rat}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \),
-
2.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{lex}\) if and only if \((\mathcal {M}^{lex}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \),
-
3.
\(\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^{ent}\) if and only if \((\mathcal {M}^{ent}, \{ \alpha \}) \vdash _{\textit{MAP}}\beta \).
Proof
1. This case follows from Propositions 1 and 2 by noticing that the constructed MLN, when grounded, is the same as what the ground Transformation 1 would produce if applied to all groundings of default rules from \(\varDelta ^*\).
2. From Proposition 2 we have that, if we ground the MLN produced by Transformation 5, then the structure of the MLN will be identical to what we would obtain if we applied Transformation 2 to all groundings of default rules from \(\varDelta ^*\). While the weights of the formulas are not the same, it is still guaranteed that \(\omega \prec \omega '\) if and only if \(P_{\mathcal {M}^{lex}}(\omega ) > P_{\mathcal {M}^{lex}}(\omega ')\), where \(\prec \) is the lex-preference relation. This is because the term \(|\mathcal {C}|^{|vars(\alpha {\,\mid \!\sim \,}\beta )|}\), which is used to define the weights in Transformation 5, is an upper bound on the number of groundings of a default rule \(\alpha {\,\mid \!\sim \,}\beta \); this implies that the sum of all weights of groundings of formulas in the MLN which correspond to default rules from \(\bigcup _{i < j} \varDelta _i\) is smaller than the weight of a single formula corresponding to a default rule from \(\varDelta _j\), which is what we need.
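The dominance argument above can be checked numerically. The numbers here are toy values assumed for illustration: \(|\mathcal {C}| = 3\) and two stratum-1 rules with one and two variables, respectively.

```python
n_constants = 3               # |C| (toy value)
stratum1_var_counts = [1, 2]  # variable counts of stratum-1 rules (toy values)
lam1 = 1
# Transformation 5: lambda_2 = 1 + sum over stratum-1 rules of |C|^vars * lambda_1
lam2 = 1 + sum(n_constants ** v * lam1 for v in stratum1_var_counts)
# |C|^vars upper-bounds the number of groundings of each rule, so this sum
# bounds the total weight of all groundings of stratum-1 formulas
max_total_stratum1 = sum(n_constants ** v * lam1 for v in stratum1_var_counts)
assert lam2 > max_total_stratum1  # violating one stratum-2 formula costs more
print(lam2, max_total_stratum1)   # 13 12
```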
3. (Sketch) To show the last part of this proposition, we would essentially need to replicate the more detailed reasoning from the proof of Proposition 2, because the maximum entropy closure needs to create a partitioning of the set of default rules which refines the Z-ordering. Since no new ideas are needed for this proof, and because of space limitations, we omit the details. The basic idea is the same as for the non-ground Z-ordering: we only process representatives of the non-ground default rules, and we can show that we would obtain an equivalent result if we processed all groundings of the default rules with the procedure from Transformation 3.
© 2016 Springer International Publishing Switzerland
Kuželka, O., Davis, J., Schockaert, S. (2016). Constructing Markov Logic Networks from First-Order Default Rules. In: Inoue, K., Ohwada, H., Yamamoto, A. (eds.) Inductive Logic Programming. ILP 2015. Lecture Notes in Computer Science, vol. 9575. Springer, Cham. https://doi.org/10.1007/978-3-319-40566-7_7