1 Introduction

Markov logic is a popular framework for statistical relational learning [20]. Formulas in Markov logic are essentially weighted first-order formulas, which act as soft constraints on possible worlds. In current applications, the weights are typically learned from data, while the first-order formulas are either hand-crafted or obtained using standard rule learning methods.

The fact that a domain expert could manually specify (some of) the formulas, or could inspect learned formulas, is an important strength of Markov logic. Unfortunately, the weights associated with these formulas do not have an easily interpretable meaning. This limits the potential of Markov logic, as it means that domain experts cannot offer much guidance in terms of how the weights should be set (e.g. in applications with little or no training data) and cannot verify the quality of learned weights (e.g. in applications where the quality of the training data is in doubt). Often, however, Markov logic networks (MLNs) are not used for evaluating probabilities but for finding the most likely truth assignment of unobserved variables given the available evidence, i.e. for maximum a posteriori (MAP) reasoning. In such cases, the precise values of the weights are only relevant inasmuch as they influence the result of MAP queries. In this setting, rather than asking a domain expert to specify weights, we can instead ask for constraints on how MAP reasoning should behave. For example, the expert could specify constraints such as “if all we know is that x is a bird, then using MAP reasoning we should be able to conclude that x can fly”, which is in agreement with the semantics of the default rule “birds can typically fly” in System P [2, 13].

Thus, a domain expert could be involved in the process of learning an MLN by providing a set of defaults, which are interpreted as constraints on the ranking of possible worlds induced by the MLN. Taking this idea one step further, in this paper, we show how a specific MLN can be constructed from the default rules provided by the expert. Constructing this MLN requires us to select a specific probability distribution that is compatible with the defaults. This selection problem is closely related to the problem of defining the closure of a set of defaults, which has been widely studied in the field of non-monotonic reasoning [9, 10, 14]. In particular, several proposals to define this closure are based on constructing a specific probability distribution [4, 10]. As we will show, it is possible to lift these approaches and thus obtain an efficient and principled way to construct MLNs that are compatible with a given set of defaults.

To date, the use of expert knowledge for guiding or even replacing weight learning has only received limited attention. One exception is [17], which constructs an MLN based on (potentially inconsistent) conditional probabilities provided by domain experts. While this can be useful in some applications, it relies on the ability of experts to provide meaningful probability estimates. However, humans are notoriously poor at judging likelihood. For example, properties that are common among the typical elements of a class of objects are often assumed to be likely in general [22]. Moreover, experts may be able to specify which properties are most likely to hold, in a given context, without being able to quantify their likelihood. In such situations, our default-rule-based approach would be more natural than approaches that force experts to estimate probabilities. On the other hand, our approach will only provide meaningful results for MAP queries: numerical input will be difficult to avoid if we want the constructed MLN to produce satisfactory conditional probability estimates.

This paper is structured as follows. The next section recalls some preliminaries from Markov logic and the non-monotonic reasoning literature. Then in Sect. 3 we show how three well-known approaches to non-monotonic reasoning can be implemented as MAP inference in a particular MLN. By lifting the constructions from Sect. 3, in Sect. 4 we show how MLNs can be constructed whose MAP-consequences are compatible with a given set of first-order default rules. Finally, Sect. 5 evaluates the performance of the resulting MLNs in a standard classification task.

2 Background

2.1 Markov Logic Networks

A Markov logic network (MLN) [20] is a set of weighted formulas \((F, w_F)\), where F is a classical first-order formula and \(w_F\) is a real number, intuitively reflecting a penalty that is applied to possible worlds that violate F. We will sometimes also use the notation \(w_F:\, F\) to denote the formula \((F,w_F)\). Given a set of constants C, an MLN \(\mathcal {M}\) induces the following probability distribution over possible worlds \(\omega \):

$$\begin{aligned} p_{\mathcal {M}}(\omega ) = \frac{1}{Z}\exp \left( \sum _{(F,w_F) \in \mathcal {M}} w_F\, n_F(\omega )\right) , \end{aligned}$$
(1)

where \(n_F(\omega )\) is the number of true groundings of F in the possible world \(\omega \), and \(Z = \sum _{\omega } \exp \left( \sum _{(F,w_F) \in \mathcal {M}} w_F\, n_F(\omega )\right) \) is a normalization constant ensuring that \(p_{\mathcal {M}}\) is a probability distribution. Sometimes, formulas \((F,w_F)\) with \(w_F = +\infty \) are considered, to represent hard constraints. In such cases, we define \(p_{\mathcal {M}}(\omega )=0\) for all possible worlds that do not satisfy all of the hard constraints, and only formulas with a real-valued weight are considered in (1) for the possible worlds that do.
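To make (1) concrete, the following Python sketch (a toy example with illustrative names, not the authors' implementation) enumerates the worlds of a small ground MLN and computes \(p_{\mathcal {M}}(\omega )\) by brute force; since the example is propositional, \(n_F(\omega ) \in \{0,1\}\).

```python
from itertools import product
from math import exp

# Toy ground MLN: one soft formula "bird -> flies" with weight 1.
# A world is a dict from ground atoms to truth values.
atoms = ["bird", "flies"]
INF = float("inf")
mln = [
    (lambda w: (not w["bird"]) or w["flies"], 1.0),
]

def satisfies_hard(world, mln):
    # hard constraints are the formulas with infinite weight
    return all(f(world) for f, wf in mln if wf == INF)

def unnormalized(world, mln):
    # exp of the sum w_F * n_F(omega), restricted to soft formulas
    return exp(sum(wf for f, wf in mln if wf != INF and f(world)))

worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
feasible = [w for w in worlds if satisfies_hard(w, mln)]
Z = sum(unnormalized(w, mln) for w in feasible)
for w in feasible:
    print(w, unnormalized(w, mln) / Z)
```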

The main inference task for MLNs that we will consider is full MAP inference. Given a set of ground literals (the evidence), MAP inference aims to compute the most probable configuration of all unobserved variables (the queries). Standard approaches for performing MAP inference include a strategy based on MaxWalkSAT [20] and a cutting plane based strategy [16, 21]. Given a set of ground formulas E, we write \(\max (\mathcal {M},E)\) for the set of most probable worlds of the MLN that satisfy E. We will also consider the following inference relation, initially proposed for penalty logic in [7]:

$$\begin{aligned} (\mathcal {M},E) \vdash _{\textit{MAP}} \alpha \quad \text {iff}\quad \forall \omega \in \max (\mathcal {M},E): \omega \models \alpha \end{aligned}$$
(2)

with \(\mathcal {M}\) an MLN, \(\alpha \) a ground formula and E a set of ground formulas. Note that \((\mathcal {M},E) \vdash _{\textit{MAP}} \alpha \) means that the formula \(\alpha \) is satisfied in all the most probable worlds which are compatible with the available evidence.
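Continuing the toy sketch above, the relation (2) can be checked by brute force: restrict attention to the worlds satisfying the evidence, keep the most probable ones, and test whether \(\alpha \) holds in all of them.

```python
def map_worlds(mln, worlds, evidence):
    # max(M, E): most probable worlds among those satisfying the evidence
    # and all hard constraints
    feasible = [w for w in worlds
                if satisfies_hard(w, mln) and all(e(w) for e in evidence)]
    best = max(unnormalized(w, mln) for w in feasible)
    return [w for w in feasible if unnormalized(w, mln) == best]

def map_entails(mln, worlds, evidence, alpha):
    # (M, E) |-MAP alpha  iff  alpha holds in every most probable world
    return all(alpha(w) for w in map_worlds(mln, worlds, evidence))

# With the toy MLN above: given only "bird", MAP inference concludes "flies".
print(map_entails(mln, worlds, [lambda w: w["bird"]], lambda w: w["flies"]))  # True
```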

2.2 Reasoning About Default Rules in System P

A variety of approaches have been proposed to reason about default rules of the form “if \(\alpha \) then typically \(\beta \) holds”, which we will denote as \(\alpha {\,\mid \!\sim \,}\beta \). Most approaches are based on the idea of defining a preference order over possible worlds and insisting that \(\beta \) is true in the most preferred (i.e. the most normal) of the worlds in which \(\alpha \) is true [3, 9, 13, 18, 19]. The axioms of System P [13] capture a set of desirable properties for an inference relation for default rules:

  • Reflexivity. \(\alpha {\,\mid \!\sim \,}\alpha \)

  • Left logical equivalence. If \(\alpha \equiv \alpha '\) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha '{\,\mid \!\sim \,}\beta \)

  • Right weakening. If \(\beta \models \beta '\) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha {\,\mid \!\sim \,}\beta '\)

  • OR. If \(\alpha {\,\mid \!\sim \,}\gamma \) and \(\beta {\,\mid \!\sim \,}\gamma \) then \(\alpha \vee \beta {\,\mid \!\sim \,}\gamma \)

  • Cautious monotonicity. If \(\alpha {\,\mid \!\sim \,}\beta \) and \(\alpha {\,\mid \!\sim \,}\gamma \) then \(\alpha \wedge \beta {\,\mid \!\sim \,}\gamma \)

  • CUT. If \(\alpha \wedge \beta {\,\mid \!\sim \,}\gamma \) and \(\alpha {\,\mid \!\sim \,}\beta \) then \(\alpha {\,\mid \!\sim \,}\gamma \)

where \(\alpha \equiv \alpha '\) and \(\beta \models \beta '\) refer to equivalence and entailment from classical logic. Note that applying the axioms of System P to a set of defaults \(\varDelta = \{\alpha _1 {\,\mid \!\sim \,}\beta _1,...,\alpha _n{\,\mid \!\sim \,}\beta _n\}\) corresponds to a form of monotonic reasoning about defaults. However, as the set of consequences that can be obtained in this way is limited, it is common to consider a non-monotonic inference relation whose set of consequences is closed under the axioms of System P as well as the following property:

  • Rational monotonicity. If \(\alpha {\,\mid \!\sim \,}\beta \) and we cannot derive \(\alpha {\,\mid \!\sim \,}\lnot \gamma \) then \(\alpha \wedge \gamma {\,\mid \!\sim \,}\beta \).

In this paper, we will consider three such inference relations: the rational closure [19], the lexicographic closure [1] and the maximum-entropy closure [10]. A default \(\alpha {\,\mid \!\sim \,}\beta \) is tolerated by a set of defaults \(\gamma _1{\,\mid \!\sim \,}\delta _1,...,\gamma _m{\,\mid \!\sim \,}\delta _m\) if the classical formula \(\alpha \wedge \beta \wedge \bigwedge _i (\lnot \gamma _i \vee \delta _i)\) is consistent. The rational closure is based on a stratification \(\varDelta _1,...,\varDelta _k\) of \(\varDelta \), where each \(\varDelta _j\) contains all defaults from \(\varDelta \setminus (\varDelta _{1}\cup ... \cup \varDelta _{j-1})\) which are tolerated by \(\varDelta \setminus (\varDelta _1\cup ... \cup \varDelta _{j-1})\). It can be shown that such a stratification always exists when \(\varDelta \) satisfies some natural consistency properties (see [19] for details). Intuitively, \(\varDelta _1\) contains the most general default rules, \(\varDelta _2\) contains exceptions to the rules in \(\varDelta _1\), \(\varDelta _3\) contains exceptions to the rules in \(\varDelta _2\), etc. This stratification is known as the Z-ordering. To decide whether \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \(\varDelta \), let j be the smallest index for which \(\varDelta ^{\textit{rat}}_{\alpha } = \{\lnot \alpha _i \vee \beta _i :(\alpha _i{\,\mid \!\sim \,}\beta _i) \in \varDelta _j \cup ... \cup \varDelta _k \} \cup \{\alpha \}\) is consistent. Then \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \(\varDelta \) if \(\varDelta ^{\textit{rat}}_{\alpha }\models \beta \). When a set of hard rules \(\varGamma \) needs to be enforced, the Z-ordering can be generalized as follows [3]. Each set \(\varDelta _j\) then contains those defaults \(\alpha {\,\mid \!\sim \,}\beta \) from \(\varDelta \setminus (\varDelta _1\cup ... \cup \varDelta _{j-1})\) for which \(\varGamma \cup \{\alpha \wedge \beta \} \cup \{\lnot \alpha _i\vee \beta _i :(\alpha _i{\,\mid \!\sim \,}\beta _i) \in \varDelta \setminus (\varDelta _1\cup ... \cup \varDelta _{j-1})\}\) is consistent. Finally, we append \(\varDelta _k = \varGamma \) as the last stratum, where \(\varDelta _1, ..., \varDelta _{k-1}\) is the stratification of \(\varDelta \) obtained in this way.
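The Z-ordering can be computed by repeatedly applying the toleration test. The sketch below (reusing the world representation and `product` import from the earlier sketches; a propositional default is represented as a pair of Boolean functions) decides consistency naively by enumerating worlds.

```python
def tolerated(default, others, worlds):
    # alpha |~ beta is tolerated if some world verifies alpha & beta while
    # satisfying the material counterpart of every default in `others`
    a, b = default
    return any(a(w) and b(w) and all((not g(w)) or d(w) for g, d in others)
               for w in worlds)

def z_ordering(delta, worlds):
    # Stratification Delta_1, ..., Delta_k used by the rational closure
    strata, remaining = [], list(delta)
    while remaining:
        layer = [r for r in remaining if tolerated(r, remaining, worlds)]
        if not layer:
            raise ValueError("no Z-ordering: inconsistent set of defaults")
        strata.append(layer)
        remaining = [r for r in remaining if r not in layer]
    return strata

# "birds fly" ends up in Delta_1, its exception "antarctic birds don't" in Delta_2
delta = [
    (lambda w: w["bird"], lambda w: w["flies"]),
    (lambda w: w["bird"] and w["antarctic"], lambda w: not w["flies"]),
]
atoms3 = ["bird", "antarctic", "flies"]
worlds3 = [dict(zip(atoms3, v)) for v in product([False, True], repeat=3)]
print([len(s) for s in z_ordering(delta, worlds3)])  # [1, 1]
```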

The rational closure encodes the intuition that, in case of conflict, specific rules should have priority over more generic ones. However, it requires us to ignore all the defaults in \(\varDelta _1\cup ... \cup \varDelta _{j-1}\), even defaults which are intuitively unrelated to this conflict. The lexicographic closure [1] addresses this issue as follows. For a propositional interpretation \(\omega \), we write \(\textit{sat}(\omega ,\varDelta _j)\) for the number of defaults from \(\varDelta _j\) satisfied by \(\omega \), i.e. \(\textit{sat}(\omega ,\varDelta _j) = |\{\alpha {\,\mid \!\sim \,}\beta :(\alpha {\,\mid \!\sim \,}\beta )\in \varDelta _j, \omega \models \lnot \alpha \vee \beta \}|\). We say that an interpretation \(\omega _1\) is lex-preferred over an interpretation \(\omega _2\), written \(\omega _1 \prec \omega _2\), if there exists a j such that \(\textit{sat}(\omega _1,\varDelta _j) > \textit{sat}(\omega _2,\varDelta _j)\) while \(\textit{sat}(\omega _1,\varDelta _i) = \textit{sat}(\omega _2,\varDelta _i)\) for all \(i>j\). The default \(\alpha {\,\mid \!\sim \,}\beta \) is in the lexicographic closure of \(\varDelta \) if \(\beta \) is satisfied in all the most lex-preferred models of \(\alpha \), i.e. \(\forall \omega \in \llbracket \alpha \rrbracket : (\omega \not \models \beta ) \Rightarrow \exists \omega '\in \llbracket \alpha \rrbracket : \omega ' \prec \omega \), where \(\llbracket \alpha \rrbracket \) is a shorthand for \(\{\omega :\omega \models \alpha \}\).
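Under the same toy conventions, the order \(\prec \) can be implemented by comparing strata from most to least specific:

```python
def sat_count(world, stratum):
    # sat(omega, Delta_j): defaults in the stratum whose material
    # counterpart ~alpha v beta holds in the world
    return sum(1 for a, b in stratum if (not a(world)) or b(world))

def lex_preferred(w1, w2, strata):
    # w1 < w2: on the most specific stratum where the two worlds differ,
    # w1 satisfies strictly more defaults
    for stratum in reversed(strata):
        c1, c2 = sat_count(w1, stratum), sat_count(w2, stratum)
        if c1 != c2:
            return c1 > c2
    return False
```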

Another approach to default reasoning is based on the principle of maximum entropy [10]. To describe how the maximum-entropy ranking of possible worlds can be computed, we need some additional terminology. A possible world \(\omega \) is said to falsify a rule \(\alpha {\,\mid \!\sim \,}\beta \) if \(\omega \models \alpha \wedge \lnot \beta \) and to verify it if \(\omega \models \alpha \wedge \beta \). A set of default rules \(\varDelta \) is said to be a minimal core if for any rule \(\alpha {\,\mid \!\sim \,}\beta \), the set \(\{ \alpha {\,\mid \!\sim \,}\lnot \beta \} \cup (\varDelta \setminus \{ \alpha {\,\mid \!\sim \,}\beta \}) \) is a consistent set of default rules, meaning that a Z-ordering of this set exists. Given a minimal core set of defaults \(\varDelta \), the maximum-entropy ranking is obtained as follows [10]. Let \(\varGamma \) be the set of rules tolerated by \(\varDelta \); for each rule \(r \in \varGamma \), we set \(\kappa _{ME}(r) = 1\). While \(\varGamma \ne \varDelta \), we repeat the following steps. Let \(\varOmega \) be the set of models \(\omega \) which do not falsify any of the rules in \(\varDelta \setminus \varGamma \) and verify at least one of these rules. For each model \(\omega \in \varOmega \), we compute \(\kappa _{ME}(\omega ) = \sum \{\kappa _{ME}(\alpha {\,\mid \!\sim \,}\beta ) :(\alpha {\,\mid \!\sim \,}\beta ) \in \varGamma , \omega \models \alpha \wedge \lnot \beta \}\). Let \(\omega ^*\) be the model in \(\varOmega \) with minimum rank. Each rule \(\alpha {\,\mid \!\sim \,}\beta \) that is verified by \(\omega ^*\) is added to \(\varGamma \), and its rank is computed as \(\kappa _{ME}(\alpha {\,\mid \!\sim \,}\beta ) = 1+\kappa _{ME}(\omega ^*)\).
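The following sketch implements this iterative procedure, reusing `tolerated` and the bird example from the Z-ordering sketch; on that example it yields the ranks 1 and 2 that also appear in Example 1 below.

```python
def max_entropy_ranks(delta, worlds):
    # kappa_ME for a minimal-core set of defaults, following the
    # procedure described above
    kappa = {}
    gamma = [r for r in delta if tolerated(r, delta, worlds)]
    for r in gamma:
        kappa[r] = 1
    while len(gamma) < len(delta):
        rest = [r for r in delta if r not in kappa]
        # worlds that falsify no rule in delta \ gamma and verify at least one
        candidates = [w for w in worlds
                      if not any(a(w) and not b(w) for a, b in rest)
                      and any(a(w) and b(w) for a, b in rest)]
        def rank(w):
            # sum of the ranks of the rules in gamma falsified by w
            return sum(kappa[r] for r in gamma if r[0](w) and not r[1](w))
        w_star = min(candidates, key=rank)
        p = rank(w_star)
        for r in rest:
            if r[0](w_star) and r[1](w_star):  # verified by the minimal world
                kappa[r] = 1 + p
                gamma.append(r)
    return kappa

print(sorted(max_entropy_ranks(delta, worlds3).values()))  # [1, 2]
```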

3 Encoding Ground Default Theories in Markov Logic

It is well-known [14] that any set of defaults \(\varDelta \) which is closed under the axioms of System P and rational monotonicity corresponds to a linear ranking \(\kappa \) of possible worlds, such that \(\alpha {\,\mid \!\sim \,}\beta \) iff \(\kappa (\alpha \wedge \beta ) > \kappa (\alpha \wedge \lnot \beta )\), where we write \(\kappa (\gamma )\) for a formula \(\gamma \) as an abbreviation for \(\max \{\kappa (\omega ) :\omega \models \gamma \}\). Since the ranking \(\kappa \) can be encoded as a probability distribution, and every probability distribution on possible worlds can be represented as an MLN, it is clear that there must exist an MLN \(\mathcal {M}\) such that \((\alpha {\,\mid \!\sim \,}\beta ) \in \varDelta \) iff \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \). More generally, for any (i.e. not necessarily closed) set of defaults \(\varDelta \), there exists an MLN \(\mathcal {M}\) such that \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \(\varDelta \), and similarly for the lexicographic and maximum-entropy closures. We now show how the MLNs corresponding to these three closures can be constructed.

Transformation 1

(Rational closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\varDelta _1,...,\varDelta _k\) be the Z-ordering of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k (\{ (\lnot a_i \vee \lnot \alpha \vee \beta , \infty ) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i \} \cup \{ (a_i, 1) \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}) \cup \bigcup _{i=2}^k \{ (a_i \vee \lnot a_{i-1}, \infty )\}\) where \(a_i\) are auxiliary literals. Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the rational closure of \((\varDelta ,\varTheta )\).
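As an illustration, the construction of Transformation 1 could be assembled as follows (a sketch using an ad hoc string syntax for clauses, not tied to any particular MLN engine):

```python
def rational_closure_mln(strata, theta):
    # Transformation 1 over defaults given as (antecedent, consequent)
    # clause strings; auxiliary atoms a_1, ..., a_k are generated on the fly
    INF = float("inf")
    mln = [(phi, INF) for phi in theta]
    for i, stratum in enumerate(strata, start=1):
        mln.append((f"a{i}", 1.0))                      # soft: prefer a_i true
        for alpha, beta in stratum:
            mln.append((f"!a{i} v !({alpha}) v {beta}", INF))
        if i > 1:
            mln.append((f"a{i} v !a{i-1}", INF))        # a_{i-1} implies a_i
    return mln

# Reproduces the structure of M_1 from Example 1 below:
for rule in rational_closure_mln([[("bird", "flies")],
                                  [("bird & antarctic", "!flies")]], []):
    print(rule)
```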

Transformation 2

(Lexicographic closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\varDelta _1,...,\varDelta _k\) be the Z-ordering of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k \{ (\lnot \alpha \vee \beta , \lambda _i) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\) where \(\lambda _i = 1+\sum _{j=1}^{i-1} |\varDelta _j| \cdot \lambda _j\) for \(i > 1\) and \(\lambda _1 = 1\). Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the lexicographic closure of \((\varDelta ,\varTheta )\).
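The weights \(\lambda _i\) are chosen so that satisfying a single rule of stratum i outweighs violating all rules of the more generic strata combined. A small sketch of the recurrence:

```python
def lex_weights(strata_sizes):
    # lambda_1 = 1 and lambda_i = 1 + sum_{j<i} |Delta_j| * lambda_j
    lams = []
    for i in range(len(strata_sizes)):
        lams.append(1 + sum(n * l for n, l in zip(strata_sizes[:i], lams)))
    return lams

print(lex_weights([1, 1]))     # [1, 2] -- the weights of M_2 in Example 1 below
print(lex_weights([2, 3, 1]))  # [1, 3, 12]
```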

Transformation 3

(Maximum-entropy closure). Let \(\varDelta \) be a set of ground default rules and let \(\varTheta \) be a set of hard constraints (clauses). Let \(\kappa \) be the rule weights corresponding to the maximum-entropy closure of \(\varDelta \cup \varTheta \). Let the MLN \(\mathcal {M}\) be defined as follows: \(\{ (\lnot \alpha \vee \beta , \kappa (\alpha {\,\mid \!\sim \,}\beta )) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\). Then \((\mathcal {M},\alpha ) \vdash _{\textit{MAP}}\beta \) iff \(\alpha {\,\mid \!\sim \,}\beta \) is in the maximum-entropy closure of \((\varDelta ,\varTheta )\).

Example 1

Consider the default rules \(\varDelta = \{\textit{bird} {\,\mid \!\sim \,}\textit{flies},\ \textit{antarctic}\wedge \textit{bird} {\,\mid \!\sim \,}\lnot \textit{flies}\}\). Then \(\mathcal {M}_1 = \{ (\lnot a_1 \vee \lnot \textit{bird} \vee \textit{flies}, \infty ), (\lnot a_2 \vee \lnot \textit{antarctic} \vee \lnot \textit{bird} \vee \lnot \textit{flies}, \infty ), (a_1, 1), (a_2,1), (a_2 \vee \lnot a_1, \infty ) \}\) is the result of Transformation 1, and \(\mathcal {M}_2 = \{ (\lnot \textit{bird} \vee \textit{flies}, 1), (\lnot \textit{antarctic} \vee \lnot \textit{bird} \vee \lnot \textit{flies}, 2) \}\) is the result of Transformation 2, which in this example coincides with the result of Transformation 3.

4 Encoding Non-ground Default Theories in Markov Logic

While reasoning with default rules has mostly been studied at the propositional level, a few authors have considered first-order default rules [8, 12]. As with probabilistic first-order rules [11], two rather different semantics for first-order defaults can be considered. On the one hand, a default such as \(P(x) {\,\mid \!\sim \,}Q(x)\) could mean that the most typical objects that have the property P also have the property Q. On the other hand, this default could also mean that whenever P(x) holds for a given x, in the most normal worlds Q(x) will also hold. In other words, first-order defaults can either model typicality of objects or normality of worlds [8]. In this paper, we will consider the latter interpretation. Given that we only consider finite universes (as is usual in the context of MLNs), we can then see a first-order default as a template for propositional defaults. For example, \(P(x) {\,\mid \!\sim \,}Q(x)\) can be seen as a compact notation for the set of defaults \(\{P(c_1) {\,\mid \!\sim \,}Q(c_1),...,P(c_n){\,\mid \!\sim \,}Q(c_n)\}\). Note that this approach would not be possible for first-order defaults that model the typicality of objects.

In particular, the first-order default theories we will consider consist of first-order logic formulas (hard rules) and default rules of the form \(\alpha {\,\mid \!\sim \,}\beta \), where \(\alpha \) is a conjunction of literals and \(\beta \) is a disjunction of literals. Our approach can be straightforwardly extended to quantified default rules, where the scope of a quantifier may be the whole default rule rather than just the antecedent or the consequent. While this could be of interest, we do not consider it here for ease of presentation.

Definition 1

(Markov logic model of a first-order default theory). Let \((\varDelta ,\varTheta )\) be a first-order default theory with \(\varDelta \) the set of default rules and \(\varTheta \) the set of hard rules. A Markov logic network \(\mathcal {M}\) is a model of the default theory \((\varDelta ,\varTheta )\) if it holds that: (i) \(P\left[ X = \omega \right] = 0\) whenever \(\omega \not \models \varTheta \), and (ii) for any default rule \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta \) and any grounding substitution \(\theta \) of the unquantified variables of \(\alpha {\,\mid \!\sim \,}\beta \), either \(\{ \alpha \theta \} \cup \varTheta \vdash \bot \) or \((\mathcal {M},\alpha \theta ) \vdash _{\textit{MAP}}\beta \theta .\) We say that \((\varDelta ,\varTheta )\) is satisfiable if it has at least one model.

Below we will describe three methods for constructing Markov logic models of first-order default theories, generalizing Transformations 1–3. For convenience, we will use typed formulas (for simplicity, we nevertheless assume that the default rules given as input are untyped). For instance, when we have the formula \(\alpha = \textit{owns}(\textit{person}:X,\textit{thing}:Y)\) and the set of constants of the type person is \(\{\textit{alice}, \textit{bob} \}\) and the set of constants of the type thing is \(\{ \textit{car} \}\) then \(\alpha \) corresponds to the ground formulas \(\textit{owns}(alice,car)\) and \(\textit{owns}(bob,car)\). In cases where there is only one type, we will not write it explicitly. For a constant or variable x, we write \(\tau (x)\) to denote its type. Two formulas \(F_1\) and \(F_2\) (either both conjunctions or both disjunctions of literals) are said to be isomorphic when there is an injective substitution \(\theta \) of the variables of \(F_1\) such that \(F_1\theta \equiv F_2\) (where \(\equiv \) denotes logical equivalence). Two default rules \(D_1 = \alpha _1 {\,\mid \!\sim \,}\beta _1\) and \(D_2 = \alpha _2 {\,\mid \!\sim \,}\beta _2\) are said to be isomorphic, denoted \(D_1 \approx D_2\), if there exists a substitution \(\theta \) of the variables of \(D_1\) such that \(\alpha _1\theta \equiv \alpha _2\) and \(\beta _1\theta \equiv \beta _2\). Two default theories \(\varDelta _1 \cup \varTheta _1\) and \(\varDelta _2 \cup \varTheta _2\) are said to be isomorphic, denoted by \(\varDelta _1 \cup \varTheta _1 \approx \varDelta _2 \cup \varTheta _2\), if there is a bijection i from elements of \(\varDelta _1 \cup \varTheta _1\) to elements of \(\varDelta _2 \cup \varTheta _2\) such that for any \(F \in \varDelta _1 \cup \varTheta _1\), \(i(F) \approx F\). When j is a permutation of a subset of constants from \(\varDelta \cup \varTheta \) then \(j(\varDelta \cup \varTheta )\) denotes the default theory obtained by replacing any constant c from the subset by its image j(c).

Definition 2

(Interchangeable constants). Let \(\varDelta \cup \varTheta \) be a non-ground default theory. A set of constants \(\mathcal {C}\) is said to be interchangeable in \(\varDelta \cup \varTheta \) if \(j(\varDelta \cup \varTheta ) \approx \varDelta \cup \varTheta \) for any permutation j of the constants in \(\mathcal {C}\).

The maximal interchangeable subsets of a set of constants form a uniquely defined partition of this set, which we will call the interchangeable partition. To check whether a set of constants \(\mathcal {C}\) is interchangeable, it is sufficient to check that \(j(\varDelta \cup \varTheta ) \approx \varDelta \cup \varTheta \) for those permutations which swap just two constants from \(\mathcal {C}\). Note that the constants do not actually need to appear in \(\varDelta \cup \varTheta \); it trivially holds that constants which do not appear in \(\varDelta \cup \varTheta \) are interchangeable. When \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) is the interchangeable partition of a set of constants then we may introduce a new type \(\textit{type}_{\textit{lexmin}{(\mathcal {C}_i)}}\) for every \(\mathcal {C}_i \in \mathcal {I}\) (where \(\textit{lexmin}{(\mathcal {C})}\) denotes the lexically smallest element from \(\mathcal {C}\)). When \(D = \alpha {\,\mid \!\sim \,}\beta \) is a ground default rule, we write \(\textit{variabilize}(D)\) to denote the following default rule: \(\bigwedge \{V_c \ne V_d :c,d\in \textit{const}(D), \tau (c)=\tau (d)\} \wedge \alpha ' {\,\mid \!\sim \,}\beta '\) where \(\textit{const}(D)\) is the set of constants appearing in D and \(\alpha '\) and \(\beta '\) are obtained from \(\alpha \) and \(\beta \) by respectively replacing each constant c by a new variable \(V_c\) of type \(\tau (c)\). Here \(\ne \) is treated as a binary predicate which is defined in the set of hard rules \(\varTheta \).
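A possible sketch of \(\textit{variabilize}\) over a simple literal-based representation (the data structures here are purely illustrative):

```python
from itertools import combinations

def variabilize(default, types):
    # variabilize(D) for a ground default (antecedent, consequent), each a
    # list of literals (name, args, positive); `types` maps constants to types
    alpha, beta = default
    consts = sorted({c for part in (alpha, beta) for _, args, _ in part for c in args})
    var = {c: f"V_{c}" for c in consts}
    def rename(lits):
        return [(n, tuple(var[a] for a in args), pos) for n, args, pos in lits]
    guards = [("neq", (var[c], var[d]), True)   # V_c != V_d for same-typed pairs
              for c, d in combinations(consts, 2) if types[c] == types[d]]
    return (guards + rename(alpha), rename(beta))

d = ([("bird", ("tweety",), True), ("sameSpecies", ("tweety", "beeper"), True)],
     [("antarctic", ("beeper",), True)])
print(variabilize(d, {"tweety": "bird_t", "beeper": "bird_t"}))
# the antecedent gains the guard neq(V_beeper, V_tweety)
```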

Let \(\mathcal {C}\) be a set of constants and let \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) be the interchangeable partition of the constants from \(\mathcal {C}\). Two ground default rules \(\alpha _1 {\,\mid \!\sim \,}\beta _1\) and \(\alpha _2 {\,\mid \!\sim \,}\beta _2\) are said to be weakly isomorphic w.r.t. \(\mathcal {I}\) if \(\textit{variabilize}(\alpha _1 {\,\mid \!\sim \,}\beta _1)\) and \(\textit{variabilize}(\alpha _2 {\,\mid \!\sim \,}\beta _2)\) are isomorphic.

Definition 3

(Ground representatives). Let \(D = \alpha {\,\mid \!\sim \,}\beta \) be a default rule and let \(\mathcal {I} = \{ \mathcal {C}_1, \dots , \mathcal {C}_n \}\) be the interchangeable partition of constants. A set of ground representatives of D w.r.t. \(\mathcal {I}\) is a maximal set of groundings of D by constants from \(\bigcup _{\mathcal {C}_i \in \mathcal {I}}\mathcal {C}_i\) such that no two of these groundings are weakly isomorphic w.r.t. \(\mathcal {I}\). (If \(\alpha {\,\mid \!\sim \,}\beta \) is typed then we only consider groundings which respect the typing of variables.)

A set of ground representatives of a default rule \(D = \alpha {\,\mid \!\sim \,}\beta \) can be constructed in time \(O(|\mathcal {I}|^{|D|})\). While this is exponential in the size of the default rule (which is usually small), it is only polynomial in the number of classes in the interchangeable partition \(\mathcal {I}\) and does not depend on the total number of constants.

Let \(\varDelta \cup \varTheta \) be a first-order default theory and \(\mathcal {C}\) a set of constants. Let \(\mathcal {R} = \bigcup _{\alpha {\,\mid \!\sim \,}\beta \in \varDelta } \mathcal {R}_{\alpha {\,\mid \!\sim \,}\beta }\) where \(\mathcal {R}_{\alpha {\,\mid \!\sim \,}\beta }\) denotes a set of ground representatives of \(\alpha {\,\mid \!\sim \,}\beta \). The rational closure for the first-order default theory is based on the partition \(\varDelta _1^* \cup ... \cup \varDelta _k^*\) of the set

$$\varDelta ^* = \{ \textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta ) :\alpha {\,\mid \!\sim \,}\beta \in \mathcal {R} \text{ and } \{ \alpha \} \cup \varTheta \not \vdash \bot \} $$

where \(\varDelta ^*_j\) is the set of default rules \(\textit{variabilize}(\alpha {\,\mid \!\sim \,}\beta ) \in \varDelta ^* \setminus (\varDelta _1^* \cup \dots \cup \varDelta _{j-1}^*)\) such that

$$\begin{aligned} \{\alpha \wedge \beta \} \cup \{\lnot \alpha _i \vee \beta _i :(\alpha _i{\,\mid \!\sim \,}\beta _i) \in \varDelta ^* \setminus (\varDelta _1^*\cup ... \cup \varDelta _{j-1}^*) \} \cup \varTheta \end{aligned}$$
(3)

has a model with Herbrand universe \(\mathcal {C}\). When all rules \(\alpha ' {\,\mid \!\sim \,}\beta '\) from the set

$$\varDelta ^*_{\alpha {\,\mid \!\sim \,}\beta } = \{\textit{variabilize}(\alpha ' {\,\mid \!\sim \,}\beta ') :\alpha ' {\,\mid \!\sim \,}\beta ' \text{ is a ground representative of } \alpha {\,\mid \!\sim \,}\beta \} $$

are contained in the same partition class \(\varDelta _j^*\), then we can simplify \(\varDelta _j^*\) by setting \(\varDelta _j^* := \varDelta _j^* \cup \{ \alpha {\,\mid \!\sim \,}\beta \} \setminus \varDelta ^*_{\alpha {\,\mid \!\sim \,}\beta }\). Furthermore, note that checking the existence of Herbrand models can be carried out using cutting-plane inference, which means that the set of default rules seldom needs to be grounded completely. We can now present the lifted counterparts to Transformations 1–3.

Transformation 4

(Lifted rational closure). Let \(\varDelta \) be a set of default rules and let \(\varTheta \) be a set of hard constraints. Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \) defined by (3). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k (\{ (\lnot a_i \vee \lnot \alpha \vee \beta , \infty ) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i^* \} \cup \{ (a_i, 1) \}) \cup \{ (\phi , \infty ) :\phi \in \varTheta \} \cup \bigcup _{i=2}^k \{ (a_i \vee \lnot a_{i-1}, \infty ) \}\) where the \(a_i\) are auxiliary (ground) literals. If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).

Transformation 5

(Lifted lexicographic entailment). Let \(\varDelta \) be a set of default rules, let \(\varTheta \) be a set of hard constraints, and let \(\mathcal {U}\) be the considered set of constants. Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \) defined by (3). Let the MLN \(\mathcal {M}\) be defined as follows: \(\bigcup _{i=1}^k \{ (\lnot \alpha \vee \beta , \lambda _i) :\alpha {\,\mid \!\sim \,}\beta \in \varDelta _i^* \} \cup \{ (\phi , \infty ) :\phi \in \varTheta \}\) where \(\lambda _i = 1+\sum _{j=1}^{i-1} \sum _{\alpha {\,\mid \!\sim \,}\beta \in \varDelta ^*_j} |\mathcal {U}|^{|\textit{vars}(\alpha {\,\mid \!\sim \,}\beta )|} \cdot \lambda _j\) for \(i > 1\) and \(\lambda _1 = 1\). If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).
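The recurrence for the lifted weights can be checked numerically; the sketch below represents each stratum by the list of variable counts of its rules.

```python
def lifted_lex_weights(strata_var_counts, n_constants):
    # lambda_i = 1 + sum_{j<i} sum_{rules in Delta*_j} |U|^{#vars} * lambda_j
    lams = []
    for i in range(len(strata_var_counts)):
        lams.append(1 + sum(lams[j] * sum(n_constants ** v
                                          for v in strata_var_counts[j])
                            for j in range(i)))
    return lams

# Example 2 below: Delta*_1 holds one rule with one variable and |U| = 3,
# so lambda_2 = 1 + 3^1 = 4, matching the weights (phi_2, 4) and (phi_3, 4).
print(lifted_lex_weights([[1], [1, 2]], 3))  # [1, 4]
```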

Note that lexicographic entailment may lead to MLNs with very large weights.

Next, we describe a lifted variant of maximum-entropy entailment. Let \(\varDelta \cup \varTheta \) be a first-order default theory and \(\mathcal {I}\) the interchangeable partition of the constants from a given set \(\mathcal {C}\). Let \(\varDelta _1^* \cup \dots \cup \varDelta _k^*\) be the partition of \(\varDelta \cup \varTheta \), defined as in (3) (without the simplification of merging default rules), and let \(\varGamma := \varDelta ^*_1\). First, we construct an MLN \(\mathcal {M}\) containing the rules from \(\varGamma \) and set their weights equal to 1. For every \(\varDelta ^*_j\) with \(j > 1\), while \(\varDelta _j^* \not \subseteq \varGamma \), we repeat the following steps. We construct a new MLN \(\mathcal {M}'\) by adding to the MLN \(\mathcal {M}\) all rules from the set \(\{\lnot \alpha \vee \beta :\alpha {\,\mid \!\sim \,}\beta \in (\varDelta ^*_j \cup \dots \cup \varDelta _k^*)\setminus \varGamma \}\) as hard constraints (i.e. with infinite weights). For every \(\alpha {\,\mid \!\sim \,}\beta \in \varDelta _j^* \setminus \varGamma \), we construct its ground representative \(\alpha ' {\,\mid \!\sim \,}\beta '\) (note that, by the construction of \(\varDelta _j^*\), every rule in \(\varDelta _j^*\) has only one ground representative up to isomorphism) and we find a most probable world \(\omega _{\alpha {\,\mid \!\sim \,}\beta }\) of \((\mathcal {M}',\alpha ')\); let us write \(p_{\alpha {\,\mid \!\sim \,}\beta }\) for its penalty, i.e. the sum of the weights of the violated rules. Note that \(\omega _{\alpha {\,\mid \!\sim \,}\beta }\) verifies the default rule \(\alpha {\,\mid \!\sim \,}\beta \) and only falsifies rules in \(\varGamma \), exactly as in the propositional version of the maximum-entropy transformation. We then select the rules \(\alpha {\,\mid \!\sim \,}\beta \) with minimum penalty \(p_{\alpha {\,\mid \!\sim \,}\beta }\), and add them to \(\varGamma \) and to the MLN \(\mathcal {M}\) with their weight set to \(1+p_{\alpha {\,\mid \!\sim \,}\beta }\). If \(\mathcal {M}'\) has no models, the initial default theory is not satisfiable and we end the procedure.
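The following skeleton summarizes this procedure. The helpers `clause`, `ground_antecedent` and `map_penalty` are hypothetical: the first returns the material counterpart \(\lnot \alpha \vee \beta \) of a default, the second the grounded antecedent of its ground representative, and the third the penalty of a most probable world, as could be computed by any MAP engine for MLNs.

```python
INF = float("inf")

def lifted_max_entropy_mln(strata, theta, clause, ground_antecedent, map_penalty):
    # Sketch only: rules from Delta*_1 start with weight 1, later strata are
    # added with weight 1 + penalty of a most probable verifying world.
    mln = [(clause(r), 1.0) for r in strata[0]] + [(phi, INF) for phi in theta]
    done = set(strata[0])
    for j in range(1, len(strata)):
        while not set(strata[j]) <= done:
            rest = [r for s in strata[j:] for r in s if r not in done]
            hard = mln + [(clause(r), INF) for r in rest]   # the MLN M'
            penalty = {r: map_penalty(hard, ground_antecedent(r))
                       for r in strata[j] if r not in done}
            p_min = min(penalty.values())
            for r, p in penalty.items():                    # minimum-penalty rules
                if p == p_min:
                    mln.append((clause(r), 1 + p))
                    done.add(r)
    return mln
```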

Transformation 6

(Lifted maximum-entropy entailment). Let \(\varDelta \) be a set of default rules, let \(\varTheta \) be a set of hard constraints, and let \(\mathcal {U}\) be the considered set of constants. Let \(\mathcal {M}\) be the MLN obtained in the last iteration of the procedure described above. If \((\varDelta ,\varTheta )\) is satisfiable then \(\mathcal {M}\) is a Markov logic model of \((\varDelta ,\varTheta )\).

Example 2

Let us consider the following defaults:

$$\begin{aligned}&\textit{bird}(X){\,\mid \!\sim \,}\textit{flies}(X) \quad \quad \quad \quad \textit{bird}(X)\wedge \textit{antarctic}(X){\,\mid \!\sim \,}\lnot \textit{flies}(X)\\&\textit{bird}(X)\wedge \textit{antarctic}(X) \wedge (X \ne Y) \wedge \textit{sameSpecies}(X,Y){\,\mid \!\sim \,}\textit{antarctic}(Y) \end{aligned}$$

Let the set of constants be given by \(\mathcal {C}=\{ \textit{tweety},\textit{donald},\textit{beeper}\}\). Then the lexicographic transformation yields the MLN \(\{(\phi _1, 1),(\phi _2 , 4),(\phi _3, 4)\}\) while the maximum entropy transformation yields \(\{(\phi _1, 1),(\phi _2, 2),(\phi _3, 3) \}\), where \(\phi _1 =\lnot \textit{bird}(X) \vee \textit{flies}(X)\), \(\phi _2 = \lnot \textit{bird}(X)\vee \lnot \textit{sameSpecies}(X, Y)\vee \lnot (X \ne Y)\vee \lnot \textit{bird}(Y)\vee \lnot \textit{antarctic}(X)\vee \textit{antarctic}(Y)\) and \(\phi _3 = \lnot \textit{bird}(X) \vee \lnot \textit{antarctic}(X) \vee \lnot \textit{flies}(X)\).

As the next example illustrates, it is sometimes necessary to split the initial default rules into several typed specializations.

Example 3

Consider the following defaults: \(\textit{bird}(X) \wedge (X \ne \textit{tweety}) {\,\mid \!\sim \,}\textit{flies}(X)\), \(\textit{bird}(X) \wedge \textit{antarctic}(X) {\,\mid \!\sim \,}\lnot \textit{flies}(X)\) and \(\textit{bird}(X) \wedge \textit{antarctic}(X) \wedge (X \ne Y) \wedge \textit{sameSpecies}(X,Y) {\,\mid \!\sim \,}\textit{antarctic}(Y)\). Then the lexicographic transformation yields the MLN \(\{(\phi _1, 1), (\phi _2, 1), (\phi _3, 7), (\phi _4, 7)\}\), where:

$$\begin{aligned} \phi _1 =&\lnot \textit{bird}(\tau _{\textit{tweety}}:X) \vee \lnot \textit{antarctic}(\tau _{\textit{tweety}}:X) \vee \lnot \textit{flies}(\tau _{\textit{tweety}}:X), \\ \phi _2 =&\lnot \textit{bird}(\tau _{\textit{beeper}}:X) \vee \lnot (\tau _{\textit{beeper}}:X \ne \tau _{\textit{tweety}}:\textit{tweety}) \vee \textit{flies}(\tau _{\textit{beeper}}:X), \\ \phi _3 =&\lnot \textit{bird}(\tau _{\textit{beeper}}:X) \vee \lnot \textit{antarctic}(\tau _{\textit{beeper}}:X) \vee \lnot \textit{flies}(\tau _{\textit{beeper}}:X), \\ \phi _4 =&\lnot \textit{bird}(X) \vee \lnot \textit{sameSpecies}(X, Y) \vee \lnot (X \ne Y) \vee \lnot \textit{bird}(Y) \vee \lnot \textit{antarctic}(X) \vee \textit{antarctic}(Y) \end{aligned}$$

Note that the transformation had to introduce new types corresponding to the interchangeable sets of constants \(\{\{ \textit{tweety} \}, \{ \textit{beeper}, \textit{donald} \} \}\). The rule \(\phi _4\) was created by merging rules with different typing, which was made possible by the fact that all the respective differently typed rules ended up with the same weights. The maximum entropy transformation leads to six such rules.

5 Evaluation

In this section we describe an experimental evaluation of the methods presented in this paper. An implementation of the described methods is available for download.

We have evaluated the proposed methods using the well-known UW-CSE dataset, which describes the Department of Computer Science and Engineering at the University of Washington [20]. The usual task is to predict the \(\textit{advisedBy}(\textit{person},\textit{person})\) predicate from the other predicates. A set of rules for this domain was previously collected for the experiments in [20]. These rules, however, cannot be used as default rules because they are not satisfiable in the sense of Definition 1. Therefore, in order to evaluate our method, we have used the following consistent set of default rules.

$$\begin{aligned} D_1:&{\,\mid \!\sim \,}\lnot \textit{advisedBy}(\textit{S}, \textit{P})\\ D_2:&\textit{advisedBy}(\textit{S}, \textit{P}_1){\,\mid \!\sim \,}\lnot \textit{tempAdvisedBy}(\textit{S}, \textit{P}_2)\\ D_3:&\textit{advisedBy}(\textit{S}, \textit{P}) \wedge \textit{publication}(\textit{Pub}, \textit{S}){\,\mid \!\sim \,}\textit{publication}(\textit{Pub}, \textit{P})\\ D_4:&(\textit{P}_1 \ne \textit{P}_2) \wedge \textit{advisedBy}(\textit{S}, \textit{P}_1){\,\mid \!\sim \,}\lnot \textit{advisedBy}(\textit{S}, \textit{P}_2)\\ D_5:&\textit{advisedBy}(\textit{S}, \textit{P}) \wedge \textit{ta}(\textit{C}, \textit{S}, \textit{T}){\,\mid \!\sim \,}\textit{taughtBy}(\textit{C}, \textit{P}, \textit{T})\\ D_6:&\textit{professor}(\textit{P}) \wedge \textit{student}(\textit{S}) \wedge \textit{publication}(\textit{Pub}, \textit{P}) \wedge \textit{publication}(\textit{Pub}, \textit{S}) \\&{\,\mid \!\sim \,}\textit{advisedBy}(\textit{S}, \textit{P})\\ D_7:&\textit{professor}(\textit{P}) \wedge \textit{student}(\textit{S}) \wedge \textit{publication}(\textit{Pub}, \textit{P}) \wedge \textit{publication}(\textit{Pub}, \textit{S}) \wedge \\&\textit{tempAdvisedBy}(\textit{S}, \textit{P}_2){\,\mid \!\sim \,}\lnot \textit{advisedBy}(\textit{S}, \textit{P})\\ D_{8}:&(\textit{S}_1 \ne \textit{S}_2) \wedge \textit{advisedBy}(\textit{S}_2, \textit{P}) \wedge \textit{ta}(\textit{C}, \textit{S}_2, \textit{T}) \wedge \textit{ta}(\textit{C}, \textit{S}_1, \textit{T}) \wedge \\&\textit{taughtBy}(\textit{C}, \textit{P}, \textit{T}) \wedge \textit{student}(\textit{S}_1) \wedge \textit{professor}(\textit{P}){\,\mid \!\sim \,}\textit{advisedBy}(\textit{S}_1, \textit{P})\\ D_{9}:&(\textit{S}_1 \ne \textit{S}_2) \wedge \textit{advisedBy}(\textit{S}_2, \textit{P}) \wedge \textit{ta}(\textit{C}, \textit{S}_2, \textit{T}) \wedge \textit{ta}(\textit{C}, \textit{S}_1, \textit{T}) \wedge \\&\textit{taughtBy}(\textit{C}, \textit{P}, \textit{T}) \wedge \textit{student}(\textit{S}_1) \wedge \textit{professor}(\textit{P}) \wedge \textit{tempAdvisedBy}(\textit{S}_1, \textit{P}_2) \\&{\,\mid \!\sim \,}\lnot \textit{advisedBy}(\textit{S}_1, \textit{P}) \end{aligned}$$

Recall that default rules \(\alpha {\,\mid \!\sim \,}\beta \) in our setting correspond to statements of the form: for any grounding substitution \(\theta \), \(\beta \theta \) is true in all most probable worlds of \((\mathcal {M},\alpha \theta )\). Thus, the default rules \(\alpha {\,\mid \!\sim \,}\beta \) we consider should be such that an expert believes that, if \(\alpha \theta \) were the only available evidence, it would make sense to conclude \(\beta \theta \). With this perspective in mind, rule \(D_1\) states that, in the absence of any knowledge, we assume that persons S and P are not in the \(\textit{advisedBy}\) relationship. Rule \(D_2\) states that if we only know that S has an advisor then we conclude that S does not have a temporary advisor. Rule \(D_3\) states that advisors are typically co-authors of their students’ papers. Rule \(D_4\) states that students typically have only one advisor. The remaining rules can be interpreted similarly. Note that rules \(D_7\) and \(D_9\) encode exceptions to rules \(D_6\) and \(D_8\).

We computed the lexicographic and maximum-entropy transformations of these rules using our implementation of the described methods. We evaluated the obtained MLNs on the five different subject areas of the UW-CSE dataset, which is the standard methodology. Specifically, we computed the average number of true positives and false positives for the \(\textit{advisedBy}\) predicate over 10 runs of MAP inference, noting that the results can depend on the specific MAP state that is returned. For comparison, we have used an MLN with the same set of rules but with weights learned discriminatively using Tuffy [15] (LEARNED), and an MLN with the same set of rules but with all weights set to 1 (ONES). The results are shown in Table 1. Maximum-entropy and lexicographic entailment achieve the highest recall, but at the cost of a higher number of false positives. Note that the number of pairs which could potentially be in the \(\textit{advisedBy}\) relationship is in the order of hundreds or even thousands, whereas the true number of pairs in this relationship is in the order of tens. The baseline method ONES has the largest variance.

Table 1. Experimental results for MLNs obtained by the described methods (the numbers represent absolute counts).

6 Conclusion

We have discussed the problem of constructing a Markov logic network (MLN) from a set of first-order default rules, where default rules are seen as constraints on what should be derivable using MAP inference. The proposed construction methods have been obtained by lifting three well-known methods for non-monotonic reasoning about propositional default rules: the rational closure, the lexicographic closure and the maximum-entropy closure. As our evaluation with the UW-CSE dataset illustrates, our method can be used to construct useful MLNs in scenarios where no training data is available. In the future, we would like to explore the connections between our proposed lifted transformations and the lifted inference literature. For example, identifying interchangeable constants is known as shattering in lifted inference [6].