1 Introduction

Inductive logic is a formal approach to modelling rational uncertain inference. It seeks to analyse the degree to which premisses entail putative conclusions. Given uncertain premisses \(\varphi _1,\dots ,\varphi _k\) with attached uncertainties \(X_1,\dots ,X_k\), an inductive logic provides means to attach an uncertainty Y to a conclusion \(\psi \), where the \(X_i\) and Y are non-empty subsets of the unit interval. An inductive logic can be represented as

$$\begin{aligned} \varphi _1^{X_1},\dots ,\varphi _k^{X_k}\;{\mid \!\approx }\;\psi ^Y , \end{aligned}$$
(1)

where \({\mid \!\approx }\) denotes an inductive entailment relation [11]. Much work has gone into the development and exploration of inductive logics, see, e.g., [4, 9, 12, 16, 29,30,31, 41].

The main early proponent of inductive logic was Rudolf Carnap [2, 3]. The spirit of his approach continues today in the Pure Inductive Logic approach [14, 17, 21,22,23,24, 40]. In this paper, however, I consider uncertain inference within the maximum entropy framework, which goes back to Edwin Jaynes [15], who put forward a Maximum Entropy Principle governing rational uncertain inference.

Maximum Entropy Principle. Rational agents ought to use a probability function consistent with the evidence for drawing uncertain inferences. In case there is more than one such probability function, a rational agent ought to use probability functions with maximal entropy.

In case only a single probability function is consistent with the evidence, the Maximum Entropy Principle is uncontroversial. Its strength (and sometimes controversial nature) is rooted in applications with multiple prima facie reasonable probability functions for probabilistic inference. This principle is at the heart of objective Bayesianism.

If the underlying domain is finite, then applying the Maximum Entropy Principle for inductive entailment is straightforward and well understood due to the seminal work of Alena Vencovská & Jeff Paris [32,33,34, 36,37,38,39]. Matters change dramatically for infinite domains. Naively replacing the sum by an integral in the definition of Shannon entropy produces a great number of probability functions with infinite entropy. But then there is no way to pick a probability function with maximal entropy out of a set in which all functions have infinite entropy.

There are two different suggestions for an inductive logic on an infinite first order predicate language explicating the Maximum Entropy Principle: the Entropy Limit Approach [1, 35] and the Maximum Entropy Approach [28, 45,46,47,48]. It has been conjectured that both approaches agree in cases in which the former approach is well-defined [48, p. 191]. This conjecture has been shown to hold in a number of cases of evidence bases with relatively low quantifier complexity [19, 25, 44].

This paper introduces modifications of the Maximum Entropy Approach and studies their relationships. I next properly introduce this approach along with some notation and the modifications. I then proceed to investigate their relationships. My main result is Theorem 1; it proves that the two suggested modifications agree with the original Maximum Entropy Approach expounded by Jon Williamson for convex optimisation problems, if at least one of these three approaches yields a unique probability function for inference on the underlying first order predicate language.

In Sect. 4, I study a third modification of Williamson’s Maximum Entropy Approach, which I reject due to the absurd probabilities it delivers for inductive inference, see Proposition 6. In Sect. 5, I put forward some concluding remarks and consider avenues for future research.

2 The Maximum Entropy Approach and Two Modifications

The formal framework and notation are adapted from [25].

A fixed first order predicate language \({\mathcal L}\) is given. It consists of countably many constant symbols \(t_{1},t_{2},\ldots \) exhausting the universe (for every element in the universe there is at least one constant symbol representing it) and finitely many relation symbols, \(U_1, \ldots , U_n\). In particular, note that the language does not contain a symbol for equality nor does it contain function symbols. The atomic sentences are sentences of the form \(U_{i} t_{i_{1}}\ldots t_{i_{k}}\), where k is the arity of the relation \(U_{i}\); the atomic sentences are denoted by \(a_{1},a_{2},\ldots \). They are by construction ordered in such a way that atomic sentences involving only constants among \(t_{1},\ldots ,t_n\) occur before those atomic sentences that also involve \(t_{n+1}\). The set of sentences of \({\mathcal L}\) is denoted by \(S{\mathcal L}\).

The finite sublanguages \({\mathcal L}_{n}\) of \({\mathcal L}\) are the languages which contain only the first n constant symbols \(t_{1},\ldots ,t_{n}\) and the same relation symbols as \({\mathcal L}\). Denote the sentences of \({\mathcal L}_{n}\) by \(S{\mathcal L}_{n}\).

The contingent conjunctions of maximal length of the form \({\scriptstyle \pm }a_{1}\wedge \ldots \wedge {\scriptstyle \pm }a_{r_{n}}\in S\mathcal L_n\), where \(r_n\) denotes the number of atomic sentences of \(\mathcal L_n\), are called the n-states. Let \(\varOmega _{n}\) be the set of n-states for each \(n\in \mathbb N\); note that \(|\varOmega _{n}| = 2^{r_{n}}\).

Definition 1

(Probabilities on Predicate Languages). A probability function P on \({\mathcal L}\) is a function \(P:S{\mathcal L}\longrightarrow \mathbb R_{\! \ge \! 0}\) such that:

  • P1: If \(\tau \) is a tautology, i.e., \(\models \tau \), then \(P(\tau )=1\).

  • P2: If \(\theta \) and \(\varphi \) are mutually exclusive, i.e., \(\models \lnot (\theta \wedge \varphi )\), then \(P(\theta \vee \varphi ) = P(\theta ) + P(\varphi )\).

  • P3: \(P\left( \exists x\theta (x)\right) = \sup _m P\left( \bigvee _{i=1}^m \theta (t_{i})\right) \).

A probability function on \({\mathcal L}_{n}\) is defined similarly (the supremum in P3 is dropped and m is equal to n).

\(\mathbb P\) denotes the set of all probability functions on \({\mathcal L}\).

A probability function \(P\in \mathbb P\) is determined by the values it gives to the quantifier-free sentences; this result is known as Gaifman’s Theorem [8]. P3 is sometimes called the Gaifman Condition [40, p. 11]. The Gaifman Condition is justified by the assumption that the constants exhaust the universe. Consequently, a probability function is determined by the values it gives to the n-states, for each n [33, p. 171].

It is thus sensible to measure the entropy of a probability function \(P\in \mathbb P\) via n-states with varying n.

Definition 2

(n-entropy). The n-entropy of a probability function \(P\in \mathbb P\) is defined as:

$$\begin{aligned} H_n(P):{=} -\sum _{\omega \in \varOmega _{n}} P(\omega ) \log P(\omega ) . \end{aligned}$$

The usual conventions apply: \(0 \log 0:=0\) and \(\log \) denotes the natural logarithm. The second convention is inconsequential for current purposes.

\(H_n(\cdot )\) is a strictly concave function.
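To make Definition 2 concrete, here is a minimal Python sketch (my own illustration, not part of the formal development) that computes the n-entropy from the probabilities a function assigns to the n-states and numerically exhibits the strict concavity of \(H_n\):

```python
import math

def n_entropy(state_probs):
    """H_n of a probability function, given as the list of probabilities it
    assigns to the n-states in Omega_n (non-negative, summing to 1).
    Conventions: 0*log 0 := 0 and log is the natural logarithm."""
    return -sum(p * math.log(p) for p in state_probs if p > 0)

# Two different assignments to the four 2-states of a toy unary language and
# their midpoint; by strict concavity the midpoint's entropy exceeds the
# average of the two entropies.
P = [0.7, 0.1, 0.1, 0.1]
Q = [0.1, 0.1, 0.1, 0.7]
M = [(p + q) / 2 for p, q in zip(P, Q)]
print(n_entropy(M) > 0.5 * (n_entropy(P) + n_entropy(Q)))  # True
```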

The key idea is to combine the n-entropies defined on finite sublanguages \(\mathcal L_n\) into an overall notion of comparative entropy comparing probability functions P and Q defined on the entire first order language.

So far, the literature has only studied such inductive logics with respect to the first binary relation in the following definition.

Definition 3

(Comparative Notions of Entropy). That a probability function \(P\in \mathbb P\) has greater (or equal) entropy than a probability function \(Q\in \mathbb P\) could be defined in the following three ways.

  1. If and only if there is some natural number N such that for all \(n \ge N\) it holds that \(H_{n}(P) > H_{n}(Q)\), denoted by \(P\succ Q\).

  2. If and only if there is some natural number N such that for all \(n \ge N\) it holds that \(H_{n}(P) \ge H_{n}(Q)\) and there are infinitely many n such that \(H_n(P)> H_n(Q)\), denoted by P]Q.

  3. If and only if there is some natural number N such that for all \(n \ge N\) it holds that \(H_{n}(P) \ge H_{n}(Q)\), denoted by P)Q.

The latter two definitions are alternative ways in which one could explicate the intuitive idea of comparative entropy; they have not been studied before. Prima facie, all three definitions appear reasonable.
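Since all three notions concern the infinite tail of the sequence \((H_n(P))_{n\in \mathbb N}\), no finite computation can verify them; still, the following heuristic Python sketch over a finite window may help fix the definitions. The names succ, bracket and paren stand for \(\succ \), ] and ) and are labels of my choosing:

```python
# Heuristic checks of the three comparative notions of entropy over a finite
# window [start, horizon]; the actual definitions quantify over the infinite tail.
def succ(hP, hQ, start=1, horizon=1000):
    # H_n(P) > H_n(Q) from some N onwards (approximated on the window)
    return all(hP(n) > hQ(n) for n in range(start, horizon + 1))

def bracket(hP, hQ, start=1, horizon=1000):
    # H_n(P) >= H_n(Q) eventually, with infinitely many strict inequalities (approximated)
    return (all(hP(n) >= hQ(n) for n in range(start, horizon + 1))
            and any(hP(n) > hQ(n) for n in range(start, horizon + 1)))

def paren(hP, hQ, start=1, horizon=1000):
    # H_n(P) >= H_n(Q) from some N onwards (approximated on the window)
    return all(hP(n) >= hQ(n) for n in range(start, horizon + 1))

# n-entropies equal at even n and strictly larger for P at odd n (cf. Remark 3 below):
hP = lambda n: 1.0
hQ = lambda n: 1.0 if n % 2 == 0 else 0.5
print(succ(hP, hQ), bracket(hP, hQ), paren(hP, hQ))  # False True True
```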

Before I can define notions of maximal entropy with respect to the given premisses, I need to specify the set of probability functions over which entropy is to be maximised.

Definition 4

(Region of Feasible Probability Functions). The set of probability functions consistent with all premisses, \(P(\varphi _i)\in X_i\) for all i, is denoted by \(\mathbb E\) and defined as

$$\begin{aligned} \mathbb E:=\{P\in \mathbb P\;:\;P(\varphi _i)\in X_i\text { for all }1\le i\le k\} . \end{aligned}$$

In order to simplify the notation, I do not display the dependence of \(\mathbb E\) on the premisses.

In this paper, I only consider a fixed set of premisses, \(\varphi _1^{X_1},\dots ,\varphi _k^{X_k}\). There is hence no need to complicate notation by writing \(\mathbb E_{\varphi _1^{X_1},\dots ,\varphi _k^{X_k}}\) or similar.

Definition 5

(Set of Maximum Entropy Functions). The set of probability functions on \({\mathcal L}\) with maximal entropy in \(\mathbb E\) relative to a notion of comparative entropy > defined on \(\mathbb P\times \mathbb P\) can then be defined as

$$\begin{aligned} {\text {maxent}}_> \mathbb E :{=} \{P\in \mathbb E :\text { there is no } Q\in \mathbb E\setminus \{P\}\text { with } Q> P \} . \end{aligned}$$
(2)

The Maximum Entropy Principle now compels agents to use probabilities in \({\text {maxent}}_> \mathbb E\) for drawing uncertain inferences as described by the scheme for inductive logic in (1). The induced inductive logics are described in the following definition.

Definition 6

(Maximum Entropy Inductive Logics). An inductive logic with respect to > is induced by attaching uncertainty \(Y_>(\psi )\subseteq [0,1]\) to the sentences \(\psi \) of \(\mathcal L\) via

$$\begin{aligned} Y_>(\psi )&:=\{r\in [0,1]\;|\;\text { there exists }P\in {\text {maxent}}_>\mathbb E\text { with }P(\psi )=r\} . \end{aligned}$$

In case there are two or more different probability functions in \({\text {maxent}}_>\mathbb E\), there are some sentences \(\psi \) of \(\mathcal L\) to which multiple different probabilities attach.
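To illustrate Definitions 5 and 6, the following Python sketch computes \({\text {maxent}}_>\) and \(Y_>(\psi )\) when \(\mathbb E\) is idealised as a finite pool of candidate functions, each represented by the values it gives to the finitely many sentences of interest. The finite pool and the names are my simplification; in the paper \(\mathbb E\) is typically an infinite set:

```python
from typing import Callable, Dict, List, Set

Prob = Dict[str, float]  # a candidate function, keyed by the sentences of interest

def maxent(E: List[Prob], greater: Callable[[Prob, Prob], bool]) -> List[Prob]:
    """maxent_> E: members of E that are not >-dominated by any other member of E."""
    return [P for P in E if not any(greater(Q, P) for Q in E if Q is not P)]

def Y(psi: str, maxent_set: List[Prob]) -> Set[float]:
    """The uncertainty Y_>(psi) attached to psi by the induced inductive logic."""
    return {P[psi] for P in maxent_set}
```

With greater encoding, say, \(\succ \) via the n-entropies of the candidates, Y('psi', maxent(E, greater)) collects every probability that some maximally entropic feasible function gives to psi.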

The Maximum Entropy Approach arises using \(\succ \) for comparing entropies of probability functions in \(\mathbb P\) [28, 45,46,47,48].

Remark 1

(Comparisons to the Entropy Limit Approach). It is known that the Entropy Limit Approach is invariant under arbitrary permutations of the constant symbols [25, Footnote 2]. The Entropy Limit Approach, however, suffers from a finite model problem: a premiss sentence formalising the existence of an infinite order does not have a finite model [42, Sect. 4.1], and hence the entropy limit is undefined.

The three Maximum Entropy Approaches defined in Definition 6 are invariant under finite permutations of constant symbols: the three notions of comparative entropy depend only on the limiting behaviour of \(H_n\) (Definition 3), and n-entropy is invariant under permutations of the first n constant symbols. Whether these Maximum Entropy Approaches are invariant under infinite permutations is still to be determined. Since these approaches do not make use of finite models, they are immune to the finite model problem. See [19, 25] for more detailed comparisons and further background.

Remark 2

(Maximum Entropy Functions). Determining maximum entropy functions is an often difficult endeavour. In concrete applications, it is often easier to determine the entropy limit first and then show that the entropy limit also has maximal entropy. See [25] for an overview of the cases in which a maximum entropy function exists and is unique.

Trivially, if the equivocator function \(P_=\in \mathbb P\), which for all n assigns every n-state the same probability \(1/|\varOmega _n|\), is in \(\mathbb E\), then \(\{P_=\}={\text {maxent}}_\succ \mathbb E\).

In a forthcoming paper [26], we show that if there is only a single premiss \(\varphi \) such that \(0<P_=(\varphi )<1\), then the maximum entropy function is obtained by (Jeffrey) updating the equivocator function. For the premiss \(\varphi ^c\) with \(0\le c\le 1\) it holds that \(\{cP_=(\cdot |\varphi _N)+(1-c)P_=(\cdot |\lnot \varphi _N)\}={\text {maxent}}_\succ \mathbb E\), where N is maximal such that \(t_N\) occurs in \(\varphi \) and \(\varphi _N\) is the disjunction of those N-states \(\omega _N\) for which \(P_=(\varphi \wedge \omega _N)>0\).
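This recipe can be illustrated numerically for a quantifier-free premiss on a toy unary language. The following Python sketch (illustrative names and numbers are my own) Jeffrey-updates the equivocator on \(\varOmega _2\) for the premiss \((Ut_1\vee Ut_2)^{0.8}\), so that \(N=2\):

```python
import itertools

# 2-states of a unary language with one predicate U and constants t1, t2,
# encoded as truth-value pairs (U t1, U t2).
states = list(itertools.product([True, False], repeat=2))
equivocator = {w: 1 / len(states) for w in states}       # P_= restricted to Omega_2

phi = lambda w: w[0] or w[1]                             # premiss U t1 v U t2, hence N = 2
c = 0.8                                                  # attached uncertainty: phi^c

mass = sum(p for w, p in equivocator.items() if phi(w))  # P_=(phi_N) = 3/4
updated = {w: (c * p / mass if phi(w) else (1 - c) * p / (1 - mass))
           for w, p in equivocator.items()}              # c P_=(.|phi_N) + (1-c) P_=(.|~phi_N)
print(updated, sum(updated.values()))                    # each phi-state gets c/3; total is 1
```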

Cases with multiple uncertain premisses are, in general, still poorly understood.

In the next section, I study (the relationships of) these binary relations and the arising inductive logics. Particular attention is paid to the case of a unique probability function for inference, \(|{\text {maxent}}_>\mathbb E|=1\). These cases are of particular interest, since they deliver well-defined (unique) probabilities for inductive inference.

3 Maximal (Modified) Entropy

I first consider two notions of refinement relating these three binary relations.

Definition 7

(Strong Refinement). > is called a strong refinement of \(\gg \), if and only if the following hold

  • > is a refinement of \(\gg \), i.e., for all \(P,Q\in \mathbb P\) it holds that \(P\gg Q\) entails \(P>Q\),

  • for all \(R,P,Q\in \mathbb P\) it holds that, if \(R\gg P\) and \(P>Q\) are both true, then \(R\gg Q\) and \(R\ne Q\).

Definition 8

(Centric Refinement). A refinement > of \(\gg \) is called centric, if and only if for all different \(R,P\in \mathbb P\) with \(R>P\) it holds that \((R+P)/2\gg P\).

The name centric has been chosen to emphasise that the centre between R and P is greater than P.

Clearly, not all binary relations possess strong refinements; not all binary relations possess centric refinements.

Proposition 1

(Strong and Centric Refinements). ] is a strong and centric refinement of \(\succ \). ) is a strong and centric refinement of ] and of \(\succ \).

Proof

For ease of comparison, I now display the three notions of comparative entropy line by line. The first line defines \(P\succ Q\), the second line P]Q and the third line P)Q. The second conjunct in the first definition is superfluous, as is the second conjunct in the third definition:

$$ \begin{aligned}&H_n(P)\le H_n(Q)\text { not infinitely often} \, \& \, H_n(P)> H_n(Q)\text { infinitely often}\\&H_n(P)< H_n(Q)\text { not infinitely often} \, \& \, H_n(P)> H_n(Q)\text { infinitely often}\\&H_n(P)< H_n(Q)\text { not infinitely often} \, \& \, H_n(P)\ge H_n(Q)\text { infinitely often} . \end{aligned}$$

Spelling out the three comparative notions of entropy in this way, one easily observes that \(P\succ Q\) entails P]Q, and that P]Q entails P)Q. This establishes the refinement relationships.

Strong Refinements. Next note that, if \(R\succ Q\) or if R]Q, then \(R\ne Q\).

] is a strong refinement of \(\succ \): Let \(R\succ P\) and P]Q. \(H_n(R)\le H_n(Q)\) is true for at most finitely many n, since from some N onwards R has always greater n-entropy than P and P has always greater or equal n-entropy than Q. So, \(R\succ Q\) and hence, by the preceding note, \(R\ne Q\).

) is a strong refinement of ]: Let R]P and P)Q. From some N onwards, R has greater or equal n-entropy than P and P has greater or equal n-entropy than Q. Moreover, there are infinitely many \(n\ge N\) such that \(H_n(R)>H_n(P)\), and for these n it holds that \(H_n(R)>H_n(Q)\). So, R]Q and hence, by the preceding note, \(R\ne Q\).

) is a strong refinement of \(\succ \): Let \(R\succ P\) and P)Q. From some N onwards P has always greater or equal n-entropy than Q. From some \(N'\) onwards R has always greater n-entropy than P. Hence, \(H_n(R)\le H_n(Q)\) can only be the case for finitely many \(n\in \mathbb N\). So, \(R\succ Q\) and hence, by the preceding note, \(R\ne Q\).

Centric Refinements. First, note that different probability functions disagree on some quantifier-free sentence \(\varphi \in S\mathcal L_N\) for some N (Gaifman’s Theorem [8]). Since \(\varphi \in S\mathcal L_{n+N}\) for all \(n\ge 1\), these probability functions also disagree on all more expressive sublanguages \(\mathcal L_{n+N}\), and hence they assign different probabilities to some of the \((n+N)\)-states.

] is a centric refinement of \(\succ \): Fix arbitrary probability functions R, P defined on \(\mathcal L\) with R]P; then \(R\ne P\). From the strict concavity of the function \(H_n\) it follows that \(H_n(\frac{R+P}{2})>H_n(P)\) whenever \(H_n(R)\ge H_n(P)\) and R and P differ on \(\varOmega _n\). By definition of ], there are only finitely many n for which \(H_n(R)\ge H_n(P)\) fails to hold, and by the preceding note R and P differ on \(\varOmega _n\) for all sufficiently large n. Hence, \(\frac{R+P}{2}\succ P\) by definition of \(\succ \).

) is a centric refinement of \(\succ \): Fix arbitrary probability functions R, P defined on \(\mathcal L\) with R)P and \(R\ne P\) (note that R)P alone would not rule out \(R=P\); Definition 8 only concerns different R and P). From the strict concavity of the function \(H_n\) it follows that \(H_n(\frac{R+P}{2})>H_n(P)\) whenever \(H_n(R)\ge H_n(P)\) and R and P differ on \(\varOmega _n\). By definition of ), there are only finitely many n for which \(H_n(R)\ge H_n(P)\) fails to hold, and by the preceding note R and P differ on \(\varOmega _n\) for all sufficiently large n. Hence, \(\frac{R+P}{2}\succ P\) by definition of \(\succ \).

) is a centric refinement of ]: Fix arbitrary probability functions R, P defined on \(\mathcal L\) with R)P and \(R\ne P\). Since \(\frac{R+P}{2}\succ P\) (see the previous case) and since ] is a refinement of \(\succ \), it holds that \(\frac{R+P}{2}] P\).    \(\square \)
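The concavity step used repeatedly in the centric-refinement arguments, namely that \(H_n(\frac{R+P}{2})>H_n(P)\) whenever R and P assign different probabilities to the n-states and \(H_n(R)\ge H_n(P)\), can be spot-checked numerically. A small Python sketch of my own, sampling random assignments to four n-states:

```python
import math, random

def H(ps):  # Shannon entropy with 0*log 0 := 0 and natural logarithm
    return -sum(p * math.log(p) for p in ps if p > 0)

def random_dist(k):
    xs = [random.random() for _ in range(k)]
    s = sum(xs)
    return [x / s for x in xs]

# If H_n(R) >= H_n(P) and R, P differ on the n-states, strict concavity gives
# the midpoint strictly greater n-entropy than P.
for _ in range(10_000):
    R, P = random_dist(4), random_dist(4)
    if H(R) >= H(P):
        M = [(r + p) / 2 for r, p in zip(R, P)]
        assert H(M) > H(P)
print("no counterexample found")
```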

Remark 3

(Properties of Comparative Entropies). If \(H_n(P)=H_n(Q)\) for all even n and \(H_n(P)>H_n(Q)\) for all odd n, then P]Q and \(P\nsucc Q\). Hence, ] is a proper refinement of \(\succ \).

For \(P=Q\) it holds that P)Q and Q)P. Hence, ) is a proper refinement of ] and thus a proper refinement of \(\succ \).

] is transitive, irreflexive, acyclic and asymmetric. ) is transitive, reflexive and has non-trivial cycles: e.g., for all probability functions P, Q with zero n-entropy for all \(n\in \mathbb N\), \(H_n(P)=H_n(Q)=0\), it holds that P)Q and Q)P.

I now turn to entropy maximisation and the induced inductive logics.

Proposition 2

(Downwards Uniqueness). Let > be a strong refinement of \(\gg \). If \({\text {maxent}}_\gg \mathbb E=\{Q\}\), then \(\{Q\}={\text {maxent}}_\gg \mathbb E={\text {maxent}}_>\mathbb E\).

In case the inductive logic induced by \(\gg \) as a notion of comparative entropy provides a unique probability function for rational inference, the inductive logic induced by > provides the very same unique probability function.

Proof

Note at first that since > is a refinement of \(\gg \) it holds that

$$\begin{aligned} {\text {maxent}}_>\mathbb E\subseteq {\text {maxent}}_\gg \mathbb E . \end{aligned}$$
(3)

Maximal elements according to \(\gg \) may fail to be maximal according to >, but all maximal elements according to > are also maximal according to \(\gg \).

Assume for the purpose of deriving a contradiction that \(Q\notin {\text {maxent}}_>\mathbb E\). Then, there has to exist a \(P\in \mathbb E\setminus \{Q\}\) such that \(P>Q\) but \(P\gg Q\) fails to hold (\(\{Q\}={\text {maxent}}_\gg \mathbb E\) holds by assumption).

However, since \(\{Q\}={\text {maxent}}_\gg \mathbb E\) and \(P\ne Q\), P cannot have maximal \(\gg \)-entropy; there hence has to exist some \(R\in \mathbb E\setminus \{P\}\) such that \(R\gg P\). We thus have \(R\gg P\) and \(P>Q\). Since > is a strong refinement of \(\gg \), we obtain \(R\gg Q\) and \(R\ne Q\). Since \(R\in \mathbb E\), it follows from the definition of \({\text {maxent}}_\gg \) that \(Q\notin {\text {maxent}}_\gg \mathbb E\). Contradiction. So, \(Q\in {\text {maxent}}_>\mathbb E\).

Since \(\{Q\}={\text {maxent}}_\gg \mathbb E{\mathop {\supseteq }\limits ^{(3)}}{\text {maxent}}_>\mathbb E\ni Q\), it follows that \({\text {maxent}}_>\mathbb E=\{Q\}\).    \(\square \)

The converse is also true for convex \(\mathbb E\) and centric refinements.

Proposition 3

(Upwards Uniqueness). If \(\mathbb E\) is convex, > is a centric refinement of \(\gg \) and \({\text {maxent}}_>\mathbb E=\{Q\}\), then \(\{Q\}={\text {maxent}}_\gg \mathbb E={\text {maxent}}_>\mathbb E\).

Proof

Assume for contradiction that there exists a feasible probability function \(P\in \mathbb E\setminus \{Q\}\) such that P is not \(\gg \)-dominated by the probability functions in \(\mathbb E\) but >-dominated by some \(R\in \mathbb E\setminus \{P\}\), \(R>P\). Now define \(S=\frac{1}{2}(P+R)\) and note that \(S\in \mathbb E\) (convexity) and that SPR are pairwise different, \(|\{S,P,R\}|=3\).

Since > is a centric refinement of \(\gg \), we conclude that \(S\gg P\), which contradicts the assumption that P is not \(\gg \)-dominated by the probability functions in \(\mathbb E\). So, only Q can be in \({\text {maxent}}_\gg \mathbb E\).

Since \(Q\in {\text {maxent}}_>\mathbb E\) and \({\text {maxent}}_>\mathbb E{\mathop {\subseteq }\limits ^{(3)}}{\text {maxent}}_\gg \mathbb E\) it follows that \(\{Q\}={\text {maxent}}_\gg \mathbb E\).    \(\square \)

Theorem 1

(Triple Uniqueness). If \(\mathbb E\) is convex and at least one of \({\text {maxent}}_)\mathbb E\), \({\text {maxent}}_]\mathbb E\) or \({\text {maxent}}_\succ \mathbb E\) is a singleton, then

$$\begin{aligned} {\text {maxent}}_)\mathbb E={\text {maxent}}_]\mathbb E={\text {maxent}}_\succ \mathbb E . \end{aligned}$$

Proof

Simply apply the above three propositions.    \(\square \)

It is known that \({\text {maxent}}_\succ \mathbb E\) is a singleton in the case of a certain \(\varSigma _1\) premiss [44], a class of \(\varPi _1\) premisses [25] and a class of constraints for unary languages [42]. As remarked above, we show in a forthcoming paper [26] that for a single premiss \(\varphi ^c\) with \(0<P_=(\varphi )<1\) and \(0\le c\le 1\) there exists a unique maximum entropy function in \({\text {maxent}}_\succ \mathbb E\), which is obtained by suitably (Jeffrey) updating the equivocator.

Premiss sentences \(\varphi \) with \(P_=(\varphi )=0\) and cases with multiple uncertain premisses – on the other hand – are still poorly understood.

4 Modification Number 3

The Maximum Entropy Approach, in its original formulation, fails to provide probabilities for uncertain inference for certain evidence bases of quantifier complexity \(\varSigma _2\) [43, § 2.2]. For example, for the single certain premiss \(\varphi :=\exists x\forall y Uxy\), every probability function P consistent with the evidence must assign \(\varphi \) probability one: \(P\in \mathbb E\) entails \(P(\varphi )=1\). There must hence be at least one constant \(t_k\) witnessing the existential quantifier, \(P(\forall y Ut_ky)>0\). Ceteris paribus, the \((k-1)\)-entropy of such functions increases the greater the k such that \(t_k\) is a witness. Suppose for contradiction that there exists a maximum entropy function \(P\in \mathbb E\) with \(t_k\) as the first witness. Construct a probability function Q by postponing the witness by one, so that \(t_{k+1}\) is the first witness of the premiss according to Q. Q can be constructed such that \(H_n(Q)\ge H_n(P)\) for all n. One can then show that \(R:=\frac{Q+P}{2}\) has strictly greater n-entropy than P for all \(n\ge k+1\) (strict concavity of \(H_n\)). It is hence the case that for all \(P\in \mathbb E\) there exists an \(R\in \mathbb E\) such that \(R\succ P\); \({\text {maxent}}_\succ \mathbb E\) is hence empty. See the forthcoming [26] for more details and a proof.
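To spell out the first step of this argument: since \(P(\varphi )=1\), P3 together with finite additivity (P2) yields

$$\begin{aligned} 1 = P(\exists x\forall y\, Uxy) = \sup _m P\Big (\bigvee _{i=1}^m \forall y\, Ut_iy\Big ) \le \sup _m \sum _{i=1}^m P(\forall y\, Ut_iy) , \end{aligned}$$

so \(P(\forall y\, Ut_ky)>0\) must hold for at least one k.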

Theorem 1 shows that inductive logics induced by ) and ] also do not produce a unique probability for uncertain inference for the certain premiss \(\varphi =\exists x\forall y Uxy\).

From the perspective of this paper, we see that the inductive logic of the standard Maximum Entropy Approach fails to deliver well-defined probabilities for inference for the certain premiss \(\varphi =\exists x\forall y Uxy\), because the relation \(\succ \) holds for too many pairs of probability functions. Proceeding in the spirit of this paper, it seems sensible to define a modified inductive logic induced by a notion of comparative entropy, \(\}\), which holds for fewer pairs of probability functions. That is, \(\}\) is refined by \(\succ \).

Closest to the spirit of Definition 3 is to define \(P\}Q\) as follows.

Definition 9

(Modification Number 3). \(P\}Q\), if and only if \(H_n(P)>H_n(Q)\) for all \(n\in \mathbb N\).

Clearly, the other three notions of comparative entropy are refinements of \(\}\).

Proposition 4

(Comparison of \(\}\) vs. \(\succ \), ], )). None of the three binary relations \(\succ ,],)\) is a strong refinement of \(\}\), and none is a centric refinement of \(\}\).

Proof

Consider three pairwise different probability functions P, Q, R with i) \(H_n(P)>H_n(Q)\) for all n, ii) \(\frac{H_n(P)}{H_n(Q)}\approx 1\), iii) \(H_1(Q)=H_1(R)-\delta \) for large \(\delta >0\) and iv) \(H_n(Q)>H_n(R)\) for all \(n\ge 2\).

Then \(P\}Q\) and \(Q\succ R, Q]R,Q)R\) all hold. Now note that \(H_1(P)<H_1(R)\) and thus \(P\}R\) fails to hold. Hence, none of \(\succ ,],)\) is a strong refinement of \(\}\). Finally, observe that \(\frac{Q+R}{2}\}R\) fails to hold: for large \(\delta \), the 1-entropy of \(\frac{Q+R}{2}\) remains below \(H_1(R)\). Hence, none of \(\succ ,],)\) is a centric refinement of \(\}\).    \(\square \)
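The final observation can be made concrete with a numerical instance: when \(H_1(Q)\) lies far below \(H_1(R)\), the 1-entropy of the midpoint \(\frac{Q+R}{2}\) also stays below \(H_1(R)\), so \(\frac{Q+R}{2}\}R\) already fails at \(n=1\). A minimal Python sketch with illustrative numbers of my own choosing:

```python
import math

def H(ps):  # Shannon entropy, natural logarithm, 0*log 0 := 0
    return -sum(p * math.log(p) for p in ps if p > 0)

R1 = [0.5, 0.5]    # restriction of R to the two 1-states: maximal 1-entropy
Q1 = [0.99, 0.01]  # restriction of Q: 1-entropy smaller by a large delta
M1 = [(q + r) / 2 for q, r in zip(Q1, R1)]

print(H(M1) > H(R1))  # False: the midpoint's 1-entropy stays below H_1(R)
```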

The aim here is to define a different inductive logic producing well-defined probabilities for the premiss sentence \(\varphi =\exists x\forall y Uxy\) (and other premiss sentences). Propositions 2 and 3, which underpin Theorem 1, show that a different logic can only arise if none of the other three notions of comparative entropy is a strong and centric refinement of \(\}\). Proposition 4 shows that none of these notions is such a refinement. It is hence in principle possible that \(\}\) defines a novel inductive logic.

The following proposition shows that this is not only possible in principle, by providing a case in which the induced inductive logics do come apart.

Proposition 5

(Inductive Logics). The binary relation \(\}\) induces a different inductive logic than \(\succ ,],)\).

Proof

Let U be the only relation symbol of \(\mathcal L\) and let it be unary. Suppose there is no evidence; then all probability functions are consistent with the empty set of premisses, \(\mathbb E=\mathbb P\). Then every \(P\in \mathbb P\) with \(P(Ut_1)=P(\lnot Ut_1)=0.5\) has maximal 1-entropy. Hence, all such P are members of \({\text {maxent}}_\}\mathbb E\). For \(\Box \in \{\succ ,],)\}\) it holds that \({\text {maxent}}_\Box \mathbb E=\{P_=\}\). So, \({\text {maxent}}_\Box \mathbb E\) is a proper subset of \({\text {maxent}}_\}\mathbb E\):

$$\begin{aligned} {\text {maxent}}_\Box \mathbb E=\{P_=\}\subset \{P\in \mathbb P\;:\;P(Ut_1)=P(\lnot Ut_1)\}\subseteq {\text {maxent}}_\}\mathbb E . \end{aligned}$$

   \(\square \)

This proof leads to the following more general observation:

Proposition 6

(Finite Sublanguages). If there exists an \(n\in \mathbb N\) and a \(P\in \mathbb E\) such that \(H_n(P)=\max \{H_n(Q)\,:\,Q\in \mathbb E\}\), then \(P\in {\text {maxent}}_\}\mathbb E\).

This strong focus on single sublanguages \(\mathcal L_n\) makes \({\text {maxent}}_\}\) unsuitable as an inductive logic for infinite predicate languages, as the following example demonstrates.

Example 1

(Absurdity of Modification 3). Consider the case in which the premisses jointly determine the probabilities on \(\mathcal L_1\). For example, the given language \(\mathcal L\) contains two relation symbols: a unary relation symbol \(U_1\) and a binary relation symbol \(U_2\). The premisses are that \(U_2t_1t_1\) holds with certainty and that \(P(U_1t_1)=10\%\). Then every probability function \(P\in \mathbb P\) that satisfies these two premisses has a 1-entropy of \(H_1(P)=-0.9\cdot \log (0.9)-0.1\cdot \log (0.1)\). So, \(H_1(P)=\max \{H_1(Q)\,:\,Q\in \mathbb E\}\). This means that every feasible probability function (of which there are many) is a maximum entropy function – regardless of how entropic (or not) probabilities are assigned to the other sublanguages \(\mathcal L_n\) for \(n\ge 2\):

$$\begin{aligned} {\text {maxent}}_\}\mathbb E=\mathbb E . \end{aligned}$$
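A quick numerical check of this example (with my own encoding of the four 1-states) confirms that every feasible probability function has exactly the same 1-entropy, which is why Proposition 6 places all of \(\mathbb E\) in \({\text {maxent}}_\}\mathbb E\):

```python
import math

# 1-states of L_1, encoded as pairs (U1 t1, U2 t1 t1). The premisses force
# P(U2 t1 t1) = 1 and P(U1 t1) = 0.1, so every feasible P assigns:
one_state_probs = {
    (True, True): 0.1,    #  U1 t1 &  U2 t1 t1
    (False, True): 0.9,   # ~U1 t1 &  U2 t1 t1
    (True, False): 0.0,
    (False, False): 0.0,
}

H1 = -sum(p * math.log(p) for p in one_state_probs.values() if p > 0)
print(H1, -0.9 * math.log(0.9) - 0.1 * math.log(0.1))  # identical for every P in E
```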

5 Conclusions

Maximum entropy inductive logic on infinite domains lacks a paradigm approach. The Entropy Limit Approach, the Maximum Entropy Approach as well as the here studied modified Maximum Entropy Approaches induce the same unique inductive logic in a number of natural cases (Theorem 1 and [25, 42, 44]). This points towards a unified picture of maximum entropy inductive logics – in spite of the number of possible ways to define such inductive logics.

This uniqueness is particularly noteworthy in light of a string of results suggesting and comparing different notions of entropy (maximisation), which lead to different maximum entropy functions [5,6,7, 10, 13, 18, 20, 27].

The Maximum Entropy Approach fails to provide probabilities for uncertain inference for some evidence bases of quantifier complexity \(\varSigma _2\) [43, § 2.2]. In these cases, for all \(P\in \mathbb E\) there exists a \(Q\in \mathbb E\) such that \(Q\succ P\), and \({\text {maxent}}_\succ \mathbb E\) is hence empty [26]. One way to sensibly define an inductive logic could be to consider a binary relation which is refined by \(\succ \). Unfortunately, the most obvious way to define such an inductive logic produces absurd results (Proposition 6). Finding a way to sensibly define a (maximum entropy) inductive logic properly dealing with such cases must be left to further study.

Further avenues for future research suggest themselves. Firstly, the question arises whether the first two modifications suggested here and the original Maximum Entropy Approach agree more widely or whether they come apart in important cases. If they do provide different probabilities for inductive inference, which of them is to be preferred and why? Secondly, are there further prima facie plausible ways to modify the Maximum Entropy Approach? Thirdly, are there also modifications of the Entropy Limit Approach? If so, what do they look like and what are the implications for the induced inductive logics? Fourthly, what is the status of the entropy limit conjecture [48, p. 191], the conjecture that the Entropy Limit Approach and the Maximum Entropy Approach agree under the assumption that the former is well-defined, in light of these modifications? Finally, cases with multiple uncertain premisses remain poorly understood and pose a challenge to be tackled.