1 Introduction

Uncertainty exists in human decision making: the results of our actions cannot be predicted, or we are not aware of the complete circumstances surrounding the decision. The unpredictability of the outcome of a test can be modelled using probability theory. For example, both calling heads/tails on the next coin flip (unpredictable) and guessing the side of a covered coin (unknown state) are naturally modelled as decisions with probabilities of a half. We call this kind of unpredictability risk-based uncertainty.

Information theory [10] allows measuring how uncertain we are about the effects of a decision, in the form of entropy. People tend to be risk-averse [2], and they prefer to make decisions under low entropy. Concretely, most people prefer a certain $1,000 over an all-or-nothing coin flip for $2,000. The former option has 0 bits of entropy; the latter has 1 bit.

There is a type of uncertainty stronger than risk – ambiguity or Knightian uncertainty [5] – where the probabilities themselves are unknown. An example would be a coin with an unknown bias. Subjective logic is a formalism that addresses this type of uncertainty. People also tend to be ambiguity-avoiding; the Ellsberg paradox [2] (Sect. 2) shows that people may prefer a bigger known risk over a smaller unknown risk.

In this paper, we generalise information theory to cover subjective logic. As a consequence, entropy can be used to measure both types of uncertainty (rather than merely risk). Moreover, the information theory paradigm comes with a body of results, which may become useful for reasoning about ambiguity-based uncertainty. Cross entropy is an example of such a useful concept from information theory, as it allows measuring the difference between two settings with either type of uncertainty.

We propose four types of extensions. The first two, pignistic entropy and aggregate uncertainty entropy, flatten ambiguity-based uncertainty to risk-based uncertainty. Pignistic entropy models a perfectly rational agent, interested in the expected risk given an ambiguous situation; whereas aggregate uncertainty entropy models a paranoid agent that assumes the worst-case reasonable risk. The final two, belief entropy and conceivability entropy, properly extend information theory to model ambiguity-based uncertainty. Both methods are based on extending surprisal. For belief entropy, surprisal is based on the beliefs of the agent; for conceivability entropy, surprisal decreases with uncertainty. All four types coincide when there is no ambiguity-based uncertainty.

2 Ellsberg Paradox

The Ellsberg paradox [2] is a motivating example for uncertainty representation in subjective logic. It shows that people make different decisions than rational risk-avoiding agents would. We use the Ellsberg paradox as a running example throughout this paper.

Suppose you are shown an urn with 90 balls in it, and you are told that 30 are red and that the remaining 60 balls are either black or yellow. One ball is selected at random and you are given the following choice: option 1A gives you $100 if a red ball was drawn and $0 if either a black or a yellow ball was drawn; option 1B gives you $100 if a black ball was drawn and $0 if a red or a yellow ball was drawn. Table 1a summarises the possible outcomes given the choices.

Experiments show that people strongly favour option 1A over option 1B [2]. Assuming that people are rational, this implies that people believe that black balls are less probable than red ones.

Options 2A and 2B are based on the exact same set-up: the number of balls of each colour is the same as in variant 1. Option 2A pays $100 when either red or yellow is drawn, whereas 2B pays $100 when either black or yellow is drawn. Table 1b summarises these outcomes.

Experiments show that people strongly favour option 2B over option 2A [2]. Assuming that people are rational, this implies that people believe that black balls are more probable than red ones. However, the set-up remains unchanged between variants 1 and 2. Thus, the choices made by the people cannot be explained as rational estimates of probabilities.

Table 1. Two sets of choices that comprise the Ellsberg paradox.

It is impossible to explain the choices using risk-based uncertainty, since the risks are perfectly symmetrical. The common explanation of the difference between variants 1 and 2 is ambiguity-based uncertainty. In variant 1, option 1A has no ambiguity-based uncertainty, as the odds are known to be one in three. In variant 2, option 2B is the ambiguity-free option, as the odds are known to be two in three. Options 1B and 2A have ambiguity-based uncertainty, as the real odds could be as low as 0 or \(\frac{1}{3}\), respectively (if there are no black balls), or as high as \(\frac{2}{3}\) or 1, respectively (if there are no yellow balls).

The Ellsberg paradox is the running example throughout this paper. We relate concepts from subjective logic and the four types of entropy directly to the four choices of the Ellsberg paradox. A good entropy measure can describe the core difference between options 1A and 2B on the one hand, and options 1B and 2A on the other.

3 Opinion Representation in Subjective Logic

Random events have a set of possible outcomes. Each of these outcomes is assigned some probability. A user with incomplete knowledge, however, may not know these probabilities. Subjective logic introduces opinions to model users that estimate these probabilities.

The domain of an opinion is the set of outcomes of the underlying event. The elements of the domain are exclusive and exhaustive. The user realises that the underlying event can have only one outcome, and includes all possible outcomes in the domain.

A probability distribution assigns a (non-negative) probability to each of the outcomes. An opinion assigns a (non-negative) belief to each of the outcomes. Unlike the probability distribution, the sum of the beliefs may be less than one. The remainder is uncertainty.

An opinion of user A about an event with domain X is denoted \(\omega ^A_X\). An opinion consists of a belief mass function \(b^A_X : X \rightarrow [0, 1]\), such that \(\sum _{x \in X} b^A_X(x) \le 1\), and a base rate function \(a^A_X : X \rightarrow [0, 1]\), such that \(\sum _{x \in X} a^A_X(x) = 1\). The uncertainty \(u^A_X\) is defined as \(1 - \sum _{x \in X} b^A_X(x)\). For the domain \(X = \{x_1, \dots , x_n\}\), we may denote an opinion as \(\omega ^A_X = (b_1, \dots , b_n)\), to mean \(b^A_X(x_1) = b_1, \dots , b^A_X(x_n) = b_n\).

The base rates denote the projected probabilities in case of uncertainty. With a base rate, every opinion uniquely determines a probability distribution, which we call the pignistic probabilities. The pignistic probability mass for \(x \in X\) is computed as \(p^A_X(x) = b^A_X(x) + u^A_X \cdot a^A_X(x)\).
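
To make this concrete, here is a minimal sketch of an opinion in Python (our illustration; the paper prescribes no implementation), with the uncertainty and pignistic projection computed as defined above:

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    belief: dict      # b_X: outcome -> belief mass, summing to at most 1
    base_rate: dict   # a_X: outcome -> base rate, summing to 1

    @property
    def uncertainty(self) -> float:
        # u_X = 1 - sum of belief masses
        return 1.0 - sum(self.belief.values())

    def pignistic(self) -> dict:
        # p(x) = b(x) + u * a(x)
        u = self.uncertainty
        return {x: self.belief[x] + u * self.base_rate[x] for x in self.belief}

# Example: belief (0.6, 0.2) with uniform base rate projects to (0.7, 0.3).
w = Opinion({"x": 0.6, "y": 0.2}, {"x": 0.5, "y": 0.5})
print(w.uncertainty)   # 0.2
print(w.pignistic())   # {'x': 0.7, 'y': 0.3}
```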

Barycentric coordinate systems can be used to visualise opinions. In a barycentric coordinate system the location of a point is specified as the centre of mass, or barycentre, of masses placed at its vertices [8]. A barycentric coordinate system with n axes is represented on a simplex with n vertices, which has dimensionality \((n-1)\). A triangle is a 2D simplex with 3 vertices and is thus a barycentric system with 3 axes. A binomial opinion can be visualised as a point in a barycentric coordinate system of 3 axes represented by a 2D simplex, which is in fact an equilateral triangle, as in Fig. 1. Here, the belief, disbelief and uncertainty axes go perpendicularly from each edge towards the respective opposite vertices, denoted x, \(\overline{x}\) and uncertainty. The base rate \(a^A_X(x)\) is a point on the base line, and the projected probability \(p^A_X(x)\) is determined by projecting the opinion point to the base line in parallel with the base rate director. A binomial opinion with its projected probability is shown as an example.

Fig. 1. Barycentric visualisations of opinions.

In case the opinion point is located at the left or right vertex of the triangle, i.e. with \(b^A_X(\overline{x})=1\) or \(b^A_X(x)=1\) (and \(u^A_X=0\)), then the opinion is equivalent to boolean TRUE or FALSE, in which case subjective logic becomes equivalent to binary logic. In case the opinion point is located on the baseline of the triangle, i.e. with \(u^A_X=0\), then the opinion is equivalent to a traditional probability, in which case subjective logic becomes equivalent to probabilistic logic.

In general, a multinomial opinion can be represented as a point inside a regular simplex. In particular, a ternary multinomial opinion can be represented inside a tetrahedron with a barycentric system of 4 axes, as shown in Fig. 1.

The tetrahedron is a 3D simplex. Assume the 3-domain \(X = \{x_1,\, x_2,\, x_3\}\). Figure 1 shows a tetrahedron with an example multinomial opinion and base rate distribution. The belief axes for \(x_1\), \(x_2\) and \(x_3\) are omitted due to the difficulty of 3D visualisation.

Running Example 1

The Ellsberg paradox can be expressed elegantly in subjective logic. We can let the domain be \(\{\mathrm {win},\mathrm {lose}\}\). For choice 1A, the opinion is \((\frac{1}{3}, \frac{2}{3})\); for 1B, \((0, \frac{1}{3})\); for 2A, \((\frac{1}{3}, 0)\); and for 2B, \((\frac{2}{3}, \frac{1}{3})\). The base rate \((\frac{1}{2}, \frac{1}{2})\) is the most natural base rate – black balls are no more or less likely than yellow balls – but we generally consider arbitrary base rates for the Ellsberg paradox. The choices 1A and 2B lead to opinions without uncertainty; their generalised entropy measure, therefore, equals the standard entropy measure. The choices 1B and 2A carry an amount of uncertainty; their various generalised entropy measures lead to different figures.
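
Assuming the reconstruction above, the four options can be encoded directly; the dictionary-based representation and variable names are ours:

```python
# The four Ellsberg options over the domain {win, lose}; the belief values
# follow the reconstruction above (30 red balls are known, 60 are ambiguous).
R = 1/3  # known probability of drawing red

options = {
    "1A": {"win": R,     "lose": 1 - R},  # win on red: fully determined
    "1B": {"win": 0.0,   "lose": R},      # win on black: red surely loses
    "2A": {"win": R,     "lose": 0.0},    # win on red or yellow: red surely wins
    "2B": {"win": 1 - R, "lose": R},      # win on black or yellow: fully determined
}

for name, belief in options.items():
    u = 1 - sum(belief.values())
    print(name, "uncertainty:", round(u, 3))  # 1A, 2B: 0.0; 1B, 2A: 0.667
```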

4 Information Theory

Subjective logic has an extensive set of operations that allow calculus with opinions. One particular operation constructs opinions based on recommendations. In [11], the authors show the use of (standard) information theory in measuring the usefulness of recommendations and the derived opinions. Information theory, opinions and uncertainty are intimately linked.

Before we introduce some extensions of information theory to cover subjective logic, we introduce the important standard notions. A more detailed discussion and treatment can be found, e.g., in [7].

Definition 1

(Surprisal). The surprisal (or self-information) of an outcome x of a discrete random variable X is \(I_X(x) = -\log (p_X(x))\).

Surprisal measures the degree to which an outcome is surprising. The more surprising an outcome is, the more informative it is. In information theory, the surprisal of an outcome is completely determined by the probability that it happens: an outcome is more surprising if it is less likely to happen.

Definition 2

(Entropy). The entropy of a discrete random variable X is the expected surprisal \(H(X) = -\sum _x p_X(x) \log (p_X(x))\).

Entropy measures the expected information carried by a random variable. In information theory, the entropy of a random variable is determined by the unpredictability of its outcome in a single trial: a random variable has higher entropy if its outcomes have more similar probabilities.

Definition 3

(Cross Entropy). The cross entropy of two discrete random variables X, Y is \(H(X,Y) = -\sum _x p_X(x) \log (p_Y(x))\).

The cross entropy measures the amount of surprisal obtained when you believe an event is distributed as Y, but in reality is distributed as X. This amount is not typically symmetric in Y and X. The cross entropy is minimised when Y is selected to be equal to X, in which case the believed distribution equals reality.
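
The three standard notions translate directly into code. The following sketch (in Python, with base-2 logarithms, so all quantities are in bits) illustrates Definitions 1-3:

```python
import math

def surprisal(p: float) -> float:
    # I(x) = -log2 p(x); an impossible-but-observed outcome is infinitely surprising
    return math.inf if p == 0.0 else -math.log2(p)

def entropy(p: dict) -> float:
    # H(X) = expected surprisal (terms with p(x) = 0 contribute nothing)
    return sum(px * surprisal(px) for px in p.values() if px > 0)

def cross_entropy(p: dict, q: dict) -> float:
    # expected surprisal when believing q while reality is p
    return sum(px * surprisal(q[x]) for x, px in p.items() if px > 0)

fair   = {"h": 0.5, "t": 0.5}
biased = {"h": 0.9, "t": 0.1}
print(entropy(fair))                 # 1.0 bit
print(cross_entropy(fair, biased))   # ~1.74 bits: believing the wrong model costs bits
print(cross_entropy(fair, fair))     # 1.0 bit: minimised when belief matches reality
```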

5 Pignistic Entropy

Subjective logic opinions model the subjective opinions of users, and users make decisions based on their opinions. We can imagine a user forced to make a decision, where he would decide one way if the probability is above a certain threshold, and the other way otherwise. The probability on which such a decision is based is called the pignistic probability of the opinion.

A user may have an opinion about a potentially unfair coin. The user believes that even unfair coins come up heads or tails at least \(30\,\%\) of the time. Hence, his opinion \(\omega \) is \((0.3, 0.3)\) (with uncertainty \(0.4\)). Since the user has no reason to prefer heads over tails (or vice versa), if he is forced to pick a probability distribution, then he assigns \(\frac{1}{2}\) to both.

Pignistic entropy of an opinion \(\omega ^A_X\) characterised by belief mass function \(b^A_X\) and base rate \(a^A_X\) is based on the entropy of the associated pignistic probability distribution:

Definition 4

(Pignistic Entropy). The pignistic entropy \(H_p(\omega ^A_X)\) is defined: \(- \sum _x p^A_X(x) \log (p^A_X(x)) = - \sum _x \big (b^A_X(x) + u^A_X \cdot a^A_X(x)\big ) \log \big (b^A_X(x) + u^A_X \cdot a^A_X(x)\big )\).
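
Pignistic entropy is simply the Shannon entropy of the projected probabilities. A minimal sketch, assuming the dictionary representation used earlier:

```python
import math

def pignistic_entropy(belief, base_rate):
    # H_p: Shannon entropy of p(x) = b(x) + u * a(x)
    u = 1 - sum(belief.values())
    p = {x: belief[x] + u * base_rate[x] for x in belief}
    return sum(-px * math.log2(px) for px in p.values() if px > 0)

# Two opinions with equal projected probabilities but different uncertainty
# (cf. Proposition 1 below) yield the same pignistic entropy (~0.881 bits).
a = {"x": 0.5, "y": 0.5}
print(pignistic_entropy({"x": 0.7, "y": 0.3}, a))  # u = 0
print(pignistic_entropy({"x": 0.4, "y": 0.0}, a))  # u = 0.6, same p = (0.7, 0.3)
```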

The pignistic entropy is insensitive to changes in the uncertainty of an opinion:

Proposition 1

Let \(\omega ^A_X\) and \(\omega ^B_X\) be two opinions, such that \(u^A_X > u^B_X\) and for all x, \(p^A_X(x) = p^B_X(x)\), then \(H_p(\omega ^A_X) = H_p(\omega ^B_X)\).

Proof

The proposition follows from the fact that \(H_p\) is completely determined by the pignistic probabilities, which are equal for \(\omega ^A_X\) and \(\omega ^B_X\).

Running Example 2

The pignistic entropy models the way a rational agent would approach the Ellsberg paradox. As depicted in Table 2, if the base rate for black versus yellow is 50-50, then options 1A and 1B have equal pignistic entropy, and options 2A and 2B also have equal pignistic entropy. If the base rate is skewed towards black, then 1B and 2B are the superior choices; if it is skewed towards yellow, 1A and 2A are. As expected, the pignistic entropy does not reflect the inherent desire to avoid ambiguity.

Entropy can be used not only to measure how much information there is, but also to compare two opinions. The cross entropy between \(\omega ^A_X\) and \(\omega ^B_X\) describes how well \(\omega ^B_X\) predicts \(\omega ^A_X\):

Definition 5

(Pignistic Cross Entropy). The pignistic cross entropy between \(\omega ^A_X\) and \(\omega ^B_X\), \(H_p(\omega ^A_X,\omega ^B_X)\) is defined: \(- \sum _x p^A_X(x) \log (p^B_X(x))\).

The pignistic cross entropy is insensitive to differences in uncertainty between two opinions:

Proposition 2

Let \(\omega ^A_X\) and \(\omega ^{A'}_X\) be two opinions, such that \(u^A_X > u^{A'}_X\) and for all x, \(p^A_X(x) = p^{A'}_X(x)\), and idem for \(\omega ^B_X\) and \(\omega ^{B'}_X\). Then \(H_p(\omega ^A_X,\omega ^B_X) = H_p(\omega ^{A'}_X,\omega ^B_X) = H_p(\omega ^A_X,\omega ^{B'}_X) = H_p(\omega ^{A'}_X,\omega ^{B'}_X)\).

Proof

The proposition follows from the fact that \(H_p\) is completely determined by the pignistic probabilities, which are equal for \(\omega ^A_X\) and \(\omega ^{A'}_X\), and for \(\omega ^B_X\) and \(\omega ^{B'}_X\).

The pignistic cross entropy between two identical opinions is equal to the entropy of one of the opinions:

Proposition 3

\(H_p(\omega _X,\omega _X) = H_p(\omega _X)\)

To give an example of pignistic cross entropy, consider five opinions (given as belief and disbelief): \(\omega ^A_X = (0.75, 0.25)\), \(\omega ^B_X = (0.25, 0.75)\), \(\omega ^C_X = (0.5, 0)\), \(\omega ^D_X = (0, 0.5)\), \(\omega ^E_X = (0, 0)\). We suppose the base rate is \((\frac{1}{2}, \frac{1}{2})\). Their pignistic cross entropies are presented in Table 3. As \(\omega ^A_X\) (\(\omega ^B_X\)) and \(\omega ^C_X\) (\(\omega ^D_X\)) have the same pignistic probability distributions, their cross entropy is minimal. In this sense, the uncertainty component in \(\omega ^C_X\) (\(\omega ^D_X\)), which distinguishes them from \(\omega ^A_X\) (\(\omega ^B_X\)), is eliminated. Note that the cross entropy between \(\omega ^E_X\), which represents complete uncertainty, and any other opinion is the same. This implies that all opinions are equally different from the completely uncertain opinion. This difference is actually smaller than that between two opposite opinions (e.g., \(\omega ^A_X\) and \(\omega ^B_X\)). Also note that the pignistic cross entropy measure is not symmetric.
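
The relations just described can be checked numerically. The sketch below assumes the reconstructed opinions and the uniform base rate above; it reproduces the qualitative pattern of Table 3, not the published table itself:

```python
import math

def pignistic(belief, a):
    u = 1 - sum(belief.values())
    return {x: belief[x] + u * a[x] for x in belief}

def pignistic_cross_entropy(belief_a, belief_b, a):
    # H_p(A, B) = -sum_x p_A(x) * log2 p_B(x)
    pA, pB = pignistic(belief_a, a), pignistic(belief_b, a)
    return sum(-pA[x] * math.log2(pB[x]) for x in pA if pA[x] > 0)

a = {"x": 0.5, "y": 0.5}
ops = {"A": {"x": 0.75, "y": 0.25}, "B": {"x": 0.25, "y": 0.75},
       "C": {"x": 0.5, "y": 0.0},   "D": {"x": 0.0, "y": 0.5},
       "E": {"x": 0.0, "y": 0.0}}

print(pignistic_cross_entropy(ops["A"], ops["C"], a))  # ~0.811: minimal, equals H_p(A)
print(pignistic_cross_entropy(ops["A"], ops["E"], a))  # 1.0 bit for any first argument
print(pignistic_cross_entropy(ops["A"], ops["B"], a))  # ~1.604: opposite opinions
```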

Table 2. Pignistic entropy of the options in the Ellsberg paradox.
Table 3. Pignistic cross entropy among five opinions.

The pignistic (cross) entropy ignores the uncertainty present in an opinion, and converts uncertainty to pignistic probability, before measuring the (cross) entropy. Pignistic entropy, therefore, accurately measures the entropy of the decisions of users with an opinion, but not the entropy of the opinion itself (nor the cross entropy between opinions). In the remainder of the paper, we want to study the entropy of the opinions including the uncertainty.

6 Aggregate Uncertainty Entropy

Dempster-Shafer theory [1] shares similarities with subjective logic: it also deals with beliefs and uncertainty. Extensions of information theory for Dempster-Shafer theory already exist; the major variant is the aggregate uncertainty [6]. In this section, we discuss aggregate uncertainty, and translate it to subjective logic.

A particular downside of pignistic entropy is that uncertainty plays no role in the amount of entropy. Intuitively, we should expect a more uncertain opinion not to have less entropy. The aggregate uncertainty entropy is the minimal extension of pignistic entropy that satisfies this requirement [6]:

Definition 6

(Aggregate Uncertainty Entropy). Let \(F^A_X\) be the set of functions f with, for all x, \(b^A_X(x) \le f(x) \le 1\) and \(\sum _x f(x) = 1\). The aggregate uncertainty entropy \(H_{au}(\omega ^A_X)\) is defined: \(\max _{f \in F^A_X} -\sum _x f(x) \log (f(x))\).
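
The maximisation in Definition 6 has no closed form in general, but for small domains the entropy-maximising f can be found by water-filling: raise the smallest belief masses to a common level until the uncertainty mass is used up. A sketch (the algorithm choice is ours, not one prescribed in the text):

```python
import math

def max_entropy_dominating(belief):
    """Entropy-maximising f with f(x) >= b(x) and sum_x f(x) = 1.
    Water-filling: raise the k smallest belief masses to a common level lam."""
    items = sorted(belief.items(), key=lambda kv: kv[1])
    for k in range(len(items), 0, -1):
        kept = items[k:]  # masses that stay at their belief value
        lam = (1 - sum(v for _, v in kept)) / k
        if all(lam >= v for _, v in items[:k]) and (k == len(items) or lam <= items[k][1]):
            return {**{x: lam for x, _ in items[:k]}, **dict(kept)}
    raise ValueError("belief masses must sum to at most 1")

def aggregate_uncertainty_entropy(belief):
    f = max_entropy_dominating(belief)
    return sum(-fx * math.log2(fx) for fx in f.values() if fx > 0)

print(aggregate_uncertainty_entropy({"win": 0.0, "lose": 1/3}))  # 1.0 bit (option 1B)
print(aggregate_uncertainty_entropy({"win": 1/3, "lose": 2/3}))  # ~0.918 bits (option 1A)
```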

The aggregate uncertainty entropy cannot decrease whenever uncertainty increases, even if the ratio of beliefs is affected:

Proposition 4

Let \(\omega ^A_X\) and \(\omega ^B_X\) be two opinions, such that for all x, \(b^A_X(x) > b^B_X(x)\), then \(H_{au}(\omega ^A_X) \le H_{au}(\omega ^B_X)\).

Proof

As \(b^A_X(x) > b^B_X(x)\) for all x, \(F^A_X \subseteq F^B_X\), so the entropy-maximising f in \(F^A_X\) is also in \(F^B_X\).

The functions in \(F^A_X\) are all probability mass functions that assign to each outcome a probability at least as great as its belief mass. Thus, \(F^A_X\) is essentially the set of probability distributions that the user believes may be the case. If we take the maximum entropy over these distributions, then we satisfy the requirement that increasing uncertainty can never decrease entropy.

Running Example 3

The aggregate uncertainty entropy models the way a paranoid agent would approach the Ellsberg paradox. Specifically, the agent assumes that the Shannon entropy is maximised under the constraints of his beliefs. For 1A and 2B, the beliefs fix the probabilities, but for 1B and 2A, the entropy is maximised by letting the probability of winning (and losing) be \(\frac{1}{2}\). As depicted in Table 4a, the aggregate uncertainty entropy is independent of base rates (as it is based on Dempster-Shafer theory), and 1A and 2B score significantly better than 1B and 2A. This approach to the problem uses no notion of ambiguity, and has been suggested previously [3]. The problem with this view is that Ellsberg's experiment is purposely set up to ensure the situation remains unchanged between the two variants, whereas the maximal entropy cases of 1B and 2A are inconsistent.

Table 4. Ellsberg paradox and cross entropy for aggregate uncertainty entropy.

Definition 7

(Aggregate Uncertainty Cross Entropy). Let \(F^A_X\) and \(F^B_X\) be as before, and let \(f = \mathrm {argmax}_{f \in F^A_X} -\sum _x f(x) \log (f(x))\) and \(g = \mathrm {argmax}_{g \in F^B_X} -\sum _x g(x) \log (g(x))\). The aggregate uncertainty cross entropy between \(\omega ^A_X\) and \(\omega ^B_X\), \(H_{au}(\omega ^A_X,\omega ^B_X)\), is defined: \(- \sum _x f(x) \log (g(x))\).
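
A worked instance of Definition 7, assuming the reconstructed opinions from Sect. 5 with uniform base rate; the entropy-maximising distributions are derived by hand rather than computed:

```python
import math

# omega_C = (0.5, 0): F_C allows f = (0.5, 0.5), the entropy maximiser.
# omega_A = (0.75, 0.25): F_A contains only A itself (no uncertainty mass).
f = {"x": 0.5,  "y": 0.5}   # argmax for omega_C
g = {"x": 0.75, "y": 0.25}  # argmax for omega_A

h_au = sum(-f[x] * math.log2(g[x]) for x in f)
print(h_au)  # -0.5*log2(0.75) - 0.5*log2(0.25) ~ 1.208 bits
```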

The aggregate uncertainty cross entropy between two identical opinions is equal to the entropy of one of the opinions:

Proposition 5

\(H_{au}(\omega ^A_X,\omega ^A_X) = H_{au}(\omega ^A_X)\)

We compute the aggregate uncertainty cross entropy between the opinions introduced in Table 3; the results are presented in Table 4b. As the entropy-maximising distributions \(\mathrm {argmax}_{f \in F^A_X} -\sum _x f(x) \log (f(x))\) are the same for \(\omega ^C_X\), \(\omega ^D_X\) and \(\omega ^E_X\), namely \(f(x) = 0.5\) for all x, each of the five opinions has the same cross entropy with all three of them. Also, due to the symmetry between \(\omega ^A_X\) and \(\omega ^B_X\), the other three opinions have equal cross entropy with them. Under this cross entropy measure, opinions with partial uncertainty (\(\omega ^C_X\), \(\omega ^D_X\)) appear the same as the opinion with complete uncertainty (\(\omega ^E_X\)), because they have the same distance to the two dogmatic opinions.

There are two major downsides to the aggregate uncertainty entropy. The first is theoretical, namely that the aggregate uncertainty has no closed-form expression. There is, however, research that addresses this specific problem to some degree [9]. The second downside is that the measure applies to Dempster-Shafer theory, which has a subtly different interpretation of uncertainty (specifically, that the probability mass must dominate the belief mass). In the next two sections, we study how subjective logic's interpretation of uncertainty impacts the definition of entropy.

7 Ambiguity Entropy

Rather than using the entropy based on risk as a proxy for ambiguity-based uncertainty entropy, we can directly encode beliefs and ambiguity-based uncertainty into surprisal. An interesting question is whether uncertainty leads to surprisal. Two opposing interpretations are that total uncertainty means that everything is maximally surprising, or that nothing is surprising at all. We demonstrate that which interpretation is appropriate depends on the context.

Before introducing the two types of ambiguity entropy, we introduce an overarching definition of surprisal: \(-\log (b^A_X(x) + c \cdot u^A_X)\). The definition contains a parameter \(c \in [0,1]\), which determines the amount of surprisal from uncertainty. The special cases for c are when \(c = 0\) (or \(c \approx 0\)) and when \(c = 1\). If the uncertainty is zero, then all choices of c collapse into one, which equals the standard definition of surprisal. If the uncertainty is non-zero, then possible interpretations of surprisal start to diverge. In the next two sections, we formally analyse the two edge cases, belief entropy and conceivability entropy.
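
The parametrised surprisal is a one-liner; the sketch below shows how the two edge cases diverge on a completely uncertain binary opinion:

```python
import math

def ambiguity_surprisal(b_x: float, u: float, c: float) -> float:
    # -log2(b(x) + c*u); c = 0 gives belief surprisal, c = 1 conceivability surprisal
    mass = b_x + c * u
    return math.inf if mass == 0 else -math.log2(mass)

# A completely uncertain opinion (b(x) = 0, u = 1):
print(ambiguity_surprisal(0.0, 1.0, c=0.0))  # inf: every outcome is maximally surprising
print(ambiguity_surprisal(0.0, 1.0, c=1.0))  # 0.0: no outcome is surprising at all
```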

In [4], Klir explores a similar idea, where belief entropy parallels confusion ambiguity and conceivability entropy parallels dissonance ambiguity. The fundamental difference is that [4] considers Dempster-Shafer theory, and therefore cannot use the base-rate that subjective logic has. As a consequence, his notions cannot use projected probabilities. His notions are further removed from classical notions in information theory, as he cannot use the expected surprisal.

7.1 Belief Entropy

A user has an opinion with beliefs. If the belief in an outcome is low, then the user thinks it is unlikely that the outcome will happen. To encode this, we can define surprisal based on the belief mass, by letting the surprisal of x be \(- \log (b^A_X(x))\). This equals \(-\log (b^A_X(x) + c \cdot u^A_X)\) with \(c = 0\).

We take the natural definition of entropy as expected surprisal, where the expectation is taken over the projected probabilities. The entropy of an opinion thus measures the expected surprisal of its beliefs.

Definition 8

(Belief Entropy). The belief entropy \(H_b(\omega ^A_X)\) is defined as: \(- \sum _x p^A_X(x) \log (b^A_X(x))\).
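
A sketch of Definition 8, reusing the dictionary representation; the Ellsberg values illustrate the infinite entropy of options with zero-belief outcomes:

```python
import math

def belief_entropy(belief, base_rate):
    # H_b = -sum_x p(x) * log2 b(x), with p the projected probabilities
    u = 1 - sum(belief.values())
    p = {x: belief[x] + u * base_rate[x] for x in belief}
    return sum(p[x] * (math.inf if belief[x] == 0 else -math.log2(belief[x]))
               for x in belief if p[x] > 0)

a = {"win": 0.5, "lose": 0.5}
print(belief_entropy({"win": 1/3, "lose": 2/3}, a))  # ~0.918 bits (option 1A)
print(belief_entropy({"win": 0.0, "lose": 1/3}, a))  # inf (option 1B)
```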

The belief entropy has several nice properties. The first property is that, with the pignistic probabilities remaining constant, the entropy strictly increases when the uncertainty increases:

Proposition 6

Let \(\omega ^A_X\) and \(\omega ^B_X\) be two opinions, such that \(u^A_X > u^B_X\) and for all x, \(p^A_X(x) = p^B_X(x)\), then \(H_b(\omega ^A_X) > H_b(\omega ^B_X)\).

The second property is that, unlike for aggregate uncertainty entropy, the entropy of a completely uncertain opinion is strictly larger than the pignistic entropy of any opinion:

Proposition 7

Let \(\omega ^A_X\) be complete uncertainty; \(u^A_X = 1\). Then \(H_b(\omega ^A_X) > H_p(\omega ^B_X)\), for all \(\omega ^B_X\).

As the entropy of complete uncertainty strictly exceeds the entropy of any other opinion in subjective logic, complete uncertainty carries less information than any other opinion.

Running Example 4

The belief entropy directly models the beliefs of the agent, where ambiguity reduces the sum of the beliefs. Table 5a shows the belief entropies associated with each of the choices. It is interesting to note that 1B and 2A are both assigned infinite belief entropy. The reason is that the participant has no reason to assume it is even possible to win, so the surprisal upon winning is the global maximum of surprisal: positive infinity.

Table 5. Ellsberg paradox and cross entropy for belief entropy.

Note that in Table 5a, choices 1B and 2A have infinite entropy. The reason is that they contain the terms \(-p^A_X(\mathrm {win}) \log (0)\) and \(-p^A_X(\mathrm {lose}) \log (0)\), which equate to infinity except when \(a = 0\) or \(a = 1\), respectively. Intuitively, the cause is that we have zero belief in winning or losing, respectively, although both winning and losing have a non-zero probability of occurring (except for extreme base rates). The interesting aspect of the extreme base rates is that they would remove the ambiguity-based uncertainty altogether: since we would know the outcome even under uncertainty, there is no ambiguity-based uncertainty left to measure.

That choices 1B and 2A have infinite entropy may be desirable for one reason: the entropy exceeds that of any opinion without zero-belief outcomes. However, the downside is that a one-in-a-million event with zero belief and a certain event with zero belief yield the exact same entropy: infinitely many bits. Consider the general definition of ambiguity surprisal, \(- \log (b^A_X(x) + c \cdot u^A_X)\): as c converges to 0, the expected surprisal converges to the belief entropy, and the entropy diverges to infinity slowly. If we take \(c = \epsilon \), for some small \(\epsilon > 0\), then the entropy remains finite. The addition of \(\epsilon \) hardly affects the entropy of opinions without zero beliefs. For example, (0.5, 0.1), with base rate \((\frac{1}{2}, \frac{1}{2})\), has \(-0.7 \log (0.5 + \epsilon ) - 0.3 \log (0.1 + \epsilon ) \approx - 0.7 \log (0.5) - 0.3 \log (0.1) \approx 1.6966\) bits of entropy. For opinions with zero beliefs, we get a more fine-grained measure of entropy. For example, (0.5, 0) and (0, 0), both with base rate \((\frac{1}{2}, \frac{1}{2})\), have \(-0.75 \log (0.5 + \epsilon ) - 0.25 \log (\epsilon ) \approx - 0.25 \log (\epsilon )\) and \(-0.5 \log (\epsilon ) - 0.5 \log (\epsilon ) = - \log (\epsilon )\) bits of entropy, respectively, and the latter is four times as many bits. Thus, the belief entropy can be made more fine-grained without loss of generality.
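
The \(\epsilon \)-smoothed variant is easy to realise. The sketch below uses \(c = \epsilon \) in the general ambiguity surprisal, so zero-belief outcomes receive the finite surprisal \(-\log (\epsilon \cdot u)\) instead of infinity; for small \(\epsilon \) the printed values match the approximations above:

```python
import math

def smoothed_belief_entropy(belief, base_rate, eps=1e-9):
    # belief entropy with c = eps: -sum_x p(x) * log2(b(x) + eps * u)
    u = 1 - sum(belief.values())
    p = {x: belief[x] + u * base_rate[x] for x in belief}
    return sum(-p[x] * math.log2(belief[x] + eps * u) for x in belief if p[x] > 0)

a = {"x": 0.5, "y": 0.5}
print(smoothed_belief_entropy({"x": 0.5, "y": 0.1}, a))  # ~1.697 bits: barely affected
print(smoothed_belief_entropy({"x": 0.5, "y": 0.0}, a))  # ~8.5 bits: one zero belief
print(smoothed_belief_entropy({"x": 0.0, "y": 0.0}, a))  # ~29.9 bits: roughly four times more
```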

The belief entropy can be extended to belief cross entropy:

Definition 9

(Belief Cross Entropy). The belief cross entropy between \(\omega ^A_X\) and \(\omega ^B_X\), \(H_b(\omega ^A_X,\omega ^B_X)\) is defined: \(- \sum _x p^A_X(x) \log (b^B_X(x))\).

We compute the belief cross entropy between the opinions introduced in Table 3; the results are presented in Table 5b. Some equalities in the table can be derived easily from Definition 9. Note that the cross entropy between any opinion and \(\omega ^E_X\), which represents complete uncertainty, is infinite. This is not reasonable, as explained below.

The belief cross entropy measures the information distance from one opinion to the other. Intuitively, when an uncertain opinion conflicts with another opinion, this may not surprise us, whereas two conflicting and certain opinions would be a surprise. Unfortunately, this is not the intuition captured by the definition of belief cross entropy. Belief cross entropy measures the information gap between two opinions, and uncertainty introduces large quantities of entropy, allowing for bigger information gaps. In the next section, we introduce a measure of entropy that captures this intuition, and is thus suitable for cross entropy.

7.2 Conceivability Entropy

In belief entropy, the belief in an outcome determines the surprisal. However, we can imagine that users are not surprised when they are uncertain. To encode this, we can define surprisal based on the belief mass plus the uncertainty, by letting the surprisal of x be \(- \log (b^A_X(x) + u^A_X)\). This equals \(-\log (b^A_X(x) + c \cdot u^A_X)\) with \(c = 1\). Note that \(b^A_X(x) + u^A_X = 1 - \sum _{x' \ne x} b^A_X(x')\), so conceivability can be seen as the converse of belief.

The entropy can be derived from the surprisal:

Definition 10

(Conceivability Entropy). The conceivability entropy \(H_c(\omega ^A_X)\) is defined: \(- \sum _x p^A_X(x) \log (b^A_X(x) + u^A_X)\).

When the opinion is complete uncertainty, surprisal is zero, as all outcomes are fully conceivable. For this reason, viewing surprisal as the opposite of information does not make sense here (unlike for the other notions of entropy, such as belief entropy and Shannon entropy).

Table 6. Ellsberg paradox and cross entropy for conceivability entropy.

However, the conceivability entropy notion is suitable for cross entropy:

Definition 11

(Conceivability Cross Entropy). The conceivability cross entropy between \(\omega ^A_X\) and \(\omega ^B_X\), \(H_c(\omega ^A_X,\omega ^B_X)\) is defined: \(- \sum _x p^A_X(x) \log (b^B_X(x)+u^B_X)\).

Conceivability cross entropy is a more useful measure of distance between opinions than belief cross entropy: more uncertain opinions have a shorter distance to other opinions. The reason why conceivability cross entropy is a better measure for distance is that we want to measure whether it is "conceivable" that one opinion describes another. The concrete numbers are given in Table 6b. The cross entropy of opinions with similar pignistic probabilities is lower, but the amount of uncertainty correlates more strongly. The distance from any opinion to complete uncertainty is 0.
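
A sketch of Definitions 10 and 11, again assuming the dictionary representation; it confirms that the distance from any opinion to complete uncertainty is 0, while conflicting dogmatic opinions remain far apart:

```python
import math

def conceivability_cross_entropy(belief_a, belief_b, base_rate):
    # H_c(A, B) = -sum_x p_A(x) * log2(b_B(x) + u_B)
    uA = 1 - sum(belief_a.values())
    uB = 1 - sum(belief_b.values())
    pA = {x: belief_a[x] + uA * base_rate[x] for x in belief_a}
    return sum(-pA[x] * math.log2(belief_b[x] + uB) for x in pA if pA[x] > 0)

a = {"x": 0.5, "y": 0.5}
certain  = {"x": 0.75, "y": 0.25}
opposite = {"x": 0.25, "y": 0.75}
vacuous  = {"x": 0.0,  "y": 0.0}   # complete uncertainty

print(conceivability_cross_entropy(certain, opposite, a))  # ~1.604: conflicting and certain
print(conceivability_cross_entropy(certain, vacuous, a))   # 0.0: everything is conceivable
```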

8 Conclusion

To understand decision making, we must analyse not only the uncertainty introduced by risk, but also the uncertainty about risk (ambiguity). Standard notions of Shannon entropy in information theory can measure the former, but not the latter. We extend information theory to capture subjective logic – a formalism that deals with ambiguity – in four ways.

Two of the extensions of information theory remove ambiguity before measuring entropy. The first extension, pignistic entropy, models rational agents. The second extension, aggregate uncertainty entropy, models paranoid agents.

However, the interesting extensions model ambiguity, rather than remove it. The final two extensions, belief entropy and conceivability entropy, are two sides of the same coin. Belief entropy is suitable for measuring entropy of both risk and ambiguity. Conceivability entropy is more suited for measuring cross entropy.

All extensions are related using the Ellsberg paradox as a running example, and the different entropies provide insights into the paradox. Moreover, the different entropies can be generalised to cross entropy – a measure of the quality of an opinion relative to a reference opinion. Cross entropy can be used for analysing the quality of opinions in systems that use subjective logic.