1 Introduction

The main goal of this paper is to provide a new definition of entropy of belief functions in the D-S theory that is consistent with the semantics of the D-S theory. By entropy, we mean a real-valued measure of uncertainty in the tradition of Hartley [12] and Shannon [30]. Also, while there are several theories of belief functions (see, e.g., [10, 36]), our goal is to define entropy for the D-S theory that uses Dempster’s product-intersection rule [6] as the combination rule.

Hartley’s Entropy. Suppose X is a discrete random variable with a finite state space \(\varOmega _{X}\), whose elements are assumed to be mutually exclusive and exhaustive. Suppose this is all we know about X, i.e., we do not know the probability mass function (PMF) of X. What is a measure of uncertainty? Hartley [12] defines entropy of \(\varOmega _{X}\) as follows:

$$\begin{aligned} U(\varOmega _{X}) = \log _{2}(|\varOmega _{X}|), \end{aligned}$$
(1)

where \(U(\varOmega _{X})\) denotes a real-valued measure of uncertainty of \(\varOmega _{X}\), with units of bits. First, notice that \(U(\varOmega _{X})\) does not depend on the labels attached to the states in \(\varOmega _{X}\), only on the number of states in \(\varOmega _{X}\). Second, Rényi [26] shows that Hartley’s definition in Eq. (1) is characterized by the following three properties.

1. (Additivity) Suppose X and Y are random variables with finite state spaces \(\varOmega _{X}\) and \(\varOmega _{Y}\), respectively. The joint state space of (X, Y) is \(\varOmega _{X} \times \varOmega _{Y}\). Then, \(U(\varOmega _{X} \times \varOmega _{Y}) = U(\varOmega _{X}) + U(\varOmega _{Y})\).

2. (Monotonicity) If \(|\varOmega _{X_{1}}| > |\varOmega _{X_{2}}|\), then \(U(\varOmega _{X_{1}}) > U(\varOmega _{X_{2}})\).

3. (Units) If \(|\varOmega _{X}| = 2\), then \(U(\varOmega _{X}) = 1\) bit.

Shannon’s Entropy. Now suppose we learn of a probability mass function \(P_{X}\) of X. What is the information content of \(P_{X}\)? Or alternatively, we can ask: What is the uncertainty in \(P_{X}\)? Shannon [30] provides an answer to the second question as follows:

$$\begin{aligned} H_{s}(P_{X}) = \sum _{x \in \varOmega _{X}} P_{X}(x) \log _{2}\left( \frac{1}{P_{X}(x)}\right) , \end{aligned}$$
(2)

where \(H_{s}(P_{X})\) is called Shannon’s measure of entropy (uncertainty) in PMF \(P_{X}\). Shannon’s entropy is characterized (up to a constant) by the following two properties [30].

1. (Monotonicity) If \(P_{X}\) is an equally likely PMF, then \(H_{s}(P_{X})\) is a monotonically increasing function of \(|\varOmega _{X}|\).

2. (Compound distributions) If a PMF \(P_{X,Y}\) is factored into two PMFs such that \(P_{X,Y}(x,y) = P_{X}(x) \, P_{Y|x}(y)\), then \(H_{s}(P_{X,Y}) = H_{s}(P_{X}) + \sum _{x \in \varOmega _{X}}P_{X}(x)\,H_{s}(P_{Y|x})\).

The uncertainty prior to learning \(P_{X}\) was \(U(\varOmega _{X})\). After learning \(P_{X}\), it is now \(H_{s}(P_{X})\). Thus, if \(I(P_{X})\) denotes the information content of \(P_{X}\), then we have the equality

$$\begin{aligned} I(P_{X}) + H_{s}(P_{X}) = U(\varOmega _{X}). \end{aligned}$$
(3)

The maximum value of \(H_{s}(P_{X})\) (over the space of all PMFs for X) is \(\log _{2}(|\varOmega _{X}|)\), which is attained by the uniform PMF for X, \(P_{X}(x) = 1/|\varOmega _{X}|\) for all \(x \in \varOmega _{X}\). Thus, \(I(P_{X}) \ge 0\), with equality if and only if \(P_{X}\) is the uniform PMF. At the other extreme, \(H_{s}(P_{X}) \ge 0\), with equality if and only if there exists \(x \in \varOmega _{X}\) such that \(P_{X}(x) = 1\). Such a PMF has no uncertainty, and therefore, it must have maximum information. Thus \(I(P_{X}) \le U(\varOmega _{X})\), with equality if and only if there exists \(x \in \varOmega _{X}\) such that \(P_{X}(x) = 1\).
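To make Eqs. (1)-(3) concrete, the following is a minimal numerical sketch in Python; the function names and the four-state example are ours and are used purely for illustration.

```python
import math

def hartley(num_states):
    """Hartley entropy U(Omega_X) = log2(|Omega_X|), in bits (Eq. 1)."""
    return math.log2(num_states)

def shannon(pmf):
    """Shannon entropy H_s(P_X) of a PMF given as a list of probabilities (Eq. 2)."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

n = 4                                # |Omega_X| = 4, so U(Omega_X) = 2 bits
uniform = [1.0 / n] * n              # maximum-uncertainty PMF
degenerate = [1.0, 0.0, 0.0, 0.0]    # zero-uncertainty PMF

for pmf in (uniform, degenerate):
    info = hartley(n) - shannon(pmf)  # I(P_X) = U(Omega_X) - H_s(P_X), per Eq. (3)
    print(round(shannon(pmf), 4), round(info, 4))
# uniform:    H_s = 2.0 bits, I = 0.0 bits
# degenerate: H_s = 0.0 bits, I = 2.0 bits = U(Omega_X)
```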

Entropy for the D-S Theory. In the case of the D-S theory of belief functions, if m is a basic probability assignment (BPA) for X, let H(m) denote the entropy of BPA m. First, the D-S theory is a generalization of probability theory. The equiprobable PMF is represented by the Bayesian uniform BPA \(m_{u}\) for X such that \(m_{u}(\{x\}) = 1/|\varOmega _{X}|\) for all \(x \in \varOmega _{X}\). So, to be consistent with probability theory, we should have \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). However, such a BPA \(m_{u}\) for X does not have the maximum uncertainty. The vacuous BPA \(\iota _{X}\) for X, such that \(\iota _{X}(\varOmega _{X}) = 1\), has more uncertainty than the equiprobable Bayesian BPA \(m_{u}\). As we cannot imagine a BPA for X that has more uncertainty than \(\iota _{X}\), we require that \(H(\iota _{X})\) be the maximum. Klir [16] and others argue that a measure of uncertainty in the D-S theory should capture both a measure of conflict and a measure of non-specificity. Assuming that each of these two components is measured on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), we have \(H(\iota _{X}) = 2 \log _{2}(|\varOmega _{X}|)\). As in probability theory, we can define a measure of information content of BPA m for X so that the following holds:

$$\begin{aligned} I(m) + H(m) = 2 \log _{2}(|\varOmega _{X}|), \end{aligned}$$
(4)

where I(m) denotes the information content of BPA m for X. Thus, for the vacuous BPA \(\iota _{X}\) for X, we have \(I(\iota _{X}) = 0\), whereas for the Bayesian uniform BPA \(m_{u}\) for X, we have \(I(m_{u}) = \log _{2}(|\varOmega _{X}|)\).

In this paper, we are interested in defining a measure of entropy (uncertainty) of BPAs m for X in the D-S theory of belief functions on the scale \(0 \le H(m) \le 2 \log _{2}(|\varOmega _{X}|)\), so that \(H(m) \le 2 \log _{2}(|\varOmega _{X}|)\) with equality if and only if \(m = \iota _{X}\), and \(H(m) \ge 0\), with equality if and only if m is such that \(m(\{x\}) = 1\) for some \(x \in \varOmega _{X}\). Also, we require a monotonicity property, a probability consistency property, an additivity property, and a requirement that any definition should be based on semantics consistent with D-S theory. These are discussed in detail in Sect. 2.

Literature Review. There is a rich literature on information-theoretic measures for the D-S theory of belief functions. Some, e.g., [13, 23, 32, 38], define the information content of BPA m so that \(I(\iota _{X}) = 0\). Others define entropy on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), either only as a measure of conflict (e.g., [35]) or only as a measure of non-specificity [8]. Some, e.g., [11, 15, 17, 18, 21, 24, 25, 37], define entropy as a measure of both conflict and non-specificity, but on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), so that \(H(m_{u}) = H(\iota _{X}) = \log _{2}(|\varOmega _{X}|)\). Some, e.g., [1, 3, 22], define entropy as a measure of conflict and non-specificity on the scale \([0, 2\log _{2}(|\varOmega _{X}|)]\), but they do so using semantics of belief functions (credal sets of PMFs) that are inconsistent with Dempster’s rule of combination [10, 29]. Our definition is the only one that defines entropy as a measure of conflict and non-specificity, on the scale \([0, 2\log _{2}(|\varOmega _{X}|)]\), using semantics of belief functions that are consistent with Dempster’s combination rule.

2 Desired Properties of Entropy of BPAs in the D-S Theory

First, we explain our informal requirement that any definition of entropy for the D-S theory should be consistent with the semantics of this theory. Next, we propose five formal properties that a definition of entropy of BPAs in the D-S theory should satisfy. Finally, we compare these properties with those proposed by Klir and Wierman [19] for the same purpose.

Consistency with D-S Theory Semantics Requirement. First, let us stress once more that we are concerned in this paper only with the D-S belief functions theory that includes Dempster’s combination rule as the operation for aggregating knowledge. There are theories of belief functions that use other combination rules. Let \(2^{\varOmega _{X}}\) denote the set of all non-empty subsets of \(\varOmega _{X}\). A BPA m for X can be considered as an encoding of a collection of PMFs \(\mathcal {P}_{m}\) for X such that for all \(\textsf {a} \in 2^{\varOmega _{X}}\) we have:

$$\begin{aligned} Bel_{m}(\textsf {a}) = \sum _{\textsf {b} \in 2^{\varOmega _{X}}: \textsf {b} \subseteq \textsf {a}} m(\textsf {b}) = \min _{P \in \mathcal {P}_{m}} \sum _{x \in \textsf {a}}P(x). \end{aligned}$$
(5)

\(\mathcal {P}_{m}\) is referred to as a credal set corresponding to m in the imprecise probability literature (see, e.g., [36]). For such a theory of belief functions, Fagin and Halpern [9] propose a combination rule that is different from Dempster’s combination rule. Thus, as the D-S theory uses Dempster’s rule, a BPA m in the D-S theory cannot be interpreted as a collection of PMFs satisfying Eq. (5) [10, 28]. There are, of course, semantics that are consistent with the D-S theory, such as multivalued mappings [6], random codes [28], transferable beliefs [34], and hints [20].

Example 1

Consider a BPA \(m_{1}\) for X with state space \(\varOmega _{X} = \{x_{1}, x_{2}, x_{3}\}\) as follows: \(m_{1}(\{x_{1}\}) = 0.5\), \(m_{1}(\varOmega _{X}) = 0.5\). With the credal set semantics of a BPA function, \(m_{1}\) corresponds to a set of PMFs \(\mathcal {P}_{m_{1}} = \{P \in \mathcal {P} : P(x_{1}) \ge 0.5\}\), where \(\mathcal {P}\) denotes the set of all PMFs for X. Now suppose we get a distinct piece of evidence \(m_{2}\) for X such that \(m_{2}(\{x_{2}\}) = 0.5\), \(m_{2}(\varOmega _{X}) = 0.5\). \(m_{2}\) corresponds to \(\mathcal {P}_{m_{2}} = \{P \in \mathcal {P} : P(x_{2}) \ge 0.5\}\). The only PMF that is in both \(\mathcal {P}_{m_{1}}\) and \(\mathcal {P}_{m_{2}}\) is \(P \in \mathcal {P}\) such that \(P(x_{1}) = P(x_{2}) = 0.5\), and \(P(x_{3}) = 0\). Notice that if we use Dempster’s rule to combine \(m_{1}\) and \(m_{2}\), we have: \((m_{1}\oplus m_{2})(\{x_{1}\}) = \frac{1}{3}\), \((m_{1}\oplus m_{2})(\{x_{2}\}) = \frac{1}{3}\), and \((m_{1}\oplus m_{2})(\varOmega _{X}) = \frac{1}{3}\). The set of PMFs \(\mathcal {P}_{m_{1}\oplus m_{2}} = \{P \in \mathcal {P}: P(x_{1}) \ge \frac{1}{3}, P(x_{2}) \ge \frac{1}{3}\}\) is not the same as \(\mathcal {P}_{m_{1}} \cap \mathcal {P}_{m_{2}}\). Thus, credal set semantics of belief functions are not compatible with Dempster’s rule of combination.
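The combination in Example 1 can be reproduced with a short sketch of Dempster’s rule; the dictionary-of-frozensets representation of a BPA and the helper name dempster_combine are our own conventions, assumed only for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: intersect focal sets pointwise, drop empty intersections,
    multiply masses, and renormalize by 1 minus the conflict."""
    combined = {}
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:  # empty intersections contribute only to the conflict
            combined[c] = combined.get(c, 0.0) + p * q
    k = sum(combined.values())  # normalization constant (1 - conflict)
    return {a: v / k for a, v in combined.items()}

omega = frozenset({"x1", "x2", "x3"})
m1 = {frozenset({"x1"}): 0.5, omega: 0.5}
m2 = {frozenset({"x2"}): 0.5, omega: 0.5}

for focal, mass in dempster_combine(m1, m2).items():
    print(sorted(focal), round(mass, 4))
# {x1}: 1/3, {x2}: 1/3, Omega_X: 1/3, whereas the intersection of the two
# credal sets contains only the PMF with P(x1) = P(x2) = 0.5.
```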

Second, given a BPA m for X in the D-S theory, there are many ways to transform m to a corresponding PMF \(P_{m}\) for X [5]. However, only one of these, called the plausibility transform [4], is consistent with m in the D-S theory in the sense that \(P_{m_{1}} \otimes P_{m_{2}} = P_{m_{1} \oplus m_{2}}\), where \(\otimes \) is the combination rule in probability theory [31], and \(\oplus \) is Dempster’s combination rule in the D-S theory [4]. Some definitions, e.g., [7, 15, 25], define the entropy of m as Shannon’s entropy of the pignistic transform of m. The pignistic transform of m is not compatible with Dempster’s combination rule [4], and therefore, these definitions are not consistent with D-S theory semantics. Thus, as per our consistency with D-S theory semantics requirement, any method for defining entropy of m in the D-S theory by first transforming m to a corresponding PMF should use the plausibility transform. Notice that we are not claiming that a definition of entropy for the D-S theory must use the plausibility transform, only that, if one takes the path of first transforming a BPA m to an equivalent PMF and then using Shannon’s entropy of that PMF as the definition of the entropy of m, then, to be compatible with D-S theory semantics, the transformation used must be the plausibility transform.

Example 2

Consider a situation where we have vacuous prior knowledge of X with \(\varOmega _{X} = \{x_{1}, \ldots , x_{70}\}\) and we receive evidence represented as BPA m for X as follows: \(m(\{x_{1}\}) = 0.30\), \(m(\{x_{2}\}) = 0.01\), and \(m(\{x_{2}, \ldots , x_{70}\}) = 0.69\). The pignistic transform of m [33], denoted by \(BetP_{m}\), is as follows: \(BetP_{m}(x_{1}) = 0.30\), \(BetP_{m}(x_{2}) = 0.02\), and \(BetP_{m}(x_{3}) = \ldots = BetP_{m}(x_{70}) = 0.01\). Thus, as per the pignistic transform, BPA m is interpreted as evidence where \(x_{1}\) is 15 times more likely than \(x_{2}\). Now suppose we receive another distinct piece of evidence that is also represented by m. As per the D-S theory, our total evidence is now \(m \oplus m\). If on the basis of m (or \(BetP_{m}\)), \(x_{1}\) was 15 times more likely than \(x_{2}\), then now that we have evidence \(m \oplus m\), \(x_{1}\) should be \(15^{2} = 225\) times more likely than \(x_{2}\). But \(BetP_{m\oplus m}(x_{1}) \approx 0.156\) and \(BetP_{m\oplus m}(x_{2}) \approx 0.036\). So according to \(BetP_{m\oplus m}\), \(x_{1}\) is only about 4.33 times more likely than \(x_{2}\). Thus, \(BetP_{m}\) is not consistent with Dempster’s combination rule.
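The arithmetic in Example 2 can be checked with the sketch below; the pignistic and dempster_combine helpers are illustrative names of ours, not functions of any particular library.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (as in the sketch for Example 1)."""
    combined = {}
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + p * q
    k = sum(combined.values())
    return {a: v / k for a, v in combined.items()}

def pignistic(m):
    """BetP_m: each focal set's mass is split equally among its elements."""
    bet = {}
    for focal, mass in m.items():
        for x in focal:
            bet[x] = bet.get(x, 0.0) + mass / len(focal)
    return bet

states = [f"x{i}" for i in range(1, 71)]
m = {frozenset({"x1"}): 0.30,
     frozenset({"x2"}): 0.01,
     frozenset(states[1:]): 0.69}   # {x2, ..., x70}

bet1 = pignistic(m)
bet2 = pignistic(dempster_combine(m, m))
print(bet1["x1"] / bet1["x2"])   # ~15, based on m alone
print(bet2["x1"] / bet2["x2"])   # ~4.33, not 15^2 = 225
```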

Thus, one requirement we implicitly assume is that any definition of entropy of m should be based on semantics for m that are consistent with the basic tenets of the D-S theory. Also, we implicitly assume existence and continuity: given a BPA m, H(m) should always exist, and H(m) should be a continuous function of m. We do not list these three requirements explicitly.

Desired Properties of Entropy for the D-S Theory. The following list of desired properties of entropy \(H(m_{X})\), where \(m_{X}\) is a BPA for X, is motivated by the properties of Shannon’s entropy of PMFs [30].

Let X and Y denote random variables with state spaces \(\varOmega _{X}\) and \(\varOmega _{Y}\), respectively. Let \(m_{X}\) and \(m_{Y}\) denote distinct BPAs for X and Y, respectively. Let \(\iota _{X}\) and \(\iota _{Y}\) denote the vacuous BPAs for X and Y, respectively.

1. (Non-negativity) \(H(m_{X}) \ge 0\), with equality if and only if there is an \(x \in \varOmega _{X}\) such that \(m_{X}(\{x\}) = 1\). This is similar to the probabilistic case.

2. (Maximum entropy) \(H(m_{X}) \le H(\iota _{X})\), with equality if and only if \(m_{X} = \iota _{X}\). This makes sense as the vacuous BPA \(\iota _{X}\) for X has the most uncertainty among all BPAs for X. Such a property is advocated in [3].

3. (Monotonicity) If \(|\varOmega _{X}| < |\varOmega _{Y}|\), then \(H(\iota _{X}) < H(\iota _{Y})\). A similar property is used by Shannon to characterize his definition of entropy of PMFs.

4. (Probability consistency) If \(m_{X}\) is a Bayesian BPA for X, then \(H(m_{X}) = H_{s}(P_{X})\), where \(P_{X}\) is the PMF of X corresponding to \(m_{X}\).

5. (Additivity) \(H(m_{X} \oplus m_{Y}) = H(m_{X}) + H(m_{Y})\). This is a weaker form of the compound property of Shannon’s entropy of a PMF.

Klir and Wierman [19] also describe a set of properties that they believe should be satisfied by any meaningful measure of uncertainty based on intuitive grounds. Two of the properties that they suggest, probability consistency and additivity, are also included in the above list. Our maximum entropy property is not in their list. Two of the properties that they require do not make intuitive sense to us.

First, Klir and Wierman require a property they call “set consistency” as follows: \(H(m) = \log _{2}(|\textsf {a}|)\) whenever m is deterministic (i.e., it has only one focal element) with focal set \(\textsf {a}\). This property would require that \(H(\iota _{X}) = \log _{2}(|\varOmega _{X}|)\). The probability consistency property requires that for the Bayesian uniform BPA \(m_{u}\), \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). Thus, these two requirements entail that \(H(\iota _{X}) = H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). We disagree, as there is greater uncertainty in \(\iota _{X}\) than in \(m_{u}\).

Second, Klir and Wierman require a property they call “range” as follows: for any BPA \(m_{X}\) for X, \(0 \le H(m_{X}) \le \log _{2}(|\varOmega _{X}|)\). The probability consistency property requires that \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). Including the range property as well would therefore prevent us from having \(H(\iota _{X}) > H(m_{u})\). So we do not include it in our list.

Finally, Klir and Wierman require a sub-additivity property defined as follows. Suppose m is a BPA for \(\{X, Y\}\), with marginal BPAs \(m^{\downarrow X}\) for X, and \(m^{\downarrow Y}\) for Y. Then,

$$\begin{aligned} H(m) \le H(m^{\downarrow X}) + H(m^{\downarrow Y}) \end{aligned}$$
(6)

We agree that this property is important, and the only reason we do not include it in our list is that we are unable to satisfy it in addition to the five requirements that we do include and our implicit requirement that any definition be consistent with the semantics of the D-S theory of belief functions.

The most important property that characterizes Shannon’s definition of entropy is the compound property \(H_{s}(P_{X, Y}) = H_{s}(P_{X} \otimes P_{Y|X}) = H_{s}(P_{X}) + H_{s}(P_{Y|X})\), where \(H_{s}(P_{Y|X}) = \sum _{x \in \varOmega _{X}}P_{X}(x)\,H_{s}(P_{Y|x})\). Translated to the D-S theory of belief functions, this would require factoring a BPA m for \(\{X, Y\}\) into a BPA \(m^{\downarrow X}\) for X and a BPA \(m_{Y|X}\) for \(\{X, Y\}\) such that \(m = m^{\downarrow X} \oplus m_{Y|X}\). This cannot be done for every BPA m for \(\{X, Y\}\) [31]. But we could construct m for \(\{X, Y\}\) such that \(m = m_{X} \oplus m_{Y|X}\), where \(m_{X}\) is a BPA for X, and \(m_{Y|X}\) is a BPA for \(\{X, Y\}\) such that \(m_{Y|X}^{\downarrow X} = \iota _{X}\), and \(m_{X}\) and \(m_{Y|X}\) are non-conflicting, i.e., the normalization constant in Dempster’s combination rule is 1. Notice that such a constructed BPA m would have the property \(m^{\downarrow X} = (m_{X} \oplus m_{Y|X})^{\downarrow X} = m_{X}\). For such constructed BPAs m, we could require a compound property as follows:

$$\begin{aligned} H(m_{X} \oplus m_{Y|X}) = H(m_{X}) + H(m_{Y|X}). \end{aligned}$$
(7)

However, we are unable to formulate a definition of H(m) that satisfies such a compound property. So, like the sub-additivity property, we do not include a compound property in our list of properties. The additivity property, included both in Klir and Wierman’s list and in ours, is so weak that it is satisfied by any definition on a log scale. All definitions of entropy of belief functions in the literature are defined on a log scale, and, thus, they all satisfy the additivity property.

3 A New Definition of Entropy of BPAs in the D-S Theory

In this section, we propose a new definition of entropy of BPAs in the D-S theory. The new definition is based on the plausibility transform of a belief function to an equivalent probability function. Therefore, we start this section by describing the plausibility transform, originally introduced in [4].

Plausibility Transform of a BPA to a PMF. Suppose m is a BPA for X. What is the PMF of X that best represents m in the D-S theory? An answer to this question is given by Cobb and Shenoy [4], who propose the plausibility transform of m as follows. First, consider the plausibility function \(Pl_{m}\) corresponding to m. Next, construct a PMF for X, denoted by \(P_{Pl_{m}}\), by normalizing the values of \(Pl_{m}\) for singleton subsets, i.e.,

$$\begin{aligned} P_{Pl_{m}}(x) = K^{-1} \cdot Pl_{m}(\{x\}) = K^{-1} \cdot Q_{m}(\{x\}) \end{aligned}$$
(8)

for all \(x \in \varOmega _{X}\), where K is a normalization constant that ensures \(P_{Pl_{m}}\) is a PMF, i.e., \(K = \sum _{x \in \varOmega _{X}} Pl_{m}(\{x\}) = \sum _{x \in \varOmega _{X}} Q_{m}(\{x\})\).

Cobb and Shenoy [4] argue that, of the many methods for transforming belief functions to PMFs, the plausibility transform is the one that is consistent with Dempster’s rule of combination in the sense that if we have BPAs \(m_{1}, \ldots , m_{k}\) for X, then \(P_{Pl_{m_{1} \oplus \ldots \oplus m_{k}}} = P_{Pl_{m_{1}}} \otimes \ldots \otimes P_{Pl_{m_{k}}}\), where \(\otimes \) denotes pointwise multiplication followed by normalization (i.e., Bayesian combination [31]). It can be shown that the plausibility transform is the only method that has this property, which follows from the fact that Dempster’s rule of combination is pointwise multiplication of commonality functions followed by normalization [27].

Example 3

Consider the BPA m for X described in Example 2. Then, \(Pl_{m}\) for singleton subsets is as follows: \(Pl_{m}(\{x_{1}\}) = 0.30\), \(Pl_{m}(\{x_{2}\}) = 0.70\), \(Pl_{m}(\{x_{3}\}) = \cdots = Pl_{m}(\{x_{70}\}) = 0.69\). The plausibility transform of m is as follows: \(P_{Pl_{m}}(x_{1}) = 0.3/47.92 \approx 0.0063\), \(P_{Pl_{m}}(x_{2}) = 0.7/47.92 \approx 0.0146\), and \(P_{Pl_{m}}(x_{3}) = \cdots = P_{Pl_{m}}(x_{70}) \approx 0.0144\). Notice that \(P_{Pl_{m}}\) is quite different from \(BetP_{m}\). In \(BetP_{m}\), \(x_{1}\) is 15 times more likely than \(x_{2}\). In \(P_{Pl_{m}}\), \(x_{2}\) is 2.33 times more likely than \(x_{1}\). Now consider the scenario where we get a distinct piece of evidence that is identical to m, so that our total evidence is \(m \oplus m\). If we compute \(m \oplus m\) and \(P_{Pl_{m \oplus m}}\), then as per \(P_{Pl_{m \oplus m}}\), \(x_{2}\) is \(2.33^{2}\) times more likely than \(x_{1}\). This is a direct consequence of the consistency of the plausibility transform with Dempster’s combination rule.
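A short sketch that reproduces the numbers in Example 3, assuming the same dictionary-of-frozensets BPA representation used in the earlier sketches; plausibility_transform is our own illustrative helper, not an established API.

```python
def plausibility_transform(m, omega):
    """P_Pl_m(x) proportional to Pl_m({x}), the total mass of focal sets containing x (Eq. 8)."""
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())          # normalization constant K
    return {x: v / k for x, v in pl.items()}

# The BPA of Examples 2 and 3:
omega = [f"x{i}" for i in range(1, 71)]
m = {frozenset({"x1"}): 0.30,
     frozenset({"x2"}): 0.01,
     frozenset(omega[1:]): 0.69}

p = plausibility_transform(m, omega)
print(round(p["x1"], 4), round(p["x2"], 4), round(p["x3"], 4))
# ~0.0063, ~0.0146, ~0.0144, matching Example 3
```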

A New Definition of Entropy of a BPA. Suppose m is a BPA for X. The entropy of m is defined as follows:

$$\begin{aligned} H(m) = \sum _{x \in \varOmega _{X}} P_{Pl_{m}}(x) \log _{2}\left( \frac{1}{P_{Pl_{m}}(x)}\right) + \sum _{\textsf {a} \in 2^{\varOmega _{X}}} m(\textsf {a}) \log _{2}(|\textsf {a}|). \end{aligned}$$
(9)

The first component is Shannon’s entropy of \(P_{Pl_{m}}\), and the second component is generalized Hartley’s entropy of m [8]. Like some of the definitions in the literature, the first component in Eq. (9) is designed to measure conflict in m, and the second component is designed to measure non-specificity in m. Both components are on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), and therefore, H(m) is on the scale \([0, 2 \log _{2}(|\varOmega _{X}|)]\).
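As a sanity check of Eq. (9), here is a minimal sketch that evaluates H(m) for the vacuous and the Bayesian uniform BPAs on a three-element frame; the BPA representation and helper names are ours.

```python
import math

def plausibility_transform(m, omega):
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())
    return {x: v / k for x, v in pl.items()}

def entropy(m, omega):
    """H(m) of Eq. (9): Shannon entropy of the plausibility transform (conflict)
    plus Dubois-Prade generalized Hartley entropy (non-specificity)."""
    p = plausibility_transform(m, omega)
    conflict = sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)
    nonspecificity = sum(mass * math.log2(len(focal)) for focal, mass in m.items())
    return conflict + nonspecificity

omega = ["x1", "x2", "x3"]
vacuous = {frozenset(omega): 1.0}
bayesian_uniform = {frozenset({x}): 1.0 / 3 for x in omega}
print(round(entropy(vacuous, omega), 4))           # 2 * log2(3) ~ 3.17, the maximum
print(round(entropy(bayesian_uniform, omega), 4))  # log2(3) ~ 1.585
```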

Theorem 1

The entropy H(m) of BPA m for X defined in Eq. (9) satisfies the non-negativity, maximum entropy, monotonicity, probability consistency, and additivity properties. It is also consistent with the semantics of the D-S theory.

A proof of this theorem can be found in [14] (which can be downloaded from \(\langle \) http://pshenoy.faculty.ku.edu/Papers/WP330.pdf \(\rangle \)). Finally, we provide an example showing that our definition does not satisfy the sub-additivity property.

Example 4

Consider a BPA m for binary-valued variables \(\{X, Y\}\): \(m(\{(x, y)\}) = m(\{(x, \bar{y})\}) = 0.1\), \(m(\{(\bar{x}, y)\}) = m(\{(\bar{x}, \bar{y})\}) = 0.3\), \(m(\varOmega _{\{X, Y\}}) = 0.2.\) It is easy to verify that \(H(m) \doteq 2.35\). The marginal BPA \(m^{\downarrow X}\) is as follows: \(m^{\downarrow X}(\{x\}) = 0.2\), \(m^{\downarrow X}(\{\bar{x}\}) = 0.6\), and \(m^{\downarrow X}(\varOmega _{X}) = 0.2\). It is easy to verify that \(H(m^{\downarrow X}) \doteq 1.12\). Similarly, the marginal BPA \(m^{\downarrow Y}\) is as follows: \(m^{\downarrow Y}(\{y\}) = 0.4\), \(m^{\downarrow Y}(\{\bar{y}\}) = 0.4\), and \(m^{\downarrow Y}(\varOmega _{Y}) = 0.2\). It is easy to verify that \(H(m^{\downarrow Y}) \doteq 1.20\). Thus, \(H(m) \doteq 2.35 > H(m^{\downarrow X}) + H(m^{\downarrow Y}) \doteq 1.12 + 1.20 = 2.32\).
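The numbers in Example 4 can be verified with the sketch below, which repeats the entropy helpers from the previous sketch for self-containment and writes the marginalization by hand; the encoding of joint states as pairs is an assumption of ours.

```python
import math

def plausibility_transform(m, omega):
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())
    return {x: v / k for x, v in pl.items()}

def entropy(m, omega):
    p = plausibility_transform(m, omega)
    conflict = sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)
    nonspecificity = sum(mass * math.log2(len(focal)) for focal, mass in m.items())
    return conflict + nonspecificity

def marginal(m, axis):
    """Project each joint focal set onto one coordinate and accumulate its mass."""
    out = {}
    for focal, mass in m.items():
        proj = frozenset(pair[axis] for pair in focal)
        out[proj] = out.get(proj, 0.0) + mass
    return out

omega_xy = [(x, y) for x in ("x", "not-x") for y in ("y", "not-y")]
m = {frozenset({("x", "y")}): 0.1, frozenset({("x", "not-y")}): 0.1,
     frozenset({("not-x", "y")}): 0.3, frozenset({("not-x", "not-y")}): 0.3,
     frozenset(omega_xy): 0.2}

h = entropy(m, omega_xy)
hx = entropy(marginal(m, 0), ["x", "not-x"])
hy = entropy(marginal(m, 1), ["y", "not-y"])
print(round(h, 2), round(hx, 2), round(hy, 2))  # ~2.35 > 1.12 + 1.20 = 2.32
```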

The only definition that satisfies the five properties we state plus the sub-additivity property is the one due to Maeda and Ichihashi [22], but that definition is based on credal set semantics of belief functions, which are inconsistent with Dempster’s combination rule. Whether there exists a definition that satisfies our five properties plus sub-additivity, and that is based on semantics consistent with the basic tenets of the D-S theory, remains an open question.

4 Summary and Conclusions

Interpreting Shannon’s entropy of a PMF of a discrete random variable as the amount of uncertainty in the PMF [30], we propose five desirable properties of entropy of a basic probability assignment in the D-S theory of belief functions. These five properties are motivated by the analogous properties of Shannon’s entropy of PMFs, and they are based on our intuition that a vacuous belief function has more uncertainty than an equiprobable Bayesian belief function. Besides the five properties, we also require that any definition be based on semantics consistent with the D-S theory of belief functions (with Dempster’s rule as the combination rule), that H(m) always exist, and that H(m) be a continuous function of m. Thus, a monotonicity-like property suggested by Abellán and Masegosa [2], based on credal set semantics of belief functions that are not compatible with Dempster’s rule, is not included in our set of requirements.

It would be ideal if we could state consistency with D-S theory semantics as a formal requirement, but we are unable to do so. In our opinion, the additivity property for the case of two distinct BPAs for disjoint sets of variables does not fully capture consistency with D-S theory semantics. In any case, the definitions of entropy based on credal sets of probability distributions and on pignistic transforms are not consistent with Dempster’s combination rule, and therefore, in our view, not appropriate for the D-S theory of evidence.

Following a suggestion first made by Lamata and Moral [21], we propose a new definition of entropy of a BPA as the sum of Shannon’s entropy of an equivalent PMF, which captures the conflict component of entropy, and Dubois-Prade’s entropy of the BPA, which captures the non-specificity (or generalized Hartley) component of entropy. The equivalent PMF is the one obtained by using the plausibility transform [4]. This new definition satisfies all five properties we propose. More importantly, our definition is consistent with the semantics of the D-S theory of belief functions.

An open question is whether there exists a definition of entropy of BPA m in the D-S theory that satisfies the five properties we list in Sect. 2, the sub-additivity property, and most importantly, that is consistent with semantics for the D-S theory. Our definition satisfies the five properties and is consistent with semantics for the D-S theory, but it does not satisfy the sub-additivity property.