1 Introduction

The main goal of this paper is to provide a new definition of entropy of belief functions in the D-S theory that is consistent with the semantics of the D-S theory. By entropy, we mean a real-valued measure of uncertainty in the tradition of Hartley [12] and Shannon [30]. Also, while there are several theories of belief functions (see, e.g., [10, 36]), our goal is to define entropy for the D-S theory that uses Dempster’s product-intersection rule [6] as the combination rule.

Hartley’s Entropy. Suppose X is a discrete random variable with a finite state space \(\varOmega _{X}\), whose elements are assumed to be mutually exclusive and exhaustive. Suppose this is all we know about X, i.e., we do not know the probability mass function (PMF) of X. What is a measure of uncertainty? Hartley [12] defines entropy of \(\varOmega _{X}\) as follows:

$$\begin{aligned} U(\varOmega _{X}) = \log _{2}(|\varOmega _{X}|), \end{aligned}$$
(1)

where \(U(\varOmega _{X})\) denotes a real-valued measure of uncertainty of \(\varOmega _{X}\), with units of bits. First, notice that \(U(\varOmega _{X})\) does not depend on the labels attached to the states in \(\varOmega _{X}\), only on the number of states in \(\varOmega _{X}\). Second, Rényi [26] shows that Hartley’s definition in Eq. (1) is characterized by the following three properties.

1. (Additivity) Suppose X and Y are random variables with finite state spaces \(\varOmega _{X}\) and \(\varOmega _{Y}\), respectively. The joint state space of (X, Y) is \(\varOmega _{X} \times \varOmega _{Y}\). Then, \(U(\varOmega _{X} \times \varOmega _{Y}) = U(\varOmega _{X}) + U(\varOmega _{Y})\).

2. (Monotonicity) If \(|\varOmega _{X_{1}}| > |\varOmega _{X_{2}}|\), then \(U(\varOmega _{X_{1}}) > U(\varOmega _{X_{2}})\).

3. (Units) If \(|\varOmega _{X}| = 2\), then \(U(\varOmega _{X}) = 1\) bit.

Shannon’s Entropy. Now suppose we learn of a probability mass function \(P_{X}\) of X. What is the information content of \(P_{X}\)? Or alternatively, we can ask: What is the uncertainty in \(P_{X}\)? Shannon [30] provides an answer to the second question as follows:

$$\begin{aligned} H_{s}(P_{X}) = \sum _{x \in \varOmega _{X}} P_{X}(x) \log _{2}\left( \frac{1}{P_{X}(x)}\right) , \end{aligned}$$
(2)

where \(H_{s}(P_{X})\) is called Shannon’s measure of entropy (uncertainty) in PMF \(P_{X}\). Shannon’s entropy is characterized (up to a constant) by the following two properties [30].

1. (Monotonicity) If \(P_{X}\) is an equally likely PMF, then \(H_{s}(P_{X})\) is a monotonically increasing function of \(|\varOmega _{X}|\).

2. (Compound distributions) If a PMF \(P_{X,Y}\) is factored into two PMFs such that \(P_{X,Y}(x,y) = P_{X}(x) \, P_{Y|x}(y)\), then \(H_{s}(P_{X,Y}) = H_{s}(P_{X}) + \sum _{x \in \varOmega _{X}}P_{X}(x)\,H_{s}(P_{Y|x})\).

The uncertainty prior to learning \(P_{X}\) was \(U(\varOmega _{X})\). After learning \(P_{X}\), it is now \(H_{s}(P_{X})\). Thus, if \(I(P_{X})\) denotes the information content of \(P_{X}\), then we have the equality

$$\begin{aligned} I(P_{X}) + H_{s}(P_{X}) = U(\varOmega _{X}). \end{aligned}$$
(3)

The maximum value of \(H_{s}(P_{X})\) (over the space of all PMFs for X) is \(\log _{2}(|\varOmega _{X}|)\), which is attained by the uniform PMF for X, \(P_{X}(x) = 1/|\varOmega _{X}|\) for all \(x \in \varOmega _{X}\). Thus, \(I(P_{X}) \ge 0\), with equality if and only if \(P_{X}\) is the uniform PMF. At the other extreme, \(H_{s}(P_{X}) \ge 0\), with equality if and only if there exists \(x \in \varOmega _{X}\) such that \(P_{X}(x) = 1\). Such a PMF has no uncertainty, and therefore, it must have maximum information. Thus \(I(P_{X}) \le U(\varOmega _{X})\), with equality if and only if there exists \(x \in \varOmega _{X}\) such that \(P_{X}(x) = 1\).
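To make Eqs. (1)-(3) concrete, the following is a minimal numerical sketch in Python; the function names and the four-state example are ours and are used purely for illustration.

```python
import math

def hartley(num_states):
    """Hartley entropy U(Omega_X) = log2(|Omega_X|), in bits (Eq. 1)."""
    return math.log2(num_states)

def shannon(pmf):
    """Shannon entropy H_s(P_X) of a PMF given as a list of probabilities (Eq. 2)."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

n = 4                                # |Omega_X| = 4, so U(Omega_X) = 2 bits
uniform = [1.0 / n] * n              # maximum-uncertainty PMF
degenerate = [1.0, 0.0, 0.0, 0.0]    # zero-uncertainty PMF

for pmf in (uniform, degenerate):
    info = hartley(n) - shannon(pmf)  # I(P_X) = U(Omega_X) - H_s(P_X), per Eq. (3)
    print(round(shannon(pmf), 4), round(info, 4))
# uniform:    H_s = 2.0 bits, I = 0.0 bits
# degenerate: H_s = 0.0 bits, I = 2.0 bits = U(Omega_X)
```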

Entropy for the D-S Theory. In the case of the D-S theory of belief functions, if m is a basic probability assignment (BPA) for X, let H(m) denote the entropy of BPA m. First, the D-S theory is a generalization of probability theory. The equiprobable PMF is represented by the Bayesian uniform BPA \(m_{u}\) for X such that \(m_{u}(\{x\}) = 1/|\varOmega _{X}|\) for all \(x \in \varOmega _{X}\). So, to be consistent with probability theory, we should have \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). However, such a BPA \(m_{u}\) for X does not have the maximum uncertainty. The vacuous BPA \(\iota _{X}\) for X, such that \(\iota _{X}(\varOmega _{X}) = 1\), has more uncertainty than the equiprobable Bayesian BPA \(m_{u}\). As we cannot imagine a BPA for X that has more uncertainty than \(\iota _{X}\), we require that \(H(\iota _{X})\) be the maximum. Klir [16] and others argue that a measure of uncertainty in the D-S theory should capture both a measure of conflict and a measure of non-specificity. Assuming that each of these two components is measured on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), we have \(H(\iota _{X}) = 2 \log _{2}(|\varOmega _{X}|)\). As in probability theory, we can define a measure of information content of BPA m for X so that the following holds:

$$\begin{aligned} I(m) + H(m) = 2 \log _{2}(|\varOmega _{X}|), \end{aligned}$$
(4)

where I(m) denotes the information content of BPA m for X. Thus, for the vacuous BPA \(\iota _{X}\) for X, we have \(I(\iota _{X}) = 0\), whereas for the Bayesian uniform BPA \(m_{u}\) for X, we have \(I(m_{u}) = \log _{2}(|\varOmega _{X}|)\).

In this paper, we are interested in defining a measure of entropy (uncertainty) of BPAs m for X in the D-S theory of belief functions on the scale \(0 \le H(m) \le 2 \log _{2}(|\varOmega _{X}|)\), so that \(H(m) \le 2 \log _{2}(|\varOmega _{X}|)\) with equality if and only if \(m = \iota _{X}\), and \(H(m) \ge 0\), with equality if and only if m is such that \(m(\{x\}) = 1\) for some \(x \in \varOmega _{X}\). Also, we require a monotonicity property, a probability consistency property, an additivity property, and a requirement that any definition should be based on semantics consistent with D-S theory. These are discussed in detail in Sect. 2.

Literature Review. There is a rich literature on information-theoretic measures for the D-S theory of belief functions. Some, e.g., [13, 23, 32, 38], define the information content of BPA m so that \(I(\iota _{X}) = 0\). Others define entropy on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), either only as a measure of conflict (e.g., [35]) or only as a measure of non-specificity [8]. Some, e.g., [11, 15, 17, 18, 21, 24, 25, 37], define entropy as a measure of both conflict and non-specificity, but on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), so that \(H(m_{u}) = H(\iota _{X}) = \log _{2}(|\varOmega _{X}|)\). Some, e.g., [1, 3, 22], define entropy as a measure of conflict and non-specificity on the scale \([0, 2\log _{2}(|\varOmega _{X}|)]\), but they do so using semantics of belief functions (credal sets of PMFs) that are inconsistent with Dempster’s rule of combination [10, 29]. Our definition is the only one that defines entropy as a measure of conflict and non-specificity, on the scale \([0, 2\log _{2}(|\varOmega _{X}|)]\), using semantics of belief functions that are consistent with Dempster’s combination rule.

2 Desired Properties of Entropy of BPAs in the D-S Theory

First, we explain our informal requirement that any definition of entropy for the D-S theory should be consistent with the semantics of this theory. Next, we propose five formal properties that a definition of entropy of BPAs in the D-S theory should satisfy. Finally, we compare these properties with those proposed by Klir and Wierman [19] for the same purpose.

Consistency with D-S Theory Semantics Requirement. First, let us stress once more that we are concerned in this paper only with the D-S belief functions theory that includes Dempster’s combination rule as the operation for aggregating knowledge. There are theories of belief functions that use other combination rules. Let \(2^{\varOmega _{X}}\) denote the set of all non-empty subsets of \(\varOmega _{X}\). A BPA m for X can be considered as an encoding of a collection of PMFs \(\mathcal {P}_{m}\) for X such that for all \(\textsf {a} \in 2^{\varOmega _{X}}\) we have:

$$\begin{aligned} Bel_{m}(\textsf {a}) = \sum _{\textsf {b} \in 2^{\varOmega _{X}}: \textsf {b} \subseteq \textsf {a}} m(\textsf {b}) = \min _{P \in \mathcal {P}_{m}} \sum _{x \in \textsf {a}}P(x). \end{aligned}$$
(5)

\(\mathcal {P}_{m}\) is referred to as a credal set corresponding to m in the imprecise probability literature (see, e.g., [36]). For such a theory of belief functions, Fagin and Halpern [9] propose a combination rule that is different from Dempster’s combination rule. Thus, as the D-S theory uses Dempster’s rule, a BPA m in the D-S theory cannot be interpreted as a collection of PMFs satisfying Eq. (5) [10, 28]. There are, of course, semantics that are consistent with the D-S theory, such as multivalued mappings [6], random codes [28], transferable beliefs [34], and hints [20].

Example 1

Consider a BPA \(m_{1}\) for X with state space \(\varOmega _{X} = \{x_{1}, x_{2}, x_{3}\}\) as follows: \(m_{1}(\{x_{1}\}) = 0.5\), \(m_{1}(\varOmega _{X}) = 0.5\). With the credal set semantics of a BPA function, \(m_{1}\) corresponds to a set of PMFs \(\mathcal {P}_{m_{1}} = \{P \in \mathcal {P} : P(x_{1}) \ge 0.5\}\), where \(\mathcal {P}\) denotes the set of all PMFs for X. Now suppose we get a distinct piece of evidence \(m_{2}\) for X such that \(m_{2}(\{x_{2}\}) = 0.5\), \(m_{2}(\varOmega _{X}) = 0.5\). \(m_{2}\) corresponds to \(\mathcal {P}_{m_{2}} = \{P \in \mathcal {P} : P(x_{2}) \ge 0.5\}\). The only PMF that is in both \(\mathcal {P}_{m_{1}}\) and \(\mathcal {P}_{m_{2}}\) is \(P \in \mathcal {P}\) such that \(P(x_{1}) = P(x_{2}) = 0.5\), and \(P(x_{3}) = 0\). Notice that if we use Dempster’s rule to combine \(m_{1}\) and \(m_{2}\), we have: \((m_{1}\oplus m_{2})(\{x_{1}\}) = \frac{1}{3}\), \((m_{1}\oplus m_{2})(\{x_{2}\}) = \frac{1}{3}\), and \((m_{1}\oplus m_{2})(\varOmega _{X}) = \frac{1}{3}\). The set of PMFs \(\mathcal {P}_{m_{1}\oplus m_{2}} = \{P \in \mathcal {P}: P(x_{1}) \ge \frac{1}{3}, P(x_{2}) \ge \frac{1}{3}\}\) is not the same as \(\mathcal {P}_{m_{1}} \cap \mathcal {P}_{m_{2}}\). Thus, credal set semantics of belief functions are not compatible with Dempster’s rule of combination.
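The combination in Example 1 can be reproduced with a short sketch of Dempster’s rule; the dictionary-of-frozensets representation of a BPA and the helper name dempster_combine are our own conventions, assumed only for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: intersect focal sets pointwise, drop empty intersections,
    multiply masses, and renormalize by 1 minus the conflict."""
    combined = {}
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:  # empty intersections contribute only to the conflict
            combined[c] = combined.get(c, 0.0) + p * q
    k = sum(combined.values())  # normalization constant (1 - conflict)
    return {a: v / k for a, v in combined.items()}

omega = frozenset({"x1", "x2", "x3"})
m1 = {frozenset({"x1"}): 0.5, omega: 0.5}
m2 = {frozenset({"x2"}): 0.5, omega: 0.5}

for focal, mass in dempster_combine(m1, m2).items():
    print(sorted(focal), round(mass, 4))
# {x1}: 1/3, {x2}: 1/3, Omega_X: 1/3, whereas the intersection of the two
# credal sets contains only the PMF with P(x1) = P(x2) = 0.5.
```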

Second, given a BPA m for X in the D-S theory, there are many ways to transform m to a corresponding PMF \(P_{m}\) for X [5]. However, only one of these, called the plausibility transform [4], is consistent with m in the D-S theory in the sense that \(P_{m_{1}} \otimes P_{m_{2}} = P_{m_{1} \oplus m_{2}}\), where \(\otimes \) is the combination rule in probability theory [31], and \(\oplus \) is Dempster’s combination rule in the D-S theory [4]. Some definitions, e.g., [7, 15, 25], define the entropy of m as Shannon’s entropy of the pignistic transform of m. The pignistic transform of m is not compatible with Dempster’s combination rule [4], and therefore, these definitions are not consistent with D-S theory semantics. Thus, as per our consistency with D-S theory semantics requirement, any method for defining entropy of m in the D-S theory by first transforming m to a corresponding PMF should use the plausibility transform. Notice that we are not claiming that a definition of entropy for the D-S theory must use the plausibility transform, only that, if one takes the path of first transforming a BPA m to an equivalent PMF and then using Shannon’s entropy of that PMF as the definition of the entropy of m, then, to be compatible with D-S theory semantics, the transformation used must be the plausibility transform.

Example 2

Consider a situation where we have vacuous prior knowledge of X with \(\varOmega _{X} = \{x_{1}, \ldots , x_{70}\}\) and we receive evidence represented as BPA m for X as follows: \(m(\{x_{1}\}) = 0.30\), \(m(\{x_{2}\}) = 0.01\), and \(m(\{x_{2}, \ldots , x_{70}\}) = 0.69\). The pignistic transform of m [33], denoted by \(BetP_{m}\), is as follows: \(BetP_{m}(x_{1}) = 0.30\), \(BetP_{m}(x_{2}) = 0.02\), and \(BetP_{m}(x_{3}) = \ldots = BetP_{m}(x_{70}) = 0.01\). Thus, as per the pignistic transform, BPA m is interpreted as evidence where \(x_{1}\) is 15 times more likely than \(x_{2}\). Now suppose we receive another distinct piece of evidence that is also represented by m. As per the D-S theory, our total evidence is now \(m \oplus m\). If on the basis of m (or \(BetP_{m}\)), \(x_{1}\) was 15 times more likely than \(x_{2}\), then now that we have evidence \(m \oplus m\), \(x_{1}\) should be \(15^{2} = 225\) times more likely than \(x_{2}\). But \(BetP_{m\oplus m}(x_{1}) \approx 0.156\) and \(BetP_{m\oplus m}(x_{2}) \approx 0.036\). So according to \(BetP_{m\oplus m}\), \(x_{1}\) is only about 4.33 times more likely than \(x_{2}\). Thus, \(BetP_{m}\) is not consistent with Dempster’s combination rule.
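The arithmetic in Example 2 can be checked with the sketch below; the pignistic and dempster_combine helpers are illustrative names of ours, not functions of any particular library.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule (as in the sketch for Example 1)."""
    combined = {}
    for (a, p), (b, q) in product(m1.items(), m2.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + p * q
    k = sum(combined.values())
    return {a: v / k for a, v in combined.items()}

def pignistic(m):
    """BetP_m: each focal set's mass is split equally among its elements."""
    bet = {}
    for focal, mass in m.items():
        for x in focal:
            bet[x] = bet.get(x, 0.0) + mass / len(focal)
    return bet

states = [f"x{i}" for i in range(1, 71)]
m = {frozenset({"x1"}): 0.30,
     frozenset({"x2"}): 0.01,
     frozenset(states[1:]): 0.69}   # {x2, ..., x70}

bet1 = pignistic(m)
bet2 = pignistic(dempster_combine(m, m))
print(bet1["x1"] / bet1["x2"])   # ~15, based on m alone
print(bet2["x1"] / bet2["x2"])   # ~4.33, not 15^2 = 225
```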

Thus, one requirement we implicitly assume is that any definition of entropy of m should be based on semantics for m that are consistent with the basic tenets of the D-S theory. Also, we implicitly assume existence and continuity: given a BPA m, H(m) should always exist, and H(m) should be a continuous function of m. We do not list these three requirements explicitly.

Desired Properties of Entropy for the D-S Theory. The following list of desired properties of entropy \(H(m_{X})\), where \(m_{X}\) is a BPA for X, is motivated by the properties of Shannon’s entropy of PMFs [30].

Let X and Y denote random variables with state spaces \(\varOmega _{X}\) and \(\varOmega _{Y}\), respectively. Let \(m_{X}\) and \(m_{Y}\) denote distinct BPAs for X and Y, respectively. Let \(\iota _{X}\) and \(\iota _{Y}\) denote the vacuous BPAs for X and Y, respectively.

1. (Non-negativity) \(H(m_{X}) \ge 0\), with equality if and only if there is an \(x \in \varOmega _{X}\) such that \(m_{X}(\{x\}) = 1\). This is similar to the probabilistic case.

2. (Maximum entropy) \(H(m_{X}) \le H(\iota _{X})\), with equality if and only if \(m_{X} = \iota _{X}\). This makes sense as the vacuous BPA \(\iota _{X}\) for X has the most uncertainty among all BPAs for X. Such a property is advocated in [3].

3. (Monotonicity) If \(|\varOmega _{X}| < |\varOmega _{Y}|\), then \(H(\iota _{X}) < H(\iota _{Y})\). A similar property is used by Shannon to characterize his definition of entropy of PMFs.

4. (Probability consistency) If \(m_{X}\) is a Bayesian BPA for X, then \(H(m_{X}) = H_{s}(P_{X})\), where \(P_{X}\) is the PMF of X corresponding to \(m_{X}\).

5. (Additivity) \(H(m_{X} \oplus m_{Y}) = H(m_{X}) + H(m_{Y})\). This is a weaker form of the compound property of Shannon’s entropy of a PMF.

Klir and Wierman [19] also describe a set of properties that they believe should be satisfied by any meaningful measure of uncertainty based on intuitive grounds. Two of the properties that they suggest, probability consistency and additivity, are also included in the above list. Our maximum entropy property is not in their list. Two of the properties that they require do not make intuitive sense to us.

First, Klir and Wierman require a property they call “set consistency” as follows: \(H(m) = \log _{2}(|\textsf {a}|)\) whenever m is deterministic (i.e., it has only one focal element) with focal set \(\textsf {a}\). This property would require that \(H(\iota _{X}) = \log _{2}(|\varOmega _{X}|)\). The probability consistency property requires that for the Bayesian uniform BPA \(m_{u}\), \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). Thus, these two requirements entail that \(H(\iota _{X}) = H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). We disagree, as there is greater uncertainty in \(\iota _{X}\) than in \(m_{u}\).

Second, Klir and Wierman require a property they call “range” as follows: for any BPA \(m_{X}\) for X, \(0 \le H(m_{X}) \le \log _{2}(|\varOmega _{X}|)\). The probability consistency property requires that \(H(m_{u}) = \log _{2}(|\varOmega _{X}|)\). Including the range property as well would therefore prevent us from having \(H(\iota _{X}) > H(m_{u})\). So we do not include it in our list.

Finally, Klir and Wierman require a sub-additivity property defined as follows. Suppose m is a BPA for \(\{X, Y\}\), with marginal BPAs \(m^{\downarrow X}\) for X, and \(m^{\downarrow Y}\) for Y. Then,

$$\begin{aligned} H(m) \le H(m^{\downarrow X}) + H(m^{\downarrow Y}) \end{aligned}$$
(6)

We agree that this property is important, and the only reason we do not include it in our list is that we are unable to satisfy it in addition to the five requirements that we do include and our implicit requirement that any definition be consistent with the semantics of the D-S theory of belief functions.

The most important property that characterizes Shannon’s definition of entropy is the compound property \(H_{s}(P_{X, Y}) = H_{s}(P_{X} \otimes P_{Y|X}) = H_{s}(P_{X}) + H_{s}(P_{Y|X})\), where \(H_{s}(P_{Y|X}) = \sum _{x \in \varOmega _{X}}P_{X}(x)\,H_{s}(P_{Y|x})\). Translated to the D-S theory of belief functions, this would require factoring a BPA m for \(\{X, Y\}\) into a BPA \(m^{\downarrow X}\) for X and a BPA \(m_{Y|X}\) for \(\{X, Y\}\) such that \(m = m^{\downarrow X} \oplus m_{Y|X}\). This cannot be done for every BPA m for \(\{X, Y\}\) [31]. But we could construct m for \(\{X, Y\}\) such that \(m = m_{X} \oplus m_{Y|X}\), where \(m_{X}\) is a BPA for X, and \(m_{Y|X}\) is a BPA for \(\{X, Y\}\) such that \(m_{Y|X}^{\downarrow X} = \iota _{X}\), and \(m_{X}\) and \(m_{Y|X}\) are non-conflicting, i.e., the normalization constant in Dempster’s combination rule is 1. Notice that such a constructed BPA m would have the property \(m^{\downarrow X} = (m_{X} \oplus m_{Y|X})^{\downarrow X} = m_{X}\). For such constructed BPAs m, we could require a compound property as follows:

$$\begin{aligned} H(m_{X} \oplus m_{Y|X}) = H(m_{X}) + H(m_{Y|X}). \end{aligned}$$
(7)

However, we are unable to formulate a definition of H(m) that satisfies such a compound property. So, like the sub-additivity property, we do not include a compound property in our list of properties. The additivity property, included both in Klir and Wierman’s list and in ours, is so weak that it is satisfied by any definition on a log scale. All definitions of entropy of belief functions in the literature are defined on a log scale, and, thus, they all satisfy the additivity property.

3 A New Definition of Entropy of BPAs in the D-S Theory

In this section, we propose a new definition of entropy of BPAs in the D-S theory. The new definition is based on the plausibility transform of a belief function to an equivalent probability function. Therefore, we start this section by describing the plausibility transform, originally introduced in [4].

Plausibility Transform of a BPA to a PMF. Suppose m is a BPA for X. What is the PMF of X that best represents m in the D-S theory? An answer to this question is given by Cobb and Shenoy [4], who propose the plausibility transform of m as follows. First, consider the plausibility function \(Pl_{m}\) corresponding to m. Next, construct a PMF for X, denoted by \(P_{Pl_{m}}\), by normalizing the values of \(Pl_{m}\) for singleton subsets, i.e.,

$$\begin{aligned} P_{Pl_{m}}(x) = K^{-1} \cdot Pl_{m}(\{x\}) = K^{-1} \cdot Q_{m}(\{x\}) \end{aligned}$$
(8)

for all \(x \in \varOmega _{X}\), where K is a normalization constant that ensures \(P_{Pl_{m}}\) is a PMF, i.e., \(K = \sum _{x \in \varOmega _{X}} Pl_{m}(\{x\}) = \sum _{x \in \varOmega _{X}} Q_{m}(\{x\})\).

Cobb and Shenoy [4] argue that, of the many methods for transforming belief functions to PMFs, the plausibility transform is the one that is consistent with Dempster’s rule of combination in the sense that if we have BPAs \(m_{1}, \ldots , m_{k}\) for X, then \(P_{Pl_{m_{1} \oplus \ldots \oplus m_{k}}} = P_{Pl_{m_{1}}} \otimes \ldots \otimes P_{Pl_{m_{k}}}\), where \(\otimes \) denotes pointwise multiplication followed by normalization (i.e., Bayesian combination [31]). It can be shown that the plausibility transform is the only method that has this property, which follows from the fact that Dempster’s rule of combination is pointwise multiplication of commonality functions followed by normalization [27].

Example 3

Consider the BPA m for X described in Example 2. Then, \(Pl_{m}\) for singleton subsets is as follows: \(Pl_{m}(\{x_{1}\}) = 0.30\), \(Pl_{m}(\{x_{2}\}) = 0.70\), \(Pl_{m}(\{x_{3}\}) = \cdots = Pl_{m}(\{x_{70}\}) = 0.69\). The plausibility transform of m is as follows: \(P_{Pl_{m}}(x_{1}) = 0.3/47.92 \approx 0.0063\), \(P_{Pl_{m}}(x_{2}) = 0.7/47.92 \approx 0.0146\), and \(P_{Pl_{m}}(x_{3}) = \cdots = P_{Pl_{m}}(x_{70}) \approx 0.0144\). Notice that \(P_{Pl_{m}}\) is quite different from \(BetP_{m}\). In \(BetP_{m}\), \(x_{1}\) is 15 times more likely than \(x_{2}\). In \(P_{Pl_{m}}\), \(x_{2}\) is 2.33 times more likely than \(x_{1}\). Now consider the scenario where we get a distinct piece of evidence that is identical to m, so that our total evidence is \(m \oplus m\). If we compute \(m \oplus m\) and \(P_{Pl_{m \oplus m}}\), then as per \(P_{Pl_{m \oplus m}}\), \(x_{2}\) is \(2.33^{2}\) times more likely than \(x_{1}\). This is a direct consequence of the consistency of the plausibility transform with Dempster’s combination rule.
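A short sketch that reproduces the numbers in Example 3, assuming the same dictionary-of-frozensets BPA representation used in the earlier sketches; plausibility_transform is our own illustrative helper, not an established API.

```python
def plausibility_transform(m, omega):
    """P_Pl_m(x) proportional to Pl_m({x}), the total mass of focal sets containing x (Eq. 8)."""
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())          # normalization constant K
    return {x: v / k for x, v in pl.items()}

# The BPA of Examples 2 and 3:
omega = [f"x{i}" for i in range(1, 71)]
m = {frozenset({"x1"}): 0.30,
     frozenset({"x2"}): 0.01,
     frozenset(omega[1:]): 0.69}

p = plausibility_transform(m, omega)
print(round(p["x1"], 4), round(p["x2"], 4), round(p["x3"], 4))
# ~0.0063, ~0.0146, ~0.0144, matching Example 3
```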

A New Definition of Entropy of a BPA. Suppose m is a BPA for X. The entropy of m is defined as follows:

$$\begin{aligned} H(m) = \sum _{x \in \varOmega _{X}} P_{Pl_{m}}(x) \log _{2}\left( \frac{1}{P_{Pl_{m}}(x)}\right) + \sum _{\textsf {a} \in 2^{\varOmega _{X}}} m(\textsf {a}) \log _{2}(|\textsf {a}|). \end{aligned}$$
(9)

The first component is Shannon’s entropy of \(P_{Pl_{m}}\), and the second component is generalized Hartley’s entropy of m [8]. Like some of the definitions in the literature, the first component in Eq. (9) is designed to measure conflict in m, and the second component is designed to measure non-specificity in m. Both components are on the scale \([0, \log _{2}(|\varOmega _{X}|)]\), and therefore, H(m) is on the scale \([0, 2 \log _{2}(|\varOmega _{X}|)]\).
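As a sanity check of Eq. (9), here is a minimal sketch that evaluates H(m) for the vacuous and the Bayesian uniform BPAs on a three-element frame; the BPA representation and helper names are ours.

```python
import math

def plausibility_transform(m, omega):
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())
    return {x: v / k for x, v in pl.items()}

def entropy(m, omega):
    """H(m) of Eq. (9): Shannon entropy of the plausibility transform (conflict)
    plus Dubois-Prade generalized Hartley entropy (non-specificity)."""
    p = plausibility_transform(m, omega)
    conflict = sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)
    nonspecificity = sum(mass * math.log2(len(focal)) for focal, mass in m.items())
    return conflict + nonspecificity

omega = ["x1", "x2", "x3"]
vacuous = {frozenset(omega): 1.0}
bayesian_uniform = {frozenset({x}): 1.0 / 3 for x in omega}
print(round(entropy(vacuous, omega), 4))           # 2 * log2(3) ~ 3.17, the maximum
print(round(entropy(bayesian_uniform, omega), 4))  # log2(3) ~ 1.585
```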

Theorem 1

The entropy H(m) of BPA m for X defined in Eq. (9) satisfies the non-negativity, maximum entropy, monotonicity, probability consistency, and additivity properties. It is also consistent with the semantics of the D-S theory.

A proof of this theorem can be found in [14] (which can be downloaded from \(\langle \) http://pshenoy.faculty.ku.edu/Papers/WP330.pdf \(\rangle \)). Finally, we provide an example showing that our definition does not satisfy the sub-additivity property.

Example 4

Consider a BPA m for binary-valued variables \(\{X, Y\}\): \(m(\{(x, y)\}) = m(\{(x, \bar{y})\}) = 0.1\), \(m(\{(\bar{x}, y)\}) = m(\{(\bar{x}, \bar{y})\}) = 0.3\), \(m(\varOmega _{\{X, Y\}}) = 0.2.\) It is easy to verify that \(H(m) \doteq 2.35\). The marginal BPA \(m^{\downarrow X}\) is as follows: \(m^{\downarrow X}(\{x\}) = 0.2\), \(m^{\downarrow X}(\{\bar{x}\}) = 0.6\), and \(m^{\downarrow X}(\varOmega _{X}) = 0.2\). It is easy to verify that \(H(m^{\downarrow X}) \doteq 1.12\). Similarly, the marginal BPA \(m^{\downarrow Y}\) is as follows: \(m^{\downarrow Y}(\{y\}) = 0.4\), \(m^{\downarrow Y}(\{\bar{y}\}) = 0.4\), and \(m^{\downarrow Y}(\varOmega _{Y}) = 0.2\). It is easy to verify that \(H(m^{\downarrow Y}) \doteq 1.20\). Thus, \(H(m) \doteq 2.35 > H(m^{\downarrow X}) + H(m^{\downarrow Y}) \doteq 1.12 + 1.20 = 2.32\).
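The numbers in Example 4 can be verified with the sketch below, which repeats the entropy helpers from the previous sketch for self-containment and writes the marginalization by hand; the encoding of joint states as pairs is an assumption of ours.

```python
import math

def plausibility_transform(m, omega):
    pl = {x: sum(mass for focal, mass in m.items() if x in focal) for x in omega}
    k = sum(pl.values())
    return {x: v / k for x, v in pl.items()}

def entropy(m, omega):
    p = plausibility_transform(m, omega)
    conflict = sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)
    nonspecificity = sum(mass * math.log2(len(focal)) for focal, mass in m.items())
    return conflict + nonspecificity

def marginal(m, axis):
    """Project each joint focal set onto one coordinate and accumulate its mass."""
    out = {}
    for focal, mass in m.items():
        proj = frozenset(pair[axis] for pair in focal)
        out[proj] = out.get(proj, 0.0) + mass
    return out

omega_xy = [(x, y) for x in ("x", "not-x") for y in ("y", "not-y")]
m = {frozenset({("x", "y")}): 0.1, frozenset({("x", "not-y")}): 0.1,
     frozenset({("not-x", "y")}): 0.3, frozenset({("not-x", "not-y")}): 0.3,
     frozenset(omega_xy): 0.2}

h = entropy(m, omega_xy)
hx = entropy(marginal(m, 0), ["x", "not-x"])
hy = entropy(marginal(m, 1), ["y", "not-y"])
print(round(h, 2), round(hx, 2), round(hy, 2))  # ~2.35 > 1.12 + 1.20 = 2.32
```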

The only definition that satisfies the five properties we state plus the sub-additivity property is the one due to Maeda and Ichihashi [22], but that definition is based on credal set semantics of belief functions, which are inconsistent with Dempster’s combination rule. Whether there exists a definition that satisfies our five properties plus sub-additivity, and that is based on semantics consistent with the basic tenets of the D-S theory, remains an open question.

4 Summary and Conclusions

Interpreting Shannon’s entropy of a PMF of a discrete random variable as the amount of uncertainty in the PMF [30], we propose five desirable properties of entropy of a basic probability assignment in the D-S theory of belief functions. These five properties are motivated by the analogous properties of Shannon’s entropy of PMFs, and they are based on our intuition that a vacuous belief function has more uncertainty than an equiprobable Bayesian belief function. Besides the five properties, we also require that any definition be based on semantics consistent with the D-S theory of belief functions (with Dempster’s rule as the combination rule), that H(m) always exist, and that H(m) be a continuous function of m. Thus, a monotonicity-like property suggested by Abellán and Masegosa [2], based on credal set semantics of belief functions that are not compatible with Dempster’s rule, is not included in our set of requirements.

It would be ideal if we could state consistency with D-S theory semantics as a formal requirement, but we are unable to do so. In our opinion, the additivity property for the case of two distinct BPAs for disjoint sets of variables does not fully capture consistency with D-S theory semantics. In any case, the definitions of entropy based on credal sets of probability distributions and on pignistic transforms are not consistent with Dempster’s combination rule, and therefore, in our view, not appropriate for the D-S theory of evidence.

Following a suggestion first made by Lamata and Moral [21], we propose a new definition of entropy of a BPA as the sum of Shannon’s entropy of an equivalent PMF, which captures the conflict component of entropy, and Dubois-Prade’s entropy of the BPA, which captures the non-specificity (or generalized Hartley) component of entropy. The equivalent PMF is the one obtained by using the plausibility transform [4]. This new definition satisfies all five properties we propose. More importantly, our definition is consistent with the semantics of the D-S theory of belief functions.

An open question is whether there exists a definition of entropy of BPA m in the D-S theory that satisfies the five properties we list in Sect. 2, the sub-additivity property, and most importantly, that is consistent with semantics for the D-S theory. Our definition satisfies the five properties and is consistent with semantics for the D-S theory, but it does not satisfy the sub-additivity property.