1 Introduction

Dealing with a lack of information is a common problem in many areas. This lack of information can arise in two different ways: uncertainty or imprecision. In the first case, we deal with experiments that can have more than one possible outcome; each possible outcome can be specified in advance, but the actual outcome depends on chance. For instance, in a coin toss we know the two possible outcomes, heads or tails, but we do not know the final result. In the second case, we have no uncertainty about the result of the experiment, but imprecision. Thus, for instance, if we consider again the coin toss, the coin could already have been thrown, but perhaps it is so old that we are not sure whether the face it shows is clearly a head.

Information theory studies the quantification and communication of information and, in particular, it measures the amount of uncertainty involved in the outcome of a random experiment. It was originally proposed by Shannon [31] in 1948 as a tool in signal processing. This theory thus combines several different fields, such as mathematics, statistics, computer science, physics and electrical engineering. From the beginning, it proved to be a useful tool in many other areas, and therefore many researchers started to work on it (Rényi [30], Oniçescu [27], Sharma and Mittal [32], Havrda and Charvát [11], etc.). Later, an important step was taken by Kampé de Fériet and Forte [12] with an axiomatic definition of information with or without a probability measure. Building on the theoretical side of this theory, Kullback [17] found many interesting applications in statistical inference, and from this initial application a large body of work has been developed in this area. In particular, some very important achievements have been obtained by Pardo (see, among others, [28, 29]). An important review of all these theories can be found in Gil [7], who was one of the most important researchers in this area in Spain. Divergence measures between probability distributions were an important topic in that monograph, and they are the starting point of this chapter, as we will see later.

On the other hand, Zadeh [34] introduced in 1965 the concept of a fuzzy set, as a way to model vague or poorly defined properties in situations where it is not possible to fully discriminate between having and not having the property in question. From this, a whole mathematical and applied theory to deal with imprecision was developed, known as Fuzzy Logic Theory. Two interesting monographs on this theory were written by Dubois and Prade [6] and Klir and Folger [13].

As we can see from the title of this last book, the concepts of fuzzy sets, uncertainty and information are intertwined. This is not by chance, since these topics are closely related, as we can see in [8,9,10]. In particular, we have studied [24] the relationship between the uncertainty measures defined in Information Theory [12] and the fuzziness measures introduced by De Luca and Termini [5] and later analyzed in greater depth by Knopfmacher [14]. The link between measures of uncertainty and imprecision in fuzzy environments will lie in what we will refer to as a divergence measure, by analogy with the classical meaning of the term used in comparing two probability distributions (see, for instance, [29]). The main purpose of this chapter is to use these measures to compare two fuzzy sets.

As introductory notions, we present in Sect. 2 two axiomatic definitions for measuring entropy: uncertainty measures and fuzziness measures. A study of the relationship between them, in the most general context, is also given there. The definition of a divergence measure between fuzzy sets is given in Sect. 3, following the ideas considered previously. The most important results are contained in that section, where we also comment on some extensions. Finally, we conclude the work with some remarks in Sect. 4.

2 Preliminaries

This section presents the concepts needed to understand the remaining parts of this work. In particular, we focus on the definitions and notation for uncertainty measures and fuzziness measures.

2.1 Uncertainty Measures

The first probabilistic uncertainty measure (also called entropy) was given by Shannon [31] in the context of Communication Theory. That initial definition considered that the uncertainty of a random experiment can be measured by means of the quantity

$$H(P) = -\displaystyle \sum _{i=1}^n p_i \log _2 (p_i)$$

where the values \(p_i\) represent the probabilities of the possible results of the experiment.
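
To illustrate, here is a minimal Python sketch of this quantity (the function name and example values are ours, not from the literature):

```python
import math

def shannon_entropy(p):
    """H(P) = -sum_i p_i * log2(p_i); terms with p_i = 0 contribute nothing."""
    return -sum(p_i * math.log2(p_i) for p_i in p if p_i > 0)

print(shannon_entropy([0.5, 0.5]))  # 1.0 bit: a fair coin, maximal uncertainty
print(shannon_entropy([0.9, 0.1]))  # ~0.469 bits: a biased coin, less uncertainty
```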

From that initial definition, many generalizations have been proposed in the literature.

Menéndez et al. [19] proved that all these measures of entropy belong to a wider family, named the h-\(\phi \)-entropies.

This family is slightly more general than Ben Bassat's family of f-entropies, defined as those functions that can be expressed as

$$H(P)= \displaystyle \sum _{i=1}^n f(p_i) $$

where f is a concave function.

Later, the quasi-\(\phi \)-entropies were introduced and characterized in the case of discrete distributions [3]. This family is more general than Ben Bassat's, but different from the family of h-\(\phi \)-entropies. More precisely, its members are defined by

$$H(P)= \displaystyle \sum _{i=1}^n \phi (p_i) $$

where \(\phi \) is a function such that \(\phi (\lambda x + (1-\lambda ) y) \ge \lambda \phi (x) + (1-\lambda ) \phi (y)\) for all \(\lambda \in [0,1]\) and all \(x,y\in [0,1]\) with \(x+y\le 1\).
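
This condition can be tested numerically for a candidate \(\phi \); the sketch below is our own, using the concave function \(\phi (t)=t(1-t)\) as an illustrative choice (any concave \(\phi \) satisfies the inequality on the whole square, hence in particular on the constrained region):

```python
import random

def is_quasi_phi(phi, trials=100_000, tol=1e-12):
    """Randomly test phi(l*x + (1-l)*y) >= l*phi(x) + (1-l)*phi(y)
    over x, y in [0, 1] with x + y <= 1 and l in [0, 1]."""
    for _ in range(trials):
        x, y = random.random(), random.random()
        if x + y > 1:
            continue  # keep only points in the constrained region
        l = random.random()
        if phi(l * x + (1 - l) * y) < l * phi(x) + (1 - l) * phi(y) - tol:
            return False
    return True

print(is_quasi_phi(lambda t: t * (1 - t)))  # True: concave, hence quasi-phi
```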

An important property of uncertainty measures is the Principle of Transfer, or Pigou–Dalton condition. An uncertainty measure H fulfils this property if, given two probability distributions P and \(P'\) with values \((p_1, p_2, \ldots , p_n)\) and \((p'_1, p'_2, \ldots , p'_n)\) respectively, then \(H(P)\le H(P')\), where \(p'_k = p_k\) for all \(k\notin \{i,j\}\) and, for a pair of indices with \(p_i \le p_j\), \(p'_i = p_i+\delta \) and \(p'_j = p_j-\delta \) for some \(0\le \delta \le (p_j-p_i)/2\).

This is a very natural property, since it means that the more similar the probabilities of two outcomes of an experiment are, the higher the uncertainty is.
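
Continuing the earlier sketch, here is a quick numerical check of the transfer principle for Shannon's entropy (the distributions are our own example, reusing shannon_entropy from above):

```python
# Transfer delta = 0.1 from the larger probability 0.7 to the smaller 0.1;
# note delta = 0.1 <= (0.7 - 0.1) / 2, so the pair's ordering is preserved.
P       = [0.1, 0.2, 0.7]
P_prime = [0.2, 0.2, 0.6]
assert shannon_entropy(P) <= shannon_entropy(P_prime)  # uncertainty increases
```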

2.2 Fuzziness Measures

Having commented on some results about uncertainty measures, or probabilistic entropies, we now introduce fuzzy sets and the measures of their fuzziness, i.e., the non-probabilistic entropies.

These concepts are well known and can be found in a wide range of sources (see, for instance, the classical books [6, 13]).

The universal set is denoted by X. A fuzzy subset of X is a mapping from X into the unit interval [0, 1].

In this framework, we use the following notations:

  • \(\mathscr {P}(X)\) is the set of all subsets of X,

  • \({\mathscr {F}}(X)\) is the set of all fuzzy subsets of X,

  • \(A \in \mathscr {P}(X)\) will denote any crisp set,

  • \(\widetilde{A} \in {\mathscr {F}}(X)\) will denote any fuzzy set.

We identify a fuzzy set with its membership function. Thus, \(X(x)=1\) for all \(x \in X\), and for the empty set, \(\emptyset (x)=0\) for all \(x \in X\).

Two further important concepts are the containment relation and the complement set. We consider Zadeh's standard negation for the complement (see [34]).

Definition 2.1

Let \(\widetilde{A}, \widetilde{B}\in \mathscr {F}(X)\). The complement of \(\widetilde{A}\) is the fuzzy set given by \(\widetilde{A}^c(x)=1-\widetilde{A}(x)\), \(x\in X\). \(\widetilde{A}\) is contained in \(\widetilde{B}\), denoted by \(\widetilde{A}\subseteq \widetilde{B}\), if \(\widetilde{A}(x)\le \widetilde{B}(x)\) for all \(x\in X\).

Apart from the previous relation of containment, we consider the concepts of intersection and union of fuzzy sets. The initial definitions were also given in [34] by means of the minimum and the maximum operators.

However, these operators are not the only way to generalize the classical set operations, since there exists a broader class of functions to represent them: t-norms for the intersection and t-conorms for the union.

A triangular norm (t-norm) is a function \(T:[0,1] \times [0,1] \rightarrow [0,1]\) satisfying the following properties:

  (T1) \(T(a,b)=T(b,a)\), for all \(a,b\in [0,1]\),

  (T2) \(T(T(a,b),c)=T(a,T(b,c))\), for all \(a,b,c\in [0,1]\),

  (T3) \(b \le c \Rightarrow T(a,b) \le T(a,c)\), for all \(a,b,c\in [0,1]\),

  (T4) \(T(a,1)=a\), for all \(a\in [0,1]\).

Some important examples of t-norms are:

  • Minimum: \(T_M(a,b)=\min (a,b)\), for all \(a,b\in [0,1]\),

  • Product: \(T_P(a,b)=a\cdot b\), for all \(a,b\in [0,1]\),

  • Łukasiewicz t-norm: \(T_L(a,b)=\max (a+b-1,0)\), for all \(a,b\in [0,1]\),

  • Drastic t-norm:

    $$\begin{aligned} T_D(a,b)= \left\{ \begin{array}{ll} \min (a,b), &{} \text { if } \max (a,b)=1 \\ 0, &{} \text { otherwise} \end{array}\right. \,. \end{aligned}$$

For these basic t-norms, it holds that \(T_D \le T_L \le T_P \le T_M\). In fact, for any t-norm T, it is true that \(T_D \le T \le T_M\). By changing the neutral element from 1 to 0, we obtain the triangular conorms (t-conorms).
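
Before moving on to t-conorms, the ordering of the four basic t-norms can be verified numerically; the following sketch is our own (the dictionary names and grid are illustrative):

```python
t_norms = {
    "drastic":     lambda a, b: min(a, b) if max(a, b) == 1 else 0.0,
    "lukasiewicz": lambda a, b: max(a + b - 1, 0.0),
    "product":     lambda a, b: a * b,
    "minimum":     lambda a, b: min(a, b),
}

# Check T_D <= T_L <= T_P <= T_M pointwise on a grid of arguments.
grid = [i / 20 for i in range(21)]
order = ["drastic", "lukasiewicz", "product", "minimum"]
for a in grid:
    for b in grid:
        values = [t_norms[name](a, b) for name in order]
        assert values == sorted(values), (a, b, values)
print("T_D <= T_L <= T_P <= T_M holds on the grid")
```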

A t-norm T and a t-conorm S are dual iff for each \(a,b \in [0,1]\) it holds that \(T(a,b)=1-S(1-a,1-b)\).

The dual t-conorms of the t-norms presented earlier are the following (a numerical check of the duality is sketched after the list):

  • Maximum: \(S_M(a,b)=\max (a,b)\), for all \(a,b\in [0,1]\),

  • Probabilistic sum: \(S_P(a,b)=a+b-a\cdot b\), for all \(a,b\in [0,1]\),

  • Łukasiewicz t-conorm: \(S_L(a,b)=\min (a+b,1)\), for all \(a,b\in [0,1]\),

  • Drastic t-conorm:

    $$\begin{aligned} S_D(a,b)= \left\{ \begin{array}{ll} \max (a,b), &{} \text { if } \min (a,b)=0 \\ 1, &{} \text { otherwise } \end{array}\right. \,. \end{aligned}$$
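
Each of these t-conorms can be recovered from its dual t-norm through the relation \(S(a,b)=1-T(1-a,1-b)\); here is a small check, reusing the t_norms dictionary from the previous sketch:

```python
t_conorms = {
    "drastic":     lambda a, b: max(a, b) if min(a, b) == 0 else 1.0,
    "lukasiewicz": lambda a, b: min(a + b, 1.0),
    "prob_sum":    lambda a, b: a + b - a * b,
    "maximum":     lambda a, b: max(a, b),
}
dual_pairs = [("drastic", "drastic"), ("lukasiewicz", "lukasiewicz"),
              ("product", "prob_sum"), ("minimum", "maximum")]

grid = [i / 20 for i in range(21)]
for t_name, s_name in dual_pairs:
    for a in grid:
        for b in grid:
            dual = 1 - t_norms[t_name](1 - a, 1 - b)  # S from its dual T
            assert abs(dual - t_conorms[s_name](a, b)) < 1e-9
print("all four dual pairs agree on the grid")
```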

Using t-norms and t-conorms, we can define the intersection and union of two fuzzy sets as follows.

Definition 2.2

Let \(\widetilde{A}, \widetilde{B} \in \mathscr {F}(X)\). Given a t-norm T and a t-conorm S,

  • \((\widetilde{A} \cap \widetilde{B})(x)=T(\widetilde{A}(x),\widetilde{B}(x)), \forall x\in X\);

  • \((\widetilde{A} \cup \widetilde{B})(x)=S(\widetilde{A}(x),\widetilde{B}(x)), \forall x\in X\).

Thus, we can denote by \((X, T, S)\) the triple formed by the universe together with the t-norm and the t-conorm defining the intersection and the union, respectively.
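
On a finite universe these operations are immediate to implement; below is a small sketch (the dictionary representation of membership functions is our own convention):

```python
def fuzzy_intersection(A, B, T):
    """Pointwise intersection of fuzzy sets over the same finite universe."""
    return {x: T(A[x], B[x]) for x in A}

def fuzzy_union(A, B, S):
    """Pointwise union of fuzzy sets over the same finite universe."""
    return {x: S(A[x], B[x]) for x in A}

A = {"x1": 0.2, "x2": 0.8, "x3": 1.0}
B = {"x1": 0.5, "x2": 0.4, "x3": 0.9}
print(fuzzy_intersection(A, B, min))  # Zadeh's intersection (minimum t-norm)
print(fuzzy_union(A, B, max))         # Zadeh's union (maximum t-conorm)
```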

The entropy of a fuzzy set is quantified by means of the non-probabilistic entropies or fuzziness measures (see, for instance, [33]), which are defined as follows.

Definition 2.3

A fuzziness measure is a real function f defined on \({\mathscr {F}}(X)\), fulfilling the following requirements:

  (a) \(f(\widetilde{A}) = 0 \Longleftrightarrow \widetilde{A}\) is a crisp set,

  (b) if \(\widetilde{A}, \widetilde{B} \in {\mathscr {F}}(X)\) and \(\widetilde{A}\) is “sharper” than \(\widetilde{B}\), then \(f(\widetilde{A}) \le f(\widetilde{B})\),

  (c) \(f(\widetilde{A})\) takes its maximum value if and only if \(\widetilde{A}\) is “maximally fuzzy”.

This last definition is based on the concepts “sharper than” and “maximally fuzzy”, although the latter can be derived from the former. The most usual criteria to define the relation “to be sharper than” are the following:

  • \(\widetilde{A}\) is sharper than \(\widetilde{B}\) iff either \(\widetilde{A}(x)\le \widetilde{B}(x)\le 1/2\) or \(\widetilde{A}(x) \ge \widetilde{B}(x)\ge 1/2\) for any x in X (see [13]) or

  • \(\widetilde{A}\) is sharper than \(\widetilde{B}\) iff \(|\widetilde{A}(x)-1/2|\ge | \widetilde{B}(x)- 1/2|\) for any x in X (see [6]).

It is clear that the first criterion is a particular case of the second one, and therefore we will consider the more general definition.

In 1975, Knopfmacher [14] introduced a very important family of fuzziness measures, the Knopfmacher class, given by the functions f such that

$$f(\widetilde{A}) = F\left( \displaystyle \sum _{x\in X} c_x\cdot g_x(\widetilde{A}(x))\right) $$

for any \(\widetilde{A}\) in \(\mathscr {F}(X)\), where \(c_x\in \mathbb {R}^{+}\); \(g_x\) is a real-valued function such that \(g_x(0) = g_x(1) = 0\), \(g_x(t) = g_x(1-t)\) for all \(t\in [0,1]\), and \(g_x\) is strictly increasing on \([0, 1/2]\); and F is a positive strictly increasing function with \(F(0)=0\).

Later, we considered a particular class of Knopfmacher fuzziness measures (see [20, 22]), obtained when F is the identity, \(g_x\) is the same for all \(x\in X\) (we denote \(g_x\) by \(u_f\), or simply u) and u is concave. Any function in this family was named a local fuzziness measure.
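
A classical instance is De Luca and Termini's measure [5], which fits this scheme with F the identity, all \(c_x = 1\) and u the binary Shannon entropy; a sketch (the base-2 logarithm is our own normalization choice):

```python
import math

def u(t):
    """Binary Shannon entropy: u(0) = u(1) = 0, u(t) = u(1-t),
    strictly increasing on [0, 1/2] and concave."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def local_fuzziness(A):
    """Knopfmacher-type measure with F = identity and c_x = 1 for all x."""
    return sum(u(mu) for mu in A.values())

print(local_fuzziness({"x1": 0.0, "x2": 1.0}))  # 0.0: a crisp set
print(local_fuzziness({"x1": 0.5, "x2": 0.5}))  # 2.0: maximally fuzzy
```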

2.3 From Uncertainty to Fuzziness

Proposition 2.1

([24]) Let \((X,\mathscr {A},\mu )\) be a measure space and let H be an uncertainty measure fulfilling the Pigou–Dalton condition and such that \(H(P) = 0 \Longleftrightarrow P\) is degenerate. Then the map f defined as follows, where \({\mathscr {A}}^{*}\) denotes the set of \(\mathscr {A}\)-measurable fuzzy subsets of X:

$$\begin{array}{rccc} f: &{} {\mathscr {A}}^{*} &{} \longrightarrow &{} \mathbb {R}^{+}\\ &{} \widetilde{A} &{} \longmapsto &{} \displaystyle \int _{X} H(\widetilde{A}(x),\widetilde{A}^c(x))\, d\mu (x)\end{array} $$

is a fuzziness measure and it belongs to Knopfmacher's class.
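
On a finite universe with the counting measure, the integral above reduces to a sum; taking H to be Shannon's entropy then recovers exactly the measure of the previous sketch. A sketch under those assumptions, reusing shannon_entropy from Sect. 2.1:

```python
def fuzziness_from_uncertainty(A, H):
    """f(A) = sum_x H({A(x), 1 - A(x)}) on a finite universe, counting measure."""
    return sum(H([mu, 1 - mu]) for mu in A.values())

# Coincides with local_fuzziness above when H is Shannon's entropy:
print(fuzziness_from_uncertainty({"x1": 0.5, "x2": 0.9}, shannon_entropy))  # ~1.469
```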

If we work on some particular spaces, we are also able to establish a one-to-one correspondence between fuzziness measures and uncertainty measures.

Thus, if we consider the subset of uncertainty measures given by

$${\mathscr {H}}_2 = \{H \mid H \text { is a quasi-}\,\phi \,\text {-entropy with }\,\phi \,\text { continuous, } \phi (x) = \phi (1-x)\ \forall x\in [0,\tfrac{1}{2}], \text { and } \phi (x)=0\Leftrightarrow x=0\} $$

we obtain an injective correspondence, as shown in the following proposition.

Proposition 2.2

([24]) If \(F_1\) is the map from \({\mathscr {H}}_2\) to \({\mathscr {F}}\) defined by

$$F_1(H)(\widetilde{A}) = \displaystyle \int _{X} H(\widetilde{A}(x),\widetilde{A}^c(x)) d\mu (x),$$

where \({\mathscr {F}}\) denotes the Knopfmacher’s class of fuzziness measures, then we have that \(F_1\) is injective.

If we restrict our study to the family of \(\phi \)-entropies given by \({\mathscr {H}}_{\phi } = \{H\in {\mathscr {H}}_2 \mid \phi \text { is concave}\}\) and the family of fuzziness measures given by \({\mathscr {F}}_1 = \{ f\in {\mathscr {F}} \text { with } g \text { continuous}\}\), we obtain a bijection.

Theorem 2.1

([24]) There exists a one-to-one correspondence between the family of uncertainty measures \({\mathscr {H}}_{\phi }\) and the family of fuzziness measures \({\mathscr {F}}_1\).

3 Divergence Measures

From the previous section, we can see that the imprecision about the membership of an element \(x\in X\) in a fuzzy set \(\widetilde{A}\) can be represented by the probability distribution \(\{\widetilde{A}(x),\widetilde{A}^c(x)\}\). We therefore looked at the classical divergence measures between probability distributions (see, for instance, [7, 29]) in order to compare two fuzzy sets.

Thus, from this starting point, we proposed a new way to compare two fuzzy sets [20], the divergence, with the following properties:

  • It becomes zero when the two sets coincide.

  • It is a nonnegative and symmetric function.

  • It decreases when the two sets become “more similar” in some sense.

While it is easy to formulate the first and second conditions analytically, the third one depends on how the concept “more similar” is formalized. We base our approach on the fact that if we join a set \(\widetilde{C}\) (by union) to both fuzzy sets \(\widetilde{A}\) and \(\widetilde{B}\), we obtain two sets that are closer to each other; the same happens with the intersection.

Definition 3.1

Let \((X, T, S)\) be a triple with X a universe and T and S any t-norm and t-conorm, respectively. A map \(D: {\mathscr {F}}(X)\times {\mathscr {F}}(X) \rightarrow \mathbb {R}\) is a divergence measure with respect to \((X, T, S)\) iff for all \(\widetilde{A}, \widetilde{B} \in {\mathscr {F}}(X)\), D satisfies the following conditions:

  (a) \(D(\widetilde{A},\widetilde{A}) = 0\);

  (b) \(D(\widetilde{A},\widetilde{B}) = D(\widetilde{B},\widetilde{A})\);

  (c) \(\max \{D(\widetilde{A}\cup \widetilde{C}, \widetilde{B}\cup \widetilde{C}), D(\widetilde{A}\cap \widetilde{C}, \widetilde{B}\cap \widetilde{C})\} \le D(\widetilde{A},\widetilde{B})\), for all \(\widetilde{C}\in {\mathscr {F}}(X)\), where the union and intersection are defined by means of S and T, respectively.

It is clear that a divergence measure is associated with a triple \((X, T, S)\): a map D may be a divergence measure with respect to one t-norm and fail to be one with respect to a different t-norm.

However, when there is no ambiguity, we will simply say divergence measure, without specifying the t-norm and t-conorm used.

After several studies of this concept [2, 20, 22, 23, 24], we presented the most general treatment in [15], where the following examples can also be found.

Example 3.1

([15]) The map

$$\begin{aligned} D(\widetilde{A},\widetilde{B})= \left\{ \begin{array}{ll} 0, &{} \text { if } \widetilde{A} = \widetilde{B} \\ 1, &{} \text { if } \widetilde{A} \ne \widetilde{B} \end{array}\right. \end{aligned}$$

is a divergence measure for any triple \((X, T, S)\).

On the other hand, if we consider the map

$$D(\widetilde{A},\widetilde{B})=\sum _{x \in X} \alpha _x\cdot |\widetilde{A}(x)-\widetilde{B}(x)|$$

where \(\alpha _x \ge 0\) for any \(x\in X\), \(\sum _{x\in X} \alpha _x=1\) and X is a finite space, then D is a divergence measure for the minimum t-norm, the product t-norm and the Łukasiewicz t-norm, but not for the drastic t-norm.
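
This second example is easy to test empirically; the sketch below (our own test harness) checks conditions (a)–(c) of Definition 3.1 for the minimum t-norm and maximum t-conorm on randomly generated fuzzy sets:

```python
import random

def divergence(A, B, alpha):
    """Weighted L1 divergence between fuzzy sets on a finite universe."""
    return sum(alpha[x] * abs(A[x] - B[x]) for x in A)

universe = ["x1", "x2", "x3", "x4"]
alpha = {x: 0.25 for x in universe}  # nonnegative weights summing to 1

for _ in range(1000):
    A, B, C = ({x: random.random() for x in universe} for _ in range(3))
    d = divergence(A, B, alpha)
    assert divergence(A, A, alpha) == 0                        # condition (a)
    assert divergence(A, B, alpha) == divergence(B, A, alpha)  # condition (b)
    # Condition (c): union/intersection with C cannot increase the divergence.
    assert divergence({x: max(A[x], C[x]) for x in universe},
                      {x: max(B[x], C[x]) for x in universe}, alpha) <= d + 1e-12
    assert divergence({x: min(A[x], C[x]) for x in universe},
                      {x: min(B[x], C[x]) for x in universe}, alpha) <= d + 1e-12
print("conditions (a)-(c) hold on all sampled triples")
```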

When the minimum t-norm is considered, a divergence measure can be seen as a particular case of a dissimilarity, which is the most usual tool for comparing two fuzzy sets [18].

Moreover, divergences avoid some counterintuitive examples that affect dissimilarities, while both divergence and dissimilarity measures can be seen as particular cases of the general measures of comparison given by Bouchon-Meunier et al. [1] in 1996. An interesting study of different ways to compare fuzzy sets can be found in [4].

From this starting point, we have been able to generalize this concept to define divergence measures for comparing two intuitionistic fuzzy sets [25].

The particular case of local divergences for intuitionistic fuzzy sets was studied in [26]. There, we presented interesting applications of this concept in Pattern Recognition and Decision Theory.

A similar generalization has been done for hesitant fuzzy sets in [16].

Moreover, we have been able to use divergences to measure the fuzziness of a fuzzy set, by comparing it with the closest crisp set, and conversely, we have used fuzziness measures to define divergence measures [21]; the first construction is sketched below.
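
A minimal sketch of that first construction, reusing the divergence function and weights from the previous example (the tie-breaking rule at 1/2 is our own choice):

```python
def nearest_crisp_set(A):
    """Round each membership degree to the nearer of 0 and 1."""
    return {x: 1.0 if mu >= 0.5 else 0.0 for x, mu in A.items()}

def fuzziness_via_divergence(A, alpha):
    """Fuzziness of A measured as its divergence from the closest crisp set."""
    return divergence(A, nearest_crisp_set(A), alpha)

A = {"x1": 0.5, "x2": 0.9, "x3": 0.0, "x4": 1.0}
print(fuzziness_via_divergence(A, alpha))  # 0.25*(0.5 + 0.1 + 0 + 0) = 0.15
```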

All these definitions and results can be considered as part of the heritage of the classical divergence measures and, more precisely, of the knowledge about them conveyed by Prof. Gil to the authors of this work.

4 Conclusion

In this chapter we have studied some relationships among different ways to compare two elements under uncertainty and imprecision.

Thus, we have used the classical divergence measures between two probability distributions to obtain a new way to compare two fuzzy sets. In some cases it is a particular case of a dissimilarity, and it has very interesting and specific properties.

The link between randomness and fuzziness is thus established once more, as we previously did for probabilistic and non-probabilistic entropies.