
1 Introduction

Most works on fuzzy set theory do not give any precise interpretation for the values of membership functions. This is not a problem as long as the works remain in the realm of pure mathematics. However, as soon as examples of application are included, an interpretation is needed; otherwise not only are the membership functions arbitrary, but all rules applied to them are also unjustified [3, 25, 32].

In this paper, the interpretation of the values of membership functions in terms of likelihood is reviewed. The concepts of probability and likelihood were clearly distinguished by Fisher [19]: likelihood is simpler, more intuitive, and better suited to information fusion [6, 8]. The likelihood interpretation of fuzzy sets is elucidated in Sect. 2, while Sect. 3 shows that it justifies an expression for the likelihood function induced by fuzzy data that has often appeared in the literature [13, 20, 23, 26, 35], albeit without a clear justification. This likelihood function can also be interpreted as resulting from an errors-in-variables model or measurement error model [5], as will be illustrated by a simple example. Finally, Sect. 4 discusses the interpretation of \(\alpha \)-cuts as confidence intervals, while the last section concludes the paper and outlines future work.

2 The Likelihood Interpretation

A fuzzy set is described by its membership function \(\mu :\mathcal {X} \rightarrow [0,1]\), where \(\mathcal {X}\) is a nonempty (crisp) set [34]. A standard example is the fuzzy set representing the meaning of the word “tall” in relation to a man, where the elements of \(\mathcal {X}\) are the possible values of a man’s height in cm [36]. We can expect for instance that \(\mu (180)>\mu (160)\), because the attribute “tall” fits a 180 cm man better than a 160 cm one. However, the concept of a fuzzy set as described by a real-valued membership function \(\mu \) can only be used to model reality if we have an interpretation for the numerical values of \(\mu \).

In fact, a clear interpretation of membership functions should be the starting point of a theory of fuzzy sets that describes the real world, and all rules of the theory should be a consequence of the interpretation [3, 25, 32]. This is the case, for example, with the theory of probability, whose rules are a consequence of each of its interpretations (at least on finite spaces). As this example suggests, the interpretation need not be unique, but only the rules implied by the interpretation under consideration should be used in applications.

One of the first aspects to consider when discussing the interpretation of fuzzy sets is whether they are used in an epistemic or ontic sense [13, 15]. Fuzzy sets have an ontic interpretation when they are themselves the object of inquiry, while they have an epistemic interpretation when their membership function \(\mu :\mathcal {X}\rightarrow [0,1]\) only gives information about the real object of inquiry, which is the value of \(x\in \mathcal {X}\). In this paper, we will only consider epistemic fuzzy sets, and focus on their interpretation in terms of likelihood.

The likelihood interpretation of a fuzzy set consists in interpreting its membership function \(\mu :\mathcal {X}\rightarrow [0,1]\) as the likelihood function lik on \(\mathcal {X}\) induced by the observation of an event D:

$$ \mu (x)=lik(x\,|\,D)\propto P(D\,|\,x) $$

for all \(x\in \mathcal {X}\), where \(P(D\,|\,x)\) was the probability of the event D (before its realization) given the value of \(x\in \mathcal {X}\).

For example, “John is tall” is a piece of information that can be modeled by a fuzzy set with membership function \(\mu :\mathcal {X}\rightarrow [0,1]\) with \(\mu (x)\propto P(D\,|\,x)\), where the elements of \(\mathcal {X}\) are the possible values of John’s height in cm, and \(P(D\,|\,x)\) is the probability of the event D of getting the information that “John is tall” when John’s height is x cm. Hence, the exact meaning of the interpretation of fuzzy sets in terms of likelihood depends on the interpretation given to probability values, but as noted above, the choice of this interpretation does not affect the rules of probability theory.

The likelihood interpretation is probably the oldest interpretation of fuzzy sets: it was used more or less explicitly shortly after [27], and even before [2, 29], the mathematical concept of fuzzy set was introduced by Zadeh [34], and it was later studied in detail by several authors [1, 10–12, 14, 16, 17, 22, 24, 30, 31]. However, most of them interpreted membership functions \(\mu \) in terms of probability values \(\mu (x)=P(D\,|\,x)\), instead of likelihood values \(\mu (x)=lik(x\,|\,D)\). Historically, the subtle distinction between probability and likelihood confused several great minds, before Fisher clearly defined the likelihood of \(x\in \mathcal {X}\) as proportional to the probability of the data D given x [18, 19, 21].

The proportionality constant in the definition of \(lik(x\,|\,D)\) can depend on anything but the value of \(x\in \mathcal {X}\). The likelihood function lik is defined only up to a multiplicative constant because otherwise it would depend strongly on irrelevant information. For example, if two persons chosen at random from a population independently tell us that John is “tall” and “very tall”, respectively, then the resulting fuzzy set should not change completely depending on whether or not we also know that the first person said “tall” and the second one “very tall”.

Interpreting fuzzy sets in terms of likelihood thus implies that proportional membership functions have the same meaning. Uniqueness of representation is recovered by assuming, as is often done anyway, that all fuzzy sets are normalized. That is, their membership functions \(\mu :\mathcal {X} \rightarrow [0,1]\) satisfy \(\sup _{x\in \mathcal {X}}\mu (x)=1\), and are thus uniquely determined by \(\mu (x)\propto P(D\,|\,x)\). Surprisingly, very few authors seem to have considered this important aspect of the likelihood interpretation, and even then not very explicitly [14, 25, 31].
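To make the normalization concrete, here is a minimal Python sketch. It assumes a hypothetical model for \(P(D\,|\,x)\) in the “John is tall” example (a logistic curve whose constants are chosen only for illustration) and works on a finite grid of heights; dividing by the maximum yields a normalized membership function, and rescaling \(P(D\,|\,x)\) by any positive constant leaves the result unchanged.

```python
import math


def p_tall_given_height(x_cm: float) -> float:
    # Hypothetical P(D | x): probability of being told "John is tall" when his
    # height is x cm. The logistic shape and its constants are illustrative only.
    return 1.0 / (1.0 + math.exp(-(x_cm - 175.0) / 5.0))


def normalized_membership(likelihood, grid):
    # mu(x) = lik(x | D) / sup lik, with the supremum taken over a finite grid,
    # so that the resulting fuzzy set is normalized (its maximum is 1).
    values = [likelihood(x) for x in grid]
    top = max(values)
    return [v / top for v in values]


heights = [150 + 0.5 * k for k in range(121)]  # 150 cm to 210 cm in 0.5 cm steps
mu = normalized_membership(p_tall_given_height, heights)

# Rescaling P(D | x) by any positive constant leaves the normalized mu unchanged.
mu_scaled = normalized_membership(lambda x: 0.3 * p_tall_given_height(x), heights)
```

Only the shape of \(P(D\,|\,x)\) as a function of x matters here; the constant 0.3 stands for any factor that does not depend on x.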

3 Fuzzy Data

A basic advantage of the likelihood interpretation of fuzzy sets is that it allows statistical inferences to be obtained directly from fuzzy data. The only condition on the statistical methods used is that the data enter them through the likelihood function only. In particular, all methods from the likelihood and Bayesian approaches to statistics can be straightforwardly generalized to the case of fuzzy data.

As discussed in Sect. 2, the membership function of a fuzzy set \(\mu (x)\propto P(D\,|\,x)\) is interpreted as the likelihood function induced by the observation of an event D. Now, if we have a probability distribution on \(x\in \mathcal {X}\), depending on an unknown parameter \(\theta \in \varTheta \), then the observation of the event D also induces a likelihood function lik on \(\varTheta \):

$$\begin{aligned} lik(\theta \,|\,D)\propto P(D\,|\,\theta )=\int _{\mathcal {X}} P(D\,|\,x)\,dP(x\,|\,\theta )\propto \int _{\mathcal {X}}\mu (x)\,dP(x\,|\,\theta ) \end{aligned}$$
(1)

for all \(\theta \in \varTheta \), where \(P(D\,|\,x)\) is assumed to be a measurable function of x that does not depend on \(\theta \).
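The integral in (1) can be approximated numerically. The sketch below assumes, for illustration only, that \(P(\,\cdot \,|\,\theta )\) is normal with expectation \(\theta \) and a known variance, and that \(\mu \) is a hypothetical Gaussian-shaped fuzzy number with center \(\xi \) and spread s; it uses the midpoint rule on a wide interval. For this particular choice the integral equals \(s\sqrt{2\pi }\) times the normal density with expectation \(\theta \) and variance \(\sigma ^{2}+s^{2}\) evaluated at \(\xi \), which gives a check on the quadrature.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gaussian_membership(xi, s):
    # Hypothetical normalized fuzzy number centred at xi with spread s (mu(xi) = 1).
    return lambda x: math.exp(-(x - xi) ** 2 / (2.0 * s ** 2))


def fuzzy_likelihood(mu, theta, var, lo=-50.0, hi=50.0, n=20000):
    # Right-hand side of (1): integral of mu(x) dP(x | theta), assuming that
    # P(. | theta) is N(theta, var); approximated with the midpoint rule on [lo, hi].
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += mu(x) * normal_pdf(x, theta, var)
    return h * total


mu = gaussian_membership(1.0, 0.5)
liks = {theta: fuzzy_likelihood(mu, theta, 4.0) for theta in (-2.0, 0.0, 1.0, 3.0)}
```

Since the likelihood is only defined up to a constant, only ratios such as `liks[1.0] / liks[3.0]` carry information about \(\theta \).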

Zadeh [35] defined the probability of the fuzzy event described by a membership function \(\mu :\mathcal {X}\rightarrow [0,1]\) as the right-hand side of (1), without justifying this choice through a clear interpretation of the values of \(\mu \). The likelihood interpretation provides only a partial justification: the right-hand side of (1) is proportional to the probability of the event D that induced the fuzzy information described by \(\mu \), where the proportionality constant can depend on anything but \(\theta \) (or x).

In [35], Zadeh also introduced the concept of probabilistic independence for fuzzy events, again without a clear justification. The likelihood interpretation clarifies another concept of independence, which is extremely important in fuzzy set theory: the concept of independence among the pieces of information described by different fuzzy sets, which is usually implicitly or explicitly assumed [3, 24]. The pieces of information described by the membership functions \(\mu _{1} ,\ldots ,\mu _{n}:\mathcal {X}\rightarrow [0,1]\) with \(\mu _{i}(x)\propto P(D_{i}\,|\,x)\) can be interpreted as independent when the events \(D_{1},\ldots ,D_{n}\) that induced them were conditionally independent given x. In this case, the joint fuzzy information is described by the membership function \(\mu :\mathcal {X}\rightarrow [0,1]\) with

$$\begin{aligned} \mu (x)=lik(x\,|\,D)\propto P(D\,|\,x)=\prod _{i=1}^{n}P(D_{i}\,|\,x)\propto \prod _{i=1}^{n}\mu _{i}(x) \end{aligned}$$
(2)

for all \(x\in \mathcal {X}\), where \(D=D_{1}\cap \cdots \cap D_{n}\).
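The product rule (2) can be sketched on a grid of heights. The two Gaussian-shaped fuzzy numbers below (centers 170 cm and 180 cm, spreads 5 cm and 10 cm) are hypothetical; the product is renormalized so that its maximum is 1. For Gaussian shapes the combined fuzzy set peaks at the precision-weighted average of the centers, here \((170/25+180/100)/(1/25+1/100)=172\).

```python
import math


def gaussian_membership(xi, s):
    # Hypothetical normalized fuzzy number centred at xi with spread s.
    return lambda x: math.exp(-(x - xi) ** 2 / (2.0 * s ** 2))


def combine_independent(memberships, grid):
    # Product rule (2): mu(x) proportional to prod_i mu_i(x); the result is
    # renormalized so that its maximum over the grid is 1.
    prod = [math.prod(m(x) for m in memberships) for x in grid]
    top = max(prod)
    return [p / top for p in prod]


grid = [150 + 0.1 * k for k in range(501)]  # heights from 150 cm to 200 cm
pieces = [gaussian_membership(170.0, 5.0), gaussian_membership(180.0, 10.0)]
mu = combine_independent(pieces, grid)
peak = grid[mu.index(max(mu))]
```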

In particular, if \(\mathcal {X}=\mathcal {X}_{1}\times \cdots \times \mathcal {X}_{n}\), the components \(x_{i}\) of \(x=(x_{1},\ldots ,x_{n})\) are probabilistically independent (for all \(\theta \)), and each piece of fuzzy information \(\mu _{i}(x_{i})\propto P(D_{i}\,|\,x)\) is about a different component of x, then the assumption of their independence is very natural, and by combining (1) and (2) we obtain

$$\begin{aligned} lik(\theta \,|\,D)\propto \int _{\mathcal {X}}\prod _{i=1}^{n}\mu _{i} (x_{i})\,dP(x\,|\,\theta )=\prod _{i=1}^{n}\int _{\mathcal {X}_{i}}\mu _{i} (x_{i})\,dP(x_{i}\,|\,\theta ) \end{aligned}$$
(3)

for all \(\theta \in \varTheta \). This likelihood function has been considered by several authors [13, 20, 23, 26], but only justified on the basis of Zadeh’s rather arbitrary definition of the probability of a fuzzy event [35].
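As a sketch of (3) for fuzzy data that are not Gaussian, the code below assumes a normal model with known variance 1 and hypothetical symmetric triangular fuzzy numbers, given as (center, half-width) pairs. Each one-dimensional integral is computed with the midpoint rule over the support of \(\mu _{i}\), the logarithms are summed, and the maximum likelihood estimate is located by a simple grid search. The data are chosen symmetric around 3, so the maximum should lie at 3.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def triangular_membership(xi, h):
    # Hypothetical symmetric triangular fuzzy number with centre xi and half-width h.
    return lambda x: max(0.0, 1.0 - abs(x - xi) / h)


def log_likelihood(theta, data, var, nodes=200):
    # Logarithm of (3) for a normal model N(theta, var): a sum over the fuzzy data
    # of log of the integral of mu_i(x_i) dP(x_i | theta), each integral taken over
    # the support [xi - h, xi + h] of mu_i with the midpoint rule.
    total = 0.0
    for xi, h in data:
        mu = triangular_membership(xi, h)
        step = 2.0 * h / nodes
        integral = 0.0
        for k in range(nodes):
            x = xi - h + (k + 0.5) * step
            integral += mu(x) * normal_pdf(x, theta, var)
        total += math.log(integral * step)
    return total


# Hypothetical fuzzy data: (centre, half-width) pairs, symmetric around 3.
data = [(2.0, 1.0), (3.0, 0.5), (4.0, 1.0)]
thetas = [0.02 * k for k in range(301)]  # candidate values of theta from 0 to 6
scores = [log_likelihood(t, data, 1.0) for t in thetas]
theta_hat = thetas[scores.index(max(scores))]
```

A grid search is used only because it is transparent; any numerical optimizer working on `log_likelihood` would serve the same purpose.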

The likelihood function (3) induced by fuzzy data with membership functions \(\mu _{i}:\mathcal {X}_{i}\rightarrow [0,1]\) is often too complex to be handled analytically [20], but this is nowadays a typical situation in the likelihood and Bayesian approaches to statistics. In particular, \(x_{1},\ldots ,x_{n}\) play the role of unobserved variables in (3), and therefore the EM algorithm can be used to maximize the likelihood [13]. Several examples of numerical calculations of maximum likelihood estimates based on fuzzy data are given for instance in [13, 23].
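A minimal EM sketch follows, assuming a normal model with known variance and unknown expectation \(\theta \), with \(x_{1},\ldots ,x_{n}\) treated as missing data. The E-step computes \(E[x_{i}\,|\,D_{i},\theta ]\), whose conditional density is proportional to \(\mu _{i}(x)\) times the normal density, by quadrature on a fixed grid, so it works for any membership functions; the M-step averages these conditional expectations, as for complete normal data. The grid bounds, tolerance, and the Gaussian-shaped test data are assumptions. With Gaussian-shaped fuzzy numbers the fixed point is the precision-weighted average of the centers, so for centers 1, 2, 4, spreads 0.5, 1, 2 and variance 1 the iteration should converge to \(2.6/1.5\approx 1.733\).

```python
import math


def gaussian_membership(xi, s):
    # Hypothetical normalized fuzzy number centred at xi with spread s.
    return lambda x: math.exp(-(x - xi) ** 2 / (2.0 * s ** 2))


def em_normal_mean(memberships, var, theta=0.0, lo=-30.0, hi=30.0, nodes=3000,
                   max_iter=500, tol=1e-10):
    # EM sketch for (3) under a normal model N(theta, var) with known var, treating
    # the unobserved x_1, ..., x_n as missing data. Integrals use a midpoint grid.
    step = (hi - lo) / nodes
    grid = [lo + (k + 0.5) * step for k in range(nodes)]
    for _ in range(max_iter):
        # E-step: E[x_i | D_i, theta], where the conditional density of x_i is
        # proportional to mu_i(x) * exp(-(x - theta)^2 / (2 var)).
        cond_means = []
        for mu in memberships:
            w = [mu(x) * math.exp(-(x - theta) ** 2 / (2.0 * var)) for x in grid]
            cond_means.append(sum(wk * x for wk, x in zip(w, grid)) / sum(w))
        # M-step: with complete data, the MLE of a normal mean is the sample mean.
        new_theta = sum(cond_means) / len(cond_means)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


centres, spreads = [1.0, 2.0, 4.0], [0.5, 1.0, 2.0]
fuzzy = [gaussian_membership(c, s) for c, s in zip(centres, spreads)]
theta_em = em_normal_mean(fuzzy, var=1.0)
```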

When the data are fuzzy numbers, in the sense that \(\mathcal {X}_{i} \subseteq \mathbb {R}\), the likelihood function (3) can also be interpreted as resulting from an errors-in-variables model or measurement error model [5]. In this case, the value \(\xi _{i}\) of a proxy \(x_{i}^{*}\) is assumed to be observed instead of the value of the variable \(x_{i}\), where \(\xi _{i}\in \mathbb {R}\) is an arbitrarily chosen constant, while the measurement error \(\varepsilon _{i}=x_{i}^{*}-x_{i}\) is random with density \(f_{i}\propto \mu _{i}(\xi _{i}-\,\cdot \,)\) and independent of everything else. In this model, each fuzzy number \(\mu _{i}(x_{i})\propto f_{i}(\xi _{i}-x_{i})\propto lik(x_{i}\,|\,x_{i}^{*}=\xi _{i})\) describes the information about the unknown value of \(x_{i}\) obtained from the observed value of its proxy \(x_{i}^{*}\), and the likelihood function \(lik(\,\cdot \,|\,x_{1}^{*}=\xi _{1},\,\ldots ,\,x_{n}^{*}=\xi _{n})\) on \(\varTheta \) induced by these observations is the one in (3). The description of fuzzy data in terms of measurement errors is particularly useful when the various components combine well mathematically, as in the following simple example.
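Before the example, the measurement error model itself can be simulated. The sketch assumes a hypothetical symmetric triangular fuzzy number with center \(\xi \) and half-width h, for which the error density \(f\propto \mu (\xi -\,\cdot \,)\) is the triangular density on \([-h,h]\) with mode 0 and variance \(h^{2}/6\); the unobserved \(x_{i}\) are drawn from a normal distribution, and only the proxies \(x_{i}^{*}=x_{i}+\varepsilon _{i}\) are returned. All numerical values are illustrative.

```python
import random
import statistics


def simulate_proxies(theta, sigma, h, n, seed=0):
    # Hypothetical errors-in-variables simulation: x_i ~ N(theta, sigma^2) is not
    # observed; only the proxy x_i* = x_i + eps_i is. For a symmetric triangular
    # fuzzy number with half-width h, the error density proportional to
    # mu(xi - .) is the triangular density on [-h, h] with mode 0.
    rng = random.Random(seed)
    proxies = []
    for _ in range(n):
        x = rng.gauss(theta, sigma)
        eps = rng.triangular(-h, h, 0.0)
        proxies.append(x + eps)
    return proxies


sample = simulate_proxies(theta=2.0, sigma=1.0, h=3.0, n=100_000)
# The error adds variance h**2 / 6 = 1.5, so Var(x*) should be close to 1 + 1.5.
```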

Example 1

Assume that \(x_{1},\ldots ,x_{n}\) is a sample from a normal distribution with known variance \(\sigma ^{2}\) and unknown expectation \(\theta \in \mathbb {R}\), but we have only fuzzy data with membership functions \(\mu _{i}(x_{i})=\exp \left( -\tfrac{(x_{i}-\xi _{i})^{2}}{2\,\sigma _{i}^{2}}\right) \), where \(\xi _{i},\sigma _{i}\) are known constants. Then the measurement errors \(\varepsilon _{i}\) are normally distributed with expectation 0 and variance \(\sigma _{i}^{2}\), so the proxy variables \(x_{1}^{*},\ldots ,x_{n}^{*}\) are independent, and each \(x_{i}^{*}=x_{i}+\varepsilon _{i}\) is normally distributed with expectation \(\theta \) and variance \(\sigma ^{2}+\sigma _{i}^{2} \). Hence, the likelihood function induced by the fuzzy data is given by

$$\begin{aligned} lik(\theta \,|\,x_{1}^{*}=\xi _{1},\,\ldots ,\,x_{n}^{*}=\xi _{n} )\propto \exp \left( -\tfrac{(\theta -\hat{\theta })^{2}}{2\,\tau ^{2}}\right) \end{aligned}$$
(4)

for all \(\theta \in \mathbb {R}\), where the maximum likelihood estimate \(\hat{\theta }\) is the weighted average of the centers \(\xi _{i}\) of the fuzzy numbers, with weights proportional to their precision \(\frac{1}{\sigma ^{2}+\sigma _{i}^{2}}\), while \(\frac{1}{\tau ^{2}}=\sum _{i=1}^{n}\frac{1}{\sigma ^{2}+\sigma _{i}^{2}}\) is the precision of \(\hat{\theta }\) (which is normally distributed with expectation \(\theta \) and variance \(\tau ^{2}\)).
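The closed-form estimate of Example 1 takes only a few lines. In this sketch the function name and the numerical inputs are hypothetical; it returns \(\hat{\theta }\) and \(\tau ^{2}\) from the centers \(\xi _{i}\), the known variance \(\sigma ^{2}\), and the spreads \(\sigma _{i}\).

```python
def example1_estimate(xis, var, sigma_is):
    # Example 1: x_i* ~ N(theta, var + sigma_i^2), so each fuzzy number carries
    # precision 1 / (var + sigma_i^2) about theta.
    precisions = [1.0 / (var + s ** 2) for s in sigma_is]
    total = sum(precisions)  # 1 / tau^2, the precision of theta_hat
    theta_hat = sum(p * xi for p, xi in zip(precisions, xis)) / total
    return theta_hat, 1.0 / total  # (theta_hat, tau^2)


theta_hat, tau2 = example1_estimate([1.0, 2.0, 4.0], 1.0, [0.5, 1.0, 2.0])
```

When all \(\sigma _{i}\) are equal, the weights coincide and \(\hat{\theta }\) reduces to the ordinary mean of the centers.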

Besides the maximum likelihood estimate \(\hat{\theta }\), for each \(\alpha \in (0,1)\) we obtain a likelihood-based confidence interval for \(\theta \):

$$\begin{aligned} \left\{ \theta \in \mathbb {R}:lik(\theta )>\alpha \,lik(\hat{\theta })\right\} =\left( \hat{\theta }\pm \tau \,\sqrt{-2\,\ln \alpha }\right) \text {,} \end{aligned}$$
(5)

with exact level \(F_{\chi _{1}^{2}}(-2\,\ln \alpha )\), where \(F_{\chi _{1}^{2}}\) is the cumulative distribution function of the chi-squared distribution with 1 degree of freedom. Alternatively, we can combine the likelihood function (4) induced by the fuzzy data with a Bayesian prior, and base our conclusions on the resulting posterior. In particular, if the prior is a normal distribution with expectation \(\theta _{0}\) and variance \(\tau _{0}^{2}\), then the posterior is a normal distribution with expectation \(\theta _{1}\) and variance \(\tau _{1}^{2}\), where \(\theta _{1}\) is the weighted average of \(\theta _{0}\) and \(\hat{\theta }\), with weights proportional to their precisions \(\frac{1}{\tau _{0}^{2}}\) and \(\frac{1}{\tau ^{2}}\), respectively, while these add up to the posterior precision \(\frac{1}{\tau _{1}^{2}}=\frac{1}{\tau _{0}^{2}}+\frac{1}{\tau ^{2}}\).
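Both routes at the end of Example 1 can be sketched directly. The interval follows (5); the level uses the identity \(F_{\chi _{1}^{2}}(q)=\operatorname {erf}(\sqrt{q/2})\), so with \(q=-2\ln \alpha \) it equals \(\operatorname {erf}(\sqrt{-\ln \alpha })\); the posterior update adds precisions. The function names and the example values are assumptions made for illustration.

```python
import math


def likelihood_interval(theta_hat, tau, alpha):
    # Interval (5): theta_hat +/- tau * sqrt(-2 ln alpha), for 0 < alpha < 1.
    half_width = tau * math.sqrt(-2.0 * math.log(alpha))
    return theta_hat - half_width, theta_hat + half_width


def exact_level(alpha):
    # F_{chi^2_1}(q) = erf(sqrt(q / 2)); with q = -2 ln alpha this is erf(sqrt(-ln alpha)).
    return math.erf(math.sqrt(-math.log(alpha)))


def normal_posterior(theta0, tau0_sq, theta_hat, tau_sq):
    # Conjugate update: precisions add, and theta_1 is the precision-weighted
    # average of the prior expectation theta0 and the estimate theta_hat.
    precision = 1.0 / tau0_sq + 1.0 / tau_sq
    theta1 = (theta0 / tau0_sq + theta_hat / tau_sq) / precision
    return theta1, 1.0 / precision


# alpha = exp(-3.8415 / 2), roughly 0.1465, gives the familiar 95% level.
alpha95 = math.exp(-3.841458820694124 / 2.0)
```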

4 Fuzzy Inference

Besides allowing the direct use of fuzzy data in statistical methods, the likelihood interpretation of fuzzy sets also leads naturally to fuzzy statistical inference. In fact, the likelihood function on \(\varTheta \) induced by the (fuzzy or crisp) data can be interpreted as the membership function \(\mu :\varTheta \rightarrow [0,1]\) of a (normalized) fuzzy set describing the information obtained from the data about the unknown value of the parameter \(\theta \in \varTheta \).

In particular, the likelihood-based confidence intervals (or regions) for \(\theta \), defined as in the left-hand side of (5) for all \(\alpha \in (0,1)\), correspond to the \(\alpha \)-cuts of the fuzzy set with membership function \(\mu \). Both likelihood-based confidence intervals and \(\alpha \)-cuts are usually defined using the non-strict inequality, but the strict inequality agrees better with the concept of the profile likelihood function [9], which is of central importance in the likelihood approach to statistics and corresponds to the extension principle [36], which is equally central in fuzzy set theory.
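A grid-based sketch of this correspondence, assuming the normalized likelihood (4) with hypothetical values \(\hat{\theta }=1.7\), \(\tau =0.8\), \(\alpha =0.1\): the strict \(\alpha \)-cut reproduces the interval \(\hat{\theta }\pm \tau \sqrt{-2\ln \alpha }\) up to the grid step. The profile membership of a function g (here the square, rounded so that grid values can serve as keys) is computed by the extension principle, and its \(\alpha \)-cut equals the image of the \(\alpha \)-cut. On a finite grid the supremum is a maximum, so this identity would hold with either inequality; the strict inequality matters when the supremum over a preimage is not attained.

```python
import math


def alpha_cut(grid, membership, alpha):
    # Strict alpha-cut {theta : mu(theta) > alpha}; with mu the normalized
    # likelihood, this is the likelihood-based confidence region on the grid.
    return [t for t, m in zip(grid, membership) if m > alpha]


def profile_membership(grid, membership, g):
    # Extension principle / profile likelihood on a finite grid:
    # mu_g(y) = max{mu(theta) : g(theta) = y}.
    prof = {}
    for t, m in zip(grid, membership):
        y = g(t)
        prof[y] = max(m, prof.get(y, 0.0))
    return prof


def square(t):
    # Hypothetical transformation g(theta) = theta^2, rounded to use as a dict key.
    return round(t * t, 6)


theta_hat, tau, alpha = 1.7, 0.8, 0.1  # hypothetical values
grid = [k / 1000 for k in range(-5000, 8001)]
mu = [math.exp(-(t - theta_hat) ** 2 / (2 * tau ** 2)) for t in grid]
cut = alpha_cut(grid, mu, alpha)
prof = profile_membership(grid, mu, square)
```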

A correspondence between \(\alpha \)-cuts and (general) confidence intervals has also been suggested as an alternative interpretation of some fuzzy sets [4, 28]. However, this interpretation suffers from the fact that confidence intervals are rather arbitrary constructs and, unless they are likelihood-based, do not usually satisfy the extension principle. The interpretation of fuzzy sets in terms of likelihood-based confidence intervals (i.e., the likelihood interpretation) has the advantages of uniqueness, invariance, and general applicability, although a simple expression for the confidence level based on the chi-squared distribution, as in Example 1, is valid (exactly or asymptotically) only under some regularity conditions [33].

Since each value of \(\theta \in \varTheta \) corresponds to a probability measure \(P(\,\cdot \,|\,\theta )\), a fuzzy set with membership function \(\mu :\varTheta \rightarrow [0,1]\) can also be interpreted as a fuzzy probability measure [6, 7]. This likelihood-based model of fuzzy probability bears important similarities to the Bayesian model of probability, and can be used as a basis for statistical inference and decision making [6–8].

5 Conclusion

In this paper, the likelihood interpretation of fuzzy sets has been reviewed and some of its consequences analyzed. Not surprisingly, with this interpretation fuzzy data and fuzzy inferences can be easily incorporated in statistical methods. In particular, the likelihood interpretation of fuzzy data justifies the use of expression (3) for the induced likelihood function, and establishes a fruitful connection with errors-in-variables models or measurement error models, as illustrated by Example 1. Furthermore, the link between this interpretation and the likelihood approach to statistics sheds some light on the central role played by the extension principle and \(\alpha \)-cuts in fuzzy set theory.

The theory of fuzzy sets is also a theory of information fusion. However, only the product rule \(\mu (x)\propto \prod _{i=1}^{n}\mu _{i}(x)\) for the conjunction of independent pieces of information is directly justified by the likelihood interpretation (2). The rules for other logical connectives, with or without the independence assumption, can be obtained through the concept of profile likelihood (i.e. the extension principle). For example, the conjunction without independence assumption is then given by the minimum rule \(\mu (x)\propto \bigwedge _{i=1}^{n}\mu _{i}(x)\), while negation always results in the vacuous membership function \(\mu \equiv 1\). Such rules, which are a consequence of the likelihood interpretation of fuzzy sets, will be the topic of future work.
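The difference between the product and minimum rules for conjunction shows up when the same piece of information is received twice. In this sketch, on a hypothetical grid with a hypothetical Gaussian-shaped fuzzy number, the product rule treats the repetition as independent evidence and returns the sharper \(\mu ^{2}\), while the minimum rule returns \(\mu \) unchanged; both results are renormalized to have maximum 1. Negation, which yields the vacuous \(\mu \equiv 1\), is not modeled here.

```python
import math


def normalize(values):
    top = max(values)
    return [v / top for v in values]


def conj_independent(*mus):
    # Product rule (2) for independent pieces of information, renormalized.
    return normalize([math.prod(vals) for vals in zip(*mus)])


def conj_dependent(*mus):
    # Minimum rule for conjunction without the independence assumption, renormalized.
    return normalize([min(vals) for vals in zip(*mus)])


grid = [k / 10 for k in range(-50, 51)]
mu = [math.exp(-t * t / 2) for t in grid]  # hypothetical normalized fuzzy number
twice_indep = conj_independent(mu, mu)  # counts a repeated report as new evidence
twice_dep = conj_dependent(mu, mu)      # a repeated report adds nothing
```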