Possibility theory is an uncertainty theory devoted to the handling of incomplete information. To a large extent, it is comparable to probability theory because it is based on set functions. It differs from the latter by the use of a pair of dual set functions (possibility and necessity measures) instead of only one. Besides, it is not additive and makes sense on ordinal structures. The name Theory of Possibility was coined by Zadeh [1], who was inspired by a paper by Gaines and Kohout [2]. In Zadeh’s view, possibility distributions were meant to provide a graded semantics to natural language statements; on this basis, possibility degrees can be attached to other statements, as well as dual necessity degrees expressing graded certainty. However, possibility and necessity measures can also be the basis of a full-fledged representation of partial belief that parallels probability, without compulsory reference to linguistic information [3, 4]. It can be seen either as a coarse, nonnumerical version of probability theory, or a framework for reasoning with extreme probabilities, or yet a simple approach to reasoning with imprecise probabilities [5].

Besides, possibility distributions can also be interpreted as representations of preference, thus standing for a counterpart to a utility function. In this case, possibility degrees estimate degrees of feasibility of alternative choices, while necessity measures can represent priorities [6]. The possibility theory framework is also bipolar [7], because distributions may either restrict the possible states of the world (negative information pointing out the impossible) or model sets of actually observed possibilities (positive information pointing out the possible). Negative information refers to pieces of knowledge that are supposedly correct and act as constraints; possibility and necessity measures rely on it. Positive information refers to reports of actually observed states, or to sets of preferred choices. It induces two other set functions: the guaranteed possibility measure and its dual, which are decreasing w.r.t. set inclusion [8].

After reviewing pioneering contributions to possibility theory, we recall its basic concepts, namely the four set functions at work in possibility theory. Then we present the two main directions along which possibility theory has developed: the qualitative and quantitative settings. Both approaches share the same basic maxitivity axiom. They differ when it comes to conditioning and to independence notions. We point out the connections with a coarse, numerical, integer-valued approach to belief representation, proposed by Spohn [9], now known as ranking theory [10].

In each setting, we discuss current and prospective lines of research. In the qualitative approach, we review the connections between possibility theory and modal logic, possibilistic logic and its applications to nonmonotonic reasoning, logic programming and the like, possibilistic counterparts of Bayesian belief networks, the framework of soft constraints and the possibilistic approach to qualitative decision theory, and more recent investigations in formal concept analysis and learning. On the quantitative side, we review quantitative possibilistic networks, the connections between possibility theory, belief functions and imprecise probabilities, the connections with non-Bayesian statistics, and the application of quantitative possibility to risk analysis.

1 Historical Background

Zadeh was not the first scientist to speak about formalising notions of possibility. The modalities possible and necessary have been used in philosophy at least since the Middle Ages in Europe, based on Aristotle’s and Theophrastus’ works [11]. More recently these notions became the building blocks of modal logics that emerged at the beginning of the 20th century from the works of C.I. Lewis (see Cresswell [12]). In this approach, possibility and necessity are all-or-nothing notions, and handled at the syntactic level. More recently, and independently from Zadeh’s view, the notion of possibility, as opposed to probability, was central in the works of one economist, and in those of two philosophers.

1.1 G.L.S. Shackle

A graded notion of possibility was introduced as a full-fledged approach to uncertainty and decision in 1940–1970 by the English economist Shackle [13], who called degree of potential surprise of an event its degree of impossibility, that is, retrospectively, the degree of necessity of the opposite event. Shackle's notion of possibility is basically epistemic: it is a character of the chooser's particular state of knowledge in his present. Impossibility is understood as disbelief. Potential surprise is valued on a disbelief scale, namely a positive interval of the form [0, y*], where y* denotes the absolute rejection of the event to which it is assigned. In case everything is possible, all mutually exclusive hypotheses have zero surprise. At least one elementary hypothesis must carry zero potential surprise. The degree of surprise of an event, a set of elementary hypotheses, is the degree of surprise of its least surprising realization. Shackle also introduces a notion of conditional possibility, whereby the degree of surprise of a conjunction of two events A and B is equal to the maximum of the degree of surprise of A, and of the degree of surprise of B, should A prove true. The disbelief notion introduced later by Spohn [10, 9] employs the same type of convention as potential surprise, but uses the set of natural integers as a disbelief scale; his conditioning rule uses the subtraction of natural integers.

1.2 D. Lewis

In his 1973 book [14], the philosopher David Lewis considers a graded notion of possibility in the form of a relation between possible worlds he calls comparative possibility. He connects this concept of possibility to a notion of similarity between possible worlds. This asymmetric notion of similarity is also comparative, and is meant to express statements of the form: a world j is at least as similar to world i as world k is. Comparative similarity of j and k with respect to i is interpreted as the comparative possibility of j with respect to k viewed from world i. Such relations are assumed to be complete preorderings and are instrumental in defining the truth conditions of counterfactual statements (of the form If I were rich, I would buy a big boat). Comparative possibility relations ≥_Π obey the key axiom: for all events A, B, C,

A ≥_Π B implies C ∪ A ≥_Π C ∪ B.

This axiom was later independently proposed by the first author [15] in an attempt to derive a possibilistic counterpart to comparative probabilities. Independently, the connection between numerical possibility degrees and similarity was investigated by Sudkamp [16].

1.3 L.J. Cohen

A framework very similar to the one of Shackle was proposed by the philosopher Cohen [17], who considered the problem of legal reasoning. He introduced so-called Baconian probabilities, understood as degrees of provability. The idea is that it is hard to prove someone guilty in a court of law by means of pure statistical arguments. The basic feature of degrees of provability is that a hypothesis and its negation cannot both be provable to any extent (the contrary being a case of inconsistency). Such degrees of provability coincide with what is known as necessity measures.

1.4 L.A. Zadeh

In his seminal paper [1], Zadeh proposed an interpretation of membership functions of fuzzy sets as possibility distributions encoding flexible constraints induced by natural language statements. Zadeh tentatively articulated the relationship between possibility and probability, noticing that what is probable must preliminarily be possible. However, the view of possibility degrees developed in his paper refers to the idea of graded feasibility (degrees of ease, as in the example of how many eggs Hans can eat for breakfast) rather than to the epistemic notion of plausibility laid bare by Shackle. Nevertheless, the key axiom of maxitivity for possibility measures is highlighted. In two subsequent articles [18, 19], Zadeh acknowledged the connection between possibility theory, belief functions, and upper/lower probabilities, and proposed their extensions to fuzzy events and fuzzy information granules.

2 Basic Notions of Possibility Theory

The basic building blocks of possibility theory originate in Zadeh’s paper [1] and were first extensively described in the authors’ book [20], then further on in [21, 3]. More recent accounts are in [4, 5]. In this section, possibility theory is envisaged as a stand-alone theory of uncertainty.

2.1 Possibility Distributions

Let S be a set of states of affairs (or descriptions thereof), or states for short. This set can be the domain of an attribute (numerical or categorical), the Cartesian product of attribute domains, the set of interpretations of a propositional language, etc. A possibility distribution is a mapping π from S to a totally ordered scale L, with top denoted by 1 and bottom by 0. In the finite case, L = {1 = λ_1 > ⋯ > λ_n > λ_{n+1} = 0}. The possibility scale can be the unit interval as suggested by Zadeh, or generally any finite chain, or even the set of nonnegative integers. It is often assumed that L is equipped with an order-reversing map, denoted by λ ∈ L ↦ 1 − λ.

The function π represents the state of knowledge of an agent (about the actual state of affairs), also called an epistemic state, distinguishing what is plausible from what is less plausible, what is the normal course of things from what is not, what is surprising from what is expected. It represents a flexible restriction on what is the actual state, with the following conventions (similar to probability, but opposite to Shackle's potential surprise scale; if L = ℕ is used as a surprise scale, the conventions are opposite: 0 means possible and ∞ means impossible):

  • π ( s ) = 0 means that state s is rejected as impossible;

  • π ( s ) = 1 means that state s is totally possible (=plausible).

The larger π(s), the more possible, i. e., plausible the state s is. Formally, the mapping π is the membership function of a fuzzy set [1], where membership grades are interpreted in terms of plausibility. If the universe S is exhaustive, at least one of the elements of S should be the actual world, so that ∃s, π(s) = 1 (normalization). This condition expresses the consistency of the epistemic state described by π.

Distinct values may simultaneously have a degree of possibility equal to 1. In the Boolean case, π is just the characteristic function of a subset E ⊆ S of mutually exclusive states (a disjunctive set [22]), ruling out all those states considered as impossible. Possibility theory is thus a (fuzzy) set-based representation of incomplete information.

2.2 Specificity

A possibility distribution π is said to be at least as specific as another π′ if and only if for each state of affairs s: π(s) ≤ π′(s) [23]. Then, π is at least as restrictive and informative as π′, since it rules out at least as many states with at least as much strength. In the possibilistic framework, extreme forms of partial knowledge can be captured, namely:

  • Complete knowledge: for some s_0, π(s_0) = 1 and π(s) = 0, ∀s ≠ s_0 (only s_0 is possible);

  • Complete ignorance: π(s) = 1, ∀s ∈ S (all states are possible).

Possibility theory is driven by the principle of minimal specificity. It states that any hypothesis not known to be impossible cannot be ruled out. It is a minimal commitment, cautious information principle. Basically, we must always try to maximize possibility degrees, taking constraints into account.

Given a piece of information in the form x is F, where F is a fuzzy set restricting the values of the ill-known quantity x, the knowledge is represented by the inequality π ≤ μ_F, where μ_F is the membership function of F. The minimal specificity principle enforces the possibility distribution π = μ_F if no other piece of knowledge is available. Generally there may be impossible values of x due to other pieces of information. Thus, given several pieces of knowledge of the form x is F_i, for i = 1, …, n, each of them translates into the constraint π ≤ μ_{F_i}; hence, several constraints lead to the inequality π ≤ min_{i=1,…,n} μ_{F_i} and, by the minimal specificity principle, to the possibility distribution

π = min_{i=1,…,n} π_i,

where π_i is induced by the information item x is F_i. This justifies the use of the minimum operation for combining information items. It is noticeable that this way of combining pieces of information fully agrees with classical logic, since a classical logic base is equivalent to the logical conjunction of the formulas that belong to it, and its set of models is obtained by intersecting the sets of models of its formulas. Indeed, in propositional logic, asserting a proposition φ amounts to declaring that any interpretation (state) that makes φ false is impossible, as being incompatible with the state of knowledge.
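To make the combination rule concrete, here is a minimal Python sketch of the min-based conjunctive combination just described (the domain, fuzzy sets, and degrees are all hypothetical):

```python
# Minimal sketch (hypothetical degrees): conjunctive combination of the
# constraints "x is F_i" by pi = min_i mu_{F_i}, per minimal specificity.

def combine_min(*distributions):
    """Pointwise minimum of possibility distributions over the same domain."""
    states = distributions[0].keys()
    return {s: min(d[s] for d in distributions) for s in states}

# Two pieces of knowledge about an ill-known quantity x in {1,...,5}:
mu_F1 = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0, 5: 0.5}   # "x is around 3 or 4"
mu_F2 = {1: 0.3, 2: 1.0, 3: 1.0, 4: 0.5, 5: 0.0}   # "x is rather small"
pi = combine_min(mu_F1, mu_F2)
print(pi)   # {1: 0.0, 2: 0.5, 3: 1.0, 4: 0.5, 5: 0.0}, still normalized here
```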

2.3 Possibility and Necessity Functions

Given a simple query of the form does event A occur? (is the corresponding proposition φ true?), where A is a subset of states, the response to the query can be obtained by computing degrees of possibility and necessity, respectively (when the possibility scale is L = [0, 1]):

Π(A) = sup_{s∈A} π(s);  N(A) = inf_{s∉A} (1 − π(s)).

Π(A) evaluates to what extent A is consistent with π, while N(A) evaluates to what extent A is certainly implied by π. The possibility–necessity duality is expressed by N(A) = 1 − Π(A^c), where A^c is the complement of A. Generally, Π(S) = N(S) = 1 and Π(∅) = N(∅) = 0 (since π is normalized to 1). In the Boolean case, the possibility distribution comes down to the disjunctive (epistemic) set E ⊆ S [24, 3]:

  • Π(A) = 1 if A ∩ E ≠ ∅, and 0 otherwise: function Π checks whether proposition A is logically consistent with the available information or not.

  • N(A) = 1 if E ⊆ A, and 0 otherwise: function N checks whether proposition A is logically entailed by the available information or not.

More generally, possibility and necessity measures represent degrees of plausibility and belief, respectively, in agreement with other uncertainty theories (see Sect. 3.4). Possibility measures satisfy the basic maxitivity property Π(A ∪ B) = max(Π(A), Π(B)). Necessity measures satisfy an axiom dual to that of possibility measures, namely N(A ∩ B) = min(N(A), N(B)). On infinite spaces, these axioms must hold for infinite families of sets. As a consequence of the normalization of π, min(N(A), N(A^c)) = 0 and max(Π(A), Π(A^c)) = 1, where A^c is the complement of A; equivalently, Π(A) = 1 whenever N(A) > 0, which totally fits the intuition behind this formalism, namely that something somewhat certain should be fully possible, i. e., consistent with the available information.
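The following sketch illustrates the two set functions on a hypothetical three-state distribution, checking maxitivity and the duality with necessity:

```python
# Sketch (hypothetical degrees): possibility and necessity of events,
# maxitivity, and the duality N(A) = 1 - Pi(A^c).

def Pi(A, pi):
    return max((pi[s] for s in A), default=0.0)

def N(A, pi):
    return 1.0 - Pi(set(pi) - set(A), pi)   # duality with the complement

pi = {'s1': 1.0, 's2': 0.7, 's3': 0.2}      # normalized: some state has degree 1
A, B = {'s1'}, {'s2', 's3'}
assert Pi(A | B, pi) == max(Pi(A, pi), Pi(B, pi))   # maxitivity
assert min(N(A, pi), N(set(pi) - A, pi)) == 0.0     # A or A^c has necessity 0
print(Pi(A, pi), N(A, pi))                          # Pi(A) = 1.0, N(A) ≈ 0.3
```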

2.4 Certainty Qualification

Human knowledge is often expressed in a declarative way using statements to which belief degrees are attached. Certainty-qualified pieces of uncertain information of the form A is certain to degree α can then be modeled by the constraint N(A) ≥ α. This constraint represents a family of possible epistemic states π that obey it. The least specific possibility distribution among them exists and is defined by [3]

π_(A,α)(s) = 1 if s ∈ A, and 1 − α otherwise.
(3.1)

If α = 1, we get the characteristic function of A. If α = 0, we get total ignorance. This possibility distribution is a key building block to construct possibility distributions from several pieces of uncertain knowledge. Indeed, e. g., in the finite case, any possibility distribution can be viewed as a collection of nested certainty-qualified statements. Let E_i = {s : π(s) ≥ λ_i}, λ_i ∈ L, be the λ_i-cut of π. Then it is easy to check that π(s) = min_{i: s∉E_i} (1 − N(E_i)) (with the convention min_∅ = 1).
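A small sketch of certainty qualification (Eq. (3.1)) and of the conjunctive combination of two nested certainty-qualified statements (sets and degrees hypothetical):

```python
# Sketch: least specific distribution satisfying N(A) >= alpha (Eq. (3.1)).

def certainty_qualified(A, alpha, states):
    return {s: 1.0 if s in A else 1.0 - alpha for s in states}

states = {'s1', 's2', 's3'}
pi1 = certainty_qualified({'s1', 's2'}, 0.8, states)  # "A certain to degree 0.8"
pi2 = certainty_qualified({'s1'}, 0.4, states)        # nested, weaker statement
pi = {s: min(pi1[s], pi2[s]) for s in states}         # combine conjunctively
print(pi)   # {'s1': 1.0, 's2': 0.6, 's3': ≈0.2}
```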

We can also consider possibility-qualified statements of the form Π(A) ≥ β; however, the least specific epistemic state compatible with this constraint expresses total ignorance.

2.5 Joint Possibility Distributions

Possibility distributions over Cartesian products of attribute domains S_1 × ⋯ × S_n are called joint possibility distributions π(s_1, …, s_n). The projection π_k of the joint possibility distribution π onto S_k is defined as

π_k(s_k) = Π(S_1 × ⋯ × S_{k−1} × {s_k} × S_{k+1} × ⋯ × S_n) = sup_{s_i∈S_i, i≠k} π(s_1, …, s_n).

Clearly, π(s_1, …, s_n) ≤ min_{k=1,…,n} π_k(s_k); that is, a joint possibility distribution is at least as specific as the Cartesian product of its projections. When the equality holds, π(s_1, …, s_n) is called separable.
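A sketch of projection and of the separability test, on a two-variable joint distribution built as the min of its margins (all degrees hypothetical):

```python
# Sketch: projections of a joint possibility distribution and separability.

from itertools import product

def project(joint, axis):
    """Maximize out all coordinates except `axis` (the sup in the projection)."""
    proj = {}
    for states, degree in joint.items():
        v = states[axis]
        proj[v] = max(proj.get(v, 0.0), degree)
    return proj

joint = {(x, y): min(mx, my)                 # separable by construction
         for (x, mx), (y, my) in product({'a': 1.0, 'b': 0.5}.items(),
                                         {'u': 0.3, 'v': 1.0}.items())}
p1, p2 = project(joint, 0), project(joint, 1)
assert all(joint[(x, y)] == min(p1[x], p2[y]) for x, y in joint)  # separability
```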

2.6 Conditioning

Notions of conditioning exist in possibility theory. Conditional possibility can be defined similarly to probability theory using a Bayesian-like equation of the form [3]

Π(B ∩ A) = Π(B | A) ⋆ Π(A),

where Π(A) > 0 and ⋆ is a t-norm (a nondecreasing Abelian semigroup operation on the unit interval having identity 1 and absorbing element 0 [25]); moreover, N(B | A) = 1 − Π(B^c | A). The above equation makes little sense for necessity measures, as it becomes trivial when N(A) = 0, that is, under lack of certainty, while in the above definition the equation becomes problematic only if Π(A) = 0, which is natural as then A is considered impossible. If operation ⋆ is the minimum, the equation Π(B ∩ A) = min(Π(B | A), Π(A)) fails to characterize Π(B | A), and we must resort to the minimal specificity principle to come up with the qualitative conditioning rule [3]

Π(B | A) = 1 if Π(B ∩ A) = Π(A) > 0, and Π(B ∩ A) otherwise.
(3.2)

It is clear that N(B | A) > 0 if and only if Π(B ∩ A) > Π(B^c ∩ A). Moreover, if Π(B | A) > Π(B) then Π(B | A) = 1, which points out the limited expressiveness of this qualitative notion (no gradual positive reinforcement of possibility). However, it is possible to have N(B) > 0, N(B^c | A_1) > 0, and N(B | A_1 ∩ A_2) > 0 (i. e., oscillating beliefs). Extensive works on conditional possibility, especially qualitative, handling the case Π(A) = 0, have been recently carried out by Coletti and Vantaggi [26, 27] in the spirit of De Finetti's approach to subjective probabilities, defined in terms of conditional measures and allowing for conditioning on impossible events.

In the numerical setting, due to the need of preserving continuity properties of Π for Π(B | A), we must choose ⋆ = product, so that

Π(B | A) = Π(B ∩ A) / Π(A),

which makes possibilistic and probabilistic conditionings very similar [28] (now, gradual positive reinforcement of possibility is allowed). But there is yet another definition of numerical possibilistic conditioning, not based on the above equation as seen later in this chapter.
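The two notions of conditioning can be contrasted on a toy example (hypothetical degrees); the qualitative rule is Eq. (3.2), the numerical one is the product-based rule above:

```python
# Sketch: min-based (Eq. (3.2)) versus product-based conditioning.

def Pi(A, pi):
    return max((pi[s] for s in A), default=0.0)

def cond_min(B, A, pi):
    """Pi(B | A) = 1 if Pi(B ∩ A) = Pi(A) > 0, else Pi(B ∩ A)."""
    inter, whole = Pi(A & B, pi), Pi(A, pi)
    return 1.0 if inter == whole > 0 else inter

def cond_product(B, A, pi):
    """Pi(B | A) = Pi(B ∩ A) / Pi(A), assuming Pi(A) > 0."""
    return Pi(A & B, pi) / Pi(A, pi)

pi = {'s1': 0.6, 's2': 0.3, 's3': 1.0}
A, B = {'s1', 's2'}, {'s2'}
print(cond_min(B, A, pi))      # 0.3: strictly below Pi(A), so kept as is
print(cond_product(B, A, pi))  # 0.5: graded renormalization by Pi(A) = 0.6
```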

2.7 Independence

There are also several variants of possibilistic independence between events. Let us mention here the two basic approaches:

  • Unrelatedness: Π(A ∩ B) = min(Π(A), Π(B)). When it does not hold, it indicates an epistemic form of mutual exclusion between A and B. It is symmetric but sensitive to negation. When it holds for all pairs made of A, B and their complements, it is an epistemic version of logical independence related to separability.

  • Causal independence: Π(B | A) = Π(B). This notion is different from the former one and stronger. It is a form of directed epistemic independence whereby learning A does not affect the plausibility of B. It is neither symmetric nor insensitive to negation: for instance, it is not equivalent to N(B | A) = N(B).

Generally, independence in possibility theory is neither symmetric, nor insensitive to negation. For Boolean variables, independence between events is not equivalent to independence between variables. But since the possibility scale can be qualitative or quantitative, and there are several forms of conditioning, there are also various possible forms of independence. For studies of various notions and their properties see [29, 30, 31, 32]. More discussions and references appear in [4].

2.8 Fuzzy Interval Analysis

An important example of a possibility distribution is a fuzzy interval [20, 3]. A fuzzy interval is a fuzzy set of reals whose membership function is unimodal and upper semicontinuous. Its α-cuts are closed intervals. The calculus of fuzzy intervals is an extension of interval arithmetic based on a possibilistic counterpart of the calculus of random variables. To compute the addition of two fuzzy intervals A and B, one has to compute the membership function of A ⊕ B as the degree of possibility μ_{A⊕B}(z) = Π({(x, y) : x + y = z}), based on the possibility distribution min(μ_A(x), μ_B(y)). There is a large literature on possibilistic interval analysis; see [33] for a survey of 20th-century references.
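A sketch of the addition of two fuzzy intervals by this sup-min computation, on a coarse discretization (supports and degrees hypothetical):

```python
# Sketch: mu_{A (+) B}(z) = sup { min(mu_A(x), mu_B(y)) : x + y = z }.

def add_fuzzy(A, B):
    """A, B: dicts value -> membership degree (discretized fuzzy intervals)."""
    out = {}
    for x, ma in A.items():
        for y, mb in B.items():
            out[x + y] = max(out.get(x + y, 0.0), min(ma, mb))
    return out

A = {1: 0.5, 2: 1.0, 3: 0.5}       # "about 2"
B = {10: 0.5, 20: 1.0, 30: 0.5}    # "about 20"
print(add_fuzzy(A, B))             # peaks at 22 with degree 1.0
```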

2.9 Guaranteed Possibility

Possibility distributions originally represent negative information in the sense that their role is essentially to rule out impossible states. More recently [34, 35], another type of possibility distribution has been considered, where the information has a positive nature, namely it points out actually possible states, such as observed cases, examples of solutions, etc. Positively flavored possibility distributions will be denoted by δ and serve as evidential support functions. The conventions for interpreting them contrast with usual possibility distributions:

  • δ ( s ) = 1 means that state s is actually possible because of a high evidential support (for instance, s is a case that has been actually observed);

  • δ ( s ) = 0 means that state s has not been observed (yet: potential impossibility).

Note that π ( s ) = 1 indicates potential possibility, while δ ( s ) = 1 conveys more information. In contrast, δ ( s ) = 0 expresses ignorance.

A measure of guaranteed possibility can be defined, which differs from functions Π and N [34, 35]:

Δ(A) = inf_{s∈A} δ(s).

It estimates to what extent all states in A are actually possible according to evidence. Δ(A) can be used as a degree of evidential support for A. Of course, this function possesses a conjugate ∇ such that ∇(A) = 1 − Δ(A^c) = sup_{s∉A} (1 − δ(s)). Function ∇(A) evaluates the degree of potential necessity of A, as it is 1 only if some state s outside A is potentially impossible.

Uncertain statements of the form A is possible to degree β often mean that any realization of A is possible to degree β (e. g., it is possible that the museum is open this afternoon). They can then be modeled by a constraint of the form Δ(A) ≥ β. It corresponds to the idea of observed evidence.

This type of information is better exploited by assuming an informational principle opposite to the one of minimal specificity, namely, any situation not yet observed is tentatively considered as impossible. This is similar to the closed-world assumption. The most specific distribution δ ( A , β ) in agreement with Δ ( A ) β is

δ_(A,β)(s) = β if s ∈ A, 0 otherwise.

Note that while possibility distributions induced from certainty-qualified pieces of knowledge combine conjunctively, by discarding possible states, evidential support distributions induced by possibility-qualified pieces of evidence combine disjunctively, by accumulating possible states. Given several pieces of evidence of the form x is F_i is possible, for i = 1, …, n, each of them translates into the constraint δ ≥ μ_{F_i}; hence, several constraints lead to the inequality δ ≥ max_{i=1,…,n} μ_{F_i} and, by another minimal commitment principle based on maximal specificity, to the possibility distribution

δ = max_{i=1,…,n} δ_i,

where δ_i is induced by the information item x is F_i is possible. This justifies the use of the maximum operation for combining evidential support functions. Acquiring pieces of possibility-qualified evidence leads to updating δ_(A,β) into some wider distribution δ ≥ δ_(A,β). Any possibility distribution can be represented as a collection of nested possibility-qualified statements of the form (E_i, Δ(E_i)), with E_i = {s : δ(s) ≥ λ_i}, since δ(s) = max_{i: s∈E_i} Δ(E_i), dually to the case of certainty-qualified statements.
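A sketch of Δ and of the disjunctive accumulation of possibility-qualified reports, dual to the min-based combination of knowledge (all reports hypothetical):

```python
# Sketch: guaranteed possibility and max-based fusion of evidential support.

def Delta(A, delta):
    return min((delta[s] for s in A), default=1.0)

def possibility_qualified(A, beta, states):
    """Most specific delta with Delta(A) >= beta: beta on A, 0 elsewhere."""
    return {s: beta if s in A else 0.0 for s in states}

states = {'s1', 's2', 's3'}
d1 = possibility_qualified({'s1'}, 1.0, states)        # s1 actually observed
d2 = possibility_qualified({'s1', 's2'}, 0.5, states)  # weaker report on {s1, s2}
delta = {s: max(d1[s], d2[s]) for s in states}         # accumulate observations
print(delta, Delta({'s1', 's2'}, delta))  # {'s1': 1.0, 's2': 0.5, 's3': 0.0} 0.5
```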

2.10 Bipolar Possibility Theory

A bipolar representation of information using pairs (δ, π) may provide a natural interpretation of interval-valued fuzzy sets [8]. Although positive and negative information are represented in separate and different ways via δ and π functions, respectively, there is a coherence condition that should hold between positive and negative information. Indeed, observed information should not be impossible. Likewise, in terms of preferences, solutions that are preferred to some extent should not be unfeasible. This leads to enforcing the coherence constraint δ ≤ π between the two representations.

This condition should be maintained when new information arrives and is combined with the previous one. This does not go for free, since degrees δ(s) tend to increase while degrees π(s) tend to decrease, due to the disjunctive and conjunctive processes that, respectively, govern their combination. Maintaining this coherence requires a revision process that works as follows. If the current information state is represented by the pair (δ, π), receiving a new positive (resp. negative) piece of information represented by δ_new (resp. π_new) to be enforced leads to revising (δ, π) into (max(δ, δ_new), π_rev) (resp. into (δ_rev, min(π, π_new))), using, respectively,

π_rev = max(π, δ_new);
(3.3)
δ_rev = min(π_new, δ).
(3.4)

It is important to note that when both positive and negative pieces of information are collected, there are two options:

  • Either priority is given to positive information over negative information: it means that (past) positive information cannot be ruled out by (future) negative information. This may be found natural when very reliable observations (represented by δ) contradict tentative knowledge (represented by π). Then revising (δ, π) by (δ_new, π_new) yields the new pair

    (δ_rev, π_rev) = (max(δ, δ_new), max(min(π, π_new), max(δ, δ_new)))
  • Or priority is given to negative information over positive information. It makes sense when handling preferences. Indeed, then, positive information may be viewed as wishes, while negative information reflects constraints. Then, revising (δ, π) by (δ_new, π_new) would yield the new pair

    (δ_rev, π_rev) = (min(min(π, π_new), max(δ, δ_new)), min(π, π_new)).

It can be checked that the two latter revision rules generalize the two previous ones. With both revision options, it can be checked that if δ ≤ π and δ_new ≤ π_new hold, revising (δ, π) by (δ_new, π_new) yields a new coherent pair. This revision process should not be confused with another one pertaining only to the negative part of the information: computing min(π, π_new) may yield a possibility distribution that is not normalized, in the case of inconsistency. If such an inconsistency takes place, it should be resolved (by some appropriate renormalization) before one of the two above bipolar revision mechanisms can be applied.
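A sketch of the two elementary revision steps (Eqs. (3.3) and (3.4)), checking that the coherence δ ≤ π is preserved (all degrees hypothetical):

```python
# Sketch: bipolar revision keeping delta <= pi.

def revise_positive(delta, pi, delta_new):
    """New positive information; pi is pushed up where needed (Eq. (3.3))."""
    return ({s: max(delta[s], delta_new[s]) for s in pi},
            {s: max(pi[s], delta_new[s]) for s in pi})

def revise_negative(delta, pi, pi_new):
    """New negative information; delta is cut down where needed (Eq. (3.4))."""
    return ({s: min(delta[s], pi_new[s]) for s in pi},
            {s: min(pi[s], pi_new[s]) for s in pi})

delta = {'s1': 0.0, 's2': 0.4}
pi = {'s1': 1.0, 's2': 0.6}
d2, p2 = revise_positive(delta, pi, {'s1': 0.0, 's2': 0.8})
assert all(d2[s] <= p2[s] for s in p2)   # coherence maintained
```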

3 Qualitative Possibility Theory

This section is restricted to the case of a finite state space S; typically, S is the set of interpretations of a formal propositional language L based on a finite set of Boolean attributes V. The usual connectives ∧ (conjunction), ∨ (disjunction), and ¬ (negation) are used. The possibility scale is then taken as a finite chain, or the unit interval understood as an ordinal scale, or even just a complete preordering of states. At the other end, one may use the set of natural integers (viewed as an impossibility scale) equipped with addition, which comes down to a countable subset of the unit interval equipped with the product t-norm, instrumental for conditioning. However, the qualitative nature of the latter setting is questionable, even if authors using it do not consider it as genuinely quantitative.

3.1 Possibility Theory and Modal Logic

In this section, the possibility scale is Boolean ( L = { 0 , 1 } ) and a possibility distribution reduces to a subset of states E, for instance the models of a set of formulas K representing the beliefs of an agent in propositional logic. The presence of a proposition p in K can be modeled by N ( [ p ] ) = 1 , or Π ( [ ¬ p ] ) = 0 where  [ p ] is the set of interpretations of p; more generally the degrees of possibility and necessity can be defined by [36]:

  • N([p]) = Π([p]) = 1 if and only if K ⊨ p (the agent believes p);

  • N([p]) = Π([p]) = 0 if and only if K ⊨ ¬p (the agent believes ¬p);

  • N ( [ p ] ) = 0 and Π ( [ p ] ) = 1 if and only if K ⊧̸ p and K ⊧̸ ¬ p (the agent is unsure about p)

However, in propositional logic, it cannot be syntactically expressed that N([p]) = 0 nor that Π([p]) = 1. To do so, a modal language is needed [12], which prefixes propositions with modalities such as necessary (□) and possible (◊). Then □p encodes N([p]) = 1 (instead of p ∈ K in classical logic), and ◊p encodes Π([p]) = 1. Only a very simple modal language L_□ is needed that encapsulates the propositional language L. Atoms of this logic are of the form □p, where p is any propositional formula. Well-formed formulas in this logic are obtained by applying standard conjunction and negation to these atoms:

L_□ : φ ::= □p | ¬φ | φ ∧ ψ, where p ranges over L.

The well-known conjugateness between possibility and necessity reads ◊p = ¬□¬p. Maxitivity and minitivity axioms of possibility and necessity measures, respectively, read ◊(p ∨ q) = ◊p ∨ ◊q and □(p ∧ q) = □p ∧ □q, and are well known to hold in regular modal logics; the consistency of the epistemic state is ensured by axiom D: □p → ◊p. This is the minimal epistemic logic (MEL) [37] needed to account for possibility theory. It corresponds to a small fragment of the logic KD without modality nesting and without objective formulas (L_□ ∩ L = ∅). Models of such modal formulas are epistemic states: for instance, E is a model of □p means that E ⊆ [p] [37, 38]. This logic is sound and complete with respect to this semantics, and enables propositions whose truth status is explicitly unknown to be reasoned about.

3.2 Comparative Possibility

A plausibility ordering is a complete preorder of states denoted by ≥_π, which induces a well-ordered partition {E_1, …, E_n} of S. It is the comparative counterpart of a possibility distribution π, i. e., s ≥_π s′ if and only if π(s) ≥ π(s′). Indeed, it is more natural to expect that an agent will supply ordinal rather than numerical information about his beliefs. By convention, E_1 contains the most normal states of fact, E_n the least plausible, or most surprising, ones. Denoting by max(A) any most plausible state s_0 ∈ A, ordinal counterparts of possibility and necessity measures [15] are then defined as follows: {s} >_Π ∅ for all s ∈ S, and

A ≥_Π B if and only if max(A) ≥_π max(B);
A ≥_N B if and only if max(B^c) ≥_π max(A^c).

Possibility relations ≥_Π were proposed by Lewis [14] and they satisfy his characteristic property

A ≥_Π B implies C ∪ A ≥_Π C ∪ B,

while necessity relations can also be defined as A ≥_N B if and only if B^c ≥_Π A^c, and they satisfy a similar axiom

A ≥_N B implies C ∩ A ≥_N C ∩ B.

The latter coincides with epistemic entrenchment relations in the sense of belief revision theory [39] (provided that A >_Π ∅ if A ≠ ∅). Conditioning a possibility relation ≥_Π by a nonimpossible event C >_Π ∅ means deriving a relation ≥_Π^C such that

A ≥_Π^C B if and only if A ∩ C ≥_Π B ∩ C.

These results show that possibility theory is implicitly at work in the principal axiomatic approach to belief revision [40], and that conditional possibility obeys its main postulates [41]. The notion of independence for comparative possibility theory was studied by Dubois et al. [31] for independence between events, and by Ben Amor et al. [32] for independence between variables.

3.3 Possibility Theory and Nonmonotonic Inference

Suppose S is equipped with a plausibility ordering. The main idea behind qualitative possibility theory is that the state of the world is always believed to be as normal as possible, neglecting less normal states. A ≥_Π B really means that there is a normal state where A holds that is at least as normal as any normal state where B holds. The dual case A ≥_N B is intuitively understood as A is at least as certain as B, in the sense that there are states where B fails to hold that are at least as normal as the most normal state where A does not hold. In particular, the events accepted as true are those which are true in all the most plausible states, namely the ones such that A >_N ∅. These assumptions lead us to interpret the plausible inference A |∼ B of a proposition B from another A, under a state of knowledge represented by ≥_Π, as follows: B should be true in all the most normal states where A is true, which means B >_Π^A B^c in terms of ordinal conditioning, that is, A ∩ B is more plausible than A ∩ B^c. A |∼ B also means that the agent considers B as an accepted belief in the context A.

This kind of inference is nonmonotonic in the sense that A |∼ B does not always imply A ∩ C |∼ B for any additional information C. This is similar to the fact that a conditional probability P(B | A ∩ C) may be low even if P(B | A) is high. The properties of the consequence relation |∼ are now well understood, and are precisely the ones laid bare by Lehmann and Magidor [42] for their so-called rational inference. Monotonicity is only partially restored: A |∼ B implies A ∩ C |∼ B provided that A |∼ C^c does not hold (i. e., states where A is true do not typically violate C). This property is called rational monotony and, along with some more standard ones (like closure under conjunction), characterizes default possibilistic inference |∼. In fact, the set {B : A |∼ B} of accepted beliefs in the context A is deductively closed, which corresponds to the idea that the agent reasons with accepted beliefs in each context as if they were true, until some event occurs that modifies this context. This closure property is enough to justify a possibilistic approach [43], and adding the rational monotony property ensures the existence of a single possibility relation generating the consequence relation |∼ [44].

Plausibility orderings can be generated by a set of if-then rules tainted with unspecified exceptions. This set forms a knowledge base supplied by an agent. Each rule if A then B is modeled by a constraint of the form A ∩ B >_Π A ∩ B^c on possibility relations. There exists a single minimally specific element in the set of possibility relations satisfying all constraints induced by rules (unless the latter are inconsistent). It corresponds to the most compact plausibility ranking of states induced by the rules [44]. This ranking can be computed by an algorithm originally proposed by Pearl [45].

Qualitative possibility theory has been studied from the point of view of cognitive psychology. Experimental results [46] suggest that there are situations where people reason about uncertainty using the rules of possibility theory rather than with those of probability theory; namely, people jump to plausible conclusions based on assuming the current world is normal.

3.4 Possibilistic Logic

Qualitative possibility relations can be represented by (and only by) possibility measures ranging on any totally ordered set L (especially a finite one) [15]. This absolute representation on an ordinal scale is slightly more expressive than the purely relational one. For instance, one can express that a proposition is fully plausible ( Π ( A ) = 1 ) , while using a possibility relation, one can only say that it is among the most plausible ones. When the finite set S is large and generated by a propositional language, qualitative possibility distributions can be efficiently encoded in possibilistic logic [47, 48, 49].

A possibilistic logic base K is a set of pairs (p_i, α_i), where p_i is an expression in classical (propositional or first-order) logic and α_i > 0 is an element of the value scale L. This pair encodes the constraint N(p_i) ≥ α_i, where N(p_i) is the degree of necessity of the set of models of p_i. Each prioritized formula (p_i, α_i) has a fuzzy set of models (via certainty qualification, described in Sect. 2.4), and the fuzzy intersection of the fuzzy sets of models of all prioritized formulas in K yields the associated plausibility ordering on S, encoded by a possibility distribution π_K. Namely, an interpretation s is all the less possible as it falsifies formulas with higher weights, i. e.,

π_K(s) = 1 if s ⊨ p_i, ∀(p_i, α_i) ∈ K,
(3.5)
π_K(s) = 1 − max{α_i : (p_i, α_i) ∈ K, s ⊭ p_i} otherwise.
(3.6)

This distribution is obtained by applying the minimal specificity principle, since it is the largest one that satisfies the constraints N(p_i) ≥ α_i. If the classical logic base {p_i : (p_i, α_i) ∈ K} is inconsistent, π_K is not normalized, and a level of inconsistency equal to inc(K) = 1 − max_s π_K(s) can be attached to the base K. However, the set of formulas {p_i : (p_i, α_i) ∈ K, α_i > inc(K)} is always consistent.
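A sketch of Eqs. (3.5) and (3.6), with each formula given extensionally by its set of models (a hypothetical two-formula base):

```python
# Sketch: pi_K and the inconsistency level of a possibilistic base.

def pi_K(base, states):
    """base: list of (models_of_p_i, alpha_i) pairs."""
    def degree(s):
        falsified = [a for models, a in base if s not in models]
        return 1.0 - max(falsified) if falsified else 1.0
    return {s: degree(s) for s in states}

states = {'s1', 's2', 's3'}
base = [({'s1', 's2'}, 0.8),    # (p, 0.8), p having models {s1, s2}
        ({'s2', 's3'}, 0.5)]    # (q, 0.5), q having models {s2, s3}
pi = pi_K(base, states)
inc = 1.0 - max(pi.values())    # inc(K) = 1 - max_s pi_K(s)
print(pi, inc)                  # {'s1': 0.5, 's2': 1.0, 's3': ≈0.2} 0.0
```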

Syntactic deduction from a set of prioritized clauses is achieved by refutation using an extension of the standard resolution rule, whereby (p ∨ q, min(α, β)) can be derived from (p ∨ r, α) and (q ∨ ¬r, β). This rule, which evaluates the validity of an inferred proposition by the validity of the weakest premiss, goes back to Theophrastus, a disciple of Aristotle. Another way of presenting inference in possibilistic logic relies on the fact that K ⊢ (p, α) if and only if K_α = {p_i : (p_i, α_i) ∈ K, α_i ≥ α} ⊢ p in the sense of classical logic. In particular, inc(K) = max{α : K ⊢ (⊥, α)}. Inference in possibilistic logic can use this extended resolution rule and proceeds by refutation, since K ⊢ (p, α) if and only if inc({(¬p, 1)} ∪ K) ≥ α. Computational inference methods in possibilistic logic are surveyed in [50].
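The extended resolution rule itself is easy to sketch, encoding clauses as sets of literals (a hypothetical encoding where '-' marks negation):

```python
# Sketch: weighted resolution, (p ∨ q, min(a, b)) from (p ∨ r, a), (q ∨ ¬r, b).

def resolve(c1, a1, c2, a2, var):
    """Resolve two weighted clauses on `var`; keep the weaker weight."""
    assert var in c1 and ('-' + var) in c2
    return (c1 - {var}) | (c2 - {'-' + var}), min(a1, a2)

clause, weight = resolve(frozenset({'p', 'r'}), 0.8,
                         frozenset({'q', '-r'}), 0.5, 'r')
print(sorted(clause), weight)   # ['p', 'q'] 0.5
```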

Possibilistic logic is an inconsistency-tolerant extension of propositional logic that provides a natural semantic setting for mechanizing nonmonotonic reasoning [51], with a computational complexity close to that of propositional logic. Namely, once a possibility distribution on models is generated by a set of if-then rules if p_i then q_i (as explained in Sect. 3.3, and modeled here using qualitative conditioning as N(q_i | p_i) > 0), weights α_i = N(¬p_i ∨ q_i) can be computed, and the corresponding possibilistic base built [51]. See [52] for an efficient method involving compilation.

Variants of possibilistic logic have been proposed in later works. A partially ordered extension of possibilistic logic has been proposed, whose semantic counterpart consists of partially ordered models [53]. Another approach for handling partial orderings between weights is to encode formulas with partially constrained weights in a possibilistic-like many-sorted propositional logic [54]. Namely, a formula (p, α) is rewritten as a classical two-sorted clause p ∨ ab_α, where ab_α means the situation is α-abnormal, and thus the clause expresses that p is true or the situation is abnormal; more generally, (p, min(α, β)) is rewritten as the clause p ∨ ab_α ∨ ab_β. Then a known constraint between unknown weights, such as α ≥ β, is translated into a clause ¬ab_α ∨ ab_β. In this way, a possibilistic logic base where only partial information about the relative ordering between the weights is available, under the form of constraints, can be handled as a set of classical logic formulas that involve symbolic weights.

An efficient inference process has been proposed using the notion of forgetting variables. This approach provides a technique for compiling a standard possibilistic knowledge base in order to process inference in polynomial time [55]. Let us also mention quasi-possibilistic logic [56], an extension of possibilistic logic based on the so-called quasi-classical logic, a paraconsistent logic whose inference mechanism is close to classical inference (except that it is not allowed to infer p ∨ q from p). This approach copes with inconsistency between formulas having the same weight. Other types of possibilistic logic can also handle constraints of the form Π(φ) ≥ α, or Δ(φ) ≥ α [49].

There is a major difference between possibilistic logic and weighted many-valued logics [57]. Namely, in the latter, a weight τ ∈ L attached to a (many-valued, thus nonclassical) formula p acts as a truth-value threshold, and (p, τ) in a fuzzy knowledge base expresses the Boolean requirement that the truth value of p should be at least equal to τ for (p, τ) to be valid. So in such fuzzy logics, while truth of p is many-valued, the validity of a weighted formula is two-valued. On the contrary, in possibilistic logic, truth is two-valued (since p is Boolean), but the validity of a possibilistic formula (p, α) is many-valued. In particular, it is possible to cast possibilistic logic inside a many-valued logic. The idea is to consider many-valued atomic sentences φ of the form (p, α), where p is a formula in classical logic. Then, one can define well-formed formulas such as φ ∧ ψ, φ ∨ ψ, or yet φ → ψ, where the external connectives linking φ and ψ are those of the chosen many-valued logic. From this point of view, possibilistic logic can be viewed as a fragment of a many-valued logic that uses only one external connective: conjunction, interpreted as minimum. This approach, involving a Boolean algebra embedded in a nonclassical one, has been proposed by Boldrin and Sossai [58] with a view to augmenting possibilistic logic with fusion modes cast at the object level. It is also possible to replace classical logic by a many-valued logic inside possibilistic logic. For instance, possibilistic logic has been extended to Gödel many-valued logic [59]. A similar technique has been used by Hájek et al. to extend possibilistic logic to a many-valued modal setting [60].

Lehmke [61] has cast fuzzy logics and possibilistic logic inside the same framework, considering weighted many-valued formulas of the form ( p , θ ) , where p is a many-valued formula with truth set T, and θ is a label defined as a monotone mapping from the truth-set T to a validity set L (a set of possibility degrees). T and L are supposed to be complete lattices, and the set of labels has properties that make it a fuzzy extension of a filter. Labels encompass fuzzy truth-values in the sense of Zadeh [62], such as very true, more or less true that express uncertainty about (many-valued) truth in a graded way.

Rather than expressing statements such as it is half-true that John is tall, which presupposes a state of complete knowledge about John's height, one may be interested in handling states of incomplete knowledge, namely assertions of the form all we know is that John is tall. One way to do it is to introduce fuzzy constants in a possibilistic first-order logic. Dubois, Prade, and Sandri [63] have noticed that an imprecise restriction on the scope of an existential quantifier can be handled in the following way. From the two premises ∀x ∈ A, ¬p(x, y) ∨ q(x, y) and ∃x ∈ B, p(x, a), where a is a constant, we can conclude that ∃x ∈ B, q(x, a) provided that B ⊆ A. Thus, letting p(B, a) stand for ∃x ∈ B, p(x, a), one can write

∀x ∈ A, ¬p(x, y) ∨ q(x, y), p(B, a) ⊢ q(B, a)

if B ⊆ A, B being an imprecise constant. Letting A and B be fuzzy sets, the following pattern can be validated in possibilistic logic:

(¬p(x, y) ∨ q(x, y), min(μ_A(x), α)), (p(B, a), β) ⊢ (q(B, a), min(N_B(A), α, β)),

where N_B(A) = inf_t max(μ_A(t), 1 − μ_B(t)) is the necessity measure of the fuzzy event A based on fuzzy information B. Note that A, which appears in the weight slot of the first possibilistic formula, plays the role of a fuzzy predicate, since the formula expresses that the more x is in A, the more certain (up to level α) that if p is true for (x, y), then q is true as well.

Alsinet and Godo [64, 65] have applied possibilistic logic to logic programming that allows for fuzzy constants [65, 66]. They have developed programming environments based on possibility theory. In particular, the above inference pattern can be strengthened, replacing B by its β-cut B_β in the expression of N_B(A), and extended to a sound resolution rule. They have further developed possibilistic logic programming with similarity reasoning [67] and, more recently, argumentation [68, 69].

Lastly, in order to improve the knowledge representation power of the answer-set programming (ASP) paradigm, the stable model semantics has been extended by taking into account a certainty level, expressed in terms of a necessity measure, on each rule of a normal logic program. It leads to the definition of a possibilistic stable model for weighted answer-set programming [70]. Bauters et al. [71] introduce a characterization of answer sets of classical and possibilistic ASP programs in terms of possibilistic logic, where an ASP program specifies a set of constraints on possibility distributions.

3.5 Ranking Function Theory

A theory that parallels possibility theory to a large extent, and that has been designed for handling issues in belief revision, nonmonotonic reasoning, and causation just like qualitative possibility theory, is the theory of ranking functions by Spohn [10, 72, 9]. The main difference is that it is not really a qualitative theory, as it uses the set of natural integers augmented with ∞ (denoted by ℕ^+) as a value scale. Hence, it is more expressive than qualitative possibility theory, but it is applied to the same problems.

Formally [10], a ranking function is a mapping κ : 2^S → ℕ^+ such that:

  • κ({s}) = 0 for some s ∈ S;

  • κ(A) = min_{s∈A} κ({s});

  • κ(∅) = ∞.

It is immediate to verify that the set function Π(A) = 2^{−κ(A)} is a possibility measure. So a ranking function is an integer-valued measure of impossibility (disbelief). The function β(A) = κ(A^c) is an integer-valued necessity measure used by Spohn for measuring belief, and it is clear that the rescaled necessity measure is N(A) = 1 − 2^{−β(A)}. Interestingly, ranking functions also bear close connection to probability theory [72], viewing κ(A) as the exponent of an infinitesimal probability of the form P(A) = ε^{κ(A)}. Indeed, the order of magnitude of P(A ∪ B) is then ε^{min(κ(A),κ(B))}. Integers also come up naturally if we consider Hamming distances between models in the Boolean logic context, if, for instance, the degree of possibility of an interpretation is a function of its Hamming distance to the closest model of a classical knowledge base.
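A sketch of this correspondence, checking that the transform of a ranking function is indeed maxitive (the ranks are hypothetical):

```python
# Sketch: ranking function kappa and the induced possibility Pi(A) = 2**(-kappa(A)).

import math

kappa = {'s1': 0, 's2': 1, 's3': 3}   # kappa({s}) = 0 for some s (here s1)

def kappa_set(A):
    return min((kappa[s] for s in A), default=math.inf)   # kappa(empty) = inf

def Pi(A):
    return 2.0 ** (-kappa_set(A))                          # Pi(empty) = 0

A, B = {'s2'}, {'s3'}
assert kappa_set(A | B) == min(kappa_set(A), kappa_set(B))  # min-decomposability
assert Pi(A | B) == max(Pi(A), Pi(B))                       # maxitivity
```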

Spohn [9] also introduces conditioning concepts, especially:

  • The so-called A-part of κ, which is a conditioning operation by event A defined by κ(B | A) = κ(B ∩ A) − κ(A);

  • The (A, n)-conditionalization of κ, κ(· | (A, n)), which is a revision operation by an uncertain input enforcing κ(A^c) = n, and defined by

    κ(s | (A, n)) = κ(s | A) if s ∈ A, and n + κ(s | A^c) otherwise.
    (3.7)

    This operation makes A more believed than A^c by n steps, namely,

    β(A | (A, n)) = n; β(A^c | (A, n)) = 0.

It is easy to see that the conditioning of ranking functions comes down to the product-based conditioning of numerical possibility measures, and to the infinitesimal counterpart of usual Bayesian conditioning of probabilities. The other conditioning rule can be obtained by means of Jeffrey's rule of conditioning [73], P(B | (A, α)) = α·P(B | A) + (1 − α)·P(B | A^c), i. e., revision by a constraint of the form P(A) = α. Both qualitative and quantitative counterparts of this revision rule in possibility theory have been studied in detail [74, 75]. In fact, ranking function theory is formally encompassed by numerical possibility theory. Moreover, there is no fusion rule in Spohn theory, while fusion is one of the main applications of possibility theory (see Sect. 3.5).

3.6 Possibilistic Belief Networks

Another compact representation of qualitative possibility distributions is the possibilistic directed graph, which uses the same conventions as Bayesian nets but relies on conditional possibility [76]. The qualitative approach is based on a symmetric notion of qualitative independence, Π(A ∩ B) = min(Π(A), Π(B)), that is weaker than the causal-like condition Π(B | A) = Π(B) [31]. Like joint probability distributions, joint possibility distributions can be decomposed into a conjunction of conditional possibility distributions (using minimum or product) in a way similar to Bayes nets [76]. A joint possibility distribution associated with variables X_1, …, X_n can be decomposed by the chain rule

π(X_1, …, X_n) = min(π(X_n | X_1, …, X_{n−1}), …, π(X_2 | X_1), π(X_1)).

Such a decomposition can be simplified by assuming conditional independence relations between variables, as reflected by the structure of the graph. The form of independence between variables at work here is conditional noninteractivity: two variables X and Y are independent in the context Z if, for each instance (x, y, z) of (X, Y, Z), we have π(x, y | z) = min(π(x | z), π(y | z)).
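A sketch of the min-based chain rule for two variables, recovering a joint distribution from hypothetical conditional tables:

```python
# Sketch: pi(x1, x2) = min(pi(x2 | x1), pi(x1)) for a two-node possibilistic net.

pi_X1 = {'a': 1.0, 'b': 0.4}
pi_X2_given_X1 = {('u', 'a'): 1.0, ('v', 'a'): 0.3,
                  ('u', 'b'): 0.7, ('v', 'b'): 1.0}

joint = {(x1, x2): min(pi_X2_given_X1[(x2, x1)], pi_X1[x1])
         for x1 in pi_X1 for x2 in ('u', 'v')}
print(joint)  # {('a','u'): 1.0, ('a','v'): 0.3, ('b','u'): 0.4, ('b','v'): 0.4}
```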

Ben Amor and Benferhat [77] investigate the properties of qualitative independence that enable local inferences to be performed in possibilistic nets. Uncertainty propagation algorithms suitable for possibilistic graphical structures have been studied in [78]. It is also possible to propagate uncertainty in nondirected decompositions of joint possibility measures, as done quite early by Borgelt et al. [79]. Counterparts of product-based numerical possibilistic nets using ranking functions exist as well [10]. Qualitative possibilistic counterparts of decision trees and influence diagrams have been recently investigated [80, 81]. Compilation techniques for inference in possibilistic networks have been devised [82]. Finally, possibilistic networks have been studied from the standpoint of causal reasoning, using the concept of intervention, which comes down to enforcing the values of some variables so as to lay bare their influence on other ones [83, 84].

3.7 Fuzzy Rule-Based and Case-Based Approximate Reasoning

A typology of fuzzy rules has been devised in the setting of possibility theory, distinguishing rules whose purpose is to propagate uncertainty through reasoning steps, from rules whose main purpose is similarity-based interpolation [85], depending on the choice of a many-valued implication connective that models a rule. The bipolar view of information based on ( δ , π ) pairs sheds new light on the debate between conjunctive and implicative representation of rules [86]. Representing a rule as a material implication focuses on counterexamples to rules, while using a conjunction between antecedent and consequent points out examples of the rule and highlights its positive content. Traditionally in fuzzy control and modeling, the latter representation is adopted, while the former is the logical tradition. Introducing fuzzy implicative rules in modeling accounts for constraints or landmark points the model should comply with (as opposed to observed data) [87]. The bipolar view of rules in terms of examples and counterexamples may turn out to be very useful when extracting fuzzy rules from data [88].

Fuzzy rules have been applied to case-based reasoning (CBR). In general, CBR relies on the following implicit principle: similar situations may lead to similar outcomes. Thus, a similarity relation S between problem descriptions or situations, and a similarity measure T between outcomes, are needed. This implicit CBR principle can be expressed in the framework of fuzzy rules as: ‘‘the more similar (in the sense of S) are the attribute values describing two situations, the more possible the similarity (in the sense of T) of the values of the corresponding outcome attributes.’’ Given a situation s_0 associated with an unknown outcome t_0 and a current case (s, t), this principle enables us to conclude on the possibility of t_0 being equal to a value similar to t [89]. This acknowledges the fact that, often in practice, a database may contain cases that are rather similar with respect to the problem description attributes, but which may be distinct with respect to outcome attribute(s). This emphasizes that case-based reasoning can only lead to cautious conclusions.

The possibility rule the more similar s and s_0, the more possible t and t_0 are similar is modeled in terms of a guaranteed possibility measure [90]. This leads to enforcing the inequality Δ(T(t, ·)) ≥ μ_S(s, s_0), which expresses that the guaranteed possibility that t_0 belongs to a high degree to the fuzzy set of values that are T-similar to t is lower bounded by the S-similarity of s and s_0. Then the fuzzy set F of possible values t′ for t_0 with respect to case (s, t) is given by

F_{t_0}(t′) = min(μ_T(t, t′), μ_S(s, s_0)),

since the maximally specific distribution such that Δ(A) ≥ α is δ = min(μ_A, α). What is obtained is the fuzzy set T(t, ·) of values t′ that are T-similar to t, whose possibility level is truncated at the global degree μ_S(s, s_0) of similarity of s and s_0. The max-based aggregation of the various contributions obtained from the comparison with each case (s, t) in the memory M of cases acknowledges the fact that each new comparison may suggest new possible values for t_0, and agrees with the positive nature of the information in the repository of cases. Thus, we obtain the following fuzzy set E_{s_0} of the possible values t′ for t_0:

E_{s_0}(t′) = max_{(s,t)∈M} min(S(s, s_0), T(t, t′)).

This latter expression can be put in parallel with the evaluation of a flexible query [91]. This approach has been generalized to imprecisely or fuzzily described situations, and has been related to other approaches to instance-based prediction [92, 93].
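A sketch of this case-based prediction scheme, with hypothetical linear similarity measures on numerical attributes:

```python
# Sketch: E_{s0}(t') = max over cases (s, t) of min(S(s, s0), T(t, t')).

def S(a, b):   # similarity between situations (hypothetical choice)
    return max(0.0, 1.0 - abs(a - b) / 10.0)

def T(a, b):   # similarity between outcomes (hypothetical choice)
    return max(0.0, 1.0 - abs(a - b) / 10.0)

memory = [(2.0, 100.0), (3.0, 110.0)]    # (situation, outcome) cases
s0 = 2.5                                  # new situation, unknown outcome
candidates = [95.0, 100.0, 105.0, 110.0]
E = {t2: max(min(S(s, s0), T(t, t2)) for s, t in memory) for t2 in candidates}
print(E)   # graded set of possible outcome values, e.g. E[100.0] == 0.95
```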

3.8 Preference Representation

Possibility theory also offers a framework for preference modeling in constraint-directed reasoning. Both prioritized and soft constraints can be captured by possibility distributions expressing degrees of feasibility rather than plausibility [6]. Possibility theory offers a natural setting for fuzzy optimization whose aim is to balance the levels of satisfaction of multiple fuzzy constraints (instead of minimizing an overall cost) [94]. In such problems, some possibility distributions represent soft constraints on decision variables, other ones can represent incomplete knowledge about uncontrollable state variables. Qualitative decision criteria are particularly adapted to the handling of uncertainty in this setting. Possibility distributions can also model ill-known constraint coefficients in linear and nonlinear programming, thus leading to variants of chance-constrained programming [95].

Optimal solutions of fuzzy constraint-based problems maximize the satisfaction of the most violated constraint, which does not ensure the Pareto dominance of all such solutions. More demanding optimality notions have been defined, by canceling equally satisfied constraints (the so-called discrimin ordering) or using a leximin criterion [94, 96, 97].

Besides, the possibilistic logic setting provides a compact representation framework for preferences, where possibilistic logic formulas represent prioritized constraints on Boolean domains. This approach has been compared to qualitative conditional preference networks (CP-nets), based on a systematic ceteris paribus assumption (preferential independence between decision variables). CP-nets induce partial orders of solutions rather than complete preorders, as possibilistic logic does [98]. Possibilistic networks can also model preferences on the values of variables, conditional on the values of other ones, and offer an alternative to conditional preference networks [98].

Bipolar possibility theory has been applied to preference problems where a distinction can be made between imperative constraints (modeled by propositions with a degree of necessity) and nonimperative wishes (modeled by propositions with a degree of guaranteed possibility) [99]. Another kind of bipolar approach to qualitative multifactorial evaluation based on possibility theory arises when comparing objects in terms of their pros and cons, where the decision maker focuses on the most important assets or defects. Such qualitative multifactorial bipolar decision criteria have been defined, axiomatized [100], and empirically tested [101]. They are qualitative counterparts of the cumulative prospect theory criteria of Kahneman and Tversky [102].

Two issues in preference modeling based on possibility theory in a logic format are as follows:

  • Preference statements of the form Π(p) > Π(q) provide an incomplete description of a preference relation. One question is then how to complete this description by default. The principle of minimal specificity means that a solution not explicitly rejected is satisfactory by default. The dual maximal specificity principle says that a solution not explicitly supported is rejected by default. It is not always clear which principle is the most natural.

  • A statement according to which it is better to satisfy a formula p than a formula q can in fact be interpreted in several ways. For instance, it may mean that the best solution satisfying p is better than the best solution satisfying q, which reads Π(p) > Π(q) and can be encoded in possibilistic logic under the minimal specificity assumption; a stronger statement is that the worst solution satisfying p is better than the best solution satisfying q, which reads Δ(p) > Π(q). Other possibilities are Δ(p) > Δ(q) and Π(p) > Δ(q). This question is studied in some detail by Kaci [103].

3.9 Decision-Theoretic Foundations

Zadeh [1] hinted that since our intuition concerning the behavior of possibilities is not very reliable, our understanding of them

would be enhanced by the development of an axiomatic approach to the definition of subjective possibilities in the spirit of axiomatic approaches to the definition of subjective probabilities.

Decision-theoretic justifications of qualitative possibility were devised, in the style of Von Neumann and Morgenstern, and Savage [104] more than 15 years ago [105, 106].

On top of the set of states, assume there is a set X of consequences of decisions. A decision, or act, is modeled as a mapping f from S to X assigning to each state s its consequence f(s). The axiomatic approach consists in proposing properties of a preference relation ⪰ between acts so that a representation of this relation by means of a preference functional W(f) is ensured, that is, act f is as good as act g (denoted by f ⪰ g) if and only if W(f) ≥ W(g). W(f) depends on the agent's knowledge about the state of affairs, here supposed to be a possibility distribution π on S, and on the agent's goal, modeled by a utility function u on X. Both the utility function and the possibility distribution map to the same finite chain L. A pessimistic criterion W⁻_π(f) is of the form

W⁻_π(f) = min_{s ∈ S} max(n(π(s)), u(f(s))) ,

where n is the order-reversing map of L. n(π(s)) is the degree of certainty that the state is not s (hence the degree of surprise of observing s), and u(f(s)) is the utility of choosing act f in state s. W⁻_π(f) is all the higher as all states are either very surprising or have high utility. This criterion is actually a prioritized extension of the Wald maximin criterion. The latter is recovered if π(s) = 1 (the top of L) for all s ∈ S. According to the pessimistic criterion, acts are chosen according to their worst consequences, restricted to the most plausible states S* = {s : π(s) ≥ n(W⁻_π(f))}. The optimistic counterpart of this criterion is

W⁺_π(f) = max_{s ∈ S} min(π(s), u(f(s))) .

W⁺_π(f) is all the higher as there is a very plausible state with high utility. The optimistic criterion was first proposed by Yager [107] and the pessimistic criterion by Whalen [108]. See Dubois et al. [109] for the resolution of decision problems under uncertainty using the above criteria, cast in the possibilistic logic framework. Such criteria can be refined by the classical expected utility criterion [110].
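For concreteness, here is a small Python sketch of both criteria on an invented finite chain L = {0, …, 4}; the distributions `pi`, `u`, and the act `f` are assumptions chosen for the example.

```python
# Sketch with assumed data: pessimistic and optimistic possibilistic criteria
# on the finite chain L = {0, 1, 2, 3, 4}, with n(x) = 4 - x as the
# order-reversing map of L.

TOP = 4  # top of the chain L

def n(x):
    return TOP - x  # order-reversing map of L

pi = {'s1': 4, 's2': 2, 's3': 0}               # plausibility of states
u = {'good': 4, 'fair': 2, 'bad': 0}           # utility of consequences
f = {'s1': 'fair', 's2': 'good', 's3': 'bad'}  # act: state -> consequence

def w_pess(act):
    """W-(f) = min over states s of max(n(pi(s)), u(f(s)))."""
    return min(max(n(pi[s]), u[act[s]]) for s in pi)

def w_opt(act):
    """W+(f) = max over states s of min(pi(s), u(f(s)))."""
    return max(min(pi[s], u[act[s]]) for s in pi)

print(w_pess(f), w_opt(f))  # 2 2 on this example
```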

These optimistic and pessimistic possibilistic criteria are particular cases of a more general criterion based on the Sugeno integral [111] specialized to possibility and necessity of fuzzy events [1, 20]

S_{γ,u}(f) = max_{λ ∈ L} min(λ, γ(F_λ)) ,

where F_λ = {s ∈ S : u(f(s)) ≥ λ} and γ is a monotonic set function that reflects the decision-maker's attitude in front of uncertainty: γ(A) is the degree of confidence in event A. If γ = Π, then S_{Π,u}(f) = W⁺_π(f). Similarly, if γ = N, then S_{N,u}(f) = W⁻_π(f).
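A self-contained sketch of the Sugeno integral, again on invented data, illustrates this specialization numerically: with γ = Π (resp. γ = N) it returns the optimistic (resp. pessimistic) value.

```python
# Sketch with assumed data: Sugeno integral S_{gamma,u}(f) = max over
# levels lam of min(lam, gamma(F_lam)), with F_lam = {s : u(f(s)) >= lam}.

TOP = 4
pi = {'s1': 4, 's2': 2, 's3': 0}  # possibility distribution on states
uf = {'s1': 2, 's2': 4, 's3': 0}  # u(f(s)) for a fixed act f

def poss(A):
    return max((pi[s] for s in A), default=0)         # Pi(A)

def nec(A):
    return TOP - poss([s for s in pi if s not in A])  # N(A) = n(Pi(A^c))

def sugeno(gamma):
    return max(min(lam, gamma([s for s in pi if uf[s] >= lam]))
               for lam in range(TOP + 1))

print(sugeno(poss))  # 2 = W+(f): the Sugeno integral w.r.t. Pi is optimistic
print(sugeno(nec))   # 2 = W-(f): the Sugeno integral w.r.t. N is pessimistic
```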

For any acts f, g, and any event A, let fAg denote the act consisting of choosing f if A occurs and g if its complement occurs. Let f ∧ g (resp. f ∨ g) be the act whose results yield the worst (resp. best) consequence of the two acts in each state. Constant acts are those whose consequence is fixed regardless of the state. A result in [112, 113] provides an act-driven axiomatization of these criteria, and enforces possibility theory as a rational representation of uncertainty for a finite state space S:

Theorem 3.1

Suppose the preference relation ⪰ on acts obeys the following properties:

  1. (X^S, ⪰) is a complete preorder.

  2. There are two acts such that f ≻ g.

  3. ∀A, ∀g and h constant, ∀f: g ≻ h implies gAf ⪰ hAf.

  4. If f is constant, f ≻ h and g ≻ h imply f ∧ g ≻ h.

  5. If f is constant, h ≻ f and h ≻ g imply h ≻ f ∨ g.

Then there exists a finite chain L, an L-valued monotonic set function γ on S, and an L-valued utility function u, such that ⪰ is representable by a Sugeno integral of u(f) with respect to γ. Moreover, γ is a necessity (resp. possibility) measure as soon as property (4) (resp. (5)) holds for all acts. The preference functional is then W⁻_π(f) (resp. W⁺_π(f)).

Axioms 4 and 5 contradict expected utility theory. They become reasonable if the value scale is finite, decisions are one-shot (no compensation), and provided that there is a big step between any level in the qualitative value scale and the adjacent ones. In other words, the preference pattern f ≻ h always means that f is significantly preferred to h, to the point of considering the value of h negligible in front of the value of f. The above result provides decision-theoretic foundations of possibility theory, whose axioms can thus be tested from observing the choice behavior of agents. See [114] for another approach to comparative possibility relations, more closely relying on Savage's axioms, but giving up any comparability between utility and plausibility levels. The drawback of these and other qualitative decision criteria is their lack of discrimination power [115]. To overcome it, refinements of possibilistic criteria were recently proposed, based on lexicographic schemes. These refined criteria turn out to be representable by a classical (but big-stepped) expected utility criterion [110], and Sugeno integral can be refined by a Choquet integral [116]. For an extension of this qualitative decision-making framework to multiple-stage decision, see [117].

4 Quantitative Possibility Theory

The phrase quantitative possibility refers to the case when possibility degrees range in the unit interval, and are considered in connection with belief functions and imprecise probability theory. Quantitative possibility theory is the natural setting for a reconciliation between probability and fuzzy sets. In that case, a precise articulation between possibility and probability theories is useful to provide an interpretation to possibility and necessity degrees. Several such interpretations can be consistently devised: a degree of possibility can be viewed as an upper probability bound [118], and a possibility distribution can be viewed as a likelihood function [119]. A possibility measure is also a special case of a Shafer plausibility function [120]. Following a very different approach, possibility theory can account for probability distributions with extreme values, infinitesimal [72] or having big steps [121]. There are finally close connections between possibility theory and idempotent analysis [122]. The theory of large deviations in probability theory [123] also handles set functions that look like possibility measures [124]. Here we focus on the role of possibility theory in the theory of imprecise probability.

4.1 Possibility as Upper Probability

Let π be a possibility distribution where π(s) ∈ [0, 1]. Let P(π) be the set of probability measures P such that P ≤ Π, i.e., ∀A ⊆ S, P(A) ≤ Π(A). Then the possibility measure Π coincides with the upper probability function P* such that P*(A) = sup{P(A) : P ∈ P(π)}, while the necessity measure N is the lower probability function P_* such that P_*(A) = inf{P(A) : P ∈ P(π)}; see [118, 125] for details. P and π are said to be consistent if P ∈ P(π). The connection between possibility measures and imprecise probabilistic reasoning is especially promising for the efficient representation of nonparametric families of probability functions, and it makes sense even in the scope of modeling linguistic information [126].

A possibility measure can be computed from nested confidence subsets {A_1, A_2, …, A_m} where A_i ⊂ A_{i+1}, i = 1, …, m − 1. Each confidence subset A_i is attached a positive confidence level λ_i interpreted as a lower bound of P(A_i), hence a necessity degree. It is viewed as a certainty-qualified statement that generates a possibility distribution π_i according to Sect. 3.2. The corresponding possibility distribution is

π(s) = min_{i=1,…,m} π_i(s) = 1 if s ∈ A_1, and 1 − λ_{j−1} if j = min{i : s ∈ A_i} > 1 .

The information modeled by π can also be viewed as a nested random set {(A_i, ν_i) : i = 1, …, m}, where ν_i = λ_i − λ_{i−1}. This framework allows for imprecision (reflected by the size of the A_i's) and uncertainty (the ν_i's). And ν_i is the probability that the agent only knows that A_i contains the actual state (it is not P(A_i)). The random set view of possibility theory is well adapted to the idea of imprecise statistical data, as developed in [127, 128]. Namely, given a bunch of imprecise (not necessarily nested) observations (called focal sets), π supplies an approximate representation of the data, as π(s) = Σ_{i : s ∈ A_i} ν_i.
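A small numeric sketch (the intervals and confidence levels are invented) shows how nested certainty-qualified sets induce π by the min-combination described above.

```python
# Sketch (invented numbers): possibility distribution induced by nested
# confidence intervals A_1 ⊂ A_2 ⊂ A_3 with increasing confidence levels.

A = [(-1.0, 1.0), (-2.0, 2.0), (-3.0, 3.0)]  # nested intervals A_i
lam = [0.5, 0.8, 1.0]                        # lambda_i: lower bounds on P(A_i)

def pi(s):
    """pi(s) = min_i max(A_i-membership, 1 - lambda_i)."""
    return min(1.0 if a <= s <= b else 1.0 - l for (a, b), l in zip(A, lam))

for s in [0.0, 1.5, 2.5, 4.0]:
    print(s, pi(s))  # 1.0, 0.5 (= 1 - lambda_1), 0.2 (= 1 - lambda_2), 0.0
```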

In the continuous case, a fuzzy interval M can be viewed as a nested set of α-cuts, which are intervals M_α = {x : μ_M(x) ≥ α}, α > 0. Note that the degree of necessity of M_α is N(M_α) = 1 − α, and the corresponding probability set is P(μ_M) = {P : P(M_α) ≥ 1 − α, ∀α > 0}. Representing uncertainty by the family of pairs {(M_α, 1 − α) : α > 0} is very similar to the basic approach of info-gap theory [129].

The set P(π) contains many probability distributions, arguably too many. Neumaier [130] has recently proposed a related framework, in a different terminology, for representing smaller subsets of probability measures using two possibility distributions instead of one. He basically uses a pair (δ, π) of distributions (in the sense of Sect. 3.2), which he calls a cloud, where δ is a guaranteed possibility distribution (in our terminology) such that π ≥ δ. A cloud models the (generally nonempty) set P(π) ∩ P(1 − δ), viewing 1 − δ as a standard possibility distribution. The precise connections between possibility distributions, clouds, and other simple representations of numerical uncertainty are studied in [131].

4.2 Conditioning

There are two kinds of conditioning that can be envisaged upon the arrival of new information E. The first method presupposes that the new information alters the possibility distribution π by declaring all states outside E impossible. The conditional measure π(· | E) is such that Π(B | E) · Π(E) = Π(B ∩ E). This is formally Dempster's rule of conditioning of belief functions, specialized to possibility measures. The conditional possibility distribution representing the weighted set of confidence intervals is

π(s | E) = π(s) / Π(E) if s ∈ E, and 0 otherwise .

De Baets et al. [28] provide a mathematical justification of this notion in an infinite setting, as opposed to the min-based conditioning of qualitative possibility theory. Indeed, the maxitivity axiom extended to the infinite setting is not preserved by min-based conditioning. The product-based conditioning leads to a notion of independence of the form Π(B ∩ E) = Π(B) · Π(E), whose properties are very similar to those of probabilistic independence [30].

Another form of conditioning [132, 133], more in line with the Bayesian tradition, considers that the possibility distribution π encodes imprecise statistical information, and event E only reflects a feature of the current situation, not of the state in general. Then the value Π(B ‖ E) = sup{P(B | E) : P(E) > 0, P ∈ P(π)} is the result of performing a sensitivity analysis of the usual conditional probability over P(π) [134]. Interestingly, the resulting set function is again a possibility measure, with distribution

π(s ‖ E) = max(π(s), π(s) / (π(s) + N(E))) if s ∈ E, and 0 otherwise .

It is generally less specific than π on E, as is clear from the above expression, and becomes noninformative when N(E) = 0 (i.e., if there is no information about E). This is because π(· ‖ E) is obtained by focusing the generic information π on the reference class E. On the contrary, π(· | E) operates a revision process on π due to additional knowledge asserting that states outside E are impossible. See De Cooman [133] for a detailed study of this form of conditioning.
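The contrast between the two rules can be illustrated with a small sketch on invented numbers; note how focusing becomes vacuous on E when N(E) = 0, as stated above.

```python
# Sketch (invented numbers) contrasting revision and focusing.

pi = {'s1': 1.0, 's2': 0.6, 's3': 0.3, 's4': 0.2}
E = {'s2', 's3'}

def poss(A):
    return max((pi[s] for s in A), default=0.0)

def nec(A):
    return 1.0 - poss(set(pi) - set(A))

def revise(s):
    """Dempster-style: pi(s | E) = pi(s) / Pi(E) on E, 0 outside."""
    return pi[s] / poss(E) if s in E else 0.0

def focus(s):
    """Bayesian-style: pi(s || E) = max(pi(s), pi(s) / (pi(s) + N(E))) on E."""
    return max(pi[s], pi[s] / (pi[s] + nec(E))) if s in E else 0.0

for s in sorted(pi):
    print(s, round(revise(s), 2), round(focus(s), 2))
# Here N(E) = 0, so focusing is vacuous on E (degree 1 for s2 and s3),
# while revision renormalizes: pi(s2|E) = 1.0, pi(s3|E) = 0.5.
```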

4.3 Probability–Possibility Transformations

The problem of transforming a possibility distribution into a probability distribution and conversely is meaningful in the scope of uncertainty combination with heterogeneous sources (some supplying statistical data, others linguistic data, for instance). It is useful to cast all pieces of information in the same framework. The basic requirement is to respect the consistency principle P ≤ Π. The problem is then either to pick a probability measure in P(π), or to construct a possibility measure dominating P.

There are two basic approaches to possibility/probability transformations, which both respect a form of probability–possibility consistency. One, due to Klir [135, 136], is based on a principle of information invariance; the other [137] is based on optimizing information content. Klir assumes that possibilistic and probabilistic information measures are commensurate. Namely, the choice between possibility and probability is then a mere matter of translation between languages neither of which is weaker or stronger than the other (quoting Klir and Parviz [138]). It suggests that entropy and imprecision capture the same facet of uncertainty, albeit in different guises. The other approach, recalled here, considers that going from possibility to probability increases the precision of the considered representation (as we go from a family of nested sets to a random element), while going the other way around means a loss of specificity.

4.3.1 From Possibility to Probability

The most basic example of transformation from possibility to probability is the Laplace principle of insufficient reason, claiming that what is equally possible should be considered as equally probable. A generalized Laplacean indifference principle is then adopted in the general case of a possibility distribution π: the weights ν_i bearing on the sets A_i from the nested family of level cuts of π are uniformly distributed on the elements of these cuts A_i. Let P_i be the uniform probability measure on A_i. The resulting probability measure is P = Σ_{i=1,…,m} ν_i P_i. This transformation, already proposed in 1982 [139], comes down to selecting the center of gravity of the set P(π) of probability distributions dominated by π. It also coincides with Smets' pignistic transformation [140] and with the Shapley value of the unanimity game (another name for the necessity measure) in game theory. The rationale behind this transformation is to minimize arbitrariness by preserving the symmetry properties of the representation. This transformation from possibility to probability is one-to-one. Note that its definition does not use the nestedness property of the cuts of the possibility distribution. It applies all the same to nonnested random sets (or belief functions) defined by pairs {(A_i, ν_i) : i = 1, …, m}, where the ν_i are nonnegative reals such that Σ_{i=1,…,m} ν_i = 1.
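A sketch of this transformation on an invented possibility distribution (spreading each mass ν_i uniformly over the corresponding cut) could look as follows.

```python
# Sketch (invented distribution): possibility-to-probability transformation
# spreading each mass nu_i = alpha_i - alpha_{i+1} uniformly over the cut
# at level alpha_i.

from collections import defaultdict

pi = {'s1': 1.0, 's2': 0.7, 's3': 0.7, 's4': 0.2}

def to_probability(pi):
    levels = sorted(set(pi.values()), reverse=True) + [0.0]
    p = defaultdict(float)
    for top, below in zip(levels, levels[1:]):
        cut = [s for s in pi if pi[s] >= top]  # the cut A_i at level `top`
        for s in cut:
            p[s] += (top - below) / len(cut)   # uniform spread of nu_i
    return dict(p)

print(to_probability(pi))  # sums to 1; equipossible states get equal shares
```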

4.3.2 From Objective Probability to Possibility

From probability to possibility, the rationale of the transformation is not the same according to whether the probability distribution we start with is subjective or objective [106]. In the case of a statistically induced probability distribution, the rationale is to preserve as much information as possible. This is in line with the handling of Δ-qualified pieces of information representing observed evidence, considered in Sect. 3.2; hence we select, as the result of the transformation of a probability measure P, the most specific possibility measure in the set of those dominating P [137]. This most specific element is generally unique if P induces a linear ordering on S. Suppose S is a finite set. The idea is to let Π(A) = P(A) for those sets A having minimal probability among the sets having the same cardinality as A. If p_1 > p_2 > … > p_n, then Π(A) = P(A) for sets A of the form {s_i, …, s_n}, and the possibility distribution is defined as π_P(s_i) = Σ_{j=i,…,n} p_j, with p_j = P({s_j}). Note that π_P is a kind of cumulative distribution of P, already known as a Lorenz curve in the mathematical literature [141]. If there are equiprobable elements, the unicity of the transformation is preserved if equipossibility of the corresponding elements is enforced. In this case it is a bijective transformation as well. Recently, this transformation was used to prove a rather surprising agreement between probabilistic indeterminateness, as measured by Shannon entropy, and possibilistic nonspecificity. Namely, it is possible to compare probability measures on finite sets in terms of their relative peakedness (a concept adapted from Birnbaum [142]) by comparing the relative specificity of their possibilistic transforms. Let P and Q be two probability measures on S, and π_P, π_Q the possibility distributions induced by our transformation. It can be proved that if π_P ≥ π_Q (i.e., P is less peaked than Q), then the Shannon entropy of P is higher than that of Q [143]. This result gives some grounds to the intuitions developed by Klir [135], without assuming any commensurability between entropy and specificity indices.
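On a finite set with strictly ordered probabilities, the transformation reduces to a one-line computation; the probability values below are invented.

```python
# Sketch (invented probabilities): the optimal transform into the most
# specific dominating possibility distribution.

def to_possibility(p):
    """pi_P(s) = sum of the probabilities not exceeding p(s)."""
    return {s: sum(q for q in p.values() if q <= p[s]) for s in p}

p = {'s1': 0.5, 's2': 0.3, 's3': 0.2}
print(to_possibility(p))  # {'s1': 1.0, 's2': 0.5, 's3': 0.2}
```

With ties, the comparison q <= p[s] assigns the same (dominating) degree to equiprobable states, which enforces the equipossibility convention mentioned above.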

4.3.3 Possibility Distributions Induced by Prediction Intervals

In the continuous case, moving from objective probability to possibility means adopting a representation of uncertainty in terms of prediction intervals around the mode, viewed as the most frequent value. Extracting a prediction interval from a probability distribution, or devising a probabilistic inequality, can be viewed as moving from a probabilistic to a possibilistic representation. Namely, suppose a nonatomic probability measure P on the real line, with unimodal density φ, and suppose one wishes to represent it by an interval I with a prescribed level of confidence P(I) = γ of hitting it. The most natural choice is the most precise interval ensuring this level of confidence. It can be proved that this interval is a cut of the density, i.e., I_γ = {s : φ(s) ≥ θ} for some threshold θ. Moving the degree of confidence from 0 to 1 yields a nested family of prediction intervals that form a possibility distribution π consistent with P, the most specific one actually, having the same support and the same mode as P and defined by [137]

π(inf I_γ) = π(sup I_γ) = 1 − γ = 1 − P(I_γ) .

This kind of transformation again yields a kind of cumulative distribution, according to the ordering induced by the density φ. Similar constructs can be found in the statistical literature (Birnbaum [142]). More recently, Mauris et al. [144] noticed that starting from any family of nested sets around some characteristic point (the mean, the median, …), the above equation yields a possibility measure dominating P. Well-known inequalities of probability theory, such as those of Chebyshev and Camp–Meidel, can also be viewed as possibilistic approximations of probability functions. It turns out that for symmetric unimodal densities, each side of the optimal possibilistic transform is a convex function. Given such a probability density on a bounded interval [a, b], the triangular fuzzy number whose core is the mode of φ and whose support is [a, b] is thus a possibility distribution dominating P regardless of its shape (and the tightest such distribution). These results justify the use of symmetric triangular fuzzy numbers as fuzzy counterparts to uniform probability distributions. They provide much tighter probability bounds than the Chebyshev and Camp–Meidel inequalities for symmetric densities with bounded support. This setting is adapted to the modeling of sensor measurements [145]. These results are extended to more general distributions by Baudrit et al. [146], and provide a tool for representing poor probabilistic information. More recently, Mauris [147] unifies, by means of possibility theory, many old techniques independently developed in statistics for one-point estimation, relying on the idea of dispersion of an empirical distribution. The efficiency of different estimators can be compared by means of fuzzy set inclusion applied to optimal possibility transforms of probability distributions. This unified approach does not presuppose a finite variance.
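A discretized version of this construction can be sketched as follows for a standard Gaussian density (an assumed example): π(s) is the probability of the region where the density does not exceed φ(s), i.e., the two-sided tail probability.

```python
# Numeric sketch (assumed standard Gaussian density, discretized on a grid):
# pi(s) = P({x : phi(x) <= phi(s)}), the confidence complement of the optimal
# prediction interval through s.

import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

xs = [i * 0.01 for i in range(-500, 501)]  # grid on [-5, 5]
w = [phi(x) * 0.01 for x in xs]            # approximate cell probabilities

def pi(s):
    level = phi(s)
    return sum(wi for x, wi in zip(xs, w) if phi(x) <= level)

for s in [0.0, 1.0, 2.0]:
    print(s, round(pi(s), 3))  # about 1.0, 0.32, 0.046: two-sided tail masses
```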

4.3.4 Subjective Possibility Distributions

The case of a subjective probability distribution is different. Indeed, the probability function is then supplied by an agent who is in some sense forced to express beliefs in this form due to rationality constraints and the setting of exchangeable bets. However, his actual knowledge may be far from justifying the use of a single well-defined probability distribution. For instance, in case of total ignorance about some value, apart from its belonging to an interval, the framework of exchangeable bets enforces a uniform probability distribution, on behalf of the principle of insufficient reason. Based on the setting of exchangeable bets, it is possible to define a subjectivist view of numerical possibility theory that differs from the proposal of Walley [134]. The approach developed by Dubois et al. [148] relies on the assumption that when an agent constructs a probability measure by assigning prices to lotteries, this probability measure is actually induced by a belief function representing the agent's actual state of knowledge. We assume that going from an underlying belief function to an elicited probability measure is achieved by means of the above-mentioned pignistic transformation, changing focal sets into uniform probability distributions. The task is to reconstruct this underlying belief function under a minimal commitment assumption. In [148], we pose and solve the problem of finding the least informative belief function having a given pignistic probability. We prove that it is unique and consonant, thus induced by a possibility distribution. The obtained possibility distribution can be defined as the converse of the pignistic transformation (which is one-to-one for possibility distributions). It is subjective in the same sense as in the subjectivist school in probability theory. However, it is the least biased representation of the agent's state of knowledge compatible with the observed betting behavior. In particular, it is less specific than the one constructed from the prediction intervals of an objective probability. This transformation was first proposed in [149] for objective probability, interpreting the empirical necessity of an event as summing the excess of probabilities of realizations of this event with respect to the probability of the most likely realization of the opposite event.

4.3.5 Possibility Theory and Defuzzification

Possibilistic mean values can be defined using Choquet integrals with respect to possibility and necessity measures [133, 150], and come close to defuzzification methods [151]. Interpreting a fuzzy interval M, associated with a possibility distribution μ_M, as a family of probabilities, upper and lower mean values E*(M) and E_*(M) can be defined as [152]

E_*(M) = ∫₀¹ inf M_α dα ;  E*(M) = ∫₀¹ sup M_α dα ,

where M α is the α-cut of M.

Then the mean interval E(M) = [E_*(M), E*(M)] of M is the interval containing the mean values of all random variables consistent with M, that is, E(M) = {E(P) : P ∈ P(μ_M)}, where E(P) represents the expected value associated with the probability measure P. That the mean value of a fuzzy interval is an interval seems to be intuitively satisfactory. In particular, the mean interval of a (regular) interval [a, b] is this interval itself. The upper and lower mean values are linear with respect to the addition of fuzzy numbers. Define the addition M + N as the fuzzy interval whose cuts are M_α + N_α = {s + t : s ∈ M_α, t ∈ N_α}, defined according to the rules of interval analysis. Then E(M + N) = E(M) + E(N), and similarly for the scalar multiplication E(aM) = aE(M), where aM has membership grades of the form μ_M(s/a) for a ≠ 0. In view of this property, it seems that the most natural defuzzification method is the middle point Ê(M) of the mean interval (originally proposed by Yager [153]). Other defuzzification techniques do not generally possess this kind of linearity property. Ê(M) has a natural interpretation in terms of simulation of a fuzzy variable [154], and is the mean value of the pignistic transformation of M. Indeed, it is the mean value of the empirical probability distribution obtained by the random process defined by picking an element α in the unit interval at random, and then an element s in the cut M_α at random.
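For a triangular fuzzy interval (an assumed example), the mean interval and its midpoint are easily obtained by numerical integration over α.

```python
# Sketch: mean interval of a triangular fuzzy interval with support [a, b]
# and core {c} (invented numbers), by numerical integration over alpha-cuts
# [a + alpha(c - a), b - alpha(b - c)].

def mean_interval(a, c, b, steps=1000):
    alphas = [(k + 0.5) / steps for k in range(steps)]
    lo = sum(a + al * (c - a) for al in alphas) / steps  # E_*(M)
    hi = sum(b - al * (b - c) for al in alphas) / steps  # E^*(M)
    return lo, hi

lo, hi = mean_interval(0.0, 2.0, 6.0)
print(lo, hi, (lo + hi) / 2)  # 1.0, 4.0, and 2.5 as the defuzzified midpoint
```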

5 Some Applications

Possibility theory has not been the main framework for engineering applications of fuzzy sets in the past. However, on the basis of its connections to symbolic artificial intelligence, to decision theory and to imprecise statistics, we consider that it has significant potential for further applied developments in a number of areas, including some where fuzzy sets are not yet always accepted. Only some directions are pointed out here.

5.1 Uncertain Database Querying and Preference Queries

The evaluation of a flexible query in the face of incomplete or fuzzy information amounts to computing the possibility and the necessity of the fuzzy event expressing the gradual satisfaction of the query [155]. This evaluation, known as fuzzy pattern matching [156, 157], measures the extent to which the fuzzy sets representing the query overlap with, or include, the possibility distributions representing the available information. Such an evaluation procedure has been extended to symbolic labels that are no longer represented by possibility distributions, but which belong to possibilistic ontologies where approximate similarity and subsumption between labels are estimated in terms of possibility and necessity degrees, respectively [158]. These approaches presuppose a total lack of dependencies between ill-known attributes. A more general approach based on possible world semantics has been envisaged [159]. However, as for the probabilistic counterpart of this latter view, evaluating queries has a high computational cost [160]. This is why it has been proposed to use only certainty-qualified values (or disjunctions of values), as in possibilistic logic, rather than general possibility distributions, for representing attribute values pervaded with uncertainty. It has been shown that this leads to a tractable extension of relational algebra operations [161, 162].
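The core computation of fuzzy pattern matching is a one-liner for each degree; the query and the possibility distribution below are invented.

```python
# Sketch (invented query and data): possibility and necessity of the fuzzy
# event "the item matches the query", given an ill-known attribute value.

domain = ['cheap', 'moderate', 'expensive']
query = {'cheap': 1.0, 'moderate': 0.5, 'expensive': 0.0}  # fuzzy set F
info = {'cheap': 0.3, 'moderate': 1.0, 'expensive': 0.4}   # possibility pi

possibility = max(min(query[v], info[v]) for v in domain)      # Pi(F): overlap
necessity = min(max(query[v], 1.0 - info[v]) for v in domain)  # N(F): inclusion
print(possibility, necessity)  # 0.5 0.5 on these numbers
```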

Besides, possibility theory is not only useful for representing qualitative uncertainty; it may also be of interest for representing preferences, and as such may be applied to the handling of preference queries [163]. Thus, requirements of the form A and preferably B (i.e., it is more satisfactory to have A and B than A alone), or A or at least B, can be expressed using appropriate priority orderings, as in possibilistic logic [164]. Lastly, in bipolar queries [165, 166, 167], flexible constraints that are more or less compulsory are distinguished from additional wishes that are optional, as for instance in the request find the apartments that are cheap and maybe near the train station. Indeed, negative preferences express what is (more or less, or completely) impossible or undesirable, and by complementation state flexible constraints restricting the possible or acceptable values. Positive preferences are not compulsory, but rather express wishes; they state what attribute values would be really satisfactory.

5.2 Description Logics

Description logics (initially named terminological logics) are tractable fragments of first-order logic representation languages that handle notions of concepts, roles, and instances, referring at the semantic level to the respective notions of set, binary relation, membership, and cardinality. They are useful for describing ontologies that consist of hierarchies of concepts in a particular domain, for the semantic web. Two ideas that come, respectively, from fuzzy sets and possibility theory, and that may be combined, can be used for extending the expressive power of description logics. On the one hand, vague concepts can be approximated in practice by pairs of nested sets corresponding to the cores and the supports of fuzzy sets, thus sorting out the typical elements, in a way that agrees with fuzzy set operations and inclusions. On the other hand, a possibilistic treatment of uncertainty and exceptions can be performed on top of a description logic in a possibilistic logic style [168]. In both cases, the underlying principle is to remain as close as possible to classical logic for preserving computational efficiency as much as possible. Thus, formal expressions such as (P ⊑_X^α Q, β) intend to mean that it is certain at least at level β that the degree of subsumption of concept P in Q is at least α, in the sense of some X-implication (e.g., Gödel, or Kleene–Dienes implication). In particular, it can be expressed that typical Ps are Qs, or that typical Ps are typical Qs, or that an instance is typical of a concept. Such ideas have been developed by Qi et al. [169] toward implemented systems in connection with web research.

5.3 Information Fusion

Possibility theory offers a simple, flexible framework for information fusion that can handle incompleteness and conflict. For instance, intervals or fuzzy intervals coming from several sources can be merged. The basic fusion modes are the conjunctive and the disjunctive mode, presupposing, respectively, that all sources of information are reliable and that at least one is [170, 171]. In the conjunctive mode, the use of the minimum operation avoids assuming that sources are independent. If they are, the product rule can be applied, whereby low plausibility degrees reinforce toward impossibility. Quite often, the result of a conjunctive aggregation is subnormalized, which indicates a conflict. Then it is common to apply a renormalization step, which makes this mode of combination brittle in case of strong conflict; besides, the more numerous the sources, the more conflicting they become. A weighted average of possibility degrees can be used, but it does not preserve the properties of possibility measures. The use of the disjunctive mode is more cautious: it avoids the conflict at the expense of losing information. When many sources are involved, the result becomes totally uninformative.
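The basic modes are easily sketched on two invented possibility distributions; the height of the conjunctive result measures the consistency of the sources.

```python
# Sketch (invented distributions): conjunctive and disjunctive fusion.

pi1 = {'s1': 1.0, 's2': 0.4, 's3': 0.0}
pi2 = {'s1': 0.5, 's2': 1.0, 's3': 0.2}

conj = {s: min(pi1[s], pi2[s]) for s in pi1}  # all sources assumed reliable
disj = {s: max(pi1[s], pi2[s]) for s in pi1}  # at least one assumed reliable

height = max(conj.values())                   # < 1 signals a conflict
renorm = {s: v / height for s, v in conj.items()} if height > 0 else None

print(conj, height)  # height 0.5: the sources partially conflict
print(renorm)        # renormalized result, brittle when height is near 0
print(disj)          # cautious, but less informative
```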

To cope with this problem, some ad hoc adaptive combination rules have been proposed that focus on maximal subsets of sources that are either fully consistent or not completely inconsistent [170]. This scheme has been further improved by Oussalah et al. [172]. Oussalah [173] has proposed a number of postulates that a possibilistic fusion rule should satisfy. Another approach is to merge the sets of cuts of the possibility distributions based on the maximal consistent subsets of sources (consistent subsets of cuts are merged using conjunction, and the results are merged disjunctively). The result is then a belief function [174]. Another option is to make a guess on the number of reliable sources and merge information inside consistent subsets of sources having this cardinality.

Possibilistic information fusion can be performed syntactically on more compact representations such as possibilistic logic bases [175] (the merging of possibilistic networks [176] has also been recently considered). The latter type of fusion may be of interest both from a computational and from a representational point of view. Still, it is important to make sure that the syntactic operations are counterparts of semantic ones: fusion should be performed equivalently at the semantic and at the syntactic levels. For instance, the conjunctive merging of two possibility distributions corresponds to the mere union of the possibilistic bases that represent them. More details for other operations can be found in [175, 177], and in the bipolar case in [99]. This line of research is pursued by Qi et al. [178], who also proposed an approach to measuring conflict between possibilistic knowledge bases [179].

The distance-based approach [180] that applies to the fusion of classical logic bases can be embedded in the possibilistic fusion setting as well [177]. The distance between an interpretation s and a classical base K is usually defined as d(s, K) = min{H(s, s*) : s* ⊨ K}, where H(s, s*) is the Hamming distance that counts the number of literals with different signs in s and s*. It is then easy to encode the distance d(s, K) into a possibilistic knowledge base (interpreting possibility as Hamming-distance-based similarity to the models of K, i.e., π(s) = a^{d(s,K)}, a ∈ (0, 1)). The result of the possibilistic fusion is a possibilistic knowledge base, the highest-weight layer of which is the classical database that is searched for, provided that the distance merging operation is suitably translated into a possibilistic merging operation.
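A minimal sketch of this encoding (with an invented base K over three Boolean variables and a = 0.5) is given below.

```python
# Sketch: pi(s) = a^d(s, K) with an invented base K over three Boolean
# variables and a = 0.5.

from itertools import product

K_models = {(1, 1, 0), (1, 0, 0)}  # the models of K

def d(s):
    """Hamming distance of interpretation s to the closest model of K."""
    return min(sum(x != y for x, y in zip(s, m)) for m in K_models)

a = 0.5
pi = {s: a ** d(s) for s in product((0, 1), repeat=3)}
print(pi[(1, 1, 0)], pi[(0, 1, 0)], pi[(0, 0, 1)])  # 1.0 0.5 0.25
```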

A similar problem exists in belief revision, where an epistemic state, represented either by a possibility distribution or by a possibilistic logic base, is revised by an input information p [181]. Revision can be viewed as prioritized fusion, using for instance conditioning, or other operations, depending on whether one wants to enforce N(p) = 1 or only N(p) > 0 in the revised epistemic state, or whether one is dealing with an uncertain input (p, α). Then, the uncertain input may be understood as enforcing N(p) ≥ α in any case, or as taking it into account only if it is sufficiently certain w.r.t. the current epistemic state.

5.4 Temporal Reasoning and Scheduling

Temporal reasoning may refer to time intervals or to time points. When handling time intervals, the basic building block is the one provided by Allen relations between time intervals. There are 13 relations that describe the possible relative locations of two intervals. For instance, given the two intervals A = [a, a′] and B = [b, b′], A is before (resp. after) B means a′ < b (resp. b′ < a); A meets (resp. is met by) B means a′ = b (resp. b′ = a); A overlaps (resp. is overlapped by) B iff b > a and a′ > b and b′ > a′ (resp. a > b and b′ > a and a′ > b′); a crisp check of some of these relations is sketched in the code after the list below. The introduction of fuzzy features in temporal reasoning can be related to two different issues:

  • First, it can be motivated by the need for a gradual, linguistic-like description of temporal relations even in the face of complete information. An extension of Allen relational calculus has then been proposed, based on fuzzy comparators expressing linguistic tolerance, which are used in place of the exact relations >, =, and <. Fuzzy Allen relations are thus defined from three fuzzy relations between dates that can be, for instance, approximately equal, clearly greater, and clearly smaller, where, e.g., the extent to which x is approximately equal to y is the degree of membership of x − y to some fuzzy set expressing something like small [182, 183].

  • Second, the possibilistic handling of fuzzy or incomplete information leads to pervade classical Allen relations, and more generally fuzzy Allen relations, with uncertainty. Then patterns for propagating uncertainty and composing the different (fuzzy) Allen relations in a possibilistic way have been laid bare [184, 185].
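For reference, the crisp comparisons underlying some of the Allen relations quoted above can be coded directly; the fuzzy extension replaces <, =, and > by the fuzzy comparators just mentioned. The intervals in the usage lines are invented.

```python
# Sketch: crisp tests for some Allen relations between A = [a, a2] and
# B = [b, b2] (a2, b2 standing for the primed endpoints above).

def allen(A, B):
    (a, a2), (b, b2) = A, B
    if a2 < b:
        return 'before'
    if b2 < a:
        return 'after'
    if a2 == b:
        return 'meets'
    if b2 == a:
        return 'is met by'
    if a < b < a2 < b2:
        return 'overlaps'
    if b < a < b2 < a2:
        return 'is overlapped by'
    return 'other (equals, during, starts, finishes, ...)'

print(allen((0, 2), (3, 5)))  # before
print(allen((0, 3), (3, 5)))  # meets
print(allen((0, 4), (3, 5)))  # overlaps
```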

Besides, the handling of temporal reasoning in terms of relations between time points can also be extended to the case of uncertain information [186]. Uncertain relations between temporal points are represented by means of possibility distributions over the three basic relations >, =, and <. Operations for computing inverse relations, for composing relations, for combining relations coming from different sources and pertaining to the same temporal points, or for handling negation, have been defined. This shows that possibilistic temporal uncertainty can be handled in the setting of point algebra. The possibilistic approach compares favorably with a previously proposed probabilistic approach (first, the possibilistic approach can be purely qualitative, thus avoiding the need to quantify uncertainty when information is poor; second, it is capable of modeling ignorance in an unbiased way). Possibilistic logic has also been extended to a timed version where the time intervals during which a proposition is more or less certainly true are attached to classical propositional formulas [187].

Applications of possibility theory-based decision-making can be found in scheduling. One issue is to handle fuzzy due dates of jobs using the calculus of fuzzy constraints [188]. Another issue is to handle uncertainty in task durations in basic scheduling problems such as program evaluation and review technique (PERT) networks. A large literature exists on this topic [189, 190], where the role of fuzzy sets is not always very clear. Convincing solutions to this problem start with the works of Chanas and Zielinski [191, 192], where the problem is posed in terms of projecting a joint possibility distribution on quantities of interest (earliest finishing times, or slack times) and where tasks can be possibly or certainly critical. A full solution applying Boolean possibility theory to interval uncertainty of task durations is described in [193], and its fuzzy extension in [194]. Other scheduling problems are solved in the same possibilistic framework by Kasperski and colleagues [195, 196], as well as more general optimization problems [197, 198].

5.5 Risk Analysis

The aim of risk analysis studies is to perform uncertainty propagation under poor data and without independence assumptions (see the papers in the special issue [199]). Finding the potential of possibilistic representations in computing conservative bounds for such probabilistic calculations is certainly a major challenge [200]. An important research direction is the comparison between fuzzy interval analysis [33] and random variable calculations with a view to unifying them [201]. Methods for joint propagation of possibilistic and probabilistic information have been devised [202], based on casting both in a random set setting [203]; the case of probabilistic models with fuzzy interval parameters has also been dealt with [204]. The active area of fuzzy random variables is also connected to this question [205].

5.6 Machine Learning

Applications of possibility theory to learning have started to be investigated rather recently, in different directions. For instance, taking advantage of the proximity between reinforcement learning and partially observed Markov decision processes, a possibilistic counterpart of reinforcement learning has been proposed after developing the possibilistic version of the latter [206]. Besides, by looking for big-stepped probability distributions, defined by discrete exponential distributions, one can mine databases for discovering default rules [207]. Big-stepped probabilities mimic possibility measures in the sense that P(A) > P(B) if and only if max_{s ∈ A} p(s) > max_{s ∈ B} p(s) (for disjoint events A and B, as when comparing a default rule with its exception). The version space approach to learning presents interesting similarities with the binary bipolar possibilistic representation setting, thinking of examples as positive information and of counterexamples as negative information [208]. The general bipolar setting, where intermediary degrees of possibility are allowed, provides a basis for extending the version space approach in a graded way, where examples and counterexamples can be weighted according to their importance. The graded version space approach agrees with the possibilistic extension of inductive logic programming [209]. Indeed, the background knowledge may be associated with certainty levels, the examples may be more or less important to cover, and the set of rules that is learnt may be stratified in order to have a better management of exceptions in multiple-class classification problems, in agreement with the possibilistic approach to nonmonotonic reasoning.
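This equivalence can be checked exhaustively on a small invented big-stepped distribution, restricting the comparison to disjoint events as discussed above.

```python
# Sketch (invented numbers): each p_i exceeds the sum of all smaller ones,
# and the probability and possibility orderings of disjoint events agree.

from itertools import combinations

p = [0.6, 0.25, 0.1, 0.05]

def subsets(n):
    for r in range(n + 1):
        yield from combinations(range(n), r)

ok = all(
    (sum(p[i] for i in A) > sum(p[i] for i in B))
    == (max((p[i] for i in A), default=0.0) > max((p[i] for i in B), default=0.0))
    for A in subsets(4) for B in subsets(4) if not set(A) & set(B)
)
print(ok)  # True
```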

Other applications of possibility theory can be found in fields such as data analysis [210, 211, 79], diagnosis [212, 213], belief revision [181], argumentation [214, 215, 68], etc.

6 Some Current Research Lines

A number of ongoing works deal with new research lines where possibility theory is central. In the following, we outline a few of those:

  • Formal concept analysis: Formal concept analysis (FCA) studies Boolean data tables relating objects and attributes. The key issue of FCA is to extract so-called concepts from such tables. A concept is a maximal set of objects sharing a maximal number of attributes. The enumeration of such concepts can be carried out via a Galois connection between objects and attributes, and this Galois connection uses operators similar to the Δ function of possibility theory. Based on this analogy, other correspondences can be laid bare using the three other set functions of possibility theory [216, 217]. In particular, one of these correspondences detects independent subtables [22]. This approach can be systematized to fuzzy or uncertain versions of formal concept analysis.

  • Generalized possibilistic logic: Possibilistic logic, in its basic version, attaches degrees of necessity to formulas, which turns them into graded modal formulas of the necessity kind. However, only conjunctions of weighted formulas are allowed. Yet, very early on, we noticed that it makes sense to extend the language toward handling constraints on the degree of possibility of a formula. This requires allowing for negations and disjunctions of necessity-qualified propositions. This extension, still under study [218], puts together the KD modal logic and basic possibilistic logic. Recently, it has been shown that nonmonotonic logic programming languages can be translated into generalized possibilistic logic, making the meaning of negation by default in rules much more transparent [219]. This move from basic to generalized possibilistic logic also enables further extensions to the multiagent and the multisource case [220] to be considered. Besides, it has been recently shown that a Sugeno integral can also be represented in terms of possibilistic logic, which enables us to lay bare the logical description of an aggregation process [221].

  • Qualitative capacities and possibility measures: While a numerical possibility measure is equivalent to a convex set of probability measures, it turns out that, in the qualitative setting, a monotone set function can be represented by means of a family of possibility measures [222, 223]. This line of research enables qualitative counterparts of results on Choquet capacities in the numerical setting to be established. In particular, a monotone set function can be seen as the counterpart of a belief function, and various concepts of evidence theory can be adapted to this setting [224]. Sugeno integral can be viewed as a lower possibilistic expectation in the sense of Sect. 3.3.9 [223]. These results enable the structure of qualitative monotonic set functions to be laid bare, with possible connections to the neighborhood semantics of nonregular modal logics [225].

  • Regression and kriging: Fuzzy regression analysis is seldom envisaged from the point of view of possibility theory. One exception is the possibilistic regression initiated by Tanaka and Guo [211], where the idea is to approximate precise or set-valued data, in the sense of inclusion, by a set-valued or fuzzy set-valued function obtained by making the coefficients of a linear function fuzzy. The alternative approach is the fuzzy least squares of Diamond [226], where fuzzy data are interpreted as functions and a crisp distance between fuzzy sets is often used. However, in this approach, fuzzy data are questionably seen as objective entities [227]. The introduction of possibility theory in the regression analysis of fuzzy data comes down to an epistemic view of fuzzy data, whereby one tries to construct the envelope of all linear regression results that could have been obtained, had the data been precise [228]. This view has been applied to the kriging problem in geostatistics [229]. Another use of possibility theory consists in exploiting possibility–probability transforms to develop a form of quantile regression on crisp data [230], yielding a fuzzy function that is much more faithful to the data set than what a fuzzified linear function can offer.