
1 Introduction

When information about uncertainty cannot be quantified in a probabilistic way, possibilistic decision theory is a natural field to consider [2,3,4,5,6,7]. Qualitative decision theory is relevant, among other fields, for applications to planning under uncertainty, where a suitable strategy (i.e. a set of conditional or unconditional decisions) is to be found, starting from a qualitative description of the initial world, of the available decisions, of their (perhaps uncertain) effects and of the goal to reach (see [8,9,10]). But up to this point, the evaluation of the strategies was considered in a simple, mono-criterion context, while it is often the case that several criteria are involved in the decision [11].

A theoretical framework for multi-criteria/multi-agent (non-sequential) decision making under possibilistic uncertainty has been proposed in [1, 12]. In the present paper, we extend it to decision trees and propose a detailed algorithmic study. After a refresher on the background (Sect. 2), Sect. 3 presents our algorithms, and is completed, in Sect. 4, by an experimental evaluation.

2 Background

2.1 Multi-criteria Decision Making (MCDM) Under Uncertainty

Following Dubois and Prade’s possibilistic approach to decision making under qualitative uncertainty, a non-sequential (i.e. one-stage) decision can be seen as a possibility distribution over a finite set of outcomes, called a (simple) possibilistic lottery [2]. Such a lottery is denoted \(L=\langle \lambda _{1}/x_{1}, \dots , \lambda _{n}/x_{n}\rangle \) where \(\lambda _{i} = \pi _L(x_i) \) is the possibility that decision L leads to outcome \(x_i\); this possibility degree can also be denoted by \(L[x_i] \). In this framework, a decision problem is thus fully specified by a set of possibilistic lotteries on X and a utility function \(u:X \mapsto [0,1]\). Under the assumption that the utility scale and the possibility scale are commensurate and purely ordinal, [2] proposes to evaluate each lottery by a qualitative, optimistic or pessimistic, global utility:

$$\begin{aligned} {\text {Optimistic utility: }} U^+(L) = \underset{x_i \in X}{\max }\min (\lambda _i, u(x_i)) \end{aligned}$$
(1)
$$\begin{aligned} {\text {Pessimistic utility: }} U^-(L) = \underset{x_i \in X}{\min }\max (1 - \lambda _i, u(x_i)) \end{aligned}$$
(2)

\(U^+(L)\) is a mild version of the maximax criterion: L is good as soon as it is totally plausible that it gives a good consequence. On the contrary, the pessimistic index \(U^-(L)\) estimates the utility of an act by its worst possible consequence: its value is high whenever L gives good consequences in every “rather plausible” state.
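As a concrete reading of Eqs. (1) and (2), here is a minimal sketch with a lottery encoded as a list of (possibility, utility) pairs; the encoding and the function names are illustrative, not taken from the paper:

```python
def u_opt(lottery):
    """Optimistic utility, Eq. (1): max_i min(lambda_i, u(x_i))."""
    return max(min(lam, u) for lam, u in lottery)

def u_pess(lottery):
    """Pessimistic utility, Eq. (2): min_i max(1 - lambda_i, u(x_i))."""
    return min(max(1 - lam, u) for lam, u in lottery)

# A lottery that surely (possibility 1) yields utility 0.8 and only
# somewhat possibly (0.3) yields utility 0.1:
L = [(1.0, 0.8), (0.3, 0.1)]
print(u_opt(L))   # 0.8
print(u_pess(L))  # min(max(0, 0.8), max(0.7, 0.1)) = 0.7
```

Note how the bad-but-implausible outcome barely penalizes the pessimistic value: its possibility 0.3 caps its influence through the \(1 - \lambda_i\) term.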

This setting assumes a ranking of X by a single preference criterion, hence the use of a single utility function. When several criteria, say a set \(Cr = \{1, \dots , p \}\) of p criteria, have to be taken into account, u must be replaced by a vector \(\varvec{u} = \langle u_1, \dots , u_p \rangle \) of utility functions \(u_j\). If the criteria are not equally important, each j is equipped with a weight \(w_j \in [0,1]\) reflecting its importance.

In the absence of uncertainty, each decision leads to a unique consequence and the problem is a simple problem of qualitative MCDM aggregation; classically, such an aggregation is either conjunctive (i.e. based on a weighted min) or disjunctive (i.e. based on a weighted max) - see [13] for more details about weighted min and weighted max aggregations.

In the presence of uncertainty, the aggregation can be done ex-ante or ex-post:

  • The ex-ante approach consists in computing the (optimistic or pessimistic) utility relative to each criterion j, and then performing the MCDM aggregation.

  • The ex-post approach consists in first determining the aggregated utility (conjunctive or disjunctive) of each possible \(x_i\); then the problem can be viewed as a mono-criterion problem of decision making under uncertainty.

Since the decision maker’s attitude with respect to uncertainty can be either optimistic or pessimistic and the way of aggregating the criteria either conjunctive or disjunctive, [1, 12] propose four ex-ante and four ex-post approaches:

$$\begin{aligned}&U^{-\min }_{ante} (L) = \underset{j \in Cr}{\min }~\max ( (1 - w_j), \underset{x_i \in X}{\min }\max (u_j(x_i), (1 - L[x_i]))) \end{aligned}$$
(3)
$$\begin{aligned}&U^{-\max }_{ante} (L) = \underset{j \in Cr}{\max }~\min ( w_j, \underset{x_i \in X}{\min }\max (u_j(x_i), (1 - L[x_i])))\end{aligned}$$
(4)
$$\begin{aligned}&U^{+\min }_{ante} (L) = \underset{j \in Cr}{\min }~\max ( (1- w_j), \underset{x_i \in X}{\max }\min (u_j(x_i), L[x_i]))\end{aligned}$$
(5)
$$\begin{aligned}&U^{+\max }_{ante} (L) = \underset{j \in Cr}{\max }~\min ( w_j, \underset{x_i \in X}{\max }\min (u_j(x_i), L[x_i]))\end{aligned}$$
(6)
$$\begin{aligned}&U^{-\min }_{post} (L) = \underset{x_i \in X}{\min }~\max ( (1 - L[x_i]), \underset{j \in Cr}{\min }\max (u_j(x_i), (1 - w_j)))\end{aligned}$$
(7)
$$\begin{aligned}&U^{-\max }_{post} (L) = \underset{x_i \in X}{\min }~\max ( (1 - L[x_i]), \underset{j \in Cr}{\max }\min (u_j(x_i), w_j))\end{aligned}$$
(8)
$$\begin{aligned}&U^{+\min }_{post} (L) = \underset{x_i \in X}{\max }~\min ( L[x_i], \underset{j \in Cr}{\min }\max (u_j(x_i), (1 - w_j)))\end{aligned}$$
(9)
$$\begin{aligned}&U^{+\max }_{post} (L) = \underset{x_i \in X}{\max }~\min ( L[x_i], \underset{j \in Cr}{\max }\min (u_j(x_i), w_j)) \end{aligned}$$
(10)

In the notations above, the first (resp. second) sign denotes the attitude of the decision maker w.r.t. uncertainty (resp. the criteria). The \(U^{-\min }_{ante}\) utility, for instance, considers that the decision maker is pessimistic and computes the pessimistic utility of each criterion. The criteria are then aggregated on a cautious basis: the higher the satisfaction of the least satisfied important criterion, the better the lottery. Using the same notations, \(U^{-\max }_{post}\) considers that an outcome \(x_i\) is good as soon as one of the important criteria is satisfied: a max-based aggregation of the utilities is done, yielding a unique utility function u() on the basis of which the pessimistic utility is computed. It should be noticed that the fully pessimistic and fully optimistic ex-ante utilities are equivalent to their ex-post counterparts [12], i.e. \( U^{-\min }_{ante} = U^{-\min }_{post}\) and \(U^{+\max }_{ante} = U^{+\max }_{post}\). But \(U^{-\max }_{ante}\) (resp. \(U^{+\min }_{ante}\)) may differ from \(U^{-\max }_{post}\) (resp. from \(U^{+\min }_{post}\)).

Example 1

Consider two equally important criteria 1 and 2 (\(w_1 = w_2 = 1\)), and a lottery \(L = \langle 1/x_a, 1/x_b \rangle \) leading to two equally possible consequences \(x_a\) and \(x_b\) such that \(x_a\) is good for 1 and bad for 2, and \(x_b\) is bad for 1 and good for 2: \(u_1(x_a) = u_2(x_b) = 1\) and \(u_2(x_a) = u_1(x_b) = 0\). It is easy to check that \(U^{+\min }_{ante} (L) = 1 \ne U^{+\min }_{post} (L)= 0\).
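The eight rules of Eqs. (3)–(10) can be sketched as two parameterized functions, exploiting the fact that each rule is determined by the attitude towards uncertainty (optimistic/pessimistic) and the aggregation mode (conjunctive/disjunctive). The encoding (possibility list `lam`, utility matrix `util`, weights `w`) and all names are illustrative assumptions:

```python
def u_ante(lam, util, w, optimistic, conjunctive):
    """Ex-ante rules, Eqs. (3)-(6): per-criterion utility, then MCDM aggregation."""
    per_crit = []
    for j, wj in enumerate(w):
        if optimistic:   # U+_j = max_i min(u_j(x_i), L[x_i])
            uj = max(min(util[i][j], lam[i]) for i in range(len(lam)))
        else:            # U-_j = min_i max(u_j(x_i), 1 - L[x_i])
            uj = min(max(util[i][j], 1 - lam[i]) for i in range(len(lam)))
        per_crit.append(max(1 - wj, uj) if conjunctive else min(wj, uj))
    return min(per_crit) if conjunctive else max(per_crit)

def u_post(lam, util, w, optimistic, conjunctive):
    """Ex-post rules, Eqs. (7)-(10): per-outcome aggregation, then U-/U+."""
    per_out = []
    for i, li in enumerate(lam):
        if conjunctive:  # weighted min over criteria
            ui = min(max(util[i][j], 1 - w[j]) for j in range(len(w)))
        else:            # weighted max over criteria
            ui = max(min(util[i][j], w[j]) for j in range(len(w)))
        per_out.append(min(li, ui) if optimistic else max(1 - li, ui))
    return max(per_out) if optimistic else min(per_out)

# Example 1's data: two equally important criteria, L = <1/xa, 1/xb>,
# u1(xa) = u2(xb) = 1, u2(xa) = u1(xb) = 0.
lam = [1.0, 1.0]
util = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]
print(u_ante(lam, util, w, optimistic=True, conjunctive=True))  # 1.0
print(u_post(lam, util, w, optimistic=True, conjunctive=True))  # 0.0
```

On this instance the optimistic/conjunctive pair indeed gives different ex-ante and ex-post values, while the fully pessimistic (and fully optimistic) rules coincide, as stated above.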

2.2 Possibilistic Decision Trees [14]

Decision trees provide an explicit modeling of sequential problems by representing, simply, all possible scenarios. A decision tree is a labeled tree \(\mathcal {DT} = (\mathcal {N}, \mathcal {E})\) where \(\mathcal {N} = \mathcal {D} \cup \mathcal {C} \cup \mathcal {LN}\) contains three kinds of nodes (see Fig. 1): \(\mathcal {D}\) is the set of decision nodes (represented by squares); \(\mathcal {C}\) is the set of chance nodes (represented by circles) and \(\mathcal {LN}\) is the set of leaves. Succ(N) denotes the set of child nodes of node N. For any \(D_i \in \mathcal {D}, Succ(D_i)\subseteq \mathcal {C}\), i.e. a chance node (an action) must be chosen at each decision node. For any \(C_i \in \mathcal {C}, \ Succ(C_i)\subseteq \mathcal {LN} \cup \mathcal {D}\): each outcome of an action is either a leaf node or a decision node (at which a new action will have to be chosen).

In the possibilistic context, leaves are labeled by utility degrees in the [0,1] scale and the uncertainty pertaining to the possible outcomes of each \(C_i \in \mathcal {C}\), is represented by a conditional possibility distribution \(\pi _i \) on \(Succ(C_i)\), such that \(\forall N \in Succ(C_i), \pi _i(N) = \varPi (N|path(C_i))\) where \(path(C_i)\) denotes all the value assignments of chance and decision nodes on the path from the root to \(C_i \) [14].

Solving a decision tree amounts to building a complete strategy that selects an action (a chance node) for each decision node: a strategy is a mapping \(\delta : \mathcal {D} \mapsto \mathcal {C} \cup \{\bot \}\), where \(\delta (D_i) = \bot \) means that no action has been selected for \(D_i\) (\(\delta \) is partial). Leaf nodes being labeled with utility degrees, the rightmost chance nodes can be seen as simple possibilistic lotteries. Each strategy \(\delta \) can thus be viewed as a connected sub-tree of the decision tree and is identified with a possibilistic compound lottery \(L_\delta \), i.e. with a possibility distribution over a set of (simple or compound) lotteries. A compound lottery \(\langle \lambda _1 /L_1,..., \lambda _k /L_k \rangle \) (and thus any strategy) can then be reduced into an equivalent simple lottery as follows [2]:

$$\begin{aligned} Reduction(\langle \lambda _{1}/L_1,..., \lambda _{k}/L_k \rangle ) =\langle \underset{j = 1,k}{\max }(\min ( \lambda ^j_{1}, \lambda _j)) / u_1,...,\underset{j = 1,k}{\max }(\min ( \lambda ^j_{n}, \lambda _j)) / u_n\rangle . \end{aligned}$$

The pessimistic and optimistic utility of a strategy \(\delta \) can then be computed on the basis of the reduction of \(L_\delta \): the utility of \(\delta \) is the one of \(Reduction(L_\delta )\).
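The reduction operation can be sketched as follows, with simple lotteries encoded as outcome-to-possibility dicts; the encoding is an illustrative assumption, not the paper's:

```python
def reduction(compound):
    """Reduce a compound lottery, given as (lam_j, simple_lottery_j) pairs,
    into an equivalent simple lottery (a dict outcome -> possibility)."""
    simple = {}
    for lam_j, lottery_j in compound:
        for x, lam_x in lottery_j.items():
            # possibility of reaching x through branch j is min(lam_x, lam_j);
            # the overall possibility of x is the max over all branches
            simple[x] = max(simple.get(x, 0.0), min(lam_x, lam_j))
    return simple

# The compound lottery of Example 2 below: <1/L_C3, 0.9/L_C5>
L_C3 = {"xa": 0.5, "xb": 1.0}
L_C5 = {"xa": 0.2, "xb": 1.0}
print(reduction([(1.0, L_C3), (0.9, L_C5)]))  # {'xa': 0.5, 'xb': 1.0}
```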

3 Multi-criteria Optimization in Possibilistic Trees

Multi-criteria Possibilistic Decision Trees can now be defined: they are classical possibilistic decision trees, the leaves of which are evaluated according to several criteria - each leaf N is now labeled by a vector \(\varvec{u}(N) = \langle u_1(N), \dots , u_p(N) \rangle \) rather than by a single utility score (see Fig. 1). A strategy still leads to a compound lottery, which can be reduced, leading in turn to a simple (but multi-criteria) lottery. We propose to base the comparison of strategies on the comparison, according to the rules O previously presented, of their reductions:

$$\begin{aligned} \delta _1 \succeq _O \delta _2 ~ \text{ iff } ~ U_O(\delta _1) \ge U_O(\delta _2), ~ \text{ where } ~ \forall \delta , U_O(\delta ) = U_O( Reduction(L_\delta )) \end{aligned}$$
(11)

Example 2

Consider the tree of Fig. 1, involving two criteria that are supposed to be equally important, and the strategy \(\delta (D_0) = C_1\), \(\delta (D_1) = C_3\), \(\delta (D_2) = C_5\). It holds that \(L_\delta = \langle 1/ L_{C_3}, 0.9/ L_{C_5}\rangle \) with \(L_{C_3}=\langle 0.5/x_a, 1/x_b \rangle \), \(L_{C_5}=\langle 0.2/x_a, 1/x_b \rangle \). Because \(Reduction(L_\delta ) = \langle \max (0.5, 0.2 ) / x_a, \max ( 1, 0.9) / x_b \rangle = \langle 0.5 / x_a, 1 / x_b \rangle \), we get \(U^{+\min }_{ante}( \delta )= \min ( \max (\min (0.5, 0.3), \min (1, 0.6)), \max (\min (0.5, 0.8), \min (1, 0.4))) = \min (0.6, 0.5) = 0.5.\)

Fig. 1. A multi-criteria possibilistic decision tree
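Example 2's value can be re-checked numerically. The per-criterion leaf utilities below (u1(xa)=0.3, u1(xb)=0.6, u2(xa)=0.8, u2(xb)=0.4) are read off the min-terms of the example's computation, since Fig. 1 is not reproduced here:

```python
# Reduced lottery of Example 2 and (assumed) per-criterion utilities
lam = {"xa": 0.5, "xb": 1.0}
util = {"xa": (0.3, 0.8), "xb": (0.6, 0.4)}

# U+min_ante with w1 = w2 = 1: for each criterion, the optimistic utility
# of the reduced lottery; then a min over the criteria
per_crit = [max(min(util[x][j], lam[x]) for x in lam) for j in (0, 1)]
print(per_crit, min(per_crit))  # [0.6, 0.5] 0.5
```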

The definition proposed by Eq. (11) is quite natural but raises an algorithmic challenge: the set of strategies to compare is exponential in the size of the tree, which makes the explicit evaluation of all strategies unrealistic. The sequel of the paper aims at providing algorithmic solutions to this difficulty.

3.1 Dynamic Programming as a Tool for ex-post Utilities

Dynamic Programming [15] is an efficient procedure for strategy optimization. It proceeds by backward induction, handling the problem from the end (in our case, from the leaves): the last decision nodes are considered first, and so on recursively until the root is reached. This algorithm is sound and complete as soon as the decision rule leads to complete and transitive preferences and satisfies the principle of weak monotonicity, which ensures that each sub-strategy of an optimal strategy is optimal in its sub-tree. Fortunately, each of the ex-post criteria satisfies transitivity, completeness and weak monotonicity, since each collapses to either a classical \(U^-\) or \(U^+\) utility, which satisfy these properties [8, 14]. The adaptation of Dynamic Programming to the ex-post rules is detailed in Algorithm 1.

Algorithm 1

In short, this algorithm aggregates the utility values of each leaf, and then builds an optimal strategy from the last decision nodes to the root of the tree, using the principle defined by [9, 10] for classical (monocriterion) possibilistic decision trees.
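As an illustration, the backward pass can be sketched for one ex-post rule, \(U^{-\min}_{post}\) (pessimistic attitude, conjunctive aggregation). The dict-based tree encoding and all identifiers are illustrative assumptions, not the paper's implementation:

```python
def aggregate(u_vec, w):
    # ex-post MCDM aggregation of a leaf's utility vector (weighted min)
    return min(max(uj, 1 - wj) for uj, wj in zip(u_vec, w))

def backward(node, w, strategy):
    if node["kind"] == "leaf":
        return aggregate(node["u"], w)
    if node["kind"] == "chance":
        # pessimistic utility of the lottery over the node's children
        return min(max(1 - lam, backward(child, w, strategy))
                   for lam, child in node["succ"])
    # decision node: keep the action (chance child) of maximal value
    best = max(node["succ"], key=lambda c: backward(c, w, strategy))
    strategy[node["name"]] = best["name"]
    return backward(best, w, strategy)

# A toy tree: one decision between two actions C1 and C2
leaf = lambda name, u: {"kind": "leaf", "name": name, "u": u}
C1 = {"kind": "chance", "name": "C1",
      "succ": [(1.0, leaf("x1", (0.9, 0.2))), (0.4, leaf("x2", (0.1, 1.0)))]}
C2 = {"kind": "chance", "name": "C2", "succ": [(1.0, leaf("x3", (0.5, 0.5)))]}
D0 = {"kind": "decision", "name": "D0", "succ": [C1, C2]}

strategy = {}
print(backward(D0, (1.0, 1.0), strategy), strategy)  # 0.5 {'D0': 'C2'}
```

The balanced action C2 wins here: under the conjunctive ex-post aggregation, C1's totally possible leaf x1 is dragged down to 0.2 by its bad second criterion.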

3.2 Dynamic Programming for ex-ante Utilities?

The ex-ante variant of Dynamic Programming we propose is a little trickier (see Algorithm 2). It keeps at each node a vector of p pessimistic (resp. optimistic) utilities, one for each criterion. The computation of the ex-ante utility can then be performed each time a decision is to be made.

Algorithm 2

Recall that \(U^{-\min }_{ante} = U^{-\min }_{post}\) and \(U^{+\max }_{ante} = U^{+\max }_{post}\). Hence, for these two rules the optimization can also be performed by the ex-post algorithm. The two other rules, \(U_{ante}^{-\max }\) and \(U_{ante}^{+\min }\), unfortunately do not satisfy the monotonicity principle (see [1]). Hence, Algorithm 2 may provide a good strategy, but without any guarantee of optimality: it can be considered an approximation algorithm for these two rules. Another approximation algorithm is the ex-post algorithm described in the previous section: even if it is not always the case, it often happens that \(U^{-\max }_{post} = U^{-\max }_{ante} \) (resp. \(U^{+\min }_{post}=U^{+\min }_{ante}\)); when this is the case, the solution provided by the ex-post algorithm is optimal.

3.3 Optimization of \(U_{ante}^{-\max }\) by Multi Dynamic Programming

The lack of monotonicity of \(U_{ante}^{-\max }\) is not dramatic, even when optimality must be guaranteed. With \(U_{ante}^{-\max }\), indeed, we look for a strategy that has a good pessimistic utility \(U^-_j\) for at least one important criterion j. Hence, if we get for each j a strategy that optimizes \(U^-_j\) (which can be done by Dynamic Programming, since the classical pessimistic utility is monotonic), then the one with the highest value of \(U^{-\max }_{ante}\) among these p strategies is globally optimal. Formally:

Proposition 1

\(U^{-\max }_{ante}(L) = \underset{j=1,p}{\max }\min ( w_j, U^{-}_j(L))\) where \(U^{-}_j(L)\) is the pessimistic utility of L according to the sole criterion j.

Corollary 1

Let \(\varDelta ^* = \{L^*_1, \dots , L^*_p\}\) s.t. for each j, \(U^{-}_j(L^*_j) \ge U^{-}_j(L)\) \(\forall L\), and let \(L^* \in \varDelta ^*\). If \(\underset{j=1,p}{\max }\min (w_j, U^{-}_j(L^*)) \ge \underset{j=1,p}{\max }\min (w_j, U^{-}_j(L^*_i))\) \(\forall L^*_i \in \varDelta ^*\), then \( U^{-\max }_{ante}(L^*) \ge U^{-\max }_{ante}(L), \forall L\).

Hence, the optimization problem can be solved by a series of p calls to a classical (monocriterion) pessimistic optimization. This is the principle of the Multi Dynamic Programming approach detailed by Algorithm 3.

Algorithm 3
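The principle of Proposition 1 and Corollary 1 can be sketched as follows: one classical pessimistic optimization per criterion, then a weighted max over the p candidate strategies. The tree encoding and all names are illustrative assumptions:

```python
def pess_opt(node, j, strategy):
    """Backward induction maximizing the mono-criterion pessimistic utility U-_j."""
    if node["kind"] == "leaf":
        return node["u"][j]
    if node["kind"] == "chance":
        return min(max(1 - lam, pess_opt(c, j, strategy))
                   for lam, c in node["succ"])
    best = max(node["succ"], key=lambda c: pess_opt(c, j, {}))
    strategy[node["name"]] = best["name"]
    return pess_opt(best, j, strategy)

def multi_dp(root, w):
    """One pessimistic optimization per criterion, then a weighted max (Prop. 1)."""
    candidates = []
    for j, wj in enumerate(w):
        strat = {}
        candidates.append((min(wj, pess_opt(root, j, strat)), strat))
    return max(candidates, key=lambda c: c[0])

# A toy tree: one decision between two actions C1 and C2
leaf = lambda name, u: {"kind": "leaf", "name": name, "u": u}
C1 = {"kind": "chance", "name": "C1",
      "succ": [(1.0, leaf("x1", (0.9, 0.2))), (0.4, leaf("x2", (0.1, 1.0)))]}
C2 = {"kind": "chance", "name": "C2", "succ": [(1.0, leaf("x3", (0.5, 0.5)))]}
D0 = {"kind": "decision", "name": "D0", "succ": [C1, C2]}

print(multi_dp(D0, (1.0, 1.0)))  # (0.6, {'D0': 'C1'})
```

Here criterion 1 alone can be satisfied to degree 0.6 (by C1), while criterion 2 alone only reaches 0.5 (by C2); the disjunctive rule keeps the best of the two mono-criterion optima.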

3.4 Exact Optimization of \(U_{ante}^{+\min }\): A Branch and Bound Algorithm

Let us finally study the \(U^{+\min }_{ante}\) utility. As previously said, it does not satisfy monotonicity: Dynamic Programming can provide a good strategy, but without any guarantee of optimality. To guarantee optimality, one can proceed by implicit enumeration via a Branch and Bound algorithm, as done by [8] for possibilistic Choquet integrals and by [16] for Rank Dependent Utility (both in the mono-criterion case). The Branch and Bound procedure (see Algorithm 4) takes as arguments a partial strategy \(\delta \) and an upper bound of the \(U_{ante}^{+\min }\) value of its best extension. It returns \(U^*\), the \(U_{ante}^{+\min }\) value of the best strategy found so far, \(\delta ^*\). We can initialize \(\delta ^*\) with any strategy, e.g. the one provided by Algorithm 1 or 2. At each step, the current partial strategy \(\delta \) is developed by the choice of an action for some unassigned decision node. When several decision nodes are candidates, the one with the minimal rank (i.e. the earliest one in the temporal order) is developed. The recursive procedure backtracks when the current strategy is either complete (then \(\delta ^*\) and \(U^*\) are updated) or provably worse than the current \(\delta ^*\). Function \(UpperBound (D_0, \delta )\) provides an upper bound on the value of the best completion of \(\delta \): in practice, it builds, for each criterion j, a strategy \(\delta _j\) that maximizes \(U_{j}^{+}\) (using the linear algorithm of [9, 10]); it then selects, among these strategies, the one with the highest \(U^{+\min }_{ante}\). It is important to note that \(UpperBound(D_0, \delta ) = U^{+\min }_{ante}(\delta )\) when \(\delta \) is complete. Whenever the value returned by \(UpperBound (D_0, \delta )\) is lower than or equal to \(U^*\), the value of the best current strategy, the algorithm backtracks, leading to the choice of another action for the last considered decision node.

Algorithm 4
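The control structure of such a Branch and Bound can be sketched generically; `pending`, `actions`, `value` and `upper_bound` are hypothetical callables standing in for the tree-specific parts of Algorithm 4 (next unassigned decision node, candidate actions, \(U^{+\min}_{ante}\) of a complete strategy, and the bound on the best completion):

```python
def branch_and_bound(delta, best, pending, actions, value, upper_bound):
    """Depth-first implicit enumeration; `best` is a [utility, strategy] pair
    updated in place whenever a better complete strategy is found."""
    d = pending(delta)
    if d is None:                          # delta is complete
        v = value(delta)
        if v > best[0]:
            best[0], best[1] = v, dict(delta)
        return best
    for action in actions(d):
        delta[d] = action
        if upper_bound(delta) > best[0]:   # otherwise: prune this branch
            branch_and_bound(delta, best, pending, actions, value, upper_bound)
        del delta[d]
    return best

# Toy instance: a single decision node "D0" with two candidate actions.
vals = {"a": 0.4, "b": 0.7}
best = [0.0, {}]
branch_and_bound({}, best,
                 pending=lambda d: None if "D0" in d else "D0",
                 actions=lambda _d: ["a", "b"],
                 value=lambda d: vals[d["D0"]],
                 upper_bound=lambda d: 1.0)
print(best)  # [0.7, {'D0': 'b'}]
```

The quality of `upper_bound` drives the pruning: the trivial bound 1.0 used here never prunes, whereas the per-criterion optimistic bound described above cuts a branch as soon as no completion can beat the incumbent.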

4 Experiments

Beyond the evaluation of the feasibility of the algorithms proposed, our experiments aim at evaluating to what extent the optimization of the problematic utilities, \(U^{-\max }_{ante}\) and \(U^{+\min }_{ante}\), can be approximated by Dynamic Programming.

The implementation has been done in Java, on an Intel Core i7 2670QM CPU at 2.2 GHz with 6 GB of RAM. The experiments were performed on complete binary decision trees. We have considered four sets of problems, the number of decisions to be made in sequence (denoted seq) varying from 2 to 6, with an alternation of decision and chance nodes: at each decision level l (i.e. odd level), the tree contains \(2^{l-1}\) decision nodes followed by \(2^{l}\) chance nodes. In the present experiment, the number of criteria is set to 3. The utility values as well as the weight degrees are uniformly drawn from the set \( \{0, 0.1, 0.2, \dots , 0.9, 1 \}\). Conditional possibilities are chosen randomly in [0, 1] and normalized. Each of the four samples of problems contains 1000 randomly generated trees.

Feasibility Analysis and Temporal Performances: Table 1 presents the execution time of each algorithm. Obviously, for each of them, the CPU time increases with the size of the tree, but it remains affordable even for very large trees (1365 decision nodes). We can check that, for \(U^{-\max }_{ante}\) (resp. \(U^{+\min }_{ante}\)), the approximation performed by ex-post Dynamic Programming is faster than the one performed by ex-ante Dynamic Programming, both being faster than the exact algorithm (Multi Dynamic Programming and Branch and Bound, respectively).

Table 1. Average CPU time, in milliseconds, of each algorithm and each rule, according to the size of the tree (in number of decision nodes)

Quality of the Approximation: As previously mentioned, the ex-post and ex-ante Dynamic Programming algorithms are approximation algorithms for \(U^{-\max }_{ante}\) and \(U^{+\min }_{ante}\). The following experiments estimate the quality of these approximations. To this end, we compute for each sample the success rate of the approximation algorithm considered, i.e. the proportion of trees for which the value provided by the approximation algorithm is actually optimal; then, for the trees on which it fails to reach optimality, we report the average closeness value \( \frac{U_{Approx}}{U_{Exact}}\), where \(U_{Approx}\) is the utility of the strategy provided by the approximation algorithm and \(U_{Exact}\) is the optimal utility - the one of the solution provided by the exact algorithm (Branch and Bound for \(U^{+\min }_{ante}\) and Multi Dynamic Programming for \(U^{-\max }_{ante}\)). The results are given in Table 2.

Table 2. Quality of approximation of \(U^{-\max }_{ante}\) and \(U^{+\min }_{ante}\) by Dynamic Programming

Clearly, ex-post Dynamic Programming provides a good approximation for \(U^{+\min }_{ante}\): its success rate decreases with the number of nodes but stays higher than 70%, and above all its closeness value is very high (above 0.9); notice that it is always better than its ex-ante counterpart in terms of success rate, closeness and CPU time. This is good news, since it is polynomial while Branch and Bound, the exact algorithm, is exponential in the number of nodes. As to \(U^{-\max }_{ante}\), neither approximation algorithm performs well. This is however not bad news, since Multi Dynamic Programming, the exact algorithm, is polynomial and has a very affordable CPU time.

5 Conclusion

This paper extends to possibilistic decision trees the decision rules presented in [1] for non-sequential problems. We show that, for the ex-post decision rules, as well as for \(U^{+\max }_{ante}\) and \(U^{-\min }_{ante}\), the optimization can be achieved by Dynamic Programming. For \(U^{+\min }_{ante}\), the optimization can be carried out either by an exact but costly algorithm (Branch and Bound) or by an approximation algorithm (ex-post Dynamic Programming). For \(U^{-\max }_{ante}\), we propose an exact algorithm (Multi Dynamic Programming) that performs better than Dynamic Programming. As future work, we would like to study the handling of several criteria in more sophisticated qualitative decision models, such as possibilistic influence diagrams [14] or possibilistic Markov decision models [10].