
1 Introduction

A variety of approaches have been suggested over the years for the evaluation of decision problems. An important category is multi-attribute utility theory (MAUT), for which there are several extensively used implementations, such as SMART, EXPERT CHOICE and CAR, and various variants thereof (Danielson and Ekenberg, 2016a, 2016b). In general, albeit far from always, these assume that the decision-maker can provide numerically precise decision information; in many cases, this is considered unrealistic in real-life decision-making, which is why different interval approaches have been suggested to extend the various decision models for both multi-criteria and risk-based decision-making, such as the PRIME tool, which handles multiple criteria while supporting interval-valued ratio estimates for value differences. Another approach is the preference programming method, which is an interval extension of the classical analytic hierarchy process (AHP) method (Salo and Hämäläinen, 2001) and is related to the RICH method. There are also other approaches, such as ARIADNE (Sage and White, 1984), as well as a multitude of fuzzy measurement variants of MAUT techniques. A main issue with the above approaches is that they provide very little assistance when the results overlap, as is usually the case in real-life decision problems. This issue will be addressed in this paper.

In the research community, there have been many suggestions as to how to handle the very strong requirements for decision-makers to provide precise information. Some main categories of approaches to remedy the precision problem are based on capacities, sets of probability measures, upper and lower probabilities, interval probabilities (and sometimes utilities), evidence and possibility theories, as well as fuzzy measures (see, for example, Dubois, 2010; Rohmer and Baudrit, 2010; Shapiro and Koissi, 2015; Dutta, 2018). The latter category seems to be used only to a limited extent in real-life decision analyses since it usually requires a significant mathematical background on the part of the decision-maker. Another reason is that the computational complexity can be problematic if the fuzzy aggregation mechanisms are not significantly simplified. This is further discussed in, for example, Danielson (2004) and Danielson and Ekenberg (2007).

In this article, we therefore suggest a method and software for integrated multi-attribute evaluation under risk, subject to incomplete or imperfect information. The software originates from our earlier work on evaluating decision situations using imprecise utilities, probabilities and weights, as well as qualitative estimates between these components, derived from convex sets of weight, utility and probability measures. To avoid some of the aggregation problems that arise when handling set membership functions and similar constructs, we introduce higher-order distributions for better discrimination between the possible outcomes. For the decision structure, we use the common tree formalism but refrain from using precise numbers. To alleviate the problem of overlapping results, we suggest a new evaluation method based on a resulting belief mass over the output intervals, without introducing further complicating aspects into the decision situation.

In the next section, we briefly describe how risk and multi-criteria trees can be co-modelled in an integrated framework. Thereafter, we provide the conceptual model for our method and explain both the input data format and the evaluations, and how these relate to the modelling of beliefs. We finish with a real-life example.

2 Probabilistic Approaches

Probabilistic decision situations are often represented by a decision tree such as in Fig. 1.

Fig. 1 A partial tree representation of the events for one alternative (Alt. 1) in a decision under risk (the three red dots are binary events)

Such a tree consists of a root node (also called a decision node), a set of probability nodes representing uncertain events, and consequence nodes for the final outcomes. In general, the probability nodes are assigned unique probability distributions representing the uncertainties in the decision situation. When an alternative A_i is chosen, there is a probability p_ij that an event will occur that leads either to a subsequent event or to a consequence. The consequences are assigned values v_ijk. Maximization of the expected value is often used as the evaluation rule. The expected value of alternative A_i in Fig. 1 is:

$$ E\left({A}_i\right)=\sum \limits_{j=1}^2{p}_{ij}\ \sum \limits_{k=1}^2{p}_{ij k}{v}_{ij k}. $$

This is straightforwardly generalized to decision trees of arbitrary depth.
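To make the computation concrete, the following small Python sketch evaluates the formula above for Fig. 1; the numbers are purely illustrative and not taken from the paper.

```python
# Illustrative evaluation of E(A_i) for Fig. 1: two events, two consequences each.
p_i = [0.3, 0.7]                     # p_i1, p_i2 (hypothetical probabilities)
p_ijk = [[0.6, 0.4], [0.2, 0.8]]     # p_ij1, p_ij2 for each event j
v_ijk = [[10.0, 4.0], [8.0, 2.0]]    # consequence values v_ijk

E_Ai = sum(p_i[j] * sum(p_ijk[j][k] * v_ijk[j][k] for k in range(2))
           for j in range(2))
print(E_Ai)  # 0.3*(0.6*10 + 0.4*4) + 0.7*(0.2*8 + 0.8*2) = 4.52
```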

3 Multi-criteria Decision Trees

Multi-criteria decisions in the MAUT category are characterized by there being several criteria, often arranged on different levels in a hierarchy as in Fig. 2, under which the decision-maker assigns values to the alternatives relative to a value scale.

Fig. 2 A multi-criteria decision tree

Normalized weights are assigned to each sub-branch in the tree and the alternatives are valued under the respective sub-criteria. Maximization of the weighted value is often used for the evaluation. In Fig. 2, the value of alternative A_i under criterion jk is v_ijk, while the weight of criterion jk is w_jk. The total value of alternative A_i can then be calculated using

$$ E\left({A}_i\right)=\sum \limits_{j=1}^2{w}_j\ \sum \limits_{k=1}^2{w}_{jk}{v}_{ijk}. $$

The alternative with the maximum expected value is then the preferred choice.

4 Probabilistic Multi-criteria Hierarchies

Combining these formalisms is straightforward: the values of the alternatives are calculated as expected values derived from decision trees, i.e. the valuation of the consequences can be included in the overall multi-criteria tree evaluation. Figure 3 demonstrates how this is done, where the alternatives' values under the weight w_11 are derived from the entire underlying probabilistic decision tree.

Fig. 3 Combined multi-criteria and probabilistic representations

The expected value of the tree is then calculated by

$$ E\left({A}_i\right)=\sum \limits_{j=1}^2\bigg({w}_j\cdot \sum \limits_{k=1}^2\bigg({w}_{jk}\cdot \sum \limits_{m=1}^2\bigg({p}_{im}\cdot \sum \limits_{n=1}^2{p}_{im n}{v}_{im n}\bigg)\bigg)\bigg) $$

or, more generally, by

$$ E\left({A}_i\right)=\sum \limits_{i_1=1}^{n_{i_0}}{w}_{i{i}_1}\ \sum \limits_{i_2=1}^{n_{i_1}}{w}_{i{i}_1{i}_2}\dots \sum \limits_{i_{m-1}=1}^{n_{i_{m-2}}}{p}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}}\ \sum \limits_{i_m=1}^{n_{i_{m-1}}}{p}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}{i}_m}{v}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}{i}_m}, $$

where p denotes a probability, w denotes a weight and v denotes a value.
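As an illustration of the general formula, the following Python sketch evaluates a combined criteria-consequence tree of arbitrary depth; the tree encoding and the numbers are hypothetical. Weight and probability nodes are treated uniformly, since both scale the values of their children by normalized factors.

```python
# Hypothetical encoding: a leaf is a value, an inner node is a list of
# (weight-or-probability, subtree) pairs.

def combined_value(node):
    """Weighted/expected value of a combined criteria-consequence (sub)tree."""
    if isinstance(node, (int, float)):
        return node
    return sum(f * combined_value(child) for f, child in node)

# Alternative A_i in the spirit of Fig. 3: weights w_1, w_2 on top, and a
# probabilistic sub-tree under criterion 11.
a_i = [
    (0.6, [(0.5, [(0.3, 1.0), (0.7, 0.4)]),   # w_1 -> w_11 over a chance node
           (0.5, 0.8)]),                      #        w_12 over a plain value
    (0.4, [(0.7, 0.6), (0.3, 0.9)]),          # w_2 -> w_21, w_22
]
print(combined_value(a_i))  # 0.69
```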

We will formalize this in the next sections and explain how imprecision can be modelled in a combined structure.

5 Strong Uncertainty

In the type of multi-criteria decision problems we consider, strong uncertainty exists when the decision is also made under risk, with uncertain consequences for at least one criterion, in combination with imprecise or incomplete information with respect to probabilities, weights and consequence or alternative values. Methods for decision evaluation under strong uncertainty, and the computational means for evaluating such models, should be capable of embracing the uncertainty in the evaluation rules and methods, and should provide evaluation results that reflect the effects of the uncertainty for the subsequent discrimination between alternatives.

We will call our representation of a combined decision problem a multi-frame. Such a frame collects all information necessary for the model in one structure. One part of this is the concept of a graph.

Definition

A graph is a structure 〈V,E〉 where V is a set of nodes and E is a set of node pairs. A tree is a connected graph without cycles. A rooted tree is a tree with a dedicated node as a root. The root is at level 0. The nodes adjacent to a node at level i, except for those at level i − 1, are at level i + 1. A node at level i is a leaf if it has no adjacent nodes at level i + 1. A node at level i + 1 that is adjacent to a node at level i is a child of the latter. A (sub-)tree is symmetrical if all nodes at level i have the same number of adjacent nodes at level i + 1. The depth of the tree is max{n | there exists a node at level n}.

Definition

A criteria-consequence tree T = 〈C∪A∪N∪{r},E〉 is a tree where

  • r is the root,

  • A is the set of nodes at level 1,

  • C is the set of leaves, and

  • N is the set of intermediary nodes in the tree except those in A.

In a multi-frame, represented as a multi-tree, user statements can be either range constraints or comparative statements (see below); they are translated into inequalities and collected in constraint sets. We denote the values of the consequences c_i and c_j by v_i and v_j respectively. Value statements are relations between value variables and are translated into systems of inequalities collected in a value constraint set; probability and weight statements are in the same manner collected in node constraint sets. A constraint set is said to be consistent if at least one real number can be assigned to each variable such that all inequalities are simultaneously satisfied. Consequently, we get potential sets of functions with an infinite number of instantiations.

Definition

Given a criteria-consequence tree T, let N be a constraint set in the variables {n_{…i…j…}}. Substitute the intermediary node labels x_{…i…j…} with n_{…i…j…}. N is a node constraint set for T if, for all sets {n_{…i1},…,n_{…im}} of all sub-nodes of nodes n_{…i} that are not leaves, the statements n_{…ij} ∈ [0,1] and ∑_j n_{…ij} = 1, j ∈ [1,…,m], are in N.

A probability node constraint set relative to a criteria-consequence tree then characterizes a set of discrete probability distributions. Weight and value constraint sets are analogously defined. Weight and probability node constraint sets also contain the usual normalization constraints (∑_j x_{ij} = 1) requiring the probabilities and weights to total one.

Definition

A multi-frame is a structure 〈T,N〉, where T is a criteria-consequence tree and N is a set of all constraint sets relative to T.

The probability, value and weight constraint sets thus consist of linear inequalities. A minimal requirement is that each such set is consistent, i.e. there must exist some vector of variable assignments that simultaneously satisfies every inequality in the system.

Definition

Given a consistent constraint set X in the variables {x_i}, Xmax(x_i) =def sup{a ∣ {x_i > a} ∪ X is consistent}. Similarly, Xmin(x_i) =def inf{a ∣ {x_i < a} ∪ X is consistent}. Furthermore, given a function f, Xargmax(f(x)) is a solution vector that is a solution to Xmax(f(x)), and Xargmin(f(x)) is a solution vector that is a solution to Xmin(f(x)).

The set of orthogonal projections of the solution set is the orthogonal hull, consisting of all consistent variable assignments for each variable in a constraint set.

Definition

Given a consistent constraint set X in the variables {x_i}, i ∈ [1,…,n], the set of pairs 〈Xmin(x_i), Xmax(x_i)〉 is the orthogonal hull of the set.

The orthogonal hull consists of the upper and lower probabilities (weights, values) when X consists of probabilities (weights, values). The hull intervals are calculated by first finding a consistent point. Thereafter, the minimum and maximum of each variable are found by solving linear programming problems. Because of convexity, all points between the extremal points are feasible, i.e. the entire orthogonal hull has then been determined.
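To illustrate the computation, the following sketch obtains the hull intervals with off-the-shelf linear programming (SciPy); the three-variable probability constraint set is a made-up example, and this is not DecideIT's implementation.

```python
# Orthogonal hull of a small, made-up probability constraint set via LP.
import numpy as np
from scipy.optimize import linprog

# Constraints over (p1, p2, p3): p1 + p2 + p3 = 1, 0.2 <= p1 <= 0.5, p2 >= p3.
A_ub = np.array([[-1.0, 0.0, 0.0],   # -p1 <= -0.2
                 [ 1.0, 0.0, 0.0],   #  p1 <=  0.5
                 [ 0.0,-1.0, 1.0]])  #  p3 - p2 <= 0
b_ub = np.array([-0.2, 0.5, 0.0])
A_eq = np.array([[1.0, 1.0, 1.0]])
b_eq = np.array([1.0])

hull = []
for i in range(3):
    c = np.zeros(3); c[i] = 1.0
    lo = linprog(c,  A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hull.append((round(lo, 3), round(hi, 3)))
print(hull)  # [(0.2, 0.5), (0.25, 0.8), (0.0, 0.4)]
```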

6 Beliefs in Intervals

We will now extend the representation to obtain a more granulated representation of a decision problem. Often when we specify an interval, we do not believe equally in all values in the interval: we may, for example, believe less in the values closer to the borders of the interval. Such values are nevertheless included to cover everything that we perceive as possible in uncertain situations. These inclusions give rise to belief distributions indicating the different strengths with which we believe in the different values. Distributions over classes of weight, probability and value measures have been developed into various models, such as second-order probability theory.

In the extended model, we introduce a focal point to each of the intervals used as parameters for belief distributions for probabilities, values and criteria weights. We can then operate on these distributions using additive and multiplicative combination rules for random variables. The detailed theory of belief distributions in this sense is described in Ekenberg and Thorbiörnson (2001), Danielson et al. (2007, 2014) and Sundgren et al. (2009).

To make the method more concrete, we introduce the unit cube as the set of all tuples (x_1, …, x_n) in [0,1]^n. A second-order distribution over a unit cube B is a positive distribution F defined on B such that

$$ {\int}_BF(x)\ {dV}_B(x)=1, $$

where VB is the n-dimensional Lebesgue measure on B.

We will use second-order joint probability distributions as measures of belief. Different distributions are utilized for weights, probabilities and values because of the normalization constraints on probabilities and weights. Natural candidates are then the Dirichlet distribution for weights and probabilities, and two- or three-point distributions for values. In brief, the Dirichlet distribution is a parameterized family of continuous multivariate probability distributions; its probability density function is determined by parameters α_1,…,α_k > 0, with a normalizing constant given by a multivariate beta function of the parameters and a kernel that is a product of powers of the components p_i.

More precisely, the probability density function of the Dirichlet distribution is

$$ {f}_{dir}\left(p,\alpha \right)=\frac{\Gamma \left({\sum}_{i=1}^k{\alpha}_i\right)}{\prod_{i=1}^k\Gamma \left({\alpha}_i\right)}{p_1}^{\alpha_1-1}{p_2}^{\alpha_2-1}\dots {p_k}^{\alpha_k-1} $$

on the set {p = (p_1,…,p_k) | p_1,…,p_k ≥ 0, ∑ p_i = 1}, where (α_1,…,α_k) is a parameter vector in which each α_i > 0 and Γ(α_i) is the Gamma function.

The Dirichlet distribution is a multivariate generalization of the beta distribution, and the marginal distributions of a Dirichlet distribution are thus beta distributions. The beta distribution is a family of continuous probability distributions defined on [0, 1] and parameterized by two shape parameters, α and β.

If the distribution is uniform, the resulting marginal distribution (over an orthogonal axis) is a polynomial of degree n − 2, where n is the dimension of the cube B. If all α_i = 1, the Dirichlet distribution is uniform with the marginal distribution

$$ f\left({x}_i\right)={\int}_{B_i^{-}}{dV}_{B_i^{-}}(x)=\left(n-1\right){\left(1-{x}_i\right)}^{n-2} $$

However, we need a bounded Dirichlet distribution operating on a user-specified range [a_i, b_i] instead of the general interval [0,1]. Bounded beta distributions are then derived, the so-called four-parameter beta distributions, likewise defined only on the user-specified range. We then define a probability or weight belief distribution as a three-point bounded Dirichlet distribution f_3(a_i, c_i, b_i), where c_i is the most likely probability or weight and a_i and b_i are the boundaries of the belief with a_i < c_i < b_i (Kotz and van Dorp, 2004).
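As an illustration of such a three-point belief distribution, the sketch below rescales a beta distribution to [a_i, b_i] and places its mode at c_i using a PERT-style choice of shape parameters; this parameterization is assumed for illustration and may differ from the bounded Dirichlet/beta parameterization used here.

```python
# A three-point belief distribution f3(a, c, b): a beta distribution rescaled to
# [a, b] with its mode at c (PERT-style shape parameters; assumed for illustration).
from scipy.stats import beta

def three_point(a, c, b, lam=4.0):
    """Frozen four-parameter beta distribution on [a, b] with mode c."""
    alpha = 1.0 + lam * (c - a) / (b - a)
    bet = 1.0 + lam * (b - c) / (b - a)
    return beta(alpha, bet, loc=a, scale=b - a)

# Belief over a probability judged to lie in [0.02, 0.10] with 0.05 most likely.
dist = three_point(0.02, 0.05, 0.10)
print(dist.mean(), dist.interval(0.9))  # centre of mass and a 90% belief interval
```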

For values, the generalization from a triangle to a trapezoid is analogous. We will utilize either a two-point distribution (uniform, trapezoidal) or a three-point distribution (triangular). When there is large uncertainty regarding the underlying belief distribution over values and we have no reason to make more specific assumptions, a two-point distribution modelling upper and lower bounds (the uniform or trapezoid distribution) is preferred. On the other hand, when the modal outcome can be estimated, the beliefs are more congenially represented by three-point distributions. Because triangular distributions are less centre-weighted than other three-point distributions, the risk of underestimation is smaller, and there is thus no particular reason to use any other three-point distribution for real-life decision purposes.

7 Evaluation Steps

We will use a generalization of the ordinary expected value for the evaluation—i.e. the resulting distribution over the generalized expected utility is

$$ E\left({A}_i\right)=\sum \limits_{i_1=1}^{n_{i_0}}{w}_{i{i}_1}\ \sum \limits_{i_2=1}^{n_{i_1}}{w}_{i{i}_1{i}_2}\dots \sum \limits_{i_{m-1}=1}^{n_{i_{m-2}}}{p}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}}\ \sum \limits_{i_m=1}^{n_{i_{m-1}}}{p}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}{i}_m}{v}_{i{i}_1{i}_2\dots {i}_{m-2}{i}_{m-1}{i}_m}, $$

given the belief distributions over the random variables w, p and v. There are only two operations of relevance here: multiplication and addition.

Let G be a distribution over the two cubes A and B. Assume that G has positive support on the feasible distributions at level i in a general decision tree, as well as on the feasible probability distributions of the children of a node x_ij, and assume that f(x) and g(y) are the marginal distributions of G(z) on A and B, respectively. Then the cumulative distribution of the product of the two belief distributions is

$$ \mathrm{H}\left(\mathrm{z}\right)=\underset{\Gamma_z}{\iint }f(x)g(y)\, dx\, dy={\int}_0^1{\int}_0^{z/x}f(x)g(y)\, dy\, dx={\int}_z^1f(x)G\left(z/x\right)\, dx, $$

where G is a primitive function of g, Γ_z = {(x,y) | x·y ≤ z}, and 0 ≤ z ≤ 1.

Let h(z) be the corresponding density function. Then

$$ \mathrm{h}\left(\mathrm{z}\right)=\frac{d}{dz}{\int}_z^1f(x)G\left(z/x\right)\, dx={\int}_z^1\frac{f(x)g\left(z/x\right)}{x}\, dx. $$

The addition of the products is the standard convolution of two densities restricted to the cubes. The distribution h on a sum z = x + y associated with the belief distributions f(x) and g(y) is therefore given by

$$ \mathrm{h}\left(\mathrm{z}\right)={\int}_0^zf(x)g\left(z-x\right)\, dx. $$

By repeated application of these two operations, we obtain the combined distribution over the generalized expected utility.
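The combined distribution can also be approximated by simulation rather than by the analytic product and convolution rules above. The following Monte Carlo sketch, with illustrative belief distributions that are not taken from the paper, samples weights, probabilities and values and pushes each sample through the expected-value expression.

```python
# Monte Carlo approximation of the distribution over the expected value of one
# alternative with two criteria; all distributions below are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

w1 = rng.triangular(0.3, 0.5, 0.7, N); w2 = 1.0 - w1           # normalized weights
p = rng.triangular(0.1, 0.2, 0.4, N)                           # event probability
v1 = rng.uniform(0.6, 0.9, N); v2 = rng.uniform(0.2, 0.5, N)   # consequence values
v3 = rng.triangular(0.4, 0.6, 0.8, N)                          # value under criterion 2

E = w1 * (p * v1 + (1 - p) * v2) + w2 * v3
print(E.mean(), np.percentile(E, [5, 50, 95]))                 # skewed, bell-like shape
```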

As in most risk and decision theory, we assume that a large number of events will occur and that a large number of decisions will be made. In business administration, this is called the principle of going concern. In such an operating environment, the expected value becomes a reasonable decision rule and, at the same time, the belief distributions over the expected values tend to normal distributions or similar. However, the resulting distributions will be normal only when the original distributions are symmetrical, which is usually not the case for beta and triangular distributions. The result will then instead be skew-normal. Thus, we use a truncated skew-normal distribution, which generalizes the normal distribution by allowing for non-zero skewness and truncated tails. We can then conveniently represent truncated (skew-)normal distributions as probability distributions of (skew-)normally distributed random variables that are bounded. Assume that a random variable X is normally distributed and restricted to the interval (a, b). Then X, a < X < b, has a truncated normal distribution, and its probability density function is given by a four-parameter expression that tends to normality as the interval is widened (see, for instance, Loeve, 1977).
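One simple way to realize such a truncated skew-normal density numerically, not necessarily identical to the four-parameter expression referred to above, is to renormalize a skew-normal density over the truncation interval:

```python
# Truncated skew-normal density: scipy's skew-normal renormalized over (a, b).
import numpy as np
from scipy.stats import skewnorm

def truncated_skewnorm_pdf(x, shape, loc, scale, a, b):
    """Density of a skew-normal random variable restricted to (a, b)."""
    mass = skewnorm.cdf(b, shape, loc, scale) - skewnorm.cdf(a, shape, loc, scale)
    inside = (x > a) & (x < b)
    return np.where(inside, skewnorm.pdf(x, shape, loc, scale) / mass, 0.0)

xs = np.linspace(-1.0, 3.0, 9)
print(truncated_skewnorm_pdf(xs, shape=4.0, loc=0.0, scale=1.0, a=-0.5, b=2.5))
```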

8 Real-Life Decision Example

In the following, we will illustrate the approach with an example derived from a real-life decision problem. The example is modelled and evaluated using the DecideIT tool version 3.0, which, among other features, implements the above approach to handling strong uncertainty. Consider a pulp mill company that wishes to evaluate whether to rebuild or possibly exchange its recovery boiler. The decision problem is viewed as two sequential decisions. The first decision concerns to what extent the boiler will be enhanced, and three different alternatives are considered: (1) do nothing; (2) rebuild the boiler in order to secure deliveries; and (3) replace the existing recovery boiler.

The second decision concerns what to do with the power turbine that exploits the pressure from the boiler to produce electricity (since a new boiler allows for more powerful turbines). Furthermore, the existing turbine would need to be revised within one year were it not replaced. For this sub-decision, four alternatives were evaluated: (1) revise and use the existing turbine; (2) replace it with a smaller 70 kg/s turbine; (3) replace it with a bigger 80 kg/s turbine; and (4) replace it with a 100 kg/s turbine (which is only feasible with a new boiler).

The alternatives are evaluated based on the following set of criteria:

  • Cr. 1. Discounted cash flow with weight variable w_1. Assessed on a monetary scale.

  • Cr. 2. Initial cash drain with weight variable w_2. Assessed on a value scale [−10, 0].

  • Cr. 3. Internal environment with weight variable w_3. Assessed using comparisons.

  • Cr. 4. External environment with weight variable w_4. Assessed using comparisons.

  • Cr. 5. Delivery dependability with weight variable w_5. Assessed using comparisons.

  • Cr. 6. Room for a production increase with weight variable w_6. Assessed on a value scale [0, 10].

The criteria weights were provided as comparisons:

$$ {\displaystyle \begin{array}{l}{w}_1-{w}_2=0\\ {}{w}_2-{w}_5>0\\ {}{w}_5-{w}_6=0\\ {}{w}_6-{w}_4>0\\ {}{w}_4-{w}_3=0\end{array}} $$

This constraint set essentially says that Cr. 1 and Cr. 2 are most and equally important, followed by Cr. 5 and Cr. 6, which are of equal importance and, in turn, more important than Cr. 3 and Cr. 4, also being of equal importance. The resulting orthogonal weight hull for each criterion is shown within brackets in Fig. 4. Cr. 1 is connected to a decision tree shown in Fig. 5 according to the approach in Larsson et al. (2005).
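As a sketch of what such a hull computation involves, the weight comparisons above can be fed to a linear programming solver, with the strict inequalities replaced by a small margin eps. This is an illustration only, not DecideIT's algorithm, and the resulting intervals need not coincide exactly with those shown in Fig. 4.

```python
# Hull of the criteria weights implied by the comparisons above (illustrative).
import numpy as np
from scipy.optimize import linprog

eps = 1e-3                       # margin standing in for the strict inequalities
A_eq = np.array([
    [1, -1, 0, 0, 0, 0],         # w1 - w2 = 0
    [0, 0, 0, 0, 1, -1],         # w5 - w6 = 0
    [0, 0, -1, 1, 0, 0],         # w4 - w3 = 0
    [1, 1, 1, 1, 1, 1],          # normalization: weights sum to one
])
b_eq = np.array([0, 0, 0, 1])
A_ub = np.array([
    [0, -1, 0, 0, 1, 0],         # w5 - w2 <= -eps  (i.e. w2 > w5)
    [0, 0, 0, 1, 0, -1],         # w4 - w6 <= -eps  (i.e. w6 > w4)
])
b_ub = np.array([-eps, -eps])

for i in range(6):
    c = np.zeros(6); c[i] = 1.0
    lo = linprog(c,  A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hi = -linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    print(f"w{i+1}: [{lo:.3f}, {hi:.3f}]")
```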

Fig. 4 The criteria tree in DecideIT

Fig. 5 Decision tree for the discounted cash flow criterion

Cr. 1 was assessed through discounted cash flow analysis (EBITA), using a risk-free discount rate with a ten-year time frame, providing a net present value for each consequence node C1 to C17 in Fig. 5, where the interval value of each consequence is shown within brackets in kSEK. The cash flows were based upon unit margins of paper production and power production, together with the estimated annual production. Since the estimates were uncertain, interval statements were used. This way of modelling risk in discounted cash flow analysis can be labelled risk-adjusted net present value, since the risk is modelled by means of probabilities for different consequences, each associated with a net present value, as opposed to incorporating risk in the discount rate (see Aven, 2011).

For the first boiler alternative, keeping the boiler, there was an initial sub-decision regarding a choice between a new 70 kg/s turbine, a new 80 kg/s turbine, or keeping the existing turbine but with more frequent revisions. The chance nodes in the tree reflect whether or not the old turbine will break down during its final year of operation. The probability of the existing turbine breaking down while awaiting a new turbine was assessed to lie within the interval [2%, 10%]. For the second alternative, the action of rebuilding the boiler can either be done to secure the deliveries only or to additionally enable increased power production by utilizing a more powerful turbine.

For the third alternative, acquiring a new recovery boiler together with a new 100 kg/s turbine, the existing turbine needed to be in use for two years instead of only one year due to increased planning and installation time. This resulted in the breakdown probability of the old turbine being estimated to be higher compared to the other two boiler alternatives, at [10%, 20%], which is the probability for consequence C16 in Fig. 5. The discounted cash flow analysis strongly supports the alternative of enabling increased power production if rebuilding the boiler (Table 1).

Table 1 Alternative values or rankings per criterion. Interval values within brackets

For the above multi-criteria decision problem modelled in DecideIT, the main decision evaluation window is shown in Fig. 6, consisting of bar charts of stacked centroid part-worth values for the criteria for each alternative. The part-worth value φ_il for alternative A_i under criterion l is simply given by φ_il = cw_l · cv_il, where cw_l and cv_il are the centroid weight of criterion l and the centroid value of alternative A_i under criterion l. The height of each bar is then the sum φ_i1 + φ_i2 + … + φ_in, where n is the number of direct sub-criteria.
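As a small illustration of how the stacked bars are composed, the following sketch computes part-worth values from centroid weights and centroid values; the numbers are purely illustrative and not the values from the example.

```python
# Stacked part-worth values phi_il = cw_l * cv_il (all numbers illustrative).
cw = [0.25, 0.25, 0.10, 0.10, 0.15, 0.15]   # centroid criteria weights (sum to 1)
cv = {                                      # centroid values per alternative
    "Alt. 1": [0.55, 0.70, 0.40, 0.50, 0.60, 0.30],
    "Alt. 2": [0.45, 0.60, 0.50, 0.45, 0.55, 0.40],
    "Alt. 3": [0.70, 0.40, 0.65, 0.60, 0.75, 0.80],
}
for alt, values in cv.items():
    parts = [w * v for w, v in zip(cw, values)]
    print(alt, [round(p, 3) for p in parts], "total:", round(sum(parts), 3))
```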

Fig. 6 Main decision evaluation result

In addition, the results of an embedded a priori sensitivity analysis are presented in the main evaluation window as a table of pairwise comparisons between all three alternatives, carried out according to the approach above, enabling investigation of the belief support for the given ranking, with Alt. 3 as the most preferred alternative, followed by Alt. 1. In this way, the evaluation windows provide an informative decision evaluation in the presence of strong uncertainty. The main outcome is the resulting belief distribution of the combined input belief distributions over the expression E(Alt. 3) − E(Alt. 1); the support for this expression being positive is 89%. It would thus be unreasonable to select Alt. 1 over Alt. 3.

9 Concluding Remarks

In classic decision theory, a decision-maker is expected to assign precise numerical values to the different decision components, such as weights, probabilities and values. In real-life problems, however, this requirement is often too strong, and some less demanding representation and evaluation mechanism is needed. Many candidates have been suggested, such as sets of probability measures, upper and lower probabilities, and interval weights, probabilities and utilities, enabling a more realistic representation of the input statements. In these contexts, higher-order analyses can add information, enabling further discrimination between alternatives. Decision trees can still be utilized to represent the decision structure, where the various estimates can be expressed as intervals and qualitative assessments. However, much is gained by enhancing this with an evaluation method based on a belief mass interpretation of the various data. We have discussed here how multi-criteria and probabilistic trees can be viewed in an integrated framework, as well as the effects of employing second-order information in decision trees. We have also demonstrated an implementation of the theory on a real-life decision problem, showing how the multiplicative and additive effects strongly influence the resulting distribution over the expected values. The result is a method that offers considerably more discriminative power when selecting among alternative options.