
1 Introduction

A problem with many Multi-Criteria Decision Making (MCDM) models is that numerically precise information is often lacking in real life, and it is hence difficult for a decision-maker to enter realistic input data into a model. There is, therefore, a perceived need for relaxing the demand for precise judgments in order to model decision problems more realistically. See, for instance, (Park 2004; Larsson et al. 2014) among others. Solutions to such problems can be very hard to find and the results can be difficult to interpret. Well-known methods for approaching this problem are based on, e.g., sets of probability measures, upper and lower probabilities as well as interval probabilities and utilities (Coolen and Utkin 2008), fuzzy measures (Aven and Zio 2011; Shapiro and Koissi 2015; Tang et al. 2018), as well as evidence and possibility theory, cf., e.g., (Dubois 2010; Dutta et al. 2018; Rohmer and Baudrit 2010), just to mention a few. Other approaches include second-order methods (Ekenberg et al. 2014; Danielson et al. 2007, 2019) and modifications of classical decision rules, cf. (Ahn and Park 2008; Sarabando and Dias 2009; Aguayo et al. 2014; Mateos et al. 2014). Regarding MCDM problems, Salo, Hämäläinen, and others have suggested methods for handling imprecise information, for instance the PRIME method (Salo and Hämäläinen 2001) with various implementations thereof, see e.g. (Mustajoki et al. 2005b). Several other models focus on preference intensities, such as the MACBETH method (Bana e Costa et al. 2002), a variety of ROC approaches, such as (Sarabando and Dias 2010), or Simos's method and variants thereof (Figueira and Roy 2002). Furthermore, there are smart swaps methods, such as (Mustajoki and Hämäläinen 2005a). Mixes of the above techniques are also common, as in Jiménez et al. (2006).

A major problem is combining interval and qualitative estimates without introducing evaluation measures like Γ-maximin or (Levi’s) E-admissibility, cf., e.g., (Augustin et al. 2014). Greco et al. (2008) suggest the UTA\(^{\text{GMS}}\) methodology for purposes similar to ours. By using an ordinal regression technique, they can form a representation based on a set of pairwise comparisons. This is generalised in Figueira et al. (2009) by introducing cardinalities to obtain a class of total preference functions compatible with user assessments. However, this is less suitable for our purposes since it is unclear how interval constraints can be handled in combination with the extracted preference functions without encountering the computational difficulties discussed in, e.g., (Danielson and Ekenberg 2007). Also, structural constraints should be taken into consideration, as already discussed in, e.g., (Ekenberg et al. 2005).

More particularly, this paper discusses a method for criteria weight elicitation that can be applied to any case where automatic weight generation is considered, with the property that weight functions can be elicited while preserving efficiency and correctness. Below, we provide a brief discussion of so-called surrogate weight methods and then propose a reinterpretation of the rank exponent method. Herein, we focus on ordinal information: in many circumstances, only ordinal information is available, which merits the investigation of ordinal weights. Danielson and Ekenberg (2017) investigate how much cardinality contributes over ordinality and demonstrate that weights are much more insensitive to cardinality than values are, which has implications for all ranking methods. We also provide experimental simulations and investigate some properties of the method. Thereafter, the problem of selecting a national strategy for handling the COVID-19 pandemic is discussed. The conclusion is that the method seems to be a very competitive candidate for weight elicitation and evaluation.

2 Rank Ordering Methods

Ordinal methods for generating weights, sometimes with some kind of further discrimination mechanism, constitute a quite commonly used approach to handling the difficulties in eliciting precise criteria weights from decision-makers, cf., e.g., (Stewart 1993; Arbel and Vargas 1993; Barron and Barrett 1996a, 1996b; Katsikopoulos and Fasolo 2006). The decision-maker supplies ordinal information on importance, which is subsequently converted into numerical weights consistent with the ordinal information. Several such methods have been suggested in the literature, e.g., rank sum (RS) weights, rank reciprocal (RR) weights (Stillwell et al. 1981), and centroid (ROC) weights (Barron 1992). Based on simulation experiments, Barron and Barrett (1996b) found ROC weights superior to RS and RR. Danielson and Ekenberg (2014, 2016a, 2016b) have also suggested a spectrum of methods, applied in large-scale contexts such as (Fasth et al. 2020; Komendantova et al. 2018, 2020), some of which are more robust than the earlier suggestions. In these experiments, surrogate weights as well as “true” reference weights are sampled from some underlying distribution. It is then investigated how well the results of using the surrogate weights match the results of using the “true” weights. The method is, however, dependent on the distribution used for generating the weight vectors.

RS is based on the idea that the rank order should be reflected directly in the weights. Given a simplex Sw generated by w1 > w2 > … > wN, where Σwi = 1 and 0 ≤ wi, assign an ordinal number to each item in the ranking, starting with the highest-ranked item as number 1. Let i be the ranking number among N items to rank. RS then becomes

$$ w_{i}^{\text{RS}} = \frac{N + 1 - i}{\sum_{j=1}^{N} (N + 1 - j)} = \frac{2(N + 1 - i)}{N(N + 1)} $$

for all i = 1,…,N.

RR has a similar design to RS but is based on the reciprocals (inverted numbers) of the rank order items. Assign an ordinal number to each item ranked, starting with the highest-ranked item (receiving number 1). Then assign the number i to the i-th item in the ranking to obtain

$$ w_{i}^{\text{RR}} = \frac{1/i}{\sum_{j=1}^{N} 1/j} $$

ROC is a function based on the average of the corners of the polytope defined by the same simplex Sw, i.e. w1 > w2 > … > wN, Σwi = 1, and 0 ≤ wi, where the wi are variables representing the criteria weights. The ROC weights are given by

$$ w_{i}^{\text{ROC}} = \frac{1}{N}\sum_{j=i}^{N} \frac{1}{j} $$

for the ranking number i among N items to rank.
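To make the three schemes concrete, the following is a minimal sketch in Python (our own illustration; the function names and the use of numpy are not from the cited papers) computing RS, RR, and ROC weights from the number of criteria N:

```python
import numpy as np

def rs_weights(N):
    """Rank sum: w_i proportional to N + 1 - i for ranking position i = 1..N."""
    raw = N + 1 - np.arange(1, N + 1)
    return raw / raw.sum()  # equals 2(N + 1 - i) / (N(N + 1))

def rr_weights(N):
    """Rank reciprocal: w_i proportional to 1/i."""
    raw = 1.0 / np.arange(1, N + 1)
    return raw / raw.sum()

def roc_weights(N):
    """Rank order centroid: w_i = (1/N) * sum_{j=i}^{N} 1/j."""
    inv = 1.0 / np.arange(1, N + 1)
    return np.cumsum(inv[::-1])[::-1] / N

print(rs_weights(4).round(3))   # [0.4   0.3   0.2   0.1  ]
print(rr_weights(4).round(3))   # [0.48  0.24  0.16  0.12 ]
print(roc_weights(4).round(3))  # [0.521 0.271 0.146 0.062]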

As a generalization of the RS method discussed above, a rank exponent weight method was introduced by Stillwell et al. (1981). Into the original RS formula, they introduced the exponent z < 1 to yield the rank exponent (RX) weights given by

$$ w_{i}^{\text{RX}(z)} = \frac{(N + 1 - i)^{z}}{\sum_{j=1}^{N} (N + 1 - j)^{z}}. $$

For 0 ≤ z ≤ 1, the parameter z mediates between equal weights (no discrimination between the importance of criteria) and RS weights: for z = 0 the formula in effect yields equal weights, and for z = 1 it yields RS weights. Thus, for these values of z, the RX(z) formula is the (exponential) combination of equal and RS weights. In this paper, we suggest a reinterpretation of RX. This has, to our knowledge, not been investigated before. Beyond z = 1, the formula becomes something else: a novel weighting scheme in its own right. Earlier, before the accessibility and use of simulations to evaluate different weights, parameters such as the z parameter of RX were considered hard to estimate and thus less suitable for real-life decisions. In this work, we examine the potential of RX(z) in detail and compare it to established state-of-the-art weights such as RS, RR, and ROC.
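Under the same conventions, a sketch of RX(z) (again our own illustrative code) shows how z = 0 gives equal weights, z = 1 reproduces RS, and z > 1 gives the steeper profiles studied below:

```python
import numpy as np

def rx_weights(N, z):
    """Rank exponent: w_i proportional to (N + 1 - i)^z for ranking position i = 1..N."""
    raw = (N + 1.0 - np.arange(1, N + 1)) ** z
    return raw / raw.sum()

print(rx_weights(4, 0.0).round(3))  # [0.25  0.25  0.25  0.25 ] -- equal weights
print(rx_weights(4, 1.0).round(3))  # [0.4   0.3   0.2   0.1  ] -- identical to RS
print(rx_weights(4, 1.5).round(3))  # [0.47  0.305 0.166 0.059] -- steeper than RS
```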

3 Assessing Automatically Generated Weights

There are basically two categories of elicitation models in use, depending on the degrees of freedom (DoF) present when decision-makers assign their weights. In point allocation (PA), decision-makers are given a point sum, e.g. 100, to distribute among N criteria; there are consequently N−1 degrees of freedom. In the Direct Rating (DR) way of assigning weights, on the other hand, decision-makers have no limitation on the point sum they are allowed to use, and may thus allocate as many points as desired. Only thereafter are the points normalized, i.e., in DR there are N degrees of freedom. Consequently, when generating weight vectors in an N−1 DoF model, they must sum to 100%, and when generating vectors for an N DoF model, each component is generated within [0%, 100%] and the vector is thereafter normalised. Other distributions would of course, at least theoretically, be possible, but it is important to remember that the validation methods depend strongly on these assumptions, which in turn affect the validation results. Different decision-makers use different mental strategies and models when weighting criteria. Thus, a reasonable weighting scheme must be able to perform well in both PA and DR cases, i.e. regardless of the degrees of freedom being N−1 or N (Danielson and Ekenberg 2019).

3.1 Experimental Setup

The experiments below for the N−1 DoF model were based on a homogeneous N-variate Dirichlet distribution generator, and a standard round-robin normalised random weight generator was used for the N DoF experiments. We call the N−1 DoF type of generator an N1-generator and the N DoF type an N-generator. Details of the simulation generators are given in (Danielson and Ekenberg 2014).
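As an illustration, the two generator types could be sketched as follows (this is our reading of the setup; the exact round-robin scheme in the cited paper may differ in detail):

```python
import numpy as np

rng = np.random.default_rng()

def n1_generator(N):
    """N-1 DoF: a weight vector summing to 1, drawn from a flat (homogeneous) Dirichlet."""
    return rng.dirichlet(np.ones(N))

def n_generator(N):
    """N DoF: each component drawn independently from [0, 1], then normalised."""
    raw = rng.uniform(size=N)
    return raw / raw.sum()
```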

The simulation experiment covered four numbers of criteria, N = {3, 6, 9, 12}, and five numbers of alternatives, M = {3, 6, 9, 12, 15}, i.e. a total of 20 simulation scenarios. These simulation sets were selected to cover the most common sizes of decision problems. The behaviour with large decision problems is not within the scope of this article. Each scenario was run 10 times with 10,000 trials each, yielding a total of 2,000,000 decision situations. Unscaled value vectors were generated uniformly, and no significant differences were observed with other value distributions. The results of the simulations are shown in the tables below, where we show a subset of the results for chosen pairs (N, M).

The “winner frequency” in the tables refers to the fraction of cases where the best alternative was correctly predicted. Other measurements include the “podium frequency”, where the three best alternatives are correctly predicted, and the “total frequency”, where the positions of all alternatives are correctly predicted. The latter two measurements showed the same pattern across the weighting methods as the winner frequency and are thus not presented here, since they would not add to the discussion.
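A condensed sketch of one simulation experiment as we read the setup (hypothetical helper names, reusing the generators above and a surrogate such as rx_weights): sample “true” weights, let the surrogate method see only their ranking, and check whether both weightings select the same winner.

```python
import numpy as np
# Relies on n1_generator / n_generator from the sketch in Sect. 3.1
# and on a surrogate function such as rx_weights from Sect. 2.

def winner_frequency(surrogate, N, M, trials=10_000, generator=None):
    """Fraction of trials in which the surrogate weights pick the same winner
    as the 'true' weights."""
    rng = np.random.default_rng()
    generator = generator or n1_generator          # default: the N-1 DoF generator
    hits = 0
    for _ in range(trials):
        true_w = np.sort(generator(N))[::-1]       # "true" weights, sorted descending
        surr_w = surrogate(N)                      # surrogate sees only the ranking 1..N
        values = rng.uniform(size=(M, N))          # unscaled value vectors, uniform
        hits += int(np.argmax(values @ true_w) == np.argmax(values @ surr_w))
    return hits / trials
```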

The first set of tables shows the winner frequency for the RX(z) family of methods, and the second set shows the winner frequency for the older ROC, RR, and RS methods together with selected RX(z) methods. Both sets of tables utilise the simulation methods N−1 DoF, N DoF, and an equal combination of N−1 and N DoF. All hit ratios in all tables are given in percent and are mean values over the 10 scenario runs. By hit ratio is meant the fraction of times that the correct winner is predicted.

The first set of studies concerns the parameter z of the RX(z) method. Recall that z = 1 is the same as the RS method studied previously, which is used as one of the comparisons in the next set of tables. For values 0 ≤ z ≤ 1, which yield a combination of RS and equal weights, the algorithm underperforms compared to already known algorithms. This is easily understood since equal weights is a very weak weighting scheme, as it does not take any information on the decision situation into account. Thus, this study focuses on parameters z > 1.

3.2 Results

In Table 1, using an N−1-generator, it can be seen that higher parameter values tend to outperform the others when looking at the winner. In Table 2, the frequencies have changed according to expectation, since we here employ a model with N degrees of freedom. Now lower parameter values (lower being closer to RS) outperform higher ones, but by a much smaller margin. In Table 3, the N and N−1 DoF models are combined with equal emphasis on both. We can now see that, in total, medium-sized parameters generally perform the best. While Stillwell et al. (1981) discussed z < 1, it is evident from the formula that such values cannot outperform RS (which is z = 1), since they mediate between RS and equal weights, the latter being the worst performer as it takes no information into account. Thus, in this experiment we varied z from 1 (RS) upwards in steps of 0.1 until the performance declined. The best performances for the different sizes were always found in the interval [1.1, 1.6]. This gives guidance for selecting the best z given the problem size.

Table 1. Hit ratio for predicting winners using an N−1-generator
Table 2. Hit ratio for predicting winners using an N-generator
Table 3. Hit ratio for predicting winners using N and N−1 DoF generators combined

It is clear from Table 3 that parameters z ∈ [1.3, 1.5] are the best performers, but that the whole range [1.2, 1.6] performs well. Since we do not know exactly what goes on inside a particular decision-maker’s head when giving input to a decision situation, it is not wise to rely on a weight function that performs well on only one side of the dimensionality spectrum above. Instead, we consider the mix of N and N−1 dimensions to constitute the most valid measurement of a viable automatically generated weighting scheme.
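The scan over z described above could be reproduced with a loop of the following kind (a sketch reusing the rx_weights and winner_frequency functions from the earlier sketches; the problem size is one example pair):

```python
import numpy as np

# Scan z from 1.0 upwards in steps of 0.1 for one problem size (here N = 6, M = 9).
for z in np.arange(1.0, 2.05, 0.1):
    hit = winner_frequency(lambda N, z=z: rx_weights(N, z), N=6, M=9)
    print(f"z = {z:.1f}: hit ratio = {hit:.1%}")
```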

In line with that reasoning, we would also like to minimise the spread between the dimensions, i.e. a generating function that differs less between the two end-points of the input dimensionality scale is preferred to one with a larger spread. To that effect, in addition to studying the overall hit ratio, we also studied the spread of the results from the different dimensionalities. This is shown in Table 4 for the different z parameters of RX(z). Now a quite different picture emerges. While all parameters z ∈ [1.2, 1.6] perform well overall, it is clear that higher z keeps the spread down, especially for problems of larger size. Since this is a highly desirable property, given that we do not know the thinking process of a particular decision-maker, we tend to favour higher z parameters for their robustness as long as they perform well overall. For comparisons with current well-known weight functions, we select both RX(1.5) and RX(1.6).

Table 4. Spread between hit ratio for predicting winners using N and N−1 DoF generators
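One plausible reading of the spread measure, sketched on top of the functions above (our interpretation of how a table like Table 4 could be computed, not the authors' code): the absolute difference between the hit ratios obtained under the two generator assumptions.

```python
def spread(surrogate, N, M):
    """Absolute difference in hit ratio between the N-1 DoF and N DoF generators."""
    return abs(winner_frequency(surrogate, N, M, generator=n1_generator)
               - winner_frequency(surrogate, N, M, generator=n_generator))
```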

3.3 Comparing with Earlier State-of-the-Art Weights

In (Danielson and Ekenberg 2014), previous classic weighting functions were compared. Here, these results are repeated together with the new results for RX. The latter is represented by RX(1.5) and RX(1.6) which achieved the best results above. In Table 5, using an N−1-generator, it can be seen that ROC outperforms the other classical ones when looking at the winner. RR is better than RS (which is RX(1.0)). In Table 6, the frequencies have changed according to expectation since we employ a model with N degrees of freedom. Now RS outperforms the others including RX while ROC and RR are far behind. In Table 7, the N and N−1 DoF models are combined with equal emphasis on both. Now, we can see that in total RX generally performs the best.

Table 5. Hit ratio for predicting winners using an N−1-generator
Table 6. Hit ratio for predicting winners using an N-generator
Table 7. Hit ratio for predicting winners using N and N−1 DoF generators combined
Table 8. Spread between hit ratio for predicting winners using N and N−1 DoF generators

It is clear from studying the resulting tables that the RX family of automatic weight functions easily outperforms the more well-known functions, provided that it is possible to select the z parameter in an informed manner. The picture becomes even clearer once the spread between different decision-maker ways of thinking is taken into consideration.

None of the other studied state-of-the-art functions performs well under varying conditions, while the RX(z) family is able to do so. Somewhat higher z parameters in particular perform very well, making parameter selection a trade-off between pure performance and robustness. Our suggestion is to use z ∈ [1.5, 1.6] as the optimal compromise for the parameter. As was seen in Table 4, lower z-values lead to less robustness with respect to decision-maker styles of reasoning. With a higher parameter, the RX(z) family by far outperforms the earlier known ROC, RS, and RR weighting schemes.

4 Example

In the current outbreak of the COVID-19 pandemic, several nation-states seem to have been less than fully prepared. Where strategic plans existed, they were often either incomplete or not followed. In some cases, the supply of resources was not sufficient to sustain the outbreak over time. Further, cognitive and behavioural biases seem to have played a significant role in the decision-making processes regarding which risk mitigation and management measures to implement. Many countries were to a large extent unprepared for such a scenario, despite predictions of a significant probability of a pandemic occurring in the foreseeable future, and the national governments of several countries often acted in an uncoordinated manner, which has resulted in suboptimal responses from national bodies. The current discourse has had a strong emphasis on the number of direct fatalities, while there is a multitude of other relevant aspects of the current crisis. In this example, we briefly discuss what a more general framework, including epidemiological and socio-economic factors, could look like, using a model for evaluating the qualitative and quantitative aspects involved.

A detailed account of all the relevant aspects is beyond the scope of this article; for demonstrational purposes only, we use just a few possible options and criteria for a national policy, with four levels of restrictions suggested to be imposed on the population of a country affected by COVID-19.

Some examples of possible mitigation strategies could then be:

  1. An unmitigated response
  2. Response by pharmaceutical measures and case isolation, public communication encouraging increased hygiene and personal protection.
  3. 2 + additional personal protective measures and mild social distancing measures.
  4. 3 + self-selected social distancing and comprehensive contact tracing and publicly disclosed detailed location information of individuals that tested positive for COVID-19.

We use the following four criteria:

  1. Number of cases (including critical, severe and mild)
  2. Economic aspects
  3. Human rights violations
  4. Effects on education

The estimates (for demonstrational purposes only) of the values of each response level under each criterion are shown in Table 9 below.

Table 9. The valuation of strategies under the respective criteria

We need to calibrate the different scales since they are of very different character. In this example, we assume that:

  • The maximum difference between Str.1 and Str.4 in Cases is more important than the maximum difference between Str.1 and Str.4 in Economy.

  • The maximum difference between Str.1 and Str.4 in Economy is more important than the maximum difference between Str.1 and Str.4 in Human rights.

  • The maximum difference between Str.1 and Str.4 in Human rights is more important than the maximum difference between Str.1 and Str.4 in Education.

The resulting criteria ranking then becomes the following: The importance of Cases is higher than that of Economy, which in turn is more important than Human rights. Further, Human rights is more important than Education. This ranking is then represented by the RX(1.5) weight generating algorithm. The weights (using the enumeration above) then become w(1) = 0.470, w(2) = 0.305, w(3) = 0.166, and w(4) = 0.059 respectively.
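These numbers follow directly from the RX formula: with N = 4 and z = 1.5, the unnormalised weights are 4^1.5 = 8, 3^1.5 ≈ 5.196, 2^1.5 ≈ 2.828, and 1^1.5 = 1, which sum to ≈ 17.02 and normalise to the figures above. A quick check (our own code, analogous to the rx_weights sketch in Sect. 2):

```python
import numpy as np

# RX(1.5) for N = 4: unnormalised weights (N + 1 - i)^1.5 for i = 1..4
w = (4 + 1.0 - np.arange(1, 5)) ** 1.5
w /= w.sum()
print(w.round(3))  # [0.47  0.305 0.166 0.059] -> Cases, Economy, Human rights, Education
```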

The generated weights, together with the estimates of the values of each response strategy, can then be evaluated by solving successive optimisation problems using the program DecideIT, which employs the RX weights together with algorithms from (Danielson et al. 2019). For the evaluation, belief distributions are generated from the input data (both weights and values) using the algorithms in the program. The value V(Si) of each strategy is then assessed as \( V(S_{i}) = \sum_{j} w_{j} \cdot v_{ij} \) over all weight and value variables involved. The result can be seen in Fig. 1, where Str.1 is found to be the best option for a policy given the background information used herein. The strategy values V(Si) are shown at the top of the evaluation bars. The coloured parts are the contributions from each criterion.

Fig. 1. The result of the analysis. (Color figure online)

Without going into details, Fig. 1 shows that, given the background information, the higher the bar representing a strategy, the better that strategy is. We can also see the robustness of the result from the colour markings. A green square means that there is a significant difference between the strategies and that large changes in the input data would be required for the result to change. A yellow square means that there is still a significant difference, but that the result is more sensitive to the input data. A black square means that there is no significant difference between the strategies. An extended explanation of the semantics of the bars and the colour markings is provided in (Danielson et al. 2019). Str.1 is thus the best option in this example. Furthermore, this result is quite robust. It is followed by Str.2, which is quite similar to Str.4, but better than Str.3.

5 Conclusions

This paper aims to define and test a robust multi-criteria weight generating method covering a broad set of decision situations, while still being reasonably simple to use. In the analyses, we have investigated the average hit rate in percent over the pairs (N, M) of numbers of criteria and alternatives. From Tables 7 and 8 concerning generated weight performances, RX(1.5) and RX(1.6) are found to be the best candidates for representing weights when searching for an optimal alternative. The other weight generation methods are clearly inferior. In particular, ROC is heavily biased by the assumption that decision-makers have a decision process with N−1 degrees of freedom considering N criteria, while a reasonable requirement on a robust rank ordering method is that it should provide adequate alternative rankings under the varying assumptions that we have little real-life knowledge about. It is thus clear that the RX family of methods generates the most efficient and robust weights and works very well on different problem sizes. Furthermore, it is stable under varying assumptions regarding the decision-makers’ mindset and internal modelling. As further research, the obvious next step and extension of the observations in this paper is to find a configuration function that assigns different parameter values to problems of different sizes. This would further increase the efficiency of the RX family of automatic weight functions over its previous competitors.