1 Introduction

In machine learning, and more precisely in classification problems, it is very common to apply different algorithms to many benchmarks several times. Normally, the performance of the algorithms is analyzed using the mean and the standard deviation of some known metric, such as the classification accuracy. Next, a difficult question arises: how can these algorithms be compared effectively? The first answer is to use statistical tests, parametric and/or nonparametric, which can detect whether there are differences between the performances of the algorithms [4, 5]. One problem remains: if there are differences, which algorithm is the best, which is the second best, and which is the worst? Using nonparametric statistical tests, it is necessary to make pairwise and multiple comparisons among the algorithms. Obviously, the number of tests required increases greatly with the number of algorithms being analyzed. This is problematic, firstly because of the tiresome work of comparing each pair of algorithms and, secondly and more importantly, because the probability of making a mistake increases. In addition, these tests may present limitations, since they consider only the means and not the standard deviations of the obtained results.

Over the past few years, some approaches have been proposed to rank classification algorithms. Brazdil and Soares [2] presented three methods to generate rankings of classification algorithms. However, these methods are not robust and their results sometimes do not agree with the statistical tests. Peng et al. [13] developed a decision-making framework to rank classification algorithms. Nonetheless, this framework does not consider the standard deviation of the algorithms' performance, and the authors do not compare their methods with the statistical tests. Kotthoff [10] investigated ranking approaches to select the most appropriate algorithm for solving a particular problem. In this case, his goal is to tackle the Algorithm Selection Problem [14], which is slightly different from ranking algorithms based on their performance on different benchmarks.

Recently, Krohling et al. [9] presented a new approach to support the selection of the best algorithms by using the Hellinger distance [11]. This approach, called Hellinger-TOPSIS, provides a rank order of the algorithms in an easy and direct way, using the mean and the standard deviation of the algorithms' performance. However, the Hellinger-TOPSIS presents some shortcomings. Firstly, it gives the mean and the standard deviation of the performance the same importance, whereas the mean is usually more important than the standard deviation; the Hellinger-TOPSIS offers no way to control the influence of these two parameters. Secondly, if any algorithm in the group is deterministic, i.e., its results are described by the mean alone, and it must be compared with stochastic ones, the Hellinger-TOPSIS cannot handle the case, because the Hellinger distance requires a standard deviation different from zero.

In our previous work, we proposed the A-TOPSIS [8], a new approach that provides a rank order of evolutionary algorithms in cases where the performance of the algorithms is expressed in terms of means and standard deviations. In this work, we extend our previous approach by providing an in-depth investigation of two case studies for classification problems. In addition, we develop an easy-to-use web framework for A-TOPSIS. The remainder of this paper is organized as follows: Sect. 2 presents a background in decision-making and in the TOPSIS. In Sect. 3, we present the approach based on TOPSIS to deal with a decision matrix consisting of the performance of algorithms in terms of means and standard deviations, and we briefly describe the web framework we developed. In Sect. 4, we present simulation results for two case studies involving the classification task in order to illustrate the suitability of the presented approach. In Sect. 5, conclusions and directions for future work are given.

2 Background in Decision-Making and TOPSIS

The Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), developed by Hwang and Yoon [6], evaluates the performance of alternatives through their similarity to the ideal solution. According to this technique, the best alternative is the one that is closest to the positive-ideal solution and farthest from the negative-ideal solution. The positive-ideal solution is the one that maximizes the benefit criteria and minimizes the cost criteria, while the negative-ideal solution maximizes the cost criteria and minimizes the benefit criteria. In summary, the positive-ideal solution is composed of all the best values attainable for the criteria, and the negative-ideal solution consists of all the worst values attainable for the criteria. The interested reader may refer to Behzadian et al. [1] for a broad survey of TOPSIS.

Let us consider the decision matrix A, whose rows are the alternatives and whose columns are the criteria, described by:

$$\begin{aligned} A = \begin{pmatrix} x_{11} &{} \ldots &{} x_{1n} \\ \vdots &{} \ddots &{} \vdots \\ x_{m1} &{} \cdots &{} x_{mn} \end{pmatrix} \end{aligned}$$

(1)

where \(A_1, A_2, \ldots, A_m\) are viable alternatives, \(C_1, C_2, \ldots, C_n\) are criteria, and \(x_{ij}\) indicates the rating of the alternative \(A_i\) with respect to criterion \(C_j\). The weight vector \(W=(w_1, w_2, \ldots, w_n)\) is composed of the individual weights \(w_j\ (j=1,\ldots,n)\) for each criterion \(C_j\) and satisfies \(\sum _{j=1}^n w_j = 1\). In general, the criteria can be classified into two types: benefit and cost. For a benefit criterion a higher value is better, while for a cost criterion the opposite holds. The data of the decision matrix A come from different sources, so it is necessary to normalize the matrix to make it dimensionless, which allows the comparison of the various criteria. In this work, we use the normalized decision matrix \(R=[r_{ij}]_{m\times n}\), with \(i=1,\ldots,m\) and \(j=1,\ldots,n\). The normalized value \(r_{ij}\) is calculated as:

$$\begin{aligned} r_{ij} = \frac{x_{ij}}{\sqrt{\sum _{i=1}^m x_{ij}^2}}, \quad i=1,\ldots,m;\ j=1,\ldots,n. \end{aligned}$$
(2)

or

$$\begin{aligned} r_{ij} = \frac{x_{ij}}{\max _i x_{ij}}, \quad i=1,\ldots,m;\ j=1,\ldots,n. \end{aligned}$$
(3)

The normalized decision matrix R represents the relative rating of the alternatives. After normalization, the weighted normalized decision matrix \(P=[p_{ij}]_{m\times n}\), with \(i=1,\ldots,m\) and \(j=1,\ldots,n\), is calculated by multiplying the normalized decision matrix by the associated weights. The weighted normalized value \(p_{ij}\) is calculated as:

$$\begin{aligned} p_{ij} = w_j \cdot r_{ij}, \quad i=1,\ldots,m;\ j=1,\ldots,n. \end{aligned}$$
(4)
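To make the normalization and weighting steps concrete, the following minimal NumPy sketch computes Eqs. (2) and (4) for a small decision matrix; the values of X and w are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical decision matrix: 3 alternatives (rows) x 2 criteria (columns).
X = np.array([[0.90, 0.85],
              [0.80, 0.95],
              [0.70, 0.75]])
w = np.array([0.6, 0.4])  # criterion weights, summing to 1

# Eq. (2): vector normalization, dividing each column by its Euclidean norm.
R = X / np.sqrt((X ** 2).sum(axis=0))

# Eq. (4): weighted normalized decision matrix.
P = w * R
```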

The TOPSIS is described in the following steps [6, 7]:

Step 1 Identify the positive ideal solution \(A^{+}\) and the negative ideal solution \(A^{-}\) as follows:

$$\begin{aligned} A^{+} = \left( p_1^+, p_2^+, \ldots, p_n^+ \right) \end{aligned}$$
(5)
$$\begin{aligned} A^{-} = \left( p_1^-, p_2^-, \ldots, p_n^- \right) \end{aligned}$$
(6)

where \(p_j^+ = \max _i p_{ij}\) for \(j\in J_1\) and \(p_j^+ = \min _i p_{ij}\) for \(j\in J_2\); likewise, \(p_j^- = \min _i p_{ij}\) for \(j\in J_1\) and \(p_j^- = \max _i p_{ij}\) for \(j\in J_2\). Here \(J_1\) and \(J_2\) denote the sets of benefit and cost criteria, respectively.

Step 2 Calculate, for each alternative \(A_i\), the Euclidean distances from the positive ideal solution \(A^{+}\) and from the negative ideal solution \(A^{-}\), respectively, as follows:

$$\begin{aligned} d_i^+ = \sqrt{\sum _{j=1}^n (d_{ij}^+)^2} \end{aligned}$$
(7)
$$\begin{aligned} d_i^- = \sqrt{\sum _{j=1}^n (d_{ij}^-)^2} \end{aligned}$$
(8)

where \(d_{ij}^+ = p_j^+ - p_{ij}\) and \(d_{ij}^- = p_j^- - p_{ij}\), with \(i=1,\ldots,m\) and \(j=1,\ldots,n\).

Step 3 Calculate the relative closeness coefficients \(\xi _i \) for each alternative \(A_i \) with respect to the positive ideal solution as given by:

$$\begin{aligned} \xi _i = \frac{d_i^-}{d_i^+ + d_i^-} \end{aligned}$$
(9)

Step 4 Rank the alternatives according to the relative closeness coefficients. The best alternatives are those with the highest values of \(\xi _i\) and therefore should be chosen.
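The four steps above translate directly into a short vectorized routine. The sketch below is our own illustrative implementation, not the authors' code; the function name and signature are assumptions, and every criterion is treated as a benefit criterion unless a boolean mask says otherwise.

```python
import numpy as np

def topsis(X, w, benefit=None):
    """Return the relative closeness coefficients (Eq. 9) for the
    alternatives in the rows of X, given criterion weights w.

    benefit: boolean mask, True where a higher value is better;
             by default every criterion is a benefit criterion.
    """
    X = np.asarray(X, dtype=float)
    if benefit is None:
        benefit = np.ones(X.shape[1], dtype=bool)

    # Eqs. (2) and (4): vector normalization followed by weighting.
    P = np.asarray(w) * (X / np.sqrt((X ** 2).sum(axis=0)))

    # Step 1, Eqs. (5)-(6): positive and negative ideal solutions.
    a_pos = np.where(benefit, P.max(axis=0), P.min(axis=0))
    a_neg = np.where(benefit, P.min(axis=0), P.max(axis=0))

    # Step 2, Eqs. (7)-(8): Euclidean distances to both ideal solutions.
    d_pos = np.sqrt(((P - a_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((P - a_neg) ** 2).sum(axis=1))

    # Step 3, Eq. (9): relative closeness coefficients.
    return d_neg / (d_pos + d_neg)
```

Ranking the alternatives (Step 4) then amounts to sorting the coefficients in descending order, e.g. `np.argsort(-topsis(X, w))`.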

3 A-TOPSIS: An Approach Based on TOPSIS for Ranking Algorithms

The A-TOPSIS is an approach for ranking algorithms that uses the TOPSIS as a building block. Its main idea is to rank a group of algorithms by using the means and the standard deviations of their performance. Let us consider a group of m algorithms applied to n benchmarks. We can set up a decision matrix using the performance means and standard deviations of each algorithm for each benchmark as follows:

$$\begin{aligned} D = \begin{pmatrix} x_{11} &{} \ldots &{} x_{1n} \\ \vdots &{} \ddots &{} \vdots \\ x_{m1} &{} \cdots &{} x_{mn} \end{pmatrix} = \begin{pmatrix} (\mu _{11},\sigma _{11}) &{} \ldots &{} (\mu _{1n},\sigma _{1n}) \\ \vdots &{} \ddots &{} \vdots \\ (\mu _{m1},\sigma _{m1}) &{} \cdots &{} (\mu _{mn},\sigma _{mn}) \end{pmatrix} \end{aligned}$$

where \(A_1, A_2, \ldots, A_m\) are the alternatives (rows), \(C_1, C_2, \ldots, C_n\) are the criteria (columns), and \(x_{ij}\) indicates the rating of the alternative \(A_i\) with respect to criterion \(C_j\), described in terms of its mean and standard deviation \((\mu _{ij}, \sigma _{ij})\). As we can note, for the A-TOPSIS the alternatives are the algorithms and the criteria are the benchmark problems.

The decision matrix D can be split into two matrices, given by \(D=\left\{ {M_\mu ,M_\sigma } \right\} \):

$$\begin{aligned} M_\mu = \begin{pmatrix} \mu _{11} &{} \ldots &{} \mu _{1n} \\ \vdots &{} \ddots &{} \vdots \\ \mu _{m1} &{} \cdots &{} \mu _{mn} \end{pmatrix} \quad M_\sigma = \begin{pmatrix} \sigma _{11} &{} \ldots &{} \sigma _{1n} \\ \vdots &{} \ddots &{} \vdots \\ \sigma _{m1} &{} \cdots &{} \sigma _{mn} \end{pmatrix} \end{aligned}$$

where \(M_\mu\) and \(M_\sigma\) are the matrices of the means and the standard deviations of the algorithms' performance for each benchmark problem. From these matrices, we developed a new framework built on the TOPSIS, as illustrated in Fig. 1.
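In code, if D is stored as an m x n x 2 array of (mean, standard deviation) pairs, the split is a simple slice; this is only one convenient representation, not one prescribed by the paper.

```python
# D has shape (m, n, 2): D[i, j] = (mu_ij, sigma_ij).
M_mu, M_sigma = D[..., 0], D[..., 1]
```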

Fig. 1 Illustration of the A-TOPSIS: an approach for ranking algorithms in terms of means and standard deviations

3.1 A-TOPSIS Algorithm

Next, we present the proposed framework step by step:

Step 1 Normalize the matrices \(M_\mu\) and \(M_\sigma\).

Step 2 Identify the positive ideal solution \(A^{+}\) and the negative ideal solution \(A^{-}\) for each matrix as follows:

$$\begin{aligned} A^{+} = \left( p_1^+, p_2^+, \ldots, p_n^+ \right) \end{aligned}$$
(10)
$$\begin{aligned} A^{-} = \left( p_1^-, p_2^-, \ldots, p_n^- \right) \end{aligned}$$
(11)

where \(p_j^+ = \max _i p_{ij}\) for \(j\in J_1\) and \(p_j^+ = \min _i p_{ij}\) for \(j\in J_2\); likewise, \(p_j^- = \min _i p_{ij}\) for \(j\in J_1\) and \(p_j^- = \max _i p_{ij}\) for \(j\in J_2\). Here \(J_1\) and \(J_2\) denote the sets of benefit and cost criteria, respectively.

Step 3 Calculate, for each alternative \(A_i\), the Euclidean distances from the positive ideal solution \(A^{+}\) and from the negative ideal solution \(A^{-}\), respectively, as follows:

$$\begin{aligned} d_i^+ = \sqrt{\sum _{j=1}^n (p_j^+ - p_{ij})^2}, \quad i=1,\ldots,m. \end{aligned}$$
(12)
$$\begin{aligned} d_i^- = \sqrt{\sum _{j=1}^n (p_j^- - p_{ij})^2}, \quad i=1,\ldots,m. \end{aligned}$$
(13)

Step 4 Calculate the relative closeness coefficient \(\xi _i\) for each alternative \(A_i\) with respect to the positive ideal solution as:

$$\begin{aligned} \xi _i = \frac{d_i^-}{d_i^+ + d_i^-}, \quad i=1,\ldots,m. \end{aligned}$$
(14)

Step 5 After calculating the closeness-coefficient vector for each of the two decision matrices (\({}^{1}\xi\) for \(M_\mu\) and \({}^{2}\xi\) for \(M_\sigma\)), we obtain a matrix made up of the two vectors of relative closeness coefficients, as given by:

$$\begin{aligned} C = \begin{pmatrix} {}^{1}\xi _1 &{} {}^{2}\xi _1 \\ \vdots &{} \vdots \\ {}^{1}\xi _m &{} {}^{2}\xi _m \end{pmatrix} \end{aligned}$$
(15)

Each of the two vectors is assigned a weight, \(W=(w_1, w_2)=(w_\mu, w_\sigma)\), where \(w_\mu\) and \(w_\sigma\) are the weights assigned to the means and the standard deviations, respectively, and satisfy \(w_\mu + w_\sigma = 1\). One can now obtain the weighted relative-closeness matrix \(C_W\) by applying the importance weights to each relative-closeness vector, as given by:

$$\begin{aligned} C_W = \begin{pmatrix} w_1\,{}^{1}\xi _1 &{} w_2\,{}^{2}\xi _1 \\ \vdots &{} \vdots \\ w_1\,{}^{1}\xi _m &{} w_2\,{}^{2}\xi _m \end{pmatrix} \end{aligned}$$
(16)

From this stage on, the method continues by applying the standard TOPSIS to the resulting matrix \(C_W\) in order to identify the global ranking.

Step 6 Identify the global positive ideal solution \(A_G^+ \) and the global negative ideal solution \(A_G^- \), respectively, as follows:

$$\begin{aligned} A_G^+ = \left( p_{G1}^+, p_{G2}^+ \right) = \left( \max _i {}^{l}\xi _i,\ l\in J_1;\ \min _i {}^{l}\xi _i,\ l\in J_2 \right) \end{aligned}$$
(17)
$$\begin{aligned} A_G^- = \left( p_{G1}^-, p_{G2}^- \right) = \left( \min _i {}^{l}\xi _i,\ l\in J_1;\ \max _i {}^{l}\xi _i,\ l\in J_2 \right) \end{aligned}$$
(18)

where \(J_1\) and \(J_2\) represent the benefit and cost criteria, respectively.

Step 7 Calculate, for each alternative \(A_i\), the distances from the global positive ideal solution \(A_G^+\) and from the global negative ideal solution \(A_G^-\), respectively, as follows:

$$\begin{aligned} d_{Gi}^+ = \sqrt{\sum _{l=1}^2 \left( {}^{l}\xi _i - p_{Gl}^+ \right)^2}, \quad i=1,\ldots,m. \end{aligned}$$
(19)
$$\begin{aligned} d_{Gi}^- = \sqrt{\sum _{l=1}^2 \left( {}^{l}\xi _i - p_{Gl}^- \right)^2}, \quad i=1,\ldots,m. \end{aligned}$$
(20)

Step 8 Calculate the global relative-closeness coefficient \(\xi _{Gi}\) for each alternative \(A_i\) with respect to the global positive ideal solution \(A_G^+\) as:

$$\begin{aligned} \xi _{Gi} = \frac{d_{Gi}^-}{d_{Gi}^- + d_{Gi}^+} \end{aligned}$$
(21)

Step 9 Rank the alternatives according to the global relative closeness coefficients. The best alternatives are those with the highest values of \(\xi _{Gi}\) and therefore should be chosen.
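Putting Steps 1-9 together, a compact sketch of the A-TOPSIS can reuse the `topsis()` function given in Sect. 2. Two details are assumptions on our part, since the paper leaves them implicit: the benchmarks are weighted equally inside each inner TOPSIS run, and the standard-deviation criteria are treated as cost criteria, since a smaller spread is better.

```python
import numpy as np

def a_topsis(M_mu, M_sigma, w_mu=0.7, w_sigma=0.3, higher_mean_better=True):
    """Illustrative A-TOPSIS sketch; returns the global closeness
    coefficients (Eq. 21), one per algorithm."""
    M_mu = np.asarray(M_mu, dtype=float)
    n = M_mu.shape[1]
    bench_w = np.full(n, 1.0 / n)  # assumption: equally weighted benchmarks

    # Steps 1-4: one TOPSIS run per matrix. Accuracy means are benefit
    # criteria (error rates would be cost criteria); smaller standard
    # deviations are always better, hence cost criteria.
    xi_mu = topsis(M_mu, bench_w,
                   benefit=np.full(n, higher_mean_better, dtype=bool))
    xi_sigma = topsis(M_sigma, bench_w, benefit=np.zeros(n, dtype=bool))

    # Step 5, Eqs. (15)-(16): weighted closeness-coefficient matrix.
    C_w = np.column_stack([w_mu * xi_mu, w_sigma * xi_sigma])

    # Steps 6-8, Eqs. (17)-(21): both columns are benefit criteria and the
    # weights were already applied above, so TOPSIS reduces to this.
    a_pos, a_neg = C_w.max(axis=0), C_w.min(axis=0)
    d_pos = np.sqrt(((C_w - a_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((C_w - a_neg) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)  # Step 9: sort descending to rank
```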

3.2 A-TOPSIS Web Framework

In order to encourage researchers and practitioners from different areas of knowledge to use the A-TOPSIS for ranking algorithms, we provide an easy-to-use web framework. As shown in Fig. 2, the user only needs to upload the matrices \(M_\mu\) and \(M_\sigma\) as .csv files and set the weights for each one. The framework then provides the rank as a bar graph along with the values of the closeness coefficients.

Fig. 2 The A-TOPSIS framework

The A-TOPSIS framework can be easily used by accessing the web address http://www.inf.ufes.br/~agcpacheco/alg-ranking/.

4 Simulation Results

In this section, we present two case studies involving classification problems. In order to compare our results, we also apply the Hellinger-TOPSIS to each case. As the Hellinger-TOPSIS cannot handle a standard deviation equal to zero, we set the standard deviation to a very small value in cases where this occurs. Lastly, we used the nonparametric Friedman test followed by the Wilcoxon test as a post hoc analysis, both with a significance level of 0.05, in order to verify the quality of the rank. For more details about the statistical tests performed in this section, the reader may refer to Derrac et al. [4].
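For reference, both tests are available in SciPy. The sketch below shows the testing procedure on a hypothetical accuracy matrix (rows are benchmarks, columns are algorithms), with randomly generated values standing in for real results.

```python
import numpy as np
from scipy import stats

# Hypothetical results: 12 benchmarks (rows) x 4 algorithms (columns).
acc = np.random.default_rng(0).uniform(0.70, 0.95, size=(12, 4))

# Friedman test over the algorithms' per-benchmark performances.
_, p = stats.friedmanchisquare(*acc.T)
print(f"Friedman p-value: {p:.5f}")

# If H0 is rejected, run pairwise Wilcoxon signed-rank tests post hoc.
if p < 0.05:
    m = acc.shape[1]
    for i in range(m):
        for j in range(i + 1, m):
            _, p_ij = stats.wilcoxon(acc[:, i], acc[:, j])
            print(f"algorithm {i} vs {j}: p = {p_ij:.4f}")
```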

4.1 Case Study I

In this case study, we have an ensemble containing four classifiers: a feedforward neural network (FNN), an extreme learning machine (ELM), a discriminative restricted Boltzmann machine (DRBM), and K-nearest neighbors (KNN). In addition, we have three aggregation methodologies: the average of the supports (AVG), majority voting (MV), and the Choquet integral (CHO) [12]. All these classifiers were applied to 12 benchmarks, and their performance on each benchmark is described in Table 1. Our goal is to rank the seven algorithms according to their performance. Therefore, the decision matrix \(D=\{M_\mu, M_\sigma\}\), presented in Sect. 3, is set up using the values described in Table 1.

Table 1 The classifiers' performance for each benchmark in terms of mean and standard deviation of the classification accuracy—case study I

As we can see in Table 1, the KNN algorithm does not have a standard deviation because it was used with just one value of k. Therefore, we divided this case study into two parts. First, we remove the KNN and consider only the remaining classifiers. Second, we consider all seven classifiers, setting the KNN standard deviation equal to zero. We did this to show the ranking differences between the A-TOPSIS and the Hellinger-TOPSIS when an algorithm with a standard deviation equal to zero is included (recall that the Hellinger-TOPSIS gives the mean and the standard deviation the same importance). For both parts, we carry out a sensitivity study by varying the weights for the mean and the standard deviation, as sketched in the code below.
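The sensitivity study can be reproduced with the `a_topsis()` sketch from Sect. 3, sweeping \(w_\mu\) from 0.5 to 1 as in Tables 2 and 5; here `M_mu` and `M_sigma` are assumed to hold the Table 1 values.

```python
import numpy as np

for w_mu in np.arange(0.5, 1.01, 0.1):
    xi = a_topsis(M_mu, M_sigma, w_mu=w_mu, w_sigma=1.0 - w_mu)
    order = np.argsort(-xi)  # indices of the algorithms, best first
    print(f"weights [{w_mu:.1f}, {1.0 - w_mu:.1f}] -> ranking {order}")
```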

4.1.1 Case Study I: Part I

From Table 1, we remove the KNN and keep the remaining algorithms. Thus, the decision matrix for this experiment has six alternatives (algorithms) and 12 criteria (benchmarks). Table 2 describes the rank provided by the A-TOPSIS when varying the weights for the mean and the standard deviation.

Table 2 Rank by varying the values of the weights—case study I, part I

As we can see, the first, second and third places in the rank do not change regardless of the weights. In fact, the only change in the rank occurs when the weights become [0.6, 0.4]. In this case, the FNN rises to fourth place, the ELM goes down to fifth place, and the DRBM goes to the last place. Varying the weights from [0.6, 0.4] to [1, 0] does not change the rank. Figure 3 illustrates the ranking as a bar graph for each weight configuration.

Fig. 3 Rank in bar graph for each weight configuration—case study I, part I

We compare the results obtained by the A-TOPSIS with the Hellinger-TOPSIS. Since the rank provided by the A-TOPSIS becomes stable with weights equal to [0.6, 0.4], we chose these values for this comparison. Table 3 presents the rank for each methodology, which is also depicted as a bar graph in Fig. 4. According to the presented results, both methods obtained the same rank.

Table 3 Rank comparison between A-TOPSIS and H-TOPSIS—case study I, part I

Fig. 4 Rank in bar graph for A-TOPSIS and H-TOPSIS—case study I, part I

The Friedman test for this experiment provides \(p_{value}=0.00005\), leading to the rejection of \(H_0\). Then, we perform the pairwise comparisons using the Wilcoxon test. According to the results presented in Table 4, the CHO classifier is significantly different from all the other ones. Furthermore, the DRBM classifier is significantly different from AVG, MV and CHO. Thus, the statistical tests indicate that the CHO classifier is the best algorithm and the DRBM is the worst one. This finding is consistent with the results obtained by the A-TOPSIS. Nonetheless, the statistical tests cannot provide a rank of all the classifiers as the A-TOPSIS does.

Table 4 Wilcoxon test pairwise comparison with p less than 0.05—case study I, part I

4.1.2 Case Study I: Part II

In this experiment, we consider all the algorithms in Table 1. Thus, the decision matrix for this experiment has seven alternatives (algorithms) and 12 criteria (benchmarks). Table 5 describes the rank provided by the A-TOPSIS when varying the weights for the mean and the standard deviation.

Table 5 Rank by varying the values of the weights—case study I, part II

As we can notice in Table 5, the first and second places in the rank do not change for any weights. For weights equal to [0.5, 0.5], the KNN reaches third place in the rank. However, when the weights are set to [0.6, 0.4], a variation of only 10%, the KNN goes down to the last place. Moreover, for these weights, the ELM rises from sixth to fourth place. Nevertheless, when the weights are equal to [0.7, 0.3], the ELM and the FNN switch their positions. From the weights [0.7, 0.3] to [1, 0], the rank becomes stable and does not change anymore. Figure 5 illustrates the rank as a bar graph for each weight configuration.

Fig. 5 Rank in bar graph for each weight configuration—case study I, part II

Again, we compare the results obtained by the A-TOPSIS with the Hellinger-TOPSIS. For this experiment, we chose the stable weights [0.7, 0.3]. Table 6 presents the rank for each methodology, which is also depicted as a bar graph in Fig. 6.

Table 6 Rank comparison between A-TOPSIS and H-TOPSIS—case study I, part II

In Table 6, we can easily check that the rankings of the alternatives CHO, FNN, DRBM and KNN are the same in both methods. Thus, the rankings of the best and worst alternatives are kept. On the other hand, the alternatives MV, AVG and ELM have changed their positions. Comparing the ranks in both experiments (Tables 3, 6), we observe that the A-TOPSIS places the KNN in the last position and maintains the ranking of the remaining algorithms, whereas the Hellinger-TOPSIS does not. Therefore, we conclude that the inclusion of the KNN directly affects the Hellinger-TOPSIS ranking. This happens because the Hellinger-TOPSIS does not allow us to control the influence of the mean and the standard deviation.

Fig. 6 Rank in bar graph for A-TOPSIS and H-TOPSIS—case study I, part II

Similarly to the previous experiment, the Friedman test provides \(p_{value}=0.00007\), leading to the rejection of \(H_0\). Next, we perform the pairwise comparisons using the Wilcoxon test. According to the results presented in Table 7, the CHO classifier is significantly different from all the other ones. In addition, the KNN is also significantly different from the others, except the DRBM. Lastly, the DRBM is significantly different from AVG, MV and CHO. Thus, the statistical tests indicate that the CHO classifier is the best algorithm and the KNN and the DRBM are the worst ones. Again, this finding is consistent with the results obtained by the A-TOPSIS.

Table 7 Wilcoxon test pairwise comparison with p less than 0.05—case study I, part II
Table 8 The classifiers' performance for each benchmark in terms of mean and standard deviation of the classification accuracy—case study II

4.2 Case Study II

This case study, presented by Wen et al. [15], consists of a classification problem with eight classifiers applied to 10 benchmarks. Table 8 describes the performance of the classifiers on each benchmark. Similar to case study I, our goal is to find the rank of the classifiers according to their performance. It is worth mentioning that in this case study the authors used the error rate as the performance metric. The A-TOPSIS can easily handle this by simply changing the criterion type from benefit to cost, i.e., the smaller the value, the better, as illustrated in the sketch below.
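With the `a_topsis()` sketch from Sect. 3, switching from accuracy to error rate is a single flag; `M_mu_err` and `M_sigma_err` below are assumed to hold the Table 8 means and standard deviations.

```python
# Error rates: a smaller mean is better, so the mean criteria become
# cost criteria; standard deviations are treated as before.
xi = a_topsis(M_mu_err, M_sigma_err, w_mu=0.7, w_sigma=0.3,
              higher_mean_better=False)
```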

As this case study has eight algorithms and ten benchmarks, the decision matrix has eight alternatives and ten criteria. Table 9 describes the rank provided by the A-TOPSIS when varying the weights for the mean and the standard deviation.

Table 9 Rank by varying the values of the weights—case study II
Fig. 7 Rank in bar graph for each weight configuration—case study II

As we can notice in Table 9, the first and the last places are the same for all weights. When the weights are varied from [0.5, 0.5] to [0.6, 0.4], the classifiers HKNN, LNC, LPC and EKNN switch their positions. Varying the weights from [0.6, 0.4] to [0.9, 0.1], the rank does not change. Lastly, when the weights become [1, 0], the classifiers EKNN and ALH switch their positions. Figure 7 illustrates the rank as a bar graph for each weight configuration.

Table 10 describes the ranks provided by the A-TOPSIS and the Hellinger-TOPSIS. For the A-TOPSIS, we chose the stable weights [0.7, 0.3]. In addition, Fig. 8 depicts the rank of each approach as a bar graph.

Table 10 Rank comparison between A-TOPSIS and H-TOPSIS—case study II

From Table 10, we observe some differences between the ranks. In both, the first and the last places do not change. However, in the Hellinger-TOPSIS rank, the closeness coefficients of the classifiers LNC and HKNN are so close that the two are tied in second place. Furthermore, as we can see in Fig. 8, even though we can distinguish the ranking of the classifiers FKNN, EKNN, LPC and ALH in the Hellinger-TOPSIS rank, their closeness coefficients are very close to one another. This issue does not occur with the A-TOPSIS.

Fig. 8 Rank in bar graph for A-TOPSIS and H-TOPSIS—case study II

Table 11 Wilcoxon test pairwise comparison with p less than 0.05—case study II

For case study II, the Friedman test provides \(p_{value}=0.00001\), leading to the rejection of \(H_0\). The post hoc analysis obtained with the Wilcoxon test is described in Table 11. According to the results, the classifier REC is significantly different from all the other ones. In addition, the KNN classifier is significantly different from the LNC, LPC, HKNN and REC. Lastly, the FKNN is significantly different from the LNC, HKNN, REC and ALH. Therefore, the statistical tests indicate that the best classifier is the REC and the worst classifiers are the KNN and the EKNN. Again, this finding is consistent with the results obtained by the A-TOPSIS.

5 Concluding Remarks

In this work, we presented a thorough investigation of our previous work, the A-TOPSIS framework. We carried out two case studies in which we detailed the applicability of our approach and compared it with the Hellinger-TOPSIS. Throughout the experiments, we described the benefits of using the A-TOPSIS rather than the Hellinger-TOPSIS. In order to verify the suitability of the A-TOPSIS rank, we performed the nonparametric Friedman and Wilcoxon tests. The obtained results showed the effectiveness of the approach and indicate that the A-TOPSIS can complement the statistical tests with a complete rank of all the algorithms analyzed.

Although we used classification problems in both case studies, the presented approach is general and can be applied to compare the performance of any stochastic algorithms in machine learning. In terms of computational burden, the A-TOPSIS consists of a very simple computational procedure. It is worth noting that the TOPSIS is a well-established and reliable methodology, which underpins the effectiveness of the A-TOPSIS. Finally, in order to encourage researchers and practitioners in different areas of knowledge, especially in machine learning, to use the A-TOPSIS, we provide a web framework to rank algorithms in an easy way.