
1 Introduction

A classifier ensemble can be defined as a collection of individual classifiers (ensemble members) that work in parallel, receive the same input pattern, and produce their individual outputs. A combination method receives the members' outputs and provides the global output of the system [13]. In machine learning, classifier ensembles have emerged as an efficient technique for different classification problems. Several studies in the literature have investigated efficient ways to combine classifiers, aiming to improve classification performance in different applications. In these studies, we can find different algorithms, methods and/or practices for combining classifiers [1,2,3,4,5, 7, 8].

In this context, one important aspect is the definition of the ensemble structure. Several studies have proposed different ways to define the ensemble structure, such as optimization techniques and meta-learning, among others [21, 22]. Basically, the ensemble structure can be defined statically or dynamically. In static selection, the ensemble structure is selected at the beginning of the training phase and is used throughout the whole ensemble processing. In dynamic selection, on the other hand, the ensemble structure is defined dynamically, so that each testing instance has its own ensemble structure. Several studies have shown that dynamic selection tends to increase the predictive capacity of an ensemble [1, 9].

Several studies investigate the dynamic selection of the ensemble structure, mainly of ensemble members [7, 8], of features [23], or of both [24]. As mentioned previously, in an ensemble structure, the combination method combines the outputs of all classifiers in order to provide the final output of the ensemble. Although this component plays an important role in the performance of an ensemble, very little has been done to define an efficient dynamic selection of this module.

Aiming at an automatic decision process that selects the best classification structure for each testing instance, this paper proposes a dynamic selection method for combination methods in classifier ensembles. Also known as dynamic fusion, the proposed method defines the region of competence of each candidate (combination method) for each testing instance, and the most competent combination method is selected. The region of competence is defined based on a pool of classifiers, which can be formed statically or dynamically. In this paper, we apply dynamic selection, since we aim to promote dynamicity in two important parameters of an ensemble (ensemble members and combination method).

In order to assess the feasibility of the proposed method, an empirical analysis is conducted. In this analysis, the proposed method is used along with two well-known dynamic ensemble member selection methods, KNORA-Eliminate (KNORA-E) and META-DES. In the proposed method, a set of eight combination methods is used as candidates for selection. Additionally, the selection distribution over the candidates is analyzed, in order to investigate whether the selection is spread over all candidates or whether one or two candidates dominate the selection process. Finally, the proposed method is compared to 12 ensembles in which the combination method is selected statically. All analyzed methods are evaluated on 15 classification datasets.

2 Theoretical Concepts and Related Work

2.1 State of the Art

Several studies investigate the dynamic selection of the ensemble structure, mainly of ensemble members [7, 8], of features [23], or of both [10, 24].

In relation to ensemble members, in [7], for instance, a new method for dynamic ensemble member selection is presented; it uses the confidence of the base classifiers during classification and their general credibility as selection criteria. Thus, an ensemble member is selected to compose the ensemble if its selection criterion is higher than an established threshold x. Another interesting approach is to use the region of competence as selection criterion, which makes it possible to improve the combination of classifiers by selecting the most competent ones in a certain region. The use of the region of competence as selection criterion helps to maximize results [11, 12] by focusing only on the most competent classifiers; examples can be found in KNORA-E [1] and META-DES [3].

In terms of dynamic feature selection, a dynamic feature selection approach was proposed in [23]. Its main aim is to select a different subset of features for each instance or group of instances, in order to explore the full potential of all instances in a classification problem. In [24], an initial study on how to combine these two dynamic selection techniques was performed. According to the authors, an improvement in performance was detected with the use of this integrated dynamic selection technique.

Although there are several studies proposing dynamic selection of ensemble members and features, very little has been done to propose an efficient dynamic selection of combination methods. This paper tries to bridge this gap by proposing a dynamic selection method based on the region of competence.

2.2 Classifier Ensembles

It is well known that there is no single classifier that can be considered optimal for all problem domains [13]. Therefore, it is difficult to select a single classifier that provides the best performance in practical pattern classification tasks [14]. Classifier ensembles have thus emerged as an efficient classification structure, since they combine the advantages and overcome the limitations of the individual classifiers, providing better generalization and performance when compared to the individual classifiers [14]. In a classifier ensemble, an input pattern is presented to all individual classifiers [15, 16], and a combination method combines their outputs to produce the overall output of the system [13, 17]. The machine learning literature has established that diversity plays an important role in the design of ensembles, contributing to their accuracy and generalization [13].

One important issue in the design of classifier ensembles is the appropriate selection of their structure (individual classifiers and combination methods) [18]. As previously mentioned, there are basically two main selection approaches, static and dynamic. In this paper, we focus on the dynamic approach. The next subsection describes the existing dynamic selection methods that are used in this paper.

2.3 Dynamic Ensemble Member Selection

Dynamic Ensemble Selection (DES) methods perform dynamic ensemble member selection: they select a subset of classifiers to classify each test instance. The subset is chosen through a selection procedure, and each DES method has its own procedure. Several DES methods have been proposed in the literature. In this paper, we use two well-known DES methods, KNORA-E and META-DES.

KNORA-E:

KNORA [1] is a well-known DES method that seeks to find the best subset of classifiers for a given test instance. It applies the k-Nearest Neighbors method: the neighbors of a testing instance are selected from the validation set, and the competence of each classifier is calculated on them. Based on a certain selection criterion, the classifier subset is selected.

KNORA-E is a KNORA-based method whose selection criterion is to keep only the classifiers that correctly classify all k neighbors of a testing instance. If no classifier correctly classifies all k neighbors, the value of k is decremented by one, and this is repeated until at least one classifier can be selected [1].
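The selection rule above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name and the dictionary-of-hits data structure are our assumptions.

```python
def knora_e_select(neighbors_hits, k):
    """neighbors_hits[c] is a list of booleans, nearest neighbor first:
    did classifier c correctly classify each of the k validation neighbors?
    Returns the indices of the selected classifiers."""
    while k > 0:
        selected = [c for c, hits in neighbors_hits.items()
                    if all(hits[:k])]       # correct on ALL k neighbors
        if selected:                        # at least one classifier qualifies
            return selected
        k -= 1                              # relax: shrink the neighborhood
    return list(neighbors_hits)             # fallback: keep the whole pool

pool_hits = {0: [True, True, False],
             1: [True, True, True],
             2: [False, True, True]}
print(knora_e_select(pool_hits, k=3))       # only classifier 1 is correct on all 3
```

Note that the criterion is strict (a single mistake in the neighborhood eliminates a classifier), which is why the decrement of k is needed as a fallback.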

META-DES:

META-DES [3] is a DES method that applies the idea of selection through meta-learning. In this method, a meta-problem is created to determine whether a classifier is competent for a given test instance. According to [10], the META-DES method uses five criteria for extracting meta-features in order to build this meta-problem.

After that, a meta-classifier is trained, based on the defined meta-features. This meta-classifier is then used to identify whether a classifier is competent or not to classify a testing instance. Classifiers that are labeled as competent will be selected to compose the ensemble to classify the test instance.

3 The Proposed Method

The proposed method aims at dynamically selecting the combination method of a classifier ensemble. Algorithm 1 presents its main steps. As can be observed, the dynamic selection of the combination method is performed in the testing phase: when a testing instance is presented to the classifier ensemble, the competence of each combination method is calculated.

[Algorithm 1: main steps of the proposed method]

This competence can be calculated in several different ways. As local competence has been widely used in dynamic member selection methods, the proposed method uses the k-NN method to select the neighbors of the testing instance. Then, the local accuracy of each combination method is calculated in this neighborhood, and the most accurate combination method is selected.

When computing the competence region, a draw in accuracy may occur. In this case, the number of neighbours is increased by one (k = k + 1) until a winner is found. If all instances of the validation set are used and there is still a draw, the winner is selected randomly.
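The local-accuracy selection with this tie-breaking rule can be sketched as below. This is a minimal sketch under our own assumptions (precomputed per-neighbor hit lists, our function names), not the paper's code.

```python
import random

def select_combiner(hits_per_method, k):
    """hits_per_method[m] lists, nearest validation neighbor first, whether
    combination method m correctly classified each neighbor. Returns the
    most locally accurate method, growing k on draws."""
    n_val = len(next(iter(hits_per_method.values())))
    winners = list(hits_per_method)
    while k <= n_val:
        scores = {m: sum(h[:k]) / k for m, h in hits_per_method.items()}
        best = max(scores.values())
        winners = [m for m, s in scores.items() if s == best]
        if len(winners) == 1:
            return winners[0]           # unique most accurate combiner
        k += 1                          # draw: enlarge the neighborhood
    return random.choice(winners)       # still tied on the whole validation set

hits = {"vote": [True, True, False, True],
        "sum":  [True, True, False, False]}
print(select_combiner(hits, k=3))       # tied at k=3; "vote" wins at k=4
```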

Additionally, there are two main parameters in the proposed method: the number of neighbors of a testing instance (line 4 of Algorithm 1) and the size of the pool of classifiers. Finally, the proposed method can be applied to a pool of classifiers selected statically or dynamically. In order to provide more dynamicity for the classifier ensemble, in this paper the proposed method is applied to two well-known dynamic member selection methods, META-DES and KNORA-E (described in Sect. 2.3).

4 Experimental Methodology

In order to assess the feasibility of the proposed method, an empirical analysis was conducted. The next subsections describe the main aspects of this analysis, namely the used datasets as well as the methods and materials.

4.1 Datasets

The datasets used in this paper were extracted from the UCI Machine Learning Repository. Table 1 describes some characteristics of these datasets, namely the number of instances (Inst), number of attributes (Att) and number of classes (Class) of each one.

Table 1. Description of the used datasets

Each dataset is divided into Training, Validation1, Validation2 and Testing sets, in proportions of 50%, 16.7%, 16.7% and 16.6%, respectively. The training set is used to generate the pool of classifiers. The testing set is used to assess the performance of the classifier ensembles. The Validation2 set is used to train the trainable combination methods (Neural Networks and Naive Bayes), while the Validation1 set is used to obtain the competence region of the proposed dynamic fusion method. This division is performed 30 times, and the presented results of each ensemble configuration are the average over these 30 runs.

4.2 Methods and Materials

In this paper, we evaluated six different pool sizes (5, 10, 15, 20, 25 and 30 classifiers), all of them generated through the Bagging method. In addition, three different numbers of neighbors were assessed: 3, 7 and 11. It is important to emphasize that the proposed method as well as both member selection techniques (KNORA-E and META-DES) use the idea of the competence region. Therefore, the same number of neighbors is used in both cases: the selection of the combination method (proposed method) and of the ensemble members (KNORA-E and META-DES). Finally, all ensemble configurations use Decision Trees as ensemble members.

The proposed method uses a pool of eight combination methods: Majority Vote, Sum, Geometric Mean, Naive Bayes, Edge, and three Neural Network (MLP) versions, Hard, Soft and Soft-Class. The three MLP versions differ in the input information received from the ensemble members. In the Hard version, each ensemble member provides only the winner class for the testing instance; in other words, this MLP version is trained and tested using only the winner class of each ensemble member. In the other two MLP versions, each ensemble member provides its prediction probability for each class.

As the MLP input must have a fixed size and the number of selected members may vary, a strategy to define the input size must be adopted. In this paper, we decided to use the maximum possible size (pool size times the number of classes). In doing so, it is important to define how to handle the outputs of the unselected classifiers, and this is the main difference between the Soft and Soft-Class versions. While the Soft MLP version assigns -1 to all classes of an unselected classifier, the Soft-Class version assigns a fixed value (1/number of classes) to all classes of the unselected classifiers.
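The construction of this fixed-size input can be illustrated as follows. The encoding below is our own sketch of the described scheme; the function name and argument layout are assumptions.

```python
def mlp_input(probas, selected, pool_size, n_classes, variant="soft"):
    """Build the MLP input of length pool_size * n_classes. Slots of
    unselected classifiers are filled with -1 (Soft) or 1/n_classes
    (Soft-Class), as described in the text."""
    fill = -1.0 if variant == "soft" else 1.0 / n_classes
    x = []
    for c in range(pool_size):
        if c in selected:
            x.extend(probas[c])           # class probabilities of a selected member
        else:
            x.extend([fill] * n_classes)  # placeholder for an unselected member
    return x

probas = {0: [0.9, 0.1]}                  # only classifier 0 was selected
print(mlp_input(probas, {0}, pool_size=2, n_classes=2, variant="soft"))
# -> [0.9, 0.1, -1.0, -1.0]
print(mlp_input(probas, {0}, pool_size=2, n_classes=2, variant="soft-class"))
# -> [0.9, 0.1, 0.5, 0.5]
```

The Soft-Class fill value (a uniform distribution over the classes) keeps the placeholder on the same scale as real probabilities, whereas -1 marks the slot as clearly out of range.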

The MLP algorithms were implemented using the Scikit-Learn [6] library with the Multi-layer Perceptron model, using 200 neurons in the hidden layer. All three neural networks follow the same configuration, since it provided promising results for all three MLPs in a grid search. For comparison purposes, the performance of the proposed method is compared to 12 classifier ensembles using the following combination methods: Majority Vote, Sum, Maximum (MAX), Minimum (MIN), Geometric Mean, Hard MLP, Soft MLP, Soft-Class MLP, Edge, Naive Bayes, Weighted Sum and Weighted Ensemble Voting. For all these ensembles, the ensemble members are dynamically selected and the combination method is statically selected, as originally proposed in both analyzed methods (KNORA-E and META-DES).

Additionally, the Weighted Sum and Weighted Ensemble Voting methods use weights in their functioning. The weight used is 1/(distance-of-classes), applied to the vote procedure in Weighted Ensemble Voting and to the classifier probabilities in Weighted Sum.

The obtained results of all analyzed methods are evaluated using the Friedman statistical test [20]. In cases where a statistically significant difference is detected, the Nemenyi post-hoc test is applied [20]. To present the results of the post-hoc test, the critical difference (CD) diagram [20] is used. This diagram was selected to provide a visual illustration of the statistical test, making the results easier to interpret. Additionally, all implemented methods are included in the DESLIB [3] library, which contains both KNORA-E and META-DES.

5 The Obtained Results

In this section, the obtained results are presented and analyzed in three main parts. The first part assesses the accuracy of all 13 analyzed methods. The second part presents the distribution of the selected combination methods, while the third part describes the results of the statistical analysis.

5.1 The Accuracy of the Analyzed Methods

Tables 2 and 3 present the accuracy results of all 13 analyzed methods for KNORA-E and META-DES, respectively. As previously mentioned, 18 different ensemble configurations (6 pool sizes and 3 numbers of neighbors) were evaluated, and each configuration was run 30 times. Therefore, the values in Tables 2 and 3 represent the average over 540 results. Additionally, the last line in both tables shows the overall accuracy over all 15 datasets. Finally, the bold numbers represent the highest accuracy for each dataset.

For KNORA-E (Table 2), it can be observed that the proposed method (Dynamic Fusion) delivered the highest accuracy in 8 out of 15 datasets, followed by Vote and Sum (6 datasets). Furthermore, the overall accuracy achieved by the proposed method is the highest of all analyzed methods. For META-DES (Table 3), the proposed method delivered the highest accuracy in fewer datasets (4 out of 15). Nevertheless, it presents the highest overall accuracy, followed by Vote and Sum. This shows that the proposed method is the best overall classifier ensemble.

The results in Tables 2 and 3 show that the dynamic selection of the combination method is efficient, since it improves the performance of the classifier ensembles when compared to the static selection methods. We believe that this improvement is due to the fact that the dynamic fusion technique further exploits the characteristics of the ensemble members and, as a consequence, improves the performance of the classifier ensemble. The original KNORA-E and META-DES techniques already provide more efficient ensembles, since the best classifiers are selected to classify each testing instance. The proposed method handles the classifier ensembles even more efficiently, since it selects the combination method that best suits the ensemble members. Finally, we can state that the inclusion of more dynamicity in the ensemble structure can lead to an improvement in its performance.

Table 2. Results of the classifier ensembles using the KNORA-E method
Table 3. Results of the classifier ensembles using the META-DES method

5.2 The Selection Distribution

Having analyzed the accuracy of the different classifier ensembles, we now evaluate the selection distribution produced by the proposed method, that is, the proportion in which each combination method was selected by the proposed method in the testing phase.

Tables 4 and 5 present the selection distribution for the KNORA-E and META-DES methods, respectively. As can be observed in both tables, all eight combination methods were selected by the proposed method in the testing phase. The most selected combination method is Vote, for both KNORA-E and META-DES. This is an expected result, since this method provided the second best accuracy level. On the other hand, Naive Bayes was rarely selected as the best combination method; once again, this method provided one of the worst accuracy levels for both KNORA-E and META-DES. It is important to emphasize that the Sum combination method was also rarely selected, since its performance is usually very similar to that of Vote and, in these datasets, the latter was slightly better and was therefore selected. Finally, although the Edge combination method obtained good accuracy levels, it was rarely selected by the proposed method.

Table 4. Selection distribution in the dynamic fusion when using KNORA-E
Table 5. Selection distribution in the dynamic fusion when using META-DES

Still regarding the selection distribution, it can be seen that the selection is fairly evenly distributed among five combination methods (Vote, Geometric Mean, MLP-Hard, MLP-Soft and MLP-Soft-Class), which shows that there is no single best combination method and that an efficient selection can improve the performance of the classifier ensemble even further. The three MLP versions show similar selection distributions among themselves, indicating a certain similarity among them; together, they surpass the proportion of Majority Vote.

Based on the results in Tables 4 and 5, one can conclude that the proposed method is generally biased towards the method with the highest accuracy level. This is an expected result, since a combination method that is the most successful overall is usually also one of the most successful within the dynamic fusion competence region.

5.3 Statistical Analysis

In order to evaluate the obtained results from a statistical point of view, the Friedman test [19] was applied to verify whether there are statistical differences among the ensemble classifiers. The Friedman test assesses the hypothesis that the k related observations derive from the same population (similar performance) or not (superiority in performance). The significance level was set to 0.05; hence, if the p-value is less than this value, the null hypothesis is rejected with a confidence level greater than 95%.
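The Friedman statistic compares the average ranks of the k methods over the N datasets. A minimal dependency-free sketch (our own variable names; ties in accuracy are not handled here, unlike a full implementation):

```python
def friedman_stat(scores):
    """scores[d][m]: accuracy of method m on dataset d (higher is better).
    Returns the Friedman chi-square statistic and the average ranks."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda m: -row[m])  # best method gets rank 1
        for rank, m in enumerate(order, start=1):
            rank_sums[m] += rank                         # no tie handling in this sketch
    avg = [s / n for s in rank_sums]
    # chi2 = 12N / (k(k+1)) * (sum of squared average ranks - k(k+1)^2 / 4)
    chi2 = 12.0 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4.0)
    return chi2, avg

chi2, avg = friedman_stat([[0.90, 0.80, 0.70],
                           [0.85, 0.80, 0.60],
                           [0.95, 0.70, 0.75]])
print(chi2, avg)   # method 0 ranks first on every dataset
```

In practice a library routine (e.g. SciPy's `friedmanchisquare`) would be used; the sketch only makes the rank-based reasoning explicit.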

Table 6 presents the results of the Friedman test. As can be observed, in both cases (KNORA-E and META-DES) the statistical test detected statistical differences among the analyzed methods. Therefore, the post-hoc test was applied, and its results are presented in the critical difference diagram [20].

Table 6. Friedman test for KNORA-E and META-DES

In the critical difference diagram, the performance of a method is statistically different from that of another method if the difference between their average rankings is higher than the critical difference (CD) computed by the Nemenyi test. When two methods are statistically similar, a horizontal line links them in the diagram.
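The critical difference itself depends only on the number of compared methods k and of datasets N. A sketch of its computation is shown below; note that q_alpha is read from the studentized-range table and the value passed in the example is hypothetical, not taken from the paper.

```python
import math

def nemenyi_cd(k, n_datasets, q_alpha):
    """Critical difference of the Nemenyi post-hoc test:
    CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Two average ranks differ significantly if their gap exceeds CD.
cd = nemenyi_cd(k=13, n_datasets=15, q_alpha=3.3)  # q_alpha here is a placeholder
print(round(cd, 3))
```

Since CD shrinks with more datasets, comparisons over few datasets require large rank gaps to reach significance.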

Figure 1(a) shows the CD diagram for KNORA-E. As can be observed, the superiority of the proposed method was detected by the statistical test: the improvement in performance was strong enough to be statistically significant when compared to all other analyzed methods.

Fig. 1. Critical Difference Diagram for KNORA-E (a) and META-DES (b)

Figure 1(b) presents the CD diagram for META-DES. In this case, unlike KNORA-E, the superiority in performance of the proposed method was not detected by the statistical test: the CD diagram shows that Dynamic Fusion, Sum, Majority Vote and Soft-Class MLP provide statistically similar performance. For the remaining methods, the proposed method provided higher accuracy levels, as detected by the statistical test.

6 Final Remarks

This paper proposed a method to dynamically select the combination method of a classifier ensemble. The combination method is selected for each testing instance based on the competence region of the candidate combination methods. The dynamic combination method selection was used along with dynamic ensemble member selection methods in order to include more dynamicity in the ensemble structure selection and, as a consequence, to obtain more efficient ensembles.

In order to assess the feasibility of the proposed method, an empirical analysis was conducted. In this analysis, the dynamic selection was performed among eight combination methods: Majority Vote, Sum, Geometric Mean, Edge, Naive Bayes, and three types of MLP. Additionally, the proposed method was used with two different dynamic ensemble member selection methods, KNORA-E and META-DES.

This analysis showed that the proposed method provided the highest overall accuracy among all analyzed methods, for both KNORA-E and META-DES. Additionally, we observed that the proposed method selected five (out of eight) combination methods almost equally, showing that it is indeed important to apply different combination methods in classifier ensemble structures. Finally, for KNORA-E, the statistical test confirmed the superiority in performance of the proposed method.

In general, the proposed technique showed promising results and, indeed, improvements in performance when compared to the static selection of combination methods. We believe that this improvement is due to the fact that the dynamic fusion technique further exploits the characteristics of the ensemble members and, as a consequence, improves the performance of the classifier ensemble.

This empirical analysis was limited to 15 classification datasets. As future work, it is necessary to expand the analysis using other selection approaches and to perform a more detailed study with more datasets and different ensemble configurations. The best number of neighbors to define the dynamic fusion competence region will also be investigated, and the tests will be extended to KNORA-Union [1] and Overall Local Accuracy (OLA) [9], among others. Finally, different approaches to define the competence of a method will be analyzed.