1 Introduction

The pattern recognition task is one of the main research directions in machine learning [1]. The classification task can be accomplished by a single classifier or by a team of classifiers. In the literature, the use of multiple classifiers for a decision problem is known as a multiple classifier system (MCS) or an ensemble of classifiers (EoC) [4, 11, 27]. These methods are popular for their ability to fuse multiple classification outputs for better classification accuracy. The building of MCSs consists of three phases: generation, selection, and integration [3]. The final decision, made in the third phase, uses the predictions of the selected classifiers. The output of an individual classifier can be divided into three types [17].

  • The abstract level—classifier \(\psi \) assigns the unique label j to a given input x.

  • The rank level—in this case for each input x, each classifier produces an integer rank array. Each element within this array corresponds to one of the defined class labels. The array is usually sorted with the label at the top being the first choice.

  • The measurement level—the output of a classifier is represented by a measurement value that expresses the degree of assigning the class label to the given input x. An example of such a representation of the output is the a posteriori probability returned by a Bayes classifier.

Following these three types of outputs of the base classifier, various problems concerning the combination function (third phase) of classifier outputs are considered. The problems studied in [19, 25] belong to the abstract level. The combining of outputs at the rank level is presented in [13], and the problems studied in [16, 18] belong to the last level. The selection of classifiers is one of the important problems in the creation of an EoC [15, 24]. This task is related to the choice of a set of classifiers from the available pool of classifiers. Formally, if we choose one classifier, it is a classifier selection process, but if we choose a subset of classifiers from the pool, then it is an ensemble selection. In this work, these terms will be used interchangeably. A distinction can be made between static and dynamic selection [22, 26]. In static classifier selection one set of classifiers is selected to create the EoC, and this EoC is used in the classification of all the objects from the testing set. The main problem in this case is to find a pertinent objective function for selecting the classifiers. One of the best objective functions for the abstract level of classifier outputs is the simple majority voting error [23]. In dynamic classifier selection a specific subset of classifiers is selected for each unknown sample [5]. This means that different EoCs are selected for different objects from the testing set. In this type of classifier selection, the classifier is chosen and assigned to the sample based on different features [28] or different decision regions [7, 14]. In this work we consider dynamic ensemble selection. In detail, we propose a new selection method based on the analysis of the discriminant functions in the context of binary classification. The paper presents two approaches: the first takes into account the normalization of the discrimination functions, while in the second normalization is not performed.
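To make the distinction concrete, dynamic selection is often illustrated by a local-accuracy scheme, in which, for each test object, the classifiers that perform best in its neighbourhood of validation samples are kept. The sketch below shows only this general idea; the function name, neighbourhood size, and data layout are assumptions for illustration, not the method proposed in this paper.

```python
import numpy as np

def dynamic_select(clf_preds_val, y_val, x_val, x, n_neighbors=5, top=3):
    """Illustrative dynamic selection via local accuracy (not this paper's method).

    clf_preds_val: array (K, n_val) with each base classifier's predictions
    on a validation set; y_val: true validation labels; x_val: validation
    feature matrix. For the test object x, classifiers are ranked by their
    accuracy on the n_neighbors validation samples closest to x, and the
    indices of the `top` best classifiers are returned.
    """
    d = np.linalg.norm(x_val - x, axis=1)       # distances from x to validation set
    nn = np.argsort(d)[:n_neighbors]            # local decision region of x
    local_acc = (clf_preds_val[:, nn] == y_val[nn]).mean(axis=1)
    return np.argsort(local_acc)[::-1][:top]    # best-first classifier indices
```

A different EoC may thus be returned for every test object, which is exactly what distinguishes dynamic from static selection.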
The text is organized as follows: Sect. 2 presents the ensemble of classifiers, the combination function based on the sum method, and the proposed method for dynamic ensemble selection based on the analysis of the discriminant functions. Section 3 includes the description of the research experiments comparing the suggested algorithms with the base classifiers. Finally, the discussion and conclusions from the experiments are presented.

2 Ensemble of Classifiers

Let us consider the binary classification task. It means that we have two class labels \(M=\{1, 2\}\). Each pattern is characterized by a feature vector X. The recognition algorithm maps the feature space X to the set of class labels M according to the general formula:

$$\begin{aligned} \varPsi :X \rightarrow M. \end{aligned}$$
(1)

Let us assume that K different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\) are available to solve the classification task. In MCSs these classifiers are called base classifiers. In the binary classification task K is assumed to be an odd number. As a result of running all the classifiers, K responses are obtained for each object. The output information from all K component classifiers is used to make the final decision of the MCS. This decision is made based on the predictions of all the base classifiers. One of the possible methods for integrating the outputs of the base classifiers is the sum rule. In this method the score of the MCS is based on the application of the following sums:

$$\begin{aligned} s_i(x)=\sum _{k=1}^{K}\hat{p}_k(i|x), \qquad i \in M, \end{aligned}$$
(2)

where \(\hat{p}_k(i|x)\) is an estimate of the discrimination functions for class label i returned by classifier k. The final decision of the MCSs is made following the maximum rule:

$$\begin{aligned} \varPsi _{S}(x)= \arg \max _{i} s_i(x). \end{aligned}$$
(3)

In the presented method (3) the discrimination functions obtained from the individual classifiers take an equal part in building MCSs. This is the simplest situation in which we do not need additional information about the testing process of the base classifiers except for the models of these classifiers. One of the possible methods in which weights of the base classifier are used is presented in [2].
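As a minimal sketch, the sum rule (2) and the maximum rule (3) can be implemented directly on a matrix of estimated discrimination functions; the array layout and names below are illustrative assumptions.

```python
import numpy as np

def sum_rule(p_hat):
    r"""Combine K base classifiers by the sum rule.

    p_hat: array of shape (K, 2) with p_hat[k, i-1] being the estimate
    \hat{p}_k(i|x) of the discrimination function for class label i
    returned by classifier k, for one object x.
    """
    s = np.asarray(p_hat).sum(axis=0)   # s_i(x), Eq. (2)
    return int(np.argmax(s)) + 1        # maximum rule, Eq. (3); labels in {1, 2}

# Three base classifiers scoring one object x:
print(sum_rule([[0.6, 0.4], [0.7, 0.3], [0.9, 0.1]]))  # prints 1
```

Because every classifier contributes its raw discrimination functions with equal weight, no information beyond the trained models themselves is required.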

2.1 Selection of Discrimination Functions

The classifier selection phase is often criticized due to its usually high computational cost [3]. Bearing this in mind, we propose the dynamic selection of discrimination functions, which has minor computing requirements. It uses the assumption that the more credible the classifier, the larger the difference between its discrimination functions. In the discussed binary classification task this means that during the selection we discard the classifiers for which \(|\hat{p}_k(1|x)-\hat{p}_k(2|x)|<\alpha \). The parameter \(\alpha \) determines the required size of the difference in the discriminant functions and takes values from the interval \(\alpha \in [0,1)\). For the value \(\alpha =0\) the selection process does not occur. The proposed selection process is performed before the calculation of the coefficients \(s_i(x)\). The final decision is made according to the sum rule, and the MCS with selection is labelled \(\varPsi _{S_{ON}}^{\alpha }\). In the MCS model labelled \(\varPsi _{S_{N}}^{\alpha }\), the normalization of the discrimination functions is additionally carried out. Normalization is performed for each class label i according to the rule:

$$\begin{aligned} \widehat{p'}_k(i|x)=\frac{\hat{p}_k(i|x)- \min (\hat{p}_1(i|x),\ldots ,\hat{p}_K(i|x))}{\max (\hat{p}_1(i|x),\ldots ,\hat{p}_K(i|x))-\min (\hat{p}_1(i|x),\ldots ,\hat{p}_K(i|x))}, \qquad k \in \{1,\ldots ,K\}. \end{aligned}$$
(4)

The aim of the experiments presented in the next section was, among other things, to compare the two proposed selection methods.

Table 1 Description of data sets selected for the experiments
Table 2 Classification accuracy and mean rank positions for the proposed selection algorithm with normalization produced by the Friedman test
Table 3 Classification accuracy and mean rank positions for the proposed selection algorithm without normalization produced by the Friedman test
Table 4 Classification accuracy and mean rank positions for the base classifiers (\(\varPsi _1,\ldots ,\varPsi _7\)) and MCSs algorithms produced by the Friedman test

3 Experimental Studies

A series of experiments were carried out to assess the quality of classification. The aim of the experiments was to compare the proposed selection algorithms with the base classifiers and with ensemble classifiers based on the majority voting and sum methods. In the experimental research 11 benchmark data sets were used. Two of them were generated randomly: the so-called Banana and Highleyman sets. The other nine benchmark data sets come from the UCI repository [10]. All the data sets concern the binary classification task. The numbers of attributes, examples, and the class ratios are presented in Table 1. The studies did not include the impact of the feature selection process on the quality of classification; therefore, the feature selection process [12, 21] was not performed. The results were obtained via the ten-fold cross-validation method. In the experiments, 7 base classifiers were used. Two of them work according to the \(k\)-NN rule with the parameter k equal to 5 or 7. Two base classifiers are decision tree algorithms with binary splits and a maximum tree depth of 6 levels; in the decision nodes the Gini index or entropy is used. One of the classifiers uses a combination of decision tree base models: the gradient boosting algorithm. The other two classifiers use Support Vector Machine models; one of them uses the Least Squares SVM method and the second the Decomposed Quadratic Programming method. The experiments were performed in the SAS Enterprise Miner environment. Table 2 shows the results of the classification for the proposed classifier selection with normalization of the a posteriori probability functions. Additionally, the mean ranks obtained by the Friedman test are presented. The values show that the best value of the parameter \(\alpha \) for this classifier selection method is 0.4.
Table 3 shows the results for the case where there is no normalization of the a posteriori probability functions. For this case the optimal value of the parameter \(\alpha \) is 0.1. The classifiers with the selected values of the parameter \(\alpha \) were compared with the base classifiers and with the ensemble methods based on the majority voting and sum methods. The results of classification with the mean ranks obtained by the Friedman test are presented in Table 4. To compare the results the post-hoc Nemenyi test was used [24]. The critical difference for this test at \(p=0.05\) is equal to \(CD=4.51\). Since the difference between \(\varPsi _{Sum}\) and the proposed algorithms \(\varPsi _{S_N}^{0.4}\) and \(\varPsi _{S_{ON}}^{0.1}\) is smaller than 4.51, we can conclude that the post-hoc test is not powerful enough to detect any significant differences between these algorithms. However, the proposed algorithm for the selection of the a posteriori probability functions achieved the best result, i.e., the lowest average rank (Table 3).
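For reference, the Nemenyi critical difference follows the standard formula \(CD = q_{\alpha }\sqrt{k(k+1)/(6N)}\), where k is the number of compared algorithms and N the number of data sets. The sketch below computes it; the \(q_{\alpha }\) constant in the example is quoted from standard studentized-range tables as an assumption, since the paper does not state the values it used.

```python
import math

def nemenyi_cd(q_alpha, k, n_datasets):
    """Nemenyi post-hoc critical difference: two algorithms differ
    significantly if their mean Friedman ranks differ by at least
    CD = q_alpha * sqrt(k * (k + 1) / (6 * N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# e.g. k = 10 compared algorithms over N = 11 data sets at p = 0.05,
# with q ≈ 3.164 (an assumed table value):
cd = nemenyi_cd(3.164, 10, 11)
```

Plugging in the k and N corresponding to Table 4 yields the threshold against which the rank differences above are compared.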

4 Conclusion

This paper discusses classifier selection for the binary classification task. The process proposed in the paper concerns the selection of the a posteriori probability functions. The paper presents two approaches: in one of them normalization of the a posteriori probability functions is carried out, while in the second case normalization is not performed. Distributed computing approaches enable the efficient and parallel processing of complicated data analysis tasks, also in the context of classification systems with multiple classifiers [20]. The methods of classifier selection presented in the paper show the ability to work in a parallel and distributed environment. Parallel processing provides the possibility to speed up the selection of the a posteriori probability functions, whose results are needed to make the decision of the classifier ensemble. Additionally, the proposed approach can be applied in various practical tasks involving multiple elementary classification tasks [6, 8, 9]. In the paper several experiments were carried out on data sets available from the UCI repository and on synthetic data sets. The aim of the experiments was to compare the proposed selection algorithms with the base classifiers and with ensemble classifiers based on the majority voting and sum methods. For the proposed selection method, we obtained an improvement in the classification quality as measured by the mean ranks from the Friedman test. However, the difference in average ranks is too small to detect statistically significant differences between the proposed selection method and the ensemble method based on the sum rule.