1 Introduction

Supervised learning is one of the main types of machine learning [1]. Classification methods are applied in many practical tasks [5, 8, 10]. In general, a recognition algorithm maps the feature space to a set of class labels. This process requires a sufficiently large number of training examples in the learning set. Typically, these examples are labelled manually by an expert (sometimes called an oracle). Unlabelled examples, on the other hand, are much easier to acquire.

Active Learning (AL) [6] is a special case of machine learning in which a learning algorithm can interactively query the expert to obtain labels for unlabelled examples. These newly labelled examples are then used to improve the classifier. The key issue is to select the most informative examples. In this paper we use the Query by Committee (QBC) approach [11], in which an ensemble of classifiers is used to select the most informative examples.

The text is organized as follows: after this introduction, Sect. 2 presents the idea of an ensemble of classifiers. Section 3 describes the proposed AL scheme based on the QBC approach. Experimental results on several data sets are presented in Sect. 4. Finally, conclusions from the experiments and proposals for future research are given.

2 Ensemble of Classifiers

The classification task can be accomplished by a single classifier or by a team of classifiers. In the literature, the use of multiple classifiers for a decision problem is known as a multiple classifier system (MCS) or an ensemble of classifiers (EoC) [7, 12]. The construction of an MCS consists of three phases: generation, selection and integration [3]. In the second phase, which is discussed in this paper, one classifier or a subset of the base classifiers is selected to make the final decision, i.e. to assign an object to a class label.

The output of an individual classifier can be divided into three types [13]:

  • The abstract level – the classifier \(\psi \) assigns the unique label j to the given input x.

  • The rank level – in this case for each input x, each classifier produces an integer rank array. Each element within this array corresponds to one of the defined class labels. The array is usually sorted with the label at the top being the first choice.

  • The measurement level – the output of a classifier is represented by a measurement value that expresses the degree of support for assigning each class label to the given input x. An example of such a representation of the output is the a posteriori probability returned by a Bayes classifier (see the illustration below).
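As a minimal illustration (in Python, with hypothetical support values), the abstract and rank levels can be derived from a single vector of measurement-level outputs:

```python
import numpy as np

# Hypothetical supports from one classifier for Omega = 3 classes
# (measurement level): degrees of support for each class label.
measurement = np.array([0.2, 0.5, 0.3])

# Abstract level: the single predicted label (argmax of the supports).
abstract = int(np.argmax(measurement))    # -> 1

# Rank level: class labels ordered from first choice to last.
rank = list(np.argsort(-measurement))     # -> [1, 2, 0]
```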

Let us assume that we have K different classifiers \(\varPsi _1,\varPsi _2,\ldots ,\varPsi _K\). Such a set of classifiers, constructed on the basis of the same learning sample, is called an ensemble of classifiers or a combining classifier. Each of the \(\varPsi _i\) classifiers is referred to as a component or a base classifier. As a rule, K is assumed to be an odd number and each of the \(\varPsi _i\) classifiers makes an independent decision. As a result of the classifiers' actions, K responses are obtained. Having a set of base classifiers at one's disposal, one should determine the procedure for making the final decision regarding the allocation of an object to a given class. This implies that the output information from all K component classifiers is used to make the final decision.

In this work we consider the situation when each base classifier returns an estimate of the a posteriori probability, i.e. the output of all the base classifiers is at the measurement level. Let us denote the a posteriori probability estimate (most often a discrimination function – DF) by \(p_k(\omega |x)\), \(k=1,2,\ldots ,K\), \(\omega =1,2,\ldots ,\varOmega \), where \(\varOmega \) is the number of class labels. One possible method of combining such outputs is the linear combination method, which uses a linear function such as Sum, Prod or Mean to combine the outputs. In the Sum method the score of the group of classifiers is based on the following sums:

$$\begin{aligned} s_\omega (x)=\sum _{k=1}^{K}p_k(\omega |x), \qquad \omega =1,2,\ldots ,\varOmega . \end{aligned}$$
(1)

The final decision of the group of classifiers is made according to the maximum rule applied to the sums (1):

$$\begin{aligned} \varPsi _{S}(x)= \arg \max _{\omega } s_\omega (x). \end{aligned}$$
(2)

In the method presented in (2) the DFs obtained from the base classifiers take an equal part in building the MCS. This is the simplest situation, in which no additional information on the testing process of the base classifiers is needed apart from the models of the classifiers themselves. One possible method in which weights of the base classifiers are used is presented in [4].
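For illustration, a minimal sketch of the Sum method (1) and the maximum rule (2), assuming the posterior estimates of the K base classifiers are stacked in a NumPy array:

```python
import numpy as np

def sum_rule(posteriors):
    """Combine K base classifiers by the Sum method (1)
    and the maximum rule (2).

    posteriors: array of shape (K, Omega) holding p_k(omega|x).
    Returns the class label chosen by the ensemble.
    """
    scores = posteriors.sum(axis=0)    # s_omega(x), Eq. (1)
    return int(np.argmax(scores))      # Psi_S(x), Eq. (2)

# Example: K = 3 classifiers, Omega = 2 classes (hypothetical values).
dp = np.array([[0.6, 0.4],
               [0.3, 0.7],
               [0.8, 0.2]])
print(sum_rule(dp))  # -> 0, since 1.7 > 1.3
```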

3 Proposal of Active Learning Algorithm Using the Decision Profile

The general AL scheme is defined as follows [15]:

Input: Learning algorithm - A; Set of labelled training examples - L; Set of unlabelled training examples - U; Number of active learning iterations - n; Number of selected examples - m.

Repeat n times:

  1. Generate a committee of classifiers, \(C^*= EnsembleMethod (A, L)\)

  2. \(\forall x_j \in U\) compute \(InformationValue (C^*, x_j)\), based on the current committee

  3. Select a subset S of the m most informative examples

  4. Obtain labels for the examples in S from the "oracle" (expert)

  5. Remove the examples in S from U and add them to L

Return \(EnsembleMethod(A, L)\)
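A sketch of this loop in Python; `ensemble_method`, `information_value`, and `oracle` are assumed placeholders for the committee construction, the scoring method of Sect. 3.1, and the expert, respectively:

```python
def active_learning(A, L, U, n, m, oracle):
    """Generic QBC active learning loop following the scheme above.

    A -- learning algorithm; L -- labelled examples (x, y);
    U -- unlabelled examples x; n -- iterations; m -- queries per
    iteration; oracle -- callable returning the true label of x.
    """
    for _ in range(n):
        committee = ensemble_method(A, L)                # step 1
        scored = sorted(U,                               # step 2
                        key=lambda x: information_value(committee, x),
                        reverse=True)
        S = scored[:m]                                   # step 3
        L = L + [(x, oracle(x)) for x in S]              # step 4
        U = [x for x in U if x not in S]                 # step 5
    return ensemble_method(A, L)
```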

3.1 Proposal of Information Value Calculations

For the K base classifiers, the outputs are arranged in the decision profile:

$$\begin{aligned} DP(x)= \left[ \begin{array}{ccc} p_1(1|x) &{} \ldots &{} p_1(\varOmega |x) \\ \vdots &{} \ddots &{} \vdots \\ p_K(1|x) &{} \ldots &{} p_K(\varOmega |x) \\ \end{array} \right] . \end{aligned}$$
(3)

During the learning of the base classifiers we obtain m decision profiles, where m is the number of objects in the learning set. From the decision profiles we calculate the decision scheme according to the formula:

$$\begin{aligned} DS= \left[ \begin{array}{ccc} ds_{11} &{} \ldots &{} ds_{1\varOmega } \\ \vdots &{} \ddots &{} \vdots \\ ds_{K1} &{} \ldots &{} ds_{K\varOmega } \\ \end{array} \right] , \end{aligned}$$
(4)

where

$$\begin{aligned} ds_{k\omega } =\frac{\sum _{n=1}^{m}I(\varPsi _k(x_n)=\omega _n=\omega )\, p_k(\omega |x_n)}{\sum _{n=1}^{m} I(\varPsi _k(x_n)=\omega _n=\omega )}, \end{aligned}$$
(5)

where \(I(\cdot )\) is the indicator function. The value of \(ds_{k\omega }\) is thus calculated only from those DFs for which classifier k correctly recognized an object of class \(\omega \), i.e. did not make an error.
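A sketch of the decision scheme computation (3)–(5), assuming the decision profiles of the m learning objects are stored in a NumPy array:

```python
import numpy as np

def decision_scheme(dps, preds, labels):
    """Compute the decision scheme DS of Eqs. (4)-(5).

    dps    -- array (m, K, Omega): decision profile DP(x_n), Eq. (3),
              for each of the m learning objects
    preds  -- array (m, K): Psi_k(x_n), labels predicted by classifier k
    labels -- array (m,): true labels omega_n

    ds[k, w] averages p_k(w|x_n) over the objects of class w that
    classifier k classified correctly.
    """
    m, K, Omega = dps.shape
    ds = np.zeros((K, Omega))
    for k in range(K):
        for w in range(Omega):
            # I(.) of Eq. (5): correct decision on an object of class w
            correct = (preds[:, k] == labels) & (labels == w)
            if correct.any():  # leave ds[k, w] = 0 if never correct
                ds[k, w] = dps[correct, k, w].mean()
    return ds
```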

The information value is calculated for the new object on the basis of its decision profile according to the formula:

$$\begin{aligned} IV(x) = \sum _{k=1}^{K}\sum _{\omega =1}^{\varOmega }I(p_k(\omega |x)< ds_{k\omega }). \end{aligned}$$
(6)

The information value IV obtained for each object from the set of unlabelled training examples is used to indicate which objects x should be labelled: an object is submitted for labelling when its IV exceeds the established threshold. In the experiments the algorithm using the method proposed above is denoted as \(\varPsi _{AL}^{IV}\).
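A corresponding sketch of the information value (6) and the selection rule, with a hypothetical threshold `theta` (12 or 15 in the experiments reported below):

```python
import numpy as np

def information_value(dp, ds):
    """Information value IV(x) of Eq. (6).

    dp -- array (K, Omega): decision profile DP(x) of the new object
    ds -- array (K, Omega): decision scheme DS of Eq. (4)
    Counts the supports p_k(omega|x) that fall below ds_{k omega}.
    """
    return int((dp < ds).sum())

def select_for_labelling(dps_unlabelled, ds, theta):
    # Indices of unlabelled objects whose IV exceeds the threshold.
    return [i for i, dp in enumerate(dps_unlabelled)
            if information_value(dp, ds) > theta]
```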

4 Experimental Studies

In the experimental research we used two data sets from the UCI repository [9], one data set from the Keel repository and two generated randomly – the so-called Banana and Highleyman sets. The numbers of attributes and examples are presented in Table 1. In the experiments we used the standard 10-fold cross-validation method; the feature selection process [14] was not performed.

Table 1. Description of data sets selected for the experiments

In the experiments 9 base classifiers were used. One of them (labelled \(\varPsi _1\)) used the decision tree algorithm, with the splitting rule based on entropy, the number of branches equal to 2 and the depth of the tree limited to at most 6 levels. Two of them work according to the \(k\)-NN rule, with the parameter k equal to 3 or 5; they are labelled \(\varPsi _2\) and \(\varPsi _3\) respectively. The classifier labelled \(\varPsi _4\) is a rule induction classifier. It uses tree-based modelling in which a tree is grown, and models whose purity is at or above a specified threshold are removed from the training data set and set aside. The fifth classifier \(\varPsi _5\) uses a Support Vector Machine model with the Decomposed Quadratic Programming estimation method. The sixth classifier \(\varPsi _6\) uses a least squares regression model to predict the class label. The last of the base classifiers \(\varPsi _7\) is a multilayer perceptron model with 3 hidden units.
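The original experiments appear to have been run in a different software environment (e.g. "Decomposed Quadratic Programming" is not a scikit-learn option), so the following scikit-learn sketch only approximates the committee described above; the \(\varPsi _4\) rule induction classifier has no direct counterpart and is omitted, and any setting not stated in the text is an assumption:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import RidgeClassifier
from sklearn.neural_network import MLPClassifier

base_classifiers = [
    DecisionTreeClassifier(criterion="entropy", max_depth=6),  # Psi_1
    KNeighborsClassifier(n_neighbors=3),                       # Psi_2
    KNeighborsClassifier(n_neighbors=5),                       # Psi_3
    SVC(probability=True),                                     # Psi_5 (approx.)
    RidgeClassifier(),                                         # Psi_6 (approx.)
    MLPClassifier(hidden_layer_sizes=(3,)),                    # Psi_7
]
```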

Tables 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 show the classification results for all base classifiers and two ensemble methods. The results refer to the first iteration of the AL scheme and were obtained for the information value threshold equal to 12 or 15.

Table 2. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=12\) and random labelling of unlabelled training examples - Pima data set
Table 3. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=15\) and random labelling of unlabelled training examples - Pima data set
Table 4. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=12\) and random labelling of unlabelled training examples - Banana data set
Table 5. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=15\) and random labelling of unlabelled training examples - Banana data set
Table 6. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=12\) and random labelling of unlabelled training examples - Highleyman data set
Table 7. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=15\) and random labelling of unlabelled training examples - Highleyman data set
Table 8. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=12\) and random labelling of unlabelled training examples - Magic data set
Table 9. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=15\) and random labelling of unlabelled training examples - Magic data set
Table 10. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=12\) and random labelling of unlabelled training examples - Phone data set
Table 11. Classification accuracy for the base classifiers (\(\varPsi _1,...,\varPsi _9\)), Majority Voting and Sum method for AL scheme with \(IV=15\) and random labelling of unlabelled training examples - Phone data set

The obtained results are promising. For the given data sets we observed an improvement of classification quality for the proposed AL algorithm compared to random labelling of the unlabelled training examples. For example, the accuracy of algorithm \(\varPsi _{AL}^{IV}\) with \(IV=12\) in our AL approach is about two percentage points higher than that of the same algorithm with random labelling in the case of the MAGIC Gamma Telescope and Phone data sets. The experiments also show that the selection of the IV parameter value significantly affects the quality of classification. For example, algorithm \(\varPsi _{AL}^{IV}\) obtained the best results for \(IV=15\) on the MAGIC Gamma Telescope data set and for \(IV=12\) on the Phone data set.

5 Conclusion

The paper presents a new AL algorithm based on the QBC approach, in which decision profiles are used to calculate the information value. Experiments on several data sets were carried out, and the obtained results are promising: classification quality improves when the proposed AL algorithm is used instead of random labelling of the unlabelled training examples.

In future studies we plan to investigate the impact of other values of the information value threshold. In addition, further experiments should involve imbalanced data sets [2] and not only binary data sets.