
1 Introduction

The support vector machine (SVM) is a widely used machine learning model proposed by Vapnik et al. on the basis of statistical learning theory [1]. It handles nonlinear problems through kernel-function mapping, thereby improving the performance of the learning machine [2, 3], and it currently attracts more and more research attention internationally [4, 5]. Machine learning is a basic problem in the field of artificial intelligence. Many learning systems have only a few samples at the early stage of learning; as time goes by, samples accumulate ceaselessly [6, 7], but the performance of the system declines, so the new samples must be brought into training to improve the system’s performance [8]. If the new samples are processed together with all the old ones, however, the sample set becomes very large, learning becomes more difficult, and too much time and storage space are consumed. An effective approach is to add, modify or delete only the part of the original learning result that is relevant to the new samples, leaving the irrelevant parts untouched; this is incremental learning. The learning result of a support vector machine is its support vector set, which is usually only a small part of the whole sample set but can represent the features of the entire sample set. This suggests that an SVM incremental learning method based on the support vector set is feasible. By analyzing the support vectors, this paper proposes an incremental learning algorithm for support vector machines based on the minimum classification error criterion [9].

2 Problem Description

2.1 The Function of Support Vector Machine and Support Vector

The support vector machine originates from the classification problem in the linear case. The goal of classification with a support vector machine is to find the optimal hyperplane, i.e., the maximum-margin hyperplane: the hyperplane that classifies the two kinds of samples correctly while making the classification margin between them as large as possible, as shown in Fig. 83.1 [10].

Fig. 83.1 Optimal separating hyperplane diagram

As to the training sample set \( (y_{1} ,x_{1} ),\ldots,(y_{l} ,x_{l} ), \) \( x \in R^{n} , \) \( y \in \{ - 1,1\} \), the quadratic optimization problem for the classification hyperplane is

$$ \begin{gathered} \mathop {\hbox{min} }\limits_{\psi ,b} \frac{1}{2}\left\| \psi \right\|^{2} + C\sum\limits_{i = 1}^{l} {\xi_{i} } \hfill \\ s.t. \, y_{i} ((\varphi (x_{i} ) * \psi ) + b) \ge 1 - \xi_{i} , \hfill \\ \, \xi_{i} \ge 0,\quad i = 1,\ldots,l \hfill \\ \end{gathered} $$
(83.1)

Its Lagrange function is

$$ \begin{gathered} L(\psi ,b,\alpha ,\beta ) = \frac{1}{2}(\psi * \psi ) + C\sum\limits_{i = 1}^{l} {\xi_{i} } \hfill \\ - \sum\limits_{i = 1}^{l} {\alpha_{i} \left( {y_{i} \left[ {\left( {\varphi (x_{i} ) * \psi } \right) + b} \right] - 1 + \xi_{i} } \right)} - \sum\limits_{i = 1}^{l} {\beta_{i} \xi_{i} } \hfill \\ \end{gathered} $$
(83.2)

Its dual problem is

$$ \begin{gathered} \hbox{max}\, W(\alpha ) = \sum\limits_{i = 1}^{l} {\alpha_{i} } - \frac{1}{2}\sum\limits_{i,j = 1}^{l} {y_{i} y_{j} \alpha_{i} \alpha_{j} \left( {\varphi (x_{i} ) * \varphi (x_{j} )} \right)} \hfill \\ = \sum\limits_{i = 1}^{l} {\alpha_{i} } - \frac{1}{2}\sum\limits_{i,j = 1}^{l} {y_{i} y_{j} \alpha_{i} \alpha_{j} K(x_{i} ,x_{j} )} \hfill \\ s.t.{\text{ C}} \ge \alpha_{i} \ge 0 \hfill \\ \sum\limits_{i = 1}^{l} {y_{i} \alpha_{i} } = 0 \hfill \\ \end{gathered} $$
(83.3)

Here, \( K(x_{i} ,x_{j} ) = \varphi (x_{i} ) * \varphi (x_{j} ) \) is a kernel function.
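For concreteness, the experiments in Sect. 3.2 use a Gaussian kernel. The following is a minimal sketch of how such a kernel and the corresponding kernel (Gram) matrix can be evaluated; the bandwidth parameter gamma = 0.5 is an illustrative choice, not a value taken from the paper.

```python
import numpy as np

def gaussian_kernel(xi, xj, gamma=0.5):
    """Gaussian (RBF) kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

    It plays the role of the inner product phi(x_i) * phi(x_j) in the
    dual problem (83.3); gamma = 0.5 is only an illustrative value.
    """
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def gram_matrix(X, gamma=0.5):
    """Kernel (Gram) matrix with entries K[i, j] = K(x_i, x_j)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * sq_dists)
```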

Finally, we obtain the decision function:

$$ f(x,\alpha ) = sign\left( {\sum\limits_{support \, vector} {y_{i} \alpha_{i}^{0} K(x_{i} ,x)} + b} \right) $$
(83.4)
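As an illustration of Eq. (83.4), the decision function can be reconstructed from the dual coefficients of a fitted kernel SVM. The sketch below uses scikit-learn, whose dual_coef_ attribute stores the products \( y_{i} \alpha_{i} \) of the support vectors; the two-class toy data and the parameter values are assumptions made only for demonstration.

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic classes, for demonstration only.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

def decision(x, clf, gamma=0.5):
    """Evaluate Eq. (83.4): sum over support vectors of y_i*alpha_i*K(x_i, x) + b."""
    k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return float(clf.dual_coef_[0] @ k + clf.intercept_[0])

x_new = np.array([0.3, -0.1])
print(np.sign(decision(x_new, clf)))               # f(x, alpha) assembled by hand
print(np.sign(clf.decision_function([x_new])[0]))  # the library's own value, same sign
```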

The learning machine that constructs this decision function is called the support vector machine (SVM). In classification learning, only the support vector samples contribute to the optimal hyperplane and the decision function. In general, a sample \( X_{i} \) whose Lagrange multiplier satisfies \( 0 < \alpha_{i} < C \) is called an ordinary support vector, while a sample \( X_{i} \) with \( \alpha_{i} = C \) is called a boundary support vector. The boundary support vectors include the samples that cannot be correctly classified, while the ordinary support vectors represent the classification features of most samples; together they determine the form of the final classifier. That is to say, the support vector set fully describes the features of the training sample set, and its division is equivalent to the division of the entire sample set. In most cases the support vector set accounts for only a small part of the training samples, so it can replace the training sample set for learning without affecting classification accuracy, while reducing training time and memory space.
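This replacement property can be checked empirically. The minimal sketch below (synthetic two-class data and scikit-learn’s SVC, not the paper’s own data or implementation) fits an SVM, retrains it on its support vectors alone, and compares the test accuracy of the two models:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Synthetic two-class data, for illustration only.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(500, 2) + 1.5, rng.randn(500, 2) - 1.5])
y = np.hstack([np.ones(500), -np.ones(500)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

full = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X_tr, y_tr)

# Keep only the support vectors and retrain on this much smaller set.
sv_idx = full.support_
reduced = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X_tr[sv_idx], y_tr[sv_idx])

print("support vectors: %d of %d training samples" % (len(sv_idx), len(X_tr)))
print("accuracy trained on full set:        %.3f" % full.score(X_te, y_te))
print("accuracy trained on support vectors: %.3f" % reduced.score(X_te, y_te))
```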

2.2 The Criterion of Minimum Classification Error

The simplest classification error measure comes from the Bayes discriminant, which can be used for two-class problems. Its classification error measure can be defined as:

$$ d(x) = P(C_{2} |x) - P(C_{1} |x) $$
(83.5)

Here, we assume that the posterior probabilities \( P(C_{i} |x)\;(i = 1,2) \) are known. This formula gives the degree to which an observed sample of class 1 would be mistakenly classified into class 2, and its optimal decision boundary is the solution of \( d(x) = 0 \). For the many-class case (\( N > 2 \)) with unknown distributions, the two-class Bayes discriminant above no longer suffices to define a classification error measure. Amari defines another classification error measure:

$$ d_{k} (x) = \frac{1}{{m_{k} }}\sum\limits_{{i \in S_{k} }} {\left[ {g_{i} (x;\Uplambda ) - g_{k} (x;\Uplambda )} \right]} $$
(83.6)

Here, \( S_{k} = \{ i\,|\,g_{i} (x;\Uplambda ) > g_{k} (x;\Uplambda )\} \) is the confusion class set and \( m_{k} \) is the number of classes in \( S_{k} \). \( S_{k} \) is not fixed: it changes with the parameter set \( \Uplambda \) and the sample \( x \). Therefore this formula is discontinuous with respect to \( \Uplambda \); its derivatives cannot be computed, so it is not suitable for gradient operations. There are many ways to define a classification error measure that is continuous in \( \Uplambda \). One of them is:

$$ d_{k} (x) = - g_{k} (x;\Uplambda ) + \left[ {\frac{1}{N - 1}\sum\limits_{j \ne k} {g_{j} (x;\Uplambda )^{\eta } } } \right]^{{\frac{1}{\eta }}} $$
(83.7)

In this formula, the second term on the right is the generalized (power) mean of the discriminant functions of all the other competing classes. The parameter \( \eta \) can be regarded as a coefficient that adjusts the contribution of the other competing classes to the overall discriminant function. During the search for the classifier parameters, many potential classifications can be explored by changing \( \eta \). An extreme situation occurs when \( \eta \to \infty \): the discriminant function of the strongest competing class dominates the second term. That is:

When \( \eta \to \infty \), then

$$ \left[ {\frac{1}{N - 1}\sum\limits_{j \ne k} {g_{j} (x;\Uplambda )^{\eta } } } \right]^{{\frac{1}{\eta }}} = \mathop {\hbox{max} }\limits_{j \ne k} g_{j} (x;\Uplambda ) $$
(83.8)

The classification error measure then changes into:

$$ d_{k} (x) = - g_{k} (x;\Uplambda ) + g_{j} (x;\Uplambda ) $$
(83.9)

Here, \( C_{j} \) is the class, other than \( C_{k} \), with the largest discriminant value among all the competing classes; the averaging constant disappears because \( (N - 1)^{1/\infty } \cong 1 \). Obviously, under this circumstance, \( d_{k} (x) > 0 \) indicates a classification error and \( d_{k} (x) \le 0 \) indicates a correct classification. In this way, the decision rule reduces to judging the sign of a scalar value.
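A minimal sketch of these measures, using plain NumPy and purely illustrative discriminant values (not taken from the paper), computes the smoothed measure of Eq. (83.7) and shows that for large \( \eta \) it approaches the max form of Eq. (83.9):

```python
import numpy as np

def mce_measure(g, k, eta=10.0):
    """Smoothed misclassification measure d_k(x) of Eq. (83.7).

    g   : discriminant values g_j(x; Lambda) for all N classes
          (assumed positive so that the power mean is well defined)
    k   : index of the true class
    eta : smoothing exponent; eta -> infinity recovers Eq. (83.9)
    """
    g = np.asarray(g, dtype=float)
    others = np.delete(g, k)                       # competing classes j != k
    power_mean = np.mean(others ** eta) ** (1.0 / eta)
    return -g[k] + power_mean

g = np.array([0.9, 0.4, 0.7, 0.2])   # illustrative discriminants, true class k = 0
for eta in (1.0, 5.0, 50.0):
    print(eta, mce_measure(g, k=0, eta=eta))

# Limit form of Eq. (83.9): -g_k + max_{j != k} g_j; negative => correct classification
print(-g[0] + np.max(np.delete(g, 0)))
```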

3 Algorithm Design and Experiment Analysis

3.1 Algorithm Design

Assume the initial training sample set is \( A = \left\{ {x_{1} ,x_{2} ,\ldots,x_{M} } \right\} \), where every \( x_{m} \;(m = 1,2,\ldots,M) \) is a \( K \)-dimensional vector belonging to one of the categories \( C_{i} \;(i = 1,2,\ldots,N) \). A classifier usually consists of a set of parameters and a decision rule, so the task of minimum classification error classifier design is: given the initial training sample set \( A \), find the classifier parameter set \( \Uplambda \) and the associated decision rule that minimize the probability of misclassifying any sample \( x_{m} \;(m = 1,2,\ldots,M) \). In general, the probability of classification error is approximated by the error rate. If costs are attached to the different classification errors, the goal of the classifier design becomes: find the classifier parameter set \( \Uplambda \) and the related decision rule that achieve the minimum expected cost.

The algorithm is as follows:

  1. Initialize the parameter set \( \Uplambda \), set \( t = 1 \), and select the historical sample set \( A \) and the new sample set \( I_{t} \). Calculate the class centers and the sample forgetting factor \( d(x_{k} ) \) for the two kinds of samples in \( A \), and determine the minimum classification error criterion set \( B \). Train on \( B \) to obtain the initial support vector set \( SV^{0} \) and decision function \( f^{0} \).

  2. If \( I_{t} = \varnothing \), the algorithm terminates; otherwise, turn to step 3.

  3. Find the samples in \( I_{t} \) that violate the \( KKT \) conditions of the decision function \( f^{t - 1} \), and collect them into the set \( I^{v}_{t} \).

  4. If \( I_{t}^{v} = \varnothing \), set \( SV^{t} = SV^{t - 1} \), \( f^{t} = f^{t - 1} \), \( t = t + 1 \), and turn to step 2; otherwise turn to step 5.

  5. Through incremental learning on \( T = SV^{t - 1} \cup I^{v}_{t} \), obtain the new decision function \( f^{t} \) and support vector set \( SV^{t} \), set \( t = t + 1 \), and turn to step 2.
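A code-level sketch of this loop is given below. It is an illustration under simplifying assumptions rather than the paper’s exact implementation: scikit-learn’s SVC stands in for the underlying SVM trainer, the KKT check of step 3 is approximated by the common surrogate condition \( y_{i} f(x_{i} ) \ge 1 \), and the class-center/forgetting-factor selection of step 1 is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC

def violates_kkt(clf, X_new, y_new):
    """Flag new samples that violate the KKT conditions of the current
    decision function f^{t-1}; approximated here by y_i * f(x_i) < 1."""
    margins = y_new * clf.decision_function(X_new)
    return margins < 1.0

def incremental_svm(X0, y0, increments, C=1.0, gamma=0.5):
    """Simplified sketch of the incremental SVM learning loop, steps 1-5 above."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X0, y0)   # step 1: f^0 and SV^0
    SV_X, SV_y = X0[clf.support_], y0[clf.support_]
    for X_new, y_new in increments:                         # step 2: next increment I_t
        mask = violates_kkt(clf, X_new, y_new)              # step 3: determine I_t^v
        if not np.any(mask):                                # step 4: nothing new to learn
            continue
        T_X = np.vstack([SV_X, X_new[mask]])                # step 5: T = SV^{t-1} U I_t^v
        T_y = np.hstack([SV_y, y_new[mask]])
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(T_X, T_y)
        SV_X, SV_y = T_X[clf.support_], T_y[clf.support_]
    return clf
```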

3.2 Experiment Analysis

The experimental data set contains 3678 calibrated samples. To verify the effectiveness and feasibility of the algorithm, we applied the incremental learning algorithm to the automatic classification of these samples on the basis of the above research. In the test, 643 samples were selected as the test sample set and 781 samples as the initial training set (H 1). The remaining samples were randomly divided into five groups (H 2, H 3, H 4, H 5, H 6), which form five incremental training sample sets. We conducted comparative experiments between the new SVM incremental learning algorithm and the traditional SVM learning method. The experimental results are shown in Table 83.1 (the SVM kernel function is a Gaussian function; the software environment is MATLAB 7.0; the hardware environment is a Pentium 4 with 256 MB memory) [4].
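For illustration only, the split described above (643 test samples, the 781-sample initial training set H 1, and the five increment groups H 2 to H 6) could be fed to the incremental_svm sketch from Sect. 3.1 roughly as follows; the data here are a synthetic stand-in, since the paper’s calibrated samples are not reproduced:

```python
import numpy as np

# Synthetic stand-in for the 3678 calibrated samples (the real data are not available here).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(1839, 4) + 1.0, rng.randn(1839, 4) - 1.0])
y = np.hstack([np.ones(1839), -np.ones(1839)])

order = rng.permutation(len(X))
test_idx = order[:643]                        # test sample set
init_idx = order[643:643 + 781]               # H 1, initial training set
rest_idx = order[643 + 781:]                  # remaining samples

increments = [(X[g], y[g]) for g in np.array_split(rest_idx, 5)]   # H 2 ... H 6
clf = incremental_svm(X[init_idx], y[init_idx], increments)        # sketch from Sect. 3.1
print("test accuracy: %.3f" % clf.score(X[test_idx], y[test_idx]))
```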

Table 83.1 The comparison between results of the new SVM incremental learning algorithm and the traditional SVM learning method

The results show that, compared with the traditional learning algorithm, the SVM incremental learning algorithm achieves higher classification accuracy and a markedly faster training speed.

4 Conclusion

Based on the characteristics of support vector machines, we develop an SVM incremental learning method and, furthermore, propose an incremental learning algorithm based on the minimum classification error criterion. In each round of incremental training, this algorithm effectively retains the previously trained samples that carry important information, simplifies the current incremental training sample set, and keeps the samples most likely to become support vectors for the next round of SVM incremental training. The experimental results show that this learning method is superior to the traditional incremental learning algorithm in both accuracy and time consumption.