1 Introduction

Over the past two decades, owing to their remarkable classification capability, the support vector machine (SVM) [1] and its variants [2,3,4] have been used extensively in classification applications. SVM has two main learning features: (1) the training data are first mapped into a higher-dimensional feature space through a nonlinear feature mapping function \(\phi \left( x \right) \), and (2) a standard optimization method is then used to find the solution that maximizes the separating margin between the two classes in this feature space while minimizing the training errors. With the introduction of the epsilon-insensitive loss function, the support vector method has also been extended to regression problems [5].

As the training of an SVM involves a quadratic programming problem, the computational complexity of SVM training algorithms is usually high, at least quadratic in the number of training examples. It is therefore difficult to handle large problems with a single traditional SVM [6]; instead, mixtures of SVMs can be used in large applications [7].

Information and knowledge have great potential value, yet the speed at which information is updated is staggering, and the generalization ability of traditional learning algorithms is no longer adequate: such algorithms are prone to local minima, and as the number of training samples grows they face the curse of dimensionality, or training becomes infeasible due to memory limitations. Incremental learning and data classification techniques have therefore gradually become key technologies of computational intelligence [8].

Compared with ordinary data classification techniques, incremental learning classification techniques have significant advantages, manifested mainly in two aspects: (1) they eliminate the need to preserve historical data, thereby reducing the storage space occupied; (2) because each new round of training makes full use of the historical training results, the time of subsequent training is significantly reduced [9]. Some problems remain, however, since the amount of information provided by new samples differs from that provided by historical samples [10]. Most conventional incremental learning algorithms are implemented with decision tree or neural network algorithms, and their shortcomings appear to varying degrees: (1) because they lack control of the expected risk over the entire dataset, they easily overfit the training data; (2) because they lack a selective forgetting and elimination mechanism for the training data, the classification accuracy is affected. SVM, by contrast, is based on structural risk minimization theory and is one of the few learning algorithms that can successfully solve the first problem of traditional learning techniques. Its advantage is that its generalization performance does not depend on all of the training data: training yields a support vector set that represents the entire dataset, and this support vector set accounts for only a small part of the training dataset while carrying the important classification boundary information [11]. SVM can therefore discard useless samples to reduce the training dataset, reducing the storage space while reusing historical training results in new learning. Extending SVM to incremental learning is thus an effective approach, especially for big datasets.

Under normal circumstances, incremental learning algorithms are mainly used in the following situations: (1) the samples are generated in real time, such as stock trading data and time series data; (2) the samples arrive in blocks, such as scientific data; (3) the dataset is too large to be stored in a personal computer's memory, such as web logging data.

Incremental learning raises different problems for different types of learning, and the following two settings must be addressed: (1) samples are added one at a time, which is called online incremental learning; (2) sample datasets of appropriate size are added as increments [12]. The literature [3] first proposed incremental learning for support vector machines. It gave an incremental learning strategy but did not improve the solution algorithm, still using the standard support vector machine. It gave only an approximate incremental procedure: at each learning step, a small number of training samples that a general quadratic programming algorithm can handle are selected, only the support vectors are retained, and the other samples are discarded; new training samples are then processed in the same way until all samples have been trained [13]. Such approximate incremental learning algorithms lose part of the information when they discard some non-support vectors, although for large training datasets they improve on some shortcomings of traditional algorithms. The literature [4] gave an online incremental learning algorithm whose resulting solution is exact. The literature [5] gave a local online incremental learning algorithm based on radial basis functions: when new samples enter the training dataset, the entire training dataset need not be reconsidered, because the RBF kernel function is localized, saving computational time in the support vector machine procedure [14].

2 Variable support vector machine

The variable support vector machine (VSVM) is a fast and simple classification algorithm based on an improved support vector machine. It is obtained from the standard linear SVM quadratic programming problem by a simple deformation, yielding a convex minimization problem with non-negativity constraints in m-dimensional space, where m is the number of samples in the n-dimensional input data space. The optimality conditions of this non-negatively constrained minimization problem are converted into a symmetric positive definite complementarity problem, which is then solved by a simple, linearly convergent iterative algorithm. Solving VSVM requires an m-order matrix inversion, which the Sherman–Morrison–Woodbury (SMW) identity reduces to a smaller \(\left( {n+1} \right) \)-order matrix inversion [15] \((n\ll m)\). The VSVM algorithm can therefore process big datasets containing millions of sample points; for nonlinear VSVM, however, the matrix inversion cannot be reduced by the SMW identity, so big-scale problems are difficult to handle.

VSVM is a variant of the standard linear SVM classifier obtained by two simple changes: the interval between the parallel bounding planes is maximized with respect to \(\left( {w,b} \right) \), i.e., in \((n+1)\)-dimensional rather than n-dimensional space, and the 1-norm of the error \(\xi \) is replaced by the 2-norm, which eliminates the need for non-negativity constraints. The deformed classification problem is strongly convex, and its solution is obtained with the SVM classification method [16]. The deformed linear SVM classification problem is as follows:

$$\begin{aligned}&\min \frac{1}{2}\left( {\left\| w \right\| ^{2}+b^{2}} \right) +\frac{C}{2}\xi ^{T}\xi \nonumber \\&s.t. \qquad y_i \left( {\left( {w\cdot x_i } \right) +b} \right) +\xi _i \ge 1 ,i=1,...,m \end{aligned}$$
(1)

The Lagrangian function of problem (1) is shown in formula (2).

$$\begin{aligned} L= & {} \frac{1}{2}\left( {\left\| w \right\| ^{2}+b^{2}} \right) +\frac{C}{2}\xi ^{T}\xi \nonumber \\&-\sum _{i=1}^m {\alpha _i \left( {y_i \left( {\left( {w\cdot x_i } \right) +b} \right) +\xi _i -1} \right) } \end{aligned}$$
(2)

In the VSVM algorithm, the iterative formula is given in formula (3).

$$\begin{aligned}&\alpha ^{i+1}=Q^{-1}\left( {e+\left( {\left( {Q\alpha ^{i}-e} \right) -\lambda \alpha ^{i}} \right) _+ } \right) , \nonumber \\&\quad i=0,1,...,\lambda >0 \end{aligned}$$
(3)

When the condition \(0<\lambda <\frac{2}{C}\) is satisfied, the VSVM algorithm has global linear convergence for an arbitrary initial point. Using the SMW identity, the m-order inversion of the matrix Q is reduced to an \((n+1)\)-order \((n\ll m)\) matrix inversion, which makes the problem tractable and reduces the computation for big datasets. The SMW identity is as follows:

$$\begin{aligned} \left( {\frac{I}{v}+AA^{T}} \right) ^{-1}=v\left( {I-A\left( {\frac{I}{v}+A^{T}A} \right) ^{-1}A^{T}} \right) \end{aligned}$$

where \(v>0\) and A is an arbitrary \(m\times n\) matrix.
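As a sanity check, the identity can be verified numerically. The following NumPy sketch compares the two sides; the sizes m, n and the value v are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

# Numerical check of the SMW identity:
# (I/v + A A^T)^{-1} = v (I - A (I/v + A^T A)^{-1} A^T).
# The left side inverts an m x m matrix, the right side only an n x n one,
# which is the point of the identity when n << m.
rng = np.random.default_rng(0)
m, n, v = 500, 5, 2.0
A = rng.standard_normal((m, n))

lhs = np.linalg.inv(np.eye(m) / v + A @ A.T)            # m-order inverse
rhs = v * (np.eye(m) - A @ np.linalg.inv(np.eye(n) / v + A.T @ A) @ A.T)

print(np.allclose(lhs, rhs))  # True up to floating-point error
```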

For the nonlinear case, the kernel function is \(K\left( {x,y} \right) =\phi \left( x \right) ^{T}\phi \left( y \right) \), and the nonlinear classification function is given in formula (4).

$$\begin{aligned} f\left( x \right) =\alpha ^{T}DK\left( {A,x} \right) +b \end{aligned}$$
(4)

Let \(G=\left[ {A\;-e} \right] \) and \(Q=\frac{I}{C}+DK\left( {G,G^{T}} \right) D\). The problem \(\mathop {\min }\limits _{0\le \alpha \in R^{m}} \frac{1}{2}\alpha ^{T}Q\alpha -e^{T}\alpha \) then becomes formula (5).

$$\begin{aligned} \mathop {\min }\limits _{0\le \alpha \in R^{m}} \frac{1}{2}\alpha ^{T}\left( {\frac{I}{C}+DK\left( {G,G^{T}} \right) D} \right) \alpha -e^{T}\alpha \end{aligned}$$
(5)

The iteration formula for the nonlinear case has the same form as in the linear case, formula (3).
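To make the iteration concrete, here is a minimal Python sketch of formula (3). The helper names (`build_Q_linear`, `vsvm_iterate`) and the dense `np.linalg.inv` call are our own illustrative choices; the paper's point is precisely that this inverse should instead be obtained through the SMW identity or the update formulas of the following sections.

```python
import numpy as np

def build_Q_linear(A, y, C):
    """Linear case: Q = I/C + H H^T with H = D [A, -e] (formula (1)'s dual)."""
    m = A.shape[0]
    H = y[:, None] * np.hstack([A, -np.ones((m, 1))])
    return np.eye(m) / C + H @ H.T

def vsvm_iterate(Q, Q_inv, lam, alpha0, eps=1e-5, max_iter=10_000):
    """Fixed-point iteration of formula (3):
    alpha^{i+1} = Q^{-1}(e + ((Q alpha^i - e) - lam alpha^i)_+),
    globally linearly convergent when 0 < lam < 2/C."""
    e = np.ones(Q.shape[0])
    alpha = alpha0.copy()
    for _ in range(max_iter):
        alpha_next = Q_inv @ (e + np.maximum(Q @ alpha - e - lam * alpha, 0.0))
        if np.linalg.norm(alpha_next - alpha) <= eps:
            break
        alpha = alpha_next
    return alpha_next
```

In the nonlinear case, Q would instead be built as \(\frac{I}{C}+DK\left( {G,G^{T}} \right) D\) per formula (5), and the same iteration applies.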

3 Online incremental learning algorithm based on VSVM

This section builds on VSVM, which solves the conventional SVM problem quickly and effectively, to give a suitable online incremental learning algorithm. By updating the matrix inverse, the algorithm reuses the results computed on the original dataset before the increment: the inverse matrix can be updated without re-optimizing the new QP problem corresponding to the new training dataset, which greatly reduces the scale of computation.

3.1 Online incremental learning algorithm

The analysis in Sect. 2 shows that a sample point is either a support vector or a non-support vector, so both cases must be considered when a new sample \(x_c \) is added to the training dataset. When a new sample is added to the training dataset T, the incremental learning algorithm must update the SVM classifier. When the new sample \(x_c \) joins the training dataset, we first initialize \(\alpha _c =0\) and judge it with the decision function: if \(x_c \) satisfies \(y_c f\left( {x_c } \right) \ge 1\), then \(\alpha _c =0\) satisfies the KKT conditions, so \(x_c \) is not a support vector and has no influence on the constructed classifier. If \(x_c \) satisfies \(y_c f\left( {x_c } \right) <1\), then \(\alpha _c =0\) clearly violates the KKT conditions; the sample affects the construction of the classifier, and the dual problem must be re-optimized.
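In code, this screening step is a one-line test; a minimal sketch (the function name is ours):

```python
def kkt_satisfied(f_xc, y_c):
    """A new sample initialized with alpha_c = 0 satisfies the KKT
    conditions iff y_c * f(x_c) >= 1; only violators trigger
    re-optimization of the dual problem."""
    return y_c * f_xc >= 1.0
```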

Suppose the training dataset T has m samples and the newly added sample is \(\left( {x_{m+1} ,y_{m+1} } \right) \), with \(\alpha _{m+1} =0\). After adding the new sample \(\left( {x_{m+1} ,y_{m+1} } \right) \), the corresponding dual problem becomes formula (6).

$$\begin{aligned}&\min \frac{1}{2}\left( {\alpha ^{T}\;\alpha _{m+1} } \right) Q_{new} \left( {\begin{array}{l} \alpha \\ \alpha _{m+1} \\ \end{array}} \right) -e^{T}\left( {\begin{array}{l} \alpha \\ \alpha _{m+1} \\ \end{array}} \right) \nonumber \\&s.t. \qquad \left( {\alpha ^{T}\;\alpha _{m+1} } \right) ^{T}\ge 0 \end{aligned}$$
(6)

Let \(h_{m+1} =y_{m+1} \left( {x_{m+1}^T ,-1} \right) \) and \(H=D\left[ {A\;-e} \right] \); then \(H_{new} =\left( {\begin{array}{l} H \\ h_{m+1} \\ \end{array}} \right) \), abbreviated \(H_{new} =\left( {\begin{array}{l} H \\ h \\ \end{array}} \right) \), and \(Q_{new} =\frac{I}{C}+\left( {\begin{array}{l} H \\ h \\ \end{array}} \right) \left( {H^{T}\;h^{T}} \right) \), which gives formula (7).

$$\begin{aligned} Q_{new} =\left( \begin{array}{ll} Q &{} Hh^{T} \\ hH^{T} &{} \frac{1}{C}+hh^{T} \\ \end{array} \right) \end{aligned}$$
(7)

Applying the VSVM iteration formula (3) to the new problem, \(\alpha _{new} \) is computed as in formula (8).

$$\begin{aligned}&\alpha _{new}^{i+1} =Q_{new}^{-1} \left( {\left( {\left( {Q_{new} \alpha _{new}^i -e} \right) -\lambda \alpha _{new}^i } \right) _+ +e} \right) , \nonumber \\&\quad i=0,1,2,\ldots ,\lambda >0 \end{aligned}$$
(8)

where \(\alpha _{new}^0 =\left( {\begin{array}{l} \alpha \\ 0 \\ \end{array}} \right) \) and \(\alpha \) is the optimal solution on the training dataset T before the increment.

(1) The Linear Case

To obtain the new solution \(\alpha _{new} \), the key quantity in the VSVM iteration formula (8) is \(Q_{new}^{-1} \). Applying the SMW identity gives:

$$\begin{aligned} Q_{new}^{-1} =C\left( {I-\left( {\begin{array}{l} H \\ h \\ \end{array}} \right) \left( {\frac{I}{C}+H^{T}H+h^{T}h} \right) ^{-1}\left( {H^{T}\;h^{T}} \right) } \right) \end{aligned}$$

Let \(B=\left( {\frac{I}{C}+H^{T}H} \right) ^{-1}\); then we obtain:

$$\begin{aligned}&\left( {\frac{I}{C}+H^{T}H+h^{T}h} \right) ^{-1}=\left( {B^{-1}+h^{T}h} \right) ^{-1} \\&\quad =\left( {B-\frac{Bh^{T}hB}{1+hBh^{T}}} \right) \end{aligned}$$

Thus

$$\begin{aligned} Q_{new}^{-1} =C\left( {I-\left( {\begin{array}{l} H \\ h \\ \end{array}} \right) \left( {B-\frac{Bh^{T}hB}{1+hBh^{T}}} \right) \left( {H^{T}h^{T}} \right) } \right) \end{aligned}$$
(9)

where B was already obtained in the previous (pre-increment) computation. Using the VSVM iterative formula (8) after adding the new sample, the new solution \(\alpha _{new} \) is obtained from the previous solution \(\alpha \).
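A sketch of this update in NumPy, assuming h is stored as a 1-D array of length n+1 and B is the (n+1)-order inverse carried over from the previous step (the function name is ours):

```python
import numpy as np

def incremental_inverse_linear(B, H, h, C):
    """Formula (9): after appending the row h to H, refresh
    B = (I/C + H^T H)^{-1} with a rank-1 Sherman-Morrison step and
    assemble Q_new^{-1} = C (I - H_new B_new H_new^T); nothing larger
    than the (n+1)-order matrix B is ever inverted."""
    Bh = B @ h
    B_new = B - np.outer(Bh, Bh) / (1.0 + h @ Bh)    # (B^{-1} + h^T h)^{-1}
    H_new = np.vstack([H, h])
    Q_new_inv = C * (np.eye(H_new.shape[0]) - H_new @ B_new @ H_new.T)
    return Q_new_inv, B_new, H_new
```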

(2) The Nonlinear Case

Let \(g=\left( {x_{m+1}^T ,-1} \right) \) and \(G=\left[ {A\;-e} \right] \); then:

$$\begin{aligned} G_{new} =\left[ {\begin{array}{l} G \\ g \\ \end{array}} \right] , D_{new} =\left[ \begin{array}{ll} D &{} 0 \\ 0 &{} y_{m+1} \\ \end{array} \right] \end{aligned}$$

At this point, Q becomes \(Q_{new} \) as follows:

$$\begin{aligned}&Q_{new} =\frac{I}{C}+D_{new} K\left( {G_{new} ,G_{new}^T } \right) D_{new} \\&\quad =\left( \begin{array}{ll} Q &{} DK\left( {G,g^{T}} \right) y_{m+1} \\ y_{m+1} K\left( {g,G^{T}} \right) D &{} \frac{1}{C}+K\left( {g,g^{T}} \right) \\ \end{array} \right) \end{aligned}$$

Let \(DK\left( {G,g^{T}} \right) y_{m+1} =b\), \(y_{m+1} K\left( {g,G^{T}} \right) D=b^{T}\), and \(\frac{1}{C}+K\left( {g,g^{T}} \right) =d\); then \(Q_{new} \) can be expressed as follows:

$$\begin{aligned} Q_{new} =\left( \begin{array}{ll} Q &{} b \\ b^{T} &{} d \\ \end{array} \right) \end{aligned}$$

Its inverse is obtained as formula (10).

$$\begin{aligned} Q_{new}^{-1}{=}\left( \begin{array}{ll} \left( {Q-bb^{T}/d} \right) ^{-1} &{} -\left( {Q-bb^{T}/d} \right) ^{-1}b/d \\ -b^{T}\left( {Q-bb^{T}/d} \right) ^{-1}/d &{} b^{T}\left( {Q-bb^{T}/d} \right) ^{-1}b/d^{2}+1/d \\ \end{array} \right) \nonumber \\ \end{aligned}$$
(10)

where \(\left( {Q-bb^{T}/d} \right) ^{-1}\) exists.
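A sketch of formula (10) in NumPy, reusing the stored \(Q^{-1}\). Expressing the Schur-complement inverse by a rank-1 Sherman–Morrison correction is an implementation choice of ours, not spelled out in the paper:

```python
import numpy as np

def incremental_inverse_nonlinear(Q_inv, b, d):
    """Formula (10): block inverse of Q_new = [[Q, b], [b^T, d]].
    S = Q - b b^T / d is the Schur complement of d; its inverse is
    recovered from the stored Q^{-1} by a rank-1 correction:
    S^{-1} = Q^{-1} + Q^{-1} b b^T Q^{-1} / (d - b^T Q^{-1} b)."""
    Qb = Q_inv @ b
    S_inv = Q_inv + np.outer(Qb, Qb) / (d - b @ Qb)
    tr = -S_inv @ b / d                       # top-right block, shape (m,)
    corner = b @ S_inv @ b / d**2 + 1.0 / d
    return np.block([[S_inv, tr[:, None]],
                     [tr[None, :], np.array([[corner]])]])
```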

3.2 Online incremental learning algorithm based on VSVM

In summary, the online incremental learning algorithm based on VSVM is given as follows (an illustrative code sketch follows the steps).

First, the original sample dataset is partitioned as follows:

$$\begin{aligned}&\left\{ {T=\left\{ {\left( {x_1 ,y_1 } \right) ,\left( {x_2 ,y_2 } \right) ,\ldots ,\left( {x_m ,y_m } \right) } \right\} } \right\} \\&\quad \cup \left\{ {\left( {x_{m+1} ,y_{m+1} } \right) } \right\} \cup \ldots \cup \left\{ {\left( {x_N ,y_N } \right) } \right\} . \end{aligned}$$
  1. Step 1:

    Given the parameter C, select the parameter \(\lambda \) to satisfy \(0<\lambda <2/C\) and the accuracy requirement \(\varepsilon >0\); train the classifier \(C_1 \) on T; set \(k=1\).

  2. Step 2:

    Get the new sample \(\left( {x_{m+k} ,y_{m+k} } \right) \); set \(\alpha _{m+k} =0\) and \(i=0\).

  3. Step 3:

    Judge whether the new sample \(x_{m+k} \) satisfies the KKT conditions of classifier \(C_k \).

  4. Step 3.1:

    If the KKT conditions are satisfied, the classifier \(C_k \) is unchanged; set \(k=k+1\) and go to Step 2, until all training samples have been processed;

  5. Step 3.2:

    If the KKT conditions are not satisfied, \(x_{m+k} \) and the training dataset form a new training dataset; compute the inverse of the new matrix Q;

  6. Step 3.2.1:

    If the classification problem is linearly separable, the inverse of the new matrix Q is calculated using formula (9);

  7. Step 3.2.2:

    If the classification problem is nonlinear, use formula (10) to calculate the inverse of the new matrix Q;

  8. Step 4:

    Use the VSVM iteration formula (8) to calculate \(\alpha _{new}^{i+1} \).

  9. Step 5:

    If \(\left\| {\alpha _{new}^{i+1} -\alpha _{new}^i } \right\| \le \varepsilon \), construct the new classifier \(C_{k+1} \), set \(k=k+1\), and go to Step 2, until all samples have been trained; otherwise let \(i=i+1\) and go to Step 4.
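The following Python sketch strings these steps together for the linear case, reusing the illustrative helpers `vsvm_iterate` and `incremental_inverse_linear` sketched above; the streaming interface and variable names are assumptions made for illustration:

```python
import numpy as np

def oi_vsvm_linear(A, y, stream, C, eps=1e-5):
    """Sketch of the OI-VSVM steps above (linear case). `stream`
    yields arriving samples (x_c, y_c) one at a time."""
    lam = 1.9 / C                                    # Step 1: 0 < lam < 2/C
    m, n = A.shape
    H = y[:, None] * np.hstack([A, -np.ones((m, 1))])
    B = np.linalg.inv(np.eye(n + 1) / C + H.T @ H)   # only (n+1)-order inverse
    Q = np.eye(m) / C + H @ H.T
    Q_inv = C * (np.eye(m) - H @ B @ H.T)            # SMW form of Q^{-1}
    alpha = vsvm_iterate(Q, Q_inv, lam, np.zeros(m), eps)

    for x_c, y_c in stream:                          # Step 2: new sample
        f_xc = (H.T @ alpha) @ np.append(x_c, -1.0)  # f(x_c) = w.x_c + b
        if y_c * f_xc >= 1.0:                        # Step 3.1: KKT holds,
            continue                                 # classifier unchanged
        h = y_c * np.append(x_c, -1.0)               # Step 3.2.1: formula (9)
        Q_inv, B, H = incremental_inverse_linear(B, H, h, C)
        Q = np.eye(H.shape[0]) / C + H @ H.T
        alpha = vsvm_iterate(Q, Q_inv, lam, np.append(alpha, 0.0), eps)  # Steps 4-5
    return alpha, H
```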

4 Online decremental learning algorithm based on VSVM

4.1 Online decremental learning algorithm

As online incremental training proceeds, the size of the sample dataset becomes larger and larger: the amount of storage grows, the number of multipliers and kernel function evaluations to be computed increases, training slows down, and the processor load becomes heavier, causing ever longer running times. To effectively reduce the load on the processor, the size of the learning sample dataset must be reduced, so an algorithm for removing redundant samples must be designed; when one or more samples are removed from the training dataset T, a decremental learning algorithm is needed. In this context we put forward a decremental learning algorithm based on VSVM, the so-called decremental VSVM learning. Following certain rules, some samples are discarded to reduce the size of the sample dataset; losing only one sample at a time is called online decremental learning.

Losing multiple samples at a time is called bulk decremental learning. The literature [4] presented an online decremental learning algorithm, defining the decremental learning procedure as the reverse of incremental learning and using it to evaluate leave-one-out generalization ability. The literature [17] gave incremental and decremental learning algorithms for linear classification based on the Proximal Support Vector Machine (PSVM), with relatively fast running times; subsequently, the literature [7] gave a decremental learning algorithm based on PSVM that uses a weighted decay coefficient instead of the existing window method, improving the running speed of the algorithm.

4.2 Online decremental learning algorithm based on VSVM

This section gives the decremental learning algorithm based on VSVM. First, we need some basic conventions and notation.

(1) The classification training dataset is \(T=\left\{ {\left( {x_i ,y_i } \right) |x_i \in R^{n},y_i =\pm 1,i=1,\ldots ,m} \right\} \), wherein each \(x_i \) is a point in n-dimensional space. The sample points form \(A^{T}=\left[ {x_1 ,\ldots ,x_m } \right] \); \(y_i \in \left\{ {\pm 1} \right\} \) marks \(x_i \) as belonging to the positive or negative class, \(i=1,...,m\); and \(D=diag\left( {y_1 ,\ldots ,y_m } \right) \).

(2) For an arbitrary vector \(x\in R^{n}\) and an index subset K of its components, \(x\backslash K\) denotes the vector formed from x by removing the components whose indices are in K.

(3) I denotes the identity matrix. \(P\left( {i,j} \right) \) denotes the elementary matrix obtained from I by interchanging the i-th and j-th rows (columns), with \(P\left( {i,j} \right) ^{-1}=P\left( {i,j} \right) \).

In online decremental learning, the training dataset shrinks over time by one sample at each moment. Our study and analysis of the online decremental algorithm based on VSVM first considers removing one sample from the training dataset. Suppose the original training dataset T has m samples and the sample \(\left( {x_k ,y_k } \right) \) is to be removed. For the m-order positive definite matrix Q and its inverse \(Q^{-1}\), we apply elementary row and column transformations to Q: the k-th row and column are moved to the first row and column. Let \(P\left( {1,k} \right) QP\left( {1,k} \right) =K\); then \(K^{-1}=P\left( {1,k} \right) Q^{-1}P\left( {1,k} \right) =U\). Partition K and U into blocks:

$$\begin{aligned} K=\left( \begin{array}{ll} k_{11} &{} {k^{T}} \\ k &{} {Q_{m-1} } \\ \end{array} \right) , \quad U=\left( \begin{array}{ll} u_{11} &{} {u^{T}} \\ u &{} U_{m-1} \\ \end{array} \right) \end{aligned}$$

wherein \(k_{11} \), \(u_{11} \in R^{1}\) and \(k,u\in R^{m-1}\). Then \(K^{-1}\) is written as follows:

$$\begin{aligned} K^{-1}=\left( \begin{array}{ll} \left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1} &{} -\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}k^{T}Q_{m-1}^{-1} \\ -Q_{m-1}^{-1} k\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1} &{} Q_{m-1}^{-1} k\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}k^{T}Q_{m-1}^{-1} +Q_{m-1}^{-1} \end{array} \right) \end{aligned}$$

From \(K^{-1}=U\), formula (11) is obtained.

$$\begin{aligned}&\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}=u_{11} \nonumber \\&\quad -\,\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}k^{T}Q_{m-1}^{-1} =u^{T} \nonumber \\&\quad -\,Q_{m-1}^{-1} k\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}=u \nonumber \\&\quad Q_{m-1}^{-1} k\left( {k_{11} -k^{T}Q_{m-1}^{-1} k} \right) ^{-1}k^{T}Q_{m-1}^{-1} \nonumber \\&\quad +\,Q_{m-1}^{-1} =U_{m-1} \end{aligned}$$
(11)

From formula (11), we obtain formula (12):

$$\begin{aligned} Q_{m-1}^{-1} =U_{m-1} -\frac{uu^{T}}{u_{11} } \end{aligned}$$
(12)

Here \(Q_{m-1} \) denotes the matrix Q after the sample \(\left( {x_k ,y_k } \right) \) is removed. The preceding analysis shows that \(Q_{m-1}^{-1} \) can be expressed in terms of the stored \(Q^{-1}\) via formula (12), so the inverse of \(Q_{m-1} \) does not have to be recalculated, reducing the amount of computation.
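A sketch of this downdate in NumPy (0-based index k; the permutation is realized by fancy indexing rather than an explicit \(P\left( {1,k} \right) \) matrix, and the function name is ours):

```python
import numpy as np

def decremental_inverse(Q_inv, k):
    """Formula (12): the inverse of Q with its k-th row and column
    removed, recovered from the stored Q^{-1}. Moving row/column k to
    the front gives U = P(1,k) Q^{-1} P(1,k); then
    Q_{m-1}^{-1} = U_{m-1} - u u^T / u_{11}."""
    idx = [k] + [i for i in range(Q_inv.shape[0]) if i != k]
    U = Q_inv[np.ix_(idx, idx)]              # symmetric permutation
    u11, u, U_rest = U[0, 0], U[1:, 0], U[1:, 1:]
    return U_rest - np.outer(u, u) / u11
```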

Table 1 Linear case of OI-VSVM experimental results

After the decrement, the solution \(\alpha _{new} \) can be obtained from the VSVM iterative formula (8), as in formula (13).

$$\begin{aligned}&\alpha _{new}^{i+1} =Q_{m-1}^{-1} \left( {\left( {\left( {Q_{m-1} \alpha _{new}^i -e} \right) -\lambda \alpha _{new}^i } \right) _+ +e} \right) , \nonumber \\&\quad i=0,1,2\ldots ,\lambda >0 \end{aligned}$$
(13)

where \(\alpha _{new}^0 =\alpha \backslash \left\{ k \right\} \) and \(\alpha \) is the optimal solution before the decrement.

The details of the online decremental learning algorithm based on VSVM are as follows (an illustrative code sketch follows the steps):

  1. Step 1:

    Given the parameter C, select the parameter \(\lambda \) to satisfy \(0<\lambda <2/C\) and the accuracy requirement \(\varepsilon >0\); obtain the sample to be removed \(\left( {x_k ,y_k } \right) \);

  2. Step 2:

    Let \(\alpha _{new}^0 =\alpha \backslash \left\{ k \right\} \), where \(\alpha \) is the optimal solution before the decrement; set \(i=0\);

  3. Step 3:

    Use formula (12) to calculate the inverse of the new matrix \(Q_{m-1} \);

  4. Step 4:

    Use the VSVM iteration formula (13) to calculate \(\alpha _{new}^{i+1} \);

  5. Step 5:

    If \(\left\| {\alpha _{new}^{i+1} -\alpha _{new}^i } \right\| \le \varepsilon \), the new solution is obtained; otherwise, let \(i=i+1\) and go to Step 4.
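Combining the pieces, here is a minimal sketch of one decremental step, again reusing the hypothetical helpers `decremental_inverse` and `vsvm_iterate` sketched earlier:

```python
import numpy as np

def od_vsvm_step(Q, Q_inv, alpha, k, C, eps=1e-5):
    """One pass of Steps 1-5: drop sample k, update the inverse with
    formula (12), and re-solve with iteration (13)."""
    lam = 1.9 / C                                # satisfies 0 < lam < 2/C
    keep = [i for i in range(Q.shape[0]) if i != k]
    Q_small = Q[np.ix_(keep, keep)]              # Q_{m-1}
    Q_small_inv = decremental_inverse(Q_inv, k)  # formula (12), no inversion
    alpha0 = np.delete(alpha, k)                 # alpha \ {k}
    return vsvm_iterate(Q_small, Q_small_inv, lam, alpha0, eps)
```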

5 The experimental results analysis and discussion

5.1 The numerical experiments of online incremental learning algorithm

For convenience of description, this section abbreviates the VSVM-based online incremental learning algorithm as OI-VSVM. The online incremental learning algorithm that does not use previously calculated results to compute the matrix inverse is abbreviated OVSVM.

In order to verify the effectiveness of the OI-VSVM algorithm of Sect. 3, nine standard pattern-classification datasets [18] were selected for the numerical experiments and comparison. Matlab 2014a was used as the programming environment, and the numerical experiments were run on a personal computer.

We compare the OVSVM algorithm, the online incremental learning algorithm proposed in the literature [4] (abbreviated On-line), and the OI-VSVM algorithm on the selected datasets in terms of training and test accuracy and CPU running time. In all of the following tests we take \(\lambda ={1.9}/C\), where C is the penalty parameter, and the accuracy requirement \(\varepsilon \) is \(10^{-5}\). The numerical experiments are divided into the linear and nonlinear cases.

The results for the linear case are shown in Table 1, in which m is the number of sample points in the initial training dataset and C is the penalty parameter. The value of C is tuned by selecting, on a randomly drawn subset whose size is 10% of the training dataset, the value that yields the highest SVM accuracy; the same randomly selected split of the training data is used in the following experiments.

For the nonlinear case, the Gaussian radial basis kernel function \(K\left( {x,y} \right) =\exp \big ( -\left\| {x-y} \right\| ^{2}/\delta ^{2} \big )\) is selected, where \(\delta \) is the kernel parameter of the RBF kernel; the parameter selections and results are shown in Table 2.
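For reference, the kernel matrix used here can be computed as follows (a plain NumPy sketch; the vectorization style is our choice):

```python
import numpy as np

def rbf_kernel(X, Y, delta):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / delta^2),
    the form used in the nonlinear experiments of Table 2."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / delta**2)
```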

The experimental results in both the linear and nonlinear cases show that the OI-VSVM algorithm greatly reduces the CPU running time while keeping the training and test accuracy from dropping. This is mainly because OI-VSVM reuses the results of previous calculation steps and applies the SMW identity to compute the matrix inverse after a sample is added, thus saving computation time.

Table 2 Nonlinear case of OI-VSVM experimental results

5.2 The numerical experiments of online decremental learning algorithm

The effectiveness of the decremental VSVM learning algorithm is demonstrated by numerical experiments in this section. For convenience, the online decremental learning algorithm based on VSVM given in this section is abbreviated OD-VSVM; OVSVM denotes the VSVM online decremental learning algorithm that uses the direct inversion method.

To verify the performance of OD-VSVM, datasets were selected from the UCI machine learning repository [19] for the numerical experiments. Because the algorithms differ only in how the matrix inverse is computed (directly versus by update), their training and test accuracies are identical, so the selected datasets are compared only in CPU running time in our experiments. For the nonlinear case, the RBF kernel function \(K\left( {x,y} \right) =\exp \left( {-\left\| {x-y} \right\| ^{2}/2\delta ^{2}} \right) \) is selected, with \(\delta =2\). In VSVM we take \(\lambda ={1.9}/C\), where C is the penalty parameter, and \(C=1/m\) (m is the number of sample points in the training dataset). The accuracy is \(\varepsilon =10^{-5}\).

Table 3 The linear case of OD-VSVM experimental results

The comparison of CPU running times of OD-VSVM and OVSVM on the same datasets is shown in Table 3. When the training dataset is not too big, the CPU running time of OD-VSVM is less than that of OVSVM, but the difference is not obvious. For big-scale training datasets, however, the difference in CPU time between OD-VSVM and OVSVM is quite obvious, as on the Image1 and Image2 datasets [19].

Table 4 shows the experimental results of the different algorithms for VSVM online decremental learning in the nonlinear case. As can be seen from Table 4, the gap in CPU running time between OD-VSVM and OVSVM becomes more apparent as the size of the training dataset grows, and the larger the value of k, the more time OD-VSVM saves, where k is the number of samples lost at each step.

Table 4 The nonlinear case of OD-VSVM experimental results

The ability of online adaptive learning gives the support vector machine decremental learning that can change over time. Sect. 4.2 gave the online decremental learning algorithm based on VSVM using matrix blocking, which speeds up the computation of the inverse matrix. Because it takes full advantage of earlier learning results, it avoids re-learning when some samples are removed, effectively reducing the computation time of OVSVM and OD-VSVM. The numerical experimental results show that the VSVM online and bulk decremental learning algorithms given in this section maintain the original training and test accuracy while reducing the running time, so the decremental learning algorithm proposed in the paper is effective on big-scale datasets.

5.3 Calculating visual saliency map experiment

This part uses the video data published in Itti's datasets [20] to compute the visual saliency of video image sequences. The datasets include day and night video, indoor and outdoor video, news video, and various other videos. For convenient comparison, salient regions were first marked in the input images, which makes it easy to judge the pros and cons of the visual saliency maps produced by each model.

The image in Fig. 1a is the original image from the video. The image in Fig. 1b is the visual saliency map produced by Itti's model [21]. The image in Fig. 1c is the visual saliency map produced by the GBVS model [22]. The image in Fig. 1d is the visual saliency map produced by the IS model [23]. The image in Fig. 1e is the visual saliency map produced by the algorithm proposed in this paper.

According to the saliency map results, the visual saliency map incorporating the VSVM method better reflects the characteristics of the original image. At the same time, the visual saliency map serves as an important auxiliary in the subsequent target detection process.

5.4 The testing experiments of object tracking algorithm

In order to verify the accuracy of the proposed algorithm in the target detection field, a video-based face tracking test was performed; this section selected the Stan Birchfield datasets [24] for the target detection experiments. The test computer environment was an Intel E8400 2.6 GHz CPU with 8 GB DRAM, and target detection was implemented in Matlab 2014a. We focus on the robustness of the algorithm, including the detection results under changes of light intensity, changes of target shape, and occlusion, as well as comparisons of individual characteristics and performance.

The first set of experiments performs face detection on a video sequence with occlusion (\(128\times 96\); video file name: movie_cubicle; 95 frames in total). Three algorithms are compared: a target detection algorithm that considers the color feature of the target, a detection algorithm that applies visual saliency, and the method using OI-VSVM and OD-VSVM with global visual saliency features. Figure 2 shows the target detection results of the three algorithms under occlusion, for frames 1, 16, 21, 34, 51, 62, 70, and 88 of the video image sequence. In the sequence, the target is occluded during its movement, and the occluding object has a color similar to that of the target to be detected. The detection method that considers only the color feature performs poorly, because the background color distribution approximates the object color. In contrast, the method presented in this section (Fig. 2c), which fuses OI-VSVM and OD-VSVM with visual saliency features, can accurately locate the target even in the presence of occlusion, demonstrating the robustness of the algorithm.

Fig. 1 Different visual saliency maps

Fig. 2 Detection results under occlusion. a Results of the tracking algorithm using OI-VSVM features. b Results of the tracking algorithm using OD-VSVM features. c Results of the tracking algorithm combining OI-VSVM and OD-VSVM features

6 Conclusions

Firstly, this paper proposed an online incremental learning algorithm based on VSVM. The proposed algorithm takes full advantage of the pre-increment calculation results: it does not need to re-learn the entire training dataset, does not reduce the training or test accuracy, and reduces the amount of computation for the inverse matrix after an increment. The experimental results show that the algorithm effectively improves training speed while ensuring classification accuracy; however, as incremental training proceeds, the training set keeps growing, which slows training. To solve this problem, the paper also gave an online decremental learning algorithm based on VSVM, which speeds up decremental learning by updating the inverse matrix from the previously computed inverse. Because it takes full advantage of earlier learning results and avoids the re-learning caused by discarding samples, it effectively reduces the complexity of the computation time, and the numerical experiments prove the effectiveness of the algorithm.