Introduction

Online learning techniques have made great progress in recent years [1,2,3,4,5,6,7,8,9,10]. In general, online learning has several characteristics: (1) the samples arrive sequentially in a stream and only one new sample is available in each online learning round; (2) the label of the newly arrived sample is predicted by the current classifier before the true label of the sample is revealed; (3) when a new sample is misclassified, the classifier should be updated in time to improve its generalization ability; and (4) the classifier can be updated without re-training on all the previously seen samples.

In the literature, much attention has been paid to online supervised learning, e.g., [11,12,13,14], where the true labels are available in the online training process. However, in practice, we frequently face online semi-supervised learning problems [15,16,17], such as the human categorization problem. In [16], Zhu et al. designed a series of experiments to demonstrate that human learning behavior is closely related to the semi-supervised learning pattern. Furthermore, Gibson et al. [18] applied the learned semi-supervised model to human learning tasks. In human learning, learners incrementally learn the classes of various objects from the surrounding environment, where only a few objects are labeled by a knowledgeable source. This scenario can actually be regarded as online semi-supervised learning, that is, the label of a newly arrived sample is unavailable or presented only sporadically during the online process.

In this paper, we focus on the online semi-supervised learning (OS2L) problem. Several OS2L algorithms have been proposed in the past several years. By using a heuristic method to greedily label the unlabeled examples, Babenko et al. [19] and Grabner et al. [20] tried to solve OS2L problems within an online supervised learning framework. Dyer et al. [21] presented a semi-supervised learning (SSL) framework called COMPOSE (COMPacted Object Sample Extraction), in which a few labeled samples are given initially, and then an SSL problem is solved based on the currently labeled samples and new unlabeled samples, which follow a drifting distribution. To reduce the computational complexity of manifold construction in the online training process, Kveton et al. [22] and Farajtabar et al. [23] proposed the harmonic solution for manifold regularization on an approximate graph.

Using online convex programming, Goldberg et al. [24] proposed an online manifold learning framework for SSL in a kernel space with stochastic gradient descent. In addition, they extended their method to online active learning by adding an optional component to select the instances to be labeled [25]. Sun et al. [26, 27] exploited the Fenchel conjugate of the hinge loss and a gradient ascent method to solve the dual problem of their online manifold learning model. The algorithms in [24, 26, 27] are derived using online gradient methods, and can thus be regarded as solving off-line semi-supervised learning models by stochastic gradient methods. However, none of these stochastic gradient methods obtains an exact solution, because they do not directly solve the constrained optimization problem involved.

In practice, an exact solution is preferable, since it usually achieves a more accurate result with higher efficiency. In this paper, we propose an algorithm with an analytical solution for the online semi-supervised learning problem. Specifically, we propose a novel online manifold regularization learning model in a reproducing kernel Hilbert space (RKHS), exploiting the internal geometric information of the unlabeled data and taking advantage of kernel methods. In each iteration of online training, by considering the newly arrived sample and the previous samples, an online model based on a constrained optimization problem is presented, and the exact solution of the proposed model is obtained with the help of the Lagrange dual problem. Meanwhile, a fast learning algorithm (named FMOMR) is presented by introducing an approximation technique to compute the derivative of the manifold term. In addition, the regularization parameter of the proposed model can be regarded as a forgetting factor, which provides a reasonable and consistent way to control the number of support vectors. Owing to these merits, the proposed online predictors experimentally exhibit an accuracy comparable to that of the batch algorithm LapSVM.

This paper substantially extends our previous work [28] by providing (a) a fast algorithm for the proposed model (“Fast Algorithm of the Proposed Model” section), (b) several buffering strategies (“Buffering Strategies” section), (c) a brief theoretical analysis of the proposed algorithms (“Theory Analysis” section), (d) more experiments (“Action Video Categorization” section), and (e) some background knowledge (“Background Knowledge” section).

The rest of this paper is organized as follows. The background knowledge is briefly reviewed in the “Background Knowledge” section, and the proposed model and algorithms are detailed in the “Online Manifold Learning with Kernels” section. After presenting experimental results in the “Experiments” section, we conclude the paper in the “Conclusion” section.

Background Knowledge

The background knowledge consists of two parts: the Lagrange dual problem and LapSVM, a batch manifold learning model.

Lagrange Dual Problem

The Lagrange dual technique is frequently used for solving the primal optimization problem in RKHS. Thus, we give a brief review of the Lagrange dual problem in this section. Consider the primal constrained optimization problem:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} \min_x& \quad f_0(x) \\ \text{s.t.} &\quad f_i(x)\leq0,\quad i=1,...,p\\ &\quad h_i(x) = 0,\quad i=1,...,q, \end{array}\end{array} $$
(1)

where \(x\in\mathbb{R}^n\) and \(f_0(x)\) is the objective function to be minimized subject to the \(p\) inequality constraints \(f_i(x)\leq 0\) and the \(q\) equality constraints \(h_i(x)=0\). The Lagrange function of Eq. 1 is

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} L(x,\alpha,\beta) = f_0(x) +\sum\limits_{i=1}^p\alpha_if_i(x)+\sum\limits_{i=1}^q\beta_ih_i(x) \end{array} \end{array} $$
(2)

where \(\alpha = (\alpha_1,...,\alpha_p)^T\) and \(\beta = (\beta_1,...,\beta_q)^T\) are the Lagrange multipliers. By minimizing (2) over \(x\), the Lagrange dual function \(g\) is defined as:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} g(\alpha,\beta) = \min_x \quad L(x,\alpha,\beta) \end{array} \end{array} $$
(3)

Then the Lagrange dual problem of Eq. 1 is to maximize (3)

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} \max_{\alpha,\beta}&\quad g(\alpha,\beta) \\ \text{s.t.}& \quad \alpha_i\geq 0,\quad i=1,...,p. \end{array} \end{array} $$
(4)

Strong duality, that is, the optimal values of Eqs. 1 and 4 being equal, holds under Slater’s condition [29]: the primal problem is convex and there exists \(x_0\) such that \(f_i(x_0)<0\), \(i=1,...,p\). Therefore, the solution of the primal problem can be obtained by solving its Lagrange dual problem. In fact, the standard Laplacian SVM (an SVM based on manifold regularization) is commonly solved by the Lagrange dual technique, as reviewed below.
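Before turning to LapSVM, here is a toy illustration of the construction above (our own example, not taken from the paper): consider minimizing \(\frac{1}{2}\|x\|^2\) subject to \(1-a^Tx\leq 0\) with \(a\neq 0\). The Lagrangian is \(L(x,\alpha)=\frac{1}{2}\|x\|^2+\alpha(1-a^Tx)\) with \(\alpha\geq 0\); minimizing over \(x\) gives \(x=\alpha a\), so the dual function is

$$g(\alpha) = \alpha-\frac{1}{2}\alpha^2\|a\|^2. $$

Maximizing over \(\alpha\geq 0\) yields \(\alpha^{*}=1/\|a\|^2\) and the dual optimum \(1/(2\|a\|^2)\), which equals the primal optimum attained at \(x^{*}=a/\|a\|^2\); Slater’s condition holds here since any \(x\) with \(a^Tx>1\) is strictly feasible.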

Manifold Regularization for Semi-supervised Learning

The Laplacian support vector machine (LapSVM) [30] is derived by adding a manifold regularization term to the support vector machine (SVM). Given the labeled training data \((x_1,y_1),...,(x_l,y_l)\) and the unlabeled training data \(x_{l+1},...,x_{l+u}\), where \(x_{i}\in \mathcal {X}\) and \(y_i\in\{-1,1\}\), LapSVM is given by the following optimization problem:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \underset{f\in\mathcal{H}_K}{\min} &\frac{1}{l}\sum\limits_{i=1}^{l}(1-y_if(x_i))_{+}+\gamma_A||f||_K^2\\ &+\frac{\gamma_I}{(u+l)^2}\mathbf{f}^TL\mathbf{f}. \end{array} \end{array} $$
(5)

where \(\gamma_A\), \(\gamma_I\) are trade-off parameters, \(\mathbf{f} = [f(x_1),...,f(x_{l+u})]\), \(K: \mathcal {X} \times \mathcal {X} \rightarrow \mathbb {R}\) is a Mercer kernel and \(\mathcal {H}_{K}\) is the associated RKHS of functions \(f: \mathcal {X} \rightarrow \mathbb {R}\) with the corresponding norm \(\|\cdot\|_K\). In particular, the graph Laplacian \(L\) is defined by \(L = D-W\), where \(W\) is the edge-weight matrix and \(D\) is the diagonal degree matrix defined by \(D_{ii} = {\sum }_{j=1}^{l+u}W_{ij}\), \(i = 1,...,l+u\). By the Representer Theorem (see Theorem 2 in [30]), the optimal solution of Eq. 5 can be expressed as

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllll} f^{*}(x) = \sum\limits_{i=1}^{l+u}\alpha_i^{*}K(x,x_i). \end{array} \end{array} $$
(6)

By adding a bias term b to the above formula, the primal problem (5) can be rewritten as:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \underset{\alpha\in \mathbb{R}^{l+u},\xi\in\mathbb{R}^l}{\min} \frac{1}{l}\sum\limits_{i=1}^{l}\xi_i+\gamma_A\alpha^TK\alpha+\frac{\gamma_I}{(u+l)^2}\alpha^TKLK\alpha \\ \text{s.t.}\quad y_i\left( \sum\limits_{j=1}^{l+u}\alpha_jK(x_i,x_j)+b\right)\geq 1-\xi_i,i=1,...,l \\ \xi_i \geq 0, i=1,...,l. \end{array} \end{array} $$

where \(\alpha = [\alpha_1,...,\alpha_{l+u}]^T\). Then, we can obtain the Lagrangian:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllllll} L(\alpha,\xi,b,\beta,\zeta) \\ \qquad= \frac{1}{l}\sum\limits_{i=1}^{l}\xi_i+\frac{1}{2}\alpha^T\left( 2\gamma_AK+2\frac{\gamma_I}{(l+u)^2}KLK\right)\alpha \\ \qquad\quad-\sum\limits_{i=1}^{l}\beta_i\left( y_i\left( {\sum}_{j=1}^{l+u}\alpha_jK(x_i,x_j)+b\right)-1+\xi_i\right)\\\qquad\quad-\sum\limits_{i=1}^l\zeta_i\xi_i. \end{array} \end{array} $$

where \(\beta_i\), \(\zeta_i\) are Lagrange multipliers.

To obtain the minimum with respect to \(b\) and \(\xi\), consider the conditions \(\partial L/\partial b = 0\) and \(\partial L/\partial \xi_i = 0\). Thus, a reduced Lagrangian can be formulated as follows:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} L^{R}(\alpha,\beta) &=& \frac{1}{2}\alpha^T\left( 2\gamma_AK+2\frac{\gamma_I}{(u+l)^2}KLK\right)\alpha \\ &&-\alpha^TKJ^TY\beta+\sum\limits_{i=1}^l\beta_i. \end{array} \end{array} $$

where \(J = [I\ 0]\) is an \(l \times (l+u)\) matrix with \(I\) the \(l \times l\) identity matrix (assuming the first \(l\) points in the training set are labeled) and \(Y\) is a diagonal matrix with \(Y_{ii} = y_i\) for \(i = 1,...,l\). By taking the derivative of the reduced Lagrangian with respect to \(\alpha\), we have

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} \frac{\partial{L^R}}{\partial{\alpha}} = \left( 2\gamma_AK+2\frac{\gamma_I}{(u+l)^2}KLK\right)\alpha-KJ^TY\beta. \end{array} \end{array} $$
(7)

Setting this derivative to zero, we get:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} \alpha = \left( 2\gamma_AI+2\frac{\gamma_I}{(u+l)^2}LK\right)^{-1}J^TY\beta^{*}. \end{array} \end{array} $$
(8)

Substituting this back into the reduced Lagrangian, an optimization problem with respect to \(\beta\) is derived as follows:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \beta^{*} =& \underset{\beta\in\mathbb{R}^l}{\max}\sum\limits_{i=1}^l\beta_i-\frac{1}{2}\beta^TQ\beta \\ \text{s. t. } & \sum\limits_{i=1}^{l}\beta_iy_i = 0 \\ &0\leq \beta_i \leq \frac{1}{l},i=1,...,l \end{array} \end{array} $$
(9)

where

$$\begin{array}{@{}rcl@{}} Q = YJK\left( 2\gamma_AI+2\frac{\gamma_I}{(u+l)^2}LK\right)^{-1}J^TY \end{array} $$

By solving (9) and using Eqs. 6 and 8, we can obtain the optimal solution \(f^{*}(x)\). However, training the LapSVM classifier with all the training data can be very slow when the data size is large. To improve the computational efficiency for online learning, we propose a novel online manifold learning model based on a constrained optimization problem, which is presented in the next section.
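To make the pipeline above concrete, the following minimal sketch is our own illustration, not the authors' code: the function and variable names are ours, the bias term \(b\) is omitted, and a generic SLSQP routine stands in for a dedicated QP solver. It assembles \(K\), \(W\), \(L\) and \(Q\) for a small data set, solves the dual (9), and recovers \(\alpha\) via Eq. 8.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist

def rbf(A, B, sigma):
    # RBF similarity: exp(-||a - b||^2 / (2*sigma^2))
    return np.exp(-cdist(A, B, "sqeuclidean") / (2.0 * sigma ** 2))

def lapsvm_fit(X_lab, y_lab, X_unl, gamma_A=1.0, gamma_I=1.0,
               sigma_K=1.0, sigma_W=1.0):
    """Sketch of the LapSVM dual solve (Eqs. 5-9); y_lab is a +/-1 array."""
    X = np.vstack([X_lab, X_unl])            # first l points are labeled
    l, u = len(X_lab), len(X_unl)
    n = l + u

    K = rbf(X, X, sigma_K)                   # Mercer kernel matrix
    W = rbf(X, X, sigma_W)                   # fully connected edge weights
    L = np.diag(W.sum(axis=1)) - W           # graph Laplacian L = D - W

    J = np.hstack([np.eye(l), np.zeros((l, u))])   # J = [I 0]
    Y = np.diag(y_lab.astype(float))

    # Q = Y J K (2*gamma_A*I + 2*gamma_I/(l+u)^2 * L K)^{-1} J^T Y   (Eq. 9)
    M = 2 * gamma_A * np.eye(n) + (2 * gamma_I / n ** 2) * (L @ K)
    Minv = np.linalg.inv(M)
    Q = Y @ J @ K @ Minv @ J.T @ Y
    Q = 0.5 * (Q + Q.T)                      # symmetric in exact arithmetic

    # Dual QP (9): max sum(beta) - 0.5*beta^T Q beta,
    #              s.t. sum_i beta_i y_i = 0 and 0 <= beta_i <= 1/l.
    obj = lambda b: 0.5 * b @ Q @ b - b.sum()
    grad = lambda b: Q @ b - np.ones(l)
    cons = {"type": "eq", "fun": lambda b: b @ y_lab}
    res = minimize(obj, np.zeros(l), jac=grad, method="SLSQP",
                   bounds=[(0.0, 1.0 / l)] * l, constraints=[cons])
    beta = res.x

    alpha = Minv @ J.T @ Y @ beta            # expansion coefficients (Eq. 8)
    return alpha, X

def lapsvm_predict(alpha, X_train, X_test, sigma_K=1.0):
    # f*(x) = sum_i alpha_i K(x, x_i)   (Eq. 6; bias omitted in this sketch)
    return np.sign(rbf(X_test, X_train, sigma_K) @ alpha)
```

In practice one would use a dedicated QP solver and recover the bias from the KKT conditions; the sketch only shows how the matrices of Eqs. 5–9 fit together.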

Online Manifold Learning with Kernels

In this section, the proposed model is presented in detail. In the “Online Model Based on Manifold Regularization” section, a model based on manifold regularization is proposed for online semi-supervised learning in an RKHS. In the “Online Algorithm of the Proposed Model” section, the proposed model is solved by exploiting the properties of the Lagrange dual problem. Several fast learning strategies are presented in the “Fast Learning Strategies” section. A brief theoretical analysis of the proposed algorithms is presented in the “Theory Analysis” section.

Online Model Based on Manifold Regularization

Assume that the current learning data for semi-supervised learning are \((x_1,y_1,\delta_1),(x_2,y_2,\delta_2),...,(x_t,y_t,\delta_t)\), where \(x_{i}\in \mathcal {X}\) is a point, \(y_{i}\in \mathcal {Y}=\{-1,1\}\) is its label and \(\delta_i\) is a flag indicating whether the label \(y_i\) is available (\(y_i\) is available if and only if \(\delta_i = 1\)). At round \(t\), the current predictor is \(h_t(x) = \text{sign}(f_t(x))\), and \(f_0\) is set to 0 in our algorithm. In online semi-supervised learning, when a new sample \((x_{t+1},y_{t+1},\delta_{t+1})\) arrives, the function \(f_{t+1}\) is updated based on the current decision function \(f_t\) and the implicit feedback, that is, the manifold structure of the samples. The detailed process of online manifold learning is presented in Fig. 1: a new input \(x_t\) is provided to the current predictor, which computes the decision value \(f(x_t)\). Thereafter, the learner updates the decision function according to the available feedback: if the label of \(x_t\) is available, the learner updates the classifier with both the explicit feedback \(y_t\) and the implicit feedback from the manifold structure of the samples; otherwise, the classifier is updated based on the implicit feedback only. The process continues until no more new samples arrive, and the final predictor \(h_T(x) = \text{sign}(f_T(x))\) (where \(T\) is the final round) is used for classification tasks.

Fig. 1 Online semi-supervised learning based on manifold regularization framework
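In code, the protocol of Fig. 1 amounts to a simple loop. The sketch below is our own schematic (the streaming interface is assumed by us, and `predict` and `update` are placeholders for the predictor \(h_t\) and the model updates derived in the following sections); it only fixes the control flow.

```python
def online_semi_supervised(stream, update, predict):
    """Generic OS2L loop of Fig. 1 (a sketch).

    stream  -- iterable of (x_t, y_t, delta_t); delta_t == 1 iff y_t is revealed
    update  -- callable performing one model update (e.g., a MOMR/FMOMR step)
    predict -- callable mapping (model, x) to a decision value f(x);
               it should treat model=None as the zero function f_0 = 0
    """
    model = None                       # corresponds to f_0 = 0
    for x_t, y_t, delta_t in stream:
        f_x = predict(model, x_t)      # decision value of the current predictor
        y_hat = 1 if f_x >= 0 else -1  # h_t(x_t) = sign(f_t(x_t))
        if delta_t == 1:
            # explicit feedback y_t plus the implicit manifold feedback
            model = update(model, x_t, y=y_t)
        else:
            # implicit feedback only (manifold structure of the samples)
            model = update(model, x_t, y=None)
    return model
```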

Suppose that \(K(\cdot,\cdot)\) is a chosen kernel function over the training samples and \(\mathcal {H}\) is the corresponding RKHS. Then, according to the Representer Theorem [31], we can write \(f_t\) and \(f_{t+1}\) as follows:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllll} f_t(\cdot) &= \sum\limits_{i=1}^t\alpha_i^{t}K(x_i,\cdot), \\ f_{t+1}(\cdot) &= \sum\limits_{i=1}^t\alpha_i^{t+1}K(x_i,\cdot)+\alpha_{t+1}^{t+1}K(x_{t+1},\cdot). \end{array} \end{array} $$
(10)

In the online learning process, our aim is to update \(\{\alpha_i^{t+1}\}_{i=1}^{t+1}\) from \(\{\alpha_i^{t}\}_{i=1}^{t}\) by a proper algorithm. Considering the trade-off between the amount of progress made on each round and the amount of information retained from previous rounds, and compromising among the classification error, the manifold constraint and the complexity of \(f\) as in LapSVM, our online semi-supervised learning model with manifold regularization is presented as follows:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \underset{f,\xi_{t+1}}{\min}& \frac{1}{2}\|f-f_{t}\|_{\mathcal{H}}^{2}+\frac{\lambda_{1}}{2}\|f\|^{2}_{\mathcal{H}}+C\delta_{t+1}\xi_{t+1}\\&+ \frac{1}{2}\lambda_{2}\sum\limits_{i=1}^{t}(f(x_{i})-f(x_{t+1}))^{2}w_{it+1} \\ \text{s.t.} \quad &y_{t+1}f(x_{t+1})\geq 1-\xi_{t+1}, \xi_{t+1} \geq 0 \end{array} \end{array} $$
(11)

where \(\frac {1}{2}\|f-f_{t}\|_{\mathcal {H}}^{2}\) measures the difference between \(f\) and the previous \(f_t\), \(\|f\|^{2}_{\mathcal {H}}\) controls the complexity of the decision function \(f\), \({\sum }_{i=1}^{t}(f(x_{i})-f(x_{t+1}))^{2}w_{i,t+1}\) is the manifold regularizer, which depends on the edge weights \(w_{i,t+1}\), \(f\) and \(x_i\), \(\xi_{t+1}\) is the slack variable denoting a possible error on the newly arrived data \((x_{t+1},y_{t+1},\delta_{t+1})\) after \(f\) is determined, and \(\lambda_1\), \(\lambda_2\) and \(C\) are trade-off parameters balancing the complexity, the manifold regularizer and the classification error.

In the objective function of Eq. 11, the manifold structure of the samples is reflected in the term \({\sum }_{i=1}^{t}(f(x_{i})-f(x_{t+1}))^{2}w_{i,t+1}\), which can be regarded as an implicit feedback. This regularization term encourages the new sample to receive a decision value similar to those of its nearby samples on the manifold. The solution of the proposed model is presented in the next section.
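As a side remark (a standard identity, not an equation from the paper): if we form a graph over \(x_1,...,x_{t+1}\) with symmetric weights whose only nonzero entries are \(w_{i,t+1}=w_{t+1,i}\), the manifold term above is exactly the Laplacian quadratic form used in LapSVM,

$$\sum\limits_{i=1}^{t}(f(x_i)-f(x_{t+1}))^2w_{i,t+1} = \frac{1}{2}\sum\limits_{i,j=1}^{t+1}w_{ij}(f(x_i)-f(x_j))^2 = \mathbf{f}^TL\mathbf{f}, $$

where \(L=D-W\) and \(\mathbf{f}=[f(x_1),...,f(x_{t+1})]^T\). Under the representation (10), \(\mathbf{f}=K\alpha\), so this term becomes \(\lambda_2\alpha^TKLK\alpha\), which is the source of the matrix \(\lambda_2KLK\) appearing in the following sections.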

Online Algorithm of the Proposed Model

In this section, we give a detailed solution of the proposed model by exploiting the properties of the Lagrange dual problem. Assuming that \(\delta_{t+1} = 1\) (if \(\delta_{t+1} = 0\), the solution of Eq. 11 can be obtained by a similar process), the Lagrange dual problem of Eq. 11 is

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllll} \underset{\gamma_{t+1}}{\max}\underset{f,\xi_{t+1}}{\min}\quad &{L(f,\xi_{t+1},\gamma_{t+1},\beta_{t+1})} \\ \text{s.t.} \quad &\gamma_{t+1} \geq 0, \quad \beta_{t+1} \geq 0 \end{array} \end{array} $$
(12)

where γ t+1 and β t+1 are the Lagrange multipliers corresponding to the constraints y t+1 f(x t+1) ≥ 1 − ξ t+1 and ξ t+1 ≥ 0, respectively, and

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} {L(f,\xi_{t+1},\gamma_{t+1},\beta_{t+1})}& =& \frac{1}{2}\|f-f_t\|_{\mathcal{H}}^2+\frac{\lambda_1}{2}\|f\|^2_{\mathcal{H}} \\ &&+\frac{1}{2}\lambda_2\sum\limits_{i=1}^{t}(f(x_i)\,-\,f(x_{t+1}))^2w_{it+1} \\ &&-\gamma_{t+1}(y_{t+1}f(x_{t+1})-1+\xi_{t+1}) \\ &&+C\xi_{t+1}-\beta_{t+1}\xi_{t+1} \end{array} \end{array} $$

By solving the Lagrange dual problem of Eq. 11 (the details can be found in the Appendix), we can obtain the new classifier at time t + 1:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} &f_{t+1}(x) = \sum\limits_{i=1}^{t+1}\alpha_i^{t+1}K(x_{i},x), \\ &h_{t+1} = \text{sign}({f_{t+1}(x)}), \end{array} \end{array} $$
(13)

where

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \alpha^{t+1} =A^{-1}(K\tilde{\alpha}^{t}+\delta_{t+1} y_{t+1}\gamma_{t+1}^{*}J). \end{array} \end{array} $$

The above process is summarized in Algorithm 1. In Algorithm 1, when the first sample arrives, the value of \(\alpha_1\) is set to 1.

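For illustration, one round of the exact update can be sketched as follows. This is our own Python sketch, not the authors' pseudocode: we read \(J\) as \(Ke\) (the kernel column of the new sample), take \(\tilde{\alpha}^t\) as \(\alpha^t\) padded with a zero, compute \(\overline{\gamma}_{t+1}\) following Eq. 33 of the Appendix, and clip it to the box constraint \(0\leq\gamma_{t+1}\leq C\) of the dual.

```python
import numpy as np

def momr_step(alpha_t, K, L, y_new, delta_new, lam1, lam2, C):
    """One round of the exact MOMR update (a sketch of Algorithm 1).

    alpha_t -- coefficients after round t, shape (t,)
    K, L    -- kernel matrix and graph Laplacian over all t+1 samples,
               with the new sample x_{t+1} in the last position
    """
    n = K.shape[0]                          # t + 1 samples including x_{t+1}
    alpha_tilde = np.append(alpha_t, 0.0)   # previous alpha^t padded with 0
    e = np.zeros(n); e[-1] = 1.0            # unit vector selecting x_{t+1}
    J = K @ e                               # our reading: J = K e

    A = K + lam1 * K + lam2 * (K @ L @ K)   # A = K + lam1*K + lam2*K L K
    A_inv = np.linalg.inv(A)                # O(n^3): the bottleneck noted below

    if delta_new == 1:                      # label available: solve for gamma*
        gamma_bar = (1.0 - y_new * (J @ A_inv @ K @ alpha_tilde)) / (J @ A_inv @ J)
        gamma = float(np.clip(gamma_bar, 0.0, C))   # box constraint 0 <= gamma <= C
        correction = y_new * gamma * J
    else:                                   # unlabeled: implicit feedback only
        correction = np.zeros(n)

    return A_inv @ (K @ alpha_tilde + correction)   # alpha^{t+1}
```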

However, there are two difficulties in performing Algorithm 1: (1) to compute \(\alpha^{t+1}\), we need the inverse of the matrix \(A = K + \lambda_1 K + \lambda_2 KLK\), which is expensive to calculate when \(t\) is very large; (2) in the online learning process, online manifold learning algorithms with kernel functions have to store the whole sequence up to the current round. As a result, the set of support vectors grows unboundedly, which limits the applicability of the online algorithms. Therefore, we present a fast algorithm to solve the proposed model and introduce several buffering strategies to reduce the number of support vectors.

Fast Learning Strategies

Fast Algorithm of the Proposed Model

In this section, we propose a fast algorithm to solve the proposed model. Note that if we let \(\lambda_2 = 0\), calculating the inverse matrix can be avoided. Therefore, in order to retain the benefit of manifold regularization while improving the computational efficiency, we use an approximate term to replace (31).

Consider formula (30); by replacing the term \(\lambda_2 KLK\alpha\) with \(\lambda_2 KLK\alpha^t\), we have

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllll} \frac{\partial {L}^R}{\partial \alpha} \approx &(K+\lambda_1K)\alpha+\lambda_2KLK\alpha_t\\ &-K\alpha^t-Jy_{t+1}\gamma_{t+1}\\ \end{array} \end{array} $$
(14)

This approximation is reasonable because the term \(\frac {1}{2}\|f-f_{t}\|_{\mathcal {H}}^{2}\) in our model controls the distance of the predicted \(f\) from the previous \(f_t\), which guarantees that the difference between \(\alpha^{t+1}\) and \(\alpha^t\) is not very large. In addition, the convex function \(M(\alpha) = \alpha^T\lambda_2 KLK\alpha\) is continuous and differentiable, so we have

$$\begin{array}{@{}rcl@{}} \begin{array}{llllllll} \frac{\partial M}{\partial \alpha^{t+1}} \approx \frac{\partial M}{\partial \alpha^{t}} \\ \end{array} \end{array} $$

that is,

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \lambda_2KLK\alpha^{t+1} \approx \lambda_2KLK\alpha^{t} . \end{array} \end{array} $$

Now, setting Eq. 14 to zero, we get

$$\begin{array}{@{}rcl@{}} \begin{array}{llllllll} {\alpha} = &\frac{1}{1+\lambda_1}[(I-\lambda_2LK)\alpha_t+ey_{t+1}\gamma_{t+1}]\\ \end{array} \end{array} $$
(15)

Taking the derivative of Eq. 29 with respect to γ t+1 we get:

$$\begin{array}{@{}rcl@{}} \begin{array}{lllllll} \frac{\partial {L}^R}{\partial \gamma_{t+1}} = 1-y_{t+1}\alpha^TJ=0 \end{array} \end{array} $$
(16)

Substituting (15) into (16), we have

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \overline{\gamma}_{t+1} = 1+\lambda_1-y_{t+1}J^T(I-\lambda_2LK)\alpha^t \end{array} \end{array} $$
(17)

Let the approximate solution of Eq. 30 be \(\hat {\alpha }_{t+1}\) and \(\hat {\gamma }^{*}_{t+1}\). Hence,

$$\begin{array}{@{}rcl@{}} \hat{\gamma}_{t+1}^{*} = \left\{ \begin{array}{lcl} 0,\quad \text{if} \quad \overline{\gamma}_{t+1}\leq 0 \\ C,\quad \text{if} \quad \overline{\gamma}_{t+1}\geq C \\ \overline{\gamma}_{t+1},\quad \text{otherwise} \end{array} \right. \end{array} $$
(18)

Similar to Eq. 13, the classifier obtained at time t + 1 is:

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} &f_{t+1}(x) = \sum\limits_{i=1}^{t+1}\hat{\alpha}_i^{t+1}K(x_{i},x), \\ &h_{t+1} = \text{sign}({f_{t+1}(x)}), \end{array} \end{array} $$
(19)

where

$$\begin{array}{@{}rcl@{}} \begin{array}{lllll} \hat{\alpha}_{t+1} = &\frac{1}{1+\lambda_1}\left[(I-\lambda_2LK)\alpha_t+e\delta_{t+1}y_{t+1}\hat{\gamma}^{*}_{t+1}\right] \end{array} \end{array} $$

The above process is summarized in Algorithm 2.

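One round of FMOMR then reduces to a few matrix-vector products. The sketch below is our own illustration (not the authors' pseudocode), assuming that \(J\) is the kernel column \(Ke\) of the new sample and that \(\alpha^t\) is padded with a zero for \(x_{t+1}\).

```python
import numpy as np

def fmomr_step(alpha_t, K, L, y_new, delta_new, lam1, lam2, C):
    """One round of the fast FMOMR update, following Eqs. 15, 17 and 18.

    alpha_t -- coefficients after round t, shape (t,); K and L cover all
               t+1 samples, with x_{t+1} in the last position.
    """
    n = K.shape[0]
    alpha = np.append(alpha_t, 0.0)          # previous alpha^t padded with 0
    e = np.zeros(n); e[-1] = 1.0             # selects the new sample x_{t+1}
    k_new = K @ e                            # kernel column of x_{t+1} (our reading of J)

    # (I - lam2*L*K) alpha^t via matrix-vector products: O(n^2), no inverse
    M = alpha - lam2 * (L @ (K @ alpha))

    if delta_new == 1:
        gamma_bar = 1.0 + lam1 - y_new * (k_new @ M)    # Eq. 17
        gamma = float(np.clip(gamma_bar, 0.0, C))       # Eq. 18
        return (M + e * y_new * gamma) / (1.0 + lam1)   # Eq. 15
    return M / (1.0 + lam1)                  # unlabeled: manifold feedback only
```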

The main computation in Eqs. 15 and 17 is the matrix product \(L\times K\). By its definition, \(L = D-W\) is a sparse matrix (\(D\) is a diagonal matrix and \(W\) is a matrix in which only \(2t\) elements are non-zero), which means the computational complexity of \(L\times K\) is only \(O(t^2)\). Therefore, the computational complexity of Algorithm 2 is \(O(t^2)\). Note that the computational cost still becomes very high as \(t\) increases, which limits the scalability of the proposed algorithms. Therefore, we present several buffering strategies to improve the scalability of the online algorithms in the next section.

Buffering Strategies

In practice, kernel-based discriminative algorithms have been shown to perform very well on semi-supervised learning problems [30, 32]. However, in the online learning process, the set of support vectors will grow unboundedly, which limits the applicability of the online manifold regularization algorithms. To address this problem, we present several approaches to bound the size of the support set.

Buffering strategies [5, 24, 33] keep a fixed number of support vectors for online learning. Let the buffer size be τ. There are several different strategies:

(1) Buffer-N. The oldest sample in the buffer is replaced with the new incoming sample after each online learning round.

(2) Buffer-U. The oldest unlabeled sample in the buffer is replaced with the new incoming sample after each online learning round. When the buffer is filled with labeled samples, the oldest labeled point is evicted from the buffer.

To handle a more general case, one can instead remove the sample with the smallest \(|\alpha_i^t|\) at round \(t\), where \(|\cdot|\) denotes the absolute value. As suggested in [24], we choose Buffer-U as the buffering strategy for all our experiments.
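A minimal sketch of the Buffer-U eviction step is given below (our own function name and data layout; the smallest-\(|\alpha|\) rule appears as a commented alternative).

```python
import numpy as np

def evict_buffer_u(buffer, alpha, labeled, tau):
    """Buffer-U eviction (a sketch; the data layout is our own assumption).

    buffer  -- list of stored samples (support vectors), oldest first
    alpha   -- array of coefficients alpha_i^t aligned with `buffer`
    labeled -- list of booleans: True if the stored sample arrived with a label
    tau     -- maximum buffer size
    """
    if len(buffer) <= tau:
        return buffer, alpha, labeled
    # Evict the oldest unlabeled sample; if only labeled samples remain,
    # evict the oldest labeled one. (The corresponding row/column of K and
    # L must be dropped as well.)
    unlabeled = [i for i, lab in enumerate(labeled) if not lab]
    idx = unlabeled[0] if unlabeled else 0
    # Alternative rule mentioned above: evict the sample whose coefficient
    # has the smallest magnitude:
    #   idx = int(np.argmin(np.abs(alpha)))
    keep = [i for i in range(len(buffer)) if i != idx]
    return ([buffer[i] for i in keep],
            np.asarray(alpha)[keep],
            [labeled[i] for i in keep])
```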

Theory Analysis

In this section, we give a brief theoretical analysis of the proposed algorithms.

Theorem 1

Suppose that \(K(\cdot,\cdot)\) is a chosen kernel function over the training samples and \(\mathcal {H}\) is the corresponding RKHS; then (13) is exactly the solution of the primal problem (11).

Proof

Let

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} c_t^1(f) =& 1-\xi_{t+1}-y_{t+1}f(x_{t+1}), \\ c_t^2(f)=&-\xi_{t+1}. \end{array} \end{array} $$

Apparently, \(c_t^1(f)\) and \(c_t^2(f)\) are continuous. In addition, since the objective function of Eq. 11 is convex, by the Convex Duality Theorem of [34] (see Theorem 14.37 on page 532), the optimal value of the primal problem (11) is equal to that of its Lagrange dual problem. Therefore, according to the above derivation, the result (13) is exactly the solution of the primal problem (11). □

Theorem 1 implies that MOMR is an exact algorithm for the proposed model (11). Next, we analyze the relationship between the proposed MOMR and FMOMR.

Theorem 2

Suppose \(\lambda_2 = 0\); then the solutions of Eqs. 13 and 19 are equivalent.

Proof

Suppose λ 2 = 0, we have

$$\begin{array}{@{}rcl@{}} \begin{array}{llllllll} J^TA^{-1}J &= e^TK(K+\lambda_1K)^{-1}Ke \\ &= \frac{1}{1+\lambda_1}e^TKe = \frac{1}{1+\lambda_1}. \end{array} \end{array} $$
(20)

Therefore, from Eq. 33, we get

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \overline{\gamma}_{t+1} &= \frac{1-y_{t+1}J^TA^{-1}K\tilde{\alpha}^t}{J^TA^{-1}J}\\ &= 1+\lambda_1-y_{t+1}J^T\tilde{\alpha}^t. \end{array} \end{array} $$
(21)

which is equivalent to Eq. 17 if λ 2 = 0.

Similarly, by substituting λ 2 = 0 into (31) and (15) respectively, we have

$$\begin{array}{@{}rcl@{}} \begin{array}{llllll} \hat{\alpha}_{t+1} = {\alpha}_{t+1} = \frac{1}{1+\lambda_1}(\tilde{\alpha}^{t}+ey_{t+1}\gamma_{t+1}). \end{array} \end{array} $$
(22)

Theorem 2 is reasonable because Algorithm 2 is obtained only by approximating the derivative of the manifold regularization term. In addition, since (31) and (15) are continuous with respect to \(\lambda_2\), (15) is an appropriate approximation of Eq. 31 when \(\lambda_2\) is very small.

Experiments

In this section, to verify their effectiveness, we compare the proposed algorithms, MOMR and FMOMR, with two online manifold regularization algorithms and a batch algorithm on three data sets (see the “Handwritten Digit Recognition”, “Face Recognition” and “Action Video Categorization” sections).

In all the experiments, the RBF kernel \(k(x_{i},x_{j})=\exp ({-{\|x_{i}-x_{j}\|^{2}}/(2{{\sigma _{K}^{2}}}}))\) is used for classification. The edge weight is \(w_{ij}=\exp ({-{\|x_{i}-x_{j}\|^{2}}/(2{{\sigma _{W}^{2}}}}))\), which defines a fully connected graph. The label rate of the training samples is 2%.

In our experiments, we focus on online manifold regularization algorithms derived from the dual problem. Therefore, we compare the performance of our algorithms with an online manifold regularization algorithm based on Example-Associate Update (denoted OMR-EA), an online manifold regularization algorithm based on Overall Update (denoted OMR-Overall) [26], and the batch manifold regularization algorithm LapSVM [30]. As suggested in [26], the step sizes of OMR-EA and OMR-Overall are set to a small value, 0.01.

All the evaluations share the same buffering strategy, Buffer-U, but employ different buffer sizes (\(B\in\{50,100,150,200\}\)). The parameter values \(\sigma_K\), \(\sigma_W\), \(\lambda_1\) and \(\lambda_2\) are selected by five-fold cross validation on the first 500 samples of the training data, where \(\sigma_K,\sigma_W\in\{2^{-3},2^{-2},2^{-1},2^{0},2^{1},2^{2},2^{3}\}\) and \(\lambda_1,\lambda_2\in\{10^{-5},10^{-4},10^{-3},10^{-2},10^{-1},10^{0},10^{1},10^{2}\}\). In addition, the value of the parameter \(C\) is set to 1 for the proposed algorithms MOMR and FMOMR. The computational efficiency of all the algorithms is evaluated in terms of their CPU running time (in seconds). All the experiments are implemented in Matlab on a PC with an Intel(R) Core(TM) 3.2 GHz CPU, 4 GB RAM and the Windows 7 operating system.
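The parameter-selection protocol just described can be sketched as a plain grid search; the following is our own illustration (not the authors' code), where `evaluate` is a hypothetical callable that trains the online classifier on the training folds with the given parameters and returns its validation accuracy, and `X`, `y`, `delta` are NumPy arrays ordered by arrival time.

```python
import itertools
import numpy as np

def select_parameters(X, y, delta, evaluate, n_folds=5):
    """Five-fold grid search over the parameter grids used in the experiments."""
    X, y, delta = X[:500], y[:500], delta[:500]        # first 500 arriving samples
    sigmas = [2.0 ** p for p in range(-3, 4)]          # grid for sigma_K, sigma_W
    lams = [10.0 ** p for p in range(-5, 3)]           # grid for lambda_1, lambda_2
    folds = np.array_split(np.arange(len(X)), n_folds)

    best, best_acc = None, -np.inf
    for sK, sW, l1, l2 in itertools.product(sigmas, sigmas, lams, lams):
        accs = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.hstack([folds[j] for j in range(n_folds) if j != k])
            accs.append(evaluate(X[trn], y[trn], delta[trn], X[val], y[val],
                                 sigma_K=sK, sigma_W=sW, lam1=l1, lam2=l2))
        if np.mean(accs) > best_acc:
            best_acc = np.mean(accs)
            best = dict(sigma_K=sK, sigma_W=sW, lam1=l1, lam2=l2)
    return best
```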

All four online algorithms are performed in the same way, which can be divided into two steps: (1) online processing: training a classifier with each newly arrived sample using the online algorithm; (2) test: testing the final model on a test set. In contrast, the batch algorithm LapSVM is trained with all the visible samples in each learning round. We repeat all the experiments ten times (each with an independent random permutation of the training samples), and the results presented below are averaged over the ten trials.

Handwritten Digit Recognition

In this section, we perform an evaluation experiment on the MNIST data set [35]. We focus on the binary classification task of separating “6” from “8” (MNIST6VS8). The sizes of the training set and test set are 11,769 and 1,932, respectively. Some images of the MNIST6VS8 data set are presented in Fig. 2.

Fig. 2 Some images of the MNIST6VS8 data set. The top two rows are images of “6” while the bottom two rows are images of “8”

The test accuracies are summarized in Table 1. From the results, the test accuracies of MOMR and FMOMR are comparable to those of LapSVM and higher than those of OMR-EA and OMR-Overall. This is reasonable because MOMR exactly solves the proposed model, whereas OMR-EA and OMR-Overall [26] are obtained using stochastic gradient methods. Moreover, the performance of the fast algorithm FMOMR is very similar to that of MOMR, which implies that FMOMR is a proper approximation of the proposed model.

Table 1 Test accuracies (%) of MOMR, FMOMR, OMR-EA, OMR-Overall, and LapSVM on the MNIST6VS8 data set with different buffer sizes

The online updating time is presented in Fig. 3. With respect to the updating time, we can see that (a) MOMR is comparable to the other three online algorithms when the buffer size is small, and (b) FMOMR is comparable to the online algorithms OMR-EA and OMR-Overall and much faster than the off-line algorithm LapSVM. This is reasonable because each sample is trained only once by the online algorithms, and the buffering strategy avoids repeated training.

Fig. 3 Cumulative running time of online updating the classifiers with different buffer sizes on the MNIST6VS8 data set

Face Recognition

This experiment is performed on the FACEMIT data set [36], which contains 361-dimensional images of faces and non-faces. A balanced subset of size 5000 is randomly sampled from FACEMIT and divided into a training set and a test set of equal size. Some images of the FACEMIT data set are presented in Fig. 4.

Fig. 4 Some images of the FACEMIT data set. The top four rows are images of faces while the bottom four rows are images of non-faces

The test accuracies are summarized in Table 2. We can make the following comments: (a) the test accuracies of MOMR and FMOMR are higher than those of the other algorithms; (b) FMOMR surpasses the other algorithms in test accuracy, which further demonstrates that the proposed fast approximate algorithm FMOMR is reasonable and efficient.

Table 2 On the FACEMIT, test accuracies (%) of MOMR, FMOMR, OMR-EA, OMR-Overall, and LapSVM with different buffer sizes

The online updating times of the five algorithms are presented in Fig. 5. It can be seen that FMOMR is the fastest of the five algorithms. Additionally, the difference between MOMR and FMOMR increases as the buffer size grows. This can be explained by the fact that the computational complexities of MOMR and FMOMR are \(O(B^3)\) and \(O(B^2)\), respectively. Note that in Fig. 5 the curves are plotted on a single logarithmic axis. Therefore, MOMR consumes more time than FMOMR as \(B\) increases. However, when \(B\) is small, both MOMR and FMOMR run very fast, so the difference in cumulative running time between MOMR and FMOMR is insignificant.

Fig. 5 Cumulative running time of online updating the classifiers with different buffer sizes on the FACEMIT data set

Action Video Categorization

Further, we evaluate our methods on a kind of multi-manifold data, action videos. A video is made up of many static frames that are coherent in content and space, especially in action videos. We adopt the UCF YouTube dataset [37], which consists of 1168 video sequences captured under uncontrolled conditions. It is a challenging dataset owing to tremendous variations in camera motion, object pose, cluttered background, viewpoint, illumination, etc. We select two action categories, biking and diving (some frames are shown in Fig. 6), both of which have good continuity. We utilize dense trajectories [38] to describe the actions in the videos; this method extracts essential features representing actions and is robust to fast irregular motions and shot boundaries. Then, 10,000 frames are sampled from each of these two action categories (so the total number of frames is 20,000). They are divided into a training set and a test set with a 1:1 proportion for our experiment. The task is to classify these frames into the two action categories.

Fig. 6 Some frames from the videos of biking and diving. The top two rows are frames of biking while the bottom two rows are frames of diving

The test accuracies are summarized in Table 3. We can make the following comments: (a) the test accuracies of MOMR and FMOMR are comparable to that of the off-line algorithm LapSVM and higher than those of the two online algorithms OMR-EA and OMR-Overall; (b) the proposed algorithms MOMR and FMOMR achieve about 2% improvement over the other two online algorithms when the buffer size is small; and (c) all the online algorithms improve significantly as the buffer size increases.

Table 3 Test accuracies (%) of MOMR, FMOMR, OMR-EA, OMR-Overall, and LapSVM on a subset of the UCF YouTube dataset [37] with different buffer sizes

The online updating times of the five algorithms are presented in Fig. 7. With respect to the running time, it can be seen that (a) FMOMR is faster than the other compared online algorithms, and (b) all the compared online algorithms are much faster than the off-line algorithm LapSVM.

Fig. 7 Cumulative running time of online updating the classifiers with different buffer sizes on the UCF YouTube dataset

Considering the above results, it can be inferred that the proposed algorithms rank first among the five algorithms in terms of both test accuracy and running time. The proposed fast algorithm FMOMR performs best among the compared online algorithms on the three data sets because (a) FMOMR is the fastest of the compared algorithms, and (b) in terms of generalization performance, FMOMR is better than OMR-EA and OMR-Overall and comparable to the batch algorithm LapSVM.

Additionally, the test accuracy is higher with a larger buffer, but the time cost also increases with the buffer size. In practice, the buffer size can be used to trade off the accuracy and the time cost of online classifiers. An appropriate buffer size can be chosen by cross validation on the first N arriving samples, where N is a predefined number.

Conclusion

Based on the manifold-regularized online model, we derive an analytical solution of the constrained optimization problem by exploiting the Lagrange dual problem. The proposed idea yields two new algorithms for the online semi-supervised learning problem. Experimental results verify the effectiveness and validity of the proposed algorithms.

In fact, the proposed algorithms can solve not only semi-supervised learning problems but also online supervised learning problems (this can be done in the algorithm MOMR by deleting the manifold regularization term from the objective function of (11), that is, by setting \(\lambda_2\) to 0). In future work, we will extend the proposed algorithms to solve some specific online learning problems.