1 Introduction

Support vector machines (SVMs) are a state-of-the-art tool for pattern classification and regression problems [1–3], originating from the idea of structural risk minimization in statistical learning theory. With the aid of the kernel trick, SVMs can learn a nonlinear decision function that is linear in a potentially high-dimensional feature space [4]. In practice, SVMs have been applied to a variety of domains such as object detection, text categorization, bioinformatics and image classification.

In order to reduce the computational cost of SVMs, proximal support vector machines (PSVMs) [5] have been proposed. Compared with SVMs, which solve a convex optimization problem, PSVMs only solve a linear system of equations with time complexity \(O(d^{3})\), where \(d\) is the dimension of the examples. In essence, PSVMs classify the examples by a hyperplane while still guaranteeing the maximum margin. Mangasarian and Wild [6] proposed generalized eigenvalue proximal SVMs (GEPSVMs), an extension of PSVMs for binary classification. Instead of finding a single hyperplane as in PSVMs, GEPSVMs find two nonparallel hyperplanes such that each hyperplane is as close as possible to examples from one class and as far as possible from examples from the other class. The two hyperplanes are obtained from the eigenvectors corresponding to the smallest eigenvalues of two related generalized eigenvalue problems. Jayadeva et al. [7] proposed another nonparallel hyperplane classifier called twin SVMs (TSVMs), which generate two nonparallel hyperplanes such that each hyperplane is closer to one class and at a certain distance from the other class. The formulation of TSVMs differs from that of GEPSVMs and is similar to that of SVMs. TSVMs solve a pair of quadratic programming problems (QPPs), whereas SVMs solve a single QPP. This strategy of solving two smaller-sized QPPs rather than one large QPP makes TSVMs faster than standard SVMs [7]. Experimental results [8] show that the nonparallel hyperplane classifiers given by TSVMs can indeed improve on the performance of conventional SVMs [9–14].

In many machine learning tasks [15–18], labeled examples are often difficult and expensive to obtain, while unlabeled examples may be relatively easy to collect. Semi-supervised learning, which deals with this situation, has attracted a great deal of attention in the last decade. If the unlabeled data are properly used, it can outperform the counterpart supervised learning approaches. Several extensions of SVMs and TSVMs from supervised to semi-supervised learning have been proposed, e.g., transductive SVMs, semi-supervised support vector machines, Laplacian support vector machines (LapSVMs) and Laplacian twin support vector machines (LapTSVMs) [19–24]. LapTSVMs [24] are a successful combination of semi-supervised learning and TSVMs, providing a generalized framework of twin support vector machines for learning from labeled and unlabeled data. By choosing appropriate parameters, LapTSVMs can degenerate to TSVMs [25, 26]. Experimental results show that LapTSVMs are superior to LapSVMs and TSVMs in classification accuracy while requiring less training time.

In many real-world applications, multi-modal data are very common because of the use of different measuring methods (e.g., infrared and visual cameras) or different media (e.g., text, video and audio) [27]. For example, a web page can be represented by one vector for the words in the page text and another vector for the words in the anchor text of a hyperlink. In content-based web-image retrieval, an image can be simultaneously described by visual features and the text surrounding the image. Multi-view learning (MVL) is an emerging direction which aims to improve classifiers by leveraging the complementarity and consistency among distinct views [28–30]. The theories on MVL can be classified into four categories: canonical correlation analysis, the effectiveness of co-training, generalization error analysis for co-training, and generalization error analysis for other MVL approaches [27].

SVM-2K is a successful combination of MVL and SVMs which couples the maximum margin and multi-view regularization principles to leverage two views for improved classification performance [31]. Farquhar et al. [31] provided a theoretical analysis illuminating the effectiveness of SVM-2K, showing a significant reduction in the Rademacher complexity of the corresponding function class. Sun and Shawe-Taylor characterized the generalization error of multi-view sparse SVMs [32] and multi-view LapSVMs (MvLapSVMs) [33] in terms of the margin bound and derived the empirical Rademacher complexity of the considered function classes [34]. MvLapSVMs integrate three regularization terms, respectively on the function norm, the manifold and multi-view regularization, into the objective function. However, although LapTSVMs are superior to LapSVMs, no multi-view extension of LapTSVMs exists so far. In this paper, we extend LapTSVMs to a new framework named multi-view Laplacian twin support vector machines (MvLapTSVMs), which combines two views by introducing a constraint of similarity between the two one-dimensional projections identifying two distinct TSVMs from the two feature spaces. Compared with MvLapSVMs, there are two main differences. First, although LapSVMs and LapTSVMs both use a manifold regularization term for semi-supervised learning, they differ in principle; MvLapSVMs are based on LapSVMs while MvLapTSVMs are based on LapTSVMs. Second, MvLapTSVMs combine the two views in the constraints rather than in the objective function. Experimental results validate that our proposed methods are effective.

The remainder of this paper proceeds as follows. Section 2 briefly reviews related work including SVMs, TSVMs, LapSVMs, LapTSVMs and SVM-2K. Section 3 introduces our proposed linear MvLapTSVMs and kernel MvLapTSVMs. After reporting experimental results in Section 4, we give conclusions in Section 5.

2 Related work

In this section, we briefly review SVMs, TSVMs, LapSVMs, LapTSVMs and SVM-2K. They constitute the foundation of our subsequent proposed methods.

2.1 SVMs and TSVMs

Suppose there are \(l\) examples represented by a matrix \(A\) whose ith row \(A_{i}\) (\(i=1,2,\cdots,l\)) is the ith example. Let \(y_{i}\in\{1,-1\}\) denote the class to which the ith example belongs. For simplicity, here we only review the linearly separable case [1]. Then, we need to determine \(w\in R^{d}\) and \(b\in R\) such that

$$ y_{i}(A_{i}w+b)\geq 1. $$
(1)

The hyperplane described by \(w^{\top}x+b=0\) lies midway between the bounding hyperplanes given by \(w^{\top}x+b=1\) and \(w^{\top}x+b=-1\). The margin of separation between the two classes is given by \(\frac{2}{\|w\|}\), where \(\|w\|\) denotes the \(\ell_{2}\) norm of \(w\). Support vectors are those training examples lying on the above two bounding hyperplanes. The standard SVMs [1] are obtained by solving the following problem

$$\begin{array}{@{}rcl@{}} &&\min\limits_{w,b} \,\,\,\frac{1}{2}w^{\top}w \\ &&\text{s.t.}\,\,\,\,\,\forall i:y_{i}(A_{i}w+b)\geq 1. \end{array} $$
(2)

The decision function is

$$ f(x)=\text{sign}(w^{\top}x+b). $$
(3)
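
For concreteness, problem (2) can be handed to a generic quadratic programming solver. The following minimal sketch assumes NumPy and the cvxopt QP solver; the helper name and the toy data are ours, not part of the original formulation.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def train_linear_svm(A, y):
    """Hard-margin linear SVM, primal form of problem (2).
    A: (l, d) data matrix, y: (l,) labels in {+1, -1}. Returns (w, b).
    A sketch intended for small, linearly separable data only."""
    l, d = A.shape
    # Variable z = [w; b]; minimize (1/2) w^T w.
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = np.eye(d)
    P[d, d] = 1e-8                       # tiny ridge on b for numerical stability
    q = np.zeros(d + 1)
    # y_i (A_i w + b) >= 1  <=>  -y_i (A_i w + b) <= -1.
    G = -np.hstack([y[:, None] * A, y[:, None]])
    h = -np.ones(l)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:d], z[d]

# Toy usage: two separable clusters.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(2, 0.3, (20, 2)), rng.normal(-2, 0.3, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])
w, b = train_linear_svm(A, y)
pred = np.sign(A @ w + b)                # decision function (3)
```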

Then we introduce TSVMs [7]. Suppose the examples belonging to classes 1 and −1 are represented by matrices \(A_{+}\) and \(B_{-}\), whose sizes are \((l_{1}\times d)\) and \((l_{2}\times d)\), respectively. We define two matrices \(A\), \(B\) and four vectors \(v_{1}\), \(v_{2}\), \(e_{1}\), \(e_{2}\), where \(e_{1}\) and \(e_{2}\) are vectors of ones of appropriate dimensions and

$$\begin{array}{@{}rcl@{}} A&=&(A_{+} , e_{1}), \,\,B=(B_{-},e_{2}),\\ &&v_{1}= \left( \begin{array}{c} w_{1}\\ b_{1} \end{array}\right), \,\,v_{2}= \left( \begin{array}{c} w_{2}\\ b_{2} \end{array}\right). \end{array} $$
(4)

TSVMs obtain two nonparallel hyperplanes

$$ w_{1}^{\top}x+b_{1}=0\,\,\,\,\,\text{and}\,\,\,\,w_{2}^{\top}x+b_{2}=0 $$
(5)

around which the examples of the corresponding class get clustered. The classifier is given by solving the following QPPs separately (TSVM1)

$$\begin{array}{@{}rcl@{}} &&\min\limits_{v_{1},q_{1}}\,\,\,\frac{1}{2}(Av_{1})^{\top}(Av_{1})+c_{1}e_{2}^{\top}q_{1}\\ &&\text{s.t.} \,\,\,\,-Bv_{1}+q_{1}\succeq e_{2},\,\,q_{1}\succeq 0, \end{array} $$
(6)

(TSVM2)

$$\begin{array}{@{}rcl@{}} &&\min\limits_{v_{2},q_{2}}\,\,\,\frac{1}{2}(Bv_{2})^{\top}(Bv_{2})+c_{2}e_{1}^{\top}q_{2}\\ &&\text{s.t.} \,\,\,\, Av_{2}+q_{2}\succeq e_{1},\,\,q_{2}\succeq 0, \end{array} $$
(7)

where \(c_{1}\), \(c_{2}\) are nonnegative parameters and \(q_{1}\), \(q_{2}\) are slack vectors of appropriate dimensions. The label of a new example \(x\) is determined by the minimum of \(|x^{\top}w_{r}+b_{r}|\) (\(r=1,2\)), which are the perpendicular distances of \(x\) to the two hyperplanes given in (5).
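
As an illustration, the pair of QPPs (6) and (7) can also be passed to a generic QP solver. The sketch below is ours (cvxopt assumed); it stacks each primal problem over the variable \([v; q]\) and classifies a new example by the rule just described.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def solve_tsvm_qpp(M_obj, S, c, ridge=1e-8):
    """Solve  min_{v,q} 1/2 ||M_obj v||^2 + c 1^T q  s.t.  S v + q >= 1, q >= 0.
    TSVM1 (6) uses M_obj = A, S = -B; TSVM2 (7) uses M_obj = B, S = A."""
    n, m = M_obj.shape[1], S.shape[0]
    P = np.zeros((n + m, n + m))
    P[:n, :n] = M_obj.T @ M_obj + ridge * np.eye(n)   # ridge only for stability
    q = np.hstack([np.zeros(n), c * np.ones(m)])
    G = np.block([[-S, -np.eye(m)],                   # -(S v + q) <= -1
                  [np.zeros((m, n)), -np.eye(m)]])    # -q <= 0
    h = np.hstack([-np.ones(m), np.zeros(m)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    return np.array(sol['x']).ravel()[:n]             # v = (w, b)

def train_tsvm(A_plus, B_minus, c1, c2):
    A = np.hstack([A_plus, np.ones((len(A_plus), 1))])    # A = (A_+, e_1)
    B = np.hstack([B_minus, np.ones((len(B_minus), 1))])  # B = (B_-, e_2)
    v1 = solve_tsvm_qpp(A, -B, c1)   # hyperplane clustered around class +1
    v2 = solve_tsvm_qpp(B, A, c2)    # hyperplane clustered around class -1
    return v1, v2

def predict_tsvm(x, v1, v2):
    x_aug = np.append(x, 1.0)
    return 1 if abs(x_aug @ v1) <= abs(x_aug @ v2) else -1
```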

2.2 LapSVMs

LapSVMs combine manifold regularization and SVMs [22]. Suppose \(x_{1},\cdots,x_{l+u}\in R^{d}\) represent a set of examples including \(l\) labeled examples and \(u\) unlabeled examples. A matrix \(W\in R^{(l+u)\times(l+u)}\) represents the similarity of every pair of examples

$$ W_{ij}=exp\left(-\parallel x_{i}-x_{j}\parallel^{2}/2\sigma^{2}\right), $$
(8)

where σ is a scale parameter. The manifold regularization can be written as

$$\begin{array}{@{}rcl@{}} Reg(f)&=&\frac{1}{2}\sum\limits_{i,j=1}^{l+u}W_{ij}(f(x_{i})-f(x_{j}))^{2}\\ &=&\sum\limits_{i=1}^{l+u}\left(\sum\limits_{j=1}^{l+u}W_{ij}\right)f^{2}(x_{i})-\sum\limits_{i,j=1}^{l+u}W_{ij}f(x_{i})f(x_{j})\\ &=&\mathbf{f}^{\top}(V-W)\mathbf{f}=\mathbf{f}^{\top}L\mathbf{f}, \end{array} $$
(9)

where the function \(f: R^{d}\rightarrow R\) and \(\mathbf{f}=[f(x_{1}),\cdots,f(x_{l+u})]^{\top}\). The matrix \(V\) is diagonal with the ith diagonal entry \(V_{ii}={\sum}_{j=1}^{l+u}W_{ij}\). The matrix \(L=V-W\) is the graph Laplacian of \(W\). LapSVMs have the following optimization problem

$$\begin{array}{@{}rcl@{}} \min\limits_{f\in \mathcal {H}}&\frac{1}{l}&\!\!\sum\limits_{i=1}^{l}(1-y_{i}f(x_{i}))_{+}+\gamma_{A}\|f\|^{2}\\ &+&\frac{\gamma_{I}}{(u+l)^{2}}\sum\limits_{i,j=1}^{l+u}W_{ij}(f(x_{i})-f(x_{j}))^{2}, \end{array} $$
(10)

where \(\mathcal{H}\) is the RKHS induced by a kernel, and \(\gamma_{A}\) and \(\gamma_{I}\) are the ambient and intrinsic regularization coefficients, respectively.
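
The graph Laplacian used in (9) and (10), and again in the methods below, can be computed directly from (8). The short sketch below (the function name is ours) also numerically checks the identity \(\mathbf{f}^{\top}L\mathbf{f}=\frac{1}{2}\sum_{i,j}W_{ij}(f(x_{i})-f(x_{j}))^{2}\) of (9).

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Gaussian similarity W of (8) and graph Laplacian L = V - W of (9).
    X: (l+u, d) matrix stacking the labeled and unlabeled examples."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    V = np.diag(W.sum(axis=1))
    return W, V - W

# Sanity check of identity (9): f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
f = rng.normal(size=10)
W, L = graph_laplacian(X, sigma=0.8)
assert np.isclose(f @ L @ f, 0.5 * np.sum(W * (f[:, None] - f[None, :]) ** 2))
```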

2.3 LapTSVMs

LapTSVMs [24] extend TSVMs from supervised learning to semi-supervised learning, keeping the square loss and hinge loss functions of TSVMs and adding a manifold regularization term in the same sense as LapSVMs. The optimization problems of LapTSVMs can be written as

$$\begin{array}{@{}rcl@{}} &&\min\limits_{w_{1},b_{1},\xi}\frac{1}{2}\|A_{+}w_{1}+e_{1}b_{1}\|^{2}+c_{1}e_{2}^{\top}\xi+\frac{1}{2}c_{2}\left(\|w_{1}\|^{2}+{b_{1}^{2}}\right)\\ &&{\kern2.8pc}+\frac{1}{2}\,c_{3}\left(w_{1}^{\top}M^{\top}+e^{\top}b_{1}\right)L(Mw_{1}+eb_{1})\\ &&\text{s.t.}-(B_{-}w_{1}+e_{2}b_{1})+\xi\succeq e_{2},\,\xi\succeq 0, \end{array} $$
(11)
$$\begin{array}{@{}rcl@{}} &&\min\limits_{w_{2},b_{2},\eta}\frac{1}{2}\|B_{-}w_{2}+e_{2}b_{2}\|^{2}+c_{1}e_{1}^{\top}\eta+\frac{1}{2}c_{2}\left(\|w_{2}\|^{2}+{b_{2}^{2}}\right)\\ &&{\kern2.8pc}+\frac{1}{2}\,c_{3}\left(w_{2}^{\top}M^{\top}+e^{\top}b_{2}\right)L(Mw_{2}+eb_{2})\\ &&\text{s.t.}-(A_{+}w_{2}+e_{1}b_{2})+\eta\succeq e_{1},\,\eta\succeq 0, \end{array} $$
(12)

where \(M\) contains all the labeled and unlabeled data, \(L\) is the graph Laplacian, \(e_{1}\), \(e_{2}\) and \(e\) are vectors of ones of appropriate dimensions, \(w_{1}\), \(b_{1}\), \(w_{2}\), \(b_{2}\) are classifier parameters, \(c_{1}\), \(c_{2}\) and \(c_{3}\) are nonnegative parameters, and \(\xi\) and \(\eta\) are slack vectors of appropriate dimensions. The dual problems of (11) and (12) can respectively be written as

$$\begin{array}{@{}rcl@{}} &&\max\limits_{\alpha}e_{2}^{\top}\alpha-\frac{1}{2}\alpha^{\top}G\left(H^{\top}H+c_{2}I+c_{3}J^{\top}LJ\right)^{-1}G^{\top}\alpha\\ &&\text{s.t.}\,\,\,0\preceq \alpha \preceq c_{1}e_{2}, \end{array} $$
(13)
$$\begin{array}{@{}rcl@{}} &&\max_{\beta}e_{1}^{\top}\beta-\frac{1}{2}\beta^{\top}H\left(G^{\top}G+c_{2}I+c_{3}J^{\top}LJ\right)^{-1}H^{\top}\beta\\ &&\text{s.t.}\,\,\,0\preceq \beta \preceq c_{1}e_{1}, \end{array} $$
(14)

where

$$\begin{array}{@{}rcl@{}} v_{1}&=&\left( \begin{array}{c} w_{1}\\ b_{1} \end{array}\right), \,\,v_{2}=\left( \begin{array}{c} w_{2}\\ b_{2} \end{array}\right),\\ H&=&(A_{+},\,e_{1}),\,J=(M,\,e),\,G=(B_{-},\,e_{2}). \end{array} $$
(15)

\(\alpha\) and \(\beta\) are vectors of nonnegative Lagrange multipliers and \(I\) is an identity matrix of appropriate dimensions. Then \(v_{1}\) and \(v_{2}\) can be obtained as

$$\begin{array}{@{}rcl@{}} &v_{1}=-\left(H^{\top}H+c_{2}I+c_{3}J^{\top}LJ\right)^{-1}G^{\top}\alpha, \end{array} $$
(16)
$$\begin{array}{@{}rcl@{}} &v_{2}=-\left(G^{\top}G+c_{2}I+c_{3}J^{\top}LJ\right)^{-1}H^{\top}\beta. \end{array} $$
(17)

According to matrix theory, it can be easily proved that \(H^{\top}H+c_{2}I+c_{3}J^{\top}LJ\) is a positive definite matrix. LapTSVMs obtain two nonparallel hyperplanes

$$ w_{1}^{\top}x+b_{1}=0\,\,\,\,\,\text{and}\,\,\,\,w_{2}^{\top}x+b_{2}=0. $$
(18)

The label of a new example \(x\) is determined by the minimum of \(|x^{\top}w_{r}+b_{r}|\) (\(r=1,2\)), which are the perpendicular distances of \(x\) to the two hyperplanes given in (18).
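
Putting (13)–(17) together, LapTSVM training reduces to two box-constrained QPs followed by two linear solves. The sketch below is our own illustration of these steps (cvxopt assumed for the QPs), not the authors' implementation.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options['show_progress'] = False

def solve_box_qp(Q, upper):
    """min_a 1/2 a^T Q a - 1^T a  s.t.  0 <= a <= upper  (duals (13)/(14))."""
    Q = 0.5 * (Q + Q.T)                            # symmetrize for the solver
    m = Q.shape[0]
    G = np.vstack([np.eye(m), -np.eye(m)])
    h = np.hstack([upper * np.ones(m), np.zeros(m)])
    sol = solvers.qp(matrix(Q), matrix(-np.ones(m)), matrix(G), matrix(h))
    return np.array(sol['x']).ravel()

def train_laptsvm(A_plus, B_minus, M, L, c1, c2, c3):
    ones = lambda n: np.ones((n, 1))
    H = np.hstack([A_plus, ones(len(A_plus))])     # H = (A_+, e_1)
    G_ = np.hstack([B_minus, ones(len(B_minus))])  # G = (B_-, e_2)
    J = np.hstack([M, ones(len(M))])               # J = (M, e)
    I = np.eye(H.shape[1])
    S1 = H.T @ H + c2 * I + c3 * J.T @ L @ J       # positive definite
    S2 = G_.T @ G_ + c2 * I + c3 * J.T @ L @ J
    alpha = solve_box_qp(G_ @ np.linalg.solve(S1, G_.T), c1)   # dual (13)
    beta = solve_box_qp(H @ np.linalg.solve(S2, H.T), c1)      # dual (14)
    v1 = -np.linalg.solve(S1, G_.T @ alpha)        # eq. (16)
    v2 = -np.linalg.solve(S2, H.T @ beta)          # eq. (17)
    return v1, v2

def predict_laptsvm(x, v1, v2):
    x_aug = np.append(x, 1.0)
    return 1 if abs(x_aug @ v1) <= abs(x_aug @ v2) else -1
```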

2.4 SVM-2K

Suppose that we are given two views of the same data: view 1 is represented by a feature projection \(\phi_{A}\) with the corresponding kernel function \(k_{A}\), and view 2 is represented by a feature projection \(\phi_{B}\) with the corresponding kernel function \(k_{B}\). The two-view data are then given by the set \(S=\{(\phi_{A}(x_{1}),\phi_{B}(x_{1})),\cdots,(\phi_{A}(x_{n}),\phi_{B}(x_{n}))\}\). SVM-2K [31] combines the two views by introducing a constraint of similarity between two one-dimensional projections identifying two distinct SVMs from the two feature spaces:

$$ |\langle w_{A},\phi_{A}(x_{i})\rangle+b_{A}-\langle w_{B},\phi_{B}(x_{i})\rangle-b_{B}|\leq \eta_{i}+\epsilon $$
(19)

where \(w_{A}\), \(b_{A}\) (\(w_{B}\), \(b_{B}\)) are the weight vector and threshold of the first (second) SVM. The SVM-2K method solves the following optimization problem for the classifier parameters \(w_{A}\), \(b_{A}\), \(w_{B}\), \(b_{B}\)

$$\begin{array}{@{}rcl@{}} &&\min\limits_{w_{A},w_{B},q_{1i},q_{2i},\eta_{i}} \frac{1}{2}\|w_{A}\|^{2}+\frac{1}{2}\|w_{B}\|^{2}+c_{1}\sum\limits_{i=1}^{n}q_{1i}\\ &&{\kern5.1pc}+c_{2}\sum\limits_{i=1}^{n}q_{2i}+D\sum\limits_{i=1}^{n}\eta_{i}\\ &&\text{s.t.}{\kern.5pc}|\langle w_{A},\phi_{A}(x_{i})\rangle+b_{A}-\langle w_{B},\phi_{B}(x_{i})\rangle-b_{B}|\leq \eta_{i}+\epsilon, {} \\ &&{\kern1.5pc}y_{i}(\langle w_{A},\phi_{A}(x_{i})\rangle+b_{A})\geq 1-q_{1i},\\ &&{\kern1.5pc}y_{i}(\langle w_{B},\phi_{B}(x_{i})\rangle+b_{B})\geq 1-q_{2i},\\ &&{\kern1.5pc}q_{1i}\geq 0,q_{2i}\geq 0,\eta_{i}\geq 0,\text{all for}\,\, 1\leq i\leq n, \end{array} $$
(20)

where \(D\), \(c_{1}\), \(c_{2}\), \(\epsilon\) are nonnegative parameters and \(q_{1i}\), \(q_{2i}\), \(\eta_{i}\) are slack variables. Let \(\hat {w}_{A}\), \(\hat {w}_{B}\), \(\hat {b}_{A}\), \(\hat {b}_{B}\) be the solution to this optimization problem. The final SVM-2K decision function is \(f(x)=\frac {1}{2}(\langle \hat {w}_{A},\phi _{A}(x)\rangle +\hat {b}_{A}+\langle \hat {w}_{B},\phi _{B}(x)\rangle +\hat {b}_{B})\). The dual formulation of the above optimization problem can be written as

$$\begin{array}{@{}rcl@{}} &&\min\limits_{{\xi_{i}^{A}},{\xi_{j}^{A}},{\xi_{i}^{B}},{\xi_{j}^{B}},{\alpha_{i}^{A}},{\alpha_{i}^{B}}}\frac{1}{2}\sum\limits_{i,j=1}^{n}\left({\xi_{i}^{A}}{\xi_{j}^{A}}k_{A}(x_{i},x_{j})\right.\\ &&{\kern6pc}\left.+{\xi_{i}^{B}}{\xi_{j}^{B}}k_{B}(x_{i},x_{j})\right)-\sum\limits_{i=1}^{n}\left({\alpha_{i}^{A}}+{\alpha_{i}^{B}}\right)\\ &&\text{s.t.}\kern1pc{\xi_{i}^{A}}={\alpha_{i}^{A}}y_{i}-\beta_{i}^{+}+\beta_{i}^{-},\\ &&\kern2pc{\xi_{i}^{B}}={\alpha_{i}^{B}}y_{i}+\beta_{i}^{+}-\beta_{i}^{-},\\ &&{\kern2pc}\sum\limits_{i=1}^{n}{\xi_{i}^{A}}=\sum\limits_{i=1}^{n}{\xi_{i}^{B}}=0,\\ &&{\kern2pc}0\leq \beta_{i}^{+},\beta_{i}^{-},\beta_{i}^{+}+\beta_{i}^{-} \leq D,\\ &&{\kern2pc}0\leq \alpha_{i}^{A/B}\leq c_{1/2}, \end{array} $$
(21)

where \({\alpha _{i}^{A}}\), \({\alpha _{i}^{B}}\), \(\beta _{i}^{+}\), \(\beta _{i}^{-}\) are nonnegative Lagrange multipliers and we have taken \(\epsilon=0\). The prediction function for each view is given by

$$ f_{A/B}(x)=\sum\limits_{i=1}^{n}\xi_{i}^{A/B}k_{A/B}(x_{i},x)+b_{A/B}. $$
(22)
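
Given a solution of (21), prediction in SVM-2K only needs the per-view functions (22) and their average. A minimal sketch follows; the function names are ours, and the dual weights and biases are assumed to be already available from a solver.

```python
import numpy as np

def svm2k_view_output(x, X_train, xi, b, kernel):
    """Per-view output f_{A/B}(x) of eq. (22); xi and b come from solving (21)."""
    k_row = np.array([kernel(x_i, x) for x_i in X_train])
    return float(xi @ k_row) + b

def svm2k_predict(xA, xB, XA, XB, xiA, xiB, bA, bB, kA, kB):
    """Final SVM-2K decision: sign of the average of the two view outputs."""
    fA = svm2k_view_output(xA, XA, xiA, bA, kA)
    fB = svm2k_view_output(xB, XB, xiB, bB, kB)
    return int(np.sign(0.5 * (fA + fB)))

# Example kernel choice (ours): a linear kernel on each view.
linear_kernel = lambda u, v: float(u @ v)
```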

3 Our proposed methods

3.1 Linear MvLapTSVMs

In this subsection, we extend LapTSVMs to multi-view learning. On view 1, positive examples are represented by \(A_{1}^{\prime }\) and negative examples by \(B_{1}^{\prime }\); on view 2, positive examples are represented by \(A_{2}^{\prime }\) and negative examples by \(B_{2}^{\prime }\). The optimization problems of linear MvLapTSVMs can be written as

$$\begin{array}{@{}rcl@{}} &&\min\limits_{w_{1},w_{2},b_{1},b_{2},q_{1},q_{2},\eta}\frac{1}{2}\|A_{1}^{\prime}w_{1}+e_{1}b_{1}\|^{2}+\frac{1}{2}\|A_{2}^{\prime}w_{2}+e_{1}b_{2}\|^{2}\\ &&{\kern5.5pc}+c_{1}e_{2}^{\top}q_{1}+c_{2}e_{2}^{\top}q_{2}\\ &&{\kern5.5pc}+\frac{1}{2}c_{3}\left(\|w_{1}\|^{2}+{b_{1}^{2}}+\|w_{2}\|^{2}+{b_{2}^{2}}\right)\\ &&{\kern5.5pc}+\frac{1}{2}c_{4}\left[(w_{1}^{\top}M_{1}^{'\top}+e^{\top}b_{1})\right.\\ &&{\kern6pc}L_{1}(M_{1}^{\prime}w_{1}+eb_{1})\\ &&{\kern5.5pc}\left.+(w_{2}^{\top}M_{2}^{\prime\top}+e^{\top}b_{2})L_{2}(M_{2}^{\prime}w_{2}+eb_{2})\right]\\ &&{\kern6pc}+De_{1}^{\top}\eta\\ &&{\kern4.5pc}\text{s.t.}{\kern.5pc}|A_{1}^{\prime}w_{1}+e_{1}b_{1}-A_{2}^{\prime}w_{2}-e_{1}b_{2}|\preceq \eta,\\ &&{\kern5.7pc}-B_{1}^{\prime}w_{1}-e_{2}b_{1}+q_{1}\succeq e_{2},\\ &&{\kern5.7pc}-B_{2}^{\prime}w_{2}-e_{2}b_{2}+q_{2}\succeq e_{2},\\ &&{\kern6pc}q_{1}\succeq 0,\,q_{2}\succeq 0,\\ &&{\kern6pc}\eta\succeq 0, \end{array} $$
(23)
$$\begin{array}{@{}rcl@{}} &&\min\limits_{w_{3},w_{4},b_{3},b_{4},q_{3},q_{4},\zeta}\frac{1}{2}\|B_{1}^{\prime}w_{3}+e_{2}b_{3}\|^{2}+\frac{1}{2}\|B_{2}^{\prime}w_{4}+e_{2}b_{4}\|^{2}\\ &&{\kern5.5pc}+c_{1}e_{1}^{\top}q_{3}+c_{2}e_{1}^{\top}q_{4}\\ &&{\kern5.5pc}+\frac{1}{2}c_{3}\left(\|w_{3}\|^{2}+{b_{3}^{2}}+\|w_{4}\|^{2}+{b_{4}^{2}}\right)\\ &&{\kern5.5pc}+\frac{1}{2}c_{4}\left[(w_{3}^{\top}M_{1}^{\prime\top}+e^{\top}b_{3})\right.\\ &&{\kern6pc}L_{1}(M_{1}^{\prime}w_{3}+eb_{3})\\ &&{\kern5.5pc}\left.+(w_{4}^{\top}M_{2}^{\prime\top}+e^{\top}b_{4})L_{2}(M_{2}^{\prime}w_{4}+eb_{4})\right]\\ &&{\kern5.5pc}+He_{2}^{\top}\zeta\\ &&\text{s.t.}{\kern.5pc}|B_{1}^{\prime}w_{3}+e_{2}b_{3}-B_{2}^{\prime}w_{4}-e_{2}b_{4}|\preceq \zeta,\\ &&{\kern1.5pc}-A_{1}^{\prime}w_{3}-e_{1}b_{3}+q_{3}\succeq e_{1},\\ &&{\kern1.5pc}-A_{2}^{\prime}w_{4}-e_{1}b_{4}+q_{4}\succeq e_{1},\\ &&{\kern1.9pc}q_{3}\succeq 0,\,q_{4}\succeq 0,\\ &&{\kern1.9pc}\zeta\succeq 0, \end{array} $$
(24)

where \(M_{1}^{\prime }\) contains all the labeled and unlabeled data from view 1 and \(M_{2}^{\prime }\) contains all the labeled and unlabeled data from view 2. \(L_{1}\) and \(L_{2}\) are the graph Laplacians of view 1 and view 2, respectively. \(e_{1}\), \(e_{2}\) and \(e\) are vectors of ones of appropriate dimensions. \(w_{1}\), \(b_{1}\), \(w_{2}\), \(b_{2}\), \(w_{3}\), \(b_{3}\), \(w_{4}\), \(b_{4}\) are classifier parameters, \(c_{1}\), \(c_{2}\), \(c_{3}\), \(c_{4}\), \(D\) and \(H\) are nonnegative parameters, and \(q_{1}\), \(q_{2}\), \(q_{3}\), \(q_{4}\), \(\eta\) and \(\zeta\) are slack vectors of appropriate dimensions.

The Lagrangian of the optimization problem (23) is given by

$$\begin{array}{@{}rcl@{}} L&=&\frac{1}{2}\|A_{1}^{\prime}w_{1}+e_{1}b_{1}\|^{2}+\frac{1}{2}\|A_{2}^{\prime}w_{2}+e_{1}b_{2}\|^{2}+c_{1}e_{2}^{\top}q_{1}\\ &+&c_{2}e_{2}^{\top}q_{2}+\frac{1}{2}c_{3}\left(\|w_{1}\|^{2}+{b_{1}^{2}}+\|w_{2}\|^{2}+{b_{2}^{2}}\right)\\ &+&\frac{1}{2}c_{4}\left[(w_{1}^{\top}M_{1}^{'\top}+e^{\top}b_{1})L_{1}(M_{1}^{\prime}w_{1}+eb_{1})\right.\\ &+&\left.(w_{2}^{\top}M_{2}^{'\top}+e^{\top}b_{2})L_{2}(M_{2}^{\prime}w_{2}+eb_{2})\right]\\ &+&De_{1}^{\top}\eta-\beta_{1}^{\top}(\eta-A_{1}^{\prime}w_{1}-e_{1}b_{1}+A_{2}^{\prime}w_{2}+e_{1}b_{2})\\ &-&\beta_{2}^{\top}(A_{1}^{\prime}w_{1}+e_{1}b_{1}-A_{2}^{\prime}w_{2}-e_{1}b_{2}+\eta)\\ &-&\alpha_{1}^{\top}(-B_{1}^{\prime}w_{1}-e_{2}b_{1}+q_{1}-e_{2})\\ &-&\alpha_{2}^{\top}(-B_{2}^{\prime}w_{2}-e_{2}b_{2}+q_{2}-e_{2})\\ &-&\lambda_{1}^{\top}q_{1}-\lambda_{2}^{\top}q_{2}-\sigma^{\top}\eta, \end{array} $$
(25)

where \(\alpha_{1}\), \(\alpha_{2}\), \(\beta_{1}\), \(\beta_{2}\), \(\lambda_{1}\), \(\lambda_{2}\) and \(\sigma\) are vectors of nonnegative Lagrange multipliers. We take partial derivatives of the above Lagrangian and set them to zero

$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial w_{1}}&=&A_{1}^{'\top}(A_{1}^{\prime}w_{1}+e_{1}b_{1})+c_{3}w_{1}\\ &+&c_{4}M_{1}^{'\top}L_{1}(M_{1}^{\prime}w_{1}+eb_{1})\\ &+&A_{1}^{'\top}\beta_{1}-A_{1}^{'\top}\beta_{2}+B_{1}^{'\top}\alpha_{1}=0, \end{array} $$
$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial b_{1}}&=&e_{1}^{\top}(A_{1}^{\prime}w_{1}+e_{1}b_{1})+c_{3}b_{1}+c_{4}e^{\top}L_{1}(M_{1}^{\prime}w_{1}+eb_{1})\\ &+&e_{1}^{\top}\beta_{1}-e_{1}^{\top}\beta_{2}+e_{2}^{\top}\alpha_{1}=0,\\ \frac{\partial L}{\partial w_{2}}&=&A_{2}^{'\top}(A_{2}^{\prime}w_{2}+e_{1}b_{2})+c_{3}w_{2}\\ &+&c_{4}M_{2}^{'\top}L_{2}(M_{2}^{\prime}w_{2}+eb_{2})\\ &-&A_{2}^{'\top}\beta_{1}+A_{2}^{'\top}\beta_{2}+B_{2}^{'\top}\alpha_{2}=0,\\ \frac{\partial L}{\partial b_{2}}&=&e_{1}^{\top}(A_{2}^{\prime}w_{2}+e_{1}b_{2})+c_{3}b_{2}+c_{4}e^{\top}L_{2}(M_{2}^{\prime}w_{2}+eb_{2})\\ &-&e_{1}^{\top}\beta_{1}+e_{1}^{\top}\beta_{2}+e_{2}^{\top}\alpha_{2}=0, \end{array} $$
(26)
$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial q_{1}}=c_{1}e_{2}-\alpha_{1}-\lambda_{1}=0,\\ &\frac{\partial L}{\partial q_{2}}=c_{2}e_{2}-\alpha_{2}-\lambda_{2}=0,\\ &\frac{\partial L}{\partial \eta}=De_{1}-\beta_{1}-\beta_{2}-\sigma=0. \end{array} $$

We define

$$\begin{array}{@{}rcl@{}} &&A_{1}=\left(A_{1}^{\prime},\,e_{1}\right), \,\,A_{2}=\left(A_{2}^{\prime},\,e_{1}\right),\\ &&B_{1}=\left(B_{1}^{\prime},\, e_{2}\right),\,\,B_{2}=\left(B_{2}^{\prime},\, e_{2}\right),\\ &&J_{1}=\left(M_{1}^{\prime},\, e\right),J_{2}=\left(M_{2}^{\prime},\,e\right),v_{1}=\begin{pmatrix}w_{1}\\b_{1}\end{pmatrix},v_{2}=\begin{pmatrix}w_{2}\\b_{2}\end{pmatrix}. \end{array} $$
(27)

From the above equations, we obtain

$$\begin{array}{@{}rcl@{}} A_{1}^{\top}A_{1}v_{1}&+&c_{3}v_{1}+c_{4}J_{1}^{\top}L_{1}J_{1}v_{1}+A_{1}^{\top}\beta_{1}\\ &-&A_{1}^{\top}\beta_{2}+B_{1}^{\top}\alpha_{1}=0, \end{array} $$
(28)
$$\begin{array}{@{}rcl@{}} A_{2}^{\top}A_{2}v_{2}&+&c_{3}v_{2}+c_{4}J_{2}^{\top}L_{2}J_{2}v_{2}-A_{2}^{\top}\beta_{1}\\ &+&A_{2}^{\top}\beta_{2}+B_{2}^{\top}\alpha_{2}=0. \end{array} $$
(29)

It follows that

$$\begin{array}{@{}rcl@{}} v_{1}&=&\left(A_{1}^{\top}A_{1}+c_{3}I+c_{4}J_{1}^{\top}L_{1}J_{1}\right)^{-1}\\ &&[A_{1}^{\top}(\beta_{2}-\beta_{1})-B_{1}^{\top}\alpha_{1}], \end{array} $$
(30)
$$\begin{array}{@{}rcl@{}} v_{2}&=&\left(A_{2}^{\top}A_{2}+c_{3}I+c_{4}J_{2}^{\top}L_{2}J_{2}\right)^{-1}\\ &&[A_{2}^{\top}(\beta_{1}-\beta_{2})-B_{2}^{\top}\alpha_{2}]. \end{array} $$
(31)

We substitute (30), (31) into (25) and get

$$\begin{array}{@{}rcl@{}} L&=&(\alpha_{1}+\alpha_{2})^{\top}e_{2}-\frac{1}{2}\left[(\beta_{2}-\beta_{1})^{\top}A_{1}-\alpha_{1}^{\top}B_{1}\right]\left(A_{1}^{\top}A_{1}\right.\\ &+&\left.c_{3}I+c_{4}J_{1}^{\top}L_{1}J_{1}\right)^{-1} \left[A_{1}^{\top}(\beta_{2}-\beta_{1})-B_{1}^{\top}\alpha_{1}\right]\\ &-&\frac{1}{2}\left[(\beta_{1}-\beta_{2})^{\top}A_{2}-\alpha_{2}^{\top}B_{2}\right]\left(A_{2}^{\top}A_{2}+c_{3}I\right.\\ &+&\left.c_{4}J_{2}^{\top}L_{2}J_{2}\right)^{-1}\left[A_{2}^{\top}(\beta_{1}-\beta_{2})-B_{2}^{\top}\alpha_{2}\right]. \end{array} $$
(32)

Therefore, the dual optimization formulation is

$$\begin{array}{@{}rcl@{}} &&\min\limits_{\xi_{1},\xi_{2},\alpha_{1},\alpha_{2}}\frac{1}{2}\xi_{1}^{\top}\left(A_{1}^{\top}A_{1}+c_{3}I+c_{4}J_{1}^{\top}L_{1}J_{1}\right)^{-1}\xi_{1}\\ &&{\kern3pc}+\frac{1}{2}\xi_{2}^{\top}\left(A_{2}^{\top}A_{2}+c_{3}I+c_{4}J_{2}^{\top}L_{2}J_{2}\right)^{-1}\xi_{2}\\ &&{\kern3pc}-(\alpha_{1}+\alpha_{2})^{\top}e_{2}\\ &&{\kern.8pc}\text{s.t.}{\kern.5pc}\xi_{1}=A_{1}^{\top}(\beta_{2}-\beta_{1})-B_{1}^{\top}\alpha_{1},\\ &&{\kern2.3pc}\xi_{2}=A_{2}^{\top}(\beta_{1}-\beta_{2})-B_{2}^{\top}\alpha_{2},\\ &&{\kern2.6pc}0\preceq \beta_{1},\beta_{2},\beta_{1}+\beta_{2} \preceq De_{1},\\ &&{\kern2.6pc}0\preceq \alpha_{1/2}\preceq c_{1/2}e_{2}. \end{array} $$
(33)

Applying the same techniques to (24), we obtain its corresponding dual optimization formulation as

$$\begin{array}{@{}rcl@{}} &&\min\limits_{\rho_{1},\rho_{2},\omega_{1},\omega_{2}}\frac{1}{2}\rho_{1}^{\top}\left(B_{1}^{\top}B_{1}+c_{3}I+c_{4}J_{1}^{\top}L_{1}J_{1}\right)^{-1}\rho_{1}\\ &&{\kern3.5pc}+\frac{1}{2}\rho_{2}^{\top}\left(B_{2}^{\top}B_{2}+c_{3}I+c_{4}J_{2}^{\top}L_{2}J_{2}\right)^{-1}\rho_{2}\\ &&{\kern3.5pc}-(\omega_{1}+\omega_{2})^{\top}e_{1}\\ &&{\kern.7pc}\text{s.t.}{\kern1pc}\rho_{1}=B_{1}^{\top}(\gamma_{2}-\gamma_{1})-A_{1}^{\top}\omega_{1},\\ &&{\kern2.7pc}\rho_{2}=B_{2}^{\top}(\gamma_{1}-\gamma_{2})-A_{2}^{\top}\omega_{2},\\ &&{\kern3.1pc}0\preceq \gamma_{1},\gamma_{2},\gamma_{1}+\gamma_{2} \preceq He_{2},\\ &&{\kern3.1pc}0\preceq \omega_{1/2}\preceq c_{1/2}e_{1}, \end{array} $$
(34)

where the augmented vectors \(u_{1}=\begin {pmatrix}w_{3}\\b_{3}\end {pmatrix},\,u_{2}=\begin {pmatrix}w_{4}\\b_{4}\end {pmatrix}\) are given by

$$\begin{array}{@{}rcl@{}} u_{1}&=&\left(B_{1}^{\top}B_{1}+c_{3}I+c_{4}J_{1}^{\top}L_{1}J_{1}\right)^{-1}\\ &&{\kern.5pc}\left[B_{1}^{\top}(\gamma_{2}-\gamma_{1})-A_{1}^{\top}\omega_{1}\right], \end{array} $$
(35)
$$\begin{array}{@{}rcl@{}} u_{2}&=&\left(B_{2}^{\top}B_{2}+c_{3}I+c_{4}J_{2}^{\top}L_{2}J_{2}\right)^{-1}\\ &&{\kern.5pc}\left[B_{2}^{\top}(\gamma_{1}-\gamma_{2})-A_{2}^{\top}\omega_{2}\right]. \end{array} $$
(36)

For an example \(x\) with the two views \(x_{1}^{\prime }\) and \(x_{2}^{\prime }\), let \(x_{1}=(x_{1}^{\prime },1)\) and \(x_{2}=(x_{2}^{\prime },1)\). If \(\frac {1}{2}(|x_{1}^{\top }v_{1}|+|x_{2}^{\top }v_{2}|)\leq \frac {1}{2}(|x_{1}^{\top }u_{1}|+|x_{2}^{\top }u_{2}|)\), the example is classified to class +1, otherwise to class −1.
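
Once the multipliers of (33) and (34) have been obtained from a QP solver, the classifier parameters follow from the linear solves (30)–(31) and (35)–(36), and prediction uses the rule just stated. The sketch below is our own illustration with assumed variable names, not the authors' code.

```python
import numpy as np

def recover_pair(X1, X2, Y1, Y2, J1, J2, L1, L2, c3, c4,
                 mult_a1, mult_a2, mult_b1, mult_b2):
    """Linear solves (30)-(31): X1, X2 play the role of A_1, A_2 and Y1, Y2 of B_1, B_2.
    For (35)-(36), call it again with the roles of the A and B matrices swapped and
    the multipliers (omega, gamma) of problem (34)."""
    I = np.eye(X1.shape[1])
    S1 = X1.T @ X1 + c3 * I + c4 * J1.T @ L1 @ J1
    S2 = X2.T @ X2 + c3 * I + c4 * J2.T @ L2 @ J2
    p1 = np.linalg.solve(S1, X1.T @ (mult_b2 - mult_b1) - Y1.T @ mult_a1)
    p2 = np.linalg.solve(S2, X2.T @ (mult_b1 - mult_b2) - Y2.T @ mult_a2)
    return p1, p2

def predict_linear_mvlaptsvm(x1_prime, x2_prime, v1, v2, u1, u2):
    """Decision rule stated above: x1_prime, x2_prime are the two views of one example."""
    x1 = np.append(x1_prime, 1.0)
    x2 = np.append(x2_prime, 1.0)
    pos = 0.5 * (abs(x1 @ v1) + abs(x2 @ v2))
    neg = 0.5 * (abs(x1 @ u1) + abs(x2 @ u2))
    return 1 if pos <= neg else -1
```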

Now we compare SVM-2K and MvLapTSVMs. SVM-2K is a multi-view supervised learning method built on SVMs, while MvLapTSVMs are multi-view semi-supervised learning methods built on TSVMs. Suppose the number of examples from either class equals \(l/2\). SVM-2K solves a single QPP with computational complexity \(O((2l)^{3})\), while MvLapTSVMs solve a pair of QPPs with computational complexity \(O(2l^{3})\). Regarding hyper-parameter selection, SVM-2K has three hyper-parameters to select, whereas MvLapTSVMs have five. Therefore, MvLapTSVMs are more efficient for multi-view learning in terms of computational complexity.

3.2 Kernel MvLapTSVMs

Now we extend the linear MvLapTSVMs to the nonlinear case. The kernel-induced hyperplanes are:

$$\begin{array}{@{}rcl@{}} &K\{x_{1}^{\top},C_{1}^{\top}\}\lambda_{1}+b_{1}=0, &K\{x_{2}^{\top},C_{2}^{\top}\}\lambda_{2}+b_{2}=0,\\ &K\{x_{1}^{\top},C_{1}^{\top}\}\lambda_{3}+b_{3}=0, &K\{x_{2}^{\top},C_{2}^{\top}\}\lambda_{4}+b_{4}=0, \end{array} $$
(37)

where \(K\) is a chosen kernel function defined by \(K\{x_{i},x_{j}\}=(\Phi(x_{i}),\Phi(x_{j}))\), and \(\Phi(\cdot)\) is a nonlinear mapping from the low-dimensional input space to a high-dimensional feature space. \(C_{1}\) and \(C_{2}\) denote the training examples from view 1 and view 2 respectively, that is, \(C_{1}=(A^{'\top }_{1},B^{'\top }_{1})^{\top }\), \(C_{2}=(A^{'\top }_{2},B^{'\top }_{2})^{\top }\).
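
The blocks \(K\{\cdot,C^{\top}\}\) appearing below are ordinary Gram matrices between a set of examples and the stacked training set of a view. A small sketch follows; the RBF kernel is our example choice, not prescribed by the paper, and the variable names are ours.

```python
import numpy as np

def rbf_gram(X, C, gamma=1.0):
    """Gram matrix K{X, C^T} with an RBF kernel: K_ij = exp(-gamma * ||x_i - c_j||^2)."""
    sq = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(C ** 2, axis=1)[None, :]
          - 2.0 * X @ C.T)
    return np.exp(-gamma * sq)

# C1, C2 stack the positive and negative training examples per view, as in the text:
# C1 = np.vstack([A1_prime, B1_prime]); C2 = np.vstack([A2_prime, B2_prime])
# K1 = rbf_gram(C1, C1); K2 = rbf_gram(C2, C2)       # full kernel matrices K_1, K_2
# K_A1C1 = rbf_gram(A1_prime, C1)                    # block K{A_1', C_1^T} used in (38)
```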

The optimization problems can be written as

$$\begin{array}{@{}rcl@{}} &&\min\limits_{\lambda_{1},\lambda_{2},b_{1},b_{2},q_{1},q_{2},\eta}\frac{1}{2}\|K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}+e_{1}b_{1}\|^{2}\\ &&{\kern5.5pc}+\frac{1}{2}\|K\left\{A_{2}^{\prime},C_{2}^{\top}\right\}\lambda_{2}+e_{1}b_{2}\|^{2}\\ &&{\kern5.5pc}+c_{1}e_{2}^{\top}q_{1}+c_{2}e_{2}^{\top}q_{2}+\frac{1}{2}c_{3}\left(\lambda_{1}^{\top}K_{1}\lambda_{1}\right.\\ &&{\kern5.5pc}+\left.{b_{1}^{2}}+\lambda_{2}^{\top}K_{2}\lambda_{2}+{b_{2}^{2}}\right)+\frac{1}{2}c_{4}\left[(\lambda_{1}^{\top}K_{1}\right.\\ &&{\kern5.5pc}+\left.e^{\top}b_{1}\right)L_{1}(K_{1}\lambda_{1}+eb_{1})+\left(\lambda_{2}^{\top}K_{2}\right.\\ &&{\kern5.5pc}+\left.e^{\top}b_{2}\right)L_{2}(K_{2}\lambda_{2}+eb_{2})]+De_{1}^{\top}\eta\\ &&\text{s.t.}{\kern.5pc}|K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}+e_{1}b_{1}\\ &&{\kern1.4pc}-K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}-e_{1}b_{2}|\preceq \eta,\\ &&{\kern1.4pc}-K\{B_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}-e_{2}b_{1}+q_{1}\succeq e_{2},\\ &&{\kern1.4pc}-K\{B_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}-e_{2}b_{2}+q_{2}\succeq e_{2},\\ &&{\kern1.6pc}q_{1}\succeq 0,\,q_{2}\succeq 0,\\ &&{\kern1.6pc}\eta\succeq 0, \end{array} $$
(38)
$$\begin{array}{@{}rcl@{}} &&\min\limits_{\lambda_{3},\lambda_{4},b_{3},b_{4},q_{3},q_{4},\zeta}\frac{1}{2}\|K\left\{B_{1}^{\prime},C_{1}^{\top}\right\}\lambda_{3}+e_{2}b_{3}\|^{2}\\ &&{\kern5.3pc}+\frac{1}{2}\|K\left\{B_{2}^{\prime},C_{2}^{\top}\right\}\lambda_{4}+e_{2}b_{4}\|^{2}+c_{1}e_{1}^{\top}q_{3}\\ &&{\kern5.3pc}+c_{2}e_{1}^{\top}q_{4}+\frac{1}{2}c_{3}\left(\lambda_{3}^{\top}K_{1}\lambda_{3}+{b_{3}^{2}}\right.\\ &&{\kern5.3pc}+\left.\lambda_{4}^{\top}K_{2}\lambda_{4}+{b_{4}^{2}}\right)+\frac{1}{2}c_{4}\left[(\lambda_{3}^{\top}K_{1}\right.\\ &&{\kern5.3pc}+e^{\top}b_{3})L_{1}(K_{1}\lambda_{3}+eb_{3})+(\lambda_{4}^{\top}K_{2}\\ &&{\kern5.3pc}+\left.e^{\top}b_{4})L_{2}(K_{2}\lambda_{4}+eb_{4}){\vphantom{A^{T}}}\right]+He_{2}^{\top}\zeta\\ &&\text{s.t.}{\kern.5pc}|K\left\{B_{1}^{\prime},C_{1}^{\top}\right\}\lambda_{3}+e_{2}b_{3}-K\left(B_{2}^{\prime},C_{2}^{\top}\right)\lambda_{4}\\ &&{\kern1.5pc}-e_{2}b_{4}|\preceq \zeta,-K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{3}-e_{1}b_{3}+q_{3}\succeq e_{1},\\ &&{\kern1.5pc}-K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{4}-e_{1}b_{4}+q_{4}\succeq e_{1},\\ &&{\kern1.8pc}q_{3}\succeq 0,\,q_{4}\succeq 0,\\ &&{\kern1.8pc}\zeta\succeq 0, \end{array} $$
(39)

where \(K_{1}\) and \(K_{2}\) represent the kernel matrices of view 1 and view 2, respectively. \(L_{1}\) and \(L_{2}\) are the graph Laplacians of view 1 and view 2. \(e_{1}\), \(e_{2}\) and \(e\) are vectors of ones of appropriate dimensions. \(\lambda_{1}\), \(b_{1}\), \(\lambda_{2}\), \(b_{2}\), \(\lambda_{3}\), \(b_{3}\), \(\lambda_{4}\), \(b_{4}\) are classifier parameters, \(c_{1}\), \(c_{2}\), \(c_{3}\) and \(c_{4}\) are nonnegative parameters, and \(q_{1}\), \(q_{2}\), \(q_{3}\), \(q_{4}\), \(\eta\) and \(\zeta\) are slack vectors of appropriate dimensions.

The Lagrangian of the optimization problem (38) is given by

$$\begin{array}{@{}rcl@{}} L&=&\frac{1}{2}\|K\left\{A_{1}^{\prime},C_{1}^{\top}\right\}\lambda_{1}+e_{1}b_{1}\|^{2}+\frac{1}{2}\|K\left\{A_{2}^{\prime},C_{2}^{\top}\right\}\lambda_{2}\\ &+&e_{1}b_{2}\|^{2}+c_{1}e_{2}^{\top}q_{1}+c_{2}e_{2}^{\top}q_{2}+\frac{1}{2}c_{3}\left(\lambda_{1}^{\top}K_{1}\lambda_{1}+{b_{1}^{2}}\right.\\ &+&\left.\lambda_{2}^{\top}K_{2}\lambda_{2}+{b_{2}^{2}}\right)+\frac{1}{2}c_{4}\left[(\lambda_{1}^{\top}K_{1}+e^{\top}b_{1})L_{1}(K_{1}\lambda_{1}\right.\\ &+&\left.eb_{1})+(\lambda_{2}^{\top}K_{2}+e^{\top}b_{2})L_{2}(K_{2}\lambda_{2}+eb_{2})\right]+De_{1}^{\top}\eta\\ &-&\beta_{1}^{\top}\left(\eta-K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}-e_{1}b_{1}+K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}\right.\\ &+&\left.e_{1}b_{2}{\vphantom{A^T}}\right)-\beta_{2}^{\top}\left(K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}+e_{1}b_{1}-K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}\right.\\ &-&\left.e_{1}b_{2}+\eta{\vphantom{A^T}}\right)-\alpha_{1}^{\top}\left(-K\{B_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}-e_{2}b_{1}+q_{1}-e_{2}\right)\\ &-&\alpha_{2}^{\top}\left(-K\{B_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}-e_{2}b_{2}+q_{2}-e_{2}\right)\\ &-&\xi_{1}^{\top}q_{1}-\xi_{2}^{\top}q_{2}-\sigma^{\top}\eta, \end{array} $$
(40)

where \(\alpha_{1}\), \(\alpha_{2}\), \(\beta_{1}\), \(\beta_{2}\), \(\xi_{1}\), \(\xi_{2}\) and \(\sigma\) are vectors of nonnegative Lagrange multipliers.

We take partial derivatives of the above equation and let them be zero

$$\begin{array}{@{}rcl@{}} \frac{\partial L}{\partial \lambda_{1}}&=&K\left\{A_{1}^{\prime},C_{1}^{\top}\right\}^{\top}\left(K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}+e_{1}b_{1}\right)+c_{3}K_{1}\lambda_{1}\\ &+&c_{4}K_{1}L_{1}(K_{1}\lambda_{1}+eb_{1})+K\left\{A_{1}^{\prime},C_{1}^{\top}\right\}^{\top}\beta_{1}\\ &-&K\left\{A_{1}^{\prime},C_{1}^{\top}\right\}^{\top}\beta_{2}+K\left\{B_{1}^{\prime},C_{1}^{\top}\right\}^{\top}\alpha_{1}=0,\\ \frac{\partial L}{\partial b_{1}}&=&e_{1}^{\top}\left(K\{A_{1}^{\prime},C_{1}^{\top}\}\lambda_{1}+e_{1}b_{1}\right)+c_{3}b_{1}\\ &+&c_{4}e^{\top}L_{1}(K_{1}\lambda_{1}+eb_{1})+e_{1}^{\top}\beta_{1}-{e_{1}^{T}}\beta_{2}+e_{2}^{\top}\alpha_{1}\!=0,{}\\ \frac{\partial L}{\partial \lambda_{2}}&=&K\left\{A_{2}^{\prime},C_{2}^{\top}\right\}^{\top}\left(K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}+e_{1}b_{2}\right)+c_{3}K_{2}\lambda_{2}\\ &+&c_{4}K_{2}L_{2}(K_{2}\lambda_{2}+eb_{2})-K\left\{A_{2}^{\prime},C_{2}^{\top}\right\}^{\top}\beta_{1}\\ &+&K\left\{A_{2}^{\prime},C_{2}^{\top}\right\}^{\top}\beta_{2}+K\left\{B_{2}^{\prime},C_{2}^{\top}\right\}^{\top}\alpha_{2}=0,\\ \frac{\partial L}{\partial b_{2}}&=&e_{1}^{\top}\left(K\{A_{2}^{\prime},C_{2}^{\top}\}\lambda_{2}+e_{1}b_{2}\right)+c_{3}b_{2}\\ &+&c_{4}e^{\top}L_{2}(K_{2}\lambda_{2}+eb_{2})-e_{1}^{\top}\beta_{1}+e_{1}^{\top}\beta_{2}+e_{2}^{\top}\alpha_{2}\!=0, \end{array} $$
(41)
$$\begin{array}{@{}rcl@{}} &&\frac{\partial L}{\partial q_{1}}=c_{1}e_{2}-\alpha_{1}-\xi_{1}=0,\\ &&\frac{\partial L}{\partial q_{2}}=c_{2}e_{2}-\alpha_{2}-\xi_{2}=0,\\ &&\frac{\partial L}{\partial \eta}=De_{1}-\beta_{1}-\beta_{2}-\sigma=0.\\ \end{array} $$

Let

$$\begin{array}{@{}rcl@{}} &&H_{\phi}=\left(K\{A_{1}^{\prime},C_{1}^{\top}\},e_{1}\right),\,\,G_{\phi}=\left(K\{B_{1}^{\prime},C_{1}^{\top}\},e_{2}\right),\\ &&O_{\phi}=\left(\begin{smallmatrix} K_{1}& 0 \\ 0 & 1\end{smallmatrix}\right),J_{\phi}=(K_{1},\,e),\,\,Q_{\phi}=\left(K\{A_{2}^{\prime},C_{2}^{\top}\},e_{1}\right),\\ &&P_{\phi}=\left(K\{B_{2}^{\prime},C_{2}^{\top}\},e_{2}\right),U_{\phi}=\bigl(\begin{smallmatrix} K_{2} & 0 \\ 0 & 1\end{smallmatrix}\bigr),\,\,F_{\phi}=(K_{2},\,e),\\ &&\theta_{1}=\begin{pmatrix}\lambda_{1}\\b_{1}\end{pmatrix},\,\,\theta_{2}=\begin{pmatrix}\lambda_{2}\\b_{2}\end{pmatrix}. \end{array} $$
(42)

From the above equations, we obtain

$$\begin{array}{@{}rcl@{}} H_{\phi}^{\top}H_{\phi}\theta_{1}&+&c_{3}O_{\phi}\theta_{1}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\theta_{1}+H_{\phi}^{\top}\beta_{1}-H_{\phi}^{\top}\beta_{2}\\ &+&G_{\phi}^{\top}\alpha_{1}=0, \end{array} $$
(43)
$$\begin{array}{@{}rcl@{}} Q_{\phi}^{\top}Q_{\phi}\theta_{2}&+&c_{3}U_{\phi}\theta_{2}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\theta_{2}-Q_{\phi}^{\top}\beta_{1}+Q_{\phi}^{\top}\beta_{2}\\ &+&P_{\phi}^{\top}\alpha_{2}=0. \end{array} $$
(44)

It follows that

$$\begin{array}{@{}rcl@{}} \theta_{1}&=&\left(H_{\phi}^{\top}H_{\phi}+c_{3}O_{\phi}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\right)^{-1}\\ &&\left[H_{\phi}^{\top}(\beta_{2}-\beta_{1})-G_{\phi}^{\top}\alpha_{1}\right], \end{array} $$
(45)
$$\begin{array}{@{}rcl@{}} \theta_{2}&=&\left(Q_{\phi}^{\top}Q_{\phi}+c_{3}U_{\phi}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\right)^{-1}\\ &&\left[Q_{\phi}^{\top}(\beta_{1}-\beta_{2})-P_{\phi}^{\top}\alpha_{2}\right]. \end{array} $$
(46)

We substitute (45), (46) into (40) and get

$$\begin{array}{@{}rcl@{}} L&=&(\alpha_{1}+\alpha_{2})^{\top}e_{2}-\frac{1}{2}\left[(\beta_{2}-\beta_{1})^{\top}H_{\phi}-\alpha_{1}^{\top}G_{\phi}\right]\\ &&\left(H_{\phi}^{\top}H_{\phi}+c_{3}O_{\phi}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\right)^{-1}\\ &&\left[H_{\phi}^{\top}(\beta_{2}-\beta_{1})-G_{\phi}^{\top}\alpha_{1}\right]-\frac{1}{2}\left[(\beta_{1}-\beta_{2})^{\top}Q_{\phi}\right.\\ &&-\left.\alpha_{2}^{\top}P_{\phi}\right]\left(Q_{\phi}^{\top}Q_{\phi}+c_{3}U_{\phi}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\right)^{-1}\\ &&\left[Q_{\phi}^{\top}(\beta_{1}-\beta_{2})-P_{\phi}^{\top}\alpha_{2}\right]. \end{array} $$
(47)

Therefore, the dual optimization formulation is

$$\begin{array}{@{}rcl@{}} &&\min\limits_{\xi_{1},\xi_{2},\alpha_{1},\alpha_{2}}\frac{1}{2}\xi_{1}^{\top}\left(H_{\phi}^{\top}H_{\phi}+c_{3}O_{\phi}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\right)^{-1}\xi_{1}\\ &&{\kern3pc}+\frac{1}{2}\xi_{2}^{\top}\left(Q_{\phi}^{\top}Q_{\phi}+c_{3}U_{\phi}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\right)^{-1}\\ &&{\kern3pc}\xi_{2}-(\alpha_{1}+\alpha_{2})^{\top}e_{2} \end{array} $$
(48)
$$\begin{array}{@{}rcl@{}} &&\text{s.t.}{\kern1pc}\xi_{1}=H_{\phi}^{\top}(\beta_{2}-\beta_{1})-G_{\phi}^{\top}\alpha_{1},\\ &&{\kern2pc}\xi_{2}=Q_{\phi}^{\top}(\beta_{1}-\beta_{2})-P_{\phi}^{\top}\alpha_{2},\\ &&{\kern2pc}0\preceq\beta_{1},\beta_{2},\beta_{1}+\beta_{2} \preceq De_{1},\\ &&{\kern2pc}0\preceq \alpha_{1/2}\preceq c_{1/2}e_{2}. \end{array} $$

Correspondingly, the dual optimization formulation for (39) is

$$\begin{array}{@{}rcl@{}} &&\min\limits_{\rho_{1},\rho_{2},\omega_{1},\omega_{2}} \frac{1}{2}\rho_{1}^{\top}\left(G_{\phi}^{\top}G_{\phi}+c_{3}O_{\phi}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\right)^{-1}\\ &&{\kern3.5pc}\rho_{1}+\frac{1}{2}\rho_{2}^{\top}\left(P_{\phi}^{\top}P_{\phi}+c_{3}U_{\phi}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\right)^{-1}\\ &&{\kern3.5pc}\rho_{2}-(\omega_{1}+\omega_{2})^{\top}e_{1}\\ &&{\kern1.5pc}\text{s.t.}{\kern1pc}\rho_{1}=G_{\phi}^{\top}(\gamma_{2}-\gamma_{1})-H_{\phi}^{\top}\omega_{1},\\ &&{\kern3.5pc}\rho_{2}=P_{\phi}^{\top}(\gamma_{1}-\gamma_{2})-Q_{\phi}^{\top}\omega_{2},\\ &&{\kern3.5pc}0\preceq \gamma_{1},\gamma_{2},\gamma_{1}+\gamma_{2} \preceq He_{2},\\ &&{\kern3.5pc}0\preceq \omega_{1/2}\preceq c_{1/2}e_{1}, \end{array} $$
(49)

where the augmented vectors \(\pi _{1}=\begin {pmatrix}\lambda _{3}\\b_{3}\end {pmatrix},\,\pi _{2}=\begin {pmatrix}\lambda _{4}\\b_{4}\end {pmatrix}\) are given by

$$\begin{array}{@{}rcl@{}} \pi_{1}&=&\left(G_{\phi}^{\top}G_{\phi}+c_{3}O_{\phi}+c_{4}J_{\phi}^{\top}L_{1}J_{\phi}\right)^{-1}\\ &&\left[G_{\phi}^{\top}(\gamma_{2}-\gamma_{1})-H_{\phi}^{\top}\omega_{1}\right], \end{array} $$
(50)
$$\begin{array}{@{}rcl@{}} \pi_{2}&=&\left(P_{\phi}^{\top}P_{\phi}+c_{3}U_{\phi}+c_{4}F_{\phi}^{\top}L_{2}F_{\phi}\right)^{-1}\\ &&\left[P_{\phi}^{\top}(\gamma_{1}-\gamma_{2})-Q_{\phi}^{\top}\omega_{2}\right]. \end{array} $$
(51)

Suppose an example \(x\) has two views \(x_{1}\) and \(x_{2}\). If \(\frac {1}{2}(|K\{x_{1}^{\top },C_{1}^{\top }\}\lambda _{1}+b_{1}|+|K\{x_{2}^{\top },C_{2}^{\top }\}\lambda _{2}+b_{2}|)\leq \frac {1}{2}(|K\{x_{1}^{\top },C_{1}^{\top }\}\lambda _{3}+b_{3}|+|K\{x_{2}^{\top },C_{2}^{\top }\}\lambda _{4}+b_{4}|)\), the example is classified to class +1, otherwise to class −1.
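
A sketch of this kernel decision rule, reusing a Gram-matrix routine such as rbf_gram above, is given below; the function and variable names are ours.

```python
import numpy as np

def predict_kernel_mvlaptsvm(x1, x2, C1, C2, lam1, b1, lam2, b2,
                             lam3, b3, lam4, b4, gram):
    """Kernel decision rule of Section 3.2: x1, x2 are the two views of a new example."""
    k1 = gram(x1[None, :], C1).ravel()   # row vector K{x_1^T, C_1^T}
    k2 = gram(x2[None, :], C2).ravel()   # row vector K{x_2^T, C_2^T}
    pos = 0.5 * (abs(k1 @ lam1 + b1) + abs(k2 @ lam2 + b2))
    neg = 0.5 * (abs(k1 @ lam3 + b3) + abs(k2 @ lam4 + b4))
    return 1 if pos <= neg else -1
```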

4 Experimental results

In this section, we evaluate our proposed MvLapTSVMs on three real-world datasets from the UCI Machine Learning Repository: ionosphere classification, handwritten digits classification and advertisement classification. Details about the three datasets are listed in Table 1.

Table 1 Datasets.

4.1 Ionosphere

The ionosphere datasetFootnote 1 was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The targets were free electrons in the ionosphere. “Good” radar returns are those showing evidence of some type of structure in the ionosphere; “Bad” returns are those that do not, and their signals pass through the ionosphere. The dataset includes 351 instances in total, divided into 225 “Good” (positive) instances and 126 “Bad” (negative) instances.

In our experiments, we regard the original data as the first view. Then we apply PCA to capture 99% of the data variance while reducing the dimensionality from 34 to 21, and regard the resultant data as the second view. We compare MvLapTSVMs with single-view LapTSVMs (LapTSVM1 means applying the LapTSVM method to one view and LapTSVM2 means applying it to the other view), SVM-2K and multi-view TSVMs (MvTSVMs)Footnote 2. The experimental results vary with the amount of unlabeled data used. We select the regularization parameters from the range \([2^{-7},2^{7}]\) with exponential growth 0.5. The linear kernel is chosen for this dataset. We first select 70 labeled and 70 unlabeled examples as the training set (i.e., l = 70, u = 70); the unlabeled examples are randomly selected from both classes and the size of the test set is 71. The results are in the second column of Table 2. We then select 70 labeled and 140 unlabeled examples as the training set (i.e., l = 70, u = 140), with the unlabeled examples again randomly selected from both classes and a test set of size 71. The results are in the third column. Each experiment is repeated five times. The experimental results are reported in Table 2.
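
A sketch of this data preparation and parameter grid is given below; scikit-learn is assumed for PCA, and the function and variable names are ours.

```python
import numpy as np
from sklearn.decomposition import PCA

def make_two_views(X):
    """View 1: the original features (34-dimensional for ionosphere).
    View 2: PCA projection retaining 99% of the variance (21 components in our runs)."""
    pca = PCA(n_components=0.99, svd_solver='full').fit(X)
    return X, pca.transform(X)

# Regularization parameters searched over [2^-7, 2^7] with exponential growth 0.5.
param_grid = 2.0 ** np.arange(-7, 7.5, 0.5)
```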

Table 2 Classification accuracies and standard deviations (%) on Ionosphere.

4.2 Handwritten digits

The handwritten digits datasetFootnote 3 consists of features of handwritten digits (0∼9) extracted from a collection of Dutch utility maps. It contains 2000 examples (200 examples per class), with view 1 being the 76 Fourier coefficients and view 2 being the 64 Karhunen-Loève coefficients of each example image.

In this experiment, we compare MvLapTSVMs with single-view LapTSVMs, SVM-2K and MvTSVMs. Because TSVMs are designed for binary classification while the handwritten digits dataset contains 10 classes, we use the three pairs (1,7), (2,4) and (3,9) as binary classification tasks. We select the regularization parameters from the range \([2^{-7},2^{7}]\) with exponential growth 0.5. We select 160 labeled and 160 unlabeled examples as the training set (i.e., l = 160, u = 160). Half of the unlabeled data come from one class and the other half from the other class. The size of the test set is 80. The Gaussian kernel is chosen for this dataset. Each experiment is repeated five times. The experimental results are reported in Table 3.

Table 3 Classification accuracies and standard deviations (%) on Handwritten digits.

4.3 Advertisement

The advertisement datasetFootnote 4 [35] consists of 3279 examples, including 459 ad images (positive examples) and 2820 non-ad images (negative examples). One view describes the image itself (words in the image's URL, alt text and caption), while the other view contains all other features (words from the URLs of the page that contains the image and of the page the image points to).

In this experiment, we randomly select 700 examples from the dataset. We select the regularization parameters from the range \([2^{-7},2^{7}]\) with exponential growth 0.5. The Gaussian kernel is chosen for this dataset. We use u = 100 unlabeled examples, randomly selected from both classes. Each experiment is repeated five times. The experimental results are shown in Fig. 1.

Fig. 1 Classification accuracies (%) of four methods on Advertisement

4.4 Analysis of the results

MvLapTSVMs obtain good performance by combining the two views in the constraints and are better than the corresponding single-view LapTSVMs. The second, third and sixth rows in Table 2 show that MvLapTSVMs are superior to single-view LapTSVMs with the same labeled examples and different numbers of unlabeled examples. Similarly, the second, third and sixth rows in Table 3 show that MvLapTSVMs are superior to single-view LapTSVMs on the different digit-pair classification problems. From Fig. 1, with varying training sizes, we can conclude that MvLapTSVMs are superior to single-view LapTSVMs. MvLapTSVMs can also exploit unlabeled examples to improve classification accuracy compared with supervised methods such as MvTSVMs and SVM-2K. The fourth, fifth and sixth rows in Table 2 show that MvLapTSVMs are superior to MvTSVMs and SVM-2K with the same labeled examples and different numbers of unlabeled examples. Similarly, the fourth, fifth and sixth rows in Table 3 show that MvLapTSVMs are superior to MvTSVMs and SVM-2K on the different digit-pair classification problems. From Fig. 1, with varying training sizes, MvLapTSVMs are also superior to MvTSVMs and SVM-2K.

5 Conclusion

In this paper, we extended LapTSVMs to multi-view learning and proposed a new framework called MvLapTSVMs, which combines two views by introducing a constraint of similarity between the two one-dimensional projections identifying two distinct TSVMs from the two feature spaces. MvLapTSVMs construct a decision function by solving two quadratic programming problems, and we derived their dual formulations using Lagrangian optimization techniques. MvLapTSVMs were further extended to a kernel version. Experimental results on real-world datasets indicate that MvLapTSVMs are better than the corresponding single-view and supervised learning methods.