Abstract
Twin support vector machines are a recently proposed learning method for pattern classification. They learn two hyperplanes rather than one, as in standard support vector machines, and often bring performance improvements. Semi-supervised learning has attracted great attention in machine learning over the last decade, and Laplacian support vector machines and Laplacian twin support vector machines have been proposed in this framework. In this paper, inspired by the recent success of multi-view learning, we propose multi-view Laplacian twin support vector machines, whose dual optimization problems are quadratic programming problems. We further extend them to kernel multi-view Laplacian twin support vector machines. Experimental results demonstrate that our proposed methods are effective.
1 Introduction
Support vector machines (SVMs) are a state-of-the-art tool for pattern classification and regression problems [1–3], originating from the idea of structural risk minimization in statistical learning theory. With the aid of the kernel trick, SVMs can learn a nonlinear decision function that is linear in a potentially high-dimensional feature space [4]. In practice, SVMs have been applied to a variety of domains such as object detection, text categorization, bioinformatics and image classification.
In order to reduce the computational cost of SVMs, proximal support vector machines (PSVMs) [5] have been proposed. Whereas SVMs solve a convex optimization problem, PSVMs solve a linear system with time complexity O(d^3), where d is the dimension of the examples. In essence, PSVMs classify the examples by a hyperplane on the premise of guaranteeing the maximum margin. Mangasarian and Wild [6] proposed generalized eigenvalue proximal SVMs (GEPSVMs), an extension of PSVMs for binary classification. Instead of finding a single hyperplane as in PSVMs, GEPSVMs find two nonparallel hyperplanes such that each hyperplane is as close as possible to the examples of one class and as far as possible from the examples of the other class. The two hyperplanes are obtained from the eigenvectors corresponding to the smallest eigenvalues of two related generalized eigenvalue problems. Jayadeva et al. [7] proposed another nonparallel hyperplane classifier called twin SVMs (TSVMs), which generate two nonparallel hyperplanes such that each hyperplane is closer to one class and at a certain distance from the other. The formulation of TSVMs differs from that of GEPSVMs and is similar to that of SVMs. TSVMs solve a pair of quadratic programming problems (QPPs), whereas SVMs solve a single QPP. This strategy of solving two smaller-sized QPPs rather than one large QPP makes TSVMs work faster than standard SVMs [7]. Experimental results [8] show that nonparallel hyperplane classifiers given by TSVMs can indeed improve on the performance of conventional SVMs [9–14].
In many machine learning tasks [15–18], labeled examples are often difficult and expensive to obtain, while unlabeled examples may be relatively easy to collect. Semi-supervised learning, which deals with this situation, has attracted a great deal of attention in the last decade; if the unlabeled data are properly used, it can be superior to the counterpart supervised learning approaches. Several extensions of SVMs and TSVMs from supervised to semi-supervised learning have been proposed, e.g., transductive SVMs, semi-supervised support vector machines, Laplacian support vector machines (LapSVMs), and Laplacian twin support vector machines (LapTSVMs) [19–24]. LapTSVMs [24] are a successful combination of semi-supervised learning and TSVMs, providing a generalized framework for learning from labeled and unlabeled data; by choosing appropriate parameters, LapTSVMs degenerate to TSVMs [25, 26]. Experimental results showed that LapTSVMs are superior to LapSVMs and TSVMs in classification accuracy, and their training time is more economical than that of LapSVMs and TSVMs.
In many real-world applications, multi-modal data are very common because of the use of different measuring methods (e.g., infrared and visual cameras) or of different media (e.g., text, video and audio) [27]. For example, web pages can be represented by a vector for the words in the web page text and another vector for the words in the anchor text of a hyperlink. In content-based web-image retrieval, an image can be simultaneously described by visual features and the text surrounding the image. Multi-view learning (MVL) is an emerging direction which aims to improve classifiers by leveraging the complementarity and consistency among distinct views [28–30]. The theories of MVL can be classified into four categories: canonical correlation analysis, effectiveness of co-training, generalization error analysis for co-training, and generalization error analysis for other MVL approaches [27].
SVM-2K is a successful combination of MVL and SVMs; it combines the maximum margin and multi-view regularization principles to leverage two views for improved classification performance [31]. Farquhar et al. [31] provided a theoretical analysis illuminating the effectiveness of SVM-2K, showing a significant reduction in the Rademacher complexity of the corresponding function class. Sun and Shawe-Taylor characterized the generalization error of multi-view sparse SVMs [32] and multi-view LapSVMs (MvLapSVMs) [33] in terms of the margin bound and derived the empirical Rademacher complexity of the considered function classes [34]. MvLapSVMs integrate three regularization terms in the objective function, on the function norm, the manifold and multi-view regularization, respectively. However, although LapTSVMs are superior to LapSVMs, no multi-view extension of LapTSVMs exists. In this paper, we extend LapTSVMs to a new framework named multi-view Laplacian twin support vector machines (MvLapTSVMs), which combines two views by introducing a constraint of similarity between the two one-dimensional projections identifying two distinct TSVMs from two feature spaces. Compared with MvLapSVMs, there are two main differences. First, LapSVMs and LapTSVMs differ in principle, although both use a manifold regularization term for semi-supervised learning; MvLapSVMs are based on LapSVMs while MvLapTSVMs are based on LapTSVMs. Second, MvLapTSVMs combine the two views in the constraints rather than in the objective function. Experimental results validate that our proposed methods are effective.
The remainder of this paper proceeds as follows. Section 2 briefly reviews related work including SVMs, TSVMs, LapSVMs, LapTSVMs and SVM-2K. Section 3 introduces our proposed linear MvLapTSVMs and kernel MvLapTSVMs. After reporting experimental results in Section 4, we give conclusions in Section 5.
2 Related work
In this section, we briefly review SVMs, TSVMs, LapSVMs, LapTSVMs and SVM-2K. They constitute the foundation of our subsequent proposed methods.
2.1 SVMs and TSVMs
Suppose there are l examples represented by a matrix A, whose ith row A_i (i = 1, 2, ⋯, l) is the ith example. Let y_i ∈ {1, −1} denote the class to which the ith example belongs. For simplicity, here we only review the linearly separable case [1]. Then, we need to determine w ∈ R^d and b ∈ R such that
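The display constraints were lost in extraction; in the standard linearly separable formulation they read (a reconstruction, consistent with the notation above):

```latex
y_i \left( w^{\top} A_i + b \right) \ge 1, \qquad i = 1, 2, \ldots, l.
```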
The hyperplane described by w ⊤ x + b = 0 lies midway between the bounding hyperplanes given by w ⊤ x + b = 1 and w ⊤ x + b = −1. The margin of separation between the two classes is given by \(\frac {2}{\| w \|},\) where ∥w∥ denotes the ℓ 2 norm of w. Support vectors are those training examples lying on the above two hyperplanes. The standard SVMs [1] are obtained by solving the following problem
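A reconstruction of the hard-margin primal problem referred to here, consistent with the separable case reviewed above:

```latex
\min_{w, b} \ \frac{1}{2} \| w \|^2
\quad \text{s.t.} \quad y_i \left( w^{\top} A_i + b \right) \ge 1, \quad i = 1, \ldots, l.
```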
The decision function is
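The lost display equation is, in standard form, for a new example x:

```latex
f(x) = \operatorname{sgn} \left( w^{\top} x + b \right).
```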
Then we introduce TSVMs [7]. Suppose the examples belonging to classes 1 and −1 are represented by matrices A+ and B−, whose sizes are l_1 × d and l_2 × d, respectively. We define two matrices A, B and four vectors v_1, v_2, e_1, e_2, where e_1 and e_2 are vectors of ones of appropriate dimensions and
TSVMs obtain two nonparallel hyperplanes
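The hyperplane equations (equation (5), lost in extraction) take the standard TSVM form

```latex
x^{\top} w_1 + b_1 = 0 \quad \text{and} \quad x^{\top} w_2 + b_2 = 0,
```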
around which the examples of the corresponding class get clustered. The classifier is given by solving the following QPPs separately (TSVM1)
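A hedged reconstruction of (TSVM1), following the standard formulation [7]:

```latex
\min_{w_1, b_1, q_1} \ \frac{1}{2} \| A w_1 + e_1 b_1 \|^2 + c_1 e_2^{\top} q_1
\quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + q_1 \ge e_2, \quad q_1 \ge 0.
```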
(TSVM2)
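A hedged reconstruction of (TSVM2) from the standard formulation [7], symmetric to (TSVM1):

```latex
\min_{w_2, b_2, q_2} \ \frac{1}{2} \| B w_2 + e_2 b_2 \|^2 + c_2 e_1^{\top} q_2
\quad \text{s.t.} \quad (A w_2 + e_1 b_2) + q_2 \ge e_1, \quad q_2 \ge 0,
```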
where c 1, c 2 are nonnegative parameters and q 1, q 2 are slack vectors of appropriate dimensions. The label of a new example x is determined by the minimum of |x ⊤ w r +b r | (r = 1, 2) which are the perpendicular distances of x to the two hyperplanes given in (5).
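As an illustrative sketch of this decision rule (function and variable names are ours, not the paper's), normalizing by ‖w_r‖ to obtain the true perpendicular distances:

```python
import numpy as np

def tsvm_predict(x, w1, b1, w2, b2):
    """Assign x to the class whose hyperplane is nearer.
    The perpendicular distance of x to hyperplane r is |x^T w_r + b_r| / ||w_r||.
    (Illustrative sketch; names are hypothetical, not from the paper.)"""
    d1 = abs(x @ w1 + b1) / np.linalg.norm(w1)  # distance to the class +1 plane
    d2 = abs(x @ w2 + b2) / np.linalg.norm(w2)  # distance to the class -1 plane
    return 1 if d1 <= d2 else -1
```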
2.2 LapSVMs
LapSVMs combine manifold regularization and SVMs [22]. Suppose x_1, ⋯, x_{l+u} ∈ R^d represent a set of examples comprising l labeled and u unlabeled examples. The (l+u) × (l+u) matrix W represents the similarity of every pair of examples
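The lost similarity entries are typically Gaussian weights:

```latex
W_{ij} = \exp \left( - \frac{ \| x_i - x_j \|^2 }{ 2 \sigma^2 } \right),
```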
where σ is a scale parameter. The manifold regularization can be written as
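A reconstruction of the manifold regularizer, in its standard form:

```latex
\frac{1}{2} \sum_{i,j=1}^{l+u} W_{ij} \left( f(x_i) - f(x_j) \right)^2
= \mathbf{f}^{\top} L \, \mathbf{f},
```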
where the function \(f: R^{d}\rightarrow R\) and f = [f(x 1),⋯ ,f(x l + u )]. The matrix V is diagonal with the ith diagonal entry \(V_{ii}={\sum }_{j=1}^{l+u}W_{ij}\). The matrix L = V − W is the graph Laplacian of W. LapSVMs have the following optimization problem
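A hedged reconstruction of the LapSVM problem, following the standard manifold regularization framework [22]:

```latex
\min_{f \in \mathcal{H}} \ \frac{1}{l} \sum_{i=1}^{l} \max \left( 0, \, 1 - y_i f(x_i) \right)
+ \gamma_A \| f \|_{\mathcal{H}}^2
+ \frac{\gamma_I}{(l+u)^2} \, \mathbf{f}^{\top} L \, \mathbf{f},
```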
where ℋ is the RKHS induced by a kernel. γ A and γ I are respectively ambient and intrinsic regularization coefficients.
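As an illustrative sketch (function name is ours), the graph Laplacian above can be computed with dense Gaussian weights; the original method may instead sparsify W, e.g. with k-nearest neighbours:

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    """Dense Gaussian-weighted graph Laplacian L = V - W for the manifold
    regularizer. Illustrative sketch: the paper may use a sparsified
    (e.g. k-nearest-neighbour) W instead of this fully connected version."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # pairwise ||x_i - x_j||^2
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)              # no self-similarity
    V = np.diag(W.sum(axis=1))            # degree matrix, V_ii = sum_j W_ij
    return V - W                          # graph Laplacian
```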
2.3 LapTSVMs
LapTSVMs [24] extend TSVMs from supervised to semi-supervised learning, retaining the square loss and hinge loss functions; they are similar to LapSVMs in their use of manifold regularization. The optimization problems of LapTSVMs can be written as
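The display problems (11) and (12) were lost in extraction; a hedged reconstruction of the first, consistent with the matrix H⊤H + c_2 I + c_3 J⊤LJ discussed below (the second problem exchanges the roles of the two classes):

```latex
\min_{w_1, b_1, \xi} \ \frac{1}{2} \| A w_1 + e_1 b_1 \|^2 + c_1 e_2^{\top} \xi
+ \frac{c_2}{2} \left( \| w_1 \|^2 + b_1^2 \right)
+ \frac{c_3}{2} \left( M w_1 + e b_1 \right)^{\top} L \left( M w_1 + e b_1 \right)
\quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi \ge e_2, \quad \xi \ge 0,
```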
where M includes all labeled and unlabeled data, L is the graph Laplacian, e_1, e_2 and e are vectors of ones of appropriate dimensions, w_1, b_1, w_2, b_2 are classifier parameters, c_1, c_2 and c_3 are nonnegative parameters, and ξ and η are slack vectors of appropriate dimensions. The dual problems of (11) and (12) can be written respectively as
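A hedged reconstruction of the first dual, assuming the augmented matrices H = [A  e_1], G = [B  e_2] and J = [M  e] (the second dual is symmetric):

```latex
\max_{\alpha} \ e_2^{\top} \alpha - \frac{1}{2} \alpha^{\top}
G \left( H^{\top} H + c_2 I + c_3 J^{\top} L J \right)^{-1} G^{\top} \alpha
\quad \text{s.t.} \quad 0 \le \alpha \le c_1 e_2,
```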
where
α and β are the vectors of nonnegative Lagrange multipliers. I is an identity matrix of appropriate dimensions. v 1,v 2 can be obtained simultaneously
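Reconstructed (hedged) closed forms, consistent with the positive definite matrix noted next:

```latex
v_1 = - \left( H^{\top} H + c_2 I + c_3 J^{\top} L J \right)^{-1} G^{\top} \alpha,
\qquad
v_2 = \left( G^{\top} G + c_2 I + c_3 J^{\top} L J \right)^{-1} H^{\top} \beta.
```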
According to matrix theory, it can be easily proved that H ⊤ H + c 2 I + c 3 J ⊤ L J is a positive definite matrix. LapTSVMs obtain two nonparallel hyperplanes
The label of a new example x is determined by the minimum of |x ⊤ w r +b r | (r = 1,2) which are the perpendicular distances of x to the two hyperplanes given in (18).
2.4 SVM-2K
Suppose that we are given two views of the same data, view 1 is represented by a feature projection ϕ A with the corresponding kernel function k A and view 2 is represented by a feature projection ϕ B with the corresponding kernel function k B . Then the two-view data are given by a set S = {(ϕ A (x 1),ϕ B (x 1)),⋯ ,(ϕ A (x n ),ϕ B (x n ))}. SVM-2K [31] combines the two views by introducing the constraint of similarity between two one-dimensional projections identifying two distinct SVMs from the two feature spaces:
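The similarity constraint (lost in extraction) is, in the standard SVM-2K formulation [31]:

```latex
\left| \langle w_A, \phi_A(x_i) \rangle + b_A
- \langle w_B, \phi_B(x_i) \rangle - b_B \right| \le \eta_i + \epsilon,
\qquad i = 1, \ldots, n,
```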
where w A , b A , w B , b B are the weight and threshold of the first (second) SVMs. The SVM-2K method has the following optimization for classifier parameters w A , b A , w B , b B
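A reconstruction of the SVM-2K primal, following [31]:

```latex
\min \ \frac{1}{2} \| w_A \|^2 + \frac{1}{2} \| w_B \|^2
+ c_1 \sum_{i=1}^{n} q_{1i} + c_2 \sum_{i=1}^{n} q_{2i} + D \sum_{i=1}^{n} \eta_i
```

subject to the similarity constraint together with

```latex
y_i \left( \langle w_A, \phi_A(x_i) \rangle + b_A \right) \ge 1 - q_{1i}, \quad
y_i \left( \langle w_B, \phi_B(x_i) \rangle + b_B \right) \ge 1 - q_{2i}, \quad
q_{1i}, q_{2i}, \eta_i \ge 0,
```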
where D, c_1, c_2, 𝜖 are nonnegative parameters and q_{1i}, q_{2i}, η_i are slack variables. Let \(\hat {w}_{A}\), \(\hat {w}_{B}\), \(\hat {b}_{A}\), \(\hat {b}_{B}\) be the solution to this optimization problem. The final SVM-2K decision function is \(f(x)=\frac {1}{2}(\langle \hat {w}_{A},\phi _{A}(x)\rangle +\hat {b}_{A}+\langle \hat {w}_{B},\phi _{B}(x)\rangle +\hat {b}_{B})\). The dual formulation of the above optimization problem can be written as
where \({\alpha _{i}^{A}}\), \({\alpha _{i}^{B}}\), \(\beta _{i}^{+}, \beta _{i}^{-}\) are the vectors of nonnegative Lagrange multipliers and we have taken 𝜖 = 0. The prediction function for each view is given by
3 Our proposed methods
3.1 Linear MvLapTSVMs
In this part, we extend LapTSVMs to multi-view learning. Here on view 1, positive examples are represented by \(A_{1}^{\prime }\) and negative examples are represented by \(B_{1}^{\prime }\). On view 2, positive examples are represented by \(A_{2}^{\prime }\) and negative examples are represented by \(B_{2}^{\prime }\). The optimization problems of linear MvLapTSVMs can be written as
where \(M_{1}^{\prime }\) includes all labeled and unlabeled data from view 1 and \(M_{2}^{\prime }\) includes all labeled and unlabeled data from view 2. L_1 and L_2 are the graph Laplacians of views 1 and 2, respectively. e_1, e_2 and e are vectors of ones of appropriate dimensions. w_1, b_1, w_2, b_2, w_3, b_3, w_4, b_4 are classifier parameters; c_1, c_2, c_3 and c_4 are nonnegative parameters; and q_1, q_2, q_3, q_4, η and ζ are slack vectors of appropriate dimensions.
The Lagrangian of the optimization problem (23) is given by
where α_1, α_2, β_1, β_2, λ_1, λ_2 and σ are the vectors of nonnegative Lagrange multipliers. We take partial derivatives of the Lagrangian and set them to zero
We define
From the above equations, we obtain
It follows that
We substitute (30), (31) into (25) and get
Therefore, the dual optimization formulation is
Applying the same techniques to (24), we obtain its corresponding dual optimization formulation as
where the augmented vectors \(u_{1}=\begin {pmatrix}w_{3}\\b_{3}\end {pmatrix},\,u_{2}=\begin {pmatrix}w_{4}\\b_{4}\end {pmatrix}\) are given by
For an example x with views \(x_{1}^{\prime }\) and \(x_{2}^{\prime }\), if \(\frac {1}{2}(|x_{1}^{\top }v_{1}|+|x_{2}^{\top }v_{2}|)\leq \frac {1}{2}(|x_{1}^{\top }u_{1}|+|x_{2}^{\top }u_{2}|)\), where \(x_{1}=(x_{1}^{\prime },1)\) and \(x_{2}=(x_{2}^{\prime },1)\), it is classified to class +1; otherwise, to class −1.
Now we compare SVM-2K and MvLapTSVMs. SVM-2K is a multi-view supervised learning method based on SVMs, while MvLapTSVMs are multi-view semi-supervised learning methods based on TSVMs. Suppose each class contains l/2 examples. SVM-2K solves a single QPP with computational complexity O((2l)^3), while MvLapTSVMs solve a pair of QPPs with computational complexity O(2l^3). Regarding hyper-parameter selection, SVM-2K has three hyper-parameters to tune, while MvLapTSVMs have five. Therefore, MvLapTSVMs are more efficient for multi-view learning in terms of computational complexity.
3.2 Kernel MvLapTSVMs
Now we extend the linear MvLapTSVMs to the nonlinear case. The kernel-induced hyperplanes are:
where K is a chosen kernel function defined by K{x_i, x_j} = ⟨Φ(x_i), Φ(x_j)⟩, and Φ(⋅) is a nonlinear mapping from the input space to a high-dimensional feature space. C_1 and C_2 denote the training examples from view 1 and view 2 respectively, that is, \(C_{1}=(A^{'\top }_{1},B^{'\top }_{1})^{\top }\), \(C_{2}=(A^{'\top }_{2},B^{'\top }_{2})^{\top }\).
The optimization problems can be written as
where K_1 and K_2 are the kernel matrices of views 1 and 2, and L_1 and L_2 are the graph Laplacians of views 1 and 2, respectively. e_1, e_2 and e are vectors of ones of appropriate dimensions. λ_1, b_1, λ_2, b_2, λ_3, b_3, λ_4, b_4 are classifier parameters; c_1, c_2, c_3 and c_4 are nonnegative parameters; and q_1, q_2, q_3, q_4, η and ζ are slack vectors of appropriate dimensions.
The Lagrangian of the optimization problem (38) is given by
where α 1, α 2, β 1, β 2, ξ 1, ξ 2 and σ are the vectors of nonnegative Lagrange multipliers.
We take partial derivatives of the Lagrangian and set them to zero
Let
From the above equations, we obtain
It follows that
We substitute (45), (46) into (40) and get
Therefore, the dual optimization formulation is
Correspondingly, the dual optimization formulation for (39) is
where the augmented vectors \(\pi _{1}=\begin {pmatrix}\lambda _{3}\\b_{3}\end {pmatrix},\,\pi _{2}=\begin {pmatrix}\lambda _{4}\\b_{4}\end {pmatrix}\) are given by
Suppose an example x has two views x 1 and x 2. If \(\frac {1}{2}(|K\{x_{1}^{\top },C_{1}^{\top }\}\lambda _{1}+b_{1}|+|K\{x_{2}^{\top },C_{2}^{\top }\}\lambda _{2}+b_{2}|)\leq \frac {1}{2}(|K\{x_{1}^{\top },C_{1}^{\top }\}\lambda _{3}+b_{3}|+|K\{x_{2}^{\top },C_{2}^{\top }\}\lambda _{4}+b_{4}|)\), it is classified to class +1, otherwise class −1.
4 Experimental results
In this section, we evaluate our proposed MvLapTSVMs on three real-world datasets from the UCI Machine Learning Repository: ionosphere classification, handwritten digits classification and advertisement classification. Details about the three datasets are listed in Table 1.
4.1 Ionosphere
The ionosphere dataset (Footnote 1) was collected by a system in Goose Bay, Labrador. This system consists of a phased array of 16 high-frequency antennas with a total transmitted power on the order of 6.4 kilowatts. The targets were free electrons in the ionosphere. "Good" radar returns are those showing evidence of some type of structure in the ionosphere; "Bad" returns are those that do not, their signals passing through the ionosphere. The dataset includes 351 instances in total, divided into 225 "Good" (positive) instances and 126 "Bad" (negative) instances.
In our experiments, we regard the original data as the first view. We then capture 99 % of the data variance with PCA, reducing the dimensionality from 34 to 21, and regard the resulting data as the second view. We compare MvLapTSVMs with single-view LapTSVMs (LapTSVM1 denotes applying LapTSVMs to the first view and LapTSVM2 to the second view), SVM-2K and multi-view TSVMs (MvTSVMs) (Footnote 2). The results vary with the amount of unlabeled data used. We select regularization parameters from the range [2^{-7}, 2^7] with exponential step 0.5. The linear kernel is chosen for this dataset. We first select 70 labeled and 70 unlabeled examples as the training set (i.e., l = 70, u = 70), with the unlabeled examples randomly selected from both classes; the size of the test set is 71. The result is in the second column of Table 2. We then select 70 labeled and 140 unlabeled examples as the training set (i.e., l = 70, u = 140), again with the unlabeled examples randomly selected from both classes and a test set of size 71; the result is in the third column. Each experiment is repeated five times. The results are given in Table 2.
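The 99 %-variance PCA construction of the second view can be sketched as follows (function name is ours, not the paper's; the paper likely used an equivalent standard PCA routine):

```python
import numpy as np

def pca_second_view(X, var_kept=0.99):
    """Project onto the leading principal components that retain `var_kept`
    of the data variance, mirroring the paper's 99 % PCA construction of
    the second view. (Illustrative sketch; the name is hypothetical.)"""
    Xc = X - X.mean(axis=0)                          # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(S ** 2) / np.sum(S ** 2)       # cumulative variance ratio
    k = int(np.searchsorted(ratio, var_kept)) + 1    # smallest k reaching var_kept
    return Xc @ Vt[:k].T                             # projected second view
```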
4.2 Handwritten digits
The handwritten digits dataset (Footnote 3) consists of features of handwritten digits (0–9) extracted from a collection of Dutch utility maps. It consists of 2000 examples (200 examples per class), with view 1 being the 76 Fourier coefficients and view 2 being the 64 Karhunen–Loève coefficients of each example image.
In this experiment, we compare MvLapTSVMs with single-view LapTSVMs, SVM-2K and MvTSVMs. Because TSVMs are designed for binary classification while the handwritten digits dataset contains 10 classes, we use the three pairs (1,7), (2,4) and (3,9) for binary classification. We select regularization parameters from the range [2^{-7}, 2^7] with exponential step 0.5. We select 160 labeled and 160 unlabeled examples as the training set (i.e., l = 160, u = 160). Half of the unlabeled data come from one class and the other half from the other class. The size of the test set is 80. The Gaussian kernel is chosen for this dataset. Each experiment is repeated five times. The results are given in Table 3.
4.3 Advertisement
The advertisement dataset (Footnote 4) [35] consists of 3279 examples, including 459 ad images (positive examples) and 2820 non-ad images (negative examples). One view describes the image itself (words in the image's URL, alt text and caption), while the other view contains all other features (words from the URLs of the pages that contain the image and that the image points to).
In this experiment, we randomly select 700 examples to form the dataset used. We select regularization parameters from the range [2^{-7}, 2^7] with exponential step 0.5. The Gaussian kernel is chosen for this dataset. We use u = 100 unlabeled examples, randomly selected from both classes. Each experiment is repeated five times. The results are shown in Fig. 1.
4.4 Analysis of the results
MvLapTSVMs obtain good performance by combining two views in the constraints, and they outperform the corresponding single-view LapTSVMs. The second, third and sixth rows in Table 2 show that MvLapTSVMs are superior to single-view LapTSVMs with the same labeled examples and different numbers of unlabeled examples. Similarly, the second, third and sixth rows in Table 3 show that MvLapTSVMs are superior to single-view LapTSVMs on the different digit-pair classification problems. From Fig. 1, with varying training sizes, we can conclude that MvLapTSVMs are superior to single-view LapTSVMs. MvLapTSVMs can also exploit unlabeled examples to improve classification accuracy compared with supervised methods such as MvTSVMs and SVM-2K. The fourth, fifth and sixth rows in Table 2 show that MvLapTSVMs are superior to MvTSVMs and SVM-2K with the same labeled examples and different numbers of unlabeled examples. Similarly, the fourth, fifth and sixth rows in Table 3 show that MvLapTSVMs are superior to MvTSVMs and SVM-2K on the different digit-pair classification problems, and Fig. 1 shows the same with varying training sizes.
5 Conclusion
In this paper, we extended LapTSVMs to multi-view learning and proposed a new framework called MvLapTSVMs, which combines two views by introducing a constraint of similarity between the two one-dimensional projections identifying two distinct TSVMs from two feature spaces. MvLapTSVMs construct a decision function by solving two quadratic programming problems; we derived their dual formulations using Lagrangian duality. MvLapTSVMs were further extended to their kernel version. Experimental results on real datasets indicate that MvLapTSVMs are better than the corresponding single-view and supervised learning methods.
Notes
http://archive.ics.uci.edu/ml/datasets/Ionosphere
We do not detail the MvTSVMs here. They are supervised extensions of TSVMs to multi-view learning.
https://archive.ics.uci.edu/ml/datasets/Multiple+Features
http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements
References
Shawe-Taylor J, Sun S (2011) A review of optimization methodologies in support vector machines. Neurocomputing 74(17):3609–3618
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines. Cambridge University Press, Cambridge
Scholkopf B, Smola A (2003) Learning with kernels. MIT Press, Cambridge
Fung G, Mangasarian O (2001) Proximal support vector machines. In: Proceedings of the 7th international conference knowledge discovery and data mining, pp 77–86
Mangasarian O, Wild E (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Ghorai S, Mukherjee A, Dutta P (2009) Nonparallel plane proximal classifier. Signal Process 89(4):510–522
Shao Y, Chen W, Deng N (2013) Nonparallel hyperplane support vector machine for binary classification problems. Inf Sci. doi:10.1016/j.ins.2013.11.003
Shao Y, Wang Z, Chen W, Deng N (2013) Least squares twin parametric-margin support vector machines for classification. Appl Intell 39(3):451–464
Xu Y, Guo R (2014) An improved ν-twin support vector machine. Appl Intell. doi:10.1007/s10489-013-0500-2
Chen W, Shao Y, Xu D, Fu Y (2013) Manifold proximal support vector machine for semi-supervised classification. Appl Intell. doi:10.1007/s10489-013-0491-z
Yang Z (2013) Nonparallel hyperplanes proximal classifiers based on manifold regularization for labeled and unlabeled examples. Int J Pattern Recogn Artif Intell 27(5):1–19
Shao Y, Deng N (2012) A coordinate descent margin based-twin support vector machine for classification. Neural Netw 25:114–121
Chapelle O, Scholkopf B, Zien A (2010) Semi-supervised Learning. MIT Press, Massachusetts
Zhu X (2008) Semi-supervised learning literature survey. Technical report 1530, Department of Computer Sciences University of Wisconsin Madison
Zhu X, Ghahramani Z, Lafferty J (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: Proceedings of the 20th international conference on machine learning, pp 912–919
Zhou Z, Zhan D, Yang Q (2007) Semi-supervised learning with very few labeled training examples. In: Proceedings of the 22nd AAAI conference on artificial intelligence, pp 675–680
Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the 16th international conference on machine learning, pp 200–209
Bennett K, Demiriz A (1999) Semi-supervised support vector machines. Adv Neural Info Proc Syst 11:368–374
Fung G, Mangasarian O (2001) Semi-supervised support vector machines for unlabeled data classification. Optim Method Soft 15:29–44
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Melacci S, Belkin M (2011) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184
Qi Z, Tian Y, Shi Y (2012) Laplacian twin support vector machine for semi-supervised classification. Neural Netw 35:46–53
Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968
Ding S, Zhao Y, Qi B, Huang H (2012) An overview on twin support vector machines. Artif Intell Rev. doi:10.1007/s10462-012-9336-0
Sun S (2013) A survey of multi-view machine learning. Neural Comput Appl 23:2031–2038
Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Proceedings of the 11th annual conference on computational learning theory, pp 92–100
Sindhwani V, Rosenberg D (2008) An RKHS for multi-view learning and manifold co-regularization. In: Proceedings of the 25th international conference on machine learning, pp 976–983
Sindhwani V, Niyogi P, Belkin M (2005) A co-regularization approach to semi-supervised learning with multiple views. In: Proceedings of the workshop on learning with multiple views, 22nd ICML, pp 824–831
Farquhar J, Hardoon D, Shawe-Taylor J, Szedmak S (2006) Two view learning: SVM-2K, theory and practice. Adv Neural Info Proc Syst 18:355–362
Sun S, Shawe-Taylor J (2010) Sparse semi-supervised learning using conjugate functions. J Mach Learn Res 11:2423–2455
Sun S (2011) Multi-view Laplacian support vector machines. Lect Notes Comput Sci 7121:209–222
Bartlett P, Mendelson S (2002) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
Kushmerick N (1999) Learning to remove internet advertisement. In: Proceedings of the 3rd annual conference on autonomous agents, pp 175–181
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Projects 61370175 and 61075005, and Shanghai Knowledge Service Platform Project (No. ZF1213).
Xie, X., Sun, S. Multi-view Laplacian twin support vector machines. Appl Intell 41, 1059–1068 (2014). https://doi.org/10.1007/s10489-014-0563-8