1 Introduction

Support vector machines (SVMs), proposed by Vapnik [25], are computationally powerful machine learning tools for classification. They have been successfully applied to problems of practical importance in diverse areas such as face detection [18], gene prediction [7], and text categorization [10].

The objective of SVM is to determine a separating hyperplane that maximizes the margin between the samples of the positive and negative classes, and to assign class labels to test samples according to the half-space in which they lie. It is well known that SVM determines the optimal hyperplane as the solution of a quadratic programming problem (QPP) with linear inequality constraints [3, 25].

Since SVM yields a robust, sparse, global solution with better generalization ability than other machine learning approaches such as artificial neural networks, it has become a state-of-the-art method for classification. However, one of the major challenges of SVM is the high computational cost of training, which restricts its application to problems with large datasets. To overcome this problem, efficient learning algorithms and models have been proposed in the literature over the past decade [2, 9, 11, 16, 19, 24]. Recently, Mangasarian and Wild [16] proposed a new SVM model called the generalized eigenvalue proximal SVM (GEPSVM), in which binary classification is obtained by constructing two non-parallel hyperplanes such that the samples of each class cluster around the corresponding hyperplane; this amounts to solving two generalized eigenvalue problems. Similar in spirit to GEPSVM, a formulation called the twin SVM (TWSVM) was proposed in [9], in which two non-parallel hyperplanes are constructed by solving two QPPs of smaller size rather than a single large QPP as in the standard SVM. This strategy makes TWSVM roughly four times faster than the standard SVM while retaining good generalization ability [9, 12]. Due to these advantages, TWSVM has become one of the most popular methods for classification. For interesting work on least squares TWSVM (LS-TWSVM) and smooth TWSVM, see [11, 12]. For other extensions of TWSVM, the interested reader is referred to [20, 21, 23].

With the aim of obtaining an efficient TWSVM model, and following the novel approach used to solve the dual SVM in [1, 26], a new unconstrained Lagrangian twin SVM (ULTWSVM) formulation is proposed in this paper. Since the objective functions contain the non-smooth ‘plus’ function, the proposed minimization problems are solved either by considering its generalized Hessian [5, 8] or by replacing the non-smooth ‘plus’ function with the smooth approximation function of [13] and then applying the Newton-Armijo algorithm [13]. Convergence and finite termination follow directly from the results of [13, 14]. Finally, the effectiveness of the proposed ULTWSVM formulation is demonstrated by performing experiments on a number of interesting real-world datasets and comparing the results with those of SVM, LS-TWSVM and TWSVM.

Throughout this work, all vectors are assumed to be column vectors. The inner product of two vectors \(\mathbf{x}, \mathbf{y}\) in the n-dimensional real space \(\Re^{n}\) is denoted by \(\mathbf{x}^{t}\mathbf{y}\), where \(\mathbf{x}^{t}\) is the transpose of \(\mathbf{x}\). For any vector \(\mathbf{x}=(x_{1},\ldots,x_{n})^{t}\in \Re^{n}\), the plus function \(\mathbf{x}_{+}\) is defined componentwise as \((\mathbf{x}_{+})_{i}=\max\{0,x_{i}\}\) for \(i=1,\ldots,n\). The 2-norm of a vector \(\mathbf{x}\) is denoted by \(\|\mathbf{x}\|\). We denote the vector of ones of dimension m by \(\mathbf{e}\) and the identity matrix of appropriate size by I. If f is a real-valued function of the variable \(\mathbf{x}=(x_{1},\ldots,x_{n})^{t}\in \Re^{n}\), then its gradient vector and Hessian matrix are denoted by \(\nabla f=(\partial f/\partial x_{1},\ldots,\partial f/\partial x_{n})^{t}\) and \(\nabla^{2} f=(\partial^{2} f/\partial x_{i}\partial x_{j})_{i,j=1,\ldots,n}\) respectively.

The paper is organized as follows. In Section 2, the standard SVM, TWSVM and LS-TWSVM are reviewed. The proposed unconstrained TWSVM formulation in its dual form and the Newton iterative method for solving it are described in Section 3. In Section 4, numerical experiments are performed on a number of real-world datasets and their results are compared with those of SVM, LS-TWSVM and TWSVM. Finally, conclusions and future work are given in Section 5.

2 Related work

In this section, we briefly describe the standard SVM for binary classification problems and two of its important variants, the twin support vector machine (TWSVM) and the least squares TWSVM (LS-TWSVM).

Consider the binary classification problem and assume that a training set \(\{(\textbf {x}_{i} ,y_{i})\}_{i=1}^{m}\) is given, where each input sample \(\mathbf{x}_{i}\in\Re^{n}\) has the corresponding class label \(y_{i}\in\{-1,+1\}\).

2.1 Support vector machine (SVM)

Assume that the input samples are mapped into a higher dimensional feature space via a nonlinear map φ(.). Then an SVM classifier seeks an optimal hyperplane of the form \(\mathbf{w}^{t}\varphi(\mathbf{x})+b=0\) in the feature space, where the bias term \(b\in\Re\) and the vector \(\mathbf{w}\) normal to the hyperplane are the unknowns, determined by solving the following QPP [3, 25]

$$\begin{array}{cc} &{\underset{\textbf{w},b,\boldsymbol{\xi}}{ {\min}}} \frac{1}{2}\textbf{w}^{t}\textbf{w}+C\textbf{e}^{t}\boldsymbol{\xi}\\ \text{subject to:}\\ &y_{i} \,\,(\,\textbf{w}^{t}\varphi (\textbf{x}_{i} )+b\,)\,\,\ge \,\,1\,-\,\xi_{i} \end{array} $$

and

$$ \xi_{i} \ge 0 \,\, \text{for} \,\, i= 1,2,\ldots,m. $$
(1)

Here, \(\boldsymbol{\xi}=(\xi_{1},\ldots,\xi_{m})^{t}\) is the vector of slack variables and C > 0 is a parameter.

Usually, problem (1) is solved through its Wolfe dual, which is obtained as

$$\begin{array}{cc} &{\underset{\mathbf{u}}{{\min}}} \,\,\frac{1}{2}\sum\limits_{i,j=1}^{m} {y_{i} y_{j} \varphi (\mathbf{x}_{i} )^{t}\varphi (\mathbf{x}_{j} )u_{i} u_{j} -} \sum\limits_{i=1}^{m} {u_{i} } \end{array} $$

subject to

$$ \sum\limits_{i=1}^{m} {u_{i} y_{i} =0} \text{ and } 0\le u_{i} \le C \text{ for } i= 1,2,\ldots,m, $$
(2)

where \(\mathbf{u}=(u_{1},\ldots,u_{m})^{t}\in\Re^{m}\) is the vector of Lagrange multipliers. In this case, the decision function f(.) is taken as

$$ f(\mathbf{x})=\textit{sign}\,\left({\sum\limits_{s=1}^{N_{SV} } {u_{s} y_{s} \varphi (\mathbf{x})^{t}\varphi (\mathbf{x}_{s} )+b} } \right), $$
(3)

where \(N_{SV}\) is the number of support vectors \(\mathbf{x}_{s}\in\Re^{n}\) for which \(0<u_{s}<C\).

By applying the kernel trick, i.e. by taking \(k(\mathbf{x},\mathbf{z})=\varphi(\mathbf{x})^{t}\varphi(\mathbf{z})\) for \(\mathbf{x},\mathbf{z}\in\Re^{n}\) in (2) and (3), where k(.,.) is a given kernel function, the explicit construction of the nonlinear mapping φ(.) is avoided. In this work, the Gaussian kernel function of the form

$$k(\mathbf{x},\mathbf{z})= \exp (\,-\mu \,\| \mathbf{x}-\mathbf{z}\|^{2}\,) $$

is considered, where μ > 0 is a parameter.
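For concreteness, a minimal NumPy sketch (not taken from the paper) of computing such a Gaussian kernel matrix; the array names and the value of mu are illustrative only:

```python
import numpy as np

def gaussian_kernel(X, Z, mu):
    """Kernel matrix with entries k(x_i, z_j) = exp(-mu * ||x_i - z_j||^2)."""
    # Pairwise squared Euclidean distances between rows of X and rows of Z.
    sq_dist = (np.sum(X ** 2, axis=1)[:, None]
               + np.sum(Z ** 2, axis=1)[None, :]
               - 2.0 * X @ Z.T)
    return np.exp(-mu * np.maximum(sq_dist, 0.0))

# Example: K(A, C^t) for a toy augmented matrix C = [A; B].
A = np.random.rand(5, 3)              # 5 class (+1) samples, 3 features
B = np.random.rand(4, 3)              # 4 class (-1) samples
C = np.vstack([A, B])
K_A = gaussian_kernel(A, C, mu=0.5)   # shape (5, 9); rows are K(x^t, C^t)
```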

2.2 Twin support vector machine (TWSVM)

Assume that the training set consists of \(m_{1}\) samples of class (+1) and \(m_{2}\) samples of class (−1), so that \(m=m_{1}+m_{2}\). Further, let the samples of class (+1) and class (−1) be represented by the matrices \(A\in \boldsymbol {\Re }^{{m}_{1} \times n}\) and \(B\in \boldsymbol {\Re }^{{m}_{2} \times n}\) respectively.

Similar to SVM, the nonlinear TWSVM problem can be formulated by mapping the training samples into a higher dimensional feature space via a kernel function k(.,.) and performing the linear TWSVM classification in the feature space. More precisely, TWSVM determines two kernel generated surfaces of the form [9]

$$ K(\mathbf{x}^{t},C^{t})\,\mathbf{w}_{1} +b_{1} =0\,\text{ and }\,K(\mathbf{x}^{t},C^{t})\,\mathbf{w}_{2} +b_{2} =0 $$
(4)

such that each surface is as close as possible to the samples of one class and at a distance of at least one unit from the samples of the other class, where \(C=[A;B]\) is the augmented matrix of size \(m\times n\) and \(K(\mathbf{x}^{t},C^{t})=(k(\mathbf{x},\mathbf{x}_{1}),\ldots,k(\mathbf{x},\mathbf{x}_{m}))\) is a row vector in \(\Re^{m}\). In fact, instead of solving two eigenvalue problems as in GEPSVM [16], the non-parallel kernel surfaces (4) are obtained by solving the following pair of QPPs [9]

$$ \begin{array}{cc} &{\underset{(\mathbf{w}_{1} ,\,b_{1} ,\,\boldsymbol{\xi}_{2} )\,\in\,\boldsymbol{\Re}^{m+1+m_{2}}}{{\min}}} \frac{1}{2}\left\| K(A,C^{t})\,{\mathbf{w}}_{1} + \mathbf{e}_{1} b_{1} \right\|^{2}+ C_{2} \mathbf{e}_{2}^{t} \boldsymbol{\xi}_{2}\\ \text{subject to}\\ &-(K(B,C^{t})\mathbf{w}_{1} + \mathbf{e}_{2}b_{1})+ \boldsymbol{\xi}_{2}\ge \mathbf{e}_{2}, \boldsymbol{\xi}_{2} \ge 0 \end{array} $$
(5a)

and

$$ \begin{array}{cc} &{\underset{(\mathbf{w}_{2} ,\,b_{2} ,\,\boldsymbol{\xi}_{1} )\,\in \,\boldsymbol{\Re}^{m+1+m_{1}}}{{\min}}}\frac{1}{2}\left\| K(B,C^{t})\mathbf{w}_{2} + \mathbf{e}_{2}b_{2}\right\|^{2} +{C}_{1}\,\, \mathbf{e}^{t}_{1}\boldsymbol{\xi}_{1}\\ \text{subject to}\\ &(K(A,C^{t})\mathbf{w}_{2} + \mathbf{e}_{1}b_{2})+ \boldsymbol{\xi}_{1}\ge \mathbf{e}_{1}, \boldsymbol{\xi}_{1} \ge 0 \end{array} $$
(5b)

where \(\boldsymbol {\xi }_{1} \in \boldsymbol {\Re }^{{m}_{1} }\), \(\boldsymbol {\xi }_{2} \in \boldsymbol {\Re }^{{m}_{2}}\) are vectors of slack variables, \(C_{1},C_{2}>0\) are the regularization parameters, and the unknowns are \(\mathbf{w}_{1},\mathbf{w}_{2}\in\Re^{m}\) and \(b_{1},b_{2}\in\Re\).

In practice, the solution of the above pair of primal problems (5a) and (5b) is obtained by constructing their Wolfe duals and solving them. Finally, for any test sample xR n, its class label is assigned according to its proximity to the non-parallel surfaces, i.e.

$$ \text{class } i=\arg \,\underset{k=1,2}{\min} \,\vert K({\mathbf{ x}}^{t},C^{t})\,{\mathbf{w}}_{k} +b_{k} \vert, $$
(6)

where \(\vert K(\mathbf{x}^{t},C^{t})\,\mathbf{w}_{k}+b_{k}\vert\) is the perpendicular distance from \(\mathbf{x}\in\Re^{n}\) to the surface \(K(\mathbf{x}^{t},C^{t})\,\mathbf{w}_{k}+b_{k}=0\). For more details on TWSVM, see [9].
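The decision rule (6) amounts to comparing two absolute values per test sample. A minimal sketch, assuming the surface parameters (w1, b1), (w2, b2) and the kernel rows K(x^t, C^t) of the test samples have already been computed (the function and argument names are illustrative):

```python
import numpy as np

def twsvm_predict(K_test, w1, b1, w2, b2):
    """Assign labels by rule (6): K_test holds the rows K(x^t, C^t) of the test samples."""
    d1 = np.abs(K_test @ w1 + b1)     # proximity to the surface of class (+1)
    d2 = np.abs(K_test @ w2 + b2)     # proximity to the surface of class (-1)
    return np.where(d1 <= d2, 1, -1)  # ties broken in favour of class (+1)
```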

2.3 Least squares twin support vector machine (LS-TWSVM)

Similar to the least squares SVM (LS-SVM) [24], the extension of TWSVM to least squares TWSVM (LS-TWSVM) was proposed in [11], leading to a pair of QPPs with equality constraints. More precisely, the nonlinear LS-TWSVM is defined as

$$ \begin{array}{cc} &{\underset{(\mathbf{w}_{1} ,\,b_{1} ,\,\boldsymbol{\xi}_{2} )\,\,\in \,\boldsymbol{\Re}^{m+1+m_{2}}}{ {\min}}}\frac{1}{2}\vert\vert K(A, C^{t})\mathbf{w}_{1} + \mathbf{e}_{1}b_{1}\vert\vert^{2} +\frac{C_{2}}{2}\boldsymbol{\xi}^{t}_{2}\boldsymbol{\xi}_{2}\\ \text{subject to}\\ &-(K(B,C^{t})\mathbf{w}_{1} + \mathbf{e}_{2}b_{1})+ \boldsymbol{\xi}_{2}= \mathbf{e}_{2} \end{array} $$
(7a)

and

$$ \begin{array}{cc} &{\underset{(\mathbf{w}_{2} ,\,b_{2} ,\,\boldsymbol{\xi}_{1} )\,\in \,\boldsymbol{\Re}^{m+1+m_{1}}}{{\min}}} \frac{1}{2} \vert \vert K(B,C^{t})\,{\mathbf{w}}_{2} +\mathbf{e}_{2} b_{2} \vert \vert^{2}+ \frac{C_{1} }{2}{\boldsymbol{\xi}}_{1}^{t} {\boldsymbol{\xi }}_{1}\\ \text{subject to}\\ &(K(A,C^{t})\,{\mathbf{w}}_{2} +{\mathbf{e}}_{1} b_{2} )+ {\boldsymbol{\xi}}_{1} ={\mathbf{e}}_{1} . \end{array} $$
(7b)

In fact, on substituting the equality constraints into the objective functions, problems (7a) and (7b) become

$$ {\underset{({\mathbf{w}}_{1} ,\,b_{1} )\,\in\,\boldsymbol{\Re}^{m+1}}{{\min}}} \frac{1}{2} \vert \vert K(A,C^{t})\,{\mathbf{w}}_{1} +{\mathbf{e}}_{1} b_{1} \vert \vert^{2}+ \frac{C_{2}}{2}\vert \vert K(B,C^{t})\,{\mathbf{w}}_{1} +{\mathbf{ e}}_{2} b_{1} +{\mathbf{e}}_{2} \vert \vert^{2} $$
(8a)

and

$$ {\underset{({\mathbf{w}}_{2} ,\,b_{2} )\,\in \,\boldsymbol{\Re}^{m+1}}{{\min}}} \frac{1}{2} \vert \vert K(B,C^{t})\,{\mathbf{w}}_{2} +{\mathbf{e}}_{2} b_{2} \vert\vert^{2}+ \frac{C_{1}}{2}\vert \vert \,\,K(A,C^{t})\,{\mathbf{w}}_{2} +{\mathbf{e}}_{1} b_{2}\,-\,{\mathbf{e}}_{1}\,\vert \vert^{2}. $$
(8b)

In this case, their solutions become

$$\begin{array}{@{}rcl@{}}\left[ {{\begin{array}{*{20}c} {\mathbf{w}}_{1} \hfill \\ {b_{1} } \hfill \\ \end{array}}} \right]\,&=&\,-(H^{t}H+\frac{1}{C_{2} }G^{t}G)^{-1}H^{t}\,{\mathbf{e}}_{2} \,\, \text{and} \\\, \left[{{\begin{array}{l} {\mathbf{w}}_{2} \hfill \\ {b_{2} } \hfill \\ \end{array}}} \right]\,&=&\,(G^{t}G+\frac{1}{C_{1} }H^{t}H)^{-1}G^{t}\,{\mathbf{e}}_{1} , \end{array} $$

where

$$ G=[K(A,C^{t})\,{\mathbf{e}}_{1}] \, \text{and} \, H=[K(B,C^{t})\,{\mathbf{e}}_{2} ] $$
(9)

are augmented matrices of sizes \(m_{1}\times(m+1)\) and \(m_{2}\times(m+1)\) respectively. Since LS-TWSVM requires only the solution of two systems of linear equations, it is faster in training than TWSVM, while its classification accuracy is comparable to that of TWSVM [11].
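Because the LS-TWSVM solution reduces to two linear solves, it can be sketched in a few lines of NumPy. This is an illustration only; the small ridge term eps is an assumption added for numerical safety and is not part of (8a)–(8b):

```python
import numpy as np

def lstwsvm_train(K_A, K_B, C1, C2, eps=1e-8):
    """Closed-form LS-TWSVM solution of (8a)-(8b).
    K_A = K(A, C^t), shape (m1, m); K_B = K(B, C^t), shape (m2, m)."""
    m1, m2 = K_A.shape[0], K_B.shape[0]
    G = np.hstack([K_A, np.ones((m1, 1))])    # G = [K(A, C^t)  e_1]
    H = np.hstack([K_B, np.ones((m2, 1))])    # H = [K(B, C^t)  e_2]
    R = eps * np.eye(G.shape[1])              # small ridge for numerical safety (assumption)
    z1 = -np.linalg.solve(H.T @ H + (1.0 / C2) * (G.T @ G) + R, H.T @ np.ones(m2))
    z2 = np.linalg.solve(G.T @ G + (1.0 / C1) * (H.T @ H) + R, G.T @ np.ones(m1))
    return z1[:-1], z1[-1], z2[:-1], z2[-1]   # (w1, b1, w2, b2)
```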

3 Proposed unconstrained Lagrangian twin SVM (ULTWSVM)

Following the work of [1, 26], a new variant of TWSVM in its dual form is proposed in this section as a pair of unconstrained minimization problems whose solutions are obtained by the Newton iterative method. We discuss the proposed model for both the linear and nonlinear cases.

Instead of taking the 1-norm of the vector of slack variables \(\boldsymbol{\xi}_{k}\) with weight \(C_{k}>0\) in (5a) and (5b), where k=1,2, the square of the 2-norm of \(\boldsymbol{\xi}_{k}\) with weight \(\frac {C_{k} }{2}\) is minimized in our modified TWSVM formulation. In fact, the linear TWSVM in 2-norm solves the pair of QPPs [9]

$$ \begin{array}{cc} &{\underset{({\mathbf{w}}_{1} ,b_{1} ,\boldsymbol{\xi}_{2} )\,\in \,\boldsymbol{\Re}^{n+1+m_{2}}}{{\min}}} \frac{1}{2} \vert \vert A{\mathbf{w}}_{1} +{\mathbf{e}}_{1} b_{1} \vert \vert^{2}+ \, \frac{C_{2} }{2}{\boldsymbol{\xi}}_{2}^{t} {\boldsymbol{\xi}}_{2}\\ \text{subject to}\\ &-(B{\mathbf{w}}_{1} +{\mathbf{e}}_{2} b_{1} )+\boldsymbol{\xi}_{2} \ge {\mathbf{e}}_{2} \end{array} $$
(10a)

and

$$ \begin{array}{cc} &{\underset{({\mathbf{w}}_{2} ,b_{2} ,\boldsymbol{\xi}_{1} )\,\,\in \,\boldsymbol{\Re}^{n+1+m_{1} }}{{\min}}} \frac{1}{2} \vert \vert B{\mathbf{w}}_{2} +{\mathbf{e}}_{2} b_{2} \vert \vert^{2}+ \, \frac{C_{1}}{2}\boldsymbol{\xi}_{1}^{t} \boldsymbol{\xi}_{1} \\ \text{subject to}\\ &(A{\mathbf{w}}_{2} +{\mathbf{e}}_{1} b_{2} )+\boldsymbol{\xi}_{1} \ge { \mathbf{e}}_{1} , \end{array} $$
(10b)

where \(\mathbf{w}_{1},\mathbf{w}_{2}\in\Re^{n}\) and \(b_{1},b_{2}\in\Re\) are the unknowns.

Note that since the non-negativity constraints of the slack variables will be automatically satisfied at optimality [15], they have been dropped in (10a) and (10b).

By considering Lagrangian functions and using Karush-Kuhn-Tucker (KKT) conditions, the Wolfe duals of (10a) and (10b) can be formulated as QPPs of the following form

$$ {\underset{0\,\le \,{\mathbf{u}}_{1} \,\in \,\boldsymbol{\Re}^{{m}_{1}}}{{\min}}} \,L_{1} ({\mathbf{u}}_{1} )\,=\,\frac{1}{2}{\mathbf{u}}_{1}^{t} Q_{1} { \mathbf{u}}_{1} -{\mathbf{e}}_{1}^{t} {\mathbf{u}}_{1} $$
(11a)

and

$$ {\underset{0\,\le \,{\mathbf{u}}_{2} \,\in \,\boldsymbol{\Re}^{{m}_{2}}}{{\min}}} \,L_{2} ({\mathbf{u}}_{2} )\,=\,\frac{1}{2}{\mathbf{u}}_{2}^{t} Q_{2} {\mathbf{u}}_{2} -{\mathbf{e}}_{2}^{t} {\mathbf{u}}_{2} $$
(11b)

where

$$ Q_{1} =\left({\frac{I}{C_{1}}+G(H^{t}H)^{-1}G^{t}} \right) \, \text{and} \, Q_{2} =\left({\frac{I}{C_{2}}+H(G^{t}G)^{-1}H^{t}} \right); $$
(12)

and \(G=[A\;\;\mathbf{e}_{1}]\) and \(H=[B\;\;\mathbf{e}_{2}]\) are augmented matrices of sizes \(m_{1}\times(n+1)\) and \(m_{2}\times(n+1)\) respectively. Here \({\mathbf {u}}_{1} \in \boldsymbol {\Re }^{{m}_{1} }\) and \({\mathbf {u}}_{2} \in \boldsymbol {\Re }^{{m}_{2}}\) are Lagrange multipliers satisfying the following

$$ \left[ {{\begin{array}{*{20}c} {{\mathbf{w}}_{1}} \hfill \\ {b_{1} } \hfill \\ \end{array} }} \right]\,=\,-(G^{t}G)^{-1}H^{t}\,{\mathbf{u}}_{2}~\text{and}~\left[ {{\begin{array}{*{20}c} {{\mathbf{w}}_{2}} \hfill \\ {b_{2}} \hfill \\ \end{array}}} \right]\,=\,(H^{t}H)^{-1}G^{t}\,{\mathbf{u}}_{1} . $$
(13)

Finally, using the solutions of (11a) and (11b) together with (13), one can determine the final classifier (6).

The nonlinear TWSVM in 2-norm determines two non-parallel hyperplanes in the feature space of the form (4) by solving the following pair of QPPs:

$$\begin{array}{cc} &{\underset{({\mathbf{w}}_{1} ,b_{1} ,\boldsymbol{\xi}_{2} )\,\in \,\boldsymbol{\Re}^{m+1+m_{2}}}{{\min}}} \frac{1}{2} \vert \vert K(A,C^{t}){\mathbf{w}}_{1} +{\mathbf{e}}_{1} b_{1} \vert \vert^{2}+ \, \frac{C_{2} }{2}\boldsymbol{\xi}_{2}^{t} \boldsymbol{\xi}_{2}\\ \text{subject to:}\\ &-(K(B,C^{t}){\mathbf{w}}_{1} +{\mathbf{e}}_{2} b_{1} )+ \boldsymbol{\xi}_{2} \ge {\mathbf{e}}_{2} \end{array} $$

and

$$\begin{array}{cc} &{\underset{({\mathbf{w}}_{2} ,b_{2} ,\boldsymbol{\xi}_{1} )\,\in \,\boldsymbol{\Re}^{m+1+m_{1}}}{{\min}}} \frac{1}{2} \vert \vert K(B,C^{t}){\mathbf{w}}_{2} +{\mathbf{e}}_{2} b_{2} \vert \vert^{2}+ \, \frac{C_{1} }{2}\boldsymbol{\xi}_{1}^{t} \boldsymbol{\xi}_{1}\\ \text{subject to:} \\ &(K(A,C^{t}){\mathbf{w}}_{2} +{\mathbf{e}}_{1} b_{2} )+ \boldsymbol{\xi}_{1} \ge {\mathbf{e}}_{1} \end{array} $$

where \(\boldsymbol {\xi }_{1} \in \boldsymbol {\Re }^{{m}_{1} }\), \(\boldsymbol {\xi }_{2} \in \boldsymbol {\Re }^{{m}_{2}}\) are slack variable vectors and C 1,C 2>0 are input parameters.

With the introduction of the Lagrange multipliers \({\mathbf {u}}_{1} \in \boldsymbol {\Re }^{m_{1} }\) and \({\mathbf {u}}_{2} \in \boldsymbol {\Re }^{{m}_{2}}\), the duals of the above problems can again be derived in the same form as (11a) and (11b), where the matrices \(Q_{1},Q_{2}\) are defined by (12); however, in this case, the augmented matrices G and H are given by (9). Using their solutions and (13), the kernel-generated functions (4) are easily obtained.

To reformulate the pair of duals (11a) and (11b) into an equivalent pair of unconstrained minimization problems, which constitutes our proposed formulation, and to obtain their solutions by iterative methods, one can rewrite them, for either the linear or the nonlinear case, as

$$ {\underset{0\,\le \,{\mathbf{u}}_{1} \,\in \,\boldsymbol{\Re}^{{m}_{1} }}{{\min}}} L_{1} ({\mathbf{u}}_{1} )=\frac{1}{2}{\mathbf{u}}_{1}^{t} \left({\frac{I}{C_{1}}+ \boldsymbol{\mathcal{G}}\,\boldsymbol{\mathcal{G}}^{t}} \right){\mathbf{u}}_{1} -{\mathbf{e}}_{1}^{t} {\mathbf{u}}_{1} $$
(14)

and

$$ {\underset{0\,\,\le \,\,{\mathbf{u}}_{2} \,\in \,\,\boldsymbol{\Re}^{{m}_{2}}}{{\min}}} \, L_{2} ({\mathbf{u}}_{2} )=\frac{1}{2}{\mathbf{u}}_{2}^{t} \left({\frac{I}{C_{2} }+\boldsymbol{\mathcal{H}}\,\boldsymbol{\mathcal{H}}^{t}} \right){\mathbf{u}}_{2} -{\mathbf{e}}_{2}^{t} {\mathbf{u}}_{2} , $$
(15)

where \(\boldsymbol {\mathcal {G}}=G(H^{t}H)^{-1}H^{t}\) and \(\boldsymbol {\mathcal {H}}=H(G^{t}G)^{-1}G^{t}\) are matrices of sizes \(m_{1}\times m_{2}\) and \(m_{2}\times m_{1}\) respectively.

Now, let us consider the minimization problem (14). It can be equivalently written as

$$\begin{array}{cc} &{\underset{{\begin{array}{l} 0\,\le\,{\mathbf{u}}_{1} \,\in\,\boldsymbol{\Re}^{{m}_{1} } \\ {\mathbf{v}}_{2} \,\in \,\boldsymbol{\Re}^{{m}_{2}} \\ \end{array}}}{\min}} \left({\frac{1}{2C_{1} }{\mathbf{u}}_{1}^{t} {\mathbf{u}}_{1} -{\mathbf{ e}}_{1}^{t} {\mathbf{u}}_{1} } \right)+\frac{1}{2}{\mathbf{v}}_{2}^{t} {\mathbf{v}}_{2}\\ \text{subject to}\\ &{\mathbf{v}}_{2} =\boldsymbol{\mathcal{G}}^{t}\,{\mathbf{u}}_{1} \in \boldsymbol{\Re}^{{m}_{2}}. \end{array} $$

By introducing the Lagrange multiplier \(\mathbf{z}_{2} \in \boldsymbol {\Re }^{{m}_{2} }\), the dual of the above problem can be expressed as

$$\begin{array}{@{}rcl@{}} &&{\underset{{\mathbf{z}}_{2} \,\,\in \,\,\boldsymbol{\Re}^{{m}_{2}}}{\max}} \,\,\{{\underset{{\begin{array}{l} 0 \le{\mathbf{u}}_{1} \in \boldsymbol{\Re}^{{m}_{1} } \\ {\mathbf{v}}_{2} \in \boldsymbol{\Re}^{{m}_{2}} \\ \end{array}}}{\min}} [\,\,\left({\frac{1}{2C_{1} }{\mathbf{u}}_{1}^{t} {\mathbf{u}}_{1} -{\mathbf{e}}_{1}^{t} {\mathbf{u}}_{1} } \right)\\&&{\kern5pt}+\frac{1}{2}{\mathbf{ v}}_{2}^{t} {\mathbf{v}}_{2} +{\mathbf{z}}_{2}^{t} (\boldsymbol{\mathcal{G}}^{t}{\mathbf{u}}_{1} -{\mathbf{v}}_{2} )\,\,]\,\,\}\\ &&= {\underset{{\mathbf{z}}_{2} \,\,\in \,\,\boldsymbol{\Re}^{{m}_{2}}}{\max}} \,\,\{ {\underset{0\,\,\le \,\,{\mathbf{u}}_{1} \,\in \,\,\boldsymbol{\Re}^{{m}_{1}}}{\min}} [\,\,\left({\frac{1}{2C_{1}}{\mathbf{u}}_{1}^{t} {\mathbf{u}}_{1} -{\mathbf{e}}_{1}^{t} {\mathbf{u}}_{1} +{\mathbf{z}}_{2}^{t} \boldsymbol{\mathcal{G}}^{t}{\mathbf{ u}}_{1} } \right)\,\,]\,\,\\&&{\kern5pt}+{\underset{{\mathbf{v}}_{2} \,\in \,\,\boldsymbol{\Re}^{{m}_{2}}}{\min}} \,(\,\frac{1}{2}{\mathbf{v}}_{2}^{t} {\mathbf{v}}_{2} -{\mathbf{z}}_{2}^{t} {\mathbf{v}}_{2} )\,\,\}. \end{array} $$
(16)

However, it can be easily verified analytically that [22]

$$ {\underset{0 \le {\mathbf{u}}_{1}\, \in\, \boldsymbol{\Re}^{{m}_{1}}}{\min}} [\,\left({\frac{1}{2C_{1} }{\mathbf{u}}_{1}^{t} {\mathbf{u}}_{1} -{\mathbf{e}}_{1}^{t} {\mathbf{u}}_{1} +{\mathbf{z}}_{2}^{t} \boldsymbol{\mathcal{G}}^{t}{\mathbf{ u}}_{1} } \right)\,]\,= -\frac{C_{1} }{2}\vert \vert \,({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{z}}_{2} )_{+} \vert \vert^{2}\,\, $$
(17)

and is attained at

$${\mathbf{u}}_{1} =C_{1} \,({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{z}}_{2} )_{+} \,. $$

Similarly,

$$ {\underset{{\mathbf{v}}_{2} \,\in \,\boldsymbol{\Re}^{{m}_{2}}}{\min}} \,(\,\frac{1}{2}{\mathbf{v}}_{2}^{t} {\mathbf{v}}_{2} -{\mathbf{ z}}_{2}^{t} {\mathbf{v}}_{2} )\,=\,-\,\frac{1}{2}{\mathbf{ z}}_{2}^{t} {\mathbf{z}}_{2} $$
(18)

when \(\mathbf{v}_{2}=\mathbf{z}_{2}\).

Applying the results of (17) and (18) in (16), the dual problem (14) can be written as an unconstrained, strongly convex minimization problem in its simpler form

$$ {\underset{{\mathbf{z}}_{2} \,\in\,\boldsymbol{\Re}^{{m}_{2}}}{\min}} \,\tilde{{L}}_{1} ({\mathbf{z}}_{2} )\,=\,\frac{1}{2}{\mathbf{ z}}_{2}^{t} {\mathbf{z}}_{2} +\frac{C_{1} }{2}\vert \vert \,({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{z}}_{2} )_{+} \vert \vert^{2}. $$
(19)

By following the above procedure, the dual problem (15) can be equivalently written as a strongly convex minimization problem of the form

$$ {\underset{{\mathbf{z}}_{1} \,\in \,\boldsymbol{\Re}^{{m}_{1}}}{\min}} \,\tilde{{L}}_{2} ({\mathbf{z}}_{1} )\,=\,\frac{1}{2}{\mathbf{ z}}_{1}^{t} {\mathbf{z}}_{1} +\frac{C_{2} }{2}\vert \vert \,({\mathbf{e}}_{2} -\boldsymbol{\mathcal{H}}\,{\mathbf{z}}_{1})_{+} \vert \vert^{2}. $$
(20)

Since the objective functions \(\tilde {{L}}_{1} (.),\,\tilde {{L}}_{2} (.)\) of the proposed ULTWSVM are strongly convex, each of the problems (19) and (20) has a unique solution. From the above discussion one can easily conclude that if \({\tilde {\mathbf {z}}}_{1} \) and \({\tilde {\mathbf {z}}}_{2} \) are the unique solutions of (20) and (19) respectively, and if we set \({\tilde {\mathbf {u}}}_{1} =C_{1} \,({\mathbf {e}}_{1} -\boldsymbol {\mathcal {G}}\,{\tilde {\mathbf {z}}}_{2} )_{+}\) and \({\tilde {\mathbf {u}}}_{2} =C_{2} \,({\mathbf {e}}_{2} -\boldsymbol {\mathcal {H}}\,{\tilde {\mathbf {z}}}_{1} )_{+}\), then \({\tilde {\mathbf {u}}}_{1} \) and \({\tilde {\mathbf {u}}}_{2} \) are solutions of the dual problems (14) and (15) respectively.

The above approach thus replaces the pair of constrained optimization problems (14) and (15) by an equivalent pair of unconstrained optimization problems.

Remark 1

For the derivation of the proposed problem formulations (19) and (20), it is required that the inverses of the matrices \(H^{t}H\) and \(G^{t}G\) of order (m+1) exist. However, since the matrix \(H^{t}H\) (and similarly \(G^{t}G\)) is only positive semi-definite, its inverse may not exist. Following [9], a regularization term \(\sigma_{H} I\) may be introduced, where \(\sigma_{H}>0\) is a very small number, so that the matrix \((\sigma_{H} I + H^{t}H)\) becomes positive definite; its inverse can then be computed using the Sherman-Morrison-Woodbury (SMW) identity [6], i.e.

$$(\sigma_{H}\,I\,+\,H^{t}H)^{-1}=\frac{1}{\sigma_{H} }\,(I-H^{t}(\sigma_{H} \,I\,+\,HH^{t})^{-1}H), $$

which has the advantage that only \((\sigma_{H}\,I+HH^{t})^{-1}\), a matrix of order \(m_{2}\), needs to be computed. Similarly, introducing a regularization term \(\sigma_{G} I\) with a very small \(\sigma_{G}>0\), one can compute \((\sigma _{G} \,I\,+\,G^{t}G)^{-1}=\frac {1}{\sigma _{G} }\,(I-G^{t}(\sigma _{G} \,I\,\,+\,GG^{t})^{-1}G)\), in which the matrix \((\sigma_{G}\,I+GG^{t})^{-1}\) is of order \(m_{1}\) only.
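A small NumPy sketch of the SMW-based inverse from Remark 1; the value of sigma and the toy dimensions are assumptions used only to check the identity numerically:

```python
import numpy as np

def smw_inverse(H, sigma=1e-5):
    """(sigma*I + H^t H)^{-1} via the SMW identity: only the smaller matrix
    (sigma*I + H H^t), of order m2, is inverted directly."""
    m2, p = H.shape
    small_inv = np.linalg.inv(sigma * np.eye(m2) + H @ H.T)
    return (np.eye(p) - H.T @ small_inv @ H) / sigma

# Sanity check against the direct inverse on a small random matrix.
H = np.random.rand(6, 10)
direct = np.linalg.inv(1e-5 * np.eye(10) + H.T @ H)
assert np.allclose(smw_inverse(H), direct, rtol=1e-4, atol=1e-4)
```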

Finally, to solve (19) and (20), it is proposed to find their critical points, i.e. the roots of the following system of nonlinear equations, by the Newton iterative method:

$$\begin{array}{@{}rcl@{}} &&\nabla \tilde{{L}}_{1} ({\mathbf{z}}_{2} )={\mathbf{z}}_{2} -C_{1} \,\boldsymbol{\mathcal{G}}^{t}({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{z}}_{2} )_{+} =0 \,\text{and} \end{array} $$
$$\begin{array}{@{}rcl@{}} &&\nabla \tilde{{L}}_{2} ({\mathbf{z}}_{1} )={\mathbf{z}}_{1} -C_{2} \boldsymbol{\mathcal{H}}^{t}({ \mathbf{e}}_{2} -\boldsymbol{\mathcal{H}}\,{\mathbf{z}}_{1} )_{+} =0. \end{array} $$
(21)

Since each of the equations in (21) contains the non-differentiable ‘plus’ function, it is proposed to solve them by the Newton iterative method with Armijo step size, using either the generalized Hessian approach of [5] or the smoothing technique detailed in [13].

For k=1,2, using the generalized derivative, a generalized Hessian matrix of \(\tilde {{L}}_{k} (.)\) can be obtained [8] as

$$\begin{array}{@{}rcl@{}} \nabla^{2}\tilde{{L}}_{1} ({\mathbf{z}}_{2} )&=&I+C_{1} \boldsymbol{\mathcal{G}}^{t}\,diag(\,({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{z}}_{2} )_{\ast } )\,\boldsymbol{\mathcal{G}} \, \text{and} \\\nabla^{2}\tilde{{L}}_{2} ({\mathbf{z}}_{1} )&=&I+C_{2} \boldsymbol{\mathcal{H}}^{t}\,diag(({ \mathbf{e}}_{2} -\boldsymbol{\mathcal{H}}\,{\mathbf{z}}_{1} )_{\ast } )\,\boldsymbol{\mathcal{H}}. \end{array} $$

where \((\cdot)_{\ast}\) denotes the step function applied componentwise, taking the value 1 for positive entries and 0 for negative entries. Clearly \(\nabla ^{2}\tilde {{L}}_{k} (.)\) is symmetric and positive definite, and therefore the solution of each system of nonlinear equations in (21) can be computed using the fast Newton iterative algorithm with Armijo step size, whose proof of convergence and finite termination follows from [14].
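A minimal NumPy sketch of this generalized-Hessian Newton iteration applied to (19); a plain Newton step is used (no Armijo line search, cf. Remark 3 below), the tolerance and iteration limit mirror the values reported later in Section 4, and the matrix Gc standing for \(\boldsymbol{\mathcal{G}}\) is generated randomly purely for illustration:

```python
import numpy as np

def newton_generalized_hessian(Gc, C1, tol=1e-3, itmax=10):
    """Solve (19), min_z 0.5*z'z + (C1/2)*||(e1 - Gc z)_+||^2,
    by the generalized-Hessian Newton iteration."""
    m1, m2 = Gc.shape
    e1 = np.ones(m1)
    z = np.zeros(m2)
    for _ in range(itmax):
        r = e1 - Gc @ z
        grad = z - C1 * Gc.T @ np.maximum(r, 0.0)            # gradient in (21)
        if np.linalg.norm(grad) < tol:
            break
        D = (r > 0.0).astype(float)                          # entries of diag((e1 - Gc z)_*)
        hess = np.eye(m2) + C1 * Gc.T @ (D[:, None] * Gc)    # generalized Hessian
        z = z - np.linalg.solve(hess, grad)                  # plain Newton step
    u1 = C1 * np.maximum(e1 - Gc @ z, 0.0)                   # recover the dual solution of (14)
    return z, u1

# Toy usage with an arbitrary matrix standing in for the script G of the text.
Gc = np.random.rand(8, 5)
z2, u1 = newton_generalized_hessian(Gc, C1=1.0)
```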

The smooth approximation approach [13] is a popular method for solving optimization problems, especially the QPPs of SVM and SVR, whose objective functions are not twice differentiable. In this work, as an alternative approach, we employ a smoothing technique to make the objective functions \(\tilde {{L}}_{k} (.)\) sufficiently smooth. More precisely, since the plus function \(x_{+}\) appearing in (19) and (20) is not differentiable, it is replaced by the smooth approximation function \(p(x,\alpha)\) with smoothing parameter α>0, defined as [13]

$$\quad p(x,\alpha )\, = \, x+\frac{1}{\alpha }\log (1+\exp (-\alpha x)). $$
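As a quick numerical illustration (not part of the original formulation), a NumPy sketch of p(x, α); the use of np.logaddexp is an implementation detail assumed here for numerical stability:

```python
import numpy as np

def smooth_plus(x, alpha=5.0):
    """p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x)), applied componentwise."""
    # np.logaddexp(0, t) = log(1 + exp(t)) computed without overflow.
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2.0, 2.0, 5)
print(np.round(smooth_plus(x), 4))      # approaches np.maximum(x, 0) away from the origin
print(np.round(np.maximum(x, 0.0), 4))  # the non-smooth plus function, for comparison
```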

With the above approximation, the smooth reformulation of (19), for example, will become

$$ {\underset{{\mathbf{z}}_{2} \,\in \,\boldsymbol{\Re}^{{m}_{2}}}{\min}} \,\frac{1}{2}{\mathbf{z}}_{2}^{t} {\mathbf{z}}_{2} +\frac{C_{1} }{2}\vert \vert \,\,p({\mathbf{e}}_{1} -\boldsymbol{\mathcal{G}}\,{\mathbf{ z}}_{2} ,\alpha )\,\,\vert \vert^{2}, $$
(22)

where \(p(\mathbf{e}_{1} -\boldsymbol {\mathcal {G}}\,\mathbf{z}_{2} ,\alpha )\) is a vector in \(\boldsymbol {\Re }^{{m}_{1} }\) whose i-th component is \((p(\mathbf{e}_{1} -\boldsymbol {\mathcal {G}}\,\mathbf{z}_{2} ,\alpha ))_{i} =p(1-\boldsymbol {\mathcal {G}}_{i} \,\mathbf{z}_{2} ,\alpha )\), and \(\boldsymbol {\mathcal {G}}_{i}\) is the i-th row of the matrix \(\boldsymbol {\mathcal {G}}\).

Since the objective function in (22) is twice differentiable, the Newton-Armijo algorithm can be applied to solve it.

Remark 2

The Hessian matrix corresponding to the smooth approximation problem (22) is of order \(m_{2}\) and can be computed as

$$I+C_{1} \,\boldsymbol{\mathcal{G}}^{t}(\,diag\left({\frac{1}{1+\exp (\alpha \,(\boldsymbol{\mathcal{G}}\,\boldsymbol{z}_{2} -\text{\textbf{ e}}_{1} ))}} \right)\,\,)\,\boldsymbol{\mathcal{G}}\,. $$

Following the work of [13] it can be easily shown that the Newton method with Armijo stepsize for (22) converges to its unique solution with quadratic convergence.
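A hedged NumPy sketch of the Newton-Armijo iteration for (22), using the Hessian form stated in Remark 2; the Armijo constants (1e-4 and step halving) and the clipping used to avoid overflow are conventional assumptions, not prescriptions from the paper:

```python
import numpy as np

def newton_armijo_smooth(Gc, C1, alpha=5.0, tol=1e-3, itmax=10):
    """Solve the smooth reformulation (22) by Newton with Armijo backtracking."""
    m1, m2 = Gc.shape
    e1 = np.ones(m1)

    def p(t):                                   # p(t, alpha), componentwise
        return t + np.logaddexp(0.0, -alpha * t) / alpha

    def obj(z):                                 # objective of (22)
        return 0.5 * z @ z + 0.5 * C1 * np.sum(p(e1 - Gc @ z) ** 2)

    z = np.zeros(m2)
    for _ in range(itmax):
        r = e1 - Gc @ z
        s = 1.0 / (1.0 + np.exp(np.clip(-alpha * r, -50.0, 50.0)))   # derivative of p at r
        grad = z - C1 * Gc.T @ (p(r) * s)                            # gradient of (22)
        if np.linalg.norm(grad) < tol:
            break
        hess = np.eye(m2) + C1 * Gc.T @ (s[:, None] * Gc)            # Hessian form of Remark 2
        d = -np.linalg.solve(hess, grad)                             # Newton direction
        step, f0 = 1.0, obj(z)
        while obj(z + step * d) > f0 + 1e-4 * step * (grad @ d):     # Armijo condition
            step *= 0.5
        z = z + step * d
    return z
```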

The smoothing technique can be extended in a similar manner to the modified dual problem (20) leading to its smooth reformulation

$${\underset{{\mathbf{z}}_{1}\,\in \,\boldsymbol{\Re}^{{m}_{1}}}{\min}} \,\frac{1}{2}{\mathbf{z}}_{1}^{t} {\mathbf{z}}_{1} +\,\frac{C_{2} }{2}\vert \vert \,p({\mathbf{e}}_{2} -\boldsymbol{\mathcal{H}}\,{\mathbf{ z}}_{1} ,\alpha )\,\,\vert \vert^{2} $$

whose solution can be obtained by applying the Newton-Armijo algorithm.

Remark 3

For simplicity, we apply the Newton algorithm without the Armijo step size to solve the pair of problems (19) and (20) numerically, with either the generalized derivative or the smooth approximation approach.

Remark 4

Throughout this work, solving (21) by the Newton method using the generalized Hessian and using the smooth approximation will be denoted by NLTWSVM and SLTWSVM respectively.

4 Experimental results

To analyze the generalization performance and the computational efficiency of the proposed ULTWSVM formulation solved by the NLTWSVM and SLTWSVM training algorithms, experiments were performed on 17 benchmark datasets from the UCI repository [17], and the results were compared with SVM, LS-TWSVM and TWSVM. All the classifiers were implemented on a PC running 64-bit Windows XP with a 3.20 GHz Intel Core 2 Duo processor and 8 GB of RAM, under the MATLAB R2008a environment. The standard SVM was solved using the MOSEK optimization toolbox for MATLAB, available at http://www.mosek.com, whereas no external optimizer was used for LS-TWSVM, TWSVM, NLTWSVM and SLTWSVM.

In the implementation of NLTWSVM and SLTWSVM, the termination tolerance tol and the maximum number of iterations itmax were set to 0.001 and 10 respectively. The regularization parameters \(\sigma_{H}, \sigma_{G}>0\) were chosen to be \(10^{-5}\). Since the smooth approximation with parameter α=5 has been shown to give good results [13], we set α=5 in the implementation of SLTWSVM.

All the datasets were normalized so that each feature value lies in [0, 1]. The optimal values of the parameters \(C_{1}\), \(C_{2}\) and μ were obtained by 10-fold cross validation on the training set, varying them over the sets \(\{10^{-5},\ldots,10^{5}\}\) and \(\{2^{-5},\ldots,2^{5}\}\) respectively. With these optimal values, the classification accuracy on the test set was computed by randomly dividing the whole dataset into 10 equal parts, of which one was used for testing and the remaining parts for training. Finally, the average test accuracy was taken as the measure of prediction.
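The tuning protocol just described can be sketched as a plain grid search with 10-fold cross validation; train_and_score is a hypothetical wrapper around any of the compared classifiers returning test accuracy, and searching C_1 and C_2 independently is an assumption, since the text does not state whether they were tuned jointly:

```python
import itertools
import numpy as np

def select_parameters(X, y, train_and_score, n_folds=10, seed=0):
    """10-fold cross-validation grid search over C1, C2 and mu."""
    C_grid = [10.0 ** k for k in range(-5, 6)]      # {10^-5, ..., 10^5}
    mu_grid = [2.0 ** k for k in range(-5, 6)]      # {2^-5, ..., 2^5}
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    best_params, best_acc = None, -np.inf
    for C1, C2, mu in itertools.product(C_grid, C_grid, mu_grid):
        accs = []
        for k in range(n_folds):
            te = folds[k]
            tr = np.hstack([folds[j] for j in range(n_folds) if j != k])
            accs.append(train_and_score(X[tr], y[tr], X[te], y[te], C1, C2, mu))
        if np.mean(accs) > best_acc:
            best_params, best_acc = (C1, C2, mu), np.mean(accs)
    return best_params, best_acc
```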

Tables 1, 2, 3 and 4 report the accuracy results, along with the optimal parameter values and training times in seconds, for all the classifiers, together with their average ranks on accuracy for the linear and Gaussian kernels. We notice immediately from Tables 1 and 3 that the nonlinear classifiers outperform their linear counterparts in terms of accuracy, but not in terms of learning time. From Table 1, one can observe that the best accuracy was achieved most often by SVM among all the classifiers. However, for the Gaussian kernel, we observe from Table 3 that NLTWSVM shows the best performance in comparison with the rest of the learning algorithms considered.

Table 1 Performance comparison of our proposed methods NLTWSVM and SLTWSVM with LS-TWSVM, TWSVM and SVM on real world datasets. Linear kernel was employed. Time is for training in seconds
Table 2 Average ranks of SVM, LS-TWSVM, TWSVM, NLTWSVM and SLTWSVM with linear kernel on accuracy values
Table 3 Performance comparison of our proposed methods NLTWSVM and SLTWSVM with LS-TWSVM, TWSVM and SVM on real world datasets. Gaussian kernel was employed. Time is for training in seconds
Table 4 Average ranks of SVM, LS-TWSVM, TWSVM, NLTWSVM and SLTWSVM with Gaussian kernel on accuracy values

To further analyze statistically the performance of the proposed NLTWSVM and SLTWSVM classifiers against SVM, LS-TWSVM and TWSVM, we perform, as suggested by Demšar [4], a non-parametric Friedman test with the corresponding post hoc tests. For this purpose, the average ranks of all the classifiers in terms of prediction accuracy for the linear and Gaussian kernels were computed and are listed in Tables 2 and 4 respectively. From the tables we notice that SVM and TWSVM have the lowest average ranks for the linear and Gaussian kernels respectively.

Under the null hypothesis that all five algorithms are equivalent on the seventeen datasets considered, we compute the Friedman statistic [4] for the linear kernel as follows

$$\begin{array}{@{}rcl@{}} {\chi_{F}^{2}} &&=\frac{12\times 17}{5\times 6}\left[ \,(2.4118^{2}+2.8824^{2}+3.5294^{2}+2.9706^{2}+3.2059^{2})\right.\\&&\left.{\kern5pt}-\frac{5\times 6^{2}}{4}\, \right]\approx 4.6512, \end{array} $$
$$F_{F} =\frac{16\times 4.6512}{17\times 4-4.6512}\approx 1.1748, $$

where \(F_{F}\) is distributed according to the F-distribution with \((4, 4\times 16)=(4,64)\) degrees of freedom. The critical value of F(4,64) at the significance level α=0.05 is 2.5153. Since the computed \(F_{F}\) value of 1.1748 is smaller than the critical value 2.5153, there is no significant difference between the five algorithms. Also, from Table 1 we can observe that LS-TWSVM takes the least training time, followed in general by ULTWSVM solved using the NLTWSVM and SLTWSVM algorithms.

For the Gaussian kernel, we notice from Table 3 that the number of times the best accuracy is obtained by SVM, LS-TWSVM, TWSVM, NLTWSVM and SLTWSVM is 3, 4, 3, 5 and 3 respectively. This indicates the effectiveness of the proposed formulation solved by NLTWSVM. To analyze statistically the comparative performance of all the algorithms, the Friedman statistic under the null hypothesis that all the algorithms are equivalent is computed as

$$\begin{array}{@{}rcl@{}} {\chi_{F}^{2}} && =\frac{12\times 17}{5\times 6}\left[ \,(3.4706^{2}+3.2059^{2}+2.7353^{2}+2.7647^{2}+2.8235^{2})\right.\\&&\left.{\kern5pt}-\frac{5\times 6^{2}}{4}\, \right]\approx 2.8601, \end{array} $$
$$F_{F} =\frac{16\times 2.8601}{17\times 4-2.8601}\approx 0.7025. $$

Since the computed \(F_{F}\) value of 0.7025 is again smaller than the critical value 2.5153 for α=0.05, there is no significant difference between the five algorithms, i.e. none of the methods is statistically better than the rest. Finally, regarding computational learning speed, NLTWSVM and SLTWSVM learn faster than SVM and TWSVM on all but one dataset. The overall performance of the proposed formulation solved by the two iterative algorithms clearly illustrates its effectiveness and applicability.

5 Conclusion and future work

By reformulating the pair of Lagrangian dual problems of the twin support vector machine, a novel equivalent formulation was proposed in this work as a pair of unconstrained minimization problems. Since the objective functions contain the non-smooth ‘plus’ function, their solutions were obtained by the Newton iterative method using the well-known generalized Hessian and smooth approximation approaches. The efficiency of the proposed model in terms of classification accuracy and learning time was demonstrated by numerical experiments whose results were compared with SVM, least squares twin SVM and twin SVM. In summary, the simple problem formulation, very simple MATLAB coding and computational efficiency clearly illustrate the effectiveness and applicability of the proposed model. Future work will consider the application of the semi-smooth approach of [26] to this problem formulation.