1 Introduction

Image classification, which assigns one or more category labels to an image, is one of the most fundamental problems in computer vision and pattern recognition. It is, however, a complex process that may be affected by many factors, and it involves several stages such as preprocessing, sample selection, feature extraction and classifier design, among which feature extraction and classifier design are especially important. For feature extraction, many different methods are available, such as PCA [1], LDA [2], 2DPCA [3], STL [4] and GTDA [5]. Since varied features describe different properties of the same image, multi-view learning [6, 7] and multiple feature fusion [8, 9] have been widely studied in order to make full use of various features. In classifier design, the support vector machine (SVM) [10, 11], as a powerful tool for pattern classification and regression, has been widely applied to a variety of real-world problems such as pattern recognition [12, 13], bioinformatics [14], text categorization [15, 16] and financial applications [17]. Moreover, several related works on SVM are also worthy of attention [18–20]. In order to improve the performance of SVM-based relevance feedback in image retrieval, asymmetric bagging and random subspace for SVM were proposed [18]. To address the problems in Laplacian-regularization-based (LR-based) image annotation, Liu and Tao [19] proposed a multiview SVM framework for image annotation. Furthermore, aiming to solve the class separation problem, Tao et al. [20] discussed subspace learning for SVM. However, SVM is computationally expensive because its solution requires solving a quadratic programming problem (QPP) with numerous constraints, especially when handling large-scale data. In addition, SVM seeks only a single separating hyperplane, so it cannot cope well with complex XOR problems.

Recently, in order to improve the generalization ability and computational speed of SVM, multi-hyperplane support vector machines, which seek two nonparallel hyperplanes instead of the single hyperplane of SVM, have been widely investigated. In 2006, Mangasarian and Wild [21] proposed the generalized eigenvalue proximal support vector machine (GEPSVM), originally motivated by solving the XOR problem and reducing the computing time of SVM. In this approach, the training points of each class are proximal to one of two nonparallel hyperplanes. The following year, Jayadeva, Khemchandani and Chandra [22] proposed another nonparallel hyperplane classifier named the twin support vector machine (TWSVM), which seeks two nonparallel hyperplanes such that each hyperplane is close to one of the two classes and is as far as possible from the other. Experimental results in [22] and [23] have shown the effectiveness of TWSVM over both standard SVC and GEPSVM; at the same time, TWSVM can also handle XOR problems well. Since then, methods for constructing nonparallel-hyperplane support vector machines have been extensively studied, such as the twin bounded support vector machine (TBSVM) [24], TPMSVM [25], RTSVM [26], STSVM [27], PPSVC [28] and NHSVM [29]. Similarly, methods that seek two projection directions [30–34] or two hyperspheres [35, 36], such as MVSVM [30], PTSVM [31] and TSVH [36], have also been widely investigated.

However, the above nonparallel hyperplane methods [22–24, 29] almost all seek two nonparallel hyperplanes \(w_1^T x+b_1 =0\) and \(w_2^T x+b_2 =0\) by minimizing \(||Aw_1 +e_1 b_1 ||_2^2 \) or \(||Bw_2 +e_2 b_2 ||_2^2 \). Inspired by [37], in this paper we propose a novel twin support vector machine that seeks two hyperplanes \(w_1^T x+b_1 =1\) and \(w_2^T x+b_2 =-1\) by minimizing \(||Aw_1 +e_1 b_1 -e_1 ||_2^2 \) or \(||Bw_2 +e_2 b_2 +e_2 ||_2^2 \), so that the two hyperplanes are separated as much as possible. Furthermore, a regularization term similar to that in [24] is used to overcome the overfitting and singularity problems. Thus, the structural risk minimization principle is implemented.

The rest of this paper is organized as follows. Section 2 gives a brief overview of TWSVM and TBSVM. Section 3 proposes our linear novel twin support vector machine (NTSVM) and extends it to the nonlinear case. Experimental results are described in Sect. 4, and conclusions are given in Sect. 5.

2 Brief Reviews of TWSVM and TBSVM

2.1 TWSVM

Consider a binary classification problem with \(m_1 \) data points in the positive class and \(m_2 \) data points in the negative class. Suppose that the data points in the positive class are denoted by a matrix \(A\in R^{m_1 \times n}\) and the data points in the negative class are denoted by a matrix \(B\in R^{m_2 \times n}\), where \(A_i \), the i-th row of A, is a row vector in \(R^{n}\). For the linear case, TWSVM [22] determines two nonparallel hyperplanes:

$$\begin{aligned} w_1^T x+b_1 =0 \quad \hbox {and}\quad w_2^T x+b_2 =0 \end{aligned}$$
(1)

where \(w_i \in R^{n},b_i \in R,i=1,2\). TWSVM seeks a pair of nonparallel hyperplanes such that each hyperplane is close to the data points of one class and at a distance of at least one from the data points of the other class. The primal problems of TWSVM are

$$\begin{aligned} \min _{w_1 ,b_1 ,\xi _2} \frac{1}{2}\bigg |\bigg |Aw_1 +e_1 b_1 \bigg |\bigg |_2^2 +c_1 e_2^T \xi _2 \quad s.t.\;\;-(Bw_1 +e_2 b_1)+\xi _2 \ge e_2 , \quad \xi _2 \ge 0 \end{aligned}$$
(2)
$$\begin{aligned} \min _{w_2 ,b_2 ,\xi _1} \frac{1}{2}\bigg |\bigg |Bw_2 +e_2 b_2 \bigg |\bigg |_2^2 +c_2 e_1^T \xi _1 \quad s.t.\;\;(Aw_2 +e_1 b_2)+\xi _1 \ge e_1 , \quad \xi _1 \ge 0 \end{aligned}$$
(3)

where \(c_1 \) and \(c_2 \) are positive penalty parameters, \(\xi _1\) and \(\xi _2 \) are nonnegative slack variables, and \(e_1 \) and \(e_2 \) are vectors of ones of appropriate dimensions. By introducing Lagrangian multipliers, the Wolfe duals of the quadratic programming problems (QPPs) (2) and (3) can be obtained as follows, respectively.

$$\begin{aligned} \max _\alpha \;e_2^T \alpha -\frac{1}{2}\alpha ^{T}G(H^{T}H)^{-1}G^{T}\alpha \quad s.t.\;\;0\le \alpha \le c_1 e_2 \end{aligned}$$
(4)
$$\begin{aligned} \max _\gamma \;e_1^T \gamma -\frac{1}{2}\gamma ^{T}H(G^{T}G)^{-1}H^{T}\gamma \quad s.t.\;\;0\le \gamma \le c_2 e_1 \end{aligned}$$
(5)

where \(G=[B\;e_2] ,\;H=[A\;e_1]\) and \(\alpha \in R^{m_2} ,\gamma \in R^{m_1} \) are Lagrangian multipliers.

The nonparallel hyperplanes (1) can be obtained from the solutions \(\alpha \) and \(\gamma \) to the optimization problems (4) and (5) by

$$\begin{aligned} v_1= & {} -(H^{T}H)^{-1}G^{T}\alpha , \quad \hbox { where } \quad v_1 =\bigg [w_1^T \;b_1\bigg ]^{T}, \end{aligned}$$
(6)
$$\begin{aligned} v_2= & {} (G^{T}G)^{-1}H^{T}\gamma , \quad \hbox { where } \quad v_2 =\bigg [w_2^T \;b_2\bigg ]^{T}, \end{aligned}$$
(7)

A new data point \(x\in R^{n}\) is then assigned to the positive class \(W_1 \) or negative class \(W_2 \), depending on which of the hyperplanes in (1) it lies closer to, i.e.

$$\begin{aligned} x\in W_i ,\quad i= \arg \min _{i=1,2} \;\bigg |w_i^T x+b_i \bigg | \end{aligned}$$
(8)

where \(|\cdot |\) is the absolute value.
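To make the recovery steps (6)–(8) concrete, the following NumPy sketch (an illustration only, not the original MATLAB implementation) rebuilds the two hyperplanes from given dual solutions and classifies new points; the dual vectors \(\alpha\) and \(\gamma\) are assumed to have been obtained from (4) and (5) by any box-constrained QP solver.

```python
import numpy as np

def twsvm_planes(A, B, alpha, gamma):
    """Recover v1 = [w1; b1] and v2 = [w2; b2] from TWSVM dual solutions,
    following (6) and (7). A, B hold the positive / negative training points
    row-wise; alpha, gamma are the dual vectors of (4) and (5).
    (In practice H'H or G'G may be ill-conditioned; TBSVM below adds a ridge.)"""
    e1 = np.ones((A.shape[0], 1))
    e2 = np.ones((B.shape[0], 1))
    H = np.hstack([A, e1])                          # H = [A  e1]
    G = np.hstack([B, e2])                          # G = [B  e2]
    v1 = -np.linalg.solve(H.T @ H, G.T @ alpha)     # (6)
    v2 = np.linalg.solve(G.T @ G, H.T @ gamma)      # (7)
    return v1, v2

def twsvm_predict(X, v1, v2):
    """Assign each row of X to the closer hyperplane, cf. (8)."""
    w1, b1 = v1[:-1], v1[-1]
    w2, b2 = v2[:-1], v2[-1]
    d1 = np.abs(X @ w1 + b1)
    d2 = np.abs(X @ w2 + b2)
    return np.where(d1 <= d2, +1, -1)
```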

2.2 TBSVM

Following the basic idea of SVC and TWSVM, TBSVM [24] also seeks a pair of nonparallel hyperplanes such that each hyperplane is proximal to the data points of one class and far from the data points of the other class. The main difference between TBSVM and TWSVM is that the structural risk minimization principle is implemented in TBSVM by adding a regularization term. The primal problems of TBSVM are

$$\begin{aligned}&\min _{w_1 ,b_1 ,\xi _2} \;\frac{1}{2}c_3 \bigg (||w_1 ||^{2}+b_1^2\bigg )+\frac{1}{2}\bigg |\bigg |Aw_1 +e_1 b_1 \bigg |\bigg |_2^2 +c_1 e_2^T \xi _2 \nonumber \\&\quad s.t.\;\;-(Bw_1 +e_2 b_1)+\xi _2 \ge e_2 , \quad \xi _2 \ge 0 \end{aligned}$$
(9)
$$\begin{aligned}&\min _{w_2 ,b_2 ,\xi _1} \;\frac{1}{2}c_4 \bigg (||w_2 ||^{2}+b_2^2\bigg )+\frac{1}{2}\bigg |\bigg |Bw_2 +e_2 b_2 \bigg |\bigg |_2^2 +c_2 e_1^T \xi _1 \nonumber \\&\quad s.t.\;\;(Aw_2 +e_1 b_2)+\xi _1 \ge e_1 , \quad \xi _1 \ge 0 \end{aligned}$$
(10)

where \(c_1 \) and \(c_2 \) are positive penalty parameters, \(c_3\) and \(c_4 \) are positive trade-off parameters, \(\xi _1 \) and \(\xi _2\) are nonnegative slack variables, and \(e_1 \) and \(e_2 \) are vectors of ones of appropriate dimensions. By introducing Lagrangian multipliers, the Wolfe duals of the QPPs (9) and (10) can be obtained as follows.

$$\begin{aligned} \mathop {\max } \limits _\alpha \;e_2^T \alpha -\frac{1}{2}\alpha ^{T}G\bigg (H^{T}H+c_3 I\bigg )^{-1}G^{T}\alpha \quad s.t.\;\;0\le \alpha \le c_1 e_2 \end{aligned}$$
(11)
$$\begin{aligned} \mathop {\max } \limits _\gamma \;e_1^T \gamma -\frac{1}{2}\gamma ^{T}H\bigg (G^{T}G+c_4 I\bigg )^{-1}H^{T}\gamma \quad s.t.\;\;0\le \gamma \le c_2 e_1 \end{aligned}$$
(12)

where \(G=[B\;e_2],\;H=[A\;e_1]\), \(I\) is an identity matrix, and \(\alpha \in R^{m_2}, \gamma \in R^{m_1}\) are Lagrangian multipliers.

The nonparallel hyperplanes (1) can be obtained from the solutions \(\alpha \) and \(\gamma \) to the optimization problems (11) and (12) by

$$\begin{aligned} v_1= & {} -\bigg (H^{T}H+c_3 I\bigg )^{-1}G^{T}\alpha , \quad \hbox { where } \quad v_1 =\bigg [w_1^T \;b_1\bigg ]^{T} \end{aligned}$$
(13)
$$\begin{aligned} v_2= & {} \bigg (G^{T}G+c_4 I\bigg )^{-1}H^{T}\gamma , \quad \hbox { where } \quad v_2 =\bigg [w_2^T \;b_2\bigg ]^{T} \end{aligned}$$
(14)

Once the solutions \([w_1^T \;b_1]\) and \([w_2^T \;b_2]\) to (11) and (12) are obtained from (13) and (14), a new data point \(x\in R^{n}\) is assigned to the positive class \(W_1 \) or negative class \(W_2 \), depending on which of the hyperplanes in (1) it lies closer to, i.e.

$$\begin{aligned} x\in W_i ,\quad i=\mathop {\arg \min } \limits _{i=1,2} \;\frac{|w_i^T x+b_i |}{||w_i ||}\; \end{aligned}$$
(15)

where \(|\cdot |\) is the absolute value.
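Compared with the TWSVM recovery step sketched above, only the linear systems change: (13) and (14) add the ridge terms \(c_3 I\) and \(c_4 I\), and the decision rule (15) additionally normalizes the distance by \(||w_i||\). A hedged sketch of the modified recovery step:

```python
import numpy as np

def tbsvm_planes(H, G, alpha, gamma, c3, c4):
    """Recover the TBSVM augmented vectors via (13) and (14).
    H = [A e1], G = [B e2]; alpha, gamma solve the duals (11) and (12)."""
    I = np.eye(H.shape[1])
    v1 = -np.linalg.solve(H.T @ H + c3 * I, G.T @ alpha)   # (13)
    v2 = np.linalg.solve(G.T @ G + c4 * I, H.T @ gamma)    # (14)
    return v1, v2
```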

3 Novel Twin Support Vector Machines

3.1 Linear NTSVM

Now, let us construct our linear NTSVM. Following the idea of PSVM [37] and TBSVM [24], we propose our method, which seeks two nonparallel hyperplanes:

$$\begin{aligned} w_1^T x+b_1 =1 \quad \hbox {and} \quad w_2^T x+b_2 =-1 \end{aligned}$$
(16)

by considering the following primal problems.

$$\begin{aligned}&\mathop {\min } \limits _{w_1 ,b_1 ,\xi _2} \;\frac{1}{2}c_3 \bigg (||w_1 ||_2^2 +b_1^2\bigg )+\frac{1}{2}\bigg |\bigg |Aw_1 +e_1 b_1 -e_1 \bigg |\bigg |_2^2 +c_1 e_2^T \xi _2 \nonumber \\&\quad s.t.\;\;-(Bw_1 +e_2 b_1)+\xi _2 \ge e_2 , \quad \xi _2 \ge 0 \end{aligned}$$
(17)
$$\begin{aligned}&\mathop {\min } \limits _{w_2 ,b_2 ,\xi _1} \;\frac{1}{2}c_4 \bigg (||w_2 ||_2^2 +b_2^2\bigg )+\frac{1}{2}\bigg |\bigg |Bw_2 +e_2 b_2 +e_2 \bigg |\bigg |_2^2 +c_2 e_1^T \xi _1 \nonumber \\&\quad s.t.\;\;(Aw_2 +e_1 b_2)+\xi _1 \ge e_1 , \quad \xi _1 \ge 0 \end{aligned}$$
(18)

where \(c_1 \) and \(c_2 \) are the positive penalty parameters, \(c_3 \) and \(c_4 \) are the positive trade-off parameters, \(\xi _1 \) and \(\xi _2 \) are nonnegative slack variables, \(e_1 \) and \(e_2 \) are vectors of all ones of appropriate dimensions.

We now discuss the difference between the primal problems of TBSVM and our NTSVM by comparing the optimization problems (9) and (17). Obviously, minimizing \(||Aw_1 +e_1 b_1 -e_1 ||_2^2 \) in (17) instead of \(||Aw_1 +e_1 b_1 ||_2^2 \) in (9) means that our NTSVM seeks \([w_1^T \;b_1]\) such that the positive data points are proximal to the hyperplane \(w_1^T x+b_1 =1\) while the negative data points are kept far away from it. With this change, our optimization problem resembles a regression problem: we construct a function \(f_1 (x)=w_1^T x+b_1 \) to fit the positive class label \(+1\), and another function \(f_2 (x)=w_2^T x+b_2 \) to fit the negative class label \(-1\). In addition, similar to TBSVM, our NTSVM adds a regularization term to the objective function in (17), so the structural risk minimization principle is taken into account.

In order to get the solutions to problems (17) and (18), we need to derive their dual problems. For (17), by introducing the Lagrangian multipliers, the Lagrangian function is given by

$$\begin{aligned} L(w_1 ,b_1 ,\xi _2 ,\alpha ,\beta )= & {} \frac{1}{2}c_3 \bigg (||w_1 ||_2^2 +b_1^2\bigg )+\frac{1}{2}\bigg |\bigg |Aw_1 +e_1 b_1 -e_1 \bigg |\bigg |_2^2 +c_1 e_2^T \xi _2 \nonumber \\&-\,\alpha ^{T}\bigg [-(Bw_1 +e_2 b_1)+\xi _2 -e_2\bigg ]-\beta ^{T}\xi _2 \end{aligned}$$
(19)

where \(\alpha \in R^{m_2}\) and \(\beta \in R^{m_2} \) are the vectors of Lagrangian multipliers. By using Karush-Kuhn-Tucker (KKT) conditions, we can get

$$\begin{aligned}&\nabla _{w_1} =c_3 w_1 +A^{T}(Aw_1 +e_1 b_1 -e_1)+B^{T}\alpha =0 \end{aligned}$$
(20)
$$\begin{aligned}&\nabla _{b_1} =c_3 b_1 +e_1^{T}(Aw_1 +e_1 b_1 -e_1)+e_2^T \alpha =0 \end{aligned}$$
(21)
$$\begin{aligned}&\nabla _{\xi _2} =c_1 e_2 -\alpha -\beta =0 \end{aligned}$$
(22)
$$\begin{aligned}&\qquad \ \quad -(Bw_1 +e_2 b_1)+\xi _2 \ge e_2 , \quad \xi _2 \ge 0 \end{aligned}$$
(23)
$$\begin{aligned}&\alpha ^{T}(Bw_1 +e_2 b_1 -\xi _2 +e_2)=0, \quad \beta ^{T}\xi _2 =0 \end{aligned}$$
(24)
$$\begin{aligned}&\alpha \ge 0, \quad \beta \ge 0 \end{aligned}$$
(25)

Since \(\beta \ge 0\), from (22) and (25), we can get

$$\begin{aligned} 0\le \alpha \le c_1 e_2 \end{aligned}$$
(26)

Let \(G=[B\;e_2],\;H=[A\;e_1]\) and \(v_1 =[w_1^T \;b_1]^{T}\); then (20) and (21) imply that

$$\begin{aligned} \bigg (H^{T}H+c_3 I\bigg )v_1 -H^{T}e_1 +G^{T}\alpha =0 \end{aligned}$$
(27)

Thus, we can get the augmented vector

$$\begin{aligned} v_1 =\bigg (H^{T}H+c_3 I\bigg )^{-1}\bigg [H^{T}e_1 -G^{T}\alpha \bigg ] \end{aligned}$$
(28)

where I is an identity matrix. Then, substituting (28) into (19), using (20)–(22) and omitting the constant term, we obtain the Wolfe dual problem of (17) as follows

$$\begin{aligned}&\mathop {\max } \limits _\alpha \;\bigg [e_2^T +e_1^T H\bigg (H^{T}H+c_3 I\bigg )^{-1}G^{T}\bigg ]\alpha -\frac{1}{2}\alpha ^{T}G\bigg (H^{T}H+c_3 I\bigg )^{-1}G^{T}\alpha \nonumber \\&\quad s.t.\;\;0\le \alpha \le c_1 e_2 \end{aligned}$$
(29)

In the same way, we can obtain the Wolfe dual problem of (18) as follows

$$\begin{aligned}&\mathop {\max } \limits _\gamma \;\bigg [e_1^T +e_2^T G\bigg (G^{T}G+c_4 I\bigg )^{-1}H^{T}\bigg ]\gamma -\frac{1}{2}\gamma ^{T}H\bigg (G^{T}G+c_4 I\bigg )^{-1}H^{T}\gamma \nonumber \\&\quad s.t.\;\;0\le \gamma \le c_2 e_1 \end{aligned}$$
(30)

where \(\alpha \in R^{m_2} ,\gamma \in R^{m_1} \) are Lagrangian multipliers.

The nonparallel hyperplanes (16) can be obtained from the solutions \(\alpha \) and \(\gamma \) to the optimization problems (29) and (30) by

$$\begin{aligned}&v_1 =\bigg (H^{T}H+c_3 I\bigg )^{-1}\bigg [H^{T}e_1 -G^{T}\alpha \bigg ], \quad \hbox {where} \quad v_1 =\bigg [w_1^T \;b_1\bigg ]^{T} \end{aligned}$$
(31)
$$\begin{aligned}&v_2 =\bigg (G^{T}G+c_4 I\bigg )^{-1}\bigg [-G^{T}e_2 +H^{T}\gamma \bigg ], \quad \hbox {where} \quad v_2 =\bigg [w_2^T \;b_2\bigg ]^{T} \end{aligned}$$
(32)

Once \(v_1 \) and \(v_2 \) are obtained from (31) and (32), the two nonparallel hyperplanes (16) are known. A new data point \(x\in R^{n}\) is then assigned to the positive class \(W_1 \) or negative class \(W_2 \), depending on which of the hyperplanes in (16) it lies closer to, i.e.

$$\begin{aligned} x\in W_i ,\quad i=\mathop {\arg \min } \limits _{i=1,2} \bigg \{\;\bigg |w_1^T x+b_1 -1\bigg |,\bigg |w_2^T x+b_2 +1\bigg |\bigg \} \end{aligned}$$
(33)

where \(|\cdot |\) is the absolute value.
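A minimal end-to-end sketch of linear NTSVM, assembled directly from (29)–(33), is given below. It is an illustration only: the box-constrained duals are solved here with SciPy's bound-constrained L-BFGS-B routine for brevity, whereas the paper itself uses the SOR solver described in Sect. 3.3.

```python
import numpy as np
from scipy.optimize import minimize

def _solve_box_qp(Q, f, c, m):
    """min 0.5*a'Qa - f'a  s.t. 0 <= a <= c  (illustrative solver only;
    the SOR algorithm of Sect. 3.3 is what the paper actually uses)."""
    obj = lambda a: 0.5 * a @ Q @ a - f @ a
    grad = lambda a: Q @ a - f
    res = minimize(obj, np.zeros(m), jac=grad, method="L-BFGS-B",
                   bounds=[(0.0, c)] * m)
    return res.x

def ntsvm_linear_train(A, B, c1, c2, c3, c4):
    """Linear NTSVM: build the duals (29)-(30) and recover v1, v2 via (31)-(32)."""
    m1, m2 = A.shape[0], B.shape[0]
    e1, e2 = np.ones(m1), np.ones(m2)
    H = np.hstack([A, e1[:, None]])                 # H = [A  e1]
    G = np.hstack([B, e2[:, None]])                 # G = [B  e2]
    I = np.eye(H.shape[1])
    S1 = np.linalg.inv(H.T @ H + c3 * I)            # (H'H + c3 I)^{-1}
    S2 = np.linalg.inv(G.T @ G + c4 * I)            # (G'G + c4 I)^{-1}
    # Dual (29): max [e2' + e1' H S1 G'] a - 0.5 a' G S1 G' a,  0 <= a <= c1
    Q1 = G @ S1 @ G.T
    f1 = e2 + G @ S1 @ (H.T @ e1)
    alpha = _solve_box_qp(Q1, f1, c1, m2)
    # Dual (30): the symmetric problem in gamma,  0 <= gamma <= c2
    Q2 = H @ S2 @ H.T
    f2 = e1 + H @ S2 @ (G.T @ e2)
    gamma = _solve_box_qp(Q2, f2, c2, m1)
    v1 = S1 @ (H.T @ e1 - G.T @ alpha)              # (31)
    v2 = S2 @ (-G.T @ e2 + H.T @ gamma)             # (32)
    return v1, v2

def ntsvm_linear_predict(X, v1, v2):
    """Decision rule (33): compare distances to w1'x+b1=1 and w2'x+b2=-1."""
    w1, b1 = v1[:-1], v1[-1]
    w2, b2 = v2[:-1], v2[-1]
    d1 = np.abs(X @ w1 + b1 - 1.0)
    d2 = np.abs(X @ w2 + b2 + 1.0)
    return np.where(d1 <= d2, +1, -1)
```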

3.2 Nonlinear NTSVM

In this subsection, we show that our NTSVM can be extended to the nonlinear case. Here, we consider the following kernel-generated hyperplanes

$$\begin{aligned} K(x^{T},C^{T})w_1 +b_1 =1 \quad \hbox {and} \quad K(x^{T},C^{T})w_2 +b_2 =-1 \end{aligned}$$
(34)

where \(C=[A\;B]\in R^{(m_1 +m_2)\times n}\) and K is an appropriate kernel. Similar to the linear case, the nonlinear optimization problems can be expressed as

$$\begin{aligned}&\mathop {\min } \limits _{w_1 ,b_1 ,\eta _2} \;\frac{1}{2}c_3 \bigg (||w_1 ||_2^2 +b_1^2\bigg )+\frac{1}{2}\bigg |\bigg |K(A,C^{T})w_1 +e_1 b_1 -e_1 \bigg |\bigg |_2^2 +c_1 e_2^T \eta _2 \nonumber \\&\quad s.t.\;\;-\bigg (K(B,C^{T})w_1 +e_2 b_1\bigg )+\eta _2 \ge e_2 , \quad \eta _2 \ge 0 \end{aligned}$$
(35)
$$\begin{aligned}&\mathop {\min } \limits _{w_2 ,b_2 ,\eta _1} \;\frac{1}{2}c_4 \bigg (||w_2 ||_2^2 +b_2^2\bigg )+\frac{1}{2}\bigg |\bigg |K(B,C^{T})w_2 +e_2 b_2 +e_2 \bigg |\bigg |_2^2 +c_2 e_1^T \eta _1 \nonumber \\&\quad s.t.\;\;\bigg (K(A,C^{T})w_2 +e_1 b_2\bigg )+\eta _1 \ge e_1 , \quad \eta _1 \ge 0 \end{aligned}$$
(36)

By using the Lagrangian method and KKT conditions, we can obtain Wolfe dual problems as follows

$$\begin{aligned}&\mathop {\max } \limits _\alpha \;\bigg [e_2^T +e_1^T \tilde{H}\bigg (\tilde{H}^{T}\tilde{H}+c_3 I\bigg )^{-1}\tilde{G}^{T}\bigg ]\alpha -\frac{1}{2}\alpha ^{T}\tilde{G}\bigg (\tilde{H}^{T}\tilde{H}+c_3 I\bigg )^{-1}\tilde{G}^{T}\alpha \nonumber \\&\quad s.t.\;\;0\le \alpha \le c_1 e_2 \end{aligned}$$
(37)
$$\begin{aligned}&\mathop {\max } \limits _\gamma \;\bigg [e_1^T +e_2^T \tilde{G}\bigg (\tilde{G}^{T}\tilde{G}+c_4 I\bigg )^{-1}\tilde{H}^{T}\bigg ]\gamma -\frac{1}{2}\gamma ^{T}\tilde{H}\bigg (\tilde{G}^{T}\tilde{G}+c_4 I\bigg )^{-1}\tilde{H}^{T}\gamma \nonumber \\&\quad s.t.\;\;0\le \gamma \le c_2 e_1 \end{aligned}$$
(38)

where \(\tilde{G}=[K(B,C^{T})\;e_2],\;\tilde{H}=[K(A,C^{T})\;e_1]\). From (35)–(38), the augmented vectors \(v_1 =[w_1^T \;b_1]^{T}\) and \(v_2 =[w_2^T \;b_2]^{T}\) can be obtained by

$$\begin{aligned} v_1= & {} \bigg (\tilde{H}^{T}\tilde{H}+c_3 I\bigg )^{-1}\bigg [\tilde{H}^{T}e_1 -\tilde{G}^{T}\alpha \bigg ] \end{aligned}$$
(39)
$$\begin{aligned} v_2= & {} \bigg (\tilde{G}^{T}\tilde{G}+c_4 I\bigg )^{-1}\bigg [-\tilde{G}^{T}e_2 +\tilde{H}^{T}\gamma \bigg ] \end{aligned}$$
(40)

Once the augmented vectors \(v_1 \) and \(v_2 \) are obtained from (39) and (40), the two nonparallel hyperplanes (34) are known. A new data point \(x\in R^{n}\) is then assigned to the positive class \(W_1 \) or negative class \(W_2 \), depending on which of the hyperplanes in (34) it lies closer to, i.e.

$$\begin{aligned} x\in W_i ,\quad i=\mathop {\arg \min } \limits _{i=1,2} \bigg \{\;\bigg |w_1^T K(x^{T},C^{T})+b_1 -1\bigg |,\bigg |w_2^T K(x^{T},C^{T})+b_2 +1\bigg |\bigg \} \end{aligned}$$
(41)

where \(|\cdot |\) is the absolute value.
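For the nonlinear case only the data blocks change: A and B are replaced by \(K(A,C^{T})\) and \(K(B,C^{T})\) in (35)–(40). Below is a hedged sketch with the Gaussian kernel (the kernel used in the experiments of Sect. 4); it reuses the ntsvm_linear_train routine sketched in Sect. 3.1, since the algebra is identical.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    """K(X, C^T) with K(x, y) = exp(-||x - y||^2 / (2*sigma^2))."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(C**2, axis=1)[None, :]
          - 2.0 * X @ C.T)
    return np.exp(-sq / (2.0 * sigma**2))

def ntsvm_kernel_train(A, B, c1, c2, c3, c4, sigma):
    """Nonlinear NTSVM: same algebra as the linear case, with A, B replaced
    by the kernel blocks K(A, C^T), K(B, C^T) of (35)-(36). Here C stacks all
    training points row-wise, and ntsvm_linear_train is the linear sketch
    from Sect. 3.1 (assumed available)."""
    C = np.vstack([A, B])                       # C holds all m1 + m2 points
    KA = gaussian_kernel(A, C, sigma)           # K(A, C^T)
    KB = gaussian_kernel(B, C, sigma)           # K(B, C^T)
    v1, v2 = ntsvm_linear_train(KA, KB, c1, c2, c3, c4)
    return v1, v2, C

def ntsvm_kernel_predict(X, v1, v2, C, sigma):
    """Decision rule (41) on the kernel-generated surfaces (34)."""
    K = gaussian_kernel(X, C, sigma)            # rows: K(x^T, C^T)
    w1, b1 = v1[:-1], v1[-1]
    w2, b2 = v2[:-1], v2[-1]
    d1 = np.abs(K @ w1 + b1 - 1.0)
    d2 = np.abs(K @ w2 + b2 + 1.0)
    return np.where(d1 <= d2, +1, -1)
```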

3.3 Implementation

In this subsection, we discuss the implementation of our proposed NTSVM. In NTSVM, the dual problems (29), (30), (37) and (38) can be rewritten in the following unified form

$$\begin{aligned}&\mathop {\min } \limits _\alpha \quad \frac{1}{2}\alpha ^{T}Q\alpha -f^{T}\alpha \nonumber \\&\quad s.t.\quad \;0\le \alpha \le ce \end{aligned}$$
(42)

where Q is a positive definite matrix and f is a vector. For example, if we choose \(Q=G(H^{T}H+c_3 I)^{-1}G^{T}\) and \(f=[e_2^T +e_1^T H(H^{T}H+c_3 I)^{-1}G^{T}]^{T}\), problem (42) becomes problem (29). Similarly, problem (42) becomes problem (37) when we choose \(Q=\tilde{G}(\tilde{H}^{T}\tilde{H}+c_3 I)^{-1}\tilde{G}^{T}\) and \(f=[e_2^T +e_1^T \tilde{H}(\tilde{H}^{T}\tilde{H}+c_3 I)^{-1}\tilde{G}^{T}]^{T}\).
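As a small illustration of this mapping (linear case; not the authors' code), Q and f for the dual (29) can be assembled as follows; the choice for (37) is identical with \(\tilde{H}\), \(\tilde{G}\) in place of H, G.

```python
import numpy as np

def unified_qp_data(H, G, e1, e2, c3):
    """Build (Q, f) of (42) so that it coincides with the dual (29):
    Q = G (H'H + c3 I)^{-1} G',  f = e2 + G (H'H + c3 I)^{-1} H' e1."""
    S = np.linalg.inv(H.T @ H + c3 * np.eye(H.shape[1]))
    Q = G @ S @ G.T
    f = e2 + G @ S @ (H.T @ e1)
    return Q, f
```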

Algorithm 1. SOR Algorithm

Step 1. Select the parameter \(t\in (0,2)\) and the initial vector \(\alpha ^{0}\), set \(k=0\);

Step 2. Compute \(\alpha ^{k+1}\) by \(\alpha ^{k+1}=(\alpha ^{k}-t\cdot D^{-1}(Q\alpha ^{k}-f+L(\alpha ^{k+1}-\alpha ^{k})))_\# \), where Q and f are set according to (29), (30), (37) or (38), and \((u)_\# \) denotes the 2-norm projection onto the feasible region, that is

$$\begin{aligned} ((u)_\#)_i =\left\{ {\begin{array}{ll} 0,&{}\quad \mathrm{if}\;u_i \le 0 \\ u_i,&{}\quad \mathrm{if}\;0<u_i <c \\ c,&{}\quad \mathrm{if}\;u_i \ge c \\ \end{array}} \right. \end{aligned}$$

Here \(Q=L+D+L^{T}\), where L and D are the strictly lower triangular part and the diagonal part of Q, respectively;

Step 3. Stop if \(||\alpha ^{k+1}-\alpha ^{k}||<\varepsilon \), where \(\varepsilon \) is the desired tolerance; otherwise, replace \(\alpha ^{k}\) by \(\alpha ^{k+1}\) and k by \(k+1\), and go to Step 2.
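A direct transcription of Algorithm 1 into NumPy is sketched below (an illustration of the update rule, not the original implementation); it assumes Q has a positive diagonal, which holds for the matrices arising from (29), (30), (37) and (38).

```python
import numpy as np

def sor_box_qp(Q, f, c, t=1.0, tol=1e-5, max_iter=1000):
    """Successive overrelaxation (SOR) for the box-constrained QP (42):
        min 0.5*a'Qa - f'a   s.t.  0 <= a <= c,
    following Algorithm 1; t in (0, 2) is the relaxation factor."""
    m = Q.shape[0]
    diag = np.diag(Q)                    # D in Q = L + D + L'
    a = np.zeros(m)                      # alpha^0
    for _ in range(max_iter):
        a_old = a.copy()
        for i in range(m):
            # Gauss-Seidel sweep: entries j < i already hold alpha^{k+1}
            r = Q[i] @ a - f[i]
            a[i] = min(max(a[i] - t * r / diag[i], 0.0), c)   # projection (.)#
        if np.linalg.norm(a - a_old) < tol:                   # Step 3
            break
    return a
```

For the linear dual (29), for instance, one would call sor_box_qp with the pair (Q, f) assembled as in the unified form (42), e.g. via the unified_qp_data sketch above.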

Table 1 The parameters for different classifiers for linear kernel

In our proposed NTSVM, most of the computational cost is incurred in solving the dual QPP (42). In order to solve this QPP quickly, we use a very efficient optimization technique, the successive overrelaxation (SOR) algorithm [24, 29, 38]. SOR is an excellent QPP solver because it can handle very large datasets that need not reside in memory [38]. The experimental results in the following section will show that the SOR technique has a remarkable acceleration effect on our proposed NTSVM. Note that the SOR technique can also be used in the original TBSVM [24] and NHSVM [29]. In practice, for nonlinear NTSVM, if the number of training points in the positive or negative class is large, the rectangular kernel technique [39] can be applied to reduce the dimensionality of (37) and (38).
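A hedged sketch of the reduced (rectangular) kernel idea of [39]: the kernel columns are restricted to a random subset \(\bar{C}\) of the training points, so the kernel blocks become \(K(A,\bar{C}^{T})\) and \(K(B,\bar{C}^{T})\). The ratio is a tunable choice (Sect. 4.3 uses 10 % of the data points).

```python
import numpy as np

def reduced_kernel_basis(A, B, ratio=0.1, seed=0):
    """Pick a random subset C_bar of the training points C = [A; B], so that
    K(A, C_bar^T) and K(B, C_bar^T) replace the full kernel blocks, shrinking
    the column dimension of (37)-(38) roughly by the factor `ratio` (cf. [39])."""
    C = np.vstack([A, B])
    rng = np.random.default_rng(seed)
    idx = rng.choice(C.shape[0], size=max(1, int(ratio * C.shape[0])),
                     replace=False)
    return C[idx]
```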

4 Experimental Results

In order to evaluate the proposed NTSVM, we investigate its classification accuracy and computational efficiency on two artificial datasets, a number of real-world UCI benchmark datasets and five NDC datasets. In the experiments, we focus on the comparison between our proposed NTSVM and some state-of-the-art classifiers, namely GEPSVM [21], TWSVM [22], TBSVM [24] and NHSVM [29]. All methods were implemented in MATLAB R2013a on a personal computer (PC) with an Intel (R) Core (TM) processor (3.40 GHz) and 4 GB of random-access memory (RAM). TBSVM, NHSVM and our proposed NTSVM are solved by the SOR algorithm. The eigenvalue problem in GEPSVM is solved by the MATLAB function 'eig.m', and the QPPs in TWSVM are solved by the QP routine in the MATLAB optimization toolbox. To evaluate the classifiers, we use the accuracy measure defined as Accuracy = (TP+TN)/(TP+FP+TN+FN), where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. Note that all of these classifiers involve different parameters, which were selected from the set \(\{2^{-8}, \ldots , 2^{8}\}\) by the standard 10-fold cross-validation methodology [40]. The parameters used with the different classifiers for the linear case are listed in Table 1.
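For completeness, a small sketch of the accuracy measure and the parameter grid described above (illustrative only; the fold splitting and training routines are assumed to be provided elsewhere):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN), labels in {+1, -1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == -1) & (y_true == -1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    return (tp + tn) / (tp + fp + tn + fn)

# Candidate values for c1..c4 (and sigma in the nonlinear case): 2^-8 .. 2^8
param_grid = [2.0 ** k for k in range(-8, 9)]
```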

4.1 Toy Examples

In this subsection, two artificial datasets, the crossplane (XOR) dataset and Ripley's synthetic dataset [41], are used to show that our proposed NTSVM can deal with linearly inseparable problems. Ripley's synthetic dataset contains 250 training points and 1000 test points. The average results of linear GEPSVM, TWSVM, TBSVM, NHSVM and our proposed NTSVM on the crossplane (XOR) dataset are reported in Table 2, and the average results of nonlinear GEPSVM, TWSVM, TBSVM, NHSVM and our proposed nonlinear NTSVM with Gaussian kernels on Ripley's synthetic dataset are listed in Table 3.

Table 2 Classification accuracy (%) on the crossplane (XOR) dataset
Table 3 Classification accuracy (%) on Ripley's synthetic dataset
Fig. 1 Crossplane (XOR) dataset and the hyperplanes of our proposed NTSVM

From Table 2, we can observe that all of the above methods achieve good performance on the crossplane (XOR) problem and that our proposed NTSVM achieves the best performance. From Table 3, we can see that the accuracy of our proposed NTSVM is 89.20 %, which is slightly lower than that of NHSVM but better than those of the other methods. Figure 1 illustrates the crossplane (XOR) dataset and the hyperplanes of our proposed NTSVM. Ripley's dataset and the hyperplanes of GEPSVM, TWSVM, TBSVM, NHSVM and our proposed nonlinear NTSVM are shown in Fig. 2.

Fig. 2 Ripley's dataset and results of the nonlinear methods on Ripley's dataset

4.2 UCI Datasets

To further compare our proposed NTSVM with GEPSVM, TWSVM and TBSVM, we choose 13 datasets from the UCI machine learning repository [42]. The numerical results of their linear versions are given in Table 4. We also list the mean accuracy of each classifier, with the best accuracy shown in boldface. According to the win-tie-loss (W-T-L) summarization, the accuracy of our proposed linear NTSVM is better than that of GEPSVM, TWSVM and TBSVM on most of the datasets. For example, on the Australian dataset, the accuracy of our NTSVM is 87.54 %, while that of GEPSVM is 85.65 %, that of TWSVM is 86.81 % and that of TBSVM is 86.96 %.

Table 4 Ten-fold testing percentage accuracy of linear classifiers

Table 5 displays the experimental results for nonlinear GEPSVM, TWSVM, TBSVM and our NTSVM on the UCI datasets chosen above. The Gaussian kernel \(K(x,y)=e^{-\frac{||x-y||^{2}}{2\sigma ^{2}}}\) is used, and the kernel parameter \(\sigma \) is obtained by searching the range from \(2^{-8}\) to \(2^{8}\). The accuracy and the training CPU time for these methods are also listed. The results in Table 5 are similar to those in Table 4 and therefore confirm the observations above.

Table 5 Ten-fold testing percentage accuracy of nonlinear classifiers

To compare the performance of the different classifiers, two-dimensional scatter plots, as used in [19, 26, 28, 30], are shown in Fig. 3 for the Australian UCI dataset using about 20 % of the data points, where the coordinates \((d_i^+ ,d_i^-)\) are the respective distances of a test point \(x_i \) as given in (33). In Fig. 3, panels (a)–(d) show the results obtained by GEPSVM, TWSVM, TBSVM and our NTSVM, respectively.

Fig. 3 Two-dimensional projections of test points from the Australian dataset

As noted above, there are four parameters \(c_i (i=1,2,3,4)\) in our linear NTSVM and an additional kernel parameter \(\sigma \) in our nonlinear NTSVM. We analyze the effect of \(c_i \) on linear NTSVM and the effect of the kernel parameter \(\sigma \) on nonlinear NTSVM. All the experiments in this subsection are carried out on four benchmark datasets: "Australian", "House votes", "Heart-c" and "Ionosphere". We first evaluate the influence of the parameters \(c_i (i=1,2,3,4)\). In the experiments, for convenience, we set \(c_1 =c_2 \) and \(c_3 =c_4 \) for each dataset. The results are depicted in Fig. 4. As we can see, proper selection of the parameters \(c_i \) improves the performance of the classifier to a certain extent. We next illustrate the relationship between the kernel parameter \(\sigma \) and the classification accuracy in the nonlinear case. To analyze the effect of \(\sigma \) on the performance of nonlinear NTSVM, the comparison is carried out with all parameters \(c_i \) set to 1 for convenience. To clarify the influence of the kernel parameter, we draw the parameter-accuracy curves of the above four datasets with \(\sigma \) taken from the set \(\{2^{-8},2^{-7},\cdots ,2^{7},2^{8}\}\). Based on the results in Fig. 5, we can see that the performance of NTSVM is closely related to the choice of kernel parameter.
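A hedged sketch of the sensitivity experiment for the \(c_i\): with \(c_1=c_2\) and \(c_3=c_4\), the grid is swept and the cross-validated accuracy is recorded. The helper cross_val_accuracy is hypothetical and stands in for whatever 10-fold evaluation routine is used.

```python
import numpy as np

# Assumed: cross_val_accuracy(A, B, c1, c2, c3, c4) returns the 10-fold
# accuracy of linear NTSVM for one parameter setting (hypothetical helper).
def sweep_c_parameters(A, B, cross_val_accuracy):
    grid = [2.0 ** k for k in range(-8, 9)]
    scores = np.zeros((len(grid), len(grid)))
    for i, c12 in enumerate(grid):          # c1 = c2
        for j, c34 in enumerate(grid):      # c3 = c4
            scores[i, j] = cross_val_accuracy(A, B, c12, c12, c34, c34)
    return scores
```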

Fig. 4 Relationship between the parameters \(c_i \) and the classification accuracy

Fig. 5 Relationship between the kernel parameter \(\sigma \) and the classification accuracy

4.3 NDC Datasets

In this subsection, we conduct experiments on some large datasets generated using David Musicant's NDC Data Generator [43]. For the experiments with the NDC datasets, we fixed the parameters of all classifiers to the same values (i.e., \(c_i =1\)). The training times and accuracies are reported in Tables 6 and 7. Table 6 shows the comparison results for linear TWSVM, TBSVM, NHSVM and our NTSVM on the NDC datasets. From Table 6, it is not hard to see that our NTSVM obtains comparable or better accuracy and trains faster than the other classifiers on most datasets.

For the nonlinear case, Table 7 shows the comparison of the computing times and accuracies of all four classifiers on several NDC datasets with the Gaussian kernel. For the NDC-2k, NDC-3k and NDC-5k datasets, the rectangular kernel technique [39] is applied using 10 % of the total data points. The results on these datasets show that our NTSVM and TBSVM are much faster than TWSVM and NHSVM. This is because the QPPs in TWSVM are solved directly by the QP routine in the MATLAB optimization toolbox, while our NTSVM and TBSVM are solved by the SOR algorithm. Although the QPP in NHSVM is also solved by the SOR algorithm, its computing time is greater than that of NTSVM and TBSVM, mainly because the size of the QPP in NHSVM is larger than the sizes of the QPPs in NTSVM and TBSVM. Tables 6 and 7 demonstrate the good generalization ability and efficiency of our NTSVM when dealing with large-scale problems.

Table 6 Comparison on NDC datasets for linear classifiers
Table 7 Comparison on NDC datasets for nonlinear classifiers

5 Conclusions

In this paper, an improved twin support vector machine (NTSVM) based on TWSVM is proposed for binary classification. Unlike TWSVM, our proposed NTSVM seeks two nonparallel hyperplanes \(w_1^T x+b_1 =1\) and \(w_2^T x+b_2 =-1\). Similar to TBSVM, a regularization term is used to overcome the overfitting and singularity problems. In addition, the SOR technique is utilized to speed up the training process. Experimental results illustrate that our NTSVM is feasible and effective. It should be pointed out that there are four parameters in our NTSVM, so parameter selection is a practical issue that should be investigated in the future. Furthermore, expanding the applications of our NTSVM to practical scenarios and extending it to multi-view learning [44] are also interesting directions.