1 Introduction

Support vector machine (SVM) [1, 2], as a powerful tool for classification and regression, has been widely used in a variety of practical applications [3–6]. However, SVM is computationally costly because its solution follows from solving constrained quadratic programming problems (QPPs), especially when handling large-scale data. Furthermore, SVM produces only one separating hyperplane and hence cannot cope with the complex XOR problem. To address these issues, the generalized eigenvalue proximal SVM (GEPSVM) [7] uses two nonparallel hyperplanes such that each hyperplane is close to one of the two classes and is as far as possible from the other class. Motivated by GEPSVM, Jayadeva et al. [8] proposed the twin SVM (TWSVM), a novel nonparallel-hyperplane classifier for binary classification. In TWSVM, the training samples of each class are proximal to one of two nonparallel hyperplanes, which are obtained by solving a pair of small-sized QPPs. Experimental results in [8] validated the superiority of TWSVM over the classical SVM. Moreover, TWSVM can effectively deal with the XOR problem due to its underlying model. Consequently, nonparallel-hyperplane SVMs have been extensively studied and a variety of methods have been proposed [9–16], such as LSTSVM [9], TBSVM [10], TPMSVM [11], RTSVM [12], PPSVC [13], NHSVM [14] and NPSVM [15]. Methods that seek two projection directions have also been widely investigated, such as MVSVM [17], PTSVM [18] and LSPTSVM [19, 20]. An overview of twin support vector machines is given in [21].

However, in practice, patterns belonging to one class often play a more significant role in classification. Such problems are usually solved using fuzzy SVMs [22–26], in which the patterns of more important classes are assigned higher membership values. Different from FSVM, Tao and Wang proposed a new framework that solves the fuzzy classification problem using a fuzzy membership function [27], termed the new fuzzy SVM (NFSVM) [28]. However, FSVM and NFSVM have the same computational complexity as SVM because their solutions also follow from solving a QPP. Inspired by TWSVM, FTWSVM [29] and FTSVM [30] have also been proposed. Specifically, FTWSVM incorporates the fuzziness in the data and obtains nonparallel planes around which the data points of the corresponding class cluster, while FTSVM builds on the v-twin support vector machine and introduces the importance of each training sample.

In order to enhance the efficiency and performance of fuzzy SVM and TWSVM for binary classification, in this paper we propose a new fuzzy twin support vector machine (NFTSVM). The proposed NFTSVM combines the ideas of NFSVM and TWSVM, and has two advantages over them. First, NFTSVM generates two nonparallel hyperplanes such that each plane is close to one of the two classes and is as far as possible from the other; it employs the fuzzy membership to weigh the margin of each training sample, which enhances the generalization ability. Second, NFTSVM has a more general formulation than TWSVM and TBSVM; in fact, both are special cases of the proposed NFTSVM. Moreover, NFTSVM differs from FTWSVM and FTSVM. In FTWSVM and FTSVM, a fuzzy membership value is assigned to each pattern, and points are classified by assigning them to the nearest of two nonparallel planes that are close to their respective classes. In contrast, inspired by NFSVM, the main idea of NFTSVM is to weigh the margin of each training vector using the fuzzy membership function, so that the influence of samples with high uncertainty is decreased. In addition, an important advantage of NFTSVM, which borrows ideas from fuzzy neural networks, is that we can employ available fuzzy membership functions.

The rest of this paper is organized as follows. Section 2 gives a brief overview of related work, including NFSVM, TWSVM and TBSVM. Section 3 presents the proposed NFTSVM in detail. Experimental results and conclusions are given in Sects. 4 and 5, respectively.

2 Related work

Conventional SVMs solve the binary classification problem using the principle of structural risk minimization (SRM) [1, 2]. For a typical binary classification problem, we have a set of training samples:

$$T = \{ ({x_1},{y_1}),({x_2},{y_2}), \ldots ,({x_l},{y_l})\}$$
(1)

where \({x_i} \in {\Re ^n}\) is the i-th observation, \({y_i} \in \{ - 1,1\}\) is the label of the i-th observation, and \(l\) is the number of training samples. The aim of linear SVM is to solve the following primal QPP:

$$\begin{gathered} \mathop {\min }\limits_{w,b,\xi } \quad \frac{1}{2}{w^T}w + C\sum\limits_{i = 1}^l {{\xi _i}} \hfill \\ s.t.\quad {y_i}({w^T}{x_i} + b) \geqslant 1 - {\xi _i},\; \hfill \\ \quad \quad {\xi _i} \geqslant 0,\;i = 1,2, \ldots ,l \hfill \\ \end{gathered}$$
(2)

where \(C\) is the penalty parameter and \({\xi _i}\) are slack variables. The output of SVM is an optimal separating hyperplane:

$${w^T}x + b = 0$$
(3)

where \(w \in {\Re ^n}\) is the normal vector to the hyperplane, and \(b \in \Re\) is the offset. Given a new observation \(x'\), we can use this hyperplane to determine its label: we assign the positive label to \(x'\) if \({w^T}x' + b \geqslant 0\); otherwise, we assign the negative label.
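As a minimal sketch of this decision rule (the function name and array layout are our illustrative choices, not from [1, 2]):

```python
import numpy as np

def svm_predict(w, b, X):
    """Apply the separating hyperplane (3): label +1 if w^T x + b >= 0, else -1."""
    return np.where(X @ w + b >= 0.0, 1, -1)

# Example: w = (1, -1), b = 0 separates points above/below the line x1 = x2.
print(svm_predict(np.array([1.0, -1.0]), 0.0,
                  np.array([[2.0, 1.0], [1.0, 2.0]])))   # -> [ 1 -1]
```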

2.1 New fuzzy SVM (NFSVM)

However, one of the main drawbacks of the conventional SVM is that it is sensitive to outliers and noise. In order to deal with this problem, fuzzy support vector machines (FSVM) have been proposed [22–26]. In addition, Tao and Wang [28] proposed the new fuzzy support vector machine (NFSVM). NFSVM considers the binary classification problem for a set of training samples:

$${T^*} = \{ ({x_1},{m_1}),({x_2},{m_2}), \ldots ,({x_l},{m_l})\}$$
(4)

where \({x_i} \in {\Re ^n}\) are independently distributed observations, \({m_i} \in [0,1]\) is the fuzzy membership that measures the contribution of the i-th observation \({x_i}\) to the positive class, and \(l\) is the number of training samples. In order to obtain a formulation similar to the classical SVM, NFSVM uses a new label \({y_i} = 2{m_i} - 1\) in place of the fuzzy membership \({m_i}\). The primal problem of NFSVM can then be expressed as

$$\begin{gathered} \mathop {\min }\limits_{w,b,\xi } \quad \frac{1}{2}{w^T}w + C\sum\limits_{i = 1}^l {{\xi _i}} \hfill \\ s.t.\quad {y_i}({w^T}{x_i} + b) \geqslant y_i^2 - y_i^2{\xi _i},\; \hfill \\ \quad \quad {\xi _i} \geqslant 0,\;i = 1,2, \ldots ,l \hfill \\ \end{gathered}$$
(5)

where \(C\) is the penalty parameter, \({\xi _i}\) are slack variables, \(w \in {\Re ^n}\) is the normal vector and \(b \in \Re\) is the offset.

To solve this primal problem, NFSVM solves its Lagrangian dual problem:

$$\begin{gathered} \mathop {\min }\limits_\alpha \quad \frac{1}{2}\sum\limits_{i = 1}^l {\sum\limits_{j = 1}^l {{\alpha _i}} } {\alpha _j}{y_i}{y_j}({x_i} \cdot {x_j}) - \sum\limits_{i = 1}^l {{\alpha _i}} y_i^2 \hfill \\ s.t.\quad \;\sum\limits_{i = 1}^l {{\alpha _i}} {y_i} = 0\; \hfill \\ \quad \quad \;0 \leqslant {\alpha _i}y_i^2 \leqslant C,\;i = 1,2, \ldots ,l \hfill \\ \end{gathered}$$
(6)

where \(\alpha \in {\Re ^l}\) is the vector of Lagrangian multipliers [28].
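To make the role of the relabeling \({y_i} = 2{m_i} - 1\) concrete, the following sketch assembles the quantities appearing in the dual (6). The helper name is hypothetical, the degenerate case \({y_i} = 0\) (i.e. \({m_i} = 0.5\)) is handled by our own convention, and the equality constraint \(\sum\nolimits_i {{\alpha _i}{y_i}} = 0\) would be passed to the QP solver separately:

```python
import numpy as np

def nfsvm_dual_terms(X, m, C):
    """Build the Hessian, linear term and upper bounds of the NFSVM dual (6)."""
    y = 2.0 * m - 1.0                           # relabeling y_i = 2 m_i - 1
    Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j (x_i . x_j)
    d = y ** 2                                  # linear term: sum_i alpha_i y_i^2
    upper = np.full_like(y, np.inf)             # from 0 <= alpha_i y_i^2 <= C
    nz = y != 0                                 # y_i = 0 leaves alpha_i unconstrained above
    upper[nz] = C / y[nz] ** 2
    return Q, d, upper
```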

2.2 TWSVM

Different from the single hyperplane SVM discussed above, linear twin support vector machine (TWSVM) [8] aims to find a pair of nonparallel hyperplanes:

$$w_1^Tx + {b_1} = 0\quad {\text{and}}\quad w_2^Tx + {b_2} = 0$$
(7)

such that each hyperplane is proximal to the training samples of one class and is as far as possible from the samples of the other class. More specifically, we consider the training of a binary classifier from \(p\) positive samples and \(q\) negative samples. The training samples in positive class are denoted by matrix \(A \in {\Re ^{p \times n}}\) and the training samples in negative class are denoted by matrix \(B \in {\Re ^{q \times n}}\). The primal problem of linear TWSVM can be expressed as:

$$\begin{gathered} \mathop {\min }\limits_{{w_1},{b_1},{\xi _2}} \,\,\frac{1}{2}||A{w_1} + {e_1}{b_1}||_2^2 + {c_1}e_2^T{\xi _2} \hfill \\ \;s.t.\quad - (B{w_1} + {e_2}{b_1}) + {\xi _2} \geqslant {e_2},{\xi _2} \geqslant 0 \hfill \\ \end{gathered}$$
(8)
$$\begin{gathered} \mathop {\min }\limits_{{w_2},{b_2},{\xi _1}} \,\,\frac{1}{2}||B{w_2} + {e_2}{b_2}||_2^2 + {c_2}e_1^T{\xi _1} \hfill \\ \;s.t.\quad (A{w_2} + {e_1}{b_2}) + {\xi _1} \geqslant {e_1},{\xi _1} \geqslant 0 \hfill \\ \end{gathered}$$
(9)

where \({c_1} > 0\) and \({c_2} > 0\) are penalty parameters, \({\xi _1}\) and \({\xi _2}\) are slack variables, and \({e_1}\), \({e_2}\) are vectors of ones of appropriate dimensions. Using the method of Lagrangian multipliers, the corresponding Wolfe duals of QPPs (8) and (9) can be written as

$$\begin{gathered} \mathop {\max }\limits_\alpha \,\,e_2^T\alpha - \frac{1}{2}{\alpha ^T}G{({H^T}H)^{ - 1}}{G^T}\alpha \hfill \\ \;s.t.\quad 0 \leqslant \alpha \leqslant {c_1}{e_2} \hfill \\ \end{gathered}$$
(10)
$$\begin{gathered} \mathop {\max }\limits_\gamma \,\,e_1^T\gamma - \frac{1}{2}{\gamma ^T}H{({G^T}G)^{ - 1}}{H^T}\gamma \hfill \\ \;s.t.\quad 0 \leqslant \gamma \leqslant {c_2}{e_1} \hfill \\ \end{gathered}$$
(11)

where \(G = [B\;{e_2}]\), \(H = [A\;{e_1}]\), and \(\alpha \in {\Re ^q},\;\gamma \in {\Re ^p}\) are Lagrangian multipliers [8].

2.3 TBSVM

In order to embody the structural risk minimization principle of statistical learning theory and improve classification performance, the twin bounded support vector machine (TBSVM) was proposed [10]. TBSVM minimizes the structural risk by adding a regularization term, with the idea of maximizing the one-sided margin. For a linear binary classification problem, TBSVM solves the following two primal problems:

$$\begin{gathered} \mathop {\min }\limits_{{w_1},{b_1},{\xi _2}} \,\,\frac{1}{2}{c_3}(||{w_1}||_2^2 + b_1^2) + \frac{1}{2}||A{w_1} + {e_1}{b_1}||_2^2 + {c_1}e_2^T{\xi _2} \hfill \\ \;s.t.\quad - (B{w_1} + {e_2}{b_1}) + {\xi _2} \geqslant {e_2},{\xi _2} \geqslant 0 \hfill \\ \end{gathered}$$
(12)
$$\begin{gathered} \mathop {\min }\limits_{{w_2},{b_2},{\xi _1}} \,\,\frac{1}{2}{c_4}(||{w_2}||_2^2 + b_2^2) + \frac{1}{2}||B{w_2} + {e_2}{b_2}||_2^2 + {c_2}e_1^T{\xi _1} \hfill \\ \;s.t.\quad (A{w_2} + {e_1}{b_2}) + {\xi _1} \geqslant {e_1},{\xi _1} \geqslant 0 \hfill \\ \end{gathered}$$
(13)

where \({c_1},\;{c_2},\;{c_3}\) and \({c_4}\) are positive penalty parameters, \({\xi _1},\;{\xi _2}\) are slack variables, and \({e_1},\;{e_2}\) are vectors of ones. Using the method of Lagrangian multipliers, the Wolfe duals of QPPs (12) and (13) can be written as

$$\begin{gathered} \mathop {\max }\limits_\alpha \,\,e_2^T\alpha - \frac{1}{2}{\alpha ^T}G{({H^T}H + {c_3}I)^{ - 1}}{G^T}\alpha \hfill \\ \;s.t.\quad 0 \leqslant \alpha \leqslant {c_1}{e_2} \hfill \\ \end{gathered}$$
(14)
$$\begin{gathered} \mathop {\max }\limits_\gamma \,\,e_1^T\gamma - \frac{1}{2}{\gamma ^T}H{({G^T}G + {c_4}I)^{ - 1}}{H^T}\gamma \hfill \\ \;s.t.\quad 0 \leqslant \gamma \leqslant {c_2}{e_1} \hfill \\ \end{gathered}$$
(15)

where \(G = [B\;{e_2}],\,H = [A\;{e_1}]\), \(\alpha \in {\Re ^q},\;\gamma \in {\Re ^p}\) are Lagrangian multipliers, and \(I\) is an identity matrix of appropriate dimensions [10].
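As a sketch of how the two planes might be recovered from the dual solutions of (14) and (15) (our naming; the minus sign for the first plane follows from the direction of the constraint in (12), as in the original TWSVM):

```python
import numpy as np

def tbsvm_planes(A, B, alpha, gamma, c3, c4):
    """Recover [w1; b1] and [w2; b2] from the TBSVM dual solutions."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])   # H = [A e1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])   # G = [B e2]
    I = np.eye(H.shape[1])
    # Stationarity of (12): (H^T H + c3 I) v1 + G^T alpha = 0
    v1 = -np.linalg.solve(H.T @ H + c3 * I, G.T @ alpha)
    # Stationarity of (13): (G^T G + c4 I) v2 - H^T gamma = 0
    v2 = np.linalg.solve(G.T @ G + c4 * I, H.T @ gamma)
    return (v1[:-1], v1[-1]), (v2[:-1], v2[-1])    # (w1, b1), (w2, b2)
```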

3 New fuzzy twin support vector machine (NFTSVM)

3.1 Linear NFTSVM

In real-world applications, we often face situations where patterns belonging to one class play a more significant role in classification. In such cases, the main concern is how to determine the final classes by assigning different degrees of importance to the training data. A well-established approach to this challenge is to use fuzzy membership functions. Moreover, the existing nonparallel-hyperplane support vector machines are often more efficient to solve than the conventional SVM; for example, TWSVM is approximately four times faster. In order to handle such applications with high solving efficiency, we incorporate fuzzy set theory into NFSVM and propose a new fuzzy twin support vector machine (NFTSVM) for binary classification. As discussed in Sect. 2.1, in fuzzy binary classification \({m_i}\) is the fuzzy membership that measures the membership of the corresponding observation \({x_i}\) to the positive class. Given \(p\) positive samples \(\{ ({\tilde x_1},{\tilde m_1}),({\tilde x_2},{\tilde m_2}), \ldots ,({\tilde x_p},{\tilde m_p})\}\) and \(q\) negative samples \(\{ ({\hat x_1},{\hat m_1}),({\hat x_2},{\hat m_2}), \ldots ,({\hat x_q},{\hat m_q})\}\), we form the diagonal matrices \({Y_1} = diag({\tilde y_1},{\tilde y_2}, \ldots ,{\tilde y_p})\) and \({Y_2} = diag({\hat y_1},{\hat y_2}, \ldots ,{\hat y_q})\), where \({\tilde y_i} = 2{\tilde m_i} - 1\;(i = 1,2, \ldots ,p)\) and \({\hat y_j} = 2{\hat m_j} - 1\;(j = 1,2, \ldots ,q)\), respectively.
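For concreteness, a small sketch of this construction (the function name is ours):

```python
import numpy as np

def fuzzy_label_matrices(m_pos, m_neg):
    """Diagonal matrices Y1, Y2 from the memberships of the p positive and q negative samples."""
    Y1 = np.diag(2.0 * np.asarray(m_pos) - 1.0)   # y~_i = 2 m~_i - 1, i = 1..p
    Y2 = np.diag(2.0 * np.asarray(m_neg) - 1.0)   # y^_j = 2 m^_j - 1, j = 1..q
    return Y1, Y2
```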

Similar to NFSVM [28], TWSVM [8] and TBSVM [10], the goal of the proposed linear NFTSVM is to find two nonparallel hyperplanes (7) using the following primal problems:

$$\begin{gathered} \mathop {\min }\limits_{{w_1},{b_1},{\xi _2}} \,\,\frac{1}{2}||A{w_1} + {e_1}{b_1}||_2^2 + \frac{1}{2}{c_1}(||{w_1}||_2^2 + b_1^2) + {c_2}e_2^T{\xi _2} \hfill \\ \;s.t.\quad {Y_2}(B{w_1} + {e_2}{b_1}) \geqslant Y_2^2{e_2} - Y_2^2{\xi _2},{\xi _2} \geqslant 0 \hfill \\ \end{gathered}$$
(16)
$$\begin{gathered} \mathop {\min }\limits_{{w_2},{b_2},{\xi _1}} \,\,\frac{1}{2}||B{w_2} + {e_2}{b_2}||_2^2 + \frac{1}{2}{c_3}(||{w_2}||_2^2 + b_2^2) + {c_4}e_1^T{\xi _1} \hfill \\ \;s.t.\quad {Y_1}(A{w_2} + {e_1}{b_2}) \geqslant Y_1^2{e_1} - Y_1^2{\xi _1},{\xi _1} \geqslant 0 \hfill \\ \end{gathered}$$
(17)

where \({c_1},\;{c_2},\;{c_3}\) and \({c_4}\) are positive penalty parameters, \({\xi _1}\) and \({\xi _2}\) are slack variables, and \({e_1}\), \({e_2}\) are vectors of ones.

Let us compare the objective functions and constraints in (12), (13) and (16), (17). Their objective functions are identical, while their constraints differ. For a classical binary classification problem, the fuzzy memberships are \({\tilde m_i} = 1\) and \({\hat m_j} = 0\), so the labels are \({\tilde y_i} = 1\;(i = 1,2, \ldots ,p)\) and \({\hat y_j} = - 1\;(j = 1, \ldots ,q)\). In this case, the constraints in (16), (17) coincide with those in (12), (13). That is, TBSVM is a special case of our NFTSVM when the parameters of NFTSVM are appropriately selected. In practical problems, however, the fuzzy memberships \({\tilde m_i}\,(i = 1,2, \ldots ,p)\) are not all 1 and \({\hat m_j}\,(j = 1,2, \ldots ,q)\) are not all 0. This means that different samples are penalized with different coefficients, which may improve the classification accuracy in practical problems.

To obtain the solutions of (16) and (17), we derive their dual problems. For (16), introducing the method of Lagrangian multipliers yields the Lagrangian function:

$$\begin{aligned} L({w_1},{b_1},{\xi _2},\alpha ,\beta ) = \frac{1}{2}||A{w_1} + {e_1}{b_1}||_2^2 + \frac{1}{2}{c_1}(||{w_1}||_2^2 + b_1^2) + {c_2}e_2^T{\xi _2} \hfill \\ \quad \quad \quad \quad \quad \quad \quad - {\alpha ^T}[{Y_2}(B{w_1} + {e_2}{b_1}) - Y_2^2{e_2} + Y_2^2{\xi _2}] - {\beta ^T}{\xi _2}\end{aligned}$$
(18)

where \(\alpha \in {\Re ^q}\) and \(\beta \in {\Re ^q}\) are Lagrangian multipliers. With the Karush-Kuhn-Tucker (KKT) conditions [31], we have:

$$\frac{{\partial L}}{{\partial {w_1}}} = {A^T}(A{w_1} + {e_1}{b_1}) + {c_1}{w_1} - {B^T}Y_2^T\alpha = 0,$$
(19)
$$\frac{{\partial L}}{{\partial {b_1}}} = e_1^T(A{w_1} + {e_1}{b_1}) + {c_1}{b_1} - e_2^TY_2^T\alpha = 0,$$
(20)
$$\frac{{\partial L}}{{\partial {\xi _2}}} = {c_2}{e_2} - Y_2^2\alpha - \beta = 0,$$
(21)
$${Y_2}(B{w_1} + {e_2}{b_1}) \geqslant Y_2^2{e_2} - Y_2^2{\xi _2},\;{\xi _2} \geqslant 0,$$
(22)
$${\alpha ^T}[{Y_2}(B{w_1} + {e_2}{b_1}) - Y_2^2{e_2} + Y_2^2{\xi _2}] = 0,$$
(23)
$$\alpha \geqslant 0,\beta \geqslant 0.$$
(24)

Since \(\beta \geqslant 0\), from (21) and (24), we have:

$$0 \leqslant \alpha \leqslant {c_2}{(Y_2^2)^{ - 1}}{e_2}$$
(25)

Let \(G = [B\;{e_2}],\;H = [A\;{e_1}]\) and \({v_1} = {[w_1^T\;{b_1}]^T}\); then (19) and (20) imply:

$$({H^T}H + {c_1}I){v_1} - {G^T}Y_2^T\alpha = 0$$
(26)

Thus, we can get the augmented vector

$${v_1} = {({H^T}H + {c_1}I)^{ - 1}}{G^T}Y_2^T\alpha$$
(27)

where \(I\) is an identity matrix of appropriate dimensions. Then, we obtain the Wolfe dual problem of (16) by substituting (27) into (18) and using (19) and (21):

$$\begin{gathered} \mathop {\max }\limits_\alpha \,\,e_2^TY_2^T\alpha - \frac{1}{2}{\alpha ^T}{Y_2}G{({H^T}H + {c_1}I)^{ - 1}}{G^T}Y_2^T\alpha \hfill \\ \;s.t.\quad 0 \leqslant \alpha \leqslant {c_2}{(Y_2^2)^{ - 1}}{e_2} \hfill \\ \end{gathered}$$
(28)

In the same fashion, we can obtain the Wolfe dual problem of (17):

$$\begin{gathered} \mathop {\max }\limits_\gamma \,\,e_1^TY_1^T\gamma - \frac{1}{2}{\gamma ^T}{Y_1}H{({G^T}G + {c_3}I)^{ - 1}}{H^T}Y_1^T\gamma \hfill \\ \;s.t.\quad 0 \leqslant \gamma \leqslant {c_4}{(Y_1^2)^{ - 1}}{e_1} \hfill \\ \end{gathered}$$
(29)

where \(\alpha \in {\Re ^q},\gamma \in {\Re ^p}\) are Lagrangian multipliers.

The nonparallel hyperplanes (7) can be obtained from the solutions of \(\alpha\) and \(\gamma\) in (28) and (29) by:

$${v_1} = {({H^T}H + {c_1}I)^{ - 1}}{G^T}Y_2^T\alpha ,\quad {\rm{where }}\,{v_1} = {[w_1^T\,{b_1}]^T}$$
(30)
$${v_2} = {({G^T}G + {c_3}I)^{ - 1}}{H^T}Y_1^T\gamma ,\quad {\rm{where}}\;{v_2} = {[w_2^T\,{b_2}]^T}$$
(31)

Once \({v_1}\) and \({v_2}\) are obtained, the two nonparallel hyperplanes (7) are known. A new sample \(x \in {\Re ^n}\) is assigned a positive or negative label depending on which of the hyperplanes (7) it lies closest to, i.e.

$$x \in {W_k},\,\,k = \mathop {\arg \min }\limits_{i = 1,2} \,\left\{ {\frac{{|w_1^Tx + {b_1}|}}{{||{w_1}||}},\,\frac{{|w_2^Tx + {b_2}|}}{{||{w_2}||}}} \right\}$$
(32)

where \(| \cdot |\) is the absolute value.
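The following sketch assembles (30)–(32) under our naming; the dual solutions \(\alpha\) and \(\gamma\) of (28) and (29) are assumed to come from a QP solver such as the SOR routine discussed in Sect. 3.3:

```python
import numpy as np

def nftsvm_linear_planes(A, B, Y1, Y2, alpha, gamma, c1, c3):
    """Recover the two hyperplanes from the dual solutions via (30) and (31)."""
    H = np.hstack([A, np.ones((A.shape[0], 1))])   # H = [A e1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])   # G = [B e2]
    I = np.eye(H.shape[1])
    v1 = np.linalg.solve(H.T @ H + c1 * I, G.T @ Y2.T @ alpha)   # (30)
    v2 = np.linalg.solve(G.T @ G + c3 * I, H.T @ Y1.T @ gamma)   # (31)
    return (v1[:-1], v1[-1]), (v2[:-1], v2[-1])    # (w1, b1), (w2, b2)

def nftsvm_linear_predict(x, w1, b1, w2, b2):
    """Decision rule (32): assign x to the class of its nearest hyperplane."""
    d1 = abs(w1 @ x + b1) / np.linalg.norm(w1)
    d2 = abs(w2 @ x + b2) / np.linalg.norm(w2)
    return 1 if d1 <= d2 else -1
```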

3.2 Nonlinear NFTSVM

In this subsection, we show that our linear NFTSVM can be easily extended to the nonlinear case. Here, we consider the following kernel-generated hyperplanes

$$K({x^T},{C^T}){w_1} + {b_1} = 0\quad {\text{and}}\quad K({x^T},{C^T}){w_2} + {b_2} = 0,$$
(33)

where \(C = [A\;B] \in \Re ^{{(p + q) \times n}}\) and \(K\) is an appropriately chosen kernel function. Similar to the linear case, the nonlinear optimization problems can be expressed as:

$$\begin{gathered} \mathop {\min }\limits_{{w_1},{b_1},{\eta _2}} \,\,\frac{1}{2}||K(A,{C^T}){w_1} + {e_1}{b_1}||_2^2 + \frac{1}{2}{c_1}(||{w_1}||_2^2 + b_1^2) + {c_2}e_2^T{\eta _2} \hfill \\ \;s.t.\quad {Y_2}(K(B,{C^T}){w_1} + {e_2}{b_1}) \geqslant Y_2^2{e_2} - Y_2^2{\eta _2},{\eta _2} \geqslant 0, \hfill \\ \end{gathered}$$
(34)
$$\begin{gathered} \mathop {\min }\limits_{{w_2},{b_2},{\eta _1}} \,\,\frac{1}{2}||K(B,{C^T}){w_2} + {e_2}{b_2}||_2^2 + \frac{1}{2}{c_3}(||{w_2}||_2^2 + b_2^2) + {c_4}e_1^T{\eta _1} \hfill \\ \;s.t.\quad {Y_1}(K(A,{C^T}){w_2} + {e_1}{b_2}) \geqslant Y_1^2{e_1} - Y_1^2{\eta _1},{\eta _1} \geqslant 0 \hfill \\ \end{gathered}$$
(35)

With the Lagrangian method and KKT conditions, we can obtain the corresponding Wolfe dual problems

$$\begin{gathered} \mathop {\max }\limits_\alpha \,\,e_2^TY_2^T\alpha - \frac{1}{2}{\alpha ^T}{Y_2}\tilde G{({{\tilde H}^T}\tilde H + {c_1}I)^{ - 1}}{{\tilde G}^T}Y_2^T\alpha \hfill \\ \;s.t.\quad 0 \leqslant \alpha \leqslant {c_2}{(Y_2^2)^{ - 1}}{e_2}, \hfill \\ \end{gathered}$$
(36)
$$\begin{gathered} \mathop {\max }\limits_\gamma \,\,e_1^TY_1^T\gamma - \frac{1}{2}{\gamma ^T}{Y_1}\tilde H{({{\tilde G}^T}\tilde G + {c_3}I)^{ - 1}}{{\tilde H}^T}Y_1^T\gamma \hfill \\ \;s.t.\quad 0 \leqslant \gamma \leqslant {c_4}{(Y_1^2)^{ - 1}}{e_1}, \hfill \\ \end{gathered}$$
(37)

where \(\tilde G = [K(B,{C^T})\;{e_2}]\) and \(\tilde H = [K(A,{C^T})\;{e_1}]\).

According to (34)–(37), the augmented vectors \({v_1} = {[w_1^T\,{b_1}]^T}\) and \({v_2} = {[w_2^T\,{b_2}]^T}\) can be obtained by

$${v_1} = {({\tilde H^T}\tilde H + {c_1}I)^{ - 1}}{\tilde G^T}Y_2^T\alpha ,$$
(38)
$${v_2} = {({\tilde G^T}\tilde G + {c_3}I)^{ - 1}}{\tilde H^T}Y_1^T\gamma ,$$
(39)

Once the vectors \({v_1}\) and \({v_2}\) are obtained, the two nonparallel hyperplanes (33) are known. A new sample \(x \in {\Re ^n}\) is assigned a positive or negative label depending on which of the hyperplanes (33) it lies closest to, i.e.

$$x \in {W_k},\,\,k = \mathop {\arg \min }\limits_{i = 1,2} \,\left\{ {\frac{{|w_1^TK(x,{C^T}) + {b_1}|}}{{\sqrt {w_1^TK(C,{C^T}){w_1}} }},\,\frac{{|w_2^TK(x,{C^T}) + {b_2}|}}{{\sqrt {w_2^TK(C,{C^T}){w_2}} }}} \right\}$$
(40)

where \(| \cdot |\) is the absolute value.
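A sketch of the kernel decision rule (40) with the Gaussian kernel used in Sect. 4 (function names are ours; \(w_1, b_1, w_2, b_2\) are the solutions from (38) and (39)):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nftsvm_kernel_predict(x, C_mat, w1, b1, w2, b2, sigma):
    """Decision rule (40); C_mat stacks all p + q training samples (C = [A; B])."""
    k = gaussian_kernel(x[None, :], C_mat, sigma).ravel()   # row vector K(x^T, C^T)
    KCC = gaussian_kernel(C_mat, C_mat, sigma)              # Gram matrix K(C, C^T)
    d1 = abs(k @ w1 + b1) / np.sqrt(w1 @ KCC @ w1)
    d2 = abs(k @ w2 + b2) / np.sqrt(w2 @ KCC @ w2)
    return 1 if d1 <= d2 else -1
```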

3.3 Implementation

In this subsection, we discuss the implementation of our proposed NFTSVM. In NFTSVM, the dual problems can be rewritten in the following unified form

$$\begin{gathered} \mathop {\min }\limits_\alpha \quad \tfrac{1}{2}{\alpha ^T}Q\alpha - {d^T}\alpha \hfill \\ s.t.\quad \;0 \leqslant \alpha \leqslant ce\;, \hfill \\ \end{gathered}$$
(41)

where \(Q\) is a positive definite matrix and \(d\) is a vector. For example, setting \(Q = {Y_2}G{({H^T}H + {c_1}I)^{ - 1}}{G^T}Y_2^T\), \(d = {Y_2}{e_2}\) and \(c = {c_2}{(Y_2^2)^{ - 1}}\) makes (41) equivalent to (28), and setting \(Q = {Y_2}\tilde G{({\tilde H^T}\tilde H + {c_1}I)^{ - 1}}{\tilde G^T}Y_2^T\) with the same \(d\) and \(c\) makes (41) equivalent to (36).
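For example, the following sketch (our naming) maps the linear dual (28) onto the unified form (41); the kernel dual (36) is obtained by substituting \(\tilde G\) and \(\tilde H\) for \(G\) and \(H\):

```python
import numpy as np

def unified_qp_terms(H, G, Y2, c1, c2):
    """Map dual (28) onto the unified form (41): returns Q, d and the box bound c*e."""
    I = np.eye(H.shape[1])
    M = np.linalg.solve(H.T @ H + c1 * I, G.T @ Y2.T)  # (H^T H + c1 I)^{-1} G^T Y2^T
    Q = Y2 @ G @ M                                     # Q = Y2 G (H^T H + c1 I)^{-1} G^T Y2^T
    d = Y2 @ np.ones(G.shape[0])                       # d = Y2 e2
    ce = c2 / np.diag(Y2) ** 2                         # c2 (Y2^2)^{-1} e2 (assumes y^_j != 0)
    return Q, d, ce
```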

In our proposed NFTSVM, the most expensive part in terms of computational cost is solving the dual QPPs (41). To solve the QPPs more efficiently, we use a state-of-the-art optimization technique, the successive overrelaxation (SOR) algorithm [10, 14, 32]. SOR is an excellent QPP solver because it can handle large-scale problems without storing all the data in memory [32]. The experimental results in the following section show that SOR accelerates our proposed NFTSVM.
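A minimal sketch of a projected SOR sweep for (41), in the spirit of [32]; the relaxation factor \(\omega \in (0,2)\), the stopping rule and all names are our assumptions rather than the exact routine used in the experiments:

```python
import numpy as np

def sor_qp(Q, d, ce, omega=1.0, tol=1e-5, max_iter=1000):
    """Solve min 0.5 a^T Q a - d^T a  s.t. 0 <= a <= ce by projected SOR sweeps."""
    alpha = np.zeros_like(d)
    for _ in range(max_iter):
        alpha_prev = alpha.copy()
        for i in range(len(d)):
            g_i = Q[i] @ alpha - d[i]          # i-th gradient coordinate, latest values
            alpha[i] = np.clip(alpha[i] - omega * g_i / Q[i, i], 0.0, ce[i])
        if np.linalg.norm(alpha - alpha_prev) < tol:
            break
    return alpha
```

Because each sweep updates one coordinate at a time using the most recent values, only rows of \(Q\) are touched in turn, which is what makes the method attractive for large-scale problems.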

For further analysis and understanding, we give some remarks on our NFTSVM. First, if we define the membership \({m_i} = 1\) when \({x_i}\) is a positive sample, the corresponding label is \({y_i} = 2{m_i} - 1 = 1\); if we define \({m_i} = 0\) when \({x_i}\) is a negative sample, the corresponding label is \({y_i} = 2{m_i} - 1 = - 1\). Therefore, \({Y_1} = diag\{ 1,1, \ldots ,1\} ,\;{Y_2} = diag\{ - 1, - 1, \ldots , - 1\}\), and the primal problems of NFTSVM become

$$\begin{gathered} \mathop {\min }\limits_{{w_1},{b_1},{\xi _2}} \,\,\frac{1}{2}||A{w_1} + {e_1}{b_1}||_2^2 + \frac{1}{2}{c_1}(||{w_1}||_2^2 + b_1^2) + {c_2}e_2^T{\xi _2} \hfill \\ \;s.t.\quad - (B{w_1} + {e_2}{b_1}) \geqslant {e_2} - {\xi _2},{\xi _2} \geqslant 0, \hfill \\ \end{gathered}$$
(42)
$$\begin{gathered} \mathop {\min }\limits_{{w_2},{b_2},{\xi _1}} \,\,\frac{1}{2}||B{w_2} + {e_2}{b_2}||_2^2 + \frac{1}{2}{c_3}(||{w_2}||_2^2 + b_2^2) + {c_4}e_1^T{\xi _1} \hfill \\ \;s.t.\quad (A{w_2} + {e_1}{b_2}) \geqslant {e_1} - {\xi _1},{\xi _1} \geqslant 0, \hfill \\ \end{gathered}$$
(43)

It is obvious that (42) and (43) are the same as (12) and (13). Moreover, if \({c_1} = {c_3} = 0\), (42) and (43) degenerate to (8) and (9). Thus, the classical TWSVM [8] and TBSVM [10] are special cases of our NFTSVM when its parameters are appropriately selected. Second, for nonlinear NFTSVM, if the number of training points is large, the rectangular kernel technique [33] can be applied to reduce the dimensionality of (36) and (37). Lastly, in practical classification applications we often have only label information and must define the fuzzy memberships based on prior knowledge. Fortunately, we can employ fuzzy membership functions available from fuzzy neural networks. In this paper, we directly use the fuzzy membership function also used in [27, 28]. Keller and Hunt [27] suggested the following way to assign fuzzy membership values such that a fuzzy 2-partition is formed. Given a positive sample \({x_i}\) with positive label \(+ 1\), the fuzzy membership is defined as

$${m_1}({x_i}) = 0.5 + \frac{{\exp ({C_0}({d_{ - 1}}({x_i}) - {d_1}({x_i}))/d) - \exp ( - {C_0})}}{{2(\exp {C_0} - \exp ( - {C_0}))}},\;{m_{ - 1}}({x_i}) = 1 - {m_1}({x_i})$$

Given a negative sample \({x_i}\) with negative label \(- 1\), the fuzzy membership is defined as

$${m_{ - 1}}({x_i}) = 0.5 + \frac{{\exp ({C_0}({d_1}({x_i}) - {d_{ - 1}}({x_i}))/d) - \exp ( - {C_0})}}{{2(\exp {C_0} - \exp ( - {C_0}))}},\;{m_1}({x_i}) = 1 - {m_{ - 1}}({x_i})$$

where \({d_1}({x_i})\) is the distance between \({x_i}\) and the mean of the positive class, \({d_{ - 1}}({x_i})\) is the distance between \({x_i}\) and the mean of the negative class, \(d\) is the distance between the means of the positive and negative classes, and \({C_0}\) is a constant controlling the shape of the membership function.
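A sketch of this membership assignment (our naming), returning each sample's membership to its own class; the membership to the other class is \(1 - m\), so a fuzzy 2-partition is formed:

```python
import numpy as np

def keller_hunt_memberships(X, y, C0):
    """Own-class membership for each sample, following the formulas above."""
    mu_pos = X[y == 1].mean(axis=0)             # mean of the positive class
    mu_neg = X[y == -1].mean(axis=0)            # mean of the negative class
    d = np.linalg.norm(mu_pos - mu_neg)         # distance between the class means
    d1 = np.linalg.norm(X - mu_pos, axis=1)     # d_1(x_i)
    dm1 = np.linalg.norm(X - mu_neg, axis=1)    # d_{-1}(x_i)
    denom = 2.0 * (np.exp(C0) - np.exp(-C0))
    m = np.where(y == 1,
                 0.5 + (np.exp(C0 * (dm1 - d1) / d) - np.exp(-C0)) / denom,
                 0.5 + (np.exp(C0 * (d1 - dm1) / d) - np.exp(-C0)) / denom)
    return m
```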

4 Experimental results

In order to evaluate the performance of our proposed NFTSVM, we investigated its classification accuracy and computational efficiency on both artificial datasets and real-world benchmark datasets. In our experiments, we compared our proposed NFTSVM with several classical classifiers, namely NFSVM [28], TWSVM [8], TBSVM [10], FTWSVM [29] and FTSVM [30]. All methods were implemented in MATLAB R2013a on a PC with an Intel (R) Core (TM) processor (3.40 GHz) and 4 GB RAM. TWSVM, TBSVM and our proposed NFTSVM were solved using the SOR algorithm, while the QPPs in NFSVM, FTWSVM and FTSVM were solved by the QP optimization toolbox in MATLAB. The "Accuracy" used in our experiments is defined as Accuracy = (TP + TN)/(TP + FP + TN + FN), where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively. The parameters were selected by standard tenfold cross-validation [31]: the parameters \({c_i}\) and the Gaussian kernel width \(\sigma\) were selected from the set \(\{ {2^i}|i = - 8, \ldots ,8\}\), and \({C_0}\) from the set \(\{ 0.5,\;1,\;1.5,\;2,\;2.5\}\).
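For reference, a sketch of the accuracy measure and parameter grids described above (names are ours):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN) for labels in {-1, +1}."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    return (tp + tn) / len(y_true)

# Parameter grids used in the experiments.
c_grid = [2.0 ** i for i in range(-8, 9)]   # c_i and sigma from {2^i | i = -8, ..., 8}
C0_grid = [0.5, 1.0, 1.5, 2.0, 2.5]         # C0 candidates
```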

4.1 Artificial datasets

In this subsection, three artificial datasets, namely the crossplane (XOR), complex XOR and Ripley's synthetic datasets [34], are used to validate our proposed NFTSVM on linearly inseparable problems. Ripley's synthetic dataset contains 250 training samples and 1000 test samples. The average results of linear NFSVM, TWSVM, TBSVM and our proposed NFTSVM on the crossplane (XOR) and complex XOR datasets are reported in Table 1. The average results of nonlinear NFSVM, TWSVM, TBSVM and our NFTSVM with Gaussian kernels on Ripley's synthetic dataset are presented in Table 2. In all tables, "Acc" and "Std" denote accuracy and standard deviation, respectively.

Table 1 Classification accuracy (training and testing) on crossplane (XOR) datasets
Table 2 Classification accuracy (training and testing) on Ripley datasets

From Table 1, we can conclude that NFSVM cannot deal well with the XOR and complex XOR problems. Although TWSVM and TBSVM perform well on the XOR dataset, their performance decreases on the complex XOR dataset. In contrast, our proposed NFTSVM performs best on both the XOR and complex XOR datasets. According to Table 2, the accuracy of our proposed NFTSVM is 87.60%, which beats all the others and demonstrates the better generalization ability of our method. Figure 1 illustrates the crossplane (XOR) and complex XOR datasets. Ripley's dataset and the hyperplanes obtained by TWSVM, TBSVM and our proposed NFTSVM are shown in Fig. 2.

Fig. 1 Crossplane (a) and complex XOR (b) datasets

Fig. 2 Ripley's dataset: (a) the distribution of the data; (b) the results of TWSVM; (c) the results of TBSVM; and (d) the results of NFTSVM

4.2 UCI datasets

To further compare our proposed NFTSVM with NFSVM, TWSVM, TBSVM, FTWSVM and FTSVM, we selected 15 datasets from the UCI machine learning repository [35]. The numerical results of the linear versions are given in Table 3, with the best results highlighted in bold. The accuracy of linear NFTSVM is better than that of NFSVM, TWSVM, TBSVM, FTWSVM and FTSVM on most of the datasets. A win-tie-loss (W-T-L) summarization based on mean accuracy is also reported at the bottom of Table 3. For example, on the Heart-Statlog dataset, the accuracy of our NFTSVM is 86.30%, while that of NFSVM is 84.07%, TWSVM 84.81%, TBSVM 85.19%, FTWSVM 84.85% and FTSVM 84.81%.

Table 3 Tenfold testing percentage accuracy of linear classifiers

Table 4 shows the experimental results of nonlinear NFSVM, TWSVM, TBSVM, FTWSVM, FTSVM and our NFTSVM on the selected UCI datasets, where the Gaussian kernel \(K(x,y) = {e^{ - ||x - y|{|^2}/2{\sigma ^2}}}\) was used. The accuracy and training time of each method are reported. The results in Table 4 are similar to those in Table 3 and therefore confirm our conclusion again. In addition, from Tables 3 and 4 we can see that the training times of NFSVM, FTWSVM and FTSVM are longer than those of TWSVM, TBSVM and our NFTSVM, which indicates that SOR speeds up the training procedure.

Table 4 Tenfold testing percentage accuracy of nonlinear classifiers

4.3 Image recognition

In this subsection, we test our method on image recognition tasks. Three well-known, publicly available image-classification benchmarks, handwritten digits (USPS), objects (COIL-20) and faces (AR), are used to evaluate our proposed NFTSVM against TWSVM, TBSVM, FTWSVM and FTSVM. The USPS database [36] consists of gray-scale handwritten digit images from 0 to 9, as shown in Fig. 3. Each digit has 1100 images, and the resolution of each image is 16 × 16. We selected five pairwise digit combinations of varying difficulty for odd vs. even digit classification. COIL-20 [37] is a database of gray-scale images of 20 objects, illustrated in Fig. 4. The objects were placed on a motorized turntable against a black background, and images were taken at pose intervals of 5°, corresponding to 72 images per object. In our experiments, we resized each image to 32 × 32. The AR database [38] contains 100 subjects, each with 26 face images taken in two sessions of 13 images each. We selected the 14 unoccluded images from these two sessions, as shown in Fig. 5. The 1400 images are all cropped to the same size of 40 × 30. For each dataset, we randomly split the images of each class into two equal parts, one for training and the other for testing. This process was repeated ten times and the average result is reported. Moreover, we only consider the Gaussian kernel for these methods. Table 5 lists the experimental results of these methods on the USPS, COIL-20 and AR datasets. According to the W-T-L summarization, the proposed NFTSVM obtains the best results on USPS, COIL-20 and AR in most cases.

Fig. 3 Subjects in the USPS database

Fig. 4 The subjects in the COIL-20 database

Fig. 5 An illustration of the selected 14 images of one person from the AR database

Table 5 The classification performance comparison on the COIL-20, AR and USPS datasets

5 Conclusions

Inspired by NFSVM and TWSVM, a new fuzzy twin support vector machine (NFTSVM) was presented in this paper. NFTSVM employs fuzzy memberships to weigh the margin of each training sample, which improves the generalization ability. As discussed in the paper, TWSVM and TBSVM are special cases of the proposed NFTSVM when its parameters are appropriately selected. Experimental results on a number of benchmarks illustrate the superiority of the proposed NFTSVM. In future work, we will address three issues. First, the proposed NFTSVM has five parameters, so parameter selection is a practical problem that should be carefully investigated. Second, we will construct fast algorithms for solving the QPPs, for example based on genetic algorithms (GA) [39]. Third, extensions of NFTSVM to multi-class classification [40] and uncertainty mining [41] are also of interest.