Abstract
Maximum margin of twin spheres support vector machine (MMTSSVM) is effective for imbalanced data classification problems. However, it is sensitive to outliers because it uses the Hinge loss function. To enhance the stability of MMTSSVM, we propose a Ramp loss maximum margin of twin spheres support vector machine (Ramp-MMTSSVM) in this paper. Under the Ramp loss function, outliers are assigned fixed loss values, which reduces their negative effect on model construction. Since Ramp-MMTSSVM is a non-differentiable non-convex optimization problem, we adopt the Concave-Convex Procedure (CCCP) to solve it. We also analyze the properties of the parameters and verify them in an artificial experiment. Besides, we use the Rest-vs.-One (RVO) strategy to extend Ramp-MMTSSVM to multi-class classification problems. The experimental results on twenty benchmark datasets indicate that, in both the binary and multi-class classification cases, our approaches obtain better performance than the compared algorithms.
1 Introduction
An increasing number of machine learning approaches have become prevalent in recent years, such as artificial neural networks [1], multi-objective optimization [2] and support vector machines (SVMs) [3], each with its own pros and cons. The SVM, proposed by Vapnik, is based on statistical learning theory. Guided by the VC dimension and the structural risk minimization principle, it achieves good generalization ability through a compromise between empirical risk and model complexity. It is very effective for small-sample, nonlinear and high-dimensional problems, and it avoids the curse of dimensionality and over-fitting to some extent. Nowadays, SVMs have been applied in many fields, including text classification [4], disease detection [5, 6], driver fatigue detection [7] and object detection [8]. Nevertheless, SVMs still have some defects, and many improvements have been put forward in recent years.
One drawback of the classic SVM is its slow computational speed. To alleviate this problem, Jayadeva et al. [9] proposed the twin SVM (TSVM) for binary classification. TSVM generates two nonparallel hyperplanes by solving two smaller-sized quadratic programming problems (QPPs) rather than one large QPP, which in theory makes it approximately four times faster than the classic SVM. Meanwhile, TSVM requires the data points of each class to be as close as possible to one hyperplane and as far as possible from the other. Many algorithms [10,11,12,13,14,15,16,17] have been proposed based on TSVM.
Many TSVM-based algorithms require a matrix inverse operation. However, the matrix may not be invertible, and computing a matrix inverse is computationally expensive. To remedy this problem, the twin hypersphere support vector machine (THSVM) [18] was proposed. It finds two irrelevant hyperspheres rather than two nonparallel hyperplanes by solving a pair of smaller-sized QPPs, requiring each hypersphere to capture as many data points of one class as possible. Because it needs to find two centers and two radii, it is still computationally expensive. In addition, THSVM cannot handle imbalanced data classification problems well.
To decrease the computational cost and deal with imbalanced data classification problems more efficiently, Xu [19] proposed the maximum margin of twin spheres support vector machine (MMTSSVM). It finds two homocentric hyperspheres rather than two irrelevant hyperspheres by solving one QPP and one linear programming problem (LPP).
All the algorithms mentioned above use the Hinge loss function, which makes the models sensitive to outliers. Outliers normally receive the largest hinge loss values, so the decision hyperplane is incorrectly drawn toward them, degrading generalization performance. Hence, Huang et al. [20] proposed the ramp loss support vector machine (RSVM). Under the Ramp loss function, outliers are assigned fixed loss values, so RSVM is less sensitive to them. Because RSVM is a non-convex optimization problem, the authors adopted the effective Concave-Convex Procedure (CCCP) [21] to solve it. Several algorithms have since been proposed based on RSVM, such as the ramp loss least squares SVM (RLSSVM) [22].
In practice, multi-class classification problems are more common. Researchers have proposed a great number of multi-class classification strategies, such as One-vs.-One (OVO) [23, 24], One-vs.-Rest (OVR) [25, 26], Rest-vs.-One (RVO) [27, 28] and One-vs.-One-vs.-Rest (OVOVR) [29,30,31]. For a K-class classification problem, OVO combines the K classes in pairs: the i th class is chosen as the positive class and the j th class as the negative class to generate a classifier, so K(K-1)/2 classifiers are obtained in total. In the OVR method, the i th class is chosen as the positive class and the rest of the classes as the negative class, generating K classifiers in total. RVO is just the opposite of OVR: it picks the i th class as the negative class and the rest of the classes as the positive class, also generating K classifiers. OVOVR is somewhat similar to OVO and also combines the classes in pairs: the i th class is the positive class, the j th class the negative class, and the remaining classes the rest class, again yielding K(K-1)/2 classifiers in total. All the strategies above determine the class of a new data point by a ‘voting’ scheme.
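As a toy illustration of the classifier counts above (our own sketch; the task lists merely enumerate which classes play the positive and negative roles, and OVOVR is omitted):

```python
from itertools import combinations

def ovo_tasks(K):
    """One-vs.-One: one classifier per class pair, K(K-1)/2 in total."""
    return list(combinations(range(K), 2))

def ovr_tasks(K):
    """One-vs.-Rest: class i positive, all other classes negative."""
    return [(i, [j for j in range(K) if j != i]) for i in range(K)]

def rvo_tasks(K):
    """Rest-vs.-One: class i negative, all other classes positive."""
    return [([j for j in range(K) if j != i], i) for i in range(K)]
```

For a 4-class problem this yields 6 OVO classifiers and 4 classifiers each for OVR and RVO.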
To improve the generalization performance of MMTSSVM, we propose a Ramp loss maximum margin of twin spheres support vector machine (Ramp-MMTSSVM) in this article. We employ MMTSSVM as the core of our algorithm because it has fast computational speed and deals with imbalanced data classification problems effectively. Meanwhile, we substitute the Ramp loss function for the Hinge loss function to decrease the sensitivity of MMTSSVM to outliers. After introducing the Ramp loss function, the optimization problem becomes a non-differentiable non-convex problem, which cannot be solved by quadratic programming directly. Therefore, we use the CCCP approach to deal with it. We also discuss the properties of the parameters in Ramp-MMTSSVM, which helps us determine their ranges. In addition, we extend Ramp-MMTSSVM to multi-class classification problems by the RVO strategy, termed the multicategory Ramp loss maximum margin of twin spheres support vector machine (MRMMTSSVM). On twenty benchmark datasets, we compare Ramp-MMTSSVM with THSVM, SSLM and MMTSSVM, and compare MRMMTSSVM with OVO-THSVM, THKSVM and OVR-MMTSSVM. The experimental results imply that our algorithms outperform the compared algorithms.
The paper is organized as follows. Section 2 gives a brief introduction to MMTSSVM and RSVM. In Section 3, we introduce Ramp-MMTSSVM. Section 4 presents our multi-class algorithm, MRMMTSSVM. Artificial experiments and twenty benchmark experiments that verify the effectiveness of our algorithms are reported in Section 5. Section 6 concludes the paper.
2 Background
In this section, we review the basics of MMTSSVM and RSVM.
To ease understanding, we first introduce some notation. Consider a training set {(x1,y1),…,(xl,yl)}, where \(\mathbf {x}_{i} \in \mathcal {R}^{m}\) and yi ∈{+ 1,− 1}, i = 1,…,l, which contains l+ positive and l− negative data points. In addition, I+ and I− denote the positive and negative sets, respectively. ϕ is a nonlinear mapping that maps the input data points into a higher-dimensional feature space. The kernel function K(xi,xj) = (ϕ(xi) ⋅ ϕ(xj)) is selected in advance.
2.1 Maximum margin of twin spheres support vector machine
MMTSSVM is well suited to imbalanced data classification problems. It aims to find two homocentric spheres: the small sphere should cover as many positive data points as possible, while the large sphere should push out as many negative data points as possible. In addition, it follows the maximum margin principle, which requires that the margin between the small sphere and the large sphere be as large as possible. The optimization problems that MMTSSVM solves are as follows,
and
where ν,ν1 and ν2 are parameters chosen a priori. C and R are the center and the radius of the small sphere, respectively. \(\sqrt {R^{2}+\rho ^{2}}\) is the radius of the large sphere.
The decision function of MMTSSVM is shown as follows,
2.2 Ramp loss SVM
The classic SVM employs the Hinge loss function, denoted Hs(z) = max(0,s − z), where s indicates the position of the Hinge point, to penalize examples classified with an insufficient margin. With the Hinge loss function, the objective function can be written as follows,
where f(x) = wTx + b is the decision function.
From Fig. 1, we can see that under the Hinge loss function, outliers tend to receive the largest loss values. Therefore, outliers have a negative influence on the construction of the hyperplane, and the model is sensitive to them. To alleviate this problem, researchers proposed the Ramp loss function, i.e., the robust Hinge loss. The Ramp loss function is expressed as follows,
where s < 1 is given a priori. As shown in Fig. 1, the Ramp loss function can be decomposed as the difference of two Hinge losses, i.e., Rs(z) = H1(z) − Hs(z), which is the sum of a convex term H1(z) and a concave term −Hs(z). After introducing the Ramp loss function, RSVM is denoted as follows,
This is a non-differentiable non-convex problem, which is solved by the CCCP approach.
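The Ramp loss decomposition above is easy to check numerically; a minimal sketch (not the paper's code):

```python
def hinge(z, s):
    """Hinge loss H_s(z) = max(0, s - z)."""
    return max(0.0, s - z)

def ramp(z, s):
    """Ramp loss R_s(z) = H_1(z) - H_s(z), capped at 1 - s, so every
    outlier with a very negative score z receives the same fixed loss."""
    return hinge(z, 1.0) - hinge(z, s)
```

For s = −1, a well-classified point gets ramp(2.0, -1.0) = 0, a margin violator gets ramp(0.0, -1.0) = 1, and an extreme outlier gets the capped value ramp(-10.0, -1.0) = 2, whereas the plain Hinge loss would assign it hinge(-10.0, 1.0) = 11.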
3 Ramp loss maximum margin of twin spheres support vector machine
Because MMTSSVM is sensitive to outliers, we substitute the Ramp loss function for the Hinge loss function to alleviate this problem.
We set u1 = R2 −∥ϕ(xi) − C∥2 and u2 = ∥ϕ(xj) − C∥2 − (R2 + ρ2). Since the Hinge loss function is Hs(z) = max(0,s − z), the objective functions of problems (1) and (2) are equivalent to
and
3.1 Primal Formulations
After replacing the Hinge loss function with the Ramp loss function, we get the primal formulations as follows,
and
where Q(u1) and Q(u2) are the Ramp loss functions. Q(u1) and Q(u2) can be denoted as follows,
where k1 and k2 are chosen a priori.
Then, the problems (9) and (10) can be rewritten as
and
From (11) and (12), we can see that the Ramp loss function limits the maximum loss value, which reduces the influence of outliers on the optimal solution to the problem. Therefore, the model is less sensitive to outliers.
3.2 Dual problems
The Ramp loss can be decomposed into the sum of a convex Hinge loss and a concave function, i.e.,
where H1(ui) = max(−ui,0)(i = 1,2), H2(u1) = max(−u1 − k1R2,0), H2(u2) = max (−u2 − k2(R2 + ρ2),0).
The (13) and (14) can be rewritten as follows,
and
We can see that QPP (16) and LPP (17) are both composed of a convex function and a concave function, so they cannot be solved by quadratic programming directly. For this reason, we take advantage of the CCCP approach, owing to its simple tuning and iteration scheme.
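The CCCP idea can be sketched on a one-dimensional toy problem (the functions u and v below are hypothetical and chosen only so the convex surrogate has a closed-form minimizer):

```python
import math

def cccp(solve_surrogate, v_grad, x0, max_iter=100, tol=1e-10):
    """Generic CCCP loop for min u(x) + v(x), with u convex and v concave:
    linearize v at the current iterate x_t, minimize the convex surrogate
    u(x) + v'(x_t) * x, and repeat until the iterates settle."""
    x = x0
    for _ in range(max_iter):
        x_new = solve_surrogate(v_grad(x))  # argmin_x u(x) + g * x
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

# Toy instance: u(x) = (x - 3)^2 (convex), v(x) = -log(1 + e^x) (concave),
# so argmin_x u(x) + g * x = 3 - g / 2 in closed form.
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
x_star = cccp(solve_surrogate=lambda g: 3.0 - g / 2.0,
              v_grad=lambda x: -sigmoid(x),
              x0=0.0)
```

Each iteration solves one convex problem, which is exactly how the non-convex QPP and LPP of Ramp-MMTSSVM are handled.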
We first take QPP (16) into consideration. Its objective function is the sum of a convex function u(x) and a concave function v(x). CCCP solves such problems by iteratively linearizing the concave part and solving the resulting sequence of convex subproblems, i.e.,
where t means the number of iterations. The convex part of QPP (16) is shown as follows,
and the concave part is shown as follows,
The CCCP process for this issue is as follows:
Then we rewrite problem (21) as follows:
To simplify the process, we introduce the following symbols,
Therefore, problem (22) can be rewritten as follows,
Similarly, the CCCP structure of problem (17) can also be given. The convex part of this problem is shown as follows:
and the concave part is shown as follows:
The CCCP process for problem (17) is as follows:
Then we rewrite problem (27) as follows:
To simplify the process, we introduce the following symbols,
Therefore, problem (28) can be rewritten as follows,
To solve the problem (24), we introduce the Lagrange function which is shown as follows,
where αi ≥ 0 and βi ≥ 0 are Lagrange multipliers. Differentiating the Lagrangian L1 with respect to the variables R2, C and ξi yields the following Karush-Kuhn-Tucker (KKT) conditions:
From (32), we can get
From (33), we can obtain the center C as follows:
Then
Finally, we derive the dual formulation as follows:
In order to obtain the radius of the small sphere, we use the positive points xi whose Lagrange multipliers αi satisfy \(0<\alpha _{i}<\frac {1}{\nu _{1}l^{+}}\). Suppose the number of such points is np. According to the KKT condition αi (∥ϕ(xi) − C∥2 − R2 − ξi) = 0, we can acquire the squared radius of the small sphere as
To solve the problem (30), we introduce the Lagrange function which is shown as follows,
where γj ≥ 0 and λj ≥ 0 are Lagrange multipliers. Differentiating the Lagrange function L2 with respect to variables ρ2 and ηj yields the following KKT conditions:
From (41), we can obtain
Finally, we derive the dual formulation as follows:
In order to obtain ρ2, we use the negative points xj whose Lagrange multipliers γj satisfy \(0<\gamma _{j}<\frac {1}{\nu _{2}l^{-}}\). Suppose the number of such points is nn. According to the KKT condition γj(∥ϕ(xj) − C∥2 − R2 − ρ2 + ηj) = 0, we can acquire
Then, the CCCP procedure for Ramp-MMTSSVM is summarized as follows.
3.3 Properties of parameters ν1 and ν2
In the Ramp-MMTSSVM, parameters ν1 and ν2 have their theoretical significance. In the following part, we will analyze the properties of ν1 and ν2.
To make it easier to understand, we first give the following two definitions.
Definition 1
The small sphere can be divided into four sets S+ = {i|∥ϕ(xi) − C∥2 < R2}, B+ = {i|∥ϕ(xi) − C∥2 = R2}, R+ = {i|R2 < ∥ϕ(xi) − C∥2 < R2 + k1R2} and W+ = {i|∥ϕ(xi) − C∥2 ≥ R2 + k1R2}.
Definition 2
The large sphere can be divided into four sets S− = {j|∥ϕ(xj) − C∥2 ≤ R2 + ρ2 − k2(R2 + ρ2)}, B− = {j|R2 + ρ2 − k2(R2 + ρ2) < ∥ϕ(xj) − C∥2 < R2 + ρ2}, R− = {j|∥ϕ(xj) − C∥2 = R2 + ρ2} and W− = {j|∥ϕ(xj) − C∥2 > R2 + ρ2}.
According to the two definitions above, we can obtain the following propositions.
Proposition 1
Let n1 represent the number of positive samples in B+, R+ and W+, n2 the number of positive samples in R+ and W+, and n3 the number of positive samples in W+. Then we can obtain \(\frac {n_{2}-n_{3}}{l^{+}}\leq \nu _{1} \leq \frac {n_{1}-n_{3}}{l^{+}}\).
Proof
If xi(i = 1,2,…,l+) represents a positive data point in S+, we can get ξi = 0. Then, according to KKT condition αi(∥ϕ(xi) − C∥2 − R2 − ξi) = 0, we can get αi = 0. From (23), we can acquire 𝜃i = 0. If xi represents a positive data point in B+, we can get ξi = 0, then according to KKT condition βiξi = 0, we can get βi ≥ 0. Therefore, \(0\leq \alpha _{i}\leq \frac {1}{\nu _{1}l^{+}}\), and 𝜃i = 0. If xi represents a positive data point in R+, we can get ξi≠ 0, then βi = 0, then \(\alpha _{i}=\frac {1}{\nu _{1}l^{+}}\) and 𝜃i = 0. If xi represents a positive data point in W+, we can get \(\alpha _{i}=\frac {1}{\nu _{1}l^{+}}\) and \(\theta _{i}=\frac {1}{\nu _{1}l^{+}}\). According to the conclusions above, we have \(\frac {n_{2}}{\nu _{1}l^{+}}\leq {\sum }_{i\in I^{+}}\alpha _{i} \leq \frac {n_{1}}{\nu _{1}l^{+}}\) and \(-{\sum }_{i\in I^{+}}\theta _{i}=-\frac {n_{3}}{\nu _{1}l^{+}}\). Therefore, \(\frac {n_{2}-n_{3}}{\nu _{1}l^{+}}\leq {\sum }_{i\in I^{+}}(\alpha _{i}-\theta _{i}) \leq \frac {n_{1}-n_{3}}{\nu _{1}l^{+}}\). From (35), we can get \(\frac {n_{2}-n_{3}}{\nu _{1}l^{+}}\leq 1 \leq \frac {n_{1}-n_{3}}{\nu _{1}l^{+}}\). Then, \(\frac {n_{2}-n_{3}}{l^{+}}\leq \nu _{1} \leq \frac {n_{1}-n_{3}}{l^{+}}\). □
Proposition 2
Let m1 represent the number of negative samples in S−, B− and R−, m2 the number of negative samples in S− and B−, and m3 the number of negative samples in S−. Then we can obtain \(\frac {m_{2}-m_{3}}{l^{-}}\leq \nu _{2} \leq \frac {m_{1}-m_{3}}{l^{-}}\).
Proof
If xj(j = 1,2,…,l−) represents a negative data point in S−, we can get ηj≠ 0; then according to KKT condition λjηj = 0, we can get λj = 0. From (42), we can acquire \(\gamma _{j}=\frac {1}{\nu _{2}l^{-}}\), and from (29), we can get \(\delta _{j}=\frac {1}{\nu _{2}l^{-}}\). If xj represents a negative data point in B−, we can get \(\gamma _{j}=\frac {1}{\nu _{2}l^{-}}\) and δj = 0. If xj represents a negative data point in R−, we can get ηj = 0, then λj ≥ 0, so \(0\leq \gamma _{j}\leq \frac {1}{\nu _{2}l^{-}}\) and δj = 0. If xj represents a negative data point in W−, we can get ηj = 0, then γj = 0 and δj = 0. According to the conclusions above, we have \(\frac {m_{2}}{\nu _{2}l^{-}}\leq {\sum }_{j\in I^{-}}\gamma _{j} \leq \frac {m_{1}}{\nu _{2}l^{-}}\) and \(-{\sum }_{j\in I^{-}}\delta _{j}=-\frac {m_{3}}{\nu _{2}l^{-}}\). Therefore, \(\frac {m_{2}-m_{3}}{\nu _{2}l^{-}}\leq {\sum }_{j\in I^{-}}(\gamma _{j}-\delta _{j}) \leq \frac {m_{1}-m_{3}}{\nu _{2}l^{-}}\). From (43), we can get \(\frac {m_{2}-m_{3}}{\nu _{2}l^{-}}\leq 1 \leq \frac {m_{1}-m_{3}}{\nu _{2}l^{-}}\). Then, \(\frac {m_{2}-m_{3}}{l^{-}}\leq \nu _{2} \leq \frac {m_{1}-m_{3}}{l^{-}}\). □
From the propositions above, we can determine the range of ν1 and ν2: 0 < ν1,ν2 < 1.
4 Multicategory Ramp loss maximum margin of twin spheres support vector machine
Because multi-class classification problems are common in practice, we extend Ramp-MMTSSVM to them by the RVO strategy. For a K-class classification problem, we choose the i th class as the negative class and the rest of the classes as the positive class to generate one Ramp-MMTSSVM, so K classifiers are generated in total. The optimization problems of MRMMTSSVM are as follows:
and
where lk denotes the number of data points of the k th class, lK−1 denotes the number of data points of the remaining classes, \(A_{k} \in R^{l_{k}\times n}(k = 1, \ldots , K)\), and \(B_{k}=[{A_{1}^{T}}, {\ldots } A_{k-1}^{T}, A_{k + 1}^{T}, {\ldots } {A_{K}^{T}}]^{T}\).
Each classifier is also solved by the CCCP approach, used in the same way as in the binary classification case. Finally, we adopt the ‘voting’ scheme to determine the class of each new data point.
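The ‘voting’ aggregation can be sketched as follows (a minimal illustration; mapping each classifier's sphere decision to a suggested label is our own simplification):

```python
from collections import Counter

def majority_vote(candidate_labels):
    """Aggregate the labels suggested by the K RVO classifiers for one
    test point; the most frequent suggestion wins. In RVO, classifier k
    suggests class k when the point behaves like its negative class."""
    return Counter(candidate_labels).most_common(1)[0][0]

# Suggestions from K = 5 classifiers for one test point:
label = majority_vote([2, 2, 0, 2, 4])  # class 2 wins with 3 votes
```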
5 Numerical experiments
In this section, we conduct experiments on one artificial dataset and twenty benchmark datasets to demonstrate the validity of the algorithms we proposed.
5.1 Experiments on one artificial dataset
We generate a 2-D artificial dataset containing 80 positive data points and 10 negative data points. The positive data points are drawn from the distribution N(0,0,3,3) and the negative data points from N(3,3,2,2). Linear Ramp-MMTSSVM on this dataset is illustrated in Fig. 2.
In this part, we compare linear Ramp-MMTSSVM and linear MMTSSVM on the artificial dataset, as shown in Figs. 2 and 3. From the figures, it is easy to see that the small sphere captures more positive data points in Ramp-MMTSSVM than in MMTSSVM, which indicates that Ramp-MMTSSVM performs better than MMTSSVM on this artificial dataset.
To verify that the Ramp loss can reduce the effect of outliers, we add five noise points to the positive and negative classes respectively. After adding the noise, the performance of Ramp-MMTSSVM and MMTSSVM is shown in Figs. 4 and 5, respectively. From Fig. 5, it is obvious that the boundary of MMTSSVM expands outward, whereas the boundary of Ramp-MMTSSVM in Fig. 4 barely changes. That is to say, Ramp-MMTSSVM handles noise and outliers better than MMTSSVM.
In addition, to better illustrate the property of ν1 discussed in Section 3, we depict Fig. 6, where the x-axis represents the values of parameter ν1. The black dotted line shows how the fraction of n1 − n3 among the positive data points changes with ν1, the red solid line represents the values of ν1 itself, and the blue dotted line shows how the fraction of n2 − n3 among the positive data points changes with ν1.
From the figure, we can see that parameter ν1 is a lower bound on the fraction of n1 − n3 among the positive data points and an upper bound on the fraction of n2 − n3 among the positive data points, which is helpful for selecting the range of parameters.
5.2 Experiments on benchmark datasets
In this part, we conduct experiments on twenty benchmark datasets collected from the UCI machine learning repository. The detailed information of the twenty datasets is shown in Table 1.
In the binary classification case, Ramp-MMTSSVM is compared with THSVM, SSLM and MMTSSVM. In the multi-class classification case, MRMMTSSVM is compared with OVO-THSVM, THKSVM and OVR-MMTSSVM.
We adopt 5-fold cross-validation in each experiment to make the results more convincing. All experiments are carried out in Matlab R2014a on a Windows 7 PC with an Intel Core i3-4160 CPU (3.60 GHz) and 4.00 GB of RAM.
5.2.1 Parameters selection
The approaches mentioned above, i.e., THSVM, SSLM, MMTSSVM, THKSVM, Ramp-MMTSSVM and MRMMTSSVM, all depend heavily on parameter selection. In these experiments, we choose the Gaussian kernel function k(xi,xj) = exp(−∥xi −xj∥2/σ2) and obtain the optimal parameter values by the grid search method. The Gaussian kernel parameter σ is selected from the set {2i|i = − 3,− 1,…,7}.
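For concreteness, the kernel and the search grid for our approaches can be written out as follows (a sketch; the variable names are our own):

```python
import math
from itertools import product

def gaussian_kernel(xi, xj, sigma):
    """Gaussian kernel k(xi, xj) = exp(-||xi - xj||^2 / sigma^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / sigma ** 2)

# Grids used in the experiments: sigma in {2^i | i = -3, -1, ..., 7},
# nu in {0.1, ..., 0.9} and k1 = k2 in {0.5, 1, 1.5, 2}.
sigmas = [2.0 ** i for i in range(-3, 8, 2)]
nus = [0.1 * i for i in range(1, 10)]
ks = [0.5, 1.0, 1.5, 2.0]
grid = list(product(sigmas, nus, ks))  # every (sigma, nu, k) combination
```

Grid search then trains one model per combination and keeps the one with the best cross-validation accuracy.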
In THSVM, for reducing computational complexity, we set c1 = c2 and ν1 = ν2, which are chosen from the set {2i|i = 0,2,…,8} and {0.1,0.2,…,0.9}, respectively.
In SSLM, we set ν1 = ν2, selected from the set {0.001,0.01}, and the range of ν is {0.1,0.2,…,0.9}.
In MMTSSVM, we set ν1 = ν2, and ν is selected from the set {0.1,0.2,…,0.9}.
In THKSVM, the range of νk is {2i|i = − 4,− 2,…,4}. And dk is selected from the set {2i|i = 1,3,…,9}.
In our approaches, we set k1 = k2, whose range is {0.5,1,1.5,2}. The ranges of the other parameters are the same as in MMTSSVM.
5.2.2 Result analysis
In Tables 2 and 3, ‘Accuracy’ denotes the average testing accuracy plus or minus the corresponding standard deviation, and ‘Time’ denotes the mean running time, including training time and testing time.
From Table 2, we can see that in the binary classification case, the experimental results of our approach are better than those of the other algorithms on most datasets, i.e., Sonar, Breast cancer, Heart, Pima, Spectf heart and Abalone. MMTSSVM obtains the best performance on the Ionosphere, Banknote and Monks datasets; on these, the accuracy of our algorithm is only slightly lower than that of MMTSSVM and higher than those of THSVM and SSLM. THSVM performs best on the Liver disorder dataset, where our approach ranks second, performing only slightly worse. In the binary classification case, SSLM never achieves the best performance on the ten datasets.
From Table 3, we can see that in the multi-class classification case, MRMMTSSVM has the best experimental results on most datasets, i.e., Soybean, Ecoli, Optidigits, Hayes-roth, Teaching, Balance and Image. On the Seeds dataset, THKSVM and MRMMTSSVM achieve the same accuracy, which is the highest. THKSVM obtains the best performance on the Iris dataset, with MRMMTSSVM ranking second, only slightly worse. On the Cmc dataset, OVO-THSVM gets the best result; although the accuracy of MRMMTSSVM is not the highest there, it is better than those of THKSVM and OVR-MMTSSVM. In the multi-class classification case, OVR-MMTSSVM never obtains the highest accuracy on the ten datasets.
In conclusion, in both the binary and multi-class classification cases, our algorithms perform better than the compared algorithms.
5.2.3 Friedman test
To analyse the experimental performance of the eight algorithms statistically, we introduce the Friedman test. Tables 4 and 5 show the average rankings of the four algorithms in the binary and multi-class classification cases, respectively.
Under the null-hypothesis that all the algorithms are equivalent, one can compute the Friedman statistic according to:
where \(R_{j}=\frac {1}{N}{\sum }_{i}{r_{i}^{j}}\) and \({r_{i}^{j}}\) denotes the rank of the j th of k algorithms on the i th of N datasets. Friedman’s \({\chi _{F}^{2}}\) is undesirably conservative, so a better statistic was derived,
which is distributed according to the F-distribution with k − 1 and (k − 1)(N − 1) degrees of freedom.
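These two statistics are straightforward to compute; the helpers below are our own sketch:

```python
def friedman_chi2(avg_ranks, N):
    """Friedman statistic chi_F^2 from the average ranks R_j of the
    k algorithms over N datasets."""
    k = len(avg_ranks)
    return (12.0 * N / (k * (k + 1))
            * (sum(R ** 2 for R in avg_ranks) - k * (k + 1) ** 2 / 4.0))

def iman_davenport(chi2, N, k):
    """Improved F_F statistic, F-distributed with (k - 1, (k - 1)(N - 1))
    degrees of freedom."""
    return (N - 1) * chi2 / (N * (k - 1) - chi2)

# With k = 4 algorithms and N = 10 datasets, the binary-case value
# chi_F^2 = 18.93 gives F_F = 9 * 18.93 / (30 - 18.93), about 15.39.
```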
According to (48) and (49), we obtain \({\chi _{F}^{2}}= 18.9300\) and FF = 15.3902 in the binary classification case, where FF is distributed according to the F-distribution with (3,27) degrees of freedom. The critical value of F(3,27) is 2.96 for the significance level α = 0.05, 3.65 for α = 0.025 and 4.60 for α = 0.01. The value of FF is much larger than the critical value, which means our approach differs significantly from the other algorithms in the binary classification case. In addition, Table 4 shows that the average ranking of our approach is far lower than those of the remaining algorithms, which means our approach is more valid than the other three algorithms in the binary classification case.
Similarly, in the multi-class classification case we obtain \({\chi _{F}^{2}}= 16.1700\) and FF = 10.5228. The value of FF is again much larger than the critical value, and Table 5 shows that the average ranking of MRMMTSSVM is the lowest. Both results indicate that MRMMTSSVM has better experimental performance than the other three algorithms.
In conclusion, our approaches are more valid than the compared algorithms in both the binary and multi-class classification cases.
5.3 The effect of parameters on the performance
In Ramp-MMTSSVM and MRMMTSSVM, there are three parameters, i.e., k, ν and σ, whose values have a great influence on the experimental accuracy. Therefore, in this part we conduct experiments on the Sonar dataset to investigate their influence on experimental performance. The results are shown in Figs. 7, 8 and 9.
The x-axis of the three figures denotes the changing values of the three parameters, and the y-axis denotes the corresponding accuracies. From Fig. 7, we can see that on the Sonar dataset the accuracy increases as k goes from 0.5 to 1.5 and decreases at k = 2.0. Figure 8 shows that the accuracy is stable when ν is small and changes greatly as ν increases. Figure 9 shows that the accuracy changes greatly when σ is small and tends to be stable as σ increases. In conclusion, the three parameters k, ν and σ all influence the experimental results of Ramp-MMTSSVM and MRMMTSSVM. In addition, the proper ranges of these parameters differ between datasets.
6 Conclusions
In this paper, we propose a Ramp loss maximum margin of twin spheres support vector machine. It reduces the sensitivity to outliers and improves generalization performance. Because it is a non-differentiable non-convex optimization problem, we adopt the CCCP approach to solve it. Furthermore, we prove the properties of the parameters ν1 and ν2 and verify them in an artificial experiment. In addition, we extend Ramp-MMTSSVM to multi-class classification problems by the RVO strategy. The experimental results on twenty benchmark datasets show that our approaches perform better than the compared algorithms in both the binary and multi-class classification cases. During the experiments, we found that the computational speed of our algorithms is somewhat slower than that of the other algorithms; accelerating them is our future work.
References
Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Meza J, Espitia H, Montenegro C, Crespo RG (2016) Statistical analysis of a multi-objective optimization algorithm based on a model of particles with vorticity behavior. Soft Comput 20(9):3521–3536
Vapnik V (1995) The nature of statistical learning theory. Springer, New York
Zhang W, Yoshida T, Tang X (2008) Text classification based on multi-word with support vector machine. Knowl-Based Syst 21(8):879–886
Ghosh S, Mondal S, Ghosh B (2014) A comparative study of breast cancer detection based on SVM and MLP BPN classifier. In: First international conference on automation, control, energy & systems (ACES-14), pp 87–90
Gohariyan E, Esmaeilpour M, Shirmohammadi MM (2017) The combination of mammography and MRI for diagnosing breast cancer using fuzzy NN and SVM. Int J Interact Multimed Artif Intell 4(5):20–24
Naz S, Ziauddin S, Shahid AR (2018) Driver fatigue detection using mean intensity, SVM, and SIFT. Int J Interact Multimed Artif Intell 5(IP):1
Pang Y, Zhang K, Yuan Y, Wang K (2014) Distributed object detection with linear SVMs. IEEE T Cybern 44(11):2122–2133
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE T Pattern Anal 29(5):905–910
Peng X (2010) A ν-twin support vector machine (ν-TSVM) classifier and its geometric algorithms. Inform Sci 180(20):3863–3875
Xie X, Sun S (2014) Multi-view Laplacian twin support vector machines. Appl Intell 41(4):1059–1068
Xu Y, Guo R (2014) An improved ν-twin support vector machine. Appl Intell 41(1):42–54
Wang H, Zhou Z (2017) An improved rough margin-based ν-twin bounded support vector machine. Knowl-Based Syst 128:125–138
Wang H, Zhou Z, Xu Y (2018) An improved ν-twin bounded support vector machine. Appl Intell 48(4):1041–1053
Xu Y, Wang L, Zhong P (2012) A rough margin-based ν-twin support vector machine. Neural Comput Appl 21:1307–1317
Xu Y, Yu J, Zhang Y (2014) KNN-based weighted rough ν-twin support vector machine. Knowl-Based Syst 71:303–313
Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
Peng X, Xu D (2013) A twin-hypersphere support vector machine classifier and the fast learning algorithm. Inform Sci 221:12–27
Xu Y (2016) Maximum margin of twin spheres support vector machine for imbalanced data classification. IEEE T Cybern 47(6):1540–1550
Huang X, Shi L, Suykens JAK (2014) Ramp loss linear programming support vector machine. J Mach Learn Res 15:2185–2211
Yuille A, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15:915–936
Liu D, Shi Y, Huang X (2016) Ramp loss least squares support vector machine. J Comput Sci-Neth 14:61–68
Zhang Z, Krawczyk B, García S, Rosales-Pérez A, Herrera F (2016) Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data. Knowl-Based Syst 106:251–263
Zhou L, Wang Q, Fujita H (2017) One versus one multi-class classification fusion using optimizing decision directed acyclic graph for predicting listing status of companies. Inform Fusion 36:80–89
Yang X, Yu Q, Guo T (2013) The one-against-all partition based binary tree support vector machine algorithms for multi-class classification. Neurocomputing 113:1–7
Tomar D, Agarwal S (2015) A comparison on multi-class classification methods based on least squares twin support vector machine. Knowl-Based Syst 81:131–147
Xu Y, Guo R (2014) A twin hyper-sphere multi-class classification support vector machine. J Intell Fuzzy Syst 27(4):1783–1790
Kumar D, Thakur M (2018) All-in-one multicategory least squares nonparallel hyperplanes support vector machine. Pattern Recogn Lett 105:165–174
Angulo C, Parra X, Català A (2003) K-SVCR. A support vector machine for multi-class classification. Neurocomputing 55(1-2):57–77
Xu Y (2016) K-nearest neighbor-based weighted multi-class twin support vector machine. Neurocomputing 205:430–438
Xu Y, Guo R, Wang L (2013) A twin multi-class classification support vector machine. Cogn Comput 5(4):580–588
Acknowledgments
This work was supported in part by National Natural Science Foundation of China (No.11671010).
Cite this article
Lu, S., Wang, H. & Zhou, Z. All-in-one multicategory Ramp loss maximum margin of twin spheres support vector machine. Appl Intell 49, 2301–2314 (2019). https://doi.org/10.1007/s10489-018-1377-x