Abstract
In real-world binary classification datasets, the numbers of samples in the two classes are often unequal, i.e. the majority class is much larger than the minority class; this is known as the class imbalance problem. In many classification problems, the main interest is to correctly classify the samples belonging to the minority class. Since the support vector machine (SVM) and the twin support vector machine (TWSVM) obtain the resultant classifier by giving the same importance to all training samples, the classifier is biased towards the majority class on imbalanced datasets. In this paper, by considering a fuzzy membership value for each sample, we propose an efficient approach, the entropy-based fuzzy twin support vector machine for class imbalance learning (EFTWSVM-CIL), where fuzzy membership values are assigned based on the entropy of the samples. Here, we give more importance to the minority class by assigning relatively larger fuzzy memberships to the minority class samples. Further, the method solves a pair of smaller-size quadratic programming problems (QPPs) rather than a single large one as in SVM. Experiments are performed on various real-world imbalanced datasets, and the results of the proposed EFTWSVM-CIL are compared with the twin support vector machine (TWSVM), fuzzy twin support vector machine (FTWSVM) and entropy-based fuzzy SVM (EFSVM) for imbalanced datasets.
1 Introduction
In recent years, many machine learning and data mining techniques have been introduced to solve classification and regression problems. If a dataset has an equal number of samples in each class, then it is called a balanced dataset; otherwise, it is an imbalanced dataset. It is not easy to solve the imbalance problem in classification. The support vector machine (SVM) is one of the most popular machine learning approaches and is based on the structural risk minimization (SRM) principle [1,2,3]. It solves a quadratic programming problem and always provides a globally optimal, relatively robust and sparse solution, whereas techniques like artificial neural networks (ANNs) are based on the empirical risk minimization (ERM) principle and suffer from the local minima problem. SVM has been used in applications such as face recognition [4,5,6], pattern recognition [7, 8], speaker verification [9], intrusion detection [10] and various other classification problems [11,12,13,14].
SVM finds the resultant classifier by maximizing the margin between the support vectors and the decision boundary, thereby improving the generalization ability. One can notice that SVM provides good generalization performance, but its training cost is very high, i.e. \(O(m^{3} )\), where \(m\) is the total number of training samples [15]. To decrease the training cost of SVM, an efficient approach, the twin support vector machine (TWSVM), was proposed by Jayadeva et al. [15]. In TWSVM, two quadratic programming problems of smaller size are solved to find the solution rather than a single large problem as in SVM.
SVM is a supervised machine learning algorithm that constructs a model depending on the available number of samples of each class. When the dataset is imbalanced, samples belonging to the minority class tend to be misclassified since they cannot contribute much in the training phase of the method; thus, the classifier becomes biased towards the majority class. Here, the class of interest is the minority class; therefore, giving more weight to the data points of the minority class resolves this problem to some extent. In applications such as fault detection and disease diagnosis, the emphasis is on correctly identifying the faults in machinery or the abnormalities in patients' data, which are present in very few samples.
To address this problem, Lin et al. [16] proposed a support vector machine based on fuzzy membership values (FSVM). Similar to SVM, FSVM also suffers from the class imbalance problem. Batuwita and Palade [17] presented FSVMs for class imbalance learning (FSVM-CIL) to handle the class imbalance problem while being less sensitive to outliers and noise. Here, based on class centres, smaller fuzzy membership values are assigned to support vectors to reduce their effect on the resultant decision surface. In a similar manner, an efficient fuzzy support vector machine for non-equilibrium data was proposed [18] to reduce the misclassification of the minority class in FSVM. The bilateral-weighted FSVM (B-FSVM) [19] computes the membership of each sample by treating it as belonging to both the minority and majority classes, with different membership values. To solve the bankruptcy prediction problem, a new fuzzy SVM was proposed by Chaudhuri and De [20]. To reduce the complexity of TWSVM for large-scale data, Shao et al. [21] proposed an efficient weighted Lagrangian twin support vector machine (WLTSVM) for imbalanced data classification, in which different training points receive different weights. A fuzzy-based Lagrangian twin parametric-margin support vector machine (FLTPMSVM) was proposed by Gupta et al. [22] to deal with noisy data. Tomar et al. [12] assigned weights to the data points on the basis of the number of samples in each class and proposed a weighted least squares twin support vector machine (WLSTSVM); in this approach, all the samples of a class are assigned the same weight. For more efficient classification methods, the reader may refer to [23, 24].
Recently, Fan et al. [25] proposed an entropy-based fuzzy support vector machine (EFSVM) for the class imbalance problem, in which the fuzzy membership is computed based on the class certainty of the samples. Motivated by the work of Fan et al. [25] and Jayadeva et al. [15], we propose a new approach termed the entropy-based fuzzy twin support vector machine for class imbalance learning (EFTWSVM-CIL). One can notice that EFTWSVM-CIL solves a pair of smaller-size QPPs to find the resultant decision surface rather than a single large one as in SVM. Hence, EFTWSVM-CIL improves the generalization of the decision surface for minority class samples based on class certainty and also takes less training time.
In this paper, all vectors are considered as column vectors. Suppose \(x\) and \(z\) are vectors in the \(n\)-dimensional real space \(R^{n}\); then their inner product is denoted \(x^{t} z\), where \(x^{t}\) is the transpose of \(x\). \(||x||\) and \(||Q||\) denote the 2-norm of a vector \(x\) and a matrix \(Q\), respectively. The identity matrix of appropriate size and the vector of ones of dimension \(m\) are denoted by \(I\) and \(e\), respectively.
The paper is organized as follows: Sect. 2 reviews work related to the support vector machine, discussing the twin support vector machine (TWSVM), fuzzy twin support vector machine (FTWSVM) and entropy-based fuzzy support vector machine (EFSVM). The proposed method is discussed in Sect. 3. Numerical experiments on well-known real-world datasets for the discussed methods and the proposed variant of SVM are reported in Sect. 4. In Sect. 5, we conclude the paper and outline future work.
2 Related Work
In this section, we briefly describe the formulations of the twin support vector machine (TWSVM), fuzzy twin support vector machine (FTWSVM) and entropy-based fuzzy support vector machine (EFSVM).
2.1 Twin Support Vector Machine (TWSVM)
Mangasarian and Wild [26] extended the idea of the proximal SVM (PSVM) [27] to a new approach termed multisurface proximal SVM via generalized eigenvalues (GEPSVM) for binary classification. In order to improve the learning efficiency, Jayadeva et al. [15] suggested the twin support vector machine (TWSVM) in the light of GEPSVM. In TWSVM, two non-parallel hyperplanes are obtained instead of one, such that each of them is nearer to one of the classes and as far as possible from the other class. Here, two optimization problems of smaller size are solved in the form of QPPs instead of one large QPP as in standard SVM. The running time of TWSVM is roughly \(2 \times \left( {\frac{m}{2}} \right)^{3} = \frac{{m^{3} }}{4}\), i.e. about four times faster than that of standard SVM.
Let us consider the input matrices \(X_{1}\) and \(X_{2}\) of size \(p \times n\) and \(q \times n\), where \(p\) is the number of data points belonging to 'Class 1', \(q\) is the number of data points belonging to 'Class 2', the total number of data samples is \(m = p + q\) and \(n\) is the dimension of each data point. In the nonlinear case, the twin support vector machine finds a pair of non-parallel hyperplanes \(f_{1} (x) = K(x^{t} ,D^{t} )w_{1} + b_{1} = 0\) and \(f_{2} (x) = K(x^{t} ,D^{t} )w_{2} + b_{2} = 0\) from the solution of the following QPPs:
\[
\min_{w_{1} ,b_{1} ,\xi } \;\;\frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} e_{2}^{t} \xi \quad (1)
\]
subject to
\[
 - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi \ge e_{2} ,\quad \xi \ge 0,
\]
and
\[
\min_{w_{2} ,b_{2} ,\eta } \;\;\frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} e_{1}^{t} \eta \quad (2)
\]
subject to
\[
(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta \ge e_{1} ,\quad \eta \ge 0,
\]
where \(\xi\), \(\eta\) represent slack variables; \(C_{1}\), \(C_{2}\) are penalty parameters; \(D = [X_{1} ;X_{2} ]\); \(e_{1}\), \(e_{2}\) are vectors of ones of suitable dimension; and \(K(x^{t} ,D^{t} ) = (k(x,x_{1} ), \ldots ,k(x,x_{m} ))\) is a row vector in \(R^{m}\).
The Lagrangian of problems (1) and (2) is written as
\[
L_{1} = \frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} e_{2}^{t} \xi - \alpha_{1}^{t} \left( { - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi - e_{2} } \right) - \beta_{1}^{t} \xi \quad (3)
\]
\[
L_{2} = \frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} e_{1}^{t} \eta - \alpha_{2}^{t} \left( {(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta - e_{1} } \right) - \beta_{2}^{t} \eta \quad (4)
\]
where \(\alpha_{1} = (\alpha_{11} , \ldots ,\alpha_{1q} )^{t} ,\quad \beta_{1} = (\beta_{11} , \ldots ,\beta_{1q} )^{t} ,\quad \alpha_{2} = (\alpha_{21} , \ldots ,\alpha_{2p} )^{t} ,\) and \(\beta_{2} = (\beta_{21} , \ldots ,\beta_{2p} )^{t}\) are the vectors of Lagrange multipliers. The Wolfe dual of Eqs. (3) and (4) is written by applying the Karush–Kuhn–Tucker (K.K.T) necessary and sufficient conditions [28] as
\[
\max_{\alpha_{1} } \;\;e_{2}^{t} \alpha_{1} - \frac{1}{2}\alpha_{1}^{t} T(S^{t} S)^{ - 1} T^{t} \alpha_{1} \quad (5)
\]
subject to
\[
0 \le \alpha_{1} \le C_{1} e_{2} ,
\]
and
\[
\max_{\alpha_{2} } \;\;e_{1}^{t} \alpha_{2} - \frac{1}{2}\alpha_{2}^{t} S(T^{t} T)^{ - 1} S^{t} \alpha_{2} \quad (6)
\]
subject to
\[
0 \le \alpha_{2} \le C_{2} e_{1} ,
\]
where \(S = [K(X_{1} ,D^{t} )\,\,\,e_{1} ]\) and \(T = [K(X_{2} ,D^{t} )\,\,\,e_{2} ]\).
We compute the nonlinear hyperplanes \(K(x^{t} ,D^{t} )w_{1} + b_{1} = 0\) and \(K(x^{t} ,D^{t} )w_{2} + b_{2} = 0\) by computing the values of \(w_{1}\), \(w_{2}\), \(b_{1}\) and \(b_{2}\) using Eqs. (7) and (8):
\[
\left[ \begin{aligned} w_{1} \\ b_{1} \end{aligned} \right] = - (S^{t} S)^{ - 1} T^{t} \alpha_{1} , \quad (7)
\]
\[
\left[ \begin{aligned} w_{2} \\ b_{2} \end{aligned} \right] = (T^{t} T)^{ - 1} S^{t} \alpha_{2} . \quad (8)
\]
Each new data point \(x \in R^{n}\) is assigned to class \(i\) \((i = 1,2)\) depending on which of the two surfaces it lies closer to:
\[
i = \arg \min_{j = 1,2} \left| {K(x^{t} ,D^{t} )w_{j} + b_{j} } \right|. \quad (9)
\]
2.2 Fuzzy twin support vector machine (FTWSVM)
In FTWSVM, a weighting parameter based on fuzzy membership values is used. For comparison, we choose the fuzzy membership of each data point based on its distance from the class centroid [17]. The membership values weight the error tolerance, i.e. \(C\), of every data point in FTWSVM.
The fuzzy membership function is given as
\[
s_{i} = 1 - \frac{{d_{\text{cen}} (x_{i} )}}{{\max_{j} \;d_{\text{cen}} (x_{j} ) + \delta }},
\]
where \(d_{\text{cen}}\) is the Euclidean distance of a data point from the centroid of its class and \(\delta\) is a small positive value that keeps the denominator non-zero. The formulation of FTWSVM in primal is written as
\[
\min_{w_{1} ,b_{1} ,\xi } \;\;\frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi \quad (10)
\]
subject to
\[
 - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi \ge e_{2} ,\quad \xi \ge 0,
\]
and
\[
\min_{w_{2} ,b_{2} ,\eta } \;\;\frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta \quad (11)
\]
subject to
\[
(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta \ge e_{1} ,\quad \eta \ge 0,
\]
where \(\xi\), \(\eta\) represent slack variables; \(C_{1}\), \(C_{2}\) are penalty parameters; \(K( \cdot , \cdot )\) is the kernel function; and \(s_{1}\), \(s_{2}\) are vectors containing the membership values of the data samples appearing in the constraints.
The Lagrangian of the problems (10) and (11) is written as
\[
L_{1} = \frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi - \alpha_{1}^{t} \left( { - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi - e_{2} } \right) - \beta_{1}^{t} \xi \quad (12)
\]
\[
L_{2} = \frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta - \alpha_{2}^{t} \left( {(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta - e_{1} } \right) - \beta_{2}^{t} \eta \quad (13)
\]
where \(\alpha_{1} = (\alpha_{11} , \ldots ,\alpha_{1q} )^{t} ,\quad \beta_{1} = (\beta_{11} , \ldots ,\beta_{1q} )^{t} ,\quad \alpha_{2} = (\alpha_{21} , \ldots ,\alpha_{2p} )^{t}\) and \(\beta_{2} = (\beta_{21} , \ldots ,\beta_{2p} )^{t}\) are the vectors of Lagrange multipliers. Now, we apply the Karush–Kuhn–Tucker (K.K.T) necessary and sufficient conditions [28] to find the Wolfe dual of Eqs. (12) and (13) as
\[
\max_{\alpha_{1} } \;\;e_{2}^{t} \alpha_{1} - \frac{1}{2}\alpha_{1}^{t} T(S^{t} S)^{ - 1} T^{t} \alpha_{1} \quad (14)
\]
subject to
\[
0 \le \alpha_{1} \le C_{1} s_{2} ,
\]
and
\[
\max_{\alpha_{2} } \;\;e_{1}^{t} \alpha_{2} - \frac{1}{2}\alpha_{2}^{t} S(T^{t} T)^{ - 1} S^{t} \alpha_{2} \quad (15)
\]
subject to
\[
0 \le \alpha_{2} \le C_{2} s_{1} ,
\]
where \(S = [K(X_{1} ,D^{t} )\,\,e_{1} ]\) and \(T = [K(X_{2} ,D^{t} )\,\,e_{2} ]\).
We compute the nonlinear hyperplanes \(K(x^{t} ,D^{t} )w_{1} + b_{1} = 0\) and \(K(x^{t} ,D^{t} )w_{2} + b_{2} = 0\) by computing the values of \(w_{1}\), \(w_{2}\), \(b_{1}\) and \(b_{2}\) using Eq. (16):
\[
\left[ \begin{aligned} w_{1} \\ b_{1} \end{aligned} \right] = - (S^{t} S)^{ - 1} T^{t} \alpha_{1} ,\quad \left[ \begin{aligned} w_{2} \\ b_{2} \end{aligned} \right] = (T^{t} T)^{ - 1} S^{t} \alpha_{2} . \quad (16)
\]
Similarly, the resultant classifier is obtained by using Eq. (9).
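For illustration, the centroid-based membership used in FTWSVM can be computed in a few lines of Python (a sketch with our own helper name; `delta` plays the role of \(\delta\)):

```python
import numpy as np

def centroid_fuzzy_membership(X, delta=0.5):
    """Fuzzy membership s_i = 1 - d_cen(x_i) / (max_j d_cen(x_j) + delta),
    where d_cen is the Euclidean distance to the class centroid and
    delta > 0 keeps every membership strictly positive."""
    centroid = X.mean(axis=0)
    d_cen = np.linalg.norm(X - centroid, axis=1)   # distance of each point to centroid
    return 1.0 - d_cen / (d_cen.max() + delta)     # memberships in (0, 1]
```

The function is applied to each class separately, so the point farthest from its class centroid (e.g. an outlier) receives the smallest membership.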
3 Proposed Entropy-based Fuzzy Twin Support Vector Machine for class imbalance learning (EFTWSVM-CIL)
Recently, Fan et al. [25] proposed a novel fuzzy membership evaluation to improve the effectiveness and generalization ability of the fuzzy support vector machine, where the memberships of samples are computed based on class certainty. In information theory, entropy is a measure of the information carried by a sample. Chen et al. [29] used information entropy to find the uncertainty measure of a neighbourhood system. In the class imbalance problem, most of the noisy data points of the majority class lie at the boundary between the two classes. So, for the majority class, the information of every data point is calculated from its probability of belonging to either class; this information is higher for the noisy samples than for the rest of the samples in that class. The probability of a sample belonging to a particular class reflects its class certainty, and entropy is an effective measure of this certainty. Hence, one can assign the fuzzy membership to the data points by using the information entropy as the weighting parameter; the noisy samples of the majority class then get smaller weights than the other samples of the class. The traditional approach of assigning weights [16] neither accounts for the noise at the boundary of the two classes nor incorporates information about the probability distribution. Moreover, most weighting strategies for class imbalance problems use measures such as the distance from the centroid, which convey no information about the data points at the boundary of the two classes. In the proposed approach, to enhance the participation of the minority class in the decision classifier, the samples of the majority class with lower entropy get larger fuzzy membership values. The entropy of any sample \(x_{i}\) is calculated as
\[
H(x_{i} ) = - P_{{{\text{pos}}\_x_{i} }} \ln P_{{{\text{pos}}\_x_{i} }} - P_{{{\text{neg}}\_x_{i} }} \ln P_{{{\text{neg}}\_x_{i} }} ,
\]
where \(P_{{{\text{pos}}\_x_{i} }}\) and \(P_{{{\text{neg}}\_x_{i} }}\) are the probabilities of sample \(x_{i}\) belonging to the minority and majority class, respectively. These probabilities are estimated from the \(K\) nearest neighbours of \(x_{i}\) as the fractions of minority- and majority-class neighbours.
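This entropy estimation step can be sketched in Python (our own helper; `k` plays the role of the neighbourhood size \(K\), and \(0\ln 0\) is taken as 0):

```python
import numpy as np

def knn_entropy(X, y, k=5):
    """Entropy H_i = -P_pos*ln(P_pos) - P_neg*ln(P_neg) of each sample,
    with P_pos/P_neg estimated from the class counts among its k nearest
    neighbours (y = +1 minority, y = -1 majority)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a sample is not its own neighbour
    H = np.empty(len(X))
    for i in range(len(X)):
        nn = np.argsort(d2[i])[:k]            # indices of the k nearest neighbours
        p_pos = np.mean(y[nn] == 1)           # fraction of minority neighbours
        p_neg = 1.0 - p_pos
        H[i] = -sum(p * np.log(p) for p in (p_pos, p_neg) if p > 0)
    return H
```

A sample surrounded by its own class has zero entropy; a sample near the class boundary, with mixed neighbours, has high entropy.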
Further, the data points of the majority class are divided into \(n\) subsets in increasing order of entropy. The fuzzy membership of the samples in the qth subset is calculated as
\[
F_{q} = 1 - \beta (q - 1),\quad q = 1, \ldots ,n,
\]
where \(F_{q}\) is the fuzzy membership for the samples distributed in the qth subset and the fuzzy membership parameter \(\beta \in \left( {0,\frac{1}{n - 1}} \right]\) controls the scale of the fuzzy values of the samples. The fuzzy membership function is then written as
\[
s_{i} = \left\{ \begin{array}{ll} 1, & x_{i} \;{\text{in the minority class}}, \\ F_{q} , & x_{i} \;{\text{in the }}q{\text{th subset of the majority class}}. \end{array} \right.
\]
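The subset-wise membership assignment can be sketched as follows (a Python illustration with our own naming; minority samples would simply receive membership 1):

```python
import numpy as np

def entropy_fuzzy_membership(H_maj, n_subsets=10, beta=0.05):
    """Sort the majority-class samples by entropy, split them into n
    equal-frequency subsets, and give subset q (q = 1 has the lowest
    entropy) the membership F_q = 1 - beta*(q - 1)."""
    assert 0.0 < beta <= 1.0 / (n_subsets - 1)   # keeps every F_q positive
    s = np.empty(len(H_maj))
    order = np.argsort(H_maj)                    # ascending entropy
    for q, idx in enumerate(np.array_split(order, n_subsets), start=1):
        s[idx] = 1.0 - beta * (q - 1)            # low entropy -> high membership
    return s
```

With \(n = 10\) and \(\beta = 0.05\) (the values used in the experiments below), memberships range from 1.0 down to 0.55.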
Fan et al. [25] considered this approach to find the fuzzy membership of the samples and proposed a new method termed the entropy-based fuzzy support vector machine for imbalanced datasets. Motivated by the work of Fan et al. [25] and Jayadeva et al. [15], in this paper, we propose a new fuzzy twin support vector machine based on information entropy for class imbalance learning, where the information entropy determines the fuzzy membership. The data points with the highest entropy are those on the boundary between the classes. So, the data points of the majority class get their membership values based on their entropy, and all the minority class samples get full membership equal to 1. EFTWSVM-CIL finds two non-parallel hyperplanes such that each one is closer to one of the two classes and as far as possible from the other, whereas EFSVM finds a single separating hyperplane that maximizes the margin between the two classes. Due to this approach, the proposed EFTWSVM-CIL gives better generalization performance in comparison with EFSVM. Further, one can notice that we solve a pair of QPPs of smaller size to find the decision surface of our proposed EFTWSVM-CIL, instead of a single large QPP as in the case of EFSVM. This makes our proposed EFTWSVM-CIL faster than EFSVM in terms of training time; thus, it is well suited for training on large imbalanced data. Now, we discuss the linear and nonlinear formulations of our EFTWSVM-CIL.
3.1 Linear EFTWSVM-CIL
In the linear case, EFTWSVM-CIL finds the resultant classifier by solving the following pair of QPPs:
\[
\min_{w_{1} ,b_{1} ,\xi } \;\;\frac{1}{2}\left\| {X_{1} w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi \quad (17)
\]
subject to
\[
 - (X_{2} w_{1} + e_{2} b_{1} ) + \xi \ge e_{2} ,\quad \xi \ge 0,
\]
and
\[
\min_{w_{2} ,b_{2} ,\eta } \;\;\frac{1}{2}\left\| {X_{2} w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta \quad (18)
\]
subject to
\[
(X_{1} w_{2} + e_{1} b_{2} ) + \eta \ge e_{1} ,\quad \eta \ge 0,
\]
where \(\xi\), \(\eta\) represent slack variables, \(C_{1} ,C_{2} > 0\) are penalty parameters, and \(s_{1}\), \(s_{2}\) are vectors containing the entropy-based fuzzy membership values of the minority and majority class samples, respectively. The Lagrangian of problems (17) and (18) in primal is written as
\[
L_{1} = \frac{1}{2}\left\| {X_{1} w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi - \alpha_{1}^{t} \left( { - (X_{2} w_{1} + e_{2} b_{1} ) + \xi - e_{2} } \right) - \beta_{1}^{t} \xi \quad (19)
\]
\[
L_{2} = \frac{1}{2}\left\| {X_{2} w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta - \alpha_{2}^{t} \left( {(X_{1} w_{2} + e_{1} b_{2} ) + \eta - e_{1} } \right) - \beta_{2}^{t} \eta \quad (20)
\]
where \(\alpha_{1} = (\alpha_{11} , \ldots ,\alpha_{1q} )^{t} ,\quad \beta_{1} = (\beta_{11} , \ldots ,\beta_{1q} )^{t} ,\quad \alpha_{2} = (\alpha_{21} , \ldots ,\alpha_{2p} )^{t}\) and \(\beta_{2} = (\beta_{21} , \ldots ,\beta_{2p} )^{t}\) are the vectors of Lagrange multipliers. Applying the KKT conditions to (19), we get
\[
X_{1}^{t} (X_{1} w_{1} + e_{1} b_{1} ) + X_{2}^{t} \alpha_{1} = 0, \quad (21)
\]
\[
e_{1}^{t} (X_{1} w_{1} + e_{1} b_{1} ) + e_{2}^{t} \alpha_{1} = 0, \quad (22)
\]
together with \(C_{1} s_{2} - \alpha_{1} - \beta_{1} = 0\) and \(\alpha_{1} \ge 0,\;\beta_{1} \ge 0\).
Combining (21) and (22), we get
\[
\left[ \begin{array}{l} X_{1}^{t} \\ e_{1}^{t} \end{array} \right](X_{1} w_{1} + e_{1} b_{1} ) + \left[ \begin{array}{l} X_{2}^{t} \\ e_{2}^{t} \end{array} \right]\alpha_{1} = 0. \quad (23)
\]
One can rewrite (23) as
\[
A^{t} A\,u_{1} + B^{t} \alpha_{1} = 0,\quad {\text{i.e.}}\quad u_{1} = - (A^{t} A)^{ - 1} B^{t} \alpha_{1} , \quad (24)
\]
where \(A = \left[ {\begin{array}{*{20}l} {X_{1} } \hfill & {e_{1} } \hfill \\ \end{array} } \right]\), \(B = \left[ {\begin{array}{*{20}l} {X_{2} } \hfill & {e_{2} } \hfill \\ \end{array} } \right]\) and the augmented vector \(u_{1} = \left[ \begin{aligned} w_{1} \hfill \\ b_{1} \hfill \\ \end{aligned} \right]\).
Here, we introduce the regularization term \(\delta \, I\), where \(\delta > 0\) and \(I\) is the identity matrix of appropriate size, to handle the ill-conditioning of \(A^{t} A\) when computing the inverse. Thus, we get
\[
u_{1} = - (A^{t} A + \delta \,I)^{ - 1} B^{t} \alpha_{1} . \quad (25)
\]
Using the above KKT conditions and (19), the dual of the optimization problem in (17) can be written in the form of the following QPP:
\[
\max_{\alpha_{1} } \;\;e_{2}^{t} \alpha_{1} - \frac{1}{2}\alpha_{1}^{t} B(A^{t} A + \delta \,I)^{ - 1} B^{t} \alpha_{1} \quad (26)
\]
subject to
\[
0 \le \alpha_{1} \le C_{1} s_{2} .
\]
In a similar manner, one can find the dual of (18) as
\[
\max_{\alpha_{2} } \;\;e_{1}^{t} \alpha_{2} - \frac{1}{2}\alpha_{2}^{t} A(B^{t} B + \delta \,I)^{ - 1} A^{t} \alpha_{2} \quad (27)
\]
subject to
\[
0 \le \alpha_{2} \le C_{2} s_{1} .
\]
The values of \(w_{2}\) and \(b_{2}\) are calculated as
\[
u_{2} = (B^{t} B + \delta \,I)^{ - 1} A^{t} \alpha_{2} , \quad (28)
\]
where \(u_{2} = \left[ \begin{aligned} w_{2} \hfill \\ b_{2} \hfill \\ \end{aligned} \right]\).
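Numerically, these augmented solutions are best obtained with a regularized linear solve rather than an explicit matrix inverse; a NumPy sketch for \(u_{1}\) (helper names are ours):

```python
import numpy as np

def solve_u1(X1, X2, alpha1, delta=1e-5):
    """u1 = [w1; b1] = -(A^t A + delta*I)^{-1} B^t alpha1 with
    A = [X1 e1], B = [X2 e2]; delta*I guards against an
    ill-conditioned A^t A."""
    A = np.hstack([X1, np.ones((len(X1), 1))])   # augment with a column of ones
    B = np.hstack([X2, np.ones((len(X2), 1))])
    G = A.T @ A + delta * np.eye(A.shape[1])
    u1 = -np.linalg.solve(G, B.T @ alpha1)       # linear solve, no explicit inverse
    return u1[:-1], u1[-1]                       # w1 and the bias b1
```

The formula for \(u_{2}\) is identical with the roles of the two classes exchanged and the sign dropped.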
After calculating the values of \(u_{1}\) and \(u_{2}\), we find the non-parallel hyperplanes \(f_{1} (x) = w_{1}^{t} x + b_{1}\) and \(f_{2} (x) = w_{2}^{t} x + b_{2}\). Every new data point \(x \in R^{n}\) is assigned to class \(i\) \((i = 1,2)\) depending on its distance from the two planes:
\[
i = \arg \min_{j = 1,2} \frac{{\left| {w_{j}^{t} x + b_{j} } \right|}}{{\left\| {w_{j} } \right\|}}.
\]
3.2 Nonlinear EFTWSVM-CIL
For classifying nonlinearly separable data points, we use a kernel function to map the data points into a higher-dimensional feature space [30]. The nonlinear EFTWSVM-CIL is formulated in the primal form as
\[
\min_{w_{1} ,b_{1} ,\xi } \;\;\frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi \quad (29)
\]
subject to
\[
 - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi \ge e_{2} ,\quad \xi \ge 0,
\]
and
\[
\min_{w_{2} ,b_{2} ,\eta } \;\;\frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta \quad (30)
\]
subject to
\[
(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta \ge e_{1} ,\quad \eta \ge 0,
\]
where \(\xi\), \(\eta\) represent slack variables, \(C_{1}\), \(C_{2}\) are penalty parameters, \(D = [X_{1} ;X_{2} ]\), and \(s_{1}\), \(s_{2}\) are vectors containing the entropy-based fuzzy membership values. The Lagrangian functions of problems (29) and (30) are written as
\[
L_{1} = \frac{1}{2}\left\| {K(X_{1} ,D^{t} )w_{1} + e_{1} b_{1} } \right\|^{2} + C_{1} s_{2}^{t} \xi - \alpha_{1}^{t} \left( { - (K(X_{2} ,D^{t} )w_{1} + e_{2} b_{1} ) + \xi - e_{2} } \right) - \beta_{1}^{t} \xi \quad (31)
\]
\[
L_{2} = \frac{1}{2}\left\| {K(X_{2} ,D^{t} )w_{2} + e_{2} b_{2} } \right\|^{2} + C_{2} s_{1}^{t} \eta - \alpha_{2}^{t} \left( {(K(X_{1} ,D^{t} )w_{2} + e_{1} b_{2} ) + \eta - e_{1} } \right) - \beta_{2}^{t} \eta \quad (32)
\]
where \(\alpha_{1} = (\alpha_{11} , \ldots ,\alpha_{1q} )^{t} ,\quad \beta_{1} = (\beta_{11} , \ldots ,\beta_{1q} )^{t} ,\quad \alpha_{2} = (\alpha_{21} , \ldots ,\alpha_{2p} )^{t}\) and \(\beta_{2} = (\beta_{21} , \ldots ,\beta_{2p} )^{t}\) are the vectors containing the Lagrange multipliers.
Following the same procedure as in the linear case, we compute the nonlinear hyperplanes \(K(x^{t} ,D^{t} )w_{1} + b_{1} = 0\) and \(K(x^{t} ,D^{t} )w_{2} + b_{2} = 0\) by computing the values of \(w_{1}\), \(w_{2}\), \(b_{1}\) and \(b_{2}\) using Eqs. (33) and (34):
\[
u_{1} = - (P^{t} P + \delta \,I)^{ - 1} Q^{t} \alpha_{1} , \quad (33)
\]
\[
u_{2} = (Q^{t} Q + \delta \,I)^{ - 1} P^{t} \alpha_{2} , \quad (34)
\]
where \(P = [K(X_{1} ,D^{t} ) \,\,e_{1} ]\),\(Q = [ K(X_{2} ,D^{t} )\,\, e_{2} ]\).
Each new data point \(x \in R^{n}\) is assigned to class \(i\) \((i = 1,2)\) depending on which of the two surfaces is closest to it:
\[
i = \arg \min_{j = 1,2} \left| {K(x^{t} ,D^{t} )w_{j} + b_{j} } \right|.
\]
4 Numerical Experiments
In this section, to check the effectiveness of the proposed EFTWSVM-CIL against TWSVM, FTWSVM and EFSVM, we performed experiments on several imbalanced datasets from the KEEL imbalanced datasets [31] and the UCI repository [32] for binary classification. All computations were carried out on a PC running 64-bit Windows 7 with a 3.20 GHz Intel® Core™ i5-2400 processor and 2 GB of RAM under the MATLAB R2008b environment. We used the MOSEK optimization toolbox, available at http://www.mosek.com, to solve the SVM formulations. For selecting the optimum parameters, we used the fivefold cross-validation technique. To construct the nonlinear classifier, we have used the Gaussian kernel \(k(a,b) = \exp ( - \sigma \left\| {a - b} \right\|^{2} )\), where \(a,b \in R^{m}\).
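For reference, the Gaussian kernel matrix used above can be computed in vectorised form (a NumPy sketch; the function name is ours):

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Kernel matrix with entries k(a,b) = exp(-sigma * ||a - b||^2)
    between the rows of A and the rows of B, e.g. K(X1, D^t)."""
    # ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sigma * np.maximum(sq, 0.0))   # clip tiny negative round-off
```

The same routine builds the row vector \(K(x^{t} ,D^{t} )\) at prediction time by passing the test point as a one-row matrix.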
We have taken the value of the parameter \(C = C_{1} = C_{2}\) from the set \(\{ 2^{ - 5} , \ldots ,2^{5} \}\) in all cases. For FTWSVM, \(\delta\) is taken as 0.5. For EFTWSVM-CIL and EFSVM, the value of \(K\) for the k-nearest neighbours is chosen from {5, 10} and \(\beta\) is taken as 0.05. The value of \(\sigma\) is calculated for all methods as per the formula given in [33].
All the results for TWSVM, FTWSVM, EFSVM and the proposed EFTWSVM-CIL are reported in terms of prediction accuracy, i.e. the area under the ROC curve (AUC) [34], and training time for both the linear and nonlinear cases in Tables 1 and 3. One can observe from Tables 1 and 3 that EFTWSVM-CIL is superior to TWSVM, FTWSVM and EFSVM in terms of generalization performance. Our proposed EFTWSVM-CIL requires much less training time than EFSVM because it solves a pair of smaller-size QPPs instead of a single large one as in the case of EFSVM.
It is observable from Table 1 that our proposed EFTWSVM-CIL does not perform best on every dataset with the linear kernel. Further, we analyse the comparative performance of EFTWSVM-CIL with TWSVM, FTWSVM and EFSVM based on the average ranks of all the methods, which are presented in Table 2 for the linear case. One can clearly observe from Table 2 that the average rank of the proposed EFTWSVM-CIL is the lowest among all the methods. We perform the Friedman test with the corresponding post hoc test [35] for the linear kernel to statistically compare the performance of the 4 algorithms on 24 datasets. Under the null hypothesis, all the methods are assumed equivalent, and the Friedman statistic is computed from Table 2 as
\[
\chi_{F}^{2} = \frac{12 \times 24}{4 \times 5}\left[ {\left( {1.7708^{2} + 2.3125^{2} + 2.4583^{2} + 3.4583^{2} } \right) - \frac{{4 \times 5^{2} }}{4}} \right] \approx 21.4052,
\]
\[
F_{F} = \frac{(24 - 1) \times 21.4052}{24 \times (4 - 1) - 21.4052} \approx 9.7306,
\]
where \(F_{F}\) is distributed according to the \(F\)-distribution with \((3,\,3 \times 23) = (3,\,69)\) degrees of freedom for 4 methods and 24 datasets. The critical value of \(F(3,69)\) is \(2.7375\) at the significance level \(\alpha = 0.05\). Since \(F_{F} = 9.7306 > 2.7375\), we reject the null hypothesis. Further, the Nemenyi post hoc test is performed for pair-wise comparison of the methods; two methods differ significantly if their average ranks differ by at least the critical difference (CD) at \(p = 0.10\), computed as \(2.291\sqrt {\frac{4 \times (4 + 1)}{6 \times 24}} \approx 0.8539\).
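The Friedman statistic and the Nemenyi critical difference quoted above can be reproduced directly from the average ranks (a Python sketch using the Iman–Davenport form of the test; the rank values are those reported for the linear case):

```python
import math

def friedman_F(avg_ranks, N):
    """Friedman chi-square over N datasets and k methods, and its
    F-distributed (Iman-Davenport) variant F_F."""
    k = len(avg_ranks)
    chi2 = 12.0 * N / (k * (k + 1)) * (sum(r * r for r in avg_ranks)
                                       - k * (k + 1) ** 2 / 4.0)
    return chi2, (N - 1) * chi2 / (N * (k - 1) - chi2)

def nemenyi_cd(q_alpha, k, N):
    """Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * N))

# Linear-kernel average ranks of EFTWSVM-CIL, FTWSVM, TWSVM and EFSVM
chi2, FF = friedman_F([1.7708, 2.3125, 2.4583, 3.4583], N=24)
cd = nemenyi_cd(2.291, k=4, N=24)
```

Running this reproduces \(F_{F} \approx 9.73\) and \(CD \approx 0.854\) to the precision of the reported ranks.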
Since the difference between the average ranks of EFSVM and EFTWSVM-CIL \((3.4583 - 1.7708 = 1.6875)\) is greater than \(0.8539\), we conclude that EFTWSVM-CIL is significantly better than EFSVM. The differences in the average ranks of TWSVM and FTWSVM with respect to EFTWSVM-CIL are \((2.4583 - 1.7708 = 0.6875)\) and \((2.3125 - 1.7708 = 0.5417)\), respectively, both less than \(0.8539\); hence, there is no significant difference between EFTWSVM-CIL and TWSVM or FTWSVM.
For the Gaussian kernel, the accuracy values together with the training times of the proposed EFTWSVM-CIL, TWSVM, FTWSVM and EFSVM are shown in Table 3. One can observe from Table 3 that EFTWSVM-CIL shows better or equal generalization performance in 18 cases. The training speed of our proposed EFTWSVM-CIL is better than that of EFSVM and comparable to TWSVM and FTWSVM. The average ranks of all the methods based on accuracy values are shown in Table 2; one can conclude that our proposed EFTWSVM-CIL has the lowest average rank among all the methods. It is noticeable from the table that the proposed EFTWSVM-CIL is not always better in terms of accuracy on all datasets, so the Friedman statistical test with the post hoc tests is again performed.
Now, the Friedman statistic is computed for the nonlinear kernel under the null hypothesis by using Table 4:
\[
\chi_{F}^{2} \approx 8.3893,\quad F_{F} = \frac{(28 - 1) \times 8.3893}{28 \times (4 - 1) - 8.3893} \approx 2.9957.
\]
The critical value of \(F(3,84)\), i.e. \(2.7132\), at the significance level \(\alpha = 0.05\) is less than the value of \(F_{F}\). Thus, we reject the null hypothesis. Further, the Nemenyi post hoc test is used to check for significant differences in the pair-wise comparisons. We compute the critical difference (CD) at \(p = 0.10\): the average ranks must differ by at least \(2.291\sqrt {\frac{4 \times (4 + 1)}{6 \times 28}} \approx 0.7905\).
The differences between the average ranks of EFTWSVM-CIL with EFSVM and FTWSVM are \((2.8214 - 1.9107 = 0.9107)\) and \((2.7143 - 1.9107 = 0.8036)\), respectively, both greater than \(0.7905\). Hence, the proposed EFTWSVM-CIL is significantly better than EFSVM and FTWSVM.
Finally, we examine the sensitivity of the proposed EFTWSVM-CIL to its parameters \(C\) and \(K\). After extensive simulations, it is found that EFTWSVM-CIL is not very sensitive to the user-specified parameter \(K\). To illustrate this, the performance of EFTWSVM-CIL with the Gaussian kernel on the Australian Credit, WPBC, Yeast-0-3-5-9_vs_7-8 and Yeast-2_vs_4 datasets is shown in Fig. 1. From the figures, one can observe that better accuracy is achieved for smaller values of \(C\).
5 Conclusions and future work
In this paper, we proposed a new variant of SVM, termed EFTWSVM-CIL, to solve the class imbalance problem in binary classification datasets, where the fuzzy membership values are calculated based on the entropy of the samples. Our proposed EFTWSVM-CIL solves two smaller-size QPPs rather than a single large one, as in the case of EFSVM, to find the decision surface. One can conclude from the results that EFTWSVM-CIL shows better generalization performance than TWSVM, FTWSVM and EFSVM, which clearly illustrates its efficacy and applicability. It has been found that EFTWSVM-CIL outperforms EFSVM in terms of learning speed for both linear and nonlinear kernels. The performance of EFTWSVM-CIL also depends on the choice of its parameters, so in future, proper parameter selection may further improve the performance of our proposed model; heuristic approaches for parameter selection can also be explored, which may result in better performance.
References
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Vapnik VN (1998) Statistical learning theory. Wiley, New York
Vapnik VN (2000) The nature of statistical learning theory, 2nd edn. Springer, New York
Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 130–136
Phillips PJ (1998) Support vector machines applied to face recognition. Proc Conf Adv Neural Inf Process Syst 11:803–809
Michel P, El Kaliouby R (2003) Real time facial expression recognition in video using support vector machines. In: Proceedings of the 5th International Conference on Multimodal Interfaces, pp 258–264, ISBN: 1-58113-621-8
Borovikov E (2005) An evaluation of support vector machines as a pattern recognition tool. University of Maryland at College Park. http://www.umiacs.umd.edu/users/yab/SVMForPatternRecognition/report.pdf. Accessed 1 Dec 2016
Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
Schmidt M, Gish H (1996) Speaker identification via support vector classifiers. In: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), vol 1, Atlanta, GA, pp 105–108
Khan L, Awad M, Thuraisingham B (2007) A new intrusion detection system using support vector machines and hierarchical clustering. VLDB J 16:507–521
Tomar D, Ojha D, Agarwal S (2014) An emotion detection system based on multi least squares twin support vector machine. Adv Artif Intell 2014:8
Tomar D, Agarwal S (2015) Hybrid feature selection based weighted least squares twin support vector machine approach for diagnosing breast cancer, hepatitis, and diabetes. Adv Artif Neural Syst 2015:1, Article ID 265637
Zhang J, Liu Y (2004) Cervical cancer detection using SVM-based feature screening. In: Proceedings of Seventh Int’l Conference Medical Image Computing and Computer Aided Intervention, pp 873–880
Balasundaram S, Gupta D, Prasad SC (2017) A new approach for training Lagrangian twin support vector machine via unconstrained convex minimization. Appl Intell 46:124–134
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell (TPAMI) 29:905–910
Lin C-F, Wang S-D (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Batuwita R, Palade V (2010) FSVM-CIL: fuzzy support vector machines for class imbalance learning. IEEE Trans Fuzzy Syst 18(3):558–571
Tian D-Z, Peng G-B, Ha M-H (2012) Fuzzy support vector machine based on non-equilibrium data. In: International Conference on Machine Learning and Cybernetics, Xi’an, China, pp 15–17
Wang Y, Wang S, Lai KK (2005) A new fuzzy support vector machine to evaluate credit risk. IEEE Trans Fuzzy Syst 13(6):820–831
Chaudhuri, De K (2010) Fuzzy support vector machine for bankruptcy prediction. Appl Soft Comput 11(2):2472–2486
Shao YH, Chen WJ, Zhang JJ, Wang Z, Deng NY (2014) An efficient weighted Lagrangian twin support vector machine for imbalanced data classification. Pattern Recogn 47(9):3158–3167
Gupta D, Borah P, Prasad M (2017) A fuzzy based Lagrangian twin parametric-margin support vector machine (FLTPMSVM). In: Computational intelligence (SSCI), 2017 IEEE symposium series on pp 1–7 https://doi.org/10.1109/ssci.2017.8280964
Balasundaram S, Gupta D (2016) On optimization based extreme learning machine in primal for regression and classification by functional iterative method. Int J Mach Learn Cybern Springer 7(5):707–728
Balasundaram S, Gupta D, Kapil (2014) 1-norm extreme learning machine for regression and multiclass classification using Newton method. Neurocomputing, Elsevier 128:4–14
Fan Q, Wang Z, Li D, Gao D, Zha H (2017) Entropy-based fuzzy support vector machine for imbalanced datasets. Knowl-Based Syst 115:87–99
Mangasarian OL, Wild EW (2006) Multisurface proximal support vector classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp 77–86
Mangasarian OL (1994) Nonlinear programming. SIAM, Philadelphia, PA
Chen Y, Wu K, Chen X, Tang C, Zhu Q (2014) An entropy-based uncertainty measurement approach in neighborhood systems. Inf Sci 279:239–250
Burges CJC (1998) Geometry and invariance in kernel based methods. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods: support vector learning. MIT Press, Cambridge
Alcalá-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17(2–3):255–287
Murphy PM, Aha DW (1992) UCI repository of machine learning databases, University of California, Irvine. http://www.ics.uci.edu/~mlearn. Accessed 1 Dec 2016
Tsang I, Kocsor A, Kwok J (2006) Efficient kernel feature extraction for massive data sets. In: International Conference on Knowledge Discovery and Data Mining
Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 17(3):299–310
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Conflict of interest
All authors declare that they have no conflict of interest.
Gupta, D., Richhariya, B. & Borah, P. A fuzzy twin support vector machine based on information entropy for class imbalance learning. Neural Comput & Applic 31, 7153–7164 (2019). https://doi.org/10.1007/s00521-018-3551-9