Abstract
Least squares twin support vector machine (LSTSVM) is a relatively new machine learning method which, as opposed to solving two quadratic programming problems as in the twin support vector machine (TWSVM), generates two nonparallel hyperplanes by solving a pair of linear systems of equations. However, LSTSVM obtains the resultant classifier by giving the same importance to all training samples, although their individual contributions may differ and be important for classification performance. In this paper, by considering a fuzzy membership value for each sample, we propose an entropy-based fuzzy least squares twin support vector machine in which fuzzy membership values are assigned based on the entropy values of all training samples. The proposed method not only retains the superior characteristics of LSTSVM, namely a simple and fast algorithm, but also implements the structural risk minimization principle to overcome the possible over-fitting problem. Experiments are performed on several synthetic as well as benchmark datasets, and the experimental results illustrate the effectiveness of our method.
1 Introduction
Support vector machine (SVM) [1] is one of the most popular machine learning approaches and has been successfully applied to a variety of real-world problems such as face recognition [2], bioinformatics [3], text categorization [4], intrusion detection [5] and various other classification problems [6, 7]. However, the training cost of SVM is very high, i.e. \( O(m^{3} ) \), where \( m \) is the number of training samples. Researchers have made many improvements on the basis of SVM; for example, Fung and Mangasarian [8] proposed the proximal support vector machine (PSVM) for binary classification. Following PSVM, Mangasarian and Wild [9] proposed the multisurface proximal SVM via generalized eigenvalues (GEPSVM) for binary classification, which seeks two nonparallel proximal hyperplanes such that each hyperplane is close to one of the two classes and as far as possible from the other. Inspired by GEPSVM, Jayadeva et al. [10] proposed another nonparallel hyperplane classifier for pattern classification, called twin support vector machine (TWSVM). The main idea of TWSVM is to solve two smaller quadratic programming problems (QPPs) rather than a single large QPP, which makes the training of TWSVM approximately four times faster than SVM. Since then, TWSVM has been widely investigated [11,12,13,14,15,16]. Researchers have made several improvements to TWSVM to obtain higher classification accuracy with lower computational time, such as the least squares twin support vector machine (LSTSVM) [11], twin bounded support vector machine (TBSVM) [12], twin parametric-margin support vector machine (TPMSVM) [13], robust twin support vector machine (RTSVM) [14], nonparallel support vector machine (NPSVM) [15] and angle-based twin support vector machine (ATSVM) [16], among others [17,18,19,20].
It should be noted that, in practical problems, data are often polluted by noise or of low quality. Patterns, even if they belong to the same class, should play different roles in model training. Since SVM treats all samples with the same importance, it ignores the differences between the positive and negative classes, which results in the learned decision surface being biased toward the majority class. To address this problem, Lin et al. [21] proposed the fuzzy SVM (FSVM) based on fuzzy membership values, so that different input samples make different contributions to the learning of the decision surface. However, FSVM assigns smaller fuzzy memberships to support vectors, which may decrease the effect of support vectors on the construction of the decision surface. To overcome this problem, a new efficient fuzzy SVM for non-equilibrium data was proposed to reduce the misclassification of the minority class in FSVM [22]. Adopting fuzzy memberships, various fuzzy SVMs have been presented, such as the bilateral-weighted FSVM (B-FSVM) [23], NFSVM [24], WCS-FSVM [25], FTSVM [26] and NFTSVM [27]. Moreover, fuzzy set and fuzzy system theory is also widely used in various control problems [28,29,30].
Recently, Fan et al. [31] proposed an entropy-based fuzzy support vector machine (EFSVM) for the class imbalance problem, in which fuzzy memberships are computed based on the class certainty of samples. Motivated by EFSVM, Gupta et al. [32] proposed a fuzzy twin support vector machine based on information entropy, termed EFTWSVM-CIL. The choice of fuzzy membership is thus very important for classification problems: each sample is given a fuzzy membership that indicates its importance toward one class, and this weighting contributes to the final decision surface. Based on the above discussion and inspired by EFTWSVM-CIL and LSTSVM, in this paper we propose a new approach termed the entropy-based fuzzy least squares twin support vector machine (EFLSTSVM). The contributions of the proposed EFLSTSVM can be highlighted as follows. First, an entropy-based fuzzy membership is used to evaluate the class certainty of the training samples. Similar to EFSVM and EFTWSVM-CIL, our EFLSTSVM adopts entropy to evaluate the class certainty of each sample and then determines the corresponding fuzzy membership based on that certainty. It can thus pay more attention to samples with higher class certainty, resulting in a more robust decision surface and enhancing classification accuracy and generalization ability. Second, we modify the QPP-based formulation of EFTWSVM-CIL in the least squares sense, which leads to an optimization problem with equality constraints. Different from EFTWSVM-CIL, our EFLSTSVM solves a pair of linear systems of equations instead of two QPPs, which yields a simpler algorithm and less computational time. Third, a regularization term is introduced into the objective function. Therefore, our EFLSTSVM implements the structural risk minimization principle instead of the empirical risk minimization principle, which can overcome the possible over-fitting problem in LSTSVM.
Finally, experimental results on several synthetic and benchmark datasets show the effectiveness of the proposed EFLSTSVM.
The rest of this paper is organized as follows. In Sect. 2, we give a brief review of LSTSVM and FTSVM. Section 3 proposes the details of linear EFLSTSVM and its nonlinear version. Experimental results on both synthetic and real-world datasets to investigate the effectiveness of our method are described in Sect. 4. Finally, Sect. 5 gives the conclusion.
2 Brief Review of LSTSVM and FTSVM
In this section, we briefly explain the basics of LSTSVM and FTSVM. Consider a binary classification problem in the n-dimensional real space \( R^{n} \), with a set of training samples \( T = \{ (x_{1} ,y_{1} ),(x_{2} ,y_{2} ), \ldots ,(x_{m} ,y_{m} )\} \), where \( x_{i} \in R^{n} \) and \( y_{i} \in \{ - 1,1\} ,i = 1,2, \ldots ,m \). We organize the \( m_{1} \) samples of the positive class in an \( m_{1} \times n \) matrix \( A \in R^{{m_{1} \times n}} \) and the \( m_{2} \) samples of the negative class in an \( m_{2} \times n \) matrix \( B \in R^{{m_{2} \times n}} \).
2.1 Least Squares Twin Support Vector Machine (LSTSVM)
Different from least squares support vector machine (LSSVM) [33], least squares twin support vector machine (LSTSVM) [11] aims to find a pair of nonparallel hyperplanes
such that each hyperplane is close to the training samples of one class and as far as possible from the samples of the other class. Then, the primal optimization problem of linear LSTSVM can be expressed as
where \( c_{1} \) and \( c_{2} \) are positive penalty parameters, \( \xi_{1} \) and \( \xi_{2} \) are slack variables, and \( e_{1} \) and \( e_{2} \) are vectors of ones of appropriate dimensions.
By substituting the equality constraint into the objective function, we obtain the unconstrained optimization problem as follows.
Let \( E = [A\;e_{1} ],F = [B\;e_{2} ] \); from (4) and (5), we can get
The solutions to the optimization problems (2) and (3) can be found directly by solving the systems of linear Eqs. (4) and (5); more details can be found in [11]. Once \( w_{1} ,b_{1} \) and \( w_{2} ,b_{2} \) are obtained from (6) and (7), the nonparallel hyperplanes (1) are known. A new data point \( x \in R^{n} \) is then assigned to the positive class \( W_{1} \) or the negative class \( W_{2} \) by
where \( |\cdot| \) is the absolute value.
2.2 Fuzzy Twin Support Vector Machine (FTSVM)
Different from TWSVM [10], in the case of linear FTSVM [26], a weighting parameter is used to construct the classifier based on fuzzy membership values. The formulation of linear FTSVM can be written as
where \( c_{1} \) and \( c_{2} \) are positive penalty parameters, \( \xi_{1} \) and \( \xi_{2} \) are slack variables, \( e_{1} \) and \( e_{2} \) are vectors of ones of appropriate dimensions, and \( s_{1} \) and \( s_{2} \) contain the fuzzy memberships of the sample points of each class.
By introducing the method of Lagrangian multipliers, the corresponding Wolfe dual of QPPs (9) and (10) can be represented as
where \( G = [B\;e_{2} ],\;H = [A\;e_{1} ] \) and \( \alpha \in R^{{m_{2} }} ,\;\gamma \in R^{{m_{1} }} \) are Lagrangian multipliers.
The nonparallel hyperplanes (1) can be obtained from the solutions \( \alpha \) and \( \gamma \) of (11) and (12) by
Once \( w_{1} ,b_{1} \) and \( w_{2} ,b_{2} \) are obtained from (13) and (14), the two nonparallel hyperplanes (1) are known. A new data point \( x \in R^{n} \) is then assigned to positive class \( W_{1} \) or negative class \( W_{2} \) by
where \( |\cdot| \) is the absolute value. More details about FTSVM can be seen in [26].
3 Entropy-Based Fuzzy Least Squares Twin Support Vector Machine
As the evaluation of fuzzy membership is the key issue of FSVM, in this section, we introduce the entropy-based fuzzy membership at first. Then, by adopting the entropy-based fuzzy membership, the entropy-based fuzzy least squares twin support vector machine (EFLSTSVM) for binary classification is presented.
3.1 Entropy-Based Fuzzy Membership
In information theory, entropy is a measure of the information carried by a sample [34]. Thus, entropy characterizes the certainty about the source of information; that is, a smaller entropy indicates that the information is more certain. By adopting entropy, we can evaluate the class certainty of the training samples and assign their fuzzy memberships based on this certainty. Specifically, samples with higher class certainty are assigned larger fuzzy memberships to enhance their contribution to the decision surface, and vice versa. Suppose the probabilities of a training sample \( x_{i} \) belonging to the positive and negative class are \( p_{ + i} \) and \( p_{ - i} \), respectively. The entropy of \( x_{i} \) is defined as
where ln denotes the natural logarithm. The key point in calculating \( H_{i} \) by (16) is to evaluate the probability of each sample belonging to the positive and negative class. We find the K nearest neighbours of sample \( x_{i} \) and assign the values of \( p_{ + i} \) and \( p_{ - i} \) based on the counts of positive- and negative-class neighbours, i.e.
where \( num_{ + i} \) and \( num_{ - i} \) denote the numbers of positive and negative samples among the selected K nearest neighbours, and \( num_{ + i} + num_{ - i} = K \).
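As a concrete illustration, the entropy evaluation of Eqs. (16)–(17) can be sketched as follows. This is a minimal NumPy implementation; the helper name `knn_entropy` and the Euclidean distance choice are ours, not from the paper.

```python
import numpy as np

def knn_entropy(X, y, k=5):
    """Entropy H_i of each sample x_i, computed from the class mix of its
    k nearest neighbours, following Eqs. (16)-(17). y must be in {-1, +1}."""
    m = X.shape[0]
    H = np.zeros(m)
    for i in range(m):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        nn = np.argsort(d)[:k]             # indices of the k nearest neighbours
        p_pos = np.mean(y[nn] == 1)        # p_{+i} = num_{+i} / k
        p_neg = 1.0 - p_pos                # p_{-i} = num_{-i} / k
        # 0 * ln(0) is taken as 0, so a pure neighbourhood gives H_i = 0
        for p in (p_pos, p_neg):
            if p > 0:
                H[i] -= p * np.log(p)
    return H
```

A sample whose neighbourhood is evenly split gets the maximum entropy ln 2, while a sample surrounded only by its own class gets entropy 0 (highest class certainty).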
By adopting the above entropy evaluation, the entropies of the positive samples are \( H_{ + } = \{ H_{ + 1} ,H_{ + 2} , \ldots ,H_{{ + m_{1} }} \} \). The data points of the positive class are then divided into \( N_{ + } \) subsets in increasing order of entropy, and the fuzzy memberships of the positive samples in each subset are calculated as
where \( FM_{ + j} \) is the fuzzy membership for the positive samples in the jth subset, with fuzzy membership parameter \( \beta \in (0,\tfrac{1}{{N_{ + } - 1}}] \) controlling the scale of the fuzzy values of the positive samples. The fuzzy memberships of the positive samples are then defined as
Similarly, the entropies of the negative samples are \( H_{ - } = \{ H_{ - 1} ,H_{ - 2} , \ldots ,H_{{ - m_{2} }} \} \). The data points of the negative class are then divided into \( N_{ - } \) subsets in increasing order of entropy, and the fuzzy memberships of the negative samples in each subset are calculated as
where \( FM_{ - j} \) is the fuzzy membership for the negative samples in the jth subset, with fuzzy membership parameter \( \beta \in (0,\;\tfrac{1}{{N_{ - } - 1}}] \) controlling the scale of the fuzzy values of the negative samples. The fuzzy memberships of the negative samples are then defined as
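The subset-based membership assignment above can be sketched as follows, one class at a time. The linear decay \( FM_{j} = 1 - \beta (j - 1) \) follows the EFSVM convention and is an assumption here; it should be checked against Eqs. (18) and (20) of the paper. The helper name is ours.

```python
import numpy as np

def entropy_fuzzy_membership(H, n_subsets=10, beta=0.05):
    """Assign fuzzy memberships from entropies H (for one class).
    Samples are sorted by increasing entropy and split into n_subsets;
    subset j (1-based) receives FM_j = 1 - beta*(j-1), so low-entropy
    (high class certainty) samples get the largest memberships."""
    m = len(H)
    order = np.argsort(H)                      # increasing entropy
    s = np.empty(m)
    bounds = np.linspace(0, m, n_subsets + 1).astype(int)
    for j in range(n_subsets):
        idx = order[bounds[j]:bounds[j + 1]]
        s[idx] = 1.0 - beta * j                # subset j+1 -> FM = 1 - beta*j
    return s
```

Note that the constraint \( \beta \in (0, \tfrac{1}{N-1}] \) stated above guarantees that even the highest-entropy subset keeps a non-negative membership \( 1 - \beta(N-1) \ge 0 \).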
3.2 Linear EFLSTSVM
For the binary classification problem, inspired by LSTSVM [11] and FTSVM [26], the proposed EFLSTSVM seeks a pair of nonparallel hyperplanes and modifies the primal problems of FTSVM by replacing the inequality constraints with equality constraints and using the squared 2-norm of the slack variables. Different from LSTSVM, the structural risk is minimized by adding a regularization term in our EFLSTSVM. Thus, the primal problems of EFLSTSVM are expressed as follows.
where \( c_{i} \;(i = 1,2,3,4) \) are positive penalty parameters, \( S_{ + } \; = diag(s_{ + 1} ,s_{ + 2} , \ldots ,s_{{ + m_{1} }} ) \) and \( S_{ - } \; = diag(s_{ - 1} ,s_{ - 2} , \ldots ,s_{{ - m_{2} }} ) \) are the entropy-based fuzzy membership matrices of the positive and negative class, respectively, \( \xi_{1} \) and \( \xi_{2} \) are slack variables, and \( e_{1} \) and \( e_{2} \) are vectors of ones of appropriate dimensions.
Consider the primal problem (22). By substituting the equality constraint into the objective function, we obtain the following unconstrained problem:
By taking the partial derivatives of (24) with respect to \( w_{1} \) and \( b_{1} \), we get
Then, combining (25) and (26) and solving for \( w_{1} \) and \( b_{1} \), we obtain a system of linear equations, expressed as follows.
where \( I_{1} \) is an identity matrix.
Let \( H = [A\;e_{1} ] \in R^{{m_{1} \times (n + 1)}} ,G = [B\;e_{2} ] \in R^{{m_{2} \times (n + 1)}} \), by simplifying (27), we can get
where \( I_{ + } \) is an identity matrix and \( u_{ + } = [w_{1} ;b_{1} ] \in R^{n + 1} \).
Thus, from (28), we can get
Now consider the primal problem (23) and substitute the equality constraints into the objective function. Thus, we obtain
By taking partial derivative of (30) with respect to \( w_{2} \) and \( b_{2} \), we obtain
Then, combining (31) and (32) and solving for \( w_{2} \) and \( b_{2} \), we obtain a system of linear equations, expressed as follows.
where \( I_{2} \) is an identity matrix.
Similarly, by simplifying (33), we can obtain
where \( I_{ - } \) is an identity matrix and \( u_{ - } = [w_{2} ;\;b_{2} ] \in R^{n + 1} \).
Thus, from (34), we can get
The solutions to the pair of QPPs (22) and (23) can be found directly by solving the two systems of linear Eqs. (28) and (34), which involves two matrix inverses of size \( (n + 1) \times (n + 1) \). When \( n \) is much smaller than the numbers of samples in the positive and negative classes, the training speed of linear EFLSTSVM is extremely fast. Once \( w_{1} ,b_{1} \) and \( w_{2} ,b_{2} \) are obtained from (29) and (35), the two nonparallel hyperplanes (1) are known. A new data point \( x \in R^{n} \) is then assigned to the positive class \( W_{1} \) or the negative class \( W_{2} \) by
where \( |\cdot| \) is the absolute value.
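The whole linear procedure can be sketched in a few lines of NumPy. This is our own reading of the derivation: substituting the equality constraints of (22)–(23) gives the systems \( (H^{T}H + c_{1}G^{T}S_{-}G + c_{3}I)u_{+} = -c_{1}G^{T}S_{-}e_{2} \) and \( (G^{T}G + c_{2}H^{T}S_{+}H + c_{4}I)u_{-} = c_{2}H^{T}S_{+}e_{1} \); the exact weighting and signs should be checked against Eqs. (28) and (34). Function names are ours.

```python
import numpy as np

def train_linear_eflstsvm(A, B, s_neg, s_pos, c1=1.0, c2=1.0, c3=1.0, c4=1.0):
    """Linear EFLSTSVM sketch. A: positive samples (m1 x n); B: negative
    samples (m2 x n); s_neg/s_pos: fuzzy memberships weighting the slack
    terms of the negative/positive samples."""
    m1, n = A.shape
    m2 = B.shape[0]
    H = np.hstack([A, np.ones((m1, 1))])          # H = [A e1]
    G = np.hstack([B, np.ones((m2, 1))])          # G = [B e2]
    S_neg, S_pos = np.diag(s_neg), np.diag(s_pos)
    I = np.eye(n + 1)
    # (H^T H + c1 G^T S_- G + c3 I) u_+ = -c1 G^T S_- e2
    u_plus = np.linalg.solve(H.T @ H + c1 * G.T @ S_neg @ G + c3 * I,
                             -c1 * G.T @ S_neg @ np.ones(m2))
    # (G^T G + c2 H^T S_+ H + c4 I) u_- = c2 H^T S_+ e1
    u_minus = np.linalg.solve(G.T @ G + c2 * H.T @ S_pos @ H + c4 * I,
                              c2 * H.T @ S_pos @ np.ones(m1))
    return u_plus, u_minus                        # u = [w; b]

def predict(X, u_plus, u_minus):
    """Assign each row of X to the class of the nearer hyperplane, as in (36)."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])
    d_plus = np.abs(Xa @ u_plus) / np.linalg.norm(u_plus[:-1])
    d_minus = np.abs(Xa @ u_minus) / np.linalg.norm(u_minus[:-1])
    return np.where(d_plus <= d_minus, 1, -1)
```

Each system is only \( (n+1) \times (n+1) \), which is why the training cost stays low when the feature dimension is small relative to the sample counts.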
In summary, the steps for constructing the linear EFLSTSVM classifier are shown in Algorithm 1.
3.3 Nonlinear EFLSTSVM
For the nonlinear case, we first define \( C = [A;B] \) and choose an appropriate kernel function \( K \). Following the same idea, the linear EFLSTSVM classifier can be extended to a nonlinear version by considering the following kernel-generated surfaces
The primal problems of the nonlinear EFLSTSVM are expressed as follows.
Similar to the linear case, by substituting the constraints into the objective functions, we can obtain the solutions to problems (38) and (39) as follows.
Let \( KerH = [K(A,C^{T} )\;\;e_{1} ] \in R^{{m_{1} \times (m + 1)}} ,KerG = [K(B,C^{T} )\;\;e_{2} ] \in R^{{m_{2} \times (m + 1)}} \); similar to the linear case, the solutions of QPPs (40) and (41) can be obtained as follows.
where \( v_{ + } = [w_{1} ;b_{1} ] \in R^{m + 1} \), \( v_{ - } = [w_{2} ;b_{2} ] \in R^{m + 1} \).
By simplifying the expression (42) and (43), we obtain
In summary, the solutions to the pair of QPPs (38) and (39) can be found directly by solving two systems of linear Eqs. (42) and (43). Once \( w_{1} ,b_{1} \) and \( w_{2} ,b_{2} \) are obtained from (44) and (45), the two nonparallel hyperplanes (37) are known. A new data point \( x \in R^{n} \) is then assigned to positive class \( W_{1} \) or negative class \( W_{2} \) by
where \( |\cdot| \) is the absolute value.
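The only structural change in the nonlinear case is that the data blocks \( H \) and \( G \) are replaced by the kernel-augmented matrices \( KerH \) and \( KerG \). A minimal sketch of building these blocks with the Gaussian kernel used later in the experiments (helper names are ours):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated pairwise."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * sigma**2))

def kernel_blocks(A, B, sigma=1.0):
    """Build KerH = [K(A, C^T) e1] and KerG = [K(B, C^T) e2] with C = [A; B]:
    the m_i x (m+1) matrices that replace H and G in the nonlinear case."""
    C = np.vstack([A, B])
    KerH = np.hstack([gaussian_kernel(A, C, sigma), np.ones((A.shape[0], 1))])
    KerG = np.hstack([gaussian_kernel(B, C, sigma), np.ones((B.shape[0], 1))])
    return KerH, KerG
```

The resulting linear systems are then of size \( (m+1) \times (m+1) \), which motivates the SMW reduction discussed in Sect. 4.5.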
Similar to the linear EFLSTSVM, the steps for constructing the nonlinear EFLSTSVM classifier are given in Algorithm 2.
4 Experimental Results and Discussions
To validate the performance of the proposed EFLSTSVM, we investigate its classification accuracy and computational efficiency on several synthetic datasets, UCI benchmark datasets, NDC datasets and image recognition datasets. In our experiments, we focus on the comparison between our proposed EFLSTSVM and five state-of-the-art classifiers: TWSVM [10], LSTSVM [11], FTSVM [26], EFSVM [31] and EFTWSVM-CIL [32]. The classification accuracy of each method is evaluated by the standard 5-fold cross-validation methodology. All methods are implemented in MATLAB R2018a on a personal computer (PC) with an Intel (R) Core (TM) i7-7700 CPU (3.60 GHz × 8) and 32 GB of random-access memory (RAM). The QPPs in TWSVM, FTSVM and EFTWSVM-CIL are solved by the SOR algorithm, which is also used to solve the QPPs in [12, 18], and the systems of linear equations in LSTSVM and our EFLSTSVM are solved by MATLAB's ‘\’ operator. In addition, the grid search method is used to find the optimal parameters of all methods. Specifically, the penalty parameters \( c_{i} \) and the kernel width parameter \( \sigma \) of the Gaussian kernel function \( K(x,y) = e^{{ - \frac{{\left\| {x - y} \right\|^{2} }}{{2\sigma^{2} }}}} \) in all methods are selected from the set \( \{ 2^{i} |i = - 8, - 7, \ldots ,7,8\} \). The neighbourhood size k in our EFLSTSVM, EFSVM and EFTWSVM-CIL is chosen from the set \( \{ 1,3,5, \ldots ,17,19\} \). The numbers of separated subsets \( N_{ + } \) and \( N_{ - } \) are set to 10, and the fuzzy membership parameter \( \beta \) is set to 0.05.
4.1 Synthetic Datasets
In this subsection, two artificial datasets, the Two-moons manifold dataset [35, 36] and Ripley's synthetic dataset [37], are used to illustrate that the proposed EFLSTSVM can deal with linearly inseparable problems. In the experiments, the Two-moons-1 manifold dataset contains 100 samples (50 positive and 50 negative) and the Two-moons-2 manifold dataset contains 200 samples (100 positive and 100 negative). Figure 1 shows the two kinds of Two-moons datasets with different complexity.
For the Two-moons manifold datasets, we investigate the performance of nonlinear TWSVM, LSTSVM, EFSVM, FTSVM, EFTWSVM-CIL and our EFLSTSVM with the Gaussian kernel function. We randomly select 40% of the samples for training and 60% for testing; each experiment is repeated 10 times and the average results are listed in Table 1. For Ripley's synthetic dataset, which contains 250 training points and 1000 test points, we investigate the classification performance of nonlinear TWSVM, LSTSVM, EFSVM, FTSVM, EFTWSVM-CIL and our proposed EFLSTSVM with the Gaussian kernel function. The test accuracy is listed in Table 2. From Tables 1 and 2, we can observe that our EFLSTSVM obtains the best performance on the Two-moons and Ripley datasets. Moreover, the hyperplanes of TWSVM, LSTSVM, EFSVM, FTSVM, EFTWSVM-CIL and our EFLSTSVM on Ripley's dataset are shown in Fig. 2.
4.2 UCI Datasets
To further compare EFLSTSVM with TWSVM, LSTSVM, EFSVM, FTSVM and EFTWSVM-CIL, we choose 11 datasets from the UCI machine learning repository [38]: Australian, Bupa-Liver, House-Votes, Heart-c, Heart-Statlog, Ionosphere, Musk, PimaIndian, Sonar, Spect and Wpbc. Experimental results of the linear and nonlinear versions are given in Tables 3 and 4. For each dataset, the best accuracy is shown in boldface and the shortest time is underlined. In Table 3, we can find that the accuracy of our proposed linear EFLSTSVM is better than that of TWSVM, LSTSVM, EFSVM, FTSVM and EFTWSVM-CIL on most of the datasets. For example, on the Spect dataset, the accuracy of our linear EFLSTSVM is 83.89%, while TWSVM achieves 81.24%, LSTSVM 80.89%, EFSVM 83.15%, FTSVM 81.67% and EFTWSVM-CIL 81.68%.
Furthermore, Table 4 shows the experimental results for the nonlinear TWSVM, LSTSVM, EFSVM, FTSVM, EFTWSVM-CIL and our proposed EFLSTSVM on the above 11 UCI datasets. The results in Table 4 are similar to those in Table 3 and confirm the observation above. In particular, for the Ionosphere dataset, our nonlinear EFLSTSVM obtains a classification accuracy of 96.58%, which is 2.26% higher than TWSVM, 2.28% higher than LSTSVM, 0.85% higher than EFSVM and 1.98% higher than FTSVM and EFTWSVM-CIL, respectively. In addition, from Tables 3 and 4, we can find that our proposed EFLSTSVM is not the fastest method. Although it is slightly slower than LSTSVM in most cases, it is faster than the other four methods on all chosen datasets. The main reason might be that TWSVM, EFSVM, FTSVM and EFTWSVM-CIL require solving QPPs, while LSTSVM and our proposed EFLSTSVM only require solving systems of linear equations. Figure 3 shows the training times of the nonlinear methods on the selected UCI datasets.
In addition, the Friedman test [39] is conducted to give a statistical comparison of the effectiveness of the compared algorithms. For this test, the average ranks of the compared algorithms on the adopted datasets are listed in the last row of Tables 3 and 4. In the experiments, we consider k (= 6) compared algorithms and n (= 11) datasets. Let \( r_{i}^{j} \) be the rank of the jth algorithm on the ith dataset. The average rank of the jth algorithm is calculated as \( R_{j} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {r_{i}^{j} } \). Under the null hypothesis that all the methods are equivalent, the Friedman statistic
is distributed according to \( \chi_{\text{F}}^{ 2} \) with \( (k - 1) \) degrees of freedom when n and k are reasonably large.
Moreover, Iman and Davenport [40] showed that Friedman’s \( \chi_{\text{F}}^{ 2} \) presents a pessimistic behavior, and they derived a better statistic
which is distributed according to the F-distribution with \( (k - 1) \) and \( (k - 1)(n - 1) \) degrees of freedom.
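These two statistics take standard closed forms (Eqs. (47)–(48) are assumed to follow the usual Friedman/Iman–Davenport definitions): \( \chi_{F}^{2} = \frac{12n}{k(k+1)}\big[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\big] \) and \( F_F = \frac{(n-1)\chi_F^2}{n(k-1) - \chi_F^2} \). A small sketch for computing them from the average ranks (the helper name is ours):

```python
import numpy as np

def friedman_stats(avg_ranks, n_datasets):
    """Friedman chi-square and the Iman-Davenport F statistic from the
    average ranks R_j of k algorithms over n datasets (standard forms)."""
    R = np.asarray(avg_ranks, dtype=float)
    k, n = len(R), n_datasets
    chi2_f = 12.0 * n / (k * (k + 1)) * (np.sum(R**2) - k * (k + 1)**2 / 4.0)
    f_f = (n - 1) * chi2_f / (n * (k - 1) - chi2_f)
    return chi2_f, f_f
```

The null hypothesis is rejected at level \( \alpha \) whenever the computed \( F_F \) exceeds the critical value of the F-distribution with \( (k-1) \) and \( (k-1)(n-1) \) degrees of freedom.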
For the linear case, in Table 3, it is noticed that our proposed EFLSTSVM ranks first with an average rank of 1.8182. To test whether the measured average ranks differ significantly from the mean rank expected under the null hypothesis, according to (47) and (48), we obtain
Specifically, for 6 algorithms and 11 datasets, \( F_{F} \) is distributed according to the F-distribution with \( (6 - 1) = 5 \) and \( (6 - 1) \times (11 - 1) = 50 \) degrees of freedom. The critical value of \( F(5,50) \) is 2.400 at the significance level \( \alpha = 0.05 \), which is less than the value of \( F_{F} \); hence the null hypothesis is rejected. Thus, the compared algorithms are significantly different on the adopted datasets.
For the nonlinear case, in Table 4, it is noticed that EFLSTSVM ranks first with an average rank of 1.4091. According to (47) and (48), we obtain
Similar to the linear case, for 6 algorithms and 11 datasets, the critical value of \( F(5,50) \) is 2.400 at the significance level \( \alpha = 0.05 \), which is again less than the value of \( F_{F} \). Therefore, the null hypothesis is rejected and the compared algorithms are significantly different.
On the other hand, in order to compare the performance of the different nonparallel-hyperplane support vector machines, i.e. TWSVM, LSTSVM, FTSVM, EFTWSVM-CIL and our proposed EFLSTSVM, we give two-dimensional scatter plots [14, 18] of part of the test points for the above classifiers. Here the Australian UCI dataset is selected as a practical example, and the corresponding scatter plots for about 20% of the data points are shown in Fig. 4, where the coordinates \( (d_{i}^{ + } ,d_{i}^{ - } ) \) are the respective distances of a test point \( x_{i} \) to the two hyperplanes. Specifically, in Fig. 4, subfigures (a)–(e) show the results obtained by TWSVM, LSTSVM, FTSVM, EFTWSVM-CIL and our proposed EFLSTSVM, respectively. From Fig. 4, it can be seen that for all methods most test samples are clustered around the corresponding hyperplanes, and it is clearly noticeable that our proposed EFLSTSVM obtains better clustered points and better separated classes than the other methods.
4.3 Image Recognition
In this subsection, we apply our method to image recognition. Three well-known and publicly available databases corresponding to typical image classification tasks, i.e., handwritten digits (USPS), objects (COIL-20) and face recognition (AR), are used to evaluate our proposed EFLSTSVM against TWSVM, LSTSVM, EFSVM, FTSVM and EFTWSVM-CIL.
The USPS database [41] consists of gray-scale handwritten digit images from 0 to 9, as shown in Fig. 5. Each digit contains 1100 images, and the size of each image is 16 × 16 pixels with 256 gray levels. Here we select five pairwise digits of varying difficulty for odd versus even digit classification.
COIL-20 [42] is a database of gray-scale images of 20 objects, which are illustrated in Fig. 6. The objects were placed on a motorized turntable against a black background. Images of the objects were taken at pose intervals of 5°, which corresponds to 72 images per object. In our experiments, we have resized each of the original 1440 images to 32 × 32 pixels.
The AR database [43] contains 100 subjects and each subject has 26 face images taken in two sessions. For each session, there are 13 face images. Here 14 unoccluded images from the two sessions of each person are chosen for experiments, which are shown in Fig. 7. In our experiments, the 1400 images are all cropped into the same size of 40 × 30 pixels.
For these datasets, we randomly partition the images of each class into two equal-sized parts, one selected for training and the other used for testing. This process is repeated ten times. We only consider the Gaussian kernel for these methods, and Table 5 lists the experimental results of these methods on the USPS, COIL-20 and AR datasets. It can be seen that the proposed EFLSTSVM obtains better classification performance than the other five methods on the USPS datasets. Although our proposed EFLSTSVM obtains lower classification accuracies than EFTWSVM-CIL on the COIL-20 and AR datasets, it still obtains higher accuracies than the other four methods.
4.4 NDC Datasets
In this subsection, we conduct experiments on large scale classification datasets. David Musicant's NDC Data Generator [44] is used to evaluate the computation time of all methods with respect to the number of data points. Table 6 lists a description of the NDC datasets; each dataset is divided into a training set and a testing set.
For the experiments on the NDC datasets, we fix the parameters of all methods to be the same (i.e. \( c_{i} = 1,\sigma = 1,k = 9 \)). The training accuracy, testing accuracy and training time of the linear and nonlinear classifiers are reported in Tables 7 and 8, respectively. In particular, Table 7 shows the comparison results for linear TWSVM, LSTSVM, EFSVM, FTSVM, EFTWSVM-CIL and our proposed EFLSTSVM on the NDC datasets. In Table 7, we can see that our EFLSTSVM obtains comparable accuracies and runs faster than the other methods on most datasets. In addition, for the nonlinear case, Table 8 shows the comparison results of all methods on the NDC datasets with the Gaussian kernel. The results on these datasets also illustrate that LSTSVM and our EFLSTSVM are much faster than TWSVM, EFSVM, FTSVM and EFTWSVM-CIL. The reason might be that LSTSVM and our proposed EFLSTSVM only solve systems of linear equations rather than the quadratic programs of the other methods.
For the linear case, taking the NDC-10000 dataset as an example, the training time of TWSVM is 17.5482 s, EFSVM is 64.2797 s, FTSVM is 17.2046 s and EFTWSVM-CIL is 18.6533 s, while LSTSVM takes only 0.0347 s and our EFLSTSVM 0.0702 s. Moreover, for the nonlinear case, the training times of all methods on three large scale NDC datasets, i.e. NDC-5000, NDC-8000 and NDC-10000, are shown in Fig. 8. Thus, the results of Tables 7 and 8 and Fig. 8 indicate the efficiency of our proposed EFLSTSVM when dealing with large scale problems.
4.5 Further Discussion
The proposed EFLSTSVM has several parameters: the number of nearest neighbours k, the positive penalty parameters \( c_{1} ,c_{2} ,c_{3} ,c_{4} \) and, for the nonlinear case, the kernel width parameter \( \sigma \). These parameters significantly affect the classification performance of the proposed EFLSTSVM. In order to investigate their influence, we select 5 datasets from Table 4 and discuss their effects on the classification performance of EFLSTSVM. For simplicity, we set the penalty parameters \( c_{1} = c_{3} \) and \( c_{2} = c_{4} \). In addition, it should be noted that when discussing one parameter, the other parameters are set to the best values selected by 5-fold cross-validation. Taking the Sonar dataset as an example, its best parameters are \( k = 13,c_{1} { = }c_{3} { = 1},c_{2} { = }c_{4} { = 2}^{ - 6} \) and \( \sigma = 1 \). Specifically, when discussing k, k is selected from the set \( \{ 1,3, \ldots ,17,19\} \) and the other parameters are set to \( c_{1} { = }c_{3} { = 1,}c_{2} { = }c_{4} { = 2}^{ - 6} ,\sigma = 1 \). When discussing \( \sigma \), \( \sigma \) is selected from the set \( \{ 2^{ - 8} ,2^{ - 7} , \ldots ,2^{7} ,2^{8} \} \) and the other parameters are set to \( c_{1} { = }c_{3} { = 1,}c_{2} { = }c_{4} { = 2}^{ - 6} ,k = 13 \). When discussing \( c_{i} \), the parameter k is set to 13, \( \sigma \) is set to 1, and \( c_{1} = c_{3} \), \( c_{2} = c_{4} \) are selected from the set \( \{ 2^{ - 8} ,2^{ - 7} , \ldots ,2^{7} ,2^{8} \} \). Finally, the classification accuracies of the proposed EFLSTSVM with respect to k, \( \sigma \) and \( c_{i} \) on the adopted datasets are shown in Figs. 9, 10 and 11, respectively.
Further, according to (44) and (45), the solution of nonlinear EFLSTSVM requires computing a matrix inverse of size \( (m + 1) \times (m + 1) \) twice, whose cost becomes high when \( m \) is large. However, using the Sherman–Morrison–Woodbury (SMW) formula [45], the solutions of (42) and (43) can be computed using inverses of smaller dimension than \( (m + 1) \times (m + 1) \). Thus, the expressions (44) and (45) can be rewritten as follows.
where \( \bar{I}_{ + } \in R^{{m_{1} \times m_{1} }} ,\bar{I}_{ - } \in R^{{m_{2} \times m_{2} }} \) are identity matrices.
After using the SMW formula, according to (49), (50), (51) and (52), the solutions of (42) and (43) require inverting matrices of size \( m_{1} \times m_{1} \) and \( m_{2} \times m_{2} \) instead. Since \( m_{1} \) and \( m_{2} \) are both smaller than \( m \), the SMW formula reduces the computational cost in our experiments.
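The underlying identity can be checked numerically. For a matrix of the form \( \alpha I + P^{T}P \) with \( P \in R^{r \times d} \) and \( r < d \) (the structure appearing in (44)–(45)), SMW gives \( (\alpha I + P^{T}P)^{-1} = \frac{1}{\alpha}\big(I - P^{T}(\alpha I_{r} + PP^{T})^{-1}P\big) \), so only an \( r \times r \) inverse is needed. A sketch (helper name ours):

```python
import numpy as np

def smw_inverse(alpha, P):
    """Invert (alpha*I + P^T P), of size d x d, via the SMW identity using
    only an r x r inverse, where (r, d) = P.shape. This mirrors how
    Eqs. (49)-(52) reduce the (m+1) x (m+1) inverses of (44)-(45) to
    m1 x m1 and m2 x m2 inverses."""
    r, d = P.shape
    inner = np.linalg.inv(alpha * np.eye(r) + P @ P.T)   # small r x r inverse
    return (np.eye(d) - P.T @ inner @ P) / alpha
```

For the nonlinear EFLSTSVM, \( d = m + 1 \) while \( r \) is \( m_{1} \) or \( m_{2} \), so the saving grows with the size of the full kernel matrix.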
5 Conclusions
In this paper, an entropy-based fuzzy least squares twin support vector machine (EFLSTSVM) is proposed, which uses entropy to evaluate the class certainty of each sample and computes the corresponding fuzzy membership from this certainty. The method pays more attention to samples with higher class certainty, yielding a more robust decision surface and thereby enhancing classification accuracy and generalization ability. Experimental results on synthetic and real-world datasets illustrate the effectiveness of the proposed EFLSTSVM. It should be pointed out that EFLSTSVM involves many parameters, so parameter selection is a practical issue. In future work, we will therefore develop techniques to select the optimal parameters and further improve the performance of EFLSTSVM. In addition, extending EFLSTSVM to multi-class classification [46, 47], multi-view classification [48, 49] and semi-supervised classification [50, 51], as well as expanding its practical application areas, are also interesting directions.
References
Cortes C, Vapnik VN (1995) Support-vector networks. Mach Learn 20(3):273–297
Osuna E, Freund R, Girosi F (1997) Training support vector machines: an application to face detection. In: Proceedings of computer vision and pattern recognition, pp 130–136
Noble WS (2004) Kernel methods in computational biology. In: Schölkopf B, Tsuda K, Vert JP (eds) Support vector machine applications in computational biology. MIT, Cambridge, pp 71–92
Isa D, Lee LH, Kallimani VP, Rajkumar R (2008) Text document preprocessing with the Bayes formula for classification using the support vector machine. IEEE Trans Knowl Data Eng 20(9):1264–1272
Khan L, Awad M, Thuraisingham B (2007) A new intrusion detection system using support vector machines and hierarchical clustering. VLDB J 16(4):507–521
Schmidt M, Gish H (1996) Speaker identification via support vector classifiers and hierarchical clustering. In: Proceedings of IEEE international conference on acoustics, speech, and signal processing, pp 105–108
Zhang J, Liu Y (2004) Cervical cancer detection using SVM-based feature screening. In: Proceedings of international conference on medical image computing and computer-assisted intervention, pp 873–880
Fung G, Mangasarian OL (2001) Proximal support vector machine classifiers. In: Proceedings KDD-2001, knowledge discovery and data mining, pp 77–86
Mangasarian OL, Wild EW (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74
Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
Shao YH, Zhang CH, Wang XB, Deng NY (2011) Improvements on twin support vector machines. IEEE Trans Neural Netw 22(6):962–968
Peng XJ (2011) TPMSVM: a novel twin parametric-margin support vector machine for pattern recognition. Pattern Recognit 44(10):2678–2692
Qi ZQ, Tian YJ, Shi Y (2013) Robust twin support vector machine for pattern classification. Pattern Recognit 46(1):305–316
Tian YJ, Qi ZQ, Ju XC, Shi Y, Liu XH (2014) Nonparallel support vector machines for pattern classification. IEEE Trans Cybern 44(7):1067–1079
Rastogi R, Saigal P, Chandra S (2018) Angle-based twin parametric-margin support vector machine for pattern classification. Knowl Based Syst 139:64–77
Mehrkanoon S, Huang XL, Suykens JAK (2014) Non-parallel support vector classifiers with different loss functions. Neurocomputing 143:294–301
Chen SG, Wu XJ, Zhang RF (2016) A novel twin support vector machine for binary classification problems. Neural Process Lett 44(3):795–811
Ding SF, An YX, Zhang XK, Wu FL, Xue Y (2017) Wavelet twin support vector machine based on glowworm swarm optimization. Neurocomputing 225:157–163
Xu YT, Yang ZJ, Pan XL (2017) A novel twin support vector machine with pinball loss. IEEE Trans Neural Netw Learn Syst 28(2):359–370
Lin CF, Wang SD (2002) Fuzzy support vector machines. IEEE Trans Neural Netw 13(2):464–471
Batuwita R, Palade V (2010) FSVM-CIL: fuzzy support vector machines for class imbalance learning. IEEE Trans Fuzzy Syst 18(3):558–571
Wang Y, Wang S, Lai KK (2005) A new fuzzy support vector machine to evaluate credit risk. IEEE Trans Fuzzy Syst 13(6):820–831
Tao Q, Wang J (2004) A new fuzzy support vector machine based on the weighted margin. Neural Process Lett 20(3):139–150
An WJ, Liang MG (2013) Fuzzy support vector machine based on within-class scatter for classification problems with outliers or noises. Neurocomputing 110(7):101–110
Li K, Ma HY (2013) A fuzzy twin support vector machine algorithm. Int J Appl Innov Eng Manag 2(3):459–465
Chen SG, Wu XJ (2017) A new fuzzy twin support vector machine for pattern classification. Int J Mach Learn Cybern 9(9):1553–1564
Shen H, Li F, Wu ZG, Park JH, Sreeram V (2018) Fuzzy-model-based nonfragile control for nonlinear singularly perturbed systems with semi-Markov jump parameters. IEEE Trans Fuzzy Syst 26(6):3428–3439
Shen H, Men YZ, Wu ZG, Cao JD, Lu GP (2019) Network-based quantized control for fuzzy singularly perturbed semi-Markov jump systems and its application. IEEE Trans Circuits Syst I Regul Pap 66(3):1130–1140
Shen H, Li F, Yan HC, Karimi HR, Lam HK (2018) Finite-time event-triggered H∞ control for T-S fuzzy Markov jump systems. IEEE Trans Fuzzy Syst 26(5):3122–3135
Fan Q, Wang Z, Li DD, Gao DQ, Zha HY (2017) Entropy-based fuzzy support vector machine for imbalance datasets. Knowl Based Syst 115:87–99
Gupta D, Richhariya B, Borah P (2018) A fuzzy twin support vector machine based on information entropy for class imbalance learning. Neural Comput Appl. https://doi.org/10.1007/s00521-018-3551-9
Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
Shannon C (2001) A mathematical theory of communication. ACM SIGMOBILE Mob Comput Commun Rev 5(1):3–55
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
Chen SG, Wu XJ, Xu J (2016) Locality preserving projection twin support vector machine and its application in classification. J Algorithms Comput Technol 10(2):65–72
Ripley BD (2008) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Murphy PM, Aha DW (1992) UCI repository of machine learning databases. University of California, Irvine. http://www.ics.uci.edu/~mlearn. Accessed 10 Jan 2018
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Iman RL, Davenport JM (1980) Approximations of the critical region of the Friedman statistic. Commun Stat Theory Methods 9(6):571–595
The USPS Database. http://www.cs.nyu.edu/roweis/data.html. Accessed 15 Nov 2018
Nene SA, Nayar SK, Murase H (1996) Columbia object image library (COIL-20). Technical report CUCS-005-96, February
Martinez AM, Benavente R (1998) The AR face database. CVC technical report #24, June
Musicant DR (1998) NDC: Normally distributed clustered datasets. Computer Science Department, University of Wisconsin, Madison, USA. http://research.cs.wisc.edu/dmi/svm/ndc/. Accessed 16 Feb 2018
Golub GH, Van Loan CF (2012) Matrix computations, vol 3. JHU Press, Baltimore
Nasiri JA, Charkari NM, Jalili S (2015) Least squares twin multi-class classification support vector machine. Pattern Recognit 48(3):984–992
Chen SG, Wu XJ (2017) Multiple birth least squares support vector machine for multi-class classification. Int J Mach Learn Cybern 8(6):1731–1742
Tang JJ, Li DW, Tian YJ, Liu DL (2018) Multi-view learning based on nonparallel support vector machine. Knowl Based Syst 158:94–108
Houthuys L, Langone R, Suykens JAK (2018) Multi-view least squares support vector machines classification. Neurocomputing 282:78–88
Qi ZQ, Tian YJ, Shi Y (2012) Laplacian twin support vector machine for semi-supervised classification. Neural Netw 35:46–53
Chen WJ, Shao YH, Deng NY, Feng ZL (2014) Laplacian least squares twin support vector machine for semi-supervised classification. Neurocomputing 145:465–476
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61702012), the University Outstanding Young Talent Support Project of Anhui Province of China (Grant No. gxyq2017026), the University Natural Science Research Project of Anhui Province of China (Grant Nos. KJ2016A431, KJ2017A361 and KJ2017A368) and the Program for Innovative Research Team in Anqing Normal University.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interests regarding the publication of this paper.
Cite this article
Chen, S., Cao, J., Chen, F. et al. Entropy-Based Fuzzy Least Squares Twin Support Vector Machine for Pattern Classification. Neural Process Lett 51, 41–66 (2020). https://doi.org/10.1007/s11063-019-10078-w