1 Introduction

Microarray technology can simultaneously measure the expression levels of thousands of genes, which can be used for identifying and classifying cancer diseases [6, 16, 19, 20, 26, 27, 36, 37, 39]. Many methods have been applied to cancer classification, such as support vector machines (SVMs) [4, 16] and k-nearest neighbor (kNN) [4, 19]. Generally, a DNA microarray data set consists of a large number of genes and a relatively small number of samples, and not all of the genes are useful for cancer classification tasks. To find the useful genes in gene expression data, it is therefore necessary to perform gene selection (feature selection) before cancer classification.

Many gene selection methods have been proposed, such as genetic algorithm-based methods [19, 26] and SVM-based methods [4, 10, 12, 16,17,18, 38, 46]. In this paper, we focus on gene selection methods based on SVMs. An embedded feature selection algorithm based on SVMs was proposed in [18]; it adaptively identifies important features by introducing data-driven weights, and implements classification and gene selection simultaneously. However, this method requires tuning several parameters, and its performance largely depends on them. The \(\ell_{2,0}\) SVM, a variant of SVM, can also be applied to feature selection [31]. For this kind of algorithm, how to design the objective function based on the standard SVM is the key and difficult issue.

Lee and Leu proposed a hybrid feature selection algorithm based on SVM for microarray data analysis [17]. Specifically, this method uses a genetic algorithm to generate a number of gene subsets, the chi-square test to select a proper number of top-ranked genes, and SVM to verify the efficiency of the selected genes. Weston et al. proposed a wrapper feature selection algorithm based on SVM, which finds useful genes by minimizing bounds on the leave-one-out error using gradient descent [38].

Guyon et al. proposed the SVM-RFE (recursive feature elimination) algorithm, which is the most representative gene selection algorithm based on SVMs [16]. Duan et al. presented a new gene selection method based on SVM-RFE, called MPSVM-RFE; at each step, it trains multiple linear SVMs on subsamples of the training data and computes the feature ranking scores from a statistical analysis of the weight vectors [10]. To deal with multi-class cancer classification tasks, a multi-class SVM-RFE (MSVM-RFE) [46], a multiple SVM-RFE (MPSVM-RFE) [40] and a multiple support vector data description recursive feature elimination (MSVDD-RFE) [4] have been developed. Once gene selection is achieved, a classifier is applied subsequently. Although SVM-RFE methods can select useful genes for cancer classification, their computational complexity is very high when the number of genes is large.

Fortunately, the 1-norm SVM can be applied to perform fast gene selection [12] and feature selection for conjoint analysis [22]. The 1-norm SVM is a variant of SVM [2, 23, 43, 45, 47]. It is well known that the objective function of SVM consists of a 2-norm regularization penalty and the hinge loss [13, 29, 34, 35]; the 1-norm SVM instead simultaneously minimizes a 1-norm regularization penalty and the hinge loss. Since the 1-norm regularization penalty, as a substitute for the 0-norm regularization penalty, can also induce a sparse solution, it has been shown that the 1-norm SVM yields a sparser representation than SVM [43].

Fung and Mangasarian used the Newton method to minimize an exterior penalty problem constructed from the dual of the 1-norm SVM, a method called NLPSVM, and applied NLPSVM to gene selection [12]. Since NLPSVM with the linear kernel directly picks out the useful genes and does not use the recursive feature elimination scheme, it is much faster than SVM-RFE methods. In addition, NLPSVM can classify microarray data in its own right. As a variant of the 1-norm SVM, the 1-norm SVM with a squared loss (1-norm SVMSL) was proposed in [42]. Since the 1-norm SVMSL uses orthogonal matching pursuit (OMP) to find an approximate solution, its training speed is very fast. However, the approximate solution may cause a loss in classification performance.

To select genes quickly and achieve better classification performance, this paper designs a cancer classification framework. The 1-norm SVMSL is first used to select useful genes, and a subsequent classifier is then applied to the reduced data. We perform extensive experiments on DNA microarray data sets.

The rest of the paper is organized as follows. Section 2 addresses related work on gene selection based on SVMs. Section 3 presents the framework of this paper. Experimental results on real-world data sets are given in Section 4. Section 5 concludes the paper.

2 Related work

This section introduces gene selection methods based on SVMs; we discuss SVM-RFE and the 1-norm SVM in turn. Generally, cancer classification can be described as follows. Given a set of training samples \(X=\{(\textbf {x}_{i},y_{i})\}_{i = 1}^{n}\), where \({{\textbf {x}}_{i}} \in {\mathbb {R}^{D}}\) denotes the gene expression of a patient, \(y_{i}\in\{-1,+1\}\) is the label of \(\textbf{x}_{i}\), \(D\) is the number of genes (or the dimension of samples), and \(n\) is the number of samples, we try to find genes in this training set that are useful for discriminating cancer diseases. The meaning of the label \(y_{i}\) differs across datasets. In the Breast Cancer dataset [33], for example, the label “+1” means “non-relapse” and the label “−1” means “relapse”.

2.1 SVM-RFE

The support vector machine, a state-of-the-art learner in machine learning and pattern recognition, is famous for its good generalization performance, sparse model representation, and nonlinear learning ability [13, 29, 34, 35].

As a classifier, SVM requires a set of training samples X, and solves the following primal optimization problem:

$$\begin{array}{@{}rcl@{}} \min_{\textbf{w},b,{\xi}}& &\frac{1}{2}\|\textbf{w}\|^{2}_{2}+C\sum\limits_{i=1}^{n}\xi_{i} \\ subject~to & &y_{i}\left( \textbf{w}^{T}\textbf{x}_{i}+b\right)\geq 1-\xi_{i}, \\ & & \xi_{i} \geq 0, i=1,\cdots, n \end{array} $$
(1)

where \(\xi_{i}\) is a slack variable, \(\|\cdot\|_{2}\) denotes the 2-norm of a vector, \(C>0\) is the regularization factor which controls the balance between the empirical risk and the capacity of SVM, and \(\textbf{w}\) and \(b\) are respectively the weight vector and the threshold in the hypothesis function

$$ f\left( \textbf{x}\right)=\textbf{w}^{T}\textbf{x}+b $$
(2)

where \(\textbf{w}=[w_{1},w_{2},\cdots,w_{D}]^{T}\). Once we have \(\textbf{w}\) and \(b\), we can assign a label to any sample \(\textbf{x}\) using

$$ f^{*}(\textbf{x})=sign\left( f\left( \textbf{x}\right)\right)=sign\left( \textbf{w}^{T}\textbf{x}+b\right) $$
(3)

where \(sign(\cdot)\) is the sign function.

To implement gene selection for cancer classification, SVM-RFE was proposed in [16]. In a sequential backward elimination manner, SVM-RFE iteratively deletes redundant genes and retains useful ones. In each iteration, SVM-RFE trains a linear SVM model on the reserved genes and deletes one gene according to the weight scores \(c_{j}={w_{j}^{2}}\), \(j=1,\cdots,D\). The higher \(c_{j}\) is, the greater the importance of the \(j\)th gene. The detailed algorithm for SVM-RFE is shown in Algorithm 1 [16]. Note that the order of elements in \(R\) is very important in Step 2(d): the last element in \(R\) corresponds to the gene with the highest score, i.e., the most important one.

A novel gene selection method, MPSVM-RFE, was proposed based on SVM-RFE [10]. In each iteration, MPSVM-RFE computes the ranking scores from a statistical analysis of the weight vectors of multiple linear SVMs trained on subsets of the original training data, where cross-validation (CV) is adopted to resample the training data. Thus, the computational complexity of MPSVM-RFE is much higher than that of SVM-RFE.

Algorithm 1 SVM-RFE [16]
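
Since the Algorithm 1 figure is not reproduced here, the following Python sketch illustrates the elimination loop just described; it removes one gene per iteration and uses scikit-learn's LinearSVC, which is our illustrative choice rather than the authors' MATLAB/C implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y, C=1.0):
    """Return gene indices of X (n_samples x n_genes) ordered from least
    to most important; y holds labels in {-1, +1}."""
    surviving = list(range(X.shape[1]))   # genes still under consideration
    ranking = []                          # filled from least important onward
    while surviving:
        w = LinearSVC(C=C).fit(X[:, surviving], y).coef_.ravel()
        worst = int(np.argmin(w ** 2))    # ranking score c_j = w_j^2
        ranking.append(surviving.pop(worst))
    return ranking

# The last d entries of the returned list are the d top-ranked genes,
# matching the use of R described in Step 2(d).
```

Retraining a linear SVM once per eliminated gene over more than ten thousand genes makes the cost of this loop apparent, which is the complexity issue raised in Section 1.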

2.2 1-norm SVM

Since the 1-norm SVM yields sparse solutions, the linear 1-norm SVM can be used for feature or variable selection [3]. As we know, there are several different LP forms of the 1-norm SVM [2, 8, 23, 24, 43, 45, 47]; however, these forms have been proved essentially equivalent under proper parameter choices. Here we introduce the LP form presented in [43], which can be solved by the simplex method, the interior-point method, NLPSVM [12], the column generation Newton (CGN) method [44] and the row-column generation (RCG) method [41]. Among these methods, NLPSVM is relatively fast in the linear case; CGN and RCG were proposed to improve the training speed in the nonlinear case.

Consider the training sample set of two classes \(X=\{(\textbf{x}_{1},y_{1}),(\textbf{x}_{2},y_{2}),\ldots,(\textbf{x}_{n},y_{n})\}\), where \(\textbf{x}_{i}\in \mathbb {R}^{D}\) and \(y_{i}\in\{+1,-1\}\). The 1-norm SVM simultaneously minimizes the 1-norm regularization penalty and the hinge loss, which can be expressed as a linear programming problem. The primal problem of the 1-norm SVM is:

$$ \begin{array}{ll} \min \hspace{0.2cm} &\sum\limits_{j=1}^{D}\left( \alpha^{+}_{j}+\alpha^{-}_{j}\right)+ \sigma\left( \beta^{+}+\beta^{-}\right) + C\sum\limits_{i=1}^{n} \gamma_{i} \\ s.t.\hspace{0.2cm} & y_{i}\left[\textbf{x}_{i}^{T}\left( \boldsymbol{\upalpha}^{+} - \boldsymbol{\upalpha}^{-}\right) + \left( \beta^{+} - \beta^{-}\right)\right] \geq 1-\gamma_{i} \\ &\beta^{+},\beta^{-} \geq 0, \gamma_{i}\geq 0, i=1, \cdots, n\\ & \alpha_{j}^{+},\alpha_{j}^{-} \geq 0, j=1, \cdots, D \end{array} $$
(4)

where \(C>0\) is the regularization factor, \(\sigma>0\) is a small constant, \(\boldsymbol {\upalpha }^{+}=\left [\alpha _{1}^{+},\cdots ,\alpha _{D}^{+}\right ]^{T}\), \(\boldsymbol {\upalpha }^{-}=\left [\alpha _{1}^{-},\cdots ,\alpha _{D}^{-}\right ]^{T}\), \(\beta ^{+}\in \mathbb {R}\) and \(\beta ^{-}\in \mathbb {R}\) are model coefficients for the 1-norm SVM, and \(\boldsymbol {\upgamma }=\left [\gamma _{1},\cdots ,\gamma _{n}\right ]^{T}\in \mathbb {R}^{n}\) is a loss vector.

Naturally, (4) can be rewritten in matrix form as

$$ \begin{array}{ll} \min_{\textbf{u}}\hspace{0.2cm} &\textbf{c}^{T}\textbf{u}\\ s.t.\hspace{0.2cm}&\textbf{A}\textbf{u}\geq \textbf{b}\\ &\textbf{u}\geq \textbf{0} \end{array} $$
(5)

where the variable vector is \(\textbf{u}=\left[(\boldsymbol{\upalpha}^{+})^{T},(\boldsymbol{\upalpha}^{-})^{T},\beta^{+},\beta^{-},\boldsymbol{\upgamma}^{T}\right]^{T}\), \(\textbf{c}=\left[\textbf{1}^{T},\textbf{1}^{T},\sigma,\sigma,C\textbf{1}^{T}\right]^{T}\), \(\textbf{b}=\textbf{1}\), the constraint coefficient matrix is \(\textbf{A}=[\textbf{D}_{y}\textbf{X},-\textbf{D}_{y}\textbf{X},\textbf{y},-\textbf{y},\textbf{I}]\), \(\textbf {X}\in \mathbb {R}^{n \times D}\) is the sample matrix in which each sample is a row vector, \(\textbf{y}=[y_{1},y_{2},\cdots,y_{n}]^{T}\), \(\textbf{D}_{y}\) is the diagonal matrix whose diagonal is \(\textbf{y}\), and \(\textbf{1}\) and \(\textbf{I}\) are the all-ones column vector and the identity matrix of appropriate size, respectively.

The decision function of the linear 1-norm SVM takes the form:

$$\begin{array}{@{}rcl@{}} f^{*}(\textbf{x})&=&sign\left( \left( \boldsymbol{\upalpha}^{+}-\boldsymbol{\upalpha}^{-}\right)^{T}\textbf{x}+(\beta^{+}-\beta^{-})\right)\\ &=&sign\left( \boldsymbol{\upalpha}^{T}\textbf{x}+\beta\right) \end{array} $$
(6)

where \(\boldsymbol{\upalpha}=\boldsymbol{\upalpha}^{+}-\boldsymbol{\upalpha}^{-}\) and \(\beta=\beta^{+}-\beta^{-}\).

Once the problem (5) is solved, we obtain \(\boldsymbol{\upalpha}\) and \(\beta\). Since most elements of \(\boldsymbol{\upalpha}\) are zero, gene selection is implemented automatically by searching for nonzero coefficients in \(\boldsymbol{\upalpha}\): if \(\alpha_{j}\neq 0\), then the \(j\)th gene is selected.
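
To make (5) concrete, the sketch below assembles A, c and b from a training set, solves the LP, and reads the selected genes off the nonzero entries of α. The paper solves this LP with GLPK or NLPSVM; the use of SciPy's linprog and all variable names here are our illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def one_norm_svm(X, y, C=1.0, sigma=1e-4):
    """Linear 1-norm SVM via the LP (5). X: (n, D); y: entries in {-1, +1}."""
    n, D = X.shape
    DyX = y[:, None] * X                              # D_y X
    # u = [alpha+; alpha-; beta+; beta-; gamma], all >= 0
    A = np.hstack([DyX, -DyX, y[:, None], -y[:, None], np.eye(n)])
    c = np.concatenate([np.ones(2 * D), [sigma, sigma], C * np.ones(n)])
    # linprog expects A_ub u <= b_ub, so negate the constraint A u >= b = 1
    res = linprog(c, A_ub=-A, b_ub=-np.ones(n), bounds=(0, None))
    u = res.x
    alpha = u[:D] - u[D:2 * D]                        # alpha = alpha+ - alpha-
    beta = u[2 * D] - u[2 * D + 1]                    # beta = beta+ - beta-
    selected = np.flatnonzero(np.abs(alpha) > 1e-8)   # genes with alpha_j != 0
    return alpha, beta, selected
```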

The 1-norm SVM solved by NLPSVM is much faster than SVM-RFE for gene selection since it does not require a recursive process. The complexity of each iteration of NLPSVM is \(O(\min(n,D)^{3})\) when the Sherman-Morrison-Woodbury identity is applied, where \(n\) is the number of samples and \(D\) is the number of genes.

3 Framework of the proposed method

The classification framework based on the 1-norm SVMSL is shown in Fig. 1, where the linear SVM is taken as an illustration of the subsequent classifier, \(X\) is the given training sample set, and \(X^{\prime}\) is the reduced training sample set. We first introduce the 1-norm SVMSL, and then discuss the gene selection and classification stages in turn.

Fig. 1 Classification framework based on the 1-norm SVMSL

3.1 1-norm SVMSL

Given a training set \(X=\{(\textbf {x}_{i},y_{i})\}_{i=1}^{n}\) with \(\textbf{x}_{i}\in \mathbb {R}^{D}\), our goal in the gene selection stage is to select \(d\) genes out of \(D\), where \(d\ll D\). The 1-norm SVMSL is applied to select these \(d\) genes.

By replacing the hinge loss in (4) with the squared loss, we obtain the optimization problem of the 1-norm SVMSL:

$$\begin{array}{@{}rcl@{}} &&\min\left\|\boldsymbol{\upalpha}^{+}\right\|_{1}\,+\,\left\|\boldsymbol{\upalpha}^{-}\right\|_{1}\,+\,\sigma\left( \beta^{+}+\beta^{-}\right)+ C \left\|\textbf{1}- \textbf{D}_{y}\left( \textbf{X}\boldsymbol{\upalpha}+\beta\right)\right\|_{2}^{2} \\ &&s.t.~\beta^{+}\!\geq\! 0,\beta^{-} \geq 0, \boldsymbol{\upalpha}^{+}\geq \textbf{0},\boldsymbol{\upalpha}^{-} \geq \textbf{0} \end{array} $$
(7)

where \(\|\cdot\|_{1}\) denotes the 1-norm of a vector. Absorbing \(\sigma\) into a uniform 1-norm weight \(\tau\), the standard matrix formulation of (7) can be expressed as

$$ \begin{array}{ll} \underset{\textbf{u}}{\min} \hspace{0.2cm} &\tau \|\textbf{u}\|_{1}+ \frac{1}{2} \left\|\textbf{c}- \textbf{Au}\right\|_{2}^{2} \\ s.t. & \textbf{u} \geq \textbf{0} \end{array} $$
(8)

where \(\textbf{u}=\left[(\boldsymbol{\upalpha}^{+})^{T},(\boldsymbol{\upalpha}^{-})^{T},\beta^{+},\beta^{-}\right]^{T}\), \(\textbf{A}=[\textbf{D}_{y}\textbf{X},-\textbf{D}_{y}\textbf{X},\textbf{y},-\textbf{y}]\), \(\textbf{c}=\textbf{1}\), and \(\tau>0\) is a positive real parameter which plays the same role as the parameter \(C\) in both SVM and the 1-norm SVM.

In sparse signal reconstruction, problem (8) is equivalent, under some conditions, to both the least absolute shrinkage and selection operator (LASSO) and the quadratically constrained linear problem (QCLP) [1]. Thus, orthogonal matching pursuit (OMP), which has been applied to approximately solve QCLP [7, 9, 32], can also be used to approximately solve the 1-norm SVMSL. In addition, OMP needs only the parameter \(d\), instead of \(\tau\), when solving (8).
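
The following sketch shows how an OMP-style greedy loop with at most d steps can approximately solve (8); since u must be nonnegative, we refit the active set with nonnegative least squares at each step. This loop is our own reconstruction (the paper uses the OMP routine in the SparseLab package), so the function name and the NNLS refit are assumptions.

```python
import numpy as np
from scipy.optimize import nnls

def omp_nonneg(A, c, d):
    """Greedily minimize ||c - A u||_2 with u >= 0 and at most d nonzeros."""
    m = A.shape[1]
    u = np.zeros(m)
    active = []                           # indices of the selected columns
    residual = c.astype(float).copy()
    for _ in range(d):
        corr = A.T @ residual             # correlation with current residual
        corr[active] = -np.inf            # never reselect an active column
        j = int(np.argmax(corr))
        if corr[j] <= 0:                  # no column can reduce the residual
            break
        active.append(j)
        u_act, _ = nnls(A[:, active], c)  # nonnegative LS on the active set
        residual = c - A[:, active] @ u_act
        u[:] = 0.0
        u[active] = u_act
    return u
```

Because d caps the number of greedy steps, it directly bounds the number of selected columns, which is why the method saturates at a maximum number of selected genes in the experiments below.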

3.2 Gene selection and classification

When problem (8) is solved approximately by OMP, we obtain the coefficient vector \(\boldsymbol{\upalpha}\). Similar to NLPSVM, the 1-norm SVMSL can select genes automatically. However, since the solution of the 1-norm SVMSL is only approximate, the classification performance obtained by directly using \(\boldsymbol{\upalpha}\) is not consistent with that generated by an exact solution. The experimental results in [44] also show that the performance of approximate algorithms can be better or worse than that of exact algorithms.

To avoid this situation, we can use a subsequent classifier, such as a support vector machine or nearest neighbor, to classify the reduced dataset. As shown in Fig. 1, the framework consists of a training process and a test process, which are discussed in turn below.

The training process is given in Algorithm 2, which requires training both the 1-norm SVMSL and the linear SVM. Let \(X=\{(\textbf {x}_{i},y_{i})\}_{i=1}^{n}\) denote the given training sample set, and assume that we need to select \(d\) genes out of \(D\), where \(d\ll D\). The 1-norm SVMSL is first trained on \(X\) and outputs the selected gene subset \(S\) with \(|S|=d\). According to the subset \(S\), we select the useful genes and generate a new training set \(X^{\prime }=\{(\textbf {x}^{\prime }_{i},y_{i})\}_{i=1}^{n}\), where \(\textbf {x}^{\prime }_{i}\in \mathbb {R}^{d}\). The subsequent classifier, for example the linear SVM, is then trained on \(X^{\prime}\).

For an unseen sample \(\textbf{x}\), we first reduce it to \(\textbf{x}^{\prime}\) by employing \(S\), and then apply the trained model to \(\textbf{x}^{\prime}\) to assign a class label to \(\textbf{x}\).

Algorithm 2 Training process of the proposed framework
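
The Algorithm 2 figure is likewise not reproduced; under the same assumptions as the earlier sketches (omp_nonneg from Section 3.1, and scikit-learn's LinearSVC standing in for the subsequent classifier), the two training stages and the test step can be sketched as follows.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train(X, y, d):
    """Stage 1: select genes with the 1-norm SVMSL; stage 2: train linear SVM."""
    n, D = X.shape
    DyX = y[:, None] * X
    A = np.hstack([DyX, -DyX, y[:, None], -y[:, None]])  # matrix A in (8)
    u = omp_nonneg(A, np.ones(n), d)                     # c = 1 in (8)
    alpha = u[:D] - u[D:2 * D]
    S = np.flatnonzero(np.abs(alpha) > 1e-8)             # selected gene subset S
    # |S| can fall slightly below d if OMP picks the beta+/beta- columns
    clf = LinearSVC().fit(X[:, S], y)                    # train on reduced set X'
    return S, clf

def predict(S, clf, x):
    """Reduce an unseen sample x to x' via S, then assign it a class label."""
    return clf.predict(x[S].reshape(1, -1))[0]
```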

4 Results and discussions

This section discusses experimental results of the proposed method on cancer classification using gene expression data and compares it with classical gene selection methods, such as SVM-RFE.

4.1 Experiment settings

Four public gene microarray datasets are used to validate the performance of the compared methods: Leukemia-ALLAML [14], Lung Cancer [15], Central Nervous System (CNS) [25], and Prostate Cancer [28]. These four datasets can be downloaded from http://datam.i2r.a-star.edu.sg/datasets/krbd/. All data are normalized so that each gene has mean 0 and variance 1, following the procedure in [11].
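
This gene-wise normalization amounts to the following two lines; the guard against constant genes is our own precaution, not part of the procedure in [11].

```python
import numpy as np

def standardize_genes(X):
    """Scale each gene (column of X) to mean 0 and variance 1."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sd == 0, 1.0, sd)   # avoid dividing by zero
```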

Since a subsequent classifier is required, we provide two classifiers, SVM and nearest neighbor (NN), to observe the performance of the 1-norm SVMSL. The compared methods include SVM-RFE+SVM/NN [16], SVDD-RFE+SVM/NN [4], the 1-norm SVM [12, 43], the 1-norm SVMSL [42], the 1-norm SVM+SVM/NN, and the proposed 1-norm SVMSL+SVM/NN. Since the 1-norm SVM is similar to the 1-norm SVMSL, SVM or NN can also be appended to it, which gives the compared method 1-norm SVM+SVM/NN. Among the many methods for solving the 1-norm SVM, we choose two: the simplex method and the Newton method. The 1-norm SVM with the simplex method yields an exact solution, while the Newton method yields an approximate one. For simplicity, the 1-norm SVM with the simplex method is still denoted by the 1-norm SVM; the 1-norm SVM with the Newton method is NLPSVM, proposed in [12].

All these gene selection methods use the linear kernel. The regularization parameter C for SVM-RFE is selected from the set {0.5, 1, 2, 10, 100}, for SVDD-RFE from the set {1/n, 0.1, 0.5, 1}, and for the 1-norm SVM from the set {0.5, 1, 2, 10, 100}, according to empirical evidence. The regularization parameter is selected by 3-fold cross-validation on the training set with the corresponding classifier; for example, C for SVM-RFE is selected using SVM. For the 1-norm SVMSL, the only parameter is d, which determines the number of selected genes.

We use recall to evaluate the performance of these methods. The recall of the \(j\)th class is defined as

$$ Recall_{j}=\frac{TP_{j}}{TP_{j}+FN_{j}} $$
(9)

where \(TP_{j}\) and \(FN_{j}\) are the numbers of correctly and incorrectly classified samples in class \(j\), respectively.
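
Per-class recall is a direct transcription of (9); a minimal sketch:

```python
import numpy as np

def recall_per_class(y_true, y_pred):
    """Recall_j = TP_j / (TP_j + FN_j) for every class label j."""
    return {j: float(np.mean(y_pred[y_true == j] == j))
            for j in np.unique(y_true)}
```

The average recall reported in the tables and figures below can then be taken as the mean of these per-class values.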

All programs are written in MATLAB and/or C. Both SVM and SVDD are implemented using the LIBSVM package [5], the 1-norm SVM is implemented using the GLPK package [21], and NLPSVM is implemented using the code provided in [12]. In addition, the 1-norm SVMSL is solved using OMP from the SparseLab package [30]. All numerical experiments are performed on a personal computer with a 2.93 GHz Intel(R) Core(TM)2 Duo CPU and 2 GB of memory, running Windows 7 with the MATLAB 7.10 compiler installed.

4.2 Lung cancer

The Lung cancer dataset consists of 181 tissue samples, of which 31 are malignant pleural mesothelioma (MPM) and 150 are lung adenocarcinoma (ADCA). The training set contains 32 of them, 16 MPM and 16 ADCA. The remaining 149 samples are used for testing, with 15 MPM and 134 ADCA. Each sample is described by 12533 genes.

4.2.1 SVM classifier

On the training set, we determine the parameters: C = 10 for SVM-RFE, C = 1 for SVDD-RFE, and C = 2 for the 1-norm SVM, NLPSVM and the 1-norm SVMSL. Note that SVM-RFE and SVDD-RFE can rank genes, but the 1-norm SVMSL and the 1-norm SVM cannot. However, the 1-norm SVMSL has a free parameter d which represents the number of selected genes. Thus, we can select the last d genes in R for the first two methods and set different values of d for the 1-norm SVMSL. We let d vary from 1 to 400, following the experimental setting in [46]. The average recall vs. d for the three methods is shown in Fig. 2.

Fig. 2 Average recall vs. gene number with the linear SVM on the Lung cancer dataset

When d is small, say d < 20, the 1-norm SVMSL is better than SVM-RFE and SVDD-RFE. When d > 25, the performance of the 1-norm SVMSL is unchanged; in fact, the maximum number of genes the 1-norm SVMSL selects on the Lung cancer dataset is 29, so its performance is stable for d ≥ 29. The 1-norm SVM automatically determines the number of selected genes, namely the number of nonzero coefficients in its model. On this dataset, 24 genes are selected by the 1-norm SVM, and 33 by NLPSVM.

Table 1 lists the best average performance with the corresponding gene number and CPU time for all these methods, where the CPU time includes gene selection, classifier training and testing. Table 1 indicates that the 1-norm SVMSL can pick out genes very quickly, and that the use of the subsequent classifier improves the average recall from 98.99% to 99.25%. In fact, for all the 1-norm SVM-like methods, the use of SVM improves classification performance. The 1-norm SVMSL is almost four orders of magnitude faster than SVM-RFE, three orders faster than SVDD-RFE, two orders faster than NLPSVM, and one order faster than the 1-norm SVM. In addition, note that the number of useful genes is very small for the 1-norm SVMSL: only three.

Table 1 Comparison of gene selection methods with SVM on the Lung cancer dataset

4.2.2 NN classifier

The subsequent classifier used here is NN. Similarly, we let d vary from 1 to 400 for the three methods SVM-RFE, SVDD-RFE and the 1-norm SVMSL. The average recall vs. d is shown in Fig. 3. We can see that the 1-norm SVMSL has good average recall when d is small, say d < 10.

Fig. 3 Average recall vs. gene number with NN on the Lung cancer dataset

Table 2 lists the best average performance with the corresponding gene number and CPU time for all these methods, where the CPU time includes the gene selection and classification processes. Since the 1-norm SVM, NLPSVM and the 1-norm SVMSL implement gene selection and classification at the same time, their results are the same as those in Table 1. Unfortunately, the use of NN does not improve the classification performance as we would expect; only for the 1-norm SVM does NN improve classification performance. The ranking of CPU times is the same as in the previous experiment.

Table 2 Comparison of gene selection methods with NN on the Lung cancer dataset

4.2.3 Selected genes by the 1-norm SVMSL

On the Lung cancer dataset, the 1-norm SVMSL selects at most 29 genes based on the training set. However, Tables 1 and 2 show that only three or four genes are enough for discriminating ADCA from MPM: 3 for the SVM classifier and 4 for the NN classifier. Table 3 shows information on the four selected genes, where the first three are for SVM only. These genes are protein-coding genes from Homo sapiens.

Table 3 Genes selected by the 1-norm SVMSL on the Lung cancer dataset

Gene selection methods aim to select genes that are weakly correlated with each other so as to reduce redundancy. Statistical methods offer a way to measure the linear correlation between genes. For any gene pair \((G_{i},G_{j})\), we can compute the correlation coefficient between them, namely,

$$ \rho_{ij}=\frac{Cov(G_{i},G_{j})}{\sqrt{D(G_{i})D(G_{j})}} $$
(10)

where \(Cov(\cdot,\cdot)\) denotes the covariance function, \(D(\cdot)\) denotes the variance function, and \(0\leq\rho_{ij}\leq 1\). \(G_{i}\) is linearly uncorrelated with \(G_{j}\) if and only if \(\rho_{ij}=0\).

Based on Table 1, we check the linear correlation of the selected genes corresponding to the best performance. To see the coefficient distribution, we discretize the correlation coefficients into five intervals, [0.0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.8) and [0.8,1), and compute the probability of each interval. Table 4 gives the coefficient distribution and the average correlation coefficient, which is defined as

$$ \bar{\rho}=\frac{2}{d(d-1)}\sum\limits_{i=1}^{d-1}\sum\limits_{j=i+1}^{d}\rho_{ij} $$
(11)

The number of gene pairs in Table 4 is equal to d(d − 1)/2, where the value of d is given in Table 1. The correlation coefficients for the 1-norm SVMSL are all less than 0.4, which indicates that this method indeed selects weakly correlated genes. Table 2 leads to a similar conclusion.
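
The quantities in Table 4 follow directly from (10) and (11); the sketch below computes the pairwise coefficients, their average, and the interval distribution from the expression matrix restricted to the selected genes. Taking the absolute value reflects the convention 0 ≤ ρ_ij ≤ 1 stated above; function and variable names are ours.

```python
import numpy as np

def correlation_stats(Xs):
    """Xs: (n_samples, d) expression matrix of the d selected genes."""
    R = np.abs(np.corrcoef(Xs, rowvar=False))  # |rho_ij| for all gene pairs
    iu = np.triu_indices_from(R, k=1)          # the d(d-1)/2 distinct pairs
    rho = R[iu]
    rho_bar = rho.mean()                       # average coefficient, eq. (11)
    counts, _ = np.histogram(rho, bins=[0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
    return rho_bar, counts / rho.size          # distribution over 5 intervals
```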

Table 4 Average correlation coefficient and distribution of best gene subsets on the Lung cancer dataset

In summary, we have the following conclusions based on the experimental results on the Lung cancer dataset:

  • Although SVM-RFE+SVM/NN has the best average recall, slightly superior to that of the 1-norm SVMSL+SVM/NN, the 1-norm SVMSL is vastly faster than SVM-RFE. In addition, the 1-norm SVMSL+SVM/NN achieves its best performance with a small number of genes, which is useful for speeding up the test process and reducing data storage.

  • Compared with SVDD-RFE, the 1-norm SVMSL has an advantage in both speed and performance.

  • The 1-norm SVMSL has performance similar to the 1-norm SVM and much better than NLPSVM, while needing less time for gene selection than both.

  • As the subsequent classifier, SVM is much better than NN because SVM improves the average recall of the 1-norm SVMSL.

  • The genes selected by the 1-norm SVMSL are mostly weakly correlated, whereas some gene pairs selected by other methods are highly correlated. For example, 0.79% of the gene pairs selected by SVM-RFE have large correlation coefficients, lying in the interval [0.8,0.9).

4.3 Leukemia-ALLAML

The Leukemia-ALLAML dataset has 72 samples belonging to two classes, ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia). The training set consists of 38 bone marrow samples (27 ALL and 11 AML), described by 7129 probes from 6817 human genes. In addition, 34 test samples are provided, with 20 ALL and 14 AML.

We again let d vary from 1 to 400. The average recall vs. d obtained by SVM-RFE, SVDD-RFE and the 1-norm SVMSL is shown in Fig. 4. We reach a similar conclusion as before: the 1-norm SVMSL achieves its best performance when d is small.

Fig. 4 Average recall vs. gene number on the Leukemia-ALLAML dataset

Table 5 shows the best average performance with the corresponding gene number and CPU time for all these methods. Table 5 also implies that the 1-norm SVMSL can select gene subsets quickly. SVM-RFE has the best average recall among all the methods, followed by the 1-norm SVMSL+SVM and the 1-norm SVM. In addition, NN cannot improve the performance of the 1-norm SVM, NLPSVM or the 1-norm SVMSL, but SVM can improve the performance of both NLPSVM and the 1-norm SVMSL.

Table 5 Comparison on the Leukemia-ALLAML dataset

On this dataset, the 1-norm SVMSL selects at most 32 genes based on the training set. Among these 32 genes, four are enough for discriminating ALL from AML. Table 6 shows information on the four genes. The first two genes, ZYX and MPO, often appear in visualizations; see http://www.biolab.si/supp/bi-cancer/projections/info/leukemia.htm.

Table 6 Genes selected by the 1-norm SVMSL on the Leukemia-ALLAML dataset

Now, we validate the linear correlation of the best gene subsets, i.e., the subsets with the highest average recall obtained by SVM. Table 7 lists the coefficient distribution and the average correlation coefficient. The correlation coefficients for the 1-norm SVMSL all lie between 0.2 and 0.4, which shows that the genes selected by the 1-norm SVMSL are not strongly correlated.

Table 7 Average correlation coefficient and distribution of best gene subsets on the Leukemia-ALLAML dataset

The experimental conclusions on the Leukemia-ALLAML dataset are consistent with those on the Lung cancer dataset, so we do not discuss them further.

4.4 More datasets

In the following, we perform experiments on two more gene expression datasets, CNS and Prostate tumor, which are described as follows:

  • The CNS dataset contains 60 patient samples: 21 survivors (alive after treatment) and 39 failures (succumbed to the disease). There are 7129 genes in the dataset. The training set consists of the first 10 survivors and 30 failures; the other 11 survivors and 9 failures are test points.

  • The Prostate tumor dataset has a training set containing 52 prostate tumor samples and 50 non-tumor (labeled “Normal”) prostate samples with around 12600 genes. An independent set of test samples is also provided, which comes from a different experiment and has a nearly 10-fold difference in overall microarray intensity from the training data. We have removed the extra genes contained in the test samples. There are 25 tumor and 9 normal samples in the test set.

As shown above, NN does not outperform SVM. Thus, we only consider SVM as the subsequent classifier. The parameter settings are the same as before.

The average recall vs. d obtained by SVM-RFE, SVDD-RFE and the 1-norm SVMSL is shown in Fig. 5. If the number of selected genes is large enough, SVDD-RFE can obtain good performance on the CNS and Prostate datasets. The 1-norm SVMSL still achieves its best performance when d is small.

Fig. 5 Average recall vs. gene number when taking SVM as the classifier

Tables 8 and 9 show the experimental results on the two datasets. Without gene selection, the average recall obtained by SVM is very low on both datasets. SVDD-RFE is comparable to SVM-RFE when SVDD-RFE selects more genes. On both datasets, the 1-norm SVMSL+SVM is much better than the other 1-norm SVM-like methods.

Table 8 Comparison on the CNS dataset
Table 9 Comparison on the Prostate tumor dataset

The correlation coefficient distributions of the selected genes are shown in Fig. 6. Again, the linear correlation between the genes selected by the 1-norm SVMSL is very low on both the CNS and Prostate tumor datasets.

Fig. 6 Correlation coefficient distribution

5 Conclusions

This paper proposes a new strategy for cancer classification based on gene expression data. Fast gene selection is implemented by the 1-norm SVMSL, a variant of the 1-norm SVM. Although the 1-norm SVMSL can perform gene selection and classification at the same time, it uses OMP to find an approximate solution, which may not be exact enough for classification. Thus, a subsequent classifier is used to classify the samples described by the genes selected by the 1-norm SVMSL. In theory, any classifier suited to cancer classification can serve as the subsequent classifier; experimental results show that SVM is a proper choice.

In our experiments, we compare the 1-norm SVMSL with SVM-RFE, SVDD-RFE, the 1-norm SVM and NLPSVM on four datasets. SVM-RFE outperforms the other methods in classification performance on three of the four datasets, but it takes a long time to rank genes, more than 1,000 seconds for 12533 genes; note that the 1-norm SVMSL is at least four orders of magnitude faster than SVM-RFE. In addition, the 1-norm SVMSL+SVM achieves acceptable performance relative to SVM-RFE+SVM. Compared with the 1-norm SVM and NLPSVM, the 1-norm SVMSL is almost an order of magnitude faster, and on the four datasets it is better than or comparable to both. Finally, we test the linear correlation between the selected genes; among these gene selection methods, the linear correlation is weakest for the 1-norm SVMSL. In other words, the genes selected by the 1-norm SVMSL are less correlated and less redundant.

It is notable that the 1-norm SVMSL needs only a small number of genes to achieve its best performance, which contributes to fast testing and low storage. However, we cannot yet determine how many genes should be selected, i.e., the value of d in the 1-norm SVMSL. If we select all the genes the 1-norm SVMSL generates, we cannot guarantee the classification performance, especially on the Prostate tumor dataset. In future work, we will try to find a way to determine the optimal gene number for the 1-norm SVMSL.