Introduction

Computational approaches, such as structural bioinformatics (Argos et al. 1982; Chou 2004a, b, c, d, 2005a), molecular docking (Chou et al. 2003; Gao et al. 2007; Li et al. 2007; Wang et al. 2008; Zhang et al. 2006a, b, c; Zheng et al. 2007), molecular packing (Chou et al. 1984, 1988), pharmacophore modeling (Chou et al. 2006; Sirois et al. 2004), Monte Carlo simulation (Chou 1992), diffusion-controlled reaction simulation (Chou and Jiang 1974; Chou and Zhou 1982; Li and Chou 1976), bio-macromolecular internal collective motion simulation (Chou 1988), QSAR (Dea-Ayuela et al. 2008; Du et al. 2005, 2008a, b; Gonzalez-Diaz et al. 2006, 2008; Prado-Prado et al. 2008), protein subcellular location prediction (Chou and Shen 2006a, b, 2007a, c, 2008a; Shi et al. 2008), identification of membrane proteins and their types (Chou and Shen 2007b), identification of enzymes and their functional classes (Shen and Chou 2007a), identification of GPCRs and their types (Chou 2005b; Chou and Elrod 2002), identification of proteases and their types (Chou and Shen 2008b), protein cleavage site prediction (Chou 1993, 1996; Shen and Chou 2008), and signal peptide prediction (Chou and Shen 2007d; Shen and Chou 2007b), can provide timely and useful information and insights for both basic research and drug design, and hence are widely welcomed by the scientific community. The present study attempts to develop a computational approach for predicting the subcellular localization of apoptosis proteins, in the hope of stimulating the development of the relevant areas (Emanuelsson et al. 2007; Fauchere et al. 1988; Janin 1979; Janin and Wodak 1978).

Apoptosis is a form of cell death that plays a central role in normal tissue homeostasis by regulating the balance between cell proliferation and death (Chou et al. 1997, 1999, 2000; Chou 2004a, b, c, d, 2005a, b, c). Cells undergoing apoptosis usually exhibit a characteristic morphology, including fragmentation of the cell into membrane-bound apoptotic bodies, nuclear and cytoplasmic condensation, and endonucleolytic cleavage of the DNA into small oligonucleosomal fragments (Kerr et al. 1972; Steller 1995). Unregulated, excessive apoptosis may cause various degenerative and autoimmune diseases. Conversely, an inappropriately low rate of apoptosis may promote the survival and accumulation of abnormal cells that can give rise to tumor formation and prolonged autoimmune stimulation, as in cancer and Graves' disease (Peter et al. 1997). The study of apoptosis proteins can help us to understand the mechanism of apoptosis and can provide many targets for therapeutic intervention (Cosic 1994; Du and Li 2006; Hong et al. 1999; Hopp and Woods 1981; Huang and Shi 2005; Chou 2000, 2004a, b, c, d, 2005a, b, c).

The function of a protein is closely correlated with its subcellular location (Cai and Chou 2003; Cai et al. 2003; Chou 2002; Chou and Cai 2002, 2004, 2005; Chou and Shen 2006a, b, c; Shen et al. 2007b; Shen et al. 2005; Shen and Chou 2007; Chou and Elrod 1999; Chou 2000, 2001; Feng 2002). Thus, knowledge of the subcellular locations of apoptosis proteins will help in understanding the apoptosis mechanism and the functions of these proteins (Schulz et al. 1999; Reed and Paternostro 1999), and such knowledge is very important for understanding the mechanism of programmed cell death. Malfunction of apoptosis can lead to formidable diseases such as cancer (Adams and Cory 1998; Evan and Littlewood 1998), autoimmune diseases, ischemic damage, and neurodegenerative diseases (Schulz et al. 1999). With the rapid increase in the number of protein sequences of unknown function in protein databanks, it is crucial to develop fast and powerful computational tools and algorithms to predict the subcellular locations of apoptosis proteins directly from their amino acid sequences.

Several prediction algorithms have been reported for the subcellular location of apoptosis proteins. Zhou and Doctor (2003) predicted four kinds of subcellular locations by using the amino acid composition (AAC) to represent protein samples and the covariant discriminant algorithm of Chou (1995) as the prediction engine, obtaining an overall accuracy of 72.5% by the jackknife test. Bulashevska and Eils (2006) achieved accuracies of 85.7 and 89.9% using a single Bayesian classifier and a hierarchical ensemble classifier, respectively. Zhang et al. (2006b) developed a new encoding method with grouped weight for protein sequences and, meanwhile, constructed a larger dataset with 225 apoptosis proteins belonging to four subcellular locations. A dual-layer support vector machine prediction algorithm has also been developed (Zhou et al. 2008). Chen and Li (2007a, b) developed two prediction approaches based on the increment of diversity (ID) and on the increment of diversity combined with a support vector machine (ID_SVM), which were validated on a new dataset covering six subcellular compartments and 317 apoptosis proteins.

Compared with the extensive research on protein subcellular location in general (Chou and Shen 2007b), studies on the subcellular location of apoptosis proteins are limited, mainly owing to the flexibility of the distribution of apoptosis proteins and the limited number of annotated apoptosis proteins. In this study, based on the above analysis, we propose a new approach for predicting the subcellular location of apoptosis proteins that combines an ensemble classifier with feature selection. A new kind of ensemble classifier is introduced as the prediction engine. Ensemble classifiers, which can reduce the variance caused by the peculiarities of a single training set and hence learn a more expressive concept in classification than a single classifier, have been applied to various protein attributes (Shen and Chou 2006a, b, 2007a; Shen et al. 2007a; Kedarisetti et al. 2006; Chou and Shen 2006a, b, 2007a). The basic classifier is the fuzzy K-nearest neighbor (FKNN) classifier (Keller et al. 1985), a simple and powerful classifier often used in identifying various protein attributes (Huang and Li 2004; Shen et al. 2006; Huang et al. 2006). The input to each basic classifier within the ensemble is the k-spaced amino acid pair composition after feature selection. The jackknife test results indicate that the proposed method may be a useful tool for predicting the subcellular location of apoptosis proteins, or at least can play a complementary role to the existing methods in the relevant areas.

Materials and methods

Datasets

Two datasets constructed by previous investigators are used to examine the power of the new method. The dataset CL317, constructed by Chen and Li (2007a), is the larger one, comprising 317 apoptosis proteins: 112 cytoplasmic, 55 membrane, 34 mitochondrial, 17 secreted, 52 nuclear, and 47 endoplasmic reticulum proteins. The dataset ZW225 (Zhang et al. 2006b) comprises 225 apoptosis proteins in four subcellular locations: 41 nuclear, 70 cytoplasmic, 25 mitochondrial, and 89 membrane proteins.

K-spaced amino acid pairs

As mentioned in prior works, the amino acid (AA) composition vector of a protein sequence is a simple sequence representation that is widely used in the prediction of various structural aspects. Given the 20 alphabetically ordered AAs (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y), denoted A_1, A_2, …, A_20, and the number of occurrences x_i of A_i in the sequence, the composition vector is defined as (x_1/L, x_2/L, …, x_20/L), where L is the length of the sequence (Chen et al. 2006b, 2007b; Kawashima et al. 1999; Nakashima and Nishikawa 1994; Park and Kanehisa 2003; Pincus 1991; Richman and Moorman 2000; Shi et al. 2007; Tanford 1962; Zimmerman et al. 1968). However, the composition vector is insufficient to represent a sequence, since it only counts the frequencies of individual AAs. Frequencies of AA pairs (dipeptides) provide more information, since they reflect interactions between local (with respect to the sequence) AA pairs. All dipeptides in the sequence can be counted; since there are 400 possible dipeptides (AA, AC, AD, …, YY), a feature vector of that size is used to represent the occurrence of these pairs in the sequence, with the occurrence rates given by (n_1/(L − 1), n_2/(L − 1), …, n_400/(L − 1)). Since short-range interactions between AAs, rather than only interactions between immediately adjacent AAs, have an impact on folding, the proposed representation also considers collocated pairs of AAs, i.e., AA pairs separated by p other AAs (pairs of the form A A_1 A_2 … A_p A′, where A_1 A_2 … A_p are the intervening AAs). In summary, these pairs can be understood as dipeptides with gaps. For each value of p there are 400 corresponding feature values, and the occurrence rates become (n_1/(L − p − 1), n_2/(L − p − 1), …, n_400/(L − p − 1)). Collocated pairs for p = 0, 1, …, 20 are considered, because the distance between AAs in the motif database PROSITE is up to 20 (Chen et al. 2007a; Falquet et al. 2002). As a result, we propose a representation that includes a total of 400 × (20 + 1) + 20 = 8,420 features, the additional 20 features being the AA composition itself.
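For concreteness, the following is a minimal Python sketch of the encoding described above; it is our own illustration rather than code from the original work, and the function names and the skipping of non-standard residues (such as 'X') are our choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                           # the 20 ordered AAs
PAIRS = ["".join(t) for t in product(AMINO_ACIDS, repeat=2)]   # 400 pairs AA..YY

def pair_composition(seq, p):
    # Occurrence rates of AA pairs separated by p residues (p = 0: dipeptides),
    # normalized by the number of such pairs, L - p - 1.
    counts = dict.fromkeys(PAIRS, 0)
    n_pairs = len(seq) - p - 1
    if n_pairs <= 0:
        return [0.0] * len(PAIRS)
    for i in range(n_pairs):
        pair = seq[i] + seq[i + p + 1]
        if pair in counts:        # skip non-standard residues such as 'X'
            counts[pair] += 1
    return [counts[pair] / n_pairs for pair in PAIRS]

def feature_vector(seq, max_p=20):
    # AA composition (20 values) plus pair compositions for p = 0..max_p:
    # 20 + 400 * (max_p + 1) = 8,420 features for max_p = 20.
    features = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    for p in range(max_p + 1):
        features += pair_composition(seq, p)
    return features
```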

Binary particle swarm optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Kennedy and Eberhart (1995). It is one of the nature-inspired evolutionary optimization methods, which include evolution strategies, evolutionary programming, genetic algorithms, and genetic programming. PSO is distinctly different from other evolutionary-type methods in that it does not use filtering operations (such as crossover and/or mutation), and the members of the entire population are maintained throughout the search procedure (Kennedy et al. 2001).

In the PSO algorithm, every solution is a bird of the flock and is referred to as a particle: in this framework the birds, besides having individual intelligence, also develop some social behavior and coordinate their movement towards a destination.

Initially, the process starts from a swarm of particles, each of which contains a randomly generated solution to the optimization problem, and then searches for the optimal solution by iteration. The ith particle is associated with a position in an M-dimensional space, where M is the number of variables involved in the problem; the values of the M variables that determine the position of the particle represent a possible solution of the optimization problem. Each particle i is determined by two vectors, its current position X_i and its velocity V_i, as follows:

$$ {\text{Current position }}X_{i} = (x_{i1} ,x_{i2} , \ldots ,x_{iM} ) $$
(1)
$$ {\text{Flight velocity }}V_{i} = (v_{i1} ,v_{i2} , \ldots ,v_{iM} ) $$
(2)

This algorithm simulates a flock of birds that communicate during flight. Each bird looks in a specific direction (toward its best position attained so far), and when the birds communicate among themselves, the bird in the best position is identified. Each bird then also moves towards the best bird, with a velocity that depends on its present velocity. Thus, each bird examines the search space from its current local position, and this process repeats until the bird possibly reaches the desired position. Note that this process involves individual intelligence as much as social interaction; the birds learn through their own experience (local search) and the experience of their peers (global search).

In each cycle, one identifies the particle which has the best instantaneous solution to the problem; the position of this particle subsequently enters into the computation of the new position for each of the particles in the flock. This calculation is carried out according to

$$ x_{id}^{k + 1} = x_{id}^{k} + v_{id}^{k + 1} $$
(3)
$$ v_{id}^{k + 1} = v_{id}^{k} + C_{1} \,rand()(pbest_{id}^{k} - x_{id}^{k} ) + C_{2} \,rand()(gbest_{d}^{k} - x_{id}^{k} ) $$
(4)

Here, rand() is a function that generates random numbers uniformly distributed between 0 and 1 (two independent random numbers enter Eq. 4); pbest_{id}^k is the best position reached by particle i up to the kth cycle, whereas gbest_d^k is the best position found by the global search. C_1 and C_2 are two positive constants, called learning factors or rates, which are usually set to 2.
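As an illustration, the following minimal Python sketch performs one PSO iteration for a single particle according to Eqs. 3 and 4; it is our own hypothetical helper, not code from the original study.

```python
import random

def pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0):
    # One PSO iteration per Eqs. 3-4; x, v, pbest, gbest are equal-length lists.
    new_x, new_v = [], []
    for d in range(len(x)):
        vel = (v[d]
               + c1 * random.random() * (pbest[d] - x[d])   # cognitive (local) term
               + c2 * random.random() * (gbest[d] - x[d]))  # social (global) term
        new_v.append(vel)                                   # Eq. 4
        new_x.append(x[d] + vel)                            # Eq. 3
    return new_x, new_v
```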

PSO was originally introduced as an optimization technique for real-number spaces. However, many optimization problems are set in a space featuring discrete, qualitative distinctions between variables and between levels of variables. Kennedy and Eberhart therefore introduced binary PSO (BPSO), which can be applied to discrete binary variables. In a binary space, a particle may move to near corners of a hypercube by flipping various numbers of bits; thus, the overall particle velocity may be described by the number of bits changed per iteration (Kennedy and Eberhart 1997). In BPSO, each component of the particle position X_i is set to 1 or 0, but the flight velocity V_i is not limited. In our paper, BPSO is used as the feature selection algorithm: all of the aforementioned AA pair features of apoptosis proteins compose the particle space, and the ith feature is selected if X_i = 1 and discarded if X_i = 0. The fitness function of the feature selection algorithm is formulated as Eq. 5.

$$ fitness = Ac - k*nNewFeature/nAllFeature $$
(5)

In Eq. 5, Ac is the accuracy of the jackknife test (Chou and Zhang 1995) on the training dataset, nNewFeature is the number of newly selected features, nAllFeature is the number of all features, and k is a parameter controlling the penalty on the ratio of selected features. In our paper, k = 1.
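A minimal sketch of this fitness function is given below; the evaluation routine that supplies the jackknife accuracy of the classifier trained on the selected subset is assumed to be given externally.

```python
def fitness(mask, jackknife_accuracy, k=1.0):
    # Eq. 5: trade off training accuracy (Ac) against the fraction of
    # selected features. `mask` is the particle's 0/1 position vector;
    # `jackknife_accuracy` is assumed to come from an external jackknife
    # evaluation of the classifier trained on the selected features.
    n_selected = sum(mask)
    return jackknife_accuracy - k * n_selected / len(mask)
```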

Based on the velocity of each particle calculated by Eq. 4, the particle's new position x_{id}^{k+1} can be obtained as follows:

$$ x_{id}^{k + 1} = \left\{ \begin{array}{ll} R(0,1) > 0.7 & \quad {\text{if}}\; v_{id}^{k + 1} < 2a/3 \\ x_{id}^{k} & \quad {\text{if}}\; 2a/3 \le v_{id}^{k + 1} < a \\ pbest_{id} & \quad {\text{if}}\; a \le v_{id}^{k + 1} < (1 + a)/2 \\ gbest_{d} & \quad {\text{if}}\; (1 + a)/2 \le v_{id}^{k + 1} < 1 \\ \end{array} \right. $$
(6)

where R(0,1) denotes a uniform random number in [0, 1] (i.e., the bit is set to 1 if the random draw exceeds 0.7 and to 0 otherwise) and a is a parameter governing the updating of the particles; in our research, a = 1.
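The following sketch implements the position update of Eq. 6 under two assumptions of ours: the first case is read as setting the bit to 1 when a uniform random draw exceeds 0.7, and velocities are clipped to [0, 1) so that the four cases are exhaustive.

```python
import random

def bpso_position_update(x, v, pbest, gbest, a=1.0):
    # Eq. 6 for one particle. x, pbest, gbest: 0/1 lists; v: velocities.
    new_x = []
    for d in range(len(x)):
        vel = min(max(v[d], 0.0), 1.0 - 1e-9)       # assumed clipping to [0, 1)
        if vel < 2 * a / 3:
            bit = 1 if random.random() > 0.7 else 0  # assumed reading of R(0,1) > 0.7
        elif vel < a:
            bit = x[d]                               # keep the current bit
        elif vel < (1 + a) / 2:
            bit = pbest[d]                           # copy the personal best bit
        else:
            bit = gbest[d]                           # copy the global best bit
        new_x.append(bit)
    return new_x
```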

Ensemble classifier

The framework of the ensemble classifier is illustrated in Fig. 1. The basic classifier is the FKNN classifier, which is trained on the k-spaced amino acid pair composition after feature selection. Combining a set of basic classifiers, the ensemble classifier is formulated as

$$ C = C_{0} \{ BiAAC(p = 0)\} \oplus C_{1} \{ BiAAC(p = 1)\} \oplus \cdots \oplus C_{n} \{ BiAAC(p = n)\} $$
(7)

where C denotes the ensemble classifier; C_i{BiAAC(p = i)}, i = 0, 1, …, n, is the basic classifier trained on the feature-selected i-spaced amino acid pair composition; and the symbol ⊕ is the combination operator. Here, the basic classifier is the FKNN classifier (Keller et al. 1985), which combines fuzzy set theory with the KNN algorithm; a detailed description of the FKNN algorithm can be found in previous work (Huang and Li 2004; Shen et al. 2006; Zheng et al. 2007). The output of each basic classifier is a vector of fuzzy membership values of the apoptosis protein to the subcellular locations, so the outputs can be collected in the fuzzy membership matrix of Eq. 8

$$ \left[ {\begin{array}{*{20}c} {m_{1}^{1} (x)} & {m_{1}^{2} (x)} & \cdots & {m_{1}^{n + 1} (x)} \\ {m_{2}^{1} (x)} & {m_{2}^{2} (x)} & \cdots & {m_{2}^{n + 1} (x)} \\ \vdots & \vdots & \ddots & \vdots \\ {m_{c}^{1} (x)} & {m_{c}^{2} (x)} & \cdots & {m_{c}^{n + 1} (x)} \\ \end{array} } \right] $$
(8)

where c is the number of subcellular locations and n + 1 is the number of basic classifiers, one for each spacing value of the k-spaced amino acid pairs.

Fig. 1

The flowchart of prediction approach and framework of ensemble classifier

By fusing the outputs of the basic classifiers, the fuzzy membership values of the ensemble classifier are obtained:

$$ f_{i}^{\text{comb}} (u_{i} ) = \frac{1}{n + 1}\sum\limits_{j = 1}^{n + 1} {u_{i} (j)} $$
(9)

where u_i = (m_i^1(x), m_i^2(x), …, m_i^{n+1}(x)), i = 1, 2, …, c, and "comb" denotes the fusion rule. The final prediction is the location with the maximum f_i, as given by Eq. 10.

$$ predicted={\mathop{\arg\max(f_{i})}\limits_{i = 1,2,\ldots,c}}$$
(10)
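The fusion step of Eqs. 9 and 10 amounts to averaging the membership matrix over the classifiers and taking the arg max, as in the following minimal sketch (our own illustration, with an assumed NumPy representation of the matrix):

```python
import numpy as np

def ensemble_predict(membership):
    # Eqs. 9-10: `membership` has shape (c, n + 1); entry [i, j] is the fuzzy
    # membership of the query protein to location i output by classifier j.
    f = membership.mean(axis=1)   # Eq. 9: average over the basic classifiers
    return int(np.argmax(f))      # Eq. 10: location with the largest membership
```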

Fuzzy K-nearest neighbor classifier

Combining fuzzy set theory with the KNN algorithm, Keller et al. (1985) proposed the FKNN classifier. The fuzzy membership of a protein sample to the different subcellular locations is assigned according to the following formulation:

$$ u_{i} (p) = \frac{{\sum\nolimits_{j = 1}^{k} {u_{i} } \left( {p^{(j)} } \right)\left( {\left\| {p - p^{(j)} } \right\|^{{{{ - 2} \mathord{\left/ {\vphantom {{ - 2} {(m - 1)}}} \right. \kern-\nulldelimiterspace} {(m - 1)}}}} } \right)}}{{\sum\nolimits_{j = 1}^{k} {\left( {\left\| {p - p^{(j)} } \right\|^{{{{ - 2} \mathord{\left/ {\vphantom {{ - 2} {(m - 1)}}} \right. \kern-\nulldelimiterspace} {(m - 1)}}}} } \right)} }},\quad i = 1, \ldots ,c $$
(11)

where k is the number of nearest neighbors, u_i(p) is the membership value of protein sample p to subcellular location i, and m is the fuzzy parameter, which determines how heavily the distance to each neighbor weighs in the membership value. ||p − p^(j)|| is the distance between the test protein sample and its jth nearest neighbor; various distance functions can be chosen, and here we use the Euclidean distance. u_i(p^(j)) is the membership value of the jth nearest neighbor to the ith subcellular location; it is assigned in the crisp way shown below.

$$ u_{i} \left( {p^{(j)} } \right) = \left\{ {\begin{array}{*{20}c} {1\quad {\text{if}}\,p^{(j)} \in C_{i} } \\ {0\quad {\text{otherwise}}} \\ \end{array} } \right. $$
(12)

When the memberships to all subcellular locations have been calculated, the test protein sample is assigned to the class with the highest membership value. As our prior work shows, FKNN is a useful prediction engine (Zhang et al. 2006a, b, c, 2008). Because p = 0, 1, …, 20 in the feature selection stage of our research, 21 FKNN classifiers are selected as the basic classifiers of the ensemble.
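A minimal sketch of the FKNN membership computation of Eqs. 11 and 12 follows; it is our own illustration, with crisp neighbor memberships as in the text, and the values of k and m are placeholders rather than the settings of the original study.

```python
import numpy as np

def fknn_memberships(query, train_x, train_y, n_classes, k=5, m=2.0):
    # Eqs. 11-12 with crisp neighbor memberships (Eq. 12).
    dists = np.linalg.norm(train_x - query, axis=1)  # Euclidean distances
    dists = np.maximum(dists, 1e-12)                 # guard against zero distance
    nn = np.argsort(dists)[:k]                       # the k nearest neighbors
    w = dists[nn] ** (-2.0 / (m - 1.0))              # distance weights of Eq. 11
    u = np.zeros(n_classes)
    for idx, weight in zip(nn, w):
        u[train_y[idx]] += weight                    # crisp u_i(p^(j)) is 1 or 0
    return u / w.sum()                               # normalized memberships

# The predicted location is the class with the highest membership value.
```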

Performance measurement

To measure the quality of the prediction of apoptosis protein subcellular locations, it is convenient to introduce an accuracy matrix [M_ij] of size c × c (c is the number of compartments to be predicted). The element M_ij of the accuracy matrix is the number of proteins predicted to be in subcellular location j that are actually in subcellular location i.

Four indexes are applied to evaluate the prediction quality: the sensitivity (S_n), the specificity (S_p), the Matthews correlation coefficient (MCC), and the overall accuracy (A_c).

$$ S_{n} = \frac{{M_{ii} }}{{\sum\nolimits_{j = 1}^{c} {M_{ij} } }} $$
(13)
$$ S_{p} = \frac{{M_{ii} }}{{\sum\nolimits_{j = 1}^{c} {M_{ji} } }} $$
(14)
$$ {\text{MCC}} = \frac{{M_{ii} \left( {\sum\nolimits_{k \ne i}^{c} {\sum\nolimits_{j \ne i}^{c} {M_{jk} } } } \right) - \left( {\sum\nolimits_{j \ne i}^{c} {M_{ij} } } \right) \times \left( {\sum\nolimits_{j \ne i}^{c} {M_{ji} } } \right)}}{{\left[ {\left( {M_{ii} + \sum\nolimits_{j \ne i}^{c} {M_{ij} } } \right)\left( {M_{ii} + \sum\nolimits_{j \ne i}^{c} {M_{ji} } } \right)\left( {\sum\nolimits_{k \ne i}^{c} {\sum\nolimits_{j \ne i}^{c} {M_{jk} } } + \sum\nolimits_{j \ne i}^{c} {M_{ji} } } \right)\left( {\sum\nolimits_{k \ne i}^{c} {\sum\nolimits_{j \ne i}^{c} {M_{jk} } } + \sum\nolimits_{j \ne i}^{c} {M_{ij} } } \right)} \right]^{1/2} }} $$
(15)
$$ A_{c} = {{\left( {\sum\limits_{i = 1}^{c} {M_{ii} } } \right)} \mathord{\left/ {\vphantom {{\left( {\sum\limits_{i = 1}^{c} {M_{ii} } } \right)} {\left( {\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{c} {M_{ij} } } } \right)}}} \right. \kern-\nulldelimiterspace} {\left( {\sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{c} {M_{ij} } } } \right)}} $$
(16)

S_n represents the accuracy and S_p the reliability of the prediction for each location; MCC is a single parameter characterizing the agreement between the observed and predicted subcellular locations; and A_c is the overall accuracy.
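In terms of the usual per-class counts, Eqs. 13-16 reduce to the familiar confusion-matrix formulas, as in this minimal sketch (our own illustration):

```python
import numpy as np

def location_metrics(M, i):
    # Eqs. 13-15 for location i, from the c x c accuracy matrix M
    # (M[i, j] = number of proteins of location i predicted as location j).
    tp = M[i, i]
    fn = M[i, :].sum() - tp   # row sum minus diagonal: missed proteins of i
    fp = M[:, i].sum() - tp   # column sum minus diagonal: false assignments to i
    tn = M.sum() - tp - fn - fp
    sn = tp / (tp + fn)                                   # Eq. 13
    sp = tp / (tp + fp)                                   # Eq. 14
    mcc = (tp * tn - fn * fp) / np.sqrt(
        (tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))    # Eq. 15
    return sn, sp, mcc

def overall_accuracy(M):
    return np.trace(M) / M.sum()                          # Eq. 16
```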

Results and discussion

In statistical prediction, the following three cross-validation tests are often used to examine the power of a predictor: the independent dataset test, the sub-sampling (such as fivefold or tenfold) test, and the jackknife test (Chou and Zhang 1995; Cai et al. 2001; Zhou and Assa-Munt 2001; Zhou 1998). Of these three, the jackknife test is considered the most rigorous and objective, because it always yields a unique result for a given benchmark dataset, as elucidated in (Zhou and Cai 2006; Chou and Shen 2008a) and demonstrated by Eq. 50 of (Chou and Shen 2007c); hence it has been used by more and more investigators (e.g., Chen et al. 2006a, b; Gao et al. 2005a, b; Liu et al. 2005a, b; Chou and Shen 2006a, b, 2007a; Xiao et al. 2005, 2006a, b; Lin and Li 2007a, b; Zhang et al. 2006a, b; Zheng et al. 2007) in examining the power of various prediction methods.

First, the dataset CL317 (Chen and Li 2007a) is used to validate our approach. The dimensions of the protein features and the jackknife test results are shown in Table 1.

Table 1 The results of feature selection for different space in k-spaced amino acid pairs

From Table 1, we can see that the feature dimensionality is reduced for every spacing k, while the jackknife accuracy of each basic classifier increases appreciably after feature selection. This is because BPSO, used as the feature selection method, efficiently removes redundant features.

After the 21 FKNN classifiers are combined into the ensemble prediction engine, the jackknife results on the CL317 dataset are as listed in Table 2.

Table 2 The results by jackknife test on the dataset CL317

As shown in Table 2, the overall jackknife accuracy of the ensemble of 21 trained FKNN classifiers is 91.5%, which is 1-3% higher than that of any single FKNN classifier (Table 1). This is because an ensemble classifier can reduce the variance caused by the peculiarities of a single training set and hence learn a more expressive concept in classification than a single classifier, as demonstrated for various protein attributes (Shen and Chou 2006a, b, 2007a). Table 2 also shows that the accuracy of our approach is clearly higher than that of ID (Chen and Li 2007a) and ID_SVM (Chen and Li 2007b) on the same dataset, particularly for the cytoplasmic, nuclear, and endoplasmic reticulum locations; we attribute this to the protein features obtained after feature selection being more effective than those used by the two earlier methods.

To further validate the performance of the proposed approach, the dataset ZW225 (Zhang et al. 2006b) is adopted. The jackknife results are shown in Table 3.

Table 3 The results by jackknife on the dataset ZW225

As shown in Table 3, the overall prediction accuracy A_c of this study is the highest, both in total accuracy and in the success rate for each subcellular compartment. Moreover, Tables 2 and 3 show desirable values of S_n, S_p, and MCC, which further confirm the reliability of the jackknife results.

Conclusions

In this paper, binary particle swarm optimization (BPSO) is applied to select effective features, with AA pair compositions of different spacings used to construct the candidate feature sets. To increase the prediction accuracy, an ensemble classifier is applied as the prediction engine, each of whose 21 basic classifiers is a fuzzy K-nearest neighbor (FKNN) classifier trained with a different feature set. Two datasets, CL317 and ZW225, are used to validate the performance of the proposed approach; the overall jackknife success rates are 91.5 and 88.0%, respectively, both better than those of other methods. The results indicate that the proposed method can be a potentially useful tool for predicting the subcellular location of apoptosis proteins.