Abstract
In this paper, the AdaBoost algorithm, a popular and effective prediction method, is applied to predict the subcellular locations of prokaryotic and eukaryotic proteins, using a dataset derived from SWISSPROT 33.0. Its prediction ability was evaluated by the re-substitution test, leave-one-out cross-validation (LOOCV), and the jackknife test. By comparing its results with those of some of the most popular predictors, such as the discriminant function, neural networks, and SVM, we demonstrate that the AdaBoost predictor outperformed them. We therefore conclude that the AdaBoost algorithm could be employed as a robust method to predict subcellular location. An online web server for predicting the subcellular location of prokaryotic and eukaryotic proteins is available at http://chemdata.shu.edu.cn/subcell/.
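To make the two kinds of evaluation named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary problem with labels +1/-1, decision stumps as weak learners, and treats the jackknife test as the leave-one-out procedure; the paper's task is multi-class and its features are not given in this abstract. The toy data and all function names (`fit_stump`, `adaboost_fit`, `jackknife_accuracy`, and so on) are hypothetical.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # Weighted error: total weight of the misclassified samples.
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] <= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost: returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        if stump[0] == 0:
            break  # a perfect stump leaves nothing to reweight
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def resubstitution_accuracy(X, y, rounds=10):
    """Train on all samples, then test on the same samples."""
    model = adaboost_fit(X, y, rounds)
    return sum(adaboost_predict(model, r) == yi for r, yi in zip(X, y)) / len(X)

def jackknife_accuracy(X, y, rounds=10):
    """Leave each sample out in turn, train on the rest, test on the held-out one."""
    hits = 0
    for i in range(len(X)):
        model = adaboost_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], rounds)
        hits += adaboost_predict(model, X[i]) == y[i]
    return hits / len(X)

# Hypothetical toy data: two made-up features per sample, two classes.
X = [[0.10, 0.5], [0.20, 0.4], [0.30, 0.6], [0.35, 0.5],
     [0.60, 0.5], [0.70, 0.6], [0.80, 0.4], [0.90, 0.5]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

resub = resubstitution_accuracy(X, y)
jack = jackknife_accuracy(X, y)
print(resub)  # 1.0
print(jack)   # 0.875
```

On this toy data the re-substitution score is perfect while the jackknife score is lower, which illustrates why the held-out estimate is usually the more cautious of the two. A multi-class variant would be needed for the actual subcellular location task.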
Cite this article
Niu, B., Jin, YH., Feng, KY. et al. Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins. Mol Divers 12, 41–45 (2008). https://doi.org/10.1007/s11030-008-9073-0
DOI: https://doi.org/10.1007/s11030-008-9073-0