Abstract
Mycobacterium tuberculosis is the primary pathogen causing tuberculosis, one of the most prevalent infectious diseases. The subcellular location of mycobacterial proteins can provide essential clues for protein function research and drug discovery. It is therefore highly desirable to develop a computational method for fast and reliable prediction of the subcellular location of mycobacterial proteins. In this study, we developed a support vector machine (SVM) based method for this task. A total of 444 non-redundant mycobacterial proteins were used to train and test the proposed model with jackknife cross-validation. With traditional pseudo amino acid composition (PseAAC) as the input features, an overall accuracy of 83.3% was achieved. Moreover, a feature selection technique was developed to find an optimal number of PseAAC features, which improved the overall accuracy from 83.3% to 87.2%. In addition, information on reduced amino acids in the N-terminal and non-N-terminal regions of the proteins was incorporated into the models to further improve the prediction success rate. As a result, a maximum overall accuracy of 91.2% was achieved, with an average accuracy of 79.7%. The proposed model provides useful information for further experimental research. The prediction model can be accessed free of charge at http://cobi.uestc.edu.cn/cobi/people/hlin/webserver.
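The evaluation protocol summarized above can be illustrated with a minimal Python sketch. It shows a jackknife (leave-one-out) loop and the two reported measures, under two assumptions that the abstract does not confirm: "average accuracy" is taken to be the unweighted mean of the per-class accuracies, and a toy nearest-centroid rule stands in for the SVM so that the example needs only the standard library. All function names are hypothetical and this is not the authors' implementation.

```python
from collections import Counter


def nearest_centroid_predict(train_x, train_y, x):
    """Toy stand-in for the SVM (an assumption, not the paper's classifier):
    predict the class whose mean feature vector is closest to x."""
    sums = {}
    sizes = Counter(train_y)
    for vec, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value

    def squared_distance(label):
        n = sizes[label]
        return sum((s / n - v) ** 2 for s, v in zip(sums[label], x))

    # sorted() makes tie-breaking deterministic
    return min(sorted(sums), key=squared_distance)


def jackknife(xs, ys, predict):
    """Hold out each sample once, train on all others, predict the held-out one."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(predict(train_x, train_y, xs[i]))
    return predictions


def overall_and_average_accuracy(ys, predictions):
    """Overall accuracy: correct predictions / all samples.
    Average accuracy (assumed definition): mean of per-class accuracies."""
    totals = Counter(ys)
    correct = Counter(y for y, p in zip(ys, predictions) if y == p)
    overall = sum(correct.values()) / len(ys)
    average = sum(correct[c] / totals[c] for c in totals) / len(totals)
    return overall, average
```

Under this assumed definition, a gap such as 91.2% overall versus 79.7% average would arise when small classes are predicted less accurately than large ones. For example, with eight samples of one class all predicted correctly and two samples of another class of which one is correct, the overall accuracy is 0.9 while the average accuracy is 0.75.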
Lin, H., Ding, H., Guo, FB. et al. Prediction of subcellular location of mycobacterial protein using feature selection techniques. Mol Divers 14, 667–671 (2010). https://doi.org/10.1007/s11030-009-9205-1
DOI: https://doi.org/10.1007/s11030-009-9205-1