Summary.
DNA-binding proteins play a pivotal role in gene regulation, so an automated and efficient method for the timely identification of novel DNA-binding proteins is vitally important. In this study, we propose a method that predicts DNA-binding proteins based solely on their primary sequences. Protein sequences were encoded by the autocross-covariance transform, pseudo-amino acid composition, and dipeptide composition, each on its own and in different combinations of the three encodings. The resulting feature matrices were then used to train support vector machine classifiers to predict DNA-binding proteins. All models were trained and validated with the jackknife cross-validation test. Among these alternative models, the best result was obtained with pseudo-amino acid composition, which reached an overall accuracy of 96.6% and a sensitivity of 90.7%. These results suggest that the method can efficiently predict novel DNA-binding proteins using only their primary sequences.
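The abstract names three sequence encodings, an SVM classifier, and a jackknife evaluation. The sketch below illustrates two of the encodings (dipeptide composition and Chou's type-1 pseudo-amino acid composition), a jackknife (leave-one-out) loop, and the accuracy and sensitivity metrics, using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' implementation: the property table (Kyte-Doolittle hydropathy), λ = 5, weight w = 0.05, and the nearest-centroid classifier standing in for the SVM are illustrative choices not taken from the paper. The autocross-covariance transform is omitted because it depends on descriptor scales not given here. Sequences are assumed to contain only the 20 standard residues.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy as an illustrative property;
# the abstract does not state which physicochemical properties were used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property table to mean 0 and population std 1 over the 20 residues."""
    mean = sum(prop.values()) / len(prop)
    std = math.sqrt(sum((v - mean) ** 2 for v in prop.values()) / len(prop))
    return {aa: (v - mean) / std for aa, v in prop.items()}


def dipeptide_composition(seq):
    """400-D vector: frequency of each ordered residue pair among the L-1 adjacent pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = len(seq) - 1
    return [counts[p] / total for p in pairs]


def pseaac(seq, props, lam=5, w=0.05):
    """(20 + lam)-D pseudo amino acid composition in Chou's type-1 form."""
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    tables = [standardize(p) for p in props]
    # Sequence-order correlation factors theta_1 .. theta_lam:
    # mean squared property difference between residues j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        s = 0.0
        for i in range(len(seq) - j):
            s += sum((t[seq[i + j]] - t[seq[i]]) ** 2 for t in tables) / len(tables)
        thetas.append(s / (len(seq) - j))
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    # First 20 components: composition; last lam: weighted sequence-order factors.
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def nearest_centroid(train_X, train_y, x):
    """Hypothetical stand-in for the SVM: assign x to the closest class mean."""
    best, best_d = None, math.inf
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        d = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if d < best_d:
            best, best_d = label, d
    return best


def jackknife(X, y, fit_predict):
    """Leave-one-out: predict each sample with a model trained on all the others."""
    preds = []
    for i in range(len(X)):
        preds.append(fit_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]))
    return preds


def accuracy_sensitivity(y_true, y_pred, positive=1):
    """Overall accuracy and sensitivity TP / (TP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true), tp / (tp + fn)
```

To use it, encode each sequence with `pseaac(seq, [KYTE_DOOLITTLE])` or `dipeptide_composition(seq)`, pass the feature lists and 0/1 labels to `jackknife(X, y, nearest_centroid)`, and score the predictions with `accuracy_sensitivity`. Replacing the stand-in with an SVM, such as scikit-learn's `SVC`, only requires a different `fit_predict` callable.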
Additional information
Authors’ address: Menglong Li, College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, P.R. China
About this article
Cite this article
Fang, Y., Guo, Y., Feng, Y. et al. Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features. Amino Acids 34, 103–109 (2008). https://doi.org/10.1007/s00726-007-0568-2
DOI: https://doi.org/10.1007/s00726-007-0568-2