Abstract
Predicting the secondary structure of a protein from its sequence remains a challenging problem. Prediction accuracies remain at around 80 % across a wide variety of methods. The use of evolutionary information and machine learning algorithms, in particular, has had the greatest impact. In this chapter, we first define secondary structures and then review the Consensus Data Mining (CDM) technique, which is based on the robust GOR algorithm and the Fragment Database Mining (FDM) approach. GOR V is an empirical method that uses a sliding-window approach and generalized evolutionary information to model the secondary structural elements of a protein. FDM mines a database of experimentally determined structure fragments. It predicts the secondary structure of a protein by combining fragments selected according to their sequence similarity to the query. CDM combines the GOR V and FDM predictions hierarchically to produce a consensus secondary structure prediction: FDM is used where suitable sequence fragments are available, and GOR V is used where they are not. The online CDM server is available at http://gor.bb.iastate.edu/cdm/.
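The hierarchical rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CDM server's implementation. Predictions are three-state strings over H (helix), E (strand) and C (coil). Residues not covered by any FDM fragment are marked `None`. The per-residue fragment identity values and the `identity_threshold` cutoff are hypothetical parameters chosen for illustration, not the published settings. A Q3 helper (the fraction of residues assigned the correct state) is included because three-state accuracy is the usual measure behind figures like the roughly 80 % mentioned above.

```python
from typing import Optional, Sequence


def cdm_consensus(
    gor_pred: str,
    fdm_pred: Sequence[Optional[str]],
    fdm_identity: Sequence[float],
    identity_threshold: float = 0.5,  # hypothetical cutoff, not the published value
) -> str:
    """Hierarchical consensus sketch: use the FDM state where a fragment covers
    the residue with sufficient identity, otherwise fall back to the GOR V state.

    fdm_pred[i] is None where no fragment covers residue i (assumed encoding).
    fdm_identity[i] is the identity of the best fragment at residue i (assumed input).
    """
    if not (len(gor_pred) == len(fdm_pred) == len(fdm_identity)):
        raise ValueError("all inputs must cover the same residues")
    consensus = []
    for gor_state, fdm_state, identity in zip(gor_pred, fdm_pred, fdm_identity):
        if fdm_state is not None and identity >= identity_threshold:
            consensus.append(fdm_state)  # fragment information available: trust FDM
        else:
            consensus.append(gor_state)  # no usable fragment: fall back to GOR V
    return "".join(consensus)


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


if __name__ == "__main__":
    # Toy data for illustration only.
    gor = "CHHHHCCEEEC"
    fdm = [None, "H", "H", "H", "H", "H", None, "E", "E", "E", None]
    identity = [0.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.0, 0.3, 0.3, 0.3, 0.0]
    observed = "CHHHHHCCEEC"

    cdm = cdm_consensus(gor, fdm, identity)
    print(cdm, q3(gor, observed), q3(cdm, observed))
```

In this toy example the residue at index 5 switches from coil to helix because a high-identity fragment covers it, while the low-identity strand fragment is ignored and the GOR V states are kept. The consensus string is `CHHHHHCEEEC`, and Q3 rises from 9/11 for the GOR V string alone to 10/11 for the consensus.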
Copyright information
© 2017 Springer Science+Business Media New York
About this protocol
Cite this protocol
Kandoi, G., Leelananda, S.P., Jernigan, R.L., Sen, T.Z. (2017). Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information. In: Zhou, Y., Kloczkowski, A., Faraggi, E., Yang, Y. (eds) Prediction of Protein Secondary Structure. Methods in Molecular Biology, vol 1484. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6406-2_4
DOI: https://doi.org/10.1007/978-1-4939-6406-2_4
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6404-8
Online ISBN: 978-1-4939-6406-2
eBook Packages: Springer Protocols