Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information

Kandoi, Gaurav; Leelananda, Sumudu P.; Jernigan, Robert L.; Sen, Taner Z.

doi:10.1007/978-1-4939-6406-2_4

Part of the book series: Methods in Molecular Biology (MIMB, volume 1484)

Abstract

Predicting the secondary structure of a protein from its sequence remains a challenging problem. Prediction accuracies remain around 80%, even across very diverse methods. Using evolutionary information and machine learning algorithms in particular has had the most impact. In this chapter, we first define secondary structures and then review the Consensus Data Mining (CDM) technique, which is based on the robust GOR algorithm and the Fragment Database Mining (FDM) approach. GOR V is an empirical method that uses a sliding window to model the secondary structural elements of a protein, making use of generalized evolutionary information. FDM mines experimental structure fragments and predicts the secondary structure of a protein by combining experimentally determined structural fragments on the basis of their sequence similarity. The CDM method combines predictions from GOR V and FDM hierarchically to produce consensus predictions for secondary structure: if sequence fragments are not available, it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/.
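
The hierarchical combination described in the abstract can be illustrated with a minimal sketch. This is not the CDM server's implementation: the function name `consensus_prediction`, the use of `None` to mark residues for which no fragment-based prediction is available, and the three-letter state alphabet (H = helix, E = strand, C = coil) are all assumptions made for illustration. The sketch only shows the fallback logic: take the FDM state where one exists, otherwise take the GOR V state.

```python
from typing import Optional, Sequence


def consensus_prediction(fdm: Sequence[Optional[str]], gor: str) -> str:
    """Hierarchical consensus: prefer the FDM state, fall back to GOR V.

    `fdm` holds one predicted state per residue, or None where no
    fragment-based prediction is available (an assumed encoding).
    `gor` holds one GOR V state per residue.
    """
    if len(fdm) != len(gor):
        raise ValueError("both predictions must cover the same residues")
    return "".join(f if f is not None else g for f, g in zip(fdm, gor))


# Hypothetical per-residue predictions for a six-residue sequence.
fdm_states = ["H", "H", None, None, "E", "E"]
gor_states = "HHCCCE"
print(consensus_prediction(fdm_states, gor_states))  # HHCCEE
```

In this toy input, residues 3 and 4 have no fragment-based state and take the GOR V prediction, while residue 5 keeps the FDM state even though GOR V disagrees.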

Author information

Authors and Affiliations

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, USA
Gaurav Kandoi, Robert L. Jernigan & Taner Z. Sen
Department of Electrical and Computer Engineering, Iowa State University, Ames, IA, USA
Gaurav Kandoi
Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, Columbus, OH, USA
Sumudu P. Leelananda
Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA, USA
Robert L. Jernigan
Department of Genetics, Development and Cell Biology, Iowa State University, 1025 Crop Genome Informatics Lab, Ames, IA, 50011, USA
Taner Z. Sen

Corresponding author

Correspondence to Taner Z. Sen.

Editor information

Editors and Affiliations

Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, Australia
Yaoqi Zhou
Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio, USA
Andrzej Kloczkowski
Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, USA
Eshel Faraggi
Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, Australia
Yuedong Yang

Cite this protocol

Kandoi, G., Leelananda, S.P., Jernigan, R.L., Sen, T.Z. (2017). Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information. In: Zhou, Y., Kloczkowski, A., Faraggi, E., Yang, Y. (eds) Prediction of Protein Secondary Structure. Methods in Molecular Biology, vol 1484. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6406-2_4

DOI: https://doi.org/10.1007/978-1-4939-6406-2_4
Published: 28 October 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6404-8
Online ISBN: 978-1-4939-6406-2
eBook Packages: Springer Protocols
