Data Mining for Unidentified Protein Sequences

Blaese, Leif

doi:10.1007/978-3-662-45006-2_6

Part of the book series: Communications in Computer and Information Science (CCIS, volume 500)

Abstract

Next-generation sequencing (NGS) technology has made sequence data from a large number of newly sequenced organisms available. Annotating their genes is one of the most challenging tasks in sequence biology. Here, we present an automated workflow that finds homologous proteins, annotates sequences according to function, and creates a three-dimensional model.

Author information

Authors and Affiliations

Potsdam University, Potsdam, D-14482, Germany
Leif Blaese

Editor information

Editors and Affiliations

Chair Service and Software Engineering, Institute of Computer Science, University of Potsdam, Potsdam, Germany
Anna-Lena Lamprecht
Chair Software Engineering, Computer Science and Information Systems Department, University of Limerick and Lero, The Irish Software Research Center, Limerick, Ireland
Tiziana Margaria

Cite this chapter

Blaese, L. (2014). Data Mining for Unidentified Protein Sequences. In: Lamprecht, AL., Margaria, T. (eds) Process Design for Natural Scientists. Communications in Computer and Information Science, vol 500. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45006-2_6

DOI: https://doi.org/10.1007/978-3-662-45006-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45005-5
Online ISBN: 978-3-662-45006-2
eBook Packages: Computer Science (R0)
