Predicting Sub-cellular Location of Proteins Based on Hierarchical Clustering and Hidden Markov Models

Jaramillo-Garzón, Jorge Alberto; Castro-Ceballos, Jacobo; Castellanos-Dominguez, Germán

doi:10.1007/978-3-319-16480-9_26


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9044)

Included in the following conference series:

International Conference on Bioinformatics and Biomedical Engineering


Abstract

Sub-cellular localization prediction is an important step for inferring protein functions. Several strategies have been developed in recent years to solve this problem, ranging from alignment-based to feature-based solutions. However, below certain sequence-identity thresholds, these kinds of approaches fail to detect homologous sequences, yielding predictions with low specificity and sensitivity. Here, a novel methodology is proposed for classifying proteins with low identity levels. The approach rests on a simple yet powerful assumption and employs hierarchical clustering and hidden Markov models, achieving high performance in the prediction of four different sub-cellular localizations.
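The abstract describes the method only at a high level, so the following is a minimal sketch of one plausible reading of the general idea, not the authors' implementation: the training sequences of each location are grouped by single-linkage hierarchical clustering, one model is fitted per cluster, and a query receives the location of the best-scoring cluster model. Several assumptions are made for brevity: k-mer Jaccard similarity stands in for alignment-based percent identity; the threshold of 0.3 is arbitrary; each cluster is modelled by a smoothed first-order Markov chain, a deliberately simplified stand-in for a profile HMM (a full pipeline would more typically align each cluster and build a profile HMM, for example with HMMER's `hmmbuild`, then score queries with `hmmscan`). All function names and the toy sequences are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def clean(seq):
    """Upper-case and drop any residue outside the 20 standard amino acids."""
    return "".join(c for c in seq.upper() if c in ALPHABET)


def kmer_similarity(a, b, k=2):
    """Jaccard similarity of k-mer sets (assumed stand-in for percent identity)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def single_linkage_clusters(seqs, threshold):
    """Single-linkage hierarchical clustering cut at `threshold`.

    Cutting a single-linkage dendrogram at a similarity threshold yields the
    connected components of the graph linking every pair with similarity
    >= threshold; union-find computes those components directly.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if kmer_similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, seq in enumerate(seqs):
        groups[find(i)].append(seq)
    return list(groups.values())


class MarkovChainModel:
    """First-order Markov chain over residues with add-one (Laplace) smoothing.

    Simplified stand-in for a profile HMM: no hidden match/insert/delete
    states, only start and residue-to-residue transition probabilities.
    """

    def __init__(self, seqs):
        start = {a: 1.0 for a in ALPHABET}
        trans = {a: {b: 1.0 for b in ALPHABET} for a in ALPHABET}
        for seq in map(clean, seqs):
            if not seq:
                continue
            start[seq[0]] += 1
            for x, y in zip(seq, seq[1:]):
                trans[x][y] += 1
        total = sum(start.values())
        self.log_start = {a: math.log(c / total) for a, c in start.items()}
        self.log_trans = {}
        for a, row in trans.items():
            row_total = sum(row.values())
            self.log_trans[a] = {b: math.log(c / row_total) for b, c in row.items()}

    def log_likelihood(self, seq):
        seq = clean(seq)
        if not seq:
            return float("-inf")
        score = self.log_start[seq[0]]
        for x, y in zip(seq, seq[1:]):
            score += self.log_trans[x][y]
        return score


def train(labelled, threshold=0.3):
    """{location: [sequences]} -> list of (location, model), one per cluster."""
    models = []
    for location, seqs in labelled.items():
        for cluster in single_linkage_clusters([clean(s) for s in seqs], threshold):
            models.append((location, MarkovChainModel(cluster)))
    return models


def predict(models, query):
    """Return the location of the cluster model that scores the query highest."""
    return max(models, key=lambda pair: pair[1].log_likelihood(query))[0]


# Toy, made-up training data (not from the paper).
training = {
    "nucleus": ["KKRKRKKPKKRK", "PKKKRKVKKRKK"],
    "membrane": ["LLVLLAILLVAL", "AILLVLLLAVLI"],
}
models = train(training)
print(predict(models, "KRKKPKKRKVKK"))  # prints "nucleus"
```

Fitting one model per cluster, rather than one pooled model per location, keeps each model specific to a group of mutually similar sequences; a single pooled model would presumably blur the statistics of dissimilar members of the same location class.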




Author information

Authors and Affiliations

Universidad Nacional de Colombia, Sede Manizales, Colombia
Jorge Alberto Jaramillo-Garzón, Jacobo Castro-Ceballos & Germán Castellanos-Dominguez
Instituto Tecnológico Metropolitano, Medellín, Colombia
Jorge Alberto Jaramillo-Garzón


Editor information

Editors and Affiliations

Universidad de Granada, Dpto. de Arquitectura y Tecnología de Computadores (ATC), E.T.S. de Ingenierías en Informática y Telecomunicación, CITIC-UGR, c/ Periodista Daniel Saucedo Aranda s/n, 18071 Granada, Spain
Francisco Ortuño
Universidad de Granada, E.T.S. Ingenierías Informática y de Telecomunicación, Dpto. Arquitectura y Tecnología de Computadores, CITIC-UGR, C/ Periodista Rafael Gómez Montero, 18071 Granada, Spain
Ignacio Rojas


About this paper

Cite this paper

Jaramillo-Garzón, J.A., Castro-Ceballos, J., Castellanos-Dominguez, G. (2015). Predicting Sub-cellular Location of Proteins Based on Hierarchical Clustering and Hidden Markov Models. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science (LNBI), vol 9044. Springer, Cham. https://doi.org/10.1007/978-3-319-16480-9_26


DOI: https://doi.org/10.1007/978-3-319-16480-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16479-3
Online ISBN: 978-3-319-16480-9
eBook Packages: Computer Science, Computer Science (R0)
