Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Li, Jing; Wang, Wei

doi:10.1007/s11427-007-0023-3

Published: June 2007

Volume 50, pages 392–402 (2007)
Science in China Series C: Life Sciences

Abstract

Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, alignments are often inaccurate when the sequence identity between the sequences to be aligned is below 30%. The reason is that, in such sequences, different residues may play similar structural roles, and these residues are aligned incorrectly when the substitution matrix distinguishes all 20 residue types. Based on the similarity of their physicochemical features, residues can be clustered into a few groups. Such simplified alphabets reduce the complexity of protein sequences while retaining the key information encoded in them. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, using a database of aligned protein structures (DAPS), a new clustering method based on substitution scores is proposed for grouping residues, and substitution matrices of residues are constructed at different levels of simplification. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are then applied to the recognition of structurally conserved or similar regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.
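To make the idea of a reduced alphabet concrete, the following is a minimal Python sketch, not the authors' method. It assumes that N denotes the number of residue groups. The nine-group partition `GROUPS` is a hypothetical grouping chosen only for illustration and is not the clustering derived from DAPS in the paper. The function names are likewise illustrative. The relative entropy is computed in the standard way for substitution matrices, H = Σ q_ij log2(q_ij / (p_i p_j)), here from a toy list of aligned residue pairs rather than from a structural alignment database.

```python
import math
from collections import Counter

# Hypothetical 9-group partition of the 20 amino acids (illustration only;
# NOT the grouping reported in the paper).
GROUPS = ["C", "G", "P", "ILMV", "FWY", "AST", "NH", "DEQ", "KR"]


def make_mapping(groups):
    """Map every residue to a representative letter (first letter of its group)."""
    return {aa: group[0] for group in groups for aa in group}


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(mapping[aa] for aa in seq)


def identity(a, b):
    """Fraction of identical positions in two gap-free, equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def relative_entropy(pairs):
    """Relative entropy (bits) of the pair distribution against independence.

    Pairs are counted in both orders so that q_ij is symmetric, and p_i is
    the marginal frequency of letter i under q.
    """
    counts = Counter()
    for x, y in pairs:
        counts[(x, y)] += 1
        counts[(y, x)] += 1
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    p = Counter()
    for (x, _), freq in q.items():
        p[x] += freq
    return sum(freq * math.log2(freq / (p[x] * p[y])) for (x, y), freq in q.items())


if __name__ == "__main__":
    mapping = make_mapping(GROUPS)
    a, b = "ILKDEFAS", "VMRECYTS"  # toy aligned fragments with low identity
    ra, rb = reduce_sequence(a, mapping), reduce_sequence(b, mapping)
    print(identity(a, b), identity(ra, rb))
    print(relative_entropy(list(zip(a, b))), relative_entropy(list(zip(ra, rb))))
```

In this toy case the two fragments share few identical residues in the 20-letter alphabet but agree at most positions after reduction, which is the effect the abstract describes. Merging letters can never increase the relative entropy, so comparing its value across levels of simplification is one way to judge how much information a given grouping retains.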

Author information

Authors and Affiliations

National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing, 210093, China
Li Jing & Wang Wei
Interdisciplinary Center of Theoretical Studies, Chinese Academy of Sciences, Beijing, 100080, China
Wang Wei

Corresponding author

Correspondence to Wang Wei.

Additional information

Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM

About this article

Cite this article

Li, J., Wang, W. Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids. SCI CHINA SER C 50, 392–402 (2007). https://doi.org/10.1007/s11427-007-0023-3

Received: 23 May 2006
Accepted: 19 September 2006
Issue Date: June 2007
DOI: https://doi.org/10.1007/s11427-007-0023-3
