Abstract
The frequencies of the bases A (adenine), C (cytosine), G (guanine), and T (thymine) at codon position \(i\), denoted by \(a_i\), \(c_i\), \(g_i\), and \(t_i\), respectively (\(i = 1, 2, 3\)), have been calculated and represented graphically for the 1490 human proteins in the recently compiled codon usage table for primate genes. Based on the characteristic graphs thus obtained, an overall picture of codon base distribution is provided, and its biological implications are discussed. For the first codon position, G is the dominant base in most cases, and the relationship \(g_1 > a_1 > c_1 > t_1\) generally holds. For the second codon position, A is generally the dominant base and G the least frequent, with the relationship \(a_2 > t_2 > c_2 > g_2\). For the third codon position, the values of \(g_3 + c_3\) vary from 0.27 to 1, and the relationship \(c_3 > g_3 > a_3 = t_3\) roughly holds in the majority of cases. Interestingly, if the average frequencies of the bases A, C, G, and T are defined as \(\bar a = (a_1 + a_2 + a_3)/3\), \(\bar c = (c_1 + c_2 + c_3)/3\), \(\bar g = (g_1 + g_2 + g_3)/3\), and \(\bar t = (t_1 + t_2 + t_3)/3\), respectively, we find that \(\bar a^2 + \bar c^2 + \bar g^2 + \bar t^2 < \tfrac{1}{3}\) is valid almost without exception. Such a characteristic inequality might reflect some inherent rule of codon usage, although its biological implication is unclear. An important advantage of introducing graphic methods is that they make it possible to grasp the essential features of a huge amount of data by direct and intuitive examination.
The method used here makes it possible to see means and variances and to spot outliers. This is particularly useful for finding and classifying similarity patterns and relationships in data sets of long sequences, such as DNA coding sequences. The method also holds great potential for the study of molecular evolution from the viewpoint of the genetic code, for which data have been accumulating rapidly and are expected to keep growing at a much faster pace.
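The quantities defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the paper works from a compiled codon usage table, whereas this sketch assumes a raw in-frame coding sequence as input. The function names (`position_frequencies`, `mean_frequencies`, `sum_of_squares`) and the toy sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter

BASES = "ACGT"


def position_frequencies(cds):
    """Return {1: {...}, 2: {...}, 3: {...}} holding a_i, c_i, g_i, t_i.

    Assumption: `cds` is an in-frame coding sequence over A, C, G, T
    whose length is a positive multiple of 3.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    if set(cds) - set(BASES):
        raise ValueError("sequence may contain only A, C, G, and T")
    n_codons = len(cds) // 3
    freqs = {}
    for offset in range(3):
        # Every third base starting at `offset` is codon position offset + 1.
        counts = Counter(cds[offset::3])
        freqs[offset + 1] = {b: counts[b] / n_codons for b in BASES}
    return freqs


def mean_frequencies(freqs):
    """Average each base frequency over the three codon positions."""
    return {b: sum(freqs[i][b] for i in (1, 2, 3)) / 3 for b in BASES}


def sum_of_squares(mean):
    """Left-hand side of the inequality: a^2 + c^2 + g^2 + t^2 of the means."""
    return sum(v * v for v in mean.values())


if __name__ == "__main__":
    toy_cds = "ATGGCCAAGCTGACCGAGTAA"  # hypothetical toy sequence, not from the paper
    freqs = position_frequencies(toy_cds)
    mean = mean_frequencies(freqs)
    s = sum_of_squares(mean)
    print(freqs)
    print(mean)
    print(s, s < 1 / 3)
```

Because the four mean frequencies sum to 1, their sum of squares can never fall below 1/4, the value reached when all four bases are equally frequent, and it can never exceed 1. The inequality reported in the abstract therefore confines the sum to a narrow band between 1/4 and 1/3.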
Additional information
On sabbatical leave from Department of Physics, Tianjin University, Tianjin, China.
About this article
Cite this article
Zhang, CT., Chou, KC. Graphic analysis of codon usage strategy in 1490 human proteins. J Protein Chem 12, 329–335 (1993). https://doi.org/10.1007/BF01028195
DOI: https://doi.org/10.1007/BF01028195