Higher Gene Duplicabilities for Metabolic Proteins Than for Nonmetabolic Proteins in Yeast and E. coli

Marland, Elizabeth; Prachumwat, Anuphap; Maltsev, Natalia; Gu, Zhenglong; Li, Wen-Hsiung

doi:10.1007/s00239-004-0068-x

Published: December 2004

Volume 59, pages 806–814, (2004)
Journal of Molecular Evolution

Elizabeth Marland¹,
Anuphap Prachumwat²,
Natalia Maltsev¹,
Zhenglong Gu³ &
Wen-Hsiung Li³

Abstract

Although the evolutionary significance of gene duplication has long been appreciated, it remains unclear what factors determine gene duplicability. In this study we investigated whether metabolism is an important determinant of gene duplicability because cellular metabolism is crucial for the survival and reproduction of an organism. Using genomic data and metabolic pathway data from the yeast (Saccharomyces cerevisiae) and Escherichia coli, we found that metabolic proteins indeed tend to have higher gene duplicability than nonmetabolic proteins. Moreover, a detailed analysis of metabolic pathways in these two organisms revealed that genes in the central metabolic pathways and the catabolic pathways have, on average, higher gene duplicability than do other genes and that most genes in anabolic pathways are single-copy genes.

Introduction

In every genome sequenced to date, there are genes that are present in only a single copy and there are genes that are present in two or more copies. This observation suggests that different genes have different duplicabilities. However, it is far from clear what factors determine gene duplicability. Recently, Papp et al. (2003) proposed the dosage balance hypothesis, which postulates that genes coding for subunits of protein complexes (multimers) tend to have a lower duplicability than do genes coding for monomers because duplication of a single subunit may cause dosage imbalance among the subunits of the protein complex. Pursuing this issue further, Yang et al. (2003) hypothesized that dosage sensitivity increases while gene duplicability decreases with the number of subunits in a protein (i.e., protein complexity), and they indeed found support for this hypothesis from genomic and protein structure data of human and yeast.

Gene function is likely another important determinant of gene duplicability because it is well known that high dosages of some genes (e.g., histone genes) are required for a complex organism and that in many cases (e.g., MHC genes) multiple gene copies are required for functional diversities. In this study we investigated whether metabolic proteins tend to have higher gene duplicability than nonmetabolic proteins. It is well known that cellular metabolism is crucial for the survival and reproduction of cells. All cells in the three domains of life (Bacteria, Archaea, and Eukaryota) obtain energy and universal precursors during the biochemical assimilation and dissimilation of nutrients via metabolic pathways. The metabolic axis of a cell is represented by the pathways of central metabolism (e.g., glycolysis, pentose–phosphate shunt, and the Krebs cycle). The crucial roles that metabolic pathways play in the survival of an organism may affect the duplicability of metabolic genes. Moreover, the patterns of gene duplication may depend on the metabolic role of the gene product (e.g., catabolic, anabolic).

Escherichia coli and Saccharomyces cerevisiae are good prokaryotic and eukaryotic model organisms, respectively, for studying gene duplication patterns in metabolic pathways, because their genomes have been completely sequenced and their metabolic pathways have been well characterized. In the present study, an analysis of metabolic pathways in these organisms revealed that genes in the central metabolic pathways and catabolic pathways have, on average, higher gene duplicabilities than do other genes. In contrast, single-copy genes (singletons) were predominant in anabolic pathways.

Materials and Methods

Identification of Duplicate and Singleton Genes

As described in Gu et al. (2002, 2003), the complete sets of S. cerevisiae and E. coli K-12 MG1655 protein sequences were downloaded from SGD (http://genome-www.stanford.edu/Saccharomyces/) and from the E. coli Genome Project (http://www.genome.wisc.edu/sequencing/k12.htm), respectively. An all-against-all FASTA search was conducted on each protein dataset independently. A singleton was defined as a protein that did not hit any other protein in the FASTA search at a cutoff of E = 0.1. Duplicate genes were identified as described in Gu et al. (2003) (E < 10⁻¹⁰). We also used less stringent criteria to detect duplicate genes and obtained essentially the same results.
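The two E-value cutoffs above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_proteins`, the `(query, subject, evalue)` tuple layout, and the "ambiguous" label for proteins that fall between the two cutoffs are assumptions made for illustration, and the additional criteria of Gu et al. (2003) for calling duplicates are not reproduced here.

```python
# Hypothetical sketch: label proteins from all-against-all search hits.
# Only the two E-value cutoffs quoted in the text are used.
SINGLETON_E = 0.1      # no hit to another protein at this cutoff -> singleton
DUPLICATE_E = 1e-10    # best hit to another protein below this -> duplicate


def classify_proteins(proteins, hits):
    """proteins: iterable of protein IDs.
    hits: iterable of (query, subject, evalue) tuples (assumed layout)."""
    best = {p: None for p in proteins}  # best E-value against any OTHER protein
    for query, subject, evalue in hits:
        if query == subject:
            continue  # self-hits carry no information about duplication
        for p in (query, subject):
            if p in best and (best[p] is None or evalue < best[p]):
                best[p] = evalue

    labels = {}
    for p, evalue in best.items():
        if evalue is None or evalue > SINGLETON_E:
            labels[p] = "singleton"
        elif evalue < DUPLICATE_E:
            labels[p] = "duplicate"
        else:
            labels[p] = "ambiguous"  # between cutoffs; assumed to be set aside
    return labels
```

Proteins whose best hit lies between the two cutoffs meet neither definition, so the sketch keeps them in a separate class rather than forcing them into one of the two groups.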

Metabolic Pathways

Genes in S. cerevisiae and E. coli metabolic pathways are defined according to the KEGG (http://www.genome.ad.jp/kegg/; Ogata et al. 1999) and WIT (http://wit.mcs.anl.gov/WIT2/; Overbeek et al. 2000) databases. The S. cerevisiae and E. coli ORFs (denoted ALL) are categorized into metabolic (M) and nonmetabolic (non-M) genes. Metabolic genes are those that are involved in any metabolic pathway but not in signal transduction or transport. The metabolic genes are further classified into central metabolic pathway genes (denoted CM) and non-central metabolic pathway genes (denoted non-CM). The numbers of metabolic steps within CM and non-CM with singletons and duplicates are counted. A metabolic step represents a biochemical reaction catalyzed by an enzyme. When a step has both singleton and duplicate enzymes, we count it once as a singleton step and once as a duplicate step. Although many reactions are reversible, we use the direction of glucose dissimilation to divide the non-CM pathways into those upstream of CM (predominantly catabolic; upstream-CM) and those downstream of CM (predominantly anabolic; downstream-CM). For example, galactose, starch, and sucrose catabolism are upstream-CM pathways, whereas amino acid biosynthesis is a downstream-CM pathway.
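The step-counting rule can be sketched in Python as follows. The function name `count_steps`, the input layout (a mapping from a step identifier to the labels of its enzymes), and the example step names are hypothetical; only the rule that a mixed step counts once in each tally comes from the text.

```python
# Hypothetical sketch of the step-counting rule: a step with both singleton
# and duplicate enzymes adds one to each tally.
def count_steps(steps):
    """steps: dict mapping a step ID to a list of enzyme labels,
    each label being "singleton" or "duplicate" (assumed layout)."""
    singleton_steps = 0
    duplicate_steps = 0
    for labels in steps.values():
        if "singleton" in labels:
            singleton_steps += 1
        if "duplicate" in labels:
            duplicate_steps += 1
    return singleton_steps, duplicate_steps


# Made-up example: three steps, one of which is mixed.
example = {
    "step_1": ["singleton"],
    "step_2": ["duplicate", "duplicate"],
    "step_3": ["singleton", "duplicate"],
}
singleton_count, duplicate_count = count_steps(example)
```

Because mixed steps are counted twice, the two tallies can sum to more than the number of steps in a pathway.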

Proportion of Unduplicated Genes and Number of Duplications per Gene

For each category (i.e., a pathway) under study, the number of unique types of genes is defined as the number of singletons plus the number of duplicated gene types in that category. The number of duplications per gene (n) is the total number of genes divided by the total number of unique types of genes. The proportion of unduplicated genes (P) is the proportion of singletons in the total number of unique types of genes. While n roughly indicates how often a gene has been duplicated in the genome, 1 − P denotes the proportion of gene types that have been duplicated in the genome. Both n and 1 − P can be used as measures of gene duplicability (Yang et al. 2003). In addition, we also consider the proportion of duplicate genes in each category. The latter measure and n are less desirable than P because they can be strongly affected by the presence of large gene families.
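As an illustration of these definitions, the following Python sketch computes n, P, and the proportion of duplicate genes for one category. It assumes that each "unique type of gene" is represented by the size of its gene family (1 for a singleton); the function name `duplicability_measures` and the example family sizes are made up for illustration.

```python
# Hypothetical sketch: duplicability measures for one category, assuming each
# unique gene type is given as its family size (1 = singleton).
def duplicability_measures(family_sizes):
    unique_types = len(family_sizes)          # singletons + duplicated gene types
    total_genes = sum(family_sizes)           # every gene copy in the category
    singletons = sum(1 for size in family_sizes if size == 1)

    n = total_genes / unique_types            # duplications per gene
    P = singletons / unique_types             # proportion of unduplicated genes
    prop_duplicates = (total_genes - singletons) / total_genes
    return n, P, prop_duplicates


# Made-up category: three singletons, one pair, and one family of three.
n, P, prop_duplicates = duplicability_measures([1, 1, 1, 2, 3])
# n = 8 / 5 = 1.6, P = 3 / 5 = 0.6, proportion of duplicate genes = 5 / 8 = 0.625
```

Replacing the family of three with a family of thirty leaves P at 0.6 but raises n to 7.0, which shows why P is the measure least affected by large gene families.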

Our statistical analyses were conducted in R (Version 1.7.1; http://www.r-project.org/). All statistical tests were Fisher’s exact tests.

Results

Duplication Patterns of Genes in Metabolic and Non-metabolic Pathways

The genes involved in 72 yeast metabolic pathways as defined by the KEGG and WIT databases were downloaded, but only 43 pathways (Table 1) were used in this study because the others showed small numbers of steps or overlapped with other pathways. These genes, which are called metabolic (M) genes, were further divided into two categories: central metabolic (CM) and non-central metabolic (non-CM) genes. The duplication patterns of genes in these 43 S. cerevisiae pathways are compared with those in nonmetabolic genes (non-M) and all genes (ALL). The proportions of duplicates in the ALL and non-M categories are similar (34–36%), but the proportion is significantly higher for metabolic genes (56%; p < 10⁻⁴⁰; Table 2 and Fig. 1A); all p values in this paper were obtained by Fisher’s exact test. Furthermore, the proportion of duplicates in CM is about 1.5-fold higher than that in non-CM (p < 10⁻⁸; Table 2 and Fig. 1A). A similar pattern of gene duplication is observed in E. coli, where CM also has the highest proportion of duplicates, being significantly higher than non-CM (p < 0.003). Moreover, the metabolic pathways as a whole (M) show a significantly higher proportion of duplicates than non-M (p < 10⁻⁷) and ALL genes (Table 2 and Fig. 1B).

Table 1 Distributions of duplicates in 43 S. cerevisiae metabolic pathways (pathways are ordered by the proportion of duplicates; pathway numbers are assigned according to the KEGG database)

Table 2 Distribution pattern of duplicates for the whole genome, nonmetabolic, metabolic, central metabolic, and non-central metabolic pathways for S. cerevisiae and E. coli

The central metabolic pathways (CM) show the lowest proportion of unduplicated genes, P (i.e., the highest duplicability), for both S. cerevisiae and E. coli (Table 2). In S. cerevisiae, non-CM has a P value similar to that for the whole metabolic category (M), which is, however, lower than those for ALL and non-M (Table 2). Similar conclusions hold for the E. coli data (Table 2).

With respect to the number of duplications per gene (n) for each category in S. cerevisiae, CM has the highest value (2.46; Table 2), non-CM has an intermediate value (1.63), and non-M has the lowest value (1.31). A similar pattern holds for the E. coli data (Table 2). These data together with the P values suggest that genes in the central metabolic pathways have, on average, the highest gene duplicability.

The above conclusions still hold when the criterion used to detect duplicates is relaxed to E < 10⁻⁵ in both the S. cerevisiae and the E. coli data.

Pattern of Duplicates in Each Step of the Metabolic Pathways in S. cerevisiae

In S. cerevisiae (Table 3) the proportion of singleton steps in non-CM (68.4%) is much higher than that in CM (42.9%; p = 0.001). Indeed, in non-CM there are more steps with a singleton than steps with duplicates (158 vs. 73), whereas in CM there are roughly equal numbers of steps with singletons and duplicates (21 vs. 28).
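For readers who wish to check this comparison, the following is a minimal Python sketch of a two-sided Fisher's exact test applied to the step counts quoted above (158 vs. 73 for non-CM, 21 vs. 28 for CM). The authors ran their tests in R; this sketch is a stand-in that uses only the standard library, and the function name `fisher_exact_two_sided` is an assumption. It uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * tolerance)
    return min(1.0, p_value)


# Rows: non-CM, CM; columns: steps with singletons, steps with duplicates.
p_value = fisher_exact_two_sided(158, 73, 21, 28)
print(f"two-sided p = {p_value:.4g}")
```

Exact integer binomial coefficients are used so that the large counts do not overflow or lose precision before the final division.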

Table 3 Numbers of steps with singletons and duplicates for metabolic, non-central metabolic, and central metabolic pathways of S. cerevisiae

Interestingly, the non-CM pathways upstream of the CM pathways (upstream-CM) show a high proportion of duplicate genes and a high number of duplications per gene (Table 1 and Fig. 2), in comparison with the non-CM pathways downstream of CM pathways (downstream-CM; Fig. 3). Indeed, steps in downstream-CM pathways are dominated by singletons; for example, steps with singletons are overrepresented in the histidine, urea, glutamate, biotin, pyrimidine, and purine metabolism pathways (Fig. 3). These results suggest that CM and upstream-CM pathways have a higher gene duplicability than do downstream-CM pathways.

Discussion

The gene duplication patterns in both S. cerevisiae and E. coli reveal a higher average duplicability for genes that are involved in metabolism, especially central metabolism, than for nonmetabolic genes. We note that both species studied are fast-growing organisms and this could be the reason for the higher duplicability for central metabolic enzymes. It will therefore be interesting to see whether our observation holds for other organisms in general.

It is also possible that certain protein families have been preferentially duplicated in the central metabolic pathways. For this possibility we consider the enzymes with a (βα)₈ (TIM) barrel because Copley and Bork (2000) have noted the presence of many TIM barrel-containing enzymes in the pathways of central metabolism; from this observation they suggested that early on, enzyme recruitment was a driving force behind the evolution of metabolic pathways. In yeast the proportion of unduplicated genes is 42.9% for TIM barrel-containing enzymes and 37.5% for enzymes containing no TIM barrel. In E. coli, the corresponding proportions are 62.5 and 61.9%. In both cases, the difference between the two proportions is not significant, so TIM barrel-containing enzymes and non-TIM-barrel enzymes have approximately the same gene duplicability. It should be noted that while Copley and Bork (2000) were concerned with ancient duplications, we are concerned with more recent duplications, i.e., duplicate proteins whose homology can still be readily detected from sequence alignment. Therefore, TIM barrel-containing enzymes in the central metabolic pathways do not seem to have been preferentially duplicated during the evolution of yeast and E. coli at least in recent times.

Generally, a gene duplicate accumulates deleterious mutations more quickly than advantageous ones and has a high chance of becoming a pseudogene as long as the other copy maintains the original function. Thus, the persistence of both duplicates in a genome would require a selective advantage such as functional diversification or a larger dosage requirement. Therefore, it seems that duplication of a metabolic gene tends to have a higher chance to become advantageous than duplication of a nonmetabolic gene.

Most universal precursors for biosynthesis are produced by the central metabolic pathways (e.g., glyceraldehyde 3-phosphate, fructose 6-phosphate, citrate, α-ketoglutarate [Neidhardt et al. 1990]). For this reason, duplication of a gene in a central metabolic or upstream-CM pathway might have been favored. As noted above, in S. cerevisiae and E. coli, genes in the central metabolic and upstream-CM pathways have the highest gene duplicability (Table 2, Figs. 1 and 2).

This argument may be strengthened by the following observation. In S. cerevisiae, intracellular hexoses (mainly glucose) that enter the glycolytic pathway are converted to pyruvate and then to ethanol via fermentation. After the fermentable hexoses are exhausted, ethanol is used as a carbon source for aerobic growth, which involves the TCA cycle. Alternatively, glucose can be oxidized in the pentose–phosphate shunt. This pathway provides the cell with pentose sugars and cytosolic NADPH. The ribose sugars generated are used further in the biosynthesis of nucleic acid precursors and nucleotide coenzymes. Therefore, in order to utilize the hexoses rapidly, duplication of an enzyme in an upstream-CM or CM pathway might have been an advantage during some period in evolution. Furthermore, the importance of glycolysis is evident from the fact that glycolytic enzymes make up around 30–68% of the soluble protein in the yeast cell (Banuelos and Fraenkel 1982).

The presence of gene duplicates may also increase genetic robustness against null mutations (Gu et al. 2003). Using the data on the fitness effects of single-gene deletions for the whole yeast genome (Steinmetz et al. 2002), we find that essential genes in the central metabolic pathways are all singletons (i.e., in CM 100% of genes with lethal single-gene deletions are singletons), and no deletion of a duplicate is lethal.

Enzyme duplication could provide an opportunity for an enzyme with multiple substrate specificities to specialize in different functions. Recent biochemical studies provide evidence that many enzymes in central metabolic pathways can bind substrates other than their normally recognized ones (e.g., O’Brien and Herschlag 1999; for a review, see D’Ari and Casadesus 1998). For example, glycolytic kinases such as the 6-phosphofructokinases, phosphoglycerate kinases, pyruvate kinases, and acetate kinases of the small-genome, wall-less Mollicutes (Mycoplasma species) can use other nucleoside diphosphates besides their normally known reactants (Pollack et al. 2002). Such use of unnatural reactants by these glycolytic kinases has been reported in various organisms including E. coli, dog, and cat (BRENDA Enzyme Database; http://www.brenda.uni-koeln.de [Schomburg et al. 2002]). Moreover, duplicates may be regulated and/or expressed under different environmental conditions. In yeast, pyk1 (pyruvate kinase 1) mutants fail to grow on fermentable carbon sources but can grow normally on ethanol or other gluconeogenic carbon sources (a very low glycolytic flux). Under such conditions, pyruvate kinase 2 (PYK2), a PYK1 paralog, is expressed (Boles et al. 1997). Such an “underground metabolism” could provide functional diversification, which in turn provides the metabolic plasticity that allows organisms to survive in a wider range of habitats (D’Ari and Casadesus 1998).

Gene duplication has been the major process proposed for the evolution of enzymes and metabolic pathways, but the issue has been under intense debate for more than 50 years. Several models of the evolutionary mechanism have been proposed, such as duplication of either enzymes or pathways, recruitment of enzymes from other pathways, or retro-evolution of the pathways (for a review, see Schmidt et al. 2003). As metabolic data from various organisms accumulated, it became clear that the lower part of glycolysis has been well conserved across eubacteria, archaea, and eukaryotes, whereas major variations are found in the upper part, from glucose to 3-phosphoglycerate (Ronimus and Morgan 2003; Verhees et al. 2003). Although archaeal enzymes in the upper part of glycolysis show low sequence similarity to, and differ in function from, their eubacterial and eukaryotic counterparts, their structures are homologous. In addition, many downstream-CM pathways (e.g., individual amino acid biosyntheses) in E. coli show high conservation in the number of orthologs in all three domains of life (Peregrin-Alvarez et al. 2003). Thus, in ancient times duplication in the central metabolic and upstream-CM pathways might have been a driving force that allowed an organism to cope with changes in metabolites.

These data provide evidence that gene function is an important determinant of gene duplicability, especially for genes functioning in metabolism in S. cerevisiae and E. coli. Because these free-living unicellular organisms are in direct contact with the environment, their source of nutrients depends on their habitats. The environments they inhabit are often short of nutrients, so they have to compete for the available metabolites with members of their own species and/or with other species. The ability to process these nutrients quickly into metabolic precursors directly increases growth and survival rates. Therefore, duplication of upstream metabolic genes may increase the ability to compete for resources. In this study, we have indeed found that many gene duplicates have been retained in the upstream-CM and CM pathways.

References

M Banuelos, DG Fraenkel (1982) Saccharomyces carlsbergensis fdp mutant and futile cycling of fructose 6-phosphate. Mol Cell Biol 8:921–929
E Boles, F Schulte, T Miosga, K Freidel, E Schluter, FK Zimmermann, CP Hollenberg, JJ Heinisch (1997) Characterization of a glucose-repressed pyruvate kinase (Pyk2p) in Saccharomyces cerevisiae that is catalytically insensitive to fructose-1,6-bisphosphate. J Bacteriol 179:2987–2993
RR Copley, P Bork (2000) Homology among (βα)₈ barrels: Implications for the evolution of metabolic pathways. J Mol Biol 303:627–640
R D’Ari, J Casadesus (1998) Underground metabolism. Bioessays 20:181–186
Z Gu, D Nicolae, HH Lu, WH Li (2002) Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet 18:609–613. doi:10.1016/S0168-9525(02)02837-8
Z Gu, LM Steinmetz, X Gu, C Scharfe, RW Davis, WH Li (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421:63–66
FC Neidhardt, J Ingraham, M Schaechter (1990) Physiology of the bacterial cell: A molecular approach. Sinauer Associates, Sunderland, MA
PJ O’Brien, D Herschlag (1999) Catalytic promiscuity and the evolution of new enzymatic activities. Chem Biol 6:R91–R105
H Ogata, S Goto, K Sato, W Fujibuchi, H Bono, M Kanehisa (1999) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 27:29–34
R Overbeek, N Larsen, GD Pusch, M D’Souza, E Selkov Jr, N Kyrpides, M Fonstein, N Maltsev, E Selkov (2000) WIT: Integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 28:123–125
B Papp, C Pal, LD Hurst (2003) Dosage sensitivity and the evolution of gene families in yeast. Nature 424:194–197
JM Peregrin-Alvarez, S Tsoka, CA Ouzounis (2003) The phylogenetic extent of metabolic enzymes and pathways. Genome Res 13:422–427
JD Pollack, MA Myers, T Dandekar, R Herrmann (2002) Suspected utility of enzymes with multiple activities in the small genome Mycoplasma species: The replacement of the missing “household” nucleoside diphosphate kinase gene and activity by glycolytic kinases. OMICS 6:247–258
R Ronimus, H Morgan (2003) Distribution and phylogenies of enzymes of the Embden–Meyerhof–Parnas pathway from archaea and hyperthermophilic bacteria support a gluconeogenic origin of metabolism. Archaea 1:199–221
S Schmidt, S Sunyaev, P Bork, T Dandekar (2003) Metabolites: A helping hand for pathway evolution? Trends Biochem Sci 28:336–341
I Schomburg, A Chang, D Schomburg (2002) BRENDA, enzyme data and metabolic information. Nucleic Acids Res 30:47–49
LM Steinmetz, C Scharfe, AM Deutschbauer, D Mokranjac, ZS Herman, T Jones, AM Chu, G Giaever, H Prokisch, PJ Oefner, RW Davis (2002) Systematic screen for human disease genes in yeast. Nat Genet 31:400–404
CH Verhees, SW Kengen, JE Tuininga, GJ Schut, MW Adams, WM de Vos, J van der Oost (2003) The unique features of glycolytic pathways in Archaea. Biochem J 375:231–246
J Yang, R Lusk, WH Li (2003) Organismal complexity, protein complexity, and gene duplicability. Proc Natl Acad Sci USA 100:15661–15665

Acknowledgments

We thank the two reviewers for valuable comments. This study was supported by the International Balzan Foundation and NIH grants to W.H.L.

Author information

Authors and Affiliations

Mathematics & Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, 60439, USA
Elizabeth Marland & Natalia Maltsev
Committee on Genetics, University of Chicago, Chicago, IL, 60637, USA
Anuphap Prachumwat
Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL, 60637, USA
Zhenglong Gu & Wen-Hsiung Li

Corresponding author

Correspondence to Wen-Hsiung Li.

Additional information

Reviewing Editor: Dr. Rüdiger Cerff

Cite this article

Marland, E., Prachumwat, A., Maltsev, N. et al. Higher Gene Duplicabilities for Metabolic Proteins Than for Nonmetabolic Proteins in Yeast and E. coli. J Mol Evol 59, 806–814 (2004). https://doi.org/10.1007/s00239-004-0068-x

Received: 02 April 2004
Accepted: 29 June 2004
Issue Date: December 2004
DOI: https://doi.org/10.1007/s00239-004-0068-x
