Abstract
Threonine aldolases (TAs) catalyze the interconversion of threonine and glycine plus acetaldehyde in a pyridoxal phosphate-dependent manner. This class of enzymes complements the primary glycine biosynthetic pathway catalyzed by serine hydroxymethyltransferase (SHMT), and inactivation of TA was shown to be required to render yeast auxotrophic for glycine. Because the reverse reaction of TA involves carbon–carbon bond formation, resulting in a β-hydroxyl-α-amino acid with two adjacent chiral centers, TAs are of high interest in synthetic chemistry and bioengineering studies. Here, we report a systematic phylogenetic analysis of TAs. Our results demonstrate that l-TAs and d-TAs, which are specific for l- and d-threonine, respectively, form two phylogenetically distinct families, and that both differ from the closely related SHMTs and bacterial alanine racemases (ARs). Interestingly, l-TAs can be further grouped into two evolutionarily distinct families, which share low sequence similarity with each other but likely possess the same structural fold, suggesting convergent evolution of these enzymes. The first l-TA family contains enzymes of both prokaryotic and eukaryotic origins and is related to fungal ARs, whereas the second contains only prokaryotic l-TAs. Furthermore, we show that horizontal gene transfer may have occurred frequently during the evolution of both l-TA families. Our results indicate a complex, dynamic, and convergent evolutionary process for TAs and suggest an updated classification scheme for l-TAs.
Introduction
Glycine plays an essential biological role as a precursor for the synthesis of proteins, nucleic acids, and other metabolites. Most organisms produce glycine using serine hydroxymethyltransferase (SHMT), which uses the cofactor pyridoxal 5′-phosphate (PLP) and cleaves serine to produce glycine and the C1 unit for tetrahydrofolate (THF)-dependent reactions (Fig. 1a, i). In some cases, glycine biosynthesis is supplemented by a second enzyme, threonine aldolase (TA) (Duckers et al. 2010; Franz and Stewart 2014; Liu et al. 2000a). TA is also PLP-dependent and cleaves threonine to produce glycine and acetaldehyde (Fig. 1a, ii). TAs have been isolated from various organisms including bacteria, fungi, and mammals, and inactivation of the yeast TA gene was shown to be required to render yeast auxotrophic for glycine (McNeil et al. 1994). Putative TA homologues have also been found in various organisms including protozoa, insects, and plants (Edgar 2005; Fesko et al. 2008; Jander et al. 2004).
TAs are classified into l-TAs and d-TAs according to their specificity at the α-carbon of threonine (Fig. 1b) (Fesko et al. 2008; Liu et al. 2000a). l-TAs are specific for l-threonine and can be further classified into three subgroups, which prefer l-threonine, specifically select l-allo-threonine, or have low specificity toward the β-carbon of l-threonine (Fesko et al. 2008; Liu et al. 2000a) (Fig. 1b). All the known d-TAs are non-specific for the β-carbon of d-threonine (Fesko et al. 2008; Kataoka et al. 1997a; Liu et al. 1998b; Liu et al. 2000b). The non-specificity of TA catalysis raises an interesting question regarding their physiological roles. Because the reverse reaction of TA involves carbon–carbon bond formation that results in a β-hydroxyl-α-amino acid with two adjacent chiral centers, TAs are of high interest in synthetic chemistry (Liu et al. 2000a). More promisingly, these enzymes accept a wide variety of acceptor aldehydes and therefore make an important addition to the synthetic tool repertoire (Duckers et al. 2010; Franz and Stewart 2014). Notably, β-phenylserine, a threonine analog possessing a much larger phenyl group at the β position (instead of the methyl group in threonine), is a more active substrate than threonine in all the investigated cases (Kataoka et al. 1997b; Liu et al. 1998a; Liu et al. 1998c), raising the question of whether threonine is a genuine natural substrate for TAs.
Intrigued by this interesting class of enzymes, here we report a detailed phylogenetic investigation of TAs together with their closely related enzymes SHMTs and alanine racemases (ARs) (Contestabile et al. 2001; Paiardini et al. 2003). Our results show that, interestingly, l-TAs are derived from two distinct families that share low sequence similarity with each other but likely have the same structural fold, suggesting convergent evolution of these enzymes. One TA family contains enzymes of both prokaryotic and eukaryotic origins, whereas the second TA family contains only prokaryotic enzymes. Phylogenetic analysis suggests that horizontal gene transfer may have occurred frequently during the evolution of both TA families, as the tree topology is highly inconsistent with the taxonomic classification of host organisms. Our results indicate a complex evolutionary process for TAs and suggest an updated classification scheme for these enzymes.
Methods
Sequence Similarity Network Analysis
The amino acid sequences of TAs, SHMTs, and ARs were obtained from the National Center for Biotechnology Information (NCBI) sequence database (Pruitt et al. 2007) and are listed in Supplementary Table 1. To identify putative TAs, BlastP searches (Gish and States 1993) were performed using the protein sequences of biochemically characterized TAs as queries against the databases of different organisms (archaea, actinobacteria, cyanobacteria, firmicutes, proteobacteria, fungi, and other eukaryotes). Hits with expectation values (E-values) less than 1E-70 and query coverage >80 % were usually selected. Network analysis was performed by BlastP searches comparing each sequence against every other sequence. A VBA script was written to remove all the duplicate comparisons, and the result was imported into the Cytoscape software package (Cline et al. 2007). The nodes were arranged using the yFiles organic layout provided with Cytoscape version 2.8.3. The arrangements were slightly modified in some cases for better illustration.
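The duplicate-removal and cutoff steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the VBA script used in this study: it assumes the all-against-all BlastP results are in the standard 12-column tabular format (query ID, subject ID, and E-value in columns 1, 2, and 11), and the function names and toy sequence IDs are hypothetical.

```python
import csv
import io

def read_blast_tabular(handle):
    """Parse BLAST tabular output into (query, subject, evalue) tuples."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hits.append((row[0], row[1], float(row[10])))
    return hits

def dedupe_pairs(hits):
    """Keep one E-value per unordered pair (the lowest) and drop self-hits."""
    best = {}
    for query, subject, evalue in hits:
        if query == subject:
            continue  # a sequence compared with itself carries no information
        pair = tuple(sorted((query, subject)))  # A-B and B-A become one key
        if pair not in best or evalue < best[pair]:
            best[pair] = evalue
    return best

def edges_below(best, cutoff):
    """Return the pairs whose E-value is more stringent than the cutoff."""
    return sorted(pair for pair, evalue in best.items() if evalue < cutoff)

# Toy input with made-up identifiers and scores (assumption, not real data).
toy = (
    "seqA\tseqA\t100\t350\t0\t0\t1\t350\t1\t350\t0.0\t700\n"
    "seqA\tseqB\t45\t340\t180\t5\t1\t340\t3\t342\t1e-80\t250\n"
    "seqB\tseqA\t45\t340\t180\t5\t3\t342\t1\t340\t3e-79\t248\n"
    "seqA\tseqC\t19\t300\t240\t9\t5\t300\t8\t310\t1e-12\t60\n"
)
best = dedupe_pairs(read_blast_tabular(io.StringIO(toy)))
print(edges_below(best, 1e-22))
```

The resulting edge list is the kind of table that can be imported into a network viewer. Keeping the lowest E-value for each pair is one possible convention; other choices (for example, the higher of the two reciprocal values) would give a more conservative network.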
Phylogenetic Analysis
The same sequences of TAs and SHMTs (outgroup) from the network analysis were aligned using ClustalX (Thompson et al. 1997) with iteration at each alignment step, and the alignment was manually fine-tuned afterward to minimize hypothetical insertion/deletion events. Bayesian Markov Chain Monte Carlo (MCMC) inference analyses were performed using the program MrBayes (version 3.2) (Ronquist et al. 2012). Final analyses consisted of two sets of eight chains each (one cold and seven heated), run for about 2 million generations with trees saved and parameters sampled every 100 generations. Analyses were run until convergence, defined as a standard deviation of split frequencies <0.01. Posterior probabilities were averaged over the final 75 % of trees (25 % burn-in). The analysis utilized a mixed amino acid model with a proportion of sites designated invariant (+I), and rate variation among sites modeled after a gamma distribution (+G) divided into eight categories, with all variable parameters estimated by the program based on random starting trees. The figure of the Bayesian phylogram was prepared using MEGA5 (Tamura et al. 2011).
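To make the run settings above concrete, the following Python sketch writes a MrBayes command block with those settings and counts how many sampled trees remain after burn-in. It is a hypothetical reconstruction, not the command file used in this study: the option names (`aamodelpr`, `rates`, `ngammacat`, `nruns`, `nchains`, `relburnin`, `burninfrac`) are our reading of the MrBayes 3.2 command set and should be checked against the program manual, and the function names are invented for illustration.

```python
def mrbayes_block(ngen=2_000_000, samplefreq=100, nruns=2, nchains=8,
                  ngammacat=8, burninfrac=0.25):
    """Build a MrBayes command block as text (option names are assumptions)."""
    lines = [
        "begin mrbayes;",
        "  prset aamodelpr=mixed;",  # mixed amino acid model
        f"  lset rates=invgamma ngammacat={ngammacat};",  # +I+G
        f"  mcmc ngen={ngen} samplefreq={samplefreq} "
        f"nruns={nruns} nchains={nchains};",
        f"  sump relburnin=yes burninfrac={burninfrac};",
        f"  sumt relburnin=yes burninfrac={burninfrac};",
        "end;",
    ]
    return "\n".join(lines)

def retained_samples(generations, sample_every, burnin_fraction):
    """Return (total, discarded, kept) sample counts per run.

    The starting-state sample is ignored for simplicity.
    """
    total = generations // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded

print(mrbayes_block())
print(retained_samples(2_000_000, 100, 0.25))
```

With 2 million generations sampled every 100 generations, each run yields about 20,000 sampled trees, of which about 15,000 contribute to the posterior probabilities after the 25 % burn-in.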
Maximum likelihood analysis was performed using the program PhyML (Guindon and Gascuel 2003) with the WAG + I + G + F (Whelan and Goldman 2001) model. Gamma distribution was divided into eight categories and the tree topologies were estimated by SPR + NNI branch swapping, with 20 random starting trees. Branch support was determined by SH-like approximate likelihood-ratio test (aLRT) statistics (Anisimova and Gascuel 2006; Guindon and Gascuel 2003).
Results and Discussion
Sequence Similarity Network of TAs
Sequence similarity network analysis is a powerful and computationally economical method to depict the relationships among different protein sequences (Atkinson et al. 2009; Lukk et al. 2012; Zhao et al. 2014). In a network, each node represents a protein sequence, and each edge (line) indicates a pair of nodes (protein sequences) that have a BlastP E-value more stringent than a certain cutoff value. To study the evolution of TAs and their relationship with other PLP-dependent enzymes, we constructed a sequence similarity network containing TAs, SHMTs, and ARs from different organisms (Fig. 2). In this analysis, l-TAs and d-TAs completely separate from each other with a very relaxed cutoff value of 1E-10 (Fig. 2a), and this is consistent with previous studies suggesting that l-TAs and d-TAs are structurally and evolutionarily different (Paiardini et al. 2003). Under the same cutoff value, l-TAs also separate from SHMTs and bacterial ARs, suggesting that, although l-TAs, SHMTs, and ARs are structurally and mechanistically closely related (Eliot and Kirsch 2004; Hayashi 1995), these enzymes have evolved from different ancestors.
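The way clusters appear and split as the cutoff is tightened can be illustrated with a short Python sketch. It treats clusters as connected components of the network and finds them with a union-find structure; the node names and E-values are invented for illustration and do not correspond to any sequence in this study.

```python
def clusters(nodes, pair_evalues, cutoff):
    """Group nodes into connected components using edges with E-value < cutoff."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), evalue in pair_evalues.items():
        if evalue < cutoff:  # only sufficiently stringent hits become edges
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical toy network (assumption): two tight pairs joined by a weak hit.
nodes = ["A1", "A2", "B1", "B2", "S1"]
pair_evalues = {
    ("A1", "A2"): 1e-90,
    ("B1", "B2"): 1e-120,
    ("A1", "B1"): 1e-15,
    ("A2", "S1"): 1e-5,
}
for cutoff in (1e-10, 1e-22, 1e-95):
    print(cutoff, clusters(nodes, pair_evalues, cutoff))
```

In this toy example the four A and B nodes form one component at 1E-10, split into an A pair and a B pair at 1E-22, and only the B pair stays connected at 1E-95. Layout programs draw the same components; the sketch only shows how the cutoff determines which sequences remain linked.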
Intriguingly, l-TAs completely separate into two different clusters with a more stringent cutoff value of 1E-22 (Fig. 2b), suggesting that l-TAs are derived from two different origins. Detailed examination of the network indicated that the sequence identities for proteins within a cluster and between the two clusters are normally above 40 % and below 20 %, respectively. The first cluster (cluster A) consists of diverse enzymes, including fungal ARs and l-TAs from both prokaryotic and eukaryotic origins (Fig. 2b). Biochemically characterized enzymes in this cluster include l-TA from Aeromonas jandaei (l-TAaj) that is specific for l-allo-threonine (Kataoka et al. 1997b; Liu et al. 1997a; Qin et al. 2014), and low-specificity l-TAs from Escherichia coli (l-TAe) (di Salvo et al. 2014; Liu et al. 1998a), Saccharomyces cerevisiae (l-TAsc) (Liu et al. 1997b), and Thermotoga maritima (l-TAtm) (Kielkopf and Burley 2002). l-TAaj, l-TAe, and l-TAtm have also been structurally characterized (di Salvo et al. 2014; Kielkopf and Burley 2002; Qin et al. 2014). The second cluster (cluster B) has only prokaryotic l-TAs (Fig. 2b), including low-specificity l-TA from Pseudomonas sp. NCIMB 10558 (l-TAps) (Liu et al. 1998c), and two enzymes from Pseudomonas aeruginosa (l-TApa) and Pseudomonas putida (l-TApp) that prefer l-threonine (Fesko et al. 2008). So far no structure has been reported from cluster B. Further decreasing the cutoff value led to separation of fungal ARs and fungal l-TAs from cluster A, but no separation of cluster B, even when the cutoff was decreased to a very stringent value of 1E-95 (Fig. 2c, d). Multiple sequence alignment of the selected l-TAs shows that, although enzymes from one cluster are clearly different from those of the other cluster, both clusters of enzymes share many conserved residues (including those constituting the active site) and likely possess the same structural fold (Supplementary Fig. 1). These results suggest that l-TAs have evolved convergently from two ancestral families.
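For readers who wish to reproduce identity comparisons of this kind, the following Python sketch computes percent identity between two rows of a multiple sequence alignment. It is an illustration under one assumed definition (identical residues divided by the number of columns in which neither sequence has a gap); other definitions exist, and the sequences shown are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0  # no overlapping columns to compare
    return 100.0 * matches / compared

# Made-up aligned fragments (assumption, not real TA sequences).
row_a = "MKV-LTA"
row_b = "MRVALSA"
print(round(percent_identity(row_a, row_b), 1))
```

Applying such a function to every pair of aligned sequences, and then grouping the values by cluster membership, gives within-cluster and between-cluster identity ranges comparable to those quoted above.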
The catalytic specificity is highly diverse among cluster A enzymes, ranging from very stringent specificity for l-allo-threonine (l-TAaj) to high tolerance with regard to the β-position (l-TAe and l-TAsc). We note that all the above-mentioned enzymes prefer l-allo-threonine over l-threonine to a certain extent (Kataoka et al. 1997b; Liu et al. 1998a; Qin et al. 2014). In contrast, all three biochemically characterized enzymes in cluster B (l-TAps, l-TApa, and l-TApp) prefer l-threonine, although the specificity was reported to be very low for l-TAps (Liu et al. 1998c). It remains to be investigated whether the two clusters of l-TAs have different substrate specificities toward the β-position (i.e., cluster A enzymes may generally prefer l-allo-threonine, whereas cluster B enzymes may generally prefer l-threonine).
Phylogenetic Analysis
To confirm the proposal that l-TAs have evolved from two ancestral families, we performed phylogenetic analysis of l-TAs using the Bayesian MCMC method (Mau et al. 1999). The analysis includes all the enzymes from cluster A and cluster B, and SHMTs that serve as the outgroup (due to the very low sequence similarities of d-TAs and bacterial ARs with l-TAs, the former two classes of enzymes were not included in the phylogenetic analysis). Indeed, l-TAs separate into two clusters with good statistical support (Fig. 3). Detailed examination of each enzyme in the Bayesian MCMC tree showed that the two clusters correspond well with clusters A and B in our sequence similarity network (Fig. 3 and Supplementary Table 1), further supporting that l-TAs are derived from two different origins. We also constructed a phylogenetic tree using a maximum likelihood (ML)-based method; this tree is very similar to the Bayesian MCMC tree and therefore confirms the robustness of our analysis (Supplementary Fig. 2). We noted that in many cases, enzyme phylogeny does not correlate with the host taxonomy (e.g., enzymes from cyanobacteria and firmicutes are found in many branches of the tree and do not form separate clusters) (Fig. 3 and Supplementary Fig. 2). This observation suggests that horizontal gene transfer might have occurred frequently during TA evolution. Another interesting finding is that no significant divergence is observed between high- and low-specificity enzymes. For example, l-TApp is believed to be l-threonine specific (Fesko et al. 2008; Liu et al. 1998c) but is phylogenetically closely related to a low-specificity enzyme, l-TAps (Fig. 3 and Supplementary Fig. 2). In addition, l-TAaj, which is specific for l-allo-threonine (Kataoka et al. 1997b), is not far away from a low-specificity enzyme, l-TAe (Liu et al. 1998a) (Fig. 3 and Supplementary Fig. 2). These observations suggest the possibility that many TAs may be engineered, using techniques such as random PCR mutagenesis, DNA shuffling, or chemical modification, to significantly alter their catalytic specificity.
Conclusions
We performed sequence similarity network and phylogenetic analysis on TAs and their closely related PLP-dependent enzymes. We show that SHMTs, bacterial ARs, d-TAs, and l-TAs are derived from different origins within the same cluster, and l-TAs are further grouped into two evolutionarily distinct families. Our results suggest a convergent evolutionary process for l-TAs and the importance of these enzymes for life processes. As horizontal gene transfer appears to have occurred frequently during TA evolution, and the distribution of these enzymes is highly diverse among different organisms (e.g., many organisms do not have a TA-encoding gene in the genome, whereas many others contain more than one TA gene), it is likely that TAs are involved in some secondary metabolic pathways besides glycine biosynthesis. The fact that several l-threonine analogs are more active substrates of l-TAs than l-threonine is consistent with this proposal, which needs to be further tested. Our analysis also suggests the potential of engineering the catalytic specificity of TAs and of screening novel TAs with desired activity by sampling sequence space that so far has not been tapped (e.g., enzymes from archaea). These studies are of particular interest because the low diastereospecificity of TAs is currently the main hurdle in using these enzymes for synthetic applications.
References
Anisimova M, Gascuel O (2006) Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol 55:539
Atkinson HJ, Morris JH, Ferrin TE, Babbitt PC (2009) Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS ONE 4:e4345
Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, Workman C, Christmas R, Avila-Campilo I, Creech M, Gross B, Hanspers K, Isserlin R, Kelley R, Killcoyne S, Lotia S, Maere S, Morris J, Ono K, Pavlovic V, Pico AR, Vailaya A, Wang PL, Adler A, Conklin BR, Hood L, Kuiper M, Sander C, Schmulevich I, Schwikowski B, Warner GJ, Ideker T, Bader GD (2007) Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2:2366
Contestabile R, Paiardini A, Pascarella S, di Salvo ML, D’Aguanno S, Bossa F (2001) l-Threonine aldolase, serine hydroxymethyltransferase and fungal alanine racemase—a subgroup of strictly related enzymes specialized for different functions. Eur J Biochem 268:6508
di Salvo ML, Remesh SG, Vivoli M, Ghatge MS, Paiardini A, D’Aguanno S, Safo MK, Contestabile R (2014) On the catalytic mechanism and stereospecificity of Escherichia coli l-threonine aldolase. FEBS J 281:129
Duckers N, Baer K, Simon S, Groger H, Hummel W (2010) Threonine aldolases-screening, properties and applications in the synthesis of non-proteinogenic beta-hydroxy-alpha-amino acids. Appl Microbiol Biotechnol 88:409
Edgar AJ (2005) Mice have a transcribed L-threonine aldolase/GLY1 gene, but the human GLY1 gene is a non-processed pseudogene. BMC Genom 6:32
Eliot AC, Kirsch JF (2004) Pyridoxal phosphate enzymes: mechanistic, structural, and evolutionary considerations. Annu Rev Biochem 73:383
Fesko K, Reisinger C, Steinreiber J, Weber H, Schurmann M, Griengl H (2008) Four types of threonine aldolases: similarities and differences in kinetics/thermodynamics. J Mol Catal B-Enzym 52–3:19
Franz SE, Stewart JD (2014) Threonine aldolases. Adv Appl Microbiol 88:57
Gish W, States DJ (1993) Identification of protein coding regions by database similarity search. Nat Genet 3:266
Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696
Hayashi H (1995) Pyridoxal enzymes: mechanistic diversity and uniformity. J Biochem 118:463
Jander G, Norris SR, Joshi V, Fraga M, Rugg A, Yu S, Li L, Last RL (2004) Application of a high-throughput HPLC-MS/MS assay to Arabidopsis mutant screening; evidence that threonine aldolase plays a role in seed nutritional quality. Plant J 39:465
Kataoka M, Ikemi M, Morikawa T, Miyoshi T, Nishi K, Wada M, Yamada H, Shimizu S (1997a) Isolation and characterization of d-threonine aldolase, a pyridoxal-5′-phosphate-dependent enzyme from Arthrobacter sp. DK-38. Eur J Biochem 248:385
Kataoka M, Wada M, Nishi K, Yamada H, Shimizu S (1997b) Purification and characterization of L-allo-threonine aldolase from Aeromonas jandaei DK-39. FEMS Microbiol Lett 151:245
Kielkopf CL, Burley SK (2002) X-ray structures of threonine aldolase complexes: structural basis of substrate recognition. Biochemistry 41:11711
Liu JQ, Dairi T, Kataoka M, Shimizu S, Yamada H (1997a) L-allo-threonine aldolase from Aeromonas jandaei DK-39: gene cloning, nucleotide sequencing, and identification of the pyridoxal 5′-phosphate-binding lysine residue by site-directed mutagenesis. J Bacteriol 179:3555
Liu JQ, Nagata S, Dairi T, Misono H, Shimizu S, Yamada H (1997b) The GLY1 gene of Saccharomyces cerevisiae encodes a low-specific l-threonine aldolase that catalyzes cleavage of L-allo-threonine and l-threonine to glycine–expression of the gene in Escherichia coli and purification and characterization of the enzyme. Eur J Biochem 245:289
Liu JQ, Dairi T, Itoh N, Kataoka M, Shimizu S, Yamada H (1998a) Gene cloning, biochemical characterization and physiological role of a thermostable low-specificity l-threonine aldolase from Escherichia coli. Eur J Biochem 255:220
Liu JQ, Dairi T, Itoh N, Kataoka M, Shimizu S, Yamada H (1998b) A novel metal-activated pyridoxal enzyme with a unique primary structure, low specificity d-threonine aldolase from Arthrobacter sp. Strain DK-38. Molecular cloning and cofactor characterization. J Biol Chem 273:16678
Liu JQ, Ito S, Dairi T, Itoh N, Kataoka M, Shimizu S, Yamada H (1998c) Gene cloning, nucleotide sequencing, and purification and characterization of the low-specificity l-threonine aldolase from Pseudomonas sp. strain NCIMB 10558. Appl Environ Microbiol 64:549
Liu JQ, Dairi T, Itoh N, Kataoka M, Shimizu S, Yamada H (2000a) Diversity of microbial threonine aldolases and their application. J Mol Catal B-Enzym 10:107
Liu JQ, Odani M, Yasuoka T, Dairi T, Itoh N, Kataoka M, Shimizu S, Yamada H (2000b) Gene cloning and overproduction of low-specificity d-threonine aldolase from Alcaligenes xylosoxidans and its application for production of a key intermediate for parkinsonism drug. Appl Microbiol Biotechnol 54:44
Lukk T, Sakai A, Kalyanaraman C, Brown SD, Imker HJ, Song L, Fedorov AA, Fedorov EV, Toro R, Hillerich B, Seidel R, Patskovsky Y, Vetting MW, Nair SK, Babbitt PC, Almo SC, Gerlt JA, Jacobson MP (2012) Homology models guide discovery of diverse enzyme specificities among dipeptide epimerases in the enolase superfamily. Proc Natl Acad Sci 109:4122
Mau B, Newton MA, Larget B (1999) Bayesian phylogenetic inference via Markov chain Monte Carlo methods. Biometrics 55:1
McNeil JB, McIntosh EM, Taylor BV, Zhang FR, Tang S, Bognar AL (1994) Cloning and molecular characterization of three genes, including two genes encoding serine hydroxymethyltransferases, whose inactivation is required to render yeast auxotrophic for glycine. J Biol Chem 269:9155
Paiardini A, Contestabile R, D’Aguanno S, Pascarella S, Bossa F (2003) Threonine aldolase and alanine racemase: novel examples of convergent evolution in the superfamily of vitamin B6-dependent enzymes. Biochim Biophys Acta 1647:214
Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35:D61
Qin HM, Imai FL, Miyakawa T, Kataoka M, Kitamura N, Urano N, Mori K, Kawabata H, Okai M, Ohtsuka J, Hou F, Nagata K, Shimizu S, Tanokura M (2014) L-allo-threonine aldolase with an H128Y/S292R mutation from Aeromonas jandaei DK-39 reveals the structural basis of changes in substrate stereoselectivity. Acta Crystallogr D Biol Crystallogr 70:1695
Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Hohna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876
Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18:691
Zhao S, Sakai A, Zhang X, Vetting MW, Kumar R, Hillerich B, San Francisco B, Solbiati J, Steves A, Brown S, Akiva E, Barber A, Seidel RD, Babbitt PC, Almo SC, Gerlt JA, Jacobson MP (2014) Prediction and characterization of enzymatic activities guided by sequence similarity and genome neighborhood networks. Elife 3:e03275
Acknowledgments
This work was supported in part by grants from National Natural Science Foundation (2011DFA32520 to M.Z.), from State Key Laboratory of Bioorganic Chemistry (SKLBNPC13425 to W.D.), and from Fudan University (IDH1615002 to Q.Z.). Q.Z. would also like to thank the Thousand Talents Program for its support.
Cite this article
Liu, G., Zhang, M., Chen, X. et al. Evolution of Threonine Aldolases, a Diverse Family Involved in the Second Pathway of Glycine Biosynthesis. J Mol Evol 80, 102–107 (2015). https://doi.org/10.1007/s00239-015-9667-y
DOI: https://doi.org/10.1007/s00239-015-9667-y