Introduction

All organisms are endowed with sensory systems that allow them to respond to different environmental chemical signals. The quorum sensing signaling system is a cell-to-cell communication process mediated by molecules known as autoinducers (e.g., acylated homoserine lactones, small peptides, furanosyl borate diester). Changes in autoinducer concentration caused by variations in population density trigger intracellular signaling cascades (Bharati and Chatterji 2013). Some components involved in these processes are in fact intra- and extracellular second messengers, which have been defined as alarmones. Alarmone is a term first used by Stephens et al. (1975) to describe signaling molecules that are ribonucleotides or have a chemical structure similar to a ribonucleotide, such as ppGpp or cAMP, which activate or inhibit different cellular processes during stress conditions. Variations in alarmone concentration alter gene expression or metabolism, allowing organisms to adapt to environmental insults. Production of alarmones is increased when cells face a wide range of stress conditions, including amino acid, carbon, carbohydrate, and phosphate starvation, as well as limitation of fatty acids or metallic ions, changes in temperature, salinity, and pH, oxidative stress, and the presence of antibiotics (Supplementary Table S1). An increase in alarmone concentration triggers a response known as the stringent response or stringent control (Stent and Brenner 1961), which regulates key cellular processes such as replication, transcription, translation, and metabolism, allowing organisms to switch from a growth mode to a survival one or, in some cases, promoting the appearance of flagella that allow migration to more favorable environments (Pesavento and Hengge 2009; Bharati and Chatterji 2013).

The chemical structures of all known alarmones include a purine moiety (Fig. 1). As summarized by Nelson and Breaker (Breaker 2010; Nelson and Breaker 2017), this bias can be explained by the fact that purines (a) are less soluble than pyrimidines, favoring their associations with RNA polymers; (b) contain a fused two-ring system and stack better than pyrimidines, allowing them a more favorable interaction with the biological receptors; and (c) are preferred by artificial ribozymes that recognize cyclic mononucleotides. However, other factors may be involved in this bias. For instance, it has been argued that delocalizable π electrons of heterocyclic conjugated molecules could have played a key role in the selection of the chemical composition of living systems (Pullman 1972). As underlined by Pullman (1972), purines have a resonance energy higher than pyrimidines. This makes purines more stable, with adenine being the most stable of all the biochemically significant bases, and the one that exhibits the greatest value of resonance energy per π electron, which makes it more resistant to radiation damage. Therefore, it is reasonable to assume that this may have led to its selection as part of the basic skeleton of many compounds, including high energy molecules (e.g., ATP), coenzymes (e.g., NAD, FAD, CoA), alarmones (e.g., cAMP), and some antibiotics (e.g., lysylaminoadenosine).

Fig. 1

Alarmones divided into four main groups. The first group, adenine-based alarmones, is formed by cyclic AMP (cAMP), cyclic diadenylate (c-di-AMP), and diadenosine polyphosphates (Ap(n)A), all of which are indicated with a red arrow. The second group comprises alarmones that have a guanine ring in their chemical structure, such as cyclic GMP (cGMP), cyclic diguanylate (c-di-GMP), guanosine (penta-)tetraphosphate ((p)ppGpp), and diguanosine polyphosphates (Gp(n)G), which are indicated with a blue arrow. A third group, indicated with a green arrow, is formed by metabolic intermediates in the biosynthesis of purines, such as aminoimidazole carboxamide ribotide (AICAR), or molecules derived from purines, such as aminoimidazole carboxamide riboside triphosphate (ZTP). The last group is formed by joining guanine and adenine purines (indicated with a purple arrow), producing the cyclic guanosine monophosphate-adenosine monophosphate (cGAMP) alarmone. Alarmones with a universal cellular distribution are highlighted with a yellow background. Red and blue circles represent the chemical differences between purines and pyrimidines, respectively. (Color figure online)

Alarmones can be classified according to their nucleotide composition and the cyclic or linear chemical linkage they form (Nelson and Breaker 2017) or, as shown in Fig. 1, by their chemical structure. In the latter case, four main groups can be recognized (Fig. 1): (a) adenine-ribonucleotide group; (b) guanine-ribonucleotide group; (c) alarmones derived from purine anabolism intermediates; and (d) those formed by guanine and adenine producing a cyclic dinucleotide. The adenine-alarmone group includes the well-known cyclic AMP (cAMP), together with cyclic diadenylate (c-di-AMP) and diadenosine polyphosphates (Ap(n)A). Alarmones endowed with a guanine moiety in their chemical structure include cyclic GMP (cGMP), cyclic diguanylate (c-di-GMP), guanosine (penta-)tetraphosphate ((p)ppGpp), and diguanosine polyphosphates (Gp(n)G). A small third group is formed by intermediates of purine metabolism, which includes aminoimidazole carboxamide ribonucleotide (AICAR) and aminoimidazole carboxamide riboside triphosphate (ZTP). Finally, the last group is formed by the cyclic guanosine monophosphate-adenosine monophosphate alarmone (cGAMP). Other modified purine [e.g., Xp(n)X] and pyrimidine ribonucleotides [e.g., cUMP, cCMP, Up(n)U] have been described, but their role in signaling cascades is poorly understood. The agonist effect of some uridine dinucleoside polyphosphates on their receptors has been shown to affect proliferation, differentiation, phagocytosis, secretion, cell adhesion, and migration (Pendergast et al. 2001; Seifert et al. 2011). This suggests that these molecules could perhaps also function as alarmones or regulators (Nelson and Breaker 2017).

The key role of alarmones in the regulation of cellular processes, together with their biological distribution in the three major biological lineages, has been interpreted as evidence that signal mechanisms were already present in the last common ancestor (LCA) (i.e., the last universal common ancestor or LUCA) of all extant life forms (Lazcano et al. 2011). As reported here, analyses of the phylogenetic distribution of alarmone biosynthetic enzymes strongly support this possibility. Alarmone components (Fig. 1) are easily formed in prebiotic simulation experiments (Oró 1960; Ferris et al. 1978; Ritson and Sutherland 2012; Rios and Tor 2013), and are present in carbonaceous chondrites (Copper et al. 2001; Callahan et al. 2011), suggesting that they may have appeared during earlier stages of cellular evolution when RNA molecules and ribonucleotide derivatives played a much more conspicuous role in replication and metabolism (Breaker 2010; Lazcano et al. 2011; Lazcano 2014, 2018; Nelson and Breaker 2017). It has been argued that ribonucleotidyl coenzymes may also be vestiges of early stages of cellular evolution (Orgel 1968; Orgel and Sulston 1971) as discussed here: the same appears to be true for alarmones, and they appear to represent the oldest recognizable system of cell signaling involved in intra- and extracellular communication, environmental sensing, and interaction with organisms of the same and different species (Breaker 2010; Lazcano et al. 2011; Nelson and Breaker 2017).

Results

Phylogenetic Distribution of Biosynthetic Enzymes of Alarmones in Completely Sequenced Cellular Genomes

The chemical nature of alarmones cannot be used to reconstruct their evolutionary relationships, but the history of these signal molecules can be understood in part through sequence comparison and the phylogenetic distribution of the enzymes involved in their biosynthesis and degradation. Therefore, a search for homologs of biosynthetic and degradative enzymes of alarmones was carried out in completely sequenced cellular genomes (see “Materials and Methods”). Our results are summarized in Supplementary Table S2 and Fig. 2. Supplementary Table S2 displays the distribution of homologous enzymes (denoted as H) involved in the biosynthesis and in the degradation of alarmones. Letters in each column of the table represent the acronym for each organism’s genome from the KEGG database. Their phylogenetic distribution, based on an rRNA evolutionary tree, is shown in Fig. 2.

Fig. 2

Phylogenetic distribution of enzymes involved in the synthesis of alarmones. Five groups of alarmones were analyzed: (a) cAMP, c-di-AMP, and Ap(n)A; (b) cGMP, c-di-GMP, (p)ppGpp, and Gp(n)G; (c) AICAR and ZTP; (d) cGAMP; (e) Xp(n)X, Up(n)U, cCMP, and cUMP. The outer circles of the phylogenies represent the presence of alarmone biosynthetic enzymes in each clade, except for tree e, which represents the degradative enzymes. The red clade indicates the eukaryal branches, the green clade the archaeal branches, and the blue clade the bacterial branches. (Color figure online)

As shown in Fig. 2, the biosyntheses of five groups of alarmones were analyzed: (a) adenine-based alarmones (cAMP, c-di-AMP, and Ap(n)A); (b) alarmones that have a guanine ring (cGMP, c-di-GMP, (p)ppGpp, and Gp(n)G); (c) alarmones formed by intermediates in the purine metabolism (AICAR and ZTP); (d) alarmones formed by guanine and adenine purines in a cyclic structure (cGAMP); and (e) alarmone-like molecules (Xp(n)X, Up(n)U, cCMP, and cUMP). In Fig. 2, the color line drawn outside the tip of the branches in each clade of the phylogenetic trees shows the presence of alarmone biosynthetic enzymes. The enzymes were considered to be present in each clade if (1) more than fifty percent of the branches exhibited homologous hits, or if (2) such hits were distributed through the main basal branches of each clade.
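The two presence criteria described above can be sketched in code. The following is only a minimal illustration, assuming that each clade is represented as a mapping from branch name to a boolean hit flag and that the main basal branches are supplied by hand. The function name `enzyme_present_in_clade` and the data layout are hypothetical and are not taken from the analysis pipeline used here. Criterion (2) is interpreted as requiring a hit in every main basal branch, which is one possible reading of the rule.

```python
from typing import Dict, Iterable


def enzyme_present_in_clade(
    branch_hits: Dict[str, bool],
    basal_branches: Iterable[str] = (),
) -> bool:
    """Apply the two presence criteria to one clade (hypothetical helper).

    branch_hits: branch name -> True if a homologous hit was found.
    basal_branches: names of the main basal branches of the clade.
    """
    if not branch_hits:
        return False

    # Criterion (1): more than fifty percent of the branches have hits.
    fraction_with_hits = sum(branch_hits.values()) / len(branch_hits)
    if fraction_with_hits > 0.5:
        return True

    # Criterion (2), as interpreted here: every main basal branch has a hit.
    basal = list(basal_branches)
    return bool(basal) and all(branch_hits.get(name, False) for name in basal)


# Toy clade with hits in only one of four branches.
toy_clade = {"b1": True, "b2": False, "b3": False, "b4": False}
print(enzyme_present_in_clade(toy_clade))           # criterion (1) alone
print(enzyme_present_in_clade(toy_clade, ["b1"]))   # criterion (2) applies
```

In this toy example the clade fails criterion (1) but passes criterion (2) when its only main basal branch carries a hit.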

As shown in the phylogenetic trees in Fig. 2, enzymes involved in biosynthesis of cAMP, Ap(n)A, cGMP, Gp(n)G, AICAR, and ZTP have a universal distribution, while enzymes involved in the synthesis of c-di-AMP, c-di-GMP, and (p)ppGpp are clearly restricted to the Bacteria domain. The absence of homologs in the Chlamydia and Tenericutes clades in most phylogenies can be understood as a secondary loss due to their intracellular parasitic lifestyle.

Since sequences of biosynthetic enzymes of Xp(n)X, Up(n)U, cCMP, and cUMP have not been reported, only their degradative enzymes were analyzed. All of them are present in almost all cellular lineages.

It has recently been proposed that other cyclic dinucleotides such as cyclic guanosine monophosphate-adenosine monophosphate (cGMP-AMP or cGAMP) (Fig. 2) can also function as an endogenous second messenger. In animals, it triggers the production of type I interferons in response to foreign DNA, e.g., DNA transfection or DNA virus infection (Wu et al. 2013; Sun et al. 2013; Hall et al. 2017), or an efficient intestinal colonization by Vibrio cholerae (Davies et al. 2012). The cGAMP biosynthetic enzyme is distributed in some γ-proteobacteria, essentially in the genus Vibrio, as well as in the animal group (Supplementary Table S2; Fig. 2). In our searches, the use of V. cholerae query sequences detected homologs only in γ-proteobacteria, while animal sequences used as queries found homologs exclusively within the animal group itself. However, tertiary structure comparisons of the two groups of cGMP-AMP biosynthetic enzymes demonstrated their monophyletic origin (see below). No homologs were detected in the Archaea domain using any of the above-mentioned sequences or three-dimensional structures.

Alarmone Biosynthetic and Degradative Enzymes and Their Homologous Sequences Present in DNA and RNA Viral Genomes

A search for homologs of alarmone biosynthetic and degradative enzymes was carried out in 5,691 DNA and RNA viral genomes. As shown in Supplementary Table S3, homologous sequences of five enzymes (EC: 3.6.1.29, 3.6.1.61, 3.6.1.17, 3.6.1.41, and 3.1.7.2) involved in the degradation of Ap(n)A, Gp(n)G, Xp(n)X, Up(n)U, and (p)ppGpp, as well as one biosynthetic enzyme (EC: 2.7.6.1) for ZTP, were found in dsDNA viruses that have no RNA stage in their biological cycle. Most of these DNA viruses belong to the Myoviridae and Siphoviridae families, whose hosts are γ-proteobacteria and Firmicutes and Actinobacteria, respectively. The presence of sequences encoding phosphoribosyl pyrophosphate (PRPP) synthetase (EC: 2.7.6.1) in viral genomes probably reflects the role of this enzyme in the production of PRPP rather than of ZTP. No sequences encoding alarmone biosynthetic or degradative enzymes were found in the RNA viral genomes included in our analysis.

Structural Similarity Among Adenylate Cyclase (AC), Guanylate Cyclase (GC), Diguanylate Cyclase (DGC), and Polymerase Palm Domain

Cyclic nucleotides, such as cyclic AMP, cyclic GMP and cyclic di-GMP, are highly versatile second messengers that regulate different important cellular processes, including replication, transcription, motility, virulence, biofilm formation, cell cycle progression, and differentiation, among others (see Supplementary Table S1), and they are synthesized by adenylate cyclase (AC), guanylate cyclase (GC), and diguanylate cyclase (DGC), respectively. The structural homology of the catalytic core of ACs class III with the polymerase palm domain of DNA- and viral monomeric RNA polymerases is well established (Artymiuk et al. 1997; Bierger and Essen 2001; Bassler et al. 2018) (Fig. 3). This evolutionary relationship with the catalytic palm domain, which is the oldest and most conserved part of DNA- and monomeric viral RNA polymerases (Jácome et al. 2015), suggests that ACs may have emerged early in the evolution of life, most probably during an RNA/protein World. There are other cyclases that synthesize alarmones (e.g., GC, DGC, DAC), whose origin is important to elucidate. As shown in Fig. 3, guanylate and diguanylate cyclases are also homologous with the palm domain, and may be the outcome of ancient gene duplications and recruitment events (Supplementary Figure S1). The palm domain, which is constituted by four β-strands and three α-helices, is similar both in element number and topology to the catalytic domain of different alarmone biosynthetic enzymes, including adenylate cyclases (class III), guanylate cyclases, and diguanylate cyclases (Bierger and Essen 2001). In fact, nucleotide polymerization and the biosyntheses of cAMP, cGMP, and c-di-GMP are chemically equivalent processes and involve a nucleophilic attack from the 3′OH group of the ribose to the α-phosphate of a nucleotide 5′-triphosphate, with the elimination of pyrophosphate (Artymiuk et al. 1997). In both cases, two conserved aspartate residues and a metal cofactor, usually Mg2+ or Mn2+, are necessary for catalysis.

Fig. 3

The palm domain structure found in different enzymes involved in alarmone biosynthesis and present in nucleotide polymerases. The presence of the palm domain in different enzymes that synthesize alarmones, including adenylate cyclases (class III), guanylate cyclases, and diguanylate cyclases, indicates that they are evolutionarily related to an ancient nucleotide polymerase. In each structure, α-helices are colored in red, β-strands in yellow, and loops in green; α-helices and β-strands colored in blue indicate structural segments not present in the palm domain. (Color figure online)

Comparison of the palm domain and cGAMP synthase did not reveal high levels of primary structure similarity. However, tertiary structure comparisons demonstrate that the catalytic site of cGAMP synthase is homologous to the non-canonical palm domain present in DNA polymerase III or DNA polymerase β (Bailey et al. 2006; Lamers et al. 2006) (Supplementary Figure S2).

Discussion

Universal Distribution of Alarmone Biosynthetic Routes

Comparisons of completely sequenced cellular genomes have shown that the biosynthetic and degradative enzymes of alarmones, including cAMP, Ap(n)A, cGMP, AICAR, and ZTP, are widely distributed in all extant life forms (Fig. 2; Table S2). The universal phylogenetic distribution of biosynthetic and degradative enzymes of alarmones, combined with their key role in the regulation of different conserved cellular processes such as replication, gene expression, and metabolism, suggests that they are indeed very ancient molecules. This supports the possibility that the LCA (LUCA) was already endowed with an elaborate sensory system, comparable to that of extant cells, that responded to these signal molecules (Lazcano et al. 2011; Becerra et al. 2007). This is consistent with the idea that biochemical pathways of earlier life forms were regulated by chemical signals derived from RNA molecules and ribonucleotides (Nelson and Breaker 2017).

Degradative enzymes of Gp(n)G, Xp(n)X, Up(n)U, cCMP, and cUMP molecules also have a universal distribution (Fig. 2), but the corresponding biosynthetic enzymes of these nucleotides are not detected in the available databases. There are several possible explanations for their absence: (a) metabolic accidents during stress conditions (e.g., cells under heat shock) could lead to accumulation of these unusual nucleotides, and enzymatic degradation of these products would explain the wide phylogenetic distribution of the corresponding enzymes; (b) Gp(n)G, Xp(n)X, Up(n)U, cCMP, and cUMP may be secondary products of enzymatic hydrolysis of RNA molecules; (c) they are synthesized by enzymes of broad substrate specificity; or (d) the degradative enzymes may be part of a salvage mechanism to recycle ribonucleotides.

As shown here, analysis of completely sequenced cellular genomes indicates that some alarmones have an uneven phylogenetic distribution (e.g., c-di-AMP, c-di-GMP, (p)ppGpp, cGAMP). Comparisons of alarmone biosynthetic enzymes have also shown that enzymes involved in the biosynthesis of c-di-AMP and c-di-GMP are not found in all completely sequenced cellular genomes. As shown in Fig. 2, sequences encoding diadenylate cyclases (DACs) and diguanylate cyclases (DGCs) are present in most bacterial genomes. Indeed, most enzymes involved in synthesis and degradation of alarmones are widely distributed in Bacteria, suggesting that this is the cellular domain that has exploited the widest range of these signal molecules. The same is true for sequences involved in (p)ppGpp biosynthesis. Nevertheless, other studies (Atkinson et al. 2011) have reported the presence of (p)ppGpp synthase and hydrolase in some members of the archaeal (e.g., Methanosarcina, Natronomonas, and Methanococcoides) and eukaryal (e.g., some algae, amoeba, other protists, and fungi) domains. The absence of homologous sequences reported here can be explained in part by the stringent search parameters we have employed, e.g., the percentage of the query and subject sequences covered by the alignment. More flexible parameters allowed us to identify only (p)ppGpp hydrolase homologs. However, the stringency of our analysis resulted in fewer false positives, whereas less stringent parameters yielded more hits, including isolated protein domains or even sequence segments, dramatically increasing the number of false positives. The phylogenetic distribution of the cGAMP synthase in some γ-proteobacteria (e.g., Vibrio cholerae) and in animals (Supplementary Table S2; Fig. 2) suggests that the cyclic GMP-AMP alarmone is a very recent second messenger, and that the corresponding biosynthetic enzyme was horizontally transferred. Although V. cholerae and animal cGAMP synthase sequences have low similarity levels, their conserved tertiary structure suggests their common origin.

The ubiquitous distribution of AC class III and the apparent absence of lateral gene transfer strongly support previous suggestions that cAMP was already present in the LCA (Lazcano et al. 2011). The wide variety of domains associated with the cyclase catalytic domain could reflect diverse strategies by which the additional domains could regulate different cellular processes (see Bassler et al. 2018; Shenoy et al. 2004; Baker and Kelly 2004; Shenoy and Visweswariah 2004) (Supplementary Table S4). The evolutionary history of genes involved in cAMP synthesis appears to have been shaped by domain shuffling and gene duplication (Lazcano et al. 2011). As argued by Lazcano and coworkers (2011) and Nelson and Breaker (2017), cAMP and other alarmones can be considered “metabolic fossils” from early stages of biological evolution when ribonucleotides and RNA molecules played a more conspicuous role in chemical signaling, metabolite sensing, and storage of genetic information and catalysis.

The presence of the palm domain in different enzymes that synthesize alarmones, including adenylate cyclases (class III), guanylate cyclases, and diguanylate cyclases, indicates that they are all descended from a protein domain shared with an ancient nucleotide polymerase ancestor, which was duplicated and recruited many times during the evolutionary history of polymerases and cyclases (Fig. 3; Supplementary Figure S1). Although the canonical palm domain is not a homolog of cGAMP synthase, the structural comparison between the DNA polymerase X palm domain and the cGAMP synthase showed high levels of similarity, suggesting the monophyletic origin of these two catalytic structures (Supplementary Figure S2).

A detailed bioinformatic search for alarmone biosynthetic and degradative sequences in DNA and RNA viral genomes also revealed that some double-stranded DNA bacteriophages encode sequences involved in alarmone synthesis or degradation (Supplementary Table S3). Homologous sequences of alarmone degradative enzymes were found in these same viral genomes. A possible explanation is that viruses control the intracellular alarmone concentration via degradation as a strategy that allows them to remain hidden from other non-infected cells of the bacterial population.

Alarmones and the RNA World

There are many different definitions of the RNA World, but all of them are based on the hypothesis that during early evolution of life replication and catalysis were mediated primarily by RNA molecules (Gilbert 1986). However, the RNA World should not be understood as a mere collection of catalytic and replicative polynucleotides (Lazcano 2014; Vázquez-Salazar and Lazcano 2018), but rather as a stage during which the interaction of RNA molecules, small molecules such as ribonucleotidyl cofactors (Handler 1961; Eakin 1963; Orgel 1968; White 1976), metallic ions, lipids, and small and simple peptides allowed the maintenance and evolution of a primitive replicative and metabolic apparatus (Lazcano 2014, 2018). Proposals of an RNA World are supported by the stunning expansion of the functional repertoire of synthetic ribozymes (Chen et al. 2007), which catalyze the same classes of chemical reactions as enzymes (Table 1), as well as by the direct participation of ribonucleotide cofactors and modified nucleotides in biological catalysis (White 1982; Lazcano 2014), the regulation of gene expression by riboswitches and other non-coding small RNAs (Breaker 2010; Lazcano 2014; Chen and Gottesman 2014; Nelson and Breaker 2017; Vázquez-Salazar and Lazcano 2018; González-Plaza 2018), and, as noted here, the activation or inhibition of cellular processes by alarmones.

Table 1 Ribozyme-mediated catalysis

Although in principle some cyclic deoxyribonucleotides (e.g., cyclic dAMP) could function as alarmones, this is not the case (Nelson and Breaker 2017). The fact that all known alarmones are derived from ribonucleotides but not from deoxyribonucleotides is certainly consistent with the possibility that at least some of them are remnants of an epoch prior to the evolutionary emergence of DNA cellular genomes (Lazcano et al. 2011; Nelson and Breaker 2017). How did they first evolve? Did alarmones first emerge in the RNA World? If so, what was their original role? Or did they first appear in the RNA/protein World? Did RNA/protein-based entities simply take advantage of what was already there? There are several alternative possibilities. Perhaps the simplest explanation is that they were by-products of the degradation and recycling of RNA molecules (Nelson and Breaker 2017), or resulted from side reactions of catalysts of a ribozyme-mediated underground metabolism. Underground metabolism has been defined as those reactions that occur when enzymes (or ribozymes, as discussed here) use chemically similar substrates that form part of endogenous metabolites. Although biochemical reactions are exceptionally precise, molecular errors may confer evolutionary advantages, such as an enhanced metabolic plasticity or the establishment of new pathways (D’Ari and Casadesús 1998). In both cases, the resulting alarmone precursors would be incorporated as signal mechanisms by an exaptation phenomenon prior to the evolutionary divergence of the Bacteria, Archaea, and Eukarya.

The results presented here cannot be extrapolated back in time beyond a period in which ribosome-mediated polypeptide synthesis had already evolved. However, in vitro selection experiments have shown that RNA molecules catalyze the formation of modified ribonucleotides such as CoA, NAD, and FAD (Huang et al. 2000), supporting the possibility that cyclic or polyphosphate nucleotides such as alarmones could have also been synthesized by ribozymes. Since some of the reactions that form alarmones are equivalent to polynucleotide elongation reactions (Supplementary Table S5), the demonstration of ligase and polymerase activity of ribozymes, together with many other RNA-catalyzed chemical reactions, supports the assumption that these signal molecules could have been originally synthesized by catalytic RNA (Supplementary Table S6). Moreover, it has been reported that some (di)nucleotides, whose chemical structure resembles that of some alarmones (e.g., AppA or GppG), can be synthesized using a polyribonucleotide template (Puthenvedu et al. 2015; Majerfeld et al. 2016). It is therefore very likely that moieties of nucleotidyl-alarmones were available in an RNA World and in an RNA/protein World.

Alarmones as Examples of Exaptations in Early Evolution of Life

The evolutionary history of some alarmones such as 5-aminoimidazole-4-carboxamide ribonucleotide (AICAR) is probably best understood as an exaptation process, in which there was a shift during evolution of the original function of a molecule to another role (Fig. 4). It has been traditionally thought that intermediate forms or states do not have adaptive value by themselves and that they can only be recognized when they are preserved in the fossil record. However, at a subcellular level there are metabolic intermediates that have been recruited and perform several other functions due to their adaptive value. This appears to be the case of AICAR, which is a molecule used in different cellular processes (Fig. 4). AICAR is a metabolic intermediary in the biosynthesis of inosine monophosphate (IMP) in purine anabolism, and is also formed in histidine anabolism (Vázquez-Salazar et al. 2017, 2018). Moreover, it is also a precursor of 5-aminoimidazole ribonucleotide (AIR) in the biosynthesis of thiamine cofactor (Bazurto et al. 2015), a pathway that is independent of the purine biosynthetic reactions. These three processes are ancient metabolic pathways. Quite surprisingly, AICAR can also function as an alarmone, which suggests that 5-aminoimidazole-4-carboxamide ribonucleotide was perhaps recruited from its original function as an intermediary in purine biosynthesis to a new role as a signal molecule. The triphosphate form of AICAR (ZTP) can also carry out completely different regulatory roles. Additionally, it has been known since the 1970s that the antibiotic bredinin (5′hydroxy-1-β-ribofuranosyl-1H-imidazole-4-carboxamide) is an AICAR derivative (Suhadolnik 1979) that inhibits IMP dehydrogenase, blocking the conversion of IMP to GMP (Sakaguchi et al. 1976). It has been hypothesized that some antibiotics could have had other more ancient roles such as signaling molecules. As noted by Yim et al. (2007), small concentrations of antibiotics can modulate gene transcription during the interactions of populations with the surrounding organisms. Like alarmones, antibiotics can affect the expression of genes related to virulence, colonization, motility, stress response, and biofilm formation (Romero et al. 2011), suggesting that they might act as signal molecules in natural environments, facilitating intra- or interspecies interactions within microbial communities.

Fig. 4

The AICAR metabolic intermediate and its role in other biological processes. This ribotide is a key intermediate in the biosynthesis of (1) IMP in nucleotide biosynthesis; (2) ATP to synthesize histidine in amino acid biosynthesis; (3) the thiamine coenzyme; (4) signal molecules such as AICAR itself and AICAR triphosphate (ZTP); or (5) the bredinin antibiotic

Alarmones and Riboswitches

As discussed in considerable detail by Nelson and Breaker (2017), alarmones can also regulate genes through riboswitches. A riboswitch is an RNA segment located in non-coding regions, which binds to a metabolite and controls gene expression by changing the three-dimensional structure of the mRNA, which results in the regulation of transcription or translation of the mRNA in which it is embedded. The discovery of riboswitches that regulate ribozyme activity (Winkler et al. 2004) led to the proposal that this could be in fact an ancient control mechanism (Breaker 2010, 2012; Nelson and Breaker 2017). Since RNA by itself can recognize metabolites and control gene expression without the assistance of protein factors, this suggests that some riboswitches and their RNA-based regulatory elements are modern descendants of an ancient regulatory mechanism that evolved prior to the emergence of protein enzymes and receptors (Breaker 2010, 2012; Nelson and Breaker 2017). Both AICAR and ZTP metabolites bind to a conserved non-coding region of pfl mRNA and promote the transcription of genes involved in purine and folate biosynthesis (e.g., purH, purJ, and pyruvate formate lyase); that is, they function as genetic “on” switches in purine- and folate-deprived cells (Jones and Ferré-D’Amaré 2015). In addition, c-di-GMP and cAMP-GMP nucleotidyl metabolites can be sensed by GEMM, a conserved RNA motif that is characteristic of some riboswitches that control the expression of genes associated with cell differentiation, virulence, pilus formation, flagellum biosynthesis, and extracellular electron transfer in Bacteria (Sudarsan et al. 2008; Breaker 2012; Kellenberger et al. 2015). It has also been shown that c-di-AMP can be sensed by the riboswitch ydaO, which is implicated in cell wall metabolism, sporulation, and osmotic stress responses (Nelson et al. 2013). The recent discovery of a riboswitch that binds to the ppGpp alarmone has expanded the list of RNA-based signaling systems (Sherlock et al. 2018). The discovery of alarmones that function as metabolites that bind to riboswitches and control gene expression provides a good model of early stages in which catalytic RNAs regulated their expression by sensing alarmone-like metabolites. A good example of a signal-receptor system composed solely of RNA molecules is the complex of the c-di-GMP riboswitch and the self-splicing ribozyme from Clostridium difficile (Nelson and Breaker 2017). The presence of the alarmone c-di-GMP and its binding to the riboswitch facilitates the recognition of the 5′ splice site and the splicing reaction by which the ribozyme produces a mature mRNA. During the early evolution of life these systems could have regulated ribozyme function, controlling both gene expression and metabolism (Nelson and Breaker 2017).

Materials and Methods

Sequences of Enzymes Involved in Synthesis and Degradation of Alarmones

Pathway maps from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000; Kanehisa et al. 2016) were used to identify and collect the sequences of enzymes involved in the synthesis and degradation of alarmones. Four groups of alarmones were identified: (1) those derived from adenine, such as cAMP, c-di-AMP, and Ap(n)A; (2) those derived from guanine, such as cGMP, c-di-GMP, (p)ppGpp, and Gp(n)G; (3) those derived from intermediates in purine metabolism, such as AICAR and ZTP; and (4) cyclic GMP-AMP or cGAMP. Additionally, two groups of possible alarmones were considered: (1) those derived from other purines, such as Xp(n)X; and (2) those derived from a pyrimidine base, such as cCMP, cUMP, and Up(n)U.

Searching for Biosynthetic and Degradative Enzymes of Alarmones in Completely Sequenced Cellular Genomes

A search for homologs of each enzyme involved in the biosynthesis and degradation of alarmones was performed in archaeal (n = 256), bacterial (n = 4392), and eukaryal (n = 401) genomes. All completely sequenced cellular genomes analyzed were downloaded from the KEGG database, including the following archaeal genomes: Euryarchaeota (170), Crenarchaeota (60), Thaumarchaeota (15), Nanoarchaeota (2), Nanohaloarchaeota (1), Micrarchaeota (1), Korarchaeota (1), Bathyarchaeota (2), Lokiarchaeota (1), and other Archaea (3); bacterial genomes: Gammaproteobacteria (1043), Betaproteobacteria (316), Epsilonproteobacteria (157), Deltaproteobacteria (84), Alphaproteobacteria (489), Firmicutes (882), Tenericutes (126), Actinobacteria (547), Cyanobacteria (99), Melainabacteria (1), Chloroflexi (27), Deinococcus (27), Armatimonadetes (2), Terrabacteria (1), Chlamydiae (118), Verrucomicrobia (9), Planctomycetes (16), Kiritimatiellaeota (1), Spirochaetes (77), Acidobacteria (9), Elusimicrobia (3), Fusobacteria (14), Synergistetes (6), Fibrobacteres (2), Gemmatimonadetes (3), Bacteroidetes (222), Chlorobi (17), Cloacimonetes (1), Aquificae (14), Thermotogae (30), Caldiserica (1), Chrysiogenetes (1), Deferribacteres (4), Calditrichaeota (1), Dictyoglomi (2), Nitrospirae (9), Thermodesulfobacteria (4), Saccharibacteria (4), Peregrinibacteria (1), and unclassified bacteria (12); and eukaryal genomes: Animals (160), Plants (73), Fungi (119), and Protists (49). To determine the phylogenetic distribution of enzymes involved in the synthesis and degradation of alarmones, the query sequences of these enzymes were compared against subject sequences from each genome using the DELTA-BLAST program (Boratyn et al. 2012), version 2.2.29+. The parameters used in this work were a BLOSUM45 scoring matrix, an E-value ≤ 0.001, and an identity percentage ≥ 20%. To retrieve full-length homologous sequences rather than isolated protein domains or protein segments, we also considered the query coverage and subject coverage metrics. These metrics are defined as the percentage of the query and of the subject sequence, respectively, that overlaps with the alignment. A query coverage ≥ 70% and a subject coverage ≥ 70% were used.
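
The thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `Hit` record, its field names, and the helper functions are hypothetical, and coverage is computed here from a single alignment interval (1-based, inclusive coordinates), which is an assumption about how the overlap is measured.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one query-subject alignment."""
    query_len: int      # full length of the query sequence
    subject_len: int    # full length of the subject sequence
    q_start: int        # alignment start on the query (1-based)
    q_end: int          # alignment end on the query (inclusive)
    s_start: int        # alignment start on the subject (1-based)
    s_end: int          # alignment end on the subject (inclusive)
    evalue: float
    pident: float       # percent identity over the alignment


def coverage(start: int, end: int, length: int) -> float:
    """Percent of a sequence that overlaps with the alignment."""
    return 100.0 * (abs(end - start) + 1) / length


def passes_thresholds(hit: Hit, max_evalue: float = 1e-3,
                      min_ident: float = 20.0, min_cov: float = 70.0) -> bool:
    """Apply the E-value, identity, and two-sided coverage cut-offs."""
    q_cov = coverage(hit.q_start, hit.q_end, hit.query_len)
    s_cov = coverage(hit.s_start, hit.s_end, hit.subject_len)
    return (hit.evalue <= max_evalue
            and hit.pident >= min_ident
            and q_cov >= min_cov
            and s_cov >= min_cov)


# Made-up examples: a full-length match and a hit to one domain of a long protein.
full_length = Hit(200, 210, 1, 180, 5, 190, 1e-10, 35.0)
single_domain = Hit(200, 600, 1, 180, 100, 280, 1e-10, 35.0)
```

In this sketch the second hit is rejected only by the subject coverage criterion, which is the situation the two-sided coverage requirement is meant to exclude: a query that matches a single domain embedded in a much longer protein.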

Phylogenetic Analysis

Four phylogenetic trees were used to depict the distribution of homologs of alarmone biosynthetic enzymes. Phylogenetic trees based on the small subunit ribosomal RNA, in Newick format, were visualized with the Interactive Tree of Life (iTOL) program (Letunic and Bork 2016). Phylogenetic trees were grouped according to the chemical structure of alarmones: (A) cAMP; c-di-AMP; Ap(n)A; (B) cGMP; c-di-GMP; (p)ppGpp; Gp(n)G; (C) AICAR; ZTP; (D) cGAMP; and (E) Xp(n)X; Up(n)U; cCMP; cUMP.

The presence of biosynthetic enzymes was considered in each clade if (1) a few homologous hits were scattered in some branches of the clade, or (2) more than fifty percent of branches had homologous hits.
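
One possible reading of these two criteria is sketched below in Python. The function name, the input representation (one boolean per branch of a clade), and the labels "majority", "scattered", and "absent" are hypothetical and are introduced only for illustration. The text does not quantify "a few", so any non-zero number of hits that does not exceed fifty percent is treated here as scattered.

```python
from typing import Sequence


def clade_presence(branch_has_hit: Sequence[bool]) -> str:
    """Classify a clade from per-branch presence of homologous hits.

    Returns "majority" when more than fifty percent of branches have
    hits, "scattered" when at least one branch (but no more than half)
    has a hit, and "absent" otherwise. Labels are illustrative only.
    """
    if not branch_has_hit:
        return "absent"
    n_hits = sum(1 for has_hit in branch_has_hit if has_hit)
    fraction = n_hits / len(branch_has_hit)
    if fraction > 0.5:
        return "majority"
    if n_hits > 0:
        return "scattered"
    return "absent"
```

Under this reading, a clade in which exactly half of the branches have hits falls into the scattered category, since the second criterion requires strictly more than fifty percent.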

Structural Comparisons of Polymerase Palm Domain and Different Kinds of Cyclase Enzymes

The tertiary structures of adenylate (1WC0), guanylate (2W01), diadenylate (4YVZ), and diguanylate (4H54) cyclases and of the palm domain of DNA polymerase (1D8Y) were obtained from the Protein Data Bank (PDB) (Berman et al. 2000). The palm domain of polymerases is homologous to the catalytic domain of adenylate cyclase (Artymiuk et al. 1997). To determine whether the other cyclases (GC, DAC, and DGC) also share structural homology with the palm domain, a structural alignment was made by superimposing the structures using the PyMol CE package. Moreover, their evolutionary relationships were reconstructed by aligning all structures using PDBeFold pairwise alignment (Krissinel and Henrick 2004) and calculating a geometric distance measure for each comparison using the Structural Alignment Score (SAS) (Subbiah et al. 1993). A phylogenetic tree was built using Fitch, which is included in the PHYLIP package (Felsenstein 2005).
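
The distance calculation can be sketched as follows, assuming the commonly used definition of SAS as the RMSD multiplied by 100 and divided by the number of aligned residues. The function names are hypothetical, and the RMSD and aligned-residue values in the usage lines are placeholders rather than results from this work. The output layout follows the square distance-matrix format read by PHYLIP programs such as Fitch, with names padded to ten characters.

```python
from typing import Dict, List, Tuple


def sas(rmsd: float, n_aligned: int) -> float:
    """Structural Alignment Score: RMSD * 100 / number of aligned residues."""
    if n_aligned <= 0:
        raise ValueError("number of aligned residues must be positive")
    return rmsd * 100.0 / n_aligned


def distance_matrix(names: List[str],
                    pairwise: Dict[Tuple[str, str], Tuple[float, int]]
                    ) -> List[List[float]]:
    """Build a symmetric SAS matrix from (RMSD, aligned residues) per pair."""
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for (a, b), (rmsd, n_aligned) in pairwise.items():
        i, j = names.index(a), names.index(b)
        dist[i][j] = dist[j][i] = sas(rmsd, n_aligned)
    return dist


def phylip_matrix(names: List[str], dist: List[List[float]]) -> str:
    """Format a square distance matrix in PHYLIP layout."""
    lines = [f"{len(names):5d}"]
    for i, name in enumerate(names):
        row = " ".join(f"{dist[i][j]:.4f}" for j in range(len(names)))
        lines.append(f"{name[:10]:<10}{row}")
    return "\n".join(lines)


# Placeholder values, not measurements: RMSD of 2.0 over 100 aligned residues.
names = ["1WC0", "2W01"]
dist = distance_matrix(names, {("1WC0", "2W01"): (2.0, 100)})
text = phylip_matrix(names, dist)
```

Because SAS normalizes the RMSD by the alignment length, a superposition with a low RMSD over only a few residues is not scored as closer than one with a similar RMSD over a long aligned region.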

Homologs of Biosynthetic and Degradative Enzymes of Alarmones in Viral Genomes

A search for homologs of alarmone biosynthetic and degradative enzymes was carried out in 5,691 viral genomes of 102 DNA and RNA viral families using DELTA-BLAST. The viral families analyzed were Ackermannviridae, Adenoviridae, Alloherpesviridae, Alphaflexiviridae, Alphasatellitidae, Alphatetraviridae, Alvernaviridae, Amalgaviridae, Ampullaviridae, Anelloviridae, Arenaviridae, Arteriviridae, Ascoviridae, Asfarviridae, Aspiviridae, Astroviridae, Bacilladnaviridae, Baculoviridae, Barnaviridae, Benyviridae, Betaflexiviridae, Bicaudaviridae, Bidnaviridae, Birnaviridae, Bornaviridae, Bromoviridae, Caliciviridae, Carmotetraviridae, Caulimoviridae, Chrysoviridae, Circoviridae, Closteroviridae, Coronaviridae, Corticoviridae, Cystoviridae, Dicistroviridae, Endornaviridae, Filoviridae, Fimoviridae, Flaviviridae, Flexiviridae, Fusariviridae, Fuselloviridae, Gammaflexiviridae, Geminiviridae, Genomoviridae, Globuloviridae, Hantaviridae, Hepadnaviridae, Hepeviridae, Herpesviridae, Hypoviridae, Hytrosaviridae, Iflaviridae, Inoviridae, Iridoviridae, Lavidaviridae, Leviviridae, Lipothrixviridae, Luteoviridae, Malacoherpesviridae, Marnaviridae, Marseilleviridae, Megabirnaviridae, Mesoniviridae, Microviridae, Mimiviridae, Mymonaviridae, Myoviridae, Nairoviridae, Nanoviridae, Narnaviridae, Nidovirales, Nodaviridae, Nudiviridae, Nyamiviridae, Orthomyxoviridae, Papillomaviridae, Paramyxoviridae, Partitiviridae, Parvoviridae, Peribunyaviridae, Permutotetraviridae, Phenuiviridae, Phycodnaviridae, Picobirnaviridae, Picornaviridae, Pithoviridae, Plasmaviridae, Pleolipoviridae, Pneumoviridae, Podoviridae, Polycipiviridae, Polydnaviridae, Polyomaviridae, Potyviridae, Poxviridae, Quadriviridae, Reoviridae, Retroviridae, Rhabdoviridae, Roniviridae, Rudiviridae, Secoviridae, Siphoviridae, Smacoviridae, Solemoviridae, Solinviviridae, Sphaerolipoviridae, Sunviridae, Tectiviridae, Togaviridae, Tolecusatellitidae, Tombusviridae, Tospoviridae, Totiviridae, Turriviridae, Tymoviridae, and Virgaviridae. 
As in the search of cellular genomes, the parameters included a BLOSUM45 scoring matrix, an E-value ≤ 0.001, an identity percentage ≥ 20%, and query and subject coverage ≥ 70%.