“Therefore, I should infer from analogy that probably all the organic beings which have ever lived on this Earth have descended from some one primordial form…”.

Charles R. Darwin, On the Origin of Species, first British edition, 1859

From the Progenote to the Last Common Ancestor

It was Carl Woese and George E. Fox who first proposed that all extant organisms which inhabit Earth can be grouped into one of the three major domains, Bacteria, Archaea, and Eukarya (Woese and Fox 1977a). Their subsequent trifurcated, explicitly unrooted ribosomal RNA (rRNA) tree suggests that all organisms within these domains derived from a common ancestral life form (Fox et al. 1980). In this regard, all modern organisms share the central dogma. This includes the translation machinery, the genetic code, the essential features of genome replication and gene expression, as well as many essential metabolic reactions and basic ATP energy production. Variation from these essential features is usually attributed to environmental adaptations posterior to the divergence of the three major biological domains (Becerra et al. 2007). Nevertheless, there are differences like the exclusive membrane lipid composition between Bacteria and Archaea that remain an unsolved mystery (Wächtershäuser 2003; Peretó et al. 2004).

With the discovery of the Archaea Domain, Woese and Fox (1977a) recognized the existence of what seemed like significant differences in the translation machinery, such as the larger size of the large subunit in Eukaryotes and Archaea. They envisioned that “the basic cell type” would necessarily be on a level of complexity far simpler than what is seen in modern prokaryotes. Such entities would be in the process of evolving the genotype–phenotype relationship and might appropriately be called progenotes (Woese and Fox 1977b). One may notice that the hypothetical progenote was not initially envisioned as the ancestral population at the trifurcation of the emerging 16S rRNA phylogeny, since a comprehensive 16S rRNA tree including the Archaea was not available until 1980 (Fox et al. 1980). Over the years as described in detail by Gogarten and Olendzenski (1999), the progenote has been envisioned as either an organizational level that preceded prokaryotes or as the last common ancestor of extant life (LCA). We strongly encourage the community to instead use the term “progenote” as it was initially envisioned by Woese and Fox (1977b).

The confusion stems in part from the definition of progenotes as entities still ‘evolving the relationship between genotype and phenotype.’ But what does that mean? Are there any likely progenotes out there that we can study? Recent efforts to understand the evolutionary origin of the translation system might provide a window to these earlier times (Hsiao et al. 2009; Petrov et al. 2014, 2015). The extant ribosome consists of a large and a small subunit. The large subunit is responsible for the synthesis of the peptide bonds, whereas the small subunit implements the machinery for coded synthesis (Steitz 2008; Fox 2010; Wilson and Doudna Cate 2012). Recent studies have shown that a 67-nucleotide RNA derived from the current large subunit can in fact catalyze peptide bond formation without coding (Bose et al. 2021, 2022). Over the years, this entity was named the protoribosome by Yonath and her co-workers (Bashan et al. 2003; Agmon et al. 2005, 2009; Agmon 2009, 2016; Davidovich et al. 2010; Krupkin et al. 2011; Huang et al. 2013; Yonath 2017). We suggest that the protoribosome can appropriately be considered an example of a progenote. Later, as this progenote became increasingly complex, it may have partnered with a second progenote, perhaps an ancestor of the small subunit. The resulting entity could then have acquired sequence preference and coding. A timeline that leads from a purely chemical stage to an RNA-World or an RNA-peptide world, and then to the LCA, now makes sense (Fig. 1). These larger populations would likely have included subpopulations of progenotes that provided aspects of the central dogma. The existence of progenotes able to catalyze peptide bond formation, fossilized within extant rRNA, implies that we are now able to study further the early evolution of the code and even its very origin.

Fig. 1

Timeline from the synthesis and accumulation of organic compounds (SAOC) to the chemical evolution to an RNA-World containing progenotes to the LCA. (1) Emergence and early evolution of the PTC (protoribosome) (LSU). First small uncoded peptides. (2) Emergence and early evolution of the decoding center (SSU). (3) Association of the LSU and the SSU. First proto-tRNA adaptors. (4) Early evolution of the genetic code. Early evolution of the aminoacyl tRNA synthetases. Class I first and then Class II. (5) Transition to DNA as genetic material. (6) Consolidation of the tRNAs. (7) Consolidation of the aminoacyl tRNA synthetases. Class I and II

The origin and early evolution of the code remain elusive despite the code itself being deciphered over 50 years ago. The evidence suggests that it has evolved into a code that minimizes the effects of point mutations and mistranslation; in a sense, “the genetic code is one in a million” (Freeland and Hurst 1998). It has also been proposed that the extant code arose from stereochemical interactions between RNA and the amino acids, then expanded by biosynthetic modification, and finally was optimized through codon reassignment. These are three complementary forces of its evolution that most likely fixed the code in the LCA of modern organisms (Knight et al. 1999). A genetic code fully in place within the LCA is a common conclusion that arises independently from different lines of evidence, like the compositional analysis of ribosomal proteins made by Fournier and Gogarten (2007). There, they identified a subset of amino acids that are most likely the most recent additions to the code and suggested that the expansion of the code may have enhanced the transition from an RNA-based to a protein-based life prior to the time of the LCA. The implications of the emergence and posterior assembly of two different families of aminoacyl tRNA synthetases that may have significantly affected the code will be discussed at length later in the text.

Many Names, Too Many Interpretations

Fitch and Upper (1987) coined the term Cenancestor to name the ancestral organism from which Archaea, Bacteria, and Eukaryotes descend. The last universal common ancestor (LUCA) was, for several years, the most commonly used term by which such entities were known. It was initially used in the first report of a reconstruction of LUCA’s genetic content that included genomic data from an archaeon (Kyrpides et al. 1999). There were other proposals like the last universal cellular ancestor made by Philippe and Forterre (1999), universal ancestor by Doolittle (2000), last common community by Line (2002), and urancestor by Kim and Caetano-Anollés (2011), among others. These terms are of course not synonyms because they reflect the particular vision of the authors and the ongoing controversies about its metabolic nature, origin, and posterior evolution. As of today, the simplest commonly used term is the last common ancestor (LCA). This entity is currently understood as the ancestral population from which all extant living creatures descend. Strictly speaking, however, the LCA is an inventory of the genetic characteristics that are shared among extant organisms (Becerra et al. 2007).

The constantly increasing number of completely sequenced genomes has made it possible to apply new approaches and techniques to improve the estimations of the LCA genetic content and, from the latter, derive its nature and environment. Several studies have used clever bioinformatic approaches to characterize the minimal set of genes present in the LCA. These have included the search for gene families instead of individual genes (Harris et al. 2003; Mirkin et al. 2003; Weiss et al. 2016b), the search for protein architectures (Yang et al. 2005; Ranea et al. 2006), as well as for individual protein domains and motifs (Yang et al. 2005; Kim and Caetano-Anollés 2011). Such reconstructions and their techniques exploited the intrinsic features within the primary sequences to improve the search for homologous sequences along the phylogeny. Despite the universality, centrality, and antiquity of the noncoding RNA genes, such as the rRNA and/or the tRNA, they are not the subject of these types of homology searches due to technical challenges, such as their small sizes and their four-letter alphabet. Instead, other approaches have been used, like the comparison of atomic-resolution structures of ribosomes from distant phyla. Thanks to the latter, it was suggested that approximately 90% of the extant prokaryotic rRNA forms an ancestral conserved core, which is the structural and functional unit of all known cytoplasmic ribosomes (Bernier et al. 2018).

LCA’s Genetic Content Reconstructions

Long before fully sequenced genomes were available, people started to wonder about the nature of the last common ancestor (LCA). The very first attempt to reconstruct the genetic elements most likely present in the LCA was made more than 30 years ago by Lazcano et al. (1992). It was an exhaustive and comprehensive review of the literature available at the time. There, it was suggested that the machineries of DNA replication, gene expression, and basic biosynthetic pathways are essentially the same among Archaea, Bacteria, and Eukaryotes. The authors thus concluded that “the LCA was very much like a contemporary prokaryote at its fundamental level of biological complexity.”

The release of the first completely sequenced genomes started a new era of comparisons and searches of sequences from genes and proteins among different organisms. The comparison of parasitic bacterial genomes, from Haemophilus influenzae and Mycoplasma genitalium, resulted in the first estimation of the minimal gene set necessary to sustain essential cellular functions. Unfortunately, the absence of homologous genes for several key proteins involved in DNA replication led the authors to a likely faulty conclusion: they suggested that the LCA had an RNA genome (Mushegian and Koonin 1996). This interpretation can be attributed to an underestimation of the gene content due to the parasitic lifestyle of H. influenzae and M. genitalium. Later, Methanococcus jannaschii was the first archaeon whose genome was completely sequenced (Bult et al. 1996). This allowed the first estimation of the genome content of the LCA that included archaeal genes for comparison against bacterial and eukaryotic genes (Kyrpides et al. 1999). As a result, the authors inferred that the LCA was an organism with several biochemical functions and genetic machineries similar to those of extant unicellular organisms.

As the completely sequenced genomes of more and more organisms became available, many research groups tried to improve the estimation of the LCA genetic content. Take, for instance, Norman Pace’s group, which used the Clusters of Orthologous Genes (COGs) database to search for the universally conserved genes exclusively within fully sequenced organisms (Harris et al. 2003). This study required that highly conserved genes exhibit the same phylogenetic signal as the ribosomal RNA. The result was that most of such universal genes are related to the ribosome. Furthermore, such an approach overlooks the effect of horizontal gene transfer (HGT), a phenomenon whose degree of impact on the reconstruction of ancestral life forms is still under debate (Doolittle 1999, 2000).

To deal with the fact that almost 90% of the COGs are incompatible with the rRNA universal tree, and to reconcile gene loss, gene emergence, and HGT events, several algorithms that compute parsimonious evolutionary scenarios for genome evolution were developed (Mirkin et al. 2003). The authors concluded that gene loss and HGT are major aspects that shape prokaryotic evolution with almost equal frequency. They also concluded that if the LCA was a minimal free-living entity, it would necessarily benefit from HGT and, to a lesser extent, from the invention of new genes. On the other hand, if the LCA was a complex entity, it would eventually benefit from differential gene loss.
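The kind of parsimony reasoning behind such scenarios can be illustrated with a minimal sketch. It is not the algorithm of Mirkin et al. (2003). It is a generic Sankoff-style weighted parsimony for a single gene on a hypothetical four-taxon tree, in which the taxon names (B1, B2, A1, E1) and the gain and loss costs are assumptions, and a "gain" stands in for either gene invention or acquisition by HGT.

```python
from math import inf

# Hypothetical toy tree: each internal node maps to its two children.
# B1/B2 stand for bacteria, A1 for an archaeon, E1 for a eukaryote.
TREE = {
    "root": ("bact", "arch_euk"),
    "bact": ("B1", "B2"),
    "arch_euk": ("A1", "E1"),
}

def sankoff(node, presence, gain_cost, loss_cost):
    """Return (min cost if gene is absent at node, min cost if present)."""
    if node not in TREE:  # leaf: the observed state is the only allowed one
        return (inf, 0.0) if presence[node] else (0.0, inf)
    cost_absent = cost_present = 0.0
    for child in TREE[node]:
        c_abs, c_pres = sankoff(child, presence, gain_cost, loss_cost)
        # Parent absent: child is absent for free, or present after one gain.
        cost_absent += min(c_abs, c_pres + gain_cost)
        # Parent present: child is present for free, or absent after one loss.
        cost_present += min(c_pres, c_abs + loss_cost)
    return cost_absent, cost_present

def gene_in_root(presence, gain_cost=1.0, loss_cost=1.0):
    """True if placing the gene in the root is strictly cheaper than not."""
    absent, present = sankoff("root", presence, gain_cost, loss_cost)
    return present < absent

universal = {"B1": True, "B2": True, "A1": True, "E1": True}
patchy = {"B1": True, "B2": False, "A1": True, "E1": False}

print(gene_in_root(universal))              # universal gene is assigned to the root
print(gene_in_root(patchy))                 # equal costs: tie, not assigned
print(gene_in_root(patchy, gain_cost=2.0))  # costly gains favor ancestral presence plus losses
```

The patchy case shows why the relative weight of gains (including HGT) and losses matters: the same distribution supports a simpler or a more complex ancestor depending on the assumed costs.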

A separate approach with a biological perspective was then developed. Instead of using every completely sequenced organism available, a representative sample of well-known and well-characterized organisms from Archaea, Bacteria, and Eukaryotes was chosen. This biologically curated sample also tried to exclude endosymbionts and parasites (Delaye et al. 2005). The gene complement of the LCA that was presented there is more compatible with a cellular entity that emerged prior to the divergence of the three cellular domains of life.

By taking advantage of the Structural Classification of Proteins database (SCOP), a strategy based on the presence or absence of protein domain architectures was used to construct the phylogeny of 174 complete genomes (Yang et al. 2005). This study was grounded in the well-accepted notion that protein tertiary structure is more conserved than primary sequence and that it allows one to see deeper into the past. They reported 49 superfamily folds common to all genomes under scrutiny, suggesting an LCA with a sophisticated genetic inventory and gene products far beyond those from just the translation machinery. This conclusion was supported by Ouzounis et al. (2006), who suggested that the LCA possessed a complex genome similar to that of extant free-living prokaryotes. They implemented a search strategy based on primary sequence, which suggests that functional capabilities like metabolism, information processing, active membrane transport, and complex regulatory functions were among the capacities of the LCA.

The notion that three-dimensional structure comparison is more sensitive than conventional primary sequence methodologies in detecting remote homology has also been used to identify a set of ancestral protein domains most likely present in the LCA (Ranea et al. 2006). A functional analysis of such ancestral domains again reveals a genetically complex LCA, with all essential functional cellular systems in place. The latter conclusion supports previous proposals suggesting that life acquired its modern cellular characteristics before the divergence of the three domains (Doolittle 2000).

A more recent proposal suggests that the Urancestor (≈LCA) was similar to modern organisms in terms of gene content. It is also grounded in a phylogenomic study of protein domain structure and their classification into highly conserved fold super-families (Kim and Caetano-Anollés 2011). The authors argue that the Urancestor possessed advanced metabolic capabilities, being especially rich in nucleotide metabolism, and that it had pathways for membrane synthesis and crucial elements of translation. However, it lacked fundamental elements for transcription and extracellular communication, as well as for the synthesis of deoxyribonucleotides. Therefore, its proteomic history suggests that the Urancestor is closer to a simple progenote population that harbored a set of modern molecular functions.

The most recent attempt to reconstruct the genetic content of the LCA also tries to derive its physiology and habitat from the unconventional premise that non-universal proteins can illustrate the LCA’s physiology (Weiss et al. 2016b). They also support the recent two-domain tree of life hypothesis, which proposes that Eukaryotes arose from the Archaeal branch of the Prokaryote lineage (Williams et al. 2013; Raymann et al. 2015). Within this study, the authors depict the LCA as an anaerobic autotroph living in a hydrothermal setting, dependent upon geochemistry and therefore “only half-alive.” Such a disruptive vision has been the subject of many rigorous reviews and lively discussions (Gogarten and Deamer 2016; Weiss et al. 2016a). Those of course are not within the objective of the present review, but we encourage the readers to examine them and form their own opinion.

It is evident that there have been a considerable number of attempts to reconstruct the gene content of the LCA. All of them exploit completely sequenced genomes but use different approaches, from primary sequence comparisons to phylogenetic strategies, to protein domain architecture, to tertiary structure searches, and even a mixture of them (Table 1). While several arrive at the conclusion that the LCA resembles an extant free-living prokaryote, others point to a simpler being, perhaps closer to a progenote. Nevertheless, they all agree on one feature that must be present in the LCA: the translation machinery. For years, it has been recognized as one ancestral element whose analysis must shed light on the earliest history of life, even predating the LCA (Agmon et al. 2005; Davidovich et al. 2009; Fox 2010; Petrov et al. 2014, 2015; Rivas and Fox 2023). We have searched throughout the ten studies described above and their results to extract their conserved genes that are part of the translation machinery proposed for the LCA. Therefore, the conclusions, suggestions, and speculations that will be presented in the following section are based on the comparisons of these reconstructions and their conserved genes from the LCA’s translation machinery.

Table 1 Comparison of the 10 reconstructions of the gene content of the LCA

A Common Conclusion: The Translation Machinery

As these approaches accumulated, the idea of extrapolating the consensus genetic content of the LCA emerged. As far as we know, the very first attempt was LUCApedia (Goldman et al. 2012), a database that was presented as “a unified framework for simultaneously evaluating multiple datasets related to the LCA.” It represented a tool for a quick reference in determining if a gene or a set of genes could be considered ancient. A more refined and detailed attempt was recently published by Crapitto et al. (2022), who developed a series of bioinformatic and statistical procedures to compare the predictions of eight reconstructions of the genetic content of the LCA. Therein, the authors discuss that although most of the studies show a strong agreement with the consensus predictions, no single study shows even a moderate degree of similarity with any other. Of special interest is the conclusion, derived from the consensus set, that the LCA possessed a protein synthesis machinery and amino acid metabolism, and used nucleotide-derived cofactors. This implies that the consensus set could in principle reveal the most conserved elements within the genetic content of the LCA. This is an idea that we independently explore in detail, with a more bounded scope, limited to the elements of the translation machinery.
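How studies can agree with a consensus while barely resembling one another can be seen in a small sketch. It does not reproduce the procedures of Crapitto et al. (2022). The three gene lists are invented for illustration (geneX, geneY, and geneZ are placeholders), the similarity measure is a plain Jaccard index, and the consensus rule (reported by at least two studies) is an assumption.

```python
from itertools import combinations

# Hypothetical gene lists; real reconstructions contain hundreds of entries.
studies = {
    "study_A": {"EF-Tu", "EF-G", "L2", "geneX"},
    "study_B": {"EF-Tu", "EF-G", "S7", "geneY"},
    "study_C": {"EF-Tu", "L2", "S7", "geneZ"},
}

def jaccard(a, b):
    """Shared entries divided by all entries found in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def consensus(gene_sets, min_studies):
    """Genes reported by at least `min_studies` of the given sets."""
    counts = {}
    for genes in gene_sets:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, n in counts.items() if n >= min_studies}

core = consensus(studies.values(), min_studies=2)

# Similarity between every pair of studies.
pairwise = {
    (x, y): jaccard(studies[x], studies[y])
    for x, y in combinations(studies, 2)
}

# Fraction of the consensus set that each study recovers.
recall = {name: len(genes & core) / len(core) for name, genes in studies.items()}

print(sorted(core))
print(pairwise)
print(recall)
```

In this toy case each study recovers three quarters of the consensus set, yet every pair of studies shares only one third of their combined entries.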

Despite the different methodological strategies from the reconstructions of the genetic content of the LCA, all of them independently inferred that some portions of the translation machinery are among the most conserved features and therefore likely to have already been active at the time of the LCA. Although the vast majority of the elements that integrate extant translation machinery are not equally conserved among these reconstructions, by comparing the lists of each reconstruction, it was found that several key components are indeed well conserved across all of them.

The ribosome is a ribonucleoprotein complex that is regarded as the core of the translation system, and it is composed of a small subunit (SSU) and a large subunit (LSU) (Steitz 2008; Fox 2010; Wilson and Doudna Cate 2012). The structure and contents of these subunits include both conserved and variable features. In prokaryotes, the SSU contains one 16S ribosomal RNA (rRNA) and ~21 ribosomal proteins (rProteins), while the LSU contains 5S and 23S rRNAs and ~33 rProteins (Steitz 2008; Wilson and Doudna Cate 2012). Several ribosomal proteins including L1, L2, L5, L6, L11, L14, L22, S2, S5, S7, S8, S10, S11, S13, and S19 are found in essentially all the reconstructions (Table 2). The LSU and SSU rProteins listed above were found in 80–100% of the studies; nevertheless, all rProteins detected by even a single reconstruction are included in the supplementary tables. Such a degree of conservation suggests that the ribosome of the LCA was already exploiting the coexistence with large globular peptides. Furthermore, this observation implies that the LCA’s ribosome had already gone through several stages of rProtein evolution (Kovacs et al. 2017).

Table 2 Highly conserved rProteins that might be within the genetic content of the LCA

These highly conserved rProteins associate in various functional places within the extant ribosome (Schuwirth et al. 2005; Lin et al. 2015). L2 and L14 are located between the two subunits, most likely assisting in the ribosome assembly. L22 is associated with the last part of the exit tunnel, likely assisting with the folding and expulsion of the nascent polypeptide. S7 and S5 touch the SSU in helix 28, while S11 touches helix 45. Both helices are at the core of the decoding center, whose dependence on rProteins for appropriate folding has been established (Schedlbauer et al. 2021). Ribosomal proteins L5, S13, and S19 establish a bridge between the 5S, the 16S, and the 23S rRNAs. Of special interest are those conserved rProteins that potentially contained posttranslational modifications, such as methylation, acetylation, and/or phosphorylation. As shown by Ilag et al. (2005), phosphorylated rProteins bind more tightly to the rRNA scaffold. Highly conserved rProteins L5, L11, L22, S5, S7, S11, and S13 are phosphorylated in extant ribosomes (Soung et al. 2009). However, enzymes capable of such modifications were not reported by any of the reconstructions. Therefore, it is less probable that these rProteins were modified by the enzymatic mechanisms of the LCA; instead, such modifications most likely evolved later.

Transfer RNAs (tRNAs) are crucial components of the translation machinery. tRNAs are the “adaptors” that establish the complementarity between the mRNA codons and the amino acid alphabet (Crick 1958). The tRNAs are charged with specific amino acids by highly specialized enzymes called aminoacyl tRNA synthetases (aaRS). Each tRNA is specific for one amino acid and each aaRS specifically recognizes both the tRNA and its cognate amino acid. Extant proteins are made from 20 canonical amino acids, although some variations occur, like pyrrolysine, a non-canonical amino acid (Srinivasan et al. 2002; Nozawa et al. 2009). Hence, there are at least an equal number of aaRSs. In the charging reaction, one canonical amino acid is ester bonded to its cognate tRNA by one specific aaRS. Based on their primary sequences and their tertiary structure, two classes (I and II) of aaRS were identified, and usually there are 10 aaRSs in each class (Eriani et al. 1990; Ibba and Söll 2000). Their distinct protein fold domains (Rossmann and ATP-grasp, respectively) suggest they have separate evolutionary histories. A plausible explanation for this observation is that earlier progenotes may have had only one of the two classes, which would likely have restricted the usable code before the two classes met. Both class I and class II aaRSs are detected as elements of the genetic content of the LCA by most reconstructions (Table 3). Class I and class II aaRSs were detected by at least 50% of the studies, and 8 out of 10 aaRSs within each class were detected by 70 to 100% of the studies. This high conservation pattern strongly suggests that the translation machinery of the LCA had an almost complete version of the extant genetic code, if not a fully consolidated one.
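The restriction that a single aaRS class would impose on the usable code can be made concrete with a short sketch. The class assignments follow the standard textbook partition of the 20 canonical amino acids (LysRS, which also has a class I form in some archaea, is listed only under class II here). The function names and the example peptide are hypothetical, and the scenario of a progenote holding only one class is the speculation discussed above, not an established fact.

```python
# Standard partition of the 20 canonical amino acids by aaRS class.
CLASS_I = frozenset(
    {"Arg", "Cys", "Gln", "Glu", "Ile", "Leu", "Met", "Trp", "Tyr", "Val"}
)
CLASS_II = frozenset(
    {"Ala", "Asn", "Asp", "Gly", "His", "Lys", "Phe", "Pro", "Ser", "Thr"}
)
AARS_CLASSES = {"I": CLASS_I, "II": CLASS_II}

def chargeable(classes_present):
    """Amino acids that can be loaded onto tRNAs given the aaRS classes present."""
    available = set()
    for name in classes_present:
        available |= AARS_CLASSES[name]
    return available

def can_translate(peptide, classes_present):
    """True if every residue of the peptide has a cognate aaRS available."""
    return set(peptide) <= chargeable(classes_present)

# Hypothetical short peptide mixing residues from both classes.
peptide = ["Met", "Ala", "Gly", "Val"]

print(len(chargeable(["I"])))               # amino acids available with class I alone
print(can_translate(peptide, ["I"]))        # Ala and Gly need class II enzymes
print(can_translate(peptide, ["I", "II"]))  # both classes cover all 20
```

With either class alone, only ten amino acids can be charged, so a coded peptide drawing on both sets becomes possible only once both classes coexist in the same entity.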

Table 3 Highly conserved aminoacyl tRNA synthetases that might be among the genetic content of the LCA

Contrary to what has been documented to occur with the rest of the translation machinery, several horizontal gene transfer (HGT) events appear to have dominated the history of the aaRSs. Using sequence reconstruction and phylogenetic analyses, Fournier et al. (2011) recognized the role of several HGT events prior to and after the divergence of the LCA, revealing a complex and intricate evolution of the aaRSs. This explains why their phylogeny does not always match that of other highly conserved phylogenetic markers, like rProteins or the rRNA, nor do the phylogenies of individual aaRSs always match one another. Nevertheless, as complex as it appears, most of their evolution seems to have happened before the time of the LCA. Analysis of atypical forms of aaRSs suggests that ancient HGT events occurred within sister groups of a diverse community that inhabited different niches at the same time the LCA existed and even before. Further, the paralog sequence reconstruction of isoleucyl- and valyl-RSs suggests that they did not co-evolve with the genetic code, and that these amino acids were already part of it before the cognate aaRSs diverged from their common ancestor prior to the LCA (Fournier et al. 2011).

Protein synthesis is promoted by several translation factors, which bind transiently to the ribosome during the phases of the translation process (Lipmann 1969; Kaziro 1978). Although translation can be carried out without translation factors (Spirin 1978), the rates are many orders of magnitude below the ones of the modern system. Recently, several observations supported the spontaneous translation view (Shoji et al. 2006; Konevega et al. 2007); however, without the translation factors protein synthesis is very slow and error prone. Table 4 shows that several initiation factors and elongation factors are among the proposed genetic content of the LCA by several reconstructions. Initiation Factors 1 and 2 were reported by 60 and 80% of the studies, respectively, while Elongation Factors Tu and G were detected in 80 and 90% of the cases. The high conservation of these factors suggests that the LCA’s translation machinery was already fine-tuned and dependent on translation factors that enhance its translation rates and fidelity.

Table 4 Highly conserved translation factors listed within the reconstructions of the genetic content of the LCA

Furthermore, it is well known that several translation factors hydrolyze GTP. They belong to a family of GTP-hydrolyzing enzymes that is related to a larger family of ATP-hydrolyzing enzymes (Leipe et al. 2002). IF2, EF-G, and EF-Tu are among these factors, which can be called translation GTPases. They are indispensable for the extant translation machinery, as can be seen from the fact that present-day Archaea and Bacteria possess several backup copies in their genomes (Margus et al. 2007). IF2, EF-G, and EF-Tu are listed as part of the genetic content of the LCA by most reconstructions (Table 4). This suggests that the LCA must have possessed an efficient energy production system able to meet the ribosome's extensive GTP demand.

It is widely known that several modifications to the RNA nucleobases are common features of ribosomal, messenger, transfer, and other noncoding RNAs. Such modifications are believed to play key roles in regulation, molecular recognition, and structural stabilization. Methylation, acetylation, and the chemical transformation of uridine into pseudouridine are among the most common examples. Some of these modifications even occur in the ribonucleobases that form the peptidyl transferase center, the very core of the translation machinery (Tirumalai et al. 2021). Mature tRNAs are also extensively modified (McCloskey and Crain 1998; Byrne et al. 2010). Such modifications are so typical that they have influenced the names of the tRNA structures. For instance, the D loop is named after the 5,6-dihydrouridines and the T loop after the thymine preceding a pseudouridine (Ψ). tRNA pseudouridine synthase catalyzes the conversion of uridine to Ψ at several positions in the tRNA. tRNA pseudouridine synthase is also listed by 90% of the reconstructions among the genetic content of the LCA (Table S5), suggesting that fine-tuned regulation, such as nucleotide modification with structural and functional implications, was operational but still evolving within the translation machinery of the LCA.

Final Remarks

As reviewed here, there have been at least ten attempts to reconstruct the genetic content of the LCA since the release of the first completely sequenced organisms. Some tried to derive LCA’s physiology and others even extrapolated into the possible environment that the LCA inhabited. They have implemented different methodological strategies and used a variety of completely sequenced organisms from the three domains of life. This was a deliberate effort to compensate for the methodological biases inherent to previous strategies. Although their specific conclusions are not always compatible, all ten studies have been successful in noticing, to some extent, that multiple aspects of the translation system are highly conserved.

Key elements of the translation machinery are found in essentially every reconstruction.

Many ribosomal proteins from the SSU and the LSU, representing over half of the total number of rProteins of extant prokaryotic ribosomes, are included within the genetic content of the LCA by the majority of reconstructions (Table 2). Almost every aaRS from both classes is listed by most reconstructions as present in the genome of the LCA (Table 3). GTP-dependent translation factors like IF2, EF-G, and EF-Tu are also regarded as elements of the LCA translation system by most reconstructions (Table 4). Even the tRNA pseudouridine synthase is included among the genetic content of the LCA by many reconstructions (Table S5). All these key features indicate that the LCA’s translation machinery closely resembled a contemporary prokaryotic system. It contained many rProteins, a full set of aaRSs, which directly implies a modern genetic code, several energy-dependent elongation factors, and even a specific nucleotide modification enzyme that most likely enhanced structure and may have influenced the overall translation rate alongside the translation factors.

These reconstructions focus on the distribution of the proteins rather than the rRNAs or tRNAs. Typically, RNA secondary structure is defined by the occurrence of helical regions. When comparing large RNAs, one can monitor the presence or absence as well as the extent of conservation of each individual helical region. The history of the individual helical regions can also be correlated with ribosomal protein interaction sites. When this is done, some aspects of rRNA structure prove to be essentially universal, and it could be useful to include them in future reconstructions of the LCA, as has been done for the rRNA (Petrov et al. 2014, 2015; Bernier et al. 2018). Independent comparisons of atomic-resolution ribosomal structures suggested that the size of the LCA’s rRNA must have been close to that of extant prokaryotic ribosomes (Bernier et al. 2018). Efforts to include RNA structural features that were useful in reconstructing the history of rRNA, such as GNRA tetraloops (Hsiao et al. 2009), A-minor interactions (Bokov and Steinberg 2009), and insertion fingerprints (Petrov et al. 2014), await future studies focused on the translation machinery of the LCA.

A usual conclusion is that the genetic code is essentially universal and likely already established in the LCA. This should be determined by looking at tRNA populations, but instead it has been inferred from the conserved sequences of key enzymes like the aaRSs. As described above, the history of the aaRSs turned out to be intricate due to several HGT events that occurred after the divergence of the main cellular domains (Fournier et al. 2011). More important are those HGT events that occurred before that divergence since they molded the extant genetic code, whose origin and early evolution seem to be the consequence of multiple forces acting differentially throughout their history (Knight et al. 1999).

Different reconstructions produce different scenarios for the physiology and possible environment of the LCA. Whether it was an autotroph or a heterotroph is still unclear. What every reconstruction agrees on is that it possessed an almost fully functional translation machinery that closely resembles a modern prokaryotic one. Therefore, we propose that the prokaryotic nature of the LCA was largely established when the divergence of the three main cellular domains occurred.