Introduction

“The origin of protein synthesis is a notoriously difficult problem.”—Crick et al. (1976).

The supposed difficulty lies in two aspects. The first is the complexity of the translational system, i.e., its apparent irreducibility: how could a simpler version of the translational system benefit an organism? The second is the chicken-and-egg problem, or causality dilemma. Many modern cellular processes, including translation, rely on complex, coded proteins, yet proteins themselves are created via translation.

In this review, we focus on the first question; the second already has an attractive answer in the RNA world hypothesis. Indeed, the theories presented here are not so much theories on the origin of the translational system as theories on the transition from the RNA world to the world of proteins. The RNA world hypothesis, as used here, simply holds that before the era of coded protein synthesis, RNA performed both the major catalytic and the major informational roles. This does not preclude the presence of other functional chemical moieties, such as uncoded peptides, metabolites, and lipids. The plausibility of the RNA world hypothesis will not be discussed here; for a recent, humorous defense of it, see Bernhardt (2012). We also refer the reader to an amusing “preintroduction” to the study of the origin of life, which includes quotes from the two most influential voices on the subject, Darwin and Genesis, as well as a recipe for the spontaneous generation of mice (Penny 2005).

For the sake of accessibility and objectivity, we have refrained from critically analyzing the models presented here. This may seem odd in certain cases, where there are clear flaws or evolutionary gaps. Our hope is that, by stating these theories succinctly and clearly, such flaws and gaps will become apparent. They are, of course, not necessarily reasons to dismiss the theories concerned, but rather a call for further development and discussion. For the same reasons, the experimental evidence will not be presented either.

In one of his many publicly available reviews of relevant articles, Koonin claims “any attempt on detailed modeling [of the origin of translation] is fraught with speculation” (Ma 2010). Taking this to heart, we omit from this review the myriad very specific solutions that have been presented in the articles cited here. While such details can be effective arguments for the plausibility of broader ideas, they contribute little to the central problem. In the absence of these details, many ideas become almost indistinguishable. This is a good thing, and the very theme that this review attempts to capture. What remains is a list of the overarching ideas for how translation came to evolve.

We also make passing mention of Koonin’s (2007) article, which, though vaguely heretical, makes an important point about the plausibility of early evolutionary theories and the anthropic principle: because we are here, in a protein world, we know that the translational system did evolve. This implies that the low probability of a given theory is not grounds for dismissal. Something must have happened, and that event may have been of exceptionally low probability. This is not to say that probability should be ignored when evaluating these theories, but that a series of unlikely events leading to translation may have been exactly what occurred.

To reemphasize, the list presented here is not exhaustive. Many theories that are truly independent have been lumped together for simplicity and clarity. This allows the big ideas present in numerous articles to be examined without considering the dozens of smaller debates that can exist within each. In this attempt to synthesize the many theories on the origin of translation, we focus on each component of the translational system in turn: peptides before proteins, adaptor molecules (tRNAs), the template (mRNA), the ribosome, and the first proteins (Table 1). We feel this highlights the modularity of the transition: to construct a complete pathway from the RNA world to the protein world, we can consider the different theories almost independently.

Table 1 Theories for each component

Peptides Before Proteins

While the modern translational system depends on many different interacting parts, it fundamentally mediates the interaction between nucleic acids and amino acids. Before proteins and translation, what could have driven this interaction (Fig. 1)? It is important to note that the peptides considered in this section are not coded by RNAs, but are instead generated via an abiotic mechanism and coopted to work with RNA molecules.

Fig. 1

Peptides before Proteins. Note that these representations are symbolic and are not specific predictions of shape or structure. a Peptides as Cofactors. A ribozyme acts on its substrate more efficiently after it incorporates an amino acid into its active site. b Sequestration of Substrates. Initially, amino acids can diffuse into and away from the RNA organism, but as ribozymes evolve to sequester these substrates, diffusion away is prevented, allowing buildup. c Replication tag. The RNA replicase preferentially recognizes the amino acid-tagged RNA. Replication of the untagged RNA initiates at multiple sites, forming truncated copies.

Peptides as Cofactors (White 1982; Gibson and Lamond 1990; Wong 1991; Szathmary 1993, 1999; Szathmary and Maynard Smith 1997; Di Giulio 1997, 2003, 2008; Knight and Landweber 2000; Wolf and Koonin 2007; Rodin et al. 2011)

In the modern world, proteins expand their range of catalytic activities by binding to cofactors, whose chemically distinct structures can stabilize proteins or even play a direct role in catalysis. A natural extension to the RNA world is that ribozymes could have taken advantage of cofactors, perhaps even amino acids themselves.

Ribozymes that bound amino acids or even short peptides could potentially gain new or improved functions, and thus be selected for. This interaction could initially have been noncovalent, determined by the ribozyme sequence, or covalent, induced by UV radiation. The amino acids could have played a direct catalytic role, being present in the active site, or simply have stabilized the ribozyme structure.

Once an advantage for peptide-bound RNAs had developed, greater stability and specificity of the binding event would also be favored. A primitive aminoacyl-tRNA synthetase (aaRS) may have arisen, either as a distinct ribozyme or as a domain of the RNA itself, forming covalent bonds like those that attach amino acids to modern tRNAs. Via this mechanism, a specific relationship might have evolved between amino acids and RNAs. As the translational system evolved, this relationship could have been coopted, giving rise to the genetic code itself.

Sequestration of Substrates (Gibson and Lamond 1990; Wolf and Koonin 2007)

In a prebiotic, precellular, or even protocellular environment, abiotically produced substrates, such as amino acids, would diffuse freely. Thus, a competing genome, metabolic cycle, or proto-organism would gain an advantage if it were able to sequester its substrates and products from its competitors. In the absence of a sophisticated semipermeable membrane, an appropriate mechanism would be the covalent attachment of substrates/products to the catalysts themselves or to carrier RNAs, which would then present their bound substrate to the catalyst.

To be effective, any enzyme, protein or RNA, must have a high affinity for its substrate, especially when competing for a scarce metabolite. A ribozyme that lost its catalytic activity yet maintained its high affinity for the substrate would allow a useful metabolite to accumulate. Other ribozymes with affinity for this substrate may then have interacted with the bound substrate, and thus with the “charged” RNA. Once this interaction was established, selection could push the uncharged ribozyme to interact with the charged RNA itself, using the perhaps more dependable Watson–Crick pairs, and eventually to lose its affinity for the amino acid substrate. This use of charged RNAs would provide clear selection for specific binding, cementing the relationship between specific amino acids and specific RNAs.

Replication Tag (Orgel 1989)

In the RNA world, efficient replication is paramount, and the RNA replicase faces numerous issues. Presumably, a given RNA organism contains catalytic, informational (complementary to the catalytic), junk, and foreign RNA, and only some of these should be replicated. The replicase must also choose a replication origin: without a specific site at which to begin, the ribozyme responsible would likely start copying at random locations on the RNA, generating truncated copies and being generally inefficient. Perhaps the organism developed a regulatory system using genomic tags. One possible tag is a 3′-bound amino acid (see “Replication Tags” under “Adaptor Molecules” for another possibility). This would provide a chemically distinct handle that can be reversibly bound in a manner similar to that of modern tRNAs. In addition, it would identify the 3′ end of the RNAs to be replicated, allowing efficient and complete 3′–5′ replication.

Imagine an RNA whose sequence encouraged the 3′ binding of an amino acid, having gained this binding site either randomly or through a previously evolved catalytic interaction. Initially, the replicase would not recognize the bound amino acid as a replication tag. However, the amino acid may have helped to neutralize the negative charge of the RNA, allowing the replicase, also negatively charged, to bind this RNA’s 3′ end in preference to other RNAs and to other loci on the same RNA. An internally bound amino acid, by contrast, would encourage truncated copies and thus be selected against. This would explain selection for 3′ acylation specifically, and would allow the replicase to evolve a more specific interaction. The same argument applies to 3′-bound short peptides.

Adaptor Molecules

Here we present the general theories and ideas behind tRNA evolution. For an excellent review of specific predictions of tRNA structure and its evolution, see Di Giulio (2009). Included here are functions for tRNA without the rest of the translational system (tRNA–tRNA) and several accessories to the RNA replication process (Replication Tags, Preventing dsRNAs, Replication Parsimony) (Fig. 2).

Fig. 2

Adaptor Molecules. Note that these representations are symbolic and are not specific predictions of shape or structure. a tRNA–tRNA. Two amino acid-charged RNAs interact, synthesizing a dipeptide. Repeated reactions create longer peptides. b Replication tag. Using short, tRNA-like tags, the replicase binds and replicates only specific RNAs. c dsRNA. A naïve RNA replicase copying the sequence of a ribozyme creates dsRNA. When RNAs bind short sequences on the nascent strand, replication occurs without formation of dsRNA, allowing the ribozyme (the pentagon) to fold correctly. Finally, using duplicating RNAs (dRNAs) carrying trinucleotides, the replicase can create an identical, not complementary, strand, that is, a copy of the ribozyme. d Replication Parsimony. Using adaptor molecules that translate complementary amino acid binding sites into their bound amino acids, both the original ribozyme (pentagon) and the complementary strand can form identical peptide products.

tRNA–tRNA (Orgel 1989; Wong 1991; Schimmel and Henderson 1994; Brosius 2001; Di Giulio 2003, 2004, 2008; Bernhardt and Tate 2010)

The modern translational system mediates and dictates a series of tRNA–tRNA interactions, so perhaps it evolved from a similar reaction without the additional machinery, i.e., the ribosome, mRNA, etc. Two amino acid- or peptide-charged RNAs could come into close contact, catalyzing the transfer of the amino acid of one RNA onto the amino acid of the other. Short peptides could be synthesized in this manner, presumably more efficiently than by the addition of single free amino acids, because base pairing between the RNAs would hold the reactants together. These peptide-bound RNAs could then serve as ribozymes, with the peptide enhancing the RNA’s catalytic activity (see “Peptides as Cofactors”), or release the short peptide for use by some other ribozyme.

Note that an RNA carrying a newly synthesized dipeptide may be indistinguishable from an identical RNA carrying only a single amino acid. This may have led to amino acids being repeatedly added to a single RNA in a wasteful or unfavorable manner. One solution is that the identity of the bound peptide may have helped determine whether the RNA could base pair with another RNA, preventing runaway synthesis. Another possibility is that this ambiguity selected for pre-tRNAs that bind only a single amino acid, perhaps losing their original catalytic function in the process.

Replication Tags (Eigen and Winkler-Oswatitsch 1981; Weiner and Maizels 1987, 1999; Maizels and Weiner 1994; Brosius 1999, 2001; Yakhnin 2007)

This proposal addresses the same issues as the amino acid Replication Tag (see “Peptides Before Proteins”), namely the need for an RNA replicase to identify the RNA to be replicated and to choose an appropriate replication origin. Instead of amino acids, tRNA-like structures may have served as tags, acting as recognition sites for the RNA replicase. This may have begun simply as a binding site at the 3′ end of the RNA, later separated into a distinct RNA replication initiation factor that would bind, via base pairing, the RNA to be replicated, additionally serving as a primer.

Amino acids may have been bound to these pre-tRNAs to aid binding to the template or to the replicase, or may even have functioned as a switch, where the presence or absence of the amino acid determined whether an RNA was to be copied. As the replicase evolved into the ribosome (see “Replicase”), the tRNA–ribosome–template interaction would already have been in place.

Preventing dsRNAs (Campbell 1991; Yakhnin 2007; Noller 2010)

Consider the template-driven synthesis of RNA by a replicase. In a naive system, the product would be double-stranded RNA (dsRNA), as the replicase would depend on Watson–Crick base pairing for the specificity of replication. This presents a twofold problem: neither the template nor the newly created strand would be catalytically active in this state, and separating dsRNA into its single-stranded components is energetically difficult. Thus, any system taking advantage of base pairing to copy RNA must have compensated for this problem. It has been suggested that, initially, regular cycles of environmental change allowed strand separation. Any organism that escaped its dependence on these cycles, even partially, would have had a huge selective advantage.

One possibility is that there evolved a set of duplicating RNAs (dRNAs), which acted as adaptors. These dRNAs would bind to a specific sequence on the template, as well as an identical, free trinucleotide. In a similar manner to modern tRNAs donating amino acids, these dRNAs would donate their bound trinucleotide to a growing RNA, which would be identical, not complementary, to the template. This prevents both the formation of dsRNA and the formation of the complementary, and presumably catalytically useless, RNA. One potential mechanism by which the identical sequences were maintained is if the dRNA itself was a homodimer.
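The contrast between a naive replicase and dRNA-mediated copying can be illustrated with a toy, purely symbolic Python sketch; it is not a chemical model. The function names and the dRNA pool are hypothetical, strand orientation (5′/3′) is ignored, and we assume the template length is a multiple of three and that every required dRNA is present. A naive replicase returns the complement of the template, whereas reading the template three bases at a time through dRNAs returns an identical copy.

```python
# Toy symbolic sketch of the dRNA idea (hypothetical names throughout).
# Assumptions: strand orientation (5'/3') is ignored, template length is a
# multiple of three, and every dRNA needed for the template is available.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (orientation ignored)."""
    return "".join(PAIR[base] for base in seq)


def naive_copy(template: str) -> str:
    """Naive replicase: the product is the complement of the template."""
    return complement(template)


def make_drna_pool(triplets: set[str]) -> dict[str, str]:
    """One hypothetical dRNA per triplet: it pairs with the triplet through a
    complementary 'anticodon' and carries a trinucleotide identical to it."""
    return {complement(t): t for t in triplets}


def drna_copy(template: str, pool: dict[str, str]) -> str:
    """Read the template three bases at a time; the dRNA that pairs with each
    triplet donates its identical trinucleotide to the growing strand."""
    if len(template) % 3:
        raise ValueError("template length must be a multiple of three")
    product = []
    for i in range(0, len(template), 3):
        triplet = template[i:i + 3]
        anticodon = complement(triplet)   # sequence the dRNA pairs through
        product.append(pool[anticodon])   # trinucleotide it donates
    return "".join(product)


ribozyme = "GGAUCCAAG"
pool = make_drna_pool({ribozyme[i:i + 3] for i in range(0, len(ribozyme), 3)})
print(naive_copy(ribozyme))        # CCUAGGUUC
print(drna_copy(ribozyme, pool))   # GGAUCCAAG
```

The sketch captures only the sequence logic: the naive product is complementary (and would remain paired as dsRNA), while the dRNA product is identical to the template and never requires a complementary intermediate.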

Another possibility is that pre-tRNAs bound to the newly created RNA immediately, temporarily preventing the two strands from coming together to form dsRNA, analogous to single-strand binding proteins that prevent reannealing in present-day DNA replication. Even if this interaction was very brief, the delay may have allowed the normal secondary structures to form in the template strand, and the structured template would then block dsRNA formation more permanently. If the pre-tRNAs possessed only short complementary sequences, say three nucleotides, their separation from the new strand would require little additional energy. Even a single pre-tRNA of this type would help prevent at least some dsRNA formation, and as these pre-tRNAs were duplicated, perhaps with small changes to their “anticodons,” they would prevent more and more dsRNAs. Bound amino acids may have encouraged binding, and subsequent deacylation may have accelerated turnover of these pre-tRNAs, allowing replication to proceed more rapidly.

Replication Parsimony (Ma 2010)

Consider a system where useful peptides are produced by amino acids bound directly to a ribozyme (see “Peptidase”). Such a ribozyme would contain a series of amino acid binding sites that, when occupied, would yield a specific peptide. When this ribozyme is copied, a necessary intermediate is the complementary sequence, and this RNA would presumably be catalytically useless, a waste of energy to create. Any mechanism that could skip this intermediate (see “Preventing dsRNAs”) or make use of it would be highly advantageous. Now imagine the evolution of adaptor RNAs that bind the complement of each amino acid binding site and carry the amino acid corresponding to that site. Via these adaptors, the otherwise useless complementary RNA could catalyze the creation of a product identical to that of the original ribozyme. The adaptors would also allow the binding sites to be coded more compactly: they provide flexibility, since the actual reaction takes place at some distance from the coding RNA. The complementary strand could thus be more effective as an mRNA than the original, encoding the same information in fewer bases.
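The adaptor logic can be illustrated with a toy, purely symbolic Python sketch. Everything in it is hypothetical: the binding-site sequences, the placeholder amino acid labels aa1 to aa3, and the function names. Strand orientation is ignored, and binding sites are assumed to be contiguous, non-overlapping triplets. It shows only that, once each adaptor pairs with the complement of a site and carries that site's amino acid, the complementary strand specifies the same peptide as the original ribozyme.

```python
# Toy symbolic sketch of "Replication Parsimony"; not a chemical model.
# Hypothetical throughout: the site sequences and the labels aa1..aa3 are
# placeholders, strand orientation is ignored, and binding sites are
# assumed to be contiguous, non-overlapping triplets.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical amino acid binding sites on the original ribozyme.
BINDING_SITES = {"GGA": "aa1", "UCC": "aa2", "AAG": "aa3"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (orientation ignored)."""
    return "".join(PAIR[base] for base in seq)


def sites(rna: str) -> list[str]:
    """Split an RNA into consecutive triplets (the assumed binding sites)."""
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


# Each adaptor pairs with the complement of one binding site and carries
# the amino acid that the original site would bind.
ADAPTORS = {complement(site): aa for site, aa in BINDING_SITES.items()}


def direct_peptide(ribozyme: str) -> list[str]:
    """Original ribozyme: each site binds its amino acid directly."""
    return [BINDING_SITES[s] for s in sites(ribozyme)]


def adaptor_peptide(complementary: str) -> list[str]:
    """Complementary strand: adaptors bring in the matching amino acids."""
    return [ADAPTORS[s] for s in sites(complementary)]


ribozyme = "GGAUCCAAG"
comp_strand = complement(ribozyme)     # "CCUAGGUUC"
print(direct_peptide(ribozyme))        # ['aa1', 'aa2', 'aa3']
print(adaptor_peptide(comp_strand))    # ['aa1', 'aa2', 'aa3']
```

The sketch does not model the more compact coding described above; it illustrates only why the complementary strand stops being a wasted intermediate once adaptors exist.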

The Template

As tRNAs and the ribosome itself play active roles in translation, it is reasonable to imagine that they might have played active roles in other processes. mRNA, on the other hand, is a purely informational molecule, useless without something to read its message. An important issue that will not be addressed explicitly here is the ancestral intron/exon structure (see Penny et al. 2009 for a recent review). The existing theories for the origin of the interaction between mRNAs and the translational system suggest that mRNA may have provided information for another process (One Gene, One Reaction), served only a structural purpose (Internal Template, Mediating tRNA–tRNA), or had no purpose whatsoever (Unconserved RNA) (Fig. 3). Note that the evolution of mRNA is highly dependent on that of the other components.

Fig. 3

The Template. Note that these representations are symbolic and are not specific predictions of shape or structure. a One Gene, One Reaction. Here, an RNA dictates a sequential reaction in the metablosome by bringing in substrate-bound RNAs in a specific order. b Internal template. The proto-ribosome uses an internal component as a binding site for the pre-tRNAs. This component becomes separated and eventually becomes a highly modular RNA strand. c Mediating tRNA–tRNA. A reaction that occurred between two amino acid-bound RNAs is initially stabilized by a third RNA, a pre-mRNA. Soon, this interaction relies entirely on the mRNA sequence, with minimal direct interaction between the amino acid-bound RNAs. d Unconserved RNA. RNA that serves no purpose, informational or catalytic, could be available via intron splicing or transcription of regions between RNA genes. As there would be little or no selection on its sequence, it could readily be accepted as the first template in a primitive translation system.

One Gene, One Reaction (Gibson and Lamond 1990; Di Giulio 2003)

In the case where the proto-ribosome played a general metabolic role (see “The Metablosome”), there must be a mechanism that dictates which reaction will occur. In any given reaction, a specific set of cofactor- and/or metabolite-charged RNAs must come into contact with the metablosome. The most natural solution is that, for each reaction, there exists a specific RNA that binds to the metablosome while also bearing sequences complementary to the charged RNAs, bringing the metabolites and/or cofactors to the metablosome in a specific manner. These templates could be exchanged, allowing multiple reactions to take place in the same sequestered environment.

One could imagine that sequential reactions would also be valuable. Building on the case above, the template could come to specify the order in which cofactors/substrates were brought into the proto-ribosome via a translocation-like mechanism. Thus, rather than “one gene, one protein,” it would be “one gene, one reaction.” As the translational system continued to evolve, these RNAs would be used as the first templates for protein synthesis.

Internal Template (Yarus 1998; Brosius 2001; Di Giulio 2003; Wolf and Koonin 2007)

The ribosome is usually considered a highly versatile system, able to read any mRNA. Perhaps this ability was preceded by an efficient, yet limited, proto-ribosome. The template itself may have originated as internal binding sites within the ribosome: internal RNA components of the proto-ribosome may have bound pre-tRNAs via base pairing, allowing a small set of pre-tRNAs to interact efficiently. This internal component could then have been replaced by an equivalent but highly variable external pre-mRNA, allowing a greater variety of pre-tRNAs to bind.

Mediating tRNA–tRNA (Orgel 1989; Wong 1991; Schimmel and Henderson 1994; Di Giulio 2004, 2008; Bernhardt and Tate 2010; Fox 2010)

Assuming a reason for two RNAs to interact closely in a specific manner (see “tRNA–tRNA”), a single RNA could base pair with both, facilitating that interaction and perhaps increasing the rate of peptide synthesis. These pre-mRNAs may have been promiscuous at first, promoting large numbers of RNA interactions, including ones with deleterious products. Selection against such products and for helpful peptides may have led to coding. Going further, the pre-mRNA may have come to determine the specific order of interactions between different peptide-bound RNAs. Gradual optimization of this process could have led to contiguous binding sites, i.e., no gaps between codons. Selection would favor this change, as the bound RNAs would be closer together and thus react more efficiently.

Unconserved RNA (Poole et al. 1998)

Under the RNA world hypothesis, splicing may have been involved in the processing and maturation of ribozymes. In addition, an RNA genome may have contained spacer sequences between different RNA genes, which may have been randomly transcribed. Either way, this RNA would have been under little or no selection for sequence conservation. If such unconserved RNA was translated into a peptide, the RNA would begin to express a selectable phenotype. Small changes in the sequence would be reflected in the peptide, allowing weak selection to improve even the least functional peptides. Unconserved RNA could also be the original source of any of the pre-mRNAs described above.

The Ribosome

While it is often noted that the aaRSs and tRNAs are the true adaptors, translating from nucleic acids to amino acids, it is the ribosome that does the heavy chemical lifting. More specifically, it is the RNA components of the ribosome that perform the key reactions; see Noller (2010) for an excellent overview of the evidence. Consistent with the ribosome’s modern role as a catalyst, the existing theories ascribe a number of catalytic properties to the proto-ribosome, such as acting as a peptidase, an RNA replicase, or the more general metablosome (Fig. 4).

Fig. 4

The Ribosome. Note that these representations are symbolic and are not specific predictions of shape or structure. a Peptidase. A ribozyme binds amino acids to improve its catalytic activity. It soon loses this activity in favor of forming peptide bonds between bound amino acids. It begins to accept amino acid bound RNAs as substrates, adopting a secondary subunit to stabilize the interaction. b Replicase. The triplicase excises trinucleotides from the tRNAs, adding them to the nascent strand, which is then bound by a secondary subunit to prevent the formation of dsRNA. c Metablosome. The metablosome provides an environment for catalysis, accepting substrate bound RNAs whose identities are determined by a separate pre-mRNA. Alternatively, the proto-ribosome was an aggregation of distinct catalysts that accepted multiple substrates

Peptidase (Wong 1991; Schimmel and Henderson 1994; Szathmary and Maynard Smith 1997; Yarus 1998; Noller 2004, 2010; Wolf and Koonin 2007; Bokov et al. 2009; Yarus et al. 2009; Bernhardt and Tate 2010; Fox 2010; Ma 2010)

As abiotically formed oligopeptides were depleted through consumption and degradation, a selective pressure would arise to catalyze the formation of more peptide bonds to keep up with demand (see “Peptides Before Proteins”). The peptidase itself may have originally catalyzed a different reaction that happened to be stabilized by the binding of two or more amino acids in close proximity (see “Peptides as Cofactors”). This ribozyme then came to catalyze peptide bond formation between these amino acids, which perhaps provided still greater catalytic activity. If released, the resulting peptide could be used as a cofactor by another ribozyme. Soon, as other ribozymes came to depend on these peptides, the peptidase activity would become separated from the original function via duplication. One could imagine that large and accurate proteins could be made in this fashion by utilizing numerous amino acid binding sites in close proximity.

This peptidase could then have begun to recognize amino acid-bound RNAs instead of free amino acids, which would in turn improve binding through complementary base pairing. In addition, the transfer of an amino acid from one substrate to another is energetically favorable relative to the binding of free amino acids. An additional subunit, i.e., the small ribosomal subunit, may have evolved to stabilize the interaction between the peptidase and the amino acid-bound RNAs.

Replicase (Weiner and Maizels 1987, 1999; Campbell 1991; Maizels and Weiner 1994; Gordon 1995; Poole et al. 1998; Penny 2005; Yakhnin 2007)

In this theory, the proto-ribosome serves as an RNA replicase. This structure reads a single strand of RNA to be copied, perhaps recognizing a tRNA-like or amino acid replication tag (see “Replication Tag” and “Replication Tags”). Note that this theory emphasizes the modern ribosome’s role in mediating codon–anticodon pairing rather than peptide bond formation (see “Peptidase”). A major subtheory is the triplicase: instead of single nucleotides, trimers donated by tRNA precursors are added to the growing strand. These precursors interact with the triplicase and pair with three bases of the template, the early codon. The amino acid attached to the tRNA may have been present all along or selected later to aid specificity and recognition. A ratchet mechanism that shifts the RNA strand by three nucleotides would increase the rate of triplication and be strongly selected for.

Presumably, a triplicase would replicate RNA in a faster and more accurate manner by relying on the combined base pairing of three nucleotides. It added trimers and not dimers or 4-mers simply because this explains the modern triplicate code. Such a ribozyme would be selected for because the slow replication time of single-nucleotide RNA replicases would allow the template to disassociate from the free nucleotides prior to the completion of replication. As discussed in Preventing dsRNA a naïve system would create RNA duplexes, which would require energy to separate. One possible solution is that the growing RNA strand would be threaded through another, accessory RNA, which may have eventually become the small subunit.

The Metablosome (Gibson and Lamond 1990; Di Giulio 2003, 2008; Krupkin et al. 2011)

The modern ribosome is the center of a number of distinct catalytic activities, including ATPase activity, translocation, and transpeptidation. The metablosome is thus a potential precursor to the ribosome, serving as an organizing center for general catalysis. It may have started as a simple RNA structure stabilizing the interaction between substrate- and cofactor-bound RNAs (pre-tRNAs), with distinct or internal pre-mRNAs serving as binding sites for the pre-tRNAs.

The most primitive form of this organizing center may have been a single RNA, which was bound to two substrates/cofactors. These were brought into the main body of the metablosome, where the interaction of the substrates/cofactors was stabilized. The two bound regions themselves were stabilized by an internal template. This self-contained catalyst used base pairing to facilitate interactions, and via exchangeable binding sites (see “One Gene, One Reaction”) may have been able to catalyze numerous reactions, including, upon the appearance of a ratcheting mechanism, sequential reactions.

Another similar theory is that the ribosome evolved not as a functional unit per se, but simply as an aggregation of numerous catalysts, held together by the additional stability of a multimer. This would explain the association of the two major subunits, each of which is itself made up of numerous RNA and peptide components. This theory requires that each of these subunits once had its own function. Once aggregated, these catalysts may have coevolved to function as a site of general catalysis on one or more metabolites.

The First Proteins

Once a translational system had evolved, for any number of reasons, whether or not related to the ultimate “goal” of translation, we can ask: what were the first proteins, i.e., coded polypeptides? The theories here include both catalytic (Runaway Cofactors, Sophisticated Catalysts, Primitive yet Effective) and structural roles (RNA Atrophy, Structural Proteins), as well as no role at all (Waste Products) (Fig. 5).

Fig. 5

The First Proteins. Note that these representations are symbolic and are not specific predictions of shape or structure. a Runaway Cofactors. A ribozyme begins to incorporate amino acids into its active site as a cofactor. Longer peptides are used until the active site is composed entirely of protein. Soon the protein functions as an independent enzyme. b RNA Atrophy. Bound amino acids stabilize a ribozyme. Longer peptides are more effective and begin to replace RNA structural components until there is only an RNA active site surrounded by protein. c Waste Products. Proteins are generated as a byproduct of RNA replication. d Structural Proteins. Primitive proteins could form structural components of the organism, such as amyloid barriers that are insensitive to sequence or more rigid structures that are insensitive to translational flaws. e Primitive yet Effective. Simple proteins can have useful functions as membrane pores or small catalysts. f Sophisticated Catalysts. Proteins are originally generated by an alternate pathway, allowing them to evolve into powerful enzymes. As translation arises, this information is preserved

Runaway Cofactors (Wong 1991)

This theory assumes that ribozymes used peptides as cofactors to expand their catalytic activity (see “Peptides as Cofactors”). This would create a selective pressure to synthesize these peptides in a more consistent, i.e., coded, manner. Their short length would allow the primitive translational system to generate them with few errors. As ribozymes came to rely on coded cofactors, the translational system would improve in accuracy, allowing longer peptides to be used and continually enhancing ribozyme activity, until the active site, now made of protein, no longer required the RNA backbone to function as a catalyst. This first enzyme then detached from the RNA and took over the RNA’s original function.

RNA Atrophy (White 1976, 1982; Gibson and Lamond 1990; Di Giulio 1997; Poole et al. 1998; Noller 2004)

Given the sensitivity of active sites to the sequence of amino acids, it is unlikely that an early, presumably inaccurate, translational system could generate catalysts, or even active sites, superior to those of the already existing ribozymes. Instead, small, poorly coded peptides could incorporate themselves into the structural backbone of ribozymes and stabilize the RNA active site, thereby providing selection for continued translation. If the RNA active site already used an amino acid cofactor (see “Peptides as Cofactors”), the growing peptide chain could be built from it. The structural portions and substrate-binding sites of the ribozyme would then atrophy as they were replaced by increasingly complex and more accurately translated proteins, soon leaving only sophisticated structural proteins built around an RNA active site. Contrast this with Runaway Cofactors, where a structural RNA is built around a protein active site. At this point, the translational system would have become sophisticated enough to create superior protein active sites. These proteins may still exist today in the form of the numerous enzymes that use nucleotide-derived coenzymes, such as NADH and coenzyme A.

Waste Products (Gordon 1995; Yakhnin 2007)

This theory suggests that the original ribosome was a replicase (or triplicase). As often occurs in biology, this energetically unfavorable process could be accelerated by coupling it to an energetically favorable process, such as the formation of peptides. Peptides created in this manner would initially serve no purpose, but their consistent appearance and coded nature would allow selection to shape them into functional peptides.

Structural Proteins (Orgel 1989; Wetzel 1995; Milner-White and Russell 2008; Baranov et al. 2009; Krupkin et al. 2011)

Small coded peptides may have played some other important role, allowing the translational system to add new amino acids and gain a certain level of complexity. But the jump from small, primitive peptides to catalysts that outstrip the existing ribozymes seems unlikely. Between the two there must have been an intermediate function that would provide selection for longer, more sophisticated proteins while remaining insensitive to translational inaccuracies and perhaps to a changing genetic code. In modern organisms, proteins play structural as well as catalytic roles, and a structural role could have served as the point of entry. Because nucleic acids are fragile, they make poor structural components, allowing even unsophisticated proteins to replace them.

A specific example would be amyloids. These proteins are usually discussed as agents of diseases such as Alzheimer’s. In a primitive organism, they may have served as a protective, insulating layer. Amyloid formation is determined by amino acid composition rather than precise sequence, so translational errors would not affect their function.

Sophisticated Catalysts (Yarus 1998; Yarus et al. 2009; Ma 2010)

This theory suggests that complex peptides evolved before the translational system and were produced by the translational precursor (see “Peptidase”). Under a theory in which the modularity of protein synthesis evolved late, the initial systems could have evolved the accuracy, though not the broad repertoire, of a modern system. This would allow the first truly translated proteins to have already evolved for, and been optimized for, important and delicate tasks.

Primitive yet Effective (Gibson and Lamond 1990; Milner-White and Russell 2008; Bernhardt and Tate 2010)

When we imagine a protein, we picture a massive, finely tuned molecular machine capable of all manner of chemistries, and this kind of macromolecule is clearly out of reach of a primitive translation complex. However, simpler proteins can also perform useful functions. One example is a membrane pore that can be formed by a short polypeptide composed of only two types of amino acids. Such a pore would be an important step in the development of membrane-bound organisms. Another example is poly-glycine, which shows catalytic activity, suggesting that even the most primitive peptides may have played an important role in metabolism.

Conclusions

The question we have been avoiding here is the all-important testability of the above theories. Although this topic is discussed in the necessary detail in many of the original articles, we can still address it within this broader framework. Sometimes the correct question to ask is clear. For example, for Peptides as Cofactors: can ribozymes incorporate amino acids as cofactors to expand or enhance their catalytic activity? (Yes, in at least one case; Roth and Breaker 1998.) Other theories could be supported in a similarly constructive fashion. If someone managed to create an RNA triplicase (Replicase) or a ribozyme that constructed specific peptides (Peptidase), this would go far to establish plausibility; if they managed to construct one by modifying the modern ribosome, it might even convince some skeptics. We do not intend this as a throwaway comment: at the beginning of what we hope will be a long and fruitful era of synthetic biology, we are no longer limited to what we observe in nature. A similar approach has already been brought to bear on more recent evolutionary questions (Voordeckers et al. 2012, inter alia).

But the highest goal, at least on the theoretical side, is to lay out a series of not-so-far-fetched events that present a path into the protein world. The value of these theories, or lack thereof, lies in their coherency and their ability to stimulate discussion (Cairns-Smith 1974; White 1982). This is not to suggest that each of these theories is equally valid in light of existing evidence and theoretical concerns. We would like to reiterate that the proposals have been presented in a clear, concise, and accessible manner in order to highlight the flaws and evolutionary gaps that may deserve additional attention.

One of the major gaps we would like to emphasize explicitly is the evolution of translocation. While some of the theories above do propose an evolutionary advantage for translocation before coded protein synthesis, translocation is a complex mechanism in its own right and requires a more explicit evolutionary path (see Bernhardt and Tate 2010). The relative lack of theory perhaps reflects our limited understanding of the translocation mechanism. But just as new crystallographic information has expanded our knowledge of how the ribosome creates peptide bonds (Noller 2010), a wealth of new structures probing translocation is now available (see Achenbach and Nierhaus 2013). Hopefully this new information will translate into theory.

As we have presented them here, the theories fall short of a complete answer to our original question: How could translation have evolved? Each describes only a single step and lacks the necessary details. These details are outside the scope of this review, but we can bring the possible steps together. While the evolutionary paths of the components of the translational system are not truly independent, it is productive to consider each component separately and then, with a few added transitions, mix and match them to generate plausible scenarios.

For example: The original interaction between RNAs and peptides arose as RNAs captured amino acids to be used as cofactors (see “Peptides as Cofactors”). The large ribosomal subunit then evolved as a peptidase (see “Peptidase”), taking amino acid-bound RNAs as substrates and creating small peptides used as cofactors. The small subunit evolved separately as a replicase (see “Replicase”), using a varied group of pre-tRNAs to prevent the formation of dsRNA (see “Preventing dsRNA”). These pre-tRNAs became charged with amino acids, allowing them to become substrates of the peptidase; this coupling to peptide formation provided additional energy for replication (see “Waste Products”). Random junk RNA is now “translated,” and as there is little conservative selection on these RNAs, sequence changes that produce useful peptides are preserved and selected for (see “Unconserved RNA”). Initially these peptides stabilize existing ribozymes, but as they grow in sophistication, they replace the structural RNA components, becoming a protein scaffold for an RNA active site (see “RNA Atrophy”). As translation becomes more accurate and incorporates more amino acids, independent catalytic proteins arise that are superior to their RNA counterparts, and the protein invasion is complete.