The Importance of the Catalysis in the Evolutionary History of the Genetic Code

There are about 120 definitions of life (Barbieri 2003; Popa 2004). A definition of life that treats catalysis as its primary quality might be: “a system is living if it codes for non-trivial multimeric catalysts.” Such a definition is, in effect, another way of defining the genetic code. Indeed, the genetic code might be defined as the evolution of coded catalysis (Di Giulio 2003).

This definition requires clarification. One can consider a “trivial” life of low complexity, based on metabolic, possibly autocatalytic, cycles, as discussed in the literature (Hoffmann-Ostenhof 1959; Eakin 1963; King 1980; Wächtershäuser 1988, 1992), in which catalysis might have been achieved only by ions or low-molecular-weight molecules (Hoffmann-Ostenhof 1959; Eakin 1963; King 1980; Wächtershäuser 1988, 1992; Di Giulio 1997) and was therefore not very efficient. The definition of life given above, however, evidently refers to complex life, that is, life in which catalysis performed by multimeric complexes is several orders of magnitude more efficient, specific, or stereospecific than in trivial life. It is precisely this transition from trivial to complex life that the genetic code would seem to have mediated. According to this argument, therefore, the genetic code would have given access to complex life through the evolution of coded catalysis. If this reasoning is correct, catalysis mediated by polymers would have been the main selective pressure leading to the establishment of the genetic code. Put more directly, catalysis by proteins might have been the only driving force that directly shaped the structure of the genetic code. Strictly speaking, this might be true only a posteriori, since the genetic code could, for instance, have evolved for reasons unrelated to catalysis (Wolf and Koonin 2007; Rodin et al. 2011). It would nevertheless be striking if the genetic code, which codes primarily for catalysts, did not suggest, strongly though not conclusively, that catalysis is the raison d’être of its evolution.

If this is true, a fundamental line of research would be to identify the key molecule that made catalysis possible throughout the evolution of the genetic code.

Two requirements seem evidently necessary to trigger the origin of the genetic code. The first is that a system lacking a certain degree of complexity cannot trigger the origin of the genetic code; the second is that RNAs, peptides, and polypeptides must have been present. In fact, the second requirement subsumes the first, because a system already able to synthesize RNAs and peptides of various lengths evidently possesses enough complexity to promote the origin of the genetic code. We can now ask more directly what form the catalyst that performed catalysis during the evolution of the genetic code must have taken. The most plausible choice seems to be a mixed polymer of RNA and peptides covalently linked, because such a polymer would explain the origin of peptidyl-tRNA, the key intermediate of protein synthesis, which is otherwise difficult to explain (Wong 1991; Wong and Xue 2002; Di Giulio 1997, 2003, 2007; see also below). According to this view, therefore, RNA covalently linked to peptides or polypeptides would have mediated catalysis throughout the evolution of the genetic code. Because understanding this point seems fundamental, I have decided to investigate further a model that has already been proposed (see Di Giulio (2003, 2007), also for an introduction).

A Model for the Evolution of the First mRNAs

A simple way to imagine the onset of protein synthesis is to hypothesize that two aminoacylated or peptidated-RNAs pair at some point along their sequences. The amino acids and peptides they carry could then have been brought into contact so that a peptide bond might form. It seems to me impossible to hypothesize a simpler protein-synthesis system. This rudimentary mechanism not only satisfies the criterion of simplicity, since it relies on the simple and natural interaction mediated by hydrogen bonds between the bases of the two RNAs, but this “trivial” interaction also allows us to recognize the first form of coding that might be attributed to RNA. That is, the pairing between the two RNAs would ultimately code for the formation of the peptide bond between, for example, an amino acid and a peptide carried by these two RNAs, because the paired regions would juxtapose the amino acid and the peptide in a way that allows the bond to form. Protein synthesis, and ultimately the genetic code, could therefore be the evolution of this simple and natural interaction among peptidated-RNAs (Di Giulio 2003, 2007). For this reason, peptidated-RNAs would not only seem to be ideal candidates for ancestral catalysis (Wong 1991; Wong and Xue 2002; Di Giulio 1997, 1998, 2003, 2007; Szathmary 1993), but they would also enjoy the extraordinary property that their interaction might lead to the formation of chemical bonds, in particular the peptide bond, because their pairing might favor these bonds. By Occam’s razor, these two properties make peptidated-RNAs particularly suitable for explaining the origins of protein synthesis and of the genetic code.
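
As a purely illustrative aid, the pairing step can be caricatured computationally: two RNAs “meet” where a stretch of one is antiparallel and complementary to a stretch of the other. The Python sketch below is a toy under stated assumptions, not part of the model itself: it counts only Watson-Crick pairs, ignores G-U wobble and all thermodynamics, and the function name `pairing_regions`, the minimum of four pairs, and the example sequences are hypothetical.

```python
# Toy model (assumption): only Watson-Crick pairs count; G-U wobble and all
# thermodynamics are ignored.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_regions(rna1: str, rna2: str, min_len: int = 4) -> list[tuple[int, int, int]]:
    """Find maximal antiparallel complementary stretches between two RNAs.

    Returns (i, j, length) triples meaning rna1[i + k] pairs with rna2[j - k]
    for k = 0 .. length - 1 (rna1 read 5'->3', rna2 read 3'->5').
    """
    hits = []
    for i in range(len(rna1)):
        for j in range(len(rna2)):
            # Skip (i, j) if the preceding positions also pair: it lies inside a longer run.
            if i > 0 and j + 1 < len(rna2) and (rna1[i - 1], rna2[j + 1]) in PAIRS:
                continue
            k = 0
            while i + k < len(rna1) and j - k >= 0 and (rna1[i + k], rna2[j - k]) in PAIRS:
                k += 1
            if k >= min_len:
                hits.append((i, j, k))
    return hits


# Two hypothetical peptidated-RNAs sharing a complementary 6-nt stretch.
print(pairing_regions("AAGGCUCCAA", "AAGGAGCCAA"))  # [(2, 7, 6)]
```

In the model, the residues carried by the two RNAs would be juxtaposed when such a region forms; the sketch only locates candidate regions and says nothing about where the amino acid or peptide is attached.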

If interactions among peptidated-RNAs characterized the first stage of the evolution of protein synthesis and of the genetic code, we can ask what path they would have followed to arrive at the first mRNAs. As mentioned above, the interaction between two peptidated-RNAs might have been the first form of coding, because it would have supported the formation of the peptide bond, which would later become one of the most important chemical bonds in the biosphere. If this is true, the further evolution of this interaction to involve more than two peptidated-RNAs, always under the selective pressure to improve the efficiency of these catalysts, might have led to interactions among several of these molecules (Fig. 1a), including sequential interactions that resulted in the formation of peptidated-RNAs conferring considerable selective advantages on the protocells in which they appeared (see also the legend of Fig. 1). (At this evolutionary stage, catalysis was probably very rudimentary, in the sense that, for instance, the molecules that had to react were merely anchored, which reduced the number of their rotations and twists. Nevertheless, the protocell obtained a considerable benefit from the rudimentary catalysis performed by these first peptidated-RNAs.)

Fig. 1

In the first stage (a) four RNAs are variously aminoacylated and peptidated. The secondary structures of the RNAs contain several hairpin-like structures. This stage (like all the others) also represents a hypothetical succession of interactions among peptidated-RNAs. These interactions result in the formation of one or more peptidated-RNAs (not specified) able to catalyze specific reactions. In the second stage (b) a pre-mRNA molecule evolves that begins to code for, and therefore to direct, the succession of interactions among the aminoacylated and peptidated-RNAs. The RNAs of the peptidated-RNAs show a more pronounced hairpin character. This trend continues in the following stages because it makes decoding more efficient. At this stage the pre-mRNA molecules are also peptidated. In the third stage (c) the pre-mRNAs strengthen their coding capability and lose their peptide component. In the fourth stage (d) most of the peptidated-RNAs have hairpin structures in their RNAs. Finally, in the fifth stage (e) the RNAs of all aminoacylated and peptidated-RNAs have hairpin structures, and many RNAs are only aminoacylated, with the six amino acids predicted by the coevolution theory of the origin of the genetic code (Wong 1975; Di Giulio 2008). See text for full comments on all the evolutionary stages shown here.

It seems entirely natural that, if even a few interactions among peptidated-RNAs were required to produce other peptidated-RNAs with some catalytic activity (Fig. 1a), the next step would have been the emergence, within a group of interacting peptidated-RNAs, of one that began to operate as a pre-mRNA (Fig. 1b), in the sense that it began to guide (to code for) most, and eventually all, of the interactions among the peptidated-RNAs of that group (Fig. 1b). Interactions among peptidated-RNAs would thus seem to trigger, in a natural way, an evolution toward a pre-mRNA that codes for a determined succession of interactions among peptidated-RNAs, leading to the construction of one or more peptidated-RNAs with the desired catalytic activity. Although the genetic code is still extremely distant, this evolutionary stage (Fig. 1b) seems to me to already contain, per se, “all” of the genetic code; the question is essentially one of reducing the ennuplets (n-tuples of bases) governing the interactions between pre-mRNAs and peptidated-RNAs (Fig. 1b) to triplets. Indeed, interactions between peptidated-RNAs and pre-mRNAs occurred through the pairing of regions of more than three bases (this number is not specified in Fig. 1b). The other characteristic that appears mainly at this stage (Fig. 1b), though it should also have been present in the previous one, is that the RNAs of peptidated-RNAs had hairpin-like structures, because the interaction of several peptidated-RNAs with a pre-mRNA might have been better achieved by peptidated hairpin-like structures (Fig. 1b, c). This would have been at least partially true of the previous stage as well (Fig. 1a), because RNA hairpins should have been the structures most easily synthesized under natural conditions.
Indeed, they are the structures that derive from intramolecular synthesis: when an RNA strand reaches a certain length, it can fold back on itself and act as a template directing the synthesis of a complementary strand, thus forming a hairpin structure (Orgel 1968).
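
The intramolecular route to hairpins can be sketched in the same spirit. The following Python fragment idealizes the scenario under an explicit assumption: the last few nucleotides of a strand form the turn, and the 3′ end is then extended by copying the region upstream of the turn. The product is necessarily stem + loop + complementary stem, i.e. a perfect hairpin. The function names, the loop length, and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Antiparallel complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def self_templated_hairpin(strand: str, loop_len: int = 4) -> str:
    """Idealized intramolecular copying (assumption): the last loop_len
    nucleotides form the turn, and the 3' end is extended by copying the
    region upstream of the turn, giving stem + loop + reverse_complement(stem)."""
    if not 0 < loop_len < len(strand):
        raise ValueError("loop_len must be between 1 and len(strand) - 1")
    stem = strand[:-loop_len]
    return strand + reverse_complement(stem)


def is_perfect_hairpin(rna: str, loop_len: int) -> bool:
    """True if rna has the form stem + loop + reverse_complement(stem)."""
    stem_len, odd = divmod(len(rna) - loop_len, 2)
    if odd or stem_len <= 0:
        return False
    return rna[stem_len + loop_len:] == reverse_complement(rna[:stem_len])


hairpin = self_templated_hairpin("GGCAUCGAAA", loop_len=4)
print(hairpin)                         # GGCAUCGAAAGAUGCC
print(is_perfect_hairpin(hairpin, 4))  # True
```

Real self-priming would also require the 3′ end to pair before extension begins; the sketch skips that step and only shows why the product of such copying is always a hairpin.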

At this evolutionary stage, pre-mRNAs might also have performed some catalytic activity, even though their main role was to guide the interactions among other peptidated-RNAs (Fig. 1b). This would seem to justify the peptidation of pre-mRNAs at this stage (Fig. 1b). As evolution progressed, pre-mRNAs would have lost their peptide component and strengthened their coding character. This occurred because there was a demand for better catalysis, which could be obtained only through better coding of the successions of interactions among peptidated-RNAs. In other words, an improvement in catalysis, and even a substantial jump in its quality, might have happened only through improved coding of these interactions, because the primary effect of better coding would have been better interactions, which in turn would imply better catalysis and superior peptidated-RNAs. The system favored aminoacylated-RNAs over peptidated-RNAs, because aminoacylated-RNAs allowed the construction of polypeptides with specific sequences, that is, built amino acid by amino acid. This would imply, as already said, a strengthening of the coding character of pre-mRNAs. Furthermore, it would seem to imply that the catalytic peptidated-RNAs did not increase in number but improved in catalytic quality, although the overall numbers of peptidated-RNAs and aminoacylated-RNAs would have reached their peak. This allowed the catalysts to improve, and they began to be associated with the final steps of the successions of interactions among these peptidated-RNAs (Fig. 1d).

Finally, the author would like to state explicitly that several peptidated-RNAs and aminoacylated-RNAs took part in different groups of successions of interactions between aminoacylated and peptidated-RNAs and pre-mRNAs. This imposed a constraint among different pre-mRNAs and laid the basis for their coevolution, with one another and with peptidated-RNAs, which in the end would have resulted in the triplet code. This point might have been particularly important for the evolution toward the triplet code: if some aminoacylated-RNAs began to transfer their amino acid in several different groups of successions of interactions with pre-mRNAs (because this improved the catalysis of at least one peptidated-RNA), this would have established a constraint among different catalysts, the products of different groups of successions of interactions, making these aminoacylated-RNAs “immortal.” The same mechanism would have conferred a much smaller advantage on RNAs carrying, for example, a dipeptide or a tripeptide, because adding a tripeptide to a peptidated-RNA to construct a specific polypeptide would have been less flexible whenever a specific sequence was required. In other words, aminoacylated-RNAs could be used to construct any polypeptide sequence, whereas RNAs carrying dipeptides or tripeptides, for instance, cannot. There would therefore necessarily have been selection in favor of aminoacylated-RNAs because, compared with RNAs carrying more than one amino acid, they permit the construction of specific, and thus catalytically more efficient, polypeptide sequences, which peptidated-RNAs simply could not make.
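
The flexibility argument can be made concrete with a small word-break check. The sketch below asks which target sequences can be built by concatenating the residues or peptides carried by a repertoire of RNAs. It is only an illustration: the one-letter codes are the standard ones for the six amino acids mentioned in the text, while the dipeptide repertoire and all function names are invented for the example.

```python
from functools import lru_cache
from itertools import product


def can_assemble(target: str, blocks: set[str]) -> bool:
    """True if target can be written as a concatenation of the given blocks."""
    usable = {b for b in blocks if b}  # empty blocks would never advance

    @lru_cache(maxsize=None)
    def from_position(i: int) -> bool:
        if i == len(target):
            return True
        return any(target.startswith(b, i) and from_position(i + len(b)) for b in usable)

    return from_position(0)


def count_assemblable(alphabet: str, blocks: set[str], length: int) -> tuple[int, int]:
    """(assemblable, total) over all sequences of the given length."""
    total = assemblable = 0
    for residues in product(alphabet, repeat=length):
        total += 1
        assemblable += can_assemble("".join(residues), blocks)
    return assemblable, total


PRECURSORS = "GSDEAV"                  # Gly, Ser, Asp, Glu, Ala, Val
single_residues = set(PRECURSORS)      # aminoacylated-RNAs: one residue each
dipeptides = {"GS", "DE", "AV", "GG"}  # hypothetical dipeptide repertoire

print(count_assemblable(PRECURSORS, single_residues, 4))  # (1296, 1296)
print(count_assemblable(PRECURSORS, dipeptides, 4))       # (16, 1296)
print(count_assemblable(PRECURSORS, dipeptides, 3))       # (0, 216)
```

With single residues every sequence is reachable; with the four hypothetical dipeptides only 16 of the 1296 tetrapeptides, and no sequence of odd length, can be built.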

From the Ennuplet Code to the Triplet Code

The interactions between peptidated-RNAs and pre-mRNAs occurred through the pairing of regions involving more than three bases (ennuplet code) (Fig. 1b, c, d). It seems that coding, and thus catalysis, could have been improved if, at a certain stage of the evolution toward the current genetic code, these interactions became contiguous, i.e., if the aminoacylated or peptidated-RNAs recognized regions of the pre-mRNAs that were contiguous or separated by only a few nucleotides (Di Giulio 2003, 2007; Fig. 1e). Coding that instead involved regions separated by many bases might have been less efficient, simply because the intervening regions, being extraneous to the interactions between aminoacylated or peptidated-RNAs and pre-mRNAs, would have interfered with and/or degraded the quality of coding. Therefore, in the evolution toward the first mRNAs, there would have been a stage in which coding occurred mainly through contiguous ennuplets, and the coded interactions between aminoacylated or peptidated-RNAs and pre-mRNAs were not necessarily numerous, perhaps fewer than a dozen for a single group of interactions, that is, for a single pre-mRNA. All this leads to the synthesis of catalysts, i.e., peptidated-RNAs, whose total number might have been on the order of one hundred or a few hundred. This number might be estimated from the number of reactions catalyzed at this evolutionary stage, which might have been a few hundred or fewer, depending on the complexity of the protocell, which is not easily quantifiable. In the author’s view, however, an order of magnitude of a few hundred does not preclude the evolution from the ennuplet code to the triplet code.

The model therefore predicts that the triplet code, as we know it, evolved from this ennuplet code. This circumstance, complex as it is, seems to me different from the one discussed by Crick (1968), who maintained that an abrupt or nearly abrupt transition, for instance from a quadruplet code to the triplet code, cannot occur because it would destroy the meaning of most of the information coded in quadruplets if that information were instead decoded by triplet reading. By contrast, in the circumstance described above, the transition from an ennuplet code to the triplet code would seem possible, or strongly “eased,” for the following reasons:

(a) As said above, the number of pre-mRNAs might have been on the order of a few hundred, while the number of interactions between aminoacylated or peptidated-RNAs and pre-mRNAs might not have exceeded a dozen for each pre-mRNA. This certainly created a complex situation, but not necessarily one as insurmountable for the evolution toward the triplet code as the one discussed by Crick (1968), because this complexity might have been overcome thanks to the enormous time available, perhaps even billions of years, for the transition from the ennuplet code to the triplet code. Indeed, the fact that the future triplets were already present in the ennuplets (see also below), together with the enormous time available, might, under pertinent selective pressure, have made this certainly difficult transition possible.

(b) Ennuplets were probably composed of repeats of the GNC or GNS type (N denotes any base; S denotes G or C) and, more generally, RNAs were probably rich in GC (Crick et al. 1976; Eigen and Winkler-Oswatitsch 1981; Shepherd 1981; Ikehara 2002; Ikehara et al. 2002; Di Giulio 2008; Higgs and Pudritz 2009; Francis 2013). These repeated sequences might have favored the transition from ennuplets to triplets because they seem to speed up the exchange of meaning among ennuplets. Indeed, by enormously reducing the number of possible combinations, codes of the GNC or GNS type would seem to substantially speed up the transition from the ennuplet code to the triplet code compared with codes of the NNN type. Note also that the RNAs complementary to the GNC code are themselves included in the GNC code. This would seem to imply an evolution always confined to the same pool of RNAs, which would ultimately have favored the fixation of the GNC code (Ikehara 2002; Ikehara et al. 2002; Di Giulio 2008). This is at least partly true of the GNS code as well.

(c) Unlike the difficulty discussed by Crick (1968), it is evident that (1) the ennuplets already contained within them the triplets of the genetic code toward which they were evolving, and (2) there was a coevolution between the regions of peptidated-RNAs that paired with pre-mRNAs and the pre-mRNAs themselves, in the sense that if, for any reason, some peptidated-RNAs began to code for the interaction mainly through one triplet of their ennuplets, this would necessarily have had repercussions on all the other pre-mRNAs. That is, the latter too began to attribute greater meaning to the triplets within their ennuplets, resulting in a coevolution toward triplets. This might have occurred simply to standardize the reading module, which still differed among ennuplets of various sizes (quadruplets, quintuplets, sextuplets, etc.), to the triplet. The triplet might have been selected simply because it made these interactions more efficient, given that the longer ennuplets were more cumbersome and therefore less economical, and also because the triplet was the simplest of the available options: doublets would have coded for too few meanings, quadruplets for too many. Note that in the model suggested here the reading module evolves naturally, because it results simply from the succession of interactions between aminoacylated or peptidated-RNAs and pre-mRNAs. In this sense, the model would also explain the origin of translation. The protoribosome might simply have originated as a ribonucleoprotein complex that favored, in some generic way, the interaction between aminoacylated or peptidated-RNAs and pre-mRNAs.

A similar consideration applies to mRNAs, which seem to evolve spontaneously from pre-mRNAs: within a group of interacting peptidated-RNAs (Fig. 1a), one of them, the pre-mRNA, would evolve to code for all the interactions of that group (Fig. 1b). This is because such a group is more favorable, and therefore selectable, than a group that does not evolve a pre-mRNA, since the latter is less efficient, its coding being less specialized and organized, and therefore has a lower selective value.

(d) It has been argued that RNA editing and RNA modifications might have facilitated the transition from ennuplet codes to the triplet code (Di Giulio 2003, 2007, 2008; Di Giulio et al. 2014). To this I would add that all recombination mechanisms, for example the cutting and ligation of RNAs, whether among pre-mRNAs, among peptidated-RNAs, or between pre-mRNAs and peptidated-RNAs, would have considerably accelerated this transition (Di Giulio 2003). These mechanisms could, for instance, have exchanged a triplet from a peptidated-RNA with an ennuplet from a pre-mRNA, given that sequences of the GNC type remain of the GNC type in their complementary RNA. Several authors think that the first mRNAs coded for only six amino acids (Gly, Ser, Asp, Glu, Ala, and Val), for which the GNC code seems to be sufficient (Ikehara 2002; Di Giulio 2008; Higgs and Pudritz 2009). In particular, the coevolution theory of the origin of the genetic code (Wong 1975; Di Giulio 2008) maintains that these first amino acids (precursors) entered the genetic code very early, and that the other amino acids (products) evolved from them following their biosynthetic relationships, which defined the organization of the genetic code (Wong 1975; Di Giulio 2008). Indeed, the GNC code is able to code for all the precursor amino acids (Ikehara 2002; Di Giulio 2008; Higgs and Pudritz 2009), and the last stage of the model (Fig. 1e) would represent the GNC code itself, or at least a code that contains it.

Predictions, Corroborations, and Falsifications of the Model

The main prediction of the model proposed here is that enzymes made of a mixed polymer of RNA and protein covalently linked should be identifiable. Although something of this kind has been found (Wong 1991; Di Giulio 1997), a clear example of a catalytically active RNA covalently linked to a polypeptide has not yet been found, and this counts against the model. However, as Wong (1991) already discussed, an extensive era of RNA peptidation seems to have preceded the origins of protein synthesis and of the genetic code, which strongly favors the model analyzed here. For instance, the observation that tRNA bases are modified, particularly in the anticodons (Di Giulio 1998; Di Giulio et al. 2014), would seem to indicate that, during one phase of the evolution of the genetic code, at least an aminoacylation of RNAs favored the origin of the genetic code (Di Giulio 1998; see, on this point, Di Giulio et al. 2014). On the other hand, the model is able to explain and predict the key intermediate of protein synthesis, the peptidyl-tRNA, which none of the models proposed to explain the origins of protein synthesis and of the genetic code (for example, Wolf and Koonin 2007; Rodin et al. 2011) can account for. The origin of the peptidyl-tRNA is very difficult to explain because it has no function per se: being an intermediate, it is not directly selectable in a strict Darwinian sense (Wong 1991; Wong and Xue 2002; Di Giulio 1997, 2003, 2007). By contrast, the model suggested here, which assigns the major catalytic role during the origin of the genetic code to peptidated-RNAs, explains the origin of the peptidyl-tRNA elegantly, simply because the peptidyl-tRNA appears to be nothing other than a peptidated-RNA.
In other words, the model suggests that the peptidyl-tRNA derives from the evolution of peptidated-RNAs and, therefore, that originally all catalysis, or at least coded catalysis, was performed by peptidyl-tRNA-like molecules. Thus, although there are no clear cases of catalytically active RNAs linked to proteins, the model elegantly explains the key intermediate of protein synthesis, something the other models are far from achieving. The model also predicts that many proteins covalently linked to tRNAs, or the polypeptidyl-tRNA precursors of an enzyme, should have the same catalytic activity as the free protein, and this can be tested experimentally (see also Wong 1991). Furthermore, it seems to me far from trivial to emphasize that all proteins that are enzymes pass through the peptidyl-tRNA stage, and that enzymes, that is, all present-day catalysts, might therefore recapitulate their evolution from peptidated-RNAs, since according to the model peptidyl-tRNAs are nothing other than homologs of peptidated-RNAs (Fig. 1).

I am convinced that the tmRNA molecule represents a formidable corroboration of the model. tmRNA (transfer-messenger RNA) is a molecule that behaves both as a tRNA and as an mRNA (Muto et al. 1998). At the stage in which tmRNA is linked to a polypeptide, this complex does not seem substantially different from the structures shown in Fig. 1b. In particular, the pre-mRNA molecule might be the homolog of tmRNA; that is, tmRNA might be a molecular fossil of the first mRNAs, because the pre-mRNA has an mRNA function (Fig. 1b) and, according to the model, might also have been peptidated (Fig. 1b), thereby also possessing a tRNA function. This interpretation seems fascinating because, to my knowledge, it is one of the first explanations to identify a molecular fossil of the first mRNAs. These considerations do not seem trivial, because tmRNA is an extremely ancient molecule, present throughout the bacterial domain (Gillet and Felden 2001), and might therefore be a true molecular fossil of some stages of the origins of protein synthesis and of the genetic code (Di Giulio 2003, 2007).

One way of falsifying the model proposed here would be to find strong evidence in favor, for example, of models that do not place catalysis at the core of the evolution of the genetic code (Wolf and Koonin 2007; Rodin et al. 2011) but instead suggest that the genetic code evolved for another reason and was only later used for its current purpose, that is, that it was exapted (Wolf and Koonin 2007; Rodin et al. 2011). Strong data in favor of these latter models would deny that catalysis was the main selective pressure, that is, the raison d’être, of the origin of the genetic code, thus falsifying the model suggested here. From a general point of view, it seems to me that models based on exaptation are not correct, because they attribute the origin of the genetic code, the most important adaptive event in the history of life on this planet, to some other reason, and this seems absurd. On the contrary, the author holds that catalysis was the main selective pressure promoting the evolution of the genetic code. Indeed, the genetic code might be defined as the evolution of coded catalysis, given that it primarily codes for proteins, i.e., catalysts (Di Giulio 2003, 2007; see Francis (2013) for references). As mentioned above, another way to falsify the model is to test whether the peptidyl-tRNAs of many enzymes show the same catalytic activity as the corresponding complete protein. If this catalytic activity could not be found for a large number of different peptidyl-tRNAs, the model would be incorrect (see also Wong 1991).

Conclusion

The model discussed here has some properties that make it particularly attractive. First, it explains the origin of mRNA in a natural way, simply from the interactions among peptidated-RNAs. In doing so, it attributes the first form of coding to the pairing among peptidated-RNAs; that is, it attributes to the hydrogen bonds formed between RNA bases an intrinsic propensity for coding, rejecting a stereochemical origin of the genetic code (Di Giulio 2008). Another intriguing property of the model is that it derives the evolution of mRNA translation, in a completely natural way, from the sequential interactions of peptidated-RNAs with pre-mRNAs. In other words, the model considers the sequential interactions of peptidated-RNAs with pre-mRNAs to be evolutionarily equivalent, and therefore substantially similar, to those between aminoacyl-tRNAs and mRNAs. The model would thus give translation a precise and natural sense, and therefore an evolutionary meaning. None of the models analyzed so far seems to possess these properties, and this model is therefore strongly corroborated.