Massey (2008) claims that error minimization (EM), that is to say, the capacity of the genetic code to buffer the deleterious effects of, for example, translation errors (Woese 1965), is not the result of natural selection but an emergent property of the mechanism that would have structured the genetic code, a mechanism based on duplications of aminoacyl-tRNA synthetases (ARSs) and tRNAs. The main evidence that natural selection did not participate in the origin of EM of the genetic code would seem to be (i) simulations in which certain selection criteria generate significant percentages of genetic codes with EM equal to or better than that of the standard genetic code (Massey 2008, 2016, 2018); and (ii) the observation that the mechanism based on duplications of ARSs and tRNAs is not the best evolutionary strategy for generating highly optimized codes (Massey 2008).

As far as the first point is concerned, Massey (2008, 2016, 2018) makes use of at least two selection criteria. The first is based on a threshold: “when an amino acid enters in the code in evolution, it will enter if its distance from that of reference is less than a certain threshold value” (truncation selection) (Massey 2008, 2018). The second is that “the amino acid that is added to the genetic code in formation is the most similar to the amino acid that has preceded it” (elitist selection) (Massey 2016, 2018). It seems to me that these two criteria cannot corroborate the claim that natural selection did not intervene in the origin of EM of the genetic code; they corroborate only that they are able to produce significant percentages of genetic codes with EM values equivalent to that of the genetic code. In other words, truncation selection and elitist selection are able to produce codes with EM values equivalent to that of the genetic code (Massey 2008, 2016, 2018), but it is not at all clear why this should exclude the intervention of natural selection in the origin of EM of the genetic code in favour of a neutral process. It is useful to specify that, in Massey’s simulations, these selection criteria should always produce, by definition, intermediate evolutionary stages of genetic codes in which similar amino acids have similar codons; that is to say, a form of code similar to our genetic code, in which precisely similar amino acids are coded by similar codons (Epstein 1966; Goldberg and Wittes 1966). Namely, the results of the simulations of Massey (2008, 2016, 2018) would be somewhat tautological, because they are determined by the very selection criteria that he has utilized.
Indeed, when Massey (2016) simulates the evolution of the genetic code by means of the mechanism of the coevolution theory, that is to say, without using his selection criteria, the number of codes produced that have an EM value better than that of the genetic code practically vanishes. This is true for at least three measures of distance between amino acids that are surely related to the origin of the genetic code (Massey 2016; see his Table 2c). This clearly indicates that the selection criteria used by Massey (2008, 2016) heavily conditioned the percentages of codes having EM values equivalent to that of the genetic code. In other words, these simulations would not have the great value that Massey would seem to ascribe to them. In addition, his simulations might paradoxically corroborate a view quite opposite to the hypothesis that he favours. Indeed, these criteria might corroborate the intervention of natural selection, because truncation selection and elitist selection are in any case forms, although very simplified, of natural selection, and they mimic the way it works. Therefore, it is not at all clear why these methods of selection should necessarily exclude the intervention of natural selection in the origin of EM of the genetic code, when such clear selection criteria should instead indicate, quite naturally, its presence! In other words, it is not clear why a neutral mechanism is preferred to the more natural one of natural selection, given that these are obvious selection criteria. For this reason, it seems to me that the conclusion of Massey (2008) that the “EM arose neutrally from the addition of similar amino acids to similar codons” is only a very weak possibility and does not appear at all to be the most likely one.
Indeed, in the simulations employing elitist selection (Massey 2016, 2018), for instance, the amino acid entering the evolving genetic code is the one, selected among those remaining, that is most similar to the reference amino acid: but this process is very similar to that of natural selection! Therefore, why should EM not be the result of natural selection? Only because there would not be an efficient mechanism for structuring the genetic code by means of duplications of ARSs and tRNAs? This question is answered hereafter.
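The two selection criteria discussed above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not a reconstruction of Massey's simulations: the Kyte-Doolittle hydropathy scale is used as a stand-in for the amino acid distance (the simulations use other measures), and the function names, the candidate pool and the threshold value are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in similarity
# scale (an assumption; not the distance measure of the simulations discussed).
HYDROPATHY = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}


def distance(a, b):
    """Absolute difference in the stand-in property between two amino acids."""
    return abs(HYDROPATHY[a] - HYDROPATHY[b])


def truncation_candidates(reference, pool, threshold):
    """Truncation selection: every amino acid in the pool whose distance
    from the reference amino acid is below the threshold may enter."""
    return sorted(aa for aa in pool if distance(aa, reference) < threshold)


def elitist_choice(reference, pool):
    """Elitist selection: only the amino acid most similar to the reference
    enters (ties are broken alphabetically)."""
    return min(sorted(pool), key=lambda aa: distance(aa, reference))


# Hypothetical pool of amino acids not yet in the code, and a reference
# amino acid already in the code.
pool = {"Val", "Leu", "Asp", "Lys", "Ser"}
print(truncation_candidates("Ile", pool, threshold=1.0))  # ['Leu', 'Val']
print(elitist_choice("Ile", pool))                        # Val
```

Both functions filter or rank candidates by similarity to an amino acid already in the code, so any code built with them places similar amino acids next to each other by construction, which is the point made above about the tautological character of the simulations.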

Although the mechanism based on duplications of ARSs and tRNAs has been considered responsible for the current level of optimization of the genetic code (Crick 1968; Higgs 2009; van der Gulik and Hoff 2011), the physicochemical theory of the genetic code does not seem to possess a mechanism that can explain in a clear and immediate way how the physicochemical properties of amino acids were allocated in the genetic code (Di Giulio 2017a, b), namely, how the selective pressures supposed by this theory (Sonneborn 1965; Woese 1965; Fitch and Upper 1987) were realized. Massey (2008) also stressed the same point. Therefore, the absence of a clear and efficient mechanism, usable during the origin of the genetic code, for determining an optimal allocation of the physicochemical properties of amino acids within the genetic code table would have led Massey (2008) to conclude, together with the results of his simulations (see above), that EM of the genetic code originated in the complete absence of positive selection. I think that this conclusion is not only very weak but also mistaken, because it is likely that the genetic code was structured by the mechanism suggested by the coevolution theory (Wong 1975; Di Giulio 2008, 2016a, b, 2017a), which is compatible with the intervention of natural selection. This theory assumes that the mechanism introducing amino acids into the genetic code was based on tRNA-like molecules on which the biosynthetic pathways of amino acids took place (Wong 1975; Di Giulio 2002, 2008, 2017b). Indeed, the mechanism of structuring of the genetic code provided by the coevolution theory would have guaranteed the production of populations of protocells with different genetic codes, upon which natural selection would have acted.
For instance, when a given product amino acid evolved from a precursor amino acid, different codons might have been attributed to the product amino acid through the evolution of different tRNA-like molecules, with different anticodons, on which the product amino acid was synthesized (Wong 1980; Di Giulio 2008, 2017b). This circumstance would have produced different types of genetic code in different protocells. Natural selection would have selected the code with the lowest value of the error minimization function. This process, guided by natural selection, would in the end have determined the EM of the genetic code. Indeed, if the genetic code was structured through the biosynthetic pathways of amino acids (Dillon 1973; Wong 1975; Taylor and Coates 1989; Di Giulio 2002, 2016a; Di Giulio and Medugno 1999; Di Giulio and Amato 2009), which are reflected by the majority of its rows (Taylor and Coates 1989; Di Giulio 2008, 2016a, 2017a, b, 2018), then there would have been a fair degree of freedom to allocate amino acids with similar physicochemical properties on its columns. A product amino acid allocated on a given row of the code could also have been allocated on different columns, because its synthesis on tRNA-like molecules allowed for that, since different types of anticodons were able to evolve (Wong 1980; Di Giulio and Medugno 1999; Di Giulio 2008, 2016a, 2017a, b, 2018). This would have produced different kinds of genetic codes in different protocells. These different allocations of amino acids in different genetic codes would have determined different levels of reduction of, for instance, the translation noise. As a result, this mechanism would have led, by means of positive selection, to a genetic code in which the physicochemical properties of amino acids are well allocated above all on its columns (Nelsestuen 1978; Wolfenden et al. 1979; Sjostrom and Wold 1985; Di Giulio 1989b, 2016a, 2017a, b, 2018), whereas the rows reflect the biosynthetic pathways of amino acids (Taylor and Coates 1989; Di Giulio 2008, 2016a, 2017a, b, 2018). In other words, I think that natural selection intervened in the evolution of the genetic code, as far as the origin of EM is concerned, evidently taking advantage of the mechanism of the coevolution theory (as in part also maintained by Massey 2008, 2016), but that this mechanism produced populations of protocells with different genetic codes on which natural selection acted, under the important selective pressure tending to reduce the translation noise (Woese 1965; Fitch and Upper 1987). Massey (2008, 2016, 2018) has instead supported exactly the contrary (see also below).
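The selection step described above, in which protocells carry alternative codes and the one with the lowest value of the error function is retained, can be sketched as follows. Everything specific here is an assumption made for illustration: the four-codon early code, the choice of Ile as the product amino acid, the row of candidate codons, the use of Kyte-Doolittle hydropathy as the property, and the mean squared difference over single-base neighbours as the error function.

```python
from itertools import combinations

# Stand-in property scale (Kyte-Doolittle hydropathy); an assumption.
HYDROPATHY = {"Val": 4.2, "Ala": 1.8, "Asp": -3.5, "Gly": -0.4, "Ile": 4.5}


def one_base_apart(c1, c2):
    """True if two codons differ at exactly one position."""
    return sum(x != y for x, y in zip(c1, c2)) == 1


def error_cost(code):
    """Mean squared property difference between amino acids whose assigned
    codons are one base apart (a simplified error function)."""
    pairs = [(a, b) for a, b in combinations(sorted(code), 2)
             if one_base_apart(a, b)]
    return sum((HYDROPATHY[code[a]] - HYDROPATHY[code[b]]) ** 2
               for a, b in pairs) / len(pairs)


# Hypothetical early code occupying one row (first base G).
early_code = {"GUU": "Val", "GCU": "Ala", "GAU": "Asp", "GGU": "Gly"}

# The product amino acid is confined to one row (first base A) but is free
# to occupy any column (second base), one alternative per protocell.
candidate_codons = ["AUU", "ACU", "AAU", "AGU"]
protocell_codes = {codon: {**early_code, codon: "Ile"}
                   for codon in candidate_codons}

# Selection retains the protocell whose code has the lowest error cost.
selected = min(protocell_codes,
               key=lambda codon: error_cost(protocell_codes[codon]))
print(selected)  # AUU
```

In this toy case the row is fixed by the biosynthetic mechanism and only the column is chosen by selection, which places the product amino acid in the column already holding the most similar amino acid.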

Evidently, the important point in these arguments is whether the level of optimization of the genetic code is sufficiently high to justify in full the predictions of the physicochemical theory. This has not seemed to me to be the case (Di Giulio 1989a, b, 1996, 1998; Di Giulio and Medugno 1999; Facchiano and Di Giulio 2018). Massey (2008) has also reached the same conclusion, but this cannot support his hypothesis. Indeed, to claim that EM is only a collateral effect (Massey 2008, 2016, 2018), and therefore completely irrelevant for the origin of the genetic code, does not seem justifiable because, even if the selective pressures of the physicochemical theory might not have been easily realized, EM might have emerged as a very important factor in establishing which codons of the precursor amino acids were attributed to the product amino acids (Wong 1980; Di Giulio and Medugno 1999), thus determining the current, but not very high, level of optimization of the genetic code (Wong 1980; Sjostrom and Wold 1985; Di Giulio 1989a, b; Di Giulio and Medugno 1999; Facchiano and Di Giulio 2018). Therefore, all this might evidently be a falsification of the physicochemical theory, understood as predicting a very high level of optimization of the genetic code, because a mechanism that could justify an optimal level of optimization would not exist (Di Giulio 2016a, 2017a, b), as Massey (2008) also maintains. However, EM would not be only a collateral effect; instead, it would have been an important result of natural selection, that is to say, a consequence of which codons of the first amino acids coded in the genetic code were ceded to the amino acids biosynthetically derived from them (Di Giulio 2008), by means of selection for the lowering of the translation noise (Woese 1965; Fitch and Upper 1987).
Therefore, this would provide indirect support for the mechanism of entry of amino acids into the genetic code predicted by the coevolution theory (Wong 1975; Di Giulio 2008, 2016a, b, 2017a, b).

I would like to discuss this set of problems with respect to observations that have emerged in my research on the origin of the genetic code. Recently, we considered a model for the origin of the genetic code that takes into account both the biosynthetic relationships between amino acids and their physicochemical properties (Facchiano and Di Giulio 2018), that is to say, a model in which the amino acid permutation codes (Di Giulio 1989a, b) are subjected to biosynthetic constraints (Facchiano and Di Giulio 2018). This hugely reduces the number of codes that must be analysed in order to understand how far the optimization of the genetic code has gone, and these sets can be investigated exhaustively (Facchiano and Di Giulio 2018). If the Massey hypothesis were true, namely, that EM derives from a neutral process and not from natural selection, then under his model we would expect the EM of the genetic code, within the sets of codes subjected to biosynthetic constraints, to be limited; instead, this has not been observed (Facchiano and Di Giulio 2018). Indeed, a limited EM in these sets of codes subjected to biosynthetic constraints would be more easily explained by the Massey hypothesis, because it would indicate that the physicochemical properties of amino acids had a marginal role in these sets of codes, through which the origin of the genetic code would seem to have passed. On the contrary, the EM of the genetic code always falls at a point of the distribution of these sets implying probability values lower than about 10⁻³, with the sole exception of a probability value of about 1% (Facchiano and Di Giulio 2018). If the Massey hypothesis were true, we would have no reason to observe probabilities such as those observed, which are instead of the order of 10⁻⁶ (Facchiano and Di Giulio 2018).
These low probability values would seem more easily explained by the presence of natural selection than by a neutral process. In another recent analysis, the physicochemical theory was compared with the coevolution theory by examining how the physicochemical properties of amino acids and their biosynthetic relationships are distributed within the genetic code table (Di Giulio 2018). This analysis associated with each of the two theories a probability value expressing its level of significance (Di Giulio 2018). If the Massey hypothesis were true, i.e. if EM originated as a collateral effect and not by means of natural selection, then we should have observed that the probability associated with the coevolution theory was much smaller than that associated with the physicochemical theory, but this was not the case (Di Giulio 2018): one probability value is only about double the other (Di Giulio 2018). This would seem to imply that EM was not generated by a neutral process, because otherwise the probability value associated with the physicochemical properties of amino acids should have been higher than that associated with their biosynthetic relationships, these pathways having presumably been the means through which the genetic code was structured. These pathways should have shown a very much more significant probability, that is to say, a much lower one, but this has not been observed, or it has been observed but not in a striking way (Di Giulio 2018). Namely, the physicochemical properties of amino acids, having been allocated through a neutral process according to the Massey hypothesis, should have shown a lower level of significance than the biosynthetic relationships of amino acids, which instead would have structured the genetic code. This has not been the case (Di Giulio 2018).
Therefore, a force stronger than a simple neutral process would have intervened in allocating the physicochemical properties of amino acids in the genetic code, i.e. natural selection would presumably have intervened. That is to say, this observation would be better explained by natural selection than by a neutral process. Finally, the simplest and most reliable observation regarding the organization of the genetic code is that the biosynthetic relationships of amino acids are organized above all on its rows (Taylor and Coates 1989; Di Giulio 2008, 2016a, 2017a, b, 2018), whereas its columns host more significantly the physicochemical properties of amino acids (Nelsestuen 1978; Wolfenden et al. 1979; Sjostrom and Wold 1985; Di Giulio 1989b, 2016a, 2017a, b, 2018). Or better, when the distribution of the physicochemical properties of amino acids on the rows or columns of the genetic code is statistically significant, then, in seven cases out of eight, that of their biosynthetic relationships is not (Di Giulio 2016b). Namely, when the distribution of the physicochemical properties of amino acids is significant on the rows or columns of the code, that of their biosynthetic relationships is not significant, and vice versa (Di Giulio 2008, 2016a, 2017a, b, 2018). I believe that this observation strongly shows the intervention of natural selection in the structuring of the genetic code, in the following way. The statistical significance of the biosynthetic relationships of amino acids on the rows of the genetic code can be explained by the argument that the code originated, under the general selective pressure for the improvement of enzymatic catalysis (Di Giulio 2015), through the mechanism suggested by the coevolution theory (Wong 1975; Di Giulio 2008, 2016a, b, 2017a), acting as a force related to the biosynthetic relationships of amino acids.
This did not allow the physicochemical properties of amino acids to be allocated on the rows in such a way that their distribution would be statistically significant, because the force linked to the biosynthetic relationships of amino acids was already in action there. On the contrary, the neutralist model of Massey would interpret this observation by saying that, while the rows of the genetic code were being filled through the mechanism based on the biosynthetic relationships between amino acids, the physicochemical properties of amino acids were not well organized on its rows simply because the EM of the rows would be the result of a neutral process not linked in reality to the physicochemical properties of amino acids; it would be only a collateral effect of the mechanism of the coevolution theory, through which the genetic code was originating. This explanation is misguided, but it is made plausible by the fact that the distribution of the physicochemical properties of amino acids is not statistically significant on the rows of the genetic code (Di Giulio 2016b, 2017b, 2018). However, it appears much less credible than the explanation based on natural selection. On the columns of the genetic code, the biosynthetic relationships between amino acids are instead absent [with the exception of the fourth column (Di Giulio 2016b, 2017b, 2018)], and thus the force due to the mechanism linked to the biosynthetic pathways of amino acids would not be present there; the absence of this force would have allowed natural selection to operate freely, thereby bringing the distribution of the physicochemical properties of amino acids on the columns of the genetic code to high statistical significance.
If, instead, the neutral mechanism suggested by Massey had really been operative, it would have produced the very opposite result, that is to say, it would have acted on the columns as it did on the rows of the genetic code: the distribution of the physicochemical properties of amino acids would have been completely non-significant on the columns too! This is not what happened, because the absence of the force related to the biosynthetic pathways of amino acids instead allowed natural selection to operate, and therefore to display itself, in allocating similar amino acids on the columns of the genetic code. Here, I see clearly how the absence of the biosynthetic relationships of amino acids on the columns of the genetic code has made it possible to highlight another force in action, natural selection, which was able to allocate the physicochemical properties of amino acids in a highly statistically significant way on the columns of the genetic code. Otherwise, we would not understand why on the rows, where the biosynthetic relationships between amino acids, and therefore the other force, are instead present, the physicochemical properties of amino acids have been so clumsily allocated. That is to say, it is the presence or absence of the biosynthetic relationships of amino acids that determines the absence or presence of the display of natural selection; this could not have been determined by a neutral process, because such a process, had it really been operative, would not have distinguished between the two circumstances. Moreover, if the neutral process of Massey had really been operative, it would have displayed itself precisely on the columns of the genetic code, because there the other forces are completely absent and nothing would have hindered this neutral process from acting and therefore from revealing itself.
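The kind of comparison invoked above, in which the distribution of a physicochemical property on the rows or columns of the genetic code is given a probability value, can be sketched with a simple Monte Carlo permutation test. This is only an illustration under stated assumptions and does not reproduce the statistics of the analyses cited: Kyte-Doolittle hydropathy stands in for the property, rows are taken as codons sharing the first base and columns as codons sharing the second base, the statistic is the mean within-group variance, and the random codes are amino acid permutation codes sampled without biosynthetic constraints.

```python
import random
from statistics import pvariance

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO))

# Stand-in property scale (Kyte-Doolittle hydropathy); an assumption.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def spread(code, position):
    """Mean within-group variance of the property, grouping sense codons by
    the base at `position` (0 = rows, 1 = columns in the layout assumed)."""
    groups = {}
    for codon, aa in code.items():
        if aa != "*":
            groups.setdefault(codon[position], []).append(HYDROPATHY[aa])
    return sum(pvariance(values) for values in groups.values()) / len(groups)


def permuted_code(rng):
    """Random amino acid permutation code: the 20 amino acids are shuffled
    among the synonymous codon blocks; stop codons stay in place."""
    amino_acids = sorted(HYDROPATHY)
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    swap = dict(zip(amino_acids, shuffled))
    return {codon: aa if aa == "*" else swap[aa]
            for codon, aa in STANDARD.items()}


def permutation_probability(position, samples=2000, seed=0):
    """Fraction of sampled permutation codes whose spread is no larger than
    that of the standard code (a Monte Carlo estimate)."""
    rng = random.Random(seed)
    observed = spread(STANDARD, position)
    hits = sum(spread(permuted_code(rng), position) <= observed
               for _ in range(samples))
    return hits / samples


print("rows (first base):", permutation_probability(0))
print("columns (second base):", permutation_probability(1))
```

A small estimated probability for one grouping and not the other would correspond to the row/column asymmetry discussed above; the values obtained depend on the property scale, the statistic and the sample size chosen here, and should not be read as the published probabilities.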

In summary, there is no direct observation that actually leads to the conclusion that the EM of the genetic code was generated by a neutral process. Indeed, (i) the results of the simulations of Massey (2008, 2016, 2018) can also be explained through natural selection, and are in fact explained more naturally in this way (see above); (ii) the mechanism of evolution of the genetic code suggested by the coevolution theory is compatible with the intervention of natural selection in the origin of EM of the genetic code (see above); and (iii) if the genetic code was structured by means of duplications of ARSs and tRNAs, that is to say, following the postulates of the physicochemical theory, then there would be arguments, like those of Higgs (2009), supporting the view that EM was generated by natural selection. In other words, there is no simple and natural observation that makes the neutralist hypothesis formulated by Massey for the origin of EM even a plausible hypothesis. That is to say, there would exist only indirect observations compatible with this hypothesis and, above all, not a single reasonable observation that directly supports it. Under these circumstances, it seems to me that the Massey hypothesis is probably not true.