Introduction

The question of how the building blocks of today’s nucleic acids (NAs) emerged on the early Earth, billions of years ago, remains open and intriguing for prebiotic chemistry (Castanedo 2024). A widely accepted theory for the origins of life holds that RNA was the first biopolymer to emerge (Cech 2012; Gesteland et al. 1999). This theory, known as the “RNA world” hypothesis, proposes that RNA came first because it has catalytic activity, as in ribosomal RNA (rRNA), and can also transmit genetic information, as in messenger RNA (mRNA) (Ayukawa et al. 2019; Neveu et al. 2013; Orgel 2004).

Initial attempts to synthesize nucleosides by Orgel and coworkers (Fuller et al. 1972) showed that the formation of N-glycosidic bonds between the canonical nitrogenous bases, generically named recognition units or RUs (Hud et al. 2013), and ribose, also known as the trifunctional connector or TC (Hud et al. 2013), through a condensation reaction under dehydrated conditions is thermodynamically unfavored. Orgel and coworkers attempted to glycosylate ribose with adenine, guanine, inosine, and xanthine, heating the reaction at 100 °C with and without catalysts, and obtained only a mixture of β- and α-ribofuranosyl adenosine in ~ 2–10% yield.

Similar unsuccessful attempts to glycosylate uracil and cytosine with ribose were also reported (Fuller 1972). This inability of canonical bases to form nucleosides with ribose in aqueous solution has been referred to as “the nucleoside problem,” a special case of the “water problem” (Hud et al. 2013; A. do Nascimento Vieira et al. 2020; Kim et al. 2016; Joyce and Orgel 1999). Alternative models have been suggested to explain how the building blocks of today’s nucleic acids were selected by Nature and assembled. Among these models are the “ribose centric model” (Hud et al. 2013) and the “polymer fusion model” (Hud and Anet 2000).

Nevertheless, it is also possible that the building blocks of the first nucleic acids contained TCs, RUs, and ionized linkers (ILs) different from today’s ribose (TC), the five canonical nucleobases (RUs), and the phosphate group (IL), respectively. This hypothesis proposes that contemporary nucleic acids emerged as a product of evolutionary selection from proto- and pre-RNAs in which non-canonical (alternative) TCs and RUs glycosylated and assembled more easily under prebiotic conditions (Hud et al. 2013). These components may then have evolved to become today’s β-ribofuranose, adenine (A), guanine (G), cytosine (C), thymine (T), uracil (U), and phosphate (Engelhart and Hud 2010).

In the search for alternative prebiotic RUs, Kolb et al. (Kolb et al. 1994) reported the prebiotic synthesis of nucleosides containing ribose and the non-canonical base urazole (1,2,4-triazolidine-3,5-dione). They obtained a mixture of α- and β-configurations of the urazole nucleosides with the ribose in the furanose (F) and pyranose (P) forms. In that mixture the β-ribopyranose form was predominant, with a 53% yield. Following this idea, Cafferty and coworkers (Cafferty et al. 2018) screened a library of 91 RUs, which included the five canonical bases (A, G, C, T, U), against the following criteria:

  • can glycosylate with ribose in the presence of water,

  • able to create complementary base pairing with at least two hydrogen bonds,

  • capable of π-stacking in aqueous environment,

  • chemically and photo-stable,

  • with conceivable synthesis in prebiotic conditions,

  • and “good chromophores” which makes them suitable to absorb UV radiation believed to have flooded prebiotic Earth (Douglas 2017).

Three non-canonical nucleobases fulfill all these conditions simultaneously: the triazine melamine (MM) and the pyrimidines 2,4,6-triaminopyrimidine (TAP) and barbituric acid (BA). In addition to these three bases, the triazine cyanuric acid (CA) can create hexads in aqueous solution. Hexads are defined as linear π-stacked assemblies in the form of a six-sided polygon (Cafferty et al. 2016, 2013; Cafferty and Hud 2014, 2015; Fialho 2019; Li et al. 2016). Furthermore, numerous studies have shown the ability of MM, TAP, and BA to overcome the “glycoside problem” by creating N- and C-glycosides with ribose. As an example, heating TAP with either ribofuranose or ribopyranose at 35 °C for 24 h gives a mixture of β-N-, α-N-, β-C-, and α-C-nucleosides with a yield of ~ 33–55%. The β-C-ribofuranose nucleoside was predominant, with a 20% yield (Chen et al. 2014).

Another fundamental question is whether ribose was present in the first proto-nucleic acids. The most accepted synthetic route to ribose under prebiotic conditions is the formose reaction. This reaction was first proposed by Butlerow (Butlerow 1861) and involves the polymerization of formaldehyde (H2CO) to obtain sugars. However, this method faces a challenge termed “the asphalt problem,” as simple sugars can also polymerize to create asphalts through a series of competitive enolization/aldol additions between the carbonyl groups. As a result of the asphalt problem, ribose is obtained only as a minor product with a yield of ~ 2%. Additionally, the 5-member ring (5-MR) ribofuranose is the least abundant product of the mutarotation of ribose in water, with a yield of ~ 13% for the β- and ~ 7% for the α-anomer (Sutherland 2010; Drew et al. 1998).

Given the challenges associated with considering ribose as the TC present in the first nucleic acids, alternative candidates for the first TC are sought. Nucleic acids with a TC different from ribofuranose are named “xeno-nucleic acids” (XNA) and they, incidentally, have a wide range of applications as biomarkers for cancer, HIV, and hepatitis (Sefah et al. 2014).

Among several classes of XNA analogs, N-(2-aminoethyl)-glycine (AEG) peptide nucleic acids (aegPNAs) have become attractive as a plausible candidate for a proto-nucleic acid. In aegPNAs the sugar-phosphate is replaced by a polypeptide backbone containing repetitive units of AEG bound through a carbonyl methylene group to the nucleobases (Frenkel-Pinter et al. 2020; Wu et al. 2017; Lee et al. 2013; Mateo-Martí and Pradie 2010; Nielsen 2007, 1996; Menchise et al. 2003). Some of the reasons for their appeal are enumerated below:

  1. The sugar-phosphate backbone of DNA/RNA is replaced in aegPNAs by a peptide bond which is more resistant to hydrolysis in aqueous solution than the phosphodiester bond (Frenkel-Pinter et al. 2020).

  2. AEG may have been present in the early Earth environment and could conceivably have been incorporated into proto-nucleic acids and later replaced by 2'-deoxyribose or ribose through natural selection. This proposition is bolstered by the traces of AEG found in certain present-day strains of cyanobacteria (Josa et al. 2013). These bacteria are believed to have emerged around 3.5 Gyr ago and are known to be resilient in extreme environments such as geothermal vents, which are themselves considered a plausible environment for the emergence of early forms of life.

  3. The components of the aegPNA building blocks have been synthesized prebiotically. These components include AEG as TC and the N-acetic acid derivatives of adenine, guanine, cytosine, and uracil as RUs (Nelson et al. 2000). The acetic acid derivatization of the nucleobases is necessary in order to form the amide linkage between the AEG and the bases that is part of the polyamide backbone. For example, the famed Miller-Urey experiments (McCollom 2013; Miller 1955, 1953) produced, in vitro, precursors of amino acids and glycine in a simulated chemical composition of the early Earth that included electric discharge to simulate lightning. In a later study, Nelson and coworkers (Nelson et al. 2000) studied the prebiotic synthesis of AEG by i) electric discharge experiments on a mixture of CH4(g), N2(g), NH3(g), and H2O (10⁻⁵–10⁻⁶% yield of AEG), ii) polymerization of NH4CN (10⁻⁵% yield of AEG), and iii) polymerization of NH4CN catalyzed by H2CO (10⁻⁵% yield of AEG). Despite the low yields of AEG from these three methods, Nelson and coworkers hypothesize that the production of this amino acid could have been possible on the early Earth because of the high solubility of its monohydrochloride, and because AEG can also be obtained through the Strecker synthesis (Strecker 1854) from the reaction of ethylenediamine with HCN and H2CO. Ethylenediamine can be obtained with a yield of 0.05% through the reaction of NH4CN and H2CO in the presence of UV radiation (Nelson et al. 2000). Nelson et al. estimate that low concentrations of the Strecker mixture (~ 10⁻⁶ M) can produce up to 33% of AEG. Adenine-N9-acetyl (acetyl = Ac) and guanine-N9-Ac were also obtained from the polymerization of HCN in the presence of glycine in 0.0062% and 0.011% yields, respectively. Meanwhile, cytosine-N1-Ac and uracil-N1-Ac were produced from the reaction of hydantoic acid with cyanoacetaldehyde at 100 °C in 18% and 1.8% yields, respectively.

  4. Nelson and coworkers (2000) also found that AEG can polymerize under dry conditions at 100 °C better than α-amino acids at higher temperatures.

  5. The carbon atoms of the aegPNA backbone lack chirality. This property may have given aegPNAs an evolutionary advantage on the early Earth since, in contrast with today’s nucleic acids, they are not subject to enantiomeric cross-inhibition (ECI). ECI refers to the exclusive relationship between DNA/RNA biochemical functions and the D-enantiomerism of the ribofuranose or 2’-deoxyribofuranose. For example, studies by Joyce and Orgel (1999) show that the template-directed polymerization of DNA containing D-ribose can be inhibited when D-ribose-phosphate is replaced by the L-enantiomer in the building blocks of the D-homochiral template (Hehre et al. 1986; Rao et al. 2009).

Another important question that requires further explanation regards the adoption by the majority of contemporary DNA and RNA of a β- instead of an α-configuration at the anomeric C1’ position of the sugar D-ribofuranose (Fig. 1). But why not α?

Fig. 1

Orientation of the nucleobase at the C1’ of the furanose with respect to the hydroxymethyl group at the C4’ for the β- (left) and α-anomers (right) of canonical ribonucleosides. RU: nucleobase or recognition unit. R: can be either H in 2’-deoxynucleosides (in DNA) or OH in ribonucleosides (in RNA) (taken from (Ni et al. 2019) and reprinted with permission of RSC Advances. © Royal Society of Chemistry, 1999)

Numerous experimental reports have suggested that α-strands of DNA can create Watson and Crick (WC) complementary α-α and α-β double stranded polynucleotides (Froeyen et al. 2001; Guesnet et al. 1990; Paoletti et al. 1989; Lancelot et al. 1989, 1987).

It is also well-known that 5-MR sugars (pentoses), e.g., ribose, adopt two preferential 3T2 (2’-exo-3’-endo) and 2T3 (2’-endo-3’-exo) conformations in equilibrium for their respective β- and α-anomers (Fig. 2). Thibaudeau and coworkers propose that the pseudorotational equilibrium between these two conformations in ribofuranose is an important factor to take into account when comparing the stability of the β- and α-anomers of the building blocks of nucleic acids (see Chapter 2 of Thibaudeau et al. (2005) and (Thibaudeau et al. 1997, 1994)).

Fig. 2

Equilibrium between the 3T2 (2’-exo-3’-endo) and 2T3 (2’-endo-3’-exo) conformations for a nucleoside with a 5-MR sugar (see pp. 22–46 in (Thibaudeau et al. 2005) and (Thibaudeau et al. 1997, 1994)). ∆X: change in enthalpy (∆H°) or Gibbs free energy (∆G°). Groups in red lie above, and groups in blue below, the mean plane of the sugar. Modified from (Thibaudeau et al. 1997) and reprinted with permission of Elsevier © Elsevier, 1997

Castanedo and Matta (2022a, b) explore the role of thermodynamics as a driver of evolutionary selection for the chemical structure of today’s nucleic acids, in implicit solvation models and in vacuum/gas phase, in a study titled “[o]n the prebiotic selection of nucleotide anomers: a computational study”. They estimate the energetic differences between β- and α-anomers of the sugars, their monophosphates, and the derived nucleosides and nucleotides, as well as the thermodynamic feasibility of the synthesis of the five canonical nucleotides through a classic and an alternative pathway in which the order of the reactants is changed. Changing the order of addition of the components of the building blocks locks the intermediates in different local minima on the potential energy hypersurface and hence results in energetic differences between the products if those have different conformations.

Castanedo and Matta report a marginal energetic preference of ~ 8 kJ/mol (or less) for the β- over the α-anomers for some nucleosides containing adenine and guanine in vacuum (Castanedo and Matta 2022a, b). The classic pathway, in which two condensation reactions give the nucleotides as product, was found to be the preferred route in vacuum. In aqueous medium, however, neither pathway is energetically favored (both are non-spontaneous). These quantum chemical results are consistent with the “water problem” prohibiting the formation of nucleotides in aqueous media from their corresponding components.

The present paper explores the thermodynamic plausibility of the synthesis of potential prebiotic nucleosides from the canonical D-ribofuranose and non-canonical AEG and the five canonical bases A, G, C, T, U, the four non-canonical bases TAP, BA, MM, and CA, and their corresponding acetyl derivatives. The influence of the two preferential sugar ring conformations 2T3 and 3T2 for the D-ribofuranose in the corresponding α- and β-anomers is also examined.

Furthermore, we wish to revisit whether the synthesis of canonical and/or non-canonical nucleosides containing ribofuranose and AEG is thermodynamically favored following the “classic model.”

Finally, a hypothetical pathway to synthesize β-uridine 5’-monophosphate (β-UMP) as a free nucleotide, together with an AEG dipeptide, in aqueous medium is proposed by building β-UMP on an AEG-Ac-C1-C template. Part of this work has recently been published in the PhD thesis of one of the authors [L.A.M.C.] (Castanedo 2024).

Computational Methods

The chemical structures of all the canonical components (TCs and RUs) from RNA were modified from the nucleotides contained in the structure of a B-DNA dodecamer (Protein Data Bank Identifier [PDB ID] # 1BNA) (Drew et al. 1981), while the structure with PDB ID 1PNN (Betts et al. 1995) was used to create the initial 3D geometries of the components of aegPNA nucleosides. The structures were built with the graphical interfaces UCSF Chimera (Pettersen et al. 2004), Hyperchem 7.0 (HyperChem release 7.0 2002), and GaussView 5.0 (Dennington et al. 2009).

All possible combinations between each TC and RU represented in Fig. 3 were explored. The enol form of barbituric acid (BA), which prevails in aqueous solution (Lubczak and Mendyk 2008), has been used in all calculations. For nucleosides containing D-ribofuranose (Ribf), each RU has been placed in the β- and α-configurations at the anomeric position C1’. The two preferential sugar ring conformations of Ribf, 3T2 (2’-exo-3’-endo) and 2T3 (2’-endo-3’-exo), have also been considered.

Fig. 3

Canonical and non-canonical trifunctional connectors (TCs) (top panel) and recognition units (RUs) (bottom panel) considered in the modeling of the nucleosides. (Top panel): Ribf = D-ribofuranose (β-anomer present in RNA) and AEG = N-(2-aminoethyl)-glycine (present in peptide nucleic acids (PNAs)). (Bottom panel): A = adenine, G = guanine, C = cytosine, T = thymine, U = uracil, TAP = 2,4,6-triaminopyrimidine, BA = barbituric acid in enol form, MM = melamine, CA = cyanuric acid. The R group is H for the RUs in non-PNA nucleosides and acetyl (–CH2–COOH) for acetyl derivatives of RUs in the PNA nucleosides. The blue C and green N represent reactive centers for the condensation reactions. Notice that TAP and BA can create either N- or C5-nucleosides. Vertical lines with a dotted shadow indicate bonds that are broken during the condensation reactions and curved arrows denote rotatable bonds. In the case of Ribf-nucleosides, the leaving group is R = H while for AEG-nucleosides it is the hydroxyl (–OH) of the R = Ac

In total, 110 nucleosides have been constructed. This number comprises 88 nucleosides with Ribf as TC {(11 RUs) × 2 environments (vacuum and implicit solvation) × 2 sugar ring conformations (3T2, 2T3) × 2 anomeric forms (β- and α-anomers)} and 22 nucleosides with AEG as TC {(11 RUs) × 2 environments}.
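The tally above can be checked with a few lines of Python. This is only an illustrative count under the scheme stated in the text; the variable names are ours and are not part of the original workflow.

```python
# Tally of the nucleoside models, following the counting scheme in the text.
n_rus = 11           # 5 canonical + 6 non-canonical RUs (TAP and BA counted as C5- and N-linked)
n_environments = 2   # vacuum and implicit solvation
n_puckers = 2        # 3T2 and 2T3 sugar ring conformations
n_anomers = 2        # beta and alpha

n_ribf = n_rus * n_environments * n_puckers * n_anomers  # Ribf nucleosides
n_aeg = n_rus * n_environments                           # AEG nucleosides
n_total = n_ribf + n_aeg
print(n_ribf, n_aeg, n_total)
```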

Each of the TCs (Ribf being considered in the β- and α-configurations and with two sugar ring conformations) and RUs with rotatable bonds (curved arrows in Fig. 3) was subjected to a scan of the potential energy surface (PES) by reading its Z-matrix into the programme GRANADAROT (Montero 2019; Montero et al. 2000, 1998), which was used to generate, for each structure, 1,000 different conformers by randomly changing the dihedral angle around each rotatable bond. Each of these 11,000 (= 6,000 for RUs + 5,000 for TCs) conformers was optimized at the semiempirical PM7 (Parametric Method 7) level (Stewart 2023) through a gradient minimization of the energy until convergence (energy gradient threshold = 0.01 kcal·mol⁻¹·Å⁻¹) using MOPAC2016 (Stewart 2023). PM7 was chosen because it includes empirical corrections for dispersion and hydrogen-bonding interactions.

Calculations were performed both in vacuum and in implicit continuum solvent modeled using the conductor-like screening model (COSMO) (Klamt 2011; Klamt and Schüürmann 1993). For ribofuranose, the coordinates of the carbon atoms in the furanose ring were kept frozen during the PM7 calculations.

Some of the 1,000 optimized structures at PM7 converged without changing the conformation (only the values of the distances and angles were altered). For each set of final n unique optimized geometries, the final n’ conformers that collectively contribute at least 50% to the partition function (Z) were kept for refinement at a more accurate quantum mechanical level of theory. The rest of the conformers with minor contributions were discarded.
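The selection of the n’ conformers can be illustrated with a minimal Python sketch. It assumes, as a simplification not stated in the text, that Z is built from Boltzmann factors of the relative PM7 energies at 298.15 K with unit degeneracy; the function name, the energies, and the temperature are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def select_conformers(energies_kcal, temperature=298.15, threshold=0.5):
    """Return indices of the lowest-energy conformers whose Boltzmann
    weights add up to at least `threshold` of the partition function."""
    e_min = min(energies_kcal)
    # Boltzmann factors relative to the global minimum (avoids overflow)
    factors = [math.exp(-(e - e_min) / (R_KCAL * temperature))
               for e in energies_kcal]
    z = sum(factors)
    # Visit conformers from lowest to highest energy
    order = sorted(range(len(energies_kcal)), key=lambda i: energies_kcal[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += factors[i] / z
        if cumulative >= threshold:
            break
    return kept, cumulative

# Hypothetical relative energies (kcal/mol) of five unique conformers
energies = [1.2, 0.0, 2.5, 0.1, 0.3]
kept, coverage = select_conformers(energies)
print(kept, round(coverage, 3))
```

Conformers outside `kept` correspond to those discarded for their minor contribution to Z.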

For each TC, RU, and acetylated RU with rotatable bonds in vacuum and in implicit aqueous medium the numbers n and n’ and their contribution to Z are summarized in Tables S1-S2 (in the Supplementary Information (SI)).

The n’ conformers were then subjected to a fully relaxed geometry optimization using the B3LYP DFT functional (Orio et al. 2009; Hertwig and Koch 1997; Becke 1988; Lee et al. 1988) with the 6–311++G(d, p) basis set (Hehre et al. 1986) as implemented in Gaussian 16 (Frisch et al. 2019). The B3LYP functional was chosen because it has been widely used for the computational modeling of building blocks of nucleic acids (Kaur et al. 2019, 2017; Šponer et al. 2011). The error bars for a similar level of theory, namely DFT-B3LYP/6–31+G(d, p), have been benchmarked by Zhao and Truhlar (ZT) to be around 15 kJ/mol (Zhao and Truhlar 2008a). ZT obtained this estimate by comparing calculated and experimental thermodynamic data for 177 compounds (Zhao and Truhlar 2008a). Another study by ZT (Zhao and Truhlar 2008b) reported a mean unsigned error of 13.8 kJ/mol in the estimation of B3LYP/6–31+G(d, p) interaction energies for 22 hydrogen-bonded complexes. Meanwhile, Rao and coworkers (Rao et al. 2009) evaluated 11 DFT functionals for their accuracy in estimating hydrogen bonding energies and the relative energies of a conformational scan for 14 systems of biological interest that include glycine, proline, and serine. In that study, the most accurate level of theory for predicting conformational energies is DFT-B3LYP/6–31++G(2d, 2p), with a mean absolute deviation of ~ 6 kJ/mol. Castanedo and Matta (2022a, b) estimate the uncertainty of the B3LYP/6-31G(d, p) level of theory to be within 13–17 kJ/mol, which is probably similar to that of the level of theory used in this work.

From the final DFT-refined n’ conformers for each component the corresponding nucleoside has been designed by using an in-house bash-python script that manipulates the Z-matrices and internal coordinates to create a customized glycosidic bond between each TC and RU.

For consistency, the length of the C- and N-glycosidic bonds between each RU and TC has been initially set to 1.52 Å, while the dihedral angle that contains the glycosidic bond has been initially set to 180°.

Each of the 506 nucleoside structures was then subjected to a fully relaxed scan from 0 to 360° in 6 steps of 60° of the dihedral angle that involves the glycosidic bond. The lowest energy structure from each scan has been refined by subjecting it to a fully unconstrained optimization at the DFT-B3LYP/6–311++G(d, p) level of theory to obtain the final structures of the nucleosides in vacuum and in implicit solvation. A harmonic frequency calculation has been performed to confirm the absence of imaginary frequencies, which indicates that the final structures are minima on the potential energy surface (PES).
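The scan grid and the choice of the structure passed on to the final optimization can be sketched as follows. This is a schematic illustration only: the helper names are ours, the energies are placeholders rather than results from this work, and the actual scans were run inside the quantum chemistry package.

```python
def scan_grid(start=0.0, stop=360.0, step=60.0):
    """Dihedral values (degrees) visited by a scan from `start` to `stop`."""
    n_steps = int(round((stop - start) / step))
    return [start + i * step for i in range(n_steps + 1)]

def lowest_energy_point(angles, energies):
    """Return the (angle, energy) pair with the lowest energy."""
    i = min(range(len(energies)), key=lambda k: energies[k])
    return angles[i], energies[i]

# Six steps of 60 degrees give seven points; 0 and 360 describe the same geometry
angles = scan_grid()
# Placeholder relative energies in kJ/mol, NOT results from this work
energies = [4.1, 0.0, 7.3, 12.5, 3.2, 9.8, 4.1]
best_angle, best_energy = lowest_energy_point(angles, energies)
print(best_angle, best_energy)
```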

The thermodynamic feasibility for the synthesis of the canonical and non-canonical nucleosides following the “classic model” has been analyzed by estimating the ∆G° for the condensation reaction as:

$$ {\text{TC}} - {\text{OH}}\, + \,{\text{H}}{-}{\text{RU}}\, \to \,{\text{nucleoside}}\, + \,{\text{H}}_{2} {\text{O}}$$

for non-aegPNA nucleosides, and

$$ {\text{AEG}} - {\text{H}}\, + \,{\text{HOOC}} - {\text{CH}}_{2} - {\text{RU}}\, \to \,{\text{nucleoside}}\, + \,{\text{H}}_{2} {\text{O}}$$

for the aegPNA nucleosides.
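For either condensation, ∆G° is the sum of the Gibbs energies of the products minus that of the reactants. A minimal Python sketch of this bookkeeping is given below, assuming the Gibbs energies are available in hartree (for example, read from the output of the frequency calculations); the function name and all numerical values are placeholders and not data from this work.

```python
HARTREE_TO_KJ_MOL = 2625.4996  # 1 hartree expressed in kJ/mol

def reaction_gibbs(products, reactants):
    """Delta G = sum G(products) - sum G(reactants).
    Inputs in hartree, result in kJ/mol."""
    return (sum(products) - sum(reactants)) * HARTREE_TO_KJ_MOL

# Placeholder Gibbs energies in hartree (NOT values from this work)
g_tc, g_ru = -572.100, -414.800
g_nucleoside, g_water = -910.495, -76.400

dg = reaction_gibbs([g_nucleoside, g_water], [g_tc, g_ru])
print(f"{dg:.1f} kJ/mol")
```

With these placeholder numbers the result is positive, which would correspond to a thermodynamically unfavored condensation.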

Aqueous solvation has been accounted for in the DFT calculations using the Integral Equation Formalism variant of the “Polarizable Continuum Model” (IEFPCM) (Tomasi et al. 2010; Tomasi 2011; Barone et al. 1997; Cossi et al. 1996; Miertuš and Tomasi 1982; Miertuš et al. 1981) implemented in Gaussian 16 (Frisch et al. 2019).

If we accept as true the “water problem” (Joyce and Orgel 1999) this implies that the complex prebiotic chemistry that may have produced the first building blocks of ancestral NAs took place in a non-aqueous environment or at least that the reactants had controlled exposure to water (see for example Ref. (A. do Nascimento Vieira et al. 2020) and literature cited therein). Thus, the effect of aqueous solvation has been included in this study for a complete evaluation of this “problem” as well.

Both 2,4,6-triaminopyrimidine (TAP) and barbituric acid (BA) can create either N- or C-glycosidic bonds (Fialho 2019; Fialho et al. 2020); hence, in the next sections TAP-C5 or BA-C5 and TAP-N or BA-N refer to the C5- and N-glycosides, respectively.

For the modeling of the new prebiotic pathway to obtain dipeptides and free nucleotides, the base uracil was selected for the nucleosides because, owing to the “water problem,” it cannot be glycosylated with ribose in water.

The n’ conformers of β-Ribf with the 2T3 sugar ring conformation, AEG, uracil (U), and cytosine (C)-N1-Ac were refined at the M05-2X/6–311++G(d, p) level of theory.

The meta-generalized hybrid functional M05-2X has been selected since it has been widely used in the modeling of non-covalent interactions (hydrogen bonds and dispersive interactions) in molecular systems of biological interest predicting results in terms of energies and geometries comparable to experimental data (Zhao and Truhlar 2008b, 2007; Jissy et al. 2011; Johnson et al. 2009). Some of these studies include the optimization of π-stacking systems containing DNA nitrogenous bases (Zhao and Truhlar 2008a, 2007; Josa et al. 2013).

For each newly created bond, a fully relaxed torsion scan was performed at the M05-2X/6–31G(d, p) level in 6 steps of 60°. The minimum from each scan was then reoptimized at the higher M05-2X/6–311++G(d, p) level of theory.

The ∆G° for each reaction step in this synthetic sequence is estimated as follows:

$$\Delta G^\circ_{1} = \left[ G^\circ\left( \text{product 1} \right) + G^\circ\left( \text{H}_{2}\text{O} \right) \right] - \left[ G^\circ\left( \text{AEG} \right) + G^\circ\left( \text{Ac-N}^{1}\text{-C} \right) \right] \tag{1}$$
$$\Delta G^\circ_{2} = \left[ G^\circ\left( \text{product 2} \right) + G^\circ\left( \text{H}_{2}\text{O} \right) \right] - \left[ G^\circ\left( \text{product 1} \right) + G^\circ\left( \beta\text{-Ribf} \right) \right] \tag{2}$$
$$\Delta G^\circ_{3} = \left[ G^\circ\left( \text{product 3} \right) + G^\circ\left( \text{H}_{2}\text{O} \right) \right] - \left[ G^\circ\left( \text{product 2} \right) + G^\circ\left( \text{U} \right) \right] \tag{3}$$
$$\Delta G^\circ_{4} = \left[ G^\circ\left( \text{product 4} \right) + G^\circ\left( \text{H}_{2}\text{O} \right) \right] - \left[ G^\circ\left( \text{product 3} \right) + G^\circ\left( \text{H}_{2}\text{PO}_{4}^{-} \right) \right] \tag{4}$$
$$\Delta G^\circ_{5} = \left[ G^\circ\left( \text{dipeptide} \right) + G^\circ\left( \text{UMP} \right) \right] - \left[ G^\circ\left( \text{product 4} \right) + G^\circ\left( \text{product 1} \right) \right] \tag{5}$$
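Because G° is a state function, the five step values in Eqs. 1–5 add up to the ∆G° of the net reaction. The short Python sketch below verifies this bookkeeping with placeholder Gibbs energies (arbitrary numbers in kJ/mol, not data from this work); the species labels such as `P1` (product 1) and `AcC` (Ac-N1-C) are shorthand introduced here for illustration.

```python
def step(products, reactants, g):
    """Delta G of one step from a dict of Gibbs energies (any consistent unit)."""
    return sum(g[s] for s in products) - sum(g[s] for s in reactants)

# Placeholder Gibbs energies in kJ/mol, chosen only to exercise the arithmetic
g = {"AEG": -10.0, "AcC": -20.0, "Ribf": -30.0, "U": -15.0, "H2PO4-": -40.0,
     "H2O": -5.0, "P1": -22.0, "P2": -45.0, "P3": -57.0, "P4": -90.0,
     "dipeptide": -60.0, "UMP": -58.0}

steps = [
    (["P1", "H2O"], ["AEG", "AcC"]),       # Eq. 1
    (["P2", "H2O"], ["P1", "Ribf"]),       # Eq. 2
    (["P3", "H2O"], ["P2", "U"]),          # Eq. 3
    (["P4", "H2O"], ["P3", "H2PO4-"]),     # Eq. 4
    (["dipeptide", "UMP"], ["P4", "P1"]),  # Eq. 5
]
dg_steps = [step(p, r, g) for p, r in steps]
dg_total = sum(dg_steps)

# Products 2-4 cancel in the sum; one product 1 remains on the reactant
# side because Eq. 5 consumes a second equivalent of it.
dg_net = step(["dipeptide", "UMP", "H2O", "H2O", "H2O", "H2O"],
              ["AEG", "AcC", "Ribf", "U", "H2PO4-", "P1"], g)
print(dg_steps, dg_total, dg_net)
```

The check makes explicit that, as written, Eq. 5 requires a second equivalent of product 1 in addition to the one formed in Eq. 1.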

Results and Discussion

Thermodynamics of the Formation of Canonical and Non-Canonical Nucleosides

The prebiotic synthesis of (non-) canonical nucleosides, that is, the condensation of an electrophile {TC} and a nucleophile {RU}, is explored thermodynamically on the basis of theoretical calculations. The broad question addressed is whether the emergence of proto-nucleosides that may have preceded the building blocks of today’s DNA and RNA is consistent with this simple modeling from a thermodynamical standpoint. The order of the synthetic steps is unimportant since the free energy is a state function and hence depends only on the two endpoints of the reaction except if the intermediate reactants are trapped in different potential energy wells on the conformational potential energy hypersurface.

It is well documented that using the classic model implies facing the “nucleoside problem” as a specific case of the “water problem” (Hud et al. 2013; A. do Nascimento Vieira et al. 2020; Kim et al. 2016; Joyce and Orgel 1999). This challenge refers to the inability of canonical bases to N-glycosylate with ribose in an aqueous environment due to the thermodynamic instability of the glycosidic bond. The “nucleoside problem” was first reported by Orgel et al. in 1972 (Fuller et al. 1972). Despite these challenges, Orgel et al. obtained in their study adenosine albeit in low yield (1–5%). Interestingly, these authors also obtained N-nucleosides with the adenine exocyclic NH2 in 50–70% yield.

To overcome the “nucleoside problem,” different strategies have been proposed in the literature, including different synthetic routes and non-canonical components (where non-canonical refers to components not present in the building blocks of today’s NAs). In this paper we use non-canonical components to glycosylate the TCs with the RUs.

We searched the literature for TCs that can create non-conventional xeno-NAs backbones, selecting AEG for its highlighted advantages and RUs that have been proven to be successful in circumventing the “glycoside problem.” These RUs are 2,4,6-triaminopyrimidine (TAP), barbituric acid (BA), and melamine (MM).

Additionally, the base cyanuric acid (CA) is also tested since its nucleotides can create hydrogels (stacking hexads) through complementary base pairing with BA nucleotides (Cafferty et al. 2016), adenosine 5’-monophosphate (AMP), and 2,6-diaminopurine (DAP) (Li et al. 2016), similar to the complementary base pairing observed in today’s DNA double helices.

TAP is a pyrimidine that contains three exocyclic amino groups. This base exhibits four nucleophilic sites: The three exocyclic NH2 groups in positions 2, 4, 6 and the endocyclic C5. The amino groups in position 4 and 6 are equivalent by symmetry and exhibit a better nucleophilic character than the one in position 2 because of its depleted negative charge due to the electron withdrawing effect of the two ortho NH2 in position 4 and 6. Additionally, the prebiotic synthesis of TAP has been previously proposed (Pérez-Fernández et al. 2022; Salván et al. 2020; Trinks 1987).

TAP can glycosylate with different TCs depending on the reaction conditions (Fialho 2019; Fialho et al. 2018). This property is enhanced by its higher solubility in water compared to canonical bases (Chen et al. 2014). For example, Chen and coworkers (Chen et al. 2014) were the first to test the glycosylation of TAP with ribose by heating the mixture in aqueous solution at 35–95 ºC and pH = 8. A complex mixture of products was obtained. An important result from this study is that after 10 days of reaction a combined yield of 60% was obtained for the mixture of TAP-ribose, and 90% was obtained for the same mixture at a higher temperature. The β-ribofuranose-C5-TAP nucleoside was detected as the major product.

Studies by Cafferty and coworkers (Cafferty et al. 2018) reproduced these results, obtaining under similar conditions (35 ºC and 10 days of reaction) a complex mixture of TAP-nucleosides that included pyranose and furanose rings for the ribose, β- and α-anomers, and C5- and N-glycosides identified by EM-HPLC after basic hydrolysis in NH4OH for 4, 20, and 44 h. This mixture represented a combined yield of 60%. The presence of N-nucleosides with the exocyclic NH2 of TAP is related to the findings by Fuller et al. (Fuller 1972) of N-glycosides of adenosine.

Although N-nucleosides are predominant in today’s DNA and RNA, C-nucleosides can also be found in Nature, e.g., pseudouridine, which is a product of the post-transcriptional modification of RNA. For example, rRNA from mammals contains around 100 pseudouridines per ribosome (Chen et al. 2014; Clercq 2015; Wellinton and Benner 2006). Several of these C-nucleosides have anticancer and antibiotic activity (Clercq 2015; Wellinton and Benner 2006).

Fialho expanded on the work by Cafferty et al. (Cafferty et al. 2018) by testing the glycosylation of TAP with 17 different sugars that included hexoses, pentoses, and tetroses. A mixture of TAP and sugar in a 1:1 ratio was heated at 85 °C for 24 h at pH = 1 or 7. The products were analyzed by UV-LC/MS and 1H-NMR, and it was shown that all glycosylation reactions proceeded to a certain extent. The sugar that exhibited the highest yield of glycosylation was the pentose arabinose, with 61% and 55% at pH = 1 and 7, respectively. Ribonucleosides were obtained in 28% and 31% yield at pH = 1 and 7, respectively (Fialho 2019; Fialho et al. 2020, 2018).

Barbituric acid (BA) is a relatively strong organic acid (pKa = 4) that can react with ribose in aqueous solution due to the nucleophilic character of its barbiturate ion (Fialho 2019). The glycosylation of BA has been tested with glucosamine at pH = 7 in a 1:2 ratio of BA:glucosamine, for 10 h at 50 °C. The C-β-nucleoside was the major product under basic catalysis (Fialho 2019; Gonzalez et al. 1986). The C-glycoside of the enol form of BA can create stacked hexads with MM in aqueous solution (Cafferty et al. 2016).

Melamine (MM) contains a heterocyclic ring with three nitrogen atoms that makes it more electron withdrawing, hence it is less nucleophilic than TAP and BA. Additionally, MM is less soluble in water at pH ≥ 5. Its solubility at 65 ºC is 0.2 M (Fialho 2019). MM can co-precipitate in aqueous solution with phosphates, sulfates, and carboxylates (Fialho 2019).

Nevertheless, Cafferty and coworkers (Cafferty et al. 2016) showed that MM can glycosylate with ribose in aqueous solution when the reaction lasts for 24 h at 20 ºC despite the MM disadvantages described above. Both β- and α-anomers of the MM ribosides with the 5-MR and 6-MR of ribose were detected by AFM-HPLC and by measuring the 1H-NMR coupling constants for the anomeric hydrogens.

Fialho (Fialho 2019) explored 80 reactions between 8 nucleophilic bases, which included adenine, uracil, TAP, BA, CA, and MM, and a set of 10 electrophiles, which included the aldehydes D- and L-glyceraldehyde and D-ribose, one anhydride, one imide, three esters, two thioesters, and one Michael acceptor. All nucleobases tested, apart from uracil and CA, were able to react with D-ribose.

Table 1 and Figs. 4 and 5 present the values and the bar graphs of the ∆G° of reaction for all 110 nucleosides. ∆G° < 0 suggests that the condensation reaction is thermodynamically favored, whereas ∆G° > 0 indicates otherwise.
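To give a sense of scale for this sign criterion, a standard Gibbs energy of reaction can be converted into an equilibrium constant through K = exp(−∆G°/RT). The sketch below is only an illustration under the usual standard-state conventions; the function name and the default temperature of 298.15 K are assumptions rather than details taken from the authors' scripts, and the +10.0 kJ/mol value is purely hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def equilibrium_constant(delta_g_kj: float, temperature: float = 298.15) -> float:
    """Return K = exp(-dG/(R*T)) for a standard Gibbs energy given in kJ/mol."""
    return math.exp(-delta_g_kj / (R_KJ * temperature))


# -18.6 kJ/mol is the value reported in the text for beta-2T3 adenosine in water;
# +10.0 kJ/mol is a made-up value used only to show the opposite sign.
for label, dg in (("reported, favored", -18.6), ("hypothetical, unfavored", 10.0)):
    print(f"{label}: dG = {dg:+.1f} kJ/mol -> K = {equilibrium_constant(dg):.3g}")
```

A ∆G° of zero gives K = 1, so negative values correspond to K > 1 (products favored at equilibrium) and positive values to K < 1.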

Table 1 Gibbs energies (∆G°) in kJ/mol at standard pressure and temperature for a hypothetical condensation reaction that follows the “classic” model leading to the 5 canonical and 6 non-canonical nucleosides containing AEG and β- and α-nucleosides containing Ribf in vacuum and in aqueous environment. The energies are estimated at the DFT-B3LYP/6–311++G(d, p) level of theory. Aqueous solvation is included implicitly with the IEFPCM model
Fig. 4

Comparison of Gibbs energies of reaction (∆G°) at 298 K for the classic synthesis, leading to the 5 canonical and 6 non-canonical β- and α-counterparts of Ribf nucleosides. Each colored bar represents the initial puckering conformations for the 5-MR sugar. (Top): ∆G° obtained at B3LYP/6–311++G(d, p) in vacuum, (bottom): ∆G° at B3LYP/6–311++G(d, p) using the IEFPCM model to include aqueous implicit solvation

Fig. 5

Comparison of the Gibbs energies of reaction (∆G°) at 298 K for the classic synthesis, leading to the 11 aegPNA nucleosides between AEG and the five canonical acetylated bases (RUs-Ac): A = adenine-N9-Ac, G = guanine-N9-Ac, C = cytosine-N1-Ac, T = thymine-N1-Ac, U = uracil-N1-Ac, and the six non-canonical RUs-Ac: TAP-C5 = 2,4,6-triaminopyrimidine-C5-Ac, TAP-N = 2,4,6-triaminopyrimidine-N4-Ac, BA-C5 = barbituric acid-C5-Ac, BA-N = barbituric acid-N1-Ac, CA = cyanuric acid-N5-Ac, and MM = melamine-N-Ac. (Top): ∆G° obtained at B3LYP/6–311++G(d, p) in vacuum, (bottom): ∆G° at B3LYP/6–311++G(d, p) using the IEFPCM model for the aqueous solvation

Figure 4 shows the ∆G° values for the classic formation of the Ribf nucleosides. For the canonical nucleosides in vacuum, the formation of both β-2T3 and β-3T2 is estimated to be more favorable than that of the α-anomers. Cytidine has the most negative ∆G° values, with −16.5 and −17.8 kJ/mol for the β-2T3 and β-3T2 conformations, respectively. A similar picture is observed for the canonical nucleosides in aqueous environment, with the exceptions of thymidine and uridine, for which no reaction is favored. The nucleoside most favored in aqueous solution is β-2T3 adenosine, with ∆G° = −18.6 kJ/mol.

For the non-canonical Ribf nucleosides in vacuum, the β-2T3 and β-3T2 forms are the most favored for TAP-C5 and BA-C5, although for TAP-C5 the energies are within the error of the method. All BA-C5 anomers in all sugar ring conformations are predicted to form, with values that exceed the error of the DFT method (∆G° = −32.7 kJ/mol for the α-2T3; ∆G° = −50.7 kJ/mol for the β-2T3; ∆G° = −36.7 kJ/mol for the α-3T2; and ∆G° = −48.1 kJ/mol for the β-3T2). In contrast, the formation of BA-N and CA Ribf nucleosides is estimated to be unfavorable, with values beyond the DFT intrinsic error (BA-N: \(\approx\) 11 to 18 kJ/mol and CA: \(\approx\) 10 to 23 kJ/mol).

For the non-canonical Ribf nucleosides in implicit solvation a similar trend is observed. In this case the β-anomer of both F-form conformations is predicted to be thermodynamically favored, while the α-anomer is not, for TAP-C5, BA-C5, BA-N, and MM. For TAP-C5 the lowest ∆G° value is that of β-3T2, with ∆G° = −15.2 kJ/mol. The BA-C5 nucleoside has the lowest ∆G° values across all non-canonical nucleosides in water for the β-anomers (∆G° = −26.5 kJ/mol for the β-2T3 and ∆G° = −22.1 kJ/mol for the β-3T2). Again, the formation of BA-N and CA nucleosides was estimated to be unfavored, with ∆G° ≈ 2.6–26.8 kJ/mol. The formation of MM nucleosides was favored, but the energies were within the error for the estimation of ∆G° at the B3LYP/6–311++G(d, p) level.

Since the condensation reaction of nucleobases with ribose has been widely reported in the literature, we can compare our results with some of these reports. For the canonical nucleosides, our results are in agreement with the studies by Fuller et al. (Fuller et al. 1972; Fuller 1972) on the condensation reaction between different bases and ribose in aqueous solution. In these two papers the authors obtained β-glycosides in low yield for the reaction between A or G and ribose. No product was observed when reacting a mixture of pyrimidines with ribose in aqueous solution. Similarly, in our study the formation of β-nucleosides for A and G is predicted to be thermodynamically favored in both aqueous and non-polar environments. Our results are consistent with the well-known “nucleoside problem” in the case of pyrimidine bases (Fuller et al. 1972; Fuller 1972; Benner et al. 2012).

For the non-canonical nucleosides, our results are in agreement with: (1) the results obtained by Chen et al. (Chen et al. 2014) for the glycosylation of TAP with ribose, where the β-ribofuranose-C5-TAP was the major product, (2) the studies by Cafferty and coworkers (Cafferty et al. 2018, 2016; Cafferty and Hud 2014) for the synthesis of TAP, BA, CA, and MM ribosides, and (3) the work by Fialho (Fialho 2019; Fialho et al. 2020, 2018), in which CA does not react with ribose, whereas β-nucleosides were obtained for TAP, BA, and MM in different yields depending on the reaction conditions.

In our case the β-anomers of the TAP-C5, TAP-N, BA-C5, and MM nucleosides with Ribf with the initial sugar ring conformations 2T3 and 3T2 were favored in implicit solvation. This result suggests that at least TAP, BA, and MM can react with ribose in water and circumvent the “nucleoside problem.”

In previous work, Castanedo and Matta (2022a, b) analyzed the plausibility of a similar classic synthesis for the β- and α-anomers of canonical 2’-deoxyribofuranose (2dRibf) and Ribf nucleosides, but at the lower DFT-B3LYP/6-31G(d, p) level, in vacuum and in implicit aqueous medium using IEFPCM. That work found that, overall, the synthesis of none of the canonical nucleosides could be predicted as thermodynamically favored or unfavored in either environment, since most of the energies were within the intrinsic error of the method used (3–4 kcal/mol). In contrast, in the present study the formation of some A, G, and C nucleosides is predicted to be favored through the condensation reaction of their components when different puckering conformations of the sugar and a higher DFT level of calculation are considered.
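Because the error quoted for the earlier study is given in kcal/mol while the reaction energies in this work are in kJ/mol, a short conversion helps when comparing the two. The sketch below converts the 3–4 kcal/mol range and labels a ∆G° value relative to a chosen tolerance. The helper names are hypothetical, the tolerance is left as an explicit parameter because the error at the B3LYP/6–311++G(d, p) level is not restated here, and the three ∆G° inputs are made-up values used only to exercise the function.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(value_kcal: float) -> float:
    """Convert an energy from kcal/mol to kJ/mol."""
    return value_kcal * KJ_PER_KCAL


def classify(delta_g_kj: float, tolerance_kj: float) -> str:
    """Label a reaction energy relative to an assumed method error (both in kJ/mol)."""
    if abs(delta_g_kj) <= tolerance_kj:
        return "within error"
    return "favored" if delta_g_kj < 0 else "unfavored"


# Error range quoted for the earlier B3LYP/6-31G(d, p) study, converted to kJ/mol
low, high = kcal_to_kj(3.0), kcal_to_kj(4.0)
print(f"3-4 kcal/mol = {low:.1f}-{high:.1f} kJ/mol")

# Hypothetical reaction energies, not values from the paper
for dg in (-5.0, -30.0, 25.0):
    print(f"dG = {dg:+.1f} kJ/mol -> {classify(dg, high)}")
```

With this conversion, the quoted range corresponds to roughly 12.6–16.7 kJ/mol, which is the scale against which the energies of the earlier study were judged to be inconclusive.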

The only other computational modeling studies of TAP, MM, BA, and CA nucleosides(tides) that we could find are the works by Kaur et al. (2017, 2019). In the 2019 publication, Kaur and coworkers used molecular dynamics (MD), molecular mechanics (MM), and DFT-B3LYP to study the β- and α-ribonucleosides of the non-canonical TAP-C5 and CA, their complementary base pairing TAP-C5:CA, stacking energies, and deglycosylation barriers in vacuum and in implicit solvation using IEFPCM. From the deglycosylation profiles obtained at the B3LYP/6-31G(d, p) level for the most stable β- and α-anomers of the TAP ribonucleosides, Kaur et al. found that the glycosidic bond in the TAP nucleosides (relative dissociation energy (Erel.) ≈ 350–447 kJ/mol) is stronger than the one in the canonical ribonucleosides (Erel. ≈ 222–258 kJ/mol) in both environments. The opposite result was obtained for both CA ribonucleoside anomers (Erel. ≈ 155–193 kJ/mol {vacuum} and 207–267 kJ/mol {IEFPCM}), leading the authors to propose that TAP-C5 ribonucleosides may have been present in more hydrolytic environments, while CA ribonucleosides may have been present in less hostile conditions.

Another study by Kaur et al., published in 2017, addressed the BA-C5 (in its keto form) and MM ribonucleosides. When the authors analyzed the energetic barriers of dissociation of the glycosidic bond between the two non-canonical nucleobases and D-ribofuranose in the β- and α-configurations, using the same DFT levels as in the 2019 paper, they found that for BA-C5 (Erel. ≈ 317–451 kJ/mol at a glycosidic bond length of 3.0–4.1 Å) the cleavage of the glycosidic bond was less energetically favored than for canonical ribonucleosides. The glycosidic bond in the MM ribonucleoside is predicted to be more resistant to dissociation in the β-configuration and, overall, although less stable than the one in BA-C5 ribonucleosides, it can be stronger than the glycosidic bond in canonical ribonucleosides. This could make both non-canonical ribonucleosides suitable for hydrolytic prebiotic scenarios.

The present study complements the results obtained by Kaur et al. (2017, 2019). In our case we expand the analysis by considering the sugar puckering of the furanose form of D-ribose and also analyze the C- and N-glycosides of the TAP and BA bases. In this study instead of analyzing the stability of the glycosidic bond we estimated the thermodynamic feasibility for the formation of the given ribonucleosides and compared them with the formation of the canonical nucleosides, predicting that, overall, the formation of the β-anomer of the BA-C5 ribonucleosides is the most favored in both environments compared to the canonical nucleosides. For the case of the TAP-C5 and MM a similar trend is observed even when in some cases the energies are within the error of the calculations. In general, the classic synthesis of the CA ribonucleosides is estimated as the most unfavored, even more unfavored than the one for some canonical nucleosides. These results are in agreement with the studies by Kaur et al. (Kaur et al. 2017, 2019).

When analyzing the thermodynamic plausibility for the formation of canonical and non-canonical nucleosides with AEG it is observed that the condensation reaction of the acetylated derivatives of canonical and non-canonical nucleobases with AEG (Fig. 5) is thermodynamically favored in aqueous environment (but not in vacuum) in all cases. The most favored reactions are between AEG and A-acetylated (−19.4 kJ/mol), C-acetylated (−22.3 kJ/mol) and BA-N1-acetylated (−17.9 kJ/mol) nucleosides.

These observations suggest that the TCs present in the first proto-NAs may have been different from ribose and were not necessarily sugar-based. A “ribonucleoprotein (RNP) world” instead of an “RNA world” has been proposed in the literature (Lehman 2015). This topic was widely discussed in a special issue of Life with 17 papers dedicated to the origins and evolution of RNA and the “RNA world” hypothesis. Carter and coworkers (Carter 2015) discussed this hypothesis on the basis of a coexistence between polynucleotides (as storage of information (Gatlin 1972; Volkenstein 1979, 2009; Hooft et al. 2024)) and polypeptides (as cofactors with catalytic activity) to complement the formation of RNA in the evolutionary process.

Also, in support of the RNP theory, Smith et al. (Smith et al. 2014) address the fact that some proteins can store information (Gatlin 1972; Volkenstein 1979, 2009; Hooft et al. 2024) and catalyze the synthesis of other proteins.

Wächtershäuser (Wächtershäuser 2014) argued against the “Strong RNA World” hypothesis by analyzing reactions that require metals as catalysts, and hypothesized that until this type of chemistry was well defined in the prebiotic world none of the processes invoked by the “RNA world” theory could have happened.

Taking these observations into consideration, our predicted ∆G° of reaction for the synthesis of AEG nucleosides further supports the idea that the building blocks of aegPNAs could have been precursors of today’s nucleosides if these molecules emerged in aqueous environment as a way to circumvent the “water problem.”

If this hypothesis can be proven experimentally, then an obvious next question emerges: how did the building blocks of PNA transition to today’s nucleosides(tides)? One accepted theory is that PNA strands can participate in template-directed reactions, since they can form hybrid double-stranded helices with DNA and/or RNA (Frenkel-Pinter et al. 2020; Meggers and Zhang 2010; Zhang et al. 2005; Nielsen et al. 1994, 1991; Nielsen 1993; Egholm et al. 1992).

Another possibility could be that single AEG nucleosides(tides) could have assisted the glycosylation and phosphorylation of Ribf in today’s nucleotides. Supporting this idea, Nature has left us a clue hidden in our own genetic material: the “transfer-RNA (tRNA)”. tRNA can create polypeptides by transferring an amino acid bonded through an ester bond to the C3'-terminal of one of the tRNA strands to the nascent polypeptide chain in the peptidyl transferase center of ribosomal RNA (rRNA) (Bose et al. 2022; Massa et al. 2010; Simonović and Steitz 2009; Gindulyte et al. 2007).

In addition, a recent study by Suárez-Marina and coworkers (Suárez-Marina et al. 2019) has shown that canonical nucleosides and nucleotides can be synthesized through condensation reactions of their components in dehydrating conditions in the presence of glycine. This study showed that glycine can react with the canonical bases and direct the glycosylation to the correct site with the ribose-5’-monophosphate.

Hirakawa et al. (2022) were able to synthesize ribose-5’-phosphate in the presence of urea, borate (BO₃³⁻), and phosphate ions by heating at 80 °C for 24 h. The product was obtained in 22% yield after removing the excess reactants, borate and urea, by acidic hydrolysis (urea contains two amine groups and a carbonyl group, giving it a certain similarity to amino acids). Hirakawa et al. cleverly made use of borate to overcome the “asphalt problem” (Neveu et al. 2013) that arises from the formose reaction for the synthesis of ribose. The use of BO₃³⁻ was initially proposed by Ricardo and coworkers (Neveu et al. 2013; Ricardo et al. 2004) as a way to synthesize and stabilize pentoses.

What if it is possible to use an AEG nucleoside as a scaffold to further assist the condensation reactions between ribose, canonical bases, and phosphate, helped by possible π-π stacking interactions between the existing base in the AEG nucleoside and the base being glycosylated to Ribf?

Alternative Pathway for the Formation of Nucleotides from an AEG Nucleoside Template

As a preliminary test of the previous hypothesis, we have modeled the synthesis of uridine-5’-monophosphate (UMP) and AEG-Ac-N1-C dipeptides by following the reaction sequence represented in Fig. 6. The corresponding molecular structures are visualized in Fig. 7.

Fig. 6

A reaction sequence modeled for the prebiotic synthesis of UMP assisted by an AEG-Ac-N1-C scaffold

The proposed prebiotic pathway in aqueous medium starts with the condensation reaction between N-(2-aminoethyl)-glycine (AEG) and Ac-N1-C to obtain the AEG-N2-Ac-N1-C derivative (product 1, {∆G°1 = −36.8 kJ/mol}). This reaction is followed by the condensation reaction between AEG-N2-Ac-N1-C and the OH in the C3′ position of β-Ribf in the 2T3 puckering conformation to obtain the β-Ribf-C3′-O-AEG-N2-Ac-N1-C (product 2, {∆G°2 = 5.9 kJ/mol}). Product 2 then condenses with uracil to obtain the uridine-C3′-O-AEG-N2-Ac-N1-C (product 3, {∆G°3 = −6.6 kJ/mol}). The phosphorylation of the OH-C5′ in product 3 with the dihydrogen phosphate ion gives the β-UMP-C3′-O-AEG-N2-Ac-N1-C derivative (product 4, {∆G°4 = −8.7 kJ/mol}). Finally, product 4 condenses with a molecule of AEG-N2-Ac-N1-C to give free UMP and the AEG-N2-Ac-N1-C dipeptide (product 5, {∆G°5 = −36.4 kJ/mol}); 4 water molecules are generated in total. While step 2 has a slightly positive ∆G°2 ≈ +6 kJ/mol and step 3 is borderline spontaneous with ∆G°3 = −6.6 kJ/mol, these reactions are driven by the overall negative ∆G° of the reaction pathway, whereby ∆G°(pathway) = \(\sum_{i=1}^{5} \Delta G^\circ_{i} = -82.6\ \text{kJ/mol}\).

This reaction pathway that builds free UMP on an AEG-N2-Ac-N1-C template is, thus, overall favored by −82.6 kJ/mol in implicit solvation. Notice that this reaction pathway goes through a β-Ribf-C3′-O-AEG-N2-Ac-N1-C reactant which is similar to the C3′ end of the transfer RNA (tRNA) for the catalysis of the peptide bond formation in the ribosomal RNA (rRNA).
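The bookkeeping behind the overall pathway energy can be checked with a few lines of Python. The sketch below only re-adds the five step energies quoted in the text; the variable names are illustrative and the snippet is not taken from the authors' scripts.

```python
from itertools import accumulate

# Step energies in kJ/mol as quoted in the text (steps 1 to 5, implicit solvation)
STEP_DG = {1: -36.8, 2: 5.9, 3: -6.6, 4: -8.7, 5: -36.4}

total = sum(STEP_DG.values())
running = list(accumulate(STEP_DG.values()))  # cumulative dG after each step
uphill = [step for step, dg in STEP_DG.items() if dg > 0]

print(f"Overall dG(pathway) = {total:.1f} kJ/mol")
for step, cumulative in zip(STEP_DG, running):
    print(f"after step {step}: cumulative dG = {cumulative:.1f} kJ/mol")
print(f"Steps with positive dG: {uphill}")
```

The running sum stays negative after every step, so the single uphill step (step 2) is more than compensated by the preceding step as well as by the pathway as a whole.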

These results open the door to new prebiotic routes to overcome the “water problem” that could have used other potential prebiotic molecules as reagents to stabilize the building blocks of today’s nucleic acids. This pathway offers a way to obtain AEG dipeptides and free UMP nucleotides (Fig. 7).

Fig. 7

Molecular structure of all products in reaction sequence to obtain UMP and AEG dipeptide assisted by an AEG template

Discussion and Closing Remarks

The thermodynamic plausibility for the selection of the β- over the α-anomers of canonical and non-canonical nucleosides has been studied at the DFT-B3LYP/6–311++G(d, p) level of theory. A “classic” prebiotic synthesis for ribofuranose-containing nucleosides favors the formation of the β-anomer of canonical nucleosides containing adenine, guanine, and cytosine in both vacuum and implicit solvation, of most non-canonical nucleosides containing N- and C5-glycosylated 2,4,6-triaminopyrimidine in aqueous environment, and of C5-glycosylated barbituric acid in both environments.

The formation of N-(2-aminoethyl)-glycine (AEG)-containing nucleosides is favorable across the board in implicit solvation. This reinforces the hypothesis that an “RNP world” may have preceded RNA and DNA and constitutes theoretical evidence on how thermodynamics could justify the possible existence of a proto-RNA with a TC different from ribofuranose or 2’-deoxyribofuranose.

A novel hypothetical pathway was tested for the synthesis of β-uridine 5’-monophosphate (UMP) and a dipeptide containing the glycoside of AEG and the acetyl-N1 derivative of cytosine. This pathway includes the formation of some intermediates that are similar to the aminoacyl derivatives of RNA present in today’s transfer RNA. The overall synthesis was predicted to be thermodynamically favored with a total ∆G° = −82.6 kJ/mol. This may represent a workaround by Nature to overcome the “water problem” for the glycosylation and phosphorylation of Ribf in the correct β configuration by building the free nucleotide on an amino acid derivative template. Hence, considering the possible coexistence of the components of nucleotides and ancestral amino acids in a prebiotic soup opens numerous possibilities for the synthesis of both proteins and nucleic acids.

The results presented in this paper complement the theoretical studies previously published and constitute a step in the direction of understanding the drivers that may have led to the chemical nature of today’s nucleic acids. Nevertheless, we cannot disregard the relevance of other important factors. Some of these may include kinetic considerations (Jeilani and Nguyen 2020; Singh et al. 2014; Sheng et al. 2009). Singh and coworkers (Singh et al. 2014), for instance, describe a hydrogen-bond self-activated glycosylation mechanism for adenine and cytosine with D-ribose. After 8 days of heating at 60–70 °C, β-adenosine was obtained in 15% yield while β-cytidine was obtained in 12% yield. The proposed reaction mechanism supports a hydrogen bond-assisted interaction of the base with the α-anomer of the sugar, leading to a more stable transition state that goes through an oxonium intermediate that eventually interconverts into the β-nucleosides. This mechanism is suggested to happen on the surface of water.

Another factor that was not discussed here is pH, which can have dramatic consequences since it determines the dominant protonation and tautomerization states of the nucleobases (Krishnamurthy 2012). Thibaudeau and coworkers (Thibaudeau et al. 1994, 1997; pp. 22–46 in Thibaudeau et al. 2005) propose that the preference for the β-anomer over its α-counterpart in canonical nucleosides is related to its higher flexibility (smaller differences between the ΔG° values for the 2T3 / 3T2 pseudorotational equilibrium) in the β-configuration when transitioning from an acidic to an alkaline pH. Additionally, regarding the formation of non-canonical nucleosides containing TAP, BA, CA, and MM, Fialho (Fialho 2019) refers to the difference between the ionization states of these four non-canonical bases. From this difference it can be inferred that only the pairs MM:BA and TAP:CA can be found at pH = 4–5 and pH = 6–7, respectively. Consequently, if the first proto-NAs had these RUs, then depending on the pH of the environment they may have contained only either the BA:MM pair or the TAP:CA pair.

Ions and minerals would also have a significant effect on the prebiotic selection of building blocks of nucleic acid precursors. This is expected, at least, on the basis of the well-known ion-based catalysis, especially for ions like Mg²⁺ (Gull et al. 2017) or BO₃³⁻ (Neveu et al. 2013; Fuller et al. 1972; Fuller 1972; Ricardo et al. 2004; Šponer et al. 2016; Holm 2012). Divalent cations, e.g., Mg²⁺, can assist in the folding of nucleic acids (Holm 2012). Possibly an early example of the use of Mg²⁺ as a catalyst in this context is the study of Fuller et al. (Fuller 1972), which showed a marked increase in the synthetic yields of nucleosides in prebiotic conditions upon inclusion of Mg²⁺ in the reaction mixture.

Formamide (H2NCOH) has received special attention as a potential prebiotic solvent, since it can be synthesized through different routes and is present in interstellar space (Saladino et al. 2012). Nucleobases and some other molecules of life can be synthesized from H2NCOH (Saladino et al. 2012). Furthermore, the phosphorylation of different nucleosides in the presence of H2NCOH is possible, with yields ranging from 6% to 59% (Gull 2014). There are still, however, issues that may cast doubt on formamide as the solvent that eventually led to modern-day nucleic acids, since this solvent inhibits base-pair complementarity (Hud et al. 2013). This latter point provided the impetus for the search for other prebiotic solvents. Polar organic solvents, e.g., ionic liquids and deep eutectic solvents, have been considered (Mamajanov et al. 2010). For instance, a choline chloride:urea mixture assists the folding of several nucleic acids, imparting a higher stability to some than water itself does (Cafferty and Hud 2014). Other solvents may include urea (Salván et al. 2020). In closing, the conclusions reached in this work are provisional, despite being novel and suggestive, and call for more investigation.

Supporting Information

All the scripts written in Bash and Python to generate the nucleosides, post-process and analyze the results, and generate the diagrams and tables can be accessed through https://github.com/mattas-research-group/scripts_PhD_thesis_Lazaro/. For more information, see the Supplementary Information.