In 1915, the mathematician Emmy Noether formulated and proved a theorem (subsequently termed Noether’s theorem) which states that every differentiable (thus continuous) symmetry of a physical system has a corresponding conservation law (Noether 1918). For example, in systems that are symmetric under translation in space, momentum is conserved, and in systems endowed with rotational symmetry, angular momentum is conserved. Likewise, according to this theorem, symmetry in time (that is, when any experiment produces the same results regardless of the moment at which it starts) leads to energy conservation. Several other symmetries and corresponding conserved quantities have been explored as modern physics developed, but these cases are outside the scope of the present description. The interested reader can find systematic outlines of Noether’s theorem that stress historical (Byers 1998), more technical (Marinho 2006) and other aspects (Brown and Holland 2004; Wigner 1954). Moreover, Kosmann-Schwarzbach (2011) has written an excellent book on Noether’s theorem and its physical applications.

Several decades after the formulation of Noether’s theorem, the discovery of the structure of DNA and of its role as a carrier of genetic information gave rise to the new fields of molecular biology and genomics. Even before the double-helical structure of DNA was determined, it was well known (and it proved fundamental for that discovery) that when total DNA is analyzed, i.e., when both strands are considered, exact equimolarity holds between complementary nucleotides, i.e., between adenine (A) and thymine (T) and between guanine (G) and cytosine (C): A = T and G = C, where A, T, G, C denote the molar fractions of the four bases. This is the so-called interstrand base-pairing rule (BPR).

Early in the history of molecular biology, Erwin Chargaff, one of its pioneers, and his collaborators (Rudner et al. 1968) discovered that the above relations also hold approximately for the composition of long stretches of single-stranded genomic DNA. While BPR is clearly related to and explained by the complementary structure of double-stranded DNA, the observation of Chargaff et al. posed a considerable challenge to the developing field of genomics. We will briefly review here the cause behind these quantitative relationships as analyzed in the seminal work of Sueoka (1995, 1996). At the same time, an answer to this problem was provided independently by Lobry (1995a, b), using a slightly different approach. The condition met when mutation and selection are equally effective on both strands (i.e., when strand equivalence holds) is termed the first parity rule (PR1). The works of Sueoka and Lobry cited above show that when PR1 holds and the substitutional dynamics has had sufficient time to reach its equilibrium state, the equalities A = T and G = C become valid for the composition of each of the two single DNA strands. This is termed Chargaff’s second parity rule (PR2). As stated, the prerequisite for PR2, and thus for the intrastrand A = T and G = C relations to hold, is the absence of biases in mutation and selection between the two strands.

In terms of an interplay between symmetry and the corresponding conserved quantities, we may consider the case where double-stranded DNA is symmetric under the following substitutional modalities. Let P_I(A → T) denote the base-substitution rate from A to T in strand I, which reflects the combined effect of a mutation rate (m) and a selection coefficient (s), and similarly for all other possible base-substitution events in strands I and II (see Sueoka 1995). In the case we describe here, the following base-substitution equalities hold: P_I(A → T) = P_I(T → A), P_I(C → G) = P_I(G → C), P_I(A → G) = P_I(T → C), P_I(A → C) = P_I(T → G), … and similarly for the whole set of 12 substitution rates; see (Lobry 1995a) and the “Appendix”. This symmetry, as expressed by the above equalities, allows the number of free base-substitution constants to be reduced from 12 to 6 (Sueoka 1995, 1996). It is equivalent to PR1 and constitutes a between-strands symmetry in the DNA structure. The role of the conservation law is then played by the PR2 intrastrand relationships A = T and G = C, from which it follows that, for purines (Pu) and pyrimidines (Py), Pu = Py = 0.5.
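As a worked illustration of how the six-constant symmetry forces A = T and G = C, the following sketch writes the substitution dynamics as continuous-time rate equations for the molar fractions. This is an assumption made for compactness and is not necessarily the formulation used in the cited works. The labels a to f for the six free constants are likewise introduced here for illustration only: a = P(A → T) = P(T → A), b = P(G → A) = P(C → T), c = P(C → A) = P(G → T), d = P(A → G) = P(T → C), e = P(A → C) = P(T → G), f = P(C → G) = P(G → C).

```latex
\begin{align}
\dot A &= aT + bG + cC - (a+d+e)\,A, \\
\dot T &= aA + bC + cG - (a+d+e)\,T, \\
\dot G &= dA + eT + fC - (b+c+f)\,G, \\
\dot C &= dT + eA + fG - (b+c+f)\,C .
\end{align}
% Subtracting the equations pairwise closes the system on the two skews:
\begin{equation}
\frac{d}{dt}
\begin{pmatrix} A-T \\ G-C \end{pmatrix}
=
\begin{pmatrix}
-(2a+d+e) & b-c \\
d-e & -(b+c+2f)
\end{pmatrix}
\begin{pmatrix} A-T \\ G-C \end{pmatrix}.
\end{equation}
% For strictly positive rates the trace is negative and the determinant is positive:
\begin{equation}
(2a+d+e)(b+c+2f) - (b-c)(d-e) \;>\; (b+c)(d+e) - |b-c|\,|d-e| \;>\; 0 .
\end{equation}
```

Both eigenvalues of the 2 × 2 matrix therefore have negative real parts, so under these assumptions the skews A − T and G − C decay to zero from any initial composition, which is the PR2 state. Adding the first two equations gives d(A + T)/dt = (b + c)(G + C) − (d + e)(A + T), so the A + T content is not fixed by the symmetry; only the skews are.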

To further explore the relation between PR1 and PR2, we recall in the “Appendix” the substitution matrix, which takes into account the aforementioned rate relations (Lobry 1995a). The eigenvectors and eigenvalues of this matrix are useful for determining the asymptotic nucleotide frequencies. Owing to the properties of the substitution matrix, the largest eigenvalue is unity and corresponds to the asymptotic state (Perron-Frobenius theorem). The components of the corresponding eigenvector are the asymptotic concentrations (molar fractions) of the four nucleotides. If the PR1 conditions are taken into account, the resulting asymptotic frequencies satisfy A = T and G = C, and consequently Pu = Py = 0.5, i.e., PR2. Therefore, the PR1 symmetry conditions lead to equal frequencies of purines and pyrimidines. The PR2 equalities are known to hold in most genomes (except for animal mitochondria), as stated earlier and as discussed further below. This symmetry-conservation interplay can be seen as an analogue of Noether’s theorem in genomics.
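The eigenvector argument can be checked numerically. The sketch below is an illustration, not the matrix of the “Appendix”: it builds a strand-symmetric set of 12 substitution probabilities from six free constants and finds the eigenvector of eigenvalue one by power iteration, i.e., by applying the substitution step repeatedly. The function names and the six numerical rate values are arbitrary assumptions chosen for the example.

```python
# Sketch only: the six rate values are arbitrary illustrative assumptions.
BASES = "ATGC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pr1_rates(a, b, c, d, e, f):
    """Expand six free constants into the 12 substitution probabilities under PR1."""
    free = {("A", "T"): a, ("G", "A"): b, ("C", "A"): c,
            ("A", "G"): d, ("A", "C"): e, ("C", "G"): f}
    rates = {}
    for (x, y), r in free.items():
        rates[(x, y)] = r
        rates[(COMPLEMENT[x], COMPLEMENT[y])] = r  # complementary event, same rate
    return rates


def step(freq, rates):
    """Apply the substitution matrix once to the vector of base frequencies."""
    new = {}
    for y in BASES:
        gain = sum(freq[x] * rates[(x, y)] for x in BASES if x != y)
        loss = freq[y] * sum(rates[(y, z)] for z in BASES if z != y)
        new[y] = freq[y] + gain - loss
    return new


def stationary(rates, freq, tol=1e-12, max_iter=100_000):
    """Power iteration: repeat `step` until the frequencies stop changing."""
    for _ in range(max_iter):
        nxt = step(freq, rates)
        if max(abs(nxt[b] - freq[b]) for b in BASES) < tol:
            return nxt
        freq = nxt
    return freq


rates = pr1_rates(a=0.02, b=0.05, c=0.01, d=0.03, e=0.01, f=0.02)
start = {"A": 0.45, "T": 0.05, "G": 0.30, "C": 0.20}  # a strongly skewed start
pi = stationary(rates, start)
print({b: round(pi[b], 6) for b in BASES})
```

For these particular values the iteration settles at A = T = 0.3 and G = C = 0.2: the skews vanish, while the A + T content (0.6 here) is set by the rates and is not constrained by the symmetry.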

As an exemplary simulation, we consider in Fig. 1 the evolution of an initial random sequence of length L = 2 × 10^4 nucleotides (nts), where the initial values of the molar fractions are A = 0.45, T = 0.05, C = 0.2 and G = 0.3. In Fig. 1a, no biases in mutation and selection between the two strands of DNA are assumed, i.e., PR1 holds. The opposite condition is met in Fig. 1b. In accordance with the theory presented above, the asymptotic emergence of the compositional properties denoted as PR2, i.e., A = T, C = G and Pu = Py = 0.5, is clearly visible in Fig. 1a. This is not the case when PR1 is not respected (Fig. 1b). Similar results are obtained for different initial molar fractions (not shown here).

Fig. 1

a In a computer simulation, the A (blue), T (green), G (magenta), C (red), Pu [A + G] (brown) and Py [C + T] (purple) molar fractions (initial values are arbitrarily chosen) of a DNA molecule of 2 × 10^4 nts are monitored in evolutionary time. Interstrand parity in the mutation rates is assumed (i.e., PR1 holds). One observes that (i) the approximate equalities A ≅ T and G ≅ C tend to be established and (ii) Pu ≅ Py ≅ 0.5 is reached asymptotically. This illustrates the emergence of the PR2 ‘conservation’ relationships as a result of the interstrand ‘symmetry’ relations dictated by PR1. b In a numerical set-up analogous to the previous one (color convention as above), when the mutation rates do not obey PR1, the PR2 relations fail to become established. Thus, the resulting DNA molecule has a heavy (purine-rich) and a light (pyrimidine-rich) strand, similarly to the genome of animal mitochondria (Color figure online)
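A simulation of the kind shown in Fig. 1 can be sketched as follows. This is a minimal stand-in, not the code behind the figure: each site of a single strand mutates independently once per generation with fixed effective substitution probabilities, selection is not modeled separately, and the sequence is shorter than the 2 × 10^4 nts of Fig. 1 to keep the run fast. The rate tables `PR1_RATES` and `BIASED_RATES`, and all function names, are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

BASES = "ATGC"

# Hypothetical per-generation substitution probabilities (illustrative assumptions).
# Each rate equals that of the complementary event, so PR1 holds.
PR1_RATES = {
    ("A", "T"): 0.02, ("T", "A"): 0.02,
    ("G", "A"): 0.05, ("C", "T"): 0.05,
    ("C", "A"): 0.01, ("G", "T"): 0.01,
    ("A", "G"): 0.03, ("T", "C"): 0.03,
    ("A", "C"): 0.01, ("T", "G"): 0.01,
    ("C", "G"): 0.02, ("G", "C"): 0.02,
}
# Break strand symmetry: change two rates but not their complementary partners.
BIASED_RATES = dict(PR1_RATES)
BIASED_RATES[("T", "A")] = 0.06
BIASED_RATES[("C", "G")] = 0.06


def evolve(seq, rates, generations, rng):
    """Mutate every site independently, once per generation."""
    # For each base, cumulative probability thresholds over its three targets.
    table = {}
    for x in BASES:
        acc, cum = 0.0, []
        for y in BASES:
            if y != x:
                acc += rates[(x, y)]
                cum.append((acc, y))
        table[x] = cum
    seq = list(seq)
    for _ in range(generations):
        for i, x in enumerate(seq):
            u = rng.random()
            for threshold, y in table[x]:
                if u < threshold:
                    seq[i] = y
                    break
    return seq


def fractions(seq):
    """Molar fractions of the four bases in a sequence."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in BASES}


def simulate(rates, length=4000, generations=120, seed=0):
    rng = random.Random(seed)
    # Initial composition as in the text: A = 0.45, T = 0.05, G = 0.3, C = 0.2.
    seq = rng.choices(BASES, weights=[0.45, 0.05, 0.30, 0.20], k=length)
    return fractions(evolve(seq, rates, generations, rng))


sym = simulate(PR1_RATES)
asym = simulate(BIASED_RATES)
print("PR1 holds:   ", sym, "Pu =", round(sym["A"] + sym["G"], 3))
print("PR1 violated:", asym, "Pu =", round(asym["A"] + asym["G"], 3))
```

With the symmetric table the fractions should approach A ≅ T, G ≅ C and Pu ≅ 0.5 up to sampling noise, whereas the biased table leaves a purine-rich strand, qualitatively as in panels a and b.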

As we may infer from the above analysis, deviations from PR2 can be used to detect differences in the functionality of the two DNA strands, in their roles in genetic signaling, or in the local environments to which they are exposed. Cumulative plots of the excess quantities (A − T)/(A + T) and (G − C)/(G + C), the so-called relative nucleotide skews, have been shown to generate V- or Λ-shaped patterns along the linear chromosome coordinate, which makes them suitable for following such trends in DNA composition (Grigoriev 1999).
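The cumulative skew can be computed in a few lines. The sketch below sums the relative skew over consecutive non-overlapping windows. The window size, the function name `cumulative_skew` and the toy sequence are assumptions for illustration and do not reproduce the exact procedure of Grigoriev (1999).

```python
def cumulative_skew(seq, x="G", y="C", window=1000):
    """Cumulative relative skew (x - y)/(x + y) over consecutive non-overlapping windows."""
    total, curve = 0.0, []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        nx, ny = chunk.count(x), chunk.count(y)
        if nx + ny:  # a window containing neither base contributes zero skew
            total += (nx - ny) / (nx + ny)
        curve.append(total)
    return curve


# Toy sequence (hypothetical, not genomic data): G-rich first half, C-rich second half.
toy = "GGGCAT" * 100 + "CCCGAT" * 100
curve = cumulative_skew(toy, window=60)
peak = curve.index(max(curve))
print(curve)
print("skew changes sign after window", peak)
```

On the toy sequence the curve rises while G exceeds C and falls once C exceeds G, giving a Λ-shaped profile whose apex marks the position where the skew changes sign. Calling the function with `x="A", y="T"` gives the corresponding AT skew.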

While PR2 is obeyed by most genomes of organisms or organelles when their entire length is considered, notable exceptions also exist. These exceptions concern parts of genomes, particularly the halves of circular bacterial chromosomes delimited by the ‘origin of replication’ (ori) and its diametrically opposite point, the ‘terminus of replication’ (ter). Frank and Lobry (2000) developed Oriloc, a successful online computational tool for positioning ori and ter in bacteria, based on the above observation. While, to a first approximation, departures from PR2 originate in deviations from equivalence between the leading and lagging strands during DNA replication, transcription also contributes, as the strands of opened DNA are exposed to different environmental influences during transcription as well as during replication (Beletskii and Bhagwat 1998; Seplyarskiy and Sunyaev 2021). In this way, asymmetric mutational pressures between the leading and lagging strands, or between the coding and complementary strands, can shape the nucleotide composition and violate PR1 (and thus PR2) through both processes (Francino and Ochman 2001; Fijalkowska et al. 1998; Nikolaou and Almirantis 2005; Necşulea and Lobry 2007).

Two additional points might be mentioned here in brief:

  • First, in bacteria, not only mononucleotide skews (i.e., deviations from PR2) but also skewed first-neighbor preferences are shown to follow V- or Λ-shaped patterns, as a result of deviations from PR1 related to mutational dynamics, as we have systematically investigated in a previous work (Apostolou-Karampelis et al. 2016). Moreover, as analytically discussed therein, PR1 symmetry produces intra-strand first-neighbor preferences of the PR2 type, under conditions of inter-strand equality of context-dependent substitution rates (see Fig. 1 in the aforementioned work).

  • Second, in the eukaryotic chromosome, a more complicated pattern emerges when single-nucleotide skews are considered, the so-called ‘factory-roof motif’, due to the coexistence of numerous origins of replication. This is again a consequence of functional asymmetry between the two strands (Touchon et al. 2005).

While entire bacterial and eukaryotic chromosomes mostly follow PR2, in some cases organellar genomes deviate from what PR2 dictates over their entire length (Mitchell and Bridge 2006). In a systematic review of these small and peculiar genomes (Nikolaou and Almirantis 2006), it is shown that animal mitochondria deviate considerably and systematically from PR2. Indeed, this compositional feature (here, of whole mitochondrial genomes) is shown to relate to an asymmetry in structure and function between the two strands: mitochondrial DNA strands differ in their purine and pyrimidine content and are thus named heavy and light strands. Moreover, in vertebrate mitochondria, genome duplication occurs through the rolling circle mode, meaning that both strands are replicated continuously starting from two different single-strand origins. These replication dynamics expose the two strands to different mutational dynamics along their whole length, which represents a complete deviation from PR1 and consequently leads to a whole-length violation of PR2 in these organellar genomes.

We have to keep in mind that the symmetry considerations between DNA strands involved in the explanation of PR2 are not of the type described in the prerequisites of Noether’s theorem (Noether 1918; Byers 1998; Wigner 1954; Kosmann-Schwarzbach 2011). As briefly stated here and analyzed in (Sueoka 1995, 1996; Lobry 1995a, b), they consist of the absence of biases in mutation and selection between the two strands, and are thus mainly of functional origin. This makes any parallelism between Noether’s theorem and the emergence of PR2 a metaphor rather than a direct application. However, both in (a) the case of symmetry causing conservation in Noether’s theorem and in (b) the case of PR1 causing PR2 discussed here, the cause-effect link can be derived mathematically and exactly. The prerequisite (some kind of symmetry or equivalence between strands) and its consequence (the compositional constraints A = T and G = C, which hold true to the extent that strand equivalence is valid) bring the connection between PR1 and PR2 close to the essence of Noether’s theorem. Although the PR1 → PR2 relationship lies outside the range of the mathematical prerequisites of Noether’s theorem, we may consider PR2 as the consequence of PR1, just as symmetry acts as a cause and conservation as a consequence in Noether’s theorem. Additionally, the PR1 relations have as a consequence that Pu = Py = 0.5, which, technically, comes close to a typical ‘conservation’ property. Therefore, the symmetry expressed by PR1 leads to several forms of quantitative relationships: compositional variables of specific information-bearing macromolecules are constrained to remain almost equal, and the two strands are led to always conserve equal percentages of purines and pyrimidines.

Although applications of Noether’s theorem come principally from the field of physics, here symmetry engenders quantitative relationships within the realm of biology. Another aspect that makes this analogy peculiar is that here the relative nucleotide skews only approach zero in the infinite-time limit (i.e., the conservation-like relationships are approximate), while deviations from equality (skews) are studied as bearing valuable information about complex biological processes, which may involve several mutational as well as selectional biases (Grigoriev 1999; Seplyarskiy and Sunyaev 2021; Francino and Ochman 2001; Fijalkowska et al. 1998; Nikolaou and Almirantis 2005; Necşulea and Lobry 2007). In entirely different contexts, namely developmental biology and brain dynamics, relations of Noether’s theorem to biology have been reported recently (Papageorgiou 2020; Bilteanu et al. 2017).

Overall, the observation of PR2 in genomic sequences is a case where symmetry in function between DNA strands, i.e., (a) equivalence in the mutational pressure they undergo and (b) equivalence in their biological roles (thus near-equality of the constraints due to evolutionary selection), has compositional consequences. The interplay between (interstrand) symmetry and (intrastrand) ‘conservation’ relationships has parallels with Noether’s theorem, although, as stated, it cannot be considered a direct application of this theorem. Instead, we suggest that this seminal theorem may be seen as a metaphor for PR2, which brings out a conceptual analogy between physics and biology. The metaphor discussed in the present note may be viewed in the perspective described by Eugene Koonin in his article “Are There Laws of Genome Evolution?” (Koonin 2011) and in other related studies (West et al. 1997; Guiot et al. 2003; Li 2011; Marchal 2015; Militaru and Munteanu 2013; Longo and Montévil 2011). Koonin’s work (Koonin 2011) presents a selection of the most conspicuous universal regularities, which share the status of ‘laws of evolutionary genomics’ in the same sense that ‘law’ is understood in modern physics. They include: a log-normal distribution of the evolutionary rates between orthologous genes; power-law-like distributions of membership in paralogous gene families and of node degree in biological ‘scale-free’ networks; a negative correlation between a gene’s sequence evolution rate and its expression level (or protein abundance); and distinct scaling of functional classes of genes with genome size. For the original references on these relationships and a detailed discussion, see (Koonin 2011).

We would like to stress Koonin’s remark that the universal regularities discussed in his article do not appear to be shaped by selection but rather are emergent properties of gene ensembles. The phenomenon examined herein is similar in that respect: according to the analyses of Sueoka (1995) and Lobry (1995a, b), PR2 results from the mutational dynamics and from the whole cellular activity. This is the case provided that, when the entire genome is considered (e.g., in typical bacterial genomes), selection and functional divergence between strands either act weakly or have a strand-independent impact. Therefore, the cause-effect link between PR1 and PR2 approaches the status of a physical law, and thus can be seen as an analogue of Noether’s theorem, to the extent that the dependence on natural selection is relaxed. This means that PR2 is non-adaptive in the same sense that the emergent properties reviewed by Koonin (e.g., scale-free networks of biochemical reactions) are not shaped by selective pressure. All these regularities can be considered as constituting an ensemble of ‘physical laws’ met in biological systems.

In the last two decades, much work has investigated genomic composition in search of PR2-type relations between oligo-nucleotides (k-mers) and their reverse complements, with very interesting results. In several works, mechanisms additional to PR1 → PR2 (i.e., additional to the compositional intra-strand relations generated by inter-strand symmetry) have been formulated. It is beyond the scope of the present work to review this research field systematically. Indicatively, we recall below some characteristic cases and their connections to the PR1–PR2 relationship.

In the work of Afreixo and co-workers, see e.g. (Afreixo et al. 2013, 2016), the appearance of PR2 relations has been explored for k-mers with k up to 10. This research group has examined the statistical significance of the measured compositional traits, focusing on ‘exceptional symmetry’, meaning symmetry beyond that expected under an independent-nucleotide assumption (i.e., under no neighbor preference). They conclude that, up to k = 7, characteristic deviations occur from what is expected on the grounds of randomness. They also observed a complex compartmentalization of the genome involving k-mer abundances, patterns of occurrence of PR2, and diversification of genomic functionality, including protein-coding and non-coding regions.
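To make the k-mer form of PR2 concrete, the sketch below counts all overlapping k-mers on a single strand and compares each count with that of its reverse complement. The summary score `pr2_deviation` is a simple normalization introduced here for illustration. It is not the ‘exceptional symmetry’ statistic of Afreixo and co-workers, and it does not correct for what single-nucleotide composition alone would predict.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand direction."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Counts of all overlapping k-mers read along one strand."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pr2_deviation(seq, k):
    """0.0 if every k-mer is as frequent as its reverse complement;
    1.0 if no k-mer occurs together with its reverse complement."""
    counts = kmer_counts(seq, k)
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    total = sum(counts[w] + counts[reverse_complement(w)] for w in words)
    return diff / total if total else 0.0


# Toy inputs (hypothetical): a strand joined to its own reverse complement is
# exactly symmetric by construction, whereas a homopolymer is maximally asymmetric.
s = "AAAGCTTACGGA"
symmetric = s + reverse_complement(s)
print([pr2_deviation(symmetric, k) for k in (1, 2, 3, 4)])
print(pr2_deviation("AAAA", 2))
```

For k = 1 the score reduces to a combined measure of the A versus T and G versus C imbalance, so the same function covers the single-nucleotide rule as a special case.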

Forsdyke and co-workers have developed an explanation for the establishment of PR2 relationships for both single nucleotides and k-mers, proposing that “Chargaff’s second parity rule reflects the evolution of genome-wide stem-loop potential …”; see (Bell and Forsdyke 1999) and references given therein. This approach draws attention to a factor that might contribute to the finally measured PR2 relations, provided that a given sequence does exhibit considerable stem-loop potential. However, this is expected mainly in protein-coding regions, thus leaving unexplained the PR2 equalities that hold in noncoding regions, especially within the long eukaryotic chromosomes. Later on, Forsdyke, following the same train of ideas, also elaborated on alignment-free phylogenetic classification (Forsdyke 2019), which represents a particularly fertile line of thought, although far from the subject of the present research note.

Albrecht-Buehler (2006) suggested that “… inversions and inverted transpositions could be a major contributing if not dominant factor in the almost universal validity …” of PR2 for single- and oligo-nucleotides. Although this mechanism certainly contributes to PR2, the degree of its contribution to the final picture remains an open question.

Neither the ‘stem-loop potential’ approach nor the ‘inversions and inverted transpositions’ approach can explain, even in principle, the V- and Λ-shaped skew patterns of single nucleotides (Grigoriev 1999) or the corresponding patterns found in the first-neighbor preferences (Apostolou-Karampelis et al. 2016) in bacteria. Nor can they account for the ‘factory-roof motif’ in eukaryotic chromosomes (Touchon et al. 2005).

We have to stress here that inter-strand symmetry in function (PR1) can account not only for single-nucleotide PR2, but also for equalities in the frequency of occurrence of k-mers. This last property arises from strand equivalence in contextual (neighbor-dependent) substitution rates, which then engenders PR2-type relations for k-mers. In a previous work (Apostolou-Karampelis et al. 2016), one of us and co-workers analyzed the link between inter-strand symmetry and PR2 relations for k-mers. The appearance of accurate PR2 equalities for dimers does depend on equivalence between strands, which concerns known molecular mechanisms: (a) no or minimal strand bias in the function of the PolIII α-subunit during RNA transcription is necessary for the appearance of PR2 relations for k-mers; (b) the contextuality (dependence on neighboring nucleotides) of transcription-coupled repair and (c) that of other repair mechanisms, or of genomic dynamics in general, are examined and shown to influence the frequency of occurrence of specific dinucleotides. Therefore, it was shown in (Apostolou-Karampelis et al. 2016) that PR2 for k-mers ultimately reduces to inter-strand symmetry considerations.

Rosandić and co-workers (Rosandić and Paar 2014; Rosandić et al. 2016, 2019) relate the appearance of PR2 to the inter-strand mirror symmetry in 20 symbolic purine-pyrimidine symmetry quadruplets of trinucleotides (direct, reverse complement, complement, and reverse) mapped onto the double-stranded genome. Their description is compatible with the above dynamical approaches and emphasizes the link between the compositional relations (PR2) at all k-mer lengths and inter-strand symmetry (PR1).

Closing this research note on the connection between Chargaff’s PR2 relations and Noether’s theorem, we should stress that the context of this type of work is very wide, see e.g. (Byers 1998; Wigner 1954; Kosmann-Schwarzbach 2011; Wigner et al. 1969; Monod 1978), as views from physics are progressively incorporated into mainstream biology, especially in the study of the interplay between the genome and its function.