Introduction: Scope of the Call for Rethinking the Axioms of Genomics

Upon publication of the results of the Encyclopedia of DNA Elements (ENCODE) project, a 4-year research effort led by the US Government, its architect issued a mandate: “the scientific community will need to rethink some long-held views” [1].

Which views require revision? The sequencing of the human genome engendered the idea of genomics as an information science [2]. New avenues are opening up to scientific research as breakthroughs out of a decades-old theoretical cul-de-sac lead to theoretical and experimental advances, and these avenues must be explored. “DNA has two types of digital information—the genes that encode proteins, which are the molecular machines of life, and the gene regulatory networks that specify the behavior of the genes” [2]. This paper reviews the historical conflict in genomics between the concepts of gene and gene regulation and offers a guiding principle for their synthesis. The introduction of this principle is made possible by the long-delayed but now dignified removal of two pragmatic dogmas, which are replaced by a sound information-theoretical axiom.

In response to the open request of the post-ENCODE era for increasingly rigorous theoretical and mathematical foundations, this note puts forward a principle learned from data in the course of an attempt to mathematize biology. In early efforts (Footnote 1), recursive algorithms came to the fore in explaining the function of neural networks (Footnote 2). However, since such an idea ran counter to the central dogma of molecular biology [10] that then prevailed in genetics, in which only a “forward growth” mindset held (see Figs. 1, 2, and 3), this author subdued his claim, stating that “establishing a rigorous relation of these ‘code sequences’ to the genetic code that underlies the morphogenesis of differentiated neurons may be far in the future” [5]. Now, with the ENCODE report in hand and the field of neural networks blossoming, the time seems ripe to advance a recursive principle whereby the genome governs the growth of organelles, organs, and organisms.

Fig. 1. Nascence of the “Central Dogma of Molecular Biology”; the original concept diagram by Francis Crick in 1956 (not known to have been published, but acknowledged by Crick [61])

Fig. 2. Watson’s simplified rendering of Crick’s central dogma states what is certainly a fact and strips the dogma of its controversial prohibitions ([62], p. 298)

Fig. 3. Gene Paradigm for Forward Growth as of 2003. The left diagram of the double helix, with a “genic” region highlighted, is modified from the cover of Scientific American (April 2003). The diagram on the right side is a brain cell. The figure depicts the oversimplification that 1.3% of the DNA could determine, in a forward-growth manner, not only the Purkinje neuron shown but, given enough genes, all phenotypes resulting from separate genotypes. The 98.7% of junk DNA is a no man’s land to which there is no recursion

In post-ENCODE genomics, the issue of “genome regulation” takes its long-deserved place. The concept of controlling elements began with Barbara McClintock in the 1940s. Before the double helix was revealed [11], she discovered transposable elements in maize; she called them controlling elements because they altered gene expression. She published [12] in 1956, the same year in which Crick conceived his central dogma, from which key feedback pathways were arbitrarily excluded. Research on gene regulation advanced to the operon theory of Jacob and Monod [13], for which they received the Nobel Prize in 1965. Jacob reminisced in his Memoirs about “...one of the oldest problems in biology: in organisms made up of millions, even billions of cells, every cell possesses a complete set of genes: how then, is it that all the genes do not function in the same way in all tissues?” [14]. This profound question is examined below.

By 1969, the field of gene regulation was growing at a healthy rate, resulting in the work of Britten and Davidson [15]. In 1970, however, a wound opened up in genomics that has not yet healed. On the one hand, the first major failure of Crick’s doctrine was revealed [16], and both Crick and Watson responded, but in different ways [17, 18]. On the other hand, to hold the establishment together, Ohno declared in the same year that all but the genic DNA was garbage DNA [19], a notion that, under the slightly modified term junk DNA [20], prevailed for a generation.

The objective of this paper is to provide a historical review of this bifurcation and to offer a theoretical synthesis to remedy it. With post-ENCODE genomics now removing obsolete impediments, the principle of recursive genome function (PRGF) is expected to evolve rapidly, especially since some workers have labored for years in a clandestine fashion, quietly disregarding obsolete views (Pellionisz 2002; see in [21]).

More generally, the profound impact of changing axioms on fields such as medicine, bioenergy, nanotechnology, and synthetic biology, and even on philosophy, should be outlined. As in physics [22] and even in neuroscience (Neurophilosophy, [23]), the reversal of long-held views may have philosophical implications, giving rise to what might be called genome philosophy [24]. For instance, in light of the ENCODE results, a synthesis may be necessary that integrates some goal-directed Lamarckian notions of evolution [25], not in opposition to, but rather going beyond, the simplified original Darwinian notion [26], learned by all of us in school, that natural selection suffices for the emergence of species. This leads us to a few key conclusions concerning algorithmic approaches vis-à-vis the ENCODE study [27].

Core Idea: The Principle of Recursive Genome Function Reverses the Double Lock on Our Understanding of the Double Helix

The main body of this essay aims at two goals. The first, and far lesser, task is a brief review of history, to put to rest a long-standing but increasingly controversial theoretical double lock on the understanding of the function of the double helix from the viewpoint of what we might call genome informatics; this has been attempted, and to different extents attained, numerous times before (see the brief reviews below). The second, far more important and difficult goal follows from the maxim that “data never kill theories, only better theory can kill less tenable theories”: this note should leave no vacuum after removing the key dogma. Instead, it replaces two discarded, obsolete conjectures, the central dogma of molecular biology and the notion of junk DNA, and, by reversing them, synthesizes a single principle (PRGF) that is more completely grounded in empirical data and withstands more scrutiny from the viewpoint of information theory.

Recursive genome function is expressed by a process in which already-built proteins iteratively access first primary and then auxiliary information packets of DNA to build hierarchies of protein structures.

In the abstract, recursion denotes a process of defining functions in which the function being defined is applied within its own definition.

Applying these postulates to the genome, the most concise formulation is as follows (see Eq. 1).

Each finite state Z of the protein system, indexed 1, …, m (e.g., the (n + 1)st state, denoted by $Z_{n+1}$), relies on the previous state of the protein system (e.g., $Z_n$) through a recursive function f. The process is bounded by the maximal number of states, m; at each step, the function f is executed on the nth state $Z_n$ to yield $Z_{n+1}$:

$$Z_{n + 1} = f\left( Z_n \right), \qquad n = 1, \ldots, m - 1 \tag{1}$$
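The verbal definition above (each state $Z_{n+1}$ obtained by applying f to $Z_n$, with at most m states) can be sketched as a bounded iteration. This is only an illustration under assumptions: the function name `recursive_states` and the toy choice of f (a state counted as a number of branch tips that doubles at each step) are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, TypeVar

S = TypeVar("S")

def recursive_states(z1: S, f: Callable[[S], S], m: int) -> List[S]:
    """Return the states Z_1, ..., Z_m, where Z_{n+1} = f(Z_n).

    m is the maximal number of states; the iteration stops there.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    states = [z1]
    while len(states) < m:
        states.append(f(states[-1]))  # Z_{n+1} = f(Z_n)
    return states

# Hypothetical f: each state is a number of branch tips, which doubles.
print(recursive_states(1, lambda z: 2 * z, m=5))  # [1, 2, 4, 8, 16]
```

Any state type works (numbers, strings, structures), since f only needs to map one state to the next.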

The chain below (see also Fig. 4) pictures, in the simplest terms, the principle of recursion in genome function. It shows the cardinal role that the main path of recursive processes plays in the construction of protein systems:

$$DNA \to RNA \to PROTEIN \to DNA \to RNA \to PROTEIN \to DNA \to \ldots $$
Fig. 4. PRGF breaks through the Double Lock of the central dogma and junk DNA barriers (shown in a by triple lines), yielding the recursive path shown in b by a checkered circle. The background figure in both a and b is Fig. 2 of Crick (1970; the right side of Fig. 5 in this paper), which permits only a “forward growth” from DNA that dead-ends in proteins. Once both the central dogma, which arbitrarily forbids information feedback from proteins to DNA, and the “Junk DNA” conjecture, which claimed that zero information would be found in the “Junk DNA” even if there were a path back to DNA, are removed, a main recursive path (PRGF, checkered circle and arrows) is not only available in principle but is the principle of recursive genome function

These recursive feedback processes then snowball into evolving (protein) structures, governed by DNA.

Purkinje brain cells provide an illuminating example of building a protein structure by means of an L-string replacement recursive algorithm [28]. The application of that algorithm is given elsewhere [5]. Experimental support for the quantitative predictions of the recursive approach is also available [21].
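To show what L-string (Lindenmayer) replacement means in practice, here is a minimal sketch of parallel string rewriting. The single branching rule below is a hypothetical textbook example chosen for illustration; it is not the rule set of [5] or [28], and the names `l_system` and `BRANCHING_RULES` are our own.

```python
from typing import Dict

def l_system(axiom: str, rules: Dict[str, str], iterations: int) -> str:
    """Apply parallel string rewriting (an L-system) `iterations` times."""
    word = axiom
    for _ in range(iterations):
        # Every symbol is replaced simultaneously; symbols without a rule are copied.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word

# Hypothetical branching rule (not the rule set of [5] or [28]):
#   X = growing tip, F = dendritic segment, [ ] = start/end of a side branch,
#   + / - = turn left / right.
BRANCHING_RULES = {"X": "F[+X][-X]"}

for n in range(4):
    word = l_system("X", BRANCHING_RULES, n)
    print(n, word.count("X"), word)
```

Each rewriting step turns every tip into a segment with two new tips, so the tip count doubles per step (1, 2, 4, 8), a recursive growth pattern of the Y-branching kind discussed for Purkinje cells.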

Readers will note that the PRGF is consistent not only with the recursive algorithms used in neural networks but is conceptually akin to a particular recursive formula, viz., the Mandelbrot set (see Eq. 2, from [29]):

$$Z_{n + 1} = Z_n^2 + C \tag{2}$$
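The Mandelbrot recursion $Z_{n+1} = Z_n^2 + C$ is usually explored with the standard escape-time test: start from $Z_0 = 0$, iterate, and stop once $|Z_n| > 2$, after which the orbit is known to diverge. The sketch below is a generic illustration of that test; the function name `escape_time` and the iteration cap are our own choices.

```python
from typing import Optional

def escape_time(c: complex, max_iter: int = 100) -> Optional[int]:
    """Iterate Z_{n+1} = Z_n**2 + C from Z_0 = 0.

    Returns the first n at which |Z_n| > 2 (the orbit then diverges), or
    None if the orbit stays bounded for max_iter steps (C is then
    provisionally treated as a member of the Mandelbrot set).
    """
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return None

# Coarse text rendering of the set over a small window of the complex plane.
for y in (1.0, 0.5, 0.0, -0.5, -1.0):
    print("".join("#" if escape_time(complex(x / 10, y)) is None else "."
                  for x in range(-20, 6)))
```

For a Julia set, the same recursion is used with C held fixed while the starting value $Z_0$ varies.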

Further, fractal sets (see also the Julius Ruis set [30]) represent just one class of recursive algorithms.

The postulated principle of recursive genomic function opens new avenues by way of a class of recursive algorithmic functions. Just one (fractal) example is given to show that the formulation of experimentally testable hypotheses for genomic function is plausible and supported by experimental results [21, 31].

Deep Background: Recursion is a Well-accepted Process in Science; Why Should it Not Become a Principle of Genome Function?

Recursion is a well-established concept in the sciences, ranging from pure mathematics to biological neural nets (cited in “Introduction”). A common linchpin between these two domains is the least squares algorithm, which minimizes errors; see the recursive mathematical and neural network basics [32]. From the viewpoint of information theory, it is particularly notable that informational recursion from proteins (which are exposed to the external world) drastically changes the conception of genome function from a closed system (to which the second law of thermodynamics applies, with entropy increasing) to a system open to the world. Such recursion, involving outside factors, helps resolve the paradox between random mutation and natural selection theory, in which it is questioned whether the genome, if regarded as a closed system, can cope with an out-of-bounds increase in genomic entropy [33]: given an open system, entropy can be regularized [34]. Genome function has hitherto been a strange exception to the widespread modeling of living and nonliving systems in terms of recursion. This singularity is especially peculiar because great physicists of the last century already predicted that ours would become the century of biology, and their physics-minded thinking, as in Wiener’s Cybernetics [35], explicitly invoked feedback as a primary principle in animal and machine. Schrödinger’s What is Life? [36], von Neumann’s The Computer and the Brain [37], and Szilard’s A theory of aging [38] argued in unison that information-theoretic aspects would become key to a future understanding of biology. However, biology is a very young science: a mere 231 years have passed since the term was coined [39]. Genetics, as we knew it in pre-ENCODE genomics, is only slightly more than a century old [40, 41]. Thus, the mathematical rigor that has characterized physics for over two millennia since Aristotle ([42], ca. 400 b.c.) could not be hastily imposed on unripe subjects, which, moreover, were for a long time somewhat unready and occasionally unwilling.
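The least squares linchpin mentioned above has a well-known recursive form, in which each new sample corrects the previous estimate without storing past data. As a hedged illustration, the sketch below is the standard textbook recursive least-squares (RLS) update for a one-weight model y ≈ w·x; it is not code from [32], and the names `rls_scalar`, `p0`, and `lam` are our own.

```python
from typing import Iterable, Tuple

def rls_scalar(samples: Iterable[Tuple[float, float]],
               p0: float = 1e6, lam: float = 1.0) -> float:
    """Recursive least squares for the one-weight model y ~ w * x.

    p0 is the initial uncertainty (large = weak prior toward w = 0);
    lam is a forgetting factor (1.0 = no forgetting).
    """
    w, p = 0.0, p0
    for x, y in samples:
        k = p * x / (lam + x * p * x)   # gain for this sample
        w = w + k * (y - w * x)         # correct the estimate by the prediction error
        p = (p - k * x * p) / lam       # shrink the uncertainty
    return w

data = [(float(x), 3.0 * x) for x in range(1, 6)]
w_rls = rls_scalar(data)
w_batch = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
print(round(w_rls, 6), w_batch)  # 3.0 3.0
```

With `lam = 1.0` and a large `p0`, the recursive estimate converges to the ordinary batch least-squares solution, which is why the recursive and batch views of the same error-minimizing principle are interchangeable.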

The above does not mean, however, that recursive algorithms have not been applied to genomic systems, e.g., for extrinsic description and construction. In fact, the plethora of extrinsic applications makes it curious that the ideas forming PRGF have not heretofore overcome resistance and achieved a breakthrough, a fact attributable to doctrines protected by the bulwark of scientists who underwrote pre-ENCODE genomics.

Recursive algorithms have encircled genomics and not only in respect of neural networks. Noted representative examples are genetic algorithms [43], recursive PCR [44], algorithms using DNA sequences as templates for encryption [45], construction of DNA structures by recursive algorithms [46, 47], and reconstruction of the genome by recursive assembly [48–51].

The above encirclement of genomics by recursive approaches greatly facilitates a dignified removal of both the central dogma and junk DNA conjectures. A breakthrough from Fig. 4a to b completes the liberation of post-ENCODE genomics, to better and more fully embrace the principle of recursion. Significantly, the inherent mathematics of genomic informatics would no longer be perceived as running against the establishment.

Obituaries of the central dogma and junk DNA are offered elsewhere. In brief, in recent times the demise of the obsolete axioms has been announced almost yearly, as leaders have summarily refuted them [52–57]. Specific factual anomalies contradicting the doctrine have been reported for decades (see the separate section below). One should appreciate that serious reservations were voiced even at the outset; see, e.g., Jacob’s Memoirs (p. 288 in [14]). Within half a decade of Crick’s concept, Jacob and Monod [13] provided Nobel Prize-winning evidence that operon regulation exerts a demonstrable feedback on DNA gene activity. Likewise, the junk DNA misnomer was summarily voided of its scientific validity, as recently as in the suggested formal abandonment of the term as a scientific notion (International PostGenetics Society, 2006; paper of 20 Founders rejected without review) and later in the ENCODE report stating that “the DNA is pervasively transcribed” [58]. The conjecture was finally put to rest by Mattick [59].

In all fairness, upholding vague and even controversial axioms in the nascent stage of biology (compared with physics, which is more than ten times older and thus much more mathematical) was a necessity dictated by practical constraints. This is perhaps best described by Brenner in his Nobel Lecture [60]:

In 1985, when the first suggestions were made to sequence the human genome, I thought that the sequencing techniques, even with incremental improvements, would not be equal to the task, and would require a factory scale operation to do it. I had also come to the conclusion that most of the human genome was junk, a form of rubbish which, unlike garbage, is not thrown away. My view at the time was that we should treat the human genome like income tax and find every legitimate way of avoiding sequencing it.... I was puzzled by the enormous variations in the amounts of DNA in different organisms. Indeed, whereas most physicists thought that organisms did not have enough DNA to specify their complexity, it was clear to me that many organisms had too much. I discovered from Hinegardner that one group of fish, the Tetraodontidae, which included the Japanese pufferfish, Fugu, had very small genomes, with a haploid content of about 400 megabases as opposed to the 3000 megabases of mammalian genomes. Although teleost fish are distant from humans they are still vertebrates, with the same body plans, development and physiological systems as ourselves. Because of these basic similarities it seemed unlikely that Fugu, with a haploid DNA content one eighth that of mammals would have eight times fewer genes, making it much more probable that what was missing in Fugu was junk DNA. [60]

This pragmatic consideration is also explicit in Crick’s 1970 “revision” ([17], cf. his Fig. 1, reproduced in Fig. 5a of this paper).

The principal problem could then be stated as the formulation of the general rules for information transfer from one polymer with a defined alphabet to another. This could be compactly presented by the diagram of Fig. 1. [of Crick, 1970] (which was actually drawn at that time [1958], though I am not sure that it was ever published) in which all possible simple transfers were represented by arrow. The arrows do not, of course, represent the flow of matter but the directional flow of detailed, residue-by-residue, sequence information from one polymer molecule to another. Now if all possible transfers commonly occurred it would have been almost impossible to construct useful theories. [Emphasis added, Pellionisz]. Nevertheless, such theories were part of our everyday discussions. This was because it was being tacitly assumed that certain transfers could not occur. It occurred to me that it would be wise to state these preconceptions explicitly. [17]

Fig. 5. (From Figs. 1 and 2 of Crick 1970). Left side permits an infinite number of possible recursive paths. Right side arbitrarily prohibits the main path of recursion by “dead-ending” proteins. The DNA → RNA → PROTEIN → DNA recursion is just a single obvious recursive path, and the fractal approach already elaborated to some extent is just one of the possible recursive algorithms
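The contrast between the two sides of Fig. 5 can be made concrete by treating each diagram as a directed graph and asking whether any path leads from PROTEIN back to DNA. In this sketch the left side is modeled as all nine simple transfers among the three polymers, and the right side as Crick’s 1970 “general” (DNA→DNA, DNA→RNA, RNA→protein) and “special” (RNA→RNA, RNA→DNA, DNA→protein) transfers. The names `ALL_TRANSFERS`, `CRICK_1970`, and `has_path` are our own.

```python
from collections import deque
from itertools import product
from typing import Set, Tuple

Edge = Tuple[str, str]
MOLECULES = ("DNA", "RNA", "PROTEIN")

# Left side: every simple transfer between the three polymers (9 arrows).
ALL_TRANSFERS: Set[Edge] = set(product(MOLECULES, repeat=2))

# Right side: Crick's 1970 "general" and "special" transfers only.
CRICK_1970: Set[Edge] = {
    ("DNA", "DNA"), ("DNA", "RNA"), ("RNA", "PROTEIN"),   # general
    ("RNA", "RNA"), ("RNA", "DNA"), ("DNA", "PROTEIN"),   # special
}

def has_path(edges: Set[Edge], start: str, goal: str) -> bool:
    """Breadth-first search for a directed path of length >= 1 from start to goal."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for src, dst in edges:
            if src == node and dst not in seen:
                if dst == goal:
                    return True
                seen.add(dst)
                queue.append(dst)
    return False

for label, edges in (("all transfers", ALL_TRANSFERS), ("Crick 1970", CRICK_1970)):
    out_of_protein = sum(1 for src, _ in edges if src == "PROTEIN")
    print(f"{label}: arrows leaving PROTEIN = {out_of_protein}, "
          f"PROTEIN -> DNA reachable = {has_path(edges, 'PROTEIN', 'DNA')}")
```

On the right-side graph, PROTEIN has no outgoing arrow, so the DNA → RNA → PROTEIN → DNA cycle cannot close, whereas on the left-side graph it can.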

The saga (life, death, and obituary) of junk DNA is treated elsewhere. Suffice it here that both the central dogma and the interlocking junk DNA dogma were finally and officially put on hold only by the conclusions of the ENCODE pilot project, on June 14th, 2007 [58].

Historical Recount: Specific Review of the Nascence and Demise of the Central Dogma and Junk DNA Conjectures

The central dogma, in its various renderings, held that information is never transferred from proteins and RNA back to DNA. The central dogma of molecular biology was put forward in Francis Crick’s talks from 1956 (Fig. 1; cf. Jacob’s recollection [14], p. 286) and published 2 years later [61].

This concept might be called the “don’t look back” or “no feedback permitted” postulate. Further, proponents of junk DNA [20] claimed that, even if such a process could be found, recursion from proteins and/or RNA to DNA could not retrieve information from functionless junk DNA (98.7% of human DNA). This conjecture may be called the “even if you look back, you find only junk” postulate. Recursion for information was thus not only forbidden but also prejudged as useless, on the assumption that there was no information to retrieve.

Watson’s reduced version of the central dogma [62], which emphasized what was surely found in genomic function and avoided needless and unsupported prohibitions, may be what helped the central dogma gain common acceptance from 1965 to 1969–1970, skirting the sharp personal criticisms that were already present at its conception (e.g., questions about its dogmatic stance; see Jacob’s Memoirs [14], p. 288).

A factor in the prevalence of the central dogma during this period might be that, since the 1965 Nobel Prize for Jacob and Monod’s work on operon regulation [13], it had to be evident to Watson that, given the operon regulation of gene expression (as a function of the level of produced proteins), the prohibition of a protein-to-DNA information channel need not be in Watson’s textbooks. His simplified and convenient view (avoiding controversial prohibitions and their refutation by data) is called here the concept of forward growth—which is, without recursion, only half the loop.

This view, backed by Watson, prevailed so strongly that even the 50th anniversary issue of Scientific American, celebrating the discovery of the structure of DNA by Watson and Crick (1953), depicted as the general understanding that gene expression of DNA results, through RNA, in the construction of protein structures (see Fig. 3, a composite illustration of the modified cover page figure of Scientific American of April, 2003, where on the right side the model of a Purkinje brain cell is pictured, from [5]).

Reality was, as might be expected, much more complicated than any simplification. The central dogma was squarely confronted as early as 1969, in the Britten and Davidson theory of gene regulation [15], and by 1970 (cf. the philosophical reflection by Darden [63]) by the discovery of reverse transcription from RNA to DNA in viruses later called retroviruses [16, 64].

Both Watson and Crick responded promptly but separately to the challenge posed by the discovery of the new enzyme that flagrantly violated their views. In the June 27, 1970 issue of Nature, the reverse transcriptase discovery was announced, and an anonymous “News and Views” article claimed: “Central Dogma Reversed.”

(Quote from Darden [63]): Watson, in the 1970 second edition of his Molecular Biology of the Gene [18], said: “The concept of a DNA provirus for an RNA virus is clearly a radical proposal. It overturns the belief that flow of genetic information always goes in the direction of DNA to RNA and never RNA to DNA. [Emphasis added, AJP] On the other hand, it offers an even greater variety of ways for cells to exchange genetic information. Considering the enormous complexity of biological systems, it would not be surprising if this device were uniquely advantageous in some situations.” ([18], pp. 621–622)

Crick (1970) also responded immediately to the challenge [17] but in a different way (unfortunately, the dogma was not allowed to gracefully expire). Crick published a paper in Nature. His version of the central dogma, he contended, had not been reversed, as the anonymous Nature article had claimed. Crick stated, correctly, that in 1958, he had framed the central dogma in terms of the general transfer of information from nucleic acids to protein—but not the reverse (Crick 1958). That abstract claim had not yet been challenged. If it were shown that information could flow from proteins to nucleic acids, he said, then such a finding would “shake the whole intellectual basis of molecular biology” ([17], p. 563; quote from Darden [63], emphasis added, Pellionisz).

Thus, by 1970, the intellectual split between Crick, the originator of “The Central Dogma” (1956, [61]), and its promoter Watson (1965, [62]), threatened the collapse of the genomics establishment. The shaky ground of “The Central Dogma” was not really firmed up by the confession: “Dogma was just a catch phrase” (Crick, quoted in [65]).

In the same year as the split, Ohno’s junk DNA idea came to the rescue. Ohno first referred to garbage DNA in the human genome ([19], p. 62), meaning that, even if there were recursive information access to DNA (from proteins or from RNA, e.g.), supposedly no information would be found and retrieved in the intronic and intergenic regions. Although the term “garbage DNA” was floated in 1970, it did not take hold; by 1972, in his presentation, Ohno began using the more suitable term “junk DNA,” which did stick [20]. One should appreciate that, immediately after his presentation, the first discussant rose to object vehemently to the basis of the junk DNA conjecture (see “Discussion” by Boyer, in [20]): “It thus seems to me that the permissible number of structural loci is—as yet—a somewhat suspect way to arrive at figures of 1% structural utility to 99% junk.”

Why is it that Nobel Prize-winning experimental work (cf. [66], Jacob’s Nobel Lecture on his Prize with Monod, 1965) was available within 5 years of Crick’s conception of his dogma, yet no theoretical confrontation developed? Their operon regulation [13] clearly demonstrated that the protein level, viewed as a result of genetic activity, did have an information-feedback mechanism on the genes in the DNA, such that DNA–RNA activity was down- or upregulated in accordance with the amount of protein already generated:

Experiments on genetic transfer by conjugation not only led to a revision of the concepts on the mechanisms of information transfer which occur in protein synthesis; they also made it possible to analyze the regulation of this synthesis.... the operator is not transcribed into messenger and repression can be exerted only at the level of DNA.... Gene expression was then usually believed to consist in the accumulation of stable structures in the cytoplasm, probably the RNA of ribosomes, which were assumed to serve as templates specifying protein structures... Such a scheme, which can be summarized by the aphorism “one gene-one ribosome-one enzyme”, was hardly compatible with an immediate protein synthesis at maximal rate. [13]
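The kind of feedback described above, in which the amount of protein already made down-regulates further synthesis, can be illustrated with a generic negative-autoregulation model. This is a textbook-style toy with hypothetical parameters, not the lac operon circuit of [13] (where an inducer inactivates the repressor protein); the function name and the Hill-type repression term are our own assumptions.

```python
def simulate_autorepression(alpha: float = 10.0, gamma: float = 1.0,
                            K: float = 1.0, h: float = 2.0,
                            dt: float = 0.01, t_end: float = 20.0,
                            p0: float = 0.0) -> float:
    """Euler integration of dP/dt = alpha / (1 + (P/K)**h) - gamma * P.

    The synthesis term falls as the protein level P rises, so the amount of
    protein already made feeds back on the activity of its own gene.
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        synthesis = alpha / (1.0 + (p / K) ** h)  # gene output repressed by P
        p += dt * (synthesis - gamma * p)         # production minus decay
    return p

regulated = simulate_autorepression()
unregulated = 10.0 / 1.0   # alpha / gamma: steady state with no feedback
print(round(regulated, 4), unregulated)  # 2.0 10.0
```

With these parameters, the regulated steady state solves 10 / (1 + P²) = P, i.e., P = 2, well below the level of 10 reached without feedback.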

In retrospect and judging from Jacob’s Memoirs [14], it seems evident that Jacob was fully aware of the intellectual conflict between the Jacob–Monod finding and Crick’s central dogma, even at its birth (not shying away from direct criticism of the label “dogma,” however):

In an acute sense of publicity, to baptize Central Dogma—that is to say, incontestable truth—a hypothesis that was unsupported by any serious argument. ([14], p. 288)

However, it appears that Jacob did not directly confront Crick on the latter’s conception of the dogma [61], since that conception preceded the publication of the Jacob–Monod operon concept. Jacob and Monod (1961) published their “operon regulation” work soon after but did not receive their Prize until 1965, whereas Watson and Crick received theirs in 1962, the year after the operon paper appeared. Thus, it was arguably more politic to avoid a direct conflict among the foursome. Almost simultaneously, however, in 1965, Watson [62] distanced himself from the central notion by putting forward his “simplified version,” emphasizing what was undeniably true without dwelling upon issues that were already controversial (see Fig. 2).

The point of this paper, however, is neither to be iconoclastic (discarding pragmatic doctrines while simply leaving a theoretical void in their place) nor merely to cite evidence for the widely reported feedback processes from proteins to DNA (and from RNA to DNA). The point is to fill the void: PRGF is proposed to break through the “double ceiling” of the central dogma and junk DNA that impeded theoretical advances for half a century.

Theoretical and Factual Breakdown of the Central Dogma and Junk DNA

Further flogging of two dead horses is avoided as much as possible. This abbreviated section merely supplies evidence, gives credit where most needed, and points to the most powerful reviews.

From the theoretical viewpoint of informatics, the “double lock” on recursive information has long been suspect. There is no source of hereditary information other than that present in the DNA. In humans, if 98.7% of the DNA is arbitrarily closed to access and its information is, in addition, declared void, the information harbored by the remaining 1.3% of the genome is deemed simply insufficient to govern the development of organisms as advanced as vertebrates. It was painfully experienced, for instance, that when constructing a computer model of 1.68 million brain cells of the frog cerebellum, algorithmic approaches had to be invoked, rather than pretending that an impossible amount of information was available to specify the vast neural network in every detail [67].
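A back-of-the-envelope calculation shows the scale of the information budget at issue. The sketch assumes the round figure of about 3000 megabases for a mammalian haploid genome (the figure quoted by Brenner above) and the upper bound of 2 bits per base for a four-letter alphabet; real information content is lower, so these numbers are ceilings, not measurements.

```python
GENOME_BP = 3_000_000_000   # ~3000 megabases, haploid mammalian genome (round figure)
BITS_PER_BASE = 2           # log2(4): upper bound for a four-letter alphabet
CODING_FRACTION = 0.013     # the 1.3% "genic" share discussed in the text

total_bits = GENOME_BP * BITS_PER_BASE
coding_bits = total_bits * CODING_FRACTION

print(f"whole genome: at most {total_bits / 8 / 1e6:.0f} MB")
print(f"1.3% share:   at most {coding_bits / 8 / 1e6:.2f} MB")
```

Under these assumptions, the whole genome holds at most about 750 MB and the 1.3% share at most about 9.75 MB, a budget too small to specify a large neural network connection by connection, which is why algorithmic (recursive) specification is invoked.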

As for factual contradictions, the array of evidence—against both the central dogma and junk DNA conjectures—is staggering; see respective reviews [52, 53, 56].

Beyond the early factual evidence belying the validity of the central dogma (“Operonic Regulation by Feedback from Proteins to DNA”), another large assortment of facts is available, as follows.

For an account of the (forbidden) information transfer from RNA back to the DNA, see the very recent review by Mattick [59], with background about the RNA world [68–72].

Major issues are gene silencing (or “turning genes on and off,” e.g., by so-called LINE way stations) and switching via SINEs. PRGF is fully consistent with the currently vague notion of “turning genes on and off” but goes further: by invoking recursion, not only are the sign(s) of the “parallel feedback” meaningful, but much more important information resides in the set of algorithmic values of recursive signals [73–76].

For the “forbidden” protein-to-DNA interaction, see the work on protein binding to DNA and on methylation of DNA by proteins, which renders DNA transcription reversibly or permanently impossible [77–82].

For the (also forbidden) protein-to-protein information transfer, see prions [83] and a detailed and philosophical review [52].

Specific Process for Purkinje Neuron Growth Governed by a Recursive Genomic Function

In its simplest form, PRGF can be metaphorically described as the way an assembler uses a user’s manual together with a streaming supply of parts. First, the assembler looks at step 1 of the manual and, following the instructions in the primary information packet, puts together the indicated components taken from the supply of parts.

Next, the assembler compares the emerging structure with step 1 as depicted in the instruction manual, referring to the primary source of information (in our case, the DNA, the “genes”). It is useful for the assembler to mark the just-completed step as done, to avoid repeating it by mistake.

Next, the assembler proceeds to step 2 in the instructions. Accessing the next auxiliary information packet, the assembler puts together the indicated components into the next layer of the hierarchy. Comparing the emerging structure in the second hierarchy with step 2 by looking back at the manual, the assembler marks step 2 as done.

The process goes on in a recursive fashion through the manual, until all the finite steps are taken. The assembler then runs out of instructions (all instructions are marked done).
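The assembler metaphor can be sketched as a short recursive procedure. This is an illustration of the metaphor only, under assumptions of our own: the manual is a hypothetical list of (layer name, number of parts) steps, the supply of parts is an endless numbered stream, and the names `assemble`, `Manual`, and `Structure` are not from the paper.

```python
from itertools import count, islice
from typing import Iterator, List, Optional, Tuple

# Hypothetical manual format: one (layer name, number of parts) entry per step.
Manual = List[Tuple[str, int]]
Structure = List[Tuple[str, List[int]]]

def assemble(manual: Manual, parts: Iterator[int],
             structure: Optional[Structure] = None,
             done: Optional[List[bool]] = None) -> Structure:
    """Recursively carry out the first manual step not yet marked done."""
    if structure is None:
        structure = []
    if done is None:
        done = [False] * len(manual)
    if all(done):
        return structure                      # out of instructions: assembly stops
    step = done.index(False)                  # next step still to be read
    name, n_parts = manual[step]
    layer = list(islice(parts, n_parts))      # take parts from the streaming supply
    if len(layer) != n_parts:                 # compare the new layer with the manual
        raise RuntimeError(f"step {step + 1}: supply of parts ran out")
    structure.append((name, layer))
    done[step] = True                         # mark done so it is never repeated
    return assemble(manual, parts, structure, done)  # recurse on the new state

manual = [("trunk", 1), ("primary branches", 2), ("secondary branches", 4)]
built = assemble(manual, count(1))            # parts are numbered 1, 2, 3, ...
for name, layer in built:
    print(name, layer)
```

The "done" marks play the role of the checked-off instructions: each call reads only the first unmarked step, and the recursion ends exactly when every step is marked.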

For DNA, RNA, and protein (and so on) in the circular chain depicted in Figs. 4 and 6, the term “recursion” means that, by reversing the central dogma, a main path opens for recursive algorithms to be applied as the intrinsic mathematics of genome function. That is, not only is information transfer back from proteins to DNA allowed, but this feedback mechanism is (again) relied upon as the enabling feature of a circular process, thus:

$$DNA \to RNA \to PROTEIN \to DNA \to RNA \to PROTEIN \to \ldots $$
Fig. 6. Sketch of recursive genomic government of the growth of a Purkinje neuron. Starting from the highlighted primary information packet, a Y-shaped protein template is built by the “forward growth” process, in accordance with the simplified (Watson) picture, through transcription of DNA to RNA and, in turn, translation of RNA into a structural protein. During the construction of the Y-shaped template, the primary gene is in a “turned on” condition. Thus, the most primitive, primary part of the process retains Watson’s simplified scheme. In other words, the postulated process does not contradict the process of “DNA makes RNA that makes proteins” but goes beyond it, by violating both the prohibition of feedback and the notion of junk DNA. In each recursive step, the perused auxiliary information packet (formerly “junk DNA” or “regulatory DNA”) is cancelled (methylated) upon perusal

Further, by reversing the notion of junk DNA, the principle of recursion postulates that the above main recursive path accesses functional (as opposed to junk) DNA, viz., information from intronic and intergenic regions that were formerly regarded as useless.

PRGF claims that overall DNA function is expressed by a recursive process determined and governed by repeated access to information packets contained in the DNA through the channel depicted in the schema above. The postulated process is active and bounded. By active, it is meant that accessed information packets may be rendered inaccessible (reversibly and/or permanently, by de novo methylation) and that the consumption of information governing growth eventually leads to the death of the organism.

Despite tremendous advances, the full genomic government of Purkinje cell assembly remains largely unknown [84, 85]. This paper illustrates the applicability of PRGF with a sketch of the development of this brain cell (cf. [5, 21]). The recursive framework is expected to contribute to further advances in revealing the genomic governance of the development of Purkinje neurons and other structures, making it a prominent platform for post-ENCODE genomics.

To recapitulate and expand, Fig. 6 pictures the recursive process as follows.

Structural proteins are generated from a DNA primary information packet (the pre-ENCODE “gene”), growing a Y-shaped template. Completion of the step-1 proteins, however, is not a dead end, as the dogma formerly asserted. Completion may be reported by a completion marker protein that, in its simplest version, binds to the DNA; specifically, the completion marker protein can “turn off the gene.”

Completion of this first step shuts down the first stage of growth. Evidence is available that microRNAs (and small interfering RNAs; see [86, 87]) can signal completion by turning off the primary information packet (formerly the “gene”).

Upon perusal of the retrieved information, the auxiliary information packet (formerly “junk” or regulatory DNA) is turned off by de novo methylation; each such packet, once perused, is rendered temporarily or permanently unreadable. This provides a framework to explain one of the oldest problems in biology: the differentiated cells of an organism are no longer totipotent. Their methylation pattern permits specific and limited further growth, as much as the remaining unmethylated auxiliary information, accessible by further recursion, allows. This framework is consistent with the theory of aging [38], which proposes that genetic damage leads to progressive degradation of the ability to make necessary proteins.
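The steps above can be caricatured as a toy program. The sketch below is a hypothetical illustration, not a biological model: the `Genome` class, the `grow` function, and the labels `template(...)` and `extend(...)` are invented names; a packet is just a string, and "methylation" simply marks an index as unreadable. It shows the two properties claimed for the process: it is active (each perused auxiliary packet is silenced) and bounded (growth stops once no unmethylated auxiliary packet remains).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Genome:
    """Hypothetical toy genome: one primary packet plus auxiliary packets."""
    primary: str
    auxiliary: list[str]
    methylated: set[int] = field(default_factory=set)  # indices of perused, now unreadable packets
    primary_on: bool = True  # the primary packet ("gene") starts turned on

    def unmethylated(self) -> list[int]:
        """Indices of auxiliary packets that can still be read."""
        return [i for i in range(len(self.auxiliary)) if i not in self.methylated]


def grow(genome: Genome, max_steps: int | None = None) -> list[str]:
    """Run the toy recursion; return labels of the structures built in this call."""
    built = []
    if genome.primary_on:
        # Forward growth: the primary packet builds the Y-shaped template.
        built.append(f"template({genome.primary})")
        # Feedback: a completion marker turns the primary packet off.
        genome.primary_on = False
    steps = 0
    while (max_steps is None or steps < max_steps) and genome.unmethylated():
        i = genome.unmethylated()[0]
        # Recursive step: peruse the next readable auxiliary packet ...
        built.append(f"extend({genome.auxiliary[i]})")
        # ... then silence it (de novo methylation) so it cannot be reread.
        genome.methylated.add(i)
        steps += 1
    return built


cell = Genome("P", ["a1", "a2", "a3", "a4"])
print(grow(cell, max_steps=2))  # ['template(P)', 'extend(a1)', 'extend(a2)']
print(len(cell.unmethylated()))  # 2: remaining growth potential
print(grow(cell))                # ['extend(a3)', 'extend(a4)']
print(grow(cell))                # []: the process is exhausted
```

Stopping after two steps mimics a differentiated cell in this toy picture: its remaining growth potential is exactly the two packets left unmethylated, and once those are consumed, further calls build nothing.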

Generalizations of the Principle of Recursive Genome Function

PRGF explicitly transgresses the once-forbidden feedback mechanism and also relies on auxiliary genomic information packets previously regarded as junk. Notably, before the publication of Cybernetics [35], which made feedback mechanisms a most conspicuous aspect of biological processes, most philosophies in biology (including traditional evolutionary theory [26]) assumed a similar, rudimentary forward-growth process via random mutations and natural selection. Given the interaction of protein structures (organisms) with the environment, recursion would seem to enable nonrandom development.

Another comment toward generalization: at first glance the depicted recursion looks like a chain, much as neural networks did in the early history of neuroscience, when they were pictured as neural reflex arcs [3]. However, just as with neural networks, genomic recursion is inherently parallel, since it is not limited to a single primary information packet and its auxiliary information packets. In genomic function, the three main layered sets of elements (DNA, RNA, and proteins) operate in a massively parallel manner, inviting neural network algorithms (see “Introduction”).

In this context, it is noteworthy that there is no separate, or even separable, operating system [55], as the recursive genome is self-governed (unsupervised). In fact, the view here agrees closely with the concept first touched upon in “What Is Life?” [36]:

But the term code-script is, of course, too narrow. The chromosome structures are at the same time instrumental in bringing about the development they foreshadow. They are law-code and executive power—or, to use another simile, they are architect’s plan and builder’s craft—in one. [36]

Von Neumann [37] likewise stated the principle that the two kinds of information (code and data) do not differ so far as their repository (memory) is concerned. Primary information packets (“genes”) and auxiliary information packets are all nucleotide sequences; it is the PRGF recursive process of access that distinguishes them.

Readers familiar with the “fractal approach” [21] may note the applicability of PRGF to that paradigm, which this author pursued to describe the fractal development of a Purkinje cell in the “Pre-ENCODE wilderness years.” The fractal approach is especially encouraged by measurements showing that “pyknon-like elements” [88] of the whole genome of Mycoplasma genitalium [89, 90] apparently follow the Zipf–Mandelbrot parabolic fractal distribution (personal communication by Pellionisz in [31, 91], which also points out a Pareto distribution, that is, a truncated Taylor series of the Zipf–Mandelbrot parabolic fractal distribution).
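The difference between the two distributions named above can be made concrete. This is a minimal sketch, assuming that "parabolic fractal" means log frequency is a quadratic (a parabola) in log rank, so that dropping the quadratic term leaves the straight log-log line of a Pareto (power-law) distribution. The data are synthetic, generated from made-up coefficients; they are not measurements of pyknon-like elements or of any genome, and `loglog_fit` is a hypothetical helper built on NumPy's `polyfit`.

```python
import numpy as np


def loglog_fit(ranks, freqs, degree):
    """Fit log10(freq) as a polynomial in log10(rank); return (coeffs, rms residual)."""
    x, y = np.log10(ranks), np.log10(freqs)
    coeffs = np.polyfit(x, y, degree)  # highest power first
    residual = y - np.polyval(coeffs, x)
    return coeffs, float(np.sqrt(np.mean(residual ** 2)))


# Synthetic rank-frequency data (made-up coefficients, not genome measurements):
# log10 f = 4.0 - 0.8 * log10 r - 0.3 * (log10 r)**2
ranks = np.arange(1, 1001)
logr = np.log10(ranks)
freqs = 10 ** (4.0 - 0.8 * logr - 0.3 * logr ** 2)

quad, quad_rms = loglog_fit(ranks, freqs, 2)  # parabolic (curved) log-log form
lin, lin_rms = loglog_fit(ranks, freqs, 1)    # Pareto: straight log-log line
print("parabolic fit:", np.round(quad, 3), "rms", quad_rms)
print("Pareto fit:   ", np.round(lin, 3), "rms", lin_rms)
```

On data that curve downward in log-log coordinates, the quadratic fit recovers the generating coefficients, while the linear (Pareto) truncation leaves a systematic residual; on real counts, comparing the two residuals is one simple way to judge whether the curvature term is needed.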

Readers will also note that PRGF generalizes beyond fractal recursive iterations (e.g., a fractal DNA resulting in the fractal structure of the Romanesco cauliflower photographed for illustrative purposes in Fig. 7) to an entire class of recursive algorithms. In principle, the recursion is not limited to the main path DNA → RNA → PROTEIN → DNA → (etc.) but may involve an infinite number of recursive loops, as pictured in Fig. 5a (a reproduction of Fig. 1 of [17]).

Fig. 7.

An example of a recursive-looking organism (Romanesco cauliflower). Its growth may follow a Lindenmayer L-system (string-rewriting) recursive algorithm, e.g., one governed by the DNA → RNA → PROTEIN → DNA recursion: a massively parallel process executed repeatedly.
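For readers unfamiliar with L-systems, the core mechanism is parallel string rewriting: in each generation, every symbol is replaced at once according to a rule table. The sketch below uses Lindenmayer's classic algae system (A → AB, B → A) purely as an illustration; the rules that might govern a Romanesco cauliflower, or any genomic recursion, are not known here, and the function name `lsystem` is our own.

```python
def lsystem(axiom: str, rules: dict[str, str], generations: int) -> list[str]:
    """Return the string at each generation of a deterministic L-system."""
    history = [axiom]
    current = axiom
    for _ in range(generations):
        # Every symbol of the previous generation is rewritten simultaneously
        # (parallel rewriting); symbols without a rule are copied unchanged.
        current = "".join(rules.get(symbol, symbol) for symbol in current)
        history.append(current)
    return history


algae = {"A": "AB", "B": "A"}
for generation, word in enumerate(lsystem("A", algae, 4)):
    print(generation, word)
# 0 A
# 1 AB
# 2 ABA
# 3 ABAAB
# 4 ABAABABA
```

The string lengths 1, 2, 3, 5, 8 follow the Fibonacci sequence, a small example of how a simple recursive rule applied in parallel yields self-similar structure.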

As one more intricate example:

$${\text{DNA}} \to {\text{RNA}} \to {\text{DNA}} \to {\text{RNA}} \to {\text{PROTEIN}} \to {\text{PROTEIN}} \to {\text{RNA}} \to {\text{DNA}} \to {\text{RNA}} \to {\text{PROTEIN}} \to \ldots $$

From the viewpoint of an overriding pragmatism of genomics, one may note that the theoretically infinite variety of recursive pathways in Fig. 1 of Crick [17] understandably deterred researchers (including Crick himself) from tackling the theoretical problem in 1960–1970. Instead, they rushed to reduce genomic function to a two-step procedure (transcription and translation [62]) and put a second lock on recursion by way of the junk DNA conjecture. It has long been known in physics that the two-body problem of two interacting masses can be solved exactly, while the three-body problem presents formidable mathematical challenges. Crick, who later ventured into neural networks himself, did not seem to favor heavy use of mathematics [23].
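The combinatorial point can be quantified with a small graph model. In this hypothetical sketch, the three molecule classes are nodes and each permitted information transfer is a directed edge; the number of distinct pathways of n transfers is then read off the n-th power of the adjacency matrix. Two edge sets are compared: the forward-only chain DNA → RNA → PROTEIN, and, as a deliberately maximal assumption rather than a reading of Crick's figure, every directed transfer among the three classes (including self-loops such as PROTEIN → PROTEIN). The names `adjacency`, `count_paths`, and `is_pathway` are illustrative.

```python
import numpy as np

MOLECULES = ["DNA", "RNA", "PROTEIN"]


def adjacency(edges):
    """Build a 3x3 adjacency matrix from (source, target) transfer pairs."""
    index = {m: i for i, m in enumerate(MOLECULES)}
    a = np.zeros((len(MOLECULES), len(MOLECULES)), dtype=np.int64)
    for src, dst in edges:
        a[index[src], index[dst]] = 1
    return a


def count_paths(a, start, steps):
    """Number of distinct pathways of exactly `steps` transfers beginning at `start`."""
    walks = np.linalg.matrix_power(a, steps)
    return int(walks[MOLECULES.index(start)].sum())


def is_pathway(a, path):
    """True if every consecutive transfer in `path` is an edge of `a`."""
    index = {m: i for i, m in enumerate(MOLECULES)}
    return all(a[index[s], index[t]] == 1 for s, t in zip(path, path[1:]))


# Forward-only chain of the central dogma (transcription, translation).
dogma = adjacency([("DNA", "RNA"), ("RNA", "PROTEIN")])
# Hypothetical maximal case: every directed transfer allowed.
full = adjacency([(s, t) for s in MOLECULES for t in MOLECULES])

# Prefix of the intricate example pathway given in the text.
example = ["DNA", "RNA", "DNA", "RNA", "PROTEIN", "PROTEIN",
           "RNA", "DNA", "RNA", "PROTEIN"]

for n in (2, 3, 10):
    print(n, count_paths(dogma, "DNA", n), count_paths(full, "DNA", n))
print(is_pathway(dogma, example), is_pathway(full, example))
```

Under the forward-only chain there is exactly one two-step pathway and none longer, whereas the maximal graph admits 3^n pathways of n transfers, a count that grows without bound; the intricate example pathway is admissible only in the latter.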

Indeed, early in the second century of genetics (now in its post-ENCODE era, i.e., PostGenetics), the rudimentary recursive sketch of the fractal growth of a single Purkinje cell is a far cry from fully defining (and experimentally verifying in every detail) a complete mathematical model of the genomic governance of even one of the best-known cell types of a well-studied multicellular organ: the Purkinje neuron of the cerebellum.

Our best immediate hope is to gather support for first revealing the most rudimentary genomic regulation, present in the smallest genome of a free-living organism: Mycoplasma genitalium, whose intergenic sequences total a mere 50,000 nucleotides. There, not only could actual fractal structures be revealed in the DNA, but “fractal defects” were also found to coincide with glitches in regulatory intergenic sequences [92]. Next, we can target more complex (multicellular) organisms, starting with Purkinje neurons and later proceeding, for instance, to the obviously fractal-looking Romanesco cauliflower, which visibly grows in a massively parallel manner (Fig. 7).

To emphasize that the class of massively parallel and recursive neural network algorithms led to this school of thinking [5, 6] and will increasingly be applied, in implicit denial of the central dogma, we cite the most recent article from the flagship of neural network R&D [93].

The utilization of PRGF is expected to lead to predictable implications, as well as unforeseen ones. As indicated at the outset, new applications of PRGF are likely to include epigenetic medicine: diagnosis, identification of factors that block PRGF (such as defects in regulatory sequences), and therapies that block PRGF, for example to create new types of antibiotics. For bioenergy, nanotechnology, and synthetic biology, it seems essential to first understand genomic function, including not only genes but also their regulatory mechanisms. As suggested by Fig. 7, our plate is full for the second century of (postmodern) genomics.