Introduction

In the history of science and philosophy, there has been a long-standing debate between fundamentally different views of life: those that emphasize co-operation and symbiosis and, in contrast, those that promote the central importance of ‘selfish’ individuals (Villarreal and Witzany 2013b; Witzany 2006, 2007). The adaptability of individual-based selection fundamentally stems from errors, a few of which might improve survival. But is it really possible to originate a natural language-like code based on errors? Do we know any empirical example in which a natural language or code used by living agents to organize and coordinate their behavior resulted from selecting a variety of errors? On the contrary, all empirical facts indicate that agents using natural languages and codes coordinate their behavior by information exchange, modulation, and innovation, i.e., by generating new sequence-based content. This means natural codes basically result from social interactions of agents competent to use codes in the context (pragmatics) of real-life experience and history (Witzany 2014, 2015a). Applied to the virosphere, we outlined this in detail as a reply to outdated opinions concerning the role of viruses within the tree of life (Villarreal and Witzany 2010). One crucial question therefore is: Is new sequence space really the result of selection from an abundance of replication errors (Eigen 1971) or, on the contrary, is it the result of the practical competence of living agents to generate new sequences? Are, e.g., RNA viruses “error-prone,” or is this a misleading term for a highly productive behavior?

Here we reconsider the importance to life of collective behaviors in RNA species from the perspective of the infectious and transmissible agents currently known as viruses and virus-like agents.

Therefore, we argue for the better explanatory power of highly productive RNA quasispecies consortia (qs-c) for the evolution, conservation, and plasticity of genetic identities. As will be presented below, the biological capability to generate a toxic (viral) code can clearly differentiate populations from one another and can enforce coherence within one population. Generating a nucleic acid sequence that excludes certain RNA consortia through a toxic effect not counter-regulated by an antitoxin leaves a consortium that is not excluded and can therefore generate a “self,” which originates a biological identity. Such basic RNA behavior is completely absent in abiotic environments.

Falsified: Most Viruses are Disease-Causing Selfish Parasites

In contrast to formerly widespread opinions that viruses are strictly selfish and disease-causing agents, two different and important assumptions are investigated here:

  1. Empirical evidence now clearly supports the view that in most cases, viruses follow a persistent non-lytic lifestyle, which has been documented in a variety of examples and is prevalent in every known cellular host. Although in recent decades we heard of “exotic” viral lifestyles, such as temperate phages or lysogenic strains of viral genomes, the predominant opinion about virus lifestyles was that of the epidemic character of viral diseases. Recent metagenomic studies of viruses have fundamentally falsified this perspective. Although the previous disease-causing designation remains true, it is now acknowledged that the large majority of virus infections on this planet do not display this disease-causing behavior. We now know that most viruses are inapparent settlers of cellular host organisms, either within the nucleus in the genome or as non-genomic settlers such as plasmids. We have heard much about non-coding RNAs, mobile genetic elements, repeat sequences, and introns, all of them formerly termed “junk,” but we can now regard these as remnants of persistent viral infection events. Thus, the persistent viral lifestyle is the most dominant biological lifestyle on this planet. From this perspective, cellular host organisms look like islands in an ocean of the global virosphere. Viruses and virus-derived parts represent the most abundant genetic information on the planet, outnumbering cellular genetic information tenfold. If we ignore eukaryotes and consider only prokaryotic life, the number of prokaryote viruses is 10^31, which means that if we lined up their virions end to end, they would span 40 million light years (Rohwer 2014). The visible living world of organisms of all domains is literally the tip of the iceberg, surrounded by biology’s viral “dark matter,” of which we are currently scratching the surface at best (Youle et al. 2012).
Importantly, a key feature of this viral lifestyle is that only a few need to remain as functional agents, such as the mammalian endogenous retroviruses needed in the syncytia that regulate mammalian pregnancy (Perot et al. 2012). Here, a much larger amount of retrovirus-derived code (LTRs) provides the regulatory network of the placenta. In most cases, parts of infectious agents remain as defectives, some known as LTRs, non-LTRs, SINEs, LINEs, and Alus, which are later co-adapted for cellular needs, such as regulatory tools in all steps and fine-tuned substeps of cellular functions, e.g., transcription, translation, epigenetics, repair, and immunity (Villarreal 2005; Roossinck 2015; Chalopin et al. 2012; Slotkin and Martienssen 2007; Conley and Jordan 2012).

  2. The second, but no less important, point is that viruses can co-operate; that is, they interact to build groups that invade host genomes and even compete as a group for limited resources such as host genomes. This leads to an extraordinarily effective result and a key behavioral motif that is able to integrate a persistent lifestyle into cellular host organisms, the “addiction” modules: formerly competing viral groups counterbalance each other together with the host immune system (Villarreal 2012). Although rather stable under certain circumstances, this addiction balance can also be lost, which means the competing viral features may become virulent again. But when stable, we can find such counter-regulating paired genes of the addiction modules, as in the RM (restriction/modification) systems as well as in TA (toxin/antitoxin) systems. Insertion/deletion functions represent similar modules, as do RM systems. This ‘infectious’ colonization by new addiction modules is a main process in generating new sequence space without error-replication and therefore in the evolution, conservation, and plasticity of genetic identities.

Because this view contradicts well-entrenched mainstream perspectives on virus–host relationships, we will remind the reader of these key features throughout this article.

Before Group Selection Might be Considered, Group Identity Must Emerge

The historical dilemma of co-operation (symbiosis) versus competition (selfish behaviors) in biology was thought to have been resolved in the 1960s by evolutionary biologists. With the introduction of kin selection and later game theory, it seemed apparent that individual-based selection could mostly account for co-operation (Eigen and Winkler 1983). Thus, the acceptance of the individual-selfish types became well established. This view, which requires unending competition and conflict, also results in the dominance of war-like metaphors.

However, absent from all of these early debates was the concept that biological groups must initially form some type of group identity before group selection might be considered or understood. Group identity seemed to be a concept that applied to things like human social behavior and not to the behavior of genes, ribonucleoproteins, viruses, or organisms. But with the development of metagenomic analysis, we can now clearly see, e.g., major viral influence on prokaryotic genetic diversity. Viruses are often thought to affect their host survival via ‘killing the winner’ (the most numerous/successful host). Thus, lytic viruses would seem able to keep the dominant host type in check. Viruses can also transfer sequences and genes between virus and host. It has been proposed that viruses are a main driving force in microbial diversification and diversity (Weinbauer and Rassoulzadegan 2004). Some authors prefer a war-like metaphor to describe the virus–host relationship (Forterre and Prangishvili 2009; Brüssow 2009). Similar reports argue for a destructive/constructive dynamic between killer virus and surviving host (Koonin 2011; Koonin and Dolja 2014). Thus, this is the accepted neo-Darwinian perspective of individual fittest type selection, as essentially put forward by Dawkins (Dawkins 2006) and his selfish gene narrative.

Opposing Forces Linked to Survival: Cumulative Evolution

However, the above scenario of killer virus and war-like metaphor is only half the story and not the innovative half.

The amazing hidden half of the story is this: although viruses can clearly kill their host, they can also colonize their host without disease and thus sometimes protect the host from the same or similar viruses. Together with lysis and protection, we see a virus-colonized host that is both symbiotic and innovative (acquiring new competent code). Viruses, together with their non-infectious defective variants, can sometimes persist in their host for long, even permanent, periods without causing harm. Most importantly, the acquired virus information can then provide the basis of antiviral defense (Villarreal 2005, 2009a, 2011a). This new virus–host combination can thus provide an ‘antiviral’ survival advantage, especially in a virus-rich habitat. Since it is transmissible, such survival can also apply to groups that were virus colonized (not necessarily direct kin). This virus–host relationship does not adhere to a war-like metaphor. Rather, it represents a pervasive, ancient, and ongoing force in the origin and evolution of life.

The opposing forces of virus killing and virus protection together function to define and provide an acquired group identity linked to survival. Both the destructive function and the immunity to destruction promote the synthesis of new identity. This represents three steps: evolution, conservation, and plasticity of genetic identities.

As noted, viruses are the most abundant and diverse genetic entities on earth (Ryan 2009, Moelling 2013, Rohwer 2014). As they seek to either colonize or lytically replicate in their host, they also provide a dynamic, consortial, and history-dependent behavior for cumulative evolution.

Cumulative evolution at the level of nucleic acid codes (genetic code based on DNA or RNA) is not solely what remains after natural selection of mutations (error replication). Cumulative evolution is more like a “ratchet effect,” a term originally used as a metaphor for the cultural transmission of learned experiences to the next generations. This learning must accumulate so that each new generation need not repeat all innovative thoughts and techniques (Tomasello 2014). Interestingly, such “ratchet” (cumulative) effects are basically evident in the virus–host relationship of persistent colonizations, although this occurs at the genotypic level, in contrast to the phenotypic level at which the effect was originally described. Organism immunity, especially adaptive immunity (CRISPR/Cas, VDJ systems), represents an excellent and highly efficient example of this “ratchet” effect, as it integrates history-dependent experiences with infectious elements into the newly ‘adapted’ genetic identity.

Additionally, the “marking” of DNA via methylation and histone modification links the genome to environmental experiences in a memory-like learning function and is used in context-dependent “interpretation”, i.e., in transcriptional modifications (alternative splicing, RNA editing) and translation. We can consider how epigenetics might contribute to a cumulative evolution. Since it marks regions of DNA for non-expression, compared to unmarked regions, the expression patterns in subsequent generations can be affected and thus show adaptation without genetic change. Interestingly, the origin of almost all of these epigenetic marking systems appears to relate directly to the silencing and persistence of virus-like parasitic code. This marking allows the epigenetic inheritance of the differentiation of the variety of tissues of several organs throughout all known species during the temporally regulated steps of developmental stages. Therefore, epigenetics is a rather intriguing example of how evolution, conservation, and plasticity of genetic identities are interlinked without the error-replication narrative, but related to genetic parasite colonization.

Therefore, we should not think in terms of destruction and error, but of the great and innovative power of the ever-present virosphere (Villarreal 2009a), which links formerly competing agents into groups that are competent to generate new genetic code sequences with new genetic identities. Thus, the persistence of virus-derived information matters greatly because it affects group identity. This perspective represents a major adjustment to our thinking.

Techniques, Tools, and Strategies of Co-operative Agents of the Virosphere

As mentioned before, a main strategy by which viruses affect host group survival is via ‘addiction modules’ (Villarreal 2012). Viruses can persist in the host by ‘addicting’ them. In this, they provide protection (antitoxin) against the ever-present killing (toxin) by viruses. This module provides a core ‘social force,’ linking two lineages (viral and host) into one group. Thus, it is via addiction modules that viruses provide a path to co-operation and symbiosis. Such a relationship was initially discovered via viruses of bacteria that persist but do not integrate into host DNA (episomal phage). And although this lesson in addiction was acquired from persisting DNA viruses of bacteria, it has led to a generalized idea of virus-mediated group identity (Supplementary File 1). We can now also apply this idea to populations of infectious RNA agents, which will also apply to the very origins of life.

The study of RNA viruses in the 1970s led to quasispecies theory, which is based on the idea that ‘error-prone’ replication of a master fittest type results in a population of mostly less fit variants around the master type (Eigen 1993; Eigen and Schuster 1977; Witzany 1995). As mentioned in the introduction, “error” is not the appropriate term to describe agents that are competent to use the natural genetic code and that also act co-operatively. Experimental virology since 1990 has redefined quasispecies as being co-operative. However, besides being co-operative, quasispecies are also exclusive of other quasispecies (Domingo et al. 2012). They have group recognition, which means they are able to differentiate self from non-self. These two features allow quasispecies to promote the emergence of group identity. The concept of group identity has not historically been crucial or important in biological theories. However, as we will assert below, group identity provides a fundamental and innovative force in biology that affects both the very earliest and most recent events in evolution (Villarreal 2005, 2014). When such concepts are applied to the behaviors and evolution of pre-cellular RNA populations of relatively simple structures (stem loops), we can define a basal and still active force in the emergence of life (Villarreal and Witzany 2013a).

Evolution of Early (RNA Based) Life was Communal

At least with the emergence of DNA-based life forms, we see that clonal and individual types can exist. However, with the viruses, we still see the communal character that must have prevailed during early life. As noted, the ‘killing vs. survival’ character of virus and host has led many to consider war-like metaphors. But the necessity of group identity (a general issue in the biology of cells, tissues, organs, and organisms) also requires that individual agents be able to participate in a coherent group, i.e., to co-operate (Villarreal 2009b).

The crucial difference to former concepts is that with group identity via counter-regulated addiction modules, we see that two opposing components must be present and work coherently to define the group as a whole. This means biological identity inherently is constituted by dynamic interactions of co-operative groups. Science has long approached complex systems from a reductionist (individual) perspective. But when the many make up the one (such as a co-operative group), coherent non-reductionistic approaches become necessary to understand the system (Woese 2004). Let us now consider how the virus, the most selfish of all genetic agents, can also inform us about co-operation and group behaviors.

Overall Conceptual Objectives: Group Identity as Essential Feature of Consortia

Thus, a core assertion is that a ‘virus addiction module’ is a general and essential strategy for existence of life in the virosphere. But because viruses are transmissible and can persist in specific host populations, this leads to a form of group immunity/identity since identical but uncolonized host populations remain susceptible to the killing actions of the lytic viruses. In this way, we see that the viruses are providing the necessary opposing functions for addiction (persistence/protection and lytic/killing).

The colonized host must retain persisting viruses (or defectives) in order to survive in the omnipresent virosphere and also to be able to kill competing but uncolonized hosts. It is important to note that this is a main underlying mechanism of host cell group identity (Villarreal 2012).

This concept would seem to suggest a deep importance as to why prokaryotes, for example, must retain cryptic virus information (Supplementary File 2) and might better explain the highly dynamic interactions of bacteria and their viruses. But the existence of virus mediated group identity can have very deep implications for all life. Indeed it can lead us to propose a role for this in the origin of life itself.

How Might a Virus Lifestyle Predate the Origin of Life?

Is not a virus a parasite of a cell, and therefore only able to emerge after the emergence of cells (and ribosomes)? But a virus at its core is a molecular genetic parasite that can parasitize every nucleic acid sequence, even other virus systems. This means any proto-biotic replicator system can and likely will be susceptible to virus emergence and colonization.

However, the virus-mediated addiction module also leads us to think about how opposing functions might emerge and support consortial functions or group identity in early life. This can now provide us with a major insight. If viruses can function as a consortium (an essential interacting group), then they might provide mechanisms from which consortial functions themselves could emerge in the origin of proto-biotic life (Villarreal and Witzany 2013b). Genetic parasites can act as a group (qs-c). But for the groups to be coherent, they must attain group identity, and this is typically via an addiction strategy (Villarreal 2009a). In general, antiviral (and antiproviral) systems (such as CRISPRs in archaea and bacteria) will themselves emerge in the host from virus-derived information. Most importantly, it is the viruses that provide the crucial functions needed for antiviral defense.

Such thinking, however, does not seem to adhere to the tenets of natural selection, in which the variants from an individual fittest type must undergo natural selection. Instead, it fundamentally derives from an external consortium representing diverse information and is not generated by variation from a fittest individual. To initiate the functional activity of a consortium, there need never have been an individual fittest type. The consortium itself provides the needed function. In addition, the transmissive (infectious-horizontal) nature of the consortium agents means that they need not descend from a common ancestor.

Instead, group identity becomes crucial and the participating agents become ‘one’ and this must express coupled opposing protective/replicative and destructive functions in order to define the group identity.

The opposing functions are the basis of addiction modules. Thus, the emergence of group identity becomes an essential and very early event in the emergence of life. This is consistent with the fundamentally group-based behavior of RNA-based agents that are competent to use the natural genetic code, as outlined in the introduction. This group selection and group identity are needed to create information coherence and network formation and to establish a system of communication, i.e., code-competent interactions: the identity also serves as information for those that do not share this identity (Villarreal and Witzany 2010, 2013b). This is the beginning of self/non-self differentiation capability.

When viral consortia successfully colonize a host, new virus–host combined identity (immunity) results. This colonization also creates new regulatory networks which will typically involve new addiction modules. With this new identity, there will also emerge an enhanced (cumulative) host complexity along with new virus/host evolutionary ecology (Villarreal 2005, 2011b). Survival in the virosphere will have been significantly modified. And a new host–host interaction and group survival mediated by virus will also result. Again we look at the core of evolution, conservation, and plasticity of genetic identities.

A Network is not a Linear Order of Nodes

However, in contrast to traditional and linear thinking, all these features (group identity, addiction modules, regulatory complexity, network emergence, host–virus ecology, and host–host competition) are fundamentally interlinked and consortial (Villarreal and Witzany 2013b). They are inherently network phenomena. But such networks will have a major historic and stochastic dependence. Thus, they will inherently resist any formal predictive analysis calculation as is done in systems biology. These identity networks cannot be teased apart to define a single and linear logic as is currently accepted for individual fittest type selection. Nor can one ‘node’ be set to one specific function as the participating agents will be context dependent and multifunctional. For example, one cannot understand the origin and function of the viral ‘toxin’ (including lysis) without also considering the viral ‘antitoxin’ (the opposition of “self” by persistence or defectives). And these two functions will respond rather differently to self and non-self agents (such as another virus). They must be considered to have emerged together in a particular context.

We will now apply this network membership idea to the origin of RNA-based life below. However, such opposing and multifunctional requirements for network identity will almost certainly confuse us. Our very language compels us to generate a coherent line-up according to grammar rules and, as a result, to think in linear terms and syntactic sequence order (like this text). And we have become imprinted to think first of individual fittest type mechanisms in order to explain co-operative and complex systems. We do not, for example, consider the possibility of gang-like agent action as being important for innovative and multifunctional solutions to problems (as presented in the “Gangen” hypothesis below).

The Basic Motif: Virus-Derived Addiction Modules Promote Group Identity

The discovery of addiction modules and their relationship to persisting viruses has mostly been in the context of bacterial dsDNA viruses. And from this perspective, we can also infer that the collective action of dispersed, seemingly defective (cryptic) viruses is able to provide specific adaptive functions (such as mobilization and network control). But as asserted in our introduction, a host cell population that is persistently colonized by such a ‘controlled or cryptic’ virus set will also be able to provide information that resists the action of the equivalent lytic virus(es). Thus, a competing identical population of host cells that are not persistently colonized with the same cryptic virus will be susceptible to viral lysis when it becomes exposed to populations of cells that are persistently (non-lytically) infected (Fig. 1).

Fig. 1
Schematic of virus effects on population-based host survival. The five diffuse circles (left) represent a host population free of the infectious virus in question. When exposed, many members will succumb to the toxic (acute) effects of virus infection (crossed lines). Some, however, may be stably colonized (shown with dark center). This host population has acquired a new virus-derived instruction set that also provides immunity to the same (and often other) viruses (shown by broken lines between cells). If this population retains some capacity to produce infectious virus, or if the virus remains prevalent, then when it encounters another naive population, the uncolonized population will crash due to virus toxicity. The virus-colonized population will be favored. Reproduced with permission from Villarreal (2011b)

This is essentially why a ‘lysogenic’ strain of bacteria will lyse an identical bacterial strain that is not lysogenic when the two populations are mixed. The lysogenic strain can ‘reach out’ and kill its otherwise identical neighbor via transmissible virus (Villarreal 2011a, 2012). Since this can happen with episomally persisting agents, it need not directly involve the host DNA genome content (it can be epigenomic). The history of virus exposure and colonization will therefore determine whether a specific host population will be lysed or resist a particular virus. As mentioned before, this is historically derived and stochastic, however, and cannot be predicted. But to continue to favor survival of the virus persisting population, these cells must maintain both the capacity to resist virus as well as the capacity (or a habitat with the capacity) for the production of lytic virus.

Hence, virus ‘junk’ must remain as it is crucial for this. This is thus a ‘virus addiction module’ with both protective and destructive functions which are required to favor the survival of persistently infected populations, especially in a diverse and omnipresent virosphere.
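The qualitative asymmetry described above (a colonized population survives contact, while an identical but uncolonized population is lysed) can be illustrated with a deliberately crude bookkeeping sketch. This is only a toy under stated assumptions, not a predictive model, since the text stresses that real outcomes are history-dependent and stochastic. The function name `mix_populations` and the parameters `infection_rate` and `survivor_fraction` are hypothetical and do not come from the source.

```python
def mix_populations(colonized, naive, infection_rate=0.8,
                    survivor_fraction=0.1, steps=10):
    """Toy bookkeeping for mixing a virus-colonized and a naive host population.

    Assumptions (all hypothetical, for illustration only):
    - colonized cells are immune and act as the only source of lytic virus;
    - per step, the share of naive cells that gets infected is proportional
      to the share of colonized cells in the mixed population;
    - a small fraction of infected naive cells becomes stably colonized,
      the rest are lysed.
    """
    history = [(colonized, naive)]
    for _ in range(steps):
        total = colonized + naive
        if total == 0:
            break
        virus_source = colonized / total            # share of virus-shedding cells
        infected = naive * infection_rate * virus_source
        naive -= infected                            # infected cells leave the naive pool
        colonized += infected * survivor_fraction    # a few survive as newly colonized
        history.append((colonized, naive))
    return history


# Mixing equal numbers of colonized and naive cells
history = mix_populations(colonized=100.0, naive=100.0)

# Control: without any colonized (virus-carrying) cells, nothing happens
no_virus = mix_populations(colonized=0.0, naive=100.0)
```

In this sketch, the naive population shrinks at every step while the colonized population grows slightly, and the control run without colonized cells stays unchanged. The numbers themselves carry no biological meaning.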

In this way, viruses promote the emergence of a group identity in their host. The bacterial identity will be very much determined by its colonizing set of genetic parasites (Villarreal 2011b, 2012). Although such assertions seem broadly important, in our judgment what is even more broadly significant is that this situation defines a strategy by which a collective set of ‘sub-functional’ and opposing agents can participate in the genesis of a new collective function and group identity. Importantly, this requires a coherent network that is inclusive of opposing functions (various TAs) but favors persistence of the parasite-derived new information.

Importantly, this new information is not the result of error-replication but a result of module-like linked genetic contents. This fundamental difference to error-replication narratives proposes new nucleic acid sequence constructions by integration of larger content arrangements into a coherent syntax without destroying the already existing sequence content. Cryptic prophages are indeed the main source of new TA sets in prokaryotes, but such new sets must counter or interact with prior TA sets to persist. They must become coherent with their host. The big implication of this is that such a strategy should also apply to various RNA agents thought to have participated in the origin of life.

What addiction modules and group identity now allow us to explore is how a collective of sub-functional RNA agents might have been able to become a coherent group that has both function and a TA system needed for group identity. RNA is the crucial population of agents that needs to be understood as a defined population. For it is this defined consortium that underlies the origin of life and the regulation of much of the complexity of higher organisms. Can virus addiction and group identity help explain innovative RNA functions?

Origin and Function of RNA and Virus is Linked to Addiction, Persistence, and Group Behavior

RNA is active directly (enzymatically, genetically) and not only as a messenger. Thus, RNA is a more fundamental molecule of life than DNA. Accordingly, and in keeping with population-based functions of RNA, we have previously proposed that DNA can be considered as a habitat for an RNA consortium (Villarreal and Witzany 2013a; Witzany 2015b). Thus, ribosomes can be considered as functional RNA consortia that have inhabited DNA (discussed further below). In a sense, a focus on DNA genomes (including viral ones) gives us a skewed view that DNA initiates major creative changes (via errors), with little or no role for RNA populations in diversity and genetic innovation.

A basal role of small RNA in regulation has not been as obvious in prokaryotes as in eukaryotes. Yet, we have also long known that RNA must have been more basal if there ever did exist an RNA world that created life. And this world must also have created DNA. Several investigators have noted the striking strategic similarities between the RNA-based defense systems of prokaryotes and eukaryotes (Karginov and Hannon 2010; Cooper and Overstreet 2014; Koonin and Krupovic 2014). Yet these systems share no homology with each other and must have derived from distinct ancestors. However, in both these domains of life, as well as in the interferon system of vertebrates and all RNA viruses, small RNAs with double-strand (stem-loop) regions are crucial for cellular defense recognition and response. Above, we noted that small RNAs can also be components of TA systems. Historically, little attention was given to the role of small RNA in basic regulatory functions of cells. More recently, we have come to learn that here too small RNAs (such as CRISPRs or even tRNAs) are much more involved in basic regulatory functions (Nicolas et al. 2013). And in eukaryotes, there has been a big change in our thinking regarding regulation by RNA (Cech and Steitz 2014). Thus, we propose to place RNA-based regulation in a more fundamental role (Supplementary File 3).

With the transition from prokaryotes to eukaryotes, we see some striking differences regarding the activity and amount of parasitic RNA agents. Prokaryotic genetic changes are mostly driven by dsDNA parasites (viruses and plasmids), as noted above. In eukaryotes, RNA agents (retroviruses and retroposons) are much more diverse, numerous, and dynamic and provide multiple levels of regulatory complexity. We have recently come to realize that transcription of such retroposon sequences (previously considered junk) is abundant and often produces non-coding RNAs, all being part of stem-loop regions (Villarreal and Witzany 2013b).

It is such RNA that is involved in complex multicellular identity. However, as we will now present, retroviruses are the major initiators of retroposon-mediated changes in eukaryotes, and the fitness of retroviral RNA (like all RNA) is fundamentally consortial. But this is not the quasispecies as most have come to understand it, based on the error-replication and master fittest type concept of Manfred Eigen. It is fundamentally a co-operative and counteractive version of quasispecies (Domingo et al. 2012) that also supports group identity. Life will only emerge from consortial systems with group identity that are competent to use and edit the natural genetic code. Some RNA-based life forms, like ‘RNA only’ viruses, cannot persist as DNA and must persist as RNA. Similar to the ancestral RNA, early life must have also persisted either as a dynamic RNA population or as a sequestered (static) RNA population. One present-day example of dynamic RNA-only persistence is hepatitis C virus (HCV). The creative and consortial action of RNA populations remains a potent and ongoing force in the evolution, conservation, and plasticity of the genetic identities of the most complex life forms that continue to inhabit DNA.

Retroviral Networks Regulate Evolution and Development

Retroviruses also clearly generate and operate via quasispecies (Villarreal 2009c). But in contrast to the RNA-only viruses, retroviruses persist as and are copied from DNA, and they have also provided a large amount of genomic DNA sequence, especially the long terminal repeats (LTRs) found in most eukaryotes (Shapiro 2005). If such genomic endogenous retrovirus (ERV) sequences are also produced by QS-mediated evolution, then their involvement in the formation of new or edited networks regulating host functions might be understood as resulting from a consortial RNA-based process with inherent coherence. Indeed, understanding the origin of transposable RNA-based networks (and network security) has always been challenging, as networks do not fit into tree-based analogies (Bapteste et al. 2013; Daly et al. 2011; Feschotte 2008).

It also appears that various small non-coding RNAs participate in ‘multi-task’ networks and that such RNAs tend to be transcribed from ‘junk’ retroposons (Mattick 2011; Mattick and Gagen 2001; Mattick and Makunin 2006; Pheasant and Mattick 2007). In terms of active editors of the human genome, there are about 330,000 solo LTRs (Oliver and Greene 2011, 2012), each of which must have initially corresponded to an intact ERV (~10 kb) subsequently lost by deletion. This means that about 3.3 Gb of DNA (330,000 × ~10 kb, roughly the current size of our genome) was once retroviral sequence during our evolution. But such LTRs are highly involved in the emergence of new regulatory networks, such as the origin of the placenta (Bièche et al. 2003; Chuong et al. 2013; Emera and Wagner 2012; Harris 1998; Nakagawa et al. 2013), which involved the re-regulation of 1500 genes (Lynch et al. 2011, 2012), or in the African primates, where the alteration of 320,000 LTR p53 binding sites affected the p53 cell cycle control network (Wang et al. 2007). These primate p53 network changes also relate to (co-operate with) changes in brain-specific microRNAs (Le et al. 2009), alterations to DNA methylation involved in controlling SINE-derived RNA transcription (Leonova et al. 2013), as well as Alu-derived transcription (Zemojtel et al. 2009), an interconnected situation as seen in other networks (Supplementary File 4).

Group Identity and Co-operativity of an RNA Collective: Essential Roles of Defective Minorities

In proposing the qs-c concept, it was argued that agent diversity (not errors) was essential for the capacity of a collective of RNA agents to function co-operatively (Villarreal and Witzany 2013b). Thus, an identity group of sub-functional RNA agents (not a functioning individual) would be the predecessor of RNA-based life. And the type of Darwinian (individual fittest type) selection we are now so familiar with would not emerge until DNA emerged to provide individual genomes. DNA essentially functions as a habitat for the living RNA collective (Villarreal and Witzany 2013b). But with early life (prior to DNA), such an RNA collective must have been able to operationally hold itself together in order to function as a selectable population. And this would most likely be via both a commonly shared syntax of a natural nucleic acid code and a dynamic state (with ongoing replication), given the unstable, highly productive (formerly called ‘error-prone’) nature of RNA. Thus, in order to behave as a population or group, a qs-c must have some process that compels its coherence. Fundamentally, this is inherent in qs-c behavior. But robust coherence of a population would also require a process that prevents the occurrence of both overly potent individual defectives and overly active individual replicators.

Significantly, self-parasitizing defectives can provide this control. Thus, there is an essential requirement for defective minorities providing functional (inhibitory) diversity. Minorities in the population will also retain memory (a prerequisite for learning) from past group selection events. And the group process must also oppose non-members, that is, members of other qs-cs, just as observed with the RNA viruses. These negative (toxic) functions themselves are also likely to emerge from the co-operative action of sub-functional RNA agents. Hence, to attain coherent group behavior, the group must also attain coherent group identity. And any participating TA function would need to be coherent with the rest of the TAs found in the population. For an RNA ribozyme-based living collective to emerge, it will need to have both opposing ribozyme activities, replication (ligation) and endonuclease, to provide a coherent TA set (see Fig. 2).

Fig. 2

Schematic for a ‘gangen’ of RNA that promotes the emergence of group identity, communication, and co-operativity from an RNA collective that also requires opposing functions. Reproduced with permission from Villarreal (2014)

Thus, the collective must initially emerge as a collective, with group identity mediated by co-operative sub-functional agents that together provide both the ligase (positive) and endonuclease (negative) functional features of an addiction module. And this means there was no ancestral individual fittest type: the collective was always a dynamic network with clearly defined membership (security, immunity) that depends on internal competition, co-operation, and opposing functions (antisense), while retaining a history of RNA agent colonization and the corresponding TA sets. All these aspects must attain coherence and provide coherent communication, genetic code-use, and group identity. In the origin of life, this was mediated mostly via a collective of stem-loop RNAs. Villarreal calls this hypothesis for the emergence of collective RNA-based life the ‘Gangen’ hypothesis, as diagrammed in Fig. 2. All the features noted above are included in the diagram (Villarreal 2014).

The ‘Gangen’ Hypothesis

The term comes from archaic Nordic. Gangen was an early Nordic term applied to pathways (gangway), but it also led to descriptions of collectives (gangs) with clear collective functional abilities and group identities, although the participating members may vary considerably according to dynamic changes in the real-life world environment. A Gangen (unlike a mere collective) must attain group identity (Villarreal 2014). Thus, it also describes the emergence of commonly shared code-use, group membership, and the collective living functions of the RNA agents. Membership is not a byproduct of individual selection but is enforced by the required toxic capacity within the collective. Note also that this collective, because it is dynamic and depends on diversity, will retain memory of its history, for example in the form of remaining minorities. The remaining minorities also share this competence to use the natural genetic code. Emergence of a Gangen is therefore not a simple, chemically predetermined event. It depended on historic and stochastic agents that were able to join the collective and to add and edit genetic code and its meaning (use).

This hypothesis provides a distinction between the principles of chemistry and biology (a living restricted collective with history and communication). Clearly, there is more to understanding the emergence of life than this hypothesis alone can account for. For example, the physical containment of the qs population (such as via membranes), the source of metabolic energy and substrates, and the role of amino acids as catalytic RNA primers or markers of replicator identity are not addressed and will not be considered here.

But there is an additional feature that should be emphasized: communication (code-dependent interactions). Transmission of infectious code defines the origin of the virosphere. This issue reduces to the idea that a collective of agents (RNA) with inherent toxic and antitoxic features should be able to transmit (communicate) these agents and their features to nearby competing populations (via simple diffusion). Such transmission is essentially infectious and very much like that of a virus (or viroid). But the communication of RNA-based TAs strongly favors the survival of the RNA population with compatible addiction modules, which inhibit agent toxicity (prevent lysis via ‘defective’ code) and allow persistence of the new agents. This is thus the survival of the persistently colonized (infected) set, which is an inherently symbiotic and consortial process. It also promotes increasing complexity (and identity/immunity) of the host collective via new agent colonization and stable addition. Thus, the transmission of RNA agents attains both communication (competent nucleic acid code-use) and recognition of group membership. In this way, the emergence of a ‘virosphere’ must also have been an early event in the origin of life, one that shapes communication of natural code and creates group identity (Villarreal 2014). It therefore clearly represents evolution, conservation, and plasticity of genetic identities.

This concept differs fundamentally from the current (and highly successful) view based on individual type selection of DNA-based organisms. Below, we assemble some evidence from the study of RNA that clearly supports the existence of collective phenomena in the origin of life.

Quasispecies Consortia (qs-c): Origin of Ribozymes and Co-operating Stem-loop RNAs

As noted, all investigations into the role of RNA in the origin of life assume that some form of ‘master fittest type’ of RNA existed, which was able to function as a ribozyme and to copy itself inefficiently with a high error rate, essentially as outlined initially by Eigen (Eigen 1971, 2013; Eigen and Schuster 1977). On this view, the original RNA replicator must have functioned as an individual ribozyme molecule. However, it has been noted that group selection of early replicators along with compartmentalization might be required to integrate information in the origin of life (Szathmary and Demeter 1987). Previously, however, the qs-c version of RNA selection had not been considered. Because it asserts that a sub-functional collective of RNA agents would be ancestral to effective ribozyme-based replication, it makes a very different prediction, one in which a master fittest type is unnecessary (Villarreal and Witzany 2013b).

An assembly of sub-functioning RNAs could be produced by pure chemical mechanisms. But for this assembly to form a functional collective, it must attain a “Gangen” state (see above): A collective state of group identity based on commonly shared genetic code-use and co-operative functionality. Thus, bridging the split from proto-biotic assemblies to biotic groups requires agents competent for genetic code-use and co-operation. This can then provide both the combined positive and negative chemical activities of ribozymes.

Recently, there has been an accumulation of experimental evidence that RNA ribozymes do act in and emerge from collectives that can also form networks. Very small hairpin ribozymes are known to have catalytic activity (Yarus 2011; Muller et al. 2012). Populations of evolving ligase ribozymes have been maintained in vitro by serial dilution passage (McGinness et al. 2002). More recently, two RNAs that participate in each other’s synthesis from four substrates (via co-operation) have been observed (Lincoln and Joyce 2009; Ferretti and Joyce 2013). Others have also used multiple (up to four) stem-loop ribozymes together to select for combined ribozyme activity (Gwiazda et al. 2012). Similarly, four sub-functional fragments of a group I intron ribozyme can self-assemble into an autocatalytic ribozyme (Hayden and Lehman 2006). It has been established that group I ribozymes must undergo co-operative interactions that depend on native helix orientation to attain their functional 3D folds (Behrouzi et al. 2012). Co-operative fragments of RNA replicators have also been observed to spontaneously self-assemble and generate a network with co-operative catalytic activity (Vaidya et al. 2012). In such a network, a single RNA molecule can be multifunctional in an RNA pathway (Vaidya 2012).

Together, these results provide strong experimental evidence of the co-operative potential of sub-functional ribozymes. However, in none of these discussions has the issue of network membership (or group identity) been considered. According to the “Gangen” hypothesis, network membership along with their various addiction modules would also be essential for life to emerge from the RNA world. Accordingly, the ligation and endonuclease activity of ribozymes would need to emerge together to provide a TA set of functions.

It is now possible to consider many ancient and recent issues in evolutionary biology from the perspective of qs-c and the co-operative interaction of stem-loop RNAs. For example, ribosomes are essentially very complex ribozymes composed of a complicated set of covalently linked stem-loop RNAs that interact in complex ways to provide their core function, the catalytic synthesis of peptide bonds (Bokov and Steinberg 2009). Given that the individual stem loops appear to have distinct evolutionary histories, the ribosome seems to represent a consortium of stem loops that was built up over time during evolution (Harish and Caetano-Anollés 2012). Thus, when ribosomes became residents of DNA in the first cells, the stem-loop RNA consortia were made stable. This again represents evolution, conservation, and plasticity of a genetic identity.

We can also evaluate very recent events in evolution from the qs-c perspective. Consider the extremely complex neuronal cell identity and communication issues that must apply to the nervous system of hominids. There is an emerging view for a basic role of non-coding retroposon-derived stem-loop RNAs in cell identity and neuronal network formation (Oldham et al. 2006; Qureshi and Mehler 2009; Barry and Mattick 2012; Qureshi and Mehler 2012). Indeed, as we reevaluate the human genome and consider the presence of several hundred thousand solo LTRs (many human specific), we might reconsider how these ‘many’ infectious agents co-operated to become the ‘one’ RNA collective we call human.

Conclusion

Viruses and virus-like infectious genetic parasites are the most abundant living entities on earth, outnumbering cellular life more than tenfold. All living cellular organisms have always operated in a virosphere. And a virosphere is essentially a network of infectious genetic agents. The real survival of any organism must always be considered in the context of its virosphere. This realization is a very recent but major shift in our thinking. Most experiments that evaluate the fitness (survival) of an organism ignore the virosphere and thus provide artificial and unrealistic situations and outcomes for survival, which have fundamentally misled us. For example, when we establish a sterile mouse colony free of all the usual persistent mouse viruses, we create an artificial laboratory habitat for survival. When we clone E. coli free of temperate and lytic phage, we similarly create a laboratory artifact. To get a coherent view of in vivo habitats in the context of real-life circumstances, we must always assume a virosphere perspective.

Although Eigen’s quasispecies concept dominated evolutionary thinking for nearly half a century, it could not coherently explain empirical data on RNA groups and viruses that co-operate. In this review, we have argued and exemplified that evolution, conservation, and plasticity of genetic identities are the result of co-operative consortia of RNA stem loops that are competent to communicate, i.e., to build groups that use the natural genetic code and edit this code, even by generating new sequences without error replication. The highly productive (not “error-prone”) capability to generate new sequences allows such groups to constantly infect other nucleic acid sequence-based agents, whether they have virus-like or cellular genomes. The generation of such new sequences by co-operating RNA stem-loop groups leads to identity groups of viruses that can function as toxic and antitoxic codes. Infected host organisms are the habitats in which such formerly competing agent groups (“Gangen”) now unify in addiction modules, such as TA, RM, and ID modules, which provide group identity. Thus, all of the formerly competing groups become unified into counter-regulated stable/unstable modules that provide the host with immunity and memory systems, such as VDJ and CRISPR/Cas, against related genetic parasites. In this way, what historically seemed to be competing and selfish viruses can instead be seen as a unifying collective of viruses (and their defective participants). This view explains the evolution, conservation, plasticity, and often cumulative genetic identities of complex organisms more coherently than the previous quasispecies concept.