4.1 Introduction

RNA editing is a post-transcriptional modification that alters the information content of the RNA sequence itself (Bass 2002; Nishikura 2016; Eisenberg and Levanon 2018). Across metazoa, the most prevalent type of RNA editing is adenosine to inosine (A-to-I) editing, mediated by members of the well-conserved ADAR (adenosine deaminase acting on RNA) enzyme family. Two catalytically active enzymes of this family are encoded in the mammalian genome: ADAR1 (also known as ADAR) and ADAR2 (also known as ADARB1). ADAR1 is strongly expressed in all tissues (Lonsdale et al. 2013). ADAR2 expression is lower than that of ADAR1. It is expressed most highly in the artery, cerebellum, esophagus and lung tissues, although observed to some extent in most other tissues as well (Lonsdale et al. 2013).

ADARs were first identified as enzymes that unwind double-stranded RNA (dsRNA) structures (Rebagliati and Melton 1987; Bass and Weintraub 1988). It is now widely believed that this dsRNA unwinding function is the ancestral function of the widely expressed ADAR1 protein, accounting for the lethal phenotype of ADAR1 deletion in mice (Hartner et al. 2009; Mannion et al. 2014; Liddicoat et al. 2015; Pestal et al. 2015; George et al. 2016). Long dsRNAs are identified by sensor proteins such as MDA5, and trigger production of type I interferons as part of recruiting the innate immune system against viral RNA (Schneider et al. 2014; Wu and Chen 2014). However, large numbers of endogenous dsRNAs are likely to appear in normal eukaryotic cells as well (Reich and Bass 2019), mainly due to the abundance of mobile elements in the genome: transcripts harboring nearby inverted copies of the same repeat fold to create an endogenous dsRNA structure (Porath et al. 2017b). These structures may erroneously trigger the cytosolic immune response, resulting in a severe outcome for the host cell (Hartner et al. 2009; Mannion et al. 2014; Liddicoat et al. 2015; Pestal et al. 2015; George et al. 2016). A-to-I editing, mostly carried out by the constitutive ADAR1p110 variant, introduces mismatches to the endogenous dsRNAs while still in the nucleus (Patterson and Samuel 1995; Roth et al. 2019), so that the edited endogenous transcripts are no longer recognized by dsRNA sensors in the cytoplasm, possibly through destabilization of the RNA structure. Preventing the endogenous dsRNAs from falsely alarming the immune system is the essential function of ADAR1.

In parallel with editing and unwinding the potentially dangerous long and nearly perfect dsRNAs, ADARs also edit much shorter and weaker structures. Many such structures are bound to appear in the transcriptome, due to the abundance of repetitive elements. In fact, all multicellular metazoans screened so far (Porath et al. 2017a, b) exhibit extensive editing, the extent of which strongly depends on the repertoire of repetitive elements in their genome (Neeman et al. 2006; Porath et al. 2017b). Likely, most of this extensive editing is not crucial for preventing an innate immune response (Barak et al. 2020).

The vast majority of this editing activity occurs in non-coding regions, such as the primate-specific Alu repetitive elements (Levanon et al. 2004), and is catalyzed mostly by ADAR1 (Roth et al. 2019). In some cases, noncoding editing events may have acquired a function. For example, the cellular fate of an mRNA and/or its translation probability can be affected by editing of miRNA binding sites in its 3′ UTR (Pinto et al. 2017) or by editing of the cognate miRNAs themselves (Kawahara et al. 2007; Alon et al. 2012; Vesely et al. 2012; Pinto et al. 2017; Wang et al. 2017). Yet, as of now it seems that most of these sites are functionally irrelevant.

The situation is quite different with respect to the coding sequence. Due to the structural similarity, inosines mimic guanosines in many cellular processes (Basilio et al. 1962). Translation of inosine-containing codons is mostly similar to that of the equivalent guanosine-containing ones (except for IAC codons, where 25% of the translated proteins interpret the inosine as adenosine) (Licht et al. 2019). Thus, editing of protein-coding sequences may lead to non-synonymous substitutions and novel protein variants, possibly affecting protein functionality. In addition to point-like protein modifications, editing may create splice sites, resulting in the introduction of novel exons (Rueter et al. 1999; Lev-Maor et al. 2007), and editing of a stop codon (e.g., UAG (stop) → UIG (tryptophan)) may lead to stoploss and C-terminal extension of the protein. Thus, unlike non-coding editing, the functional potential of editing events modifying the resulting protein (“recoding” sites) is quite clear. In mammals, recoding sites are mainly targeted by ADAR2, and it is thus believed that the main function of the mammalian ADAR2 enzyme is to edit specific non-synonymous sites within protein-coding sequences (Tan et al. 2017).
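The effect of a single edit on a codon can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and assuming that inosine is always read as guanosine (so it ignores the IAC exception noted above). The function names `translate_edited` and `recoding_effect` are hypothetical and introduced only for this example.

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_edited(codon: str) -> str:
    """Translate a codon in which 'I' marks an inosine.

    Simplifying assumption: inosine is always decoded as guanosine.
    """
    return CODON_TABLE[codon.upper().replace("I", "G")]


def recoding_effect(codon: str, position: int) -> tuple:
    """Edit the adenosine at 0-based `position`.

    Returns (amino acid before editing, amino acid after editing);
    '*' denotes a stop codon.
    """
    codon = codon.upper()
    if codon[position] != "A":
        raise ValueError("only adenosines can be edited")
    edited = codon[:position] + "I" + codon[position + 1:]
    return CODON_TABLE[codon], translate_edited(edited)


print(recoding_effect("UAG", 1))  # stop codon read as UGG (tryptophan)
print(recoding_effect("CAA", 2))  # CAA read as CAG: synonymous (Q -> Q)
```

An edit is recoding when the two returned amino acids differ, and synonymous when they are the same.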

Although other types of RNA editing also lead to recoding, to the best of our knowledge A-to-I editing is the only type that gives rise to recoding in nuclear mRNA across multiple tissues and conserved across lineages. The rest of this chapter focuses on A-to-I recoding.

4.2 Observing Recoding in RNA-Seq Data

Most current RNA sequencing schemes start with reverse transcription of the RNA into cDNA. Like ribosomes, reverse-transcriptases treat the inosines as guanosines. Consequently, inosines in the mRNA appear as guanosines in the cDNA, and the editing events show up in the RNA-seq data as A-to-G DNA-RNA mismatches.

Discovery of the first mammalian recoding sites throughout the first decade of A-to-I RNA editing research was serendipitous. The introduction of computational approaches has enabled systematic large-scale editing detection. The basic idea behind these approaches is quite simple. As editing shows up as an A-to-G DNA-RNA mismatch, one only needs to scan through large-scale sequencing databases and look for these mismatches, filtering out technical and biological noise (e.g., sequencing errors, incorrect alignment, genomic polymorphisms, somatic mutations) (Eisenberg et al. 2010; Schrider et al. 2011; Kleinman and Majewski 2012; Lin et al. 2012; Pickrell et al. 2012; Piskol et al. 2013). Since 2003, a number of groups have developed computational approaches that apply various filters to the multitude of A-to-G mismatches observed in a given sample, or a set of samples, in order to identify the relatively few originating from an editing event (Levanon and Eisenberg 2006; Eisenberg 2012; Ramaswami and Li 2016; Diroma et al. 2017; Lo Giudice et al., “Quantifying RNA Editing in Deep Transcriptome Datasets”, PMID: 32211029). Advances in sequencing technologies have increased the availability of high-coverage multi-sample datasets, resulting in millions of editing sites identified in human and other species (Bazak et al. 2014a; Ramaswami and Li 2014; Picardi et al. 2017a).
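The filtering idea can be illustrated with a toy Python sketch. It is not any of the published pipelines cited above: the `Pileup` record, the function `candidate_editing_sites`, and all thresholds are hypothetical, and the sketch assumes that bases are already reported on the transcribed strand and that a set of known polymorphic positions is available.

```python
from dataclasses import dataclass, field


@dataclass
class Pileup:
    """Read-base counts at one genomic position (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str                                     # genomic base, transcribed strand
    counts: dict = field(default_factory=dict)   # read base -> number of reads


def candidate_editing_sites(pileups, known_snps=frozenset(),
                            min_cov=20, min_g_reads=3):
    """Return (chrom, pos, editing_level) for positions passing simple filters.

    All thresholds are illustrative assumptions, not recommended values.
    """
    candidates = []
    for p in pileups:
        if p.ref != "A":                         # only adenosines can be edited
            continue
        if (p.chrom, p.pos) in known_snps:       # drop genomic polymorphisms
            continue
        a = p.counts.get("A", 0)
        g = p.counts.get("G", 0)
        cov = sum(p.counts.values())
        if cov < min_cov or g < min_g_reads or a + g == 0:
            continue                             # too little evidence
        if cov - a - g > g:                      # other mismatches dominate: noise
            continue
        candidates.append((p.chrom, p.pos, g / (a + g)))
    return candidates


example = [
    Pileup("chr1", 100, "A", {"A": 70, "G": 30}),         # plausible editing site
    Pileup("chr1", 200, "A", {"A": 50, "G": 50}),         # listed as a known SNP
    Pileup("chr1", 300, "A", {"A": 9, "G": 1}),           # coverage too low
    Pileup("chr1", 400, "A", {"A": 40, "G": 4, "C": 6}),  # noisy position
    Pileup("chr1", 500, "C", {"C": 80, "T": 20}),         # not an adenosine
]
print(candidate_editing_sites(example, known_snps={("chr1", 200)}))
```

Real pipelines add many further filters (alignment quality, read-end position, multi-sample support), which this sketch omits.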

These systematic searches revealed that recoding is but an exception within the editing repertoire. Virtually all sites found in the abovementioned computational screens reside outside the coding region and have no direct effect on the protein. Furthermore, non-coding editing events are easier to find, as they are often clustered and concentrated in the well-identified repetitive elements. As a result, on top of the low numbers of recoding sites detected, the false-positive rate is very high in the coding region, especially for mammalian transcriptomes where the scope of recoding is rather low compared with Drosophila or cephalopods (see below).

Accordingly, standard, widely used all-purpose detection schemes are not suitable for detection of recoding events. While they do show an impressive transcriptome-wide performance, the results in coding regions are rather poor (as reflected by the low fraction of A-to-G mismatches among all mismatches found). The reliability of thousands of putative human recoding sites that have been reported by the large-scale systematic searches for editing sites is thus questionable. Reliable identification of recoding sites remains an unmet challenge.
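The quality measure mentioned here, the fraction of A-to-G mismatches among all mismatches, is straightforward to compute. The following Python sketch assumes the detected sites are available as (reference base, observed base) pairs on the transcribed strand; the function names are hypothetical.

```python
from collections import Counter


def mismatch_spectrum(sites):
    """Count mismatch types, e.g. 'A>G', over (ref_base, alt_base) pairs."""
    return Counter(f"{ref}>{alt}" for ref, alt in sites if ref != alt)


def a_to_g_fraction(sites):
    """Fraction of A-to-G mismatches among all mismatches found."""
    spectrum = mismatch_spectrum(sites)
    total = sum(spectrum.values())
    return spectrum["A>G"] / total if total else float("nan")


# Toy input: three A-to-G mismatches and one C-to-T mismatch.
calls = [("A", "G"), ("A", "G"), ("A", "G"), ("C", "T")]
print(mismatch_spectrum(calls))
print(a_to_g_fraction(calls))
```

As a rough point of reference, if mismatches were pure noise spread evenly over the 12 possible types, A-to-G would account for about 1/12 of them; a set of calls whose A-to-G fraction is not far above such a background is likely dominated by false positives.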

One effective approach is available for conserved recoding sites. The technical and biological errors mentioned above are not expected to reoccur in multiple species at the exact same location, and therefore conserved A-to-G mismatches that are observed at the same position in two (not-too-close) species are expected to be enriched in evolutionarily conserved recoding sites (Hoopengardner et al. 2003; Levanon et al. 2005; Pinto et al. 2014). Note, however, that in highly conserved exons one may observe the same alignment artifact in several species, leading to a false discovery of a “conserved recoding event.” Dedicated methods for detection of recoding events in single-species data are currently being developed. Hopefully, a conservative alignment that minimizes alignment errors, supplemented by utilization of multiple samples to filter out genomic polymorphisms, may be the key to reliable and comprehensive mapping of recoding sites.
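The conservation filter amounts to keeping only those candidate sites that recur at the same orthologous position in several species. A minimal Python sketch follows, assuming each candidate has already been mapped to a shared coordinate (here a hypothetical tuple of ortholog identifier, aligned codon index, and position within the codon); the mapping step itself is the hard part and is not shown.

```python
from collections import Counter


def conserved_candidates(sites_by_species, min_species=2):
    """Return sites observed in at least `min_species` species.

    sites_by_species maps a species name to a collection of
    (ortholog_id, aligned_codon_index, codon_position) tuples.
    """
    seen = Counter()
    for sites in sites_by_species.values():
        seen.update(set(sites))          # count each species at most once
    return {site for site, n in seen.items() if n >= min_species}


# Hypothetical identifiers, for illustration only.
observed = {
    "human": {("geneA", 607, 1), ("geneB", 10, 0)},
    "mouse": {("geneA", 607, 1), ("geneC", 42, 2)},
    "chicken": {("geneD", 5, 0)},
}
print(conserved_candidates(observed))
```

As noted above, this filter does not protect against alignment artifacts that recur in highly conserved exons.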

4.3 Utility of Recoding

4.3.1 Diversifying the Proteome

Recent decades have revealed the important role played by post-transcriptional and post-translational mechanisms in generating the proteomic complexity of higher organisms. These epigenetic mechanisms allow for diversification of the proteome in a temporally regulated, tissue-specific, condition-dependent way, leading to functional heterogeneity across tissues, developmental stages, brain regions or even among individual cells within the same tissue.

Recoding by A-to-I RNA editing is an example of such a mechanism, facilitating proteome diversification. It has the capacity to create a range of proteins from a single genomically encoded gene, providing the organism with a new means for acclimation and adaptation. Unlike genomic mutations, editing could modify a fraction of the transcript copies, and its levels may be fine-tuned to produce the edited and unedited versions of the protein concurrently, even within the same single cell, at a relative concentration that depends on the tissue, condition and environment. Indeed, several studies have demonstrated how recoding levels at specific sites do change as a function of the organism’s condition. For example, editing levels in a variety of transcripts were shown to vary along the circadian cycle in mouse liver (Terajima et al. 2016), and changes in RNA editing have been associated with sleep (Robinson et al. 2016). Importantly, many studies have demonstrated altered editing of individual recoding targets in various disease states [for a recent review, see (Gallo et al. 2017)].

Recoding facilitates a much wider range of possibilities for adjusting the transcriptome than genomic mutations do. Unlike genomic mutations, the edits are transient, well-suited to respond immediately to external cues and drive acclimation to changes in internal or environmental conditions, without compromising the genomic information. A nice demonstration of this idea is the peak in ADAR levels and editing levels during spawning in corals, leading to over a thousand recoding events at the time of gamete release that are not observed in adult corals (Porath et al. 2017a). This extensive increase in protein diversity may improve the gametes’ adaptability without manipulating the underlying genome (Eisenberg and Levanon 2018). Another intriguing example is provided by recoding of a potassium channel in octopus, whose level correlates with the external temperature. It is not yet clear, however, whether this effect is due to rapid acclimation or long-term adaptation (Garrett and Rosenthal 2012a, b).

While recoding probably occurs in virtually all metazoa, the repertoire of recoding sites varies considerably across lineages. Only a few dozen recoding sites are known to be conserved across mammals (Pinto et al. 2014). Similarly, dozens of sites were found in zebrafish (Sie and Maas 2009; Pozo and Hoopengardner 2012; Li et al. 2014a; Shamay-Ramot et al. 2015) and ants (Li et al. 2014b), as well as 164 sites in bees (Porath et al. 2019). The situation is somewhat different in Drosophila, where nearly a thousand recoding sites were shown to be conserved across the lineage (Yu et al. 2016; Duan et al. 2017; Zhang et al. 2017). The most notable exception is the cephalopod lineage, which utilizes recoding at a level that far surpasses all other species studied so far (Alon et al. 2015; Liscovitch-Brauer et al. 2017), with tens of thousands of recoding events found in each of the four coleoid cephalopod species studied.

4.3.2 Limitations on Functional Utilization of Recoding

Given the above-described potential of recoding to be functionally utilized, and the fact that the editing mechanism is encoded in the metazoan genome, the relatively limited scope of recoding is surprising. One may have expected that in the course of organisms’ evolution, recoding sites will appear and fixate in the transcriptome as a response to external pressures. However, with the exception of cephalopods, recoding seems to be utilized to a rather limited extent across the animal kingdom. Even in Drosophila and cephalopods, the contribution of the conserved recoding sites to adaptation is not clear (Yablonovitch et al. 2017a, b). Why would that be the case? Several possible explanations have been proposed.

One possibility is that regulation of RNA editing is not sufficiently complex to allow for individual control of each of the hundreds or thousands of functional recoding sites. As far as is currently known, the editing efficiency is mostly determined by two factors: local sequence and structural motifs encoded in the RNA sequence, and the expression level of the ADAR proteins and their regulators. The surrounding sequence is, by and large, hard-wired in the genome, and is therefore independent of the tissue, cell type, environmental condition or developmental stage. Indeed, editing levels at specific mammalian sites are largely consistent across tissue-matched samples from different individuals (Greenberger et al. 2010). Thus, this factor does not contribute to regulation, and one would expect the variations in editing level at a given site to be mostly governed by the level of the ADAR proteins and their regulators. Alterations in ADAR levels might allow intricate tissue-dependent or condition-dependent regulation (Picardi et al. 2015), but all editing sites would be equally affected. This sets a major limitation on the flexibility of regulation, and may result in an effective upper bound to the number of independently regulated functional recoding sites.

It should be noted, though, that the full repertoire of ADAR regulators is still unknown. Possibly, there are multiple trans-regulators of RNA editing that allow for a more complex editing pattern (several candidates have been recently suggested: Fritz et al. 2009; Marcucci et al. 2011; Garncarz et al. 2013; Behm et al. 2017; Oakes et al. 2017; Tan et al. 2017; Chung et al. 2018; Roth et al. 2019). Note, however, that the enzyme specificity of these regulators is mostly unknown. Possibly they affect mostly ADAR1. Another interesting layer of editing regulation is provided by auto-editing of ADAR2 (Rueter et al. 1999), resulting in the appearance of a novel 3′ splice acceptor site, which in turn leads to an addition of 47 nucleotides. The affected transcript is frame-shifted and is predicted to lose the dsRNA-binding domain as well as the catalytic domain. Interestingly, ADAR auto-regulation is also observed in Drosophila and bumblebee, but there it leads to non-synonymous changes rather than a frameshift (Palladino et al. 2000; Savva et al. 2012; Porath et al. 2019). However, as far as we currently know, these ADAR regulators mostly affect editing globally, and probably do not allow for site-specific control of editing levels. More intricate, yet unidentified, layers of regulation may exist, providing differential control over the editing levels at different sites. On the other hand, if indeed editing regulation, by and large, does not provide site-specific resolution, this sets a major limitation on the use of recoding for adaptation and acclimation. These limitations become more and more pressing with an increasing number of functional recoding sites, as adjustment of the global regulators of recoding should take into account the effect on an increasing number of targets.

Another possible explanation for the rare usage of recoding in many species is related to the evolutionary cost of maintaining a fixed functional recoding site. It has been suggested (Liscovitch-Brauer et al. 2017) that conservation of an active recoding site imposes a severe constraint on the genomic region that encodes the dsRNA structure recognized by ADAR proteins. Mutations that affect the stability of this secondary structure might modify the level of editing or abolish editing altogether (Reenan 2005; Rieder et al. 2013). If the site is indeed positively selected, such mutations will undergo purifying selection so that the delicate balance between the edited and unedited versions of the protein is maintained. The higher the number of such positively selected sites is, the stronger is this constraint on the global genomic evolution. In cephalopods, it is estimated that 3–15% of the inter-species mutations and 10–26% of the intra-species polymorphisms were purified due to constraints associated with maintenance of editing (Liscovitch-Brauer et al. 2017). Conversely, creation of a new editing site requires a structure to evolve, imposing evolutionary constraints on the surrounding sequence. This trade-off between the transcriptome plasticity provided by RNA editing and the genomic variation required to drive adaptation and evolution might explain why extensive recoding was disfavored in most metazoan lineages (Liscovitch-Brauer et al. 2017).

4.3.3 Recoding as a Global Response to External Conditions

However, even if recoding cannot be efficiently regulated at a single-target resolution, global regulation of recoding may still be useful for adaptation if a change in external conditions, such as temperature or acidity, affects all sites, or many of them, in a similar way. Recoding may then be utilized to counteract this change, or respond to it, at all recoding sites. For example, editing has been shown to be involved in temperature response in both Drosophila and cephalopods (Garrett and Rosenthal 2012b; Rieder et al. 2015; Buchumenski et al. 2017). Presumably, a decrease in the external temperature perturbs the energy-entropy balance controlling protein folding and might be mitigated by a global increase in editing that tends to replace multiple amino acids by smaller, less stabilizing, ones (Garrett and Rosenthal 2012a). Under this scenario, global coordinated upregulation of editing in multiple targets could be functional as a response mechanism to lowered temperatures. Interestingly, this response of editing to temperature, one of the most important environmental variables, can be easily achieved without any need for intricate regulatory networks. Editing depends on folding the RNA molecule into dsRNA structures. The stable folded structure is governed by a balance between binding energies and structural entropy, and is therefore affected directly by the external temperature. It is therefore easy to imagine RNA structures that are fine-tuned to allow editing only below a certain cut-off temperature.
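The energy-entropy balance can be made concrete with a textbook two-state folding model, in which the folding free energy is ΔG = ΔH − TΔS and the folded fraction is 1 / (1 + exp(ΔG / RT)). The Python sketch below is an illustration only: the two-state assumption, the enthalpy value, and the 20 °C midpoint are hypothetical choices and are not taken from the studies cited above.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal / (mol * K)


def folded_fraction(temp_c, dh_kcal=-60.0, tm_c=20.0):
    """Fraction of RNA molecules in the folded (dsRNA) state at temp_c.

    Two-state model with hypothetical parameters:
    dh_kcal is the folding enthalpy, tm_c the midpoint temperature at
    which half of the molecules are folded (where dG = 0).
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    ds = dh_kcal / tm                 # entropy chosen so that dG(tm) = 0
    dg = dh_kcal - t * ds             # dG = dH - T * dS
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * t)))


for temp in (10, 15, 20, 25, 30):
    print(temp, round(folded_fraction(temp), 3))
```

With these illustrative parameters the structure is mostly folded, and hence potentially editable, a few degrees below the midpoint and mostly unfolded a few degrees above it, which is the kind of temperature cut-off behavior described above.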

Having the above scenario in mind, one is tempted to offer an attractive explanation for the striking difference between mammals on one side, and Drosophila and cephalopods on the other. The latter species have been shown to utilize recoding to respond to acute temperature changes, while the homeothermic mammals have no incentive to utilize extensive recoding. This is further supported by a recent study that examined RNA editing in squirrel, a heterothermic mammal, and suggested a dynamic response of the A-to-I editing profile to the low body temperature during hibernation (Riemondy et al. 2018). One should note, however, that the above-mentioned initial analyses of ants, bees, and fish seem to suggest that limited-scope recoding is not restricted to homeothermic animals. Future studies of more diverse species are needed to reveal the extent to which cold-blooded organisms utilize extensive editing to respond to temperature.

4.3.4 Functional Studies of Specific Sites

The previous sections leave us with a number of open questions: Is RNA editing utilized for proteome diversification? If so, which of the editing events are adaptive? Is conserved recoding generally adaptive? Does editing contribute to a dynamic proteomic response to external pressures? Detailed functional analyses of multiple recoding sites are required in order to fully settle these questions. However, experimental studies of the effect of recoding are often challenging and time-consuming, as the phenotype of editing may be subtle, if not elusive. Accordingly, mechanistic understanding of the effect of recoding at these sites on the biochemical activity of the protein, not to mention functional analysis of the consequences to the cell and the organism, typically lags behind identification of new recoding sites. So far, only some of the strongly edited and conserved mammalian sites have been characterized in detail.

The most studied recoding site is the Q/R site in GluR-B, the first discovered case of recoding in mammals, which results in voltage-independent gating with decreased calcium permeability (Sommer et al. 1991; Higuchi et al. 1993; Seeburg and Hartner 2003). Editing of this site is nearly complete in normal brain tissues (Sommer et al. 1991). Its under-editing is associated with human diseases such as amyotrophic lateral sclerosis (ALS) and malignant gliomas (Maas et al. 2001; Kawahara et al. 2004; Kwak and Kawahara 2005), and the absence of recoding at this site results in an early death in mice (Higuchi et al. 2000). This is the only mammalian recoding site associated with such a severe phenotype. The Q/R site is one of the most conserved recoding sites in mammals, is also observed in amphibians and some species of fish, and is likely to have evolved no later than the appearance of cartilaginous fish (Kung et al. 2001).

The second target identified, the serotonin 2C receptor (5-HT2cR) (Burns et al. 1997), is one member of a family of serotonin receptors expressed in the central nervous system. It is edited at five different sites affecting three amino acids. These sites are not fully edited, nor fully correlated, and thus editing could potentially lead to 24 different protein isoforms with varying effects on the response to serotonin and a cascade of downstream pathways (Burns et al. 1997; Marion et al. 2004). Transcripts encoding at least 20 of the different protein variants were observed in human brain tissues (Wang et al. 2000; Wahlstedt et al. 2009; Khermesh et al. 2016; Zaidan et al. 2018). However, the unedited isoform (Isoleucine–Asparagine–Isoleucine; INI) alone accounts for roughly half of the transcripts (Khermesh et al. 2016).
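The count of 24 isoforms can be reproduced by enumerating all combinations of the five edits. The Python sketch below relies on details that are not given in the text and are stated here as assumptions: that the three affected codons are AUA, AAU and AUU (encoding the unedited INI isoform), and that the editable adenosines are at positions 1 and 3 of the first codon, positions 1 and 2 of the second, and position 1 of the third. Inosine is treated as guanosine.

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed unedited codons and 0-based editable positions within each codon.
CODONS = ("AUA", "AAU", "AUU")
EDITABLE = ((0, 2), (0, 1), (0,))


def enumerate_isoforms():
    """Return (set of mRNA variants, set of protein isoforms)."""
    sites = [(ci, pos) for ci, positions in enumerate(EDITABLE) for pos in positions]
    mrnas, proteins = set(), set()
    for pattern in product((False, True), repeat=len(sites)):
        codons = [list(c) for c in CODONS]
        for (ci, pos), edited in zip(sites, pattern):
            if edited:
                codons[ci][pos] = "G"      # inosine is read as guanosine
        variant = tuple("".join(c) for c in codons)
        mrnas.add(variant)
        proteins.add("".join(CODON_TABLE[c] for c in variant))
    return mrnas, proteins


mrnas, proteins = enumerate_isoforms()
print(len(mrnas), len(proteins))
print(sorted(proteins))
```

Under these assumptions the five binary sites give 2^5 = 32 distinct mRNA variants, which collapse to 3 × 4 × 2 = 24 protein isoforms because some edit combinations in the first codon are synonymous with each other.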

Functional studies of the effect of recoding have been published for a small number of other physiologically important mammalian genes (Sommer et al. 1991; Egebjerg and Heinemann 1993; Lomeli et al. 1994; Burns et al. 1997; Sailer et al. 1999; Bhalla et al. 2004; Yeo et al. 2010; Daniel et al. 2011; Chen et al. 2013; Miyake et al. 2016; Jain et al. 2018), and electrophysiological studies have analyzed the effects of recoding on a few ion channels in cephalopods (Patton et al. 1997; Rosenthal and Bezanilla 2002; Colina et al. 2010; Liscovitch-Brauer et al. 2017), but the implications of recoding remain largely unknown for the vast majority of reported sites.

Over one thousand recoding sites have been reported in humans, but only a few dozen of them were shown to be conserved across mammals (Pinto et al. 2014). Thus, the vast majority of human recoding sites seem to be restricted to human or the primate lineage. These non-conserved recoding sites do not show signs of selection (Xu and Zhang 2014); that is, they are less abundant and more weakly edited compared with editing at synonymous sites, and they are under-represented in essential genes, highly expressed genes, and genes that are under purifying selection. However, it is not clear yet whether these results represent the actual behavior of mammalian recoding sites or merely reflect the rather large false-positive rate in current databases.

Furthermore, even for the conserved sites the functional importance of editing is not obvious. A recent study has demonstrated that, with the exception of the essential recoding Q/R site within GRIA2 transcripts (Higuchi et al. 2000), complete abolishment of recoding is well tolerated (Chalk et al. 2019). Mice lacking ADAR2 suffer from progressive seizures and die within three weeks of birth, but this severe phenotype is completely rescued by altering their genome to encode an arginine at the GRIA2 Q/R recoding site (Higuchi et al. 2000; Chalk et al. 2019). The rescued mice develop normally and live a normal lifespan even if ADAR1-editing is further shut down (Chalk et al. 2019). This unexpected result does not exclude the possibility that recoding of conserved mammalian targets (other than the Q/R GRIA2 site) does have functionally important, even if subtle (Horsch et al. 2011) (or apparent only under specific conditions), effects. However, it raises the possibility that many of these sites may be dispensable.

Finally, the vast majority of the mammalian recoding sites reported so far are edited to a very low level. Often, only a few percent or less of the transcripts carry the edited version. Certainly, low-level editing is less likely to have a functional impact. Indeed, the editing levels at the conserved recoding sites, expected to be adaptive, are much higher than those at the non-conserved sites or at the synonymous editing sites within the coding sequence (Pinto et al. 2014). Assuming the low-level sites are not functional, why are they being edited? This may be just biological noise, as ADAR enzymes may bind weakly to some randomly structured RNAs and edit them to a minimal extent. In parallel, many weakly edited sites are due to “satellite” editing. The RNA structures required for editing of functionally important recoding sites often include dozens, or even hundreds, of adenosine nucleotides. Some of these may get edited just because they happen to be incorporated in the dsRNA structure. In both cases, these events may survive selection as long as the effect of editing is not too deleterious (e.g., editing is weak enough so that the slight decrease in the unedited protein isoform is tolerable and the edited form itself is not harmful) (Xu and Zhang 2014). Satellite sites may even be conserved across distant species, as a result of conservation of the structure required for editing of the functional site in their vicinity.

However, it is also possible that sites appearing to be weakly edited when averaged over a tissue exhibit much higher editing levels in specific subpopulations of cells (Gal-Mark et al. 2017), or even at a single-cell level (Picardi et al. 2017b). In fact, an interesting recent report suggests that at the single-cell level, editing is often binary in nature: either all copies of the transcript are being edited, or none are (Picardi et al. 2017b). If this is indeed the case, then even a low level of editing could have a major impact on some cells within the tissue.
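A small numerical illustration shows why a tissue-averaged editing level cannot distinguish these scenarios. The Python sketch below uses made-up numbers: two hypothetical populations of 100 cells, each cell carrying 100 copies of the transcript, one with uniform weak editing and one with binary editing.

```python
def bulk_editing_level(cells):
    """Tissue-level editing: edited copies over all copies, pooled across cells.

    cells is a list of (transcript_copies, edited_copies) pairs, one per cell.
    """
    total = sum(copies for copies, _ in cells)
    edited = sum(e for _, e in cells)
    return edited / total


def max_cell_level(cells):
    """Highest per-cell editing level in the population."""
    return max(e / copies for copies, e in cells if copies)


# Hypothetical populations with the same bulk editing level.
uniform = [(100, 5)] * 100                    # every cell edits 5% of its copies
binary = [(100, 100)] * 5 + [(100, 0)] * 95   # 5 cells fully edited, 95 unedited

print(bulk_editing_level(uniform), max_cell_level(uniform))
print(bulk_editing_level(binary), max_cell_level(binary))
```

Both populations show the same bulk editing level, yet in the binary case a minority of cells carries only the edited protein isoform.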

4.4 Evolutionary Aspects of Recoding

4.4.1 The Evolutionary History of Recoding

The ancestral ADAR enzyme appears to have originated via the incorporation of a double-stranded RNA binding domain into the coding sequence of ADAT1, a member of the ADATs family (adenosine deaminases acting on tRNA) found in all eukaryotes (Gerber et al. 1998) that are incapable of editing mRNAs. Extensive editing has been observed in cnidaria (corals) (Porath et al. 2017a), and ADAR enzymes were identified in multiple Ctenophora and Porifera species (although not in the placozoan Trichoplax adhaerens) suggesting that the origin and expansion of the ADAR gene family preceded the last common ancestor to all contemporary animals (Grice and Degnan 2015). It is now widely believed that the ancestral function of ADAR1, shared by all present-day metazoans, is to protect against false activation of the innate immune system. Recoding is probably a secondary use of the editing machinery. Following the introduction of ADARs to the metazoan cell, weak recoding sites have presumably appeared as a side-effect to the ancestral ADAR1 activity, and the beneficial ones were then maintained and further evolved.

It should be noted that while the RNA edits themselves are transient and are not transmitted to the next generation of cells, editability is inherited through the RNA structural and sequence motifs encoded in the parent genomic sequence. As editing relies on the target RNA adopting a specific dsRNA secondary structure, and possibly adjacent editing-enhancing dsRNA structures (Lomeli et al. 1994; Rieder et al. 2013; Daniel et al. 2014; Sapiro et al. 2015), the genomic sequence surrounding a site may transmit the editing pattern to the next generation of cells, and genomic mutations in this sequence may further fine-tune editing efficiency. Recoding is therefore a mechanism for heritable proteome diversification and has the potential to lead to adaptation in response to external pressures (Gommans et al. 2009).

A novel recoding site may appear in the course of evolution following an accumulation of random point mutations that slowly modify the structure of the corresponding RNA molecule to form the minimal dsRNA structure required for ADAR recruitment. This process may be accelerated by the activity of mobile elements, in two different ways. First, mobile elements newly integrated into the genome may be exonized and incorporated into protein-coding sequences (Sorek et al. 2004). These repetitive elements are susceptible to editing, as they can readily pair with a similar reversely oriented element in a nearby intron to create a long and stable dsRNA duplex (Bazak et al. 2014b). For example, the hundreds of Alu elements that have been exonized into coding regions of the human transcriptome (Dagan et al. 2004) are enriched in primate-specific recoding sites (over a thousand such sites are tabulated in current databases). A notable example is the NARF gene, harboring a pair of extensively edited inverted Alu repeats in one of its introns. In primates, editing of NARF pre-mRNA creates a novel splicing site and recodes a stop codon, resulting in a novel primate-specific alternatively spliced exon, which itself contains additional recoding sites (Lev-Maor et al. 2007).

Second, mobile elements may accelerate the emergence of novel recoding events by creating an intronic RNA duplex as a result of mobile element activity in a nearby intron. Long and stable intronic dsRNAs are known to induce or enhance site-selective editing at recoding sites in a neighboring exon, up to several hundred nucleotides away (Daniel et al. 2012, 2017; Ramaswami et al. 2015). Notably, many of the most efficiently edited (>50% editing) recoding sites conserved across mammals are located in proximity to editing-inducing elements (Daniel et al. 2017) that may serve as ADAR recruitment elements. Accordingly, a pair of inverted mobile elements newly introduced near a coding exon could form a dsRNA structure that would enhance editing of a neighboring preexisting recoding site, or even initiate recoding at a site that was not edited prior to insertion of the repetitive element (Daniel et al. 2014).

Interestingly, the structure of the genetic code prevents an adenosine-to-guanosine substitution from creating a premature stop codon: of the three stop codons (UAA, UAG, and UGA), the two that contain a guanosine could arise by such a substitution only from UAA, which is itself a stop codon. Thus, random non-specific A-to-I editing events cannot produce truncated protein products, which are usually dysfunctional and often harmful, and their potential deleterious effect is limited. This observation may partially explain how extensive A-to-I editing is tolerated (as compared to C-to-U editing, for example). Most nonspecific recoding is expected to be evolutionarily neutral or slightly deleterious and should be slowly depleted from the transcriptome, while the few beneficial sites are fixated. If this model is correct, one may expect to see in present-day transcriptomes many newly acquired recoding sites that are organism-specific (or lineage-specific) and mostly evolutionarily neutral or possibly mildly deleterious, in addition to a set of more deeply conserved, functionally beneficial, fixated sites.
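The claim that A-to-I editing cannot create a premature stop codon can be checked by brute force. The following is a minimal sketch, assuming the standard genetic code and treating inosine as guanosine during translation; the function names are illustrative and not taken from any cited work. It enumerates every sense codon, replaces every non-empty subset of its adenosines with guanosine, and collects any case in which the result is a stop codon.

```python
from itertools import combinations, product

BASES = "UCAG"
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code


def edited_versions(codon):
    """Yield every codon obtained by reading one or more A's as G (A-to-I)."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    for k in range(1, len(a_positions) + 1):
        for chosen in combinations(a_positions, k):
            yield "".join("G" if i in chosen else b for i, b in enumerate(codon))


def stops_created_by_editing():
    """Return (sense codon, edited codon) pairs in which editing creates a stop."""
    created = []
    for codon in map("".join, product(BASES, repeat=3)):
        if codon in STOP_CODONS:
            continue  # only sense codons can acquire a *premature* stop
        created += [(codon, e) for e in edited_versions(codon) if e in STOP_CODONS]
    return created


created_stops = stops_created_by_editing()  # empty list: no sense codon becomes a stop
```

The reverse direction is possible: under the same assumptions, `edited_versions("UAG")` yields only UGG (tryptophan), so editing can remove a stop codon even though it cannot introduce one.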

Indeed, virtually all recoding sites identified in mammals, Drosophila, cephalopods, and other species studied so far are lineage-specific, and most of them are not conserved even across closely related species. Thousands of human recoding sites have been reported, only a few dozen of which were found in mouse, and only a handful are known to be edited in non-mammalian vertebrates. For example, editing of the Q/R site in GluR-B is observed in birds, amphibians and some species of fish, and is assumed to have been acquired following the Agnatha–Gnathostome separation (Kung et al. 2001), and recoding of FLNA and CYFIP2 is conserved in birds (Levanon et al. 2005). So far, only a single target (the Shaker potassium channel) is known to be shared by vertebrates, Drosophila and cephalopods (Porath et al. 2019). Thus, while the available information about the conservation of recoding across species is still partial, it seems consistent with the view that recoding sites were not part of the ancestral set of ADAR targets, but rather were exapted into the genomes of the different lineages subsequent to their divergence, possibly following a lineage-specific large-scale genome invasion of mobile elements. Screening of more lineages is therefore expected to reveal independent sets of recoding sites, of widely varying size.

4.4.2 Interplay Between Recoding and Genomic Mutations

Interestingly, many recoding sites are fixed genomically as guanosines in closely related species (Tian et al. 2008; Pinto et al. 2014). In some cases, the ancestral genomic allele is G, and then editing partially counteracts the effect of a G-to-A genomic mutation. For example, it is argued that the Q/R site in GluR-B emerged following the divergence of jawed vertebrates. The ancestral allele, as it appears in jawless fish (but also in many teleost fish, including zebrafish and fugu), codes for arginine (Kung et al. 2001). Similarly, the frog and puffer fish genomic versions of subunit α3 of the GABAA receptor encode methionine at a position orthologous to the mammalian-conserved I/M recoding site (Ohlson et al. 2007). In these cases, one may argue that the genomic-A allele is disadvantageous, and it is only due to editing that the G-to-A mutation can be tolerated and fixated. If this is the case, recoding should have evolved rather quickly (on evolutionary scales, obviously) following the genomic G-to-A conversion, which means that the mutation should have occurred within a pre-existing dsRNA structure. It is yet to be determined whether in such cases having the “editing switch,” i.e., the possibility to express both the edited and non-edited variants of the protein, is beneficial compared with having only the edited version, with a G hardwired in the genome.

On the other hand, there are several examples of sites where the ancestral genomic state was an editable adenosine, and then in some species a guanosine was hardwired into the genome. For example, one of the recoding sites in subunit α6 of the nicotinic acetylcholine receptor is recoded in the silkworm and the honeybee, but the tobacco budworm harbors a genomically encoded G (Tian et al. 2008). Phylogenetic analysis reveals that the ancestral state at this site is an adenosine, which gained recoding in some species and was then converted to a guanosine in the tobacco budworm. In such cases, it is tempting to think of editing as an evolutionary intermediate, enabling “probing” of the G allele without changing the genome. Only when the organism is well-adjusted to the G allele can the genomic A-to-G mutation be accepted (Tian et al. 2008). However, currently available data are limited to anecdotal examples and can be equally explained by the simple observation that sites where the G allele is tolerable are more likely to acquire both recoding and a genomic A-to-G mutation.

4.4.3 Is Recoding Generally Adaptive?

What fraction of recoding activity is adaptive? Analysis of thousands of human putative recoding sites suggests that these sites are mostly non-adaptive and slightly deleterious (Xu and Zhang 2014). Only a few dozen human recoding sites are conserved across mammalian species (Pinto et al. 2014) and expected to be functional. The situation seems very different in other lineages: close to a thousand recoding sites are conserved across the Drosophila lineage (Yu et al. 2016; Duan et al. 2017; Zhang et al. 2017), and more than 10,000 recoding sites are conserved across cephalopod species (Liscovitch-Brauer et al. 2017). These sites show signs of positive selection and are enriched for non-synonymous substitutions (recoding sites) over synonymous substitutions, an indicator of positive selective pressure.
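To interpret an enrichment of nonsynonymous over synonymous editing, one needs a baseline for how often a random A-to-G change alters the encoded amino acid. The sketch below computes a deliberately crude baseline, assuming the standard genetic code and uniform usage of all 61 sense codons. It is not the procedure used in the studies cited above, which would account for actual codon usage and sequence context; the names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_a_to_g_outcomes():
    """Count single A-to-G changes in sense codons by their coding outcome.

    Returns (nonsynonymous, synonymous), with every sense codon weighted
    equally (an assumption; real transcriptomes have biased codon usage).
    """
    nonsyn = syn = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons lie outside the coding sequence proper
        for i, base in enumerate(codon):
            if base != "A":
                continue
            edited = codon[:i] + "G" + codon[i + 1:]
            if CODON_TABLE[edited] == aa:
                syn += 1
            else:
                nonsyn += 1
    return nonsyn, syn


nonsyn_count, syn_count = count_a_to_g_outcomes()
baseline_ratio = nonsyn_count / syn_count  # expected N/S with no selection
```

Under these assumptions every A-to-G change at the first or second codon position is nonsynonymous, and almost all changes at the third position are synonymous, so an observed N/S ratio is informative only relative to this kind of expectation.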

Even in mammals, the question of recoding adaptiveness is not fully settled. First, it is not yet clear to what extent these analyses are affected by the high false-discovery rates in the reported sites. An improved analysis of the adaptive nature of recoding in mammals requires a more accurate detection scheme, as well as a more detailed analysis of conservation in more closely related species, e.g., within the primate lineage. Second, as explained above, many weak editing sites are expected to arise due to nonspecific ADAR activity, so adaptiveness should be analyzed based on the editing levels. In fact, although these weak sites are numerous, their overall contribution to the recoding activity (measured by the number of deamination reactions) is not large compared to the conserved sites that are strongly expressed and strongly edited. In most human tissues, recoding of FLNA and IGFBP7, whose recoding is both conserved across mammals and has a proven functional impact (Jain et al. 2018; Morgantini et al. 2019), accounts for the majority of ADAR’s recoding deamination reactions. Thus, while it may very well be the case that most recoding sites are nonadaptive, most recoding activity may be adaptive. Third, some weak sites are “satellite” events that belong to a cluster of sites including a stronger, possibly conserved and functional site. These satellite sites may be nonadaptive on their own, but editing of the whole cluster may still be beneficial.
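The distinction between counting sites and counting deamination reactions can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that a site's contribution to recoding activity can be approximated as transcript abundance multiplied by editing level, and all numbers are made-up toy values, not measurements from any dataset.

```python
def recoding_activity_shares(sites):
    """Return each site's share of the total number of edited transcripts.

    `sites` maps a site name to (expression, editing_level): expression is a
    transcript abundance in arbitrary units, and editing_level is the fraction
    of transcripts edited at that site (between 0 and 1).
    """
    activity = {name: expr * level for name, (expr, level) in sites.items()}
    total = sum(activity.values())
    if total == 0:
        raise ValueError("no editing activity in the supplied sites")
    return {name: value / total for name, value in activity.items()}


# Hypothetical toy input: one strongly expressed, strongly edited site
# alongside fifty weakly expressed, weakly edited sites.
toy_sites = {"strong_conserved_site": (1000.0, 0.40)}
toy_sites.update({f"weak_site_{i}": (100.0, 0.02) for i in range(50)})

shares = recoding_activity_shares(toy_sites)
strong_share = shares["strong_conserved_site"]
strong_fraction_of_sites = 1 / len(toy_sites)
```

In this toy setting the single strong site is about 2% of the sites yet accounts for most of the edited transcripts, which is the sense in which most sites can be nonadaptive while most activity is not.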

On the other hand, the adaptive role of conserved recoding activity was recently challenged from a different angle (Jiang and Zhang 2019). It was suggested that editing as a diversifying mechanism is actually never adaptive, and the only cases in which editing is conserved and maintained by evolution are those where only the G allele is actually beneficial. According to this “harm-permitting model,” recoding is fixated in the genome only when required to correct for a deleterious G-to-A genomic mutation (“restorative editing,” which may be the case for the Q/R GRIA2 site, see above), or at least to compensate for the lack of a beneficial A-to-G mutation. One may argue that such cases are not truly adaptive, as having a fixed G allele would be advantageous over the flexible editable adenosine. Restorative non-adaptive editing may account for the over-representation of recoding sites (high N/S, nonsynonymous to synonymous ratio) observed in conserved mammalian sites, as well as in Drosophila and cephalopod sites, even if there is no adaptive advantage to having an editable A at these sites as compared to the ancestral genomically encoded G. This “harm-permitting” model is supported by analysis of cephalopod recoding sites, which exhibit enrichment of recoding at restorative ancestral-G sites, consistent with prior studies (Tian et al. 2008; Zhang et al. 2014; An et al. 2019). While restorative editing certainly takes place, its extent is still unclear. It is not yet known whether it may account for the multitude of deeply conserved sites. Careful analysis of the evolutionary history of recoding sites in multiple lineages and experimental analysis of known conserved sites are required in order to settle this fundamental and important question.

4.5 Conclusion

Recoding is a post-transcriptional mechanism, capable of diversifying the proteome and contributing to its complexity. Despite much progress in the past three decades, a number of key basic questions are still open. Computational biologists are still struggling to provide comprehensive and accurate sets of recoding sites, even in humans. On the experimental side, the biochemical and functional impact of recoding is largely unknown for the majority of the strongly edited and well-conserved sites. Finally, there are many open global questions regarding the regulatory and evolutionary aspects of this intriguing phenomenon, and even the general notion of recoding being adaptively utilized to diversify the proteome is not fully accepted. We look forward to future computational and experimental advancements, combining global analyses of recoding sites and their properties with detailed characterization of individual sites, in the hope of clarifying the above questions as well as opening exciting new research directions.