1 Introduction

Spontaneous mutations are among the primary engines of evolutionary change. Until now, the major mode of their occurrence has been thought to be a consequence of errors in DNA replication, resulting in substitution and frameshift mutations. In reality, such mutational errors are not due to the enzymatic process having gone awry. Rather, substitution mutations are the consequences of intrinsic chemical equilibria associated with the tautomers of the bases and the steric isomers of the nucleotide precursors of DNA replication (Topal and Fresco 1976). Frameshift mutations, on the other hand, arise as a consequence of template or growing strand DNA misfoldings resulting in sequences that misdirect the polymerase, which is possible because of the flexibility of the nucleic acid backbone (Fresco and Alberts 1960).

Recently, we discovered a very different source of spontaneous mutations that is due to a self-catalyzed, site-specific DNA depurination mechanism inherent in a DNA sequence element 14–16 base pairs long (Amosova et al. 2006). The mechanism is mediated by this stem-loop-forming sequence, which is present in all of the more than 100 double-stranded genomes we have examined, from Archaea to Homo sapiens. In many lower forms, sites for self-depurination of G-residues occur about once every ~5,000 base pairs; the frequency rises gradually up the phylogenetic tree to about once every ~3,000 base pairs in man, i.e., ~1.25 × 10⁶ such sites.

Self-depurination has the potential to occur wherever the consensus sequences specific for self-catalysis of depurination of G or of A residues (see below and Fig. 1.1) are present, and so give rise to apurinic sites in the DNA backbone. Those sites can, in turn, result in point mutations as a consequence of their highly error-prone repair (Boiteux and Guillet 2004; Chakravarti et al. 2000; Korolev 2005; Simonelli et al. 2005). On the one hand, such repair can cause a substitution mutation. Alternatively, if the apurinic site gives rise to a backbone break by either the action of the widely occurring enzyme apurinic endonuclease (Korolev 2005) or by way of the well-known spontaneous β-elimination reaction (Lhomme et al. 1999; Sugiyama et al. 1994), the strand break site results in a frayed end accessible to exonuclease attack, particularly if the first base pair after the break is A•T, and so leads to a short deletion mutation.

Fig. 1.1

Schematic diagrams of the stem-loop structure of the catalytic intermediates for site-specific self-catalyzed depurination in DNA of the indicated G-residue (left) and A-residue (right). In each case, the self-depurinating stem-loop is one side of a cruciform extruded from duplex DNA. The complementary inverted repeat sequence forms the other stem-loop of the cruciform that is, however, incapable of self-depurination. Certain sequence elements of the self-depurinating stem-loops are required for self-depurination to occur: the residues shown for each tetra-loop; particular complementary base pairs at the base of the loop, T•A or G•C on the left, and T•A, G•C or A•T on the right. These should be followed by any sequence of four or more additional complementary base pairs that can generally tolerate one mismatched pair (as shown on the left) or a single extrahelical residue. Thus, the specificity for self-depurination lies in the loop sequence and first base pair. The helical stem otherwise functions to maintain some strained loop structure required for the self-catalysis, and by its stability, to extend the lifetime of the intermediate and so enhance the kinetics of the self-catalysis

In this report, some essential features of the underlying self-depurination mechanism are first described. We then proceed to issues of biological relevance, including indications that the mechanism has played a role in the evolution of some biological phenomena. Because the consensus sequence for self-depurination of G-residues was discovered several years ago, and that for A-residues only very recently, our data is much more extensive for the former. Nevertheless, it will become apparent that both mechanisms are very similar, and appear to have played comparable mutagenic roles in several biological processes, including those of molecular evolution.

2 Essential Features of DNA Self-Catalyzed Depurination

Self-catalyzed depurination is a remarkably site-specific reaction that does not involve the direct participation of either a protein enzyme or any multivalent cation or cofactor, and can occur under essentially physiological conditions (Amosova et al. 2006). As such, it represents the first natural deoxyribozyme activity discovered. It is mediated by formation of two very similar stem-loop-forming consensus sequences, which we have found to be present at rather high frequency in every double-stranded DNA genome searched, from the lowest to the highest form.

Figure 1.1 shows schematically the stem-loop structure of two sequences that are highly site-specific for self-catalyzed depurination of a G-residue (left) and an A-residue (right). As the figure indicates, the self-depurination mechanism removes the 5′G-residue of the loop in the former case, and the A-residue toward the 5′ end in the latter one. In either case, the initial product of the catalytic event is an apurinic site in the loop sequence. This chain backbone site is thereby labilized, carrying a potential for intracellular backbone cleavage either by the enzyme apurinic endonuclease, or else as a result of its susceptibility to spontaneous backbone cleavage by a β-elimination reaction that can occur at slightly alkaline intracellular pH. Such an apurinic site, susceptible to error-prone repair, is potentially highly mutagenic and can give rise to a substitution or a short deletion. It is this resultant mutagenic potential that confers on self-depurination a role in evolution, all the more so because the self-catalytic depurination rate we have measured in vitro (Amosova et al. 2006) occurs 10⁴–10⁵× faster than the background spontaneous depurination rate that has been estimated in vivo (Lindahl and Nyberg 1972).

DNA is typically double-stranded, whereas the catalytic intermediate for self-depurination is a single-stranded stem-loop. Hence, the self-depurination mechanism requires that the inverted repeat sequence harboring the self-depurinating loop first extrude as a cruciform (half of which contains the single-stranded self-depurinating stem-loop), and that the cruciform have a sufficient lifetime for the reaction to occur. That such cruciform extrusion can take place has been demonstrated previously (Alvarez et al. 2002; Inagaki et al. 2009; Kim et al. 1998; Shlyakhtenko et al. 1998). We have recently performed experiments under physiological conditions in vitro with stem-loop-forming sequences for G-residue self-depurination embedded in supercoiled plasmids, in which such extrusion has been directly shown to be crucial for the self-catalytic depurination (Amosova et al. 2011b).

As Fig. 1.1 indicates, the essential features of the stem-loops are very similar, i.e., both have tetra-loops with different highly specific sequences: 5′G-A/T-G-G for depurination of the extreme 5′ G-residue and 5′G-A-G-A for depurination of the second residue in from the 5′ end, which is an A. Interestingly, both can form a homopurine base pair within the loop, G⁺•G and A⁺•A, in which the residue to be depurinated is protonated at N3 of the base (Lavelle and Fresco, in preparation). It is this base pair formation that likely explains why acid-catalyzed depurination can actually occur in the neutral pH environment of most cells.

The G-residue self-depurination activity exhibits very limited tolerance for loop sequence variation (Amosova et al. 2011a). For example, the three G-residues in the loop are replaceable only by hypoxanthine, a closely related purine analogue, with only modest reduction in the activity; and the A-residue in the G-self-depurinating sequence is replaceable only by a T-residue, in this case with activity enhancement. In contrast, there is total tolerance for variation in the complementary base pairs of the helical stem, except for the first one at the base of the loop. Even a single base pair mismatch or a single extrahelical base elsewhere in the stem can be tolerated. Apparently, the role of the stem is to stabilize and maintain the loop in some strained configuration favorable to glycosyl bond cleavage. Thus, the more stable the stem, as affected by length (Blake and Fresco 1973; Brahms et al. 1967), G•C content (Marmur and Doty 1959), base pair sequence (Ornstein and Fresco 1983), base pair mismatches (Lomant and Fresco 1973), and the presence of extrahelical bases (Lomant and Fresco 1973), the faster the rate of self-depurination. The nature of the first base pair is somewhat restricted, possibly because it orients the water molecule involved in the hydrolysis of the glycosyl bond of the residue to be depurinated.
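The sequence requirements described above (a specific tetra-loop, a restricted first base pair at the base of the loop, and four or more additional complementary pairs) can be expressed as a simple sequence scan. The following is a minimal sketch and not the search procedure used for the genome analyses reported here. It assumes that the first-named base of each closing pair (e.g., the T of T•A) lies on the 5′ side of the loop, and it requires a perfectly paired stem, whereas the text notes that one mismatch or a single extrahelical residue can be tolerated. The names `CONSENSUS` and `find_sites` are illustrative.

```python
# Minimal sketch: scan one DNA strand for stem-loops matching the
# self-depurination consensus (perfect stems only; a simplification).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Loop sequences (5'->3') and allowed first base pairs, taken from the text.
# ASSUMPTION: in "T.A" the first-named base is on the 5' side of the loop.
CONSENSUS = {
    "G": {"loops": {"GAGG", "GTGG"}, "closing": {("T", "A"), ("G", "C")}},
    "A": {"loops": {"GAGA"}, "closing": {("T", "A"), ("G", "C"), ("A", "T")}},
}


def find_sites(seq, kind, extra_pairs=4):
    """Return 0-based start positions of candidate stem-loops in seq.

    kind is "G" or "A"; extra_pairs is the number of complementary pairs
    required beyond the first (closing) pair, so the minimal element is
    2 * (1 + extra_pairs) + 4 nucleotides long (14 nt for extra_pairs=4).
    """
    seq = seq.upper()
    spec = CONSENSUS[kind]
    stem = 1 + extra_pairs
    hits = []
    for loop_start in range(stem, len(seq) - 4 - stem + 1):
        if seq[loop_start:loop_start + 4] not in spec["loops"]:
            continue
        closing = (seq[loop_start - 1], seq[loop_start + 4])
        if closing not in spec["closing"]:
            continue
        # Remaining stem positions must be Watson-Crick complementary.
        paired = all(
            COMPLEMENT.get(seq[loop_start - 1 - k]) == seq[loop_start + 4 + k]
            for k in range(1, extra_pairs + 1)
        )
        if paired:
            hits.append(loop_start - stem)
    return hits
```

For example, `find_sites("CAGCTGAGGAGCTG", "G")` reports a single candidate starting at position 0 (stem CAGCT/AGCTG around the loop GAGG). A genome-wide search would also scan the reverse complement and allow the tolerated stem irregularities.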

3 Biological Relevance of the Self-Depurination Mechanism

The self-depurination reaction was discovered in the course of working with 29-residue long complementary deoxyoligonucleotide strands of the human β-globin gene that contained the sickle cell anemia mutation site. Whereas the duplex formed from those strands, as well as the noncoding strand by itself, were characteristically stable, the coding strand was not. Rather, under solvent, pH, and temperature that mimic physiological conditions, it rapidly self-fragmented not randomly, but in a unique way. This fragmentation was found to arise from backbone cleavage that was ultimately traced to spontaneous β-elimination at the apurinic site caused by self-catalyzed depurination mediated by stem-loop formation (Amosova et al. 2006). The occurrence of such a reaction in a strand segment of a significant human gene, immediately upstream of the sickle cell mutation site, provided the impetus for trying to understand the significance of what was obviously a DNA self-catalyzed reaction.

With the finding that the stem base-paired sequence is tolerant of great variation, it was decided to determine the number of potential stem-loops for self-depurination of G-residues in the human genome, and their distribution across the phyla. The numbers proved to be surprisingly large, and indicative of substantial overrepresentation relative to random expectation, which was not the case for stem-loops of similar size with non-self-depurinating loop sequences. The overrepresentation was therefore taken to be indicative of some important bio-functionality of the stem-loops for self-depurination. The whole-genome search for self-depurinating stem-loops was complemented first by identifying all human genes containing those sequences, including the loci of their occurrence, i.e., exons, introns, control elements, untranslated regions, intergenic regions, etc. This was followed by analysis of G-residue consensus sequence occurrence within more than 100 individual genes and their degree of overrepresentation relative to random expectation (see Table 1.3 as an example). These searches, together with the uncovering of the stem-loop consensus sequence for self-depurination of A-residues by way of its overrepresentation in the human genome, led us to the indications of the role of self-depurinating sequences in evolution; and it is from this vantage point that we present the findings that follow.

4 Self-Depurination in the Human β-Globin Gene

This gene, which has 148 codons (in its message), contains no A-residue self-depurination site, and only one site for G-residue self-depurination, of which the first three residues of the loop correspond to codon 6. It is the second residue of the loop that is the site of the sickle cell anemia mutation. Figure 1.2 shows a plot of the number of independent variations (i.e., those in different haplotypes) per codon of this gene. The plot is based upon data obtained from two databases, HbVar (http://globin.cse.psu.edu), which includes all human hemoglobin variants and β-thalassemia mutations reported in the literature over more than half a century, and the recently assembled Human Gene Mutation Database (HGMD) (http://www.hgmd.cf.ac.uk). As such, the plot represents a compendium of all unique β-globin alleles revealed in most of the human populations in the world. It is striking that the most prominent mutation site in the plot in Fig. 1.2, that at codon 6, corresponds to three of the four loop residues of that single self-depurinating consensus sequence in this gene. In this glutamic acid codon, residue #1, the self-depurinating G-residue, and residue #2, the site of the sickle cell mutation, are the sites of readily detectable substitutions (Table 1.1). Due to the partial degeneracy in the glutamic acid codon, a transition mutation in residue #3 would be silent, i.e., result in no amino acid change; but a transversion would encode aspartic acid, with little effect on hemoglobin function. Such a transversion should be detectable by the protein sequencing used to identify most of these mutations, and both types should be detectable by DNA sequence analysis. So far, SNPs at that codon residue have not been reported. Codon 6 is also the site of a single base deletion at residue #2 and a deletion of the entire codon. As noted earlier, this range of substitutions and short deletions at an apurinic site is just what is expected as a consequence of error-prone repair.

All these are known to be inherited germline mutations that must have occurred over evolutionary time, and are responsible for different anemias and β-thalassemias. As such, the coincidence of these inherited mutations with the self-depurination consensus sequence at codon 6 in the β-globin gene provides strong support for the occurrence of the self-depurination mechanism in vivo, at least in germline cells. In this connection, it is worthy of mention that several haplotypes of the sickle cell mutation have been identified (Chebloune et al. 1988; Wailoo 1991), each likely representing an independent occurrence traceable to at least four different places in the Indian/Saudi Arabian subcontinent and in Africa (Kulozik et al. 1986; Lapouméroulie et al. 1992; Pagnier et al. 1984; Schroeder et al. 1989). These occurrences speak to the mutagenicity associated with the error-prone repair of the apurinic sites created by the self-depurinating mechanism.
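The coding consequences of single-base substitutions in codon 6 can be enumerated directly. The sketch below assumes that codon 6 reads GAG, which follows from its being a glutamic acid codon that supplies the first three residues of a 5′G-A-G-G loop, and it uses only the part of the standard genetic code needed for the nine single-base variants of GAG. It lists what each substitution would encode; it does not indicate which variants have actually been reported (for that, see Table 1.1). The function names are illustrative.

```python
# Minimal sketch: coding consequences of every single-base substitution
# in the codon GAG (assumed here to be codon 6 of the beta-globin gene).

# Subset of the standard genetic code: GAG and its nine single-base variants.
CODON_TO_AA = {
    "GAG": "Glu", "GAA": "Glu", "GAT": "Asp", "GAC": "Asp",
    "GTG": "Val", "GCG": "Ala", "GGG": "Gly",
    "AAG": "Lys", "CAG": "Gln", "TAG": "Stop",
}
PURINES = {"A", "G"}


def substitution_class(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition."""
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def single_base_variants(codon="GAG"):
    """Return (residue number, variant codon, class, amino acid) tuples."""
    rows = []
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue
            variant = codon[:pos] + base + codon[pos + 1:]
            rows.append((pos + 1, variant,
                         substitution_class(codon[pos], base),
                         CODON_TO_AA[variant]))
    return rows


for residue, variant, kind, amino_acid in single_base_variants():
    print(f"residue #{residue}: GAG -> {variant} ({kind}) encodes {amino_acid}")
```

The enumeration reproduces the points made above: the A → T transversion at residue #2 gives GTG (valine, the sickle cell substitution), the transition at residue #3 gives GAA and is silent, and either transversion at residue #3 gives an aspartic acid codon.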

Fig. 1.2

Distribution of mutations appearing in independent haplotypes, i.e., unique mutations, among the codons of the β-globin gene. Codon 6, with the highest mutation frequency, is the only site in this gene capable of forming a stem-loop for self-depurination. Codon 6 and an additional 3′G-residue constitute the loop, which is preceded and followed by residues that form the stem shown in Fig. 1.1 (left). As can be deduced from Table 1.1, error-prone repair of the apurinic site resulting from self-depurination of the 5′G-residue of the loop must give rise to the observed substitution and deletion mutations, and thereby to the various anemias and β-thalassemias listed

Table 1.1 Coincidence of mutations reported in codon 6 of the β-globin gene with the first three loop residues of its only stem-loop-forming G-residue self-depurination consensus sequence

5 Coincidence of Substitution and Short Deletion Mutations with the Consensus Sequence for Self-Depurination of G-Residues in Some Human Genes

Having discovered the self-depurination mechanism at the site of codon 6 in the β-globin gene, and then found that nearly all detectable mutations anticipated for this mechanism have occurred over time at this site, it was of interest to see whether such mutations have occurred as well in other genes by this mechanism. With ~1.25 × 10⁶ potential G-residue self-depurinating sites in the human genome, and given their fairly regular distribution among exons, introns, control elements, etc., numbers of those sites should coincide with mutations detected in exons of other genes. A number of other such coincidences (see below) have indeed been found. At this early stage of human gene sequencing and the analysis of SNPs and short deletions, the finding of such coincidences in some cases can be viewed as more significant than their absence in others.

Table 1.2 provides details of nine such examples in five different genes. In each example, the entire consensus sequence is capable of forming a stem-loop catalytic intermediate. This sampling includes examples with the two loop sequences, 5′G-A-G-G and 5′G-T-G-G, each with T•A or G•C as the complementary pair at the base of the loop. Substitutions are seen to occur in the depurinated first loop residue, in the second loop residue, in the third, and in the fourth; there are three examples of deletion, one of the depurinated G-residue, another of the two consecutive G residues in the loop, and one of two stem residues.

Table 1.2 Coincidence of self-depurination consensus sequence and mutation sites in some human genes

As was apparent from the very limited tolerance for stem-loop sequence variation, it is reasonable to assume that any site in a genome which meets the sequence criteria for the self-depurinating mechanism is a potential site for mutation. That this is in fact the case is made even more convincing based upon the findings that follow.

6 Stem-Loop-Forming Consensus Sequences for Site-Specific Self-Catalyzed Depurination of G-Residues Are Highly Overrepresented in Some Human Genes

Although genomes have not evolved as strictly random sequences relative to some defined base pair composition, it is not unreasonable to assume that sequences of the size of the consensus sequences for self-depurination (14 nt minimal length) should be present at frequencies that do not deviate too far from random unless they have some biofunctional role to play that led to their selection. Besides, the frequency of random sequence stem-loops can be used as a control or basis for comparison.

Table 1.3 gives “random probabilities” of the consensus sequence for each gene listed, and for comparison, the actual or “observed” number of consensus sequences in each. It will be noticed that the random probabilities are quite close to the “observed” number for non-depurinating stem-loops for each gene (the ratios of the “observed” to “calculated” are in the range of 0.7–2.1), in contrast to the wide variation of this ratio for the G-residue self-depurinating stem-loops (from 4 to 48). If the random probabilities are generally slightly less than the observed numbers, it is probably because these stem-loop-forming sequences represent a subclass of inverted repeats. These are known to be overrepresented in the genome (Cox and Mirkin 1997), which, as noted, is not a true random sequence. This overrepresentation of inverted repeats could also contribute to the number of “observed” depurinating consensus sequences, and its effect should not be vastly different from what is observed for the non-depurinating stem-loops, i.e., it can contribute at most a factor of 2, not of 48. The fact that G-residue self-depurination sites throughout the genome occur far in excess of random expectation, by a factor of more than 5, suggests that they probably have some significant or essential biological role(s) that might be connected to the potential they create for mutation. Some critical insights in this regard followed upon the discovery of the consensus sequence for self-depurination of A-residues.
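To make the notion of a "random probability" concrete, the following sketch computes the chance that a given position starts a minimal G-residue consensus element under one simple null model, and from it an expected count and an observed-to-expected ratio. The model is an assumption made for illustration: bases are drawn independently with fixed frequencies, the stem is perfectly paired, and exactly four pairs follow the first base pair. It is not necessarily the calculation behind Table 1.3 and should not be expected to reproduce the tabulated values; allowing a mismatch, longer stems, or a composition-aware model would change the numbers.

```python
# Minimal sketch of a random-expectation calculation for the G-residue
# consensus stem-loop. ASSUMPTIONS: independent bases, perfect stems,
# exactly four pairs beyond the first base pair (14 nt element).
from math import prod

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
G_LOOPS = ("GAGG", "GTGG")
G_CLOSING = (("T", "A"), ("G", "C"))  # 5' base, 3' base (orientation assumed)


def site_probability(freq, loops=G_LOOPS, closing=G_CLOSING, extra_pairs=4):
    """Probability that one position on one strand starts a consensus site."""
    p_loop = sum(prod(freq[b] for b in loop) for loop in loops)
    p_closing = sum(freq[a] * freq[b] for a, b in closing)
    # Chance that two independently drawn bases are Watson-Crick partners.
    p_pair = sum(freq[b] * freq[COMPLEMENT[b]] for b in "ACGT")
    return p_loop * p_closing * p_pair ** extra_pairs


def expected_sites(length, p_site, site_len=14, strands=2):
    """Expected number of sites in a region of the given length (bp)."""
    windows = max(length - site_len + 1, 0)
    return strands * windows * p_site


def overrepresentation(observed, expected):
    """Ratio of observed to expected counts, as compared in the text."""
    return observed / expected


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
p = site_probability(uniform)
print(f"per-position probability (uniform bases): {p:.3e}")
```

With gene-specific base frequencies in place of `uniform`, `overrepresentation(observed, expected_sites(gene_length, p))` gives the kind of observed-to-calculated ratio discussed above, under the stated simplifications.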

Table 1.3 Overrepresentation of the stem-loop-forming consensus sequence for site-specific, self-catalyzed depurination of G-residues in some human genes

7 Discovery of the Consensus Sequence for Self-Depurination of A-Residues

It was the recognition of the overrepresentation of the consensus sequence for G-residue self-depurination, and the assumption that the consensus stem-loop sequence for similar self-depurination of A-residues might also be overrepresented, that led us to search for it in the human genome. An important clue in that search was obtained from the work on the mechanism of the toxic protein ricin, which had been found to depurinate an adenine residue at a unique site, position 4,324, in 28S ribosomal RNA (Endo and Tsurugi 1987). The sequence at that site had been shown to form a stem-loop with the same number of loop and stem residues as the stem-loop for DNA self-depurination of G-residues, but with a different loop sequence, 5′G-A-G-A instead of 5′G-T/A-G-G (Amukele et al. 2005). Why then the requirement for a protein to depurinate the RNA target but not a DNA target? Our explanation was that the deoxyriboglycosyl bond, in being some three orders of magnitude more susceptible to acid-catalyzed hydrolysis than the riboglycosyl bond (Shabarova and Bogdanov 1994), did not require an enzyme catalyst. Based on this reasoning, we initiated a search in the human genome for stem-loop sequences with a 5′G-A-G-A loop and all four possible first base pairs. Once they were found, calculations were made to determine which, if any, were overrepresented. Interestingly, three of the four base pairs at the base of the loop were found to be similarly and significantly overrepresented, which was taken to mean that the corresponding stem-loops were likely self-catalytic for depurination of A-residues. This was confirmed in preliminary experiments.

More informative searches of the entire human genome followed to determine whether there are any genes with common functional annotation among the top 100 genes in which the A-residue self-depurinating consensus sequence is most highly overrepresented. This information was sought in order to learn whether there are any groups of genes that might have exploited the A-residue consensus sequence for self-depurination. It was indeed gratifying to find that there are, in fact, two groups of such genes, each with a common functional annotation, for which a built-in mutagenic mechanism to create sequence diversity is consistent with their function. The group with the greatest overrepresentation is the one encoding the hypervariable regions of immunoglobulins, while that with the second highest overrepresentation consists of the olfactory receptor genes.

8 Self-Depurination of A-Residues and Antibody Diversity

Why should the genes for the hypervariable regions of immunoglobulins encode sequences particularly disposed to undergoing mutation? In order for antibodies to perform their function, they must recognize a wide variety of antigens, including many that have not been encountered previously. Rather than the immune response mechanism having encoded what, in effect, would have to be so large a number of antibody genes in the germline as would be wasteful, biologically unproductive, and require a genome of excessive size, antibody diversity is generated by a capacity for targeted mutagenesis within a relatively small number of genes in the genome of certain somatic cells (Tonegawa 1983). From the diversity of somatic antibody genes so created by mutagenic mechanisms, clonal selection can enable production of a group of antibodies appropriate to any new antigenic challenge.

Two mechanisms were uncovered previously in (somatic) B cells to generate the requisite diversity of the different polypeptide components of the Y-shaped antibody structure (see Fig. 1.3). One such mechanism involves recombination of certain immunoglobulin gene sequence elements (Roth et al. 1992). A second mechanism, occurring during B cell division, involves somatic hypermutation that targets a different segment of antibody structure by way of enzymatic deamination of cytosine to uracil, thereby creating transition mutations (Neuberger et al. 2003). With our finding that immunoglobulin genes are the most significantly overrepresented with the A-residue self-depurination consensus sequence, and that they contain G self-depurination sites as well, it would appear that nature has selected this additional mutagenic mechanism to create antibody diversity.

Fig. 1.3

A schematic diagram of the Y-shaped structure of a typical human antibody (protein) molecule showing the location of the constant regions (CR) that form the skeletal framework from which are extended various combinations of hypervariable heavy chain (HC-HVR) and light chain (LC-HVR) segments. Those hypervariable sequence segments, three in each heavy and each light chain, are interspersed by constant segments (HC-CR and LC-CR), and together form antigen-binding sites (ABS). Previously, those hypervariable segments had been found to attain great sequence diversity by a combination of genetic recombination and C-residue deamination. Now we have discovered that over the course of evolution, nature selected the self-depurination mechanism for this purpose as well, since the hypervariable regions are very heavily endowed with A-residue self-depurinating sites. Hence, the gene sequences for those sites provide another major mechanism to enable them to undergo mutation in order to create the extraordinary sequence diversity required to meet the challenges to the immune response

In this connection, it is interesting that in the constant regions of immunoglobulin genes, the self-depurinating consensus sequence for A-residues is not highly overrepresented. This is a further indication of its selection for the purpose of creating the hypervariable regions. It is apparent then that these genes evolved so as to exploit the self-depurinating mechanism in their contemporary function.

9 Self-Depurination of A-Residues in the Evolution of Olfactory Receptor Genes

As with the immunoglobulins, the function of the structurally similar olfactory receptors requires them to be able to recognize a large number of different odorant molecules. However, in contrast to the genes encoding the hypervariable regions of the immunoglobulins, the olfactory receptor genes are all already encoded in the genome, and there is no indication that their somatic DNA sequences undergo hypermutation in their coding regions to any significant extent (Sharon et al. 1998, 1999). In effect, they are currently utilized with the diversity with which they evolved over time. Of the ~850 such human genes, some 55–60% of which are pseudogenes (Olender et al. 2004), a small fraction are singlets, but the majority are in clusters distributed among all chromosomes but #20 and Y. The gene clusters are likely to have arisen by repeated duplication of individual genes and clusters (Glusman et al. 1996), as many of the genes in tandem arrays are closely related (Niimura and Nei 2003). At the same time, during their evolution, they apparently exploited mutagenesis, at least by the self-depurination of A-residues, to differentiate in function.

With such high overrepresentation of the self-depurination consensus sequence in olfactory receptor genes, it is reasonable to expect that over evolutionary time, the same mutagenic mechanism may have also resulted in the loss of the self-depurinating capacity in some of those genes. In that case, corresponding nonfunctional genes or pseudogenes would have accumulated, arising, e.g., as a consequence of substitution mutations in the loop residues of the catalytic intermediate. Evidence for this possibility was sought by searching these gene clusters for highly overrepresented pseudogenes whose consensus loop sequences had been mutated from 5′G-A-G-A to 5′G-T-G-A and 5′G-G-G-A, some of the mutation products expected at those self-depurination sites. Such a mechanism could account, then, for the very high fraction of these clustered genes that were found to be pseudogenes (Rouquier et al. 1998a, b). In fact, pseudogenes with these very A → T and A → G mutations at the self-depurinating sites were found to be overrepresented. These observations do indicate, then, that over the course of evolution the same mutagenic mechanism has served to create mutations leading both to olfactory receptor diversity and to erasure of such genes.

10 Discussion

Our goal in this chapter has been to summarize that aspect of our knowledge of self-catalyzed DNA depurination that particularly relates to ways in which the mechanism played a role in the evolution of certain biological processes and some inherited diseases.

Thus, we have presented several types of observations that support the notion that the self-depurination mechanism is likely to have played a role in molecular evolution. One relates to the distribution of the consensus sequence for self-depurination of G-residues. Not only is that consensus sequence found in every double-stranded genome we have examined from various Archaea species to Homo sapiens, but its frequency has been found to be very high, in the neighborhood of once every 3,000–5,000 base pairs. In mitochondrial genomes, which are also double-stranded, they are present in most species at a frequency similar to those in the genomes of lower forms. While they appear to be lacking in the mitochondrial genomes of some species, it must be kept in mind that we have not yet searched them for the consensus sequence for self-depurination of A-residues. Nor, in fact, have we yet searched any genomes other than that of Homo sapiens for the A-residue consensus sequence.

It is for this same reason that we are not yet able to interpret the absence of G-residue self-depurination sequences in half the single-stranded viral genomes recorded in the viral genome database. If the viral genomes with no G-residue self-depurination sites are in fact lacking in those for A-residues as well, that would provide important evidence that evolution selected against the self-depurination mechanism for these species. Such negative selection would arise because once an apurinic site resulted in backbone fragmentation of the single-stranded genome, the fragments might have no template complementary strand to enable their repair.

Another type of evidence supporting a role for self-depurination in molecular evolution relates to the highly error-prone repair of apurinic sites in all species in which it has been examined, from viruses (whose hosts carry the repair mechanism) and bacteria (Shearman and Loeb 1979) to lower and higher eukaryotes (Chakravarti et al. 2000). Moreover, in a number of cases, such sites have been specifically identified as mutational hot spots (Kunkel 1984).

We have additionally presented three examples of genes for which the evidence would seem to be compelling that self-depurination must have played a role in their evolution, in their loss of function or conversion to pseudogenes, and/or in their contemporary functioning.

In the case of the human olfactory receptor genes, the self-depurination consensus sequence, at least for A-residues, is clearly associated with the evolutionary development of their diversity; and just as these sites are significantly overrepresented in this group of genes, so are their mutation products overrepresented in their nonfunctional pseudogenes. Yet, in the human species, these olfactory receptor genes do not appear to be evolving now at any detectable rate. They are already all encoded in the genome, with no evidence of somatic cell DNA rearrangements or somatic mutations in the coding regions of those genes. So it is reasonable to conclude that the mechanism played a role in the course of the evolution of those genes to their current state of diversity, but that it does not play a significant role contemporaneously.

The case of the genes encoding the hypervariable regions of the immunoglobulins would appear to be somewhat different. Here we have genes for which mechanisms for variation are essential elements for their contemporary function. The consensus sequences for both G- and A-residue self-depurination are again highly overrepresented in these genes. The consequent mutagenic mechanism may have played a role in their evolution, something which our analysis cannot say with certainty at the present time. What is much more apparent, however, is that it is one of the several mechanisms by which antibody diversity is contemporaneously achieved in response to confrontation with an antigen.

Finally, we have presented the case of the unique occurrence of six different anemias due to substitutions and two β-thalassemias due to short deletions, all within codon 6 of the human β-globin gene. That codon is the only self-depurination consensus sequence site for G-residues in the entire gene, and there are none for A-residues. While these mutations are all explicable as a consequence of the self-depurination mechanism, there are no indications that they readily occur as somatic mutations (and even if they do, they are unlikely to be detected, since a majority of hemoglobin-producing cells would still have the wild-type DNA sequence). Consequently, they appear as germline mutations, evolutionarily retained, no doubt, as a consequence of the resistance to malaria that at least the sickle cell anemia mutation confers (Friedman 1978), and possibly some of the others as well.

Taken together, the foregoing examples serve as a strong indication that the self-depurination mechanism, coupled to the process of error-prone repair, has played an evolutionary role. But why has a mechanism capable of causing DNA damage been selected for and even concentrated in genomes in the course of evolution? Perhaps the advantages accruing from the availability of a mechanism inherent in the sequence and structural dynamics of DNA to help create gene sequence diversity far outweigh the negative consequences of genetic instability, at least where the products of gene sequence diversification have positive value. It is interesting in this connection that we have found that some genes appear to be favored by the presence of either G-residue or A-residue self-depurinating consensus sequences; this might possibly indicate different evolutionary origins for such genes. Moreover, genes that are highly conserved, e.g., those encoding HOX, histone core, and ribosomal proteins, contain very few if any such consensus sequences. It appears, then, that these two consensus sequences have sometimes been selected against, but other times selected for, which would indicate that self-depurination is a significant mutagenic mechanism that has been taken note of and sometimes harnessed by evolution.