Introduction

The deciphering of the genetic code and subsequent discoveries in the field of molecular biology have revolutionized our understanding of the molecular basis of evolution, while raising new questions. Compare the potential number of distinct structural variants for a \( 10^{9} \) bp genome (\( 4^{{10^{9} }} \approx 10^{ 600,000,000} \)) with the number of atoms in the Universe (less than \( 10^{80} \), Turner and Tyson 1999): the number of possible variants of the genome structure is of a super-astronomical scale, that is, their complete enumeration is not feasible for natural selection. A question that naturally follows is: how can spontaneous mutagenesis sustain a sufficiently high rate of adaptive evolution? A plausible answer can be constructed by invoking a block-hierarchical search through alternative genome variants, carried out concurrently at all hierarchical levels. This approach is based on the observation that the genome and the organism itself are hierarchical on multiple levels of organization. The essence of the method can be clarified through an analogy between a text in natural language and a nucleotide sequence in the genome, for example, by mapping letter = nucleotide, word = DNA transcription factor binding site (TFBS) or exon, sentence = a unique combination of TFBSs (or exons) within a gene. A writer generates her prose by recombining symbols at several levels: letters, words, and sentences. Similarly, mutagenesis generates new functions by recombining modules at several structural levels: nucleotides, regulatory sites, exons, and protein domains.

Gilbert (1978) was the first to postulate an evolutionary role for “fragmented” genes. Because introns facilitate DNA shuffling by illegitimate recombination, the coding sequences of individual domains can form new combinations. Mathematical modeling led Ivanitsky et al. to conclude that evolution could have carried out an exhaustive search over all combinations for a sequence with a length of \( 10^{6} \) nt within the whole known time of evolution. This exhaustive search can be implemented via a hierarchical structural system comprising 50 levels (Ivanitsky et al. 1985). As exhaustive search is never required for local adaptation of a species, it might be sufficient to consider only three to five levels of the structure–function hierarchy in the genome, from nucleotide to operon (Ratner 1992). The abundance of multidomain proteins in the genomes of both prokaryotes and eukaryotes (Tordai et al. 2005) suggests the reality of blockwise evolution.

The widely accepted mechanism of adaptive evolution is tinkering, a stepwise improvement of the structures of functional systems from one efficient organization to another (Jacob 1977). Gene duplications create material (Ohno 1970) used by mutagenesis and selection to create novel functional modules. At a qualitative level, this suggests the answer to the question about mechanisms of adaptive evolution. The degree of complexity of the problems that are to be solved during the evolution of functional systems in an organism is still unclear. Currently available information about genome structures and the mechanisms of mutagenesis makes it possible to perform a finer quantitative analysis of this problem.

Construction of a quantitative model requires a more precise definition of the goals of adaptive evolution. The functional systems of an organism are improved during evolution by the emergence of novel elements (genes or proteins), relations, and specialized subsystems. We will use the most evident relationship between the evolution of functional systems and mutagenesis. All the functional relations in an organism occur via intermolecular contacts provided by pairs of binding sites. The structures of the contact surfaces of the cell’s chief macromolecules (DNA, RNA, and proteins) are represented in the genome by groups of nucleotides. The contact specificity depends on the type and mutual arrangement of nucleotides in the consensus sequence, and these are changed by mutagenesis, that is, mutagenesis modifies intermolecular contacts. Hence, the diverse problems solved by adaptive evolution can be reduced to the creation of novel sites encoding intermolecular contact surfaces and to the search for beneficial combinations of sites within genes.

The genome contains some structures that are obligatorily created during the evolution of any species. These are novel functional modules, from regulatory sites of transcription and splicing (basic modules) to protein domains and genes. Here, the creation of basic modules as well as novel combinations of exons and regulatory modules during adaptive evolution will be referred to as constructive mutations. An example of constructive mutation at the basic level of a genome is formation of a novel site providing a contact between molecules. At the second level, for example, exon shuffling adds new domains to proteins, thereby increasing the diversity of multidomain proteins.

The problem of quantitatively estimating the capacities of mutagenesis can be distinctly defined as follows: What conditions are required to create novel, unique modules by spontaneous mutagenesis over a known time period of evolution of a taxon? The main goal of this work was to estimate the contribution of genetic search to the duration of the adaptive process, in particular, to assess the effects of the mechanisms of mutagenesis and genome organization on the evolvability of organisms. A broad formulation of the problem allows the overall system to be covered, and the correlations between different processes as well as their limitations and capacities to be detected. Analyzing the model results, we will consider the following questions: To what degree does genetic search influence the rate of adaptive evolution? What are the theoretical limits to the constructive capacities of spontaneous mutagenesis? How does the organization of eukaryotic and prokaryotic genomes influence their evolvability?

Modeling Genetic Search for Beneficial Mutations

Genetic search is the process of mutagenesis, which is constantly creating neutral, deleterious, and beneficial innovations. Mutations occur constantly in all individuals; in particular, the genome of any human contains on average 115 de novo mutations (Kondrashov 2003). Thus, genetic search is continual, and all reproducing individuals of a species are involved in it.

Three groups of facts formed the background for this model (Fig. 1). First, it is known that functional modules of all levels had been created during adaptive evolution of various taxa. Second, we used the concept of structure–function autonomy of the modules. For example, the number of nucleotides that directly influence the affinity of a contact is limited and, correspondingly, the number of alternative variants for the structure of any site is limited as well. Third, the structure of a given genome can change according to a finite number of patterns. Thus, we can calculate the probabilities for emergence of certain modules in the given genome.

Fig. 1 Theoretical foundations for the model of genetic search for beneficial mutations

The cost of genetic search (C) is the number of individuals (variant structures of the system) required to create, with a probability close to unity, a novel functional module by random mutagenesis. The cost of the search characterizes the first stage of an adaptive event, and it should not be confused with the cost of selection (Haldane 1957), which characterizes the second stage of an adaptive event (the fixation of a mutation). The cost of the search is an integral value reflecting the intricate nature of the constructive problem per se as well as the effects of genome organization and the mechanisms underlying mutagenesis (Fig. 1). The complexity of a problem is determined by the number of alternative structures of a created module. The genome organization and gene structure determine the probability of new modules being created at a given genomic locus. The mechanisms of mutagenesis specify the frequencies of mutations of various sizes, thereby also influencing the cost of creating modules.

The objects of the modeling are groups of multiply repeated events that have taken place thousands of times during the evolution of any taxon. These include, for example, the emergence of novel sites providing intermolecular contacts, the creation of novel proteins via exon shuffling or alternative splicing, and duplications of whole genes. Since we speak about certain mass events, their probability should be on the order of unity.

Consider two main paths by which new modules are created by mutagenesis. The first is the de novo creation of a module in a given locus via small mutations (single nucleotide substitutions and indels (insertions/deletions) of lengths less than 20 nucleotides). The second path is the copying of an existing module and insertion of a copy into a target genomic locus; it occurs through various duplications and transpositions with a length over 20 nucleotides (nt).

The First Path

The novel structures are created by spontaneous mutations, so it is impossible to predict the emergence of a particular beneficial mutation, and selection is the factor that settles everything. Our approach allows us to estimate the average number of mutations necessary to create a module in a certain genomic locus. Creation of a unique module becomes inevitable provided that the probability that this structure will appear is close to unity. It is possible to create any structure with the help of a sufficient number of small mutations in the target genomic locus. For small mutations (<20 nt), this condition can be written as \( N \times G/(Q \times m) \sim 1 \), or \( Q \approx (G \times 4^{m})/m \), where \( N = 4^{m} \) is the total number of alternative structures in a given group of linked elements encoding a functional module (GLE, defined below); G, the genome size; Q, the evolutionary capacity of the species; and m, the number of nucleotides (nt) in a functional module. We introduced the term GLE, defined as a DNA sequence encoding a functional unit (module) where the change in type or position of at least one element leads to failure of the module’s function. In particular, the GLE for a TFBS is the consensus nucleotide sequence.

To create a unique module of length m nucleotides with a probability close to unity, it is necessary that \( N = 4^{m} \) small mutations have taken place in this locus. Since mutations are distributed over the whole genome in a random manner, it is necessary to provide a similar mutation density at all the genomic loci. Consequently, for \( 4^{m} \) mutations to take place at a given locus, the total number of small mutations over the entire genome should be G/m-fold larger. The value N reflects the objective complexity of the constructive problem, whereas the value G takes into account the effect of genome size on the cost of the search.
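
As a rough numerical reading of this condition, the sketch below takes the relation as Q ≈ G × 4^m / m and asks the inverse question: given a budget Q of small mutations, what is the longest module whose variants could all be tried at one locus? The function name and the example values of G and Q are illustrative assumptions, not figures from the model.

```python
def max_module_length(genome_size, q_budget):
    """Largest m with genome_size * 4**m / m <= q_budget (0 if none).

    Assumes the first-path condition Q ~ G * 4**m / m, with q_budget read
    as the number of small mutations available to the species.
    """
    m = 0
    # G * 4**m / m grows monotonically for m >= 1, so a linear scan suffices.
    while genome_size * 4 ** (m + 1) / (m + 1) <= q_budget:
        m += 1
    return m


# Hypothetical inputs, chosen only for illustration.
print(max_module_length(genome_size=1e9, q_budget=2e14))
```

Because the required budget grows roughly fourfold with every added nucleotide, raising Q by one order of magnitude extends the reachable module length by fewer than two nucleotides.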

Creation of a novel module by the first path implies a complete exhaustion of all the possible combinations (N) in the group of nucleotides encoding a contact surface of a molecule. In this case, the functional unit (affinity surface) coincides with the structural unit, the GLE. If the genome were a single functional module (GLE), then the estimate made above (\( 4^{{10^{9} }} \)) would be correct. However, the length of consensus sequence in the majority of the known modules providing intermolecular contacts (splicing sites, TFBSs, and others) is 2–20 nt (Mount et al. 1992; Kolchanov et al. 2007; Bryne et al. 2008). In a search, a complete exhaustion of all the possible combinations is necessary only for the nucleotides contained in the GLE, since the change in type or position of any nucleotide in the GLE leads to a partial or complete loss of the corresponding function. The functional and structural autonomy of modules allows the evolutionary cost of creating each module to be computed independently of its neighborhood.

The stepwise models, describing creation of a TFBS from random nucleotide sequence (Stone and Wray 2001; Durrett and Schmidt 2007), take into account only single nucleotide substitutions. We take into account that all possible mutations with a length up to 20 nt (single substitutions and indels) are involved in the creation of a novel TFBS. Indel can create a target site in a single step (one mutation), thereby erasing the problem of fixation of intermediate variants that inevitably arises in the models of stepwise TFBS creation.

The Second Path

Block mutations are the main path to creating novel genes and large modifications in the structure of existing genes (Gilbert 1978; Patthy 1999; Tordai et al. 2005). The second path—duplication of an existing module and insertion of the copy into a target genomic locus—includes all kinds of duplications and transpositions of sequences of lengths over 20 nt. Duplications and transpositions of ~20 to ~10,000 nt are able to change the composition and order of regulatory sites and exons in a gene, that is, they rearrange the second level of genome organization.

The result of a block mutation is the duplication (copying) of a DNA fragment and its insertion into a genomic locus. The evolutionary cost mainly depends on the probability of a necessary module being copied and the probability of this copy being inserted into a target locus. The probability of creation of a necessary module in a target genomic locus will be close to unity provided that \( P_{\text{c}} \times P_{\text{in}} \times Q \sim 1 \), or \( Q \sim 1/(P_{\text{c}} \times P_{\text{in}}) \), where \( P_{\text{c}} \) is the probability of the necessary module being copied; \( P_{\text{in}} \) is the probability of the module being inserted into the target locus; and Q, the evolutionary capacity of the species. The total number of various duplications for a given genome is finite; consequently, it is possible to compute the probabilities for the mutation classes of interest.

These paths provide the variation required for evolution, at the three levels of genome organization (module, gene, and operon) that regulate and encode the main elements of the organism’s functional systems. In this process, the network of relations at higher organizational levels is modified in an indirect manner.

The evolutionary capacity of a species (Q) is the total number of individuals of the species that have existed over time T. The number of variant structures cannot exceed the total number of mutations that have occurred in all the individuals constituting the species over a certain time period. It is known that the total population of any species and the total mutation frequency are always limited (Drake et al. 1998). Consequently, the number of variants that can be tested in the time during which a taxon evolves (evolutionary capacity, Q) is also limited.

The model allows us to compute the number of mutations (or individuals) necessary to exhaust the alternative variants for the structure of a module at a given genome organization and mutation frequency range. Comparing the cost of the search for constructive mutations and the evolutionary capacity of a species, it is possible to detect the boundaries for the capacities of spontaneous mutagenesis in creating new modules. The model also makes it possible to quantitatively estimate the effect of genome organization on the evolvability of prokaryotic and eukaryotic organisms.

Methods

The search cost (\( C_{\text{s}} \)) for creating the structure of a new module in a certain genomic locus by the first path (de novo, via small mutations) is calculated as

$$ C_{\text{s}} = N \times \frac{1}{{P_{m} }} \times \frac{1}{{K_{\text{s}} }} = N \times \frac{G}{m + \Updelta m} \times \frac{1}{{K_{\text{s}} }} = \frac{G \times N}{{\left( {m + \Updelta m} \right) \times K_{\text{s}} }}, $$
(1)

where N is the number of alternative variants for the module structure; \( P_{m} = (m + \Updelta m)/G \), the probability of a small mutation occurring in a DNA locus with a length of m nucleotides in a genome of size G; Δm, the amount of nonfunctional DNA (length of spacers) in the neighborhood of the module m; and \( K_{\text{s}} \), the frequency of small (<20 nucleotide) mutations per genome per generation (Table 1). \( K_{\text{s}} \) is calculated by multiplying a standard mutation rate (per nucleotide per generation) by the genome size G. Since the function of a eukaryotic TFBS is independent of its distance from the transcription start site (Hartman et al. 2005), it can emerge at any position within the gene regulatory region. Therefore, in calculations, m + Δm ≈ 5,000.
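
To make Eq. 1 concrete, the following minimal sketch evaluates it for two cases. The genome sizes and the values of the small-mutation frequency are the ones quoted later in the text for human and E. coli; the function name, the choice of site lengths, and Δm = 0 for the E. coli site are assumptions made for illustration.

```python
import math


def search_cost_small(n_variants, genome_size, m, delta_m, k_s):
    """Eq. 1: C_s = G * N / ((m + delta_m) * K_s), in individuals."""
    return genome_size * n_variants / ((m + delta_m) * k_s)


# Human TFBS of 20 nt anywhere in a ~5,000-nt regulatory region.
human = search_cost_small(4 ** 20, 3e9, m=20, delta_m=4980, k_s=60)
# E. coli site of 6 nt with no spacers (delta_m = 0 is an assumption here).
ecoli = search_cost_small(4 ** 6, 4.6e6, m=6, delta_m=0, k_s=0.0025)
print(f"human: 10^{math.log10(human):.1f}, E. coli: 10^{math.log10(ecoli):.1f}")
```

Working with the logarithm of the cost matches the one-order-of-magnitude accuracy at which the results are reported.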

Table 1 Summary of notations

The number of alternative structure variants (N) of a linear sequence of elements is determined by three parameters: the possible diversity of elements at each position, the mutual arrangement of elements, and the length of the consensus sequence. Equation 2 is written as

$$ N = b_{1} \times b_{2} \times \cdots \times b_{i} , $$
(2)

where \( b_{i} \) is the permitted diversity of elements at the ith position. If the permitted diversity of elements does not differ at individual positions, then the number of alternative states for DNA and RNA sites (transcription and splicing regulators) is calculated according to the simplified Eq. 3:

$$ N_{\text{na}} = 4^{m} . $$
(3)
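
Equations 2 and 3 can be sketched as follows. The function names and the example site, in which two positions are assumed to admit only two nucleotides each, are hypothetical.

```python
import math


def n_variants(diversities):
    """Eq. 2: product of the permitted diversities b_i over all positions."""
    return math.prod(diversities)


def n_variants_uniform(m, alphabet_size=4):
    """Eq. 3: every position may hold any of the four nucleotides."""
    return alphabet_size ** m


# A 6-nt site; in the second call two positions admit only 2 nucleotides.
print(n_variants_uniform(6), n_variants([4, 4, 2, 4, 2, 4]))
```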

Equation 1 is also true for the DNA sequences encoding protein functional sites of comparable sizes, but with certain restrictions. First, unlike in the eukaryotic TFBSs, the localization of a novel contact surface in a protein domain is as a rule highly restricted; therefore, Δm ≈ 0. Second, it is known that almost all (18 of 20) amino acids are encoded by multiple (two to six) synonymous codons. Consequently, the actual number of distinct states for the amino acids with degenerate code is two- to sixfold smaller than 64. The code degeneracy coefficient (Z) for an amino acid sequence of length k is calculated according to the following equation:

$$ Z = \frac{{\sum\nolimits_{i = 1}^{i = k} {a_{i} } }}{k}, $$
(4)

where \( a_{i} \) is the degree of code degeneracy for the ith amino acid in the protein site (\( 1 \le a_{i} \le 6 \)) and k is the number of amino acids in the protein site. The number of alternative states for the protein site is calculated as

$$ N_{\text{aa}} = \prod\limits_{i = 1}^{i = k} {\frac{64}{{a_{i} }}} . $$
(5)

Since the target sites differ in their amino acid composition, the cost to create new contacts will vary depending on the coefficient of code degeneracy.
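
A minimal sketch of Eqs. 4 and 5 is given below. The degeneracy table is that of the standard genetic code (61 sense codons); the function names and the ten-residue example site are hypothetical.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2,
    "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def degeneracy_coefficient(site):
    """Eq. 4: mean codon degeneracy Z over the k residues of the site."""
    return sum(CODON_DEGENERACY[aa] for aa in site) / len(site)


def n_protein_variants(site):
    """Eq. 5: product of 64 / a_i over the residues of the site."""
    return math.prod(64 / CODON_DEGENERACY[aa] for aa in site)


# Hypothetical site built only from fourfold-degenerate amino acids.
site = "AGPTVAGPTV"
print(degeneracy_coefficient(site), n_protein_variants(site))
```

For such a site each residue contributes 64/4 = 16 states, the same as two free nucleotides, so ten residues behave like a 20-nt rather than a 30-nt sequence.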

In the case of the second path, the cost of creating a new site in a given locus via block mutations is calculated according to the equation below:

$$ C_{\text{b}} = \frac{1}{{V \times P_{\text{c}} }} \times \frac{1}{{P_{\text{in}} }} \times \frac{1}{{K_{\text{d}} }} = \frac{G \times F}{{V \times 2sd^{2} }} \times \frac{G}{L} \times \frac{1}{{K_{\text{d}} }} = \frac{{G^{2} \times F}}{{2sd^{2} \times V \times L \times K_{\text{d}} }}, $$
(6)

where V is the copy number of paralogous modules of a given type in the genome; \( P_{\text{c}} \) is the probability of the necessary module being copied; F, the total number of all the possible duplications in the genomic fragment of a certain size, calculated according to Eq. 7; d = i − j, the size range of duplications (from j to i); s, the size of the spacer; \( P_{\text{in}} = L/G \), the probability of the module being inserted into the target locus of length L in a genome of length G; and \( K_{\text{d}} \), the frequency of block mutations per genome per generation. In a genome without spacers, s = 1 and L = 1.
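
The final form of Eq. 6 can be sketched as a single function. All numerical inputs below are hypothetical and serve only to show how the cost scales with the copy number V; they are not parameters of any real genome.

```python
def block_cost(genome_size, f_dup, spacer, d_range, copies, target_len, k_d):
    """Eq. 6: C_b = G**2 * F / (2 * s * d**2 * V * L * K_d)."""
    return (genome_size ** 2 * f_dup) / (
        2 * spacer * d_range ** 2 * copies * target_len * k_d
    )


# Hypothetical compact genome without spacers (s = 1, L = 1).
base = block_cost(1e7, f_dup=5e5, spacer=1, d_range=980, copies=1,
                  target_len=1, k_d=0.01)
# Same genome, but with ten paralogous copies of the module (V = 10).
with_paralogs = block_cost(1e7, f_dup=5e5, spacer=1, d_range=980, copies=10,
                           target_len=1, k_d=0.01)
print(base / with_paralogs)
```

The cost is inversely proportional to V, s, and L, which is how paralogous copies and spacers enter the comparison between genomes.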

The number of all the possible duplications of a certain size, from j to i, is calculated as

$$ F = \frac{1 + i}{2} \times i - \frac{1 + j}{2} \times j \approx \frac{{i^{2} - j^{2} }}{2}. $$
(7)

The probability of transposition is obtained by multiplying the probability of duplication of the necessary site (\( P_{\text{c}} \)) by the probability of insertion of the copy into the target locus (\( P_{\text{in}} \)). The number of duplications of all possible lengths is calculated as the sum of an arithmetic series. Let the average length of a human chromosome be \( 3 \times 10^{9}/23 \approx 1.3 \times 10^{8} \) nt; then for the human genome \( 10^{16} \times 23 \approx 2 \times 10^{17} \) alternative variants of duplications of all sizes are possible.
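
The arithmetic-series count of Eq. 7 can be checked with a few lines of code. The function name is an assumption, and j = 0 is used here to mean duplications of all sizes up to i.

```python
def count_duplications(i, j=0):
    """Eq. 7: F = i(i + 1)/2 - j(j + 1)/2, for duplication sizes from j to i."""
    return i * (i + 1) // 2 - j * (j + 1) // 2


# Range of ~20 to ~1,000 nt within one fragment.
print(count_duplications(1000, 20))
# Whole-chromosome range, using 3e9 / 23 nt as the average chromosome length.
chrom = 3_000_000_000 // 23
print(count_duplications(chrom) * 23)
```

Integer arithmetic is used so that the count stays exact even for chromosome-scale values of i.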

Equation 6 is the calculation of the cost for site transposition into a gene’s regulatory region. When calculating the cost of exon shuffling, it is necessary to take into account that only one-ninth of all insertions into an intron of a target gene do not lead to a frameshift. The cost of the transposition of an exon into an intron of a target gene is calculated according to the following equation:

$$ C_{\text{ex}} = 9 \times C_{\text{b}} . $$
(8)

To provide a precise copy of a module, it is necessary to search through all the possible duplications within a certain range of sizes. In particular, it would be necessary to search through all the duplications in the range of ~20 to ~1,000 nucleotides (\( F \approx 5 \times 10^{5} \) alternative variants for only one genome fragment with a size of ~1,000 nt) to provide a precise copy of a protein module. However, an exon needs to be copied only to within the accuracy of its flanking introns. For example, the average intron size in the human genome is 6,000 nt. Therefore, it is sufficient that at least one duplication (with a size of ~100 to ~12,000 nt) falls in the neighborhood of a given exon.

The results of all calculations are given with an accuracy of one order of magnitude. The frequencies of block (\( K_{\text{d}} \)) and small (\( K_{\text{s}} \)) mutations are given per genome per generation. The cost is normalized by the average number of mutations originating in each individual. Such normalization provides for a direct comparison of the search cost (C) and evolutionary capacity of a species (Q), since both values are expressed in numbers of individuals. Knowing the average total population of a species and the rate at which it adds generations, it is possible to transform the search cost into time units, that is, the average waiting time for mutation (Stone and Wray 2001).
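
One simple way to carry out this conversion is sketched below. It assumes a constant population size, so that every generation contributes the same number of individuals to the search; the function name and the example species are hypothetical.

```python
def waiting_time_years(cost, population_size, generation_time_years):
    """Convert a search cost (in individuals) into an average waiting time.

    Assumes a constant population, so each generation contributes
    population_size individuals to the search.
    """
    generations = cost / population_size
    return generations * generation_time_years


# Hypothetical species: 1e6 individuals, 20-year generations, cost of 1e10.
print(waiting_time_years(1e10, 1e6, 20))
```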

Results

The model of genetic search makes it possible to compute the cost of creating a functional module in a certain genomic locus. Below are the calculated costs for the following types of modules: transcription factor binding sites (TFBSs), alternative splicing sites, protein contact surfaces, exons and clusters of exons, and genes (Table 2).

Table 2 Comparison of the evolutionary costs for analogous events in human and E. coli genomes

Creating Novel Modules Via Small Mutations (First Path)

Let us analyze the effects of the following factors: sizes of modules, degeneracy of the amino acid code (for protein sites), genome size and mutation frequency, sizes of spacers (nonfunctional DNA) in the neighborhood of modules, and requirements to the accuracy of module positioning.

The Length of the Module

The number of alternative structure variants of the module grows very rapidly with the length of the module (m), since N increases exponentially with m (\( N = 4^{m} \); Table 3). The shorter the site, the smaller the number of alternative structures (N) and the higher the probability that a single unique variant of the structure emerges. Over \( 10^{8} \) years in all the possible loci with a length of ~5,000 nt (in particular, in the regulatory regions and introns of all human genes), all the variants of structures with a length of 17 nt will be created on average once; those with a length of 12 nt, on average 1,000 times; and those with a length of 7 nt, on average 1,000,000 times.

Table 3 The effects of eukaryotic genome organization and mechanisms of mutagenesis on the cost for creation of different types of modules

Code Degeneracy

The cost of creating a novel contact surface in a protein molecule depends on the length of the sequence encoding it and on the amino acid composition of the “target” site. In its effect, code degeneracy is equivalent to a decrease in the site length. For example, if a target site of 10 amino acids contained only amino acids encoded by four codons, it would have only 20 (rather than 30) significant nucleotides whose change influences the amino acid composition. Provided that two target sites are of equal size, the site with the higher code degeneracy will be found more quickly (Table 2).

Genome Size and Mutation Frequencies

The number of mutations per haploid human genome (\( K_{\text{s}} \approx 60 \), \( G \approx 3 \times 10^{9} \)) is approximately 24,000-fold larger than in the case of E. coli (\( K_{\text{s}} \approx 0.0025 \), \( G = 4.6 \times 10^{6} \)), i.e., the density of small mutations in the human genome is ~33-fold higher. Correspondingly, the cost for de novo creation of protein sites in the human genome is lower as compared with the E. coli genome (Fig. 2).

Fig. 2 The cost of creating a module via small mutations. In the human genome: (A) transcription factor binding sites and (B) protein sites at Z = 3.5. In the E. coli genome: (C) transcription factor binding sites and (D) protein sites at Z = 3.5. The X-axis shows the site size in nucleotides and the Y-axis, the logarithm of the evolutionary cost for search (left) and the logarithm of the evolutionary capacity of the species (right)

Sizes of Spacers in the Neighborhood of a Module and the Requirements to Accuracy of Module Positioning

Consider a typical result of adaptive evolution, the creation of a novel TFBS in the regulatory region of a target gene. The majority of such modules have a length of 5–20 nt (Kolchanov et al. 2007; Bryne et al. 2008) and, according to Eq. 1, the evolutionary cost for their de novo creation in the human genome is 0–\( 10^{16} \) (Table 2). In this path, both copies of the already existing functional sites and unique site structures can be created de novo in the regulatory part of a certain gene. This range of costs includes all the possible structural variants of DNA and RNA functional sites (Fig. 2).

The position of a novel module in the regulatory region of a eukaryotic gene is not restricted to a distinct position and can vary within the range of \( 5 \times 10^{3} \) permissible positions (Hartman et al. 2005). A random sequence of this length has all the possible combinations for groups having lengths up to 6 nt (\( 4^{6} \) = 4,096). Indeed, a random sequence with a length of 5,000 nt contains about 5,000/6 ≈ 833 various “words” with a length of 6 nt. Shifting the reading frame for “words” by one nucleotide, we obtain five additional sets of “words,” giving in total 833 × 6 ≈ 5,000 “words” with a length of 6 nt. Thus, potential TFBSs with a length of no more than 6 nt are present in almost any eukaryotic gene. This is the reason why the cost of their creation in the human genome is set equal to zero.

On the contrary, due to the absence of spacers in the prokaryotic genome, the cost of creating such sites with a length of 4–6 nt is higher by 11 to 12 orders of magnitude (Table 2). Creation of longer sites (m > 6) requires considerable expenditures of resources (individuals and time) in both prokaryotes and eukaryotes. However, owing to the presence of spacers, the cost of creating such modules via small mutations in the regulatory region of a human gene is lower by five orders of magnitude as compared with the E. coli genome. When calculating the cost of creating longer TFBSs (>6 nt) in the human genome, the number of variants already existing in the promoter is subtracted from the calculated cost for the site. However, this correction has a noticeable effect on only the cost of creating TFBSs with consensus sequences of 7 nt, whereas for longer sites this correction is insignificant.

Unlike the eukaryotic TFBSs, the protein sites and alternative splicing sites should be precisely localized (Δm = 0). Therefore, the costs of creating splicing sites and TFBSs of equal lengths can differ by three to eight orders of magnitude.

Creating Modules Via Block Mutations (Second Path)

The second path implies creation of functional modules by transposition of existing copied modules. We will demonstrate in which particular way the cost for block mutations depends on the following parameters: genome size, copy number of a module in the genome, size of the module and rate of block mutations, sizes of spacers in the neighborhood of this module, and requirements to the accuracy of module positioning (Tables 2, 3).

Copy Number of Paralogous Modules

The genome frequently contains tens and even hundreds of paralogous modules (Gough et al. 2001). This correspondingly increases several-fold the probability of the copying of the sites belonging to a particular type.

Sizes of Modules and Rate of Block Mutations

On average, with an increase in the length of indels by one order of magnitude, their frequency decreases, also by one order of magnitude (Ogurtsov et al. 2004). Correspondingly, the cost for block mutations grows relatively slowly with the increase in module sizes. Thus, the second path allows new modules of all sizes to be created.

Sizes of Spacers in the Neighborhood of Modules and Requirements to the Accuracy of Module Positioning

In eukaryotes, the modules alternate with spacers, namely, introns in the transcribed gene regions, nonfunctional DNA between TFBSs in gene regulatory regions, and intergenic DNA. The introns have some positions controlled by selection: splice sites, ~20 nt per intron (Lynch 2002); sites regulating transcription, frequently present in the first intron (Bergman and Kreitman 2001; Hartman et al. 2005); and small noncoding RNA (Brown et al. 2008) and even whole genes (Yu et al. 2005), discovered in some introns. However, the major part of the introns, especially of extended ones, is nonfunctional DNA. Therefore, the function of an exon is retained when it is duplicated, independently of whether it was copied precisely or with fragments of the flanking introns. When inserted into an intron of another gene, the functions of both the exon and recipient gene remain native unless a frameshift occurs or the splice sites at the exon–intron boundary are damaged. Thus, the presence of introns increases both probabilities (P c and P in) that determine the evolutionary cost of successful exon transposition. Introns elevate the rate of successful (with preservation of the function) duplications of adjacent exons; note that this effect is directly proportional to the length of introns.

The indicated evolutionary cost (\( 10^{14} \)) includes all the variants of existing proteins with copies of all the exon-bordering domains of a given proteome (Table 2). The probability of a domain lacking introns at its borders to duplicate is ≈10,000-fold lower and the cost for exon shuffling for this group of domains is correspondingly higher. This does not mean complete exhaustion of all the possible combinations of domains in proteins (the cost of which is extremely high) but rather that a copy of each exon-bordering domain has “visited” all the introns of the human genome. Since the calculations take into account all duplications of up to 12,000 nt except for copies of individual exons, this includes also the clusters simultaneously containing several exons of one gene.

The TFBSs in eukaryotic genes are also separated by spacers of nonfunctional DNA and are located at random in the region comprising ≈5,000 bp from the transcription start site. Therefore, the cost of TFBS transposition has a value (\( 10^{13} \)) approximately equal to the cost of exon transposition. This cost includes the transpositions of all the regulatory modules present in the genome, as well as their clusters, to the promoter regions of all the genes.

The cost for a successful gene transposition in a mammalian genome is relatively low (\( 10^{9} \)): all the genes could have been duplicated many times during the evolution of this taxon. The calculations are analogous to those that were made above for introns. Also taken into account is the fact that the probability of a successful insertion of a gene copy is proportional to the fraction of nonfunctional DNA. In the mammalian genome, excluding introns, approximately two-thirds of the genome can be a potential target for insertions. Thus, owing to extended intergenic spacers, the cost of a successful gene transposition in the mammalian genome is lower by 11 orders of magnitude compared with that in the E. coli genome.

Dynamics of the Cost for Creating Different Modules with an Increase in the Genome Size at the Expense of Spacers

Figure 3 shows the dynamics of the cost for creating different modules in a series of hypothetical genomes similar to the human genome. It is assumed that the minimum genome (G = \(3 \times 10^{7}\) nt, log G = 7.5) contains only functional DNA. Interestingly, the cost for block mutations decreases extremely fast only at the initial stage when spacers just start to appear. In particular, an increase in the genome size from \(3 \times 10^{7}\) to \(10^{8}\) nt (introns of 130 nt and intergenic spacers of 3,000 nt) causes a decrease in the cost for genetic search by two to ten orders of magnitude (for different modules). The costs for gene duplications and exon shuffling constantly decrease with an increase in the genome size. The cost for creating TFBSs via the first path reaches its minimum at a genome size of \(2 \times 10^{8}\) nt and then remains constant. The cost for creating TFBSs via the second path reaches its minimum at the same point but then begins to increase. The value of this critical point depends on the length of the gene regulatory region; here, we assumed it to be 5,000 nt (Fig. 3). If the permissible range of TFBS localization exceeds 5,000 nt, then the critical value for the genome size will be >\(2 \times 10^{8}\) nt.

Fig. 3 Dynamics of the cost for creating three types of modules in a series of hypothetical species with genome sizes of \(3 \times 10^{7}\) to \(10^{12}\) nucleotides. (1) Creation of the TFBSs with a length of 10 nt by the first path, (2) gene duplications, (3) exon shuffling, and (4) creation of TFBSs by the second path. All species have the same functional genome (22,000 genes), mutation density, and the ratio of intergenic DNA to introns of 2:1 (as in human genome). The species differ in the content of nonfunctional DNA, namely, the sizes of introns and intergenic spacers. It is assumed that all TFBSs are located within 5,000 nt from the transcription start site

Both paths of evolution have their own advantages and shortcomings. In the case of the first, the cost grows too rapidly with the size of the module, thereby limiting the path’s capacity. The second path allows new combinations (genes or proteins) to be created from modules already present in the genome, independently of their sizes, yet is unable to create a new, unique basic module. Nonetheless, these paths compensate for each other’s shortcomings, acting in parallel.

The local rates of genetic search can differ significantly from the average values, due to specific structural features of the genome and the different mechanisms of mutagenesis. For example, the presence of mobile elements (Ratner 1992; Polavarapu et al. 2008) or repeats and indels (Zhu et al. 2009) can lead to a local increase in the rate of mutagenesis. Therefore, our estimates can be regarded as a first approximation to the real parameters of genetic search. To increase the accuracy of the results, it is necessary to ascertain the rates of duplications at various scales and take into account the effects of hot spots.

The Significance of Genome Organization

In terms of organization, prokaryotic genomes are characterized by considerably smaller sizes, being one to four orders of magnitude shorter than eukaryotic genomes (Li and Graur 1991). This difference is mainly determined by the content of nonfunctional DNA: its amount in eukaryotic genomes may reach 90% or more, whereas it does not exceed 10–15% in prokaryotic genomes (Koonin 2009). The performed analysis has demonstrated that the presence of spacers and their sizes are the factors with the most pronounced influence on the cost of genetic search. For a protein domain in a prokaryotic genome to preserve its function, its duplication and insertion must precisely match the borders of its functional modules, whereas in a eukaryotic genome the same procedure need only be accurate to within the intron size. In general, the efficiency of genetic search in mammalian genomes is two to 11 orders of magnitude higher (depending on the module) than in prokaryotic genomes (Table 2), except for the blockwise creation of new protein sites (fragments of exons) that lack introns at their borders.

On the other hand, the evolutionary potential of mammalian species is considerably lower than that of prokaryotic species. The total population for the majority of mammalian species varies from \(10^{5}\) to \(10^{7}\) individuals (Gromov and Erbaeva 1995). Given the annual addition of one generation, the evolutionary capacities of the majority of mammalian species vary from \(10^{13}\) to \(10^{15}\), which is the total number of individuals that have existed during \(10^{8}\) years (the evolution time of the functional systems). By contrast, 1 g of soil contains \(10^{6}\)–\(10^{9}\) bacteria of one species (Weinbauer et al. 1998), or \(10^{17}\)–\(10^{20}\) cells per 1 km². Even at minimal rates of alternation of generations, the evolutionary potential of prokaryotes is over ten orders of magnitude higher than that of mammals.
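The arithmetic behind these capacity estimates can be checked with a short Python sketch. It is only an illustration: the function name `evolutionary_capacity`, the 10 cm soil layer, the soil density of 1 g/cm³, and the single 1 km² plot with one bacterial generation per year are assumptions introduced here, not values taken from the text.

```python
import math

def evolutionary_capacity(population, generations_per_year, years):
    """Total number of individuals that have lived over the given period."""
    return population * generations_per_year * years

YEARS = 10**8  # evolution time of the functional systems (from the text)

# Mammals: 10^5 to 10^7 individuals, one generation per year (from the text).
mammal_low = evolutionary_capacity(10**5, 1, YEARS)
mammal_high = evolutionary_capacity(10**7, 1, YEARS)

# ASSUMPTION (not from the text): a 10 cm soil layer with density 1 g/cm^3.
CM2_PER_KM2 = 10**10
SOIL_DEPTH_CM = 10
SOIL_DENSITY_G_PER_CM3 = 1
soil_grams_per_km2 = CM2_PER_KM2 * SOIL_DEPTH_CM * SOIL_DENSITY_G_PER_CM3

# Bacteria of one species: 10^6 to 10^9 cells per gram of soil (from the text).
bacteria_per_km2 = (10**6 * soil_grams_per_km2, 10**9 * soil_grams_per_km2)

# ASSUMPTION: a single km^2 and only one generation per year, as a floor.
bacteria_capacity_low = evolutionary_capacity(bacteria_per_km2[0], 1, YEARS)

# Orders of magnitude between the bacterial floor and the mammalian ceiling.
gap_orders = math.log10(bacteria_capacity_low / mammal_high)

print(f"mammals: {mammal_low:.0e} to {mammal_high:.0e}")
print(f"bacteria per km^2: {bacteria_per_km2[0]:.0e} to {bacteria_per_km2[1]:.0e}")
print(f"gap: {gap_orders:.1f} orders of magnitude")
```

Under these assumptions the per-km² bacterial count falls in the same range as the figures quoted above, and even one square kilometre with one generation per year yields a capacity about ten orders of magnitude above the upper mammalian estimate.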

Algorithms for Elevating Evolvability

Colegrave and Collins (2008) define evolvability as the ability of a population to both generate and use genetic variation to respond to natural selection. The mechanisms for elevating evolvability involve the structures and processes that optimize the genetic search for beneficial mutations by increasing the fraction of these mutations. Generalizing the obtained results, it is possible to consider two main algorithms for elevating evolvability.

The Correspondence Between the Sizes of Modules and the Scale of High-Frequency Mutations (First Algorithm)

Imagine that mutations of all sizes have an equal frequency. For a chromosome of \(10^{8}\) nt, the number of possible mutation types differing in size is also \(10^{8}\). Then the fraction of small mutations (<20 nt) would be ~\(10^{-7}\) (or 0.00001%), whereas the real frequency of such mutations is about 99%. If duplications of all sizes occurred at the same rate, the fraction of mutations with a size of 20–10,000 nt would be ~\(10^{-4}\), whereas their actual rate is higher by one to three orders of magnitude (Ogurtsov et al. 2004). Thus, the shift in mutation frequency range accelerates the genetic search via small mutations by seven orders of magnitude, and the search via block mutations by one to three orders of magnitude.
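The fractions in this thought experiment follow from simple counting, as the Python sketch below illustrates. The helper `uniform_fraction` and the exact size boundaries (1–19 nt for small mutations, 20–10,000 nt for blocks) are our reading of the text, introduced for illustration only.

```python
import math

CHROMOSOME_NT = 10**8  # chromosome length used in the text

def uniform_fraction(min_size, max_size, chromosome_nt=CHROMOSOME_NT):
    """Share of size classes in [min_size, max_size] when every
    mutation size from 1 to chromosome_nt is equally frequent."""
    return (max_size - min_size + 1) / chromosome_nt

# Small mutations (<20 nt) under the equal-frequency assumption.
small_uniform = uniform_fraction(1, 19)

# Block mutations of 20 to 10,000 nt under the same assumption.
block_uniform = uniform_fraction(20, 10_000)

# Observed share of small mutations quoted in the text (about 99%).
small_observed = 0.99
small_speedup_orders = math.log10(small_observed / small_uniform)

print(f"small, uniform: {small_uniform:.1e}")
print(f"block, uniform: {block_uniform:.1e}")
print(f"speed-up for small mutations: {small_speedup_orders:.1f} orders")
```

The computed speed-up for small mutations is a little under seven orders of magnitude, which the text rounds to seven.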

Possibly, the mechanisms of mutagenesis are tuned so that the most frequent mutations would be of a scale approximately matching the sizes of the main functional modules in the genome. In the case of such correspondence, a considerable fraction of mutations modify the structure of the modules, introducing, in particular, beneficial alterations. Any alternative variant of frequency distribution would lead to a more or less pronounced slowdown of the genetic search (Fig. 4).

Fig. 4 The first algorithm for accelerating genetic search: correspondence between the sizes of modules and the scale of high-frequency mutations. (1) Real pattern of mutation frequencies (the shift towards small mutations) and alternative patterns: (2) normal distribution, (3) the shift towards large-scale mutations, and (4) equal frequencies

Easing the Requirement to Accuracy in Positioning of Constructive Mutations (Second Algorithm)

The second algorithm is mainly used by eukaryotes, its effectiveness depending on the presence of spacers between modules. The difference between the prokaryotic and eukaryotic approaches may be demonstrated by analogy (Fig. 5). Blockwise mutagenesis is similar to a blindfolded cook slicing a string of sausages (representing a genome composed of modules). If the connecting ropes (spacers) are longer than the sausages themselves (modules), as in the eukaryotic genome, the knife more frequently cuts the ropes and misses the sausages (not destroying modules). In contrast, in the prokaryotic genome, where the modules are arranged compactly (without spacers), duplications mainly generate nonfunctional fragments of modules. We have described three ways in which the presence of spacers increases evolvability (Table 4). The creation of novel proteins by exon shuffling is accelerated by four to seven (and more) orders of magnitude, depending on the intron length (Table 4). The rate of blockwise evolution of gene regulatory regions is increased in an analogous manner. The presence of intergenic DNA increases the probability of successful gene duplications in eukaryotic compared with prokaryotic genomes.
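The sausage analogy can be turned into a toy calculation. The Python sketch below is not the cost model used in this study: it assumes a genome made of identical repeating units (one module followed by one spacer), cuts that fall with equal probability between any two adjacent nucleotides, and two independent breakpoints per duplication. The function names and the example lengths (a 150 nt module, a 3,000 nt spacer) are hypothetical.

```python
def clean_cut_probability(module_len, spacer_len):
    """Probability that one random cut does not fall inside a module.

    A repeating unit of module_len + spacer_len nucleotides has that many
    cut positions per period; module_len - 1 of them lie inside the module,
    so spacer_len + 1 positions are harmless.
    """
    return (spacer_len + 1) / (module_len + spacer_len)

def intact_duplication_probability(module_len, spacer_len):
    """Probability that both breakpoints of a duplication miss the modules
    (toy assumption: the two breakpoints are independent)."""
    return clean_cut_probability(module_len, spacer_len) ** 2

MODULE_NT = 150  # hypothetical module length

compact = intact_duplication_probability(MODULE_NT, 0)      # no spacers
spaced = intact_duplication_probability(MODULE_NT, 3_000)   # long spacers

print(f"compact genome: {compact:.2e}")
print(f"genome with spacers: {spaced:.2f}")
print(f"gain from spacers: {spaced / compact:.1e}-fold")
```

Even this crude model shows the qualitative point of the analogy: without spacers almost every pair of cuts damages a module, whereas with long spacers most cuts fall in the "ropes".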

Fig. 5 The second algorithm for accelerating genetic search: easing the requirements to the accuracy of module duplications and insertions at the expense of nonfunctional DNA spacers in the neighborhood of modules

Table 4 Acceleration mechanisms for genetic search (increase in evolvability)

Discussion

Verification of Model

The results of this study demonstrate that genomic organization and the mechanisms of mutagenesis significantly influence the probability of creating a novel module. Of the three types of spacers, attention has mainly been focused on the role of introns. The main parties in the discussion have been the advocates of the intron-early (Gilbert 1978; Blake 1979) and intron-late (Doolittle and Stoltzfus 1993) hypotheses. The inferences of our model are compatible with both hypotheses as well as with their combined variant (Koonin 2006), since they imply that introns enhance the blockwise evolution of proteins. However, the mechanisms of such influence are still vague. In our opinion, the presence of introns at the borders of a domain significantly elevates the probability of an effective duplication of the domain (i.e., one preserving its integrity), since the domain may be duplicated together with fragments of introns. Using the model of genetic search, we can quantify the contribution of any spacers (including introns) to the acceleration of genetic search.

The first corollary of this model is that effective duplications of exon-bordering domains take place at a rate several orders of magnitude higher than that of domains lacking any introns at their borders. Consequently, the probability of creating novel multidomain proteins involving exon-bordering domains should be higher than for proteins of the second group. Indeed, a systematic comparison of nine animal genomes from nematodes to mammals revealed that exon-bordering domains expanded faster than other protein domains in both abundance and distribution, as well as the diversity of co-occurring domains and the domain architectures of harboring proteins (Liu and Grigoriev 2004; Liu et al. 2005). These facts directly confirm the first corollary of the genetic search model.

The second corollary is that the longer the introns flanking a domain, the more frequent are the effective duplications. This allows another fact to be explained (Liu and Grigoriev 2004), namely, that the fraction of exon-bordering domains in the vertebrate genomes had increased faster as compared with the invertebrate genomes. The reason is that the average intron size in vertebrates is by one to two orders of magnitude larger and, correspondingly, the rates of effective duplications in their genomes are higher. Thus, the difference in the rates of effective duplications between the group of exon-bordering domains and the domains lacking introns at their borders is more pronounced in the vertebrate genomes. The presence of a positive correlation between the intron size and the rate of protein blockwise evolution can be also tested in another manner. In particular, one should expect that the average length of introns is longer in the domain superfamilies where successful exon shuffling and tandem duplications of domains had taken place at a higher rate.

It is known that one and the same functional structure can be created by different ways. We can approximately estimate the relative contributions of different evolutionary processes by merely comparing the costs for implementation of alternative scenarios. As we have seen by the example of exon-bordering domains, the drastic differences in the generation rates of whole modules determined by spacers are also retained after the domains are fixed. Consequently, the further population processes (selection and genetic drift) cannot completely level (disguise) these differences.

For example, a new DNA–protein contact can be created via an evolutionary alteration in either of the components forming this pair. The question of which part of the gene, regulatory or coding, harbors the changes that played the leading role in evolution has been discussed for decades (Wray et al. 2003). We have demonstrated (Fig. 2) that the cost for creating a TFBS in the mammalian genome via the first path is considerably lower as compared with the cost for creating a protein site of the corresponding size. Since the genome can contain tens and even hundreds of copies of a TFBS (Kolchanov et al. 2007), the duplications of regulatory sites also take place at a higher frequency. Consequently, it should be expected that the evolution of regulatory regions was prevalent in multicellular eukaryotes.

Another example is a comparison of the costs for alternative scenarios of creating proteins: alternative splicing ≤ duplications of whole genes < exon shuffling (Fig. 6). It has been shown that the exon shuffling frequency in the Drosophila genome amounts to ≈15% of the frequency of whole gene duplications (Conant and Wagner 2005). The evolutionary cost of creating a splice site is relatively low (Table 2); consequently, alternative splice sites could have appeared multiple times in every existing intron of mammalian genomes. Indeed, 93% of human genes (nearly all the genes with introns) are subject to alternative splicing (Wang et al. 2008; Pan et al. 2008). This is one of the main sources of the proteomic complexity of eukaryotes (Lareau et al. 2004; Artamonova and Gelfand 2007). A relatively low cost also allows novel exons to be created from intron sequences (Alekseyenko et al. 2007) by small mutations. In this case, the probability of creating a novel exon also correlates positively with intron length (Roy et al. 2008). The cost of successful gene transposition in mammalian genomes is relatively low (\(10^{9}\)), being considerably smaller than the cost of other block mutations; thus, all genes could have been duplicated multiple times during evolution (Table 2). This inference agrees with the known facts on the role of gene duplications as the main supplier of material for adaptive evolution (Ohno 1970; Kondrashov and Kondrashov 2006).

Fig. 6 The limits of evolutionary capacities of mammalian species for different types of modules. Arrows denote the costs for creating modules taking into account the effect of spacers; gray rectangles, the hypothetical costs for creating the same modules provided that the mechanisms for search acceleration are inactive

Alternative splicing creates proteins by recombining domains of existing proteins and thus cannot replace exon shuffling, which creates novel proteins by recombining domains of the overall proteome. However, the rate of creation of novel proteins via the first path exceeds the rate of the analogous process by exon shuffling by two to six orders of magnitude (Table 4).

Thus, the corollaries of the model associated with the role of introns in the intensification of exon shuffling are directly confirmed by experimental facts. Theoretical estimations of the relative contributions of alternative evolutionary processes to creation of the main types of modules made using this model are also confirmed by known facts.

The Potential Limits of Spontaneous Mutagenesis

Many factors influence the rate of evolution. Here we discuss only one aspect of this complex problem, namely, the degree to which the stage of genetic search influences the rate of adaptive evolution. An adaptive event comprises two stages: the first stage is the genetic search for a beneficial mutation and the second, its fixation by selection.

With a sudden change in environmental conditions, a population can use only the existing resource of mutations. Dobzhansky believed that the resource of mutations in natural populations is so large that any genetic changes can be implemented by natural selection without waiting for new mutations (Dobzhansky 1951; Grant 1985). As was earlier demonstrated, the regulatory region of a mammalian gene houses a set of various potential TFBSs with consensus sequences not exceeding 6 nt; any of these potential TFBSs can be used by selection. In this scenario, the duration of genetic search has no effect on the rate of adaptive evolution, since the rate of the adaptive process is determined by the stage of selection. When a species migrates to another ecological niche, a long-term adaptation to new conditions is initiated (Schmalhausen 1949). It includes optimization of existing functional subsystems and the creation of new ones. In this case, genetic search can limit the rate of emergence of constructive innovations and, correspondingly, influence the duration of the overall adaptation.

The evolutionary capacity of a species has to provide for adaptive evolution at all levels of genome organization, from the basic module to genes. We are interested in the evolution of functional systems on a time scale of up to ~100,000,000 years. Over \(10^{8}\) years, the time period of divergence of the main mammalian orders, small mutations could have created all TFBS sequences with lengths of up to 17 nt as well as all protein sites with a length of six amino acids (Fig. 6). As a result of blockwise mutagenesis, a copy of each exon (or a cluster of exons) could have “visited” all the introns of the genome. Note that the evolutionary capacities of mammalian species (\(10^{13}\)–\(10^{15}\)) approximately correspond to an average cost of blockwise evolution. This coincidence demonstrates that the composition and order of domains in proteins could have been modified during the evolution of mammalian species. The situation changes dramatically if we assume that the above-described mechanisms for accelerating the search did not operate. The creation of novel sites (regulatory and protein) via small mutations would slow down by ~four orders of magnitude, and the blockwise evolution of proteins and gene regulatory regions by ~seven orders of magnitude (Fig. 6). This means that creation of novel functional subsystems would become impossible for mammalian species with relatively low species populations.

Kawashima et al. (2009) discovered ~1,000 novel domain pairs in the vertebrate lineage, most of them in proteins specific to this taxonomic group. Tordai et al. (2005) compared the mobility of domains (the number of local architecture types in which a given domain occurs) in various kingdoms and demonstrated that the domains of the metazoans displayed the highest mobility. These authors (Tordai et al. 2005) have also demonstrated that the metazoan proteomes contain considerably larger fractions of multidomain proteins, 39%, as compared with the prokaryotes Archaea (23%) and Bacteria (27%). Clearly, constructive blockwise evolution has played an essential role in the evolution of metazoan organisms.

When stating that the intensity of blockwise evolution in metazoans was higher than that in the prokaryotes, we mean the absolute rates. Note that the colossal difference in the evolution capacities of these groups is not always apprehended. Bacteria experience a high rate of genetic search even at a relatively high cost of constructive innovations, due to their high population, short generation times, and wide abundance of horizontal gene transfer. The emergence of multicellular organisms was concurrently accompanied by decrease in evolutionary capacity due to a decrease in the total population of a species and the rate at which it adds generations. To compensate for such disproportion and maintain a comparable (to prokaryotes) rate of blockwise evolution, vertebrates should possess more efficient (by ten and more orders of magnitude) mechanisms of genetic search. Therefore, the evolution of large organisms with their low population sizes could have occurred only provided a certain specific organization of their genome. The presence of long spacers in the genomes of multicellular eukaryotes makes it possible to a certain extent to compensate for a dramatic decrease in their evolutionary potential.

At equal mutation rates, the performance of mutagenesis may change by over ten orders of magnitude (see Fig. 6) depending on the range of mutation rates and genome structure. It is evident that such a parameter as total mutation rate does not adequately reflect the real intensity of genetic search. This suggests that the differences in genome organization should be taken into account when comparing the evolutionary rates of different taxa, since the genome structure can considerably modulate the effectiveness of mutations.

Selection for Evolutionary Rate and/or Neutral Processes?

The number and sizes of spacers are extremely variable in the eukaryotic genomes, ranging from several introns per genome in unicellular organisms (intron sizes of several tens of nucleotides) to ~10 introns per single gene in mammals and higher plants (with a size of thousands of nucleotides). As is shown above, introns of a certain length are necessary elements of the vertebrate animal genome. Therefore, we assume that the above-described mechanisms underlying evolvability were preserved and optimized by the action of a second-order selection favoring the species with a higher rate of adaptive evolution. Several papers discuss the need to consider the selection of evolutionary mechanisms as a second-level (-order) selection (Schmalhausen 1974; Radman et al. 2000; Arber 2003; Earl and Deem 2004). The unit objects for the second-level selection are genetically isolated groups: various clones in the case of bacteria and species in the case of animals. In each adaptive round, the competing species are assessed by selection not only from the standpoint of the relative adaptation of their individuals, but also from the standpoint of the relative efficiency of their evolutionary mechanisms (Ananko 2002a, b). The relative duration of the adaptive process in competing species is a criterion for second-order selection, or selection according to the adaptation rate.

The simulation of evolution using a series of hypothetical species demonstrates that emergence of spacers provides for a manifold increase in the rate of generating new modules (Fig. 7). This acceleration is especially pronounced when the initial genome (G = \(3 \times 10^{7}\) nt, without spacers) is doubled for the first time to a size of \(6 \times 10^{7}\) nt: the rate of genetic search increases by two to ten orders of magnitude (for different modules). The next doubling of the genome size (to \(1.2 \times 10^{8}\) nt) and the corresponding increase in spacers causes a considerably smaller acceleration, only 3- to 4.5-fold. During further increase in the genome size, the rate of creating TFBSs remains constant (via the first path) or even decreases (via the second path). The rates of exon shuffling and gene duplications continue their linear growth: doubling of the genome size leads to doubling of the search rate. We believe that the number, distribution, and sizes of spacers in the genomes of extant species result from the balance between the requirements of selection for rate, on the one hand, and the limitations (metabolic and structural) on genome size, on the other (Fig. 7). Reconstruction of the evolutionary dynamics of introns has demonstrated that their number during the evolution of eukaryotes increases in a stepwise manner. Carmel et al. (2007) detected three main waves in the expansion of introns. It is also known that one of these waves (at the stage of the common ancestor of multicellular animals) was accompanied by an increase in the evolutionary rate of multidomain proteins (Patthy 1999). Taking into account our simulation results, it is possible to assume that each new wave of intron expansion involved new groups of domains into the process of intensive blockwise evolution, thereby initiating an outburst in the rate of protein evolution.
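One way to see why the second doubling gives roughly a 3-fold gain while later doublings approach 2-fold is to track how much spacer DNA each doubling adds. The Python sketch below does only that bookkeeping; it is not the simulation reported here. It assumes a fixed functional genome of 3 × 10^7 nt and, as a simplification of ours, that the search rate scales linearly with the total amount of spacer DNA. The function name is hypothetical.

```python
FUNCTIONAL_NT = 3 * 10**7  # functional genome, identical in all species

def spacer_growth_per_doubling(n_doublings):
    """For each successive genome doubling, return a tuple
    (genome size, spacer DNA, growth factor of spacer DNA)."""
    rows = []
    genome = FUNCTIONAL_NT  # initial genome contains no spacers
    for _ in range(n_doublings):
        doubled = genome * 2
        spacer_before = genome - FUNCTIONAL_NT
        spacer_after = doubled - FUNCTIONAL_NT
        # The first doubling starts from zero spacer DNA, so the
        # relative growth is unbounded in this bookkeeping.
        factor = spacer_after / spacer_before if spacer_before else float("inf")
        rows.append((doubled, spacer_after, factor))
        genome = doubled
    return rows

# Fifteen doublings take the genome from 3e7 nt to just under 1e12 nt.
rows = spacer_growth_per_doubling(15)
for genome_nt, spacer_nt, factor in rows:
    print(f"G = {genome_nt:.2e} nt, spacers = {spacer_nt:.2e} nt, x{factor:.2f}")
```

Under this simplification, the second doubling triples the spacer content, which matches the lower end of the 3- to 4.5-fold range, and the factor then tends towards 2, in line with the linear growth described for exon shuffling and gene duplications.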

Fig. 7 Acceleration of genetic search in a series of hypothetical species with genome sizes of \(3 \times 10^{7}\) to \(10^{12}\) nt. The degree to which the rate of module creation is changed during successive genome size doubling at the expense of an increase in spacer sizes is shown. Exon shuffling is shown in black; gene duplication, white; creation of TFBSs by the second path, light gray; and creation of TFBSs of 10 nt by the first path, dark gray. All species have the same functional genome (22,000 genes), mutation density, and the ratio of intergenic DNA to introns of 2:1 (as in human genome). The species differ in the content of nonfunctional DNA, namely, the sizes of introns and intergenic spacers. It is assumed that all TFBSs are located within 5,000 nt from the transcription start site

Lynch believes that accumulation of excess DNA in the eukaryotic genomes is a consequence of the predominance of genetic drift processes in small populations (Lynch 2002, 2007a, b). In support of their model, Lynch and Conery (2003) refer to the negative dependence between the parameter \(N_e u\) (the product of effective population size and nucleotide mutation rate) and genome size. However, the significance of this fact is reduced by the possibility of its ambiguous interpretation (Whitney and Garland 2010). The reason for this ambiguity is that \(N_e\) is indirectly associated with other parameters, such as organism size, which, in turn, correlates with a multitude of other factors (Schmid et al. 2000). In particular, the larger the size of an organism, the smaller are its species population size (R), rate of alternation of generations (τ), and size of individual populations (\(N_e\)). Thus, the observed correlation (Lynch and Conery 2003) may be determined by a negative correlation between the evolutionary capacities (R × τ) and spacer lengths (and correspondingly, genome size), as follows from our hypothesis. Thus, this fact is compatible with both hypotheses, that of Lynch and ours.

It is relevant here to pay attention to the specificity in simulation of long-term evolutionary processes. Even if intrapopulation selection is unable to control DNA accumulation in the genomes of multicellular organisms (in small populations), there are alternatives to a neutral explanation of this trend. Our studies demonstrate that the mutation frequency range and genome structure actively influence the rate of genetic search. In particular, an increase in the fraction of exon-bordering domains in animal genomes (Liu and Grigoriev 2004; Liu et al. 2005) suggests that the composition of domains in proteomes has been gradually optimized due to the impact of selection for rate. Since the effective duplications of exon-bordering domains take place by several orders of magnitude more frequently, this trend has been accompanied by an increase in the total intensity of protein blockwise evolution.

It is also necessary to take into account the fact that species selection fixes the entire genetic system, including both the selective and neutral (from the standpoint of intrapopulation selection) changes. Therefore, the species selection can control the direction of neutral trends via mechanisms of mutagenesis. For example, a “neutral” shift in mutation frequency towards insertions will lead to a gradual increase in the sizes of spacers and the overall genome (Petrov 2002). In the absence of beneficial mutations, this is a negative trend, since it leads to a decrease in the metabolic efficiency. However, in the case of a parallel increase in the generation rates of beneficial mutations, this is a positive trend, since a high rate of emergence of beneficial innovations can more than compensate for the negative consequences (Fig. 7). Thus, the necessary feedback regulating the evolutionary efficiency of mutagenesis and genome architecture can be implemented via species selection (Fig. 8). In particular, species selection could keep the spacer and genome sizes within an optimal range. The fact that the sizes of introns and intergenic spacers correlate with the genome size (Vinogradov 1999; Bergman and Kreitman 2001) also favors the existence of a general mechanism controlling the sizes of all spacer types. Genome design is a very long and inertial process, and the impact of selection could have been intermittent: neutral evolution (at an intrapopulation level) → correction at the level of species selection → another neutral process to the next correction. Thus, when simulating macroevolutionary processes, it is necessary to consider the whole complex of constraints determined by the variations in the mutation frequency range, genome structure (Fig. 8), and population size.

Fig. 8 The factors influencing the rate of genetic search and possible ways (feedback) of the regulation of mutation frequency range and genome structure by selection for rate

In addition to the factors mentioned above, several other factors are likely to influence the sizes of introns, namely, the presence of regulatory elements controlling gene expression (Bergman and Kreitman 2001); the presence of RNA genes (Maxwell and Fournier 1995); insertions of mobile elements (Bartolomé et al. 2002); selection for reduced energy cost of transcription (Carvalho and Clark 1999); selection for small chromosomal domain size (Prachumwat et al. 2004); and reduction of the Hill–Robertson interference between exons (Comeron and Kreitman 2000). The goal of future studies is to correctly estimate the real contributions of the above-mentioned factors to the structure of extant genomes.

The example of competition between companies manufacturing products of the same functional type (PCs, phones, etc.) clearly shows that the rate of appearance of new items plays the most important role. The company that spends more money on long-term development more frequently proposes new types of products and, correspondingly, expands its sales markets. Analogously, the species with longer spacers more frequently generate beneficial mutations and expand their distribution areas at the expense of competitor species. Compare the scale of advance: the efficiency of the state-of-the-art supercomputer has increased \(10^{13}\)-fold as compared with the first computer (1949), while the difference in the rate of genetic search between the mammalian genome and the genome without spacers amounts to \(10^{4}\)–\(10^{14}\). PC users undoubtedly see the difference; could species selection have ignored such a difference in the evolution rate?

Heuristic Capacities of the Model

The model makes it possible to describe the space of constructive capacities of spontaneous mutagenesis depending on the mutation frequency range and architecture of species genome. In particular, the theoretical limits for the capacities of mutagenesis according to the main module types have been detected for the class of mammals. It has been found that the mutation frequency range and genome architecture have a paramount effect on the success of mutations: at the same general mutation rates, the actual rate of module generation may differ by over ten orders of magnitude. Consequently, such a parameter as total mutation rate is informationally inadequate for comparative studies into the evolutionary rate of different taxa whose genomes differ from one another in their structures.

A diversity of the observed effects was generalized in the form of two algorithms of increase in evolvability. The first algorithm is implemented via a shift in the frequencies towards small mutations to achieve an approximate correspondence between the sizes of main modules and the scale of high-frequency mutations. The second algorithm consists in easing the requirements to the accuracy of module positioning and is predominantly utilized by eukaryotes due to the presence of a multitude of spacers between modules.

In particular, the model allows for quantitative estimation of the role of introns in intensifying exon shuffling. It has been shown that the efficiency of domain duplications directly depends on the presence and size of introns at their borders. This explains the observed differences in the rate of evolutionary amplification between the groups of exon-bordering domains and the domains lacking introns at their borders as well as the fact that the fraction of exon-bordering domains in the vertebrate genomes increases faster than in the invertebrate genomes.

It is evident that the extant taxa have solved the evolutionary problem of creating new modules but have done this in different ways. This is demonstrated by significant differences in the genome architecture. We have demonstrated the possibility of a theoretical estimation of the relative contributions of alternative evolutionary scenarios to creation of new modules. The predictions based on the model of genetic search can serve as a benchmark for comparative studies of an actual contribution of different mutagenesis mechanisms and genome structures to the adaptive evolution of individual taxa.