11.1 Introduction

Plant genomes have been widely affected by recursive polyploidizations, which repeatedly doubled or tripled the genome information in a cell overnight (Bowers et al. 2003, 2005; Jaillon et al. 2007; Soltis et al. 2008; Soltis and Soltis 2009; Abrouk et al. 2010; Tang et al. 2010; Jiao et al. 2011, 2012). Though widespread gene losses and DNA rearrangements often follow, mostly leading to restoration of diploid heredity, hundreds of duplicated genes are often preserved in colinearity on homoeologous chromosomes or chromosomal segments, retaining valuable traces of these abrupt evolutionary events (Wang et al. 2005; Gaeta et al. 2007; Paterson 2008; Paterson et al. 2009, 2012; Proost et al. 2011; Schnable et al. 2011; Freeling et al. 2012).

Recent research into illegitimate recombination between duplicated genes revealed that many duplicated genes might have been affected by gene conversion, in which one copy of a pair of duplicates is converted to the DNA sequence of the other by a unidirectional recombination-like mechanism (Xu et al. 2008; Gaeta and Chris Pires 2009; Wang and Paterson 2011). A comparative analysis of the rice and sorghum genomes showed that 12 % of rice duplicated genes and 14 % of sorghum duplicated genes were affected by conversion after the divergence of these lineages (Wang et al. 2007, 2009). Among the converted genes, 40 % were affected over their full length and the others over only part of their sequence. These conversion events may have occurred tens of millions of years ago.

A comparison between the rice subspecies indica and japonica found evidence of more recent gene conversion, showing that ~8 % of rice genes may have been converted after the split of the two subspecies about 400,000 years ago (Zhu et al. 2007). One pair of duplicated grass chromosomes, i.e., rice chromosomes 11 and 12, their orthologous sorghum chromosomes 5 and 8, and the corresponding chromosomes in other grasses, has been affected by prominent conversion (Wang et al. 2011a, b, c). After the split of rice and sorghum, nearly 60 % of rice and sorghum duplicated genes were converted by their duplicated copies. Evidence from sequence similarity analysis and from an independent analysis of Oryza species (including rice) indicated that, near the termini of the short arms of Oryza chromosomes 11 and 12, gene conversion may still be ongoing 70 million years or more after the origin of these duplicated genes (Jacquemin et al. 2009; Wang et al. 2011a, b, c).

Analysis of eudicot genomes found more evidence of homo(eo)logous gene conversion. In a tetraploid cotton, Acala Maxxa, 40 % of paralogous genes from its two subgenomes At and Dt differ in sequence from their diploid progenitors. The vast majority of these mutations are convergent, with At genes converted to the Dt state at more than twice the rate (25 %) of the reciprocal (10.6 %) (Paterson et al. 2012). As to conversion between homologous chromosomes, sequencing 40 Arabidopsis F2 plants and their parents showed that small gene conversion tracts, often biased, represented 90–99 % of all recombination events. Moreover, the rate of protein sequence alteration caused by gene conversion is reported to be more than 600 times that caused by mutation (Yang et al. 2012).

11.2 Comparative Inference of Gene Conversion in B. rapa and B. oleracea

The existence of large homoeologous blocks provides a chance for homoeologous (ectopic) DNA recombination, which may result in concerted evolution of duplicated genes as inferred previously in grasses (Wang et al. 2009, 2011a, b, c).

11.2.1 Rationale to Infer Gene Conversion

Annotated genes from Brassica rapa and Brassica oleracea were obtained from the sequencing project websites (Wang et al. 2011a, b, c; Liu et al. 2014). To find colinear homologs within a plant or between the two plants, we ran BLASTP to identify homologous genes. Homologs with E-values smaller than 1e-10 were taken as input for ColinearScan, which was used to infer DNA blocks containing 10 or more colinear genes. Orthologs between B. rapa and B. oleracea were readily defined by checking chromosome numbers. Using an approach described previously, we defined three subgenomes, A, B, and C, and found paralogs in each plant. In the absence of gene loss, the genome triplication would leave three colinear genes at corresponding locations in each plant, namely Br-A, Br-B, and Br-C in B. rapa and their respective orthologs Bo-A, Bo-B, and Bo-C in B. oleracea, forming a homologous gene sextet. However, owing to widespread gene loss after the genome triplication, we often could not find complete sextets of homoeologs.
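The E-value filtering step described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes BLASTP results in the standard 12-column tabular format (-outfmt 6), the function name filter_blast_hits is hypothetical, and dropping self-hits is an added assumption. The gene IDs are invented.

```python
def filter_blast_hits(lines, max_evalue=1e-10):
    """Keep BLASTP hits whose E-value is smaller than max_evalue.

    Assumes BLAST tabular output (-outfmt 6): 12 tab-separated columns,
    with query ID, subject ID, and E-value in columns 1, 2, and 11.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        # Assumption: self-hits carry no homology information, so drop them.
        if query != subject and evalue < max_evalue:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical gene IDs and scores, for illustration only.
example = [
    "BrA\tBoA\t92.1\t300\t24\t0\t1\t300\t1\t300\t1e-150\t540",
    "BrA\tBrA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t620",
    "BrA\tBoX\t31.0\t80\t55\t2\t10\t89\t5\t84\t2e-05\t45.1",
]
kept = filter_blast_hits(example)
```

Only the first example line survives: the second is a self-hit and the third has an E-value above the threshold. The retained pairs would then be passed to a colinearity detection tool.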

To infer gene conversion from sextets or incomplete groups, we defined homologous gene quartets, each consisting of two paralogs in one plant and their respective orthologs in the other plant. We then estimated synonymous nucleotide substitution rates (Ks) between them. Because speciation occurred after the genome triplication, we anticipated that orthologs would be more similar than paralogs. If, instead, the paralogs within a genome were more similar than the orthologs, we considered that the paralogs might have been affected by gene conversion. Bootstrap tests were repeated 100 times. To estimate Ks, we first aligned the proteins of a homologous quartet with CLUSTALW and, after removing gaps, converted the protein alignment into a codon-based cDNA alignment. Ks was estimated using the Nei-Gojobori approach as implemented in BioPerl.
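The conversion of a gapped protein alignment into a gap-free codon alignment can be sketched as below. This is an illustrative stand-in rather than the code used in the study: the function name codon_alignment and the toy sequences are hypothetical, and each coding sequence is assumed to contain exactly three nucleotides per aligned residue, in frame.

```python
def codon_alignment(aligned_proteins, cds):
    """Thread each coding sequence onto its gapped protein alignment row.

    aligned_proteins: dict name -> gapped protein string (equal lengths)
    cds: dict name -> ungapped coding sequence, 3 nt per residue, in frame
    Returns dict name -> list of codons, keeping only gap-free columns.
    """
    names = list(aligned_proteins)
    length = len(aligned_proteins[names[0]])
    rows = {}
    for name in names:
        seq = cds[name]
        # Walk the codons in step with the residues of the protein row.
        codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
        rows[name] = [None if aa == "-" else next(codons)
                      for aa in aligned_proteins[name]]
    # Keep a column only if no sequence has a gap there.
    keep = [col for col in range(length)
            if all(rows[name][col] is not None for name in names)]
    return {name: [rows[name][col] for col in keep] for name in names}


# Toy two-sequence example (hypothetical sequences).
aligned = codon_alignment(
    {"a": "MK-L", "b": "MKRL"},
    {"a": "ATGAAACTT", "b": "ATGAAGCGTCTG"},
)
```

The third column is dropped because sequence "a" has a gap there, so both rows keep three codons. A synonymous substitution estimator such as Nei-Gojobori would then be applied to the resulting codon columns.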

11.2.2 Characterization of Gene Conversion

By using ColinearScan to find gene colinearity and by checking sequence similarity between chromosomal regions, we inferred paralogous genes within B. rapa and B. oleracea genomes, respectively, and inferred orthologous genes between them. Here, we checked triplicated genes that were preserved in both Brassica species, which form homologous gene sextets. For genes in each sextet, we checked each quartet of homologs within them. We compared gene similarity or tree topology. We anticipated that the paralogs (duplicated genes) were more diverged than their respective orthologs. If not, we inferred that the paralogs might have been affected by gene conversion. We removed possible redundancy when counting converted gene pairs.
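The quartet comparison and its bootstrap support can be sketched as follows. Several choices here are assumptions and not the published method: the proportion of differing nucleotides is used as a crude stand-in for Ks, a quartet is flagged when the two paralogs are closer to each other than at least one of them is to its own ortholog, and the names p_distance, paralogs_closer, and conversion_support are hypothetical.

```python
import random


def p_distance(x, y):
    """Proportion of differing nucleotides between two codon lists.

    Assumption: a simple stand-in for Ks, used only for illustration.
    """
    diffs = sum(a != b for cx, cy in zip(x, y) for a, b in zip(cx, cy))
    return diffs / (3 * len(x))


def paralogs_closer(quartet, columns):
    """True if paralogs P1 and P2 are closer to each other than at least
    one of them is to its own ortholog (O1 or O2), using only the given
    codon columns."""
    def pick(key):
        return [quartet[key][c] for c in columns]
    d_para = p_distance(pick("P1"), pick("P2"))
    d_orth = max(p_distance(pick("P1"), pick("O1")),
                 p_distance(pick("P2"), pick("O2")))
    return d_para < d_orth


def conversion_support(quartet, n_boot=100, seed=0):
    """Return (flagged, support): the call on the full alignment and the
    fraction of bootstrap replicates in which the paralogs are closer."""
    n = len(quartet["P1"])
    flagged = paralogs_closer(quartet, range(n))
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_boot):
        # Resample codon columns with replacement.
        columns = [rng.randrange(n) for _ in range(n)]
        agree += paralogs_closer(quartet, columns)
    return flagged, agree / n_boot


# Toy quartets (hypothetical): identical paralogs vs. identical orthologs.
converted_like = {"P1": ["AAA"] * 5, "P2": ["AAA"] * 5,
                  "O1": ["AAC"] * 5, "O2": ["AAC"] * 5}
expected_like = {"P1": ["AAA"] * 5, "O1": ["AAA"] * 5,
                 "P2": ["CCC"] * 5, "O2": ["CCC"] * 5}
```

In the first toy quartet the paralogs are identical while both differ from their orthologs, so it is flagged in every replicate; in the second the orthologs are identical and the paralogs differ, so it is never flagged.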

We detected 4296 homoeologous pairs of genes, involving 8592 (20.6 % of) B. rapa genes and 8592 (24.7 % of) B. oleracea genes. Most of these reside in 23 large duplicated blocks in B. rapa (Fig. 11.1a) and 19 large duplicated blocks in B. oleracea (Fig. 11.1b), distributed throughout the chromosomes. In total, we found that ~8 % of duplicates (368 and 343) in B. rapa and B. oleracea have been affected by gene conversion (Table 11.1). The conversion tracts vary in size, ranging from a few base pairs to full gene lengths.
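The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that 368 and 343 count converted gene pairs (not individual genes) out of the 4296 homoeologous pairs, which is the reading consistent with the reported ~8 %.

```python
pairs = 4296                  # homoeologous gene pairs reported in the text
genes_in_pairs = 2 * pairs    # each pair contributes two genes

# Assumption: these counts are converted pairs, not individual genes.
converted_pairs = {"B. rapa": 368, "B. oleracea": 343}

# Percentage of homoeologous pairs affected by conversion in each species.
rates = {species: 100 * n / pairs for species, n in converted_pairs.items()}

for species, rate in rates.items():
    print(f"{species}: {rate:.1f} % of pairs converted")
```

Under this reading the rates are about 8.6 % for B. rapa and 8.0 % for B. oleracea, and the 4296 pairs correspond to the 8592 genes cited for each species.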

Fig. 11.1 Whole-genome triplication and gene conversion. Distributions of duplicated genes (a, c) and those converted (b, d) in B. rapa and B. oleracea, respectively

Table 11.1 Gene conversion on chromosomes in B. rapa and B. oleracea

11.2.3 Unbalanced Gene Conversion Among Chromosomes

Different chromosomes have been unequally affected by gene conversion (Fig. 11.1c, d). In B. rapa, the most affected chromosomes are Br01, Br04, and Br05, with >10 % of paralogs affected, whereas in B. oleracea, the most affected chromosomes are Bo01 and Bo06, also with >10 % of paralogs affected. In contrast, no paralogous pair between Br09 and Br01, Bo08 or Bo09, Bo04 and Bo09 has been affected. Genes residing on larger chromosomes with more colinear homoeologs are more likely to be affected by conversion (Fig. 11.2). This suggests that larger duplicated regions on these chromosomes, by preserving more DNA homology, may facilitate homoeologous recombination.

Fig. 11.2 Correlation of gene conversion and duplicated block size. a B. rapa. b B. oleracea
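A correlation of this kind could be quantified with a Pearson coefficient, as sketched below. The choice of statistic is an assumption, since the text does not state which measure was used, and the numbers are made up purely to show the calculation; they are not taken from Table 11.1 or Fig. 11.2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only (not data from this chapter):
# number of colinear genes per block and converted pairs in that block.
colinear_genes = [100, 200, 300, 400]
converted_pairs = [5, 12, 20, 35]
r = pearson_r(colinear_genes, converted_pairs)
```

A value of r close to 1 would indicate that blocks with more colinear genes tend to contain more converted pairs.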

11.2.4 Gene Conversion Occurs Correspondingly in Two Brassica Species

Gene conversion often occurs in both Brassica species in a corresponding manner; that is, if a duplicated gene pair was affected by gene conversion in one species, so was its counterpart in the other species. Most homoeologous quartets (~92 %) were found to be converted in both species. Only 53, or about one-sixth, of homoeologous gene quartets showed evidence of independent concerted evolution, i.e., were inferred to have experienced independent conversion events in B. rapa or B. oleracea. Thus, about five-sixths of the events are likely to have occurred shortly after the triplication but before the lineages diverged, or to have co-occurred independently in each lineage.

11.2.5 Biased Gene Conversion Among Different Subgenomes

A previous publication (Wang et al. 2011a, b, c) revealed the three subgenomes that formed the present genomes of B. rapa and B. oleracea; here we characterize gene conversion between different subgenomes. Our analysis shows a bias in the occurrence of gene conversion among subgenomes. About 40–44 % of conversion events involved paralogs on subgenomes A and B in both species, substantially more than between other subgenome combinations (Table 11.2). However, this excess parallels gene numbers in the respective subgenomes, and the percentages of converted paralogs are similar for any two subgenomes. This suggests that gene conversion is related to homologous gene density, which determines the likelihood that illegitimate recombination occurs.

Table 11.2 Gene conversion in subgenomes in B. rapa and B. oleracea
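The distinction drawn above, between a subgenome pair's share of all conversion events and its conversion rate per paralogous pair, can be illustrated with a short sketch. The counts are invented for illustration and are not the values of Table 11.2; the function name shares_and_rates is hypothetical.

```python
def shares_and_rates(converted, total):
    """converted, total: dicts mapping a subgenome combination to the
    number of converted paralogous pairs and of all paralogous pairs.

    Returns (shares, rates): each combination's fraction of all conversion
    events, and its fraction of paralogous pairs that were converted.
    """
    all_converted = sum(converted.values())
    shares = {k: converted[k] / all_converted for k in converted}
    rates = {k: converted[k] / total[k] for k in converted}
    return shares, rates


# Invented counts, for illustration only (not from Table 11.2).
total = {"A-B": 2000, "A-C": 1500, "B-C": 1000}
converted = {"A-B": 160, "A-C": 120, "B-C": 80}
shares, rates = shares_and_rates(converted, total)
```

In this toy case the A-B combination accounts for the largest share of events simply because it has the most paralogous pairs, while the per-pair conversion rate is the same for all three combinations.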

11.3 Gene Conversion and Genome Stability

11.3.1 Increased Genome Instability and Complexity After Polyploidization

Widespread and recursive polyploidizations have affected all flowering and seed plants, and may have been an important driving force of their evolution, especially the likely rapid divergence and speciation of lineages to form large groups of related species (Bowers et al. 2003; Ziolkowski et al. 2006; Soltis and Soltis 2009; Jiao et al. 2012). This should be a direct result of genome instability after whole-genome duplication/triplication (Marfil et al. 2006; Mazowita et al. 2006). These large-scale genome addition events duplicated or triplicated DNA content overnight, adding much genomic complexity and increasing interactions between chromosomes. Such interactions may be physical, through DNA binding, knotting, splitting, and breaking, or genetic, through pairing, clustering, recombining, and segregating. A drive to recover diploid heredity may be the paramount force behind the subsequent changes. The majority of land plants favor diploid heredity and are adapted to complete a cycle of meiosis each year. Increased complexity has many genetic consequences, the first of which is genomic instability.

Genomic instability is often accompanied by widespread gene losses, chromosomal rearrangement, and recombination between homo(eo)logous chromosomes or chromosomal segments (Wang et al. 2005; Feldman et al. 2012). Once a polyploid recovers diploid heredity, with one-to-one pairing of homologous chromosomes rather than pairing among multiple homo(eo)logous chromosomes, it may eventually regain much of its genomic stability. However, small-scale chromosomal rearrangements may continue to occur. The analysis of grass genomes indicated that the majority of genomic changes occurred before the divergence of major grass clades. For example, after the divergence of rice and sorghum, only ~2–3 % of genes were lost, resulting in minimal erosion of gene colinearity along orthologous chromosomes, in contrast to the loss of at least 65 % of genes duplicated in their common ancestor (Wang et al. 2005; Paterson et al. 2009). Likewise, the majority of chromosomal rearrangements occurred before their divergence, and only a few such rearrangements can be identified in the sorghum lineage after its split from rice (Murat et al. 2010).

11.3.2 Homoeologous Recombination Is a Driving Force for Genomic Evolution

Homoeologous recombination is also a manifestation of genomic instability, and can last much longer than the other changes discussed above. As a result of this kind of illegitimate recombination, gene conversion transfers genetic information in a unidirectional manner. According to proposed gene conversion mechanisms, it would increase DNA substitution rates, and may therefore act as a driving force of evolution (Chen et al. 2007; Wang and Paterson 2011). This has been supported by comparative analysis of grass genes (Wang et al. 2009; Wang and Paterson 2011). After major genomic changes have subsided, homoeologous recombination and gene conversion can still occur millions of years after the ancestral polyploidization (Wang et al. 2011a, b, c; Paterson et al. 2012). Evidence for this comes from the analysis of both monocot and dicot plants. A particularly striking finding involves genes at the very end of rice chromosomes 11 and 12 and their counterparts in other grasses (Wang et al. 2007; Jacquemin et al. 2009; Paterson et al. 2012).

11.3.3 Gene Conversion and Homoeologous Block Length

Here, we found that longer duplicated blocks (or blocks with larger numbers of genes) are associated with higher conversion rates, in agreement with previous findings in grasses. More colinear genes often mean higher DNA similarity between duplicated regions, which would increase the likelihood of homoeologous pairing. The chance of pairing is much lower between homoeologous than between homologous chromosomes. Once pairing occurs, however, it can have genetic consequences, such as relatively low-level genomic instability, DNA mutations, and conversion.

11.4 Conclusion

Here, by performing comparative genomic analysis, we characterized gene conversion in B. rapa and B. oleracea. Gene conversion resulting from homoeologous recombination is a long-lasting driving force of plant evolution. Widespread and recursive polyploidizations have played a pivotal role in the evolution, divergence, and speciation of land plants. After the genome shock (McClintock 1984) that often occurs soon after polyploidization, characterized by widespread gene losses and chromosomal rearrangements, has eased, genomes may recover much of their stability and return to diploid heredity. Though occurring at lower levels in later stages than soon after polyploidization, homoeologous recombination and gene conversion may last for a very long time, continuing to act as a driving force in genomic evolution and genetic innovation.