Introduction

Although sex chromosomes have evolved independently in diverse lineages of plants and animals, all sex chromosomes originate from autosomes and are shaped by a common set of evolutionary forces. The pivotal turning point in the evolution of sex chromosomes is the suppression of recombination in an autosomal region containing genes that control sex (Westergaard 1958; Bergero and Charlesworth 2009). Suppression of recombination fixes the sex determination loci, resulting in discrete sex types and the development of heteromorphic sex chromosomes. The absence of recombination in the Y (male heterogametic) or W (female heterogametic) sex chromosome relaxes purifying selection, subjecting the Y or W to the powerful degenerative effects of Muller’s ratchet (Muller 1964). The early stages of sex chromosome evolution are characterized by an increase in deleterious mutations and rapid expansion due to the accumulation of repetitive sequences (Bachtrog 2003; Bachtrog and Charlesworth 2002). Over time, deleterious mutations and repeat accumulation can lead to Y chromosome degeneration.

Repeat accumulation is a common feature of Y chromosomes in animals (Steinemann and Steinemann 1992; Erlandsson et al. 2000) and has recently been documented in several plants, including hemp, Cannabis sativa (Sakamoto et al. 2000); liverwort, Marchantia polymorpha (Okada et al. 2001); white campion, Silene latifolia (Grant et al. 1994); and papaya (Liu et al. 2004). The nascent sex chromosomes of papaya originated about 7 MYA (Wang et al. 2012) and are in the early stages of evolution, with a dramatic expansion in size of the male-specific and hermaphrodite-specific regions of the Y chromosome (MSY and HSY, respectively). The HSY is 8.1 Mbp, substantially larger than the corresponding 3.5 Mbp region of the X (Na et al. 2012). The HSY is largely repetitive, with 79.3 % repetitive elements compared with 67.2 % in the X (Wang et al. 2012). Ty3-gypsy retrotransposons constitute the bulk of these repeats and are the main driver of the expansion. Among the repetitive elements known to accumulate in plant sex chromosomes are organelle-derived DNA fragments.

Organelle-derived sequences, collectively referred to as nuclear organelle DNA (norgDNA), are abundant in sequenced plant genomes, and organelle-to-nucleus transfer is ongoing. Most norgDNA fragments are less than 1 kb long (Ricchetti et al. 1999; Mourier et al. 2001; Richly and Leister 2004), with several notable exceptions, including a 620 kb fragment of nuclear mitochondrial DNA (NUMT) on chromosome 2 in Arabidopsis (Lin et al. 1999; Stupar et al. 2001) and a 131 kb fragment of nuclear plastid DNA (NUPT) on chromosome 10 in rice (Rice Chromosome 10 Sequencing Consortium 2003). Early studies suggested that norgDNA is transferred to the nucleus via an RNA intermediate (Nugent and Palmer 1991), but more recent evidence suggests that most norgDNA integrates into the nuclear genome through non-homologous end-joining repair of double-strand breaks (NHEJ-DSB repair) (reviewed in Leister 2005). The turnover of organelle-derived fragments is high: an estimated 80 % of the NUPTs in the rice genome are lost within a million years of integration (Matsuo et al. 2005).

norgDNA has been found previously to accumulate in the Y chromosomes of Silene and liverwort (Kejnovsky et al. 2006; Yamato et al. 2007). Large fragments of NUPT have been found in sequenced bacterial artificial chromosomes (BAC) from Silene, but the largely unsequenced Y chromosome makes it impossible to accurately gauge the extent of NUPT accumulation (Kejnovsky et al. 2006). In contrast to Silene, the Y chromosome of liverwort contains hundreds of NUMT insertions, but only three NUPT insertions (Yamato et al. 2007).

The recently sequenced papaya X and HSY provide a unique opportunity to study the integration, shuffling, and fragmentation of norgDNA during the early stages of sex chromosome evolution. Here, we present evidence of a dramatic accumulation of norgDNA in the papaya HSY. Novel inserts absent from the X chromosome are abundant in the HSY and vary widely in estimated age, with several predating the evolution of the sex chromosomes. These results are consistent with the rapid expansion and accumulation of deleterious sequences expected under suppressed recombination in the HSY.

Results

NUPT fragments preferentially accumulate in the HSY

The papaya HSY, MSY, and X were recently physically mapped and sequenced using an iterative BAC-by-BAC approach, with individual BACs assembled into pseudomolecules of 8.1, 8.05, and 5.3 Mb for the HSY, MSY, and X, respectively (Na et al. 2012; Ming unpublished). Only 3.5 Mb of the X corresponds to regions homologous to the HSY and MSY; the remaining 1.8 Mb likely corresponds to the shared knob 1 structure, which is unsequenced in the MSY and HSY. The assembled sex chromosome sequences were scanned for norgDNA integrations using the papaya chloroplast and mitochondrial genomes.

NUPT integrations in the HSY are numerous: 174 insertions totaling 93,824 bp, or roughly 1.15 % of the HSY sequence (Table 1). These 93 kb of insertions cover 36 % of the papaya chloroplast genome (58,320 unique bases). Integrations range in size from 75 to 4,517 bp, with an average size of 481 bp. Nineteen insertions are larger than 1 kb, several of which contain multiple complete chloroplast genes. The NUPT content of the HSY (1.15 %) is four times the genome-wide average of 0.28 %. All but three NUPT integrations are shared between the MSY and HSY, and the three unshared fragments correspond to small gaps in the MSY sequence. Furthermore, NUPT fragments share 99.5–100 % sequence identity between the HSY and MSY, as expected given the overall sequence identity of 99.8 % between the two (Ming et al. unpublished). Because the HSY and MSY are nearly identical in sequence and norgDNA content, further analyses were conducted using only the HSY sequence.
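The coverage figures above are length ratios, but the count of unique chloroplast bases requires collapsing overlapping hits on chloroplast coordinates first; otherwise repeated copies of the same chloroplast region would be counted several times. The following is a minimal sketch, assuming hits are available as 1-based inclusive (start, end) coordinates on the chloroplast genome; the function names are hypothetical and not part of the original analysis.

```python
from typing import Iterable


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting 1-based inclusive intervals."""
    merged: list[tuple[int, int]] = []
    # Normalize reverse-strand hits (start > end) before sorting.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def unique_bases(intervals: Iterable[tuple[int, int]]) -> int:
    """Number of distinct positions covered by at least one hit."""
    return sum(end - start + 1 for start, end in merge_intervals(intervals))


def percent(part_bp: float, total_bp: float) -> float:
    """Share of a sequence, in percent, occupied by part_bp bases."""
    return 100.0 * part_bp / total_bp


# Using the totals reported in the text:
hsy_nupt_pct = percent(93_824, 8_100_000)   # about 1.16 %
fold_vs_genome = hsy_nupt_pct / 0.28        # about 4.1-fold
```

Summed hit lengths and unique chloroplast bases answer different questions (how much of the HSY is NUPT versus how much of the chloroplast genome is represented), which is why the text reports both 93 kb and 58,320 bp.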

Table 1 norgDNA statistics in the papaya HSY, MSY and X

In contrast to the HSY and MSY, the X chromosome contains only 40 chloroplast sequences totaling 9,546 bp, constituting 0.18 % of the total sequence, far less than in the genome as a whole or in the HSY and MSY. NUPT integrations in the X are also smaller, ranging from 75 to 814 bp (mean 180 bp). NUPTs share 84–100 % sequence identity with the papaya chloroplast genome, with an average identity of 92 % in both the HSY and the X. This variation in identity indicates that the fragments were inserted at different times, some predating the divergence of the X and HSY and others occurring within the last 100,000 years.

NUPTs are unevenly distributed in the HSY and the X. Sequence divergence identified three distinct regions in the papaya sex chromosomes: two evolutionary strata and a collinear region (Wang et al. 2012). The evolutionary strata correspond to large-scale inversions in the HSY, whereas the collinear region has conserved gene content and structure in the HSY and X. Most of the norgDNA in the HSY is located in the two evolutionary strata, and comparatively little is found in the collinear region: NUPT fragments represent 1.2 and 1.5 % of the total sequence in inversions 1 and 2, respectively, whereas only 0.1 % of the collinear region is NUPT (Table 2). NUMT integrations in the HSY show a similar pattern, constituting 0.26 and 0.12 % of inversions 1 and 2, respectively, and only 0.07 % of the collinear region. This contrasts with the X chromosome, where NUPT fragments are largely found in inversion 2 (0.52 % of the sequence), with comparatively little in inversion 1 and the collinear region (0.15 and 0.14 %, respectively) (Table 2). NUMT fragments are evenly distributed on the X chromosome, representing 0.14, 0.18, and 0.10 % of inversion 1, inversion 2, and the collinear region, respectively.
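The per-region percentages in Table 2 are coverage densities: the number of norgDNA bases falling inside a region divided by that region's length. A minimal sketch, assuming each stratum is represented as a single contiguous 1-based interval on the pseudomolecule and that overlapping insertions have already been collapsed; the function name, region labels, and coordinates are hypothetical.

```python
def region_density(insertions: list[tuple[int, int]],
                   regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Percent of each region's length covered by (non-overlapping) insertions."""
    result: dict[str, float] = {}
    for name, (r_start, r_end) in regions.items():
        covered = 0
        for start, end in insertions:
            # Clip the insertion to the region; count only the overlapping part.
            lo, hi = max(start, r_start), min(end, r_end)
            if lo <= hi:
                covered += hi - lo + 1
        result[name] = 100.0 * covered / (r_end - r_start + 1)
    return result


# Toy coordinates: an insertion straddling a boundary is split between regions.
toy = region_density([(10, 19), (95, 104)],
                     {"inversion_1": (1, 100), "collinear": (101, 200)})
```

Clipping matters because an insertion near a stratum boundary would otherwise be credited entirely to one region.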

Table 2 norgDNA in the evolutionary strata and collinear regions

Twenty-two (58 %) of the NUPT fragments in the X are conserved in the HSY. The positions of shared NUPTs mirror the large-scale inversions, translocations, and rearrangements between the X and HSY inferred from genic regions (Fig. 1). The remaining 16 unshared fragments in the X were either lost from the HSY or incorporated after divergence. Only 11 % of the NUPTs in the HSY are conserved in the X, suggesting that the vast majority integrated after the X and HSY diverged about 7 MYA. NUPT fragments are unevenly dispersed in the HSY, forming distinct clusters, but are relatively uniformly distributed in the X (Fig. 1).

Fig. 1

Distribution and conservation of norgDNA fragments in the HSY and X. Knob structures in the HSY are shown in gray on the outermost ring. A Heatmap of repeat element composition; gene blocks display lower repeat content. B, C Sites of mitochondrial and chloroplast insertions, respectively. D Paired genes (gray) and conserved mitochondrial (green) and chloroplast (blue) insertions, linked by position

The HSY contains a lower proportion of NUMT sequence than the genome-wide average, with 16,458 bp (0.11 %) in the HSY compared with 858,190 bp (0.31 %) in the genome. All of the NUMT fragments are shared between the HSY and MSY, totaling 16,234 bp (0.11 %). The X region also contains less NUMT sequence than the papaya genome as a whole, with 6,558 bp (0.12 %). NUMT integrations are larger in the HSY, ranging in size from 75 to 2,965 bp, compared with 75–481 bp in the X. Like the chloroplast integrations, NUMTs are evenly dispersed across the X and HSY, with only five integrations shared between the X and HSY (Fig. 1).

Dramatic amplification of rsp15 in the HSY

Strikingly, a chloroplast fragment containing the gene for ribosomal protein 15 (rsp15) is present in 23 copies in the HSY. The fragment is 501 bp long, with noncoding chloroplast sequence flanking both sides of the complete 276 bp rsp15 gene. The average estimated age of the rsp15 insertions is 7 million years, with individual estimates ranging from 5.8 to 8.9 MYA (supplemental Table 1), around the time the X and HSY diverged. rsp15 fragments are dispersed evenly throughout the HSY, and no copies are found in the X. A BLAST search against the papaya draft genome also revealed no hits, suggesting that the rsp15 fragment integrated into the proto-HSY and proliferated there, co-amplifying with surrounding retrotransposons. Intact Ty3-gypsy elements are found less than 1.5 kb upstream of most rsp15 sequences, suggesting retrotransposon-mediated duplication.
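Whether a retroelement lies "upstream" depends on the orientation of each rsp15 copy: on the minus strand, upstream means higher coordinates. A minimal sketch of the 1.5 kb proximity check, assuming copies are given as (start, end, strand) and Ty3-gypsy annotations as (start, end), all 1-based inclusive on the HSY pseudomolecule; the function names are hypothetical, and the orientation of the retroelement itself is ignored here.

```python
from typing import Literal

Strand = Literal["+", "-"]


def has_upstream_element(copy: tuple[int, int, Strand],
                         elements: list[tuple[int, int]],
                         max_gap: int = 1500) -> bool:
    """True if any element ends fewer than max_gap bases upstream of the copy."""
    start, end, strand = copy
    for te_start, te_end in elements:
        if strand == "+":
            gap = start - te_end - 1   # bases between element end and copy start
        elif strand == "-":
            gap = te_start - end - 1   # upstream lies at higher coordinates
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        # Negative gaps mean the element overlaps or lies downstream.
        if 0 <= gap < max_gap:
            return True
    return False


def fraction_with_upstream(copies: list[tuple[int, int, Strand]],
                           elements: list[tuple[int, int]],
                           max_gap: int = 1500) -> float:
    """Fraction of copies that have an element within max_gap upstream."""
    hits = sum(has_upstream_element(c, elements, max_gap) for c in copies)
    return hits / len(copies)
```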

Most of the norgDNAs integrated after the origin of sex chromosomes

The estimated divergence time is a valuable tool for discerning the origin of norgDNA in the X and HSY. For a fragment not shared by the X and HSY, a divergence time predating the suppression of recombination between them suggests that the fragment originated elsewhere in the nuclear genome and was later retrotransposed to the sex chromosomes. norgDNA fragments younger than 7 million years likely represent direct transfers from the organelles that occurred after the sex chromosomes originated.
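The reasoning above amounts to a simple decision rule combining conservation status with estimated age. A minimal sketch, assuming a hard 7 MY cutoff and a boolean flag for whether the fragment is shared by the X and HSY; the labels and function names are hypothetical, and a hard cutoff ignores the clock uncertainty discussed below.

```python
SPLIT_MY = 7.0  # approximate X/HSY divergence (million years) given in the text


def classify_origin(age_my: float, shared_with_x: bool,
                    split_my: float = SPLIT_MY) -> str:
    """Assign a norgDNA fragment to a hypothetical origin class."""
    if shared_with_x:
        # Present in both X and HSY: inserted before the two diverged.
        return "conserved"
    if age_my < split_my:
        # Unshared and younger than the sex chromosomes: direct organelle transfer.
        return "direct_transfer"
    # Unshared yet older than the sex chromosomes: likely moved in from elsewhere.
    return "relocated"


def percent_younger(ages_my: list[float], split_my: float = SPLIT_MY) -> float:
    """Percentage of fragments with an estimated age below split_my."""
    if not ages_my:
        raise ValueError("no ages given")
    return 100.0 * sum(age < split_my for age in ages_my) / len(ages_my)
```

A function like percent_younger is what produces summary figures of the form "70 % of NUPT fragments integrated after the sex chromosomes emerged".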

Molecular clock estimates are useful for dating norgDNA insertions, but they carry varying levels of uncertainty depending on the nuclear substitution rate used (Bromham and Penny 2003). Substitution rate estimates vary substantially among plants, largely because of a lack of fossil calibration points and lineage-specific rate variation (Koch et al. 2000; Zhang et al. 2002). The closest available nuclear substitution rate is that of Arabidopsis, 7 × 10⁻⁹ substitutions per synonymous site per year (Ossowski et al. 2010). Arabidopsis belongs to the Brassicaceae, which, like the papaya family Caricaceae, is in the order Brassicales, and its rate was used previously to estimate the divergence time of the papaya sex chromosomes (Wang et al. 2012).

Dating is also complicated by the varied origin of norgDNA. Nucleotide differences between norgDNA and the modern organelle genomes can be divided into two types based on their origin. Type 1 substitutions represent differences between the ancient and modern organelle genomes and are difficult to identify. Type 2 substitutions occurred in the norgDNA after integration into the nucleus and are therefore more suitable for estimating divergence time. The only way to reliably distinguish type 1 from type 2 substitutions is by comparing the nearly identical inverted repeat (IR) regions of the chloroplast genome: substitutions common to both IR segments of a NUPT likely represent differences between the ancient and modern chloroplast, because the chance of the same base mutating independently in both segments after integration is negligible. This distinction cannot be made in the X and HSY, because no fragment contains both IR regions. Thus, both type 1 and type 2 substitutions were used to estimate divergence.
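For a hypothetical NUPT that did span both IR copies, the classification described above could be applied position by position. A minimal sketch, assuming the modern IR sequence and the two nuclear IR-derived segments are already aligned to equal length; the function name is hypothetical, and positions where both segments differ from the reference and from each other are reported as ambiguous rather than forced into either type.

```python
def classify_ir_substitutions(ref: str, copy_a: str, copy_b: str) -> dict[str, int]:
    """Count type 1, type 2, and ambiguous differences at aligned IR positions."""
    if not (len(ref) == len(copy_a) == len(copy_b)):
        raise ValueError("sequences must be pre-aligned to equal length")
    counts = {"type1": 0, "type2": 0, "ambiguous": 0}
    for r, a, b in zip(ref.upper(), copy_a.upper(), copy_b.upper()):
        if a == r and b == r:
            continue  # no difference at this position
        if a == b:
            # Both nuclear IR segments carry the same change: it predates
            # integration, i.e. an ancient-vs-modern chloroplast difference.
            counts["type1"] += 1
        elif a == r or b == r:
            # Only one segment changed: a post-integration (nuclear) substitution.
            counts["type2"] += 1
        else:
            counts["ambiguous"] += 1
    return counts


result = classify_ir_substitutions("ACGTACGTAC", "ACGAACGTTC", "ACGAACCTGC")
```

Only the type 2 count would then feed the molecular clock; without such fragments, both types are pooled, which tends to make the estimated ages older than the true insertion times.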

Divergence times of norgDNA range widely, from 0 (no synonymous substitutions) to over 14 million years, with no clear pattern in the X or HSY (Fig. 2, supplemental Table 1). Seventy percent of the NUPT fragments and 63 % of the NUMT fragments in the HSY integrated after the sex chromosomes emerged. A similar trend is seen in the X, where 66 % of the NUPT and 57 % of the NUMT fragments integrated after the X and Y diverged. This suggests that most fragments arose directly from the organelle genomes, with the remaining minority translocated from autosomes.

Fig. 2

Estimated divergence times of organelle DNA insertions. Divergence times were calculated from nucleotide differences between the norgDNA and the modern papaya chloroplast and mitochondrial genomes. A detailed list of divergence times, fragment sizes, and physical locations is provided in supplemental Table 1

Discussion

The transfer of organelle sequences to the nuclear genome is an ongoing process in plants (Timmis et al. 2004), with evidence of recent large insertions in rice and Arabidopsis (Lin et al. 1999; Rice Chromosome 10 Sequencing Consortium 2003). Experimental data from tobacco indicate that NUPTs integrate into the nuclear genome at a remarkably high rate, suggesting that an elimination mechanism counterbalances this accumulation (Huang et al. 2003). Such a mechanism is clearly at work in rice, where NUPT integrations are quickly shuffled, translocated, and eliminated; 80 % of the insertions are lost within one million years (Matsuo et al. 2005). But how is the elimination process affected when recombination is suppressed and purifying selection is relaxed, as in sex chromosomes?

The papaya HSY and MSY, likely as a result of recombination suppression, have accumulated NUPTs to a level four times the genome-wide average and nearly 12 times that of the corresponding X region. Furthermore, little norgDNA is conserved between the X and HSY: only 11 and 12 % of the chloroplast and mitochondrial fragments, respectively. The remaining norgDNA either integrated after the inception of the sex chromosomes or existed in the proto-X and proto-Y and was eliminated from one of them. A similar accumulation of norgDNA has been reported for NUMTs in the human Y chromosome (Ricchetti et al. 2004) and NUPTs in the Y chromosome of Silene (Kejnovsky et al. 2006). The accumulation of NUPTs in papaya has contributed to the dramatic expansion and repeat accumulation seen in the HSY and is likely indicative of the early stages of Y chromosome degeneration.

NUPTs and NUMTs presumably integrate into the nuclear genome by the same mechanisms and are about equally prevalent in the papaya genome, each constituting roughly 0.3 % of the genome sequence. If previously integrated organelle DNA had been massively translocated from the autosomes to the HSY, NUMTs would be expected to be enriched alongside NUPTs; the absence of NUMT enrichment in the HSY argues against this possibility. The fourfold enrichment of chloroplast DNA in the HSY therefore arose mostly after the divergence of the sex chromosomes, as supported by the short divergence times (less than 7 MY) of most chloroplast fragments.

Organelle fragments preferentially integrate in the pericentric regions of chromosomes in rice (Matsuo et al. 2005). The HSY and its X counterpart are pericentric and may therefore have had a higher organelle DNA content before the sex chromosomes diverged. However, the X chromosome has a lower organelle DNA content than the genome average, the opposite of what would be expected. The X chromosome recombines in female meiosis but not in male meiosis, and selection on the X is less efficient than on the autosomes. Nevertheless, lower DNA sequence diversity was observed in the papaya X chromosome than in the genome as a whole (Weingartner and Moore 2012), and the same mechanisms might also explain its lower organelle DNA content.

Surprisingly, NUPTs are 12 times more numerous than NUMTs in the HSY. This contrasts sharply with the Y chromosome of liverwort, which contains hundreds of NUMTs but only three plastid insertions (Yamato et al. 2007). In rice, NUPTs preferentially integrate in the pericentric regions of chromosomes (Matsuo et al. 2005), possibly explaining the abundance of chloroplast sequences in the pericentric HSY. This does not, however, explain the scarcity of chloroplast integrations in the X. Proximity to the centromere may help NUPTs accumulate, while the lack of recombination and purifying selection prevents them from being eliminated from the HSY. Sequencing of additional plant sex chromosomes will shed more light on this question.

norgDNA in the papaya sex chromosomes has diverse origins, as shown by the range of divergence times and the close proximity of fragments to retroelements. norgDNA fragments can be divided into two classes based on their origin. The first class consists of novel insertions originating directly from the organelles; these fragments must be younger than the sex chromosomes (unless they are conserved between the HSY and X). The second class consists of fragments that first integrated into an autosome and were transposed to the sex chromosomes later; any non-conserved fragment predating sex chromosome inception likely belongs to this class. Direct evidence of retrotransposon-mediated duplication comes from the 23 nearly identical 501 bp rsp15 fragments scattered throughout the HSY. These fragments share at least 98 % identity with each other but only about 90 % identity with the chloroplast genome. The rsp15 fragment likely integrated from the chloroplast into the proto-Y near an active Ty3-gypsy retrotransposon (Fig. 3). Read-through transcription of the active retrotransposon then co-amplified the rsp15 fragment at least 23 times throughout the HSY, making it one of the most abundant repeats in the HSY. Intact retrotransposons are found less than 1.5 kb upstream of most rsp15 fragments; the retrotransposons at the remaining sites have likely been lost or fragmented. Ongoing repeat integrations, translocations, and inversions quickly fragment repeats in the HSY, complicating their identification and annotation. This case provides direct evidence of retrotransposon-mediated gene duplication and repeat accumulation in Y chromosome evolution.

Fig. 3

Mechanistic model of retrotransposon-mediated duplication of NUPTs in the HSY. a A 501 bp NUPT containing rsp15 integrated into the papaya autosome or proto-HSY, and a Ty3-gypsy retrotransposon inserted upstream of the rsp15 fragment. b Read-through transcription of the gypsy element co-amplified the NUPT, followed by reverse transcription and second-strand cDNA synthesis. c The Ty3-gypsy element reintegrated into the HSY. d The process repeated, amplifying the rsp15 fragment 23 times

The young sex chromosomes of papaya serve as a useful model for characterizing the early stages of sex chromosome evolution. Mammalian sex chromosomes are ancient and highly divergent, making it impossible to trace the events that led to their inception and shaped their divergence. The papaya HSY has ballooned to 8.1 Mb, more than twice the size of the corresponding 3.5 Mb X region, and already shows early signs of degeneration, with dramatic changes in gene content and a higher proportion of pseudogenes (Wang et al. 2012). Because recombination is suppressed, Y chromosomes experience relaxed purifying selection and begin to accumulate deleterious mutations soon after they arise. The high repeat content and the organelle DNA accumulation reported here are likely contributing factors to the degeneration and dramatic expansion seen in the HSY. norgDNA has a high turnover rate in plant genomes (Huang et al. 2003), yet it persists and is widely dispersed in the HSY, consistent with a role for recombination in shuffling and eliminating promiscuous DNA.

Methods

Identification of organelle-derived sequences

The BLAST algorithm (Altschul et al. 1990) was used to search the recently sequenced papaya HSY, MSY, and X for NUMT and NUPT integrations. Complete sequences of the papaya (Carica papaya) chloroplast and mitochondrial genomes were retrieved from GenBank (accession numbers EU431223 for the chloroplast genome and EU431224 for the mitochondrial genome). norgDNA matches longer than 75 bp with e-values less than 1e-10 and >80 % identity were included in the analyses. Results were manually inspected to remove any overlap or redundancy. The abundance, distribution, and conservation of norgDNA were plotted using Circos software (Krzywinski et al. 2009).
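The filtering step can be expressed over BLAST+ tabular output (-outfmt 6, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). A minimal sketch, assuming the sex chromosome sequence is the query; the greedy "keep the highest-scoring hit among overlapping query intervals" rule is a swapped-in automatic stand-in for the manual redundancy check described above, and the function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    length: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse the 12 default columns of BLAST+ -outfmt 6 output."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        yield Hit(row[0], row[1], float(row[2]), int(row[3]),
                  int(row[6]), int(row[7]), int(row[8]), int(row[9]),
                  float(row[10]), float(row[11]))


def filter_hits(hits: Iterable[Hit], min_len: int = 75,
                max_evalue: float = 1e-10, min_pident: float = 80.0) -> list[Hit]:
    # Thresholds from the Methods; the length cutoff is treated as inclusive
    # here because the reported fragment sizes start at 75 bp (an interpretation).
    return [h for h in hits
            if h.length >= min_len and h.evalue < max_evalue and h.pident > min_pident]


def _span(h: Hit) -> tuple[int, int]:
    return min(h.qstart, h.qend), max(h.qstart, h.qend)


def remove_overlaps(hits: list[Hit]) -> list[Hit]:
    """Greedily keep the best-scoring hit among overlapping query intervals."""
    kept: list[Hit] = []
    for h in sorted(hits, key=lambda h: h.bitscore, reverse=True):
        lo, hi = _span(h)
        clash = any(k.qseqid == h.qseqid and lo <= _span(k)[1] and _span(k)[0] <= hi
                    for k in kept)
        if not clash:
            kept.append(h)
    return sorted(kept, key=lambda h: (h.qseqid, _span(h)[0]))
```

Applied to a BLAST+ table, remove_overlaps(filter_hits(parse_outfmt6(lines))) yields one non-redundant list of candidate norgDNA fragments per query sequence.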

Estimated divergence times of norgDNA

Substitutions per nucleotide site (K) between the norgDNA fragments and the organelle genomes were calculated from the BLAST alignments and corrected using the one-parameter method of Jukes and Cantor (1969). As described in the Results, nucleotide differences between norgDNA and the modern organelle genomes are of two types: type 1 substitutions represent differences between the ancient and modern chloroplast, and type 2 substitutions occurred in the norgDNA after integration into the nucleus. The two types can only be reliably distinguished by comparing the inverted repeat regions: substitutions common to both IR segments likely represent differences between the ancient and modern chloroplast, because the chance of the same base mutating independently in both segments is negligible. This distinction cannot be made in the X and HSY, because no fragment contains both IR regions, so both type 1 and type 2 substitutions were used to estimate divergence. Divergence times were calculated from K following Li (1997), with a substitution rate of 7 × 10⁻⁹ substitutions per synonymous site per year (Ossowski et al. 2010).
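The two steps above can be sketched as follows: the Jukes–Cantor correction K = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites, followed by a clock estimate. This sketch assumes the standard two-lineage form T = K/(2r) from Li (1997); the text does not state which form was applied, and using T = K/r instead would double every age. Applying a synonymous-site rate to all aligned sites is also an approximation. Function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor (1969) distance from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def divergence_time_my(p: float, rate: float = 7e-9) -> float:
    """Divergence time in million years, assuming T = K / (2r)."""
    k = jukes_cantor(p)
    return k / (2.0 * rate) / 1e6


# About 90 % identity to the organelle genome (p = 0.10):
t_90 = divergence_time_my(0.10)   # roughly 7.7 MY under these assumptions
```

Under these assumptions, fragments with about 90 % identity date to roughly the time of X/HSY divergence, which is the order of magnitude reported for the rsp15 copies.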