Introduction

Sex can be determined by heteromorphic sex chromosomes that differ in their size, morphology, and gene content (Charlesworth 1991; Bachtrog 2013). Ancient sex-specific W or Y chromosomes (proto-sex chromosomes), such as those found in fish evolutionary lineages, have usually lost most functional genes owing to a loss of recombination with their former homologs, the Z or X chromosomes (Charlesworth et al. 2005; Steinemann and Steinemann 2005). The first step in chromosome differentiation is recombination suppression, which most likely evolved to prevent recombination between sex-determining genes, but the mechanisms of and reasons why this process occurs remain unclear (Charlesworth 1991; Bergero and Charlesworth 2009).

At the origin of heteromorphic sex chromosomes, mutations that are beneficial to one sex but detrimental to the other provide the selective force to suppress recombination between nascent sex chromosomes (Charlesworth 1996; Bachtrog 2013). The heterogametic sex tends to establish sexually antagonistic mutations in a population due to their beneficial effects, and the restriction of recombination between the proto-X (or Z) and proto-Y (or W) chromosomes may be a consequence of selection favoring genetic linkage between sexually antagonistic mutations and the sex-determining region (for a review, see Bachtrog 2013).

Recent “omics” studies have demonstrated that several genes distributed across the whole genome contribute to male and female differentiation (Teaniniuraitemoana et al. 2014; Chalopin et al. 2015; Chen et al. 2015). Moreover, the evolutionary forces that connect sexual dimorphism and the sex chromosomes are far more complicated than current theory allows. In the Mexican axolotl Ambystoma mexicanum, there is a minuscule difference between the sex chromosomes resulting from W-specific sequences (Keinath et al. 2018). The corresponding region contains a duplicated copy of the ATRX gene that appears to be a strong candidate for the primary sex-determining locus (Keinath et al. 2018). The subsequent steps of chromosomal differentiation can involve mechanisms, such as chromosomal rearrangements and the accumulation of repetitive DNA sequences, that can trigger genetic degeneration (Charlesworth et al. 2005; Steinemann and Steinemann 2005; Bachtrog 2013). Y- or W-originated chromosomes present non-recombining portions corresponding to male- or female-specific regions, respectively, and the recombinant portion is referred to as the “pseudo-autosomal region” (PAR) (Bergero and Charlesworth 2009).

Repetitive DNAs are sequences that have been studied in more detail in recent decades (Marino-Ramirez et al. 2005; Feschotte 2008; Kapusta et al. 2013; Schemberger et al. 2016). They constitute the major fraction of genomes and can be classified as being dispersed (transposable elements—TEs—repetitions that occur at random positions in the genome) or in tandem (repetitions that are directly adjacent to each other) (Charlesworth et al. 1994). Several studies have described the role of repetitive DNAs in the structure, function, and evolution of genomes (Marino-Ramirez et al. 2005; Feschotte 2008; Kapusta et al. 2013) and in triggering the origination of a non-recombinant region between sex chromosomes (Steinemann and Steinemann 2005). In medaka fish, the insertion of a TE into the regulatory region of the dmrt1bY gene on the sex chromosome rewired the gene regulatory network cascades of sex determination (Herpin et al. 2010).

An interesting model for investigating sex chromosome origins and evolution is provided by the Parodontidae Neotropical fish group, composed of 32 species arranged into three genera, Parodon, Apareiodon, and Saccodon. Some of these species are cytogenetically characterized by heteromorphic W sex chromosomes in different stages of evolutionary differentiation (Bellafronte et al. 2011; Schemberger et al. 2011). There are species without evidence of sex chromosomes, species presenting proto-sex chromosomes, species possessing a ZZ/ZW sex chromosome system ranging from small to large W chromosomes (e.g., Apareiodon sp., an undescribed species related to Apareiodon ibitiensis), and species presenting a multiple ZZ/ZW1W2 sex chromosome system (Vicari et al. 2006; Schemberger et al. 2011; do Nascimento et al. 2018).

Chromosomal studies in Parodontidae have focused on the mapping of repetitive DNAs in the karyotype and demonstrated enrichment of repetitive elements on the W chromosome (Schemberger et al. 2014, 2016; Ziemniczak et al. 2014). The recent advances in large-scale DNA sequencing have allowed comparative genomic analyses of repeated DNAs in ways that were never considered before (Treangen and Salzberg 2012). Therefore, we conducted whole-genome repetitive DNA analysis between males and females of Apareiodon sp. (ZW) to reveal the composition and evolutionary history of the W sex chromosome. Our study aimed to integrate genomic analyses and cytogenetic data to elucidate the role of repetitive DNAs in W sex chromosome differentiation.

Materials and methods

DNA extraction, next-generation sequencing, and genome assembly

DNA was isolated from the livers of two males (ZZ chromosomes) and two females (ZW chromosomes) of Apareiodon sp. (Verde River, Paraná State, Brazil; − 25°04′35″ and − 50°04′03″) using the illustra tissue and cells genomicPrep Mini Spin Kit (GE Healthcare, Chicago, Illinois, USA) following the manufacturer’s protocol. Two independent DNA libraries were sequenced on the Illumina HiSeq platform (100-base paired-end reads) for each sex (male and female) and filtered using FASTX-Toolkit software. Reads showing ≥ 90% of sequenced bases with Phred scores over 30 were selected. The genome was assembled using Velvet software (Zerbino and Birney 2008) due to its good performance for short-read datasets, and a 1.2 Gb total assembly length was generated (as expected for the Parodontidae genome). The statistics of genome assembly (scaffold and contig evaluations) were generated using the Perl script assemblathon_stats.pl (Bradnam et al. 2013). Additionally, genome data quality control was carried out with BUSCO v.3; the source code is available through the GitLab project, https://gitlab.com/ezlab/busco (Waterhouse et al. 2017).
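The read-selection criterion described above (at least 90% of bases reaching the Phred threshold) can be illustrated with a minimal Python sketch. This is not the FASTX-Toolkit implementation: the function names are hypothetical, a Phred+33 quality encoding is assumed, and the threshold of 30 is treated as inclusive, which is also an assumption.

```python
def passes_quality_filter(quality_line, min_q=30, min_fraction=0.90, offset=33):
    """Return True if enough bases of one FASTQ quality string reach min_q.

    Assumes Phred+33 encoding; min_q is treated as inclusive (an assumption).
    """
    if not quality_line:
        return False
    scores = [ord(ch) - offset for ch in quality_line]
    good = sum(1 for q in scores if q >= min_q)
    return good / len(scores) >= min_fraction


def filter_fastq_records(lines, **kwargs):
    """Yield 4-line FASTQ records whose quality string passes the filter."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if passes_quality_filter(qual, **kwargs):
            yield header, seq, plus, qual


# Toy records for illustration only: 'I' encodes Q40 and '#' encodes Q2.
toy_fastq = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIII#",  # 9 of 10 bases pass: kept
    "@read2", "ACGTACGTAC", "+", "IIIIIIII##",  # 8 of 10 bases pass: dropped
]
kept = [record[0] for record in filter_fastq_records(toy_fastq)]
print(kept)
```

With the toy input, only the first record is retained, because exactly 90% of its bases meet the threshold.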

Repeat identification, landscape pipelines, and TE insertion time estimation

The repetitive sequences of males and females were annotated using RepeatMasker (Smit et al. 2013–2015). For repetitive annotation, a custom repeat library was obtained with RepeatModeler (Smit and Hubley 2008–2015) using the default parameters. The custom library was merged with Repbase Update 20181026. RepeatMasker was run with the “slow” (-s), “library” (-lib), and “align” (-a) parameters. To summarize the RepeatMasker results, we used buildSummary.pl. The output files from RepeatMasker were used as the input for the calcdivergencefromalign.pl and repeatlandscape.pl scripts to calculate Kimura divergence values and plot the repeat landscape.
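For readers unfamiliar with the Kimura divergence values mentioned above, the standard Kimura two-parameter distance can be sketched as follows. This is a textbook illustration with a hypothetical function name; the RepeatMasker utilities may apply further adjustments that are not reproduced here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError when divergence is saturated.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment: one A->G transition in ten sites.
print(kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

The distance is undefined for very divergent pairs (when the logarithm argument is not positive), which is why the sketch lets the math functions raise an error in that case.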

Insertion ages were estimated from the nucleotide distances between all copies of each TE, measured using the Kimura two-parameter method according to RepeatMasker (previously described). The insertion time (T) was calculated using the formula T = k/2r (Li et al. 2019), where k is the Kimura distance and r is an average substitution rate of 10⁻⁸ substitutions per synonymous site per year, as estimated for TEs (Arkhipova and Meselson 2005).
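As a worked illustration of the formula T = k/2r, the short Python sketch below converts a Kimura distance into an approximate insertion age. The function name is hypothetical, and the conversion of the distance from a percentage (as plotted in repeat landscapes) to a proportion is an assumption of this sketch.

```python
def insertion_time_years(kimura_percent, rate=1e-8):
    """Estimate insertion time T = k / (2r) in years.

    kimura_percent: Kimura distance expressed as a percentage (assumed).
    rate: substitutions per site per year (1e-8, as stated in the text).
    """
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    k = kimura_percent / 100.0  # convert percent divergence to a proportion
    return k / (2.0 * rate)


# Example: a Kimura distance of 40 corresponds to roughly 20 million years.
age_my = insertion_time_years(40) / 1e6
print(f"{age_my:.1f} my")
```

Under these assumptions, each unit of Kimura distance corresponds to about half a million years.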

Repetitive DNA analyses using RepeatExplorer

RepeatExplorer (Novak et al. 2010) was used to comparatively analyze repetitive DNAs between males and females. This software performs graph-based clustering of raw Illumina reads and uses a small sample of sequenced reads (0.1–0.5× coverage) (Novak et al. 2013) as the input, providing a fast and accurate way to compare two or more datasets. The reads were quality filtered based on the default parameters of paired_fastq_filtering.R, and a random sample of 5 million reads for each genome (~ 0.45× genome coverage) was selected. Finally, graph-based clustering was applied for de novo repeat identification and comparative analysis using the RepeatExplorer pipeline with the default parameters according to the developer’s recommendations (https://repeatexplorer-elixir.cerit-sc.cz/; Novak et al. 2010). The results from clustering were visually inspected with respect to their graph composition, which indicated the type of repeat and proportion of reads from each genome. The clusters showing differential contents in comparison of the female and male read datasets were selected for analysis. Blastn 2.2.31+ software was applied to search for the presence of Sat1WP satellite DNA (Schemberger et al. 2014) in the obtained clusters.
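The selection of clusters with differential read content between the sexes can be sketched in Python as below. This is not part of the RepeatExplorer pipeline: the function names, the 0.6 cutoff, and the read counts are hypothetical and serve only as an illustration. The sketch assumes equal numbers of reads were sampled from each sex, so raw counts are directly comparable.

```python
def female_fraction(female_reads, male_reads):
    """Proportion of a cluster's reads that come from the female dataset."""
    total = female_reads + male_reads
    if total == 0:
        raise ValueError("cluster contains no reads")
    return female_reads / total


def flag_female_biased(counts, min_fraction=0.6):
    """Return clusters whose female read fraction reaches min_fraction.

    counts: dict mapping cluster id -> (female_reads, male_reads).
    min_fraction: hypothetical cutoff, not a value taken from the study.
    """
    flagged = {}
    for cluster_id, (f_reads, m_reads) in counts.items():
        fraction = female_fraction(f_reads, m_reads)
        if fraction >= min_fraction:
            flagged[cluster_id] = fraction
    return flagged


# Toy counts invented for illustration only (not the study's data).
toy_counts = {"CL_a": (900, 300), "CL_b": (500, 510), "CL_c": (200, 800)}
print(flag_female_biased(toy_counts))  # {'CL_a': 0.75}
```

If the two datasets differed in size, the counts would first need to be normalized by the number of reads sampled per sex.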

Chromosome preparation and fluorescence in situ hybridization (FISH)

Chromosome preparations were obtained from anterior kidney cells using an air-drying method (Bertollo et al. 1978), and C-banding was conducted according to Sumner (1972). The procedures employed in this study were in agreement with the Committee on the Ethics of the Use of Animals of the State University of Ponta Grossa (protocol 29/2016). The following DNA sequences were used as probes for FISH mapping: (a) the microdissected W chromosome; (b) the Cot-1 DNA fraction; (c) the satellite DNA Sat1WP; (d) TEs; and (e) microsatellite sequences. The heterochromatic fraction (C-banding) of the Apareiodon sp. W chromosome was obtained by microdissection and DOP-PCR (referred to as Wm) and was subsequently synthesized as a probe using digoxigenin 11-dUTP (Roche Applied Science, Mannheim, Germany) according to the method described by Vicari et al. (2010). The repetitive Cot-1 DNA fraction and the satellite DNA Sat1WP were obtained from previous studies and labeled with digoxigenin 11-dUTP (Vicari et al. 2010; Schemberger et al. 2014).

Abundant TEs obtained from RepeatMasker annotated sequences and RepeatExplorer clusters were used for primer design and subjected to PCR amplification and probe labeling (Table S1—Electronic supplementary material). The PCR amplification of each TE was performed using 1× Taq Reaction buffer (200 mM Tris, pH 8.4, 500 mM KCl), 40 μM dNTPs, 2 mM MgCl2, 2 U of Taq DNA polymerase (Invitrogen, Carlsbad, California, USA), and each primer set at 0.2 μM. The amplicons produced by conventional PCR were labeled with digoxigenin 11-dUTP through nick translation (Dig Nick Translation Mix, Roche Applied Science, Mannheim, Germany).

The abundant microsatellites resulting from the RepeatMasker and RepeatExplorer analysis of Apareiodon sp. genomic data—(GATA)n, (GA)n, (CA)n, (CAA)n, (CAT)n, (CG)n, (GT)n, (GAC)n, (CGG)n, (TAA)n, (GAA)n, (GAG)n, (GACA)n, (TA)n, (CAC)n, and (CAG)n—were labeled with tetramethylrhodamine or biotin at the 5′ end during synthesis (Sigma-Aldrich, St. Louis, Missouri, USA) and were used as probes. FISH followed the general protocol described by Pinkel et al. (1986) under ~ 80% stringency conditions (2.5 ng/μl probe, 50% formamide, 2× SSC, 10% dextran sulfate, 42 °C for 16 h). Post-hybridization washes were performed at high stringency (50% formamide at 42 °C for 20 min, 0.1× SSC at 60 °C for 15 min, and 4× SSC 0.05% Tween at room temperature for 10 min). Streptavidin Alexa Fluor 488 (Molecular Probes, Eugene, Oregon, USA) and anti-digoxigenin rhodamine Fab fragments (Roche Applied Science) were used for probe detection. The chromosomes were stained with DAPI (0.2 μg/ml) in Vectashield mounting medium (Vector, Burlingame, CA, USA) and analyzed by epifluorescence microscopy.

Results

Genome sequencing and assembly

A total of 508,168,176 male reads and 563,850,145 female reads (~ 100 nt) were obtained after filtering, generating ~ 42× and ~ 47× coverage, respectively. The genome assembly yielded 660,652 scaffolds for males and 501,976 scaffolds for females, for a total of 1.2 Gb (available at http://sacibase.ibb.unesp.br/jbrowse/JBrowse-1.12.1/index.html?data=data-apareiodon-sp). The N50 value was 14,634 bp for males and 22,307 bp for females (Table S2—Electronic supplementary material). Other quality metrics are shown in Table S2. Using BUSCO (searching 978 metazoan orthologous groups), 82.5 and 82.3% of the genes were found in the male and female genomes, respectively, most of which were intact in both genomes (Fig. S1—Electronic supplementary material). The same analysis using 4584 orthologous Actinopterygii groups identified 69.3 and 73.9% of the genes in the male and female genomes, respectively, most of which were intact in both genomes (Fig. S1—Electronic supplementary material).
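The coverage figures reported above follow from the usual relation coverage = (number of reads × read length) / genome size, and N50 has a similarly simple definition. The Python sketch below illustrates both. It assumes a read length of exactly 100 nt and a genome size equal to the 1.2 Gb assembly length; the function names are hypothetical and the scaffold lengths passed to n50 are toy values.

```python
def sequencing_coverage(n_reads, read_length, genome_size):
    """Average depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def n50(lengths):
    """Length L such that sequences of length >= L hold half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


GENOME_SIZE = 1.2e9   # assumed equal to the total assembly length
READ_LENGTH = 100     # assumed; the text reports ~100 nt

male_cov = sequencing_coverage(508_168_176, READ_LENGTH, GENOME_SIZE)
female_cov = sequencing_coverage(563_850_145, READ_LENGTH, GENOME_SIZE)
print(f"male ~{male_cov:.1f}x, female ~{female_cov:.1f}x")
# prints: male ~42.3x, female ~47.0x

# Toy scaffold lengths for illustration only.
print(n50([2, 3, 4, 5, 6]))  # 5
```

The computed values agree with the ~ 42× and ~ 47× coverage stated in the text.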

Repetitive genomic composition

RepeatMasker analysis of the Apareiodon sp. assemblies showed that 36.2 and 35.3% of the genome was composed of repeat elements in males and females, respectively (Fig. 1a). A total of 56 superfamilies of TEs were identified, including 23 DNA transposon superfamilies and 33 retrotransposon superfamilies (Tables S3 and S4—Electronic supplementary material). DNA transposons presented a higher percentage (~ 38%) than other repetitive elements in the genome (Fig. 1a). Tc1/mariner and Hat transposons were most frequent within this category (Fig. 1b; Tables S3 and S4—Electronic supplementary material). The retrotransposons presented a 26.9% repetitive composition in males and females (Fig. 1b; Tables S3 and S4—Electronic supplementary material). The LTR retrotransposon Gypsy was predominant within the TE category (Fig. 1b; Tables S3 and S4—Electronic supplementary material). L2 and Rex-Babar were the most frequent among LINE retrotransposons (Fig. 1b; Tables S3 and S4—Electronic supplementary material). The SINE derived from transfer RNA (tRNA) was most abundant in this TE order (Fig. 1b; Tables S3 and S4—Electronic supplementary material). CMC/EnSpm, LTR/ERV, LTR/Gypsy, and LTR/Pao presented the largest fragments of ~ 5000 bp (Tables S3 and S4—Electronic supplementary material). The shortest fragment was 10 bp for several TEs (Tables S3 and S4—Electronic supplementary material). A total of 24.6% of the elements for males and 23.8% for females were unclassified elements in the genome (Fig. 1a).

Fig. 1 Repetitive DNA composition and abundance in the Apareiodon sp. genome. a Percentage of repetitive classes based on whole-genome analysis of males and females conducted with RepeatMasker. DNA transposons and retrotransposon/LINEs were the most abundant. b Percentages of more frequent TEs in the genomes of males (dark color) and females (light color). On the y axis, 100% indicates the sum of the total length of the DNA transposon analyzed, and the x axis indicates the TE superfamily. c Percentages of more frequent tandem repeats in the genomes of males (dark color) and females (light color). On the y axis, 100% indicates the sum of the total length of tandem repeats analyzed, and the x axis indicates the repetition unit

A total of 20 abundant microsatellites were identified in the male and female genomes of Apareiodon sp. (Fig. 1c; Table S5—Electronic supplementary material). The microsatellites accounted for 8.9% of the male genome and 9.9% of the female genome (Fig. 1a; Table S5—Electronic supplementary material), and AC/GT, CA/GT, TC/GA, and CT/GT were the most abundant repeats in the genome (Fig. 1c; Table S5—Electronic supplementary material).

Repetitive divergence analyses and TE insertion ages

The relative ages of the different TE families were estimated by calculating Kimura two-parameter distances between individual copies and their consensus sequences (Smit and Hubley 2015). For each TE family/superfamily, the consensus sequence provides an approximation of the sequence of the ancestral TE. The landscape graphs were organized into two categories—DNA transposons and retrotransposons (Fig. 2 and interactive Electronic supplementary material Figs. S2, S3, S4, S5). Two evident invasion phases were detected for the DNA transposons of Apareiodon sp. males and females (Fig. 2a,b). The ancient peak of DNA transposon invasion showed higher copy numbers of DNA/helitron, DNA/Tc1-mariner, and DNA/CMC EnSpm (Fig. 2a,b and Electronic supplementary material Figs. S2, S3). The recent DNA transposon invasion phase mainly involved DNA/Hat, DNA/Harbinger, and DNA/Crypton and a second round of DNA/Tc1-mariner invasion (Fig. 2a,b and Electronic supplementary material Figs. S2, S3).

Fig. 2 Repeat landscapes of male and female Apareiodon sp. genomes. The graphs show, for each element, the sequence divergence from their consensus sequence (x-axis) in relation to their number of copies in the genome (y-axis). Peaks represent waves of insertion (yellow arrows) of specific elements in the genome. Elements exhibiting older waves of insertion are present on the right side of the graph, while recent waves of insertions are depicted on the left side. Different colors indicate the distinct element types described on the right side. a Landscape of DNA transposons in the male genome (increases in Tc1-mariner, helitron, CMC EnSpm, and Hat occur in the ~ 40 to ~ 20 Kimura distance range in the ancient peak and in the ~ 10 to ~ 4 Kimura distance range in the recent peak). b Landscape of DNA transposons in the female genome (increases in Tc1-mariner, helitron, CMC EnSpm, and Hat occur in the ~ 40 to ~ 20 Kimura distance range in the ancient peak and in the ~ 10 to ~ 4 Kimura distance range in the recent peak). c Landscape of retrotransposons in the male genome (increases in SINE/tRNA and LINE/L2 occur in the ~ 29 to ~ 5 Kimura distance range in the recent peak; SINE/MIR and LINE/Rex-Babar increase in the ~ 14 to ~ 4 Kimura distance range). d Landscape of retrotransposons in the female genome (increases in SINE/tRNA and LINE/L2 occur in the ~ 29 to ~ 5 Kimura distance range in the recent peak; SINE/MIR and LINE/Rex-Babar increase in the ~ 14 to ~ 4 Kimura distance range). For a detailed interactive version of the graph, please refer to Figs. S2, S3, S4, and S5—Electronic supplementary material

Retrotransposons also presented two principal genomic invasion phases in Apareiodon sp. (yellow arrows in Fig. 2c,d). The ancient peak involved LTR/Pao, LINE/L1, and LTR/Gypsy as the most representative retrotransposons (Fig. 2c,d and Electronic supplementary material Figs. S4, S5). The recent peak was represented by SINE/MIR, SINE/tRNA, LINE/Rex-Babar, and LINE/L2 (Fig. 2c,d and Electronic supplementary material Figs. S4, S5).

Using neutral nucleotide substitution rates, we estimated the insertion time of the major TEs in the Apareiodon sp. genome. The putative mean age of the TE insertions in the genome ranged from ~ 25 million years ago (my) to ~ 1.5 my (Fig. 3). The ancient invasion phase demonstrated in landscape analysis involved LINE/L1, LTR/Gypsy, and LTR/Pao bursts in the genome between ~ 25 and 20 my, followed by DNA/helitron (~ 20–15 my) and Tc1-mariner and CMC EnSpm bursts between ~ 20 and 12 my (Fig. 3). The accumulation of helitron TEs was visibly higher in the female genome during this time span (Fig. 3). In the most recent TE invasion phase of the landscape analysis, SINE/MIR, LINE/Rex-Babar, and LINE/L2 element bursts were observed between ~ 7.5 and 2.5 my, and a DNA/Hat burst and a second Tc1-mariner invasion peak were observed between ~ 3 and 1.5 my (Fig. 3).

Fig. 3 Insertion time in millions of years (my) for the most enriched TEs in the Apareiodon sp. genome (a–c, g–j retrotransposons; d–f DNA transposons). The evolutionary age estimation indicates that the sex chromosome started to experience recombination suppression caused by the burst of TE insertion, as demonstrated in d–f

Comparative analysis of male and female repetitive sequences

To identify the repetitive sequences accumulated on the W chromosome, comparative analysis between the sexes was performed using RepeatExplorer. The analyses resulted in 69 clusters (CL) with different graphing patterns (Fig. S6—Electronic supplementary material). A total of 67.9% of the reads were included in clusters, while 31.1% were singlets. CLs 6, 24, and 40 exhibited a higher proportion of female reads and were selected for analyses (Fig. 4, Table 1). The cluster with the highest proportion of female reads was CL6. This cluster was enriched with different classes of microsatellites, and those presenting major differences between the sexes were selected for FISH mapping (Table 1). DNA/CMC EnSpm, DNA/Hat, DNA/helitron, DNA/Tc1-mariner, and LTR/Gypsy transposons were also enriched in CL6 of the female genomes and were subjected to FISH mapping. Blastn searches identified a fragment of Sat1WP (Schemberger et al. 2014) in CL6. CL24 exhibited enrichment of Maui/LINE-L2, and CL40 showed SSU-rRNA_Hsa enrichment in females. In parallel, CLs 4, 15, 48, and 49 presented higher proportions of male reads (Fig. 4). These clusters exhibited a slightly higher percentage of penta- and hexanucleotide expansions, the DNA transposon Kolobok, and LTR/Gypsy (Table S6—Electronic supplementary material).

Fig. 4 Comparative repeat analysis of male and female genomes using RepeatExplorer software. Each column represents 100% of the reads of a cluster. The read proportions for the male and female genomes are indicated in blue and red, respectively. Clusters with a higher proportion of reads in females are expanded in the lower part of the graph

Table 1 Female repetitive enrichment clusters selected from RepeatExplorer

Karyotype organization and repetitive physical mapping validation on the sex chromosomes

Apareiodon sp. possesses 2n = 54 chromosomes arranged in 48 meta/submetacentric and six subtelocentric chromosomes in males and in 47 meta/submetacentric and seven subtelocentric chromosomes in females (Fig. 5; Fig. S7—Electronic supplementary material). This difference is due to a ZZ/ZW heteromorphic sex chromosome system (Figs. 5, 6; Fig. S7—Electronic supplementary material). The W chromosome of Apareiodon sp. is almost totally heterochromatic and enriched with repetitive sequences, which was demonstrated by heterochromatin detection (C-banding) and FISH mapping of Wm heterochromatin and Cot-1 probes (Fig. 6; Fig. S7—Electronic supplementary material). The Sat1WP satellite is present in the interstitial region of the long arm of the W chromosome (Fig. 6; Fig. S7—Electronic supplementary material).

Fig. 5 Karyotypes of Apareiodon sp. females subjected to in situ localization with TE probes (red signals): a helitron, b Tc1/mariner, and c CMC EnSpm. The scale bar corresponds to 5 μm

Fig. 6 The Z sex chromosome of Apareiodon sp. (metacentric chromosome) and W chromosome repetitive composition. a C-banding (CB) and FISH mapping with Wm, Cot-1, and Sat1WP probes. b FISH mapping of microsatellites selected via RepeatExplorer analyses ((GATA)n, (GA)n, (CA)n, (CAA)n, (CAT)n, and (CG)n). c FISH mapping of DNA transposons helitron, Tc1-mariner, and CMC EnSpm selected by RepeatExplorer. The scale bar corresponds to 5 μm

The sequences with the greatest difference scores in favor of females were used as probes to validate elements enriched on the W chromosome. FISH analysis revealed hybridization signals mostly on the long arm of the W chromosome (Fig. S7—Electronic supplementary material). The helitron element showed notable accumulation on the W long arm and a small site in the interstitial region of the Z short arm (Fig. 5a). The Tc1-mariner clusters were located in the interstitial region of the W long arm and the subterminal region of the Z short arm (Fig. 5b). The CMC EnSpm transposon accumulated in the proximal region of the W long arm and at a small site in the proximal region of the Z short arm (Fig. 5c). Gypsy and Hat were dispersed in the genome without W chromosome accumulation (data not shown).

The (GATA)n, (GA)n, (CA)n, (CAA)n, (CAT)n, and (CG)n clusters were located in the interstitial region of the W long arm (Fig. 6; Fig. S7—Electronic supplementary material). (GA)n, (CAT)n, and (CG)n accumulated in the terminal region of the Z short arm, while (GATA)n, (CA)n, and (CAA)n accumulated in the proximal region of the Z short arm (Fig. 6; Fig. S7—Electronic supplementary material).

Discussion

Repetitive DNA fraction in Apareiodon

The first step in understanding sex chromosome differentiation is to obtain knowledge of the repetitive genomic composition associated with physical mapping methodology (Mongue et al. 2017). Most of the sequenced fish genomes are from species with XX/XY chromosomal determination (e.g., medaka) or without heteromorphic sex chromosomes (e.g., Danio rerio), and only one ZZ/ZW species genome is available, for tongue sole, Cynoglossus semilaevis (for a review, see Chalopin et al. 2015). This is the first genome assembly study in a Neotropical fish in which females are the heterogametic sex. We generated 1.2 Gb of genome data, as expected for Apareiodon sp. This genome is almost 3× smaller than the human genome and shows a similar length to that of Danio rerio (1.4 Gb). On the basis of genome quality assessments comparing orthologous genes, it is possible to conclude that the Apareiodon sp. female genome is slightly better assembled than the male genome, as demonstrated by BUSCO analysis (Waterhouse et al. 2017).

The RepeatMasker analyses of the Apareiodon sp. assembly showed that ~ 36% of the genome matched diverse classes of repetitive DNAs, with a majority of DNA transposon classes. In teleosts, the repetitive fraction ranges from 7% in Takifugu rubripes, with a majority of LINE retrotransposons, to 55% in zebrafish, with a majority of DNA transposons (Chalopin et al. 2015). Studies in mammals using the same approach have demonstrated a 31 to 49% repetitive element composition (Kirkness et al. 2003; de Koning et al. 2011). Mammals present less retrotransposon superfamily diversity than fish species (Volff et al. 2003). However, it is important to note that next-generation sequencing, even when producing high genome coverage, results in short reads that can lead to ambiguity in repetitive DNA assembly (Treangen and Salzberg 2012).

Microsatellite expansions were abundant in the Apareiodon sp. genome while classical satellites were underrepresented. Di-, tri-, and tetranucleotide expansions were more abundant. On the other hand, as demonstrated by FISH analysis (Schemberger et al. 2011), the satellite DNA pPh2004 described in Parodon hilarii (Vicente et al. 2003) was not detected (by Blastn) in the Apareiodon sp. genome. Our data show that Apareiodon sp. microsatellites can hitchhike within mobile elements, promoting chromosome site dispersion (Coates et al. 2010; Milani and Cabral-de-Mello 2014; Pucci et al. 2016) and increases in copy number via replication slippage (Kelkar et al. 2011). Furthermore, microsatellite genesis can also occur in degenerate TE segments (Wilder and Hollocher 2001).

Many families of satellite DNAs have been derived from transposable elements or include a major component that is related to part of a mobile element (Heikkinen et al. 1995; Kapitonov et al. 1998). We compared CL 48, which presented more TE LTR/Gypsy copies in the male genome than in the female genome, with CL 6, in which the opposite situation was observed. The graph structure of CL 48 is characterized by the presence of long multiple parallel lines that form a more or less linearly organized cluster, which is characteristic of dispersed and less degenerated sequences (Fig. S6) according to Novak et al. (2013). On the other hand, the graph structure of CL 6 formed a star-like, circular structure that is characteristic of tandem repeats (Fig. S6) (see Novak et al. 2013). CL 6 included TEs (especially helitron, Tc1-mariner, and EnSpm TEs) that occur in the W sex chromosome-specific region. The helitron and EnSpm TEs exhibited in situ locations organizing extensive sites on the Apareiodon sp. chromosomes. The higher degeneracy level of these TEs and their in situ locations characteristic of tandem repeats indicate a probable origin from satellite DNAs during the evolutionary history of the origin of W chromosome differentiation.

Apareiodon sp. showed great diversity of TEs, and the DNA transposon class accounted for the majority of repetitive DNA present in the genome. Similarly, Schemberger et al. (2016) demonstrated accumulation of the DNA transposon Tc1-mariner in Parodontidae species. DNA transposons and retrotransposons presented different expansion histories as demonstrated by landscape analysis. TE activity invasion waves showed two evident time points (one ancient and another recent) with different levels of substitution rates demonstrated by Kimura analysis.

More recent activity of the TEs in the Apareiodon sp. genome can be observed between ~ 7.5 and 1.5 my with high copy numbers of Tc1-mariner, DNA/Harbinger, DNA/Hat, DNA/P, SINE/MIR, SINE/tRNA, LINE/Rex-Babar, and LINE/L2. The ancient invasion of the TEs was verified by a large increase in the copy number of LTR/Pao, LINE/L1, LTR/Gypsy, DNA/helitron, Tc1-mariner, and DNA/CMC EnSpm at ~ 25–12 my. Inactivated transposons can accumulate mutations at neutral rates until losing their molecular identity, as represented in this study by high Kimura distance values associated with copy numbers. In Apareiodon sp., the different levels of molecular deterioration suggested the occurrence of senescent TEs with deteriorated element identity, neutral TEs or TEs coopted by the genome, and a small copy number of autonomous TEs, as suggested for the “TE life cycle” (Kidwell and Lisch 2001; Fernández-Medina et al. 2012).

Sex chromosome origin and differentiation

The accumulation of repetitive DNAs is one of the key features of genomes and sex chromosome differentiation (Charlesworth et al. 1994) and can be cytogenetically characterized by chromosomal rearrangements and heterochromatinization (Gazoni et al. 2018; Komissarov et al. 2018; de Oliveira et al. 2018; Xin et al. 2018). The repetitive DNA data for Apareiodon sp. obtained via RepeatMasker and RepeatExplorer analyses were used in this study to screen the W chromosome repeat DNA composition. Interestingly, the different repetitive DNAs located on the W chromosome of Apareiodon were involved in TE invasions between ~ 20 and 12 my (helitron, Tc1-mariner, and EnSpm) detected by landscape analysis, and they exhibited major coverage in female genome clusters detected by RepeatExplorer. Other TEs, such as DNA/Hat and LTR/Gypsy, expanded in the genome before 20 my or after 12 my. These TEs showed no consistent signals on sex chromosomes and no participation in the origin of the W chromosome.

The repetitive sequences detected by RepeatExplorer presented more coverage in the ZW genome (68.8%) compared with the ZZ genome (67%), which was also demonstrated by the microdissected W and Cot-1 probes (Schemberger et al. 2014). In CL 6, (GACA)n, (GATA)n, (CAA)n, (CAT)n, (AC)n, (CAG)n, (CA)n, (GA)n, DNA/helitron, DNA/Tc1-mariner, and DNA/CMC EnSpm repeats showed a greater number of reads in the female genome than in the male genome, especially DNA/helitron repeats. This result was validated by FISH, which detected prominent interstitial signals of DNA/helitron, DNA/Tc1-mariner, DNA/CMC EnSpm, (GATA)n, (GA)n, (CAA)n, (CAT)n, (CA)n, and (CG)n on the W long arm.

Repetitive sequences are found at a high proportion on such heterochromatic sex chromosomes, and the evolution and emergence of sex chromosomes has been connected to the dynamics of repeats and transposable elements (Chalopin et al. 2015). Our TE insertion age estimates for Apareiodon sp. suggest that the three DNA transposons, especially the helitron transposon, with a high copy number insertion on the W chromosome, probably caused recombination suppression between the Z and W chromosomes 20 my. After helitron invasion, the W chromosome accumulated Tc1-mariner and EnSpm elements, which also exhibited a high copy number on the Z chromosome. TE amplification not only on the sex-specific Y or W chromosome but also in the corresponding regions of the X and Z chromosomes has been observed in other fish species (Chalopin et al. 2015).

The genomic organization of the sex chromosomes has been well studied in other vertebrate model organisms with relatively old sex chromosomes, such as mammals (up to 166 my) and birds (approximately 200 my) (Bachtrog 2013). In fish species, Chalopin et al. (2015) proposed a more recent origin for sex chromosomes than in mammals and birds. The Y chromosome of a stickleback lineage originated 10 my ago (White et al. 2015), and the W chromosome of the tongue sole originated 30 my ago (Chen et al. 2014). However, the Japan Sea lineage of the three-spine stickleback possesses a neo-Y chromosome dated to 1.5–2 my ago (Chalopin et al. 2015). Our data indicate that the Apareiodon sp. sex chromosome system emerged at a time comparable to those of other teleosts. These data corroborate the hypothesis that sex chromosomes have evolved multiple times independently in teleost fish, a group that displays a wide variety of sex chromosome systems and diverse XY or ZW types with very young ages and various stages of differentiation (Chalopin et al. 2015).

Our study also identified and mapped several repetitive sequences constituting the Wm probe and further elucidated the model of sex chromosome evolution proposed by Schemberger et al. (2011). That earlier study suggested that the ancestral proto-sex chromosome underwent an inversion event in its terminal region that rearranged the W chromosome repetitive DNAs to the proximal region. Here, we propose that the DNA helitron element (located in the proximal region of the Z short arm and in the W long arm) was involved in one important step of W differentiation, namely the restriction of crossing-over during the evolutionary history of the W chromosome. This proposal is consistent with the theory of sex chromosome gene erosion and the formation of the W chromosome-specific region mediated by TE invasion (Charlesworth et al. 2005).

After the origin of the ZW non-recombining region, bursts of TEs and simple repeat accumulation occurred around the young W-specific chromosome regions. We found that the interstitial W region was enriched with (GATA)n, (CA)n, (CAA)n, (CAT)n, and (CG)n microsatellites, indicating that microsatellite sequences, together with helitron, Tc1-mariner, and CMC EnSpm TEs, participated in W chromosome rearrangement. The accumulation of microsatellites and TEs with different numbers of reads and signals in clusters in the two sexes corroborates the hypothesis regarding the derivation of the W chromosome in Parodontidae. The W PAR was predicted in previous studies using the pPh2004 satellite in Parodon species, which is shared by the Z long arm and the W short arm (Schemberger et al. 2011; Ziemniczak et al. 2014). The pPh2004 satellite is lacking in Apareiodon species with differentiated W chromosomes, making it difficult to clarify the identity of the Z chromosome (Schemberger et al. 2011; Traldi et al. 2016). Based on previous data and the new data obtained here, it is possible to infer that the PAR of the ZW chromosomes in Apareiodon sp. is euchromatic and corresponds to the long arm of the Z chromosome and the short arm of the W chromosome.

Usually, the heteromorphic sex chromosome accumulates TE DNA and exhibits a partial shift from euchromatic to heterochromatic regions (Charlesworth et al. 2005; Steinemann and Steinemann 2005). It has been proposed that gene silencing could potentially be achieved through the recruitment of TEs to the heteromorphic sex chromosome region, resulting in transcriptionally inactive heterochromatin, or through the accumulation of mutations in regulatory regions (Bachtrog 2013). The genomic and in situ location data suggest that bursts of helitron, Tc1-mariner, and CMC EnSpm TEs and simple repeat accumulation around the Apareiodon sp. W chromosome-specific region could have converted the chromatin of this region to a condensed state.

Regarding DNA structure, inverted polypurine/polypyrimidine DNA motifs tend to form triplex structures (triple helix or H-DNA; see Frank-Kamenetskii and Mirkin 1995). This type of repetitive DNA is an essential part of heterochromatic regions and is important for the maintenance of chromosome structure and the formation of heterochromatic W bodies in chickens (Saitoh et al. 1991; Komissarov et al. 2018). The W chromosome-specific region in Apareiodon sp. is heterochromatic and rich in microsatellite expansions, and these expansions may play a role in chromatin nucleation and heterochromatic W body formation. In contrast, the major concentration of (GATA)n on the sex chromosomes has been related to chromatin decondensation activity (Singh et al. 1994; Viger et al. 1998; Priyadarshini et al. 2003).

In previous experiments involving the in situ localization of the (GATA)n expansion, several accumulation sites were detected on the autosomes and W chromosome of Parodontidae species (Ziemniczak et al. 2014). Similarly, the W chromosome of chickens is rich in the (GGAAA)n expansion, and the involvement of this satellite in upstream regions of avian gonad-specific protein-coding sequences has been discussed (Komissarov et al. 2018). Interestingly, GATA-4 is a transcription factor that is specifically expressed in Sertoli cells in the testis and granulosa cells in the ovaries, where GATA sequences are used as transcription factor binding sites (Viger et al. 1998). In addition, GATA sequences are present in the promoter region of doublesex and mab-3-related transcription factor 1 (Dmrt-1), a gene involved in postnatal testis differentiation (Raymond et al. 2000).

According to Bachtrog (2013), the repetitive DNA structure of the sex chromosomes continues to pose challenges in deciphering their genomic sequences. Here, we have made progress in elucidating the repetitive DNA organization on the sex chromosomes of a Neotropical fish genome, and advances in the understanding of ZW sex determination mechanisms are expected in the future.

Conclusion

Our data provide a detailed characterization of the repetitive elements in the Apareiodon sp. genome. Most TEs show high levels of nucleotide substitution, leading to defective, neutral, or coopted copies. Two different waves of TE invasion occurred in the evolutionary history of the Apareiodon sp. lineage, resulting in DNA transposon and retrotransposon accumulation. An ancient invasion of the helitron DNA TE coincided with the Z/W recombination suppression event approximately 20 my ago. The W-specific genomic region then expanded through the accumulation of microsatellites and satellite DNA originating from TEs, increasing the diversity of repetitive DNAs in the sex chromosome composition and contributing to Z/W differentiation.