Introduction

Repetitive DNA constitutes a large fraction of most eukaryotic genomes, in the form of transposable elements (TEs), satellite DNA, segmental duplications, ribosomal DNA, multi-copy gene families, pseudogenes, etc. which, collectively, have been called the “repeatome” (Kim et al. 2014). Some of these elements are repeated in tandem, such as satellite DNA and multi-copy gene families, whereas others are scattered across the genome, the most abundant being a multitude of transposable elements. Accurate quantification of repetitive DNA composition is a common weakness of current genome sequencing projects, due to the difficulty of properly assembling repeats, mainly satellite DNA (Miga 2015). In fact, a precise account of the amount of the different repetitive elements is not currently available for any single chromosome, even in the most advanced genome sequencing projects. Our present research tries to fill this gap by developing appropriate NGS and bioinformatic approaches which might be useful in the case of a B chromosome, i.e., a kind of parasitic genomic element that is present in only some individuals and is thus dispensable (Camacho 2005), but which could also be applied to other cases such as sex-specific chromosomes (e.g., W or Y) or aneuploids. Until recently, it was thought that B chromosomes are almost exclusively composed of repetitive DNA, mainly TEs, satellite DNA, and some repetitive gene families, especially 45S ribosomal DNA (rDNA) (for review, see Camacho et al. 2000 and Camacho 2005). However, other gene families have recently been found on B chromosomes, such as histone genes (Teruel et al. 2010; Oliveira et al. 2011; Silva et al. 2014; Utsunomia et al. 2016), 5S rDNA (Oliveira et al. 2011; Kour et al. 2013; Xie et al. 2014; Jang et al. 2016), and the U2 small nuclear DNA (Bueno et al. 2013; Menezes-de-Carvalho et al. 2015). Likewise, recent literature has confirmed that satellite DNA (Poletto et al. 2010; Peng and Cheng 2011; Klemme et al. 2013; Bauerly et al. 2014; Jang et al. 2016; Utsunomia et al. 2016) and TEs (Martis et al. 2012; Klemme et al. 2013; Kour et al. 2013; Houben et al. 2014; Valente et al. 2014; Huang et al. 2016) are abundant in B chromosomes of several species. This abundance of repetitive elements has hindered the finding of protein-coding genes in B chromosomes, but NGS is showing that these genes appear to be an important component of B chromosomes, at both structural and functional levels (Martis et al. 2012; Valente et al. 2014; Ma et al. 2017; Navarro-Dominguez et al. 2017).

One of the most recent approaches to search for repetitive elements in B-carrying genomes has been facilitated by NGS and the RepeatExplorer software (Novák et al. 2013), and a specific protocol allowing the high-throughput analysis of satellite DNA has recently been put forward by Ruiz-Ruano et al. (2016). These new technologies have been applied in species such as the fungus Alternaria arborescens (Hu et al. 2012), rye (Martis et al. 2012; Klemme et al. 2013), the fruit fly Drosophila albomicans (Zhou et al. 2012), the fishes Astatotilapia latifasciata (Valente et al. 2014) and Moenkhausia sanctaefilomenae (Utsunomia et al. 2016), and the grasshopper Eumigus monticola (Ruiz-Ruano et al. 2017). This has provided fragmentary but valuable new information, yet the quantitative composition of B chromosomes in terms of repetitive DNA remains largely unknown.

The migratory locust (Locusta migratoria) harbors a very widespread B chromosome system, as its presence has been reported in populations from Asia (Itoh 1934; Hsiang 1958; Nur 1969; Kayano 1971), Africa (Rees and Jamieson 1954; Lespinasse 1973; Dearn 1974; Lespinasse 1977), Australia (King and John 1980), and Europe (Cabrero et al. 1984; Viseras et al. 1990). This B chromosome includes a proximal euchromatic region and a distal heterochromatic one (Cabrero et al. 1984). B chromosomes in this species are mitotically unstable, so that they vary in number between different cells of the same individual (Nur 1969; Kayano 1971; Cabrero et al. 1984; Viseras et al. 1990). This mitotic instability is already manifested in 5-day-old embryos (Pardo et al. 1995), and the preferential destiny of cells with a higher number of Bs toward the germ line has been suggested to be a mechanism of premeiotic drive in males (Nur 1969). In addition, B chromosomes in this species show drive through females (Pardo et al. 1994), most likely through meiotic drive. The presence of drive in both sexes might explain why these B chromosomes have been so successful in nature.

Here, we develop an approach to estimate the abundance of repetitive DNA in B chromosomes, by means of Illumina sequencing of genomic DNA (gDNA) of B-carrying and B-lacking individuals, followed by subtractive bioinformatic analysis, Illumina sequencing of DNA extracted from a microdissected B chromosome, and the physical mapping of some elements by means of FISH. We applied this protocol to the B chromosomes in L. migratoria, and the results revealed their extreme enrichment in a single satellite DNA family and suggested the possibility that two A chromosomes (8 and 9) might have been involved in B chromosome origin.

Materials and methods

Materials and nucleic acid extraction

In 2014, we collected 20 males and 10 females of Locusta migratoria in two natural populations close to Los Barrios (Cádiz, Spain), namely 5 males and 3 females at Finca El Patrón (36.20685N, 5.46481W) and 15 males and 7 females at Puente de Hierro (36.19251N, 5.55131W). The females were cultured to obtain embryos for FISH experiments. We performed an NGS experiment including the two B-lacking males found and four of the B-carrying males. Males were anesthetized before dissection, and several testis tubules were immersed in 3:1 ethanol-acetic acid for cytological analysis of B chromosome frequency. Testis and body remains were immediately frozen in liquid nitrogen and stored at −80 °C. We extracted genomic DNA (gDNA) from one hind leg, and RNA from the testes and the other hind leg.

RNA from each individual and body part was sequenced on the Illumina HiSeq2000 platform, which yielded about 8 Gb of 100 nt paired-end reads per sample (SRA project SRP066094). gDNA from each individual was sequenced on the Illumina HiSeq2500 platform, yielding about 20 Gb of 125 nt paired-end reads per individual, i.e., ~3× genome coverage (SRA project SRP079877). In addition, we used 454 reads of gDNA previously obtained from a B-carrying individual collected at Padul (Granada, Spain), consisting of 292,352 reads summing up to ~200 Mb, which are available in the SRA database under accession number SRR1200889 (Ruiz-Ruano et al. 2015).

For body parts with cuticle, materials were frozen in liquid nitrogen before being ground in a porcelain mortar. For soft tissues, e.g., testis, we used an Eppendorf pestle to break them up in a 1.5-mL tube containing the suspension buffer of the extraction kit. For genomic DNA extractions, we used the “GenElute Mammalian Genomic DNA Miniprep” (Sigma-Aldrich) kit following the manufacturer’s protocol. For total RNA extraction from hind legs, we used the “Real Total RNA Spin Plus” (Durviz), whereas for RNA extraction from testis, we used the “RNeasy Lipid Tissue” kit (Qiagen). We measured the concentration of extracted DNA and RNA using the “Infinite 200 NanoQuant” (Tecan) spectrophotometer. We checked DNA integrity in a 2-μL sample run on a 1% agarose gel and RNA integrity in a 2-μL sample run on a MOPS gel with DEPC water. After verifying the concentration and quality of the samples, we sent at least 1 μg of each sample for sequencing, at a concentration higher than 50 ng/μL.

Acquiring reference sequences for repetitive DNA content analysis

First, we obtained 107 satellite DNA sequences belonging to 62 families found in L. migratoria (Ruiz-Ruano et al. 2016) and 1128 TEs included in RepBase v20.10 (Bao et al. 2015) (last access: 28 October 2015). Then, we assembled mitochondrial DNA and both genes and spacers of the histone cistron, 45S and 5S rDNAs, and U1, U2, U4, U5, and U6 snDNAs, using MITObim (Hahn et al. 2013) with known seed sequences of these genes from several species: Locusta migratoria for H3 histone (GU111931) and mitochondrial DNA (JN858179), Gomphocerinae sp. for 45S rDNA (AY859546.1), Ronderosia bergi for 5S rDNA (KP213274), Eyprepocnemis plorans for U1 (KJ606069), Abracris flavolineata for U2 (KP975085), and Drosophila melanogaster for U4 (NR_001670), U5 (NR_001933), and U6 (NR_002081) snDNA genes.

In addition, we searched for tRNAs and other repetitive elements in the L. migratoria genome assembly (Wang et al. 2014). For tRNA, we used the tRNAscan-SE program (Lowe and Eddy 1997) and removed redundancies with CD-HIT-EST (Li et al. 2012) applying the options -M 0 -aS 0.8 -c 0.8 -G 0 -g 1. This yielded 617 tRNA sequences. For other repetitive elements, we performed assembly with the RepeatModeler pipeline (Smit and Hubley 2017), and then, we masked the output against the previously mentioned sequences with RepeatMasker (Smit et al. 2017), keeping 1108 sequences with at least 40 consecutive unmasked nucleotides. Altogether, we used 2969 reference sequences.

Abundance of repetitive sequences in the B chromosome

To estimate the abundance of repetitive elements in B-carrying and B-lacking genomes, we aligned 5 million Illumina read pairs from each gDNA library (~0.2× genome coverage) with the reference sequences, by means of RepeatMasker. Abundance for a given repetitive element was calculated as the proportion of nucleotides aligned with the reference sequence, with respect to total library size. In addition to abundance, RepeatMasker provides divergence estimates, and both parameters allow building a repeat landscape, which shows how the abundance of repetitive elements is distributed across different values of divergence. By comparing abundance and divergence for the different subclasses of repetitive elements between B-carrying and B-lacking libraries, we built a subtractive repeat landscape which shows the elements with the highest difference in abundance between the two types of library. This gives a first indication of which DNA sequences are over-represented in the B-carrying libraries with respect to the B-lacking ones, presumably because they are more abundant in B than in A chromosomes.
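The abundance and subtraction steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: it assumes that the RepeatMasker alignments have already been summarized as aligned nucleotides per (subclass, divergence bin), and the function names and toy counts are hypothetical.

```python
from collections import defaultdict

def abundance(aligned_nt, library_nt):
    """Proportion of library nucleotides aligned to each (subclass, divergence bin)."""
    return {key: nt / library_nt for key, nt in aligned_nt.items()}

def mean_abundance(libraries):
    """Average abundance per key across libraries; keys absent from a library count as zero."""
    totals = defaultdict(float)
    for lib in libraries:
        for key, value in lib.items():
            totals[key] += value
    return {key: total / len(libraries) for key, total in totals.items()}

def subtractive_landscape(plus_b, zero_b):
    """B-carrying minus B-lacking abundance; positive values mean over-representation in +B."""
    keys = set(plus_b) | set(zero_b)
    return {key: plus_b.get(key, 0.0) - zero_b.get(key, 0.0) for key in keys}

# Toy counts (hypothetical), keyed by (repeat subclass, divergence bin in %).
library_nt = 1_000_000
plus_b_libs = [
    abundance({("Satellite", 1): 40_000, ("LINE", 10): 90_000}, library_nt),
    abundance({("Satellite", 1): 44_000, ("LINE", 10): 88_000}, library_nt),
]
zero_b_libs = [
    abundance({("Satellite", 1): 24_000, ("LINE", 10): 92_000}, library_nt),
]

landscape = subtractive_landscape(mean_abundance(plus_b_libs), mean_abundance(zero_b_libs))
```

In this toy case, the satellite bin takes a positive value and the LINE bin a negative one, which is how enrichment and impoverishment would appear in a subtractive landscape.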

We estimated the abundance of repetitive elements in B chromosomes from abundances in B-carrying and B-lacking libraries. For this purpose, we estimated abundance in four B-carrying and two B-lacking males. The use of several individuals within each type, instead of one, has the advantage of diminishing false positives caused by between-individual differences in abundance of repetitive elements in the A chromosomes. Of course, the more individuals in each group, the higher the accuracy of B chromosome abundance estimates. For the following calculations, we will use the B-carrying and B-lacking averages. Our estimates are based on the fact that the abundance of a given repetitive element in the B-carrying genome is the weighted mean of its abundance in the A chromosomes and that in the B chromosomes. Therefore, we first calculated the weight of A ($w_A$) and B ($w_B$) chromosomes in the B-carrying genome on the basis of the DNA content of A ($2C$) and B ($C_B$) chromosomes:

$$ {w}_A=\frac{2C}{2C+{C}_B{N}_B},\kern0.5em \mathrm{and}\kern0.5em {w}_B=\frac{C_B{N}_B}{2C+{C}_B{N}_B} $$

where $N_B$ is the average number of B chromosomes in the B-carrying individuals that were Illumina sequenced.

Therefore, the abundance of a given repetitive element in a B-carrying library ($A_{+B}$) should be equal to the weighted mean of abundances in the A ($A_A$) and B ($A_B$) chromosomes, which are thus multiplied by their weights in the B-carrying genome:

$$ {A}_{+B}={A}_A{w}_A+{A}_B{w}_B $$

so that its abundance in the B chromosomes is as follows:

$$ {A}_B=\frac{A_{+B}-{A}_A{w}_A}{w_B} $$

Abundances in the B-lacking and B-carrying libraries ($A_A$ and $A_{+B}$, respectively) were estimated as described above, and the number of B chromosomes ($N_B$) was scored by cytologically examining 35–111 spermatocytes per male. Genome size ($C$ = 6.3 Gb) was given in Wang et al. (2014), and DNA content per B chromosome was calculated from the proportion between C-value and B chromosome size measured in Ruiz-Ruano et al. (2011) ($C_B$ = 6.3 × 0.02539 = 0.1599 Gb). Because B chromosomes are mitotically unstable in this species (see Cabrero et al. 1984), $N_B$ for each male was the mean number of Bs per cell. For our present calculations, $N_B$ in the equation was the average of $N_B$ values for the four B-carrying males (1.31).
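As a worked sketch of the equations above, the following code computes the two weights from the values given in the text (C = 6.3 Gb, C_B = 0.1599 Gb, N_B = 1.31) and solves for the abundance in the B chromosomes. The function names are ours, and the two library abundances used as input are hypothetical values chosen only to illustrate the arithmetic.

```python
def chromosome_weights(c_gb, c_b_gb, n_b):
    """Weights of the A (2C) and B (C_B * N_B) chromosomes in a B-carrying genome."""
    total = 2 * c_gb + c_b_gb * n_b
    return 2 * c_gb / total, c_b_gb * n_b / total

def abundance_in_b(a_plus_b, a_a, w_a, w_b):
    """Solve A_+B = A_A * w_A + A_B * w_B for A_B."""
    return (a_plus_b - a_a * w_a) / w_b

C = 6.3        # haploid genome size in Gb (from the text)
C_B = 0.1599   # DNA content per B chromosome in Gb (from the text)
N_B = 1.31     # mean number of Bs in the sequenced B-carrying males (from the text)

w_a, w_b = chromosome_weights(C, C_B, N_B)

# Hypothetical abundances of one element in the B-lacking and B-carrying libraries.
a_a = 0.024
a_plus_b = 0.034
a_b = abundance_in_b(a_plus_b, a_a, w_a, w_b)
```

Because the B chromosomes account for less than 2% of the B-carrying genome, small differences between the two library abundances translate into large estimated abundances in the B chromosome, which is why averaging over several individuals per group matters.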

We divided each $A_B$ value by $N_B$ to obtain abundance estimates per B chromosome ($A_{1B}$) and then calculated the proportion of B sequences corresponding to each repetitive element, by dividing $A_{1B}$ by the sum of all $A_{1B}$ values higher than zero. This latter sum provides an estimate of the total proportion of repetitive DNA in the B chromosome. The number of copies per B chromosome for a given repetitive element can be estimated by multiplying $A_{1B}$ by total DNA content in the B chromosome ($C_B$) and dividing by element length. Likewise, the number of copies in the diploid A chromosome set can be calculated by multiplying its abundance in the B-lacking genome ($A_A$) by total DNA content in the A chromosomes ($2C$) and dividing by element length.
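The per-chromosome and copy-number calculations can be illustrated as below. This is only a sketch under stated assumptions: the element names, their abundances, and the 176-bp element length are hypothetical inputs, and the function names are ours.

```python
def per_b_estimates(a_b_by_element, n_b):
    """Return per-B abundances (A_1B), the total repeat fraction, and each element's share.

    Only elements with A_1B > 0 contribute to the total and receive a share.
    """
    a_1b = {name: a_b / n_b for name, a_b in a_b_by_element.items()}
    positive = {name: value for name, value in a_1b.items() if value > 0}
    repeat_fraction = sum(positive.values())
    shares = {name: value / repeat_fraction for name, value in positive.items()}
    return a_1b, repeat_fraction, shares

def copy_number(abundance, dna_content_bp, element_length_bp):
    """Copies = abundance * DNA content / element length."""
    return abundance * dna_content_bp / element_length_bp

N_B = 1.31             # from the text
C_B_BP = 0.1599e9      # DNA content per B chromosome, in bp (from the text)
TWO_C_BP = 2 * 6.3e9   # diploid A chromosome set, in bp (from the text)

# Hypothetical A_B values; the negative one mimics an element under-represented in +B libraries.
a_b = {"satX": 0.60, "teY": 0.05, "teZ": -0.01}
a_1b, repeat_fraction, shares = per_b_estimates(a_b, N_B)

copies_per_b = copy_number(a_1b["satX"], C_B_BP, 176)  # hypothetical 176-bp monomer
copies_in_a = copy_number(0.01, TWO_C_BP, 176)         # hypothetical A_A of 1%
```

Elements with negative estimates are kept in `a_1b` for inspection but excluded from the shares, mirroring the restriction to values higher than zero.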

Validation of B chromosome content by Illumina sequencing of microdissected B chromosomes

A B-carrying male collected in 2012 at Padul (Granada, Spain), located about 190 km from Los Barrios, was used to perform the microdissection of one B chromosome from a single diplotene cell (as described in Teruel et al. 2014). B chromosome DNA was amplified by the Phi29 DNA polymerase with the Illustra GenomiPhi V2 amplification kit (GE Healthcare Life Sciences) following the manufacturer’s instructions, and the DNA obtained was sequenced on the Illumina HiSeq2000 platform.

Search for sequence and structural variation specific to B chromosomes

We first selected the repetitive sequences that were clearly more abundant in the B-carrying genomes than in the B-lacking ones. In these sequences, we searched for two types of variation specific to B chromosomes: sequence variants in the form of single-nucleotide polymorphisms (SNPs) and structural variation in the form of insertions, deletions, and inversions. For this purpose, we selected Illumina reads showing homology with the reference sequences by means of the BLAT algorithm (Kent 2002). On the one hand, we mapped the reads with SSAHA2 (Ning et al. 2001) and selected SNPs showing no variation in B-lacking genomes but showing an additional variant exclusive to B-carrying genomes. We then calculated the frequency of both variants in gDNA and RNA from B-carrying individuals and calculated the transcription intensity (TI) for each variant as the quotient between its frequency in RNA and in gDNA from the same individual.
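The TI calculation can be sketched as follows. We assume here, for illustration only, that the frequency of a variant is the number of reads carrying it divided by all reads covering the SNP position; the function names and read counts are hypothetical.

```python
def variant_frequency(variant_reads, other_reads):
    """Fraction of reads at a SNP position that carry the variant of interest."""
    total = variant_reads + other_reads
    return variant_reads / total if total else 0.0

def transcription_intensity(rna_variant, rna_other, gdna_variant, gdna_other):
    """TI = frequency of the variant in RNA divided by its frequency in gDNA."""
    gdna_freq = variant_frequency(gdna_variant, gdna_other)
    if gdna_freq == 0:
        raise ValueError("variant not observed in gDNA; TI is undefined")
    return variant_frequency(rna_variant, rna_other) / gdna_freq

# Hypothetical counts for one B-specific variant in one B-carrying male:
# 20 of 100 gDNA reads and 10 of 100 RNA reads carry the variant.
ti = transcription_intensity(rna_variant=10, rna_other=90, gdna_variant=20, gdna_other=80)
```

Under this definition, a TI of zero indicates that the variant is present in the genome but not detected in the transcriptome, whereas values near or above one indicate that it is transcribed at least in proportion to its genomic frequency.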

On the other hand, we visualized the former SSAHA2 mappings with the IGV software (Thorvaldsdóttir et al. 2013) and searched for coverage variation along the contigs, which might indicate structural variation. For a direct comparison of structure in such regions between B-carrying and B-lacking libraries, we selected 2500 read pairs showing homology with a given element, per library and region, and performed a RepeatExplorer clustering for each region.

When the structural variations characteristic of B-carrying individuals, and hence of the B chromosome, were complex, we tried to extend the known contig sequence toward both sides by searching for 454 reads of gDNA showing homology with the 300–400 nt at both ends, by means of the BLASTN software (Altschul et al. 1990). We then assembled the reads with the Geneious v4.8 software (Drummond et al. 2009) and repeated the process several times until no more lengthening was possible.

PCR amplification and fluorescent in situ hybridization

We designed primer pairs by using the Primer3 software (Untergasser et al. 2012) (Table S1). For elements other than satellite DNA, the PCR program employed started with an initial denaturation step at 95 °C for 5 min, followed by 30 cycles of 94 °C for 30 s, annealing at 55, 60, or 65 °C for 30 s, and 72 °C for 30 s, and a final extension step at 72 °C for 7 min. For satellites, we used the primers and PCR conditions previously described in Ruiz-Ruano et al. (2016). All the PCR products used for FISH were labeled by nick translation with 2.5 units of DNA polymerase/DNase I (Invitrogen), following the standard protocol. The 18S rDNA probe was obtained by amplification of a 1113 bp fragment using the 18S-E and 1100R primers described in Littlewood and Olson (2001), according to PCR conditions described in Ruiz-Estévez et al. (2014). The physical mapping of these probes was performed by FISH following the protocol described in Cabrero et al. (2003). The probes were labeled with tetramethylrhodamine-5-dUTP or fluorescein-12-dUTP (Roche).

MinION validation of the TE chimera

To test the physical reliability of the TE chimera, we performed a sequencing experiment with the Oxford Nanopore MinION system using flow cell version R9. The 1D library preparation was carried out using the Nanopore Genomic Kit (SQK-LSK108) with the CleanNGS magnetic beads (CleanNA), with a starting DNA quantity of 5 μg without previous fragmentation. The sequencing was performed using Nanopore’s local base-calling software and yielded a total of 130 Mb in 63,346 reads. We first searched for reads showing homology with the TE chimera, using BLAST (Altschul et al. 1990), and then selected those reads aligning along their entire length with the chimera sequence, using Geneious (Drummond et al. 2009), mapped them with SSAHA2 (Ning et al. 2001), and visualized the mapping with IGV (Thorvaldsdóttir et al. 2013).

Results

Subtractive analysis of Illumina sequenced B-carrying and B-lacking genomic DNA

Global abundance analysis showed that about 64.1% of the B-lacking L. migratoria genome consists of repetitive DNA (Fig. 1 and Table S2), a figure slightly higher than the 60% estimated by Wang et al. (2014) in individuals from China. Remarkably, B-carrying libraries showed a slightly higher figure (64.6%) due to B chromosome enrichment in repetitive DNA (see below).

Fig. 1

a Subtractive repetitive landscape obtained from average counts for repetitive DNA elements observed in gDNA in four B-carrying and two B-lacking males of L. migratoria. It represents the difference in abundance for each repeat subclass showing different degrees of divergence with respect to the consensus sequence. It shows positive values for subclasses showing higher abundance in the B-carrying genomes than in the B-lacking ones, and negative ones for subclasses underrepresented in the B-carrying genomes. Note that B-carrying genomes are enriched in satellite DNA and histone genes (positive values) but impoverished in TEs (negative values). b Relative frequency of repetitive elements in A and B chromosomes, the former being inferred from the average counts in B-lacking males, and the latter as explained in materials and methods. The 0% in the B chromosome subfigure refers to the absence of mtDNA in it. c Fluorescent in situ hybridization (FISH) for the LmiSat02-176 satellite DNA (in red) on a mitotic metaphase cell from a diploid B-carrying L. migratoria embryo, merged with DAPI (blue). This satellite DNA is present across the whole B chromosome length, and bioinformatic analysis indicates that it constitutes more than half of the DNA content of the B chromosome. Note the presence of green signals, corresponding to 18S rDNA, on three pairs of A chromosomes (2, 6, and 9) and their absence on the B chromosome. Bar in c represents 5 μm

Out of 2841 repetitive elements showing positive results in RepeatMasker, our subtraction procedure indicated that 1533 of them showed higher abundance in the B-carrying genomes, suggesting that they are present in the B chromosome (Table S2). Illumina sequencing of the microdissected B chromosome showed the presence of 275 out of the 2841 elements detected by RepeatMasker. Considering both approaches, 1199 elements were not found in the B chromosome by either method, 1367 were found in the B by the subtractive procedure only, 109 were found by microdissection only, and 166 were found by both methods. Contingency analysis showed significant positive association between results with both methods (χ² = 5.03, df = 1, P = 0.025, odds ratio = 1.34, 95% CI: 1.04–1.72). Likewise, Spearman rank correlation analysis showed significant positive correlation between $A_{1B}$ values and coverage observed in the Illumina sequences obtained from the microdissected B chromosome (rS = 0.33, N = 1533, t = 13.7, P < 0.000001), suggesting that the higher the abundance of a repetitive element in the B chromosome, the higher the coverage in the DNA sequences obtained by B microdissection. In fact, 86% of the elements obtained by microdissection were predicted to be in the B chromosome by the subtraction approach.
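The contingency analysis can be retraced from the four counts given above. The sketch below assumes an uncorrected Pearson chi-square and a Woolf (log-scale) confidence interval for the odds ratio; the text does not state which formulas were used, but under these assumptions the results round to the reported values. The function name is ours.

```python
import math

def two_by_two(both, subtractive_only, microdissection_only, neither):
    """Pearson chi-square (no continuity correction), odds ratio, and Woolf 95% CI."""
    a, b, c, d = both, subtractive_only, microdissection_only, neither
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return chi2, odds_ratio, (low, high)

# Counts from the text: found by both methods, subtraction only,
# microdissection only, and neither method.
chi2, odds_ratio, ci = two_by_two(166, 1367, 109, 1199)
```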

Summarizing the information obtained by both methods at subclass level showed the presence of 48 subclasses of repetitive elements in the B chromosome (Table S3), 34 of which were detected by B microdissection, with a very highly significant positive correlation between the proportions resulting from both methods (rS = 0.79, t = 8.74, df = 46, P < 0.000001). Bearing in mind that the two methods were applied to individuals from two populations located 190 km apart, we consider that the high consistency between them supports the reliability of the B chromosome composition in repetitive DNA inferred here. In fact, all 11 repetitive classes into which the 48 subclasses can be summarized were present in the Illumina sequencing of the microdissected B chromosome, with very high positive correlation between the proportions yielded by both methods (rS = 0.81, t = 4.13, df = 9, P = 0.0026).
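The t values accompanying the Spearman coefficients appear consistent with the usual t approximation, t = rS·sqrt(df / (1 − rS²)), with df = N − 2. The sketch below applies that formula to the coefficients reported above; it is our reconstruction, not necessarily the software routine used, and small discrepancies are expected because the coefficients are rounded to two decimals.

```python
import math

def spearman_t(r_s, df):
    """t statistic for a rank correlation coefficient with df = N - 2 degrees of freedom."""
    return r_s * math.sqrt(df / (1 - r_s ** 2))

t_elements = spearman_t(0.33, 1533 - 2)  # element level, N = 1533
t_subclasses = spearman_t(0.79, 46)      # subclass level, df = 46
t_classes = spearman_t(0.81, 9)          # class level, df = 9
```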

Our subtractive approach indicated that 94.9% of B chromosome DNA is repetitive (Fig. 1, Tables S2 and S3), leaving about 5.1% of low and single-copy DNA. The most abundant types of repetitive DNA in the B chromosome are satellite DNA (65.2% of B chromosome content) and TEs, especially DNA transposons (7.9%) and LINEs (7%). In addition, the B chromosome contains several gene families, such as histone genes (2.7%), 45S (0.25%) and 5S (0.78%) rRNA genes, snRNA genes (1.3%), especially U2 (1.1%), and tRNA genes (0.7%) (Fig. 1 and Table S3). These results reveal remarkable differences in repetitive DNA content between A and B chromosomes since, in A chromosomes, TEs comprise 54% of total DNA (19.1% in the B chromosomes) whereas all remaining classes of repetitive elements sum to only 10%, with satellites representing a mere 2.4% (see also Ruiz-Ruano et al. 2016) (Fig. 1a, b). In brief, A chromosomes contain 64% of repetitive DNA and 36% of low and single-copy DNA. Reflecting this difference in composition between A and B chromosomes, the abundance of the 11 classes of repetitive DNA showed no significant correlation between A and B chromosomes (rS = 0.49, t = 1.69, df = 9, P = 0.125).

The presence of histone genes in the B chromosomes of this species had previously been shown by Teruel et al. (2010), by means of FISH analysis. However, previous FISH analyses had failed to detect the presence of 45S and 5S rDNA (Teruel et al. 2010), or else U1 (Anjos et al. 2015) or U2 snRNA genes (DC Cabral del Mello, personal communication) on the B chromosome, but all these genes show much lower abundance in the B chromosome than histone genes.

A single satellite DNA family comprises about half of repetitive DNA in the B chromosome

It was highly remarkable that a single satDNA (LmiSat02-176) comprised more than half of B chromosome DNA (54.8%), representing 84% of all satellite DNA contained in the B chromosome. This satellite is thus the most abundant repetitive element in the B chromosome, the remaining elements individually representing 3.1% or less of B chromosome content, and only three satellites (LmiSat02-176, LmiSat01-193, and LmiSat04-18) surpassed 1% of B content (Table S4). The physical mapping by FISH for all 59 satDNAs reported in Ruiz-Ruano et al. (2016) confirmed that LmiSat02-176 is extremely abundant in the B chromosome as it shows intense FISH signal covering its whole length (Fig. 1c). Five other satDNAs were observed by FISH on the B chromosome (Table 1 and Fig. S1b-f). It was noteworthy that the bioinformatic analysis suggested the presence of four satDNAs in the B chromosome showing higher abundance than the telomeric repeat (i.e., > 0.2%) but failing to show FISH signals on the B chromosome even though they did on A chromosomes (Table 1).

Table 1 FISH analysis of some repetitive elements found in the L. migratoria genome

TE abundance in the B chromosome

The most abundant TEs in the B chromosome were RTE-2_LMi (0.57% of B-DNA), followed by hAT-4_LMi (0.47%), Sola2-3N1_LMi (0.41%), Helitron-N11B_LMi (0.39%), and Penelope-42_LMi (0.38%) (Table S2). To get additional insights on the structure of the TEs found in the B chromosome, we selected seven TEs showing different levels of abundance in the B chromosome. Coverage analysis along their full length, performed by SSAHA2, revealed that the three most abundant TEs (hAT-4_LMi, Sola2-3N1_LMi, and Penelope-42_LMi) showed uniformly high coverage along most of their length in the B-carrying genomes (Fig. S2), whereas the TEs with intermediate (Penelope-59_LMi and Tx1-1_LMi) (Fig. 2) or low (Helitron-N17B_LMi and DNA2-4_LMi) (Fig. S3) abundance in the B chromosome showed an irregular coverage pattern by displaying high coverage only for a short region. Interestingly, the high coverage region in DNA2-4_LMi yielded a satellite DNA which was present only in B-carrying males, but it was not apparent by FISH analysis (see below).

Fig. 2

Coverage analysis for two selected TEs (Penelope-59_LMi (a) and Tx1-1_LMi (c)), each representing about 0.1% of B chromosome content. Note that only a discrete region of both TEs was highly overrepresented in the B-carrying libraries. In both cases, we selected the Illumina reads showing homology with the high-coverage region for each of these TEs, separately in B-carrying and B-lacking libraries, and then performed a RepeatExplorer run, which yielded different graphs for B-lacking and B-carrying libraries (see inset graphs). In the B-carrying graph, the cluster components shared with B-lacking ones are shown in green, whereas those specific to the B-carrying libraries (i.e., the TE chimera) are shown in orange. b, d The physical mapping by FISH for Penelope-59_LMi and Tx1-1_LMi, respectively. Note that, for both TEs, only the high coverage region showed conspicuous signals on an interstitial region of the B chromosome, located in the heterochromatic distal half adjacent to the euchromatic proximal region. However, no signal was found using probes for a region showing no coverage differences between 0B and +B individuals (insets in b and d). Bar in b and d represents 5 μm

To investigate the chromosome distribution of these seven TEs, we selected one or two regions in TEs with uniform or irregular coverage, respectively, for PCR amplification (Figs. 2, S2 and S3) and probe generation for FISH analysis. Remarkably, six of these TEs showed a large FISH signal on the same interstitial region of the B chromosome, but they were not apparent on A chromosomes, whereas the remaining TE (DNA2-4_LMi) showed a conspicuous cluster on autosome 9 but not on the B chromosome (Table 1, Figs. 2 and S4). Moreover, the two TEs showing intermediate abundance and irregular coverage (Penelope-59_LMi and Tx1-1_LMi) displayed the FISH signal in the interstitial region of the B chromosome only for the probe generated on the high coverage region, suggesting that the low coverage regions of these TEs are absent or poorly represented in the B chromosome (Fig. 2b, d). Taken together, these results revealed a very high consistency between the bioinformatic estimates of TE abundance in the B chromosome and those that can be inferred by FISH.

A TE chimera in the B chromosome

For a better understanding of the structure of the TE-rich interstitial region found in the B chromosome, we performed an additional bioinformatic analysis starting with the Penelope-59_LMi and Tx1-1_LMi elements. First, we searched for reads showing homology with the high coverage regions of these elements, in B-carrying and B-lacking libraries, and then, we performed RepeatExplorer clustering in each case. For both elements, the graphs obtained in the B-lacking library showed a linear pattern, typical for TEs, and the contig only contained the target TE flanking the selected region (Fig. 2a, c). In the B-carrying library, however, the graph showed two linear structures joined by the central (selected) region (insets in Fig. 2a, c). One of these linear structures corresponded with that found in the B-lacking genome, i.e., including target TE sequences, whereas the other contained the high coverage region of the target TE flanked by other, different TEs. This difference between B-carrying and B-lacking genomes suggested the existence of a TE chimera in the B chromosome.

In order to lengthen the known DNA sequence for these B chromosome contigs, we iteratively aligned 454 reads with the known DNA sequences and manually extended the known contig as much as possible (Fig. S5). The contig finally obtained was 17,327 nt long and contained fragments of 28 TEs and a full SINE, in different orientations (Table S5 and Fig. S6). We submitted this B chromosome chimeric sequence to GenBank with accession number KX611679. Remarkably, the bioinformatic search failed to find evidence for the presence of the TE chimera in the two B-lacking males analyzed here, reinforcing the conclusion that it is specific to B chromosomes. Finally, in order to validate the inferred TE chimera structure, we performed a MinION sequencing experiment on gDNA from one of the four B-carrying males analyzed here. The finding of 24 reads showing full length homology with different regions of the chimera sequence, ranging from 496 to 4430 nt (Fig. S5), supported the reliability of the structure inferred from the assembly of the 454 reads. However, we did not obtain reads long enough to provide further evidence on the complete structure of the TE chimera.

Some TEs residing in the B chromosome are transcriptionally active

We searched for SNPs showing variants specific to B-carrying individuals, i.e., absent from B-lacking genomes and transcriptomes. We named “Ref” the reference variant present in all individuals, and thus located on A chromosomes, and “B-Alt” the B-specific alternative variant. We then calculated the frequency of Ref and B-Alt variants in genome and transcriptome libraries and obtained an estimate of transcription intensity (TI) of the B-Alt variant as the quotient between its transcriptome and genome frequencies in each of the four B-carrying males. This analysis indicated that some of the TEs located in the B chromosome (Gypsy-52, Kolobok-4, and Sola2-3N1) showed no transcription, whereas others appear to be active (Fig. 3). For instance, hAT-4 showed transcriptional activity only for copies carrying SNPs but not for other copies carrying deletions (see also Table S6). However, the TE showing the highest TI, with high consistency between different SNPs, was Sola1-3_LMi, with slight differences between body parts and individuals (Fig. 3). In addition, although with lower intensity, we observed transcriptional activity for the Tx1-1 element, but only in testis.

Fig. 3

Transcription intensity (TI) for B-specific SNPs in B-carrying males, calculated as the quotient between the frequency of the B-Alt variant in RNA and in gDNA from the same individual

Discussion

B chromosomes in L. migratoria are highly enriched in repetitive DNA, especially satellite DNA

Our results suggest that, as a whole, repetitive DNA represents 94.9% of the B chromosome, confirming previous predictions that B chromosomes are enriched in repetitive DNA (for review, see Camacho 2005). It is noteworthy that the repetitive DNA composition of a B chromosome had never been quantified at the level of detail reported here. Remarkably, it differs greatly from that of the A chromosomes of L. migratoria, which are mainly composed of TEs (54.1%), whereas TEs constitute only 19.1% of B chromosome content. In contrast, the B chromosome is mainly made of satellite DNA (65.2%), with a single satellite (LmiSat02-176) constituting 54.8% of it, whereas satellites comprise only 2.4% of A chromosomes. In rye, A and B chromosomes contain similar proportions of the different classes of repeats, but B chromosomes show accumulation of B-specific satellite repeats and organellar DNA (Martis et al. 2012), and several transposons are either amplified or depleted on the B chromosome (Klemme et al. 2013).

A remarkable feature of L. migratoria B chromosomes is that they do not show B-specific satellite DNA families, the only exception being the satellite DNA that emerged within DNA2-4 TEs (see Fig. S3b), which was not apparent by FISH (results not shown). In contrast, in rye and the grasshopper Eumigus monticola, two satellite DNA families have been observed by FISH on B but not A chromosomes (Klemme et al. 2013; Ruiz-Ruano et al. 2017).

Bioinformatic analysis indicated that only three satellite DNAs (LmiSat02-176, LmiSat01-193, and LmiSat04-18) showed abundance higher than 1% in the B chromosome, representing 54.8, 3.1, and 2.2%, respectively, of all DNA sequences in it. However, only the first two were visualized in the B chromosome by FISH (Fig. S1). Three other satDNAs (LmiSat06-185, LmiSat53-47, and LmiSat07-5-tel) were also visualized by FISH on the B chromosome even though they showed lower abundances (0.2–0.37%). Only three of these six satDNAs (i.e., LmiSat01-193, LmiSat02-176, and LmiSat06-185) were found in the Illumina reads from the microdissected B chromosome (Table S4). All five satDNAs visualized on the B chromosome show a clustered pattern on A chromosomes (Ruiz-Ruano et al. 2016). Remarkably, the bioinformatic approach suggested that the B chromosome contains four other satDNAs (LmiSat09-181, LmiSat10-9, LmiSat16-278, and LmiSat18-210) with abundances (0.36–0.8%) higher than that of the least abundant satDNA visualized on the B chromosome by FISH (i.e., LmiSat07-5-tel). These four satellites are clustered on A chromosomes (Ruiz-Ruano et al. 2016) but were not apparent on the B chromosome, most likely because they form arrays shorter than 1.5 kb, which is the minimum threshold for FISH visualization (Schwarzacher and Heslop-Harrison 2000). The bioinformatic approach also suggested the presence of 27 other satDNAs on the B chromosome, in abundances ranging from 0.002 to 0.19%, but none of them was visualized by FISH or appeared in the Illumina reads of the microdissected B chromosome (Table S4). Likewise, our bioinformatic results suggested the presence of 45S and 5S rDNA, as well as U1 and U2 snRNA genes, on the B chromosome, but these have never been visualized there by FISH (Teruel et al. 2010; Anjos et al. 2015), perhaps because of their low abundance (1% or less) and their organization in small arrays. The actual presence of these satDNAs on the B chromosome cannot be established with the present approach, and future research may resolve this question. In any case, the 27 satDNAs represent, as a whole, only 3.5% of all satDNA content in the B chromosome, and their effect on global B chromosome content is negligible.

Whereas the 62 satDNA families found in the A chromosomes of L. migratoria represent only 2.4% of the genome (Ruiz-Ruano et al. 2016), those found in the B chromosome represent 65.2% of all B sequences. This is mainly due to the massive amplification of a single satellite (LmiSat02-176) in the B chromosome, as it constitutes almost 84% of all satDNA found in the B chromosome. In fact, the four B-carrying males analyzed carried 672,537 repeats of this satDNA, on average, whereas the two B-lacking individuals carried only 174,798 repeats, on which basis we estimate that about 497,739 repeats were present on B chromosomes. We can thus infer that a B chromosome in this species carries about twice as much LmiSat02-176 as all A chromosomes together, a fact that is consistent with FISH results (see Fig. 1c). The amplification of this satellite in the B chromosome might have been facilitated by the high frequency of B-bivalent formation during first meiotic prophase, with bivalents frequently persisting until metaphase I and thus being chiasmate (Cabrero et al. 1984), as this facilitates satellite amplification through unequal crossover (Smith 1976). In L. migratoria B chromosomes, chiasma formation is most frequent in the proximal euchromatic region (Cabrero et al. 1984), whereas LmiSat02-176 is present along the whole length of the B chromosome (Fig. 1c). It is thus conceivable that satDNA amplification in the B chromosome is more often achieved by unequal crossover between sister chromatids than by chiasma-dependent unequal crossover between homologs during meiosis, as suggested for ribosomal DNA (Eickbush and Eickbush 2007).
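
The arithmetic behind these estimates can be reproduced with the short Python sketch below. It uses only the values quoted in the text; the variable and function names are hypothetical, and the share of LmiSat02-176 is recomputed from the rounded percentages, so it may differ slightly from a value obtained with unrounded abundances.

```python
def estimate_b_repeats(mean_b_carrying, mean_b_lacking):
    """Repeats attributed to B chromosomes: the excess found in
    B-carrying over B-lacking individuals (a simple subtraction)."""
    return mean_b_carrying - mean_b_lacking


# Mean LmiSat02-176 repeat counts quoted in the text.
mean_b_carrying = 672_537  # four B-carrying males
mean_b_lacking = 174_798   # two B-lacking individuals

b_repeats = estimate_b_repeats(mean_b_carrying, mean_b_lacking)
print(f"Repeats attributed to B chromosomes: {b_repeats:,}")

# Share of LmiSat02-176 in the satDNA of the B chromosome, from the
# rounded percentages of B chromosome content given in the text.
lmisat02_pct_of_b = 54.8
satdna_pct_of_b = 65.2
share_pct = 100 * lmisat02_pct_of_b / satdna_pct_of_b
print(f"LmiSat02-176 share of B satDNA: {share_pct:.1f}%")
```

The subtraction assumes that the A chromosomes of B-carrying and B-lacking individuals carry the same average number of LmiSat02-176 repeats.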

Frequent TE insertion in an interstitial B chromosome region

At first sight, the low TE content of L. migratoria B chromosomes contrasts sharply with the general view that these dispensable chromosomes are havens for TE proliferation because, by landing on them, TEs do not decrease host genome fitness (for review, see Camacho et al. 2000 and Camacho 2005). In fact, the scarcity of TEs in the B chromosome shown here is consistent with previous FISH observations on B chromosomes of the grasshopper Eyprepocnemis plorans (Montiel et al. 2012) and with our present FISH analysis (see Fig. S4). Bearing in mind that B chromosomes in L. migratoria most likely derived from A chromosomes (Teruel et al. 2010), it is reasonable to infer that massive amplification of satellite DNA in the B chromosome (especially LmiSat02-176) explains the sharp difference in repetitive DNA content between A and B chromosomes, and is responsible for the loss of TEs from the B chromosome after its origin, through their replacement by satellite DNA. This would be feasible if the B chromosome had some constraint on maximum size, as suggested by the absence of B variants larger than the typical one observed in all natural populations (Pardo et al. 1995).

In this context, the presence in the B chromosome of a highly complex chimeric region including many different TEs, most of which are incomplete and thus inactive, is highly remarkable. The chimera comprised 17,327 bp and included DNA sequences showing homology with 29 TEs belonging to 18 different subclasses (see Table S5 and Fig. S6). The only full element was SINE/Lm2 (0.37% of TE DNA in the B chromosome), which shows homology with SINE/Lm1, a non-clustered sequence described by Bradfield et al. (1985). The reliability of this chimera was supported by the MinION sequencing.

This jumble of incomplete TEs suggests a scenario of frequent TE insertion in the B chromosome, with some elements inserted into others, successively inactivating one another. It also suggests that this region is dispensable for B chromosome maintenance, which would explain why it tolerates such a concentration of TE insertions. The TE chimera is located in an interstitial region of the B chromosome, within its heterochromatic half, next to the euchromatic proximal region. FISH analysis showed high abundance for six TEs located in this region (hAT-4_LMi, Helitron-N17B_LMi, Sola2-3N1_LMi, Penelope-42_LMi, Penelope-59_LMi, and Tx1-1_LMi), four of them forming part of the chimera (Fig. S6), in very high correspondence with our bioinformatic analysis of abundance in the B chromosome. The apparent clustering of these TEs in this region indicates that they are tandemly repeated, and the fact that different TEs yielded FISH signals of variable intensity, in high correspondence with the bioinformatic estimates of TE abundance, indicates that the methods developed here provide accurate estimates of TE abundance in B chromosomes. In any case, the presence of the TE chimera means that L. migratoria B chromosomes can act as evolutionary sinks for transposons, as previously suggested for the ribosomal DNA of B chromosomes in the grasshopper E. plorans and the R2 retrotransposon (Montiel et al. 2014).

Although we failed to find evidence for the presence of the TE chimera in the two B-lacking individuals analyzed here, we were highly surprised to find it in the L. migratoria genome published by Wang et al. (2014) (Fig. S6). Given the complex structure of the chimera, it is highly unlikely that it arose twice independently, and we believe that the L. migratoria genome sequencing project was performed on B-carrying individuals, as is also suggested by the abundance of LmiSat02-176 in their Illumina reads, which was twice that found in Spanish B-lacking individuals (Ruiz-Ruano et al. 2016). Likewise, Valente et al. (2014) noted a similar problem in the cichlid species Metriaclima zebra, and called attention to the need to karyotype individuals before starting a genome sequencing project, to avoid contamination by B chromosome sequences.

B chromosome origin

The present results on the repetitive DNA content of B chromosomes in L. migratoria have important implications for B chromosome origin. The only previous hypothesis on this subject claimed that the B chromosome had derived from autosome 8, as this was the only A chromosome carrying histone genes, like the B chromosome (Teruel et al. 2010). The observed sequence divergence for H3 and H4 genes between the A and B chromosomes suggested that the B chromosome arose more than 750,000 years ago (Teruel et al. 2010). This old age makes it difficult to identify the A chromosome from which the B chromosome derived, since over such a long evolutionary time the B chromosome may have experienced many changes in sequence and structure, giving it features that are unrecognizable on A chromosomes, such as the TE chimera. However, the absence of B-specific satellite DNAs is puzzling for a B chromosome this old.

The present results on satellite DNA abundance and location in the B chromosome are not fully consistent with the origin of this B chromosome from autosome 8, as autosome 9 is the only A chromosome carrying all five clustered satDNAs found on the B chromosome, a fact pointing to autosome 9 as a possible B ancestor. This is reinforced by the exclusive presence of LmiSat53-47 on these two chromosomes. However, evidence based on satellite DNA location is weakened by the intragenomic dissemination of this kind of DNA sequence (Ruiz-Ruano et al. 2016), as the coincidence in satDNA location between autosome 9 and the B chromosome could also be due to post-B-origin amplification of some of these five satDNAs, even if the B had derived from autosome 8. Therefore, with the available information, it is most parsimonious to consider both autosomes 8 and 9 as putative ancestors of the B chromosome. Likewise, in rye, NGS has shown that B chromosomes arose from two different A chromosomes (Martis et al. 2012).

Although we have greatly narrowed down the possible origin of the B chromosome in L. migratoria, this remains a difficult problem to solve because these B chromosomes are quite old (see above). Other kinds of genetic markers, especially protein-coding genes, would therefore help to complete this task, as would the completion of this species' genome, which would allow ascertaining which A chromosome carries the genes found in the B chromosomes, information that is more reliable than that coming from repetitive elements.