Introduction

Repeated sequences, including transposable elements (TEs) or transposons, tandem repeats, and other repetitive elements, contribute large fractions of many eukaryotic genomes (Wessler 2006). Once considered “junk DNA,” repetitive sequences are now recognized to play essential roles in variation of functional genes (Kobayashi et al. 2004) and genome evolution (Gao et al. 2016). They also serve as important components to form and maintain functional centromeres and telomeres (Maxwell et al. 2006; reviewed in Grewal and Jia 2007; Gao et al. 2015) and provide raw materials for emergence of new genes and other genetic innovations (for review: Long et al. 2003).

Transposons are mobile genomic elements that can move from one position to another in the host genome or even transfer horizontally between distantly related organisms (El Baidouri et al. 2014; Gao et al. 2018). Except for a handful of TEs that show insertion preferences and are found in specific regions such as telomeres (Maxwell et al. 2006), transposons are dispersed throughout eukaryotic genomes. In plant genomes, TEs are very dynamic, and homologous transposons have been difficult to identify even in closely related genomes (Piegu et al. 2006; Gao et al. 2009).

Tandem DNA repeats represent another type of repetitive sequence, organized into tandem arrays of multiple units that range from a few copies to thousands or more. Centromeric tandem repeats (CTRs) and telomeric tandem repeats (TTRs) are two major types of tandem repeats in higher eukaryotic genomes. In primates and flowering plants, some centromeres can contain megabase arrays of tandem repeats; these extremely repetitive structures make the centromeric regions difficult to sequence completely, even with the latest long-read sequencers (Bzikadze and Pevzner 2020). CTRs usually evolve rapidly, and lineage-specific centromeric repeats have been found in many genomes (Lee et al. 2005). In contrast to CTRs, TTRs tend to be highly conserved. For example, the (TTAGGG)n repeat is present in the telomeric regions of human (Homo sapiens) and other vertebrates (Meyne et al. 1989; Podlevsky et al. 2008). In plants, the (TTTAGGG)n repeat, which was originally identified in Arabidopsis thaliana (Richards and Ausubel 1988), is present in a wide range of plants including dicots, monocots, and algae (Petracek et al. 1990; Fuchs et al. 1995; reviewed in Peska and Garcia 2020). We hereafter refer to this repeat as the canonical plant telomeric repeat (CPTR). However, in some plants, such as members of the order Asparagales and Cestrum elegans, the Arabidopsis-type telomeric repeat has been replaced by TTAGGG, TTTTTTAGGG, or other types of tandem repeats (Peška et al., 2015; reviewed in Peska and Garcia 2020).

Despite the high conservation of plant telomeric tandem repeats in DNA sequence and cytological localization, the lengths of telomeric DNA tracts can differ tremendously among species and varieties as well as between different chromosomes within a nucleus (for review: Shippen and McKnight 1998). Additionally, telomere lengths may vary dramatically between developmental stages or tissues in some plants. For instance, in young embryos and undifferentiated calli of barley (Hordeum vulgare), the telomeres can reach 80 kb and 300 kb, respectively, whereas the telomere lengths in mature embryos and old leaves were estimated to be 30 kb and 23 kb (Kilian et al. 1995). By contrast, telomere length was found to be stable during growth and development in white campion (Melandrium album) (Riha et al. 1998), which likely reflects different mechanisms of telomere length regulation in different plants. Maintenance of a basic telomere length plays a pivotal role in cell differentiation, as significant shortening of telomeres can cause a higher frequency of chromosomal rearrangements and abnormal cells (for review: Graham and Meeker 2017). Telomere lengths may also affect flowering time in some plants such as Arabidopsis thaliana, rice (Oryza sativa), and maize (Zea mays) (Choi et al. 2021). Additionally, mutations that affect telomere length can result in progressive and severe developmental abnormalities in both germination and post-germination growth of vegetative organs (Hong et al. 2007). To recover telomere losses and maintain genomic stability, several mechanisms have evolved in eukaryotes, one of which is telomerase-associated telomere elongation. During this process, the telomerase reverse transcriptase (TERT) catalyzes the synthesis of short tandem motifs and extends the telomeres by using the telomerase RNA subunit as the template (for review: Peska and Garcia 2020).

Peanut (Arachis hypogaea L., AABB genome type, 2n = 4x = 40) is an important oil and food crop worldwide. The cultivated peanut is an allotetraploid derived from the interspecific hybridization of two diploid wild species, A. duranensis (AA genome type, 2n = 2x = 20) and A. ipaensis (BB genome type, 2n = 2x = 20), followed by a spontaneous whole-genome duplication (Seijo et al. 2007). As in many other flowering plants, peanut telomeres consist of multiple copies of the TTTAGGG repeat (Du et al. 2016; Zhang et al. 2016). However, other types of repetitive sequences in peanuts and their evolution are still poorly understood. In this study, we analyzed the peanut genome sequences and identified a new tandem repeat named TAR30, [TTTT(C/T)TAGGG]n. We found that TAR30 shows significant sequence similarity to CPTR, (TTTAGGG)n, in over 100 plant genomes. Our fluorescence in situ hybridization (FISH) analysis revealed that TAR30 is enriched in the interstitial regions of peanut chromosomes. Different FISH hybridization patterns of TAR30 were observed between a newly induced allotetraploid and its two wild diploid progenitors. Our results indicate that TAR30 is a homolog of CPTR and provide insights into the evolution of tandem repeats in peanuts.

Materials and methods

Plant materials

The seeds of cultivated peanut “Tifrunner” (Holbrook and Culbreath 2007) were provided by the peanut breeding lab at USDA-ARS in Tifton, GA, USA. Another peanut variety, “Runner IAC 886,” was obtained from the Active Germplasm Bank of Embrapa Genetic Resources and Biotechnology (Embrapa-Cenargen, Brasília, Brazil). Seeds of two diploid wild species, A. valida (PI 468154, BB genome type, 2n = 2x = 20) and A. stenosperma (PI 666100, AA genome type, 2n = 2x = 20), were obtained from the USDA seed bank. Additionally, a synthetic allotetraploid, ValSten, developed from the cross between the wild species A. valida and A. stenosperma (Gao et al. 2021) was also included. Seeds of ValSten used in this study were harvested from the earliest generation (S0) tetraploid plants.

Sequence analysis

Identification of tandem repeat sequences

The genome sequence of cultivated peanut (Bertioli et al. 2019) was initially analyzed with the LTR_Finder software (Xu and Wang 2007) using the default parameters, except that the minimum long terminal repeat (LTR) length was set to 50 bp and the minimum distance between LTRs to 100 bp. The annotated sequences were then manually inspected for structural features including LTRs, target site duplications (TSDs), and retrotransposase proteins. All sequences containing no retrotransposase protein were used in BLASTN searches against themselves to determine whether they were tandemly repetitive. The selected tandem repeats were further analyzed with the Tandem Repeats Finder program (Benson 1999) to determine their consensus patterns and repeat unit sizes. The consensus sequence of the tandem repeat was generated with the WebLogo website (https://weblogo.berkeley.edu/logo.cgi).
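
The idea of testing whether a candidate sequence is tandemly repetitive can be illustrated with a minimal Python sketch. It is not the algorithm used by BLASTN or Tandem Repeats Finder; it simply compares a sequence with shifted copies of itself and reports the shortest shift (period) at which most positions agree. The function name `best_period` and the thresholds `max_period` and `min_identity` are hypothetical choices made for this illustration.

```python
def best_period(seq, max_period=50, min_identity=0.8):
    """Return (period, identity) for the shortest shift at which the sequence
    matches itself at >= min_identity of positions, or None if no such shift.

    Naive illustration only; dedicated tools such as Tandem Repeats Finder
    use a more elaborate statistical model that tolerates indels.
    """
    seq = seq.upper()
    for period in range(1, min(max_period, len(seq) // 2) + 1):
        compared = len(seq) - period
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
        identity = matches / compared
        if identity >= min_identity:
            return period, identity
    return None


if __name__ == "__main__":
    # A perfect array of a 10-bp unit is reported with period 10.
    print(best_period("TTTTCTAGGG" * 20))
```

A sequence flagged in this way would still need to be examined with a dedicated program to define the consensus unit and copy number.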

Identification of transposons in tandem repeat sequences

The peanut repeat database, including both transposons and tandem repeats, was used to screen the assembled peanut genome (Bertioli et al. 2019) with the RepeatMasker program (http://www.repeatmasker.org) using the default parameters, except that the “nolow” option was used to avoid masking low‐complexity DNA regions. The hits masked by the tandem repeat TAR30 were then checked, based on the locations of both TAR30 and transposons in the peanut genome, to determine whether transposons had inserted into TAR30 sequences.
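
The location-based check described above can be sketched in Python as a simple interval comparison. This is only an illustration under stated assumptions: the tuple layout `(chromosome, start, end)`, the function name `interrupting_transposons`, and the `max_gap` tolerance are hypothetical and do not correspond to RepeatMasker output fields or to the criteria applied during manual inspection.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def interrupting_transposons(tar30: List[Interval],
                             transposons: List[Interval],
                             max_gap: int = 100) -> List[Interval]:
    """Return transposon intervals flanked on both sides by TAR30 hits.

    A transposon is reported when a TAR30 hit ends within max_gap bp upstream
    of its start and another TAR30 hit begins within max_gap bp downstream of
    its end, on the same chromosome.
    """
    found = []
    for chrom, te_start, te_end in transposons:
        left = any(c == chrom and 0 <= te_start - end <= max_gap
                   for c, _, end in tar30)
        right = any(c == chrom and 0 <= start - te_end <= max_gap
                    for c, start, _ in tar30)
        if left and right:
            found.append((chrom, te_start, te_end))
    return found


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    tar30_hits = [("chr1", 1, 1000), ("chr1", 5001, 6000)]
    te_hits = [("chr1", 1001, 5000), ("chr2", 1001, 5000), ("chr1", 8000, 9000)]
    print(interrupting_transposons(tar30_hits, te_hits))
```

Candidates returned by such a screen would still require manual inspection, for example to confirm target site duplications.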

PCR analysis

PCR analysis was performed following our previous protocol (Gao et al. 2009). Briefly, genomic DNA was extracted from young leaves of Tifrunner plants with the DNeasy Plant Mini Kit (QIAGEN, Venlo, Netherlands). Amplification was conducted in an MJ Research PTC-200 thermal cycler using 20 ng genomic DNA, 1.5 mM MgCl2, 1.0 unit Taq DNA polymerase, 0.2 mM dNTP, 0.2 mM primer, 1 × PCR buffer, and ddH2O to a final volume of 25 μl. The temperature cycling conditions were 5 min at 95 °C, followed by 35 cycles of 95 °C for 50 s, 55 °C for 50 s, and 72 °C for 1 min, and a final extension at 72 °C for 5 min. The primers for amplifying the TAR30 tandem repeat were 5′-ATTTGGAGTTTGGAGTTTAGG-3′ (forward) and 5′-GGCGATATAAATAGGACGAAT-3′ (reverse). Amplicons were purified with QIAquick PCR purification kits (QIAGEN, Venlo, Netherlands) and used to prepare the probe for FISH.

Fluorescence in situ hybridization (FISH)

Slide preparation

At least five root tips, each approximately 5–10 mm long, were collected from 4-week-old plants (five plants per genotype) and treated with 2 mM 8-hydroxyquinoline for 2 h at room temperature, followed by an additional hour at 4 °C in fresh hydroxyquinoline solution (Fernández and Krapovickas, 1994). The samples were incubated in a fixative solution of absolute ethanol:glacial acetic acid (3:1, v/v) for 12 h at 4 °C, and spreads of somatic chromosomes were prepared according to a previous protocol (Schwarzacher and Heslop-Harrison 2000). Meristems were digested in 10 mM citrate buffer containing 2% cellulase from Trichoderma viride (Onozuka R-10; Serva, Heidelberg, Germany) and 20% pectinase from Aspergillus niger (Sigma-Aldrich, Darmstadt, Germany) for 2 h at 37 °C. Chromosomes of each root were spread in a drop of 45% (v/v) acetic acid on a slide by applying gentle pressure to the coverslip. The slides were selected under phase contrast with an Axioskop microscope (Zeiss, Oberkochen, Germany). Coverslips were removed, and the slides were air-dried for 24 h and kept at −20 °C until use.

Probe preparation, hybridization, and imaging

The purified TAR30 DNA sequence was labeled with digoxigenin-11-dUTP using the Nick translation kit (10976776001, Roche Diagnostics GmbH, Mannheim, Germany). The 5S ribosomal DNA sequence (5S rDNA) was obtained from clones of Lotus japonicus (Pedrosa et al. 2002), and the corresponding probe was used as a positive technical control. The 5S rDNA sequence was isolated with the Illustra Plasmid Prep Midi Flow kit (28904269, GE Healthcare, Chicago, IL) and labeled with Cy3-dUTP (GEPA53022, Roche Diagnostics GmbH, Mannheim, Germany) by nick translation.

FISH experiments were performed following a previous protocol (Schwarzacher and Heslop-Harrison 2000). The TAR30 and 5S rDNA probes were used simultaneously for hybridization (~50 ng of each probe per slide). Hybridizations were performed for 14 h at 37 °C, followed by washes at approximately 73% stringency in 2× saline citrate buffer (SSC). The chromosome sites that hybridized with the digoxigenin-11-dUTP-labeled TAR30 probe were detected immunocytochemically with an anti-digoxigenin antibody conjugated to fluorescein (11207741910, Roche Diagnostics GmbH, Mannheim, Germany), while the 5S rDNA loci, whose probe was labeled with Cy3-dUTP, were detected by direct observation with a Zeiss AxioPhot epifluorescence microscope (Zeiss, Oberkochen, Germany). All slides were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) before observation. Images were captured using a Zeiss AxioCam MRc digital camera (Zeiss Light Microscopy, Göttingen, Germany) and the AxioVision Rel. 4.8 software, and were further processed uniformly as whole images using Adobe Photoshop CS.

Results

A new tandem repeat in cultivated peanut genome

In the process of annotating LTR retrotransposons in the genomes of wild rice species and other plants (Gao et al. 2009, 2016), we recognized that not all sequences identified by the LTR_Finder program (Xu and Wang 2007) were true LTR retrotransposons, and some of them were false positives including tandem repeats and others. We analyzed the cultivated peanut genome (Bertioli et al. 2019) with the LTR_Finder program and identified a 6,030-bp sequence containing some typical features of LTR retrotransposons including the LTR with 5′TGT … ACA3′ and 5-bp TSD (Fig. 1a). To identify the putative retrotransposase protein and to define the superfamily of this “potential LTR retrotransposon,” the 6,030-bp sequence was used as a query to conduct a BLASTX search against the NCBI non-redundant protein sequences (https://blast.ncbi.nlm.nih.gov). No significant hit was found even with a lower E-value cutoff (E value = 0) suggesting that this sequence encodes no homologous protein to any protein in the NCBI protein database. No predicted protein was found for the sequence with the gene annotation program GENSCAN (Burge and Karlin, 1998). To further test if this sequence is related to any previously characterized LTR retroelements, the 6,030-bp sequence was used to search against the comprehensive LTR retrotransposon database, Gypsy Database (GyDB) (Llorens et al. 2011), and no hit was found. The BLAST searches against both NCBI and GyDB indicated that this is either a non-autonomous LTR retroelement or a false positive LTR retroelement.

Fig. 1

Sequence structure of the 6014-bp sequence (a) and WebLogo of three tracts of TAR30 (b). The false positive element shows some structural features of LTR retrotransposons: it starts with TGT, ends with ACA, and is flanked by a 5-bp TSD. However, it contains multiple units of a tandem repeat, indicated by red arrows

We further analyzed the 6,030-bp sequence with the Tandem Repeats Finder (Benson 1999) to determine whether it contained any tandem repeats. The results indicated that this sequence is organized into three tracts, each containing 125 to 316 copies of a 10-bp monomer, TTTT(C/T)TAGGG (Fig. 1a, b). This analysis therefore indicated that the sequence is a new tandem repeat in peanut, which we named TAR30 (TAndem Repeat 30).
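
As a minimal illustration of the TTTT(C/T)TAGGG motif, the following Python sketch locates runs of perfectly matching 10-bp units with a regular expression and counts the copies in each run. It is a simplification under stated assumptions: the function name `tar30_tracts` and the `min_copies` threshold are hypothetical, and, unlike Tandem Repeats Finder, the pattern tolerates no mismatches or indels.

```python
import re

# One or more consecutive copies of the 10-bp unit TTTT(C/T)TAGGG.
TAR30_RUN = re.compile(r"(?:TTTT[CT]TAGGG)+")
UNIT_LENGTH = 10


def tar30_tracts(seq, min_copies=2):
    """Return (start, end, copies) for each run of perfect TAR30 units.

    Coordinates are 0-based, end-exclusive. Imperfect units split a run,
    so real arrays will appear more fragmented than with a dedicated tool.
    """
    tracts = []
    for match in TAR30_RUN.finditer(seq.upper()):
        copies = (match.end() - match.start()) // UNIT_LENGTH
        if copies >= min_copies:
            tracts.append((match.start(), match.end(), copies))
    return tracts


if __name__ == "__main__":
    example = "ACGA" + "TTTTCTAGGG" * 3 + "TTTTTTAGGG" * 2 + "ACGA"
    print(tar30_tracts(example))
```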

Identification of homologous sequences of TAR30 in plant genomes

To determine whether homologous sequences of TAR30 are present in other plant genomes and to better understand its evolutionary origin, the TAR30 DNA sequence was used to conduct BLASTN searches against the plant genomic sequences in GenBank. To avoid false positives, we used stringent criteria: a homologous sequence alignment of over 200 bp and a low E value (≤ 1 × e−10). For genomes in which multiple significant hits were detected, the sequence with the best alignment score was designated as the TAR30 homolog. TAR30 homologous sequences were found in 160 plants including 139 flowering plants, three lycophytes, and 18 algae (Table 1, Supplementary Table 1). All the homologous sequences were analyzed with the Tandem Repeats Finder program (Benson 1999) to determine the basic motifs of the tandem repeats. The homologous sequences from 70% (112/160 × 100%) of the plant genomes contain the TTTAGGG repeat that has been detected in the telomeric regions of many plants including Arabidopsis (Richards and Ausubel 1988), barley (Röder et al. 1993), and rice (Mizuno et al. 2006). We refer to this sequence as (T)3A(G)3, where (T)3 and (G)3 represent three thymines (TTT) and three guanines (GGG), respectively. This number is likely an underestimate because we used very strict criteria. For example, homologous sequences of TAR30 with the TTTAGGG motif were found in Zizania latifolia, Cenchrus americanus, Ensete ventricosum, Brassica juncea, and six other plants, which were not included because of either a short alignment (less than 200 bp) or a higher E value (> 1 × e−10). Additionally, the homologs of TAR30 in 17 plant genomes contain the consensus pattern TTTTAGGG or TTTTTAGGG (Table 1, Supplementary Table 1). Thus, our sequence comparisons suggest that TAR30 is homologous to CPTR and is likely related to the TTTAGGG telomeric repeat.

We also identified significant hits of TAR30 in 31 plant genomes in which the homologous sequences consisted of other types of tandem repeats or other sequences; their chromosomal distributions are not clear. The TTTAGGG repetitive sequences may be present in some of these 31 plant genomes but were not sequenced or assembled. Another possibility is that the TTTAGGG repeat evolved rapidly or has been lost in these plants (Sýkorová et al., 2006).
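
The hit-selection criteria described above (alignment over 200 bp, E value ≤ 1 × e−10, best-scoring hit per genome) can be sketched in Python as follows. The tuple layout `(genome, alignment_length, evalue, score)`, the function name `best_homologs`, and the example hits are hypothetical; they are not taken from the actual BLASTN output or pipeline used in this study.

```python
def best_homologs(hits, min_length=200, max_evalue=1e-10):
    """Keep, for each genome, the highest-scoring hit that passes both filters.

    hits: iterable of (genome, alignment_length, evalue, score) tuples.
    Returns a dict mapping genome -> (alignment_length, evalue, score).
    """
    best = {}
    for genome, length, evalue, score in hits:
        if length > min_length and evalue <= max_evalue:
            if genome not in best or score > best[genome][2]:
                best[genome] = (length, evalue, score)
    return best


if __name__ == "__main__":
    # Hypothetical hits for illustration only.
    example_hits = [
        ("species_1", 250, 1e-50, 300.0),
        ("species_1", 400, 1e-80, 500.0),  # better score, replaces the first
        ("species_2", 150, 1e-40, 200.0),  # alignment too short
        ("species_3", 300, 1e-5, 60.0),    # E value too high
    ]
    print(best_homologs(example_hits))
    # Share of genomes with the TTTAGGG motif reported in the text.
    print(f"{112 / 160:.0%}")
```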

Table 1 Summary of different types of TAR30 homologs in 160 plant genomes

TAR30 was also used to conduct BLASTN searches against the sequences from 245 plant mitochondria and 2216 plant chloroplasts in GenBank, but no homologous hit was detected suggesting the homologs of TAR30 are only present in plant nuclear genomes.

Chromosomal locations of TAR30

TAR30 was used as the probe in FISH analyses to determine its genomic distribution. 5S rDNA was included as a positive control and as a marker for chromosomes A3 and B3. In both peanut cultivars, “Runner IAC 886” (Fig. 2a) and “Tifrunner” (Fig. 2b), 40 chromosomes were observed: 20 chromosomes with DAPI+ bands (bright white) on the centromere regions, corresponding to subgenome A, and 20 chromosomes lacking DAPI+ signals on the centromeres, corresponding to subgenome B. FISH signals of the TAR30 probe (green signals) were detected in the proximal region of one pair of chromosomes of subgenome A (white arrows in Fig. 2) in both cultivars, and in a similar region of three pairs of chromosomes of subgenome B in “Runner IAC 886” (yellow arrows in Fig. 2a) but not in subgenome B of “Tifrunner” (Fig. 2b). As expected, 5S rDNA loci (red signals) were identified in the cyt-A3 and cyt-B3 chromosomes in both peanut cultivars. No overlap of TAR30 and 5S rDNA was detected in these chromosomes.

Fig. 2

FISH analysis of two peanut cultivars, “Runner IAC 886” (a) and “Tifrunner” (b). DAPI+ bands (bright white) on centromere regions of the chromosomes of subgenome A lacking in another 20 chromosomes of the subgenome B. Hybridization with TAR30 probe (green signals) on the proximal region of one pair of chromosomes of the subgenome A (white arrows) and hybridization signals (green) in three pairs of chromosomes in “Runner IAC 886” (yellow arrows), together with the 5S rDNA probe (red signals) (a, b) in chromosomes cyt-A3 (A3) and cyt-B3 (B3). Scale bar: 5 μm

To compare the distribution patterns of TAR30 between the two diploid species that generated the synthetic allotetraploid ValSten, and to better understand TAR30 distribution in the chromosomes of wild diploids and of the newly obtained allotetraploid genotype, further FISH analysis of TAR30 was conducted together with detection of 5S rDNA loci. Eighteen chromosomes without DAPI+ bands and two chromosomes with DAPI+ bands (B3) were found in the mitotic cells of A. valida (BB-type genome). TAR30 FISH signals were observed in the proximal regions of all 20 chromosomes, although the signals were more evident in some chromosomes than in others (Fig. 3a). An overlap of TAR30 and 5S rDNA hybridization signals was observed in chromosome B3, indicating overlap or close proximity of this ribosomal DNA region with the TAR30 sequence. All twenty chromosomes of A. stenosperma (AA-type genome) had DAPI+ bands, and the TAR30 hybridization signals were found on the proximal regions of only three pairs of chromosomes (white arrows in Fig. 3b).

Fig. 3

FISH analysis of TAR30 and 5S rDNA in A. valida (a), A. stenosperma (b), and the new synthetic allotetraploid ValSten (c). DAPI+ bands (bright white) on the centromere regions of the 10 pairs of chromosomes of the AA wild type (b), lacking in 20 chromosomes of BB wild type (a), while present in half of the 40 chromosomes of ValSten (c). Hybridization with TAR30 probe (green signals) in all chromosomes of A. valida (a), in three pairs of chromosomes in A. stenosperma (b) (white arrows), and in 18 pairs of chromosomes in ValSten (c). 5S rDNA loci (red signals) were detected in only one pair of chromosomes, cyt-B3 (B3) in A. valida (a) and cyt-A3 (A3) in A. stenosperma (b) and in both pairs of chromosomes (cyt-A3 and B3; A3 and B3) in ValSten (c). Hybridization signals of both probes (green and red) were detected in cyt-B3 of A. valida and cyt-B3 and A3 in ValSten (c). A10 with short arm and proximal segment of the long arm (A10*) and the satellite region (A10°) and small pair A (A9). Scale bars: 5 μm

Forty mitotic chromosomes were observed in the new synthetic allotetraploid ValSten (Gao et al. 2021), equal to the sum of the chromosome numbers of its diploid parental species, A. stenosperma and A. valida. FISH signals were detected in 36 of them, with signal strength varying among chromosomes; signals were absent only from the chromosome pairs A9 (small pair) and A10 (SAT chromosomes) (Fig. 3c). The 36 signals most likely correspond to 18 pairs of chromosomes, which exceeds the combined number of chromosomes with TAR30 signals in A. stenosperma (six chromosomes) and A. valida (twenty chromosomes). The colocalization of TAR30 and 5S rDNA hybridization signals on chromosome B3 of ValSten was similar to that on B3 of A. valida. The TAR30 and 5S rDNA hybridization signals were also colocalized on A3 of ValSten, but not in the A genome species, A. stenosperma, indicating differences in genome organization after tetraploidization.

The newly synthesized tetraploid ValSten showed TAR30 FISH signals in more chromosomes than either cultivated peanut variety. The hybridization signals in ValSten were evidently closer to the centromere region, whereas the signals in the two peanut cultivars were dispersed along chromosome arms.

Coexistence of TAR30 with TTTAGGG telomeric repeat

Previous cytogenetic analyses detected FISH signals of the TTTAGGG tandem repeat in several Arachis species, including cultivated peanut and its two wild progenitors (Du et al. 2016; Zhang et al. 2016); however, the abundance of the telomeric repeat in the peanut genomes is not clear. We used the telomeric repetitive sequence of Arabidopsis (Richards and Ausubel, 1988) to search against the genomes of wild and cultivated peanuts (Bertioli et al. 2016, 2019). Multiple significant hits (< 1 × e−100) with the basic motif TTTAGGG were identified, confirming the presence and high conservation of the TTTAGGG repeat in peanuts.

Highly repetitive sequences make some regions, including centromeres and telomeres, difficult to sequence completely and assemble correctly (Bzikadze and Pevzner 2020). To estimate their abundance more accurately, both TAR30 (PIVG01000013: 67,234,205–67,235,790) and TTTAGGG repeats in A. hypogaea (PIVG01000012: 72,642–74,255) were used to screen the PacBio long-read sequencing data sets from cultivated peanut and its diploid wild parents (Bertioli et al. 2016, 2019). Our analysis indicated that the TTTAGGG telomeric repeat is more abundant than TAR30 in both wild and cultivated genomes (Supplementary Fig. 1). The coverage of the TTTAGGG telomeric repeat in cultivated peanut was nearly equal to the sum of that detected in the two diploid wild species. However, TAR30 coverage in the cultivated peanut genome was two times greater than the sum of that in the two diploid wild species, suggesting recent amplification of TAR30 in cultivated peanut after polyploidization or domestication.

Most of the masked long reads contained either TAR30 or TTTAGGG repeat; however, some reads contained both tandem repeats. All these reads were extracted and manually inspected. Twenty-six reads from cultivated peanut harbored both TAR30 and TTTAGGG telomeric repeat which were organized into different tracts (Fig. 4). The coexistence of TAR30 and TTTAGGG repeat was also found in 21 reads from A. ipaensis and in 10 reads from A. duranensis. Our long-read surveys show that TAR30s and TTTAGGG repeats are mostly located in different genomic regions, although some of them colocalized.
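
Sorting reads by repeat content can be sketched in Python as shown below. This is a simplified illustration, not the screening procedure applied to the PacBio data: it uses exact-match regular expressions on both strands, whereas real long reads contain sequencing errors. The names `classify_read` and `MIN_COPIES` are hypothetical. Requiring several tandem copies matters because a single TTTTTTAGGG unit of TAR30 contains the 7-bp string TTTAGGG.

```python
import re

MIN_COPIES = 3  # assumed threshold: at least three perfect tandem units


def _tandem(unit_forward, unit_reverse, copies=MIN_COPIES):
    """Compile a pattern matching >= copies tandem units on either strand."""
    return re.compile("(?:%s){%d,}|(?:%s){%d,}"
                      % (unit_forward, copies, unit_reverse, copies))


# TTTT(C/T)TAGGG and its reverse complement CCCTA(A/G)AAAA.
TAR30 = _tandem("TTTT[CT]TAGGG", "CCCTA[AG]AAAA")
# TTTAGGG and its reverse complement CCCTAAA.
CPTR = _tandem("TTTAGGG", "CCCTAAA")


def classify_read(seq):
    """Return 'both', 'TAR30', 'CPTR', or 'neither' for a read sequence."""
    seq = seq.upper()
    has_tar30 = TAR30.search(seq) is not None
    has_cptr = CPTR.search(seq) is not None
    if has_tar30 and has_cptr:
        return "both"
    if has_tar30:
        return "TAR30"
    if has_cptr:
        return "CPTR"
    return "neither"


if __name__ == "__main__":
    read = "TTTTCTAGGG" * 4 + "ACGTACGT" + "TTTAGGG" * 4
    print(classify_read(read))
```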

Fig. 4

Five PacBio long reads of cultivated peanut which contain both TAR30 and TTTAGGG telomeric repeats. The orientations of different tandem repeat tracts are indicated by the black arrows (TAR30) or gray triangles (TTTAGGG)

Expression and genic contributions of TAR30

Tandem repeats play important roles not only in functional centromeres and telomeres but also in gene evolution, as they may serve as coding sequences and generate structural and functional divergence of genes between relatives (Gemayel et al. 2012; Sulovari et al. 2019). We searched against the cDNA sequences in GenBank and identified 12 significant hits (< 1 × e−10), including 11 cDNAs from developing embryos or young leaves of Tifrunner (Supplementary Table 2). These cDNAs were used to conduct a BLASTX search against the NCBI protein sequences (https://blast.ncbi.nlm.nih.gov); no significant hit was found for 10 cDNAs, suggesting that these cDNAs likely represent non-coding sequences such as untranslated regions (UTRs) or encode new or unique proteins not yet annotated in GenBank. However, one TAR30-related cDNA sequence (LN672472), which was collected from mixed samples of the Chinese peanut cultivar Minhua6 treated with various biotic and abiotic stresses, showed sequence similarity to glutaminyl-peptide cyclotransferase (2 × e−29). These results indicate that some TAR30 sequences are expressed and likely serve as either non-coding RNA or parts of functional genes expressed during stress. To gain a global view of its impact on genes, we compared the locations of TAR30 sequences with the annotated genes in cultivated peanut (Bertioli et al. 2019). Thirteen genes were found to overlap with TAR30 sequences, indicating that TAR30 sequences have been recruited as parts of these genes (Table 2). They include a gene encoding an MYB family transcription factor (TF), a family that plays key roles in regulatory networks controlling development, metabolism, and responses to biotic and abiotic stresses (for review: Dubos et al. 2010).

Table 2 A list of TAR30 related annotated genes in A. hypogaea Tifrunner

Identification of transposons in TAR30 tracts

We screened the cultivated peanut genome with our repeat database, including both transposons and tandem repeats, and identified 92 tracts of TAR30 (here we define tracts as tandem arrays of TAR30 larger than 800 bp). We then manually inspected the TAR30 tracts and compared them with the genomic distributions of transposons. Thirty-five TAR30 tracts were interrupted by transposons, including both class I and class II transposons, but LTR retrotransposons were dominant in both abundance and coverage (Supplementary Table 3). In one 48,271-bp tract on chromosome 1, a mutator transposon, Ah_mu4, inserted into the TAR30 repeats and then served as the host element for four LTR retrotransposons (Fig. 5). Among the four LTR retrotransposons, Ah_Feral is a complete retroelement with two intact LTRs flanked by a 5-bp TSD, which allowed us to calculate the insertion time of this element based on the sequence divergence of its two LTRs (Gao et al. 2009). Our result indicates that this element inserted into the peanut genome about 0.35 million years ago (MYA). To gain further insight into the genomic dynamics mediated by TAR30 and transposons, a 2-kb sequence from each end of the tract (1–2000, 46,272–48,271), which includes both TAR30 and mutator transposon sequences, was used to search against GenBank. Both 2-kb ends were detected in subgenome A of A. hypogaea and in A. duranensis (AA genome) but not in subgenome B of A. hypogaea or in A. ipaensis (BB genome); thus, all these transposons likely inserted into the TAR30 region after the divergence of the A and B genomes, which occurred about 2.2 MYA (Bertioli et al. 2016).
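
Dating an LTR retrotransposon from the divergence of its two LTRs is commonly done with the relation T = K / (2r), where K is the number of substitutions per site between the LTRs and r is the substitution rate per site per year. The Python sketch below illustrates this calculation under stated assumptions: the Jukes-Cantor correction and the rate of 1.3 × 10^-8 are assumptions made for the example, since the text does not state which distance model or rate was used, and all function names are hypothetical. The helper `end_windows` simply reproduces the coordinates of the two 2-kb ends of a tract.

```python
import math


def jc69_distance(ltr_a, ltr_b):
    """Jukes-Cantor distance between two aligned LTRs (gap columns ignored)."""
    if len(ltr_a) != len(ltr_b):
        raise ValueError("LTR sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(ltr_a.upper(), ltr_b.upper())
             if a != "-" and b != "-"]
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of mismatches
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_years(k, rate=1.3e-8):
    """T = K / (2r); the default rate is an assumption, not from the text."""
    return k / (2 * rate)


def end_windows(tract_length, size=2000):
    """1-based inclusive coordinates of the first and last `size` bp."""
    return (1, size), (tract_length - size + 1, tract_length)


if __name__ == "__main__":
    # Hypothetical aligned LTRs differing at 1 of 100 positions.
    k = jc69_distance("ACGT" * 25, "ACGT" * 24 + "ACGA")
    print(k, insertion_time_years(k))
    print(end_windows(48271))
```

Under the assumed rate, a divergence of about 0.0091 substitutions per site would correspond to roughly 0.35 million years; a different rate or distance model would shift the estimate proportionally.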

Fig. 5

Graphic display of nested transposons. An array of TAR30 is interrupted by a complete mutator transposon which is flanked by 9-bp TSD (TTAGGAATT) and serves as the host element for four LTR retrotransposons. The concave-pointed rightwards arrows represent intact LTRs while the ribbon arrows mean truncated LTRs. The insertion time of the retrotransposon with two intact LTRs was calculated and is shown

Discussion

Evolutionary origin of TAR30

In this study, we analyzed the cultivated peanut genome and identified a new tandem repeat, TAR30, with the 10-bp basic motif TTTT(C/T)TAGGG. Comparative analyses revealed that TAR30 is a homolog of the Arabidopsis-type telomeric repeat TTTAGGG in 112 plant genomes. Additionally, the homologous sequences of TAR30 in 17 other plants contained TG-rich tandem repeats such as TTTTAGGG, which represents another type of plant telomeric repeat found in Chlamydomonas (Petracek et al. 1990; reviewed in Peska and Garcia 2020). It should be noted that strict criteria (> 200-bp hit size and E value of ≤ 1 × e−10) were used to define the homologous sequences of TAR30. All homologous sequences with TTTAGGG or other TG-rich motifs were the hits with the best alignment scores and lowest E values. From this, we conclude that TAR30 is a homolog of CPTR and likely shares a common evolutionary origin with the canonical TTTAGGG repeat. No homolog of TAR30 was found in the sequenced plant mitochondria and chloroplasts, suggesting that TAR30 was likely derived from variations of nuclear DNA sequence. Given that the basic motif of TAR30 is found in both the A and B genomes of wild and cultivated peanuts but not in other sequenced legumes such as soybean (Glycine max), chickpea (Cicer arietinum), and barrel medic (Medicago truncatula) (Supplementary Table 2), TAR30 likely emerged after the split of the Arachis genus from other legumes but before the divergence of the two Arachis sub-genomes (A and B genomes). As TAR30 is abundant in the Arachis genomes, it is possible that TAR30 loci serve as fragile sites and play some role in Arachis genome evolution or instability via homologous exchanges, chromosomal rearrangements, and other mechanisms (Mondello et al. 2000; Dvořáčková et al. 2015).

Genomic distributions of TAR30

In a wide range of plants, spanning flowering plants to algae, FISH signals of telomeric tandem repeats including TTTAGGG and other variants were only detected at the ends of plant chromosomes, and not in other chromosome regions (Röder et al. 1993; Fuchs et al. 1995; Sýkorová et al., 2003; Mizuno et al. 2006; Peška et al., 2015; Yang et al. 2017). However, an increasing body of work shows that telomeric repeats are also located in other regions including the centromeres or pericentromeres in some Solanum species and other eukaryotes (Nergadze et al. 2004; Tek and Jiang 2004; He et al. 2013; Du et al. 2016; Zhang et al. 2016). The telomeric-like repeats found in non-terminal regions of chromosomes including centromeric, pericentromeric regions and between centromeres and telomeres are called interstitial telomeric repeats (ITRs) (Bolzán 2017). TAR30 exhibited different distributions from that of the plant telomeric repeat TTTAGGG. Here, the TAR30 FISH signals were detected in the proximal regions of the chromosomes, including centromeric and interstitial regions, and clear lack of hybridization signals in the terminal region of the chromosomes (Fig. 2, 3). Previous results indicated that the FISH signals of common plant TTTAGGG repeat were mostly detected at the ends of chromosomes in both wild and cultivated peanuts with few signals in pericentromeric regions and other locations (Du et al. 2016; Zhang et al., 2016). Thus, TAR30 showed distinct chromosomal distributions from the TTTAGGG telomeric repeat in cultivated peanut and the diploid wild Arachis species which is consistent with our in silico analysis as only a tiny fraction of long PacBio reads contained both TAR30 and TTTAGGG repeat (Fig. 4). 
Overall, the enrichment of TAR30 signals in the proximal regions of peanut chromosomes suggests that TAR30 is an interstitial homolog of the TTTAGGG telomeric repeat in peanuts, although we cannot rule out that a small amount of TAR30, too little to be detected by microscopy, is present in the terminal regions of peanut chromosomes.
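The read-level co-occurrence check mentioned above can be sketched in Python as follows. This is not the pipeline used in this study, whose details are not given in this section. It is a minimal illustration under two assumptions: a repeat counts as present only when at least three perfect tandem copies occur, and both strands are searched. The names `classify_read`, `TAR30_ARRAY`, and `CPTR_ARRAY` are hypothetical.

```python
import re

# Assumption: at least three perfect tandem copies define an array.
TAR30_ARRAY = re.compile(r"(?:TTTT[CT]TAGGG){3,}")
CPTR_ARRAY = re.compile(r"(?:TTTAGGG){3,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_read(seq):
    """Label a read as 'both', 'TAR30', 'TTTAGGG', or 'neither'."""
    seq = seq.upper()
    strands = (seq, revcomp(seq))
    has_tar30 = any(TAR30_ARRAY.search(s) for s in strands)
    has_cptr = any(CPTR_ARRAY.search(s) for s in strands)
    if has_tar30 and has_cptr:
        return "both"
    if has_tar30:
        return "TAR30"
    if has_cptr:
        return "TTTAGGG"
    return "neither"


# Toy reads for illustration only.
tar30_read = "ACGT" + "TTTTCTAGGG" * 4 + "GATTACA"
t_variant_read = "TTTTTTAGGG" * 4
cptr_read_minus = revcomp("TTTAGGG" * 3)
mixed_read = "TTTTCTAGGG" * 3 + "ACGTACGT" + "TTTAGGG" * 3
```

Requiring tandem copies matters because the TTTTTTAGGG variant of the TAR30 unit ends in TTTAGGG, so a search for single 7-bp units would report the canonical repeat inside pure TAR30 arrays. Real long reads contain sequencing errors, so a practical analysis would likely need approximate matching rather than exact regular expressions.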

Sequence convergence between TAR30 and the TTTTTTAGGG telomeric repeat

Telomeric tandem repeats tend to be highly conserved. The Arabidopsis-type TTTAGGG repeat has been found across the plant kingdom (Petracek et al. 1990; Fuchs et al. 1995; reviewed in Peska and Garcia 2020). Thus, this telomeric repeat probably existed in the common ancestor of flowering plants. Transitions or switches of telomeric repeats have also been reported in many plants, including Asparagaceae, Zostera marina, and Iridaceae, in which the Arabidopsis-type repeat has been lost and replaced by the human-type repeat (Sýkorová et al. 2006; Peska et al. 2020). Thus, the TTAGGG repeat in these plants must have emerged after the divergence of monocots and dicots. The repeated appearance of the human-type telomeric repeat in distantly related lineages might be a consequence of independent events, which can be considered evolutionary convergence or “homoplasy” of telomeric sequences. The re-emergence of the TTAGGG repeat in these plants indicates the extreme importance of telomeres and implies that the evolution of telomeric repeats has been subject to strong and similar selection pressures, which often result in similar outcomes (Parker et al. 2013).

TAR30, TTTT(C/T)TAGGG, is nearly identical to the telomeric tandem repeat of Cestrum elegans, which has the TTTTTTAGGG motif (Peška et al. 2015). It is possible that the original form of TAR30 was TTTTTTAGGG and that a single nucleotide substitution (T to C) within the TTTTTTAGGG sequence then generated the TTTTCTAGGG unit, which further expanded through strand slippage during DNA replication (Schlötterer and Tautz 1992). Indeed, there are numerous TTTTTTAGGG motifs in the arrays of TAR30 (Supplementary Fig. 2). One scenario is that TAR30 and the TTTTTTAGGG repeat emerged independently in two different lineages. However, TAR30 sequences are enriched in the proximal regions of peanut chromosomes (Fig. 2, 3), a distribution distinct from that of the telomeric repeat in Cestrum elegans (Peška et al. 2015).
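The relationship between the two unit variants can be made concrete with a short Python sketch. It checks that TTTTTTAGGG and TTTTCTAGGG differ at a single position and tallies the two variants in an array. The function names and the toy array are hypothetical and are not taken from the peanut assemblies.

```python
import re
from collections import Counter

# One TAR30 unit: TTTT, then C or T, then TAGGG (motif as given in the text).
UNIT = re.compile(r"TTTT[CT]TAGGG")


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def variant_counts(array_seq):
    """Count non-overlapping TTTTCTAGGG and TTTTTTAGGG units in a sequence."""
    return Counter(m.group(0) for m in UNIT.finditer(array_seq.upper()))


# Toy array for illustration only.
toy_array = "TTTTCTAGGG" + "TTTTTTAGGG" + "TTTTCTAGGG"

distance = hamming("TTTTTTAGGG", "TTTTCTAGGG")
counts = variant_counts(toy_array)
```

Here `distance` is 1, matching the single T-to-C substitution proposed above, and `counts` records two TTTTCTAGGG units and one TTTTTTAGGG unit in the toy array.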

Genomic changes during peanut polyploidization and domestication

Polyploidy has played a critical role in the divergence of higher plants and the emergence of new phenotypic traits (for review: Van de Peer et al. 2009). However, the genomic changes that occur during plant polyploidization and domestication are still not well understood. Induced allotetraploids provide a valuable resource for understanding the genomic differences in early generations after polyploidization. Previous studies have revealed genomic changes during polyploidization including sequence elimination (Guo and Han 2014), exchange of repeats between parental chromosomes (Lim et al. 2007), meiotic instability (Chester et al. 2012), transposon activation (Ramachandran et al. 2020), and sequence replacements (Lim et al. 2007). We genotyped two wild diploid species, A. valida (BB) and A. stenosperma (AA), and the earliest generation (S0) of their synthetic tetraploid ValSten (AABB) with a SNP array, and identified AAAA or BBBB alleles at some loci in ValSten, suggesting homologous recombination during interspecific hybridization and polyploidization (Gao et al. 2021). Our FISH analysis detected different hybridization patterns of TAR30 between the two diploid parents and their synthetic allotetraploid and identified the emergence of TAR30 signals on chromosome A3 and a few other chromosomes in ValSten (Fig. 3). The appearance of TAR30 on chromosome A3 in ValSten may have been caused by a rapid burst of TAR30 or by homologous exchanges between parental chromosomes (Lim et al. 2007). The FISH signals of TAR30 were stronger on some chromosomes in cultivated peanut than in the synthetic wild tetraploid (Fig. 2, 3), and the coverage of TAR30 in cultivated peanut was higher than the summed coverage in its two wild diploid progenitors (A. duranensis and A. ipaensis) (Supplementary Fig. 1), implying possible amplification of TAR30 after peanut polyploidization.

In conclusion, TAR30 likely represents a homolog of the Arabidopsis-type telomeric repeat. Unlike the canonical telomeric repeat, TAR30 is enriched in the interstitial and centromeric regions of peanut chromosomes and lacks visible signals at the ends of chromosomes. The FISH signals of TAR30 differed among the two diploid wild species, their newly synthesized allotetraploid, and cultivated peanut, implying rapid changes of repetitive sequences during peanut polyploidization and domestication. Overall, our results provide new insights into chromosome evolution and the genomic dynamics of repetitive sequences in the early generation of synthetic polyploids.