Introduction

Sugarcane (Saccharum spp.) is a tropical crop with a worldwide impact on sugar and ethanol production (Zhang et al. 2018b). Basically, modern sugarcane cultivars derived from early interspecific hybridizations of different Saccharum species, mainly from S. officinarum, which has high sucrose content, and S. spontaneum which is a source of multiple resistance genes and ratooning ability (Daniels and Roach 1987). The first generation hybrids from these interspecific crosses were successively backcrossed with Saccharum officinarum accessions to recover sucrose content. Therefore, modern sugarcane cultivars inherited the majority of the S. officinarum chromosomes (70–80%), 10–20% of S. spontaneum, and 10% of recombinant chromosomes between these two species (Singh et al. 2019). The current narrow genetic base of sugarcane commercial breeding populations (Ali et al. 2019) in conjunction with the increasing importance of the crop for bio-based fuels has led to initiatives of broadening its genetic base via incorporation of new germplasm into the breeding pool. The required genetic resource for this task is found within the interbreeding group called Saccharum complex, which comprises six species from the genus Saccharum, namely the wild species S. spontaneum and S. robustum, and the cultivated species S. officinarum, S. sinense, S. barberi, and S. edule, plus closely related genera, namely Miscanthus, Sclerostachya, Erianthus, and Narenga (Irvine 1999). S. spontaneum, S. robustum, and Erianthus sp. accessions are an important source of genes for high biomass production, high tillering, and vigor for sugarcane improvement for energy production (Aitken and McNeil 2010).

Aside from genetic diversity, epigenetic modifications such as DNA methylation, RNA interference (RNAi), and histone modifications influence gene expression and ultimately the population phenotypes, causing variability among and within species that can be heritable (Grativol et al. 2012). Different epigenetic states can arise as epialleles via numerous routes, broadly classified as non-genetic and genetic sources, having distinct stabilities across generations and thus leading to highly variable roles in crop improvement (Springer and Schmitz 2017). The stable inheritance of methylated regions in the genome has been demonstrated and may constitute a relevant source of phenotypic plasticity in crop plants, especially those with large genomes and many silenced regions (Hofmeister et al. 2017). This is particularly relevant for crop species that went through polyploidization events, where intergenomic interactions between progenitor genomes are predicted to induce epigenetic changes such as DNA methylation, occurring in parallel with the establishment of new cytotypes and adaptation in local environments (Song and Chen 2015).

DNA methylation in eukaryotes occurs exclusively on cytosine by the addition of a methyl group from S-adenosyl methionine at the 5′ position, resulting in 5-methylcytosine (5mC) (Schübeler 2015; Sahu et al. 2013). In plants, cytosine methylation occurs in the sequence contexts of CG, CHG, and CHH (H = A, C, or T) (Sahu et al. 2013), and has been extensively investigated with three main approaches: endonuclease digestion, affinity enrichment, and bisulfite conversion (Alonso et al. 2016). Methylation-sensitive amplification polymorphism (MSAP) is an endonuclease-based technique, which is a variation of the Amplified Fragment Length Polymorphism (AFLP) approach (Vos et al. 1995). Briefly, the MSAP method uses the isoschizomers HpaII and MspI as frequent cutters in parallel reactions, which have differential susceptibility to cytosine methylation at the CCGG motif, in combination with a common rare cutter that is insensitive to cytosine methylation, e.g., EcoRI (Fraga and Esteller 2002; Schulz et al. 2013). According to Schulz et al. (2013), both isoschizomers cleave CCGG motifs when unmethylated at both cytosines, and do not cleave when both cytosines are methylated or when the external cytosine is fully methylated, while HpaII cleaves CCGG motifs hemi-methylated at the external cytosine and MspI cleaves when hemi or fully methylated at the internal cytosine. Such an approach enables the assessment of the cytosine methylation status across numerous random loci over the genome, including in non-model organisms (Alonso et al. 2016), providing a high number of markers that allow the efficient determination of the epigenetic diversity between and within different species (Lima et al. 2002). In sugarcane, this technique proved to be efficient in verifying the fidelity of micropropagated sugarcane plantlets in relation to the matrix plants (Francischini et al. 2017).

Germplasm characterization efforts at both the phenotypic and molecular levels are imperative for germplasm bank management and their use by genetic breeding programs. Sugarcane germplasm characterization has been well explored via molecular markers, which reveal polymorphisms at the level of changes in the DNA sequences (Nayak et al. 2014). On the other hand, methylated regions of the genome may represent an untapped source of allelic variation (Ji et al. 2015) which remains largely unexplored within the Saccharum complex. Therefore, the present work aimed to assess the epigenetic diversity represented by cytosine methylation patterns and its magnitude in a group of genotypes encompassing commercial varieties, S. officinarum, S. spontaneum, S. robustum, S. barberi, and Erianthus sp. by using MSAP markers.

Materials and methods

Genotypes and DNA extraction

We analyzed 60 sugarcane genotypes encompassing commercial cultivars and wild accessions, i.e., 10 commercial cultivars, 5 S. robustum, 4 S. barberi, 24 S. spontaneum, 12 S. officinarum, and 5 Erianthus sp. accessions (Table 1). The genotypes were obtained from the sugarcane germplasm bank of the Campinas Agronomic Institute (IAC).

Table 1 List of the 60 sugarcane genotypes comprising commercial cultivars and wild accessions used in the current investigation, with descriptions of their species and origin

Total genomic DNA was extracted from powdered leaf tissue ground in a TissueLyser (Qiagen, USA) using the GenElute Plant Genomic DNA Miniprep Kit (Sigma, USA) according to the manufacturer's instructions. The extracted DNA was quantified by 0.8% agarose gel electrophoresis, using λ DNA as the standard and ethidium bromide staining. The DNA was diluted to a final concentration of 10 ng/µl.

MSAP technique

The MSAP markers were generated based on the Amplified Fragment Length Polymorphism (AFLP) technique described by Vos et al. (1995) using methylation-sensitive enzymes as described by Lei et al. (2006) with minor modifications for sugarcane (Francischini et al. 2017). Briefly, the DNA (250 ng) of each individual was double-digested in two parallel reactions, one with the restriction enzymes EcoRI/HpaII and the other with EcoRI/MspI. The product of each digestion reaction was ligated to the respective adapters and submitted to pre-selective and selective amplification reactions (Francischini et al. 2017). The adapter sequences, as well as the pre-selective and selective primers, are described in Supplementary Table 1. The pre-amplification products were diluted ten times (10×) in milli-Q water and used in the selective amplification with EcoRI primers labeled with infrared dye at wavelengths of 700 or 800 nm. The amplified products of two selective primer combinations were mixed and separated on 5% denaturing polyacrylamide gel in an Infrared 4300 DNA Analyzer (LiCor Bioscience, USA).

Marker genotyping

For each genotype, the molecular patterns obtained by EcoRI/HpaII and EcoRI/MspI enzyme combinations were compared side-by-side, and the markers genotyped as presence (1) and absence (0) using the Saga™ (Automated AFLP Analysis Software) program from Li-Cor® Biosciences. Four patterns were observed: absence of methylation (1/1 or +/+: marker present in both EcoRI/HpaII and EcoRI/MspI), complete methylation or hemi-methylation of internal cytosine (0/1 or −/+: marker absent in EcoRI/HpaII and present in EcoRI/MspI), external cytosine hemi-methylation (1/0 or +/−: present in EcoRI/HpaII and absent in EcoRI/MspI), and non-informative (0/0, −/−; absent in both EcoRI/HpaII and EcoRI/MspI) (Supplementary Fig. 1).
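The four scoring patterns described above can be sketched as a small lookup in Python. This is only an illustration of the scoring rule, not the Saga software's procedure; the function names and the example band profiles are hypothetical.

```python
from collections import Counter

# (HpaII band, MspI band) -> methylation state, following the four patterns
# described in the text. Labels are illustrative names, not software output.
PATTERN_STATE = {
    (1, 1): "unmethylated",              # +/+
    (0, 1): "internal_methylation",      # -/+ : full or hemi-methylation of internal C
    (1, 0): "external_hemimethylation",  # +/- : hemi-methylation of external C
    (0, 0): "non_informative",           # -/-
}

def classify_locus(hpa: int, msp: int) -> str:
    """Return the methylation state for one locus of one genotype."""
    return PATTERN_STATE[(hpa, msp)]

def summarize(hpa_profile, msp_profile) -> Counter:
    """Count the four states across all loci of one genotype."""
    if len(hpa_profile) != len(msp_profile):
        raise ValueError("profiles must score the same loci")
    return Counter(classify_locus(h, m) for h, m in zip(hpa_profile, msp_profile))

# Hypothetical band scores for five loci of a single genotype.
hpa = [1, 0, 1, 0, 1]
msp = [1, 1, 0, 0, 1]
counts = summarize(hpa, msp)
```

Dividing each count by the number of scored loci gives per-genotype frequencies of the kind summarized per group in the Results.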

Data analysis

The marker loci were classified as methylation-susceptible loci (MSL) or non-methylated loci (NML) using the R package msap (Pérez-Figueroa 2013) in R version 3.2.5 (Team RC 2014). The number of polymorphic loci for each MSAP selective primer combination, as well as the mean Shannon diversity index of the MSL and NML polymorphism, was also estimated. The levels of complete methylation or hemi-methylation of the internal cytosine and of methylation of the external cytosine were estimated for each MSAP selective primer combination (EcoRI/HpaII and EcoRI/MspI). The confidence intervals for these estimates were calculated using the VassarStats program (VassarStats platform, https://vassarstats.net/prop1.html) at the 5% significance level.
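As an illustration of how a 95% confidence interval for a methylation frequency can be obtained, the sketch below computes a Wilson score interval for a proportion. It assumes the interval is of the Wilson type without continuity correction; this is an assumption about the calculation, not a description of the VassarStats code, and the function name and example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval (no continuity correction) for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Hypothetical example: 50 methylated scores out of 100 scored loci.
low, high = wilson_interval(50, 100)
```

Two groups whose intervals do not overlap can be regarded as differing at the chosen level, which is how the group comparisons in the Results are read.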

The data matrix was transformed into a binary matrix, as proposed by Ma et al. (2013), in which the presence of methylation, defined by the MSAP patterns −/+ and +/−, was scored as "1", and the absence of methylation (pattern +/+) together with non-informative data (pattern −/−) was scored as "0".
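The binary transformation described above amounts to scoring a locus as methylated whenever the two enzyme combinations disagree. A minimal Python sketch follows; the function name and the example scores are hypothetical.

```python
def to_methylation_binary(hpa_profile, msp_profile):
    """Score 1 for patterns -/+ and +/- (methylated), 0 for +/+ and -/-."""
    if len(hpa_profile) != len(msp_profile):
        raise ValueError("profiles must score the same loci")
    # The two methylated patterns are exactly those where the band scores differ.
    return [int(h != m) for h, m in zip(hpa_profile, msp_profile)]

# Hypothetical scores for four loci: +/+, -/+, +/-, -/-
binary = to_methylation_binary([1, 0, 1, 0], [1, 1, 0, 0])
```

Note that this coding merges unmethylated and non-informative loci into the same class, so a "0" does not by itself indicate the absence of methylation.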

Polymorphism information content (PIC) was calculated for each MSAP selective primer combination according to the expression: PICi = 2fi(1 − fi), where fi is the frequency of the amplified allele/marker and (1 − fi) is the frequency of the null allele (Abuzayed et al. 2017). Genetic structure analysis was performed with the STRUCTURE program (Pritchard et al. 2000) version 2.3.4 (Pritchard et al. 2010) assuming the admixture model. The number of subpopulations (K) was set to range from 1 to 10, with 100,000 initial iterations (burn-in period) followed by 200,000 Markov chain Monte Carlo (MCMC) iterations. The results were evaluated with STRUCTURE HARVESTER v.0.6.94 (Earl et al. 2012) to define the most probable value of K according to Evanno et al. (2005).
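The PIC expression for dominant markers can be illustrated with a short Python sketch. Averaging the per-marker values within a primer combination is an assumption made here for illustration; the function names and the example matrix are hypothetical.

```python
def pic(marker_scores):
    """PIC_i = 2 * f_i * (1 - f_i), with f_i the frequency of band presence."""
    if not marker_scores:
        raise ValueError("at least one genotype is required")
    f = sum(marker_scores) / len(marker_scores)
    return 2 * f * (1 - f)

def mean_pic(matrix):
    """Average PIC over markers; rows are genotypes, columns are markers."""
    columns = list(zip(*matrix))
    return sum(pic(col) for col in columns) / len(columns)

# Hypothetical matrix: four genotypes scored for two markers.
scores = [
    [1, 1],
    [1, 0],
    [0, 1],
    [0, 1],
]
average = mean_pic(scores)
```

Because 2f(1 − f) peaks at f = 0.5, a dominant marker cannot exceed a PIC of 0.5, which gives a scale for interpreting the values reported per primer combination.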

Molecular variance analysis (AMOVA) was performed by the Arlequin v.3.5 program (Excoffier and Lischer 2010) considering the number of subpopulations (K) previously defined by the STRUCTURE program. The genetic dissimilarity matrix was calculated by the Darwin v.6 program (Perrier and Jacquemoud-Collet 2006) using the Jaccard coefficient. The dissimilarity between the accessions was displayed in a dendrogram using the Neighbor-joining method (Saitou and Nei 1987) and adopting the bootstrap method with 1000 replications.
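For readers unfamiliar with the Jaccard coefficient, the sketch below computes the Jaccard dissimilarity (one minus the Jaccard similarity) between binary marker profiles. It illustrates the coefficient only and is not the Darwin implementation; the function names and example profiles are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """1 - (shared bands / bands present in either profile); 0/0 matches are ignored."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same loci")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here for two all-zero profiles
    return 1 - shared / either

def dissimilarity_matrix(profiles):
    """Symmetric matrix of pairwise Jaccard dissimilarities."""
    return [[jaccard_dissimilarity(p, q) for q in profiles] for p in profiles]

# Hypothetical profiles for three accessions scored at four loci.
profiles = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
]
matrix = dissimilarity_matrix(profiles)
```

A matrix of this form is the input to the Neighbor-joining step; joint absences do not contribute, which suits dominant markers where a shared null score is weak evidence of similarity.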

Isolation and sequencing of methylated MSAP fragments

Four fragments from MSAP patterns −/+ or +/−, indicating methylated fragments (MFs), were randomly selected and excised directly from the polyacrylamide gel using the Odyssey CLx equipment (Li-Cor® Biosciences). The fragments were further eluted with 50 µL of TE buffer (10 mM Tris–HCl pH 8.0; 0.1 mM EDTA) at 4 °C, incubated at 60 °C for 2 h, and centrifuged for 10 s to remove the supernatant. An aliquot of the eluate was used for reamplification with the respective MSAP selective primer combination and PCR cycling conditions. After separation of the reamplified PCR products by 1% agarose gel electrophoresis, the fragments were purified using the Wizard® SV Gel and PCR Clean-up (Promega, USA) kit.

The purified PCR products were directly sequenced by the Sanger method. Homology analysis was carried out with BLASTN searches against the sugarcane R570 mosaic monoploid genome (Garsmeur et al. 2018) and the S. spontaneum tetraploid genome (Zhang et al. 2018b). Exons and introns were identified by matching the transcripts with their respective proteins from both databases using BLASTX. With the frame information from the BLASTX results, the nucleotide sequence corresponding to the transcription start site (TSS) was determined for all transcripts found using the Expasy Translate tool (https://web.expasy.org/translate/). Putative protein families were assessed with the Pfam program (https://pfam.xfam.org/), while the search for cis-acting sequences, tandem repeats, and CpG islands was performed with PlantPAN 3.0 (https://plantpan.itps.ncku.edu.tw/promoter.php) in a range of up to 3000 bp from the alignment with the MSAP fragment. The positions of the PlantPAN output and of the MF alignments were expressed relative to the TSS.

Results

MSAP polymorphism for selective primer-pair combinations

The number of markers (loci) produced by the 15 selective primer-pair combinations ranged from 69 for the E(AGA)/H-M(TTG) to 104 for the E(ACC)/H-M(ACA), with an amplicon length varying from 37 to 478 bp. Of the total number of markers (1341 loci) obtained across the 60 genotypes, 1117 (83.30%) were classified as MSL and 224 (16.70%) were classified as NML (Table 2). Of the MSL, 98% (1100) were polymorphic with an average Shannon diversity index of 0.578 ± 0.121. The selective combination E(ACC)/H-M(TTG) presented the highest MSL, while the lowest MSL was observed for E(AGG)/H-M(TTG). The total polymorphism proportion of NML was 177 (79%), with a mean Shannon diversity index of 0.397 ± 0.194. The highest NML polymorphism was observed for the selective primer-pair combination E(ACC)/H-M(ACA), while the lowest for the selective combination E(ACA)/H-M(ACT). PIC values ranged from 0.252 (E(AGA)/H-M(ACC)) to 0.393 (E(AGG)/H-M(ACT)) with an average value of 0.316 (Table 2).

Table 2 Number of markers (loci) across 15 selective primer combinations with respective methylation-susceptible loci (MSL), non-methylated loci (NML), and polymorphism information content (PIC) values

Methylation levels between accession groups

A high number of unmethylated loci were observed for all genotype groups, with commercial cultivars standing out with the lowest frequency (Table 3). The commercial cultivars also presented the highest percentage of non-informative loci, although the difference was not significant at the 5% significance level (95% confidence interval, CI). In relation to the basic germplasm, Saccharum officinarum had the highest frequency of unmethylated loci (40.59%; 95% CI 39.83–41.35%) compared to the commercial cultivars (37.29%; 95% CI 36.48–38.10%), while S. spontaneum had the highest frequency of internal methylation loci (10.05%; 95% CI 9.73–10.38%) and Erianthus sp. had the lowest (7.63%; 95% CI 7.03–8.30%). The highest frequency of external methylation loci was observed for S. barberi (17.72%; 95% CI 16.73–18.78%), while the lowest (13.77%; 95% CI 13.26–14.32%) was observed for S. officinarum (Table 3).

Table 3 Methylation patterns of sugarcane genotype groups

Relative total methylation

The relative total methylation, i.e., the sum of the complete methylation or hemi-methylation of the internal cytosine and the external cytosine hemi-methylation, represents the entire portion of the genome that is methylated regardless of the status of the methylated cytosine, i.e., whether it is internal or external. The relative total methylation was lower than the frequency of unmethylated loci in all the accession groups (Fig. 1). The selective primer-pair combination E(ACA)/H-M(TTG) detected the highest frequency of internal methylation in the commercial cultivars, S. robustum and S. barberi accessions (Table 4).

Fig. 1 Average percentages of unmethylation, external cytosine methylation, and internal cytosine methylation for the six sugarcane genotype groups (commercial, S. robustum, S. barberi, S. spontaneum, S. officinarum, and Erianthus sp.)

Table 4 Frequency and confidence interval (%) of the relative total methylation of six accessions groups for 15 selective primer-pair combinations

According to the average number of loci obtained by the 15 selective primer-pair combinations, the S. officinarum accessions had the highest frequency of internal methylation (41.13%, 95% CI 39.57–42.71%), whereas Erianthus sp. had the lowest (32.80%, 95% CI 30.52–35.17%) (Table 4). The E(ACC)/H-M(TTG) selective primer-pair combination captured the highest number of external hemi-methylation loci in the commercial cultivars, S. spontaneum and S. officinarum. On average, the Erianthus sp. group had the highest frequency of external methylated loci (67.20%, 95% CI 64.83–69.48%), whereas S. officinarum had the lowest (58.87%, 95% CI 57.29–60.43%) (Table 4). In addition, most of the polymorphisms from the DNA digestion with EcoRI/HpaII enzymes were due to external cytosine methylation.

Structure analysis based on methylated loci

The genotyping data were transformed into a sensitive methylation polymorphism matrix where the presence of methylation was considered as "1" and the absence as "0". The structure analysis based on the sensitive methylation polymorphism (epigenome) revealed the presence of two subpopulations (k = 2, Delta K = 218.015) (Fig. 2a).
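The Delta K statistic reported above follows the approach of Evanno et al. (2005), which looks for the K at which the log-likelihood curve changes slope most sharply. The Python sketch below uses the mean Ln P(D) across replicate runs for the second-order difference and the standard deviation across runs as the denominator; this formulation, the number of replicates, and the likelihood values are all assumptions for illustration and are not taken from the study's STRUCTURE output.

```python
from statistics import mean, stdev

def delta_k(lnp_by_k):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), using means over runs.

    lnp_by_k maps each K to a list of Ln P(D) values from replicate runs.
    Delta K is undefined for the smallest and largest K tested.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # Assumes the replicate runs at this K are not all identical (sd > 0).
            result[k] = second_diff / stdev(lnp_by_k[k])
    return result

# Hypothetical Ln P(D) values from three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

In this toy input the likelihood improves steeply from K = 1 to K = 2 and then flattens, so K = 2 receives the largest Delta K.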

Fig. 2 Population epigenetic structure of the 60 sugarcane genotypes using the binary matrix of relative total methylation, transformed from the MSAP data set. a Inference of the optimal number of subpopulations (k) using the delta K variation (Δk), with k varying from 1 to 9. b Bar plot with each column representing the estimated membership coefficients, in K = 2, for each genotype, which are represented with numbers from 1 to 60 and grouped as follows: commercial (1–10), S. robustum (11–15), S. barberi (17–20), S. spontaneum (16, 21–43), S. officinarum (44–55), and Erianthus sp. (56–60). Subpopulation 1 and subpopulation 2 are represented by orange and green colors, respectively

All of the commercial cultivars were grouped within the first subpopulation (orange color), representing its majority composition, which also included the S. robustum accessions. Most of the S. barberi and S. spontaneum accessions, such as Chin (S. barberi) and US85108 (S. spontaneum), did not split along the two subpopulations. The second subpopulation (green color) encompasses all S. officinarum, Erianthus sp., and the remaining S. spontaneum accessions (Fig. 2b).

According to the analysis of molecular variance (AMOVA), most of the epigenetic variability is distributed within the groups (91.38%), while the differentiation index value (Φst) of 8.6% suggests moderate epigenetic differentiation among the six accession groups (Table 5). The dendrogram based on the methylation-susceptible loci separated the accessions into three major groups. The first group comprised the commercial cultivars together with most of the S. robustum and S. barberi accessions, while the second group comprised the remaining accessions of S. robustum and S. barberi, nearly half of the S. spontaneum accessions, and all accessions of S. officinarum and Erianthus sp. The remaining S. spontaneum accessions formed a third group (Fig. 3).

Table 5 Distribution of epigenetic variability within and between sugarcane genotype groups

Fig. 3 Dendrogram of 60 sugarcane genotypes (commercial, S. robustum, S. barberi, S. spontaneum, S. officinarum, and Erianthus sp.) obtained via Neighbor Joining for methylation-susceptible loci (MSL) based on dissimilarity values (Jaccard complement). Bootstrap values are shown in the nodes

The dendrogram based on the NML also separated the accessions into three groups. The commercial cultivars, S. robustum, S. barberi, and S. spontaneum accessions were in the same group, while S. officinarum and some S. spontaneum and Erianthus sp. accessions were placed in a second group. The third group encompassed some S. barberi and the remaining Erianthus sp. accessions (Fig. 4).

Fig. 4 Dendrogram of the 60 sugarcane genotypes (commercial, S. robustum, S. barberi, S. spontaneum, S. officinarum, and Erianthus sp.) obtained via Neighbor Joining for non-methylated loci (NML) based on dissimilarity values (Jaccard complement). Bootstrap values are shown in the nodes

Analysis of methylated fragments

The four randomly selected fragment sequences ranged from 158 to 356 bp. Significant alignments with S. spontaneum and/or sugarcane R570 sequences (https://www.ncbi.nlm.nih.gov/; https://sugarcane-genome.cirad.fr/content/blast) were obtained for these fragments, considering an E-value cutoff of 1e−5 (Table 6). Three of them showed intragenic alignments with intronic regions, whereas MF_2 aligned to a genomic region. The putative function of the S. spontaneum transcript aligned to MFs 3 and 4 is unknown, since analysis with Pfam did not identify any recognized domain for the associated protein. On the other hand, MF_1 aligned to transcripts assigned the Pfam domains protein kinase (Pkinase) and WD40 repeat when compared with the transcriptomes of the sugarcane commercial cultivar R570 and S. spontaneum, respectively. BLASTN and PlantPAN analyses revealed that MF_2 is located upstream of transcripts with unknown functions and aligned to the cis-acting sequence basic helix–loop–helix (bHLH) in both databases (Table 7). PlantPAN analyses also revealed nearby CpG islands, less than 3 kb away, for all MF alignments, while the proximity of tandem repeats was observed for the MF_1 alignment in the R570 database and for the MF_2 alignments in both databases. The HpaII/MspI CCGG recognition motif was found within a CpG island for the alignment between MF_2 and the R570 genome.

Table 6 Sequence analysis of four methylated fragments (MFs) denoting internal or external cytosine methylation
Table 7 PlantPAN analysis for the assessment of putative cis-regulatory sequences, CpG island, and tandem repeats of methylated fragments (MFs), with positioning relative to transcription start site (TSS)

Discussion

The genetic variability of the Saccharum complex has been extensively investigated using molecular markers (Aitken and McNeil 2010; Liu et al. 2016; Singh et al. 2017), while epigenetic diversity is yet to be reported in sugarcane. Increasing evidence points to the relevance of epialleles in crop breeding, which includes the stable inheritance of differences in DNA methylation across generational time (Hofmeister et al. 2017) and its association with phenotypic variation of agriculturally important traits, such as overall crop yield (Ong-Abdullah et al. 2015) and disease resistance (Akimoto et al. 2007). We, therefore, aimed to assess the epigenetic background of sugarcane germplasm accessions and commercial cultivars by evaluating the diversity of cytosine methylation patterns and the extent of this epigenetic mark across these genotypes. We also performed a brief evaluation of the genomic distribution of four MFs via an in silico analysis of current sugarcane-genome databases.

According to the global analysis by the R msap package, the majority of the MSAP loci were epigenetically informative, while statistical significance was observed for both genetic and epigenetic variations, which represented similar amounts of diversity. Such significance indicates contributions of cytosine methylation patterns to the phenotypic diversity observed within the Saccharum complex. Our results also revealed that the proportion of loci with cytosine methylation was lower than that of unmethylated loci in all the 60 accessions, ranging from 23.28 to 26.75%, which is similar to Ma et al.’s (2013) observations for Populus tomentosa. The lowest unmethylation frequency observed for the commercial cultivars may represent higher cytosine methylation within the non-informative MSAP loci. This putative difference in cytosine methylation is most likely caused by the interspecific hybridizations and polyploidization events that occurred during sugarcane breeding, whose effects on cytosine methylation patterns have been observed in diverse plant taxa (Song and Chen 2015; Li et al. 2019).

It is also noteworthy that the proportion of internal and external cytosine relative methylation varied across the accession groups and selective primer-pair combinations. For instance, S. robustum and S. barberi showed the highest frequency of internal cytosine methylation when assessed with the selective primer-pair combination E(ACA)/H-M(TTG), while the selective primer-pair combination E(ACC)/H-M(TTG) captured the highest frequency of external cytosine methylation for S. spontaneum and S. officinarum. The 15 selective primer-pair combinations showed a different capacity to capture genomic regions susceptible to cytosine methylation, with the E(ACC)/H-M(TTG) combination being the most efficient for such detection, due to the highest PIC value and MSL. This information may be useful for further epigenetic investigations within the Saccharum complex targeting specific cytosine methylation patterns and high information content.

Considering the average number of loci for all selective primer-pair combinations, a higher frequency of HpaII cut in comparison to MspI was observed for all accession groups, indicating a higher occurrence of external cytosine hemi-methylation. This proportion may be specific to the leaf tissue sampled, since studies performed in leaves of perennial species of P. tomentosa (Ma et al. 2013) and P. alba (Guarino et al. 2015) showed higher frequency of HpaII cut, whereas reports from the same species by Ma et al. (2012) with bark samples revealed an opposite proportion.

The species S. barberi, S. spontaneum, S. officinarum, and Erianthus sp. did not significantly differ in the frequency of internal cytosine methylation, while significant differences were observed among commercial cultivars, S. robustum, and Erianthus sp. accessions. On the other hand, a significantly higher frequency of hemi-methylation at the external cytosine was observed in S. barberi. These differences may be explained by the minor participation of S. robustum, S. barberi, and Erianthus sp. in the constitution of modern cultivars. According to Arceneaux (1967), only 19 accessions of S. officinarum, two of S. spontaneum, four of S. sinense, and one of S. robustum were used as progenitors in the first interspecific hybridizations.

The epigenetic structure of the sugarcane genotypes revealed two major subgroups with an epigenetic differentiation value of 8.60%, indicating moderate differentiation among subpopulations (Wright 1978). According to the polymorphic MSL, S. spontaneum accessions were separated into two distinct groups. Such separation is consistent with the higher variability within groups and may be attributed to the fact that S. spontaneum is a highly polymorphic species with considerable variations in chromosome number and extensive geographical distribution (Aitken and McNeil 2010; Da Silva 2017). Moreover, S. spontaneum accessions have distinct levels of tolerance to various abiotic factors (Da Silva 2017), which are closely related to DNA methylation (Grativol et al. 2012).

The genotype grouping in the MSL dendrogram was consistent with the subpopulations defined by the STRUCTURE program under the admixture model. The first and third groups included the genotypes from subpopulation 1, while the second group comprised all accessions from subpopulation 2. The genetic variation of sugarcane genotypes evidenced through the NML dendrogram, in turn, revealed agreements with pedigree information. This is most evident within the second group, with S. barberi closely grouping with S. officinarum and S. robustum, which is in accordance with reports demonstrating that S. barberi originated from natural crosses between these species (D’Hont et al. 2002), and also in the close grouping among commercial cultivars and S. officinarum accessions (Irvine 1999). The first group, on the other hand, grouped the IJ76-381 accession, first classified as Erianthus sp., with S. officinarum accessions. However, according to the GRIN database (https://www.ars-grin.gov/), this accession has been reclassified as an S. arundinaceum member, which explains its proximity to the Saccharum genus. Finally, the third group comprised the Erianthus sp. accessions and the MATNA SHAHJ accession, previously considered an S. barberi, thus characterizing a disagreement with the pedigree information, probably due to misidentification.

In plants, cytosine methylation is distributed in a mosaic pattern across the genome, targeting repetitive DNA and actively transcribed regions (Schübeler 2015). Changes in cytosine methylation in CHG and CHH contexts have been associated with the activity of transposable elements and heterochromatin formation in repetitive DNA, occurring in response to environmental factors (Gent et al. 2013), while the CG context is often associated with cytosine methylation within gene bodies and promoters (Zhang et al. 2018a). The alignments of the MFs provided some insights into the distribution of cytosine methylation over the genome within the Saccharum complex. Among the cytosine methylation contexts in plants, the MSAP technique provides coverage for CG and CHG (Schulz et al. 2013; Fulneček and Kovařík 2014), with the former context being illustrated by MF_1, MF_3, and MF_4 due to the alignment within transcribed regions, and by MF_2 due to alignment upstream of the transcribed region, where promoter binding sites may be located. To et al. (2015) classified intragenic cytosine methylation as intragenic heterochromatin, mostly found within introns after the insertion of transposable elements (TEs), and gene body methylation, found primarily in exons but also in introns. We did not find any evidence of TEs within the intragenic alignments observed here; therefore, MFs 1, 3, and 4 most likely correspond to gene body methylation cases. According to Zhang et al. (2018a), the biological relevance of gene body methylation seems to depend on the plant species, and it has been associated with the reduction of gene expression variability, prevention of aberrant transcription, and efficiency of pre-mRNA splicing.

The CpG islands, defined as DNA segments rich in CG dinucleotides in comparison to the rest of the genome, are mostly associated with genes in plant species with small genomes, e.g., Arabidopsis thaliana, sorghum, and rice, whereas fewer associations with genes were observed in plants with larger genomes, e.g., barley and maize (Ashikawa 2001). Nevertheless, CpG islands and tandem repeats are correlated with gene regulation, and their presence at the TSS indicates the involvement of epigenetic pathways, such as chromatin modifications, in the regulation of downstream gene expression (Rombauts et al. 2003; Ludwig et al. 1997). Our results showed an association of MFs with sugarcane transcripts, of which Sh03_p000580, Sh_004L21_g000090, and Sspon.04G0014790-1A are likely to be regulated via cytosine methylation due to the presence of CpG islands comprising the TSS. Aside from the MF_2 alignment with the R570 genome, cytosine methylation occurred outside CpG islands and tandem repeats, although the proximity was less than 3 kb for all MFs. Cytosine methylation outside CpG islands has been reported in mammals (Luu et al. 2013) and plants (Ashikawa 2001). The relative distance between MFs and CpG islands differed between sugarcane R570 and S. spontaneum sequences, as did the identified transcripts, indicating that the MFs did not align in homologous regions. This may be explained by the predicted induction of epigenetic changes, such as cytosine methylation and histone modifications, during intergenomic interactions between progenitors in allopolyploid species (Song and Chen 2015).

Finally, the alignment of MF_1 with transcripts having the Pfam motifs Pkinase and WD40 is noteworthy. Protein kinases function in the activation of various cellular processes via protein phosphorylation, such as metabolism, transcription, cell cycle progression, and disease resistance (Afzal et al. 2008). Repeated WD40 domains, in turn, act as sites for protein–protein interactions, having central roles in diverse biological processes such as cell division, protein trafficking, flowering, and chromatin modifications, among others (Gachomo et al. 2014; Stirnimann et al. 2010). Another interesting alignment was observed for MF_2, which comprised a cis-regulatory sequence from the bHLH transcription factor superfamily in the R570 and S. spontaneum genomes, with functions in stress-adaptive responses and phytohormone signaling (Seo et al. 2011; Xu et al. 2014), having possible roles in the regulation of the downstream transcripts with unknown functions.

Our study provided the first information about the epigenetic diversity, the extent of cytosine methylation patterns, and the epigenetic relatedness within the Saccharum complex, while preliminary in silico analysis of MFs suggested biological relevance of MSL. The analysis properly represented the species with major contributions to modern sugarcane cultivars, i.e., S. spontaneum and S. officinarum, while future observations with more accessions of the species underrepresented here (S. barberi, S. robustum, and Erianthus sp.) are required to further uncover the epigenetics of sugarcane. Nevertheless, this first epigenetic characterization revealed significant epigenetic diversity within our sample representing the Saccharum complex, most notably among S. spontaneum accessions, identified the most informative selective primers, and analyzed the putative biological relevance of the investigated MFs, supporting the further search for epialleles with functional effects in sugarcane.