Introduction

Sugarcane (Saccharum spp. hybrid) is a crop of industrial importance in the tropical and sub-tropical regions of more than 100 countries around the globe (daCosta et al. 2011). In the year 2014, with a total sugarcane production of 352.1 mT from 5.01 mha, India ranked second only to Brazil (10.4 mha; 737.1 mT) in cane area and sugarcane production (FAO 2013). In India, sugarcane is cultivated primarily for white sugar and jaggery production, but it has enormous potential as a multiproduct crop, yielding green tops as fodder, bagasse, press mud, paper pulp, green fuel, etc. (Allen et al. 1997; Pinto et al. 2010; Kang et al. 2013). In recent times, it has also gained attention because of its potential for green fuel (ethanol) production, and much of the sugarcane in countries like Brazil and the United States is now used for ethanol production (Sanghera et al. 2015).

Grivet et al. (2004) classified sugarcane genetic resources into three groups: (1) traditional cultivars, (2) wild relatives, and (3) modern cultivars. Traditional cultivars descend from primary domesticates and comprise Saccharum officinarum, S. barberi, S. sinense and S. edule. The wild species S. spontaneum and S. robustum purportedly interbred with the traditional cultivars during sugarcane domestication. S. officinarum is believed to have evolved through human selection from mutant forms of the wild cane S. robustum (x = 10, 2n = 6x = 60; Stevenson 1965), whereas S. spontaneum is a wild species that exhibits a great amount of phenotypic variation. Modern sugarcane cultivars are the outcome of inter-specific hybridization between domesticated octaploid S. officinarum (2n = 80; x = 10) as the female parent and wild S. spontaneum (2n = 40–128; x = 8) as the male parent. The cultivated varieties therefore have a complex polyploid genome and exhibit a high level of heterozygosity coupled with a variable number (100–130) of chromosomes (D’Hont et al. 1998; Casu et al. 2005). Inter-crossing of promising, high-yielding inter-specific hybrids, coupled with continuous selection for a few similar desirable traits, especially cane yield and sucrose content, has narrowed the gene pool and left a low degree of internal contrast within modern sugarcane cultivars (Walker 1987), which causes ambiguity in the delineation of individual genotypes. In the last few decades, efforts have been made to broaden the genetic base of sugarcane, but these have met with little success (You et al. 2013).

The use of molecular markers for predicting phenotype requires accurate tagging and validation of markers closely linked to the traits of interest, and such mapped or trait-associated stable markers could be of immense economic benefit in marker-assisted breeding programmes. In most crops, family-based linkage mapping has been used to tag segregating traits of interest, and a number of examples are available; in the past few decades, however, only a limited number of quantitative trait loci (QTLs) have been identified. Linkage disequilibrium (LD)-based mapping, popularly called association mapping (AM), was initially adopted in human genetic studies but is now increasingly applied in crop plants, including sugarcane, mainly for the identification of marker-trait associations (Wei et al. 2006, 2010; Debibakas et al. 2014; Guoy et al. 2015; Banerjee et al. 2015; Singh et al. 2016). In AM, QTLs are identified by analyzing the degree of marker-trait associations that have withstood segregation and reshuffling of genetic elements in a diverse gene pool over the course of evolutionary recombination within the species (Gupta et al. 2005). LD-based methods are now considered the most suitable approach for uncovering marker-trait associations (MTAs) in crops like sugarcane, where complex polyploidy, inbreeding depression and dominant marker scoring make linkage mapping almost impractical (Banerjee et al. 2015). Moreover, molecular markers tagged to phenotypic traits are expected to be of immense benefit in future, when the plant protection regime for registration/patenting of new varieties becomes more streamlined.

Sugarcane breeding is at present focussed mainly on increasing sugar yield and biomass and on breeding for abiotic stress resistance. In recent times, genome-wide association mapping studies in sugarcane have been carried out for a number of traits (Wei et al. 2006, 2010; Debibakas et al. 2014; Guoy et al. 2015; Banerjee et al. 2015; Singh et al. 2016). However, these studies cover a limited number of the most obvious agronomic traits, which are predominantly quantitative, and they identified a multitude of small-effect markers, largely because of the complications of the polyaneuploid genome. Genomic approaches have mostly been developed on model crop systems and thus cannot be applied straightforwardly to sugarcane. Conventional marker-assisted breeding approaches require markers that tag alleles explaining a major proportion of trait variation, e.g., disease resistance genes. Since yield and quality characteristics are complex, investigations of marker-trait associations for them require more comprehensive tailoring and coverage (Varshney and Tuberosa 2013).

Against this backdrop, the present study carried out a thorough genotypic characterization using 1546 marker loci generated by 174 SSR (simple sequence repeat) primers, together with phenotypic characterization for 32 traits, in a panel of 92 sugarcane varieties from sub-tropical India. The aim was to identify LD-based MTAs that could be related, directly or indirectly, to yield and other agronomically important traits that breeders generally exploit while selecting superior genotypes. The trait-associated markers (MTAs) thus identified could find applications in fine-tuning sugarcane breeding and selection programmes.

Materials and methods

Plant materials and phenotyping for DUS descriptors

This study included 92 sugarcane varieties that were released for commercial cultivation in India between 1933 and 2008 and are maintained at the ICAR-Indian Institute of Sugarcane Research, Lucknow, India, as a reference collection for conducting DUS (distinctiveness, uniformity, and stability) tests of newly released varieties in India (Table S1). Field trials of these varieties were conducted during 2011–2012, 2012–2013 and 2013–2014 in the DUS test field in a randomized block design (RBD) with two replications. Three-bud setts were planted in 6 m rows spaced 90 cm apart, with a plant-to-plant distance of 60 cm. Data for all three years were recorded at full maturity (300 days after planting) on all 92 varieties for 32 phenotypic traits (Table S2) as prescribed in the DUS test guidelines (Anonymous 2009). Of the 32 traits, twenty-three, viz., plant growth habit, leaf sheath hairiness, leaf blade curvature, adherence of leaf sheath, dewlap colour, internode shape, bud shape, internode colour unexposed and exposed to sun, auricle shape, shape of ligule, bud tip in relation to growth ring, root band width, rind surface appearance, growth crack, pithiness, growth ring prominence, zig-zag alignment, internode waxiness, internode cross section, bud groove, bud cushion, and bud length, were assessed as qualitative traits that were stable over the years. The remaining nine traits, viz., internode diameter, number of millable canes (NMCs), leaf blade width, plant height, sucrose content (%), fibre content (%), commercial cane sugar (CCS) per cent, corrected Brix (%) and purity per cent, were measured quantitatively and showed variation over the years.

For each variety, 30 random stools were selected for the two replications in each of the three years, and data were recorded on the mother shoot; the observations on node and internode were made at mid-height of the mature cane. Observations on plant growth habit were recorded visually on a group of plants inside the field. The 23 qualitative traits considered in this study are also included in the DUS descriptors of sugarcane, and hence they were measured or scored according to the guidelines for sugarcane characterization (Anonymous 2009). Among the quantitative traits, sucrose and purity per cent were estimated with a polarimeter (Erma, Japan).

Genotyping using SSR markers

Genomic DNA of all 92 varieties was isolated from young leaf tissues following the cetyl trimethyl ammonium bromide (CTAB) method (Doyle and Doyle 1990), quantified by electrophoresis in 0.8% agarose gel with known DNA standards, and diluted to a final concentration of 25 ng/µL. Genotyping was carried out using a set of 174 polymorphic SSR primers (Table S3) comprising 43 genomic SSRs (Govindaraj et al. 2005; Parida et al. 2009) and 131 EST-SSRs (Pinto et al. 2004; Oliveira et al. 2009; Singh et al. 2013). DNA amplification was carried out in a 15 µL reaction volume consisting of 1 × PCR assay buffer, 200 mM of each dNTP (Fermentas, USA), 12 ng (1.8 pmol) each of forward and reverse primers (Operon Biotechnologies, GmbH, Germany), 0.5 units of Taq DNA polymerase (Fermentas, USA) and 25 ng genomic DNA using a thermal cycler (MyCycler, Biorad, USA). The cycling parameters were initial denaturation at 94 °C for 5 min, followed by 33 cycles of 94 °C for 1 min, 55–62 °C (as per the Tm value of the primer) for 1 min and 72 °C for 2 min, and a final primer extension of 7 min at 72 °C. The amplification products were resolved on 12% acrylamide gel with 1 × Tris borate EDTA buffer and stained with 0.5 µg/ml ethidium bromide (Sigma-Aldrich, USA), and images were captured using a gel documentation system (G:Box, Syngene, U.K.). Only clear and unambiguous amplified products were scored, as dominant markers with a score of 1 (present) or 0 (absent), since it is difficult to identify and differentiate between amplicons from homologous chromosomes.

Statistical analyses

Phenotypic data analysis

For the nine quantitative traits, the raw data were subjected to two-factor analysis of variance (ANOVA) with the General Linear Model (GLM) procedure of SAS/STAT® software 9.3 (SAS Institute Inc., Cary, NC, USA), considering genotype, replication and year as variable effects and all other factors that could affect trait variation, e.g., fertilizer application and irrigation, as fixed effects. In addition, least squares means (LSM; genotype as fixed effect), standard error of differences (SED), least significant difference (LSD) and descriptive statistics such as the coefficient of variation (CV) and grand mean (GM) were calculated. Genetic advance and broad-sense heritability (Johnson et al. 1955) of all nine characters were calculated from the ANOVA variance component estimates using the software INDOSTAT (Indostat Services, Hyderabad). Pair-wise correlation coefficients among the nine traits were calculated for all three years of the study. For all 32 traits, the phenotypic data were converted into scores according to the DUS test guidelines for sugarcane (Anonymous 2009), and these scores were used for further analyses.
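As an illustration of how broad-sense heritability and genetic advance follow from variance components, a minimal Python sketch is given below. It is not the INDOSTAT implementation. It assumes the commonly used forms H² = σ²g/σ²p and GA = k · H² · σp, with a selection differential k = 2.06 (5% selection intensity); the value of k, the function names and the example numbers are assumptions made for illustration and are not taken from this study.

```python
import math


def broad_sense_heritability(var_genotypic, var_phenotypic):
    """Broad-sense heritability: genotypic variance / phenotypic variance."""
    if var_phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_genotypic / var_phenotypic


def genetic_advance(var_genotypic, var_phenotypic, k=2.06):
    """Expected genetic advance, GA = k * H2 * phenotypic standard deviation.

    k = 2.06 is an assumed selection differential (5% selection intensity).
    """
    h2 = broad_sense_heritability(var_genotypic, var_phenotypic)
    return k * h2 * math.sqrt(var_phenotypic)


def genetic_advance_percent_of_mean(var_genotypic, var_phenotypic,
                                    grand_mean, k=2.06):
    """Genetic advance expressed as a percentage of the grand mean."""
    ga = genetic_advance(var_genotypic, var_phenotypic, k)
    return 100.0 * ga / grand_mean


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    print(broad_sense_heritability(4.0, 16.0))
    print(genetic_advance(4.0, 16.0))
    print(genetic_advance_percent_of_mean(4.0, 16.0, grand_mean=20.6))
```

Expressing GA as a percentage of the grand mean allows traits measured on different scales to be compared.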

Polymorphic information content analysis

Each SSR marker was evaluated for its polymorphic information content (PIC) value using the following formula given by Botstein et al. (1980):

$${\text{PIC}} = 1 - \sum\limits_{i = 1}^{n} p_{i}^{2} - \sum\limits_{i = 1}^{n - 1} \sum\limits_{j = i + 1}^{n} 2p_{i}^{2} p_{j}^{2}$$

where p_i and p_j are the frequencies of the ith and jth alleles, respectively, and n is the number of alleles of the marker in the group of 92 sugarcane varieties.
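The PIC formula of Botstein et al. (1980) can be sketched in a few lines of Python. The function name and the example frequencies are hypothetical, and the sketch takes allele frequencies as given; how they were derived from the dominant band scores in this study is not stated here and is left out.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content after Botstein et al. (1980).

    allele_freqs: frequencies of the alleles of one marker (should sum to 1).
    """
    # Sum of squared allele frequencies.
    homozygosity = sum(p * p for p in allele_freqs)
    # Sum over all unordered allele pairs (i < j) of 2 * p_i^2 * p_j^2.
    cross_term = sum(2.0 * (pi ** 2) * (pj ** 2)
                     for pi, pj in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical marker with two equally frequent alleles.
    print(pic([0.5, 0.5]))
```

For two equally frequent alleles the function returns 0.375, the maximum attainable for a biallelic marker; the value approaches 1 as the number of equally frequent alleles grows.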

Analysis of population stratification

A total of 1546 dominant loci generated by the 174 SSR primer pairs were available for STRUCTURE analysis. Loci carrying minor alleles (band frequency <0.05 or >0.95) were removed using TASSEL ver. 3.0 (http://www.maizegenetics.net; Bradbury et al. 2007) prior to the LD analysis, so as to avoid bias in LD estimates between pairs of loci that could give a false picture of population stratification (Mohlke et al. 2001). The resulting set of 1306 loci was used in the STRUCTURE analysis, which produced a Q-matrix. To assess the number of groups in the panel of varieties, Bayesian clustering analysis was performed with the software STRUCTURE 2.3.1, in which individuals are assigned to clusters or groups (K) using multi-locus genotypic data, and the Markov Chain Monte Carlo method was used to verify the consistency of the grouping parameters (Pritchard et al. 2000). The natural log probability of data [LnP(D)] obtained from the STRUCTURE output and the ad hoc statistic ΔK were combined to estimate the value of K (Evanno et al. 2005).
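The frequency filter described above was applied in TASSEL; the following Python sketch only illustrates the same idea on a 0/1 band matrix. The function name, the matrix layout (rows as varieties, columns as loci) and the toy data are assumptions made for illustration.

```python
def filter_loci_by_band_frequency(matrix, low=0.05, high=0.95):
    """Return indices of loci whose band frequency lies within [low, high].

    matrix: list of rows (one per variety), each a list of 0/1 scores
            (one per locus). Loci with frequency < low or > high are dropped.
    """
    if not matrix:
        return []
    n_varieties = len(matrix)
    n_loci = len(matrix[0])
    kept = []
    for j in range(n_loci):
        # Fraction of varieties in which the band is present at locus j.
        freq = sum(row[j] for row in matrix) / n_varieties
        if low <= freq <= high:
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Toy panel of 20 varieties and 4 loci (hypothetical data):
    # locus 0 is present in all, locus 1 in half, locus 2 in none,
    # locus 3 in two varieties.
    toy = [[1, 1 if i < 10 else 0, 0, 1 if i < 2 else 0] for i in range(20)]
    print(filter_loci_by_band_frequency(toy))
```

On the toy panel only loci 1 and 3 are retained, because loci 0 and 2 are monomorphic and carry no information for stratification or LD analysis.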

Marker-trait association (MTA) analysis

Marker-trait associations were first investigated for the twenty-three qualitatively assessed traits and then for the nine quantitative traits assessed over the years. The association between each marker and phenotypic trait was calculated with the software TASSEL ver. 3.0 on the basis of a mixed linear model (MLM). The MLM, which accounts for both population structure and kinship, was used so as to avoid false positives arising from inbreeding. The Q matrix generated by STRUCTURE ver. 2.2 and a kinship matrix generated with the software SPAGeDi1.4c (Hardy and Vekemans 2002) were used as input data in the MLM analysis. The MTA analysis for the 23 qualitative traits was based on their respective scores, which were uniform across the three years of study, whereas for the nine quantitative traits it was based on the three-year quantitative data so that the stability of the identified MTAs over the years could be assessed. A false discovery rate (FDR) of 0.05 under the Benjamini and Hochberg procedure was taken as the threshold for declaring an MTA significant (Benjamini and Hochberg 1995).
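The Benjamini and Hochberg step-up procedure used as the significance threshold can be sketched as follows. This is a generic illustration and not the exact routine applied to the TASSEL output; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking p-values significant at FDR alpha.

    Step-up rule: find the largest rank r (1-based, ascending p-values)
    with p_(r) <= (r / m) * alpha, then declare ranks 1..r significant.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    # Hypothetical marker-trait p-values for one trait.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals))
```

With the ten example p-values only the first two tests pass, although five are below 0.05 individually; this is the kind of correction that guards against spurious MTAs when many marker loci are tested against each trait.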

Results

Phenotypic data and correlation studies

The 23 qualitative traits exhibited uniform scores over the three years, and predominant monomorphism was observed across the 92 sugarcane varieties for traits like plant growth habit, leaf blade curvature and leaf sheath adherence. For the rest of the qualitative traits, a significant degree of polymorphism was recorded. For the nine quantitative traits, the mean phenotypic data of the 92 sugarcane varieties over the replications and years revealed a high degree of polymorphism (Table 1, S5). The panel of 92 varieties recorded the highest variation (CV 24%) for NMC, while the lowest variation (CV 3.5%) was recorded for pol per cent (Table 2).

Table 1 Mean sum of square values based on analysis of variance (ANOVA) for the nine quantitative traits of sugarcane for the years 2011–2012, 2012–2013 and 2013–2014
Table 2 General mean, coefficients of variation, heritability (broad sense) and genetic advance for the nine quantitative traits of sugarcane for the years 2011–2012, 2012–2013 and 2013–2014

The average correlation coefficients for the three years (Table 3) revealed that the number of millable canes (NMC), one of the most important yield-contributing traits, had a significant negative correlation with sucrose per cent (−0.16), corrected Brix (−0.15) and commercial cane sugar per cent (CCS; −0.15). Leaf blade width, an important descriptor for grouping of varieties, showed a significant positive correlation with internode diameter (0.44), plant height (0.27) and corrected Brix (0.15), and a significant negative correlation (−0.20) with fibre per cent (FP). Internode diameter, a major contributor to cane weight, was significantly positively correlated (0.20) with plant height but showed a significant negative correlation (−0.25) with FP. Purity per cent (PP), cane diameter and CCS are derivatives of sucrose per cent (SP) and hence showed significant positive correlations (0.95, 0.50 and 0.99, respectively) with it. The genotypic, environmental and phenotypic coefficients of variation (GCV, ECV and PCV) were the highest for NMCs and the lowest for PP (Table 2); as a result, the genetic advance (GA) as per cent of mean was also the highest for NMC (18.60) and the lowest for PP (1.35). This indicates a greater potential of NMC for improvement through traditional breeding.

Table 3 Pair-wise correlation coefficients between the 32 phenotypic descriptors of 92 sugarcane genotypes over three years (2011–2012, 2012–2013 and 2013–2014)

Polymorphic information content (PIC) of the SSR primers

The 174 SSR primer pairs generated a total of 1546 dominant markers across the 92 sugarcane genotypes (Table S3); the number of markers generated per SSR primer pair ranged from 2 (primer IISR_75) to 24 (EST B39), with a mean value of 10.34. The amplification products generated by the SSR primers were between 20 bp and 1200 bp in size. The average PIC value of the 174 SSR primers was 0.60; it was the highest (0.99) for primer IISR_270 and the lowest (0.11) for primer SEGMS 780.

Population diversity, structure and kinship analyses

The number of sub-populations in the panel of varieties was identified as K = 7 with the method of Evanno et al. (2005), and the 92 sugarcane varieties were thus classified into seven groups (Fig. 1). The STRUCTURE analysis revealed that none of the varieties in the panel had a pure STRUCTURE profile. The fixation index (Fst) and gene-flow (Nm) values of the sub-groups also supported this (Table 4); each sub-group showed a high degree of gene flow, ranging between 0.41 (group 7) and 0.97 (group 3) with an average of 0.59. These results also indicate that group 3 possessed the highest internal heterozygosity. The average frequency divergence among the sub-populations was very low (0.67; Table 4), indicating a mixed genetic composition.

Fig. 1 The STRUCTURE-generated profiles of the 92 sugarcane varieties from sub-tropical India. Each of the 92 varietal profiles is depicted as a vertical column, and the profiles are grouped into seven sub-clusters based on their similarity

Table 4 Fixation index (Fst) and gene-flow (Nm) estimates of the seven pedigree-based clusters of sugarcane that were generated using STRUCTURE ver. 2.2

Marker-trait associations

The search for marker-trait associations was based on the mixed linear model (MLM), which accounts for familial relatedness and population sub-structure. In addition, the MLM is able to control both type I and type II errors and to screen for markers that are genuinely associated with the traits of interest. The threshold for declaring markers significant was based on Benjamini and Hochberg’s FDR. In this way, a total of 60 markers were identified that showed associations with 22 of the 23 qualitative traits considered in this study (Table 5). Among the qualitative traits, the maximum of six markers was identified for leaf curvature, while a single MTA was identified for a few descriptors like bud cushion, bud size, growth crack, pithiness, and shape of bud (Table 5). Individually, the markers explained a significant proportion of trait variation, ranging from 16% (marker IISR_9_150 for colour of dewlap) to 40% (marker IISR_227_150 for root band width), which is higher than the trait variation explained in previously reported marker-trait association studies in sugarcane (Debibakas et al. 2014; Guoy et al. 2015; Banerjee et al. 2015). No MTA was detected for internode colour (unexposed).

Table 5 Significant marker-trait associations (MTAs) identified for the phenotypic descriptors

In the case of the nine quantitative traits, a total of 21 marker-trait associations were detected that were stable over the three years of study (Table 6). The MLM identified one marker each associated with corrected Brix value and fibre content, two each with CCS per cent, leaf blade width and plant height, three each with internode diameter, NMCs and sucrose content, and four markers associated with purity per cent (Table 6). The per cent trait variation explained by the EST-SSR markers was reasonably high; for example, the marker IISR_236 explained 37% of the trait variation in CCS per cent and 38% in corrected Brix, while the marker IISR_279 explained 37% of the trait variation in plant height (Table 6).

Table 6 Significant marker-trait associations (MTAs) identified for the nine quantitative traits of sugarcane

It is worth mentioning that eight of the 75 MTAs identified in this study were also identified in a previous study by Banerjee et al. (2015), which used a different panel of 108 genotypes comprising advanced breeding stocks, old parental lines and released varieties, although some of the varieties were common to both studies. In both studies, markers IISR_176c and EST A48 were found to be associated with NMCs, and IISR_114 was associated with leaf blade width, CCS per cent and cane weight.

Discussion

This study employed a genome-wide association mapping approach to identify markers associated with 32 agronomic and quality traits in sugarcane in a panel of 92 varieties that have been in cultivation in the Indian sub-continent since the 1930s. An attempt was also made to compare the efficacy of marker-trait association analysis for quantitative vis-a-vis qualitative traits. In general, the converted scores of the qualitative traits showed no variation over the years, indicating their stability and a low environmental influence on their expression. Such low environmental influence has also been reported by Chaudhary (2001) and could be attributed to the strictly clonal multiplication of sugarcane.

The phenotyping data of the sugarcane varieties for the nine quantitative traits revealed significant correlations among traits like internode diameter, plant height and leaf blade width, which are significant contributors to total cane yield. Thus an MTA identified for any of these three traits might also serve as a potential MTA for the other two. Internode diameter is a major quantitative trait contributing to cane yield, but it is difficult to track this trait individually for MTAs, mainly because of a high degree of missing heritability. This trait showed a significant positive correlation with two other important traits, viz., plant growth habit (0.33) and leaf blade width (0.52), and a significant negative correlation with leaf sheath hairiness (−0.33). Since only a few MTAs were detected for internode diameter, the MTAs detected for the qualitative traits correlated with it could be used for fine screening during marker-assisted selection (MAS). Another important yield-contributing trait is plant height, but its expression is highly sensitive to environmental conditions, so it is always difficult to detect stable MTAs for this trait. Since plant height exhibited a significant positive correlation with root band width (0.44), the MTAs identified for root band width may also act as alternative markers for plant height. MTAs for qualitative traits have the advantage of showing a lesser degree of missing heritability than markers associated with quantitative traits. A negative correlation between sucrose content and number of millable canes (NMCs) has been reported in previous studies (Chaudhary 2001). In this study, NMCs showed the highest GCV value, which reinforces the strong potential of this trait for improvement through traditional breeding approaches.

A number of studies have reported the efficiency of STRUCTURE in accurately predicting population stratification (Falush et al. 2007). In the present study, the panel of 92 varieties could be classified into seven sub-groups, and none of the varieties had a pure STRUCTURE profile. Such results are to be expected, given that sugarcane possesses a mosaic genome, a highly heterozygous and complex conglomerate of randomly segregating long stretches of genetic elements (Singh et al. 2013). A very high average gene-flow value (0.59) between population sub-clusters also supports this view and indicates a mixed ancestry for all the sugarcane varieties in the current study. This study identified a total of 60 SSR markers associated with 22 qualitative traits and 21 markers associated with nine quantitative traits. The MTAs for the qualitative traits explained a higher proportion of the phenotypic trait variation than those for the quantitative traits. Similar findings were also reported in previous MTA studies in sugarcane (Debibakas et al. 2014; Guoy et al. 2015; Banerjee et al. 2015). This could be because the qualitative phenotypic traits were not quantitatively governed and hence had a reduced possibility of missing rare variants, thereby minimizing the chances of missing heritability. Thus markers identified for qualitative traits could help in screening progeny in breeding programmes and could also be of immense value in varietal identification in future, when the plant protection regime becomes more streamlined and stringent and the inheritance of specific loci may need to be followed during the breeding process.

The 21 MTAs detected for the nine quantitative traits were stable over the three years of study. Of these traits, fibre content was highly unstable between years; fibre is an important trait in sugarcane breeding, and its inheritance is generally camouflaged by environmental variation. Despite this instability, one SSR marker, ESTB157_600, was associated with this trait and showed a stable and significant association in all three years of study. Such a marker could be used effectively as an alternative to phenotypic screening so as to overcome the ambiguities arising from environmental variation over the years. It could be concluded that the mixed linear model (MLM) implemented in TASSEL was better than ANOVA at accounting for errors arising from environmental variation.

Among the quantitative traits, number of millable canes (NMCs) had the highest heritability, and it is worth mentioning that two MTAs for NMCs identified in this study, viz., IISR_176c and EST A48, were also reported for the same trait in a previous study by Banerjee et al. (2015). These MTAs explained 13% (IISR_176c) and 8% (EST A48) of the phenotypic variation and had been designed from source ESTs that coded for a 60S ribosomal protein and chloroplast phytoene synthase1, respectively. These two MTAs therefore merit further investigation. For purity per cent and internode diameter, the identified MTAs explained a cumulative trait variation of nearly 100 per cent, which could be because a large number of EST-SSR markers were used in this study and the identified MTAs might be linked and/or located on the same chromosome. In the case of interspecific catfish, Hutson et al. (2014) also reported QTLs cumulatively explaining more than 100 per cent of trait variation and offered a similar explanation. In sugarcane, studies by Yang et al. (2010) and Debibakas et al. (2014) suggest that low marker density could lead to low cumulative R² values. In the present study, this bottleneck was overcome by increasing the number of markers, and as a result a sufficient per cent of trait variation could be explained.

Conclusion

The genome complexity of sugarcane, coupled with the complex genetic mechanisms controlling yield and quality traits, multiplies the hurdles in targeted breeding of this polyaneuploid crop. Stable MTAs could therefore be of practical use for breeding and selection in sugarcane, which requires more comprehensive tailoring and coverage. This study identified a significant number of MTAs for important yield-contributing qualitative and quantitative traits, and these explained a higher proportion of trait variation than previously reported for sugarcane. Such MTAs could be of immense value in sugarcane, where only a multitude of small-effect MTAs has previously been reported. Further, incorporating these markers in selection and breeding programmes has the potential to increase genetic gain per cycle of recurrent selection, owing to the higher per cent trait variation they explain.