Introduction

Globally, about thirty percent of irrigated agricultural land is affected by salinity (http://www.ars.usda.gov/Main/docs.htm?docid=10201&page=1). The human population is projected to reach 9.2 billion by 2050, and agricultural production must therefore be doubled to ensure an adequate food supply (FAO, UN 2009). Rice is one of the most important food crops, meeting approximately 20 % of the global food requirement, and could therefore serve as a model crop for global food security (Maclean et al. 2013). However, during the last decade, growth in rice productivity has plateaued, possibly owing to intensive agricultural practices, changing climatic conditions, and/or associated abiotic stresses (Mishra et al. 2016). The changing climate, frequent intermittent flooding, drought, and reduced freshwater availability have increased salt concentrations in soil (Turral et al. 2011). Continuous climatic fluctuations challenge society to develop climate-resilient crops. Plants are sensitive to climatic variations and environmental stresses, which threatens both the genetic diversity of crops and world food security (Mercer 2010). New adaptive mutations are unlikely to arise quickly enough to keep pace with such rapid climatic fluctuations. Hence, existing wild rice genetic resources, with well-documented adaptive variation in different genes, may have retained genes that confer tolerance to such stresses (Shaw and Etterson 2012). Landraces are known to contain higher genetic diversity than modern high-yielding rice varieties; however, they offer few sources of salt tolerance, which makes it necessary to explore the novel genetic variation present in wild rice germplasm.

Natural populations of A-genome wild rice, the progenitors of cultivated rice, are a rich source of genetic diversity. Hence, in this study, we analysed wild rice accessions collected from different regions of India, including stress hotspots, for phenotypic variation in salt tolerance and for sequence variation in salt-tolerance candidate genes. Several useful genes have already been exploited from wild rice germplasm, including the male sterility gene found in Oryza rufipogon (Yuan et al. 1993), the Bph19(t) gene from O. rufipogon (Li et al. 2010), and the grassy stunt virus-resistance gene from Oryza nivara (Khush and Ling 1974). For salinity-stress tolerance, inositol methyl transferase was identified, and L-myo-inositol 1-phosphate synthase was introgressed from the wild rice Oryza coarctata into an indica cultivar of Oryza sativa (Das-Chatterjee et al. 2006; Sengupta et al. 2008). Beneficial alleles derived from wild rice have been transferred into elite genetic backgrounds, enhancing the yield and trait performance of rice varieties. However, the identification of such allelic variants and the exploitation of candidate gene diversity from wild rice are still at a nascent stage. In this context, a genome-wide survey of stress-responsive candidate genes may facilitate the identification of adaptive variation, support evolutionary studies, and ultimately contribute to crop improvement. Current molecular breeding strategies for cultivar improvement depend on the availability of tightly linked markers, functional genic markers based on sequence variation in the causal gene for an agronomic trait, or both (Xu et al. 2012).

Salinity imposes both osmotic and ionic stresses and induces a network of complex physiological responses regulated by a plethora of genes (Kumar et al. 2013). The effect of osmotic stress is alleviated by the production of cellular osmolytes (proline, glycine betaine, mannitol, etc.), whose synthesis is regulated by genes of the corresponding metabolic pathways (Singla-Pareek et al. 2008). Ion transporters are another group of genes, pivotal for ion homeostasis, which is in turn controlled by signalling genes and transcription factors (Serrano and Rodriguez-Navarro 2001; Zhu 2003). Key genes of essential biochemical pathways responding to salt stress were therefore selected for the present study. The regulatory genes of osmolyte biosynthesis pathways are key to osmolyte production, such as P5CS1 (Jacobs et al. 2003) and P5CS2 (Hur et al. 2004) for proline, BADH1, BADH2, and CMO for glycine betaine (Nakamura et al. 1997), and MIPS for myo-inositol (Das-Chatterjee et al. 2006). Among the transporter gene families, HKT (Ren et al. 2005; Horie et al. 2007), NHX (Fukuda et al. 2011), SOS1 (Martinez-Atienza et al. 2007), and PIP1;1 (Liu et al. 2013) genes were selected. Important signalling components regulating ionic homeostasis, namely SIK1 (Ouyang et al. 2010), SOS2, and SOS3 (Martinez-Atienza et al. 2007), as well as transcription factors, such as bZIP23 (Xiang et al. 2008), bZIP71 (Liu et al. 2014), HsfC1B (Schmidt et al. 2012), and MYB2 (Yang et al. 2012), were also analysed. The vulnerability of a crop to salinity and other abiotic stresses depends on its growth stage; the flowering stage is the most susceptible, as stress at this stage causes the greatest yield losses (Ghomi et al. 2013; Singh et al. 2015a). The identification of functional allelic variants of these candidate genes will contribute towards designing salt-tolerant, climate-resilient rice varieties.

In the present study, 21 salinity-responsive genes belonging to different functional categories, such as osmolyte biosynthesis, transporters, and transcription factors, were selected for allele mining. The genes were PCR amplified, re-sequenced, and analysed in a representative sample of 103 accessions of wild and cultivated rice.

Results

Variation for salt tolerance among the wild rice germplasm

A set of 299 wild rice accessions was phenotyped for salinity tolerance at the seedling stage. Tolerance was measured as the standard evaluation score (SES) together with four tolerance-associated traits, namely relative water content (RWC), chlorophyll content, total biomass, and sodium and potassium ion concentrations. Overall, two accessions were highly tolerant, 14 tolerant, 28 moderately tolerant, 69 sensitive, and 186 highly sensitive to salt stress (Mishra et al. 2016).
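The five tolerance classes correspond to the 1-3-5-7-9 scale of IRRI's SES, on which 1 denotes highly tolerant and 9 highly sensitive plants. As a minimal sketch, assuming replicate-averaged scores are binned to the nearest class with ties going to the more tolerant class (this binning rule and the function names are hypothetical, not taken from Mishra et al. 2016), the classification and tally could be written as:

```python
from collections import Counter

# Hypothetical binning rule: a mean SES score (1-9 scale) is assigned to the
# nearest odd class; a score exactly between two classes goes to the more tolerant one.
SES_BINS = [
    (2.0, "highly tolerant"),      # SES 1
    (4.0, "tolerant"),             # SES 3
    (6.0, "moderately tolerant"),  # SES 5
    (8.0, "sensitive"),            # SES 7
    (9.0, "highly sensitive"),     # SES 9
]


def classify_ses(mean_score: float) -> str:
    """Map a (possibly replicate-averaged) SES score to a tolerance class."""
    if not 1.0 <= mean_score <= 9.0:
        raise ValueError(f"SES score out of range: {mean_score}")
    for upper, label in SES_BINS:
        if mean_score <= upper:
            return label
    raise AssertionError("unreachable: 9.0 is the last upper bound")


def tally_classes(scores):
    """Count accessions per tolerance class."""
    return Counter(classify_ses(s) for s in scores)


if __name__ == "__main__":
    # Toy scores for five hypothetical accessions (not data from this study).
    print(tally_classes([1.0, 3.3, 5.0, 7.7, 9.0]))
```

Summing the five class counts reported above (2 + 14 + 28 + 69 + 186) recovers the 299 screened accessions, which is a quick consistency check such a tally provides.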

Sequence diversity in salt stress-responsive candidate genes

Twenty-one salt-tolerance candidate genes belonging to different functional categories were PCR amplified and re-sequenced from 103 selected accessions showing varying levels of salt tolerance. These included regulatory genes of major osmolyte synthesis pathways, transporter gene families, transcription factors, signalling components regulating ion homeostasis, and anti-oxidative stress-responsive genes. Sequencing covered entire genes, including the promoter, coding and noncoding regions, and the 5′ and 3′ UTRs. In total, 125.5 kb of sequence covering all 21 genes was analysed, and 2043 SNPs were identified, an average density of about 16 SNPs per kb. Among the osmotic-stress-regulating genes, MIPS1 had the fewest segregating sites (S = 8) and the lowest nucleotide diversity (π = 0.00024), whereas BADH1 had the most segregating sites (S = 50) and BADH2 had the highest nucleotide diversity (π = 0.0004) (Table 1). The number of haplotypes ranged from 7 in MIPS1 to 27 in BADH1, and haplotype diversity ranged from 0.2 in MIPS1 to 0.736 in P5CS2. The signalling genes regulating ion concentration, SIK1, SOS2, and SOS3, had 13, 18, and 11 haplotypes and haplotype diversities of 0.481, 0.44, and 0.405, respectively. Among the transporter family genes, NHX2 showed the highest nucleotide diversity (0.00212), the largest number of haplotypes (42, followed by 25 for SOS1), and the highest haplotype diversity (0.888), whereas NHX4 had the lowest haplotype diversity (0.146). Among the four transcription factor genes, MYB2 was completely conserved, with no segregating sites across the 103 genotypes, while bZIP71 and hsfC1B were monomorphic in the coding region. The numbers of haplotypes for bZIP23, bZIP71, and hsfC1B were 5, 5, and 7, and their haplotype diversities were 0.182, 0.00158, and 0.217, respectively (Table 1).
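The summary statistics in Table 1 follow standard definitions: S counts polymorphic alignment columns, π (Nei 1987) is the mean number of pairwise differences per site, and haplotype diversity is Nei's Hd = n/(n − 1)(1 − Σp²). A minimal Python sketch, assuming a gap-free alignment of equal-length sequences with no missing data (DnaSP also handles gaps, missing data, and silent-site restriction, which this sketch does not); the function names are illustrative:

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs):
    """S: number of alignment columns carrying more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Nei's pi: mean number of pairwise differences divided by alignment length."""
    n, length = len(seqs), len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2))
    return diffs / (n * (n - 1) / 2) / length


def haplotype_diversity(seqs):
    """Nei's Hd = n/(n-1) * (1 - sum p_i^2), plus the number of distinct haplotypes."""
    n = len(seqs)
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return hd, len(counts)


def snp_density_per_kb(n_snps, sequenced_bp):
    """Average number of SNPs per kilobase of sequenced region."""
    return n_snps / (sequenced_bp / 1000)


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AATT"]  # hypothetical four-accession alignment
    print(segregating_sites(toy), nucleotide_diversity(toy), haplotype_diversity(toy))
    print(round(snp_density_per_kb(2043, 125_500), 2))  # totals reported above
```

For the toy alignment, S = 2, π ≈ 0.29 and Hd ≈ 0.83 with three haplotypes; the last line reproduces the density of roughly 16 SNPs per kb reported above.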

Table 1 Nucleotide and haplotype diversity and tests of neutrality for 21 salt stress-related genes of different functional classes

Association of SNPs in the candidate gene with salt-tolerance phenotype

LD-based association analysis identified SNPs in 14 of the 21 genes that were significantly associated with the SES score for salt tolerance in the wild rice accessions (Table 2). Overall, PIP1 had the largest number of associated SNPs (14). Among the seven genes belonging to osmolyte biosynthesis pathways, one SNP each in MIPS1, MIPS2, and P5CS1 and two SNPs each in BADH2 and P5CS2 were significantly associated with the SES score. Among the signalling pathway genes, five SNPs in SIK1 and two in SOS2 were associated, and among the transporter family genes, one SNP each in SOS1, NHX1, and NHX3, nine SNPs in NHX2, and 14 SNPs in PIP1 were significantly associated with the SES score. Of the four transcription factor genes studied, one SNP in HsfC1B and two SNPs in MYB2 were also associated with the SES score. Whereas most of the associated SNPs were located in 5′ upstream, 3′ downstream noncoding, or intronic regions, SIK1 and MYB2 had exonic SNPs associated with the salt-tolerance phenotype (Table 2).

Table 2 Association of SNPs in selected salt stress response genes with standard evaluation score (SES) (at R² > 5 % and p < 0.01)
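The thresholds in Table 2 (R² > 5 % and p < 0.01) can be illustrated with a deliberately simplified single-marker scan. This sketch is not the Q + K mixed linear model run in TASSEL: it ignores kinship and population structure, uses the squared Pearson correlation between genotype codes and SES scores as R², and swaps in a permutation test for the p value. The 0/2 coding of homozygous calls, the SNP identifiers, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean


def r_squared(genotypes, scores):
    """Squared Pearson correlation between genotype codes and phenotype scores."""
    mg, ms = mean(genotypes), mean(scores)
    sxy = sum((g - mg) * (s - ms) for g, s in zip(genotypes, scores))
    sxx = sum((g - mg) ** 2 for g in genotypes)
    syy = sum((s - ms) ** 2 for s in scores)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype: no signal
    return sxy * sxy / (sxx * syy)


def permutation_p(genotypes, scores, n_perm=999, seed=1):
    """Share of phenotype shuffles giving an R^2 at least as large as observed."""
    rng = random.Random(seed)
    observed = r_squared(genotypes, scores)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def scan(snp_calls, ses, r2_min=0.05, p_max=0.01, n_perm=999):
    """Return (snp_id, R^2, p) for SNPs passing both thresholds."""
    passing = []
    for snp_id, calls in snp_calls.items():
        r2 = r_squared(calls, ses)
        p = permutation_p(calls, ses, n_perm)
        if r2 > r2_min and p < p_max:
            passing.append((snp_id, r2, p))
    return passing


if __name__ == "__main__":
    ses = [1] * 10 + [9] * 10  # hypothetical SES scores for 20 accessions
    calls = {"snp_a": [0] * 10 + [2] * 10, "snp_b": [0, 2] * 10}
    print(scan(calls, ses))
```

With 999 permutations the smallest attainable p value is 0.001, so the 0.01 threshold can be reached; a real analysis would still need the K and Q corrections described in the Methods.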

Identification of salt-tolerant groups of genotypes based on 29 candidate genes

In addition to these 21 candidate genes, sequence data for eight HKT transporter genes (Mishra et al. 2016) were used to build a concatenated alignment of all 29 genes for grouping the wild rice genotypes. To exclude rare alleles, SNP sites were filtered at a minimum count of six genotypes. A neighbour joining (NJ) dendrogram was generated using SNPs from the promoter, coding, and noncoding regions of all 29 genes (Fig. 1), which grouped the genotypes into three clusters (I, II, and III). Cluster-I contained a single wild rice accession, NKSWR238, and cluster-II consisted of 12 wild rice accessions. Cluster-III, the major cluster, was further divided into two sub-clusters, IIIA and IIIB. Cluster-IIIA contained only wild rice accessions and was further subdivided into two minor groups. Tolerant accessions were generally distributed randomly among the groups, except for two clusters, of four and six accessions, that consisted only of tolerant accessions: sub-cluster-IIIA-1-2, one of the two minor groups of IIIA, and sub-cluster-IIIB-3. This suggests that certain combinations of alleles of these candidate genes are associated with the salt-tolerance phenotype. The remaining tolerant accessions were dispersed across different sub-clusters. Seven cultivated O. sativa varieties, including both salt-tolerant and sensitive cultivars, grouped together in a separate sub-cluster, IIIB-5. To identify the differentiating SNPs and the corresponding gene(s), the genomic sequences of each tolerant cluster were separately aligned with five tolerant and five susceptible accessions from neighbouring clusters (Table 3). For cluster-IIIA-1-2, SNPs differentiating the tolerant and sensitive accessions were found in HKT2;3, NHX1, SOS1, BADH1, and P5CS2; for cluster-IIIB-3, differentiating SNPs were found in HKT1;3, P5CS2, and BADH1. SNP annotation of the individual tolerant sub-clusters showed that cluster-IIIA-1-2 was differentiated by one SNP each in BADH1 and NHX1, four SNPs each in HKT2;3 and P5CS2, and 14 SNPs in SOS1, whereas cluster-IIIB-3 was differentiated by one SNP each in P5CS2 and BADH1 and three SNPs in HKT1;3 (Fig. 1; Table 3).
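The concatenation and rare-allele filter described above can be sketched as follows. The sketch assumes that each gene is supplied as a gap-free alignment keyed by accession and that the gene order (by genomic position) is already known. It interprets the minimum count of six genotypes as requiring at least two alleles each carried by six or more accessions; this interpretation, the p-distance measure, and all function names are assumptions rather than the exact TASSEL settings.

```python
from collections import Counter


def concatenate(gene_alignments, gene_order):
    """Join per-gene sequences for each accession in genomic order.

    gene_alignments: {gene: {accession: aligned_sequence}}
    gene_order: gene names sorted by chromosome position (assumed known).
    """
    accessions = sorted(gene_alignments[gene_order[0]])
    return {acc: "".join(gene_alignments[g][acc] for g in gene_order) for acc in accessions}


def filter_snps(concat, min_count=6):
    """Keep columns whose second most common allele occurs in >= min_count accessions."""
    accessions = list(concat)
    kept = []
    for column in zip(*(concat[a] for a in accessions)):
        counts = Counter(base for base in column if base in "ACGT").most_common()
        if len(counts) >= 2 and counts[1][1] >= min_count:
            kept.append(column)
    return {acc: "".join(col[i] for col in kept) for i, acc in enumerate(accessions)}


def p_distance_matrix(snps):
    """Proportion of differing SNP sites between every pair of accessions."""
    names = list(snps)
    n_sites = len(snps[names[0]])
    if n_sites == 0:
        raise ValueError("no SNP sites passed the filter")
    matrix = [[sum(a != b for a, b in zip(snps[x], snps[y])) / n_sites for y in names]
              for x in names]
    return names, matrix
```

The resulting distance matrix is the usual input to a distance-based tree method such as neighbour joining.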

Fig. 1

Neighbour joining tree based on SNPs of all 29 genes. The genes were concatenated according to their genomic positions and aligned, and the tree was constructed from the SNPs. Red lines indicate tolerant wild rice lines, blue lines O. sativa cultivars, and black lines intolerant wild rice (colour figure online)
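Fig. 1 is described as a neighbour joining tree. The core of the Saitou and Nei neighbour joining algorithm is sketched below in plain Python for a small symmetric distance matrix. It is an illustration of the technique, not the software used to draw Fig. 1; the function name is hypothetical, and the returned Newick string is rooted arbitrarily at the midpoint of the final join.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbour joining on a symmetric distance matrix; returns Newick."""
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Choose the pair minimising the Q criterion (n - 2) * d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to the two joined nodes.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to every remaining node.
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    a, b = nodes
    half = d[a][b] / 2  # split the last edge to place an arbitrary root
    return f"({a}:{half:.4f},{b}:{half:.4f});"


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D"]
    dm = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
    print(neighbor_joining(taxa, dm))
```

For this additive toy matrix the sketch recovers the generating tree, with branch lengths A = 1, B = 2, C = 3, D = 4 and an internal edge of 1 split at the root.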

Table 3 Unique SNPs in the accessions belonging to tolerant clusters

Geographical distribution of the haplotypes of salt-responsive gene

The 103 accessions from the Indo-Gangetic plains, the Western Himalayas (Himachal Pradesh), and the West Coast (Gujarat and Maharashtra) were previously assigned to three genetically structured sub-populations using genome-wide SNP markers (Mishra et al. 2016). Every gene had a major haplotype shared by a large number of accessions and a few minor haplotypes, and these haplotypes were distributed randomly across the three sub-populations (Fig. 2). In 13 genes, minor haplotypes comprising at most seven accessions (five from the West Coast and two from the lower Gangetic plains) were found. Among these, SIK1 and P5CS2 further separated the Gujarat accessions from those of the Indo-Gangetic plains into two distinct haplotypes, whereas SOS1, SOS2, and SOS3 divided this group of accessions into two haplotypic groups that did not correspond to geography. The seven accessions of CMO haplotype H4 (except NKSWR186), the three and eight accessions of SOS3 haplotypes H5 and H6, respectively, and the five accessions of PIP1;1 haplotype H3 belonged specifically to the upper and middle Indo-Gangetic plains (Fig. 2). Cultivated rice varieties grouped with the major haplotype for most genes, but were distributed across minor haplotypes for some genes.
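Assigning haplotype labels and tabulating them by geographic region, as summarised in Fig. 2, can be sketched as below. The sketch assumes one gap-free sequence per accession and labels haplotypes H1, H2, ... in decreasing order of frequency; this labelling convention, the region names, and the function names are hypothetical and need not match the H numbers used in Fig. 2.

```python
from collections import Counter, defaultdict


def haplotypes_by_region(seqs, regions):
    """Label haplotypes by frequency and count accessions per region.

    seqs: {accession: sequence}; regions: {accession: region name}.
    Returns {haplotype_label: {region: count}}.
    """
    counts = Counter(seqs.values())
    # H1 is the most frequent haplotype; ties are broken by sequence for determinism.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    label = {s: f"H{i + 1}" for i, s in enumerate(ordered)}
    table = defaultdict(Counter)
    for accession, seq in seqs.items():
        table[label[seq]][regions[accession]] += 1
    return {hap: dict(by_region) for hap, by_region in table.items()}


def region_specific(table):
    """Haplotypes observed in a single region only, mapped to that region."""
    return {hap: next(iter(by_region)) for hap, by_region in table.items() if len(by_region) == 1}


if __name__ == "__main__":
    seqs = {"a1": "AC", "a2": "AC", "a3": "AC", "a4": "GT", "a5": "GT"}
    regions = {"a1": "West Coast", "a2": "IGP", "a3": "IGP", "a4": "West Coast", "a5": "West Coast"}
    table = haplotypes_by_region(seqs, regions)
    print(table, region_specific(table))
```

Region-specific haplotypes flagged this way correspond to the geographically restricted minor haplotypes discussed above.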

Fig. 2

Haplotype networks of 20 genes and the distribution of haplotypes across seven geographic locations. a BADH1, b BADH2, c bZIP23, d bZIP71, e CMO, f hsfC1b, g MIPS1, h MIPS2, i NHX1, j NHX2, k NHX3, l NHX4, m NHX5, n P5CS1, o P5CS2, p PIP1, q SIK1, r SOS1, s SOS2, t SOS3. Each circle represents a haplotype, and circle size is proportional to haplotype frequency. Colour coding indicates the geographic location of the accessions. The numbers on the branches indicate the positions of mutations in the sequenced region

Discussion

Salinity causes substantial yield losses in rice, but only a limited number of tolerance sources have been identified to date, and these have had only partial success in the development of salt-tolerant cultivars. To identify new sources of salt tolerance, studies have been conducted on rice landraces, but the greater genetic variation in wild rice germplasm remains largely unexplored. Therefore, in this study, we assessed the genetic potential of Indian wild rice by analysing sequence variation in 21 salt stress-responsive candidate genes. Transporter genes had the highest average nucleotide variation per gene, followed by osmotic-stress-related genes and signalling intermediates, whereas transcription factors had the lowest. This suggests a relationship between genetic variation and the adaptive and functional significance of genes (Storz and Wheat 2010). However, the number of haplotypes and haplotype diversity were broadly similar across all classes of genes except transcription factors, which had significantly lower values. This study thus also revealed differences between nucleotide and haplotype diversity, as well as in the nature of variation across gene families. Such differences between nucleotide and haplotype diversity have also been reported for individual genes in wild and cultivated rice, such as sucrose synthase 3 (Lestari et al. 2011), OsDREB1F (Singh et al. 2015b), and OsSNAC1 (Singh et al. 2015c), and across the HKT gene family (Mishra et al. 2016).

Association analysis showed that most of the trait-associated SNPs were located in promoter regions, with only a few in exonic regions. Mutations in regulatory regions are often responsible for altered gene expression (Stern and Orgogozo 2008). Analysis of SNPs in the promoter regions of different genes identified an associated W-box, a binding site of WRKY transcription factors, in the BADH2 gene. WRKY transcription factors have been reported to function in tolerance to both biotic and abiotic stresses (Eulgem et al. 2000). BADH2 regulates the synthesis of betaine from choline; betaine is an osmoprotectant that allows normal growth under salinity stress (Chen et al. 2008). Although another associated gene, SOS2, regulates the ionic stress response, it has also been implicated in the dehydration stress response through transcriptional activation of abscisic acid (ABA) signalling (Abe et al. 2003). MYB transcription factors regulate the transcriptional activation of dehydration-responsive genes, and an MYB recognition site has been identified in the promoter region of SOS2, suggesting that the dehydration response is also part of the salt-tolerance mechanism. In Arabidopsis, an interaction between the SOS2 protein and the protein phosphatase ABI2 (abscisic acid-insensitive 2) has been identified in response to dehydration, indicating cross-talk with the ABA signalling pathway (Ohta et al. 2003). This further supports the role of regulatory networks, including plant hormones and various downstream gene families, in both ionic and osmotic homeostasis for salt tolerance in rice.

None of the haplotypes of the analysed genes consisted only of tolerant accessions, suggesting that no single gene is sufficient to impart complete salt tolerance. Because salt tolerance is governed by multiple genes, the SES score reflects the cumulative action of all the genes/QTLs responding to salt stress. Previous haplotype-based association studies have rarely detected associations with the SES score; however, the present study did detect associations with SES, and associations with other salt-related traits, such as shoot Na+ concentration, have also been reported (Platten et al. 2013). Hence, tolerant alleles of a gene could be reliably distinguished by combining two approaches: precise phenotypic measurement, and a molecular or genetic technique able to dissect the role of a particular gene within the interacting metabolic pathways governing salt tolerance. In this study, no haplotype of the 21 genes grouped all the tolerant accessions; instead, they were distributed across different haplotype groups. Negrão et al. (2013) reported similar results, with most tolerant accessions not grouping into particular haplotypes.

Salt tolerance is governed by multiple gene families. Transporters are important regulators of ion homeostasis, osmolytes are equally important for maintaining osmotic potential and the functional integrity of biomolecules, and transcription factors are the master regulators of gene expression. In this study, we identified haplotypes for different gene families; improved salt-tolerant varieties could be developed by assembling superior alleles of different classes of genes into a high-yielding genetic background. Kurokawa et al. (2016) used haplotype information on 14 useful alleles to construct an SNP array that could be used in haplotype-based selection for pyramiding important genes in rice. Transgenic tobacco plants carrying a bicistronic transcriptional unit expressing a wheat antiporter and an H+-pyrophosphatase were salt tolerant (Gouiaa et al. 2012). Similarly, seven different QTLs for drought tolerance are being transferred into submergence-tolerant varieties carrying the Sub1 gene (Singh et al. 2015d). Together, these observations suggest that a single candidate gene may not provide an adequate level of salt tolerance, and that the cumulative action of multiple genes should therefore be exploited in developing salt-tolerant cultivars.

Clustering of genotypes based on the concatenated sequences of 29 genes was used to assess whether any particular combination of alleles of these genes leads to a salt-tolerant phenotype (Fig. 1). Tolerant accessions were distributed across different clusters, suggesting differences in the genetic potential of the accessions and the presence of various tolerant alleles of the candidate genes. Earlier, Platten et al. (2013) identified seven haplotypes of the HKT1;5 gene in seven different clusters. However, two sub-clusters, IIIA-1-2 and IIIB-3, consisted exclusively of tolerant accessions, which may point to epistatic interactions among these genes. This is likely due to different molecular mechanisms of salt tolerance, or to different interacting sets of multiple genes, in each group. Alternatively, the clustering of tolerant accessions in these sub-clusters may reflect their close phylogenetic relationship, in which case the underlying genetic control of salt tolerance may be simple. This can only be resolved by inheritance analysis after crossing the tolerant accessions with closely related sensitive accessions. The SNPs that differentiated the tolerant clusters were located in both transporter family and osmolyte biosynthetic pathway genes, suggesting that the phenotypic differences may arise from the regulation of both osmolyte accumulation and ionic concentrations. Phylogenetic analysis based on Nei's coefficient placed the cultivated rice varieties in a separate group, which may reflect their phylogenetic distance from the wild rice accessions. It also suggests that cultivated varieties have retained only a fraction of the wider genetic variation, because cultivated rice was domesticated from a particular group of wild rice. Elite modern rice varieties are well known to represent only a fraction of the total genetic variation present in their wild relatives (Brar and Khush 2003; Londo et al. 2006; McNally et al. 2009). Extensive analysis is required to identify the ancestral population from which cultivated rice may have evolved.

In the neutral allele model, new alleles arise by mutation, and the most frequent allele is inferred to be the oldest (Watterson and Guess 1977). In the present study, the major haplotypes of all genes were present in accessions from each of the three genetically structured sub-populations of wild rice, similar to the results of Singh et al. (2015e). This suggests an ancient origin of these haplotypes, which would explain their even distribution in wild rice across multiple geographic regions of India. The haplotype networks of some genes showed distinct eco-geographical distributions of haplotypes. In particular, minor haplotypes representing the same seven accessions from two geographically distinct salt-affected areas were found in the haplotype networks of seven genes. This may be due either to adaptive variation acquired by these accessions or to migration of accessions between geographic regions. Moreover, in six other genes, the cluster of these seven accessions was further divided into two subgroups, but not by geography. These may be recent haplotypes that originated in the Gujarat hills and plains region (Fig. 2), possibly representing a recent expansion of Indian wild rice, which suggests that minor haplotypes are informative for studying the evolution and domestication history of rice. Furthermore, because adaptive variation arising across multiple genes is unlikely, directional expansion of wild rice may be the primary reason for such shared variation. Population expansion is also supported by significantly negative values of neutrality test statistics (Tajima's D and Fu and Li's D*; Singh et al. 2015b). In cultivated rice, accessions sharing a haplotype at one gene were distributed across multiple haplotypes at other genes. Londo et al. (2006) also observed such discordance between the haplotype networks of different genes and attributed it to the different evolutionary histories of different types of genes. They further noted that such variation could reflect lineage sorting or multiple origins of the O. sativa indica lineage; however, their study covered wider geographic regions.

The wild progenitors of cultivated rice are a rich source of genetic variation; however, the level of variation differs among gene families. Unravelling the genetic potential residing within individual gene families may indicate their acquired functions. The high genetic variation among transporter genes is consistent with a relationship between adaptive variation and functional significance, which was further supported by candidate gene-based association analysis. Genetic interactions between different genes involved in overlapping tolerance mechanisms may be responsible for tolerance in the studied set of wild rice. The identified minor haplotypes may prove valuable in evolutionary studies of the crop. Measuring phenotypic variation and exploiting sequence variation in candidate genes may enhance our understanding of salt tolerance and help in developing new salt-tolerant cultivars.

Materials and methods

Plant material and their salt-tolerance phenotypes

A total of 299 accessions of wild rice (O. nivara and O. rufipogon) collected from different geographical regions of India were used. Of these, 95 wild accessions, together with eight cultivated rice varieties, were used for targeted re-sequencing of the salt-tolerance candidate genes. All accessions were phenotyped according to IRRI's SES protocol with minor modifications (Gregorio et al. 1997) at the National Phytotron Facility, IARI, New Delhi, India, over three consecutive years, as described previously (Mishra et al. 2016).

PCR amplification and re-sequencing

Genomic DNA was extracted from leaf tissue using the CTAB method described by Murray and Thompson (1980). All 21 genes were PCR amplified using the primer walking method (Supplementary Table 1). The nucleotide sequences of the genes were retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/), and primers were designed using Primer3 (http://www.genome.wi.mit.edu). Primer specificity was validated with NCBI Primer-BLAST (http://www.ncbi.nlm.nih.gov/tools/primer-blast/) against the Oryza taxon, and high-stringency conditions were used during PCR amplification. PCR amplifications were carried out in 25 μl reactions containing 1 unit of SpeedStar™ HS DNA polymerase (TAKARA BIO INC), 1× Fast Buffer, 2 μl of dNTP mixture, 0.5 pmol of each primer, and 80 ng of template DNA. Reactions were run in a Bio-Rad thermal cycler with initial denaturation at 98 °C for 3 min, followed by 38 cycles of denaturation at 98 °C for 10 s, annealing at 64 °C for 1 min, and extension at 68 °C for 1 min, and a final extension at 68 °C for 10 min. The amplified products were checked by electrophoresis on a 1 % agarose gel in 1× TBE buffer.

PCR products were sequenced directly on an Ion Torrent PGM sequencer (Life Technologies) after fragmentation, library preparation, purification, and cycle sequencing according to the manufacturer's instructions. Briefly, the amplicons of different genes from the same genotype were pooled in equimolar concentrations, sheared to ~200 bp, and barcoded to identify individual accessions. The sheared and barcoded products were then size-selected, pooled together in equimolar ratios, and PCR-sequenced. At each step, purification was done with Agencourt AMPure XP reagent on a KingFisher Flex system. The Ion OneTouch2™ system was used for emulsion PCR to clonally amplify the library on beads and subsequently for enrichment. The enriched library was loaded onto an Ion PGM 316 chip, and sequencing was performed on the Ion PGM™ sequencer.
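Equimolar pooling requires converting each amplicon's mass concentration into molarity. A minimal sketch using the common approximation of 660 g/mol per base pair of double-stranded DNA; the amplicon names, concentrations, lengths, and the target of 200 fmol per amplicon in the example are hypothetical values, not the amounts used in this study.

```python
def dsdna_nM(conc_ng_per_ul, length_bp, g_per_mol_per_bp=660.0):
    """Molarity (nM) of a dsDNA fragment from its mass concentration and length.

    nM = (ng/ul * 1e6) / (660 g/mol per bp * length in bp)
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * length_bp)


def equimolar_volumes(amplicons, fmol_each):
    """Volume (ul) of each amplicon needed to contribute the same number of femtomoles.

    amplicons: {name: (concentration in ng/ul, length in bp)}
    Because 1 nM equals 1 fmol/ul, volume = fmol / nM.
    """
    return {name: fmol_each / dsdna_nM(conc, bp) for name, (conc, bp) in amplicons.items()}


if __name__ == "__main__":
    # Hypothetical amplicons from one accession (not measured values).
    amps = {"amp1": (66.0, 1000), "amp2": (33.0, 500), "amp3": (45.0, 1200)}
    for name, vol in equimolar_volumes(amps, fmol_each=200).items():
        print(f"{name}: {vol:.2f} ul")
```

Longer amplicons at the same mass concentration are less concentrated in molar terms, so proportionally larger volumes are taken from them.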

Sequence data analysis

Coverage analysis and variant-caller plugins were run, and the sequences were viewed in IGV (Robinson et al. 2011; Thorvaldsdóttir et al. 2013). The SAMtools mpileup command was used to generate consensus sequences, which were aligned against the reference sequence of the respective gene with ClustalW (Thompson et al. 1994) in BioEdit (Hall 2011). Nucleotide polymorphisms were analysed using DnaSP version 5.10 (Rozas et al. 2003). Silent-site nucleotide diversity per site (π) (Nei 1987) and the population mutation parameter (θ) (Watterson 1975) were estimated, and sliding window analysis was performed in DnaSP to examine nucleotide polymorphism across the genes in all accessions. Neutrality tests, namely Tajima's D (Tajima 1989) and Fu and Li's D* and F*, were calculated to examine selection pressure at each locus. A haplotype network was constructed with the Network software (Bandelt et al. 1999) (http://www.fluxus-engineering.com) to compare the genealogical relationships among haplotypes.
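Watterson's θ and Tajima's D are computed from the number of segregating sites S, the mean number of pairwise differences k, and the sample size n. A minimal sketch of the standard formulas (Watterson 1975; Tajima 1989), assuming a gap-free alignment and using all sites rather than only the silent sites analysed in DnaSP; Fu and Li's statistics are not covered, and the function names are illustrative:

```python
from itertools import combinations
from math import sqrt


def _s_and_k(seqs):
    """Segregating sites S and mean pairwise differences k (per sequence pair)."""
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    return s, k


def watterson_theta_per_site(seqs):
    """theta_W = S / a1, divided by alignment length, with a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    s, _ = _s_and_k(seqs)
    return s / a1 / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D = (k - S/a1) / sqrt(e1*S + e2*S*(S-1)); None when S = 0."""
    n = len(seqs)
    s, k = _s_and_k(seqs)
    if s == 0:
        return None  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "AATT"]  # hypothetical alignment
    print(watterson_theta_per_site(toy), tajimas_d(toy))
```

A negative D (an excess of rare variants relative to the neutral expectation) is the pattern interpreted as population expansion in the Discussion.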

SNP-trait association analysis

Association analysis was performed using the mixed linear model (MLM), which accounts for both kinship (K) and population structure (Q), as implemented in TASSEL. K and Q had been estimated previously for this set of accessions using a genome-wide 48-plex Illumina GoldenGate SNP genotyping assay (Mishra et al. 2016). For association mapping, filtered sites within genes were used to estimate linkage disequilibrium (LD) as the correlation between alleles at pairs of loci in TASSEL 5.0 (Bradbury et al. 2007), and the significance of LD among SNPs was assessed with Fisher's exact test. In QQ plots, the mixed (Q + K) model showed the least deviation of observed from expected P values compared with models including only Q (population structure) or only K (kinship). A probability value of 0.01 was used as the significance threshold for SNP–trait associations. Putative functions of associated sites in promoter regions (cis-regulatory motifs) were identified using PLACE (Higo et al. 1999). For a combined analysis of 29 candidate genes (the 21 genes from this study and the eight HKT transporter genes published earlier; Mishra et al. 2016), the sequences of all genes were concatenated according to their genomic positions using TASSEL, and a UPGMA tree was constructed. The tree was edited using FigTree v1.4.2.
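Pairwise LD between two SNPs can be summarised as r² and tested with Fisher's exact test on the 2 × 2 table of allele combinations. The sketch below assumes homozygous (inbred-like) calls, so that each accession contributes one two-locus haplotype, excludes missing data, and uses the common two-sided rule of summing the probabilities of all tables no more probable than the observed one; the function names are illustrative, and TASSEL's internal implementation may differ in detail.

```python
from math import comb


def two_by_two(snp1, snp2):
    """Count allele combinations at two biallelic sites (one call per accession)."""
    alleles1, alleles2 = sorted(set(snp1)), sorted(set(snp2))
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both sites must be biallelic")
    table = [[0, 0], [0, 0]]
    for x, y in zip(snp1, snp2):
        table[alleles1.index(x)][alleles2.index(y)] += 1
    return table


def ld_r_squared(table):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), with D = p_AB - pA * pB."""
    (n11, n12), (n21, n22) = table
    n = n11 + n12 + n21 + n22
    p_a, p_b = (n11 + n12) / n, (n11 + n21) / n
    d = n11 / n - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p value for a 2 x 2 table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    t = two_by_two(["A"] * 5 + ["G"] * 5, ["C"] * 5 + ["T"] * 5)
    print(t, ld_r_squared(t), fisher_exact_two_sided(t))
```

In this toy example of complete LD across ten accessions, r² is 1 and the two-sided p value is 2/252 (about 0.008).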

Author contribution statement

SM: design and conduct of experiments, sequencing, data analysis, and manuscript preparation; BS: collection and characterization of wild rice, and conduct of experiments and manuscript drafting; PM: manuscript editing; VR: planning of experiment, and manuscript editing; NKS: wild rice collection, planning and supervision of the experiments, and manuscript editing and finalization.