Introduction

Rice, one of the world’s major crops, is an essential component of the diets and livelihoods of more than 3.5 billion people, and it is an integral part of their culture. As reported by Ray et al. [1], the average annual rate of increase in global rice yield is 1%. However, doubling crop production by 2050 requires an average yield increase of 2.4% per year, indicating that current yield trends are inadequate [1]. This shortfall can be overcome only by developing high-yielding rice varieties that use resources efficiently and tolerate various abiotic and biotic stresses. Identifying genes/quantitative trait loci (QTLs) associated with yield-related traits enables these loci to be introduced into target rice plants through marker-assisted selection (MAS) to produce high-yielding elite rice varieties.

Grain yield in rice is a complex trait determined by component traits such as grain weight, number of panicles, and number of filled grains per panicle, and it is governed by multiple genes with minor genetic effects [2]. Among these traits, grain weight is a major determinant of grain yield. Grain weight is mainly determined by grain size, which comprises grain length, width, thickness, and degree of filling. In addition to these direct-effect traits, rice grain yield is influenced by the extensive interplay of several indirect agronomic and morphological traits such as plant height, panicle length, tillering ability, growth period, seed length, grains per panicle, and seed setting rate [3]. Advances in sequencing technologies and the development of molecular markers have led to reasonable progress in mapping genes that control yield-related traits in rice. The genome-wide association study (GWAS) is a widely used, cost-effective mapping tool for dissecting the genetic architecture of complex quantitative traits. To date, several candidate genes/quantitative trait loci (QTLs) that control grain yield in rice have been identified through GWAS [4,5,6,7].

Situated on the Malabar Coast in southwest India, Kerala is rich in biodiversity shaped by its favorable climate, soil, and other physical factors. Kerala holds a large collection of cultivated, indigenous, and wild rice genotypes with several unique properties and a great influence on the state’s culture [8]. Indigenous rice genotypes can outperform modern rice cultivars owing to their complex genetic backgrounds, rich genetic diversity, strong adaptability to the environment, exceptional resistance to diseases and pests, and good quality. ‘Rajakazhama’, ‘Kavungin poothada’, ‘Kochuvithu’, ‘Arikarai’, ‘Chenkazhama’ and ‘Kothamparikayama’ are high-yielding rice landraces of Kerala. The significance of the present work lies in the fact that studies on rice landraces from Kerala have so far been limited largely to genetic diversity analyses. A recent study reported genomic regions associated with physiological, morphological, and yield-related traits in rice landraces of Kerala under drought conditions [9]. However, understanding of the molecular framework underlying yield-related traits in rice landraces of Kerala remains limited. Understanding the genetic basis of yield-related traits in these genetic resources will accelerate marker-assisted breeding to produce high-yielding elite rice varieties capable of meeting the food demands of an expanding population.

In the present study, we employed GWAS in a population of rice landraces from farmers’ fields in northern Kerala with the following major objectives: (1) to dissect the morphological and genetic diversity of the rice landraces, (2) to identify the chromosomal regions affecting grain-related traits, and (3) to mine candidate genes associated with grain-related traits.

Materials and methods

Plant material and phenotypic evaluation

A diversity panel of 96 rice accessions was gathered from various places in northern Kerala as per ABS regulations 17(a). The collection includes rice landraces from other states of India, as farmers had obtained them through seed exchange programs for indigenous rice cultivars [10]. From this panel, a diverse set of 73 Oryza sativa genotypes, comprising 70 landraces and three elite varieties, was chosen as the GWAS population on the basis of a rigorous phenotypic evaluation of agro-morphological traits (Table S1). The accessions, including photosensitive rice landraces, were cultivated in a field in Kerala (12.186687°N latitude and 75.222666°E longitude) in two seasons, ‘Virippu’/Autumn season (May to October) and ‘Puncha’/Summer season (December to March), from 2017 to 2019. Three seedlings per hill were transplanted in plots of 2.75 m × 3.10 m with a spacing of 20 cm between rows and between hills. Each plot consisted of 13 rows of 15 hills. Rather than employing block designs, replicated trials were conducted in different years directly in farmers’ fields to capture data on the performance of each rice accession periodically under the same environmental conditions. In total, 15 plants per entry from the middle rows were selected to avoid border effects and examined for 31 morphological traits, seven agronomic traits, and one grain quality trait at different growth stages in the field according to IRRI’s ‘Standard Evaluation System’ for rice guidelines [11]. Well-filled grains were used to measure grain length, width, and weight. Ten randomly chosen well-filled grains from each plant were lined up lengthwise along a ruler to measure grain length and then arranged by breadth to measure grain width. Grain weight was calculated based on 100 grains.

Data analysis

Descriptive statistics for 14 quantitative traits were computed as the mean, standard deviation (SD), coefficient of variation (CV), skewness, kurtosis, maximum, and minimum using Microsoft Excel 2007. Principal component analysis (PCA) was performed using IBM SPSS Statistics version 20 [12]. The correlation between variables was computed using the Spearman correlation method with the corrplot R package [13]. Path coefficient analysis and path diagrams were produced using the R packages lavaan [14] and semPlot [15], respectively, with grain weight as the dependent variable. Cluster analysis was performed with a hierarchical clustering algorithm in R v3.5.0 [16].
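To illustrate the descriptive statistics listed above, the following Python sketch computes the same summaries for a single trait. It is not the workflow used in this study (which relied on Microsoft Excel 2007); the function name `describe` and the sample values are hypothetical, and skewness and kurtosis are computed with simple moment-based estimators, which can differ slightly from the sample-adjusted formulas used by spreadsheet software.

```python
import statistics


def describe(values):
    """Return summary statistics for one quantitative trait."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Central moments with an n denominator, used for the moment-based
    # skewness and excess kurtosis (an assumption of this sketch).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,  # CV expressed as a percentage
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical grain lengths in mm, for illustration only.
example_lengths_mm = [7.5, 8.0, 8.5, 9.0, 9.5]
summary = describe(example_lengths_mm)
```

The CV is reported as a percentage of the mean, matching the way trait variability is expressed in the Results.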

DNA extraction and genotyping by sequencing

Approximately 50 mg of leaf tissue was collected from seedlings of each accession 15 days after emergence. Total genomic DNA was isolated from the leaf tissue of each sample using the CTAB and phenol:chloroform extraction method, followed by RNase A treatment and purification [17]. The quantity of the isolated DNA was estimated using a NanoDrop spectrophotometer, and its purity was checked by electrophoresis on a 0.8% agarose gel. GBS libraries were constructed using a protocol adapted from Poland and Rife [18]. For GBS library preparation, 10–20 µg of genomic DNA was digested with the restriction enzymes ApeKI and PstI, followed by ligation of barcoded adapters to the sticky ends of the digested DNA. Adapter-ligated products were then pooled and enriched by PCR with specific primers that allow the fragments to hybridize to Illumina flow cell primers and prime the subsequent DNA sequencing reactions. The quality of the prepared libraries was checked with an Agilent TapeStation. The libraries were sequenced on the Illumina NextSeq 500 platform with 2 × 150 bp v2 chemistry. Adapters and barcodes were trimmed from the sequenced reads with Trimmomatic v0.36, and sequence quality was checked using FastQC with a Phred quality score ≥ 30. These high-quality tag sequences were aligned to the reference genome Os-Nipponbare-Reference-IRGSP-1.0 (BioProject: PRJDB1747) using the Burrows-Wheeler Alignment tool [19]. The SAM files were piled up and converted to BAM files using SAMtools. Individual SNPs were detected using SAMtools with the following parameters: ‘mpileup -m 2 -F 0.002 -d 2000’. To reduce the error rate in SNP detection, the results were filtered with the following criteria: (1) the number of supporting reads for each SNP should be more than 4; (2) the mapping quality of each SNP should be higher than 20; (3) SNPs with minor allele frequency (MAF) > 0.05 were retained; and (4) SNPs with more than 10% missing data were removed using VCFtools.
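The MAF and missing-data criteria above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the VCFtools-based pipeline used in the study; the genotype coding (0, 1 or 2 copies of the alternate allele, with `None` for a missing call) and all function names are assumptions made for the example.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0/1/2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def missing_rate(genotypes):
    """Fraction of accessions without a genotype call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_snp(genotypes, min_maf=0.05, max_missing=0.10):
    """Retain a SNP with MAF > 0.05 and at most 10% missing calls."""
    return (
        minor_allele_frequency(genotypes) > min_maf
        and missing_rate(genotypes) <= max_missing
    )


# Hypothetical SNP genotyped in ten accessions, for illustration only.
example_snp = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
retained = keep_snp(example_snp)
```

Monomorphic sites have a MAF of zero and are dropped by the same rule that removes rare variants.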

Genome-wide association mapping for grain-related traits and identification of putative candidate genes

Analysis of the agro-morphological traits and the subsequent statistical analyses showed that the codes/scales of the traits varied among accessions. Grain traits that showed variation, namely number of well-filled grains/panicle (WFGP), number of empty hulls/panicle (EHPP), weight of 100 grains (W100), length of grain (LG) and width of grain (WG), were selected for GWAS. A total of 147,994 SNPs were used for the GWAS analysis. PCA of the genetic relationships among genotypes was performed in RStudio using the SNPRelate package [20]. A multi-locus model was applied using the mrMLM R package [21], with the kinship matrix (K) as a random effect to account for population stratification. The association with the phenotypic traits was calculated for each marker, and a marker was declared significantly associated when its LOD score was ≥ 3. Chromosome-wise local linkage disequilibrium (LD) was calculated with Haploview v4.2 [22] as previously described [23, 24]. The regions defined by the peak marker positions, extending 100 Kbp upstream and downstream (corresponding to the half-maximum average LD decay estimated on the LOESS curve fitted to the data), were screened for candidate genes underlying each trait. REVIGO [25] was used to visualize the summary results. Candidate gene mining was performed using the Q-TARO database (http://qtaro.abr.affrc.go.jp/) and the funRiceGenes database (https://funricegenes.github.io/).
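For readers more used to p-values than LOD scores, the sketch below converts one into the other. It assumes that the LOD score arises from a likelihood-ratio test with one degree of freedom, so that the test statistic is 2·ln(10)·LOD and follows a chi-square distribution; this is a common convention rather than a description of the internals of the mrMLM package, and the function name `lod_to_p` is hypothetical.

```python
import math


def lod_to_p(lod):
    """Approximate p-value for a LOD score under a 1-df chi-square LRT."""
    lrt = 2.0 * math.log(10.0) * lod  # likelihood-ratio statistic
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt / 2.0))


p_at_threshold = lod_to_p(3.0)
```

Under these assumptions, a LOD score of 3 maps to a p-value of roughly 2 × 10⁻⁴.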

Results

Phenotypic variability in the rice landrace association mapping panel

In the present investigation, 39 agro-morphological traits were selected, and their variability among the germplasm was recorded to evaluate the diversity of rice landraces of northern Kerala. Among the traits, ligule shape was monomorphic, leaf angle and panicle axis were dimorphic, and all other traits exhibited considerable variation (Table S2). Anthocyanin pigmentation in different plant parts was noticeable among the landraces. Pigmentation in basal leaf sheaths was found in 24.65% of the rice germplasm, of which 5.48% were light purple, 9.59% had purple lines, and 9.59% exhibited uniform purple coloration. Similarly, internode color was green in 39.73%, light gold in 49.31%, purple in 5.48%, and purple lines only in 5.48% of genotypes. Most accessions (80.82%) had a green leaf lamina; the remainder showed dark green (5.48%), light green (6.85%), purple blotch (4.11%), purple (1.37%), or purple tips (1.37%). Most plants (94.52%) had erect flag leaves. Intermediate panicles were the most frequent (61.64%), and a few open panicles were also observed (6.85%). Rice grains varied markedly in morphological characteristics such as lemma and palea color, pubescence, apiculus color, awning and sterile lemma color (Table S2). The mean plant height recorded for landraces was 116.83 cm, with a minimum of 65 cm for ‘Badma’ and a maximum of 185 cm for ‘Malakkaran’ (Table S3). The landrace accessions were classified as semi-dwarf (12.33%), intermediate (50.68%) and tall (36.99%) based on plant height. The mean number of well-filled grains/panicle (WFGP) differed significantly between landraces (81.46) and elite varieties (116.51). In addition, landraces had fewer empty hulls/panicle (EHPP) than cultivated varieties (Table S4). The number of WFGP for ‘Rajakazhama’, ‘Thavalakkannan’ and ‘Allikkannan’ was 161.00, 131.29, and 118.10, respectively. Length of grain (LG) showed a wide range of variability among the genotypes, from 5.50 to 17.00 mm with a mean of 8.33 mm and a coefficient of variation of 17.04%. Width of grain (WG) also varied considerably among the genotypes, with a mean of 2.91 mm and a coefficient of variation of 10.99%. Weight of 100 grains (W100) varied considerably, ranging from 1.20 g (‘Kochu vithu’) to 3.91 g (‘Orthadiyan’). Other landraces such as ‘Kavunni’ (3.73 g), ‘Chinese silk’ (3.31 g), ‘Valiya kazhama’ (3.28 g), ‘Thouvan’ (3.21 g) and ‘Allikkannan’ (3.20 g) also had high grain weights. Among the five traits, LG, WG, WFGP and W100 showed normal distributions (Fig. S1). Taken together, the extent of variability available for these five grain-related traits suggested that the rice landrace association mapping panel was suitable for GWAS.

Principal component analysis (PCA) of 14 traits produced three principal components (PCs), which together accounted for 66.9% of the total phenotypic variance (Fig. S2). PC1 explained 42.4% of the variance, with leaf length (LL) (0.90), plant height (PH) (0.84), culm length (CL) (0.82), flag leaf length (FLL) (0.82) and ligule length (LIL) (0.63) as the key determinants. PC2 explained 14.8% of the variance and was associated with W100 (0.69), WG (0.65), WFGP (0.47), leaf width (LW) (0.42), flag leaf width (FLW) (0.37) and culm diameter (CD) (0.21). Finally, PC3 explained 9.7% of the variance and was directly related to LG (0.83) but inversely related to EHPP (-0.14) (Table S5). Correlation analysis among traits indicated that PH was positively correlated with CL, LL, panicle length (PL) and FLL. Significant positive correlations were found between WFGP and LW, FLW, FLL, LL and CD. W100 was positively correlated with WG, CL, PH, FLL and LG. Negative correlations were found between WFGP and each of LG, WG and W100, and between EHPP and PH, CL, W100, LG, WG and FLL (Fig. S3). Path coefficient analysis was conducted with W100 as the dependent variable and the other traits as independent variables. The results indicated that CL had the highest positive direct effect on W100, followed by WG and LG (Fig. S4). Moreover, PH and LW had an indirect effect on W100, and PL and LL had a positive effect on CL. A dendrogram of genetic distance was constructed based on morphological traits, which resulted in three major clusters comprising 19, 27 and 34 rice genotypes (Fig. 1). The three major clusters were further subdivided into nine subgroups. In cluster 1, the average values of major traits such as LL, PH, PL, CL, LIL, FLL, WFGP and W100 were 32.99 cm, 96.41 cm, 19.65 cm, 76.77 cm, 13.17 mm, 25.48 cm, 47.64 and 2.44 g, respectively. In cluster 2, the average values of LL, PH, PL, CL, LIL, FLL, WFGP and W100 were 50.35 cm, 139.83 cm, 25.51 cm, 114.43 cm, 16.72 mm, 36.56 cm, 80.52 and 2.93 g, respectively. In cluster 3, the average values of LL, PH, PL, CL, LIL, FLL, WFGP and W100 were 45.16 cm, 108.21 cm, 23.19 cm, 85.52 cm, 16.76 mm, 34.52 cm, 110.10 and 2.38 g, respectively (Table S6).

Fig. 1 Cluster dendrogram for 73 rice accessions produced by a hierarchical clustering algorithm, showing the genetic relationships and clusters based on agro-morphological traits (scale: Euclidean distance)

Identification of QTNs using multi-locus genome-wide association mapping

GBS generated 147,994 polymorphic SNPs after removal of SNPs with more than 10% missing values or minor allele frequency (MAF) ≤ 0.05. This set was used to detect associations between SNP markers and five grain-related traits in the 73 rice genotypes. PCA was performed with all polymorphic SNPs generated from the panel. The first, second and third PCs explained 22.2%, 8.1% and 5.9% of the total variance, respectively. The first two PCs separated the three subpopulations; however, the clustering became distorted when additional components were considered (Fig. 2). LD, measured as r², decayed over a mean physical distance of 200 Kbp in this GWAS panel: the highest r² value observed was 0.6, and LD decayed below 0.2 at 200 Kbp (Fig. S5). Three multi-locus models implemented in the mrMLM R package (mrMLM, FASTmrMLM and FASTmrEMMA) were chosen because they produce fewer spurious associations than single-locus models and provide two-step verification. Based on a p-value < 0.002 or LOD score ≥ 3, a total of 48 quantitative trait nucleotides (QTNs) were identified using the three multi-locus models. Among the 48 QTNs, eight were for W100, seven for LG, 11 for WG, 13 for WFGP, and nine for EHPP (Table 1). The greatest number of QTNs (26) was detected by FASTmrMLM: seven, five, seven, five, and two QTNs for WG, LG, W100, WFGP, and EHPP, respectively, with phenotypic variation explained (PVE) up to 13.15%. FASTmrEMMA detected the fewest significant SNP markers, only three: two for WFGP and one for LG, with PVE up to 17.68%. The mrMLM method detected 19 significant markers: four, one, one, six, and seven SNPs for WG, LG, W100, WFGP, and EHPP, respectively, with PVE up to 36.78%. For LG, two QTNs on chromosome 6 were considered stable since they were detected by both mrMLM and FASTmrMLM. Among the 48 QTNs, 43 can be considered novel, and the remaining five have been reported previously (Table 2). Manhattan plots for WFGP, EHPP, W100, LG, and WG were generated, indicating the most significant associations (Fig. 3). The quantile-quantile (Q-Q) plot confirmed a normal distribution of phenotypic traits.
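The per-method and per-trait QTN counts reported above can be cross-checked with a short tally. The sketch below simply encodes the counts stated in the text in a Python dictionary (the variable names are illustrative) and sums them by method and by trait.

```python
# QTN counts per method and trait, as stated in the text.
qtn_counts = {
    "FASTmrMLM": {"WG": 7, "LG": 5, "W100": 7, "WFGP": 5, "EHPP": 2},
    "FASTmrEMMA": {"WFGP": 2, "LG": 1},
    "mrMLM": {"WG": 4, "LG": 1, "W100": 1, "WFGP": 6, "EHPP": 7},
}

# Total number of QTNs detected by each method.
per_method = {method: sum(counts.values()) for method, counts in qtn_counts.items()}

# Total number of QTNs detected for each trait across the three methods.
per_trait = {}
for counts in qtn_counts.values():
    for trait, n in counts.items():
        per_trait[trait] = per_trait.get(trait, 0) + n

total_qtns = sum(per_method.values())
```

The method totals (26, 3 and 19) and the trait totals (11 for WG, 7 for LG, 8 for W100, 13 for WFGP and 9 for EHPP) both sum to the 48 QTNs reported.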

Fig. 2 Principal component analysis of 73 rice accessions using 147,994 SNP markers, performed with the SNPRelate software

Table 1 The significant associations for five grain-related traits such as length of grain (LG), width of grain (WG), number of well-filled grains/ panicle (WFGP), number of empty hulls/ panicle (EHPP), and weight of 100 grains (W100) detected by using diverse methods within multi-locus GWAS
Table 2 Previously identified QTLs related to significant marker-trait associations for grain-related traits
Fig. 3 Manhattan and quantile-quantile plots of genome-wide association studies (GWAS) for grain-related traits: number of empty hulls/panicle (EHPP), length of grain (LG), number of well-filled grains/panicle (WFGP), width of grain (WG) and weight of 100 grains (W100). On the Manhattan plots, the x-axis represents SNP positions on the 12 chromosomes of rice and the y-axis represents the negative logarithm of the p-value for each SNP. On the quantile-quantile plots, the x-axis and y-axis represent the expected and observed -log10 p-values, respectively

Exploration of candidate genes

LD decay distance was used to select potential candidate genes within a specific distance of each QTN. Since the population selected in this study is a collection of rice landraces that have experienced indirect selection by farmers over thousands of years, LD decays slowly over a large distance across the chromosomes. Therefore, the search range for potential candidate genes was determined according to the position of the fastest decay rate. As LD decays fastest before 200 kb, potential candidate genes were searched within an interval of 100 kb on either side of each QTN. A total of 889 genes were retrieved, of which 134, 178, 255, 175, and 147 genes lay within the specified LD interval of the significant markers associated with EHPP, LG, WFGP, WG, and W100, respectively. The identified genes encode several classes of proteins with known or predicted functions in metabolic processes, cellular processes, response to stimulus, biosynthetic processes, transcription, transport, signaling, response to endogenous stimulus, signal transduction, cellular biosynthetic process, carbohydrate metabolic process, post-embryonic development, cell differentiation, response to biotic stimulus and establishment of localization (Fig. S6). Among the 889 genes, 112 had already been cloned and functionally characterized. Further filtering identified 15 candidate genes that affect rice grain yield, number of grains per panicle, and grain width (Table 3).
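The window-based search described above amounts to an interval-overlap query around each QTN. The Python sketch below shows one way to express it; the `Gene` record, the function name `genes_near_qtn`, the gene identifiers and all coordinates are hypothetical and serve only to illustrate the ±100 kb rule, not the database queries actually run against Q-TARO or funRiceGenes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: int
    start: int  # 1-based start position in bp
    end: int    # inclusive end position in bp


def genes_near_qtn(genes, chrom, qtn_pos, flank_bp=100_000):
    """Return genes overlapping the window qtn_pos +/- flank_bp on chrom."""
    lo = max(1, qtn_pos - flank_bp)
    hi = qtn_pos + flank_bp
    return [
        g for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical annotation and QTN position, for illustration only.
annotation = [
    Gene("geneA", 6, 1_450_000, 1_455_000),  # inside the window
    Gene("geneB", 6, 1_599_000, 1_605_000),  # overlaps the window edge
    Gene("geneC", 6, 1_700_000, 1_710_000),  # outside the window
    Gene("geneD", 1, 1_500_000, 1_501_000),  # different chromosome
]
candidates = genes_near_qtn(annotation, chrom=6, qtn_pos=1_500_000)
```

A gene is counted when any part of it overlaps the window, so genes that straddle the 100 kb boundary are included; a stricter rule requiring the whole gene to lie inside the window would be an equally defensible choice.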

Table 3 Details of potential candidate genes strongly associated with the respective targeted traits

Discussion

As an initiative to conserve traditional genetic resources, conscious seed exchange programs are implemented among local farmers across the country, facilitating the collection of different landraces adapted to different localities in a particular area. The accompanying cultivation and acclimatization of these various genetic resources lead to their heterogeneity. As part of this movement, farmers in northern Kerala are cultivating a collection of rice landraces from Kerala and other states [10]. Many of these rice landraces have not been investigated to their full potential except for documentation and genetic diversity analyses. Recently, Peringottillam et al. [10] assessed the genetic diversity and population structure of these rice landraces and found significant within-population diversity that could be complemented by the seed exchange between farmers. Such a diverse and unexplored rice germplasm can be a source of alternative alleles or new genes needed for the genetic enhancement of rice cultivars. Besides, it is stated that this germplasm panel of northern Kerala is suitable for GWAS due to the high genetic diversity, low genetic relatedness, and slow LD decay [10]. With this perspective in mind, we have characterized the collection of rice landraces from northern Kerala, assessed their morphological and genetic diversity, selected five grain-related traits that are sufficiently differentiated among the landraces, and performed genome-wide association mapping of the same.

The selected rice landraces from Kerala exhibited wide variation in morphological, agronomic, and grain quality traits. In our study, only ligule shape was monomorphic, which agrees with the findings of Manjunatha et al. [26] in rice landraces from another district of Kerala. Coloration of plant parts is common in wild species and is often found in landraces, where it is thought to be involved in defense against biotic and abiotic stresses [27]. Anthocyanin accumulation in the entire plant or in parts such as the basal leaf sheath, internodes, leaf collar, and leaf blades is advantageous to farmers in several ways. ‘Chinese silk’, a landrace with purple basal leaf sheaths, leaf tips, and collar, was introduced into the rice fields of northern Kerala so that weedy plants could be clearly identified and removed. ‘Naserbath’, a purple landrace, is planted in guard rows to distinguish different landraces. Farmers sometimes use different names for the same landrace. Hence, variations in morphological traits such as basal leaf sheath color, leaf blade color, culm internode color, ligule color and auricle color, together with grain characteristics such as lemma and palea color, lemma and palea pubescence, and apiculus color, are crucial for identifying each accession and eliminating duplication of landrace identity. The genetic purity of crop landraces can be maintained by systematically assessing differences in morphological traits at each growth stage, from seedling to mature grain. Therefore, these morphological traits can also be considered robust markers for avoiding genetic contamination, provided they are not affected by environmental variation.

A few important agronomic traits, such as plant vigor, plant height, tillering ability, and panicle exsertion, showed considerable variation. A similar trend was reported by Roy et al. [28], who described variation in agronomic traits of rice landraces in Nagaland. Crop varieties with such variation are of prime importance because farmers can use them for their specific agro-economic needs. For example, the landrace ‘Mundon’ in our study (tall, with a mean plant height of 128.5 cm) has long been cultivated widely for fodder even though its yield is moderate under low-input farming systems. The semi-dwarf phenotype is an important agronomic characteristic deployed in modern breeding programs because it confers lodging resistance and a significant increase in yield. Among the collected germplasm, the landraces ‘Chinese silk’, ‘Thekkancheera’, ‘Black jasmine’, ‘Chitteni’, ‘Badma’, ‘Naserbath’, ‘Basumati’, ‘HMT’ and ‘Kochu vithu’ belong to the semi-dwarf category. In the present study, the landraces ‘Thavalakkannan’, ‘Allikkannan’ and ‘Rajakazhama’ performed well in terms of overall acceptability in the growing area. Therefore, the diversity in agronomic characteristics observed in our germplasm indicates the suitability of these traits for selection.

In the current study, PCA identified LL, PH, CL, WFGP, FLW, WG and W100 as important traits contributing to the total phenotypic variation. This finding agrees with the earlier report of Ray et al. [1], which showed prominent contributions of leaf length, plant height and grain characteristics to the phenotypic diversity of aromatic rice landraces. The correlation analysis revealed a significant positive correlation between PH and PL, a trend also reported in other studies [29]. The association of WFGP, an important yield-related trait, with FLW and FLL is supported by the previous work of Bassuony and Zsembeli [30], who stated that the flag leaf is critically important in determining yield. Culm length, grain width and grain length were positively correlated with grain weight, in line with a previous study [31]. The negative association between WFGP and grain size was previously described by Saha et al. [29]. Information on the correlations between grain-related traits is vital for enhancing grain yield through breeding programs because it helps in choosing effective selection criteria [28]. Our aim was to find associations between SNP markers and traits that contribute substantially to the grain yield of rice landraces cultivated in farmers’ fields in northern Kerala.

GBS, a reduced-representation sequencing method, was used to genotype our population of interest. GBS generated a total of 147,994 SNPs, which is sufficient for GWAS in genotypes exhibiting slow LD decay and for marker-assisted selection. Multi-locus GWAS (ML-GWAS) methods are more effective than single-locus GWAS (SL-GWAS) methods for investigating small-effect QTNs of yield-related traits [31]. In the present study, the rice landrace population used as the GWAS panel was small, with 73 genotypes. Mogga et al. [32] performed GWAS of grain quality traits in 59 rice genotypes with only 525 SNPs. In another study, GWAS using 94 deep-water rice accessions collected from different flood-prone areas of Assam successfully identified genomic regions associated with desirable traits [33]. Recently, Soumya et al. [34] pointed out that associations between genotype and phenotype can be discovered by GWAS even when the population size is less than 100. The population size in the current study is likewise below 100; however, the phenotypic diversity exhibited by several traits was remarkable. Research conducted directly in farmers’ fields is of utmost importance because growing conditions vary widely with the soil type, rainfall, climatic conditions, previous harvest history and crop management practices of each farmer [35]. Here we exploited the natural variation in rice landraces directly from farmers’ fields, which is sufficient to carry out GWAS even if the population size is small, because such natural variation is the key to understanding adaptive processes and improving agriculture [36]. In addition, a LOD threshold of three was used to identify significantly associated SNPs with the ML-GWAS methods. A LOD value of three can balance high power against a low false-positive rate in ML-GWAS [37]. Additionally, Rohilla et al. [33] discussed the possibility of obtaining significant associations with a threshold of three and a limited number of genotypes.

A total of 48 QTNs were identified in the current study, of which five had been reported earlier [38,39,40,41,42]. The remaining novel SNP marker loci indicate the possibility of uncovering novel alleles associated with rice grain-related traits. The two peaks on chromosome 6, located between 1,464,501 and 1,467,584 bp and between 1,816,630 and 1,818,748 bp, corresponded to the previously reported genes Grain Size 6 (GS6) and 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), respectively. GS6 is a unique member of the GRAS gene family that is closely associated with grain size and was selected by ancient humans during rice domestication [43]. The associated loci identified in this study co-localized with previously reported QTLs controlling grain yield, weight, length and shape. For example, a locus on chromosome 1 associated with W100 co-localized with the reported QTL yd1 controlling yield [40]. Similarly, a locus on chromosome 3 associated with WFGP co-localized with the reported QTL yld3.2 controlling yield per plant [42]. The locus Os02g0537700 on chromosome 2, which is related to yield, was associated with WFGP. For grain length, QTNs on chromosome 6 were detected by both mrMLM and FASTmrMLM. Candidate gene analysis identified IPI1, associated with grain length in our study, which has been found to increase yield per plant [44]. The candidate gene Os07g0138200, associated with grain length in the present study, has been reported to affect grain yield [45]. Kim et al. [46] discovered that the putative chloroplastic 2-Cys thioredoxin peroxidase (OsTPX) from rice leaf samples contributes to increased stress tolerance and fermentation capacity in genetically engineered yeasts. Candidate genes associated with well-filled grains per panicle in the current study have been found to increase yield [47, 48]. Interestingly, another candidate gene associated with the number of well-filled grains per panicle has been reported to affect grains per panicle [49]. Additionally, the candidate gene Os01g0105900, related to grain width, has been reported to play an essential role in rice growth and productivity [50].

Conclusion

The present investigation is a pioneering work demonstrating the untapped genetic potential of rice landrace collections in northern Kerala. The study used a large number of morphological markers and revealed substantial variation in agro-morphological characteristics, which has implications for the utilization and conservation of rice landrace collections. It also indicated that morphological traits are useful for the preliminary evaluation of rice landraces and for assessing genetic diversity among morphologically different landraces. PCA enabled the identification of the major traits that contributed most to the diversity of the landraces. The significant diversity in grain-related traits and their importance in enhancing grain yield prompted us to pursue GWAS. Through GWAS, we identified 48 significant QTNs associated with grain-related traits. Among these, five had been reported previously, while the remaining significant associations indicate the possibility of discovering novel alleles associated with grain-related traits. Candidate gene mining within the specified distance of each associated SNP yielded 15 candidate genes directly related to traits such as yield, grains per panicle, and grain width. Understanding the genetic basis of grain-related traits will provide important insights for rice breeders seeking to introgress candidate genes/QTLs associated with these traits and develop high-yielding rice varieties. Moreover, the study highlights the significance of rice landraces as an invaluable reservoir of genetic diversity and a potential source of valuable alleles for rice breeding.