Introduction

Rice (Oryza sativa L.) is an important cereal crop and the staple food for over three billion people. Yield levels have stagnated since the Green Revolution, which makes it a challenge to feed the 9 billion people anticipated in the coming three decades [1, 2]. The grain yield potential of rice depends primarily on panicle number per plant, grain number per panicle, grain size and grain weight [3,4,5]. However, elucidating the genetics of these yield components is complicated by their polygenic inheritance and strong environmental influence. Hence, understanding the genetic basis of yield traits at the molecular level is required to raise the yield ceiling of present-day elite cultivars. To date, 639 genes [6] and 2060 QTLs (https://www.gramene.org) pertaining to yield and its component traits have been identified. Among them, GS3 for grain size/weight, DEP1 and Gn1a for grain number, MINI SEED 2 (MIS2) for grain size and shape, DEGENERATED PANICLE AND PARTIAL STERILITY 1 (DPS1) for anther cuticle and panicle development, and GW8 for grain weight are important major QTLs governing the yield component traits [3, 5, 7].

Grain weight is one of the key component traits that determine final yield in rice. Grain size is determined by a few important QTLs such as GS3, GL3.1, GW2, GW5, GW6a and OsLG3, which regulate cell proliferation in spikelet hulls, while the TGW6 QTL acts in the endosperm [8,9,10]. Further, GLW7 and qNPT1/WTG1 influence grain size by regulating cell expansion, whereas GS2, GS5, GL7/GW7 and GW8 act by regulating both cell division and cell expansion [8, 11,12,13]. From bulked QTL-seq analysis, a candidate genomic region (qTGW5.3) strongly associated with grain length and weight was identified and narrowed down to an interval of 1.13 Mb by screening recombinants and progeny testing [14]. Yet the inheritance of grain weight has not been completely elucidated because of its extremely complex nature. The key reason can be attributed to insufficient genotyping data for the mapping populations or to the limited development of polymorphic DNA markers, as phenotyping for yield component traits and developing mapping populations from contrasting parental genotypes demand relatively less effort.

Bulked segregant analysis (BSA) is an elegant and rapid method to identify DNA markers linked to a trait of interest [15]. However, the availability of DNA markers has been the key factor limiting the effectiveness of BSA. Furthermore, genotyping each marker in the two extreme DNA bulks is still time-consuming and expensive. The rapid development of next-generation sequencing (NGS) technologies has made it possible to combine BSA with whole-genome resequencing of DNA pools to identify genomic regions for target traits. Using this concept, several methods have been developed for rapid identification of QTLs, such as SHOREmap [16], next-generation mapping (NGM) [17], MutMap [18], isogenic mapping by sequencing [19], SNP-ratio mapping (SRM) [20], MutMap+ [21], MutMap-Gap [23], QTL-seq [22], SLAF-seq [24] and Indel-seq [25]. Among these techniques, QTL-seq has successfully identified many QTLs in rice [14, 22, 26,27,28], chickpea [29, 30], cucumber [31] and tomato [32, 33].

Mining candidate genes underlying a QTL is a challenging task. However, the availability of a high-quality reference genome sequence and other advances in sequencing technologies have resulted in several public databases for genomics, transcriptomics, proteomics and metabolomics. The Rice Annotation Project Database (https://rapdb.dna.affrc.go.jp), RiceXPro (ricexpro.dna.affrc.go.jp), RiceVarMap V2.0 (https://ricevarmap.ncpgr.cn/v2/vars_in_gene/), Gramene (www.gramene.org) and the Rice Genome Annotation Project (https://rice.plantbiology.msu.edu/) are some of the important rice databases that enable effective functional annotation of genomic regions. By exploiting these public databases, it is now possible to pinpoint candidate genes within QTL regions. Therefore, the present work was conducted to identify the genomic regions and candidate genes determining grain weight in rice by employing the QTL-seq approach.

Materials and methods

Plant material and development of mapping population

In this study, an F2 mapping population was developed using BPT5204 (Samba Mahsuri) and MTU3626 (Prabhat) as parents. BPT5204 is a fine-grained, high-yielding variety with excellent cooking quality developed at Agricultural College, Acharya NG Ranga Agricultural University (ANGRAU), Bapatla, whereas MTU3626 is a high-yielding variety with high grain weight developed at Regional Agricultural Research Station, ANGRAU, Maruteru.

To develop the mapping population, BPT5204 was crossed with the male parent MTU3626 to produce F1 seeds during kharif (June to October), 2016 (Fig. 1). The heterozygosity of the F1 plants was confirmed using simple sequence repeat (SSR) markers that are polymorphic between the parents (Supplementary Fig. 1). True F1s were selfed and simultaneously backcrossed with Samba Mahsuri to generate F2 and BC1F1 seeds, respectively. Later, the F2 and BC1F1 plants were advanced by selfing to F3 and BC1F2 seeds, respectively. The F2 mapping population along with the parents was evaluated for yield and its component traits during kharif, 2017, while the F3 and BC1F2 plants along with the parents were raised during rabi (November to April), 2017–2018 at the wetland farm, S. V. Agricultural College, Acharya NG Ranga Agricultural University, Tirupati (Supplementary Fig. 2). The 1000 grain weight was measured for the parents and the mapping populations and expressed in grams.

Fig. 1 Development of mapping populations for mapping of grain weight QTLs in rice

Construction of extreme bulks for 1000 grain weight

From a total of 862 F2 plants of the BPT5204 × MTU3626 cross, the 15 plants with the highest grain weight and the 15 plants with the lowest grain weight were selected to establish two bulks, i.e., the highest grain weight bulk (H-Bulk) and the lowest grain weight bulk (L-Bulk), respectively. Genomic DNA was extracted from 100 mg of fresh leaves of the parents and the extreme F2 individuals using the DNeasy Plant Mini Kit (QIAGEN Sciences) and quantified using Quant-iT PicoGreen dsDNA reagent and kits (Invitrogen). Each bulk DNA sample was prepared by pooling equimolar amounts of DNA from the 15 individuals of the respective bulk, and the two extreme bulks were resequenced along with the two parents.
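The pooling step can be illustrated with a minimal sketch. It assumes that, for genomic DNA from plants of the same species, pooling equal masses approximates equimolar pooling; the function name, the target mass and the example concentrations are hypothetical and not taken from the study.

```python
def pooling_volumes(conc_ng_per_ul, mass_per_sample_ng):
    """Return the volume (in µl) to draw from each sample so that every
    sample contributes the same DNA mass to the bulk."""
    if any(c <= 0 for c in conc_ng_per_ul):
        raise ValueError("concentrations must be positive")
    return [mass_per_sample_ng / c for c in conc_ng_per_ul]


# Hypothetical PicoGreen readings (ng/µl) for three of the 15 plants in a bulk.
example_conc = [50.0, 100.0, 25.0]

# Hypothetical target: each plant contributes 200 ng to the pool.
volumes = pooling_volumes(example_conc, mass_per_sample_ng=200.0)
print(volumes)
```

More concentrated samples contribute proportionally smaller volumes, so no single plant dominates the allele frequencies estimated from the bulk.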

Whole genome resequencing and mapping of reads

A total of four libraries (two bulks, H-Bulk and L-Bulk, and two parents, BPT5204 and MTU3626) were constructed for Illumina sequencing from 10 µg of sample DNA and sequenced on an Illumina HiSeq X Ten sequencer. The paired-end (PE) short reads (150 × 2) with a Phred quality score of ≥ 30 were used for alignment to the reference genome. Initially, both parents were mapped against Os-Nipponbare-Reference-IRGSP-1.0, followed by post-processing and filtering of the alignment files. The called variants were then used to develop a consensus assembly of the donor parent (MTU3626) by substituting the reference bases with the confident variant calls. Whole-genome sequencing data of BPT5204 were taken from our previous study [34]; the whole-genome sequencing data of MTU3626 and the bulks are available in the SRA database under BioProject ID PRJNA482942.

In total, 92.32, 107.41, 122.18 and 124.04 million paired-end (PE) reads were generated for BPT5204, MTU3626, the high bulk and the low bulk of BPT5204 × MTU3626, respectively (Table 1). The GC content of H-bulk and L-bulk was 43.6% and 45.33%, respectively. The raw data were pre-processed using AdapterRemoval (version 2.2.0) to remove adapters, and the resulting high-quality reads from the parents and bulks were used for mapping and QTL-seq analysis. After trimming and filtering, 93230844, 70476388, 75200694 and 75200694 clean reads were obtained for BPT5204, MTU3626, H-bulk and L-bulk, respectively. Alignment of the clean reads of BPT5204 and MTU3626 to Os-Nipponbare-Reference-IRGSP-1.0 resulted in an average depth of 36.93X and 25.12X and genome coverage of 92.74% and 89.65%, respectively, allowing us to develop a reference-based assembly for the male parent MTU3626. The aligned samples and the reference genome sequence were used for variant calling with SAMtools (version 0.1.18).
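As a rough sanity check on sequencing depth, the sketch below estimates an upper bound from read count, read length and genome size. It rests on two assumptions that the text does not state: that the reported clean read count refers to individual 150-bp reads, and that the Nipponbare reference is approximately 373 Mb. The realised depth is lower because not all reads map and trimmed reads are shorter.

```python
def expected_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Upper-bound average depth if every read mapped over its full length."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


# Assumption: approximate size of the Nipponbare reference (not given in the text).
ASSUMED_GENOME_SIZE_BP = 373_000_000

# Clean read count reported for BPT5204, treated here as individual 150-bp reads.
bpt5204_depth = expected_depth(93_230_844, 150, ASSUMED_GENOME_SIZE_BP)
print(f"Upper-bound depth for BPT5204: {bpt5204_depth:.2f}X")
```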

Table 1 Summary of whole genome resequencing data of parental lines and bulks for grain weight trait of the cross between BPT5204 and MTU3626

Calculation of SNP Index

The whole analysis was performed using the QTL-seq pipeline (https://genome-e.ibrc.or.jp/home/bioinformatics-team/mutmap) and its associated tools, i.e., the BWA aligner [35], SAMtools [36] and Coval [37]. The SNP-index (= count of alternate base/count of reads aligned) was calculated for all SNP positions found in both bulks using the reference genome positions. In total, 422,882 SNPs were identified across all 12 chromosomes (Table 2). Low-quality SNP positions with SNP-index < 0.3 and read depth < 5 in the two sequences were excluded, as they could be spurious SNPs caused by sequencing or alignment errors. If a SNP with SNP-index ≥ 0.3 was present in only one of the two bulk sequences, it was considered a real SNP and assumed to be present in the other bulk as well. Further, the ΔSNP-index was calculated by subtracting the SNP-index of the High bulk from the SNP-index of the Low bulk. SNP-index plots were generated using sliding window analysis with a 2 Mb window size and an increment of 10 Kb. The possible effects of the identified SNPs were annotated using the in-house pipeline VAriMAT. The SNP index of the remaining SNPs calculated from each bulk was physically plotted on all 12 rice chromosomes. The Δ(SNP index) between the HGW-bulk and LGW-bulk, together with sliding-window averages of the SNP indices of SNPs located within a 2-Mb region with a 1-Kb step, was also plotted.
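The SNP-index calculation, filtering and sliding-window averaging described above can be sketched as follows. This is an illustration under stated assumptions, not the QTL-seq pipeline itself: the `SnpCall` record and all function names are hypothetical, the filter is one reading of the exclusion rule (adequate depth in both bulks and SNP-index ≥ 0.3 in at least one bulk), the ΔSNP-index follows the Low minus High direction stated in the text, and window and step sizes are left as parameters.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    chrom: str
    pos: int          # position on the reference genome
    alt_low: int      # reads carrying the alternate base in the L-bulk
    depth_low: int    # reads aligned at this position in the L-bulk
    alt_high: int     # reads carrying the alternate base in the H-bulk
    depth_high: int   # reads aligned at this position in the H-bulk


def snp_index(alt: int, depth: int) -> float:
    """SNP-index = count of alternate base / count of reads aligned."""
    return alt / depth if depth > 0 else 0.0


def keep_snp(call: SnpCall, min_index: float = 0.3, min_depth: int = 5) -> bool:
    """Assumed filter: both bulks reach min_depth, and the SNP-index
    reaches min_index in at least one of the two bulks."""
    if call.depth_low < min_depth or call.depth_high < min_depth:
        return False
    idx_low = snp_index(call.alt_low, call.depth_low)
    idx_high = snp_index(call.alt_high, call.depth_high)
    return max(idx_low, idx_high) >= min_index


def delta_snp_index(call: SnpCall) -> float:
    """Low-bulk SNP-index minus High-bulk SNP-index, as stated in the text."""
    return (snp_index(call.alt_low, call.depth_low)
            - snp_index(call.alt_high, call.depth_high))


def sliding_window_mean(positions, values, chrom_length,
                        window=2_000_000, step=10_000):
    """Return (window_start, mean value) for every window that contains
    at least one SNP. Simple quadratic scan, adequate for a sketch."""
    results = []
    start = 0
    while start < chrom_length:
        end = start + window
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        if in_window:
            results.append((start, sum(in_window) / len(in_window)))
        start += step
    return results


# Hypothetical calls on chromosome 8, for illustration only.
calls = [
    SnpCall("chr8", 1000, alt_low=2, depth_low=10, alt_high=8, depth_high=10),
    SnpCall("chr8", 2000, alt_low=1, depth_low=10, alt_high=2, depth_high=10),
    SnpCall("chr8", 3000, alt_low=3, depth_low=4, alt_high=9, depth_high=10),
]
kept = [c for c in calls if keep_snp(c)]
deltas = [delta_snp_index(c) for c in kept]
```

Only the first hypothetical call survives the filter: the second has a SNP-index below 0.3 in both bulks, and the third has a read depth below 5 in the L-bulk.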

Table 2 Chromosome-wise distribution of single nucleotide polymorphisms (SNPs) between highest and lowest bulks of 1000 grain weight

QTLs responsible for the differences observed between the low and high grain weight bulks were identified by comparing the ‘SNP-index’ between them. The SNP-index represents the frequencies of parental alleles in the bulked individuals. In the BPT5204 × MTU3626 bulks, “SNP-index = 1” indicates that the reads in the population are derived only from the MTU3626 genome, whereas “SNP-index = 0” indicates that the reads are derived only from BPT5204. A SNP-index of 0.5 indicates an equal genome contribution from both parents, and a deviation from 0.5 indicates that the SNP contributes to the phenotypic difference observed between the bulks. Differences in SNP-index between the high and low bulks were plotted as the ΔSNP-index with significance at P < 0.05.

Confirmation of identified genomic regions

The QTLs detected in the F2 population using the QTL-seq approach were confirmed in the F2, F3 and BC1F2 populations of the cross between BPT5204 and MTU3626 using polymorphic markers developed from the InDel regions identified in the QTL-seq analysis. Primers for the identified InDel markers were designed using Primer3, and previously reported genomic SSR markers physically or genetically mapped to the selected QTL regions were also used (Supplementary Table 1).

Parents were screened with these markers to identify polymorphic markers. PCR was carried out using a programmable thermal cycler (Eppendorf). First, 2 µl of 50 ng template DNA was pipetted into each well of the PCR plate. The master mix was then prepared with 0.5 µl each of 10 pmol forward and reverse primers, 0.5 µl of 10 mM deoxyribonucleotides (dNTPs), 1 µl of 10X PCR buffer with Mg2+ (ABM, Canada), 0.1 µl of 5 U/µl Taq DNA polymerase (ABM, Canada) and 4.4 µl of sterile distilled water to make up the volume to 8 µl. The master mix was centrifuged briefly (about 10 s) to mix the components thoroughly. Subsequently, 8 µl of master mix was added to each well containing 2 µl of template DNA to give a final volume of 10 µl. The PCR products were separated by electrophoresis on a 3.5% agarose gel.

Statistical analysis

Single-marker analysis is the simplest method for detecting markers associated with QTLs. The statistical methods used for single-marker analysis include t-tests, analysis of variance (ANOVA) and linear regression. Linear regression is the most commonly used method because the coefficient of determination (R2) explains the phenotypic variation arising from the QTL linked to the marker [38]. Linear regression analysis was used to study the marker-trait association in this study.
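A minimal sketch of single-marker regression is given below. It assumes an additive coding of the marker genotype (0 = homozygous for the BPT5204 allele, 1 = heterozygote, 2 = homozygous for the MTU3626 allele); this coding, the function name and the example data are illustrative assumptions rather than the study's actual procedure. The sketch returns the slope, the intercept and R2, and does not compute a P-value.

```python
def single_marker_regression(genotypes, phenotypes):
    """Ordinary least-squares regression of phenotype on marker genotype.

    Returns (slope, intercept, r_squared), where r_squared is the share
    of phenotypic variance explained by the marker.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("genotypes and phenotypes must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: marker genotype codes and 1000 grain weight (g).
example_genotypes = [0, 1, 2, 0, 1, 2]
example_weights = [18.0, 21.0, 24.0, 18.0, 21.0, 24.0]
slope, intercept, r2 = single_marker_regression(example_genotypes, example_weights)
```

In this coding the slope estimates the change in 1000 grain weight per additional MTU3626 allele, and R2 multiplied by 100 corresponds to the per cent phenotypic variance explained by the marker.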

Results

Phenotyping of F2 mapping populations

The parents used for mapping, BPT5204 (Samba Mahsuri) and MTU3626 (Prabhat), were compared using a paired t-test and showed highly significant differences for plant height, number of tillers per plant, number of panicles per plant, number of grains per panicle, number of filled grains per panicle, spikelet fertility, grain yield per plant, grain length, grain width, grain size and 1000 grain weight, suggesting substantial variability between the BPT5204 and MTU3626 genotypes (Fig. 2 and Table 3). During kharif 2017, 862 F2 plants of the cross between BPT5204 and MTU3626, along with the parents, were evaluated for yield and its component traits. The frequency distribution of 1000 grain weight in the F2 mapping population (Fig. 3b) was symmetrical with a bell-shaped curve. The mean (20.79 g), median (21 g) and mode (20 g) were almost equal, which showed that grain weight was normally distributed in the F2 mapping population and indicated that the trait is controlled by many genes, each with a small effect.

Fig. 2 Morphological characteristics of the parents and the F1 generation: a plant height (BPT5204: 95.17 cm, MTU3626: 88.17 cm and F1: 93.28 cm), b panicle length (BPT5204: 24.52 cm, MTU3626: 26.53 cm and F1: 26.33 cm), c grain length (BPT5204: 6.50 mm, MTU3626: 8.40 mm) and d grain width (BPT5204: 2.00 mm, MTU3626: 3.60 mm)

Table 3 Test of significance of yield and its component traits between BPT5204 (Samba Mahsuri) and MTU3626 (Prabhat) rice varieties

Fig. 3 QTL-seq approach adopted for mapping genomic regions responsible for 1000 grain weight. a Crossing of BPT5204 (fine-grained parent) and MTU3626 (high grain weight parent) to obtain F1 (BPT5204 × MTU3626) progeny. b Frequency distribution based on thousand grain weight of 862 F2 plants. The F2 plants that were selected to build the highest and lowest thousand grain weight bulks are highlighted in the rectangles. c Three Δ(SNP-index) graphs for chromosomes 1, 7 and 8 plotted using a sliding window of 4 Mb with a step of 10 kb. The significant genomic regions are highlighted in shaded colour (35–40 Mb on chromosome 1, 10–17 Mb on chromosome 7 and 2–5 Mb on chromosome 8)

QTL-seq analysis

A total of 5714 and 6494 SNPs were identified for H-bulk and L-bulk, respectively, in the selected target regions. Of these, 1040 and 1251 SNPs were homozygous in H-bulk and L-bulk, respectively (Supplementary Table 1). In total, 1896 SNPs were present in genic regions at read depth ≥ 5, and only 962 of them were in exonic regions. SNPs with a SNP index of less than 0.3 in both bulks were removed as they could be spurious. The SNP index of the remaining SNPs calculated from each bulk was physically plotted onto all 12 rice chromosomes. The Δ(SNP index) between H-bulk and L-bulk, together with sliding-window averages of the SNP indices of SNPs located within a 2-Mb region with a 1-Kb step, was also plotted. QTL-seq analysis revealed three genomic regions for grain weight at P < 0.01, i.e., qGW1 (35–40 Mb), qGW7 (10–18 Mb) and qGW8 (2–5 Mb) on chromosomes 1, 7 and 8, respectively (Fig. 3c and Table 4).

Table 4 Genomic regions identified as QTLs for grain weight from QTL-seq analysis in F2 Populations of the cross between BPT5204 and MTU3626

Confirmation of the QTL-seq derived QTLs in F2, F3 and BC1F2 populations

The polymorphic SSR markers in the candidate genomic regions were identified and used for screening the F2 and BC1F2 populations (Supplementary Table 2). RM3572 was significantly associated with grain weight (qGW8) in the F2 mapping population of the cross BPT5204 × MTU3626 (Fig. 4). RM3572 explained 17.88% of the total phenotypic variance, with a P-value of < 0.0001. The marker also showed significant association with grain weight in the F3 and BC1F2 populations, explaining 16.70% and 15.00% of the phenotypic variance, respectively, at P < 0.0001 (Fig. 5 and Table 5).

Fig. 4 a Representative agarose gel showing the segregation pattern of the F2 and BC1F2 populations of the cross between BPT5204 and MTU3626 with primer RM3572. 1–42 are F2 plants, 1–22 are BC1F2 plants, L: 50 bp ladder, B: BPT5204, M: MTU3626. b Frequency of marker RM3572 genotypes in the F2, F3 and BC1F2 populations of the cross between BPT5204 and MTU3626 for 1000 grain weight. RM3572 marker genotypes: BPT5204, BPT5204 marker allele; MTU3626, MTU3626 marker allele; Heterozygote, heterozygous marker alleles

Fig. 5 Linear regression analysis for confirmation of the grain weight QTL qGW8 in the F2, F3 and BC1F2 populations of a cross between BPT5204 and MTU3626

Table 5 Confirmation of grain weight QTL in F2, F3 and BC1F2 populations of the cross between BPT5204 and MTU3626 using single-marker analysis

Discussion

Grain weight is one of the key yield-contributing traits in rice. However, the inheritance of this complex trait is far from being comprehensively understood, even though several QTLs and candidate genes have been identified. In the present investigation, an attempt was made to uncover QTLs for grain weight by NGS-based QTL-seq analysis of a mapping population derived from BPT5204 and MTU3626. The bold-grain variety MTU3626 has a 1000 grain weight of as much as 29 g, the highest among all cultivated varieties. Therefore, MTU3626 was used as one of the parents.

From the ∆SNP index plots, three regions on chromosomes 1, 7 and 8 were identified as putative QTLs for 1000 grain weight at P < 0.01. These three QTLs were considered novel as no grain weight QTLs had been reported earlier in the same chromosomal regions. However, QTLs governing yield component traits other than grain weight were previously reported in these regions: five on chromosome 1, viz., white berry kernels [39], the spikelet fertility QTL sf1.1 [40], the QTL for shattering habit at maturity qSH1 [41], the plant height QTL ph1.1 [40] and the panicle length QTL pl1.1 [42]; the QTL for number of spikelets per panicle, qSSP7 [43], on chromosome 7; and the QTL controlling mean seed fertility, qCTB8 [44], on chromosome 8 (Supplementary Table 3).

In the present study, the grain weight QTL qGW8 was detected on chromosome 8 in the 2–5 Mb region. Through single-marker analysis, Illa-Berenguer et al. [32] reported significant markers linked to a fruit weight QTL (fw3.3) identified from QTL-seq analysis in F2 populations of tomato. Marker-QTL linkage may not be consistent across genetic backgrounds or testing environments, especially for complex traits [45]. Validation of markers in independent populations of different genetic backgrounds is essential to determine the effectiveness and reliability of markers for predicting the phenotype [46,47,48,49]. Hence, in the present study, the genotypic data of the F2 population were analysed with the phenotypic data of the F3 population, and the BC1F2 population was also screened with polymorphic markers for QTL confirmation. The results indicated that the phenotypic variation of 1000 grain weight in the F2, F3 and BC1F2 populations was consistent with the genotypic data, which suggests that the association of the RM3572 marker with the qGW8 QTL is reliable. A check against previous reports found no QTL for grain weight in the selected region. Hence, qGW8 was considered a novel QTL for grain weight in rice. From the mean values of the marker classes in the identified QTL region, the MTU3626 allele was associated with the higher trait value, and MTU3626 was therefore identified as the source of the favourable allele for grain weight. Furthermore, this QTL explained 15% or more of the phenotypic variance in all three populations, suggesting that it is a major QTL. Hence, it can be targeted for candidate gene identification, and it can subsequently be transferred to low-yielding varieties via marker-assisted breeding.

A total of 45 annotated genes are present in the genomic region (2–5 Mb) of the qGW8 QTL as per the Rice Annotation Project Database (RAP-DB). Among them, five genes, viz., LOC_Os08g01490 (Cytochrome P450), LOC_Os08g01510 (Cytochrome P450), LOC_Os08g01520 (Cytochrome P450), LOC_Os08g01680 (WD domain, G-beta repeat domain containing protein) and LOC_Os08g01780 (OsIAA25-Auxin-responsive Aux/IAA gene family member), were considered putative candidate genes controlling grain weight, as similar genes in other genomic regions have been reported to control grain weight in rice and other crops (Table 6). In addition, according to RiceXPro (https://ricexpro.dna.affrc.go.jp/), the cytochrome P450 genes LOC_Os08g01490, LOC_Os08g01510 and LOC_Os08g01520 were highly expressed, while LOC_Os08g01680 (WD domain, G-beta repeat domain containing protein) exhibited low expression, in lemma and palea (Fig. 6); hence, they can be considered probable candidate genes.

Table 6 The shortlisted candidate genes based on previous reports in the QTL regions for grain weight QTL of the cross between BPT5204 and MTU3626

Fig. 6 Gene expression analysis of candidate genes underlying the grain weight QTL region. More intense red colour represents high expression, while more intense blue colour indicates low expression. X-axis: various tissues of rice inflorescence

Genes encoding cytochrome P450 are known to be involved in brassinosteroid biosynthesis [50]. Brassinosteroids (BRs) are a class of growth-promoting steroid plant hormones that are crucial for normal growth and development, affecting traits such as plant height, leaf angle, panicle architecture and seed size [51]. Liu et al. [52] reported that GW5 could repress the kinase activity of GSK2 (Glycogen synthase kinase2) toward OsBZR1 (Oryza sativa BRASSINAZOLE RESISTANT1) and DLT (DWARF AND LOW-TILLERING), resulting in accumulation of their unphosphorylated forms and altered BR signalling. Therefore, GW5 might be a positive regulator of the BR signalling pathway in regulating grain width and weight in rice. Daware et al. [26] identified genes encoding a cytochrome P450 and a serine carboxypeptidase in the 1000 grain weight QTL OsqGW5.1 on chromosome 5 of rice using the QTL-seq approach in the F4:5 population of the cross between IR64 and Sonasal. Rice encodes one each of Gα and Gβ, and five Gγ proteins [53, 54]. Both Gα and Gβ proteins are positive regulators of cell proliferation and grain size in rice [55]. Loss of function of Gα (RGA1) or suppression of Gβ (RGB1) decreases grain size in rice [56, 57], suggesting that growth of the rice grain is also regulated by Gα and Gβ proteins. Utsunomiya et al. [57] reported that RICE G PROTEIN BETA SUBUNIT (RGB1) (Os03g0669100), which belongs to the G protein β subunit category, is a positive regulator of cell proliferation in rice. Ishimaru et al. [58] reported that TGW6 (THOUSAND GRAIN WEIGHT 6) encodes a novel protein with indole-3-acetic acid (IAA)-glucose hydrolase activity that positively regulates free IAA levels in rice grains, and that loss of function of TGW6 enhances rice grain weight. Through RNA-sequencing analysis, Hu et al. [59] reported that OsARF4 and OsSK41 at qTGW3 repress the expression of a common set of downstream genes, including some auxin-responsive genes, during rice grain development, and that loss of function of OsARF4 results in larger rice grains.

In conclusion, the present study demonstrates the rapid identification of QTLs using the QTL-seq approach and the prediction of candidate genes using rice genome databases. The candidate genes identified here might have a positive role in controlling grain weight in rice. However, further confirmation of this novel QTL is warranted before it is exploited to develop elite cultivars through MAS. The novel QTL identified in the present study enhances our understanding of the complex nature of the yield component traits in rice.