Introduction

Starch is a crucial energy source that has played a significant role in human social activities (Kumar et al. 2021). Crop grains, which contain large amounts of starch, are utilized for food, animal feed, biofuels, and other products (Wu et al. 2021). With global maize (Zea mays) yield surpassing 1.086 billion tons in 2022, starch has become more accessible than ever before. However, rapid population growth and a deteriorating ecological environment have placed enormous pressure on food security (Dossa et al. 2021). Currently, increasing yield per unit area has reached a bottleneck, and providing additional arable land to meet the demands of a growing population is not feasible (Yin et al. 2020). Therefore, it’s necessary to increase the accumulation of crop dry matter to generate more energy. The endosperm accounts for over 90% of the total grain weight, and the accumulation of starch typically begins in the central region of the endosperm, progressing to the aleurone layer, to form the starchy endosperm in mature grains (Wu et al. 2016). Ensuring the development of endosperm and promoting more starch accumulation is thus essential.

Starch plays a crucial role in plant development. The starch stored in the endosperm provides nutrition for embryo development and serves as an energy source during seed germination (Liu et al. 2022). Starch biosynthesis involves a series of enzymatic reactions, with five classes of enzymes playing a particularly important role: adenosine-diphosphate-glucose pyrophosphorylases (AGPases), soluble starch synthases (SSs), granule-bound starch synthases (GBSSs), branching enzymes (BEs), and debranching enzymes (DBEs). Among these enzymes, AGPases catalyze the rate-limiting step of starch synthesis, while GBSSs and SSs are responsible for the biosynthesis of amylose and amylopectin, respectively (Tetlow 2011). BEs and DBEs, on the other hand, assist in breaking down abnormal glucans. Additionally, endosperm development is regulated by several key cell cycle regulators, including the RBR protein, CDK/cyclin complexes, CDK-specific inhibitors, and APC/C (Cross and Umen 2015). Downregulating RBR1 promotes mitosis and increases cell number, while overexpressing an A-type CDK reduces the accumulation of storage material by preventing endoreduplication (Leiva-Neto et al. 2004; Sabelli et al. 2013). KRP is another important cell cycle regulator involved in endosperm development; it binds A-type CDKs and D-type cyclins to form a complex (Dante et al. 2014).

Several transcription factors that regulate starch content in maize endosperm have been documented. Opaque2 (O2) and prolamine-box binding factor (PBF) are noteworthy examples, as they not only govern the synthesis of the storage protein zein but also exert control over starch synthesis (Zhang et al. 2016a). Another transcription factor, ZmbZIP22, is specifically expressed in maize endosperm; it binds to the promoter of the 27-kD γ-zein gene and plays a pivotal role in regulating its expression. Notably, overexpression of ZmbZIP22 has been shown to reduce the size of starch granules, indicating its role as a negative regulator of starch synthesis (Dong et al. 2019; Li et al. 2018a). Additionally, two endosperm-specific NAC transcription factors, ZmNAC128 and ZmNAC130, have been identified. Downregulation of ZmNAC128 and ZmNAC130 expression results in smaller kernels and a reduction in starch content. Intriguingly, these two factors, along with O2, synergistically promote endosperm filling (Chen et al. 2023; Zhang et al. 2019a).

Hormones play a crucial role in the development of cereal endosperm. Indole-3-acetic acid (IAA) is synthesized and accumulated in fertilized maize kernels and rice grains, where it promotes endosperm development (Basunia and Nonhebel 2019). Cytokinin (CK) positively regulates endosperm cell division rate and enhances starch accumulation (Zhang et al. 2020). Abscisic acid (ABA) is involved in the cellularization of endosperm (Sreenivasulu et al. 2010), while brassinolide (BR) plays a role in endosperm development and starch accumulation (Zhang et al. 2020). The key genes involved in hormone biosynthesis, such as OsYUCs, OsTAR1 and ZmEHD1 (for IAA), SEG8 and DG1 (for ABA), and DWARF4 (for BR), are important regulators for starch content (Abu-Zaitoon et al. 2012; Qin et al. 2021; Sreenivasulu et al. 2010; Wang et al. 2020; Zhang et al. 2020). In addition, epigenetic mechanisms, such as DNA methylation and histone modification, also play a crucial role in regulating cereal endosperm development (Zhao and Zhou 2012 and Zhang et al. 2018).

In recent years, the development of high-throughput sequencing technology and the release of numerous plant reference genomes have made it easier to develop high-density molecular markers covering the whole genome of plant species. Moreover, the shared use of global germplasm resources has facilitated the construction of large-scale genetic populations. Consequently, genome-wide association study (GWAS) has become a powerful tool for dissecting quantitative traits (Yamaguchi-Kabata et al. 2008). For example, over the past decade, the most widely used maize association mapping panel (AMP), which represents global maize diversity collected from tropical/subtropical and temperate germplasms, was constructed by Yang et al. (~527 inbred lines) for GWAS (Yang et al. 2011). Using this population, numerous QTL and candidate genes for corresponding traits have been cloned and proposed (Li et al. 2013; Sun et al. 2022; Yang et al. 2013; Zhang et al. 2021). However, GWAS has limited power to detect minor alleles because markers are usually required to have a minor allele frequency (MAF) higher than 0.05 (Everett et al. 2020), which leads to the exclusion of small-effect genes. Increasing the density of molecular markers is one way to improve the detection efficiency of GWAS. Zhang et al. (2016b) conducted a GWAS on drought-related metabolic changes using an enlarged SNP panel (156,599 SNPs) and detected 63 significant QTL, including 56 novel loci compared to a previous study (Setter et al. 2011). Similarly, using an enlarged SNP panel obtained by identity by descent (IBD) and k-nearest neighbor (KNN) imputation methods, GWAS was conducted on 17 agronomic traits of 513 maize inbred lines, revealing numerous significant loci, including known and some novel QTL (Yang et al. 2014). In addition to enlarged genotype data, innovation in statistical models has greatly contributed to improving the detection efficiency of GWAS. For example, the GCIM-QEI model can detect QTL-by-environment interaction loci, and the IIIVmrMLM model can detect dominant effect loci (Li et al. 2022; Zhou et al. 2022). Therefore, taking into account both enlarged genotype data and efficient statistical models is crucial for GWAS.

Understanding the genetic architecture of starch content is crucial for identifying candidate genes. Using a recombinant inbred line (RIL) population with CI7 and K22 as parental lines, six QTL that affect starch content in maize kernels were identified, each explaining 4.07 to 10.6% of the phenotypic variation. Furthermore, seven genes were considered as potential causal genes, with four of them acting as regulators of starch biosynthesis (Wang et al. 2015). Another study identified 13 QTL using four double haploid (DH) populations, with 12 genes located within these QTL being implicated in starch synthesis (Zhang et al. 2022). In a further study that combined single linkage mapping, joint linkage mapping, and a genome-wide association study in a multi-parent population, 50 QTL were identified, of which 18 were novel. Notably, ZmTPS9, which encodes a trehalose-6-phosphate synthase, was identified as the causal gene. Knocking out ZmTPS9 resulted in increased starch content and grain weight in maize (Hu et al. 2021). These findings broaden our understanding of the genetic basis for starch content. Nevertheless, starch content is a quantitative trait, and further research is essential to unveil its genetic and molecular mechanisms.

In this study, we re-analyzed the genetic basis of starch content for 261 maize inbred lines using an enlarged SNP panel and improved statistical models. The main purposes were (i) to identify novel loci that may regulate maize starch content, (ii) to screen for candidate genes that have not yet been reported, and (iii) to develop markers for marker-assisted selection of kernel starch content. Our research aims to provide new insights into improving maize kernel starch content.

Materials and methods

Phenotype resources

Phenotypic data for this study were obtained from 261 maize inbred lines grown in three different environments, each with three replicates. These 261 inbred lines were randomly selected from a panel of 513 inbred lines used for association mapping. Among them, 71 inbred lines originated from tropical/subtropical regions, while 190 were from temperate regions. All these inbred lines were planted in Ledong, Hainan Province (latitude 18.75° N, longitude 109.17° E), in 2011, 2012, and 2013. The region experiences an annual average precipitation of 1181 mm, an annual average temperature of 23 °C, an annual average sunshine duration of 1039.6 h, and an accumulated temperature of 9300.7 °C (Liu et al. 2016a). The starch content of each line was measured with three repetitions, and the best linear unbiased prediction (BLUP) was calculated for the combined data from all three environments and replicates. The BLUP value was used as the phenotypic data for GWAS in this study.

Genotype resources

In this study, we used an enlarged panel of 558,629 high-quality SNPs (B73_RefGen_v2, hereafter referred to as 0.56M), which was obtained by combining MaizeSNP50 BeadChip and RNA-sequencing genotypes through a two-step approach based on identity by descent (IBD) and the k-nearest neighbor (KNN) algorithm (Yang et al. 2014). The genotype data cover the entire maize genome with a minor allele frequency (MAF) of at least 0.05 (MAF ≥ 0.05) and can be downloaded from the Maizego website (http://www.maizego.org/Resources.html).

Statistical model

To control for both type I (false positive) and type II (false negative) error rates, three models were compared: a generalized linear model (GLM) with population structure as a fixed effect (GLM + Q), a mixed linear model (MLM) with relative kinship as a random effect (MLM + K), and an MLM with population structure as a fixed effect and kinship as a random effect (MLM + Q + K). Specifically, the GLM can be represented as y = Xα + Zβ + e, while the MLM can be represented as y = Xα + Zβ + Wμ + e. Here, y is the trait value, Xα represents the population structure (Q matrix) as a fixed effect, Zβ represents the SNP or marker effect as a fixed effect, Wμ represents the kinship matrix as a random effect, and e represents the residual (Yu et al. 2006). Because the Q model proved to be the most suitable for this research, the estimation of the Q matrix is described in detail. Population structure was estimated and subpopulations were defined with the STRUCTURE and INSTRUCT software. The number of subgroups (K) was varied from 1 to 15 to identify the optimal K value, using 150,000 MCMC (Markov chain Monte Carlo) replications and 100,000 burn-ins in both programs. In STRUCTURE, the most suitable K value was determined by combining the log-likelihood of the data (LnP(D)) and the ad hoc statistic ΔK. In INSTRUCT, LnP(D) and the deviance information criterion were used to define the optimal K. CLUMPP software was used to consolidate the results of replicate simulations conducted in STRUCTURE and INSTRUCT. Inbred lines with membership probabilities greater than or equal to 0.60 were assigned to their respective subpopulations, while lines with probabilities less than 0.60 were grouped into a mixed category (Yang et al. 2011). To evaluate the performance of the three models, quantile-quantile (QQ) plots were generated for each model using the best linear unbiased prediction (BLUP) of starch content. The optimal model was the one whose QQ plot followed the 1:1 line closely, with a distinct tail deviating upwards, indicating well-controlled type I and type II errors and true association with causal polymorphism(s) (Zhang et al. 2010).
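The subpopulation assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the 0.60 membership cutoff; the function name, the subpopulation labels, and the membership values are hypothetical and are not taken from the STRUCTURE, INSTRUCT, or CLUMPP output of this study.

```python
def assign_subpopulation(memberships, threshold=0.60):
    """Return the subpopulation whose membership probability reaches the
    threshold, or "Mixed" when no subpopulation reaches it."""
    best_label = max(memberships, key=memberships.get)
    if memberships[best_label] >= threshold:
        return best_label
    return "Mixed"


# Hypothetical membership probabilities for three made-up inbred lines.
example_lines = {
    "line_A": {"sub1": 0.82, "sub2": 0.10, "sub3": 0.08},
    "line_B": {"sub1": 0.45, "sub2": 0.40, "sub3": 0.15},
    "line_C": {"sub1": 0.05, "sub2": 0.60, "sub3": 0.35},
}

for name, probs in example_lines.items():
    print(name, assign_subpopulation(probs))
```

The resulting labels would then be encoded as the Q matrix (the Xα term) in the GLM and MLM described above.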

Genome-wide association analyses

The genome-wide association study (GWAS) was performed using TASSEL 3.0 software (Bradbury et al. 2007). To account for linkage disequilibrium among SNP markers, the effective number of markers (En) was calculated using GEC software (Yang et al. 2010), giving 250,345 for the 0.56M panel (558,629 SNPs). Additionally, to avoid false negatives (type II error) and to detect more small-effect loci, an adjusted threshold of 2.07 × 10−5 was used for the 0.56M SNPs; thresholds of this kind are commonly used in plant genome-wide association studies.
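Both thresholds quoted in this paper (3.99 × 10−6 in the Fig. 1a caption and 2.07 × 10−5 here) have the form 1/n. The arithmetic can be checked with a short Python sketch; the function name is illustrative, and the denominator 48,393 is taken from the Fig. 1 caption rather than from this section.

```python
import math


def one_over_n_threshold(n_markers):
    """P-value threshold of the form 1/n for n (effective) markers."""
    return 1.0 / n_markers


strict = one_over_n_threshold(250_345)   # effective marker number from GEC
adjusted = one_over_n_threshold(48_393)  # denominator given in the Fig. 1 caption

print(f"strict threshold:   {strict:.2e}")
print(f"adjusted threshold: {adjusted:.2e}")
print(f"-log10(adjusted):   {-math.log10(adjusted):.2f}")
```

The last value corresponds to the horizontal line drawn on a Manhattan plot when P values are shown on the −log10 scale.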

Candidate gene analyses

To identify potential candidate genes associated with starch content, we defined significant QTL using the previously estimated linkage disequilibrium (LD) distance: a 100-kb QTL interval was defined for the 0.56M SNPs, covering 50 kb upstream and downstream of each significant SNP (Yang et al. 2014). Candidate genes were identified using the filtered working gene list from the B73 reference genome (RefGen_v2) obtained from MaizeGDB. Candidate genes were annotated using InterProScan (http://www.ebi.ac.uk/interpro/scan.html), and their expression patterns in maize organs were analyzed to predict the potential relationship with starch content (Hoopes et al. 2019). The most likely candidate gene within each QTL was selected based on its annotation or because it contained the peak SNP. If there were no genes within the interval, the gene nearest to the peak SNP was considered the most likely candidate gene. QTL were named as q + trait + serial number; for example, qSc1 denotes q, Sc (starch content), and 1 (serial number). Additionally, using the “lm” function in R, we fitted a multiple linear regression model to assess the total phenotypic variation accounted for by all QTL (Zhang et al. 2016b).
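As a rough illustration of the interval definition above, the following Python sketch places a ±50 kb window around each significant SNP and merges windows that overlap on the same chromosome. The merging rule is an assumption made for this sketch (the text does not state how co-located SNPs were combined into one QTL), and all positions except the peak SNP chr4.S_175584318 are made up.

```python
def group_snps_into_qtl(snps, flank=50_000):
    """Group (chromosome, position) pairs into QTL intervals.

    Each SNP gets a window of +/- flank bp; windows that overlap on the
    same chromosome are merged into a single interval (an assumption).
    """
    qtl = []
    for chrom, pos in sorted(snps):
        start, end = max(pos - flank, 0), pos + flank
        last = qtl[-1] if qtl else None
        if last and last["chrom"] == chrom and start <= last["end"]:
            last["end"] = max(last["end"], end)
            last["snps"].append(pos)
        else:
            qtl.append({"chrom": chrom, "start": start, "end": end, "snps": [pos]})
    return qtl


# One real position from the text (chr4) and two hypothetical ones (chr2).
significant = [(4, 175_584_318), (2, 223_040_000), (2, 223_050_000)]
intervals = group_snps_into_qtl(significant)
for number, interval in enumerate(intervals, start=1):
    print(f"qSc{number}", interval)
```

Note that merged windows can exceed 100 kb; a variant that centres the interval on the peak SNP of each cluster would keep the fixed 100-kb width.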

Gene Ontology (GO) enrichment analyses

To identify enriched Gene Ontology (GO) terms, we performed GO enrichment analysis using OmicShare Tools (https://www.omicshare.com/tools) (Ding et al. 2019). The analysis involved mapping genes expressed in maize endosperm to various sets in the GO database (http://www.geneontology.org/). The number of genes in each set was counted, and a list of genes with a specific GO function and the number of genes in each function was obtained. The top 30 GO terms with the minimum P values were selected for analysis and visualization.
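The counting step described above is typically followed by an over-representation test. The sketch below assumes a one-sided hypergeometric test, which is a common choice for GO enrichment; the text does not state which test OmicShare Tools applies, and all gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, hits):
    """P(X >= hits) for X ~ Hypergeometric(background, annotated, selected).

    background: genes in the reference set
    annotated:  background genes carrying the GO term
    selected:   genes in the list being tested
    hits:       selected genes carrying the GO term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 3 of 22 tested genes carry a term that annotates
# 200 of 30,000 background genes.
p_value = hypergeom_enrichment_p(30_000, 200, 22, 3)
print(f"enrichment P = {p_value:.3e}")
```

Terms would then be ranked by P value, and the 30 smallest retained for visualization.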

Linkage disequilibrium analyses

The extent of linkage disequilibrium (LD) was estimated using the squared correlation of paired SNPs, which was computed using the “genetics” package in R (version 4.1.1). An LD plot was then generated with the “LDheatmap” package in R.
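For homozygous inbred lines, the squared correlation between two SNPs can be computed directly from allele codes. The Python sketch below is a simplified stand-in for the R “genetics” package mentioned above, not a reimplementation of it; the 0/1 coding, the use of None for missing calls, and the example genotypes are assumptions.

```python
def ld_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded 0/1 per line.

    Lines with a missing call (None) at either SNP are skipped.
    Returns NaN when either SNP is monomorphic in the retained lines.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b) if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for six inbred lines (0 = reference, 1 = alternate).
peak = [0, 0, 1, 1, 1, None]
neighbour = [0, 0, 1, 1, 0, 1]
print(f"r2 = {ld_r2(peak, neighbour):.3f}")
```

Applying the function to every SNP pair in a region yields the matrix that an LD heatmap displays.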

Haplotype analyses of ZmAPC4

The BLUP values of starch content for the 261 inbred lines were used as phenotype data, and all SNPs located in ZmAPC4 were used as genotype data; the two datasets were then combined for haplotype analysis. According to the number of inbred lines that carry different haplotypes with starch content from high to low, the haplotypes were named hap1, hap2, hap3, and hap4, respectively. To ensure robustness in our analysis, haplotypes carried by fewer than 10 inbred lines were excluded from further consideration. Additionally, for the peak single nucleotide polymorphism (SNP) chr4.S_175584318, which is associated with ZmAPC4, the two groups of lines carrying either the G allele or the T allele were compared. Significant differences between haplotypes were determined using Student’s t-test.
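The grouping and filtering steps can be sketched as follows. This Python example concatenates the alleles of each line into a haplotype string, drops lines with missing calls, and discards haplotypes carried by fewer than 10 lines. Numbering the haplotypes by decreasing number of carrier lines is an assumption of this sketch, and the line names, alleles, and starch values are invented.

```python
from collections import defaultdict
from statistics import mean


def summarize_haplotypes(genotypes, phenotype, min_lines=10, missing="N"):
    """Group lines by haplotype and keep haplotypes with enough carriers.

    genotypes: {line: [allele at SNP1, allele at SNP2, ...]}
    phenotype: {line: BLUP value}
    Returns (name, haplotype, n_lines, mean_phenotype) tuples, numbered by
    decreasing number of carrier lines (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for line, alleles in genotypes.items():
        if line in phenotype and missing not in alleles:
            groups["".join(alleles)].append(phenotype[line])
    kept = [(hap, vals) for hap, vals in groups.items() if len(vals) >= min_lines]
    kept.sort(key=lambda item: len(item[1]), reverse=True)
    return [
        (f"hap{i}", hap, len(vals), mean(vals))
        for i, (hap, vals) in enumerate(kept, start=1)
    ]


# Invented data: 12 lines with "AG", 10 with "AT", 3 with "CT", 1 with a missing call.
genotypes, phenotype = {}, {}
for index, alleles in enumerate([["A", "G"]] * 12 + [["A", "T"]] * 10 + [["C", "T"]] * 3 + [["A", "N"]]):
    genotypes[f"line{index}"] = alleles
    phenotype[f"line{index}"] = 68.0 + (index % 5) * 0.5

summary = summarize_haplotypes(genotypes, phenotype)
for row in summary:
    print(row)
```

Pairwise tests between the retained groups would follow as a separate step.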

Construction of phylogenetic tree

To bolster the credibility of ZmAPC4, we searched all genes encoding the WD40 domain that have been previously documented in Arabidopsis, maize, and rice. Using the amino acid sequence of these genes, we conducted an amino acid sequence alignment with the neighbor-joining (NJ) method (Saitou and Nei 1987) within the MEGA X software. Subsequently, the resulting phylogenetic tree was annotated using the iTOL online tool (https://itol.embl.de/).

Development of molecular markers

For the peak SNP chr4.S_175584318, which exhibits two alleles (GG or TT), we developed molecular markers to distinguish inbred lines based on their starch content. To achieve this, we selected inbred lines with higher starch content (carrying the GG allele) and lower starch content (carrying the TT allele), respectively. To design the molecular markers, we utilized primer 1 to amplify a 504bp fragment that includes the peak SNP. We employed the dCAPS Finder 2.0 program available at http://helix.wustl.edu/dcaps/ (Neff et al. 2002) to develop dCAPS markers and design nearly matched primers, referred to as primer 2. Primer 2 was specifically designed to amplify a 251bp fragment that contains the NdeI restriction site ('CATATG') from the 504bp fragment. Following the amplification with primer 2, the resulting 251bp fragment was extracted and purified using a 4% agarose gel. Subsequently, the purified product underwent digestion with NdeI endonuclease (New England Biolabs R0111V). Detailed information regarding the primers, PCR system, and enzyme digestion can be found in Table S3.
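The expected outcome of the digestion step can be previewed in silico. The Python sketch below searches a sequence for the NdeI recognition site CATATG and reports fragment lengths, using the known top-strand cut position CA^TATG. The example amplicons are short made-up sequences, not the 251 bp product, and the sketch does not assume which allele carries the restriction site.

```python
NDEI_SITE = "CATATG"
NDEI_CUT_OFFSET = 2  # NdeI cuts the top strand after "CA" (CA^TATG)


def digest(sequence, site=NDEI_SITE, cut_offset=NDEI_CUT_OFFSET):
    """Return top-strand fragment lengths after cutting at every site."""
    seq = sequence.upper()
    fragments = []
    start = 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = seq.find(site, position + 1)
    fragments.append(len(seq) - start)
    return fragments


# Made-up amplicons that differ by a single base inside the site.
cut_allele = "GGTTAACC" + "CATATG" + "ACGTACGTAC"
uncut_allele = "GGTTAACC" + "CATCTG" + "ACGTACGTAC"

print("with site:   ", digest(cut_allele))
print("without site:", digest(uncut_allele))
```

On a gel, an allele carrying the site would run as shorter fragments, while the other allele would remain at the full amplicon length.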

Results

Optimized model and expanded genotype

The characterization of starch content in the AMP using a near-infrared analyzer (NIA) is an important step for breeding high-quality maize varieties. In a previous study, the starch content of 261 maize inbred lines was measured by NIA, and a genome-wide association study (GWAS) was performed using a mixed linear model (MLM) with principal components (PCs) and kinship (K) (PCs + K). However, this model was found to be too stringent in reducing false positives (type I error) compared to other models (Figure S1 and Figure S2) (Liu et al. 2016a). To improve the accuracy of the results, we used three different models (Q, K, and Q+K) for GWAS, where the Q model accounts only for population structure, the K model accounts only for kinship, and the Q+K model accounts for both. To test whether increasing marker density could further improve GWAS detection power for starch content, we expanded the genotype to 558,629 high-quality SNPs and repeated the analysis using the Q, K, and Q+K models. Both the K and Q+K models still produced too many false negatives, while the Q model consistently performed best among the three models (Fig. 1a). We therefore considered the Q model the most appropriate choice for this research. Our results also suggest that increasing marker density can improve the statistical power of GWAS, allowing more SNPs/loci to be detected, and that the choice of an appropriate model is crucial for successful GWAS.

Fig. 1

a QQ plot of the Q, K, and Q+K models for starch content based on 0.56M SNPs. The red line marks the significance threshold of 3.99 × 10−6 (1/250,345). b Manhattan plot of the Q model based on 0.56M SNPs. The red line marks the significance threshold of 2.07 × 10−5 (1/48,393)

GWAS

The Q model was selected and used to interpret GWAS results of starch content. A total of 21 significant SNPs were identified (Fig. 1b, and Table 1), indicating that expanding the marker density and using appropriate thresholds can improve the detection power of GWAS (only four loci were detected in the previous study) (Liu et al. 2016a). To further understand the genetic basis underlying starch content, the 21 significant SNPs were categorized into 14 QTL based on the definition of QTL. Each QTL could explain the phenotypic variation (R2) ranging from 7.02 to 9.62%, with an average of 8.42% and the total phenotypic variation explained by all QTL is 47.66%. These QTL are likely to be associated with starch content and defined as starch-content candidate loci. Furthermore, at chr.2, qSc1(222.99 Mb-223.09 Mb) had a powerful ability to explain 9.61% of the phenotypic variation for starch content (Table S1). Remarkably, seven significant SNPs co-located in qSc1, indicating that qSc1 is a crucial region. Furthermore, only one gene (GRMZM2G056335) was identified within qSc1, and its annotation as UDP-glucosyl transferase has been reported to be associated with heat-stress-induced leaf senescence (Han et al. 2023). Apart from its potential impact on starch content, it also appears to have a role in conferring resistance to abiotic stress. These findings strongly suggested that qSc1 represents a genetic hotspot region. In conclusion, these results provide valuable insights into the accumulation of useful information concerning starch content in maize kernels. Analyzing the candidate gene responsible for underlying the QTL could reveal even more insights into the genetic basis of starch content.

Table 1 All genes within significant QTL associated with starch content by GWAS with an enlarged SNP panel

Candidate gene analysis about multiple loci

After analyzing the GWAS results, we identified a total of 42 genes involved in various functions, such as transcription factors (e.g., GRMZM2G138165), enzymes (e.g., GRMZM2G056335), and proteins (e.g., GRMZM2G138076, GRMZM2G005791) (Table 1). The maize endosperm is the main starch-rich organ and accounts for more than 90% of the kernel dry weight. We found that about half of all genes (22/42) were expressed in the maize endosperm, which suggests their potential role in regulating starch content. Only two genes have been reported to be directly involved in endosperm development. One of them is GRMZM2G080843, which plays a regulatory role in starch biosynthesis in maize endosperm (Finegan et al. 2022). The other is GRMZM2G022453, which is also implicated in endosperm development and has an indirect impact on starch synthesis (Song et al. 2021). Because only two genes had been characterized in terms of their roles in starch synthesis within the endosperm, the remaining forty genes represent novel candidates, highlighting the potential value of using an expanded SNP panel to gather additional insights into the genetic basis of starch content in maize kernels. Notably, GRMZM2G053766 is highly expressed during endosperm development; it encodes a WD40 family protein and shares homology with anaphase-promoting complex 4 (APC4) in Arabidopsis. Mutations in apc4 have been linked to abnormal endosperm development and cell cycle disorders (Guo et al. 2018). Further evidence comes from TRANSPARENT TESTA GLABRA1 (TTG1), which also encodes a WD40 repeat transcription factor and has been demonstrated to play a role in seed storage accumulation in Arabidopsis. In ttg1-1 mutant seeds, there is a significant increase in dry weight, primarily attributed to elevated starch content, total protein, and fatty acids (Chen et al. 2015a). Given these findings, we named GRMZM2G053766 ZmAPC4. Subsequently, we collected genes encoding the WD40 domain that had been previously reported in Arabidopsis, maize, and rice. Using the amino acid sequences of these genes, we constructed a phylogenetic tree for ZmAPC4 (Fig. 2) with the neighbor-joining (NJ) method. The results revealed that ZmAPC4 shares the highest homology with APC4 in Arabidopsis. We also observed homology between ZmAPC4 and the reported genes KRN2 (Chen et al. 2022) and ALI1 (Best et al. 2021) in maize, and OsPHF1 in rice (Chen et al. 2015b). In conclusion, the homology information strengthens the credibility of ZmAPC4 as a candidate gene that may influence starch content.

Fig. 2

Phylogenetic tree of ZmAPC4 and other genes that encode a WD40 protein domain

Gene Ontology enrichment analysis

We conducted a GO enrichment analysis on the 22 genes that were expressed in the maize endosperm and found significant enrichment in several categories, including intracellular organelle part (cellular component), transmembrane transporter activity, RNA binding (molecular function), and embryo development (biological process) (Fig. 3). Importantly, ZmAPC4 was enriched in 18 of the top 30 GO terms and showed significant enrichment in biological processes related to embryo and seed or fruit development (GO:0009793, GO:0009790, and GO:0048316) (Table S2). These findings suggest that ZmAPC4 plays a critical role in maize kernel development and may be involved in regulating starch synthesis.

Fig. 3

Top 30 GO enrichment terms for the 22 genes expressed in maize endosperm

Haplotype analysis of ZmAPC4

To analyze the haplotypes of ZmAPC4, we extracted all SNPs within one LD decay distance (±100 kb) upstream and downstream of the peak SNP (chr4.S_175584318, P = 1.52E−05) (Fig. 4a) and found strong linkage between the other SNPs and the peak SNP (Fig. 4b). chr4.S_175584318 (TT/GG) is located in the 3′ UTR of ZmAPC4 and does not change the encoded amino acid sequence, but allelic variation in this region can alter mRNA stability and lead to changes in expression (Pal et al. 2011). From the 0.56M SNPs, we selected all SNPs located in ZmAPC4 as genotype data, and the BLUP values of starch content for the 261 inbred lines were used as phenotype data. After removing missing SNPs, genotype and phenotype were combined for haplotype analysis. Significant differences in starch content were observed between hap1 and hap4, hap2 and hap4, and hap3 and hap4 (Fig. 4c). The average difference in starch content between hap1, hap2, hap3, and hap4 is approximately 2.6%. Among these haplotypes, hap4 (AGTAACATTTCAG) was carried by 11 inbred lines, which exhibited the highest starch content. This suggests that these particular inbred lines may harbor favorable allele variants associated with increased starch content (Table 2). For the peak SNP chr4.S_175584318, significant differences were observed between the two haplotypes carrying T or G (Fig. 5a). Based on the identification of these favorable genomic regions, molecular marker-assisted selection could be employed to differentiate the starch content of various maize germplasms and enhance starch levels in modern maize breeding programs.

Fig. 4

ZmAPC4 affected starch content in maize kernels. a Enlarged Manhattan plot around the lead SNP; the red diamond represents the lead SNP. The red dotted line represents the threshold −log10(P) ≥ 4.68 (P ≤ 2.07 × 10−5). b R2 values of SNPs associated with ZmAPC4; the lead SNP is located 64 bp downstream, within the 3′ UTR of ZmAPC4. c Haplotype analysis of ZmAPC4 using one-way ANOVA; because the number of inbred lines differed among haplotypes, the Games-Howell method was employed for pairwise comparisons. The maize inbreds carrying hap4 had the highest starch content

Table 2 Comparison of haplotypes for starch content using BLUP value of 261 inbred lines
Fig. 5

Differences between maize inbred lines carrying the TT or GG allele at the peak SNP. a Comparison of starch content between maize inbreds carrying the T or G haplotype. b Agarose gel electrophoresis bands of eight inbred lines carrying the TT allele and eight inbred lines carrying the GG allele

Development of molecular markers for ZmAPC4

Molecular marker-assisted selection is a valuable approach that complements traditional breeding methods, enhancing efficiency in the breeding process (Guo et al. 2021). In our study, we successfully developed dCAPS markers capable of categorizing inbred lines based on their starch content (Table S3). To implement the markers, DNA fragments containing the peak SNP chr4.S_175584318 were subjected to digestion using the NdeI enzyme; subsequently, the resulting fragments were separated using 4% agarose gel electrophoresis. Notably, distinct banding patterns were observed between the eight high-starch lines (carrying the GG allele) and the eight low-starch lines (carrying the TT allele) (Table S4 and Fig. 5b). These findings yield two important implications. Firstly, ZmAPC4 emerges as a stable candidate gene associated with starch content. Secondly, the developed molecular markers can be effectively utilized to screen other maize varieties with either higher or lower starch content. Consequently, our research significantly contributes to improving breeding efficiency and provides a valuable reference for the development of new maize varieties characterized by high starch content.

Discussion

Starch is a major component of plant endosperm and is mainly composed of two types of glucan: amylose and amylopectin (Jeon et al. 2010). In our study, we identified two genes, GRMZM2G056335 and GRMZM2G007721, which encode UDP-glucosyl transferase (UDPG). UDPG-related genes, such as du and waxy in rice, have been found to affect amylose content in grains (Kaushik and Khush 1991; Zhang et al. 2019b), while flos (flo1 and flo2) affects the transparency of endosperm and produces a starchy endosperm (She et al. 2010). Therefore, our findings suggest that GRMZM2G056335 and GRMZM2G007721 may be potential targets for modifying starch content in maize kernels.

Starch biosynthesis in cereals is regulated by various transcription factors (TFs), including MYC, EREBP, bHLH, and PPR family transporters in rice (Bello et al. 2019; Wu et al. 2020; Zhu et al. 2003), MYBs and DOFs in maize (Wu et al. 2019; Xiao et al. 2017), and AP2/EREBP and bZIP in wheat (Liu et al. 2016b; Song et al. 2020). Notably, our study identified GRMZM2G317596, which encodes an AP2/EREBP transcription factor (Table 1). In rice, the homolog gene RSR1 was found to regulate starch biosynthesis, and its mutant led to the up-regulation of genes involved in starch synthesis, an increase in amylose content, and changes in amylopectin structure, which altered the morphology of starch grains (Fu and Xue 2010). Therefore, GRMZM2G317596 could be regarded as a potential candidate gene affecting starch content in maize. In addition to TFs, several long non-coding RNAs (lncRNAs) have also been found to play a role in starch biosynthesis. Overexpression of lncRNA_2308, lncRNA_1267, and lncRNA_1631 reduced the expression of GBSSI, resulting in a decrease in starch content and grain weight in rice (Zheng et al. 2019). However, the complexity of transcriptional regulation in starch biosynthesis implies that mutation in a single TF gene may not result in significant changes in endosperm development and starch content. The complexity suggests that the endosperm has established feedback mechanisms to respond to internal and external changes. Therefore, co-expression analysis and genetic analysis are powerful tools for identifying candidate genes that regulate endosperm development and starch content. In particular, GWAS has become an efficient method for detecting and analyzing the genetic mechanisms of quantitative traits.

To ensure the accuracy of GWAS results, it is important to consider the trait sensitivity and choose an appropriate model with high statistical power and a low error rate (Chang et al. 2018). In this study, we employed an enlarged genotype dataset and applied three different statistical models (Q, K, and Q+K) to conduct GWAS. After a comprehensive comparison of the results, we found that the K and Q+K models controlled false positives too stringently, at the cost of more false negatives. In contrast, the Q model gave the best balance between the two error types, as indicated by the QQ plot, and was therefore considered a suitable model for our research.

In a previous GWAS for kernel starch content, only four SNP-trait associations underlying four candidate loci were identified using a set of 52,370 SNPs (Liu et al. 2016a). In the present study, an enlarged high-density SNP panel (558,629 SNPs) was obtained from RNA sequencing data generated for 368 of the 513 lines used in the previous study, with a minor allele frequency greater than 0.05 (Yang et al. 2014). This enabled the identification of 14 new loci significantly associated with maize kernel starch content, as well as 21 additional significantly associated SNPs that were not detected in the previous study with lower marker density (Table S1). Interestingly, the four loci significantly associated with starch content in the previous study were not significant in the current study. This was likely because the same SNPs had different P values in the present study and did not meet the suggestive threshold (P ≤ 0.05/48,277) in the expanded GWAS (McGeachie et al. 2015). The increase in significant SNPs in the present study was primarily due to the higher marker density, which increased statistical power and enabled the identification of loci with minor effects and unbalanced allele frequencies (Gong et al. 2013; Tedja et al. 2018).

In the current study, we found 21 significant SNPs associated with maize kernel starch content, involving 14 novel QTL containing 42 genes. Of these, 29 genes had functional annotations. Some of the QTL we identified overlapped with regions reported in previous studies. For example, qSc6 was located on chromosome 6 within the interval of 164.64 Mb-164.74 Mb (Table S1), which was previously identified by a GWAS as a QTL for starch content using BLUP values with epistatic QTL (Hu et al. 2021). Based on these findings, we suggest that qSc6 may be considered a stable QTL that regulates starch content. Eight genes were located within qSc6, including those encoding ribosomal family proteins (GRMZM2G022453, GRMZM2G022619), F-box family proteins (GRMZM2G154626, GRMZM2G023190), and the AP2-EREBP transcription factor (GRMZM2G317596). In wheat and rice, AP2/EREBP transcription factors have been reported to be closely related to starch content (Fu and Xue 2010; Liu et al. 2016b). This suggests that these genes may be conserved during evolution and could have similar functions in maize. We also identified qSc2, which is located on chromosome 4. The significant SNP of qSc2 (chr4.S_175584318, P = 1.52 × 10⁻⁵) was co-located with chr4.S_165621095, a SNP reported in a previous study (the distance between the two SNPs was less than 10 Mb) (Li et al. 2018b). Only one gene, ZmAPC4, was located within qSc2; it encodes a WD40 repeat-like superfamily protein. APC4 has been demonstrated to influence endosperm development, and WD40 proteins have been shown to impact starch accumulation in Arabidopsis seeds (Chen et al. 2015a; Guo et al. 2018). GO analysis indicated that ZmAPC4 is involved in the progression of grain development (Fig. 3), and the haplotype analysis revealed a significant difference in starch content between maize inbred lines carrying hap4 and those carrying other haplotypes (Fig. 4c). Furthermore, at the peak SNP chr4.S_175584318, which is associated with ZmAPC4, inbred lines carrying the GG genotype exhibited higher starch content than those carrying the TT genotype (Fig. 5a). Using the information from the peak SNP, we developed dCAPS markers capable of distinguishing between maize inbred lines with higher and lower starch content. These molecular markers can be applied to assess and differentiate the starch content of various maize lines (Fig. 5b). This advance could enhance breeding efficiency and offers a valuable tool for developing new maize varieties with high starch content. Together, these findings suggest that ZmAPC4 plays a role in regulating starch content. Genetic modification (GM) technologies such as gene silencing, knockout, and overexpression could be employed to further verify the functions of these genes in different cereal crops, such as wheat and rice.
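The co-location criterion applied to qSc2 above (two SNPs on the same chromosome less than 10 Mb apart) can be checked directly from the SNP identifiers. The sketch below assumes the identifiers follow the pattern chr<chromosome>.S_<position in bp>, which is inferred from the two IDs quoted in the text rather than from a documented specification; the function names are hypothetical.

```python
import re

# Assumed ID layout: "chr<chromosome number>.S_<base-pair position>".
SNP_ID = re.compile(r"^chr(\d+)\.S_(\d+)$")


def parse_snp_id(snp_id):
    """Split an ID such as 'chr4.S_175584318' into (chromosome, position)."""
    match = SNP_ID.match(snp_id)
    if match is None:
        raise ValueError(f"unrecognized SNP ID: {snp_id!r}")
    return int(match.group(1)), int(match.group(2))


def snp_distance(snp_a, snp_b):
    """Return the distance in bp, or None if the SNPs are on different chromosomes."""
    chrom_a, pos_a = parse_snp_id(snp_a)
    chrom_b, pos_b = parse_snp_id(snp_b)
    if chrom_a != chrom_b:
        return None
    return abs(pos_a - pos_b)


def colocated(snp_a, snp_b, max_distance_bp=10_000_000):
    """Return True if both SNPs share a chromosome and lie within the window."""
    distance = snp_distance(snp_a, snp_b)
    return distance is not None and distance < max_distance_bp
```

For the two SNPs named in the text, the positions differ by 9,963,223 bp, so the pair falls just inside the 10 Mb window.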

Conclusion

Overall, our study employed an expanded SNP panel and a more suitable statistical model to re-analyze the published data on maize kernel starch content, resulting in the identification of several novel genetic loci through GWAS. We also predicted potential candidate genes that may regulate starch content, which could be useful for improving the efficiency of maize breeding through the development of molecular markers. Our findings provide a valuable reference for enhancing grain yield and could contribute to the development of more productive and sustainable agricultural practices.