Introduction

Cauliflower (Brassica oleracea var. botrytis) is an economically important vegetable crop known for its distinctive flavor and high nutritional value, and it is widely produced and consumed worldwide. The curd, characterized by its creamy-white color and enlarged inflorescence meristem, is the most valuable and edible part of the cauliflower plant. Curd quality and appearance can be described by various agronomic traits, including curd size, shape, color, texture, and degree of tightness (also known as solidity). However, the molecular mechanisms regulating curd development remain unclear. Understanding the genetic basis of these curd-related traits through map-based cloning is crucial for cauliflower improvement, as it will accelerate germplasm innovation and support cauliflower breeding.

The solidity of the cauliflower curd (SCC) is a critical agronomic trait that substantially affects cauliflower morphology and largely determines curd quality. This trait divides cauliflower into two distinct groups: compact-curd and loose-curd (Supplementary Fig. 1). Compact curds are characterized by short stalks/branches, wide stalk/branch angles, and high solidity, whereas loose curds exhibit the opposite characteristics (Nieuwhof and Garretsen 1961; Zhao et al. 2020a, 2013). To assess SCC quantitatively, several easily measurable properties have been reported, including curd tightness (CT), the base radius of the curd (BRC), the number of curd branches (NCB), curd branch length (CBL), the angle between branches and the vertical axis (ABV), and the curd weight-to-height ratio (CWHR, g/cm) (Nieuwhof and Garretsen 1961; Sharma et al. 2018). Notably, CBL and ABV are recognized as key factors underlying the differences between compact and loose curds. Previous studies have shown that these traits are strongly associated with each other (Kumar et al. 2017; Lan and Paterson 2000; Rana et al. 2024; Sharma et al. 2017; Singh et al. 2023; Zhao et al. 2020a). Of these, CBL is a crucial parameter for assessing the curd solidity and yield of cauliflower. Two specific traits, the length of the outermost branch (LOB) and the length of the secondary branch (LSB) of the curd, have previously been used as indicators of CBL (Zhao et al. 2020a, 2012).

Over the past decade, several studies have focused on the key loci and genes responsible for important agronomic traits of the cauliflower curd. Matschegewski et al. (2015) used a genome-wide association study (GWAS) strategy to identify 18 quantitative trait loci (QTLs) and several flowering genes that influence temperature-dependent curd induction. Based on the inconsistent expression of the flowering genes FLOWERING LOCUS C (BoFLC) and VERNALIZATION 2 (BoVRN2), they speculated that facultative BoVRN2/BoFLC-independent mechanisms might control the temperature-regulated floral transition in cauliflower. Hasan et al. (2016) repeatedly detected QTLs for days to curd initiation (DCI) on chromosomes C06 and C09 using composite interval mapping (CIM), with additive effects increasing at higher temperatures. Zhao et al. (2020a) analyzed curd architecture in two environments, focusing on four parameters (basal diameter, stalk length, stalk angle, and curd solidity), in two doubled haploid (DH) populations of 122 and 79 lines developed from a common compact-curd parent and two loose-curd parents. In total, they identified 20 QTLs associated with these parameters, among which qSL.C6-1, qSL.C6-2, qCS.C6-1, and qCS.C6-2 were located in close proximity on chromosome C6. Zhao et al. (2020b) performed BSA-seq on an F2 population derived from sister lines, using “riceyness” and “non-riceyness” bulks, and identified a 4.0 Mb candidate region on chromosome C04 containing 22 putative SNPs. Through RNA-seq, gene function annotation, and sequence analysis, they pinpointed Bo4g024850, an ortholog of Arabidopsis SOC1, as the candidate gene responsible for the development of riceyness. Rakshita et al. (2021) applied genotyping by sequencing (GBS) to 92 Indian cauliflower accessions and detected twelve QTLs and 121 significant SNPs related to agronomic traits, including curd length, curd width, days to 50% curd harvest, marketable curd weight, gross plant weight, leaf length, curd weight, and number of leaves per plant. More recently, Zhang et al. (2023) selected 220 core accessions of loose-curd cauliflower for resequencing, phenotyping, and GWAS, identifying several signals on chromosome C02 for main stem height (MSH) and purplish curd (PC), on C06 for external leaf wing (ELW), and on C01 for weight of a single curd (WSC). In addition, BOB01G136670, located in the WSC signal interval, was proposed as a plausible candidate gene for WSC in cauliflower. Taken together, these efforts to dissect phenotypes and map QTLs for curd-related traits will help reveal key regulatory genes and improve our understanding of curd development in cauliflower.

Over the past decades, the emergence and wide application of next-generation sequencing (NGS) techniques have ushered in a transformative era: genomic big data can now be generated inexpensively, and high-density single-nucleotide polymorphism (SNP) markers greatly improve the resolution of genetic mapping. Combined with NGS technologies, recent studies have transformed the genetic dissection of complex trait architectures in cauliflower (Zhao et al. 2020b; Zhang et al. 2023). In particular, integrated multi-omics approaches have proven highly effective for QTL mapping and candidate gene mining. For example, bulked segregant analysis (BSA) and bulked segregant RNA-seq (BSR) have been used together to map loci and simultaneously analyze differential gene expression within the mapping interval (Borovsky et al. 2019; Yan et al. 2019; Zhao et al. 2020b). BSA combined with transcriptome analysis has been used to identify genes that are differentially expressed within major signals (Hao et al. 2019; Ou et al. 2020). BSA and GWAS are frequently combined to improve resolution by leveraging the diversity of natural populations (Chen et al. 2024; Du et al. 2018; Su et al. 2019). Integrating transcriptome sequencing with GWAS can not only cross-validate results but also narrow the list of candidate genes using SNP markers and gene expression data; this approach has proven effective in identifying key genes controlling important agronomic traits in cauliflower (Matschegewski et al. 2015), rice (Wei et al. 2021), maize (Ma et al. 2021), oilseed rape (Li et al. 2021), and oak (Francisco et al. 2021).

In this study, we applied a comprehensive gene-mining strategy to investigate the CBL trait in 298 cauliflower inbred lines. Using a genome-wide SNP dataset generated by whole-genome resequencing (WGRS) together with phenotypic data, we analyzed phylogenetic structure and conducted GWAS for the LOB and LSB traits, identifying a major signal on chromosome 8. Selective sweep analysis, haplotype analysis, and tissue-specific transcriptome profiling provided further evidence to narrow down the signal interval and identify causal genes. The aims of this study were two-fold: (1) to detect QTLs and analyze the complex genetic architecture controlling the LOB and LSB traits, and (2) to validate selected major QTLs and pinpoint causal genes. Our results provide insights into the molecular mechanisms of curd development and offer candidate genes for improving curd features in cauliflower.

Materials and methods

Plant materials

The cauliflower inbred lines used in this study were collected and maintained at the Tianjin Kernel Vegetable Research Institute, Tianjin Academy of Agricultural Sciences (Supplementary Table 1). In total, 298 cauliflower inbred lines were studied, comprising 167 loose-curd and 131 compact-curd accessions. As outgroups, resequencing data for four accessions (two wild Brassica cretica accessions and two Chinese kale accessions, Brassica oleracea var. alboglabra) were downloaded from the NCBI SRA database (Supplementary Table 2).

For quantitative real-time reverse transcription PCR (qRT-PCR), we used two cauliflower inbred lines representing the loose-curd and compact-curd groups, with contrasting LOB lengths (3.568 and 8.336 cm), to analyze candidate gene expression. Samples were collected at multiple stages of curd development, namely the curd formation, curd elongation, and curd maturation stages (Guo et al. 2021). At each stage, the fleshy stem portion of the curd was carefully excised, and RNA was extracted for subsequent qRT-PCR analysis.

Phenotyping

All 298 cauliflower inbred lines were planted in Tangshan, Hebei Province, China (2019) and at the Baodi Biological Center, Tianjin, China (2020). Accessions were grown in an experimental field in an arrangement-order design with five replicates. Each plot consisted of a single 5 m row of 9–10 plants, with approximately 50 cm between plants within a row and 55 cm between rows. Cauliflowers were sown in mid-to-late June and harvested in mid-to-late October. LOB and LSB were measured with a Vernier caliper at curd physiological maturity.

SNP calling

Raw whole-genome resequencing data for the 298 cauliflower accessions were generated previously in our laboratory (Chen et al. 2024) and have been deposited in the NCBI SRA database (PRJNA794342).

For data preprocessing, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Fastp (v0.12.4) (Chen et al. 2018) were used for read quality control and cleaning. Clean reads were then aligned to the “C-8” cauliflower reference genome (NGDCGWHBKKZ00000000) using BWA aln (v0.7.10-r789) (Li and Durbin 2009). Samtools (v1.14) (Li et al. 2009) was used to convert SAM files, sort BAM files, and filter out alignments with mapping quality below 30 (“-q 30”). The Genome Analysis Toolkit (GATK, v4.1.4.1) (McKenna et al. 2010) was used to process the alignment files and call SNPs. To exclude SNP-calling errors caused by incorrect mapping, only high-quality SNPs meeting the following criteria were retained: depth of coverage ≥ 4 (approximately one-third of the average depth), mapping quality score ≥ 20, missing rate within the population ≤ 20%, and minor allele frequency (MAF) ≥ 0.05.
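The four retention criteria can be read as a per-site filter. The sketch below is illustrative only and assumes that the depth cutoff applies to each sample's call (low-depth calls are treated as missing before the missing rate is computed) and that a single site-level quality score carries the ≥ 20 cutoff; which VCF fields hold these values is an assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# One sample's call at a site: (alt-allele count 0/1/2 or None if uncalled, read depth).
Call = Tuple[Optional[int], int]


def mask_low_depth(calls: List[Call], min_depth: int = 4) -> List[Optional[int]]:
    """Set genotypes supported by fewer than min_depth reads to missing (None)."""
    return [g if g is not None and depth >= min_depth else None for g, depth in calls]


def minor_allele_frequency(genotypes: List[Optional[int]]) -> float:
    """MAF among called genotypes, treating each diploid call as two alleles."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(quality: float, calls: List[Call], min_quality: float = 20.0,
              min_depth: int = 4, max_missing: float = 0.20,
              min_maf: float = 0.05) -> bool:
    """Apply the quality, depth, missing-rate and MAF cutoffs to one biallelic SNP."""
    genotypes = mask_low_depth(calls, min_depth)
    missing_rate = sum(g is None for g in genotypes) / len(genotypes)
    return (quality >= min_quality
            and missing_rate <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


# Ten samples: the call with depth 3 is masked (10% missing); MAF = 4/18 ≈ 0.22.
site = [(0, 12), (1, 15), (2, 9), (0, 3), (0, 14),
        (1, 11), (0, 13), (0, 10), (0, 16), (0, 12)]
print(keep_site(35.0, site))  # True
```

Masking low-depth calls before counting missingness means that a site covered poorly in many samples fails the missing-rate cutoff rather than being kept with unreliable genotypes.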

Phylogenetic and population structure analysis

To clarify phylogenetic relationships from a genome-wide perspective, an individual-based neighbor-joining (NJ) tree was constructed with 100 bootstrap replicates using PHYLIP (v3.697) (Felsenstein 1993), and the tree was visualized using the online tool iTOL (https://itol.embl.de/). Principal component analysis (PCA) was performed on the filtered SNPs using GCTA (v1.94.0beta) (Yang et al. 2011). Population structure was analyzed with ADMIXTURE (v1.3.0) (Alexander et al. 2009) for cluster numbers K from 1 to 20, using the default fivefold cross-validation (--cv=5). Each K was run with 20 replicates, and the outputs were visualized using the plotQ() function of the R package pophelper (Francis 2017).
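GCTA derives principal components from a genomic relationship matrix (GRM). The NumPy sketch below illustrates the idea under simplifying assumptions: no missing genotypes, 0/1/2 alt-allele dosages, and the standard off-diagonal GRM formula applied to the diagonal as well (GCTA uses a slightly different diagonal estimator). The function name grm_pca and the simulated data are hypothetical.

```python
import numpy as np


def grm_pca(genotypes, n_components=2):
    """Top principal components of a simplified genomic relationship matrix.

    genotypes: (n_samples, n_snps) array of alt-allele counts (0/1/2), no missing values.
    Returns (eigenvalues, eigenvectors) sorted by decreasing eigenvalue.
    """
    X = np.asarray(genotypes, dtype=float)
    p = X.mean(axis=0) / 2.0                    # alt-allele frequency of each SNP
    keep = (p > 0) & (p < 1)                    # monomorphic SNPs have zero variance
    X, p = X[:, keep], p[keep]
    Z = (X - 2 * p) / np.sqrt(2 * p * (1 - p))  # center and scale each SNP
    grm = Z @ Z.T / Z.shape[1]                  # average relatedness across SNPs
    eigvals, eigvecs = np.linalg.eigh(grm)      # eigh: symmetric input, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]


# Two simulated groups with opposite allele frequencies separate on PC1.
rng = np.random.default_rng(0)
G = np.vstack([rng.binomial(2, 0.1, size=(10, 200)),
               rng.binomial(2, 0.9, size=(10, 200))])
values, vectors = grm_pca(G)
print(np.sign(vectors[:, 0]))
```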

GWAS analysis

The R package lme4 was used to fit the mixed model $y = X\beta + Zu + \varepsilon$ and calculate the best linear unbiased estimator (BLUE) of each line in each correlation-based group, where $y$ is the vector of phenotypic observations, $\beta$ the fixed genotype effects with design matrix $X$, $u$ the random environment effects with design matrix $Z$, and $\varepsilon$ the residual error. The BLUEs calculated for each phenotype group were used for GWAS and for developing genomic prediction models.
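For a mixed model of this form, the BLUE of the fixed genotype effects is the generalized least squares estimate $\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$ with $V = \sigma_u^2 ZZ^\top + \sigma_e^2 I$. The sketch below computes it in NumPy assuming the variance components are already known (lme4 would estimate them, typically by REML); the data and function name are hypothetical. For a balanced design in which every line is observed once per environment, the BLUE reduces to each line's mean.

```python
import numpy as np


def blue_fixed_effects(y, X, Z, var_u, var_e):
    """GLS estimate b = (X' V^-1 X)^-1 X' V^-1 y for y = X b + Z u + e.

    Assumes the variance components var_u (random environment effect) and
    var_e (residual) are already known.
    """
    n = len(y)
    V = var_u * (Z @ Z.T) + var_e * np.eye(n)   # Var(y) under the mixed model
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Hypothetical balanced data: three lines, each measured once in two environments.
lines = [0, 1, 2, 0, 1, 2]
envs = [0, 0, 0, 1, 1, 1]
X = np.eye(3)[lines]          # fixed-effect design: one column per line
Z = np.eye(2)[envs]           # random-effect design: one column per environment
y = np.array([4.0, 6.0, 8.0, 5.0, 7.0, 9.0])
print(blue_fixed_effects(y, X, Z, var_u=2.0, var_e=1.0))  # ≈ [4.5 6.5 8.5]
```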

In total, 729,691 SNPs (MAF > 0.05; quality ≥ 20; GQ ≥ 5; missing rate ≤ 0.2; depth ≥ 4) were used for the two traits in this study. Association analysis was performed using the mixed linear model (MLM) method implemented in rMVP (Yin et al. 2021). The genome-wide significance threshold (1.37E−06) was set as 1/n, where n is the effective number of independent SNPs and SVs calculated using the Genetic Type I Error Calculator (v0.2) (Li et al. 2012). QTLs were defined based on the positions of significant SNPs and the extent of linkage disequilibrium (LD). Our group recently estimated the LD decay distance in cauliflower populations at approximately 27.9 kb (Chen et al. 2024). To ensure that key candidate genes were not missed, each QTL was defined as a 200 kb interval spanning 100 kb upstream and downstream of the associated SNP, which is larger than the estimated LD decay distance (27.9 kb) (Chen et al. 2024; Luo et al. 2021; Wang et al. 2019).
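The threshold and interval rules above reduce to simple arithmetic. In the sketch below, the total SNP count stands in for the effective number n purely for illustration (the text obtains n from the Genetic Type I Error Calculator), and merging overlapping ±100 kb windows into one QTL is an assumed convention; both function names are hypothetical.

```python
import math


def significance_threshold(n_effective):
    """Uniform 1/n threshold and its -log10 value."""
    p = 1.0 / n_effective
    return p, -math.log10(p)


def qtl_intervals(hits, flank=100_000):
    """Extend each significant SNP by +/- flank bp and merge overlapping windows.

    hits: iterable of (chromosome, position) tuples.
    Returns a sorted list of (chromosome, start, end) intervals.
    """
    windows = sorted((chrom, max(0, pos - flank), pos + flank) for chrom, pos in hits)
    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Using the total SNP count as a stand-in for n gives a threshold close to 1.37E-06.
p, score = significance_threshold(729_691)
print(f"{p:.3g}", round(score, 2))                 # 1.37e-06 5.86
print(qtl_intervals([("Chr8", 11_387_967)]))       # [('Chr8', 11287967, 11487967)]
```

Applied to the shared peak SNP, the ±100 kb rule reproduces the candidate interval Chr8:11,287,967–11,487,967 reported in the Results.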

FST and XP-EHH analysis and selective sweep detection

Two methods, the linkage disequilibrium-based cross-population extended haplotype homozygosity (XP-EHH) test (Sabeti et al. 2007) and the population differentiation-based fixation index FST (Weir and Cockerham 1984), were used to detect signatures of selection in cauliflower. FST values were calculated using VCFtools v0.1.17 (Danecek et al. 2011) with a window size of 100 kb and a step size of 10 kb, and windows with the top 1% of FST values were considered candidate regions under selective sweeps. XP-EHH values were calculated using the R package REHH v3.2.2 (Gautier et al. 2017), and regions with −log10(p-value) > 4.0 were considered to be under positive selection. Gene Ontology (GO) enrichment analysis was performed using the topGO package (Alexa and Rahnenführer 2009), and the results were visualized with the ggplot2 package (Villanueva and Chen 2019).
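VCFtools computes windowed Weir and Cockerham FST directly, so the sketch below only illustrates the windowing and top-1% selection steps. It assumes per-SNP FST values are already available and summarizes each window by a plain mean, which is a simplification (VCFtools also reports a weighted estimate). Coordinates are 0-based and half-open here, and the function names are hypothetical.

```python
import numpy as np


def sliding_windows(chrom_length, size=100_000, step=10_000):
    """Window (start, end) pairs, 0-based and half-open, stepping by `step` bp."""
    last_start = max(chrom_length - size, 0)
    return [(int(s), int(s) + size) for s in np.arange(0, last_start + 1, step)]


def window_means(positions, values, windows):
    """Plain mean of per-SNP values in each window (NaN for empty windows)."""
    positions = np.asarray(positions)
    values = np.asarray(values, dtype=float)
    means = []
    for start, end in windows:
        in_window = (positions >= start) & (positions < end)
        means.append(values[in_window].mean() if in_window.any() else np.nan)
    return np.array(means)


def top_fraction(window_values, fraction=0.01):
    """Indices of windows at or above the (1 - fraction) quantile, ignoring NaNs."""
    cutoff = np.nanquantile(window_values, 1.0 - fraction)
    return np.flatnonzero(window_values >= cutoff), cutoff


print(len(sliding_windows(1_000_000)))  # 91 overlapping 100 kb windows on a 1 Mb sequence
```

With a 10 kb step, each 100 kb window overlaps its neighbors by 90 kb, so a single strongly differentiated locus usually appears in several adjacent top windows.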

Gene annotation and haplotype analysis

Candidate genes were annotated using the TBtools software platform (Chen et al. 2020; Chen and Xia 2022), and all candidate protein sequences were aligned against the UniProt databases. Each candidate gene and its overlapping genomic region were then analyzed in depth. Haplotypes were extracted from 60 accessions with extreme phenotypes, comprising the 30 accessions at each end of the phenotypic distribution. Haplotype analysis was performed with the R package pegas v0.12 (Paradis 2010), which has also been used in previous studies (Guo et al. 2020; Zhou et al. 2020). Samples or genes with inadequate site coverage or poor data quality were excluded from HapMap graph construction to ensure the integrity and fidelity of the data for specific genes.
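The extreme-phenotype sampling step can be sketched as ranking accessions by phenotype and tallying haplotypes in each tail. This is a simplified stand-in for the pegas workflow, assuming each accession's haplotype is already encoded as a string of alleles at the SNPs in the region; the encoding, data and function name are hypothetical.

```python
from collections import Counter


def extreme_haplotype_counts(samples, n_each=30):
    """Count haplotypes in the n_each highest- and n_each lowest-phenotype samples.

    samples: list of (sample_id, phenotype_value, haplotype) where haplotype is the
    string of alleles at the SNPs in the region (a hypothetical encoding).
    """
    if 2 * n_each > len(samples):
        raise ValueError("not enough samples for two non-overlapping tails")
    ranked = sorted(samples, key=lambda s: s[1])
    short_tail, long_tail = ranked[:n_each], ranked[-n_each:]
    return (Counter(h for _, _, h in long_tail),
            Counter(h for _, _, h in short_tail))


samples = [("a", 7.1, "AATG"), ("b", 2.5, "GGCA"), ("c", 6.4, "AATG"),
           ("d", 2.9, "GGCA"), ("e", 4.0, "AACA")]
long_counts, short_counts = extreme_haplotype_counts(samples, n_each=2)
print(long_counts, short_counts)  # Counter({'AATG': 2}) Counter({'GGCA': 2})
```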

RNA extraction and expression analysis

Total RNA was extracted using an RNA isolation kit (Thermo Scientific), and genomic DNA was removed using DNase I (Invitrogen). Reverse transcription was performed using PrimeScript™ IV 1st strand cDNA Synthesis Mix (Takara) according to the manufacturer’s protocol. qRT-PCR was performed using TB Green® Premix Ex Taq™ II (Tli RNaseH Plus) (Takara) on a LightCycler 480 II instrument (Roche), with three biological and three technical replicates. Relative expression levels were calculated using the 2^−ΔΔCt method, with Actin as the reference gene; primers are listed in Supplementary Table 12.
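The 2^−ΔΔCt calculation normalizes the target gene to the reference gene (ΔCt) and then to a calibrator sample (ΔΔCt). A minimal sketch, assuming technical replicates are averaged first and that both primer pairs amplify with close to 100% efficiency; which sample serves as calibrator is not stated in the text, and the Ct values below are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """2^-ΔΔCt fold change of a sample relative to a calibrator sample.

    Each argument is a list of Ct values (e.g. technical replicates), averaged first.
    Assumes close to 100% amplification efficiency for both primer pairs.
    """
    delta_sample = mean(ct_target) - mean(ct_reference)
    delta_calibrator = mean(ct_target_calibrator) - mean(ct_reference_calibrator)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target gene vs Actin, in a sample and a calibrator.
print(relative_expression([24.1, 23.9, 24.0], [18.0, 18.1, 17.9],
                          [26.0, 26.1, 25.9], [18.0, 17.9, 18.1]))  # ≈ 4.0 (ΔΔCt ≈ -2)
```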

A comprehensive collection of 48 tissue-specific RNA-seq datasets for B. oleracea was downloaded from the SRA database and the National Genomics Data Center database (PRJNA516113, PRJNA546441, and PRJCA027857). The datasets encompassed a variety of organs, including the root, leaf, flower and curd. Moreover, transcriptome data detailing different stages of curd development were also acquired from the SRA database (PRJNA546441).

For preprocessing, fastq-dump (v2.11.2) from the SRA Toolkit and Fastp (v0.12.4) (Chen et al. 2018) were used for format conversion and read cleaning. Reads were aligned with HISAT2 (v2.2.1) (Kim et al. 2019), and fragments per kilobase of transcript per million mapped reads (FPKM) values for each gene were computed with the Cufflinks suite (v2.2.1) (Trapnell et al. 2012). Heatmaps of log2(FPKM + 1) values for the selected genes were generated using the R package pheatmap (v1.0.12) (Kolde 2019).
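FPKM scales a gene's fragment count by its length in kilobases and by library size in millions of mapped fragments. The sketch below uses the textbook formula (Cufflinks itself works with effective transcript lengths and model-based fragment assignment) and adds the low-expression rule used in the Results, log2(FPKM + 1) < 3; interpreting that rule as "below the cutoff in every organ examined" is an assumption. Function names and numbers are hypothetical.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_low_expression(fpkm_by_organ, cutoff=3.0):
    """True if log2(FPKM + 1) stays below cutoff in every organ/stage supplied."""
    return max(math.log2(v + 1.0) for v in fpkm_by_organ) < cutoff


print(fpkm(1_000, 2_000, 10_000_000))                          # 50.0
print(round(math.log2(fpkm(1_000, 2_000, 10_000_000) + 1), 2))  # 5.67
print(is_low_expression([0.5, 2.0, 6.0]))                      # True: log2(7) ≈ 2.81 < 3
```

A cutoff of 3 on the log2(FPKM + 1) scale corresponds to FPKM < 7.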

Statistical analysis

Phenotypic data and gene expression levels were compared using unpaired two-tailed Student’s t tests with the rstatix package in R v4.2.1 (http://www.r-project.org/). Coefficients of variation were calculated using the raster package (Hijmans et al. 2015), and means and standard deviations were calculated in R (v4.2.1). Genomic heritability (h2) was estimated using the GCTA approach (Yang et al. 2011).
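Both statistics follow standard formulas: the coefficient of variation is the standard deviation divided by the mean (reported as a percentage), and Student's t test pools the two group variances. A standard-library sketch, assuming the sample (n − 1) standard deviation; the two-tailed p-value would then come from the t distribution with the returned degrees of freedom (in R, the pooled form corresponds to var.equal = TRUE). Data and function names are hypothetical.

```python
import math
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent, using the sample standard deviation (n - 1 denominator)."""
    return 100.0 * stdev(values) / mean(values)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2


print(round(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]), 2))  # 42.76
print(student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```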

Results

Phenotypic analysis

Field observations and measurements of LOB and LSB were conducted on the 298 cauliflower inbred lines in two consecutive years (2019 and 2020) at different locations. Both traits varied substantially among accessions and were approximately normally distributed (Fig. 1a, b). LOB ranged from 2.316 to 9.657 cm, with a mean of 4.406 cm, whereas LSB ranged from 1.172 to 3.910 cm, with a mean of 2.073 cm (Supplementary Table 1). The phenotypic coefficient of variation (PCV) for LOB was 32.63% in Hebei (2019) and 25.41% in Tianjin (2020), compared with 23.09% and 19.28%, respectively, for LSB.

Fig. 1

Phenotype of LOB and LSB traits and population structure of cauliflower accessions. a Illustration of the branch lengths of a cauliflower curd. b Histogram of the BLUE value distribution for LOB and LSB phenotypic data collected in 2019 and 2020. c Phenotypic correlation analysis of LOB and LSB. Asterisks indicate significance of the correlations (*** p < 0.001). Red lines in the lower diagonal are linear regression fits. d Principal component analysis (PCA) of all accessions. e Neighbor-joining phylogenetic tree based on single-nucleotide polymorphisms (SNPs), with Brassica cretica as the outgroup. f Population structure analysis with different numbers of clusters (K = 3, 5, and 6), consistent with the phylogenetic tree. The x-axis lists the accessions in the same order as in the phylogenetic tree

Genomic heritability (h2) and the genomic correlation between LOB and LSB were estimated using data from all 298 cauliflower inbred lines. LOB showed a higher heritability (h2 = 0.84) than LSB (h2 = 0.64), and the genetic correlation between the two traits was significant (0.808). The phenotypic correlation between LOB and LSB was also strongly positive (R = 0.700; Fig. 1c). These correlations suggest that the two traits share a common genetic basis, possibly through pleiotropic effects, and together indicate a strong potential for identifying common candidate genes through GWAS.

Genomic variants and phylogenetic relationships analysis

WGRS of the 298 cauliflower accessions generated a total of 2.3 terabases (Tb) of raw reads. After low-quality and adapter sequences were removed, an average coverage of about 13.5-fold was obtained for each accession and used for SNP calling. The average mapping rate of clean reads to the C-8 (V2) reference genome was approximately 91% (Supplementary Table 2). A total of 729,691 SNPs with a missing rate of no more than 20% and a minor allele frequency (MAF) ≥ 0.05 were retained after filtering and used for downstream analyses.
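As a rough consistency check, average depth is total sequenced bases divided by (number of samples × genome size). The genome size below is an assumption back-calculated from the selective sweep results (29.69 Mb reported as about 5.22% of the genome), not a value stated in the text, and the raw-read total slightly overstates the depth remaining after cleaning.

```python
def mean_coverage(total_bases, n_samples, genome_size_bp):
    """Average sequencing depth per sample: bases sequenced / (samples x genome size)."""
    return total_bases / (n_samples * genome_size_bp)


# Assumed genome size, back-calculated from 29.69 Mb ≈ 5.22% of the genome.
genome_size_bp = 29.69e6 / 0.0522
print(round(genome_size_bp / 1e6))                            # 569
print(round(mean_coverage(2.3e12, 298, genome_size_bp), 1))   # 13.6
```

The raw estimate of about 13.6-fold is in line with the approximately 13.5-fold clean coverage reported per accession.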

To determine phylogenetic relationships, we constructed a neighbor-joining (NJ) tree rooted on the wild species Brassica cretica and performed admixture modeling and PCA. The results support two groups that can be further divided into four subgroups according to the SCC trait (Fig. 1d, e and Supplementary Fig. 2). From distal to proximal on the phylogenetic tree, the four subgroups are CCA (compact-curd type, group A), LCA (loose-curd type, group A), CCB (compact-curd type, group B), and LCB (loose-curd type, group B), consisting of 58, 102, 73, and 65 cauliflower accessions, respectively. PCA results also show that the CCA group is closest to kale and the wild species. Overall, CCA and CCB accessions are relatively scattered, whereas LCA and LCB accessions are tightly clustered, reflecting a narrow genetic background in loose-curd cauliflower, which may have undergone strong bottlenecks during its evolutionary history (Fig. 1d). Notably, the nested structure of the phylogenetic tree indicates that curd solidity did not evolve in a single direction and that its evolution may be more complex than previously expected (Fig. 1e). Moreover, model-based clustering reveals frequent genetic exchange between compact-curd and loose-curd accessions, suggesting interlaced relationships resulting from crossbreeding events during curd-type differentiation (Fig. 1f).

GWAS analysis

A total of 729,691 SNPs were subjected to GWAS using the mixed linear model (MLM) in rMVP. Sixty-four candidate loci exceeded the significance threshold (1.37E−06) (Fig. 2a). Of these significant SNPs, fifty-four on chromosomes 1 and 8 were associated with LOB, and eleven on chromosomes 2, 3, and 8 were associated with LSB (Supplementary Table 3). To capture as many potential candidate genes as possible, we identified all protein-coding genes located within the 200 kb interval around each significant locus. In total, 262 candidate genes were associated with LOB and 179 with LSB (Supplementary Table 4). Notably, the major signal (Chr8:11,387,967) was significantly associated with both LOB and LSB (Fig. 2a), suggesting that these two curd branch length traits might have undergone the same selection process during cauliflower improvement. The 27 genes within the candidate interval of this shared SNP (Chr8:11,287,967–11,487,967) were considered the most promising candidates associated with both LOB and LSB.

Fig. 2

Combined Manhattan plot for LOB and LSB traits. The yellow dashed line denotes the significance threshold (−log10(p-value) = 5.69), and the red arrow highlights the signal shared by LOB and LSB (a). Genome-wide scan for FST differentiation signals associated with the length of the outermost branch of the cauliflower curd (LOB) during its improvement; the red dashed line marks the top 1% threshold used to identify highly differentiated regions (b). XP-EHH plot contrasting extreme long-length and extreme short-length curd populations. Positive values suggest directional selection favoring the extreme long-length population, whereas negative values suggest selection favoring the extreme short-length population. Red dashed lines mark thresholds of ±4; values beyond these thresholds (−log10(p-value) > 4) were considered potentially significant (c). Schematic view of the identified 79 kb region, encompassing eleven candidate genes (d)

Phenotypic variation explained (PVE) analysis of the peak SNP (Chr8:11,387,967) showed that accessions carrying the alternative AA genotype had higher LOB and LSB values than those carrying the reference GG genotype (Supplementary Fig. 3). Notably, the proportion of AA genotypes increased from compact-curd to loose-curd accessions, suggesting that this variant might be a causal site regulating curd branch length and could serve as a molecular marker for CBL.
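In its simplest form, the PVE of a single SNP is the R² from regressing the phenotype on additive allele dosage. The sketch below assumes dosage coding by the count of the alternative A allele (GG = 0, AA = 2) and ignores population structure and kinship, which an MLM-based PVE would account for; the data and function name are hypothetical.

```python
from statistics import mean


def pve_single_snp(dosages, phenotypes):
    """R^2 from regressing phenotype on additive allele dosage (0, 1, 2).

    Ignores population structure and kinship, which an MLM-based PVE would account for.
    """
    mx, my = mean(dosages), mean(phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant phenotype: nothing to explain
    return sxy * sxy / (sxx * syy)


# Hypothetical data: GG coded 0, AA coded 2 (count of the alternative A allele).
print(pve_single_snp([0, 2, 0, 2], [3.0, 5.0, 5.0, 7.0]))  # 0.5
```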

Selective sweep signals during curd improvement

CBL, as an important trait, has been continuously selected and optimized during the domestication of cauliflower. To further investigate the genomic factors influencing this trait, we used FST and XP-EHH to scan the genome for selection signals in the cauliflower germplasm diversity panel. For FST, windows in the top 1% of values were treated as candidate selective sweep regions (Fig. 2b). This analysis identified 297 selective regions with a total length of 29.69 Mb, accounting for approximately 5.22% of the genome (Supplementary Table 5) and containing 1,003 candidate genes (Supplementary Table 6). For the haplotype-based XP-EHH method, windows with −log10(p-value) > 4.0 were considered selective regions (Fig. 2c and Supplementary Table 7). This analysis identified 140 selective regions with a total length of 13.97 Mb, accounting for approximately 2.45% of the genome and containing 1,397 protein-coding candidate genes (Supplementary Table 8). Comparison of the FST and XP-EHH results identified 69 overlapping regions and 53 candidate genes, which were prioritized for subsequent analysis (Supplementary Table 9 and Supplementary Table 10). Gene Ontology (GO) enrichment analysis of these candidate genes revealed enriched terms related to flower morphogenesis (GO:0048439), bract morphogenesis (GO:0010433), bract formation (GO:0010434), UDP-glucuronate metabolic process (GO:0046398), and response to singlet oxygen (GO:0000304) (Supplementary Fig. 4). These results offer important clues for dissecting the regulatory mechanisms of curd branch development in cauliflower.
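Finding regions supported by more than one scan is an interval intersection problem. The sketch below uses a standard two-pointer intersection of sorted, non-overlapping intervals on one chromosome; chaining it (GWAS with FST, then with XP-EHH) yields regions supported by all three. The coordinates are hypothetical, and treating intervals as half-open is an assumed convention.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    i = j = 0
    overlaps = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever interval finishes first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


# Hypothetical coordinates on one chromosome.
fst_regions = [(0, 100), (200, 300)]
xpehh_regions = [(50, 250)]
print(intersect_intervals(fst_regions, xpehh_regions))  # [(50, 100), (200, 250)]
```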

Inferring causal genes controlling CBL

To further explore candidate genes associated with CBL, we conducted haplotype analysis of the interval shared by the GWAS, FST, and XP-EHH results (79 kb, Chr8:11,327,064–11,399,048) (Fig. 3a). Three major haplotypes were identified, accounting for 66.67% (LL haplotype), 8.33% (ML haplotype), and 25% (SL haplotype) of the accessions. Notably, about 76% of long-CBL accessions (longer than 6 cm) carried the LL haplotype, whereas short-CBL accessions (shorter than 3 cm) mainly carried the SL haplotype. Eleven protein-coding genes were annotated in this overlapping region (Fig. 2d). Five of these were excluded because they were not expressed or were expressed at very low levels (log2(FPKM + 1) < 3) in the examined cauliflower organs (Fig. 3b). The expression of the remaining six genes was analyzed by qRT-PCR (Supplementary Table 11), which showed that the expression of BOB08G028680 varied significantly across curd developmental stages (Fig. 3c). Protein sequence homology and phylogenetic analysis revealed that BOB08G028680 encodes a homolog of ARABIDOPSIS RESPONSE REGULATOR 9 (ARR9, AT3G57040), a member of the type-A ARR family in Arabidopsis (Fig. 3d). ARR9 acts as a negative regulator of cytokinin signaling (To et al. 2004). These findings suggest that BOB08G028680 is a strong candidate gene that may control the CBL trait through the cytokinin signaling pathway in cauliflower. However, this hypothesis requires further experimental evidence, and functional validation is needed to establish the biological role and significance of this gene.

Fig. 3

Validation of the candidate genes in the LOB and LSB candidate region. a Haplotype patterns of the candidate region (Chr8:11,327,064–11,399,048) across sixty cauliflower accessions, comprising thirty extreme long-length and thirty extreme short-length curd accessions. Haplotypes were deduced using 199 SNPs derived from the VCF file. Color-coded bars denote the 60 accessions, as detailed in Supplementary Table 2. b Fold changes in candidate gene transcript levels in various organs, including the root, leaf, flower, and curd, and at different stages of curd development (vegetative (S0), transition (S1), curd formation (S2), pre-mature (S3), and branch elongation (S4)). c Expression of candidate genes at various curd developmental stages in cauliflower, determined by qRT-PCR with gene-specific primers and normalized to the Actin gene. Values are means ± standard error of three replicates. d Phylogenetic tree of BOB08G028680 and its orthologs across eight monocot and dicot species

The variants in the coding region of BOB08G028680 were further analyzed to predict their effects on the encoded protein. Four SNPs caused missense mutations (Fig. 4): two in the first exon (Chr8:11,401,901 and Chr8:11,402,203) and two in the third exon (Chr8:11,402,788 and Chr8:11,402,790), resulting in four amino acid changes (Ser to Asn, His to Asp, Gln to Glu, and Asp to Ala, respectively) (Fig. 4 and Supplementary Fig. 5) that could alter the function of the encoded protein. Overall, our findings suggest that natural variation in BOB08G028680 influences the LOB and LSB traits and that manipulating this gene may improve the branch length of the cauliflower curd.
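Classifying a coding SNP as missense amounts to translating the reference and alternative codons with the standard genetic code. A minimal sketch, assuming the variant has already been converted to a 0-based position within the coding sequence on the coding strand (for a minus-strand gene the sequence would first need to be reverse-complemented); the sequence below is hypothetical, not BOB08G028680.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered by first, second, third base in TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, b1 in enumerate(BASES)
               for j, b2 in enumerate(BASES)
               for k, b3 in enumerate(BASES)}


def classify_snv(cds, index, alt):
    """Classify a single-base change in a coding sequence.

    cds: coding sequence on the coding strand, starting at a codon boundary.
    index: 0-based position of the changed base within cds.
    Returns (reference amino acid, alternative amino acid, effect).
    """
    codon_start = index - index % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = index % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    elif ref_aa == "*":
        effect = "stop-loss"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


cds = "ATGCAAGAT"                      # hypothetical: Met-Gln-Asp
print(classify_snv(cds, 3, "G"))       # ('Q', 'E', 'missense')  Gln -> Glu
print(classify_snv(cds, 7, "C"))       # ('D', 'A', 'missense')  Asp -> Ala
```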

Fig. 4

Multiple alleles of the candidate gene BOB08G028680. a Schematic drawing of the BOB08G028680 gene in the “C-8” reference genome. b Allelic information of sequence variants in BOB08G028680 for six representative cauliflower accessions. c CBL values for the six representative cauliflower accessions

Discussion

The initiation and development of cauliflower curds are complex and often show quantitative inheritance (Rakshita et al. 2021). Understanding the genetic basis of curd development is therefore essential for cauliflower improvement. Association mapping is a powerful approach for identifying QTLs and candidate genes associated with agronomic traits (Jaganathan et al. 2020). To date, association studies of curd traits in cauliflower remain rare (Hasan et al. 2016; Matschegewski et al. 2015; Rakshita et al. 2021; Rosen et al. 2018; Zhang et al. 2011, 2023; Zhao et al. 2020a, 2020b). Previous GWAS in cauliflower have been limited by small population sizes, narrow genetic variation, or low numbers of markers (Matschegewski et al. 2015; Rakshita et al. 2021; Zhang et al. 2023). Low marker density has proven adequate for identifying genomic regions or QTLs associated with specific traits, but it is insufficient for accurately identifying the causal gene underlying a QTL. In this study, high marker density (729,691 SNPs) and a large population (298 accessions, including loose-curd and compact-curd cauliflowers) enabled the identification of QTLs and candidate genes associated with CBL traits. In addition, the integration of multiple strategies, including selective sweep analysis, transcriptome profiling, and haplotype analysis, contributed considerably to the success of QTL mapping and gene mining. Simultaneously surveying multiple highly correlated traits is also an effective way to map QTLs.

Our GWAS identified a total of sixty-four significantly associated SNPs distributed across four chromosomes. One significant SNP (Chr8:11,387,967) was shared between the LOB and LSB traits, and this shared association provides mutual validation. In previous studies, Zhao et al. (2020a) detected two QTLs controlling LOB (C6: 30,128,973–34,650,103 and C6: 7,318,888–9,408,197). Rakshita et al. (2021) detected twelve QTLs for traits associated with the regulation of curd formation and development, two of which (C1:33,480,103 and C3:42,391,757) were associated with curd length. Unlike these previously reported loci, the QTLs identified in this study are new and are associated with CBL traits relevant to curd solidity. The identified candidate genes show significant potential for cauliflower germplasm innovation, and the significant SNPs within the QTLs can be used as credible markers for marker-assisted selection in cauliflower breeding.

Shoot branching determines plant architecture by affecting the number, length, angle, and position of branches, which are controlled by the activity of axillary and floral meristems (Barbier et al. 2019; Rameau et al. 2015). Various studies have shown that auxin, cytokinins, and strigolactones play crucial roles in regulating shoot branching: auxin and strigolactones inhibit bud outgrowth, whereas cytokinins promote it (Domagalska and Leyser 2011; Müller and Leyser 2011; Werner et al. 2003). The Arabidopsis type-A response regulators ARR3, ARR4, ARR5, ARR6, ARR8, and ARR9 have overlapping functions and act as negative regulators of the cytokinin signaling pathway (Kiba et al. 2003; Kushwah and Laxmi 2014; Rashotte et al. 2003; To et al. 2004; Zhang et al. 2011). In tomato, the type-A response regulator SlRR6 positively regulates plant height: knocking out SlRR6 reduced plant height by shortening internodes, whereas overexpressing SlRR6 produced taller plants with more internodes (Liu et al. 2023). Zhao et al. (2023) showed, by ectopically expressing MdRR9 in tomato, that MdRR9 positively regulates branch development. Here, using multiple mapping strategies, we identified a homolog of ARR9 as the most likely causal gene for both the LOB and LSB traits. This gene might regulate curd branch development in cauliflower by interacting with proteins in the cytokinin signaling pathway. However, the molecular mechanisms of this candidate gene and its regulatory role in the cytokinin pathway require further investigation. The candidate gene and genetic variations identified in this study could provide a valuable foundation for germplasm innovation aimed at improving curd solidity in cauliflower.

In conclusion, we integrated GWAS, selective sweeps, transcriptome profiling, and haplotype analysis approaches to comprehensively investigate the causal loci and candidate genes for LOB and LSB traits. Our results yielded valuable insights into the genetic basis of CBL trait in cauliflower. The identification of candidate genes and natural variations contributes to a better understanding of the underlying molecular mechanisms of curd development. These findings can pave the way for subsequent functional validation and provide valuable genetic resources for future cauliflower breeding.