Abstract
Cauliflower is a distinct subspecies of Brassica oleracea owing to its specialized, edible floral organ. The cauliflower curd is composed of enlarged inflorescence meristems that develop through a series of precise molecular regulatory steps. Based solely on curd solidity, cauliflower is generally classified into two groups (compact-curd and loose-curd), and curd branch length is a crucial parameter determining the morphological difference between them. Herein, to understand the genetic basis of curd branch development, we used a total of 298 inbred lines representing the two groups of cauliflower to comprehensively investigate the causal genes and regulatory mechanisms. Phylogenetic and population structure analyses revealed two groups, each of which could be further divided into compact-curd and loose-curd subgroups. Integrating the genotype and phenotype data, we conducted a genome-wide association study for the length of the outermost branch (LOB) and secondary branch (LSB) of the curd. Sixty-four significant loci were identified that are highly associated with curd branch development. Evidence from genome-wide selective sweep analysis (FST and XP-EHH) narrowed down the major signal on chromosome 8 to an approximately 79 kb region containing eleven protein-coding genes. After further analysis of haplotypes, transcriptome profiling, and gene expression validation, we inferred that BOB08G028680, a homolog of AtARR9, might be the causal gene simultaneously regulating the LOB and LSB traits in cauliflower. This result provides valuable information for improving curd solidity in future cauliflower breeding.
Introduction
Cauliflower (Brassica oleracea var. botrytis) is an economically significant vegetable crop known for its unique flavor and high nutritional value, and it is widely produced and consumed globally. The curd, characterized by its creamy-white color and enlarged inflorescence meristem, is the most valuable and edible part of the cauliflower plant. The quality and appearance of the curd can be described by various agronomic traits, including curd size, shape, color, texture, and degree of tightness (also known as solidity). However, the molecular regulatory mechanisms underlying curd development remain unclear. Understanding the genetic basis of these curd-related traits through map-based cloning is crucial for cauliflower improvement, as it will accelerate germplasm innovation and empower cauliflower breeding.
The solidity of the cauliflower curd (SCC) is a critical agronomic trait that has a substantial impact on cauliflower morphology and is essential in determining curd quality. This trait categorizes cauliflower into two distinct groups: compact-curd and loose-curd (Supplementary Fig. 1). Compact-curds are characterized by short stalks/branches, wide stalk/branches angles, and high solidity, whereas loose curds exhibit contrasting characteristics (Nieuwhof and Garretsen 1961; Zhao et al. 2020a, 2013). In an attempt to quantitatively assess SCC in cauliflower, several easily measurable properties have been reported including curd tightness (CT), the base radius of the curd (BRC), the number of curd branches (NCB), curd branch length (CBL), the angle between branches and the vertical axis (ABV), and the curd weight-to-height ratio (CWHR, g/cm) (Nieuwhof and Garretsen 1961; Sharma et al. 2018). Remarkably, CBL and ABV are recognized as crucial factors contributing to the differences observed between compact and loose curds. Previous studies have demonstrated that these traits are strongly associated with each other (Kumar et al. 2017; Lan and Paterson 2000; Rana et al. 2024; Sharma et al. 2017; Singh et al. 2023; Zhao et al. 2020a). Of these, CBL is a crucial parameter for assessing the curd solidity and yield of cauliflower. Previously, two specific traits, the length of the outermost branch (LOB) and secondary branch (LSB) in curd, have been utilized as indicators for representing CBL trait (Zhao et al. 2020a, 2012).
In this decade, several studies have been published focusing on the key loci and genes responsible for important agricultural traits of the cauliflower curd. Matschegewski et al. (2015) identified 18 QTLs and several flowering genes that influence temperature-dependent curd induction using a GWAS strategy. Based on the inconsistent expression of the flowering genes FLOWERING LOCUS C (BoFLC) and VERNALIZATION 2 (BoVRN2), they speculated that facultative BoVRN2/BoFLC-independent mechanisms might control the temperature-regulated floral transition in cauliflower. Hasan et al. (2016) repeatedly detected QTLs for days to curd initiation (DCI) on chromosomes C06 and C09 using composite interval mapping (CIM), with increased additive effects at higher temperatures. Zhao et al. (2020a) analyzed curd architecture across two environments, focusing on four parameters (basal diameter, stalk length, stalk angle, and curd solidity) in two doubled haploid (DH) populations derived from a common compact-curd parent and two loose-curd parents, comprising 122 and 79 lines, respectively. In total, they identified 20 QTLs associated with these parameters, of which qSL.C6-1, qSL.C6-2, qCS.C6-1, and qCS.C6-2 were found in close proximity on chromosome C6. Zhao et al. (2020b) performed BSA-seq using an F2 population derived from sister lines with “riceyness” and “non-riceyness” bulks, resulting in the identification of a 4.0 Mb candidate region on chromosome C04 containing 22 putative SNPs. Through comprehensive RNA-seq, gene function annotation, and sequence analysis, they pinpointed Bo4g024850, an ortholog of Arabidopsis SOC1, as the candidate gene responsible for the development of riceyness. Rakshita et al. (2021) detected twelve QTLs and 121 significant SNPs related to agronomic traits, including curd length, curd width, days to 50% curd harvest, marketable curd weight, gross plant weight, leaf length, curd weight, and number of leaves per plant, using genotyping by sequencing (GBS) on 92 Indian cauliflower germplasms. Recently, Zhang et al. (2023) selected 220 core accessions of loose-curd cauliflower for resequencing, phenotypic investigation, and genome-wide association studies (GWAS), leading to the identification of several signals on chromosome C02 for main stem height (MSH) and purplish curd (PC), on C06 for external leaf wing (ELW), and on C01 for weight of a single curd (WSC). Additionally, BOB01G136670, located in the WSC signal interval, was regarded as a plausible candidate gene for WSC in cauliflower. Taken together, these efforts to dissect phenotypes and map quantitative trait loci (QTL) for curd-related traits will contribute to revealing key regulatory genes and improving our understanding of this important trait in cauliflower.
In the past decades, the emergence and wide applications of next-generation sequencing (NGS) techniques have ushered in a transformative era, making it possible to generate genomic big data inexpensively and facilitate genetic mapping by providing high-density single-nucleotide polymorphisms (SNPs) and significantly improve marker resolution. Combined with NGS technologies, recent studies have revolutionized the approach to genetic dissection of complex trait architectures in cauliflower (Zhao et al. 2020b; Zhang et al. 2023). In particular, integrated multi-omics has shown great superiority in QTL mapping and candidate gene mining. For example, bulked segregant analysis (BSA) and bulked segregant RNA-seq (BSR) were used together for simultaneous mapping and analysis of differential gene expression within the mapping interval (Borovsky et al. 2019; Yan et al. 2019; Zhao et al. 2020b). BSA, combined with transcriptome analysis, was utilized to identify genes that show differential expression within major signals (Hao et al. 2019; Ou et al. 2020). BSA and GWAS are frequently co-located to improve resolution by leveraging natural population diversity (Chen et al. 2024; Du et al. 2018; Su et al. 2019). Integration of transcriptome sequencing and GWAS approaches could not only cross-validate the result, but also reduce the scope of candidate genes based on SNP markers and gene expression data, which has proven to be effective in identifying key genes that control important agronomic traits in cauliflower (Matschegewski et al. 2015), rice (Wei et al. 2021), maize (Ma et al. 2021), oilseed rape (Li et al. 2021), and oak (Francisco et al. 2021).
In this study, a comprehensive strategy for gene mining was performed to investigate the CBL trait using 298 cauliflower inbred lines. Based on the genome-wide SNP dataset generated by whole genome resequencing (WGRS) and phenotypic data, we analyzed the phylogenetic structure and conducted GWAS for LOB and LSB traits, resulting in the identification of the major signal on chromosome 8. In addition, selective sweep analysis, haplotype analysis, and tissue-specific transcriptome profiling provided further evidence to narrow down the signal interval and identify causal genes. The main aims of this study were two-fold: (1) to detect QTLs and analyze the complex genetic architecture that controls the LOB and LSB traits, and (2) to validate the significance of selected major QTLs and pinpoint causal genes. Our results provide valuable insights into the molecular mechanisms of curd development and offer candidate genes for improving curd features in cauliflower.
Materials and methods
Plant materials
Cauliflower inbred lines used in this study were collected and stored at the Tianjin Kernel Vegetable Research Institute, Tianjin Academy of Agricultural Sciences (Supplementary Table 1). A total of 298 cauliflower inbred lines were studied, comprising 167 loose-curd accessions and 131 compact-curd accessions. For outgroups, resequencing data for four accessions, including two wild accessions (Brassica cretica) and two Chinese kale accessions (Brassica oleracea var. alboglabra), were downloaded from the NCBI SRA database (Supplementary Table 2).
For quantitative real-time reverse transcription PCR (qRT-PCR) experiments, we used two distinct cauliflower inbred lines, representing the loose-curd and compact-curd groups, with LOB lengths of 3.568 and 8.336 cm, to analyze candidate gene expression. Sampling was performed at multiple stages of curd development, namely the curd formation, curd elongation, and curd maturation stages (Guo et al. 2021). At each stage, the fleshy stem portion of the curd was excised, and RNA was extracted for subsequent qRT-PCR analysis.
Phenotyping
All 298 cauliflower inbred lines were planted in Tangshan, Hebei Province, China (2019) and at the Baodi Biological Center, Tianjin, China (2020). All accessions were planted in an experimental field with an arrangement-order design, including five replicates. Each plot consisted of one row 5 m in length with 9–10 plants per row, approximately 50 cm between plants within each row, and 55 cm between rows. Cauliflowers were sown in mid-to-late June and harvested in mid-to-late October. LOB and LSB were measured using a Vernier caliper at the physiological maturity stage of the curd.
SNP calling
The whole-genome resequencing raw data of the 298 cauliflower accessions were generated previously in our laboratory (Chen et al. 2024) and have been deposited in the NCBI SRA database (PRJNA794342).
For data preprocessing, Fastp (v0.12.4) and FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) programs were employed for read cleaning (Chen et al. 2018). Clean reads were subsequently aligned to the reference “C-8” cauliflower genome (NGDCGWHBKKZ00000000) using BWA aln (v0.7.10-r789) (Li and Durbin 2009). Samtools (v1.14) (Li et al. 2009) was used for converting the format of SAM files, sorting BAM files, and filtering mapping quality with the “-q 30” parameter. The Genome Analysis Toolkit (GATK, v4.1.4.1) (McKenna et al. 2010) was utilized for processing SAM files and calling SNPs. To exclude SNP-calling errors resulting from incorrect mapping, only high-quality SNPs were retained based on the following criteria: depth of coverage ≥ 4 (approximately one-third of the average depth), mapping quality score ≥ 20, missing sample ratio within the population ≤ 20%, and minor allele frequency (MAF) ≥ 0.05.
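The four retention criteria listed above can be illustrated with a minimal Python sketch. The record layout (a dict with `depth`, `mapq`, and a list of genotype strings) and the function name `passes_filters` are hypothetical and chosen only for illustration; the study itself performed filtering with GATK and related tools, and how depth is summarized per site is an assumption here.

```python
from collections import Counter


def passes_filters(site, min_depth=4, min_mapq=20, max_missing=0.20, min_maf=0.05):
    """Return True if a biallelic SNP site meets the four thresholds.

    `site` is a hypothetical record:
    {"depth": float, "mapq": float, "genotypes": ["0/0", "0/1", "1/1", "./.", ...]}
    """
    calls = site["genotypes"]
    called = [g for g in calls if g != "./."]
    missing_ratio = 1 - len(called) / len(calls)

    # Depth of coverage and mapping quality thresholds.
    if site["depth"] < min_depth or site["mapq"] < min_mapq:
        return False
    # Missing sample ratio within the population.
    if missing_ratio > max_missing or not called:
        return False
    # Minor allele frequency, computed from called genotypes only.
    alleles = Counter(a for g in called for a in g.split("/"))
    maf = min(alleles["0"], alleles["1"]) / sum(alleles.values())
    return maf >= min_maf


# Toy example: 10 samples, no missing calls, alternate allele frequency 0.2.
example = {"depth": 13.5, "mapq": 40, "genotypes": ["0/0"] * 8 + ["1/1"] * 2}
print(passes_filters(example))
```

Computing the minor allele frequency from called genotypes only is a design choice of this sketch; other tools may define it differently.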
Phylogenetic and population structure analysis
To clarify phylogenetic relationships from a genome-wide perspective, an individual-based neighbor-joining (NJ) tree was constructed with 100 bootstraps using PHYLIP (v3.697) (Felsenstein 1993), and the tree layout was generated using the online tool iTOL (https://itol.embl.de/). Principal component analysis (PCA) was performed on the filtered SNPs using GCTA (v1.94.0beta) (Yang et al. 2011). The population structure was analyzed with the cluster number K ranging from 1 to 20 by ADMIXTURE (v1.3.0) (Alexander et al. 2009) using the default fivefold cross-validation (--cv=5). Each K was run with 20 replicates, and the outputs were visualized using the function plotQ() of the R package pophelper (Francis 2017).
GWAS analysis
The R package lme4 and the mixed model y = Xβ + Zu + ε were used to calculate the best linear unbiased estimate (BLUE) of each line in each correlation-based group, where y is the vector of phenotypic observations, β the vector of fixed effects, u the vector of random effects, and ε the residual. Genotype was defined as the fixed effect, while environment was considered as the random effect. The BLUEs calculated for each phenotype group were utilized for GWAS and for developing genomic prediction models.
In total, 729,691 SNPs (MAF > 0.05; Quality ≥ 20; GQ ≥ 5; missing rate ≤ 0.2; depth ≥ 4) were used for the two traits in this study. Association analysis was performed using the MLM method implemented in rMVP (Yin et al. 2021). The genome-wide significance threshold (1.37E-06) was determined using a uniform threshold of 1/n, where n is the effective number of independent SNPs and SVs calculated using the Genetic type 1 Error Calculator (v.0.2) (Li et al. 2012). QTLs were defined based on the positions of significant SNPs and the size of the LD interval. The LD decay of cauliflower populations was recently reported by our group to be approximately 27.9 kb (Chen et al. 2024). To ensure that key candidate genes were not missed, the total QTL size was defined as 200 kb, comprising 100 kb upstream and 100 kb downstream of each associated SNP, which is larger than the actual LD interval (27.9 kb) (Chen et al. 2024; Luo et al. 2021; Wang et al. 2019).
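The threshold and QTL window definitions above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical. Using the total SNP count as n is an assumption made only for illustration (the study derives n with the Genetic type 1 Error Calculator), although it does round to the reported threshold.

```python
def significance_threshold(n_effective):
    """Uniform genome-wide threshold of 1/n."""
    return 1.0 / n_effective


def qtl_window(position, flank=100_000):
    """Return the (start, end) interval spanning `flank` bp on each side of a SNP."""
    return max(1, position - flank), position + flank


# Assumption: n is set to the total SNP count purely for illustration.
print(f"{significance_threshold(729_691):.2e}")  # 1.37e-06

# Window around the peak SNP reported on chromosome 8.
print(qtl_window(11_387_967))  # (11287967, 11487967)
```

The second call reproduces the candidate interval Chr8:11,287,967–11,487,967 that is reported for the shared peak SNP.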
FST and XP-EHH analysis and selective sweep detection
Two methods, the linkage disequilibrium-based cross-population extended haplotype homozygosity (XP-EHH) (Sabeti et al. 2007) and the population differentiation-based FST (Weir and Cockerham 1984), were used to detect signatures of selection in cauliflower. FST values were calculated using VCFtools v0.1.17 (Danecek et al. 2011) with a window size of 100 kb and a step size of 10 kb. Windows with the top 1% of FST values were considered candidate regions under a selective sweep. The XP-EHH values were calculated using the REHH v3.2.2 R package (Gautier et al. 2017) to locate regions under positive selection by applying a threshold of −log10(p-value) > 4.0. Gene Ontology (GO) enrichment analysis was performed using the topGO package (Alexa and Rahnenführer 2009), and the ggplot2 package was used for visualization (Villanueva and Chen 2019).
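The windowing scheme and top 1% cut-off described above can be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, the per-window FST values would come from VCFtools rather than from this code, and the exact window boundaries and tie handling used by VCFtools are not reproduced here.

```python
def sliding_windows(chrom_length, window=100_000, step=10_000):
    """Yield 1-based (start, end) windows of `window` bp, advancing by `step` bp."""
    start = 1
    while start <= chrom_length:
        yield start, min(start + window - 1, chrom_length)
        start += step


def top_fraction_cutoff(values, fraction=0.01):
    """Return the smallest value that still belongs to the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return ranked[n_top - 1]


def candidate_windows(window_fst, fraction=0.01):
    """Select windows whose FST is in the top `fraction`.

    `window_fst` is a list of (window, fst) pairs.
    """
    cutoff = top_fraction_cutoff([fst for _, fst in window_fst], fraction)
    return [window for window, fst in window_fst if fst >= cutoff]


# Toy example with 200 windows carrying made-up FST values 0.000 to 0.199.
toy = [(i, i / 1000) for i in range(200)]
print(candidate_windows(toy))  # [198, 199]
```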
Gene annotation and haplotype analysis
Candidate gene annotation was conducted using the TBtools software platform (Chen et al. 2020; Chen and Xia 2022), with subsequent alignment of all candidate gene protein sequences against the UniProt databases. In-depth analyses were then carried out for each candidate gene and their corresponding overlapping genomic regions. Within this context, haplotypes were extracted from a cohort of sixty samples, comprising 30 individuals each exhibiting extreme phenotypic traits on either end of the spectrum. The R package pegas v0.12 (Paradis 2010) was employed for haplotype analysis in our work, which had also been used in previous studies (Guo et al. 2020; Zhou et al. 2020). Samples or genes featuring inadequate site coverage or poor data quality were excluded from the HapMap graph construction process, thereby ensuring the integrity and fidelity of the data representation for specific genes.
RNA extraction and expression analysis
Total messenger RNAs were extracted using an RNA isolation kit (Thermo Scientific), and genomic DNA was removed using DNase I (Invitrogen). Reverse transcription was performed using PrimeScript™ IV 1st strand cDNA Synthesis Mix (Takara) according to the manufacturer’s protocol. qRT-PCR was performed using TB Green® Premix Ex Taq™ II (Tli RNaseH Plus) (Takara) on a LightCycler 480 II instrument (Roche). The analysis was conducted using three biological and three technical replicates. Relative expression levels of each gene were calculated using the 2^(−ΔΔCt) method. Actin was used as the reference gene, and the primers used are listed in Supplementary Table 12.
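For readers unfamiliar with the 2^(−ΔΔCt) calculation, a minimal Python sketch follows. The function and argument names are hypothetical, and the Ct values in the example are invented for illustration; they are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target gene) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Invented Ct values: the target gene crosses the threshold two cycles earlier
# in the sample than in the control, relative to the reference gene (e.g. Actin).
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

With three biological and three technical replicates, Ct values would typically be averaged over technical replicates before this calculation; that averaging step is an assumption and is not shown.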
A comprehensive collection of 48 tissue-specific RNA-seq datasets for B. oleracea was downloaded from the SRA database and the National Genomics Data Center database (PRJNA516113, PRJNA546441, and PRJCA027857). The datasets encompassed a variety of organs, including the root, leaf, flower and curd. Moreover, transcriptome data detailing different stages of curd development were also acquired from the SRA database (PRJNA546441).
For preprocessing, we employed the Fastq-dump tool (v2.11.2) from the SRA Toolkit and the Fastp program (v0.12.4) (Chen et al. 2018) for format conversion and read cleaning. HISAT2 (v2.2.1) (Kim et al. 2019) and Cufflinks suite (v2.2.1) (Trapnell et al. 2012) were used to compute the fragments per kilobase of transcript per million mapped reads (FPKM) values for each gene. For visualization, a heatmap was generated using the pheatmap R package (v1.0.12) (Kolde 2019), which incorporated log2 (FPKM + 1) values of the selected genes.
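The log2(FPKM + 1) transformation used for the heatmap, together with a simple low-expression filter, can be sketched as below. The gene names, FPKM values, cut-off value, and the rule "keep a gene if it reaches the cut-off in at least one tissue" are all assumptions for illustration, not the exact procedure of this study.

```python
import math


def log2_fpkm(fpkm):
    """Transform an FPKM value to log2(FPKM + 1); the +1 keeps zero values defined."""
    return math.log2(fpkm + 1)


def expressed_genes(fpkm_by_gene, cutoff=3.0):
    """Return genes whose log2(FPKM + 1) reaches `cutoff` in at least one tissue."""
    return [
        gene
        for gene, values in fpkm_by_gene.items()
        if any(log2_fpkm(v) >= cutoff for v in values)
    ]


# Hypothetical genes with made-up FPKM values across two tissues.
toy = {"geneA": [0.0, 1.0], "geneB": [7.0, 20.0]}
print(expressed_genes(toy))  # ['geneB']
```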
Statistical analysis
Comparisons of phenotypic data and gene expression levels were conducted by unpaired two-tailed Student’s t tests using the rstatix package in R version 4.2.1 (http://www.r-project.org/). Coefficients of variation, means, and standard deviations were calculated using the raster package (Hijmans et al. 2015) and R (v4.2.1), respectively. Genomic heritability (h2) was estimated using the GCTA approach (Yang et al. 2011).
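As a small illustration of the coefficient of variation reported for the phenotypes, the following Python sketch computes CV as the standard deviation divided by the mean, expressed as a percentage. Using the sample standard deviation is an assumption of this sketch, and the measurements are made-up numbers, not data from this study.

```python
import statistics


def coefficient_of_variation(values):
    """Return the coefficient of variation (%) using the sample standard deviation."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


# Made-up branch length measurements (cm) for illustration only.
lengths = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(lengths), 2))
```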
Results
Phenotypic analysis
Field observations and measurements of the LOB and LSB traits were conducted on 298 cauliflower inbred lines for two consecutive years (2019 and 2020) at different locations. Both traits varied widely among the lines and followed an approximately normal distribution (Fig. 1a, b). The LOB values ranged from 2.316 to 9.657 cm, with a mean of 4.406 cm, whereas the LSB values ranged from 1.172 to 3.910 cm, with a mean of 2.073 cm (Supplementary Table 1). The phenotypic coefficient of variation (PCV) for LOB was 32.63% in Hebei (2019) and 25.41% in Tianjin (2020). The corresponding values for LSB were 23.09% and 19.28%, respectively.
Genomic heritability (h2) and genomic correlation between LOB and LSB were assessed utilizing the whole data of 298 cauliflower inbred lines. As a result, LOB exhibited a higher heritability (h2 = 0.84) in contrast to LSB (h2 = 0.64). In addition, our results revealed a significant genetic correlation of 0.808 between LOB and LSB traits. Moreover, there was a notably strong positive correlation between LOB and LSB, with a correlation coefficient (R) of 0.700 (Fig. 1c). The correlations observed between the two traits imply a shared genetic basis due to pleiotropic effects influencing their expression. These pieces of evidence collectively indicate a strong potential for identifying common candidate genes through GWAS.
Genomic variants and phylogenetic relationships analysis
The WGRS of 298 cauliflower accessions generated a total of 2.3 terabases (Tb) of raw reads. After removing low-quality and adapter sequences, an average coverage of about 13.5-fold was obtained for each accession, and these data were subsequently used for SNP calling. The average mapping rate of the cleaned reads to the C-8 (V2) reference genome was approximately 91% (Supplementary Table 2). A total of 729,691 SNPs were retained after filtering and used for downstream analyses. These SNPs had a missing rate of no more than 20% and a minor allele frequency (MAF) of ≥ 0.05.
Phylogenetic tree construction based on the neighbor-joining (NJ) method (rooted on the wild species Brassica cretica), together with admixture modeling and PCA, was performed to determine the phylogenetic relationships among the accessions. The results support a classification into two groups that could be further divided into four subgroups according to the SCC trait (Fig. 1d, e and Supplementary Fig. 2). According to the phylogenetic tree, from distal to proximal, the four subgroups are CCA (compact-curd type, group A), LCA (loose-curd type, group A), CCB (compact-curd type, group B), and LCB (loose-curd type, group B), consisting of 58, 102, 73, and 65 cauliflower accessions, respectively. PCA results also show that the CCA group is closest to kale and the wild species. Overall, CCA and CCB accessions are relatively scattered, while LCA and LCB accessions are tightly clustered, reflecting a narrow genetic background for loose-curd cauliflowers, which might have undergone strong bottlenecks during their evolutionary history (Fig. 1d). Notably, the nested structure of the phylogenetic tree indicates that the evolution of curd solidity did not follow a one-way pattern and might be more complex than previously expected (Fig. 1e). Moreover, model-based clustering analysis also demonstrates frequent genetic exchanges between compact-curd and loose-curd accessions, suggesting interlaced relationships due to crossbreeding events that occurred during curd differentiation (Fig. 1f).
GWAS analysis
A total of 729,691 SNPs were subjected to GWAS analysis using rMVP with a mixed linear model (MLM). As a result, sixty-four candidate loci were identified that exceeded the significance threshold (1.37E−06) (Fig. 2a). Of these statistically significant SNPs, fifty-four SNPs on chromosomes 1 and 8 were associated with LOB. Regarding LSB, eleven SNPs were identified on three chromosomes (2, 3 and 8) (Supplementary Table 3). To collect as many potential candidate genes as possible, we identified all protein-coding genes located within a 200 kb window around the statistically significant loci. A total of 262 candidate genes were identified for LOB and 179 candidate genes for LSB (Supplementary Table 4). Notably, the major signal (Chr8:11,387,967) was significantly associated with both LOB and LSB traits (Fig. 2a), suggesting that these two curd branch length traits might have undergone the same selection process during curd improvement in cauliflower. A total of 27 genes within the candidate interval (Chr8:11,287,967–11,487,967) of the shared significant SNP were identified as the most promising candidate genes associated simultaneously with LOB and LSB traits.
Analysis of the phenotypic variation explained (PVE) by the peak SNP (Chr8:11,387,967) showed that accessions carrying the alternate AA genotype exhibited higher LOB and LSB values than those carrying the reference GG genotype (Supplementary Fig. 3). Notably, the frequency of the AA genotype increased from compact-curd to loose-curd accessions, suggesting that this variant might be the causal site regulating curd branch length and could be used as a molecular marker for CBL.
Selective sweep signals during curd improvement
CBL, as an important trait, has been continuously selected and optimized during the domestication of cauliflower. To further investigate the genomic factors influencing this trait, FST and XP-EHH methods were used to compare selection throughout the genome using the cauliflower germplasm diversity panel. The FST method was employed to conduct a systematic survey for candidate selective sweeps, with windows in the top 1% of FST values taken as selected regions (Fig. 2b). This analysis resulted in the identification of a total of 297 selective regions across the genome, with a total length of 29.69 Mb that accounted for approximately 5.22% of the genome (Supplementary Table 5). Within these regions, 1,003 candidate genes were discovered with positive associations (Supplementary Table 6). Furthermore, the haplotype-based method XP-EHH was used to explore candidate selective sweeps, with windows exceeding the threshold of −log10(p-value) > 4.0 considered as selected regions (Fig. 2c and Supplementary Table 7). As a result, 140 selective regions and 1,397 protein-coding candidate genes were identified with an overall length of 13.97 Mb, accounting for approximately 2.45% of the entire genome (Supplementary Table 8). Comparative analysis of the FST and XP-EHH results led to the identification of 69 overlapping regions and 53 candidate genes, which were highlighted for subsequent analysis (Supplementary Table 9 and Supplementary Table 10). To elucidate the potential functions of these candidate genes, gene ontology (GO) enrichment analysis was performed, leading to the identification of enriched GO terms related to flower morphogenesis (GO:0048439), bract morphogenesis (GO:0010433), bract formation (GO:0010434), UDP-glucuronate metabolic process (GO:0046398), and response to singlet oxygen (GO:0000304) (Supplementary Fig. 4). These results offered important clues for dissecting the regulatory mechanisms of curd branch development in cauliflower.
Inferring causal genes controlling CBL
To further explore candidate genes associated with CBL, we conducted haplotype analysis on the common interval of the GWAS, FST, and XP-EHH results (79 kb, Chr8:11,327,064–11,399,048) (Fig. 3a). Three major haplotypes were characterized, accounting for 66.67% (LL haplotype), 8.33% (ML haplotype), and 25% (SL haplotype) of samples, respectively. Notably, about 76% of long CBL accessions (longer than 6 cm) carry the LL haplotype, whereas short CBL accessions (shorter than 3 cm) belonged mainly to the SL haplotype. In this overlapping region, 11 protein-coding genes were annotated (Fig. 2d). Of these, five genes were either not expressed or expressed at very low levels (log2(FPKM + 1) < 3) in different organs of cauliflower and were therefore excluded (Fig. 3b). The expression levels of the remaining six genes were analyzed using qRT-PCR (Supplementary Table 11). The results indicated that BOB08G028680 exhibited significantly variable expression levels at different stages of curd development (Fig. 3c). Protein sequence homology and phylogenetic analysis revealed that BOB08G028680 encodes a homolog of response regulator 9 (ARR9, AT3G57040), which belongs to the type-A ARR family in Arabidopsis (Fig. 3d). ARR9 acts as a negative regulator of cytokinin signaling (To et al. 2004). This finding leads to the hypothesis that BOB08G028680 is a strong candidate gene that may play a crucial role in controlling the CBL trait through the cytokinin signaling pathway in cauliflower. However, validating this hypothesis requires more experimental evidence, and additional functional validation is needed to clarify its biological function and significance.
The variants located in the coding regions of BOB08G028680 were further analyzed to predict their putative effects on protein-coding. A total of four SNP sites resulted in missense mutations (Fig. 4). Two SNP sites (Chr8:11,401,901 and Chr8:11,402,203) in the first exon and two SNP sites (Chr8:11,402,788 and Chr8:11,402,790) in the third exon led to four amino acid changes (from Ser to Asn, His to Asp, Gln to Glu, and Asp to Ala, respectively) (Fig. 4 and Supplementary Fig. 5), which could cause defects in the protein function of the candidate gene. Overall, our findings showed that natural variation in BOB08G028680 influences LOB and LSB traits, suggesting that manipulating this gene may improve the branch length of cauliflower curd.
Discussion
The initiation and development of curds in cauliflower exhibit intricate characteristics, often demonstrating complex quantitative inheritance patterns (Rakshita et al. 2021). Understanding the genetic foundations of curd development is essential for cauliflower improvement. Association mapping is a powerful approach for identifying QTLs and candidate genes associated with agronomic traits (Jaganathan et al. 2020). To date, the application of association studies focusing on curd traits in cauliflower is rare (Hasan et al. 2016; Matschegewski et al. 2015; Rakshita et al. 2021; Rosen et al. 2018; Zhang et al. 2011, 2023; Zhao et al. 2020a, 2020b). Previous GWAS studies in cauliflower have been limited by small population sizes, narrow genetic variation, or a low number of markers (Matschegewski et al. 2015; Rakshita et al. 2021; Zhang et al. 2023). Low marker density has been proved to be adequate for the identification of genomic regions or QTLs associated with specific traits. However, it was insufficient for accurately identifying the exact causal gene responsible for a QTL. In this study, the utilization of a high marker density (729,691 SNPs) and a large population size (298 accessions, including loose-curd and compact-curd cauliflowers) facilitated the successful identification of QTLs and candidate genes associated with CBL traits. Additionally, the integration of multiple strategies, including selective sweep analysis, transcriptome profiling, and haplotype analysis, has contributed considerably to the success of QTL mapping and gene mining. Moreover, it is an effective way to simultaneously survey multiple highly associated traits for mapping QTLs.
Our GWAS results revealed a total of sixty-four highly associated SNPs that were distributed across four chromosomes. One significant SNP (Chr8:11,387,967) was shared between LOB and LSB traits, which could be regarded as repeated validation. In previous studies, Zhao et al. (2020a) detected two QTLs controlling LOB trait (C6: 30,128,973–34,650,103 and C6: 7,318,888–9,408,197). Rakshita et al. (2021) detected twelve QTLs for traits associated with regulation of curd formation and development, two of which, namely C1:33,480,103 and C3:42,391,757, were associated with curd length. Differing from these previously reported loci, our study discovered new QTLs that are associated with CBL traits relevant to the solidity of cauliflower curd. Furthermore, the identified candidate genes show significant potential for cauliflower germplasm innovation. These significant SNPs within the QTLs can be utilized as credible markers to facilitate marker-assisted selection in cauliflower breeding.
Shoot branching determines plant architecture by affecting the number, length, angle, and position of branches, which are controlled by the activity of axillary and floral meristems (Barbier et al. 2019; Rameau et al. 2015). Various studies have revealed that auxin, cytokinins, and strigolactones play crucial roles in regulating shoot branching. Auxin and strigolactones act as inhibitors of bud outgrowth, while cytokinins act as promoters of bud outgrowth (Domagalska and Leyser 2011; Müller and Leyser 2011; Werner et al. 2003). Arabidopsis type-A response regulators (ARR3, ARR4, ARR5, ARR6, ARR8 and ARR9) have overlapping functions and act as negative regulators in the cytokinin signaling pathway (Kiba et al. 2003; Kushwah and Laxmi 2014; Rashotte et al. 2003; To et al. 2004; Zhang et al. 2011). SlRR6, a type-A response regulator, positively regulates plant height in tomato. Knockout of SlRR6 reduced tomato plant height by shortening internode length, while overexpression of SlRR6 resulted in taller plants due to increased internode number (Liu et al. 2023). Zhao et al. (2023) showed through ectopic expression in tomato that MdRR9 plays a positive role in controlling branch development. Herein, we identified a homolog of ARR9 as the likely causal gene responsible for both LOB and LSB traits using multiple fine-mapping strategies. This gene might regulate curd branch development in cauliflower by interacting with proteins in the cytokinin signaling pathway. However, the exact molecular mechanisms underlying this candidate gene and its regulatory role in the cytokinin pathway require further investigation. The candidate gene and genetic variations identified in this study could serve as a valuable foundation for germplasm innovation to improve curd solidity in cauliflower.
In conclusion, we integrated GWAS, selective sweep analysis, transcriptome profiling, and haplotype analysis to comprehensively investigate the causal loci and candidate genes for the LOB and LSB traits. Our results provide insights into the genetic basis of the CBL trait in cauliflower. The identification of candidate genes and natural variations contributes to a better understanding of the molecular mechanisms underlying curd development. These findings pave the way for subsequent functional validation and provide genetic resources for future cauliflower breeding.
Data availability
The transcriptome sequencing raw data of loose-curd cauliflower SV105 have been deposited in the National Genomics Data Center database under a BioProject accession PRJCA027857.
Code availability
All code is available from the corresponding authors upon request.
References
Alexa A, Rahnenführer J (2009) Gene set enrichment analysis with topGO. Bioconductor Improv 27:1–26
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664
Barbier FF, Dun EA, Kerr SC, Chabikwa TG, Beveridge CA (2019) An update on the signals controlling shoot branching. Trends Plant Sci 24:220–236
Borovsky Y, Monsonego N, Mohan V, Shabtai S, Kamara I, Faigenboim A, Hill T, Chen S, Stoffel K, Van Deynze A (2019) The zinc-finger transcription factor CcLOL1 controls chloroplast development and immature pepper fruit color in Capsicum chinense and its function is conserved in tomato. Plant J 99:41–55
Chen C, Xia R (2022) Interactive data analyses using TBtools. Integrative bioinformatics: history and future. Springer, Singapore
Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890
Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, Xia R (2020) TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant 13:1194–1202
Chen R, Chen K, Yao X, Zhang X, Yang Y, Su X, Lyu M, Wang Q, Zhang G, Wang M (2024) Genomic analyses reveal the stepwise domestication and genetic mechanism of curd biogenesis in cauliflower. Nat Genet 1–10
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group (2011) The variant call format and VCFtools. Bioinformatics 27:2156–2158
Domagalska MA, Leyser O (2011) Signal integration in the control of shoot branching. Nat Rev Mol Cell Bio 12:211–221
Du X, Huang G, He S, Yang Z, Sun G, Ma X, Li N, Zhang X, Sun J, Liu M (2018) Resequencing of 243 diploid cotton accessions based on an updated A genome identifies the genetic basis of key agronomic traits. Nat Genet 50:796–802
Felsenstein J (1993) PHYLIP (phylogeny inference package), version 3.5c. Joseph Felsenstein
Francis RM (2017) pophelper: an R package and web app to analyse and visualize population structure. Mol Ecol Resour 17:27–32
Francisco FR, Aono AH, Da Silva CC, Gonçalves PS, Scaloppi Junior EJ, Le Guen V, Fritsche-Neto R, Souza LM, Souza APD (2021) Unravelling rubber tree growth by integrating GWAS and biological network-based approaches. Front Plant Sci 12:768589
Gautier M, Klassmann A, Vitalis R (2017) rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure. Mol Ecol Resour 17:78–90
Guo W, Xin M, Wang Z, Yao Y, Hu Z, Song W, Yu K, Chen Y, Wang X, Guan P, Appels R, Peng H, Ni Z, Sun Q (2020) Origin and adaptation to high altitude of Tibetan semi-wild wheat. Nat Commun 11:5085
Guo N, Wang S, Gao L, Liu Y, Wang X, Lai E, Duan M, Wang G, Li J, Yang M, Zong M, Han S, Pei Y, Borm T, Sun H, Miao L, Liu D, Yu F, Zhang W, Ji H, Zhu C, Xu Y, Bonnema G, Li J, Fei Z, Liu F (2021) Genome sequencing sheds light on the contribution of structural variants to Brassica oleracea diversification. BMC Biol 19:93
Hao Z, Geng M, Hao Y, Zhang Y, Zhang L, Wen S, Wang R, Liu G (2019) Screening for differential expression of genes for resistance to Sitodiplosis mosellana in bread wheat via BSR-seq analysis. Theor Appl Genet 132:3201–3221
Hasan Y, Briggs W, Matschegewski C, Ordon F, Stützel H, Zetzsche H, Groen S, Uptmoor R (2016) Quantitative trait loci controlling leaf appearance and curd initiation of cauliflower in relation to temperature. Theor Appl Genet 129:1273–1288
Hijmans RJ, Van Etten J, Cheng J, Mattiuzzi M, Sumner M, Greenberg JA, Lamigueiro OP, Bevan A, Racine EB, Shortridge A (2015) Package ‘raster.’ R Package 734:473
Jaganathan D, Bohra A, Thudi M, Varshney RK (2020) Fine mapping and gene cloning in the post-NGS era: advances and prospects. Theor Appl Genet 133:1791–1810
Kiba T, Yamada H, Sato S, Kato T, Tabata S, Yamashino T, Mizuno T (2003) The type-A response regulator, ARR15, acts as a negative regulator in the cytokinin-mediated signal transduction in Arabidopsis thaliana. Plant Cell Physiol 44:868–874
Kim D, Paggi JM, Park C, Bennett C, Salzberg SL (2019) Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37:907–915
Kolde R (2019) pheatmap: Pretty Heatmaps. R Package Version 1:12
Kumar V, Singh DK, Panchbhaiya A, Singh N (2017) Correlation and path coefficient analysis studies in mid-season cauliflower (Brassica oleracea var. botrytis L.). J Pharm Phytochem 6:1130–1137
Kushwah S, Laxmi A (2014) The interaction between glucose and cytokinin signal transduction pathway in Arabidopsis thaliana. Plant Cell Environ 37:235–253
Lan TH, Paterson AH (2000) Comparative mapping of quantitative trait loci sculpting the curd of Brassica oleracea. Genetics 155:1927–1954
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
Li MX, Yeung JM, Cherny SS, Sham PC (2012) Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum Genet 131:747–756
Li K, Wang J, Kuang L, Tian Z, Wang X, Dun X, Tu J, Wang H (2021) Genome-wide association study and transcriptome analysis reveal key genes affecting root growth dynamics in rapeseed. Biotechnol Biofuels 14:1–20
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
Liu Y, Liu Y, He Y, Yan Y, Yu X, Ali M, Pan C, Lu G (2023) Cytokinin-inducible response regulator SlRR6 controls plant height through gibberellin and auxin pathways in tomato. J Exp Bot 74:4471–4488
Luo M, Zhang Y, Li J, Zhang P, Chen K, Song W, Wang X, Yang J, Lu X, Lu B (2021) Molecular dissection of maize seedling salt tolerance using a genome-wide association analysis method. Plant Biotechnol J 19:1937–1951
Ma L, Zhang M, Chen J, Qing C, He S, Zou C, Yuan G, Yang C, Peng H, Pan G (2021) GWAS and WGCNA uncover hub genes controlling salt tolerance in maize (Zea mays L.) seedlings. Theor Appl Genet 134:3305–3318
Matschegewski C, Zetzsche H, Hasan Y, Leibeguth L, Briggs W, Ordon F, Uptmoor R (2015) Genetic variation of temperature-regulated curd induction in cauliflower: elucidation of floral transition by genome-wide association mapping and gene expression analysis. Front Plant Sci 6:720
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
Müller D, Leyser O (2011) Auxin, cytokinin and the control of shoot branching. Ann Bot-London 107:1203–1212
Nieuwhof M, Garretsen F (1961) The solidity of the cauliflower curd. Euphytica 10:301–306
Ou C, Zhang X, Wang F, Zhang L, Zhang Y, Fang M, Wang J, Wang J, Jiang S, Zhang Z (2020) A 14 nucleotide deletion mutation in the coding region of the PpBBX24 gene is associated with the red skin of “Zaosu Red” pear (Pyrus pyrifolia White Pear Group): a deletion in the PpBBX24 gene is associated with the red skin of pear. Hortic Res-England 7
Paradis E (2010) pegas: an R package for population genetics with an integrated-modular approach. Bioinformatics 26:419–420
Rakshita K, Singh S, Verma VK, Sharma BB, Saini N, Iquebal MA, Behera T (2021) Understanding population structure and detection of QTLs for curding-related traits in Indian cauliflower by genotyping by sequencing analysis. Funct Integr Genomic 21:679–693
Rameau C, Bertheloot J, Leduc N, Andrieu B, Foucher F, Sakr S (2015) Multiple pathways regulate shoot branching. Front Plant Sci 5:741
Rana N, Sharma A, Kumari V, Lata H, Kaur M, Thakur A (2024) Assessment of genetic variability and character association in mid-late/late cauliflower genotypes. Electron J Plant Breed 15:70–79
Rashotte AM, Carson SD, To JP, Kieber JJ (2003) Expression profiling of cytokinin action in Arabidopsis. Plant Physiol 132:1998–2011
Rosen A, Hasan Y, Briggs W, Uptmoor R (2018) Genome-based prediction of time to curd induction in cauliflower. Front Plant Sci 9:256209
Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449:913–918
Sharma S, Singh Y, Sekhon BS, Verma A (2017) Association studies for yield and some yield contributing morphophysiological components in cauliflower (Brassica oleracea L. var. botrytis L.). Electron J Plant Breed 8:718–723
Sharma S, Singh Y, Sharma S (2018) Studies on Mean Performance for Yield and its Contributing Traits in Cauliflower (Brassica oleracea var. botrytis L.) under mid hill conditions of northwestern Himalayas. Int J Curr Microbiol Appl Sci 7:3288–3296
Singh J, Sharma A, Sharma P, Kumar N (2023) Genetic Variability and Association Studies in Mid-late and Late Group of Cauliflower (Brassica oleracea L. var. botrytis). Indian J Plant Genet Resour 36:45–51
Su T, Li P, Wang H, Wang W, Zhao X, Yu Y, Zhang D, Yu S, Zhang F (2019) Natural variation in a calreticulin gene causes reduced resistance to Ca2+ deficiency-induced tipburn in Chinese cabbage (Brassica rapa ssp. pekinensis). Plant Cell Environ 42:3044–3060
To JP, Haberer G, Ferreira FJ, Deruere J, Mason MG, Schaller GE, Alonso JM, Ecker JR, Kieber JJ (2004) Type-A Arabidopsis response regulators are partially redundant negative regulators of cytokinin signaling. Plant Cell 16:658–671
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7:562–578
Villanueva RAM, Chen ZJ (2019) ggplot2: elegant graphics for data analysis
Wang X, Zhang R, Song W, Han L, Liu X, Sun X, Luo M, Chen K, Zhang Y, Yang H (2019) Dynamic plant height QTL revealed in maize through remote sensing phenotyping using a high-throughput unmanned aerial vehicle (UAV). Sci Rep-UK 9:3458
Wei Z, Yuan Q, Lin H, Li X, Zhang C, Gao H, Zhang B, He H, Liu T, Jie Z (2021) Linkage analysis, GWAS, transcriptome analysis to identify candidate genes for rice seedlings in response to high temperature stress. BMC Plant Biol 21:1–13
Weir BS, Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358–1370
Werner T, Motyka V, Laucou V, Smets R, Van Onckelen H, Schmülling T (2003) Cytokinin-deficient transgenic Arabidopsis plants show multiple developmental alterations indicating opposite functions of cytokinins in the regulation of shoot and root meristem activity. Plant Cell 15:2532–2550
Yan C, An G, Zhu T, Zhang W, Zhang L, Peng L, Chen J, Kuang H (2019) Independent activation of the BoMYB2 gene leading to purple traits in Brassica oleracea. Theor Appl Genet 132:895–906
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82
Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, Yuan X, Zhu M, Zhao S, Li X (2021) rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. Genom Proteom Bioinf 19:619–628
Zhang W, To JP, Cheng CY, Eric Schaller G, Kieber JJ (2011) Type-A response regulators are required for proper root apical meristem function through post-transcriptional regulation of PIN auxin efflux carriers. Plant J 68:1–10
Zhang X, Wen Z, Jiang H, Niu G, Liu L, Yao X, Sun D, Shan X (2023) Identification of loci for four important agronomic traits in loose-curd cauliflower based on genome-wide association studies. Horticulturae 9:970
Zhao Z, Gu H, Sheng X, Yu H, Wang J, Cao J (2012) Genetic purity testing of loose-curd cauliflower hybrids using SSR markers and grow out test. Seed Sci Technol 40:209–214
Zhao Z, Gu H, Wang J, Sheng X, Yu H (2013) Development and comparison of quantitative methods to evaluate the curd solidity of cauliflower. J Food Eng 119:477–482
Zhao ZQ, Sheng XG, Yu HF, Wang JS, Shen YS, Gu HH (2020a) Identification of QTLs associated with curd architecture in cauliflower. BMC Plant Biol 20:1–8
Zhao Z, Sheng X, Yu H, Wang J, Shen Y, Gu H (2020b) Identification of candidate genes involved in curd riceyness in cauliflower. Int J Mol Sci 21:1999
Zhao Q, Chen H, Zhang D, Ma J (2023) Ectopic expression of the apple cytokinin response regulator MdRR9 gene in tomatoes promotes shoot branching. Sci Hortic-Amsterdam 321:112228
Zhou Y, Zhao X, Li Y, Xu J, Bi A, Kang L, Xu D, Chen H, Wang Y, Wang YG, Liu S, Jiao C, Lu H, Wang J, Yin C, Jiao Y, Lu F (2020) Triticum population sequencing provides insights into wheat adaptation. Nat Genet 52:1412–1422
Funding
This study was funded by the Natural Science Foundation of Tianjin (22JCYBJC00190), the Innovation Research and Experiment Program for Youth Scholar of Tianjin Academy of Agricultural Sciences (2022017), the National Natural Science Foundation of China (32302579), the Modern Agro-Industry Technology Research System of China (CARS-23-A-04), the “131” innovative team construction project of Tianjin (201923) and the Natural Science Foundation of Tianjin (23JCQNJC01040). The work was performed in the State Key Laboratory of Vegetable Biobreeding, Tianjin Academy of Agricultural Sciences, Tianjin 300192, China.
Author information
Contributions
YY, RC, XY, XG and DS designed studies and initiated this project. DS, XY, WC, JW and WZ contributed to the collection of cauliflower accessions. XY, YY, YG and MH planted accessions, prepared the samples and performed phenotyping. YY, ML, YG and QW performed the bioinformatic analysis. YY, YG and WC designed and performed the molecular experiments. YY, RC and XG wrote and/or revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Communicated by Jacqueline Batley.
Supplementary Information
Below is the link to the electronic supplementary material.
Different types of cauliflower curd
Supplementary file1 (XLSX 227 KB)
Principal component analyses (PCA) of all the accessions based on SNP data. Scatter plot showing the distribution of cauliflower accessions in the first two principal components (PC1 and PC2)
Supplementary file2 (PDF 59602 KB)
The LOB (a) and LSB (b) are associated with SNP (Chr8:11387967) in 298 diverse inbred lines. Each box represents the median and interquartile range, and whiskers extend to maximum and minimum values. The genotype and number (N) of each allele are listed below each box
Supplementary file3 (PDF 861 KB)
GO enrichment analysis of 53 genes from overlapping regions of FST and XP-EHH analysis
Supplementary file4 (PDF 1403 KB)
Multiple sequence alignment of the gene region of BOB08G028680. Sequence alignment between long LOB and short LOB accessions. The red box indicates the protein changes caused by SNPs
Supplementary file5 (PDF 428 KB)
About this article
Cite this article
Yang, Y., Guo, Y., Wang, J. et al. Genome-wide association study and selective sweep analysis uncover candidate genes controlling curd branch length in cauliflower. Theor Appl Genet 137, 209 (2024). https://doi.org/10.1007/s00122-024-04719-5
DOI: https://doi.org/10.1007/s00122-024-04719-5