Introduction

Cotton is the most important source of natural textile fiber in the world. Upland cotton (Gossypium hirsutum L.) has high lint yield, accounting for more than 95% of world cotton production (Li et al. 2016; Hu et al. 2019). Given the evolving trade dynamics in the cotton industry and the push for mechanization, the cultivation of high-yield and high-quality cotton varieties has become paramount. Plant architecture plays a vital role in determining cotton yield and the effectiveness of mechanical harvesting (Dong et al. 2018). It includes various agronomic traits, such as plant height, the number and distribution of fruit branches, and bolls and leaves.

Plant height is a crucial crop trait that shapes overall architecture and thereby affects light interception, photosynthesis, and harvesting in various crop species (Teng et al. 2013). Breeders typically prefer plants that approach an optimal height, as excessively short plants tend to yield less than semi-dwarf varieties. A growing number of plant height-related genes have been identified and characterized, and several have been successfully cloned. Notably, recent research has demonstrated that down-regulation of GhPAS1 modulates architectural traits by regulating plant height and fruit branch angle (Wu et al. 2021). Overexpression of GhDREB1B results in a notable decrease in plant height, branch length, and branch angle, so regulating the expression of GhDREB1B offers a potential means to optimize plant architecture for dense planting (Ji et al. 2021). Despite these insights, the genetic and molecular mechanisms underlying cotton plant height remain unclear.

At present, cotton plant height is regulated chemically, which incurs additional expense and carries a risk of chemical contamination. Given the escalating demand for mechanical harvesting in China, breeding varieties with ideal plant height is the most dependable and viable alternative. However, plant height is inherited as a quantitative trait governed by multiple genes and is readily influenced by environmental factors. Numerous QTLs for cotton plant height have been reported using diverse genetic populations. For instance, Wu et al. (2022) employed QTL mapping in a recombinant inbred line (RIL) population across nine environments and three best linear unbiased predictions (BLUPs) to identify 60 plant height QTLs. Ma et al. (2019) utilized SLAF-seq technology to identify a stable QTL, qPH-Dt1-1, and a gene, GhPIN3, that regulates cotton plant height in an interspecific G. hirsutum × G. barbadense backcross inbred line (BIL) population. Su et al. (2018) identified eight SNPs associated with plant height and observed that plants in which Gh_D03G0922 was silenced exhibited increased plant height. In addition, qRT-PCR analysis revealed that the expression of Gh_D03G0922 was upregulated in the apical buds and young leaves of short and compact cotton varieties. Ye et al. (2023) conducted genome-wide association studies in three field conditions using unmanned aerial vehicles and identified two genetic loci, GhUBP15 and GhCUL1, associated with plant height. Nevertheless, few candidate genes associated with plant height have been identified within these QTL regions.

To provide additional information for breeding, the genetic basis of plant height in cotton must be investigated further. In this study, we mapped the plant height QTL of the mutant cotton genotype cz3 by BSA-seq and RNA-seq, locating the candidate gene region on chromosome D05. GhSOT may play an important role in regulating plant height development. This work provides a foundation for verifying the plant height function of candidate genes and, ultimately, for making use of ideal plant height resources.

Materials and methods

Plant materials and field experiments

A dwarf cotton mutant named cz3 was identified from EMS-mutagenized seeds of cotton germplasm resources. cz3 and CIR12 were used as female and male parents, respectively, to generate an F2 population of 600 individuals. cz3 (n = 48), CIR12 (n = 48), and F2 (n = 600) individuals were planted in the field at Anyang, China in 2021. Each row of field plots was 6 m long and 80 cm wide, with the seeds placed 20 cm apart within the row. The sowing depth was about 8 cm. Plants from the center of each plot were selected for plant height evaluation.

DNA extraction and construction of BSA library

Genomic DNA was extracted from the leaves of parental and F2 plants with the cetyltrimethylammonium bromide (CTAB) method and then purified by RNase A treatment. Two segregating pools, the dwarf bulk and the tall bulk, were created by combining equal amounts of DNA from 30 dwarf lines and 30 tall lines, respectively. DNA quality and concentration were assessed using a Beckman Coulter DU800 spectrophotometer, and the final DNA concentration was adjusted to 100 ng mL−1. Approximately 5 μg of DNA from the two segregating pools and the two parental lines was used to construct sequencing libraries.

RNA-seq

Total RNA was extracted from stem tissues of cz3 and CIR12 at the four-leaf stage, and RNA libraries were prepared and sequenced on the Illumina 2500 platform. Sequencing was performed by the Beijing Genomics Institute (BGI) in Shenzhen, China. Three biological replicates were used to ensure the reliability of the results. The TM-1 (AD1) genome NAU-NBI assembly v1.1 (Nanjing Agricultural University-Novogene Bioinformatics Technology) was used as the reference genome for mapping and analysis. Gene expression levels were calculated and normalized to fragments per kilobase per million mapped fragments (FPKM) values using StringTie (Zhao et al. 2021). Differentially expressed genes (DEGs) were identified with the DESeq2 package using the following thresholds: a fold change of more than two and an adjusted P-value of less than 0.01. Gene ontology enrichment was analyzed with DAVID (https://david.ncifcrf.gov). KEGG pathway enrichment was examined via the Novogene platform (https://magic.novogene.com).
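The FPKM normalization and the DEG thresholds described above reduce to simple arithmetic. The sketch below is illustrative only: StringTie and DESeq2 perform these steps internally, the function names are hypothetical, and treating "a fold change of more than two" as symmetric (up or down) is an assumption.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_deg(log2_fold_change, padj, min_fold=2.0, max_padj=0.01):
    """Thresholds from the text: fold change > 2 and adjusted P < 0.01.

    Assumption: the fold-change cutoff applies in both directions,
    i.e. |log2FC| > log2(2) = 1.
    """
    return abs(log2_fold_change) > math.log2(min_fold) and padj < max_padj


# Hypothetical numbers, for illustration only (not data from this study).
example_fpkm = fpkm(fragments=500, gene_length_bp=2000,
                    total_mapped_fragments=25_000_000)
example_calls = [is_deg(1.5, 0.001), is_deg(-1.2, 0.001), is_deg(0.8, 0.001)]
```

With these made-up inputs, a gene is called differentially expressed only when both the effect size and the adjusted P-value pass their cutoffs.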

LUC imaging assay

The promoter sequences proGhSOTcz3 and proGhSOTCIR12 were PCR-amplified using the specific primer sets listed in Table S1 and cloned into the p0800MINI vector. These constructs were transformed into A. tumefaciens strain GV3101 and subsequently infiltrated into N. benthamiana leaves. After infiltration, the plants were maintained under a 16 h light/8 h dark cycle for 2 days. Firefly LUC and Renilla luciferase (REN) activities were quantified using a dual-luciferase reporter assay system (Promega, Madison, WI, USA).
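Dual-luciferase readings are commonly summarized as the ratio of firefly LUC to REN activity, with REN acting as the internal control for infiltration efficiency. The following is a minimal sketch under that assumption; the text does not state how the ratios were computed, and the function names and readings are hypothetical.

```python
from statistics import mean


def relative_luc(luc, ren):
    """Firefly LUC activity normalized to the Renilla (REN) internal control."""
    if ren <= 0:
        raise ValueError("REN reading must be positive")
    return luc / ren


def mean_ratio(readings):
    """Average LUC/REN ratio over replicate (luc, ren) readings."""
    return mean(relative_luc(luc, ren) for luc, ren in readings)


# Hypothetical luminometer readings, for illustration only.
promoter_a = [(200.0, 100.0), (300.0, 100.0)]
promoter_b = [(100.0, 100.0), (150.0, 100.0)]
fold_difference = mean_ratio(promoter_a) / mean_ratio(promoter_b)
```

Comparing the mean ratios of two promoter constructs gives the relative strength of one promoter over the other.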

Gene cloning and plant transformation

For the virus-induced gene silencing (VIGS) experiment, DNA fragments of about 100–200 bp from each candidate gene were cloned into pTRV2 vectors, which were transformed into A. tumefaciens strain GV3101 to generate pTRV::candidate gene constructs for use in cz3. GV3101 cells carrying each of these plasmids were mixed with an equal volume of cells harboring pTRV1 and infiltrated into the cotyledons of 10-day-old cotton seedlings as previously described (Chen et al. 2021; Burch-Smith et al. 2004; Pang et al. 2013). The VIGS experiments were repeated three times, each with more than 10 plants. TRV::PDS was used as the positive control and the empty pTRV2 vector as the negative control. All plants were grown in a greenhouse at 28 °C/22 °C (day/night) under a 16 h/8 h light/dark photoperiod.

Scanning electron microscopy (SEM)

SEM was used to examine the stems of cz3 and CIR12 cotton plants, following the method of Challa et al. (2021) with slight modifications to suit the samples.

Subcellular localization of GhSOT

The full-length coding sequence (CDS) of GhSOT, excluding the stop codon, was amplified from CIR12 using specific primers (Table S1) and cloned into the pCAMBIA2300-GFP vector to generate a 35S::GhSOT-GFP fusion construct for transient expression. The recombinant plasmid and the control plasmid (35S::GFP) were each transferred into A. tumefaciens strain GV3101. The subcellular localization of GhSOT in living cells was analyzed in the leaves of 4-week-old N. benthamiana plants infiltrated with A. tumefaciens cells carrying the GhSOT construct. After 48 h, the epidermal cells of the infiltrated tobacco leaves were examined under an OLYMPUS FV1200 confocal microscope (Olympus, Tokyo, Japan) to determine the localization of the GhSOT-GFP fusion protein.

Real-time quantitative RT-PCR

Total RNA was extracted from cotton samples using a plant total RNA extraction kit (Tiangen Biotech, Beijing, China) following the manufacturer's instructions. First-strand cDNA was then synthesized using the PrimeScript RT reagent Kit with gDNA Eraser (Takara Biotech, Beijing, China) for qRT-PCR analysis of the expression levels of relevant genes. qRT-PCR was conducted using the MonAmp SYBR Green qPCR Mix Kit (Monad Biotech, Suzhou, China). Relative expression levels were calculated according to the 2^−ΔΔCT method (Livak and Schmittgen 2001). GhUBQ-7 was used as the internal control to normalize gene expression.
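For readers unfamiliar with the Livak and Schmittgen calculation, a minimal sketch follows. It assumes one target gene and one reference gene measured in a sample and in a calibrator (control); the function name and the CT values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCT method.

    dCT  = CT(target) - CT(reference), computed per condition
    ddCT = dCT(sample) - dCT(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical CT values, for illustration only: the target crosses the
# threshold two cycles earlier in the sample than in the control.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
```

A difference of two cycles after normalization corresponds to a fourfold difference in relative expression, assuming near-100% amplification efficiency.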

Statistical analysis

Differences in plant height between parents and progeny were tested using SPSS 18.0 (IBM, Armonk, NY, USA). Data represent means ± standard deviations. Graphs were drawn using edgeR (http://www.r-project.org/) and Origin 2018 (OriginLab, Northampton, MA, USA).
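The figure legends report means ± SD and Student's t tests, which the authors ran in statistical software. As a rough illustration of the underlying arithmetic, the sketch below computes the summary statistics and the equal-variance two-sample t statistic with the Python standard library; the function names and the heights are hypothetical, and the P-value lookup is left to a statistics package.

```python
import math
from statistics import mean, stdev, variance


def summarize(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical plant heights in cm, for illustration only.
short_line = [64.0, 65.0, 66.0]
tall_line = [99.0, 100.0, 101.0]
t_stat, dof = student_t(short_line, tall_line)
```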

Results and analysis

Phenotypic identification and genetic analysis of plant height

cz3 displayed a dwarf phenotype with a height of ~65 cm, significantly shorter than CIR12 (Fig. 1A, B). However, cz3 and CIR12 had the same number of internodes, indicating that the height difference between the two lines is caused by differences in internode length rather than internode number (Fig. 1C, D). Cytological observation revealed that internode cells in CIR12 were significantly larger than those in cz3 (P < 0.01) (Fig. 1E, F), indicating that the longer internodes of CIR12 relative to cz3 result from larger cells, not from a greater number of cells.

Fig. 1

Plant height and longitudinal sections of the stem at the mature stage. A, B Plant height of cz3 and CIR12 at the mature stage. Scale bar = 10 cm. C The spacing between four fruit branches on the stem of cz3 and CIR12. D Internode numbers on the stem of cz3 and CIR12, ‘ns’ indicates no significant difference. E Hand-cut longitudinal sections of the stem in cz3 and CIR12, respectively. Scale bar = 200 μm. F The internode cell size of cz3 and CIR12. Six slides of each sample were observed, and more than 100 cells were measured for each slide. **P < 0.01, Student's t test was used to calculate the P value

BSA-seq analysis

To facilitate phenotypic scoring, we used plant height as the quantitative trait for the initial identification of the plant height locus. A pooled sequencing strategy was employed, involving the two parental samples, cz3 (P1-pool) and CIR12 (P2-pool), and two extreme bulks: based on F2 plant height data, we selected 30 extreme-dwarf plants (D-pool) and 30 extreme-tall plants (T-pool) and pooled each group into a bulk (Fig. S1). The HAU version of the TM-1 genome was chosen as the reference. The mapping ratio of the clean reads ranged from 97.07 to 97.76%, and sequencing depths ranged from 12.30-fold to 20.92-fold across cz3, CIR12, the D-pool, and the T-pool. An average SNP-index was then calculated for the two extreme bulks using a sliding window approach (1 Mb window, 1 kb step). The Δ(SNP-index) was calculated from the SNP-indexes of the D-pool and T-pool and plotted across the entire genome. Regions with Δ(SNP-index) values exceeding the threshold were identified as candidate QTL regions. At the 95% confidence level, only one genomic region (58.1–62.1 Mb) on chromosome D05 was identified as harboring a major-effect QTL controlling plant height, which was designated qPH5.1 (Fig. 2).
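The SNP-index and sliding-window averaging described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the SNP-index is taken as the fraction of reads carrying the non-reference allele, the sign convention (D-pool minus T-pool) is assumed, and the function names and read counts are hypothetical.

```python
def snp_index(alt_reads, depth):
    """Fraction of reads at one SNP that carry the non-reference allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def delta_snp_index(d_pool, t_pool):
    """Difference between the D-pool and T-pool indexes at one SNP.

    Each argument is an (alt_reads, depth) pair. The direction of the
    subtraction is an assumption made for this sketch.
    """
    return snp_index(*d_pool) - snp_index(*t_pool)


def sliding_window_mean(positions, values, window=1_000_000, step=1_000):
    """Mean of values for SNPs in [start, start + window), advancing by step.

    Returns (window_start, mean) pairs for windows holding at least one SNP.
    Written for clarity, not speed.
    """
    results = []
    if not positions:
        return results
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        if in_window:
            results.append((start, sum(in_window) / len(in_window)))
        start += step
    return results


# Toy example with a 1 kb window and 500 bp step (hypothetical values).
toy_positions = [100, 200, 1500]
toy_deltas = [0.2, 0.4, 0.9]
toy_windows = sliding_window_mean(toy_positions, toy_deltas,
                                  window=1000, step=500)
```

Windows whose averaged value exceeds the chosen confidence threshold would then be flagged as candidate regions; the thresholding itself is not shown here.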

Fig. 2

QTL-seq approach for mapping genomic regions controlling cotton plant height. The colored dots represent the ∆(SNP index) value of every SNP locus. The red lines show the ∆(SNP index) value of the fitting results, the green dotted line shows the association threshold value (0.95) and the yellow dotted line shows the association threshold value (0.99)

Fine-mapping of qPH5.1 locus

To map qPH5.1 in more detail, a larger F2 population of 1200 plants was developed in 2022. A total of 1800 F2 plants, including the 600 plants used for bulk construction, were genotyped using the KASP1 and KASP7 markers, which were developed from the BSA data (Table S2, Fig. S3). All progeny plants were then genotyped using seven markers (KASP1–KASP7), which identified 20 recombinant plants. We subsequently assessed the plant height of these plants in the field. Statistical analysis revealed a significant decrease in plant height for L3 and L4 compared with CIR12 (P < 0.01), whereas no significant differences were observed between contrasting genotypes in the L1, L2, and L5 families. These findings indicate that the candidate gene is located within a physical interval of ~0.34 Mb flanked by markers KASP5 (61,068,056 bp) and KASP6 (61,403,436 bp) (Fig. 3A). This mapping interval is consistent with qPH5.1 (Fig. 3B). Together, the QTL-seq BSA data and the fine-mapping results revealed a total of 4838 SNP sites, of which 4402 were intergenic, 91 exonic, 30 synonymous, and 60 non-synonymous. Among these, 22 SNP sites are located in the qPH5.1 interval: 14 in exons, one in the intergenic region, and seven in the promoter region. Of the 14 exonic SNP sites, eight are synonymous and six are nonsynonymous. Further investigation will involve a comprehensive analysis of the genes associated with these SNP sites.
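The size of the fine-mapped interval follows directly from the two flanking marker coordinates given above. A short check, with hypothetical variable and function names:

```python
# Marker coordinates on chromosome D05, as reported in the text.
KASP5_BP = 61_068_056
KASP6_BP = 61_403_436


def interval_size_mb(left_bp, right_bp):
    """Physical distance between two markers, in megabases."""
    return (right_bp - left_bp) / 1e6


interval_bp = KASP6_BP - KASP5_BP
interval_mb = interval_size_mb(KASP5_BP, KASP6_BP)
```

The difference is 335,380 bp, which rounds to the ~0.34 Mb stated in the text.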

Fig. 3

Fine-mapping and analysis of candidate genes. A Fine-mapping of the qPH5.1 locus. Genotyping of recombinant plants from 1800 F2 individuals. Annotation of the ~ 0.34 Mb region. Letters represent significant differences at the 0.05 level based on Tukey’s test. Yellow boxes indicate exons, and black lines indicate introns. Arrow indicates the orientation of the gene. B Mapping interval for qPH5.1 locus and its corresponding physical map of chromosome D05 on cotton. C Heat map of gene expressions in the candidate gene region between cz3 and CIR12 from RNA-Seq, with expression level presented as log2-transformed mean values from three biological replicates. The color spectrum from blue to red represents low to high expression levels, respectively. D Expression patterns of the five candidate genes in cz3 and CIR12. Data were obtained from the website (http://cotton.zju.edu.cn/10.rnasearch.html)

Identification of expressed genes in the candidate interval

To further investigate candidate genes, we analyzed the transcriptome data from stem tissues of cz3 and CIR12. A total of 968 genes showed statistically significant expression differences between cz3 and CIR12 (Table S3). Within the QTL interval, two genes were up-regulated and three genes were down-regulated in cz3 compared with CIR12 (Fig. 3C). Examination of the expression patterns of these five genes revealed that GhSOT (GH_D05G3950) is predominantly expressed in the stem, as reported in the CottonFGD database (http://cotton.zju.edu.cn/10.rnasearch.html) (Fig. 3D). Among the five DEGs, seven SNP variants were identified in the promoter region of GhSOT, and these were confirmed by sequencing GhSOT from CIR12 and cz3 (Fig. 4A, B). LUC experiments provided additional support, demonstrating that the GhSOT promoter from cz3 enhanced downstream gene expression to a greater extent (Fig. 4C). The alteration in the promoter sequence may therefore influence the expression of the downstream gene, potentially contributing to the observed change in plant height. The amino acid sequence of GhSOT was used in BLAST searches against public databases to identify homologous sequences. The protein sequences of GhSOT and its four most similar homologs were used for phylogenetic analysis. The results showed that GhSOT shares 45% amino acid sequence identity with the soybean and 44% identity with the Arabidopsis SOT16 proteins (Fig. 4D, E). These results suggest that GhSOT may be the candidate gene for plant height.

Fig. 4

Promoter analysis of the candidate genes. A Venn diagram showing the overlap between DEGs and nonsynonymous genes in fine-mapping interval. B Seven SNP variants of the GhSOT promoter region in CIR12 and cz3. C LUC experiments of GhSOT promoter in CIR12 and cz3, demonstrating that the promoter of GhSOT from cz3 exhibited a greater enhancement of downstream gene expression. **P < 0.01, Student's t test was used to calculate the P value. D Gene structure of GhSOT. E Evolutionary analysis of GhSOT (GH_D05G3950)

VIGS of GhSOT brings an increase in plant height

To better understand GhSOT function, we used quantitative real-time RT-PCR to examine its relative expression in various tissues. GhSOT was mainly expressed in the stem, with significantly lower expression in the leaf, bract, sepal, torus, and fiber tissues (Fig. 5A). To explore the subcellular localization of GhSOT, a coding sequence construct was introduced into the epidermal cells of N. benthamiana leaves. The construct was fused with GFP at the N-terminus of GhSOT. The results demonstrated that the GhSOT-GFP fusion protein was localized in the nucleus and plasma membrane (Fig. 5B). To elucidate the role of GhSOT in cotton, VIGS experiments were conducted on cz3 to suppress GhSOT expression. Silencing GhSOT in cz3 plants led to a 41.0% increase in plant height compared with control plants, suggesting that down-regulation of GhSOT partially restored plant height (Fig. 5C–E). In contrast, VIGS in CIR12 did not result in a significant change in plant height (Fig. S2), indicating that the expression level of GhSOT plays an important role in regulating plant development.

Fig. 5

Functional validation of GhSOT. A Expression analysis of GhSOT gene in different tissues. GhHistone3 was used as an internal reference gene for data normalization. Data are presented as mean ± SD for three independent experiments. B Subcellular localization of GhSOT protein. Fusion of GhSOT with GFP protein was transiently expressed in N. benthamiana leaves, with free GFP serving as the control. H2B was employed as a nuclear marker. The visualization was conducted using confocal microscopy. C Phenotypes of blank control and GhSOT gene silenced in cz3. TRV::156 (empty, negative vector control), TRV::GhSOT (silenced cotton plants). D Quantification of GhSOT expression in blank control and VIGS in cz3. GhHistone3 was used as an internal reference gene for data normalization. E Plant height statistics of TRV::GhSOT and TRV::156 of cz3. Data are presented as mean ± SD for three independent experiments. **P < 0.01, Student's t test was used to calculate the P value

GhSOT may regulate plant height based on GA signaling

We used RNA-seq to investigate the pathways associated with stem development in the mutant and identified the top 20 pathways enriched in DEGs. Notably, the glycerolipid metabolism pathway was significantly enriched, with the related genes (GPAT3, GPAT5, EXL3, etc.) down-regulated (Fig. 6A). Previous studies have indicated that GPAT1 regulates plant height by influencing cell length (Bai et al. 2021). In addition, four DEGs were identified as key players in auxin-related signaling pathways: an Aux/IAA member, an auxin-responsive GH3 member, and two SAUR genes (Fig. 6B). Quantification of IAA and GA in the main stem between the first and second fruit branches revealed significantly higher levels in CIR12 than in cz3 (P < 0.01) (Fig. 6C, D). We then sprayed laboratory-grown cotton plants with IAA, GA3, or H2O every two days, starting on the seventh day after sowing, with the hormones applied at 100 mg L−1. After GA3 spraying, the plant height increment relative to the control was greater in cz3 than in CIR12 (Fig. 6E). By contrast, spraying IAA at 100 mg L−1 every two days produced no significant change in plant height increment (Fig. 6F). These findings indicate that the regulation of cotton plant height by GhSOT potentially relies on GA signaling.

Fig. 6

Analysis of regulatory pathways. A KEGG pathway enrichment analysis of DEGs between cz3 and CIR12. The top 20 KEGG pathways with the most abundant DEGs were presented. The y-axis and x-axis represent the pathway name and rich factor, respectively. The size of the circle dot indicates the gene count in each pathway. B RNA-seq is used to evaluate expression level of putative downstream genes involved in Aux-related and GA signaling between cz3 and CIR12, with expression level presented as log2-transformed mean values from three biological replicates. The color spectrum from blue to red represents low to high expression levels. C, D IAA and GA quantification from the main stem between first and second fruit branches of cz3 and CIR12. Data show mean ± SD (n = 3), **P < 0.01, Student's t test was used to calculate the P value. E, F Plant height in cz3 and CIR12 plants after treatment with GA3, IAA. Plant height is shown as the means ± SD (n > 30)

Discussion

Breeding for cotton plant architecture has become an effective way to improve yields (Song and Zhang 2009). Plant height is a crucial component of plant architecture and has implications for mechanized harvesting, harvest index, and final yield performance. Plant height is a quantitative trait, and many plant height QTLs distributed over multiple linkage groups have been reported in cotton (Ye et al. 2023; Liu et al. 2020; Huang et al. 2017; Su et al. 2018; Wen et al. 2019; Zhang et al. 2019). QTL fine-mapping and map-based cloning rely on high-density genetic maps and series of near-isogenic lines. BSA-seq has proven successful for rapidly identifying the chromosome regions harboring genes/QTLs of interest in numerous crop species (Takagi et al. 2015; Lei et al. 2020; Xin et al. 2020). It reduces the cost of genotyping large sets of individuals by genotyping only those with extreme phenotypes (Elshire et al. 2011). The present study mapped a major locus, qPH5.1, located at a position different from those reported in all previous studies in cotton.

Within the mapping interval, GhSOT, a member of the sulfotransferase (SOT/ST) family, was confirmed to regulate plant height development. Analysis of the GhSOT promoter revealed seven SNP variations between the two parents. LUC experiments indicated that the promoter from cz3 effectively enhances the expression of the downstream gene GhSOT. Previous research has demonstrated the significant involvement of SOT/ST family genes in various plant processes (Komori et al. 2009; Igarashi et al. 2012; Zhou et al. 2010). In this study, RNA-seq analysis was conducted to investigate the role of GhSOT in regulating plant height development, and the results suggest potential modulation by the GA signaling pathway. The role of GA in the regulation of plant height has been extensively documented (Wang and Wang 2022). The development of semi-dwarf varieties with increased yields during the 'Green revolution' is attributed to significant modifications in GA biosynthesis and signaling pathways (Evenson and Gollin 2003). Altering the GA pathway has proven to be an effective approach to improving plant height (Tester and Langridge 2010; Wang et al. 2017). Additionally, there is evidence of crosstalk between GA and other phytohormones, such as auxin, brassinosteroids (BRs), and jasmonates (JAs), in the regulation of plant height (Ferrero-Serrano et al. 2019). Auxin response factor (ARF) and Aux/IAA family members play a direct role in up-regulating the expression of the GA metabolic genes GA20ox and GA2ox, thereby coordinating GA metabolism during organ elongation (Alamin et al. 2018). Therefore, GA signaling plays a crucial role in controlling plant height.

In this study, GhSOT, a SOT/ST family gene, was detected via BSA-seq and transcriptome sequencing. We found seven SNP variants in the gene promoter between CIR12 and cz3. qRT-PCR analysis indicated that GhSOT was mainly expressed in the stem. Its function was verified by VIGS: knocking down GhSOT expression partially restored the height of cz3 plants. These results indicate that GhSOT is a key gene controlling plant height. Transcriptome analysis indicated that changes in the expression of genes associated with GA biosynthesis and signaling, as well as those involved in cell wall organization and biogenesis, ultimately led to cell elongation and consequently to an increase in plant height. After application of GA3, the plant height increment relative to the control was greater in cz3 than in CIR12, whereas spraying IAA produced no significant change in plant height increment (Fig. 6E, F). We therefore speculate that the regulation of cotton plant height by GhSOT potentially relies on GA signaling. However, this inference requires further experimental validation.

Conclusion

In the present research, we mapped a new QTL associated with plant height, qPH5.1, to a ~0.34 Mb region on chromosome D05 using BSA-seq and RNA-seq. This region contains five genes, including GhSOT, which is the candidate gene for plant height. These findings will support further research on the genetic mechanisms underlying plant height.