Introduction

Among the four cultivated Gossypium species, G. hirsutum (upland cotton) accounts for 90–95% of total cotton fiber production because of its attractive fiber quality and high yield (Rong et al. 2004; Park et al. 2005; Lacape et al. 2009). Improved living standards are increasing the demand for cotton with high-quality traits, which creates a challenge for cotton breeders. Plant architecture is one of the most important groups of agronomic traits; it is related to suitability for mechanical harvest, plant adaptability, plant density, yield and quality (Chen et al. 2019; Su et al. 2018). PH is a major trait affecting plant architecture and serves as a model for analyzing dynamic development; it directly determines biomass and indirectly influences the economic yield of cotton. BN has a large influence on morphological structure and photosynthetic capacity (Adawy et al. 2013; Guo et al. 2013; He et al. 2014; Shen et al. 2014; Shang et al. 2015, 2016; Mei et al. 2017; Chen et al. 2019; Ma et al. 2018; Su et al. 2018). Most plant architecture traits, including PH and BN, are quantitative traits controlled by multiple genes and strongly influenced by the environment (Adawy et al. 2013). Marker-assisted selection (MAS) could therefore be used to improve these traits. Previous studies identified QTLs for PH and BN with different kinds of markers and populations (Adawy et al. 2013; Guo et al. 2013; He et al. 2014; Shen et al. 2014; Shang et al. 2015, 2016; Mei et al. 2017; Qi et al. 2017; Ma et al. 2018), using simple sequence repeat (SSR) genetic maps (Guo et al. 2013; Shang et al. 2015, 2016; Mei et al. 2017; Ma et al. 2018) or amplified fragment length polymorphism (AFLP) and random amplified polymorphic DNA (RAPD) genetic maps (Adawy et al. 2013). The study that used an SNP genetic map (Qi et al. 2017) did not cover the entire upland cotton genome, and some of its QTLs had large CIs. It is therefore necessary to identify QTLs and potential candidate genes for PH and BN with a high-density map.

In this research, we identified QTLs for PH and BN across seven environments with a high-density genetic map (Zhang et al. 2016) constructed from an intraspecific RIL population derived from two parents, 0–153 and sGK9708 (Sun et al. 2012; Jamshed et al. 2016; Zhang et al. 2016, 2017). The stable QTLs were selected and compared with those in the CottonQTLdb database (Rong et al. 2007; Lacape et al. 2009; Said et al. 2013, 2015a, b; Fang et al. 2014). The genes located in the CIs of the stable QTLs were annotated with the gene ontology (GO), Kyoto encyclopedia of genes and genomes (KEGG) and eukaryotic orthologous groups (KOG) databases, and their expression patterns in root, stem and leaf were analyzed with the RNA-Seq data of Zhang et al. (2015a). By combining these results with previous studies, we identified potential candidate genes. This research could improve understanding of the formation of PH and BN and could be employed in marker-assisted selection for molecular breeding of cotton with improved PH and BN.

Plant materials

An RIL population of 196 individuals was constructed with the upland cotton lines 0–153 and sGK9708 as parents. Across the seven environments, the mean values of PH and BN were 91.14 cm and 13.9 for parent 0–153 and 82.72 cm and 11.1 for parent sGK9708, respectively. F6:8 families were grown in 2007, and the later generations were considered RILs. The development of the population is detailed in Sun et al. (2012). The phenotypic data for PH and BN were collected in seven environments from 2007 to 2012. The seeds for the seven environments across 5 years originated from the F6:8 families, and the seeds harvested from each generation were used for the next year. The environments included Anyang (AY) of Henan province in 2007, 2008 and 2011, Quzhou (QZ) of Hebei province in 2008, Linqing (LQ) of Shandong province in 2008 and Zhengzhou (ZZ) of Henan province in 2011 and 2012 (Sun et al. 2012; Jamshed et al. 2016; Zhang et al. 2015b, 2016, 2017). In all seven environments, the phenotypic evaluations of the population were conducted in a completely randomized block design with two replications. In 2007AY and 2008AY, single-row plots were used, each row 5 m long and 0.8 m wide, and the sowing dates were April 25 and April 26, respectively. In 2008QZ and 2008LQ, the plastic-membrane covering technique and a wide/narrow row planting pattern were used, with rows 5 m long and spaced 0.5/0.8 m apart, and the sowing dates were April 27 and April 28, respectively (Sun et al. 2012). In 2011AY, 2011ZZ and 2012ZZ, single-row plots were used, each row 5 m long and 0.8 m wide, and the sowing date was April 26 in all three environments. The plant spacing was 0.25 m per plant in every row, and local commercial field management practices were followed in each environment.

Collection and statistical analysis of phenotypic data

The phenotypic data for PH and BN were collected in the middle of September from ten plants selected in each plot in each of the seven environments. PH was measured from the cotyledonary node to the topping position of the plant; BN was counted as the number of all fruiting branches. Descriptive statistics (mean, standard deviation, skewness and kurtosis) of PH and BN across the whole population were calculated with SPSS 20.0 (https://www.ibm.com/analytics/spss-statistics-software). Variance and heritability were analyzed with ICIMAPPING 4.1 (Li et al. 2007, 2015b; Meng et al. 2015). Heritability greater than 40% was considered high, heritability between 20% and 40% was considered medium, and heritability below 20% was considered low (Bai et al. 2016; Wang et al. 2018).
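The three heritability classes described above can be expressed as a small helper. This is a minimal sketch, not the procedure used in ICIMAPPING: the function name is hypothetical, heritability is assumed to be given as a proportion (0 to 1), and the treatment of the exact boundary values 0.20 and 0.40 as "medium" is an assumption, since the text does not say which class they fall in.

```python
def heritability_class(h2: float) -> str:
    """Classify a heritability estimate given as a proportion (0-1).

    Thresholds follow the text: > 40% high, 20-40% medium, < 20% low.
    Assumption: the boundary values 0.20 and 0.40 are counted as medium.
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must be a proportion between 0 and 1")
    if h2 > 0.40:
        return "high"
    if h2 >= 0.20:
        return "medium"
    return "low"


# Illustrative values only; they are not results taken from any table.
for value in (0.55, 0.30, 0.10):
    print(f"h2 = {value:.2f}: {heritability_class(value)}")
```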

QTL identification for PH and BN

The QTLs were identified from the phenotypic data of the seven environments on the high-density genetic map constructed by Zhang et al. (2016), using the composite interval mapping (CIM) method implemented in Windows QTL Cartographer 2.5 (Zeng 1994; Wang et al. 2001). The logarithm of odds (LOD) threshold for declaring significant QTLs across environments was determined by a permutation test (n = 1000 permutations, significance level p < 0.05) with a mapping step of 1.0 cM and five control markers. The QTLs were named as follows: each designation begins with “q”, followed by the trait abbreviation, the chromosome number and the QTL serial number (Sun et al. 2012; Zhang et al. 2015b, 2016, 2017; Jamshed et al. 2016).
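The naming rule can be illustrated with a short formatter and parser. This is a sketch under stated assumptions: the hyphenated layout "q<trait>-chr<chromosome>-<serial>" is inferred from names such as qPH-chr16-1 reported in the results, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class QTLName(NamedTuple):
    trait: str        # trait abbreviation, e.g. "PH" or "BN"
    chromosome: int   # chromosome number
    serial: int       # serial number of the QTL on that chromosome


# Assumed layout: "q" + trait + "-chr" + chromosome + "-" + serial number.
_QTL_PATTERN = re.compile(r"^q([A-Za-z]+)-chr(\d+)-(\d+)$")


def format_qtl_name(trait: str, chromosome: int, serial: int) -> str:
    """Build a QTL designation from its three components."""
    return f"q{trait}-chr{chromosome}-{serial}"


def parse_qtl_name(name: str) -> QTLName:
    """Split a QTL designation back into trait, chromosome and serial number."""
    match = _QTL_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised QTL name: {name!r}")
    trait, chromosome, serial = match.groups()
    return QTLName(trait, int(chromosome), int(serial))


print(format_qtl_name("PH", 16, 1))
print(parse_qtl_name("qBN-chr25-3"))
```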

A positive additive effect indicated that the favorable allele came from parent 0–153, whereas a negative additive effect indicated that it came from parent sGK9708. QTLs for the same trait detected in at least two environments were considered stable when their CIs overlapped (Sun et al. 2012). The stable QTLs were compared with the QTLs in the CottonQTLdb database (Rong et al. 2007; Lacape et al. 2009; Said et al. 2013, 2015a, b; Fang et al. 2014) (http://www2.cottonqtldb.org:8081/). QTLs were considered newly identified if their physical CIs did not coincide or overlap with those of QTLs from prior studies in the database.
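The stability criterion (same trait, overlapping CIs, at least two environments) can be sketched as an interval-merging step. This is an illustrative reconstruction rather than the authors' script: the class and function names are hypothetical, the per-environment intervals in the example are invented, and two assumptions are made, namely that intervals which merely touch count as overlapping and that chains of overlapping intervals are merged into one QTL.

```python
from collections import defaultdict
from typing import NamedTuple


class Detection(NamedTuple):
    trait: str
    chromosome: int
    environment: str
    ci_start: float  # cM
    ci_end: float    # cM


class StableQTL(NamedTuple):
    trait: str
    chromosome: int
    ci_start: float
    ci_end: float
    environments: tuple


def find_stable_qtls(detections, min_environments=2):
    """Merge overlapping CIs per trait and chromosome; keep multi-environment ones."""
    grouped = defaultdict(list)
    for det in detections:
        grouped[(det.trait, det.chromosome)].append(det)

    stable = []
    for (trait, chromosome), items in sorted(grouped.items()):
        items.sort(key=lambda d: (d.ci_start, d.ci_end))
        clusters = []
        for det in items:
            # Assumption: touching intervals (start == previous end) overlap.
            if clusters and det.ci_start <= clusters[-1]["end"]:
                clusters[-1]["end"] = max(clusters[-1]["end"], det.ci_end)
                clusters[-1]["envs"].add(det.environment)
            else:
                clusters.append({"start": det.ci_start, "end": det.ci_end,
                                 "envs": {det.environment}})
        for cluster in clusters:
            if len(cluster["envs"]) >= min_environments:
                stable.append(StableQTL(trait, chromosome, cluster["start"],
                                        cluster["end"],
                                        tuple(sorted(cluster["envs"]))))
    return stable


# Hypothetical detections for illustration only.
example = [
    Detection("PH", 16, "2007AY", 100.4, 103.0),
    Detection("PH", 16, "2008AY", 102.0, 104.6),
    Detection("PH", 3, "2008QZ", 10.0, 12.0),
]
print(find_stable_qtls(example))
```

Merging chained overlaps is a design choice; a stricter rule would require every pair of intervals in a group to overlap.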

Gene identification and annotation

All the markers located in the CIs of the stable QTLs were compared with the genome of upland cotton (Li et al. 2015a, b; Zhang et al. 2015a) to obtain their physical positions. The genes located in the physical CIs of the stable QTLs were annotated. The functions of these genes were predicted by reference to their corresponding genes in A. thaliana. The genes were also annotated with the GO, KEGG and KOG databases. We used the go_20160702-seqdb.fasta (http://archive.geneontology.org/latest-lite/) and gene2go (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/) databases for the GO annotation and the KOG database (ftp://ftp.ncbi.nih.gov/pub/COG/KOG) for the KOG annotation. The sequences of the candidate genes were compared with the sequences in these databases using BlastX (E-value < 10^-10). KOBAS 3.0 (Xie et al. 2011) (http://kobas.cbi.pku.edu.cn/) was used for the KEGG annotation.
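The E-value cutoff used for the BlastX comparison can be applied to BLAST results as a simple filter. The sketch below assumes that hits are available in the standard 12-column tabular format (BLAST+ output format 6), which the text does not state; the function name, the gene identifiers and the subject identifiers are hypothetical. It keeps, for each query, the hit with the highest bit score among those below the cutoff.

```python
def best_hits(tabular_lines, max_evalue=1e-10):
    """Return {query: (subject, evalue, bitscore)} for hits with evalue < max_evalue.

    Assumes the standard 12 tab-separated columns of BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for illustration only.
example_hits = [
    "geneA\tsubj1\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t180.0",
    "geneA\tsubj2\t90.0\t210\t21\t0\t1\t630\t1\t210\t1e-80\t250.0",
    "geneB\tsubj3\t40.0\t50\t30\t0\t1\t150\t1\t50\t1e-5\t45.0",
]
print(best_hits(example_hits))
```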

In-silico RNA-Seq data analysis

RNA-Seq data for different tissues of upland cotton (PRJNA248163) were downloaded from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/) (Zhang et al. 2015a). First, the data in SRA format were converted into FASTQ format with the SRA Toolkit (https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software) using the --split-3 option. Bowtie 2 (Langmead and Salzberg 2012) was then used to build the genome index, and TopHat 2 (Kim et al. 2013) was used to align the FASTQ reads to the upland cotton genome (Li et al. 2015a; Zhang et al. 2015a). Cufflinks was used to calculate the fragments per kilobase of transcript per million mapped fragments (FPKM) values (Trapnell et al. 2010). Genes with an FPKM value of more than 10 in at least one tissue were considered expressed genes (Zhang et al. 2019). The heatmap was drawn from log2(FPKM+1) values with the R package pheatmap.
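The expression filter and the heatmap transformation described above amount to two short operations on a gene-by-tissue FPKM table. The following is a minimal sketch in plain Python, not the authors' pipeline (which used Cufflinks and R); the function names and the gene identifiers are hypothetical, and the FPKM values are invented for illustration.

```python
import math


def expressed_genes(fpkm_by_gene, threshold=10.0):
    """Keep genes whose FPKM is above the threshold in at least one tissue."""
    return {
        gene: values
        for gene, values in fpkm_by_gene.items()
        if any(value > threshold for value in values.values())
    }


def log2_transform(fpkm_by_gene):
    """Apply log2(FPKM + 1), the transformation used for the heatmap."""
    return {
        gene: {tissue: math.log2(value + 1.0) for tissue, value in values.items()}
        for gene, values in fpkm_by_gene.items()
    }


# Hypothetical FPKM values for illustration only.
example_fpkm = {
    "gene_1": {"root": 7.0, "stem": 15.0, "leaf": 0.0},
    "gene_2": {"root": 3.0, "stem": 1.0, "leaf": 9.9},
}
kept = expressed_genes(example_fpkm)
print(sorted(kept))
print(log2_transform(kept))
```

Adding 1 before taking the logarithm keeps genes with an FPKM of 0 in a tissue at a finite value (0) instead of an undefined one.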

Results and discussion

The descriptive statistics and heritability analysis of PH and BN

The descriptive statistics of PH and BN for parents and the RIL population in seven environments are summarized in Table 1. All traits showed approximately normal distributions with an absolute skewness value of less than one and were characterized by transgressive segregations with respect to their parental performances during the evaluations (Table 1). Heritability for PH and BN was calculated across seven environments. For PH, heritability ranged from 0.34 to 0.62. Results indicated high heritability in five of the seven environments and showed medium heritability in the remaining two environments. For BN, the heritability ranged from 0.23 to 0.42. Results indicated high heritability in two of the seven environments and showed medium heritability in the remaining five environments. The variance analysis and heritability analysis results are shown in Table S1.

Table 1 Descriptive statistics of PH and BN in the parents and population

Because PH and BN are both quantitative traits that are strongly influenced by the environment, identifying QTLs for these traits across multiple environments and selecting the stable ones gives more reliable results. Although SNP genetic maps are now widely used for QTL identification, most previous studies of QTLs for PH and BN used SSR maps (Guo et al. 2013; He et al. 2014; Shang et al. 2015, 2016; Mei et al. 2017; Ma et al. 2018). Among the few reports that used an SNP map, Qi et al. (2017) used only one environment for the F2 population and one environment for the F2:3 population, whereas in our research a total of seven environments were used to identify QTLs. Our results could contribute to future research such as QTL identification, fine mapping and cloning of functional genes, and could provide a reference for uncovering the mechanisms of PH and BN formation.

QTL identification and congruence analysis with previous reports

For PH, 68 QTLs were identified on 22 of the 26 chromosomes, all except chromosomes 2, 6, 11 and 24 (Table S2, Table 2 and Fig. 1). Among them, nine stable QTLs were detected in at least two environments. qPH-chr16-1 was detected in five environments, was located in the CI of 100.4–104.6 cM, and explained 4.38–8.96% of the observed phenotypic variation (PV) with negative additive effects. qPH-chr5-5, qPH-chr7-1 and qPH-chr12-3 were each detected in three environments, were located in the CIs of 185.9–194.3 cM on chr.5, 13.7–21.1 cM on chr.7 and 29.5–32.4 cM on chr.12, and explained 5.67–9.64%, 5.06–6.68% and 7.07–11.13% of the observed PV, respectively, all with negative additive effects. qPH-chr1-4, qPH-chr1-6, qPH-chr7-5, qPH-chr17-1 and qPH-chr26-1 were each detected in two environments, were located in the CIs of 74.5–76.2 cM on chr.1, 99.3–102.0 cM on chr.1, 119.7–121.4 cM on chr.7, 6.2–13.2 cM on chr.17 and 56.3–61.5 cM on chr.26, and explained 5.04–6.02%, 4.66–6.23%, 3.81–7.68%, 4.51–4.80% and 5.56–5.67% of the observed PV, respectively, all with positive additive effects.

Table 2 Stable QTLs for PH and BN
Fig. 1 Variation of the LOD value along the genetic distance for the stable QTLs for PH and BN, each shown in one of the environments in which it was detected. a Stable QTLs for PH. b Stable QTLs for BN

For BN, 64 QTLs were identified (Table S2, Table 2 and Fig. 1). Among them, eight stable QTLs, qBN-chr2-2, qBN-chr4-2, qBN-chr8-1, qBN-chr15-1, qBN-chr18-2, qBN-chr19-1, qBN-chr19-2 and qBN-chr25-3, were each detected in two environments. They were located in the CIs of 12.9–19.2 cM on chr.2, 95.3–97.0 cM on chr.4, 30.4–37.1 cM on chr.8, 0.0–2.9 cM on chr.15, 132.5–135.9 cM on chr.18, 0.0–6.1 cM on chr.19, 16.9–22.0 cM on chr.19 and 89.5–91.3 cM on chr.25, and explained 6.19–6.43%, 6.45–8.74%, 4.67–5.47%, 5.24–7.17%, 4.81–6.42%, 5.72–6.40%, 5.74–6.32% and 5.21–5.97% of the observed PV, respectively. The additive effects of qBN-chr2-2, qBN-chr4-2, qBN-chr8-1, qBN-chr19-2 and qBN-chr25-3 were negative, whereas those of qBN-chr15-1, qBN-chr18-2 and qBN-chr19-1 were positive (Table S2; Table 2).

To determine whether the stable QTLs in our study were novel or had been identified previously, we compared them with those in CottonQTLdb (Rong et al. 2007; Lacape et al. 2009; Said et al. 2013, 2015a, b; Fang et al. 2014). All the markers in this database are SSR and restriction fragment length polymorphism (RFLP) markers; meta-analysis could therefore not be used, because no markers are shared between our SNP map and the SSR and RFLP maps in the database. Instead, QTLs in the database and in our results that had the same or overlapping physical CIs were regarded as common QTLs. For PH, the QTLs in the database were distributed on 23 chromosomes, all except chromosomes 4, 12 and 18; in our research, we identified one stable QTL for PH on chromosome 12. For BN, the QTLs in the database were distributed on 18 chromosomes, all except chromosomes 3, 4, 7, 8, 15, 19, 20 and 21; in our research, we identified stable QTLs on chromosomes 4, 7, 8, 15 and 19. On the basis of the physical positions of the QTLs in our research and in the database, two QTLs for PH (qPH-chr1-4 and qPH-chr1-6) and one QTL for BN (qBN-chr25-3) should be considered common QTLs. The other seven QTLs for PH and eight QTLs for BN were newly identified (Fig. 2). Previous studies identified QTLs with SNP maps, but most of them focused on fiber quality and yield traits (Hulsse-Kemp et al. 2015; Wang et al. 2015a, b; Li et al. 2016; Tan et al. 2018). For PH and BN, there have been only a few studies. Because the physical positions of the SNP markers in Qi et al. (2017) are unknown, we could not compare their results with ours; however, that study did not identify any stable QTLs, as each QTL was detected in only one generation and one environment (Qi et al. 2017).

Fig. 2 Physical positions of the QTLs identified both in the database and in our research. a qPH-chr1-4 and qPH-chr1-6. b qPH-chr5-5. c qBN-chr25-3

In our research, the increased marker density shortened the CIs relative to the corresponding QTLs in the database. The QTL for PH on chromosome 1 in the database harbored five SSR markers, four of which were located in the physical interval from 86.31 to 98.15 Mb and the other at 44.06 Mb. In our research, qPH-chr1-4 and qPH-chr1-6 were located in the physical CIs from 85.27 to 85.40 Mb and from 94.47 to 94.70 Mb, respectively. One QTL with a large CI in the database was thus divided into two QTLs with smaller CIs. The QTL for PH on chromosome 5 in the database harbored three SSR markers and was located in the physical CI from 86.77 to 91.53 Mb. In our research, qPH-chr5-5 was located in the physical CI from 90.37 to 90.71 Mb, much smaller than in the database. The QTL for BN on chromosome 25 in the database harbored 11 SSR markers and was located in the physical CI from 5.18 to 34.62 Mb. In our research, qBN-chr25-3 was located in the physical CI from 11.04 to 12.61 Mb, again much smaller than in the database. For the previously identified QTLs, we therefore narrowed the CIs reported in the database. In addition, we identified five important QTLs, qPH-chr16-1, qPH-chr5-5 and qPH-chr12-3 for PH and qBN-chr4-2 and qBN-chr15-1 for BN, which were detected in multiple environments and explained more than 7% of the observed PV. These results could promote further research into the mechanisms that control PH and BN and could also provide a basis for MAS (Fig. 2).
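The degree of narrowing can be checked directly from the physical intervals quoted above for chromosomes 5 and 25. The sketch below uses only those four intervals; the function and variable names are hypothetical. It reports the width of each interval in Mb and whether the interval from this study lies entirely inside the database interval.

```python
def width(interval):
    """Width of a (start, end) interval."""
    start, end = interval
    return end - start


def contains(outer, inner):
    """True if the inner interval lies entirely within the outer interval."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Physical CIs in Mb, as quoted in the text.
INTERVALS_MB = {
    "qPH-chr5-5": {"database": (86.77, 91.53), "this_study": (90.37, 90.71)},
    "qBN-chr25-3": {"database": (5.18, 34.62), "this_study": (11.04, 12.61)},
}


def summarize(intervals):
    """Return (name, database width, study width, nested?) for each QTL."""
    rows = []
    for name, ci in intervals.items():
        db, ours = ci["database"], ci["this_study"]
        rows.append((name, round(width(db), 2), round(width(ours), 2),
                     contains(db, ours)))
    return rows


for name, db_width, our_width, nested in summarize(INTERVALS_MB):
    print(f"{name}: database {db_width} Mb, this study {our_width} Mb, "
          f"nested = {nested}")
```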

Gene identification, annotation and expression pattern analysis

A total of 1024 genes (400 for PH and 624 for BN) were located in the CIs of the stable QTLs and were used in the subsequent analyses. For PH, qPH-chr26-1 harbored the most genes (248), whereas qPH-chr1-4 harbored the fewest (one gene) (Table 3). For BN, qBN-chr15-1 harbored the most genes (282), whereas qBN-chr4-2 harbored the fewest (ten genes) among the QTLs that contained genes; qBN-chr2-2, qBN-chr18-2 and qBN-chr19-1 harbored no genes because of their small physical CIs (Table 3). The functions of all the genes for PH and BN were predicted by identifying their corresponding genes in A. thaliana (Table S3). The genes were also annotated with the GO, KEGG and KOG databases. For the GO annotation, the term “biological process” in the category “biological process”, the term “nucleus” in the category “cellular component” and the term “molecular function” in the category “molecular function” harbored the largest numbers of genes for both traits. For PH, the three GO terms harbored 61, 120 and 58 genes, respectively. For BN, the three GO terms harbored 97, 159 and 91 genes, respectively (Table S4, Fig. S1). For the KEGG annotation, the pathways “metabolic pathways”, “biosynthesis of secondary metabolites” and “ribosome” harbored the most genes for both traits. For PH, these three pathways harbored 20, 16 and seven genes, respectively. For BN, these three pathways harbored 62, 40 and 12 genes, respectively (Table S5, Fig. S1). For the KOG annotation, the categories “signal transduction mechanisms” and “transcription” had the largest numbers of genes for both traits. For PH, the two categories harbored 41 and 29 genes, respectively. For BN, the two categories harbored 47 and 28 genes, respectively. Some genes had no clear KOG annotation, as they were assigned to the categories “general function prediction only” and “function unknown”. For PH, 51 and 18 genes were assigned to these two categories, respectively. For BN, 98 and 30 genes were assigned to these two categories, respectively (Table S6, Fig. S1).

Table 3 Genes located on the CI of stable QTLs

For PH, 134 genes were expressed in at least one of the three tissues (root, stem and leaf). Among them, 60 were expressed in all three tissues, 15 in stem and leaf, two in root and stem, 16 in root and leaf, 14 only in root, 16 only in stem and 11 only in leaf. For BN, 224 genes were expressed in at least one of the three tissues. Among them, 109 were expressed in all three tissues, 22 in stem and leaf, nine in root and stem, 28 in root and leaf, 18 only in root, 25 only in stem and 13 only in leaf (Table S7 and Fig. S2).
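The tissue breakdown above partitions the expressed genes into seven mutually exclusive groups, so the group counts must add up to the total. The sketch below shows one way to assign each gene to its group from per-tissue FPKM values, using the FPKM > 10 criterion from the Methods, and checks that the seven BN counts reported above sum to 224. The function names, gene identifiers and FPKM values are hypothetical.

```python
from collections import Counter

TISSUES = ("root", "stem", "leaf")


def tissue_pattern(fpkm, threshold=10.0):
    """Return the tuple of tissues in which a gene is expressed."""
    return tuple(t for t in TISSUES if fpkm.get(t, 0.0) > threshold)


def count_patterns(fpkm_by_gene, threshold=10.0):
    """Count genes per expression pattern, ignoring genes expressed nowhere."""
    counts = Counter(tissue_pattern(v, threshold) for v in fpkm_by_gene.values())
    counts.pop((), None)  # drop genes below the threshold in every tissue
    return counts


# Hypothetical FPKM values for illustration only.
example = {
    "gene_1": {"root": 12.0, "stem": 30.0, "leaf": 11.0},
    "gene_2": {"root": 0.5, "stem": 14.0, "leaf": 20.0},
    "gene_3": {"root": 2.0, "stem": 3.0, "leaf": 1.0},
}
print(count_patterns(example))

# Counts reported in the text for BN; the seven groups should sum to 224.
bn_counts = {
    ("root", "stem", "leaf"): 109,
    ("stem", "leaf"): 22,
    ("root", "stem"): 9,
    ("root", "leaf"): 28,
    ("root",): 18,
    ("stem",): 25,
    ("leaf",): 13,
}
print(sum(bn_counts.values()))
```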

Among the 134 genes for PH and 224 genes for BN, six for PH and four for BN have been validated in previous studies. For PH, the gene Gh_A05G3508 is related to “cytochrome P450”, which takes part in important biosynthesis pathways; mutations in this gene may block brassinosteroid biosynthesis and thereby produce dwarf plants (Yang et al. 2014; Qi et al. 2017; Wu et al. 2016). The gene Gh_D07G0671 belongs to the MADS-box family, which is involved in cellular differentiation and floral determination (Hempel et al. 1997; Gu et al. 1998; Su et al. 2018). The genes Gh_D12G1734, Gh_D03G0226 and Gh_A05G3499 belong to the WRKY family, which helps plants to adapt through physiological and morphological changes and controls flowering time and plant height (Cai et al. 2014; Gu et al. 2018). The gene Gh_D12G1644 belongs to the TCP family, whose members directly bind the promoters of core cell-cycle genes in Arabidopsis inflorescence and shoot apices (Martín-Trillo et al. 2010; Davière et al. 2014). For BN, the gene Gh_A08G0299 belongs to the MYB family, which is a key regulator of the early steps of shoot branching (Schmitz and Theres 2005; Song et al. 2011; Yang et al. 2018). The gene Gh_D05G1192 was annotated with the function “AMP deaminase, putative/myoadenylate deaminase, putative”, and the two genes Gh_D01G2007 and Gh_D01G2145 were annotated with the function “Transducin/WD40 repeat-like superfamily protein” in Arabidopsis (Bajaj et al. 2016). These ten genes were considered potential candidate genes, which could help in understanding the mechanisms of PH and BN formation and could also help breeders to improve PH and BN (Tables S3–S6).