Introduction

Cotton (Gossypium spp.) is an economically important industrial crop worldwide, and its fibers are one of the main natural raw materials for the textile industry, which contributes more than 10 billion dollars a year to the global economy. The Gossypium genus contains nine different genome groups comprising approximately 50 species [40]. Of the four Gossypium species that have historically been cultivated for fiber [4], the tetraploids G. hirsutum and G. barbadense currently dominate commercial cotton fiber production. G. hirsutum accounts for approximately 95% of worldwide cotton production [39] owing to its high yield potential and broad environmental adaptability, whereas G. barbadense, valued for its superior fiber quality attributes, accounts for the rest. Combining the yield of G. hirsutum with the fiber quality of G. barbadense has been a long-standing goal of cotton breeders and geneticists; however, only limited success has been achieved thus far [47].

Cotton fibers are single-celled trichomes that differentiate from individual epidermal cells of the ovule [2]. Fiber development consists of four overlapping stages: initiation, primary cell wall formation, secondary cell wall formation, and maturation. The quality and productivity of cotton fibers depend on two biological processes: fiber initiation, which determines the number of fibers on each ovule, and fiber elongation, which controls the final length (and partially the strength and fineness) of each fiber [16]. After initiation from −1 to 3 days post anthesis (DPA), fibers enter the fastest elongation stage at 5–10 DPA, and elongation continues until 20–25 DPA. Numerous studies have demonstrated that cotton fiber development is a complex biological process [24, 48, 49]. Therefore, understanding the mechanisms underlying fiber elongation through genetic and molecular analysis is essential for improving fiber quality.

Identification and characterization of quantitative trait loci (QTL) for cotton fiber quality have been a research focus for more than two decades [32,33,34]. As a result, a large number of QTL associated with fiber quality-related traits have been reported based on interspecific G. hirsutum × G. barbadense populations [5, 6, 8,9,10,11, 19,20,21,22, 28, 44,45,46]. Among the numerous cotton fiber quality QTL that have been mapped, more have been mapped for fiber length than for any other fiber quality trait. Specifically, 281 QTL have been detected for fiber length in 88 different QTL studies (as summarized in [34]). However, few studies validating the efficacy or marker association of cotton fiber length QTL in advanced breeding generations have been published, and the genes underlying fiber length QTL remain elusive. Cotton fiber cell elongation is a complex physiological process regulated by many genes [2]. For example, GhPIP2, GhHOX3, GhCaM7, GhPAG1, and GhCPC play important roles in controlling cotton fiber elongation [1, 23, 26, 35, 36, 42]. In a microarray analysis, 81 genes were upregulated and 2553 ‘expansion-associated’ genes were downregulated in G. arboreum compared with their expression in Upland cotton at 7–10 DPA, suggesting a trend of global gene repression during the fiber elongation stage in allotetraploids. In a recent RNA-seq study, 1551 differentially expressed genes (DEGs) were identified between two BILs differing in fiber length, but only eight DEGs co-localized with four fiber length QTL identified in the BIL population containing the two BILs [25]. A comparative transcriptome study was also conducted in diploid cultivated cotton, using an Affymetrix cotton GeneChip genome array to compare isogenic fuzzy-lintless and normal fuzzy-linted G. arboreum lines at two stages (0 and 10 DPA) [12], and numerous DEGs were identified during the fiber elongation stage.
However, these genes do not explain the molecular mechanism underlying the natural genetic differences between genotypes with different fiber lengths. Therefore, genome-wide analyses of DEGs using more BILs are needed to identify genes associated with cotton fiber quality traits, including fiber length.

In this study, RNA was extracted from developing fibers at 10 DPA from a long-fiber (LF) group and a short-fiber (SF) group, each comprising five backcross inbred lines (BILs), and the transcriptomes of the two groups were compared. Because fibers at 10 DPA are in the fastest elongation phase after initiation, we analyzed 10 DPA fibers to identify DEGs that may help determine fiber length at this important stage of fiber development. The aim of this study was to identify DEGs associated with fiber length QTL by microarray analysis using Affymetrix GeneChips and to gain insight into the potential molecular genetic differences in fiber elongation between two groups of BILs differing in fiber length. This positional candidate gene approach, which integrates Affymetrix GeneChip analysis with QTL mapping and physical mapping, is an efficient strategy for narrowing down the number of candidate genes for fiber length.

Materials and methods

Plant material

The interspecific backcross inbred population of 146 lines was developed at New Mexico State University [46]. For the current study, 10 BILs were selected based on differences in fiber length and similarities in other traits (Supplementary Data 2). The trait data were analyzed using least significant difference (LSD) tests for significant differences with SAS software (SAS Institute Inc., 2009, USA). Five BILs (i.e., NMGA-062, NMGA-089, NMGA-017, NMGA-051, and NMGA-070, designated LF1, LF2, LF3, LF4, and LF5, respectively) were long-fiber (LF) lines with an average fiber length of 31.79 mm, and five other BILs (i.e., NMGA-095, NMGA-026, NMGA-144, NMGA-105, and NMGA-140, designated SF1, SF2, SF3, SF4, and SF5, respectively) were short-fiber (SF) lines with an average fiber length of 27.35 mm. The two groups (LF vs. SF) differed significantly in fiber length (31.79 vs. 27.35 mm) but did not differ significantly in strength, micronaire, seed cotton yield, lint yield, lint percentage, boll weight, or Verticillium wilt index. The 10 BILs (5 LF and 5 SF) were planted at the Experimental Farm of the Cotton Research Institute, Chinese Academy of Agricultural Sciences, Anyang, Henan Province, China (E 114°48′, N 36°06′) in the summer of 2009. The field plots were arranged in a randomized complete block design with three replications. Flowers were tagged on the day of flowering, and bolls were harvested before 10 am at 5, 10, 15, 20, and 25 DPA and immediately placed on ice [41]. To minimize experimental error, all samples from each developmental stage were collected on the same day. Fibers were dissected from the developing ovules at 5–25 DPA, frozen in liquid nitrogen, and stored at −80 °C until further processing.
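As an illustration of the field design, the sketch below randomizes 10 entries within three complete blocks, so that each entry appears once per block in an independently shuffled order. It is a hypothetical example, not the randomization used at Anyang in 2009: the LF/SF labels stand in for the BILs, and the seed is arbitrary.

```python
import random


def rcbd_layout(entries, n_blocks, seed=None):
    """Randomized complete block design: every entry once per block,
    with an independent random order inside each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(entries)  # copy so the caller's list is not reordered
        rng.shuffle(block)     # separate randomization for each block
        layout.append(block)
    return layout


# Hypothetical labels standing in for the 10 BILs
lines = [f"LF{i}" for i in range(1, 6)] + [f"SF{i}" for i in range(1, 6)]
for number, block in enumerate(rcbd_layout(lines, n_blocks=3, seed=2009), start=1):
    print(f"Block {number}: {block}")
```

Randomizing within blocks rather than across the whole field keeps each replication complete, so block effects can be separated from line effects in the analysis.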

RNA extraction

Total RNA was extracted from developing fibers at 10 DPA from the BILs and from the two parental lines, Giza75 and SG 747, using an improved cetyltrimethylammonium bromide (CTAB) method [27]. RNA integrity was assessed by agarose gel electrophoresis from the ratio of the 28S to 18S rRNA bands. A DU®800 Nucleic Acid/Protein Analyzer (Beckman Coulter, Brea, CA, USA) was used to determine the RNA concentration and the OD260/OD280 ratio.
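The spectrophotometric checks above can be sketched as follows, assuming the conventional conversion that an A260 of 1.0 corresponds to about 40 µg/mL of RNA. The 1.8–2.2 purity window is an illustrative acceptance range, not a threshold reported in this study.

```python
# Conventional factor: A260 = 1.0 corresponds to about 40 µg/mL RNA (= 40 ng/µL)
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_qc(a260, a280, dilution=1.0, min_ratio=1.8, max_ratio=2.2):
    """Return (concentration in ng/µL, A260/A280 ratio, purity check result).

    min_ratio and max_ratio are illustrative limits, not values from the study.
    """
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    concentration = a260 * RNA_UG_PER_ML_PER_A260 * dilution  # µg/mL == ng/µL
    ratio = a260 / a280
    return concentration, ratio, min_ratio <= ratio <= max_ratio


print(rna_qc(0.5, 0.25, dilution=10))  # (200.0, 2.0, True)
```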

Microarray hybridizations and data analysis

The commercially available Affymetrix GeneChip® Cotton Genome Array (Affymetrix Inc., Santa Clara, CA), which contains 23,977 probe sets representing 21,854 cotton transcripts from various EST databases, was used in this study. The extracted RNA was sent to Shanghai Seaview Biotech Co., Ltd. for GeneChip analysis. RNA labeling was performed using the Affymetrix GeneChip® 3′ IVT Express kit, and hybridization was performed according to the manufacturer’s protocols. After the chips were scanned with a high-resolution scanner, the data were analyzed. PARTEK software was used for analysis of variance (ANOVA) because it applies the robust multi-array average (RMA) method, which uses only the signal values of perfect match (PM) probes and takes the signal values of all samples into account. RMA is currently one of the most widely used algorithms for processing gene chip data.
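RMA consists of background correction, quantile normalization across arrays, and summarization of log2 probe intensities into probe-set values by median polish. As a minimal sketch of the normalization step only, assuming a small in-memory set of PM intensity columns (one per array), each array's values can be mapped onto a common reference distribution. Background correction and summarization are omitted, and this is not the PARTEK implementation.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equally long sample columns.

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same distribution.
    Ties are broken by original position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value over samples
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```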

Reverse transcription (RT) and quantitative PCR

The cDNA reaction was carried out using the iScript™ cDNA Synthesis kit (Bio-Rad Laboratories, Hercules, CA) according to the manufacturer’s instructions. The reaction performed in the absence of reverse transcriptase was used as a negative control for detecting genomic DNA contamination in the RNA samples. This reaction was then used as a qPCR reaction template to verify that no amplification occurred. After cDNA synthesis, the qPCR reaction was performed with a Bio-Rad CFX96 real-time PCR detection system using iTaq™ SYBR® Green Supermix (Bio-Rad Laboratories). The PCR conditions and the protocols for determining primer efficiency have been previously described [3]. Histone3 (GenBank Accession AF024716) was used as an internal control. Primer sequences are listed in Supplementary Data 1.
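The calculation used for relative expression is not stated here beyond the reference to [3]. Assuming the widely used 2^−ΔΔCt (Livak) approach with Histone3 as the reference gene, and assuming primer efficiency was estimated from the slope of a dilution-series standard curve, the two calculations can be sketched as follows; the calibrator sample and all Ct values are hypothetical.

```python
def primer_efficiency(log10_quantities, cts):
    """Amplification efficiency from a standard curve (Ct vs. log10 input).

    A slope of about -3.32 gives an efficiency of 1.0, i.e. 100 %.
    """
    n = len(cts)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantities, cts))
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    slope = sxy / sxx  # least-squares slope
    return 10 ** (-1.0 / slope) - 1.0


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """2^-ddCt fold change of a sample relative to a calibrator sample.

    ct_ref / ct_ref_cal are reference-gene (here assumed Histone3) Ct values.
    """
    delta_sample = ct_target - ct_ref
    delta_calibrator = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0
```

The 2^−ΔΔCt form assumes that target and reference primers amplify with near-100% efficiency, which is why efficiency is usually checked first.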

Chromosomal locations of DEGs within fiber length QTL regions

To locate the DEGs within fiber length QTL anchored by simple sequence repeat (SSR) markers, we collected the marker names from the previously published [“Guazuncho2” (G. hirsutum) × “VH8-4602” (G. barbadense)] map [32, 33]. The marker sequence data were downloaded from the CottonGen database (https://www.cottongen.org/data/download/marker) (Yu et al. 2014). Next, using the marker and DEG sequences as queries and the genomic sequence [49] as the subject, the chromosomal locations of the markers and DEGs were identified. The chromosomal locations of the fiber length QTL hotspot regions previously reported by Said et al. [32,33,34] and of the 4 fiber length QTL reported by Yu et al. [46] were then determined from the locations of their anchoring markers. DEGs located within these QTL hotspot regions or QTL intervals were identified as candidate genes and used in further analyses. This procedure for locating DEGs within fiber length QTL regions followed Wu et al. [41].
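The interval logic of this co-localization step can be sketched as below. It assumes that the sequence searches have already produced a physical position for each anchoring marker and a span for each DEG; the marker and gene names in the usage example are placeholders, not loci from this study.

```python
def qtl_interval(anchor_markers, marker_positions):
    """Return (chrom, start, end) spanned by a QTL's anchoring markers.

    marker_positions maps marker name -> (chrom, bp) from the sequence search.
    """
    hits = [marker_positions[m] for m in anchor_markers if m in marker_positions]
    chroms = {chrom for chrom, _ in hits}
    if len(chroms) != 1:
        raise ValueError("anchoring markers must map to exactly one chromosome")
    positions = [bp for _, bp in hits]
    return chroms.pop(), min(positions), max(positions)


def degs_in_interval(interval, deg_positions):
    """Names of DEGs whose mapped span overlaps the QTL interval.

    deg_positions maps DEG name -> (chrom, start, end).
    """
    chrom, start, end = interval
    return sorted(
        name
        for name, (c, s, e) in deg_positions.items()
        if c == chrom and s <= end and e >= start  # closed-interval overlap
    )


# Hypothetical markers and genes
markers = {"M1": ("A12", 1000), "M2": ("A12", 5000), "M3": ("D11", 200)}
degs = {"geneA": ("A12", 900, 1100), "geneB": ("A12", 6000, 7000),
        "geneC": ("D11", 100, 300), "geneD": ("A12", 3000, 3500)}
print(degs_in_interval(qtl_interval(["M1", "M2"], markers), degs))  # ['geneA', 'geneD']
```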

Results

Microarray and data analysis

The mature fiber lengths of the LF group (NMGA-062, NMGA-089, NMGA-017, NMGA-051, and NMGA-070) and SF group (NMGA-095, NMGA-026, NMGA-144, NMGA-105, and NMGA-140) BILs averaged 31.79 and 27.35 mm, respectively, as described previously [43] (see “Materials and methods”); these lines were selected for their differences in fiber length (FL) and similarities in other agronomic traits. Their highly similar genetic background was also evident in that they were identical at 96.7% of the SSR and SNP markers assayed (unpublished), which makes them suitable for the current comparative analysis. We selected 10 DPA developing fibers from the LF and SF groups to study the molecular aspects of fiber elongation because the fiber elongation rate at 10 DPA was shown to be highest, with a significant difference in FL between the LF and SF groups [25]. Because the Affymetrix GeneChip Cotton Genome Array interrogates each gene with multiple probes, using it to compare each gene between the LF and SF BIL groups further minimized technical complications. In total, 1478 DEGs between the LF and SF groups were identified in the 10 DPA fiber transcriptome (> twofold change; Bonferroni-corrected P < 0.05) (Supplementary Data 3). Of these, 403 (27.27% of 1478) were upregulated and 1075 (72.73% of 1478) were downregulated in the LF group relative to the SF group.
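The two filters above (fold change and Bonferroni-corrected P value) can be expressed as a short function. This sketch assumes RMA-style log2 group means per probe set and one raw P value per probe set from the ANOVA; it is not the PARTEK workflow itself.

```python
import math


def call_degs(records, fc_cutoff=2.0, alpha=0.05):
    """Split probe sets into DEGs up- or downregulated in LF relative to SF.

    records: (probe_id, log2_mean_lf, log2_mean_sf, raw_p) tuples.
    Bonferroni correction multiplies each raw P by the number of tests (capped at 1).
    """
    records = list(records)
    n_tests = len(records)
    log2_cutoff = math.log2(fc_cutoff)  # twofold change -> |log2 FC| > 1
    up, down = [], []
    for probe, lf, sf, p in records:
        p_adj = min(1.0, p * n_tests)
        if p_adj >= alpha:
            continue
        log2_fc = lf - sf
        if log2_fc > log2_cutoff:
            up.append(probe)
        elif log2_fc < -log2_cutoff:
            down.append(probe)
    return up, down


def percent(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


print(percent(403, 1478), percent(1075, 1478))  # 27.27 72.73
```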

Cluster analysis of the expression data for the 1478 DEGs largely separated the LF lines from the SF lines (Fig. 1). Specifically, NMGA-062 (LF1), NMGA-089 (LF2), NMGA-017 (LF3), and NMGA-070 (LF5) of the LF group showed the highest similarity in gene expression patterns, followed by NMGA-026 (SF2), NMGA-144 (SF3), NMGA-105 (SF4), and NMGA-140 (SF5) of the SF group. This result supports the reliability of the selected BILs and of the grouping method used in this study. However, one of the five LF lines, NMGA-051 (LF4), grouped with the SF group, indicating that high similarity in phenotypes (here lint yield, lint percentage, boll weight, Verticillium wilt index, and uniformity) may not always be reflected in high similarity in gene expression. The DEGs identified between the two BIL groups may reflect a common genetic regulatory mechanism for genes associated with fiber development (more specifically, fiber length-related genes) in cotton.
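The clustering method used for Fig. 1 is not specified. A common choice for expression profiles is agglomerative clustering with average linkage on a 1 − Pearson correlation distance, sketched below under that assumption with toy profiles rather than study data.

```python
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long profiles (non-constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles: dict sample -> expression values over the same genes.
    Merges the closest pair of clusters until n_clusters remain.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters
                d = statistics.fmean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


toy = {"A1": [1, 2, 3, 4], "A2": [1, 2, 3, 5], "B1": [4, 3, 2, 1], "B2": [5, 3, 2, 1]}
print(average_linkage(toy, n_clusters=2))
```

Correlation distance groups samples by the shape of their expression profiles rather than by absolute signal level, which suits comparisons of lines with shared genetic backgrounds.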

Fig. 1

Clustering of differentially expressed genes. LF1:NMGA-062; LF2:NMGA-089; LF3:NMGA-017; LF4:NMGA-051; LF5:NMGA-070; SF1:NMGA-095; SF2:NMGA-026; SF3:NMGA-144; SF4:NMGA-105; SF5:NMGA-140

Candidate DEGs within the fiber length QTL regions identified in the BIL population

The chromosomal distribution of the DEGs identified in this study could be determined using the completed G. hirsutum (AD) genome sequence [24, 49]. According to published cotton genome data, 420 of the 1478 DEGs were mapped to At_chr1–At_chr13 in the At subgenome, 727 were mapped to Dt_chr1–Dt_chr13 in the Dt subgenome, and 121 DEGs were mapped to unanchored scaffolds. Co-localization of QTL with the 1478 identified DEGs will help identify candidate genes for fiber length (Supplementary Data 4). By comparing the DEGs with the 4 fiber length QTL [46] and the 13 fiber length QTL hotspots determined in a previous meta-analysis [33], we mapped 166 DEGs within these regions (Supplementary Data 5 and Fig. 2). The 166 genes were distributed on A01 (5 DEGs), A03 (12), A04 (4), A05 (2), A07 (4), A11 (10), A12 (9), D05 (54), D08 (30), D11 (3), and D12 (33). Of the 166 DEGs, 145 were assigned to 32 GO terms in the biological process, cellular component, and molecular function categories (Fig. 3). In the cellular component category, the DEGs were classified as intracellular (42 genes, 25.30% of 166 DEGs), intracellular part (41 genes, 24.70%), and organelle (36 genes, 21.69%). In the biological process category, organic substance metabolic process (56 genes, 33.73%), primary metabolic process (54 genes, 32.53%), and cellular metabolic process (49 genes, 29.52%) contained the largest numbers of DEGs. In the molecular function category, organic cyclic compound binding (34 genes, 20.48%), heterocyclic compound binding (34 genes, 20.48%), and ion binding (30 genes, 18.07%) were the three most abundant subclasses, followed by transferase activity, hydrolase activity, small molecule binding, and protein binding. KOBAS was used to further analyze the 166 DEGs identified by the microarray analysis to identify the metabolic pathways in which they function; it mapped 49 of the 166 DEGs to 31 KEGG pathways. The pathways with the highest unigene representation were purine metabolism (8, 25.81%), thiamine metabolism (5, 16.13%), and cysteine and methionine metabolism (3, 16.13%) (Fig. 4).
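KOBAS tests whether a pathway is over-represented among the query genes relative to a background gene set, and the hypergeometric test is one of the tests it offers. A minimal sketch of the upper-tail hypergeometric P value, using only the standard library, is shown below; the parameter names are illustrative, and multiple-testing correction across pathways is not included.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes, K: background genes in the pathway,
    n: annotated query genes (e.g. DEGs), k: query genes in the pathway.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


print(hypergeom_enrichment_p(3, 3, 5, 10))  # 1/12, about 0.0833
```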

Fig. 2

Mapping of DEGs with fiber length QTL hotspots in G. hirsutum

Fig. 3

Distribution of the gene ontology functional classifications of 166 differentially expressed genes

Fig. 4

Significantly enriched pathways of 166 differentially expressed genes identified using KOBAS

A GO enrichment analysis was performed separately on the 44 genes upregulated in the LF group and the 122 genes upregulated in the SF group. The 44 DEGs upregulated in the LF group were assigned to 39 GO terms in the biological process, cellular component, and molecular function categories (Fig. 5). In the biological process category, organic substance metabolic process (25 genes, 56.82% of the 44 upregulated DEGs) and primary metabolic process (24 genes, 54.55%) contained the largest numbers of DEGs. In the molecular function category, organic cyclic compound binding (11 genes, 25.00%) and heterocyclic compound binding (11 genes, 25.00%) were the two most abundant subclasses, followed by ion binding, transferase activity, small molecule binding, carbohydrate derivative binding, and hydrolase activity. In the cellular component category, the DEGs were classified as intracellular part (11 genes, 25.00%) and intracellular (11 genes, 25.00%). The 122 DEGs upregulated in the SF group were assigned to 33 GO terms in the same three categories (Fig. 6). In the biological process category, organic substance metabolic process (31 genes, 25.41% of the 122 DEGs) and primary metabolic process (30 genes, 24.59%) accounted for the largest numbers of DEGs. In the molecular function category, organic cyclic compound binding (23 genes, 18.85%) and heterocyclic compound binding (23 genes, 18.85%) were the two most abundant subclasses, followed by ion binding, hydrolase activity, transferase activity, protein binding, small molecule binding, and oxidoreductase activity. In the cellular component category, the DEGs were classified as intracellular (31 genes, 25.41%) and intracellular part (30 genes, 24.59%).

Fig. 5

Distribution of the gene ontology functional classifications of 44 differentially expressed genes

Fig. 6

Distribution of the gene ontology functional classifications of 122 differentially expressed genes

Because the 10 BILs were selected from the G. hirsutum × G. barbadense BIL population [46], we focused on the 4 fiber length QTL identified in this population to identify fiber length-related DEGs. Two of the 4 fiber length QTL co-localized with 6 DEGs (Supplementary Data 6): qFL-08A-c12-1 on At_chr12 and qFL-08A-c21-1 on Dt_chr11, each carrying three co-localized DEGs. The three DEGs co-localized with qFL-08A-c12-1 on At_chr12 encode a kinesin-like protein, a probable lipid phosphate phosphatase beta protein, and an unknown protein. The three DEGs co-localized with qFL-08A-c21-1 on Dt_chr11 encode a peroxisomal acyl-coenzyme A oxidase 1-like protein, dual-specificity protein phosphatase 8 (DSP8), and an unknown protein.

Mapping more DEGs with the previously reported fiber length QTL hotspots in the G. hirsutum genome

Because only 2 fiber length QTL in the BIL population, from which the 10 lines used in this study were drawn, co-localized with DEGs, we also used the 13 fiber length QTL hotspots previously reported in a meta-analysis [33] to identify additional fiber length-related genes. A total of 160 DEGs co-localized with 12 of these QTL hotspots. The 12 fiber length QTL hotspots containing DEGs were located on 10 chromosomes, 7 in the At subgenome and 3 in the Dt subgenome (At_chr1, At_chr3, At_chr4, At_chr5, At_chr7, At_chr11, At_chr12, Dt_chr5, Dt_chr8, and Dt_chr12).

Five of these 12 fiber length QTL hotspots were located on two pairs of homeologous chromosomes (At_chr5 vs. Dt_chr5 and At_chr12 vs. Dt_chr12). The fiber length (FL) hotspot FL_QTL_Hotspot(4) was identified at 0.19–2.08 cM on At_chr5 and contained 2 DEGs, encoding the uncharacterized protein LOC105770639 and an unknown protein. FL_QTL_Hotspot(4) at 0.04–8.96 cM on Dt_chr5 contained 33 DEGs, including genes encoding spx domain-containing protein 1-like, nad kinase 1-like, monogalactosyldiacylglycerol synthase chloroplastic-like, glucan endo-1,3-beta-glucosidase 14-like isoform X1, vacuolar sorting-associated protein 4, calcineurin b-like protein 10, and jasmonate ZIM-domain protein isoform 1. FL_QTL_Hotspot(6) at 8.96–16.75 cM on Dt_chr5 contained 21 DEGs, including genes encoding pleiotropic drug resistance protein 1-like, plac8 family protein, 1-aminocyclopropane-1-carboxylate oxidase, beta-galactosidase 1-like, casein kinase i isoform delta-like, and calmodulin-binding protein. Because these three hotspots were not located in corresponding regions of the two homeologous chromosomes (At_chr5 vs. Dt_chr5), no common DEGs were identified.

Similarly, no common DEGs were identified for the hotspots on the other pair of homeologous chromosomes (At_chr12 vs. Dt_chr12). FL_QTL_Hotspot(4) was identified at 70.04–76.00 cM on At_chr12 and contained five DEGs encoding heat shock factor protein 5, AP2/ERF and B3 domain-containing transcription factor rav1-like, ubiquitin-conjugating enzyme E2 2, probable signal peptidase complex subunit 1, and an unknown protein. By comparison, FL_QTL_Hotspot(6) was identified at 25.00–45.88 and 50.67–59.04 cM on Dt_chr12 and contained 33 DEGs, including genes encoding nucleotidyltransferase family protein isoform 1, skp1 ask-interacting protein 16 isoform 1, and fasciclin-like arabinogalactan protein 19.

In addition, several hotspots were located on other At subgenome chromosomes. FL_QTL_Hotspot(10), spanning 17.66–36.80 cM on At_chr1, contained five genes, encoding glycine-rich RNA-binding protein mitochondrial-like, mfp1 attachment factor 1-like, n-lysine methyltransferase SEE1, ribonuclease p protein subunit p25-like protein isoform X1, and an unknown protein. FL_QTL_Hotspot(6), spanning 2.82–9.85 cM on At_chr3, contained six genes, encoding CBL-interacting serine/threonine-protein kinase 11, a haloacid dehalogenase-like hydrolase superfamily protein, inner membrane protein PPF-chloroplastic isoform X1, an expansin protein, a glutaredoxin family protein, and an unknown protein. FL_QTL_Hotspot(7), spanning 35.00–81.57 cM on At_chr3, contained six genes, encoding male gametophyte defective 2 isoform 1, E3 ubiquitin-protein ligase prt1-like, protein salt inducible-like partial, E3 ubiquitin-protein ligase rnf182-like, probable calcium-binding protein cml27, and NEDD8-activating enzyme E1 regulatory subunit. FL_QTL_Hotspot(5), spanning 40.41–53.33 cM on At_chr4, contained four genes, encoding transducin family protein wd-40 repeat family isoform partial, protein gpr107-like, protein tip41-like, and kinesin-like protein kif22. FL_QTL_Hotspot(4), spanning 4.69–9.28 cM on At_chr7, contained four genes, encoding protein cyclin-h1-1 isoform X2, homeobox-leucine zipper protein hat5-like, 40S ribosomal protein s7, and ribosomal protein s21 family protein isoform 1.

One additional Dt subgenome chromosome carried a fiber length QTL hotspot. FL_QTL_Hotspot(7), located at 32.00–58.05 cM on Dt_chr8, contained 30 DEGs encoding proteins such as expansin-like b1, sucrose synthase 1, the SNARE-interacting protein KEULE, nuclear pore complex protein NUP98a, protein trichome birefringence-like 43, and a bromo-adjacent domain-containing protein.

Candidate DEG expression analysis

Two DEGs upregulated and four downregulated in the LF group, identified by microarray analysis and co-localized with fiber length QTL, were further analyzed by qRT-PCR in developing fibers at 5, 10, 15, 20, and 25 DPA. The six genes were kinesin-like protein (GenBank no. DT054207.1, designated GhKIF), probable lipid phosphate phosphatase beta (GenBank no. DW502593.1, designated GhPAP2), peroxisomal acyl-coenzyme A oxidase 1-like (GenBank no. DT462539.1, designated GhACX), dual-specificity protein phosphatase dsp8 (GenBank no. DT051818.1, designated GhDSP8), and two genes of unknown function (GenBank no. DT052644.1 and GenBank no. DT455687.1) (Fig. 7).

Fig. 7

Expression profile of GhKIF, GhACX, Unknown1, GhPAP2, GhDSP8, and Unknown2 in developing fibers at 5–25 days post anthesis (DPA). a Kinesin-like protein (GenBank no. DT054207.1, designated GhKIF); b Peroxisomal acyl-coenzyme A oxidase 1-like (GenBank no. DT462539.1, designated GhACX); c Unknown1 (GenBank no. DT052644.1); d Probable lipid phosphate phosphatase beta (GenBank no. DW502593.1, designated GhPAP2); e Dual-specificity protein phosphatase dsp8 (GenBank no. DT051818.1, designated GhDSP8); f Unknown2 (GenBank no. DT455687.1). Long-fiber group, LF1: NMGA-062; LF2: NMGA-089; LF3: NMGA-017; LF4: NMGA-051; and LF5: NMGA-070. Short-fiber group, SF1: NMGA-095; SF2: NMGA-026; SF3: NMGA-144; SF4: NMGA-105; SF5: NMGA-140

The expression of GhKIF in 25 DPA fibers was higher than that at the other fiber developmental stages, and its expression in the LF group was lower than that in the SF group at 5–20 DPA (Fig. 7a). In the SF group, GhACX mRNA expression was lowest in 5 DPA fibers and then increased steadily from 5 to 20 DPA; in the LF lines, by contrast, its expression was highest in 5 DPA fibers, decreased at 10–15 DPA, and then increased again at 20–25 DPA (Fig. 7b). Unknown1 mRNA expression was lower in 5 DPA fibers than at the other developmental stages, and lower in the LF group than in the SF group at 5–15 DPA (Fig. 7c). GhPAP2 expression was higher in the SF group than in the LF group in developing fibers, suggesting that this gene may be negatively correlated with fiber development (Fig. 7d). GhDSP8 expression was higher in 20 DPA fibers than at the other developmental stages, and lower in the LF group than in the SF group at 5 and 15–25 DPA (Fig. 7e). In the LF group, Unknown2 mRNA expression was lower in 25 DPA fibers than at the other developmental stages, whereas in the SF lines its expression was highest in 25 DPA fibers (Fig. 7f); that is, Unknown2 showed opposite trends in the LF and SF groups across 5–25 DPA. Although GhACX and GhDSP8 were upregulated in the LF group in the microarray analysis, qRT-PCR detected no significant difference between the LF and SF groups for these two genes. Overall, the differential expression of the six genes between the LF and SF groups was relatively congruent between the microarray and qRT-PCR results. Although GhKIF and GhACX were downregulated in the LF group at 5–25 DPA, their expression increased progressively during fiber development, suggesting that they may play important roles in fiber development.
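Agreement between the microarray and qRT-PCR results can be summarized by checking whether each gene's LF-versus-SF change has the same direction in both assays. The sketch below assumes log2 fold changes are available from both methods and uses a hypothetical threshold (min_abs) below which a change is treated as no change; the values in the example are illustrative, not measurements from Fig. 7.

```python
def direction_agreement(array_log2fc, qpcr_log2fc, min_abs=0.0):
    """Fraction of shared genes whose change has the same direction on both platforms.

    Changes with |log2 FC| <= min_abs are treated as 'no change' on either platform.
    Returns (fraction, sorted list of agreeing genes).
    """
    genes = sorted(set(array_log2fc) & set(qpcr_log2fc))
    if not genes:
        raise ValueError("no genes measured on both platforms")

    def sign(value):
        if abs(value) <= min_abs:
            return 0
        return 1 if value > 0 else -1

    agree = [g for g in genes if sign(array_log2fc[g]) == sign(qpcr_log2fc[g])]
    return len(agree) / len(genes), agree


# Hypothetical log2 fold changes (LF vs. SF)
array_fc = {"g1": 1.5, "g2": -2.0, "g3": 1.2}
qpcr_fc = {"g1": 0.9, "g2": -1.1, "g3": 0.1}
print(direction_agreement(array_fc, qpcr_fc, min_abs=0.5))
```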

Discussion

In the present study, we performed a microarray-based comparative transcriptome analysis of 10 DPA fibers from two groups of BILs that differed in fiber length but were similar in other traits (long fiber, LF, vs. short fiber, SF). A total of 1478 DEGs that may be involved in fiber elongation were identified. We mapped 166 DEGs to 2 fiber length QTL and 12 fiber length QTL hotspots that were previously reported [33, 46]. Most importantly, six DEGs co-localized with the two FL QTL identified in the BIL population from which the two groups of lines were selected; these genes are the focus of the current study. Mining the DEGs in these QTL and QTL hotspot regions helps to identify key candidate genes for FL. Our chip-based comparative DEG analysis, combined with physical mapping and FL QTL mapping, identified several candidate genes that may underlie the genetic control of FL differences between G. barbadense and G. hirsutum. To the best of our knowledge, this is the first fiber quality DEG study to combine gene-chip technology and genotyping to detect candidate genes for FL; integrating QTL mapping with physical mapping on a sequenced genome is an effective strategy for narrowing down the number of candidate genes. This study paves the way for the future identification of candidate genes for fiber traits in cotton.

In the current study, 1478 DEGs between the “long” and “short” groups of BILs were identified in the 10 DPA fiber transcriptomes, and 166 DEGs were mapped to 2 fiber length QTL and 12 fiber length QTL hotspots that were previously reported [33, 46]. The 2 QTL for fiber length were qFL-08A-c12-1 on At_chr12 and qFL-08A-c21-1 on Dt_chr11. The qFL-08A-c12-1 QTL on At_chr12 included three co-localized DEGs, encoding a kinesin-like protein, a probable lipid phosphate phosphatase beta protein, and an unknown protein. The qFL-08A-c21-1 QTL on Dt_chr11 also included three co-localized DEGs, encoding a peroxisomal acyl-coenzyme A oxidase 1-like protein, dual-specificity protein phosphatase 8 (DSP8), and an unknown protein. Li et al. [25] found that eight DEGs co-localized with 4 FL QTL identified in the BIL population containing the two BILs they compared. These 4 FL QTL (qFL-07X-c5-1, qFL-07W-c11-1, qFL-08A-c21-1, and qFL-08A-c12-1) were mapped onto A05, A11, D11, and A12, respectively. The QTL regions of qFL-08A-c21-1 and qFL-08A-c12-1 were consistent between the results of Li et al. and our findings, but none of the same genes were identified. This study provides a glimpse into the relationship between DEGs and the genetic variation of fiber length in cotton.

Six DEGs identified in the LF group through microarray analysis that co-localized with fiber length QTL were further analyzed using qRT-PCR. In the microarray analysis, the expression levels of the peroxisomal acyl-coenzyme A oxidase 1-like gene GhACX and the dual-specificity protein phosphatase gene GhDSP8 were higher in the LF group than in the SF group at 10 DPA, suggesting that these genes may be positively associated with rapid fiber elongation. The expression levels of the kinesin-like protein gene GhKIF, the probable lipid phosphate phosphatase beta gene GhPAP2, and the two unknown genes were lower in the LF group than in the SF group at 10 DPA. GhACX is located in the c21_qFL-08A-c21-1 QTL for fiber length. Acyl-CoA oxidase (ACX) catalyzes the conversion of acyl-CoA to 2-trans-enoyl-CoA, the first reaction of the β-oxidation pathway [14, 17]. An ACX involved in the oxidation of fatty acids has been cloned from Arabidopsis thaliana [13]. Fatty acid metabolism-related genes have also been reported to be involved in fiber development, because rapid fiber cell elongation requires organelles and membranes whose supply depends on fatty acid metabolism [31, 38]. Dong et al. found that GhACX transcripts were most abundant in seeds at 25 DPA, indicating that the gene plays an important role in fatty acid metabolism during cotton seed formation [7]. In this study, the peroxisomal acyl-coenzyme A oxidase 1-like gene was expressed at relatively high levels in developing fibers at 25 DPA and was upregulated in the LF group, suggesting that it is preferentially expressed during fiber development and may be positively correlated with fiber length. GhKIF expression levels were higher in the SF group than in the LF group at 25 DPA. Previous studies have identified more than a dozen kinesins that are abundantly expressed during fiber development in Upland cotton [15, 18, 29, 30, 37]. Wu et al. found that GhKIF11 expression levels were higher in the LY (low-yield) group than in the HY (high-yield) group at 15 and 20 DPA, but this gene was not located in any of the QTL regions for lint yield or yield traits [41]. In the present study, this gene is located in the c12_qFL-08A-c12-1 QTL for fiber length; GhKIF11 is therefore a plausible candidate gene for this fiber length QTL.

Through a comparative genome-wide gene expression analysis of two groups of BILs with contrasting fiber lengths and co-localization of DEGs with fiber length QTL, the 1478 DEGs were narrowed down to 166 genes in QTL regions. Several genes that may underlie the genetic control of fiber length differences between G. barbadense and G. hirsutum, including GhACX and GhKIF, emerged as candidate genes for fiber length QTL. These results provide a list of candidate genes for fine mapping and map-based cloning of fiber length QTL, which will facilitate targeted gene transfer from Sea Island cotton (G. barbadense) to Upland cotton (G. hirsutum) to further improve fiber quality.