Introduction

As a leading global fiber crop, cotton provides most of the natural fiber for the textile industry. Furthermore, the cottonseed by-product is also a good natural source of oil and protein, and plays important roles in world oil and livestock feed product markets. Four species of the Gossypium genus are cultivated, including two diploid species, G. arboreum and G. herbaceum, and two allotetraploid species, G. barbadense and G. hirsutum (Wendel and Cronn 2003). G. hirsutum (upland cotton) is the most widely grown worldwide, accounting for 95 % of both acreage and fiber production (Chen et al. 2007).

Cotton fiber, the most important product of the cotton plant, is a highly elongated single cell derived from the ovule epidermis; its development synchronizes with and depends on seed development. Cottonseed development is a highly programmed and regulated process that shapes cotton fiber yield, quality, and seed nutrients. Lint percentage (LP), a component of lint yield and a critical economic index for cotton cultivars, is closely related to lint yield improvement (Culp and Harrell 1975; Meredith 1984; Zeng and Meredith 2009). Previous research has demonstrated that lint percentage is a quantitative, stably inherited trait (Meredith 1984). Many quantitative trait loci (QTL) affecting lint percentage have been identified and mapped with molecular markers in different populations (Zhang et al. 2005; Shen et al. 2007; Wang et al. 2007; Qin et al. 2008; Wu et al. 2009; An et al. 2010; Yu et al. 2013; Zhe et al. 2014). Although these QTL have helped reveal the landscape of genetic factors controlling lint percentage, most were either detected in only a single environment or have small effects. They therefore exhibit low reliability and stability, which limits their application.

Although cottonseed nutrients are not as important as fiber yield or quality traits, they are of interest to researchers. The major concerns in terms of cottonseed nutrient traits are oil and protein content. Generally, cottonseed has an oil content ranging from 13.6 to 24.7 % and a protein content ranging from 12 to 23 % (Lukonge et al. 2007; Dowd et al. 2010). Significant relationships have been detected between cotton fiber yield and seed nutrient traits (Dani and Pundarikakshudu 1986; Ashokkumar and Ravikesavan 2013). The relationship of most concern is the negative relationship between seed oil content and fiber yield (Turner et al. 1976), which indicates competition for carbohydrates made from photosynthesis. However, cottonseed oil content is also negatively related to seed protein content (Hanny et al. 1978; Kohel and Cherry 1983; Sun et al. 1987). Cottonseed nutrient traits are also quantitative and mainly affected by genotype (Anderson and Worthington 1971; Dowd et al. 2010). However, few studies on QTL mapping for cottonseed nutrient traits have been reported to date (Song and Zhang 2007; Liu et al. 2010; Yu et al. 2012).

In this study, we aimed to comprehensively map the loci associated with lint percentage and cottonseed nutrient traits based on a recombinant inbred line mapping population. The results will help us further understand the genetic mechanism of lint percentage and cottonseed nutrient traits and the relationship between them. The particularly stable QTL and the co-located QTL identified here will further facilitate mining of the genetic factors underlying lint percentage and cottonseed nutrient traits and provide a means for molecular marker-assisted selective breeding.

Materials and methods

Mapping population

Cultivar Yumian 1, bred by our laboratory, is characterized by high lint yield and high fiber strength. T586, known as the multiple dominant gene line, has nine morphological markers (R1, red plant; T1, pubescent; L2, okra leaf; R2, petal spot; Y1, yellow flower; P1, yellow pollen; Lc1, brown fiber; N1, naked seed; Lg, green lint) (Endrizzi et al. 1984). The population was developed from the two parents in summer 2000 at Southwest University (SWU), Chongqing, China. F1 individual plants were self-pollinated, and F2 seeds were harvested in winter 2000 on Hainan Island (Zhang et al. 2005). Two hundred and seventy F2 individuals were hand-harvested randomly and planted to obtain F2:3 family lines. One individual of each F2:3 family line was randomly selected to produce the next generation. This procedure was continued in the following generations until an F2:7 recombinant inbred line population was obtained. A completely randomized design was used to arrange the lines in the field. Parents and recombinant inbred lines were grown in single-row plots 0.7 m wide and 5 m long; they were planted in April and harvested in October in Chongqing from 2004 to 2012.

Trait examination

All naturally opened bolls were hand-harvested and ginned. Delinted seeds were dried at 105 °C for 2 h in a forced-air oven, and then deshelled and ground into powder to determine the crude oil (CO) and crude protein (CP) content. CO and CP content (on a dry weight of kernel powder basis) was measured following the methods described by Ye et al. (2003). A GC 2010 gas chromatography system (Shimadzu Co., Ltd., Tokyo, Japan) was used to analyze fatty acid components, as described by Dowd et al. (2010). LP of a given line was measured by the weight ratio of lint to cottonseed. CO and CP percentages were determined by dividing delinted seed CO or CP percentage by embryo percentage. Cottonseed oil consists of four major fatty acids, which were expressed as a percentage (%) of CO: linoleic acid (18:2, LA), stearic acid (18:0, SA), oleic acid (18:1, OA), and palmitic acid (16:0, PA).
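The trait calculations described above can be sketched as follows. This is only an illustration and not the authors' procedure: the function names and example weights are hypothetical, and LP is computed here as lint weight over the combined lint-plus-seed weight, which is an assumption about how the lint-to-cottonseed ratio was formed. The kernel-basis conversion follows the division by embryo percentage stated in the text.

```python
def lint_percentage(lint_weight_g, seed_weight_g):
    """Lint weight as a percentage of lint + seed weight.

    Assumption: the denominator is the combined weight of lint and seed;
    the text only says "weight ratio of lint to cottonseed".
    """
    total = lint_weight_g + seed_weight_g
    if total <= 0:
        raise ValueError("total weight must be positive")
    return 100.0 * lint_weight_g / total


def kernel_basis_percentage(seed_basis_pct, embryo_pct):
    """Convert a content measured on a delinted-seed basis (%) to a kernel basis.

    embryo_pct is the embryo (kernel) share of the delinted seed, in percent.
    """
    if not 0 < embryo_pct <= 100:
        raise ValueError("embryo percentage must be in (0, 100]")
    return seed_basis_pct / (embryo_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical sample: 40 g lint and 60 g seed from one line.
    print(lint_percentage(40.0, 60.0))
    # Hypothetical sample: 18 % oil on a seed basis, embryo is 60 % of the seed.
    print(kernel_basis_percentage(18.0, 60.0))
```

Dividing by the embryo fraction rescales a whole-seed measurement to the kernel alone, so the kernel-basis value is always at least as large as the seed-basis value.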

SSR analysis and genetic map construction

Genomic DNA from the parents and the RIL population was isolated from leaf tissue by the CTAB method (Zhang et al. 2005). PCR was conducted in a total volume of 10 μl containing 50 ng of cotton DNA, 1 × PCR buffer, 2.0 mM MgCl2, 0.2 mM dNTPs, 0.5 μM of each primer, and 0.5 units of Taq DNA polymerase (Shanghai Sangon, China). The PCR conditions were as follows: 94 °C for 5 min; 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 1 min; 72 °C for 10 min; and a hold at 4 °C. After amplification, the PCR products were mixed with loading buffer (2.5 mg/ml bromophenol blue, 2.5 mg/ml diphenylamine blue) and then kept at 4 °C. The PCR products were separated on 10 % (w/v) polyacrylamide gels and visualized by silver staining.

A total of 25,313 pairs of microsatellite marker primers, including 18,358 pairs from the cotton marker database (http://www.cottonmarker.org/) and 6955 pairs from our laboratory (Tang et al. 2015), were synthesized by Shanghai Invitrogen and Shanghai Sangon. The primer pairs showing polymorphism between the mapping parents were used to genotype 270 recombinant inbred lines. Marker nomenclature was the same as the primer name. For multiple polymorphic loci revealed by the same primer pair, marker nomenclature consisted of the primer name and a letter a/b/c indicating the polymorphic fragment size from the smallest to the largest.
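The naming rule for loci can be illustrated with a short sketch. The function name `name_loci` and the fragment sizes are hypothetical; only the rule itself (primer name alone for a single locus, primer name plus a/b/c from the smallest to the largest fragment otherwise) comes from the text.

```python
from typing import Dict, List


def name_loci(primer: str, fragment_sizes_bp: List[int]) -> Dict[int, str]:
    """Map each polymorphic fragment size to a locus name.

    A primer pair with one polymorphic fragment keeps the primer name.
    With several fragments, a letter suffix is added in order of
    increasing fragment size.
    """
    sizes = sorted(set(fragment_sizes_bp))
    if not sizes:
        raise ValueError("at least one polymorphic fragment is required")
    if len(sizes) > 26:
        raise ValueError("more fragments than available letter suffixes")
    if len(sizes) == 1:
        return {sizes[0]: primer}
    return {size: primer + chr(ord("a") + i) for i, size in enumerate(sizes)}


if __name__ == "__main__":
    # Hypothetical fragment sizes (bp) for a primer pair revealing three loci.
    print(name_loci("NAU3415", [250, 210, 230]))
```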

JoinMap 4.0 (Van Ooijen 2006) was used to group and order loci with a log of odds (LOD) threshold range of 4–8. Locus positions derived from previous maps (Zhang et al. 2009; Yu et al. 2011; Blenda et al. 2012) were used to assign linkage groups to putative chromosomes. Linkage groups assigned to a given chromosome were then treated as separate data sets and grouped and ordered at LOD values between 1 and 4. Map distances were calculated using Kosambi’s mapping function.
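For reference, Kosambi's mapping function converts a recombination fraction r into a map distance d = (1/4) ln[(1 + 2r)/(1 − 2r)] Morgans. The sketch below shows the conversion in both directions; the function names are illustrative and the code is independent of JoinMap, which performs this step internally.

```python
import math


def kosambi_cm(r: float) -> float:
    """Map distance in centimorgans for recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse: recombination fraction for a map distance in centimorgans."""
    if d_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.20):
        print(r, round(kosambi_cm(r), 2))
```

For small r the distance in cM is close to 100r; the correction for interference grows as r approaches 0.5.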

QTL analysis

MapQTL 6.0 (Van Ooijen 2009) was used to identify QTL for LP and seed nutrient traits. LOD ≥2.0 was used to declare suggestive QTL, as suggested by Lander and Kruglyak (1995), which has been used previously in cotton QTL identification (Shen et al. 2007; Qin et al. 2008). Graphic representation of the linkage groups and QTL was created in MapChart 2.2. QTL names start with “q”, followed by the trait abbreviation (e.g., LP for lint percentage), the name of the chromosome and then the number of QTL affecting the trait on the same chromosome.
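The naming convention and the suggestive threshold can be expressed compactly. This is a sketch with hypothetical helper names; it assumes two-digit chromosome numbers and a dot before the within-chromosome index, as in names such as qLP06.1 used in the Results.

```python
import re
from typing import Tuple

SUGGESTIVE_LOD = 2.0  # threshold stated in the text

_QTL_NAME = re.compile(r"^q([A-Z]+)(\d{2})\.(\d+)$")


def qtl_name(trait: str, chromosome: int, index: int) -> str:
    """Build a name such as qLP06.1 from trait, chromosome and index."""
    return "q{}{:02d}.{}".format(trait, chromosome, index)


def parse_qtl_name(name: str) -> Tuple[str, int, int]:
    """Split a QTL name into (trait, chromosome, index)."""
    match = _QTL_NAME.match(name)
    if match is None:
        raise ValueError("not a valid QTL name: " + name)
    trait, chromosome, index = match.groups()
    return trait, int(chromosome), int(index)


def is_suggestive(lod: float, threshold: float = SUGGESTIVE_LOD) -> bool:
    """True when the LOD score reaches the suggestive threshold."""
    return lod >= threshold


if __name__ == "__main__":
    print(qtl_name("LP", 6, 1))
    print(parse_qtl_name("qSA14.1"))
```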

Results

Phenotypic data analysis

Descriptive statistics for LP data across ten environments and seed nutrient trait data across three environments are summarized in Table 1. Significant differences were observed between parents for all of the traits except stearic acid content. For the recombinant inbred line population, all of the traits showed transgressive segregation in different environments, and the skewness and kurtosis values indicated that these traits were all approximately normally distributed. The correlation analysis based on data from three environments for the recombinant inbred line population is shown in Table 2. Three relationships were clear and stable across all three environments. First, LP was significantly positively correlated with CP content but negatively correlated with CO content. Second, CP content was significantly negatively correlated with CO content and oil components. Finally, linoleic acid content was significantly negatively correlated with oleic acid and palmitic acid content. The other correlations were not consistently significant across environments and were therefore considered unreliable.

Table 1 Phenotypic variation in lint percentage and seed nutrient traits in the recombinant inbred line population
Table 2 Correlation coefficients among all traits

Updated genetic map

A total of 25,313 microsatellite marker primer pairs were employed to screen for polymorphisms between the parents, and 1712 primer pairs revealed clear polymorphism, accounting for 6.8 % of the total primers. The polymorphic primers were used to genotype the recombinant inbred line population, and 1792 loci were obtained, including 509 SSR (Zhang et al. 2009) and 32 SSR from transcription factors (Li et al. 2012). Among the polymorphic primers, 70 primer pairs amplified two loci, two amplified three loci, and two amplified four loci. Among the 1801 loci, 320 (17.7 %) exhibited segregation distortion (P < 0.05), with 226 (70.6 %) favoring the Yumian 1 alleles and 94 (29.4 %) favoring the T586 alleles. All of the 1801 loci and nine morphological marker loci were applied to construct the genetic map. We mapped 1675 SSR and nine morphological marker loci onto 26 upland cotton chromosomes. In general, the loci were evenly distributed along the genome, but some chromosomes had more markers than others; for example, Chr08 had 120 loci while Chr04 had only 27. The total length of this map was 3338.2 cM with an average of 1.98 cM between adjacent markers (Table 3; Fig. 1). The average chromosome length was 128.39 cM, with the longest chromosome (Chr19) spanning 172.6 cM and the shortest (Chr01) spanning 90.9 cM. Chr06 was the densest, with an average of 0.6 cM between adjacent markers, while Chr04 was the sparsest, with an average of 2.94 cM. The At-subgenome spanned 1466.7 cM, containing 726 markers with an average marker interval of 2.02 cM. The Dt-subgenome spanned 1871.5 cM, containing 958 markers with an average marker interval of 1.95 cM. There were 13 gaps (marker interval >10 cM) on this genetic map, with eight on the At-subgenome and five on the Dt-subgenome. The largest gap, on Chr11, spanned 22.9 cM. More detailed information on the genetic map is given in Table 3 and Fig. 1.
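Several of the summary figures in this paragraph follow directly from the reported counts, as the short check below shows. The variable names are illustrative; all input values are taken from the text, and the average interval is computed as map length divided by the number of mapped loci, which is an assumption about how the reported average was obtained.

```python
# Counts reported in the text.
primer_pairs_screened = 25313
polymorphic_pairs = 1712
multi_locus_pairs = {2: 70, 3: 2, 4: 2}  # loci per pair -> number of pairs

mapped_ssr_loci = 1675
mapped_morphological_loci = 9
map_length_cm = 3338.2
chromosomes = 26

# Polymorphism rate among screened primer pairs (reported as 6.8 %).
polymorphism_rate_pct = 100.0 * polymorphic_pairs / primer_pairs_screened

# Total loci: single-locus pairs contribute one locus each,
# multi-locus pairs contribute two, three or four.
single_locus_pairs = polymorphic_pairs - sum(multi_locus_pairs.values())
total_loci = single_locus_pairs + sum(
    loci * pairs for loci, pairs in multi_locus_pairs.items()
)

# Map-wide averages (reported as 1.98 cM and 128.39 cM).
mapped_loci = mapped_ssr_loci + mapped_morphological_loci
mean_interval_cm = map_length_cm / mapped_loci
mean_chromosome_length_cm = map_length_cm / chromosomes

if __name__ == "__main__":
    print(round(polymorphism_rate_pct, 1))
    print(total_loci)
    print(round(mean_interval_cm, 2))
    print(round(mean_chromosome_length_cm, 2))
```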

Table 3 Marker distribution on chromosomes in the map developed from the recombinant inbred line population
Fig. 1 QTL for lint percentage, seed protein, oil, and fatty acid components in upland cotton (T586 × Yumian1) recombinant inbred line population. Morphological loci are shown in bold and italics. Map distances are given in centimorgans (cM). Markers showing segregation distortion are indicated by asterisks (* P < 0.05; ** P < 0.01; *** P < 0.001) for markers skewed toward the Yumian 1 alleles or plus signs (+ P < 0.05; ++ P < 0.01; +++ P < 0.001) for markers skewed toward the T586 alleles. Bars along the linkage groups indicate 1-LOD likelihood intervals for QTL. QTL are shown as lint percentage (LP), crude protein content (CP), crude oil content (CO), linoleic acid content (LA), oleic acid content (OA), palmitic acid content (PA), and stearic acid content (SA). Favorable QTL alleles contributed by Yumian 1 are represented by black bars, and those contributed by T586 are represented by empty bars. Segregation distortion regions (SDRs) are named as “chromosome + no”. For example, SDR01.1 refers to the first SDR on Chr01.

A total of 279 segregation-distorted loci, accounting for 16.6 % of the mapped loci, were unevenly distributed on the 26 cotton chromosomes, with 2–27 loci on each chromosome. More distorted loci were located on the At-subgenome than on the Dt-subgenome (163 versus 116) (Table 3). A total of 19 segregation distortion regions (SDRs) were found on 14 cotton chromosomes, with 12 SDRs on the At-subgenome and seven SDRs on the Dt-subgenome (Fig. 1). Chr05 had the most distorted loci and the highest proportion of distorted loci, forming the largest SDRs.
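The text reports significance levels for segregation distortion but does not name the test. A common choice, assumed here, is a chi-square goodness-of-fit test against the 1:1 allele ratio expected in a recombinant inbred line population. The sketch below uses hypothetical function names and hypothetical genotype counts, and reuses the asterisk and plus-sign convention from the figure legend.

```python
import math
from typing import Tuple


def segregation_chi2(n_yumian1: int, n_t586: int) -> Tuple[float, float]:
    """Chi-square statistic and P value (1 df) against a 1:1 ratio.

    Assumption: heterozygous and missing genotypes have been excluded.
    """
    n = n_yumian1 + n_t586
    if n <= 0:
        raise ValueError("at least one scored line is required")
    expected = n / 2.0
    chi2 = ((n_yumian1 - expected) ** 2 + (n_t586 - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def distortion_flag(n_yumian1: int, n_t586: int) -> str:
    """Return '', '*'..'***' (toward Yumian 1) or '+'..'+++' (toward T586)."""
    _, p_value = segregation_chi2(n_yumian1, n_t586)
    if p_value >= 0.05:
        return ""
    symbol = "*" if n_yumian1 > n_t586 else "+"
    level = 3 if p_value < 0.001 else 2 if p_value < 0.01 else 1
    return symbol * level


if __name__ == "__main__":
    # Hypothetical counts among 270 lines.
    for counts in [(135, 135), (160, 110), (110, 160), (200, 70)]:
        print(counts, distortion_flag(*counts))
```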

LP QTL

Eight QTL, explaining 3.4–63.4 % of the total phenotypic variation, were detected across 10 environments. The nearest loci and confidence intervals of these QTL are shown in Table 4 and Fig. 1. Of the eight non-overlapping QTL, six (qLP06.1, qLP07.1, qLP09.1, qLP12.1, qLP21.1, and qLP26.1) were detected in 6–10 environments and were considered stable. Four QTL, qLP06.1, qLP07.1, qLP12.1, and qLP21.1, were associated with the dominant morphological markers T1, Lc1, N1, and Lg, respectively. Three major QTL, qLP07.1, qLP21.1, and qLP12.1, each explained >10 % of the variation, and their favorable alleles were all contributed by Yumian 1. Yumian 1 alleles conferred the favorable effects for all QTL except qLP06.1 and qLP26.1, which is consistent with the parental LP values.

Table 4 Fiber lint percentage QTL identified

Seed nutrient trait QTL

Sixty-four significant QTL for six seed nutrient traits were identified, including four detected in three environments and four detected in two environments. These QTL were mapped on different chromosomes (Fig. 1).

For the CP content, 13 QTL were identified and located on 13 chromosomes, explaining 5.2–48.1 % of the phenotypic variation (Table 5). For 12 of these, the Yumian 1 allele increased crude protein content. qCP07.1 (at locus Lc1) and qCP12.1 (at locus N1) were identified in three environments, had large additive effects, and explained a large proportion of the phenotypic variation; these are major QTL. With the exception of qCP01.1, which was identified in two environments, the other ten QTL were each identified in only one environment.

Table 5 QTL affecting seed protein content, oil, and fatty acid component percentage

For the CO content, 15 QTL were identified and mapped on 15 chromosomes, explaining 2.0–39.8 % of the phenotypic variation. For 11 of these, the T586 allele increased crude oil content. qCO07.1 (at locus Lc1) and qCO12.1 (at locus N1) were detected in three environments, and qCO21.1 (at locus Lg) and qCO23.1 were detected in two. The other 11 QTL were each detected in only one environment.

For the LA content, eight QTL were identified and located on seven chromosomes, explaining 2.2–8.0 % of the phenotypic variation; none was identified in more than one environment. Four alleles from T586 and four alleles from Yumian 1 increased the LA content.

For the OA content, 10 QTL were identified and located on ten chromosomes, explaining between 2.0 and 15.4 % of the phenotypic variation. qOA18.1 was detected in 2006 and 2007, and the other nine QTL were detected in only one environment. Five alleles from T586 and five alleles from Yumian 1 increased the OA content.

For the PA content, 13 QTL were detected on 12 chromosomes, explaining between 4.2 and 13.3 % of the phenotypic variation. All of the QTL were detected in only one environment. Three alleles from T586 and nine alleles from Yumian 1 increased the PA content.

For the SA content, 12 QTL were detected on 10 chromosomes, which explained between 4.4 and 22.7 % of the phenotypic variation. qSA14.1 was detected in 2006 and 2011, and the other nine QTL were detected in only one environment. Eight alleles from T586 and four from Yumian 1 increased the SA content.

Co-localization QTL

A number of QTL controlling different traits co-localized to the same chromosomal regions. For example, eight regions (on Chr06, Chr07, Chr12, Chr14, Chr15, Chr16, Chr21, and Chr26) controlling three or more traits were detected in the RIL population. QTL for LP, CP, and CO content were identified and mapped in the same confidence intervals on Chr06, Chr07, Chr12, Chr21, and Chr26. At these regions, the additive effect on CO content was opposite in direction to those on LP and CP content, whereas the additive effects on LP and CP content were in the same direction. QTL for CO and OA content were identified at the Lc1 locus on Chr07 with additive effects in the same direction. QTL for CO and SA content identified on Chr06, Chr07, and Chr14 had opposite additive effects. QTL for CO and PA content identified on Chr15 and Chr26 had opposite additive effects. QTL for LA and OA content identified on Chr12 and Chr18 had opposite additive effects.

The co-localized QTL may help explain some of the correlations among the traits involved and point to a common genetic basis. This result also indicates that increasing LP may increase the CP content but reduce the CO content in cottonseed. Furthermore, it appeared that the four major fatty acids could not be simultaneously increased because of the different biochemical synthesis pathways of seed fatty acids.
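One simple way to identify co-localized QTL of the kind described in this section is to group QTL on the same chromosome whose confidence intervals overlap. The sketch below is an assumption about how such grouping could be done, not the procedure used in this study; the `QTL` record, the function name, and all interval positions are hypothetical, and only the QTL names come from the text.

```python
from typing import Dict, List, NamedTuple


class QTL(NamedTuple):
    name: str
    chromosome: str
    start_cm: float  # lower bound of the confidence interval
    end_cm: float    # upper bound of the confidence interval


def colocalized_clusters(qtls: List[QTL]) -> List[List[str]]:
    """Group QTL on the same chromosome whose intervals overlap.

    Overlap is chained: if A overlaps B and B overlaps C, all three
    fall into one cluster. Clusters with a single QTL are dropped.
    """
    by_chromosome: Dict[str, List[QTL]] = {}
    for qtl in qtls:
        by_chromosome.setdefault(qtl.chromosome, []).append(qtl)

    clusters: List[List[str]] = []
    for chromosome in sorted(by_chromosome):
        ordered = sorted(by_chromosome[chromosome], key=lambda q: q.start_cm)
        current = [ordered[0]]
        current_end = ordered[0].end_cm
        for qtl in ordered[1:]:
            if qtl.start_cm <= current_end:
                current.append(qtl)
                current_end = max(current_end, qtl.end_cm)
            else:
                if len(current) > 1:
                    clusters.append([q.name for q in current])
                current = [qtl]
                current_end = qtl.end_cm
        if len(current) > 1:
            clusters.append([q.name for q in current])
    return clusters


if __name__ == "__main__":
    # Hypothetical interval positions, for illustration only.
    example = [
        QTL("qLP07.1", "Chr07", 10.0, 14.0),
        QTL("qCP07.1", "Chr07", 12.0, 16.0),
        QTL("qCO07.1", "Chr07", 15.0, 18.0),
        QTL("qSA14.1", "Chr14", 5.0, 8.0),
        QTL("qOA18.1", "Chr18", 40.0, 45.0),
    ]
    print(colocalized_clusters(example))
```

Comparing the signs of the additive effects within each cluster would then show whether the traits move together or in opposite directions at that region.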

Discussion

Genetic map

The updated genetic map contained 1675 SSR and nine morphological loci spanning 3338.2 cM with an average interval of 1.98 cM between adjacent loci. Despite the narrow genetic background of upland cotton (Linos et al. 2002; Lacape et al. 2007; Zhang et al. 2009), this is the densest intraspecific upland cotton genetic map to date. Compared with the tetraploid cotton interspecific genetic maps (Rong et al. 2004; Yu et al. 2011; Zhao et al. 2012), however, this map is still far from saturated. We used 25,313 SSR primer pairs, including almost all of the publicly available ones, yet obtained only 1712 polymorphic primer pairs (a polymorphism rate of 6.8 %). This indicates that the potential of the publicly available SSR loci to construct a saturated upland cotton genetic map is very limited. Therefore, for saturated upland cotton genetic map construction, development of new markers from the genome databases, such as the G. arboreum and G. raimondii genomes, is necessary (Paterson et al. 2012; Wang et al. 2012; Li et al. 2014). Nevertheless, the large number of SSR loci and their even distribution make ours a good skeleton map for understanding and exploiting the tetraploid Gossypium genome.

Relationship between LP and cottonseed nutrient traits

The correlation analysis in this study presents the overall network of relationships between LP and cottonseed nutrient traits. The most important correlations were detected among LP, CO, and CP content, which reflect how carbohydrates are partitioned among different types of molecules and how cottonseed nutrient traits affect fiber yield. LP was positively correlated with CP content, whereas it was negatively correlated with CO content, which is consistent with previous reports (Bechere et al. 2009; An et al. 2010). A negative correlation between CO and CP content has also been reported by other researchers (Song and Zhang 2007; Yu et al. 2012). Our study provides further evidence that it is difficult to increase all three simultaneously in breeding projects. However, there were no significant unfavorable correlations between LP and fatty acid content; the same was true between CP content and the fatty acid components, except for oleic acid content. This means that breeders could increase the content of a given fatty acid component while increasing LP and CP content. Among the fatty acid components, the correlations were very complicated, and the most significant were the negative correlations between linoleic acid and the other fatty acid components. Considering the high proportion of linoleic acid in the crude fatty acid content, the linoleic acid biosynthetic pathway appears to be far more competitive than the others.

Major QTL identified at morphological loci

Endrizzi et al. (1984) reported that T586 includes R1, T1, L2, R2, Y1, P1, Lc1, N1, and Lg, which were mapped on seven genetic linkage groups (Kohel et al. 1965). The positions of all nine morphological markers in the present report were consistent with previous genetic maps, except for Lg (Kohel et al. 1965; Guo et al. 2006). Green fiber (Lg), controlled by an incompletely dominant gene, was located on Chr15 in previous studies (Stephens 1955; Kohel et al. 1965; Kohel 1985). In the present study, Lg was mapped at the end of Chr21, flanked by loci C2-0120 and CGR5015. Based on the G. raimondii reference genome (Paterson et al. 2012), the loci (C2-120, NAU3415, and DC340316) closely linked to Lg are physically aligned to Chr07 (corresponding to Chr11 and Chr21 of tetraploid cotton) rather than Chr02 (corresponding to Chr01 and Chr15 of tetraploid cotton), which supports the reliability of our result.

In this and our previous studies (Zhang et al. 2009), some of the major morphological markers (N1, Lc1, T1, and Lg) exhibited pleiotropic effects on fiber yield, fiber quality, and CO and CP content. Three of the morphological markers (N1, Lc1, and Lg) significantly affected LP and explained a large share of the phenotypic variation; in particular, the QTL at locus N1 accounted for >60 %. Plants carrying N1 have fuzzless seeds, and N1 had a significantly negative effect on LP. Abdurakhmonov (2007) detected two highly significant fiber percentage QTL that explained approximately 23–59 % of phenotypic variation around the regions TMB0471 and MGHES-31 on Chr12, and one parent was L-70 (fuzzless/lintless with 0 % lint percentage on cottonseed). Rong et al. (2005) reported that a major QTL, which explained 33.6 % of LP variation (LOD = 7.50), was mapped to the region where N1 is located. N1, as a dominant gene, could have pleiotropic effects on fiber development inhibition, in terms of both fuzz and lint. N1 may also be associated with a major gene affecting fiber development.

Both brown and green fibers were negatively correlated with fiber yield and quality (Richmond 1943). Zhang et al. (2005, 2009) reported that stable QTL affecting fiber length, uniformity, fineness, and strength were identified at locus Lc1 and that the T586 alleles decreased the phenotypic values of these traits. In this study, major QTL for LP, CP content, and CO content were identified at loci N1, Lc1, and Lg, which indicates that these loci affect multiple traits. The T586 alleles at these loci decreased LP and CP content but increased CO content. This result shows that these loci are very important for cottonseed development, and partly explains the correlations among LP, CP, and CO content.

Simpson (1947) first reported that pilose (T1) produced short dense trichomes on the vegetative parts of upland cotton plants. Yi et al. (2001) and Guo et al. (2006) reported that T1 was associated with an LP QTL, with the favorable allele coming from T586, which was further confirmed in this study. Zhang et al. (2005, 2009) and Wan et al. (2007) reported that the T1 locus might contain the candidate gene underlying QTL controlling fiber length, uniformity, strength, and fineness. The T1 region on Chr06 may carry a QTL with pleiotropic effects or a QTL cluster controlling plant trichomes and seed fibers (Said et al. 2014). In summary, the T1 locus in T586 increases LP and fiber micronaire but decreases fiber length and strength. QTL identified at locus T1 provide further evidence that trichomes and cotton fiber likely share common regulatory mechanisms (Suo et al. 2003; Lee et al. 2007).

Common QTL across populations

In the present study, eight LP QTL were detected; apart from those located at the morphological marker loci, only qLP26.1, found near locus NAU5164 on Chr26, had been reported in previous studies (Yu et al. 2013). Among the 71 seed nutrient trait QTL in this study, three were also reported in other studies on different populations; these included qCO07.1 sharing a common marker NAU1302 (Song and Zhang 2007), and qCO12.1 and qCP12.1 closely linked to marker BNL3867 (Yu et al. 2012). Several reasons may explain why so few of the QTL identified in the present study were also detected in other populations. First, parent T586 has several morphological loci that contribute to most of the phenotypic variation. Other populations, in which T586 was not a parent, did not contain the alleles that T586 carries at the morphological loci; therefore, the QTL identified at these loci were not detected in them. Second, most of the QTL have small effects and are strongly affected by environmental factors, so they are not easy to detect across populations planted under different environments. Third, little QTL mapping work has been carried out on cottonseed nutrient traits to date. Only 29 protein- and 16 oil-related QTL have been identified, which is far fewer than the number of fiber quality QTL (Said et al. 2013). Last, few markers are shared among upland cotton genetic maps because of the relatively low level of DNA marker polymorphism.

Knowledge of fiber growth and development at the molecular level and its integration with QTL mapping is essential in designing next-generation breeding strategies. The present work provides a high-density genetic linkage map for molecular marker-assisted selection. The QTL identified for LP and seed nutrient traits at loci T1, N1, Lc1, and Lg provide a means for further study on the molecular mechanisms of fiber and cottonseed development through map-based cloning and functional analysis.