Introduction

The first molecular genetic map for cotton was published in 1994 (Reinisch et al. 1994) and to date almost fully resolved cotton genetic maps have been constructed. Two major types of molecular markers have been used in map construction: hybridization-based restriction fragment length polymorphisms (RFLP) (Reinisch et al. 1994; Shappley et al. 1998; Rong et al. 2004) and PCR-based markers such as random amplified polymorphic DNA (RAPD) (Kohel et al. 2001), amplified fragment length polymorphisms (AFLP) (Abdalla et al. 2001; Mei et al. 2004), sequence tagged sites (STS) (Rong et al. 2004), and simple sequence repeats (SSR) (Han et al. 2004, 2006; Frelichowski et al. 2006). These markers provide valuable information regarding levels of genetic variability, but little information in terms of biological function can be obtained. Therefore, despite their usefulness, it is generally unknown if these markers target coding or non-coding regions of the genome. Recently developed candidate gene markers represented by expressed sequence tags derived SSR (EST-SSR) (Chee et al. 2004; Park et al. 2005; Guo et al. 2007), cDNA probe-based STS or RFLP markers (Rong et al. 2004), single nucleotide polymorphisms (SNP) (An et al. 2007), and sequence-related amplified polymorphisms (SRAP) (Lin et al. 2005) provide information about candidate genes with biological function. However, the utilization of these markers is limited in cotton genome mapping either by their low rates or number of polymorphisms detected (Rong et al. 2004), or by inadequate genome sequence information and highly repetitive DNA content (An et al. 2008).

cDNA-AFLP fragments or transcriptome-derived fragments (TDFs) are a type of candidate gene marker with high levels of polymorphism and reproducibility (Bachem et al. 1996). TDFs are derived from mRNA sequences of constitutively expressed genes, and therefore generally represent sequences conferring biological function (Brugmans et al. 2002). Consequently, the cDNA-AFLP technique is used extensively in differential gene expression and gene discovery in a wide number of organisms (Breyne and Zabeau 2001; Samuelian et al. 2004; Ranik et al. 2006) and applied in genome-wide expression analysis (Breyne et al. 2003) and transcriptome map construction (Brugmans et al. 2002; Pan et al. 2007; Ritter et al. 2008).

A cDNA-AFLP fragment-based genome-wide transcriptome map provides a powerful tool to identify candidate genes involved in, or controlling, various biological processes (Breyne and Zabeau 2001). Ritter et al. (2008) analyzed TDFs co-located with known QTL on an integrated potato transcriptome map and identified two resistance gene homologs. Several candidate genes in potato were identified for early tuber production in a diploid population using bulked segregant analysis (BSA) coupled with transcriptome analysis and QTL mapping (Fernández-del-Carmen et al. 2007). More recently, a cotton fiber transcriptome map was constructed and six candidate genes represented by TDFs co-segregating with QTL for fiber strength were identified (Liu et al. 2009). These functional genes were characterized as integral in plant cell wall morphogenesis or cellulose synthesis metabolism processes, and considered as candidate genes controlling cotton fiber strength.

Identification and characterization of QTL and genes controlling agronomic and fiber quality traits in cotton have been a research focus for well over a decade. Almost all mapped QTL have been identified in interspecific or intraspecific F2/F2:3 (Ren et al. 2002; Ulloa et al. 2005; Guo et al. 2006; He et al. 2007) and BC1 segregating families in a single or a limited number of environments; however, mapping precision with these transient populations is limited. Immortalized mapping populations such as recombinant inbred lines (RILs) (Shen et al. 2007; Wang et al. 2007; Wu et al. 2009) and doubled haploid (DH) lines (Zhang et al. 2002) have been considered as alternatives to solve this problem. However, since alleles in these populations are homozygous, dominant effects cannot be detected. Hua et al. (2002, 2003) first proposed the immortalized F2 (IF2) population, which is constructed by intercrossing RILs paired by random permutation. The genetic composition of the population is similar to that of an F2; however, replicated experiments in multiple environments can be achieved (Hua et al. 2003). Wang et al. (2007) recently constructed an immortalized F2 population in cotton, and data from two environments were used to map fiber quality QTL. In this report, an immortalized F2 population of Xiangzamian2 (XZM2), derived from the intraspecific cross “CRI12 × J8891”, was used to construct a cDNA-AFLP-based transcriptome map and to identify QTL for yield and yield components. XZM2 was the most widely cultivated cotton hybrid in China, planted as both F1 and F2, before transgenic Bt hybrids were extensively released. Concurrently, candidate genes for these important quantitative traits were identified by sequencing TDFs co-located with the corresponding QTL.

Materials and methods

Plant materials

XZM2 is a cotton hybrid with high yield potential, developed from the intraspecific cross “CRI12 × J8891”. XZM2 was released in Hunan Province in 1997 (Li et al. 1997) and in the Yangtze River cotton growing region in 2001. It was the most widely cultivated cotton hybrid in China, planted as both F1 and F2, before transgenic Bt hybrids were extensively released. CRI12 and J8891 exhibit a high level of combining ability and yield production, and both are inbreds derived from self-pollination. CRI12 was crossed with J8891 to generate the F1 in 1998. An F8 population of 180 recombinant inbred lines (RILs) was generated from this cross by a bulk-selfing technique in 2002, and the immortalized F2 (IF2) population was constructed in 2003 by crossing RILs paired by random permutation (Wang et al. 2007).

CRI12 and J8891 (parents), F1 and 171 IF2 were planted in 2004 and 2005 in Jiangpu Experiment Station, Nanjing Agricultural University (JES/NAU), in the Yangtze River cotton growing region in China, and in 2008 in Linqing, Shandong, in the Yellow River cotton growing region in China. Each site-year represented an individual environment (i.e., four environments). A randomized complete block design with two replications was applied in each environment. Cotton seeds were directly planted into nutritive pots in seed-bed plots under small arches covered by polyethylene mulch to maintain a bed temperature of 25–30°C. Seedlings with three to four leaves were transplanted to the field, with a planting distance of 30 cm and a row spacing of 80 cm. Five plants in the middle of each single-row plot were tagged for trait measurement. The following yield and yield-related traits were investigated: seed cotton yield (SY, g/plant), lint yield (LY, g/plant), bolls/plant (BP), boll weight (BW, g), lint percentage (LP, %), seed index (SI, g), lint index (LI, g) and fruit branch number (FBN). The IF2 population mid-parent heterosis for SY and LY was calculated using H_MP = (IF2 − (P1 + P2)/2) × 100%, where P1 and P2 indicate the original parents CRI12 and J8891, respectively.

cDNA-AFLP analysis

Total RNA was extracted from topmost unfolded leaves of the parents, F1 and IF2 individuals at the florescence stage in one of the replications in JES/NAU in 2008, with a modified CTAB-acidic phenolic method described by Jiang and Zhang (2003). At least three leaves for each sample were pooled for RNA extraction. A spectrophotometer was used to estimate RNA concentration and quality and RNA was visualized on 1% agarose gels. Residual DNA was digested with DNaseI. First- and second-strand cDNAs were synthesized according to the M-MLV RTase cDNA Synthesis Kit protocol (TaKaRa Company, China). Double-stranded cDNA (ds-cDNA) was digested with MseI/EcoRI followed by ligation of MseI/EcoRI adapters with T4 DNA ligase (using the same adapters as Vos et al. 1995). cDNA-AFLP reactions were performed according to Bachem et al. (1996, 1998). Primers contained one and two selective nucleotides for preamplification and selective amplification, respectively. Selective amplification products were denatured and separated on 6% denaturing polyacrylamide (19:1) gels and visualized by silver staining.

A total of 256 MseI/EcoRI selective primer combinations were tested (Table S1). All primer combinations that exhibited polymorphisms between the two parents and showed stable, consistent band patterns among 12 random samples in pilot reactions were selected to screen the immortalized F2 population. The presence/absence of each polymorphic cDNA fragment, also referred to as a transcriptome-derived fragment (TDF), was scored, and a subsequent χ² test (P < 0.05) was performed to determine whether the segregation ratio fit the expected 3:1 ratio.
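The segregation test described above can be sketched as follows. This is a minimal illustration rather than the procedure actually scripted in the study: the function names and the example counts are hypothetical, and the tabulated critical value 3.841 (χ², 1 degree of freedom, α = 0.05) is used in place of an exact P value.

```python
# Tabulated critical chi-square value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_DF1_P05 = 3.841


def chi2_three_to_one(present: int, absent: int) -> float:
    """Goodness-of-fit chi-square of band counts against a 3:1 present:absent ratio."""
    total = present + absent
    if total == 0:
        raise ValueError("no scored individuals")
    exp_present = 0.75 * total  # expected number of individuals showing the band
    exp_absent = 0.25 * total   # expected number of individuals lacking the band
    return ((present - exp_present) ** 2 / exp_present
            + (absent - exp_absent) ** 2 / exp_absent)


def is_distorted(present: int, absent: int) -> bool:
    """True when the marker deviates significantly (P < 0.05) from 3:1."""
    return chi2_three_to_one(present, absent) > CHI2_CRIT_DF1_P05


if __name__ == "__main__":
    # Hypothetical band counts for two markers scored on 171 individuals.
    for present, absent in [(128, 43), (100, 71)]:
        stat = chi2_three_to_one(present, absent)
        print(present, absent, round(stat, 3), is_distorted(present, absent))
```

Markers flagged by `is_distorted` correspond to the distorted TDFs; how strongly a marker must deviate before it is excluded from mapping is a separate decision that this sketch does not make.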

Transcriptome map construction

JoinMap 3.0 was used to conduct linkage analysis (Van Ooijen and Voorrips 2001), with a maximum recombination fraction of 0.40 and a minimum LOD score of 4.0. The Kosambi function (Kosambi 1944) was used to convert recombination frequency to map distance (centiMorgan, cM). The linkage groups were denoted as LGXX, where XX represented serial numbers. Each TDF was designated according to the primer combination, followed by the estimated fragment size in bp. MapChart 2.2 was used to draw the linkage groups (Voorrips 2006).
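For reference, the Kosambi conversion used above can be written out explicitly. The following is a minimal sketch of the standard Kosambi mapping function and its inverse; the function names are illustrative and are not part of JoinMap.

```python
import math


def kosambi_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to map distance in cM."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    # Kosambi (1944): d = 1/4 * ln((1 + 2r) / (1 - 2r)) Morgans, i.e. 25 * ln(...) cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm: float) -> float:
    """Inverse conversion: map distance in cM back to a recombination fraction."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    for r in (0.05, 0.10, 0.20, 0.40):
        print(r, round(kosambi_cm(r), 2))
```

Under this function, the maximum recombination fraction of 0.40 corresponds to a map distance of about 54.9 cM.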

QTL analysis

Eight yield and yield component traits were examined in this study: seed cotton yield (SY, g/plant), lint yield (LY, g/plant), bolls/plant (BP), boll weight (BW, g), lint percentage (LP, %), seed index (SI, g), lint index (LI, g), and fruit branch number (FBN). The mean of each phenotypic measurement from the two replications in each environment was calculated to assess trait performance. QTL detection was conducted by the Composite Interval Mapping (CIM) procedure (Zeng 1994) using Windows QTL Cartographer 2.5 (Basten et al. 2001). The standard model (Model 6) was adopted. The window size was set at 5 cM and the walk speed at 1 cM. A maximum of 10 background markers was used for genetic background control. LOD threshold values for declaring significant QTL were estimated by 1,000 permutations (Churchill and Doerge 1994). QTL confidence intervals (90%/95%) were set as the map intervals corresponding to a two- or one-LOD decline on either side of the peak. The genetic modes of QTL action were classified as described by Stuber et al. (1987). QTL nomenclature followed McCouch et al. (1997): the designation begins with “q”, followed by an abbreviation of the trait name, the linkage group, and finally the serial number.
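The LOD-decline intervals and the naming scheme described above can be illustrated with a short sketch. It is a simplified, grid-based version that assumes a LOD profile sampled at fixed positions and does not interpolate between them; the function names and the example profile are hypothetical and do not reproduce Windows QTL Cartographer output.

```python
def lod_support_interval(positions, lods, drop=1.0):
    """Return (left, peak, right) positions where LOD stays within `drop` of the peak."""
    if len(positions) != len(lods) or not positions:
        raise ValueError("positions and lods must be non-empty and of equal length")
    peak_idx = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak_idx] - drop
    left = peak_idx
    # Walk left from the peak while the profile stays at or above the cutoff.
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak_idx
    # Walk right from the peak in the same way.
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[peak_idx], positions[right]


def qtl_name(trait: str, linkage_group: int, serial: int) -> str:
    """Build a QTL designation: 'q' + trait abbreviation + linkage group + serial number."""
    return f"q{trait}-LG{linkage_group:02d}-{serial}"


if __name__ == "__main__":
    # Hypothetical LOD profile scanned at 1 cM steps.
    pos = [0, 1, 2, 3, 4, 5, 6]
    lod = [0.5, 1.2, 2.0, 3.1, 2.4, 1.9, 0.8]
    print(qtl_name("SY", 12, 2), lod_support_interval(pos, lod, drop=1.0))
    print(qtl_name("SY", 12, 2), lod_support_interval(pos, lod, drop=2.0))
```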

Cloning and sequencing of cDNA-AFLP fragments

Targeted TDFs over 200 bp in length, which were co-located with QTL for yield and yield component traits, were excised from polyacrylamide gels and treated with diffusion solution containing 0.5 mM ammonium acetate, 10 mM magnesium acetate, 1 mM EDTA and 0.1% SDS at 65°C overnight. The gel extraction mini kit (Watson Biotechnologies Inc., Shanghai, China) was then used following the manufacturer’s instructions. Aliquots of the solution were used as templates to re-amplify the fragments, using PCR conditions and primer combinations identical to those used in the corresponding selective amplifications.

TDFs were cloned and sequenced using standard protocols (Sambrook et al. 1989), with slight modifications. Two microliters of the re-amplified products were ligated into pMD18-T Vectors (Takara Bio. Inc., Dalian, China), and the ligation products were transformed into E. coli strain TOP10 cells. The cells were spread on Luria–Bertani (LB) agar plates containing ampicillin, X-gal and IPTG, and grown at 37°C overnight. Individual positive colonies were selected and grown in LB/ampicillin medium for 3–4 h at 37°C, with shaking at 300 r/min. Two microliters of the culture solutions were used as templates to validate the presence of the putative inserted fragments via PCR amplification, using M13 primer pairs and the corresponding AFLP primer combinations. GeneScript Corporation (Nanjing, China) sequenced at least two clones of each TDF in order to determine the consensus sequence.

DNAStar software was applied to remove all vector sequences. Subsequently, sequence homology searches were performed using the BLASTx algorithm (Altschul et al. 1997) against the GenBank non-redundant protein database (Benson et al. 2008), accessed through the NCBI homepage (http://www.ncbi.nlm.nih.gov/), and using the AmiGO search engine at http://www.geneontology.org/, with a significance threshold of E-value < 10^−4. For TDFs with no significant homolog hits, the BLASTn algorithm was used to search the EST Others database in GenBank with a significance threshold of E-value < 10^−8. ESTs with the highest similarity to each of these TDFs were selected as alternatives for a second protein database search. The correlation between functional TDF expression patterns and trait performance and/or yield heterosis was analyzed with SPSS 16.0 software using the rank-based Kendall’s tau-b method (Kendall 1948). A large-sample approximation for the test statistic was calculated with the mean value of trait measurement in each environment and TDF presence/absence coded as 1/0, respectively. The thresholds for declaring significant and highly significant correlations were P ≤ 0.05 and P ≤ 0.01, respectively.
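To make the correlation step concrete, the sketch below computes Kendall’s tau-b between a 1/0 presence/absence vector and a trait vector. It is a plain pairwise implementation written for illustration, not the SPSS routine used in the study; the function name and the example values are hypothetical, and the large-sample significance test is not included.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences, with tie correction."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pair tied on x (includes pairs tied on both)
            if dy == 0:
                ties_y += 1  # pair tied on y (includes pairs tied on both)
            if dx != 0 and dy != 0:
                if (dx > 0) == (dy > 0):
                    concordant += 1
                else:
                    discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical data: TDF presence (1) / absence (0) and a yield trait mean.
    tdf = [1, 1, 0, 0]
    trait = [10.0, 8.0, 5.0, 7.0]
    print(round(kendall_tau_b(tdf, trait), 4))
```

Because the presence/absence vector contains many ties by construction, the tie-corrected tau-b form is the relevant variant here rather than tau-a.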

Results

Construction of transcriptome map

Each cDNA-AFLP primer combination generated 10–30 readable bands, and most bands were monomorphic. A total of 627 polymorphic cDNA-AFLP fragments over 100 bp in length were generated from 256 primer combinations executed in the analyses, with 295 and 332 fragments from CRI12 and J8891 parents, respectively. Most polymorphic fragments (552/627) exhibited the clear presence/absence of dominant band types. A small portion of fragments (75/627) showed polymorphisms with expression levels indicated by a clear cut-off in band intensity, which made it possible to distinguish the two types of genotypes: high-intensity bands were scored as present and low-intensity bands as absent. Of the total 627 polymorphic fragments generated in the present study, 105 (16.75%) showed significant distortions, consistent with 14.4% reported by Fan et al. (2008). All polymorphic TDFs were used to construct the transcriptome map except for the 54 severely distorted TDFs.

JoinMap 3.0 was used to construct a transcriptome map with 26 linkage groups at a LOD threshold of 4.0 (Table S2). The total length of the map was 2477.01 cM, and the average individual linkage group length was 95.27 cM. Among the 573 TDF markers processed, 302 (52.70%) were mapped with an average interval of 8.23 cM between two adjacent markers. Distances between two adjacent TDFs were within 5.0 cM for approximately 45.0% of all markers. Individual linkage groups contained between three and 23 markers, with an average of 12 markers per linkage group. The cDNA-AFLP markers were distributed evenly across the linkage groups, and no obvious clusters were detected within any linkage group (Fig. 1), in contrast to most genetic maps constructed with DNA markers. Seventeen larger linkage groups with lengths ranging from 84.17 to 151.87 cM contained 85.8% (259/302) of the markers, with two large gaps over 30 cM on LG09 and LG14.
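Summary statistics of the kind reported above (average linkage group length, mean interval between adjacent markers, share of intervals within 5 cM, and gaps over 30 cM) can be derived from marker positions as in the following sketch. The data structure, function name, and toy positions are assumptions made for illustration; they are not the map data from Table S2, and each linkage group is assumed to hold at least two markers.

```python
def map_summary(linkage_groups, close_cm=5.0, gap_cm=30.0):
    """Summarize a linkage map given {group name: sorted marker positions in cM}."""
    lengths = {lg: pos[-1] - pos[0] for lg, pos in linkage_groups.items()}
    # Every adjacent-marker interval, tagged with its linkage group.
    intervals = [(lg, a, b)
                 for lg, pos in linkage_groups.items()
                 for a, b in zip(pos, pos[1:])]
    total = sum(lengths.values())
    return {
        "n_groups": len(linkage_groups),
        "n_markers": sum(len(pos) for pos in linkage_groups.values()),
        "total_length_cM": total,
        "mean_group_length_cM": total / len(linkage_groups),
        "mean_interval_cM": total / len(intervals),
        "share_within_close": sum(1 for _, a, b in intervals if b - a <= close_cm)
                              / len(intervals),
        "large_gaps": [(lg, a, b) for lg, a, b in intervals if b - a > gap_cm],
    }


if __name__ == "__main__":
    # Toy map with two linkage groups (hypothetical positions in cM).
    toy_map = {"LG01": [0.0, 4.0, 10.0, 45.0], "LG02": [0.0, 3.0, 20.0]}
    for key, value in map_summary(toy_map).items():
        print(key, value)
```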

Fig. 1 Transcriptome map and QTL position. The map of the immortalized F2 population derived from crosses between RILs of XZM2 (CRI12 × J8891 F1) was constructed by JoinMap 3.0 with the Kosambi function and a LOD ratio of 3.0. Significant QTL for yield and yield component traits are indicated by bars and whiskers, which are 1- and 2-LOD likelihood intervals, respectively. Sequenced cDNA-AFLP fragments co-located with QTL are shown in brick red

QTL analysis of yield and yield component traits based on transcriptome map

The mean value of traits from both mapping parents and the IF2 based on two replications in each environment was calculated. Student’s t test detected significant differences between parents for all traits with the exception of LP, SI, and FBN (data not shown). All yield and yield components in the IF2 expressed transgressive segregation in both directions, suggesting polygenic inheritance (Table 1; Fig. 1).

Table 1 QTL for yield and yield components detected by CIM in the IF2 population across four environments

Seed cotton yield

Thirteen QTL were detected for SY, which explained between 6.33 and 35.99% of the phenotypic variation (PV), with LOD scores ranging from 2.52 to 4.83. Three dominant or overdominant QTL (qSY-LG09-1, qSY-LG12-2 and qSY-LG16-1) were detected in two environments. Another QTL, qSY-LG08-1, showed additive or overdominant effects in two environments. The other nine QTL were detected only once, seven of which exhibited partial to complete dominant effects and two of which showed overdominant effects. Alleles from J8891 contributed to an increase in SY for seven QTL, and alleles from CRI12 had positive effects for six QTL. A major QTL, qSY-LG12-2, was associated with TDFs E4M16_100, E3M12_300 and E12M12_400.

Lint yield

Fourteen QTL were detected for LY, which explained 6.93% of the PV in qLY-LG09-2 to 35.05% in qLY-LG12-1. LOD scores ranged from 2.5 to 4.68. Two QTL (qLY-LG01-1 and qLY-LG08-1) were detected in two environments, both exhibiting at least partial dominance. The other 12 QTL were identified in one environment, with four showing partial to complete dominance and eight overdominance. Alleles from J8891 contributed to an increase in LY for six QTL, and alleles from CRI12 had positive effects for eight QTL. The major QTL qLY-LG12-1 was co-located with TDFs E3M12_300 and E12M12_400 on LG12.

Bolls per plant

Twelve QTL were detected for BP, with LOD scores ranging from 2.52 to 5.15. One QTL was detected twice, exhibiting dominance and overdominance. Four partial to complete dominant and seven overdominant QTL were detected in a single environment. The PV explained by these QTL ranged from 10.38% for qBP-LG01-1 to 40.37% for qBP-LG22-1. CRI12 alleles were associated with increased BP for five QTL, and J8891 contributed positive alleles for seven QTL. The major QTL qBP-LG22-1 was associated with E16M8_250 and E7M15_700, and qBP-LG10-1 with E2M8_380 and E11M16_300.

Boll weight

Four QTL for BW were detected in a single environment with LOD scores between 2.55 and 3.38. These QTL explained from 7.55% for qBW-LG15-2 to 47.88% for qBW-LG18-1 of the total phenotypic variation. qBW-LG15-2 showed partial dominance and overdominance was exhibited by the three other QTL. J8891 alleles were responsible for an increase in BW for three QTL, and CRI12 alleles were associated with increased BW for one QTL. The major QTL qBW-LG18-1 explained the highest percentage of PV and was co-located with TDFs E16M15_125 and E8M11_250 on LG18.

Lint percentage

Fourteen LP QTL were identified, which explained 7.59–39.80% of the PV. LOD scores ranged from 2.53 to 4.45. The overdominant QTL qLP-LG07-1 was detected in three environments, and the partial dominant qLP-LG15-1 and additive qLP-LG19-1 QTL were detected twice. The remaining 11 QTL were detected in a single environment, of which three showed partial to complete dominant effects and eight showed overdominant effects. CRI12 alleles were associated with increased LP for six QTL and J8891 alleles for the remaining eight QTL. The major QTL qLP-LG16-1 was co-located with TDFs E7M7_520 and E8M11_300, and qLP-LG17-1 with E11M9_160 and E6M9_200.

Seed index

Four QTL were detected for SI with LOD scores ranging from 2.58 to 3.05. The phenotypic variation explained by these QTL ranged from 10.35 to 22.51%. QTL qSI-LG16-1 was detected in two environments. Alleles derived from CRI12 were responsible for the SI increase in qSI-LG19-1, and J8891 alleles were positive at qSI-LG05-1, qSI-LG07-1, and qSI-LG16-1.

Lint index

Fifteen QTL were identified for LI with LOD scores between 2.54 and 4.63. The QTL explained 8.35–36.69% of the phenotypic variation. QTL qLI-LG02-1 and qLI-LG06-1 were detected in two environments. Alleles derived from CRI12 were associated with increased LI for six QTL. J8891 alleles provided increased LI for the additional nine QTL. Seven QTL showed partial to complete dominance, and eight QTL showed overdominance. The major QTL qLI-LG18-1 was co-located with TDFs E8M9_260 and E12M9_305, and a second major QTL qLI-LG08-1 was associated with E16M16_126 and E6M6_120.

Fruit branch number

Seven QTL were detected in a single environment for FBN, with LOD scores ranging from 2.56 to 5.80. These QTL explained 9.79 to 36.79% of the phenotypic variation. Three QTL showed partial dominance and four showed overdominance. Increases in FBN were observed in three QTL associated with CRI12 alleles and four QTL associated with J8891 alleles. The major QTL qFBN-LG15-1 was co-located with TDFs E1M6_250 and E10M6_600 on LG15.

Putative candidate genes revealed by homology analysis of associated TDFs

Of the total 51 TDFs sequenced, 44 representing genes with known biological function (Table S3) were associated with 50 QTL for yield and yield components. Among them, nine, five, thirteen, and five TDFs represented genes related to transcription regulation, signal transduction pathways, biosynthesis and metabolism processes, and ion, protein, or phospholipid transport, respectively. For example, three overdominant QTL, qBP-LG16-1, qSY-LG16-1, and qLY-LG16-1, each with significant additive and dominant effects, were flanked by TDF E11M15_470, which encodes an AP2 domain-containing transcription factor involved in modulating the expression of genes responsive to biotic and abiotic stresses and genes related to the cell life cycle, growth, and development. The overdominant QTL qBP-LG22-1 was associated with E16M8_250, a fragment encoding a NAC domain-containing protein involved in multicellular organismal development, shoot development, xylem histogenesis, and programmed cell death. The gene represented by the E3M8_250 fragment on LG07 encodes Late Embryogenesis Abundant (LEA) 14. LEA proteins, which are expressed under conditions of desiccation, cold, osmotic stress, and heat, are thought to be associated with abiotic stress tolerance.

Relationship between gene differential expression and yield heterosis

A total of 37 TDFs were significantly correlated with trait performance and yield heterosis (Table 2). Among them, 13 TDFs originated from J8891 and 24 from CRI12, with correlation coefficients in either a positive or a negative direction. Five TDFs were positively correlated with seed cotton and lint yields as well as yield heterosis: E4M3_300, E1M6_250, E4M8_340, E10M6_600, and E4M16_280, encoding the CAP-binding protein, mitogen-activated protein kinase, GATA transcription factor, translation initiation factor EIF4A1, and beta-ketoacyl-CoA synthase, respectively. Six TDFs were positively correlated with both SY and LY: E7M15_250, E11M15_470, E12M9_305, and E16M8_250, encoding glycerol-3-phosphate acyltransferase, AP2 domain-containing transcription factor, ribonuclease II-like protein, and NAC domain-containing protein, respectively, and E11M16_240 and E6M4_200, encoding proteins of unknown function. The fragment E2M3_235, encoding a proteasome subunit, was positively correlated with LY, BP, and yield heterosis. The fragment E12M5_250 was positively correlated with LY and BP and encodes glucose 1-dehydrogenase. Furthermore, seven TDFs, E4M3_260, E1M6_550, E2M5_250, E7M12_350, E12M1_210, E3M8_275, and E7M2_275, were positively correlated with heterosis performance of either yield trait. These TDFs encode, respectively, a GRAS family protein, a putative F28D10_20 protein, a phosphoglycerate kinase, a chlorophyll a-b binding protein, a magnesium transporter, a heavy metal ATPase, and a protein of unknown function.

Table 2 Correlation coefficients between differential expression of functional TDFs and performance of yield, yield heterosis, and other related traits

Interestingly, E11M16_300 encoding a GAI/RGA-like protein was detected as positively correlated with yield and yield heterosis in Jiangpu during 2004, but negative correlations were identified in Linqing during 2008, indicating repression of vegetative growth may have different results on yield in different environments. E12M9_305 encoding ribonuclease II-like protein co-located with qLP-LG19-1 and qLI-LG19-1 was positively associated with SY, LY, and BP, but both positive and negative effects of this TDF were detected on yield heterosis, indicating the mechanisms underlying trait performance and heterosis might not be the same.

Discussion

The feasibility of QTL tagging based on a transcriptome map

Distinct tissue and cell type differentiation is highly dependent on specific patterns of gene expression and transcript accumulation (Ewing et al. 1999; Drost et al. 2010). In higher plants, housekeeping genes encoding essential metabolic enzymes or cellular components are primarily cis regulated (~92%) and are expressed constitutively at relatively low levels in all tissues/cell types, while tissue-specific genes, which are largely trans regulated, differ in their expression at the level of transcription, typically by two- to ten-fold between the uninduced and induced levels (Drost et al. 2010). Cotton yield and yield component traits are the cumulative result of a series of metabolic pathways in which many interacting genes are involved. At the florescence stage, cotton leaves, as the main photosynthetic organs, contribute most to final production and are likely to express similar sets of genes under uninduced natural conditions; therefore, functional leaves in one of the environments were sampled for RNA extraction in this study. cDNA-AFLP-based transcriptome map construction provides a viable approach to tag QTL and even the potential candidate genes controlling important qualitative and quantitative traits of interest (Liu et al. 2009; Fernández-del-Carmen et al. 2007).

TDFs target highly conserved coding regions, each having a unique segregation pattern, as opposed to neutral DNA markers, which tend to cluster in centromeric or other repetitive non-coding regions (Brugmans et al. 2002). Hovav et al. (2008) reported that an overwhelming majority of genes are transcribed in cotton fibers at different developmental stages. Comparative expression profiling in different plant materials has indicated that TDFs largely represent constitutively expressed housekeeping genes (Ritter et al. 2008). Therefore, transcriptome maps can be used to map QTL of important traits in the same way as DNA marker-based maps, irrespective of the specific stages/cell types. TDFs are transcribed coding regions of functional genes without repetitive DNA, introns, or spacers; therefore, TDFs co-located with QTL have potential for cloning the functional genes responsible for each corresponding trait and have future value in marker-assisted selection.

The absence/presence of cDNA-AFLP fragment polymorphisms in a segregating population is predominantly the result of genomic sequence polymorphisms in the coding region, most likely SNPs or indels in the transcribed sequences (Brugmans et al. 2002). Some polymorphisms are probably cDNA specific, generated by splicing-site changes after transcription. In the present research, each polymorphic band was read separately; therefore, co-dominant allelic fragments generated from the same primer combination were initially mapped at the same or a very close position, and only one fragment was retained in the map to represent the allelic site. Band intensity polymorphisms representing quantitative differences in gene expression were not included due to the difficulty of discriminating them by direct visualization. However, intensity polymorphisms that demonstrated clear-cut distinctions between parents and IF2 were genotyped into absence/presence classes and used in map construction. The percentage of mapped TDFs (302/573, 52.7%) was lower than that reported by Brugmans et al. (2002) (65.9%) and Li et al. (2003) (72.6%), primarily because of the inefficiency of dominant markers and the higher number of distorted markers adopted. The linkage groups contained 12 markers on average. LG24 to LG26 each consisted of only three markers, so the QTL detected on them might not be reliable; additional markers are needed to fill the gaps in the future.

Since DNA markers were used to construct a genetic linkage map for the original RIL mapping population and to map QTL in the RIL (Wang et al. 2006) and in the IF2 (Wang et al. 2007), integrating the current expression map with the prior SSR-based map would help assign the 26 TDF-based linkage groups to their likely chromosomes. We attempted this integration, but the two kinds of maps proved difficult to integrate because cDNA-AFLP transcript polymorphisms can behave as either cis or trans traits. A cis trait is one that genetically maps to the physical location of the gene encoding its mRNA, suggesting that variation at the locus is responsible for the heritable changes in gene expression. A trans trait maps to a region distinct from its physical location and implies the location of a potential regulator acting in trans. In species such as cotton, for which no whole genome sequence is available, cis and trans traits presently cannot be distinguished, so the two kinds of maps cannot be reliably integrated.

Complexity of elite gene exploitation for trait improvement in cotton breeding

Yield and yield components are highly sensitive to environmental variation. Consequently, replicated field experiments across multiple environments are required to detect stable QTL for these traits. In this study, the XZM2-derived IF2 population was used for QTL mapping of yield and yield components in four environments. A total of 71 QTL were detected, with 13 in at least two environments. Significant clusters of QTL for highly related traits were found in several linkage groups. Nearly all QTL for seed cotton yield (SY) and lint yield (LY) were mapped at the same positions, and yield QTL were co-located with at least one QTL for yield components. These results suggested that the genetic potential of cotton varieties cannot be predicted simply based on phenotype. Results further indicated the inherent complexity of elite gene exploitation for trait improvement in hybrid breeding. It is notable that 38.8% of the total QTL associated with improvement of yield and yield heterosis were detected from the phenotypically inferior parent CRI12. Yield heterosis exhibited in XZM2 should be the result of elite gene combinations from both parents. Similar results have been reported in both cotton and other crops, and alleles from phenotypically inferior parents have been associated with trait performance improvement for many detected QTL (Chee et al. 2005; Huang et al. 2003; Xie et al. 2008; Liu et al. 2009). Tanksley and McCouch (1997) pointed out that the phenotype of a plant is only a modest predictor of its genetic potential. For example, a high-yielding line often maintains positive alleles at most loci associated with yield, and typically an inferior parent contributes a superior allele at one or more loci. For this reason, QTL and other marker-based methodologies are suitable strategies for improving breeding efficiency. The present study therefore implies that CRI12, although the low-yield parent, is a variety with good combining ability that clearly contributes to heterosis in XZM2.

Mining and utilization of candidate genes for yield related traits

Considerable effort has been devoted to tagging QTL for important fiber quality and agronomic traits in cotton (Ulloa et al. 2005; He et al. 2007; Guo et al. 2006; Wu et al. 2009), and cotton genome sequencing is currently underway (Yu 2010). However, little is known regarding the genes controlling these complicated quantitative traits. In the current study, potential candidate genes were analyzed by homology searches for 44 TDFs co-located with QTL for yield and yield components. According to Paterson and Smith (1999), the average DNA content of 1 cM in genetic maps is estimated to be 400–600 kb. The 95% confidence interval for QTL detected in this study was predominantly 10 cM; therefore, many genes could lie within one QTL interval, making it difficult to identify all the functional genes. Nevertheless, neighboring genes tend to have related functions and to be co-expressed (Michalak 2008; Williams and Bowles 2004; Zhan et al. 2006), so co-located TDFs are modest indicators of the true functional genes. The genotypes that did not generate cDNA-AFLP bands can also provide beneficial alleles for QTL. In such instances, the other allele of the gene represented by the TDF, which escaped detection by the cDNA-AFLP technique due to the lack of a restriction site, might be the potential candidate gene. In this study, correlation analysis between TDF expression pattern and yield performance detected some TDFs positively associated with yield and yield heterosis, indicating that the proteins encoded by these putative candidate genes were potentially involved in the biological processes that contribute to the accumulation of cotton yield. Some TDFs exhibited expression negatively correlated with trait performance, consistent with a recessive or negative overdominant corresponding QTL. In particular, TDF E6M9_320, which encodes a putative cellulose synthase, was co-located with three recessive QTL for SY, LY, and LP on LG02; these were not related to an increase in yield but might be responsible for improving fiber quality.
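The scale of the problem follows from simple arithmetic: at 400–600 kb per cM, a 10 cM confidence interval spans roughly 4–6 Mb of DNA. The following sketch makes that conversion explicit; the function name and default arguments are illustrative assumptions.

```python
def qtl_interval_size_mb(interval_cm, kb_per_cm_range=(400.0, 600.0)):
    """Approximate physical size (low, high) in Mb of a genetic interval given in cM."""
    low_kb_per_cm, high_kb_per_cm = kb_per_cm_range
    # kb per cM times cM gives kb; divide by 1000 to express the result in Mb.
    return (interval_cm * low_kb_per_cm / 1000.0,
            interval_cm * high_kb_per_cm / 1000.0)


if __name__ == "__main__":
    low_mb, high_mb = qtl_interval_size_mb(10.0)
    print(f"10 cM corresponds to roughly {low_mb:g} to {high_mb:g} Mb")
```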

The ultimate goal of this study was to identify genes responsible for yield and yield components and potentially cotton heterosis. The results for the associated TDFs provide insights toward the eventual cloning of candidate genes, which might be crucial in the ultimate performance of quantitative traits. In the future, related clones will be identified by screening the cotton BAC library and integrated into the cotton physical map in our lab, facilitating the map-based cloning of functional genes for yield and yield components. Furthermore, QTL-associated TDFs can be converted into user-friendly CAPS markers for marker-assisted breeding.