Introduction

Plant architecture is an important agronomic factor that influences planting area, cultivation pattern, yield level, and planting efficiency in crop plants (Sakamoto and Matsuoka 2004; Wang and Li 2008), and branching and flowering patterns are important components of this architecture (Reinhardt and Kuhlemeier 2002). Most varieties or germplasms of tetraploid cotton produce lateral shoots (branches) from the main shoot axis. The lateral branches are usually divided into two types, vegetative and fruiting, which arise from the lower and upper leaf axils of the main stem, respectively. The main shoot and the vegetative branches display monopodial growth, whereas the fruiting branches display sympodial growth, usually with more than two nodes (Jenkins et al. 1990; Boquet et al. 1994). The architecture of the cotton plant is an important factor to be considered in breeding. Many quantitative trait loci (QTLs) have been mapped with molecular markers for a few of the plant architectural traits of cotton, such as the node of the first fruiting branch, fruiting branch angle, and fruiting branch internode length (Guo et al. 2008; Luan et al. 2009; Song and Zhang 2009).

Worldwide, manual or mechanical pruning or chemical regulation is needed to alter cotton plant height and canopy architecture, which are important determinants of cotton yield (Pettigrew 1994; Reta-Sánchez and Fowler 2002). The “nulliplex-branch” trait is a type of branching mutation in which the plant’s fruiting branches show determinate growth. In nulliplex-branch cotton, 1–3 flowers arise at most of the leaf axils on the main shoot and the fruiting branches seem to disappear. At times, in some of the leaf axils, 1–2 very short branches with only one node arise and 2–3 flowers (sometimes incomplete) clump on top of them (Fig. 1). The nulliplex-branch trait could be a useful germplasm resource because it is suitable for high-density planting without pruning and chemical regulation. In the biggest cotton production region, the northwest inland area of China, all commercial Pima (Gossypium barbadense) and some upland (G. hirsutum) varieties are nulliplex-branch types. Developing markers closely linked to the nulliplex-branch trait will facilitate the molecular genetic analysis of floral branch architecture in cotton.

Fig. 1

Plant appearance of branching in cotton: a Xinhai-18 (nulliplex-branch mutant). b TM-1 (wild type). c F1 of TM-1 × Xinhai-18. Open arrows indicate flower stalks. Solid arrows indicate fruiting branches. Circles indicate nodes on branch. Square indicates flowers clumping on the top of branch

Bulked segregant analysis (BSA) was developed by Michelmore et al. (1991) for rapidly identifying markers linked to any specific gene or genomic region. Over the past several decades, BSA combined with different molecular marker technologies has been used intensively in many organisms to map important traits (Zhang et al. 1994; Xu et al. 2000; Yi et al. 2006; Huang et al. 2014); however, developing thousands of molecular markers and screening them among pools to find markers for linkage analysis have been time consuming and labor intensive. Recently, BSA was combined with next-generation sequencing (NGS) technologies for gene/QTL mapping and identification (Cuperus et al. 2010; Zuryn et al. 2010; Pomraning et al. 2011; Bowen et al. 2012; Hartwig et al. 2012; Leeuwen et al. 2012; Swinnen et al. 2012; Zhu et al. 2012). The combination of BSA and NGS can usually directly map mutations responsible for phenotypes without time-consuming and laborious screening work (Austin et al. 2011; Abe et al. 2012); however, until now, few reports were available on the application of this strategy in gene/QTL mapping in cotton (Poland et al. 2012a; Gore et al. 2014).

In this study, the feasibility of the strategy of BSA plus NGS in cotton was explored, and several single-nucleotide polymorphism (SNP) markers were identified as closely linked to the nulliplex-branch gene. Some SNP markers were converted into PCR-based markers, and a fine map was constructed around the gene.

Materials and methods

Plant materials

TM-1 and Xinhai-18 were used as parent cotton plants in the present study. TM-1 was a genetic standard in G. hirsutum, and had normal fruiting branches with more than two nodes. Xinhai-18 was a commercial Pima cotton (G. barbadense) variety released in Xinjiang Province, China, in 2001 (Jiang et al. 2001) and characterized by the nulliplex-branch trait (Fig. 1). Xinhai-18 was selfed for more than five generations to eliminate heterozygous loci. In different planting environments, Xinhai-18 flowered about 5–12 days earlier than TM-1. An F2 population of TM-1 × Xinhai-18 comprising 1,266 plants was used for genetic analysis and molecular mapping of gb_nb1. The nulliplex-branch trait was scored by visual observation at flowering time.

An F2 population comprising 168 plants of the cross TM-1 × Hai7124 was used to map the markers developed in this study to the reference map (Guo et al. 2007). Hai7124 was a G. barbadense cultivar with normal fruiting branches.

Reduced-representation sequencing and SNP identification

Genomic DNA was isolated from young leaves of the parents and F2 plants using the CTAB method (Song et al. 1998). DNA was quantified with a Nanodrop 2000 UV–Vis spectrophotometer. Thirty nulliplex-branch plants (referred to as the Nb pool) and 30 multiple node–branch plants (referred to as the Mn pool) were randomly selected from the F2 population, and an equal amount of DNA from each plant in each group was mixed to form the pools. Reduced-representation libraries (RRLs) for sequencing were constructed for TM-1, Xinhai-18 and the two DNA pools using the methods described by Sun et al. (2013). The restriction enzyme combination was “EcoRI + NlaIII + MseI”. Paired-end sequencing of the pooled RRLs was conducted on an Illumina HiSeq 2000 machine according to the standard sequencing protocol. The read length was 100 bp at each end.

Raw reads were filtered using the following criteria: reads with unknown nucleotides ≥10 % were removed; reads in which the percentage of low-quality bases (base quality ≤10) was ≥30 % were removed; reads with clear index information were trimmed; and low-quality bases at the rear end of reads were cut off. If the Q20 ratio was ≥80 % after trimming, the quality of reads was considered acceptable. In this study, a 45-bp sequence following the index sequence (21 bp) was retained at each end. These trimmed reads were called “useful fragments”. Identical fragments were merged into “tags” with a certain depth. Tags were grouped based on sequence similarity (90 % identity) among all four libraries by BLAT (Kent 2002). Tags with sequence errors were corrected (base correction) within a group according to the method proposed by Sun et al. (2013), and the final groups were named “markers”.
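The two removal criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the representation of a read as a base string plus a list of Phred scores, and the reading of "Q20 ratio" as the fraction of bases with quality ≥20 are assumptions made for illustration only.

```python
def passes_filter(seq, quals, max_n_frac=0.10, low_q=10, max_low_frac=0.30):
    """Return True if a read survives both removal criteria.

    seq   -- read bases as a string (hypothetical input format)
    quals -- per-base Phred scores, same length as seq
    """
    if not seq or len(seq) != len(quals):
        return False
    # Criterion 1: remove reads with >= 10 % unknown nucleotides
    n_frac = seq.upper().count("N") / len(seq)
    if n_frac >= max_n_frac:
        return False
    # Criterion 2: remove reads with >= 30 % low-quality bases (quality <= 10)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    if low_frac >= max_low_frac:
        return False
    return True


def q20_ratio(quals):
    """Fraction of bases with Phred quality >= 20 (assumed definition)."""
    return sum(q >= 20 for q in quals) / len(quals)
```

Under these assumptions, a trimmed read set would be accepted when `q20_ratio` is at least 0.80.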

Based on the number of tags and the polymorphisms among tags in each group, the markers were classified into four types (Fig. 2): “No Polymorphism” (contains one tag), “Repeat” (contains three or more tags), “SNP” and “Unknown”. An SNP marker contained two tags from more than two libraries, with no more than three polymorphisms for a single end or four for the two ends.

Fig. 2

Summary of different types of markers. Vertical lines indicate SNP sites on tag

Only SNP markers were used for further analysis. The markers with a summed depth of all tags less than eight were discarded. Parental origin of each tag in an SNP marker was judged by its presence in the two parent libraries. Based on the parental origins of tags, SNP markers were filtered for those that were homozygous (Fig. 3). A homozygous SNP marker was regarded as a result of a biallelic nucleotide variation, and in it each tag had a unique parental origin.

Fig. 3

Examples of different types of SNP markers. Parental origin of each tag was judged by its presence in the two parent libraries. When the depth of a tag in a parent library was low (≤5) and higher in the other (fold change of depth is ≥3), its origin would be regarded as the higher depth parent. In a homozygous SNP marker, each tag has unique parental origin. In these examples, only Marker1, Marker2 and Marker3 were regarded as homozygous markers
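The parental-origin rule in the caption above can be sketched in Python as follows. This is an illustrative reading of the rule, not the authors' code: the function names are hypothetical, and it is assumed that a tag present in only one parent library is assigned to that parent, and that a homozygous marker has exactly two tags assigned to different parents.

```python
def tag_origin(depth_tm1, depth_xh18, low_depth=5, min_fold=3):
    """Assign a tag to 'TM-1', 'Xinhai-18', or None if the origin is ambiguous."""
    if depth_tm1 == 0 and depth_xh18 == 0:
        return None
    if depth_xh18 == 0:
        return "TM-1"
    if depth_tm1 == 0:
        return "Xinhai-18"
    # Tag seen in both parents: low depth (<= 5) in one library and a
    # fold change of depth >= 3 assigns the tag to the higher-depth parent
    if depth_xh18 <= low_depth and depth_tm1 >= min_fold * depth_xh18:
        return "TM-1"
    if depth_tm1 <= low_depth and depth_xh18 >= min_fold * depth_tm1:
        return "Xinhai-18"
    return None


def is_homozygous(tag_depths):
    """tag_depths: list of (depth_tm1, depth_xh18) tuples, one per tag."""
    origins = [tag_origin(t, x) for t, x in tag_depths]
    # Exactly two tags, one from each parent (assumed reading of "unique origin")
    return sorted(o or "" for o in origins) == ["TM-1", "Xinhai-18"]
```

For example, a marker whose two tags have parental depths (20, 0) and (0, 18) would be kept, whereas a marker with a tag at depths (15, 14) would not.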

Association analysis

Association mapping was conducted according to the method proposed by Abe et al. (2012). When plants are selected completely at random for the construction of the pools, the TM-1 allele of gb_nb1 should be absent from the Nb pool, and the ratio of the TM-1 (A) to Xinhai-18 (a) allele frequency in the Mn pool should be A:a = 2:1. For each homozygous marker, the SNP index was calculated in both pools. In the Mn pool, SNP-index_Mn = X_Mn/(X_Mn + T_Mn), in which X_Mn and T_Mn were the depths of the Xinhai-18 and TM-1 origin tags, respectively. This index was expected to be equal to 0.33 near the causal gene and 0.5 for unlinked loci. In the Nb pool, SNP-index_Np = X_Np/(X_Np + T_Np), in which X_Np and T_Np were the depths of the Xinhai-18 and TM-1 origin tags in that pool, respectively. This index was expected to be equal to 1.0 near the causal gene and 0.5 for unlinked loci.
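The SNP index and its expected value in the Mn pool can be worked through in a short Python sketch. The function names are hypothetical; the expectation assumes a 1:2:1 genotype ratio in the F2 population and complete recessiveness of the nulliplex-branch allele, as described in the text.

```python
from fractions import Fraction


def snp_index(depth_xh18, depth_tm1):
    """Proportion of reads carrying the Xinhai-18 tag: X / (X + T)."""
    total = depth_xh18 + depth_tm1
    if total == 0:
        raise ValueError("marker has no reads in this pool")
    return depth_xh18 / total


def expected_mn_index_at_gene():
    """Expected Xinhai-18 allele frequency at the causal locus in the Mn pool.

    Normal-branch F2 plants are AA (1/4) or Aa (2/4); only Aa plants carry
    one Xinhai-18 allele out of two.
    """
    freq_AA, freq_Aa = Fraction(1, 4), Fraction(2, 4)
    share_Aa = freq_Aa / (freq_AA + freq_Aa)  # 2/3 of normal plants
    return share_Aa * Fraction(1, 2)          # 1/3, i.e. A:a = 2:1
```

In the Nb pool all plants are aa at the causal locus, so the corresponding index is expected to approach 1.0.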

All homozygous markers were aligned to the reference D5 and A2 genome sequence (90 % identity) of G. raimondii and G. arboreum, respectively, by BLAT (Kent 2002). If a marker was aligned to multiple positions (greater than or equal to two) in a genome, it would be discarded. For all aligned markers, both SNP indices were plotted across the genomes.

Validation of SNP markers

SNP markers were converted into PCR-based markers using the modified allele-specific PCR method of Hayashi et al. (2004), in which the SNP genotype is revealed by competitive PCR amplification using allele-specific primers. In the present study, primers were designed to produce an amplification product in TM-1 but not in Xinhai-18, so they behaved as dominant markers. In each allele-specific primer, the SNP site was placed at the first base from the 3′ end of the primer, and an additional artificial mismatch was included at the second or third base from the 3′ end.
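The primer layout described above can be illustrated with a minimal Python sketch for a forward-strand primer. It is a simplification under stated assumptions: the function name, the default primer length, and in particular the substitution table used to choose the mismatched base are hypothetical and are not taken from Hayashi et al. (2004).

```python
# Hypothetical substitution rule for the artificial mismatch (illustration only)
MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}


def allele_specific_primer(template, snp_pos, allele, length=20, mismatch_offset=3):
    """Build a forward primer whose 3'-terminal base is the chosen SNP allele.

    template        -- forward-strand sequence, 5' to 3'
    snp_pos         -- 0-based position of the SNP in template
    allele          -- allele the primer should amplify (e.g. the TM-1 base)
    mismatch_offset -- 2 or 3: position of the artificial mismatch counted
                       from the 3' end (the SNP itself is position 1)
    """
    if mismatch_offset not in (2, 3):
        raise ValueError("mismatch must be at the 2nd or 3rd base from the 3' end")
    start = snp_pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    bases = list(template[start:snp_pos].upper() + allele.upper())
    bases[-mismatch_offset] = MISMATCH[bases[-mismatch_offset]]
    return "".join(bases)
```

For example, `allele_specific_primer("ACGTACGTACGTACGTACGTACGT", 19, "C", length=10)` returns `GTACGTAAGC`: the primer ends in the selected allele and carries one deliberate mismatch at the third base from the 3′ end.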

Linkage analysis

The linkage map was constructed with MapMaker v3.0 (Lander et al. 1987) using phenotype and genotype data and allele-specific PCR markers in F2 populations. Recombination frequency was transformed into genetic distances (centimorgans, cM) using Kosambi’s mapping function (Kosambi 1944).
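For reference, Kosambi's mapping function converts a recombination frequency r into a map distance d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The sketch below shows this conversion in Python; the function name is hypothetical and the code is independent of MapMaker.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for recombination frequency r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    # d (Morgans) = 0.25 * ln((1 + 2r) / (1 - 2r)); multiplied by 100 for cM
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))
```

For small r the distance in cM is close to 100 × r, so a recombination frequency of 0.01 corresponds to about 1 cM.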

Results

Genetic analysis of the nulliplex-branch trait

To investigate the inheritance of the nulliplex-branch trait in cotton, reciprocal crosses between Xinhai-18 and TM-1 were performed. Both F1 generations had normal fruiting branches (Fig. 1), which suggested that the nulliplex-branch trait was recessive. Of the 1,266 F2 plants developed from TM-1 × Xinhai-18, 325 were the nulliplex-branch type and 941 were the normal fruiting-branch type, which fit the 1:3 segregation ratio (χ² = 0.27, P > 0.05). Thus, the nulliplex-branch trait in Xinhai-18 was controlled by a single recessive gene, gb_nb1.
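The goodness-of-fit test can be reproduced with a short Python sketch. The function name is hypothetical. The reported value of 0.27 appears to correspond to the χ² statistic with Yates' continuity correction; this is an inference from the arithmetic, since the text does not state which form was used.

```python
def chi_square_1_to_3(n_recessive, n_dominant, yates=True):
    """Chi-square statistic (1 df) for a 1:3 recessive:dominant segregation."""
    total = n_recessive + n_dominant
    observed = (n_recessive, n_dominant)
    expected = (total * 0.25, total * 0.75)
    chi2 = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
        chi2 += diff ** 2 / e
    return chi2


corrected = chi_square_1_to_3(325, 941)                 # about 0.27
uncorrected = chi_square_1_to_3(325, 941, yates=False)  # about 0.30
```

Both values are far below 3.84, the critical value for one degree of freedom at P = 0.05, so the 1:3 ratio is not rejected in either case.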

Sequencing and SNP identification

Four samples generated 314–419 Mb sequencing data, and 2.07–2.76 million “useful fragments” were retained after read filtering and trimming (Table 1). After merging the identical fragments in each library, 44,350–62,822 tags were obtained in four libraries. Tags from the four libraries were grouped, and 50,480 groups (“markers”) were obtained. Of these markers, 71.53 % were the “No Polymorphism” type, and only 23.38 % were the “SNP” type, which could be used for further analysis (Fig. 2; see detail of these SNPs in supplementary Table 1). Out of 11,805 SNP markers, 7,332 were regarded as homozygous. For these homozygous markers, the average depth was 119 and the individual depths ranged from 22 to 1,196.

Table 1 Summary of reduced-representation sequencing

In the alignment to the reference genomes, 3,600 and 4,538 homozygous markers could be mapped to the D5 and A2 genomes, respectively. Of these, 1,949 markers were common to the two genomes. For the alignment to the D5 genome, the genome coverage was 744 Mb and the average marker density was 4.8 markers per Mb. For the alignment to the A2 genome, the genome coverage was 1,527 Mb and the average marker density was 3.0 markers per Mb.

Association mapping

For assigned markers, SNP-index_Mn and SNP-index_Np were calculated and plotted against their genome positions in each reference genome. The interval of 9.9–18.1 Mb on D01 of the D5 genome appeared to be the only candidate region harboring gb_nb1 in both SNP index plots (Fig. 4; see SNP-indices for all aligned markers in supplementary Table 2). For SNP-index_Np, 25 out of 42 markers in this interval had an index of >0.90. In addition, a cluster of six consecutive markers with SNP index of >0.98 was found in the interval of 14.6–15.7 Mb. For SNP-index_Mn, two marker clusters with an SNP index of ~0.33 were found. One cluster containing five markers was located in the interval of 11.3–12.8 Mb with an SNP index of 0.31–0.39. For another cluster containing eight markers in the interval of 14.1–15.8 Mb, five markers with an SNP index of ~0.33 (0.31–0.39) were found.

Fig. 4

SNP index plots on D01 of the D5 genome. Little open dots indicate index values of markers. Regression lines were obtained by the method of Abe et al. (2012). Averaged SNP index values in a moving window of 2.0 Mb with 100 kb increments were obtained, and the position of each averaged SNP index was set at a midpoint of the window
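The moving-window averaging described in the caption can be sketched in Python as below. This is a simplified illustration, not the procedure of Abe et al. (2012) in full: the function name is hypothetical, windows are taken as half-open intervals, windows that contain no markers are skipped, and only complete windows within the chromosome are evaluated.

```python
def sliding_window_mean(positions_bp, values, chrom_len_bp,
                        window_bp=2_000_000, step_bp=100_000):
    """Average SNP index in a moving window; returns (midpoint_bp, mean) pairs."""
    points = sorted(zip(positions_bp, values))
    averaged = []
    start = 0
    while start + window_bp <= chrom_len_bp:
        end = start + window_bp
        in_window = [v for p, v in points if start <= p < end]
        if in_window:  # skip windows without markers
            midpoint = start + window_bp // 2
            averaged.append((midpoint, sum(in_window) / len(in_window)))
        start += step_bp
    return averaged
```

With the default arguments the function uses the 2.0-Mb window and 100-kb increment stated in the caption, and each averaged value is placed at the window midpoint.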

Validation of SNP marker and mapping

Allele-specific primers were designed for 10 markers in the candidate region on D01 (supplementary Table 3) and were expected to yield PCR products in TM-1 but not in Xinhai-18. Of them, two markers, Marker9009 and M19989, showed only ambiguous band differences between the parents in polyacrylamide gels, so they were excluded from further analysis. The remaining eight polymorphic primers were used to genotype the 30 plants used for bulk construction (data not shown). Of them, only Marker21373 showed no recombinant in the Nb pool (Fig. 5). Thus, Marker21373 and the three markers around it were used to further genotype the 1,266 F2 plants used for genetic analysis. After linkage analysis, gb_nb1 was mapped to an interval of 1.87 cM defined by Marker21373 and Marker2784, which were separated by a physical distance of 835 kb (Fig. 6b).

Fig. 5

Confirmation of linked SNP markers in the 30 nulliplex-branch (Nb) and 30 normal (Mn) plants that made up the Nb and Mn pools: gel images of Marker2784 (a) and Marker21373 (b) in the two pools. Open arrows indicate polymorphic bands. Stars indicate recombinant plants. M DNA ladder, P1 Xinhai-18, P2 TM-1

Fig. 6

Mapping of gb_nb1. a Partial physical map of A01 of Gossypium arboreum. b Partial physical map of D01 of G. raimondii. c Genetic linkage map of gb_nb1 constructed from an F2 population of TM-1 × Xinhai-18. d Partial linkage map of Chr16 constructed from an F2 population of TM-1 × Hai7124

To further validate the position of gb_nb1, SNP information from the results of Byers et al. (2012) was used. The 29,356 SNPs between G. hirsutum and G. barbadense (downloaded from the National Center for Biotechnology Information dbSNP under the handle UDALL_LAB) were aligned to the reference D5 genome sequence by BLASTN analysis. Some SNPs were located in the interval defined by Marker21373 and Marker2784, and several were used to screen the parents by designing allele-specific primers (data not shown). Marker76506, derived from SNP ss262976506, showed polymorphism between TM-1 and Xinhai-18, and was then mapped between Marker21373 and Marker2784 after segregation analysis in the F2 population (Fig. 6c). Finally, gb_nb1 was flanked by Marker21373 and Marker76506 at genetic distances of 0.6 and 0.4 cM, respectively.

To determine the map location of gb_nb1 in the published reference map (Guo et al. 2007), polymorphism analysis was conducted between TM-1 and Hai7124 using these five markers around gb_nb1. All five markers showed expected polymorphisms and were then assayed in the F2 population of TM-1 × Hai7124. Nine nearby SSR markers (by BLASTN analyses using marker source sequences) on Chr. 7 and Chr. 16 in the reference map were also genotyped in the same population. After linkage analysis, these five markers were found associated with SSR markers specific for Chr. 16 (Fig. 6d). Thus, gb_nb1 was located on Chr. 16 (D7) of the allotetraploid genome.

Discussion

In China, many nulliplex-branch germplasms have been discovered within recent decades for both upland and sea-island cottons (Du et al. 1996); however, these germplasm lines were found to be agronomically undesirable because they have a low lint percentage, small boll size and short fiber length (Du et al. 1996). Backcrossing is a good breeding protocol for eliminating these disadvantages; however, the nulliplex-branch trait is controlled by a recessive locus, so a self-bred generation is essential after each backcross. Molecular marker-assisted selection (MAS) using tightly linked markers would expedite breeding progress; however, before the current study, no marker-trait association for gb_nb1 was available, and the genetic basis of the nulliplex-branch trait was unknown. In this study, the inheritance of the nulliplex-branch trait was investigated and a set of SNP markers closely linked with this trait was identified. These markers could be used for MAS of the nulliplex-branch gene in cotton breeding. Identification of closely linked markers and the genomic location of gb_nb1 with respect to a reference genome could help pave the way for map-based cloning of this gene and help unravel the molecular genetic basis of plant architecture in cotton.

SSR markers were the most widely used marker type in cotton for genetic linkage map construction and gene/QTL mapping (Guo et al. 2007; Yu et al. 2013); however, polymorphism levels observed for SSR markers were low, especially within intraspecific crosses in upland cotton (Shen et al. 2007; Wang et al. 2007; Zhang et al. 2009; Yu et al. 2011). Very few reports on fine mapping of gene/QTL have been published for cotton (Yin et al. 2006; Dong et al. 2007; Li et al. 2010; Shen et al. 2010). The lack of sufficient polymorphic markers was one of the major obstacles in fine mapping of cotton genes. Draft genome sequences of the diploid progenitors (G. raimondii and G. arboreum) were recently released (Paterson et al. 2012; Wang et al. 2012; Li et al. 2014), and the sequence information in their specific chromosome intervals can be used for developing SSR markers in allotetraploid cotton; however, low polymorphism ratio and low efficiency still hinder the application of this strategy (He et al. 2013). SNP occurs with a very high frequency in plant genomes, and is suitable for construction of high-density genetic linkage maps for large genome crops (Poland et al. 2012b; Chen et al. 2013). Recently, the discovery of SNP markers by reduced-representation sequencing (RRS) technology proved to be efficient and low cost in cotton (Byers et al. 2012; Chen et al. 2014; Gore et al. 2014). In the present study, the nulliplex-branch gene was successfully located between two SNP markers using BSA combined with NGS. This strategy should also be feasible in other gene/QTL mapping in cotton.

Reduced-representation sequencing, rather than the whole-genome sequencing usually used in model plants (Arabidopsis and rice), was adopted for SNP identification. The mapping resolution is relatively low, and the causative nucleotide variation for the nulliplex-branch gene could not be directly found using the current sequence coverage and analysis. The coverage ratio of the genome is only ~0.00208 when calculated using all markers obtained (50,482 × 90 bases/2.4 gigabase). This density is not enough for identifying the candidate genes; however, RRS is the ideal choice for BSA analysis in allotetraploid cotton, considering the large genome size. Causative nucleotide variation for a given phenotype can be located in an interval <1.0-Mb long in one step without prior knowledge about its location. Nevertheless, the polymorphism ratio between the parents of a mapping population is an important consideration when using this strategy in cotton. In this study, only 7,332 out of 50,482 (14.5 %) markers could be used for association analysis, and most (71.5 %) were not polymorphic. Considering the very low polymorphism observed in the current study, more sequence coverage is recommended to achieve acceptable marker density in upland × upland intraspecific crosses.

The syntenic region defined by the two flanking markers (Marker21373 and Marker76506) in the reference D5 genome was predicted to contain 20 genes (JGI annotation v2.1; Paterson et al. 2012). One of the putative genes, Gorai.001G121800, was found to show high homology with the gene ATC (A. thaliana CENTRORADIALIS homolog). The G. arboreum homolog of Gorai.001G121800, Cotton_A_09584, is also located in the syntenic region in the A2 reference genome (Fig. 6a). In Antirrhinum majus L., the centroradialis mutation causes the normally indeterminate inflorescence to terminate in a flower (Bradley et al. 1996). ATC functions as a floral inhibitor (Huang et al. 2012) and encodes a protein similar to TFL1 (TERMINAL FLOWER 1), which functions as a regulator of cell aging and flower development (Liljegren et al. 1999; Grbic and Bleecker 2000). Although ATC mutants flower normally under long-day conditions (Huang et al. 2012), the tfl1 mutants flower earlier and have determinate inflorescences (Shannon and Meeks-Wagner 1993). The similar pattern of branching and flowering between tfl1 and nulliplex-branch plants suggests that Gorai.001G121800 might be the candidate for the nulliplex-branch gene. McGarry and Ayre (2012) delivered Arabidopsis FLOWERING LOCUS T to an ancestral accession and a domesticated variety of cotton, and observed some plant architectural changes, such as precocious determinate architecture and lanceolate leaf shape. It appears that homologs of some genes involved in the establishment of plant architecture in Arabidopsis might also affect the plant architecture of cotton.

Author contribution statement

W. C. conducted genetic analysis, marker development, and mapping analysis; J. Y., L. C., and Z. Y. conducted the sequencing; Y. L. conducted genotyping of the mapping population; Y. Z. designed and supervised the study; and W. C. and Y. Z. analyzed the data and wrote the paper. All authors discussed the results and commented on the manuscript.