Introduction

Hemp, Cannabis sativa L., is a non-food crop currently grown for a wide range of end products derived from its fibre, wooden core, or seeds, such as textiles, paper pulp, lime-hemp concretes, or hempseed oil (Struik et al. 2000). After a decline in its cultivation over the last 200 years, this species is now perceived as a potentially profitable crop suitable for cultivation in sustainable farming systems (van der Werf et al. 1996).

Hemp is a diploid species (2n = 20) that has heteromorphic sex chromosomes and is characterized by sexual dimorphism. Male plants of dioecious hemp have XY sex chromosomes, and female plants have XX sex chromosomes (Yamada 1943, cited by Sakamoto et al. 1995). The sex of the plant (male/female) is controlled by an X-to-autosomes equilibrium rather than by a Y-active system (Westergaard 1958; Ainsworth 2000). The presence of a pseudoautosomal region allowing recombination between the X and Y chromosomes has been reported in dioecious hemp (Peil et al. 2003), as well as the existence of polymorphisms between the X chromosomes (Peil et al. 2003; Rode et al. 2005). Morphologically, male plants of dioecious hemp are characterised by hanging panicles with few or no leaves, and staminate flowers. The female plants have a distinct appearance, bearing racemes with leafy bracts and pistillate flowers. Despite the presence of sex chromosomes, some plants of dioecious hemp produce flowers of the sex opposite to the main one. This anomalous flowering trait is affected by external factors, such as the photoperiod or hormonal treatments (Freeman et al. 1980).

Monoecious hemp has homomorphic sex chromosomes (XX) (Menzel 1964; Faux et al. 2014; Razumova et al. 2015). However, sex expression, defined as the ratio of female to male flowers, is highly variable, and the genetic determinism of this variation is unknown. The plants of monoecious hemp have inflorescences similar to those of female plants. However, they produce male and female flowers with ratios that differ among nodes along the stem, plants, cultivars, over time (Faux et al. 2013, 2014; Faux and Bertin 2014), and in response to external factors (Arnoux et al. 1966a, b; Arnoux and Mathieu 1969; Truta et al. 2007). The quantitative variation in sex expression along the stem of monoecious hemp plants has been successfully described as a logistic function of the node position (Faux and Bertin 2014). The observation of genotypic variability in sex expression in monoecious hemp led us to assume that this trait can be inherited, and therefore, it could be investigated by searching for quantitative trait loci (QTL) (Faux et al. 2014).

Agriculturally, hemp cultivation is significantly affected by the reproductive features of the plant. Although this species is naturally dioecious, both dioecious and monoecious cultivars exist. Dioecious cultivars have relatively high stem yields. However, male plants flower and senesce earlier than female plants, so it is difficult to determine the optimal time for harvesting the fiber of such cultivars as well as to harvest the seeds mechanically. This restricts seed multiplication for such cultivars (Bocsa and Karus 1998). In contrast, monoecious cultivars mature synchronously, allowing for the mechanical harvest of both stems and seeds (Mandolino and Carboni 2004). Consequently, seeds of these cultivars are easily obtained. In addition, these cultivars can improve the profitability of the farming operation by allowing the production of both stems and seeds (Bocsa and Karus 1998).

In monoecious hemp cultivars, the masculinised plants set few seeds and are more prone to fungal infections (Fournier and Beherec 2006). Therefore, strongly masculinised plants must be eliminated to produce good-quality seeds from monoecious cultivars (Beherec 2000). A previous study showed that sex expression in monoecious hemp varied among cultivars, while higher seed yields were obtained from early and mid-early feminised cultivars (Faux et al. 2013). Given its practical implications in both cultivation and breeding, sex expression is considered to be one of the most important factors in the genetic improvement in hemp (Mandolino and Carboni 2004; Ranalli 2004).

So far, two non-saturated molecular maps have been published for hemp (Mandolino and Ranalli 2002; Peil et al. 2003, cited by Mandolino and Carboni 2004). Both maps were constructed using F1 segregating progenies. The first map (Mandolino and Ranalli 2002) was based on 40 individuals derived from a cross between a monoecious plant and a female plant; thus, the Y chromosome was not included. In that study, 66 RAPD markers segregating 1:1 were distributed among 11 co-segregation groups in the female parent map, and 43 markers were distributed among nine co-segregation groups in the monoecious parent map. The second map was obtained from a cross between a male plant and a female plant of a dioecious accession. A total of 122 AFLP markers was distributed among 10 linkage groups (Peil et al. 2003, cited by Mandolino and Carboni 2004).

Sex-specific AFLP markers were successfully detected in hemp by Flachowsky et al. (2001). The AFLP technique was also used to detect sex-specific markers in Dioscorea tokoro (Terauchi and Kahl 1999) and Asparagus officinalis (Reamon-Büttner and Jung 2000). The AFLP technique is known for its reliability and the potentially high number of amplified fragments, which makes it a useful tool for the construction of genetic maps (Vos et al. 1995; Mueller and Wolfenbarger 1999; Meudt and Clarke 2007). Besides, according to Rouppe van der Voort et al. (1997) and Qi et al. (1998), the homology of AFLP products among populations belonging to the same species is nearly always valid, allowing AFLP markers to be used to integrate different linkage maps of a given species.

In this study, we investigated the roles of sex chromosomes in the genetic determinism of sex expression in dioecious and monoecious hemp. The experimental materials were segregating populations derived from dioecious and monoecious hemp cultivars. The use of a dioecious population allowed us to map a ‘sex’ phenotypic marker, and thus, to identify sex-linked markers. Genetic mapping was carried out using AFLP markers, and the genetic determinism of sex expression was investigated using a QTL analysis. We searched for QTLs using mixed models, because of their high flexibility to incorporate co-variables (such as time) into analyses, and to adequately model the residual genetic variation (Malosetti et al. 2004, 2006, 2007; Boer et al. 2007; Pastina et al. 2012). There were two specific objectives: (i) to identify sex-linked markers in dioecious hemp and, by homology, markers and co-segregation groups putatively located on the sex chromosomes in monoecious hemp; and (ii) to identify QTLs associated with sex expression in hemp.

Materials and methods

Genetic material

Three hemp cultivars were used: one dioecious, ‘Carmagnola’, and two monoecious, ‘Uso 31’ and ‘Fedora 17’. Seeds of the dioecious cultivar were obtained from Assocanapa (Carmagnola, Italy), and seeds of both monoecious cultivars were obtained from the Fédération Nationale des Producteurs de Chanvre (Le Mans, France). Three populations were created, all consisting of the full-sib F1 progeny of a cross between two outbred parents. The first two populations (C1 and C2) were derived from a cross between male and female ‘Carmagnola’ plants. The third population (UF) was obtained from a cross between ‘Uso 31’ as the male parent and ‘Fedora 17’ as the female parent after emasculation. ‘Uso 31’ has been described as being more masculinised than ‘Fedora 17’, while both cultivars exhibit similar earliness, allowing their flowering periods to synchronise (Faux et al. 2013, 2014). The C1, C2, and UF populations included 77 (43 females and 34 males), 76 (48 females and 28 males), and 167 individuals, respectively. The relatively poor seed set in the controlled crosses in ‘Carmagnola’ made the use of two populations necessary.

Growth conditions

The plants were cultivated in a greenhouse. One single plant per genotype was grown. Growth conditions were as described in Faux and Bertin (2014), except for the photoperiod: the UF plants were kept under 16 h of light during the entire trial period, whereas the C1 and C2 plants were kept under 16 h for 69 days and 8 h thereafter to promote flowering.

Molecular data

Molecular data were obtained for each of the 320 F1 individuals and six parental individuals. DNA extraction was conducted as described in Faux et al. (2014). AFLP amplifications were performed according to Vos et al. (1995) with slight modifications according to Flachowsky et al. (2001) and Peil (pers. comm. 2009). DNA was cleaved using the restriction enzymes HindIII and Tru9I (same restriction site as MseI) (Promega Corp, Madison, WI, USA). After ligation with adaptors specific to each restriction enzyme, pre-selective amplifications were performed with both HindIII- and Tru9I-primers with one selective nucleotide (A). The pre-amplification mixture was diluted 20 times with sterile H2O. The selective amplifications were performed using a HindIII-primer as a rare cutter and both HindIII- and Tru9I-primers with three selective nucleotides. Eight distinct primer combinations (Flachowsky et al. 2001) were tested (Table 1). All primer and adapter sequences were designed by Eurogentec SA (Seraing, Belgium). PCRs were run on a PTC 100 thermal cycler (MJ Research, Waltham, MA, USA).

Table 1 Selective nucleotides of each primer combination used to generate AFLP markers

The amplification products were fractionated with an automated ABI Prism 3100 Genetic Analyser (Applied Biosystems, Warrington, UK) according to the manufacturer’s instructions. Their sizes were scaled with the molecular standard GeneScan-500 Rox. The size and intensity of the amplified fragments were visualised using the free software Peak Scanner™ v1.0 (Applied Biosystems 2006), and the fragments were between 40 and 400 bp long.

Segregation of AFLP markers

Each amplified fragment was scored as a dominant marker, i.e., according to its presence (allele a) or absence (allele o) in each individual. The markers were labelled with the number of the primer combination (Table 1) followed by the molecular weight of the DNA fragment (in bp). A cross type was attributed to each marker in each population according to its presence in the parents and its segregation pattern in the offspring, which was tested against both possible ratios (1:1 and 3:1) using chi-square (χ²) tests. Three cross types were distinguished: markers that were heterozygous in the female parent and absent from the male parent; markers that were heterozygous in the male parent and absent from the female parent (both of these types segregating 1:1 in the offspring); and markers that were heterozygous in both parents, segregating 3:1.
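As an illustration of the segregation test described above, the following minimal sketch compares the presence/absence counts of one marker with the 1:1 and 3:1 ratios using a chi-square test with one degree of freedom. The counts are hypothetical and the function names are introduced here for illustration only; this is not the software used in the study.

```python
import math


def chi2_p_value_1df(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def segregation_test(present, absent, ratio):
    """Test presence/absence counts against an expected ratio, e.g. (1, 1) or (3, 1)."""
    total = present + absent
    exp_present = total * ratio[0] / (ratio[0] + ratio[1])
    exp_absent = total - exp_present
    stat = ((present - exp_present) ** 2 / exp_present
            + (absent - exp_absent) ** 2 / exp_absent)
    return stat, chi2_p_value_1df(stat)


# Hypothetical marker scored in 77 offspring: 40 with the fragment, 37 without.
stat_11, p_11 = segregation_test(40, 37, (1, 1))
stat_31, p_31 = segregation_test(40, 37, (3, 1))
print(f"1:1 chi2 = {stat_11:.3f}, p = {p_11:.3f}")
print(f"3:1 chi2 = {stat_31:.3f}, p = {p_31:.2e}")
```

With these hypothetical counts, the 1:1 hypothesis is not rejected whereas the 3:1 hypothesis is, so the marker would be treated as heterozygous in one parent only.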

Identification of sex-linked markers

In the dioecious populations, the sex of each plant was coded as 0 (female) or 1 (male) and used as a phenotypic marker. Sex-linked AFLP markers were detected using the five classes of markers segregating with the ‘sex’ phenotypic marker as defined by Peil et al. (2003) (Table 2). Classes A, B, and C contain markers that are polymorphic on the X chromosome with no fragment on the Y chromosome. Class A markers are heterozygous in both parents, class B markers are heterozygous in the female parent only, and class C markers are heterozygous in the male parent only. Class D markers are heterozygous in both parents with the ‘presence’ allele located on the Y chromosome in the male parent, indicating a pseudoautosomal region. Class E includes male-associated markers: they are heterozygous in the male parent only, with the ‘presence’ allele located on the Y chromosome. Recombinants between class B markers and sex cannot be detected as a result of their respective allelic configurations. Therefore, class B markers are identified by linkage to class A markers in the male progenies and to class D markers in the female progenies. In the present study, the markers that met the cross type and segregation criteria of class B (Table 2) but were not linked to a class A or class D marker were referred to as ‘putative class B’ markers. These markers could be located on either the sex chromosomes or the autosomes.
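The expected segregation of each class among daughters and sons can be derived by enumerating the parental gametes. The sketch below does this under stated assumptions: the allelic configurations are read from the verbal class definitions above (Table 2 is not reproduced here), a heterozygous female is taken to carry the ‘presence’ allele on one X chromosome only, and recombination between the marker and the sex locus is ignored. The function name is hypothetical.

```python
from fractions import Fraction


def expected_presence(father_x, father_y, mother_x1, mother_x2):
    """Expected fraction of marker-positive daughters and sons for a dominant marker.

    Each argument states whether that parental chromosome carries the
    'presence' allele. Daughters inherit the paternal X, sons the paternal Y;
    both receive one of the two maternal X chromosomes with probability 1/2.
    """
    maternal = (mother_x1, mother_x2)
    daughters = Fraction(sum(father_x or m for m in maternal), 2)
    sons = Fraction(sum(father_y or m for m in maternal), 2)
    return daughters, sons


# Assumed configurations (father X, father Y, mother X1, mother X2) per class.
classes = {
    "A": expected_presence(True, False, True, False),
    "B": expected_presence(False, False, True, False),
    "C": expected_presence(True, False, False, False),
    "D": expected_presence(False, True, True, False),
    "E": expected_presence(False, True, False, False),
}

for name, (daughters, sons) in classes.items():
    print(f"class {name}: daughters {daughters}, sons {sons}")
```

Under these assumptions, class B is the only class with the same expected ratio (1:1) in both sexes, which is why its linkage to sex cannot be inferred from segregation with the ‘sex’ marker alone.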

Table 2 Five classes of sex-linked markers as defined by Peil et al. (2003)

The markers were assigned to a class of sex-linked markers by testing their segregation ratios in male and female progenies against the expected segregation ratios using a chi-square test. Markers deviating from the expected segregation ratios at α = 0.01 were discarded.

Map construction

The linkage analysis and map construction were performed independently for each segregating population using the OneMap package (Margarido et al. 2007, 2012). The linkage analysis was performed using the same statistical stringency to construct each of the three maps, to allow for meaningful comparisons among them. The LOD score and recombination fraction thresholds used were 3.5 and 0.5, respectively. The same threshold values of LOD score and recombination fraction were used to order markers within co-segregation groups. Map distances were based on the Kosambi function. The dioecious maps (C1 and C2) were integrated by using the markers that were mapped in both populations. Similarly, the monoecious map was integrated using the markers shared with C1 or C2.
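For reference, the Kosambi function converts a recombination fraction r into a map distance d = 25 ln[(1 + 2r)/(1 − 2r)] cM. The short sketch below implements this conversion and its inverse; the function names are illustrative and do not correspond to the OneMap interface.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Recombination fraction corresponding to a Kosambi distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


# Example: a recombination fraction of 0.10 corresponds to about 10.14 cM.
distance = kosambi_cm(0.10)
print(round(distance, 2), round(kosambi_r(distance), 2))
```

The distance grows without bound as r approaches 0.5, so the function is only defined for r strictly below that value.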

Phenotyping

Sex expression was characterized based on the reproductive morphology of the plants (dioecious or monoecious). In the dioecious populations, the plants were phenotyped at three time points, i.e., 77, 84, and 98 days after sowing. The sex of each plant was recorded as a binary variable, i.e., male or female. However, flowers of the sex opposite to the main one were observed on some plant nodes. For each male plant at each observation time, %F-♂ indicated the percentage of nodes bearing female flowers. Similarly, for each female plant at each observation time, %M-♀ indicated the percentage of nodes bearing male flowers.

In the monoecious population, the continuous variation of sex expression was characterized by the monoecy degree (MD), a five-point scale ranging from 1 (mostly male flowers) to 5 (mostly female flowers) (Sengbusch 1952; Faux and Bertin 2014). We scored MD for each flowering node at six times, i.e., 43, 50, 57, 64, 71, and 78 days after sowing. Eight phenotypic variables were used to summarise the sex expression of each monoecious plant at each observation time (Table 3), as described in Faux and Bertin (2014).

Table 3 Phenotypic variables used to characterize sex expression in dioecious and monoecious hemp populations

QTL analysis

The QTL analysis was performed in each segregating population independently. Putative QTLs were identified by interval mapping (Lander and Botstein 1989) before being tested in a multiple regression similar to that applied by Pastina et al. (2012) to an F1 segregating population of sugarcane.

(i) Computation of genetic predictors

Three QTL effects were estimated: two additive effects, \(\alpha_{P}\) in parent P and \(\alpha_{Q}\) in parent Q, and one dominance effect, \(\delta_{PQ}\). Genetic predictors were constructed for each of the three QTL effects (Lynch and Walsh 1998; Pastina et al. 2012) and computed for a grid of evaluation points along the genome. These predictors were introduced as explanatory variables in the QTL models (see below).
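One common way to build such predictors in a full-sib family, given here only as a hedged sketch, is to take the expected value of three orthogonal contrasts over the four possible QTL genotypes (P1Q1, P1Q2, P2Q1, P2Q2) at an evaluation point. The contrast coding and the genotype probabilities below are assumptions made for illustration; the exact construction should be checked against Pastina et al. (2012).

```python
def genetic_predictors(probs):
    """Return (x_P, x_Q, x_PQ) from conditional QTL genotype probabilities.

    `probs` maps the four genotype classes 'P1Q1', 'P1Q2', 'P2Q1', 'P2Q2'
    to probabilities that sum to 1 (assumed to come from the linkage map).
    """
    p11, p12 = probs["P1Q1"], probs["P1Q2"]
    p21, p22 = probs["P2Q1"], probs["P2Q2"]
    if abs(p11 + p12 + p21 + p22 - 1.0) > 1e-9:
        raise ValueError("genotype probabilities must sum to 1")
    x_p = p11 + p12 - p21 - p22   # contrast between the two alleles of parent P
    x_q = p11 - p12 + p21 - p22   # contrast between the two alleles of parent Q
    x_pq = p11 - p12 - p21 + p22  # interaction (dominance) contrast
    return x_p, x_q, x_pq


# Hypothetical individual whose genotype at the evaluation point is uncertain.
x_p, x_q, x_pq = genetic_predictors(
    {"P1Q1": 0.7, "P1Q2": 0.1, "P2Q1": 0.1, "P2Q2": 0.1}
)
print(x_p, x_q, x_pq)
```

When the genotype is known with certainty, each predictor equals −1 or 1; when the four genotypes are equally likely, all three predictors are 0 and the point carries no information for that individual.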

(ii) Identification of putative QTLs by interval mapping (IM)

For notation purposes, the random variables are underlined in the equation below. The presence of a putative QTL at evaluation genomic point w was assessed using the following model, hereafter referred to as the IM model:

$$\underline{y}_{i} = \mu + x_{P_{iw}} \alpha_{P_{w}} + x_{Q_{iw}} \alpha_{Q_{w}} + x_{PQ_{iw}} \delta_{PQ_{w}} + \underline{e}_{i}$$
(1)

where \(\underline{y}_{i}\) is the phenotypic value of individual i, µ is the overall mean phenotypic value, \(\alpha_{P_{w}}\), \(\alpha_{Q_{w}}\) and \(\delta_{PQ_{w}}\) are the effects of each of the three genetic predictors, \(x_{P_{iw}}\), \(x_{Q_{iw}}\) and \(x_{PQ_{iw}}\), respectively, and \(\underline{e}_{i}\) is the residual error associated with individual i, normally distributed with mean 0 and variance \(\sigma^{2}\).

For the binary variable sex, the IM model was tested using the GLIMMIX procedure with binomial distribution for the response variable and logit link function in SAS (SAS Institute Inc. 2012). All other phenotypic variables (Table 3) were analysed by allowing non-null correlations between the observations made at distinct times on the same plant. For this purpose, the MIXED procedure in SAS was used with the REPEATED statement including type = AR(1) and subject = individual.
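A first-order autoregressive, AR(1), structure assumes that the correlation between two observations made on the same plant decreases geometrically with the number of observation times separating them, i.e., corr(j, k) = ρ^|j − k|. The sketch below builds this correlation matrix for illustration; the value of ρ is arbitrary, and the code is not a substitute for the SAS procedure.

```python
def ar1_correlation(n_times, rho):
    """Correlation matrix with entries rho ** |j - k| for n_times repeated observations."""
    return [[rho ** abs(j - k) for k in range(n_times)] for j in range(n_times)]


# Six observation times, as in the monoecious population; rho = 0.6 is arbitrary.
corr = ar1_correlation(6, 0.6)
for row in corr:
    print(" ".join(f"{value:.3f}" for value in row))
```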

The search for putative QTLs was performed for each phenotypic variable in a manner similar to the analysis described by Pastina et al. (2012). A putative QTL was considered to be found at position w if \(\alpha_{P}\), \(\alpha_{Q}\), or \(\delta_{PQ}\) was significantly different from 0 at P < 0.01. The putative QTLs corresponding to a local minimum p-value were retained for the multi-QTL analysis. For the variable sex only, the percentage of variation (r²) explained by each additive effect was obtained from the fit of the IM model. This was not possible for the other variables because a different method was used to fit model 1 (restricted maximum likelihood rather than least squares; SAS Institute Inc. 2012).
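The retention of local minimum p-values can be sketched as follows for one genetic effect along one co-segregation group. The p-values are hypothetical, the function name is introduced here for illustration, and the handling of ties (keeping the first point of a plateau) is an assumption, since the text does not specify it.

```python
def putative_qtl_positions(p_values, threshold=0.01):
    """Indices of grid points that are significant and local minima of the p-value profile."""
    hits = []
    for i, p in enumerate(p_values):
        if p >= threshold:
            continue
        left = p_values[i - 1] if i > 0 else float("inf")
        right = p_values[i + 1] if i + 1 < len(p_values) else float("inf")
        # Strict comparison on the left keeps only the first point of a plateau.
        if p < left and p <= right:
            hits.append(i)
    return hits


# Hypothetical p-value profile along a grid of seven evaluation points.
profile = [0.5, 0.02, 0.004, 0.008, 0.3, 0.0001, 0.2]
print(putative_qtl_positions(profile))
```

In this example, the grid point with p = 0.008 is significant but is not retained, because its neighbour with p = 0.004 provides stronger evidence for the same region.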

(iii) Identification of QTLs by multi-QTL analysis

For each phenotypic variable, the genetic effects of each putative QTL as detected by the IM model were tested together in an additive multiple regression using the following model:

$$\underline{y}_{i} = \mu + \sum_{w \in (W_{P}, W_{PQ})} x_{P_{iw}} \alpha_{P_{w}} + \sum_{w \in (W_{Q}, W_{PQ})} x_{Q_{iw}} \alpha_{Q_{w}} + \sum_{w \in W_{PQ}} x_{PQ_{iw}} \delta_{PQ_{w}} + \underline{e}_{i}$$
(2)

where \(W_{P}\), \(W_{Q}\), and \(W_{PQ}\) are the sets of positions along the genome accounting for a putative QTL with an \(\alpha_{P}\), \(\alpha_{Q}\), and \(\delta_{PQ}\) effect, respectively. The genetic predictors of both additive effects at the positions accounting for a putative QTL with a dominance effect (\(W_{PQ}\)) were integrated into model 2 for more powerful QTL detection (Pastina et al. 2012). The multi-QTL analysis was performed using the GLIMMIX and MIXED procedures in SAS, as for the QTL analysis using the IM model (model 1).

A multi-QTL model was constructed for each phenotypic variable through a procedure of backward selection of genetic effects starting from model 2. At each step, model 2 was tested, and the effect associated with the highest non-significant (P > 0.05) p-value was removed until no effect was associated with a non-significant p-value. The resulting model was referred to as the multi-QTL model. The genomic positions selected in the multi-QTL model were considered to be QTLs, and the effect and standard error of each QTL were estimated from the multi-QTL model (Malosetti et al. 2006).
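The backward selection described above can be sketched generically as follows. The model fit itself is abstracted into a user-supplied function (`fit_p_values`, a hypothetical name) that returns one p-value per effect in the current model; in the study, this role was played by the SAS procedures. The effect labels and p-values in the example are invented.

```python
def backward_selection(effects, fit_p_values, alpha=0.05):
    """Drop the least significant effect until all remaining effects have p <= alpha.

    `fit_p_values(current)` must refit the model with the effects in `current`
    and return a dict mapping each of them to its p-value.
    """
    current = list(effects)
    while current:
        p_values = fit_p_values(current)
        worst = max(current, key=lambda effect: p_values[effect])
        if p_values[worst] <= alpha:
            break  # every remaining effect is significant
        current.remove(worst)
    return current


# Toy stand-in for a model fit: fixed, invented p-values per effect.
toy_p = {"alphaP@CG1:12cM": 0.001, "alphaQ@CG1:12cM": 0.30, "deltaPQ@CG2:40cM": 0.07}
selected = backward_selection(toy_p, lambda current: {e: toy_p[e] for e in current})
print(selected)
```

In a real analysis the p-values change after each removal because the model is refitted, which is why the fit is requested again at every step rather than once at the start.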

Cross-populations analysis

A cross-population (CP) analysis was conducted to assess the consistency of the effect of putative QTLs across populations. This analysis was restricted to the putative QTLs mapped at marker positions, since these were the only positions that could be assumed to be homologous between different linkage maps. For this purpose, all of the markers accounting for a putative QTL as detected by the IM model (model 1) in at least one of the three populations were retained.

The CP analysis was first performed across both dioecious populations including a ‘population’ cofactor, and then across the dioecious and monoecious populations. The consistency of the segregation pattern of the retained markers with sex across both dioecious populations was tested as follows:

$$\underline{y}_{ik} = \mu + \gamma_{k} + (\beta + (\beta\gamma)_{k}) \times M + \underline{e}_{ik}$$
(3)

where \(\underline{y}_{ik}\) is the sex of individual i from dioecious population k (male/female), µ is the mean value over the two populations, \(\gamma_{k}\) is the effect of population k, β is the mean effect of the marker and \((\beta\gamma)_{k}\) is its effect in population k, M is −1 or 1 according to the presence/absence of the marker in the ith individual of population k, and \(\underline{e}_{ik}\) is the residual error associated with individual i in population k. Then, the effect of the retained markers was tested on the phenotypic variables recorded in the monoecious population using model 3 without the ‘population’ cofactor. The CP analysis was performed using the GLIMMIX and MIXED procedures in SAS, as for the QTL analysis using the IM model (model 1).

Results

AFLP markers

The eight primer combinations (Table 1) allowed the detection of 410, 415 and 357 markers in the C1, C2 and UF populations, respectively, generating a total of 480 distinct AFLP markers (Table 4). Out of these, 287 (60 %) were scored in all three populations. The percentage of markers shared by C1 and C2 (81 %) was higher than that shared by C1 and UF (67 %) or C2 and UF (68 %).

Table 4 The number of markers scored, selected for mapping, and mapped in each population

Among the 480 detected markers, 274 (67 % of 410 markers), 182 (44 % of 415 markers) and 184 (52 % of 357 markers) segregated in C1, C2 and UF, respectively. They accounted for a total of 385 distinct segregating markers, out of which 126 segregated in both C1 and C2, 111 in both C1 and UF, 72 in both C2 and UF, and 54 in all three populations.

Among the 385 segregating markers, 193 (70 % of 274 markers), 147 (81 % of 182 markers) and 115 (63 % of 184 markers) segregated independently (1:1 or 3:1) in C1, C2 and UF, respectively. These accounted for a total of 319 distinct markers available for mapping, out of which 75 were found in both C1 and C2, 46 in both C1 and UF, 34 in both C2 and UF, and 19 in all three populations. The independently segregating markers consisted of 71, 53 and 64 markers segregating 1:1 (heterozygous in only one of both parents), and 122 (63 % of 193 markers), 94 (64 % of 147 markers), and 51 (44 % of 115 markers) markers segregating 3:1 (heterozygous in both parents) in C1, C2, and UF, respectively.

Linkage maps

The linkage analysis resulted in the mapping of 93, 92 and 86 AFLP markers assigned to 11, 16 and 10 co-segregation groups in C1, C2 and UF, respectively (not shown). These markers accounted for a total of 225 distinct mapped markers, among which 22 were shared by C1 and C2, the two ‘Carmagnola’ maps, 12 by C1 and UF, 17 by C2 and UF, and only five by all three maps. Out of these markers, 56, 67, and 41 % were heterozygous in both parents in each map, respectively (Table 4).

Among the mapped markers, 23, 42, and 26 were linked, directly or through homologous markers, to sex in C1, C2, and UF, respectively. They were distributed along 3, 6, and 3 co-segregation groups putatively located on sex chromosomes in each map, respectively (Fig. 1). They accounted for a total of 71 distinct markers, among which 9 were shared by C1 and C2, 3 by C1 and UF, 9 by C2 and UF, and a single one by all three maps (6_255).
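The totals of distinct markers reported in this section are consistent with the per-population and shared counts, as can be checked by inclusion-exclusion. The sketch below assumes that each pairwise count includes the markers shared by all three populations; the function name is illustrative.

```python
def distinct_total(n1, n2, n3, n12, n13, n23, n123):
    """Size of the union of three sets from their sizes and intersection sizes."""
    return n1 + n2 + n3 - n12 - n13 - n23 + n123


# Arguments: C1, C2, UF, C1&C2, C1&UF, C2&UF, all three (counts taken from the text).
segregating = distinct_total(274, 182, 184, 126, 111, 72, 54)
available = distinct_total(193, 147, 115, 75, 46, 34, 19)
mapped = distinct_total(93, 92, 86, 22, 12, 17, 5)
sex_linked = distinct_total(23, 42, 26, 9, 3, 9, 1)
print(segregating, available, mapped, sex_linked)
```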

Fig. 1 Integration of three linkage maps (C1, C2, and UF) showing markers putatively located on sex chromosomes in hemp and location of QTLs for sex expression. C1 and C2 maps were derived from a dioecious population (‘Carmagnola’ × ‘Carmagnola’) and included a ‘sex’ phenotypic marker in C1-1 and C2-1, respectively. UF map was derived from a monoecious population (‘Uso 31’ × ‘Fedora 17’). Name of segregating population and number of co-segregation group are shown above co-segregation groups. Markers are labelled according to number of primer combination (Table 1) and molecular weight of corresponding DNA fragment. Cross type is indicated by different fonts: normal font for markers heterozygous in female parent only, italic for markers heterozygous in male parent only, and bold for markers heterozygous in both parents. Font colour indicates class of sex-linked markers (Table 2). Kosambi map function was used. LOD and maximum recombination fraction were 3.5 and 0.5, respectively. Coloured lines indicate regions including a putative QTL as revealed by interval mapping (model 1) with −log10(p-value) ≥ 2 for six variables characterizing sex expression (Table 3). Arrows indicate most likely location of QTL as identified by multi-QTL analysis (model 2). Asterisk indicates modification of scale of QTL effects due to very high likelihood ratios in C1-1 and C2-1: arrows in legend refer to −log10(p-value) of 20, 15, and 10 instead of 4, 3, and 2, respectively. Double asterisk indicates markers retained from the cross-population analysis (Table 8)

The C1, C2, and UF maps of the sex chromosomes covered total distances of 313.5, 399.8, and 489.2 cM, respectively (Fig. 1). The size of the co-segregation groups (CGs) ranged between 6.1 and 252.9 cM in C1, 5.0 and 166.3 cM in C2, and 17.1 and 308.1 cM in UF. Markers were separated by an average distance of 13.7, 9.5, and 18.8 cM in the C1, C2, and UF maps, respectively, and by a maximum distance of 43.5 (C1-2), 41.3 (C2-5), and 65.0 cM (UF-1) in each map, respectively. The largest gap (65.0 cM in UF-1) resulted from ignoring the recombination fraction between markers heterozygous in the male parent and those heterozygous in the female parent, as demonstrated by Maliepaard et al. (1997).

Co-segregation groups including the ‘sex’ phenotypic marker in the dioecious maps

The ‘sex’ phenotypic marker was mapped in C1-1 and C2-1 (Table 5; Fig. 1). These CGs, referred to as ‘sex CGs’, included eight and 19 AFLP markers, respectively—21 different markers in total. Among them, six and 12 markers—14 different markers—were assigned to classes A, D, or E (Table 2) in C1 and C2, respectively.

Table 5 Structure of co-segregation groups (CGs) including the ‘sex’ phenotypic marker in the dioecious maps C1-1 and C2-1. Region in CG1, marker name, position along CG, recombination rate with sex, marker class (Table 2), number of male and female progenies according to presence (M)/absence (m) of marker, and parental genotypes

Four successive regions were determined in the sex CGs based on the recombination rate of the markers with sex (Table 5): (1) a region including class A markers in addition to markers that were not assigned to a given class of sex-linked markers; (2) a central region surrounding the sex locus and including all of the class E and some class A and D markers; (3) a second region including class A, class D, and non-classed markers; and (4) a terminal region that recombined completely with sex. This latter region was found in the C2 map only.

The markers for regions 1 and 3 were mapped on the X or Y parental chromosomes (Table 5). The recombination rates between the markers mapped in these regions and sex ranged from 0.11 to 0.3. In C1, regions 1 and 3 included one class A marker (5_254) and one class D marker (8_164), respectively. The latter marked a fragment common to an X chromosome of the female parent and the Y chromosome (Table 2). In C2, seven markers were mapped in regions 1 and 3. Two of these markers belonged to class A (6_63 and 1_318). The class A markers mapped in regions 1 and 3 in C1 or C2 were absent from some female progenies, indicating recombination with sex in the male parent. In addition to class A markers, regions 1 and 3 included six markers with segregation ratios similar to those of class A markers, but significantly different from the segregation pattern expected for class A (2_299 in C1; 2_79, 1_204, 1_194, 1_274 and 1_364 in C2). The segregation pattern of these markers could be due to distorted segregation, resulting from the preferential inheritance of one of the homologous chromosomes. Among these markers, 2_299 was detected as a class A marker in C2, supporting the occurrence of distorted segregation in C1 for this marker.

Region 2 included the ‘sex’ phenotypic marker in addition to markers derived from the parental X or Y chromosomes. The markers mapped in this region had recombination rates with sex ranging from 0 to 0.07. In C2, all class E markers except one completely segregated with sex (Table 5). In contrast, three to five recombinants between class E and sex were found in C1, and one class E marker (5_85) was found in a female plant in C1. Considering both dioecious maps together, all the markers assigned to class E in the present study showed at least one recombinant with sex. In C2, the class A marker 6_255 and the class D marker 2_289 were mapped at the sex locus. The marker 6_255 accounted for a fragment mapped on X parental chromosomes, while the marker 2_289 indicated the presence of a fragment common to an X of the female parent and the Y chromosome of the male parent.

Region 4 was determined by two markers mapped on the X parental chromosomes at the extremity of the sex CG in the C2 map (1_284 and 6_336; Table 5). Both markers completely recombined with sex, with recombination rates of 0.41 and 0.50 with sex, respectively, thus indicating the presence of a pseudoautosomal region. The marker 1_284 was mapped in the sex CG by linkage in the coupling phase with the class A marker 1_318 in the male parent. Thus, it was mapped onto the X chromosome of the male parent. The second marker (6_336) was linked in the coupling phase with marker 1_284 in the male parent and thus, also mapped onto its X chromosome.

Identification of CGs putatively located on sex chromosomes

Different CGs putatively located on the sex chromosomes were identified in each map through the presence of markers mapped in the sex CG of the dioecious maps. These CGs were referred to as putative sex CGs. Five markers mapped in the sex CG of the C2 map were also found in the UF population (Table 5). Two of them (6_255 and 2_289) were mapped at the sex locus, and one of them (6_336) was mapped in the pseudoautosomal region of the sex CG. These five markers were mapped in three distinct putative sex CGs in UF (UF-1, 2 and 3; Fig. 1). The mapping of 6_255 and 6_336 in the same CG (UF-1) and the co-segregation of 2_289 and 1_194 (UF-2) in the UF map supported the presence of 6_336 and 1_194 on the sex chromosomes. These two markers showed relatively high recombination rates with sex in C2 (Table 5).

Two putative sex CGs were identified in the C1 map (Table 6). C1-2 included a marker that was linked to sex and assigned to class A in C2 (6_255). Among the 13 markers linked in C1-2, 12 consisted of an allele derived from the female parent only and 10 were putative class B markers, being heterozygous in the female parent only and segregating 1:1 in both male and female progenies. One of them (6_72) was also a putative class B marker in C2. The second CG putatively located on the sex chromosomes in C1 (C1-3) was identified through the presence of one marker common to a putative sex CG in UF (5_71, present in both C1-3 and UF-1). This marker was a putative class B marker in C1 and was linked to a class D marker (7_130) in C2-3.

Table 6 Structure of co-segregation groups (CGs) putatively located on sex chromosomes in the dioecious maps C1 and C2. Marker name, position along CG, marker class (Table 2), number of male and female progenies according to presence (M)/absence (m) of marker, and parental genotypes

Five putative sex CGs were identified in the C2 map (Table 6). In C2-4, one marker heterozygous in both parents (2_66) was completely linked to sex in the male parent, but its segregation in the female progenies did not correspond to the expected 1:1 ratio of class D markers (Table 2). C2-2, -4, and -5 included two, one, and four putative class B markers, respectively. C2-3 included a marker for which the cross type and segregation ratios in the female and male progenies corresponded to a class D marker at α = 0.01 (7_130).

QTL analysis

In total, 336, 440, and 515 points were evaluated in the C1, C2, and UF maps, respectively. The search for QTLs by interval mapping and multi-QTL analysis led to the identification of five distinct QTLs for sex expression in each map. The total number of QTLs per population was the sum of the individual QTLs detected for different phenotypic variables (Table 7; Fig. 1).

Table 7 QTL analysis for sex-expression related variables in dioecious (C1 and C2) and monoecious (UF) hemp populations
(i) Dioecious populations (C1 and C2)

Because of collinearity, the multi-QTL analysis for sex was performed after discarding the sex locus from the multi-QTL model (model 2). In C1, two QTLs were found for sex, both mapped in the sex CG (C1-1). One of them was mapped close to a class E, male-associated marker (5_323; Tables 5, 7) and explained a large proportion of the sex variation (85.2 %). The second QTL for sex was mapped at the locus of a class A marker (2_299), which is found on X chromosomes in both the female and male parents, and had a dominance effect. In C2, three QTLs with both additive and dominance effects were detected for sex. All of them were mapped in the sex CG (C2-1) close to markers mapped on X parental chromosomes. One of them was common to C1 (2_299). Large proportions of phenotypic variation (> 73 %) were explained by the two QTLs closest to the sex locus (2_299 and 1_220).
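The proportions of explained variation quoted above come from the interval-mapping and multi-QTL models. As a rough intuition only, the sketch below computes the share of phenotypic variance associated with a single presence/absence marker as a squared Pearson correlation. This single-marker simplification is swapped in for illustration and is not the method used in the study; the function name and the data are hypothetical.

```python
def marker_r_squared(marker, phenotype):
    """Squared Pearson correlation between a 0/1 marker score and a phenotype."""
    n = len(marker)
    if n != len(phenotype) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_m = sum(marker) / n
    mean_p = sum(phenotype) / n
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(marker, phenotype))
    s_mm = sum((m - mean_m) ** 2 for m in marker)
    s_pp = sum((p - mean_p) ** 2 for p in phenotype)
    if s_mm == 0 or s_pp == 0:
        raise ValueError("marker and phenotype must both vary")
    return s_mp ** 2 / (s_mm * s_pp)


# Hypothetical data: marker presence (1) or absence (0), sex coded 1 = male, 0 = female.
marker = [1, 1, 1, 1, 0, 0, 0, 0]
sex = [1, 1, 1, 0, 0, 0, 0, 0]
r2 = marker_r_squared(marker, sex)
```

A marker that co-segregates perfectly with sex would give a value of 1, which is why QTLs mapped close to the sex locus are expected to explain most of the variation in sex.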

In C1, two QTLs were identified for the percentage of nodes bearing female flowers in male plants (%F-), and three QTLs for the percentage of nodes bearing male flowers in female plants (%M-). One of them (2_299) was a QTL for both %F- and %M- in addition to sex. This QTL had a positive additive effect on %F- and a negative effect on %M-, both due to the male parent (αQ). Therefore, this QTL could include feminising genetic factors located on the male-parent X chromosome (Table 7). In C2, two QTLs were identified for %F- in a putative sex CG (C2-6), and none for %M-. Except for 2_299 in C1, which had both additive and dominance effects, all of the QTLs detected for %F- and %M- in C1 and C2 had additive effects only. These QTLs were located in putative sex CGs.

(ii) Monoecious population (UF)

Five QTLs were found for variables related to sex expression (Table 7) in UF. All except one were associated with variation in structure variables, i.e., the parameters of the logistic function describing the monoecy degree along the stem (Faux and Bertin 2014). Two QTLs had only additive effects on sex expression: 1_106 (UF-3) on the percentage of nodes with intermediate monoecy degree (%MDinter), and 4_241 (UF-1) on the parameter accounting for the node at which there is maximum variation in monoecy degree along the stem (log_NDm). One of them (4_241) was closely linked to marker 6_255, which was itself mapped at the sex locus in C2 (Fig. 1). The three other QTLs had both additive and dominance effects (1_149 in UF-3), or only dominance effects on sex expression (4_121 and 6_215 in UF-1). One of these QTLs (6_215) was also identified close to marker 6_255.
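To make the structure variables more concrete, the sketch below evaluates a generic logistic curve of monoecy degree against node position. The exact parameterisation of Faux and Bertin (2014) is not reproduced in this section, so the functional form, the asymptotes (`lower`, `upper`), and all parameter values are assumptions chosen for illustration; `nd_m` stands for the node of maximum variation (NDm) and `k` for the curvature parameter.

```python
import math


def monoecy_degree(node, k, nd_m, lower=1.0, upper=5.0):
    """Logistic curve of monoecy degree against node position.

    Assumed form (illustrative only):
        MD(node) = lower + (upper - lower) / (1 + exp(-k * (node - nd_m)))
    """
    return lower + (upper - lower) / (1.0 + math.exp(-k * (node - nd_m)))


# Hypothetical parameter values for one plant, nodes 1 to 12.
profile = [monoecy_degree(n, k=0.8, nd_m=6.0) for n in range(1, 13)]
```

In this form the curve changes fastest at `node == nd_m`, and a larger `k` gives a sharper transition along the stem. The log_ prefix of the variables reported in Table 7 suggests that the QTL analysis was run on log-transformed parameters.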

Cross-population analysis

The cross-population analysis identified three markers that accounted for a putative QTL in UF and segregated with sex in both C1 and C2 (4_121, 2_66 and 6_255, showing a significant ‘M’ effect in Table 8), and one marker that accounted for a putative QTL in C2 and was associated with variation of sex expression in UF (2_271). All four markers were detected in the three populations; however, all of them were mapped in UF only (UF-1; Fig. 1). The identification in UF-1 of markers that segregated with sex in both C1 and C2 supported the presence of this CG on the sex chromosomes of the monoecious population. Three of these markers were mapped at positions that were, by homology, close to the sex locus (2_66, 6_255, and 2_271; Fig. 1).

Table 8 Cross-population analysis of sex expression: markers that segregated with sex in the dioecious populations (C1 and C2) and accounted for a putative QTL in the monoecious population (UF), or accounted for a putative QTL in a dioecious population (C1 or C2) and were associated with variation of sex expression in UF

The first marker (4_121) was more frequent in male plants across both dioecious populations and accounted for a QTL for the parameter determining the curvature of the logistic function (log_k) in UF. The second marker (2_66) was present in all male plants and in 77 % of female plants in both C1 and C2, and it was associated with variation in log_NDm in UF. The third marker (6_255) was mapped on the X chromosomes at the sex locus in C2. It was a class A marker in C2 and a putative class B marker in C1 (Tables 5, 6). In UF, this marker was associated with feminised phenotypes (positive effect on the mean monoecy degree, mMD, and negative effect on the percentage of highly masculinised nodes along the stem, %MD1), an increased log_k, and a decreased log_NDm. The fourth marker (2_271) was a putative class B marker in C2 (Table 6). This marker was associated with variation in four sex-expression variables in UF (Table 8).

Discussion

In this study, we focused on the role of the sex chromosomes in the genetic determinism of sex expression in both dioecious and monoecious hemp. First, we constructed three genetic maps of the sex chromosomes in hemp: two derived from a cross between female and male plants of the dioecious cultivar ‘Carmagnola’, and one derived from a cross between the monoecious cultivars ‘Fedora 17’ and ‘Uso 31’. The genetic determinism of sex expression was then investigated using a QTL analysis. Because of the high variability of the sexual phenotype in hemp, we searched for QTLs based on distinct variables that dissect the variability of sex expression (Table 3; Faux and Bertin 2014). To our knowledge, this is the first study to integrate linkage maps from both dioecious and monoecious hemp, and the first report of a QTL analysis in hemp.

AFLP markers

The percentage of markers scored simultaneously in both dioecious populations was relatively high (81 %), as expected from their common ‘Carmagnola’ origin, although the percentage of markers scored simultaneously in one dioecious population and in the monoecious population was also large (67 % for C1 and UF, and 68 % for C2 and UF). This was consistent with the structure of the genetic diversity reported in hemp by Forapani et al. (2001), who concluded that hemp has a widely shared gene pool with limited genetic separation among groups.

The percentage of segregating markers was lowest in C2 (44 %), followed by UF (52 %) and C1 (67 %). These values were relatively high compared to the percentage of segregating RAPD markers found in a hemp progeny obtained from a cross between a ‘Carmagnola’ female plant and a monoecious plant (39.1 %) by Carboni et al. (2000). The difference in percentage of segregating markers found between UF and C1 can be explained by the narrower genetic basis of monoecious hemp, as a result of its ability to self-pollinate (Bocsa and Karus 1998) and the selection pressure needed to maintain the monoecious trait (Forapani et al. 2001; Mandolino and Carboni 2004). However, the percentage of segregating markers found in the dioecious population C2 was particularly low. It indicated the presence of a relatively high proportion of markers in the dominant homozygous state (‘aa’) in one or both parents, resulting in a relatively high proportion of non-segregating loci. This could be attributed to the relatively large genetic diversity of hemp, which has been characterized by high proportions of polymorphic markers, including within the cultivar ‘Carmagnola’ (Forapani et al. 2001).

The number of markers that segregated independently in all three populations (19 among 319 independently segregating markers) and the number of markers shared by the three maps (5 among 225 mapped markers) were low in contrast with the relatively large number of markers simultaneously scored in all three populations (287 among 480 scored markers). Similarly, Waugh et al. (1997) identified only eight AFLP markers that segregated in three populations of barley among totals of 234, 194 and 376 mapped AFLP markers, while Hoarau et al. (2001) used a relatively low number of anchoring markers (45 AFLPs) to join two maps that included 887 AFLP and 408 RFLP markers in sugarcane.

In particular, relatively few markers segregated independently in both ‘Carmagnola’ populations (75 markers) and were finally shared by both ‘Carmagnola’ maps (22 markers). This can be explained by two effects. First, 47 % of the markers that segregated independently in C1 were fixed in C2. This indicated the presence of markers in the dominant homozygous state in a given parent of the C2 population but not in the parents of the C1 population, resulting in markers that are fixed in C2 but segregating in C1, and consequently in a low number of markers segregating in both populations. Second, the percentage of markers heterozygous in both parents among the independently segregating markers was relatively high in both C1 and C2 (63 and 64 %; Table 4). Such bi-allelic markers heterozygous in both parents and segregating 3:1 in the offspring are hereafter referred to as C.8 markers according to the notation of Wu et al. (2002). According to Maliepaard et al. (1997), the presence of a high proportion of C.8 markers decreases the probability of detecting linkages between them, especially when the population size is small. The relatively high percentages of C.8 markers in C1 and C2 and the relatively small size of these populations (77 and 76 progenies) might have prevented the detection of linkages among some markers, thereby resulting in a relatively low number of markers that are linked in both C1 and C2. Nevertheless, the sizes of the ‘Carmagnola’ populations used in this study were equivalent to those of populations used in mapping derived from crosses between male and female plants of dioecious hemp accessions, i.e., 66 (Mandolino and Ranalli 2002) and 80 individuals (Peil et al. 2003).
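The effect of population size on the precision of linkage estimates can be sketched with the binomial standard error of a recombination fraction. This formula applies to fully informative configurations, where each progeny can be scored as recombinant or non-recombinant; for pairs of dominant markers heterozygous in both parents the information per progeny is lower, as discussed by Maliepaard et al. (1997), so the values below should be read as an optimistic bound. The population sizes 76 and 77 come from the text; the recombination fraction of 0.20 and the size of 300 are hypothetical.

```python
import math


def recombination_se(r, n):
    """Binomial standard error of a recombination fraction estimated from n progenies."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("r must lie between 0 and 0.5")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(r * (1.0 - r) / n)


# Compare the study's population sizes with a larger, hypothetical population.
for n in (76, 77, 300):
    half_width = 1.96 * recombination_se(0.20, n)
    print(f"n = {n:3d}: r = 0.20, approx. 95 % interval half-width = {half_width:.3f}")
```

The interval narrows only with the square root of the population size, which is one way to see why loose linkages are hard to detect in populations of fewer than 100 progenies.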

Linkage maps

The three maps of the sex chromosomes of hemp were unsaturated, as reflected by the mapping of several co-segregation groups that included few markers (Fig. 1). As a result, except for the sex CGs (C1-1 and C2-1), the location of the CGs on the sex chromosomes could not be ascertained. Nevertheless, the mapping of putative class B markers or a class D marker in each of the putative sex CGs from the C1 and C2 maps, except for C2-6, supported the presence of these CGs on the sex chromosomes.

The integration of the three maps revealed different inconsistencies with respect to marker linkages. For example, there were linkages between markers mapped together in the sex CG in C2 but in different putative sex CGs in UF. Also, compared with the C1 and UF maps, the C2 map had more CGs (11, 16, and 10 CGs, among which 3, 6, and 3 were putatively located on the sex chromosomes in C1, C2, and UF, respectively). These inconsistencies were attributed to the relatively low accuracy of the recombination fraction estimates and the low number of markers shared by the present maps, including between the two ‘Carmagnola’ maps. According to Maliepaard et al. (1997), the accuracy of the recombination fraction estimates in the C2 map was negatively affected by the relatively high proportion of C.8 markers (67 % against 56 % in C1 and 41 % in UF), and, in both ‘Carmagnola’ maps, by their relatively small population sizes. Indeed, the high uncertainty for the linkages among C.8 markers can prevent the detection of linkages between some markers, which would in turn result in a higher number of small CGs in C2, as well as make it difficult to establish the correct order of markers. As discussed above, the numbers of markers shared among the present maps were relatively low, and this makes the integration of the maps more difficult.

The consecutive mapping of markers derived from the same primer combination in some CGs (C1-2, C2-1, and C2-6) could be due to the presence of repetitive DNA sequences. In hemp, Peil et al. (2003) observed clusters of AFLP markers along linkage groups. Similarly, Rogers et al. (2007) reported a non-random distribution of AFLP loci amplified using different selective primer combinations across linkage groups in whitefish (Coregonus clupeaformis). They raised the possibility that repetitive DNA in the genome, which is partially due to transposable elements, could result in the occurrence of repetitive AFLP sequences. In a study on Asparagus officinalis, Reamon-Büttner et al. (1999) observed clusters of hybridisation signals on chromosomes, suggesting that the AFLP fragments were parts of repetitive sequence families. According to Sakamoto et al. (2005), multiple sequences encoding retrotransposable elements are ubiquitous in the hemp genome.

Identification of sex-linked markers and CGs putatively located on sex chromosomes of the monoecious population

Four male-specific markers detected in the present study had sizes similar to male-specific AFLP markers reported in hemp: the markers 4_251, 4_276, and 5_323 correspond to ACC*AAG250, ACC*AAG275, and ACC*AGA323, respectively, in the study of Flachowsky et al. (2001).

The detection of sex-linked markers allowed (i) the characterisation of the structure of the sex CG in the dioecious maps and (ii) the identification of markers and CGs putatively located on the sex chromosomes in the monoecious map. Like Peil et al. (2003), we identified a region including markers of classes A, D, and E that were closely linked to sex (region 2), as well as a region including markers that completely recombined with sex at an extremity of the sex CG (region 4 in the C2 map). These observations confirmed the presence of common fragments between X and Y chromosomes (represented by class D) and the existence of a pseudoautosomal region in the sex chromosomes of hemp, as reported by Peil et al. (2003). However, our results differed from those of Peil et al. (2003) in that we observed higher recombination rates between the markers mapped in each of the four regions and sex. First, one to five recombinants were observed between each of our class E markers and sex in C1 or C2 (Table 5). In contrast, Peil et al. (2003) detected only one recombinant with sex and therefore assumed that their class E markers were located in a non-pairing portion of the Y chromosome. The recombinations observed between our class E markers and sex suggested that they were located in a region where pairing between X and Y chromosomes can occur. Second, the recombination rates found here between the pseudoautosomal markers and sex (≥0.41) were relatively high compared with the values of 0.25 and 0.27 in hemp reported by Peil et al. (2003) and Rode et al. (2005), respectively. Third, two additional regions were distinguished in the sex CGs (regions 1 and 3; Table 5). Without considering the distorted segregations of the markers, the recombination rates with sex observed in regions 1 and 3 ranged from 0.11 to 0.19. The recombination rates observed here in the sex CG (Table 5) suggest that the X and Y chromosomes of hemp recombine with each other between the sex locus and the pseudoautosomal region. The existence of recombination between the X and Y chromosomes in the male parent is supported by the pairing of both chromosomes at the short arm of the Y chromosome reported by Sakamoto et al. (2000).
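For readers who want to relate the recombination rates quoted above to map distances, the following sketch converts a recombination fraction into centimorgans with the Haldane and Kosambi mapping functions. Which mapping function was used to build the maps is not stated in this section, so both are shown as assumptions; the function names are hypothetical, and the count of 5 recombinants among 77 progenies is only an illustrative combination of figures mentioned in the text.

```python
import math


def recombination_fraction(n_recombinants, n_total):
    """Observed proportion of recombinant progenies."""
    if n_total <= 0 or not 0 <= n_recombinants <= n_total:
        raise ValueError("counts must satisfy 0 <= n_recombinants <= n_total, n_total > 0")
    return n_recombinants / n_total


def haldane_cm(r):
    """Haldane map distance in centimorgans (assumes no crossover interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in centimorgans (allows for some interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Illustrative count, followed by recombination rates quoted in the text.
rates = [recombination_fraction(5, 77), 0.11, 0.19, 0.25, 0.27, 0.41]
for r in rates:
    print(f"r = {r:.3f}: Haldane {haldane_cm(r):6.1f} cM, Kosambi {kosambi_cm(r):6.1f} cM")
```

Both functions grow without bound as r approaches 0.5, so a rate of 0.41 or more corresponds to a long genetic distance, consistent with markers that recombine almost freely with sex.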

As well as identifying sex-linked markers in dioecious hemp, we identified sex-linked markers in monoecious hemp. Five markers mapped in the sex CG of C2 were detected in UF (Table 5). Four of these were mapped on X chromosomes, including one in the pseudoautosomal region, and one was mapped on both X and Y chromosomes (class D). Since monoecious hemp has XX chromosomes (Menzel 1964; Faux et al. 2014), the mapping of pseudoautosomal and class D markers in monoecious hemp suggests the presence of homologous fragments between the Y chromosome of dioecious hemp and the X chromosomes of monoecious hemp.

Determinism of sex expression in dioecious hemp

Five QTLs associated with sex expression were identified in each of C1 and C2 (Table 7). One of them was detected in both maps (2_299). Large proportions of phenotypic variation were explained by the QTLs for sex (from 73.2 to 85.2 % for the closest QTL to sex), which was linked to the presence of heteromorphic sex chromosomes. In addition to QTLs for sex, QTLs were identified for the percentage of nodes bearing flowers of the sex opposite to the main one (%F- in male plants and %M- in female plants). The identification of QTLs for those variables supported previous assumptions that there is a genetic basis for the production of flowers of the sex opposite to the main one in dioecious hemp [Borthwick and Scully 1954; Grassi and de Meijer (pers. comm.) cited by Moliterni et al. 2004].

According to our results, the role of sex chromosomes in the determinism of sex expression in dioecious hemp could be as follows. The region surrounding the sex locus (region 2 in Table 5) would include genetic factors involved in the differentiation of the male plants, as suggested by the identification of a QTL with an additive effect due to the male parent in C1 (5_323; Table 7; Fig. 1). In contrast to region 2, regions 1 and 3 of the sex CGs would not be involved in the differentiation of male plants given the higher recombination rates with the sex locus and the presence of only one QTL for sex with a dominance effect and a low r² (1_318 in C2; Tables 5, 7). Instead, these regions would include genetic factors involved in the production of flowers of the sex opposite to the main one, as suggested by the QTL for both %F- and %M- identified in C1 (2_299; Table 7). No QTL was detected in the terminal pseudoautosomal region of the sex CG in C2. However, beyond the four regions defined in the sex CGs, our results suggest that additional regions of the sex chromosomes carry genetic factors involved in the determinism of sex expression. Indeed, QTLs for %F- and %M- were mapped in putative sex CGs (6_337, 6_243 and 6_253 in C1-2, and 1_62 and 1_72 in C2-6; Table 7; Fig. 1). These QTLs had additive effects due to the female parent and would therefore be located on X chromosomes only.

Determinism of sex expression in monoecious hemp

In this study, we identified five distinct QTLs associated with sex expression in UF (Table 7), and four markers that were associated with variation of sex expression in UF and segregated with sex, or accounted for a putative QTL for sex expression, in C1 or C2 (Table 8). Two of these QTLs (4_241 and 6_215) and three of these markers (2_66, 6_255, and 2_271) were mapped in a region homologous to the sex-locus region of the dioecious maps. These results suggested that genetic factors involved in the determinism of sex expression in monoecious hemp exist on the sex chromosomes, and thus on X chromosomes. In addition, these results suggested that genetic factors with quantitative effects on sex expression would be closely linked to the sex locus on the X chromosomes in monoecious hemp.

The number of QTLs detected for each phenotypic variable related to sex expression in UF ranged from zero to three (Tables 3, 7). In total, the synthesis variables allowed the detection of one QTL (1_160 for %MDinter), while the structure variables, which consisted of parameters of a logistic curve describing sex expression as a function of node position (Table 3), allowed the detection of four distinct QTLs. The higher number of QTLs found using the modelling approach highlights its utility for characterizing the variability of sex expression among monoecious hemp plants.

General discussion

The identification of QTLs for the quantitative variation of sex expression on the sex chromosomes of hemp and the recombination rates observed between the sex chromosomes contrasted with the situation in Silene latifolia, a well-studied dioecious species with heteromorphic sex chromosomes (Filatov et al. 2001). In S. latifolia, recombination is absent from most of the Y chromosome (Charlesworth 2002), while the genetic basis of sex determination is strong, and there is little evidence for lability or environmental effects (Ainsworth 2000). According to Charlesworth et al. (2005), the suppression of recombination between the sex chromosomes in dioecious species results from the presence of sex-determining genes and the evolution of Y-linked genes that benefit male but not female functions, both effects resulting in selection against recombinants. In hemp, it is possible that the recombination rates observed between the sex chromosomes allow the exchange of genetic factors that affect the production of flowers of a given sex. Compared with Silene, in hemp, the individuals with recombined sex chromosomes would have a relatively high adaptive value, as supported by the diversity of intersexual forms existing in the species.

According to Dellaporta and Calderon-Urrea (1993), the QTLs identified for sex expression in the present study could include genetic factors that regulate programs of sexuality through a signal transduction mechanism that modifies endogenous hormonal levels. Indeed, the formation of male and female generative organs in hemp may be associated with an increased demand for gibberellin and auxins, respectively (Galoch 1980). More recently, cDNA-AFLP fragments differentially expressed in female and male apices were identified. These fragments showed similarities to a Rac-GTP binding protein that plays a signalling role in auxin-regulated gene expression in Arabidopsis (Moliterni et al. 2004). These studies suggested that sex expression in hemp could be related to the presence of hormonal gradients along the stem. This could provide a physiological explanation for the model parameters used in this study to characterize the sex expression in monoecious hemp. Further studies combining physiological and QTL approaches are needed to test this assumption.

Conclusions

In this study, we identified QTLs associated with the quantitative variation in sex expression on the sex chromosomes of both dioecious and monoecious hemp, despite the high environmental sensitivity of this trait. For this purpose, we constructed three AFLP linkage maps of the sex chromosomes in hemp. Two maps were derived from dioecious populations and one was derived from a monoecious population. Although these maps were unsaturated, they allowed us to identify sex-linked markers and, by homology, co-segregation groups putatively located on the sex chromosomes of dioecious and monoecious hemp.

The main advances can be summarized as follows. First, the X and Y chromosomes of dioecious hemp would recombine with each other between the sex locus and the pseudoautosomal region. Second, the X chromosomes of monoecious hemp would include fragments homologous with both the X and Y chromosomes of dioecious hemp. Third, the sex chromosomes of both dioecious and monoecious hemp would include genetic factors associated with quantitative variations in sex expression. Fourth, the higher number of QTLs detected for the parameters of a logistic curve modelling sex expression as a function of the node position along the stem supported the relevance of this approach for characterizing the variability of sex expression among monoecious hemp plants. Finally, the results of this study suggested that it would be relevant to conduct further research on the genetic determinism of sex expression in hemp using a quantitative approach.