Introduction

Flowering dogwood (Cornus florida L.), native to eastern North America, is an ornamental tree that plays an important role in the U.S. nursery industry (Ament et al. 2000; Witte et al. 2000). Over 100 cultivars have been released in the U.S. in the last decade (Witte et al. 2000). ‘Appalachian Spring’ and ‘Cherokee Brave’ are popular cultivars exhibiting different phenotypes. ‘Appalachian Spring’ has large white bracts in the spring and deep green leaves that turn scarlet in the fall (Windham et al. 1998). In contrast, ‘Cherokee Brave’ has red or pink bracts in the spring and green leaves that turn fiery red in the fall.

Flowering dogwood is predominantly cross-pollinating and highly self-incompatible (Ament et al. 2000; Cappiello and Shadow 2005; Witte et al. 2000; Gunatilleke and Gunatilleke 1984; Orton 1985; Reed 2004; Sork et al. 2005). Most flowering dogwood cultivars are propagated from axillary buds grafted onto rootstock grown from wild seedlings, or more rarely, from rooted cuttings of the cultivar (Dirr 1998). With a long juvenility period, flowering dogwood trees can require five to seven years to bloom. In the past, studies of flowering dogwood were focused on breeding or selecting new varieties with disease or pest resistance (Windham et al. 2003). Molecular studies on flowering dogwood, which were initiated <10 years ago, have largely been concerned with cultivar identification and included work with simple sequence repeats (SSRs or microsatellites; Cabe and Liles 2002; Wang et al. 2007, 2008); amplified fragment length polymorphisms (AFLPs; Nealed and William 1999; Smith et al. 2007) and DNA amplification fingerprinting (DAF, Ament et al. 2000). Combinations of traditional breeding methods and molecular techniques will facilitate breeding progress for C. florida. However, molecular breeding must be established on some fundamental knowledge of plant genetics. Although molecular markers were applied in flowering dogwood (Smith et al. 2007; Wang et al. 2008), a framework of linkage relationships among these markers is necessary for the identification and localization of genes controlling important horticultural traits, which will subsequently permit the application of marker-assisted selection (Baldoni et al. 1999) in a dogwood breeding program.

Although woody perennial trees such as flowering dogwood offer several advantages for genetic studies (clonal propagation, small genome size and high genetic diversity), most also have disadvantages such as long generation time, self-incompatibility, and inbreeding depression (Weeden et al. 1994). Due to high heterozygosity in out-breeding species, an F1 population pseudo-testcross strategy has been widely used for linkage analysis (Grattapaglia and Sederoff 1994; Arcade et al. 2000; Porceddu et al. 2002; Hanley et al. 2002; Barcaccia et al. 2003; La Rosa et al. 2003; Doucleff et al. 2004; Dirlewanger et al. 2004; Graham et al. 2004; Beedanagari et al. 2005; Kenis and Keulemans 2005; Verde et al. 2005; Lanteri et al. 2006; Lowe and Walker 2006; Mandl et al. 2006). Other mapping populations such as F1 full-sib progeny (Maliepaard et al. 1997; Venkateswarlu et al. 2006) and BC1 mapping populations (Lalli et al. 2008) have also been used for linkage map construction. In a pseudo-testcross, a single full-sib population is generated by crossing two parents such that, at a given marker locus, one parent is heterozygous and the other is homozygous (Mandl et al. 2006). Although the pseudo-testcross strategy has been widely used in the mapping of many out-breeding species, it is difficult to determine whether one of the parents is homozygous at any specific locus or loci.

A “pseudo-F2” mapping strategy was described for the construction of a linkage map of a non-inbred Solanum tuberosum (potato) cultivar (Rouppe van der Voort et al. 1999). A selfed family of three F1 Manihot esculenta (cassava) plants derived from a single progeny of a full-sib population was considered a “pseudo F2” population and used for QTL analysis (Okogbenin et al. 2008). The generation of a classical F2 population for flowering dogwood is impractical because homozygous lines are not available and flowering dogwood is self-incompatible (Reed 2004). Furthermore, the long juvenility phase prior to bloom has prevented geneticists from creating an F2 mapping population. Weeden et al. (1994) suggested that seeds derived from a single tree represent a similar set of haploid genotypes, although each is complexed with another haploid genotype from the pollinator. The seeds from the single tree could be considered equivalent to an F1 and be used as a mapping population by means of the pseudo-testcross strategy. However, factors such as an unknown pollen donor or possible multiple pollinators affect the linkage analysis.

Although it is impractical to make an F2 from one F1 flowering dogwood tree, it is easy to make a “pseudo F2” generation using two F1 trees. If the two F1 trees are genetically identical at a given locus, their progeny can be treated as an F2 at that locus, here termed a “pseudo F2”. Hence, two F1 breeding lines, 97-6 and 97-7, from a cross between ‘Appalachian Spring’ and ‘Cherokee Brave’ were crossed to create a “pseudo F2” population.

We screened the DNA of the four parental trees (the two cultivars and the two F1 lines) with 825 microsatellite (SSR) markers developed previously (Wang et al. 2008) to find loci that were polymorphic between ‘Appalachian Spring’ and ‘Cherokee Brave’ but showed identical allele segregation in the two F1 parents, 97-6 and 97-7. The concept of the F2 generation in self-incompatible woody plants can be compared to that of the F2 or BC1 of an annual selfing crop (Tulsieram et al. 1992). Such a mapping population was used for genetic linkage map construction of flowering dogwood following the out-breeding population module of the JoinMap® 4.0 software program (Van Ooijen 2006).

The aim of this work was to build a microsatellite marker-based genetic linkage map of flowering dogwood. This initial linkage map will facilitate the location of genes of interest and subsequently permit marker-assisted selection (Strauss et al. 1992) in our breeding program for flowering dogwood. The release of the first linkage map in flowering dogwood will provide the basis for identification and cloning of important genes and quantitative trait loci (QTLs) that are beneficial to the ornamental industry (Dirlewanger et al. 1999).

Materials and methods

Plant material

Two flowering dogwood (C. florida L.) F1 trees, 97-6 and 97-7, derived originally at the University of Tennessee ten years ago from the intra-specific cross ‘Cherokee Brave’ × ‘Appalachian Spring’ (Fig. 1), were used for creating a mapping population. In early spring 2005 at the Agricultural Experimental Station of the University of Tennessee, Crossville, TN, two trees, 97-6 and 97-7, were placed into a fiberglass mesh screened cage (2.5 m × 2.5 m × 2.5 m) and reciprocal honeybee-mediated crosses were performed.

Fig. 1

The pedigree of the “pseudo F2” mapping population of flowering dogwood (Cornus florida L.)

In the fall of 2005, about 400 mature fruits from 97-6 and 97-7 crosses were collected, depulped, the “seeds” removed, and then dried for 24–48 h. After stratification at 4°C for about 4 months, the germinated “pseudo F2” seeds were planted in the greenhouse.

DNA isolation

Young, not fully expanded leaves were collected during April–May from seedlings and DNA extracted using DNeasy Plant Mini Kit (QIAGEN, Valencia, CA, USA) following the manufacturer’s instructions. Due to inbreeding depression, most of the F2 plants appeared dwarfed and unhealthy and grew very slowly compared to robust growth exhibited by F1 plants. As a result, some samples were collected more than once to obtain useable DNA. Genomic DNA concentration was quantified using NanoDrop® ND-1000 UV-Vis Spectrophotometer (NanoDrop ND-1000, Delaware, USA). One nanogram (ng) of DNA was used as template for PCR with four informative SSR primers (data not shown) to check for successful amplification. The PCR protocols described previously by Wang et al. (2008) were used throughout this study.

Mapping population size and determination of SSR markers

A set of 94 “pseudo F2” samples was randomly chosen from successfully amplified samples with four SSR primers. To find polymorphisms between the DNA of the parents, but identity between the DNA of the two F1 parents, 825 SSR primers previously developed (Wang et al. 2008) were screened with the DNA of the four parents (data not shown). PCR conditions and allele size determination followed the methods of previous work (Wang et al. 2008).
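The screening rule described above (polymorphic between the two cultivars, identical between the two F1 trees) can be sketched as a simple filter. This is only an illustration under stated assumptions: the marker names, allele sizes, and the function name `is_informative` are hypothetical and are not taken from the study's data.

```python
# Minimal sketch of the marker screening rule (hypothetical data and names).
# Each marker maps a tree name to the allele sizes (bp) scored in that tree.

def is_informative(genotypes):
    """Return True if a marker is polymorphic between the two cultivars
    and shows the same alleles in both F1 trees."""
    appalachian = frozenset(genotypes["AS"])
    cherokee = frozenset(genotypes["CB"])
    f1_first = frozenset(genotypes["97-6"])
    f1_second = frozenset(genotypes["97-7"])
    polymorphic = appalachian != cherokee
    identical_f1 = f1_first == f1_second
    return polymorphic and identical_f1

# Hypothetical allele calls for three made-up markers.
markers = {
    "marker_1": {"AS": (150, 154), "CB": (158, 162),
                 "97-6": (150, 158), "97-7": (150, 158)},
    "marker_2": {"AS": (200, 200), "CB": (200, 200),
                 "97-6": (200, 200), "97-7": (200, 200)},
    "marker_3": {"AS": (120, 124), "CB": (128, 132),
                 "97-6": (120, 128), "97-7": (124, 132)},
}

selected = [name for name, calls in markers.items() if is_informative(calls)]
print(selected)
```

Here only the first made-up marker passes: the second is monomorphic in the cultivars and the third differs between the two F1 trees.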

Mapping and linkage analysis

Markers were recorded as the following three types: (1) maternal markers, segregating only within female parent ‘Cherokee Brave’, and null in male parent ‘Appalachian Spring’ (expected segregation ratio 1:1). These markers were coded with the letter “B” followed by the marker name; (2) paternal markers, segregating only within male parent ‘Appalachian Spring’, and null in female parent ‘Cherokee Brave’ (expected segregation ratio 1:1). These markers were coded with the letter “A” followed by the marker name; and (3) intercross markers, segregating within both parents, with an expected segregation ratio of 3:1 (dominant), 1:2:1 (co-dominant) or 1:1:1:1 (three or four alleles present). The dominant markers with a 3:1 ratio were not used for mapping because of homozygosity in both parents. The markers with 1:2:1 and 1:1:1:1 ratios were designated with the letter “C” followed by the marker name.

Chi-square (χ²) tests were performed to test goodness-of-fit between observed and expected segregation ratios. Markers segregating in Mendelian ratios (including some slightly skewed markers, P ≥ 0.01) were used for map construction. Markers with heavily distorted segregation (P < 0.01) were omitted from the analysis. Data were analyzed using the cross-breeding population type option of the JoinMap® 4.0 mapping program (Van Ooijen 2006). A range of log-of-odds (LOD) thresholds was tested to find the linkage groups that match the number of chromosomes of the C. florida genome, which is eleven (Radford et al. 1979). Finally, linkage groups were determined with a LOD threshold of 6.0 and recombination frequency ≤0.40. The relative marker order within each linkage group was determined based on the following parameters: Rec = 0.40, LOD = 1.0, goodness-of-fit jump threshold = 5.0. Map distances were computed using Kosambi’s regression mapping function (Kosambi 1944). Linkage maps were drawn using MapChart 2.1 software (Voorrips 2002).
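The goodness-of-fit test for the three expected ratios (1:1, 1:2:1 and 1:1:1:1) can be sketched as follows. This is an illustration only, not the procedure implemented in JoinMap: the genotype counts are hypothetical, the function names are ours, and the thresholds are read as the conventional 0.05 and 0.01 significance levels.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution.
    Closed forms are used, so only df = 1, 2 or 3 are supported."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    raise ValueError("only df = 1, 2 or 3 are supported in this sketch")

def chi_square_gof(observed, ratio):
    """Return (chi-square statistic, P value) for observed genotype counts
    against an expected segregation ratio such as (1, 2, 1)."""
    total = sum(observed)
    expected = [total * part / sum(ratio) for part in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_sf(chi2, len(ratio) - 1)

def classify(p_value):
    """Assumed reading of the thresholds: keep P >= 0.01, drop P < 0.01."""
    if p_value >= 0.05:
        return "Mendelian"
    if p_value >= 0.01:
        return "slightly skewed (retained)"
    return "heavily distorted (excluded)"

# Hypothetical counts for 94 progeny.
for counts, ratio in [((25, 47, 22), (1, 2, 1)),
                      ((57, 37), (1, 1)),
                      ((60, 34), (1, 1))]:
    chi2, p = chi_square_gof(counts, ratio)
    print(counts, ratio, round(chi2, 3), classify(p))
```

The closed-form tail probabilities keep the sketch free of external dependencies; a statistics library would normally be used for arbitrary degrees of freedom.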

The linkage coverage (percentage) can be calculated from the total length of the linkage divided by the total genome length (G) of the genome in centimorgans (cM). Total genome length can be estimated by the methods of both Hulbert et al. (1988) and Chakravarti et al. (1991), and the formula for determining G is as follows: G = N (N − 1)X/K at an LOD threshold of T, where N is the total number of markers analyzed, X is the average distance between adjacent markers at a certain LOD value of T, and K is the observed number of pairs of markers with an LOD ≥ T.

Genome size estimates by flow cytometry

Flow cytometric measurements of nuclear DNA quantity were made from fresh C. florida leaf tissue samples using a two-step procedure originally described by Otto (1990) and modified by Doležel and Göhde (1995). Approximately 0.5 cm² of growing leaf tissue was chopped in 1.0 ml of cold Otto I Buffer (0.1 M citric acid monohydrate and 0.5% (v/v) Tween) for 30–60 s in a plastic petri dish on ice. The resulting extract was passed through a 30 μm nylon mesh filter into a 3.5 ml plastic tube, which was centrifuged at 150g for 5 min. The supernatant was removed, leaving approximately 100 μl of solution and the pellet, to which an additional 100 μl of cold Otto I Buffer was added. The pellet was gently resuspended and incubated on ice for 30 min. The fluorochrome 4′,6-diamidino-2-phenylindole (DAPI) was added to 1.0 ml of ice-cold Otto II Buffer (0.4 M Na2HPO4 · 12H2O) at a concentration of 4 μg/ml and mixed with the sample on ice. The relative fluorescence of the total DNA was measured for each nucleus using a Partec PA-1 ploidy analyzer (Partec GmbH). At least 5,000 nuclei were analyzed, revealing a single peak with a coefficient of variation (CV) less than 5.0%. Genome sizes were calculated as nuclear DNA content for unreduced tissue (2C) as: 2C DNA content of tissue = (mean fluorescence value of sample ÷ mean fluorescence value of standard) × 2C DNA content of standard. Lycopersicon esculentum (Stupicke polni) and Zea mays (CE-177 inbred line), which have established 2C genome sizes of 1.96 and 5.43 picograms, respectively (Lysák and Doležel 1998; Doležel et al. 1992), were used as internal standards.
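The 2C calculation stated above is a single ratio, sketched here for concreteness. The fluorescence values are hypothetical and chosen only to show the arithmetic; the standard 2C values are those given in the text.

```python
# 2C DNA content (pg) of the internal standards, as given in the text.
STANDARD_2C_PG = {
    "Lycopersicon esculentum": 1.96,
    "Zea mays": 5.43,
}

def two_c_content(sample_mean, standard_mean, standard_2c_pg):
    """2C DNA content of the sample (pg) from mean fluorescence values
    of the sample and of the internal standard."""
    return sample_mean / standard_mean * standard_2c_pg

# Hypothetical mean fluorescence readings (arbitrary units).
sample_2c = two_c_content(sample_mean=125.0,
                          standard_mean=100.0,
                          standard_2c_pg=STANDARD_2C_PG["Lycopersicon esculentum"])
print(round(sample_2c, 2))
```

With these made-up readings the sample peak sits 25% above the tomato standard, giving a 2C value of about 2.45 pg.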

Results

Primer screening

The DNA of ‘Appalachian Spring’, ‘Cherokee Brave’, and the two F1 trees (97-6 and 97-7) was amplified with 825 SSR primer pairs. Markers that were polymorphic between the grandparents and identical between the two F1 parents were selected for segregation analysis in the mapping population. As a result, 469 of the 825 SSR markers (56.8%) showed polymorphism between ‘Appalachian Spring’ and ‘Cherokee Brave’. Of the 469 SSR markers, 274 (58.4%) presented no difference between 97-6 and 97-7, which implies that 97-6 and 97-7 are genetically identical for these alleles. Of the 274 SSR markers, 219 presented sufficient separation among the 94 individuals in the “pseudo-F2” population that the alleles were easily scored and used for linkage analysis. Of the 219 markers recorded, 167 exhibited single locus segregation (either male parent or female parent) and 52 showed two loci segregation (co-dominant plus male or female parent). Three markers (1%) showed highly significant levels of distortion (0.001 < P < 0.01) and were excluded from the linkage analysis (Table 1). A total of 271 SSR loci were used for linkage analysis.

Table 1 SSR markers generated for the genetic mapping in a pseudo F2 population of Cornus florida L.

In all, 271 polymorphic loci were available for map construction, including 76 maternal, 48 paternal and 147 common loci. A total of 129 (47.6%) of the 271 markers showed significant segregation distortion (0.01 ≤ P ≤ 0.05) and were included for map construction. Markers from the male parent showed far less segregation distortion than those from the female parent (4 loci vs. 27 loci). Most co-dominant markers (66.7%) showed distortion at 0.01 ≤ P ≤ 0.05 (Table 1).

Linkage map construction

In all, 271 markers were available for map construction, including 76 maternal loci, 48 paternal loci and 147 common loci. All markers were assigned to 11 major linkage groups (LGs) with a minimum of 11 markers at a LOD value of 6.0 (Table 2). The number of linkage groups detected in this mapping population corresponds exactly with the haploid number of chromosomes in C. florida. Four maternal, seven paternal and five co-dominant loci remained unlinked (5.9% in total). A total of 255 (94.1%) markers were ordered onto 11 LGs. The length of the individual LGs varied from 69.4 to 136.5 cM with a mean of 106.8 cM. The number of loci on LGs ranged from 11 to 37 with an average of 23.2 loci. The distance for the 11 LGs spanned a total of 1,175.0 cM with an average internal distance of 4.6 cM. The majority of the marker intervals were less than 10 cM. However, some large gaps remain in each LG, and the largest gap of 26.6 cM was found on LG3 (Table 2; Fig. 2). Except for slight clustering of markers on LGs 5, 7 and 11, all markers were evenly distributed across the 11 LGs (Fig. 2).
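The summary figures in this paragraph follow from a few counts, and the arithmetic can be checked with the sketch below. All inputs are taken from the text; the one inference is that the average marker interval is the total map length divided by the number of mapped markers, which reproduces the reported 4.6 cM.

```python
# Counts and lengths reported in the text.
total_loci = 271
unlinked = 4 + 7 + 5          # maternal + paternal + co-dominant
linkage_groups = 11
total_length_cm = 1175.0

mapped = total_loci - unlinked
summary = {
    "mapped_loci": mapped,
    "mapped_pct": round(100 * mapped / total_loci, 1),
    "unlinked_pct": round(100 * unlinked / total_loci, 1),
    "mean_lg_length_cm": round(total_length_cm / linkage_groups, 1),
    "mean_loci_per_lg": round(mapped / linkage_groups, 1),
    # Inferred definition: total length divided by number of mapped markers.
    "mean_interval_cm": round(total_length_cm / mapped, 1),
}
print(summary)
```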

Table 2 Linkage groups (LGs) detected in construction of genetic linkage map for Cornus florida L. in a pseudo F2 population
Fig. 2

Genetic map of an intra-specific population between C. florida ‘Appalachian Spring’ (AS) and ‘Cherokee Brave’ (CB). The letter “A” indicates markers associated with the male parent AS and the letter “B” indicates markers associated with the female parent CB. The two-letter prefix “CF” in each marker name stands for C. florida; the markers have been registered at GenBank (Wang et al. 2008)

Of the 255 mapped markers, the 129 distorted markers were distributed across all LGs (Table 3). LGs 1, 5, 6 and 8 contained more distorted dominant maternal markers, and LGs 1, 5 and 11 had more distorted co-dominant markers. There were no distorted dominant markers on LGs 3 and 4. Four distorted paternal markers were located on LGs 1, 2 and 11 (Table 3).

Table 3 Mapped and distorted markers (in parentheses, 0.01 ≤ P ≤ 0.05) in a pseudo F2 population of Cornus florida L.

There were 268 marker pairs at the LOD value of 6.0. Given the formula of Hulbert et al. (1988) and the “method 3” of Chakravarti et al. (1991), the genome length of this species (C. florida L.) was estimated as 1255.9 cM. Therefore, the 11 LGs covered 93.6% of the flowering dogwood haploid nuclear genome. Given one picogram (pg) equals 978 Mb (Doležel et al. 2003), the genome size of flowering dogwood was estimated to be 1.28 pg per haploid cell (1C), which corresponds with the estimated genome size (2C) of 2.19–2.77 pg/nucleus that we found for flowering dogwood using flow cytometry.
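The genome length estimate and map coverage can be reproduced with a short calculation. This is a sketch under an assumption: the text gives K = 268 and G = 1255.9 cM, and plugging in N = 271 loci and X = 4.6 cM (values reported elsewhere in the Results) reproduces that estimate, but that these are the exact inputs the authors used is our inference.

```python
def genome_length_cm(n_markers, mean_distance_cm, n_linked_pairs):
    """Estimated genome length G = N(N - 1)X / K, in cM."""
    return n_markers * (n_markers - 1) * mean_distance_cm / n_linked_pairs

# N and X are assumed inputs; K and the map length are stated in the text.
estimated_g = genome_length_cm(n_markers=271,
                               mean_distance_cm=4.6,
                               n_linked_pairs=268)
map_length_cm = 1175.0
coverage_pct = 100 * map_length_cm / estimated_g

# Consistency check of the 1C estimate against the flow cytometry range.
one_c_pg = 1.28
two_c_pg = 2 * one_c_pg
within_flow_range = 2.19 <= two_c_pg <= 2.77

print(round(estimated_g, 1), round(coverage_pct, 1), two_c_pg, within_flow_range)
```

Doubling the 1C estimate gives 2.56 pg, which falls inside the 2.19–2.77 pg/nucleus range measured by flow cytometry.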

Discussion

Unlike inbred species, out-crossing species are highly heterozygous, and a pedigree derived from crosses between inbred lines is generally not easy to obtain (Venkateswarlu et al. 2006). When attempting to build a genetic linkage map, the choice of mapping population is the main issue for the crossing parents (Maliepaard et al. 1997). Most difficulties in linkage analysis for out-crossing species arise from the presence of more than one segregating allele per locus in each parent and from the unknown linkage phase of the loci. In a segregating population derived from two inbred lines, the segregating loci present only two alleles and the alleles have the same phase in the F1 (Blenda et al. 2007). In contrast, a population derived from two non-identical out-crossing parents may segregate for up to four alleles per locus, and the linkage phase of the marker pairs is usually unknown (Maliepaard et al. 1997; La Rosa et al. 2003; Lanteri et al. 2006).

In out-breeding trees, most researchers have used an F1 mapping population with a two-way pseudo-testcross strategy to construct an individual genetic linkage map for each parent (Grattapaglia and Sederoff 1994; Arcade et al. 2000; Porceddu et al. 2002; Hanley et al. 2002; Barcaccia et al. 2003; La Rosa et al. 2003; Doucleff et al. 2004; Dirlewanger et al. 2004; Graham et al. 2004; Beedanagari et al. 2005; Kenis and Keulemans 2005; Verde et al. 2005; Lanteri et al. 2006; Lowe and Walker 2006). F1 full-sib progeny (Maliepaard et al. 1997; Venkateswarlu et al. 2006) and BC1 population (Lalli et al. 2008) were also used as mapping population in the linkage analysis of out-crossing trees. Individual linkage maps produced by these methods may lead to inaccuracy in the estimation of recombination frequencies (Venkateswarlu et al. 2006).

Because flowering dogwood is a highly heterozygous and self-incompatible ornamental tree, we created an F2 population using two F1 breeding lines. This F2 population can be considered equivalent to the F2 of annual self-pollinated plants when the parents (two F1) are genetically identical at a given locus (Fregene et al. 1997; Okogbenin et al. 2008). All SSR markers selected for linkage analysis in this study presented identical segregation within the two F1 parents. Therefore, this linkage map of flowering dogwood was built with a “pseudo-F2” mapping population. To verify the marker order within each linkage group, we also used the pseudo-testcross strategy to construct an individual linkage map for each F1 parent. Eleven LGs were obtained for 97-6 (female) and twelve for 97-7 (male) at a LOD value of 6.0 (data not shown). The alignment showed that the eleven LGs of the female parent matched the LGs of the flowering dogwood map with minor distortion, whereas LG 5 split into two short LGs in the male parent’s map, giving 12 LGs. This result indicates that more male parental markers are needed to fill in the 11 LGs. The alignment of common (co-dominant) markers with the parents’ LGs showed the same order as the present linkage map, or minor reversals (data not shown). These results indicate that the genetic linkage map for flowering dogwood constructed with a “pseudo-F2” mapping population in this study is reliable. Compared to the pseudo-testcross strategy, the pseudo-F2 mapping strategy has two advantages: it does not require one parent to be heterozygous and the other homozygous (Mandl et al. 2006), and it does not require knowledge of the allele segregation phase (Maliepaard et al. 1997; Venkateswarlu et al. 2006). However, the “pseudo-F2” mapping strategy needs more molecular markers to generate a fine linkage map, and some useful loci are lost.

The “pseudo-F2” population of seedlings created with two F1 breeding lines showed inbreeding depression after emergence of the second pair of true leaves. About one-half of the markers (50.6%) showed segregation distortion at 0.01 ≤ P ≤ 0.05. Although distorted markers can bias linkage estimates (Tavoletti et al. 1996; La Rosa et al. 2003; Lanteri et al. 2006), these markers were still included in the mapping data. Three markers were significantly skewed at P < 0.01 and were excluded from the mapping data. The segregation distortion observed in this study was higher than in other studies (Conner et al. 1997; Casasoli et al. 2001; Scalfi et al. 2004; Pekkinen et al. 2005; Lanteri et al. 2006) because both 97-6 and 97-7 are highly heterozygous at most loci.

Many biological factors, such as unequal crossover during meiosis, chromosome loss, non-random union of gametes, zygotic embryo abortion (Faris et al. 1998), changes in genetic load (Bradshaw and Stettler 1994), or lethal alleles (Pilien et al. 1993; Pekkinen et al. 2005) may cause allele segregation distortion. Forest trees often have low self-fertility and are believed to carry many recessive lethal genes in the heterozygous condition (Sneizko and Zobel 1988). Due to selfing, some of the recessive lethal genes present in the heterozygous condition might have become homozygous and expressed in the progeny (Venkateswarlu et al. 2006). However, excluding markers that are significantly distorted at 0.01 ≤ P ≤ 0.05 from the linkage analysis may lead to the loss of a large part of a linkage group (Cervera et al. 2001; Doucleff et al. 2004). Some reports demonstrated that including markers distorted at 0.01 ≤ P ≤ 0.05 in the linkage analysis was beneficial to linkage map construction (Kuang et al. 1999; Fishman et al. 2001; La Rosa et al. 2003; Lanteri et al. 2006). The use of an intraspecific cross might have contributed to the reduced level of segregation distortion observed (Becker et al. 1995; Sondur et al. 1996). In this study, we used progenies from an intraspecific cross of two cultivars. Although the distorted markers were included in the linkage analysis, these markers did not affect the linkage arrangement, as they were distributed on all linkage groups. Distorted markers that are distributed onto different linkage groups could be due to the heterogeneous transmission of chromosome fragments to progenies (Quillet et al. 1995). All distorted markers in this study demonstrated whole genome distribution. This is suggestive of a biological mechanism underlying segregation distortion (Lanteri et al. 2006), as opposed to random bias caused by scoring errors or chance (Fishman et al. 2001).

Significant clustering of markers was not observed in the LGs. However, slight clustering of markers was observed on some LGs, such as LGs 5, 7 and 11. Clustering is very common, especially for AFLP and RAPD marker-based linkage maps (Tanksley et al. 1992; Vallejos et al. 1992; Lanteri et al. 2006; Blenda et al. 2007), whereas SSR markers were reported to be randomly distributed across LGs (Lanteri et al. 2006). The slight clustering that occurred on some of the LGs in this study may have resulted from the inbreeding depression exhibited in the F2 mapping population. The small mapping population size (<100 individuals) and SSR loci drawn mainly from particular regions of the genome may result in clustering as well.

To our knowledge, this is the first linkage map available for flowering dogwood. Our linkage map will benefit the following areas of genetic research for this species: 1. provide molecular fundamentals of genetic structure; 2. identify genes of interest on chromosomal regions, allowing for targeted selection of these genes in marker-assisted breeding in the future; and 3. serve as a reference map for high density saturation of future maps. By applying more SSR markers, developing other types of molecular markers such as AFLPs, and increasing the mapping population size, a more comprehensive genetic map can be obtained.