Introduction

Genetic variation provides the foundation for species to respond to changes in the natural environment (Jump et al. 2009). It is well understood that intraspecific variation is distributed non-randomly within and among plant populations (Caicedo et al. 2004; Soltis et al. 2006), and that contemporary patterns of genetic variation among populations within species result from ongoing evolutionary processes as well as historical factors (Mitchell-Olds et al. 2007; Sloan et al. 2008). Understanding the processes shaping the geographic structure of genetic variation has become a central focus of evolutionary biology (Charlesworth et al. 2003; Mitchell-Olds et al. 2007). It is particularly relevant for the wild relatives of domesticated species, which constitute an important component of the crop gene pool (Harlan 1992).

Pecan [Carya illinoinensis (Wangenh.) K. Koch] is the most economically important native nut tree in North America. A member of the hickory genus (Carya spp., Juglandaceae), pecan is monoecious, with male and female flowers produced on the same tree but maturing at different times in a heterodichogamous blooming pattern that minimizes inbreeding (Thompson and Romberg 1985). Pollen is dispersed by wind and may travel long distances, whereas fruits appear to be dispersed on a local scale. This reproductive system therefore results in different dispersal distances for pollen and seed, which over time has implications for population structure (Namkoong and Gregorius 1985). To date, no studies have compared geographic patterns of variation in pecan based on markers transmitted through both pollen and seed (nuclear) and through seed alone (plastid).

Native pecan populations are an important resource for the pecan industry. Pecan production is marketed in two major categories: “Native/Seedling” and “Improved”. “Native” pecans are non-grafted trees growing in natural, regenerating stands not established by humans. “Seedling” pecans are non-grafted trees grown from selected nuts and/or intentionally planted. “Improved” pecans are selected cultivars that are asexually propagated by budding or grafting onto native or seedling rootstocks. The USA and Mexico are world leaders in pecan production (Flores and Ford 2009), but pecans are also grown commercially in Argentina, Australia, Brazil, China, Egypt, Peru, and South Africa. Pecans were first successfully grafted in 1846 or 1847 on the Oak Alley Plantation in St. James Parish, Louisiana (Taylor 1905). Commercial nursery production of grafted pecan trees began in the late 1800s and has resulted in steadily increasing acreage and production from improved pecans. Pecan production from improved trees began to exceed native production in the 1960s, while native production began a steady decline in the 1970s. Range-wide collections from native pecan populations have been added to National Clonal Germplasm Repository collections in order to preserve broad genetic diversity of a diminishing resource.

Native pecan populations occupy the westernmost range of any Carya species (Thompson and Grauke 1991) and occur in regenerating stands from Iowa south to the Gulf Coast of Louisiana and Texas, and into Mexico (Fig. 1). Some populations considered to be native may have been initially dispersed into their present locations by prehistoric people. Pecan was used by the people living at Modoc Rock Shelter in Randolph County, Illinois, near the northern limit of the native range, over 10,000 years ago (Styles et al. 1983). Pecans were recovered from strata over 8,000 years old at Baker's Cave in Val Verde County (Dering 1977; Hester 1981), the westernmost of the contiguous native stands in Texas. Braun (1961) considered the pecans in Butler County, OH, USA, to be native. They are closely associated with Adena culture mounds, artifacts of a people who lived in that area from 2500 BP to 400 AD (Fagan 1992). Among the easternmost populations of putatively native pecans is the disjunct population in west central Alabama on the Black Warrior River (Harper 1928). That population is near Moundville, an important site of the Mississippian culture of west central Alabama, where archeological excavations confirm that pecan was present as early as the Moundville I phase (1050–1250 AD) (Welch and Scarry 1995). The distribution of pecan in Mexico (Manning 1949, 1962; Miranda and Sharp 1950) has not been documented by archeological excavations, but in many locations it is associated with other plant species that suggest the remnants of an ancient flora (Stone 1962; Graham 1999). Ultimately, long-established populations are likely to have adapted to local conditions, and such adaptations may be genetically unique and valuable.

Fig. 1

Map of the distribution of native pecan trees (shaded area), showing the locations of 19 populations as designated in Table 1

In this study, plastid and nuclear genetic variation were surveyed in a representative sample of 80 indigenous pecan trees collected from throughout the native range of the species. The primary goal was to assess the geographic distribution of nuclear- and plastid-encoded genetic variation in native pecans and to identify evolutionary processes that might underlie these patterns. Our specific objectives were to: (1) assess the geographic structure of genetic variation based on plastid and nuclear SSRs; (2) explore latitudinal trends in genetic variation; and (3) identify the geographic origins of Mexican C. illinoinensis populations.

Materials and methods

Sampling and DNA extraction

Leaves for DNA extraction were collected from a pecan provenance orchard in Byron, Georgia. This collection was established from seed collected across the range of pecan in 1986 and 1987 (Grauke et al. 1989; Wood et al. 1998). Samples from 19 indigenous pecan populations were collected from across the USA and Mexico, with seed collected from up to six trees in each population. The latitude and longitude of the original trees were estimated using collection records and Google Earth. To identify broad-scale patterns of genetic variation, we collected leaf samples from one seedling of each of 80 of the original mother trees (Table 1, Fig. 1).

Table 1 Sample localities and organization of 80 indigenous pecan accessions

Young leaves were harvested and frozen at −80°C until DNA extraction. Total genomic DNA was extracted using methods modified from those of Paterson et al. (1993). Frozen tissue was ground and placed in an extraction buffer [0.35 M glucose, 0.1 M Tris–HCl pH = 8, 5 mM Na₂EDTA pH = 8, 2% (w/v) polyvinylpyrrolidone (PVP-40)] at pH = 7.5 and a lysis buffer [0.1 M Tris–HCl pH = 8, 1.4 M NaCl, 0.02 M Na₂EDTA pH = 8, 2% (w/v) CTAB, and 2% PVP-40]. At the time of DNA extraction, 1% (w/v) ascorbic acid and 0.2% (v/v) β-mercaptoethanol were added to both buffers. DNA was purified with chloroform/isoamyl alcohol (24:1) and precipitated with salt and isopropanol or ethanol. DNA was also successfully extracted using the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions.

SSR amplification

Eight primer pairs targeting microsatellite loci in organellar DNA had been identified by previous workers (Bryan et al. 1999; Weising and Gardner 1999; Cheng et al. 2006). A preliminary study identified three polymorphic loci [ccmp2 (Weising and Gardner 1999), ntcp40, and ntcp9 (Bryan et al. 1999)] that amplified reliably and produced consistent fragment sizes across runs of C. illinoinensis individuals. PCR amplifications were performed in PerkinElmer 9600 or 9700 thermal cyclers, using 10–30 ng of genomic DNA in a reaction volume of 12 μl. The PCR mix contained 20 mM Tris–HCl (pH 8.4), 50 mM KCl, 2.5 mM MgCl₂, 0.1 mM each of dATP, dGTP, dTTP and dCTP (Promega), 0.04 ml of Taq DNA polymerase (Invitrogen), and a mix of the forward and reverse primers at a final concentration of 0.25 pmol/ml. The thermal cycling protocol consisted of 3 min at 94°C, followed by 35–40 cycles of 45 s at 94°C, 45 s at 55°C, and 1 min at 72°C, with a final elongation of 40–60 min at 72°C. In addition, 14 nuclear microsatellites (Grauke et al. 2003; Mendoza-Herrera et al. 2008), including six previously identified in Juglans (Woeste et al. 2002; Dangl et al. 2005), were amplified for the 80 accessions. After PCR, 3 μl of each product was run on a 2% agarose gel in 1× TAE buffer and stained with ethidium bromide to verify amplification.

PCR fragments were visualized using an ABI Prism Genetic Analyzer 3130 (Applied Biosystems, Foster City, CA, USA). For capillary analysis, forward primers were fluorescently labeled at the 5′ end with either 6-FAM or HEX. Two microliters of each sample were diluted 1:30 with deionized water prior to sizing, and 10 μl of 400-ROX internal size standard (2.5% in deionized formamide) was added to each sample. Samples were multiplexed when possible. Fragment sizes were estimated relative to the 400-ROX internal standard, and fragments were scored, using GeneScan and Genotyper v3.7 software (Applied Biosystems). Alleles at each SSR locus were called by their size in base pairs, expressed as a whole number, and the size range assigned to each allele class was defined with the aid of FlexiBinV2 (Amos et al. 2007). Some samples were run in duplicate to monitor reproducibility between runs, and many were run multiple times. Repeated runs included known standards of diverse haplotypes to ensure consistency.

Data analysis

Plastid and nuclear SSR datasets were analyzed separately using GenAlEx 6.41 (Peakall and Smouse 2006). For the nuclear loci, we estimated the number of alleles per locus (Na), effective number of alleles (Ne), Shannon's information index (I), observed heterozygosity (Ho), expected heterozygosity (He), unbiased expected heterozygosity (UHe), and the fixation index, F = (He − Ho)/He. For each plastid locus, we calculated Na, Ne, I, diversity, h = 1 − Σpᵢ² (where pᵢ is the frequency of the ith allele in the population), and unbiased diversity, uh = [N/(N − 1)]h (where N is the number of individuals sampled).
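The per-locus statistics above can be sketched in a few lines of Python. This is a minimal illustration of the formulas, not GenAlEx's code: it assumes allele calls are integer fragment sizes, that nuclear genotypes are stored as two-allele tuples, and that missing data have already been removed. The function names are hypothetical.

```python
from collections import Counter
from math import log


def allele_freqs(alleles):
    """Relative frequency of each allele (fragment size) in a list of calls."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def _shared_stats(p):
    """Na, Ne and Shannon's I from allele frequencies; also returns sum(p_i^2)."""
    sum_sq = sum(f * f for f in p.values())
    stats = {
        "Na": len(p),                               # number of alleles
        "Ne": 1.0 / sum_sq,                         # effective number of alleles
        "I": -sum(f * log(f) for f in p.values()),  # Shannon's information index
    }
    return stats, sum_sq


def plastid_locus_stats(calls):
    """Haploid (plastid) locus: one allele call per individual."""
    n = len(calls)
    stats, sum_sq = _shared_stats(allele_freqs(calls))
    h = 1.0 - sum_sq                                # diversity
    stats.update(h=h, uh=n / (n - 1) * h)           # unbiased diversity
    return stats


def nuclear_locus_stats(genotypes):
    """Diploid (nuclear) locus: genotypes is a list of (allele1, allele2) tuples."""
    n = len(genotypes)
    p = allele_freqs([a for g in genotypes for a in g])
    stats, sum_sq = _shared_stats(p)
    he = 1.0 - sum_sq                               # expected heterozygosity
    ho = sum(a != b for a, b in genotypes) / n      # observed heterozygosity
    stats.update(
        Ho=ho,
        He=he,
        UHe=(2 * n) / (2 * n - 1) * he,             # unbiased expected heterozygosity
        F=(he - ho) / he if he > 0 else float("nan"),  # fixation index
    )
    return stats
```

For a toy plastid locus with calls [120, 120, 121, 122], h = 1 − (0.5² + 0.25² + 0.25²) = 0.625, and uh scales this by 4/3.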

The 80 samples included in this study were collected from 19 geographically distinct populations (Table 1). Two distinct approaches were taken to analyze the data: (1) analyses were completed for a priori populations, although sample numbers were low (N = 2–5); and (2) a Bayesian framework was used to assign individuals to clusters, regardless of population of origin. In the first approach, the standard population genetic parameters listed above were estimated for each population, together with the number of private (unique) alleles (Pa) and the number of polymorphic loci (PL). Population structure was assessed using AMOVA, and Mantel tests were used to test the null hypothesis of no association between geographic distance and genetic distance. These analyses were performed using GenAlEx 6.41 (Peakall and Smouse 2006). In the second approach, we used the model-based clustering method STRUCTURE version 2.3.2 (Pritchard et al. 2007) to assign individuals to homogeneous clusters (K populations), regardless of their geographic origin. Separate analyses were conducted for the organellar SSR markers and the nuclear SSR markers. For each value of K from 1 to 10, five independent runs (three for the nuclear data) were performed, each consisting of a burn-in of 1 × 10⁵ Markov chain Monte Carlo (MCMC) iterations followed by 1 × 10⁶ MCMC iterations. For both the organellar and nuclear SSR datasets, we assumed independent allele frequencies and set lambda to 1. Two separate analyses were run for each dataset using (1) the admixture ancestry model and (2) the no-admixture model. The most likely number of populations (K) was identified as the value with the highest Ln P(D), STRUCTURE's estimate of the log probability of the data given K; under a uniform prior on K, this value of K also has the highest approximate posterior probability. Clusters identified from the STRUCTURE results for the organellar and nuclear SSR datasets were treated as populations in subsequent analyses (“cpSSR-defined clusters” and “nSSR-defined clusters”). 
Using the cpSSR-defined clusters, we estimated genetic diversity and population structure for the cpSSR data and for the nuclear data. Then, we used the nSSR-defined clusters to estimate genetic diversity and structure of the nSSR data and the cpSSR data. This approach allowed us to compare levels of nuclear and plastid SSR genetic variation and population structure for clusters identified with each marker type (cpSSR and nSSR).
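The model-choice rule described above (pick the K with the highest Ln P(D)) can be sketched as follows. Averaging Ln P(D) across replicate runs is an assumption, since the methods do not state how replicate runs were combined; the conversion to approximate posterior probabilities assumes a uniform prior over the K values tried, a heuristic that the STRUCTURE documentation itself advises interpreting with caution. Function names and input values are hypothetical.

```python
from math import exp
from statistics import mean


def mean_lnpd(runs_by_k):
    """Average Ln P(D) across replicate STRUCTURE runs for each K (assumed rule)."""
    return {k: mean(values) for k, values in runs_by_k.items()}


def best_k(runs_by_k):
    """K with the highest (mean) Ln P(D)."""
    avg = mean_lnpd(runs_by_k)
    return max(avg, key=avg.get)


def approx_posterior_k(runs_by_k):
    """Approximate Pr(K | data), assuming a uniform prior over the K values tried."""
    avg = mean_lnpd(runs_by_k)
    top = max(avg.values())
    # Subtract the maximum before exponentiating so exp() does not underflow to 0
    # for every K when Ln P(D) values are large and negative.
    weights = {k: exp(v - top) for k, v in avg.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical replicate Ln P(D) values, not data from this study.
runs = {1: [-160.0, -160.4], 2: [-140.0, -141.0], 3: [-120.0, -120.6], 4: [-123.0, -125.0]}
print(best_k(runs), approx_posterior_k(runs))
```

Working with differences from the maximum Ln P(D) is the design choice that matters here: the raw values are far too negative to exponentiate directly.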

To explore relationships among individuals, a distance-based approach was implemented using the program DARwin (Perrier and Jacquemoud-Collet 2006). A dissimilarity index was calculated from the single-count data, and trees were constructed using the weighted neighbor-joining method (Saitou and Nei 1987) and exported for visualization in FigTree v1.3.1 (downloaded Feb. 2011 from http://tree.bio.ed.ac.uk/software/figtree).
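The distance-then-tree workflow can be illustrated with a short Python sketch. Two substitutions are assumptions and should not be read as DARwin's implementation: simple matching (the proportion of loci at which two individuals differ) stands in for DARwin's single-data dissimilarity index, and the classic unweighted Saitou–Nei neighbor-joining algorithm is shown rather than the weighted variant used in the study. The sketch assumes at least three individuals and no missing data; it returns an unrooted Newick string of the kind FigTree can open. Function and internal-node names are hypothetical.

```python
from itertools import combinations


def simple_matching(profiles):
    """Proportion of loci at which two individuals carry different alleles.

    profiles: dict name -> tuple of allele calls (same loci, same order).
    Returns a symmetric dict keyed by (name_a, name_b).
    """
    d = {}
    for a, b in combinations(profiles, 2):
        pa, pb = profiles[a], profiles[b]
        value = sum(x != y for x, y in zip(pa, pb)) / len(pa)
        d[(a, b)] = d[(b, a)] = value
    return d


def neighbor_joining(names, d):
    """Classic (unweighted) neighbor joining; returns an unrooted Newick string."""
    d = dict(d)
    nodes = list(names)
    label = {n: n for n in nodes}
    step = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(d[(i, k)] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        li = 0.5 * d[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[(i, j)] - li
        step += 1
        u = f"_internal{step}"  # hypothetical internal-node name
        label[u] = f"({label[i]}:{li:.4f},{label[j]}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[(u, k)] = d[(k, u)] = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # The last three nodes form a star; branch lengths from the three-point formula.
    a, b, c = nodes
    la = 0.5 * (d[(a, b)] + d[(a, c)] - d[(b, c)])
    lb = 0.5 * (d[(a, b)] + d[(b, c)] - d[(a, c)])
    lc = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
    return f"({label[a]}:{la:.4f},{label[b]}:{lb:.4f},{label[c]}:{lc:.4f});"
```

For four hypothetical haplotypes in which A and B differ at one locus, as do C and D, the first join pairs either A with B or C with D, and both pairs end up as cherries in the unrooted tree.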

Results

The number of alleles per locus ranged from 4 to 5 for the plastid SSRs and from 2 to 22 for the nuclear SSRs; these and additional locus-specific measures are reported in Table 2 (and Supplemental Table 1). All loci were polymorphic in both datasets (Table 2). Analyses of the 19 a priori populations revealed variation in the number of polymorphic loci and the number of alleles per population (Table 3; other population genetic parameters are not reported because of low sample numbers). The mean percentage of polymorphic loci per population was 24.56% ± 7.14% for cpSSRs and 77.07% ± 2.48% for nSSRs (Table 3). The average number of alleles per locus was 1.298 ± 0.075 for cpSSRs and 3.226 ± 0.123 for nSSRs. Private cpSSR alleles were identified in 3 of the 19 a priori populations: MX2 (two private alleles), MX5 (one private allele), and TX3 (one private allele). Private nSSR alleles were identified in 16 of the 19 a priori populations (Table 3). AMOVA indicated that 74% of the plastid molecular variance was distributed among a priori populations, with 26% distributed within populations (P < 0.001). In contrast, 11% of the nuclear molecular variance was distributed among populations, with 89% distributed within populations (P < 0.001) (Table 4). Mantel tests rejected the null hypothesis of no association between geographic distance and genetic distance, indicating a significant correlation between geography and genetics for the sampled populations based on both plastid (r = 0.461, P < 0.001) and nuclear SSR data (r = 0.165, P < 0.001).
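A permutation-based Mantel test like the one reported above can be sketched as follows. Several details are assumptions rather than the GenAlEx procedure: geographic distance is taken as great-circle (haversine) distance from the estimated latitude and longitude, the statistic is the Pearson correlation between the off-diagonal entries of the two matrices, the test is one-tailed for a positive association, and 999 permutations are used. Function names are hypothetical.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _upper_triangle(matrix, order):
    """Upper off-diagonal entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, gen, permutations=999, seed=1):
    """One-tailed Mantel test for a positive geography-genetics association.

    geo, gen: square symmetric distance matrices (lists of lists) with
    individuals in the same order. Returns (r, p).
    """
    order = list(range(len(geo)))
    x = _upper_triangle(geo, order)
    r_obs = _pearson(x, _upper_triangle(gen, order))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # permute rows and columns of the genetic matrix together
        if _pearson(x, _upper_triangle(gen, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)
```

Because the observed arrangement is counted among the permutations, the smallest attainable P with 999 permutations is 0.001.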

Table 2 Diversity statistics for 80 native pecan accessions by genome and locus
Table 3 Diversity indices for 19 a priori native pecan populations
Table 4 Analysis of molecular variance (AMOVA) by population and genome for 80 native pecan accessions

STRUCTURE analyses based on cpSSR data

A priori population designations were removed, and Bayesian analyses were used to determine how the 80 individuals clustered based on their cpSSR profiles. The cpSSR data clustered samples into three groups (admixture model: K = 3, LnP(D) = −131.2; no-admixture model: K = 3, LnP(D) = −116.1). Bar plots for the admixture model are shown in Fig. 2a (accessions listed in their original order, as in Table 1) and Fig. 2b (accessions ordered by q score). There were no differences between the admixture and no-admixture models for the cpSSR data (data not shown). Individuals were considered part of a group if more than 80% of their profile grouped with a given cluster. STRUCTURE analysis based on plastid SSRs unambiguously assigned 92.5% of individuals (74/80) to one of the three clusters (Fig. 2). Six individuals were assigned nearly equally to two clusters [20, 22 (MX5), 34 (TX1), 46 (TX6), 55 (KY1), 64 (KS2)] (Fig. 2a, b). Cluster one (blue in Fig. 2, southern cluster; average latitude 19.94453° N, longitude 99.98127° W) included 15 individuals collected from four populations in Mexico (MX1, MX2, MX3, MX4) (Fig. 1). Cluster two (red in Fig. 2, central cluster; average latitude 30.29129° N, longitude 98.61908° W) included 30 individuals. These included all individuals from populations TX1, TX2, TX3, TX4, and TX5, along with isolated individuals from KY1, KS2, MS1, MX5, and TX6 (Fig. 1). Six of the 30 individuals [20, 22 (MX5), 34 (TX1), 46 (TX6), 55 (KY1), 64 (KS2)] were assigned primarily to cluster two, but approximately 40–45% of their profile grouped with cluster three (Fig. 2b). Cluster three (green in Fig. 2, northern cluster; average latitude 34.93399° N, longitude 93.17874° W) included 35 individuals. These included all individuals from IL1, KS1, MO1, MO2, and TN1, as well as most individuals from MS1, KS2, KY1, MX5, and TX6 (Fig. 1). Isolated individuals from MX2 and MX4 were also in this cluster. One individual that grouped with cluster three [18 (MX5)] had a small portion of its profile grouped with cluster two.
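The assignment rule above can be expressed as a short function. This sketch reads "more than 80% of their profile" as a strict inequality on the largest STRUCTURE membership coefficient (q); the function name and the q values in the usage example are hypothetical, not values from Fig. 2.

```python
def assign_by_q(q_scores, threshold=0.8):
    """Split individuals into assigned and admixed sets from STRUCTURE q-scores.

    q_scores: dict individual -> sequence of membership coefficients, one per
    cluster (clusters numbered 1..K). An individual is assigned to its
    highest-q cluster only if that q is strictly greater than `threshold`.
    """
    assigned, admixed = {}, []
    for ind, q in q_scores.items():
        best = max(range(len(q)), key=lambda k: q[k])
        if q[best] > threshold:
            assigned[ind] = best + 1  # report clusters as 1..K
        else:
            admixed.append(ind)
    return assigned, admixed


# Hypothetical q-scores for four individuals and K = 3.
q = {"ind01": (0.95, 0.03, 0.02), "ind02": (0.02, 0.55, 0.43),
     "ind03": (0.01, 0.09, 0.90), "ind04": (0.10, 0.10, 0.80)}
assigned, admixed = assign_by_q(q)
print(assigned, admixed, len(assigned) / len(q))
```

The proportion len(assigned) / len(q_scores) corresponds to the 92.5% (74/80) figure reported above; an individual sitting exactly at 0.80 is treated as admixed under the strict reading.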

Fig. 2

STRUCTURE analysis of cpSSRs using the admixture model (K = 3, LnP(D) = −131.2) for 80 C. illinoinensis individuals. Numbers correspond to accessions listed in Table 1. Colors correspond to southern (blue), central (red), and northern (green) clusters. (a) Accession numbers are shown below profiles (see Table 1 for detailed description); accessions are listed in their original order, regardless of q score. (b) Accessions are ordered by q score

Ten individuals (12.5%) clustered at least partially with a group other than the one corresponding to the region where they were collected (Fig. 2a). Five individuals collected in Mexico grouped with the northern group [07 (MX2), 11 (MX4), 18, 19, 21 (MX5)], and two individuals collected in Mexico grouped with both the central and northern groups [20, 22 (MX5)]. Individual 34 (TX1) clustered with both the central and the northern groups. Two individuals collected in the northern portion of the range clustered with both the northern and central groups: 55 (KY1) and 64 (KS2).

Estimates of genetic variation for the three cpSSR-defined clusters (Table 5) reveal that plastid diversity is highest in the southernmost cpSSR-defined cluster and decreases from south to north. To assess the impact of the six individuals that group with clusters two and three [20, 22 (MX5), 34 (TX1), 46 (TX6), 55 (KY1), 64 (KS2)], diversity parameters were re-estimated without these individuals. With these individuals removed, the south–north decrease in cpSSR genetic variation did not change in direction or magnitude (data not shown). Eighty-six percent of the plastid molecular variance was attributed to differences among cpSSR-defined clusters, with 14% due to differences within clusters (P < 0.001; Table 4).
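The among- and within-cluster percentages reported from AMOVA come from partitioning molecular variance into variance components. The sketch below applies the standard two-level AMOVA formulas to haploid profiles, under stated assumptions: the squared distance between two individuals is taken as the number of loci at which they differ (other SSR distance definitions are possible), a negative among-group component is truncated at zero, and the permutation test that supplies the P value is omitted. Function names are hypothetical.

```python
from itertools import combinations


def mismatch_count(a, b):
    """Assumed squared distance between two haplotypes: number of loci that differ."""
    return sum(x != y for x, y in zip(a, b))


def _ssd(individuals, profiles):
    """Sum of squared deviations: (1/m) times the sum of pairwise squared distances."""
    m = len(individuals)
    total = sum(mismatch_count(profiles[a], profiles[b])
                for a, b in combinations(individuals, 2))
    return total / m


def amova_two_level(groups, profiles):
    """Among- and within-group variance components for haploid data.

    groups: dict group name -> list of individual names.
    profiles: dict individual name -> tuple of allele calls.
    """
    everyone = [ind for members in groups.values() for ind in members]
    n, p = len(everyone), len(groups)
    ss_total = _ssd(everyone, profiles)
    ss_within = sum(_ssd(members, profiles) for members in groups.values())
    ss_among = ss_total - ss_within
    ms_among = ss_among / (p - 1)
    ms_within = ss_within / (n - p)
    # n0: average group size, corrected for unequal sample sizes
    n0 = (n - sum(len(m) ** 2 for m in groups.values()) / n) / (p - 1)
    var_within = ms_within
    var_among = max((ms_among - ms_within) / n0, 0.0)  # negative estimate set to 0
    pct_among = 100.0 * var_among / (var_among + var_within)
    return {"var_among": var_among, "var_within": var_within,
            "pct_among": pct_among, "pct_within": 100.0 - pct_among}
```

When every group is internally uniform and the groups differ, all variance falls among groups (100%); when each group contains the same mixture of haplotypes, all of it falls within groups.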

Table 5 Diversity indices of 80 native pecan accessions by cluster within genomes, with clusters based on either cpSSRs or nSSRs

Estimates of nSSR variation for the cpSSR-defined clusters exhibit a different geographic pattern: nuclear SSR variation increases from south to north (Table 5). AMOVA showed that 7% of the nuclear molecular variance was distributed among cpSSR-defined clusters, with 93% within clusters (P < 0.001; Table 4).

STRUCTURE analyses based on nuclear SSR data

Bayesian analyses of the 80 C. illinoinensis individuals based on nSSR data indicated that the most likely number of clusters was three for both the admixture model [K = 3; LnP(D) = −3,033.2; Fig. 3] and the no-admixture model [K = 3, LnP(D) = −3,030.7; data not shown]. The three clusters identified based on nSSR data corresponded to southern (N = 13, blue in Fig. 3; average latitude 19.678372° N, longitude 100.04894° W), central (N = 48, red in Fig. 3; average latitude 31.414147° N, longitude 98.811790° W), and northern (N = 19, green in Fig. 3; average latitude 35.099938° N, longitude 93.260284° W).

Fig. 3

STRUCTURE analysis of nuclear SSRs using the admixture model (K = 3; LnP(D) = −3033.2). Accession numbers (Table 1) are shown below profiles. Colors correspond to southern (blue), central (red), and northern (green) clusters. (a) Accessions are listed in their original order, regardless of q score. (b) Accessions are listed according to q score

Although STRUCTURE analyses of cpSSRs and nSSRs both identified three geographically distinct clusters (southern, central, and northern), the size and composition of those clusters differed depending on whether they were defined with chloroplast or nuclear SSR data. The primary difference is the size of the central cluster, which included 30 individuals based on cpSSR data but 48 individuals based on nSSR data. The southern and northern clusters were correspondingly smaller based on nSSR data, but by unequal amounts: the southern cluster included 15 individuals based on cpSSR data but only 13 based on nSSR data, whereas the northern cluster included 35 individuals based on cpSSR data but only 19 based on nSSR data.

Estimates of nuclear genetic variation for the three nSSR-defined clusters (Table 5) revealed no consistent geographic pattern. Twelve percent of the nuclear molecular variance was attributed to differences among nSSR-defined clusters, with 88% due to differences within clusters (P < 0.001; Table 4). Plastid diversity decreases from south to north in the nSSR-defined clusters (Table 5), independently of nSSR variation. AMOVA attributed 75% of the plastid molecular variance to differences among nSSR-defined clusters, with 25% derived from differences within clusters (P < 0.001; Table 4).

A neighbor-joining tree based on plastid SSR data identified the same three clusters as STRUCTURE (Fig. 4). With one exception, individuals assigned to multiple groups in the STRUCTURE analysis were placed with the cluster that contributed the majority of their profile (31, 48, 57, 66, 20, 22, 35, 36). The exception was individual 18, collected in Mexico (MX5 population), which grouped with the northern cluster in the STRUCTURE analysis of the cpSSR data but was resolved at the base of the Mexican group in the neighbor-joining tree. The neighbor-joining analysis of the nuclear SSR data produced complex assemblages of individuals of mixed geographic origin, with no genetic structure related to geographic origin (data not shown).

Fig. 4

Neighbor-joining tree based on the dissimilarity index for single-count cpSSR data. Tip labels are accession numbers from Table 1, color coded consistently with the STRUCTURE analysis shown in Fig. 2: southern = blue, central = red, northern = green

Discussion

Data presented here provide important insights into the geographic distribution of genetic variation in natural populations of C. illinoinensis. Previous studies of this same germplasm collection used allozymes to analyze levels of genetic variation and to estimate outcrossing rates (Rüter et al. 1999; Rüter et al. 2000). Our analyses of microsatellite data corroborate results from previous studies, and expand upon these earlier contributions by identifying differences in the geographic structure of nuclear and organellar genetic variation.

Levels of genetic variation in native C. illinoinensis populations

As expected, estimates of nuclear genetic variation based on microsatellites are higher than levels of genetic variation observed in provenance, wild, or cultivated pecan trees based on allozyme data (Rüter et al. 1999). Compared with estimates of microsatellite variation in other native tree populations [e.g., Fagus japonica (Hiraoka and Tomaru 2009); Fraxinus mandshurica var. japonica (Hu and Ennos 1999); Juglans nigra (Victory et al. 2006); and Picea abies (Meloni et al. 2007)], native pecan populations harbor lower levels of genetic variation. Low levels of genetic variation could reflect a relatively recent reduction in diversity due to some constraint, such as disease pressure. Alternatively, rapid range expansion could also have reduced levels of diversity.

Contrasting geographic patterns among organellar vs. nuclear SSRs: implications for dispersal

Evolutionary analyses based on nuclear and organellar DNA often yield contrasting patterns. In phylogenetic analyses, incongruence between genomes has been interpreted as evidence for hybridization between distinct taxa (Acosta and Premoli 2010), for example through plastid or cytoplasmic capture (Tsitrone et al. 2003). Intraspecific population-level analyses based on data from multiple genomes provide an opportunity to compare the movement of genetic material carried by pollen (nuclear DNA) with that of genetic material carried by seeds (nuclear and plastid DNA) (Ennos 1994; Hu and Ennos 1999). Theoretical and empirical studies have shown that organellar DNA exhibits stronger population structure than nuclear DNA (reviewed in Latta 2006). This has been attributed to the smaller effective population size of organellar DNA and the limited dispersal ability of seeds relative to pollen. Data presented here reveal that microsatellite loci from the plastid and the nucleus display strikingly different patterns of genetic variation (Figs. 2, 3; Table 5). The proportion of molecular variance distributed among groups ranged from 74% to 86% for plastid SSRs, compared with 7% to 12% for nuclear SSRs (Table 4). This pattern of strongly structured plastid SSRs and weakly structured nuclear SSRs was consistent across different groupings of individuals (a priori populations, cpSSR-defined clusters, and nSSR-defined clusters). These data suggest that plastid SSRs are useful tools for identifying population structure in C. illinoinensis and hold promise for ongoing efforts to identify and conserve a representative sample of C. illinoinensis germplasm in ex situ collections.

Comparative analyses of organellar and nuclear genetic structure in crop wild relatives may shed light on historical use patterns as well as contemporary gene flow. In this study, the relative dearth of among-population genetic structure displayed by nuclear microsatellite loci, together with the extensive geographic structure exhibited by plastid SSRs, suggests that pollen is the primary agent of gene flow in native C. illinoinensis populations (Table 4).

Extensive pollen-mediated gene flow and limited seed-mediated gene flow are expected in wind-pollinated tree populations with nuts dispersed by gravity or rodents, as in Carya (Namkoong and Gregorius 1985). However, in tree species whose fruits are dispersed over long distances by birds (Vander Wall 2001) or by humans, the geographic structure of cpDNA may be weakened. Although the plastid data presented here indicate significant geographic structure, which implies limited seed dispersal, long-distance seed dispersal cannot be discounted completely for C. illinoinensis. Our analyses of cpSSR data identified a few individuals that may be the descendants of seeds dispersed over long distances. For example, two individuals collected in southern portions of the range (#7 from MX2 and #11 from MX4) clustered with the northern group based on cpSSR data (Fig. 2) and with the central group based on nSSR data (Fig. 3). These data suggest occasional long-distance dispersal via seeds.

Additional evidence for long-distance dispersal of pecan comes from the archeological record. Frequent occurrences of pecan in archeological sites in the Upper Mississippi Valley indicate that pecan was used by people characterized by the Dalton artifact complex (Smith 1992). These early Holocene foragers are described as “opportunistic dispersal agents for plant propagules, and their small camps created ephemeral disturbed soil opportunities for some pioneer species” (Smith 1992, p. 282). Abrams and Nowacki (2008) suggest that Native Americans influenced North America's forest composition by actively and passively promoting production of nuts and acorns in preference to other competing vegetation, using the tools of fire and tree girdling. Historic accounts of pecan usage indicate that people have been actively selecting and establishing plantings in new locations since the colonial period (True 1919).

Latitudinal gradients in genetic variation may reflect unique aspects of postglacial recolonization

The geographic distribution of genetic diversity in native populations is the result of evolutionary processes acting on the raw material of progenitor populations over time. A previous analysis of North American pecan populations (Rüter et al. 1999) reported higher levels of genetic variation in northern and central C. illinoinensis populations, and lower levels of genetic variation in southern and eastern populations, based on nuclear-encoded allozyme loci. Nuclear microsatellite data presented here are consistent with this finding, but plastid data reveal the opposite pattern, with plastid diversity decreasing with increasing latitude (Table 5). We suggest that these contrasting patterns in plastid and nuclear genetic diversity resulted from differences in the movement of seeds versus pollen during postglacial recolonization.

In North American and European temperate forests, glacial activity played an important role in shaping contemporary patterns of genetic variation. Fossil studies have shown that populations were restricted to refugia during glacial periods, which were followed by periods of expansion as populations tracked warming climates (e.g., Magri et al. 2006). Molecular signatures of recolonization have been documented in numerous taxa and include higher levels of genetic variation in putative refugia and decreasing levels of genetic variation with distance from them (Hewitt 1996). Geographic patterns of plastid diversity presented here conform to expectations of recolonization, with higher levels of genetic variation and greater numbers of private alleles in southern areas than in northern regions. During the Pleistocene epoch, repeated glacial advances restricted the northern distribution of Carya in North America to about 35° N at 20,000 BP (Delcourt and Delcourt 1987). As glaciers receded, hickories migrated north at rates of advance as high as 354 m/year and, west of the Appalachian Mountains, reached their northernmost interglacial limit of 45° N by 8,000 years BP (Delcourt and Delcourt 1987). The homogeneity of plastid haplotypes in the northern region is consistent with the establishment of northern populations from a limited number of sources. Human-mediated dispersal may have contributed to the rapid movement of C. illinoinensis up the Mississippi River floodplain, an area where Archaic people were active (Delcourt and Delcourt 1987; see discussion above).
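As a rough consistency check on these figures, the average northward displacement implied by moving from 35° N at 20,000 BP to 45° N by 8,000 BP can be computed directly. This back-of-envelope sketch assumes about 111.2 km per degree of latitude on a spherical Earth and ignores longitude, the actual migration route, and dating uncertainty; the function name is hypothetical.

```python
KM_PER_DEGREE_LATITUDE = 111.2  # approximate length of one degree of latitude


def mean_poleward_rate_m_per_year(start_lat, end_lat, start_bp, end_bp):
    """Average straight-line northward advance, in metres per year."""
    distance_m = abs(end_lat - start_lat) * KM_PER_DEGREE_LATITUDE * 1000
    return distance_m / (start_bp - end_bp)


rate = mean_poleward_rate_m_per_year(35.0, 45.0, 20_000, 8_000)
print(f"{rate:.1f} m/year")  # prints 92.7 m/year
```

An average of roughly 93 m/year sits well below the peak rate of 354 m/year cited above, which is what one would expect if advance was episodic rather than steady.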

In contrast to the inverse relationship observed between plastid genetic variation and latitude, northern C. illinoinensis populations exhibit the same or greater levels of nuclear SSR variation compared with southern populations. Relatively elevated levels of nuclear genetic variation at northern latitudes may reflect a distinctive aspect of postglacial recolonization: interspecific gene flow in newly colonized areas. Petit et al. (2003) proposed that pollen-mediated gene flow may facilitate the expansion of a species into habitats already occupied by closely related species. In these cases, pollen of the expanding species lands on stigmas of established species in a given area, yielding F1 hybrids. This scenario is consistent with the observed combination of strongly geographically structured plastid variation and weakly structured nuclear variation if C. illinoinensis was the first Carya species to colonize areas in the north. In this case, other Carya species expanding into the northern regions after C. illinoinensis became established may have contributed pollen to the gene pool of C. illinoinensis, because these species would have lacked conspecific mates in newly colonized areas. Presumably, similar patterns would not be observed in more southern populations because these species would not have been mate-limited in refugial areas.

Geographic origins of Mexican C. illinoinensis populations

The origins of Mexican C. illinoinensis populations have been debated for over 60 years (e.g., Manning 1949). Previous allozyme data (Rüter et al. 1999) and the nuclear microsatellite data presented here indicate that Mexican pecan populations harbor lower levels of nuclear genetic variation than northern C. illinoinensis populations, an observation that is consistent with a relatively recent introduction of the species into the southern portions of its range. However, plastid microsatellite data reveal that Mexican C. illinoinensis populations harbor more private alleles and higher levels of variation than other North American pecan populations. The plastid data suggest that Mexican populations are not the product of recent introductions but may instead be descendants of relatively ancient C. illinoinensis populations. If these populations represent refugia, they could have served as sources for populations farther north.

More than 50 plant species occupy disjunct distributions between the eastern USA and northeastern Mexico (Graham 1999), including C. illinoinensis, Carya myristiciformis (nutmeg hickory), and Carya ovata (shagbark hickory). Two competing hypotheses have been proposed to explain the disjunction: (1) Mexican populations were introduced from the north during late Cenozoic cooling (Miocene and Pliocene); and (2) Mexican populations were part of an ancient rainforest that has persisted since the early Eocene (Graham 1999). Milne (2006) identified some of the challenges associated with distinguishing vicariance from long-distance dispersal in resolving questions of long-standing plant disjunctions, and noted that intraspecific haplotype variation could offer valuable insight if examined with appropriate molecular tools. Using molecular tools to date phylogenies in broad contexts requires their development within groups that show known patterns of vicariance, documented by other hard evidence. The genus Carya provides opportunities for the development of such markers, owing to its disjunct distribution between the southeastern USA and Asia, which is well established by the fossil record (Manchester 1999). In this collection, plastid SSR markers reveal a pattern of intraspecific variation that is distinct from the pattern of nuclear variation and reflects geographic origin. Recognizing the structure of population diversity should contribute to the development of improved strategies for conserving that diversity (Namkoong et al. 2004) and of more precise molecular tools for characterizing those populations.