Introduction

Wild soybean (Glycine soja Sieb. et Zucc.), the progenitor of cultivated soybean [Glycine max (L.) Merr.], is native to eastern Asia, including China, Japan, the Korean peninsula, and the Far East of Russia. In China, wild soybean grows mainly in the eastern half of the country, and none has been found in westernmost Xinjiang, western Qinghai, or southernmost Hainan, where the climate is unsuitable for this species. Although wild soybean is widely distributed through most parts of the country, it is absent from particular areas and small-scale ecological environments (habitats) where the local geographical and ecological conditions are unsuitable, e.g. paramo areas and arid or hot regions. In China, wild soybean generally does not grow in areas that have less than 300 mm of annual precipitation, that have no month with a monthly mean temperature of 20°C or more, or that have seven or more consecutive months with a monthly mean temperature above 20°C (Wang et al. 2001).
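The three climatic limits attributed to Wang et al. (2001) can be read as a simple exclusion rule. The sketch below is only an illustration of that reading: the function name, the input format (12 monthly means for one calendar year) and the decision not to let a warm spell wrap from December into January are assumptions, not part of the cited work.

```python
def climate_suitable(annual_precip_mm, monthly_mean_temp_c):
    """Return True if none of the three exclusion criteria applies.

    Hypothetical helper: thresholds follow the text, everything else
    (name, inputs, no wrap-around across the year end) is assumed.
    """
    if len(monthly_mean_temp_c) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    # Criterion 1: too dry (less than 300 mm annual precipitation).
    if annual_precip_mm < 300:
        return False
    # Criterion 2: too cold (no month reaches a mean of 20 degC).
    if max(monthly_mean_temp_c) < 20:
        return False
    # Criterion 3: too hot (7 or more consecutive months above 20 degC).
    longest = run = 0
    for temp in monthly_mean_temp_c:
        run = run + 1 if temp > 20 else 0
        longest = max(longest, run)
    return longest < 7


# Invented example sites, for illustration only.
temperate = [-5, -2, 5, 12, 18, 22, 25, 24, 18, 11, 3, -3]
tropical = [18, 19, 22, 25, 27, 28, 28, 28, 27, 25, 22, 19]
print(climate_suitable(600, temperate))  # True
print(climate_suitable(1800, tropical))  # False
```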

Over the past 30 years, Chinese wild soybean resources have been collected throughout the species' distribution range, bounded by 23°57′–53°29′ N and 97°04′–135° E, and have been studied extensively in many respects. The geographical genetic diversity of Chinese wild soybean has been evaluated morphologically (Wang et al. 2008) and at the molecular level (Liu et al. 2009). Xu (1995) identified two geographical centres of wild soybean, in the Northeast and the Huanghe River Valley of China, on the basis of the geographical distribution of seed characteristics. Based on eight morphological characters defined in the China Genebank database, Dong et al. (2001) reported three centres of morphological diversity (including the semi-wild soybean type, G. gracilis Skvortz.) in the northeast, the Yellow River Valley and the southeast coast of China. Wang et al. (2005) reported that the mean seed sizes of natural populations of wild soybean in northern and central China were greater than those in southern China. An investigation of phenotypic diversity in Chinese semi-wild soybean showed that the Northeast and North were the two main centres (Wang et al. 2008). Shimamoto et al. (1998) reported that Chinese wild soybeans in the Changjiang River valley had the greatest genetic diversity in cytoplasmic DNA. Wen et al. (2009) analyzed 196 wild soybean accessions using SSR markers and found geographical differentiation among three artificially divided large regions (north, middle and south); southern China was the major centre of genetic diversity, with the highest genetic diversity and the largest number of region-specific alleles, and geographical grouping and genetic clustering of the accessions were correlated. Li et al. (2009) found, through nuclear SSR analysis of 375 wild accessions including semi-wild soybean, that the northern and southern geographical populations of Chinese wild soybean had the largest genetic differences, that the central regional population had higher genetic diversity, and that there was a clear geographical correlation.

Despite extensive collection and many studies of Chinese wild soybean resources, almost nothing is known about the genetics of one particular regional population, the Tibetan population. Tibet lies at the southwestern edge of China and is a plateau with a mean altitude of over 4,000 m; the plateau surface slopes gently from northwest to southeast, descending on average from 5,500 m to 2,300 m above sea level. The southeastern area has many flat and gently sloping patches and gorges at 1,600–2,000 m above sea level, where the microclimate is warm.

Thirty years ago, wild soybean was found in southeastern Tibet, in a warm and humid gorge (the Gongrigabuqu-River Gorge) in Chayu County (Li 1987). The Gongrigabuqu-River Gorge is 1,600–2,000 m above sea level and lies in the northwest of Chayu County. The southeastern edge of Chayu is adjacent to Yunnan Province. The Hengduan Mountains (Meilixueshan, one of the Hengduan Mountains, is 6,740 m above sea level) run along the eastern border of Chayu and greatly obstruct the spread of plants from east to west. The Tibetan population of wild soybean is almost completely isolated from populations in other areas. Although genetic diversity has been studied extensively for other regional wild soybeans in China, little is known about that of Tibetan wild soybean. Because only four Tibetan wild accessions are conserved in the China Genebank, we understand very little about the geographical, morphological and molecular genetic diversity of Tibetan wild soybean. Investigating the genetic attributes of Tibetan wild soybean will help to clarify the geographical dissemination of wild soybean in China and the history of the formation of wild soybean communities in Tibet. In view of the limited information on Tibetan wild soybean, the objectives of this study were (1) to analyse the genetic differences and genetic diversity of the Tibetan wild soybean populations, which are isolated from the eastern ones; (2) to analyse the geographical relationship between Tibetan and other, geographically distant populations in China; and (3) to discuss how Tibetan wild soybean differentiated geographically and genetically from other regional populations.

Materials and methods

Plant sampling

To compare genetic differences between Tibetan and other eastern wild soybeans, wild soybean populations were sampled from various corners of China, including four Tibetan populations (pop. F, G, H, and I) from the Gongrigabuqu-River gorge, Chayu, and seven populations from remote regions in the centre (pop. D and E), north (pop. C), outermost northeast (pop. A and B), southeast (pop. J) and south (pop. K) (Table 1; Fig. 1). These population sites covered four edges (northeast, southwest, southeast, and south) and two inland areas (centre and north) of the distribution range of this species in China. There is massive variation in geographical and ecological conditions among the sites.

Table 1 Information on eleven wild soybean populations and two groups of soybeans (including landraces and modern varieties)
Fig. 1 Geographical distribution of 11 natural populations of wild soybean. These populations grew in fringe and central regions of China: Tibet (F, G, H, and I), the Centre (D and E), North (C), outermost Northeast (A and B), Southeast (J) and South (K)

From each population, 30–37 individuals were sampled at random at intervals of at least 10 m (except for the smaller population J). The Tibetan wild soybean populations were sampled directly in the field by collecting young leaves from plants growing in natural populations during the flowering period (August 2006). For the other populations, seed samples were collected at maturity in 2004–2007.

We also used two groups of Chinese northernmost and southernmost soybean varieties (including landraces and improved varieties) as reference in the data analysis to gain insight into whether the wild populations had some genetic involvement with their surrounding cultivated soybeans. These two groups of cultivated soybeans had once been cultivated within a radius of about 100 km around the population sites A and K (Table 1).

SSR marker test

Total genomic DNA was extracted from young leaves of each plant according to the method of Doyle and Doyle (1990). We used 20 pairs of SSR markers selected from previously reported linkage groups (Cregan et al. 1999). These markers had shown good stability in our laboratory records. Polymerase chain reaction (PCR) amplification was carried out in a 20-μl reaction mixture containing 100 ng of template DNA, 10 pmol of each primer, 100 μM of each of the four dNTPs, 1 U of Taq polymerase, and 2 μl of 10× PCR buffer. Cycling conditions were 30 cycles of 30 s denaturation at 94°C, 30 s annealing at 47°C, and 30 s extension at 72°C. The amplified products were separated on a 6% gel (SDS–PAGE) and bands were visualized by silver staining.
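As a quick check on the reaction set-up, the stated amounts can be converted to final concentrations in the 20-μl mixture. The sketch below only restates the numbers given above; the function and key names are hypothetical.

```python
REACTION_VOLUME_UL = 20.0  # total reaction volume stated in the text


def final_concentrations(volume_ul=REACTION_VOLUME_UL):
    """Convert the stated per-reaction amounts to final concentrations."""
    return {
        # 100 ng template DNA per reaction
        "template_ng_per_ul": 100.0 / volume_ul,
        # 10 pmol of each primer; 1 pmol/ul is the same as 1 umol/l
        "primer_uM": 10.0 / volume_ul,
        # 2 ul of 10x buffer diluted into the full volume
        "buffer_x": 2.0 * 10.0 / volume_ul,
        # 1 U Taq polymerase per reaction
        "taq_U_per_ul": 1.0 / volume_ul,
    }


for name, value in final_concentrations().items():
    print(f"{name}: {value}")
```

Under these figures each primer is at 0.5 μM and the buffer at 1× in the final mixture.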

Data analysis

POPGENE version 1.31 (Yeh et al. 1999) was used to evaluate genetic variability in terms of the observed total number of alleles (N a), effective allele number (N e), percentage of polymorphic loci (A p), Nei's gene diversity (H e; Nei 1973), and the Shannon–Weaver index (I; Shannon and Weaver 1949). Geographical differentiation among the geographical regions was evaluated by F-statistics (F st), also using POPGENE version 1.31. A UPGMA cluster dendrogram based on the average genetic distances was constructed with PowerMarker version 3.25 (Liu and Muse 2005) to evaluate the genetic relationships among the natural populations. To compare levels of genetic diversity, we also calculated allelic richness (\( {\hat{\text{r}}} \)), i.e. the allele abundance at a locus, by the rarefaction method of Hurlbert (1971) as introduced by El Mousadik and Petit (1996), using FSTAT version 2.9.3 (Goudet 2001).
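For readers unfamiliar with these indices, the following minimal sketch computes N e, H e and I for a single locus from a list of observed alleles, using the textbook forms N e = 1/Σp², H e = 1 − Σp² and I = −Σp ln p. It is not POPGENE's code: the function names are hypothetical, and the choice of the uncorrected (biased) H e and the natural logarithm are assumptions.

```python
import math
from collections import Counter


def allele_frequencies(alleles):
    """Relative frequency of each allele in a list of sampled alleles."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def effective_alleles(freqs):
    """Effective allele number, N e = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


def gene_diversity(freqs):
    """Gene diversity (expected heterozygosity), H e = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def shannon_index(freqs):
    """Shannon information index, I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)


# Invented single-locus sample with four equally frequent alleles.
freqs = allele_frequencies(["a", "b", "c", "d"] * 5)
print(effective_alleles(freqs))  # 4.0
print(gene_diversity(freqs))     # 0.75
print(shannon_index(freqs))      # ln 4, about 1.386
```

A locus fixed for one allele gives N e = 1 and H e = I = 0, which is the pattern reported below for most loci in population C.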

The Bayesian clustering algorithm implemented in STRUCTURE version 2.1 (Pritchard et al. 2000) was applied to assign genetically similar individual plants among the natural populations. Values of K (the number of assumed clusters) from 1 to 20 were tested to infer the number of clusters. Twenty independent simulations were run for each K value, using the model without prior population information and allowing for admixture and correlated allele frequencies, with a burn-in period of 50,000 and 50,000 replications. A consanguinity analysis of individuals was carried out to reveal genetic intermixing within populations and between cultivated and wild soybeans, based on the ancestry analysis method of Oumar et al. (2008).
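Given replicate runs for each K, one simple way to summarize them is to average ln Pr(X|K) over the runs and take the K with the highest mean. The sketch below assumes that criterion and a hypothetical input layout (a dictionary from K to the list of per-run log-likelihoods); the numbers are invented and are not results of this study.

```python
from statistics import mean


def best_k(log_likelihoods):
    """Pick the K whose replicate runs have the highest mean ln Pr(X|K).

    log_likelihoods: dict mapping K -> list of ln Pr(X|K), one per run.
    Returns (best K, dict of mean log-likelihood per K).
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    return max(means, key=means.get), means


# Invented log-likelihoods for three K values with two runs each.
runs = {
    1: [-5000.0, -5010.0],
    2: [-4200.0, -4190.0],
    3: [-4300.0, -4350.0],
}
k, means = best_k(runs)
print(k)  # 2
```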

Results

SSR polymorphic variation

Among the 343 plants sampled, we observed a total of 202 alleles at the 20 nuclear SSR loci; the number of alleles (N a) per locus varied between 7 and 15, with a mean of 10.1 (Table 2), and the mean effective allele number was 5.22. Of the 202 alleles, 86 (42.57%) were rare, with a frequency of less than 0.05 in the whole sample. The number of alleles is one of the most important components of genetic differentiation among populations, types and geographical sites. In addition to N a, we calculated allelic richness, which is an objective index independent of sample size. In this sample, allelic richness (\( {\hat{\text{r}}} \)) varied from 5.12 to 12.345 per locus, with an average of 8.087. The observed heterozygosity (H o) per locus was low (mean value = 0.0047), as expected for a self-pollinating plant such as wild soybean. The mean H e was 0.79 (range: 0.68–0.88 per locus). Among the per-locus parameters of genetic diversity, the strongest correlations were those among N e, I and H e (r = 0.94–0.96, P < 0.001).
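Allelic richness is made independent of sample size by rarefaction: it is the number of distinct alleles expected in a subsample of fixed size g drawn without replacement from the N gene copies observed, r(g) = Σ_i [1 − C(N − N_i, g)/C(N, g)], where N_i is the count of allele i. The sketch below implements that formula directly; it is not FSTAT's code, and the function name and the example counts are hypothetical.

```python
from math import comb


def allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a subsample of g gene copies.

    allele_counts: observed count N_i of each allele at one locus.
    g: rarefied sample size (gene copies), with 0 < g <= sum of counts.
    """
    n = sum(allele_counts)
    if not 0 < g <= n:
        raise ValueError("g must satisfy 0 < g <= total number of gene copies")
    total = comb(n, g)
    # comb(n - c, g) is 0 when n - c < g, i.e. the allele is always drawn.
    return sum(1 - comb(n - c, g) / total for c in allele_counts)


# Invented counts: two alleles with five copies each.
print(allelic_richness([5, 5], 2))   # 14/9, about 1.556
print(allelic_richness([5, 5], 10))  # 2.0 (the full sample)
```

Rarefying every population to the same g lets populations with different sample sizes, such as the smaller population J, be compared on an equal footing.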

Table 2 Genetic parameters for 20 nuclear SSRs used in eleven natural wild populations and two groups of soybeans in China

Genetic diversity and variation in populations

The 11 populations differed genetically from one another. Sixty alleles were fixed among the 11 populations (Table 4), with higher ratios of fixation in all four Tibetan populations (range: 40–75%), in the central population E (75%) and in the drought-resistant population C (95%; Table 5). The percentage of polymorphic loci (A p) ranged from 5 to 95% among populations, with a mean of 56.8%, and the mean number of alleles (N a) per population ranged from 1.05 to 4.00, averaging 2.01. The observed heterozygosity (H o) per population was 0.003. The mean allelic richness (\( {\hat{\text{r}}} \)) per locus among the populations varied from 1.05 to 3.984, with an average of 2.007. The drought-resistant population C had the lowest values for the various genetic parameters (Table 3), which implies that this population had been under severe selection by drought stress, so that most of its loci (95%) were fixed (Table 4). The central population D had the highest genetic diversity, with the highest values for all genetic parameters (Table 3). However, its nearest neighbour, population E, about 280 km away, had lower genetic diversity. Population E was located on a hillside and might have been founded by individuals from the upper border. The northern and southern populations (A, B, J, and K) showed rather high genetic diversity. All four Tibetan Plateau populations (F, G, H, and I) were characterized by low genetic richness (1.25–1.55; Table 4).

Table 3 Estimates of genetic diversity among eleven natural wild populations and two groups of soybean in China
Table 4 The number and distribution status of alleles at each locus among the wild populations

Intrapopulation status in genetic structure

The individual assignment test showed that all plants were assigned to their original populations (Fig. 2). No dissemination over large geographical areas was observed among these populations, even between the relatively close populations D and E, which were 280 km apart. We analysed the genetic structure within the populations using STRUCTURE version 2.1. The Tibetan populations and most of the other populations were statistically nearly homogeneous. Only the two northeastern populations, A and B, showed obvious structure: population A had two distinct genetic component groups and population B had three (Fig. 2).

Fig. 2 Individual assignment in the 11 populations (A–K) of wild soybean using the model without prior population information at K = 12 [the value with the highest likelihood, ln Pr(X|K)]. The two northeastern populations A and B had heterogeneous components in their genetic structure. All plants were assigned to their original populations, and no dissemination over distant geographical areas was found among these remote populations, even between D and E, which were only 280 km apart

Population A occupied a larger area, and its heterogeneous components might have originated from the migration of surrounding congeners or from introgression from soybean farmland. To clarify what caused the heterogeneous components in population A, we conducted an ancestry analysis of individuals to determine whether genes had introgressed from soybeans into this wild population. For this test, we used 26 soybean varieties that grew within approximately 100 km of site A. The result showed that there had been gene flow from cultivated soybeans into this wild population in the past (Fig. 3a). At least four wild individual plants showed consanguinity with soybeans. The heterogeneous plants originated from natural crosses between wild and cultivated soybeans a long time ago. Under natural selection in the ecosystem, the offspring segregated towards small seed size, so that they completely reverted to the wild soybean morph. We also analysed the southern population K together with 27 soybeans from within a 100 km radius of site K, and found no sign of natural hybridization between population K and the surrounding soybeans (Fig. 3b). This was in accordance with the results shown in Fig. 2, which indicated that population K consisted of genetically homogeneous individual plants.

Fig. 3 Possibility of introgression from soybeans into wild populations. Ancestry analysis identified four introgressive plants consanguineous with soybeans in the northernmost population A (a) and no acceptable introgressive plants in the southernmost population K (b). As shown in Fig. 2, heterogeneous components were present in population A but not in population K

However, the situation in population B differed from that in population A: population B showed eco-genetic differentiation resulting from two microenvironments, an osier field and a dry, sandy revetment dike of the osier field. Fourteen individuals (group 3) growing on the revetment dike were differentiated from the other 16 individuals sampled from the osier field (groups 1 and 2). Population B contained heterogeneous hybrid individuals arising within the population among the three different components (groups 1, 2 and 3), as shown in Fig. 4.

Fig. 4 Heterogeneity of genetic backgrounds in population B. There were three genetic components (groups 1, 2 and 3) and two kinds of heterogeneous individuals, between components 1 and 2 and between components 2 and 3. Some genetic differentiation seemed to have occurred between the dry and normal microenvironments; 14 plants from the dry site were genetically separated from the other plants growing in the osier field

Geographic differentiation

Geographically, these populations showed distinct regional features of genetic differentiation. The largest sample, 120 Tibetan individuals, produced only 49 alleles, 23 (46.94%) of which were fixed. This proportion of fixed alleles was clearly higher than that found in the other regional samples: 23/111 (20.72%) in N-NE, 16/89 (17.98%) in region C, and 5/59 (8.25%) in S-SE (Table 5). The Tibetan and southern samples generated more region-unique alleles. The proportion of regionally unique alleles was 63.26% for the Tibetan sample, 61.02% for the southern sample (S-SE), 44.94% for the central sample (C), and 27.92% for the northern sample (N-NE; Table 4). However, in the Tibetan sample, fixed alleles that were also region-unique accounted for 42.86% (21 alleles), a high ratio, whereas in the southern sample (S-SE) such alleles accounted for only 5.05%. In the C and N-NE regional samples, the corresponding values were 10.11 and 13.51%, respectively. Across all the Tibetan samples, 20% of loci were fixed for a single allele (Table 5). Samples from regions C and S-SE had the highest allelic richness values (\( {\hat{\text{r}}} \); 2.616 and 2.564), followed by N-NE (2.041), whereas the Tibetan sample showed the lowest value (1.40; Table 5). Together, these results suggest that Tibetan wild soybean populations had less genetic exchange with other regional populations.

Table 5 Status of region-unique alleles in wild soybean of four geographical areas

The UPGMA cluster dendrogram confirmed regional genetic differentiation among these populations (Fig. 5) and revealed two geographical groups, a Tibetan group and an inland group. The Tibetan populations were independently differentiated from the other regional populations in China. Population C, from northern Beijing, grew in dry, sandy soil and showed specialized adaptation. The two groups of northernmost and southernmost soybeans clustered first and then joined the southern population (S). In general, populations located close to each other converged first as geographical groups, with the exception of population J from southeastern Fujian. Population J (SE) clustered first with the central population D to its north but not with population K (S), its transverse (east–west) neighbour. Genetically, southern K appeared to be more closely connected with central E and northernmost A and B, along the north–south direction. This suggests a latitudinal species-dispersal effect or gene flow, which is consistent with the topography of this part of China: the mountainous terrain and plateau obstruct dispersal of wild soybean in western China. All four regional groups (Tibet, N-NE, Centre, and S-SE) showed significant geographical differentiation from one another (Table 6).

Fig. 5 UPGMA cluster based on allele frequencies and Nei's (1973) distance, with 1,000 bootstrap replicates (0.95 confidence interval), for eleven wild soybean populations and two soybean groups. The cluster showed two large geographical groups, the Tibetan and inland groups. The Tibetan populations were independently differentiated from the other regional populations in China. Population C grew in dry, sandy soil and had specialized adaptation to its environment. The two groups of northernmost and southernmost soybeans clustered first and then joined the southern population (S). Population J (SE) clustered first with the central population D to its north but not with the transverse population K (S). Southern K was genetically more closely related to central E and to the northernmost populations A and B
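To make the clustering step concrete, the following is a minimal, generic UPGMA sketch: the two closest clusters are merged repeatedly, and the distance from a merged cluster to any other is the size-weighted average of its members' distances. It is not PowerMarker's implementation; the function name, the input layout and the three-population distance matrix are invented for illustration.

```python
from itertools import combinations


def upgma(labels, dist):
    """Generic UPGMA on a symmetric distance table.

    labels: list of population names.
    dist: dict mapping frozenset({a, b}) -> distance between a and b.
    Returns the list of merges as (cluster_a, cluster_b, node_height).
    """
    sizes = {name: 1 for name in labels}  # active clusters and leaf counts
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(labels, 2)}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        a, b = sorted(pair, key=str)
        merged = (a, b)
        merges.append((a, b, d[pair] / 2.0))
        merged_size = sizes[a] + sizes[b]
        for c in sizes:
            if c not in (a, b):
                # Size-weighted average of the distances to the two parts.
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / merged_size
        # Drop every distance that involves one of the merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = merged_size
    return merges


# Invented distances among three hypothetical populations X, Y and Z.
example = {
    frozenset(("X", "Y")): 2.0,
    frozenset(("X", "Z")): 6.0,
    frozenset(("Y", "Z")): 6.0,
}
print(upgma(["X", "Y", "Z"], example))
# [('X', 'Y', 1.0), (('X', 'Y'), 'Z', 3.0)]
```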

Table 6 Differentiation estimated by F-statistics values between areas
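As an illustration of what an F st value measures, the sketch below computes a simplified single-locus estimate, F st = (H t − H s)/H t, where H s is the mean within-group gene diversity and H t is the gene diversity of the pooled (equally weighted) allele frequencies. This is a textbook Nei-type form and an assumption here, not POPGENE's exact estimator; the function names and example frequencies are hypothetical.

```python
def expected_het(freqs):
    """Gene diversity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_freqs):
    """Simplified single-locus F st = (H t - H s) / H t.

    pop_freqs: list of dicts, one per group, mapping allele -> frequency.
    Groups are weighted equally (an assumption of this sketch).
    """
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    hs = sum(expected_het(f) for f in pop_freqs) / k  # within groups
    ht = expected_het(mean_freqs)                     # pooled
    return (ht - hs) / ht if ht > 0 else 0.0


# Two groups fixed for different alleles: complete differentiation.
print(fst([{"a": 1.0}, {"b": 1.0}]))                      # 1.0
# Two groups with identical frequencies: no differentiation.
print(fst([{"a": 0.5, "b": 0.5}, {"a": 0.5, "b": 0.5}]))  # 0.0
```

High allele fixation combined with region-unique alleles, as reported for the Tibetan sample, pushes such an estimate towards 1.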

Discussion

This is the first study to assess the genetic diversity and geographical peculiarity of Tibetan and other remote border populations of wild soybean in China. The Tibetan wild soybean was characterized by a low level of genetic diversity, as indicated by a higher ratio of fixed alleles, a higher frequency of regionally unique alleles (Table 5), and lower genetic richness (Tables 3 and 5). The higher ratio of allele fixation and the higher frequency of regionally unique alleles suggest that Tibetan wild soybean had less genetic exchange with other inland populations because of its geographical isolation. The results of previous studies on the genetic diversity of wild soybean have not always been consistent, and different studies have reported different centres of morphological and genetic diversity for Chinese wild soybean (Xu 1995; Shimamoto et al. 1998; Dong et al. 2001; Li et al. 2009; Wen et al. 2009; Guo et al. 2010). In the present data on natural populations, the higher levels of allelic richness were observed in populations from central China and the southern region (Table 5).

The UPGMA cluster revealed two large geographical groups, Tibet and inland. Tibetan wild soybean showed large genetic differences from the other populations growing in central and various fringe regions of China (Table 4). The southern population K appeared to be more closely connected to central E and northernmost A and B, which implies that wild soybean may have spread more rapidly between the north and south of eastern China during the dissemination history of this species. Faster dissemination along the south–north route is consistent with the topography of China, as the eastern parts consist of plains and low hills, whereas the western parts are plateau and have many north–south longitudinal mountain ranges. However, the ancestry of Tibetan wild soybean remains unclear and requires further research.

Wild soybean is perhaps a relatively ancient species of eastern Asia that became established in the subtropical belt of China 5 million years ago (Singh and Nelson 2009); this area is regarded as a centre of the world's angiosperms (Wu and Wu 1996; Wu 1988). The Japanese islands were joined to the Asian continent until the late Miocene-Pliocene and Pleistocene (18,000 years ago), and Taiwan island was joined to the Asian continent before the Tertiary period. Japanese wild soybean is likely to be a descendant of the species at that time, which migrated with the motion of the Pacific plate. One study showed genetic differentiation between Chinese and Japanese wild soybeans (Wang and Takahata 2007). Our present study showed that Tibetan wild soybean, with lower genetic diversity (Tables 3 and 5), differed genetically from other regional wild soybean populations (Table 6 and Fig. 5). The results suggest that Tibetan wild soybean in the Gongrigabuqu-River gorge likely underwent severe natural selection during its adaptation to the plateau climate and the ecogeographic conditions in the gorge. Another possible explanation for the lower genetic diversity in the Tibetan wild soybean populations is late migration after the species was established; that is, Tibetan wild soybean may not have arisen until after this species had already spread throughout the eastern parts of China and the rest of eastern Asia (Japan, Taiwan, the Korean peninsula, and the Far East of Russia) via land routes, profiting from the favorable climate. However, the history of the origin of Tibetan wild soybean remains unclear. The occurrence and distribution of Tibetan wild soybean are likely to be related to the changes in climate and flora development of southeastern Tibet during and after the formation of the Himalayan-Hengduan Mountains (Boufford and Van Dyck 2000; Sun 2002).

Cultivated soybean was domesticated somewhere in China 5,000 years ago (Hyten et al. 2006). Nevertheless, the results of this study provide little information regarding the origin of cultivated soybean, because only a small number of natural populations were selected in order to focus on the genetic characteristics of Tibetan wild soybean. To determine the origin of cultivated soybean, accessions of Chinese wild and cultivated soybeans representing more diverse geographical origins and evolutionary levels should be analyzed. Our results showed that the two soybean groups from small areas in the northernmost and southernmost parts of China were more closely related to the southern population K (Guangdong).

Our results clearly demonstrated that genetic differentiation of wild soybean has occurred not only among natural populations and geographical regions, but also between heterogeneous microenvironments within a small natural population, and that outcrossing occurs between plants within populations (Fig. 4). As revealed by the ancestry analysis in Fig. 3a, introgression from cultivated soybean into its wild progenitor species has occurred. This has perhaps continued for 5,000 years in China, since the origin of cultivated soybean.