Introduction

Fruit flies within the Tephritidae family include more than 5,000 species, one-third of which develop in fresh fruits and represent important invasive pests (White and Elson-Harris 1992). Given their opportunistic phytophagous lifestyles (Diaz-Fleischer et al. 1999), host plants have a significant influence on their survival and play an important role in their dispersion. Accordingly, fruit flies dispersed with the diffusion and implementation of agriculture and, as secured passengers of trading goods, took further advantage of increasing human mobility and trading activities (Malacrida et al. 2007). Therefore, the global invasion of tephritid flies is emerging as the combined result of both natural and human processes. Examples are provided by pest species such as the Medfly Ceratitis capitata (Malacrida et al. 2007) and Bactrocera species such as Bactrocera cucurbitae (Virgilio et al. 2010) and taxa pertaining to the Bactrocera dorsalis complex. The latter species have natural distributions confined to oriental regions, but are becoming worldwide invasive pests (White and Elson-Harris 1992; Aketarawong et al. 2007; Khamis et al. 2009; Krosch et al. 2012). Within this species complex, the Oriental fruit fly, B. dorsalis sensu stricto, is one of the most economically destructive species for fruits and vegetables across East Asia and Southeast Asia (Drew and Hancock 1994; Clarke et al. 2005; Drew et al. 2008). The very broad host range of B. dorsalis s.s., its dispersal capacity and its relatively wide climatic tolerance (Fletcher 1989; Duyck et al. 2004; Liu et al. 2011) have allowed it to establish adventive populations in different tropical and subtropical areas (Stephens et al. 2007). It was first recorded in 1912 as lectotype in Taiwan (Hardy 1973; Drew and Hancock 1994) and in the period 1934–1937 on Hainan Island and on the south-eastern coast of China (Xie 1937).
In the following years its presence was reported in most countries of the Asia–Pacific region, from South China to the northern areas of the Indian subcontinent (Wang 1996). Outside Asia, the oriental fruit fly has been reported in Hawaii (1945), Guam (1947) and occasionally in California and Florida (1960–1990) (White and Elson-Harris 1992). The spread of this species is receiving considerable attention for the development and implementation of risk assessment and control measures (Biosecurity Australia 2009). As a consequence, this species has prompted substantial ecological/demographic and molecular genetic research to identify its likely area of origin and to interpret its dispersion patterns. At the macro-geographic level, the analysis of microsatellite markers in 14 populations from Far East Asia, South Asia, Southeast Asia and the Pacific area indicated a western-oriented migration route from China to the West, with a colonization process associated with a relatively stable demographic structure of adventive populations (Aketarawong et al. 2007). Within Asia, the necessity of implementing control strategies against this pest stimulated studies attempting to validate the taxonomic status of B. dorsalis s.s. in relation to its sister pest taxa, B. papayae and B. philippinensis, with which it shared allopatric distribution in the Thai/Malay region (Schutze et al. 2012; Krosch et al. 2012). This area marks the southern diffusion limit of B. dorsalis s.s. In parallel, strong efforts were devoted to tracing the invasion routes of this pest across and out of the China/Southeast Asia regions (Wan et al. 2011, 2012; Shi et al. 2012). Each of these studies provided a picture of its complex colonization pattern. Two main invasion routes were proposed by Wan et al. (2012): one from Southeast China to Central China, another from Southeast China to Southeast Asia. A more extensive geographic sampling (Shi et al. 2012), which included most regions used by Aketarawong et al. (2007) and Wan et al. (2011), provided indirect evidence that the tropical regions of Southeast Asia and the southern coast of China may be considered as the native range of the species. Moreover, the weak population genetic structure and the absence of an isolation-by-distance effect detected across the considered areas were explained by human-mediated long-distance dispersal events and miscellaneous host selection by this species. However, specific approaches are necessary to disentangle the different forces that affected the invasion pattern and shaped the genetic make-up of its populations in an area such as East Asia, which in relatively few years became a theatre of intensive agriculture and a lively fruit trade.

Against this background, we use microsatellite data to evaluate both large- and small-scale population genetic differentiation and connectivity within and among the Southeast Asia and East Asia regions. Using both classical and Bayesian methods, we attempt to infer the origin and the routes of invasion of B. dorsalis s.s. across these areas, in which the propagule pressure is exacerbated and spread over great distances by various intense human activities.

Materials and methods

Samples considered

Attention has been centred on 12 B. dorsalis s.s. populations from East Asia sampled in 2006–2007 (Table 1). We took advantage of the available SSR genotype dataset previously obtained from five Southeast Asia populations (Thailand, Myanmar, Laos, Cambodia) and from Taiwan (Aketarawong et al. 2007). This dataset has been integrated with six new population samples collected in China and genotyped at the same SSR loci. The six samples from China were collected in the following provinces: Hunan (Hu), Fujian (Fj), Yunnan (Yn), Guangxi (Gx), Guangdong (Gd), and Hainan (Ha). The integration of this new data from China with the available Southeast Asia dataset was aimed at obtaining a genetic portrait of this species along the tropical/subtropical East Asia region. This area offers climatic conditions and continuous host resources, also due to intensive crop cultivation (e.g. mango, guava, and orange), suitable for stable populations of B. dorsalis s.s. (Biosecurity Australia 2009). The Chinese samples were obtained as ethanol preserved adults and genomic DNA was extracted individually from each fly using the method of Baruffi et al. (1995).

Table 1 Collection data of Bactrocera dorsalis s.s. samples used in this study

Microsatellite analysis

Each individual from the Chinese samples was genotyped at the same nine microsatellite loci (Bd1, 7, 9, 15, 19, 39, 42, 54, and 85B) used for the South East Asia samples by Aketarawong et al. (2006, 2007). We employed the TANDEM program (Matschiner and Salzburger 2009) for the automated binning of microsatellite allele lengths (with additional manual checking) to overcome the problems of genotyping errors. Samples for which amplification was not successful, or scoring was uncertain, were rerun, and re-extraction of DNA was performed if necessary. The genotyping data obtained from the Chinese samples were integrated into the dataset from the Southeast Asian samples.

Sample variability estimates

Genetic variability in each of the 12 populations was estimated in terms of number of alleles (n a), effective number of alleles (n e), variance of allele size (V m), and observed and expected heterozygosities (H O and H E, respectively) using MICROSATELLITE ANALYSER (MSA) V.4.05 (Dieringer and Schlötterer 2003). Allelic richness (R s), gene diversity (H S) and the inbreeding index (F IS) were obtained using FSTAT V.2.9.3.2 (Goudet 2001). The frequency of null alleles (A n) was estimated following Brookfield (1996). Fisher’s exact test, implemented in GENEPOP V.4 (Rousset 2008), was used to test for departures from Hardy–Weinberg equilibrium at each locus/population combination. The same program was also used to examine linkage disequilibrium between pairs of loci in each population (100 batches, 1,000 iterations per batch).
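
As an illustration of the per-locus statistics listed above, the following minimal Python sketch computes observed and expected heterozygosity, the effective number of alleles, and a null-allele frequency. It is not the code used by MSA or FSTAT, which may apply sample-size corrections; the function names and the toy genotypes are hypothetical, and the null-allele formula assumes the first estimator of Brookfield (1996), r = (H E − H O)/(1 + H E).

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus; genotypes is a list of (allele, allele) tuples."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Uncorrected expected heterozygosity, 1 - sum(p_i^2)."""
    freqs = allele_frequencies(genotypes)
    return 1.0 - sum(p * p for p in freqs.values())


def effective_number_of_alleles(genotypes):
    """Effective number of alleles, 1 / sum(p_i^2)."""
    freqs = allele_frequencies(genotypes)
    return 1.0 / sum(p * p for p in freqs.values())


def brookfield_null_frequency(h_obs, h_exp):
    """Assumed Brookfield (1996) estimator 1; clamping at zero is a choice of this sketch."""
    return max(0.0, (h_exp - h_obs) / (1.0 + h_exp))


# Hypothetical genotypes (allele sizes in bp) for four flies at one locus.
toy = [(100, 100), (100, 102), (102, 104), (104, 104)]
h_o = observed_heterozygosity(toy)
h_e = expected_heterozygosity(toy)
print(h_o, h_e, effective_number_of_alleles(toy), brookfield_null_frequency(h_o, h_e))
```

A heterozygote deficit (H O below H E) gives a positive null-allele estimate in this sketch, which is the pattern the null-allele analysis is meant to flag.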

Population structure and demography analysis

The degree of genetic differentiation among populations was quantified by calculating pairwise F ST values (Weir and Cockerham 1984) in MSA V.4.05 (Dieringer and Schlötterer 2003). The statistical significance of each value was assessed by comparing the observed value with the values obtained in 10,000 matrix permutations. The relationship between genetic distance [F ST/(1 − F ST)] and geographic distance [Ln distance (km)] among populations was studied using the subprogram ISOLDE in GENEPOP V.4. Principal Coordinate Analysis (PCoA) in the program GENALEX V.6.4 (Peakall and Smouse 2005) was applied to identify the relationships among populations on the basis of their allele frequencies. A plot of the first three principal coordinates was produced using the subprogram MOD3D in NTSYSpc V.2.1 (Rohlf 2005).
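
The isolation-by-distance test described above can be sketched as a simple Mantel permutation test. The Python example below is only an approximation of the idea: it uses a Pearson correlation and a one-sided permutation P value, whereas the statistic used by ISOLDE may differ, and all names and the toy matrices are hypothetical.

```python
import math
import random


def linearised_fst(fst):
    """Transform F_ST into F_ST / (1 - F_ST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """One-sided Mantel test on two symmetric n x n matrices (lists of lists).

    genetic holds linearised F_ST values, geographic holds ln(distance in km).
    Population labels of the genetic matrix are shuffled in each permutation.
    """
    n = len(genetic)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    geo = [geographic[i][j] for i, j in pairs]
    observed = pearson([genetic[i][j] for i, j in pairs], geo)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [genetic[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, geo) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical example: four sites on a line, genetic distance proportional to ln(km).
coords = [0.0, 100.0, 300.0, 700.0]
size = len(coords)
geo = [[0.0 if i == j else math.log(abs(coords[i] - coords[j])) for j in range(size)]
       for i in range(size)]
gen = [[0.02 * geo[i][j] for j in range(size)] for i in range(size)]
print(mantel_test(gen, geo, permutations=199, seed=1))
```

Permuting whole rows and columns, rather than individual pairwise values, keeps the non-independence of pairwise comparisons intact, which is the reason a Mantel-type procedure is preferred over an ordinary regression test here.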

A model-based clustering method implemented in the program STRUCTURE V 2.3.2 (Pritchard et al. 2000; Falush et al. 2003, 2007) was used to infer the number of potential genetic clusters (K) within the B. dorsalis samples. The simulation was performed using an admixture model assuming correlated allele frequencies (the F model). The F model allows frequencies in the different populations to be similar, probably because of ongoing migration or shared ancestry. The burn-in was set to 100,000 steps and was followed by 1,000,000 Markov Chain Monte Carlo (MCMC) replications. All runs were repeated 10 times for each number of possible clusters (K), set between 1 and 12 (i.e. the number of samples), regardless of a priori information (e.g., sampling location, distance, etc.). To indicate the most likely number of genetic clusters, the natural logarithm of the likelihood of the data, Ln P(X/K), was calculated. It is expected to be high with a low variance for the real cluster (Pritchard et al. 2000). We also implemented the Delta K method (Evanno et al. 2005) which uses the second-order rate of change of Ln P(X/K) given the data. This method is based on the rate of likelihood change between successive K values showing the best estimate of K.
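
The Delta K criterion can be illustrated with a short Python sketch. It follows one common reading of Evanno et al. (2005), in which the absolute second-order difference is taken on the replicate means of Ln P(X/K) and divided by the standard deviation of the replicates at K; the function name and the likelihood values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Delta K for each K that has replicate runs at K - 1, K and K + 1.

    ln_prob maps K to a list of Ln P(X/K) values from replicate runs.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # Delta K is undefined at the smallest and largest K tested
        second_order = abs(
            mean(ln_prob[k + 1]) - 2.0 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        result[k] = second_order / stdev(ln_prob[k])
    return result


# Hypothetical replicate likelihoods for K = 1..4 (two runs each).
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -901.0],
    3: [-895.0, -897.0],
    4: [-893.0, -896.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Because the statistic needs both neighbours of K, it cannot be evaluated at K = 1, so the Ln P(X/K) curve remains useful as a complementary check.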

The genetic distance based on the proportion of shared alleles (D S) (Bowcock et al. 1994) and Nei’s genetic distance (D A) (Nei et al. 1983) were calculated using the MSA V.4.05 (Dieringer and Schlötterer 2003). Neighbour-joining trees were reconstructed based on D S and D A, after 1,000 bootstrap re-samplings of the original data, using the PHYLIP package (Felsenstein 2005).

Hierarchical analysis of molecular variance (AMOVA) was used to study the genetic discontinuities among geographical areas. Populations were partitioned according to five scenarios, as shown in the Results section. The partitioning of genetic variation among and within regions and within populations (quantified through the sum of squared size differences) was tested through nonparametric permutation procedures with 10,000 iterations using ARLEQUIN V.3.0 (Excoffier et al. 2005).

Inference of population demography was carried out with the program GENECLASS V.2.0 (Piry et al. 2004), which was used to assign or exclude reference populations as possible origins of individuals on the basis of multilocus genotypes. The program calculates, for each individual of a population, the probability of belonging to each other reference population or of being a resident of the population where it was sampled. The standard criterion described by Rannala and Mountain (1997), which applies Bayesian statistics to compute probabilities, was used. The additional parameters chosen included (1) the simulation algorithm for population assignment described in Paetkau et al. (2004), (2) simulation of 10,000 genotypes for each population and (3) an arbitrary threshold probability value of 0.01 to determine the origin. In order to obtain detailed information on the routes of invasion and the colonization histories, we designed four sets of evolutionary scenarios and analysed them with ABC methods using the program DIYABC v.2.0.3 (Cornuet et al. 2014). We used prior distributions of demographic parameters as described in Table 2. The scenarios were defined on the basis of the scattered historical information and of the results of the cluster analysis. The estimation of the time of events (in number of generations back in time) was based on the consideration that B. dorsalis s.s. can complete from five to ten generations/year in most tropical areas, but fewer than four generations/year in most subtropical areas (Fletcher 1989; Shi et al. 2005; Ye and Liu 2005; Liu et al. 2011). The current effective population size was assumed to be the same for all the sampled populations, using a uniform distribution bounded between 10 and 100,000 diploid individuals. The number of founder individuals for each colonization event was described as Nb, and its size was drawn from a uniform distribution bounded between 2 and 500 individuals. Moreover, a bottleneck of duration db was assumed to follow each introduction event, since introduced populations might take several generations to establish their population size; this parameter was therefore set from one to ten generations. A total of 11 scenarios were considered (in four tests) to test the hypothesis that the invasion routes of B. dorsalis s.s. originated in South East China (i.e., China-Fj) and propagated to South East Asia (i.e., Thailand-U, Thailand-R, Cambodia, and Laos). For each of the 11 scenarios we simulated 10^6 datasets. To compute the posterior probabilities of the competing scenarios, we used the 1 % of the simulated datasets closest to the observed data to estimate the relative posterior probability (with 95 % CI) of each scenario with a logistic regression (Cornuet et al. 2014). For each set of scenarios, the most likely scenario was the one with the highest posterior probability value and a 95 % confidence interval not overlapping those of the other scenarios. Confidence in scenario choice was assessed by evaluating Type I and Type II error rates, following the method described in Cornuet et al. (2010).
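
The logic of scenario comparison can be conveyed with a deliberately simplified Python sketch. It draws the three parameters from the uniform priors stated above and then ranks simulated datasets by their distance to the observed summary statistics, keeping the closest fraction. Note the assumptions: the sketch reports the plain frequency of each scenario among the retained simulations, which is a simpler stand-in for the logistic regression used in this study; it uses an unscaled Euclidean distance; and the function names, scenario labels and summary vectors are hypothetical. It does not reproduce DIYABC.

```python
import math
import random
from collections import Counter


def draw_priors(rng):
    """Draw one parameter set from the uniform priors described in the text."""
    return {
        "Ne": rng.randint(10, 100_000),  # current effective size (diploid individuals)
        "Nb": rng.randint(2, 500),       # number of founders per colonization event
        "db": rng.randint(1, 10),        # bottleneck duration (generations)
    }


def scenario_support(simulated, observed, fraction=0.01):
    """Share of each scenario among the simulations closest to the observed data.

    simulated is a list of (scenario_label, summary_vector) records.
    """
    keep = max(1, int(len(simulated) * fraction))
    ranked = sorted(simulated, key=lambda record: math.dist(record[1], observed))
    counts = Counter(label for label, _ in ranked[:keep])
    return {label: n / keep for label, n in counts.items()}


rng = random.Random(42)
print(draw_priors(rng))

# Hypothetical summary statistics: scenario "A" lies near the observed point, "B" far away.
sims = [("A", (0.1 * i, 0.0)) for i in range(100)]
sims += [("B", (10.0 + i, 5.0)) for i in range(100)]
print(scenario_support(sims, (0.0, 0.0), fraction=0.05))
```

In a real analysis each record would come from a coalescent simulation under the drawn parameters, and the summary statistics would be rescaled before distances are computed.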

Table 2 Definition and prior distributions of parameters used in the definition of the 11 tested scenarios in ABC method

Results

Intrapopulation genetic diversity

Overall, the nine microsatellite loci considered, scored in 224 flies across the 12 localities, appear to be very informative, with a mean Polymorphic Information Content (PIC) estimate of 0.745. Indeed, these loci display high levels of polymorphism, both in terms of number of alleles (from 15 to 26) and allele size range: Bd1: 18 alleles (87–107 bp); Bd7: 15 alleles (76–126 bp); Bd9: 26 alleles (118–184 bp); Bd15: 17 alleles (163–189 bp); Bd19: 19 alleles (142–175 bp); Bd39: 16 alleles (70–89 bp); Bd42: 21 alleles (147–188 bp); Bd54: 20 alleles (91–122 bp); Bd85B: 20 alleles (61–86 bp). Results of Fisher’s exact test with the sequential Bonferroni correction (Rice 1989) revealed that the majority of populations conformed to Hardy–Weinberg equilibrium (HWE) at most loci. The locus/population combinations that departed from HWE were not concentrated at any locus or in any population. Significant linkage disequilibrium was not detected between genotypes at the nine loci.

The overall genetic variability for each population is summarized in Table 3. China (especially China-Fj and China-Gx) and Thailand (especially Thailand-R) present relatively high levels of genetic variability on the basis of the number of alleles (n a), allelic richness (R S), median variance of allele size (V m), and expected heterozygosity (H E). Rare alleles (frequency less than 0.05) and private alleles were detected in all populations, ranging from 11 to 24 and from 1 to 13, respectively. Within China, the China-Fj population showed the highest number of private alleles (9) with a relatively low average frequency (0.06 ± 0.04). The lowest number of private alleles (1) was observed in the China-Hu and China-Gx populations, although the China-Hu private allele displays a fairly high frequency (0.13). China-Gd, with only two private alleles, is the population with the highest frequency of private alleles (0.21 ± 0.27). Outside China, the highest number of private alleles (13) was observed in Thailand-R and the lowest number (3) was detected in Myanmar, Laos and Cambodia, with relatively low frequencies (0.03–0.07). A deficiency in the average level of observed heterozygosity was detected in several populations. This observation may be related to the high number of rare alleles present in almost all populations, although it could also be associated with a Wahlund effect and/or with the presence of null alleles. The frequency of null alleles was estimated in all populations and ranges from 0.07 to 0.20 (Table 3).

Table 3 Genetic variability estimates in field-collected samples of Oriental fruit fly Bactrocera dorsalis s.s. from East Asia

Population structure

Genetic differentiation among populations was measured by the fixation index F ST (Table 4). The pairwise F ST values among the sampled populations range from 0.000 (Thailand-U and Laos) to 0.272 (China-Gd and Myanmar). For several pairs of populations, F ST values are not significantly different from zero. It is noteworthy that the Chinese island sample (China-Ha) displays non-significant estimates not only with two southern China samples, China-Fj and China-Gx, but also with Cambodia. This last population, in turn, shows non-significant differentiation from the other south-eastern populations. An additional case of non-significant differentiation is between the eastern China region (China-Fj) and Laos (F ST = 0.034). The most differentiated Chinese samples are China-Gd, China-Yn and Taiwan, with all F ST values significantly different from zero. Myanmar differs significantly from all the other populations, with the highest F ST value with China-Gd (0.272) and the lowest with China-Yn (F ST = 0.060). The analysis of isolation by distance (IBD) yields a significant correlation (R = 0.33, P = 0.045) between geographical and genetic distances (Fig. 1). However, the Thailand-U/Laos pair might have an impact on the correlation, as these two populations present both the closest geographic distance and the smallest genetic distance (F ST = 0.000). If either Laos or Thailand-U, or both of them, are excluded from the analysis, the correlation between geographical distance and genetic distance becomes non-significant (R No Laos = 0.30, P = 0.084; R No Thailand-U = 0.30, P = 0.084; R No Laos and Thailand-U = 0.31, P = 0.110).

Table 4 Pairwise-F ST values among 12 population samples of B. dorsalis s.s. as derived from MICROSATELLITE ANALYSER (Dieringer and Schlötterer 2003)
Fig. 1

Regression of F ST values (F ST/(1 − F ST)) on geographic distances (Ln distance in Km) among the 12 East Asian samples of Bactrocera dorsalis s.s. The highlighted point represents the relationship between Laos and Thailand-U. See text for discussion

Principal Coordinate Analysis (PCoA) was performed as shown in Fig. 2. The first three axes of the PCoA explained a relatively high amount of the genetic variation (79 %). The first axis (44 %) separates China-Yn and Myanmar from the other populations. The second axis (20 %) mainly differentiates the remaining populations into two clusters: all Chinese samples versus Taiwan and Southeast Asia. In this representation the isolated position of China-Gd with respect to the third axis (15 % variation) is interesting.

Fig. 2

Three-dimensional plot of principal coordinate analysis (PCoA) based on similarity matrix derived from B. dorsalis s.s. microsatellites data

The population structure of the 12 populations can be described in more detail when individual genotypes are considered using the Bayesian cluster analysis. In the STRUCTURE analysis, the ancestry of each individual is attributed to one of the likely hypothetical original populations, each defined as a cluster (K). The Evanno et al. (2005) method indicated 2 as the optimal K value (Fig. 3), as Delta K dropped sharply at K = 3 and climbed slightly at K = 6. At K = 2, cluster 1 comprises China-Yn and Myanmar and cluster 2 all the other ten populations (Table 5); the same population subdivision is seen along the first principal axis of the PCoA. In Fig. 3, the most probable K above the optimum of 2 is 6, which also corresponds to the number of clusters indicated by the plateau of Ln P(X/K) (Pritchard et al. 2000; Falush et al. 2003). Thus the genotype of each fly was assigned, entirely or in part, to each of six clusters with a certain probability value (Table 5). The Chinese ancestry is fragmented across all six clusters. Considering the highest proportion of membership, two clusters are centred in China, i.e., the first with Q China-Gd = 0.865 and the second with Q China-Hu = 0.508 and Q China-Fj = 0.453. The third and fourth clusters are centred in Southeast Asia and share coancestry with China. The fifth cluster is mainly represented by Taiwan (Q Taiwan = 0.705), which shares coancestry with China (especially with China-Ha and China-Gx). In cluster 6 the coancestry is shared only between China-Yn and Myanmar.

Fig. 3

The two curves indicate the most likely number of genetic clusters (K) of B. dorsalis s.s. according to the natural logarithm of the likelihood of the data Ln P(X/K) (Pritchard et al. 2000) and according to the Delta K method (Evanno et al. 2005)

Table 5 Average coefficient of ancestry obtained from a STRUCTURE run with K = 2 and K = 6 for the 224 individuals of B. dorsalis s.s. from the 12 regions

The Neighbour-joining phylogenetic trees reconstructed from the proportion of shared alleles (D S) and from Nei’s distance (data not shown), provided similar topologies. The branching of the trees was consistent with the population coancestry relationships of STRUCTURE, as shown in Fig. 4. The pie charts are related to the partitioning into six groups according to STRUCTURE results.

Fig. 4

Neighbour-joining unrooted tree based on genetic distances derived from the proportion of shared alleles (Bowcock et al. 1994). Numbers at each node indicate the bootstrap values after 1,000 replicates. Only values above 50 % are shown. The six coloured segments in the pies represent the coancestry distribution of the 224 flies in six hypothetical clusters

Tests of homogeneity among the populations were performed using AMOVA. Populations were partitioned following five criteria (Table 6). Significant differences among groups were observed only when groups were those corresponding to the STRUCTURE and tree results (cases 3, 4, and 5).

Table 6 Analysis of molecular variance (AMOVA)

Demographic inference

Assignment test

The assignment rates estimated using GENECLASS 2.0 are presented in Table 7. Values on the diagonal indicate the average probability with which individuals were assigned to the corresponding reference population. The probability values of self-assignment ranged from 0.914 for China-Fj to 0.519 for Laos. The highest average assignment value from one population to the others (column values) is for the eastern region of China, China-Fj, with an overall contribution of 0.368 ± 0.137. On the other hand, the migration rate from other populations into a population (row values) has the highest value in Taiwan (0.162 ± 0.157). China-Fj shares its genomes with all populations, especially with China-Hu (0.619). Likewise, Thailand-R mainly shares its genetic composition with Southeast Asian populations. Asymmetric assignments, with a predominantly unidirectional migration from one population to the others, were detected in two Chinese populations, China-Fj and China-Gx, and in Thailand-R.

Table 7 Average assignment rate (m) of individuals into (rows) and from (column) each population as calculated by GENECLASS2 (Piry et al. 2004)

Unravelling the colonization scenario of B. dorsalis s.s. using ABC method

In order to unravel the routes of invasion and colonization, four sets of evolutionary scenarios were tested. As shown in Table 8, in each of the four tests a single scenario clearly dominated the others in terms of posterior probability. In each test, the most likely scenario is the one starting from China-Fj, from which founder events led to the formation of the South East Asia populations. The Cambodia population (tests 1 and 2) is formed by introduction from China-Fj through China-Ha and/or through Laos. A higher value of posterior probability is shown in test 3, where China-Fj is connected to Thailand-U through Cambodia. In test 4, seven populations are taken into account to obtain an overall picture. The best scenario is that shown in Fig. 5, in which, starting from China-Fj, China-Ha is a bridge to the South East Asian populations.

Table 8 Type-I and Type-II error rates for simulated data
Fig. 5

Most likely scenario of test 4 using the Approximate Bayesian Computation (ABC) method. The Y-axis indicates the estimated time of events (not to scale). The details of this scenario are described in Table 8

We found that divergence times between populations, estimated by the median values of the posterior distributions, are reasonably congruent with the historical records. If B. dorsalis s.s. in the sampled areas can complete ten generations per year, China-Ha first diverged from China-Fj during 1918–1920. In the following approximately 20 years, the China-Ha population became established. Regarding the estimated time period, it is noteworthy that the presence of this fly was recorded on this island in 1934–1937 (Xie 1937). In a relatively short time (between 671 and 679 generations ago) the colonization of South East Asia began (i.e., Thailand-R, Thailand-U and Cambodia). Laos appears to have been recently colonized from Thailand-U (1999–2001). Type-II error rates of all the most probable scenarios are low (wrong assignments ranging from 4.6 to 7.6 %). Likewise, type-I error rates are reasonably low (ranging from 3.6 to 7.6 %) (Table 8).
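
The conversion from generations to calendar years that underlies these dates is simple arithmetic, sketched below in Python. The function name is hypothetical, and the sampling year of 2006 is an assumption based on the 2006–2007 collection period; the reference year actually used for the dating is not stated in the text.

```python
def event_year(generations_ago, generations_per_year, sampling_year):
    """Calendar year of an event dated in generations before sampling."""
    return sampling_year - generations_ago / generations_per_year


# 671 generations before an assumed sampling year of 2006.
print(event_year(671, 10, 2006))  # ten generations per year
print(event_year(671, 5, 2006))   # five generations per year
```

The second call shows how strongly the calendar dates depend on the assumed number of generations per year: halving the rate doubles the elapsed time.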

Discussion

An important key to explaining the spatial heterogeneity in terms of genetic variability and population relationships is that B. dorsalis s.s. attacks fruits of high economic value, in addition to non-agricultural plants (Clarke et al. 2005). The genetic variability of populations can thus be influenced by the presence of intensive crop cultivation, markets and the use of pesticides, while the fly’s diffusion, and consequently the population structure, may be strongly shaped by the intensity of human trade and mobility.

Genetic diversity and breeding habitat

A consistent level of genetic variability is observed both in China and in Southeast Asia (in Thailand), where the majority of the sampled localities are characterized by the presence of extensive cultivations of host fruits, rich markets, and an overall favourable climate for population growth and expansion. Examples are the Fujian sample from China and Ratchaburi in Thailand. The Fujian area has a highly developed fruit crop industry and, as an area of fresh fruit export, it ranks first in China with excellent enterprises for national and international markets with convenient transportation to Southeast Asia and other Chinese areas (http://www.unapcaem.org/Activities%20Files/A22/p50_FruitChina.pdf). Fujian ranks among the advanced regions for its contribution to the development of an environmentally friendly fruit market with strong limitations on insecticide use. Ratchaburi province in Thailand is also one of the biggest tropical fruit growing and trading areas in Southeast Asia (Aketarawong et al. 2007). The Guangdong sample, which was collected in one of the most cultivated provinces of China (Li 2009), lying on the main Chinese trade routes for fruit import/export (Biosecurity Australia 2009), would also be expected to be highly variable. On the contrary, this sample, China-Gd, has a relatively low level of genetic variability and appears to be genetically isolated from the geographically close eastern group of Chinese populations (see F ST, PCoA, STRUCTURE). Moreover, the only two private alleles detected in this population have very high frequencies. It is noteworthy that a high level of pesticide resistance (Jin et al. 2011) is registered in this region, reflecting selection consequent to pesticide exposure. Therefore, the genetic peculiarity of the China-Gd sample may be the result of the intensive use of pesticides in Guangdong.

Dynamics of B. dorsalis s.s. in East Asia

The neighbour-joining tree, PCoA, STRUCTURE and ABC analyses clearly show that genomes are shared between China and Southeast Asia, and there is also evidence that there is no isolation by distance between the populations from these two areas. To highlight the genetic connection between China and Southeast Asia and possibly to infer the direction of migration, we combined the results of two specific genetic analyses, i.e. F ST and GENECLASS. In Fig. 6, which represents the population sampling sites, the lines connecting the populations are based on non-significant F ST values, while the direction of each arrow is deduced from the migration rates estimated by GENECLASS. This picture may represent the outcome of propagule pressure, i.e. the number of introduced individuals and/or the number of independent introductions, which could have been responsible for the present population structure. From this perspective, the rich fruit growing area of China-Fj appears to have played a crucial role in distributing genomes not only within China, but also in Southeast Asia. A role in this spreading process is played by the island of Hainan (China-Ha), which appears to be a node towards both inland China and Southeast Asia. This connection is also supported by the ABC analyses (Fig. 5) and by the coincidence between the estimated date of the founder event (1937–1939) and the date of the first observation on Hainan island (China-Ha) (1934–1937, Xie 1937). Hainan’s moist tropical climate and the intensive plantations of hosts preferred by the fly (Li et al. 2012) allowed, over time, the establishment of large populations of this pest, which achieved medium/high levels of genetic variability (Table 3). On the other hand, the intensive trade of fruits to major mainland regions could have provided a portal for the introduction of the fly and for the maintenance of gene flow toward Southeast Asia. Indeed, as shown in tests 1 and 4 of the ABC analysis and in Fig. 6, the two Chinese samples from Fujian and Hainan are tightly related to Laos and Cambodia: this result suggests that Laos and Cambodia constitute a corridor, through China-Ha island, for the spread of B. dorsalis s.s. towards Southeast Asia. In turn, Southeast Asia appears to be an area with a high rate of internal gene flow in which the populations form a homogeneous group characterized by high levels of variability and a distinct molecular variance (Aketarawong et al. 2007). Indeed, in this area there is no geographical isolation of populations and there is intensive fruit crop planting without commercial restriction of tropical fruits among and within Thailand, Laos and Cambodia. Instead, where political events have limited human mobility and the exchange of merchandise, the genetic variability of this species is shaped by genetic isolation. In this context, a major separation is represented by the Southeast Asian population of Myanmar, which shares alleles only with the closest Chinese population, China-Yn, as the Chinese border areas are the only commercial partners of Myanmar. Yunnan is one of the provinces highly infested by the fly but, due to the eco-geographic conditions of this area, the habitat of the fly is highly fragmented (Shi et al. 2005). The Yunnan sample was collected in the Kunming area, which is an important trade centre serving as a transportation hub linking Southwest China to Myanmar by road. Kunming is a significant horticultural centre located on a high plateau on which, due to the particular geographical and climatic conditions, B. dorsalis s.s. has only two generations per year, but with considerable genetic overlap due to the continuous arrival of immigrant flies during the summer months (Ye and Liu 2005). These conditions explain both the genetic variability of this fly in this area and the genetic separation of the China-Yn sample from the other Chinese samples. Similar conclusions are obtained by mitochondrial DNA analyses (Li et al. 2012).

Fig. 6 Geographic representation of the dynamics of B. dorsalis s.s. in East Asia. The lines connecting the sampling sites are based on non-significant FST values, while the direction of each arrow is deduced from the migration rates (greater than 10%) estimated by GENECLASS. The dashed line connecting Myanmar with China-Yn is related to the lowest (but significant) FST value among those related to the two populations. The arrow indicates the high contribution (m = 0.412) of China-Yn to Myanmar.

Taiwan, which appears isolated in Fig. 6, is characterized according to the GENECLASS data by fly immigration mainly from Chinese districts, especially Fujian. Consistent with this, Fujian has cooperated with Taiwan since ancient times to develop intensive fruit agriculture.
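
The rule used to draw Fig. 6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in this study: the function name `build_connection_map`, the input dictionaries and every numeric value in the toy input are hypothetical. Only the 10% migration-rate threshold is taken from the figure legend. A link is drawn between two populations when their pairwise FST is non-significant, and an arrow from source to recipient is added when the estimated migration rate exceeds the threshold.

```python
from itertools import combinations

MIGRATION_THRESHOLD = 0.10  # from the Fig. 6 legend: rates greater than 10%


def build_connection_map(populations, fst_significant, migration_rate,
                         threshold=MIGRATION_THRESHOLD):
    """Return (links, arrows) following the rule described for Fig. 6.

    fst_significant: dict mapping frozenset({a, b}) -> bool, True when the
        pairwise FST differs significantly from zero.
    migration_rate: dict mapping (source, recipient) -> estimated rate.
    """
    links = set()
    for a, b in combinations(populations, 2):
        # Pairs missing from the table are treated as significant (no link).
        if not fst_significant.get(frozenset((a, b)), True):
            links.add(frozenset((a, b)))

    # Keep only directed migration estimates above the threshold.
    arrows = {
        (src, dst): rate
        for (src, dst), rate in migration_rate.items()
        if rate > threshold
    }
    return links, arrows


# Hypothetical toy input, NOT values from this study.
pops = ["A", "B", "C"]
fst_sig = {
    frozenset(("A", "B")): False,  # not differentiated -> link drawn
    frozenset(("A", "C")): True,
    frozenset(("B", "C")): True,
}
rates = {("A", "B"): 0.25, ("B", "A"): 0.05, ("A", "C"): 0.15}

links, arrows = build_connection_map(pops, fst_sig, rates)

# Populations with no FST-based link look isolated on the map.
isolated = [p for p in pops if not any(p in link for link in links)]

print(sorted(tuple(sorted(link)) for link in links))
print(arrows)
print(isolated)
```

In this toy input, population C has no FST-based link and therefore looks isolated, yet it still receives an arrow from A. This is the same situation described above for Taiwan, where the two criteria give complementary information.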

About the home range of B. dorsalis s.s.

When diffusion is mainly due to step-by-step natural migration, successive founder events produce a gradient of ancestral diversity loss along the expansion axes, which permits the identification of the original home of the species (Malacrida et al. 2007). This is not the case for the expansion of the oriental fruit fly in East Asia, as the overall picture seems to be masked by strong propagule pressure driven by human activities. Indeed, none of the considered East Asian populations is markedly genetically depauperate, as the large number of founding members may have been supplemented by continued migration events via commercial traffic.
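
The contrast between the two scenarios can be illustrated with a short Python sketch. It assumes a single locus and uses invented allele counts, so it is a simplified illustration rather than the analysis performed here; real data sets average diversity over many loci. The helper names `expected_heterozygosity` and `shows_diversity_gradient` are hypothetical. Gene diversity is computed with the standard expression He = 1 - sum(p_i^2), and the populations are assumed to be listed in order along a putative expansion axis.

```python
def expected_heterozygosity(allele_counts):
    """Gene diversity at one locus: He = 1 - sum(p_i ** 2)."""
    total = sum(allele_counts)
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


def shows_diversity_gradient(he_along_axis, tolerance=0.0):
    """True when diversity never rises by more than `tolerance` between
    consecutive populations and is lower at the end than at the start."""
    steps = zip(he_along_axis, he_along_axis[1:])
    no_increase = all(later <= earlier + tolerance for earlier, later in steps)
    return no_increase and he_along_axis[-1] < he_along_axis[0]


# Hypothetical allele counts, ordered from putative source outwards.
# Stepwise spread: alleles are lost at each successive founder event.
stepwise = [[10, 10, 10, 10], [20, 10, 10], [30, 10], [38, 2]]
# Human-mediated spread: many founders and repeated introductions
# keep all alleles at similar frequencies everywhere.
human_mediated = [[10, 10, 10, 10], [12, 9, 10, 9],
                  [9, 11, 10, 10], [10, 10, 11, 9]]

he_stepwise = [expected_heterozygosity(c) for c in stepwise]
he_human = [expected_heterozygosity(c) for c in human_mediated]

print(shows_diversity_gradient(he_stepwise))
print(shows_diversity_gradient(he_human))
```

With the stepwise counts the function reports a gradient, whereas with the human-mediated counts diversity stays nearly flat and no direction of spread can be read from it.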

Nevertheless, specific analyses suggest that the probable place of origin of B. dorsalis s.s. in East Asia is the region of Southeast China. Indeed, the STRUCTURE analysis shows that four of the six clusters (clusters 1, 2, 5 and 6) are centred in China. Further evidence derives from the ABC and GENECLASS analyses, in which the southeastern population China-Fj shares the greatest proportion of its genomes with all the other Chinese and Southeast Asian populations, as shown in Table 7 and depicted in Figs. 5 and 6. Therefore, China-Fj appears to represent, within East Asia, an ancestral population pertaining to the home range of the species and probably the most ancient among the analysed populations.

Conclusion

In conclusion, this paper describes the different forces that affect the invasion process in East Asia. It complements and integrates previous studies, which were mainly devoted to inferring the colonization route and the native region of the species (Wan et al. 2011, 2012; Shi et al. 2012). Our data validate our previous results (Aketarawong et al. 2007) and agree with Wan et al. (2011, 2012) and Shi et al. (2012) in indicating Southeast China as a possible source of invasion of this pest. We provide a scientific background that clearly indicates the prevalent direction of the expansion pattern in East Asia. In this area, genetic variability, population structuring and dispersal appear to be influenced by humans, but factors such as breeding/habitat size and the life history traits of the species have also influenced the post-introduction phases and expansion. In this context, it is relevant that B. dorsalis s.s. is a species with K-strategy traits, which may have aided its establishment and spread (Fletcher 1989), even in saturated habitats. Indeed, in all recorded cases, B. dorsalis s.s. has prevailed over other species in the exploitation of food resources and in interspecific competition and displacement (Duyck et al. 2004).

From the perspective of management strategies for this pest in East Asia, the high genetic interconnectivity and the high migration rates among populations argue for Area-Wide Integrated Pest Management (AW-IPM) strategies (Hendrichs et al. 2007) that take into account the spatial and temporal distribution of the species together with ecological and genetic information. Intensive insecticide spraying, in addition to causing environmental damage, may affect the structure of populations by inducing differentiation, as we observed in the Guangdong population. As insecticide resistance could be linked to this genetic differentiation, this may in turn affect the application of chemical control methods.