Introduction

The increasing mobility of humans, particularly during the last centuries, has, inadvertently or not, transplanted a large number of animal and plant species from their native habitats to areas which they would otherwise never have reached. One important part of understanding biological invasions is reconstructing their history, including the routes of introduction and spread (Estoup and Guillemaud 2010).

Non-marine gastropods are often considered excellent model organisms for biogeographic questions because their low potential for active dispersal conserves information about their origin and distribution better and longer than in vagile species (e.g. Davison 2002; Haase et al. 2003, 2007; Haase and Misof 2009; Pfenninger et al. 2009; Sauer and Hausdorf 2009; Cameron et al. 2010). In order to cover larger distances and to overcome dispersal barriers such as open water, snails depend on vectors. Several patterns of distribution, including spectacular ones such as the occurrence of the genus Balea on the Tristan da Cunha islands (Gittenberger et al. 2006), can only be explained by long distance transportation by birds (e.g. Darwin 1859; Vagvolgyi 1975; Haase et al. 2010a, b). Snails may get caught in the plumage, stick to the feet, or survive passage through the gut (Malone 1965; Rees 1965; Cadée 1988; Green and Figuerola 2005). Accidental or deliberate human-mediated long distance transportation of non-marine gastropods is documented for a large number of species (e.g. Ponder 1988; Robinson 1999; Cowie 2001; Hayes et al. 2008), and many of them are considered threats to the local fauna (e.g. Civeyrel and Simberloff 1996; Kerans et al. 2005) or to human health (Pointier et al. 2005; Majoros et al. 2008), or are agricultural pests, including the land snail Theba pisana pisana (Müller 1774) (e.g. Henderson 1996; Barker 2002).

Theba p. pisana, which is highly polymorphic with respect to shell color and banding pattern, is the only representative of the genus with a wide range. It has a circum-Mediterranean distribution and occurs along the Atlantic coast from Morocco northwards to the Netherlands, including some localities in Britain and Ireland. It also occurs on some Atlantic islands including Gran Canaria, Tenerife and Madeira. Biogeographic considerations and a phylogenetic analysis indicate Morocco as place of origin (Sacchi 1955; Gittenberger and Ripken 1987; Greve et al. 2010). The species prefers sparse coastal vegetation and experiences a single reproductive season after 1 or 2 years of development, depending on the climatic conditions of a particular area (Heller 1982; Cowie 1984; Baker 1991; Sacchi 1995). Relatively recently, probably during the 19th and 20th centuries, T. p. pisana was introduced to the USA, South Africa, Australia, and Argentina, where it has gained its reputation as a pest species (Chace 1915; Connolly 1916; Dürr 1946; Pomeroy and Laws 1967; Odendaal et al. 2008; Rumi et al. 2010), although it is known to be harmful also in plantations along the Mediterranean coasts (e.g. Castillejo et al. 1996). In the USA, the species was presumably eradicated but reappeared and now “enjoys” high priority in quarantine (Cowie et al. 2009 and literature therein). Introductions to the Netherlands (van Benthem Jutting 1964; van Dalsum 1964) and the Macaronesian Islands (Gittenberger and Ripken 1987; Cook et al. 1993) are recent as well. However, considering that the Mediterranean area has been one of the cultural cradles of mankind, with extensive trade relations established by the Phoenicians more than 3,000 years ago (Sommer 2005), and that the distribution of T. p. pisana almost perfectly matches the Roman Empire at its maximum extent under Trajan during the early 2nd century (Wells 1995), the spread of this species may have been influenced by human activities much earlier.

The goal of the present paper is the reconstruction of the phylogeography of T. p. pisana. In particular, we asked to which extent it may still be possible to trace natural as opposed to human causes for the current distribution. We also attempted to infer the origins of the apparently recent human-mediated introductions.

Materials and methods

Sampling and laboratory methods

A total of 63 samples from 17 countries comprising 343 specimens were collected and conserved in 99.8% ethanol largely between 2007 and 2009 or provided by museum collections (Table 1, Fig. 1). A part of this material, viz. 227 snails, has already been sequenced for Greve et al. (2010) (GenBank accession numbers of unique haplotypes: HM034556-HM034563, HM034570-HM034603). From another 116 individuals (HQ864649-HQ864687), DNA was extracted from foot muscle tissue using QIAGEN’s DNeasy Blood and Tissue kit following the manufacturer’s protocol. A fragment of the mitochondrial cytochrome c oxidase subunit I (COI) was amplified by polymerase chain reaction (PCR) using the primer pair LCO-1490 [5′-GGTCAACAAATCATAAAGATATTGG-3′ (Folmer et al. 1994)] and C1-N-2191 [5′-CCCGGTAAAATTAAAATATAAACTTC-3′ (Simon et al. 1994)]. Amplifications were carried out in a total volume of 25 μl consisting of 16 μl de-ionized water, 20–100 ng template DNA, 2.5 μl Buffer (Bioline), 2.0 μl 50 mM MgCl2, 1 μl of each primer (10 pm), 0.5 μl dNTP (each 2.5 mM), 1 μl BSA (1%) and 0.1 units of Taq polymerase (Bioline). The PCR conditions were as follows: an initial denaturation step at 95°C for 3 min, followed by 15 cycles of touchdown PCR (94°C for 0:35 min, 55°C–40°C annealing for 1:30 min and 72°C extension for 1:30 min) followed by 25 cycles (95°C for 0:35 min, 40°C for 1:30 min and 72°C for 1:30 min) and a final extension step at 72°C for 10 min. PCR products were purified using ExoSAP-IT. Subsequent cycle sequencing was performed using the BigDyeTerminator v3.1 kit of Applied Biosystems (ABI). The cycle sequencing products were purified with magnetic beads (Agencourt CleanSEQ, Beckman Coulter) and directly sequenced in both directions on an ABI 3130xl Genetic Analyzer.
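The cycling conditions above can be written out as an explicit programme. The following is only a sketch: the text gives the touchdown range (55°C to 40°C over 15 cycles) but not the step size, so a decrement of 1°C per cycle is assumed here, and the function names are illustrative.

```python
# Sketch of the cycling programme described above.
# ASSUMPTION: the touchdown phase lowers the annealing temperature by
# 1 °C per cycle (55 °C in cycle 1 down to 41 °C in cycle 15); the text
# states only the range, not the step size.

def touchdown_annealing(start_c=55.0, end_c=40.0, cycles=15):
    """Annealing temperature for each touchdown cycle, in equal steps."""
    step = (start_c - end_c) / cycles
    return [start_c - i * step for i in range(cycles)]

def build_program():
    """Return the programme as a list of (label, temperature in °C, seconds)."""
    program = [("initial denaturation", 95.0, 180)]
    for anneal in touchdown_annealing():
        program += [("denaturation", 94.0, 35),
                    ("annealing", anneal, 90),
                    ("extension", 72.0, 90)]
    for _ in range(25):
        program += [("denaturation", 95.0, 35),
                    ("annealing", 40.0, 90),
                    ("extension", 72.0, 90)]
    program.append(("final extension", 72.0, 600))
    return program

program = build_program()
annealing = [temp for label, temp, _ in program if label == "annealing"]
```

Under this assumption the programme comprises 40 amplification cycles in total, with the annealing temperature held at 40°C for the last 25.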

Table 1 Sampling localities. Locality data and acronyms
Fig. 1 Map showing sampling localities and definitions of regions. For abbreviations see Table 1

Sequence analyses

Sequences were edited in BioEdit 7.0.5 (Hall 1999) and aligned using MAFFT (Katoh and Toh 2008). The final alignment had a length of 610 bp. The χ² test implemented in PAUP* 4b10 (Swofford 2003), conducted with exclusion of constant sites as proxies of invariant sites (Lockhart et al. 1995), suggested homogeneity of base frequencies (χ² = 75.648, p = 1.000). Saturation of substitutions was assessed with the test of Xia et al. (2003) and by visually inspecting a plot of transitions and transversions against corrected distances (F84). Both tests were conducted in DAMBE (Xia and Xie 2001) and rejected saturation. Analyses of sequence polymorphism were carried out using DnaSP 5.10.0.1 (Librado and Rozas 2009) and Arlequin 3.11 (Excoffier et al. 2005).
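The two polymorphism measures reported in the Results, haplotype diversity and nucleotide diversity, can be illustrated with a minimal sketch. It uses the standard definitions (Nei's gene diversity and the mean proportion of pairwise differences) on a made-up toy alignment; it is not the DnaSP or Arlequin implementation and ignores gaps and ambiguity codes.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's gene diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))

def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Toy alignment (hypothetical, not data from this study)
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
h = haplotype_diversity(toy)
pi = nucleotide_diversity(toy)
```

For the toy alignment, three haplotypes among four sequences give h = 5/6, and seven differences over six pairs of 10 bp give a nucleotide diversity of 7/60.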

Phylogenetic relationships were inferred using maximum parsimony (MP), bio-neighbor joining (NJ), maximum likelihood (ML), and Bayesian analyses (BA). MP and NJ were conducted in PAUP*. For MP, heuristic searches used TBR-branch swapping in 1,000 replicates. Starting trees were constructed by stepwise addition with random addition sequence. For NJ, TPM1uf+Γ+I was determined as best-fitting model with jModeltest (Posada 2008) and for BA, HKY+Γ+I using MrModeltest (Nylander 2004), applying the Akaike Information Criterion corrected for small sample sizes in both cases. BA was conducted with MrBayes (Huelsenbeck and Ronquist 2001). Every 100th tree was sampled over 6 million generations with a burn-in of 15% and otherwise default settings. Convergence of runs and parameters was checked with the diagnostics implemented in MrBayes and in Tracer 1.5 (Rambaut and Drummond 2009). ML was performed with GARLI (Zwickl 2006). This program uses a genetic algorithm estimating the tree and substitution model parameters simultaneously. Robustness of MP, NJ, and ML reconstructions was assessed by bootstrapping with 100, 1,000, and again 1,000 replicates, respectively. Theba p. cantinensis (HM034567-HM034569), T. p. ampullacea (HM034564-HM034566), both confined to Morocco, and the closely related T. andalusica (HM034555) from southern Spain served as outgroup taxa (see Greve et al. 2010). Since intraspecific relationships are often better displayed as networks due to the potential presence of ancestral sequences, we conducted a statistical parsimony analysis using TCS (Clement et al. 2000).
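Model selection by the small-sample corrected Akaike Information Criterion (AICc) can be sketched as follows. The log-likelihoods and parameter counts below are invented for illustration. Taking the alignment length (610 bp) as the sample size is an assumption about how the criterion is applied, not something stated above, and programs differ in whether branch lengths are counted as parameters.

```python
def aicc(log_likelihood, k, n):
    """AIC corrected for small samples; k = free parameters, n = sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def best_model(candidates, n):
    """candidates maps a model name to (log-likelihood, number of parameters)."""
    return min(candidates, key=lambda m: aicc(*candidates[m], n))

# Hypothetical log-likelihoods and parameter counts, for illustration only
models = {"simple": (-5020.0, 5), "rich": (-5000.0, 12)}
chosen = best_model(models, n=610)  # ASSUMPTION: n = alignment length in bp
```

The model with the lowest AICc is preferred; the correction term grows as the number of parameters approaches the sample size.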

In order to compare rates of evolution, a Bayesian relative rates test was performed based on the last 1,000 trees sampled during one of the parallel Bayesian tree searches. For each of these trees, the distance from the most recent common ancestor to each terminal taxon was estimated with the software Cadence (Wilcox et al. 2004). Means and 95%-confidence intervals for each lineage were calculated and plotted using a program written by CD in R (R Development Core Team 2010).
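The summary step of the relative rates test can be sketched as below. This is not the R program mentioned above: how the 95% intervals were computed is not stated, so a central percentile interval over the sampled trees is assumed, and the distances are invented.

```python
import statistics

def summarize_lineage(distances):
    """Mean and central 95% interval of one lineage's MRCA-to-tip distances.

    ASSUMPTION: the interval is taken as the 2.5th to 97.5th percentile of
    the values across the sampled trees.
    """
    ordered = sorted(distances)
    n = len(ordered)
    lower = ordered[(n * 25) // 1000]
    upper = ordered[min(n - 1, (n * 975) // 1000)]
    return statistics.mean(ordered), lower, upper

def intervals_overlap(a, b):
    """True if two (mean, lower, upper) summaries have overlapping intervals."""
    return a[1] <= b[2] and b[1] <= a[2]

# Hypothetical distances for two lineages across 1,000 sampled trees
slow = [0.010 + 0.00001 * i for i in range(1000)]
fast = [0.050 + 0.00001 * i for i in range(1000)]
slow_summary = summarize_lineage(slow)
fast_summary = summarize_lineage(fast)
```

Lineages whose intervals do not overlap, as in this toy case, would be read as evolving at different rates.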

Phylogeographic analyses

For population genetic analyses, localities were grouped into eight regions based on the assumption of isolation by distance as the most natural model of genetic differentiation (see e.g. Arter 1990; Arnaud et al. 2003 and references therein; Fig. 1). The number of individuals with identical haplotypes was restricted to five for each locality. This restriction aimed at compensating for the uneven sampling design both in terms of geography and number of specimens per sample. Individuals from the USA, South Africa and Australia were excluded in order to avoid bias. Thus, a total of 275 individuals remained for the following analyses.
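The restriction to five individuals per identical haplotype and locality can be sketched as a simple filter. The record layout and labels are hypothetical, and keeping the first five individuals encountered is an assumption, since the text does not say which ones were retained.

```python
from collections import defaultdict

def cap_identical_haplotypes(records, max_per_haplotype=5):
    """Keep at most `max_per_haplotype` individuals per (locality, haplotype)."""
    seen = defaultdict(int)
    kept = []
    for individual, locality, haplotype in records:
        key = (locality, haplotype)
        if seen[key] < max_per_haplotype:
            seen[key] += 1
            kept.append((individual, locality, haplotype))
    return kept

# Hypothetical records: (individual id, locality acronym, haplotype label)
records = [(f"X1_{i}", "X1", "hapA") for i in range(8)]
records += [("X1_8", "X1", "hapB"), ("X2_0", "X2", "hapA")]
kept = cap_identical_haplotypes(records)
```

In this toy case, eight copies of one haplotype at locality X1 are reduced to five, while the rarer haplotype and the second locality are left untouched.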

Genetic differentiation of regions was expressed by FST-values calculated in Arlequin. Linearized FST-values [FST/(1 − FST), Slatkin 1995] were used to display the differentiation by non-metric multidimensional scaling (NMDS) performed in PAST 2.0 (Hammer et al. 2001). Correlation of geographic and genetic distances was investigated using Mantel tests, again conducted in Arlequin. These tests were based on individuals and individual localities, with genetic distances obtained in PAUP* and geographic distances (as the crow flies) in ArcGis 9.2 (ESRI, Redlands, California), as well as on regions, using linearized FST-values and distances between centers of gravity and smallest distances between regions, respectively. Coalescence-based methods that do not assume mutation-drift equilibrium and symmetric gene flow, and thus possibly model differentiation and gene flow better than F-statistics, could not be implemented successfully, probably because of the high degree of sequence differentiation exhibited by T. p. pisana. To test for neutral evolution and demographic expansion, Tajima’s D (Tajima 1989), Fu and Li’s F* and D* (Fu and Li 1993) as well as Fu’s Fs (Fu 1997) were calculated using DnaSP. Based on these results, the demographic history was analyzed using Bayesian skyline plots (Drummond et al. 2005) generated in BEAST 1.6.0 (Drummond and Rambaut 2007) and Tracer based on the HKY+Γ+I substitution model. We assumed a lognormal uncorrelated relaxed clock and a constant population size coalescent model. The skyline plots were created for the entire dataset as well as for each region; accordingly, population and group sizes varied from 4 and 3 to 8 and 7, respectively. Markov chains ran for at least 20 million generations until effective sample sizes of parameter estimates were >200. We did not use the dates suggested in Greve et al. (2010) for calibration, because AFLP data (Greve et al. unpublished) suggest major modifications to the sequence-based phylogeny of the genus. Sudden spatial expansion was tested for by mismatch distributions using Arlequin based on the model developed by Excoffier (2004). Plots were made with an R routine written by CD. In contrast, the assumptions for testing sudden demographic expansion by mismatch distributions (see Marjoram and Donnelly 1994) were not met and therefore we dispensed with these tests.
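The combination of linearized FST-values and a Mantel test can be illustrated with a generic permutation sketch. It is not Arlequin's implementation. The positions and FST-values are invented and deliberately constructed so that linearization yields a perfectly linear relationship with distance.

```python
import math
import random

def linearize(fst):
    """Slatkin's linearized FST: FST / (1 - FST)."""
    return fst / (1.0 - fst)

def _pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _lower_triangle(m, order):
    return [m[order[i]][order[j]] for i in range(len(order)) for j in range(i)]

def mantel(genetic, geographic, permutations=999, seed=1):
    """One-sided Mantel test: r and the share of permutations with r >= observed."""
    identity = list(range(len(genetic)))
    geo = _lower_triangle(geographic, identity)
    observed = _pearson(_lower_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the rows and columns of one matrix
        if _pearson(_lower_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)

# Hypothetical positions along a coast and made-up FST values
points = [0.0, 1.0, 3.0, 7.0]
geographic = [[abs(a - b) for b in points] for a in points]
fst = [[d / (d + 10.0) for d in row] for row in geographic]
genetic = [[linearize(f) for f in row] for row in fst]
r, p = mantel(genetic, geographic)
```

With only four regions there are just 24 possible relabelings, so the attainable p values are coarse. A high p value, as reported in the Results, means the observed correlation is no larger than expected under random relabeling.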

Results

There were 146 unique haplotypes among the total of 343 sequenced individuals. Identical haplotypes were mostly collected at the same or geographically close localities. Overall haplotype diversity was high at 0.961, and that of regions was at least 0.890 except for Gran Canaria and Madeira, the latter being represented only by a small sample (Table 2). Likewise, overall nucleotide diversity was high at 0.058, but more heterogeneous among regions, with a maximum of 0.081 in Morocco (Table 2). Among the 610 bp, 195 were variable and of these 171 were parsimony informative. Tree reconstructions based on different methods gave very similar results, with slight topological differences only within clades, indicating unambiguous phylogenetic signal. The major clades showed identical relationships and were largely well supported (Fig. 2). Three haplotypes from five morphologically unambiguous T. p. pisana individuals, MAR2_3, MAR3_1-3, and USS_3, fell among the outgroup taxa T. p. ampullacea and T. andalusica. The position of MAR3_1 may well be due to long-branch attraction, although it is geographically consistent among several divergent, basal lineages from Morocco. The basal-most true ingroup clade (clade 4) consisted of snails collected near Rabat in NW-Morocco and two individuals from California. Continuing in a ladder-like fashion, a clade (5) composed of individuals collected in Andalusia, NW-Spain, southern France, Tunisia, Greece and the specimen from a market in Bonn (Germany) was followed by a small, well defined group of haplotypes from Morocco and the Italian island of Sardinia (clade 6). Snails from NW-Spain and the Eastern Mediterranean formed the sister group of the remaining two clades, 8 and 9. Clade 8 consisted of snails from Gran Canaria, Portugal, Cyprus, Brittany (France) and a Moroccan subclade (8a) including again an individual from the United States. In the NJ and MP analyses, this Moroccan subclade was sister group to the remaining sequences. The largest and least structured clade, clade 9, was composed of haplotypes from all eight defined regions. The snails displaced to South Africa had haplotypes identical to individuals collected in the Netherlands and on Corsica (France), respectively. The five specimens from W-Australia shared a single haplotype with two Dutch and three South African specimens.

Table 2 Genetic diversity and neutrality tests (Tajima 1989; Fu and Li 1993; Fu 1997)
Fig. 2 50% majority rule consensus tree from Bayesian analysis with support values for major clades from all four analyses (BI, ML, MP, NJ). Clades are numbered for easier reference in text. For abbreviations see Table 1

The statistical parsimony network analysis reflected the tree topology and resulted in a total of 22 individual networks based on the parsimony connection limit of 95%, 11 of which were individual haplotypes including four belonging to the outgroup (not shown). In order to unite all haplotypes in a single network, the connection limit had to be raised to 66 mutational steps. Only the two most derived clades contained a total of three loops indicating a low degree of ambiguity of evolutionary pathways.

Relative rates of sequence evolution were surprisingly heterogeneous, indicating that a strict molecular clock did not hold (Fig. 3). In general, basal lineages mainly from Morocco and Southern Spain evolved slower than younger ones. The highest substitution rates were again found among Moroccan haplotypes, viz. among representatives from the foothills of the Rif-Mountains. Interestingly, both the slowest (MAR2_3) and one of the fastest (MAR2_1) lineages occurred at the very same locality. These snails had practically identical shell banding patterns. Their uncorrected genetic distance was 12.3%.

Fig. 3 Bayesian relative rates test based on last 1,000 trees sampled in one of the MrBayes runs. Order of haplotypes according to order in alignment; gaps indicate alignment positions of taxa not considered in this test (see Appendix). Black symbols represent distance to most recent common ancestor (MRCA) in each particular tree; grey stars enclosed by light grey symbols denote means and 95% CI. Numbers indicate clades (see Fig. 2)

Genetic variation within regions (56.77%) was higher than between regions (43.33%). All FST-based pairwise comparisons except Iberian Peninsula versus Madeira were highly significant as well (Table 3). Relationships among regions are represented by non-metric multidimensional scaling in Fig. 4. Distortions were moderate as indicated by a stress value of 0.177. A superimposed minimum spanning tree connected Morocco with the Iberian Peninsula and Northern Europe with the Eastern Mediterranean, Southern France, Gran Canaria and Madeira. The distortions in the two dimensional representation of genetic distances suggest that alternative configurations and connections of the regions are not much worse. Therefore, we connected the regions also based on the second lowest pairwise FST-values in addition to the minimum spanning tree (Fig. 5) assuming that both networks together possibly represent the main dispersal routes more comprehensively.
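The two ways of connecting regions described above, a minimum spanning tree and links based on the second-lowest pairwise value of each region, can be sketched as follows. The region names and distances are invented, and Prim's algorithm is used here as one standard way to obtain a minimum spanning tree, not necessarily the routine used in PAST.

```python
def minimum_spanning_tree(names, dist):
    """Prim's algorithm on a symmetric distance matrix; returns a list of edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(names):
        i, j = min(((a, b) for a in in_tree for b in range(len(names))
                    if b not in in_tree), key=lambda e: dist[e[0]][e[1]])
        edges.append((names[i], names[j], dist[i][j]))
        in_tree.add(j)
    return edges

def second_nearest(names, dist):
    """For each region, the partner with the second-lowest distance."""
    result = {}
    for i, name in enumerate(names):
        others = sorted((dist[i][j], names[j])
                        for j in range(len(names)) if j != i)
        result[name] = others[1][1]
    return result

# Hypothetical regions and linearized FST values, for illustration only
names = ["A", "B", "C", "D"]
dist = [[0.0, 0.2, 0.9, 0.8],
        [0.2, 0.0, 0.4, 0.7],
        [0.9, 0.4, 0.0, 0.3],
        [0.8, 0.7, 0.3, 0.0]]
tree = minimum_spanning_tree(names, dist)
extra_links = second_nearest(names, dist)
```

Overlaying both sets of connections, as done in Fig. 5, shows which alternative links are nearly as short as those in the tree.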

Table 3 Genetic differentiation between regions
Fig. 4 Nonmetric multidimensional scaling of regions based on linearized FST-values [FST/(1 − FST)] (Table 3) with superimposed minimum spanning tree

Fig. 5 Hypothesized major routes of dispersal deduced from minimum spanning tree from Fig. 4 plotted on map (black lines) and connections based on second lowest FST-values associated with each region (white lines)

In accordance with these representations, none of the Mantel tests rejected the null hypothesis of no correlation between genetic and geographic distances between regions (p > 0.5 in all cases). Hence, there was no evidence for genetic isolation of regions by geographic distance. This result could already be expected considering the composition of the major clades.

Across the entire alignment, neutral evolution of haplotypes could not be rejected. All test statistics were negative, but non-significant (Table 2). These results also suggested demographic stability in the recent past. Narrowing the focus onto single regions, the picture became more heterogeneous with positive as well as negative values for the four test statistics including significant ones suggesting recent demographic population growth in North Africa and Southern France (Table 2).
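As an illustration of how the sign of such a statistic arises, Tajima's D can be computed from an alignment with the standard formulas of Tajima (1989). The sketch below uses made-up toy alignments, ignores gaps and ambiguity codes, and does not compute significance, for which DnaSP was used above.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from equal-length sequences."""
    n = len(seqs)
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)  # segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    pairs = list(combinations(seqs, 2))
    # k: mean number of pairwise differences
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Toy alignments (hypothetical): only rare variants vs. only balanced variants
singletons = ["TAAA", "ATAA", "AATA", "AAAT"]
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
d_rare = tajimas_d(singletons)
d_balanced = tajimas_d(balanced)
```

An excess of rare variants, as expected after population growth, pushes D below zero, whereas variants at intermediate frequency push it above zero.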

The previous findings, however, did not concur with the Bayesian skyline analyses modeling the demographic development over all regions. The resulting plot indicated a long period of stable population size throughout the first 75% of the evolutionary life time of T. p. pisana. Then the subspecies experienced a slow increase in population size with a peak at about 2% of its life time before present followed by a sharp decline until today. Skyline plots for individual regions basically showed the same shape (Fig. 6).

Fig. 6 Bayesian skyline plots depicting mean effective population size (black line) through time and 95% highest posterior density interval (grey lines)

Mismatch distributions across all regions and for each region separately (not feasible for Gran Canaria and Madeira) could not be distinguished from distributions modeling spatial expansion (Fig. 7).
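The observed side of such a comparison, the mismatch distribution itself, is simply the frequency distribution of pairwise differences. A minimal sketch on a made-up toy alignment follows. The expected distribution under the spatial expansion model of Excoffier (2004) was fitted in Arlequin and is not reproduced here.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences."""
    counts = Counter(sum(a != b for a, b in zip(x, y))
                     for x, y in combinations(seqs, 2))
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(max(counts) + 1)}

# Toy alignment (hypothetical, not data from this study)
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAT"]
observed = mismatch_distribution(toy_alignment)
```

Here one of the six pairs is identical, three differ at one site and two at two sites.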

Fig. 7 Comparisons of observed and expected mismatch distributions to test for spatial expansion (Excoffier 2004)

Discussion

In line with previous findings for several species of land snails (e.g. Thomaz et al. 1996; Pinceel et al. 2005; Haase and Misof 2009) mitochondrial diversity and divergence within T. p. pisana were very high. Greve et al. (2010) suggested assigning separate subspecies status to the basal Moroccan clades. However, the sympatry and morphological identity of the extremely divergent individuals MAR2_1 and MAR2_3 indicated unrestricted gene flow. Similarly, the morphologically well defined (sub)species T. andalusica and T. p. ampullacea could not be unambiguously delimited based on mitochondrial data. Preliminary AFLP data (Greve et al. unpublished) confirm this pattern of incongruence between morphological and genetic data difficult to handle taxonomically. Therefore, the extreme divergence and diversity were probably primarily due to the hermaphroditic nature of T. p. pisana (see Haase et al. 2003; Davison 2006). The deep clade structure was most likely a consequence of lineage sorting (Hoelzer et al. 1998), although we cannot exclude that our sampling missed other still extant lineages. The likewise high heterogeneity of substitution rates is so far unparalleled among land snails. Interestingly, older lineages seemed to evolve slower than younger ones. We do not believe that this is an artifact due to saturation of substitutions, because visual inspection of plots of transitions and transversions against genetic distance as well as the test of Xia et al. (2003; data not shown) indicated that saturation was negligible. Similar observations have recently been made for birds (Nabholz et al. 2011).

The tree reconstructions confirmed the origin of T. p. pisana in NW-Africa (Greve et al. 2010) and suggested a fairly long history of local differentiation, also expressed by the highest gene diversity among the eight regions defined, before the range was extended onto the Iberian Peninsula. Plate tectonic, i.e. vicariance, events in the Western Mediterranean evidently did not play a role for the present distribution of T. p. pisana, in contrast to what has been shown or assumed for other land snails with at least partly overlapping distributions (Pfenninger et al. 2009; Guillier and Madec 2010). The spread across its current circum-Mediterranean range occurred only in the more recent past. However, a pattern of isolation by distance or a more linear stepping-stone arrangement of populations or regions, as one would expect for a naturally dispersing land snail (Arter 1990; Arnaud et al. 2003 and literature cited therein), could not be detected. In contrast to the test statistics for neutrality, the Bayesian skyline plot indicated a demographic expansion after about 75% of the lifetime of T. p. pisana, probably in the late Pleistocene considering substitution rate estimates for land snails in the absence of suitable calibration points (see Guillier and Madec 2010 and references therein). The most recent drop of effective population size probably masked the foregoing increase for the statistical tests for demographic expansion, which largely yielded non-significant p values. If the demographic expansion was linked with the spatial expansion, the signal of which was still detectable in the mismatch distributions, T. p. pisana started to spread naturally long before the ancient civilizations were established. The lack of pattern indicates, however, that long distance dispersal must have played a major role in conquering the Mediterranean coasts. Although birds are known to consume species of Theba (Mienis 1971, 1978; Gittenberger and Ripken 1987; Hutterer personal observation), it seems unlikely that they were chiefly responsible for the current distribution. The genetic differentiation among regions suggested important North–South connections paralleling the major flyways (e.g. Berthold 2001). However, “Northern Europe” can hardly be considered a hub for the dispersal of T. p. pisana, as Figs. 4 and 5 would suggest, because we know that T. p. pisana was deliberately introduced to northern France, Belgium and the Netherlands only during the last 120 years (Dutertre 1927; Adam and Leloup 1937; van Benthem Jutting 1964; van Dalsum 1964). Only in the British Isles was the species already present in the 18th century (Pennant 1777; Turk 1966, 1972). The West-East tracks indicated in Fig. 5, on the other hand, do not coincide with any important flyway, again disqualifying birds as long distance vectors. The lack of any fossil record in Israel prompted Heller and Tchernov (1978) to assume that T. p. pisana reached the Eastern Mediterranean area only in historical times. However, Holocene records from Malta (Giusti et al. 1995) do suggest an earlier spread towards the East. Recently, Jesse et al. (2011) demonstrated that the human Neolithic expansion across the Western Mediterranean area had already shaped the distribution of another land snail, Tudorella sulcata.

The detours a snail invasion can take are illustrated by the sample intercepted by quarantine in St. Paul, Minnesota (USA). This sample was collected on mint plants shipped in a container from Iran, which is not part of the native range, either. Our phylogenetic analyses indicated a Moroccan origin. The two individuals sequenced appeared in very divergent clades suggesting that already the ancestors in Morocco had a turbulent history. Also the Californian sample could be traced to Morocco. The genetic identity of most South African and the Australian specimens with an individual from the Netherlands seems to indicate an unambiguous route of translocation reminiscent of the Netherlands as colonial and trading power. However, this conclusion is misleading since T. p. pisana has been introduced to the Netherlands only in the middle of the 20th century (van Benthem Jutting 1964; van Dalsum 1964) when it was already present on the Southern hemisphere (Connolly 1916; Dürr 1946; Pomeroy and Laws 1967). One haplotype was shared by a South African specimen and a sample from Corsica. Interestingly, this haplotype was sister group to another Dutch specimen and both were closely related to snails collected in Montpellier/Southern France. Therefore, Southern France may be a better guess for the origin of both the Dutch—via Northern France and Belgium (Van Benthem Jutting 1964; van Dalsum 1964)—as well as the South African and Australian populations. Our single sample from Western Australia is certainly not representative for the entire continent, where T. p. pisana occurs more or less continuously along the entire South coast. Hence, we cannot exclude several independent invasions of Australia. Interestingly, based on allozyme data including samples from southern France, Israel and Wales, Johnson (1988) already assumed a southern French origin of the Australian populations, speculating about a “stop-over” in South Africa.

Clade 9 was geographically particularly diverse suggesting that different lineages differ in their adaptive and invasive potential. This has been observed only for few other species (Kelly et al. 2006; Kang et al. 2007; Ciosi et al. 2008). However, as in T. p. pisana, experiments comparing the ecological valences of clades are lacking.

In conclusion, apart from the origin in NW-Africa, the territory of today’s Morocco, where T. p. pisana already underwent considerable differentiation, and an initial spread to the Iberian Peninsula, the phylogeographic signal is largely blurred. All available evidence indicates that primarily human activities rather than natural causes have shaped and still shape the current distribution. A denser sampling design, although it would require a tremendous effort to achieve, might reveal a few more details. Similarly, non-equilibrium approaches in population genetics, once adapted to divergent lineages such as those found in many species of land snails, will certainly refine the picture. However, although the picture may become more complex, a fundamental change of the general scenario outlined in this study is not to be expected.