Introduction

Ecological niche modelling and climate matching have recently gained momentum for predicting potential invasions of biological entities. These methods assume that a climate-derived ecological niche can be described as the n-dimensional hypervolume of environmental conditions under which a species is able to maintain populations without immigration. Species’ invasions outside their native range have almost exclusively been predicted with this approach, using locations from the native range where the species is confirmed to exist. However, using native-range locations assumes that the same factors determine the distribution of the species in the invaded range.

Amblyomma variegatum is one of the most important ticks of livestock. It is widely distributed south of the Sahara and has successfully established outside the African continent in Yemen, on various islands of the Indian Ocean, and in the Caribbean (Walker and Olwage 1987). The tick is also a prominent vector of livestock pathogens, perhaps most notably Ehrlichia ruminantium, the agent of heartwater. Its saliva also contributes to the pathogenesis of severe bovine dermatophilosis caused by Dermatophilus congolensis. Both diseases are devastating and have been diagnosed throughout the distribution range of the tick in Africa and in parts of its Caribbean range (Barré et al. 1987). Non-indigenous invasive tick species are a major threat to the animal industry and to parasitic ecosystem function. Predictive models can enable early detection of areas suitable for tick invasion, focusing research efforts on the areas most likely to be impacted.

Intensive chemical control has been applied on several Caribbean islands to stop serious economic losses and to prevent the spread of the tick to the mainland. Previous modelling studies (Sutherst and Maywald 1985) calculated that the tick could colonize a large region of the American mainland. Barré et al. (1987) stated that “this species would find environmental conditions favourable for its survival and spread in most of the tropical and subtropical Western hemisphere” if conditions allowed it to spread. Besides domestic animals, the tick has a wide host range (Petney et al. 1987), and some wild hosts, such as egrets, have been involved in the long-range dissemination of the tick in the Caribbean (Corn et al. 1993). There is great concern about the potential spread of the tick into wider areas from the current Caribbean foci, with domestic or wild hosts acting as disseminating agents (Barré et al. 1995). Such an expansion could allow native Amblyomma species to serve as vectors of heartwater (Barré et al. 1987). Specimens of A. variegatum have been recovered sporadically from areas outside its normal range, commonly carried by egrets.

The purpose of the present study is to investigate the potential area of suitable habitat for A. variegatum in the New World, and whether invaded-range and native-range models can be used effectively to capture the current distribution of the tick. Further analysis is devoted to identifying the variables that determine habitat suitability for the tick in different parts of its distribution range, and to using habitat clustering to further improve the modelling approach. This study is restricted to the evaluation of climate suitability for the tick and does not address the possibility of dissemination, which depends mostly on geographical barriers that are not considered here.

Materials and methods

Tick records

A historical dataset of the distribution of the tick in Africa was previously compiled and published (ICTTD 2004). Additional data for different parts of the continent were provided courtesy of Gerrit Uilenberg and Trevor Peter. Distribution data recorded between 1970 and 2000 were selected from this dataset. Of this subset, 1,642 records could be adequately georeferenced, as shown in Fig. 1. This dataset does not cover the complete known distribution range of the tick, as outlined by Walker and Olwage (1987) or Walker et al. (2004), because many records in the original reports could not be assigned to a given pair of co-ordinates. Records from the Caribbean were compiled from the literature by Alberto Guglielmone and updated by one of the authors (R.P.).

Fig. 1

The known distribution of Amblyomma variegatum in Africa, with the sets of georeferenced records (dots) used to fit the models and the total known distribution developed according to experts’ opinion (grey area)

Climate data

The database of tick records has been checked against a spatially and temporally extensive gridded climate dataset that extends from 1950 to 1999, interpolated at a resolution of 2.5 km (available at http://www.biogeo.berkeley.edu). This dataset contains monthly records of temperature (mean, minimum and maximum) and precipitation (monthly total).

Model development and testing

The general steps of the modelling were: (1) build models using the native-range (Africa) dataset, both with the complete set of records and with those derived from clusters (see below), using the area-weighted procedure of Estrada-Peña et al. (2006) for the final habitat suitability (HS) estimate; (2) project the rules obtained in the previous step onto the native range and the invaded range (New World), and evaluate the resulting models against African and Caribbean records; (3) build models using a dataset from the invaded range (Caribbean); (4) project the rules of this model onto the invaded range and evaluate them against the set of confirmed records of the tick in the New World.
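Step (1) combines per-cluster models with the area-weighted procedure, which the Clustering subsection describes as an average of the cluster models’ suitability values weighted by cluster area and tick prevalence. The exact weighting formula is not given here, so the sketch below is only an illustration under an assumption: the weight of cluster c is taken to be its area multiplied by its prevalence (number of occupied 1-km² cells). The function name `area_weighted_hs` is hypothetical.

```python
import numpy as np


def area_weighted_hs(cluster_hs, areas, prevalences):
    """Combine per-cluster HS maps into one map by a weighted average.

    cluster_hs:  (n_clusters, n_cells) HS predicted for every cell by each cluster model
    areas:       area of each cluster (any consistent unit)
    prevalences: number of occupied 1-km2 cells in each cluster
    Weight of cluster c = areas[c] * prevalences[c]  (assumed form, not from the source)
    """
    hs = np.asarray(cluster_hs, dtype=float)
    weights = np.asarray(areas, dtype=float) * np.asarray(prevalences, dtype=float)
    if weights.sum() <= 0:
        raise ValueError("at least one cluster needs positive area and prevalence")
    # Weighted mean across clusters (axis 0), computed separately for every grid cell
    return np.average(hs, axis=0, weights=weights)
```

With this assumed form, a cluster in which the tick was never recorded (prevalence 0) contributes nothing to the final map, which would apply to the NDVI clusters without A. variegatum records.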

Methods that predict species distributions from presence data alone search for an “environmental envelope” characteristic of the points where the species is present, and extrapolate it to the remaining study area (Guisan and Zimmermann 2000). For this study, we used both the Maximum Entropy approach (MaxEnt; Phillips et al. 2006) and the Gower distance (Gower 1971). These methods use different approaches to derive, from the climate layers mentioned above, an index of climate appropriateness for the tick, herein referred to as habitat suitability (HS). We trained the models with a randomly selected half of the tick records, either for the whole of Africa or for each of the clusters detected (see below for details on clusters).
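For continuous variables, the Gower distance between two sites is the mean, over the p climate variables, of the absolute difference divided by the range of that variable. The text does not say how these distances are turned into HS, so the sketch below assumes a DOMAIN-style rule: the HS of a grid cell is one minus its smallest Gower distance to any training record, with variable ranges taken from the whole grid. The random half split mirrors the training design described above; both function names are hypothetical.

```python
import numpy as np


def split_half(records, rng):
    """Randomly split records into a training half and an evaluation half."""
    records = np.asarray(records)
    order = rng.permutation(len(records))
    half = len(records) // 2
    return records[order[:half]], records[order[half:]]


def gower_suitability(grid, presences, ranges=None):
    """HS per grid cell = 1 - minimum Gower distance to any presence record.

    grid:      (n_cells, p) climate values (e.g. monthly rainfall and temperature layers)
    presences: (n_records, p) climate values at the training records
    ranges:    per-variable range R_k; defaults to the range over the grid (assumption)
    """
    grid = np.asarray(grid, dtype=float)
    presences = np.asarray(presences, dtype=float)
    if ranges is None:
        ranges = grid.max(axis=0) - grid.min(axis=0)
    # Avoid division by zero for layers that are constant across the grid
    ranges = np.where(np.asarray(ranges) > 0, ranges, 1.0)
    # Gower distance: mean over variables of |x_k - y_k| / R_k, for every cell/record pair
    scaled = np.abs(grid[:, None, :] - presences[None, :, :]) / ranges
    gower = scaled.mean(axis=2)  # shape (n_cells, n_records)
    return np.clip(1.0 - gower.min(axis=1), 0.0, 1.0)
```

The broadcast (n_cells × n_records × p) array keeps the formula visible but is only practical for small examples; a continental grid would have to be processed in chunks of cells.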

Evaluating model performance first required deriving confusion matrices that identified true positives, true negatives, false positives and false negatives. Positive sites were obtained from the African dataset, and negative sites were generated as pseudo-absence points. When absence data are unavailable or unreliable, discrimination models can be built by generating pseudo-absences (Zaniewski et al. 2002; Lobo et al. 2006). Engler et al. (2004) proposed a strategy for generating pseudo-absences that reduces the chance of selecting absences where the species does in fact occur (but was not detected) or where environmental conditions are suitable even if the species is absent. In their methodology, an envelope enclosing all occurrences in environmental hyperspace is delineated. Pseudo-absences are then randomly chosen outside this envelope and combined with the real occurrences for GLM calibration. The number of randomly chosen pseudo-absences equals the total number of real occurrences, because an optimal threshold is easier to find in this situation. This method reduces the problem of ‘false’ absences and achieves good results for species in which each occurrence record reflects suitable environmental conditions. Pseudo-absences were produced following the methodology proposed by Engler et al. (2004) and also adopted by Titeux (2006).
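The text does not detail how the envelope is delineated, so the sketch below assumes the simplest possible envelope: a rectangular min-max box around the occurrences in climate space, as in BIOCLIM-type models. Cells lying outside the box in at least one variable are candidates, and as many pseudo-absences as occurrences are drawn from them without replacement. The function name is hypothetical.

```python
import numpy as np


def pseudo_absence_indices(grid, presences, rng):
    """Pick as many pseudo-absence cells as there are presences, outside a min-max envelope.

    grid:      (n_cells, p) environmental values for every candidate cell
    presences: (n_presences, p) environmental values at occurrence records
    Returns indices into `grid`.
    """
    grid = np.asarray(grid, dtype=float)
    presences = np.asarray(presences, dtype=float)
    low, high = presences.min(axis=0), presences.max(axis=0)
    # A cell is inside the envelope only if every variable lies within the occurrence range
    inside = np.all((grid >= low) & (grid <= high), axis=1)
    candidates = np.flatnonzero(~inside)
    n = len(presences)  # balanced design: one pseudo-absence per occurrence
    if len(candidates) < n:
        raise ValueError("not enough cells outside the envelope")
    return rng.choice(candidates, size=n, replace=False)
```

A box is deliberately conservative: any cell whose climate falls within the observed range of every variable is never used as a pseudo-absence, which is the property the strategy relies on to avoid ‘false’ absences.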

Threshold-independent measures (e.g., ROC curves) are considered more robust and more objective than threshold-dependent measures (e.g., Kappa statistics) because they do not rely on a single threshold to distinguish between predicted presence and predicted absence. From the confusion matrices we calculated the area under the curve (AUC) of a receiver operating characteristic (ROC) plot of sensitivity against (1 − specificity) (Swets 1988). Sensitivity is the proportion of observed presences correctly predicted as present, whereas specificity is the proportion of absences correctly predicted as absent (Fielding and Bell 1997; Pearce and Ferrier 2000). AUC was obtained with a custom function in S-PLUS. An additional evaluation was performed against the known distribution of the tick in Africa. This “complete” distribution dataset was published at a resolution of 1° (Walker et al. 2004), reflects expert consensus, and includes records that are not adequately georeferenced and were therefore excluded from the dataset used to build the models. The model outputs for Africa were therefore rescaled to that resolution to obtain a further estimate of their accuracy. Models projected onto the New World were evaluated against the set of confirmed records of the tick in the New World (177 accurately georeferenced records; ICTTD 2004), using the percentage of false negatives as a measure of confidence.
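The S-PLUS function used for the AUC is not given. A minimal Python sketch, assuming the HS scores at presences and at (pseudo-)absences have already been extracted, can use the fact that the ROC AUC equals the probability that a randomly chosen presence scores higher than a randomly chosen absence (the Mann–Whitney formulation, with ties counted as one half). The sensitivity/specificity helper and the false-negative percentage used for the New World records both need a cut-off; `threshold` is a hypothetical parameter, since the value used is not stated.

```python
import numpy as np


def auc(scores_presence, scores_absence):
    """ROC AUC as P(presence score > absence score), counting ties as 1/2."""
    pos = np.asarray(scores_presence, dtype=float)[:, None]
    neg = np.asarray(scores_absence, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


def sensitivity_specificity(scores_presence, scores_absence, threshold):
    """Confusion-matrix rates at one cut-off: cells with HS >= threshold are predicted present."""
    pos = np.asarray(scores_presence, dtype=float)
    neg = np.asarray(scores_absence, dtype=float)
    tp = int(np.sum(pos >= threshold))
    fn = len(pos) - tp
    tn = int(np.sum(neg < threshold))
    fp = len(neg) - tn
    return tp / (tp + fn), tn / (tn + fp)


def percent_false_negatives(scores_at_records, threshold):
    """Share (%) of confirmed records predicted as unsuitable (HS below the cut-off)."""
    scores = np.asarray(scores_at_records, dtype=float)
    return 100.0 * float(np.mean(scores < threshold))
```

Sweeping `threshold` over all observed scores and plotting sensitivity against 1 − specificity traces the ROC curve whose area `auc` returns.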

Clustering

The main goal of clustering is to improve the predictive power of the models by using partial models for different regions of the species range. Statistical clustering is the ordination and classification of multiple non-identical objects into subgroups based on their similarity. Hierarchical clustering provides a series of divisions, based on some measure of similarity, into all possible numbers of groups, from a single group containing all objects to as many groups as there are objects.

Records of the target species collected “within” a given region are treated as populations experiencing statistically different climate features. The rationale for the procedure has been given previously (Estrada-Peña et al. 2006). Briefly, the models were trained with half of the records collected within a given cluster, and climate suitability for the entire study area was then calculated as the average of the values obtained from the algorithms developed for each cluster, weighted by the area of the cluster and the prevalence of the tick in that cluster. Prevalence of the tick is defined as the number of 1-km² grid cells occupied by the tick in a given cluster. A multivariate clustering method was used to classify the habitat in Africa into a large number of small regions according to the Normalized Difference Vegetation Index (NDVI). NDVI was chosen because its signal responds sharply to both temperature and humidity (Wang et al. 2004). Monthly 1-km NDVI data were obtained as averages for the period 1992–2002. A Principal Components Analysis (PCA) was performed on this image and the first three components were retained for further analysis. A hierarchical clustering procedure was then applied to the PCA scores, grouping pixels along the PCA axes by evaluating the distance of every pixel to the remaining pixels and to the clusters already formed. Mahalanobis distance was used as the measure of dissimilarity, and the weighted pair-group average was selected as the amalgamation method. A cut-off probability of P = 0.05 was set for assignment to a given cluster. This produced a classification of the habitat into regions, each with a significantly different range of monthly NDVI conditions as deduced from the PCA.
Once cluster assignments had stabilized, tick records collected “within” each ecoregion were considered to belong to ecologically different populations.
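The PCA-plus-hierarchical-clustering step can be sketched in plain NumPy. The sketch assumes each pixel is described by its 12 mean monthly NDVI values, keeps the first three principal components, computes Mahalanobis distances from the covariance of the retained scores, and agglomerates with WPGMA (the weighted pair-group average). The P = 0.05 cut-off used in the study is not reproduced; instead the tree is cut at a fixed number of clusters (`n_clusters`, a hypothetical parameter). All function names are illustrative.

```python
import numpy as np


def pca_scores(ndvi, n_components=3):
    """Project each pixel's monthly NDVI profile onto the first principal components."""
    X = np.asarray(ndvi, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each month
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T  # (n_pixels, n_components)


def mahalanobis_matrix(scores):
    """Pairwise Mahalanobis distances using the covariance of all pixels' scores."""
    vi = np.linalg.inv(np.atleast_2d(np.cov(scores, rowvar=False)))
    diff = scores[:, None, :] - scores[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def wpgma(dist, n_clusters):
    """Weighted pair-group average (WPGMA) agglomeration down to n_clusters groups."""
    d = dist.astype(float)  # astype returns a copy, so the input is untouched
    np.fill_diagonal(d, np.inf)
    members = [[i] for i in range(len(d))]
    active = list(range(len(d)))
    while len(active) > n_clusters:
        sub = d[np.ix_(active, active)]
        a, b = np.unravel_index(np.argmin(sub), sub.shape)
        i, j = active[a], active[b]
        # WPGMA update: distance to the merged group is the plain mean of the two rows
        new = (d[i] + d[j]) / 2.0
        d[i, :] = new
        d[:, i] = new
        d[i, i] = np.inf
        members[i].extend(members[j])
        active.remove(j)
    labels = np.empty(len(d), dtype=int)
    for label, k in enumerate(active, start=1):
        labels[members[k]] = label
    return labels


def ndvi_clusters(ndvi, n_components=3, n_clusters=10):
    """Cluster pixels by the Mahalanobis-WPGMA tree of their NDVI principal components."""
    scores = pca_scores(ndvi, n_components)
    return wpgma(mahalanobis_matrix(scores), n_clusters)
```

Because principal component scores are uncorrelated, the Mahalanobis distance here amounts to a Euclidean distance on scores rescaled to unit variance. The distance matrix grows with the square of the number of pixels, so a continental 1-km NDVI image would in practice need to be subsampled or handled with a dedicated clustering library; the loop is written for clarity, not speed.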

Results

Figure 1 shows the set of georeferenced records of the tick in Africa, together with the complete distribution of the tick as agreed by expert opinion on a 1° grid over the continent. This “complete distribution” is important here because it reflects the known distribution area of the tick, including places where no point records exist. Table 1 lists the AUC values for every model developed in this study. As significant descriptive variables, MaxEnt selected the rainfall of every month except April, minimum monthly temperatures from May to August, and maximum monthly temperatures from February to July (21 out of 36 variables). The Gower algorithm selected a similar set of significant variables, additionally including rainfall in April and maximum temperature in January (23 out of 36). MaxEnt-derived models based on the set of records in Africa were slightly better than those developed with the Gower distance. However, the MaxEnt algorithm performed poorly when evaluated against the complete distribution of the tick at 1°. The expected distribution of climatically suitable habitat for the tick in Africa, as built with both methods, is shown in Fig. 2.

Table 1 Features of the models developed for the estimation of habitat suitability for Amblyomma variegatum, according to AUC values obtained after evaluation of the models against the set of records as a whole (Complete Set) or after cluster development (Clusters) and use of the area-weighted method
Fig. 2

The distribution of climate suitability for Amblyomma variegatum in Africa predicted by models using MaxEnt (A) or Gower metrics (B). Darker shading indicates higher suitability

Hierarchical clustering of the principal components of monthly NDVI values produced a total of 10 clusters in Africa, and A. variegatum was found in 7 of them. The first principal component is loaded with NDVI features in June–August, while the second is loaded with features in December–February. Other components were not significant according to the matrix of weights. Figure 3 shows the geographical range of these clusters over the records of the tick. Cluster 1 is centrally located, mainly immediately north and south of the equator. Cluster 2 encloses the records of the eastern part of the Horn of Africa. Cluster 4 is loosely centred on the equator, with a dense concentration around Nigeria and parts of Ethiopia. Clusters 3, 4, 5 and 7 form a series of bands according to the different NDVI characteristics found north and south of the equator. Cluster 6 covers portions of Eritrea, Djibouti and Ethiopia. Figure 4 shows the NDVI features of these clusters (as 95% confidence ellipses) in the reduced space of the first two principal components, to show the relationships between them. The figure also includes the known georeferenced records of the tick in Africa as black dots. Clusters 4, 5 and 6 delineate a group with clear affinities, while clusters 1, 3 and 7 form a clearly separate group of records. Cluster 2 lies in an intermediate position. Cluster-trained models differ little from models derived from the total set of records (Table 1). Both methods gained accuracy when built from partial cluster models, as evaluated against the set of records, but still had low predictive power when compared with the total distribution of the tick, as expressed by the 1° grid in Fig. 1. The simple Gower method (not derived from clusters) still performed better against the 1° grid than the other approaches.

Fig. 3

Geographical range of the NDVI-derived clusters (using principal component analysis) in Africa where A. variegatum has been recorded. Clusters are arbitrarily numbered from 1 to 7

Fig. 4

Statistical affinities between the clusters as detected from NDVI features after principal component analysis. Black dots represent the realized niche of the tick (georeferenced records). Ellipses are the 95% confidence ellipses for each of the clusters shown in Fig. 3. Axes 1 and 2 are the first and second principal components, respectively

When Africa-trained models were applied to the New World, they produced 56% (MaxEnt) or 42% (Gower) false negatives relative to the recorded distribution of the tick in the Caribbean. When the same methods were trained with the New World records, they produced only 4% (MaxEnt) and 5% (Gower) false negatives. Figure 5 shows the expected area of climatically suitable habitat in the New World derived from an Africa-trained model (A) and from a New World-trained model (B), both using the Gower distance. The insets show how these two models produce very different results for the Caribbean islands, where the records of the currently known distribution of the tick in the area are concentrated. Taking the Gower distance as the best method for detecting areas of potential distribution (according to the results from Africa) and training it with the New World records, potential climate suitability is predicted for most islands of the Caribbean region, including Haiti, Puerto Rico and Cuba, for parts of northern Colombia, northern and eastern Venezuela, and portions of the Mesoamerican corridor, with large areas of suitability in Mexico. The tick may also find suitable habitat in the USA, particularly in the southern portion of the Florida Peninsula.

Fig. 5

The climate suitability for A. variegatum in the New World predicted by the Gower metric, with models developed from the whole set of georeferenced records from Africa (A) or from the records of the known invaded range (B). Insets (bottom) magnify parts of the Caribbean region to show how each model fits the current extent of the invasion in the Caribbean islands

Discussion

Despite the increasing popularity of methods that approximate species’ ecological niches, projecting species distributions into areas and times different from those used to calibrate the model remains difficult (Loehle and LeBlanc 1996). Fundamental niches, realized niches and geographic distributions differ, and each is shaped by a distinct set of factors. It is interesting to note that the Gower distance (but not MaxEnt) predicts climatically suitable habitat in parts of Angola, an area currently occupied by A. pomposum (Walker and Olwage 1987). Biotic factors may be responsible for this discrepancy, since realized niches capture many factors other than a species’ limits of tolerance to environmental variables (Araújo et al. 2001). On the other hand, a biased availability of environmental conditions is often found when modelling the distribution of a species over large areas, mostly because of restrictions in data accessibility and sampling strategies. In this sense, MaxEnt seems highly dependent on the realized niche of the tick (the set of georeferenced point records used to train the model), while the Gower distance can detect areas outside the realized niche and better fit the actual distribution of the species (the fundamental niche), as judged against the complete known distribution at 1°, which includes areas without point records (Walker and Olwage 1987; Walker et al. 2004). As a general rule of thumb, the more data available, the better. In fact, the distribution of occurrence data in ecological space is much more important than the overall density of points, particularly for generalist species that most likely have broad ecological niches, such as A. variegatum.
Training data that miss portions of the fundamental niche seem to be the main cause of low predictive power or of contradictory results in some cases.

The question of whether confirmed presences in a species’ invaded range can be used to successfully predict its eventual distribution is fertile ground for research and remains relatively unexplored (Welk 2004; Hierro et al. 2005). Few studies have compared the performance of invaded-range and native-range datasets for predicting invaded-range distributions. A handful of studies have suggested that species will occupy an overlapping set of environmental conditions in an invaded range (Kriticos and Randall 2001). These authors suggested that a species’ invaded range is best predicted using information from a previously invaded range, reasoning that in invaded ranges the fundamental niche may be more fully realized than in the native range, where the species is constrained by competition and dispersal barriers. Because of the aforementioned restrictions, we consider Gower-based methods the most adequate for modelling the probable distribution of the tick in the New World, because they are potentially better at detecting areas of suitable habitat. Africa-trained models, which include the full potential climatic plasticity of the species, produced the worst invasion scenario for A. variegatum, with the tick predicted to colonize huge areas of the New World, consistent with results previously published by Sutherst and Maywald (1985). However, these models fail to reproduce the current situation: Caribbean islands are marked as unsuitable habitat (42% of false negatives with the Gower distance and 56% with MaxEnt) because the invading strain represents a small subset of the whole ecological space potentially available to the species. The best results are obtained when the ecological potential of the tick in the New World is delineated with a model trained on the invaded range, as has been reported for other organisms (Mau-Crimmins et al. 2006). However, this method does not take into account the full plasticity of the species.
Moreover, the current models do not take into account the potential increase in climatic favourability associated with global warming. A temperature increase of 1–3°C may extend the current range further north and south.

The results outlined here point to an alarming scenario for the current invasive potential of A. variegatum in the New World. Although our modelling approach reduces the range originally expected for the tick (Sutherst and Maywald 1985), even under the most restrictive scenario the tick has the potential to invade the whole Caribbean, wide areas of Colombia and Venezuela, parts of the Mesoamerican corridor, wide areas of Mexico and the Florida Peninsula. This is of serious concern, not only because of the direct damage caused by the tick itself but also because of the possible introduction of pathogens, most notably Ehrlichia ruminantium, and their dissemination via other Amblyomma species already present in newly invaded areas (Burridge et al. 2002). It is important to state that we do not know whether changes in the ecological potential of the invading strain have already occurred. Genetic adaptation of species is rarely considered in the literature on this topic. Evolutionary change is usually expected to occur only on long time scales, and the tolerance range of a species to climate is assumed to remain the same as it shifts its geographical range. However, studies have shown that climate-induced range shifts can involve selection against phenotypes that are poorly adapted to local conditions (Davis and Shaw 2001). Evidence of rapid adaptation at species’ range margins has also been reported for Boophilus microplus in Africa (Tonnesen et al. 2004). This is of crucial importance in predicting the potential invasive range of A. variegatum, as we do not know whether the invading strain retains the full adaptive plasticity of the species (worst scenario) or whether local adaptations are occurring rapidly. A combination of both factors (full ecological potential and rapid adaptation to local conditions) would produce a dramatic scenario of fast invasion.
This factor has deep implications for the final results of the modelling approach, as the best model can only provide an educated guess of the maximum and minimum limits of probable dispersal. Studies of the climate tolerance of different strains under laboratory conditions should be encouraged.