Introduction

Species distribution models (SDMs) are commonly used for purposes of conservation, environmental planning, and wildlife management programs (Guisan et al. 2013). SDMs quantify the relationships between the distribution and demography of a species and the environment (Peterson 2011). These models allow us to study species distribution in large areas and even in remote habitats, where logistical and financial restrictions preclude direct observations (Duff and Morrell 2007). They may be particularly useful for assessing the success of reintroduction activities (Manel et al. 1999, 2001). Understanding habitat characteristics and distribution determinants of reintroduced species is important, as this information is a key factor for ensuring the protection of landscape components that are critical for the long-term persistence of these species in the wild.

The use of environmental variables to explain and predict species distribution is not trivial: these relationships are complex, and a large number of variables are involved (Guisan and Zimmermann 2000; Radosavljevic and Anderson 2014). It is well known that the variables that affect the distribution of a species change with the scale of observation (Blank and Carmel 2012; Crawley and Harral 2001; Kent et al. 2011; Stauffer and Best 1986). Coarse-scale distribution models may be preferred in, for example, biogeographic studies. In contrast, fine-scale distribution models can depict local-scale phenomena such as essential corridors and animal passages and effects of roads and rivers, which coarse-scale models cannot detect. Thus, fine-scale distribution models may be preferable for conservation planning and management (Hess et al. 2006). Yet, in most studies, the selection of resolution is a consequence of the availability and quality of data pertaining to the specific study area, which is typically the limiting factor in distribution studies (Elith et al. 2006; Hess et al. 2006). Data layers used in such studies are most commonly derived from global databases, in which 1 km² is considered the finest resolution.

Presence/absence information is generally preferred to presence-only information in SDMs. However, presence/absence information is more difficult, and sometimes impossible, to obtain than presence-only information (Kent et al. 2011; Pearce and Boyce 2006; Tsoar et al. 2007). At the same time, presence-only data may be subject to large errors due to small sample size and biased samples (Graham et al. 2004; Phillips and Elith 2013). A systematic data-collection survey, designed to collect data at precise locations, should largely reduce these biases.

Indirect observations, and in particular dung surveys, are common non-invasive approaches for obtaining information about the presence of species and habitat selection. They are particularly useful when the studied species is hard to find due to its elusive behavior, rarity, or inaccessible habitat (Fernandez et al. 2006; Vina et al. 2010). The use of indirect observations in SDMs requires a clear connection between the presence of the species and the feces (Gallant et al. 2007; Kays et al. 2008; Perinchery et al. 2011). Systematic dung surveys, conducted in sites selected to represent the entire range of environmental conditions in a region, can be an appropriate solution to sampling-bias problems (Fernandez et al. 2006; Norris 2014; Vina et al. 2010).

Here, we developed an SDM for the population of the Asiatic wild ass (Equus hemionus), a rare and elusive species that was reintroduced into the Negev Desert in Israel. We combined three elements in order to overcome the obstacles in developing SDMs: a database of spatial layers of explanatory variables, mapped at a very fine resolution; a systematic sampling scheme; and an intensive survey of indirect observations, through the detection of feces mounds. This approach led to important insights regarding the habitat preferences of this species.

Methods

Study species

The Asiatic wild ass is an endangered species (Moehlman et al. 2008). In the past, the Syrian wild ass (E. h. hemippus) subspecies was found in the Middle East, and became extinct in the wild at the beginning of the 20th century (Groves 1986; Saltz et al. 2000; Schulz and Kaiser 2013). In 1968, a breeding core was established in Israel, using individuals from the subspecies E. h. onager and E. h. kulan, which were brought from Iran and Turkmenistan, respectively. In 1982, the Israel Nature and Parks Authority initiated a reintroduction program of the Asiatic wild ass (from the breeding core of these two subspecies, Saltz et al. 2000). The first individuals were released near Ein-Saharonim in Makhtesh Ramon (Fig. 1). By 1993, three additional releases were conducted at this site and two more in the Paran streambed (Saltz and Rubenstein 1995). A total of 38 individuals were released. The wild ass population expanded its range in the Negev Desert and the Arava valley (Saltz and Rubenstein 1995), and the current population is estimated at more than 250 individuals (Renan et al. 2015).

Fig. 1 The study region, reintroduction and sampling sites in the Negev Desert, Israel

Study area

The study area extends over approximately 3000 km² in the central part of the Negev Desert (Fig. 1). The area is arid and characterized by high daytime temperatures (on average 33 °C) and relatively low night-time temperatures (on average 12 °C). The mean annual precipitation ranges between 30 and 150 mm (Stern et al. 1986). Elevation ranges between 50 and 1033 m, and the area has a complex geomorphological structure. The bedrock is mainly hard limestone, resulting in a cliffy landscape and leveled floodplains. The majority of the area is drained by two main ephemeral streambeds (wadis), Nekarot and Paran. There are several latitudinal geological faults in the region that create a steep terraced landscape. Flash floods are a common phenomenon after rain events. The flash floods fill water holes in the streambeds, which may hold for a few months. There are very few natural water sources that provide water year round. Vegetation is mostly limited to streams and their surroundings and generally located on the banks. Vegetation in the streams is mostly of a Saharo-Arabian origin, with a Sudanian component in the Arava (Danin 1999). It is dominated by three native Acacia tree species, Acacia raddiana, A. tortilis, and A. pachyceras.

Data collection

We selected 122 sampling sites, using an approximate systematic sampling scheme (Fig. 1) to capture the full range of conditions found in the study region. To ensure accurate representation of the environmental conditions in the sampling sites, we stratified the sampling locations according to three environmental parameters: distance from permanent water sources, altitude, and mean temperature of the hottest month. Based on our prior knowledge of the study species and a literature review (Henley and Ward 2006; Henley et al. 2007; Saltz and Rubenstein 1995; Saltz et al. 1999), we considered these environmental variables to have a high potential for explaining wild ass distribution. These variables were represented by GIS layers and combined into a three-banded composite, on which we performed K-means unsupervised classification using ERDAS IMAGINE V 9.1. The objective of this classification was to divide the study region into polygons with similar combinations of these variables (Carmel and Stoller-Cavari 2006). The 122 sampling sites were systematically distributed among these polygons.
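The stratification step can be illustrated with a minimal Python sketch. It is not the ERDAS IMAGINE procedure used in the study: the cell values, the number of classes (k = 2), the standardization step, and the farthest-point initialization are all assumptions chosen only to show how cells with similar combinations of distance to water, altitude, and temperature end up in the same class.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each variable to zero mean and unit variance so none dominates."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every cell to its nearest class center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Move every center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(tuple(mean(c) for c in zip(*members)) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

# Hypothetical cells: (distance to water in m, altitude in m, hottest-month mean in °C)
cells = [(500, 100, 36), (800, 150, 35), (600, 120, 36),
         (9000, 900, 28), (9500, 950, 27), (8800, 1000, 28)]
labels = kmeans(standardize(cells), k=2)
print(labels)
```

Sampling sites would then be allocated among the resulting classes; how many sites go to each class is a design decision that this sketch does not address.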

In each sampling site we conducted a feces survey. Fecal droppings of wild ass constitute a straightforward indicator of species presence, because they are deposited frequently, and remain visible in the desert environment for several months (up to about a year). The survey in each site was composed of three 500 m belt transects arranged as an equilateral triangle with a total length of 1500 m, and divided into 150 survey units of 10 × 10 m per site. One of the triangle sides was always laid on the dry riverbed nearest to the point defined as the center of the sampling site. We recorded observations at a distance of up to 5 m on either side of the transect, where probability of feces detection was 100%. The exact locations of feces mounds (droppings as well as dung piles) observed on the transect were recorded using a GPS with a spatial accuracy of 4 m. The number of feces mounds within each 10 m pixel was recorded. Between January and June 2009, we surveyed 122 sites and explored 150 units per site, with a total sampled area of 183 ha. For the presence-only SDM, we classified a unit as present if one or more feces mounds were found in that unit.
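The survey arithmetic and the presence rule can be checked with a short Python sketch. The constants are taken from the description above; the function name and the example counts are hypothetical.

```python
TRANSECT_LENGTH_M = 500   # length of one belt transect
TRANSECTS_PER_SITE = 3    # sides of the equilateral triangle
UNIT_SIZE_M = 10          # survey units are 10 x 10 m
SITES = 122

units_per_site = TRANSECTS_PER_SITE * TRANSECT_LENGTH_M // UNIT_SIZE_M
total_units = SITES * units_per_site
total_area_ha = total_units * UNIT_SIZE_M ** 2 / 10_000  # 1 ha = 10,000 m^2

def classify_presence(mound_counts):
    """A unit counts as a presence record if it holds at least one feces mound."""
    return [count >= 1 for count in mound_counts]

print(units_per_site, total_units, total_area_ha)
print(classify_presence([0, 2, 1, 0]))  # hypothetical counts for four units
```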

Data analysis

Explanatory variables

We devoted extensive efforts to create a high-resolution digital data set of environmental variables. We generated 14 spatial layers (Table 1), from which the model predictors were derived. These layers pertained to five main categories (Table 1): vegetation (one variable), topography (4), climate (2), anthropogenic variables (5), and distance from water (2). The vegetation layer was derived from a complex processing of an aerial photo (Appendix 1 in supplementary material). Topography was derived from a digital elevation model of the area, at an original resolution of 10 m. Climate layers had an original resolution of 1 km, and were resampled to a 10 m resolution. Distance layers were constructed using Euclidean distance to specific elements on the map at an original resolution of 10 m. To reduce multicollinearity, correlation coefficients were calculated between each pair of variables; in pairs with a high correlation (>0.65 or <−0.65, Pearson correlation), one of the variables was eliminated from the model. A map of each explanatory variable appears in Appendix 2 in supplementary material.
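A minimal Python sketch of such a pairwise correlation screen follows. The layer names and values are hypothetical, and the rule for deciding which member of a correlated pair to drop (keep the variable listed first) is an assumption, since the text does not state how that choice was made.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_correlated(layers, threshold=0.65):
    """Keep a variable only if |r| with every already-kept variable is <= threshold."""
    kept = []
    for name in layers:
        if all(abs(pearson(layers[name], layers[other])) <= threshold for other in kept):
            kept.append(name)
    return kept

# Hypothetical values sampled at five cells
layers = {
    "altitude": [100, 200, 300, 400, 500],
    "temperature": [36, 34, 33, 30, 28],  # strongly (negatively) correlated with altitude
    "slope": [5, 20, 2, 15, 8],
}
print(drop_correlated(layers))
```

In practice the choice of which variable to retain is often guided by ecological relevance rather than by list order.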

Table 1 Predictors used in the distribution model of wild ass

Statistical model

We used the “Maximum Entropy” model MAXENT V3.3.1 (Kumar et al. 2009; Phillips et al. 2006; Phillips and Dudik 2008). We selected this model from a large pool of possible models, because it was ranked in several comparative studies as one of the most effective models for predicting species distribution on the basis of presence-only data (Elith et al. 2006, 2011; Jeschke and Strayer 2008; Phillips et al. 2006; Radosavljevic and Anderson 2014). We considered using presence/absence models; unfortunately, absence of feces cannot be interpreted as true absence; hence, we decided to use a presence-only model. The MAXENT algorithm operates on a set of constraints that describes what is known from the sample of the target distribution (i.e., the presence data). MAXENT characterizes the background environment with a set of background points from the study region. However, unlike the case of presence/absence data, the species occurrence at these background points is unknown. MAXENT predicts the probability distribution across all cells in the study area based on the presence data and, to prevent over-fitting, employs maximum entropy principles and regularization parameters (Phillips et al. 2006). MAXENT produces two outputs: a probabilistic distribution map describing the establishment probability of the species in a specific site and the relative weight of each explanatory variable. Distribution maps of the Asiatic wild ass were obtained by applying MAXENT models to all cells in the study region, using a logistic link function to yield a habitat suitability index between zero and one (Phillips and Dudik 2008). We ran the model in three spatial resolutions: 10 m, 100 m and 1 km, with 10⁶, 10⁵ and 10⁴ background points, respectively. Recommended values were used for the convergence threshold (10⁻⁵), maximum number of iterations (500), and regularization multiplier (1). Response functions were constrained to only three feature types: linear, threshold and hinge.
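For orientation, the general form of the Maxent model can be written out as below. This is a sketch of the standard formulation given by Phillips et al. (2006) and Phillips and Dudik (2008), not a description of settings specific to this study. The symbols are notation introduced here: f_j are the features derived from the predictors, λ_j their fitted weights, β_j the regularization parameters, m the number of presence records, and H the entropy of the fitted distribution.

```latex
% Raw Maxent output: a Gibbs distribution over the cells x of the study region
q_\lambda(x) = \frac{\exp\!\big(\sum_j \lambda_j f_j(x)\big)}{Z_\lambda},
\qquad
Z_\lambda = \sum_{x'} \exp\!\Big(\sum_j \lambda_j f_j(x')\Big)

% Weights are chosen to maximize the regularized log-likelihood
% of the presence records x_1, \dots, x_m
\max_{\lambda} \;\; \frac{1}{m} \sum_{i=1}^{m} \ln q_\lambda(x_i)
\;-\; \sum_j \beta_j \,\lvert \lambda_j \rvert

% Logistic output: rescales the raw value to a suitability index in (0, 1)
\mathrm{logistic}(x) = \frac{e^{H}\, q_\lambda(x)}{1 + e^{H}\, q_\lambda(x)}
```

In practice the normalizing sum Z_λ is evaluated over the background points rather than over every cell, which is why the number of background points is reported for each resolution.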

To calculate the contribution percentage of each environmental variable in each iteration of the training algorithm, the increase in Regularized gain was added to the contribution of the corresponding variable. To estimate the importance of each environmental variable in turn, the values of the corresponding variable on training presence and background data were randomly permuted. The model was reevaluated using the permuted data, and the resulting drop in Training Area Under the Curve (AUC) was normalized to percentages. AUC is the area under the curve of the receiver operating characteristic (ROC) plot. ROC curves are widely used for validating SDMs and for comparing between models (Elith et al. 2006; Hernandez et al. 2006; Marmion et al. 2009). To determine whether occurrence data of wild ass were spatially autocorrelated, we calculated Moran’s I Index (Moran 1950) for each spatial resolution separately (10 m, 100 m, and 1 km).
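As an illustration of the autocorrelation measure, the following Python sketch computes Moran's I on a small raster. The binary weights between edge-sharing (rook) neighbours and the toy grids are assumptions; the text does not specify the spatial weights used in the study.

```python
def morans_i(grid):
    """Moran's I for a 2-D grid with binary weights between edge-sharing cells."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    num = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Each neighbouring pair contributes once per direction.
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (num / denom)

# Toy presence grids (1 = feces mound recorded in the cell)
clustered = [[1, 1, 0, 0] for _ in range(4)]   # presences grouped on one side
checkerboard = [[1, 0], [0, 1]]                # presences perfectly alternating

print(morans_i(clustered))     # positive: similar values lie next to each other
print(morans_i(checkerboard))  # negative: neighbours always differ
```

Values near zero indicate little spatial structure, positive values indicate clustering of similar values, and negative values indicate alternation.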

Model validation

We validated the model using: (1) MAXENT’s five performance measures and (2) a cross-validation procedure. The MAXENT model generates three gain and two AUC measures. Gain measures the goodness of fit of a model; it represents the likelihood of presence records compared to background records (Phillips 2006). A gain of 1.6 means that an average presence location has a relative probability of e^1.6 (approximately 4.95), i.e., about five times higher than that of an average background point. Regularized training gain accounts for the number of predictors in the model to address overfitting; Unregularized training gain has no compensation for the number of predictors in the model; and Test gain is calculated from presence records held out to test the model. The AUC values range between 0 and 1, where 1 represents perfect prediction ability of the model and 0.5 represents prediction that is no better than random. Training AUC calculates AUC using the training data; and Test AUC calculates AUC using the test data. A cross-validation procedure was used to estimate errors around predictive performance on held-out data (Elith et al. 2011). Occurrence data are randomly split into a number of equal-size groups (folds), and models are created leaving out each fold in turn. The left-out folds are then used for evaluation. Cross-validation uses all of the data for validation. A tenfold cross-validation procedure was used for the 10 and 100 m models, and a fivefold cross-validation procedure was used for the 1 km model.
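The three ideas in this paragraph (gain as a log ratio, AUC as a ranking statistic, and k-fold splitting) can be made concrete with a short Python sketch. The function names, the seed, and the example scores are hypothetical, and the code is independent of MAXENT's own implementation.

```python
import math
import random

def relative_likelihood(gain):
    """Convert a gain value into the presence-to-background likelihood ratio."""
    return math.exp(gain)

def auc(presence_scores, background_scores):
    """Probability that a random presence cell outscores a random background cell."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5  # ties count as half
    return wins / (len(presence_scores) * len(background_scores))

def k_fold_indices(n_records, k, seed=0):
    """Split record indices into k folds; return (train, test) index lists per fold."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    return [(sorted(set(indices) - set(fold)), sorted(fold)) for fold in folds]

print(round(relative_likelihood(1.6), 2))
print(auc([0.9, 0.8], [0.1, 0.2]))
splits = k_fold_indices(20, 5)
print([len(test) for _, test in splits])
```

When the number of records is not divisible by k, the folds produced this way differ in size by at most one record.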

Results

We recorded a total of 3232 feces mounds in 18,300 survey units (10 m cells). Feces mounds were found in 115 of the 122 sampling sites. The number of mounds per site ranged from 0 to 124. Five potential explanatory variables were eliminated from the model (Table 1), due to high correlation coefficient (>0.65 or <−0.65, Pearson correlation, see Appendix 3 in supplementary material), leaving nine variables in the model (Table 2). Three of these spatial data layers, namely vegetation, slope, and altitude, were considered as the most influential explanatory variables by the MAXENT algorithm, accounting together for ~85% of the cumulative relative contribution (Table 2). Woody vegetation density was found to have the strongest effect on the Asiatic wild ass distribution (Table 2; Appendix 4 in supplementary material). The response curve of woody vegetation cover (Appendix 5 in supplementary material) showed an increasing presence of the animals with increasing vegetation cover, leveling off sharply at the saturation point (>72% coverage). Slope was the second most important variable (Table 2) and was inversely related to wild ass distribution (Appendix 5 in supplementary material). In slopes steeper than 20°, no feces mounds were found. Altitude was the third most important variable, with 12% relative contribution. The other six explanatory variables that were included in the model had a lower effect on the distribution of wild ass, together accounting for ~15% of the relative contribution to the model (Table 2).

Table 2 Percent contribution and permutation importance of the predictor variables for the 10 m resolution MAXENT model for wild ass

The performance of the three models (10 m, 100 m, and 1 km) differed markedly. The 10 m model yielded the highest averaged values in all five performance measures (Table 3), indicating a high predictive capacity. The 1 km model yielded the lowest values in all measures, with extremely low values for Test gain (−0.02) and Test AUC (0.67), suggesting poor predictive capacity at this scale. The cross-validation procedure revealed high consistency between the different runs, since standard deviation values were relatively low (Tables 2, 3).

Table 3 The averaged MAXENT performance measures calculated using a tenfold or a fivefold cross-validation procedure

Occurrence data at a 10 m resolution had a relatively low spatial autocorrelation (Moran’s I Index of 0.13), while the 100 m and 1 km resolutions had higher values (0.38 and 0.22 respectively).

The probabilistic distribution map was heterogeneous and informative at the very fine scale of 10 m, and the fine scale of 100 m (Fig. 2a, b), and much less informative at the scale of 1 km (Fig. 2c). The strong effect of streambeds on the species distribution was apparent at the two finer scales: areas of high probability of presence were in streambeds (wadis) characterized by woody vegetation and moderate terrain. The high resolution allowed detection of various site-related trends and phenomena: (a) Possible convenient movement corridors in a matrix of unsuitable environmental conditions, which enable landscape connectivity among sites (Fig. 4a). (b) Isolated local sites/areas of high suitability for the wild ass (high-quality habitat “islands”) situated within broad areas of low quality habitat (Fig. 4b). (c) Important geomorphologic features that affect the distribution, e.g., streambeds (Fig. 4c). (d) Human-induced local entities that affect the distribution, e.g., the influence of roads on the quality of proximate habitats (Fig. 4d, see "Discussion" for details).

Fig. 2 A comparison between the northern regions of the probabilistic distribution maps of the three models. a 10 m resolution model, b 100 m resolution model, and c 1 km resolution model

In contrast to the high variability visualized at fine scale, this map did not show regional trends or gradients at the scale of the study area. Sites with very high and very low probabilities of wild ass presence were found near each other throughout the entire study area; however, in several areas, a spatial continuity of high value sites was noticeable: Makhtesh Ramon (A), Paran streambed (B), the upper part of Nekarot streambed (C), and the Lotz potholes (Borot Lotz) (D) (Fig. 3). These areas have the potential to serve as activity centers for the population. A spatial continuum of sites with low suitability for the Asiatic wild ass was also discernible (Fig. 3, points E–H).

Fig. 3 Probabilistic distribution maps of a 10 m resolution model for the Asiatic wild ass in the Negev. Potential wild ass activity centers: Makhtesh Ramon (A), Paran streambed (B), the upper part of Nekarot streambed (C) and the Lotz potholes (Borot Lotz) (D). A spatial continuum of sites with low suitability: the Paran Stream Estuary (E), the region south of Mount Karkom (F), Be’er Menuha (G), and the Eastern part of Makhtesh Ramon (H). Stars indicate reintroduction sites

Discussion

In this study we combined three elements in order to develop a predictive distribution model for the wild ass, a rare and elusive animal: a database of spatial layers of explanatory variables, mapped at a very fine resolution; a systematic sampling scheme; and an intensive feces mound survey. The results indicate that this approach yields an accurate and informative model.

Factors affecting wild ass distribution

The most important variable in the model was the percentage of woody vegetation cover. Its relative contribution (54.5%) was much higher than that of the other variables. The importance of vegetation to the wild ass distribution is consistent with previous studies (Davidson et al. 2013; Giotto et al. 2015; Henley et al. 2007). The strong vegetation effect on the distribution results from its nutritional value (St-Louis and Côté 2014), the partial shade it offers, its value for hiding, and, in arid areas, the favorable microhabitat with reduced temperatures that it provides (Belsky et al. 1993).

The second most important variable in the model was slope (relative contribution of 18.04%). The Asiatic wild ass prefers moderate over steep terrain, and avoids steep slopes. This observation was supported by previous studies (Davidson et al. 2013; Giotto et al. 2015; Henley et al. 2007). The next variables in order of importance were altitude (11.97%) and distance from water sources (6.76%). The positive effect of altitude on wild ass distribution is probably related to lower temperatures associated with higher elevations. The Negev is a hyper-arid desert and we expected that distance from water sources would be a major predictor of wild ass distribution. Indeed, the water sources themselves were found to be centers of wild ass activity. However, the fact that the daily movement range of wild ass can reach up to 20 km in each direction (Saltz et al. 2000), confirmed by the finding of feces scattered across most of the study area, may explain why the distance from water source was not a major determinant of wild ass distribution.

Resolution

Model performance

Constructing models at various spatial resolutions and comparing between them enabled us to quantify the effect of resolution on SDM performance. Seemingly, model performance increased with increasing model resolution (Table 3). This finding contradicts a previous study (Guisan et al. 2007) of the effect of degrading model resolution on the performance of SDMs, which demonstrated that using finer cell sizes (from 1 km to 100 m, and from 10 km to 1 km) did not have a major effect on model predictions. In contrast, our results suggest that when the effective resolution of the predictors was 10 m (10² m²), the model provided useful insights regarding the species distribution that are not possible at coarser scales, as is elaborated in the following section.

AUC is one of the most commonly used statistics to characterize model performance (Yackulic et al. 2013). However, its usage has been strongly criticized, particularly with presence-only data (Gueta and Carmel 2016; Jiménez-Valverde et al. 2013; Lobo et al. 2008; Yackulic et al. 2013), since it ignores the predicted probability values and the goodness-of-fit of the model (Yackulic et al. 2013). Corroborating these views, our 1 km model had a high Training AUC value (0.85), whereas the Test gain showed near zero predictive capability (Table 3). This gap reveals AUC’s low informative value and its inadequacy as a performance index in a presence-only modelling framework. Gain indices are more sensitive indicators of model performance (Gueta and Carmel 2016).

High-resolution spatial layers of explanatory variables

We invested considerable resources and effort to produce and obtain the layers of explanatory variables at a spatial resolution of 10 m wherever possible. For climatic variables, the original spatial resolution is 1 km. In contrast, the original resolution of the vegetation and topography layers was 10 m. Indeed, these two variables were the most important predictors in the 10 m model, somewhat less so in the 100 m model, and nearly meaningless in the 1 km model.

Distribution models of large mammals with large home ranges are typically constructed at resolutions of 100–10,000 m (e.g., Bellamy et al. 2013), 2–6 orders of magnitude lower than the 10 m resolution of the present study. Apparently, the two predictors found to be the most important, vegetation and slope, appeared nearly meaningless at a resolution of 1000 m. The distribution map constructed at this coarse scale was not very informative.

High-resolution distribution map

The distribution map obtained by the model enabled us to examine the relative habitat suitability of each site for the wild ass at a fine resolution. The fine-grain image in Fig. 3 illustrates that low quality habitats are found within broad areas of suitable habitat, and vice versa. The high resolution of the map allowed the detection of four habitat components as important for the species’ use of space (Fig. 4): (a) Potential movement corridors (Fig. 4a). Connectivity within the species’ range is essential for the spatial, demographic and genetic dynamics of animal populations and their persistence over time (Colbert et al. 2001; Saccheri et al. 1998) and should be recognized as a high conservation priority (Beier et al. 2006). Identifying connectivity corridors is highly important for the protection of the species, since they may facilitate wild ass movements within a matrix of less suitable areas, enabling connectivity between high-quality habitats (Fig. 2a–d). (b) Isolated habitat patches (Fig. 4b). Isolated “islands” or small fragments of high habitat quality within low quality areas may constitute potential “stepping stone” sites that aid in connecting between activity centers. (c) Important geomorphologic features (Fig. 4c). The high-resolution map indicated clearly the importance of streambeds, including first order streams, in the distribution patterns of the wild ass. In coarser maps, the influence of the streambeds cannot be detected. (d) Anthropogenic effect on distribution. Anthropogenic features may influence distribution patterns of species and, therefore, it is important that they be identified (Valverde et al. 2008). For example, based on the high-resolution wild ass distribution model, roads were found to considerably increase the quality of habitats in nearby areas (Fig. 4d). However, in a specific case, the road effect led to high density of roadside vegetation. The high vegetation quality, in turn, attracted wild asses to the proximity of the road, and several road-kills of wild asses were reported in this area, calling for roadside vegetation management (Asaf Tsoar, personal communication). This example illustrates the importance of the model as a tool to identify such potential negative anthropogenic effects.

Fig. 4 Detecting landscape features on the high-resolution map: a Potential movement corridors, b Isolated habitat patches, c Important geomorphologic features, d Anthropogenic effects on habitat quality (roads effect increased roadside vegetation). Colors represent predicted habitat suitability: from green, low suitability, to red, high suitability. (Color figure online)

Sampling

Systematic sampling of presence data

Many distribution models that are based on presence-only data suffer from inaccuracies, due to biased sampling (e.g., multiple observations near roads and accessible sites) and a distribution of observations that is unrepresentative of the range of environmental conditions in the study region (Barry and Elith 2006; Elith et al. 2011; Kramer-Schadt et al. 2013; Phillips and Elith 2013). In this study, we implemented an approximate systematic sampling scheme based on the spatial pattern of major environmental conditions in the study region, thus reducing the aforementioned errors. A common problem in sampling rare species is a zero-inflated distribution of records. In order to reduce this problem, dry riverbeds were over-represented based on prior knowledge that wild asses are usually found within riverbeds. Still, two-thirds of the samples were located off riverbeds. Moreover, due to the dense network of riverbeds and the high density of sampling sites, only a few areas were out of the reach of this sampling scheme (Fig. 1), and the possible bias was minimal.

Indirect observations for presence data

Predictive distribution models are usually based on direct observations. Creating a database of direct observations of an elusive species is difficult (Fernandez et al. 2006; Kays et al. 2008; Perinchery et al. 2011; Vina et al. 2010). In this study, we relied on indirect observations using feces mounds as the basis for presence data. The major advantage of surveying feces mounds is that they remain in the field after the animal leaves, increasing the probability of recording activity in sites visited by the species. This advantage is enhanced in a desert environment, since in arid regions the decomposition rate of the feces is slower, and mounds may last for long periods, in the case of the wild asses in the Negev up to a year. The large number of observations is a major component of the strength and reliability of a distribution model (Barry and Elith 2006). The feces surveys in our study led to a large number of observations. Obtaining a similar sized database using direct observations would have required a much greater, longer and costlier sampling effort.

Implications for conservation

SDMs can be useful when designing conservation policies (Guisan and Zimmermann 2000). The endangered Asiatic wild ass has become a focus of conservation interest due to its impressive appearance, rarity, reintroduction process and its pivotal function in the Negev ecosystem (Polak et al. 2014). The SDM constructed in this study can serve to locate favorable high-quality patches, and potential future expansion directions of the species in the Negev Desert. It can also be used to locate potential routes and corridors among activity centers, which are important for maintaining connectivity within the population. Model predictions can then be validated by conducting field surveys (Davidson et al. 2013). This information can serve as the basis for developing conservation and management strategies for the wild ass. Specifically, the map enabled us to identify large continuous geographic areas of suitable habitat, which constitute potential activity centers. Three of the continuous areas identified in the map (central Makhtesh Ramon, Paran streambed, and Borot Lotz; Fig. 3) were confirmed in the field as significant activity centers, based on direct observations. Two of these sites, the Paran streambed and the central part of Makhtesh Ramon, overlap with the reintroduction sites. However, distance from the reintroduction sites was not found to be a significant factor affecting species distribution in the statistical model. Each one of the three activity centers contains a permanent water source. The model further enabled us, in a previous study, to identify areas with low landscape connectivity among activity centers (Gueta et al. 2014). These areas were suggested to limit gene flow, leading to the relative isolation of a subpopulation and to the development of population genetic structure in the reintroduced wild ass population (Gueta et al. 2014). Limited gene flow among activity centers may further affect the population’s genetic diversity (Renan et al. 2015), which is essential for the population’s long-term viability (Hughes et al. 2008).

The distribution model can also be used to locate a potential direction for expanding the wild ass range, by projecting the model onto additional areas (Bar-David et al. 2008). It is important to identify areas of potential spatial expansion, in order to ensure the protection and maintenance of landscape connectivity, which is essential for the species’ distribution and, hence, for its persistence in the wild.