
1 Introduction

Mexico, with a total area of about two million square kilometres, is a megadiverse country, but it has high rates of deforestation [9]. Various studies have assessed land use/cover change (LUCC) over the last decades [21], but there have been few attempts to identify the main causes of deforestation at the national level [5, 10, 24]. Given the complexity of the Mexican territory, the processes of change and their drivers are expected to differ from region to region. Geographically Weighted Regression (GWR) has been applied to explore spatial data in the social, health and environmental sciences. The goal of this study is to evaluate the spatial patterns of deforestation with respect to drivers reported to influence LUCC, using GWR models.

2 Materials and Methods

2.1 Material

To build the geospatial database, the following data were used:

  • Maps of tree cover, forest loss and forest gain from the Global Forest Change 2000–2013 database at 30 m resolution [14]. The map of tree cover in 2000 estimates canopy closure for all vegetation taller than 5 m in height and is encoded as a percentage per cell. The map of forest cover loss (2000–2013) is a binary map (loss/no loss) which indicates deforestation, defined as a change from a forest to non-forest state. An additional map (loss year) indicates the year forest loss occurred and is encoded as either 0 (no loss) or else a value in the range 1–13. The map of forest cover gain 2000–2012 indicates a non-forest to forest change within the study period and is encoded as either 1 (gain) or 0 (no gain). There is no map of forest gain year because forest gain was not allocated annually.

  • Maps of ancillary data (digital elevation model, slope, road maps, human settlements, climate, soils, municipal boundaries) [15].

  • Socio-economic data from the National Institute of Geography, Statistics and Informatics (INEGI, its Spanish acronym), organized by municipality (population censuses of 2005 and 2010) [16, 17].

  • 2010 index of marginalization by municipality from the National Population Council (CONAPO) [8].

  • Government policies (rural and cattle-rearing subsidies, and protected areas) [7, 28].

GIS operations were carried out with QGIS [25]. Statistical analyses and graphs were produced in R [26]. Geographically weighted regressions were fitted using the R packages spgwr (Bivand and Yu [4]) and GWmodel [13, 19].

2.2 Deforestation Rate Computing and Database Elaboration

In this study, forest area and forest change were estimated at the municipality level (2456 municipalities) using the Global Forest Change 2000–2013 database [14]. A map of forest area in 2000 was obtained by thresholding the tree cover map at 10 %. For each municipality, the 2007 forest area was estimated by updating the 2000 forest area with the forest gain and loss that occurred during 2000–2006. As forest gain is only available for the entire 2000–2012 period, we computed a gain area proportional to the duration of the 2008–2011 period relative to the 2000–2012 period. For each municipality, the deforestation rate was computed as the average annual proportion of the 2007 forest area deforested during 2008–2011. To determine which ancillary variables are most likely to be indirect drivers of deforestation, we calculated, for each municipality, various indices describing resource accessibility, population, economic activities and governmental policies:

  • Road density (km of road per km2, including both dirt and paved roads),

  • Population density in 2010 (people per km2),

  • Settlement density (number of settlements per km2),

  • Index of marginalization, which takes into account incomes, level of schooling and housing conditions [8],

  • Cattle density (heads per km2),

  • Goat density (heads per km2),

  • Average slope (degrees),

  • Average elevation (m),

  • Amount of governmental subsidies for agriculture (PROCAMPO, thousands of Mexican pesos per km2),

  • Amount of governmental subsidies for cattle ranching (PROGAN, thousands of Mexican pesos per km2),

  • Proportion of municipality within protected areas.
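The forest-area update and rate computation described at the start of this subsection can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: areas are assumed to be already aggregated per municipality in hectares, loss-year codes 1–13 are assumed to stand for 2001–2013, the 10 % threshold is applied as "at least 10 %", and the gain fraction is passed in explicitly. All function names are hypothetical.

```python
"""Per-municipality forest update and gross deforestation rate (sketch).

Hypothetical helper names; areas in hectares; loss-year codes 1..13 = 2001..2013.
"""


def forest_mask(tree_cover_pct, threshold=10.0):
    """Classify cells as forest in 2000 (canopy closure >= threshold, an assumption)."""
    return [tc >= threshold for tc in tree_cover_pct]


def forest_area_2007(forest_2000_ha, loss_by_code_ha, gain_total_ha, gain_fraction):
    """Update the 2000 forest area to 2007.

    loss_by_code_ha maps loss-year codes to cleared area; codes 1..6 (2001-2006)
    are subtracted. Gain has no year, so only gain_fraction of the 2000-2012
    gain is added back.
    """
    loss_2001_2006 = sum(loss_by_code_ha.get(code, 0.0) for code in range(1, 7))
    return forest_2000_ha - loss_2001_2006 + gain_total_ha * gain_fraction


def annual_deforestation_rate(forest_2007_ha, loss_by_code_ha, n_years=4):
    """Average annual proportion of 2007 forest cleared in 2008-2011 (codes 8..11).

    This is a gross rate: forest gain during 2008-2011 is ignored.
    """
    if forest_2007_ha <= 0:
        return 0.0  # no forest left: rate undefined, reported as 0 in this sketch
    loss_2008_2011 = sum(loss_by_code_ha.get(code, 0.0) for code in range(8, 12))
    return loss_2008_2011 / forest_2007_ha / n_years


if __name__ == "__main__":
    loss = {2: 50.0, 5: 30.0, 9: 40.0, 10: 40.0}  # made-up example areas (ha)
    # gain_fraction = ratio of period durations, as described in the text
    f2007 = forest_area_2007(1000.0, loss, gain_total_ha=12.0, gain_fraction=4 / 12)
    print(f2007)                                   # 924.0
    print(annual_deforestation_rate(f2007, loss))  # ≈ 0.0216
```

Because the rate is divided by the 2007 forest area, it is a relative rate; a municipality with little forest can reach a high value from a small cleared area.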

As the dependent variable is the proportion of 2007 forest cleared during 2008–2011, its variance is likely to decrease as it approaches 0 or 1. This is a problem because regression analysis assumes that the error terms have constant variance. To remove this problem, the proportion was transformed using Eq. 1 to produce a variance-stabilised rate of deforestation [11].

$$\begin{aligned} TP = \arcsin \left[\sqrt{p}\right] \qquad (1) \end{aligned}$$

where TP is the transformed rate of deforestation and p is the original proportion.
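Eq. 1 can be illustrated with a few lines of Python. The sketch below (function name hypothetical) returns TP in radians, so TP ranges from 0 to π/2; the example shows how the transform stretches proportions near 0 relative to mid-range proportions, which is what counteracts the shrinking variance near the bounds.

```python
import math


def arcsine_sqrt(p):
    """Variance-stabilising transform of Eq. 1: TP = arcsin(sqrt(p)), in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion in [0, 1]")
    return math.asin(math.sqrt(p))


if __name__ == "__main__":
    # A 0.01 step near 0 moves TP roughly ten times more than the same step near 0.5.
    print(arcsine_sqrt(0.01) - arcsine_sqrt(0.0))   # ≈ 0.100
    print(arcsine_sqrt(0.51) - arcsine_sqrt(0.50))  # ≈ 0.010
```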

Explanatory variables were also transformed (logarithm, square, square-root and exponential transformations) in order to improve linearity.
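The text does not state how the transformation was chosen for each variable. One plausible criterion, assumed here for illustration only, is to keep the candidate whose transformed values have the highest absolute Pearson correlation with the (transformed) deforestation rate. The sketch assumes non-negative, non-constant inputs (needed for the log and square root) that are scaled so that the exponential does not overflow; the +1 offset in the log is also an assumption.

```python
import math

# Candidate transformations named in the text; the +1 log offset is an assumption.
TRANSFORMS = {
    "identity": lambda x: x,
    "log": lambda x: math.log(x + 1.0),
    "square": lambda x: x * x,
    "sqrt": math.sqrt,
    "exp": math.exp,
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_transform(xs, ys):
    """Return the transformation of xs most linearly related to ys, plus all scores."""
    scores = {name: abs(pearson([f(x) for x in xs], ys))
              for name, f in TRANSFORMS.items()}
    return max(scores, key=scores.get), scores
```

For example, if the response grows with the square of a driver, `best_transform` picks `"square"`, because only that transformation makes the relationship exactly linear.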

2.3 Statistical Analysis

GWR is a local spatial statistical technique for exploring spatial non-stationarity [11]. It models spatial relationships locally by fitting a regression model at each location, with parameters estimated using a distance-based weighting function that assigns larger weights to closer observations. Unlike the usual global regression, which produces a single regression equation summarizing the overall relationships between the explanatory and dependent variables (in this case, for the whole Mexican territory), GWR produces location-specific parameter estimates that express the spatial variation in the relationships among variables. Maps of the spatial distribution of the regression coefficient estimates, together with their level of significance (e.g. t-values), play an essential role in exploring and interpreting spatial non-stationarity. Fotheringham et al. [11] provide a full description of GWR, and Mennis [22] gives useful suggestions for mapping GWR results.
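The core of GWR is a weighted least-squares fit repeated at every location, β̂(uᵢ) = (XᵀWᵢX)⁻¹XᵀWᵢy, where Wᵢ is a diagonal matrix of distance-based weights. The NumPy sketch below shows that computation with a fixed Gaussian bandwidth. This is a simplifying assumption: the study used an adaptive kernel and the R packages spgwr and GWmodel, not this code. The function name is hypothetical.

```python
import numpy as np


def gwr_coefficients(coords, X, y, bandwidth):
    """Local GWR estimates beta_i = (X^T W_i X)^{-1} X^T W_i y for every observation i.

    coords    : (n, 2) array of locations, e.g. municipality centroids
    X         : (n, p) design matrix whose first column is the intercept (ones)
    y         : (n,) dependent variable
    bandwidth : fixed Gaussian bandwidth in the units of coords (an assumption;
                the study itself used an adaptive kernel)
    Returns an (n, p) array: one row of local coefficients per location.
    """
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian: closer => heavier
        XtW = X.T * w                                  # X^T W_i without an n x n matrix
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)   # weighted least squares
    return betas
```

Each column of the result can be joined back to the municipality polygons and mapped. With a very large bandwidth all weights approach 1 and every row collapses to the single global OLS estimate, which is the global model GWR is compared against.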

Collinearity among the explanatory variables of a regression model is a well-known problem that can lead to a loss of precision in the coefficient estimates [3]. This issue can be more pronounced in local regression models than in global ones, because local regressions are calibrated on smaller samples and because, owing to spatial heterogeneity, some localities may exhibit high collinearity while others do not [19]. The first stage of the study was a global correlation analysis between explanatory variables using the Spearman coefficient, in order to discard highly correlated variables. Local collinearity between explanatory variables was then assessed by computing diagnostics such as local correlations between pairs of explanatory variables, local variance inflation factors (VIFs) and local variance decomposition proportions (VDPs) for each explanatory variable, and local condition numbers (CNs). According to Lu et al. [19], local collinearity problems are likely to occur when local correlation values are greater than 0.8 for a given pair of explanatory variables, when VIFs and VDPs are greater than 10 and 0.5, respectively, for a given explanatory variable, and when CN values are greater than 30 for the entire set of explanatory variables. The local correlation between the dependent variable (deforestation rate) and each explanatory variable was also calculated and mapped in order to detect explanatory variables with high explanatory power. Explanatory variables were selected or discarded to reduce collinearity, keeping those with the highest correlation with the deforestation rate and, as far as possible, retaining variables that describe different aspects of deforestation causes (accessibility, topography, human activities, public policies, etc.).
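The diagnostics listed above can be sketched in NumPy as follows. This is an illustrative approximation, not GWmodel's implementation, and the function names are hypothetical. The global screen ranks each column without averaging ties and uses a 0.7 cut-off; both are assumptions. The local diagnostics use a weighted correlation matrix (VIFs are the diagonal of its inverse) and a Belsley-style condition number of the weighted design matrix with intercept, with columns scaled to unit length. VDPs are omitted for brevity. The flag thresholds are the ones quoted from Lu et al. [19].

```python
import numpy as np


def spearman_matrix(X):
    """Global Spearman correlations between the columns of X (ties not averaged)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def correlated_pairs(X, names, threshold=0.7):
    """Pairs of variables whose absolute Spearman correlation exceeds threshold."""
    rho = spearman_matrix(X)
    k = X.shape[1]
    return [(names[a], names[b], float(rho[a, b]))
            for a in range(k) for b in range(a + 1, k) if abs(rho[a, b]) > threshold]


def local_collinearity(X, w):
    """Weighted correlation matrix, VIFs and condition number at one location.

    X : (n, k) explanatory variables without intercept; w : (n,) kernel weights.
    """
    w = w / w.sum()
    Xc = X - w @ X                          # centre on the weighted means
    cov = (Xc.T * w) @ Xc                   # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    vif = np.diag(np.linalg.inv(corr))      # VIF_j = 1 / (1 - R_j^2)
    # Weighted design with intercept, columns scaled to unit length.
    Z = np.column_stack([np.ones(len(w)), X]) * np.sqrt(w)[:, None]
    Z = Z / np.linalg.norm(Z, axis=0)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, descending
    return corr, vif, s[0] / s[-1]


def flags(corr, vif, cn):
    """Thresholds quoted from Lu et al. [19]: |r| > 0.8, VIF > 10, CN > 30."""
    off_diag = np.abs(corr[~np.eye(len(corr), dtype=bool)])
    return bool((off_diag > 0.8).any()), bool((vif > 10).any()), bool(cn > 30)
```

In a GWR setting the weights `w` would come from the same kernel used for the local fits, and `local_collinearity` would be evaluated once per municipality so that the resulting CN values can be mapped, as in Figs. 5 and 7.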

2.4 Geographically Weighted Regression (GWR)

Due to the uneven distribution and size of the municipalities, the weighting function used an adaptive kernel: for each municipality, a given proportion of the observations (its k nearest neighbours) is selected and the weights are calculated using a Gaussian function. The optimal bandwidth (in this case, the proportion of observations) was found by minimising the Akaike Information Criterion (AIC) [1], a model-fit diagnostic that takes model parsimony into account (the trade-off between model complexity and prediction accuracy). For each explanatory variable, a map was produced showing the value of the regression coefficients (color scale of the symbols) and their statistical significance (hatched mask layer).
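The adaptive kernel and the AIC-based bandwidth search can be sketched as below. Several choices here are assumptions rather than the study's exact settings. The bandwidth at each location is the distance to its k-th nearest neighbour, with the point itself counted among the k; conventions differ between implementations. The criterion is the corrected AIC (AICc) given by Fotheringham et al., which is commonly used for GWR bandwidth selection. Candidates are compared with a plain grid search. The function names are hypothetical.

```python
import numpy as np


def adaptive_gaussian_weights(dist_row, k):
    """Gaussian weights whose bandwidth is the distance to the k-th nearest neighbour.

    The point itself (distance 0) is counted among the k neighbours here.
    Assumes k >= 2 and distinct locations.
    """
    b = np.sort(dist_row)[k - 1]
    return np.exp(-0.5 * (dist_row / b) ** 2)


def gwr_fit(coords, X, y, k):
    """Fitted values and trace of the hat matrix S for an adaptive-kernel GWR."""
    n = len(y)
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    y_hat = np.empty(n)
    trace_s = 0.0
    for i in range(n):
        w = adaptive_gaussian_weights(D[i], k)
        XtW = X.T * w
        C = np.linalg.solve(XtW @ X, XtW)  # (p, n): local beta_i = C @ y
        s_row = X[i] @ C                   # i-th row of the hat matrix S
        y_hat[i] = s_row @ y
        trace_s += s_row[i]
    return y_hat, trace_s


def aicc(y, y_hat, trace_s):
    """Corrected AIC for GWR, with sigma estimated as sqrt(RSS / n)."""
    n = len(y)
    sigma = np.sqrt(np.sum((y - y_hat) ** 2) / n)
    return (2 * n * np.log(sigma) + n * np.log(2 * np.pi)
            + n * (n + trace_s) / (n - 2 - trace_s))


def select_k(coords, X, y, candidates):
    """Grid search over candidate neighbour counts; returns the AICc-minimising k."""
    scores = {k: aicc(y, *gwr_fit(coords, X, y, k)) for k in candidates}
    return min(scores, key=scores.get), scores
```

The trace of S acts as the effective number of parameters. Small k gives a closer fit but a larger trace, and the penalty term in `aicc` balances the two. In the study, this trade-off led to about 7 % of the observations (167 neighbours).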

3 Results

3.1 Deforestation and Drivers Assessment

As shown in Fig. 1, the rate of deforestation varies greatly over space. The coastal floodplains of the Gulf of Mexico and the southern part of the country exhibit high rates of deforestation. It is worth mentioning that this is a gross rate, which does not include forest gain. It is also a relative rate, expressed as the proportion of previously existing forest that was deforested during the study period. While high absolute rates of deforestation are expected in the municipalities with the largest forest areas, high relative rates can occur when small areas of forest are cleared in sparsely forested municipalities. Figures 2, 3 and 4 show the spatial distribution of some of the explanatory variables (road density, cattle density and marginalization).

Fig. 1. Rate of deforestation during the period 2008–2011.

Fig. 2. Road density (km per km2).

Fig. 3. Cattle density (heads per km2).

Fig. 4. Index of marginalization 2010 (CONAPO).

Fig. 5. Local condition number (CN) using the nine input explanatory variables. A value above 30 (red tones) indicates collinearity (Color figure online).

Fig. 6. Local Spearman correlation between the rate of deforestation and the slope. Red and green tones indicate negative and positive relationships, respectively (Color figure online).

3.2 Explanatory Variables Selection

The variable pairs population density/settlement density and cattle-ranching subsidies/cattle density have global Spearman correlations of 0.71 and 0.99, respectively. For this reason, population density and cattle-ranching subsidies were removed from the set of explanatory variables. However, the nine remaining variables still present a collinearity problem (Fig. 5), so further variables were discarded from the analysis. Global diagnostics provide little information to guide this selection. For example, slope, which has a low global correlation with deforestation (Table 1), exhibits high local correlation values (Fig. 6). In most cases the relationship is negative, but in some cases, mainly municipalities with large flooded areas or a high proportion of anthropic cover, the relationship is positive (more deforestation in steeper areas). Nevertheless, slope was dropped from the set of explanatory variables because it presented local collinearity. The analysis of local correlations between the rate of deforestation and the explanatory variables, together with the collinearity indices, enabled us to identify the variables with low explanatory power and high collinearity with other explanatory variables.

Table 1. Global coefficient of correlation (Spearman) between the rate of deforestation and explanatory variables.
Table 2. Global model summary.

To decrease collinearity to tolerable levels, the number of explanatory variables was finally reduced to five: index of marginalization, cattle density, elevation, road density, subsidies for agriculture and protected areas. Figure 7 shows that the condition number is below 30 in almost the entire territory.

Fig. 7. Local condition number using the five input variables. Values above 30 (red), which indicate collinearity, appear only in a small part of the center of the country (Color figure online).

Fig. 8. Distribution of the local parameter estimates associated with road density.

3.3 Geographically Weighted Regression (GWR)

For comparison, a global model was fitted, giving an adjusted R2 of 0.20 (Table 2). A GWR model was then fitted using the five selected explanatory variables and a weighting function based on about 7 % of the observations (167 neighbours). GWR increased the goodness of fit (adjusted R2) to 0.63.

Road density presents a positive relationship with deforestation in the north and in the south of the country, which is expected, as roads are often reported as a deforestation driver (Fig. 8). In the center of the country, which has the highest road density values (Fig. 2), this relationship is not significant or even negative. This is probably because the municipalities with the highest road densities are already almost totally deforested and therefore present low rates of deforestation during 2008–2011. Cattle density presents a positive relationship with deforestation throughout the country (Fig. 9). The marginalization index presents a significant positive relationship with the rate of deforestation in the eastern part of Mexico (Fig. 10). Many studies have associated poverty with deforestation [27]. However, in many regions with high levels of marginalization, the relationship between marginalization and deforestation is not significant (and in some cases negative). Previous studies have reported that the best-conserved natural areas in Mexico are often located in poor rural areas and/or community lands [2, 6, 10, 12, 18].

Fig. 9. Distribution of the local parameter estimates associated with cattle density.

Fig. 10. Distribution of the local parameter estimates associated with the marginalization index.

4 Discussion

Several limitations of this study, related to the input data and to the way information is summarized at the municipality level, have been identified:

  • Change data are based only on a drastic change of land cover (forest loss); they do not consider cover degradation. This must be taken into account when interpreting the results. In Mexico, the rate of deforestation has been decreasing over the last decades, and most change processes are related to vegetation cover degradation rather than deforestation. Moreover, because the study used a rate of deforestation as the dependent variable, it does not take into account the large extent of scrubland in the north of the country.

  • The rate of deforestation measures change from 2008 to 2011, but the driver variables (population, marginalization, government subsidies) refer to particular dates within the period, depending on data availability. This temporal mismatch cannot be fully addressed due to the lack of information. Additionally, in some cases it could be useful to calculate rates of change of these indices. For instance, deforestation may be more closely related to the increase in population density than to population density itself.

  • Another limitation is the averaging of indices at the municipality level, which may produce a figure that does not reflect the actual situation over much of the area. For instance, a municipality with both flat and steep areas will present an average value corresponding to a moderate slope. Moreover, deforestation can occur in small areas whose features are very different from the average figure. This effect, known as the modifiable areal unit problem (MAUP) [23], was evaluated for municipal data in Mexico [20]. The evaluation concluded that, in most cases, MAUP did not make a large difference to the results. However, the counterintuitive relationship between deforestation and slope in some areas (Fig. 6) may be due to MAUP. One way to reduce this problem could be to calculate the indices over the forested area only: for instance, using the average slope of the forested part of a municipality, instead of the average slope over the entire municipality, to explain deforestation.

  • Finally, as shown by the moderate R2 of the model, the set of explanatory variables we used did not explain the dependent variable satisfactorily across the entire territory. More drivers need to be taken into account in future analyses.

Other limitations are related to the method used and to the deforestation process itself. Deforestation is a complex process that depends on interacting environmental, social, economic and cultural drivers. Some of them cannot be included in the model because they cannot be mapped. Moreover, GWR uses municipality information to explain deforestation but is unable to take into account shifting effects (deforestation in a given municipality caused by the actions of inhabitants of other municipalities) or effects at different scales (as GWR uses the same bandwidth for all explanatory variables). It is also worth noting that some drivers act with a very fuzzy spatial pattern or no pattern at all (e.g. global economic effects such as imports and exports of agricultural goods).

It is likely that the effect of a driver on a given region is related to the length of time that driver has been shaping the landscape, and that different drivers have acted at different temporal and spatial scales, which makes the interpretation of the results difficult. Considering the rate of deforestation over different past periods will enable us to analyze the dynamics of deforestation in both its temporal and spatial dimensions.

In this study, special attention was paid to evaluating and avoiding collinearity by removing the offending explanatory variables. However, removing explanatory variables is not ideal, particularly when collinearity is only a local effect, because the high correlation between spatial variables then limits the number of useful explanatory variables in the regression model. An alternative strategy is to use models with a locally compensated ridge term [13]. Other alternative GWR models include robust models, which identify and reduce the effect of outliers, and mixed models, which treat some explanatory variables as constant over space (stationary) and others as local (non-stationary). In future research, alternative deforestation rates will also be computed, new explanatory variables such as land tenure will be integrated into the model, and a workshop will be organized to carry out an in-depth interpretation of the results.
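The locally compensated ridge idea mentioned above can be illustrated with a simplified sketch. This is not GWmodel's implementation, whose details may differ. The sketch assumes standardised explanatory variables and a centred dependent variable (no intercept). It adds a ridge term λ to XᵀWX only when the ratio of the largest to the smallest eigenvalue exceeds a chosen limit, and it chooses λ so that the adjusted matrix has exactly that ratio. The function name and the default limit (30² = 900, matching a condition number of 30 on W^½X) are hypothetical.

```python
import numpy as np


def local_ridge_beta(X, y, w, max_cond=900.0):
    """Local coefficients with a ridge term added only where X^T W X is ill-conditioned.

    X: (n, k) standardised explanatory variables; y: (n,) centred response;
    w: (n,) kernel weights for this location. max_cond bounds the eigenvalue
    ratio of X^T W X (900 = 30**2, a hypothetical default for this sketch).
    """
    XtW = X.T * w
    A = XtW @ X
    eig = np.linalg.eigvalsh(A)  # eigenvalues of the symmetric matrix, ascending
    e_min, e_max = eig[0], eig[-1]
    lam = 0.0
    if e_max > max_cond * e_min:
        # Solve (e_max + lam) / (e_min + lam) == max_cond for lam.
        lam = (e_max - max_cond * e_min) / (max_cond - 1.0)
    beta = np.linalg.solve(A + lam * np.eye(A.shape[0]), XtW @ y)
    return beta, lam
```

Where the local design is well conditioned, λ stays at 0 and the estimate is the ordinary GWR one. The variables therefore do not need to be dropped everywhere because of collinearity that only occurs in some municipalities.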

5 Conclusions

The results we obtained clearly show the advantages of a local approach such as GWR over a global one for evaluating the effect of drivers on LUCC processes, such as deforestation, over a territory as diverse and complex as Mexico. The GWR model enabled us to describe spatial relationships between drivers and deforestation. Local models gave a much better explanation of deforestation patterns than the average effects identified by global models such as global regression models.

GWR models can be useful in different types of LUCC studies. Exploring spatial variation can help account for differences between regions that are not captured by standard global measures, and thus help explain the causes of LUCC in different areas. However, interpreting the maps of regression estimates is not an easy task and needs to be supported by in-depth knowledge of the area and its history. In this regard, GWR offers an attractive visual tool to motivate discussion among specialists from different fields (e.g. geographers, anthropologists, economists) and to communicate scientific results to policy-makers and the general public. GWR can also be useful in policy design and assessment. Different governmental policies and interventions for deforestation reduction may be appropriate in different areas, depending on local conditions. Therefore, locally sensitive policies are likely to be more efficient than nation-wide policies. Alternatively, GWR can be used to evaluate the success of a policy already in place, by determining the areas where the intervention was more successful and, possibly, why.