1 Introduction

Desertification is a type of land degradation through which an area becomes increasingly arid, generally losing its bodies of water as well as vegetation and wildlife (Geist 2005). Nowadays desertification is one of the most important environmental issues (UNCOD 1977; UN 1994). The question has often been invoked by the media and decision makers during emergencies, such as prolonged droughts and water scarcity. These latter phenomena, however, are mainly due to climate change (Verstraete et al. 2008).

Moreover, this phenomenon is commonly confused with the expansion of deserts, and so it is believed to be restricted to areas with arid, semi-arid and dry sub-humid climates. Instead, desertification is a widespread degradation process that closely affects, for instance, the Mediterranean countries (Hill et al. 2008). In particular, many Italian regions appear exposed to various forms of soil degradation risk and to desertification (Salvati et al. 2005).

Desertification manifests itself as a persistent reduction of the ecological and economic productivity of ecosystems and agricultural lands, whose consequences are not always immediately perceptible (Henderson-Sellers et al. 2008). Therefore, the phenomenon requires constant attention, systematic and permanent monitoring, and the development of medium- and long-term intervention strategies (Holm et al. 2003; Gringof and Mersha 2006; Vogt et al. 2011).

Assessing desertification implies a thorough reconstruction of the history and evolution of the environmental and social context through physical, biological and socio-economic data (sensitivity). Evaluating how the problem will evolve is even more complex, because it requires possible scenarios and forecast models or other statistical tools (vulnerability). In these assessments it is fundamental to understand the factors that control the climate and the environment, either through explicit and formal representations of knowledge or through empirical evidence obtained from observations and experiments.

To this aim, a methodology based on a matrix of quantitative and qualitative variables has been developed by FAO (1984) and UNEP (1992). However, this methodology has drawn several criticisms because of the subjective nature of the data considered (Veron et al. 2006). Another approach, aimed at preventive actions, proposes the application of the rain use efficiency (RUE), which is the ratio between the rate of plant biomass accumulation and the annual precipitation (Le Houerou 1984; Prince et al. 1998; Nicholson et al. 1998; Holm et al. 2003; O’Connor et al. 2001; Diouf and Lambin 2001).

In the identification of areas at risk of desertification in Italy, climatic factors (aridity index and precipitation), soil (structural characteristics), vegetation (land cover) and human pressure (distribution and density of the population and demographic changes) have been taken into account (Diodato and Bellocchi 2008). The desertification status is evaluated through the aridity index developed by De Martonne (1926), which combines the annual precipitation and the annual air temperature. According to several independent observations, this index decreases as aridity increases (Diodato and Bellocchi 2011).

In the light of these previous studies, in this work we apply two soft computing methods to meteorological and land use data in order to assess the desertification vulnerability, expressed by the De Martonne index, of further areas in addition to those already known to be at risk. The soft computing methods we apply are: (1) the feedforward Multi-Layer Perceptron (MLP), a well-known Artificial Neural Network (ANN) (Beale and Jackson 1990; Bishop 1996; Haykin 2008), and (2) genetic programming (GP) (Koza 1992, 1994), a method based on genetic algorithms (Goldberg 1989) that automatically generates and evolves unknown functions, usually represented as tree structures (Cramer 1985).

Both methods have already been successfully applied in several environmental contexts (Makkeasorn et al. 2008; Rampone and Valente 2012, 2017; Rampone 2013; Shiri et al. 2012; Stanislawska et al. 2012; Sivapragasam et al. 2007).

The remainder of this paper is organized as follows: Sect. 2 describes the central Italy area covered by this study and the related environmental data used. Sect. 3 summarizes the experiments and their results. The last section is devoted to the discussion of the results.

2 Application area

Sannio is the name of a territory situated in southern Italy, at present corresponding partially to the regions of Campania and Molise (Fig. 1). The Molise portion is excluded from our evaluation, so our analysis takes into account an area almost entirely in Campania, whose coordinates range between latitudes 41°29′N and 41°09′N, and between longitudes 15°10′E and 14°40′E. This area is surrounded by a series of high reliefs above 1000 m (Matese Mountains, Taburno-Camposauro Mountains) that belong to the Apennines. It includes two hill ranges: in the western one the altitude is less than 300 m, and the narrow floodplains of the Calore, Tammaro and Fortore rivers develop here, while in the eastern one the relief gradually rises up to 1000 m (Daunia Mountains) and then descends to the coast of the Adriatic Sea in the territory of Puglia.

Fig. 1 Sannio area in a digital terrain model (DTM), and locations of the meteorological stations used in this study (circled crosses). 1, San Marco dei Cavoti; 2, Castelvetere in Val Fortore; 3, Colle Sannita; 4, Castelfranco in Miscano; 5, San Bartolomeo in Galdo; 6, Santa Croce del Sannio; 7, Morcone; 8, Greci; 9, Ariano Irpino; 10, Biccari; 11, Faeto; 12, Troia

Such topography significantly affects the climate, since it influences the atmospheric circulation of the central Mediterranean (Wigley 1992). In fact, the Apennine chain, which stretches along the Italian peninsula, intercepts the north-westerly or westerly airflow (from the Atlantic Ocean), sometimes considerably strengthened by the cold airflow from the northeast (from the Balkan Mountains). This produces deep cyclonic conditions, which cause heavy rainfall, especially in winter, with peaks of 1500–1900 mm measured in the western areas close to the reliefs. Above 400–500 m, such precipitation may also fall as snow. In summer, anticyclonic conditions may set in and create intense and prolonged periods of drought, especially in the eastern areas. These conditions are typical of the sub-tropical circulation that develops in areas of the southern Mediterranean (Brunetti et al. 2006). Therefore, the climate in the area of Sannio (CSA) is Mediterranean with marked seasonality: cold and wet in winter, hot and dry in summer. However, the complex land morphology can result in significant anomalies between the western and eastern areas.

From the available temperature data, recorded at several meteorological stations (see Fig. 1) from 1980 to 2012, the warmest month is usually July, with an average temperature of 23.6 °C, while the coldest is January, with an average temperature of 7.4 °C. The driest month is July, with an average rainfall of 26 mm, whereas the month with maximum precipitation is November (110 mm on average). The annual average temperature of the Sannio area is just over 15 °C, while the annual rainfall is a little lower than 800 mm. Among the extreme values, the maximum temperatures range between 31 °C and 36 °C and the minimum values can reach −10 °C. Over the last decades, the trend shows a temperature increase of the order of one degree and a precipitation decrease of some hundreds of mm.

In addition, the Sannio area is affected by numerous catastrophic events related to strong weather anomalies, such as heavy snowfalls that isolate towns for weeks (eastern areas), copious rains that cause floods and landslides, and prolonged periods of drought that cause famines (eastern areas). These events affect the land use, which is mostly agricultural.

At present, a part of the surface of the mountain areas is covered by forests while another part is used in extensive cultivation activities. In hilly areas and in the valleys, the percentage of urbanized area substantially increases, even if the agricultural crops remain prevalent and ever more specialized (e.g. viticulture, tobacco). A comparison with the previous decades shows a dramatic reduction both in the wooded areas on the major reliefs, and in the cultivation activities on the eastern hilly areas.

This situation is related both to deforestation and to the abandonment of cultivated lands, and also to soil degradation. The latter is spreading more and more in the Sannio, favoured by uncontrolled surface-water runoff on predominantly clayey soils, which are often arid because of low rainfall. The geographical proximity of the Sannio to areas included in the national program against desertification, such as Apulia and Molise, and the data emerging in the eastern hilly areas suggest a desertification risk (Di Lisio et al. 2009).

2.1 Data

From the data available at the considered stations, for each year from 1980 to 2012 we evaluate 7 + 1 characteristics, as reported in Table 1.

Table 1 Characteristics and their range (data from 1980 to 2012)

The aridity indices DMCY and DMNY are computed according to the De Martonne formula

$$\frac{MYR}{MYT + 10^\circ}. \tag{1}$$
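As a minimal sketch of Eq. (1), the function below computes the index from a yearly rainfall and a yearly mean temperature. It assumes that MYR is expressed in mm and MYT in degrees Celsius, as Table 1 is expected to specify; the function name and the sample values (rounded from the Sannio averages quoted in Sect. 2) are illustrative only.

```python
def de_martonne_index(annual_rain_mm: float, mean_temp_c: float) -> float:
    """Aridity index of Eq. (1): yearly rain divided by (yearly mean temperature + 10)."""
    denominator = mean_temp_c + 10.0
    if denominator <= 0:
        # The formula is not meaningful for mean temperatures at or below -10 degrees.
        raise ValueError("mean temperature must be above -10 degrees")
    return annual_rain_mm / denominator


# Illustrative values only: about 800 mm of rain and a 15 degree mean temperature.
print(de_martonne_index(800.0, 15.0))  # 32.0
```

Lower values of the index correspond to more arid conditions, so a decreasing series over the years points towards increasing aridity.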

For each year, the first 7 characteristics, which form a so-called feature vector, should contribute to the prediction of the De Martonne index in the next year.

In this way we build a dataset made up of 183 labelled feature vectors, where each vector has the following structure

$$X_p;\ t_p = \left( Y;\ MYR;\ MYMT;\ MYmT;\ MYT;\ DMCY;\ LU \right);\ DMNY. \tag{2}$$
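The pairing of each year with the index of the following year can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout of a station record, the helper name `build_labelled_vectors` and the choice of skipping years without a successor are hypothetical, not taken from the study.

```python
from typing import Dict, List, Tuple

# Order of the 7 characteristics in the feature vector of Eq. (2).
FEATURES = ("Y", "MYR", "MYMT", "MYmT", "MYT", "DMCY", "LU")


def build_labelled_vectors(
    records: List[Dict[str, float]],
) -> List[Tuple[Tuple[float, ...], float]]:
    """Pair each year's 7 features with the De Martonne index of the next year.

    Each record is assumed (hypothetically) to hold the keys
    Y, MYR, MYMT, MYmT, MYT and LU for one station and one year.
    """
    by_year = {int(r["Y"]): r for r in records}
    dataset = []
    for year in sorted(by_year):
        following = by_year.get(year + 1)
        if following is None:
            continue  # no next-year data, hence no label for this year
        current = dict(by_year[year])
        # DMCY: index of the current year; DMNY: index of the next year.
        current["DMCY"] = current["MYR"] / (current["MYT"] + 10.0)
        dmny = following["MYR"] / (following["MYT"] + 10.0)
        x_p = tuple(float(current[name]) for name in FEATURES)
        dataset.append((x_p, dmny))
    return dataset
```

Applying the helper to every station and concatenating the results would give a set of pairs with the structure of Eq. (2).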

3 Experiments

3.1 Multi-layer perceptron

The MLP is trained by the back-propagation procedure (Beale and Jackson 1990), and a 10-fold cross-validation methodology (Devijver and Kittler 1982) is applied. Thus 10 independent experiments are performed, one for each choice of the validation set, and we use the resulting average performance as the %-misclassification error.
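A 10-fold split of the 183 labelled vectors can be sketched as below. The shuffling seed and the function name `k_fold_indices` are assumptions made for illustration; the way the folds were actually drawn in the experiments is not specified.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (training, validation) index pairs covering all samples once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((training, validation))
    return splits


splits = k_fold_indices(183, k=10)
```

Each sample is used for validation exactly once, and the reported performance is the average over the 10 validation folds.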

The experiments are performed using an Excel-based neural network simulation environment developed by Angshuman Saha (available on-line, see Footnote 1). This tool allows several ANN parameters to be set, as reported in Table 2.

Table 2 Adjustable parameters of the Excel-based system which simulates the MLP neural network

The initial weights Wt are randomly chosen in a fixed range. The learning rate, which measures how strongly the current error influences the weight update, and the momentum term, which determines the influence of the past history of weight changes, are determined by trial and error.
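For reference, a standard textbook form of the back-propagation weight update with momentum is given here. The symbols $\eta$ (learning rate), $\alpha$ (momentum term) and $E$ (error) are our own notation, and the exact rule implemented by the Excel-based tool is not stated, so this form is an assumption.

$$\Delta w_{ij}(t) = -\eta\,\frac{\partial E}{\partial w_{ij}} + \alpha\,\Delta w_{ij}(t-1), \qquad w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t).$$

A larger $\eta$ lets the current error change the weights more strongly, while a larger $\alpha$ carries over more of the previous change.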

The number of neurons in the hidden layer(s) is selected by a pruning/growing methodology (Rampone and Valente 2017), starting from an initial random choice. The resulting topology consists of 7 inputs, a single hidden layer of 2 neurons, and one output.

The number of epochs (training cycles) is fixed to 500.
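To make the 7-2-1 topology and the 500 training epochs concrete, the following pure-Python sketch trains such a network by back propagation with momentum. It is not the Excel-based tool: the sigmoid hidden units, the linear output, the squared-error loss, the sample-by-sample updates and the default values of `lr`, `momentum` and `scale` are all assumptions chosen for illustration.

```python
import math
import random

N_IN, N_HID = 7, 2  # 7 inputs, one hidden layer of 2 neurons, 1 output


def init_weights(rng, scale=0.5):
    """Random initial weights in [-scale, scale]; the last entry of each row is a bias."""
    w_hid = [[rng.uniform(-scale, scale) for _ in range(N_IN + 1)] for _ in range(N_HID)]
    w_out = [rng.uniform(-scale, scale) for _ in range(N_HID + 1)]
    return w_hid, w_out


def forward(x, w_hid, w_out):
    """Return the hidden activations (sigmoid) and the linear output."""
    hidden = []
    for row in w_hid:
        z = sum(w * v for w, v in zip(row[:-1], x)) + row[-1]
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    y = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
    return hidden, y


def mse(data, w_hid, w_out):
    """Mean squared error of the network over (input, target) pairs."""
    return sum((forward(x, w_hid, w_out)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, epochs=500, lr=0.1, momentum=0.5, seed=0):
    """Online back propagation with a momentum term on a 7-2-1 network."""
    rng = random.Random(seed)
    w_hid, w_out = init_weights(rng)
    d_hid = [[0.0] * (N_IN + 1) for _ in range(N_HID)]  # previous weight changes
    d_out = [0.0] * (N_HID + 1)
    for _ in range(epochs):
        for x, t in data:
            hidden, y = forward(x, w_hid, w_out)
            err = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            # Hidden-layer error terms, computed before the output weights change.
            deltas = [err * w_out[j] * hidden[j] * (1.0 - hidden[j]) for j in range(N_HID)]
            out_inputs = hidden + [1.0]
            for j in range(N_HID + 1):
                d_out[j] = -lr * err * out_inputs[j] + momentum * d_out[j]
                w_out[j] += d_out[j]
            hid_inputs = list(x) + [1.0]
            for j in range(N_HID):
                for i in range(N_IN + 1):
                    d_hid[j][i] = -lr * deltas[j] * hid_inputs[i] + momentum * d_hid[j][i]
                    w_hid[j][i] += d_hid[j][i]
    return w_hid, w_out
```

In practice the inputs would be rescaled to a common range before training, since the raw characteristics (year, rainfall in mm, temperatures) differ by orders of magnitude.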

As performance indicators, we use both the error percentage (resulting from the 10-fold cross-validation application) and the coefficient of determination R².
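Both indicators can be computed from the expected and predicted index values, as in the sketch below. The coefficient of determination follows its usual definition; the error percentage is not defined explicitly in the text, so the mean absolute percentage error used here is only one assumed reading of it.

```python
from typing import Sequence


def r_squared(expected: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(expected) / len(expected)
    ss_res = sum((t - y) ** 2 for t, y in zip(expected, predicted))
    ss_tot = sum((t - mean_t) ** 2 for t in expected)
    return 1.0 - ss_res / ss_tot


def mean_percentage_error(expected: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error relative to the expected value, in percent (assumed definition)."""
    total = sum(abs(t - y) / abs(t) for t, y in zip(expected, predicted))
    return 100.0 * total / len(expected)
```

With cross-validation, each indicator would be computed on every validation fold and then averaged over the 10 folds.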

In Table 3 we report the results. The %-misclassification error is 3.49% and the coefficient of determination is 0.97, both very good values. The predicted and expected De Martonne index values are comparatively reported in Fig. 2.

Table 3 MLP results: %-misclassification error and coefficient of determination (R²)

Fig. 2 MLP results: expected (X-axis) and predicted (Y-axis) DMNY

3.2 Genetic programming

In the genetic programming experiments, we are looking for a formula f() that satisfies

$$DMNY = f\left( Y;\ MYR;\ MYMT;\ MYmT;\ MYT;\ DMCY;\ LU \right). \tag{3}$$

The set of possible component functions is limited to the arithmetic operators (+, −, *, /), some trigonometric functions (sine, cosine, tangent and their hyperbolic versions) including their inverses, the exponential and the natural logarithm, the logistic function, and the Gaussian function.

The fitness measure is the absolute error (AE).
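The idea of a candidate formula represented as a tree and scored by its absolute error can be sketched as follows. This is not how Eureqa works internally: the nested-tuple encoding, the protected division, the reduced function set and the candidate expression (which simply reuses the form of Eq. 1 and is not one of the solutions in Table 4) are all illustrative assumptions, as is the choice of averaging the absolute error over the samples.

```python
import math

# Binary operators; division is "protected" (returns 1.0 on a zero divisor),
# a common genetic programming convention assumed here.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,
}
# A small subset of the unary functions listed in the text.
FUNCS = {"sin": math.sin, "cos": math.cos, "exp": math.exp}


def evaluate(tree, sample):
    """Recursively evaluate an expression tree on one sample (a dict of characteristics)."""
    if isinstance(tree, (int, float)):
        return float(tree)  # numeric constant
    if isinstance(tree, str):
        return sample[tree]  # variable name such as "MYR"
    head, *args = tree
    values = [evaluate(arg, sample) for arg in args]
    if head in OPS:
        return OPS[head](*values)
    return FUNCS[head](*values)


def mean_absolute_error(tree, samples, targets):
    """Fitness of a candidate: average absolute difference from the target DMNY."""
    return sum(abs(evaluate(tree, s) - t) for s, t in zip(samples, targets)) / len(samples)


# Hypothetical candidate: MYR / (MYT + 10)
candidate = ("/", "MYR", ("+", "MYT", 10.0))
```

A genetic programming run would repeatedly mutate and recombine such trees, keeping those with the lowest error.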

The samples are divided into training and validation sets as in the previous MLP experiments.

The experiments are performed by a genetic programming software tool called Eureqa (Schmidt and Lipson 2009).

The best set of solutions, after about 300,000 generations, is reported in Table 4. The behaviour of each solution is sketched in Table 5. The predicted and expected De Martonne index values by using the best solution are comparatively reported in Fig. 3.

Table 4 Representation of the GP solution set

Table 5 GP results: R² and AE for each solution

Fig. 3 GP results: expected (X-axis) and predicted (Y-axis) DMNY by using the best GP solution

We also measure the relevance of each considered characteristic in determining the solution result by:

Sensitivity: the relative impact that a characteristic has on the solution result.

% Positive: the likelihood that, increasing this characteristic, the solution result will increase.

Positive magnitude: a measure of how big the positive impact is.

% Negative: the likelihood that, increasing this characteristic, the solution result will decrease.

Negative magnitude: a measure of how big the negative impact is.

Table 6 summarizes these results for solution 1.

Table 6 Relevance of the considered characteristics in the solution 1
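One plausible way to estimate relevance measures of this kind for a given formula is sketched below. It is an assumption, not the definition used by the software: derivatives are approximated by central finite differences, scaled by the ratio of the standard deviations of the characteristic and of the output, and then summarized over the samples. The function name `relevance` and the step `eps` are hypothetical.

```python
import statistics
from typing import Callable, Dict, List


def relevance(
    f: Callable[[Dict[str, float]], float],
    samples: List[Dict[str, float]],
    name: str,
    eps: float = 1e-6,
) -> Dict[str, float]:
    """Estimate sensitivity and sign statistics of f with respect to one characteristic."""
    xs = [s[name] for s in samples]
    zs = [f(s) for s in samples]
    # Scale derivatives so that characteristics with different units are comparable.
    scale = statistics.pstdev(xs) / statistics.pstdev(zs)
    grads = []
    for s in samples:
        up = dict(s)
        up[name] = s[name] + eps
        down = dict(s)
        down[name] = s[name] - eps
        grads.append((f(up) - f(down)) / (2.0 * eps) * scale)
    pos = [g for g in grads if g > 0]
    neg = [g for g in grads if g < 0]
    n = len(grads)
    return {
        "sensitivity": sum(abs(g) for g in grads) / n,
        "pct_positive": 100.0 * len(pos) / n,
        "positive_magnitude": sum(pos) / len(pos) if pos else 0.0,
        "pct_negative": 100.0 * len(neg) / n,
        "negative_magnitude": -sum(neg) / len(neg) if neg else 0.0,
    }
```

For a characteristic that always raises the output, the positive percentage is 100 and the negative magnitude is zero; mixed signs indicate that the effect depends on the values of the other characteristics.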

4 Discussion

In this work, Artificial Neural Networks and Genetic Programming are applied to predict the desertification trend of the Sannio region (central Italy). Under a 10-fold cross-validation methodology, both soft computing methods show low error rates and high values of the coefficient of determination (R²), and they appear able to predict the aridity index of subsequent years with a small, acceptable error.

In particular, with the MLP the %-misclassification error is 3.49% and the coefficient of determination is 0.97.

However, the MLP results are sub-symbolic, and so they are difficult for an environmental expert to use. On the other hand, the GP approach offers the advantage of an explicit representation of the factors that favour or delay desertification, while maintaining a low error rate and a high R² value (0.97, as reported in Table 5).

The obtained GP formula shows that the most influential factors for the De Martonne index are land use and precipitation. More specifically, the deterioration of the De Martonne index is linked to the presence of pastured rather than wooded areas, and to the decrease in precipitation. The spread of deforestation and the great rainfall variability resulting from climate change appear to be the key triggers of desertification.

Although a larger number of data instances and a deeper insight into the reasons for changes in land use would probably ensure a stronger result, the aridity index forecast for this area, which is not yet classified among those at risk of desertification, provides an important tool for environmental monitoring and for the development of medium- and long-term intervention strategies in the Sannio.

It will be interesting to apply this analysis to other areas and to try other soft computing methods that yield an explicit formula, such as BRAIN (Rampone and Russo 2012; D’Angelo and Rampone 2014); this will be the subject of future work.