1 Introduction

Land subsidence is a hazardous geological phenomenon that has occurred in recent years in many urban areas worldwide (Chen et al. 2019). UNESCO defines land subsidence as the “settlement or gradual downward settling of the ground’s surface, which may have a slight horizontal displacement vector” (UNESCO 2018). Its effects include damage to the natural environment and economic losses (Hu et al. 2004; Waltham 1989). As one consequence of the mismanagement and overuse of water resources, land subsidence occurs when declining groundwater levels lead to the compression of soil (Pacheco et al. 2006). Subsidence caused by excessive groundwater pumping has been widely reported in many areas, such as Rafsanjan (Mousavi et al. 2001), Shanghai (Hu et al. 2004), Mashhad (Motagh et al. 2007), Mexico City (Calderhead et al. 2011), Tianjin (Lixin et al. 2011), California (Galloway and Burbey 2011), Su-Xi-Chang (Chen et al. 2013), Arak (Rajabi and Ghorbani 2016), Antelope Valley, Kerman (Abdollahi et al. 2019), Bangkok, Kashmar (Lashkaripour et al. 2006; Rahmati et al. 2019), and Tehran (Dehghani et al. 2013; Mahmoudpour et al. 2013; Ranjbar and Ehteshami 2019). In these areas, the resulting damage includes fractures in underground lines and transport routes, increased flood risk, cracking of buildings, and loss of ground level (Mohebbi Tafreshi et al. 2019).

In recent decades, the increasing damage caused by land subsidence has prompted numerous studies worldwide that map land subsidence susceptibility and identify the factors that affect it (Abdollahi et al. 2019; Wang et al. 2019).

Several studies have assessed the factors affecting land subsidence risk. For example, Burbey (2002) assessed the effect of faults on land subsidence in the sedimentary basins of Nevada, United States. The study showed that where joints adjacent to faults act as barriers to flow, deformation tends to be horizontal; conversely, where they do not, vertical deformation occurs.

To evaluate the factors affecting land subsidence, Oh and Lee (2010) used seven main factors: land use, groundwater depth, distance to faults, geology, fault depth, slope gradient obtained from topographic maps, and land capability derived from crater data.

Putra et al. (2011) assessed land subsidence risk in Rongkop, Indonesia. Their risk map was based on five parameters: land use, distance to valley-like structures (cratering), slope, lithology, and elevation.

Park et al. (2012) used five main factors affecting land subsidence in Samcheok City, Korea: slope, geology, distance to faults, land use, and fault depth.

Shadfar et al. (2016) concluded that excessive groundwater pumping is the primary factor, and lithology the secondary factor, causing land subsidence in the Buin Zahra area.

Rezaee (2016) investigated land subsidence risk in the Kermanshah Plain. The results showed that subsidence risk is higher south and east of Deh-e-Platan village, east of Kermanshah, than elsewhere. In this area the groundwater level is low and the aquifer consists of fine-grained sediments.

Behyari et al. (2017) studied the effect of tectonics on land subsidence in Marzan Abad, Iran. Their results showed that geological factors such as fault fractures and joints have weakened the soil structure and made the region unstable. They have also allowed water to move into subsurface calcareous units, where it has created dissolution cavities that lead to subsidence.

Minderhoud et al. (2018) assessed the interaction between land subsidence and land use in the Mekong delta, Vietnam. Their results showed that land use can intensify natural subsidence, add anthropogenic subsidence, or otherwise affect the subsidence process. Subsidence rates differed among land use classes. The highest rates occurred in classes whose natural conditions had been altered by human activities.

Moreover, recent studies have investigated land subsidence susceptibility using hydrogeological, climatic, geophysical, and geological data. They have used methods such as statistics, genetic algorithms (GA) (Manafiazar et al. 2019; Taravatrooy et al. 2018), fuzzy algebra (Bianchini et al. 2019; Chanapathi et al. 2019; Ghorbanzadeh et al. 2018; Rafie and Samimi Namin 2015; Yu et al. 2018), artificial neural networks (ANN) (Abdollahi et al. 2019; Dehghani et al. 2013; Oh et al. 2019; Tien Bui et al. 2018; Wang et al. 2018), and random forest (RF) models in geographic information system (GIS) applications (Ilia et al. 2018; Mohammady et al. 2019; Pourghasemi and Mohseni Saravi 2019).

In recent decades, several meta-modeling techniques have emerged as promising methods for modeling high-dimensional, nonlinear processes. The best-known examples are ANN (Tongal and Booij 2017; Zaman Zad Ghavidel and Montaseri 2014), gene expression programming (GEP) (Aziz et al. 2017; Kisi et al. 2019), fuzzy logic (Jahangoshai Rezaee et al. 2020; Wang and Chen 2015), and statistical methods (Barbulescu and Popescu-Bodorin 2019; Elhatip et al. 2008; Leduc and Ouldali 1990). GEP and ANN models have produced highly accurate results in numerous studies, so we evaluated and compared these two models for land subsidence modeling.

Hybrid models (Barzegar et al. 2016; Elalfy et al. 2018; Jamshidi et al. 2019; Moeeni and Bonakdari 2017; Wang and Hu 2019) have shown higher efficiency and accuracy than non-hybrid models. This is especially true of hybrids that include fuzzy models (Abass et al. 2011; Moghassem and Fallahpour 2013; Wang et al. 2010). Hybrid forms of both the GEP and ANN models were therefore used in this study. Such hybrid techniques can handle cases that are difficult for a single universal approximator. In particular, they can potentially find simpler, more parsimonious solutions than either component alone.

The literature review shows that few, if any, studies have jointly evaluated how the following factors affect land subsidence susceptibility and its spatial distribution:

- erosion
- faults
- rainfall
- land use
- clay thickness
- the discharge (Debi) of pumping wells
- soil type
- aquifer hydrodynamic properties such as transmissivity (T) and specific yield (Sy)

Different studies around the world have identified these parameters as main causes of land subsidence. Investigating them together helps to identify and manage the most important causes of subsidence in the study area, and managers can use this knowledge to take measures that reduce its negative effects. The number of parameters affecting a phenomenon does not necessarily have a direct impact on model accuracy. However, using more parameters broadens the range of factors evaluated and gives a more comprehensive susceptibility assessment. There is also a significant gap regarding the question “How does the number of factors considered for a phenomenon influence model accuracy?”, and no research has addressed it directly. One aim of this paper, and the reason for using as many subsidence-related parameters as possible, is therefore to clarify the relationship between the number of parameters and model accuracy.

Hybrid ANN and GEP models, especially those combined with fuzzy logic, have been used in various studies. However, few, if any, studies worldwide have applied them to land subsidence susceptibility assessment. The main objective of this study is therefore to compare the hybrid fuzzy-GEP (F-GEP) and fuzzy-ANN (F-ANN) models for land subsidence susceptibility modeling in the Varamin aquifer. The resulting land subsidence susceptibility map can provide a scientific basis for sustainable development and for reducing the human and ecological risks of land subsidence damage.

2 Study area

The Varamin aquifer is in the southeast of Tehran province, Iran, and is part of the Varamin sub-basin (Fig. 1a). It is bounded by eastings of 540,000–580,000 m and northings of 3,888,000–3,930,000 m in UTM zone 39N. Important transport routes, such as the East–West transit road and the Mashhad-Tehran railroad (Fig. 1b), run through this area (Mohebbi Tafreshi et al. 2019). Part of the national electricity transmission network also crosses it (Fig. 1c). The area affected by land subsidence in Varamin is crossed by 2 km of the Mashhad-Tehran railroad and 5 km of the electricity transmission routes. About 670,000 people live in the affected area, which contains four villages and population centers.

Fig. 1

a Location of the study area, b and c the railroad and electricity transmission route at the surveyed land subsidence locations. The red lines are crack boundaries of land subsidence and the arrows indicate the direction of the collapse

Structurally, the Varamin aquifer lies in the Central Iran zone (Berberian and King 1981). The Pishva hill, an anticline, divides the aquifer into a mountainous part and a plain (Sadeghi et al. 2006). The mountains were probably formed by structural processes, especially the folding of Tertiary deposits. Geological outcrops in the area (Fig. 2) include a variety of formations, mostly marl, sandstone, shale, and conglomerate ranging in age from Eocene to Quaternary (Sadeghi et al. 2006). Pliocene and Quaternary deposits are exposed in the following places:

- northeast and south of the Varamin-Eyvanekey road
- the northeastern and northern parts of Sharif Abad
- south of the villages of Shah Qazi and Yousef Abad

Based on adjacent maps and the continuity of these deposits, most of them consist of conglomerate equivalent to the Hezardareh Formation (Sadeghi et al. 2006). The formations along the northeastern and northern boundaries of the area are mostly marl, Eocene volcanics, and Oligo-Miocene limestone, together with Miocene silt and shale containing evaporite deposits (Sadeghi et al. 2006).

Fig. 2

Geology map of the study area

As shown in Fig. 3a, specific yield (Sy) ranges from 13 to 16% in the north of the Varamin aquifer, at the apex of the alluvial fan, where the alluvium is coarse-grained. It is about 10% around the city of Varamin in the middle of the plain and about 2–5% in the southern part of the plain (TRWA 2018).

Fig. 3

a Sy; b erosion; c aquifer thickness; d distance of fault; e bedrock level; f DEM; g annual rainfall; h clay thickness; i T; j soil type; k Debi zonation of pumping wells; l slope based on DEM; m groundwater level in 1995 (arrows depict general flow path); n groundwater level in 2015; o groundwater drawdown in 20 years (1995–2015); p land use; q land subsidence rate based on radar image technique until 2015

A considerable part of the central and southern study area falls in the high erosion-susceptibility classes (Fig. 3b) (Alimohammadi 2009). Moderately susceptible and highly erosion-resistant formations occur in most of the northern Varamin sub-basin and in a few isolated patches in the northern and southern parts of the area (Alimohammadi 2009).

The Varamin aquifer is unconfined (Nakhaei et al. 2019). Its thickness is greatest, up to 280 m, in the center of its northern half (Fig. 3c), and smallest, less than 50 m, in the southwestern part (Shemshaki et al. 2006).

Tectonic movements in this region are influenced by the Parchin, Kahrizak, Pishva, and Eyvanekey faults (Figs. 2, 3d). The Kahrizak and Eyvanekey faults are north-dipping thrust faults, and the Eyvanekey fault trends northwest–southeast (IIEES 2010). The Pishva fault dips to the northeast and is also a thrust fault. It cuts the Quaternary sediments and forms the boundary between the mountains and the plain at Pishva city (IIEES 2010).

The average elevation of the area (Fig. 3f) is 950 m above sea level (Mohebbi Tafreshi et al. 2019). Elevation ranges from 1148 m in the north to 810 m in the south and southeast of the aquifer (Nejatijahromi et al. 2019). The topography slopes from the northeast toward the southeast of the aquifer (Fig. 3l). The average annual rainfall of the study area (Fig. 3g) is 187.4 mm, and the average annual temperature is 16.4 °C (Nejatijahromi et al. 2019). The climate of the Varamin aquifer is often influenced by Siberian fronts from the north and Mediterranean fronts from the west and northwest (Mokhtari and Espahbod 2009).

Transmissivity is heterogeneous across the northern and southern halves of the aquifer (Fig. 3i) (Atarzadeh et al. 2014). The maximum transmissivity estimated in the north of the aquifer reaches 3000 m²/day (Mokhtari and Espahbod 2009). Transmissivity decreases toward the southern half because sediment grain size or aquifer thickness changes considerably. In the eastern and southern halves, the maximum is up to 150 m²/day (TRWA 2018).
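Transmissivity is the product of hydraulic conductivity K and saturated thickness b (T = K·b). Finer sediments lower K and a thinner aquifer lowers b, and either change lowers T. The minimal sketch below shows this relationship. The K and b values are hypothetical. They were chosen only to reproduce the order of magnitude of the northern and southern maxima quoted above and are not field data.

```python
def transmissivity(k_m_per_day: float, b_m: float) -> float:
    """Return transmissivity T = K * b in m^2/day.

    k_m_per_day: hydraulic conductivity K (m/day)
    b_m: saturated thickness b (m)
    """
    if k_m_per_day < 0 or b_m < 0:
        raise ValueError("K and b must be non-negative")
    return k_m_per_day * b_m


# Hypothetical values, not field data:
# coarse-grained, thick alluvium (north) vs. fine-grained, thin alluvium (south)
t_north = transmissivity(15.0, 200.0)  # 3000.0 m^2/day
t_south = transmissivity(3.0, 50.0)    # 150.0 m^2/day
print(f"north: {t_north} m^2/day, south: {t_south} m^2/day")
```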

Forage maize, barley, pistachio, grapes, vegetables, and alfalfa are the main crops grown over the Varamin aquifer (Nejatijahromi et al. 2019).

3 Input data

As listed in Table 1, 18 input layers were evaluated and prepared for use in the GIS environment. The land subsidence rate derived from radar imagery up to 2015 was used to compare and verify the results.

Table 1 Input data used to prepare, compare, and verify the land subsidence susceptibility models

3.1 Land subsidence effective factors

Based on a literature review (Ayalew et al. 2005; Behyari et al. 2017; Karsli et al. 2009; Minderhoud et al. 2018; Wang et al. 2009), this study used 15 factors for land subsidence susceptibility modeling:

- annual rainfall
- soil type
- T
- Debi (discharge) zonation of pumping wells
- aquifer thickness
- clay thickness
- DEM
- Sy
- groundwater drawdown over 20 years
- bedrock level
- lithological units
- erosion
- slope derived from the DEM
- land use
- distance to faults

Some of these factors are described below:

Slope: Slope is one of the most effective factors. It strongly influences the development and expansion of joints (diaclases) in lithostratigraphic units and can control land subsidence (Arca et al. 2018; Dai and Lee 2001; Suh et al. 2013). In areas with gentle slopes, runoff is slower, so surface water has enough time to infiltrate to depth and form dissolution cavities, especially in calcareous units. Through this dissolution of calcareous rocks (such as karst), slope is a contributing and causative factor in karstic subsidence (Behyari et al. 2017).
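The slope layer is derived from the DEM (Fig. 3l). The minimal sketch below shows one way to compute it, using plain central differences on a small grid. GIS slope tools typically use a weighted 3 × 3 neighborhood instead, so this is an illustration and not the procedure used in this study. The DEM values and cell size are hypothetical.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope (degrees) of a DEM stored as a list of rows of elevations (m).

    Uses central differences on interior cells; edge cells are returned as None.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    slope = [[None] * n_cols for _ in range(n_rows)]
    for i in range(1, n_rows - 1):
        for j in range(1, n_cols - 1):
            dz_dx = (dem[i][j + 1] - dem[i][j - 1]) / (2.0 * cell_size)
            dz_dy = (dem[i + 1][j] - dem[i - 1][j]) / (2.0 * cell_size)
            slope[i][j] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Hypothetical 3 x 3 DEM with 30 m cells, rising 3 m per cell toward the east
dem = [[900.0, 903.0, 906.0],
       [900.0, 903.0, 906.0],
       [900.0, 903.0, 906.0]]
s = slope_degrees(dem, 30.0)
print(round(s[1][1], 2))  # 5.71
```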

Land use: Urban areas, rangelands, and agricultural land are the largest water consumers (Taheri et al. 2018); agricultural use is high because groundwater is extracted to irrigate crops. Higher water consumption can lower groundwater levels and increase the likelihood of subsidence. Land use types with higher water consumption are therefore more important in assessing subsidence (Minderhoud et al. 2018).

Distance to faults: Fault activity (such as earthquakes) affects the likelihood of land subsidence. Areas farther from faults therefore have a lower likelihood of subsidence, and areas closer to faults a higher one (Aalipour Erdi et al. 2017; Arca et al. 2018; Chen et al. 2016; Hu et al. 2019; Pradhan et al. 2014).

Bedrock depth: Where the bedrock is shallow, the alluvium is thin and wells cannot be drilled. Groundwater is stored where the alluvium is thicker. In these areas, excessive well drilling, and the excessive pumping that follows, usually increases subsidence and the vertical displacement of layers (WRI 2014).

Drawdown: In regions covered by semi-consolidated or unconsolidated alluvial sediments, excessive groundwater pumping can cause land subsidence (Poland 1984). In the USA, more than 80% of identified land subsidence has resulted from the mismanagement and overuse of groundwater (USGS 2019b). Excessive pumping lowers the groundwater level, which in turn increases the occurrence of land subsidence (USGS 2019a).
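The 20-year drawdown layer (Fig. 3o) corresponds to the cell-by-cell difference between the 1995 and 2015 groundwater-level maps (Figs. 3m, 3n). The minimal sketch below shows that difference on a tiny grid and converts it to a mean annual decline. The water-level values are hypothetical and do not come from the study area.

```python
def drawdown(level_start, level_end):
    """Cell-by-cell drawdown (m): earlier water level minus later water level.

    Positive values mean the water table has fallen. Both grids must have the same shape.
    """
    if len(level_start) != len(level_end):
        raise ValueError("grids must have the same shape")
    result = []
    for row_start, row_end in zip(level_start, level_end):
        if len(row_start) != len(row_end):
            raise ValueError("grids must have the same shape")
        result.append([a - b for a, b in zip(row_start, row_end)])
    return result


# Hypothetical water-table elevations (m a.s.l.) on a 2 x 2 grid
level_1995 = [[880.0, 870.0], [860.0, 850.0]]
level_2015 = [[850.0, 845.0], [835.0, 830.0]]

dd = drawdown(level_1995, level_2015)
mean_annual_decline = [[v / 20.0 for v in row] for row in dd]  # 20-year period
print(dd)                   # [[30.0, 25.0], [25.0, 20.0]]
print(mean_annual_decline)  # [[1.5, 1.25], [1.25, 1.0]]
```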

Lithology: Formations and lithologies that contain fine-grained materials such as silt and clay increase the subsidence rate. Water also dissolves soluble rocks such as carbonates and gypsum. Lithological units containing these materials are therefore eroded and can enhance subsidence by forming dissolution sinks.

Soil type: Unconsolidated fine-grained sediment layers (such as silt and clay) may be present in the aquifer. When they are, a drop in hydraulic head increases the effective stress and consolidation occurs (Terzaghi 1925). The result appears at the land surface as subsidence (Nameghi et al. 2013).
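Terzaghi's effective stress principle, σ′ = σ − u, explains why a head drop causes consolidation. If total stress σ is assumed to stay constant, a head decline Δh lowers pore pressure u by ρw·g·Δh, and σ′ rises by the same amount. The minimal sketch below quantifies this. It then estimates the final one-dimensional compaction of a clay layer as thickness × m_v × Δσ′, where m_v is the coefficient of volume compressibility. The head drop, clay thickness, and m_v are hypothetical, and time-dependent consolidation is ignored.

```python
RHO_W = 1000.0  # density of water (kg/m^3), approximate
G = 9.81        # gravitational acceleration (m/s^2)


def effective_stress_increase(head_drop_m: float) -> float:
    """Increase in effective stress (Pa) caused by a fall in hydraulic head.

    Terzaghi: sigma' = sigma - u. Assuming total stress sigma stays constant,
    a head drop dh lowers pore pressure u by rho_w * g * dh, so sigma' rises by that amount.
    """
    return RHO_W * G * head_drop_m


def clay_compaction(thickness_m: float, mv_per_pa: float, d_sigma_pa: float) -> float:
    """Final one-dimensional compaction (m) of a layer: thickness * m_v * d_sigma'."""
    return thickness_m * mv_per_pa * d_sigma_pa


d_sigma = effective_stress_increase(20.0)        # hypothetical 20 m head drop
settle = clay_compaction(10.0, 2.0e-7, d_sigma)  # hypothetical 10 m clay, m_v = 2e-7 1/Pa
print(f"{d_sigma:.0f} Pa, {settle:.3f} m")
```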

Rainfall: Higher rainfall increases infiltration and can therefore raise the groundwater table. Rainfall is thus not a factor that intensifies subsidence. Because it raises the groundwater table, it can instead be considered a preventive or mitigating factor.

T: Accurate data on hydraulic properties such as transmissivity are essential for reliable land subsidence modeling (Li and Zhang 2018). Lower T values are associated with higher soil compressibility and therefore a higher land subsidence rate.

Aquifer thickness and aquifer hydraulic parameters: These parameters are positively correlated with subsidence occurrence. According to the Lohman (1961) equation, land subsidence depends on the storage coefficient and its parameters:

$$\Delta b = \Delta p\left( {\frac{S}{\gamma } - nb\beta } \right)$$
(1)

In this equation:

- Δb is the land subsidence (aquifer compaction)
- Δp is the decrease in fluid pressure in the aquifer
- γ is the specific weight of water
- n is the porosity
- b is the aquifer thickness (or saturated thickness)
- β is the compressibility of water, the reciprocal of the elastic modulus of water \(\left( {\beta = \frac{1}{{E_{w} }}} \right)\)
- S is the storage coefficient of a confined aquifer, calculated from the De Wiest (1966) equation:

$$S = b\gamma \left[ {\left( {1 - n} \right)\alpha + n\beta } \right]$$
(2)

In this equation, α is the compressibility of the aquifer’s solid grain material [the reciprocal of its elastic modulus \(\left( {\alpha = \frac{1}{{E_{s} }}} \right)\)].
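Substituting Eq. 2 into Eq. 1 shows how the two equations fit together. S/γ − nbβ = b[(1 − n)α + nβ] − nbβ = b(1 − n)α, so the water-compressibility term cancels and Δb = Δp·b·(1 − n)·α. The minimal sketch below evaluates Eqs. 1 and 2 directly and checks them against this closed form. The aquifer thickness, porosity, α, and head decline are hypothetical. The water constants are approximate textbook values.

```python
def storage_coefficient(b, gamma, n, alpha, beta):
    """Eq. 2: S = b * gamma * ((1 - n) * alpha + n * beta)."""
    return b * gamma * ((1.0 - n) * alpha + n * beta)


def lohman_subsidence(dp, b, gamma, n, alpha, beta):
    """Eq. 1: delta_b = delta_p * (S / gamma - n * b * beta)."""
    s = storage_coefficient(b, gamma, n, alpha, beta)
    return dp * (s / gamma - n * b * beta)


GAMMA_W = 9810.0   # specific weight of water, N/m^3 (approximate)
BETA_W = 4.4e-10   # compressibility of water, 1/Pa (approximate)

# Hypothetical aquifer: 100 m thick, porosity 0.3, alpha = 1e-8 1/Pa, 20 m head decline
b, n, alpha = 100.0, 0.3, 1.0e-8
dp = GAMMA_W * 20.0  # pressure decline in Pa

db = lohman_subsidence(dp, b, GAMMA_W, n, alpha, BETA_W)
# Substituting Eq. 2 into Eq. 1 cancels the water term: delta_p * b * (1 - n) * alpha
closed_form = dp * b * (1.0 - n) * alpha
print(f"subsidence = {db:.4f} m (closed form {closed_form:.4f} m)")
```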

4 Methods

4.1 Factors standardization

ArcGIS 10 provides various fuzzy membership functions for normalizing parameters, and these are commonly used in fuzzy logic applications (Mohebbi Tafreshi et al. 2019; Raines et al. 2010). Several of these functions are controlled by a spread and a midpoint. The membership function chosen for each criterion depends on its importance, its nature, and its relationship with the goal (Mohebbi Tafreshi et al. 2019). This research used three fuzzy membership functions to normalize the factors:

Fuzzy Small: This function is used when small input values should have higher membership values (Mohebbi Tafreshi et al. 2018; Raines et al. 2010; Zadeh 1965). Membership increases for values below the midpoint (Fig. 4a).

$$\upmu\left( x \right) = \frac{1}{1 + \left( \frac{x}{f_{2}} \right)^{f_{1}}}$$
(3)

In this equation, f1 (the spread) and f2 (the midpoint) are user inputs.

Fig. 4

Fuzzy membership transformation diagrams: a small, b linear, c large

Fuzzy Linear: This function establishes a linear relationship between user-defined minimum and maximum values (Raines et al. 2010; Zadeh 1965). Values below the minimum are assigned 0, and values above the maximum are assigned 1 (Fig. 4b).

$$\upmu\left( x \right) = 0\quad if\quad x < min,\quad\upmu\left( x \right) = 1\quad if\quad x > max,\quad {\text{otherwise}}\quad\upmu\left( x \right) = \frac{{\left( {x - min} \right)}}{{\left( {max - min} \right)}}$$
(4)

In this equation, min and max are user inputs.

Fuzzy Large: This function is used when large input values should have higher membership values. It is the exact opposite of the Fuzzy Small function (Mohebbi Tafreshi et al. 2018; Zadeh 1965). Membership increases for values above the midpoint (Fig. 4c).

$$\upmu\left( x \right) = \frac{1}{1 + \left( \frac{x}{f_{2}} \right)^{-f_{1}}}$$
(5)

In this equation, f1 is the user-defined spread, and f2 is the midpoint.
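The three membership functions can be sketched as below. The small and large forms use 1/(1 + (x/f2)^(±f1)), so membership equals 0.5 exactly at the midpoint f2. The example inputs and their spread, midpoint, and min/max settings are hypothetical and are not the settings used in this study. Distance to faults uses Fuzzy Small because closer means more susceptible, and drawdown uses Fuzzy Large because larger means more susceptible. Inputs are assumed to be non-negative, as distances and drawdowns are.

```python
def fuzzy_small(x: float, spread: float, midpoint: float) -> float:
    """Fuzzy Small: mu = 1 / (1 + (x / midpoint) ** spread); high for small x."""
    if x < 0 or midpoint <= 0 or spread <= 0:
        raise ValueError("expects x >= 0, midpoint > 0, spread > 0")
    return 1.0 / (1.0 + (x / midpoint) ** spread)


def fuzzy_large(x: float, spread: float, midpoint: float) -> float:
    """Fuzzy Large: mu = 1 / (1 + (x / midpoint) ** -spread); high for large x."""
    if x < 0 or midpoint <= 0 or spread <= 0:
        raise ValueError("expects x >= 0, midpoint > 0, spread > 0")
    if x == 0:
        return 0.0  # limit of the formula as x -> 0
    return 1.0 / (1.0 + (x / midpoint) ** (-spread))


def fuzzy_linear(x: float, lo: float, hi: float) -> float:
    """Fuzzy Linear (Eq. 4): 0 below lo, 1 above hi, linear in between."""
    if hi <= lo:
        raise ValueError("expects hi > lo")
    if x < lo:
        return 0.0
    if x > hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical settings, for illustration only
print(fuzzy_small(1000.0, spread=2.0, midpoint=5000.0))  # distance to fault (m)
print(fuzzy_large(30.0, spread=2.0, midpoint=15.0))      # drawdown (m)
print(fuzzy_linear(12.0, lo=2.0, hi=16.0))               # hypothetical min/max range
```

Because 1/(1 + a) + 1/(1 + 1/a) = 1, the small and large memberships of the same value with the same settings always sum to one. This is one way to see that the two functions are mirror images.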

4.2 Modeling using GEP

GEP is a generalized genetic algorithm based on Darwinian evolution that was first proposed by Ferreira in 1999 (Ferreira 2001). The algorithm proceeds as follows:

1. Generate an initial population of solutions, either randomly or using some prior information.
2. Express each chromosome as an expression tree and evaluate it with a fitness function, which measures how well the solution fits the problem domain (Abbasi et al. 2019). Fitness is usually evaluated on a set of instances of the actual problem, called fitness cases.
3. If a solution of satisfactory quality has been found or a specified number of generations has been reached, stop evolution and report the best solution (Maroufpoor et al. 2019).
4. Otherwise, retain the best solution of the current generation (elitism) and subject the remaining solutions to selection. Selection follows the principle of survival of the fittest, so the best individuals have a better chance of producing offspring.
5. Repeat the process for many generations. The quality of the population is expected to improve on average as the generations progress (Ferreira 2006).

Each individual is encoded as a simple linear structure, and all changes are made only to these structures, so relatively complex structures do not need to be expanded at each stage (Abbasi et al. 2019). The algorithm thus defines an objective function based on quality criteria. It uses this function to compare candidate solutions in a step-by-step process of modifying the data structures until it reaches an appropriate solution. In this method, phenomena are modeled using a set of functions and a set of terminals.
The function set usually includes the arithmetic operators [+, −, *, /], trigonometric functions, and other mathematical or user-defined functions considered appropriate for interpreting the model. The terminal set consists of constant values and the independent variables of the problem (Ferreira 2001).
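The selection step with elitism can be illustrated with the minimal sketch below. It uses fitness-proportional (roulette-wheel) sampling plus cloning of the best individual, which is how GEP selection is commonly described. The candidate strings, their error values, and the RMSE-to-fitness transform 1/(1 + RMSE) are hypothetical. Mutation, transposition, and recombination are omitted, so this shows one selection step and not a full GEP run.

```python
import random


def select_next_generation(population, fitness, rng, n_elite=1):
    """Return the parents of the next generation.

    The n_elite fittest individuals are copied unchanged (elitism); the remaining
    slots are filled by fitness-proportional (roulette-wheel) sampling, so fitter
    individuals are more likely, but not certain, to reproduce.
    """
    if len(population) != len(fitness):
        raise ValueError("one fitness value per individual is required")
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    elite = [population[i] for i in ranked[:n_elite]]
    sampled = rng.choices(population, weights=fitness, k=len(population) - n_elite)
    return elite + sampled


def fitness_from_rmse(rmse: float) -> float:
    """Hypothetical transform: a smaller RMSE gives a larger (positive) fitness."""
    return 1.0 / (1.0 + rmse)


# Hypothetical candidate solutions (gene strings) and their RMSE on the fitness cases
population = ["*+-OqrS", "+OqrSOq", "-OqrSOq", "*OqrSOq"]
errors = [0.2, 1.5, 3.0, 0.9]
fitness = [fitness_from_rmse(e) for e in errors]

rng = random.Random(42)
parents = select_next_generation(population, fitness, rng)
print(parents[0])  # the elite individual: *+-OqrS
```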

In this study, GeneXpro Tools software was used to develop and implement a gene expression programming model for prediction. One strength of gene expression programming is that genetic diversity is very simple to create, because the genetic operators act at the chromosome level. Another strength is its unique multigenic nature, which allows the evaluation of complex models made up of several sub-models. Subsidence in the Varamin Plain was modeled as follows:

1. Select an appropriate fitness function. The root mean square error (RMSE) was chosen (Mehdizadeh et al. 2016).
2. Select the set of input variables and the set of functions used to generate the chromosomes. This study used the four basic operators [+, −, *, /] and the mathematical functions [Tanh, X2, Atan, Inv, 3Rt, Ln, NOT, Min2, Max2, Exp, Avg2].
3. Select the structure and architecture of the chromosomes, that is, the head length and the number of genes (Mehdizadeh et al. 2016).
4. Select the linking function that joins the sub-expression trees (sub-ETs) of the genes.
5. Select the genetic operators and their rates. A combination of all the genetic operators was used: mutation, inversion, three types of transposition, and three types of recombination.
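The first and fourth steps can be illustrated with the minimal sketch below. A multigenic model is evaluated by linking the outputs of its sub-ETs, and the linked model is scored with RMSE on a set of fitness cases. Addition is assumed as the linking function here; the text does not state which linking function was chosen. The two sub-ETs, the inputs x1 and x2, and the fitness cases are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean square error, the fitness measure named in the first step."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Two hypothetical sub-ETs (one per gene) over two normalized inputs x1, x2
sub_ets = [
    lambda x1, x2: x1 * x2,
    lambda x1, x2: math.tanh(x1 - x2),
]


def linked_model(x1, x2):
    """Combine the sub-ETs with an addition linking function (an assumption here)."""
    return sum(sub_et(x1, x2) for sub_et in sub_ets)


# Hypothetical fitness cases: (x1, x2) inputs and observed target values
cases = [(0.5, 0.5), (0.8, 0.2), (0.1, 0.9)]
targets = [0.3, 0.7, -0.5]
predictions = [linked_model(x1, x2) for x1, x2 in cases]
print(round(rmse(predictions, targets), 4))
```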

GEP is a development of GA in which individuals are encoded as simple, linear chromosomes that are then expressed as parse trees. This complete separation of genotype and phenotype makes GEP much faster (100–10,000 times) than genetic programming (GP) (Ferreira 2001; Dey et al. 2015). For instance, Fig. 5 shows the expression tree of the algebraic expression in Eq. 6.

$$\left( {O + q} \right)*\left( {r - S} \right)$$
(6)

In GEP, complex technological and scientific problems can be solved with linear chromosomes and expression trees (ETs) (Dey et al. 2015). A chromosome is a linear symbolic string of constant length consisting of one or more genes of equal size. Fig. 6 presents a typical GEP chromosome. Linear chromosomes are easy to manipulate genetically through replication, mutation, recombination, and transposition (Ferreira 2001; Dey et al. 2015). Structurally, they are composed of genes, each of which consists of a head and a tail (Dey et al. 2015). Eq. 7 gives the tail length (tl) as a function of the head length (hl) and the maximum number of arguments of the functions used (m):

$$t_{l} = h_{l} \left( {m - 1} \right) + 1$$
(7)

Although all GEP genes have the same size, they encode expression trees of different sizes (Alkroosh and Ammash 2015). The trees give a spatial representation of the interactions among a gene’s components in the solution (Alkroosh and Ammash 2015). Figure 7 presents the expression trees of the genes of the chromosome in Fig. 6.
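The way equal-length genes encode trees of different sizes can be shown by reading a gene breadth-first, as in Ferreira's Karva notation. The minimal sketch below computes the tail length from Eq. 7 and decodes two 7-symbol genes with a head length of 3 and only binary arithmetic functions. The first gene reproduces Eq. 6. The second uses only its first three symbols, so the rest of the gene is non-coding. The gene strings and terminal values are hypothetical, and the sketch does not claim to show how GeneXpro Tools represents chromosomes internally.

```python
import operator
from collections import deque

# Function set with arities; any other symbol is treated as a terminal (arity 0)
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
ARITY = {symbol: 2 for symbol in FUNCTIONS}


def tail_length(head_length: int, max_arity: int) -> int:
    """Eq. 7: t = h * (m - 1) + 1."""
    return head_length * (max_arity - 1) + 1


def decode_gene(gene: str):
    """Translate a gene read in breadth-first (Karva) order into an expression tree.

    Returns (root, n_coding): nodes are [symbol, children] lists, and n_coding is
    how many leading symbols were used; the rest of the gene is non-coding.
    """
    nodes = [[symbol, []] for symbol in gene]
    queue = deque([nodes[0]])
    next_index = 1
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = nodes[next_index]
            next_index += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0], next_index


def evaluate(node, env):
    """Evaluate an expression tree, looking up terminals in the mapping env."""
    symbol, children = node
    if symbol in FUNCTIONS:
        left, right = (evaluate(child, env) for child in children)
        return FUNCTIONS[symbol](left, right)
    return env[symbol]


h = 3
t = tail_length(h, 2)  # 4, so every gene has 3 + 4 = 7 symbols
env = {"O": 1.0, "q": 2.0, "r": 5.0, "S": 3.0}  # hypothetical terminal values

tree_a, used_a = decode_gene("*+-OqrS")  # (O + q) * (r - S), as in Eq. 6
tree_b, used_b = decode_gene("+OqrSOq")  # O + q; the last 4 symbols are non-coding
print(t, used_a, evaluate(tree_a, env), used_b, evaluate(tree_b, env))  # 4 7 6.0 3 3.0
```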

Fig. 5

Expression tree of Eq. 6 (Dey et al. 2015)

Fig. 6

GEP chromosome (Alkroosh and Ammash 2015)

Fig. 7

Expression trees of chromosome in Fig. 6 (Alkroosh and Ammash 2015)

4.3 Modeling using ANN

ANNs are computational methods that learn from available input samples. They use processing units called neurons and obtain a model by adjusting weights. The neuron is the smallest information processing unit and forms the basis of neural network performance. As shown in Fig. 8, a neuron consists of three main parts (Arjun and Kumar 2011):

- A set of synapses connects each input xj to the neuron through a weight wkj.
- A summing junction, uk, adds up the weighted input signals.
- An activation function [\(\varphi \left( . \right)\)] limits the range of the output.

The bias bk increases or decreases the net input of the neuron.

Fig. 8

Non-linear model of a neuron (Arjun and Kumar 2011)

Equations 8 and 9 represent the neural network structure mathematically:

$$u_{k } = \mathop \sum \limits_{j = 1}^{n} w_{kj} x_{j}$$
(8)
$$y_{k} = \varphi \left( {u_{k} + b_{k} } \right)$$
(9)
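Eqs. 8 and 9 can be sketched for a single neuron as below. The weighted inputs are summed to give uk, the bias bk is added, and the activation function φ is applied. tanh is used as the default activation, with a logistic sigmoid shown as an alternative. Both are common choices and not necessarily the ones used in this study. The inputs, weights, and bias are hypothetical.

```python
import math


def neuron_output(x, w, b, activation=math.tanh):
    """Single neuron k (Eqs. 8-9): u_k = sum_j w_kj * x_j, then y_k = phi(u_k + b_k)."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # Eq. 8: summing junction
    return activation(u + b)                      # Eq. 9: add bias, then activate


def logistic(v):
    """Logistic sigmoid, another common choice of activation function."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical inputs, weights, and bias
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, 0.1]
y_tanh = neuron_output(x, w, b=0.1)
y_logistic = neuron_output(x, w, b=0.1, activation=logistic)
print(y_tanh, y_logistic)
```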

The network learns from input–output samples k, where x1, x2, …, xn are the inputs and y1, y2, …, yn are the corresponding outputs. In Eqs. 8 and 9, wkj are the weights and bk the biases. Each neuron receives the outputs of all neurons in the previous layer, each through its own weight. Once the network is created and the numbers of hidden layers and neurons are set, it is trained on the available input–output samples. Training applies a learning rule that updates the weight vectors (Ross 2005). The activation function of each neuron determines its output from the sum of its weighted inputs. The same activation function is generally chosen for all neurons in a layer, although this is not necessary (Ross 2005).

Figure 9 shows the structure of a multilayer perceptron (MLP) network with I inputs, one hidden layer with H units, and one output layer with O units. As Fig. 9 shows, layers can be divided by type and location into input, hidden, and output layers:

- The input layer receives information and passes it into the system.
- The output layer sends the computed values out of the system.
- The hidden layers have inputs and outputs only within the system.

Here, I is the number of input variables, H is the number of hidden-layer nodes, and O is the number of output variables.

This research uses the error backpropagation rule, one of the essential learning algorithms for ANNs. It is used to train multilayer feedforward neural networks, commonly called MLP networks (Fig. 10), and consists of two main passes:

1. Forward pass: the input presented to the input layer is propagated through the network, layer by layer, to the output layer. The network weights are held constant during this pass.
2. Backward pass: the error computed at the output layer is propagated back through the network layers, and the MLP weights are adjusted to minimize the sum of squared network errors.

The objective function for network training is usually defined as the sum (or mean) of the squared errors.
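The two passes of the backpropagation rule can be illustrated with the minimal, pure-Python sketch below. It uses one hidden layer of tanh units, a linear output, and weights updated after every sample to reduce the squared error. The hidden-layer size, learning rate, number of epochs, activation functions, and toy data are all assumptions for illustration and are not the network configuration used in this study.

```python
import math
import random


def train_mlp(samples, n_hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden units, linear output) by error
    backpropagation, updating the weights after every sample.

    samples: list of (inputs, target) pairs. Returns (predict, mse_per_epoch).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(w1[k], x)) + b1[k])
                  for k in range(n_hidden)]
        return hidden, sum(w2[k] * hidden[k] for k in range(n_hidden)) + b2

    history = []
    for _ in range(epochs):
        sse = 0.0
        for x, target in samples:
            hidden, y = forward(x)  # forward pass: weights held fixed
            err = y - target        # dE/dy for E = 0.5 * err ** 2
            sse += err ** 2
            # backward pass: send the output error back to each hidden unit
            for k in range(n_hidden):
                delta = err * w2[k] * (1.0 - hidden[k] ** 2)  # tanh' = 1 - tanh^2
                w2[k] -= lr * err * hidden[k]
                for i in range(n_in):
                    w1[k][i] -= lr * delta * x[i]
                b1[k] -= lr * delta
            b2 -= lr * err
        history.append(sse / len(samples))  # mean squared error for this epoch

    return (lambda x: forward(x)[1]), history


# Hypothetical toy data: target is a smooth function of two normalized inputs
data = [((x1, x2), 0.5 * x1 - 0.3 * x2 + 0.2 * x1 * x2)
        for x1 in (-1.0, -0.5, 0.0, 0.5, 1.0) for x2 in (-1.0, 0.0, 1.0)]
predict, mse = train_mlp(data)
print(f"MSE first epoch: {mse[0]:.4f}, last epoch: {mse[-1]:.4f}")
```

Each hidden unit's error term is computed with the output weight from before that weight's update, so both weight layers use gradients from the same forward pass.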

Fig. 9

The MLP with one hidden layer (Arjun and Kumar 2011)

Fig. 10

The flowchart of the methodology

4.4 Theoretical comparison between the ANN and the GEP methods and the conditions for their applications

An ANN is an artificial intelligence-based technique. It is a flexible mathematical method that can recognize intricate nonlinear relationships between input and output data sets.

The main advantage of ANN models over statistical methods is that statistical methods often assume linear relationships and/or normal distributions, whereas reality is nonlinear and non-normal. ANN models can therefore represent real-world behavior more closely. ANNs can also handle large, complex systems with many interrelated parameters (Nourani et al. 2011). The no free lunch theorem states that, uniformly averaged over all target functions, the expected error is the same for any two algorithms. Nonetheless, ANNs have other advantages over other algorithms:

- They degrade gracefully: they continue to function with noisy input data or even after some units are removed.
- Their inherently distributed nature allows better implementation in distributed environments.
- The ANN is a non-parametric model and thus avoids errors in parameter estimation, whereas most statistical methods (MLR, etc.) are parametric models that require a stronger statistical background (Singh and Su 2016).

The drawback of this method is that its final product is not a mathematical equation that can be easily implemented; indeed, a major limitation of common soft computing techniques is that they do not provide a closed-form prediction equation (Mohammadzadeh et al. 2019). Owing to the importance of the research topic, numerous studies in the last decade have focused on linear and nonlinear regression equations (Pham et al. 2016), and modeling with artificial intelligence (AI) has been a very active research area (Pham et al. 2016). According to previous research, although AI techniques such as ANN have demonstrated superior capability over traditional modeling methods, and ANN has been a successful choice for prediction problems, it has the following limitations: (1) ANN does not provide information about the relative significance of the various parameters (Samui 2008); (2) a common criticism of neural networks is that they require a large and diverse set of training data (Saberi et al. 2013); (3) the knowledge acquired during training is stored implicitly, so it is hard to give a reasonable interpretation of the overall structure of the network (Samui 2014); (4) for the ANN to learn, training examples must be defined and presented to the network together with the desired outputs. The network's success depends directly on the selected examples, and if the event cannot be represented to the network in all its aspects, the network may produce false output. In addition, ANN has some intrinsic disadvantages, such as poor generalization, convergence to local minima, over-fitting, and slow convergence (Samui 2014).

GEP is another artificial intelligence-based technique commonly used for modeling nonlinear systems, and it is newer than ANN. Its advantages are, first, that the chromosomes are simple entities: linear, compact, relatively small, and easy to manipulate genetically (replicate, mutate, recombine, transpose); and second, that the expression trees are exclusively the expression of their respective chromosomes (Moghassem and Fallahpour 2013). A particularly powerful property of GEP, which makes it attractive, is that the user can easily obtain an explicit formula relating the inputs to the output (Guven and Kisi 2013; Parasuraman et al. 2007).

Unlike ANN, GEP is self-parameterizing: it creates the model structure without any user tuning (Danandeh Mehr et al. 2014). Moreover, whereas ANNs are black-box models that do not describe the physical relationships among the various process components (Alavi et al. 2011; Moghassem and Fallahpour 2013), GEP models can give explicit expressions for the relationships between dependent and independent variables (Wang et al. 2016). Less experienced technicians can use such expressions more easily than ANN models (Wang et al. 2016).

In conclusion, the two methods can do similar things, and depending on the problem, sometimes ANN will fit better and sometimes GEP will. ANNs are usually straightforward to implement and work well, but their black-box nature makes them less user-friendly (Wolpert and Macready 1997). GEP results, on the other hand, are often human-readable, but coding such an algorithm from scratch can be painstaking (Wolpert and Macready 1997). Nevertheless, one should keep in mind the no free lunch theorem (NFLT), which states that two algorithms are equivalent when their performance is averaged across all possible problems (Wolpert and Macready 1997).

4.5 Performance evaluation

To evaluate model performance, seven statistical criteria based on the observed land subsidence were used. They are described below.

The coefficient of determination (R2) indicates what percentage of the variation in the dependent variable is explained by the independent variable(s); the remaining variation is attributable to other factors. R2 always lies between 0 and 1 (i.e., 0 to 100%).

$$R^{2} = \left( \frac{n\sum_{t=1}^{n} F_{t}A_{t} - \left(\sum_{t=1}^{n} F_{t}\right)\left(\sum_{t=1}^{n} A_{t}\right)}{\sqrt{n\sum_{t=1}^{n} F_{t}^{2} - \left(\sum_{t=1}^{n} F_{t}\right)^{2}}\;\sqrt{n\sum_{t=1}^{n} A_{t}^{2} - \left(\sum_{t=1}^{n} A_{t}\right)^{2}}} \right)^{2}$$
(10)

where Ft is the forecast data, At is the actual data (observed land subsidence), and n is the number of data points.
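Eq. 10 can be computed directly from the running sums it contains. Note that this R2 is the squared Pearson correlation between forecasts and observations, so it rewards any linear association: forecasts that are consistently half (or twice) the observed values still score R2 = 1. It coincides with the 1 - SSE/SST definition only for the fitted values of a least-squares linear regression with an intercept, which is why error-based criteria are useful alongside it. The function name `r_squared` is illustrative.

```python
import math


def r_squared(forecast, actual):
    """Squared Pearson correlation between forecasts F_t and observations A_t (Eq. 10)."""
    n = len(forecast)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    s_f, s_a = sum(forecast), sum(actual)
    s_fa = sum(f * a for f, a in zip(forecast, actual))
    s_ff = sum(f * f for f in forecast)
    s_aa = sum(a * a for a in actual)
    num = n * s_fa - s_f * s_a
    # The two square roots are multiplied, not subtracted.
    den = math.sqrt(n * s_ff - s_f ** 2) * math.sqrt(n * s_aa - s_a ** 2)
    return (num / den) ** 2


print(r_squared([1, 2, 3, 4], [2, 4, 5, 4]))   # made-up values
print(r_squared([1, 2, 3], [2, 4, 6]))         # forecasts are half the observations
```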

The Mean Squared Error (MSE) is the average of the squared deviations of an estimator from the true values. It is a criterion of particular importance in statistics (Lehmann and Casella 1998).

$$\text{MSE} = \frac{\sum_{t=1}^{n} \left( A_{t} - F_{t} \right)^{2}}{n}$$
(11)

where Ft is the forecast data, At is the actual data (observed land subsidence), and n is the number of data points.

The Mean Absolute Error (MAE) is the average magnitude of the errors, regardless of sign. Because the errors are not squared, it is less sensitive to large individual errors (outliers) than squared-error measures (Willmott and Matsuura 2005).

$$\text{MAE} = \frac{\sum_{t=1}^{n} \left| A_{t} - F_{t} \right|}{n}$$
(12)

where At is the actual data (observed land subsidence), Ft is the forecast data, and n is the number of data points.
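Eq. 12 averages the absolute errors, i.e. it is a mean, not a median, absolute error. The small comparison below, with made-up numbers, shows the difference: a single large error moves the mean absolute error but leaves the median absolute error unchanged. The function names are illustrative.

```python
from statistics import mean, median


def mean_absolute_error(forecast, actual):
    """Eq. 12: the average of |A_t - F_t|."""
    return mean(abs(a - f) for f, a in zip(forecast, actual))


def median_absolute_error(forecast, actual):
    """The median of |A_t - F_t|, shown only for contrast with Eq. 12."""
    return median(abs(a - f) for f, a in zip(forecast, actual))


actual = [1, 2, 3, 4, 5]          # made-up observations
forecast = [1, 2, 3, 4, 15]       # one large error of 10
print(mean_absolute_error(forecast, actual))    # mean of 0, 0, 0, 0, 10
print(median_absolute_error(forecast, actual))  # median of the same errors
```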

The root mean square error (RMSE) is the square root of the mean of the squared deviations of the estimated values from the observed values, i.e., the square root of the MSE. It reflects the spread of the errors and is expressed in the same units as the data; the smaller the RMSE (the closer to zero), the better the model performance (Hyndman and Koehler 2006).

$$\text{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} \left( A_{t} - F_{t} \right)^{2}}{n}}$$
(13)

where Ft is the forecast data, At is the actual data (observed land subsidence), and n is the number of data points.
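Eqs. 11 and 13 differ only by a square root, so RMSE is in the units of the data while MSE is in squared units. With made-up absolute errors of 0, 0, 0, 0, and 10 (mean absolute error 2), the sketch below gives an MSE of 20 and an RMSE of about 4.47, more than twice the mean absolute error, which shows how squaring weights large errors. RMSE equals the mean absolute error only when all errors have the same magnitude. The function names are illustrative.

```python
import math


def mse(forecast, actual):
    """Eq. 11: mean of the squared errors (A_t - F_t)**2."""
    if len(forecast) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - f) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Eq. 13: square root of the MSE, in the same units as the data."""
    return math.sqrt(mse(forecast, actual))


actual = [1, 2, 3, 4, 5]
forecast = [1, 2, 3, 4, 15]   # absolute errors 0, 0, 0, 0, 10
print(mse(forecast, actual), round(rmse(forecast, actual), 3))
```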

The remaining three criteria, which GeneXproTools uses specifically to evaluate model performance, are the Relative Absolute Error (RAE) (Eq. 14), the Relative Squared Error (RSE) (Eq. 15), and the Root Relative Squared Error (RRSE) (Eq. 16).

$$\text{RAE} = \frac{\sum_{t=1}^{n} \left| A_{t} - F_{t} \right|}{\sum_{t=1}^{n} \left| A_{t} - \bar{A} \right|}$$
(14)
$$\text{RSE} = \frac{\sum_{t=1}^{n} \left( A_{t} - F_{t} \right)^{2}}{\sum_{t=1}^{n} \left( A_{t} - \bar{A} \right)^{2}}$$
(15)
$$\text{RRSE} = \sqrt{\frac{\sum_{t=1}^{n} \left( A_{t} - F_{t} \right)^{2}}{\sum_{t=1}^{n} \left( A_{t} - \bar{A} \right)^{2}}}$$
(16)

In the above equations, At is the actual data (measured subsidence from the radar images), Ft is the data estimated by the model, and \(\bar{A}\) is the average of real data.
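Eqs. 14 to 16 compare the model's errors with those of a naive model that always predicts the mean of the observations. Values below 1 therefore mean the model beats that baseline, a value of 1 means it does no better, and 1 - RSE is the Nash-Sutcliffe efficiency. The sketch below, with an illustrative function name, computes all three.

```python
import math


def relative_errors(forecast, actual):
    """Return (RAE, RSE, RRSE) as defined in Eqs. 14-16."""
    n = len(actual)
    if n != len(forecast) or n == 0:
        raise ValueError("need two non-empty sequences of equal length")
    a_bar = sum(actual) / n                      # mean of the observations
    abs_err = sum(abs(a - f) for f, a in zip(forecast, actual))
    sq_err = sum((a - f) ** 2 for f, a in zip(forecast, actual))
    abs_dev = sum(abs(a - a_bar) for a in actual)
    sq_dev = sum((a - a_bar) ** 2 for a in actual)
    if sq_dev == 0:
        raise ValueError("observations are constant; ratios are undefined")
    rae = abs_err / abs_dev
    rse = sq_err / sq_dev
    return rae, rse, math.sqrt(rse)


print(relative_errors([3, 3, 3, 3, 3], [1, 2, 3, 4, 5]))  # predicts the mean
print(relative_errors([1, 2, 3, 4, 6], [1, 2, 3, 4, 5]))  # one small error
```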

5 Results and discussions

5.1 Factors standardization with GIS fuzzy memberships

For the DEM, Sy, distance of fault, T, rain, and slope parameters, lower values have a greater effect on land subsidence, so these factors were standardized with the fuzzy “Small” membership function (Mohebbi Tafreshi et al. 2019). Figure 11 shows the fuzzy standardization of one of these parameters using the fuzzy “Small” membership function.
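The “Small” membership function can be sketched as a decreasing sigmoid, μ(x) = 1 / (1 + (x / midpoint)^spread), which to our understanding is the form documented for the Fuzzy Small option of ArcGIS Spatial Analyst's Fuzzy Membership tool. The paper does not report the midpoint and spread it used, so the values below (a hypothetical 5000 m midpoint for distance to fault and a spread of 5) are assumptions for illustration only.

```python
def fuzzy_small(x, midpoint, spread=5.0):
    """Decreasing sigmoidal membership: ~1 for x far below the midpoint,
    0.5 at the midpoint, ~0 far above it. Assumes non-negative x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return 1.0 / (1.0 + (x / midpoint) ** spread)


# Hypothetical example: distance to fault in metres, midpoint 5000 m.
for d in (0.0, 2500.0, 5000.0, 10000.0):
    print(d, round(fuzzy_small(d, midpoint=5000.0), 3))
```

A larger spread makes the transition around the midpoint sharper; a smaller one spreads membership values more evenly across the range.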

Fig. 11

Fuzzy standardization of the “distance of fault” parameter using the fuzzy “small” membership function. In this figure, 0 (yellow) is assigned to the least effective values and 1 (blue) to the most effective values

The fuzzy “Large” membership function was used for parameters whose higher values have a greater effect on the rate of land subsidence (Mohebbi Tafreshi et al. 2019). Accordingly, the aquifer thickness, bedrock depth, Debi (well discharge), and G.W. drawdown parameters were fuzzy-standardized with this membership function (Mohebbi Tafreshi et al. 2019). Figure 12 shows the fuzzy standardization of one of these parameters using the fuzzy “Large” membership function.
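The “Large” membership function is the increasing counterpart, μ(x) = 1 / (1 + (x / midpoint)^(-spread)); again, this mirrors the ArcGIS Fuzzy Large form as we understand it, and the midpoint (a hypothetical 100 m for aquifer thickness) and spread are assumptions, since the paper does not report them.

```python
def fuzzy_large(x, midpoint, spread=5.0):
    """Increasing sigmoidal membership: ~0 for x far below the midpoint,
    0.5 at the midpoint, ~1 far above it. Assumes non-negative x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        return 0.0  # limit of the formula as x -> 0
    return 1.0 / (1.0 + (x / midpoint) ** (-spread))


# Hypothetical example: aquifer thickness in metres, midpoint 100 m.
for thickness in (0.0, 50.0, 100.0, 200.0):
    print(thickness, round(fuzzy_large(thickness, midpoint=100.0), 3))
```

With the same midpoint and spread, this function equals one minus the corresponding “Small” sigmoid, so the two differ only in which end of the range receives membership 1.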

Fig. 12

Fuzzy standardization of the “aquifer thickness” parameter using the fuzzy “large” membership function. In this figure, 0 (yellow) is assigned to the least effective values and 1 (blue) to the most effective values

Since land use, geology, erosion, and soil type have qualitative classes, these parameters were fuzzy-standardized with the “Linear” membership function after a numerical value had been assigned to each qualitative class (Table 2); a larger numerical value represents a greater effect on land subsidence (Mohebbi Tafreshi et al. 2019). Figure 13 shows the fuzzy standardization of one of these parameters using the fuzzy “Linear” membership function, and Fig. 14 presents all fuzzified factors.
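For the qualitative layers, the procedure amounts to mapping each class to its score and then rescaling the scores linearly to [0, 1]. The sketch assumes the linear function rescales between the minimum and maximum score; the class names and scores are hypothetical placeholders, not the values of Table 2.

```python
def fuzzy_linear(x, low, high):
    """Linear membership: 0 at or below `low`, 1 at or above `high`,
    and a straight line in between."""
    if high <= low:
        raise ValueError("high must be greater than low")
    t = (x - low) / (high - low)
    return min(1.0, max(0.0, t))


# Hypothetical scores for qualitative classes (a larger score = a larger
# assumed effect on subsidence). The real scores are those in Table 2.
class_scores = {"class_A": 1, "class_B": 2, "class_C": 3, "class_D": 4}
lo, hi = min(class_scores.values()), max(class_scores.values())
memberships = {c: fuzzy_linear(s, lo, hi) for c, s in class_scores.items()}
print(memberships)
```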

Table 2 Numerical values assigned to each qualitative class according to its effect on land subsidence
Fig. 13

Fuzzy standardization of the “soil type” parameter using the fuzzy “linear” membership function. In this figure, 0 (yellow) is assigned to the least effective values and 1 (blue) to the most effective values

Fig. 14

Fuzzified factors. a Sy; b erosion; c aquifer thickness; d distance of fault; e bedrock level; f DEM; g annual rainfall; h clay thickness; i T; j soil type; k Debi zonation of pumping wells; l slope; m geo units; n G.W. drawdown; o land use

5.2 Land subsidence susceptibility modeling with GEP

In this study, the data were randomly split into 70% (2919 pixels) for training and 30% (1251 pixels) for testing before being entered into the model. The best fitness, R, R2, and RMSE were used to evaluate model performance. The parameters and their values at the various stages of using the GeneXproTools software to estimate subsidence are summarized in Table 3.
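The reported counts imply 4170 pixels in total (2919 + 1251), and 70% of 4170 is exactly 2919. A random split of that kind can be sketched as follows; the seed and the function name are hypothetical, and the paper does not state how the random selection was made (for example, whether it was stratified by susceptibility class).

```python
import random


def random_split(n_pixels, train_frac=0.7, seed=42):
    """Randomly split pixel indices into training and testing subsets."""
    idx = list(range(n_pixels))
    random.Random(seed).shuffle(idx)      # reproducible random order
    n_train = round(train_frac * n_pixels)
    return idx[:n_train], idx[n_train:]


# 2919 + 1251 = 4170 pixels in total.
train_idx, test_idx = random_split(4170)
print(len(train_idx), len(test_idx))
```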

Table 3 The parameters used and their rates at various stages of using GeneXproTools software

Table 4 shows the best model in the training and testing phases (Figs. 15, 16). These results show that using the bedrock level, slope, soil, geology, and aquifer thickness parameters together with the +, −, *, /, Tanh, X2, Atan, Inv, 3Rt, Ln, NOT, Min2, Max2, Exp, Avg2 operators led to improved model performance and a very good fit to the observed data.

Table 4 The values of statistical criteria for the F-GEP Model in training and testing modes
Fig. 15

Fit diagram of the training phase

Fig. 16

Fit diagram of the testing phase

Figure 17 shows the effect of each parameter on land subsidence in the F-GEP model. Accordingly, the G.W. drawdown parameter had the highest impact, and the Debi of pumping wells parameter had the least effect on land subsidence in the study area. The strong influence of G.W. drawdown on land subsidence found by the GEP model is in line with the results of Shadfar et al. (2016) and Shemshaki et al. (2006). These results are also in line with Sundell et al. (2019), who reported a high impact of groundwater and clay thickness on subsidence and its associated hazards.

Fig. 17

Percentage chart of the effectiveness of each parameter. In this chart, d0 is aquifer thickness, d2 is clay thickness, d3 is Debi of pumping wells, d4 is the distance to faults, d5 is G.W. drawdown, d6 is erosion, d7 is geology, d8 is land use, d11 is soil type, and d12 is Sy

Since the GEP model provides the mathematical relationship between the input and output variables, the mathematical and numerical relations are shown in Table 5. Random numerical constants are generated for each gene of the chromosome and help simplify the equation (Table 6). With four genes, each gene has its own sub-tree and equation, and these are combined by the linking function to yield the final equation. Figure 18 shows the structure of the final model in tree form.

Table 5 The mathematical relationship derived from the F-GEP Model
Table 6 The constants of each chromosome genes
Fig. 18

Structure of the desired output model in tree form. Sub-ET 1: subtree of the first gene, involving inputs d0, d5, d7, d8 and d9; its expression is SUB (ET1) = *.*.Tanh.Avg2.Avg2.*.X2.*.Avg2.Avg2.d8.d5.d0.d9.c7.d5.c0.d0.d8.d11.d5; Sub-ET 2: subtree of the second gene; its expression is SUB (ET2) = d0.d4.Avg2.Avg2.Exp.X2.d12.Exp.-.Avg2.d9.d9.d10.c5.d8.c5.d0.d8.d5.c0.d0; Sub-ET 3: subtree of the third gene; its expression is SUB (ET3) = d0.Avg2.Max2.d1.c0.d4.-.-.d0.d12.d12.c0.d12.d2.d5.d9.d11.d7.c0.d1.c2; Sub-ET 4: subtree of the fourth gene; its expression is SUB (ET4) = Avg2.d12.Avg2.+.Min2.+.d12.Exp.d0.*.d12.c6.c5.d0.d11.d0.d10.d2.d0.d1.d0

Since the linking function is addition, the genes are summed to obtain the final equation (Eq. 17):

$${\text{Land}}\,{\text{subsidence}} = {\text{SUB}}\,\left( {\text{ET1}} \right) + {\text{SUB}}\,\left( {\text{ET2}} \right) + {\text{SUB}}\,\left( {\text{ET3}} \right) + {\text{SUB}}\,\left( {\text{ET4}} \right)$$
(17)
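The sub-ET strings in the caption of Fig. 18 appear to be written in Karva notation, the linear gene encoding used in GEP, in which the expression tree is filled level by level (breadth-first) from left to right and any symbols left over once all function arguments are filled form the unused tail of the gene. The sketch below, assuming that convention, decodes such a string, evaluates it, and sums the genes as in Eq. 17. It implements only a subset of the function set named in the text (NOT is omitted), uses unprotected division and logarithm (GEP software may use protected versions), and does not include the constant values of Table 6, which would have to be supplied as c0, c1, and so on. The names `FUNCS`, `decode`, `evaluate`, and `gep_model` are hypothetical.

```python
import math
from collections import deque

# Arity and implementation of a subset of the GEP function set named in the text.
FUNCS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "Avg2": (2, lambda a, b: (a + b) / 2.0),
    "Min2": (2, min),
    "Max2": (2, max),
    "X2": (1, lambda a: a * a),
    "Tanh": (1, math.tanh),
    "Atan": (1, math.atan),
    "Exp": (1, math.exp),
    "Ln": (1, math.log),
    "Inv": (1, lambda a: 1.0 / a),
    "3Rt": (1, lambda a: math.copysign(abs(a) ** (1.0 / 3.0), a)),
}


def decode(kexpr):
    """Decode a dot-separated Karva string into a tree of [symbol, children].

    Returns (root, n_coding): the tree and the number of symbols actually
    used; the remaining symbols are the gene's non-coding tail."""
    nodes = [[sym, []] for sym in kexpr.split(".")]
    queue, pos = deque([nodes[0]]), 1
    while queue:                       # breadth-first, left to right
        node = queue.popleft()
        arity = FUNCS[node[0]][0] if node[0] in FUNCS else 0
        for _ in range(arity):
            child = nodes[pos]
            pos += 1
            node[1].append(child)
            queue.append(child)
    return nodes[0], pos


def evaluate(node, env):
    """Evaluate a decoded tree; terminals (d0..d12, c0..c9) are looked up in env."""
    sym, children = node
    if sym in FUNCS:
        return FUNCS[sym][1](*(evaluate(c, env) for c in children))
    return env[sym]


def gep_model(genes, env):
    """Additive linking function: the model output is the sum of all genes."""
    return sum(evaluate(decode(g)[0], env) for g in genes)


if __name__ == "__main__":
    et1 = "*.*.Tanh.Avg2.Avg2.*.X2.*.Avg2.Avg2.d8.d5.d0.d9.c7.d5.c0.d0.d8.d11.d5"
    tree, n_coding = decode(et1)
    print("coding symbols used:", n_coding, "of", len(et1.split(".")))
    # Placeholder inputs and constants (all 1.0); real constants are in Table 6.
    env = {**{f"d{i}": 1.0 for i in range(13)}, **{f"c{i}": 1.0 for i in range(10)}}
    print("ET1 with all inputs = 1:", evaluate(tree, env))
```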

Finally, Fig. 19 presents the land subsidence susceptibility map based on the F-GEP model.

Fig. 19

Land subsidence susceptibility map based on F-GEP model

5.3 Land subsidence susceptibility modeling with ANN

In this study, an ANN was also used to model subsidence. The ANN receives the input information, namely bedrock level, T, clay thickness, annual rainfall, aquifer thickness, slope derived from the DEM, Debi zonation of pumping wells, soil type, groundwater drawdown over 20 years, erosion, distance of fault, Sy, land use, and lithological units, and learns a mathematical mapping between these inputs and the observed subsidence values. Figure 20 shows the structure of the neural network used in this study, with 14 inputs, two hidden layers (eight units in the first and six in the second), and one output layer (Table 7). This network used 70% of the data (2919 pixels) for training and 30% (1251 pixels) for testing. The hyperbolic tangent function was used for the processing elements (neurons) in the hidden layers. The R2 and RMSE criteria were used to select the appropriate number of hidden-layer neurons and the number of iterations, to evaluate network learning, and to obtain the best results. To find the optimal network configuration, various activation (transfer) functions, such as the logistic sigmoid, linear, and hyperbolic tangent sigmoid functions, were tested. For each network, the R2 and RMSE values were examined for the default configuration with different numbers of iterations, and the number of iterations at which the test RMSE was lowest and R2 highest was selected.
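For reference, the 14-8-6-1 architecture of Fig. 20 can be sketched as a NumPy forward pass. The sketch assumes tanh activations in both hidden layers (as stated in the text) and a linear output neuron, which the text does not specify; the initialization scheme and the function names are hypothetical. Counting weights and biases gives 181 trainable parameters, roughly one per 16 pixels of the 2919-pixel training set.

```python
import numpy as np

# Layer sizes of the network in Fig. 20 / Table 7: inputs, hidden 1, hidden 2, output.
LAYER_SIZES = [14, 8, 6, 1]


def init_params(sizes, seed=0):
    """Random weights (scaled by 1/sqrt(fan-in)) and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Forward pass: tanh in every hidden layer, linear output (an assumption)."""
    h = X
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        h = z if i == len(params) - 1 else np.tanh(z)
    return h


def n_parameters(sizes):
    """Number of trainable weights plus biases."""
    return sum(m * k + k for m, k in zip(sizes[:-1], sizes[1:]))


if __name__ == "__main__":
    params = init_params(LAYER_SIZES)
    X = np.random.default_rng(1).uniform(0.0, 1.0, (3, 14))  # 3 fuzzified pixels
    print(forward(params, X).ravel())
    print("trainable parameters:", n_parameters(LAYER_SIZES))
```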

Fig. 20

Neural network structure

Table 7 ANN structure information

Figure 21 shows the desired and actual network outputs, and Fig. 22 shows the correlation coefficient between observed and computed subsidence, the error for each learning step, and the error of each data point for the training, validation, and test data. These results (Table 8) indicate that the network approximates the observed subsidence very well in this study (over 94%).

Fig. 21

Desired output and actual network output

Fig. 22

Correlation coefficient between observational and computational subsidence

Table 8 The values of statistical criteria for F-ANN model

Figure 23 illustrates the importance of the input variables to the neural network in predicting subsidence. According to Fig. 23, G.W. drawdown is the most important variable, followed by clay thickness, T, Sy, and geology. The strong influence of G.W. drawdown on land subsidence found by the ANN model is in line with Li and Zhang (2018), who reported a high impact of G.W. drawdown, clay thickness, and hydraulic properties such as transmissivity on subsidence and its associated hazards.

Fig. 23

The importance of input variables to neural network

Finally, Fig. 24 presents the land subsidence susceptibility map based on the F-ANN model.

Fig. 24

Land subsidence susceptibility map based on the F-ANN model

5.4 Empirical comparison between the F-ANN and the F-GEP methods

As shown in Fig. 25, the GEP model has a higher overall accuracy than the ANN model, with higher values of R (0.99861) and R2 (0.99722) and lower values of MAE (0.00321), MSE (0.00021), and RMSE (0.01461). These results suggest that for non-linear geological phenomena such as land subsidence, landslides, and floods, which depend on several independent geological, hydrogeological, hydrological, and soil parameters, the GEP model agrees better with the actual data and gives more accurate results than the ANN model. This result is in line with Nourani et al. (2014), Luo et al. (2019), and Pashazadeh and Javan (2020), who found that the GEP model agreed with actual data better than other models, including ANN.
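Because R2 here is the square of the correlation coefficient R, and RMSE is the square root of MSE, the overall F-GEP values reported above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers quoted in the text.

```python
# Overall F-GEP accuracy values quoted in the text (Fig. 25).
R, R2 = 0.99861, 0.99722
MSE, RMSE = 0.00021, 0.01461

r2_from_r = R ** 2          # squared correlation coefficient
mse_from_rmse = RMSE ** 2   # square of the root mean square error

print(f"R**2    = {r2_from_r:.5f} (reported R2  = {R2})")
print(f"RMSE**2 = {mse_from_rmse:.5f} (reported MSE = {MSE})")
```

Both derived values reproduce the reported ones to the precision given, so the reported overall criteria are internally consistent.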

Fig. 25

The overall accuracy of F-GEP and F-ANN models

Table 9 shows the accuracy of the two models in each susceptibility class. In the ANN model, the highest agreement is observed in the very low class, followed by the low, very high, high, and moderate classes. In the GEP model, the highest agreement is observed in the low class, followed by the very high, moderate, high, and very low classes.

Table 9 The accuracy of susceptibility classes in F-GEP and F-ANN models based on statistical criteria

As can be seen in Table 9 and Fig. 26, despite the higher accuracy of the GEP model in most classes, in the very low class, the fit of the ANN model based on the R and R2 statistical criteria is higher (red dashed line). However, according to the RMSE, MAE, and MSE statistical criteria, it is still the GEP model that has higher accuracy (blue dashed line).

Fig. 26

The accuracy of susceptibility classes in F-GEP and F-ANN models

Based on the validation results, the GEP model using 10 parameters yields better results than the ANN model using 14 parameters. This can be attributed to the “tree-based” nature of the GEP model. Such models (like the support vector machine model) have advantages such as feature selection and pruning (Naghibi et al. 2018) and are very robust to noise (Tien Bui et al. 2016). Feature selection leads to the choice of the most important factors for splitting and decision making, and makes the results more acceptable (Naghibi et al. 2018).

6 Conclusions

In this research, we evaluated the accuracy of GIS-based hybrid F-GEP and F-ANN models for estimating the risk of land subsidence in the Varamin aquifer based on radar image data. To standardize (fuzzify) the factors before importing them into the ANN and GEP models, the factors were divided into three groups according to their nature, and the “large”, “small”, and “linear” fuzzy membership functions were used. Accordingly, the DEM, Sy, distance of fault, T, rain, and slope parameters were standardized with the “small” membership function; the aquifer thickness, bedrock depth, Debi, and G.W. drawdown parameters with the “large” membership function; and the land use, geology, erosion, and soil type parameters with the “linear” membership function. The F-GEP model used fourteen inputs; the +, −, *, /, Tanh, X2, Atan, Inv, 3Rt, Ln, NOT, Min2, Max2, Exp, Avg2 operators; thirty chromosomes; a head size of seven; and four genes. The F-ANN model used fourteen inputs, two hidden layers (eight units in the first and six in the second), and one output layer. In both models, 70% of the data were used for training and 30% for testing. The results showed that the overall accuracy of the F-GEP model, based on the R, R2, MSE, MAE, and RMSE criteria, was better than that of the F-ANN model; that is, the F-GEP model is more accurate than the F-ANN model in land subsidence susceptibility modeling. Despite the clear superiority of the F-GEP model based on the R and R2 criteria, the class-wise accuracy comparison shows that this model did not perform well in zoning and estimating the “Very low” susceptibility class, where the F-ANN model performed better. However, the model outputs show that both models perform very well in estimating and zoning areas in the “Very high” and “Low” subsidence risk classes.
The results also showed that, in both the F-ANN and F-GEP models, the groundwater drawdown and clay thickness parameters had the highest effect on land subsidence in the Varamin aquifer. This result is in line with previous studies of the Varamin aquifer.

This study showed that F-GEP is a powerful algorithm for land subsidence susceptibility modeling. The “tree-based” nature of the F-GEP model appears to make its results more accurate.

Using support vector machine (SVM), random forest, and other tree-based algorithms and comparing them with the results of the current research is a suggestion for future work, which may further improve the modeling accuracy, especially in susceptibility classes.