Introduction

Land subsidence refers to the gradual settling or sudden sinking of discrete segments of the ground surface, which usually occurs as a consequence of a number of natural and human-induced phenomena. Groundwater over-exploitation, natural compaction of unconsolidated fine-grained deposits, collapse of natural or man-made cavities, oxidation of peat-rich materials, and tectonic activity are among the most notable phenomena that trigger land subsidence (Galloway and Burbey 2011). Most land subsidence cases cause extensive deformation over large areas, mainly in coastal areas and urbanized deltas, covering tens to hundreds of square kilometers and severely damaging buildings and linear infrastructure. Significant land subsidence due to aquifer over-exploitation, with immense socioeconomic impacts, has been reported in Mexico, China, Thailand, Italy, Spain, Japan, and the USA (Galloway et al. 1998; Tomás et al. 2005; Stramondo et al. 2007; Hu et al. 2009; Righini et al. 2011; Raspini et al. 2013).

Numerous studies have been conducted in Greece on areas affected by land subsidence due to aquifer over-exploitation. Characteristic examples are the wider area of the Kalochori and Sindos villages in the western part of the Thessaloniki plain (Stiros 2001; Psimoulis et al. 2009; Raucoules et al. 2008; Loupasakis and Rozos 2009; Raspini et al. 2014; Svigkas et al. 2016), the Anthemountas basin east of Thessaloniki (Koumantakis et al. 2008; Raspini et al. 2013), the Thessaly plain in central Greece (Soulios 1997; Kaplanidis and Fountoulis 1997; Salvi et al. 2004; Ganas et al. 2006; Apostolidis and Georgiou 2007; Kontogianni et al. 2007; Rozos et al. 2010; Vassilopoulou et al. 2013; Apostolidis and Koukis 2013; Ilia et al. 2016), the area near the Amyntaio opencast coal mine (Soulios et al. 2011; Loupasakis et al. 2014; Tzampoglou and Loupasakis 2016), the Messara valley in Crete (Mertikas and Papadaki 2009), and the Thriasio plain (Kaitantzian et al. 2014).

Regarding the Thessaly plain, its eastern part has been affected by land subsidence related to reservoir compaction, with cases observed since the early 1990s (Soulios 1997; Kaplanidis and Fountoulis 1997; Marinos et al. 1997). In contrast, in the western Thessaly plain, the first surface ruptures were recorded much later, in 2001. Farsala was the first town to manifest differential surface deformation, affecting the road network and numerous buildings in the town center. In recent years, extensive damage has been recorded not only within the town limits but also in the plain extending to the north (Apostolidis and Georgiou 2007; Rozos et al. 2010; Apostolidis and Koukis 2013; Ilia et al. 2016).

Concerning the detection and monitoring of land subsidence during the past decades, remote sensing and geodetic techniques, especially the Global Positioning System (GPS) and Synthetic Aperture Radar interferometry (InSAR), have been widely applied (Dixon et al. 2006; Herrera et al. 2009; Hu et al. 2009; Galloway and Burbey 2011; Osmanoglu et al. 2011; Chaussard et al. 2013; Raspini et al. 2013, 2014; Zhu et al. 2015; Svigkas et al. 2016). Specifically, subsidence patterns and spatiotemporal trends have been identified in several studies by analyzing data from multisensor monitoring systems comprising GPS stations, leveling surveys, monitoring wells, and GPS-based monitoring systems, and by analyzing SAR images, mainly through Persistent Scatterer Interferometry (PSI) techniques. According to Raspini et al. (2014), SAR satellite observations and PSI techniques provide a cost-efficient method for managing subsidence-related hazards and are regarded as a valuable tool for verifying and validating subsidence models.

Focusing on land subsidence triggered by aquifer over-exploitation and groundwater withdrawal, two investigation methodologies can be distinguished. The first methodology simulates the phenomena geo-technically, using deterministic methods that apply either the conventional consolidation theory (Terzaghi 1925) or more complex soil deformation constitutive laws (Gambolati et al. 2005; Brinkgreve et al. 2006; Loupasakis and Rozos 2009; Raspini et al. 2014). The second methodology is based on detecting the relation between the distribution of past ground deformation and land subsidence-related variables, producing susceptibility, hazard, and risk maps through knowledge- or data-driven methods (Zhu et al. 2013a, 2013b; Teartisup and Kerdsueb 2013). The first methodology requires highly accurate geo-technical and hydrological data; however, in most cases such data are not available and numerous assumptions have to be adopted. The second methodology, on the other hand, can be safely applied in medium-scale studies, combining all kinds of land subsidence-related data and providing both qualitative and quantitative results. Successful applications of data-driven methods in land subsidence assessment can be found in the international literature, involving advanced methods of analysis such as logistic regression, weight of evidence, nearest-neighbor distance algorithms, artificial neural networks, decision trees, and fuzzy logic (Kim et al. 2009; Galve et al. 2009; Choi et al. 2011; Oh et al. 2011; Lee et al. 2012; Malinowska 2014).

Few studies in the literature have followed the principles of the second methodology, using conventional ground truth measurements and PSI techniques to predict land subsidence due to groundwater withdrawal (Modoni et al. 2013; Zhu et al. 2013a, 2015). Zhu et al. (2013a) evaluated the risk of land subsidence considering six land subsidence-related variables: the thickness of the compressible sediments, the thickness of the Quaternary strata, the changes in groundwater level of the unconfined and of the confined aquifer systems, the building density, and the recharge from precipitation infiltration. The Analytical Hierarchy Process coupled with sensitivity analysis (AHP-SA) was implemented to predict the deformation rate, while the distribution of land subsidence derived from PSI data was used to verify the accuracy of the risk assessment map. Similarly, Teartisup and Kerdsueb (2013) analyzed a subsiding area in Nakhon Pathom Province, Thailand, considering six factors: geology, hydrogeology, well density, groundwater consumption, land use, and population density. The factors were analyzed by evaluating the weighting and rating scores assigned by 12 independent government officers. The total score of each factor was used to delineate the high-risk area by applying principal component analysis. Validation involved comparing ground truth measurements, which provided actual deformation rates, with the produced land subsidence risk map. Another interesting approach was adopted by Zhu et al. (2015), who used the PSI technique to quantify the dynamic evolution of land subsidence in the northern Beijing plain and to determine its spatial relation with its triggering factors.
Specifically, PSI was used to quantify land subsidence, while GIS spatial analysis was applied to explore the correlations between land movement, groundwater drawdown, the thickness of the more compressible geological layers, and the evolution and characteristics of urbanization in the study area. The authors also modeled future land subsidence under different water pumping scenarios, providing important information for land subsidence mitigation decision-making.

In this context, the main objective of the present study was to predict land subsidence, expressed by the deformation rate, under different water pumping scenarios in the wider area of Farsala, western Thessaly basin, by implementing a data mining method, random forest (RF). RF was chosen as the investigation tool because of its ability to perform well even in nonlinear problems and when the data may be "noisy" or incomplete, and because the resulting model can easily be interpreted (Fayyad et al. 1996).

The variables used by the model were estimated by assessing the geological, hydrogeological, and tectonic settings of the area and the physical and geo-mechanical properties of the geological formations covering it, and by analyzing the spatial and temporal trends of the groundwater resources and the PSI time series. Specifically, the thickness of the loose deposits, the Compression Index, and the Sen's slope value of the groundwater-level trend were considered the independent variables that influence the evolution of the phenomena, while the remote sensing data were considered evidence of past land subsidence.

The study area

The study area is located in the Thessaly basin, central Greece. The basin is divided into two sub-basins, the eastern and the western. The current study focuses on the eastern part of the western Thessaly plain, specifically the wider plain area extending north of Farsala town (Fig. 1). The area, which is flat, is defined by two major rivers, the Enipeas and the Farsaliotis. It has been intensively cultivated, consuming large volumes of irrigation water, for at least three decades (Dimopoulos et al. 2003).

Fig. 1 The study area

According to the Köppen climate classification system (Aguado and Burt 2012), the climate is of Mediterranean type (Csa), with wet winters and hot, dry summers. The rainy season lasts from October to May and accounts for almost 90% of the annual rainfall, with monthly totals ranging approximately from 31.7 to 87.7 mm. December is the rainiest month (87.7 mm), followed by November (86.2 mm), while the driest month is August (10.4 mm), followed by July (14.1 mm).

The mean annual temperature is 15.13 °C, with the highest and lowest average temperatures being 20.94 and 9.39 °C, respectively. The climate data were obtained from the Climatic Research Unit (CRU) of the University of East Anglia and cover a period of 107 years, between 1901 and 2008 (Jones and Harris 2008).

Concerning the geo-tectonic evolution of Thessaly, the eastern part of the basin belongs mainly to the Pelagonian zone and the western part mainly to the Pindos zone. The wider area of the western Thessaly plain is characterized by a variety of geological formations. Specifically, the sub-basin consists of Mesozoic Alpine formations belonging to the Pelagonian, sub-Pelagonian, and Pindos geo-tectonic zones, while post-Alpine deposits fill the lowlands of the basin (Apostolidis and Koukis 2013). The recent Quaternary deposits consist mainly of lacustrine and torrential sediments.

The Mesozoic Alpine formations constitute the bedrock of the Quaternary deposits in the wider area of Farsala. These formations consist of the schist–chert formation (sh), ophiolites (o), and limestones (Le), while the post-Alpine deposits include Neogene (Ne), Pleistocene, and Holocene deposits (Mariolakos et al. 2001; Rozos and Tzitziras 2002; Apostolidis 2014). The Quaternary deposits covering the plain become finer with increasing distance from the Farsaliotis River, a secondary branch of the Pinios River, which crosses the area. As shown in Fig. 2, the coarser deposits, consisting of sands and gravels (sd-gr, gr-sd), occupy the riverbeds, while the rest of the plain is covered by finer clayey silts and silty clays (cl-sl, sc-cs) with varying percentages of intercalated sands and gravels.

Fig. 2 Geological setting of the wider study area (modified after Apostolidis 2014), with the locations of boreholes and groundwater monitoring wells

The town of Farsala is founded mainly on red-brown to black-brown clays with varying percentages of sands and gravels, intercalated with brown to yellow sandy clay horizons (sc-cs). Only a small section in the south of the town is founded on the Mesozoic limestones of the Narthaki Mountain foothills. The Quaternary formations overlying the limestone block the karstic aquifer of the mountain, which discharges at the Apidanos Springs, located inside the town. The Quaternary deposits of the plain consist of red-brown to black-brown clays alternating with highly permeable sand and gravel layers (cl-sl).

Considering the hydrogeological setting, the unconfined and semi-confined aquifer systems of the Thessaly basin are divided into four sub-basins (Kallergis 1971, 1973; Manakos 2010; Apostolidis 2014): (a) Kalambaka, (b) Trikala, (c) Karditsa-Sofades, and (d) Zaimi-Sofiadas-Farsala. Furthermore, several high-capacity karstic aquifer systems and some fractured-rock aquifers of local importance are located at the perimeter of the plain, in the carbonate and the metamorphic bedrock formations, respectively. Among the porous media aquifers, the most productive are located in the Zaimi-Sofiadas-Farsala sub-basin.

Within the study area, the aquifers are hosted in the Quaternary deposits. They can be divided into an unconfined shallow aquifer in the upper stratum and a system of successive semi-confined artesian aquifers developed in the deeper coarse-grained layers (Apostolidis and Koukis 2013). These systems are recharged mainly by infiltration of surface water, but also by lateral inflow from the karstic aquifers developed in the carbonate formations of the Narthaki Mountain (Rozos et al. 2010). According to piezometric measurements conducted between 1980 and 2005, the aquifer systems of the study area are subjected to excessive over-exploitation (Manakos 2010).

Methods and data

The developed methodology comprised four phases: (a) investigation of the geological, hydrogeological, and tectonic settings of the study area and estimation of the physical and geo-mechanical properties of the geological formations; (b) analysis of the spatial and temporal trends of the groundwater level; (c) analysis of the PSI data; and (d) prediction of the deformation rate and modeling of future land subsidence under different water pumping scenarios. The computations, namely the spatial and temporal trend analysis of the groundwater resources and the implementation of the RF model, were coded in R using RStudio (ver. 0.99.489) (RStudio Team 2015), while ArcGIS 10.1 (ESRI 2013) was used to compile the data and produce the spatial patterns of the trends and their magnitudes. Figure 3 illustrates the flowchart of the applied methodology; each phase is described in detail in the following paragraphs.

Fig. 3 Flowchart of the followed methodology

During the first phase, the thickness of the loose deposits and the Compression Index of the geological formations covering the area were determined. To this end, previous studies of the geological, hydrogeological, and tectonic settings of the wider research area, including numerous borehole data, were evaluated (SOGREAH 1974; Apostolidis and Koukis 2013; Apostolidis 2014). In total, more than 30 geo-technical boreholes and 60 oedometer tests were analyzed (Fig. 2).

During the second phase, the spatial and temporal trends of the groundwater level were estimated. First, a spatial trend analysis was conducted to detect potential spatial variability in the groundwater-level variations: a box–whisker plot was constructed to provide insight into the spatial variability, and a one-way ANOVA test was conducted to assess its statistical significance. The temporal trend analysis aimed to estimate the Sen's slope value of the groundwater-level trend, a metric that gives the change per unit time and thus the magnitude of the detected trend (Sen 1968; Hirsch et al. 1982). To this end, the first step was to test the assumption that the data are serially uncorrelated. It is well known that serial correlation can distort the assessment of trend significance and produce misleading results: a positively autocorrelated series is more likely to be detected as having a trend even when none actually exists. Spearman's rank correlation coefficient was applied to detect the presence of serial correlation. The next step was to identify trends in the groundwater-level time series by applying the standard Mann–Kendall test or, for serially correlated data, its modified version (Mann 1945; Kendall and Stuart 1967; Hamed and Rao 1998). The final step was the application of Sen's slope estimator to provide the change per unit time in the groundwater-level data.
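As an illustration of the final step, Sen's slope estimator can be sketched in a few lines of Python. This is a minimal sketch, not the authors' R code: the function name `sens_slope` is hypothetical, and the input is assumed to be a water-level series (for example, negative depth to water), so that a falling water table yields a negative slope in m/year.

```python
from itertools import combinations
from statistics import median


def sens_slope(times, values):
    """Sen's (1968) slope: the median of all pairwise slopes.

    times  -- observation times (e.g. years), same length as values
    values -- water levels (m); a declining water table gives a negative slope
    """
    if len(times) != len(values) or len(values) < 2:
        raise ValueError("need at least two paired observations")
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(values)), 2)
        if times[j] != times[i]  # skip pairs observed at the same time
    ]
    if not slopes:
        raise ValueError("all observations share the same time")
    return median(slopes)


# Hypothetical September water levels (m) for five consecutive years.
years = [1980, 1981, 1982, 1983, 1984]
levels = [-10.0, -11.2, -12.1, -13.5, -14.4]
print(sens_slope(years, levels))
```

Using the median of all pairwise slopes, rather than a least-squares fit, makes the estimate robust to an occasional anomalous reading.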

Groundwater-level data from ten monitoring wells in the area, covering the period from 1980 to 2005, were obtained from the Department of Hydrology of the Thessaly Prefecture (Fig. 2). The analysis was conducted for the low-level season (September), since this season is characterized by high water consumption, low precipitation, few wet days, considerably high temperatures, and high potential evapotranspiration.

Kriging was used to prepare all the contour maps required by the above-described processes. Kriging is a technique for making optimal, unbiased estimates of regionalized variables at unsampled locations using the structural properties of the semivariogram and the initial set of data values (David 1977; Kumar 2007).
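To make the Kriging step concrete, the sketch below solves the ordinary Kriging system for one unsampled location using a Gaussian semivariogram. It is a minimal, dependency-free illustration, not the ArcGIS implementation used in the study. The semivariogram form γ(h) = c0 + c[1 − exp(−(h/a)²)], the sill, range, and nugget values, the borehole layout, and the helper names are all assumptions chosen for illustration, not fitted parameters.

```python
import math


def gaussian_variogram(h, sill, rng, nugget=0.0):
    """Gaussian semivariogram: nugget + sill * (1 - exp(-(h/rng)^2)), zero at h = 0."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-((h / rng) ** 2)))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(points, values, target, sill, rng, nugget=0.0):
    """Ordinary Kriging estimate at `target`; the weights are forced to sum to 1."""
    n = len(points)

    def gamma(p, q):
        return gaussian_variogram(math.dist(p, q), sill, rng, nugget)

    # Semivariogram matrix bordered by the unbiasedness constraint (Lagrange row/column).
    a = [[gamma(points[i], points[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(p, target) for p in points] + [1.0]
    weights = solve_linear(a, b)[:n]
    return sum(w * z for w, z in zip(weights, values))


# Hypothetical loose-deposit thicknesses (m) at four boreholes on a 1 km grid.
boreholes = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
thickness = [40.0, 270.0, 60.0, 200.0]
print(ordinary_kriging(boreholes, thickness, (500, 500), sill=1.0, rng=800.0))
```

With no nugget, ordinary Kriging honours the data exactly at sampled locations; in practice the semivariogram parameters would be fitted to the experimental semivariogram and the model chosen by cross-validation.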

During the third phase, the remote sensing data were analyzed to provide evidence of past land subsidence. Satellite-based Interferometric Synthetic Aperture Radar, commonly referred to as InSAR, is a geodetic SAR processing technique, developed in the early 1990s, that uses two or more SAR images to generate maps of the topography and/or deformation of the Earth's surface (Massonnet and Feigl 1998; Bamler and Hartl 1998; Ferretti et al. 2001; Hanssen 2001; Kampes 2006; Prati et al. 2010). Using the phase differences of radar signals reflected during two different satellite passes over the same place on the Earth's surface, it is possible to measure ground deformation precisely, at millimeter scale, over time spans of days to years. In this study, the Permanent Scatterers Interferometry (PSI) technique (Ferretti et al. 2001) was used to process the SAR data. PSI methods identify and integrate the positions of individual points that act as Permanent Scatterers (PS), i.e., pixels with coherent phase and stable amplitude across all interferograms within a particular time period, referred to a common master scene (Ferretti et al. 2001; Werner et al. 2003). PSs can be man-made structures (rooftops, roads, etc.), natural features (protruding rocks, etc.), or purpose-built artificial reflectors, and they allow deformation velocities to be measured along the satellite line of sight (LOS) (Ferretti et al. 2001). The PSI data were derived from a descending dataset provided by the German Aerospace Center (DLR), acquired in 1995–2003 by the European Space Agency (ESA) satellites ERS-1 and ERS-2. This dataset was processed within the framework of the Terrafirma project, which was supported by the Global Monitoring for Environment and Security (GMES) Service Element Programme, promoted and financed by ESA (Adam et al. 2011).
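The link between phase and displacement behind these measurements is the repeat-pass relation d_LOS = λΔφ/(4π), where the factor 4π reflects the two-way travel path of the signal. The sketch below is a simplified illustration: it assumes an ERS C-band wavelength of about 5.66 cm and an already unwrapped phase difference, and it ignores the atmospheric, orbital, and topographic phase terms that PSI processing estimates and removes. The sign convention (positive towards or away from the sensor) differs between processors and is left open here; the function names are hypothetical.

```python
import math

# Approximate C-band wavelength of ERS-1/ERS-2 (about 5.3 GHz), in metres.
ERS_WAVELENGTH_M = 0.0566


def phase_to_los_mm(delta_phase_rad, wavelength_m=ERS_WAVELENGTH_M):
    """Convert an unwrapped phase difference (rad) to LOS displacement (mm).

    Two-way path: a LOS change d shifts the phase by 4*pi*d / wavelength.
    """
    return 1000.0 * wavelength_m * delta_phase_rad / (4.0 * math.pi)


def los_mm_to_phase(d_mm, wavelength_m=ERS_WAVELENGTH_M):
    """Inverse relation: LOS displacement (mm) to phase difference (rad)."""
    return 4.0 * math.pi * (d_mm / 1000.0) / wavelength_m


# One full fringe (2*pi) corresponds to half a wavelength, about 28.3 mm.
print(phase_to_los_mm(2 * math.pi))
```

Because a full fringe corresponds to only about 28 mm of LOS motion, millimeter-level rates become measurable once the nuisance phase terms are removed over a long stack of acquisitions.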

When interpreting the final product, it should be noted that land subsidence is dominated by vertical movement; the LOS deformation rates were therefore assumed to reflect mainly the vertical component (Righini et al. 2011; Zhang et al. 2018). Negative displacement rates indicate movement away from the sensor (subsidence), while positive values indicate movement towards the sensor (uplift). In the color scale, green indicates stable points (− 1.5 to + 1.5 mm/year), the gradation from yellow through orange to dark red represents movement away from the sensor, and the gradation from light blue to dark blue represents movement towards the sensor. Figure 4 presents the spatial distribution of the deformation rates in the wider study area. The subsidence affects the town as well as the plain extending to the north, while the remaining PSs, located mainly on Mount Narthaki, appear to be stable.
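The stability threshold used for the color scale amounts to a simple classification rule. The sketch below assumes LOS velocities in mm/year, with negative values meaning movement away from the sensor, and treats the ±1.5 mm/year band as inclusive. The function names are hypothetical, and the code is not part of the Terrafirma processing chain.

```python
from collections import Counter


def classify_ps(velocity_mm_yr, threshold=1.5):
    """Label a PS from its LOS velocity; |v| <= threshold counts as stable."""
    if velocity_mm_yr < -threshold:
        return "subsidence"  # moving away from the sensor
    if velocity_mm_yr > threshold:
        return "uplift"  # moving towards the sensor
    return "stable"


def class_shares(velocities, threshold=1.5):
    """Percentage of PSs falling in each class."""
    counts = Counter(classify_ps(v, threshold) for v in velocities)
    total = len(velocities)
    return {label: 100.0 * counts[label] / total
            for label in ("subsidence", "stable", "uplift")}


print(class_shares([-20.3, -9.8, -0.4, 0.9, 1.5, 2.1]))
```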

Fig. 4 PSI deformation rate measurements

During the final phase of the methodology, the deformation rate under current water consumption was predicted by implementing the RF method. RF is an ensemble learning method based on the generation of multiple classification or regression trees, whose outputs are aggregated into a classification or regression value (Breiman et al. 1984; Breiman 2001). The algorithm grows random binary trees, each built on a subset of the observations drawn by bootstrapping: for each tree, a random sample of the training data is drawn with replacement and used to build the tree, and the observations not included are referred to as out-of-bag (OOB) data (Breiman 2001). An ensemble method such as RF is more accurate than its individual members only if those members are sufficiently random and diverse. In RF, diversity is achieved by resampling the data with replacement and by randomly changing the predictive factors considered across the different tree induction processes (Youssef et al. 2016).
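The bootstrap and OOB mechanism can be sketched as follows. Each tree draws n observations with replacement, and the observations that are never drawn form that tree's OOB set. For large n, roughly (1 − 1/n)^n ≈ 37% of the data end up OOB. This is a plain-Python illustration with hypothetical names, not the internals of an RF package; n = 119 and 1500 trees are used only because they match the numbers of PSs and trees reported later in the study.

```python
import random


def bootstrap_split(n, rng):
    """Draw a bootstrap sample of indices 0..n-1; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(1)
n_obs, n_trees = 119, 1500
oob_total = sum(len(bootstrap_split(n_obs, rng)[1]) for _ in range(n_trees))
oob_fraction = oob_total / (n_obs * n_trees)
print(round(oob_fraction, 3), round((1 - 1 / n_obs) ** n_obs, 3))
```

Because every observation is OOB for roughly a third of the trees, its prediction error can be estimated from those trees alone. This is what makes the OOB error an internal validation measure.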

To use the RF algorithm successfully, the user should set three parameters: (a) the number of randomly selected variables considered at each split (mtry), (b) a parameter that controls tree growth (e.g., the minimum node size), and (c) the number of trees in the forest. In regression RF, the default value of mtry is p/3, as opposed to √p for classification, where p is the number of predictors.
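These default values of mtry can be written as a small helper. The rule below mirrors the defaults documented for the R randomForest package: p/3 rounded down, but at least 1, for regression, and √p rounded down for classification. The function name is hypothetical. With the three predictors used in this study, the regression default is already 1.

```python
import math


def default_mtry(p, task):
    """Default number of predictors sampled at each split (randomForest-style rule)."""
    if p < 1:
        raise ValueError("need at least one predictor")
    if task == "regression":
        return max(p // 3, 1)
    if task == "classification":
        return max(math.isqrt(p), 1)
    raise ValueError("task must be 'regression' or 'classification'")


print(default_mtry(3, "regression"), default_mtry(3, "classification"))
```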

Once training is complete, additional information about the influence of each variable can be extracted. Specifically, the variables can be ranked by the mean decrease in accuracy and the mean decrease in Gini. The mean decrease in Gini measures the contribution of each variable to the homogeneity of the nodes and leaves of the resulting RF model. The mean decrease in accuracy is determined during the OOB error calculation: the more the accuracy of the RF decreases when a variable is excluded (in practice, when its values are randomly permuted), the more important that variable is considered. Variables with a large mean decrease in accuracy are therefore more important.
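The idea behind the mean decrease in accuracy can be illustrated with permutation importance: shuffle one predictor at a time and measure how much the prediction error grows. For regression, the counterpart of accuracy is the mean squared error. R's randomForest reports this measure as %IncMSE and the node-purity counterpart of the Gini measure as IncNodePurity. The sketch below is generic. It accepts any fitted `predict` function and evaluates on a supplied dataset rather than on each tree's OOB samples, so it approximates, rather than reproduces, the internal RF computation. All names and the toy model are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each predictor column is randomly permuted."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        scores.append(sum(increases) / n_repeats)
    return scores


# Toy model that depends only on the first predictor.
X = [[float(i), float(10 - i)] for i in range(10)]
y = [2.0 * row[0] for row in X]
scores = permutation_importance(lambda row: 2.0 * row[0], X, y)
print(scores)
```

A predictor the model ignores gets an importance of zero, while a predictor that drives the predictions produces a clear increase in error once it is shuffled.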

In this study, the RF method was used to predict the subsidence deformation rate from three related variables: the thickness of the loose deposits, the Sen's slope value of the groundwater-level trend, and the Compression Index of the formation covering the area of interest.

The predictive performance of the model was validated using two statistical metrics, the root mean squared error (RMSE) and the coefficient of determination (R²). RMSE is a quadratic scoring rule that measures the average magnitude of the error, i.e., the differences between predictions and actual observations. R² measures how well the observed outcomes are replicated by the model, based on the proportion of the total variation of the outcomes explained by the model.
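Both metrics are straightforward to compute. The sketch below assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot. Some studies instead report the squared Pearson correlation between observed and predicted values, and the two differ for biased predictions, so the exact definition used here is an assumption. The observed and predicted values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the data (e.g. mm/year)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


obs = [-4.2, -8.1, -10.5, -12.0]   # hypothetical observed rates (mm/year)
pred = [-5.0, -7.6, -10.1, -11.2]  # hypothetical predicted rates (mm/year)
print(rmse(obs, pred), r_squared(obs, pred))
```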

Also during this phase, the trained RF model was used to predict land subsidence under different water pumping scenarios. Specifically, predictions were made for one scenario assuming a 20% decrease and one assuming a 20% increase in the Sen's slope value of the groundwater-level trend.
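Running a scenario amounts to rescaling one predictor and predicting again. The sketch below assumes the following:

- Rows are ordered as [thickness of loose deposits (m), Sen's slope (m/year), Compression Index].
- Sen's slope is negative for a falling water table, so multiplying it by 0.8 or 1.2 weakens or strengthens the drawdown trend.
- `predict` is any fitted model.

The column order, names, and toy model are hypothetical.

```python
SENS_SLOPE_COL = 1  # hypothetical order: [thickness_m, sens_slope_m_yr, compression_index]


def scale_column(rows, col, factor):
    """Return copies of the rows with one predictor multiplied by `factor`."""
    return [row[:col] + [row[col] * factor] + row[col + 1:] for row in rows]


def mean_percent_change(baseline, scenario):
    """Mean % change in subsidence magnitude |rate| relative to the baseline."""
    changes = [100.0 * (abs(s) - abs(b)) / abs(b) for b, s in zip(baseline, scenario)]
    return sum(changes) / len(changes)


def run_scenarios(predict, rows, factors=(0.8, 1.2)):
    """Mean % change in predicted subsidence for each Sen's slope scaling factor."""
    baseline = [predict(r) for r in rows]
    results = {}
    for f in factors:
        scenario = [predict(r) for r in scale_column(rows, SENS_SLOPE_COL, f)]
        results[f] = mean_percent_change(baseline, scenario)
    return results


def toy_predict(row):
    """Toy stand-in for a fitted model: rate (mm/year) scales with drawdown."""
    thickness, slope, cc = row
    return 0.05 * thickness * slope * (1 + cc)


rows = [[120.0, -1.5, 0.3], [60.0, -0.9, 0.2]]
print(run_scenarios(toy_predict, rows))
```

Because the toy model is linear in Sen's slope, it returns changes of exactly ±20%. A non-linear model such as RF need not respond proportionally, which is why asymmetric changes can arise for symmetric scenarios.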

Results

During the first phase of the methodology, a map of the spatial distribution of the thickness of the loose deposits was constructed from the geo-technical borehole dataset (Apostolidis 2014), which included data on the depth of the bedrock formations and the thickness of the loose deposits. Ordinary Kriging with a Gaussian semivariogram model, which provided the lowest RMSE, was applied. Figure 5 shows the contours of the thickness of the loose deposits, with the maximum thickness (270 m) estimated in the west of the research area and the minimum thickness (40 m) in the vicinity of the town of Farsala.

Fig. 5 Spatial distribution of the thickness of loose deposits

The Compression Index of the Quaternary deposits was also estimated by analyzing 60 oedometer tests (Apostolidis 2014). Specifically, the red-brown to black-brown clays and the silty clays–clayey silts have values between 0.060 and 0.835, while the alternating loose sandy clay and silty sand horizons have lower values, between 0.040 and 0.450. Figure 6 shows the contours of the Compression Index, estimated by ordinary Kriging with a Gaussian model. Higher values are located between the town of Farsala and the village of Vasilis, and lower values in the northwest of the research area.

Fig. 6 Spatial distribution of the boreholes and the Compression Index values

During the second phase, the spatiotemporal trends of the groundwater-level data were analyzed. To examine the spatial variability of the groundwater-level datasets, the yearly time series for the low-level season were plotted as box–whisker plots (Fig. 7). The box–whisker plots indicate very high spatial variability in groundwater-level fluctuation, reflecting the dynamic exploitation conditions of the aquifers. Monitoring well 445YEB shows the highest fluctuation, followed by SR4. The observed depth to groundwater ranged between 8.70 and 53.60 m in 445YEB and between 6.00 and 40.58 m in SR4.

Fig. 7 Box–whisker plots of the groundwater-level measurements during the low-level season

According to the one-way ANOVA test, the observed F-statistic exceeds the critical value F0.95(p − 1, n − p), where p is the number of monitoring wells and n the total number of observations. The hypothesis of equal mean groundwater levels across wells is therefore rejected, indicating significant spatial variability within the groundwater monitoring network (Table 1). This result is consistent with the box–whisker plot analysis.

Table 1 One-way ANOVA results
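The F-statistic of the one-way ANOVA compares the between-well and within-well variance of the groundwater-level measurements. The sketch below computes it with the Python standard library only. The critical value F0.95(p − 1, n − p) is not computed here; it would come from tables or, for example, `scipy.stats.f.ppf(0.95, p - 1, n - p)`. Function names and the well data are hypothetical.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples, one per well."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical September depths to groundwater (m) at three wells.
wells = [[8.7, 15.2, 22.4, 31.0], [6.0, 12.5, 19.8, 27.3], [4.1, 4.6, 5.0, 5.3]]
print(one_way_anova(wells))
```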

To test for serial correlation, a necessary step of the temporal trend analysis, Spearman's rank correlation coefficient was calculated (Table 2). Among the analyzed time series, those of monitoring wells PZ11, 445YEB, and PZ46 showed significant serial correlation.

Table 2 Results of autocorrelation tests
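A serial-correlation check of this kind can be sketched as the Spearman correlation between each value and the next one (lag 1). The sketch below computes ranks with ties averaged and takes the Pearson correlation of those ranks. The significance rule, |r| · √(m − 1) > 1.96 with m the number of lagged pairs, is a large-sample normal approximation assumed here for illustration; the paper does not state which test statistic or lag was used. All names are hypothetical.

```python
import math


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant series: correlation undefined")
    return cov / math.sqrt(var_a * var_b)


def lag1_spearman(series):
    """Spearman correlation between x[t] and x[t+1]."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    return pearson(average_ranks(series[:-1]), average_ranks(series[1:]))


def is_serially_correlated(series, z_crit=1.96):
    """Approximate test: r * sqrt(m - 1) ~ N(0, 1) under independence (m = pairs)."""
    m = len(series) - 1
    return abs(lag1_spearman(series)) * math.sqrt(m - 1) > z_crit


# Hypothetical 26-year series (1980-2005) with a steady decline.
levels = [-10.0 - 0.5 * t for t in range(26)]
print(lag1_spearman(levels), is_serially_correlated(levels))
```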

The next step of the second phase was to apply the Mann–Kendall test to each well without serial correlation. To account for autocorrelation, the modified Mann–Kendall test proposed by Hamed and Rao (1998) was applied to the autocorrelated wells (PZ11, 445YEB, and PZ46). The significance level of the Mann–Kendall test was set at α = 0.05, for which the critical value of Z (two-tailed, 95% confidence level) is 1.96. A groundwater-level time series with |Z| > 1.96 therefore exhibits a significant upward or downward trend. For the low-level season, all groundwater monitoring wells showed significant downward trends (Table 3).

Table 3 Results of trend analysis
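The standard Mann–Kendall decision rule described above can be sketched as follows. The test statistic S counts concordant minus discordant pairs, its variance includes the usual correction for tied values, and Z includes the continuity correction. This sketch does not implement the Hamed and Rao (1998) variance correction applied to the autocorrelated wells. Function names and the example series are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series, z_crit=1.96):
    """Standard Mann-Kendall test with tie correction; returns (S, Z, trend)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    ties = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    if z > z_crit:
        trend = "upward"
    elif z < -z_crit:
        trend = "downward"
    else:
        trend = "no significant trend"
    return s, z, trend


# Hypothetical September water levels (m) with a declining tendency.
levels = [-8.7, -9.9, -9.5, -11.8, -12.6, -12.1, -14.0, -15.3]
print(mann_kendall(levels))
```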

Figure 8 illustrates the spatial distribution of the trends, expressed by Sen's slope estimator, for the low-level season. Monitoring well 445YEB showed the steepest decline (− 1.699 m/year), followed by SR4 (− 1.504 m/year) and PZ46 (− 1.400 m/year). The smallest statistically significant declines were recorded in LB117 (− 0.595 m/year) and 540B (− 0.938 m/year).

Fig. 8 Spatial distribution of trends in the low-level season

Concerning the third phase and the evaluation of the PSI data provided by the Terrafirma project, of a total of 5848 PSs, 94.19% were characterized as stable using a threshold of ± 1.50 mm/year, while only 5.06% showed a downward velocity exceeding 1.50 mm/year.

Almost 65% of the PSs with subsidence rates lower than 3.00 mm/year are located in areas where the loose deposits are less than 50 m thick. Conversely, almost 64% of the PSs with subsidence rates higher than 3.00 mm/year are located in areas where the loose deposits are more than 50 m thick. The area north of Farsala (Fig. 9) showed the highest deformation rate (− 20.34 mm/year), whereas most of the PSs had values between − 8.00 and − 12.00 mm/year.

Fig. 9 Spatial distribution of the surface ruptures and PSI data. The photographs at the bottom of the figure illustrate damage to the road network and buildings located inside the yellow ellipse

In addition to the PSI data, the spatial distribution of the surface ruptures (Fig. 9) showed that the differential deformations are related to the tectonic lines crossing the town. It is evident that the variation in the thickness of the Quaternary deposits caused by the fault offsets has produced differential deformations, projected to the surface as ruptures. The PSs close to the surface ruptures, accounting for about 25% of all PSs, showed deformation rates below − 9.5 mm/year.

In the final phase, 119 of the PSs with subsidence rates exceeding 1.50 mm/year were selected for further analysis. They were split into a training dataset (80% of the PSs) and a validation dataset (the remaining 20%). For each PS, the values of the thickness of the loose deposits, the Sen's slope value of the groundwater-level trend, and the Compression Index were obtained using the Extract Multi Values to Points tool of the Extraction toolset in the Spatial Analyst toolbox (ESRI 2013). Figure 10 summarizes the descriptive statistics of each variable as box plots.
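A reproducible 80/20 split of the 119 selected PSs could be sketched as below. The paper does not state how the PSs were assigned to each subset or how the fractional count was rounded. Random assignment with a fixed seed and rounding to the nearest integer (95 training and 24 validation PSs) are therefore assumptions, and the PS identifiers are hypothetical.

```python
import random


def train_validation_split(ids, train_fraction=0.8, seed=42):
    """Randomly split identifiers into training and validation subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


ps_ids = [f"PS{i:03d}" for i in range(1, 120)]  # 119 hypothetical PS identifiers
train, validation = train_validation_split(ps_ids)
print(len(train), len(validation))
```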

Fig. 10 Variable statistics

For the implementation of RF, following a tuning procedure (Liaw and Wiener 2002), the number of variables randomly sampled at each split (mtry) was set to 1, the node size, which controls tree growth, was set to 5, and the number of trees in the forest was set to 1500.

A multiple linear regression (MLR) analysis was also conducted to provide a baseline model for comparison with the RF model. Table 4 presents the results of both models and shows that RF outperforms MLR. In the training dataset, the RF model achieved an R² of 0.8861 and an RMSE of 1.93, whereas the MLR model achieved an R² of 0.4110 and an RMSE of 3.16. The same pattern was observed in the validation dataset, where the RF model achieved an R² of 0.7503 and an RMSE of 2.43, compared with an R² of 0.2001 and an RMSE of 2.79 for the MLR model.

Table 4 Results of analysis

Figure 11 compares the observed deformation rates with the values predicted by each model for all PS points. Both models appear to perform better within the range between − 3.00 and − 11.00 mm/year, with the RF predictions concentrated more closely along the gray diagonal. This diagonal represents perfect agreement between predicted and observed deformation.

Fig. 11 Observed deformation vs. predicted deformation rate

The mean decrease in accuracy and the mean decrease in Gini both indicated that the most important variable was the thickness of the loose deposits, followed by the Sen’s slope value of the groundwater-level trend and the Compression Index (Table 5).
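For regression forests, the R randomForest package reports these two measures as %IncMSE (permutation based) and IncNodePurity (decrease in residual sum of squares). A minimal sketch of the permutation idea follows: shuffle one predictor at a time and record how much the error grows. It assumes a generic, hypothetical `predict` callable and evaluates on the supplied data rather than on out-of-bag samples as randomForest does.

```python
import random


def mse(observed, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each predictor column is shuffled.

    predict: callable mapping one row (list of predictor values) to a prediction.
    A larger value means the model relies more on that predictor.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            permuted = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                permuted.append(new_row)
            increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model that only uses predictor 0 (e.g., deposit thickness)
def toy_model(row):
    return -0.1 * row[0]


X = [[float(t), float((t * 7) % 5)] for t in range(10, 110, 10)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Shuffling the unused second predictor leaves the error unchanged, so its importance is zero, while shuffling the first predictor increases the error.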

Table 5 Variable importance index

Furthermore, the developed model was used to estimate the potential deformation rate under different water-pumping scenarios. Two scenarios were examined, a 20% decrease and a 20% increase in the Sen’s slope value of the groundwater-level trend, and the corresponding deformation rates were predicted. In the first scenario (a 20% decrease in the Sen’s slope value), the mean deformation rate would decrease by up to 9.01%. In the second scenario (a 20% increase in the pumping volume), the mean deformation rate would increase by up to 12.12%. Since surface ruptures and damage to buildings and the road network have been recorded in areas with deformation rates lower than −8.50 mm/year, it can be inferred that the first scenario would probably provide a safer environment, as the predicted rates would remain well short of this threshold in magnitude. Conversely, the second scenario would increase the probability of surface ruptures and damage to buildings and infrastructure, since over 31.54% of all PSs would exhibit deformation rates lower than −8.50 mm/year. Table 6 lists the deformation rates predicted by the RF model for a set of representative PSs under both scenarios.
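The scenario analysis can be sketched as follows. It assumes that each scenario scales the Sen’s slope predictor of every PS by a fixed factor (0.8 or 1.2) while the other predictors are held constant. The study does not detail how the scenarios were applied, and `toy_model` is a hypothetical linear stand-in for the trained RF.

```python
def apply_scenario(predict, ps_rows, slope_index, factor, threshold=-8.5):
    """Scale the Sen's slope predictor of every PS by `factor` and summarize.

    Returns (mean % change in deformation rate, % of PSs below `threshold`).
    Rates are negative, so a positive % change means faster subsidence.
    """
    baseline = [predict(row) for row in ps_rows]
    changed = []
    for row in ps_rows:
        new_row = list(row)
        new_row[slope_index] *= factor
        changed.append(predict(new_row))
    pct = [100.0 * (c - b) / b for c, b in zip(changed, baseline) if b != 0]
    mean_pct = sum(pct) / len(pct)
    below = 100.0 * sum(1 for c in changed if c < threshold) / len(changed)
    return mean_pct, below


# Hypothetical linear stand-in for the trained RF: rate = -2 + 1.0 * slope
def toy_model(row):
    _thickness, slope, _cc = row
    return -2.0 + 1.0 * slope


ps = [[30.0, -4.0, 0.2], [60.0, -8.0, 0.3]]  # [thickness, Sen's slope, Cc]
print(apply_scenario(toy_model, ps, slope_index=1, factor=0.8))  # reduced pumping
print(apply_scenario(toy_model, ps, slope_index=1, factor=1.2))  # increased pumping
```

The second value returned corresponds to the kind of statistic quoted above (the share of PSs below −8.50 mm/year).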

Table 6 Results of predictive analysis for seven representative PS

Discussion

The analysis of the groundwater-level data indicates significant drawdown within the research area. Although the aquifers are naturally recharged every year during the wet season, the overall trend is one of persistent drawdown. According to the box–whisker plots and the one-way ANOVA test, almost all groundwater monitoring wells exhibit statistically significant fluctuations, which are clearly related to the excessive and continuing abstraction of water for irrigation. The only exception is monitoring well 540B, whose low fluctuation may be related to its proximity to the limestone formations that host the karstic aquifer of the area.
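The one-way ANOVA F statistic can be computed as sketched below. The grouping of the groundwater-level readings (for example, by hydrological year for each well) is an assumption, since the grouping factor is not restated here, and the values are hypothetical. In practice, `scipy.stats.f_oneway` returns both the F statistic and its p-value.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    groups: list of lists of groundwater levels, one list per group
    (e.g., per hydrological year; the grouping is an assumption).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical groundwater-level readings for one well in two groups
print(one_way_anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # (13.5, 1, 4)
```

A large F relative to the critical value for (df_between, df_within) indicates that the group means differ by more than the within-group scatter would explain.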

The outcomes of the present study are consistent with the results of previous studies. Specifically, according to Manakos (2010), over-exploitation of the water resources has caused a systematic groundwater-level drawdown in several parts of the research area. Irrigated crops, especially cotton, have expanded over the last 40 years, extending even into areas where water resources are limited. As mentioned above, the most affected area is the northern part of the Farsala plain, where the drawdown tendency increases with the distance of the monitoring wells from the laterally recharging karstic aquifer of the Narthaki mountain’s limestone. This area is also covered by formations with higher Compression Index values and is characterized by successive thick layers of loose deposits.

Regarding the mechanism of land subsidence, Modis and Sideri (2015) examined the spatiotemporal correlation and cross-correlation between PSI data and groundwater levels in the wider western Thessaly plain; they identified a high spatiotemporal correlation and reported that uniformly sampled groundwater levels could be used as an auxiliary variable for estimating surface deformation. In general, excessive groundwater withdrawal from aquifer systems decreases the pore water pressure and increases the effective stress, which results in the compaction of the hydrostratigraphic units and eventually leads to land subsidence (Galloway and Burbey 2011). In our study, two additional variables were introduced to model land subsidence, namely the thickness of the loose deposits and the Compression Index of the geological formation covering the site, while the influence of groundwater was expressed through the Sen’s slope value, a metric that estimates the magnitude of a detected trend in the groundwater-level data as a rate of change per unit time. The high accuracy of the RF model suggests that the proposed methodology is adequate and applicable, whereas the low accuracy of the MLR model may be explained by the complex nature of subsidence and the nonlinear behavior of the parameters used to describe the phenomenon.
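The Sen’s slope (Theil–Sen estimator) is the median of the slopes between all pairs of observations. The sketch below applies it to a hypothetical annual groundwater-level series; the times and levels are illustrative only. Because it uses the median, a single anomalous reading barely affects the estimated trend.

```python
from itertools import combinations
from statistics import median


def sens_slope(times, values):
    """Theil-Sen slope: median of the slopes between all pairs of observations."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i, j in combinations(range(len(times)), 2)
        if times[j] != times[i]
    ]
    return median(slopes)


# Hypothetical annual groundwater levels (m) with one outlier in year 2
years = [0, 1, 2, 3]
levels = [10.0, 9.0, 20.0, 7.0]
print(sens_slope(years, levels))  # -1.0
```

Here the pairwise slopes are −13, −1, −1, −1, 5 and 11, so the median is −1.0 m per year despite the outlier. A negative value indicates a declining groundwater level.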

Regarding the degree of influence of the geo-environmental variables on the evolution of land subsidence, the RF model identified the thickness of the loose deposits as the most important variable. Specifically, areas where the loose Quaternary deposits are thicker than 50 m are more likely to exhibit subsidence rates greater than 3.00 mm/year. The northern part of the town of Farsala is founded on compressible, loose, clayey Quaternary formations, while the southern part is founded on the rigid limestone of the Mount Narthaki foothills. The highly compressible clays are easily affected by groundwater drawdown, allowing land subsidence to manifest. In contrast, the rigid limestone formations influence the distribution of the deformations by limiting both the areal extent and the thickness of the clays. Similar results were obtained by Zhu et al. (2013a), who assigned the highest weights to the thickness of compressible sediments and the thickness of Quaternary strata, among variables that also included changes in the groundwater level of the unconfined and confined aquifer systems, building density, and recharge from precipitation infiltration. They found that regions where the thickness of compressible sediments and of Quaternary strata exceeded 300 and 350 m, respectively, were the most vulnerable to subsidence.

Overall, the outcomes of the present study agree with the theory of subsidence evolution, which holds that an excessive lowering of the groundwater level causes a marked change in the geostatic loads, triggering or accelerating the consolidation of compressible ground layers.
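One way to see why layer thickness and the Compression Index enter the model is the classical one-dimensional primary consolidation relation for a normally consolidated clay layer, S = Cc·H/(1 + e0)·log10((σ'0 + Δσ')/σ'0). In this relation, H is the layer thickness, e0 the initial void ratio, σ'0 the initial effective stress at mid-layer, and Δσ' ≈ γw·Δh the effective-stress increase caused by a drawdown Δh. The study does not state that it uses this relation. The sketch assumes normal consolidation, a pore-pressure drop equal to the full drawdown throughout the layer, and hypothetical input values.

```python
import math

GAMMA_W = 9.81  # unit weight of water, kN/m^3


def consolidation_settlement(cc, thickness_m, e0, sigma0_kpa, drawdown_m):
    """Primary consolidation settlement (m) of a normally consolidated clay layer.

    Assumes the pore-pressure drop equals GAMMA_W * drawdown over the whole
    layer and uses the mid-layer initial effective stress sigma0_kpa.
    """
    delta_sigma = GAMMA_W * drawdown_m  # effective-stress increase, kPa
    sigma1 = sigma0_kpa + delta_sigma
    return cc * thickness_m / (1.0 + e0) * math.log10(sigma1 / sigma0_kpa)


# Hypothetical layer: Cc = 0.3, H = 10 m, e0 = 1.0, sigma'0 = 100 kPa, 10 m drawdown
print(round(consolidation_settlement(0.3, 10.0, 1.0, 100.0, 10.0), 3))  # about 0.445 m
```

In this relation, settlement scales linearly with both H and Cc and only logarithmically with the stress increase, so thick, compressible deposits settle most for a given drawdown.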

The value of the developed methodology lies in its ability to estimate the influence of each variable on the evolution of subsidence and to predict the deformation rate in response to groundwater-level variations. The scenario analysis indicates that reducing the volume of pumped water would provide a safer environment; such a reduction could be the first action of a water resource management plan aimed at mitigating land subsidence. Further actions toward a more sustainable environment could include reducing the irrigated area, improving irrigation systems to minimize water losses, and possibly switching to less water-demanding crops (Loukas et al. 2007). In addition, the availability of surface water could be increased by constructing dams and reservoirs or by diverting water from nearby river systems.

Conclusion

The present study provides a methodological approach for investigating land subsidence by combining spatiotemporal analysis of groundwater resources, remote sensing techniques, and data mining methods, applied to the wider Farsala plain in western Thessaly, Greece. A spatiotemporal analysis of groundwater-level data from ten monitoring wells revealed that all wells presented a significant downward trend. The most vulnerable area was located in the northern Farsala plain, near monitoring wells 445YEB, SR4, and PZ46. The analysis of the PSI data confirmed that this area exhibits the highest deformation rates, reaching −20.34 mm/year in places. The RF method was used to predict the subsidence deformation rate from three related variables: the thickness of the loose deposits, the Sen’s slope value of the groundwater-level trend, and the Compression Index of the formation covering the area of interest. According to the RF model, the most important variable was the thickness of the loose deposits, followed by the Sen’s slope value of the groundwater-level trend and, last, the Compression Index. The high accuracy achieved by the RF model (R² of 0.7503 on the validation dataset) indicates that the conceptual model, which is based on these three variables and describes the mechanism of subsidence in the study area, is adequate and applicable. The analysis also detected areas that exhibit deformation but have no recorded damage. It is almost certain that continued over-exploitation of the water resources will trigger further subsidence and expand the affected areas. Early detection of surface deformation allows measures to be taken before severe subsidence occurs and therefore enables timely protection of the affected areas.