Estimation of soil temperature using gene expression programming and artificial neural networks in a semiarid region

Behmanesh, Javad; Mehdizadeh, Saeid

doi:10.1007/s12665-017-6395-1

Original Article
Published: 11 January 2017

Volume 76, article number 76, (2017)

Environmental Earth Sciences

Abstract

Soil temperature (T _s) is one of the most important parameters which affect physical and chemical properties of soil. In the present study, two biologically inspired approaches for artificial intelligence including gene expression programming (GEP) and artificial neural networks (ANN), as well as multiple linear regression (MLR) were used to estimate the soil temperature at six different depths (5, 10, 20, 30, 50 and 100 cm) for the Sanandaj synoptic station in a semiarid region in western Iran. Twelve combinations of meteorological parameters, such as minimum and maximum air temperatures, relative humidity, wind speed, sunshine hours and extraterrestrial radiation, were used as input variables. The full data set containing soil temperature and atmospheric parameters, which spans the time period from 1997 to 2008, was divided into training (1997–2004) and testing (2005–2008) data sets. To evaluate the accuracy of the models, determination coefficient (R ²) and root mean square error (RMSE) were calculated. The results showed that the GEP, ANN and MLR were able to model T _s at different depths. However, the performance of the ANN approach was the best.

Introduction

Soil temperature (T _s) and its spatial and temporal variations affect physical and chemical processes in soil. Soil temperature is an important meteorological parameter for transferring heat energy from the atmosphere to soil and vice versa, solar energy applications such as the passive heating and cooling of buildings, frost depth prediction (construction depth of drainage structures and urban water supply networks), agricultural applications (crop growth, root development and potential evapotranspiration), hydrology (effect on climate change), geology, agronomy (determining suitable depth and time for seeds planting) and environmental studies (Mihalakakou 2002; Kocak et al. 2004; Jackson et al. 2008; Yilmaz et al. 2009). Soil temperature depends on a variety of environmental factors, including meteorological conditions such as surface global radiation and air temperature; soil physical parameters such as surface albedo, water content and texture; topographical variables such as elevation, slope and aspect; and other surface characteristics such as leaf area index and ground litter stores (Kang et al. 2000; Garcia-Suarez and Butler 2006; Paul et al. 2004).

Soil temperature is measured by using sensors. For example, THERM200 is a soil temperature probe, which has a temperature span from −40 to 85 °C. It outputs a voltage proportional to the temperature, so no complex equations are required to calculate the temperature from voltage. It is highly accurate with 0.125 °C of resolution. Empirical equations and models can be suitable to estimate T _s because of high costs of direct measurements. Soil temperature can be estimated by two different approaches based upon: (1) soil heat flow and energy balance and (2) empirical correlations with easily acquirable variables (Kang et al. 2000). The former approach can give accurate predictions for a well-evaluated site. However, in many sites, there is not sufficient data for calculating heat transfer equations such as surface global radiation (Kang et al. 2000). In addition, several studies have been conducted to estimate soil temperature by using various models, such as analytical, semi-analytical, numerical and experimental models (Hanks et al. 1971; Ghuman and Lal 1982; Paul et al. 2004; Prangnell and McGowan 2009; Droulia et al. 2009). Ghuman and Lal (1982) predicted soil temperature by Fourier analysis in a tropical area. The results demonstrated the high accuracy of the Fourier analysis. Usowicz and Walczak (1994) presented a mathematical model of heat flow to predict soil temperature. The results showed that the estimated values of soil temperature had acceptable accuracy. Droulia et al. (2009) estimated subsurface ground temperature profiles of a bare soil. They used an experimental plot located in the Agricultural University of Athens campus and analytical and semiempirical models. It was concluded that the proposed models may serve as useful tools for predicting T _s.

Recently, artificial intelligence techniques have been increasingly used as alternative methods to estimate meteorological and environmental parameters such as soil temperature. These methods do not require knowledge of the internal relationships between the variables of the investigated system. Simple solutions for multivariable problems and factual calculation (for variables with nonlinear relationships) are other advantages of artificial intelligence methods (Zadeh 1992; Chaturvedi 2008; Huang et al. 2010). Gene expression programming (GEP) and artificial neural networks (ANN) are two examples of biologically inspired approaches to artificial intelligence.

Multiple linear regression (MLR) is one of the simplest methods for modeling variables which have a linear relationship. In contrast, GEP and ANN are biologically inspired methods which are able to model variables with nonlinear relationships. The relationships between soil temperature and meteorological parameters are generally nonlinear (Jungqvist et al. 2014).

GEP has been applied by many researchers across a wide range of sciences, for example in rainfall–runoff modeling (Aytek et al. 2008), modeling of daily river discharge (Guven 2009), modeling of groundwater table fluctuations (Shiri et al. 2013), estimation of potential evapotranspiration, i.e., a measure of the ability of the atmosphere to remove water from the surface through evaporation and transpiration assuming no control on water supply (Traore and Guven 2012; Shiri et al. 2014a), solar radiation modeling (Landeras et al. 2012; Mehdizadeh et al. 2016), estimation of dew point temperature (Shiri et al. 2014b), function finding in component thermodynamical selection (Guo et al. 2014) and peak flood estimation (Zorn and Shamseldin 2015). To our knowledge, no study has yet been conducted on soil temperature estimation using GEP. In contrast, the ANN has been used extensively to predict T _s. George (2001) predicted T _s in Gujarat, India, by using ANN with relative humidity, wind speed and air temperature serving as input parameters. The results showed that the ANN was able to estimate T _s. Mihalakakou (2002) modeled daily and annual variations of soil surface temperatures in Athens and Dublin by a deterministic equation (based on the transient heat conduction differential equation, with the energy balance equation at the ground surface as boundary condition) and by neural network approaches. It was found that the proposed neural network was able to estimate the soil surface temperature distribution. Bilgili (2010) estimated the monthly T _s using linear and nonlinear regression and artificial neural networks in Adana, Turkey. He concluded that the ANN performed better than both regression methods. Tabari et al. (2011) compared the ANN and MLR methods to estimate T _s in Isfahan, an arid region of Iran. The results showed that the ANN predictions were superior to those of the MLR.
Bilgili et al. (2013) estimated T _s by multi-nonlinear regression and ANN at eight stations in Turkey. They concluded that the ANN model provides a simple and accurate prediction of T _s. Also, the ANN had a better performance than the nonlinear regression method. Tabari et al. (2014) predicted short-term soil temperature using ANN for two weather stations located in humid (Sari) and arid (Zahedan) regions of Iran. They concluded that the ANN can be successfully applied to provide accurate and reliable short-term soil temperature forecasts. Kisi et al. (2015) modeled monthly T _s in Mersin, Turkey, by using three different neural techniques which are multilayer perceptron, radial basis neural networks and generalized regression neural networks. Radial basis neural networks were found to be better than the generalized regression neural networks and multilayer perceptron in estimating monthly T _s at 5 and 10 cm depths, while the multiple linear regression and generalized regression neural networks gave the best accuracy at 50 and 100 cm depths, respectively. Further studies have been conducted to estimate soil temperature by Kisi et al. (2016), Hosseinzadeh Talaee (2014), Kim and Singh (2014), Shaker et al. (2014), Napagoda and Tilakaratne (2012).

As mentioned, accurate estimation of soil temperature is one of the most important problems in agricultural and environmental fields. Since strawberries are grown in Sanandaj and the planting time of this crop depends on soil temperature, estimating and predicting soil temperature plays an important role in obtaining a good yield. The literature review revealed that GEP has not been used to estimate T _s. Furthermore, GEP yields an algebraic equation which can easily be reused in the future. Moreover, the ANN is commonly used to model environmental variables with nonlinear dynamics. Besides the GEP and ANN models, the MLR method was employed to predict T _s. Twelve combinations of meteorological parameters, such as minimum air temperature, maximum air temperature, extraterrestrial radiation, relative humidity, wind speed and sunshine hours, were used as inputs in GEP, ANN and MLR.

Materials and methods

Study area and meteorological data

The considered site is Sanandaj which is located at latitude 35° 20′N and longitude 47° 00′E in western Iran (Fig. 1). Sanandaj has an area of 2906 km² and is located 1373.4 m above free water level. According to the de Martonne index (1925), Sanandaj is located in a semiarid region. The meteorological conditions in Sanandaj during the study period (1997–2008) are summarized in Table 1.

Table 1 Statistical characteristics of daily meteorological parameters for Sanandaj during 1997–2008

The data for the present study included minimum and maximum air temperatures (T _min, T _max), relative humidity (RH), wind speed (U ₂), sunshine hours (n) and soil temperature (T _s) at six different depths (5, 10, 20, 30, 50 and 100 cm). The data were collected from the Islamic Republic of Iran Meteorological Organization (IRIMO) for the period 1997–2008. It should be noted that the average of the soil temperatures measured at three times of day (03:00, 09:00 and 15:00) was taken as the mean daily T _s. The other meteorological variables are measured at 00:00, 03:00, 06:00, 09:00, 12:00, 15:00, 18:00 and 21:00. The data from 1997 to 2004 and from 2005 to 2008 were used in the training and testing stages, respectively.
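The daily averaging and the year-based split described above can be sketched as follows. This is a minimal illustration only: the record layout, the function names and the temperature values are assumptions made for the example, not the IRIMO data format.

```python
from statistics import mean

# Hours at which soil temperature is read, as stated in the text.
SOIL_READING_HOURS = ("03:00", "09:00", "15:00")

def mean_daily_ts(readings):
    """Average the three soil-temperature readings of one day (degrees C)."""
    return mean(readings[hour] for hour in SOIL_READING_HOURS)

def split_by_year(records, last_training_year=2004):
    """Split records into training (up to 2004) and testing (2005 onward)."""
    train = [r for r in records if r["year"] <= last_training_year]
    test = [r for r in records if r["year"] > last_training_year]
    return train, test

# Hypothetical readings for a single day at one depth.
day = {"03:00": 8.0, "09:00": 11.0, "15:00": 17.0}
daily_ts = mean_daily_ts(day)

# One hypothetical record per year of the study period 1997-2008.
records = [{"year": year, "ts": daily_ts} for year in range(1997, 2009)]
train, test = split_by_year(records)
```

Splitting by calendar year rather than at random keeps the testing period strictly later than the training period, which matches the 1997–2004 and 2005–2008 division used in the study.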

Gene expression programming (GEP)

GEP was presented by Ferreira (2001). GEP is one of several machine learning techniques which are based on the concept of Darwinian evolution, and it belongs to a broad family of techniques called evolutionary algorithms. All these techniques are based on the Darwinian principle of reproduction and survival of the fittest. In this method, the fittest individuals of a population are selected; if no satisfactory solution has been found, a new population is reproduced from them. One important point that distinguishes GEP from many other artificial intelligence techniques is the representation of the solutions: GEP returns algebraic equations. The most important advantages of GEP over other intelligent models are: (1) the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, transpose, etc.) and (2) the expression trees are exclusively the expression of their respective chromosome (Ferreira 2001).

The flowchart of a GEP algorithm is shown in Fig. 2. The generation of an initial population is the first stage in GEP. This can be done randomly or by incorporating prior knowledge of the problem to be solved. Then, the chromosomes are expressed in the form of expression trees. The results are evaluated using a fitness function to determine the suitability of each solution. If a solution of satisfactory quality is found, the evolution process is stopped and the best solution obtained up to this stage is reported. If the stop condition is not satisfied, the best solution from the present generation is copied to the next generation, and the remaining chromosomes are selected to reproduce with modification. During reproduction, it is the chromosomes of the individuals, not the expression trees, which are reproduced with modification and transmitted to the new generation. The process is repeated for a certain number of generations or until a solution has been found.

In this study, the GeneXpro Tools 4.0 program was employed to develop the models based on gene expression programming. To apply GEP to a given problem (e.g., T _s), the user needs to provide the following:

1. In the first step, a terminal set is selected, which contains the independent variables of the investigated phenomenon. In this study, the input variables were different combinations of meteorological variables, e.g., minimum and maximum air temperatures, relative humidity, wind speed at 2 m height, sunshine hours and extraterrestrial radiation (Table 2).
Table 2 Different GEP and ANN scenarios and the respective input variables
2. The second step is to select the function set. In the present study, the function set includes the four basic arithmetic operators $\left\{ { + , - , \times , \div } \right\}$ and a set of algebraic and transcendental functions $\left\{ {x^{2} ,x^{3} ,\sqrt x ,\sqrt[3]{x},Lnx,e^{x} ,Sinx,Cosx,Atanx} \right\}$.
3. The third step is to choose an index for evaluating the model’s accuracy. In this research, the root mean square error (RMSE) was used as the fitness function.
4. In the fourth step, the structure and architecture of the chromosomes are selected. Here, the head size (the head is the part of a gene that may contain functions as well as terminals), the number of genes (each gene codes for an expression tree, and trees may differ in size and shape) and the number of chromosomes (each composed of one or more genes of equal size) were set to 8, 3 and 30, respectively. Addition was used as the linking function between expression trees. Finally, the genetic operators (which modify the chromosomes) and their corresponding rates were selected (Table 3).
Table 3 Setup of the genetic operators in GEP
5. In the fifth step, a stop condition must be defined. In this study, a generation number of 1000 was employed as the stop criterion.

It should be noted that GEP is not a deterministic model, so repeated GEP runs are likely to give different results. In the present study, GEP was run for all scenarios (see Table 2). Then, the best run for each scenario was selected.
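The way a linear chromosome is expressed as expression trees and linked by addition can be sketched as below. This is an illustrative sketch, not GeneXpro Tools code: the one-letter symbols (`a` for T_min, `b` for T_max, `r` for R_a, `Q` for square root, `L` for Ln, `E` for e^x), the three genes and the input values are all hypothetical. Each gene has a head of 8 symbols and, following Ferreira's scheme for functions of at most two arguments, a tail of 8 × (2 − 1) + 1 = 9 terminals.

```python
import math

# Function set: symbol -> (arity, implementation); a subset of the paper's set.
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "Q": (1, math.sqrt),  # square root
    "L": (1, math.log),   # natural logarithm
    "E": (1, math.exp),   # e^x
}

def decode(gene):
    """Translate a linear gene into an expression tree, level by level."""
    symbols = iter(gene)
    root = (next(symbols), [])
    pending = [root]
    while pending:
        symbol, children = pending.pop(0)
        arity = FUNCTIONS[symbol][0] if symbol in FUNCTIONS else 0
        for _ in range(arity):
            child = (next(symbols), [])
            children.append(child)
            pending.append(child)
    return root  # symbols left unread form the non-coding region of the gene

def evaluate(node, terminals):
    """Evaluate an expression tree for given terminal values."""
    symbol, children = node
    if symbol in FUNCTIONS:
        func = FUNCTIONS[symbol][1]
        return func(*(evaluate(child, terminals) for child in children))
    return terminals[symbol]

def express(chromosome, terminals):
    """Link the sub-expression trees of all genes with addition."""
    return sum(evaluate(decode(gene), terminals) for gene in chromosome)

# Hypothetical chromosome with three genes of 17 symbols each (head 8 + tail 9).
GENE_1 = "L*QrE+ab" + "aabrbarab"  # decodes to Ln(sqrt(e^(a + b)) * r)
GENE_2 = "+abrabra" + "brabrabra"  # decodes to a + b
GENE_3 = "rabrabra" + "brabrabra"  # decodes to r
CHROMOSOME = [GENE_1, GENE_2, GENE_3]

# Hypothetical inputs: T_min, T_max (degrees C) and R_a.
inputs = {"a": 5.0, "b": 15.0, "r": 30.0}
sub_et_1 = evaluate(decode(GENE_1), inputs)
estimate = express(CHROMOSOME, inputs)
```

Because decoding stops once every function has received its arguments, mutations in the unread part of a gene leave the expression unchanged, while mutations in the head can change the size and shape of the tree without ever producing an invalid expression.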

Artificial neural networks (ANN)

Artificial neural networks are an artificial intelligence technique inspired by biological neural networks. The ANN establishes a model between a set of input and output variables. Artificial neural networks typically consist of three layers: input layer, hidden layer and output layer (Fig. 3a). Each layer consists of one or several nodes (neurons). The input layer contains the input variables. The output layer contains the output variable(s) of the ANN. The hidden layer(s) process the information between the input and output layers.

The structure of a neuron is shown in Fig. 3b. The input to a node is given by a vector x = (x ₁, x ₂, …, x _n) with n components. The inputs are multiplied with corresponding weights w = (w ₁, w ₂, …, w _n) and added up to calculate transfer function variable, s,

$$s = \sum\limits_{i = 1}^{n} {w_{i} x_{i} } = w^{T} x$$

(1)

These weights are optimized by the model. To obtain the output of the respective neuron, s is transferred using a nonlinear transfer function f,

$$y = f(s)$$

(2)

The nonlinear transfer function usually is defined as a sigmoid function,

$$f(s) = \frac{1}{{(1 + e^{ - s} )}}$$

(3)

Further detailed information about the ANN can be found in Haykin (1998).
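Equations 1 to 3 can be traced with a few lines of code. The sketch below is only an illustration of a single neuron; the function names and the weight and input values are hypothetical and are not taken from the trained networks.

```python
import math

def sigmoid(s):
    """Sigmoid transfer function of Eq. 3."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(weights, inputs):
    """Weighted sum of the inputs (Eq. 1) passed through the transfer function (Eq. 2)."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(s)

# Hypothetical weights and inputs chosen so that the weighted sum is zero.
weights = [0.5, -0.25]
inputs = [2.0, 4.0]
y = neuron_output(weights, inputs)  # sigmoid(0) is exactly 0.5
```

The sigmoid maps any weighted sum to the interval (0, 1), which is one reason inputs and targets are commonly rescaled before training such networks.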

In this study, a three-layer feed-forward neural network (Fig. 3a) with a back-propagation learning algorithm was employed to estimate soil temperature. To develop the ANN models, the ANN toolbox in MATLAB R2014a was used.

In the present study, the input layer contains several nodes defined by one of the various combinations of meteorological parameters (Table 2). The output layer contains one node representing the soil temperature at a certain depth. The number of neurons in the hidden layer was optimized by trial and error procedure.

Back-propagation algorithm

More than 70% of the studies implementing ANN for environmental and hydrological applications employed the back-propagation learning algorithm because of its simplicity and robustness (Kumar et al. 2011). The back-propagation algorithm is divided into two stages: a forward and a backward stage. In the forward stage, all samples enter the network and the neuron values are computed successively from the input to the output layer. In this stage, each input to a neuron is multiplied by its weight and the products are summed. The weights are initialized randomly by the ANN, and their optimum values are obtained during training. The input value of the jth neuron is calculated as,

$$s_{j} = \sum\nolimits_{i} {x_{i} } w_{ji}$$

(4)

where x _i is the ith neuron value and w _ji is the ith neuron connection weight to jth neuron. Then, s _j is converted to the neuron output by an activation function (f),

$$f(s_{j} ) = y_{j}.$$

(5)

The tangent–sigmoid function is often used as an activation function. In the forward stage, weights are not adjusted and remain constant. The outputs obtained are compared with the actual values, and the error function of the network is calculated. In the backward stage, the errors (Eq. 6) are used to update the weights from the output to the input layer. This stage aims to minimize the errors between the network output and the reference data. The forward and backward stages are repeated several times. The iteration process is stopped when the error function (Eq. 6) has been minimized. A popular method for this is the steepest descent method. In this method, the error function and the weight update are defined as follows (Eqs. 6 and 7),

$$E(w) = \frac{1}{2}\sum\limits_{j,k} {(y_{j,k} - o_{j,k} )^{2} }$$

(6)

where w is the weight, y _j,k is the output value of jth neuron in the last layer which is obtained from kth learning sample and o _j,k is the actual value of jth neuron in kth learning sample. There are different methods to minimize the value of E(w).

$$w_{j,i}^{n + 1} = w_{j,i}^{n} - \eta \frac{\partial E}{{\partial w_{j,i}^{n} }}$$

(7)

where $w_{j,i}^{n}$ is the weight of the connection from the ith neuron to the jth neuron at iteration n, $\eta$ is the learning constant (between 0 and 1) which is optimized by the model and $\frac{\partial E}{{\partial w_{j,i}^{n} }}$ is the partial derivative of the error function with respect to $w_{j,i}^{n}$.

Further detailed information about the back-propagation algorithm can be found in Rojas (1996).
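The forward and backward stages with the steepest descent update of Eq. 7 can be sketched for a very small network. This is not the MATLAB toolbox implementation used in the study: the 2-2-1 layout, the tanh hidden layer with a linear output node, the fixed learning constant and all numerical values are assumptions made for the illustration.

```python
import math

def forward(x, w_hidden, w_out):
    """Forward stage: hidden activations (tanh) and a linear network output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, y

def train_step(x, o, w_hidden, w_out, eta=0.1):
    """One forward and backward pass; returns updated weights and the error E(w)."""
    hidden, y = forward(x, w_hidden, w_out)
    error = 0.5 * (y - o) ** 2  # Eq. 6 for a single sample and output
    delta_out = y - o           # dE/dy
    # Backward stage: gradients of E with respect to every weight.
    grad_out = [delta_out * h for h in hidden]
    delta_hidden = [delta_out * w * (1.0 - h * h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    # Steepest descent update (Eq. 7): w_new = w_old - eta * dE/dw.
    new_w_out = [w - eta * g for w, g in zip(w_out, grad_out)]
    new_w_hidden = [
        [w - eta * g for w, g in zip(row, grad_row)]
        for row, grad_row in zip(w_hidden, grad_hidden)
    ]
    return new_w_hidden, new_w_out, error

# Hypothetical scaled inputs, target and initial weights.
x, o = [0.5, -1.0], 0.8
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.2, -0.1]

errors = []
for _ in range(200):
    w_hidden, w_out, error = train_step(x, o, w_hidden, w_out)
    errors.append(error)
```

With a small learning constant the recorded error shrinks from one iteration to the next; too large a value of η can make the updates overshoot the minimum, which is why η is normally tuned.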

Multiple linear regression (MLR)

Besides applied artificial intelligence models, i.e., GEP and ANN, the MLR method was employed to estimate T _s. The general form of the MLR approach is as follows:

$$Y = a_{0} + a_{1} X_{1} + a_{2} X_{2} + \cdots + a_{n} X_{n}$$

(8)

where a ₀, …, a _n are constants which are obtained by linear regression between dependent variable (Y, i.e., T _s) and independent variables used in the present study (X ₁, …, X _n, i.e., different combinations of predictors).
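The constants of Eq. 8 are commonly obtained by ordinary least squares. The sketch below solves the normal equations in plain Python as an assumption about the fitting procedure, since the source does not state which software or solver was used for MLR; the predictor values are synthetic and generated from known coefficients so that the result can be checked.

```python
def fit_mlr(X, y):
    """Return [a0, a1, ..., an] of Eq. 8 by solving the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # leading 1 for a0
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]

def predict(coefficients, x):
    """Evaluate Eq. 8 for one set of predictor values."""
    return coefficients[0] + sum(a * v for a, v in zip(coefficients[1:], x))

# Synthetic (T_min, T_max) pairs; targets follow Y = 2 + 0.5 T_min + 0.3 T_max.
X = [(0, 10), (5, 18), (10, 22), (15, 31), (20, 35)]
y = [2 + 0.5 * t_min + 0.3 * t_max for t_min, t_max in X]
coefficients = fit_mlr(X, y)
```

On noise-free synthetic data the fit recovers the generating coefficients; on real observations the same procedure returns the coefficients that minimize the squared residuals.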

Evaluation criteria

To evaluate the performance of GEP, ANN and MLR for soil temperature prediction, two statistical indices, the determination coefficient (R ²) and the root mean square error (RMSE) were used (Eqs. 9 and 10):

$$R^{2} = \left( {\frac{{\sum\nolimits_{i = 1}^{N} {(P_{i} - \overline{{P_{i} }} )(O_{i} - \overline{{O_{i} }} )} }}{{\sqrt {\sum\nolimits_{i = 1}^{N} {(P_{i} - \overline{{P_{i} }} )^{2} } \sum\nolimits_{i = 1}^{N} {(O_{i} - \overline{{O_{i} }} )^{2} } } }}} \right)^{2}$$

(9)

$$RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {(P_{i} - O_{i} )^{2} } }}{N}}$$

(10)

where P _i is the ith T _s value estimated by the GEP, ANN or MLR approach; O _i is the ith observed T _s; $\overline{{P_{i} }}$ is the average of the estimated T _s values; $\overline{{O_{i} }}$ is the average of the observed T _s values; and N is the number of observations.
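Both indices are straightforward to compute. The sketch below follows Eqs. 9 and 10 term by term; the function names and the observed and estimated values are hypothetical and chosen only to show how the two indices behave.

```python
import math

def rmse(predicted, observed):
    """Root mean square error (Eq. 10)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def r_squared(predicted, observed):
    """Determination coefficient as the squared correlation (Eq. 9)."""
    p_mean = sum(predicted) / len(predicted)
    o_mean = sum(observed) / len(observed)
    covariance = sum((p - p_mean) * (o - o_mean) for p, o in zip(predicted, observed))
    p_spread = sum((p - p_mean) ** 2 for p in predicted)
    o_spread = sum((o - o_mean) ** 2 for o in observed)
    return covariance ** 2 / (p_spread * o_spread)

# Hypothetical observed values and estimates that are all 1 degree C too high.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 13.0, 15.0, 17.0]
error = rmse(predicted, observed)
fit = r_squared(predicted, observed)
```

In this example R² equals 1 although every estimate is off by 1 °C, because the squared correlation is insensitive to a constant bias; the RMSE of 1 °C exposes it. This is a practical reason for reporting both indices together.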

Results and discussion

The variations of the annual mean daily soil temperature at different soil depths are presented in Fig. 4. In the studied period, maximum and minimum soil temperatures occurred at 5 and 20 cm depths, respectively. The maximum and minimum soil temperatures at 5 and 20 cm depths were 19.9 °C (in 2008) and 16.6 °C (in 1997), respectively.

The statistical characteristics of daily soil temperature at various depths in the studied period are shown in Table 4. Maximum and minimum soil temperatures were observed at 5 cm depth (equal to 44.4 °C and −3.9 °C, respectively). From the surface to the deeper layers, minimum soil temperature increased and maximum soil temperature decreased. The standard deviation of the soil temperature is highest in the surface layer. This is most likely caused by the meteorological conditions, which have a stronger effect on the surface layer than on deeper soil layers.

Table 4 Statistical characteristics of daily soil temperature for Sanandaj during 1997–2008

Results of the GEP models

To develop the GEP models, various function sets were tested. A function set consisting of basic arithmetic operators $\left\{ { + , - , \times , \div } \right\}$, as well as algebraic and transcendental functions $\left\{ {Sinx,Cosx,A\tan x,x^{2} ,x^{3} ,\sqrt x ,\sqrt[3]{x},Lnx,e^{x} } \right\}$, showed the best performance of all tested function sets. The values of R ² and RMSE for the GEP in all twelve scenarios are presented in Table 5. The values of RMSE range from 2.027 °C at the depth of 10 cm in GEP5 to 4.812 °C at 100 cm depth (in GEP12). Moreover, R ² varies from 0.686 (100 cm) to 0.974 (10 cm) in GEP12 and GEP5, respectively. The performance of GEP1 is acceptable, especially for 10, 20, 30 and 100 cm depths. At 30 cm depth, the GEP1 was the best scenario (R ² = 0.941 and RMSE = 2.593 °C). Of all scenarios with three inputs (GEP2 to GEP5), the GEP2 (with T _min, T _max and RH predictors) showed the best performance at 20, 30, 50 and 100 cm depths. At 5 and 10 cm depths, the GEP4 and GEP5 were the best scenarios. Among the scenarios with four input variables (GEP6 to GEP8), the GEP7 yielded the best estimates of T _s at all depths except 30 cm. Therefore, with regard to the input variables in GEP6 to GEP8, it can be concluded that adding wind speed to T _min, T _max and R _a (the inputs of GEP5) performed better than adding RH or n. In the scenarios with five inputs (GEP9 to GEP11), different results were obtained. The GEP10 at the surface layer (5 cm depth), the GEP9 at 10, 20 and 50 cm depths and the GEP11 at 30 and 100 cm depths were the best scenarios to estimate T _s. In general, considering all six parameters as inputs (GEP12) does not necessarily increase the estimation accuracy of T _s.

Table 5 Statistical indices for the GEP models at different depths for 12 different scenarios (see Table 2)

According to the statistical indices in all scenarios which have been presented in Table 5, it can be concluded that the GEP10 at 5 cm depth (R ² = 0.966 and RMSE = 2.575 °C), the GEP5 at 10 cm depth (R ² = 0.974 and RMSE = 2.027 °C), the GEP1 at 30 cm depth (R ² = 0.941 and RMSE = 2.593 °C), the GEP7 at the depths of 20 cm (R ² = 0.959 and RMSE = 2.270 °C), 50 cm (R ² = 0.914 and RMSE = 3.022 °C) and 100 cm (R ² = 0.833 and RMSE = 3.304 °C) achieved the highest accuracy in estimating T _s.

Figures 5 and 6 show scatter and time series plots, respectively, comparing observed and estimated soil temperature in the testing stage (2005–2008) for the best scenario at each depth. The R ² values of the fitted lines in the scatter plots (Fig. 5) show that the estimates from GEP5 at 10 cm depth are closer to the observed soil temperature values than those of the other scenarios and depths. The temporal variations of predicted soil temperature and their comparison with observed data (Fig. 6) reveal that the observed and predicted values of soil temperature match closely. It can be concluded from Fig. 6 that on warmer days, the values of T _s have been underestimated by GEP. This is because the model is fitted to the whole data set, so at extreme points such as the warmest days its estimates fall below the observed data.

As mentioned before, the GEP returns algebraic equations between input and output variables. The equations of the best model at each depth generated by GEP are presented in Table 6. It is clear from the table that in some cases, not all predictors of the respective scenario (Table 2) appear in the equation. This can be explained by the small effect of the omitted predictors, i.e., the coefficients of some variables are negligible. Moreover, the constant coefficients in the algebraic equations (Table 6) are random values produced by GEP. As an example, the expression tree returned by GEP5 for 10 cm depth is shown in Fig. 7. This figure can be transformed into the equation given in Table 6. The equation is built from three components, Sub-ET 1, Sub-ET 2 and Sub-ET 3 (Fig. 7), each of which can be written as an algebraic expression. For example, Sub-ET 1 can be written as ${\text{Ln}}\left[ {(\sqrt {e^{{(T_{\hbox{min} } + T_{\hbox{max} } )}} } ) \times R_{\text{a}} } \right]$.
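The Sub-ET 1 expression quoted above can be simplified with the standard logarithm identities, which makes its dependence on the inputs easier to read. The simplification is not stated in the source and holds for R _a > 0:

$$\ln \left[ {\sqrt {e^{{(T_{\hbox{min} } + T_{\hbox{max} } )}} } \times R_{\text{a}} } \right] = \ln e^{{(T_{\hbox{min} } + T_{\hbox{max} } )/2}} + \ln R_{\text{a}} = \frac{{T_{\hbox{min} } + T_{\hbox{max} } }}{2} + \ln R_{\text{a}}$$

In other words, this sub-expression contributes the mean of the minimum and maximum air temperatures plus the logarithm of the extraterrestrial radiation.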

Table 6 Algebraic equations returned by the best GEP scenario for each depth

Results of the ANN models

The ANN in this study consists of one hidden layer and one output layer. The optimum number of nodes in the hidden layer was obtained by trial and error and was found to vary between 3 and 10 for the different depths. As for GEP, twelve different combinations of input variables were considered (Table 2). RMSE and R ² for the different scenarios and soil depths are presented in Table 7. The RMSE ranges from 1.902 °C at the depth of 10 cm in ANN7 to 3.816 °C at 100 cm depth (in ANN2). In general, ANN1, which uses only minimum and maximum air temperatures, does not give an accurate estimation of soil temperature. Of all three-input scenarios (ANN2 to ANN5), the ANN5 (with inputs of T _min, T _max and R _a) estimated T _s with the smallest error at all depths except 20 cm. At 20 cm depth, ANN2 showed better results than the other three-input scenarios. Of all four-input scenarios (ANN6 to ANN8), the ANN7 was the best at all soil depths. Similar results were obtained by the GEP model (except at 30 cm depth). Adding U ₂ to T _min, T _max and R _a improved the estimation accuracy of T _s more than adding RH or n. Similar to the GEP models, the scenarios with five inputs (ANN9 to ANN11) gave mixed results: ANN10 at 5 and 20 cm depths, ANN9 at 10 and 50 cm depths and ANN11 at 30 and 100 cm depths gave the best results of all five-input scenarios. In many cases, considering all six inputs in ANN12 increased the accuracy of the ANN models.

Table 7 Statistical indices for the ANN models at different depths for 12 different scenarios (see Table 2)

With regard to all the statistical indices in Table 7, the ANN10 at 5 cm depth (R ² = 0.980 and RMSE = 2.191 °C), the ANN7 at the depths of 10 cm (R ² = 0.980 and RMSE = 1.902 °C), 20 cm (R ² = 0.970 and RMSE = 2.093 °C) and 50 cm (R ² = 0.943 and RMSE = 2.663 °C), the ANN5 at 30 cm depth (R ² = 0.957 and RMSE = 2.405 °C) and the ANN12 at 100 cm depth (R ² = 0.909 and RMSE = 2.570 °C) were the best scenarios.

Figures 8 and 9 show scatter and time series plots comparing observed and estimated soil temperature for the best scenarios at various depths. It is clear from the given R ² of the fitted lines in the scatter plots (Fig. 8) that the performance of ANN7 is quite good with high correlation. A visual inspection of the estimated and observed soil temperature clearly demonstrates the potential of ANN modeling (Fig. 9).

Results of the MLR models

Besides the GEP and ANN methods, the MLR approach was used to estimate T _s. Statistical indices including RMSE and R ² for this method are given in Table 8. As seen, there are no considerable differences between the accuracy of the models at a specific depth, except at 100 cm depth. At this depth, increasing the number of predictors improves the performance of the scenarios more than in the upper layers. Also, for all depths, the MLR12 with the full set of inputs is the best model.

Table 8 Statistical indices for the MLR models at different depths for 12 different scenarios (see Table 2)

Comparison of the GEP, ANN and MLR models

Comparing Tables 5 and 7 shows that in the scenarios with the same predictors (e.g., comparing GEP1 and ANN1 at a specific depth), the accuracy of ANN is better than that of GEP. Similar to the results obtained in the present study, the superiority of the ANN over other models (e.g., multiple linear regression, multi-nonlinear regression, neuro-fuzzy and genetic programming) has been reported by Bilgili (2010), Tabari et al. (2011), Bilgili et al. (2013) and Kisi et al. (2016). Moreover, with increasing depth and with scenarios using more predictors, the accuracy of the ANN models improves relative to the GEP scenarios, especially at the depth of 100 cm (see Tables 5 and 7). The values of R ² and RMSE (Table 5) indicate that the accuracy of the GEP models does not necessarily increase with the number of input variables. For example, the GEP12 scenario considered the full predictor set; however, this scenario performed the worst for 100 cm depth. Similar behavior was observed for GEP10 at 30 and 50 cm depths, i.e., increasing the number of predictors in GEP does not necessarily enhance the models’ performance. For ANN, the situation is different. Here, scenarios with a larger number of input variables tend to perform better than scenarios with fewer input variables. At 100 cm depth, for instance, the ANN12 scenario worked the best. For all other depths, scenarios which use five input variables (i.e., ANN9 to ANN11) show high performance. The ANN is an intelligent model. Therefore, in the modeling process the ANN itself gives weight to the parameters which have a high correlation with soil temperature, whereas the GEP does not have this ability. However, if the predictors are chosen correctly, the solutions of the GEP can be satisfactory.

The scenarios GEP2 to GEP5 and ANN2 to ANN5 might be of interest for regions where only the respective subsets of meteorological predictors are available. In such regions, two models are recommended: GEP2, which uses T _min, T _max and RH and yields reasonable results, especially between 20 and 100 cm depth, and ANN5, which uses T _min, T _max and R _a.

The results of the GEP, ANN and MLR methods (Tables 5, 7 and 8) reveal that the MLR estimates are generally better than those of the GEP for the investigated depths and scenarios. The ANN, however, showed the best performance of the three methods.

The tendency of the models to over- or underestimate T _s was also investigated. On warmer days, underestimation was observed. On other days, no consistent trend was seen. Overall, however, all methods generally underestimated T _s.
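
One simple way to check such a tendency is to compute the mean bias (estimated minus observed) separately for warm days and for the remaining days. The sketch below is only an illustration: the function names, the toy values and the 30 °C threshold on T _max used to define a "warm" day are all assumptions, not taken from the study.

```python
import numpy as np


def mean_bias(observed, estimated):
    """Mean of (estimated - observed); negative means underestimation."""
    return float(np.mean(np.asarray(estimated) - np.asarray(observed)))


def bias_by_regime(observed, estimated, t_max, warm_threshold):
    """Mean bias for warm days (t_max >= threshold) and for all other days."""
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    warm = np.asarray(t_max, dtype=float) >= warm_threshold
    return {
        "warm": mean_bias(observed[warm], estimated[warm]),
        "other": mean_bias(observed[~warm], estimated[~warm]),
    }


# Toy values in degC, invented purely to exercise the functions.
obs = np.array([5.0, 12.0, 24.0, 30.0])
est = np.array([5.5, 11.5, 22.0, 27.0])
tmax = np.array([8.0, 15.0, 31.0, 36.0])

result = bias_by_regime(obs, est, tmax, warm_threshold=30.0)
print(result)
```

A clearly negative value for the warm subset together with a value near zero for the other days would correspond to the pattern described above.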

From a depth of 5 to 10 cm, the accuracy of all scenarios increased for all methods. The 10 cm depth had the highest accuracy in estimating T _s of all depths, due to the vicinity of the upper layers to the atmosphere. In general, from 20 to 100 cm depth, the error increased. Similar results were obtained by Tabari et al. (2011). This can be attributed to meteorological conditions having a stronger effect on soil temperature near the surface than at greater depths. The upper soil layers act as insulation between the lower atmosphere and the deeper soil layers. In other words, the deeper soil layers respond much more slowly to changes in atmospheric conditions than the soil layers near the surface.
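
The insulating effect can be illustrated with the classical solution of the one-dimensional heat conduction equation for a homogeneous soil under a sinusoidal surface temperature. In that solution, the amplitude of the temperature wave decays as exp(−z/d) and its phase lags by z/d radians, where d = sqrt(2α/ω) is the damping depth. The sketch below evaluates this for the daily cycle at the six depths of the study. It is an idealization added for illustration only: the thermal diffusivity α = 5 × 10⁻⁷ m² s⁻¹ is an assumed value, not a measurement from the station, and the function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86400.0


def damping_depth(diffusivity, period_s):
    """Damping depth d = sqrt(2 * alpha / omega), in metres."""
    omega = 2.0 * math.pi / period_s
    return math.sqrt(2.0 * diffusivity / omega)


def amplitude_ratio(depth_m, d):
    """Fraction of the surface amplitude that remains at depth_m."""
    return math.exp(-depth_m / d)


def phase_lag_hours(depth_m, d, period_s):
    """Delay of the temperature wave at depth_m relative to the surface."""
    omega = 2.0 * math.pi / period_s
    return (depth_m / d) / omega / 3600.0


ALPHA = 5e-7  # m^2/s, illustrative assumption (not from the study)
DEPTHS_M = [0.05, 0.10, 0.20, 0.30, 0.50, 1.00]

d_daily = damping_depth(ALPHA, SECONDS_PER_DAY)
ratios = [amplitude_ratio(z, d_daily) for z in DEPTHS_M]
lags = [phase_lag_hours(z, d_daily, SECONDS_PER_DAY) for z in DEPTHS_M]

for z, ratio, lag in zip(DEPTHS_M, ratios, lags):
    print(f"{z * 100:5.0f} cm  amplitude ratio = {ratio:.4f}  lag = {lag:5.1f} h")
```

Under these assumptions the daily temperature wave is almost completely damped at 100 cm, which is in line with the weaker link between daily meteorological inputs and T _s at that depth.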

Effect of meteorological parameters on soil temperature

To investigate the effects of the meteorological variables (minimum air temperature, maximum air temperature, extraterrestrial radiation, relative humidity, wind speed and sunshine hours) on soil temperature, the correlation coefficients between these parameters and soil temperature were calculated at all depths (Table 9). The highest correlation is observed between T _max and T _s, followed by that between T _min and T _s. Air temperatures (T _max and T _min) thus appear to strongly affect soil temperature. This is in accordance with the studies of Mihalakakou (2002), Tabari et al. (2011) and Kisi et al. (2015), who stated that soil temperature has the highest correlation with air temperature. Relative humidity has a strong negative correlation with T _s. This is plausible because solar radiation causes the evaporation of soil moisture; consequently, ambient relative humidity increases and the radiation energy does not increase T _s. After T _max, T _min and RH, the next parameters in order of correlation with T _s are R _a and n. The weakest correlation was observed between U ₂ and T _s, which means that wind speed has little effect on T _s in comparison with the other meteorological parameters. The correlation between the meteorological parameters and T _s decreases with depth. Only a small difference is observed among the correlations at 5–50 cm depths. Between 50 and 100 cm, however, the difference is somewhat larger than among the upper depths. This is due to the smaller influence of the meteorological parameters on T _s at 100 cm depth in comparison with the upper layers.

Table 9 Pearson’s correlation coefficients between meteorological parameters and soil temperature
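
A table of this kind can be assembled by computing Pearson's correlation coefficient between each meteorological series and the soil temperature series at each depth. The following is a minimal sketch on synthetic data: the series, their coefficients and the names `pearson_r` and `correlation_table` are assumptions made for illustration and do not reproduce the values of Table 9.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))


def correlation_table(predictors, soil_temps):
    """Nested dict: predictor name -> depth (cm) -> correlation with T_s."""
    return {
        name: {depth: pearson_r(values, ts) for depth, ts in soil_temps.items()}
        for name, values in predictors.items()
    }


# Synthetic stand-in series; relationships are invented for illustration.
rng = np.random.default_rng(1)
n = 500
t_max = rng.normal(20.0, 10.0, n)
rh = 70.0 - 1.5 * t_max + rng.normal(0.0, 8.0, n)
u2 = rng.uniform(0.0, 6.0, n)

predictors = {"T_max": t_max, "RH": rh, "U2": u2}
soil_temps = {
    5: 3.0 + 0.9 * t_max + rng.normal(0.0, 2.0, n),
    100: 10.0 + 0.5 * t_max + rng.normal(0.0, 4.0, n),
}

table = correlation_table(predictors, soil_temps)
for name, row in table.items():
    print(name, {depth: round(r, 3) for depth, r in row.items()})
```

In this synthetic example the signs and ordering are built in by construction; with station data, the same two functions would simply be applied to the measured series.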

Conclusion

In recent years, artificial intelligence models have been increasingly used to predict meteorological and environmental parameters such as soil temperature (T _s). In this study, gene expression programming (GEP), artificial neural networks (ANN) and multiple linear regression (MLR) were used to estimate T _s at six different depths. The results indicated that all three models are able to estimate soil temperature; however, the ANN performed better than the GEP and MLR. For all methods, the highest and lowest accuracies were observed at 10 and 100 cm depths, respectively. Accuracy increased from 5 to 10 cm depth and decreased with depth between 20 and 100 cm. The results showed that GEP5 (R ² = 0.974 and RMSE = 2.027 °C), ANN7 (R ² = 0.980 and RMSE = 1.902 °C) and MLR12 (R ² = 0.971 and RMSE = 2.088 °C) at 10 cm depth outperformed all other investigated scenarios and depths of their respective methods. Moreover, the correlation coefficients between soil temperature and the meteorological parameters decreased with increasing depth. As the surface layers of the soil act as insulation, the effect of the meteorological parameters on T _s is less apparent in the deeper layers. Of all atmospheric parameters considered, the maximum and minimum air temperatures showed the highest correlation with soil temperature. Furthermore, on warmer days the GEP and ANN models underestimated T _s, whereas on other days no consistent trend was observed.

At the studied station (as at other synoptic stations in Iran), T _s was not measured at night, which can introduce errors into the estimation of T _s. In the present study, T _min, T _max, R _a, RH, U ₂ and n were used to estimate T _s. Future work may consider the effects of other variables, such as precipitation, actual solar radiation and soil moisture, on T _s.

Abbreviations

T _min :: Daily minimum air temperature (°C)
T _max :: Daily maximum air temperature (°C)
RH :: Daily relative humidity (%)
U ₂ :: Daily wind speed at 2 m height (m s⁻¹)
n :: Daily sunshine hours (h)
R _a :: Daily extraterrestrial radiation (MJ m⁻² day⁻¹)
T _s :: Daily soil temperature (°C)
w :: Weight of each neuron
x :: Input to each neuron
s :: Sum of the inputs multiplied by their corresponding weights
f :: Transfer function
E :: Error function of network in the back-propagation algorithm
η :: Learning constant in the back-propagation algorithm
y _j :: The output value of jth neuron
R ² :: Determination coefficient
RMSE :: Root mean square error (°C)
P _i :: ith estimated T _s (°C)
O _i :: ith observed T _s (°C)
P _av :: Average of the estimated T _s values (°C)
O _av :: Average of the observed T _s values (°C)
N :: Number of observations
Y :: Dependent variable in MLR (i.e., T _s)
X ₁, …, X _n :: Independent variables in MLR
a _0, …, a _n :: Constant coefficients of MLR approach

References

Aytek A, Asce M, Alp M (2008) An application of artificial intelligence for rainfall runoff modeling. J Earth Syst Sci 117:145–155
Bilgili M (2010) Prediction of soil temperature using regression and artificial neural network models. Meteorol Atmos Phys 110:59–70
Bilgili M, Sahin B, Sangun L (2013) Estimating soil temperature using neighboring station data via multi-nonlinear regression and artificial neural network models. Environ Monit Assess 185:347–358
Chaturvedi DK (2008) Soft computing: techniques and its applications in electrical engineering. Springer, Heidelberg
de Martonne E (1925) Traité de Géographie Physique, 3 tomes. Librairie Armand Colin, Paris
Droulia F, Lykoudis S, Tsiros I, Alvertos N, Akylas E, Garofalakis I (2009) Ground temperature estimations using simplified analytical and semi-empirical approaches. Sol Energy 83:211–219
Ferreira C (2001) Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst 13:87–129
Garcia-Suarez AM, Butler CJ (2006) Soil temperatures at Armagh observatory, Northern Ireland, from 1904 to 2002. Int J Climatol 26:1075–1089
George RK (2001) Prediction of soil temperature by using artificial neural networks algorithms. Nonlinear Anal 47:1737–1748
Ghuman BS, Lal R (1982) Temperature regime of a tropical soil in relation to surface condition and air temperature and its Fourier analysis. Soil Sci 134:133–140
Guo Z, Wu Z, Dong X, Zhang K, Wang S, Li Y (2014) Component thermodynamical selection based gene expression programming for function finding. Math Probl Eng 16:6263–6285
Guven A (2009) Linear genetic programming for time-series modeling of daily flow rate. J Earth Syst Sci 118:137–146
Hanks RJ, Austin DD, Ondrechen WT (1971) Soil temperature estimation by a numerical method. Soil Sci Soc Am J 35:665–667
Haykin S (1998) Neural networks–a comprehensive foundation, 2nd edn. Prentice-Hall, Upper Saddle River, pp 26–32
Hosseinzadeh Talaee P (2014) Daily soil temperature modeling using neuro-fuzzy approach. Theor Appl Climatol 118:481–489
Huang Y, Lan Y, Thomson SJ, Fang A, Hoffmann WC, Lacey RE (2010) Development of soft computing and applications in agricultural and biological engineering. Comput Electron Agric 71:107–127
Jackson T, Mansfield K, Saafi M, Colman T, Romine P (2008) Measuring soil temperature and moisture using wireless MEMS sensors. Meas 41:381–390
Jungqvist G, Oni SK, Teutschbein C, Futter MN (2014) Effect of climate change on soil temperature in Swedish boreal forests. PLoS one 9(4):e93957. doi:10.1371/journal.pone.0093957
Kang S, Kim S, Oh S, Lee D (2000) Predicting spatial and temporal patterns of soil temperature based on topography, surface cover and air temperature. For Ecol Manag 136:173–184
Kim S, Singh VP (2014) Modeling daily soil temperature using data-driven models and spatial distribution. Theor Appl Climatol 118:465–479
Kisi O, Tombul M, Zounemat Kermani M (2015) Modeling soil temperatures at different depths by using three different neural computing techniques. Theor Appl Climatol 121:377–387
Kisi O, Sanikhani H, Cobaner M (2016) Soil temperature modeling at different depths using neuro-fuzzy, neural network, and genetic programming techniques. Theor Appl Climatol. doi:10.1007/s00704-016-1810-1
Kocak K, Saylan L, Eitzinger J (2004) Nonlinear prediction of near-surface temperature via univariate and multivariate time series embedding. Ecol Model 173:1–7
Kumar M, Raghuwanshi NS, Singh R (2011) Artificial neural networks approach in evapotranspiration modeling: a review. Irrig Sci 29:11–25
Landeras G, Lopez JJ, Kisi O, Shiri J (2012) Comparison of Gene Expression Programming with neuro-fuzzy and neural network computing techniques in estimating daily incoming solar radiation in the Basque Country (Northern Spain). Energy Convers Manag 62:1–13
Mehdizadeh S, Behmanesh J, Khalili K (2016) Comparison of artificial intelligence methods and empirical equations to estimate daily solar radiation. J Atmos Sol-Terr Phys 146:215–227
Mihalakakou G (2002) On estimating soil surface temperature profiles. Energy Build 34:251–259
Napagoda NADN, Tilakaratne CD (2012) Artificial neural network approach for modeling of soil temperature: a case study for Bathalagoda area. Sri Lankan J Appl Stat 13:39–59
Paul KI, Polglase PJ, Smethurst PJ, O’Connell AM, Carlyle CJ, Khanna PK (2004) Soil temperature under forests: a simple model for predicting soil temperature under a range of forest types. Agric For Meteorol 121:167–182
Prangnell J, McGowan G (2009) Soil temperature calculation for burial site analysis. Forensic Sci Int 191:104–109
Rojas R (1996) Neural networks. Springer, Heidelberg
Shaker F, Monadjemi AH, Yazdanpanah H (2014) Comparing artificial neural networks and linear regression model in predicting soil surface temperature. Int J Sci Knowl 5:12–19
Shiri J, Kisi O, Yoon H, Lee KK, Nazemi AH (2013) Predicting groundwater level fluctuations with meteorological effect implications—a comparative study among soft computing techniques. Comput Geosci 56:32–44
Shiri J, Sadraddini AA, Nazemi AH, Kisi O, Landeras G, Fakheri Fard A, Marti P (2014a) Generalizability of gene expression programming-based approaches for estimating daily reference evapotranspiration in coastal stations of Iran. J Hydrol 508:1–11
Shiri J, Kim S, Kisi O (2014b) Estimation of daily dew point temperature using genetic programming and neural networks approaches. Hydrol Res 45:165–181
Tabari H, Sabziparvar AA, Ahmadi M (2011) Comparison of artificial neural network and multivariate linear regression methods for estimation of daily soil temperature in an arid region. Meteorol Atmos Phys 110:135–142
Tabari H, Hosseinzadeh Talaee P, Willems P (2014) Short-term forecasting of soil temperature using artificial neural network. Meteorol Appl 22:576–585
Traore S, Guven A (2012) Regional-specific numerical models of evapotranspiration using gene-expression programming interface in Sahel. Water Resour Manag 26:4367–4380
Usowicz B, Walczak R (1994) Soil temperature prediction by numerical model. Pol J Soil Sci 28:87–94
Yilmaz T, Ozbek A, Yılmaz A, Buyukalaca O (2009) Influence of upper layer properties on the ground temperature distribution. J Therm Sci Technol 29:43–51
Zadeh LA (1992) Fuzzy logic, neural networks and soft computing. One-page course announcement of CS 294–4. University of California, Berkeley
Zorn CR, Shamseldin AY (2015) Peak flood estimation using gene expression programming. J Hydrol 531:1122–1128

Author information

Authors and Affiliations

Water Engineering Department, Urmia University, Urmia, Iran
Javad Behmanesh & Saeid Mehdizadeh

Corresponding author

Correspondence to Javad Behmanesh.

About this article

Cite this article

Behmanesh, J., Mehdizadeh, S. Estimation of soil temperature using gene expression programming and artificial neural networks in a semiarid region. Environ Earth Sci 76, 76 (2017). https://doi.org/10.1007/s12665-017-6395-1

Received: 18 October 2015
Accepted: 03 January 2017
Published: 11 January 2017
DOI: https://doi.org/10.1007/s12665-017-6395-1
