Comparison of artificial neural network and multivariate linear regression methods for estimation of daily soil temperature in an arid region

Tabari, Hossein; Sabziparvar, Ali-Akbar; Ahmadi, Mohammad

doi:10.1007/s00703-010-0110-z

Original Paper
Published: 09 December 2010

Volume 110, pages 135–142, (2011)

Meteorology and Atmospheric Physics

Hossein Tabari¹,
Ali-Akbar Sabziparvar¹ &
Mohammad Ahmadi²

Abstract

Soil temperature (T _S) strongly influences a wide range of biotic and abiotic processes. As an alternative to direct measurement, indirect determination of T _S from meteorological parameters has attracted the attention of environmental researchers. The main purpose of this study was to estimate daily T _S at six depths (5, 10, 20, 30, 50 and 100 cm) by using a multilayer perceptron (MLP) artificial neural network (ANN) model and a multivariate linear regression (MLR) method in an arid region of Iran. Mean daily meteorological parameters including air temperature (T _a), solar radiation (R _S), relative humidity (RH) and precipitation (P) were used as input data to the ANN and MLR models. The results of the MLR model were compared with those of the ANN. The accuracy of the predictions was evaluated by the correlation coefficient (r), the root mean-square error (RMSE) and the mean absolute error (MAE) between the measured and predicted T _S values. The results showed that the ANN estimates were superior to the corresponding values obtained by the MLR model. The regression analysis indicated that T _a, RH, R _S and P were reasonably correlated with T _S at various depths, but the most effective parameters influencing T _S at different depths were T _a and RH.

1 Introduction

Soil temperature (T _S) is an important parameter in different areas of research such as hydrology, soil science, geotechnology, ecology, meteorology, agronomy and environmental studies (Jackson et al. 2008). For example, the mineralization of plant nutrients, such as nitrogen, along with the consequent liberation of carbon dioxide, is strongly temperature dependent (Seyfried et al. 2001).

The temperature regimes of the soil surface have two cyclical periods, namely diurnal and annual cycles. The variations of soil temperature resulting from daytime heating and nighttime cooling are known as diurnal variations. In the morning before sunrise, the soil temperature is lowest at the surface and increases with depth. Similarly, the temperature continues to rise in the lower layers even after the top layer starts cooling down. However, the amplitude of the diurnal wave decreases with soil depth. The annual variations in T _S result from the variations in short-wave radiation throughout the year. Away from the equator (i.e., at higher latitudes), the annual variations in T _S become significant. The summer months of June and July in the Northern Hemisphere represent the peak of global radiation and temperatures, whereas the winter months have effects similar to nocturnal daily temperatures (Lal and Shukla 2004).

Measurements of T _S at the surface and at various depths are spatially and temporally limited. Soil temperature is influenced by a number of meteorological factors (e.g., solar radiation and air temperature), site topography, soil water content, soil texture and the area of surface covered by litter and canopies of plants. In locations where T _S measurements are sparse, theoretical estimates can be useful to predict it from other existing data (Mihalakakou 2002; Paul et al. 2004).

So far, few studies have been conducted for estimation of T _S in different climate conditions (e.g., Lin 1980; Zheng et al. 1993; Hu and Islam 1995; Bond-Lamberty et al. 2005; Best et al. 2005; Holmes et al. 2008). George (2001) examined the potential of artificial neural networks (ANN) in estimating soil temperature. The results indicated that the T _S values estimated by ANN were in general in good agreement with the observed values. Plauborg (2002) developed simple empirical models for estimation of T _S at 10-cm depth in grass-covered soils. The root mean-square error of the model was 0.97°C, and 98% of the variation in the data was explained. Mihalakakou (2002) used a deterministic model and an ANN approach for estimating daily and annual surface T _S. That study found that the intelligent technique was able to adequately estimate surface soil temperature. Houle et al. (2002) evaluated the FORHYM2 model for prediction of soil temperature. The results confirmed the ability of the FORHYM2 model to simulate soil temperature. Gao et al. (2007) applied two T _S rate equations for estimating soil temperature. They indicated that the predicted T _S values were in satisfactory agreement with direct measurements.

Some currently available models for estimating T _S are based on soil heat flux and energy balance, but need numerous sets of input data such as wind speed (U) and cloudiness. So far, several authors have compared ANN and multivariate regression methods in estimating meteorological and hydrological parameters (e.g., Tabari et al. 2010a, 2010b). In this paper, an attempt has been made to estimate daily T _S at six depths (5, 10, 20, 30, 50 and 100 cm) by using a multilayer perceptron (MLP) artificial neural network model and a multivariate linear regression (MLR) method in an arid region. It is assumed that the T _S is affected by air temperature (T _a), solar radiation (R _S), relative humidity (RH) and precipitation (P).

2 Materials and methods

2.1 Study area and data

The study area was the Isfahan Province, which is located in the central part of Iran, at 49°36′ E to 55°31′ E longitude and 30°42′ N to 34°27′ N latitude, covering arable land area of 107,027 km² (Fig. 1). The climate of Isfahan is classified as arid, experiencing warm weather in summer and cold in winter (Sabziparvar 2009). The largest and the most arid desert land (Lut) of Iran, which lies on the eastern parts of Isfahan, affects the climate condition of this province. Moving toward the southern parts of this province, the amount of rainfall and humidity gradually increases and the temperature reduces. The average air temperature in this province is 19.3°C in spring, 27.2°C in summer, 12.4°C in autumn and 5.7°C in winter. The total annual number of freezing days of the province is 76 days and the average annual precipitation is about 120 mm.

Mean daily meteorological data, including T _a (°C), RH (%), R _S (MJ m⁻² day⁻¹), P (mm) and T _S (°C) for bare soil, were collected from 1996 to 2005 (10 years) at five synoptic stations. The information about the stations is presented in Tables 1 and 2. Soil temperature data were measured by using standard WMO-approved earth thermometers installed at depths of 5, 10, 20, 30, 50 and 100 cm (IRIMO 2007). Monthly means of daily precipitation, air temperature, relative humidity and solar radiation are illustrated in Fig. 2. The maximum (21.7 mm) and minimum (0.1 mm) monthly precipitation are normally observed in March and September, respectively. The maximum and minimum T _a occurred in July (28.9°C) and January (3.4°C), respectively. The highest RH of 60% occurred in December and January, while the lowest RH of 25% was observed in June and July. The maximum and minimum R _S occurred in July (27.73 MJ m⁻² day⁻¹) and December (11.49 MJ m⁻² day⁻¹), respectively. Annual and monthly means of daily soil temperature averaged over the study period (1996–2005) are shown in Fig. 3. The minimum monthly T _S was recorded in January for all depths. With the exception of T _S at a depth of 100 cm, the maximum monthly T _S was observed in July, which coincided with the maximum monthly air temperature. At a depth of 100 cm, the maximum monthly T _S was observed in August.

Table 1 Geographic characteristics of the synoptic stations used in the study

Table 2 Annual means (1996–2005) of daily meteorological parameters averaged over 10 years

2.2 Determination of T _S regime

Soil temperatures continuously vary in response to the changing meteorological regimes acting upon the earth–atmosphere interface. The meteorological regimes are characterized by periodic succession of days and nights and winters and summers. Based upon the mean annual T _S at a depth of 50 cm, soil temperature regimes are classified into six categories, namely pergelic, cryic, frigid, mesic, thermic and hyperthermic (Table 3) (Lal and Shukla 2004).

Table 3 Classes of soil temperature regimes

2.3 Artificial neural network

Artificial neural networks were originally designed for the modeling of the performance of a biological neural system. The internal architecture of an ANN is similar to the structure of a biological brain with a number of layers of fully interconnected nodes or neurons. The most common architecture is composed of: the input layer, where the data are introduced into the ANN, the hidden layer(s) where the data are processed and the output layer where the results of the given inputs are obtained. This type of ANN is called multilayer perceptron (MLP) (Landeras et al. 2008). This study evaluates the utility of MLP neural networks for estimating T _S at different depths. Figure 4 provides an overview of the structure of this network.

In this study, the best learning algorithm, activation function and architecture of the network were determined by trial and error. Although special learning parameters (e.g., momentum factor, learning rate, etc.) can help to avoid local minima, no guarantee of finding the global minimum can be given. The probability of finding the global minimum was enhanced by selecting various random start positions.

2.4 Learning algorithm

To serve as a computational model, an ANN must be able to learn, that is, to determine the values of the weights that represent the interconnections of the neurons or nodes in the different layers making up the network. In this study, supervised learning is used to train the network. In this method, a group of data called the training set is used to help the network determine which values are appropriate for its weights. Each example is made up of an input signal with its corresponding correct answer or target. The learning process consists of modifying the synaptic weights, which are initialized randomly at the beginning of training, to minimize the difference between the desired responses and those actually produced by the network. The training of the network is carried out for a considerable number of patterns until the network reaches the point where the weights no longer undergo significant changes (Torres et al. 2005). Each pass of the whole data set through the network (both a forward and a backward pass) is called an epoch. The network was trained in this way and the error was reduced with every epoch until a reasonable level of error was obtained. Six learning algorithms (i.e., Levenberg–Marquardt, Delta-Bar-Delta, Step, Momentum, Conjugate Gradient and Quickprop) were tested to identify the one that trains a given network most efficiently. In this study, the training of the ANN was carried out using the NeuroSolutions software developed by NeuroDimension Inc. of Florida (NeuroDimension 2005).
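The epoch-based supervised training described above can be illustrated with a minimal sketch. It is not the NeuroSolutions implementation and does not use Levenberg–Marquardt: plain batch gradient descent is swapped in for brevity. The function name `train_mlp`, the learning rate, the weight initialization and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=5, epochs=1000, lr=0.1, seed=0):
    """Train a one-hidden-layer MLP with batch gradient descent (assumed optimizer).

    Returns the weights and the mean squared error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    # Weights start at random values, as in the supervised scheme described above.
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    history = []
    for _ in range(epochs):
        # Forward pass: sigmoid hidden layer, linear output (assumed).
        H = sigmoid(X @ W1 + b1)
        pred = H @ W2 + b2
        err = pred - y
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of 0.5 * mean(err^2).
        g_pred = err / n
        g_W2 = H.T @ g_pred
        g_b2 = g_pred.sum()
        g_Z = np.outer(g_pred, W2) * H * (1.0 - H)
        g_W1 = X.T @ g_Z
        g_b1 = g_Z.sum(axis=0)
        # One weight update per epoch (one pass over the whole training set).
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return (W1, b1, W2, b2), history


# Synthetic stand-in for four standardized inputs and one target (not the study data).
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(200, 4))
y_demo = np.tanh(X_demo[:, 0]) + 0.1 * X_demo[:, 1]
weights_demo, history_demo = train_mlp(X_demo, y_demo)
```

Comparing `history_demo[0]` with `history_demo[-1]` shows how the training error changes over the epochs; repeating the call with different `seed` values gives different random start positions.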

2.5 Activation function

The activation function is the formula used to determine the output of a processing neuron. In this study, three different functions (i.e., Sigmoid, Tanh and Linear) were tested to identify the one which showed the best results in depicting the non-linearity of the system following the trial and error approach. The main selection criterion here was to increase the neural network accuracy. The Linear function implements a linear axon with slope and offset control. The Sigmoid function applies a scaled and biased sigmoid function to each neuron in the layer. The scaling factor and bias are inherited from the Linear function. The range of values for each neuron in the layer is between zero and one. The Tanh function applies a bias and Tanh function to each neuron in the layer. This will squash the range of each neuron in the layer to between (−1) and (1) (NeuroSolutions Manual 2003).
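As a rough illustration of the three candidate functions, the sketch below uses their standard mathematical forms. The parameter names `slope` and `offset` are assumptions chosen to mirror the description above and are not taken from the NeuroSolutions API.

```python
import numpy as np


def linear(x, slope=1.0, offset=0.0):
    """Linear axon: slope * x + offset, with unbounded output."""
    return slope * np.asarray(x, dtype=float) + offset


def sigmoid(x, slope=1.0, offset=0.0):
    """Logistic sigmoid of the scaled and biased input; output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-linear(x, slope, offset)))


def tanh(x, offset=0.0):
    """Hyperbolic tangent of the biased input; output lies in (-1, 1)."""
    return np.tanh(np.asarray(x, dtype=float) + offset)


x_demo = np.array([-5.0, 0.0, 5.0])
sigmoid_demo = sigmoid(x_demo)
tanh_demo = tanh(x_demo)
```

The bounded ranges of `sigmoid` and `tanh` are what allow a hidden layer to represent non-linear relationships; a purely linear network collapses to a linear regression.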

2.6 Multivariate linear regression

One of the classical problems in statistical analysis is to find a suitable relationship between a response variable and a set of regressor variables. Multivariate linear regression is a method used to model the linear relationship between a dependent variable and one or more independent variables. MLR is based on least squares: the model is fitted such that the sum of squared differences between observed and predicted values is minimized.
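A least-squares fit of this kind can be sketched as follows. The helper names `fit_mlr` and `predict_mlr`, the coefficient values and the synthetic data are illustrative assumptions; the paper does not report the software used for the regression.

```python
import numpy as np


def fit_mlr(X, y):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted linear model on new predictor rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic, noise-free stand-in for four predictors (hypothetical coefficients).
rng_demo = np.random.default_rng(0)
X_demo = rng_demo.normal(size=(200, 4))
true_coef = np.array([15.0, 0.8, 0.3, -0.1, -0.05])
y_demo = true_coef[0] + X_demo @ true_coef[1:]
coef_demo = fit_mlr(X_demo, y_demo)
```

Because the demo target is an exact linear function of the predictors, `coef_demo` recovers `true_coef` up to floating-point error; with real observations the residuals would be non-zero.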

2.7 Statistical criteria for model validation

The performances of the models developed in this study were assessed using various standard statistical performance evaluation criteria. Several performance criteria including correlation coefficient (r), root mean-square error (RMSE) and mean absolute error (MAE) were used.
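The three criteria can be computed from paired observed and predicted values using their standard definitions, as in the sketch below. The function names and the four-value demo series are assumptions for illustration and are not data from the study.

```python
import numpy as np


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    d_obs = obs - obs.mean()
    d_pred = pred - pred.mean()
    return float(np.sum(d_obs * d_pred) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_pred ** 2)))


def rmse(obs, pred):
    """Root mean-square error: square root of the mean squared difference."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def mae(obs, pred):
    """Mean absolute error: mean of the absolute differences."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return float(np.mean(np.abs(pred - obs)))


# Hypothetical soil temperatures in degrees Celsius.
obs_demo = [10.0, 12.0, 14.0, 16.0]
pred_demo = [11.0, 12.0, 13.0, 18.0]
r_demo = pearson_r(obs_demo, pred_demo)
rmse_demo = rmse(obs_demo, pred_demo)
mae_demo = mae(obs_demo, pred_demo)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether the errors are evenly spread or dominated by a few large misses.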

3 Results and discussion

3.1 Statistical analysis

The mean annual T _S of the study area was 20.3°C at a depth of 50 cm. Also, the mean T _S values in the summer and winter seasons were 26.7 and 13.8°C at a depth of 50 cm, respectively. Therefore, the T _S regime of the study area (according to Table 3) is thermic. Statistical analysis of the soil temperature data showed that daily T _S at a depth of 5 cm had the highest coefficient of variation (CV), 56%, whereas the lowest CV (31%) was observed at a depth of 100 cm. In general, the coefficients of variation decreased with increasing soil depth. These results are consistent with the study of Gao et al. (2007), who showed that the variation of T _S decreased with increasing soil depth.
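For reference, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (`ddof=1`); the paper does not state which form was used, and the demo values are hypothetical.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    values = np.asarray(values, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not stated in the source.
    return float(100.0 * values.std(ddof=1) / values.mean())


# Hypothetical temperature series in degrees Celsius.
cv_demo = coefficient_of_variation([10.0, 20.0, 30.0])
```

The ratio is only meaningful when the mean is well away from zero, which holds for the annual mean soil temperatures reported here.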

3.2 Results of ANN

To use and test the ANN models, the spatial averages of the daily meteorological data for the area were divided into a training set (in this study, 60% of the whole data), a cross-validation data set (15%), and a test set (25%). The training set was used to fit the ANN model weights (for a number of different network configurations and training phases). The cross-validation set was used to select the model variant that provided the best level of generalization, and the test set was used to evaluate the chosen model against unused data.
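A 60/15/25 partition of this kind might be produced as sketched below. The paper does not say whether the records were split randomly or chronologically, so the random shuffle, the function name `split_indices` and the sample count of 1000 are assumptions.

```python
import numpy as np


def split_indices(n_samples, train=0.60, cv=0.15, seed=0):
    """Shuffle sample indices and cut them into training, cross-validation and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train * n_samples))
    n_cv = int(round(cv * n_samples))
    # Whatever remains after the first two cuts forms the test set.
    return order[:n_train], order[n_train:n_train + n_cv], order[n_train + n_cv:]


# Hypothetical number of daily records.
train_idx, cv_idx, test_idx = split_indices(1000)
```

Keeping the test indices untouched until the final evaluation is what makes the test-phase statistics an estimate of performance on unseen data.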

In this study, hundreds of different topologies were tested. Defining the topology in this way takes a considerable amount of time, and it nevertheless remains quite likely that an untested combination would generalize better or converge faster than the one selected. The statistical performance evaluation criteria of the ANN for all depths in the training and testing phases are presented in Table 4. Although tests were repeated using one, two and three hidden layers, a single hidden layer with five neurons was the best architecture. The simulations showed that increasing the number of hidden layers and the number of neurons in the hidden layers produced no significant improvement in the predicted soil temperatures. For the best selected architecture, the Levenberg–Marquardt learning algorithm and the Sigmoid activation function showed the highest correlation coefficients and minimum errors. Different numbers of epochs were also tested to verify whether increasing or decreasing this value could lower the error of the T _S estimates. For setting the number of epochs, the common practice is to start training with the default of 1,000. The results showed that increasing or decreasing this value brought no significant improvement to the T _S forecast, and the default was the optimum number for the best topology. Comparison of the T _S estimated by ANN with the observed values in the testing phase showed good agreement (Fig. 5). As shown in Table 4, the error values increased with increasing soil depth. In general, the study confirmed the capabilities of ANN as an effective tool for predicting daily soil temperature in arid regions. These results are consistent with those reported by George (2001). Mihalakakou (2002) also found that ANN was able to estimate surface T _S with a reasonable degree of accuracy.

Table 4 Statistical performance evaluation criteria of ANN for all depths in training and testing phases

3.3 Results of MLR

The results of the statistical evaluation criteria of the MLR for estimation of daily T _S at different depths are presented in Table 5. T _S was defined as the dependent variable, and the daily areal averages of T _a, RH, R _S and P were considered as the independent variables. The T _S data estimated by the MLR models were in reasonable agreement with the measured values. As shown in Table 5, the correlation coefficient increased with decreasing soil depth. Also, RMSE and MAE values increased with increasing soil depth, resulting from a decreasing effect of the meteorological parameters on T _S. The comparison between the T _S predicted by the MLR and the observed values is presented in Fig. 6. A comparison of the model performances between the ANN and MLR models indicated that the ANN was more suitable for estimating T _S in the study area.

Table 5 Statistical performance evaluation criteria of MLR for all depths

3.4 Determination of the most effective parameters affecting soil temperature

Correlations (Pearson) between daily meteorological parameters and T _S at various depths are given in Table 6. As shown, there were significant positive correlations between soil temperature and T _a and R _S, whereas negative correlations were observed between T _S and RH and P. It means that T _S values decrease with increasing relative humidity and precipitation, whereas T _S increases with increasing T _a and R _S. Air temperature at 2 m height correlates well with T _S, because both are determined by the energy balance at the ground surface. These results are consistent with the study of Zheng et al. (1993) who showed that there are strong relationships between air temperature and soil temperature. Mihalakakou (2002) also found that air temperature was a more effective parameter influencing soil surface temperature than relative humidity and solar radiation. The results of the present study indicated that the effect of the meteorological parameters on T _S reduced with increasing soil depth. Due to decreasing atmospheric effects on T _S in deeper soil layers, significant differences in correlation coefficients between depths of 50 and 100 cm were observed for all meteorological parameters. The most effective parameters influencing T _S at different depths were T _a and RH, respectively.

Table 6 Correlations (Pearson) between meteorological parameters and T _S at various depths

4 Conclusions

The main aim of this paper was to estimate daily T _S at different depths from 5 to 100 cm using meteorological parameters in an arid region. To this end, an MLP neural network model and an MLR method were applied and tested. The results obtained by the ANN indicated that the 4–5–1 architecture produced the best results and that the Levenberg–Marquardt learning algorithm and the Sigmoid activation function were the most appropriate choices for the estimations. The best topologies of the ANN were the same for all depths, but the errors increased with soil depth.

An increase in the number of hidden layers and the number of neurons in the hidden layer produced no significant improvement in the T _S forecast. Also, alternative values to the default number of epochs did not lower the error rate, and the default of 1,000 was the optimum number for the best topology.

The model results indicated that the MLR was also able to predict T _S at a desirable level of accuracy. A comparison of the model performances showed that the ANN was more suitable than MLR for estimation of daily T _S at different depths in the selected arid study site. In general, the results of this case study revealed that ANN can be a useful tool in the estimation of daily T _S if other routine weather data are available.

The results of regression analysis showed that T _a, RH, R _S and P significantly correlated with T _S at various depths, but the most effective parameters influencing T _S at different depths were T _a and RH, respectively.

This study examined the MLP type of ANN for estimation of daily T _S. Testing other types of ANN such as radial basis function (RBF) networks is suggested. This work was done under bare soil conditions. Future studies may include conducting experiments on non-bare (e.g., grass-covered) soils and in other climate types. The results of this study are applicable to arid climate regions where soil temperature data are lacking or scarce, provided that the mentioned meteorological parameters (R _S, T _a, RH, P) are available.

References

Best MJ, Cox PM, Warrilow D (2005) Determining the optimal soil temperature scheme for atmospheric modelling applications. Bound-Lay Meteorol 114:111–142
Bond-Lamberty B, Wang C, Gower ST (2005) Spatiotemporal measurement and modeling of stand-level boreal forest soil temperatures. Agr Forest Meteorol 131:27–40
Gao Z, Bian L, Hu Y, Wang L, Fan J (2007) Determination of soil temperature in an arid region. J Arid Environ 71:157–168
George RK (2001) Prediction of soil temperature by using artificial neural networks algorithms. Nonlinear Anal 47:1737–1748
Holmes TRH, Owe M, De Jeu RAM, Kooi H (2008) Estimating the soil temperature profile from a single depth observation: a simple empirical heat flow solution. Water Resour Res 44:W02412. doi:10.1029/2007WR005994
Houle D, Duchesne L, Ouimet R, Paquin R, Meng FR, Arp PA (2002) Evaluation of FORHYM2 model for prediction of hydrologic fluxes and soil temperature at the Lake Clair Watershed (Duchesnay, Quebec). Forest Ecol Manag 159:249–260
Hu Z, Islam S (1995) Prediction of ground surface temperature and soil moisture content by the force-restore method. Water Resour Res 31(10):2531–2539
Jackson T, Mansfield K, Saafi M, Colman T, Romine P (2008) Measuring soil temperature and moisture using wireless MEMS sensors. Measurement 41:381–390
Lal R, Shukla MK (2004) Principles of soil physics. Marcel Dekker, Inc. New York, p 682
Landeras G, Ortiz-Barredo A, López JJ (2008) Comparison of artificial neural network models and empirical and semi-empirical equations for daily reference evapotranspiration estimation in the Basque Country (Northern Spain). Agr Water Manag 95:553–565
Lin JD (1980) On the force-restore method for prediction of ground surface temperature. J Geophys Res 85:3251–3254
Mihalakakou G (2002) On estimating soil surface temperature profiles. Energ Buildings 34:251–259
IRIMO (2007) Iranian meteorological office. Data Processing Center, Tehran
NeuroDimension, Inc. (2005) Developers of NeuroSolutions v5.01: Neural Network Simulator. Gainesville, FL. http://www.nd.com
NeuroSolutions (2003) The neural network simulation environment. NeuroDimension Inc., FL
Paul KI, Polglase PJ, Smethurst PJ, O’Connell AM, Carlyle CJ, Khanna PK (2004) Soil temperature under forests: a simple model for predicting soil temperature under a range of forest types. Agr Forest Meteorol 121:167–182
Plauborg F (2002) Simple model for 10 cm soil temperature in different soils with short grass. Eur J Agron 17:173–179
Sabziparvar AA (2009) Estimation of clear-sky effective erythema radiation from broadband solar radiation (300–3,000 nm) data in an arid climate. Int J Climatol 29(13):2027–2032
Seyfried MS, Flerchinger GN, Murdock MD, Hanson CL, Van Vactor S (2001) Long-term soil temperature database, Reynolds Creek Experimental Watershed, Idaho, United States. Water Resour Res 37(11):2843–2846
Tabari H, Marofi S, Sabziparvar AA (2010a) Estimation of daily pan evaporation using artificial neural network and multivariate non-linear regression. Irrig Sci 28:399–406
Tabari H, Marofi S, Zare Abyaneh H, Sharifi MR (2010b) Comparison of artificial neural network and combined models in estimating spatial distribution of snow depth and snow water equivalent in Samsami basin of Iran. Neural Comput Applic 19:625–635
Torres M, Hervas C, Amador F (2005) Approximating the sheep milk production curve through the use of artificial neural networks and genetic algorithms. Comput Oper Res 32:2653–2670
Zheng D, Raymond Hunt E Jr, Running SW (1993) A daily soil temperature model based on air temperature and precipitation for continental applications. Climate Res 2:183–191

Acknowledgments

We wish to express our gratitude to the anonymous reviewers whose suggestions and remarks have greatly helped us to improve the quality of the manuscript. We also thank the Islamic Republic of Iran Meteorological Organization (IRIMO) for providing the meteorological data.

Author information

Authors and Affiliations

Department of Irrigation, Faculty of Agriculture, Bu-Ali Sina University, Hamedan, Islamic Republic of Iran
Hossein Tabari & Ali-Akbar Sabziparvar
Department of Civil Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Islamic Republic of Iran
Mohammad Ahmadi

Corresponding author

Correspondence to Hossein Tabari.

Tabari, H., Sabziparvar, AA. & Ahmadi, M. Comparison of artificial neural network and multivariate linear regression methods for estimation of daily soil temperature in an arid region. Meteorol Atmos Phys 110, 135–142 (2011). https://doi.org/10.1007/s00703-010-0110-z

Received: 15 August 2009
Accepted: 25 November 2010
Published: 09 December 2010
Issue Date: January 2011
DOI: https://doi.org/10.1007/s00703-010-0110-z
