Developing hybrid time series and artificial intelligence models for estimating air temperatures

Mohammadi, Babak; Mehdizadeh, Saeid; Ahmadi, Farshad; Lien, Nguyen Thi Thuy; Linh, Nguyen Thi Thuy; Pham, Quoc Bao

doi:10.1007/s00477-020-01898-7

Original Paper
Published: 14 October 2020

Volume 35, pages 1189–1204, (2021)

Stochastic Environmental Research and Risk Assessment

Babak Mohammadi¹,
Saeid Mehdizadeh²,
Farshad Ahmadi³,
Nguyen Thi Thuy Lien⁴,
Nguyen Thi Thuy Linh^7,8 &
…
Quoc Bao Pham^5,6


Abstract

Air temperature is a vital meteorological variable required in many applications, such as agricultural and soil sciences and meteorological and climatological studies. Given the importance of this variable, this study seeks to estimate minimum (T_min), maximum (T_max), and mean (T) air temperatures by applying a linear autoregressive (AR) time series model and then developing a hybrid model by coupling the AR with a non-linear time series model, namely autoregressive conditional heteroscedasticity (ARCH). Hence, the hybrid AR-ARCH model was tested. To that end, the T_min, T_max, and T data from 1986 to 2015 at two weather stations located in Northwestern Iran were used for both daily and monthly time scales. The results showed that the hybrid time series model (i.e., AR-ARCH) performed better than the single AR for estimating the air temperature parameters at the study sites. Multi-layer perceptron (MLP) was then employed to estimate the air temperatures using lagged temperature data as input predictors. Next, the single AR and hybrid AR-ARCH time series models were utilized to implement the hybrid MLP-AR and MLP-AR-ARCH models. The novelty of this study lies in developing the hybrid MLP-AR and MLP-AR-ARCH models, as well as the AR-ARCH one. Three statistical metrics, namely root mean square error (RMSE), mean absolute error (MAE), and normalized RMSE (NRMSE), were used to investigate the performance of all the developed models. The hybrid MLP-AR and MLP-AR-ARCH models were found to perform better than the single MLP when estimating the daily and monthly T_min, T_max, and T; however, the MLP-AR models outperformed the MLP-AR-ARCH ones. Finally, the performance of MLP was evaluated under an external condition (i.e., estimating the temperature components at a particular site using the temperature data of an adjacent location). The results indicated that the temperature data of a nearby station can be used for estimating the temperatures of a desired station. The most accurate results during the test stage were obtained under a local assessment through the hybrid MLP-AR(1) at the Tabriz station when estimating the monthly T_max (RMSE = 0.199 °C, MAE = 0.159 °C, NRMSE = 1.012%) and the hybrid MLP-AR(12) at the Urmia station when estimating the daily T_max (RMSE = 0.364 °C, MAE = 0.277 °C, NRMSE = 1.911%).

1 Introduction

Air temperature is an important weather parameter required in different applications of various agricultural sciences such as agronomy, soil science, agricultural meteorology (Mehdizadeh 2018a), and studies related to climate change caused by air temperature changes (Ustaoglu et al. 2008). It is also an essential variable in atmospheric and environmental studies to predict natural hazards, such as drought and frost caused by variations in air temperature (Kaymaz 2005; Ustaoglu et al. 2008). Air temperature consists of three basic components including the minimum, maximum and mean temperatures. Knowing the minimum temperature is useful to find strategies to counter the risk of frostbite. In turn, maximum temperature in a region helps to determine the potential of solar energy when designing solar power plants.

In terms of agriculture, plant growth is strongly influenced by the air temperature, such that a plant can only grow within a certain range of air temperature (Cobaner et al. 2014; Webber et al. 2016). A site’s suitability for planting a given agricultural crop is commonly determined by the air temperature regimes (Hudson and Wackernagel 1994). Also, seeds grow optimally at a specific range of air temperatures (Cobaner et al. 2014). In addition, soil temperature, an important soil parameter, is greatly affected by weather variables including air temperature, relative humidity, solar radiation, etc. In fact, air temperature is the most important meteorological parameter affecting soil temperature regimes; so, there exists a strong correlation between air and soil temperatures (Behmanesh and Mehdizadeh 2017; Mehdizadeh et al. 2018a, 2020a, b). Moreover, irrigation scheduling is usually based on crop evapotranspiration which is affected by air temperature.

In recent years, artificial intelligence (AI) methods have been extensively and successfully used to estimate meteorological parameters time series such as air temperature. AI models have the ability to approximate a target parameter based on a series of input predictors without understanding the physical process. In addition to these AI approaches, various time series models have also been developed (Box and Jenkins 1976); however, these models have received less attention compared to AI methods for air temperature forecasting. Numerous studies have been published regarding air temperature modeling using AI-based models (Ustaoglu et al. 2008; Smith et al. 2009; Dombayk and Golcu 2009; Bilgili and Sahin 2010; Paniagua-Tineo et al. 2011; Sahin 2012; Cobaner et al. 2014; Pang et al. 2017; Noi et al. 2017; Sanikhani et al. 2018; Azad et al. 2020). Some of their findings are presented below.

Ustaoglu et al. (2008) forecasted the daily minimum, maximum and mean air temperatures of Geyve and Sakarya basins, Turkey. They applied feed forward back propagation (FFBP), radial basis function (RBF), generalized regression neural networks (GRNN), and multiple linear regression (MLR). The RBF was found to perform slightly better than the other methods. Sotomayor (2010) investigated the ability of back propagation (BP) type of artificial neural networks (ANN) and multivariate adaptive regression splines (MARS) to forecast rainfall and temperature in the Mantaro river basin, Peru. The temperature estimates generated by the MARS were superior to the BP model. Khatib et al. (2012) applied the GRNN method to estimate hourly air temperatures in Malaysia, and reported on the ability of this technique to accurately forecast temperature time series. The minimum and maximum air temperatures of Chennai, India, were estimated via the MARS and support vector machine (SVM) techniques by Ramesh and Anitha (2014). The authors concluded that the MARS technique had a higher accuracy than the SVM technique. Cobaner et al. (2014) estimated mean monthly air temperatures of 275 stations located in Turkey, through the adaptive neuro-fuzzy inference system (ANFIS), ANN, and MLR. They found that the ANFIS performed better than the ANN and MLR. Kisi and Sanikhani (2015) modeled the mean monthly temperatures of 50 stations in Iran by using the ANN, ANFIS-subtractive clustering, ANFIS-grid partitioning, gene expression programming (GEP), and support vector regression (SVR). The SVR performed the best out of the different techniques they used. The potential of SVR and multi-layer perceptron (MLP) was tested by Salcedo-Sanz et al. (2016) to estimate mean monthly air temperatures of Australia and New Zealand. They reported a better performance of SVR compared to the MLP. 
The ability of four AI models, namely ANN, ANFIS, MARS, and SVM, was evaluated by Mehdizadeh (2018a) to estimate the mean monthly air temperatures of 50 stations located in Iran. The author documented that the models performed reliably. Sanikhani et al. (2018) used the GRNN, MARS, random forests (RF), and extreme learning machine (ELM) for forecasting the long-term mean monthly air temperatures of Madhya Pradesh, Central India. They concluded that the models could forecast air temperatures via geographical inputs and a periodicity term. Wagle et al. (2019) assessed the modeling performance of long short-term memory (LSTM) to estimate surface air temperature and reported its reliable accuracy. In another study, Cifunentes et al. (2020) reviewed recent works published on air temperature estimation through the application of AI models and concluded that deep learning and SVR models could be employed when predicting the air temperature with a dependable level of precision. Azad et al. (2020) implemented new hybrid models by coupling the ANFIS with four different types of optimization algorithms for estimating the monthly ambient temperatures of 34 stations in Iran. They found that the hybrid models estimated the monthly air temperatures better than the conventional ANFIS.

The main goals of this study are to (1) apply a linear autoregressive (AR) model and then implement a hybrid linear-nonlinear time series model (i.e., autoregressive-autoregressive conditional heteroscedasticity; AR-ARCH) to estimate the air temperature of Tabriz and Urmia, Northwestern Iran, on both daily and monthly scales; (2) develop other types of hybrid models by hybridizing the single AR and hybrid AR-ARCH with an AI-based model, namely the MLP; (3) compare the performance of all the single and hybrid models developed in this study; and (4) evaluate the accuracy of MLP under an external condition. An external condition means that the air temperatures at a particular site are estimated using the temperature data of a neighboring location. The literature review reveals that AI-based models have been applied far more extensively than time series-based models to estimate air temperatures. On the other hand, hybrid models have recently received remarkable attention; however, hybrid models implemented by coupling AI and time series models have rarely been reported in the literature for estimating air temperature parameters. The main contributions of this research, which have not been addressed in previous works, are the development of the hybrid AR-ARCH, MLP-AR, and MLP-AR-ARCH models on the basis of a single AR and the evaluation of the performance of MLP under an external condition.

2 Materials and methods

2.1 Case study and data gathering

Two weather stations in Iran, the Tabriz and Urmia stations, were chosen as case studies. Both locations are in Northwestern Iran (Fig. 1) and are classified as having a semi-arid climate according to the climate classification developed by de Martonne (1925).

The air temperature data used in this study, which includes the daily and monthly minimum (T_min), maximum (T_max) and mean (T) temperatures between 1986 and 2015, are compiled by the Iran Meteorological Organization (IMO). For both stations and time scales, the data from 1986 to 2009 were used as the training data sets; while the data between 2010 and 2015 were used as the test datasets. The time series graphs of the daily and monthly air temperatures of the Tabriz and Urmia stations during the studied period are depicted in Figs. 2 and 3, respectively. As can be seen, temperature components have similar trends from year to year. Table 1 also summarizes some of the daily and monthly statistical parameters of the data used, including the minimum, maximum, mean, and standard deviation. These statistical parameters are similar for the training and test periods at the two study sites. On the same table, T_min and T_max respectively refer to the minimum and maximum values of the standard deviation indicator on both daily and monthly time scales.

Table 1 Daily and monthly statistical parameters of air temperature data at the study areas

Before implementing the models to estimate air temperatures, all the data were standardized using the following equation:

$$T_{S} = \frac{{T_{m} - \overline{{T_{m} }} }}{{\sigma_{{T_{m} }} }}$$

(1)

where $T_{S}$, $T_{m}$, $\overline{{T_{m} }}$, and $\sigma_{{T_{m} }}$ correspond to the standardized air temperature, the measured air temperature, the mean of the measured air temperatures, and the standard deviation of the measured air temperatures, respectively.
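As an illustration of Eq. (1), the following minimal Python sketch standardizes a short temperature series and maps it back to degrees Celsius. The function names and the sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def standardize(t_measured):
    """Apply Eq. (1): subtract the mean and divide by the standard deviation."""
    t = np.asarray(t_measured, dtype=float)
    mean = t.mean()
    std = t.std()  # population std (ddof=0); an assumption, not stated in the paper
    return (t - mean) / std, mean, std

def destandardize(t_standardized, mean, std):
    """Invert Eq. (1) to return to the original units (degrees Celsius)."""
    return np.asarray(t_standardized, dtype=float) * std + mean

# Made-up temperatures in degrees Celsius, for illustration only.
temps = np.array([-3.0, 1.5, 7.0, 12.5, 18.0])
z, mu, sigma = standardize(temps)
restored = destandardize(z, mu, sigma)
```

Keeping the mean and standard deviation alongside the standardized series is what allows model outputs to be converted back before error metrics in °C are computed.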

2.2 AR and ARCH time series models

Different time series models have been developed to estimate the time series of observed data. AR and other derivations of this model, such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA), are classified as linear models. In the AR, the value at a given time is expressed as a linear combination of the values at earlier times. An AR model can be formulated as follows (Mehdizadeh 2020):

$$Z_{t} (p) = \sum\limits_{i = 1}^{p} {(\varphi_{i} .Z_{t - i} )} + \varepsilon_{t}$$

(2)

where $Z_{t}$ and $Z_{t - i}$ denote the standardized data at times t and t-i, respectively; $p$ is the AR model order; $\varphi_{i}$ shows the ith coefficient of AR; and $\varepsilon_{t}$ represents the stochastic series or error rate.
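To make Eq. (2) concrete, the sketch below estimates the AR coefficients by ordinary least squares on a synthetic series. This is one common estimation approach and is an assumption here; the paper does not state how the coefficients were fitted, and `fit_ar` is a hypothetical helper name.

```python
import numpy as np

def fit_ar(z, p):
    """Least-squares estimate of phi_1..phi_p in Eq. (2), plus the residuals eps_t."""
    z = np.asarray(z, dtype=float)
    # The row for time t holds the lagged values [z[t-1], z[t-2], ..., z[t-p]].
    X = np.column_stack([z[p - i:len(z) - i] for i in range(1, p + 1)])
    y = z[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    return phi, y - X @ phi

# Synthetic AR(2) series with known coefficients (not the paper's data).
rng = np.random.default_rng(0)
true_phi = np.array([0.6, -0.3])
z = np.zeros(20000)
eps = rng.standard_normal(z.size)
for t in range(2, z.size):
    z[t] = true_phi[0] * z[t - 1] + true_phi[1] * z[t - 2] + eps[t]

phi_hat, resid = fit_ar(z, p=2)
```

With a long enough series, `phi_hat` should land close to `true_phi`, and `resid` plays the role of the error series in Eq. (2).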

Other than the linear AR, a non-linear ARCH model was tested in this study. In linear models like the AR, more attention is paid to the data’s mean than to its changing variance over time. The ARCH, a non-linear time series model, was initially developed by Engle (1982) and considers variations in the variance of the data. It can be expressed by the following formulas:

$$\sigma_{t}^{2} = a_{o} + \sum\limits_{i = 1}^{m} {b_{i} \varepsilon_{t - i}^{2} }$$

(3)

$$\varepsilon_{t}^{^{\prime}} = \sigma_{t} .Z_{t}$$

(4)

where $\sigma_{t}^{2}$ denotes the conditional variance; $a_{o}$ and $b_{i}$ are the coefficients of ARCH; and $\varepsilon_{t}^{^{\prime}}$ illustrates the stochastic series achieved by the ARCH. A first order ARCH model (i.e., m = 1 in Eq. 3) was used in this study.

2.3 MLP

One of the most commonly used types of artificial neural networks is the multi-layer perceptron (MLP). In this model, weights and biases are trained to produce a specific target output (Teo et al. 2001; Wang et al. 2006; Fang et al. 2014). The learning rules used in this regard are called perceptron training rules. Perceptron networks are noteworthy because they adapt well to their input vectors. These networks are especially useful in solving simple classification problems. This type of neural network is very fast and reliable in solving problems (Gupta and Wang 2010; Wang and Teo 2001; Zhu and Wang 2010). It is an effective technique that can capture the non-linear relationship between output and input (Jahani and Mohammadi 2019). The major feature of MLP is that it processes information through the interactions between neurons, without requiring an advanced mathematical model design. Here, a three-layered MLP model, consisting of an input layer, a hidden layer, and an output layer, was applied with the Levenberg–Marquardt (LM) error-correction learning algorithm. Figure 4 illustrates a schematic diagram of the MLP. This network was trained for 1000 epochs, at a learning rate of 0.0012 and a momentum coefficient of 0.85. Equation (5) represents the net input into the hidden and output layers.

$$y_{i} = \sum\limits_{j = 1}^{N} {w_{ji} x_{j} + w_{io} }$$

(5)

where N refers to the total number of nodes in the layer above node i; w_ji is the weight between node i and node j in the upper layer; x_j denotes the output derived from node j; w_io represents the bias of node i; and y_i denotes the input signal of node i that is passed through the transfer function.
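The net input of Eq. (5) can be sketched as a matrix product followed by a bias. The example below is a minimal forward pass through a three-layer network; the tanh hidden activation, the linear output, and all weight values are assumptions for illustration, since the transfer functions are not specified in this section.

```python
import numpy as np

def layer_net_input(x, W, b):
    """Eq. (5): y_i = sum_j w_ji * x_j + w_io, with W[j, i] the weight from node j to node i."""
    return x @ W + b

def mlp_forward(x, W_hidden, b_hidden, W_out, b_out):
    """One hidden layer (tanh, assumed) followed by a linear output layer (assumed)."""
    hidden = np.tanh(layer_net_input(x, W_hidden, b_hidden))
    return layer_net_input(hidden, W_out, b_out)

# One input (e.g. a lagged standardized temperature), two hidden nodes, one output.
x = np.array([0.5])
W_hidden = np.array([[1.0, -1.0]])
b_hidden = np.array([0.0, 0.0])
W_out = np.array([[0.5], [0.5]])
b_out = np.array([0.1])

estimate = mlp_forward(x, W_hidden, b_hidden, W_out, b_out)
```

In this toy configuration the two hidden activations cancel, so the output equals the output bias; trained weights would of course differ.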

The MLP network is trained to produce a set of outputs from a set of inputs. Each of these input or output sets can be thought of as a vector. Training is performed sequentially by presenting input vectors and adjusting the network weights according to a predetermined method. During training, the network weights gradually converge to values for which applying an input vector generates the desired output vector. An important decision in MLP training is when to stop the training process, because if training is not stopped at the right point, the network becomes prone to over-fitting. To avoid this problem, the early stopping technique is used: the data are divided into subsets, one of which is reserved for validation, and the training process is stopped whenever the validation error of the network starts to increase. In the present study, over-fitting was controlled by monitoring the evaluation indicators of the models and observing the chart of error versus training epochs in the training and validation stages.
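The stopping rule described above can be sketched with a simple patience counter. This is a generic illustration, not the procedure used in the study: the `patience` parameter, the function name, and the validation errors are all hypothetical.

```python
def early_stopping_epoch(val_errors, patience=5):
    """Return the epoch with the lowest validation error seen before training stops.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs.
    """
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Made-up validation errors per epoch.
val_errors = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 0.6]
stop_short = early_stopping_epoch(val_errors, patience=3)
stop_long = early_stopping_epoch(val_errors, patience=5)
```

A short patience stops at the first sustained rise in validation error, while a longer one tolerates temporary increases and may find a later minimum.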

2.4 Models development

The single AR models were developed by testing different orders (i.e., p in Eq. 2) and then selecting the optimal AR models as those with the lowest Akaike information criterion (AIC). In addition, the hybrid AR-ARCH models were implemented through the following steps:

Calculating the error rates obtained via the optimal AR models (i.e., $\varepsilon_{t}$).
Computing the values of $\varepsilon_{t}^{2}$ series.
Fitting the ARCH model to the $\varepsilon_{t}^{2}$ values achieved in the previous step and therefore developing the hybrid AR-ARCH models.
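The three steps above can be sketched as follows for a first order ARCH. Estimating the ARCH coefficients by regressing the squared residuals on their lagged values is an assumption made for simplicity (maximum likelihood is the more usual choice), the helper names are hypothetical, and the series is synthetic. How the fitted variance enters the final temperature estimate is not detailed in this section, so the sketch stops at the conditional variance of Eq. (3).

```python
import numpy as np

def ar_residuals(z, p):
    """Step 1: least-squares AR(p) fit, returning the residual series eps_t."""
    z = np.asarray(z, dtype=float)
    X = np.column_stack([z[p - i:len(z) - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, z[p:], rcond=None)
    return z[p:] - X @ phi

def fit_arch1(eps):
    """Steps 2 and 3: square the residuals, then regress eps_t^2 on eps_{t-1}^2."""
    e2 = np.asarray(eps, dtype=float) ** 2
    X = np.column_stack([np.ones(e2.size - 1), e2[:-1]])
    (a0, b1), *_ = np.linalg.lstsq(X, e2[1:], rcond=None)
    return float(a0), float(b1)

def conditional_variance(eps, a0, b1):
    """Eq. (3) with m = 1; the first value has no predecessor and is set to a0."""
    e2 = np.asarray(eps, dtype=float) ** 2
    return np.concatenate(([a0], a0 + b1 * e2[:-1]))

# Synthetic standardized AR(1) series (not the paper's data).
rng = np.random.default_rng(1)
z = np.zeros(3000)
for t in range(1, z.size):
    z[t] = 0.7 * z[t - 1] + rng.standard_normal()

eps = ar_residuals(z, p=1)
a0_hat, b1_hat = fit_arch1(eps)
sigma2 = conditional_variance(eps, a0_hat, b1_hat)
```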

The single MLP models were developed by using the one day and one month lagged data to estimate the daily and monthly air temperatures of current day or month. Moreover, the hybrid MLP-AR and MLP-AR-ARCH models were developed by summing the outputs of the single MLP (i.e., deterministic term) with the outputs of the single AR and hybrid AR-ARCH models (i.e., stochastic term). The hybrid models were developed because the time series models can represent the stochastic term of the data; while the AI-based models, such as MLP, are able to capture the deterministic term of the data. Therefore, an accurate estimation approach needs to consider both terms, which the hybrid models developed in this study have taken into consideration.
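The summation of a deterministic and a stochastic term can be sketched as below. Two assumptions are made loudly here: the AR model is fitted to the residuals of the deterministic estimate, and a known seasonal curve stands in for the MLP output. Neither detail is confirmed by the text, and the helper names and data are hypothetical.

```python
import numpy as np

def ar_one_step(resid, p):
    """Fit AR(p) to a residual series by least squares; return one-step predictions for t >= p."""
    resid = np.asarray(resid, dtype=float)
    X = np.column_stack([resid[p - i:len(resid) - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, resid[p:], rcond=None)
    return X @ phi

def hybrid_estimate(observed, deterministic, p):
    """Sum the deterministic term with the AR prediction of its residual (assumed scheme)."""
    stochastic = ar_one_step(observed - deterministic, p)
    return deterministic[p:] + stochastic

# Synthetic example: a seasonal curve plus autocorrelated noise.
rng = np.random.default_rng(2)
t = np.arange(600)
deterministic = np.sin(2 * np.pi * t / 12)  # stand-in for the MLP output (assumption)
noise = np.zeros(t.size)
for k in range(1, t.size):
    noise[k] = 0.8 * noise[k - 1] + 0.3 * rng.standard_normal()
observed = deterministic + noise

p = 1
hybrid = hybrid_estimate(observed, deterministic, p)
rmse_single = np.sqrt(np.mean((observed[p:] - deterministic[p:]) ** 2))
rmse_hybrid = np.sqrt(np.mean((observed[p:] - hybrid) ** 2))
```

Because the residuals here are autocorrelated, the stochastic term recovers part of the error left by the deterministic term. The fit is in-sample only; a faithful replication would fit on the training period and evaluate on the test period.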

All steps in the development of models explained above are related to the local assessment of applied models. A local assessment means that the air temperatures of a particular location are estimated using the temperature data at that same site. In addition to the local assessment of models, the performance of MLP was also evaluated under an external assessment using the air temperatures of an adjacent site to estimate the air temperatures at each desired location.

2.5 Performance assessment criteria

Here, the root mean square error (RMSE), mean absolute error (MAE), and normalized RMSE (NRMSE) were used for assessing the efficiency of all the models to estimate the daily and monthly air temperature as follows (Guan et al. 2020):

$$RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {T_{m,i} - T_{e,i} } \right)^{2} } }}{N}}$$

(6)

$$MAE = \frac{{\sum\nolimits_{i = 1}^{N} {\left| {T_{m,i} - T_{e,i} } \right|} }}{N}$$

(7)

$$NRMSE = \frac{{\sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {T_{m,i} - T_{e,i} } \right)^{2} } }}{N}} }}{{\overline{{T_{m} }} }} \times 100\%$$

(8)

where $T_{m,i}$, $T_{e,i}$, $\overline{{T_{m} }}$, and N denote the ith measured air temperature, the ith estimated air temperature via the single and hybrid models, mean of the measured air temperature data, and the total number of observational data, respectively. A lower value of these metrics indicates a better performance by any given model to estimate the daily and monthly T_min, T_max and T.
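Equations (6) to (8) translate directly into a few lines of Python. The sketch below uses made-up measured and estimated values; the function names are hypothetical.

```python
import numpy as np

def rmse(t_measured, t_estimated):
    """Eq. (6): root mean square error."""
    return float(np.sqrt(np.mean((t_measured - t_estimated) ** 2)))

def mae(t_measured, t_estimated):
    """Eq. (7): mean absolute error."""
    return float(np.mean(np.abs(t_measured - t_estimated)))

def nrmse(t_measured, t_estimated):
    """Eq. (8): RMSE divided by the mean measured value, in percent."""
    return rmse(t_measured, t_estimated) / float(np.mean(t_measured)) * 100.0

# Made-up values in degrees Celsius.
measured = np.array([10.0, 12.0, 14.0, 16.0])
estimated = np.array([11.0, 12.0, 13.0, 18.0])

rmse_value = rmse(measured, estimated)
mae_value = mae(measured, estimated)
nrmse_value = nrmse(measured, estimated)
```

Since Eq. (8) divides by the mean measured temperature, the NRMSE becomes very large when that mean is close to 0 °C, which is worth keeping in mind when comparing T_min with T_max.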

3 Results and discussion

3.1 Local assessment of the single and hybrid models

First, the different AR models containing the various orders (i.e., p in Eq. 2) were examined. Then, the AR models that presented the smallest AIC error criterion were selected as the best AR models. For example, the AR(15), AR(12) and AR(14) are the best AR models on a daily scale for estimating T_min, T_max and T at Urmia station, respectively. In addition, the optimal AR models for estimating T_min, T_max and T on a monthly scale at this location are the AR(4), AR(3) and AR(4) models. The values of the statistical indicators including the RMSE, MAE, and NRMSE for the single AR models during both training and test stages at Tabriz and Urmia are summarized in Tables 2 and 3. As can be seen, the single AR models are able to estimate T_min, T_max and T on both studied time scales, specifically on a monthly scale with a high level of accuracy.
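The order selection described above can be sketched as a loop over candidate orders. The AIC form used here, n·ln(RSS/n) + 2p for a least-squares fit with Gaussian errors, is an assumption, as the paper does not give its AIC formula; `ar_aic` and `select_order` are hypothetical names and the series is synthetic.

```python
import numpy as np

def ar_aic(z, p, max_p):
    """AIC of a least-squares AR(p) fit, using the same samples for every order."""
    z = np.asarray(z, dtype=float)
    y = z[max_p:]  # drop the first max_p points so all orders are compared on equal data
    X = np.column_stack([z[max_p - i:len(z) - i] for i in range(1, p + 1)])
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ phi) ** 2))
    n = y.size
    return n * np.log(rss / n) + 2 * p

def select_order(z, max_p):
    """Return the order with the lowest AIC and the AIC of every candidate order."""
    aics = {p: ar_aic(z, p, max_p) for p in range(1, max_p + 1)}
    return min(aics, key=aics.get), aics

# Synthetic AR(2) series (not the paper's data).
rng = np.random.default_rng(3)
z = np.zeros(5000)
for t in range(2, z.size):
    z[t] = 0.6 * z[t - 1] - 0.3 * z[t - 2] + rng.standard_normal()

best_p, aics = select_order(z, max_p=8)
```

AIC penalizes extra coefficients only mildly, so it can select an order somewhat above the true one; this may be one reason why high orders such as AR(15) appear for the daily series.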

Table 2 Error statistics of the standalone and hybrid models at Tabriz station (local assessment)

Table 3 Error statistics of the standalone and hybrid models at Urmia station (local assessment)

After that, the performance of single linear AR models was improved by combining them with a non-linear time series model named ARCH. Accordingly, the hybrid AR-ARCH models were developed and tested for estimating the daily and monthly temperature components. Tables 2 and 3 represent the values of error criteria calculated for the hybrid AR-ARCH models. The achieved results clearly demonstrate that hybridizing the linear AR with a non-linear ARCH model leads to better estimates of T_min, T_max and T at the study locations on both daily and monthly scales. For example, based on Table 2, the values of RMSE, MAE and NRMSE for the single AR(26) when estimating the T_min at Tabriz station on a daily scale are, respectively, 2.221 °C, 1.694 °C, 28.947% (training period), 2.343 °C, 1.781 °C, 29.302% (test period); while, these statistics improve to 0.445 °C, 0.340 °C, 5.806% (training period), 0.465 °C, 0.356 °C, 5.816% (test period) via the hybrid AR(26)-ARCH model. Assessing the performance of single and developed hybrid time series models when estimating temperature parameters revealed that the accuracy of single AR models was improved most via the hybrid AR-ARCH models for T_min estimation on a daily basis.

In addition to the single and hybrid time series models, an AI-based model including the MLP was developed in this study. As previously noted, the performance of this method depends on the optimal number of neurons in the hidden layer. Therefore, a series of trials were conducted to determine the optimum numbers for the hidden layer nodes by selecting for least error. Table 4 tabulates the optimal number of hidden layer nodes for the MLP models developed at the study locations for both time scales. As seen, these range from 2 (estimating monthly T_max) to 26 (estimating daily T_min) at Tabriz station, and 3 (estimating daily T and monthly T_min) to 17 (estimating monthly T) at Urmia station for local assessment. To implement the MLP models, one day and one month lagged T_min, T_max and T data were used as inputs to estimate the temperature time series of a current day and month. The error criteria RMSE, MAE and NRMSE computed for the single MLP models at Tabriz and Urmia stations are shown in Tables 2 and 3, respectively. Clearly, the lagged temperature data can be used to estimate the daily and monthly temperature components of current day and month.

Table 4 Optimal number of hidden layer neurons for the MLP models developed at the study locations

Besides the hybrid AR-ARCH time series model, this study developed other types of hybrid models by combining the single AR and hybrid AR-ARCH models with the MLP, which led to hybrid artificial intelligence-time series models (i.e., MLP-AR and MLP-AR-ARCH). The values of the statistical indicators obtained for these hybrid models are shown in Tables 2 and 3. Evaluating the performance of the single and hybrid models revealed that better estimates of the daily and monthly T_min, T_max and T can be achieved by integrating the AR and AR-ARCH models with the MLP via the hybrid MLP-AR and MLP-AR-ARCH models, particularly the MLP-AR. For example, based on Table 3, the values of RMSE, MAE and NRMSE for the single MLP when estimating T_min on a daily scale are, respectively, 2.185 °C, 1.668 °C, 41.499% (training period) and 2.325 °C, 1.799 °C, 44.151% (test period), while these statistics improve to 0.404 °C, 0.302 °C, 7.664% (training period) and 0.452 °C, 0.328 °C, 8.579% (test period) for the hybrid MLP-AR(15), and to 1.830 °C, 1.379 °C, 34.758% (training period) and 2.007 °C, 1.489 °C, 38.106% (test period) for the hybrid MLP-AR(15)-ARCH model. As already noted, time series models like the AR and AR-ARCH capture the stochastic component of the data and AI techniques such as the MLP capture the deterministic component, while the hybrid models developed in this study include both terms in their estimations.

Similarly, previous works have confirmed the suitability and higher precision of hybrid models generated by combining AI and time series models compared to single AI models. These studies developed hybrid models by coupling various time series and AI approaches for the estimation of hydrological and meteorological time series, including reference evapotranspiration (Mohammadi and Mehdizadeh 2020; Mehdizadeh 2018b), river flow (Mehdizadeh and Kozakalani Sales 2018; Fathian et al. 2019; Mehdizadeh et al. 2019a, b; Mohammadi et al. 2020a, b), precipitation (Mehdizadeh 2020; Mehdizadeh et al. 2017, 2018b), wind speed (Mehdizadeh et al. 2020c), soil temperature (Moazenzadeh and Mohammadi 2019; Mehdizadeh et al. 2020d), and solar radiation (Mohammadi and Aghashariatmadari 2020). It was found that the estimates of the hybrid models were better than those of the single AI methods.

Radar diagrams were then prepared to graphically show the estimation accuracy of all the developed models in terms of RMSE values during the test phase; they are depicted in Figs. 5 and 6 for the Tabriz and Urmia stations, respectively. It can be clearly observed that the hybrid AR-ARCH, MLP-AR, and MLP-AR-ARCH models yielded lower RMSE than the corresponding standalone AR and MLP ones. This verifies the superior performance of the implemented hybrid models compared to the single models in estimating the air temperature parameters (T_min, T_max and T).

3.2 External assessment of the MLP models

Following the local assessment of the MLP model, it was externally assessed as well. This type of evaluation is particularly important when the local data at a given station are not available as input for the AI approaches. In those cases, the data of a nearby station could be used to estimate the T_min, T_max and T parameters at the desired station. The sites used in this study, Urmia and Tabriz, are located close to each other in Northwestern Iran and have similar climatic characteristics (i.e., semi-arid). Hence, they can be regarded as neighboring stations, and the daily and monthly T_min, T_max and T parameters at one station were estimated using the same day or month data of the adjacent station. The optimal numbers of hidden layer nodes at the studied regions for the external assessment of MLP are presented in Table 4. As can be seen, they vary between 1 (for estimating monthly T_max) and 14 (for estimating monthly T) at Tabriz station, and between 1 (for estimating monthly T_min) and 24 (for estimating daily T) at Urmia station. Tables 5 and 6, respectively, report the values of the error indices obtained by the MLP model for an external assessment. A performance comparison of the single MLP developed under both local and external evaluation conditions (i.e., Tables 2, 3 and 5, 6) demonstrates that the accuracy of the MLP models developed under the external condition is higher than under the local one for both time scales and for T_min, T_max and T. As an example, the values of RMSE, MAE and NRMSE in Table 3 for the single MLP under a local assessment for T_max on a daily scale at Urmia station are, respectively, 2.188 °C, 1.672 °C, 12.331% (training period) and 2.333 °C, 1.779 °C, 12.238% (test period), while these statistics improve to 1.329 °C, 1.009 °C, 7.491% (training period) and 1.389 °C, 1.078 °C, 7.287% (test period) via the single MLP under an external condition (Table 6). Therefore, it can be concluded that proper selection of adjacent stations can improve the results of AI techniques under an external assessment over those of a local evaluation.

Table 5 Error statistics of the standalone MLP models at Tabriz station (external assessment)

Table 6 Error statistics of the standalone MLP models at Urmia station (external assessment)

This type of evaluation of AI-based models, such as the MLP used in the current study, has also been considered in previous works on estimating time series of meteorological and hydrological parameters. Some of these studies are briefly presented here. The climatic parameters of a nearby location were used by Mehdizadeh (2018b) to estimate the daily reference evapotranspiration of a target site. In the field of streamflow modeling, Sanikhani and Kisi (2012) and Mehdizadeh et al. (2019b) evaluated the performance of AI models in modeling the streamflows of a target station using the data of an adjacent hydrometric location. Moreover, other studies in the literature have reported on the applicability of an adjacent station's data for modeling the intended parameter at the target site, including pan evaporation estimation (Lu et al. 2018), wind speed prediction (Deo et al. 2018), and drought modeling (Mehdizadeh et al. 2020e). The outcomes of these works indicated that the data of an adjacent station could be applied to model the studied problem at the target location under an external evaluation.

3.3 Performance comparison of all models developed

As concluded from the previous sections, the hybrid time series model (i.e., AR-ARCH) performed better than the single AR when estimating the temperature parameters at both daily and monthly scales. Additionally, the hybrid MLP-AR and MLP-AR-ARCH models yielded better results than the single MLP models; however, the MLP-AR models developed at the study locations performed the best. Evaluating the performance of the single AR and MLP under a local condition showed that the single AR models have better accuracy than the MLP models at both stations on a monthly scale. In contrast, the MLP models under a local condition performed better when estimating the T_min, T_max and T on a daily scale at Tabriz station. At the Urmia station on a daily scale, the AR performed better than the MLP under a local condition when estimating T_min, whereas the MLP models showed better statistics than the AR models when estimating T_max and T. As for the hybrid models (i.e., AR-ARCH, MLP-AR and MLP-AR-ARCH), the hybrid AR-ARCH models outperformed the hybrid MLP-AR-ARCH models under a local condition when estimating all temperature parameters for both study locations. However, the MLP-AR models developed under a local condition outperformed the hybrid AR-ARCH, except when estimating the daily T_min and monthly T at Urmia, where the AR-ARCH models present a slightly better accuracy than the MLP-AR models under a local condition. Also, the MLP models under an external condition led to better estimates of the daily and monthly T_min, T_max and T parameters than those under a local condition for both study regions.
The most accurate models at the Tabriz station for estimating T_min, T_max, and T on a monthly scale were, respectively, MLP-AR(3) (RMSE = 0.379 °C, MAE = 0.204 °C, NRMSE = 4.977% at the training stage; RMSE = 0.431 °C, MAE = 0.245 °C, NRMSE = 5.423% at the test stage), MLP-AR(1) (RMSE = 0.165 °C, MAE = 0.112 °C, NRMSE = 0.883% at the training stage; RMSE = 0.199 °C, MAE = 0.159 °C, NRMSE = 1.012% at the test stage), and MLP-AR(1) (RMSE = 0.311 °C, MAE = 0.216 °C, NRMSE = 2.366% at the training stage; RMSE = 0.255 °C, MAE = 0.194 °C, NRMSE = 1.846% at the test stage). Moreover, the most precise estimates of the temperature parameters at the Urmia station were obtained by the AR(15)-ARCH when estimating the daily T_min (RMSE = 0.249 °C, MAE = 0.137 °C, NRMSE = 4.727% at the training stage; RMSE = 0.253 °C, MAE = 0.140 °C, NRMSE = 4.801% at the test stage), the MLP-AR(12) when estimating the daily T_max (RMSE = 0.315 °C, MAE = 0.243 °C, NRMSE = 1.774% at the training stage; RMSE = 0.364 °C, MAE = 0.277 °C, NRMSE = 1.911% at the test stage), and the MLP-AR(14) when estimating the daily T (RMSE = 0.232 °C, MAE = 0.176 °C, NRMSE = 2.018% at the training stage; RMSE = 0.262 °C, MAE = 0.194 °C, NRMSE = 2.155% at the test stage).

Regarding the ability of the single and hybrid models to estimate the daily and monthly temperature parameters at the study sites, Tables 2 and 3 show that, judged by the lower RMSE and MAE values, the models performed better when estimating T_min and T than T_max. However, RMSE and MAE are not reliable statistics for comparing accuracy across different air temperature components, since their values depend on the magnitude of the measured values. A dimensionless index such as the NRMSE is therefore more helpful for this comparison. Given their smaller NRMSE values, it can be concluded that all the single and hybrid models performed better when estimating T_max on both daily and monthly scales than when estimating T_min and T.
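The effect described above can be illustrated with a short sketch. The NRMSE definition used here (RMSE divided by the mean of the observed values, in percent) is an assumption, since this section does not restate the formula; it is at least consistent with the reported test-stage values at Tabriz, where 0.431 °C / 5.423% implies a mean near 7.9 °C for T_min and 0.199 °C / 1.012% a mean near 19.7 °C for T_max. The series below are hypothetical and not taken from the study data.

```python
import math

def rmse(observed, estimated):
    """Root mean square error, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def mae(observed, estimated):
    """Mean absolute error, in the units of the data."""
    n = len(observed)
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / n

def nrmse_percent(observed, estimated):
    """RMSE normalized by the mean observed value (assumed definition), in percent."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, estimated) / mean_observed

# Hypothetical series: identical absolute errors on a cool and a warm variable.
errors = [0.3, -0.3, 0.3, -0.3]
t_min_obs = [4.0, 6.0, 8.0, 10.0]     # mean 7 degC
t_max_obs = [16.0, 18.0, 20.0, 22.0]  # mean 19 degC
t_min_est = [o + e for o, e in zip(t_min_obs, errors)]
t_max_est = [o + e for o, e in zip(t_max_obs, errors)]

for name, obs, est in [("T_min", t_min_obs, t_min_est),
                       ("T_max", t_max_obs, t_max_est)]:
    print(name, round(rmse(obs, est), 3), round(mae(obs, est), 3),
          round(nrmse_percent(obs, est), 3))
```

With equal RMSE and MAE, the warmer variable gets the smaller NRMSE, which is why a ranking of temperature components by NRMSE can differ from a ranking by RMSE or MAE.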

4 Conclusions

In this study, a single AR time series model and an AI-based MLP were used to estimate daily and monthly air temperature parameters, namely T_min, T_max and T. Two sites in Northwestern Iran, Tabriz and Urmia, were used as case studies. The results showed that the single MLP outperformed the AR on a daily scale, whereas the single AR performed much better than the single MLP on a monthly scale for estimating all temperature components. In addition, three types of hybrid models were developed by coupling the linear AR with a non-linear ARCH, and by coupling these time series models with the MLP. Accordingly, the hybrid AR-ARCH, MLP-AR and MLP-AR-ARCH models were tested. It was found that the hybrid AR-ARCH outperformed the single AR. Furthermore, the hybrid MLP-AR and MLP-AR-ARCH were better than the single MLP; however, the hybrid MLP-AR models performed best when estimating the air temperatures of the study regions on both time scales. An external assessment of the MLP was also conducted to evaluate whether the data of an adjacent station could be used to estimate the temperature components of a target site. The results revealed that the air temperatures of a given station could be estimated using the data of a neighboring station, and that the performance of the MLP under an external condition was better than under the local one. Comparing the single and hybrid models across the temperature components showed that, judged by the NRMSE values, all the models performed best when estimating the daily and monthly T_max.

Future research could implement other kinds of hybrid models by coupling the linear moving average (MA), ARMA and ARIMA models with non-linear ones such as ARCH and generalized ARCH (GARCH). These time series models could also be hybridized with AI-based approaches such as ANN, ANFIS and SVM to estimate air temperatures more accurately. Additionally, the hybrid models developed in this study could be used to estimate other meteorological and hydrological variables, such as soil temperature, rainfall and streamflow.
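As a rough illustration of what coupling a base estimator with a linear AR model can look like, the sketch below fits an AR(1) model to the residuals of a base estimate and adds the AR estimate of the residual back to it. This residual-modelling scheme is an assumption made for illustration; this section does not spell out the exact coupling procedure used for the MLP-AR models, and the function names, the AR(1) order, and the synthetic series are all hypothetical.

```python
import math

def fit_ar1(series):
    """Least-squares AR(1) coefficient for a series assumed to be zero-mean."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den if den else 0.0

def couple_with_ar1(observed, base_estimates):
    """Correct each base estimate with an AR(1) estimate of its residual.

    Assumed scheme: residual_t = observed_t - base_t, modelled as
    phi * residual_{t-1}. The first step has no previous residual
    and is left uncorrected.
    """
    residuals = [o - b for o, b in zip(observed, base_estimates)]
    phi = fit_ar1(residuals)
    hybrid = [base_estimates[0]]
    for t in range(1, len(observed)):
        hybrid.append(base_estimates[t] + phi * residuals[t - 1])
    return hybrid, phi

def rmse(observed, estimated):
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

# Synthetic data: the base estimate (a stand-in for any single model)
# carries an error that decays geometrically, i.e. is strongly autocorrelated.
observed = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
synthetic_errors = [0.8 ** t for t in range(len(observed))]
base = [o - e for o, e in zip(observed, synthetic_errors)]

hybrid, phi = couple_with_ar1(observed, base)
print("phi:", round(phi, 3))
print("base RMSE:", round(rmse(observed, base), 3))
print("hybrid RMSE:", round(rmse(observed, hybrid), 3))
```

The sketch fits and evaluates on the same series for brevity; in a setting like the one described here, the coefficient would be fitted on the training period and applied unchanged at the test stage.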

References

Azad A, Kashi H, Farzin S, Singh VP, Kisi O, Karami H, Sanikhani H (2020) Novel approaches for air temperature prediction: A comparison of four hybrid evolutionary fuzzy models. Meteorol Appl 27(1):e1817
Behmanesh J, Mehdizadeh S (2017) Estimation of soil temperature using gene expression programming and artificial neural networks in a semiarid region. Environ Earth Sci. https://doi.org/10.1007/s12665-017-6395-1
Bilgili M, Sahin B (2010) Prediction of long-term monthly temperature and rainfall in Turkey. Energy Sources 32(1):60–71
Box GEP, Jenkins GM (1976) Time series analysis: forecasting and control, Revised. Holden-Day, San Francisco
Cifuentes J, Marulanda G, Bello A, Reneses J (2020) Air temperature forecasting using machine learning techniques: a review. Energies 13(16):4215
Cobaner M, Citakoglu H, Kisi O, Haktanir T (2014) Estimation of mean monthly air temperatures in Turkey. Comput Electron Agric 109:71–79
de Martonne E (1925) Traité de Géographie Physique, 3 tomes. Paris
Deo RC, Ghorbani MA, Samadianfard S, Maraseni T, Bilgili M, Biazar M (2018) Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renew Energy 116:309–323
Dombaycı OA, Golcu M (2009) Daily means ambient temperature prediction using artificial neural network method: a case study of Turkey. Renew Energy 34(4):1158–1161
Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4):987–1007
Fang Y, Fataliyev K, Wang L, Fu X, Wang Y (2014) Improving the genetic-algorithm-optimized wavelet neural network for stock market prediction. In 2014 IEEE International Joint Conference on Neural Networks (IJCNN) pp. 3038–3042
Fathian F, Mehdizadeh S, Kozekalani Sales A, Safari MJS (2019) Hybrid models to improve the monthly river flow prediction: integrating artificial intelligence and non-linear time series models. J Hydrol 575:1200–1213
Gupta S, Wang LP (2010) Stock forecasting with feedforward neural networks and gradual data sub-sampling. Aus J Intell Inform Proc Syst 11(4):14–17
Guan Y, Mohammadi B, Pham BQ, Adarsh S, Balkhair KS, Ur Rahman K, Linh NTT, Quang Tri D (2020) A novel approach for predicting daily pan evaporation in the coastal regions of Iran using support vector regression coupled with krill herd algorithm model. Theor Appl Climatol 142(1–2):349–367
Hudson G, Wackernagel H (1994) Mapping temperature using kriging with external drift: theory and example from Scotland. Int J Climatol 14:77–91
Jahani B, Mohammadi B (2019) A comparison between the application of empirical and ANN methods for estimation of daily global solar radiation in Iran. Theor Appl Climatol 137(1–2):1257–1269
Kaymaz B (2005) Hazards and their impact on human. 29.IMISE (International Movement for Interdisciplinary Study of Estrangement) Conference. The American University of Paris: Paris, 4–9
Khatib T, Mohamed A, Sopian K, Mahmoud M (2012) Estimating ambient temperature for Malaysia using generalized regression neural network. Int J Green Energy 9:195–201
Kisi O, Sanikhani H (2015) Prediction of long-term monthly precipitation using several soft computing methods without climatic data. Int J Climatol 35(14):4139–4150
Lu X, Ju Y, Wu L, Fan J, Zhang F, Li Z (2018) Daily pan evaporation modeling from local and cross-station data using three tree-based machine learning models. J Hydrol 566:668–684
Mehdizadeh S (2018a) Assessing the potential of data-driven models for estimation of long-term monthly temperatures. Comput Electron Agric 144:114–125
Mehdizadeh S (2018b) Estimation of daily reference evapotranspiration (ET_o) using artificial intelligence methods: offering a new approach for lagged ET_o data-based modeling. J Hydrol 559:794–812
Mehdizadeh S (2020) Using AR, MA, and ARMA time series models to improve the performance of MARS and KNN approaches in monthly precipitation modeling under limited climatic data. Water Resour Manage 34(1):263–282
Mehdizadeh S, Kozekalani Sales A (2018) A comparative study of autoregressive, autoregressive moving average, gene expression programming and Bayesian networks for estimating monthly streamflow. Water Resour Manage 32(9):3001–3022
Mehdizadeh S, Behmanesh J, Khalili K (2017) A comparison of monthly precipitation point estimates using integration of soft computing methods and GARCH time series model. J Hydrol 554:721–742
Mehdizadeh S, Behmanesh J, Khalili K (2018a) Comprehensive modeling of monthly mean soil temperature using multivariate adaptive regression splines and support vector machine. Theor Appl Climatol 133(3–4):911–924
Mehdizadeh S, Behmanesh J, Khalili K (2018b) New approaches for estimation of monthly rainfall based on GEP-ARCH and ANN-ARCH hybrid models. Water Resour Manage 32(2):527–545
Mehdizadeh S, Fathian F, Adamowski JF (2019a) Novel hybrid artificial intelligence-time series models for monthly streamflow modeling. Appl Soft Comput 80:873–887
Mehdizadeh S, Fathian F, Safari MJS, Adamowski JF (2019b) Comparative assessment of time series and artificial intelligence models to estimate monthly streamflow: A local and external data analysis approach. J Hydrol 579:124225
Mehdizadeh S, Mohammadi B, Pham QB, Khoy DN, Nhi PTT (2020a) Implementing novel hybrid models to improve indirect measurement of the daily soil temperature: Elman neural network coupled with gravitational search algorithm and ant colony optimization. Measurement 165:108127
Mehdizadeh S, Ahmadi A, Kozekalani Sales A (2020b) Modelling daily soil temperature at different depths via the classical and hybrid models. Meteorol Appl 27(4):e1941
Mehdizadeh S, Kozekalani Sales A, Safari MJS (2020c) Estimating the short-term and long-term wind speeds: implementing hybrid models through coupling machine learning and linear time series models. SN Appl Sci. https://doi.org/10.1007/s42452-020-2830-0
Mehdizadeh S, Fathian F, Safari MJS, Khosravi A (2020d) Developing novel hybrid models for estimation of daily soil temperature at various depths. Soil Till Res 197:104513
Mehdizadeh S, Ahmadi A, Danandeh Mehr A, Safari MJS (2020e) Drought modeling using classic time series and hybrid wavelet-gene expression programming models. J Hydrol 587:125017
Moazenzadeh R, Mohammadi B (2019) Assessment of bio-inspired metaheuristic optimisation algorithms for estimating soil temperature. Geoderma 353:152–171
Mohammadi B, Aghashariatmadari Z (2020) Estimation of solar radiation using neighboring stations through hybrid support vector regression boosted by Krill Herd algorithm. Arab J Geosci 13(10)
Mohammadi B, Ahmadi F, Mehdizadeh S, Guan Y, Pham QB, Linh NTT, Tri DQ (2020a) Developing novel robust models to improve the accuracy of daily streamflow modeling. Water Resour Manage 34:3387–3409
Mohammadi B, Linh NTT, Pham QB, Ahmed AN, Vojteková J, Guan Y, Abba SI, El-Shafie A (2020b) Adaptive neuro-fuzzy inference system coupled with shuffled frog leaping algorithm for predicting river streamflow time series. Hydrol Sci J 65(10):1738–1751
Mohammadi B, Mehdizadeh S (2020) Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric Water Manage 237:106145
Noi PT, Degener J, Kappas M (2017) Comparison of multiple linear regression, Cubist regression, and random forest algorithms to estimate daily air surface temperature from dynamic combinations of MODIS LST data. Remote Sens 9(5):398
Pang B, Yue J, Zhao G, Xu Z (2017) Statistical downscaling of temperature with the random forest model. Adv Meteorol 7265178:1–11
Paniagua-Tineo A, Salcedo-Sanz S, Casanova-Mateo C, Ortiz-Garcia EG, Cony MA, Hernandez-Martin E (2011) Prediction of daily maximum temperature using a support vector regression algorithm. Renew Energy 36(11):3054–3060
Ramesh K, Anitha R (2014) MARSpline model for lead seven-day maximum and minimum air temperature prediction in Chennai, India. J Earth Syst Sci 123(4):665–672
Sahin M (2012) Modelling of air temperature using remote sensing and artificial neural network in Turkey. Adv Space Res 50(7):973–985
Salcedo-Sanz S, Deo RC, Carro-Calvo L, Saavedra-Moreno B (2016) Monthly prediction of air temperature in Australia and New Zealand with machine learning algorithms. Theor Appl Climatol 125(1–2):13–25
Sanikhani H, Kisi O (2012) River flow estimation and forecasting by using two different adaptive neuro-fuzzy approaches. Water Resour Manage 26(6):1715–1729
Sanikhani H, Deo RC, Samui P, Kisi O, Mert C, Mirabbasi R, Gavili S, Yaseen ZM (2018) Survey of different data-intelligent modeling strategies for forecasting air temperature using geographic information as model predictors. Comput Electron Agric 152:242–260
Smith BA, Hoogenboom G, McClendon RW (2009) Artificial neural networks for automated year-round temperature prediction. Comput Electron Agric 68(1):52–61
Sotomayor KAL (2010) Comparison of adaptive methods using multivariate regression splines (MARS) and artificial neural networks backpropagation (ANNB) for the forecast of rain and temperatures in the Mantaro river basin. Hydrol Days. pp. 58–68
Teo KK, Wang L, Lin Z (2001) Wavelet packet multi-layer perceptron for chaotic time series prediction: effects of weight initialization. In: International Conference on Computational Science. Springer: Berlin Heidelberg. pp. 310–317
Ustaoglu B, Cigizoglu HK, Karaca M (2008) Forecast of daily minimum, maximum and mean temperature time series by three artificial neural network methods. Meteorol Appl 15(4):431–445
Wagle S, Uttamani S, Dsouza S, Devadkar K (2019) Predicting surface air temperature using convolutional long short-term memory networks. In: ICCCE. Springer, Singapore, pp 183–188
Wang L, Fu X (2006) Data mining with computational intelligence. Springer, New York
Wang L, Teo KK, Lin Z (2001) Predicting time series with wavelet packet neural networks. In IJCNN'01 IEEE International Joint Conference on Neural Networks. Proceedings (Cat. No. 01CH37222). 3: 1593–1597
Webber H, Ewert F, Kimball BA, Siebert S, White JW, Wall GW, Ottman MJ, Trawally DNA, Gaiser T (2016) Simulating canopy temperature for modelling heat stress in cereals. Environ Model Softw 77:143–155
Zhu M, Wang L (2010) Intelligent trading using support vector regression and multilayer perceptrons optimized with genetic algorithms. In: The 2010 IEEE International Joint Conference on Neural Networks (IJCNN) pp. 1–5

Author information

Authors and Affiliations

College of Hydrology and Water Resources, Hohai University, Nanjing, 210098, China
Babak Mohammadi
Water Engineering Department, Urmia University, Urmia, Iran
Saeid Mehdizadeh
Department of Hydrology and Water Resources Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Farshad Ahmadi
Faculty of Water Resource Engineering, Thuyloi University, Hanoi, 100000, Vietnam
Nguyen Thi Thuy Lien
Environmental Quality, Atmospheric Science and Climate Change Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Quoc Bao Pham
Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Quoc Bao Pham
Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam
Nguyen Thi Thuy Linh
Faculty of Environmental and Chemical Engineering, Duy Tan University, Danang 550000, Vietnam
Nguyen Thi Thuy Linh

Corresponding author

Correspondence to Quoc Bao Pham.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohammadi, B., Mehdizadeh, S., Ahmadi, F. et al. Developing hybrid time series and artificial intelligence models for estimating air temperatures. Stoch Environ Res Risk Assess 35, 1189–1204 (2021). https://doi.org/10.1007/s00477-020-01898-7

Accepted: 06 October 2020
Published: 14 October 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00477-020-01898-7

1 Introduction