1 Introduction

Air temperature is an important weather parameter required in different applications of various agricultural sciences such as agronomy, soil science, agricultural meteorology (Mehdizadeh 2018a), and studies related to climate change caused by air temperature changes (Ustaoglu et al. 2008). It is also an essential variable in atmospheric and environmental studies to predict natural hazards, such as drought and frost caused by variations in air temperature (Kaymaz 2005; Ustaoglu et al. 2008). Air temperature consists of three basic components including the minimum, maximum and mean temperatures. Knowing the minimum temperature is useful to find strategies to counter the risk of frostbite. In turn, maximum temperature in a region helps to determine the potential of solar energy when designing solar power plants.

In terms of agriculture, plant growth is strongly influenced by the air temperature, such that a plant can only grow within a certain range of air temperature (Cobaner et al. 2014; Webber et al. 2016). A site’s suitability for planting a given agricultural crop is commonly determined by the air temperature regimes (Hudson and Wackernagel 1994). Also, seeds grow optimally at a specific range of air temperatures (Cobaner et al. 2014). In addition, soil temperature, an important soil parameter, is greatly affected by weather variables including air temperature, relative humidity, solar radiation, etc. In fact, air temperature is the most important meteorological parameter affecting soil temperature regimes; so, there exists a strong correlation between air and soil temperatures (Behmanesh and Mehdizadeh 2017; Mehdizadeh et al. 2018a, 2020a, b). Moreover, irrigation scheduling is usually based on crop evapotranspiration which is affected by air temperature.

In recent years, artificial intelligence (AI) methods have been extensively and successfully used to estimate time series of meteorological parameters such as air temperature. AI models can approximate a target parameter from a set of input predictors without requiring an explicit description of the underlying physical process. In addition to these AI approaches, various time series models have also been developed (Box and Jenkins 1976); however, these models have received less attention than AI methods for air temperature forecasting. Numerous studies have been published regarding air temperature modeling using AI-based models (Ustaoglu et al. 2008; Smith et al. 2009; Dombayk and Golcu 2009; Bilgili and Sahin 2010; Paniagua-Tineo et al. 2011; Sahin 2012; Cobaner et al. 2014; Pang et al. 2017; Noi et al. 2017; Sanikhani et al. 2018; Azad et al. 2020). Some of their findings are presented below.

Ustaoglu et al. (2008) forecasted the daily minimum, maximum and mean air temperatures of Geyve and Sakarya basins, Turkey. They applied feed forward back propagation (FFBP), radial basis function (RBF), generalized regression neural networks (GRNN), and multiple linear regression (MLR). The RBF was found to perform slightly better than the other methods. Sotomayor (2010) investigated the ability of back propagation (BP) type of artificial neural networks (ANN) and multivariate adaptive regression splines (MARS) to forecast rainfall and temperature in the Mantaro river basin, Peru. The temperature estimates generated by the MARS were superior to the BP model. Khatib et al. (2012) applied the GRNN method to estimate hourly air temperatures in Malaysia, and reported on the ability of this technique to accurately forecast temperature time series. The minimum and maximum air temperatures of Chennai, India, were estimated via the MARS and support vector machine (SVM) techniques by Ramesh and Anitha (2014). The authors concluded that the MARS technique had a higher accuracy than the SVM technique. Cobaner et al. (2014) estimated mean monthly air temperatures of 275 stations located in Turkey, through the adaptive neuro-fuzzy inference system (ANFIS), ANN, and MLR. They found that the ANFIS performed better than the ANN and MLR. Kisi and Sanikhani (2015) modeled the mean monthly temperatures of 50 stations in Iran by using the ANN, ANFIS-subtractive clustering, ANFIS-grid partitioning, gene expression programming (GEP), and support vector regression (SVR). The SVR performed the best out of the different techniques they used. The potential of SVR and multi-layer perceptron (MLP) was tested by Salcedo-Sanz et al. (2016) to estimate mean monthly air temperatures of Australia and New Zealand. They reported a better performance of SVR compared to the MLP. 
The ability of four AI models, namely ANN, ANFIS, MARS, and SVM, was evaluated by Mehdizadeh (2018a) to estimate the mean monthly air temperatures of 50 stations located in Iran. The author documented that the models performed reliably. Sanikhani et al. (2018) used the GRNN, MARS, random forests (RF), and extreme learning machine (ELM) for forecasting the long-term mean monthly air temperatures of Madhya Pradesh, Central India. They concluded that the models could forecast air temperatures via geographical inputs and a periodicity term. Wagle et al. (2019) assessed the performance of long short-term memory (LSTM) for estimating surface air temperature and reported reliable accuracy. In another study, Cifunentes et al. (2020) reviewed recent works on air temperature estimation with AI models and concluded that deep learning and SVR models could predict air temperature with a dependable level of precision. Azad et al. (2020) implemented new hybrid models by coupling the ANFIS with four different types of optimization algorithms for estimating the monthly ambient temperatures of 34 stations in Iran. They found that the hybrid models estimated the monthly air temperatures better than the conventional ANFIS.

The main goals of this study are to (1) apply a linear autoregressive (AR) model and then implement a hybrid linear-nonlinear time series model (i.e., autoregressive-autoregressive conditional heteroscedasticity; AR-ARCH) to estimate the air temperature of Tabriz and Urmia, Northwestern Iran, on both daily and monthly scales; (2) develop other types of hybrid models by hybridizing the single AR and hybrid AR-ARCH with an AI-based model, namely the MLP; (3) compare the performance of all the single and hybrid models developed in this study; and (4) evaluate the accuracy of the MLP under an external condition. An external condition means that the air temperatures at a particular site are estimated using the temperature data of a neighboring location. The literature review reveals that AI-based models have been applied to estimate air temperatures far more extensively than time series-based models. On the other hand, hybrid models have recently received remarkable attention; however, hybrid models that couple AI and time series models have rarely been reported in the literature on air temperature estimation. The main contributions of this research, which have not been addressed in previous works, are the use of a single AR model, the development of the hybrid AR-ARCH, MLP-AR, and MLP-AR-ARCH models, and the evaluation of the MLP under an external condition.

2 Materials and methods

2.1 Case study and data gathering

Two weather stations in Iran, the Tabriz and Urmia stations, were chosen as case studies. Both locations are in Northwestern Iran (Fig. 1) and are classified as having a semi-arid climate according to the climate classification developed by de Martonne (1925).

Fig. 1 The geographical position of the study locations in Northwestern Iran

The air temperature data used in this study, which include the daily and monthly minimum (Tmin), maximum (Tmax) and mean (T) temperatures between 1986 and 2015, were compiled by the Iran Meteorological Organization (IMO). For both stations and time scales, the data from 1986 to 2009 were used as the training datasets, while the data between 2010 and 2015 were used as the test datasets. The time series graphs of the daily and monthly air temperatures of the Tabriz and Urmia stations during the studied period are depicted in Figs. 2 and 3, respectively. As can be seen, the temperature components have similar trends from year to year. Table 1 also summarizes some of the daily and monthly statistical parameters of the data used, including the minimum, maximum, mean, and standard deviation. These statistical parameters are similar for the training and test periods at the two study sites. In the same table, Tmin and Tmax respectively refer to the minimum and maximum values of the standard deviation indicator on both daily and monthly time scales.

Fig. 2 Time series of the daily and monthly air temperature data at Tabriz station during 1986–2015

Fig. 3 Time series of the daily and monthly air temperature data at Urmia station during 1986–2015

Table 1 Daily and monthly statistical parameters of air temperature data at the study areas

Before implementing the models to estimate air temperatures, all the data were standardized using the following equation:

$$T_{S} = \frac{{T_{m} - \overline{{T_{m} }} }}{{\sigma_{{T_{m} }} }}$$
(1)

where \(T_{S}\), \(T_{m}\), \(\overline{{T_{m} }}\), and \(\sigma_{{T_{m} }}\) correspond to the standardized air temperature, the measured air temperature, the mean of the measured air temperatures, and the standard deviation of the measured air temperatures, respectively.
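As a minimal sketch of Eq. (1), the standardization and its inverse can be written as follows. The sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev

def standardize(temps):
    """Apply Eq. (1): subtract the mean and divide by the standard deviation."""
    m = mean(temps)
    s = pstdev(temps)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(t - m) / s for t in temps], m, s

def destandardize(z_scores, m, s):
    """Invert Eq. (1) to map standardized values back to degrees Celsius."""
    return [z * s + m for z in z_scores]

# Hypothetical daily Tmin values in degrees Celsius (illustrative only).
sample = [-2.0, 0.5, 3.0, 4.5, 9.0]
z, m, s = standardize(sample)
restored = destandardize(z, m, s)
```

Keeping the mean and standard deviation returned by `standardize` makes it possible to convert model outputs back to temperature units before computing error statistics.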

2.2 AR and ARCH time series models

Different time series models have been developed to estimate the time series of observed data. AR and other derivations of this model, such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA), are classified as linear models. In the AR model, the value at a given time is expressed as a linear function of the values at earlier times. An AR model can be formulated as follows (Mehdizadeh 2020):

$$Z_{t} (p) = \sum\limits_{i = 1}^{p} {(\varphi_{i} .Z_{t - i} )} + \varepsilon_{t}$$
(2)

where \(Z_{t}\) and \(Z_{t - i}\) denote the standardized data at times t and t-i, respectively; \(p\) is the AR model order; \(\varphi_{i}\) shows the ith coefficient of AR; and \(\varepsilon_{t}\) represents the stochastic series or error rate.
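To illustrate Eq. (2), the sketch below estimates the AR coefficients of a series and produces a one-step estimate. The estimation method (Yule-Walker equations solved with the Levinson-Durbin recursion) is an assumption, as the text does not say how the coefficients were obtained, and the series is synthetic rather than station data.

```python
import random

def autocovariance(z, lag):
    """Biased sample autocovariance of z at the given lag."""
    n = len(z)
    mean = sum(z) / n
    return sum((z[t] - mean) * (z[t - lag] - mean) for t in range(lag, n)) / n

def fit_ar(z, p):
    """Estimate phi_1..phi_p from the Yule-Walker equations using the
    Levinson-Durbin recursion. Returns (phi, noise_variance)."""
    r = [autocovariance(z, k) for k in range(p + 1)]
    phi, err = [], r[0]
    for k in range(1, p + 1):
        refl = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        phi = [phi[j] - refl * phi[k - 2 - j] for j in range(k - 1)] + [refl]
        err *= 1.0 - refl * refl
    return phi, err

def predict_one_step(history, phi):
    """Deterministic part of Eq. (2): sum over i of phi_i * Z_{t-i}."""
    return sum(phi[i] * history[-1 - i] for i in range(len(phi)))

# Synthetic AR(1) series with phi_1 = 0.7 (illustrative, not station data).
random.seed(0)
series = [0.0]
for _ in range(5000):
    series.append(0.7 * series[-1] + random.gauss(0.0, 1.0))

phi_hat, noise_var = fit_ar(series, 1)
next_value = predict_one_step(series, phi_hat)
```

Because the data are standardized beforehand, the series has a mean close to zero, which is why `predict_one_step` carries no intercept term.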

Other than the linear AR, a non-linear ARCH model was tested in this study. In linear models like the AR, more attention is paid to the data’s mean than to its changing variance over time. The ARCH, a non-linear time series model, was initially developed by Engle (1982) and considers variations in the variance of the data. It can be expressed by the following formulas:

$$\sigma_{t}^{2} = a_{o} + \sum\limits_{i = 1}^{m} {b_{i} \varepsilon_{t - i}^{2} }$$
(3)
$$\varepsilon_{t}^{^{\prime}} = \sigma_{t} .Z_{t}$$
(4)

where \(\sigma_{t}^{2}\) denotes the conditional variance; \(a_{o}\) and \(b_{i}\) are the coefficients of ARCH; and \(\varepsilon_{t}^{^{\prime}}\) illustrates the stochastic series achieved by the ARCH. A first order ARCH model (i.e., m = 1 in Eq. 3) was used in this study.
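A small sketch of the first-order case of Eq. (3) may make the recursion concrete. The coefficient values and residuals below are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math

def arch1_conditional_variance(residuals, a0, b1):
    """Eq. (3) with m = 1: sigma_t^2 = a0 + b1 * eps_{t-1}^2.

    Returns the variances for t = 1..n-1, since the first residual
    has no predecessor."""
    if a0 <= 0 or b1 < 0:
        raise ValueError("expected a0 > 0 and b1 >= 0 for a valid variance")
    return [a0 + b1 * e * e for e in residuals[:-1]]

def arch1_series(variances, z):
    """Eq. (4): scale each value of the second factor z by sigma_t."""
    return [math.sqrt(v) * zt for v, zt in zip(variances, z)]

# Hypothetical AR residuals and ARCH coefficients (illustrative only).
eps = [0.1, -0.5, 1.2, -0.3]
sigma2 = arch1_conditional_variance(eps, a0=0.2, b1=0.5)
scaled = arch1_series(sigma2, [1.0, -1.0, 0.5])
```

A large residual at time t-1 (here 1.2) raises the conditional variance at time t, which is the time-varying behaviour that the linear AR model does not represent.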

2.3 MLP

One of the most commonly used types of artificial neural networks is the multilayer perceptron (MLP). In this model, weights and biases can be trained to produce a specific target (Teo et al. 2001; Wang et al. 2006; Fang et al. 2014). The learning rules used in this regard are called perceptron training rules. Perceptron networks are noteworthy for their ability to adapt to input vectors. These networks are especially useful in solving simple classification problems. This type of neural network is very fast and reliable in solving problems (Gupta and Wang 2010; Wang and Teo 2001; Zhu and Wang 2010). It is an effective technique that can capture the non-linear relationship between output and input (Jahani and Mohammadi 2019). The major feature of the MLP is that it processes information through the interactions between neurons, without requiring an advanced mathematical model design. Here, the researchers applied a 3-layered MLP model with a Levenberg–Marquardt (LM) error-correction learning algorithm. Figure 4 illustrates a schematic diagram of the MLP. This network was trained for 1000 epochs, at a learning rate of 0.0012 and a momentum coefficient of 0.85. The model consisted of an input layer, a hidden layer, and an output layer. Equation (5) represents the net input into the hidden and output layers.

$$y_{i} = \sum\limits_{j = 1}^{N} {w_{ji} x_{j} + w_{io} }$$
(5)

where N refers to the total number of nodes in the layer above node i; wji is the weight between node i and node j in the upper layer; xj denotes the output derived from node j; wio represents the bias of node i; and yi denotes the input signal of node i, which is then passed through the transfer function.
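The net input of Eq. (5) and a forward pass through a three-layer network can be sketched as below. The hyperbolic tangent transfer function, the linear output node, and all weight values are assumptions made for illustration; the text does not specify them.

```python
import math

def net_input(weights, inputs, bias):
    """Eq. (5): weighted sum of the upper-layer outputs plus the node bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input layer -> one hidden layer -> single output node.

    Assumption: tanh transfer function in the hidden layer and a linear
    output node."""
    hidden = [math.tanh(net_input(w, x, b)) for w, b in zip(hidden_w, hidden_b)]
    return net_input(out_w, hidden, out_b)

# Hypothetical weights for a network with 2 inputs and 2 hidden nodes.
y_net = net_input([0.5, -1.0], [2.0, 1.0], 0.25)
estimate = forward(
    x=[0.3, -0.1],
    hidden_w=[[0.4, 0.2], [-0.6, 0.1]],
    hidden_b=[0.0, 0.1],
    out_w=[1.0, -0.5],
    out_b=0.05,
)
```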

Fig. 4
figure 4

Structure of the multi-layer perceptron used in this study

The MLP network is trained to produce a set of outputs from a set of inputs. Each of these input or output sets can be thought of as a vector. Training is performed sequentially by presenting input vectors and adjusting the network weights according to a predetermined method. During training, the network weights gradually converge to values for which applying an input vector generates the desired output vector. An important decision in MLP training is when to stop the training process, because a network that is trained for too long becomes prone to over-fitting. To avoid this, the early stopping technique is used. That is, the whole dataset is divided into three categories (training, validation, and testing), and the training process stops whenever the validation error of the network increases. In the present study, over-fitting was controlled by monitoring the evaluation indicators of the models and by inspecting the plot of error versus epochs in the training and validation stages.
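The stopping rule described above can be sketched as a function that scans the validation error per epoch. The `patience` parameter is an assumption added for illustration: it tolerates a few non-improving epochs before stopping, whereas the text describes stopping as soon as the validation error rises (which corresponds to `patience=1`). The error values are hypothetical.

```python
def early_stopping_epoch(val_errors, patience=5):
    """Return (best_epoch, best_error) under an early stopping rule.

    Training is considered stopped once the validation error has failed
    to improve for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop; later epochs are never evaluated
    return best_epoch, best

# Hypothetical validation errors per epoch (illustrative only).
errors = [1.0, 0.8, 0.6, 0.65, 0.7, 0.72, 0.5]
stop_at = early_stopping_epoch(errors, patience=3)
strict = early_stopping_epoch(errors, patience=1)
```

In practice the weights saved at `best_epoch` would be restored, so the retained network is the one with the lowest validation error seen before stopping.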

2.4 Models development

The single AR models were developed by testing different orders (i.e., p in Eq. 2) and then selecting the optimal AR models as those with the lowest Akaike information criterion (AIC). In addition, the hybrid AR-ARCH models were implemented through the following steps:

  • Calculating the error rates obtained via the optimal AR models (i.e., \(\varepsilon_{t}\)).

  • Computing the values of \(\varepsilon_{t}^{2}\) series.

  • Fitting the ARCH model to the \(\varepsilon_{t}^{2}\) values achieved in the previous step and therefore developing the hybrid AR-ARCH models.
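The order selection and the three steps above can be sketched as follows, under stated assumptions: the AIC is written in one common least-squares form, n ln(SSE/n) + 2p, and the ARCH(1) coefficients are obtained by ordinary least squares of the squared residuals on their one-step lag. Neither choice is confirmed by the text, which does not describe the estimation procedure, and the input values are synthetic.

```python
import math

def ar_residuals(z, phi):
    """Step 1: residuals eps_t = Z_t - sum_i phi_i * Z_{t-i} for given phi."""
    p = len(phi)
    return [z[t] - sum(phi[i] * z[t - 1 - i] for i in range(p))
            for t in range(p, len(z))]

def aic(residuals, p):
    """Assumed AIC form for comparing AR orders: n*ln(SSE/n) + 2p."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    if sse <= 0.0:
        raise ValueError("residual sum of squares must be positive")
    return n * math.log(sse / n) + 2 * p

def fit_arch1(residuals):
    """Steps 2 and 3: square the residuals, then regress eps_t^2 on
    eps_{t-1}^2 to obtain (a0, b1) of Eq. (3) with m = 1."""
    sq = [e * e for e in residuals]
    x, y = sq[:-1], sq[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("squared residuals are constant; b1 is undefined")
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1

# Synthetic residuals whose squares follow s_t = 0.2 + 0.5 * s_{t-1} exactly.
squares = [1.0]
for _ in range(9):
    squares.append(0.2 + 0.5 * squares[-1])
eps = [(-1) ** t * math.sqrt(s) for t, s in enumerate(squares)]
a0_hat, b1_hat = fit_arch1(eps)
```

Least squares can return a negative `b1` on noisy data, in which case a constrained or maximum likelihood fit would be the more usual choice.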

The single MLP models were developed by using the one-day and one-month lagged data to estimate the daily and monthly air temperatures of the current day or month. Moreover, the hybrid MLP-AR and MLP-AR-ARCH models were developed by summing the outputs of the single MLP (i.e., the deterministic term) with the outputs of the single AR and hybrid AR-ARCH models (i.e., the stochastic term). The hybrid models were developed because time series models can represent the stochastic term of the data, while AI-based models, such as the MLP, are able to capture the deterministic term. An accurate estimation approach therefore needs to consider both terms, as the hybrid models developed in this study do.
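One possible reading of this combination is sketched below: the MLP estimate is treated as the deterministic term, and a first-order AR model fitted to the MLP residuals supplies the stochastic term that is added to it. This residual-based construction, the AR(1) order, and the no-intercept least-squares fit are all assumptions for illustration; the text states only that the outputs are summed. The numbers are hypothetical.

```python
def fit_ar1_no_intercept(r):
    """Least-squares AR(1) coefficient for a residual series (no intercept)."""
    den = sum(v * v for v in r[:-1])
    if den == 0.0:
        raise ValueError("residual series is all zeros; nothing to model")
    return sum(r[t] * r[t - 1] for t in range(1, len(r))) / den

def hybrid_estimates(measured, mlp_estimates):
    """Add an AR(1) estimate of the residual to each MLP estimate.

    Returns (phi, hybrid), where hybrid covers t = 1..n-1 because the
    first time step has no lagged residual."""
    residuals = [m - e for m, e in zip(measured, mlp_estimates)]
    phi = fit_ar1_no_intercept(residuals)
    hybrid = [mlp_estimates[t] + phi * residuals[t - 1]
              for t in range(1, len(measured))]
    return phi, hybrid

# Hypothetical measured temperatures and MLP estimates with a constant bias.
measured = [1.0, 2.0, 3.0, 4.0]
mlp_out = [0.5, 1.5, 2.5, 3.5]
phi, hybrid = hybrid_estimates(measured, mlp_out)
```

In this toy case the MLP error is perfectly persistent, so the stochastic term removes it entirely; real residuals would be only partly predictable.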

All steps in the development of models explained above are related to the local assessment of applied models. A local assessment means that the air temperatures of a particular location are estimated using the temperature data at that same site. In addition to the local assessment of models, the performance of MLP was also evaluated under an external assessment using the air temperatures of an adjacent site to estimate the air temperatures at each desired location.

2.5 Performance assessment criteria

Here, the root mean square error (RMSE), mean absolute error (MAE), and normalized RMSE (NRMSE) were used for assessing the efficiency of all the models to estimate the daily and monthly air temperature as follows (Guan et al. 2020):

$$RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {T_{m,i} - T_{e,i} } \right)^{2} } }}{N}}$$
(6)
$$MAE = \frac{{\sum\nolimits_{i = 1}^{N} {\left| {T_{m,i} - T_{e,i} } \right|} }}{N}$$
(7)
$$NRMSE = \frac{{\sqrt {\frac{{\sum\nolimits_{i = 1}^{N} {\left( {T_{m,i} - T_{e,i} } \right)^{2} } }}{N}} }}{{\overline{{T_{m} }} }} \times 100\%$$
(8)

where \(T_{m,i}\), \(T_{e,i}\), \(\overline{{T_{m} }}\), and N denote the ith measured air temperature, the ith estimated air temperature via the single and hybrid models, mean of the measured air temperature data, and the total number of observational data, respectively. A lower value of these metrics indicates a better performance by any given model to estimate the daily and monthly Tmin, Tmax and T.
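Equations (6) to (8) translate directly into code. The sketch below is a plain implementation of the three criteria; the sample values are hypothetical.

```python
import math

def rmse(measured, estimated):
    """Eq. (6): root mean square error."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)

def mae(measured, estimated):
    """Eq. (7): mean absolute error."""
    n = len(measured)
    return sum(abs(m - e) for m, e in zip(measured, estimated)) / n

def nrmse(measured, estimated):
    """Eq. (8): RMSE divided by the mean of the measured values, in percent."""
    mean_measured = sum(measured) / len(measured)
    if mean_measured == 0:
        raise ZeroDivisionError("mean of measured values is zero")
    return rmse(measured, estimated) / mean_measured * 100.0

# Hypothetical measured and estimated temperatures in degrees Celsius.
t_measured = [10.0, 12.0, 14.0]
t_estimated = [11.0, 12.0, 13.0]
scores = (rmse(t_measured, t_estimated),
          mae(t_measured, t_estimated),
          nrmse(t_measured, t_estimated))
```

Because the NRMSE divides by the mean measured temperature, it becomes large when that mean is close to zero, which matters when comparing Tmin with Tmax.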

3 Results and discussion

3.1 Local assessment of the single and hybrid models

First, AR models of various orders (i.e., p in Eq. 2) were examined. Then, the AR models with the smallest AIC were selected as the best AR models. For example, the AR(15), AR(12) and AR(14) are the best AR models on a daily scale for estimating Tmin, Tmax and T at Urmia station, respectively. In addition, the optimal AR models for estimating Tmin, Tmax and T on a monthly scale at this location are the AR(4), AR(3) and AR(4) models. The values of the statistical indicators including the RMSE, MAE, and NRMSE for the single AR models during both training and test stages at Tabriz and Urmia are summarized in Tables 2 and 3. As can be seen, the single AR models are able to estimate Tmin, Tmax and T on both studied time scales, and with a particularly high level of accuracy on a monthly scale.

Table 2 Error statistics of the standalone and hybrid models at Tabriz station (local assessment)
Table 3 Error statistics of the standalone and hybrid models at Urmia station (local assessment)

After that, the performance of the single linear AR models was improved by combining them with a non-linear time series model, namely the ARCH. Accordingly, the hybrid AR-ARCH models were developed and tested for estimating the daily and monthly temperature components. Tables 2 and 3 present the values of the error criteria calculated for the hybrid AR-ARCH models. The results clearly demonstrate that hybridizing the linear AR with a non-linear ARCH model leads to better estimates of Tmin, Tmax and T at the study locations on both daily and monthly scales. For example, based on Table 2, the values of RMSE, MAE and NRMSE for the single AR(26) when estimating the Tmin at Tabriz station on a daily scale are, respectively, 2.221 °C, 1.694 °C, 28.947% (training period), 2.343 °C, 1.781 °C, 29.302% (test period), while these statistics improve to 0.445 °C, 0.340 °C, 5.806% (training period), 0.465 °C, 0.356 °C, 5.816% (test period) via the hybrid AR(26)-ARCH model. Assessing the performance of the single and hybrid time series models revealed that the improvement of the hybrid AR-ARCH over the single AR was largest for Tmin estimation on a daily basis.

In addition to the single and hybrid time series models, an AI-based model, the MLP, was developed in this study. The performance of this method depends on the number of neurons in the hidden layer. Therefore, a series of trials was conducted to determine the optimum number of hidden layer nodes by selecting the configuration with the least error. Table 4 tabulates the optimal number of hidden layer nodes for the MLP models developed at the study locations for both time scales. As seen, these range from 2 (estimating monthly Tmax) to 26 (estimating daily Tmin) at Tabriz station, and from 3 (estimating daily T and monthly Tmin) to 17 (estimating monthly T) at Urmia station for local assessment. To implement the MLP models, one-day and one-month lagged Tmin, Tmax and T data were used as inputs to estimate the temperature time series of the current day and month. The error criteria RMSE, MAE and NRMSE computed for the single MLP models at Tabriz and Urmia stations are shown in Tables 2 and 3, respectively. Clearly, the lagged temperature data can be used to estimate the daily and monthly temperature components of the current day and month.

Table 4 Optimal number of hidden layer neurons for the MLP models developed at the study locations

Besides the hybrid AR-ARCH time series model, this study developed other types of hybrid models by combining the single AR and hybrid AR-ARCH models with the MLP, which led to the hybrid artificial intelligence-time series models (i.e., MLP-AR and MLP-AR-ARCH). The values of the statistical indicators obtained for these hybrid models are shown in Tables 2 and 3. Evaluating the performance of the single and hybrid models revealed that better estimates of daily and monthly Tmin, Tmax and T can be achieved by integrating the AR and AR-ARCH models with the MLP via the hybrid MLP-AR and MLP-AR-ARCH models, particularly the MLP-AR. For example, based on Table 3, the values of RMSE, MAE and NRMSE for the single MLP for Tmin estimation on a daily scale are, respectively, 2.185 °C, 1.668 °C, 41.499% (training period), 2.325 °C, 1.799 °C, 44.151% (test period), while these statistics improve to 0.404 °C, 0.302 °C, 7.664% (training period), 0.452 °C, 0.328 °C, 8.579% (test period) for the hybrid MLP-AR(15), and to 1.830 °C, 1.379 °C, 34.758% (training period), 2.007 °C, 1.489 °C, 38.106% (test period) for the hybrid MLP-AR(15)-ARCH model. As already noted, time series models like the AR and AR-ARCH capture the stochastic component of the data and AI techniques such as the MLP capture the deterministic component, whereas the hybrid models developed in this study include both terms in their estimations.

Similarly, previous works have confirmed the suitability and higher precision of hybrid models generated by combining AI and time series models compared to single AI models. These studies developed hybrid models by coupling various time series and AI approaches for the estimation of hydrological and meteorological time series including reference evapotranspiration (Mohammadi and Mehdizadeh 2020; Mehdizadeh 2018b), river flow (Mehdizadeh and Kozakalani Sales 2018; Fathian et al. 2019; Mehdizadeh et al. 2019a, b; Mohammadi et al. 2020a, b), precipitation (Mehdizadeh 2020; Mehdizadeh et al. 2017, 2018b), wind speed (Mehdizadeh et al. 2020c), soil temperature (Moazenzadeh and Mohammadi 2019; Mehdizadeh et al. 2020d), and solar radiation (Mohammadi and Aghashariatmadari 2020). It was found that the estimates of the hybrid models were better than those of the single AI methods.

Radar diagrams were then prepared to show graphically the estimation accuracy of all the developed models in terms of RMSE values during the test phase; these are depicted in Figs. 5 and 6 for the Tabriz and Urmia stations, respectively. It can be clearly observed that the hybrid AR-ARCH, MLP-AR, and MLP-AR-ARCH models yielded lower RMSE than the corresponding standalone AR and MLP models. This verifies the superior performance of the implemented hybrid models compared to the single models in estimating the air temperature parameters (Tmin, Tmax and T).

Fig. 5 Radar graphs for the RMSE values obtained through the different models at Tabriz station during the test stage

Fig. 6 Radar graphs for the RMSE values obtained through the different models at Urmia station during the test stage

3.2 External assessment of the MLP models

Following the local assessment of the MLP model, it was externally assessed as well. This type of evaluation is particularly important when the local data at a given station are not available as input for the AI approaches. In those cases, the data of a nearby station could be used to estimate the Tmin, Tmax and T parameters at the desired station. The sites used in this study, Urmia and Tabriz, are located close to each other in Northwestern Iran and have similar climatic characteristics (i.e., semi-arid). Hence, they can be regarded as neighboring stations, and the daily and monthly Tmin, Tmax and T parameters at one station were estimated using the same-day or same-month data of the adjacent station. The optimal numbers of hidden layer nodes for the external assessment of the MLP are presented in Table 4. As can be seen, they vary between 1 (for estimating monthly Tmax) and 14 (for estimating monthly T) at Tabriz station, and between 1 (for estimating monthly Tmin) and 24 (for estimating daily T) at Urmia station. Tables 5 and 6, respectively, report the values of the error indices obtained by the MLP model for the external assessment. A performance comparison of the single MLP developed under local and external evaluation conditions (i.e., Tables 2, 3 and 5, 6) demonstrates that the accuracy of the MLP models developed under the external condition is higher than under the local one for both time scales and for Tmin, Tmax and T. As an example, the values of RMSE, MAE and NRMSE in Table 3 for the single MLP under a local assessment for the Tmax on a daily scale at Urmia station are, respectively, 2.188 °C, 1.672 °C, 12.331% (training period), 2.333 °C, 1.779 °C, 12.238% (test period), while these statistics improve to 1.329 °C, 1.009 °C, 7.491% (training period), 1.389 °C, 1.078 °C, 7.287% (test period) via the single MLP under an external condition (Table 6).
Therefore, it can be concluded that proper selection of adjacent stations can improve the results of AI techniques for an external assessment over that of a local evaluation.

Table 5 Error statistics of the standalone MLP models at Tabriz station (external assessment)
Table 6 Error statistics of the standalone MLP models at Urmia station (external assessment)

This type of evaluation of AI-based models such as the MLP has also been considered in previous works on estimating meteorological and hydrological time series. Some of these studies are briefly presented here. The climatic parameters of a nearby location were used by Mehdizadeh (2018b) to estimate the daily reference evapotranspiration of a target site. In the field of streamflow modeling, Sanikhani and Kisi (2012) and Mehdizadeh et al. (2019b) evaluated the performance of AI models in modeling the streamflows of a target station using the data of an adjacent hydrometric location. Moreover, other studies in the literature have reported on the applicability of adjacent stations' data for modeling the intended parameter at a target site, including pan evaporation estimation (Lu et al. 2018), wind speed prediction (Deo et al. 2018), and drought modeling (Mehdizadeh et al. 2020e). The outcomes of these works indicated that the data of an adjacent station could be applied to model the studied problem at the target location under an external evaluation.

3.3 Performance comparison of all models developed

As concluded from the previous sections, the hybrid time series model (i.e., AR-ARCH) performed better than the single AR when estimating the temperature parameters at both daily and monthly scales. Additionally, the hybrid MLP-AR and MLP-AR-ARCH models yielded better results than the single MLP models, with the MLP-AR models developed at the study locations performing the best. Evaluating the performance of the single AR and MLP under a local condition showed that the single AR models have better accuracy than the MLP models at both stations on a monthly scale. In contrast, the MLP models under the local condition performed better when estimating the Tmin, Tmax and T on a daily scale at Tabriz station. At Urmia station on a daily scale, the AR performed better than the local MLP when estimating Tmin, whereas the MLP models showed better statistics than the AR models when estimating Tmax and T. As for the hybrid models (i.e., AR-ARCH, MLP-AR and MLP-AR-ARCH), the hybrid AR-ARCH models outperformed the local hybrid MLP-AR-ARCH models when estimating all temperature parameters at both study locations. However, the MLP-AR models developed for a local condition outperformed the hybrid AR-ARCH, except when estimating the daily Tmin and monthly T at Urmia, where the AR-ARCH models presented slightly better accuracy than the local MLP-AR models. Also, the MLP models under an external condition led to better estimates of the daily and monthly Tmin, Tmax and T parameters than the local MLP models for both study regions.
The most accurate models at the Tabriz station for estimating Tmin, Tmax, and T on a monthly scale are, respectively, MLP-AR(3) (RMSE = 0.379 °C, MAE = 0.204 °C, NRMSE = 4.977% at the training stage, RMSE = 0.431 °C, MAE = 0.245 °C, NRMSE = 5.423% at the test stage), MLP-AR(1) (RMSE = 0.165 °C, MAE = 0.112 °C, NRMSE = 0.883% at the training stage, RMSE = 0.199 °C, MAE = 0.159 °C, NRMSE = 1.012% at the test stage), and MLP-AR(1) (RMSE = 0.311 °C, MAE = 0.216 °C, NRMSE = 2.366% at the training stage, RMSE = 0.255 °C, MAE = 0.194 °C, NRMSE = 1.846% at the test stage). Moreover, the most precise estimates of the temperature parameters at Urmia station were obtained by the AR(15)-ARCH when estimating the daily Tmin (RMSE = 0.249 °C, MAE = 0.137 °C, NRMSE = 4.727% at the training stage, RMSE = 0.253 °C, MAE = 0.140 °C, NRMSE = 4.801% at the test stage), the MLP-AR(12) when estimating the daily Tmax (RMSE = 0.315 °C, MAE = 0.243 °C, NRMSE = 1.774% at the training stage, RMSE = 0.364 °C, MAE = 0.277 °C, NRMSE = 1.911% at the test stage), and the MLP-AR(14) when estimating the daily T (RMSE = 0.232 °C, MAE = 0.176 °C, NRMSE = 2.018% at the training stage, RMSE = 0.262 °C, MAE = 0.194 °C, NRMSE = 2.155% at the test stage).

Regarding the ability of the single and hybrid models to estimate the daily and monthly temperature parameters at the study sites, Tables 2 and 3 show that, judged by the lower values of the RMSE and MAE indicators, the models performed better when estimating Tmin and T than Tmax. However, these criteria are not reliable for comparing the accuracy of models across different air temperature components, since their values depend on the magnitude of the measured values. A dimensionless index, like the NRMSE, is therefore more helpful for such comparisons. Based on the smaller NRMSE values, it can be concluded that all the single and hybrid models performed better when estimating Tmax on both daily and monthly scales than when estimating the Tmin and T parameters.
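The effect of the normalization can be checked with the figures quoted earlier for the single MLP at Urmia on a daily scale (training period): Tmin with RMSE = 2.185 °C and NRMSE = 41.499%, and Tmax with RMSE = 2.188 °C and NRMSE = 12.331%. Rearranging Eq. (8) gives the mean measured temperature implied by each pair. The helper name below is hypothetical.

```python
def implied_mean(rmse, nrmse_percent):
    """Rearranged Eq. (8): mean measured temperature = 100 * RMSE / NRMSE."""
    return 100.0 * rmse / nrmse_percent

# (RMSE in degrees Celsius, NRMSE in percent) as quoted from Table 3 in the text.
reported = {"Tmin": (2.185, 41.499), "Tmax": (2.188, 12.331)}
means = {name: implied_mean(r, n) for name, (r, n) in reported.items()}
```

The two RMSE values are almost identical, yet the implied mean for Tmin is roughly a third of that for Tmax, which alone accounts for the much larger NRMSE of Tmin.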

4 Conclusions

In this study, a single AR time series model and an AI-based MLP were used to estimate the daily and monthly air temperature parameters Tmin, Tmax and T. Two sites in Northwestern Iran, Tabriz and Urmia, were used as case studies. The results showed that the single MLP generally outperformed the AR on a daily scale, whereas the single AR performed much better than the single MLP on a monthly scale for estimating all temperature components. In addition, three types of hybrid models were developed by coupling the linear AR with a non-linear ARCH, as well as by coupling these time series models with the MLP. Accordingly, the hybrid AR-ARCH, MLP-AR and MLP-AR-ARCH models were tested. It was found that the hybrid AR-ARCH outperformed the single AR. Furthermore, the hybrid MLP-AR and MLP-AR-ARCH were better than the single MLP, and the hybrid MLP-AR models generally performed best when estimating the air temperatures of the study regions on both time scales. An external assessment of the MLP was also conducted to evaluate whether the data of an adjacent station could be used to estimate the temperature components of a target site. The results revealed that the air temperatures of a given station could be estimated using the data of a neighboring station, and that the performance of the MLP under an external condition was better than under the local one. In terms of NRMSE, all the models performed best when estimating the daily and monthly Tmax.

Future research could implement diverse kinds of hybrid models by coupling the linear moving average (MA), ARMA and ARIMA models with non-linear ones such as ARCH, generalized ARCH (GARCH), etc. The aforementioned time series models could also be hybridized with AI-based approaches such as ANN, ANFIS, SVM and so on to more accurately estimate air temperatures. Additionally, the hybrid models developed in this study could be used to estimate other meteorological and hydrological data, such as soil temperature, rainfall, streamflow, etc.