Introduction

Dew is the condensation of atmospheric moisture on objects whose temperatures are lower than the dew point temperature of the surrounding air. Dew forms once the surface air temperature declines to the dew point temperature. Therefore, the dew point temperature can be defined as the temperature at which water vapor in the air starts to condense into dew or water droplets. The radiation exchange between the Earth's surface and the atmosphere, water vapor pressure and turbulent heat are among the major elements influencing dew formation (Atzema et al. 1990; Shiri et al. 2014). From an agricultural vantage point, dew can reduce the vapor pressure deficit in the proximity of the dew drops, resulting in more favorable photosynthesis (Slatyer 1967), and can improve the recovery of water content after extreme water losses (Went 1955).

The availability of accurate and reliable dew point temperature data plays a notable role in various hydrological, climatological and agronomical studies. Dew point temperature is typically used along with relative humidity to identify the level of moisture in the air. It can also be used in conjunction with wet-bulb temperature to compute the ambient temperature, which makes it possible to prepare for potential frosts that may harm crops (Snyder and Melo-Abreu 2005; Shank 2006). Dew point temperature can be used to provide a favorable estimate of the near-surface humidity that influences stomatal closure in plants, where low humidity may reduce plant productivity (Kimball et al. 1997; Shank 2006). Dew is significant for plant survival, particularly in arid areas with rare rainfall (Agam and Berliner 2006). The dew point temperature is an especially significant element in various hydrological and climatological models for estimating reference evapotranspiration (Hubbard et al. 2003; Shank 2006). Dew point temperature data may also be useful in flood studies for dam projects. Furthermore, dew point depression, defined as the difference between the air temperature and the dew point temperature, influences the water balance by changing the amount of water evaporated or transpired from a basin. Therefore, knowledge of dew point temperature is also useful for this purpose.

Over recent years, artificial intelligence and computational intelligence techniques have been successfully applied to the estimation of dew point temperature.

Shank et al. (2008) used the ANN technique to predict dew point temperature from 1 to 12 h ahead based upon previous weather data sets. They used measured data from 20 stations in the state of Georgia, USA, to develop general models for predicting dew point temperature across the whole state. Zounemat-Kermani (2012) evaluated the capability of multiple linear regression (MLR) and a Levenberg–Marquardt (LM) feed-forward neural network for estimating hourly dew point temperature at a location in Ontario, Canada. The LM-NN model was found to be more accurate than the MLR model. Nadig et al. (2013) developed combined air and dew point temperature models using the ANN technique to improve the predictions of both temperatures. Their results demonstrated that the combined method reduces the prediction error. Shiri et al. (2014) assessed the capability of two ANN models and the gene expression programming (GEP) technique to estimate daily dew point temperature at two stations in Korea. Their results indicated that the GEP model outperforms the ANN models. Kim et al. (2014) used two soft computing techniques to estimate daily dew point temperature in California, USA. By comparison with a conventional regression model, they found that the developed soft computing models are more precise in estimating daily dew point temperature.

Mohammadi et al. (2016) applied the adaptive neuro-fuzzy inference system (ANFIS) to select the most influential parameters for prediction of daily dew point temperature. They analyzed the influence of eight different parameters on dew point temperature prediction in two cities of Iran. Their results showed that, despite the climate difference between the two study areas, water vapor pressure is the most relevant parameter and relative humidity is the least relevant parameter for both cities. They concluded that using more than two input parameters is not advisable.

As a key artificial intelligence method, the neural network (NN) is capable of solving complicated nonlinear problems which cannot be solved easily via classic parametric methods. The main shortcoming of NNs is their learning time requirement. Huang et al. (2004) introduced the extreme learning machine (ELM) as an algorithm for a single-layer feedforward neural network (SLFN). The algorithm can reduce the time required for training an NN (Huang et al. 2004). Because it simplifies training, the ELM is able to produce good generalization performance and speed up the learning process (Huang et al. 2004; Liang et al. 2006). The ELM also learns faster than traditional algorithms such as back-propagation (BP) (Huang et al. 2015). As a consequence, several studies have applied the ELM algorithm to solve problems in different realms of science (Ghouti et al. 2013; Zhao et al. 2013; Nian et al. 2014; Wang and Han 2014; Wang et al. 2014; Wong et al. 2015; Yu et al. 2014).

Lately, coupling different approaches to build a hybrid model has received considerable attention, since it makes it possible to exploit the specific nature of each technique to enhance precision. In fact, the particular features of each technique are able to capture different patterns in the data series. Theoretical and empirical findings have shown that hybrid approaches can be particularly effective and promising for enhancing prediction accuracy and reliability in different scientific fields (Partal and Kisi 2007; Kisi and Cimen 2012; Liang et al. 2012; Kavousi-Fard et al. 2014; Xiong et al. 2014; Sudheer et al. 2014; Shrivastava and Panigrahi 2014; Shamshirband et al. 2015, 2016; Mohammadi et al. 2015; Olatomiwa et al. 2015).

Therefore, in this study, the ELM is coupled with the wavelet transform (WT) algorithm to propose a hybrid approach for prediction of daily dew point temperature. To test the validity of the proposed method, named ELM-WT, daily weather data sets for the port of Bandar Abass, located in the southern coastal part of Iran, have been used as a case study. Wavelet analysis is used for preprocessing and decomposing the time series of weather data into its various components, after which the decomposed components are used as inputs for the ELM model. The merit of the proposed ELM-WT approach is validated against the ELM, support vector machine (SVM) and artificial neural network (ANN) approaches. The performance evaluations are conducted on the basis of several widely used statistical parameters to provide an appropriate comparative study.

Methodology

In this research work, a new method that hybridizes the extreme learning machine (ELM) with the wavelet transform (WT) algorithm is proposed to predict daily dew point temperature. The precision of the hybrid ELM-WT method is compared with the support vector machine (SVM) and artificial neural network (ANN) approaches. This section describes the employed methods.

Extreme learning machine (ELM)

Huang et al. (2004) introduced the ELM as a learning algorithm for the single-layer feedforward neural network (SLFN) architecture (Huang et al. 2004, 2006a). The ELM assigns the input weight parameters randomly and determines the output weight parameters analytically (Huang et al. 2015).

Single hidden layer feed-forward neural network (SLFN)

The SLFN function with additive or radial basis function (RBF) hidden nodes can be represented in a unified manner as (Huang et al. 2006b; Liang et al. 2006):

$$f_{L} \left( x \right) = \mathop \sum \limits_{i = 1}^{L} \beta_{i} G\left( {a_{i} ,b_{i} ,x} \right),\quad x \in R^{n} ,\quad a_{i} \in R^{n}$$
(1)

where \(f_{L} \left( x \right)\) is the output function of the ELM for generalized SLFNs, x is the input, \(a_{i}\) and \(b_{i}\) are the learning parameters of the hidden nodes, L is the number of hidden nodes and \(\beta_{i}\) is the weight connecting the ith hidden node to the output node. \(G\left( {a_{i} ,b_{i} ,x} \right)\) is the output of the ith hidden node with respect to the input x. The additive hidden node with the activation function \(g\left( x \right):R \to R\) (e.g., sigmoid or threshold) is (Huang et al. 2006a):

$$G\left( {a_{i} ,b_{i} ,x} \right) = g\left( {a_{i} \cdot x + b_{i} } \right),\quad b_{i} \in R$$
(2)

where \(a_{i}\) is the weight vector connecting the input layer to the ith hidden node, \(b_{i}\) is the bias of the ith hidden node, and \(a_{i} \cdot x\) denotes the inner product of the vectors \(a_{i}\) and x in \(R^{n}\). For an RBF hidden node with activation function \(g\left( x \right):R \to R\), \(G\left( {a_{i} ,b_{i} ,x} \right)\) is given by (Huang et al. 2006a):

$$G\left( {a_{i} ,b_{i} ,x} \right) = g\left( {b_{i} \left\| {x - a_{i} } \right\|} \right),\quad b_{i} \in R^{ + }$$
(3)

where \(a_{i}\) and \(b_{i}\) are the center and impact factor of the ith RBF node, respectively, and \(R^{+}\) is the set of all positive real values. An SLFN whose hidden layer contains RBF nodes is a special case known as the RBF network. Consider N arbitrary distinct samples \(\left( {x_{j} ,t_{j} } \right) \in R^{n} \times R^{m}\), where \(x_{j}\) is the n × 1 input vector and \(t_{j}\) is the m × 1 target vector. If an SLFN with L hidden nodes can approximate these N samples with zero error, then there exist \(\beta_{i}\), \(a_{i}\) and \(b_{i}\) such that (Huang et al. 2006a):

$$f_{L} \left( {x_{j} } \right) = \mathop \sum \limits_{i = 1}^{L} \beta_{i} G\left( {a_{i} ,b_{i} ,x_{j} } \right) = t_{j} ,\quad j = 1, \ldots ,N.$$
(4)

Equation (4) may be written compactly as:

$$H\beta = T$$
(5)

where

$$H(\tilde{a},\tilde{b},\tilde{x}) = \left[ {\begin{array}{*{20}c} {G\left( {a_{1} ,b_{1} ,x_{1} } \right)} & \cdots & {G\left( {a_{L} ,b_{L} ,x_{1} } \right)} \\ \vdots & \ddots & \vdots \\ {G\left( {a_{1} ,b_{1} ,x_{N} } \right)} & \cdots & {G\left( {a_{L} ,b_{L} ,x_{N} } \right)} \\ \end{array} } \right]_{N \times L}$$
(6)

with \(\tilde{a} = a_{1} , \ldots ,a_{L} ;\quad \tilde{b} = b_{1} , \ldots ,b_{L} ;\quad \tilde{x} = x_{1} , \ldots ,x_{N}\), and

$$\beta = \left[ {\begin{array}{*{20}c} {\beta_{1}^{T} } \\ \vdots \\ {\beta_{L}^{T} } \\ \end{array} } \right]_{L \times m} \quad {\text{and}}\quad T = \left[ {\begin{array}{*{20}c} {t_{1}^{T} } \\ \vdots \\ {t_{N}^{T} } \\ \end{array} } \right]_{N \times m}$$
(7)

where H is the hidden layer output matrix of the SLFN. The ith column of H is the output of the ith hidden node for the inputs \(x_{1} , \ldots ,x_{N}\). In this study, 13 hidden nodes are used.

Principle of ELM

An ELM designed as an SLFN with L hidden neurons can learn L distinct samples with zero error (Huang et al. 2004, 2006a). Even when the number of hidden neurons (L) is smaller than the number of distinct samples (N), the ELM can still assign random parameters to the hidden nodes and compute the output weights from the pseudo-inverse of H, incurring only a small error ε > 0. The hidden node parameters of the ELM need not be tuned during training and can simply be assigned random values.
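The training principle described above can be illustrated with a minimal NumPy sketch, assuming additive sigmoid hidden nodes as in Eq. (2). The function names `elm_fit` and `elm_predict`, the uniform range of the random parameters and the seed are illustrative assumptions rather than details reported in this study; only the number of hidden nodes (13) is taken from the text.

```python
import numpy as np


def _hidden_output(X, a, b):
    # Hidden layer output matrix H (N x L) for additive sigmoid nodes, Eq. (2)
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))


def elm_fit(X, T, n_hidden=13, seed=0):
    """Fit an ELM: random hidden parameters, analytic output weights.

    X is (N, n) inputs, T is (N, m) targets. The sampling range
    [-1, 1] is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights a_i
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # biases b_i
    H = _hidden_output(X, a, b)
    # Least-squares solution of H beta = T (Eq. 5) via the pseudo-inverse of H
    beta = np.linalg.pinv(H) @ T
    return a, b, beta


def elm_predict(X, a, b, beta):
    # Reuse the fixed random hidden parameters, then apply the output weights
    return _hidden_output(X, a, b) @ beta
```

Because the hidden parameters are drawn once and never updated, training reduces to a single linear least-squares problem, which is the source of the speed advantage over iterative back-propagation.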

Discrete wavelet transform

The wavelet transform (WT) provides the mathematical basis for decomposing a time series signal into various frequency components (Mallat 1989; Mallat 2009; Peng and Chu 2004), and is a signal processing algorithm developed from the Fourier transform. Its key advantage over the Fourier transform is that the decomposed components are analyzed at a resolution matched to their scale, so that the required information can be obtained at various levels, which can benefit the model under study (Adamowski and Chan 2011). The WT is well suited to data analysis in both the time and frequency domains owing to its ability to extract information from transient and non-periodic signals (Jawerth and Sweldens 1994). Recently, these methods have generated enormous interest for engineering applications (Burrus et al. 1997; Kalteh 2013).

The continuous wavelet transform (CWT) of a signal f(t) is a time-scale signal processing technique defined as

$$W_{f} (a,b,\psi ) = \frac{1}{{\sqrt {\left| a \right|} }}\int_{ - \infty }^{\infty } {f(t)\psi^{*} \left( {\frac{t - b}{a}} \right){\text{d}}t},\,b \in R,\,a \in R,\,a \ne 0$$
(8)

where ψ is the mother wavelet function, ψ* denotes the complex conjugate of ψ, t is the time, a is the scale (dilation) parameter and b is the time-shifting (translation) parameter. Discretizing Eq. (8) gives the discrete wavelet transform (DWT), in which the parameters a and b take the values:

$$a = a_{0}^{m} ,\,\quad b = na_{0}^{m} b_{0} ,\,\quad a_{0} > 1,\,\quad b_{0} \in R,$$
(9)

where m and n are integers that control the scale and translation, respectively, \(a_{0}\) is a fixed dilation step, and \(b_{0}\) is a translation factor that depends on the aforementioned dilation step.

The DWT of f(t) can be written as:

$$W_{f} (m,n,\psi ) = a_{0}^{ - m/2} \int_{ - \infty }^{\infty } {f(t)\psi^{*} \left( {a_{0}^{ - m} t - nb_{0} } \right){\text{d}}t}$$
(10)

According to Mallat (1989), when \(a_{0} = 2\) and \(b_{0} = 1\), Eq. (10) becomes the binary wavelet transform:

$$W_{f} (m,n,\psi ) = 2^{ - m/2} \int_{ - \infty }^{\infty } {f(t)\psi^{*} \left( {2^{ - m} t - n} \right){\text{d}}t}$$
(11)

\(W_{f} (a,b,\psi )\) or \(W_{f} (m,n,\psi )\) represents the features of the original time series in the frequency domain (a or m) and the time domain (b or n) simultaneously (Wang and Ding 2003). When a or m takes small values, the frequency resolution of the wavelet transform is low but the time resolution is high. When a or m takes large values, the frequency resolution is high but the time resolution is low.
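As an illustration of the binary transform, the following sketch performs one decomposition level with the Haar wavelet, the simplest mother wavelet for the case \(a_{0} = 2\), \(b_{0} = 1\). The choice of Haar is an assumption made purely for illustration, since the mother wavelet used in this study is not stated here, and the function names are hypothetical. The helper `haar_components` returns full-length smooth and fluctuation sub-series whose sum restores the signal, which is the form in which decomposed components could be passed to a model as inputs.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT for an even-length sequence."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Approximation: scaled pairwise sums (low-frequency content)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Detail: scaled pairwise differences (high-frequency content)
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the Haar DWT."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def haar_components(signal):
    """Split a signal into full-length smooth and fluctuation sub-series."""
    approx, detail = haar_dwt(signal)
    zeros = [0.0] * len(approx)
    smooth = haar_idwt(approx, zeros)   # reconstruction from approximation only
    fluct = haar_idwt(zeros, detail)    # reconstruction from detail only
    return smooth, fluct
```

Applying `haar_dwt` again to the approximation coefficients yields the next, coarser level, which mirrors increasing m in Eq. (10).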

Support vector machine (SVM)

SVM is a soft computing method that has been applied in a large number of scientific fields (Lee and Verri 2003; Lu and Wang 2005; Asefa et al. 2006; Ji and Sun 2013; Sun 2013). The details of theory and evolution of SVM developed by Vapnik can be found in Vapnik (2000) and Vapnik and Vapnik (1998).

On the basis of Vapnik’s theory, the SVM functions are given by Eqs. (12)–(15). Consider a set of data points \(R = \left\{ {x_{i} ,d_{i} } \right\}_{i = 1}^{n}\), where \(x_{i}\) is the input space vector, \(d_{i}\) is the desired value and n is the data size. The SVM approximates the function as represented by Eqs. (12) and (13):

$$f\left( x \right) = w\varphi \left( x \right) + b$$
(12)
$$R_{\text{SVMs}} (C) = \frac{1}{2}\left\| w \right\|^{2} + C\frac{1}{n}\sum\limits_{i = 1}^{n} {L(x_{i} ,d_{i} )}$$
(13)

where \(\varphi (x)\) is the high-dimensional feature space mapped from the input x, w is the normal vector and b is a scalar. Furthermore, \(C\frac{1}{n}\sum\nolimits_{i = 1}^{n} {L(x_{i} ,d_{i} )}\) represents the empirical error (risk). The factors w and b are estimated by minimizing the regularized risk function after introducing the positive slack variables \(\xi_{i}\) and \(\xi_{i}^{*}\) as (Vapnik and Vapnik 1998):

$${\text{Minimize}}\quad R_{\text{SVMs}} \left( {w,\xi^{\left( * \right)} } \right) = \frac{1}{2}\left\| w \right\|^{2} + C\sum\limits_{i = 1}^{n} {(\xi_{i} + \xi_{i}^{*} )}$$
(14)
$${\text{Subject to}}\quad \left\{ {\begin{array}{*{20}l} {d_{i} - w\varphi \left( {x_{i} } \right) - b \le \varepsilon + \xi_{i} } \hfill \\ {w\varphi \left( {x_{i} } \right) + b - d_{i} \le \varepsilon + \xi_{i}^{*} } \hfill \\ {\xi_{i} ,\xi_{i}^{*} \ge 0,\quad i = 1, \ldots ,n} \hfill \\ \end{array} } \right.$$

where \(\frac{1}{2}\left\| w \right\|^{2}\) is the regularization term, C is the error penalty factor that regulates the trade-off between the regularization term and the empirical error, ε is the tolerance of the loss function, which is associated with the approximation accuracy on the training data points, and n is the number of elements in the training data.

Introducing Lagrange multipliers and applying the optimality conditions to the constrained problem of Eq. (14) yields the following generic function:

$$f\left( {x,\alpha_{i} ,\alpha_{i}^{*} } \right) = \sum\limits_{i = 1}^{n} {(\alpha_{i} - \alpha_{i}^{*} )K(x,x_{i} )} + b$$
(15)

where \(K\left( {x_{i} ,x_{j} } \right) = \varphi \left( {x_{i} } \right) \cdot \varphi \left( {x_{j} } \right)\) is the kernel function, which depends on the two vectors \(x_{i}\) and \(x_{j}\) through their images \(\varphi \left( {x_{i} } \right)\) and \(\varphi \left( {x_{j} } \right)\) in the feature space.

The kernel function K creates a non-linear mapping and can therefore be used to operate in a higher-dimensional space. It computes the inner product in the feature space as a function of the original input points. This flexibility in the use of kernel functions is a significant strength of the SVM, since the kernel implicitly maps the information into a higher-dimensional feature space. The results achieved in this space correspond to outcomes in the lower-dimensional, original input space.

Four kernel functions, namely the sigmoid, linear, polynomial and radial basis functions, are commonly used for SVM models. Of these, the radial basis function (RBF) is usually preferred because it is adaptable, simple, reliable and computationally efficient during optimization, particularly when handling complicated parameters (Rajasekaran et al. 2008; Yang et al. 2009; Wu and Wang 2009). The non-linear radial basis kernel function is defined as:

$$K(x_{i} ,x_{j} ) = \exp \left( { - \gamma \left\| {x_{i} - x_{j} } \right\|^{2} } \right)$$
(16)

where \(x_{i}\) and \(x_{j}\) are vectors in the input space, such as the feature vectors computed from the training and testing data. γ is defined by \(\gamma = \frac{1}{{2\sigma^{2} }}\), where σ is the standard deviation (width) of the Gaussian kernel.
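The kernel of Eq. (16) and the prediction function of Eq. (15) can be sketched in a few lines of plain Python. This is a minimal illustration, taking \(\gamma = 1/(2\sigma^{2})\) so that the kernel decays with distance. The function names and the argument `alpha_bar` (holding the differences \(\alpha_{i} - \alpha_{i}^{*}\)) are hypothetical, and the multipliers and bias are assumed to come from an already trained model.

```python
import math


def gamma_from_sigma(sigma):
    # gamma = 1 / (2 sigma^2); positive, so the kernel decays with distance
    return 1.0 / (2.0 * sigma ** 2)


def rbf_kernel(x_i, x_j, gamma):
    # Eq. (16): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    sq_dist = sum((p - q) ** 2 for p, q in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, alpha_bar, b, gamma):
    # Eq. (15): weighted sum of kernel evaluations plus the bias b
    return sum(
        a * rbf_kernel(x, sv, gamma) for a, sv in zip(alpha_bar, support_vectors)
    ) + b
```

A larger γ makes each support vector influence only its immediate neighbourhood, whereas a smaller γ gives smoother predictions.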

The RBF kernel was adopted in this study for prediction of daily dew point temperature. The accuracy of SVM models largely depends on the appropriate selection of the user-defined parameters C, γ and ε. In this study, the optimal values of the user-defined parameters for the SVM model are C = 2.47, γ = 0.67, and ε = 0.62, obtained by a trial-and-error procedure.

In Fig. 1, the centres of the hidden nodes are the SVM’s support vectors, and the weights are the Lagrange multipliers \(\left( {\bar{\alpha }_{i} = \alpha_{i} - \alpha_{i}^{*} } \right)\), which determine the relative significance of the training data for the final output.

Fig. 1 The network architecture of SVM

Artificial neural network (ANN)

A typical ANN has three layers, namely the input, hidden and output layers (Aziz and Wong 1992; Balkhair 2002; Chau 2007; Schalkoff 1997).

The input vectors are \(D \in {\mathbb{R}}^{n}\) with \(D = (X_{1} ,X_{2} , \ldots ,X_{n} )^{\text{T}}\), the outputs of the q neurons in the hidden layer are \(Z = (Z_{1} ,Z_{2} , \ldots ,Z_{q} )^{\text{T}}\), the outputs of the output layer are \(Y \in {\mathbb{R}}^{m}\) with \(Y = (Y_{1} ,Y_{2} , \ldots ,Y_{m} )^{\text{T}}\), and the weight and the threshold between the input layer and the hidden layer are \(w_{ij}\) and \(y_{j}\), respectively.

The following equations represent the neuron outputs in hidden layer and output layer (Schalkoff 1997):

$$Z_{j} = f\left( {\mathop \sum \limits_{i = 1}^{n} w_{ij} X_{i} } \right)$$
(17)
$$Y_{k} = f\left( {\mathop \sum \limits_{j = 1}^{q} w_{kj} Z_{j} } \right)$$
(18)

where f is a transfer function that provides the rule for mapping a neuron’s total input to its output. A proper choice of f introduces non-linearity into the network’s design. The input and output layers are defined by the training data: the number of inputs and the number of outputs determine the number of input nodes and output nodes, respectively. In this study, the key parameters of the developed ANN model were chosen manually by a trial-and-error procedure. One of the main ANN parameters is the learning rate. If the learning rate is high, the NN may learn more rapidly; however, if the input set is highly variable, the network may learn poorly or not at all. Usually, it is better to set this factor to a small value. In this study, a learning rate of 0.2 was chosen by trial and error. Momentum is another important ANN parameter. It allows a change to the ANN weights to persist for a number of adjustment cycles. The momentum factor controls the magnitude of this persistence, which can improve the rate of learning in some situations by helping to smooth out unusual conditions in the training set. The momentum factor chosen by trial and error in this study is 0.1. The structure of the ANN model is shown in Fig. 2.

Fig. 2 ANN structure
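The forward pass of Eqs. (17) and (18) and the role of the learning rate and momentum factor can be sketched as follows. This is an illustration only: the sigmoid transfer function, the classical momentum update rule and all function names are assumptions, as the text does not state which training variant was used. Only the learning rate (0.2) and the momentum factor (0.1) are taken from the study. Here `w_hidden[j][i]` holds the weight from input i to hidden neuron j.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w_hidden, w_out, f=sigmoid):
    # Eq. (17): hidden outputs Z_j = f(sum_i w_ij X_i)
    z = [f(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    # Eq. (18): network outputs Y_k = f(sum_j w_kj Z_j)
    y = [f(sum(w * zj for w, zj in zip(row, z))) for row in w_out]
    return z, y


def momentum_step(weights, grads, velocity, learning_rate=0.2, momentum=0.1):
    """One weight update with classical momentum (assumed rule).

    The new change is the gradient step plus a fraction of the previous
    change, so part of each adjustment persists into later cycles.
    """
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
    new_weights = [w + dv for w, dv in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

For example, minimizing \(E(w) = w^{2}/2\) (gradient w) from w = 1 gives w = 0.8 after the first step and w = 0.62 after the second, the latter including the 0.1 × (−0.2) carry-over from the previous change.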

Case study and data collection

To appraise the merit of the developed hybrid ELM–WT model, measured daily weather data sets for the city of Bandar Abass were used in the present study. Bandar Abass, the capital of Hormozgan province, is located in the southern part of Iran at 27°13′N and 56°22′E, at an elevation of 9.8 m above sea level. Low precipitation, a short cool season and a long warm season are among the climatic features of this area. According to the Köppen classification, the climate of Bandar Abass is categorized as BWh, which denotes a hot arid desert climate (Kottek et al. 2006). Bandar Abass, the biggest port of Iran, is a significant location in the country from agricultural and hydrological viewpoints. Hormozgan province and Bandar Abass have an active agricultural sector, ranking first in Iran in terms of production of some crops. There are also several dams in this region. Therefore, owing to the importance of accurate dew point temperature in agricultural, hydrological and crop modeling, Bandar Abass was selected as the case study for this research work.

For this research work, 10 years of measured data provided by the Iranian Meteorological Organization (IMO) for the period 2000–2009 have been used. Based upon the physical factors influencing the formation of dew, three widely available parameters have been selected as inputs to predict daily dew point temperature. The data sets consist of measured daily dew point temperature (\(T_{\text{dew}}\)), average ambient temperature (\(T_{\text{avg}}\)), relative humidity (\(R_{h}\)) and atmospheric pressure (P). The daily average ambient temperature is the average of the maximum and minimum air temperatures of the day.

The available data for this study were divided into two data sets for training and testing. Generally, there is no fixed rule for choosing the size of these data sets. In this study, the 7 years of data for the period 2000–2006 were used to train the models and the remaining 3 years, for the period 2007–2009, were used to test the models. In other words, the models were trained using 2555 days and tested using 1095 days.
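A chronological split of this kind can be expressed as a short sketch. The record layout (a tuple whose first element is a `datetime.date`) and the function name `split_by_year` are assumptions made for illustration and do not describe the format of the IMO data.

```python
from datetime import date


def split_by_year(records, last_training_year=2006):
    """Split daily records chronologically into training and testing sets.

    Each record is assumed to be a tuple whose first element is the date.
    """
    train = [r for r in records if r[0].year <= last_training_year]
    test = [r for r in records if r[0].year > last_training_year]
    return train, test
```

Splitting by date rather than at random keeps the testing period strictly after the training period, so the models are always evaluated on days they have not seen.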

Descriptive statistics of the data, including the mean, standard deviation, minimum, maximum and range, as well as the correlation coefficient between the dew point temperature and each input variable, are listed in Table 1 for both the training and testing data sets. The variables \(T_{\text{avg}}\), \(R_{h}\) and P were selected as inputs owing to their broad long-term availability as well as their favorable correlation with \(T_{\text{dew}}\); thus, convenient predictions can be made with favorable accuracy using a limited number of input parameters. There are positive correlations between \(T_{\text{dew}}\) and the two variables \(T_{\text{avg}}\) and \(R_{h}\), whereas the correlation between \(T_{\text{dew}}\) and P is negative. This means that \(T_{\text{dew}}\) increases with increasing \(T_{\text{avg}}\) and \(R_{h}\) and decreases with increasing P.

Table 1 Descriptive statistics for daily weather data utilized for training and testing

Statistical performance assessment

The proficiency of the proposed hybrid ELM-WT model in predicting daily dew point temperature is appraised using different statistical indicators, which are described briefly in the following.

The bias error (BE) represents the deviation of the predicted data from the measured data and is used to determine whether the predicted values are larger or smaller than the measured values. BE is calculated by:

$${\text{BE}} = \left( {X_{{i,{\text{pred}}}} - X_{{i,{\text{meas}}}} } \right)$$
(19)

where \(X_{{i,{\text{pred}}}}\) and \(X_{{i,{\text{meas}}}}\) are the ith predicted and measured values, respectively.

The relative percentage error (RPE) presents the percentage deviation between the predicted and measured data and its values falling within the interval of −10 to +10 % are typically considered acceptable (Duzen and Aydin 2012). RPE is calculated by:

$${\text{RPE}} = \left( {\frac{{X_{{i,{\text{pred}}}} - X_{{i,{\text{meas}}}} }}{{X_{{i,{\text{meas}}}} }}} \right) \times 100$$
(20)

The mean absolute percentage error (MAPE) shows the mean absolute percentage deviation between the predicted and measured data. Also, the mean absolute bias error (MABE) indicates the average quantity of total absolute bias errors between the predicted and measured values. The MAPE and MABE are obtained respectively, by:

$${\text{MAPE}} = \frac{1}{x}\sum\limits_{i = 1}^{x} {\left| {\frac{{X_{{i,{\text{pred}}}} - X_{{i,{\text{meas}}}} }}{{X_{{i,{\text{meas}}}} }}} \right| \times 100}$$
(21)
$${\text{MABE}} = \frac{1}{x}\sum\limits_{i = 1}^{x} {\left| {X_{{i,{\text{pred}}}} - X_{{i,{\text{meas}}}} } \right|}$$
(22)

where x is the total number of observations.

The root mean square error (RMSE) identifies the models’ accuracy by comparing the predicted and measured data. The RMSE is always non-negative and is calculated by:

$${\text{RMSE}} = \sqrt {\frac{1}{x}\sum\limits_{i = 1}^{x} {\left( {X_{{i,{\text{pred}}}} - X_{{i,{\text{meas}}}} } \right)}^{2} }$$
(23)

The correlation coefficient (R) provides a measure of the linear relationship between the predicted and measured values, obtained by:

$$R = \frac{{\sum\nolimits_{i = 1}^{x} {\left( {X_{{i,{\text{pred}}}} - \bar{X}_{\text{pred}} } \right) \cdot \left( {X_{{i,{\text{meas}}}} - \bar{X}_{\text{meas}} } \right)} }}{{\sqrt {\left[ {\sum\nolimits_{i = 1}^{x} {\left( {X_{{i,{\text{pred}}}} - \bar{X}_{\text{pred}} } \right)^{2} } } \right]\left[ {\sum\nolimits_{i = 1}^{x} {\left( {X_{{i,{\text{meas}}}} - \bar{X}_{\text{meas}} } \right)^{2} } } \right]} }}$$
(24)

where \(\bar{X}_{\text{pred}}\) and \(\bar{X}_{\text{meas}}\) are the average of predicted and measured values, respectively.

Smaller absolute values of BE and RPE and smaller values of MABE, MAPE and RMSE indicate higher accuracy of the predictions, while values of R closer to −1 or +1 indicate a stronger linear relationship between the predicted and measured values, with −1 or +1 denoting a perfect linear relationship.
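The aggregate indicators of Eqs. (21)–(24) translate directly into code. The following plain-Python sketch uses hypothetical function names. Note that MAPE divides by the measured value, so it is undefined for any day on which the measured dew point temperature is exactly zero.

```python
import math


def mape(pred, meas):
    # Eq. (21): mean absolute percentage error, in %
    return 100.0 * sum(abs((p - m) / m) for p, m in zip(pred, meas)) / len(meas)


def mabe(pred, meas):
    # Eq. (22): mean absolute bias error, in the units of the data
    return sum(abs(p - m) for p, m in zip(pred, meas)) / len(meas)


def rmse(pred, meas):
    # Eq. (23): root mean square error, in the units of the data
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(meas))


def correlation(pred, meas):
    # Eq. (24): Pearson correlation coefficient R (dimensionless)
    mean_p = sum(pred) / len(pred)
    mean_m = sum(meas) / len(meas)
    num = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, meas))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in pred)
                    * sum((m - mean_m) ** 2 for m in meas))
    return num / den
```

For the toy series pred = [2, 4, 6] and meas = [1, 4, 8], MABE is 1.0 and RMSE is \(\sqrt{5/3} \approx 1.291\), which shows how the squared term weights the larger error more heavily.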

Results and discussion

In this research work, the ELM was coupled with WT algorithm to propose a new hybrid approach named ELM-WT for prediction of daily dew point temperature. The merit of the proposed ELM-WT method was verified against ELM, SVM and ANN.

Table 2 presents the parameters of the ELM, SVM and ANN modeling frameworks employed in this study.

Table 2 User-defined parameters for the developed ELM, SVM and ANN models

The following steps are considered for the ELM modeling:

Step 1: Initiate the population based on the ELM function.

Step 2: Evaluate the fitness function of each parameter.

Step 3: Repeat the evaluation of Step 2 until the specified number of iterations is reached.

Step 4: The optimal parameters of ELM function can be determined. Then based on the optimized parameters, the hidden layer function matrix is computed.

Step 5: Determine the final output weights.

In this study, wavelet analysis is employed to decompose the series of dew point temperature data into individual components, and the decomposed components are then used as inputs to the ELM model. Figure 3 depicts the flow chart for obtaining the optimal ELM parameters using the wavelet algorithm.

Fig. 3 Flow chart of proposed Wavelet-based parameter determination approach for the ELM

The ability of any model or technique to provide precise predictions depends on appropriate selection of input parameters. For this study, three highly accessible parameters, namely average ambient temperature (\(T_{\text{avg}}\)), relative humidity (\(R_{h}\)) and atmospheric pressure (P), were selected. These parameters have an important influence on the formation of dew: \(T_{\text{avg}}\) relates to the turbulent heat, while \(R_{h}\) and P relate to the water vapor pressure (Shiri et al. 2014). A preliminary evaluation revealed that \(T_{\text{avg}}\) is more effective than \(R_{h}\) and P for obtaining acceptable predictions of daily dew point temperature. Thus, further analysis is based only on combinations of \(T_{\text{avg}}\) with \(R_{h}\) and P. To this end, four models with one, two or three inputs are established using each approach and then explored. These four models and their input elements are presented in Table 3.

Table 3 The considered models with different input elements

The performance of the models is evaluated via several reliable and widely used statistical indicators. For this purpose, the daily dew point temperature values predicted by the ELM-WT, ELM, SVM and ANN models were compared with the measured values using MAPE, MABE, RMSE and R. Table 4 displays the performance of all developed models in terms of the calculated statistical indicators. The results indicate that, for all considered combinations of inputs, the proposed ELM-WT approach outperforms the other techniques, as shown by its lower values of MAPE, MABE and RMSE and its higher values of R. The ELM, SVM and ANN approaches rank next, in that order, in terms of accuracy. The results in Table 4 thus support the view that coupling the ELM with WT is promising, as it improves the precision of the ELM for prediction of dew point temperature. The results also show that selection of proper input combinations has a notable influence on the prediction accuracy obtained by each method. Models (3) and (4) of each approach, which use \(T_{\text{avg}}\) and \(R_{h}\), and \(T_{\text{avg}}\), \(R_{h}\) and P as inputs, respectively, are much more precise than models (1) and (2). Nevertheless, model (3) performs slightly better than model (4) and requires fewer inputs. Thus, model (3), with the two inputs \(T_{\text{avg}}\) and \(R_{h}\), is the more appropriate choice. For the developed ELM-WT (3) model, the values of MAPE, MABE, RMSE and R are 6.1664 %, 0.5495 °C, 0.7621 °C and 0.9953, respectively, for the testing phase, while for the ELM (3) model, which ranks next, the corresponding values are 10.0576 %, 0.8642 °C, 1.0625 °C and 0.9905, respectively.

Table 4 Performance assessment of the all developed ELM-WT, ELM, SVM and ANN models on the basis of the statistical parameters

In the following, the merit of ELM-WT (3) is further assessed against ELM (3), SVM (3) and ANN (3).

The scatter plots of the daily dew point temperature values predicted by the ELM-WT (3), ELM (3), SVM (3) and ANN (3) models against the measured data are presented in Fig. 4a–d, respectively, for the testing phase. The coefficient of determination (\(R^{2}\)), as an indicator of the linear relationship between the predicted and observed data, was calculated and is shown in each panel. The highest \(R^{2}\) between the predicted and measured dew point temperature values is attained by the ELM-WT (3) model. The scatter of the ELM-WT (3) plot is also considerably lower than that of the plots for the ELM, SVM and ANN models. This demonstrates the stronger linear relationship between the values predicted by ELM-WT (3) and the measured values.

Fig. 4 Scatter plots of the predicted daily dew point temperatures by: a ELM-WT (3), b ELM (3), c SVM (3) and d ANN (3) versus the measured values
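For a scatter plot of predicted against measured values, the coefficient of determination can be obtained from a least-squares line fitted through the points. The sketch below is one possible way to do this, assuming that R² refers to the fit of a straight line (in which case it equals the square of the Pearson correlation coefficient). The function name `linear_fit_r2` and the sample values are hypothetical.

```python
def linear_fit_r2(measured, predicted):
    """Fit predicted = slope * measured + intercept by least squares.

    Returns (slope, intercept, r2), where r2 = 1 - SS_res / SS_tot.
    """
    n = len(measured)
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual scatter around the fitted line versus total scatter
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Made-up dew point temperatures in °C, for illustration only
measured = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 14.0, 21.0]
slope, intercept, r2 = linear_fit_r2(measured, predicted)
print(slope, intercept, r2)
```

A value of R² close to 1 means that the points lie close to the fitted line, which corresponds to the low dispersion described for the scatter plots.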

To evaluate the performance of the developed ELM-WT (3), ELM (3), SVM (3) and ANN (3) models on different days of the year, a statistical analysis was performed by computing the daily bias error (BE) and the daily relative percentage error (RPE). These evaluations help to determine the capability of the models to predict dew point temperature on different days throughout the year. Figure 5a–d illustrates the daily BE between the measured dew point temperatures and the values predicted by the ELM-WT (3), ELM (3), SVM (3) and ANN (3) models, respectively, for the testing phase. The daily BE of the ELM-WT (3) model is the lowest among the four models. For the ELM-WT (3) model, the BE falls within the range of −1 to 1 °C on most days.

Fig. 5 Daily bias error (BE) between the predicted daily dew point temperatures by: a ELM-WT, b ELM, c SVM and d ANN and the measured values

RPE analysis was also used to evaluate the day-by-day performance of the developed models. The more days that fall in the lower ranges of RPE, the higher the precision of the model. As mentioned, RPE values that fall in the interval of −10 to +10 % are typically considered acceptable (Duzen and Aydin 2012).

The results show that, of the 1095 days considered for the testing phase, the predictions of the ELM-WT (3), ELM (3), SVM (3) and ANN (3) models fall within the acceptable range on 991, 925, 855 and 797 days, respectively. This means that approximately 91, 85, 78 and 73 % of the predictions of the ELM-WT (3), ELM (3), SVM (3) and ANN (3) models, respectively, fall within the acceptable RPE range of −10 to +10 % for the testing data set. Thus, the ELM-WT (3) model performs best in terms of RPE analysis, since its predictions fall in the acceptable range on the largest number of days.
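The day-by-day analysis above can be sketched as follows. This is a minimal Python illustration that assumes BE is the predicted minus the measured value and that RPE is this difference relative to the measured value, in percent. The sign convention, the function names `daily_errors` and `days_within`, and the sample values are assumptions made for illustration.

```python
def daily_errors(measured, predicted):
    """Return per-day BE (unit of the data) and RPE (%).

    Assumed convention: error = predicted - measured.
    """
    be = [p - m for m, p in zip(measured, predicted)]
    # RPE is undefined on days where the measured value equals zero
    rpe = [100.0 * (p - m) / m for m, p in zip(measured, predicted)]
    return be, rpe

def days_within(values, limit):
    """Count the days with an error in [-limit, +limit] and their share (%)."""
    inside = sum(1 for v in values if -limit <= v <= limit)
    return inside, 100.0 * inside / len(values)

# Made-up dew point temperatures in °C, for illustration only
measured = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 14.0, 23.0]
be, rpe = daily_errors(measured, predicted)
be_days, be_share = days_within(be, 1.0)       # BE within -1 to 1 °C
rpe_days, rpe_share = days_within(rpe, 10.0)   # RPE within -10 to +10 %
print(be_days, be_share, rpe_days, rpe_share)
```

The same share calculation applied to the reported counts gives, for example, 100 × 991 / 1095 ≈ 90.5 % for the ELM-WT (3) model.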

Taken together, the BE and RPE analyses demonstrate that the ELM-WT (3) model is highly capable of predicting daily dew point temperature on different days throughout the year.

Conclusion

Access to accurate and reliable dew point temperature data is of great importance in many scientific areas such as hydrology, climatology and agronomy. Hybrid models, which combine the particular features of different techniques, can be effective in attaining greater accuracy and reliability in predictions. In this research work, a new hybrid method based on integrating the ELM with the WT algorithm was proposed for the prediction of daily dew point temperature. For this aim, daily climate data from an Iranian station located on the southern coast of the country were used as a case study. According to the physical factors influencing the formation of dew, three widely available meteorological parameters, T avg, R h and P, were considered as input elements. Wavelet analysis was used for preprocessing and decomposing the time series of weather data into its various components, after which the decomposed components were used as inputs for the ELM model. To assess the suitability of the ELM-WT method, its predictions were compared against those of the ELM, SVM and ANN techniques. To offer thorough comparisons and performance evaluations, the widely used statistical indicators of BE, RPE, MAPE, MABE, RMSE and R were employed. The results showed that the hybrid ELM-WT approach outperforms the other examined techniques for all considered input combinations. In other words, integrating the WT into the ELM improves the effectiveness of the ELM for prediction.

The primary evaluation showed that T avg is more effective than R h and P in attaining acceptable predictions. Among the four considered sets of parameters, with one, two and three inputs, the combination of T avg and R h was found to be the most effective. For the best ELM-WT model, using T avg and R h as inputs, the statistical indicators of MAPE, MABE, RMSE and R were 6.1664 %, 0.5495 °C, 0.7621 °C and 0.9953, respectively. The BE and RPE analyses showed that this model performs well in predicting daily dew point temperature on different days throughout the year. For the best ELM-WT model, 91 % of the predictions fell within the acceptable RPE range of −10 to +10 %.

To sum up, the proposed hybrid ELM-WT approach provides precise predictions of daily dew point temperature and achieves higher accuracy than the single ELM technique.