Introduction

Renewable energy sources are progressively made use of because of the amazingly booming contamination degrees in the air, water and soil. Wind energy has ended up being the prime focus for energy developers because of the accessibility of megawatt size wind equipment, environmental friendliness, low cost of maintenance, ease of availability, etc. Moreover, wind resource is a tidy, limitless and free resource. This resource has offered mankind for numerous centuries in driving the wind turbines and pumping water (Bakhsh et al. 1985). To utilize this resource efficiently, prediction of wind speed (WS) plays a critical role. This is necessary for site planning, performance analysis and deciding the optimal size of the wind turbine. Rehman and Halawani (1994) predicted wind speed using the autoregressive moving average method. Recently, machine learning (ML) models such as artificial neural network (ANN), support vector machine (SVM), Gaussian process regression (GPR), fuzzy logic (FL) and extreme learning machine (ELM) have been extensively used for this purpose. Selection of the most appropriate ML technique is very important for obtaining accurate results. Apart from these ML techniques, researchers are nowadays focused on the development of hybrid ML tools to achieve accurate WS forecasting (Natarajan and Nachimuthu 2019). Furthermore, hybrid ML has the advantage of both algorithms.

Salcedo-Sanz et al. (2011) tried to estimate the short wind speed using an evolutionary SVR model. Wang et al. (2015) forecasted the WS using support vector regression (SVR) optimized by the cuckoo search optimization algorithm. Wang et al. (2016) utilized the SVM whose parameter search was optimized using a novel steepest descent Cuckoo search algorithm for the dataset which was pre-processed using the empirical mode decomposition. Khosravi et al. (2018) forecasted the WS and wind direction with the help of multi-layer feed-forward ANN, SVM with radial basis function (RBF) and adaptive neuro-fuzzy inference system (ANFIS) optimized using a swarm optimization (SO) called particle SO algorithm. Mi et al. (2019) developed a new WS multi-step forecasting framework based on the singular spectrum analysis, empirical mode decomposition and convolution SVM. Wu and Lin (2019) forecasted the WS based on LS-SVM optimized using the bat optimization algorithm. They applied variation mode decomposition to disintegrate the original WS series into separate sub-series with different frequencies. Fu et al. (2019) coupled SVM with improved chicken SO algorithm for the prediction of WS, and the results were compared with SVM-CSO. Xiang et al. (2019) forecasted WS based on the hybrid of improved empirical wavelet transform and LS-SVM. Li et al. (2020) performed wind power prediction based on SVM with improved dragonfly algorithm. Vinothkumar and Deeba (2020) used a long short-term memory network model and variants of the SVMs for forecasting the WS in the neighbourhood of windmills. Gupta et al. (2021) adopted convolution long-term short memory (LSTM) for the short-term prediction of wind power density for five stations in Tamil Nadu.

In addition to the above discussed hybrid SVM models, wavelet transform has been used in combination with SVM for the forecasting the WS of a location. There are two types of studies where the wavelet transform (WT) has been adopted. In some of the earlier studies, wavelet has been adopted for the processing of the WS data by considering the same as a signal, while in some other cases, a wavelet-based kernel has been used in the SVM model. Some of the studies which fall under the former category have been performed by Liu et al. (2014), Sangita and Deshmukh (2011), Sun et al. (2013), Sivanagaraja et al. (2014) and Prasetyowati et al. (2019). While most of them have considered the only SVM for the former case, Dhiman et al. (2019) have adopted not only SVM but also its variants. They analysed the efficiency of a hybrid model consisting of WT and several variants of SVR like ɛ-SVR, least square SVR, twin SVR (TSVR) and ɛ-TSVR for the prediction of WS at a farm in Spain. Only a few studies about the latter case have been performed for WS prediction. Zeng and Qiao (2012) performed wind power prediction using a novel wavelet SVM that changes between the RBF kernel and a Mexican hat kernel. They established that the wavelet SVM outperforms the radial basis function kernel and Mexican hat kernels. He and Xu (2019) predicted the WS of Ningxia using a combination of a wavelet as well as polynomial kernel function and concluded that the combined kernel has better accuracy compared to the single kernel function. A few very recent prominent works are portrayed in Table 1.

Table 1 List of a few recent contributions for wind speed prediction

In this study, the major contributions are as follows:

  1. 1.

    Motivated by the contribution of Zhang et al. (2004) and Ding et al. (2014), two novel wavelet-based models, namely, Morlet wavelet kernel LSTSVR (MKLSTSVR) and Mexican hat LSTSVR (MHKLSTSVR) models, are proposed for WS prediction.

  2. 2.

    Inspired by the PLSTSVR model, the optimization problem of the proposed MKLSTSVR and MHKLSTSVR is solved in primal, which always shows better performance compared to solving them in duals (Gupta 2017).

  3. 3.

    To expand the kernel selection range of the PLSTSVR model and to improve its generalization ability, the wavelet kernels are embedded in the PLSTSVR model.

The models have been applied to the WS data collected from four locations in the state of Tamil Nadu, India, and the performance of these models has been compared with other ML models such as TSVR, primal least square twin support vector regression (PLSTSVR) and large-margin distribution machine-based regression (LDMR). The rest of this paper is organized as follows: the “Related works” section briefly describes the related works. The “Proposed models” section describes the wavelet analysis briefly and also proposes the two novel models. Dataset description and area of study is presented in the “Dataset description” section. Experimental and numerical analyses are presented in the “Experimental setup and numerical analysis” section; statistical analyses are presented in the “Statistical analysis using Friedman test with post hoc analysis” section. The “Conclusion” section describes the conclusion.

Related works

Consider that \(m\) is the total number of samples. The total training samples can be represented as \(\left\{ {x_{i} } \right\}_{i = 1}^{m} \in R^{n}\). \(x_{i} \in R\) and \(y_{i} \in R^{N}\) indicate the input training points and original output, respectively. Also, let us consider that \(G \in R^{m \times n}\) are the input samples. Here the ith row vector can be represented as \(x_{i}^{t}\) \(y = (y_{1,...} y_{m} )\) denotes the observed data points.

The TSVR model

TSVR (Peng 2010) seeks for two non-parallel functions termed as ε-insensitive down-bound \(f_{1} (x) = K(x^{t} ,D^{t} )w_{1}^{{}} + b_{1}\) and up-bound function \(f_{2} (x) = K(x^{t} ,D^{t} )w_{2}^{{}} + b_{2}\), respectively. The primal problems of TSVR are expressed as follows:

$$\begin{array}{c}min\frac12\parallel y-\varepsilon_1e-\left(K\left(G,G^t\right)w_1+b_1e\right)\parallel^2+C_2e^t\psi_1,\\s.t.,y\left(K\left(G,G^t\right)w_1+b_1e\right)\geq\varepsilon_1e-\psi_1,\geq0\end{array}$$
(1)

and

$$\begin{array}{c}min\frac12\parallel y-\varepsilon_2e-\left(K\left(G,G^t\right)w_2+b_2e\right)\parallel^2+C_2e^t\psi_2,\\s.t.,y\left(K\left(G,G^t\right)w_2+b_2e\right)\geq\varepsilon_2e-\psi_2,\geq0.\end{array}$$
(2)

where regularization parameters are \(C_{1} ,C_{2} > 0\), and input parameters are \(\varepsilon_{1} ,\varepsilon_{2} > 0\); the slack variables are indicated by \(\psi_{1}\) and \(\psi_{2}\). \(e\) is the ones vector and \(G \in R^{m \times n}\) are the input samples.

By introducing the Lagrangian multipliers \(\gamma_{1} ,\gamma_{2} > 0\) and applying the K.K.T. sufficient conditions on Eq. (2), the duals of Eqs. (1) and (2) are expressed as follows:

$$\begin{array}{*{20}c} {\max - \frac{1}{2}\gamma_{1}^{t} M_{1} (M_{1}^{t} M_{1} )^{ - 1} M_{1}^{t} \gamma_{1} + R^{t} M_{1} (M_{1}^{t} M_{1} )^{ - 1} M_{1}^{t} \gamma_{1} - R^{t} \gamma_{1} ,} \\ {\,s.t.,0 \le \gamma_{1} \le C_{1} e} \\ \end{array}$$
(3)

and

$$\begin{array}{*{20}c} {\max - \frac{1}{2}\gamma_{2}^{t} M_{1} (M_{1}^{t} M_{1} )^{ - 1} M_{1}^{t} \gamma_{2} + Q^{t} M_{1} (M_{1}^{t} M_{1} )^{ - 1} M_{1}^{t} \gamma_{2} - Q^{t} \gamma_{2} ,} \\ {s.t.,\,0 \le \gamma_{2} \le C_{2} e,} \\ \end{array}$$
(4)

where \(R = (Y - \varepsilon_{1} e),\;Q = (y + \varepsilon_{2} e)\), and \(M_{1} = [\begin{array}{*{20}c} {K(G,G^{t} )} & {e]} \\ \end{array} .\). The unknowns \(w_{1} ,w_{2} ,b_{1} ,b_{2}\) can be determined as follows:

$$\left[ \begin{gathered} w_{1} \hfill \\ b_{1} \hfill \\ \end{gathered} \right] = (M_{1}^{t} M_{1} + \partial \,I)^{ - 1} M_{1}^{t} (R - \gamma_{1} )$$

and

$$\left[ \begin{gathered} w_{2} \hfill \\ b_{2} \hfill \\ \end{gathered} \right] = (M_{1}^{t} M_{1} + \partial \,I)^{ - 1} M_{1}^{t} (Q + \gamma_{2} ),$$

where \(\partial > 0\) and \(I\) is an identity matrix.

The PLSTSVR model

To reduce the computational complexity of the TSVR model, Huang et al. (2013) suggested a novel LSTVSR solved in primal, termed as least PLSTSVR. The formulation of PLSTSVR can be expressed as follows:

$$\begin{array}{*{20}c} {min\frac{1}{2}||R - (K(G,G^{t} )w_{1} + b_{1} e)||^{2} + \;\frac{1}{2}C_{1} \psi_{1}^{t} \psi_{1} ,} \\ {s.t.,y - (K(G,G^{t} )w_{1} + b_{1} e) = \varepsilon_{1} e - \psi_{1} } \\ \end{array}$$
(5)

and

$$\begin{array}{*{20}c} {min\frac{1}{2}||Q - (K(G,G^{t} )w_{2} + b_{2} e)||^{2} + \;\frac{1}{2}C_{2} \psi_{2}^{t} \psi_{2} ,} \\ {s.t.,(K(G,G^{t} )w_{2} + b_{2} e) - y = \varepsilon_{2} e - \psi_{2} ,} \\ \end{array}$$
(6)

where \(R = (Y - \varepsilon_{1} e)\) and \(Q = (y + \varepsilon_{2} e)\)

Now, by placing the values of Eqs. (5) and (6), the slack vectors in the objective functions, further computation of the gradient with respect to \(w_{i}\) and \(b_{i}\) for \(i = 1,\;2\), and equating to zero, we get,

$$\left[ \begin{gathered} w_{1} \hfill \\ b_{1} \hfill \\ \end{gathered} \right] = (M_{1}^{t} M_{1} + \partial I)^{ - 1} M_{1}^{t} R$$
(7)

and

$$\left[ \begin{gathered} w_{2} \hfill \\ b_{2} \hfill \\ \end{gathered} \right] = (M_{1}^{t} M_{1} + \partial I)^{ - 1} M_{1}^{t} Q,$$
(8)

where \(M_{1} = [\begin{array}{*{20}c} {K(G,G^{t} )} & e \\ \end{array} ]\) is the augmented matrix; \(\partial > 0\) is a small positive integer and \(I\) is an identity matrix.

The final regressor of PLSTSVR for any new instance \(x \in R^{n}\) can be calculated as follows:

$$f(x) = \frac{1}{2}(K(x^{t} ,G)(w_{1} + w_{2} ) + (b_{1} + b_{2} )).$$

The LDMR model

Recently, Rastogi et al. (2020) introduced a margin distribution-based LDMR which was on the spirit of the LDM model (Zhang and Zhou 2014). LDMR simultaneously minimizes the ε-insensitive loss function and the quadratic loss function. Hence, LDMR shows improved regression performance for noisy data, and it considers the minimization of scattering which is inside the ε-tube. The primal problem of LDMR can be expressed as follows:

$$\begin{array}{*{20}c} {\begin{array}{*{20}c} {min\frac{{l_{2} }}{2}||y - (Gw + eb)||^{2} + \;\frac{1}{2}Ce^{t} \left( {\psi_{1} + \psi_{2} } \right) + \frac{{l_{1} }}{2}||w||^{2} ,} \\ {s.t.,y - (Gw + eb) \le e\varepsilon + \psi_{1} ,\;\psi_{1} \ge 0,} \\ \end{array} } \\ {(Gw + eb) - y \le e\varepsilon + \psi_{2} ,\;\psi_{2} \ge 0,} \\ {} \\ \end{array}$$
(9)

where the input parameters are \(\varepsilon ,l_{1} ,l_{2} ,C > 0\); \(||w||^{2} = u_{2}^{t} I_{0} u_{2}\) where \(I_{0} = \left[ {\begin{array}{*{20}c} I & 0 \\ {} & . \\ {} & . \\ 0 & {...0} \\ \end{array} } \right]\); \(I^{m \times m}\) is an identity matrix.

The dual problem of Eq. (9) may be obtained after adding the Lagrangian multiplier and applying the KKT condition as follows:

$$\begin{array}{*{20}c} {min\frac{1}{2}\left( {\gamma_{1} - \gamma_{2} } \right)^{t} M_{1} \left( {l_{1} I_{0} + l_{2} M_{1}^{t} M_{1} } \right)^{ - 1} M_{1}^{t} \left( {\gamma_{1} - \beta_{2} } \right) + \;y^{t} M_{1} \left( {l_{1} I_{0} + l_{2} M_{1}^{t} M_{1} } \right)^{ - 1} M_{1}^{t} \left( {\gamma_{1} - \gamma_{2} } \right),} \\ { - y^{t} (\gamma_{1} - \gamma_{2} ) + \varepsilon e^{t} (\gamma_{1} + \gamma_{2} ),} \\ {s.t.,0 \le \gamma_{1} \le Ce,\;0 \le \gamma_{2} \le Ce,} \\ \end{array}$$
(10)

where \(M_{1} = [\begin{array}{*{20}c} {K(G,G^{t} )} & e \\ \end{array} ]\).

After determining the solution from Eq. (10) for \(\beta_{1}\) and \(\beta_{2}\), the unknowns \(w\) and \(b\) can be obtained as follows (Hazarika et al. 2020a):

\(\left[ \begin{gathered} w \hfill \\ b \hfill \\ \end{gathered} \right] = \left( {l_{1} I_{0} + l_{2} M_{1}^{t} M_{1} } \right)^{ - 1} M_{1}^{t} \left( {\gamma_{1} - \gamma_{2} + y} \right)\).

The LDMR regressor \(f(.)\) can be obtained for any new sample \(x \in R^{n}\) as follows:

\(f(x) = [\begin{array}{*{20}c} {K(x^{t} ,G^{t} )} & 1 \\ \end{array} ]\left[ \begin{gathered} w \hfill \\ b \hfill \\ \end{gathered} \right]\).

Proposed models

The wavelet transform (WT) is an enhanced version of the conventional Fourier transform (FT) that was introduced by Jean Morlet in 1982 (Morlet et al. 1982a, b). Recently, the WT analysis has gained a lot of interest among the researchers as it overcomes the two key disadvantages of FT (Sifuzzaman et al. 2009; Hazarika et al. 2020a, b):

  1. 1)

    FT is suitable only for stationary signals, whereas WT is suitable for both stationary as well as non-stationary datasets.

  2. 2)

    In FT, the time information is lost while transforming the time domain to the frequency domain, whereas in WT, the time information is not lost.

Wavelet analysis was initially suggested to enhance seismic signal analysis by switching from short-term analysis of Fourier to improved algorithms to identify and analyse abrupt signal changes (Daubechies 1990, 1992; Mallat 1999). The wavelet has the multi-resolution and localization capacity in both time-domain (TD) and frequency domain (FD). Wavelet transform’s TD properties can be elaborated through the wavelet functions which are translated from a wavelet base function (Holland 1992).

The wavelet functions can be derived from the mother wavelet (MW). Let \(\lambda (z)\) be the MW function (Hazarika and Gupta 2020). The wavelet function \(\lambda (z)\) can be obtained by the temporal translation \(t\) and with dilation \(\delta\) as follows:

$$f_{t} (\delta ,t) = \;\frac{1}{\sqrt \delta }\int\limits_{ - \infty }^{\infty } {t(z)\lambda \left( {\frac{z - t}{\delta }} \right)\delta z\;\;\;\;\delta > 0} .$$
(11)

Equation (11) can be represented in the time domain as follows:

$$f_{t} (\delta ,t) = \;\frac{1}{\sqrt \delta }\int\limits_{ - \infty }^{\infty } {t(\omega )\lambda \left( {a\omega } \right)e^{i\omega } \delta z\;\;\delta > 0} .$$
(12)

From Eqs. (11) and (12), one can observe that the local characteristics of a signal can be reflected through the wavelet transform. Hence, the wavelet transforms as an analytical technique has shown great potential (Zhang et al. 2004; Zhou and Ye 2006; Ding et al. 2016).

Translation-invariant kernel

The kernel of translation-invariant (TI) is acceptable if and only if the FT is always positive.

Lemma 1: Let \(\lambda (z)\) be an MW, and let \(\delta\) and \(t\) denote the dilation and translation. If \(z,z^{\prime} \in R^{N} ,\), then the dot-product wavelet kernel may be expressed as follows:

$$K(z,z^{\prime}) = \;\mathop \prod \limits_{i = 1}^{N} h\left( {\frac{{z_{i} - t_{i} }}{\delta }} \right)h\left( {\frac{{z^{\prime}_{i} - t_{i}^{i} }}{\delta }} \right).$$
(13)

The TI kernel of Eq. (13) may be expressed as

$$K(z,z^{\prime}) = \;\mathop \prod \limits_{i = 1}^{N} h\left( {\frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right).$$
(14)

In Eq. (14), \(N\) represents the total quantity of the sample (Ding et al. 2014).

To generate the TI kernel function, this work uses two wavelet kernel functions. They are:

  1. a.

    Morlet wavelet kernel: \(\left( {\lambda (z) = \cos (1.75z)\exp \left( { - \frac{{z^{2} }}{2}} \right)} \right)\) and

  2. b.

    Mexican hat wavelet kernel: \(\left( {\lambda (z) = (1 - z^{2} )\exp \left( { - \frac{{z^{2} }}{2}} \right)} \right).\)

Proposed MKLSTSVR

MKLSTSVR indicates the PLSTSVR embedded with the Morlet wavelet kernel instead of using the traditional kernels. To be an acceptable kernel, the Morlet kernel should satisfy the TI kernel theorem as shown in Eq. (14).

Lemma 2: The Morlet wavelet kernel function (MK) satisfies the TI kernel theorem as follows:

$$K(z,z^{\prime}) = \;\mathop \prod \limits_{i = 1}^{N} \left( {\cos \left( {1.75 \times \frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)\exp \left( { - \frac{{\left\| {z_{i} - z^{\prime}_{i} } \right\|^{2} }}{{2\delta^{2} }}} \right)} \right),$$

which is an admissible kernel (Zhang et al. 2004). Hence, it is a kernel by which the wavelet kernel trick can be used to construct MKLSTSVR. The basic goal of MKLSTSVR is to search for the optimum wavelet coefficients in feature space spanned by multidimensional wavelet basis (MWB). Thus, it obtains the optimal prediction function. Thereby, the prediction function for MKLSTSVR can be obtained as follows:

$$f(z) = \;\frac{1}{2}\;\mathop \prod \limits_{i = 1}^{N} \left( {\cos \left( {1.75 \times \frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)\exp \left( { - \frac{{\left\| {z_{i} - z^{\prime}_{i} } \right\|^{2} }}{{2\delta^{2} }}} \right)} \right)(w_{1} + w_{2} ) + \frac{1}{2}(b_{1} + b_{2} ).$$

Proposed MHKLSTSVR

MHKLSTSVR indicates the PLSTSVR combined with the Mexican hat wavelet kernel rather than the traditional kernels. To be an admissible kernel function, the Mexican hat kernel should also satisfy the TI kernel theorem as shown in Eq. (14).

Lemma 3: The Mexican hat wavelet kernel function (MHK) follows the TI kernel theorem as the following:

$$K(z,z^{\prime}) = \;\mathop \prod \limits_{i = 1}^{N} \left( {1 - \left( {\frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)^{2} } \right)\exp \left( { - \frac{1}{2}\left( {\frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)^{2} } \right),$$

which also can be considered as an admissible kernel (Zhang et al. 2004). Therefore, the Mexican hat wavelet kernel trick can be used to construct MHKLSTSVR. Like MKLSTSVR, the basic goal of MHKLSTSVR is to find the optimum wavelet coefficients in feature space spanned by MWB. Thereby, it obtains the optimum predictor. Thereby, the predictor for MHKLSTSVR can be obtained as follows:

$$f(z) = \;\frac{1}{2}\;\mathop \prod \limits_{i = 1}^{N} \left( {1 - \left( {\frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)^{2} } \right)\exp \left( { - \frac{1}{2}\left( {\frac{{z_{i} - z^{\prime}_{i} }}{\delta }} \right)^{2} } \right)(w_{1} + w_{2} ) + \frac{1}{2}(b_{1} + b_{2} ).$$

Dataset description

The state of Tamil Nadu is situated in the southern part of India. Tamil Nadu is bordered by the Eastern Ghats in the north, Anaimalai hills and Kerala in the west, Bay of Bengal in the east, Gulf of Mannar and Palk Strait in the southeast and the Indian Ocean in the south. The climate of Tamil Nadu ranges from sub-humid to semi-arid. The sites selected for this study has been provided in Fig. 1. The geographical information such as latitude (LAT), longitude (LON) and altitude (ALT) of the chosen sites are provided in Table 2. Hourly average WS data recorded at a height of 50 m above ground level was collected from MERRA-2 analysis database (NASA) for the period of Jan 1980 to May 2018. However, in our experiments, we have considered the data from Jan 2014 to May 2018.

Fig. 1
figure 1

Map of Tamil Nadu showing the chosen four sites (marked with boxes)

Table 2 Geographical locations of the cities in Tamil Nadu, India

The statistics of the WS data of the above four stations have been provided in Table 3 below.

Table 3 Statistics of the wind speed data

The maximum wind speed of 22.64 m/s is noticed in Tirunelveli, and the lowest wind speed of 13.86 m/s is noticed in Madurai. The observation is similar for the mean WS too. Figure 2 shows the hourly wind speed data from January 2014 to May 2018.

Fig. 2
figure 2

Hourly wind speed data measured during January 2014 to May 2018. a Chennai Station. b Madurai Station. c Salem Station. d Tirunelveli Station

Experimental setup and numerical analysis

To illustrate the prediction capability of the proposed MKLSTSVR and MHKLSTSVR models, their performance has been compared with TSVR, PLSTSVR and LDMR on a portion of the WS dataset. The performances of these implemented models are evaluated using RMSE (root mean squared error), MAE (mean absolute error), SMAPE (symmetric mean absolute percentage error), MASE (mean absolute scaled error) and R2 (coefficient of correlation). The definitions of these evaluators can be expressed as:

  • RMSE: RMSE describes how close the data is near the line of best fit. To validate experimental results, RMSE is frequently applied in climatology, forecasting and regression analysis.

    $${\text{RMSE}} = \sqrt {\frac{{\sum\limits_{k = 1}^{N} {(o - p)} }}{N}} ,$$
  • MAE: The amount of error in your measurements is expressed as absolute error (AE). It represents the difference between the measured and original values. The MAE is the average of all absolute errors.

    $${\text{MAE}} = \frac{{\sum\limits_{k = 1}^{N} {\left| {o - p} \right|} }}{N},$$
  • SMAPE: The mean absolute percentage error is a popular metric for assessing forecasting performance. MAPE is asymmetric, penalising negative errors more than positive errors when forecasts are higher than originals. SMAPE is a symmetric version of MAPE that overcomes MAPE’s asymmetry.

    $${\text{SMAPE}} = \frac{{\sum\limits_{k = 1}^{N} {|o - p|} }}{N(p + o)},$$

    Here, \(o\) and \(p\) indicate the original and the predicted WS values, respectively.

  • MASE: Mean absolute scaled error (MASE) is a scale-free error metric that presents each error as a ratio to the average error of a baseline. MASE has the advantage of never returning undefined or infinite values, making it an excellent choice for intermittent-demand series. It can be applied to a single series or used to compare multiple series.

    MASE = mean \(|z|\); where \(z = \frac{f}{{\frac{{\sum\limits_{k = 2}^{N} {|o_{i} - o_{i - 1} |} }}{N - 1}}}\),

    \(f\) is the forecast error in the given period.

  • R2: R2 is a statistical measure that represents the proportion of the variance described by an independent variable or variables in a regression model for a dependent variable. A model’s R2 value of 0.50 indicates that the model’s inputs can describe roughly half of the original variation.

In our experiments, we have considered short names for the evaluators, i.e. RMSE (E1), MAE (E2), SMAPE (E3), MASE (E4) and R2 (E5).

All the experiments were conducted on Window 8 installed laptop computer with 4 GB RAM. The QPPs of TSVR and LDMR models are solved using the external MOSEK optimization toolbox. The Gaussian kernel is used for the TSVR, PLSTSVR and LDMR models. The Gaussian kernel may be represented as follows:

\(K(d_{m} ,d_{n} ) = \exp \left( { - \mu ||d_{m} - d_{n} ||^{2} } \right)\), for \(m,\;n = 1,...,m,\)where dm and dn indicate the input data samples. The Gaussian kernel parameter \(\mu > 0\) is selected from the range \(\{ \;2^{ - 5} ,...,\;2^{5} \;\}\) for TSVR, PLSTSVR and LDMR models. The value of the \(\varepsilon\) parameter is set from \(\varepsilon = \;\{ 0.05,\;0.1,\;0.5,\;1,\;1.5,\,2\}\) for all models including the proposed MKLSTSVR and MHKLSTSVR. The model parameters \(\;C_{1} = C_{2}\) are considered from \(\{ \;10^{ - 5} ,10^{ - 3} ,...,\;\;10^{5} \;\}\) for all models. The Gaussian kernel parameter \(\lambda\) is taken from the set \(\{ \;2^{ - 5} ,2^{ - 3} ,...,\;\;2^{5} \;\}\) for TSVR, PLSTSVR and LDMR models. The \(\lambda\) values are chosen from \(\{ \;10^{ - 5} ,10^{ - 3} ,..,\;10^{5} \;\}\) for LDMR. For the proposed MKLSTSVR and MHKLSTSVR models, the wavelet kernel parameter \(d\) is fixed as 1.

Two types of lag periods are considered: l-3 and l-5. Table 4 and Table 5 portray the results based on the various performance evaluators utilized in this study for the various cities considered for l-3 and l-5, respectively. From Table 4, the following conclusions can be derived:

  1. a)

    The proposed MKLSTSVR and MHKLSTSVR show the best performance based on RMSE for 0 and 2 cases, respectively.

  2. b)

    The proposed MKLSTSVR and MHKLSTSVR show best results based on MAE in 1 case each.

  3. c)

    The proposed MKLSTSVR and MHKLSTSVR show best results based on SMAPE in 2 cases and 1 case, respectively.

  4. d)

    The proposed MKLSTSVR and MHKLSTSVR indicate the best performance based on MASE in 1 case each.

  5. e)

    The proposed MKLSTSVR and MHKLSTSVR show the best performance based on R2 in 0 and 2, respectively.

Table 4 Comparison of TSVR, PLSTSVR, LDMR, MKLSTSVR and MHKLSTSVR based on the performance evaluators, optimum parameters and the computational times for l-3 lag time period. (Best results are bolded.)
Table 5 Average ranks based on evaluators for l-3. (Best average rank is bolded.)

Table 5 shows ranks obtained by the reported models based on various performance evaluators. It can be observed that our proposed MKLSTSVR shows the lowest average rank followed by the MHKLSTSVR, which promulgates the efficiency of these models.

The window size is enlarged from l-3 to l-5 to further evaluate the performance of these models. In Table 6, the results are shown using a lag window of 5. From Table 6, the following conclusions can be derived:

  1. a)

    The proposed MKLSTSVR and MHKLSTSVR reveal the best results based on RMSE in 2 cases each.

  2. b)

    The proposed MKLSTSVR and MHKLSTSVR report the best performance based on MAE in 2 cases each.

  3. c)

    Based on SMAPE, the proposed MKLSTSVR and MHKLSTSVR show the best performance in 3 and 1 case, respectively.

  4. d)

    Based on MASE, the proposed MKLSTSVR and MHKLSTSVR show the best performance in 2 cases each.

  5. e)

    Based on R2, the proposed MKLSTSVR and MHKLSTSVR show the best performance in 1 and 3 cases, respectively.

Table 6 Comparison of TSVR, PLSTSVR, LDMR, MKLSTSVR and MHKLSTSVR based on the performance evaluators, optimum parameters and the computational times for l-5 lag time period. (Best results are bolded.)

Table 7 shows ranks obtained by the reported models based on different evaluators. One can notice that our proposed MHKLSTSVR shows the lowest average rank followed by the MKLSTSVR, which indicates the efficacy of the models.

Table 7 Average ranks based on evaluators for l-5. (Best average rank is bolded.)

Overall, it is noticeable that the increase in the time lag period shows improved performance for the proposed MKLSTSVR and MHKLSTSVR. Figures 3, 4, 5, and 6 reveal the comparison between the original and predicted WS values of TSVR, PLSTSVR, LDMR and the proposed MKLSTSVR and MHKLSTSVR for l-5. Although all the plots seem to be similar, the overall performance of the proposed models is better compared to TSVR, PLSTSVR and LDMR models based on the evaluators.

Fig. 3
figure 3

Comparison of original and predicted WS values for Chennai using l-5

Fig. 4
figure 4

Comparison of original and predicted WS values for Madurai using l-5

Fig. 5
figure 5

Comparison of original and predicted WS values for Salem using a l-5

Fig. 6
figure 6

Comparison of original and predicted WS values for Tirunelveli using a l-5

It can also be observed from Table 4 and Table 5 that the proposed models MKLSTSVR and MHKLSTSVR take less time for computation compared to other models. Thus, it will be interesting to see how the proposed MKLSTSVR and MHKLSTSVR behave in short-term WS prediction at different sites.

Statistical analysis using Friedman test with post hoc analysis

It is noticeable that the proposed models MKLSTSVR and MHKLSTSVR do not show the best results in all the cases. To further justify the efficiency of the best proposed model, the non-parametric Friedman test (Demšar 2006) has been performed. To generate the Friedman statistics, the average ranks of the five models are considered from Table 7.

Let us assume that all the models are not significantly different, from the null hypothesis (NH). Then, we formulate NH as follows:

$$\begin{array}{*{20}c} {\chi_{F}^{2} = \frac{12 \times 24}{{5 \times (5 + 1)}}\left[ {4.8333^{2} + 3.1875^{2} + 3.1458^{2} + 2.3333^{2} + 1.4583^{2} - \frac{{7 \times (7 + 1)^{2} }}{4}} \right] = 57.5909} \\ {F_{F} = \frac{(24 - 1) \times 57.5909}{{24 \times (5 - 1) - 57.5909}} = 34.3371} \\ \end{array}$$

The critical value \(C_{V}\) for \(F_{F}\) with (5 − 1) and (5 − 1) × (24 − 1) degrees of freedom is \(2.471\) for \(\alpha = 0.05\). As it can be noticed that FF > CV; therefore, we can reject the NH. Furthermore, the Nemenyi test is performed with \(p = 0.05\) as follows:

$$CD = \;2.728\sqrt {\frac{5 \times (5 + 1)}{{6 \times 24}}} = 1.2452$$

For significance, the pairwise difference in average ranks should be more than CD. It is worth noting that the proposed MHKLSTSVR is significantly different compared to TSVR, PLSTSVR and LDMR. However, no significant difference can be found between MKLSTSVR and MHKLSTSVR. The Friedman test with Nemenyi Statistics can be visualized from Fig. 7. Those models that are not connected by a line are statistically significant. It can be observed that our proposed MHKLSTSVR is significantly different than TSVR, PLSTSVR and LDMR.

Fig. 7
figure 7

Statistical comparison based on Friedman test with Nemenyi statistics

Conclusion

The wavelets are powerful enough to deal with the non-stationary datasets. Moreover, the PLSTSVR model shows excellent prediction performance. Hence, in this work, the wavelet kernels are embedded in the PLSTSVR models and two novel wavelets kernel-based PLSTSVR models, namely, MKLSTSVR and MHKLSTSVR are proposed for WS prediction in four different wind stations in India. The results show the proposed models to be superior when compared to the related models, i.e. TSVR, PLSTSVR and LDMR. Among the proposed MKLSTSVR and MHKLSTSVR models, the latter shows a closer relationship with the original data. Moreover, the proposed models are computationally efficient as they take less time for computation compared to the other reported models. Comprehensively, one can conclude that both MKLSTSVR and MHKLSTSVR are efficient models and applicable for short-term WS prediction. These models can also be applied in the field of engineering like prediction of river suspended sediment load, rainfall forecasting, runoff prediction, etc.