1 Introduction

Autoregressive regression models are used to formulate forecasting models from the information contained within the recorded values of a time series. The study employs hourly wind speed time series and investigates four techniques: two machine learning techniques (artificial neural networks and genetic expression programming) and two techniques for comparison, namely multiple linear regression and the simple Persistence method. The study has potential applications to wind energy, which, as stated by Burton et al. (2001) and Li and Shi (2010), is particularly attractive for being clean, renewable, economically competitive and environmentally friendly. Wind energy systems depend on wind speed and are subject to a host of other factors discussed by Tandjaoui et al. (2013), including (i) low production capacity when deployed in a sheltered area and (ii) generated noise. This paper focuses on the problem of forecasting wind speed.

Machine learning techniques have been used extensively for wind speed prediction over the years. Some recent examples include applications of artificial neural networks (Bilgili and Sahin 2010; Li and Shi 2010; Khatibi et al. 2014), Fuzzy Logic (Barbounis and Theocharis 2007; Damousis et al. 2004; Kariniotakis et al. 1996a; Wang et al. 2004), Genetic Programming (Ghorbani et al. 2010; Guven et al. 2008; Kalra and Deo 2007; Kalra et al. 2008; Khatibi et al. 2011; Ustoorikar and Deo 2008), as well as radial basis function networks (Beyer et al. 1994), recurrent neural networks (Kariniotakis et al. 1996b; More and Deo 2003) and support vector machines (Ji et al. 2007; Mohandes et al. 2004). Notably, artificial neural networks and genetic programming techniques are among the most frequently used; they are described in more detail here.

Artificial neural networks (ANNs) are parallel information processing systems that emulate the working processes of the brain. ANNs consist of a set of neurons or nodes arranged in layers; each node combines its weighted inputs and converts them into an output through a transfer (activation) function (Kisi 2005). Each neuron in a layer is connected to all the neurons of the next layer, with no interconnections among the neurons of the same layer. Neural networks can learn from past data, recognize hidden patterns or relationships in historical observations, and use them to forecast future values and thus system behavior.

The genetic programming (GP) methods, first proposed by Koza (1992), are wide ranging and similar to genetic algorithms (GA) (Goldberg 1989). GP techniques are robust optimization algorithms and represent one way of mimicking natural selection. They derive mathematical expressions describing the relationship between the predictand (dependent variable) and the independent variables, using genetic operators such as reproduction, mutation and recombination (crossover). These operators act on a population that evolves over generations according to defined fitness and selection criteria. GP suits a wide range of problems, in particular cases where: (i) the interrelationships among the relevant variables are poorly understood or suspected to be wrong; and (ii) conventional mathematical analyses are constrained by restrictive assumptions, but approximate solutions are acceptable (Banzhaf et al. 1998). Genetic expression programming (GEP), an extension of GP, is used in this study as an alternative approach to predicting wind speed time series. The fundamental difference between the GEP and GP algorithms resides in the nature of the individuals. In GP, the individuals are non-linear entities of different sizes and shapes (parse trees). In GEP, the individuals are encoded as linear strings of fixed length (the genome or chromosomes), which are afterwards expressed as non-linear entities of different sizes and shapes (simple diagrammatic representations or expression trees). GEP is also faster than GP, by two to four orders of magnitude. Owing to its simplicity and efficiency, GEP is a popular and widely used method for evolutionary modeling (Ferreira 2006).

Statistical regression models are simple and straightforward, are usually the first choice for establishing a baseline, and have been applied for many decades. In addition to the multiple linear regression (MLR) method, the Persistence method, which involves no mathematical sophistication, is also reported here as a possible alternative.

The present work describes the development and training of ANN and GEP models for estimating hourly wind speed, and the results are compared with the MLR model and the Persistence method. Model performance has been evaluated using the correlation coefficient (CC), Nash–Sutcliffe efficiency coefficient (E), root mean square error (RMSE) and Akaike information criterion (AIC). The study employs the time series of wind speed values obtained at Kersey in Colorado, USA.

2 Materials and methods

2.1 Artificial neural networks (ANNs)

Neural networks are inspired by studies of the brain and nervous systems of biological organisms. They are capable of self-learning and automatic abstraction. Their development goes back to McCulloch and Pitts (1943), who designed the first artificial neural networks (ANNs). Over the years, ANNs have been recognized as an important alternative to traditional methods of data analysis and modeling.

The fundamental processing element of a neural network is the neuron. Each neuron j computes a weighted sum \( u_{j} \) of its input signals \( x_{i} \), for i = 0, 1, 2, …, p, using the connection weights \( w_{ij} \), and then applies a nonlinear activation function \( \phi \) to produce its output signal \( x_{j} \). The model of a neuron is shown in Fig. 1. A neuron j may be mathematically described by the following pair of equations:

$$ u_{j} = \sum\limits_{i = 0}^{p} {w_{ij} x_{i} } $$
(1)

and

$$ x_{j} = \phi \left( {u_{j} - \theta_{j} } \right). $$
(2)
Fig. 1

Nonlinear model of a neuron

The threshold \( \theta_{j} \) has the effect of applying an affine transformation to the output of the linear combiner in the model of Fig. 1. A threshold transfer function is sometimes used to quantify the output of a neuron in the output layer. In particular, depending on whether the threshold \( \theta_{j} \) is positive or negative, the relationship between the effective internal activity level, or activation potential, \( v_{j} = u_{j} - \theta_{j} \), of neuron j and the linear combiner output \( u_{j} \) is modified in the manner illustrated in Fig. 2. More generally, affine transformations are used for coordinate transformation between two reference systems; such a transformation model can be optimized so that it is easy to perform and gives the highest accuracy, and if coordinates in both systems are available for some common points, the transformation parameters can be estimated (Haykin 1999; Melesse and Hanley 2005).

Fig. 2

Transformation produced by the presence of a threshold

The sigmoid logistic nonlinear function is described by the following equation (Bilgili et al. 2007):

$$ \phi (x) = \frac{1}{{1 + e^{ - x} }}. $$
(3)
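As a minimal sketch of Eqs. (1)–(3), the following Python function computes the output of a single neuron j. The function names and the example weights are illustrative assumptions, not values from this study.

```python
import math


def logistic(v: float) -> float:
    """Logistic sigmoid of Eq. (3)."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, w, theta):
    """Output x_j of neuron j following Eqs. (1)-(3).

    x     -- input signals x_0..x_p
    w     -- connection weights w_0j..w_pj (same length as x)
    theta -- threshold theta_j
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    u = sum(wi * xi for wi, xi in zip(w, x))  # Eq. (1): linear combiner output u_j
    v = u - theta                             # activation potential v_j = u_j - theta_j
    return logistic(v)                        # Eq. (2) with phi taken from Eq. (3)


# Hypothetical inputs and weights, for illustration only.
print(neuron_output([0.5, 1.0, -0.2], [0.4, -0.3, 0.8], theta=0.1))
```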

The type of ANN used in this study is a feedforward multilayer perceptron (MLP) which is the most commonly used ANN in hydro-meteorological applications. A set of neurons or nodes may be arranged as layers. The structure of a three-layer MLP is shown in Fig. 3. It consists of three layers: an input layer, a hidden layer and an output layer. The number of neurons in the input and output layers is defined based on the number of input and output variables of the system under investigation, respectively. However, the number of neurons in the hidden layer(s) is usually determined via a trial-and-error procedure. As seen from the figure, the neurons of each layer are connected to the neurons of the next layer by weights. To obtain optimal values of these connection weights, ANNs must be trained.
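The forward pass of such a three-layer MLP can be sketched in a few lines of NumPy. This sketch assumes a logistic hidden layer and a single linear output neuron; the bias vectors play the role of the negated thresholds \( -\theta_{j} \) of Eq. (2), and the random weights are placeholders rather than trained values.

```python
import numpy as np


def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a three-layer MLP (input -> hidden -> output).

    x  : (n_samples, I) input matrix
    W1 : (I, H) input-to-hidden weights,  b1 : (H,) hidden biases
    W2 : (H, 1) hidden-to-output weights, b2 : (1,) output bias
    """
    hidden = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))  # logistic hidden layer
    return hidden @ W2 + b2                          # linear output neuron


rng = np.random.default_rng(0)
I, H = 8, 8  # e.g. an (8, 8, 1) network
W1, b1 = rng.normal(size=(I, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)), np.zeros(1)
y = mlp_forward(rng.random((5, I)), W1, b1, W2, b2)
print(y.shape)
```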

Fig. 3

Simple configuration of a multilayer perceptron neural network

The neural network technique for wind speed prediction was first investigated by Mohandes et al. (1998), who compared its performance with an autoregressive model. This included investigations of the statistical characteristics of mean monthly and daily wind speed in Jeddah, Saudi Arabia. The autocorrelation coefficients and correlogram were employed to investigate the real diurnal variation of mean wind speed. They showed that, for prediction of wind speed time series, feedforward back propagation (FFBP) artificial neural networks had higher accuracy than the regression model used, and the best network based on the RMSE over the training data was one with 24 hidden units.

Cadenas and Rivera (2009) investigated hourly wind speed time series covering about 1 month at three representative sites in Mexico, and generated and compared diverse ANN configurations through error measures. First, a model with three layers and seven neurons was chosen, following the recommendations of various authors. Its results were not sufficiently satisfactory, so three other models were developed, consisting of three layers and six neurons, two layers and four neurons, and two layers and three neurons. The simplest model, with two layers comprising two input neurons and one output neuron, performed best in short-term wind speed forecasting and showed sufficient accuracy for use in energy supply. The authors indicated that the efficiency of the FFBP network may be increased by using conjugate gradient learning.

A comprehensive comparison of different ANNs, namely adaptive linear element, back propagation and radial basis function networks, for hourly wind speed forecasting was presented by Li and Shi (2010). The results of that study showed that no single neural network outperformed the others when the overall evaluation metrics were considered.

2.2 Genetic expression programming

Genetic expression programming (GEP), proposed by Candida Ferreira in 1999, is an evolutionary technique that combines features of its predecessors, genetic algorithms (GA) and genetic programming (GP). The main difference between the three algorithms resides in the coding of individuals, as described below. In GA, individuals are represented by symbolic strings of fixed length, the so-called chromosomes. Individuals in GP are nonlinear entities of different sizes and shapes, the so-called parse trees. GEP individuals are encoded as symbolic strings of fixed length (chromosomes), which are then expressed as nonlinear entities of different sizes and shapes, the so-called expression trees (ET).

GA and GP suffer from two main limitations. The fixed-length strings of GA are easy to manipulate genetically but lose in functional complexity; conversely, the parse trees of GP have high functional complexity but are extremely difficult to reproduce with modification. Furthermore, the use of genetic operators in GP is very limited, and the operators act directly on the parse trees. GEP resolves these constraints, leading to possible improvements in speed and accuracy (Ferreira 2002). Applications of GEP to practical problems have diversified in recent years, e.g., Guven et al. (2008), Ustoorikar and Deo (2008), Kisi and Guven (2010), Zakaria et al. (2010) and Khatibi et al. (2014). GEP automatically generates equations that describe cause-and-effect relationships in the data; it is significantly faster than GP in developing models and generates relatively simple equations that can be interpreted directly.

In GEP, the chromosome consists of a linear, symbolic string of fixed length. One chromosome can contain one or more genes, each encoding a sub-expression tree. Although the length of chromosomes is fixed, it is still possible to code expression trees of different sizes and shapes. The structural organization of genes into a head and a tail always guarantees the production of valid programs (Ferreira 2001a). GEP identifies an appropriate relationship for any given time series using two components: (i) a terminal set, comprising the independent variables and constants of the problem; and (ii) a function set of basic operators such as \( \{ +, -, *, /, \wedge, \sqrt{\cdot}, \log, \mathrm{alog}, \sin, \mathrm{asin}, \exp, \ldots \} \). The chromosomes built from these symbols play the role of the genome in biological systems, and the expression trees into which they are translated play the role of proteins.
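To make the head/tail encoding concrete, the sketch below decodes a fixed-length gene breadth-first (the Karva reading order used in GEP) into an expression tree and evaluates it. The function set, the protected division and square root, the terminal names `a` and `b` (standing, hypothetically, for lagged wind speeds) and the example genes are all illustrative assumptions. The tail length follows the rule t = h(n − 1) + 1, where h is the head length and n the maximum function arity, which is what guarantees that every gene decodes into a valid tree; multigenic chromosomes are linked here by addition.

```python
import math
from collections import deque

# Hypothetical function set: symbol -> (arity, callable).
FUNCTIONS = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b if b != 0 else 1.0),  # protected division (assumption)
    "Q": (1, lambda a: math.sqrt(abs(a))),            # protected square root (assumption)
}


def tail_length(head_len, max_arity=2):
    """Tail length t = h(n - 1) + 1, which guarantees a valid expression tree."""
    return head_len * (max_arity - 1) + 1


def decode(gene):
    """Read a gene in breadth-first (Karva) order; return the root node [symbol, children]."""
    symbols = iter(gene)
    root = [next(symbols), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [next(symbols), []]
            node[1].append(child)
            queue.append(child)
    return root  # symbols left unread form the non-coding region of the gene


def evaluate(node, env):
    """Evaluate an expression tree; terminals are looked up in `env`."""
    symbol, children = node
    if symbol in FUNCTIONS:
        return FUNCTIONS[symbol][1](*(evaluate(c, env) for c in children))
    return env[symbol]


def express_chromosome(chromosome, head_len, env):
    """Split a multigenic chromosome into genes and link their sub-trees by addition."""
    gene_len = head_len + tail_length(head_len)
    genes = [chromosome[i:i + gene_len] for i in range(0, len(chromosome), gene_len)]
    return sum(evaluate(decode(g), env) for g in genes)


# Head length h = 4, so each gene has 4 + 5 = 9 symbols.
env = {"a": 1.0, "b": 3.0}
print(evaluate(decode("Q+*ababab"), env))                    # sqrt(b*a + a)
print(express_chromosome("Q+*ababab" + "*baaababa", 4, env))  # sqrt(b*a + a) + b*a
```

Note that the last symbols of the first gene are never read: they form a non-coding region that mutation can later bring into use without breaking the fixed chromosome length.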

The process of GEP begins with the random generation of the chromosomes of the initial population. Then the chromosomes are expressed and the fitness of each individual is evaluated. The individuals are then selected according to fitness to reproduce with modifications, leaving progeny with new traits. The individuals of this new generation are, in their turn, subjected to the same developmental process: expression of the genomes, confrontation of the selection environment and reproduction with modification. The process is repeated for a certain number of generations, or until a solution has been found (Ferreira 2001b). The fundamental steps of the GEP are schematically represented in Fig. 4.
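The generational cycle described above (random initial population, expression, fitness evaluation, selection, reproduction with modification, repeat) can be illustrated with a heavily simplified, unigenic, mutation-only GEP loop. This is a sketch under stated assumptions, not the configuration used in this study: the function and terminal sets are hypothetical, truncation selection with elitism is used for brevity, and the transposition and recombination operators of full GEP are omitted.

```python
import math
import random
from collections import deque

# Hypothetical, deliberately small function and terminal sets.
FUNCS = {"+": (2, lambda a, b: a + b),
         "-": (2, lambda a, b: a - b),
         "*": (2, lambda a, b: a * b)}
TERMS = ["a", "b"]          # e.g. a = x(t-1), b = x(t-2) (illustrative only)
HEAD = 5
TAIL = HEAD * (2 - 1) + 1   # t = h(n - 1) + 1 with maximum arity n = 2


def random_gene(rng):
    """Random gene: functions or terminals in the head, terminals only in the tail."""
    head = [rng.choice(list(FUNCS) + TERMS) for _ in range(HEAD)]
    tail = [rng.choice(TERMS) for _ in range(TAIL)]
    return head + tail


def express(gene, env):
    """Translate the gene breadth-first (Karva order) into a tree and evaluate it."""
    root = [gene[0], []]
    queue, pos = deque([root]), 1
    while queue:
        node = queue.popleft()
        arity = FUNCS[node[0]][0] if node[0] in FUNCS else 0
        for _ in range(arity):
            child = [gene[pos], []]
            pos += 1
            node[1].append(child)
            queue.append(child)

    def value(node):
        if node[0] in FUNCS:
            return FUNCS[node[0]][1](*(value(c) for c in node[1]))
        return env[node[0]]

    return value(root)


def rmse(gene, data):
    """Fitness measure: RMSE of the expressed program over (env, target) pairs."""
    return math.sqrt(sum((express(gene, env) - y) ** 2 for env, y in data) / len(data))


def mutate(gene, rng, rate=0.1):
    """Point mutation that preserves the head/tail structure."""
    out = list(gene)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.choice(list(FUNCS) + TERMS) if i < HEAD else rng.choice(TERMS)
    return out


def evolve(data, pop_size=50, generations=100, seed=0):
    rng = random.Random(seed)
    pop = [random_gene(rng) for _ in range(pop_size)]      # random initial population
    for _ in range(generations):
        ranked = sorted(pop, key=lambda g: rmse(g, data))  # express and evaluate fitness
        if rmse(ranked[0], data) < 1e-9:                   # stop once a solution is found
            break
        parents = ranked[: pop_size // 2]                  # truncation selection (assumption)
        pop = [ranked[0]] + [mutate(rng.choice(parents), rng)  # elitism + modified offspring
                             for _ in range(pop_size - 1)]
    return min(pop, key=lambda g: rmse(g, data))


# Synthetic target a*b + a, for illustration only.
data = [({"a": a, "b": b}, a * b + a) for a in (0.5, 1.0, 2.0, 3.0) for b in (0.5, 1.5, 2.5)]
best = evolve(data, generations=50)
print("".join(best), rmse(best, data))
```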

Fig. 4

The flowchart of a genetic expression algorithm

Flores et al. (2005) explored the applicability of genetic programming to wind speed prediction. Unlike other artificial intelligence techniques, GP provided closed-form models for the time series under analysis. They used a form of GEP to implement the modeling tools and applied the techniques to the time series of monthly average wind speed in the Isthmus of Tehuantepec, Mexico. GP was shown to be a good alternative for providing wind speed prediction models.

The least squares support vector machine (LS-SVM) method for short-term wind speed prediction was put forward by Xiaojuan et al. (2009), who also analyzed the influence of the LS-SVM parameter selection on prediction accuracy. A genetic algorithm was adopted to optimize the LS-SVM parameters and to establish a GA-based LS-SVM short-term wind speed prediction model. A simulation example showed that the method proposed in that study can carry out short-term wind speed prediction quickly and effectively.

Huang et al. (2011) used a real-valued genetic algorithm (RGA)-based least-squares support vector machine (LS-SVM) to predict short-term regional wind speed precisely. A dataset including the time, temperature, humidity and average regional wind speed, measured on a randomly selected date at a wind farm located in Penghu, Taiwan, was used to verify the forecasting efficiency of the proposed RGA-based LS-SVM.

Arellano et al. (2012) used numerical weather prediction models to produce wind speed forecasts at high spatial resolution. The integration of the Weather Research and Forecasting Advanced Research WRF (WRF-ARW) mesoscale model with four different downscaling approaches was presented. WRF-ARW forecasts and observations at three different sites in the state of Illinois, USA, were analyzed before and after applying the downscaling techniques. Three of the proposed methods required a predefined model. The fourth approach, based on genetic programming (GP), implicitly found the optimal model to downscale the WRF forecasts, so no prior assumptions about the model had to be made. The results demonstrated that GP was able to downscale the wind speed predictions successfully, significantly reducing the inherent error of the numerical models considered.

2.3 Multiple linear regression (MLR)

Multiple linear regression analysis is a widely used technique for expressing the dependence of a response variable on several predictors. It fits a linear combination of multiple input signals \( x_{i} \) to a single output signal y, as defined by Eq. (4):

$$ y = a_{0} + \sum _{i = 1}^{n} a_{i} x_{i} , $$
(4)

where the \( a_{i} \) values are the regression coefficients, which can be estimated by least squares or a similar method. In this study, the regression coefficients were determined using the least squares method. Stepwise regression is applied as a robust method for selecting the best subset models; it is based on adding or deleting the variable(s) with the greatest impact on the residual sum of squares (Ghorbani et al. 2012).
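A minimal least-squares sketch of Eq. (4) for an autoregressive input structure is given below. It assumes that structure M_d uses the inputs x(t − 1), …, x(t − d), which is consistent with Eq. (8) for M13; the helper names are hypothetical, and the stepwise variable selection mentioned above is not implemented.

```python
import numpy as np


def lagged_design(series, n_lags):
    """Inputs [x(t-1), ..., x(t-n_lags)] and targets x(t) for input structure M_{n_lags}."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[n_lags - d: len(x) - d] for d in range(1, n_lags + 1)])
    return X, x[n_lags:]


def fit_mlr(X, y):
    """Least-squares estimates [a_0, a_1, ..., a_n] of Eq. (4)."""
    A = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept a_0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate Eq. (4) for each row of X."""
    return coef[0] + X @ coef[1:]


# Synthetic series for illustration only (not the Kersey data).
rng = np.random.default_rng(0)
wind = 5.0 + np.cumsum(rng.normal(scale=0.3, size=300))
X, y = lagged_design(wind, 2)
coef = fit_mlr(X, y)
print(coef)
```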

2.4 Persistence method

The Persistence method is the simplest way of forecasting short-term wind speed and assumes that the future wind speed is the same as the current one, i.e., \( \hat{x}_{t + k} = x_{t} \). This method is used as a reference for evaluating the performance of advanced forecasting methods (Zhu and Genton 2012).
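The Persistence forecast amounts to shifting the series by the lead time k. The sketch below, with a hypothetical helper name and made-up example values, returns the forecasts together with the observations they should be compared against.

```python
import numpy as np


def persistence_forecast(series, k=1):
    """Persistence forecast x_hat(t + k) = x(t).

    Returns the forecasts and the observations they should be compared with.
    """
    if k < 1:
        raise ValueError("lead time k must be at least 1")
    x = np.asarray(series, dtype=float)
    return x[:-k], x[k:]


forecast, observed = persistence_forecast([3.1, 4.0, 5.2, 4.8], k=1)
print(forecast, observed)
```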

2.5 Used data and performance criteria

In this study, hourly wind speed time series were used from the Kersey site in Colorado, USA (latitude 40°22′ 36″ north and longitude 104°31′ 55″ west; altitude 1409.7 m above sea level).

The dataset used in this study corresponds to the months of January 2005 through January 2012. The wind speed characteristics at the site are presented in Table 1, and the hourly time series is displayed in Fig. 5.

Table 1 Summary of wind speed data (x t , m/s) used in the study
Fig. 5

Time series plots of hourly wind speeds at the Kersey site for the period of January of each year

Hourly wind speed time series for the site were downloaded from the Colorado Climate Center (http://ccc.atmos.colostate.edu/). As shown in Fig. 5, the data for the first 7 years (2005–2011) were used for training and those for 2012 for model testing. Four performance criteria are employed to assess the goodness of fit of the selected models: the correlation coefficient (CC), root mean square error (RMSE), Nash–Sutcliffe efficiency coefficient (E) and Akaike information criterion (AIC). The latter two criteria are expressed as:

$$ E = 1 - \frac{{\sum\limits_{i = 1}^{N} {(x_{\text{obs}} - x_{\text{comp}} )^{2} } }}{{\sum\limits_{i = 1}^{N} {(x_{\text{obs}} - \bar{x}_{\text{obs}} )^{2} } }}, $$
(5)
$$ {\text{AIC}} = N \cdot {\text{Ln}}\left( {\text{MSE}} \right) + 2k. $$
(6)

In these expressions, i is an integer varying from 1 to N; \( x_{\text{obs}} \) and \( x_{\text{comp}} \) are the observed and computed wind speeds, respectively; a bar above a variable denotes its average value; N is the total number of records; k is the number of model coefficients or parameters; and MSE is the mean square error.
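The four criteria can be computed together as sketched below, using the natural logarithm in Eq. (6). How k is counted for each model type is an assumption left to the caller (for instance, an MLR model with n inputs has n + 1 coefficients); the function name is hypothetical.

```python
import numpy as np


def performance(obs, comp, k):
    """CC, RMSE, Nash-Sutcliffe E (Eq. 5) and AIC (Eq. 6) for one model."""
    obs = np.asarray(obs, dtype=float)
    comp = np.asarray(comp, dtype=float)
    err = obs - comp
    mse = np.mean(err ** 2)
    return {
        "CC": np.corrcoef(obs, comp)[0, 1],
        "RMSE": np.sqrt(mse),
        "E": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "AIC": len(obs) * np.log(mse) + 2 * k,  # natural log; k = number of parameters
    }


# Made-up observed and computed values, for illustration only.
print(performance([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0], k=2))
```

Because the AIC term N·ln(MSE) grows with the number of records, AIC values are only comparable between models evaluated on the same dataset.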

The CC, which may range from −1 to 1, is a statistical measure of how well the regression line fits the observed data; a value of one indicates that the regression line fits the observed data perfectly. The CC also indicates whether the two variables move in the same or opposite directions, and their degree of linear association.

The RMSE can provide a balanced evaluation of the goodness of fit of the model; the best RMSE would be zero, or close to it.

The range of E lies between 1.0 (perfect fit) and −∞. A lower than zero efficiency indicates that the mean value of the observed time series would have been a better predictor than the model being tested.

The Akaike information criterion (AIC) balances the goodness of fit, measured by the MSE term in Eq. (6), against model complexity, penalized through the number of parameters k (Kisi and Guven 2010). In the selection process, the model with a lower AIC value is preferred.

3 Results

3.1 Selection of inputs

Including multiple inputs in any predictive system often comes at the expense of increased system complexity, with diminishing returns. Selecting relevant input variables is therefore an important problem when developing such systems. A review of past studies indicates that modeling or predicting wind speeds with machine learning techniques usually involves several previous wind records (see, e.g., Mohandes et al. 2004; Monfared et al. 2009; Cadenas and Rivera 2010; Li and Shi 2010; Sheela and Deepa 2013).

In this study, input variables were determined from the cross-correlation (i.e., the autocorrelation of the series) between the wind speed at the present time x(t) and the time-lagged wind speeds x(t − 1), x(t − 2), …, x(t − d). Figure 6 shows the variation of the cross-correlation with lag time for the wind speed at the Kersey site. The cross-correlation values are clearly greater than zero for time lags of up to 14 h and approximately zero beyond a lag of 15 h. Therefore, any wind speed value to be forecast is regressed on at most its 14 antecedent values. The 14 input structures are shown in Table 2; these were used to train and test the ANN, GEP, Persistence and MLR models.
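The lag screening described above can be sketched with the standard sample autocorrelation estimator. The cut-off `threshold` in the helper below is a hypothetical parameter: the text only states that the correlation becomes approximately zero beyond 14 h, not which numerical criterion was used.

```python
import numpy as np


def autocorrelation(series, max_lag):
    """Sample autocorrelation r(d) between x(t) and x(t - d) for d = 1..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[d:] * x[:-d]) / denom for d in range(1, max_lag + 1)])


def max_useful_lag(acf, threshold):
    """Number of leading lags whose correlation stays above `threshold` (hypothetical rule)."""
    for d, r in enumerate(acf, start=1):
        if r <= threshold:
            return d - 1
    return len(acf)


# Synthetic stand-in series, NOT the Kersey record.
rng = np.random.default_rng(0)
wind = np.cumsum(rng.normal(size=500))
acf = autocorrelation(wind, 20)
print(max_useful_lag(acf, threshold=0.1))
```

If the screening returns d, the candidate input structures are M1 to Md, with M_j using x(t − 1), …, x(t − j).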

Fig. 6

Cross correlations between time-lagged wind speed (m/s) values at the Kersey site

Table 2 Model structures for hourly wind speed prediction

3.2 ANN training

The ANN model was trained using the inputs represented in Fig. 3: the outputs obtained from the network are compared with the target output values to estimate errors, the computed errors are back-propagated through the network, and the connection weights are updated until a desirable level of performance is reached, if at all. The logarithmic sigmoid transfer function was used in the hidden layer, and a linear transfer function was employed from the hidden layer to the output layer, because the linear function is known to be robust for a continuous output variable.
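A compact back-propagation sketch for this architecture (logistic hidden layer, linear output) is shown below. It assumes plain batch gradient descent on the mean squared error with a fixed learning rate; the text does not state which training algorithm or learning rate was used, so `lr`, `epochs` and the weight initialization are illustrative choices.

```python
import numpy as np


def logsig(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(params, X):
    """Forward pass: logistic hidden layer followed by a linear output neuron."""
    W1, b1, W2, b2 = params
    return (logsig(X @ W1 + b1) @ W2 + b2).ravel()


def train_mlp(X, y, n_hidden, lr=0.1, epochs=2000, seed=0):
    """Batch gradient-descent back-propagation on the mean squared error."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    target = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(X)
    for _ in range(epochs):
        h = logsig(X @ W1 + b1)                # hidden-layer outputs
        out = h @ W2 + b2                      # network outputs
        d_out = 2.0 * (out - target) / n       # gradient of the MSE w.r.t. the outputs
        d_h = (d_out @ W2.T) * h * (1.0 - h)   # error back-propagated through logsig'
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(axis=0)
    return W1, b1, W2, b2


# Synthetic inputs already scaled to [0, 1], for illustration only.
rng = np.random.default_rng(1)
X = rng.random((200, 2))
y = 0.3 * X[:, 0] + 0.5 * X[:, 1]
params = train_mlp(X, y, n_hidden=4)
print(np.sqrt(np.mean((predict(params, X) - y) ** 2)))
```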

The number of neurons in the hidden layer was chosen to achieve the best network architecture; the three-layer network achieving the minimum RMSE was selected.

The M1 to M14 model structures, with different input structures, were trained and tested. The optimum number of hidden neurons was sought among I, 2I and 2I + 1, where I is the number of inputs; this approach was successfully implemented by, e.g., Makarynskyy (2004), Mishra and Desai (2006) and Makarynskyy and Makarynska (2007). The values in the input and output layers were normalized to the range 0 to 1. The effect of changing the number of hidden neurons on CC, RMSE, E and AIC for each model is presented in Table 3.
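The normalization and the hidden-layer candidates can be sketched as follows. Min-max scaling is one common way to map values to [0, 1]; fitting the minimum and maximum on the training data only is an assumption, since the text does not specify which data defined the range. Network outputs must be mapped back to m/s before computing RMSE or AIC.

```python
import numpy as np


def minmax_fit(train):
    """Return the (min, max) of the training values used for scaling."""
    train = np.asarray(train, dtype=float)
    return train.min(), train.max()


def minmax_scale(x, lo, hi):
    """Map values to [0, 1] using the training range."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)


def minmax_unscale(z, lo, hi):
    """Map scaled network outputs back to m/s."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo


def hidden_candidates(n_inputs):
    """Hidden-layer sizes I, 2I and 2I + 1 tried for a structure with I inputs."""
    return [n_inputs, 2 * n_inputs, 2 * n_inputs + 1]


for n_inputs in (1, 8, 14):  # e.g. structures M1, M8 and M14
    print(n_inputs, hidden_candidates(n_inputs))
```

Test-period values outside the training range will fall slightly outside [0, 1] under this scheme, which the linear output neuron can still represent.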

Table 3 Comparison of ANN structures for the Kersey Site

In the training phase, the M8 model with the number of hidden neurons equal to I achieves the best CC, RMSE and E values of 0.892, 0.839 m/s and 0.796, respectively. This model is selected for comparison purposes, although the best performance in the validation phase is achieved by the M8 model with 2I + 1 hidden neurons (CC, RMSE and E values of 0.914, 0.874 m/s and 0.834, respectively). Hence, in accordance with these performance indices, ANN (8, 8, 1) has been selected as the most effective and appropriate ANN model. Notably, the performance parameters are not overly sensitive to model structure, and there is a conflict between them: the model structure selected on the basis of CC, RMSE and E does not produce the lowest AIC value. According to the AIC values, the lowest value, −1266, is produced by the M3 model structure with 2I hidden neurons.

3.3 GEP training

In the preliminary investigations for the GEP model, the four sets of operators presented in Table 4 were tested.

Table 4 Defined operators for GEP modeling

The GEP structure was developed based on the authors’ previous studies, a literature review and a trial-and-error procedure. GEP chromosomes are generally composed of more than one gene, each coding for a sub-program or sub-expression tree. The sub-programs can then interact with one another in different ways, forming a more complex program. In the GEP model, the sub-expression trees must be linked through a linking function. Both addition and multiplication were tested as linking functions, and addition was found to provide a better fitness value. This study therefore employed (i) the RMSE fitness function (based on the absolute error) and (ii) addition, rather than multiplication, to link the sub-expression trees (genes) of each chromosome into the final algebraic expression.

The default parameters and the architecture used for GEP modeling are given in Table 5. In this table, insertion sequence (IS) elements are short fragments of the genome, with a function or terminal in the first position, that transpose to the head of genes at any position except the root. The IS transposition operator randomly chooses the chromosome, the start and termination points of the IS element, and the target site. Root insertion sequence (RIS) elements are short fragments with a function in the first position that transpose to the start (root) position of genes. The RIS transposition operator randomly chooses the chromosome, the gene to be modified, and the start and termination points of the RIS element.
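The two transposition operators can be sketched on a single gene stored as a string. The positions are passed in explicitly here instead of being drawn at random, so that the effect is easy to follow; the helper names and example genes are hypothetical. In both cases the head keeps its fixed length (symbols pushed past its end are discarded) and the tail is untouched, so the head/tail rule still guarantees a valid expression tree.

```python
def is_transpose(gene, head_len, start, length, target):
    """IS transposition within one gene.

    The fragment gene[start:start + length] is copied into the head at position
    `target` (never the root, so 1 <= target < head_len); the head is then
    truncated back to its fixed length and the tail is left unchanged.
    """
    if not 1 <= target < head_len:
        raise ValueError("IS elements may not be inserted at the root")
    fragment = gene[start:start + length]
    head = gene[:head_len]
    new_head = (head[:target] + fragment + head[target:])[:head_len]
    return new_head + gene[head_len:]


def ris_transpose(gene, head_len, start, length, functions):
    """RIS transposition within one gene.

    Starting at `start`, the head is scanned downstream for the first function;
    a fragment of `length` symbols beginning there is inserted at the root and
    the head is truncated back to its fixed length.
    """
    pos = start
    while pos < head_len and gene[pos] not in functions:
        pos += 1
    if pos >= head_len:  # no function found in the head: gene left unchanged
        return gene
    fragment = gene[pos:pos + length]
    new_head = (fragment + gene[:head_len])[:head_len]
    return new_head + gene[head_len:]


gene = "+*ab" + "abbab"  # head length 4, tail length 5
print(is_transpose(gene, 4, start=1, length=2, target=1))
print(ris_transpose("a+b*" + "abbab", 4, start=0, length=2, functions="+-*/Q"))
```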

Table 5 Initial setting for implementing the GEP models

The function sets defined in Table 4 were investigated. Their performances were evaluated for each model structure in Table 2, and the results are presented in Table 6. The results show that the CC, RMSE and E values are almost insensitive to the model structure, whereas the AIC values discriminate more between the structures. Based on CC, RMSE and E, the M2 model with function set F3 performs best in the training phase, with values of 0.889, 0.869 m/s and 0.79, respectively (its AIC value of −1615.9 is not the best), and this is selected as the representative model. Notably, the same M2 model with function set F3 also performs best in the testing phase in terms of CC, RMSE and E, with values of 0.884, 0.909 m/s and 0.781, respectively. Although the quality of these performance parameters drops in the testing phase, testing-phase performance should not be used for decision making, as discussed in more detail later. Based on the AIC values, the M14 model with function set F2 performs best in the training phase, producing the lowest AIC value of −1110.

Table 6 The results of GEP models for the training and testing period

The expression trees obtained for the GEP (M2, F3) model are shown in Fig. 7, and the simplified equation derived from them is as follows:

Fig. 7

The expression trees for the selected GEP model in this study

$$ x_{t} = - 0.039 + x_{t - 1} + \exp [ - \exp (x_{t - 1} )] + 0.0136(x_{t - 2} )^{2} . $$
(7)
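Eq. (7) can be applied directly as a one-step-ahead forecaster. The sketch below assumes that the inputs are hourly wind speeds in m/s (the text does not state whether the GEP inputs were normalized); the function names are hypothetical.

```python
import math


def gep_m2_f3(x_lag1, x_lag2):
    """One-step-ahead wind speed x_t from Eq. (7), given x_{t-1} and x_{t-2}."""
    return -0.039 + x_lag1 + math.exp(-math.exp(x_lag1)) + 0.0136 * x_lag2 ** 2


def forecast_series(series):
    """Apply Eq. (7) along a record, forecasting x_t for t = 2, ..., len(series) - 1."""
    return [gep_m2_f3(series[t - 1], series[t - 2]) for t in range(2, len(series))]


# Made-up hourly wind speeds in m/s, for illustration only.
print(forecast_series([4.2, 4.8, 5.1, 5.0, 4.6]))
```

For typical wind speeds the term exp[−exp(x_{t−1})] is negligible and only matters near calm conditions, so Eq. (7) behaves essentially like a persistence forecast plus a small quadratic correction in x_{t−2}.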

3.4 MLR fitting

The standard form of the MLR model based on Eq. (4) was used for wind speed prediction. Table 7 presents the performance of the MLR model, which is almost insensitive to the model structure. Based on CC, RMSE and E, the M13 model produces the best values of 0.888, 0.874 m/s and 0.788, respectively (note that its AIC value of −1527.0 is not optimal), and this is selected as the representative model. For the testing phase, the quality of these performance parameters drops, and the M6 model appears to perform best in terms of CC, RMSE and E, with values of 0.882, 0.914 m/s and 0.778, respectively. However, this model is not selected, as discussed further in Sect. 4. The table further shows that, on the basis of the AIC statistic, M1 would be the model structure of choice, with an AIC value of −1487.4.

Table 7 The results of the MLR model for the training and testing period

The equation of the selected MLR model (M13) is as follows:
$$ x_{t}=0.212+0.972 \times x_{t - 1}- 0.104\times x_{t - 2}+0.026 \times x_{t - 3}-0.021\times x_{t - 4}+0.018\times x_{t - 5}+0.08\times x_{t - 6} - 0.043 \times x_{t - 7} + 0.024 \times x_{t - 8} + 0.011 \times x_{t - 9} - 0.039 \times x_{t - 10} + 0.014 \times x_{t - 11} - 0.011 \times x_{t - 12} + 0.010 \times x_{t - 13}. $$
(8)

3.5 The results by the Persistence method

This method requires no training owing to its simple structure, i.e., it is implemented only as M1, with no parameters to identify. Table 7 also presents the performance of the Persistence method, which produces CC, RMSE, E and AIC values of 0.886, 0.905 m/s, 0.772 and −1148.1, respectively, for the training period, and 0.879, 0.958 m/s, 0.757 and −12.3, respectively, for the testing period.

4 Model inter-comparisons

To further investigate the fitness of the developed models, the performance criteria of the optimum ANN, GEP and MLR models and of the Persistence method were compared with one another. The overall performances of the models for the training and testing datasets are summarized in Table 8 and show mixed results. In the first place, the CC, RMSE and E values are quite insensitive to model structure. If the nominal statistic values alone were taken as the basis for decision making, the ANN model would perform better than the GEP and MLR models and the Persistence method. However, the Persistence method produces the lowest AIC value. Under such a rigid view, this conflict would offer no means of resolution. The results therefore call for a more detailed scrutiny, as discussed below.

Table 8 Comparison of the performances of Persistence method, and MLR, ANN and GEP models for the Kersey site

The predicted values were also compared with the observed values for the testing period, and the results are presented in Fig. 8.

Fig. 8

Comparison of the testing results of ANN, GEP, Persistence and MLR models for the Kersey site

5 Discussions

The inter-comparison of the results above suggests that the Persistence method is the “best” in terms of the AIC statistic, yet this is a simple method that hardly warrants being called a model. In fact, this method may be labeled as a model with “zero information content”, i.e., low-grade information. If this were the best model, would it not expose a flaw in mathematical modeling, namely that mathematical sophistication serves little purpose? The answer is straightforward: “no, the zero-order Persistence model has very little to offer.” While in a pluralistic culture of modeling any technique has merits worth considering, the selection criteria should not be based on the nominal values of the performance parameters.

In the first place, “lead time” is a very important concept in forecasting practice; it refers to how far ahead of real time a forecast is required. The longer the lead time of a forecasting task, the greater the utility of the forecasts. However, forecasting too far into the future is associated with increasing inherent uncertainty. Figure 6 shows that the limit for the forecasting lead time is 14 h, but it also indicates that the autocorrelation at a 14 h lead time is very low (high uncertainty). On this basis, the various model structures defined in Table 2 may be viewed as follows: (i) any model (ANN, GEP or MLR) using the M1 model structure extracts a limited amount of information from the recorded data; (ii) ANN, GEP or MLR models using the M14 model structure extract the maximum amount of information from the recorded data; and (iii) the model structures between M1 and M14 operate in between. Information is extracted by estimating the values of the regression parameters in Eqs. (1) and (2) or (4), or the expression trees.

Notwithstanding the above, these regression models are also required to be parsimonious, i.e., the number of independent variables should be neither too few nor too many. Too many independent variables add to the complexity of the mathematical problem without a corresponding improvement in model accuracy, whereas too few undermine the accuracy. Judging the parsimony of a model is not simple either.

The Persistence method, which uses only the value observed 1 h earlier, has very limited utility, although it is still better than nothing. It is widely known that assuming today's weather will be the same as yesterday's can still be useful, but such a forecast can also be completely wrong. The Persistence method is nevertheless helpful as a baseline for understanding the forecasting problem better. It is not recommended as a forecasting model in its own right.
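As a baseline, the Persistence method amounts to a one-hour shift of the record. The sketch below shows this together with RMSE and CC, plus an MSE-based skill score relative to Persistence; the skill score is one common convention in forecast verification and is not taken from this paper. The wind speeds and function names are hypothetical.

```python
import numpy as np

def persistence_forecast(x):
    """One-step Persistence: the forecast for hour t+1 is the value observed at hour t."""
    x = np.asarray(x, dtype=float)
    return x[:-1], x[1:]  # (forecasts, matching observations)

def rmse(obs, pred):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))

def cc(obs, pred):
    return float(np.corrcoef(obs, pred)[0, 1])

def skill_vs_persistence(obs, pred_model, pred_persistence):
    """MSE-based skill: 1 = perfect, 0 = no better than Persistence, < 0 = worse."""
    return 1.0 - rmse(obs, pred_model) ** 2 / rmse(obs, pred_persistence) ** 2

x = [5.0, 6.0, 6.5, 6.0, 7.0]  # hypothetical hourly wind speeds (m/s)
pred, obs = persistence_forecast(x)
print(rmse(obs, pred), cc(obs, pred))
```

Persistence has no fitted parameters, which is why a parameter-penalizing criterion such as AIC can rank it favorably even though it carries almost no information beyond the last observation.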

While the MLR model structure selected on the basis of CC (M13) extracts information from 13 h of autocorrelated data, the structure selected on the basis of AIC (M1) extracts information from only 1 h of antecedent record, and this should not be a preferred option either. However, the M13 model structure for MLR produces fluctuating results, with a greater tendency to underestimate (and at times to overestimate). Hence, this model has its problems, though it is easy to implement.

The model structure selected for GEP on the basis of the CC statistic is M2, but that selected on the basis of AIC is M14. However, the CC values are insensitive to the model structure, and therefore the choice of model structure should be based on AIC. This conflict between the performance criteria is reversed for ANN: the selection based on CC suggests M8, I, which has an 8 h information extraction base, whereas that based on AIC suggests M3, 2I, which has a 3 h information extraction base.

The above conflicts are not the only ones. For instance, the best model would be expected to show the smallest drop in the performance criteria when the trained model is applied to forecasting over the testing period. However, the results presented above indicate that this expectation does not generally hold. Taking an overall view, it is clear that seeking a single best model is futile; rather, different models have to be used, and the individual models have to be studied in greater detail to gain insight into their performance. On this basis, both GEP and ANN are equally credible selections, and even MLR should not be dismissed, as it has its uses. The better performance of GEP and ANN over MLR has been underlined by other researchers, e.g., Beyer et al. (1994); Mohandes et al. (1998); Sfetsos (2000); Li and Shi (2010); Cadenas and Rivera (2009); Bilgili and Sahin (2010); Upadhyay et al. (2011).
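The drop in performance between the training and testing periods can be measured as in the following sketch, which fits an MLR on a chronological training segment and reports CC and RMSE for both periods. The 70/30 split, the synthetic AR(1) data and the function names are assumptions for illustration only; the paper's own data partition is described elsewhere.

```python
import numpy as np

def lagged_design(x, k):
    """Structure Mk: the k most recent hourly values as inputs, next value as target."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.column_stack([x[k - 1 - j : n - 1 - j] for j in range(k)])
    return X, x[k:]

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def cc(obs, pred):
    return float(np.corrcoef(obs, pred)[0, 1])

def train_test_drop(x, k, train_frac=0.7):
    """Fit an MLR on the first part of the record, then score both periods.

    Returns per-period (CC, RMSE) and the drop in CC from training to testing.
    The split is chronological, so no later values are used for training.
    """
    X, y = lagged_design(x, k)
    split = int(train_frac * len(y))
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    pred = A @ coef
    stats = {
        "train": (cc(y[:split], pred[:split]), rmse(y[:split], pred[:split])),
        "test": (cc(y[split:], pred[split:]), rmse(y[split:], pred[split:])),
    }
    return stats, stats["train"][0] - stats["test"][0]

# Synthetic AR(1) series around 6 m/s (illustrative only, not the recorded data).
rng = np.random.default_rng(2)
x = np.empty(3000)
x[0] = 6.0
for t in range(1, len(x)):
    x[t] = 6.0 + 0.8 * (x[t - 1] - 6.0) + rng.normal()

for k in (1, 3, 14):
    stats, drop = train_test_drop(x, k)
    print("M%d" % k, stats, "CC drop: %.3f" % drop)
```

On real records the drop need not be smallest for the structure that scores best on a single statistic, which is the kind of conflict described above.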

Short-term wind speed forecasting, as studied in this paper, is crucial for wind energy production and for aircraft and airport safety, as well as for other applications, such as those described by Lei et al. (2009), Luickx et al. (2008), Monfared et al. (2009), Pinson et al. (2009) and Sfetsos (2000) on estimating the efficiency of wind power-generating systems and on their integration and scheduling. Insight into wind speed behavior is also used for positioning the systems in sheltered areas through predictive estimates of wind-dependent energy (Burton et al. 2001). Modeling may be the key to cost-efficient wind speed prediction. In this paper, a comparison of the performance of different modeling strategies shows that forecasting wind speed is feasible, but different techniques lead to different results, and the choice between them is not easy. Thus, decision making has to be informed by these modeling results, and decisions should be based on an understanding of the inherent modeling uncertainties.

6 Concluding remarks

The objective of this study was to predict wind speed by using wind speed time series recorded at an hourly interval. The recorded data from the Kersey site in Colorado, USA, were used to investigate the performance of two modeling strategies: artificial neural networks (ANNs) and genetic expression programming (GEP). The results were compared with those of a multiple linear regression (MLR) model and the Persistence method.

Autocorrelation and partial autocorrelation functions of the wind speed data at various lags were used to determine the number of past observations that provide effective inputs to the models. The performances of the models were estimated and compared with one another. These inter-comparisons indicated that the ANN, GEP and MLR models and the Persistence method can be successfully applied to short-term wind speed forecasting. The comparison shows that forecasting wind speed is feasible, but different techniques lead to different results, and the choice between them is not easy. Decision making therefore has to be informed by these modeling results and based on an understanding of the inherent uncertainties. The results also show that it is futile to seek a single best model; rather, different models have to be used, and the individual models have to be studied in greater detail to gain insight into their performance. On this basis, both GEP and ANN are equally credible selections, and even MLR should not be dismissed, as it has its uses.