Introduction

Background

Because the availability of wind resources varies naturally, knowledge of the wind speed distribution is important for assessing the economic viability of wind energy systems at any location. A two-parameter Weibull distribution is often used to represent wind speeds at a given location. Some studies used a single Weibull distribution to describe wind speeds at a particular location by treating all of the measured data as one group [1,2,3,4,5,6,7]. Since a single Weibull distribution fails to capture intra-annual variability of wind speed, other studies have segregated the measured wind data by month of the year [2, 8,9,10,11,12,13,14,15], season [8, 13, 16], or hour of the day [13] to account for variations in wind speed and estimate the Weibull parameters more accurately.

However, existing approaches to fitting a Weibull distribution to observed wind data have ignored the fact that wind speed varies with both the hour of the day and the month of the year, which may result in a poor fit. Diurnal variations are not the same for every month, because some months have higher daily wind speeds than others. For example, the hourly wind speed profiles for each month at selected locations in India are shown in Fig. 1. At these locations, the diurnal profile of wind speed depends on the month: the monsoon months, from June to September, usually have higher wind speeds than the other months at every hour of the day.

Fig. 1 Monthly and diurnal variations of mean wind speeds at four sample locations in India

The present study explored how segregating the data at different temporal resolutions affected the accuracy of representing wind speed. The study calculated the wind energy potential from the observed data and compared it with the potential estimated from fitted Weibull distributions for the following four groupings of the observed data: (i) a single Weibull distribution was fitted to the entire dataset; (ii) twelve Weibull distributions were fitted separately to the data for each month; (iii) twenty-four Weibull distributions were fitted separately to the data for each hour of the day; and (iv) 288 (24 × 12) Weibull distributions were fitted separately to the data for each hour of each month.

Knowledge Contribution

Existing studies on fitting Weibull distributions capture either monthly or diurnal variation, but not both, which lowers the accuracy of the estimated Weibull parameters. This study addresses this research gap by capturing diurnal variability within each month to better assess the availability of wind resources. The present study showed that fitting Weibull parameters to wind data at a much more granular temporal scale improves the accuracy of the fit. Such improvements may significantly affect the choice of suitable locations for installing wind turbines.

Method and Data

An overview of the methodological steps (based on [2, 7]) is shown in Fig. 2. The following subsections describe the methodology used in the present work.

Fig. 2 Methodological framework used in the present work. Note: MCS refers to Monte Carlo simulation

Parameter Estimation

A Weibull distribution describes observed wind speed data reasonably well [17,18,19,20,21]. Equation (1) expresses the probability density function of wind speed v as described by a two-parameter Weibull distribution at a particular location [18]:

$$f\left( v \right) = \left( \frac{k}{c} \right) \left( \frac{v}{c} \right)^{k - 1} \exp \left[ { - \left( \frac{v}{c} \right)^{ k} } \right]$$
(1)

where k is the dimensionless shape parameter (a higher k means a higher Vmax but a lower power density) and c is the scale parameter (m/s).
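As a minimal illustration of Eq. (1), the sketch below evaluates the Weibull density in Python (the language the study reports using). The function name `weibull_pdf` is hypothetical, and the parameter values (k = 2, c = 6 m/s) are chosen only for a numerical check that the density integrates to approximately one.

```python
import math


def weibull_pdf(v: float, k: float, c: float) -> float:
    """Two-parameter Weibull density of Eq. (1).

    v: wind speed (m/s); k: shape parameter (dimensionless); c: scale parameter (m/s).
    """
    if v < 0:
        return 0.0  # negative wind speeds have zero probability
    if v == 0:
        # f(0) is 0 for k > 1, k/c for k = 1, and unbounded for k < 1
        return 0.0 if k > 1 else (k / c if k == 1 else math.inf)
    x = v / c
    return (k / c) * x ** (k - 1) * math.exp(-(x ** k))


# Rough numerical check: a Riemann sum of f(v) over 0-50 m/s should be close to 1.
dv = 0.01
total = sum(weibull_pdf(i * dv, 2.0, 6.0) * dv for i in range(1, 5000))
print(f"integral ~ {total:.4f}")
```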

Several methods have been used to estimate the k and c parameters from observed wind speed data [1, 4, 8, 22,23,24]. These methods can be broadly classified as iterative or non-iterative. Non-iterative methods provide closed-form solutions, whereas iterative methods require repeated numerical updates to arrive at the final parameter estimates. Most methods fit a Weibull distribution to observed wind data well, and no rule exists for choosing the best method, where “best” is defined by goodness-of-fit criteria [21]. In the present work, a maximum likelihood estimation (MLE) method and an empirical method were used for parameter estimation. The empirical method proposed by [22] is a closed-form formula that requires only the mean and standard deviation of the wind speeds to determine the Weibull parameters, as expressed in Eqs. (2) and (3). Because the formula for the scale parameter involves the gamma function, several variants of this method have been proposed that approximate the scale parameter [25, 26]. These variants use the same formula for k but an approximate formula for c.

$$k = \left( {\frac{{\sigma_{v} }}{{\overline{v}}}} \right)^{ - 1.086}$$
(2)
$$c = \left( {\frac{{\overline{v}}}{{\Gamma \left( {1 + \frac{1}{k}} \right)}}} \right)$$
(3)
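A minimal sketch of the empirical method in Eqs. (2) and (3), assuming the sample standard deviation is used for \(\sigma_{v}\) (the text does not specify sample versus population). The function name `weibull_empirical` and the synthetic test data are hypothetical; note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random
import statistics


def weibull_empirical(speeds):
    """Closed-form (k, c) estimate from the mean and standard deviation, Eqs. (2)-(3)."""
    v_mean = statistics.fmean(speeds)
    v_std = statistics.stdev(speeds)  # sample standard deviation (an assumption)
    k = (v_std / v_mean) ** -1.086  # Eq. (2)
    c = v_mean / math.gamma(1.0 + 1.0 / k)  # Eq. (3)
    return k, c


# Synthetic check: draw speeds from a Weibull with scale c = 7 m/s and shape k = 2.
random.seed(42)
sample = [random.weibullvariate(7.0, 2.0) for _ in range(20_000)]
k_hat, c_hat = weibull_empirical(sample)
print(f"k = {k_hat:.2f}, c = {c_hat:.2f} m/s")
```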

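The MLE method proposed by [27] (cited in Seguro and Lambert, 2000) is iterative and is perhaps the most widely used method for estimating Weibull parameters [1, 15]. The parameters are estimated using Eqs. (4) and (5): because k appears on both sides of Eq. (4), it must be solved numerically, after which c follows directly from Eq. (5).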

$$k = \left( \frac{\sum\nolimits_{i = 1}^{n} v_{i}^{k} \ln v_{i}}{\sum\nolimits_{i = 1}^{n} v_{i}^{k}} - \frac{1}{n}\sum\limits_{i = 1}^{n} \ln v_{i} \right)^{ - 1}$$
(4)
$$c = \left( \frac{\sum\nolimits_{i = 1}^{n} v_{i}^{k}}{n} \right)^{1/k}$$
(5)
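The sketch below solves Eq. (4) for k with Newton–Raphson iterations and then evaluates Eq. (5). The choice of solver, the starting value k0 = 2, and the function name `weibull_mle` are our assumptions, since the text does not state how the iteration was performed. Calm (zero) records are dropped because \(\ln v_{i}\) is undefined at zero, which is a common but not universal convention.

```python
import math
import random


def weibull_mle(speeds, k0=2.0, tol=1e-8, max_iter=100):
    """Maximum likelihood (k, c) from Eqs. (4)-(5), solving Eq. (4) by Newton-Raphson."""
    v = [s for s in speeds if s > 0]  # ln(v) is undefined for calm (zero) records
    n = len(v)
    logs = [math.log(s) for s in v]
    mean_log = sum(logs) / n
    k = k0
    for _ in range(max_iter):
        w = [s ** k for s in v]  # v_i^k
        s0 = sum(w)
        s1 = sum(wi * li for wi, li in zip(w, logs))
        s2 = sum(wi * li * li for wi, li in zip(w, logs))
        # Eq. (4) rearranged as F(k) = s1/s0 - 1/k - mean(ln v) = 0
        f = s1 / s0 - 1.0 / k - mean_log
        df = s2 / s0 - (s1 / s0) ** 2 + 1.0 / k ** 2  # dF/dk, always positive
        k_new = k - f / df
        if k_new <= 0:
            k_new = k / 2  # guard: keep the shape parameter positive
        if abs(k_new - k) < tol:
            k = k_new
            break
        k = k_new
    c = (sum(s ** k for s in v) / n) ** (1.0 / k)  # Eq. (5)
    return k, c


random.seed(7)
sample = [random.weibullvariate(6.5, 1.8) for _ in range(20_000)]  # c = 6.5 m/s, k = 1.8
k_hat, c_hat = weibull_mle(sample)
print(f"k = {k_hat:.3f}, c = {c_hat:.3f} m/s")
```

Newton–Raphson is used here because F(k) is monotonically increasing, so the update usually converges in a handful of steps from a typical starting value; a bracketing method would be a more defensive alternative.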

The values of k and c were estimated using the two methods explained above. The fitting was carried out for four cases. In the first case, a Weibull distribution was fitted to the entire dataset, and a single pair of (k, c) was estimated. In the second case, the data were separated by month of the year, and a Weibull distribution was fitted separately to each month, yielding 12 pairs of Weibull parameters. In the third case, the data were segregated by hour of the day, yielding 24 pairs of Weibull parameters. Lastly, in the fourth case, the data were segregated by hour of the day within each month, and a Weibull distribution was fitted to each of the 288 resulting datasets, yielding 288 pairs of estimated parameters.
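The four cases differ only in how the hourly records are grouped before fitting. A minimal sketch of that segregation step, assuming each record is a (timestamp, speed) pair; the `CASES` mapping, the `segregate` helper, and the synthetic one-year series are hypothetical. Each resulting group would then be passed to a Weibull estimator, such as one implementing Eqs. (2)-(3) or Eqs. (4)-(5).

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta

# Grouping key for each case: the key decides which Weibull fit a record belongs to.
CASES = {
    "I (all data)": lambda t: "all",
    "II (month)": lambda t: t.month,
    "III (hour of day)": lambda t: t.hour,
    "IV (hour of each month)": lambda t: (t.month, t.hour),
}


def segregate(records, key):
    """Split (timestamp, speed) records into groups according to key(timestamp)."""
    groups = defaultdict(list)
    for t, v in records:
        groups[key(t)].append(v)
    return dict(groups)


# Synthetic hourly series for one non-leap year (illustration only).
random.seed(0)
start = datetime(2021, 1, 1)
records = [(start + timedelta(hours=h), random.weibullvariate(6.0, 2.0))
           for h in range(365 * 24)]

n_groups = {name: len(segregate(records, key)) for name, key in CASES.items()}
print(n_groups)
```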

Goodness-of-Fit

Different statistical tests have been used to assess how well an estimated distribution describes the observed data [28]. Some tests (e.g., the chi-square test or the Kolmogorov–Smirnov statistic) should be avoided because they are sensitive to small sample sizes and a small number of bins [4, 29]. Therefore, these tests were not included in this study.

Root Mean Square Error

Root mean square error (RMSE) is computed using Eq. (6) [1, 12], which modifies the standard RMSE formula to account for fitting multiple Weibull distributions to the observed data. A smaller RMSE indicates a better fit.

$${\text{RMSE}} = \sqrt {\frac{1}{N}\mathop \sum \limits_{g = 1}^{G} \mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - x_{i} } \right)^{2} }$$
(6)

where \(y_{i}\): observed frequency;

\(x_{i}\): expected frequency (from the fitted Weibull distribution);

\(n\): number of bins of data in group g;

G: number of segregated groups for a particular case (e.g., January to December for the monthly grouping, 1 to 24 for the hourly grouping); and

\(N\): total number of data points in all groups.
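The sketch below computes Eq. (6) for grouped data. It assumes relative frequencies in 1 m/s bins, with expected frequencies taken from differences of the Weibull cumulative distribution function \(F(v) = 1 - \exp[-(v/c)^{k}]\). The bin width, the helper names, and the explicit `n_total` argument (standing in for N as defined above) are illustrative choices rather than details given in the text.

```python
import math


def weibull_cdf(v, k, c):
    """Weibull cumulative distribution function F(v)."""
    return 1.0 - math.exp(-((v / c) ** k)) if v > 0 else 0.0


def binned_frequencies(speeds, k, c, bin_width=1.0):
    """Observed (y_i) and expected (x_i) relative frequencies per bin for one group."""
    n_bins = int(max(speeds) // bin_width) + 1
    observed = [0.0] * n_bins
    for v in speeds:
        observed[int(v // bin_width)] += 1.0 / len(speeds)
    expected = [weibull_cdf((b + 1) * bin_width, k, c) - weibull_cdf(b * bin_width, k, c)
                for b in range(n_bins)]
    return observed, expected


def rmse_grouped(groups, n_total):
    """Eq. (6): groups holds one (observed, expected) pair of lists per group g."""
    sse = sum((y - x) ** 2
              for obs_f, exp_f in groups
              for y, x in zip(obs_f, exp_f))
    return math.sqrt(sse / n_total)


obs_f, exp_f = binned_frequencies([0.5, 1.5, 1.7, 2.2], k=2.0, c=2.0)
print(obs_f)  # [0.25, 0.5, 0.25]
print(rmse_grouped([([0.5, 0.5], [0.4, 0.6])], n_total=2))
```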

R-Squared

The coefficient of determination (\({R}^{2}\)) is calculated using Eq. (7) [4, 30]. As with Eq. (6), Eq. (7) is modified to account for the goodness-of-fit of multiple Weibull distributions to the observed data. \({R}^{2}\) indicates how much of the variation in the data is explained by the fitted model; it usually lies between zero and one, with one indicating a perfect fit.

$$R^{2} = 1 - \frac{{\mathop \sum \nolimits_{g = 1}^{G} \mathop \sum \nolimits_{i = 1}^{n} \left( {y_{i} - x_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{N} \left( {y_{i} - \overline{y}} \right)^{2} }}$$
(7)

where \(\overline{y}\) is the global mean of the observed frequencies (the mean frequency over the entire dataset).
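A corresponding sketch of Eq. (7), assuming the N terms in the denominator are the observed bin frequencies pooled across all groups, so that \(\overline{y}\) is their global mean. The function name and the input layout (one pair of observed and expected frequency lists per group) are hypothetical.

```python
def r_squared_grouped(groups):
    """Eq. (7): groups holds one (observed, expected) pair of frequency lists per group g."""
    all_obs = [y for obs_f, _ in groups for y in obs_f]
    y_bar = sum(all_obs) / len(all_obs)  # global mean observed frequency
    ss_res = sum((y - x) ** 2
                 for obs_f, exp_f in groups
                 for y, x in zip(obs_f, exp_f))
    ss_tot = sum((y - y_bar) ** 2 for y in all_obs)
    return 1.0 - ss_res / ss_tot


# Two groups (e.g., two months), each with three bins.
groups = [([0.2, 0.5, 0.3], [0.25, 0.45, 0.30]),
          ([0.1, 0.6, 0.3], [0.15, 0.55, 0.30])]
print(round(r_squared_grouped(groups), 3))
```

Pooling the residuals before dividing, rather than averaging per-group \(R^{2}\) values, keeps a single statistic per case that is directly comparable across the four groupings.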

Average Power Density Error

Average power density error (APDE) is defined as the relative difference (in %) between the average power density calculated from the fitted Weibull distribution and that calculated from the observed data, as given by Eq. (8) [16, 18]. This measure has several advantages, such as simple mathematical expressions, freedom from binning errors, and better prediction accuracy [13, 30].

$${\text{apde}} \left( \% \right) = \frac{{\underbrace {{0.5\rho c^{3} \Gamma \left( {1 + \frac{3}{k}} \right)}}_{{{\text{fitted }}\;{\text{data}}}} - \underbrace {{0.5\rho \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{n} v_{i}^{3} }}{n}} \right)}}_{{{\text{observed }}\;{\text{data}}}}}}{{\underbrace {{0.5\rho \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{n} v_{i}^{3} }}{n}} \right)}}_{{{\text{observed }}\;{\text{data}}}}}} \times 100$$
(8)
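A minimal sketch of Eq. (8) for one group. Because \(0.5\rho\) multiplies every term, air density cancels and is omitted; the function name `apde` and the example values are hypothetical.

```python
import math
import random


def apde(speeds, k, c):
    """Eq. (8): % error of fitted vs observed mean power density (0.5*rho cancels)."""
    fitted = c ** 3 * math.gamma(1.0 + 3.0 / k)  # c^3 * Gamma(1 + 3/k)
    observed = sum(v ** 3 for v in speeds) / len(speeds)  # mean of v_i^3
    return (fitted - observed) / observed * 100.0


# Speeds drawn from the same Weibull used as the "fit" should give a small APDE.
random.seed(3)
sample = [random.weibullvariate(7.0, 2.0) for _ in range(50_000)]
print(f"APDE = {apde(sample, k=2.0, c=7.0):+.2f} %")
```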

Average Energy Output Error

For modeling wind energy systems, Eqs. (6) to (8) may not be suitable measures of goodness-of-fit. These tests give equal weight to each wind speed irrespective of its magnitude, yet wind turbines do not always produce power: the output is zero when the wind speed is below the cut-in or above the cut-out speed of the turbine. A fit might, on average, describe wind speeds of certain magnitudes more accurately than others, but it may still be considered a poor fit if it fails to describe the wind speeds at which turbines actually produce power [10, 31]. If the purpose of obtaining the wind speed distribution at a location is ultimately to predict wind power generation, a suitable distribution must be able to predict wind power generation reasonably well [4, 10, 16].

Therefore, in this study, the average energy output error (AEOE) was also used as an indicator of goodness-of-fit. It was computed as the relative difference (in %) between the energy estimated from data simulated from the fitted distribution and the energy calculated from the observed data, as shown in Eq. (9). The power curve of a generic turbine was used to map each wind speed to the power output of a wind turbine. Because the data interval was one hour, the output of f(x) for a turbine rated in MW was expressed in MWh.

$${\text{aeoe}}\left( \% \right) = \frac{\sum\nolimits_{g = 1}^{G} \sum\nolimits_{i = 1}^{n} \underbrace{f\left( u_{i} \right)}_{\text{simulated instance of fitted data}} - \sum\nolimits_{g = 1}^{G} \sum\nolimits_{i = 1}^{n} \underbrace{f\left( v_{i} \right)}_{\text{observed data}}}{\sum\nolimits_{g = 1}^{G} \sum\nolimits_{i = 1}^{n} \underbrace{f\left( v_{i} \right)}_{\text{observed data}}} \times 100$$
(9)

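where f(x) is the power output of a given turbine at wind speed x, \(u_{i}\) is a wind speed simulated from the fitted Weibull distribution, and \(v_{i}\) is an observed wind speed.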

Equation (9) requires as many simulated wind speed values in each group as there are observed values in the corresponding group. Monte Carlo simulation (MCS) was used to generate data from the estimated parameters to match the number of observed data points. MCS is useful for capturing uncertainties, such as the natural variability of wind speed, and for obtaining a probability distribution of the goodness-of-fit metric [11, 23]. For each group, a random sample of n data points was drawn from the estimated Weibull distribution associated with that group. The procedure was repeated for 500 trials to account for the natural variability of wind, resulting in 500 values of error for each group in each case. The advantage of this approach over similar indicators, such as the expected energy/power output (EEO) used by [4, 10, 23, 25], is that it produces a distribution of errors rather than a single value.
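The sketch below combines Eq. (9) with the Monte Carlo procedure described in the preceding paragraph. The linearized power curve (2 MW rated output, 3 m/s cut-in, 12 m/s rated speed, 25 m/s cut-out) is a hypothetical stand-in, not the commercial curve from [33], and the helper names and seed handling are likewise illustrative. Because each record is hourly, f(x) in MW equals energy in MWh per record. Following Eq. (9), the sketch pools all groups of a case into one error per trial.

```python
import random
import statistics

# Hypothetical linearized power curve (not the turbine used in the study).
RATED_MW, CUT_IN, RATED_V, CUT_OUT = 2.0, 3.0, 12.0, 25.0


def power_mw(x):
    """f(x): turbine output (MW, i.e. MWh per hourly record) at wind speed x (m/s)."""
    if x < CUT_IN or x > CUT_OUT:
        return 0.0
    if x >= RATED_V:
        return RATED_MW
    return RATED_MW * (x - CUT_IN) / (RATED_V - CUT_IN)


def aeoe(observed_groups, simulated_groups):
    """Eq. (9): % error of simulated vs observed energy, summed over all groups g."""
    e_obs = sum(power_mw(v) for g in observed_groups for v in g)
    e_sim = sum(power_mw(u) for g in simulated_groups for u in g)
    return (e_sim - e_obs) / e_obs * 100.0


def aeoe_monte_carlo(observed_groups, params, trials=500, seed=0):
    """Distribution of AEOE: each trial draws n_g speeds from group g's fitted (k, c)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        simulated = [[rng.weibullvariate(c, k) for _ in range(len(g))]  # scale, shape
                     for g, (k, c) in zip(observed_groups, params)]
        errors.append(aeoe(observed_groups, simulated))
    return errors


# Illustration: 12 synthetic monthly groups whose "fitted" parameters are exact.
data_rng = random.Random(1)
observed = [[data_rng.weibullvariate(7.0, 2.0) for _ in range(720)] for _ in range(12)]
errors = aeoe_monte_carlo(observed, params=[(2.0, 7.0)] * 12, trials=100)
print(f"mean AEOE = {statistics.mean(errors):+.2f} %, sd = {statistics.stdev(errors):.2f} %")
```

Returning the full list of per-trial errors, rather than only their mean, is what yields the error distribution that the text contrasts with single-value indicators such as EEO.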

Data

The National Institute of Wind Energy (NIWE) maintains hourly wind speed data for various locations across India and makes the data freely available at http://niwe.res.in:8080/NIWE_WRA_DATA/. Based on the metadata of wind monitoring stations listed in the Wind Atlas of India [32], the four sites in central India with the longest observation periods were selected for this study. A script written in Python (www.python.org) was used for simulation and estimation. The linearized power curve of a 2 MW commercial turbine was used in the study [33].

Results and Discussion

Wind Data Summary

Wind speed data from four locations in the central region of India were analyzed. Their metadata are shown in Table 1.

Table 1 Summary of information about four sites chosen for the study

The wind statistics at these locations are summarized in Table 2. Hourly and monthly variations in wind speed are shown in Fig. 1. Using Eq. (10), the data were rescaled from 20 m to 100 m (\(h_{1}\) = 20 m, \(h_{2}\) = 100 m), the height at which wind energy potential is usually estimated in India [34]. The value of α (terrain factor) at these locations was assumed to be 0.2 [26]. The parameters were estimated using wind speeds at 20 m.

$$V_{100} = V_{20} \left( {\frac{{h_{2} }}{{h_{1} }}} \right)^{\alpha }$$
(10)
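A minimal sketch of the power-law height correction in Eq. (10), using 20 m and 100 m as the reference and target heights and α = 0.2 as assumed above; the function name is hypothetical.

```python
def extrapolate_speed(v_ref, h_ref=20.0, h_target=100.0, alpha=0.2):
    """Eq. (10): scale a wind speed measured at h_ref (m) to h_target (m)."""
    return v_ref * (h_target / h_ref) ** alpha


factor = extrapolate_speed(1.0)  # (100/20)^0.2, roughly 1.38
print(f"5 m/s at 20 m -> {extrapolate_speed(5.0):.2f} m/s at 100 m")
```

Because Eq. (10) multiplies every speed by the same factor, a Weibull fit to the rescaled data keeps the same shape parameter k, while the scale parameter c is multiplied by that factor.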
Table 2 Wind statistics summary

Table 3 shows how much wind speed varied within a day in each month. The range for a particular month is the difference between the maximum and minimum average hourly wind speeds; a higher value indicates wider diurnal variation in wind speed. The range was similar at three of the locations, but at location D it was much smaller, indicating that diurnal variability in wind speed was smaller there.

Table 3 Monthly range of diurnal variations in wind speed
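The monthly range reported in Table 3 can be computed as sketched below, assuming (timestamp, speed) records; the function name and the toy January data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_diurnal_range(records):
    """For each month: max minus min of the 24 hour-of-day mean wind speeds."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in records:
        sums[(t.month, t.hour)] += v
        counts[(t.month, t.hour)] += 1
    ranges = {}
    for month in sorted({m for m, _ in counts}):
        hourly_means = [sums[(month, h)] / counts[(month, h)]
                        for h in range(24) if (month, h) in counts]
        ranges[month] = max(hourly_means) - min(hourly_means)
    return ranges


# Toy data: two January days where speed rises by 0.1 m/s each hour.
records = [(datetime(2021, 1, d, h), 4.0 + 0.1 * h) for d in (1, 2) for h in range(24)]
print(monthly_diurnal_range(records))
```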

Estimation of Parameters

The goodness-of-fit results are presented in Table 4. Fitting a Weibull distribution to the data segregated by hour of each month resulted in a better fit than estimating Weibull parameters from more aggregated datasets. The findings were consistent across all four locations and for all goodness-of-fit metrics except APDE. At location D, Case II provided a better fit than Case IV. This could be due to overfitting, since the observed diurnal variations there were small, or to the small sample size.

Table 4 Test results of fitting of Weibull distribution

The RMSE found in this study was higher than that reported in other studies, e.g., [2], perhaps because of the smaller sample size or lower wind speeds. Studies have found that an iterative Weibull method fits better than a two-parameter Weibull method if the wind data are skewed toward values close to zero [1]. Further, this study’s findings did not agree with those of [10], which indicated that statistical measures might not be suitable for assessing goodness-of-fit. Moreover, this study complements the findings of [35], which suggested accounting for temporal variations in wind energy assessment.

Figures 3 and 4 show the range of estimated parameters for each month. For each month, the top and bottom of the bar represent the maximum and minimum, respectively, of the 24 hourly values of the parameter. Both figures show that diurnal variability in wind speed is significant: the estimated values for the 24 h of the day varied considerably in many months. This variability was not captured when the data were segregated only by month to obtain a single monthly value of the shape and scale parameters. Using multiple Weibull distributions increased the complexity of the fitting process and the computational resources required [36]. However, since computational power and data storage have become affordable, this added complexity may not be an issue, particularly when the commercial implications of improved accuracy are considerable.

Fig. 3 Maximum and minimum values of k for each month

Fig. 4 Maximum and minimum values of c for each month

Conclusion

Many studies have represented wind speed using a single Weibull distribution. Because wind resources at any location vary both daily and monthly, researchers have captured monthly or seasonal variations to better represent wind speeds. However, diurnal variability in wind speed can also be significant. Using a separate Weibull distribution for each hour of each month provides a better fit; for location I, \({R}^{2}\) varies between 0.81 and 0.84. The other locations likewise show higher \({R}^{2}\) values, indicating a better fit. Although the improvement may be small, capturing such variability is important for preliminary assessments of wind energy potential and for optimally sizing and modeling renewable energy systems.

Furthermore, in energy modeling, a better fit usually implies more accurate estimates of energy yield, suggesting that energy-based indicators might be appropriate criteria for evaluating how well a distribution fits the observed data. Nevertheless, the results of this investigation indicated that, for the fitted parameters, conventional statistical tests offer a level of precision comparable to that of an energy yield indicator. Future research may explore advanced modeling techniques, such as non-stationary modeling of wind [37], and multi-instrument observations [38] for wind energy measurement.