Introduction

Renewable energy sources, specifically wind and solar, are progressively being integrated into global power systems to reduce the greenhouse gas emissions of fossil fuel-based generators while meeting growing power demand at the lowest possible cost. According to the Global Wind Energy Council, total installed wind power capacity worldwide reached 837 GW by 2022, avoiding nearly 1.2 billion tonnes of CO2 emissions per year [1]. Wind power on this scale has the potential to supply global energy demand sustainably while helping to achieve net-zero objectives by 2050 [1].

Large-scale integration of wind energy into power systems poses considerable challenges for system operators and wind power producers because of the unpredictable and highly variable nature of wind. Modelling wind power uncertainty in decision-making problems such as generation scheduling, market clearing, power trading and reserve management is a formidable challenge [2]. Decision-making problems under wind uncertainty are generally modelled through stochastic programming, robust optimization and chance-constrained programming. Stochastic programming is widely preferred because it represents wind uncertainty explicitly through scenarios, i.e., possible realizations of the random wind power inputs, each with an assigned probability [2, 3]. Generating high-quality scenarios is therefore essential for modelling wind power uncertainty in stochastic programming.

Several methods have been proposed in the literature to generate wind power scenarios. They fall into three categories: path-based methods, moment matching, and internal sampling. In path-based methods, scenarios are generated by combining the forecasted wind power with random error matrices; machine learning and statistical time-series models are commonly used to produce the forecasts [2, 3]. Moment-matching methods generate a discrete distribution of statistically dependent random variables whose statistical moments match those of the original distribution. Internal sampling draws samples continuously from the actual distribution of the random variables. Because they build on advanced forecasting methods, path-based methods can accurately represent the stochastic nature of wind power. This paper extends the path-based concept to generate wind power scenarios that consider the spatiotemporal correlation between multiple wind farms (WFs).
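The path-based idea can be illustrated by adding sampled forecast-error paths to a point forecast. The sketch below is not any cited method. For illustration only, it assumes that the errors over the horizon are jointly Gaussian with a known covariance, in the spirit of [4]. The function name `path_based_scenarios`, the 24-step horizon and the AR(1)-type error covariance are hypothetical choices.

```python
import numpy as np


def path_based_scenarios(forecast, error_cov, n_scenarios, seed=None):
    """Hypothetical sketch: scenario = point forecast + sampled error path.

    forecast    : (H,) point forecast over H look-ahead steps, in p.u. of capacity
    error_cov   : (H, H) covariance of forecast errors across the horizon,
                  assumed Gaussian here purely for illustration
    n_scenarios : number of scenario paths to draw
    """
    rng = np.random.default_rng(seed)
    forecast = np.asarray(forecast, dtype=float)
    errors = rng.multivariate_normal(np.zeros(forecast.size), error_cov, size=n_scenarios)
    # Wind power can be neither negative nor above installed capacity (1 p.u.).
    return np.clip(forecast + errors, 0.0, 1.0)


# Usage: a 24-step forecast whose error correlation decays with the lag distance.
H = 24
steps = np.arange(H)
forecast = 0.4 + 0.2 * np.sin(2 * np.pi * steps / H)
error_cov = 0.01 * 0.9 ** np.abs(np.subtract.outer(steps, steps))
scenarios = path_based_scenarios(forecast, error_cov, n_scenarios=100, seed=1)
print(scenarios.shape)  # (100, 24)
```

The error covariance is what carries the temporal correlation into each path. A poor covariance model therefore shows up directly as correlation loss in the scenarios.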

An in-depth analysis of wind power scenario generation techniques for the efficient use of renewable energy systems is provided in [2, 3]. Short-term wind power scenarios have been generated for a single WF using an error covariance matrix based on a multivariate Gaussian distribution [4]. Wind speed scenarios have also been generated for WFs using a univariate Autoregressive Moving Average (ARMA) model and a stationary variance–covariance matrix [5, 6]. The ARMA model has further been cast into state-space form to generate wind power scenarios and analyze the dependencies between multiple WFs [7]. An inverse transform sampling approach was proposed to generate wind power scenarios considering statistical uncertainty and variability [8]. In that approach, empirical cumulative distribution functions characterized by uncertainty and variability are modelled by sampling from multivariate normal distributions of forecast errors. The scenario generation methods discussed above assume parametric error distributions and model temporal and spatial correlations separately. Because of these assumptions, the generated scenarios suffer significant correlation loss.

A generalized dynamic factor model preserves the correlations between generated load and wind power scenarios [9]. However, capturing temporal correlations through first- or second-order statistics is difficult. Electric load, photovoltaic (PV) and wind production scenarios can be generated using an Artificial Neural Network (ANN) based methodology [10]. Data-driven approaches based on Generative Adversarial Networks (GANs) can also generate wind and solar power scenarios [11, 12]. In this approach, two deep neural networks, a generator and a discriminator, are trained against each other. The approach assumes no parametric error distribution, but it requires a separate forecasting method to produce the forecast errors. GANs have been modified by imposing Lipschitz constraints on the discriminator network for wind power scenario generation [12]. The GAN-based approach was further improved by using a conditional improved Wasserstein generative adversarial network (WGAN) [13], in which a support vector classifier (SVC) predicts the data labels. Wind power scenarios have also been generated by integrating a non-separable spatiotemporal covariance function with fluctuation-based clustering [14]. There, the historical data are grouped by the K-means clustering algorithm into clusters with different fluctuations so that the covariance matrix can be estimated precisely. Machine learning models generally perform better than time series models, but they require large amounts of historical data for training and are prone to overfitting and underfitting. Covariance matrix-based approaches are limited by the skewness, excess kurtosis and asymmetric dependencies present in wind power data. In addition, the size of the covariance matrix grows with the data dimension, i.e., the number of WFs.

Copulas are commonly used to model dependencies among high-dimensional random variables [15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. This approach has recently gained significant attention for modelling the dependency and uncertainty of WFs. Copulas combine an arbitrary number of univariate distributions into a joint probability density function (PDF), from which useful information can be extracted. A key advantage is that copulas impose no restrictions on the marginal distributions of the data [16,17,18,19,20,21]. This property allows copulas to model asymmetric dependencies between any number of WFs while considering spatiotemporal correlation. A Drawable Vine (D-Vine) copula has been used to generate time-coupled wind power infeed scenarios for an aggregation of WFs [21]. A D-Vine copula was combined with the residuals of a support vector regression (SVR) model to represent spatial dependence in probabilistic wind power forecasts [22]. A mixture vine copula method combining K-means, C-vine and D-vine copulas has been used to analyze the dependence of multiple wind power outputs for wind/hydrogen production scheduling [23]. Regular-Vine (R-Vine) copulas are more flexible and offer more choices for decomposing dependence structures than D-vine and canonical vine (C-vine) copulas. They also produce better fitting and prediction performance than C-Vine and D-Vine copulas [24,25,26,27]. R-Vine copula models, kernel density estimation, time series and hybrid models have been used to capture the spatiotemporal correlations of multiple WFs for generating wind power, wind speed, and market price scenarios. Kernel density estimation, ARIMA, generalized autoregressive conditional heteroskedasticity (GARCH) and ARIMA-GARCH-t models can estimate the marginal distributions for scenario generation [24,25,26,27,28]. A two-stage spatiotemporal sampling approach employs the R-Vine copula as a spatial sampling method to generate wind power scenarios for multiple WFs [29]. The R-Vine copula has further been combined with a variance reduction method to generate time-coupled wind power scenarios [30]. It has also been used in conjunction with the random forest method to identify load patterns in smart grids [31].

The R-Vine copula-based scenario generation methods mentioned above use a univariate or multivariate distribution to model temporal correlations and a copula for spatial correlation. A different strategy is to model temporal and spatial correlations simultaneously with a multivariate forecasting model, and then recover the remaining spatial correlation from its residuals with an R-Vine copula. This strategy can generate high-quality wind power scenarios that retain spatiotemporal correlations. Table 1 summarizes the literature review on scenario generation and compares the existing work with the proposed work; models that use the vine copula approach are highlighted in grey. The table shows that an advanced multivariate time series model such as VARMA has not yet been hybridized with the R-Vine copula to generate wind power scenarios that consider spatial and temporal correlation simultaneously. Combining VARMA with the R-Vine copula provides two-stage modelling of spatial correlation, which improves the quality of the generated scenarios. Such a combination can also capture the asymmetric tail dependence that occurs in wind generation without assuming any particular distribution type.

Table 1 Summary of literature review on scenario generation

This paper proposes a hybrid, distribution-free VARMA-Copula approach for generating very short-term wind power scenarios for multiple WFs with spatiotemporal correlations. The multivariate VARMA model captures the spatiotemporal correlations and provides the residuals of the WFs, whose marginal distributions are then modelled. The R-vine copula uses these residuals to capture the asymmetric dependencies between WFs. Modelling the marginal distributions through a multivariate model allows the spatial and temporal propagation of both observations and forecast errors to be considered simultaneously. The R-vine copula makes the proposed model distribution-free, and the proposed model outperforms the benchmark models. The proposed algorithm is implemented on publicly available data from nine Australian WFs, and the results are compared with benchmark models such as VAR, VARMA, ANN [10] and GANs [11]. The proposed model retains the spatiotemporal correlation in the generated scenarios, as demonstrated by its lowest energy score and by the cross-correlation and Kendall's correlation plots.

The rest of the paper is organized as follows. Section II describes the mathematical formulation of VARMA and the R-Vine copula for modelling spatiotemporal correlations, together with the proposed scenario generation algorithm. Section III describes the energy score and cross-correlation functions used to evaluate the quality of the scenarios generated by the proposed and benchmark models. Section IV discusses the results of the case study. Finally, Section V concludes the paper.

Spatiotemporal Correlation Modeling in Scenario Generation

VARMA Model

The power output of \(N\) WFs is considered a stochastic process and modelled through the \(N\)-dimensional \(VARMA\left( {p,q} \right)\) model. The VAR model uses only past observations. The VARMA model, in contrast, forecasts wind power output from the temporal propagation of both historical observations and forecast errors [32,33,34]. The number of historical observations and errors used in the VARMA process depends on the estimated order of the AR terms \(p\) and the MA terms \(q\). The spatial correlation between WFs is modelled through the VARMA parameter matrices. The N-dimensional \(VARMA\left( {p,q} \right)\) model is expressed as follows:

$$\hat{y}_{t}^{n} = \sum\limits_{l = 1}^{p} {\phi_{l} } y_{t - l} + a_{t} + \sum\limits_{m = 1}^{q} {\Theta_{m} a_{t - m} }$$
(1)

where \(\hat{y}_{t}^{n}\) is the forecasted conditional mean power output of the \(n^{th}\) WF at time \(t\), and \(\phi_{l}\) and \(\Theta_{m}\) are the \(N \times N\) AR and MA parameter matrices for lags \(l\) and \(m\), respectively. The number of these matrices, and hence the number of parameters, depends on the number of WFs and the order of the fitted VARMA model. \(a_{t}\) denotes the N-dimensional white-noise vector at time \(t\) with zero mean and \(N \times N\) non-singular contemporaneous covariance matrix \({\text{cov}} (a)\). The extended cross-correlation matrices are used to estimate the order of the VARMA model and the maximum likelihood approach to estimate its parameters [34, 35]. The estimated model parameters are locally optimal. The estimated order is further verified with information criteria such as Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Hannan–Quinn Information Criterion (HQC). The order with the minimum criterion values is selected [34, 35].
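The order check with information criteria can be illustrated with a simplified sketch. For brevity, it fits pure VAR(p) models by least squares rather than VARMA(p, q) models by maximum likelihood. It evaluates AIC, BIC and HQC in a common textbook form: ln|Σ̂_p| plus a penalty proportional to the number of AR coefficients pN². The function name `var_order_criteria` and the simulated bivariate data are illustrative assumptions. The paper itself performs this step with the MTS package in R.

```python
import numpy as np


def var_order_criteria(y, max_p):
    """AIC, BIC and HQC for VAR(p), p = 1..max_p, fitted by least squares.

    Simplified stand-in for a VARMA order check. y is a (T, N) stationary series.
    All candidate orders use the same T - max_p observations, so the
    criteria are comparable across p.
    """
    T, N = y.shape
    n_eff = T - max_p
    Y = y[max_p:]
    results = {}
    for p in range(1, max_p + 1):
        lags = [y[max_p - l:T - l] for l in range(1, p + 1)]
        X = np.hstack([np.ones((n_eff, 1))] + lags)        # intercept + p lags
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ B
        logdet = np.linalg.slogdet(resid.T @ resid / n_eff)[1]
        k = p * N * N                                       # number of AR coefficients
        results[p] = {
            "AIC": logdet + 2 * k / n_eff,
            "BIC": logdet + k * np.log(n_eff) / n_eff,
            "HQC": logdet + 2 * k * np.log(np.log(n_eff)) / n_eff,
        }
    return results


# Usage: simulate a stationary bivariate VAR(2) and pick the BIC-minimizing order.
rng = np.random.default_rng(0)
phi1 = np.array([[0.5, 0.1], [0.0, 0.5]])
phi2 = np.array([[0.3, 0.0], [0.0, 0.3]])
y = np.zeros((3000, 2))
for t in range(2, len(y)):
    y[t] = phi1 @ y[t - 1] + phi2 @ y[t - 2] + rng.normal(scale=0.05, size=2)

crit = var_order_criteria(y, max_p=4)
best_bic = min(crit, key=lambda p: crit[p]["BIC"])
print(best_bic)
```

BIC penalizes extra lags more heavily than AIC does. This is why the criteria can disagree, and why the paper checks several of them.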

Under the following statistical assumptions, the VARMA model can be used directly to generate wind power scenarios for multiple WFs while accounting for spatiotemporal correlations.

  • The historical wind power input data are assumed to be stationary and linear. The input data are therefore preprocessed to make them stationary using differencing and/or a logarithmic transformation.

  • Furthermore, input wind power data is assumed to follow parametric distributions such as Gaussian or Weibull distributions with symmetrical tails.

  • The fitting of the VARMA model to time series data involves epistemic uncertainty regarding model order and parameter estimations.

  • The VARMA model parameters are assumed to be time-independent. The consideration of time-varying model parameters necessitates adaptive models for parameter estimation, which is beyond the scope of the current paper.

Due to the aforementioned statistical assumptions, VARMA model-based scenario generation algorithms will fail to retain the spatial correlation between the WFs. The reasons for this are as follows:

  1. VARMA models primarily focus on capturing the temporal dynamics within a time series. Apart from the linear cross-lag terms in their parameter matrices, they do not explicitly account for the spatial interdependencies between different WFs. Spatial correlations in wind power data arise from factors such as wind patterns, geographical proximity, and shared weather conditions. Because these spatial dynamics are not explicitly incorporated, VARMA models alone cannot adequately capture or retain the spatial correlations.

  2. VARMA models rely on assumptions about the statistical properties of the data, such as stationarity, linearity, and parametric distributions with symmetric tails. However, wind power data often exhibit non-Gaussian characteristics and non-stationary behavior that these assumptions do not capture accurately. This oversimplification of the underlying statistical properties can prevent VARMA models from capturing the true spatial correlations.

  3. VARMA models require decisions on model order selection and parameter estimation, both of which introduce uncertainty. The chosen order and the estimated parameters strongly affect the model's ability to capture spatial correlations accurately. An incorrect order or inaccurate parameters can omit or misrepresent spatial dependencies, so that spatial correlations are not retained.

For the reasons discussed above, VARMA model-based scenario generation algorithms alone are unable to adequately retain the spatial correlations in wind power data. To address this limitation, this paper proposes integrating a copula model, namely the regular vine (R-Vine) copula, with the VARMA model. The two models are hybridized through the residuals: the residuals of the VARMA model are the inputs to the copula model. The copula captures the residual correlations between WFs and thus improves the representation of spatial correlations in the generated wind power scenarios. The residuals are calculated as follows:

$$u_{t}^{n} = y_{t}^{n} - \hat{y}_{t}^{n}$$
(2)

The residuals \(u_{t}^{n}\) are the difference between the observed values \(y_{t}^{n}\) and the fitted or forecasted values \(\hat{y}_{t}^{n}\) of the \(n^{th}\) WF's power output. Uncorrelated residuals indicate a good fit of the VARMA model, whereas correlated residuals mean that the VARMA model has not fully captured the spatiotemporal correlation. In this paper, the dependency between the residuals of the WFs is modelled using the R-Vine copula model. The copula models the dependency through the marginal PDFs and the copula PDF separately, so no specific joint distribution has to be assumed.
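Eqs. (1) and (2) can be made concrete by computing one-step conditional means and residuals for a VARMA(p, q) with known parameter matrices. The sketch makes two assumptions. First, the conditional mean omits the current shock \(a_{t}\), whose expectation is zero. Second, pre-sample observations and shocks are set to zero. The function `varma_filter` and the simulated parameters are hypothetical; in the paper, the parameters come from maximum likelihood estimation.

```python
import numpy as np


def varma_filter(y, phis, thetas):
    """Conditional means (Eq. 1 without the unknown a_t) and residuals (Eq. 2).

    y      : (T, N) stationary, capacity-normalized power outputs of N WFs
    phis   : list of p (N, N) AR matrices phi_1..phi_p
    thetas : list of q (N, N) MA matrices Theta_1..Theta_q
    Pre-sample observations and shocks are taken as zero.
    """
    T, N = y.shape
    y_hat = np.zeros((T, N))
    resid = np.zeros((T, N))
    for t in range(T):
        mean = np.zeros(N)
        for l, phi in enumerate(phis, start=1):
            if t - l >= 0:
                mean += phi @ y[t - l]
        for m, theta in enumerate(thetas, start=1):
            if t - m >= 0:
                mean += theta @ resid[t - m]   # past residuals stand in for past shocks
        y_hat[t] = mean
        resid[t] = y[t] - mean                 # u_t = y_t - y_hat_t, Eq. (2)
    return y_hat, resid


# Usage: simulate a VARMA(1,1) for two WFs with contemporaneously correlated shocks.
rng = np.random.default_rng(42)
phi1 = np.array([[0.6, 0.2], [0.1, 0.5]])
theta1 = np.array([[0.3, 0.0], [0.1, 0.2]])
cov_a = 0.0025 * np.array([[1.0, 0.6], [0.6, 1.0]])
a = rng.multivariate_normal(np.zeros(2), cov_a, size=500)
y = np.zeros_like(a)
y[0] = a[0]
for t in range(1, len(a)):
    y[t] = phi1 @ y[t - 1] + a[t] + theta1 @ a[t - 1]

y_hat, resid = varma_filter(y, [phi1], [theta1])
# Cross-WF correlation that remains in the residuals, i.e. what the copula must model.
print(np.corrcoef(resid.T)[0, 1])
```

Even with the true parameters, the residuals stay correlated across WFs because the shocks are contemporaneously correlated. This remaining dependence is what the hybrid approach passes on to the copula.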

R-vine Copula Model

The primary function of a copula is to capture and quantify the dependency between random variables. It allows the relationship between random variables to be studied separately from their individual distributions. For example, random variables in power systems such as wind speed, wind power, load, and solar generation follow different probability distributions, e.g., Gaussian, Weibull, Gamma, and Beta. Modelling the correlation between such variables with conventional linear correlation measures is therefore difficult. Copulas make it possible to model and simulate multivariate data while preserving each variable's marginal distribution and capturing the correlation structure separately. Copulas have been widely used in the literature for the following purposes:

  1. Copulas allow for the modeling of various types of dependence structures between random variables, such as positive or negative correlation, tail dependence, and rank correlation.

  2. Copulas can be used to construct multivariate distributions by combining specified marginal distributions with a copula function that represents the dependence structure.

  3. Copulas provide a framework for estimating and testing the dependence structure between variables based on observed data. They enable the estimation of copula parameters and the assessment of goodness-of-fit.

Mathematically, a copula is a flexible approach to modelling the multivariate distribution of several univariate time series. According to Sklar's theorem, a multivariate cumulative distribution function (CDF) can be expressed as a function of the marginal distribution of each variable and a copula that defines the dependency between the variables [18, 27]. Let \(u_{1} ,u_{2} , \ldots ,u_{n}\) be the estimated residuals of the N WFs at time t and \(F_{1} ,F_{2} , \ldots ,F_{n}\) the corresponding marginal CDFs. According to Sklar's theorem, the joint distribution can be represented as follows:

$$F\left( u_{1}, u_{2}, \ldots, u_{n} \right) = C\left\{ F_{1}\left( u_{1} \right), F_{2}\left( u_{2} \right), \ldots, F_{n}\left( u_{n} \right) \right\}$$
(3)

where \(C\) is the n-dimensional copula, i.e., the joint CDF of the n-dimensional residuals after they have been transformed to uniform marginals. Similarly, the joint probability density function (PDF) of the residuals of n WFs can be expressed as follows:

$$f\left( u_{1}, u_{2}, \ldots, u_{n} \right) = c\left\{ F_{1}\left( u_{1} \right), F_{2}\left( u_{2} \right), \ldots, F_{n}\left( u_{n} \right) \right\} \cdot f_{1}\left( u_{1} \right) \cdot f_{2}\left( u_{2} \right) \cdots f_{n}\left( u_{n} \right)$$
(4)

where \(c\) is the copula PDF and \(f_{k}\) is the marginal PDF of the \(k^{th}\) residual. In this paper, the parametric normal distribution is used to estimate the marginal PDFs of the residuals. Several classes of copula functions are available in the literature, broadly classified as Elliptical and Archimedean copulas [25,26,27,28]. Gaussian copulas and t-copulas are the standard multivariate Elliptical copulas for data with symmetric dependence: the t-copula has symmetric tail dependence, whereas the Gaussian copula has none. Clayton, Gumbel, and Joe copulas are bivariate Archimedean copulas that can capture asymmetric tail dependence, which is common in wind power and wind speed distributions. Elliptical copulas can thus model high-dimensional distributions with symmetric tail dependence. The Archimedean copulas considered here, however, are limited to two-dimensional distributions with asymmetric tail dependence.
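Sklar's theorem, Eq. (3), can also be read generatively. First, draw a dependent pair on the unit square from a copula. Then push each coordinate through the inverse CDF of its own marginal. The sketch below does this with a Clayton copula, sampled with the standard Marshall–Olkin gamma-frailty algorithm, and Weibull marginals. The parameter values (θ = 2, Weibull shape 2 and scale 8) and the helper names are illustrative assumptions. The sketch also checks the textbook Clayton relation τ = θ/(θ + 2) and shows that the marginal transform leaves Kendall's tau unchanged.

```python
import numpy as np


def sample_clayton(n, theta, seed=None):
    """Marshall-Olkin sampler for the bivariate Clayton copula (theta > 0)."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))   # shared frailty variable
    e = rng.exponential(scale=1.0, size=(n, 2))
    return (1.0 + e / v) ** (-1.0 / theta)                     # uniforms on [0, 1]^2


def weibull_ppf(u, shape, scale):
    """Inverse CDF of the Weibull distribution (illustrative marginal)."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)


def kendall_tau(x, y):
    """Tau-a from all pairwise sign concordances (O(n^2) memory; fine for small n)."""
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (sx * sy).sum() / (n * (n - 1))


theta = 2.0
u = sample_clayton(1500, theta, seed=7)          # copula sample on [0, 1]^2
speeds = weibull_ppf(u, shape=2.0, scale=8.0)     # (F1^-1(U1), F2^-1(U2)): Eq. (3) read generatively

tau_u = kendall_tau(u[:, 0], u[:, 1])
tau_s = kendall_tau(speeds[:, 0], speeds[:, 1])
print(round(tau_u, 3), round(tau_s, 3), theta / (theta + 2))
```

The dependence lives entirely in the copula sample. Any strictly increasing marginal can be substituted without changing the rank correlation.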

Efficient modelling of high-dimensional distributions requires a copula model that can capture all kinds of tail dependence without being restricted to two dimensions. Vine copulas, also known as pair-copula constructions (PCCs), have therefore emerged as a powerful tool for modelling high-dimensional dependence structures through bivariate copulas. This extensibility lets vine copulas represent arbitrary-dimensional structures better than elliptical copulas. It also makes them more efficient at capturing wind power data, which exhibit different kinds of tail dependence. In short, PCCs extend the flexibility of bivariate copulas to any number of dimensions.

The multivariate N-dimensional copula in Eqs. (3) and (4) is decomposed into \(N\left( {N - 1} \right)/2\) bivariate (pair) copulas, forming a vine copula. The graphical representation of this decomposition is called the R-Vine. In the R-Vine copula, the relationships between pair copulas are depicted by a set of N-1 trees. Each tree \(T_{i}\) consists of a set of nodes \(N_{i}\) and a set of edges \(E_{i}\), and the first (initial) tree has as many nodes as there are WFs. The trees must satisfy the following properties [24, 27]:

  1. The initial tree \(T_{1}\) has the node set \(N_{1} = \left\{ {1,2, \ldots ,n} \right\}\) and the edge set \(E_{1}\).

  2. For \(i = 2, \ldots ,n - 1\), tree \(T_{i}\) has the node set \(N_{i} = E_{i - 1}\).

  3. Proximity condition: if two nodes are joined by an edge in tree \(T_{i}\), the corresponding edges in tree \(T_{i - 1}\) must share a common node.

Further, an edge \(e \in E_{i}\) is denoted by \(e = \left\{ {j\left( e \right),k\left( e \right)|D\left( e \right)} \right\}, j \ne k\), where \(j\left( e \right)\) and \(k\left( e \right)\) are the conditioned nodes and \(D\left( e \right)\) is the conditioning set associated with edge \(e\). The elements of these sets are nodes of the initial tree, i.e., WF indices. The conditioning set \(D\left( e \right)\) is empty for the initial tree [27]. The density of the bivariate copula associated with each edge is denoted by \(c_{j\left( e \right),k\left( e \right)|D\left( e \right)}\).
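The tree properties and the edge notation above can be checked mechanically. The sketch uses a hypothetical representation: each node of \(T_{1}\) is a WF index, and each edge of every tree is a frozenset holding the two nodes it joins. The nodes of \(T_{i}\) are then exactly the edges of \(T_{i-1}\). Under this representation, the proximity condition reduces to the two joined edges sharing exactly one node. The conditioned nodes and conditioning set of an edge follow from the WFs reachable below each of its two nodes. The four-WF structure in the example is an arbitrary illustration, not the structure selected in the paper.

```python
from itertools import chain


def complete_union(node):
    """All WF indices reachable from a node (a WF index or an edge of a lower tree)."""
    if isinstance(node, int):
        return frozenset({node})
    return frozenset(chain.from_iterable(complete_union(x) for x in node))


def edge_label(edge):
    """Return (conditioned nodes {j(e), k(e)}, conditioning set D(e)) of an edge."""
    a, b = tuple(edge)
    ua, ub = complete_union(a), complete_union(b)
    return ua ^ ub, ua & ub


def satisfies_proximity(edge):
    """Proximity: the two lower-tree edges joined by this edge share one node."""
    a, b = tuple(edge)
    if isinstance(a, int):      # edges of T1 join WFs directly; nothing to check
        return True
    return len(a & b) == 1


# Example with four WFs: three trees with 3 + 2 + 1 = N(N-1)/2 = 6 pair copulas.
e12, e23, e34 = frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})
t2a, t2b = frozenset({e12, e23}), frozenset({e23, e34})
t3 = frozenset({t2a, t2b})
trees = [[e12, e23, e34], [t2a, t2b], [t3]]

for level, edges in enumerate(trees, start=1):
    for e in edges:
        conditioned, conditioning = edge_label(e)
        print(f"T{level}: {sorted(conditioned)} | {sorted(conditioning)}",
              satisfies_proximity(e))
```

The edge of \(T_{3}\) is labelled with conditioned nodes {1, 4} and conditioning set {2, 3}. Joining the disjoint \(T_{1}\) edges {1, 2} and {3, 4} directly would violate the proximity condition.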

The joint density of the n-dimensional residual vector in Eq. (4) can be expressed as the product of the marginal densities and the bivariate copula densities, as shown in Eqs. (5)-(7):

$$f\left( u_{1}, u_{2}, \ldots, u_{n} \right) = c\left\{ u|\nu ,B,\alpha \right\} \cdot f_{1}\left( u_{1} \right) \cdot f_{2}\left( u_{2} \right) \cdots f_{n}\left( u_{n} \right)$$
(5)
$$f\left( {u_{1} ,u_{2} ,.......,u_{n} } \right) = c\left\{ {u|\nu ,B,\alpha } \right\} \times \left[ {\prod\limits_{k = 1}^{n} {f_{k} \left( {u_{k} } \right)} } \right]$$
(6)
$$c\left\{ u|\nu ,B,\alpha \right\} = \prod\limits_{i = 1}^{n - 1} \prod\limits_{e \in E_{i}} c_{j\left( e \right),k\left( e \right)|D\left( e \right)} \left( F_{j\left( e \right)|D\left( e \right)} \left( u_{j\left( e \right)} |u_{D\left( e \right)} \right), F_{k\left( e \right)|D\left( e \right)} \left( u_{k\left( e \right)} |u_{D\left( e \right)} \right) \right)$$
(7)

where \(u_{D\left( e \right)} = \left\{ {u_{i} |i \in D\left( e \right)} \right\}\), \(c\left\{ {u|\nu ,B,\alpha } \right\}\) is the R-vine copula density function, \(\nu\) is the R-vine structure, \(B\) is the set of bivariate copulas associated with the R-vine, and \(\alpha\) is the parameter vector of the bivariate copulas. This paper uses the sequential approach to select the R-vine structure [24, 27], in which the trees are selected so that the chosen pairs model the strongest pairwise dependencies present. The sequential approach is described in Algorithm 2. The parameters of each bivariate copula are estimated by maximum likelihood estimation [27].

Proposed Scenario Generation Algorithm

The R-Vine copula and VARMA models are used in this paper to generate wind power scenarios considering spatiotemporal correlations. The proposed scenario generation approach is summarized in Algorithm 1. Before being fed into the VARMA model, each WF's historical data are normalized by its installed capacity. The Augmented Dickey-Fuller test is used to check the stationarity of the historical power output of the WFs [35]. If the data are found to be non-stationary, differencing and a log transformation are applied to make them stationary. The VARMA model is then fitted to the stationary historical wind power data of the N WFs using Eq. (1), its parameters are estimated, and the residuals are computed using Eq. (2). The residuals are separated for each of the 24 steps ahead.
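The preprocessing described above can be sketched as follows: capacity normalization, a unit-root check, and differencing when needed. For self-containment, the unit-root check is a hand-rolled augmented Dickey-Fuller regression with an intercept and a fixed number of lagged differences. Its t-statistic is compared against the asymptotic 5% critical value of about -2.86. This is not the implementation used in the paper. The helper names and the lag choice k = 1 are assumptions, and the log transformation is omitted.

```python
import numpy as np

ADF_CRIT_5PCT = -2.86   # asymptotic 5% critical value, intercept-only ADF regression


def normalize(power_mw, capacity_mw):
    """Express wind power in per-unit of installed capacity."""
    return np.asarray(power_mw, dtype=float) / capacity_mw


def adf_tstat(y, k=1):
    """t-statistic of gamma in: dy_t = c + gamma * y_{t-1} + sum_i delta_i dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    target = dy[k:]
    cols = [np.ones_like(target), y[k:-1]]             # intercept and lagged level
    cols += [dy[k - i:-i] for i in range(1, k + 1)]    # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def make_stationary(y, max_d=2, k=1):
    """Difference until the ADF test rejects a unit root; return (series, d)."""
    y = np.asarray(y, dtype=float)
    for d in range(max_d):
        if adf_tstat(y, k) < ADF_CRIT_5PCT:
            return y, d
        y = np.diff(y)
    return y, max_d   # differenced max_d times; the caller should re-check


# Usage with synthetic series (a hypothetical 100 MW WF for the normalization).
rng = np.random.default_rng(3)
ar = np.zeros(2000)
for t in range(1, 2000):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()
walk = np.cumsum(rng.normal(size=2000))

print(adf_tstat(ar))                     # strongly negative: unit root rejected
stationary_walk, d = make_stationary(walk)
p_u = normalize([25.0, 50.0, 100.0], capacity_mw=100.0)
```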

Algorithm 1 VARMA-Copula based algorithm for generation of wind power scenarios

Because the data are normalized, the residuals range from -1 to +1 and therefore violate the unit hypercube constraint \(\left[ {0,1} \right]^{N}\) of the copula. In Step 5, the residuals are therefore converted into copula data. After the joint PDF has been constructed, the copula data are transformed back to the original scale using the (inverse) probability integral transformation in Step 8. Once the desired number of scenarios has been generated, their quality is checked using the energy score, and their spatiotemporal correlation is depicted with cross-correlation plots, as described in Section III. A lower energy score indicates better scenarios, so the model with the minimum energy score performs best. Algorithm 2 presents the sequential method for R-Vine copula selection and parameter estimation. Its input is the residuals of the WFs under consideration, and its output is the joint PDF. The sequential method captures most of the dependence in the lower-level trees, leaving little or no dependence in the higher-level trees.
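Steps 5 and 8 can be illustrated with the normal marginals that the paper uses for the residuals. Each WF's residuals are mapped into (0, 1) through the normal CDF fitted with their sample mean and standard deviation, which is the probability integral transform. Copula samples are mapped back through the inverse CDF. This is a sketch under those assumptions, and the function names are hypothetical. A rank-based transform (pseudo-observations) would be a common alternative.

```python
import numpy as np
from statistics import NormalDist


def to_copula_data(resid, eps=1e-10):
    """Map (T, N) residuals into (0, 1)^N with fitted normal marginal CDFs."""
    resid = np.asarray(resid, dtype=float)
    params = [(resid[:, n].mean(), resid[:, n].std(ddof=1)) for n in range(resid.shape[1])]
    u = np.column_stack([
        [NormalDist(mu, sd).cdf(x) for x in resid[:, n]]
        for n, (mu, sd) in enumerate(params)
    ])
    return np.clip(u, eps, 1 - eps), params      # keep strictly inside the unit hypercube


def from_copula_data(u, params):
    """Inverse transform: map copula samples back to the residual scale."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([
        [NormalDist(mu, sd).inv_cdf(p) for p in u[:, n]]
        for n, (mu, sd) in enumerate(params)
    ])


rng = np.random.default_rng(5)
resid = rng.normal(0.0, 0.05, size=(200, 3))     # residuals of 3 WFs, within (-1, 1)
u, params = to_copula_data(resid)
back = from_copula_data(u, params)
```

In the full algorithm, `from_copula_data` would be applied to samples drawn from the fitted R-vine rather than to `u` itself. The round trip here only shows that the two transforms invert each other.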

Algorithm 2 Sequential method for R-Vine copula selection and estimation [27]

Pearson and Kendall are the two main correlation measures used in the existing literature to quantify the correlation between two variables such as electricity prices, renewable generation, and loads [36, 37]. Kendall's tau is a non-parametric rank measure, meaning that it does not assume any specific distribution for the variables. Pearson's coefficient, on the other hand, is a linear correlation measure that relies on three key assumptions: (1) linearity; (2) normality of the variables; and (3) homoscedasticity. These assumptions are hard to meet for wind speed and wind power time series for three reasons. (1) The wind turbine power curve relates wind power to wind speed nonlinearly. (2) Wind speed approximately follows a Weibull distribution, and wind power, bounded by the installed capacity, is far from normally distributed. (3) Wind power time series are typically heteroscedastic rather than homoscedastic. Consequently, using Pearson's linear correlation to measure the correlation between two nonlinear wind power time series may lead to errors. Kendall's correlation does not rely on these preconditions, which makes it suitable for analyzing the correlation between multiple WFs in this research.
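The invariance argument above can be checked numerically. The sketch applies a strictly increasing, nonlinear map to simulated wind speeds at two correlated sites. The map is a cubic, used as an illustrative stand-in for the region of a power curve between cut-in and rated speed. The simulated lognormal speeds and the helper name are also assumptions. Kendall's tau, computed as tau-a from pairwise concordances, is unchanged because it depends only on ranks, whereas Pearson's coefficient changes.

```python
import numpy as np


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / (n(n-1)/2), assuming no ties."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return (sx * sy).sum() / (n * (n - 1))


rng = np.random.default_rng(11)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=800)
speed_a = 8.0 * np.exp(0.3 * z[:, 0])      # positive, skewed wind speeds at site A
speed_b = 8.0 * np.exp(0.3 * z[:, 1])      # correlated wind speeds at site B
power_b = (speed_b / speed_b.max()) ** 3   # monotone cubic "power curve" proxy

pearson_speed = np.corrcoef(speed_a, speed_b)[0, 1]
pearson_power = np.corrcoef(speed_a, power_b)[0, 1]
tau_speed = kendall_tau(speed_a, speed_b)
tau_power = kendall_tau(speed_a, power_b)
print(f"Pearson: {pearson_speed:.3f} -> {pearson_power:.3f}")
print(f"Kendall: {tau_speed:.3f} -> {tau_power:.3f}")
```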

First, the empirical Kendall's tau is computed for every pair of variables; for \(N\) WFs there are \(N\left( {N - 1} \right)/2\) pairs at each time step. The initial tree is then formed as the spanning tree that maximizes the sum of the absolute empirical Kendall's taus. For each edge in \(E_{1}\) of the initial tree, the bivariate copula family that minimizes the AIC is chosen from a predefined set of families. For the subsequent trees, each edge \(e \in E_{i}\) is denoted by \(e = \left\{ {j\left( e \right),k\left( e \right)|D\left( e \right)} \right\}\), where \(j\left( e \right)\) and \(k\left( e \right)\) are the conditioned nodes and \(D\left( e \right)\) is the conditioning set associated with edge \(e\).
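Selecting the first tree can be framed as a maximum spanning tree problem on the matrix of absolute Kendall's taus, solved here with Prim's algorithm. The sketch covers only the structure of \(T_{1}\). Copula family selection by AIC and the construction of the higher trees are left out. The 4 × 4 tau matrix is invented for illustration.

```python
import numpy as np


def max_spanning_tree(weights):
    """Prim's algorithm on a symmetric (N, N) weight matrix; returns N-1 edges."""
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or w[i, j] > w[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical empirical Kendall's tau matrix for four WFs.
tau = np.array([
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.70, -0.30],
    [0.10, 0.70, 1.00, 0.60],
    [0.20, -0.30, 0.60, 1.00],
])
n_wf = tau.shape[0]
print(n_wf * (n_wf - 1) // 2, "candidate pairs")
tree1 = max_spanning_tree(np.abs(tau))          # absolute values: sign does not matter here
total = sum(abs(tau[i, j]) for i, j in tree1)
print(tree1, round(total, 2))
```

Absolute values are used because a strong negative dependence is as informative as a strong positive one. Rotated copula families can represent it on the selected edge.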

The R-vine is a flexible copula that builds a tree-like dependence structure through sequential modelling, which allows dependency patterns to be represented flexibly. Each edge of the vine can use a different bivariate copula family, so a wide range of dependencies can be modelled at each level. The bivariate families include the Gaussian, Student's t, Gumbel, Clayton, Frank, Joe, Tawn, and independence copulas, and their hybrids. Versions rotated by 90, 180, and 270 degrees are modifications of the base copula that capture asymmetric tail dependence and negative dependence; the 180-degree rotation is the survival copula. More information on vine copulas, including their CDF equations and parameter ranges, can be found in [27, 38]. Finally, the parameters of the selected bivariate copulas are estimated for each tree edge. Because this paper focuses on multistep scenario forecasting, the copula parameters are estimated for each time step separately rather than for the entire time period. The procedure is repeated for the remaining trees.
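The effect of rotation can be seen by transforming samples from a base copula. This sketch assumes one common convention: if (U1, U2) follows the base copula, then (1 − U1, U2), (1 − U1, 1 − U2) and (U1, 1 − U2) follow its 90-, 180- and 270-degree rotations. The base is a Clayton copula with θ = 2, sampled with the Marshall–Olkin algorithm; its lower-tail dependence coefficient is 2^(−1/θ) ≈ 0.71. Spearman's rho, i.e. the Pearson correlation of the uniform samples, shows the sign flip. The empirical tail ratios at q = 0.02 show the tail dependence moving between corners.

```python
import numpy as np


def sample_clayton(n, theta, seed=None):
    """Marshall-Olkin sampler for the bivariate Clayton copula (theta > 0)."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    e = rng.exponential(scale=1.0, size=(n, 2))
    return (1.0 + e / v) ** (-1.0 / theta)


def rotate(u, degrees):
    """Rotate copula samples by 0, 90, 180 or 270 degrees (convention assumed above)."""
    u1, u2 = u[:, 0], u[:, 1]
    flips = {0: (u1, u2), 90: (1 - u1, u2), 180: (1 - u1, 1 - u2), 270: (u1, 1 - u2)}
    return np.column_stack(flips[degrees])


def tail_dependence(u, q=0.02):
    """Empirical lower and upper tail dependence ratios at threshold q."""
    lower = np.mean((u[:, 0] < q) & (u[:, 1] < q)) / q
    upper = np.mean((u[:, 0] > 1 - q) & (u[:, 1] > 1 - q)) / q
    return lower, upper


base = sample_clayton(20000, theta=2.0, seed=3)
for deg in (0, 90, 180, 270):
    r = rotate(base, deg)
    rho = np.corrcoef(r[:, 0], r[:, 1])[0, 1]     # Spearman's rho on the uniform scale
    low, up = tail_dependence(r)
    print(deg, round(rho, 2), round(low, 2), round(up, 2))
```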

Evaluation of Scenarios

The quality of scenarios is of utmost importance because it affects decision-making in stochastic programming. Various scenario quality assessment methods, such as reliability diagrams, sharpness, skill scores and the energy score, are discussed in [39]. Most of these methods are limited to univariate marginal distributions and overlook the temporal dependence structure. Multivariate verification tools overcome this drawback. In this paper, the energy score is used to evaluate the quality of the generated scenarios.

Energy Score

The energy score is a proper scoring rule for verifying multivariate scenario forecasts and has been used to evaluate time-coupled power infeed scenarios [39]. It is negatively oriented, i.e., lower values are better. Given the observed wind power vector \(y\) and the matrix \(Y_{s}\) of corresponding generated scenarios with rows \(y_{s,i}\), the energy score is calculated as:

$$ES\left( {Y_{s} ,y} \right) = \frac{1}{m}\sum\limits_{i = 1}^{m} {\left\| {y_{s,i} - y} \right\|} - \frac{1}{{2m^{2} }}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {\left\| {y_{s,i} - y_{s,j} } \right\|} }$$
(8)

where \(m\) is the number of generated scenarios and \(\| \cdot \|\) is the Euclidean norm.
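Eq. (8) translates directly into array operations. The sketch below assumes that each scenario and the observation are flattened into vectors of equal length, for example 24 look-ahead steps times N WFs. The function name is hypothetical.

```python
import numpy as np


def energy_score(scenarios, observed):
    """Energy score of Eq. (8); lower is better.

    scenarios : (m, d) array, one generated scenario y_{s,i} per row
    observed  : (d,) realized wind power vector y (e.g. 24 steps x N WFs, flattened)
    """
    ys = np.asarray(scenarios, dtype=float)
    y = np.asarray(observed, dtype=float)
    m = ys.shape[0]
    term1 = np.linalg.norm(ys - y, axis=1).mean()                         # accuracy term
    pairwise = np.linalg.norm(ys[:, None, :] - ys[None, :, :], axis=2)
    term2 = pairwise.sum() / (2 * m ** 2)                                 # spread term
    return term1 - term2


print(energy_score([[-1.0], [1.0]], [0.0]))   # 1 - 4/8 = 0.5

# Unbiased scenarios around the observation score better than biased ones.
rng = np.random.default_rng(0)
obs = rng.normal(size=24)
good = obs + rng.normal(scale=0.1, size=(200, 24))
biased = obs + 1.0 + rng.normal(scale=0.1, size=(200, 24))
print(energy_score(good, obs) < energy_score(biased, obs))   # True
```

The second term rewards scenario sets whose spread matches the actual uncertainty. This is why the score stays proper instead of simply favouring a single point forecast.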

Cross-Correlation Function

The cross-correlation function measures the degree of dependency between two time series at different time lags and locations. It shows how one time series behaves when the other is shifted in time. Let \(X_{i}\) and \(X_{j}\) be the sample scenario time series of the \(i^{th}\) and \(j^{th}\) WFs, respectively, each consisting of \(m\) observations. The cross-correlation \(\rho_{i,j}\) between \(X_{i}\) and \(X_{j}\) is defined as the ratio of their covariance to the square root of the product of their variances [34, 35]:

$$\rho_{i,j} = \frac{{\hat{\gamma }_{i,j} }}{{\sqrt {\sigma_{i}^{2} \sigma_{j}^{2} } }}$$
(9)

where \(\hat{\gamma }_{i,j}\) is the sample covariance, and \(\sigma_{i}^{2}\) and \(\sigma_{j}^{2}\) are the variances of time series \(X_{i}\) and \(X_{j}\), respectively. The cross-covariance of \(X_{i}\) and \(X_{j}\) at lag \(l\) is given by Eq. (10).

$$\hat{\gamma }_{i,j} = \frac{1}{m}\sum\limits_{t = 1}^{m - l} {\left[ {\left( {X_{i}^{t} - \overline{X}_{i} } \right)\left( {X_{j}^{t + l} - \overline{X}_{j} } \right)} \right]}$$
(10)

where \(\overline{X}_{i}\) and \(\overline{X}_{j}\) are the means of the \(X_{i}\) and \(X_{j}\) time series, respectively. In this paper, the cross-correlation function is used to validate the spatiotemporal correlation between different wind power time series.
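Eqs. (9) and (10) can be computed directly. The sketch assumes equal-length series and non-negative lags, with variances normalized by m to match Eq. (10). The function name and the synthetic delayed series in the usage example are hypothetical.

```python
import numpy as np


def cross_correlation(x_i, x_j, lag):
    """Sample cross-correlation of Eqs. (9)-(10) for a non-negative lag l.

    Pairs X_i at time t with X_j at time t + lag; swap the arguments for
    negative lags.
    """
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    m = len(x_i)
    di, dj = x_i - x_i.mean(), x_j - x_j.mean()
    gamma = np.sum(di[: m - lag] * dj[lag:]) / m        # Eq. (10)
    return gamma / np.sqrt(x_i.var() * x_j.var())       # Eq. (9), 1/m variances


# Usage: site B repeats site A's fluctuations with a delay of 3 steps.
rng = np.random.default_rng(2)
z = rng.normal(size=1003)
site_a, site_b = z[3:], z[:-3]          # site_b[t + 3] == site_a[t]
ccf = [cross_correlation(site_a, site_b, l) for l in range(11)]
best_lag = int(np.argmax(ccf))
print(best_lag, round(ccf[best_lag], 3))
```

The peak lag reveals how fluctuations propagate from one WF to another. This is the temporal-spatial signature that generated scenarios are expected to reproduce.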

Case Study

Dataset

The efficacy of the proposed model is validated on real datasets from neighbouring WFs. The model is implemented on publicly available wind power data from Australian WFs [33, 40], and nine WFs from four regions are selected to generate wind power scenarios. Table 2 lists the Australian Energy Market Operator identity (AEMO ID) of each WF together with its installed capacity, latitude, and longitude. The historical wind power data of each WF are normalized by its installed capacity before being input to the scenario generation models. Figure 1 shows the locations of the nine WFs. The data have a 5-min resolution, and 24-step-ahead scenarios are generated. Of the available data, 34561 data points at 5-min resolution are used for training.

Table 2 WF IDs and installed capacities [40]
Fig. 1 Location of WFs [40]

Selection of Benchmark Models

Among time series models, the univariate ARIMA and the multivariate VAR and VARMA models are considered as benchmarks. ARIMA is a popular univariate time series model used in a variety of applications, including wind power forecasting [41, 42]. VAR is used to highlight the advantage of a multivariate time series model over a univariate one. VAR is a special case of the VARMA model without the moving average component [32, 33]. VARMA is used as a benchmark to demonstrate the benefit of the proposed hybrid VARMA-Copula model over the standard VARMA model. Machine learning models may have an advantage over time series models in capturing the nonlinear dynamics of wind power generation; however, their performance relies on careful input feature selection and hyperparameter tuning. In this paper, the ANN [10] and GAN [11, 12] models are chosen as machine learning benchmarks. ANN is suited to prediction with labeled input datasets, while GAN excels at generating realistic data samples, such as wind power scenarios. A GAN consists of a generator and a discriminator, and during training both networks improve their performance through an adversarial process. GAN can produce distinct wind power scenarios that capture intrinsic features of historical wind power data, such as ramps and spikes. Both ANN and GAN models can generate wind power scenarios considering spatiotemporal correlations [10,11,12]. A detailed comparison of all scenario generation models is beyond the scope of this paper.
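For reference, the nesting of the time series benchmarks can be summarized by a generic VARMA(p, q) model for the K-dimensional vector \(\mathbf{X}_{t}\) of per-unit WF outputs. The notation below is ours and may differ from the model definitions given earlier in the paper; some references and packages write the MA term with a minus sign, which only flips the sign of \(\boldsymbol{\Theta}_{j}\).

$$\mathbf{X}_{t} = \mathbf{c} + \sum_{i=1}^{p} \boldsymbol{\Phi}_{i} \mathbf{X}_{t-i} + \boldsymbol{\varepsilon}_{t} + \sum_{j=1}^{q} \boldsymbol{\Theta}_{j} \boldsymbol{\varepsilon}_{t-j}$$

Here \(\mathbf{c}\) is a constant vector, \(\boldsymbol{\Phi}_{i}\) and \(\boldsymbol{\Theta}_{j}\) are K × K AR and MA coefficient matrices, and \(\boldsymbol{\varepsilon}_{t}\) is white noise with covariance matrix \(\boldsymbol{\Sigma}\). Setting all \(\boldsymbol{\Theta}_{j} = \mathbf{0}\) gives the VAR(p) model; with K = 1, applied to a series differenced d times, the same form becomes a univariate ARIMA(p, d, q) model. The off-diagonal entries of \(\boldsymbol{\Phi}_{i}\) and \(\boldsymbol{\Theta}_{j}\) are what let a multivariate model carry information between WFs, which a set of separate univariate ARIMA models cannot do.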

Simulation Platforms and Packages

All simulations have been performed in R version 4.1.2 using the RStudio environment. The Multivariate Time Series (MTS) package, version 1.2.1, and the VineCopula package, version 2.4.5, are used for modelling VARMA and the R-vine copula, respectively [35, 43]. In addition to these main packages, the readxl, writexl, vars, forecast, copula and RSNNS packages are used for basic functions and the benchmark models. The reference manuals for these packages are available in the Comprehensive R Archive Network (CRAN) repository [44]. All of the code and results described in this paper are publicly available at https://zenodo.org/deposit/8017860 and https://zenodo.org/deposit/8017980.

VARMA Results

This section describes the results obtained with the VARMA model, whose parameters are estimated by maximum likelihood. To reduce computational effort, a VARMA model of order (1, 1) is used. In total, 171 parameters are estimated for the VARMA(1, 1) model: 81 AR, 81 MA and 9 standard variance parameters. For simplicity, three representative WFs (WF2, WF5 and WF8), each from a different zone, are selected, and the results for these three WFs are discussed in detail. Table 3 shows the estimated AR and MA coefficients for the three representative WFs. After parameter estimation, the residual covariance matrix is used to generate normally distributed white noise for each WF.
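The parameter count and the role of the residual covariance matrix can be illustrated with a small Python simulation of a VARMA(1, 1) process. This is a sketch under stated assumptions, not the fitted model: it uses a zero-mean form for brevity (the fitted model may include constants), and the coefficient and covariance matrices are hypothetical, since the Table 3 estimates are not reproduced here. The parameter count follows the breakdown given in the text (81 AR + 81 MA + 9 variance terms).

```python
import numpy as np

K = 9  # number of WFs


def varma11_param_count(k):
    """k*k AR + k*k MA coefficients + k variance terms, as broken down in the text."""
    return 2 * k * k + k


def simulate_varma11(phi, theta, sigma, x0, steps, rng):
    """Simulate X_t = Phi X_{t-1} + eps_t + Theta eps_{t-1} (zero-mean form).

    eps_t is multivariate normal white noise with covariance sigma, standing in
    for the residual covariance matrix of the fitted model.
    """
    k = len(x0)
    x_prev = np.asarray(x0, dtype=float)
    eps_prev = np.zeros(k)
    path = np.empty((steps, k))
    for t in range(steps):
        eps = rng.multivariate_normal(np.zeros(k), sigma)
        x = phi @ x_prev + eps + theta @ eps_prev
        path[t] = x
        x_prev, eps_prev = x, eps
    return path


# Hypothetical, stable coefficients (not the estimates in Table 3)
rng = np.random.default_rng(42)
phi = 0.6 * np.eye(K)
theta = 0.2 * np.eye(K)
sigma = 0.01 * (0.5 * np.eye(K) + 0.5)  # variances 0.01, covariances 0.005 between WFs
path = simulate_varma11(phi, theta, sigma, x0=np.zeros(K), steps=24, rng=rng)
```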

Table 3 Estimated AR and MA Coefficients for three representative WFs

Figure 2 shows residual diagnostics of the VARMA model for the three representative WFs. Because the input data are normalized, the residuals range from -1 to 1. The residuals resemble a noise signal with many random spikes, so they are examined further to assess the quality of the VARMA model. For this purpose, a sample autocorrelation function (ACF) and a quantile–quantile (QQ) plot are drawn for the VARMA residuals. ACF plots show the correlation of a series with its own values at different time lags. Ideally, residuals should be uncorrelated. However, the ACF plots of WF2, WF5 and WF8 show that the sample autocorrelation exceeds the confidence limits at multiple lags; for WF2, the sample autocorrelation at lags 5 to 8, 13 and 16 exceeds the threshold. These high autocorrelation values indicate that correlation remains in the residuals because of model fitting errors, and this remaining dependence must be extracted using copula models.
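The check applied to the ACF plots can be sketched in Python: compute the sample autocorrelation of the residuals at each lag and flag the lags that fall outside the approximate 95% white-noise limits \(\pm 1.96/\sqrt{n}\), which are the limits commonly drawn on ACF plots. The helper `acf_exceedances` is illustrative, and the plotting tool used in the paper may draw slightly different limits.

```python
import numpy as np


def acf_exceedances(resid, max_lag, z=1.96):
    """Sample ACF at lags 1..max_lag and the lags outside the +/- z/sqrt(n) band."""
    r = np.asarray(resid, dtype=float)
    n = len(r)
    d = r - r.mean()
    denom = np.sum(d ** 2)
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.sum(d[: n - k] * d[k:]) / denom for k in lags])
    bound = z / np.sqrt(n)  # approximate 95% limit for white noise
    exceeding = [int(k) for k, a in zip(lags, acf) if abs(a) > bound]
    return acf, bound, exceeding


# Usage: a strictly alternating series is far from white noise
acf, bound, exceeding = acf_exceedances(np.tile([1.0, -1.0], 50), max_lag=5)
```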

Fig. 2 Residuals, sample autocorrelation and QQ plot of three WFs for the first time horizon

The QQ plot compares the quantiles of the sample residuals with those of the standard normal distribution. If the residuals were normally distributed, the points would lie on a straight line. However, the sample quantiles of the WF residuals are not well aligned with the standard normal quantiles. This shows that the residuals do not follow the assumed normal marginal distribution and exhibit tail asymmetry. As shown in the figure, the tail asymmetry is greater for WF5 and WF8 than for WF2. Copula models are therefore required to model such asymmetric dependence.
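The QQ comparison can be sketched in Python as below. `normal_qq` pairs standardized, sorted residuals with standard normal quantiles at plotting positions (i - 0.5)/n, and `tail_gaps` is a hypothetical summary of how far each tail departs from the 45-degree line; neither is taken from the paper, and an exponential sample is used only as an obviously skewed stand-in for residuals with tail asymmetry.

```python
from statistics import NormalDist

import numpy as np


def normal_qq(resid):
    """Return (theoretical normal quantiles, standardized sorted residuals)."""
    r = np.sort(np.asarray(resid, dtype=float))
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions in (0, 1)
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    z = (r - r.mean()) / r.std(ddof=1)
    return theo, z


def tail_gaps(theo, z, frac=0.05):
    """Mean vertical gap between the QQ points and the 45-degree line in each tail.

    For normal residuals both gaps are near zero; a symmetric distribution gives
    gaps of equal size and opposite sign; unequal gaps indicate tail asymmetry.
    """
    k = max(1, int(frac * len(z)))
    lower = float(np.mean(z[:k] - theo[:k]))
    upper = float(np.mean(z[-k:] - theo[-k:]))
    return lower, upper


rng = np.random.default_rng(7)
theo_n, z_n = normal_qq(rng.normal(size=5000))
theo_e, z_e = normal_qq(rng.exponential(size=5000))  # right-skewed stand-in
```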

Copula Results

The residuals estimated by the VARMA model are separated by time step, and their dependence is modelled with an R-vine copula. A nine-dimensional R-vine copula is required to model the dependence between the considered WFs at each time step. Using the sequential approach of Algorithm 2, the pair-copula construction (PCC) decomposes this 9-dimensional copula into 36 bivariate copulas, arranged in eight trees that graphically represent the dependence structure. The tree structure obtained for the first time step is shown in Fig. 3. The nodes {1 to 9} of the first tree represent the nine considered WFs and are connected by the eight edges {5,8}, {1,5}, {3,1}, {2,3}, {6,4}, {6,2}, {9,6} and {9,7}. The copula type, copula number, estimated parameters and Kendall’s tau \(\hat{\tau }\) for each edge of the tree are shown in Table 4. In the second tree, the nodes {5,8} and {1,5} are joined by the edge \(\{ 1,8|5\}\) because they share node {5} and therefore satisfy the proximity condition. The other edges of the tree are defined similarly.
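The counts above and the proximity condition can be verified with a short Python sketch. It only enumerates candidate second-tree edges from the first-tree edges reported for the first time step; it does not choose among them. Because node 6 has three neighbours in \(T_{1}\), there are eight candidates for the seven edges that \(T_{2}\) needs, and the final choice is left to the selection algorithm (in the VineCopula package, `RVineStructureSelect` typically builds each tree as a maximum spanning tree on absolute Kendall's \(\tau\)). The helper names are illustrative.

```python
from itertools import combinations


def n_pair_copulas(d):
    """A d-dimensional R-vine has d - 1 trees and d(d - 1)/2 pair copulas."""
    return d * (d - 1) // 2


def next_tree_candidates(edges):
    """Candidate edges of tree T2 from the edges of tree T1.

    Two T1 edges may be joined only if they share a node (proximity condition);
    the resulting T2 edge is labelled ((a, b), c), i.e. {a, b | c}.
    """
    candidates = []
    for e1, e2 in combinations(edges, 2):
        shared = set(e1) & set(e2)
        if len(shared) == 1:
            c = shared.pop()
            a, b = sorted((set(e1) | set(e2)) - {c})
            candidates.append(((a, b), c))
    return candidates


# First-tree edges reported in the text for the first time step
t1_edges = [(5, 8), (1, 5), (3, 1), (2, 3), (6, 4), (6, 2), (9, 6), (9, 7)]
t2_candidates = next_tree_candidates(t1_edges)
edges_per_tree = [9 - k for k in range(1, 9)]  # 8, 7, ..., 1 edges in T1..T8
```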

Fig. 3 Tree structure of bivariate copulas for the first time horizon

Table 4 Bivariate copula selection for the R-vine structure of the first time horizon

In Fig. 3, the nodes of the first tree \(T_{1}\) are the WFs, and they are joined according to their dependence structure. Tree \(T_{1}\) shows that WF5 and WF8 are more closely linked to each other than to WF3. This is consistent with Fig. 1, where the geographical distance between WF5 and WF8 is smaller than their distance from WF3.

Kendall’s \(\hat{\tau }\) values between -0.64 and 0.67 in tree \(T_{1}\) indicate strong dependence between the WFs. After the vine tree is constructed, a copula is selected for each edge using the sequential method of Algorithm 2. For each edge, this method evaluates a set of candidate copulas and selects the one that best captures the dependence structure observed in the residual data, using a statistical criterion such as the AIC to compare goodness of fit. The AIC measures how well a copula fits the observed data while penalizing its number of parameters; the copula with the minimum AIC is selected. This process is repeated until a copula has been selected for every edge of the vine. The type of copula selected for each edge also depends on the type of tail dependence present in the WF residuals. For the first time step, the selected copulas are Gaussian, Frank, t, BB8, rotated BB8 (90 and 270 degrees), rotated Tawn (types 1 and 2) and the independence copula, as shown in Table 4. Among these, the Frank and Gaussian copulas are one-parameter copulas.
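The AIC-based choice for a single edge can be sketched in Python with a deliberately reduced candidate set: the independence, Gaussian and Frank copulas only (the t, BB8, Tawn and rotated families in Table 4 are omitted), each fitted by a simple grid-search maximum likelihood on rank-based pseudo-observations. The helper names are hypothetical; in R, `BiCopSelect` from the VineCopula package performs this kind of selection over a much larger family set with proper numerical optimization.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def pseudo_obs(x):
    """Rank-based pseudo-observations in (0, 1)."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1)


def select_copula_aic(u, v):
    """Fit independence, Gaussian and Frank copulas by grid-search MLE; pick min AIC."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    x = np.array([_STD_NORMAL.inv_cdf(p) for p in u])
    y = np.array([_STD_NORMAL.inv_cdf(p) for p in v])
    fits = {"independence": (0.0, None)}  # (AIC, parameter): k = 0 and log-lik = 0

    def best_on_grid(loglik, grid):
        lls = [loglik(p) for p in grid]
        i = int(np.argmax(lls))
        return lls[i], grid[i]

    def gauss_ll(rho):
        # Log density of the Gaussian copula at normal scores (x, y)
        q = (rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * (1 - rho ** 2))
        return float(np.sum(-0.5 * np.log1p(-rho ** 2) - q))

    def frank_ll(theta):
        # Log density of the Frank copula, written with expm1 for stability
        a = -np.expm1(-theta)  # 1 - exp(-theta)
        d = a - np.expm1(-theta * u) * np.expm1(-theta * v)
        return float(np.sum(np.log(theta * a) - theta * (u + v) - 2 * np.log(np.abs(d))))

    ll, rho = best_on_grid(gauss_ll, np.linspace(-0.99, 0.99, 199))
    fits["gaussian"] = (2 * 1 - 2 * ll, rho)  # AIC = 2k - 2 log L with k = 1
    theta_grid = np.concatenate([np.linspace(-30, -0.1, 300), np.linspace(0.1, 30, 300)])
    ll, theta = best_on_grid(frank_ll, theta_grid)
    fits["frank"] = (2 * 1 - 2 * ll, theta)
    family = min(fits, key=lambda name: fits[name][0])
    return family, fits


# Usage on synthetic, strongly dependent residuals
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=1000)
family, fits = select_copula_aic(pseudo_obs(z[:, 0]), pseudo_obs(z[:, 1]))
```

Since the independence copula has AIC = 0, a one-parameter family is preferred only if it raises the log-likelihood by more than one unit, which is why weakly dependent edges in the higher trees tend to end up as independence copulas.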

Table 4 shows that both the number of bivariate copulas and the values of \(\hat{\tau }\) decrease from tree \(T_{1}\) to \(T_{8}\). Consequently, independence copulas are selected for most higher-level trees, and most of the dependence is modelled through the lower-level trees. The high proportion of independence copulas can significantly reduce the computational burden of the R-vine copula model. In addition, the dependence model is fitted offline: once it has been established, as many scenarios as needed can be drawn from it without excessive computation. The dependence model therefore does not need to be updated daily; in practice, it can be updated every few weeks.

The choice of an R-vine copula instead of a Gaussian copula for modelling the spatial correlation between WFs is further validated in this paper using the Akaike information criterion (AIC). Figure 4 shows the AIC calculated over the sample set for both copula models for 120 time steps. The R-vine copula achieves a lower AIC than the Gaussian copula, which indicates that it offers a better fit and more flexibility in modelling asymmetric dependence between WFs.
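To make the AIC comparison concrete, the criterion for a copula with \(k\) estimated parameters fitted to \(n\) pseudo-observation vectors \(\mathbf{u}_{t}\) can be written in its standard form (the exact computation in the paper may differ in detail):

$$\mathrm{AIC} = 2k - 2\sum_{t=1}^{n} \ln c\left(\mathbf{u}_{t};\hat{\boldsymbol{\theta}}\right)$$

where \(c\) is the copula density and \(\hat{\boldsymbol{\theta}}\) the estimated parameters. For \(d = 9\) WFs, a Gaussian copula always estimates \(k = d(d-1)/2 = 36\) correlation parameters, whereas for the R-vine \(k\) is the sum of the parameter counts of its 36 pair copulas: 0 for each independence copula, 1 for one-parameter families such as Gaussian and Frank, and 2 for two-parameter families such as t, BB8 and Tawn. The many independence copulas in the higher trees therefore lower the R-vine's penalty term \(2k\), while its flexible pair-copula families can raise the log-likelihood.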

Fig. 4 Performance of R-vine and Gaussian copulas

Scenario Results

Figure 5 shows the per-unit wind power scenarios generated by the proposed approach and the benchmark models for the three representative WFs. For each WF, 1000 scenarios are generated, of which 50 are plotted in a dual-axis (YY) plot for better visualization. In Fig. 5, the left y-axis shows the actual values and the right y-axis shows the scenarios and the mean scenario. The YY plots reveal a significant bias between the actual values and the scenarios generated by the benchmark models, whereas the scenarios generated by the proposed distribution-free approach capture wind power uncertainty with minimum bias. Although the same error distribution is assumed for all approaches, the generated scenarios differ significantly: the VAR model underperforms and its scenarios become nearly linear over time, while the ARIMA model produces abrupt waves, irregular fluctuations and spikes.
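The path-based idea of combining a point forecast with dependent errors can be sketched in Python. This is not the proposed R-vine implementation: a Gaussian copula is swapped in as a simpler stand-in for the dependence model, errors are drawn independently for each horizon step (mirroring the per-time-step dependence modelling described earlier), the marginals are the empirical quantiles of historical residuals (which keeps the marginals distribution-free), and clipping to [0, 1] per unit is our assumption. The function name and all inputs in the example are hypothetical and synthetic.

```python
from math import erf, sqrt

import numpy as np

# Standard normal CDF applied element-wise
_phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))


def generate_scenarios(point_forecast, residuals, corr, n_scen, rng):
    """Path-based scenarios: point forecast + dependent errors with empirical marginals.

    point_forecast: (H, K) per-unit forecasts for H steps and K WFs
    residuals:      (N, K) historical residuals, one column per WF
    corr:           (K, K) correlation matrix of a Gaussian copula (stand-in for R-vine)
    """
    H, K = point_forecast.shape
    scen = np.empty((n_scen, H, K))
    for h in range(H):
        # 1) sample dependent uniforms from the copula
        z = rng.multivariate_normal(np.zeros(K), corr, size=n_scen)
        u = _phi(z)
        # 2) map each uniform through the empirical quantile of that WF's residuals
        err = np.column_stack([np.quantile(residuals[:, k], u[:, k]) for k in range(K)])
        # 3) add to the point forecast and clip to the per-unit range [0, 1]
        scen[:, h, :] = np.clip(point_forecast[h] + err, 0.0, 1.0)
    return scen


rng = np.random.default_rng(3)
K, H = 3, 24
corr = np.full((K, K), 0.9) + 0.1 * np.eye(K)
residuals = rng.normal(0.0, 0.05, size=(2000, K))
forecast = np.full((H, K), 0.5)
scen = generate_scenarios(forecast, residuals, corr, n_scen=1000, rng=rng)
mean_scenario = scen.mean(axis=0)  # (H, K) mean scenario
to_plot = scen[:50]                # e.g. 50 of the 1000 scenarios for display
```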

Fig. 5 Generated scenarios for WF2, WF5 and WF8 using the proposed, VARMA, VAR and ARIMA approaches. In the dual-axis subplots, the right y-axis shows the generated wind power scenarios and the mean scenario in per unit; "per unit" refers to the ratio of a WF's power output to its installed capacity (both in MW)

Scenario Results Evaluation through Energy Score, Kendall’s correlation and Cross-Correlation Plots

The quality of the generated scenarios is evaluated using the energy score, Kendall's correlation heat maps and CCF plots. The energy scores of the proposed and benchmark approaches are given in Table 5, where the minimum energy score for each WF is highlighted in bold. The proposed method achieves the minimum energy score for 55.56% of the WFs (5 of 9), while GAN and ANN achieve it for 33.33% (3 of 9) and 11.11% (1 of 9) of the WFs, respectively. Among the time series approaches, VAR gives a lower energy score than ARIMA for 88.89% of the WFs (8 of 9), owing to its multivariate nature. Compared with VAR and VARMA, the proposed VARMA-Copula model provides the lowest energy score for all WFs, because it models both correlation and tail dependence.
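For reference, a common multivariate form of the energy score for an ensemble of \(M\) scenario vectors \(x_{m}\) and an observation \(y\) is \(\mathrm{ES} = \frac{1}{M}\sum_{m}\lVert x_{m} - y\rVert - \frac{1}{2M^{2}}\sum_{m}\sum_{m'}\lVert x_{m} - x_{m'}\rVert\), where lower is better. The Python sketch below implements this form; which dimensions make up each vector (for example, the 24 steps of one WF) and how scores are averaged over test days are assumptions, and the normalization used for Table 5 may differ.

```python
import numpy as np


def energy_score(scenarios, observed):
    """Energy score of an ensemble of M scenario vectors against one observed vector.

    ES = mean_m ||x_m - y|| - 1/(2 M^2) * sum_m sum_m' ||x_m - x_m'||; lower is better.
    """
    X = np.asarray(scenarios, dtype=float)
    y = np.asarray(observed, dtype=float)
    M = X.shape[0]
    term1 = np.mean(np.linalg.norm(X - y, axis=1))
    # Loop over rows to keep memory at O(M * D) instead of O(M^2 * D)
    term2 = sum(np.linalg.norm(X - X[m], axis=1).sum() for m in range(M)) / (2 * M ** 2)
    return float(term1 - term2)


# Toy check: two scalar scenarios 0 and 2 around an observation of 1
es = energy_score([[0.0], [2.0]], [1.0])  # 1.0 - 4 / 8 = 0.5
```

The first term rewards scenarios close to the observation, and the second term rewards spread among the scenarios, so a biased but tight ensemble is penalized.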

Table 5 Energy Score Evaluation

Kendall's correlation plots and the cross-correlation function are used to check the spatiotemporal correlation in the generated scenarios. Figure 6 shows the Kendall's correlation plot for the considered WFs. Kendall's correlation ranges from -1 to 1, where -1 indicates a perfect negative correlation, 1 a perfect positive correlation and 0 no correlation. The plot is drawn for the actual data and for the scenarios generated by the proposed and benchmark approaches to show the spatial correlation between the WFs. Fig. 6 shows that the rank correlation of the scenarios generated by the proposed approach is almost the same as that of the actual data, whereas the deviation is significant for the benchmark approaches. This small deviation in the heat map confirms that the proposed approach retains the spatial correlations in the generated scenarios.
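The heat-map comparison can be sketched in Python: compute Kendall's \(\tau\) for every pair of WFs in the actual data and in a set of scenarios, then summarize the gap between the two matrices. `kendall_tau_b` uses the tie-corrected tau-b (per-unit wind power often contains repeated zeros), and `mean_abs_deviation` is a hypothetical summary; the paper compares the heat maps visually.

```python
import numpy as np


def kendall_tau_b(x, y):
    """Kendall's tau-b from the signs of all pairwise differences (handles ties).

    Undefined (nan) if either series is constant.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    iu = np.triu_indices(len(x), k=1)  # each pair (i < j) once
    sx = np.sign(x[:, None] - x[None, :])[iu]
    sy = np.sign(y[:, None] - y[None, :])[iu]
    return float(np.sum(sx * sy) / np.sqrt(np.sum(sx ** 2) * np.sum(sy ** 2)))


def kendall_matrix(data):
    """K x K Kendall's tau matrix for a (T, K) array with one column per WF."""
    data = np.asarray(data, dtype=float)
    K = data.shape[1]
    tau = np.eye(K)
    for i in range(K):
        for j in range(i + 1, K):
            tau[i, j] = tau[j, i] = kendall_tau_b(data[:, i], data[:, j])
    return tau


def mean_abs_deviation(tau_actual, tau_scenario):
    """Average |difference| over the off-diagonal entries (a simple heat-map gap)."""
    off = ~np.eye(tau_actual.shape[0], dtype=bool)
    return float(np.mean(np.abs(tau_actual - tau_scenario)[off]))


# Usage: two short series with one discordant pair out of six
tau = kendall_matrix([[1.0, 1.0], [2.0, 3.0], [3.0, 2.0], [4.0, 4.0]])
```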

To complement the Kendall's correlation coefficient, a significance test (the “r test”) is also performed on sample scenarios generated by the proposed and benchmark approaches and on the actual data. This test determines whether the observed correlation coefficient is significantly different from zero, and hence whether the association is statistically significant. The null hypothesis assumes no correlation (\(\tau = 0\)), and the alternative hypothesis a non-zero correlation. The results of the “r test” are highlighted in red in Fig. 6 and show strong correlations between the WFs. For example, WF1 is strongly correlated with WF2, WF3, WF5 and WF8 (Fig. 6).
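The paper does not give the details of the “r test”; as a stand-in, the Python sketch below uses the standard large-sample normal approximation for testing \(H_{0}: \tau = 0\). For reference, R's `cor.test(..., method = "kendall")` by default computes an exact p-value for small samples without ties and a normal approximation otherwise, so results may differ slightly for small samples. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def kendall_tau_test(tau, n):
    """Two-sided large-sample test of H0: tau = 0 for a sample of n pairs.

    Under H0, tau is approximately normal with variance 2(2n + 5) / (9 n (n - 1)).
    Returns (z statistic, p-value).
    """
    z = 3.0 * tau * sqrt(n * (n - 1)) / sqrt(2.0 * (2 * n + 5))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


z_strong, p_strong = kendall_tau_test(0.5, n=100)  # clearly significant
z_weak, p_weak = kendall_tau_test(0.05, n=30)      # not significant at the 5% level
```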

Fig. 6 Kendall's correlation for the generated scenarios and the actual output of the considered WFs. Scenarios are generated using the proposed approach, VARMA, VAR, GAN and ANN methods

The Kendall's rank correlation coefficient matrix can be used to create a correlation plot, which displays only the spatial correlation between two data series over the given time period. In contrast, the CCF captures the correlation between a fixed and a shifted (lagged) series as a function of the lag, and therefore reflects both spatial and temporal correlation. Figure 7 shows the CCFs for the representative WFs. The CCF of the mean scenario generated by the proposed VARMA-Copula approach follows a nearly identical pattern to, and moves in the same direction as, the CCF of the actual data. However, the CCFs obtained with the VARMA, VAR, ANN and GAN models differ significantly from those of the actual data. The proposed approach models the spatial correlation between WFs in two stages. First, spatial correlation is modelled through the cross-coefficients linking the WFs in the VARMA model. Second, the remaining spatial correlation in the VARMA residuals is captured with the R-vine copula.

Fig. 7 CCF plots of three WFs for their actual data and the mean power scenario using the proposed approach, VARMA, VAR, ANN and GAN methods

Conclusion

High-quality scenarios play a significant role in addressing uncertainty in decision-making problems of power systems and electricity markets. This paper has proposed a hybrid VARMA-Copula, distribution-free approach that generates wind power scenarios for multiple WFs considering spatiotemporal correlation, without assuming a marginal distribution. The proposed model has been applied to nine WFs located in Australia, and its superiority has been demonstrated by comparison with a set of benchmark models. The results show that the proposed approach generates quality scenarios with the lowest energy score for most WFs, without loss of spatiotemporal correlation, as shown by the Kendall's correlation and CCF plots. This highlights that uncertainty modelling is improved by incorporating the spatiotemporal correlation between nearby WFs. The R-vine is a flexible copula model that combines a variety of bivariate copulas to represent asymmetric dependence, which enables the proposed method to generate quality wind power scenarios with low energy scores. The proposed method could be enhanced by applying adaptive and nonlinear forecasting models with time-varying parameters. The work could also be extended to generate load, solar generation and price scenarios for various power system and electricity market applications.