1 Introduction

Increased interest in the analysis of coherent measures of risk is motivated by their application as mathematical models of risk quantification in finance and other areas. This line of research leads to new mathematical problems in convex analysis, optimization, and statistics. The risk assessment is expressed mathematically as a functional of a random variable, which may be nonlinear with respect to the probability measure. Most frequently, the risk measures of interest in practice arise when we evaluate gains or losses depending on the choice z, which represents the control of a decision maker, and on random quantities, which may be summarized in a random vector X. More precisely, we are interested in the functional f(z,X), which may be optimized under practically relevant restrictions on the decisions z. Most frequently, some moments of the random variable \(Y= f(z,X)\) are evaluated. However, when models of risk are used, the existing theory of statistical estimation is not always applicable.

Our goal is to address the question of statistical estimation of composite functionals depending on random vectors and their moments. Additionally, we analyze the optimal values of such functionals, when they depend on finite-dimensional decisions within a deterministic compact set. The known coherent measures of risk can be cast in the structures considered here and we shall specialize our results to several classes of popular risk measures. We emphasize, however, that the results address composite functionals of more general structure with potentially wider applicability.

An axiomatic definition of risk measures was first proposed in Kijima and Ohnishi (1993). The currently accepted definition of a coherent risk measure was introduced in Artzner et al. (1999) for finite probability spaces and was further extended to more general spaces in Ruszczyński and Shapiro (2006), Föllmer and Schied (2011). Given a probability space \((\varOmega ,{\mathcal F},P)\), we consider the set of random variables defined on it, which have finite pth moments, and denote it by \(\mathcal {L}^p(\varOmega ,{\mathcal F},P)\). A coherent measure of risk is a convex, monotonically increasing, and positively homogeneous functional \(\rho : \mathcal {L}^p(\varOmega ,{\mathcal F},P)\rightarrow \bar{\mathbb {R}}\), which satisfies the translation property: \(\rho (Y+a)=\rho (Y)+ a\) for all \(a\in \mathbb {R}\). Here \(\bar{\mathbb {R}} = \mathbb {R}\cup \{+\infty \}\) and we assume that Y represents losses, i.e., smaller realizations are preferred. Related concepts are introduced in Rockafellar et al. (2006), Föllmer and Schied (2002).

A measure of risk is called law-invariant if it depends only on the distribution of the random variable, i.e., if \(\rho (X)= \rho (Y)\) for all random variables \(X,Y\in \mathcal {L}^p(\varOmega ,{\mathcal F},P)\) having the same distribution.

A practically relevant law-invariant coherent measure of risk is the mean-semideviation of order \(p\ge 1\) (see Ogryczak and Ruszczyński 1999, 2001; Shapiro et al. 2009, Sect. 6.2.2), defined in the following way:

$$\begin{aligned} \rho (X) = \mathbb {E}[X] + \kappa \left\| (X-\mathbb {E}[X])_+\right\| _p = \mathbb {E}[X] + \kappa \left[ \mathbb {E}\left[ \left( \max \{0,X- \mathbb {E}[X]\}\right) ^p\right] \right] ^{\frac{1}{p}}, \end{aligned}$$
(1)

where \(\kappa \in [0,1]\). Note the nonlinearity with respect to the probability measure in formula (1).

Another popular law-invariant coherent measure of risk is the Average Value at Risk at level \(\alpha \in (0,1]\) (see Rockafellar and Uryasev 2002; Ogryczak and Ruszczyński 2002), which is defined as follows:

$$\begin{aligned} \mathrm{AVaR}_{\alpha }(X) = \frac{1}{\alpha } \int _{1-\alpha }^1 F_X^{-1}(\beta )\;\mathrm{d}\beta = \min _{\eta \in \mathbb {R}} \left\{ \eta + \frac{1}{\alpha } \mathbb {E}[(X- \eta )_+]\right\} . \end{aligned}$$
(2)

Here, \(F_X(\cdot )\) denotes the distribution function of X. The reader may consult, for example, (Shapiro et al. 2009, Chapter 6) and the references therein for a more detailed discussion of these risk measures and their representations.

The risk measure \(\mathrm{AVaR}_{\alpha }(\cdot )\) plays a fundamental role as a building block in the description of every law-invariant coherent risk measure via the Kusuoka representation. The original result is presented in Kusuoka (2001) for risk measures defined on \(\mathcal {L}^{\infty }(\varOmega ,{\mathcal F},P)\), with an atomless probability space. It states that for every law-invariant coherent risk measure \(\rho (\cdot )\), a convex set \(\mathcal {M}\subset \mathcal {P}(0,1]\) exists such that for all \(X\in \mathcal {L}^{\infty }(\varOmega ,{\mathcal F},P)\), it holds

$$\begin{aligned} \rho (X) = \sup _{m \in \mathcal {M}} \int _0^1 \mathrm{AVaR}_{\alpha }(X)\;m(\mathrm{d}\alpha ). \end{aligned}$$
(3)

Here \(\mathcal {P}(0,1]\) denotes the set of probability measures on the interval (0, 1]. This result is extended to the setting of \(\mathcal {L}^p\) spaces with \(p \in [1, \infty )\); see Frittelli and Rosazza-Gianin (2005), Pflug and Römisch (2007), Pflug and Wozabal (2010), Shapiro et al. (2009), Dentcheva et al. (2010), and the references therein.

The extremal representation of \(\mathrm{AVaR}_{\alpha }(X)\) on the right-hand side of (2) was used as a motivation in Krokhmal (2007) to propose the following higher moment coherent measures of risk:

$$\begin{aligned} \rho (X)=\min _{\eta \in \mathbb {R}} \left\{ \eta +\frac{1}{\alpha }\Vert (X - \eta )_+\Vert _p\right\} , \quad p > 1. \end{aligned}$$
(4)

These risk measures are special cases of a more general family considered in Cheridito and Li (2009); they are also examples of the optimized certainty equivalents of Ben-Tal and Teboulle (2007). In the paper Dentcheva et al. (2010), the explicit Kusuoka representation for the higher order risk measures (4) was described by utilizing duality theorems from Rockafellar (1974). These risk measures are used for portfolio optimization in Krokhmal (2007), where their advantages in comparison to the classical mean–variance optimization model of Markowitz (1952, 1987) are demonstrated on examples. The recent work of Matmoura and Penev (2013) indicates that if this type of risk measure is used as a risk criterion in European option portfolio optimization, the time evolution of the portfolio is superior to the evolution of a portfolio optimized with respect to the AVaR risk or with respect to the mean–variance optimization model of Markowitz. Similar observations were recently made by Gülten and Ruszczyński (2015).

A connection of measures of risk to the utility theories has been widely discussed in the literature. Many of the risk measures of interest can be expressed via optimization of the so-called optimized certainty equivalent (Ben-Tal and Teboulle 2007) for a suitable choice of the utility function. Relations of risk measures to rank-dependent utility functions are given in Föllmer and Schied (2011). In Dentcheva and Ruszczyński (2014), it is established that law-invariant coherent measures of risk are a numerical representation of a certain preference relation defined on the space of bounded quantile functions, and are closely related to the dual utility theory.

In practical applications, we deal with samples and stochastic models of the underlying random quantities. Therefore, questions pertaining to statistical estimation of measures of risk are crucial to the proper use of law-invariant measures of risk. Several measures of risk have an explicit formula, which can be used as a plug-in estimator, with the original measure P replaced by the empirical measure. The empirical quantile is a natural estimator of the Value at Risk. A natural empirical estimator of \(\mathrm{AVaR}_{\alpha }(X)\) leads to the use of the L-statistic (see Jones and Zitikis 2003; Dentcheva and Penev 2010). Furthermore, the Kusuoka representation, as well as the use of distortion functions in insurance, has motivated the construction and analysis of empirical estimates of spectral measures of risk using L-statistics. We refer to Jones and Zitikis (2003, 2007), Brazauskas et al. (2008), Beutner and Zähle (2010), Tsukahara (2013), Belomestny and Krätschmer (2012) for more details on this approach. Some risk measures, such as the tail risk measures of form (4), cannot be estimated via simple explicit formulae but are obtained as a solution of a convex optimization problem with convex constraints. Although the asymptotic behavior of optimal values of sample-based expected value models has been investigated before (see Römisch 2003, Ch. 8; Shapiro et al. 2009, Ch. 5 and the references therein), only a few results address models with risk measures. Römisch (2003, Ch. 8) contains results on perturbation analysis for optimization problems with risk functionals. In Dentcheva et al. (2011), risk measures are used to design statistical tests for stochastic dominance, and our results have direct implications for those tests.

Our paper is organized as follows. Section 2 contains the key result of our paper, which establishes a central limit formula for a composite risk functional. We provide a characterization of the limiting distribution of the empirical estimators for such functionals. Section 3 contains a central limit formula for risk functionals, which are obtained as the optimal values of composite functionals. Section 4 provides asymptotic analysis and central limit formulae for the optimal value of optimization problems which use measures of risk in their objective functions. We pay special attention to some popular measures and we discuss several illustrative examples in Sects. 2, 3, and 4. In Sect. 5, we perform a simple simulation study to assess the accuracy of our approximations. Section 6 concludes.

2 Estimation of composite risk functionals

In the first part of our paper, we focus on functionals of the following form:

$$\begin{aligned} \rho (X) = \mathbb {E}\left[ f_1\left( \mathbb {E}\left[ f_2\left( \mathbb {E}[\;\cdots f_k(\mathbb {E}[f_{k+1}(X)],X)]\;\cdots ,X\right) \right] ,X\right) \right] , \end{aligned}$$
(5)

where X is an m-dimensional random vector, \(f_j:\mathbb {R}^{m_j} \times \mathbb {R}^m \rightarrow \mathbb {R}^{m_{j-1}}\), \(j=1,\ldots ,k\), with \(m_0=1\) and \(f_{k+1}:\mathbb {R}^m\rightarrow \mathbb {R}^{m_k}\). Let \(\mathcal {X}\subset \mathbb {R}^m\) be the domain of the random variable X. We denote the probability distribution of X by P.

Given a sample \(X_1,\ldots ,X_n\) of independent identically distributed observations, we consider the following plug-in empirical estimate of the value of \(\rho \):

$$\begin{aligned} \rho ^{(n)} = \frac{1}{n}\sum _{i_0=1}^n f_1\bigg (\frac{1}{n}\sum _{i_1=1}^n f_2\bigg (\frac{1}{n}\sum _{i_2=1}^n \Big [\,\cdots f_k\bigg (\frac{1}{n}\sum _{i_k=1}^n f_{k+1}(X_{i_k}),X_{i_{k-1}}\bigg )\Big ]\cdots ,X_{i_1}\bigg ),X_{i_0}\bigg ). \end{aligned}$$
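Computationally, the nested sums above collapse: since each inner sum does not depend on the outer summation indices, every level reduces to a single empirical mean, evaluated from the inside out. A minimal generic sketch in Python (our own illustration; the function names and vectorization conventions are ours, not part of the original analysis):

```python
import numpy as np

def composite_plugin(sample, fs, f_last):
    """Plug-in estimator of (5): fs = [f1, ..., fk], each with signature f(eta, x);
    f_last = f_{k+1} with signature f(x); values may be scalars or arrays."""
    x = np.asarray(sample)
    eta = np.mean([f_last(xi) for xi in x], axis=0)      # (1/n) sum_i f_{k+1}(X_i)
    for f in reversed(fs):                               # j = k, k-1, ..., 1
        eta = np.mean([f(eta, xi) for xi in x], axis=0)  # (1/n) sum_i f_j(eta, X_i)
    return eta
```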

Our construction is primarily motivated by the aim to estimate coherent measures of risk from the family of mean–semideviations (Ogryczak and Ruszczyński 1999, 2001).

Example 1

(Semideviations) Consider the functional (1) representing the mean–semideviation of order \(p\ge 1\). In this case, (5) has parameters \(k=2\), \(m=1\), and

$$\begin{aligned} f_1(\eta _1,x)&= x + \kappa \eta _1^{\frac{1}{p}},\\ f_2(\eta _2,x)&= \left[ \max \{0,x- \eta _2\}\right] ^p,\\ f_3(x)&= x. \end{aligned}$$

\(\blacktriangle \)
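As a concrete check, a self-contained sketch of the resulting plug-in estimator for Example 1 (the values of \(\kappa \), p, and the sampling distribution are arbitrary choices of ours):

```python
import numpy as np

def mean_semideviation(sample, kappa=0.5, p=2.0):
    # Nested empirical means for Example 1: f3(x) = x,
    # f2(eta2, x) = max(0, x - eta2)^p, f1(eta1, x) = x + kappa * eta1^(1/p).
    x = np.asarray(sample, dtype=float)
    mu3 = x.mean()                                # (1/n) sum_i f3(X_i)
    mu2 = np.mean(np.maximum(0.0, x - mu3) ** p)  # (1/n) sum_i f2(mu3, X_i)
    return x.mean() + kappa * mu2 ** (1.0 / p)    # (1/n) sum_i f1(mu2, X_i)

rng = np.random.default_rng(0)
print(mean_semideviation(rng.normal(10.0, 3.0, size=100_000)))
```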

Example 2

(Composite semideviations) Consider now a slightly more complex structure, arising in systemic risk analysis. There are N random variables, \(Z_1,\ldots , Z_N\), representing performance of elements of a system. The semideviation risk measures of the components are defined as follows:

$$\begin{aligned} \rho _j(Z_j) = \mathbb {E}[Z_j] + \kappa _j \left\| (Z_j-\mathbb {E}[Z_j])_+\right\| _p,\quad j=1,\ldots ,N. \end{aligned}$$

A random N-dimensional vector Y has support in the simplex \(\big \{y\in \mathbb {R}^N_+: \sum _{j=1}^N y_j=1\big \}.\) We can now consider the random variable

$$\begin{aligned} S = \sum _{j=1}^N Y_j\rho _j(Z_j), \end{aligned}$$

representing the system’s performance. Our intention is to estimate the mean–semideviation risk measure of S:

$$\begin{aligned} \rho (S) = \mathbb {E}[S] + \kappa \left\| (S-\mathbb {E}[S])_+\right\| _p. \end{aligned}$$

Substitution of the functional form of S yields the following expression of the systemic risk:

$$\begin{aligned} \rho (S)= & {} \sum _{j=1}^N \mathbb {E}[Y_j]\left( \mathbb {E}[Z_j] + \kappa _j \left\| (Z_j-\mathbb {E}[Z_j])_+\right\| _p\right) \\&+ \kappa \left\| \left( \sum _{j=1}^N \left( Y_j-\mathbb {E}[Y_j]\right) \left( \mathbb {E}[Z_j] + \kappa _j \left\| (Z_j-\mathbb {E}[Z_j])_+\right\| _p\right) \right) _+\right\| _p. \end{aligned}$$

We collect all random variables in the 2N-dimensional vector \(X=(Y,Z_1,\ldots ,Z_N)\).

We define the following functions:

$$\begin{aligned}&f_6:\mathbb {R}^{2N}\rightarrow \mathbb {R}^N,\quad&f_{6}(y,z)&= z,\\&f_5:\mathbb {R}^N\times \mathbb {R}^{2N}\rightarrow \mathbb {R}^N,\quad&f_{5,j}(\eta ,y,z)&= \max \left( 0, z_j - \eta _j\right) ^p,\quad j=1,\ldots ,N,\\&f_4:\mathbb {R}^N\times \mathbb {R}^{2N}\rightarrow \mathbb {R}^N\quad&f_{4,j}(\eta ,y,z)&= z_j + \kappa _j \eta _j^{1/p},\quad j=1,\ldots ,N,\\&f_3:\mathbb {R}^N\times \mathbb {R}^{2N}\rightarrow \mathbb {R}^{N+1} \quad&f_{3}(\eta ,y,z)&= \left[ \begin{array}{ll} \eta \\ \sum _{j=1}^N \eta _j y_j \end{array}\right] ,\\&f_2:\mathbb {R}^{N+1}\times \mathbb {R}^{2N}\rightarrow \mathbb {R}^2,\quad&f_2(\eta ,y,z)&= \left[ \begin{array}{ll} \eta _{N+1} \\ \max \left( 0, \sum _{j=1}^N \eta _j y_j - \eta _{N+1}\right) ^p \end{array}\right] ,\\&f_1:\mathbb {R}^2\times \mathbb {R}^{2N}\rightarrow \mathbb {R},\quad&f_1(\eta ,y,z)&= \eta _1 + \kappa \eta _2^{1/p}. \end{aligned}$$

With these definitions, the system’s risk has the form (5), where \(m=2N\) and \(k=5\).

\(\blacktriangle \)
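Because the inner empirical means do not depend on the outer summation indices, the corresponding plug-in estimator can again be computed level by level; the direct computation below is equivalent to the nested form. A sketch from a sample of the vector X=(Y,Z) (our own illustration; the sampling distributions and all parameter values are arbitrary assumptions):

```python
import numpy as np

def msd(y, kappa, p):
    # Empirical mean-semideviation of the one-dimensional sample y.
    m = y.mean()
    return m + kappa * np.mean(np.maximum(0.0, y - m) ** p) ** (1.0 / p)

def systemic_risk(Y, Z, kappa, kappas, p=2.0):
    # Y, Z: arrays of shape (n, N) holding the observations (Y_i, Z_i);
    # kappas: the component parameters kappa_1, ..., kappa_N.
    rho_j = np.array([msd(Z[:, j], kappas[j], p) for j in range(Z.shape[1])])
    S = Y @ rho_j          # realizations of S with the component risks plugged in
    return msd(S, kappa, p)

rng = np.random.default_rng(0)
n, N = 5000, 3
Y = rng.dirichlet(np.ones(N), size=n)      # random weights on the simplex
Z = rng.normal(10.0, 3.0, size=(n, N))
print(systemic_risk(Y, Z, kappa=0.5, kappas=np.full(N, 0.5)))
```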

The above example illustrates that functionals of the form (5) with a relatively complex structure often appear in practice. Further examples of this type can be given from the area of multi-criteria optimization (Jahn 2011). A common approach there is an aggregation of the separate criteria. While a linear aggregation, similar to Example 2, is the most common choice, sometimes a nonlinear aggregation makes better sense. In cost–benefit analysis, for example, it makes sense to analyze the cost–benefit ratio, and this leads again to a functional of the form (5).

To formulate the main theorem of this section, we introduce several relevant quantities. We define:

$$\begin{aligned} \bar{f}_j(\eta _j)&= \int _{\mathcal {X}} f_j(\eta _j,x)\,P(\mathrm{d}x),\quad j=1,\ldots ,k, \\ \mu _{k+1}&= \int _{\mathcal {X}} f_{k+1}(x)\,P(\mathrm{d}x),\\ \mu _j&= \bar{f}_{j}(\mu _{j+1}),\quad j=1,\ldots ,k. \end{aligned}$$

Suppose \(I_j\) are compact subsets of \(\mathbb {R}^{m_j}\) such that \(\mu _{j+1}\in \text {int}(I_j)\), \(j=1,\ldots ,k\). We introduce the notation \(\mathcal {H}= \mathcal {C}_1(I_1)\times \mathcal {C}_{m_1}(I_2)\times \cdots \times \mathcal {C}_{m_{k-1}}(I_k)\times \mathbb {R}^{m_{k}}\), where \(\mathcal {C}_{m_{j-1}}(I_j)\) is the space of continuous functions on \(I_j\) with values in \(\mathbb {R}^{m_{j-1}}\), equipped with the supremum norm. The space \(\mathbb {R}^{m_k}\) is equipped with the Euclidean norm, and \(\mathcal {H}\) with the product norm. We use Hadamard directional derivatives of the functions \(f_{j}(\cdot ,x)\) at points \(\mu _{j+1}\) in directions \(\zeta _{j+1}\), i.e.,

$$\begin{aligned} f'_{j}(\mu _{j+1},x;\zeta _{j+1}) = \lim _{\begin{array}{c} {t\downarrow 0}\\ s\rightarrow \zeta _{j+1} \end{array}}\frac{1}{t} \left[ f_{j}(\mu _{j+1}+ts,x) - f_{j}(\mu _{j+1},x)\right] . \end{aligned}$$

For every direction \(d = (d_1,\ldots ,d_{k},d_{k+1}) \in \mathcal {H}\), we define recursively the sequence of vectors:

$$\begin{aligned} \xi _{k+1}(d)= & {} d_{k+1},\nonumber \\ \xi _{j}(d)= & {} \int _{\mathcal {X}} f'_{j}\left( \mu _{j+1},x;\xi _{j+1}(d)\right) \,P(\mathrm{d}x) + d_{j}\left( \mu _{j+1}\right) ,\quad j=k,k-1,\ldots ,1.\nonumber \\ \end{aligned}$$
(6)

Theorem 1

Suppose the following conditions are satisfied:

  1. (i)

    \(\int \Vert f_j(\eta _j,x)\Vert ^2 \;P(dx)<\infty \) for all \(\eta _j\in I_j\), and \(\int \Vert f_{k+1}(x)\Vert ^2 P(dx)<\infty \);

  2. (ii)

    For all \(x\in \mathcal {X}\), the functions \(f_j(\cdot ,x)\), \(j=1,\ldots ,k\), are Lipschitz continuous:

    $$\begin{aligned} \Vert f_j(\eta _j',x)- f_j(\eta _j'',x)\Vert \le \gamma _j(x) \Vert \eta _j'-\eta _j''\Vert ,\quad \forall \; \eta _j',\eta _j''\in I_j, \end{aligned}$$

    and \(\int \gamma _j^2(x)\;P(dx) <\infty \).

  3. (iii)

    For all \(x\in \mathcal {X}\), the functions \(f_j(\cdot ,x)\), \(j=1,\ldots ,k\), are Hadamard directionally differentiable.

Then

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\xi _1(W), \end{aligned}$$

where \(W(\cdot ) = \big ( W_1(\cdot ), \ldots , W_{k}(\cdot ), W_{k+1}\big ) \) is a zero-mean Brownian process on \(I = I_1\times I_2 \times \cdots \times I_k\). Here \(W_j(\cdot )\) is a Brownian process of dimension \(m_{j-1}\) on \(I_j\), \(j=1,\ldots ,k\), and \(W_{k+1}\) is an \(m_k\)-dimensional normal vector. The covariance function of W has the following form:

$$\begin{aligned} \mathrm{cov}\left[ W_i(\eta _i), W_j(\eta _j) \right]= & {} \int _{\mathcal {X}} \left[ f_i(\eta _i,x) - \bar{f}_i(\eta _i)\right] \left[ f_j(\eta _j,x) - \bar{f}_j(\eta _j)\right] ^\top \;P(dx),\nonumber \\&\quad \eta _i\in I_i,\ \eta _j\in I_j,\ i,j=1,\ldots ,k,\nonumber \\ \mathrm{cov}\left[ W_i(\eta _i) , W_{k+1} \right]= & {} \int _{\mathcal {X}} \left[ f_i(\eta _i,x) - \bar{f}_i(\eta _i)\right] \left[ f_{k+1}(x) - \mu _{k+1}\right] ^\top \;P(dx),\nonumber \\&\quad \eta _i\in I_i,\ i=1,\ldots ,k,\nonumber \\ \mathrm{cov}\left[ W_{k+1}, W_{k+1} \right]= & {} \int _{\mathcal {X}} \left[ f_{k+1}(x) - \mu _{k+1}\right] \left[ f_{k+1}(x) - \mu _{k+1}\right] ^\top \;P(dx). \end{aligned}$$
(7)

Proof

We define \(I = I_1\times I_2 \times \cdots \times I_k\), \(M= m_0+m_1+\cdots +m_k\), and the vector-valued function \( f: I \times \mathcal {X}\rightarrow \mathbb {R}^{M} \) with block coordinates \(f_j(\eta _j,x)\), \(j=1,\ldots ,k\), and \(f_{k+1}(x)\). Similarly, we define \(\bar{f}: I \rightarrow \mathbb {R}^{M}\) with block coordinates \(\bar{f}_j(\eta _j)\), \(j=1,\ldots ,k\), and \(\mu _{k+1}\). Consider the empirical estimates of the function \(\bar{f}(\eta )\):

$$\begin{aligned} h^{(n)}(\eta ) = \frac{1}{n}\sum _{i=1}^n f(\eta ,X_i),\quad n =1,2,\ldots . \end{aligned}$$

Due to assumptions (i)–(ii), all functions \(h^{(n)}\) are elements of the space \(\mathcal {H}\).

Furthermore, assumptions (i)–(ii) guarantee that the class of functions \(f(\eta ,\cdot )\), \(\eta \in I\), is Donsker, that is, the following uniform Central Limit Theorem holds (see Van der Vaart 1998, Ex. 19.7):

$$\begin{aligned} \sqrt{n}\big ( h^{(n)} - \bar{f} \big ) \xrightarrow {\mathcal {D}}W, \end{aligned}$$
(8)

where W is a zero-mean Brownian process on I with covariance function

$$\begin{aligned} \mathrm{cov}\left[ W(\eta '), W(\eta '') \right] = \int _{\mathcal {X}} \left[ f(\eta ',x) - \bar{f}(\eta ')\right] \left[ f(\eta '',x) - \bar{f}(\eta '')\right] ^\top \;P(dx). \end{aligned}$$
(9)

This fact will allow us to establish asymptotic properties of the sequence \(\big \{\rho ^{(n)}\big \}\).

First, we define a subset H of \(\mathcal {H}\) containing all elements \((h_1,\ldots ,h_k, h_{k+1})\) for which \(h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1}) \cdots )) \in I_j\), \(j=1,\ldots ,k\). We define an operator \(\varPsi : H \rightarrow \mathbb {R}\) as follows:

$$\begin{aligned} \varPsi (h) = h_1\big (h_2\big (\;\cdots h_k(h_{k+1})\cdots \big )\big ). \end{aligned}$$

By construction the value of \(\rho (X)\) is equal to the value of \(\varPsi \big (\bar{f}\big )\) and the value of \(\rho ^{(n)}\) is equal to the value of \(\varPsi \big (h^{(n)}\big )\).

To derive the limit properties of the sequence \(\big \{ \rho ^{(n)} \big \}\), we shall use the Delta Theorem (see Römisch 2006). The essence of applying the theorem is in identifying conditions under which a statement about convergence in distribution of a scaled version of a statistic \(h^{(n)}\) can be translated into a statement about convergence in distribution of a scaled version of the transformed statistic \(\varPsi (h^{(n)})\).

To this end, we have to verify Hadamard directional differentiability of \(\varPsi (\cdot )\) at \(\bar{f}\).

Observe that the point \(\bar{f}\) is an element of H, because \(\mu _{j+1} \in \text {int}(I_j)\), \(j=1,\ldots ,k\). Moreover, due to assumption (ii), the following inequality is true for every \(j=1,\ldots ,k\):

$$\begin{aligned}&{\Vert h_{j}(h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1}) \cdots ))) - \mu _j \Vert } \\&\quad \le \Vert h_j - \bar{f}_j\Vert + \Vert \bar{f}_{j}(h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1}) \cdots ))) - \bar{f}_j(\mu _{j+1}) \Vert \\&\quad \le \Vert h_j - \bar{f}_j\Vert + \int \gamma _j(x)\;P(dx) \cdot \Vert h_{j+1}(h_{j+2}(\cdots h_k(h_{k+1}) \cdots )) - \mu _{j+1} \Vert . \end{aligned}$$

Recursive application of this inequality demonstrates that \(\bar{f}\) is an interior point of H. Therefore, the quotients appearing in the definition of the Hadamard directional derivative are well defined.

Conditions (ii) and (iii) imply that the functions \(\bar{f}(\cdot )\) and \(h^{(n)}(\cdot )\) are also Hadamard directionally differentiable. Consider the operator \(\varPsi _k(h) = h_k(h_{k+1})\) at \(h \in \text {int}(H)\). Let \(d^\ell =(d_1^\ell ,\ldots ,d_k^\ell ,d_{k+1}^\ell )\in \mathcal {H}\) be a sequence of directions converging in norm to an arbitrary direction \(d \in \mathcal {H}\), when \(\ell \rightarrow \infty \). For a sequence \(t_\ell \downarrow 0\) and \(\ell \) sufficiently large, we have

$$\begin{aligned} \varPsi _k'(h;d)&= \lim _{\ell \rightarrow \infty }\frac{1}{t_\ell }\big [\varPsi _k(h_k+t_\ell d_k^\ell ,h_{k+1}+t_\ell d_{k+1}^\ell ) - \varPsi _k(h_k,h_{k+1})\big ] \\&= \lim _{\ell \rightarrow \infty } \frac{1}{t_\ell }\big ([h_k+t_\ell d_k^\ell ](h_{k+1}+t_\ell d_{k+1}^\ell ) - h_k(h_{k+1})\big )\\&= \lim _{\ell \rightarrow \infty } \Big [\frac{1}{t_\ell }\big (h_k(h_{k+1}+t_\ell d_{k+1}^\ell ) - h_k(h_{k+1}) \big ) + d_{k}^\ell (h_{k+1}+t_\ell d_{k+1}^\ell )\Big ] \\&= h_k'(h_{k+1}; d_{k+1}) + d_{k}(h_{k+1}). \end{aligned}$$

Consider now the operator \(\varPsi _{k-1}(h) = h_{k-1}\big (h_k(h_{k+1})\big ) = h_{k-1}\big (\varPsi _k(h)\big )\). By the chain rule for Hadamard directional derivatives, we obtain

$$\begin{aligned} \varPsi _{k-1}'(h;d)&= h'_{k-1}\big (\varPsi _k(h);\varPsi _k'(h;d)\big ) + d_{k-1}\big (\varPsi _k(h)\big ). \end{aligned}$$

In this way, we can recursively calculate the Hadamard directional derivatives of the operators \(\varPsi _j(h) = h_j\big (h_{j+1}(\,\cdots h_{k}(h_{k+1})\cdots )\big )\) as follows:

$$\begin{aligned} \varPsi _{j}'(h;d) = h'_{j}\big (\varPsi _{j+1}(h);\varPsi _{j+1}'(h;d)\big ) + d_{j}\big (\varPsi _{j+1}(h)\big ),\quad j=k,k-1,\ldots ,1.\quad \end{aligned}$$
(10)

Now the Delta Theorem (see Römisch 2006), relation (8), and the Hadamard directional differentiability of \(\varPsi (\cdot )\) at \(\bar{f}\) imply that

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho (X) \big ] = \sqrt{n}\big [ \varPsi \big (h^{(n)}\big ) - \varPsi \big (\bar{f}\big ) \big ] \xrightarrow {\mathcal {D}}\varPsi '\big ( \bar{f}, W \big ). \end{aligned}$$
(11)

The application of the recursive procedure (10) at \(h=\bar{f}\) and \(d=W\) leads to formulae (6). The covariance structure (7) of W follows directly from (9). \(\square \)

We return to Example 1 and apply Theorem 1.

Example 3

(Semideviations continued) We have defined the mappings

$$\begin{aligned} \bar{f}_1(\eta _1)&= \mathbb {E}[X] + \kappa \eta _1^{\frac{1}{p}} = \int f_1(\eta _1,x) P(\mathrm{d}x),\\ \bar{f}_2(\eta _2)&= \mathbb {E}\big \{ \big [\max \{0,X- \eta _2\}\big ]^p \big \}, \end{aligned}$$

and the constants

$$\begin{aligned} \mu _3&= \mathbb {E}[X],\quad \mu _2 = \mathbb {E}\big \{ \big [\max \{0,X- \mathbb {E}[X]\}\big ]^p \big \},\quad \mu _1=\rho (X). \end{aligned}$$

We assume that \(p>1\) and \(I_2\subset \mathbb {R}\) is a compact interval containing the support of the random variable X. The interval \(I_1=[0,a]\subset \mathbb {R}\) can be defined by choosing a so that \(a\ge |X - \mathbb {E}[X]|^p\) almost surely; for example, a may be equal to the diameter of the support of X raised to the power p. The space \(\mathcal {H}\) is \(\mathcal {C}_1(I_1)\times \mathcal {C}_{1}(I_2)\times \mathbb {R}\) and we take a direction \(d\in \mathcal {H}\). Following (6), we calculate

$$\begin{aligned} \xi _2(d)&= \bar{f}_2'(\mu _{3}; d_{3}) + d_{2}(\mu _{3}) = - p \mathbb {E}\big \{ \big [\max \{0,X- \mu _3\}\big ]^{p-1} \big \}d_3 + d_2(\mu _3),\\ \xi _1(d)&= \bar{f}'_{1}\big (\mu _2;\xi _{2}(d)\big ) + d_{1}\big (\mu _2\big ) = \frac{\kappa }{p}\mu _2^{\frac{1}{p} - 1}\xi _{2}(d) + d_{1}\big (\mu _2\big ). \end{aligned}$$

We obtain the expression

$$\begin{aligned} \xi _1(W)= & {} W_1\left( \mathbb {E}\left\{ \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p \right\} \right) + \frac{\kappa }{p}\left( \mathbb {E}\left\{ \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p \right\} \right) ^{\frac{1-p}{p}} \nonumber \\&\times \left( W_2\left( \mathbb {E}[X]\right) - p \mathbb {E}\left\{ \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^{p-1} \right\} W_3\right) . \end{aligned}$$
(12)

The covariance structure of the process W can be determined from (7). The process \(W_1(\cdot )\) has the constant covariance function:

$$\begin{aligned} \text {cov}\left[ W_1(\eta '),W_1(\eta '')\right] = \int _{\mathcal {X}} \left[ f_1(\eta ',x) - \bar{f}_1(\eta ')\right] \left[ f_1(\eta '',x) - \bar{f}_1(\eta '')\right] P(\mathrm{d}x) = \text {Var}[X]. \end{aligned}$$

It follows that \(W_1(\cdot )\) has constant paths. The third coordinate, \(W_3\), has variance equal to \(\text {Var}[X]\). It also follows from (7) that \(\text {cov}\big [W_1(\eta ),W_3\big ] = \text {Var}[X]\). Therefore, \(W_1\) and \(W_3\) are, in fact, one normal random variable, which we denote by \(V_1\).

Observe that (12) involves only the value of the process \(W_2\) at \(\mu _3=\mathbb {E}[X]\). The variance of the random variable \(V_2=W_2(\mathbb {E}[X])\) and its covariance with \(V_1\) can be calculated from (7) in a similar way:

$$\begin{aligned} \text {Var}[V_2]&= \mathbb {E}\left\{ \left( \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p - \mathbb {E}\left( \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p\right) \right) ^2 \right\} ,\\ \text {cov}[V_2,V_1]&= \mathbb {E}\left\{ \left( \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p - \mathbb {E}\left( \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p\right) \right) \left( X-\mathbb {E}[X] \right) \right\} . \end{aligned}$$

Formula (12) becomes

$$\begin{aligned} \xi _1(W)= & {} V_1 + \frac{\kappa }{p}\left( \mathbb {E}\left\{ \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^p \right\} \right) ^{\frac{1-p}{p}} \nonumber \\&\times \left( V_2 - p \mathbb {E}\left\{ \left[ \max \{0,X- \mathbb {E}[X]\}\right] ^{p-1} \right\} V_1\right) . \end{aligned}$$
(13)

We conclude that

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\mathcal {N}(0,\sigma ^2), \end{aligned}$$

where the variance \(\sigma ^2\) can be calculated in a routine way as a variance of the right-hand side of (13), by substituting the expressions for variances and covariances of \(W_1\), \(W_2\), and \(W_3\). \(\blacktriangle \)
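For illustration, the limiting variance can itself be estimated from the sample by evaluating the linearization (13) pathwise; a brief sketch under the conventions of Example 1 (function name and parameter values are ours):

```python
import numpy as np

def semidev_sigma2(sample, kappa=0.5, p=2.0):
    # Sample version of sigma^2 = Var[xi_1(W)] from (13), using
    # xi_1(W) = (1 - a*b) * V1 + a * V2 with V1 ~ X - E[X],
    # V2 ~ max(0, X - mu3)^p - mu2, a = (kappa/p) * mu2^((1-p)/p),
    # and b = p * E[max(0, X - mu3)^(p-1)].
    x = np.asarray(sample, dtype=float)
    mu3 = x.mean()
    g = np.maximum(0.0, x - mu3) ** p
    mu2 = g.mean()
    a = (kappa / p) * mu2 ** ((1.0 - p) / p)
    b = p * np.mean(np.maximum(0.0, x - mu3) ** (p - 1.0))
    lin = (1.0 - a * b) * (x - mu3) + a * (g - mu2)
    return np.mean(lin ** 2)
```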

Remark 1

Following Example 3, we could derive the limiting distribution of \(\sqrt{n}\big [ \rho ^{(n)} - \rho \big ]\) for \(p=1\) as well. However, the risk measure for \(p=1\) has an equivalent min–max formulation, for which a Central Limit Theorem has already been derived in the literature (see Shapiro 2008; Shapiro et al. 2009, Section 6.5).

3 Estimation of risk measures representable as optimal values of composite functionals

As an extension of the methods of Sect. 2, we consider the following general setting. Functions \(f_1:\mathbb {R}^d\times \mathbb {R}^s\rightarrow \mathbb {R}\), \(f_2:\mathbb {R}^d\times \mathbb {R}^m\rightarrow \mathbb {R}^s\), and a random vector X in \(\mathbb {R}^m\) are given. Our intention is to estimate the value of a composite risk functional

$$\begin{aligned} \varrho = \min _{z\in Z}f_1\left( z,\mathbb {E}[f_2(z,X)]\right) , \end{aligned}$$
(14)

where \(Z\subset \mathbb {R}^d\) is a nonempty compact set.

We note that the compactness restriction is made for technical convenience and can be relaxed.

Let \(X_1,\ldots ,X_n\) be an iid random sample from the probability distribution P of X. We construct the empirical estimate

$$\begin{aligned} \rho ^{(n)} = \min _{z\in Z}f_1\left( z, \textstyle {\frac{1}{n}\sum \limits _{i=1}^n f_2(z,X_i)}\right) . \end{aligned}$$

Our intention is to analyze the asymptotic behavior of \(\rho ^{(n)}\), as \(n\rightarrow \infty \).

Following the method of Sect. 2, we define the mapping \(\varPhi :Z\times \mathcal {C}(Z)\rightarrow \mathbb {R}\), where \(\mathcal {C}(Z)\) denotes the space of continuous functions on Z with values in \(\mathbb {R}^s\), as follows:

$$\begin{aligned} \varPhi (z,h) = f_1\left( z,h(z)\right) . \end{aligned}$$

The space \(\mathbb {R}^d\times \mathcal {C}(Z)\) is equipped with the product norm of the Euclidean norm on \(\mathbb {R}^d\) and the supremum norm on \(\mathcal {C}(Z)\). We also define the functional \(v:\mathcal {C}(Z)\rightarrow \mathbb {R}\),

$$\begin{aligned} v(h) = \min _{z\in Z} \varPhi (z,h). \end{aligned}$$
(15)

Setting

$$\begin{aligned} \bar{h}(z)&= \mathbb {E}[f_2(z,X)],\\ h^{(n)}(z)&= \textstyle {\frac{1}{n}\sum \limits _{i=1}^n f_2(z,X_i)}, \end{aligned}$$

we see that

$$\begin{aligned} \varrho&= v(\bar{h}),\\ \varrho ^{(n)}&= v(h^{(n)}),\quad n=1,2\ldots . \end{aligned}$$

Let \(\hat{Z}\) denote the set of optimal solutions of problem (14).

Theorem 2

In addition to the general assumptions, suppose the following conditions are satisfied:

  1. (i)

    The function \(f_2(z,\cdot )\) is measurable for all \(z\in Z\);

  2. (ii)

    The function \(f_1(z,\cdot )\) is differentiable for all \(z\in Z\), and both \(f_1(\cdot ,\cdot )\) and its derivative with respect to the second argument, \(\nabla f_1(\cdot ,\cdot )\), are continuous with respect to both arguments;

  3. (iii)

    An integrable function \(\gamma (\cdot )\) exists such that

    $$\begin{aligned} \Vert f_2(z',x) - f_2(z'',x)\Vert \le \gamma (x)\Vert z' - z''\Vert \end{aligned}$$

    for all \(z',z''\in Z\) and all \(x\in \mathcal {X}\); moreover, \(\int \gamma ^2(x)\;P(dx) <\infty \).

Then

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\min _{z\in \hat{Z}} \left\langle \nabla f_1\left( z,\mathbb {E}[f_2(z,X)]\right) ,W(z)\right\rangle , \end{aligned}$$
(16)

where W(z) is a zero-mean Brownian process on Z with the covariance function

$$\begin{aligned} \mathrm{cov}\big [ W(z'),W(z'')\big ]= & {} \int _{\mathcal {X}} \big ( f_2(z',x) - \mathbb {E}[f_2(z',X)]\big ) \big ( f_2(z'',x)\nonumber \\&-\mathbb {E}[f_2(z'',X)]\big )^\top \;P(dx). \end{aligned}$$
(17)

Proof

Observe that assumptions (i)–(ii) of Theorem 1 are satisfied due to the compactness of the set Z and assumptions (ii)–(iii) of this theorem. Therefore, formula (8) holds:

$$\begin{aligned} \sqrt{n}\big ( h^{(n)} - \bar{h} \big ) \xrightarrow {\mathcal {D}}W. \end{aligned}$$

The limiting process W is a zero-mean Brownian process on Z with covariance function (17).

Furthermore, due to assumption (ii), the function \(\varPhi (\cdot ,h)\) is continuous. As the set Z is compact, problem (15) has a nonempty solution set S(h). By virtue of (Bonnans and Shapiro 2000, Theorem 4.13), the optimal value function \(v(\cdot )\) is Hadamard directionally differentiable at \(\bar{h}\) in every direction d with

$$\begin{aligned} v'(\bar{h};d) = \min _{z\in S(\bar{h})} \varPhi _h'(z,\bar{h})d, \end{aligned}$$

where \(\varPhi _h'(z,h)\) is the Fréchet derivative of \(\varPhi (z,\cdot )\) at h. Therefore, we can apply the delta method (see Römisch 2006) to infer that

$$\begin{aligned} \sqrt{n}\big (v(h^{(n)}) -v(\bar{h})\big )\xrightarrow {\mathcal {D}}\min _{z\in S(\bar{h})} \varPhi _h'(z,\bar{h})W. \end{aligned}$$

Substituting the functional form of \(\varPhi \), we obtain

$$\begin{aligned} \varPhi _h'(z,\bar{h}) = \nabla f_1\big (z,\mathbb {E}[f_2(z,X)]\big )\delta _z, \end{aligned}$$

where \(\delta _z\) is the Dirac measure at z. Application of this operator to the process W yields formula (16). Observe that \(W(\cdot )\) has continuous paths and the minimum exists. \(\square \)

Corollary 1

If, in addition to conditions of Theorem 2, the set \(\hat{Z}\) contains only one element \(\hat{z}\), then the following central limit formula holds:

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\big \langle \nabla f_1\big (\hat{z},\mathbb {E}[f_2(\hat{z},X)]\big ),W(\hat{z})\big \rangle , \end{aligned}$$
(18)

where \(W(\hat{z})\) is a zero-mean normal vector with the covariance

$$\begin{aligned} \mathrm{cov}\big [ W(\hat{z}),W(\hat{z})\big ] = \mathrm{cov}\big [ f_2(\hat{z},X), f_2(\hat{z},X)\big ]. \end{aligned}$$

The following examples show that two notable categories of risk measures fall into the structure (14).

Example 4

(Average Value at Risk) Average Value at Risk (2) is one of the most popular and most basic coherent measures of risk. Recall that for a random variable X, it is representable as follows:

$$\begin{aligned} \mathrm{AVaR}_{\alpha }(X) = \min _{z\in \mathbb {R}} \left\{ z + \frac{1}{\alpha } \mathbb {E}[(X- z)_+]\right\} . \end{aligned}$$

This measure fits in the structure (14) by setting

$$\begin{aligned} f_1(z,\eta )&= z+ \frac{1}{\alpha }\eta \\ f_2(z,X)&= \max (0, X-z). \end{aligned}$$

The plug-in empirical estimators of (2) have the following form

$$\begin{aligned} \rho ^{(n)} = \min _{z\in \mathbb {R}} \left\{ z + \frac{1}{\alpha n} \sum _{i=1}^n \big (\max (0, X_i- z)\big ) \right\} . \end{aligned}$$

If the support of the distribution of X is bounded, then so is the support of all empirical distributions, and we can assume that the set Z contains the support of the distribution. Observe that all assumptions of Theorem 2 are satisfied. If the equation \(F_X(z)=1-\alpha \) has a unique solution, then the solution of the optimization problem on the right-hand side of (2) is unique. In that case, the assumptions of Corollary 1 are also satisfied. We conclude that

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\frac{1}{\alpha } W, \end{aligned}$$

where W is a normal random variable with zero mean and variance

$$\begin{aligned} \text {Var}[W] = \mathbb {E}\Big [ \Big ( \max (0, X- \hat{z}) - \mathbb {E}\big [ \max (0, X- \hat{z}) \big ] \Big )^2 \Big ]. \end{aligned}$$

We note that the assumption of the boundedness of the support of the random variable X is not restrictive, because we could take a sufficiently large set Z, which would contain the corresponding quantile of the distribution function of X and all empirical quantiles for sufficiently large sample sizes.

Additionally, we refer to another method for estimating the Average Value at Risk at all levels simultaneously, which was discussed in Dentcheva and Penev (2010), where central limit formulae under a different set of assumptions were also established. \(\blacktriangle \)
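A numerical sketch of this estimator and of the resulting standard error (our own illustration; the level \(\alpha \) is an arbitrary choice):

```python
import numpy as np

def avar_plugin(sample, alpha=0.05):
    # Empirical AVaR: a minimizer of the sample version of (2) may be taken
    # as the empirical (1 - alpha)-quantile z_hat.
    x = np.asarray(sample, dtype=float)
    z_hat = np.quantile(x, 1.0 - alpha)
    excess = np.maximum(0.0, x - z_hat)
    rho_n = z_hat + excess.mean() / alpha
    # CLT-based standard error from the limit above:
    # sqrt(n) * (rho_n - rho) -> N(0, Var[max(0, X - z_hat)] / alpha^2).
    se = excess.std(ddof=1) / (alpha * np.sqrt(x.size))
    return rho_n, se
```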

Example 5

(Higher order Inverse Risk Measures) Consider a higher order inverse risk measure (4) with \(c=\frac{1}{\alpha }>1\):

$$\begin{aligned} \rho [X] = \min _{z\in \mathbb {R}} \Big \{ z + c \big \Vert \max (0, X- z)\big \Vert _p \Big \}, \end{aligned}$$
(19)

where \(p>1\) and \(\Vert \cdot \Vert _p\) is the norm in the \(\mathcal {L}^p\) space. We define:

$$\begin{aligned} f_1(z,y)&= z + c y^{\frac{1}{p}},\\ f_2(z,x)&= \big (\max (0, x- z)\big )^p. \end{aligned}$$

If the support of the distribution of X is bounded, so is the support of all empirical distributions. In this case, we can find a bounded set Z (albeit larger than the support of X) such that all solutions of problems (19) belong to this set. For \(p>1\) and \(c>1\) problem (19) has a unique solution, which we denote by \(\hat{z}\).

The plug-in empirical estimators of (19) have the following form

$$\begin{aligned} \rho ^{(n)} = \min _{z\in \mathbb {R}} \left\{ z + c \left( \frac{1}{n} \sum _{i=1}^n \left( \max (0, X_i- z)\right) ^p\right) ^{\frac{1}{p}} \right\} . \end{aligned}$$
(20)

Observe that all assumptions of Theorem 2 and Corollary 1 are satisfied. We conclude that

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\frac{c}{p} \Big (\mathbb {E}\big [\big (\max (0,X-\hat{z})\big )^p\big ]\Big )^\frac{1-p}{p} W, \end{aligned}$$
(21)

where W is a normal random variable with zero mean and variance

$$\begin{aligned} \text {Var}[W] = \mathbb {E}\Big [ \Big ( \big (\max (0, X- \hat{z})\big )^p - \mathbb {E}\big [ \big (\max (0, X- \hat{z})\big )^p\big ] \Big )^2 \Big ]. \end{aligned}$$

\(\blacktriangle \)
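A sketch of the one-dimensional minimization behind (20) (our own illustration; bracketing the minimizer by the sample range is a simplifying assumption, adequate for the large values of c used later in Sect. 5):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def higher_order_risk(sample, c=20.0, p=2.0):
    # Plug-in estimator (20): a convex one-dimensional problem in z.
    x = np.asarray(sample, dtype=float)

    def objective(z):
        return z + c * np.mean(np.maximum(0.0, x - z) ** p) ** (1.0 / p)

    res = minimize_scalar(objective, bounds=(x.min(), x.max()), method="bounded")
    return res.fun, res.x  # (rho_n, z_hat)
```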

4 Estimation of optimized composite risk functionals

In this section, we are concerned with optimization problems in which the objective function is a composite risk functional. Our goal is to establish a central limit formula for the optimal value of such problems.

Our methods allow for the analysis of more complicated structures of optimized risk functionals:

$$\begin{aligned} \varrho =\min _{u\in U}\mathbb {E}\Big [f_1\Big (u,\mathbb {E}\big [f_2\big (u,\mathbb {E}[\cdots f_k(u,\mathbb {E}[f_{k+1}(u,X)],X)]\cdots ,X\big )\big ],X\Big )\Big ]. \end{aligned}$$
(22)

Here X is an m-dimensional random vector, \(f_j:U\times \mathbb {R}^{m_j} \times \mathbb {R}^m \rightarrow \mathbb {R}^{m_{j-1}}\), \(j=1,\ldots ,k\), with \(m_0=1\) and \(f_{k+1}:U\times \mathbb {R}^m\rightarrow \mathbb {R}^{m_k}\). We assume that U is a compact set in a finite-dimensional space and the optimal solution \(\hat{u}\) of this problem is unique.

We define the functions:

$$\begin{aligned} \bar{f}_j(u,\eta _j)&= \int _{\mathcal {X}} f_j({u},\eta _j,x)\,P(\mathrm{d}x),\quad j=1,\ldots ,k, \\ \bar{f}_{k+1}(u)&= \int _{\mathcal {X}} f_{k+1}({u},x)\,P(\mathrm{d}x),\\ \end{aligned}$$

and the quantities

$$\begin{aligned} \mu _{k+1}&= \bar{f}_{k+1}(\hat{u}),\\ \mu _j&= \bar{f}_{j}(\hat{u},\mu _{j+1}),\quad j=1,\ldots ,k. \end{aligned}$$

We assume that compact sets \(I_1,\ldots ,I_{k}\) are selected so that \(\text {int}(I_k) \supset \bar{f}_{k+1}(U)\), and \(\text {int}(I_j)\supset \bar{f}_{j+1}(U,I_{j+1})\), \(j=1,\ldots ,k-1\). Let us define the space

$$\begin{aligned} \mathcal {H}=\mathcal {C}_1^{(0,1)}(U\times I_1)\times \mathcal {C}_{m_1}^{(0,1)}(U\times I_2)\times \cdots \times \mathcal {C}_{m_{k-1}}^{(0,1)}(U\times I_k)\times \mathcal {C}_{m_{k}}(U), \end{aligned}$$

where \(\mathcal {C}_{m_{j-1}}^{(0,1)}(U\times I_j)\) is the space of \(\mathbb {R}^{m_{j-1}}\)-valued continuous functions on \(U\times I_j\), which are differentiable with respect to the second argument with continuous derivatives on \(U\times I_j\). We denote the Jacobian of \(f_j({u},\eta _j,x)\) with respect to the second argument at \(\eta _j^{*}\in I_j\) by \(f'_j(u,\eta _j^{*},x)\). For every direction \(d\in \mathcal {H}\), we define recursively the sequence of vectors:

$$\begin{aligned} \xi _{k+1}(d)= & {} d_{k+1},\nonumber \\ \xi _{j}(d)= & {} \int _{\mathcal {X}} f'_{j}(\hat{u},\mu _{j+1},x)\xi _{j+1}(d)\,P(\mathrm{d}x) + d_{j}(\mu _{j+1}),\quad j=k,k-1,\ldots ,1.\nonumber \\ \end{aligned}$$
(23)

The empirical estimator is

$$\begin{aligned} \varrho ^{(n)}=\min _{u\in U}\frac{1}{n}\sum _{i_0=1}^n f_1\bigg (u,\frac{1}{n}\sum _{i_1=1}^n f_2\bigg (u,\frac{1}{n}\sum _{i_2=1}^n \Big [\,\cdots f_k\bigg (u,\frac{1}{n}\sum _{i_k=1}^n f_{k+1}(u,X_{i_k}),X_{i_{k-1}}\bigg )\Big ]\cdots ,X_{i_1}\bigg ),X_{i_0}\bigg ). \end{aligned}$$

We establish the following result.

Theorem 3

Suppose the following conditions are satisfied:

  1. (i)

    \(\int _\mathcal {X}\Vert f_j(u,\eta _j,x)\Vert ^2 \;P(dx)<\infty \) for all \(\eta _j\in I_j\), \(u\in U\), \(j=1,\ldots ,k\), and \(\int _\mathcal {X}\Vert f_{k+1}(u,x)\Vert ^2 P(dx)<\infty \) for all \(u\in U\);

  2. (ii)

    The functions \(f_j(\cdot ,\cdot ,x)\), \(j=1,\ldots ,k\), and \(f_{k+1}(\cdot ,x)\) are Lipschitz continuous for every \(x\in \mathcal {X}\):

$$\begin{aligned} \Vert f_j(u',\eta _j',x)- f_j(u'',\eta _j'',x)\Vert&\le \gamma _j(x)\big (\Vert u'-u''\Vert + \Vert \eta _j'-\eta _j''\Vert \big ),\quad j=1,\ldots ,k,\\ \Vert f_{k+1}(u',x)- f_{k+1}(u'',x)\Vert&\le \gamma _{k+1}(x)\Vert u'-u''\Vert , \end{aligned}$$

    for all \(\eta _j',\eta _j''\in I_j\), \(u',u''\in U\); moreover, \(\int \gamma _j^2(x)\;P(dx) <\infty \), \(j=1,\ldots ,k+1\);

  3. (iii)

    The functions \(f_j(u,\cdot ,x)\), \(j=1,\ldots ,k\), are continuously differentiable for every \(x\in \mathcal {X}\), \(u\in U\); moreover, their derivatives are continuous with respect to the first two arguments.

Then

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\xi _1(W), \end{aligned}$$

where \(W(\cdot ) = \big ( W_1(\cdot ), \ldots , W_{k}(\cdot ), W_{k+1}\big ) \) is a zero-mean Brownian process on \(I = I_1\times I_2 \times \cdots \times I_k\). Here \(W_j(\cdot )\) is a Brownian process of dimension \(m_{j-1}\) on \(I_j\), \(j=1,\ldots ,k\), and \(W_{k+1}\) is an \(m_k\)-dimensional normal vector. The covariance function of \(W(\cdot )\) has the following form

$$\begin{aligned}&\mathrm{cov}\big [ W_i(\eta _i), W_j(\eta _j) \big ]\nonumber \\&\quad = \int _{\mathcal {X}} \big [ f_i(\hat{u},\eta _i,x) - \bar{f}_i(\hat{u},\eta _i)\big ] \big [ f_j(\hat{u},\eta _j,x) - \bar{f}_j(\hat{u},\eta _j)\big ]^\top \;P(dx),\nonumber \\&\qquad \eta _i\in I_i,\ \eta _j\in I_j,\ i,j=1,\ldots ,k \nonumber \\&\mathrm{cov}\big [ W_i(\eta _i), W_{k+1} \big ]\nonumber \\&\quad =\int _{\mathcal {X}} \big [ f_i(\hat{u},\eta _i,x) - \bar{f}_i(\hat{u},\eta _i)\big ] \big [ f_{k+1}(\hat{u},x) - \bar{f}_{k+1}(\hat{u})\big ]^\top \;P(dx),\\&\qquad \eta _i\in I_i,\ i=1,\ldots ,k \nonumber \\&\mathrm{cov}\big [ W_{k+1}, W_{k+1} \big ]\nonumber \\&\quad = \int _{\mathcal {X}} \big [ f_{k+1}(\hat{u},x) - \bar{f}_{k+1}(\hat{u})\big ] \big [ f_{k+1}(\hat{u},x) - \bar{f}_{k+1}(\hat{u})\big ]^\top \;P(dx). \nonumber \end{aligned}$$
(24)

Proof

We follow the main line of argument of the proof of Theorem 1. We define \(M= m_0+m_1+\cdots +m_k\) and the vector-valued function \( f: U\times I \times \mathcal {X}\rightarrow \mathbb {R}^{M} \) with block coordinates \(f_j(u,\eta _j,x)\), \(j=1,\ldots ,k\), and \(f_{k+1}(u,x)\). Similarly, we define \(\bar{f}: U\times I \rightarrow \mathbb {R}^{M}\) with block coordinates \(\bar{f}_j(u,\eta _j)\), \(j=1,\ldots ,k\), and \(\bar{f}_{k+1}(u)\). Consider the empirical estimates of the function \(\bar{f}(u,\eta )\):

$$\begin{aligned} h^{(n)}(u,\eta ) = \frac{1}{n}\sum _{i=1}^n f(u,\eta ,X_i),\quad n =1,2,\ldots . \end{aligned}$$

Due to our assumptions, for sufficiently large n all these functions are elements of the space \(\mathcal {H}\).

Owing to assumptions (i)–(ii), the class of functions \(f(u,\eta ,\cdot )\), \(u\in U\), \(\eta \in I\), is Donsker, that is, the following uniform Central Limit Theorem holds (see Van der Vaart 1998, Ex. 19.7):

$$\begin{aligned} \sqrt{n}\big ( h^{(n)} - \bar{f} \big ) \xrightarrow {\mathcal {D}}W, \end{aligned}$$
(25)

where W is a zero-mean Brownian process on \(U\times I\) with covariance function

$$\begin{aligned}&\mathrm{cov}\big [ W(u',\eta '), W(u'',\eta '')\big ] \nonumber \\&\quad =\int _{\mathcal {X}} \big [ f(u',\eta ',x) - \bar{f}(u',\eta ')\big ] \big [ f(u'',\eta '',x) - \bar{f}(u'',\eta '')\big ]^\top \;P(\mathrm{d}x). \end{aligned}$$
(26)

This fact will allow us to establish asymptotic properties of the sequence \(\big \{\rho ^{(n)}\big \}\). We define an operator \(\varPsi : U\times \mathcal {H}\rightarrow \mathbb {R}\) as follows:

$$\begin{aligned} \varPsi (u,h) = h_1\Big (u,h_2\big (u,\cdots h_k(u,h_{k+1}(u))\cdots \big )\Big ). \end{aligned}$$

By definition,

$$\begin{aligned} \rho (X)&= \min _{u\in U}\varPsi \big (u,\bar{f}\big ),\\ \rho ^{(n)}&= \min _{u\in U}\varPsi \big (u,h^{(n)}\big ). \end{aligned}$$

To apply the Delta Theorem to the sequence \(\big \{ \rho ^{(n)} \big \}\), we have to verify Hadamard directional differentiability of the optimal value function \(v(\cdot ) = \min _{u\in U}\varPsi (u,\cdot )\) at \(\bar{f}\). Observe that our assumptions imply that the conditions of (Bonnans and Shapiro 2000, Thm. 4.13) are satisfied. As the optimal solution set is a singleton, the function \(v(\cdot )\) is differentiable at \(\bar{f}\) with the Fréchet derivative

$$\begin{aligned} v'(\bar{f}) = \varPsi '(\hat{u},\bar{f}), \end{aligned}$$

where \(\varPsi '(u,f)\) is the Fréchet derivative of \(\varPsi (u,\cdot )\) at f. The remaining derivations are identical to those in the proof of Theorem 1. We only need to substitute \(\hat{u}\) as an additional argument of all functions involved. \(\square \)

Example 6

(Optimization problems with mean–semideviation) Consider now an optimization problem involving a mean–semideviation measure of risk

$$\begin{aligned} \min _{u\in U} \rho [\varphi (u,X)] = \mathbb {E}[\varphi (u,X)] + \kappa \Big (\mathbb {E}\big [ \big (\varphi (u,X) - \mathbb {E}[\varphi (u,X)]\big )_+^p\big ]\Big )^\frac{1}{p}, \end{aligned}$$
(27)

where \(\varphi :\mathbb {R}^d \times \mathcal {X}\rightarrow \mathbb {R}\). We have

$$\begin{aligned} {f}_1(\eta _1,u,x)&= \kappa \eta _1^{\frac{1}{p}} + \varphi (u,x),\\ {f}_2(\eta _2,u,x)&= \big \{ \big [\max \{0,\varphi (u,x)- \eta _2\}\big ]^p \big \},\\ {f}_3(u,x)&= \varphi (u,x), \end{aligned}$$

and

$$\begin{aligned} \bar{f}_1(\eta _1,u)&= \kappa \eta _1^{\frac{1}{p}} + \mathbb {E}[\varphi (u,X)],\\ \bar{f}_2(\eta _2,u)&= \mathbb {E}\big \{ \big [\max \{0,\varphi (u,X)- \eta _2\}\big ]^p \big \},\\ \bar{f}_3(u)&= \mathbb {E}[\varphi (u,X)]. \end{aligned}$$

We assume that \(p>1\). Suppose \(\hat{u}\) is the unique solution of problem (27). We set \(\mu _3 = \mathbb {E}[\varphi (\hat{u},X)]\). Then \(\mu _2 = \mathbb {E}\big \{ \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p \big \}\) and \(\mu _1=\varrho \), the optimal value of (27). Following (23), we calculate

$$\begin{aligned} \xi _2(d) = \bar{f}_2'(\mu _{3},\hat{u}; d_{3}) + d_{2}(\mu _{3}) = - p \mathbb {E}\big \{ \big [\max \{0,\varphi (\hat{u},X)- \mu _3\}\big ]^{p-1} \big \}d_3 + d_2(\mu _3), \end{aligned}$$
$$\begin{aligned} \xi _1(d) = \bar{f}'_{1}\big (\mu _2,\hat{u};\xi _{2}(d)\big ) + d_{1}\big (\mu _2\big ) = \frac{\kappa }{p}\mu _2^{\frac{1}{p} - 1}\xi _{2}(d) + d_{1}\big (\mu _2\big ). \end{aligned}$$

We obtain the expression

$$\begin{aligned}&\varPsi _1'(\bar{f};W) = W_1\big ( \mathbb {E}\big \{ \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p \big \} \big ) \nonumber \\&\quad +\frac{\kappa }{p}\Big ( \mathbb {E}\big \{ \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p \big \} \Big )^{\frac{1-p}{p}} \nonumber \\&\quad \times \Big (W_2\big (\mathbb {E}[\varphi (\hat{u},X)]\big )-p \mathbb {E}\big \{ \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^{p-1} \big \}W_3\Big ).\qquad \end{aligned}$$
(28)

The covariance structure of the process W can be determined from (26), similar to Example 3. The process \(W_1(\cdot )\) has the constant covariance function:

$$\begin{aligned} \text {cov}\big [W_1(\eta '),W_1(\eta '')\big ] = \text {Var}[\varphi (\hat{u},X)],\quad \eta ',\eta ''\in I_1. \end{aligned}$$

The third coordinate, \(W_3\), has variance equal to \(\text {Var}[\varphi (\hat{u},X)]\). In addition,

$$\begin{aligned} \text {cov}\big [W_1(\eta ),W_3\big ] = \text {Var}[\varphi (\hat{u},X)],\quad \eta \in I_1, \end{aligned}$$

and thus \(W_1\) and \(W_3\) have the same normal distribution and are perfectly correlated.

The variance of \(W_2(\cdot )\) at \(\mathbb {E}[\varphi (\hat{u},X)]\) and its covariance with \(W_1\) (and \(W_3\)) can be calculated in a similar way:

$$\begin{aligned} \text {Var}\big [W_2(\mathbb {E}[\varphi (\hat{u},X)])\big ]&= \mathbb {E}\Big \{\Big ( \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p \\&\quad -\mathbb {E}\big (\big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p\big ) \Big )^2\Big \},\\ \text {cov}\big [W_2(\mathbb {E}[\varphi (\hat{u},X)]),W_3\big ]&= \mathbb {E}\Big \{\Big ( \big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p \\&\quad -\mathbb {E}\big (\big [\max \{0,\varphi (\hat{u},X)- \mathbb {E}[\varphi (\hat{u},X)]\}\big ]^p\big ) \Big ) \Big ( \varphi (\hat{u},X)-\mathbb {E}[\varphi (\hat{u},X)] \Big )\Big \}. \end{aligned}$$

We conclude that

$$\begin{aligned} \sqrt{n}\big [ \rho ^{(n)} - \rho \big ] \xrightarrow {\mathcal {D}}\mathcal {N}(0,\sigma ^2), \end{aligned}$$

where the variance \(\sigma ^2\) can be calculated in a routine way as a variance of the right-hand side of (28), by substituting the expressions for variances and covariances of \(W_1\), \(W_2\), and \(W_3\). \(\blacktriangle \)
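A sketch of an empirical version of problem (27) (entirely our own illustration: the choice \(\varphi (u,x)=-\langle u,x\rangle \), the simplex constraint set U, the data, and all parameter values are assumptions made for the example):

```python
import numpy as np
from scipy.optimize import minimize

def msd(y, kappa=0.5, p=2.0):
    # Empirical mean-semideviation of the loss sample y.
    m = y.mean()
    return m + kappa * np.mean(np.maximum(0.0, y - m) ** p) ** (1.0 / p)

def optimized_msd(X, kappa=0.5, p=2.0):
    # Empirical version of (27) with phi(u, x) = -<u, x> (portfolio losses)
    # and U the probability simplex.
    n, d = X.shape
    res = minimize(lambda u: msd(-X @ u, kappa, p),
                   x0=np.full(d, 1.0 / d),
                   bounds=[(0.0, 1.0)] * d,
                   constraints=[{"type": "eq", "fun": lambda u: u.sum() - 1.0}])
    return res.x, res.fun  # (u_hat, rho_n)

rng = np.random.default_rng(0)
X = rng.normal(0.05, 0.2, size=(2000, 4))   # hypothetical asset returns
print(optimized_msd(X))
```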

5 A simulation study

In this section, we illustrate the convergence of some estimators discussed in this paper to the limiting normal distribution. Many previously known results for the case \(p=1\) have been investigated thoroughly in the literature (see, e.g., Stoyanov et al. 2010) and we will not dwell upon these here. We will only illustrate the case of higher order inverse risk measures, as discussed in Example 5, for \(p>1\). More specifically, we take independent identically distributed observations \(X_i, i=1,2,\ldots , n\), where \(X\sim \mathcal {N}(10,3)\). We take \(\epsilon =0.05\) (playing the role of \(\alpha \)) and \(p=2\); in that case \(c=20\). Numerical calculation in Matlab delivers the minimizer \(z^*=14.5048\) and the value of the risk measure in (19), \(\rho [X]=15.5163\). The standard deviation of the random variable on the right-hand side of (21) is 16.032. The plug-in estimator \(\rho ^{(n)}\) of this risk can be represented as a solution of a convex optimization problem with convex constraints, and hence a unique solution can be found by any package that solves such problems. We have used the cvx package operated within Matlab. Denoting \(d_i=\max (X_i-z,0), i=1,2,\ldots ,n\), and putting all \(d_i, i=1,2,\ldots ,n\), in a vector \(\mathbf d \), we can rewrite our optimization problem as follows:

$$\begin{aligned} \min _{z\in \mathbb {R},\,\mathbf d \in \mathbb {R}^n}\; z + c \left( \frac{1}{n} \sum _{i=1}^n d_i^p\right) ^{\frac{1}{p}} \quad \text {subject to}\quad d_i \ge X_i - z,\quad d_i\ge 0,\quad i=1,\ldots ,n. \end{aligned}$$
(29)

The numerical solution of this optimization problem gives us the estimator \(\rho ^{(n)}\). To get an idea about the speed of convergence to the limiting distribution in (21), we simulate \(m=2500\) risk estimators \(\rho ^{(n)}_j, j=1,2,\ldots ,2500\), for a given sample size n and draw their histogram. The number of bins for the histogram is determined by the rough "square root of the sample size" rule. This histogram is superimposed on the \(\mathcal {N}(15.5163, (16.032/\sqrt{n})^2)\) density. As n increases, our theory suggests that the histogram and the normal density will look more and more similar in shape. Their closeness indicates how quickly the central limit approximation sets in.
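A sketch of one replication of this experiment in Python with cvxpy, in place of the authors' Matlab/cvx code (the seed and the reading of \(\mathcal {N}(10,3)\) as mean 10 and variance 3 are our assumptions):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, p, c = 2000, 2, 20.0
X = rng.normal(10.0, np.sqrt(3.0), size=n)

z = cp.Variable()
d = cp.Variable(n)
# (1/n * sum_i d_i^p)^(1/p) = n^(-1/p) * ||d||_p, so (29) reads:
prob = cp.Problem(cp.Minimize(z + c * cp.norm(d, p) / n ** (1.0 / p)),
                  [d >= X - z, d >= 0])
prob.solve()
print(prob.value, z.value)  # the estimator rho_n and the minimizer z_hat
```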

Fig. 1 Density histogram of the distribution of the estimator \(\rho ^{(n)}\) for increasing values of n and its normal approximation using Theorem 2 and \(X\sim \mathcal {N}(10,3)\)

Fig. 2 Density histogram of the distribution of the estimator \(\rho ^{(n)}\) for different values of p when \(X\sim \mathcal {N}(10,3)\)

Fig. 3 Density histogram of the distribution of the estimator \(\rho ^{(n)}\) for \(n=4000\) and \(X\sim t_{\nu }\) with \(\nu \) being 60, 8, 6 and 4

Figure 1 shows that the central limit approximation is indeed very good and improves significantly with increasing sample size. The small downward bias that appears in Fig. 1a becomes increasingly irrelevant as the sample size grows. We have experimented with different values of p, such as \(p=1, 1.5, 2\), and 2.5, and we have also changed the value of \(\epsilon \) (respectively, \(c=1/\epsilon \)). The tendency shown in Fig. 1 is largely upheld; however, as expected, the standard errors increase when c and/or p is increased. In addition, the limiting normal approximation seems to be more accurate for the same sample sizes when a smaller value of p is used. This effect is illustrated in Fig. 2, where \(p=1\) (i.e., the case of AVaR), \(p=1.5\), \(p=2\) (with a different sample in comparison to the sample in Fig. 1), and \(p=2.5\) were simulated. The remaining quantities have been kept fixed at \(n=2000\) and \(c=20\). We stress that increasing the sample size in Fig. 2d makes the histogram look much more like the limiting normal curve, so the discrepancy observed there is indeed due to the limiting approximation setting in only at larger sample sizes when p is increased.

We also experimented with different distributions for the random variable X. Specifically, we took t distributions with degrees of freedom \(\nu \) equal to 4, 6, 8, and 60, shifted to have the same mean of 10 as the normal simulated data. The results of this comparison for \(p=2, \epsilon =0.05\), and \(n=4000\) are shown in Fig. 3. The variances of the t-distributed variables, being equal to \(\nu /(\nu -2)\), are finite and even smaller than the variance of the normal random variable in Fig. 1. However, the heavier tails of the t distribution adversely affect the quality of the approximation. Although the limiting distribution of the risk estimator is still normal when \(\nu =6\) and \(\nu =8\), the heavy-tailed data cause the normal approximation to be relatively poor even at \(n=4000\). The case \(\nu =60\) is closer to the normal distribution, and hence the approximation works better in this case.

Note that the limiting distribution when \(p=2\) involves the fourth moment of the t distribution; this moment is finite for \(\nu =6, 8\), and 60 but is infinite when \(\nu =4\). As a result, it can be seen from Fig. 3d that the normal approximation collapses in this case. In addition, Fig. 3 shows that attaining a quality of the asymptotic approximation in the Kolmogorov metric similar to that for normally distributed X in Fig. 1c requires much larger samples. For the fixed sample size of 4000, the quality of the normal approximation worsens as \(\nu \) decreases from 60 to 8 and then to 6. Furthermore, and outside the scope of the present paper, we note that if the distribution of X has even heavier tails than the t distribution (for example, if it is in the class of stable distributions with stability parameter in the range (1, 2)), then the limiting distribution of the risk estimator may not be normal at all.

6 Conclusions

Motivated by the need to estimate coherent risk measures, we introduce a general composite functional structure in which many known coherent risk measures can be cast. We establish central limit theorems by reformulating the problems in functional spaces, using the infinite-dimensional delta method and Donsker theory. The applicability of the procedure hinges on verifying smoothness conditions of the related functionals. The potential applicability of our central limit theorems, however, extends beyond functionals representing coherent risk measures. Our short simulation study indicates that the central limit theorem-type approximations are very accurate when the sample size is large, p is within reasonable limits (between 1 and 3), and the tails of the distribution of X are not very heavy. We note that for smaller sample sizes, the technique of concentration inequalities may be more powerful and accurate when evaluating the closeness of the approximation. It is possible to derive concentration inequalities for estimators of statistical functionals with the structure introduced in our paper. This is a subject of ongoing research.