1 Introduction

Although value at risk (VaR) is the most popular risk measure among practitioners, it has been heavily criticized in the theoretical literature because it does not necessarily associate portfolio diversification with risk reduction. Therefore, axiomatically founded risk measures such as coherent (cf. [3] for finite and [10] for infinite spaces) and, more generally, convex (cf. [13, 14]) risk measures have been introduced, whose axioms of sub-additivity (respectively, convexity) directly ensure that diversification reduces the measured risk.

This paper argues that all types of risk measures studied in the literature suffer from one of the following two drawbacks: either, like coherent and convex risk measures, they cannot be defined on the space of all random variables (see the discussion in [10]), or, like VaR, they may be unsuitable for assessing the risk that arises from model uncertainty (see Example 3 below). To address both shortcomings, we introduce and axiomatically characterize the class of natural risk measures. In contrast to the sub-additivity axiom of coherent risk measures, natural risk measures require sub-additivity only for co-monotone random variables. Note that VaR, first, is additive for co-monotone random variables and, second, is well-defined on the space of all random variables. Relaxing sub-additivity in favor of co-monotone sub-additivity therefore amounts to studying a general class of risk measures that contains VaR and is well-defined on large spaces of random variables.

The representation of natural risk measures, provided in Theorem 1, can be regarded as a convexification of the representation of insurance risk premiums in [25].Footnote 1 However, we will see that this convexification is not mathematically straightforward and requires a significant amount of additional work. Whereas [25] used results attributed to Greco in [11] (see [15], in Italian) to provide the dual representation of insurance risk premiums, we use Daniell integrals to extend the representation of natural risk statistics (see [1, 17]) from finite probability spaces to the set of all bounded-below random variables. In addition, our representation in Theorem 1 sheds some light on the axiomatic foundations of the VaR criterion versus the representation in [18], and provides further tools to develop theories of risk measures and risk premiums on large spaces.

The paper is organized as follows: in Sect. 2, we provide some preliminary mathematical definitions and introduce natural risk measures and the notion of weak continuity. In Sect. 3, we present examples of natural risk measures that differ from VaR and that, in the presence of model uncertainty, are co-monotone sub-additive but not necessarily co-monotone additive. In Sect. 4, we state and prove our main result, Theorem 1, which gives a dual representation of weakly continuous natural risk measures.

2 Preliminaries and definitions

In this section, we introduce the preliminary mathematical tools and definitions, together with the economic and financial concepts, that we will use in our discussion.

2.1 Mathematical framework

Let \((\Omega ,\mathcal {F},P)\) be an atomless probability space, where \(\Omega \) represents the “states of the nature”, \(\mathcal {F}\) is the sigma-field of all measurable sets, and P is the physical probability measure. In this study, we consider that \(L^{0}\), the set of all measurable functions or random variables on \((\Omega ,\mathcal {F},P)\), represents the set of individual loss variables.Footnote 2 Let us also denote the set of all bounded-below random variables by \(L_{B}^{0}\). The space \(L^{0}\) is a metric space whose metric is defined as \(d\left( X,Y\right) =E\left( \min \left\{ \left| X-Y\right| ,1\right\} \right) \), where E denotes the expectation. Convergence in this topology is equivalent to convergence in probability, i.e., \(d\left( X_{n},X\right) \rightarrow 0\) iff \(\forall \epsilon >0,\, P\left( \left| X_{n}-X\right| >\epsilon \right) \rightarrow 0\). The space \(L^{p}\), for \(p>0\), is the space of all random variables with finite pth moment, i.e., \(L^{p}=\left\{ X\in L^{0}\vert E\left( \left| X\right| ^{p}\right) <\infty \right\} \). \(L^{\infty }\) is the set of all almost surely bounded members of \(L^{0}\).

The cumulative distribution function of a random variable \(X\in L^{0}\) is denoted by \(F_{X}.\) For any \(X\in L^{0}\), \(F_{X}\) is a càdlàgFootnote 3 and non-decreasing function, with a left inverse given by \(F_{X}^{-1}(\alpha )=\inf \{x\in \mathbb {R}:F_{X}(x)\ge \alpha \}\), for \(\alpha \in \left( 0,1\right) \), which is also a càdlàg function. If \(X\in L_{B}^{0}\), one can extend the inverse to \(\alpha =0\), i.e., \(F_{X}^{-1}\left( 0\right) =\mathrm {essinf}\left( X\right) \), where \(\mathrm {essinf}\) is the essential infimum.

Two random variables \(X,X'\in L^{0}\) have the same distribution if and only if \(F_{X}=F_{X'}\). Two random variables, X and Y, are co-monotone if

$$\begin{aligned} \left( X(\omega )-X(\omega ')\right) \left( Y(\omega )-Y(\omega ')\right) \ge 0\quad \text {for almost every }\left( \omega ,\omega '\right) \in \Omega \times \Omega . \end{aligned}$$

In this study, we use a version of co-monotonicity due to Denneberg (cf. Proposition 2 in [25]), which says that X and Y are co-monotone if there are two non-decreasing real functions f and g and a random variable U such that \(X=f\left( U\right) \) and \(Y=g\left( U\right) \). Finally, as usual, \(\mathcal {B}\left[ 0,1\right] \) denotes the set of all Borel measurable subsets of \(\left[ 0,1\right] \).

2.2 Natural risk measures

Now we introduce the class of natural risk measures.

Definition 1

A natural risk measureFootnote 4 \(\varrho \) is a mapping from \(L_{B}^{0}\) to \(\mathbb {R}\) that satisfies the following conditions:

  1. Positive homogeneity: \(\varrho (\lambda X)=\lambda \varrho (X)\), \(\forall \lambda >0\) and \(\forall X\in L_{B}^{0}\);

  2. Cash invariance: \(\varrho (X+c)=\varrho (X)+c\), \(\forall X\in L_{B}^{0}\) and \(\forall c\in \mathbb {R}\);

  3. Monotonicity: \(\varrho (X)\le \varrho (Y)\), \(\forall X,Y\in L_{B}^{0}\) with \(X\le Y\);

  4. Co-monotone sub-additivity: \(\varrho (X+Y)\le \varrho (X)+\varrho (Y)\), \(\forall X,Y\in L_{B}^{0}\) such that X and Y are co-monotone;

  5. Law invariance: \(\varrho (X)=\varrho (Y)\) if \(F_{X}=F_{Y}\), i.e., X and Y have the same distribution.
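As a quick numerical illustration of the co-monotone sub-additivity axiom, the following minimal Python sketch uses the empirical left quantile (a stand-in for the VaR of Example 1 below): it shows additivity for a co-monotone pair and a failure of plain sub-additivity for losses hitting in disjoint scenarios. All distributions and numbers here are our own illustrative choices.

```python
import numpy as np

def var(x, alpha):
    """Empirical left quantile F_X^{-1}(alpha) over equally likely scenarios."""
    xs = np.sort(x)
    k = int(np.ceil(alpha * len(xs))) - 1   # smallest index k with F(xs[k]) >= alpha
    return xs[max(k, 0)]

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
X, Y = np.exp(u), u**2                      # non-decreasing in the same u, hence co-monotone

alpha = 0.95
# Co-monotone additivity of the empirical quantile: the two numbers coincide.
print(var(X + Y, alpha), var(X, alpha) + var(Y, alpha))

# Plain sub-additivity can fail: two losses hitting in disjoint scenarios.
A = np.zeros(100); A[0] = 100.0
B = np.zeros(100); B[1] = 100.0
alpha = 0.985
print(var(A + B, alpha), var(A, alpha) + var(B, alpha))   # 100.0 versus 0.0
```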

Let

$$\begin{aligned} \mathbb {F}= & {} \left\{ K:\left[ 0,1\right] \rightarrow \mathbb {R}\cup \left\{ +\infty \right\} \left| \begin{array}{c} K\left( 0\right) \in \mathbb {R}\\ K\left( 1\right) \in \mathbb {R}\cup \left\{ +\infty \right\} \\ \forall \alpha \in \left( 0,1\right) ,K\left( \alpha \right) \in \mathbb {R} \end{array}\right. \right\} \end{aligned}$$

and let \(\mathbb {A}\subseteq \mathbb {F}\) be the subset of non-decreasing càdlàg functions. It is clear that \(\mathbb {A}\) consists of all left inverse cumulative distribution functions with a finite essential infimum. It is known that, for any random variable X and any random variable U with a uniform distribution on \(\left[ 0,1\right] \), X and \(F_{X}^{-1}(U)\) have the same distribution. That is why we can consider a natural risk measure \(\varrho \) as a well-defined function on \(\mathbb {A}\), the set of all inverse cumulative distribution functions. We use this fact later in the proof of our main result, Theorem 1.
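The fact that X and \(F_{X}^{-1}(U)\) share the same distribution (inverse-transform sampling) is easy to check numerically; the following short Python sketch does so for an exponential loss, a distribution chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
U = rng.uniform(size=200_000)
X = stats.expon.ppf(U)                      # F_X^{-1}(U) for an exponential(1) loss

# Empirical quantiles of F_X^{-1}(U) match the exact quantiles of X.
for a in (0.5, 0.9, 0.99):
    print(a, np.quantile(X, a), stats.expon.ppf(a))
```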

Definition 2

The natural risk measure \(\varrho \) is weakly continuous if \(\varrho \left( X_{n}\right) \underset{n\rightarrow \infty }{\longrightarrow }\varrho \left( X\right) \) when \(F_{X_{n}}\left( x\right) \underset{n\rightarrow \infty }{\longrightarrow }F_{X}\left( x\right) ,\forall x\in \mathbb {R}\).

3 Examples of natural risk measures

In this section, we introduce co-monotone additive and sub-additive natural risk measures that naturally emerge in insurance and finance applications.

Example 1

Value at risk, for a fixed tolerance level \(\alpha \in \left[ 0,1\right) \), is introduced as follows:

$$\begin{aligned} \mathrm {VaR}_{\alpha }\left( X\right) =F_{X}^{-1}(\alpha ) ,\quad X\in L_{B}^{0}. \end{aligned}$$

Note that, since natural risk measures are defined on \(L_{B}^{0}\), \(\mathrm {VaR}_{0}\left( X\right) =F_{X}^{-1}(0)=\mathrm {essinf}\left( X\right) \in \left( -\infty ,+\infty \right) \).

Observe that VaR is weakly continuous. To see this, suppose that \(F_{X_{n}}\left( x\right) \underset{n\rightarrow \infty }{\longrightarrow }F_{X}\left( x\right) ,\forall x\in \mathbb {R}\). First, we prove that there is no \(\epsilon >0\) such that \(\mathrm {VaR_{\alpha }}\left( X_{n}\right) \le \mathrm {VaR_{\alpha }}\left( X\right) -\epsilon ,\forall n\). Indeed, if this were the case, then for any y such that \(F_{X}\left( y\right) \ge \alpha \) we would have \(\mathrm {VaR_{\alpha }}\left( X_{n}\right) \le y-\epsilon ,\forall n\). By right continuity and monotonicity of \(F_{X_{n}}\), this implies that \(\alpha \le F_{X_{n}}\left( y-\epsilon \right) \). Letting n tend to infinity gives \(\alpha \le F_{X}\left( y-\epsilon \right) \), which implies that \(\mathrm {VaR}_{\alpha }\left( X\right) =\inf \left\{ x\in \mathbb {R}:F_{X}\left( x\right) \ge \alpha \right\} \le y-\epsilon .\) Taking the infimum over all y such that \(F_{X}\left( y\right) \ge \alpha \), we get \(\mathrm {VaR}_{\alpha }\left( X\right) \le \mathrm {VaR}_{\alpha }\left( X\right) -\epsilon \), which is a contradiction. Now, using the fact that \(\mathrm {VaR}_{\alpha }\left( X\right) =-\mathrm {VaR}_{1-\alpha }\left( -X\right) \), one can show that there is no \(\epsilon >0\) such that \(\mathrm {VaR_{\alpha }}\left( X_{n}\right) \ge \mathrm {VaR_{\alpha }}\left( X\right) +\epsilon ,\forall n\).
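A quick numerical check of this weak continuity, with a hypothetical family of normal distributions whose cumulative distribution functions converge pointwise to that of a standard normal:

```python
import numpy as np
from scipy import stats

alpha = 0.95
# F_{X_n} -> F_X pointwise with X_n ~ N(1/n, (1 + 1/n)^2) and X ~ N(0, 1).
for n in (1, 10, 100, 1000):
    print(n, stats.norm.ppf(alpha, loc=1.0 / n, scale=1.0 + 1.0 / n))
print("limit", stats.norm.ppf(alpha))       # VaR_alpha(X)
```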

Example 2

Consider an insurance company whose total loss for a fiscal year can be represented by a non-negative random variable X. Consider also that the insurance company has to buy a reinsurance contract \(0\le Y\le X\). The premium of Y is simply given by its expectation, i.e., E(Y). Therefore, the insurance company’s global position is \(X-Y+E(Y)\). On the other hand, to avoid the risk of moral hazard, the contract Y should be such that both parties feel any increase in the losses. Therefore, we consider contracts of the form \(Y=f(X)\), where f is a non-decreasing and non-negative function such that \(x\mapsto x-f(x)\) is also non-decreasing and non-negative; see, for example, [4, 8, 9]. Let us denote the set of all such functions f by \(\mathbb {C}\). If the insurance company measures its risk by \(\mathrm {VaR}_{\alpha }\), for some \(\alpha \in \left( 0,1\right) \), the optimal contract \(f^{*}\) is found by solving

$$\begin{aligned} \min _{f\in \mathbb {C}}\mathrm {VaR}_{\alpha }\left( \left( X-f\left( X\right) \right) +E\left( f\left( X\right) \right) \right) . \end{aligned}$$

Therefore, the risk of the global position is given by

$$\begin{aligned} \varrho \left( X\right) =\mathrm {VaR}_{\alpha }\left( X-f^{*}\left( X\right) \right) +E\left( f^{*}\left( X\right) \right) =\min _{f\in \mathbb {C}}\mathrm {VaR}_{\alpha }\left( \left( X-f\left( X\right) \right) +E\left( f\left( X\right) \right) \right) . \end{aligned}$$

Following the discussions in [4, 6], one can easily see that

$$\begin{aligned} \varrho \left( X\right) =\min _{f\in \mathbb {C}}\mathrm {VaR}_{\alpha }\left( X-f\left( X\right) \right) +E\left( f\left( X\right) \right) =\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) , \end{aligned}$$

where

$$\begin{aligned} \lambda \left( x\right) =\max \left\{ 1_{\left[ \alpha ,1\right] }\left( x\right) ,x\right\} ={\left\{ \begin{array}{ll} 1, &{}\alpha \le x\le 1\\ x, &{}0\le x<\alpha \end{array}\right. }. \end{aligned}$$

Therefore, to assess the risk of the insurance company’s global position, one needs to define a new risk measure as

$$\begin{aligned} \varrho \left( X\right) =\left( 1-\alpha \right) \mathrm {VaR}_{\alpha }\left( X\right) +\int _{0}^{\alpha }\mathrm {VaR}_{t}\left( X\right) dt, \end{aligned}$$

which is different from \(\mathrm {VaR}_{\alpha }\). Note that this risk measure can be defined on the set of all bounded-below random variables.

With a slight abuse of notation, one can consider \(\lambda \) as a measure on \(\left[ 0,1\right] \), defined by \(\lambda \left( a,b\right] =\lambda \left( b\right) -\lambda \left( a\right) \). Observe that the support of \(\lambda \) as a measure is \(\left[ 0,\alpha \right) \). According to Theorem 1 in Sect. 4, this condition implies the weak continuity of the risk measure \(\varrho \).
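A minimal numerical sketch of this risk measure, evaluating \(\varrho \left( X\right) =\left( 1-\alpha \right) \mathrm {VaR}_{\alpha }\left( X\right) +\int _{0}^{\alpha }\mathrm {VaR}_{t}\left( X\right) dt\) by quadrature for a hypothetical lognormal loss (the distribution and the level \(\alpha \) are our own illustrative choices):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

alpha = 0.95
X = stats.lognorm(s=0.5, scale=np.exp(0.0))      # hypothetical loss distribution

def var(t):
    return X.ppf(t)                              # VaR_t(X) = F_X^{-1}(t)

body, _ = quad(var, 0.0, alpha)                  # int_0^alpha VaR_t(X) dt
rho = (1.0 - alpha) * var(alpha) + body

print("rho(X)       =", rho)
print("VaR_alpha(X) =", var(alpha))              # the two risk figures differ
```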

The following simple example shows how, in the presence of model uncertainty, when a robust analysis needs to be conducted, one may end up with a risk measure that is co-monotone sub-additive but not co-monotone additive.Footnote 5

Example 3

Let us consider an insurance company that needs to issue deposit insurance on an asset value that follows geometric Brownian motion dynamics

$$\begin{aligned} {\left\{ \begin{array}{ll} dS_{t}=\mu S_{t}dt+\sigma S_{t}dW_{t}\\ S_{0}>0 \end{array}\right. },\quad 0<t\le T. \end{aligned}$$

Here \(\left\{ W_{t}\right\} _{0\le t\le T}\) is a standard Brownian motion on \(\left[ 0,T\right] ,\) \(\mu \in \mathbb {R}\) is the drift, and \(\sigma >0\) is the volatility. Solving this stochastic differential equation, we get \(S_{t}=S_{0}\exp ((\mu -\frac{\sigma ^{2}}{2})t+\sigma W_{t}),0\le t\le T\). Let us assume that the losses of the financial company (which need to be insured) are given by \(X=\mathcal {L}\left( S_{T}\right) \), where \(\mathcal {L}\) is a non-increasing real function (e.g., \(\mathcal {L}\left( x\right) =\max \left\{ e^{rT}S_{0}-x,0\right\} \), where \(r>0\) is the risk-free rate). As discussed before, in order to avoid the risk of moral hazard, one needs to consider a contract of the form \(Y=f\left( X\right) \), where \(f\in \mathbb {C}\), implying that the contract Y is a non-increasing function of \(S_{T}\). Now the market premium of the contract Y can be found as its market value (since it is a European option), given by the discounted expectation under the risk-neutral probability:

$$\begin{aligned} \pi \left( Y\right) :=e^{-rT}E\left( \varphi Y\right) , \end{aligned}$$

where

$$\begin{aligned} \varphi =\exp \left( \left( \frac{1}{2}m^{2}/\sigma ^{2}-m/2\right) T\right) \left( \frac{\exp \left( -rT\right) S_{T}}{S_{0}}\right) ^{-m/\sigma ^{2}}. \end{aligned}$$

Here we have \(m=\mu -r\). For instance, see [19] for further details. As one can see, if \(0<m<\sigma ^{2}\), then \(\varphi \) is also a non-increasing function of \(S_{T}\). Following [5], one can see that

$$\begin{aligned} E(\varphi Y)=\int _{0}^{1}\mathrm {VaR}_{t}(\varphi )\mathrm {VaR}_{t}(Y)dt=\int _{0}^{1}\mathrm {VaR}_{t}(Y)d\lambda _{1}(t), \end{aligned}$$

where \(\lambda _{1}(x)=\int _{0}^{x}\mathrm {VaR}_{t}(\varphi )dt\). One can show further that \(\lambda _{1}\left( x\right) =N(N^{-1}(x)-\frac{m\sqrt{T}}{\sigma })\), where \(N\left( x\right) =\frac{1}{\sqrt{2\pi }}\int _{-\infty }^{x}e^{-\frac{t^{2}}{2}}dt\) is the cumulative distribution function of the standard normal distribution. Similar to the previous example, the risk of the global position will be assessed by a risk measure \(\varrho \) given by

$$\begin{aligned} \varrho \left( X\right) =\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) , \end{aligned}$$
(1)

where \(\lambda \left( x\right) =\max \left\{ 1_{\left[ \alpha ,1\right] }\left( x\right) ,\lambda _{1}\left( x\right) \right\} \).

Now, let us consider that there is uncertainty in estimating the volatility \(\sigma \). That means that, for two positive numbers \(\sigma _{\min }\) and \(\sigma _{\max }\), we only know that \(\sigma \in \left[ \sigma _{\min },\sigma _{\max }\right] \), where \(0<m<\sigma _{\min }^{2}\). In that case, the risk has to be assessed in a robust manner:

$$\begin{aligned} \varrho ^{\mathrm {Robust}}\left( X\right)&=\sup _{\sigma _{\min }\le \sigma \le \sigma _{\max }}\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda ^{\sigma }\left( t\right) \\&=\left( 1-\alpha \right) \mathrm {VaR}_{\alpha }\left( X\right) +\sup _{\sigma _{\min }\le \sigma \le \sigma _{\max }}\int _{0}^{\alpha }\mathrm {VaR}_{t}\left( X\right) d\left( N\left( N^{-1}(t)-\frac{m\sqrt{T}}{\sigma }\right) \right) \\&=\left( 1-\alpha \right) \mathrm {VaR}_{\alpha }\left( X\right) +\sup _{\sigma _{\min }\le \sigma \le \sigma _{\max }}e^{\frac{-m^{2}T}{2\sigma ^{2}}}\int _{0}^{\alpha }\mathrm {VaR}_{t}\left( X\right) e^{\frac{m\sqrt{T}N^{-1}\left( t\right) }{\sigma }}dt, \end{aligned}$$

where \(\lambda ^{\sigma }\) denotes the measure \(\lambda \) from (1), with its dependence on \(\sigma \) (through \(\lambda _{1}\)) made explicit. As one can see, \(\varrho ^{\mathrm {Robust}}\) is co-monotone sub-additive but not necessarily co-monotone additive. Note that this risk measure can be defined on the set of all bounded-below random variables. By an argument similar to the previous example, if we regard each \(\lambda ^{\sigma }\) as a measure, the support of every measure in \(\left\{ \lambda ^{\sigma }\right\} _{\sigma \in \left[ \sigma _{\mathrm {min}},\sigma _{\mathrm {max}}\right] }\) is \(\left[ 0,\alpha \right) \). Again, according to Theorem 1 in Sect. 4, this shows that \(\varrho ^{\mathrm {Robust}}\) is weakly continuous.
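A minimal numerical sketch of \(\varrho ^{\mathrm {Robust}}\), following the last display above: the supremum over \(\sigma \) is taken on a grid, the inner integral is computed by quadrature, and the loss distribution together with all parameter values are purely illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Illustrative parameters (assumptions, not values from the text).
alpha, T, mu, r = 0.95, 1.0, 0.08, 0.03
m = mu - r
sigma_grid = np.linspace(0.30, 0.50, 21)         # [sigma_min, sigma_max], with m < sigma_min^2

X = stats.lognorm(s=0.4, scale=100.0)            # hypothetical insured loss

def weighted_tail(sigma):
    """exp(-m^2 T/(2 sigma^2)) * int_0^alpha VaR_t(X) exp(m sqrt(T) N^{-1}(t)/sigma) dt"""
    a = m * np.sqrt(T) / sigma
    integrand = lambda t: X.ppf(t) * np.exp(a * stats.norm.ppf(t))
    val, _ = quad(integrand, 0.0, alpha)
    return np.exp(-0.5 * a * a) * val

rho_robust = (1.0 - alpha) * X.ppf(alpha) + max(weighted_tail(s) for s in sigma_grid)
print(rho_robust)
```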

4 Dual characterization of natural risk measures

In this section, we characterize the family of weakly continuous natural risk measures. However, in order to present our main result, we need to introduce some further notation.

Let \(C\left[ 0,1\right] \) be the space of all continuous functions on \(\left[ 0,1\right] \) with the uniform norm \(\Vert .\Vert _{\infty }\). Then it is known that the topological dual of \(C\left[ 0,1\right] \) is the space of all bounded variation functions on \(\left[ 0,1\right] \), denoted by \(BV\left[ 0,1\right] \), with the total variation on \(\left[ 0,1\right] \) as its norm. The dual relation between \(C\left[ 0,1\right] \) and \(BV\left[ 0,1\right] \) is defined as

$$\begin{aligned} \left\langle H,K\right\rangle =\int _{0}^{1}K\left( t\right) dH\left( t\right) ,\,\forall \left( K,H\right) \in C\left[ 0,1\right] \times BV\left[ 0,1\right] , \end{aligned}$$

where the integral is the Riemann–Stieltjes integral. In the following discussions, \(\left\langle H,K\right\rangle \) denotes \(\int _{0}^{1}K\left( t\right) dH\left( t\right) \) whenever K is H-integrable. The same bilinear form induces the coarsest topology on \(BV\left[ 0,1\right] \) whose topological dual is \(C\left[ 0,1\right] \). This topology is denoted by \(\sigma \left( BV\left[ 0,1\right] ,C\left[ 0,1\right] \right) \).Footnote 6

Now we are in a position to state the main result of this study.

Theorem 1

Let \(\varrho :L_{B}^{0}\rightarrow \mathbb {R}\) be a natural risk measure in the sense of Definition 1. Then \(\varrho \) is weakly continuous if and only if there exists a \(\sigma (BV[0,1],C[0,1])\)-compact set \(\Delta \subseteq BV[0,1]\) such that

  1. Each \(\lambda \in \Delta \) is a probability measure on \(\left( \left[ 0,1\right] ,\mathcal {B}\left[ 0,1\right] \right) \);

  2. There exists \(\epsilon _{0}>0\) such that \(\forall \lambda \in \Delta \), \(\mathrm {supp}\left( \lambda \right) =\left[ 0,1-\epsilon _{0}\right) \) Footnote 7

and

$$\begin{aligned} \varrho \left( X\right) =\sup _{\lambda \in \Delta }\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) . \end{aligned}$$
(2)

Moreover, if \(\varrho \) is co-monotone additive, then \(\Delta =\left\{ \lambda \right\} \).
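To make the representation (2) concrete, the following Python sketch evaluates \(\sup _{\lambda \in \Delta }\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) \) for a small, finite stand-in for \(\Delta \): each \(\lambda \) is given by a probability density supported on \(\left[ 0,1-\epsilon _{0}\right) \). The densities, the value of \(\epsilon _{0}\), and the loss distribution are all our own illustrative assumptions.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

eps0 = 0.05                                      # every lambda lives on [0, 1 - eps0)
X = stats.lognorm(s=0.5, scale=1.0)              # hypothetical bounded-below loss

def lam_density(c):
    """Probability density proportional to exp(c*t) on [0, 1 - eps0)."""
    Z, _ = quad(lambda t: np.exp(c * t), 0.0, 1.0 - eps0)
    return lambda t: np.exp(c * t) / Z

Delta = [lam_density(c) for c in (0.0, 2.0, 5.0)]   # a finite stand-in for Delta

def rho(X):
    # sup over Delta of int_0^{1-eps0} VaR_t(X) d lambda(t)
    return max(quad(lambda t: X.ppf(t) * dens(t), 0.0, 1.0 - eps0)[0] for dens in Delta)

print(rho(X))
```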

Before presenting the proof of the theorem, we need to introduce some notation and recall some statements from convex analysis that we will use in the proof.

Let V be a topological vector space, and let \(V'\) be its topological dual. Recall the Alaoglu theorem that states that for any set \(C\subseteq V\), with a non-empty \(\sigma \left( V,V'\right) \)-interior and any \(c\in \mathbb {R}_{+}\), the set \(\{ H\in V'\vert \sup _{K\in C}\left\langle H,K\right\rangle \le c\} \) is \(\sigma \left( V',V\right) \) compact. Also, recall that V has the Dunford–Pettis property if for any sequence \(\left\{ \left( F_{n},H_{n}\right) \right\} _{n=1}^{\infty }\subseteq V\times V'\) converging in \(\sigma \left( V,V'\right) \times \sigma \left( V',V\right) \) to \(\left( F,H\right) \) , \(\left\{ \left\langle F_{n},H_{n}\right\rangle \right\} _{n=1}^{\infty }\) converges to \(\left\langle F,H\right\rangle \). It is known that \(C\left[ 0,1\right] \) has this property.

Let us assume that V is a locally convex topological vector space. Recall that the domain of any convex function \(\phi :V\rightarrow \mathbb {R}\cup \left\{ +\infty \right\} \), denoted by \(\mathrm {dom}(\phi )\), is equal to \(\left\{ K\in V|\phi (K)<\infty \right\} \). The dual of \(\phi \), denoted by \(\phi ^{*}\), is defined on \(V'\) as \(\phi ^{*}(H)=\sup _{K\in V}\left\{ \left\langle H,K\right\rangle -\phi (K)\right\} \). A convex function is said to be lower semicontinuous iff \(\phi =\phi ^{**}\). For a closed convex set \(C\subseteq V,\) the indicator function of C, denoted by \(\chi _{C}\), is introduced as \(\chi _{C}(X)=0\) if \(X\in C\) and \(+\infty \) otherwise. For any positive homogeneous convex function \(\phi \), let

$$\begin{aligned} \Delta _{\phi }=\left\{ H\in V'|\left\langle H,K\right\rangle \le \phi (K),\forall K\in V\right\} . \end{aligned}$$

It is easy to see that \(\phi ^{*}=\chi _{\Delta _{\phi }}\). Therefore, any lower semicontinuous positive homogeneous convex function \(\phi \) can be represented as \(\phi (K)=\sup \limits _{H\in \Delta _{\phi }}\left\langle H,K\right\rangle \). If \(\phi \) is continuous, then by the Alaoglu theorem we know that \(\Delta _{\phi }\) is \(\sigma \left( V',V\right) \)-compact.
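As a toy finite-dimensional illustration of the identity \(\phi (K)=\sup _{H\in \Delta _{\phi }}\left\langle H,K\right\rangle \) (our own example, not part of the argument): on \(V=\mathbb {R}^{2}\), the sup-norm is positively homogeneous and convex, its set \(\Delta _{\phi }\) is the \(\ell ^{1}\) unit ball, and the supremum is attained at one of the four extreme points.

```python
import numpy as np

phi = lambda x: np.max(np.abs(x))               # phi(x) = max(|x_1|, |x_2|)

# Delta_phi = {h : <h, x> <= phi(x) for all x} is the l^1 unit ball;
# its extreme points are enough to compute the supremum.
extreme_points = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

x = np.array([0.3, -1.7])
print(phi(x), np.max(extreme_points @ x))       # both equal 1.7
```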

Remark 1

For the reader’s benefit, we recommend [16] for the functional-analytic background. This book contains all the tools and definitions that are used in this paper, in particular the Alaoglu and Dunford–Pettis theorems. One can also consult [23, 24] for further information. For the convex analysis part, we also recommend [12, 21].

Proof of Theorem 1

First, we prove the easy implication, namely that \(\varrho \) in (2) is a weakly continuous natural risk measure. We leave it to the reader to check that \(\varrho \) satisfies the conditions of a natural risk measure. We only check that any functional of the form (2) is weakly continuous.

Given that, for every \(X\in L^{0}\), \(\mathrm {VaR}_{t}\left( X\right) \) is a non-decreasing function of t, \(\varrho \) is finite on \(L_{+}^{0}\). Now, let us consider that \(X_{n}\ge 0\) converges weakly to X, as n tends to infinity. This implies that \(\mathrm {VaR}_{t}\left( X_{n}\right) \) converges pointwise to \(\mathrm {VaR}_{t}\left( X\right) \) for \(t<1\). Let \(N\in \mathbb {N}\) be large enough that \(\mathrm {VaR}_{1-\epsilon _{0}}\left( X_{n}\right) \le \mathrm {VaR}_{1-\epsilon _{0}}\left( X\right) +1\), for all \(n\ge N\). Since \(\mathrm {VaR}_{t}\left( .\right) \) is non-decreasing in t, this implies that \(0\le \mathrm {VaR}_{t}\left( X_{n}\right) \le \mathrm {VaR}_{1-\epsilon _{0}}\left( X\right) +1\), for all \(t\le 1-\epsilon _{0}\) and \(n\ge N\). Let \(\left\{ K_{n}\right\} _{n=N}^{\infty }\) be a sequence of continuous functions such that \(\left| \mathrm {VaR}_{t}\left( X_{n}\right) -\mathrm {VaR}_{t}\left( X\right) \right| \le K_{n}\left( t\right) \) for \(t\in \left[ 0,1-\epsilon _{0}\right] \) and \(n=N,N+1,\ldots \), and \(K_{n}\rightarrow 0\) pointwise. Since \(\left| \mathrm {VaR}_{t}\left( X_{n}\right) -\mathrm {VaR}_{t}\left( X\right) \right| \le 2\mathrm {VaR}_{1-\epsilon _{0}}\left( X\right) +1\) for \(t\in \left[ 0,1-\epsilon _{0}\right] \) and \(n=N,N+1,\ldots \), one can take \(\left\{ K_{n}\right\} _{n=N}^{\infty }\) to be uniformly bounded above. Therefore, for every \(\mu \in BV\left[ 0,1\right] \), by the dominated convergence theorem we have that \(\int _{0}^{1}K_{n}\left( t\right) d\left| \mu \right| \left( t\right) \rightarrow 0\).Footnote 8 This implies that \(K_{n}\rightarrow 0\) in \(\sigma \left( C\left[ 0,1\right] ,BV\left[ 0,1\right] \right) \). Since \(\Delta \) is \(\sigma \left( BV\left[ 0,1\right] ,C\left[ 0,1\right] \right) \)-compact, for each \(n\ge N\) there exists \(\lambda _{n}\in \Delta \) such that \(\sup _{\lambda \in \Delta }\int _{0}^{1}K_{n}\left( t\right) d\lambda \left( t\right) =\int _{0}^{1}K_{n}\left( t\right) d\lambda _{n}\left( t\right) \).

We prove the continuity of \(\varrho \) by way of contradiction. Assume there exist \(\delta >0\) and a sub-sequence \(\{ X_{n_{i}}\} _{i=1}^{\infty }\) such that \(\left| \varrho \left( X_{n_{i}}\right) -\varrho \left( X\right) \right| \ge \delta \). Let \(\{ \lambda _{n_{i_{k}}}\} _{k=1}^{\infty }\) be a sub-sequence that converges to \(\lambda \in \Delta \) in \(\sigma \left( BV\left[ 0,1\right] ,C\left[ 0,1\right] \right) \); then since \(C\left[ 0,1\right] \) has the Dunford–Pettis property, it follows that \(\int _{0}^{1}K_{n_{i_{k}}}\left( t\right) d\lambda _{n_{i_{k}}}\left( t\right) \rightarrow 0\) as \(k\rightarrow \infty \). Now we have

$$\begin{aligned} 0<\delta \le \left| \varrho \left( X_{n_{i_{k}}}\right) -\varrho \left( X\right) \right|&\le \left| \sup _{\lambda \in \Delta }\int _{0}^{1}\mathrm {VaR}_{t}\left( X_{n_{i_{k}}}\right) d\lambda \left( t\right) -\sup _{\lambda \in \Delta }\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) \right| \\&\le \sup _{\lambda \in \Delta }\int _{0}^{1}\left| \mathrm {VaR}_{t}\left( X_{n_{i_{k}}}\right) -\mathrm {VaR}_{t}\left( X\right) \right| d\lambda \left( t\right) \\&\le \sup _{\lambda \in \Delta }\int _{0}^{1}K_{n_{i_{k}}}\left( t\right) d\lambda \left( t\right) \\&=\int _{0}^{1}K_{n_{i_{k}}}\left( t\right) d\lambda _{n_{i_{k}}}\left( t\right) \rightarrow 0, \end{aligned}$$

which is a contradiction. This completes the proof of the first implication.

Now we prove the second implication, i.e., we show that, if \(\varrho \) is a weakly continuous natural risk measure, then there exists a compact set \(\Delta \) in \(\sigma (BV[0,1],C[0,1])\) as described by conditions 1 and 2 in the theorem statement such that \(\varrho \left( X\right) =\sup _{\lambda \in \Delta }\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda \left( t\right) \).

First, we give an outline of the proof as follows:

  1. We restrict \(\varrho \) to

     $$\begin{aligned} S_{n}:=\left\{ \sum _{i=1}^{2^{n}}x_{i}1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+x_{2^{n}+1}1_{\left\{ 1\right\} }\left| \left( x_{i}\right) _{i=1}^{2^{n}+1}\in \mathbb {R}^{2^{n}+1}\right. \right\} . \end{aligned}$$

     Then it is clear that \(\varrho \vert _{S_{n}}\) is a natural risk statistic.Footnote 9

  2. Using the previous step, we extend \(\varrho \) to \(\bigcup _{n=1}^{\infty }S_{n}\).

  3. By using the previous step and the conditional expectation on the partition \(\{ \left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) ,i=1,...,2^{n},\left\{ 1\right\} \} \), we extend \(\varrho \) to \(C\left[ 0,1\right] \).

  4. We give the Fenchel–Moreau representation of \(\varrho \) on \(C\left[ 0,1\right] \).

  5. By using the Daniell integral, we extend the \(C\left[ 0,1\right] \) Fenchel–Moreau representation of \(\varrho \) to \(\mathbb {A}\).

Let us fix a uniformly distributed random variable U. We introduce the risk measure \(\Pi \) on \(\mathbb {A}\) by

$$\begin{aligned} \Pi \left( H\right) =\varrho \left( H\left( U\right) \right) . \end{aligned}$$

Let \(H_{1},H_{2}\in \mathbb {A}\); then it is clear that \(H_{1}\left( U\right) \) and \(H_{2}\left( U\right) \) are co-monotone. Since \(\varrho \) is co-monotone sub-additive, it follows that \(\Pi \) is sub-additive on \(\mathbb {A}\). It is also clear that \(\Pi \) is positively homogeneous of degree 1 on \(\mathbb {A}\).

Now, let us introduce the following functions from \(\mathbb {R}^{2^{n}+1}\) to \(\mathbb {A}\):

$$\begin{aligned} T_{n}\left( \mathbf {x}\right) =\sum _{i=1}^{2^{n}}x_{i}1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+x_{2^{n}+1}1_{\left\{ 1\right\} },\quad \forall \mathbf {x=}\left( x_{i}\right) _{i=1}^{2^{n}+1}, \end{aligned}$$

and

$$\begin{aligned} \tilde{T}_{n}\left( \mathbf {w}\right) =2^{n}\sum _{i=1}^{2^{n}}w_{i}1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+w_{2^{n}+1}1_{\left\{ 1\right\} },\quad \forall \mathbf {w=}\left( w_{i}\right) _{i=1}^{2^{n}+1}. \end{aligned}$$

Let \(S_{n}=T_{n}\left( \mathbb {R}^{2^{n}+1}\right) \) and \(S=\cup _{n=1}^{\infty }S_{n}\). We introduce a natural risk statistic \(\Pi _{n}\) on \(\mathbb {R}^{2^{n}+1}\), for \(n\ge 1\), as follows:

$$\begin{aligned} \Pi _{n}\left( x_{1},...,x_{2^{n}+1}\right) =\Pi \left( T_{n}\left( \mathbf {x}^{os}\right) \right) =\Pi \left( \sum _{i=1}^{2^{n}}x_{i}^{os}1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+x_{2^{n}+1}^{os}1_{\left\{ 1\right\} }\right) , \end{aligned}$$

where \(\mathbf {x}^{os}=\left( x_{i}^{os}\right) _{i=1}^{2^{n}+1}\) is the order statistics of \(\mathbf {x}=\left( x_{i}\right) _{i=1}^{2^{n}+1}\), i.e., \(x_{1}^{os}\le x_{2}^{os}\le ...\le x_{2^{n}+1}^{os}\). Define the dual relation between \(\mathbb {R}^{2^{n}+1}\) and itself, with the Euclidean norm, as

$$\begin{aligned} \left\langle \mathbf {w},\mathbf {x}\right\rangle _{n}=\sum _{i=1}^{2^{n}+1}w_{i}x_{i},\quad \forall \left( \mathbf {w},\mathbf {x}\right) \in \mathbb {R}^{2^{n}+1}\times \mathbb {R}^{2^{n}+1}. \end{aligned}$$

In [1], it is shown that \(\Pi _{n}\) can be represented as follows:

$$\begin{aligned} \Pi _{n}\left( \mathbf {x}\right) =\sup _{\mathbf {w}\in \Omega _{n}}\left\langle \mathbf {w,x}^{os}\right\rangle _{n},\quad \forall \mathbf {x}\in \mathbb {R}^{2^{n}+1}, \end{aligned}$$

where \(\Omega _{n}\) is a closed convex subset of

$$\begin{aligned} \Lambda _{n}=\left\{ \mathbf {w}\in \mathbb {R}^{2^{n}+1}\left| w_{i}\ge 0,i=1,...,2^{n}+1,\right. \sum _{i=1}^{2^{n}+1}w_{i}=1\right\} . \end{aligned}$$
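A small Python sketch of this representation, with a hypothetical finite \(\Omega _{n}\) consisting of a few points of the simplex \(\Lambda _{n}\): a single unit vector recovers an order statistic (an empirical VaR), while a vector averaging the largest weights gives a tail average. For a linear functional, the supremum over the convex hull of these points coincides with the maximum over the points themselves.

```python
import numpy as np

def pi_n(x, Omega):
    """sup_{w in Omega} <w, x^{os}>_n: weights applied to the order statistics of x."""
    x_os = np.sort(x)                            # x^{os}: non-decreasing rearrangement
    return max(float(np.dot(w, x_os)) for w in Omega)

d = 9                                            # plays the role of 2^n + 1 (here n = 3)
x = np.random.default_rng(2).normal(size=d)

e_7 = np.eye(d)[6]                               # point mass: the 7th order statistic (an empirical VaR)
tail_avg = np.r_[np.zeros(6), np.full(3, 1 / 3)] # average of the three largest losses
Omega = [e_7, tail_avg]                          # a hypothetical Omega_n would be their convex hull

print(pi_n(x, Omega))
```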

Let us introduce \(\Gamma _{n}\) as

$$\begin{aligned} \Gamma _{n}\left( \mathbf {x}\right) =\sup _{\mathbf {w}\in \Omega _{n}}\left\langle \mathbf {w,x}\right\rangle _{n},\quad \forall \mathbf {x}\in \mathbb {R}^{2^{n}+1}. \end{aligned}$$

Note that, since \(\Gamma _{n}\) is translation invariant, it is continuous (with respect to the sup norm) on \(\mathbb {R}^{2^{n}+1}\). Given this, and the discussion preceding Remark 1, \(\Omega _{n}\) can be written as follows:

$$\begin{aligned} \Omega _{n}=\left\{ \mathbf {w}\in \mathbb {R}^{2^{n}+1}\left| \left\langle \mathbf {w,x}\right\rangle _{n}\le \Gamma _{n}\left( \mathbf {x}\right) ,\forall \mathbf {x}\in \mathbb {R}^{2^{n}+1}\right. \right\} . \end{aligned}$$

We now introduce the following mapping from \(\mathbb {R}^{2^{n}+1}\) to \(\mathbb {R}^{2^{n+1}+1}\):

$$\begin{aligned} d_{n}\left( x_{1},x_{2},...,x_{2^{n}+1}\right) =\left( x_{1},x_{1},x_{2},x_{2},...,x_{2^{n}},x_{2^{n}},x_{2^{n}+1}\right) . \end{aligned}$$

Notice that

$$\begin{aligned} \Gamma _{n}\left( \mathbf {x}\right)&=\Pi \left( T_{n}\left( \mathbf {x}\right) \right) \\&=\Pi \left( \sum _{i=1}^{2^{n}}x_{i}1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+x_{2^{n}+1}1_{\left\{ 1\right\} }\right) \\&=\Pi \left( \sum _{i=1}^{2^{n}}\left( x_{i}1_{\left[ \frac{2i-2}{2^{n+1}},\frac{2i-1}{2^{n+1}}\right) }+x_{i}1_{\left[ \frac{2i-1}{2^{n+1}},\frac{2i}{2^{n+1}}\right) }\right) +x_{2^{n}+1}1_{\left\{ 1\right\} }\right) \\&=\Pi \left( T_{n+1}\left( x_{1},x_{1},x_{2},x_{2},...,x_{2^{n}},x_{2^{n}},x_{2^{n}+1}\right) \right) \\&=\Gamma _{n+1}\left( d_{n}\left( \mathbf {x}\right) \right) , \end{aligned}$$

therefore \(\Gamma _{n}=\Gamma _{n+1}\circ d_{n}\). Let us define a mapping from \(\mathbb {R}^{2^{n+1}+1}\) to \(\mathbb {R}^{2^{n}+1}\)

$$\begin{aligned} d^{n}\left( w_{1},...,w_{2^{n+1}+1}\right) =\left( w_{1}+w_{2},w_{3}+w_{4},...,w_{2^{n+1}-1}+w_{2^{n+1}},w_{2^{n+1}+1}\right) . \end{aligned}$$

It is very simple to check that

$$\begin{aligned} \left\langle \mathbf {w,}d_{n}\left( \mathbf {x}\right) \right\rangle _{n+1}=\left\langle d^{n}\left( \mathbf {w}\right) ,\mathbf {x}\right\rangle _{n},\forall \left( \mathbf {w},\mathbf {x}\right) \in \mathbb {R}^{2^{n+1}+1}\times \mathbb {R}^{2^{n}+1}. \end{aligned}$$

We claim that \(d^{n}\left( \Omega _{n+1}\right) =\Omega _{n}\), and to see this, observe that

$$\begin{aligned} \Gamma _{n}\left( \mathbf {x}\right)&=\Gamma _{n+1}\left( d_{n}\left( \mathbf {x}\right) \right) \\&=\sup _{\mathbf {w}\in \Omega _{n+1}}\left\langle \mathbf {w},d_{n}\left( \mathbf {x}\right) \right\rangle _{n+1}\\&=\sup _{\mathbf {w}\in \Omega _{n+1}}\left\langle d^{n}\left( \mathbf {w}\right) ,\mathbf {x}\right\rangle _{n}\\&=\sup _{\mathbf {w}\in d^{n}\left( \Omega _{n+1}\right) }\left\langle \mathbf {w},\mathbf {x}\right\rangle _{n}. \end{aligned}$$

Introduce \(\Delta _{n}=\tilde{T}_{n}\left( \Omega _{n}\right) \) and

$$\begin{aligned} L_{n}\left( K\right) :=\sup _{H\in \Delta _{n}}\left\langle H,K\right\rangle ,\quad K\in S, \end{aligned}$$

then it is easy to verify that, for all \(\left( \mathbf {w},\mathbf {x}\right) \in \mathbb {R}^{2^{n}+1}\times \mathbb {R}^{2^{n}+1}\),

$$\begin{aligned} \left\langle \mathbf {w},\mathbf {x}\right\rangle _{n}=\left\langle \tilde{T}_{n}\left( \mathbf {w}\right) ,T_{n}\left( \mathbf {x}\right) \right\rangle . \end{aligned}$$

This implies that

$$\begin{aligned} L_{n}\left( T_{n}\left( \mathbf {x}\right) \right) =\Gamma _{n}\left( \mathbf {x}\right) ,\forall \mathbf {x}\in \mathbb {R}^{2^{n}+1}. \end{aligned}$$

Also note that if \(\mathbf {x}\in \mathbb {R}^{2^{n}+1}\) is a non-decreasing sequence, then \(T_{n}\left( \mathbf {x}\right) \) is non-decreasing and thus

$$\begin{aligned} L_{n}\left( T_{n}\left( \mathbf {x}\right) \right) =\Gamma _{n}\left( \mathbf {x}\right) =\Pi \left( T_{n}\left( \mathbf {x}\right) \right) . \end{aligned}$$
(3)

Let \(T_{n}\left( \mathbf {x}\right) \in S_{n}\) and \(\tilde{T}_{n+1}\left( \mathbf {w}\right) \in S_{n+1}\). It can be easily checked that

$$\begin{aligned} \left\langle \tilde{T}_{n+1}\left( \mathbf {w}\right) ,T_{n}\left( \mathbf {x}\right) \right\rangle =\left\langle d^{n}\left( \mathbf {w}\right) ,\mathbf {x}\right\rangle _{n}, \end{aligned}$$

which implies that

$$\begin{aligned} L_{n+1}\left( T_{n}\left( \mathbf {x}\right) \right)&=\sup _{H\in \Delta _{n+1}}\left\langle H,T_{n}\left( \mathbf {x}\right) \right\rangle \\&=\sup _{\mathbf {w}\in \Omega _{n+1}}\left\langle \tilde{T}_{n+1}\left( \mathbf {w}\right) ,T_{n}\left( \mathbf {x}\right) \right\rangle \\&=\sup _{\mathbf {w\in }\Omega _{n+1}}\left\langle d^{n}\left( \mathbf {w}\right) ,\mathbf {x}\right\rangle _{n}\\&=\sup _{\mathbf {w\in }\Omega _{n+1}}\left\langle \mathbf {w},d_{n}\left( \mathbf {x}\right) \right\rangle _{n+1}\\&=\Gamma _{n+1}\circ d_{n}\left( \mathbf {x}\right) \\&=\Gamma _{n}\left( \mathbf {x}\right) \\&=L_{n}\left( T_{n}\left( \mathbf {x}\right) \right) . \end{aligned}$$

This shows that

$$\begin{aligned} L_{n+1}\left( T_{n}\left( \mathbf {x}\right) \right)&=L_{n}\left( T_{n}\left( \mathbf {x}\right) \right) . \end{aligned}$$
(4)

Let us introduce the mapping from the set of continuous functions, \(C\left[ 0,1\right] \), to \(S_{n}\) with

$$\begin{aligned} E_{n}\left( K\right) =\sum _{i=1}^{2^{n}}\left( 2^{n}\int _{\frac{i-1}{2^{n}}}^{\frac{i}{2^{n}}}K\left( t\right) dt\right) 1_{\left[ \frac{i-1}{2^{n}},\frac{i}{2^{n}}\right) }+\left( 2^{n}\int _{\frac{2^{n}-1}{2^{n}}}^{1}K\left( t\right) dt\right) 1_{\left\{ 1\right\} },\quad K\in C\left[ 0,1\right] . \end{aligned}$$
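A short numerical sketch of this averaging map (using the cell averages above, i.e., the factor \(2^{n}\)) and of its pointwise convergence to K, for an arbitrary continuous test function of our own choosing:

```python
import numpy as np
from scipy.integrate import quad

def E_n(K, n):
    """Average K over each dyadic cell [(i-1)/2^n, i/2^n); a step function in S_n."""
    edges = np.linspace(0.0, 1.0, 2**n + 1)
    means = np.array([2**n * quad(K, a, b)[0] for a, b in zip(edges[:-1], edges[1:])])
    def step(t):
        i = min(int(np.searchsorted(edges, t, side="right")) - 1, 2**n - 1)
        return means[i]
    return step

K = lambda t: np.sqrt(t) + np.sin(3 * t)         # an arbitrary continuous test function
t0 = 0.7
for n in (2, 4, 6, 8):
    print(n, E_n(K, n)(t0), "->", K(t0))         # E_n(K)(t0) converges to K(t0)
```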

From a probabilistic point of view, \(E_{n}\left( K\right) \) is the conditional expectation with respect to the sigma-algebra induced by the partition \(\{[\frac{i-1}{2^{n}},\frac{i}{2^{n}}),i=1,\ldots ,2^{n}-1,[\frac{2^{n}-1}{2^{n}},1]\} \). For a continuous function K, it is clear that \(E_{n}\left( K\right) \) converges pointwise to K. Now, we introduce the following function \(L^{c}\) on \(C\left[ 0,1\right] \):

$$\begin{aligned} L^{c}\left( K\right) =\limsup _{n}L_{n}\left( E_{n}\left( K\right) \right) . \end{aligned}$$

First of all, it is clear that \(\min \left( K\right) \le L^{c}\left( K\right) \le \max \left( K\right) \); therefore \(L^{c}\left( K\right) \) is a finite number, which means \(\text {dom}\left( L^{c}\right) =C\left[ 0,1\right] \). Note that any convex function is continuous in the interior of its domain. Now let us consider a continuous and non-decreasing member \(K\in C\left[ 0,1\right] \); then it is clear that \(E_{n}\left( K\right) \) is a non-decreasing member of \(S_{n}\). Therefore, as discussed earlier, \(L_{n}\left( E_{n}\left( K\right) \right) =\Pi \left( E_{n}\left( K\right) \right) \). However, since \(E_{n}\left( K\right) \) converges pointwise to K, we have that \(\Pi \left( E_{n}\left( K\right) \right) \rightarrow \Pi \left( K\right) \) as \(n\rightarrow \infty \). Therefore, by using (4) and (3),

$$\begin{aligned} L^{c}\left( K\right) =\limsup _{n}L_{n}\left( E_{n}\left( K\right) \right) =\limsup _{n}\Pi \left( E_{n}\left( K\right) \right) =\Pi \left( K\right) . \end{aligned}$$
(5)

It can be easily checked that \(L^{c}\) is sub-additive and positive homogeneous of degree 1, that it is non-decreasing, and that \(L^{c}\left( K+c\right) =L^{c}\left( K\right) +c\). These properties easily yield the continuity of \(L^{c}\) (note that \(L^{c}\left( K\right) -L^{c}\left( H\right) \le L^{c}\left( K-H\right) \le L^{c}\left( \Vert H-K\Vert _{\infty }\right) =\Vert H-K\Vert _{\infty }\), for all \(K,H\in C\left[ 0,1\right] \)). Let \(C=\left\{ K\in C\left[ 0,1\right] \vert L^{c}\left( K\right) \ge 0\right\} \) and

$$\begin{aligned} \Delta =\left\{ \mu \in BV\left[ 0,1\right] \vert \forall K\in C,\int _{0}^{1}K\left( t\right) d\mu \left( t\right) \ge 0\text { and }\int _{0}^{1}d\mu \left( t\right) =1\right\} . \end{aligned}$$

Since \(C\left[ 0,1\right] _{+}\subseteq C\), all members of \(\Delta \) are non-negative, and it is also easy to see that \(\Delta \) is a closed convex set in \(BV\left[ 0,1\right] \). As in the proof of Theorem 2.3 in [10], one can show that

$$\begin{aligned} L^{c}\left( K\right) =\sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\mu \left( t\right) . \end{aligned}$$

By using the Alaoglu theorem, we find that the continuity of \(L^{c}\) results in \(\Delta \) being \(\sigma \left( BV\left[ 0,1\right] ,C\left[ 0,1\right] \right) \)-compact.

We now prove the following lemma:

Lemma 1

There exists \(\epsilon _{0}>0\) such that

$$\begin{aligned} \forall \mu \in \Delta ,\int _{1-\epsilon _{0}}^{1}d\mu \left( t\right) =0. \end{aligned}$$
(6)

Proof

Suppose, for contradiction, that the statement is false. Then, for any N, there exists \(\mu _{N}\in \Delta \) such that \(\int _{1-\frac{1}{2N}}^{1}d\mu _{N}\left( t\right) >0\). For each N, let us consider a continuous, non-negative, and non-decreasing function \(I_{N}\) such that \(\mathrm {supp}\left( I_{N}\right) \subseteq \left[ 1-\frac{1}{N},1\right] \) and \(I_{N}|_{[1-\frac{1}{2N},1]}=\frac{1}{\int _{1-\frac{1}{2N}}^{1}d\mu _{N}\left( t\right) }\). Observe that \(I_{N}\rightarrow 0\) pointwise on \(\left[ 0,1\right) \) as \(N\rightarrow \infty \); hence \(F_{I_{N}\left( U\right) }\) converges pointwise to \(F_{0}\) and, by the weak continuity of \(\varrho \), \(\Pi \left( I_{N}\right) \rightarrow 0\) as \(N\rightarrow \infty \). Since \(I_{N}\in C\left[ 0,1\right] \) is non-decreasing, by (5) we have \(L^{c}\left( I_{N}\right) =\Pi \left( I_{N}\right) \). This means that \(L^{c}\left( I_{N}\right) \rightarrow 0\) as \(N\rightarrow \infty \). However,

$$\begin{aligned} L^{c}\left( I_{N}\right)&\ge \int _{0}^{1}I_{N}\left( t\right) d\mu _{N}\left( t\right) \\&\ge \int _{1-\frac{1}{2N}}^{1}\frac{1}{\int _{1-\frac{1}{2N}}^{1}d\mu _{N}\left( t\right) }d\mu _{N}\left( t\right) =1, \end{aligned}$$

which is a contradiction. \(\square \)

Now, we show that each member \(\mu \in \Delta \) can be considered as a measure on \(\left[ 0,1\right] \). First, we have the following simple lemma:

Lemma 2

For \(\mu \in \Delta \), let \(\Lambda _{\mu }\left( K\right) =\int _{0}^{1}K\left( t\right) d\mu \left( t\right) \). \(\Lambda _{\mu }\) has the following properties:

  (1) Linearity: If \(K_{1},K_{2}\in C\left[ 0,1\right] \), and \(\alpha _{1},\alpha _{2}\) are any two real numbers, then \(\Lambda _{\mu }(\alpha _{1}K_{1}+\alpha _{2}K_{2})=\alpha _{1}\Lambda _{\mu }\left( K_{1}\right) +\alpha _{2}\Lambda _{\mu }\left( K_{2}\right) \).

  (2) Non-negativity: If \(K\in C\left[ 0,1\right] \) and \(K\ge 0\), then \(\Lambda _{\mu }\left( K\right) \ge 0\).

  (3) Continuity: If \(\left\{ K_{n}\right\} \) is a non-increasing sequence (i.e., \(K_{1}\ge \cdots \ge K_{n}\ge \cdots \)) of functions in \(C\left[ 0,1\right] \) that converges to 0 for all x in \(\left[ 0,1\right] \), then \(\Lambda _{\mu }\left( K_{n}\right) \rightarrow 0\).

Proof

The first and the second properties are clear. Note that, when K is continuous, the integral \(\int _{0}^{1}K\left( t\right) d\mu \left( t\right) \) can be regarded as the Lebesgue integral with respect to the measure \(m_{\mu }\) defined by \(m_{\mu }\left[ a,b\right) =\mu \left( b\right) -\mu \left( a\right) \) and \(m_{\mu }\left( \left\{ 1\right\} \right) =0\). Therefore, the third property follows easily from the dominated convergence theorem. \(\square \)

Let us now introduce the following Daniell integral (see, for instance, [22]) on the set \(\mathbb {F}\) introduced in Sect. 2: let \(\left\{ K_{n}\right\} _{n=1}^{\infty }\) be an arbitrary increasing sequence from \(C\left[ 0,1\right] \) converging pointwise to \(K\in \mathbb {F}\), i.e., \(K_{n}\uparrow K\). Then, the Daniell integral of K is defined as

$$\begin{aligned} D_{\mu }\left( K\right) =\lim _{n}\Lambda _{\mu }\left( K_{n}\right) . \end{aligned}$$

Daniell has shown that, given the three properties mentioned in the previous lemma, this limit is independent of the choice of the sequence. It is important to note that the value of the Daniell integral can be \(+\infty \) if the limit is unbounded. There is also a Borel measure \(\bar{\mu }\) on \(\left[ 0,1\right] \), given on intervals by

$$\begin{aligned} \bar{\mu }\left[ a,b\right) =D_{\mu }\left( 1_{\left[ a,b\right) }\right) \quad \text { and }\quad \bar{\mu }\left( \left\{ 1\right\} \right) =D_{\mu }\left( 1_{\left\{ 1\right\} }\right) , \end{aligned}$$

where

$$\begin{aligned} D_{\mu }\left( K\right) =\int _{0}^{1}K\left( t\right) d\bar{\mu }\left( t\right) . \end{aligned}$$

If the sequence \(\int _{0}^{1}K_{n}\left( t\right) d\mu \left( t\right) \) is bounded above, we say K is integrable. It is easy to see that \(\bar{\mu }\) inherits all the properties of \(\mu \), such as non-negativity, \(\bar{\mu }\left[ 0,1\right] =1\), and \(\text {supp}\left( \bar{\mu }\right) \subseteq \left[ 0,1-\epsilon _{0}\right) \). Now, let us consider some \(K\in \mathbb {A}\) together with a non-increasing sequence \(\left\{ K'_{n}\right\} _{n=1}^{\infty }\) of non-decreasing functions from \(C\left[ 0,1\right] \) such that \(K'_{n}\downarrow K\) on \(\left[ 0,1-\frac{\epsilon _{0}}{2}\right] \) (we know that such a sequence always exists, given that K is non-decreasing and bounded on \(\left[ 0,1-\frac{\epsilon _{0}}{2}\right] \)). Then, given that \(\bar{\mu }\) is non-negative and \(\text {supp}\left( \bar{\mu }\right) \subseteq \left[ 0,1-\epsilon _{0}\right) \) for all \(\mu \in \Delta \), along with the weak continuity of \(\varrho \), we have

$$\begin{aligned} \sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\bar{\mu }\left( t\right)&=\sup _{\mu \in \Delta }\int _{0}^{1-\epsilon _{0}}K\left( t\right) d\bar{\mu }\left( t\right) \\&\le \sup _{\mu \in \Delta }\int _{0}^{1-\epsilon _{0}}K'_{n}\left( t\right) d\bar{\mu }\left( t\right) \\&=\sup _{\mu \in \Delta }\int _{0}^{1}K'_{n}\left( t\right) d\bar{\mu }\left( t\right) \\&=\Pi \left( K'_{n}\right) \downarrow \Pi \left( K\right) . \end{aligned}$$

This inequality has two implications. First, K is Daniell integrable for all \(\mu \in \Delta \). Second,

$$\begin{aligned} \sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\bar{\mu }\left( t\right) \le \Pi \left( K\right) . \end{aligned}$$
(7)

On the other hand, let us assume that \(\left\{ K_{n}\right\} _{n=1}^{\infty }\) is a sequence of non-decreasing functions in \(C\left[ 0,1\right] \) such that \(K_{n}\uparrow K\) pointwise. Then we have

$$\begin{aligned} \sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\bar{\mu }\left( t\right) \ge \sup _{\mu \in \Delta }\int _{0}^{1}K_{n}\left( t\right) d\bar{\mu }\left( t\right) =L^{c}\left( K_{n}\right) =\Pi \left( K_{n}\right) \uparrow \Pi \left( K\right) . \end{aligned}$$

This shows that, for all members K of \(\mathbb {A}\),

$$\begin{aligned} \sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\bar{\mu }\left( t\right) \ge \Pi \left( K\right) . \end{aligned}$$
(8)

Finally, (7) and (8) together yield (2).

Now, let us assume that \(\varrho \) is co-monotone additive and introduce, for any continuous function K, the set \(M\left( K\right) \) as follows:

$$\begin{aligned} M\left( K\right) =\left\{ \lambda \in \Delta \Bigg \vert \int _{0}^{1}K\left( t\right) d\lambda \left( t\right) =\sup _{\mu \in \Delta }\int _{0}^{1}K\left( t\right) d\mu \left( t\right) \right\} . \end{aligned}$$

Since \(\Delta \) is compact and non-empty, and the map \(\lambda \mapsto \int _{0}^{1}K\left( t\right) d\lambda \left( t\right) \) is \(\sigma \left( BV\left[ 0,1\right] ,C\left[ 0,1\right] \right) \)-continuous, \(M\left( K\right) \) is non-empty. Now, we claim that, for two non-decreasing continuous functions \(K_{1}\) and \(K_{2}\), we have \(M\left( K_{1}+K_{2}\right) \subseteq M\left( K_{1}\right) \cap M\left( K_{2}\right) \). Let us take \(\lambda \in M\left( K_{1}+K_{2}\right) \); then, by using the representation of \(L^{c}\),

$$\begin{aligned} \int _{0}^{1}K_{1}\left( t\right) d\lambda \left( t\right)&\le \sup _{\mu \in \Delta }\int _{0}^{1}K_{1}\left( t\right) d\mu \left( t\right) ,\end{aligned}$$
(9)
$$\begin{aligned} \int _{0}^{1}K_{2}\left( t\right) d\lambda \left( t\right)&\le \sup _{\mu \in \Delta }\int _{0}^{1}K_{2}\left( t\right) d\mu \left( t\right) \end{aligned}$$
(10)

and, by the co-monotone additivity of \(\varrho \), we have

$$\begin{aligned} \int _{0}^{1}K_{1}\left( t\right) d\lambda \left( t\right) +\int _{0}^{1}K_{2}\left( t\right) d\lambda \left( t\right)&=\int _{0}^{1}\left( K_{1}\left( t\right) +K_{2}\left( t\right) \right) d\lambda \left( t\right) \nonumber \\&=\sup _{\mu \in \Delta }\int _{0}^{1}\left( K_{1}\left( t\right) +K_{2}\left( t\right) \right) d\mu \left( t\right) \nonumber \\&=\Pi \left( K_{1}+K_{2}\right) \nonumber \\&=\Pi \left( K_{1}\right) +\Pi \left( K_{2}\right) . \end{aligned}$$
(11)

We can see that (9), (10), and (11) imply that (9) and (10) hold with equality, meaning that \(\lambda \in M\left( K_{1}\right) \cap M\left( K_{2}\right) \). By induction, one can then infer \(M\left( K_{1}+K_{2}+\cdots +K_{n}\right) \subseteq M\left( K_{1}\right) \cap M\left( K_{2}\right) \cap \cdots \cap M\left( K_{n}\right) \) for any n continuous and non-decreasing functions \(K_{1},\cdots ,K_{n}\). This means that, for every \(n\in \mathbb {N}\), \(M\left( K_{1}\right) \cap M\left( K_{2}\right) \cap \cdots \cap M\left( K_{n}\right) \not =\varnothing .\) Since each \(M\left( K\right) \) is a closed subset of the compact set \(\Delta \), the finite intersection property of compact sets yields

$$\begin{aligned} \mathcal {M}:=\bigcap \left\{ M\left( K\right) \left| \begin{array}{c} K\in C\left[ 0,1\right] \\ K \text { is non-decreasing } \end{array}\right. \right\} \not =\varnothing . \end{aligned}$$

Let us take \(\lambda \in \mathcal {M}.\) Then, for any continuous and non-decreasing function K, we have \(\Pi \left( K\right) =\int _{0}^{1}K\left( t\right) d\lambda \). By the same argument used to introduce the Daniell integral above, one can show that \(\lambda \) induces a measure on \(\left[ 0,1\right] \) such that \(\Pi \left( K\right) =\int _{0}^{1}K\left( t\right) d\lambda \), for any \(K\in \mathbb {A}\); that is, one can take \(\Delta =\left\{ \lambda \right\} \) in (2). This completes the proof of Theorem 1. \(\square \)

Remark 2

One may ask why we did not adopt the same approach as in [1], that is, introduce the following convex function

$$\begin{aligned} L^{cc}\left( K\right) ={\left\{ \begin{array}{ll} \Pi \left( K\right) &{} K\in C\left[ 0,1\right] \text { and }K\text { is non-decreasing}\\ +\infty &{} \text {o.w.} \end{array}\right. }, \end{aligned}$$

and use instead its Fenchel–Moreau representation, given by

$$\begin{aligned} L^{cc}\left( K\right) =\sup _{H\in \Delta _{cc}}\int _{0}^{1}K\left( t\right) dH\left( t\right) . \end{aligned}$$

The point is that, in this case (or in any similar approach), to show that the set \(\Delta _{cc}\) has non-negative members, we need to know that, for a continuous and non-decreasing function K, the sub-gradient \(\partial L^{cc}\left( K\right) \) is non-empty. On the other hand, \(\partial L^{cc}\left( K\right) \) is non-empty if K is in the interior of the domain:

$$\begin{aligned} dom\left( L^{cc}\right) =\left\{ K\in C\left[ 0,1\right] \text { and }K\text { is non-decreasing}\right\} . \end{aligned}$$

However, it is not difficult to see that the interior of \(dom\left( L^{cc}\right) \) is empty.

The approach we have chosen above allows us to work with an appropriate restriction of \(\Pi \) to the set of non-decreasing continuous functions, which can then be extended to the whole space C[0, 1].

Remark 3

It is known that, for any non-empty set \(\mathbb {X}\), the set of all real functions from \(\mathbb {X}\) to \(\mathbb {R}\), endowed with the topology of pointwise convergence, is a topological vector space, and each continuous linear functional f on this space can be written as \(f\left( K\right) =\sum _{i=1}^{n}a_{i}K\left( x_{i}\right) \), for some \(n\in \mathbb {N}\), \(\left( a_{i}\right) _{i=1}^{n}\in \mathbb {R}^{n}\), and \(\left( x_{i}\right) _{i=1}^{n}\in \mathbb {X}^{n}\) (see, for instance, [2]). That is why one might guess, at first, that the same should hold for a weakly continuous co-monotone additive risk measure (which, as we have seen, is not the case).

Remark 4

Now let us compare our main result, Theorem 1, with a similar representation of law-invariant coherent risk measures by [18]. A coherent risk measure \(\varrho \) is a mapping from \(L^{\infty }\) to \(\mathbb {R}\), with properties 1, 2, and 3 of Definition 1, and also with the following one:

4\('\). Sub-additivity: \(\varrho (X+Y)\le \varrho (X)+\varrho (Y),\forall X,Y\in L^{\infty }\).

In [18], it is shown that a law-invariant coherent risk measure \(\varrho \) that is \(\sigma \left( L^{\infty },L^{1}\right) \) lower semicontinuous can be represented as follows:

$$\begin{aligned} \varrho \left( X\right) =\sup _{m\in \mathcal {C}}\int _{0}^{1}\mathrm {CVaR}_{\alpha }\left( X\right) dm\left( \alpha \right) , \end{aligned}$$

where \(\mathcal {C}\) is a set of probability measures on \(\left[ 0,1\right] \) and

$$\begin{aligned} \mathrm {CVaR}_{\alpha }\left( X\right) =\frac{1}{1-\alpha }\int _{\alpha }^{1}\mathrm {VaR}_{t}\left( X\right) dt,\quad \forall X\in L^{\infty }. \end{aligned}$$

However, this can be written as a double integral as follows:

$$\begin{aligned} \varrho \left( X\right) =\sup _{m\in \mathcal {C}}\int _{0}^{1}\frac{1}{1-\alpha }\int _{\alpha }^{1}\mathrm {VaR}_{t}\left( X\right) dtdm\left( \alpha \right) . \end{aligned}$$

By changing the variables, one gets

$$\begin{aligned} \int _{0}^{1}\frac{1}{1-\alpha }\int _{\alpha }^{1}\mathrm {VaR}_{t}\left( X\right) dtdm\left( \alpha \right)&=\int _{0}^{1}\int _{0}^{t}\frac{1}{1-\alpha }\mathrm {VaR}_{t}\left( X\right) dm\left( \alpha \right) dt\\&=\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) \left( \int _{0}^{t}\frac{1}{1-\alpha }dm\left( \alpha \right) \right) dt\\&=\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda _{m}\left( t\right) , \end{aligned}$$

where \(\lambda _{m}\left( t\right) =\int _{0}^{t}\int _{0}^{s}(\frac{1}{1-\alpha }dm\left( \alpha \right) )ds\). Note that \(\lambda _{m}\) is a non-decreasing function such that \(\lambda _{m}\left( 0\right) =1-\lambda _{m}\left( 1\right) =0\). Indeed, from the above, one can have

$$\begin{aligned} \lambda _{m}\left( 1\right) =\left. \int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) \left( \int _{0}^{t}\frac{1}{1-\alpha }dm\left( \alpha \right) \right) dt\right| _{X=1}=\left. \int _{0}^{1}\mathrm {CVaR}_{\alpha }\left( X\right) dm\left( \alpha \right) \right| _{X=1}=1. \end{aligned}$$

Therefore, with the same abuse of notation as before, \(\lambda _{m}\) introduces a measure on \(\left[ 0,1\right] \) via \(\lambda _{m}\left( a,b\right] =\lambda _{m}\left( b\right) -\lambda _{m}\left( a\right) \). Finally, one can represent a law-invariant coherent risk measure in the following way:

$$\begin{aligned} \varrho \left( X\right) =\sup _{m\in \mathcal {C}}\int _{0}^{1}\mathrm {VaR}_{\alpha }\left( X\right) d\lambda _{m}\left( \alpha \right) . \end{aligned}$$

As one can see, this representation is very similar to the one in Theorem 1, except that the measures appearing in the supremum have the particular form \(\lambda _{m}\left( t\right) =\int _{0}^{t}\int _{0}^{s}\left( \frac{1}{1-\alpha }dm\left( \alpha \right) \right) ds\) for some \(m\in \mathcal {C}\).
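A quick numerical check of the change of variables above (our own verification, with an assumed bounded loss distribution): for a two-point mixture \(m=\frac{1}{2}\delta _{\beta _{1}}+\frac{1}{2}\delta _{\beta _{2}}\), the density of \(\lambda _{m}\) is \(\frac{1}{2}1_{\{t\ge \beta _{1}\}}/(1-\beta _{1})+\frac{1}{2}1_{\{t\ge \beta _{2}\}}/(1-\beta _{2})\), and \(\int _{0}^{1}\mathrm {VaR}_{t}\left( X\right) d\lambda _{m}\left( t\right) \) reproduces \(\frac{1}{2}\mathrm {CVaR}_{\beta _{1}}\left( X\right) +\frac{1}{2}\mathrm {CVaR}_{\beta _{2}}\left( X\right) \).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

X = stats.beta(a=2.0, b=5.0, scale=100.0)        # hypothetical bounded loss (in L^infty)
b1, b2 = 0.80, 0.95                              # m = 0.5*delta_{b1} + 0.5*delta_{b2}

def cvar(beta):
    """CVaR_beta(X) = (1/(1-beta)) * int_beta^1 VaR_t(X) dt."""
    val, _ = quad(X.ppf, beta, 1.0)
    return val / (1.0 - beta)

lhs = 0.5 * cvar(b1) + 0.5 * cvar(b2)            # int_0^1 CVaR_alpha(X) dm(alpha)

# d lambda_m / dt = int_0^t dm(alpha)/(1 - alpha)
density = lambda t: 0.5 * (t >= b1) / (1.0 - b1) + 0.5 * (t >= b2) / (1.0 - b2)
rhs, _ = quad(lambda t: X.ppf(t) * density(t), 0.0, 1.0, points=[b1, b2])

print(lhs, rhs)                                  # the two integrals coincide
```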