1 Introduction

Returns in a portfolio arise from movements in the market prices of assets, viz. stocks, commodities, market indices etc. Asset returns cannot be predicted perfectly, and the distribution of returns is unknown (see Ruppert (2004), pages 78 and 79). Modeling the return distribution is an important problem in finance. Tolikas and Gettinby (2009) studied the suitability of the Generalized Extreme Value (GEV), Generalized Pareto (GP) and Generalized Logistic (GL) distributions for modeling the distribution of extreme daily share returns on the Singapore Stock Exchange over the period 1973 to 2005. Kassberger and Kiesel (2006) investigated a multivariate extension of the Normal Inverse Gaussian (NIG) distribution for capturing the distributional features of hedge fund returns. The extreme quantiles of the return distribution are important in the context of market risk estimation and also have important applications in portfolio optimization. See for instance Ruppert (2004) and Allen et al. (2013).

Value-at-Risk (VaR) and Median Shortfall (MS) are two well known measures of market risk which are used to determine regulatory capital requirements. See for instance, Cont (2001), Danielsson (2002), Ruppert (2004), So and Wong (2012), and Santomila et al. (2018) and the references therein. Under the Solvency II standard model, VaR is the prescribed risk measure and is widely reported in financial markets (see Santomila et al. (2018)). Solvency II is a revision of the standards for evaluating the financial situation of European insurers, intended to improve risk measurement and control (see Santomila et al. (2018)). For any 0 < p < 1 and m > 0, Goh et al. (2012) interpreted the 100p percent VaR of a portfolio during (t, t + m] as the least amount of capital or cash that must be added to the portfolio at time t + m to ensure that the augmented return (portfolio return plus the cash added) is positive with probability at least p. In this sense, VaR is a measure of the capital adequacy of a portfolio over a period of length m at a given confidence level p. The VaR of a portfolio turns out to be an extreme quantile of the return distribution. See So and Wong (2012) and Dutta and Biswas (2017) and the references therein. However, the VaR has several demerits. It is not a coherent risk measure, as it is not subadditive. Also, VaR does not provide any information about the size of the potential loss when the loss exceeds the VaR level. To address these issues, the median shortfall (MS) was introduced. It is the median of the conditional loss distribution, given the event that the loss exceeds the VaR (see So and Wong (2012)). Therefore, estimation of the VaR and MS essentially reduces to the problem of estimating extreme quantiles of the return distribution (see Dutta and Biswas (2017)).

Let Pt be the price of a financial asset at time t. Let

$$ X_{t, m }=\log\left( \frac{P_{t+m}}{P_{t}}\right). $$

Xt, m is referred to as the return (in log-scale) during a time period (t, t + m]. The number m > 0 is referred to as the time scale and it can be measured in weeks, days, hours or minutes. Xt, m is widely used in finance to represent the revenue earned from investing in an asset over a time period of length m. See for instance, Ruppert (2004), Cont (2001), and Goh et al. (2012) and the references therein.

The distribution of Xt, m is in general unknown. There seems to be no single probability model that provides the best fit to all types of asset return data (values of Xt, m) across different time scales (i.e. different choices of m). See Cont (2001). The statistical properties of short term (i.e. small m) and long term (i.e. large m) return data are different. For instance, Cont (2001) reported that for a wide variety of assets the data on price fluctuations over very small time scales are highly leptokurtic and negatively skewed. The kurtosis of the marginal distribution seems to be very high for m between 5 and 30 minutes (see Cont (2001, page 226)). However, as the time scale \(m \rightarrow \infty \), the empirical distribution of asset returns resembles a normal distribution. Cont (2001) referred to this phenomenon as Aggregational Gaussianity. However, no proof of this observation seems to be available.

Most of the existing research papers seem to focus on modeling short term returns (e.g. daily returns) or on estimation of short term VaR or MS (for instance daily VaR and MS). Long-term return refers to the return generated after holding an asset for a substantial period of time. For example, in India an equity holding period exceeding one year is considered long-term. In the 1990s JP Morgan developed a VaR estimation method which was effective in measuring short term risk in the banking industry. Dowd et al. (2004) and Fedor (2007) discuss various problems with JP Morgan’s method in the context of measuring longer-term risks. These studies show that the long-term VaR is more difficult to estimate than the short-term VaR. Our model enables estimation of the long term VaR and MS.

Dowd et al. (2004) have discussed the demerits of the “square-root rule” of computing the VaR over m days (say VaR(m)) by multiplying the one day VaR by \(\sqrt {m}\). The authors argued that the formula VaR(m)=\(\sqrt {m}\)VaR(1) leads to overestimation of the VaR over m days. Under the assumption that the daily returns follow a log-normal distribution with parameters μ and σ, Dowd, Blake, and Cairns obtained the following formula for the 100p percent VaR of the absolute returns over m days.

$$ \begin{array}{@{}rcl@{}} VaR(m)&=& P-\exp\left( \mu m+\alpha_{1-p}\sigma\sqrt{m}+\ln P\right)\\ &=&P\left( 1-\exp{\left( \mu m+{\alpha_{1-p}}\sigma\sqrt{m}\right)}\right), \end{array} $$
(1.1)

where P is the current price of the portfolio or asset and α1−p is the (1 − p)th quantile of the N(0,1) distribution, 0 < p < 1. Dowd et al. (2004) obtained the formula Eq. 1.1 for VaR(m) under the assumption that the one day returns follow a log-normal distribution. But the distribution of daily returns is in general unknown. Obtaining formulae for VaR(m) under other probability distributions that can fit daily return data, such as the Student’s-t, GEV, GP and GL distributions, appears to be quite challenging. Our model for long term returns yields asymptotic approximations to the m-period VaR and MS for large m, under very general conditions on the short term 1-period returns.
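As a quick illustration, Eq. 1.1 is a one-line computation. The following Python sketch (the function name `dowd_var` and the parameter values are ours, chosen only for illustration) evaluates it:

```python
import numpy as np
from scipy.stats import norm

def dowd_var(P, mu, sigma, m, p):
    """Eq. 1.1: m-day 100p percent VaR of absolute returns, assuming
    one-day log-normal returns with parameters mu and sigma."""
    alpha = norm.ppf(1 - p)  # alpha_{1-p}, the (1-p)th quantile of N(0,1)
    return P * (1 - np.exp(mu * m + alpha * sigma * np.sqrt(m)))

# Hypothetical inputs: daily drift 0.0004 and daily volatility 0.015
print(dowd_var(P=1.0, mu=0.0004, sigma=0.015, m=250, p=0.95))
```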

We provide a probabilistic model which theoretically justifies the Gaussianity of Xt, m for large m. We shall refer to the asymptotic distribution of Xt, m for large m as the long term return distribution. To obtain this distribution, we partition the interval (t, t + m] into m non-overlapping subintervals {(t + i − 1, t + i]}i= 1,2,⋯ , m, each of length 1. Since log-returns are additive, we have the following equation

$$ \begin{array}{@{}rcl@{}} &&\ X_{t,m}=\log\left( \frac{P_{t+m}}{P_{t}}\right)=\sum\limits_{i=1}^{m}\ X_{t+i}\\ \text{where}\ &&\ X_{t+i} =\log\left( \frac{P_{t+i}}{P_{t+(i-1)}}\right),\ i=1,\cdots, m. \end{array} $$
(1.2)

Cont (2001) referred to returns over small time scales m as fine, and to returns over large m as coarse. Since Xt, m is a sum of m fine returns, a suitable Central Limit Theorem can be used to approximate the distribution of the centered and scaled Xt, m as \(m\rightarrow \infty \), provided the fine returns satisfy some common properties. Our asset return model thus consists of appropriate assumptions on the fine returns {Xt+i}i= 1, 2, ⋯, in line with the empirical properties of fine returns observed by Cont (2001).

Our Lemma 1 explains Cont’s Aggregational Gaussianity observation, i.e. that the distribution of long term returns can be approximated by the normal distribution. The normal approximation naturally leads to an approximation of the quantiles of Xt, m for large m; see, for instance, the quantile estimator Eq. 1.4. In Lemma 2 we show that the distribution of Xt, m can also be approximated by the classical i.i.d. bootstrap for large m.
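The aggregation effect behind Lemma 1 is easy to see numerically. The sketch below (our illustration; the i.i.d. heavy-tailed fine returns and the seed are arbitrary choices, not the paper’s model) sums m fine returns and compares the excess kurtosis of the fine and the coarse returns; by Lemma 1 the latter should be near zero, as for a normal distribution.

```python
import numpy as np
from scipy.stats import t, kurtosis

rng = np.random.default_rng(0)
n, m = 2000, 250                            # n coarse returns, each a sum of m fine returns
fine = 0.01 * t.rvs(df=5, size=n * m, random_state=rng)   # heavy-tailed fine returns
coarse = fine.reshape(n, m).sum(axis=1)                   # X_{t,m} = sum of m fine returns

print("excess kurtosis, fine returns  :", kurtosis(fine))    # well above 0 (about 6)
print("excess kurtosis, coarse returns:", kurtosis(coarse))  # close to 0 (Gaussian-like)
```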

This paper is divided into five sections. In Section 1, we introduce a model (viz. Eq. 1.2 and Assumption 1) for the long term return distribution, and we state and prove Lemmas 1 and 2, which provide the mathematical basis for the normal approximation and the i.i.d. bootstrap approximation of the long-term return distribution and its quantiles. In Section 2, we discuss the problem of estimating the long term VaR and MS. Equations 2.2 and 2.5 are the proposed estimators of the VaR and MS over a long time period. We also propose bootstrap based VaR and MS estimators, given by Eqs. 2.12 and 2.13, and we describe seven other VaR and MS estimators, including the “square-root of time rule (SRTR)” based estimator. In Section 3, using Monte Carlo simulations, we compare the mean squared errors (MSE) of the nine VaR and MS estimators for different choices of n (sample size of long term returns), m (duration of long term returns) and for three different time series models for the fine returns data. The simulation results are reported in Tables 1, 2, 3 and 4. The results suggest that the proposed estimators Eqs. 2.2, 2.5, 2.12 and 2.13 outperform the other estimators of the 95 percent VaR and MS for almost all choices of n, m and the time series model for the fine returns. The SRTR based estimator performs well for fine returns data generated by a GARCH(1,1) process. In the SRTR method, a probability distribution is fitted to the fine return data to estimate the short term VaR, which is then multiplied by \(\sqrt {m}\) to estimate the m-period VaR (see Spadafora et al. (2014)). If the marginal distribution of the fine returns differs from the probability distribution fitted to the fine return data, the SRTR VaR estimator performs poorly. The performance of the SRTR also deteriorates in the presence of dependence in the fine returns data. The extreme value theory based VaR estimator proposed by Drees (2003), the Sfakianakis and Verginis (2008) VaR estimator and the kernel based VaR estimators perform poorly in comparison to the proposed central limit theorem based VaR estimator Eq. 2.2 and the bootstrap based VaR estimator Eq. 2.12 for m ≥ 250. The MSEs of the proposed estimators Eqs. 2.2, 2.5, 2.12 and 2.13 do not seem to fluctuate widely under different time series models. In contrast, the performance of the other estimators seems to be sensitive to the underlying model for the fine return data.

In Section 4, we describe the unconditional backtest of Kupiec (1995). We use the proposed estimators Eqs. 2.2, 2.5, 2.12 and 2.13 and a number of other estimators, viz. the sample quantile, the Sfakianakis and Verginis estimator, the extreme value theory based estimator and the SRTR based estimator, to estimate the 95 percent annual VaR and MS of the Nifty 50 index based on the real data reported by Dutta and Das (2018). We also use these estimators to estimate the 95 percent annual VaR and MS of crude oil and gold prices based on the historical data available on the Yahoo Finance website. Our analysis suggests that while the VaR and MS of crude oil annual returns are comparable to those of the NIFTY 50 annual returns, the annual VaR and MS of gold returns are much smaller than those of the NIFTY 50 index. This indicates that gold exhibits the least market risk over a duration of one year in comparison to crude oil and the NIFTY 50 index. In the Appendix we report Tables 1–10 containing the simulation results, the real data, and the VaR and MS estimates based on the real data.

1.1 Long Term Return Distribution and Aggregational Gaussianity

As \(m\rightarrow \infty \), Eq. 1.2 represents a model for the long term return of an asset whose return history has been recorded at a large number of time points in the past. This model is incomplete without assumptions on the fine returns {Xt+(i− 1)}i= 1,2,3,⋯. Before stating our assumptions, we note the following empirical observations reported by Cont (2001):

  (i) (linear) auto-correlations of asset returns are often insignificant, except for very small intra-day time scales (say 20 minutes);

  (ii) different measures of volatility, such as absolute and squared daily returns, display a positive auto-correlation over several days; this is known as volatility clustering;

  (iii) the marginal distribution of the fine returns exhibits Pareto-like tails or properties similar to the Student’s-t distribution with four degrees of freedom; the marginal distribution of the fine returns seems to have finite variance but infinite fourth moment.

An exact specification of the marginal distribution of the fine returns is not possible, so we make assumptions in line with the above observations. We refer to {Xt+i− 1}i= 1,2,⋯ as the fine return process and make the following assumptions about it.

Assumption 1.

For any t > 0, {Xt+i− 1}i= 1,2,⋯ is a stationary strongly mixing process, with exponential mixing rate, satisfying

$$ \begin{array}{@{}rcl@{}}&&\textbf{a.}\ 0\le E\left[|X_{t+i-1}|^{2}\left|\log|X_{t+i-1}|\right|^{1+\delta}\right]<\infty,\ \text{for some}\ \delta>0,\\ &&\textbf{b.}\ Corr(X_{t+i-1},\ X_{t+k-1})=0\ \ \forall\ i\neq k,\ i,\ k=1,2,\cdots,\\ &&\textbf{c.}\ Corr(|X_{t+i}|,\ |X_{t+i-1}|)>0\ \text{and}\ Corr(X^{2}_{t+i},\ X^{2}_{t+i-1})>0\ \ \forall\ i=1,2,\cdots.\end{array} $$

Equation 1.2 and Assumption 1 represent our proposed model for the m-period return.

Under Assumption 1, E(Xt, m) = mE(Xt) and Var(Xt, m) = mVar(Xt). Thus the volatility of the m-period return increases with m. From Eq. 1.2 we get

$$ \begin{array}{@{}rcl@{}}X_{t+k,m} =\sum\limits_{i=1}^{m}\ X_{t+k+i-1}.\end{array} $$
(1.3)

Then under Assumption 1, {Xt+k, m}k= 0,1,2,.... are identically distributed with common marginal distribution function Fm. Xt+k, m denotes the return during (t + k, t + k + m].

Assumption 1 is supported by several real datasets. For instance, Dutta and Das (2018) published data on the daily log-returns of the Nifty 50 index on the National Stock Exchange (NSE) in India for the financial years (FY) 1995-96 to 2017-18. The Augmented Dickey-Fuller (ADF) test suggests that the NIFTY 50 data is stationary. The marginal variance of the 5692 observations in the data is less than 3. The auto-correlations of the daily log-returns of the Nifty 50 index seem to be insignificant, but the absolute and squared daily log returns exhibit significant auto-correlation.

The historical data on the daily closing prices of crude oil (per barrel) and gold (per troy ounce) in USD from FY 2001-02 to FY 2020-21 are obtained from the Yahoo Finance website (https://finance.yahoo.com/quote/CL%3DF/history?p=CL%3DF, https://finance.yahoo.com/quote/GC%3DF/history?p=GC%3DF). Based on the daily closing prices we obtain the log returns. The daily log returns of crude oil and gold prices exhibit empirical properties similar to those of the NIFTY 50 daily log returns. For instance, the datasets are stationary, and the auto-correlation of the daily log returns is insignificant while the squared daily log returns exhibit significant positive auto-correlation. Further, the crude oil log returns (5004 observations) and the gold log returns (5009 observations) seem to have finite marginal variance (less than 2). These observations support Assumption 1 for the fine return process.

A sequence of i.i.d. random variables with finite third moment satisfies conditions a. and b., but not c. Condition c. of Assumption 1 is in line with the phenomenon of volatility clustering observed by Cont (2001). The following processes are non-trivial examples of {Xt}t= 1,2,⋯ satisfying Assumption 1.

Example 1.

Let {Pt}t= 1,2,⋯ be a sequence of positive valued random variables such that \(\log (P_{t})\) follows a moving average process defined as follows

$$ \log(P_{t})=\theta\epsilon_{t-1}+\sqrt{1-\theta^{2}}\epsilon_{t},\ 0<\theta<1, $$

where {𝜖t}t= 1,2,⋯ is a sequence of i.i.d. N(0, 1) random variables. Further, let {Yt}t= 1,2,⋯ be a sequence of i.i.d. random variables, independent of {𝜖t}t= 1,2,⋯ (and hence of {Pt}t= 1,2,⋯), such that P(Yt = 1) = 0.5 and P(Yt = − 1) = 0.5 (for instance, one can view {Yt}t as the outcomes of a sequence of fair coin tosses). Define

$$X_{t}=Y_{t}P_{t}.$$

Then

$$ \begin{array}{@{}rcl@{}}&& Cov\left( X_{t+i},\ X_{t+k}\right)=0,\ \forall\ i\neq k,\ i,\ k=1,2,\cdots\\ && Cov\left( |X_{t}|,\ |X_{t-1}|\right)=Cov\left( P_{t},\ P_{t-1}\right)=e\left( e^{\theta\sqrt{1-\theta^{2}}}-1\right)>0,\\ &&Cov\left( |X_{t}|,\ |X_{t-k}|\right)=0,\ k\ge 2.\end{array} $$
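A short simulation (our sketch; θ, the seed and the sample size are arbitrary) can confirm these covariances: the returns are uncorrelated at lag one, while the absolute returns show the positive lag-one covariance computed above.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, N = 0.6, 200_000
eps = rng.standard_normal(N + 1)
logP = theta * eps[:-1] + np.sqrt(1 - theta**2) * eps[1:]   # log(P_t), an MA(1)
X = rng.choice([-1.0, 1.0], size=N) * np.exp(logP)          # X_t = Y_t * P_t

lag1_cov = lambda z: np.cov(z[:-1], z[1:])[0, 1]
print("Cov(X_t, X_{t-1})    :", lag1_cov(X))                # approximately 0
print("Cov(|X_t|, |X_{t-1}|):", lag1_cov(np.abs(X)))        # positive
print("theoretical value    :", np.e * (np.exp(theta * np.sqrt(1 - theta**2)) - 1))
```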

Example 2.

Let {Pt}t= 1,2,⋯ be a sequence of positive valued random variables such that \(\log (P_{t})\) follows an autoregressive process defined as follows

$$ \log(P_{t})=\theta\log(P_{t-1})+\sqrt{1-\theta^{2}}\epsilon_{t},\ 0<\theta<1, $$

where {𝜖t}t= 1,2,⋯ is a sequence of i.i.d. N(0, 1) random variables and {Yt}t= 1,2,⋯ is a sequence of i.i.d. random variables as defined in Example 1, independent of {Pt}t= 1,2,⋯ and {𝜖t}t= 1,2,⋯. Define Xt = YtPt. Then {Xt+i− 1}i= 1,2,⋯ satisfies Assumption 1. For instance, it is easy to verify that for this example

$$ Cov\left( X_{t},\ X_{t-1}\right)=0,\ \ Cov\left( |X_{t}|,\ |X_{t-1}|\right)=Cov\left( P_{t},\ P_{t-1}\right)=e\left( e^{\theta}-1\right). $$

Example 3.

Let {Xt}t= 1,2,⋯ follow a stationary GARCH(1,1) process defined as follows

$$ \begin{array}{@{}rcl@{}}X_{t}&=&\sigma_{t}Z_{t},\\ {\sigma^{2}_{t}}&=& C+ \alpha X^{2}_{t-1}+\beta\sigma^{2}_{t-1}, \end{array} $$

where C, α, β > 0 and α + β < 1, and {Zt}t= 1,2,⋯ is a sequence of martingale differences with mean 0 and variance 1. Posedel (2005) has studied the properties of the GARCH(1,1) model in detail. Under the assumptions that α + β < 1 and β2 + 2αβ + 3α2 < 1, {Xt} is a stationary uncorrelated process with \(Var(X_{t})=\frac {C}{1-\alpha -\beta }\) and finite fourth moment, and \(\{{X^{2}_{t}}\}\) is an ARMA process with positive auto-correlation. Hence Assumption 1 is satisfied.
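The recursion in Example 3 is straightforward to simulate. The sketch below is ours; it uses Gaussian innovations (a special case of the martingale differences above) and a burn-in period, and checks the stationary variance formula \(\frac{C}{1-\alpha-\beta}\).

```python
import numpy as np

def simulate_garch11(C, alpha, beta, size, burn=1000, rng=None):
    """Simulate X_t = sigma_t Z_t with sigma_t^2 = C + alpha X_{t-1}^2 + beta sigma_{t-1}^2."""
    rng = rng or np.random.default_rng()
    x = np.empty(size + burn)
    sig2 = C / (1 - alpha - beta)            # start at the stationary variance
    for i in range(size + burn):
        x[i] = np.sqrt(sig2) * rng.standard_normal()
        sig2 = C + alpha * x[i] ** 2 + beta * sig2
    return x[burn:]                          # discard the burn-in segment

X = simulate_garch11(C=1e-4, alpha=0.4, beta=0.5, size=100_000)
print(X.var(), 1e-4 / (1 - 0.4 - 0.5))       # both close to 0.001
```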

Using the Central Limit Theorem (CLT) for strongly mixing sequences of random variables by Herrndorf (1985), we have the following result.

Lemma 1.

Let {Xt+i− 1}i= 1,2,⋯ be an α-mixing stationary process satisfying Assumption 1. Then

$$ \frac{X_{t,m}-mE(X_{t})}{\sqrt{mVar(X_{t})}}\rightarrow_{D} N(0,1)\ \text{as}\ m\rightarrow \infty, $$

where m is the number of fine (short-term) returns aggregated over the interval (t, t + m].

Remark 1.

1. Lemma 1 explains Cont’s Aggregational Gaussianity observation, i.e. the distribution of long term return can be approximated by the normal distribution.

2. Let Qm, p denote the 100p th percentile of the marginal distribution of Xt, m. Lemma 1 motivates the following estimator of Qm, p for large m

$$ \hat{Q}_{m,p}=mE(X_{t})+\sqrt{mVar(X_{t})}{\Phi}^{-1}(p),\ 0<p<1. $$
(1.4)

In practice, E(Xt) and Var(Xt) are approximated by the mean and the variance of the m observed fine returns.
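In code, the plug-in version of Eq. 1.4 takes only the observed fine returns. The following is a minimal sketch (the function name `q_hat` is hypothetical):

```python
import numpy as np
from scipy.stats import norm

def q_hat(fine_returns, m, p):
    """Plug-in estimator of Q_{m,p} from Eq. 1.4: the sample mean and variance
    of the observed fine returns stand in for E(X_t) and Var(X_t)."""
    mu = np.mean(fine_returns)
    var = np.var(fine_returns, ddof=1)
    return m * mu + np.sqrt(m * var) * norm.ppf(p)
```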

The following lemma ensures that one can also use the i.i.d. bootstrap method of Efron (1979) to approximate the sampling distribution of \(\frac {X_{t,m}-mE(X_{t})}{\sqrt {mVar(X_{t})}}\) for large m.

Lemma 2.

Let {Xt+i− 1}i= 1,2, ⋯ be a sequence of stationary random variables satisfying Assumption 1 such that \(E\left (|X_{t}|^{3}\right )<\infty \). Then as \(m\rightarrow \infty \),

$$\frac{X^{*}_{t, m }-X_{t,m}}{\sqrt{m}S_{m}}\rightarrow_{D} N(0,1),\ \text{almost surely},$$

where \(X^{*}_{t, m}={\sum }^{m}_{i=1}X^{*}_{t+i-1}\), with \(X^{*}_{t},\ X^{*}_{t+1},\cdots ,\ X^{*}_{t+m-1}\) i.i.d. draws from the empirical distribution of Xt, Xt+ 1, ⋯ , Xt+m− 1, and Sm is the sample standard deviation of Xt, Xt+ 1, ⋯ , Xt+m− 1.

Proof.

Using the arguments in the proof of Theorem 2.2 of Lahiri (2003, page 21), we see that it is enough to show that under Assumption 1 and the assumption that \(E\left (|X_{t}|^{3}\right )<\infty \), as \(m\rightarrow \infty \)

$$ {S^{2}_{m}}\rightarrow Var(X_{t}),\ \frac{1}{m^{3/2}}\sum\limits_{i=1}^{m}|X_{t+i-1}|^{3}\rightarrow 0,\ \text{almost surely}. $$
(1.5)

A stationary strongly mixing process is stationary ergodic (see Rieders (1993)). Hence, under Assumption 1, {Xt}t∈Z is a stationary ergodic process with finite marginal variance, and the Birkhoff ergodic theorem ensures that as \(m\rightarrow \infty \)

$$\frac{1}{m}\sum\limits_{i=1}^{m} X_{t+i-1}\rightarrow E(X_{t}),\ \frac{1}{m}\sum\limits_{i=1}^{m} X^{2}_{t+i-1}\rightarrow E({X^{2}_{t}}),\ \text{almost surely}$$

Applying the ergodic theorem to \(|X_{t+i-1}|^{3}\) (whose mean is finite by assumption) likewise gives \(\frac {1}{m}{\sum }_{i=1}^{m}|X_{t+i-1}|^{3}\rightarrow E(|X_{t}|^{3})\) almost surely, so that \(\frac {1}{m^{3/2}}{\sum }_{i=1}^{m}|X_{t+i-1}|^{3}=m^{-1/2}\cdot \frac {1}{m}{\sum }_{i=1}^{m}|X_{t+i-1}|^{3}\rightarrow 0\) almost surely. Consequently Eq. 1.5 holds, and this completes the proof. □

Lemma 2 implies that Φ− 1(p) in Eq. 1.4 can be approximated by the p th quantile of the distribution of \(\frac {X^{*}_{t, m }-X_{t,m}}{\sqrt {m}S_{m}}\) to obtain an estimate of Qm, p.

2 Estimation of VaR and MS

A risk measure ρ is a functional of a random variable representing the return of a portfolio over a certain holding period. In this paper, Xt, m is the random variable representing the m-period return. A law invariant risk measure is a functional of the marginal return distribution function. The Value at Risk (VaR) and the Median Shortfall (MS) are two well known law-invariant risk measures (see Dutta and Biswas (2017)). Goh et al. (2012) defined the VaR as follows.

Definition 1.

For 0 < p < 1, the 100p percent VaR during (t, t + m], denoted by VaRm, p, is a number satisfying

$$VaR_{m, p}=\inf\{x\ge 0:P\left( X_{t, m}+x\ge 0\right)\ge p\}.$$

Remark 2.

Since \(P\left (X_{t, m}+x\ge 0\right )=P\left (X_{t, m}\ge -x\right )\), it is easy to verify that

$$ VaR_{m, p}=-Q_{m, 1-p}, $$
(2.1)

where Qm, p is the p th quantile of the marginal distribution of Xt, m, i.e. \(Q_{m, p}=\inf \{y:F_{m}(y)\ge p\}\).

Under Assumption 1 and as \(m\rightarrow \infty \), using Eq. 1.4 we can approximate VaRm, p by the following estimator

$$ \widehat{VaR}_{m, p}=-\hat{Q}_{m, 1-p}=-mE(X_{t})-\sqrt{mVar(X_{t})}{\Phi}^{-1}(1-p). $$
(2.2)

E(Xt) and Var(Xt) are in general unknown, but under Assumption 1 these parameters can be estimated consistently from the observed fine returns; we estimate E(Xt) and Var(Xt) by the mean and variance of the observed fine returns. One of the reviewers suggested taking into account the empirically observed property that the conditional second moment of asset returns is time-varying while estimating the VaR. The conditional variance of Xt+i, given Xt, Xt+ 1, ⋯ , Xt+i− 1, can be modeled by a stationary GARCH model satisfying Assumption 1. For instance, the GARCH(1,1) model in Example 3 satisfies Assumption 1 and takes into account the time variation of the conditional variance of the fine returns. Under the GARCH(1,1) model in Example 3, \(Var(X_{t})=\frac {C}{1-\alpha -\beta }\), and C, α, β can be estimated by the maximum likelihood method based on the fine returns. However, if we replace Var(Xt) in Eq. 2.2 by this formula, the resulting VaR estimate is not robust, i.e. the mean squared error (MSE) of the resulting estimator varies widely across different time series models for the fine return process. For example, if we fit a GARCH(1,1) model to the time series in Example 1 and estimate Var(Xt) in Eq. 2.2 by \(\frac {C}{1-\alpha -\beta }\), the resulting VaR estimate performs very poorly (large MSE) in comparison to the proposed VaR estimator obtained by replacing E(Xt) and Var(Xt) in Eq. 2.2 by the sample mean and variance of the fine returns. Therefore, we do not recommend estimating Var(Xt) in Eq. 2.2 by fitting a specific model to the fine return time series, as the resulting VaR estimate is a function of the model parameters and is highly sensitive to the choice of the model for the fine returns. In contrast, the VaR estimator obtained by estimating E(Xt) and Var(Xt) in Eq. 2.2 by the sample mean and variance of the fine returns is a nonparametric estimator of the VaR and does not depend on the model generating the fine return process.

Remark 3.

Since \(X_{t,m}=\log \left (\frac {P_{t+m}}{P_{t}}\right )\) and Qm, p is the 100p th percentile of the distribution of Xt, m, the 100p th percentile of the distribution of the m-period absolute return \(\left (\frac {P_{t+m}}{P_{t}}-1\right )\) is equal to \(\exp \{Q_{m,p}\}-1\). Consequently, under Assumption 1 and for large m, we can approximate the 100p percent m-period VaR in absolute scale by the following estimator

$$ {-\left( \exp\{Q_{m,1-p}\}-1\right)=1-\exp{\{mE(X_{t})+\sqrt{mVar(X_{t})}{\Phi}^{-1}(1-p)\}}}, $$
(2.3)

which is in fact equal to the formula Eq. 1.1 obtained by Dowd et al. (2004) with P = 1, μ = E(Xt) and \(\sigma =\sqrt {Var(X_{t})}\). Dowd, Blake, and Cairns (Dowd et al. 2004) obtained Eq. 1.1 under the assumption that the daily (1-period) returns follow a log-normal distribution with parameters μ and σ. In contrast, Eq. 2.3 is obtained under Assumption 1 without requiring knowledge of the exact distribution of the fine returns.

One of the demerits of VaR is that it does not provide any information about the size of the potential loss over a time scale m when the loss exceeds the VaR level. The conditional loss distribution, given − Xt, m > VaRm, p, is in general unknown. Also, VaR is not a coherent risk measure (see So and Wong (2012)). To overcome these issues, So and Wong (2012) introduced another risk measure called the median shortfall (MS).

The distribution function Θp of the conditional loss distribution, given that the loss − Xt, m exceeds the VaR level, is defined as

$$ \begin{array}{@{}rcl@{}}{\Theta}_p(x)=\begin{cases}P\left( -X_{t,m}\leq x\ \middle|\ -X_{t,m} > VaR_{m,p}\right),\ \ \text{if}\ x> VaR_{m,p},\\ 0,\ \ \text{otherwise.}\end{cases} \end{array} $$

The median of this distribution Θp is called the median shortfall (MS). It is straightforward to verify that

$$ \begin{array}{@{}rcl@{}} {\Theta}_p(x) = \begin{cases}{1-\frac{F_m(-x)}{1-p}}, \ \ \text{if}\ x> VaR_{m,p}, \\ 0,\ \ \text{otherwise,} \end{cases} \end{array} $$
(2.4)

where Fm is the marginal distribution function of Xt, m. Fm is unknown, but Lemma 1 implies that as \(m\rightarrow \infty \), Fm can be approximated by a normal distribution under Assumption 1.

Definition 2.

The 100p percent MS, denoted by MSm, p, is defined as follows

$$MS_{m, p}=\inf\{x: {\Theta}_{p}(x)\ge 0.5\}.$$

By the above definition, the MS is the median loss when the loss exceeds the VaR (see So and Wong (2012)). Therefore, MS gives the median depreciation of the asset value during a time scale m, under the worst-case scenario quantified by the m − period VaR.

The following Lemma is a direct consequence of the definition of MS and Eq. 2.4.

Lemma 3.

Let Fm be a continuous distribution function. Then

$$MS_{m, p}=VaR_{m, 0.5+0.5p}.$$

Therefore for any 0 < p < 1, an estimator \(\hat {MS}_{m, p}\) of the 100p percent MS is defined as follows

$$ \begin{array}{@{}rcl@{}} \widehat{MS}_{m, p}&=&\widehat{VaR}_{m, 0.5+0.5p}=-\hat{Q}_{m,1-0.5(1+p)}=-mE(X_{t})\\ &&-\sqrt{mVar(X_{t})}{\Phi}^{-1}(1-0.5(1+p)). \end{array} $$
(2.5)
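Via Eq. 2.1 and Lemma 3, the CLT based VaR and MS estimators are simple transformations of the plug-in quantile sketched after Eq. 1.4. A minimal version (the function names are ours):

```python
import numpy as np
from scipy.stats import norm

def var_clt(fine_returns, m, p):
    """Eq. 2.2: the 100p percent m-period VaR estimate, -Q_hat_{m,1-p}."""
    mu, var = np.mean(fine_returns), np.var(fine_returns, ddof=1)
    return -m * mu - np.sqrt(m * var) * norm.ppf(1 - p)

def ms_clt(fine_returns, m, p):
    """Eq. 2.5 via Lemma 3: the 100p percent MS is the VaR at level 0.5 + 0.5p."""
    return var_clt(fine_returns, m, 0.5 + 0.5 * p)
```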

2.1 Other Non-Parametric VaR and MS Estimators

Let X1, m, X2, m,..., Xn, m be identically distributed copies of Xt, m with distribution function Fm, and let X(1), m, X(2), m,..., X(n), m denote the corresponding order statistics. The m-period 100p percent VaR is equal to − Qm,1−p (see Eq. 2.1), where 0 < p < 1. Equation 2.1 and Lemma 3 imply that the 100p percent MS is equal to − Qm,1 − 0.5(1+p). Therefore, estimating the m-period 100p percent VaR and MS essentially amounts to estimating − Qm,1−p and − Qm,1 − 0.5(1+p) based on X1, m, X2, m,..., Xn, m.

Dutta and Biswas (2017) have reviewed the performance of a number of non-parametric quantile estimators which can be used to estimate Qm, p. The following are some of the estimators which performed well in their simulation study.

Sample Quantile and Kernel Quantile Estimator

The p th sample quantile \(X_{\left (\left \lfloor np \right \rfloor +1 \right ),m}\) is a natural estimator of Qm, p. The asymptotic properties of the sample quantile are well known under the i.i.d. assumption (see Serfling (1980)). They have also been studied extensively under various dependence assumptions; see, for instance, Sun (2006), Wu (2005), and Wang et al. (2011) and the references therein. Under a strong mixing dependence assumption (with polynomial mixing rate), Wang et al. (2011) obtained a Bahadur representation of the sample quantile, which provides insight into its rate of strong convergence. Dutta and Biswas (2017) have reviewed these properties in detail. We denote the sample quantile estimator by SQp.

A kernel estimator of Qm, p is defined as follows

$$\widehat{Q}_{m, p}=\inf\left \{ x:\widehat{F}_{m}(x)\geq p \right \}$$

where \(\widehat {F}_{m}\) is a kernel distribution function estimator (see Dutta and Biswas (2017)) which is defined as follows

$$\widehat{F}_{m}(y)=\frac{1}{n}\sum\limits_{i=1}^{n}K\left( \frac{y-X_{i,m}}{h_{n}}\right),$$

where K is a distribution function known as the kernel and {hn} is a positive sequence referred to as the bandwidth. In kernel-based methods, the main challenge lies in the selection of the bandwidth hn. Polansky and Baker (2000), Chen and Tang (2005), and Alemany et al. (2013) provide some choices of hn. Using the “kerdiest” package in the R software one can compute the kernel distribution function estimate; the default bandwidth is the formula proposed by Polansky and Baker (2000), given by

$$h_{n}=\left (\frac{\rho (K)}{-n{\mu_{2}^{2}}(K)\widehat{\psi_{2}}(g_{2}) } \right )^{1/3},$$

where \(\rho (K)=2{\int \limits }_{-\infty }^{\infty }uw(u)G(u)du\), \(\mu _{2}(K)={\int \limits }_{-\infty }^{\infty }u^{2}w(u)du\) and \(\psi _{r}(g)=\frac {1}{n^{2}g^{r+1}}{\sum }_{i=1}^{n}{\sum }_{j=1}^{n}L^{(r)}\left (\frac {u_{i}-u_{j}}{g} \right )\), with r ≥ 2 an even integer and \(g_{2} = \left (\frac {2L^{(2)}(0)}{-n{\mu _{2}^{2}}(L)\psi _{4}}\!\right )^{1/5}\). Here L is a kernel function, not necessarily equal to the kernel w, and \(G(u)={\int \limits }_{-\infty }^{u}w(t)dt\).

We denote the Polansky and Baker quantile estimator by PBp.

Chen and Tang (2005) suggested the following choice for the optimal value of hn,

$$h_{n}=\left\{ \frac{2f^{3}\left (Q_{p} \right )b_{k}}{{\sigma_{k}^{4}}\left (f^{(1)}\left (Q_{p} \right ) \right )^{2}}\right\}^{1/3}n^{-1/3},$$

where \(b_{k}={\int \limits } uw(u)G(u)du\) and \({\sigma _{k}^{2}}={\int \limits } u^{2}w(u)du\), with w a probability density function with zero mean and finite variance, known as the kernel, and G(⋅) the distribution function of the distribution with density w. This hn involves the unknown quantities Qp, f and its derivative f(1) at Qp. Chen and Tang (2005) suggested approximating Qp in hn by the corresponding sample quantile, and f and f(1) by the density of the generalized Pareto distribution and its first derivative. We denote Chen and Tang’s quantile estimator by CTp.
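To make the kernel recipe concrete, the sketch below inverts a Gaussian-kernel distribution function estimate numerically. It deliberately uses a crude placeholder bandwidth; it does not implement the Polansky-Baker or Chen-Tang bandwidth formulas above, which require the pilot quantities just described.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def kernel_quantile(x, p, h=None):
    """Invert F_hat(y) = mean(K((y - X_i)/h)) with K the N(0,1) CDF.
    h is a simple placeholder bandwidth, not an optimal choice."""
    x = np.asarray(x, dtype=float)
    h = h or x.std(ddof=1) * len(x) ** (-1.0 / 3.0)
    F = lambda y: norm.cdf((y - x) / h).mean() - p
    lo, hi = x.min() - 5 * h, x.max() + 5 * h    # bracket containing the root
    return brentq(F, lo, hi)
```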

Harrell-Davis Estimator

Harrell and Davis (1982) introduced a quantile estimator (we call HDp) which is a weighted linear combination of order statistics and defined as follows:

$$ \begin{array}{@{}rcl@{}} HD_{p}&=& \sum\limits_{i=1}^{n}w_{(i)}X_{(i),m}, \end{array} $$
(2.6)
$$ \begin{array}{@{}rcl@{}} w_{(i)}&=&I_{i/n}\left (p\left (n+1 \right ),\left (1-p \right )\left (n+1 \right ) \right )\\ &&-I_{(i-1)/n}\left (p\left (n+1 \right ),\left (1-p \right )\left (n+1 \right ) \right ), i=1,2,..,n\end{array} $$
(2.7)

where Ix(a, b) denotes the incomplete beta function. It is available in the R software (see the hdquantile function in the Hmisc package). − HD1−p estimates the 100p percent VaR, and the 100p percent MS is the 100(0.5p + 0.5) percent VaR.
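Equations 2.6-2.7 translate directly into code via the incomplete beta function; a minimal sketch:

```python
import numpy as np
from scipy.special import betainc

def harrell_davis(x, p):
    """Harrell-Davis estimator HD_p of Eqs. 2.6-2.7."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    grid = np.arange(n + 1) / n                # 0/n, 1/n, ..., n/n
    w = np.diff(betainc(a, b, grid))           # w_i = I_{i/n}(a,b) - I_{(i-1)/n}(a,b)
    return np.dot(w, x)
```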

Sfakianakis and Verginis estimator

Sfakianakis and Verginis (2008) introduced three L-statistic type estimators, SV1p, SV2p and SV3p (see Sfakianakis and Verginis (2008) for a detailed discussion). Among these, SV3p seems to be the most appropriate estimator of Qm, p, especially for 1 − p close to zero. It is defined as follows:

$$ SV3_{p}= \sum\limits_{i=1}^{n}B\left (i,n,p \right )X_{(i),m}+\left (2X_{(1),m}-X_{(2),m} \right )B(0,n,p) $$
(2.8)

where B(i, n, p) is the probability mass function of the Binomial distribution with parameters n and p, evaluated at i. − SV31−p estimates the 100p percent VaR, and the 100p percent MS is the 100(0.5p + 0.5) percent VaR.
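SV3p is likewise a short weighted sum of order statistics; a sketch using the binomial probability mass function:

```python
import numpy as np
from scipy.stats import binom

def sv3(x, p):
    """Sfakianakis-Verginis estimator SV3_p of Eq. 2.8."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    w = binom.pmf(np.arange(1, n + 1), n, p)    # B(i, n, p) for i = 1, ..., n
    return np.dot(w, x) + (2 * x[0] - x[1]) * binom.pmf(0, n, p)
```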

2.2 VaR and MS Estimation Based on Extreme Value Theory (EVT)

In this approach the idea is to use the extreme observations above some threshold in the observed data to estimate Qm, p for p close to 1, and hence the VaR and the MS (see Dutta and Biswas (2017), Drees (2003) and the references therein). From Definition 1, VaRm, p is equal to the p th quantile of the marginal distribution of Yt, m = −Xt, m. Since the extreme value theory based estimator is suited to estimating quantiles in the right tail of a distribution, we find the 100p percent VaR by estimating the p th quantile of Yt, m.

The Pickands-Balkema-de Haan theorem (see Balkema and de Haan (1974)) states that the conditional distribution of the exceedance over a threshold, given that a random variable exceeds the threshold, can be well approximated by the Generalized Pareto distribution (GPD), provided the distribution function of the random variable is in the domain of attraction of a Generalized Extreme Value (GEV) distribution. Based on this theorem, a GPD is fitted to the k largest observations in the sample to approximate the tail of the conditional distribution of Yt, m − u given Yt, m > u, where u is the threshold. Usually the threshold is chosen to be the (n − k)th order statistic, where n is the number of observed values of Yt, m in the data. Let Fm now denote the distribution function of Yt, m. The choice of k is not straightforward. For small k (i.e. a large threshold), the GPD approximation of the tail is more accurate, assuming Fm is in the domain of attraction of a GEV distribution, but fewer observations are available for fitting the GPD. In contrast, for large k more data are available for fitting the GPD, but the GPD approximation to the tail is biased.

Drees (2003) extended the extreme value theory for estimation of extreme quantiles of the marginal distribution of a stationary time series from the i.i.d. setting to β-mixing type dependence, which covers a broad class of time series models. The author assumed that the common distribution function Fm satisfies, as λ ↓ 0,

$$ \frac{F^{-1}_{m}(1-\lambda t)}{F^{-1}_{m}(1-\lambda)}\rightarrow \frac{1}{t^{\xi}},\ \forall t>0 $$

for some ξ > 0, where \(F^{-1}_{m}\) is the quantile function of Fm. Under the further assumptions that, as \(n\rightarrow \infty \), \(p\rightarrow 1\) and \(k_{n}\rightarrow \infty \) in such a way that n(1 − p) = O(1) and \(\frac {k_{n}}{n}=o(1)\), one can argue that

$$ Q_{m,p}\approx Q_{m,1-k_{n}/n} \left( \frac{k_{n}}{n(1-p)}\right)^{\xi}, $$

see equation (4) in Drees (2003). A suitable estimator of the tail index ξ is the Hill estimator

$$ \hat \xi=\frac{1}{k_{n}}\sum\limits_{i=1}^{k_{n}}\log\frac{Y_{(n-i+1), m}}{Y_{(n-k_{n}), m}}, $$

where Y(i), m, i = 1,⋯ , n are the n ordered observations (from smallest to largest) of Yt, m. The above approximation naturally leads to the following estimator

$$ {EVT_{p}= Y_{(n-k_{n}), m}\left( \frac{k_{n}}{n(1-p)}\right)^{\hat \xi},} $$
(2.9)

(see Drees (2003)). Therefore, in the EVT approach the 100p percent m-period VaR and MS estimators are EVTp and EVT0.5(1+p) respectively.
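A sketch of the Hill-based estimator in Eq. 2.9, for observed losses y and a user-supplied k (choosing k is the delicate point discussed above; the code assumes the (n − k)th ordered loss is positive):

```python
import numpy as np

def evt_quantile(y, p, k):
    """EVT_p of Eq. 2.9: Hill tail-index estimate from the k largest losses,
    extrapolated from the (n-k)th order statistic."""
    y = np.sort(np.asarray(y, dtype=float))     # ascending order statistics
    n = len(y)
    y_nk = y[n - k - 1]                         # Y_{(n-k),m} in 1-based notation
    xi_hat = np.mean(np.log(y[n - k:] / y_nk))  # Hill estimator over the top k losses
    return y_nk * (k / (n * (1 - p))) ** xi_hat
```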

2.3 VaR and MS Estimation Based on Square Root of Time Rule (SRTR)

Dowd et al. (2004) have stated the “square-root of time rule” based VaR estimator which is given by

$$VaR(m)=\sqrt{m}VaR(1),$$

where VaR(1) is the VaR over 1 day and VaR(m) is the VaR over m days. VaR(1) is estimated under suitable distributional assumptions and is multiplied by the square root of m to get the m day VaR.

Spadafora et al. (2014) have proposed a VaR scaling formula at confidence level 1 − α for a horizon of m days which is given by,

$$ VaR(m, \alpha)=\sqrt{m}\frac{F^{-1}(\alpha)}{ F^{-1}(0.01)}VaR(1,0.01), $$
(2.10)

where F− 1(α) is the α th quantile of the distribution of the short term (1 day) returns and VaR(1,0.01) is the estimated daily VaR at the 99 percent confidence level.

The authors considered three distributions, viz. the Normal (N), Student’s t (ST) and Variance-Gamma (VG) distributions, for fitting the short term (1 day) return distribution and found that the ST and VG distributions yield better fits. Cont (2001) observed that the marginal distributions of fine returns exhibit properties similar to the Student’s-t distribution. We fit a Student’s t distribution to the fine return data and estimate the 100p percent daily VaR by the p th quantile of the fitted distribution. From Eq. 2.10 we get the following formula for the 100p percent VaR over m days.

$$ \widehat{VaR}(m,p)=\sqrt{m}F_{ST}^{-1}(p), $$
(2.11)

where \(F_{ST}^{-1}(p)\) is the p th quantile of the Student’s t distribution fitted to the fine return data.
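The SRTR estimator in Eq. 2.11 only needs a Student’s t fit to the fine returns; a sketch (for a fitted distribution that is roughly centered at zero, the p th quantile approximates the daily loss quantile, as in the text):

```python
import numpy as np
from scipy.stats import t

def srtr_var(fine_returns, m, p):
    """Eq. 2.11: fit a Student's t to the fine returns and scale its
    p-th quantile by sqrt(m)."""
    df, loc, scale = t.fit(fine_returns)        # maximum likelihood fit
    return np.sqrt(m) * t.ppf(p, df, loc=loc, scale=scale)
```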

2.4 VaR and MS Estimation by Bootstrap Approach

Using Lemma 2, we can replace Φ− 1(1 − p) in Eq. 2.2 by the (1 − p)th quantile of the distribution of \(\frac {X^{*}_{t, m }-X_{t,m}}{\sqrt {m}S_{m}}\). We denote the resulting proposed estimator by \(\widehat {VaR}_{boot,p}\):

$$ \widehat{VaR}_{boot,p}=-mE(X_{t})-\sqrt{mVar(X_{t})} \times D_{1-p}, $$
(2.12)

where D1−p is the (1 − p)th quantile of the distribution of \(\frac {X^{*}_{t, m }-X_{t,m}}{\sqrt {m}S_{m}}\).

The 100p percent MS can be estimated by,

$$ \begin{array}{@{}rcl@{}} \widehat{MS}_{boot,p}&=&-mE(X_{t})-\sqrt{mVar(X_{t})} \times D_{1-0.5(1+p)}\\ &=&-mE(X_{t})-\sqrt{mVar(X_{t})} \times D_{0.5(1-p)} \end{array} $$
(2.13)
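Operationally, Eqs. 2.12 and 2.13 replace the normal quantile in Eqs. 2.2 and 2.5 by a resampled one. A vectorized sketch (ours; it takes the m observed fine returns as input):

```python
import numpy as np

def var_ms_boot(fine_returns, p, D=10_000, rng=None):
    """Eqs. 2.12-2.13: bootstrap the studentized sum of Lemma 2,
    (X*_{t,m} - X_{t,m}) / (sqrt(m) S_m), and plug its quantiles into Eq. 2.2."""
    rng = rng or np.random.default_rng()
    x = np.asarray(fine_returns, dtype=float)   # the m observed fine returns
    m = len(x)
    mu, var, s = x.mean(), x.var(ddof=1), x.std(ddof=1)
    idx = rng.integers(0, m, size=(D, m))       # i.i.d. resampling with replacement
    stats = (x[idx].sum(axis=1) - x.sum()) / (np.sqrt(m) * s)
    var_boot = -m * mu - np.sqrt(m * var) * np.quantile(stats, 1 - p)         # Eq. 2.12
    ms_boot = -m * mu - np.sqrt(m * var) * np.quantile(stats, 0.5 * (1 - p))  # Eq. 2.13
    return var_boot, ms_boot
```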

3 Simulation Study

The exact mean squared errors (MSE) of the above mentioned VaR and MS estimators are difficult to obtain in general. However, we can approximate and compare the MSEs of these estimators in a Monte-Carlo (MC) simulation study for specific data generating models satisfying Assumption 1. Given a data generating model, we generate B samples, each of size n. Based on the B values of a statistic, say T1, T2,⋯ , TB, the MC estimate of the MSE is \(\frac {1}{B}{\sum }^{B}_{i=1}\left (T_{i}-\theta \right )^{2}\), where 𝜃 is the parameter of interest. We use B = 10,000.

In general the stochastic process generating the observed data is not known. However, in an MC simulation study we can compute the MC estimate assuming some test distribution or data generating process. In this simulation study we consider the following three time series models, described in Examples 1-3, for the 1-period returns {Xt}.

(I) Xt = YtPt, where \(\left \{ P_{t} \right \}_{t=1,2,...}\) is a sequence of positive valued random variables such that \(\log (P_{t})\) follows the moving average process

$$\log(P_{t})=\theta \epsilon_{t-1}+\sqrt{1-\theta^{2}}\epsilon_{t},$$

where \(\left \{ \epsilon _{t} \right \}_{t=1,2...}\) is a sequence of i.i.d. N(0,1) random variables and \(\left \{ Y_{t} \right \}_{t=1,2,...}\) is a sequence of i.i.d. random variables, independent of \(\left \{ \epsilon _{t} \right \}_{t=1,2...}\) (and hence of \(\left \{ P_{t} \right \}_{t=1,2,...}\)), such that P(Yt = 1) = 0.5 and P(Yt = − 1) = 0.5.

(II) Xt = YtPt, where \(\left \{ P_{t} \right \}_{t=1,2,...}\) is a sequence of positive valued random variables such that \(\log (P_{t})\) follows the autoregressive process

$$\log(P_{t})=\theta \log(P_{t-1})+\sqrt{1-\theta^{2}}\epsilon_{t},$$

where 𝜖t and Yt are defined in the same way as in the first model.

(III) The GARCH(1,1) model Xt = σtZt, where \({\sigma _{t}^{2}}=0.0001+0.4X_{t-1}^{2}+0.5\sigma _{t-1}^{2}\) and {Zt}t= 1,2,… are i.i.d. standard normal random variables.

For each of the time series models (I) to (III), we generate n × m values of Xt, viz. {Xt+i}i= 1,⋯ , m, t = 1,⋯ , n. Consequently, \(X_{t,m}={\sum }^{m}_{i=1} X_{t+i},\ t=1,\cdots ,n\) are the n values of the m-period return Xt, m. The unknown parameters E(Xt) and Var(Xt) in our VaR and MS estimators Eqs. 2.2, 2.5, 2.12 and 2.13 are estimated by the mean and variance of the n × m values {Xt+i}i= 1,⋯ , m, t= 1,⋯ , n. We compute the other VaR and MS estimators based on Xt, m, t = 1,⋯ , n, which represent a sample of size n of long term returns of duration m. The process is repeated B = 10,000 times for each choice of n and m. The bootstrap for Eqs. 2.12 and 2.13 uses D = 10,000 resamples.

Let MSE1, MSE2 and MSE3 denote the MC estimates of the MSE of the sample quantile (SQp), our estimator \(\widehat {VaR}_{m, p}\) and our bootstrap based estimator \(\widehat {VaR}_{boot,p}\) of Eq. 2.12, respectively. MSE4 denotes the MC estimate of the MSE of the Harrell-Davis estimator (HDp). Let MSE5 and MSE6 denote the estimated MSEs of the S-V estimator (SV3p) and the EVT estimator (EVTp) respectively. MSE7 and MSE8 are the estimated MSEs of the kernel quantile estimators using the bandwidths of Polansky and Baker (PBp) and of Chen and Tang (CTp). MSE9 denotes the MC estimate of the MSE of the SRTR estimator (Eq. 2.11). In Tables 1 and 2 we report the ratios \(\frac {MSE2}{MSE1}\) to \(\frac {MSE9}{MSE1}\) for the eight 95 percent VaR estimators (other than the sample quantile) for the different time series models and different values of n, for m = 250 and 500 respectively. In Tables 3 and 4 we report the same ratios for the 95 percent MS estimators, which are essentially 97.5 percent VaR estimators.

We observe that for n = 20 and p = 0.95, kn = 2, and in that case the EVT estimator EVTp may not be defined for samples where Y(n), m and \(Y_{(n-k_{n}), m}\) have opposite signs. Therefore, for n = 20 the MC estimate of the MSE of the EVT estimator is not defined and is returned as NaN (not a number) by the R programming environment. The following are the main observations based on the ratios of MSEs reported in Tables 1–4 (see Section 5, Appendix).

1. Performance of our proposed estimators: a. The central limit theorem (CLT) based VaR and MS estimators Eqs. 2.2 and 2.5 exhibit the least MSE among all the 95 percent VaR and MS estimators for models (I) and (II) and for all choices of n and m. For the remaining time series model, viz. model (III), the proposed estimators Eqs. 2.2 and 2.5 outperform almost all the other estimators, viz. the sample quantile, HDp, SV3p, PBp and CTp, except the SRTR estimator Eq. 2.11. For data generated by the GARCH(1,1) model, the bootstrap based VaR estimator Eq. 2.12 outperforms the other VaR estimators (except SRTR) for sample sizes less than 50 and m = 250.

    b. Overall, the MSEs of our proposed CLT based estimators Eqs. 2.2 and 2.5 and the bootstrap based estimators Eqs. 2.12 and 2.13 do not seem to fluctuate widely under different time series models. For instance, \(\frac {MSE2}{MSE1}<0.8\) and \(\frac {MSE3}{MSE1}<0.8\) for all choices of n, m and for all the time series models, as can be seen in Tables 1–4. In contrast, the MSEs of the other estimators are much higher for certain time series models and certain choices of n and m. For example, the ratio \(\frac {MSE9}{MSE1}>1\) for models (I), (II) and n ≥ 50. For the GARCH(1,1) model, \(\frac {MSE8}{MSE1}, \frac {MSE5}{MSE1} >1\) for all choices of n and m. See Tables 1–4. The proposed CLT based estimators Eqs. 2.2 and 2.5 and the bootstrap based estimators Eqs. 2.12 and 2.13 seem to perform reliably in estimating the 95 percent VaR and MS for all choices of n and m, irrespective of the time series model generating the fine return data.

2. Performance of the SRTR estimator: The SRTR based 95 percent VaR and MS estimators seem to exhibit the least MSE for fine return data generated by model (III). However, for model (III), \(F_{ST}^{-1}(p)\) in Eq. 2.11 is replaced by the p th quantile of the N(0, σ2) distribution with σ2 equal to the sample variance, as this distribution provides a better fit to the data generated by the GARCH(1,1) model (III).

    However, the performance of the SRTR estimator seems to be sensitive to the choice of the model for the fine returns. For instance, under models (I) and (II) the MSEs of the SRTR based 95 percent VaR and MS estimators are much higher than under model (III). In fact, in Tables 1 and 2, \(\frac {MSE9}{MSE1}>1\) under models (I) and (II) for all choices of n, and the ratio \(\frac {MSE9}{MSE1}\) increases as n increases. This indicates that the SRTR estimator is outperformed by the sample quantile based 95 percent VaR and MS estimators for models (I) and (II).

3. Performance of the Sfakianakis and Verginis estimator: \(\frac {MSE5}{MSE1}>1\) for almost all choices of n, m and time series models; see the fifth column in Tables 1–4. This indicates that the Sfakianakis and Verginis estimator (SV3p) is outperformed by the sample quantile for estimation of the 95 percent VaR and MS for m = 250 and 500.

4. Performance of the extreme value theory based estimator: \(\frac {MSE2}{MSE1}<\frac {MSE6}{MSE1}\) and \(\frac {MSE3}{MSE1}<\frac {MSE6}{MSE1}\) for all choices of n, m and for all time series models in Tables 1–4. The estimator EVTp in Eq. 2.9 is uniformly outperformed by our proposed estimators Eqs. 2.2 and 2.5 of the VaR and MS respectively, for all the models. EVTp is also outperformed by our proposed bootstrap based VaR and MS estimators Eqs. 2.12 and 2.13. For model (I), \(\frac {MSE6}{MSE1}>1\) for n ≥ 50, which indicates that for this model the sample quantile based VaR and MS estimators outperform the EVT estimator.

5. Performance of the Harrell-Davis estimator: \(\frac {MSE2}{MSE1}<\frac {MSE4}{MSE1}\) and \(\frac {MSE3}{MSE1}<\frac {MSE4}{MSE1}\) for all choices of n, m and for all time series models in Tables 1–4. This indicates that the estimator HDp in Eq. 2.6 is also uniformly outperformed by our proposed estimators Eqs. 2.2, 2.5, 2.12 and 2.13 of the VaR and MS, for all the models. For n ≥ 50 in the GARCH(1,1) model, \(\frac {MSE4}{MSE1}>1\) in Tables 3 and 4, which indicates that for this model the sample quantile based MS estimator outperforms the Harrell-Davis estimator.

6. Performance of the kernel quantile estimators: The kernel quantile estimator PBp seems to have the least MSE for model (III) in Table 4 for n ≥ 50. Apart from that, PBp is outperformed by our proposed estimators, as can be seen in Tables 1–4, where \(\frac {MSE2}{MSE1}<\frac {MSE7}{MSE1}\) and \(\frac {MSE3}{MSE1}<\frac {MSE7}{MSE1}\).

    CTp is uniformly outperformed by our proposed estimators for all choices of n, m and for all time series models in Tables 1–4, since \(\frac {MSE2}{MSE1}\!<\!\frac {MSE8}{MSE1}\) and \(\frac {MSE3}{MSE1}\!<\!\frac {MSE8}{MSE1}\). Further, \(\frac {MSE8}{MSE1}>1\) for the GARCH(1,1) model for all n and m. This indicates that CTp performs poorly compared to the sample quantile based VaR and MS estimators for the GARCH(1,1) model and hence is not suitable for estimating the long term VaR and MS for data generated by GARCH(1,1).

Remark 4.

1. The above observations suggest that the proposed estimators \(\widehat {VaR}_{m,p}\) and \(\widehat {VaR}_{boot,p}\) perform well for all the time series models and m ≥ 250. The EVT based estimator, the Harrell-Davis estimator, the Sfakianakis and Verginis estimator and the kernel quantile estimator CTp perform poorly in comparison to the proposed estimators \(\widehat {VaR}_{m,p}\) and \(\widehat {VaR}_{boot,p}\) for n ≥ 20 and m ≥ 250. Therefore, these estimators are not recommended for estimating the long term VaR (m ≥ 250).

2. There are two reasons why the proposed estimators of the long term VaR and MS are reliable. The m-period VaR and MS are equal to the negatives of extreme left quantiles of the m-period return distribution (see Eqs. 2.2 and 2.5). For large m, our Lemma 1 and the empirical observations in Cont (2001) suggest that the m-period return distribution is well approximated by the normal distribution. Therefore the proposed m-period VaR and MS estimators, especially Eqs. 2.2 and 2.5, which approximate the quantiles of the m-period return distribution by the corresponding quantiles of the normal distribution, work well for estimation of the m-period VaR and MS for large m (m ≥ 250). The extreme value theory (EVT) based quantile estimator Eq. 2.9 is based on the assumption that as λ ↓ 0

$$\frac{F^{-1}_{m}(1-\lambda t)}{F^{-1}_{m}(1-\lambda)}\rightarrow \frac{1}{t^{\xi}},\ \forall t>0$$

for some ξ > 0, where \(F^{-1}_{m}\) is the quantile function of the m-period return distribution (see Section 2.2). Since for large m the m-period return distribution resembles a normal distribution, the above assumption on \(F^{-1}_{m}\) does not seem appropriate for large m. Therefore the proposed estimators Eqs. 2.2 and 2.5, based on the normal approximation of the long term return distribution, seem more appropriate than the EVT estimator Eq. 2.9 for estimation of the long term VaR and MS.

Moreover, for large m the number n of observed values of the m-period return Xt, m is small. For example, for m = 250 (i.e. a one-year period) there are n = 26 observations on Xt, m, i.e. on the annual return of the Nifty 50 index; see Table 5. All the other m-period VaR and MS estimators, viz. the sample quantile and the kernel based estimators in Section 2.1, the Harrell-Davis estimator Eq. 2.6, the Sfakianakis and Verginis estimator Eq. 2.8 and the EVT estimator Eq. 2.9, are based on the n observed values of Xt, m. The larger the m, the fewer the observations n on Xt, m available for computing these estimators. Hence these estimators are more suitable for estimation of the short term VaR and MS, i.e. for small m and large n. In contrast, our proposed estimators Eqs. 2.2 and 2.5 depend on estimates of E(Xt) and Var(Xt), for which n × m observations on the 1-period return Xt are available. In our proposed methodology, observations of short term returns are used to estimate the parameters of the long term VaR and MS formulae Eqs. 2.2 and 2.5.

4 Risk Estimation and Backtesting Based on Real Data

4.1 Backtesting VaR Estimates

Santomila et al. (2018) describe the unconditional backtest of a 100p percent VaR estimation model as a procedure of comparing the observed number of times the losses exceed the estimated VaR in a given period with the number of times the actual VaR is expected to be exceeded during the same period. If the observed number of exceedances is much higher than the expected number of exceedances of the actual 100p percent VaR, the VaR estimate is considered inadequate for regulatory purposes (see Santomila et al. (2018)).

Recall that Xt+k, m denotes the return during (t + k, t + k + m] for k = 0,1,2,..., where Xt+k, m is defined in Eq. 1.3. For any natural number n ≥ 2, let

$$Z_{n}=\sum\limits_{k=0}^{n-1}I\left( -X_{t+k, m}>VaR_{m, p}\right).$$

Zn is the number of times the m-period loss exceeds the 100p percent VaR level in the n successive time intervals (t + k, t + k + m], k = 0,1,2,..., n − 1. Under Assumption 1, {Xt+k, m}k= 0,1,2,... are identically distributed. Since VaRm, p = −Qm,1−p, the expected number of exceedances is equal to

$$E\left( Z_{n}\right)=n(1-p).$$

VaRm, p is unknown. Replacing VaRm, p by an estimator \(\widehat {VaR}_{m, p}\) in Zn, we get the observed number of exceedances (we call it \(\hat {Z}_{n}\)). Therefore

$$\widehat{Z}_{n}=\sum\limits_{k=0}^{n-1}I\left( -X_{t+k, m}>\widehat{VaR}_{m, p}\right).$$

However, \(E(\widehat {Z}_{n})\) is unknown (as \(\widehat {VaR}_{m, p}\) may not be equal to the negative of the (1 − p)th quantile of Xt, m).

We consider \(\widehat {VaR}_{m, p}\) to be an adequate estimator of VaRm, p if \(P\left (-X_{t,m}>\widehat {VaR}_{m, p}\right )\le 1-p\) and inadequate if \(P\left (-X_{t,m}>\widehat {VaR}_{m, p}\right )> 1-p\). Therefore we test

$$H_{0}:\ P\left( -X_{t, m}\!>\!\widehat{VaR}_{m, p}\right) = 1-p\ \ \text{against}\ H_{1}:\ P\left( - X_{t, m}\!>\!\widehat{VaR}_{m, p}\right)\!>\!1-p.$$

Under H0, \(E\left (\widehat {Z}_{n}\right )=n(1-p)=E\left (Z_{n}\right )\), the expected number of exceedances. We reject H0 at the 100α percent level of significance if \(\widehat {Z}_{n}>n(1-p)+z_{n,\ \alpha },\) where zn, α is the 100(1 − α) percentile of the distribution of \(\widehat {Z}_{n}-n(1-p)\) under H0.

The traditional unconditional backtest of Kupiec (1995) assumes that \(\widehat {Z}_{n}\) follows a Binomial(n, 1 − p) distribution under H0; see, for instance, Kupiec (1995) and Santomila et al. (2018). We use the unconditional backtest of Kupiec (1995) to test H0 against H1 based on the observed m-period return data.
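Under H0 the exceedance count follows a Binomial(n, 1 − p) distribution, so the backtest p-value is a binomial tail probability. A sketch (the function name is ours):

```python
import numpy as np
from scipy.stats import binom

def kupiec_backtest(losses, var_estimate, p):
    """Count exceedances of the VaR estimate among the n observed m-period
    losses and return P(Z >= z_hat) under Z ~ Binomial(n, 1-p)."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses)
    z_hat = int(np.sum(losses > var_estimate))    # observed exceedances
    return z_hat, binom.sf(z_hat - 1, n, 1 - p)   # P(at least z_hat exceedances)
```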

4.2 Annual VaR and MS Estimation of the Nifty 50 Index, Crude Oil And Gold

The S&P CNX Nifty 50 is a well diversified index of 50 stocks covering twenty two sectors of the Indian economy. It is used for a variety of purposes, such as benchmarking fund portfolios (see www.nseindia.com for details). There are also a number of Nifty 50 index funds, which are passively managed mutual funds mirroring the portfolio composition of the Nifty 50 index. An obvious interest of investors is to measure the risk due to fluctuations in the value of the Nifty 50 index over a given period.

Crude oil and gold prices have important impacts on financial markets and the economy of a country. Oil and gold are two of the world’s most important commodities. They have received much attention recently due to the fluctuations in their prices and the increase in their economic applications. Crude oil is one of the most commonly traded commodities, and its price exhibits high volatility in the commodity market (see Regnier (2007)). Price fluctuations of gold lead to parallel movements in the prices of other precious metals (see Sari et al. (2010)). Gold is also an investment asset, commonly known as a “safe haven” from the increasing risks in financial markets. Estimation of the market risk of crude oil and gold is important for various stakeholders and participants such as producers and exporters (see https://www.mcxindia.com/products/energy/crude-oil and https://www.mcxindia.com/products/bullion/gold).

In India, a financial year (FY) refers to the period from 1st April of a year to 31st March of the next year. There are approximately 250 trading days in a financial year (the exact number of days on which the stock markets in India remain closed may vary from year to year). Dutta and Das (2018) published data on the daily log-returns of the Nifty 50 index on the National Stock Exchange (NSE) in India for the financial years 1995-96 to 2017-18 in the form of twenty three “csv” files in Mendeley (https://data.mendeley.com/datasets/tm2kzgf3gd/). These log returns are reported as percentages, i.e. the daily log returns multiplied by 100.

The historical data on gold and crude oil daily closing prices (in US dollars per troy ounce and per barrel respectively) from FY 2001-02 to FY 2020-21 are available on Yahoo Finance (https://finance.yahoo.com/quote/CL%3DF/history?p=CL%3DF, https://finance.yahoo.com/quote/GC%3DF/history?p=GC%3DF). The daily log returns are calculated as the logarithm of the ratio of closing prices on two consecutive days. The annual log return is the sum of the daily log returns within a financial year. In Tables 6 and 7 we report the annual log returns (in percent) of crude oil and gold prices respectively for the 20 financial years FY 2001-02 to FY 2020-21.

4.2.1 Data Analysis

1. NIFTY 50 index: In Table 5 we report the annual log returns of the Nifty 50 index for the 26 financial years FY 1995-96 to FY 2020-21. Each annual log-return is the sum of the daily log returns recorded between the first and the last trading day of the FY. Dutta and Das (2018) reported data up to FY 2017-18; the annual Nifty return data for FY 2018-19 to FY 2020-21 are obtained from the Yahoo Finance website https://in.finance.yahoo.com. The data in Table 5 are positively skewed and the moment coefficient of kurtosis is close to 3 (i.e. not heavy tailed).

We use the daily log return dataset of Dutta and Das (2018) and Eqs. 2.2, 2.5, 2.12 and 2.13 to estimate the 95 percent value at risk (VaR) and median shortfall (MS) of the Nifty 50 index over a period of one FY (i.e. 250 trading days starting from the first trading day in April). Here m = 250 days. E(Xt) and Var(Xt) in Eq. 2.2 are approximated by the mean and variance of the daily returns, i.e. the negatives of the numbers reported by Dutta and Das (2018). There are more than five thousand daily log return values for the twenty three years. The average daily log return of the Nifty 50 during 1995-96 to 2017-18 is 0.039, with standard deviation 1.511.

The 95 percent VaR and MS estimates using Eqs. 2.2 and 2.5 for the annual Nifty 50 loss (i.e. m = 250) are 29.460 and 36.992 percent respectively. These numbers imply that there is a five percent chance (one year in twenty) of the Nifty 50 value, in log-scale, depreciating by more than 29.46 percent in one financial year. In case the annual loss of the Nifty 50 value exceeds 29.46 percent, the median annual loss (in log scale) beyond the VaR level is estimated to be 36.992 percent.

We also estimate the 95 percent annual VaR and MS of the NIFTY 50 index using the proposed bootstrap based estimator (\(\widehat {VaR}_{boot,p}\)), the sample quantile estimator (SQp), the Sfakianakis and Verginis estimator (SV3p), the extreme value theory based estimator (EVTp) and the SRTR estimator. These estimates are based on the 26 annual log returns of the Nifty 50 index and are reported in Table 8. Using the Kupiec test described in Section 4.1, we test whether our proposed estimators \(\widehat {VaR}_{m ,p}\) in Eq. 2.2, \(\widehat {MS}_{m ,p}\) in Eq. 2.5, \(\widehat {VaR}_{boot,p}\) in Eq. 2.12 and \(\widehat {MS}_{boot,p}\) in Eq. 2.13 and the other four estimators, viz. SQp, SV3p, EVTp and SRTR, are adequate risk measures. The p-value is equal to the probability that at least \(\widehat {Z}_{n}\) out of the 26 annual losses of the Nifty 50 index exceed the estimated 95 percent VaR or MS, assuming H0 is true.

In Table 8 we report the annual 95 percent VaR and MS estimates of the NIFTY 50 index for each of the above estimators, along with the number of exceedances \(\widehat {Z}_{n}\) of the VaR and MS estimates for each method and the corresponding p-values based on the unconditional backtest of Kupiec (1995).

All the p-values exceed the 5 percent level of significance, indicating that all the estimates of the one year 95 percent VaR and MS of the Nifty 50 index are adequate (see Table 8). However, the SRTR based and SV3p based one year MS estimates exceed the magnitude of all 26 annual losses of the Nifty 50 index from FY 1995-96 to FY 2020-21 in Table 8. Clearly, the SRTR method and SV3p overestimate the annual MS of the Nifty 50 index. This is in line with the observation of Dowd et al. (2004). The proposed estimators Eqs. 2.2, 2.5, 2.12 and 2.13 and the two estimators SQp and EVTp seem to yield similar estimates of the one year 95 percent VaR and MS of the Nifty 50 index.

2. Crude oil and gold prices:

The 95 percent annual VaR and MS estimates for the crude oil and gold prices are reported in Tables 9 and 10 respectively.

From Tables 9 and 10 we observe that the p-values of the unconditional backtest of the proposed VaR estimators \(\widehat {VaR}_{m,p}\) and \(\widehat {VaR}_{boot,p}\) (see Eqs. 2.2 and 2.12) exceed the 5 percent level of significance. The same holds for the other estimators, viz. SQp, SV3p, EVTp and SRTR. Therefore, the proposed estimators and the estimators SQp, SV3p, EVTp and SRTR provide adequate estimates of the annual VaR and MS of crude oil and gold returns. The SV3p (Tables 9 and 10) and SRTR (Table 10) based VaR estimates seem exaggerated, as there are no observed exceedances of the resulting risk estimates in the historical annual returns of crude oil and gold.

Comparing the VaR and MS estimates of the NIFTY 50, crude oil and gold annual returns, we observe that gold exhibits the least market risk over a one-year horizon, while crude oil exhibits annual market risk similar to that of the NIFTY 50 index.