1 Introduction

Interest in the information content of option-implied distributions has grown over the last several years. In particular, from the standpoint of financial risk management and asset pricing, attention has increasingly focused on the difference between option-implied and realized distributions, which is usually interpreted as a risk premium required by the representative agent. This paper investigates the risk premiums in higher-order moments of financial asset returns, especially the skewness, in a general equilibrium setting. Each of these risk premiums is defined as the difference between the expected values of the moment under the risk-neutral and physical probability measures.

In recent years, there has been ever-increasing interest in developing an entirely self-contained equilibrium-based explanation for the nonzero volatility (or variance) risk premium (see Footnote 1) and its predictive power for stock index returns. To the best of our knowledge, the first attempt to demonstrate the existence of the volatility risk premium in a general equilibrium market model was made by Eraker (2008), who develops an equilibrium explanation based on the long-run risks (LRR) model (see Footnote 2). The LRR model emphasizes the role of long-run risks, that is, low-frequency movements in consumption growth rates and volatility, in accounting for a wide range of asset pricing puzzles. It features an Epstein and Zin (1989) utility function with an investor preference for early resolution of uncertainty and contains (i) a persistent expected consumption growth component and (ii) long-run variation in consumption volatility.

In addition to such equilibrium-based explanations, several related studies of these risk premiums have appeared in recent years. For example, motivated by fruitful implications from the LRR model pioneered by Bansal and Yaron (2004), Bollerslev et al. (2009) investigate the ability of the variance risk premium to predict stock returns in a general equilibrium setting based on the LRR framework. They show that the difference between option-implied and realized variation, or the variance risk premium, is able to explain a nontrivial fraction of the time-series variation in post-1990 aggregate stock market returns, with high (low) premia predicting high (low) future returns. The magnitude of the predictability is particularly strong at the intermediate quarterly return horizon, where it dominates that afforded by other popular predictor variables, such as the P/E ratio, the default spread, and the consumption-wealth ratio.

Drechsler and Yaron (2011) also show the predictability of the variance risk premium for stock index returns based on an extended LRR model with jumps in uncertainty and the long-run component of cash-flows. They demonstrate that risk aversion greater than one and a preference for early resolution of uncertainty yield the correct sign for both the variance risk premium and the coefficient in a predictive regression of returns on the variance risk premium.

All of the studies cited above focus only on the variance risk premium, which a representative investor requires because of the stochastic nature of asset return variance. By contrast, as far as we know, there are few reports on the risk premium that compensates for uncertainty in the third moment, that is, the skewness, of asset returns. In this paper, we demonstrate that the skewness risk premium, defined as the difference between the expected values of the skewness under the risk-neutral and physical probability measures, captures attitudes toward economic uncertainty just as the variance risk premium does. Among the recent equilibrium-based models for the nonzero variance risk premium referred to above, all except Drechsler and Yaron (2011) model both the variance of the consumption growth rate and the LRR factor as conditionally normal processes, so that the one-step-ahead conditional distribution of the market return is also normal and, as a result, its skewness must be zero. Therefore, those models cannot explain the negative risk-neutral skewness found in previous studies such as Aït-Sahalia and Lo (1998) and Aït-Sahalia et al. (2001). These studies document several empirical features of the state price density for the S&P500 index option market over time, including the term structures of mean returns, volatility, skewness, and kurtosis implied by option prices. In particular, they show that the nonparametric state price densities are negatively skewed and have fat tails, and that the amounts of skewness and kurtosis both increase with maturity.

We show that jump components in the LRR factor and/or the variance of consumption growth rate can explain the nonzero (and negative) skewness of the one-step-ahead asset return distribution. To the best of our knowledge, Drechsler and Yaron (2011) is the first paper to identify an important role for transient non-Gaussian shocks (jumps) to fundamentals, such as the LRR factor and the variance of consumption growth rate, in understanding how perceptions of economic uncertainty and cash-flow risk manifest themselves in asset prices. However, Drechsler and Yaron (2011) assume that the jump intensity process \(\lambda _t\) has the affine structure \(\lambda _t = l_0 + l_1 \sigma _t^2\), where \(l_0, l_1 > 0\) and \(\sigma _t^2\) is the variance of consumption growth rate. Under this assumption, it is not possible to explain the following empirical relation between monthly stock returns and contemporaneous monthly changes in option-implied skewness:

$$\begin{aligned} \begin{aligned} r_{m,t+1}&= 0.006 - 0.019 \times \triangle I{\textit{Skew}}_{t+1},\\&\quad \quad (2.46) \quad ( -3.46 )\\ r_{m,t+1}&= 0.006 - 0.007 \times \triangle {\textit{VIX}}_{t+1} - 0.016 \times \triangle I{\textit{Skew}}_{t+1}, \\&\quad \quad (3.33) \quad (-16.00) \quad ( -3.94 ) \end{aligned} \end{aligned}$$
(1)

where \(r_{m,t+1}\) is the monthly return of the S&P500 Total Return Index from time t to \(t+1\), \(\bigtriangleup {\textit{VIX}}_{t+1} \) is the monthly change of option-implied volatility calculated with the CBOE’s VIX from time t to \(t+1\), and \(\bigtriangleup I{\textit{Skew}}_{t+1}\) is the monthly change of option-implied skewness calculated with the CBOE’s Skew Index from time t to \(t+1\) (see Footnote 3). These results are based on monthly data from Jan-1990 to Aug-2012. Under the assumption on the jump intensity process in Drechsler and Yaron (2011), however, we can confirm that the regression coefficients on \(\bigtriangleup I{\textit{Skew}}_{t+1}\) in the above regression models should be positive.
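A regression of the form of the first line of (1) can be sketched as follows. This is only an illustration: the series are synthetic stand-ins generated inside the script (not the S&P500 or CBOE data), the sample length and noise level are assumptions, and `ols_fit` is a hypothetical helper name.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic stand-ins for the monthly series (NOT the paper's data).
rng = random.Random(0)
d_skew = [rng.gauss(0.0, 1.0) for _ in range(272)]  # changes in implied skewness
returns = [0.006 - 0.019 * s + rng.gauss(0.0, 0.02) for s in d_skew]

intercept, slope = ols_fit(d_skew, returns)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

Because the synthetic returns are generated with a negative loading on the skewness changes, the fitted slope is negative, which mirrors the sign pattern reported in (1).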

Moreover, this assumption cannot explain another empirical fact: the low correlation (in absolute value) between \(\triangle VIX_{t}\) and \(\triangle I{\textit{Skew}}_{t}\). The jump intensity model of Drechsler and Yaron (2011) implies perfect correlation between \(\triangle VIX_{t}\) and \(\triangle I{\textit{Skew}}_{t}\) because their jump intensity process is driven by only one factor, \(\sigma _t^2\). However, the actual correlation between \(\triangle VIX_{t}\) and \(\triangle I{\textit{Skew}}_{t}\) over the period from Jan-1990 to Aug-2012 is nearly zero (see Footnote 4).
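The one-factor argument can be illustrated with a stylized sketch. It assumes, purely for illustration, that changes in both implied moments are affine in the driving factors; the loadings and the helper `corr` are hypothetical and are not taken from either model.

```python
import random


def corr(a, b):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(1)
f1 = [rng.gauss(0.0, 1.0) for _ in range(500)]  # changes in the single factor
f2 = [rng.gauss(0.0, 1.0) for _ in range(500)]  # changes in a second, independent factor

# One factor: both series load on f1 only (hypothetical loadings).
d_vol = [2.0 * x for x in f1]
d_skew_one = [-0.5 * x for x in f1]
# Two factors: the skewness series also loads on f2.
d_skew_two = [-0.5 * x + 1.5 * y for x, y in zip(f1, f2)]

corr_one = corr(d_vol, d_skew_one)  # equals -1 up to rounding
corr_two = corr(d_vol, d_skew_two)  # well inside (-1, 1)
```

With a single driving factor the correlation is exactly plus or minus one regardless of the loadings, so a near-zero empirical correlation requires at least one additional source of variation.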

In this paper, we propose an extension of the LRR models pioneered by Bansal and Yaron (2004) and Drechsler and Yaron (2011). Our model contains a rich set of transient dynamics and can quantitatively account for the time variation and asset return predictability of the skewness risk premium as well as the variance risk premium. In particular, we introduce a stochastic jump intensity for transient jumps to fundamentals such as the LRR factor and the variance of consumption growth rate, and show that this stochastic jump intensity enables our model to capture various empirical aspects of stock index returns and their option-implied moments, including the facts cited above. Christoffersen et al. (2012) find very strong support for time-varying jump intensities for S&P500 index returns and show that, compared to the risk premium on dynamic volatility, the risk premium on the dynamic jump intensity has a much larger impact on option prices. We find that the negative skewness and the skewness risk premium observed in historical data are closely related to the existence of jumps in economic uncertainty and of the associated jump risk premium, respectively.

This paper also shows that the skewness of the asset return distribution and the skewness risk premium, which compensates for the stochastic nature of the skewness, are both time-varying due to the stochastic nature of the jump intensity for transient jumps in both the LRR factor and the variance of consumption growth rate. Moreover, providing an equity risk premium representation of a linear factor pricing model with time-varying variance and skewness risk premiums, we find that those risk premiums can explain a nontrivial fraction of the time series variation in the aggregate stock market returns. Simplified preliminary regression tests provide empirical evidence that the skewness risk premium, like the variance risk premium, has superior predictive power for future aggregate stock market index returns. The results also show that the skewness risk premium plays a role in predicting the market index returns that is essential and independent of the variance risk premium.

The remainder of this paper is organized as follows. Section 2 outlines the basic theoretical model with jumps in consumption growth rate and its volatility, shows how equilibrium is derived for our model economy, and highlights its key features. In particular, we provide an equity risk premium representation of a linear factor pricing model with time-varying variance and skewness risk premiums. Section 3 provides the implications from a calibrated version of the theoretical equity risk premium representation of a linear factor pricing model derived in Sect. 2 to help guide and interpret our subsequent empirical reduced form predictability regressions. Section 4 describes the data used for examining the equity risk premium representation empirically and discusses the results from the predictive regressions on the stock returns to the variance and the skewness risk premiums with historical data. Section 5 provides concluding remarks.

2 Model framework

2.1 Model setup and assumptions

The underlying environment is a discrete time endowment economy. The representative agent’s preferences on the consumption stream are of the Epstein and Zin (1989) form, allowing for the separation of risk aversion and the intertemporal elasticity of substitution (IES). Thus, the agent maximizes his lifetime utility, which is defined recursively as

$$\begin{aligned} \begin{aligned} \displaystyle V_t = \Big [ (1-\delta )C_t^{\frac{1-\gamma }{\theta }} + \delta \Big ( \mathbb {E}_t [ V_{t+1}^{1-\gamma } ] \Big )^{\frac{1}{\theta }} \Big ]^{\frac{\theta }{1-\gamma }}, \end{aligned} \end{aligned}$$
(2)

where \(C_t\) is consumption at time t, \(0 < \delta < 1\) reflects the agent’s time preference, \(\gamma \) is the coefficient of risk aversion, \(\theta =\frac{1-\gamma }{1-\frac{1}{\psi }}\), and \(\psi \) is the intertemporal elasticity of substitution (IES). This preference structure collapses to a standard CRRA utility representation if \(\gamma =\frac{1}{\psi }\), that is, \(\theta =1\), and in this case, only innovations to consumption are priced. In the following, based on the result provided by Bansal and Yaron (2004), we assume that both \(\gamma \) and \(\psi \) are larger than one. It then holds that \(\gamma > \frac{1}{\psi }\), so that the investor has a preference for early resolution of uncertainty (Bansal and Yaron 2004), and that \(\theta < 0\), since \(1-\gamma <0\) and \(1-\frac{1}{\psi }>0\). Then not only is consumption risk priced, but the state variables carry risk premia as well. The parameter restrictions also ensure that the signs of the risk premia are in line with economic intuition, and that a worsening of economic conditions leads to a decrease in asset prices.
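The role of \(\theta \) in the two parameter regimes discussed above can be checked with a few lines of code. The function name `ez_theta` and the numerical values of \(\gamma \) and \(\psi \) are illustrative assumptions, not a calibration from the text.

```python
def ez_theta(gamma, psi):
    """theta = (1 - gamma) / (1 - 1/psi); requires psi != 1."""
    return (1.0 - gamma) / (1.0 - 1.0 / psi)


theta_lrr = ez_theta(gamma=10.0, psi=1.5)   # gamma > 1 and psi > 1  ->  theta < 0
theta_crra = ez_theta(gamma=2.0, psi=0.5)   # gamma = 1/psi          ->  theta = 1
print(theta_lrr, theta_crra)
```

The first case corresponds to the restriction \(\gamma >1\), \(\psi >1\) used in the remainder of the section; the second is the CRRA special case.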

Utility maximization is subject to the budget constraint:

$$\begin{aligned} \begin{aligned} \displaystyle W_{t+1} = (W_t - C_t) R_{c,t+1}, \nonumber \end{aligned} \end{aligned}$$

where \(W_t\) is the wealth of the agent and \(R_{c,t}\) is the return on all invested wealth. As shown in Epstein and Zin (1989), for any asset j, the first-order condition yields the following Euler condition:

$$\begin{aligned} \begin{aligned} \displaystyle \mathbb {E}_t \Bigl [ \exp (m_{t+1} + r_{j,t+1}) \Bigr ] = 1, \end{aligned} \end{aligned}$$
(3)

where \(r_{j,t+1}\) is the log of the gross return on asset j and \(m_{t+1}\) is the log of the intertemporal marginal rate of substitution (IMRS), which is given by \( m_{t+1} = \theta \log \delta - \frac{\theta }{\psi } \bigtriangleup c_{t+1} + (\theta -1) r_{c,t+1} \). Here, \(r_{c,t+1}\) is \( \log R_{c,t+1}\) and \( \bigtriangleup c_{t+1} \) is the change in \( \log C_t \), that is, \(\log \Big ( \frac{C_{t+1}}{C_t} \Big )\).
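As a small sketch of the log IMRS formula above, the function below evaluates \(m_{t+1}\) for given inputs and confirms that, in the CRRA case \(\gamma =\frac{1}{\psi }\), the return on wealth drops out and \(m_{t+1} = \log \delta - \gamma \bigtriangleup c_{t+1}\). The function name `log_sdf` and all numerical inputs are hypothetical.

```python
import math


def log_sdf(delta, gamma, psi, dc, rc):
    """m = theta*log(delta) - (theta/psi)*dc + (theta - 1)*rc."""
    theta = (1.0 - gamma) / (1.0 - 1.0 / psi)
    return theta * math.log(delta) - (theta / psi) * dc + (theta - 1.0) * rc


# CRRA case (gamma = 1/psi, so theta = 1): the value of rc is irrelevant.
m_crra_a = log_sdf(delta=0.99, gamma=2.0, psi=0.5, dc=0.002, rc=0.01)
m_crra_b = log_sdf(delta=0.99, gamma=2.0, psi=0.5, dc=0.002, rc=-0.05)
m_power = math.log(0.99) - 2.0 * 0.002  # log(delta) - gamma * dc
```

For \(\theta \ne 1\) the term \((\theta -1) r_{c,t+1}\) is what lets shocks to the state variables enter the pricing kernel.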

We model the consumption and dividend growth rates, \(g_{t+1} \equiv \log (\frac{C_{t+1}}{C_t})\) and \(g_{d, t+1} \equiv \log (\frac{D_{t+1}}{D_t})\), respectively, where \(D_t\) is the dividend at time t, as containing a small persistent predictable component \(x_t\), which determines the conditional expectation of consumption growth,

$$\begin{aligned} \begin{aligned} \displaystyle x_{t+1}&= \rho _x x_t + \varphi _e \sigma _t e_{t+1} + J_{x,t+1}, \\ g_{t+1}&= \mu _g + x_t + \varphi _\eta \sigma _t \eta _{t+1}, \\ g_{d,t+1}&= \mu _d + \rho _d x_t + \varphi _\zeta \sigma _t \zeta _{t+1}, \end{aligned} \end{aligned}$$
(4)

where \(\varphi _e, \varphi _\eta , \varphi _\zeta , \rho _x, \rho _d >0\), \(\mu _g, \mu _d \in \mathbb {R}\), \(e_t, \eta _t\), and \(\zeta _t\) are mutually independent i.i.d. N(0, 1) processes, and \(J_{x,t+1}\) is a compound-Poisson process represented by \(J_{x, t+1} \equiv \sum _{j=1}^{N_{t+1}^x} \epsilon _x^j \). Here, \(N_{t+1}^x\) is the Poisson counting process for the jump component, with intensity process \(\lambda _{x,t+1} \equiv l_x \lambda _{t+1}, l_x>0\), and \(\epsilon _x^j \sim i.i.d. \quad N(0,\delta _x^2)\), \(\delta _x>0\), is the size of the j-th jump counted by \(N_{t+1}^x\).

Furthermore, we also model the dynamics of the volatility as follows:

$$\begin{aligned} \begin{aligned} \displaystyle \sigma _{t+1}^2&= \mu _{\sigma } + \rho _\sigma \sigma _{t}^2 + \sqrt{q_t} w_{t+1} + J_{\sigma ^2,t+1}, \\ q_{t+1}&= \mu _q + \rho _q q_t + \varphi _\xi \sqrt{q_t} \xi _{t+1}, \end{aligned} \end{aligned}$$
(5)

where the parameters satisfy \(\mu _{\sigma }>0, \mu _q>0, |\rho _\sigma |<1, |\rho _q|<1\), \(\varphi _\xi >0\), and \(w_t\) and \(\xi _t\) are mutually independent i.i.d. N(0, 1) processes that are also independent of each of \(e_t,\eta _t\), and \(\zeta _t\). \(J_{\sigma ^2,t+1}\) is a compound-Poisson process represented by \(J_{\sigma ^2, t+1} \equiv \sum _{j=1}^{N_{t+1}^{\sigma ^2}} \epsilon _{\sigma ^2}^j \), where \(N_{t+1}^{\sigma ^2}\) is the Poisson counting process for the jump component, with intensity process \(\lambda _{\sigma ^2,t+1} \equiv l_{\sigma ^2} \lambda _{t+1}, l_{\sigma ^2}>0\), and \(\epsilon _{\sigma ^2}^j \sim i.i.d. \quad N(0,\delta _{\sigma ^2}^2)\), \(\delta _{\sigma ^2}>0\), is the size of the j-th jump counted by \(N_{t+1}^{\sigma ^2}\). We assume that \(N_{t+1}^x\) and \(N_{t+1}^{\sigma ^2}\) are mutually independent, as are \(\epsilon _x^j\) and \(\epsilon _{\sigma ^2}^j\). The stochastic variance process \(\sigma _t^2\) represents time-varying economic uncertainty in consumption growth with the variance-of-variance process \(q_t\) in effect inducing an additional source of temporal variation in that same process. We model the variance-of-variance process \(q_t\) in the same fashion as Bollerslev et al. (2009).

Importantly, the jump intensity dynamics for \(\lambda _t\) is newly introduced in our economy, which is represented by the following discrete-time stochastic process,

$$\begin{aligned} \begin{aligned} \displaystyle \lambda _{t+1} = \mu _\lambda + \rho _\lambda \lambda _t + \varphi _u \sqrt{q_t} (\rho \xi _{t+1} + \sqrt{1-\rho ^2} u_{t+1}), \end{aligned} \end{aligned}$$
(6)

where \(\mu _\lambda >0\), \(|\rho _\lambda |<1\), \(|\rho | \le 1\), and \(u_t\) is an i.i.d. N(0, 1) process which is independent of each of \(e_t\), \(\eta _t\), \(\zeta _t\), \(w_t\), and \(\xi _t\).
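The dynamics (4)-(6) can be simulated directly. The sketch below is a minimal illustration under several assumptions that are not part of the model description: all parameter values are hypothetical and chosen only so that the script runs, and \(\sigma _t^2\), \(q_t\), and \(\lambda _t\) are floored at zero so that square roots and Poisson intensities stay well defined. The helper names `poisson_draw` and `simulate_states` are likewise illustrative.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw a Poisson count by Knuth's method (adequate for small rates)."""
    threshold, count, prod = math.exp(-rate), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_states(p, n_steps, seed=0):
    """Simulate (4)-(6); returns a list of (g, g_d, x, sigma2, q, lam) tuples."""
    rng = random.Random(seed)
    x = 0.0
    sig2 = p["mu_sig"] / (1.0 - p["rho_sig"])  # start at unconditional means
    q = p["mu_q"] / (1.0 - p["rho_q"])
    lam = p["mu_lam"] / (1.0 - p["rho_lam"])
    path = []
    for _ in range(n_steps):
        e, eta, zeta, w, xi, u = (rng.gauss(0.0, 1.0) for _ in range(6))
        sd, sq = math.sqrt(sig2), math.sqrt(q)
        # (6): the intensity shock is correlated with the q shock through rho.
        lam_next = (p["mu_lam"] + p["rho_lam"] * lam
                    + p["phi_u"] * sq * (p["rho"] * xi + math.sqrt(1.0 - p["rho"] ** 2) * u))
        lam_next = max(lam_next, 0.0)  # assumption: floor at zero
        # Compound-Poisson jumps with intensities l_x * lam and l_sig * lam.
        n_x = poisson_draw(rng, p["l_x"] * lam_next)
        n_s = poisson_draw(rng, p["l_sig"] * lam_next)
        j_x = sum(rng.gauss(0.0, p["delta_x"]) for _ in range(n_x))
        j_s = sum(rng.gauss(0.0, p["delta_sig"]) for _ in range(n_s))
        # (4): growth rates use time-t states; x is updated with its jump.
        g = p["mu_g"] + x + p["phi_eta"] * sd * eta
        g_d = p["mu_d"] + p["rho_d"] * x + p["phi_zeta"] * sd * zeta
        x_next = p["rho_x"] * x + p["phi_e"] * sd * e + j_x
        # (5): variance and variance-of-variance, floored at zero (assumption).
        sig2_next = max(p["mu_sig"] + p["rho_sig"] * sig2 + sq * w + j_s, 0.0)
        q_next = max(p["mu_q"] + p["rho_q"] * q + p["phi_xi"] * sq * xi, 0.0)
        path.append((g, g_d, x_next, sig2_next, q_next, lam_next))
        x, sig2, q, lam = x_next, sig2_next, q_next, lam_next
    return path


# Hypothetical parameter values, chosen only so that the sketch runs.
params = {
    "mu_g": 0.0015, "mu_d": 0.0015, "rho_x": 0.979, "rho_d": 3.0,
    "phi_e": 0.044, "phi_eta": 1.0, "phi_zeta": 4.5,
    "mu_sig": 1.3e-6, "rho_sig": 0.978, "mu_q": 2e-12, "rho_q": 0.8,
    "phi_xi": 1e-6, "mu_lam": 0.01, "rho_lam": 0.9, "phi_u": 2000.0,
    "rho": 0.5, "l_x": 1.0, "l_sig": 1.0, "delta_x": 0.005, "delta_sig": 1e-5,
}
path = simulate_states(params, n_steps=500)
```

The flooring is a simulation device only; the discrete-time Gaussian specification itself does not rule out negative values of these state variables.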

One of the notable features of our model setup is the introduction of the jump intensity process (6). Christoffersen et al. (2012) find very strong support for time-varying jump intensities for S&P500 index returns, and they show that, compared to the risk premium on dynamic volatility, the risk premium on the dynamic jump intensity has a much larger impact on option prices. Among previous studies, Drechsler and Yaron (2011) is the first paper to introduce transient jumps to fundamentals such as the LRR factor \(x_t\) and the variance of consumption growth rate \(\sigma _t^2\). However, it assumes that the jump intensity process \(\lambda _t\) has the affine structure \(\lambda _t = l_0 + l_1 \sigma _t^2\), where \(l_0,l_1 > 0\). As mentioned in the introduction, such an assumption cannot explain the empirical regression results in (1) or the nearly zero correlation between changes in option-implied volatility and changes in option-implied skewness. We extend the LRR models of Bansal and Yaron (2004) and Drechsler and Yaron (2011) by introducing the stochastic jump intensity (6) into the economy. As shown below, this extension makes our model consistent with the empirical facts mentioned above and plays a key role in describing the characteristics of asset return distributions.

2.2 The model solution in equilibrium

We distinguish between the unobservable return on a claim to aggregate consumption, \(R_{c,t+1}\), and the observable return on the market portfolio, \(R_{m,t+1}\): the latter is the return on the aggregate dividend claim. Solving our model numerically, we demonstrate the mechanisms working in our model via approximate analytical solutions in the same fashion as the previous studies such as those by Bansal and Yaron (2004), Bollerslev et al. (2009), Drechsler and Yaron (2011), etc. To derive these solutions for our model, we use the standard approximation utilized in Campbell and Shiller (1988),

$$\begin{aligned} \begin{aligned} \displaystyle r_{c,t+1} = \kappa _0 + \kappa _1 v_{t+1} - v_t + g_{t+1}, \end{aligned} \end{aligned}$$
(7)

where lowercase letters refer to logs, so that \(r_{c,t+1} = \log (R_{c,t+1})\) is the continuous return, \(v_t = \log (\frac{P_t}{C_t})\) is the log price-consumption ratio of the asset that pays the consumption endowment, \( \{ C_{t+i} \}_{i=1}^\infty \), and \(\kappa _0\) and \(\kappa _1\) are approximating constants that both depend only on the average level of v (see Footnote 5). Analogously, \(r_{m,t+1}\) and \(v_{m,t+1}\) correspond to the market return and its log price-dividend ratio, and a similar approximation can be derived:

$$\begin{aligned} \begin{aligned} \displaystyle r_{m,t+1} = \kappa _{0,m} + \kappa _{1,m} v_{m,t+1} - v_{m,t} + g_{d,t+1}. \end{aligned} \end{aligned}$$
(8)
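The approximations (7) and (8) can be checked numerically. The sketch below assumes the standard first-order expansion of \(\log (1+e^{v})\) around an average level \(\bar{v}\), which gives \(\kappa _1 = e^{\bar{v}}/(1+e^{\bar{v}})\) and \(\kappa _0 = \log (1+e^{\bar{v}}) - \kappa _1 \bar{v}\); the function names and the numerical inputs are hypothetical.

```python
import math


def campbell_shiller_constants(v_bar):
    """Linearization constants (kappa0, kappa1) around the average log ratio v_bar."""
    kappa1 = math.exp(v_bar) / (1.0 + math.exp(v_bar))
    kappa0 = math.log(1.0 + math.exp(v_bar)) - kappa1 * v_bar
    return kappa0, kappa1


def exact_log_return(v_next, v_now, g_next):
    """log((P_{t+1} + C_{t+1}) / P_t) written in terms of log ratios and growth."""
    return math.log(1.0 + math.exp(v_next)) - v_now + g_next


def approx_log_return(v_next, v_now, g_next, v_bar):
    """Right-hand side of (7) with constants evaluated at v_bar."""
    kappa0, kappa1 = campbell_shiller_constants(v_bar)
    return kappa0 + kappa1 * v_next - v_now + g_next


v_bar = 6.0
kappa0, kappa1 = campbell_shiller_constants(v_bar)
err_at_mean = abs(exact_log_return(v_bar, 5.98, 0.002) - approx_log_return(v_bar, 5.98, 0.002, v_bar))
err_nearby = abs(exact_log_return(6.05, 5.98, 0.002) - approx_log_return(6.05, 5.98, 0.002, v_bar))
```

The approximation is exact when \(v_{t+1}\) equals \(\bar{v}\), and the error grows only quadratically in the deviation from \(\bar{v}\).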

The standard solution method for finding the equilibrium in a model like the one defined above then consists in conjecturing solutions for \(v_t\) and \(v_{m,t}\) as an affine function of the state variables, \(x_t\), \(\sigma _t^2\), \(q_t\), and \(\lambda _t\),

$$\begin{aligned} v_t= & {} A_0 + A_x x_t + A_\sigma \sigma _t^2 + A_q q_t + A_\lambda \lambda _t, \end{aligned}$$
(9)
$$\begin{aligned} v_{m,t}= & {} A_{0,m} + A_{x,m} x_t + A_{\sigma ,m} \sigma _t^2 + A_{q,m} q_t + A_{\lambda ,m} \lambda _t, \end{aligned}$$
(10)

respectively, solving for the coefficients \(A_0\), \(A_x\), \(A_\sigma \), \(A_q\), and \(A_\lambda \) in \(v_t\) and for the coefficients \(A_{0,m}\), \(A_{x,m}\), \(A_{\sigma ,m}\), \(A_{q,m}\), and \(A_{\lambda ,m}\) in \(v_{m,t}\).

Substituting (9) into (7), we obtain a representation of \(r_{c,t+1}\) in terms of the state variables \(x_t\), \(\sigma _t^2\), \(q_t\), and \(\lambda _t\); substituting this \(r_{c,t+1}\) into the Euler equation (3) then yields an identity in those state variables. Solving the identity in the same manner as Bansal and Yaron (2004), Bollerslev et al. (2009) and Drechsler and Yaron (2011), we can derive the equilibrium solutions for the four coefficients as follows:

$$\begin{aligned} \begin{aligned} \displaystyle A_x&= \frac{\gamma -1}{\theta (\kappa _1 \rho _x -1)}, \\ A_\sigma&= - \frac{1}{2} \frac{(1-\gamma )^2 \varphi _{\eta }^2 + \theta ^2 \kappa _1^2 A_x^2 \varphi _e^2}{\theta (\kappa _1 \rho _\sigma - 1)}, \\ A_\lambda&= \frac{2-\exp (\frac{1}{2} \theta ^2 \kappa _1^2 A_x^2 \delta _x^2) - \exp (\frac{1}{2} \theta ^2 \kappa _1^2 A_\sigma ^2 \delta _{\sigma ^2}^2)}{\theta (\kappa _1 \rho _\lambda - 1)}, \\ A_q&\text { is a solution of the quadratic equation presented below:} \\&\theta A_q (\kappa _1 \rho _q - 1) + \frac{\theta ^2 \kappa _1^2}{2} \Bigl [ A_\sigma ^2 + A_q^2 \varphi _\xi ^2 + 2 A_q A_\lambda \varphi _\xi \varphi _u \rho + A_\lambda ^2 \varphi _u^2 \Bigr ] = 0. \end{aligned} \end{aligned}$$
(11)
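The expressions in (11) are straightforward to evaluate numerically. The following sketch does so for a set of hypothetical parameter values (not a calibration from the text) and returns both roots of the quadratic for \(A_q\), since (11) does not say which root is selected; the function name `a_coefficients` and the dictionary keys are illustrative.

```python
import math


def a_coefficients(p):
    """Evaluate (11): A_x, A_sigma, A_lambda and both real roots for A_q."""
    theta = (1.0 - p["gamma"]) / (1.0 - 1.0 / p["psi"])
    k1 = p["kappa1"]
    a_x = (p["gamma"] - 1.0) / (theta * (k1 * p["rho_x"] - 1.0))
    a_sig = -0.5 * ((1.0 - p["gamma"]) ** 2 * p["phi_eta"] ** 2
                    + theta ** 2 * k1 ** 2 * a_x ** 2 * p["phi_e"] ** 2) \
        / (theta * (k1 * p["rho_sig"] - 1.0))
    half = 0.5 * theta ** 2 * k1 ** 2
    a_lam = (2.0 - math.exp(half * a_x ** 2 * p["delta_x"] ** 2)
             - math.exp(half * a_sig ** 2 * p["delta_sig"] ** 2)) \
        / (theta * (k1 * p["rho_lam"] - 1.0))
    # Quadratic a*A_q^2 + b*A_q + c = 0 obtained by collecting terms in (11).
    a = half * p["phi_xi"] ** 2
    b = theta * (k1 * p["rho_q"] - 1.0) + 2.0 * half * a_lam * p["phi_xi"] * p["phi_u"] * p["rho"]
    c = half * (a_sig ** 2 + a_lam ** 2 * p["phi_u"] ** 2)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("no real solution for A_q at these parameter values")
    roots = sorted(((-b - math.sqrt(disc)) / (2.0 * a), (-b + math.sqrt(disc)) / (2.0 * a)))
    return {"A_x": a_x, "A_sigma": a_sig, "A_lambda": a_lam, "A_q_roots": roots}


# Hypothetical parameter values with gamma > 1 and psi > 1.
params = {
    "gamma": 10.0, "psi": 1.5, "kappa1": 0.9,
    "rho_x": 0.979, "rho_sig": 0.978, "rho_q": 0.8, "rho_lam": 0.9,
    "phi_eta": 1.0, "phi_e": 0.044, "phi_xi": 5e-4, "phi_u": 1e-3, "rho": 0.5,
    "delta_x": 0.005, "delta_sig": 1e-3,
}
coef = a_coefficients(params)
print(coef)
```

At these values both roots for \(A_q\) are negative. A real solution exists only when the discriminant is nonnegative, which restricts how large \(\varphi _\xi \) can be relative to the other parameters.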

Considering the expressions of (11), the following proposition can be proven easily:

Proposition 1

If \(\gamma >1\) and \(\psi >1\), then, \(A_x>0\), \(A_\sigma <0\), \(A_q<0\), and \(A_\lambda <0\).

The above proposition suggests that if the IES and risk aversion are higher than 1, a rise in each of the state variables of \(\sigma _t^2\), \(q_t\), and \(\lambda _t\) lowers the price-consumption ratio.

Having solved for \(r_{c,t+1}\) with the four parameters derived above, we can substitute it (and \(\triangle c_{t+1} = g_{t+1}\)) into \(m_{t+1}\) to obtain an expression for the conditional innovation to the log pricing kernel at time \(t+1\):

$$\begin{aligned}&\displaystyle m_{t+1} - \mathbb {E}_t [m_{t+1}] \nonumber \\&\quad = \theta \log \delta - \frac{\theta }{\psi } \bigtriangleup c_{t+1} +(\theta -1) r_{c,t+1}\nonumber \\&\qquad -\, \mathbb {E}_t \Bigl [ \theta \log \delta - \frac{\theta }{\psi } \bigtriangleup c_{t+1} +(\theta -1) r_{c,t+1} \Bigr ] \nonumber \\&\quad = - \Lambda ^t (G_t z_{t+1} + J_{t+1} - \mathbb {E}_t [J_{t+1}]), \end{aligned}$$
(12)

where

$$\begin{aligned} \begin{aligned} \displaystyle \Lambda&\equiv \begin{pmatrix} \gamma&(1-\theta ) \kappa _1 A_x&(1-\theta ) \kappa _1 A_{\sigma }&(1-\theta ) \kappa _1 A_q&(1-\theta ) \kappa _1 A_\lambda&0 \end{pmatrix} ^t, \\ G_t&\equiv \begin{pmatrix} \varphi _\eta \sigma _t &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \varphi _e \sigma _t &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \sqrt{q_t} &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \varphi _\xi \sqrt{q_t} &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \rho \varphi _u \sqrt{q_t} &{}\quad \varphi _u \sqrt{1-\rho ^2} \sqrt{q_t} &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \varphi _\zeta \sigma _t \end{pmatrix}, \\ z_{t+1}&\equiv \begin{pmatrix} \eta _{t+1}&e_{t+1}&w_{t+1}&\xi _{t+1}&u_{t+1}&\zeta _{t+1} \end{pmatrix} ^t, \\ J_{t+1}&\equiv \begin{pmatrix} 0&J_{x, t+1}&J_{\sigma ^2, t+1}&\quad 0&\quad 0&\quad 0 \end{pmatrix} ^t, \\ \mathbb {E}_t [J_{t+1}]&\equiv \begin{pmatrix} 0&\mathbb {E}_t [J_{x,t+1}]&\mathbb {E}_t [J_{\sigma ^2,t+1}]&\quad 0&\quad 0&\quad 0 \end{pmatrix} ^t. \end{aligned} \end{aligned}$$
(13)

\(\Lambda \) can be interpreted as the price of risk for Gaussian shocks and also the sensitivity of the IMRS to the jump shocks. From the expression of \(\Lambda \), one can see that the prices of risks are determined by the A coefficients, that is, \(A_x\), \(A_\sigma \), \(A_q\), and \(A_\lambda \). The expression of \(\Lambda \) also shows that the signs of the risk prices depend on the signs of the A coefficients and \((1-\theta )\). In particular, when \(\gamma =\frac{1}{\psi }\), \(\theta =1\), and we are in the case of constant relative risk aversion (CRRA) preferences, it is clear that only the transient shock to consumption \(z_{c,t+1} \equiv \eta _{t+1}\) is priced, and prices do not separately reflect the risk of shocks to \(x_t\) (long-run risk), \(\sigma _t^2\) (volatility-related risk), \(q_t\) (variance-of-variance-related risk), and \(\lambda _t\) (jump intensity-related risk).

In the discussion and calibrations explored below, we especially focus on the case in which the agent’s risk aversion \(\gamma \) and the IES \(\psi \) are both greater than 1, which implies that \(A_x>0\), \(A_\sigma <0\), \(A_q<0\), and \(A_\lambda <0\) by the proposition provided above. Thus, positive shocks to long-run growth decrease the IMRS, while positive shocks to the levels of the other state variables, \(\sigma _t^2\), \(q_t\), and \(\lambda _t\), increase the IMRS. Note that in this case, since \((1-\theta ) > 0\), each of the A coefficients has the same sign as the corresponding price of risk.
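The sign pattern of the prices of risk can be made concrete with a short sketch that assembles \(\Lambda \) from (13). The A coefficients passed in are hypothetical placeholders with the signs given by Proposition 1, and `prices_of_risk` is an illustrative name.

```python
def prices_of_risk(gamma, psi, kappa1, a_x, a_sigma, a_q, a_lambda):
    """Vector Lambda of (13), ordered as (eta, e, w, xi, u, zeta) shocks."""
    theta = (1.0 - gamma) / (1.0 - 1.0 / psi)
    scale = (1.0 - theta) * kappa1
    return [gamma, scale * a_x, scale * a_sigma, scale * a_q, scale * a_lambda, 0.0]


# Hypothetical A coefficients with A_x > 0 and A_sigma, A_q, A_lambda < 0.
lam_ez = prices_of_risk(10.0, 1.5, 0.9, 2.8, -13.9, -8000.0, -0.02)
# CRRA case gamma = 1/psi: theta = 1, so only the consumption shock is priced.
lam_crra = prices_of_risk(2.0, 0.5, 0.9, 2.8, -13.9, -8000.0, -0.02)
```

In the first case the price of long-run growth risk is positive and the prices of the three uncertainty-related risks are negative; in the second all entries except the first vanish.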

To study the risk premiums in higher-order moments of the market returns, we first need to solve for the market return. A share in the market is modeled as a claim to a dividend with growth process given by \(g_{d,t}\). To solve for the price of a market share, we proceed along the same lines as for the consumption claim and solve for \(v_{m,t+1}\), the log price-dividend ratio of the market, by using the conjecture (10), the Campbell and Shiller (1988) approximation (8), and the Euler equation (3) (see Footnote 6).

With the equilibrium solutions for the parameters of \(A_{x,m}\), \(A_{\sigma ,m}\), \(A_{q,m}\), and \(A_{\lambda ,m}\) in (10), we can obtain an expression for \(r_{m,t+1}\) in terms of the state variables and its innovations by substituting the expression for \(v_{m,t(+1)}\) into (8):

$$\begin{aligned} r_{m,t+1}= & {} \kappa _{0,m} + \kappa _{1,m} A_{0,m} + \kappa _{1,m} A_{\sigma , m} \mu _\sigma + \kappa _{1,m} A_{q,m} \mu _q + \kappa _{1,m} A_{\lambda ,m} \mu _\lambda -A_{0,m} + \mu _d \nonumber \\&+\, (\kappa _{1,m} A_{x,m} \rho _x - A_{x,m} + \rho _d) x_t \nonumber \\&+\, (\kappa _{1,m} A_{\sigma ,m} \rho _\sigma - A_{\sigma ,m}) {\sigma _t}^2 \nonumber \\&+\, (\kappa _{1,m} A_{q,m} \rho _q - A_{q,m}) q_t \nonumber \\&+\, (\kappa _{1,m} A_{\lambda ,m} \rho _\lambda - A_{\lambda ,m}) \lambda _t \nonumber \\&+\, \kappa _{1,m} A_{x,m} \varphi _e \sigma _t e_{t+1} + \kappa _{1,m} A_{\sigma ,m} \sqrt{q_t} w_{t+1} \nonumber \\&+\, \kappa _{1,m} (A_{q,m} \varphi _\xi + A_{\lambda ,m} \varphi _u \rho ) \sqrt{q_t} \xi _{t+1} \nonumber \\&+\, \kappa _{1,m} A_{\lambda ,m} \varphi _u \sqrt{1-\rho ^2} \sqrt{q_t} u_{t+1} + \varphi _\zeta \sigma _t \zeta _{t+1} \nonumber \\&+\, \kappa _{1,m} A_{x,m} J_{x,t+1} + \kappa _{1,m} A_{\sigma ,m} J_{\sigma ^2,t+1} \nonumber \\= & {} r_0 + (B_r^t F - A_m^t) Y_t + B_r^t G_t z_{t+1} + B_r^t J_{t+1}, \end{aligned}$$
(14)

where

$$\begin{aligned} \begin{aligned} \displaystyle r_0&\equiv \kappa _{0,m} + (\kappa _{1,m}-1) A_{0,m} + \mu _d + \kappa _{1,m} A_{\sigma , m} \mu _\sigma + \kappa _{1,m} A_{q,m} \mu _q + \kappa _{1,m} A_{\lambda ,m} \mu _\lambda , \\ B_r&\equiv \kappa _{1,m} A_m + e_d, \\ A_m&\equiv \begin{pmatrix} 0 \\ A_{x,m} \\ A_{\sigma ,m} \\ A_{q,m} \\ A_{\lambda ,m} \\ 0 \end{pmatrix}, e_d \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, F \equiv \begin{pmatrix} 0 &{}\quad \quad 1 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \rho _x &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \rho _\sigma &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \rho _q &{}\quad \quad 0 &{}\quad \quad 0 \\ 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \rho _\lambda &{}\quad \quad 0 \\ 0 &{}\quad \rho _d &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 &{}\quad \quad 0 \end{pmatrix}, Y_t \equiv \begin{pmatrix} g_t \\ x_{t} \\ {\sigma _t}^2 \\ q_t \\ \lambda _t \\ g_{d,t} \end{pmatrix}. \end{aligned} \end{aligned}$$
(15)
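The compact form in the last line of (14) can be checked against the expanded state loadings using the definitions in (15). The sketch below does this for hypothetical values of \(\kappa _{1,m}\), the persistence parameters, and the entries of \(A_m\); `return_loadings` is an illustrative helper name.

```python
def return_loadings(kappa1m, a_m, rho_x, rho_sig, rho_q, rho_lam, rho_d):
    """Row vector B_r^t F - A_m^t for the state order (g, x, sigma^2, q, lambda, g_d)."""
    e_d = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    b_r = [kappa1m * a + e for a, e in zip(a_m, e_d)]
    f = [
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, rho_x, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, rho_sig, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, rho_q, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, rho_lam, 0.0],
        [0.0, rho_d, 0.0, 0.0, 0.0, 0.0],
    ]
    return [sum(b_r[i] * f[i][j] for i in range(6)) - a_m[j] for j in range(6)]


# Hypothetical inputs; a_m = (0, A_xm, A_sigma_m, A_qm, A_lambda_m, 0).
k1m = 0.99
a_m = [0.0, 40.0, -300.0, -5000.0, -0.5, 0.0]
loadings = return_loadings(k1m, a_m, 0.979, 0.978, 0.8, 0.9, 3.0)

# Coefficients on x_t, sigma_t^2, q_t, lambda_t read off the expanded form of (14).
expanded = [
    0.0,
    k1m * a_m[1] * 0.979 - a_m[1] + 3.0,
    k1m * a_m[2] * 0.978 - a_m[2],
    k1m * a_m[3] * 0.8 - a_m[3],
    k1m * a_m[4] * 0.9 - a_m[4],
    0.0,
]
```

The two vectors agree entry by entry, which confirms that the first and last columns of \(F\) contribute nothing to the loadings and that \(\rho _d\) enters only through \(e_d\).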

2.3 Risk premiums in higher-order moments in equilibrium

Before investigating the risk premiums in higher-order moments in equilibrium, we need to explain further the jump dynamics and the features of the pricing kernel introduced above.

To handle the jumps, we introduce some notation. \(\psi _k(u_k) = \mathbb {E} [\exp (u_k \epsilon _k)]\) (k is x or \(\sigma ^2\)) denotes the moment-generating function (mgf) of the jump size \(\epsilon _k\). The mgf for the jump component of k, \(\mathbb {E} [\exp (u_k J_{k, t+1})]\), then equals \(\exp (\Psi _{t,k} (u_k))\), where \(\Psi _{t,k} (u_k)=\lambda _{k,t}(\psi _k(u_k)-1)\). \(\Psi _{t,k}\) is called the cumulant-generating function (cgf) of \(J_{k, t+1}\) and is a very helpful tool for calculating asset pricing moments, because its n-th derivative evaluated at 0 equals the n-th cumulant of \(J_{k, t+1}\), which coincides with the n-th central moment for \(n=2\) and \(n=3\).
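The use of the cgf can be illustrated for the normal jump sizes of Sect. 2.1, for which \(\psi _k(u) = \exp (\frac{1}{2} \delta _k^2 u^2)\). The sketch below differentiates the cgf numerically at zero; the second derivative gives the variance of the jump component and the third gives its third central moment. The intensity and jump-size values, the step size, and the function names are assumptions made for illustration.

```python
import math


def jump_cgf(u, lam, delta):
    """Psi(u) = lam * (psi(u) - 1) with psi(u) = exp(0.5 * delta^2 * u^2)."""
    return lam * math.expm1(0.5 * delta ** 2 * u ** 2)


def second_derivative_at_zero(f, h=1e-2):
    """Central finite-difference approximation of f''(0)."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / h ** 2


def third_derivative_at_zero(f, h=1e-2):
    """Central finite-difference approximation of f'''(0)."""
    return (f(2.0 * h) - 2.0 * f(h) + 2.0 * f(-h) - f(-2.0 * h)) / (2.0 * h ** 3)


lam, delta = 0.5, 0.2  # hypothetical intensity and jump-size standard deviation


def cgf(u):
    return jump_cgf(u, lam, delta)


jump_variance = second_derivative_at_zero(cgf)     # close to lam * delta^2 = 0.02
jump_third_moment = third_derivative_at_zero(cgf)  # zero: the cgf is even in u
```

Under the physical measure the mean-zero normal jump sizes therefore add variance to the jump component but no skewness.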

Regarding the features of the pricing kernel, we can show the following, in line with Drechsler and Yaron (2011). Let us set the Radon-Nikodym derivative \(\frac{d \mathbb {Q}}{d \mathbb {P}} = \frac{M_{t+1}}{\mathbb {E}_t [M_{t+1}]}\), where \(\mathbb {P}\) is the physical probability measure and \(\mathbb {Q}\) is the risk-neutral probability measure in our economy. From (12), we have \(\frac{M_{t+1}}{\mathbb {E}_t [M_{t+1}]} \propto \exp (- \Lambda ^t (G_t z_{t+1} + J_{t+1}))\). Since \(z_{t+1}\) and \(J_{t+1}\) are independent, we can treat their measure transformations between \(\mathbb {P}\) and \(\mathbb {Q}\) separately. As a consequence, Drechsler and Yaron (2011) show that

$$\begin{aligned} \begin{aligned} \displaystyle z_{t+1} \mathop {\sim }\limits ^{\mathbb {Q}} N(-G_t^{'} \Lambda , I), \end{aligned} \end{aligned}$$
(16)

where I is the identity matrix in \(\mathbb {R}^{6 \times 6}\). That is to say that, under \(\mathbb {Q}\), \(z_{t+1}\) is still a vector of independent normals with unit variances, but with a shift in the mean.
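The mean shift in (16) follows from completing the square. As a brief sketch, write \(\mu _t \equiv G_t^{'} \Lambda \), a shorthand introduced here only for illustration, so that \(\Lambda ^t G_t z = \mu _t^{'} z\). Under \(\mathbb {P}\), the density of \(z_{t+1}\) is proportional to \(\exp (-\frac{1}{2} z^{'} z)\), and multiplying by the Gaussian part of the Radon-Nikodym derivative gives

$$\begin{aligned} \begin{aligned} \displaystyle \exp \Big ( - \mu _t^{'} z - \frac{1}{2} z^{'} z \Big ) = \exp \Big ( - \frac{1}{2} (z + \mu _t)^{'} (z + \mu _t) \Big ) \exp \Big ( \frac{1}{2} \mu _t^{'} \mu _t \Big ). \end{aligned} \end{aligned}$$

The last factor does not depend on z and is absorbed by the normalization, so the density under \(\mathbb {Q}\) is that of a normal vector with mean \(-\mu _t = -G_t^{'} \Lambda \) and identity covariance matrix.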

For \(J_{t+1}\), we can likewise proceed by transforming the probability density function directly. Following Drechsler and Yaron (2011), Proposition 9.6 in Cont and Tankov (2004) shows that under \(\mathbb {Q}\), the \(J_{k,t+1}\) are still compound Poisson processes, but with cgf given by

$$\begin{aligned} \begin{aligned} \displaystyle \Psi _{t,k}^{\mathbb {Q}} (u_k) = \lambda _{k,t} \psi _k (- \Lambda _k) \Big ( \frac{\psi _k(u_k-\Lambda _k)}{\psi _k(-\Lambda _k)}-1 \Big ), \end{aligned} \end{aligned}$$
(17)

where \(k=x\) or \(k=\sigma ^2\), \(\Lambda _x\) denotes the price of risk for the LRR factor \(x_t\), that is, \((1-\theta ) \kappa _1 A_x\), and \(\Lambda _{\sigma ^2}\) denotes the price of risk for the variance of consumption growth rate, that is, \((1-\theta ) \kappa _1 A_\sigma \) (see (13)). In the following discussion, we use the facts mentioned above to calculate the higher-order moments of the market return and to investigate the risk premiums in the moments.
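For the normal jump sizes of Sect. 2.1, the cgf (17) can be evaluated in closed form. The following worked step is a sketch derived from (17) and the assumption \(\epsilon _k \sim N(0,\delta _k^2)\); the symbols \(\lambda _{k,t}^{\mathbb {Q}}\) and \(m_k^{\mathbb {Q}}\) are shorthand introduced here for illustration. Since \(\psi _k(u_k) = \exp (\frac{1}{2} \delta _k^2 u_k^2)\),

$$\begin{aligned} \begin{aligned} \displaystyle \frac{\psi _k(u_k-\Lambda _k)}{\psi _k(-\Lambda _k)}&= \exp \Big ( - \Lambda _k \delta _k^2 u_k + \frac{1}{2} \delta _k^2 u_k^2 \Big ), \\ \lambda _{k,t}^{\mathbb {Q}}&\equiv \lambda _{k,t} \psi _k(-\Lambda _k) = \lambda _{k,t} \exp \Big ( \frac{1}{2} \Lambda _k^2 \delta _k^2 \Big ). \end{aligned} \end{aligned}$$

The first line is the mgf of a normal distribution with mean \(m_k^{\mathbb {Q}} \equiv -\Lambda _k \delta _k^2\) and variance \(\delta _k^2\), so under \(\mathbb {Q}\) the jump sizes remain normal but acquire a nonzero mean, and the jump intensity is scaled up by \(\exp (\frac{1}{2} \Lambda _k^2 \delta _k^2) \ge 1\). The third derivative of \(\Psi _{t,k}^{\mathbb {Q}}\) at zero is then \(\lambda _{k,t}^{\mathbb {Q}} \big ( (m_k^{\mathbb {Q}})^3 + 3 m_k^{\mathbb {Q}} \delta _k^2 \big )\), which is nonzero whenever \(\Lambda _k \ne 0\), whereas the corresponding quantity under \(\mathbb {P}\) is zero. When \(\gamma >1\) and \(\psi >1\), so that \(\Lambda _x>0\) and \(\Lambda _{\sigma ^2}<0\), jumps in \(x_t\) have a negative mean and jumps in \(\sigma _t^2\) have a positive mean under \(\mathbb {Q}\).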

2.3.1 The variance risk premium in equilibrium

According to Bollerslev et al. (2009) and Drechsler and Yaron (2011), the variance risk premium in equilibrium, \(vp_t\), is defined by

$$\begin{aligned} \begin{aligned} \displaystyle vp_t \equiv \mathbb {E}_t^\mathbb {Q} [\mathrm{Var}_{t+1}^\mathbb {Q} (r_{m,t+2})] - \mathbb {E}_t^\mathbb {P} [\mathrm{Var}_{t+1}^\mathbb {P} (r_{m,t+2})], \end{aligned} \end{aligned}$$
(18)

where \(\mathrm{Var}_{t+1}^\mathbb {P}\) (\(\mathrm{Var}_{t+1}^\mathbb {Q}\)) is the variance operator under the physical (risk-neutral) probability measure at time \(t+1\). We have the following proposition:

Proposition 2

(The Conditional Variance of the Market Return) The conditional variance of the market return \(r_{m,t+2}\) at time \(t+1\) under \(\mathbb {P}\) can be expressed as follows:

$$\begin{aligned} \begin{aligned} \displaystyle \displaystyle \mathrm{Var}_{t+1}^\mathbb {P} (r_{m,t+2}) = B_r^t (H_{\sigma ^2} \sigma _{t+1}^2 + H_q q_{t+1}) B_r + {B_r^2}^t \mathrm{diag} \Big ( \psi ^{(2)} (0) \Big ) \Pi _{t+1}, \end{aligned} \end{aligned}$$
(19)

where

$$\begin{aligned} \begin{aligned}&\displaystyle H_{\sigma ^2} \equiv \begin{pmatrix} \varphi _\eta ^2 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \varphi _e^2 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \varphi _\zeta ^2 \end{pmatrix}, H_q \equiv \begin{pmatrix} 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 1 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \varphi _\xi ^2 &{}\quad \!\! \rho \varphi _\xi \varphi _u &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \rho \varphi _\xi \varphi _u &{}\quad \!\! \varphi _u^2 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \end{pmatrix},\\&\mathrm{diag} \Big ( \psi ^{(2)} (0) \Big ) \equiv \begin{pmatrix} 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \psi _x^{(2)} (0) &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \psi _{\sigma ^2}^{(2)} (0) &{}\quad \!\! 
\quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \\ 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 &{}\quad \!\! \quad 0 \end{pmatrix}, \Pi _{t+1} \equiv \begin{pmatrix} 0 \\ \lambda _{x, t+1} \\ \lambda _{\sigma ^2, t+1} \\ 0 \\ 0 \\ 0 \end{pmatrix}. \nonumber \end{aligned} \end{aligned}$$

Proof

See the “Appendix”. \(\square \)
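As an illustration of how (19) is evaluated, the sketch below builds \(H_{\sigma ^2}\) and \(H_q\) and computes the conditional variance for given state values. It assumes that \(B_r^2\) denotes the element-wise square of \(B_r\) and that \(\psi _k^{(2)}(0)\) is supplied directly as a number; all parameter values are hypothetical and do not come from the calibration.

```python
def quad_form(x, m, y):
    """Compute x' M y for plain Python lists."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def h_sigma2(phi_eta, phi_e, phi_zeta):
    """H_{sigma^2}: nonzero entries at positions (1,1), (2,2), and (6,6)."""
    h = [[0.0] * 6 for _ in range(6)]
    h[0][0], h[1][1], h[5][5] = phi_eta ** 2, phi_e ** 2, phi_zeta ** 2
    return h

def h_q(phi_xi, phi_u, rho):
    """H_q: unit entry at (3,3) and the correlated block in rows and columns 4-5."""
    h = [[0.0] * 6 for _ in range(6)]
    h[2][2] = 1.0
    h[3][3] = phi_xi ** 2
    h[3][4] = h[4][3] = rho * phi_xi * phi_u
    h[4][4] = phi_u ** 2
    return h

def conditional_variance(b_r, hs, hq, sigma2, q, psi2_x, psi2_s, lam_x, lam_s):
    """Equation (19): Gaussian part plus jump part.
    Assumes B_r^2 is the element-wise square of B_r."""
    gaussian = sigma2 * quad_form(b_r, hs, b_r) + q * quad_form(b_r, hq, b_r)
    jumps = b_r[1] ** 2 * psi2_x * lam_x + b_r[2] ** 2 * psi2_s * lam_s
    return gaussian + jumps

# Hypothetical loadings and state values, for illustration only.
b_r = [0.5, 1.2, -0.8, 0.3, -0.4, 1.0]
variance = conditional_variance(
    b_r, h_sigma2(0.1, 0.2, 0.3), h_q(0.5, 0.4, 0.25),
    sigma2=2.0, q=3.0, psi2_x=0.01, psi2_s=0.02, lam_x=0.5, lam_s=0.7,
)
print(variance)
```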

Under the risk-neutral probability measure \(\mathbb {Q}\), the conditional variance of the market return \(r_{m,t+2}\) at time \(t+1\) also can be obtained in the same manner demonstrated above. As a consequence, we can show the following proposition based on the definition of the variance risk premium (18).

Proposition 3

(The Variance Risk Premium in Equilibrium) In equilibrium, the variance risk premium at time t, \(vp_t\), is linear in the variance-of-variance, \(q_t\), and the jump intensity, \(\lambda _t\), and is represented as follows:

$$\begin{aligned} \displaystyle vp_t&= \beta _{vp,c} + \beta _{vp,q} q_t + \beta _{vp,\lambda } \lambda _t, \\ where \nonumber \\ \beta _{vp,c}&\equiv \Bigl [ l_x B_r^2(2) (\psi _x^{(2)} (-\Lambda _x) - \psi _x^{(2)} (0)) + l_{\sigma ^2} B_r^2(3) (\psi _{\sigma ^2}^{(2)} (-\Lambda _{\sigma ^2}) - \psi _{\sigma ^2}^{(2)} (0)) \Bigr ] \mu _\lambda , \nonumber \\ \beta _{vp,q}&\equiv - B_r^t \Bigl [ \Lambda _{\sigma ^2} H_{\sigma ^2} + \varphi _\xi (\varphi _\xi \Lambda _q + \rho \varphi _u \Lambda _\lambda ) H_q \Bigr ] B_r \nonumber \\&\quad - \varphi _u (\rho \varphi _\xi \Lambda _q + \varphi _u \Lambda _\lambda ) (l_x B_r^2(2) \psi _x^{(2)} (- \Lambda _x) + l_{\sigma ^2} B_r^2(3) \psi _{\sigma ^2}^{(2)} (- \Lambda _{\sigma ^2})), \nonumber \\ \beta _{vp,\lambda }&\equiv B_r^t H_{\sigma ^2} B_r \psi _{\sigma ^2}^{(1)} (-\Lambda _{\sigma ^2}) \nonumber \\&\quad + \Bigl [ l_x B_r^2(2) (\psi _x^{(2)} (-\Lambda _x) - \psi _x^{(2)} (0)) + l_{\sigma ^2} B_r^2(3) (\psi _{\sigma ^2}^{(2)} (-\Lambda _{\sigma ^2}) - \psi _{\sigma ^2}^{(2)} (0)) \Bigr ] \rho _\lambda . \nonumber \end{aligned}$$
(20)

Proof

See the “Appendix”. \(\square \)

A number of interesting implications arise from expression (20). In particular, any temporal variation in the endogenously generated variance risk premium is solely due to the variance-of-variance \(q_t\) and the jump intensity \(\lambda _t\). Moreover, provided that \(\theta < 0\), \(\Lambda _x > 0\), and \(\Lambda _{\sigma ^2} < 0\), as would be implied by \(\gamma > 1\) and \(\psi > 1\), the factor loading on the jump intensity, \(\beta _{vp,\lambda }\), is guaranteed to be positive, whereas the loading on the variance-of-variance, \(\beta _{vp,q}\), can be either positive or negative in general. However, if the correlation \(\rho \) between the dynamics of the variance-of-variance and those of the jump intensity is positive, then \(\beta _{vp,q}\) is also guaranteed to be positive because \(\Lambda _q<0\) and \(\Lambda _\lambda <0\).

2.3.2 The skewness risk premium in equilibrium

Following the same approach used to derive expression (19) in the previous subsection, we can also derive the representations for the skewness of the market return under \(\mathbb {P}\) and \(\mathbb {Q}\), respectively, as follows:

$$\begin{aligned} \begin{aligned} \displaystyle \mathrm{Skew}_t^\mathbb {P} (r_{m,t+1})&= {B_r^3}^t \mathrm {diag} (\psi ^{(3)} (0)) \Pi _t, \\ \mathrm{Skew}_t^\mathbb {Q} (r_{m,t+1})&= {B_r^3}^t \mathrm {diag} (\psi ^{(3)} (- \Lambda )) \Pi _t, \end{aligned} \end{aligned}$$
(21)

where

$$\begin{aligned} \begin{aligned}&\displaystyle B_r^3 \equiv \begin{pmatrix} B_r^3\;(1) \\ B_r^3\;(2) \\ B_r^3\;(3) \\ B_r^3\;(4) \\ B_r^3\;(5) \\ B_r^3\;(6) \end{pmatrix}, \mathrm{diag} \Big ( \psi ^{(3)} (0) \Big ) \equiv \begin{pmatrix} 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{} \psi _x^{(3)} (0) &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{} \psi _{\sigma ^2}^{(3)} (0) &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \end{pmatrix}, \\&\mathrm{diag} \Big ( \psi ^{(3)} (-\Lambda ) \Big ) \equiv \begin{pmatrix} 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{} \psi _x^{(3)} (- \Lambda _x) &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{} \psi _{\sigma ^2}^{(3)} (- \Lambda _{\sigma ^2}) &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \\ 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 &{}\quad \! 0 \end{pmatrix}. \nonumber \end{aligned} \end{aligned}$$

In this paper, we define the skewness risk premium in equilibrium at time t, \(skp_t\), in the same manner as the variance risk premium in (18):

$$\begin{aligned} \begin{aligned} \displaystyle skp_t \equiv \mathbb {E}_t^\mathbb {Q} [\mathrm{Skew}_{t+1}^\mathbb {Q} (r_{m,t+2})] - \mathbb {E}_t^\mathbb {P} [\mathrm{Skew}_{t+1}^\mathbb {P} (r_{m,t+2})]. \end{aligned} \end{aligned}$$
(22)

Substituting (21) into (22), the explicit representation for the skewness risk premium can be obtained as follows:

$$\begin{aligned} \begin{aligned} \displaystyle skp_t \equiv {B_r^3}^t \mathrm{diag} (\psi ^{(3)} (- \Lambda )) \mathbb {E}_t^\mathbb {Q} [\Pi _{t+1}] - {B_r^3}^t \mathrm{diag} (\psi ^{(3)} (0)) \mathbb {E}_t^\mathbb {P} [\Pi _{t+1}]. \end{aligned} \end{aligned}$$
(23)

We also find a number of interesting implications from expressions (21) and (23). First, if there are no jumps to fundamentals in the economy, that is, if \(\Pi _t \equiv 0\), the conditional skewness of the market return is zero by (21). Thus, the existence of nonzero skewness of the market return depends crucially on the existence of jumps to fundamentals in the economy. Second, any temporal variation in the endogenously generated skewness and skewness risk premium is solely due to the temporal variation in the jump intensity process \(\lambda _t\). For example, if the jump intensity is constant, then the skewness (under \(\mathbb {P}\) and \(\mathbb {Q}\)) and the skewness risk premium are constant by (21) and (23). Third, since \(A_{\sigma }<0\) by Proposition 1, when jumps to the variance of the consumption growth rate exist, that is, when \(\lambda _{\sigma ^2,t}>0\), it follows easily from (21) that the risk-neutral skewness at time t, \(\mathrm{Skew}_t^\mathbb {Q} (r_{m,t+1})\), is negative. Finally, (23) also implies that when either \(\lambda _{x,t}>0\) or \(\lambda _{\sigma ^2,t}>0\) holds, the skewness risk premium at time t, \(skp_t\), in equilibrium is also negative because \(A_x>0\) and \(A_\sigma <0\).
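The role of jumps in (21) can be illustrated with a short sketch. It assumes, purely for illustration, normally distributed jump sizes, so that the third derivative of the jump-size moment-generating function is available in closed form, and it treats \(B_r^3\) as the element-wise cube of \(B_r\). Only the two jump components (\(x\) and \(\sigma ^2\)) enter, and all numerical values are hypothetical.

```python
import math

def normal_mgf_d3(u, mu, sigma):
    """Third derivative of the N(mu, sigma^2) mgf exp(mu*u + sigma^2*u^2/2)."""
    m = mu + sigma ** 2 * u
    return (m ** 3 + 3.0 * m * sigma ** 2) * math.exp(mu * u + 0.5 * sigma ** 2 * u ** 2)

def conditional_skewness(loadings, intensities, jump_params, prices_of_risk=(0.0, 0.0)):
    """Equation (21): sum over k in (x, sigma^2) of B_r(k)^3 * psi_k'''(-Lambda_k) * lambda_k.
    Zero prices of risk give the P-measure value; nonzero ones give the Q-measure value."""
    total = 0.0
    for b_k, lam_k, (mu_k, sigma_k), price_k in zip(
        loadings, intensities, jump_params, prices_of_risk
    ):
        total += b_k ** 3 * normal_mgf_d3(-price_k, mu_k, sigma_k) * lam_k
    return total

# Hypothetical inputs: (B_r(2), B_r(3)), jump intensities, and jump-size (mean, std).
loadings = (2.0, -1.5)
jump_params = ((0.1, 0.2), (0.05, 0.1))
skew_p = conditional_skewness(loadings, (0.5, 0.3), jump_params)
skew_q = conditional_skewness(loadings, (0.5, 0.3), jump_params, prices_of_risk=(1.0, -2.0))
no_jumps = conditional_skewness(loadings, (0.0, 0.0), jump_params)
print(skew_p, skew_q, no_jumps)  # the last value is zero: no jumps, no skewness
```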

Based on definition (22), we state the following proposition on the representation of the skewness risk premium in equilibrium.

Proposition 4

(The Skewness Risk Premium in Equilibrium) In equilibrium, the skewness risk premium at time t, \(skp_t\), is linear in the variance-of-variance, \(q_t\), and the jump intensity, \(\lambda _t\), and is represented as follows:

$$\begin{aligned} \begin{aligned} \displaystyle skp_t&= \beta _{sp,c} + \beta _{sp,q} q_t + \beta _{sp,\lambda } \lambda _t, \\ where \\ \beta _{sp,c}&\equiv \Bigl [ l_x B_r^3(2) {\psi _x}^{(3)} (-\Lambda _x) + l_{\sigma ^2} B_r^3(3) {\psi _{\sigma ^2}}^{(3)} (-\Lambda _{\sigma ^2}) \Bigr ] \mu _\lambda , \\ \beta _{sp,q}&\equiv \Bigl [ l_x B_r^3(2) {\psi _x}^{(3)} (-\Lambda _x) + l_{\sigma ^2} B_r^3(3) {\psi _{\sigma ^2}}^{(3)} (-\Lambda _{\sigma ^2}) \Bigr ] \varphi _u (-\rho \varphi _\xi \Lambda _q - \varphi _u \Lambda _\lambda ), \\ \beta _{sp,\lambda }&\equiv \Bigl [ l_x B_r^3(2) {\psi _x}^{(3)} (-\Lambda _x) + l_{\sigma ^2} B_r^3(3) {\psi _{\sigma ^2}}^{(3)} (-\Lambda _{\sigma ^2}) \Bigr ] \rho _\lambda . \end{aligned} \end{aligned}$$
(24)

Proof

Considering (6), (16), and the definition of the moment-generating function, it is trivial to derive the above expression. \(\square \)

From the above proposition, we find that any temporal variation in the endogenously generated skewness risk premium is also solely due to the variance-of-variance \(q_t\) and the jump intensity \(\lambda _t\), as is the case for the variance risk premium (Proposition 3). Moreover, provided that \(\Lambda _x>0\) and \(\Lambda _{\sigma ^2} < 0\), the factor loading on the jump intensity, \(\beta _{sp,\lambda }\), is guaranteed to be negative, whereas the loading on the variance-of-variance, \(\beta _{sp,q}\), can be either positive or negative in general. However, if the correlation \(\rho \) between the dynamics of the variance-of-variance and those of the jump intensity is positive, then \(\beta _{sp,q}\) is also guaranteed to be negative because \(\Lambda _q<0\) and \(\Lambda _\lambda <0\).

Before we turn to the next discussion, it is useful to mention some features of the higher-order moments of the market return and of the risk premiums in them.

First, as mentioned in the introduction, the usual assumption of an affine structure for the jump intensity process \(\lambda _t\) in previous studies such as Drechsler and Yaron (2011), that is, \(\lambda _t = l_0 + l_1 \sigma _t^2\) where \(l_0, l_1 > 0\) and \(\sigma _t^2\) is the variance of the consumption growth rate, cannot explain the empirical contemporaneous relation between monthly stock returns and monthly changes in the option-implied skewness shown by (1). This is because, under such an assumption, we can show analytically that the regression coefficients on \(\bigtriangleup I{\textit{Skew}}_{t+1}\) in (1) should be positive, whereas the empirical evidence points to negative coefficients (see Footnote 7). On the other hand, in our model, the correlation between the one-step-ahead market return, \(r_{m,t+1}\), and the one-step-ahead change in risk-neutral skewness, \(\bigtriangleup Skew_{t+1}^{\mathbb {Q}} \equiv {\textit{Skew}}_{t+1}^{\mathbb {Q}} - {\textit{Skew}}_{t}^{\mathbb {Q}}\), can be derived analytically with (14) and (21) as follows:

$$\begin{aligned} \begin{aligned} \displaystyle \mathrm{Corr } \Big (r_{m,t+1}, \bigtriangleup {\textit{Skew}}_{t+1}^{\mathbb {Q}} \Big )&= K \varphi _u \kappa _{1,m} (\rho A_{q,m} + \varphi _u A_{\lambda ,m}) q_t, \\ \mathrm{where } \quad K&\equiv l_x B_r^3(2) {\psi _x}^{(3)} (-\Lambda _x) + l_{\sigma ^2} B_r^3(3) {\psi _{\sigma ^2}}^{(3)} (-\Lambda _{\sigma ^2}). \nonumber \end{aligned} \end{aligned}$$

From the above expression, we can show that, when \(\rho < - \varphi _u \frac{A_{\lambda ,m}}{A_{q,m}}\), the correlation between \(r_{m,t+1}\) and \(\bigtriangleup {\textit{Skew}}_{t+1}^{\mathbb {Q}}\) is negative because, in the case of \(\gamma > 1\) and \(\psi > 1\), it can be shown that K is negative. This observation is consistent with the empirical fact (1) shown in the introduction. If we instead assume zero correlation between \(\lambda _t\) and \(q_t\), our model cannot generate the negative correlation between the one-step-ahead market return, \(r_{m,t+1}\), and the one-step-ahead change in risk-neutral skewness, \(\bigtriangleup {\textit{Skew}}_{t+1}^{\mathbb {Q}}\). Thus, we emphasize that our model setting with stochastic jump intensity has considerable validity compared with previous studies such as Drechsler and Yaron (2011).
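The sign condition on \(\rho \) can be illustrated with a short sketch of the expression above. The values of \(K\), \(\varphi _u\), \(\kappa _{1,m}\), \(A_{q,m}\), \(A_{\lambda ,m}\), and \(q_t\) below are hypothetical; in particular, the negative signs chosen for \(A_{q,m}\) and \(A_{\lambda ,m}\) are assumptions made for this example and are not taken from the calibration.

```python
def corr_expression(rho, k_const, phi_u, kappa_1m, a_qm, a_lam_m, q_t):
    """K * phi_u * kappa_{1,m} * (rho * A_{q,m} + phi_u * A_{lambda,m}) * q_t."""
    return k_const * phi_u * kappa_1m * (rho * a_qm + phi_u * a_lam_m) * q_t

def rho_threshold(phi_u, a_qm, a_lam_m):
    """Threshold -phi_u * A_{lambda,m} / A_{q,m} that appears in the sign condition."""
    return -phi_u * a_lam_m / a_qm

# Hypothetical parameter values (signs of a_qm and a_lam_m are assumed).
params = dict(k_const=-1.0, phi_u=0.5, kappa_1m=0.99, a_qm=-2.0, a_lam_m=-1.0, q_t=1.0)
threshold = rho_threshold(params["phi_u"], params["a_qm"], params["a_lam_m"])
for rho in (-0.5, 0.0):
    value = corr_expression(rho, **params)
    print(f"rho={rho:+.2f} (threshold {threshold:+.2f}): expression = {value:+.4f}")
```

With these inputs, the expression is negative for \(\rho \) below the threshold and positive at \(\rho = 0\), in line with the discussion above.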

Second, although both the variance risk premium and the skewness risk premium are linear in the variance-of-variance \(q_t\) and the jump intensity \(\lambda _t\), they are linearly independent because \(det \equiv \beta _{vp,q} \beta _{sp,\lambda } - \beta _{vp,\lambda } \beta _{sp,q}\) is nonzero under suitable parameter conditions, which will also be confirmed in Sect. 3 with a model calibration result. If we assume that the jump intensity process \(\lambda _t\) has the affine structure mentioned above, the correlation between the variance and skewness risk premiums is one in absolute value, so they cannot be independent of each other. This is because, under such an assumption, the jump intensity process is driven by only one factor, \(\sigma _t^2\). This observation also supports the validity of our model setting with stochastic jump intensity.

2.4 An equity risk premium representation

In this subsection, we present a representation of the equity risk premium in terms of the variance and skewness risk premiums in equilibrium. We start with the expression for the equity risk premium provided by Drechsler and Yaron (2011):

$$\begin{aligned} \begin{aligned} \displaystyle&\log \mathbb {E}_t (R_{m,t+1}) - r_{f,t} = B_r^t G_t G_t^t \Lambda + \Pi _t^t (\psi (B_r)-1 - \psi (B_r - \Lambda )+\psi (- \Lambda ) ), \nonumber \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \displaystyle \psi (B_r)&\equiv \begin{pmatrix} 0 \\ \psi _x (B_r(2)) \\ \psi _{\sigma ^2} (B_r(3)) \\ 0 \\ 0 \\ 0 \end{pmatrix}, \psi (B_r-\Lambda ) \equiv \begin{pmatrix} 0 \\ \psi _x (B_r(2)-\Lambda _x) \\ \psi _{\sigma ^2} (B_r(3)-\Lambda _{\sigma ^2}) \\ 0 \\ 0 \\ 0 \end{pmatrix}, \nonumber \\ \psi (-\Lambda )&\equiv \begin{pmatrix} 0 \\ \psi _x (-\Lambda _x) \\ \psi _{\sigma ^2} (-\Lambda _{\sigma ^2}) \\ 0 \\ 0 \\ 0 \end{pmatrix}, \nonumber \end{aligned} \end{aligned}$$

As mentioned in Drechsler and Yaron (2011), the first term, \(B_r^t G_t G_t^t \Lambda \), represents the contributions of the Gaussian shocks to the equity risk premium. In particular, since \(G_t G_t^t = H_{\sigma ^2} \sigma _{t}^2 + H_q q_{t} \) (see (19)), this term aggregates both the risk-return tradeoff relationship and a true premium for variance risk. The second term, \(\Pi _t^t (\psi (B_r)-1 - \psi (B_r - \Lambda )+\psi (- \Lambda ) )\), represents the contributions from the jump processes. The derivation of this expression is presented in the appendix of Drechsler and Yaron (2011).

Here \(r_{f,t}\) is the risk-free rate at time t in the economy, and its explicit expression is provided in the “Appendix”.

With the expression of \( G_t G_t^t = H_{\sigma ^2} \sigma _{t}^2 + H_q q_{t} \) and \(\Pi _{t} \equiv \begin{pmatrix} 0&\lambda _{x, t}&\lambda _{\sigma ^2, t}&0&0&0 \end{pmatrix}^t\), the following representation can be obtained via the expression for the equity risk premium shown above:

$$\begin{aligned}&\log \mathbb {E}_t (R_{m,t+1}) - r_{f,t} = \beta _{er,\sigma } \sigma _t^2 + \beta _{er,q} q_t + \beta _{er,\lambda } \lambda _t, \nonumber \\&\mathrm{where} \nonumber \\&\qquad \beta _{er,\sigma } \equiv B_r^t H_{\sigma ^2} \Lambda , \nonumber \\&\qquad \beta _{er,q} \equiv B_r^t H_q \Lambda , \nonumber \\&\qquad \beta _{er,\lambda } \equiv l_x \Bigl [ \psi _x(B_r(2)) - 1 - \psi _x(B_r(2)-\Lambda _x) + \psi _x(-\Lambda _x) \Bigr ] \nonumber \\&\qquad \quad \quad \qquad +\, l_{\sigma ^2} \Bigl [ \psi _{\sigma ^2}(B_r(3)) - 1 - \psi _{\sigma ^2} (B_r(3)-\Lambda _{\sigma ^2}) + \psi _{\sigma ^2} (-\Lambda _{\sigma ^2}) \Bigr ]. \end{aligned}$$
(25)

As shown in (25), the equity risk premium is driven by the state variables \(\sigma _t^2\), \(q_t\), and \(\lambda _t\) and is time-varying because each of those variables is stochastic. In particular, in the case of \(\gamma > 1\) and \(\psi > 1\), it can be shown that \(\beta _{er,\sigma }>0\), \(\beta _{er,q}>0\), and \(\beta _{er,\lambda }>0\) because \(\Lambda _x>0\), \(\Lambda _{\sigma ^2}<0\), \(\Lambda _q<0\), and \(\Lambda _\lambda <0\), which are provided in Proposition 1, so that an increase in any of the state variables raises the equity risk premium, and vice versa.
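A quick sanity check on the jump loading \(\beta _{er,\lambda }\) in (25) is that it vanishes when jump risk is not priced. The sketch below evaluates the bracketed terms under the illustrative assumption of normally distributed jump sizes; the function names and all numerical inputs are hypothetical.

```python
import math

def normal_mgf(u, mu, sigma):
    """Moment-generating function of a N(mu, sigma^2) jump size (assumed distribution)."""
    return math.exp(mu * u + 0.5 * sigma ** 2 * u ** 2)

def jump_premium_term(l_k, b_k, price_k, mu_k, sigma_k):
    """l_k * [psi_k(B) - 1 - psi_k(B - Lambda_k) + psi_k(-Lambda_k)] from (25)."""
    def psi(u):
        return normal_mgf(u, mu_k, sigma_k)
    return l_k * (psi(b_k) - 1.0 - psi(b_k - price_k) + psi(-price_k))

def beta_er_lambda(components):
    """Sum of the jump terms over the components k = x and k = sigma^2."""
    return sum(jump_premium_term(*component) for component in components)

# Each tuple is (l_k, B_r(k), Lambda_k, jump mean, jump std); values are hypothetical.
priced = [(1.0, 1.2, 0.8, -0.01, 0.05), (0.5, -0.8, -2.0, 0.02, 0.03)]
unpriced = [(l, b, 0.0, mu, sigma) for (l, b, _, mu, sigma) in priced]
print(beta_er_lambda(priced))    # nonzero when jump risk carries a price
print(beta_er_lambda(unpriced))  # zero up to rounding when Lambda_k = 0
```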

The conditional variance of the equity return at time t, \(\mathrm{Var}_t^{\mathbb {P}} (r_{m,t+1})\), is also expressed by

$$\begin{aligned} \begin{aligned} \displaystyle \mathrm{Var}_t^\mathbb {P} (r_{m,t+1})&= B_r^t G_t G_t^t B_r + {B_r^2}^t \mathrm {diag} \Big ( \psi ^{(2)} (0) \Big ) \Pi _t \\&= B_r^t H_{\sigma ^2} B_r \sigma _t^2 + B_r^t H_q B_r q_t + ( l_x B_r^2(2) \psi _x^{(2)} (0) + l_{\sigma ^2} B_r^2(3) \psi _{\sigma ^2}^{(2)} (0) ) \lambda _t \\&\equiv \beta _{var,\sigma } \sigma _t^2 + \beta _{var,q} q_t + \beta _{var,\lambda } \lambda _t, \nonumber \end{aligned} \end{aligned}$$

so that, combining (20), (24), (25), and the above expression for the conditional variance of the equity return, we can derive an explicit representation of the equity risk premium as a linear factor pricing model in the variance and skewness risk premiums and the conditional variance of the equity return.

Proposition 5

(An Explicit Representation for the Equity Risk Premium)

$$\begin{aligned} \displaystyle \log \mathbb {E}_t (R_{m,t+1})&- r_{f,t} = \pi _{c} + \pi _{var} \mathrm{Var}_t^\mathbb {P} (r_{m,t+1}) + \pi _{vp} vp_t + \pi _{sp} skp_t, \\ where \nonumber \\ \pi _c&\equiv \Big ( - \frac{\beta _{er,\sigma } \beta _{var,q}}{\beta _{var,\sigma }} + \beta _{er,q} \Big ) \frac{-\beta _{sp,\lambda } \beta _{vp,c} + \beta _{vp,\lambda }\beta _{sp,c}}{det} \nonumber \\&+ \Big ( - \frac{\beta _{er,\sigma } \beta _{var,\lambda }}{\beta _{var,\sigma }} + \beta _{er,\lambda } \Big ) \frac{\beta _{sp,q} \beta _{vp,c} - \beta _{vp,q}\beta _{sp,c}}{det}, \nonumber \\ \pi _{var}&\equiv \frac{\beta _{er,\sigma }}{\beta _{var,\sigma }}, \nonumber \\ \pi _{vp}&\equiv \Big ( - \frac{\beta _{er,\sigma } \beta _{var,q}}{\beta _{var,\sigma }} + \beta _{er,q} \Big ) \frac{\beta _{sp,\lambda }}{det} - \Big ( - \frac{\beta _{er,\sigma } \beta _{var,\lambda }}{\beta _{var,\sigma }} + \beta _{er,\lambda } \Big ) \frac{\beta _{sp,q}}{det}, \nonumber \\ \pi _{sp}&\equiv - \Big ( - \frac{\beta _{er,\sigma } \beta _{var,q}}{\beta _{var,\sigma }} + \beta _{er,q} \Big ) \frac{\beta _{vp,\lambda }}{det} + \Big ( - \frac{\beta _{er,\sigma } \beta _{var,\lambda }}{\beta _{var,\sigma }} + \beta _{er,\lambda } \Big ) \frac{\beta _{vp,q}}{det}, \nonumber \\ det&\equiv \beta _{vp,q} \beta _{sp,\lambda } - \beta _{vp,\lambda } \beta _{sp,q}. \nonumber \end{aligned}$$
(26)
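The mapping from the factor loadings in (20), (24), (25), and the conditional variance expression to the coefficients in (26) can be checked numerically. The sketch below computes \(\pi _c\), \(\pi _{var}\), \(\pi _{vp}\), and \(\pi _{sp}\) from a set of loadings and confirms that (26) reproduces (25) for an arbitrary state. The loadings and state values are hypothetical and serve only to verify the algebra.

```python
def equity_premium_loadings(b):
    """Coefficients of (26) from the loadings in (20), (24), (25) and Var_t^P."""
    det = b["vp_q"] * b["sp_lam"] - b["vp_lam"] * b["sp_q"]
    if det == 0.0:
        raise ValueError("vp_t and skp_t are linearly dependent (det = 0)")
    a_q = -b["er_sigma"] * b["var_q"] / b["var_sigma"] + b["er_q"]
    a_lam = -b["er_sigma"] * b["var_lam"] / b["var_sigma"] + b["er_lam"]
    pi_c = (a_q * (-b["sp_lam"] * b["vp_c"] + b["vp_lam"] * b["sp_c"]) / det
            + a_lam * (b["sp_q"] * b["vp_c"] - b["vp_q"] * b["sp_c"]) / det)
    pi_var = b["er_sigma"] / b["var_sigma"]
    pi_vp = a_q * b["sp_lam"] / det - a_lam * b["sp_q"] / det
    pi_sp = -a_q * b["vp_lam"] / det + a_lam * b["vp_q"] / det
    return pi_c, pi_var, pi_vp, pi_sp

def premium_both_ways(b, sigma2, q, lam):
    """Equity risk premium from (25) and from (26) for the same state."""
    direct = b["er_sigma"] * sigma2 + b["er_q"] * q + b["er_lam"] * lam
    var_p = b["var_sigma"] * sigma2 + b["var_q"] * q + b["var_lam"] * lam
    vp = b["vp_c"] + b["vp_q"] * q + b["vp_lam"] * lam
    skp = b["sp_c"] + b["sp_q"] * q + b["sp_lam"] * lam
    pi_c, pi_var, pi_vp, pi_sp = equity_premium_loadings(b)
    return direct, pi_c + pi_var * var_p + pi_vp * vp + pi_sp * skp

# Hypothetical loadings, not calibrated values.
betas = dict(er_sigma=2.0, er_q=1.5, er_lam=0.7, var_sigma=1.2, var_q=0.4, var_lam=0.3,
             vp_c=0.01, vp_q=0.5, vp_lam=0.2, sp_c=-0.02, sp_q=0.1, sp_lam=-0.6)
print(premium_both_ways(betas, sigma2=0.3, q=0.05, lam=0.8))  # two matching values
```

The check relies on \(det \ne 0\), which is exactly the linear-independence condition on the two risk premiums.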

Representation (26) suggests that the skewness risk premium, together with the variance risk premium and the conditional variance of the market return, constitutes a dominant source of the variation in the equity risk premium. In the following section, we will show that det in (26) is nonzero under suitable parameter conditions. Thus, the above proposition implies that the skewness risk premium contributes a source of variation in the equity risk premium that is distinct from that of the variance risk premium (see Fig. 1).

Some recent studies such as those by Bali and Hovakimian (2009), Yan (2011), Chang et al. (2012), Driessen et al. (2012), and Rehman and Vilkov (2012) focus on the relationship between skewness or jump risks and expected stock returns, and they provide empirical evidence for a significantly positive link between expected stock returns and jump or skewness risks. To the best of our knowledge, the result in (26), which establishes an explicit relationship between the skewness risk premium and the expected equity excess return, is the first to provide a theoretical basis for their empirical evidence within the LRR model approach pioneered by Bansal and Yaron (2004).

Fig. 1 The risk premiums in higher-order moments and the equity risk premium

3 Model implications

Before proceeding to an empirical analysis based on the representation of (26), we show the implications from a calibrated version of the theoretical model (26) to help guide and interpret our subsequent empirical reduced form predictability regressions.

Table 1 The set of model parameters

Table 1 reports the parameter values used in the calibration of the factor loadings in the theoretical model (26). CM, BY, BTZ, and BST in this table denote values taken directly from Chan and Maheu (2002), Bansal and Yaron (2004), Bollerslev et al. (2009), and Bollerslev et al. (2012), respectively. Those studies take the unit time interval in the calibrated equilibrium models to be one month, and we adopt the same convention. On the basis of the parameters in Table 1, we calibrate the factor loadings for the variance risk premium, which appear in representation (20), and for the skewness risk premium, which appear in representation (24), in equilibrium.

Fig. 2 The factor loading \(\beta _{vp,q}\)

Fig. 3 The factor loading \(\beta _{vp,\lambda }\)

Fig. 4 The factor loading \(\beta _{sp,q}\)

Fig. 5 The factor loading \(\beta _{sp,\lambda }\)

Figures 2, 3, 4 and 5 show the factor loadings \(\beta _{vp,q}\), \(\beta _{vp,\lambda }\), \(\beta _{sp,q}\), and \(\beta _{sp,\lambda }\) as functions of the risk aversion parameter \(\gamma \) and the correlation \(\rho \) between the variance-of-variance \(q_t\) and the jump intensity \(\lambda _t\). As shown in the previous section, \(\beta _{vp,\lambda }\), which is the factor loading on the jump intensity \(\lambda _t\) in the variance risk premium representation (20), is essentially positive, and under the parameter values in Table 1, \(\beta _{vp,q}\), which is the factor loading on the variance-of-variance \(q_t\) in (20), also appears to be positive. These results indicate that when the variance-of-variance or the jump intensity (or both) rises, the level of the variance risk premium also increases. In contrast, \(\beta _{sp,\lambda }\), which is the factor loading on the jump intensity in the skewness risk premium representation (24), is essentially negative, which is consistent with the discussion in the previous section. Interestingly, however, \(\beta _{sp,q}\), which is the factor loading on the variance-of-variance in (24), can be either positive or negative depending on \(\gamma \) and \(\rho \). These results on \(\beta _{sp,\lambda }\) and \(\beta _{sp,q}\) indicate that, although an increase in the jump intensity essentially reduces the level of the skewness risk premium, an increase in the variance-of-variance can raise or reduce it depending on the values of \(\gamma \) and \(\rho \).

Fig. 6 The factor loading on \(\mathrm{Var}_t^\mathbb {P}\): \(\pi _{var}\)

Fig. 7 The factor loading on \(vp_t\): \(\pi _{vp}\)

Fig. 8 The factor loading on \(skp_t\): \(\pi _{sp}\)

Figures 6, 7 and 8 show the factor loadings on the variance of the market return, the variance risk premium, and the skewness risk premium in the equity risk premium representation (26). Interestingly, both \(\pi _{var}\) and \(\pi _{vp}\) are essentially positive regardless of the values of \(\gamma \) and \(\rho \). Moreover, these results are consistent with previous studies such as those by Bollerslev et al. (2009) and Drechsler and Yaron (2011). An important point is that \(\pi _{sp}\), the loading on the skewness risk premium in (26), can be positive depending on the risk aversion parameter \(\gamma \). In particular, Fig. 8 shows that \(\pi _{sp}\) is strictly positive when \(\gamma \) exceeds 4. This result indicates that a decrease in the skewness risk premium, that is, the risk-neutral skewness falling further below the skewness under the physical measure, reduces the equity risk premium when \(\gamma \) exceeds 4. This implication is interesting because it makes the contribution of the skewness risk premium to the equity risk premium explicit, including how the sign of \(\pi _{sp}\) depends on the values of \(\gamma \) and \(\rho \). As mentioned above, some recent studies such as those by Bali and Hovakimian (2009), Yan (2011), Chang et al. (2012), Driessen et al. (2012), and Rehman and Vilkov (2012) focus on the relationship between skewness or jump risks and expected stock returns, and they provide empirical evidence for a significantly positive link between expected stock returns and jump or skewness risks. In particular, Bali and Hovakimian (2009) and Yan (2011) provide evidence for a significantly positive link between expected returns and the call-put options’ implied volatility spread, which can be considered a proxy for jump risk. Moreover, using data on individual stock options, Rehman and Vilkov (2012) show that the currently observed option-implied ex ante skewness is positively related to future stock returns. To date, no study has provided a theoretical equilibrium model that is consistent with the empirical results cited above. To the best of our knowledge, this is the first paper to do so with a stylized model that accounts for a close relationship between the skewness risk premium and the equity risk premium.

4 Empirical measurements

The theoretical model developed in the previous section suggests that the variance and skewness risk premiums, as well as the variance of the market return, may serve as useful predictor variables for future market returns (see Proposition 5). To examine this suggestion empirically and compare the results with those of Bollerslev et al. (2009) and Drechsler and Yaron (2011), we run statistical tests based on simple linear regressions of the S&P500 excess return on different sets of lagged predictor variables, including the variance and skewness risk premiums. We rely throughout on monthly and quarterly observations and focus our discussion on the estimated slope coefficients and their statistical significance as determined by the t-statistics. We also report the forecast accuracy of the regression models as measured by the corresponding adjusted \(R^2\) values.

Before showing the results of the predictive regressions for the S&P500 excess return, let us note some key points on the measurement of the variance and skewness risk premiums and describe the data used in the analysis below.

4.1 Measurements for the higher-order moments

Our method for measuring the risk premiums in higher-order moments is similar to that in Bollerslev et al. (2009) and Drechsler and Yaron (2011). As mentioned above, we formally define the variance risk premium as the difference between the risk-neutral and physical expectations of the variance of the market return, and we define the skewness risk premium in the same manner. We focus on the one-month- and three-month-ahead predictability of those risk premiums and use the squared VIX and the SKEW index from the Chicago Board Options Exchange (CBOE) as our measures of the risk-neutral expected variance and skewness, respectively. The VIX is calculated by the CBOE using a model-free approach to measure the 30-day expected volatility of the S&P500 return. The components of the VIX are near- and next-term put and call options, usually in the first and second SPX (S&P500 index) contract months. The model-free approach used to calculate the VIX is described by, for example, Demeterfi et al. (1999). The SKEW index from the CBOE is also calculated from S&P500 option prices with a method similar to that used for the VIX: it is obtained from a portfolio of S&P500 index options that mimics an exposure to the skewness payoff of the one-step-ahead cumulative return distribution of the index. The SKEW index likewise follows a model-free approach, described by, for example, Bakshi et al. (2003) (see Footnote 8).
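As a rough illustration of how the two CBOE indices map into risk-neutral moments, the sketch below converts index levels into an annualized implied variance and an implied skewness. It relies on two facts about the indices: the VIX is quoted as an annualized volatility in percentage points, and the CBOE defines the SKEW index as 100 minus 10 times the risk-neutral skewness. The function names are hypothetical, and the exact scaling applied in the empirical analysis may differ from this sketch.

```python
def implied_variance_from_vix(vix_level):
    """Annualized risk-neutral variance (in decimal units) from a VIX quote."""
    return (vix_level / 100.0) ** 2

def implied_skewness_from_skew_index(skew_level):
    """Risk-neutral skewness S recovered from SKEW = 100 - 10 * S."""
    return (100.0 - skew_level) / 10.0

# Hypothetical index levels, for illustration only.
print(implied_variance_from_vix(20.0))
print(implied_skewness_from_skew_index(115.0))
```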

For the measures of the expected variance and skewness under the physical measure, we basically use the current variance and skewness of the S&P500 index return, defined respectively as the historical 22-day actual variance estimated from daily index returns and the historical 12-month actual skewness estimated from monthly index returns. To match these historical moments of the index return distribution with the risk-neutral expected moments described above, we annualize the current variance, while the current skewness, which is estimated from the past 12 months of monthly returns, is used as is. Bollerslev et al. (2009) suggest that, for highly persistent variance dynamics, or \(\rho _\sigma \approx 1\), the objective expected future variance will be close to the current variance, so that the same qualitative implications hold for the variance difference obtained by replacing \(\mathbb {E}_t^{\mathbb {P}} [Var_{t+1}^{\mathbb {P}} (r_{m,t+2})]\) in definition (18) with the current variance. From a similar point of view, the same could be argued for the objective expected future skewness (see Footnote 9). However, in our model setting, the time-t skewness of the one-step-ahead market return, \(r_{m,t+1}\), is driven by the jump intensity \(\lambda _t\) (see (21)); empirically, jump intensities may turn out to be non-persistent, since jumps are rare events whose effects may disappear very quickly. For this reason, in addition to using the current skewness as a proxy for the expected skewness under the physical measure, it is useful to introduce another approach that measures \(\mathbb {E}_t^{\mathbb {P}} [{\textit{Skew}}_{t+1}^{\mathbb {P}} (r_{m,t+2})] \) with a forecasting regression (see Footnote 10). In the empirical analysis below, we therefore also report results based on the expected physical skewness estimated by the forecasting regressions described below. The models we employ in this case are the following:

$$\begin{aligned} \displaystyle Reg. (1) : {\textit{Skew}}_{t+1}^M= & {} \alpha + \beta _1 {\textit{Skew}}_t^M + \beta _2 CVol_t + \epsilon _{t+1}, \nonumber \\ Reg. (2) : {\textit{Skew}}_{t+1}^D= & {} \alpha + \beta _1 {\textit{Skew}}_t^D + \beta _2 CVol_t + \epsilon _{t+1}, \nonumber \\ Reg. (3) : {\textit{Skew}}_{t+1}^M= & {} \alpha + \beta _1 {\textit{Skew}}_t^M + \beta _2 \triangle CVol_t + \epsilon _{t+1}, \nonumber \\ Reg. (4) : {\textit{Skew}}_{t+1}^D= & {} \alpha + \beta _1 {\textit{Skew}}_t^D + \beta _2 \triangle CVol_t + \epsilon _{t+1}, \nonumber \\ Reg. (5) : {\textit{Skew}}_{t+1}^M= & {} \alpha + \beta _1 {\textit{Skew}}_t^M + \epsilon _{t+1}, \nonumber \\ Reg. (6) : {\textit{Skew}}_{t+1}^D= & {} \alpha + \beta _1 {\textit{Skew}}_t^D + \epsilon _{t+1}. \end{aligned}$$
(27)

All the models presented above are monthly models; that is, one time step in each model is one month. \({\textit{Skew}}_t^M\) is the time-t actual skewness, which is calculated from the historical 12 months of monthly return data of the S&P500 index. \({\textit{Skew}}_t^D\) is the time-t actual skewness calculated from the historical 22 days of daily return data of the S&P500 index,Footnote 11 and \(CVol_t\) is the time-t annualized volatility calculated from the historical 22 days of daily return data of the S&P500 index. \(\triangle CVol_t\) is the one-month change in the current volatility, defined by \(\triangle CVol_t \equiv CVol_t - CVol_{t-1}\). The regression models introduced above are estimated with a rolling window of the historical 36 months. Based on the one-step-ahead forecasts of the physical skewness of the S&P500 index monthly return distribution, which are estimated by each of the models introduced above, we calculate the skewness risk premium according to the definition in (22).
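The rolling-window estimation of the forecasting regressions in (27) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`solve_linear`, `ols`, `rolling_skew_forecast`) are hypothetical, and the alignment of the 36-month window (the 36 most recent pairs of time-s regressors and time-(s+1) skewness available at time t) is an assumption, since the text does not spell it out. Passing `cvol=None` corresponds to Reg. (5) and (6); passing a volatility series (or its one-month change) corresponds to Reg. (1) to (4).

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def rolling_skew_forecast(skew, cvol=None, window=36):
    """One-step-ahead skewness forecasts from a rolling-window regression.

    skew, cvol: monthly series indexed by t (cvol may also hold the
    one-month change in volatility). Returns {t: forecast of skew[t+1]}.
    Assumption: at time t the model is fitted on the pairs
    (regressors at s, skew at s+1) for s = t-window, ..., t-1.
    """
    forecasts = {}
    for t in range(window, len(skew)):
        X, y = [], []
        for s in range(t - window, t):
            row = [1.0, skew[s]]
            if cvol is not None:
                row.append(cvol[s])
            X.append(row)
            y.append(skew[s + 1])
        beta = ols(X, y)
        x_t = [1.0, skew[t]] + ([cvol[t]] if cvol is not None else [])
        forecasts[t] = sum(b * v for b, v in zip(beta, x_t))
    return forecasts
```

Under these assumptions, the skewness risk premium at time t would then be obtained by subtracting `forecasts[t]` from the risk-neutral expected skewness observed at time t.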

4.2 Data description

Our data series for the VIX, SKEW index, and expected variance and skewness under \(\mathbb {P}\) covers the period from January 1990 to August 2012. The main limitation on the length of our sample comes from the VIX and SKEW index, since the time series published by the CBOE begins in January 1990. As mentioned in the previous subsection, we rely on the monthly and quarterly data for the squared VIX and SKEW index for quantifying \(\mathbb {E}_t^\mathbb {Q} [Var_{t+1}^{\mathbb {Q}}(r_{m,t+2})]\) in (18) and \(\mathbb {E}_t^\mathbb {Q} [{\textit{Skew}}_{t+1}^{\mathbb {Q}}(r_{m,t+2})]\) in (22), respectively, and purposely rely on the readily available squared VIX as our measure for the risk-neutral expected variance and the value of \(\frac{1}{10} (100-{\textit{Skew}} \quad index)\) as our measure for the risk-neutral expected skewness. The expected variance \(\mathbb {E}_t^\mathbb {P} [Var_{t+1}^{\mathbb {P}}(r_{m,t+2})]\) and the expected skewness \(\mathbb {E}_t^\mathbb {P} [{\textit{Skew}}_{t+1}^{\mathbb {P}}(r_{m,t+2})]\) at time t are respectively calculated based on the historical index returns as described in the previous subsection.

Fig. 9 The VIX and the current volatility. This figure shows the time-series data of the VIX and the current volatility (the square root of the current variance defined in the main paper). The current volatility is the historical 22 days actual volatility estimated based on daily return data of the S&P500 index

Fig. 10 The risk-neutral skewness and the current skewness. This figure shows the time-series data of the risk-neutral expected skewness extracted from the SKEW index and the current skewness. The current skewness is the historical 12 months actual skewness estimated based on monthly return data of the S&P500 index

To illustrate the data, Figs. 9 and 10 plot the monthly time series of the risk-neutral expected volatility (VIX), the current volatility (historical 22-day annualized actual volatility), the risk-neutral expected skewness, and the current skewness (historical 12-month actual skewness). Consistent with the theoretical model developed in the previous section and the earlier empirical evidence, the spread between the risk-neutral expected variance (the squared VIX) and the current variance is almost always positive, and the spread between the risk-neutral expected skewness and the current skewness is almost always negative. Moreover, these spreads clearly vary over time. It is interesting that, although the VIX reaches a pronounced peak during the subprime crisis in 2008, the risk-neutral skewness seems to be more negative during the European financial crisis in 2011 than during the subprime crisis.

In addition to the variance and skewness risk premiums, we also consider a set of other more traditional predictor variables for the predictive regressions examined in the following subsection. Specifically, we obtain monthly P/E ratios and dividend yields of the S&P 500 index directly from Standard & Poor’s. Data on the three-month T-bill, the high-yield spread (hys) (between Moody’s BAA and AAA corporate bond yields), and the term spread (ts) (between the 10-year T-bond and the three-month T-bill yields) are taken from Thomson Reuters Datastream. The CAY, which represents the aggregate consumption-wealth ratio defined in Lettau and Ludvigson (2001), is downloaded from Lettau and Ludvigson’s Web site.

Table 2 Summary statistics for the monthly returns and predictor variables

Basic summary statistics for the monthly excess returns of the S&P500 index and the predictor variables are reported in Table 2. The sample period extends from January 1990 to August 2012. All variables are reported in monthly percentage form whenever appropriate. \(r_{m,t}-r_{f,t}\) denotes the logarithmic return on the S&P 500 index in excess of the three-month T-bill rate. \(VIX^2\) denotes the squared VIX index. ISkew refers to the risk-neutral expected skewness extracted from the CBOE SKEW index by the formula \(I{\textit{Skew}} = \frac{1}{10} (100-\text {Skew index})\). CVar and CSkew refer to the current variance, which is the annualized actual variance based on the historical 22 days of daily return data, and the current skewness, which is the actual skewness based on the historical 12 months of monthly return data, respectively. vp and skp refer to the variance and skewness risk premiums, respectively, that is, \(vp \equiv VIX^2-CVar\) and \(skp \equiv I{\textit{Skew}}-C{\textit{Skew}}\). The predictor variables include the log price-earnings ratio (ln(pe)), the log dividend yield (ln(dy)), the high-yield spread (hys) defined as the difference between Moody’s BAA and AAA bond yield indices, and the term spread (ts) defined as the difference between the 10-year and 3-month Treasury yields.
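As an illustration of the definitions of CVar, CSkew, ISkew, vp, and skp given above, the following Python sketch computes the two risk premiums from raw inputs. It is only a sketch under stated assumptions, none of which is taken from the paper: the annualization factor of 252 trading days, the use of the sample variance with mean subtraction, the simple moment-based skewness estimator, and the convention that the VIX is quoted in annualized percentage points are all assumptions, and the function names are hypothetical.

```python
TRADING_DAYS_PER_YEAR = 252  # assumption: annualization factor is not stated in the text


def current_variance(daily_returns):
    """CVar: annualized sample variance of the most recent 22 daily returns."""
    window = daily_returns[-22:]
    n = len(window)
    mean = sum(window) / n
    daily_var = sum((r - mean) ** 2 for r in window) / (n - 1)
    return TRADING_DAYS_PER_YEAR * daily_var


def current_skewness(monthly_returns):
    """CSkew: moment-based skewness of the most recent 12 monthly returns.

    Assumption: the plain estimator m3 / m2**1.5 without small-sample correction.
    """
    window = monthly_returns[-12:]
    n = len(window)
    mean = sum(window) / n
    m2 = sum((r - mean) ** 2 for r in window) / n
    m3 = sum((r - mean) ** 3 for r in window) / n
    return m3 / m2 ** 1.5


def implied_skewness(skew_index):
    """ISkew = (100 - SKEW index) / 10, as defined in the text."""
    return (100.0 - skew_index) / 10.0


def risk_premiums(vix, skew_index, daily_returns, monthly_returns):
    """Return (vp, skp) with vp = VIX^2 - CVar and skp = ISkew - CSkew."""
    vix_squared = (vix / 100.0) ** 2  # assumption: VIX in annualized percentage points
    vp = vix_squared - current_variance(daily_returns)
    skp = implied_skewness(skew_index) - current_skewness(monthly_returns)
    return vp, skp
```

For example, a SKEW index level of 116 maps to a risk-neutral skewness of -1.6, which matches the order of magnitude of the sample mean of ISkew reported in the text.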

Table 3 Summary statistics for the expected physical skewness and skewness risk premium

The mean excess return on the S&P 500 index over the sample equals 0.3 % monthly. The sample means for the \(VIX^2\) and the current (historical 22 days) annualized variance are 6.0 and 5.0 %, respectively, and the sample means for the risk-neutral expected skewness and the current (historical 12 months) skewness are -1.6 and -0.2, respectively. The numbers for the traditional forecasting variables are all directly in line with those reported in previous studies. In particular, all of the variables are highly persistent with first-order autocorrelations ranging from 0.95 to 0.99.

As stated in the previous subsection, we additionally construct the expected physical skewness for measuring \(\mathbb {E}_t^{\mathbb {P}} [{\textit{Skew}}_{t+1}^{\mathbb {P}} (r_{m,t+2})] \) based on the forecasting regressions (see (27)). Table 3 shows basic summary statistics for the expected physical skewness (“ES”), which is estimated by the forecasting regressions described in (27), and the corresponding skewness risk premium (skp). The sample period extends from January 1990 to August 2012. For example, the label “\(ES_1\)” means the expected physical skewness estimated by the model “Reg. (1)” in (27), and the label “\(skp_1\)” means the corresponding skewness risk premium. From this table, we find that the expected physical skewness and the corresponding skewness risk premium have similar characteristics across the monthly- and daily-based skewness regression models.

4.3 The results of simple regression tests

Table 4 provides the results of return predictability regressions with the variance and skewness risk premiums. In this table, we first use the skewness risk premium constructed with the current skewness. (The regression results based on the expected physical skewness estimated by the regression models of (27) and the corresponding skewness risk premium are shown in the next table.) All of our forecasts are based on simple linear regressions of the S&P500 excess returns on different sets of lagged predictor variables. There are two sets of columns with regression estimates. The first set of columns shows OLS estimates from monthly return regressions, that is, one-month-ahead forecasts, and the second set shows OLS estimates from non-overlapping quarterly return regressions, that is, one-quarter-ahead forecasts. These regressions cover the period from January 1990 to August 2012; each of the monthly return regressions uses a sample of 270 months, and each of the quarterly return regressions uses a sample of 88 quarters. Each set of columns consists of five regression results. The first two regressions are one-factor models using the variance risk premium (vp model) or the skewness risk premium (skp model) as a univariate regressor, while the third regression is a two-factor model with both the variance and skewness risk premiums (vp+skp model). The fourth regression model, denoted the 3 factors model, is based on the three independent factors of the current variance (CVar), the variance risk premium (vp), and the skewness risk premium (skp), and it represents the theoretical linear model of (26) derived in the previous section. Finally, we also provide a stepwise selection model (Stepwise model), in which the universe of independent variables consists of the risk premiums in higher-order moments, changes in those risk premiums, and one of the traditional predictor variables, namely the log price-earnings ratio (ln(pe)). The variables \(\bigtriangleup VIX^2\) and \(\bigtriangleup I{\textit{Skew}}\) shown in this table are monthly or quarterly changes in \(VIX^2\) and \(I{\textit{Skew}}\), respectively.
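The one-factor regressions described above (the vp model and the skp model) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' procedure: the function names are hypothetical, the t-statistic uses classical homoskedastic OLS standard errors (the type of standard error used in Table 4 is not stated in this section), and quarterly returns are assumed to be formed by summing three consecutive monthly log excess returns.

```python
import math


def univariate_predictive_regression(returns, predictor):
    """Regress returns[t+1] on a constant and predictor[t] by OLS.

    Both inputs are equally long series indexed by t. Returns a dict with
    the intercept, slope, classical t-statistic of the slope, R^2, and
    adjusted R^2. Assumption: homoskedastic standard errors.
    """
    y = returns[1:]      # next-period excess return
    x = predictor[:-1]   # lagged predictor
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ssr / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return {"intercept": intercept, "slope": slope,
            "t_slope": slope / se_slope, "r2": r2, "adj_r2": adj_r2}


def to_quarterly(monthly_log_returns):
    """Sum non-overlapping blocks of three monthly log returns.

    Assumption: an incomplete final block is dropped.
    """
    n_quarters = len(monthly_log_returns) // 3
    return [sum(monthly_log_returns[3 * q:3 * q + 3]) for q in range(n_quarters)]
```

For the quarterly regressions, one would pass the output of `to_quarterly` together with a predictor sampled at quarter ends; how the predictors are sampled at the quarterly frequency is likewise an assumption of this sketch.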

Table 4 The monthly and quarterly return regressions

From the monthly return regression results in this table, we find that the slope coefficients of the vp and skp models are both significant at the 5 % level and, in particular, the slope coefficient of the vp model is significant at the 1 % level. Moreover, the slope coefficients of the vp+skp model are significant at the same levels, and this model accounts for about 7.2 % of the monthly return variation. The 3 factors model represents the theoretical implication of (26), and it has a higher adjusted \(R^2\) than the vp+skp model owing to the additional variable of the current variance (CVar). Although the stepwise model is not equivalent to the theoretical implication of (26), that is, the 3 factors model, all of the independent variables CVar, vp, and skp in (26) are significant at the 5 or 1 % level. These results indicate that the theoretical model of (26) and, in particular, the variance and skewness risk premiums have superior predictive power for future aggregate stock market index returns, which is consistent with the theory provided in the previous section.

The quarterly regressions reported in this table further underscore the findings of the monthly return regressions for the variance risk premium, although, in contrast to the monthly return regressions, all of the t-statistics for the skewness risk premium are insignificant at conventional levels. Interestingly, however, the stepwise model coincides exactly with the theoretical implication of (26), that is, the 3 factors model, and this model accounts for about 14.7 % of the quarterly return variation. Although the slope coefficient on the skewness risk premium is not significant, as mentioned above, the coefficients on the variance risk premium and the current variance are both significant at the 5 % level and, for the variance risk premium, at the 1 % level.

Table 5 provides the results of predictive regressions with the 3 factors of the S&P500 index, that is, the physical variance \(Var_t^{\mathbb {P}} (r_{m,t+1})\), the variance risk premium \(vp_t\), and the skewness risk premium \(skp_t\). In particular, the skewness risk premiums used in these regressions are constructed from the expected physical skewness estimated by the regression models described in (27). These regressions cover the period from January 1990 to August 2012, which is the same in-sample period as that of the regressions in Table 4. For example, the “Original 3 fac. model” means the 3 factors model shown in Table 5, and the “Predictive Reg. (1)” means the predictive regression model based on the 3 factors in which the skewness risk premium is constructed with the expected physical skewness estimated by the Reg. (1) model in (27). The results in Table 5 show that none of the alternative 3 factors models using the one-step-ahead forecast of the physical skewness has superior predictive power in comparison with the original 3 factors model in terms of the adjusted \(R^2\). Moreover, the statistical significance of the factor loadings on the skewness risk premium \(skp_t\) is lower than in the original model and, in particular, the factor loadings on \(skp_t\) in the Predictive Reg. (2), Predictive Reg. (4), and Predictive Reg. (6) are statistically insignificant at the 5 % level. However, lower \(R^2\) values and smaller t-ratios are to be expected when we use estimated expected quantities for the physical skewness, so this is not surprising. In particular, the results of Predictive Reg. (1), (3), and (5) nevertheless indicate that the variance and skewness risk premiums have superior predictive power for future aggregate stock market index returns.

Table 5 The results of monthly-return-based predictive regressions

We next present further results that highlight the superiority of the skewness risk premium, as well as the variance risk premium, as a predictor variable for the equity excess return. Table 6 reports monthly- and quarterly-based predictive regression results for the S&P500 index excess return with each of the traditional predictor variables shown in this table, that is, the price-earnings ratio (pe), dividend yield (dy), high-yield spread (hys), and term spread (ts) defined in the previous subsection, and the changes in those variables. As shown in this table, in the case of the monthly return regressions, none of the predictor variables is superior in the adjusted \(R^2\) to the variance and skewness risk premiums (see Table 4). In the case of the quarterly return regressions in this table, only \(\triangle \)ln(pe) and \(\triangle \)hys seem to have a higher adjusted \(R^2\) than the skewness risk premium, but none of the variables is superior in the adjusted \(R^2\) to the variance risk premium (see Table 4).

Table 6 The univariate regressions with traditional predictor variables
Table 7 Summary statistics for the CAY

Table 8 reports monthly- and quarterly-based predictive regression results for the S&P500 index excess return with the CAY, the aggregate consumption-wealth ratio defined in Lettau and Ludvigson (2001). The CAY is quarterly data downloaded from Lettau and Ludvigson’s Web site. The downloaded data cover January 1990 to January 2012. Table 7 shows summary statistics for the CAY as well as the variance and skewness risk premiums over the period from January 1990 to January 2012. For the monthly return regressions, we define a monthly CAY series from the most recent quarterly observation.
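The construction of a monthly CAY series from the most recent quarterly observation can be sketched as follows. The function name and the data layout are hypothetical, and the timing convention is an assumption: a quarter's value is treated as available from the last month of that quarter onward, with no allowance for publication lag.

```python
def monthly_from_quarterly(quarterly, months):
    """Carry the most recent quarterly observation forward to monthly dates.

    quarterly: dict mapping (year, quarter) -> value, quarter in 1..4.
    months: list of (year, month) pairs, month in 1..12.
    Returns a list aligned with `months`; entries are None where no
    quarterly observation is available.
    """
    series = []
    for year, month in months:
        quarter = month // 3  # last quarter completed by the end of this month
        key = (year, quarter) if quarter >= 1 else (year - 1, 4)
        series.append(quarterly.get(key))
    return series
```

Under this convention, January and February take the value of the previous year's fourth quarter, while March already takes the first-quarter value.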

As shown in Table 8, the CAY does not seem to be a superior predictor variable in comparison with the variance and skewness risk premiums. This result is similar to the results in Table 6 and also suggests that the skewness risk premium, as well as the variance risk premium, has superior predictive power for future aggregate stock market index returns.

Table 8 The univariate regressions with the CAY

5 Concluding remarks

In this study, we investigate the skewness risk premium in the financial market under a general equilibrium setting. Extending the long-run risks (LRR) model proposed by Bansal and Yaron (2004) by introducing a stochastic jump intensity for jumps in the LRR factor and the variance of consumption growth rate, we provide an explicit representation for the skewness risk premium, as well as the volatility risk premium, in equilibrium.

On the basis of the representation for the skewness risk premium, we propose a possible reason for the empirical facts of time-varying and negative risk-neutral skewness. Moreover, we also provide an equity risk premium representation of a linear factor pricing model with the variance and skewness risk premiums. The empirical results imply that the skewness risk premium, as well as the variance risk premium, has superior predictive power for future aggregate stock market index returns, which is consistent with the theoretical implication derived from our model. The results also show that the skewness risk premium plays a role in predicting the market index returns that is essential and independent of that of the variance risk premium.

Some recent studies, such as those by Bali and Hovakimian (2009), Yan (2011), Chang et al. (2013), Driessen et al. (2012), and Rehman and Vilkov (2012), focus on the relationship between skewness or jump risks and expected stock returns, and they provide empirical evidence for a significantly positive link between the expected stock returns and the jump or skewness risks. To the best of our knowledge, this study is the first to provide a theoretical interpretation of their empirical evidence in terms of the LRR model approach pioneered by Bansal and Yaron (2004). A remaining challenge for future research is to provide a more explicit theoretical explanation for the results of the studies cited above using the theoretical implication shown in this paper. Moreover, a detailed analysis is needed of the reasons why the skewness and variance risks are priced differently and, in particular, independently of each other. Further insight into this aspect is left to future work.