
1 Introduction

The advent of commodity exchange-traded funds (ETFs) has provided both institutional and retail investors with new ways to gain exposure to a wide array of commodities, including precious metals, agricultural products, and oil and gas. All commodity ETFs are traded on exchanges like stocks, and many have very high liquidity. For example, the SPDR Gold Trust ETF (GLD), which tracks the daily London gold spot price, is the most traded commodity ETF with an average trading volume of 8 million shares and market capitalization of US $31 billion in 2013 (see Footnote 1).

Within the commodity ETF market, some funds are designed to track a constant multiple of the daily returns of a reference index or asset. These are called leveraged ETFs (LETFs). An LETF maintains a constant leverage ratio by holding a variable portfolio of assets and/or derivatives, such as futures and swaps, based on the reference index. Some references, such as the Dow Jones U.S. Oil & Gas Index (DJUSEN) and the Dow Jones U.S. Basic Materials Index (DJUSBM), and their associated ETFs track the stocks of a basket of commodity producers, as opposed to physical commodity prices. Most LETFs, on the other hand, are based on total return swaps and commodity futures. The most common leverage ratios are ±2 and ±3, and LETFs typically charge an expense fee. Major issuers include ProShares, iShares, VelocityShares and PowerShares (see Table 1). For example, the ProShares Ultra Long Gold (UGL) seeks to return 2x the daily return of the London gold spot price minus a small expense fee. One can also take a bearish position by buying shares of an LETF with a negative leverage ratio. The ProShares Ultra Short Gold (GLL) is an inverse LETF that tracks −2x the daily return of the London gold fixing price. LETFs are highly accessible and liquid, which makes them attractive to traders who wish to gain leveraged exposure to a commodity without borrowing money or using derivatives.

Table 1 A summary of the 23 LETFs studied in this paper, arranged by commodity type and then leverage

For a long LETF, with a leverage ratio β > 0, the fund must add to a winning position in a bull market to maintain a constant leverage ratio. On the other hand, during a bear market, the fund must sell its losing positions to maintain the same leverage ratio. Similar arguments can be made for short (or inverse) LETFs (β < 0). As a consequence, an LETF can potentially outperform β times its reference during periods of market trending. However, should the reference exhibit high volatility but no significant net movement in price over a period of time, the constant daily re-balancing would cause the fund to decline in value. Therefore, LETFs can be viewed as long momentum but short volatility, and the value erosion due to realized variance of the reference is called volatility decay (see [24]). This raises the important question of how well LETFs perform over a long horizon.
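The rebalancing argument above can be illustrated with a small numerical sketch. It compounds β times each daily return of a hypothetical reference, ignoring fees and financing costs; the function name `compound` and both return paths are illustrative assumptions, not data from this study.

```python
def compound(daily_returns, beta=1.0):
    """Gross value of $1 after compounding beta times each daily return
    (idealized daily rebalancing: no fees, no financing cost)."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + beta * r
    return value

# Hypothetical round trip: the reference rises 10% and then returns to its start.
round_trip = [0.10, 1.0 / 1.10 - 1.0]
# Hypothetical trend: the reference rises 5% on two consecutive days.
trend = [0.05, 0.05]

ref_flat = compound(round_trip)          # reference ends where it started
long_flat = compound(round_trip, 2.0)    # long LETF ends below its start
short_flat = compound(round_trip, -2.0)  # short LETF ends below its start

ref_trend = compound(trend)
long_trend = compound(trend, 2.0)        # beats 2x the reference's 2-day return

print(ref_flat, long_flat, short_flat)
print(ref_trend - 1.0, long_trend - 1.0)
```

In the round trip both the long and the short fund lose value although the reference is unchanged, while in the trending path the long fund earns more than twice the two-day reference return.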

Since their introduction to the market, LETFs have received a number of criticisms from both practitioners and regulators (see Footnote 2). Some are concerned that the returns of LETFs exhibit some discrepancies from the goals stated in their prospectuses. In fact, some issuers provide warnings that LETFs are unsuitable for long-term buy-and-hold investors.

Many existing studies focus on equity-based ETFs and their leveraged counterparts. For example, Avellaneda and Zhang [2] study the price behavior and discuss the volatility decay of equity LETFs in different sectors. They find minimal 1-day tracking errors among the most liquid equity ETFs. They explain that an equity LETF can replicate the leveraged returns of its reference through a dynamic portfolio consisting of the component equities.

In contrast, commodities are unique because the physical assets cannot be stored easily. As such, ETF issuers are required to replicate through either warehousing (see Footnote 3), which is very costly, and thus uncommon except for precious metals such as silver and gold, or trading futures with multiple counterparties (see [5]). Since the reference indices may represent the spot prices of physical commodities, futures-based commodity ETFs may fail to track their reference indices perfectly and their tracking performance is subject to the fluctuation and term structure of futures prices. On top of that, most commodity LETFs use over-the-counter (OTC) total return swaps with multiple counterparties to generate the required leverage ratios. The lower liquidity of OTC contracts and counterparty risk can contribute to additional tracking errors. As we show in this paper, tracking errors can seriously affect the long-term fund performance of LETFs.

In a related work, Murphy and Wright [12] perform a t-test based on 1-day returns to determine if any commodity LETF has a non-zero tracking error. They conclude that all LETFs have a very good daily tracking performance. However, they do not conduct the analysis over a longer horizon, or account for the volatility decay. There is also no discussion of trading strategies there. On the other hand, Guedj et al. [5] discuss the difficulties faced by an ETF provider in replicating a commodity index using futures. In particular, they point out that the term structure of futures may lead to large deviations between the ETF price and the spot price of a commodity.

In this paper, we analyze the tracking performance of commodity leveraged ETFs. Through a series of regression analyses, we illustrate how the returns of commodity LETFs deviate from the reference returns multiplied by the leverage ratio over different holding periods. In particular, the average tracking error tends to turn more negative over a longer horizon and for higher leveraged ETFs. Bearing in mind that the realized variance of the reference can erode the LETF value, we examine the over/under-performance of LETFs with respect to a benchmark that incorporates the effect of volatility decay. From empirical data, we find that many commodity leveraged ETFs in our study underperform significantly against the benchmark, and we quantify such a discrepancy by introducing the realized effective fee. Finally, we consider a static trading strategy that involves shorting two LETFs with leverage ratios of different signs, and study its performance and dependence on the realized variance of the reference. We find that the resulting portfolio is always long realized variance both theoretically and empirically, but is also exposed to the tracking errors associated with the two LETFs. We also backtest the strategy by examining its empirical returns over rolling periods.

The rest of the paper is organized as follows. In Sect. 2, we analyze the returns of commodity LETFs over different holding periods and illustrate horizon dependence of tracking errors. In Sect. 3, we use a benchmark process that incorporates the realized variance of the reference to study the over/under-performance of each LETF. In Sect. 4, we discuss a static trading strategy and backtest using historical data. Section 5 concludes the paper and points out a number of directions for future research.

2 Analysis of Tracking Error

We first compare the returns of LETFs and their reference indices. For every ETF, we obtain its closing prices and reference index values from Bloomberg for the period Dec 2008–May 2013. We then calculate the n-day returns for \(n \in \{ 1,2,\ldots,30\}\) using disjoint successive periods (e.g. for 30-day returns, the return over days 1–30, then the return over days 31–60, and so on). Let \(L_{t}\) be the price of an LETF and \(S_{t}\) be the reference index value at time t. For a given leverage ratio β, we compare the log-returns of the LETF to β times the log-returns of the corresponding reference index. This leads us to define the n-day tracking error at time t by

$$\displaystyle{ Y _{t}^{(n)} =\ln \frac{L_{t+n\varDelta t}} {L_{t}} -\beta \ln \frac{S_{t+n\varDelta t}} {S_{t}}, }$$
(1)

where Δt represents one trading day. We explore the empirical distribution of the n-day tracking error, and then analyze the effect of holding horizon on the magnitude of tracking errors. We remark that there are alternative ways to define tracking errors for ETFs. For example, one can consider the difference in relative returns as opposed to log-returns, or the root mean square of the daily differences (see [10]).
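As a minimal sketch of definition (1), the n-day tracking errors over disjoint successive periods could be computed from two aligned arrays of daily closing values as follows. The function name `tracking_errors` and the sample prices are hypothetical; the actual study uses Bloomberg data that is not reproduced here.

```python
import numpy as np

def tracking_errors(letf_prices, ref_prices, beta, n):
    """n-day tracking errors from Eq. (1) over disjoint successive periods.

    Both inputs are aligned daily closes; every n-th observation is kept so
    that consecutive n-day log-returns do not overlap.
    """
    L = np.asarray(letf_prices, dtype=float)[::n]
    S = np.asarray(ref_prices, dtype=float)[::n]
    letf_log_ret = np.diff(np.log(L))
    ref_log_ret = np.diff(np.log(S))
    return letf_log_ret - beta * ref_log_ret

# Hypothetical data: a reference path and an idealized LETF whose log-return
# is exactly twice the reference log-return, so every tracking error is zero.
ref = np.array([100.0, 101.0, 99.0, 102.0, 103.0, 101.5, 104.0])
letf = 50.0 * (ref / ref[0]) ** 2
errors = tracking_errors(letf, ref, beta=2.0, n=2)
print(errors)
```

With seven daily closes and n = 2, the function keeps four observations and returns three disjoint 2-day tracking errors.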

2.1 Regression of Empirical Returns

We conduct a regression between log-returns of the LETF and its reference index based on the linear model:

$$\displaystyle{ \ln \frac{L_{t}} {L_{0}} =\hat{\beta }\ln \frac{S_{t}} {S_{0}} +\hat{ c}+\epsilon, }$$
(2)

where \(\epsilon \sim N(0,\sigma ^{2})\) is independent of the reference index value \(S_{t}\), \(\forall t \geq 0\). In other words, we run an ordinary least squares single-variable regression between the log-returns for every fixed horizon of n days. Then, we increase the holding period from 1 to 30 days, and observe how the regression coefficients vary.

We display the regression results in Figs. 1, 2, 3, and 4 for log-returns over periods of 1, 5, 10, and 20 days. To avoid dependence among returns, we use disjoint time intervals to calculate returns. For example, we use \(\frac{S_{20}} {S_{0}}, \frac{S_{40}} {S_{20}} \ldots\) and \(\frac{L_{20}} {L_{0}}, \frac{L_{40}} {L_{20}} \ldots\) for 20-day log-returns as the inputs for the regression.
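The regression (2) on disjoint n-day log-returns can be sketched as below. This is an illustration on simulated data, not a reproduction of the figures: the function name `regress_log_returns`, the random seed, and the simulated drift and noise levels are all assumptions.

```python
import numpy as np

def regress_log_returns(letf_prices, ref_prices, n):
    """OLS fit of Eq. (2) on disjoint n-day log-returns.

    Returns (beta_hat, c_hat, r_squared).
    """
    y = np.diff(np.log(np.asarray(letf_prices, dtype=float)[::n]))
    x = np.diff(np.log(np.asarray(ref_prices, dtype=float)[::n]))
    beta_hat, c_hat = np.polyfit(x, y, 1)        # slope, then intercept
    residuals = y - (beta_hat * x + c_hat)
    r_squared = 1.0 - residuals.var() / y.var()
    return beta_hat, c_hat, r_squared

# Simulated (hypothetical) daily log-returns: the LETF earns twice the
# reference log-return, minus a small daily drag, plus tracking noise.
rng = np.random.default_rng(0)
ref_lr = rng.normal(0.0, 0.01, 600)
letf_lr = 2.0 * ref_lr - 0.0002 + rng.normal(0.0, 0.001, 600)
ref = 100.0 * np.exp(np.concatenate([[0.0], np.cumsum(ref_lr)]))
letf = 100.0 * np.exp(np.concatenate([[0.0], np.cumsum(letf_lr)]))

beta_hat, c_hat, r2 = regress_log_returns(letf, ref, n=5)
print(beta_hat, c_hat, r2)
```

Repeating the call for n = 1, ..., 30 would trace out how the slope, intercept, and coefficient of determination change with the holding period.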

Fig. 1 From top left to bottom right: regression of DJUSEN-DIG (β = 2, oil & gas) 1, 5, 10, 20-day log-returns. We consider disjoint periods from Dec 2008 to May 2013.

Fig. 2 From top left to bottom right: regression of DJUSEN-DUG (\(\beta = -2\), oil & gas) 1, 5, 10, 20-day log-returns. We consider disjoint periods from Dec 2008 to May 2013.

Fig. 3 From top left to bottom right: regression of GOLDLNPM-UGL (β = 2, gold) 1, 5, 10, 20-day log-returns. We consider disjoint periods from Dec 2008 to May 2013.

Fig. 4 From top left to bottom right: regression of GOLDLNPM-GLL (\(\beta = -2\), gold) 1, 5, 10, 20-day log-returns. We consider disjoint periods from Dec 2008 to May 2013.

In Fig. 1, the regression coefficient \(\hat{\beta }\) for DIG (β = 2, oil & gas) increases from 2 to 2.1 as the holding period lengthens from 1 to 20 days. Although the coefficient of determination \(R^{2}\) is close to 99 % for up to 20 days, it is highest for 1-day returns. In Fig. 2 for DUG (\(\beta = -2\), oil & gas), one again observes \(\hat{\beta }\) increasing and \(R^{2}\) decreasing: as n varies from 1 to 20, \(\hat{\beta }\) increases from −2 to −1.66. This implies that DIG (β = 2, oil & gas) effectively gains leverage as the holding time increases, while DUG (\(\beta = -2\), oil & gas) loses leverage compared to the advertised fund β.

On the other hand, UGL (β = 2, gold) and GLL (\(\beta = -2\), gold) exhibit very different return behaviors. In Fig. 3, the \(R^{2}\) for UGL (β = 2, gold) is, surprisingly, lowest for the shortest holding period of 1 day, whereas it increases to 95 % over a holding period of 20 days. In Fig. 4 for GLL (\(\beta = -2\), gold), the \(R^{2}\) increases from 35 % to 96 % as the holding period lengthens from 1 to 20 days. Furthermore, the estimators \(\hat{\beta }\) for UGL (β = 2, gold) and GLL (\(\beta = -2\), gold) both slowly approach their advertised β = ±2. The variation of \(\hat{\beta }\) for DIG (β = 2, oil & gas) and UGL (β = 2, gold) over different holding periods is summarized in Fig. 5.

Fig. 5 The estimated \(\hat{\beta }\) from the regressions for DJUSEN-DIG (β = 2, oil & gas), and GOLDLNPM-UGL (β = 2, gold).

We observe that LETFs that track an illiquid reference, such as the gold bullion index GOLDLNPM, tend to have larger tracking errors than those tracking a liquid index, such as the oil & gas index DJUSEN. The oil & gas commodity LETFs involve exchange-traded futures, which are a liquid proxy for the spot price. The gold and silver bullion LETFs consist of OTC total return swaps. The difficulty and higher cost of replication using swaps, as well as the infrequent (typically daily) update of the swaps’ mark-to-market values, can weaken the fund’s tracking ability. For example, the 1-day regressions of UGL and GLL (β = ±2, gold) yield \(R^{2}\) values less than 40 %, while DIG and DUG (β = ±2, oil & gas) have 1-day \(R^{2}\) values of over 90 %. On the other hand, full physical replication yields the greatest \(R^{2}\), as exemplified by the non-leveraged gold and silver ETFs, GLD and SLV, respectively. Hence, the replication strategy can significantly affect a fund’s tracking errors. A more precise understanding of the effectiveness of swaps, futures, and other replication strategies requires the full holdings history from the ETF provider, which is not publicly available at all times (see Footnote 4).

In addition, the LETFs we studied have an increasingly negative constant coefficient \(\hat{c}\) as the holding time increases. For example, over a holding period of 20 days, DUG (\(\beta = -2\), oil & gas) has a 3 % decay on returns compared to β times its reference index. This phenomenon is to be expected, since the LETF needs to buy high and sell low, while the reference investor simply holds his securities. Therefore, the longer the LETF is held, the more likely the fund will underperform against β times the reference index. As we will see in Sect. 3, the constant coefficient \(\hat{c}\) depends on two factors: the expense fee charged by the issuer and the realized variance of the reference index.

Hence, with this simple linear model for LETF prices, we have observed that although LETFs safely replicate β times the reference over short holding periods, they begin to exhibit negative tracking error and deviations in their leverage ratios β as the holding time increases. Furthermore, we see that LETFs which attempt to track illiquid spot prices perform much more poorly than expected. We conclude that more factors must be considered when modeling LETF returns.

2.2 Distribution of Tracking Errors

As defined in (1), the tracking error is the difference between the LETF’s log-return and the corresponding multiple of its reference index’s log-return. In this section, we examine the distribution of the tracking error. This provides a picture of the LETF’s efficiency in its stated goal of replicating the leveraged return of a reference index.

For the 23 LETFs in Table 2, we compute the mean μ and standard deviation σ for the tracking errors using available price data during the period Dec 2008 to May 2013. For all these funds, the mean 1-day tracking error has μ ≈ 0, ranging from 0 % to −0.27 %. Therefore, all these LETFs on average successfully replicate the stated multiple β of the daily reference return, with a slight negative bias. In fact, many LETFs even continued to replicate returns over periods as long as 10 days. However, as the holding time increases, the average tracking error grows more negative, so that the LETF in fact underperforms its intended goal over longer holding periods (see Fig. 6).

Fig. 6 Histograms and QQ plots of 1-day tracking errors for DIG, DUG (β = ±2, oil & gas); UGL, GLL (β = ±2, gold) from top to bottom.

Table 2 Mean μ and standard deviation σ of the 1-day tracking error by commodity

Interestingly, the tracking errors for the silver and gold LETFs (AGQ, ZSL (β = ±2, silver); UGL, GLL (β = ±2, gold)) in Table 2 have σ several orders of magnitude higher than μ. For example, AGQ (β = 2, silver) has a tracking error σ of 5 % compared to a μ of 0.01 %. In other words, while these four LETFs might track their references well on average, they may also exhibit large positive and negative deviations over 1-day holding periods. These observations are consistent with the regressions in Figs. 3 and 4, where UGL and GLL (β = ±2, gold) show significant 1-day tracking errors. On the other hand, the non-leveraged gold and silver bullion ETFs, GLD and SLV, have almost no tracking error (σ ≈ 0), because they hold the underlying bullion according to their prospectuses. Since many investors use these LETFs to gain leveraged exposure to commodities, they should be aware of the large variance of the associated tracking errors.

In Fig. 6, we show the histogram for the tracking error for each ETF along with a quantile-quantile plot to illustrate the distribution. For DIG and DUG (β = ±2, oil & gas), the quantile-quantile plot shows that the tracking error distribution is not quite normal, and has a large negative tail, so that the commodity LETF tracking error is negatively biased even for the shortest possible holding period of 1 day. On the other hand, for UGL and GLL (β = ±2, gold) the distribution appears to be normal with \(R^{2}\) close to 98 %. However, as noted in Table 2, the tracking errors for UGL and GLL (β = ±2, gold) also have a very large variance.

Next, we examine the horizon effect of tracking errors. Figure 7 indicates that higher leveraged ETFs tend to have more negative average tracking errors, which appear to decrease linearly over longer holding periods. In addition, negative leveraged LETFs have a more negative average tracking error than their positive counterparts. For example, in Fig. 7, GLL (\(\beta = -2\), gold) has a lower slope than UGL (β = 2, gold) even though they have the same absolute value of leverage ratio |β|. Furthermore, with few exceptions, the average tracking error is most negative when \(\beta = -3\) followed by \(\beta = 3,-2,2,-1,1\). Thus, the penalty for a longer holding horizon is higher for short LETFs than for long LETFs.

Fig. 7 A plot of no. of days vs the mean tracking error arranged by commodities tracked. From top left to bottom right: US Oil & Gas, Gold, Crude Oil, and Silver. As the holding period increases, the average tracking error becomes more negative as well.

Our analysis of the tracking error distribution reveals several characteristics of the tracking error defined in (1). Over a very short holding period, most LETFs perform close to their objectives stated in their prospectuses. Nevertheless, the realized tracking error varies over time, and can be positive or negative. For gold and silver LETFs, the tracking error is more volatile. Moreover, the magnitude of the mean tracking error depends heavily on the β of the LETF, with bear LETFs suffering a higher penalty than bull LETFs.

3 Incorporating Realized Variance into Tracking Error Measurement

As is well known in the industry (see [2, 3]), the price dynamics of an LETF depend on the realized variance of the reference index. This leads us to incorporate the realized variance in measuring the performance of an LETF. We run a regression analysis based on empirical LETF and reference prices that incorporates the realized variance as an independent variable. We then derive a realized effective fee associated with each LETF and analyze the realized price behavior relative to a theoretical benchmark to better quantify the over/under-performance.

3.1 Model for the LETF Price

Let \(S_{t}\) be the price of the reference index, and \(L_{t}\) be the price of the LETF at time t. Also denote by f the expense rate, by r the interest rate, and by β the leverage ratio. Assume the reference asset follows the SDE

$$\displaystyle{ \frac{dS_{t}} {S_{t}} =\mu _{t}dt +\sigma _{t}dW_{t},\quad t \geq 0, }$$
(3)

with stochastic drift \((\mu _{t})_{t\geq 0}\) and volatility \((\sigma _{t})_{t\geq 0}\). For our analysis herein, we assume a general diffusion framework, but do not need to specify a parametric model. Many well-known models, including the CEV, Heston, and exponential Ornstein-Uhlenbeck models, fit within the above framework.

A long β-LETF L can be constructed through a dynamic portfolio. Specifically, the portfolio at time t consists of the cash amount \(\beta L_{t}\) (in dollars) invested in the reference index \(S_{t}\), while the amount \((\beta -1)L_{t}\) is borrowed at the positive risk-free rate r. As a result, the LETF satisfies the SDE

$$\displaystyle{ dL_{t} = L_{t}\beta \frac{dS_{t}} {S_{t}} - L_{t}((\beta -1)r + f)dt. }$$
(4)

Solving the SDE, the log-return of the LETF is given by

$$\displaystyle{ \ln \frac{L_{t}} {L_{0}} =\beta \ln \frac{S_{t}} {S_{0}} + \frac{\beta -\beta ^{2}} {2} V _{t} + ((1-\beta )r - f)t, }$$
(5)

where

$$\displaystyle{ V _{t} =\int _{ 0}^{t}\sigma _{ s}^{2}ds }$$
(6)

is the realized variance of S accumulated up to time t. Therefore, under this general diffusion model, the log-return of the LETF depends linearly on the log-return of the reference index with coefficient β, and on the realized variance with coefficient \(\frac{\beta -\beta ^{2}} {2}\). The latter coefficient is negative if \(\beta \notin [0,1]\), which is true for every LETF traded on the market. Also, the expense fee f reduces the return of the LETF.
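For completeness, the step from (4) to (5) can be sketched as follows. This is the standard argument via Itô's formula applied to \(\ln L_{t}\) and \(\ln S_{t}\) under (3), assuming constant r and f; it is offered as a sketch rather than as the authors' own derivation. Since (3) and (4) give \(d\langle \ln L\rangle _{t} =\beta ^{2}\sigma _{t}^{2}dt\) and \(d\langle \ln S\rangle _{t} =\sigma _{t}^{2}dt\),

$$\displaystyle{ d\ln L_{t} =\beta \frac{dS_{t}} {S_{t}} -\frac{\beta ^{2}} {2} \sigma _{t}^{2}dt - ((\beta -1)r + f)dt,\qquad \frac{dS_{t}} {S_{t}} = d\ln S_{t} + \frac{1} {2}\sigma _{t}^{2}dt. }$$

Substituting the second identity into the first yields

$$\displaystyle{ d\ln L_{t} =\beta \, d\ln S_{t} + \frac{\beta -\beta ^{2}} {2} \sigma _{t}^{2}dt + ((1-\beta )r - f)dt, }$$

and integrating from 0 to t, with \(V_{t}\) as in (6), recovers (5).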

Our regression analysis will focus on testing the functional form (5). We observe from (5) that the functional form of \(L_{t}\) in terms of \(S_{t}\) and \(V_{t}\) holds for any parametric model within the diffusion framework in (3). Considering the daily LETF returns, we set \(\varDelta t = \frac{1} {252}\) as one trading day. Let \(R_{t}^{S}\) be the daily return of the reference index at time t. At any time t, the n-day log-return of an LETF follows

$$\displaystyle{ \ln \frac{L_{t+n\varDelta t}} {L_{t}} =\beta \ln \frac{S_{t+n\varDelta t}} {S_{t}} + \frac{\beta -\beta ^{2}} {2} V _{t}^{(n)} + ((1-\beta )r - f)n\varDelta t, }$$
(7)
$$\displaystyle{ V _{t}^{(n)} =\sum _{ i=0}^{n-1}(R_{ t+i\varDelta t}^{S} -\bar{ R_{ t}}^{S})^{2},\quad \bar{R_{ t}}^{S} = \frac{1} {n}\sum _{i=0}^{n-1}R_{ t+i\varDelta t}^{S}. }$$
(8)

This serves as a benchmark process for our subsequent analysis.
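A minimal sketch of the benchmark in (7) and (8) is given below. The function names are hypothetical, and one modelling choice is an assumption: the daily reference return \(R^{S}\) is taken to be the simple (relative) daily return, which the text does not specify.

```python
import numpy as np

def realized_variance(ref_daily_returns):
    """Eq. (8): sum of squared deviations of daily returns from their mean."""
    r = np.asarray(ref_daily_returns, dtype=float)
    return float(np.sum((r - r.mean()) ** 2))

def benchmark_log_return(ref_prices, beta, r=0.0, f=0.0, dt=1.0 / 252):
    """Eq. (7): benchmark n-day LETF log-return implied by the reference path.

    ref_prices holds n + 1 daily closes; r and f are annualized rates.
    """
    S = np.asarray(ref_prices, dtype=float)
    n = len(S) - 1
    daily_returns = S[1:] / S[:-1] - 1.0   # assumption: simple daily returns
    v = realized_variance(daily_returns)
    return (beta * np.log(S[-1] / S[0])
            + 0.5 * (beta - beta ** 2) * v
            + ((1.0 - beta) * r - f) * n * dt)

# Hypothetical round trip of the reference: up 10%, then back to the start.
ref_path = [100.0, 110.0, 100.0]
v_demo = realized_variance(np.array([110.0 / 100.0 - 1.0, 100.0 / 110.0 - 1.0]))
bench_1x = benchmark_log_return(ref_path, 1.0)
bench_2x = benchmark_log_return(ref_path, 2.0)
bench_m2x = benchmark_log_return(ref_path, -2.0)
print(v_demo, bench_1x, bench_2x, bench_m2x)
```

With zero rates and fees, the non-leveraged benchmark return is zero on this path, while the ±2 benchmarks are negative by \(V\) and \(3V\) respectively, in line with the factor \(\frac{\beta -\beta ^{2}} {2}\).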

3.2 Regression of Empirical Returns

The log-return equation (7) suggests a regression with two predictors: the log-returns and the realized variance of the reference over n days. This results in the linear model

$$\displaystyle{ \ln \frac{L_{t}} {L_{0}} =\hat{\beta }\ln \frac{S_{t}} {S_{0}} +\hat{\theta } V _{t} +\hat{ c}+\epsilon, }$$
(9)

where \(\hat{c}\) is a constant intercept to be determined, and \(\varepsilon \sim N(0,\sigma ^{2})\) is independent of \((S_{t})_{t\geq 0}\).

In Table 3, we summarize the estimated \(\hat{\theta }\) from our regression with holding periods of 30 days. Again, we use price data from disjoint periods to calculate returns. The realized variance is calculated from the daily returns within each 30-day period. The choice of 30-day periods gives us sufficient points to compute the realized variance while providing enough disjoint periods during the period Dec 2008–May 2013 to perform a regression. A longer price history would certainly have helped in balancing this tradeoff, but all these commodity LETFs were introduced only in the past 5 years.
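The two-predictor regression (9) can be sketched with a least-squares solve. The function name `regress_with_variance` and the synthetic inputs are assumptions for illustration; the inputs are taken to be per-period log-returns and realized variances that have already been computed over disjoint periods.

```python
import numpy as np

def regress_with_variance(letf_log_ret, ref_log_ret, realized_var):
    """OLS fit of Eq. (9). Returns (beta_hat, theta_hat, c_hat)."""
    y = np.asarray(letf_log_ret, dtype=float)
    X = np.column_stack([ref_log_ret, realized_var, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coef)

# Synthetic (hypothetical) 30-day observations generated exactly from the
# benchmark form with beta = 2, theta = (2 - 4) / 2 = -1 and intercept -0.001.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.08, 50)        # reference log-returns per period
v = rng.uniform(0.002, 0.02, 50)     # realized variance per period
y = 2.0 * x - 1.0 * v - 0.001

beta_hat, theta_hat, c_hat = regress_with_variance(y, x, v)
print(beta_hat, theta_hat, c_hat)
```

On noise-free data the fit recovers the generating coefficients; on empirical data the estimate \(\hat{\theta }\) would be compared with the theoretical \(\frac{\beta -\beta ^{2}} {2}\).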

Table 3 \(\hat{\theta }\) vs. θ, estimated from 30-day multi-variable regression of returns, with a partial correlation table

Our empirical analysis confirms several aspects of our theoretical model in (5) and provides explanations in cases where there is discrepancy. The theoretical value of θ according to (5) is given by \(\frac{\beta -\beta ^{2}} {2}\). Table 3 shows that the estimator \(\hat{\theta }\) is typically in the neighborhood of θ, its theoretical value. For example, SCO (\(\beta = -2\), crude oil) has an estimated \(\hat{\theta }\) of magnitude 2.93 versus a theoretical magnitude of 3 (the formula gives \(\theta = -3\) for \(\beta = -2\)). In addition, the non-leveraged ETFs all have \(\hat{\theta }\) close to 0, suggesting that realized variance does not play an important role in their price processes, as predicted. However, some LETFs have \(\hat{\theta }\) diverging significantly from θ. For example, the \(\hat{\theta }\) for UGL (β = 2, gold) differs from its theoretical value by a factor of 114 % even with a regression \(R^{2}\) of 99 %.

We attribute the deviation of \(\hat{\theta }\) from θ in our regression to the collinearity effect of the two predictors (\(\ln \frac{S_{t}} {S_{0}}\) and \(V_{t}\)). Of course \(\ln \frac{S_{t}} {S_{0}}\) and \(V_{t}\) cannot be independent observations, since \(V_{t}\) depends on the price path of \(S_{t}\), the reference index. In general, the reference returns and the realized variance are negatively correlated. When the realized variance is high, it is likely the reference has suddenly dropped in value. When the realized variance is low, it usually implies a period of steady positive growth for the reference. Thus, the multi-collinearity effect is responsible for shifting predictive power among the different predictor variables. In order to measure the magnitude of the collinearity effect and the contribution of each correlated predictor variable, we compute the coefficients of partial determination for our regression model.

The factor \(r_{y\vert x}^{2}\) measures the marginal predictive power of adding the realized variance into the model. As \(r_{y\vert x}^{2}\) increases, \(\hat{\theta }\) becomes closer to θ, suggesting a larger dependence of LETF returns on realized variance during holding periods of high volatility. For example, the three LETFs DIG (β = 2, oil & gas), SCO (\(\beta = -2\), crude oil), and UYM (β = 2, building materials) all have \(r_{y\vert x}^{2}\) over 90 %. Their estimated \(\hat{\theta }\) is similarly very close to the theoretical θ, never differing by more than 10 %. However, for non-leveraged ETFs, the realized variance has minimal added predictive power in the model. For those ETFs, we observe \(\hat{\theta }\approx 0\). For example, SLV (β = 1, silver), GLD (β = 1, gold), and DBO (β = 1, crude oil) all have \(r_{y\vert x}^{2} \approx 0\), and they subsequently have \(\hat{\theta }\approx 0\). In addition, \(r_{x\vert y}^{2}\), which is the marginal predictive power of adding the log-returns of the reference into our regression model, is always very high. This indicates that the log-returns of the reference affect the LETF prices the most, but that the realized variance is still important for predictive power, especially when the leverage is high and the holding period is long.
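A coefficient of partial determination of this kind can be sketched as below. The sketch uses the standard textbook definition, namely the share of the residual sum of squares left by the reduced model that is removed by adding the extra predictor; that this matches the exact computation behind Table 3 is an assumption. The function names and synthetic data are hypothetical.

```python
import numpy as np

def sse(y, *predictors):
    """Residual sum of squares of an OLS fit of y on predictors plus intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([*predictors, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

def partial_r2(y, kept, added):
    """Share of the SSE left by `kept` alone that is removed by adding `added`."""
    reduced = sse(y, kept)
    full = sse(y, kept, added)
    return (reduced - full) / reduced

# Synthetic (hypothetical) per-period data.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.08, 50)            # reference log-returns
v = rng.uniform(0.002, 0.02, 50)         # realized variance
noise = rng.normal(0.0, 0.0005, 50)
leveraged = 2.0 * x - 1.0 * v + noise    # beta = 2, theta = -1
unleveraged = 1.0 * x + noise            # beta = 1, theta = 0

p_lev = partial_r2(leveraged, x, v)      # marginal power of adding variance
p_unlev = partial_r2(unleveraged, x, v)
print(p_lev, p_unlev)
```

In this synthetic setting the realized variance adds substantial explanatory power for the leveraged series and very little for the non-leveraged one, which mirrors the qualitative pattern described above.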

3.3 Realized Effective Fee

In Fig. 8, we show three empirical price paths: the LETF log-returns, the benchmark process defined in (5), and β times the reference index log-returns. As we can see, the value erosion due to realized variance (volatility decay) starts to play a significant role in determining LETF prices as the holding time increases. The path associated with β times the reference log-returns dominates the LETF log-returns after about 1 month of holding. After about 1 year, the benchmark which incorporates volatility decay more closely models the empirical LETF log-returns. For example, after 6 months of holding, SCO (\(\beta = -2\), crude oil) diverges from β times the reference, illustrating the effects of volatility decay.

Fig. 8 Cumulative empirical log-returns of the LETF (solid dark) vs benchmark (solid light) and β times reference (dashed light), from Dec 2008–May 2013. From top left to bottom right: UCO, SCO (crude oil); UGL, GLL (gold); DIG, DUG (oil & gas). UCO, UGL, and DIG have β = 2 while SCO, GLL, and DUG have \(\beta = -2\).

However, there are also some strong deviations from the predictions given by the benchmark, which compound as the holding time increases. This causes the LETF to underperform even after the volatility decay is accounted for. For example, DUG’s (\(\beta = -2\), oil & gas) empirical returns begin to trail its benchmark significantly around 2009. Therefore, the volatility decay cannot explain all the LETF underperformance.

We are therefore motivated to quantify the over/under-performance of the LETFs after observing deviations from the benchmark in Fig. 8. We introduce the concept of realized effective fee (REF) as the effective deduction rate charged by the LETF provider over the frictionless dynamic portfolio from which the LETF is constructed in Sect. 3.1. For a holding interval [0, t], the corresponding REF is defined by

$$\displaystyle{ \widehat{f_{t}} = (1-\beta )r -\frac{\ln \frac{L_{t}} {L_{0}} -\beta \ln \frac{S_{t}} {S_{0}} -\frac{\beta -\beta ^{2}} {2} V _{t}} {t}. }$$
(10)

Since for each LETF, \(L_{t}\), \(S_{t}\), \(V_{t}\), β, and r are all known, we can calculate the REF \(\widehat{f_{t}}\) for any LETF over a given holding period [0, t] using historical prices. We remark that the REF, which is indexed by time t, depends on the selected holding horizon.
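As a minimal sketch, (10) translates directly into a function of the observed quantities. The function name `realized_effective_fee` and all numerical inputs below are hypothetical; the example builds an LETF price from the benchmark (5) with a known fee and checks that the REF recovers it.

```python
import math

def realized_effective_fee(L0, Lt, S0, St, V_t, beta, r, t):
    """Eq. (10): annualized realized effective fee over [0, t], t in years."""
    excess = (math.log(Lt / L0)
              - beta * math.log(St / S0)
              - 0.5 * (beta - beta ** 2) * V_t)
    return (1.0 - beta) * r - excess / t

# Hypothetical inputs: a -2x LETF held for 4.5 years.
beta, r, t = -2.0, 0.01, 4.5
S0, St, V_t = 100.0, 130.0, 0.35
true_fee = 0.0095                          # 95 bps, an assumed fee

# Price path endpoint implied by the benchmark (5) with the assumed fee.
L0 = 50.0
Lt = L0 * math.exp(beta * math.log(St / S0)
                   + 0.5 * (beta - beta ** 2) * V_t
                   + ((1.0 - beta) * r - true_fee) * t)

ref_fee = realized_effective_fee(L0, Lt, S0, St, V_t, beta, r, t)
print(round(ref_fee * 1e4, 2), "bps")
```

When the LETF follows the benchmark exactly, the REF equals the fee used to generate it; any additional underperformance in real data would show up as an REF above the advertised fee.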

In many cases, the REF is seen to be much larger than the fund’s advertised fee, indicating significant underperformance. Out of the 23 commodity LETFs, 2 have negative implied costs, so that these funds overperform by the end of the 5-year period Dec 2008 to May 2013. If the REF exceeds the advertised fee, then the investor effectively pays an extra price for the opportunity to invest in the LETF. As a general trend, the bear LETFs tend to charge higher REFs than bull LETFs with the same magnitude of leverage |β|. For example, USLV (β = 2, silver) has an REF of 93 bps, while DSLV (\(\beta = -2\), silver) has an REF of 504 bps over the period Dec 2008–May 2013. The two highest REFs correspond to DUG (\(\beta = -2\), oil & gas) and SMN (\(\beta = -2\), building materials), whose REFs are 1,134 bps and 1,625 bps respectively. Figure 8 illustrates that DUG (\(\beta = -2\), oil & gas) drastically underperforms the benchmark, thereby realizing a high REF. Notice, however, that the bull counterparts of DUG and SMN, namely DIG (β = 2, oil & gas) and UYM (β = 2, building materials) respectively, display a negative REF, indicating overperformance during the same period. It is possible that as the reference trends upwards for a long period of time, the bear LETF will underperform, while the bull LETF will overperform (Table 4).

Table 4 Comparison of the official fee for the LETF charged on the fund prospectus and the REF calculated using 5 years of price data (Dec 2008–May 2013) for the LETF and reference (see (10))

4 A Static LETF Portfolio

Taking advantage of the volatility decay, a well-known trading strategy used by practitioners involves shorting a ±β pair of LETFs with the same reference, as discussed in [2, 7, 9, 11]. Since the LETFs have opposite daily returns on the same reference index, the portfolio has very little exposure to the reference as long as the holding period is sufficiently short. With this strategy, the volatility decay can help generate profit, which is the intuition of many practitioners. However, the portfolio is exposed to risk during periods of low volatility and high trending, as well as tracking errors. In this section, we describe an extension of this trading strategy by allowing the positive and negative leverage ratios to differ. We determine the portfolio weights to approximately eliminate the dependence on the reference. We show that the resulting portfolio is long volatility. For a number of LETF pairs, we find from empirical data that on average the strategy is profitable but carries enormous tail risk.

We now construct a weighted portfolio which is short the LETF with leverage ratio \(\beta _{+} > 0\) and short another LETF with leverage ratio \(\beta _{-} < 0\). We emphasize that both LETFs have the same reference, but that \(\beta _{+}\) and \(\vert \beta _{-}\vert\) may differ. We hold fraction ω ∈ (0, 1) of the portfolio in the \(\beta _{+}\)-LETF and (1 − ω) of the portfolio in the \(\beta _{-}\)-LETF. At time T, the normalized return from this strategy is

$$\displaystyle{ \mathcal{R}_{T} = 1 -\omega \frac{L_{T}^{+}} {L_{0}^{+}} - (1-\omega )\frac{L_{T}^{-}} {L_{0}^{-}}. }$$
(11)

Applying (5), \(\mathcal{R}_{T}\) admits the expression

$$\displaystyle\begin{array}{rcl} \mathcal{R}_{T}& =& 1 -\omega \left (\frac{S_{T}} {S_{0}} \right )^{\beta _{+}}\exp (\varGamma _{T}^{+}) - (1-\omega )\left (\frac{S_{T}} {S_{0}} \right )^{\beta _{-}}\exp (\varGamma _{T}^{-}),{}\end{array}$$
(12)

where

$$\displaystyle{ \varGamma _{T}^{\pm } = \frac{\beta _{\pm }-\beta _{\pm }^{2}} {2} V _{T} + ((1 -\beta _{\pm })r - f_{\pm })T. }$$
(13)

Here, \(\beta _{\pm }\) and \(f_{\pm }\) are the respective leverage ratios and fees of the two LETFs in the portfolio defined in (11). Over a short holding period such that \(\frac{L_{T}} {L_{0}} \approx 1\), one can pick an appropriate weight ω to approximately remove the dependence of \(\mathcal{R}_{T}\) on \(S_{T}\).

Proposition 1.

Select the portfolio weight \(\omega ^{{\ast}} = \frac{-\beta _{-}} {\beta _{+}-\beta _{-}}\) . For \(\frac{L_{T}} {L_{0}} \approx 1\) , the return from this strategy is given by

$$\displaystyle{ \mathcal{R}_{T} = \frac{-\beta _{-}\beta _{+}} {2} V _{T} - \frac{\beta _{-}} {\beta _{+} -\beta _{-}}(f_{+} - f_{-})T + (f_{-}- r)T. }$$
(14)

Proof.

For \(\frac{L_{T}} {L_{0}} \approx 1\), we can replace each ratio \(\frac{L_{T}} {L_{0}}\) in (11) with \(\ln \frac{L_{T}} {L_{0}} + 1\). Then, we set \(\omega =\omega ^{{\ast}}\) and apply (5) to conclude (14).
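The steps of the proof can be written out as the following sketch. It assumes that (5) gives the log-return \(\ln \frac{L_{T}^{\pm }} {L_{0}^{\pm }} =\beta _{\pm }\ln \frac{S_{T}} {S_{0}} +\varGamma _{T}^{\pm }\), which is the form implied by (12). Using \(\frac{L_{T}} {L_{0}} \approx \ln \frac{L_{T}} {L_{0}} + 1\) in (11),

$$\displaystyle{ \mathcal{R}_{T} \approx -\omega \ln \frac{L_{T}^{+}} {L_{0}^{+}} - (1-\omega )\ln \frac{L_{T}^{-}} {L_{0}^{-}} = -\big(\omega \beta _{+} + (1-\omega )\beta _{-}\big)\ln \frac{S_{T}} {S_{0}} -\omega \varGamma _{T}^{+} - (1-\omega )\varGamma _{T}^{-}. }$$

With \(\omega =\omega ^{{\ast}}\), the two identities

$$\displaystyle{ \omega ^{{\ast}}\beta _{+} + (1 -\omega ^{{\ast}})\beta _{-} = 0,\qquad \omega ^{{\ast}}\beta _{+}^{2} + (1 -\omega ^{{\ast}})\beta _{-}^{2} = -\beta _{-}\beta _{+} }$$

remove the \(\ln \frac{S_{T}} {S_{0}}\) term and, after substituting (13), give

$$\displaystyle{ -\omega ^{{\ast}}\varGamma _{T}^{+} - (1 -\omega ^{{\ast}})\varGamma _{T}^{-} = \frac{-\beta _{-}\beta _{+}} {2} V _{T} +\big (\omega ^{{\ast}}f_{+} + (1 -\omega ^{{\ast}})f_{-}- r\big)T. }$$

Writing \(\omega ^{{\ast}}f_{+} + (1 -\omega ^{{\ast}})f_{-} =\omega ^{{\ast}}(f_{+} - f_{-}) + f_{-}\) then recovers (14).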

The return (14) corresponding to portfolio weight \(\omega ^{{\ast}}\) reflects a linear dependence on the realized variance. In particular, the coefficient \(\frac{-\beta _{-}\beta _{+}} {2}\) is strictly positive, so the strategy is effectively long volatility (V T ). Also, as it does not depend on S T , the \(\omega ^{{\ast}}\) portfolio is Δ-neutral as long as the reference does not move significantly. In Table 5, we summarize the coefficient of V T and the weighted portfolio \((\omega ^{{\ast}},1 -\omega ^{{\ast}})\) for different combinations of leverage ratios. Note that as long as \(\beta _{+} = -\beta _{-}\), we end up with the portfolio weight \(\omega ^{{\ast}} = \frac{1} {2}\). Also, the coefficient \(\frac{-\beta _{-}\beta _{+}} {2}\) is greater than or equal to 1 except for the pair \((\beta _{+},\beta _{-}) = (1,-1)\), and it is largest for the pair \((\beta _{+},\beta _{-}) = (3,-3)\).

Table 5 Portfolio weight \(\omega ^{{\ast}}\) of the β +-LETF and coefficient \(\frac{-\beta _{-}\beta _{+}} {2}\) of V T for different \((\beta _{+},\beta _{-})\) pairs (see Proposition 1)
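As a rough illustration of how such a table can be generated, the sketch below evaluates the weight \(\omega ^{{\ast}}\) and the coefficient of V T from Proposition 1. The function names and the particular list of leverage pairs are assumptions made for this example, not necessarily the pairs reported in Table 5.

```python
from fractions import Fraction


def neutral_weight(beta_plus, beta_minus):
    """Weight omega* = -beta_- / (beta_+ - beta_-) from Proposition 1."""
    if beta_plus <= 0 or beta_minus >= 0:
        raise ValueError("need beta_plus > 0 and beta_minus < 0")
    return Fraction(-beta_minus, beta_plus - beta_minus)


def variance_coefficient(beta_plus, beta_minus):
    """Coefficient -beta_- * beta_+ / 2 multiplying V_T in (14)."""
    return Fraction(-beta_minus * beta_plus, 2)


# Hypothetical selection of integer leverage pairs, for illustration only.
pairs = [(1, -1), (2, -1), (2, -2), (3, -1), (3, -2), (3, -3)]

for bp, bm in pairs:
    w = neutral_weight(bp, bm)
    c = variance_coefficient(bp, bm)
    # At omega*, the weighted leverage w*bp + (1 - w)*bm vanishes.
    assert w * bp + (1 - w) * bm == 0
    print(f"({bp:+d}, {bm:+d}): omega* = {w}, 1 - omega* = {1 - w}, coefficient = {c}")
```

Exact fractions are used so that the neutrality condition can be checked without rounding error.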

We now backtest the \(\omega ^{{\ast}}\) strategy from Proposition 1 as follows. For each LETF pair, we short $0.5 of the β +-LETF and $0.5 of the β −-LETF with \(\beta _{+} = -\beta _{-} = 2\) and hold the position for some time T. The normalized return \(\mathcal{R}_{T}\) depends on the relative weights of the long and short LETFs but not on the absolute cash amounts. More generally, one can also test the strategy with different β ± and \(\omega ^{{\ast}}\).

Dividing the price data from Dec 2008 to May 2013 into n-day rolling (overlapping) periods, we calculate the returns from the strategy over each period. We compare every n-day return against the realized variance over the same period. This is illustrated in Fig. 9. As a theoretical benchmark, we also plot \(\mathcal{R}_{T}\) in (14) as a linear function. Each point (dot) on the plots represents a 5-day return; since the rolling periods overlap, these returns are not independent. Note that the lines in Fig. 9 are not generated by regression but are taken from (14). We choose (14) as a benchmark because it is expected to hold pathwise as long as \(\frac{L_{T}} {L_{0}} \approx 1\) with negligible tracking error.
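The rolling computation described above could be sketched as follows. This is a minimal illustration, not the code used for Fig. 9: the function name is hypothetical, the realized variance is assumed to be the sum of squared daily log returns of the reference over the window, and the price paths are simulated rather than market data.

```python
import numpy as np


def rolling_double_short(l_plus, l_minus, ref, n, omega=0.5):
    """Return (returns, realized_var) over overlapping n-day windows.

    returns[t]      = 1 - omega * L+_{t+n}/L+_t - (1 - omega) * L-_{t+n}/L-_t,
                      i.e. the normalized return (11) started on day t.
    realized_var[t] = sum of squared daily log returns of the reference over
                      the same window (an assumed estimator of V_T).
    """
    l_plus, l_minus, ref = (np.asarray(x, dtype=float) for x in (l_plus, l_minus, ref))
    returns = (1.0 - omega * l_plus[n:] / l_plus[:-n]
               - (1.0 - omega) * l_minus[n:] / l_minus[:-n])
    sq = np.diff(np.log(ref)) ** 2
    csum = np.concatenate(([0.0], np.cumsum(sq)))
    realized_var = csum[n:] - csum[:-n]
    return returns, realized_var


# Simulated daily reference returns (NOT market data), with idealized
# +2x / -2x LETFs that have no fees and no tracking error.
rng = np.random.default_rng(0)
daily = np.concatenate(([0.0], rng.normal(0.0, 0.02, size=500)))
ref = 100.0 * np.cumprod(1.0 + daily)
l_plus = 100.0 * np.cumprod(1.0 + 2.0 * daily)
l_minus = 100.0 * np.cumprod(1.0 - 2.0 * daily)

rets, rv = rolling_double_short(l_plus, l_minus, ref, n=5)
benchmark = 2.0 * rv  # line (14) with beta = +/-2 and r = f = 0
print(rets.shape, rv.shape, benchmark[:3])
```

Plotting `rets` against `rv` together with the `benchmark` line would give a scatter in the spirit of Fig. 9. With real data, the fee and interest terms in (14) would also need to be included.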

Fig. 9

Plot of trading returns vs realized variance for a double short strategy over 5-day rolling holding periods, with β ± = ±2 for each LETF pair. We compare the empirical returns (circle) from the \(\omega ^{{\ast}}\) strategy with the predicted return (solid line) in Proposition 1. Trading pairs are DIG-DUG (oil & gas), UGL-GLL (gold), UCO-SCO (crude oil), AGQ-ZSL (silver)

We can observe from Fig. 9 that the returns exhibit positive dependence on the realized variance (V T ). In particular, for the energy pairs (DIG-DUG (β = ±2, oil & gas) and UCO-SCO (β = ±2, crude oil)), the returns tend to be very positive when the realized variance is high. This is because the strategy captures the volatility decay as profit. Nevertheless, there is also a visible amount of noise in the returns, which deviate from the linear dependence on V T , especially for the gold and silver pairs (UGL-GLL (β = ±2, gold) and AGQ-ZSL (β = ±2, silver), respectively). This can be partly attributed to tracking errors from both LETFs in the portfolio. Also, the \(\omega ^{{\ast}}\)-strategy loses its Δ-neutrality if the reference moves significantly.

While this portfolio is expected to be Δ-neutral (with respect to the reference index) for small reference movements, in reality the strategy is also short-Γ. One way to see this is through Fig. 10, which plots the strategy returns against the reference index returns. Common to all four LETF pairs, when the reference return is either very positive or very negative, the return of the \(\omega ^{{\ast}}\)-strategy tends to be negative. As a theoretical benchmark, we also plot the normalized return equation (12), which applies even for large reference movements.
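The short-Γ shape of the benchmark can be checked numerically. The sketch below evaluates (12) as a function of the gross reference return x = S T ∕S 0, with Γ T ± set to zero by default. The function name and default arguments are assumptions made for this example. Expanding (12) to second order in y = ln x at \(\omega =\omega ^{{\ast}}\) gives \(\mathcal{R}_{T} \approx \frac{\beta _{+}\beta _{-}} {2} y^{2}\), which is non-positive, so the curve peaks at x = 1.

```python
import math


def double_short_return(x, beta_plus=2.0, beta_minus=-2.0, omega=None,
                        gamma_plus=0.0, gamma_minus=0.0):
    """Evaluate the normalized return (12) at gross reference return x = S_T / S_0.

    If omega is None, the weight omega* from Proposition 1 is used.
    gamma_plus and gamma_minus stand for Gamma_T^+ and Gamma_T^- in (13).
    """
    if omega is None:
        omega = -beta_minus / (beta_plus - beta_minus)
    return (1.0
            - omega * x ** beta_plus * math.exp(gamma_plus)
            - (1.0 - omega) * x ** beta_minus * math.exp(gamma_minus))


# Reference moves of -20% to +20% with Gamma_T^+ = Gamma_T^- = 0.
for x in (0.8, 0.9, 1.0, 1.1, 1.2):
    approx = (2.0 * -2.0 / 2.0) * math.log(x) ** 2  # second-order approximation
    print(f"x = {x:.1f}: R_T = {double_short_return(x):+.5f}, approx = {approx:+.5f}")
```

With zero Γ T ± the return is zero at x = 1 and negative on both sides, so in this setting any profit must come from the Γ T ± terms, that is, from realized variance net of fees and interest.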

Fig. 10

Plot of returns of reference index vs trading returns for a double short strategy over 5-day rolling holding periods, with β ± = ±2 for each LETF pair. We compare the empirical returns from our trading strategy (dark solid circle) with the predicted dependence on reference returns according to (12), using Γ T ± = 0 (light solid line). Trading pairs are DIG-DUG (oil & gas), UGL-GLL (gold), UCO-SCO (crude oil), AGQ-ZSL (silver)

In contrast to the energy pairs, the gold and silver pairs yield very noisy returns. This is consistent with our earlier observations from the regressions in Figs. 3 and 4. For instance, both UGL and GLL (β = ±2, gold) show substantial tracking errors over short periods such as 5 days, and their regressed leverage ratios differ from the stated ones. On the other hand, the DIG and DUG (β = ±2, oil & gas) regressions in Figs. 1 and 2 reflect much smaller tracking errors.

Furthermore, Fig. 11 shows that as the holding time increases, the average returns from the \(\omega ^{{\ast}}\) strategy increase as well. The performance is best for the energy pairs UCO-SCO (β = ±2, crude oil) and DIG-DUG (β = ±2, oil & gas), but more subdued for the bullion pairs UGL-GLL (β = ±2, gold) and AGQ-ZSL (β = ±2, silver). However, over longer holding periods, the \(\omega ^{{\ast}}\) portfolio may lose its Δ-neutral status, thereby generating more risk as well. Although average returns from the \(\omega ^{{\ast}}\) strategy are positive, one is subject to enormous tail risk, which increases with the holding time of the static portfolio. To avoid excessive tail risk, one should not only be confident of a high-volatility environment but also adjust the holding time to account for the additional risk that comes with a longer return horizon.

Fig. 11

Average returns from a double short trading strategy by commodity pair, as a function of the holding period in days, with β ± = ±2 for each LETF pair. Trading pairs are DIG-DUG (oil & gas), UGL-GLL (gold), UCO-SCO (crude oil), AGQ-ZSL (silver)

Figure 12 gives another perspective on the \(\omega ^{{\ast}}\) strategy’s dependence on realized variance. It shows the time series of the 30-day rolling returns along with the realized variance of the reference index from Dec 2008 to May 2013. We see that when the realized variance increases sharply, the strategy returns also spike sharply. For example, when the realized variance of the DJUSEN index spikes, the DIG-DUG (β = ±2, oil & gas) trading pair accumulates a 30 % return over a single 30-day holding period. However, when realized variance is subdued over a period of time, the \(\omega ^{{\ast}}\) strategy returns may turn quite negative as well.

Fig. 12

Time series of returns for a double short strategy over 30-day rolling holding periods, with β ± = ±2 for each LETF pair. Notice that the double short strategy has its greatest returns during the periods of greatest volatility. Trading pairs are DIG-DUG (oil & gas), UGL-GLL (gold), UCO-SCO (crude oil), AGQ-ZSL (silver)

In summary, the double-short trading strategy studied herein is profitable on average, but it is commodity specific and subject to enormous tail risk, as seen from empirical prices. The strategy’s profitability depends strongly on high volatility in the reference index. Although longer holding times tend to enhance the average return, they also greatly increase the horizon risk. According to these findings, this strategy appears to be appealing only during times of high volatility in the reference index.

5 Concluding Remarks

The ETF market has continued to grow in quantity and diversity, especially in the past 5 years. For both investors and regulators, it is very important to understand and quantify the risks involved with various ETFs. In this paper, we have focused on commodity ETFs and their leveraged counterparts. We find that the LETF returns tend to deviate significantly from the corresponding multiple of the reference returns as the holding horizon lengthens. To study the performance of an LETF, we have applied a new benchmark process that accounts for the realized variance of the underlying. We find that many commodity LETFs still diverge, typically negatively, from this benchmark over time. These empirical observations motivate us to illustrate the over/under-performance of an LETF via the concept of realized expense fee. Based on the funds and the time periods we have studied, most commodity LETFs effectively charge significantly higher expense fees than stated on their prospectuses.

In view of LETFs’ common pattern of value erosion over time, one well-known trading strategy in the industry involves statically shorting both long and short LETFs in order to capture the volatility decay as profit. We systematically study an extension of this strategy that is applicable to LETF pairs with asymmetric leverage ratios. We analytically derive the specific weights in the LETFs so that the resulting portfolio is approximately Δ-neutral, although it is short-Γ as well. This strategy can potentially be quite profitable, but its return can be negatively impacted by tracking errors generated by the LETFs and by large movements of the reference index. These two factors both depend on the holding horizon. This should motivate future research on the horizon risk for LETF strategies. To this end, Leung and Santoli [7] study the admissible holding horizon and leverage ratio given a risk constraint. The recent papers [6, 13, 14] examine the dynamics of price spreads between ETF pairs, for example, gold vs. silver.

Our analysis herein does not assume a parametric stochastic volatility model for the underlying. It is of practical interest to investigate the price behavior of LETFs under a number of well-known stochastic volatility models, such as the Heston and SABR models. In addition to LETFs, there are also options written on these funds. This gives rise to the question of consistent pricing of LETF options across leverage ratios (see [1, 8]). Finally, models that capture the connection between LETFs and the broader financial market would be very useful not only for traders and investors, but also for regulators.