INTRODUCTION

For almost two decades, the persistence of hedge fund performance has been the subject of a considerable number of academic studies because of its important implications for investors (see Eling, 2009). Investors are naturally interested in selecting the funds with the highest future performance but typically base their decisions on past returns (see Capon et al, 1996). Thus, the identification of the future best managers on the basis of a track record requires a rigorous method to quantify and evaluate persistence in returns (see Géhin, 2005).

Generally, we distinguish between two types of return persistence: relative return persistence (RRP) and pure return persistence (PRP).Footnote 1 To analyse RRP, funds are classified as winners or losers by comparing their returns with the median return in a given period. Evidence of return persistence is observed when winners and/or losers remain the same between periods. While the cross-product ratio test (see Agarwal and Naik, 2000), the χ² test (see Park and Staum, 1998), the rank information coefficient test (see Herzberg and Mozes, 2003), the Spearman rank correlation test (see Park and Staum, 1998) and the regression-based method (see Brown et al, 1999) are the most-used two-period approaches, the Kolmogorov–Smirnov test employs a multi-period framework and can be considered the most powerful method of testing RRP (see Agarwal and Naik, 2000; Teo et al, 2003; Eling, 2009). In contrast to RRP methods, which indicate persistence of outperformance or underperformance relative to other funds, a PRP test is conducted fund by fund and allows identification of the funds exhibiting the strongest persistence of their positive returns.

De Souza and Gokcan (2004), henceforth DSG, propose examining PRP based on the Hurst exponent H because it is a natural indicator of long memory and its fractal estimators are typically superior to more conventional approaches of measuring long memory, such as autocorrelations, variance ratios and spectral decompositions, in terms of sensitivity to highly non-Gaussian series, convergence and non-periodic cycles (see Campbell et al, 1997).Footnote 2 H<0.5 suggests anti-persistence, that is, that positive returns tend to succeed negative returns and vice versa, H=0.5 implies random succession of returns and H>0.5 indicates persistence, that is, that returns tend to be followed by returns of the same sign (see Kantelhardt, 2009). While for high H observations in the remote past are nontrivially correlated with observations in the distant future, no information is given on whether positive or negative returns persist. Thus, DSG suggest a fund selection procedure that combines a high Hurst exponent with a low negative return ratio D because this mix indicates the persistence of positive returns. In an exemplary performance analysis of 314 hedge funds in the period from 1997 to 2002, they show that this procedure can identify funds that significantly outperform the remaining funds out-of-sample and thus can be regarded as a particularly interesting approach for hedge fund selection.Footnote 3

Despite its high theoretical and empirical appeal, the DSG approach has received little attention in the literature. This can be attributed to several issues. First, the accuracy of the presented empirical results is limited because Hurst exponent approaches are known to require larger time series to be precise (see Amvella et al, 2010). Second, an extension to a larger number of funds (extracted from one of the standard hedge fund databases) would make the presented evidence more convincing (see Capocci and Hübner, 2004). Finally, we have to consider studies showing that the choice of fractal estimator for the Hurst exponent can crucially influence decision-making (see Taqqu et al, 1995; Clegg, 2006; Fernandez, 2011). Thus, while DSG rely only on the classic rescaled range analysis (RRA) of Hurst (1951) and Mandelbrot (1971), using alternative estimators would give important indications on the robustness of results.

In this article, we address these drawbacks and revisit the DSG approach in a wider setting. On the one hand, we use a richer sample (in the time series as well as in the cross-sectional dimension) of hedge funds obtained from the Center for International Securities and Derivatives Markets (CISDM) database (as used in Auer and Schuhmacher, 2013a, 2013b). On the other hand, we take a detailed look at the sensitivity of results to alternative fractal estimators. In addition to using the standard RRA, we employ three alternative estimators that have received considerable attention in recent years. Specifically, we implement the detrended fluctuation analysis (DFA) of Moreira et al (1994) and Peng et al (1994), the periodogram regression method (PRM) of Geweke and Porter-Hudak (1983) and the average wavelet coefficient method (AWCM) of Simonsen et al (1998). While RRA and DFA have become state-of-the-art techniques not only for measuring long memory in economic time series but also in the related areas of market crash prediction (see Grech and Mazur, 2004), investment strategy development (see Batten et al, 2013) and the evaluation of market efficiency (see Eom et al, 2008), the PRM and the AWCM are less widespread but have proven particularly valuable in the analysis of stock and commodity return dynamics (see Crato and de Lima, 1994; Simonsen, 2003) and order sign forecasting (see Lillo and Farmer, 2004). The primary goal of our study is to evaluate the ability of the Hurst exponent to identify superior hedge funds in-sample and out-of-sample. With respect to the choice of fractal estimator, we are particularly interested in whether it has a crucial influence on the fund selection outcome.

The remainder of the article is organised as follows. The next section provides a brief description of our hedge fund data set. The subsequent section discusses our four fractal estimation procedures for the Hurst exponent. The section after that presents the empirical results and the outcomes of several robustness checks. The final section concludes.

DATA

The hedge fund return data used in this study are obtained from the CISDM hedge fund database that has been the subject of many academic studies.Footnote 4 Even though it covers 6854 hedge funds from January 1972 through November 2009, we concentrate on the years 1994–2009 for three reasons.Footnote 5 First, because hedge fund data before 1994 is considered unreliable (see Fung and Hsieh, 2000; Liang, 2000; Li and Kazemi, 2007), the exclusion of this time frame limits the data flaws that plague much of the older hedge fund research (see Capocci and Hübner, 2004; Liang and Park, 2007; Eling, 2009). Second, unlike other studies (see Brown et al, 1999; Amenc et al, 2003; Baquero et al, 2005), our sample contains bullish and bearish markets such that our results are not biased by a concentration on a specific market phase (see Capocci et al, 2005; Ding and Shawky, 2007). Third, it is important not to use excessively long time periods in the analysis of hedge fund return persistence because, in this context, persistence is associated mostly with the special skill of the fund manager and hedge fund managers usually do not work with the same hedge fund for more than a decade (see Boyson, 2008).

While the time-series dimension of our sample should not be too big because of the aforementioned reasons, we have to consider that our fractal estimators of the Hurst exponent require sufficient sample sizes in order to be precise (see Chamoli et al, 2007). Thus, to balance both requirements, only funds with at least 120 monthly return observations are admitted to our sample. This leads to a final cross-sectional sample size of 1493 funds. Furthermore, we have to take into account that Hurst exponent estimates must be obtained based on continuously compounded returns (see Peters, 1992). Thus, we convert the simple returns from the CISDM database to log-returns.
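The conversion to continuously compounded returns can be sketched in a few lines of Python. This is a minimal illustration, assuming that simple returns are stored as decimals (0.02 for 2 per cent); the function name is hypothetical and not taken from the study.

```python
import math


def to_log_returns(simple_returns):
    """Map simple returns R_t (decimals) to log-returns r_t = ln(1 + R_t)."""
    return [math.log1p(r) for r in simple_returns]
```

A convenient property of this transformation is that log-returns add up over time, so the partial sums used by the fractal estimators correspond to cumulative log performance.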

Table 1 presents some descriptive statistics for our sample. It shows the minimum, maximum, mean and standard deviation of the first four moments of the return distributions (mean, standard deviation, skewness and kurtosis). As we can see, the hedge funds in the data set show positive returns of 0.75 per cent on average with an average standard deviation of 4.25 per cent.Footnote 6 On average, returns are skewed to the left and show heavier tails than the normal distribution. However, this is not a concern for our subsequent analysis because fractal estimators can be applied even to highly non-Gaussian time series (see Mandelbrot and Wallis, 1969; Barunik and Kristoufek, 2010; Kristoufek, 2012).

Table 1 Descriptive statistics

Even though our sample selection process eliminates periods with unreliable return data, one might argue that the data still suffers from classic survivorship and backfilling biases (see Ackermann et al, 1999; Fung and Hsieh, 2000). These biases influence studies evaluating the performance of the overall hedge fund industry because they lead to an overstatement of the industry’s performance. In our study, however, the aim is different. We take the perspective of an investor with the goals to (i) identify the best funds listed in the CISDM database by means of a Hurst exponent approach and to (ii) invest in the resulting selection. This kind of fund-picking is naturally only meaningful when conducted among funds still active at the time the investment decision is made. Furthermore, for this investor, the CISDM database is the only source of return information and he cannot detect whether a fund backfilled data after the incubation period or not. As a consequence, our research question and our results are conditional upon the CISDM data. That is, we answer the question of whether investors can use the Hurst coefficient obtained based on the CISDM data to identify the best hedge funds.

METHODOLOGY

Rescaled range analysis

The RRA of Hurst (1951) and Mandelbrot (1971) is probably the best-known method to estimate the self-similarity parameter or Hurst exponent H. Given a series of continuously compounded returns $\{r_1, \ldots, r_T\}$, RRA is basically performed in two steps (see Sánchez Granero et al, 2008; Souza et al, 2008).Footnote 7 We begin the first step by dividing the given time series into d sub-series of length n. Next, for each sub-series m=1, …, d, we estimate the mean $\mu_m$ and standard deviation $\sigma_m$ by means of maximum likelihood and use the resulting values to obtain the rescaled range

$$(R/S)_m = \frac{1}{\sigma_m}\left[\max_{1\le i\le n}\sum_{j=1}^{i}\left(r_{j,m}-\mu_m\right) - \min_{1\le i\le n}\sum_{j=1}^{i}\left(r_{j,m}-\mu_m\right)\right],$$

where the first (second) term in brackets is the maximum (minimum) over i of the partial sums of the first i deviations of $r_{j,m}$ from the sub-series mean $\mu_m$. Finally, we calculate the average $(R/S)_n$ of the rescaled range over all sub-series of length n.

The second step makes use of the fact that the statistic $(R/S)_n$ asymptotically follows the relation $(R/S)_n \sim c\,n^{H}$, where c is a constant and H is the Hurst exponent (see Mandelbrot, 1975). Thus, we can obtain H by running a linear regression over a sample of increasing time horizons, that is, $\log (R/S)_n = \log c + H \log n + \varepsilon$, where ε is the residual of the regression.
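The two steps can be sketched in Python as follows. This is an illustrative sketch only, not the implementation used in the study: the function names are hypothetical, every window length from n_min up to half the sample is used (an assumption), observations that do not fill a complete sub-series are dropped, and no small-sample correction of the slope is applied.

```python
import math
import random


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def rescaled_range(sub):
    """Rescaled range of one sub-series (ML, i.e. divide-by-n, standard deviation)."""
    n = len(sub)
    mu = sum(sub) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in sub) / n)
    partial, sums = 0.0, []
    for r in sub:  # partial sums of deviations from the sub-series mean
        partial += r - mu
        sums.append(partial)
    return (max(sums) - min(sums)) / sigma


def hurst_rs(returns, n_min=12):
    """Uncorrected R/S estimate of H: slope of log (R/S)_n on log n."""
    T = len(returns)
    log_n, log_rs = [], []
    for n in range(n_min, T // 2 + 1):
        d = T // n  # number of non-overlapping sub-series of length n
        rs = [rescaled_range(returns[m * n:(m + 1) * n]) for m in range(d)]
        log_n.append(math.log(n))
        log_rs.append(math.log(sum(rs) / d))
    return ols_slope(log_n, log_rs)


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(480)]
    # White noise has no long memory; in short samples the uncorrected
    # R/S slope tends to lie somewhat above 0.5.
    print(round(hurst_rs(noise), 2))
```

The upward bias for short windows is the reason why a slope correction is commonly applied to the R/S regression in small samples.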

Detrended fluctuation analysis

Our second method to estimate H is the DFA proposed by Moreira et al (1994) and Peng et al (1994). Similar to the RRA, the DFA is performed in two steps (see Grau-Carles, 2000; Grech and Mazur, 2004). In the first step, we divide the time series into d sub-series of length n. Next, for each sub-series m=1, …, d, we create a cumulative time series $Y_{i,m}=\sum_{j=1}^{i} r_{j,m}$ for i=1, …, n, fit a least squares line $Y_{i,m}=a_m+b_m i+\varepsilon$ to $\{Y_{1,m}, \ldots, Y_{n,m}\}$Footnote 8 and calculate the root mean square fluctuation

$$F_m=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i,m}-a_m-b_m i\right)^{2}}$$

of the integrated and detrended time series. Finally, we calculate the mean value $\bar{F}_n$ of the root mean square fluctuation for all sub-series of length n. In the second step, a power-law behaviour $\bar{F}_n \sim c\,n^{H}$ is expected (see Peng et al, 1994; Taqqu et al, 1995), from which H can be extracted by a log-log linear fit similar to RRA.
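A compact Python sketch of these two steps is given below. It is meant as an illustration under stated assumptions rather than as the study's code: the function names are hypothetical, a linear trend is removed in each sub-series, all window lengths between n_min and half the sample are used, and incomplete sub-series at the end of the sample are discarded.

```python
import math
import random


def ols_fit(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def dfa_fluctuation(sub):
    """Root mean square fluctuation of the integrated, linearly detrended sub-series."""
    n = len(sub)
    total, y = 0.0, []
    for r in sub:  # cumulative series Y_1, ..., Y_n
        total += r
        y.append(total)
    idx = list(range(1, n + 1))
    a, b = ols_fit(idx, y)
    return math.sqrt(sum((yi - a - b * i) ** 2 for i, yi in zip(idx, y)) / n)


def hurst_dfa(returns, n_min=12):
    """DFA estimate of H: slope of log mean fluctuation on log n."""
    T = len(returns)
    log_n, log_f = [], []
    for n in range(n_min, T // 2 + 1):
        d = T // n  # number of non-overlapping sub-series of length n
        f = [dfa_fluctuation(returns[m * n:(m + 1) * n]) for m in range(d)]
        log_n.append(math.log(n))
        log_f.append(math.log(sum(f) / d))
    return ols_fit(log_n, log_f)[1]


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(480)]
    print(round(hurst_dfa(noise), 2))  # uncorrelated returns: expect a value near 0.5
```

Because the trend is removed within each window, a sub-series with constant returns produces a fluctuation of zero, which illustrates that DFA ignores purely linear drift in the cumulative series.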

Periodogram regression method

A semi-parametric procedure to obtain the Hurst exponent has been developed by Geweke and Porter-Hudak (1983). Its basic idea is to estimate the differencing parameter κ of a general fractionally integrated model. Since the spectral density function of such a model is identical to that of a fractional Gaussian noise with Hurst exponent H=κ+0.5, the PRM can be used to estimate H. The estimation procedure involves the following calculations (see Weron, 2002; Granger and Hyung, 2004). We begin with calculating the periodogram, which is a sample analogue of the spectral density (see Stoica and Moses, 2005). For a time series $\{r_1, \ldots, r_T\}$, it is defined as

$$I(\omega_k)=\frac{1}{T}\left|\sum_{t=1}^{T} r_t\, e^{-2\pi \mathrm{i}\,(t-1)\,\omega_k}\right|^{2},$$

where $\mathrm{i}$ is the imaginary unit, $\omega_k=k/T$, k=1, …, ⌊T/2⌋, and ⌊x⌋ denotes the largest integer less than or equal to x. The next and final step is to run a linear regression at low Fourier frequencies $\omega_k$, k=1, …, K⩽⌊T/2⌋. H is then obtained by plugging the estimate of κ into H=κ+0.5.
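The procedure can be illustrated with the following Python sketch. It is not the study's implementation: the function names are hypothetical, and the regression uses the textbook Geweke and Porter-Hudak form, in which the log-periodogram is regressed on log(4 sin²(λ/2)) with angular frequency λ = 2πk/T and the slope equals −κ. The frequency convention and the default bandwidth K=⌊T^0.35⌋ are assumptions made for this example.

```python
import cmath
import math
import random


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def periodogram(x, k):
    """Periodogram ordinate at the Fourier frequency k/T (direct summation)."""
    T = len(x)
    s = sum(xt * cmath.exp(-2j * math.pi * t * k / T) for t, xt in enumerate(x))
    return abs(s) ** 2 / T


def hurst_gph(returns, v=0.35):
    """Log-periodogram regression over the K = floor(T**v) lowest frequencies."""
    T = len(returns)
    K = int(math.floor(T ** v))
    x, y = [], []
    for k in range(1, K + 1):
        lam = 2.0 * math.pi * k / T  # angular frequency (assumed convention)
        x.append(math.log(4.0 * math.sin(lam / 2.0) ** 2))
        y.append(math.log(periodogram(returns, k)))
    kappa = -ols_slope(x, y)  # regression slope equals minus the differencing parameter
    return kappa + 0.5


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
    print(round(hurst_gph(noise, v=0.5), 2))  # uncorrelated returns: expect a value near 0.5
```

With roughly 100 monthly observations the default bandwidth leaves only a handful of frequencies in the regression, so individual estimates from this method can be expected to be noisy.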

Average wavelet coefficient method

Our last estimation procedure is the AWCM of Simonsen et al (1998). This method utilises the wavelet transformation in order to measure the Hurst exponent and is conducted as follows (see Simonsen, 2003; Chamoli et al, 2007). First, the given time series is transformed into the wavelet domain. That is, a filter ψ(.) is passed over the series $\{r_1, \ldots, r_T\}$. The width of the filter is usually increased by powers of two, giving a finite number of filtered signals

$$W_{a,b}(r,\psi)=\frac{1}{\sqrt{a}}\sum_{t=1}^{T} r_t\,\psi\!\left(\frac{t-b}{a}\right),$$

where ψ(.) is called the mother wavelet function, a is the scale factor and b is the translation of the origin. The variable 1/a gives the frequency scale and b gives the temporal location of an event. Thus, $W_{a,b}(r,\psi)$ can be interpreted as the ‘energy’ of r of scale a at t=b (see Percival and Walden, 2000; Weron, 2006; Serinaldi, 2010). The AWCM consists of finding a representative ‘energy’ or amplitude for a given scale a. This is usually done by taking the arithmetic average of $|W_{a,b}(r,\psi)|$ over all location parameters b corresponding to the same scale a. This yields the spectrum $W_a(r,\psi)$ that depends only on the scale. If $r_t$ is a self-affine process characterised by the exponent H, this spectrum should scale as $W_a(r,\psi)\sim a^{H+0.5}$ (see Simonsen et al, 1998). Thus, we can estimate the slope of a double-log regression of $W_a(r,\psi)$ on a and obtain H by subtracting 0.5 from the slope coefficient.
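The averaging and regression steps can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the study: it uses the Haar wavelet as mother wavelet, dyadic scales from 2 up to one eighth of the sample, all admissible shifts b, and hypothetical function names. The function expects the series that is assumed to be self-affine; in the demonstration this is a random walk (cumulated white noise), for which an exponent near 0.5 is expected.

```python
import math
import random


def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def haar_coefficients(x, a):
    """Haar coefficients at even scale a for every admissible shift b."""
    half = a // 2
    prefix = [0.0]
    for v in x:  # prefix sums give block sums in constant time
        prefix.append(prefix[-1] + v)
    coeffs = []
    for b in range(len(x) - a + 1):
        left = prefix[b + half] - prefix[b]       # wavelet equals +1 on the first half
        right = prefix[b + a] - prefix[b + half]  # wavelet equals -1 on the second half
        coeffs.append((left - right) / math.sqrt(a))
    return coeffs


def hurst_awc(series, max_scale=None):
    """Slope of log average |W| on log scale, minus 0.5."""
    max_scale = max_scale or len(series) // 8
    log_a, log_w = [], []
    a = 2
    while a <= max_scale:
        w = haar_coefficients(series, a)
        log_a.append(math.log(a))
        log_w.append(math.log(sum(abs(c) for c in w) / len(w)))
        a *= 2
    return ols_slope(log_a, log_w) - 0.5


if __name__ == "__main__":
    random.seed(0)
    walk, level = [], 0.0
    for _ in range(2048):
        level += random.gauss(0.0, 1.0)
        walk.append(level)
    print(round(hurst_awc(walk), 2))  # random walk: expect a value near 0.5
```

Averaging the absolute coefficients over all shifts makes the spectrum insensitive to where in the sample a large move occurs, which is the main practical appeal of the method.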

EMPIRICAL ANALYSIS

Basic setting

For the estimation of the Hurst exponent, we use the fractal estimators of the section ‘Methodology’ with the following parameterisations. We set the minimum sub-sample size in RRA and DFA to $n_{\min}=12$ months. To avoid problems with small n, we apply the Peters (1994) correction to adjust the regression slope in the RRA. In line with the recommendation of Diebold and Rudebusch (1989) and Weron (2002), we choose K in the PRM such that $\lfloor N^{0.2}\rfloor \le K \le \lfloor N^{0.5}\rfloor$, that is, $K=\lfloor N^{v}\rfloor$ with v=0.35. Finally, we follow the AWCM specification of Weron et al (2004).

To select funds based on Hurst exponent estimates, we use the following procedure. In a first step, we estimate the Hurst exponent based on the in-sample return observations $t=1, \ldots, T-T_{oos}+1$, where $T_{oos}$ is the number of returns that characterise our out-of-sample period. As some authors regard a number of 24 monthly returns as a minimum requirement for evaluating hedge fund performance (see Ackermann et al, 1999; Gregoriou, 2002; Capocci and Hübner, 2004; Liang and Park, 2007), we set our out-of-sample phase to this length. In the second step, we calculate the in-sample D ratio, that is, the ratio of negative returns to the total number of returns, and select the funds with the highest H and the lowest D. This is because we are interested in the persistence of positive returns. Finally, we evaluate the average performance of the resulting fund selection by means of its Sharpe ratios in-sample and out-of-sample.Footnote 9
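The ingredients of this procedure can be sketched in Python. The sketch rests on assumptions that are not taken from the study: the sample is split into non-overlapping in-sample and out-of-sample parts, the Sharpe ratio is computed from monthly returns without annualisation, with a zero risk-free rate by default and the sample standard deviation, and all function names are hypothetical.

```python
import statistics


def split_sample(returns, t_oos=24):
    """Split one fund's return history into in-sample and out-of-sample parts."""
    return returns[:-t_oos], returns[-t_oos:]


def d_ratio(returns):
    """Share of negative returns in the total number of returns."""
    return sum(1 for r in returns if r < 0) / len(returns)


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)
```

For a fund with 120 monthly returns, split_sample leaves 96 observations for estimating H and D and reserves the final 24 for the out-of-sample Sharpe ratio.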

Main results

Figure 1 shows the distributions of the hedge fund Hurst exponents resulting from our four fractal estimators.Footnote 10 As the value of H for a random sequence will almost always deviate from 0.5 when the study sample is limited, we need some kind of confidence level that allows us to interpret this figure, that is, to judge the number of persistent funds. In this respect, we follow the recent empirical study of Hull and McGroarty (2014).Footnote 11 They consider a Hurst exponent greater (lower) than 0.65 (0.40) to be strong evidence for persistence (anti-persistence). Realisations between 0.40 and 0.65 are interpreted as random behaviour.

Figure 1 Distributions of hedge fund Hurst exponents. This figure shows the distributions of the Hurst exponents estimated for our hedge fund sample using the RRA, the DFA, the PRM and the AWCM in the specifications outlined in the section ‘Basic setting’.

With these boundaries in mind, Figure 1 provides the first important result that, even though we analyse a sample of long-lived hedge funds, only a small proportion of them shows very high persistence. The majority of funds is characterised by random return behaviour and thus their good performance may just have been luck (see Fama and French, 2010). Interestingly, this holds irrespective of the choice of fractal estimator.

To obtain a more detailed picture on this issue, Panel A of Table 2 reports the proportions of funds falling into the three groups anti-persistent (H<0.4), random (0.4⩽H⩽0.65) and persistent (H>0.65). For all estimators, we find the highest proportion of funds in the random group, that is, 59 per cent for the RRA, 38 per cent for the DFA, 40 per cent for the PRM and 41 per cent for the AWCM. Only 18, 30, 30 and 34 per cent are classified as persistent by RRA, DFA, PRM and AWCM, respectively. Interestingly, the RRA identifies a somewhat lower number of persistent funds than the three alternative approaches.

Table 2 Main results

Panels B and C of Table 2 describe the most important outcome of our study. They show the average in-sample and out-of-sample Sharpe ratios for the funds in the different long memory groups. Furthermore, they subdivide the persistent group into funds with high (above the group median) and low (below the median) D ratios.
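The grouping behind these panels can be sketched in Python as follows. The boundaries of 0.40 and 0.65 are those stated in the text; the function names, the input layout and the treatment of funds whose D ratio equals the group median (assigned to the high-D group here) are assumptions made for illustration.

```python
import statistics


def classify(h, lower=0.40, upper=0.65):
    """Assign a Hurst exponent estimate to one of the three long memory groups."""
    if h < lower:
        return "anti-persistent"
    if h > upper:
        return "persistent"
    return "random"


def split_persistent_by_d(funds, lower=0.40, upper=0.65):
    """Split persistent funds at the group median of the in-sample D ratio.

    funds maps a fund name to a pair (H estimate, in-sample D ratio).
    """
    persistent = {
        name: d for name, (h, d) in funds.items()
        if classify(h, lower, upper) == "persistent"
    }
    median_d = statistics.median(persistent.values())
    low = sorted(name for name, d in persistent.items() if d < median_d)
    high = sorted(name for name, d in persistent.items() if d >= median_d)  # ties: assumption
    return low, high
```

The low-D list returned by split_persistent_by_d corresponds to the funds with persistent positive returns, whose average in-sample and out-of-sample Sharpe ratios are then compared with those of the other groups.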

Looking at the in-sample results first, we find that the persistent funds show the highest Sharpe ratios, followed by the random funds and the anti-persistent funds. In-sample, even the two latter groups show rather high Sharpe ratios on average. However, this is not surprising because of the overall positive performance of our fund sample (see Table 1). Another straightforward result is that a focus on persistent funds with low D ratios yields higher average Sharpe ratios because a low D ratio automatically implies a higher Sharpe ratio. Persistent funds with low D ratios are characterised by an average Sharpe ratio of 0.88, 0.67, 0.73 and 0.73 for the RRA, DFA, PRM and AWCM, respectively.Footnote 12 In contrast, persistent funds with high D ratios earn average Sharpe ratios of only 0.28, 0.22, 0.24 and 0.26, respectively.

Turning to the out-of-sample results, we can observe that anti-persistent and random funds do not appear to repeat their good performance because they show low out-of-sample Sharpe ratios that do not crucially differ from 0. A similar argument holds for the persistent funds with high D ratios. However, funds with persistent positive returns show not only high average Sharpe ratios in-sample but also out-of-sample. We find average realised Sharpe ratios of 0.54, 0.26, 0.20 and 0.27 for our four approaches, respectively. Even though the Sharpe ratios are somewhat higher in the case of the RRA, all four Hurst exponent approaches appear to have considerable potential for effective hedge fund selection.Footnote 13

Robustness checks

As our main results are subject to issues of arbitrary choice, this section describes a variety of supplementary calculations that verify the robustness of our results.Footnote 14

Influence of outliers

To limit the influence of outliers (either resulting from potentially faulty data or crisis impacts) on our results, we follow Bali et al (2011) and perform a 95 per cent winsorization. That is, we set all data below (above) the 5th (95th) percentile to the 5th (95th) percentile. Table 3 reports the resulting average out-of-sample Sharpe ratios for this robustness check. Even though the Sharpe ratios are lower, our main conclusions of the section ‘Main results’ still hold.
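A minimal Python sketch of such a winsorization is shown below. It assumes that percentiles are obtained by linear interpolation between order statistics and that the procedure is applied to one list of returns at a time; whether the study winsorizes per fund or across funds is not specified here, and the function names are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Percentile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def winsorize(values, lower=0.05, upper=0.95):
    """Set observations beyond the lower/upper percentile to that percentile."""
    ordered = sorted(values)
    lo_val, hi_val = percentile(ordered, lower), percentile(ordered, upper)
    return [min(max(v, lo_val), hi_val) for v in values]
```

Unlike trimming, this keeps the number of observations unchanged, which matters here because the fractal estimators require an unbroken time series.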

Table 3 Robustness checks numbers 1–5

Return filters

Several articles have shown that RRA can be sensitive to short memory and heteroskedasticity (see Lo, 1991) and PRM is not always unbiased and not generally consistent in the non-stationary case (see Velasco, 1999; Andrews and Guggenberger, 2003).Footnote 15 Thus, recent literature suggests checking the robustness of results by means of ARMA-GARCH filters that eliminate short-range dependencies and heteroskedasticity from the data (see Cajueiro and Tabak, 2004; Batten et al, 2013).Footnote 16 Fitting a simple ARMA(1,1)-GARCH(1,1) model and obtaining H for the standardised residuals of the fitted model yields the average out-of-sample Sharpe ratios presented in Table 3.Footnote 17 With the exception of the PRM, previous implications on the out-of-sample performance of the Hurst-based selection are confirmed.Footnote 18

This filter exercise not only addresses the potential problems with our estimators but also implicitly rules out a possible explanation for observable return persistence. Henn and Meier (2004) and Kosowski et al (2007) argue that funds with smoothed monthly returns (either because of holding illiquid securities or managed returns) might show a high level of performance persistence that does not reflect manager skill. As artificial smoothing of returns becomes visible in the form of short-term autocorrelation (see Getmansky et al, 2004) and the filters eliminate this correlation, our identified persistence does not appear to be crucially driven by return smoothing. Furthermore, note that the high out-of-sample performance is a strong indication that return persistence actually comes from skill and not from return smoothing.

Hurst significance boundaries

So far, we have interpreted a fund to show significant signs of persistence (anti-persistence) for H>0.65 (H<0.40). As these boundaries may still lead to misleading conclusions (see Weron, 2002), we use more conservative values, that is, a lower bound of 0.25 and upper bound of 0.85. As we can see in Table 3, our conclusions remain unchanged. However, we can now observe even higher out-of-sample Sharpe ratios for the funds with persistent positive returns than in the main analysis. This is quite reasonable because, with higher H, we now concentrate on the most persistent funds.

Size of out-of-sample period

In order to analyse the predictive ability for shorter and longer horizons, we perform two additional robustness checks in which we limit $T_{oos}$ to 12 months and extend it to 36 months. Again, Table 3 shows qualitatively similar outcomes. For the RRA, we find a slight tendency of decreasing Sharpe ratio performance (of positively persistent funds) with increasing forecasting horizon.

Basic parameterisations

Finally, Table 4 varies some of the parameter settings in our estimations. First, we use a smaller minimum sub-sample size $n_{\min}=6$ to increase the number of regression observations in RRA and DFA. Second, we employ lower and higher cut-off values v=0.3 and v=0.4 for the PRM. The results are similar either way.

Table 4 Robustness checks numbers 6–9

CONCLUSION

Motivated by the need for adequate procedures to identify hedge funds with positively persistent returns (see Géhin, 2005) and the increasing popularity of Hurst exponent approaches to measure long memory (see Kantelhardt, 2009; Batten et al, 2013), we analyse their usefulness in hedge fund selection. We find that, regardless of the estimation procedure, Hurst exponents are highly interesting predictors of future hedge fund performance. Specifically, we show that a combination of a high Hurst exponent and a low proportion of negative returns can identify the funds with the highest out-of-sample Sharpe ratios. Thus, our extended research design provides highly supportive and robust evidence for the suggestion made in earlier studies (see De Souza and Gokcan, 2004) that Hurst exponents might be useful tools for individual investors seeking the ‘best’ hedge fund and for fund managers choosing candidates for a fund of hedge funds.

Future research may extend the results of this article in several important ways. First, there are three main providers of hedge fund data (see Liang and Park, 2007) but we have used only data from one of these providers. Thus, using data from Hedge Fund Research (HFR) and Tremont Advisory Shareholders Services (TASS) might provide further support for the robustness of our results. Second, we have concentrated on return persistence. However, analysing persistence in other performance metrics, for example, Sharpe ratios or fund alphas (see Eling, 2009), may also be a fruitful endeavour. Finally, we should also take a closer look at the negative implications of our results because it is well known that long memory in returns has severe consequences for financial modelling. Long memory (i) makes optimal consumption/savings decisions extremely sensitive to the investment horizon, (ii) invalidates martingale methods in derivative pricing and (iii) distorts standard asset pricing tests (see Lo, 1991; Beran, 1992). Thus, if persistent hedge funds are part of investment decisions, financial products or asset pricing models, we might have to be careful when applying established standard methods. However, the question of how strongly the observed levels of long memory in hedge fund returns would affect these methods still has to be answered.