1 Introduction

Pairs trading is the practice of taking simultaneous short and long positions in two similar securities (pairs) when their price spread exceeds a specified threshold. It is a common investment strategy: a Google News search of “pairs trade” on August 3, 2016 yields 154,000 articles, many recommending specific pair trades. While there has surely been analysis of alternative pairs-trading strategies and methods by investors that use them, such work is largely proprietary. The academic literature on this issue is small. The best-known work is Gatev et al. (2006). They find that, for one technique (the “distance approach”) with a specific parameterization, trading pairs of US equities produces excess returns of over 10% annually. Study of a second technique (the “cointegration approach”) has reached similar broad conclusions, although studies that use this technique are fewer and more specialized. The purpose of this paper is twofold. First, we study both approaches and compare their relative strengths and weaknesses for pairs trading using data from 1980 through 2014. Second, for both techniques, we do not limit ourselves to a specific parameterization of the pairs matching and trading algorithm, in contrast to much of the existing literature, but instead explore a wide range of possible parameterizations. This provides insight into the design of an optimal pairs-trading system.

There are three broad dimensions in the design of a pairs-trading system that we explore. The first is the pairs matching algorithm. The literature focuses on two main approaches. The most studied from the perspective of profitability of pairs trading is the distance approach, a simple method that selects pairs for trading based on the historical closeness of their (normalized) price paths. This approach is reportedly (e.g., Gatev et al. 2006) commonly used by industry practitioners. The second approach, the cointegration approach, has a stronger theoretical foundation (i.e., a good pair will have cointegrated prices and their spread will be a mean-reverting process), but is much more complicated to implement (see footnote 1). There is no comprehensive analysis of pairs trading using both the distance and cointegration approaches. This paper fills that gap.

The second aspect of designing pairs-trading systems is specifying values for a series of parameters that affect both the pairs selection stage and the precise conditions under which pairs trades are executed. The literature focuses mainly on one such parameterization due to Gatev et al. (2006): pairs are chosen based on 12 months of historical data (the “formation period” length); pair trades are executed when price spreads diverge by two standard deviations (computed from the 12-month formation period); and a trade is closed when prices converge or after 6 months if prices have not converged (the “trading period” length). Interestingly, this specific parameterization has become standard in analyses of pairs trading using both approaches, but it appears to lack rigorous justification. As Gatev et al. (2006, 803) state: “We form pairs over a 12-months period (formation period) and trade them in the next 6-months period (trading period). Both 12 and 6 months are chosen arbitrarily and have remained our horizons since the beginning of the study” (emphasis added). The two-standard-deviation value for the pair trade trigger is similarly chosen arbitrarily.

In this study, we measure the sensitivity of return distributions from pairs trading to these three parameters for each of the two main approaches to pairs trading. This analysis requires the examination of billions of possible pairs for each of the two approaches. Our broad conclusion is that the parameterization can matter greatly for the return distribution for pairs trading, and the best parameterization is different for the two approaches, in different sub-periods, and across different sized portfolios. Taken together, return distributions are generally substantially more favorable under the optimal parameterization than with the standard parameterization. Of course, one should be mindful in such analyses of data-snooping bias, but our results, nevertheless, suggest that more consideration should be given to parameterization of pairs-trading methods.

One further aspect of pairs trading we consider is the sensitivity of pairs-trading profitability to the number of pairs that are tracked at any given point in time, which, therefore, sets an upper bound on the number of pairs trades that can be open at any time. While there is little direct evidence on how many pairs practitioners monitor, the academic literature considers this issue to some extent. We explore this aspect of designing a pairs-trading system more thoroughly: in all of the analyses described above, we study systems that monitor the best 5, 10, 20, 50, 100, and 200 pairs.

In general, we find that, ignoring transaction costs, the returns from both approaches are very large in the 1980s, 1990s, and for 2000–2014. However, once bid-ask spreads are taken into account, the cointegration approach does not produce positive excess returns in the post-1980s period, whereas the distance approach produces statistically significant positive risk-adjusted returns in all sub-periods. Once other transaction costs (commissions and market impact costs) are properly taken into account, the returns from the distance approach largely disappear in recent years, save perhaps periods of extreme market turmoil, such as the 2007 financial crisis, although even then they do not appear large.

The paper proceeds as follows. Section 2 discusses the literature. Section 3 describes pairs-trading methods, and Sect. 4 discusses parameterization of pairs-trading systems. Section 5 presents the results, and Sect. 6 concludes and suggests several extensions to explore in future research.

2 Literature

Reportedly, pairs trading originated in the mid-1980s with the quantitative group at Morgan Stanley (Bookstaber 2006). Although pairs trading is reportedly a common strategy at hedge funds and proprietary-trading operations, the academic literature is limited.

The best-known study of pairs trading is by Gatev et al. (2006), who define and use the distance approach. The authors use daily US stock prices for 1962–2002, a 12-month formation period, a 6-month trading period, and a fixed trading threshold (a two-standard-deviation divergence in normalized prices). Their study finds that pairs trading generates a substantial excess return and a monthly Sharpe ratio six times larger than that of the overall market. They also show that their pairs-trading strategy has a high risk-adjusted Jensen alpha, has low exposure to several systematic risk factors, covers reasonable transaction costs, and is not simply capturing short-term return reversals (e.g., Lehmann 1990). They also find that the profitability of pairs trading has declined over time, which they attribute to the strategy becoming more prevalent. Restricting pairs to industries does not raise profitability relative to the unrestricted strategy.

Do and Faff (2010, 2012) extend the sample period of Gatev et al. (2006) from 2002 through mid-2009. They find that the profitability of pairs trading has shown a downward trend, as Gatev et al. (2006) also note, although profitability is markedly higher in the bear market of the early 2000s and in the portion of the global financial crisis included in their sample. They attribute the higher profitability during bear markets to less market efficiency during these periods. Do and Faff (2012) argue that transaction costs significantly reduce the profitability of pairs trading.

Papadakis and Wysocki (2007) examine the impact of accounting information events on the profitability of pairs-trading strategies in US equity markets. They find that earnings announcements and analyst forecasts can cause drift in relative prices, which often triggers the opening of pair trades. However, pairs trading based on such events is less profitable than non-event-triggered pairs trading.

Engelberg et al. (2009) investigate how information and liquidity influence the profitability of pairs trading. These researchers find that profit is lower when the news is specific to only one stock in the pair, as idiosyncratic news increases divergence risk and trading-period horizon risk. They also find that trading on large and liquid pairs tends to underperform trading on smaller and less liquid pairs.

The distance approach has also been studied for non-US equity markets, for example, Broussard and Vaihekoski (2012) for Finland, Perlin (2009) for the Brazilian market, Bolgun et al. (2010) for the Istanbul stock market, Lucey and Walshe (2013) for Germany and France, and Deaves et al. (2013) for Canada. These studies typically reach the same broad conclusions as Gatev et al. (2006). Jacobs and Weber (2013) use Gatev et al.’s (2006) approach to examine whether pairs trading returns are explained by the investor-inattention hypothesis (Peng and Xiong 2006). They find that this strategy is more profitable when their measure of investor inattention is high. This is confirmed using larger firms from US markets with daily data through 2008, and also generally for eight other developed-country stock markets.

The distance approach has been used to study pairs trading in markets other than equities. These studies include Nath (2003), who studies the secondary market for US Treasury securities, and Kanemura et al.’s (2008) study of energy futures markets.

A second branch of the literature uses the cointegration approach. This technique—but little or no empirical analysis—is discussed by, for example, Vidyamurthy (2004), Gregory et al. (2011), and Herlemont (2004). Lin et al. (2006) add a minimum profit constraint and study this modification using two Australian bank stocks. Caldeira and Moura (2013) use 50 stocks from Brazil and choose (cointegrated) pairs using the top-20 Sharpe ratios for in-sample returns on pairs trades using data for 2005–2012.

Baronyan et al. (2010) examine pairs trading by comparing the distance and cointegration approaches using weekly data for the 30 stocks in the DJIA over the period 1999–2008. This study refines the pair selection method by selecting (cointegrated) pairs using criteria similar to those used in the present study and explained in Sect. 3. The authors find considerably lower returns on average than do most other studies.

Some studies consider parameterizations alternative to the standard one of Gatev et al. (2006). Baronyan et al. (2010) consider different formation period lengths, but their data are weekly for only 30 stocks, which could be a major drawback for analysis of what is typically considered as a fairly high-frequency trading strategy. Broussard and Vaihekoski’s (2012) study for Finland considers alternative values of the trigger parameter, Perlin (2009) considers the trigger parameter value in his study of 50 Brazilian stocks, and Lucey and Walshe (2013) do so in their analysis of pairs trading for a subset of stocks on the German and French exchanges.

The literature also contains some alternative techniques to the main ones discussed above. Huck (2009, 2010) develops a methodology that combines forecasting techniques with multi-criteria decision-making methods. The author ranks stocks in the S&P 100 with weekly data according to expected returns and pairs assets based on the highest over-valuations and under-valuations. Elliott et al. (2005), Tourin and Yan (2013), and Mudchanatongsuk et al. (2008) consider various technical models of the spread between two cointegrated prices, but in large part do not study these models empirically. Chen et al. (2012) use monthly data on the CRSP stocks from 1931–2010. Their pairs selection strategy is to create portfolios of 50 pairs that have the highest cross correlation of returns in the preceding 5 years.

3 Pairs-trading methodology

Pairs trading consists of two stages. The first stage is the formation period, where pairs of stocks are selected. The second stage is the trading period, where trades are opened and closed. We first discuss the formation period and, in particular, the two main techniques for selecting pairs.

3.1 Pair formation: the distance approach

The first step in the distance approach is to normalize the price of all stocks in the sample to unity at the beginning of the formation period. Let \(T_{\mathrm{fp}} \) denote the number of trading days in the formation period. The normalized price of each stock at the end of day \(t=1,2,\ldots T_{\mathrm{fp}} \), is, therefore:

$$\begin{aligned} P_t^i =\mathop \prod \limits _{\tau =1}^t \left( {1+r_\tau ^i } \right) , \end{aligned}$$
(1)

where \(r_\tau ^i \) is the stock’s daily return inclusive of dividends.

The second step is to compute the distance, \(D_{i,j} \), between two stocks i and j over the formation period:

$$\begin{aligned} D_{i,j} =\frac{{\sum }_{t=1}^{T_{\mathrm{fp}} } \left( {P_t^i -P_t^j } \right) ^{2}}{T_{\mathrm{fp}} }. \end{aligned}$$
(2)

If there are N stocks, then there are \((N\times (N-1))/2\) distances. The final step in selecting pairs with this method is to rank the pairs on the basis of the distances and create portfolios of pairs with the smallest distances (e.g., the “top 5” and “top 10” pairs).
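The three steps of the distance approach can be illustrated with a minimal Python sketch. The function names (`normalized_prices`, `distance`, `top_pairs`) and the toy return series are hypothetical and serve only as an illustration; they are not taken from the implementation used in this study.

```python
from itertools import combinations


def normalized_prices(returns):
    """Eq. (1): cumulative product of (1 + r) over the formation period."""
    prices, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        prices.append(level)
    return prices


def distance(p_i, p_j):
    """Eq. (2): mean squared difference between two normalized price paths."""
    return sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)


def top_pairs(returns_by_stock, k):
    """Rank all N(N-1)/2 pairs by distance and keep the k closest."""
    paths = {s: normalized_prices(r) for s, r in returns_by_stock.items()}
    ranked = sorted(
        (distance(paths[a], paths[b]), a, b)
        for a, b in combinations(sorted(paths), 2)
    )
    return ranked[:k]


# Hypothetical daily returns for three stocks (illustrative numbers only).
toy_returns = {
    "A": [0.010, -0.020, 0.015, 0.000],
    "B": [0.012, -0.018, 0.013, 0.001],
    "C": [-0.030, 0.040, -0.020, 0.050],
}
best = top_pairs(toy_returns, k=1)  # list of (distance, stock_i, stock_j)
```

With real data, each return series would cover all \(T_{\mathrm{fp}}\) trading days of the formation period, and `k` would be one of the portfolio sizes studied (5, 10, 20, 50, 100, or 200).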

3.2 Pair formation: the cointegration approach

A preliminary first step in the cointegration approach is determining the order of integration of each stock price using the Augmented Dickey–Fuller (ADF) test. Stocks can be potential pairs only if their order of integration is the same. The second step is to calculate the spread between two stock prices, \(\mathrm{PR}_t^{ij} =\log P_t^i -\log P_t^j \), and use the ADF to test for mean reversion of the spread. That is, estimate the regression equation:

$$\begin{aligned} \Delta \mathrm{PR}_t^{ij} =\gamma \mathrm{PR}_{t-1}^{ij} +\varepsilon _t , \end{aligned}$$
(3)

and test the null hypothesis that \(\gamma =0\). If the null hypothesis can be rejected (we use a 99% confidence level), it indicates that the spread is mean reverting. The third step is to test for cointegration using the Johansen test (see footnote 2).
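As a rough illustration of the regression in Eq. (3), the sketch below estimates \(\gamma\) by ordinary least squares without an intercept and reports its t-statistic. Several things here are assumptions rather than part of the study: the function name `df_regression`, the simulated spread, the omission of the lagged-difference terms that a full ADF test would add, and the approximate 1% critical value of about \(-2.58\) for the no-constant Dickey–Fuller case. The statistic must be compared with Dickey–Fuller critical values, not with the usual t-distribution.

```python
import math
import random


def df_regression(spread):
    """OLS of Delta PR_t on PR_{t-1} with no intercept, as in Eq. (3).

    Returns (gamma_hat, t_stat).
    """
    lagged = spread[:-1]
    delta = [b - a for a, b in zip(spread[:-1], spread[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * y for x, y in zip(lagged, delta)) / sxx
    resid = [y - gamma * x for x, y in zip(lagged, delta)]
    sigma2 = sum(e * e for e in resid) / (len(delta) - 1)
    t_stat = gamma / math.sqrt(sigma2 / sxx)
    return gamma, t_stat


# Hypothetical mean-reverting log spread: PR_t = 0.5 * PR_{t-1} + noise.
rng = random.Random(0)
spread = [0.0]
for _ in range(500):
    spread.append(0.5 * spread[-1] + rng.gauss(0.0, 0.01))

gamma_hat, t_stat = df_regression(spread)

# Assumed approximate 1% Dickey-Fuller critical value (no constant, no trend).
DF_CRITICAL_1PCT = -2.58
mean_reverting = t_stat < DF_CRITICAL_1PCT
```

In practice one would rely on a tested econometrics library for the ADF and Johansen tests; the sketch only makes the structure of Eq. (3) concrete.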

As might be expected, this procedure produces an extraordinarily large number of possible pairs, as many stock prices are cointegrated. To improve the pairs matching algorithm, further refinements are introduced. The fourth step we use is based on Granger causality. As Baronyan et al. (2010, 118) explain: “Specifically, the pairs must pass through the filter of satisfying two-way Granger causality tests. ... Obviously, two-way Granger causality is stronger than one-way Granger causality. A pair selected as such makes it less likely for the aforementioned structural breakdown [spurious cointegration] to take place before the trade is timed out at the end of the year”.

The cointegration approach does not provide any obvious metric for how to rank pairs in terms of their possible profitability for pairs trading. In contrast, with the distance approach, the measure (2) readily translates into forming pairs-trading portfolios for the “top 5” or “top 100” pairs, for example. To allow for a similar construction with the cointegration approach, we follow the literature and use a fifth step that involves specifying a metric for ranking pairs. Specifically, the metric used is the “market factor spread” (MFS) (see Herlemont 2004; Baronyan et al. 2010), where \(\hbox {MFS(i,j)}=\left| {\beta _i -\beta _j } \right| \) and \(\beta _i \) and \(\beta _j \) denote the market factors (the CAPM betas, computed over the formation period) for the pair. Pairs are ordered by their MFSs and pairs-trading portfolios are created using pairs having the lowest MFSs. The intuition behind the MFS criterion is that a pairs-trading strategy should be nearly market neutral. We acknowledge that there are clearly other possible criteria to use in place of the MFS. This is a topic for future research.
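The MFS ranking can be sketched as follows. The helper names (`capm_beta`, `rank_by_mfs`) and the toy excess-return series are hypothetical; the beta here is the simple covariance-to-variance ratio, which is one common way to estimate a CAPM beta and is assumed rather than taken from the study.

```python
from itertools import combinations


def capm_beta(stock_excess, market_excess):
    """Slope of stock excess returns on market excess returns (cov / var)."""
    n = len(market_excess)
    mean_s = sum(stock_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_excess, market_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    return cov / var


def rank_by_mfs(betas, candidate_pairs):
    """Order pairs by MFS(i, j) = |beta_i - beta_j|, smallest first."""
    return sorted(candidate_pairs,
                  key=lambda pair: abs(betas[pair[0]] - betas[pair[1]]))


# Hypothetical formation-period excess returns (illustrative numbers only).
market = [0.01, -0.02, 0.03, 0.00]
stocks = {
    "X": [2.0 * m for m in market],          # beta of about 2.0
    "Y": [1.0 * m for m in market],          # beta of about 1.0
    "Z": [1.1 * m + 0.001 for m in market],  # beta of about 1.1
}
betas = {name: capm_beta(r, market) for name, r in stocks.items()}
ranked = rank_by_mfs(betas, list(combinations(sorted(stocks), 2)))
```

Only pairs that have already passed the cointegration and Granger-causality filters would be passed in as `candidate_pairs`; the top 5, 10, and so on are then read off the front of `ranked`.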

3.3 Opening a pairs trade

After choosing pairs in the formation period, all prices are normalized to unity at the beginning of a subsequent period, the “trading period”, and the spread of normalized prices for each pair is tracked. A pair trade is opened only if the spread of normalized prices exceeds a threshold, called the “trigger”. Opening a trade involves longing $1 of the lower priced stock and shorting $1 of the higher priced one. Formally, a trade is opened when \(\left| {P_t^i -P_t^j } \right| \ge \mathrm{trigger}(i,j)\), where:

$$\begin{aligned} {\mathrm{Trigger}}(i,j)= n\times {\mathrm{stdev}}(i,j). \end{aligned}$$
(4)

Here, n is a scalar and \(\mathrm{stdev}(i,j)\) is the standard deviation of the spread of normalized prices computed from the formation period. Because prices are normalized in both approaches, we can write:

$$\begin{aligned} {\mathrm{stdev}}(i,j)=\sqrt{\frac{1}{T_{\mathrm{fp}} -1}\mathop \sum \limits _{t=1}^{T_{\mathrm{fp}} } \left[ {\left( {P_t^i -P_t^j } \right) ^{2}-D_{i,j} } \right] ^{2}}. \end{aligned}$$
(5)

The standard value in the literature for the scalar n that pins down the trigger is \(n=2\).

3.4 Closing a pairs trade

A pairs trade is closed when the normalized price spread returns to a non-positive value. If that does not happen during the trading period, and thus, a pairs trade is still open at the end of the trading period, the position is automatically closed at that time.
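The opening and closing rules of Sects. 3.3 and 3.4 can be sketched as a small state machine over one trading period. The function name `simulate_pair` and the toy price paths are hypothetical, the trigger is passed in as a number rather than computed, and two details are assumptions of this sketch: no trade is opened on the final day, and a pair is not reopened on the same day it is closed.

```python
def simulate_pair(p_i, p_j, trigger):
    """Return a list of (open_day, close_day) index pairs for one trading period.

    p_i, p_j: normalized prices of the two stocks, one value per day.
    trigger:  threshold on the absolute spread, as in Eq. (4).
    """
    trades = []
    open_day, direction = None, 0
    last = len(p_i) - 1
    for t, (a, b) in enumerate(zip(p_i, p_j)):
        spread = a - b
        if open_day is None:
            # Open: short the higher-priced stock, long the lower-priced one.
            if abs(spread) >= trigger and t < last:
                open_day = t
                direction = 1 if spread > 0 else -1
        elif direction * spread <= 0 or t == last:
            # Close on convergence (spread no longer positive in the
            # direction of the trade) or at the end of the trading period.
            trades.append((open_day, t))
            open_day = None
    return trades


# Hypothetical normalized prices over an 8-day trading period.
p_i = [1.00, 1.05, 1.08, 1.02, 0.99, 1.00, 1.07, 1.06]
p_j = [1.00] * 8
trades = simulate_pair(p_i, p_j, trigger=0.04)
```

In this toy path the first trade converges within the period, while the second is still open on the last day and is closed there.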

3.5 Calculation of return on pairs trading

We use the same method to calculate portfolio return as in Gatev et al. (2006) (and most of the subsequent literature), called the “fully-invested return” or “return on capital employed”. Specifically, for any pair k, let \(p^{k}=\{l^{k},s^{k}\}\) represent a simultaneous long and short position of $1 in the two stocks of pair k, where \(l^{k}\) and \(s^{k}\) are indicator variables that take the value 1 when a pair trade has occurred and 0 otherwise. Let \(d^{k}\) indicate the day during a specific trading period that a trade was most recently opened on pair k. Because we are using end-of-day prices (see below), \(d^{k}\in \{1,2,\ldots ,T_{\mathrm{tp}} -1\}\), where \(T_{\mathrm{tp}} \) is the number of days in a trading period. If \(R_t (l^{k})\) and \(R_t (s^{k})\), respectively, represent 1-day returns on the long and short positions of the pair, then the daily return for the pair trade is

$$\begin{aligned} R_t \left( {p^{k}} \right) =R_t \left( {l^{k}} \right) -R_t \left( {s^{k}} \right) . \end{aligned}$$
(6)

The daily return on a portfolio of \(N_t^*\) pairs is

$$\begin{aligned} R_t^{{\mathrm{port}}} =\sum _{k=1}^{N_t^*} {W_t^k R_t \left( {p^{k}} \right) } , \end{aligned}$$
(7)

where \(W_t^k =w_t^k /\sum _{j=1}^{N_t^*} {w_t^j } \) and

$$\begin{aligned} w_t^k =\left[ {1+R_{t-1} \left( {p^{k}} \right) } \right] \times \left[ {1+R_{t-2} \left( {p^{k}} \right) } \right] \times \cdots \times \left[ {1+R_{d^{k}+1} \left( {p^{k}} \right) } \right] , \end{aligned}$$
(8)

for \(t\ge d^{k}+2\) and \(w_t^k =1\) for \(t=d^{k}+1\). That is, we use the \(N_t^*\) open pairs that are held in the portfolio on day \(t\) to calculate the daily return of the portfolio, which is equal to the weighted average return of the pairs. The weight given to a pair is determined by its cumulative return relative to the sum of cumulative returns of all pairs in the portfolio. Note that because the strategy is based on a long–short position of $1, the return of the portfolio has the interpretation of excess return (i.e., the risk-free rate cancels from the two sides of the pair trade).
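Equations (6)–(8) can be made concrete with a short sketch. The names `pair_return`, `pair_weight`, and `portfolio_return`, the list-based data layout, and the toy numbers are hypothetical; returning zero when no pair is open is an assumption of the sketch.

```python
def pair_return(long_return, short_return):
    """Eq. (6): daily return on a $1 long / $1 short pair position."""
    return long_return - short_return


def pair_weight(pair_returns, open_day, t):
    """Eq. (8): cumulative gross return from day d+1 through day t-1.

    The product is empty, so the weight is 1, on day t = d + 1.
    """
    w = 1.0
    for s in range(open_day + 1, t):
        w *= 1.0 + pair_returns[s]
    return w


def portfolio_return(pairs, t):
    """Eq. (7): weighted average of pair returns over pairs open on day t.

    pairs: list of (pair_returns, open_day); pair_returns[s] is R_s(p^k).
    """
    open_pairs = [(r, d) for r, d in pairs if d < t]
    if not open_pairs:
        return 0.0  # assumption: an empty portfolio earns no excess return
    weights = [pair_weight(r, d, t) for r, d in open_pairs]
    total = sum(weights)
    return sum(w * r[t] for w, (r, _) in zip(weights, open_pairs)) / total


# Hypothetical daily pair returns, indexed by day within the trading period.
pair_1 = ([0.00, 0.02, 0.01, -0.01], 0)  # opened on day 0
pair_2 = ([0.00, 0.00, 0.03, 0.02], 1)   # opened on day 1
r_day1 = portfolio_return([pair_1, pair_2], t=1)
r_day3 = portfolio_return([pair_1, pair_2], t=3)
```

On day 1 only the first pair is open, so the portfolio return equals that pair's return; by day 3 both pairs carry weights equal to their cumulative gross returns since opening.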

4 Parameterization of the trading system

These approaches can be implemented once some parameter values are specified. The parameters in question determine the formation period length \((T_{\mathrm{fp}} )\), the trading-period length \((T_{\mathrm{tp}} )\), the number of pairs to include in the pairs-trading portfolio, and the trigger. Note that the trigger is pinned down by the parameter n and we will henceforth simply refer to n as the “trigger value”.

There is no intuitively obvious parameterization. The literature, however, focuses mainly on the single parameterization used by Gatev et al. (2006) in their analysis using the distance approach. This “standard parameterization” is a 12-month formation period, a 6-month trading period, and a trigger value \(n=2\). The optimality of this particular parameterization for the profitability of pairs-trading techniques has received little attention in the academic literature (see footnote 3). In addition, there is no obvious reason to think this parameterization should work equally well for the cointegration and distance approaches, or that it should be time invariant.

We, therefore, consider a range of parameterizations. We consider the standard 12-month formation period as well as a 9-month formation period. For the trigger value, we consider \(n\in \{0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0\}\). The trading period is set at 6 months throughout, as we find that a shorter trading period results in many pairs trades remaining open at the end of the trading period, and a longer trading period has minimal effects as pairs that converge do so by 6 months (see footnote 4). For portfolio sizes, we consider those with the top 5, 10, 20, 50, 100, and 200 pairs.

We use CRSP daily data from January 1980 through December 2014. We consider only common stocks. To avoid survivorship bias, we drop stocks from a given formation and trading period only when they have missing data during that particular formation or trading period. The number of stocks under consideration is 4258 in the 1980s, 9522 in the 1990s, and 9644 for 2000–2014. The number of possible pairs to examine, therefore, ranges from about 9 million in the 1980s to more than 45 million in each of the two later sub-periods for a given parameterization. The dimensionality of the problem, however, is much larger: with two alternative formation period lengths and seven alternative trigger values, in each month, we must examine 126,884,142 possible pairs for the 1980s, 634,612,734 possible pairs in the 1990s, and 650,979,644 possible pairs in the 2000s, for each of the distance and cointegration approaches. Furthermore, this pairs matching and trading problem is repeated monthly using a rolling window of formation and trading periods: the first formation period begins on the first trading day of January 1980, the second period begins on the first trading day of February 1980, and both formation and trading periods roll forward by 1 month throughout the sample. Thus, there are multiple overlapping portfolios in any given month and returns are averaged across these portfolios.
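The pair counts above follow from the \((N\times (N-1))/2\) formula of Sect. 3.1 multiplied by the size of the parameter grid (two formation period lengths times seven trigger values). A short check, with variable names chosen only for illustration:

```python
from math import comb

# Number of stocks per sub-period, as reported in the text.
stocks = {"1980s": 4258, "1990s": 9522, "2000-2014": 9644}

N_FORMATION_LENGTHS = 2  # 9 and 12 months
N_TRIGGER_VALUES = 7     # n in {0.3, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0}
grid_size = N_FORMATION_LENGTHS * N_TRIGGER_VALUES

pairs_per_parameterization = {p: comb(n, 2) for p, n in stocks.items()}
pairs_over_grid = {p: c * grid_size
                   for p, c in pairs_per_parameterization.items()}

for period in stocks:
    print(period, pairs_per_parameterization[period], pairs_over_grid[period])
```

The per-parameterization counts are roughly 9.1, 45.3, and 46.5 million pairs, and multiplying by the 14 grid points gives the totals quoted for each approach.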

5 Results

5.1 Profitability of pairs trading and the role of parameterization of the trading system

We begin by presenting returns from pairs trading without adjustments for transaction costs of any sort, and study to what extent the parameterization of the trading system matters for the profitability of pairs trading. A main focus later in the paper is the effect of transaction costs, but the qualitative implications we discuss here in regard to the role of parameterization are valid in all cases.

Table 1 Pairs trading with distance approach—no waiting

The obvious question to address in regard to parameterization is whether the standard parameterization used in the literature is more or less the best one, or whether there are gains in profitability to be had with different ones. We first present results for the distance approach for the three sub-periods. For each portfolio of a given number of pairs, the results presented in Table 1 correspond to the portfolio, where the trigger value and formation period are chosen optimally on an in-sample basis. The displayed results are for the case in which pairs trades are opened and closed on the same day as the trigger is breached (“no waiting”). Below, we consider an alternative, where trading is delayed 1 day after the trigger is breached (“one-day-later rule”), because the no-waiting case is arguably contaminated with some degree of bid-ask bounce. For the purposes of assessing the consequences of alternative parameterizations, the conclusions are similar in both cases.

Note three main observations from the results with the distance approach. First, both excess returns and market-risk-adjusted excess returns (Jensen alphas) are very large and statistically significant for virtually all portfolio sizes and sub-periods, reaching as high as nearly 40% annually for the portfolios with fewer pairs; moreover, most of the excess return is market neutral (see footnote 5). Second, the number of pairs matters for profitability, with portfolios with fewer pairs having the highest returns. This is intuitive: the more pairs monitored for trading, the higher will be the average distance for a pair in the portfolio measured during the formation period, and thus, the pairs will generally be of lower quality in terms of matched stock pairs. Third, there is considerable variation in the optimal parameterization, with the standard parameterization being optimal in just seven of the 18 portfolios studied. The optimal trigger is generally larger than the standard value. The optimal parameterization, however, is reasonably close to the standard parameterization for the most profitable portfolio sizes (five and ten pairs).

Table 2 Pairs trading with cointegration approach—no waiting

The results for the cointegration approach (Table 2) are broadly similar to those for the distance approach, with a few exceptions: (1) returns are somewhat lower than with the distance approach, but still large; (2) the standard parameterization is less often optimal than with the distance approach (it is optimal in just two of 18 cases); and (3) the optimal trigger value is generally lower than the standard value. Note that the standard value of the trigger is attributed to Gatev et al. (2006), who study only the distance approach, but this value has been adopted in the cointegration literature also. Our results suggest that, for both approaches, more thought may need to be given before simply assuming a trigger value of \(n=2\) as best practice for actually trading pairs or for researching the profitability of this strategy.

Fig. 1 Trigger values and excess returns: distance approach. The figure shows monthly excess returns averaged across sub-periods, 1980–2014

Fig. 2 Trigger values and excess returns: cointegration approach. The figure shows monthly excess returns averaged across sub-periods, 1980–2014

Fig. 3 Formation period and excess returns: distance approach. The figure shows monthly excess returns averaged across sub-periods, 1980–2014

Fig. 4 Formation period and excess returns: cointegration approach. The figure shows monthly excess returns averaged across sub-periods, 1980–2014

To explore the extent to which parameterization of the trading system is important, we study how profitability varies with trigger values and two alternative formation periods (Figs. 1, 2, 3, 4). To do this, we vary one of these parameters, holding the other at its standard value. For presentation purposes, we average results across sub-periods. Clearly, the trigger value can matter a great deal for profitability of the trading system, and this is without any regard to transaction costs. This is true also for risk-adjusted returns as they are very similar to raw returns regardless of the parameterization (not shown).

Turning to specific parameter values, profitability is generally a hump-shaped function of the trigger value. Further investigation reveals an intuitive reason for this finding. First, increasing the trigger captures only pairs that have diverged by a greater amount, which raises profitability from a given trade, but reduces the number of pair trades. A very high trigger, such as three standard deviations \((n=3)\), produces high profitability but from relatively few trades open at any given time. On the other hand, a low trigger results in more trading opportunities, but the returns on a given trade are smaller as the price divergence is smaller when the trade is initiated.

To give an example of the importance of the trigger value from these results, consider the cointegration approach and the top-5 pairs portfolio. The optimal trigger value is one-half the standard value of \(n=2\). Most importantly, with the (in-sample) optimal trigger value \(n=1\), profits are 40% larger than they would be if one were instead to use the standard trigger value.

Finally, we note that formation period length also matters, but to a lesser extent quantitatively than the trigger value, at least for our limited comparison of 9 and 12 months. It would be of interest in future research to examine longer formation periods. A longer formation period may well be desirable, perhaps especially with the cointegration approach, where the first objective is to discover long-run relationships with a statistical technique designed for this purpose.

5.2 Trading costs: one-day-later rule

Many studies of pairs trading make an adjustment for transaction costs, the most common of which is due to Gatev et al. (2006) and termed the “one-day-later rule”. This amounts to opening and closing trades 1 day later than when the trigger is first breached. The reasoning behind this assumption is to account for bid-ask spreads in computing returns from pairs trading. This can be a major component of transaction costs, particularly if some stocks have low liquidity.

Table 3 Pairs trading with distance approach—one-day-later rule
Table 4 Pairs trading with cointegration approach—one-day-later rule

More specifically, the one-day-later rule is employed to reduce the effects on calculated returns of bid-ask bounce associated with using daily closing stock prices from the Center for Research in Security Prices (CRSP) database. CRSP records the daily stock price as the closing price, that is, the price of the last transaction of the day, which can be at either the bid or the ask depending on the order. The excess return calculated from these prices could be biased upward because pairs trading is a contrarian investment strategy, and thus, when we open the position, we may be buying underpriced stocks at the bid and selling overpriced ones at the ask (see Gatev et al. 2006). Delaying trades until 1 day after the trigger is breached for a stock pair will correct for this bid-ask bias, assuming that closing prices 1 day later are equally likely to be at bid or ask, which is, arguably, a reasonable assumption. Of course, this rule will not pick up returns on pairs that converge extremely rapidly (i.e., in 1 day). Note also that this adjustment does not capture other types of trading costs, such as commissions, market impact costs of trading, and short-selling costs. Such costs are rarely accounted for in studies of pairs trading; we will consider those additional costs later in this section.
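The mechanics of the one-day-later rule can be sketched as a shift of the signal dates. The function name `delayed_trades` and the example trades are hypothetical, and the treatment of the end of the trading period (a forced close stays on the last day, and a trade that could only be opened on the last day is dropped) is an assumption of this sketch rather than a rule stated in the text.

```python
def delayed_trades(trades, last_day, delay=1):
    """Shift (open_day, close_day) signals by `delay` days.

    trades:   signal dates computed from closing prices (no waiting).
    last_day: index of the final day of the trading period.
    """
    shifted = []
    for open_day, close_day in trades:
        new_open = open_day + delay
        new_close = min(close_day + delay, last_day)
        if new_open < new_close:
            shifted.append((new_open, new_close))
    return shifted


# Hypothetical no-waiting signals in an 8-day trading period (days 0 to 7).
signals = [(1, 4), (2, 3), (6, 7)]
executed = delayed_trades(signals, last_day=7)
```

Here the trade signalled on day 2 that converges on day 3 is executed on days 3 and 4, after the convergence has already happened, which is why returns on pairs that converge within 1 day are not picked up.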

Tables 3 and 4 present the results for the one-day-later rule. Note first that with this adjustment, there is considerably more variation in the optimal in-sample parameterization, with far fewer instances in which the standard parameterization is optimal. Most striking, however, is that profitability is dramatically lower in both approaches, and especially in the more recent sub-samples. Specifically, for the distance approach, returns in the 1980s and 1990s drop by roughly 50% relative to the no-waiting case, and in the 2000s period, they drop by an even larger amount: from upward of 2% with no waiting to less than 0.5% with the one-day-later rule. Nevertheless, excess returns and Jensen alphas remain statistically significant for the distance approach.

The drop in returns after the 1980s, however, is most dramatic for the cointegration approach. Specifically, pairs trading returns in the 1990s and 2000s are neither economically nor statistically significantly different from zero for every portfolio size in both sub-periods. These facts are evident in Figs. 5, 6, and 7, which show the cumulative returns from the two approaches for the top-20 portfolios versus that of the equity premium (using the S&P 500 returns net of the risk-free rate). In this example, the most striking observation is that the cointegration approach would have been a money-losing strategy in both the 1990s and 2000s, even with an optimal in-sample parameterization of the trading system. The performance of the pairs-trading portfolio would have been even worse than shown had one used the standard parameterization, as it was not optimal in any sub-period for the top-20 portfolio.

We next turn to a discussion of risk-adjusted returns. We begin with Sharpe ratios. Standard Sharpe ratios may be problematic for our purposes, because the return distributions (see Tables 1, 2, 3, and 4) are highly non-normal (see Footnote 6). For this reason, we present in Table 5 both standard and modified Sharpe ratios. The latter differ from the standard ones in that the standard deviation in the denominator of the standard Sharpe ratio is replaced by the modified value at risk for a non-normal return distribution. What is important about this correction is that both skew and kurtosis enter directly into the risk-adjustment term via a Cornish-Fisher asymptotic expansion for the quantiles of a non-normal distribution. This modified Sharpe ratio is known as the Cornish-Fisher Sharpe ratio (e.g., Bramante 2013).
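The construction of the modified Sharpe ratio can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact procedure behind Table 5: the 5% tail probability, the use of population (divide-by-n) moments, and the function name `sharpe_ratios` are our choices for the example, and the sample returns are hypothetical.

```python
import math
from statistics import NormalDist


def sharpe_ratios(excess_returns, alpha=0.05):
    """Return (standard Sharpe, Cornish-Fisher modified Sharpe).

    Assumptions for this sketch: population moments, and a modified
    value at risk evaluated at tail probability `alpha`.
    """
    n = len(excess_returns)
    mean = sum(excess_returns) / n
    dev = [r - mean for r in excess_returns]
    sd = math.sqrt(sum(d ** 2 for d in dev) / n)
    skew = sum(d ** 3 for d in dev) / n / sd ** 3
    ex_kurt = sum(d ** 4 for d in dev) / n / sd ** 4 - 3.0  # excess kurtosis

    # Cornish-Fisher adjustment of the normal quantile for skew and kurtosis.
    z = NormalDist().inv_cdf(alpha)
    z_cf = (z
            + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * ex_kurt / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)

    # Modified value at risk, expressed as a positive loss.
    mvar = -(mean + z_cf * sd)
    return mean / sd, mean / mvar


# Hypothetical monthly excess returns (illustrative values only).
sample = [0.0, 0.02] * 10
standard, modified = sharpe_ratios(sample)
print(standard, modified)
```

With zero skew and zero excess kurtosis the adjusted quantile collapses to the normal quantile, so the two ratios differ only by a scale factor; they diverge in ranking only when the higher moments matter. The sketch does not guard against a zero standard deviation or a non-positive modified value at risk.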

The main implications from both standard and modified Sharpe ratios are consistent with the above inferences from simple excess returns and Jensen alphas: (1) in the 1980s, Sharpe ratios were much larger than for the equity premium with either pairs-trading approach; (2) for the distance approach, they remain higher than the equity premium in the 1990s and 2000s; and (3) the cointegration approach produces lower Sharpe ratios than the market throughout the 1990s and 2000s periods.

Figures 5, 6, and 7 suggest that an important reason for the superior risk-adjusted performance of pairs trading with the distance approach relative to the equity premium is not only that raw returns are higher but also that pairs trading clearly produces much less volatile returns than the overall equity market. However, returns from pairs trading are not risk free: the standard deviation is not zero. A remaining question is what underlies the risk in pairs trading. To investigate this, we next decompose excess returns using the Fama-French factor approach (Tables 6, 7). It is strikingly apparent that none of the four Fama-French factors is driving returns from pairs trading with either approach, in any sub-period: Fama-French alphas are very similar quantitatively to Jensen alphas and, more importantly, to mean excess returns. We conclude that the variation in pairs trading excess returns is not explained by standard factor models, and is arguably simply “arbitrage risk” (Gatev et al. 2006).
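The factor decomposition amounts to regressing portfolio excess returns on a constant and the factor returns, with the intercept read as the alpha. The sketch below shows that mechanic with a small pure-Python least-squares solver; the helper names `ols` and `factor_alpha`, the two-factor setup, and all numbers are hypothetical and chosen only for illustration. It does not reproduce Tables 6 and 7, and in practice a statistics library would be used to obtain standard errors as well.

```python
def ols(y, X):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def factor_alpha(excess_returns, factors):
    """Return (alpha, betas) from a regression on a constant and the factors."""
    X = [[1.0] + list(f) for f in factors]
    coef = ols(excess_returns, X)
    return coef[0], coef[1:]


# Hypothetical monthly factor returns (two factors) and portfolio returns
# constructed with small factor loadings, for illustration only.
factors = [(0.01, -0.02), (0.03, 0.01), (-0.02, 0.00), (0.00, 0.02), (0.02, -0.01)]
returns = [0.004 + 0.05 * f1 - 0.02 * f2 for f1, f2 in factors]

alpha, betas = factor_alpha(returns, factors)
print(alpha, betas)
```

When the estimated loadings are close to zero, the alpha is close to the mean excess return, which is the pattern described above for the pairs-trading portfolios.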

In sum, the cointegration approach has not been a profitable strategy throughout the past 25 years even when the parameterization of the trading system is chosen optimally on an in-sample basis. The distance approach performs better, but excess returns have clearly fallen considerably over time: the top-20 portfolio excess returns halved in the 1990s from the 1980s, and halved again from the 1990s to the 2000s. In the full 2000–2014 period, the excess returns on this portfolio are just 33 bp. The main unresolved question is whether such returns with the distance approach are still economically significant if, for example, we break this time period down into finer sub-periods and, more importantly, when we account for other costs associated with pairs trading.

Fig. 5 Cumulative returns for the 1980s. Cumulative returns for the pairs portfolios refer to the top-20 portfolio

Fig. 6 Cumulative returns for the 1990s. Cumulative returns for the pairs portfolios refer to the top-20 portfolio

Fig. 7 Cumulative returns for the 2000s. Cumulative returns for the pairs portfolios refer to the top-20 portfolio

Table 5 Sharpe ratios
Table 6 Factor analysis—pairs trading with distance approach
Table 7 Factor analysis—pairs trading with cointegration approach

5.3 Is pairs trading still profitable?

The evidence presented so far clearly shows the decline in pairs-trading returns across the three sub-periods. In the full 2000–2014 period, excess returns with the distance approach range from a low of 12 bp on the top-200 portfolio to a high of 66 bp on the top-10 portfolio, and risk-adjusted returns (Fama-French alphas) are nearly identical. As the objective here is to assess whether pairs trading with the distance approach is profitable with the most recent data and with account taken of additional transaction costs, it is instructive first to examine the temporal pattern of returns in the 2000s period. For the top-20 portfolio, for example, average excess returns were 33 bp for the full period 2000–2014, 26 bp for 2000–2004, 48 bp for 2005–2009, and 23 bp for 2010–2014. The global financial crisis in 2007–2008 clearly presented more favorable pairs-trading opportunities, raising excess returns by roughly half relative to the full 2000s sample. Others have pointed out the same fact: market turmoil presents better prospects for pairs-trading strategies (see Do and Faff 2010, 2012; Deaves et al. 2013). Before discussing transaction costs, we also note that these return calculations are generous measures, as they use an optimal in-sample parameterization. For instance, if one were to simply use the standard parameterization, the returns drop significantly: on the top-20 portfolio, for example, they drop from 33 bp to just 13 bp for the full 2000–2014 period. The key question is whether, even apart from the relatively small residual risk that is involved in pairs trading (arbitrage risk), these returns are economically meaningful.

It is impossible to directly measure the additional transaction costs from each pair trade. The best that can be done is to provide some evidence on the magnitudes of such costs on average for trades executed by professional investors. Do and Faff (2012) estimate three transaction costs for pairs trading by professional investors: commission expenses, market impact costs associated with relative value arbitrage, and costs associated with short selling. They report that the costs associated with short sales are very small (less than 1% loan fee per annum), and Gatev et al. (2006) reach similar conclusions. Thus, short-selling costs will likely not greatly alter the profitability of pairs trading.

For each one-way trade, Do and Faff (2012) report average commission fees of 9 bp for the 2000s (their sample ends in 2009) and market impact trading costs of around 20 bp. Each pair trade involves two roundtrip trades (four one-way trades). Thus, using these estimates for the 2000s period, total transaction costs per pair trade are on the order of 116 bp (i.e., \(4 \times (9 + 20)\)). On average, we trade each pair twice in a 6-month trading period, and thus, transaction costs are on the order of 232 bp per 6 months, which translates to roughly 39 bp per month. Based on these estimates of transaction costs for the 2000s period, and gross returns ranging from 12 to 66 bp on a monthly basis, net returns from pairs trading with the distance approach and using an optimal in-sample parameterization are at best about 27 bp monthly (66 bp less 39 bp, which corresponds to the top-10 portfolio).
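The cost arithmetic above can be reproduced with a short calculation. The sketch below uses only the figures quoted in the text (9 bp commission, 20 bp market impact, four one-way trades per pair trade, two pair trades per 6-month period); the function names `cost_breakdown_bp` and `net_monthly_bp` are hypothetical, and short-selling costs are left out because they are described as very small.

```python
def cost_breakdown_bp(commission_bp=9, impact_bp=20, one_way_trades=4,
                      trades_per_period=2, period_months=6):
    """Return (cost per pair trade, cost per trading period, cost per month) in bp."""
    per_pair_trade = one_way_trades * (commission_bp + impact_bp)
    per_period = trades_per_period * per_pair_trade
    per_month = per_period / period_months
    return per_pair_trade, per_period, per_month


def net_monthly_bp(gross_monthly_bp, monthly_cost_bp):
    """Monthly return net of the estimated transaction costs, in bp."""
    return gross_monthly_bp - monthly_cost_bp


per_trade, per_period, per_month = cost_breakdown_bp()
print(per_trade, per_period, round(per_month))

# Net returns at the two ends of the reported gross range (12 and 66 bp).
for gross in (12, 66):
    print(gross, round(net_monthly_bp(gross, per_month), 1))
```

Changing the default arguments gives a quick sense of how sensitive the conclusion is to the cost estimates, for example to a lower market impact cost or to less frequent trading of each pair.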

In sum, it is difficult to conclude that the returns to pairs trading in the 2000s with either of the two main approaches, after accounting for transaction costs, are on average economically significant. The most favorable case is that the distance approach can yield economically significant returns mainly during market disruptions. Although some market disruptions are obvious (such as the 2007–2008 financial crisis), which could provide a guide for pairs traders, other market disruptions may not be. For instance, we experimented with using the VIX implied volatility index as an indicator of market stress and conditioned pairs trading on it. In the 2000s period, we found no correlation between the VIX and the returns to pairs trading. Further research into what types of market conditions are most favorable to pairs trading would be worthwhile.

6 Conclusion

This paper studies alternative techniques for identifying stock pairs in a pairs-trading strategy and explores a range of parameterizations of the trading system. Parameterization of the trading system matters for the profitability of pairs trading, and the standard one used in the literature does not appear to be the optimal one in the majority of cases studied.

The profitability of pairs trading has declined over time, and this has been noted in the literature. However, we find that the cointegration approach to pairs trading would not have been a profitable investment strategy throughout the 1990s and 2000s, and quite likely would have returned a negative amount when all transaction costs are accounted for properly. The distance approach also shows declining returns over time, but the crucial question is whether the very low, but statistically significant, returns in recent years are high enough to cover all relevant transaction costs. Our analysis suggests they may not be.

Our analysis leads to a number of suggestions for future research. One is to experiment with alternative implementations of the cointegration approach. This approach readily identifies cointegrated stock prices, but the number of such “pairs” identified just with cointegration is very large. As a result, some additional structure must be employed to reduce the set of pairs to a set of the best ones. In this paper, we used a strategy that narrows this set by requiring mean reversion of the price spreads, two-way Granger causality in the prices of a given pair, and quantitatively similar market betas. The point is that there are other refinements to the cointegration approach that should be explored with the aim of seeing whether they produce better pairs-trading returns.

A second avenue worth considering is the role of the formation period length. We have studied a shorter formation period than the standard 12-month period, but a longer formation period might be beneficial for pairs-trading profitability. In the cointegration approach, for instance, it could very well be the case that more than 1 year of historical price data would help in identifying cointegrated prices and their possible mean reversion. These are important topics for future work.

Finally, it has come to be an accepted belief that difficult market conditions produce the best opportunities for pairs trading. More research into exactly which types of market conditions are most favorable to pairs trading, and how they can be identified in real time, could produce important findings.