1 Introduction

Cross-correlations often provide us very useful information about financial markets to figure out various non-trivial and complicated structures behind the stocks as multivariate time series (Bouchaud and Potters 2009). Actually, the use of the cross-correlation can visualize collective behavior of stocks during the crisis. As such examples, we visualized the collective movement of the stocks by means of the so-called multidimensional scaling (MDS) during the earthquake in Japan on March 2011 (Ibuki et al. 2012a, b, 2013). We have also constructed a prediction procedure for several stocks simultaneously by means of multi-layer Ising model having mutual correlations through the mean fields in each layer (Ibuki et al. 2012a, b, 2013; Murota and Inoue 2013).

Usually, we need information about the trend of each stock to predict the price for, you might say, ‘single trading’ (Murota and Inoue 2013; Kaizoji 2000; Bouchaud 2012). However, it sometimes requires us a lot of unlearnable ‘craftsperson’s techniques’ to make a profit. Hence, it is reasonable for us to use the procedure without any trend-forecasting-type way to manage the asset with a small risk.

From the view point of time series prediction, Elliot et al. (2005) made a model for the spread and tried to estimate the state variables (spread) as hidden variables from observations by means of Kalman filter. They also estimated the hyper-parameters appearing in the model using EM algorithm (Expectation and Maximization algorithm) which has been used in the field of computer science. As an example of constructing optimal pairs, (Mudchanatongsuk 2008) regarded pair prices as Ornstein–Uhlenbeck process, and they proposed a portfolio optimization for the pair by means of stochastic control.

For the managing of assets, the so-called pairs trading (Vidyamurthy 2004; Whistler 2004; Gatev et al. 2006) has attracted trader’s attention. The pairs trading is based on the assumption that the spread between highly correlated two stocks might shrink eventually even if the two prices of the stocks temporally exhibit ‘mis-pricing’ leading up to a large spread. It has been believed that the pairs trading is almost ‘risk-free’ procedure; however, there are only a few extensive studies (Perlin 2009; Do and Faff 2010) so far to examine the conjecture in terms of big-data scientific approach.

Of course, several purely theoretical approaches based on probabilistic theory have been reported. For instance, the so-called arbitrage pricing theory (APT) Gatev et al. (2006) in the research field of econometrics has suggested that the pairs trading works effectively if the linear combination of two stocks, each of which is non-stationary time series, becomes stationary. Namely, the pair of two stocks showing the properties of the so-called co-integration (Engle and Granger 1987; Stock and Watson 1988) might be a suitable pair. However, it might cost us a large computational time to check the stationarity of the co-integration for all possible pairs in a market, whereas it might be quite relevant issue to clarify whether the pairs trading is actually safer than the conventional ‘single trading’ [see for instance (Murota and Inoue 2013)] to manage the asset, or to what extent the return from the pairs trading would be expected etc.

With these central issues in mind, here we construct a platform to carry out and to investigate the pairs trading which has been recognized an effective procedure for some kind of ‘risk-hedge’ in asset management. We propose an effective algorithm (procedure) to check the amount of profit from the pair trading easily and automatically. We apply our algorithm to daily data of stocks in the first section of the Tokyo Stock Exchange, which is now available at the Yahoo! finance web site http://finance.yahoo.co.jp. In the algorithm, three distinct conditions, namely, starting (\(\theta \)), profit-taking (\(\varepsilon \)) and stop-loss (\(\Omega \)) conditions of transaction are automatically built into the system by evaluating the spread (gap) between the prices of two stocks for a given pair. Namely, we shall introduce three essential conditions to inform us when we should start the trading, when the spread between the stock prices satisfies the profit-taking conditions, etc. by making use of a very simple way. Numerical evaluations of the algorithm for the empirical data set are carried out for all possible pairs by changing the starting, profit-taking and stop-loss conditions to look for the best possible combination of the conditions.

This paper is organized as follows. In the next Sect. 2, we introduce several descriptions for the mathematical modeling of pairs trading and set-up for the empirical data analysis by defining various variables and quantities. Here we also mention that the pairs trading is described by a first-passage process (Redner 2001), and explain the difference between our study and arbitrage pricing theory (APT) (Gatev et al. 2006) which has highly developed in the research field of econometrics. In Sect. 3, we introduce several rules of the game for the trading. We define two relevant measurements to quantify the usefulness of pairs trading, namely, winning probability and profit rate. The concrete algorithm to carry out pairs trading automatically is also given in this section explicitly. The results of empirical data analysis are reported and argued in Sect. 4. The last section is devoted to summary.

2 Mathematical descriptions and set-up

In pairs trading, we first pick up two stocks having a large correlation in the past. A well-known historical example is the pair of coca cola and pepsi cola http://en.wikipedia.org/wiki/Pairs_trade. Then, we start the action when the spread (gap) between the two stocks’ prices increases up to some amount of the level (say, \(\theta \)), namely, we sell one increasing stock (say, the stock \(i\)) and buy another decreasing one (say, the stock \(j\)) at the time \(t_{<}^{(ij)}\). We might obtain the arbitrage as a profit (gain) \(g_{ij}\):

$$\begin{aligned} g_{ij} = |\gamma _{i}(t_{<}^{(ij)})-\gamma _{j}(t_{<}^{(ij)})|- |\gamma _{i}(t_{>}^{(ij)})-\gamma _{j}(t_{>}^{(ij)})| \end{aligned}$$
(1)

when the spread decreases to some amount of the level (say, \(\varepsilon (> \theta )\)) again due to the strong correlation between the stocks, and we buy the stock \(i\) and sell the stock \(j\) at time \(t_{>}^{(ij)}\). We should keep in mind that we used here the stock price normalized by the value itself at \(\tau \)-times before (we may say ‘rate’) as

$$\begin{aligned} \gamma _i(t) \equiv \frac{p_i (t)-p_i (t-\tau +1)}{p_i (t-\tau +1)} = \frac{p_i(t)}{p_i(t-\tau +1)}-1 \end{aligned}$$
(2)

where we defined \(p_{i}(t)\) as a price of the stock \(i\) at time \(t\). It is convenient for us to use the \(\gamma _{i}(t)\) (or \(\gamma _{i}(t)+1\)) instead of the price \(p_{i}(t)\) because we should treat the pairs changing in quite different ranges of price. Hence, we evaluate the spread between two stocks by means of the rate \(\gamma _{i}(t)\) which denotes how much percentage of the price increases (or decreases) from the value itself at \(\tau \)-times before. In our simulation, we choose \(\tau =\text{250 } \text{[days] }\). Using this treatment (2), one can use unified thresholds \((\theta , \varepsilon , \Omega )\) which are independent of the range of prices for all possible pairs.

2.1 Pairs trading as a first-passage process

Obviously, more relevant quantities are now not the prices themselves but the spreads for the prices of pairs. It might be helpful for us to notice that the process of the spread defined by

$$\begin{aligned} d_{ij} (t) \equiv |\gamma _{i}(t)-\gamma _{j}(t)| \end{aligned}$$
(3)

also produces a time series \(d_{ij}(0) \rightarrow d_{ij}(1) \rightarrow \cdots \rightarrow d_{ij}(t) \rightarrow \cdots \), which is described as a stochastic process.

In financial markets, the spread (in particular, the Bid-Ask spread) is one of the key quantities for double-auction systems [for instance, see (Ibuki and Inoue 2011)] and the spread between two stocks also plays an important role in pairs trading. Especially, it should be regarded as a first-passage process (or sometimes referred to as first-exit process) [see for instance (Redner 2001; Inoue and Sazuka 2010; Sazuka et al. 2009; Inoue and Sazuka 2007; Sazuka and Inoue 2007) for recent several applications to finance] with absorbing boundaries \((\theta , \varepsilon , \Omega )\), and the times \(t_{<}^{(ij)}\) and \(t_{>}^{(ij)}\) are regarded as first-passage times. Actually, \(t_{<}^{(ij)}\) and \(t_{>}^{(ij)}\) are the times \(t\) satisfying the following for the first time

$$\begin{aligned} d_{ij} (t) \ge \theta ,\quad \text{ and }\quad \theta < d_{ij} (t) < \varepsilon \quad ( t<t_{>}^{(ij)}), \end{aligned}$$
(4)

respectively. More explicitly, these times are given by

$$\begin{aligned} t_{<}^{(ij)}= & {} \min \{t>0 \,|\, d_{ij}(t) \ge \theta \} \end{aligned}$$
(5)
$$\begin{aligned} t_{>}^{(ij)}= & {} \min \{t>t^{(ij)}_{<}\, |\, \theta <d_{ij}(t) \le \varepsilon \}. \end{aligned}$$
(6)

The above argument was given for somewhat an ideal case, and of course, we might lose the money just as much as

$$\begin{aligned} l_{ij} = |\gamma _{i}(t_{*}^{(ij)})-\gamma _{j}(t_{*}^{(ij)})|- |\gamma _{i}(t_{<}^{(ij)})-\gamma _{j}(t_{<}^{(ij)})| =|d_{ij}(t_{*}^{(ij)})-d_{ij}(t_{<}^{(ij)})| \end{aligned}$$
(7)

where \(t_{*} (>t_{<})\) is a ‘termination time’ satisfying

$$\begin{aligned} t_{*}^{(ij)} = \min \{ t >t_{<}^{(ij)} \,|\, d_{ij}(t) > \Omega \}. \end{aligned}$$
(8)

This means that we should decide a ‘loss-cutting’ when the spread \(d_{ij} (t)\) does not shrink to the level \(\varepsilon \) and increases beyond the threshold \(\Omega \) at time \(t_{*}\). It should bear in mind that once we start the trading, we get a gain or lose, hence the above \(t_{>}^{(ij)}\) and \(t_{*}^{(ij)}\) are unified as ‘time for decision’ \(\hat{t}^{(ij)}\) by

$$\begin{aligned} \hat{t}^{(ij)}= t_{*}^{(ij)} + (t_{>}^{(ij)}-t_{*}^{(ij)}) \Theta (t_{*}^{(ij)}-t_{>}^{(ij)}) \end{aligned}$$
(9)

where we defined a unit step function as

$$\begin{aligned} \Theta (x) = \left\{ \begin{array}{cc} 1 &{} (x\ge 0) \\ 0 &{} (x<0). \end{array} \right. \end{aligned}$$
(10)

Namely, if a stochastic process \(d_{ij}(t)\) firstly reaches the threshold \(\varepsilon \), the time for decision is \(\hat{t}^{(ij)}=t_{>}^{(ij)}\), whereas if the \(d_{ij}(t)\) goes beyond the threshold \(\Omega \) before shrinking to the level \(\varepsilon \), we have \(\hat{t}^{(ij)}=t_{*}^{(ij)}\).

2.2 Correlation coefficient and volatility

We already mentioned that the pairs trading is based on the assumption that the spread between highly correlated two stocks might shrink shortly even if the two prices of the stocks exhibit a temporal spread. Taking into account the assumption, here we select the suitable pairs of stocks using the information about correlation coefficient (the Pearson estimator) for pairs to quantify the correlation:

$$\begin{aligned} \rho _{ij} (t)= \frac{\sum _{\Delta t=t-\tau +1}^{t} (\Delta \gamma _{i} (t, \Delta t)-\overline{\Delta \gamma _{i}(t, \Delta t)}) (\Delta \gamma _{j} (t, \Delta t)-\overline{\Delta \gamma _{j}(t, \Delta t)})}{ \sqrt{ \sum _{\Delta t=t-\tau +1}^{t}(\Delta \gamma _{i} (t, \Delta t)-\overline{\Delta \gamma _{i}(t, \Delta t)})^{2} \sum _{\Delta t=t-\tau +1}^{t}(\Delta \gamma _{j} (t, \Delta t)-\overline{\Delta \gamma _{j}(t, \Delta t)})^{2} }} \end{aligned}$$
(11)

and standard deviation (volatility):

$$\begin{aligned} \sigma _{i} (t) = \sqrt{ \sum _{l=t-\tau +1}^{t} (\gamma _{i}(l)-\overline{\gamma _{i}(t)})^{2}} \end{aligned}$$
(12)

It should be noted that we also use the definition of the logarithmic return of the rescaled price \(\gamma _{i}(t)+1=p_{i}(t)/p_{i}(t-\tau +1)\) [see Eq. (2)] for the duration \(\Delta t\) in (11) by

$$\begin{aligned} \Delta \gamma _{i}(t, \Delta t) \equiv \log (\gamma _{i}(t+\Delta t)+1)-\log (\gamma _{i}(t)+1) \end{aligned}$$
(13)

and moving average of the observable \(A(t)\) over the duration \(\tau \) as

$$\begin{aligned} \overline{A (t)} = \frac{1}{\tau } \sum _{l=t-\tau +1}^{t} A(l). \end{aligned}$$
(14)

At first glance, the definition of (11) for correlation coefficient might look like unusual because (13) accompanying with (2) implies that the correlation coefficient consists of the second derivative of prices. However, as we already mentioned, the ‘duration’ \(\tau =1 \text{[year] }\) appearing in (2) is quite longer than \(\Delta t =\text{1 } \text{[day] }\), namely, \(\tau \gg \Delta t\) is satisfied. Hence, the price difference in (2) could not regarded as the same derivative as in the derivative definition of (13). Therefore, the definition of correlation coefficient (11) is nothing but the conventional first derivative quantity.

From the view point of these quantities \(\{\rho _{ij}(t), \sigma _{i}(t)\}\), the possible pairs should be highly correlated and the standard deviation of each stock should take the value lying in some finite range. Namely, we impose the following condition for the candidates of the pairs at time \(t_{<}^{(ij)}\), that is, \(\rho _{ij} (t_{<}^{(ij)}) > \rho _{0}\) and \({\sigma _\text{min}} < \sigma _{i}(t_{<}^{(ij)}) < \sigma _{\text{max}}\). Thus, the total number of pairs for us to carry out the pair trading (from now on, we call such pairs as ‘active pairs’) is given explicitly as

$$\begin{aligned} N_{(\theta ,\varepsilon ,\Omega )}&= \sum _{i}\sum _{j<i} \Theta (\rho _{ij}(t_{<}^{(ij)})-\rho _{0})\{ \Theta (\sigma _{i}(t_{<}^{(ij)}) -{\sigma _\text{min}})- \Theta (\sigma _{i}(t_{<}^{(ij)}) - {\sigma _\text{max}}) \} \nonumber \\&\quad \times \Theta ({\tau_{\text{max}}}-\hat{t}^{(ij)}) \end{aligned}$$
(15)

where a factor \(\Theta ({\tau_\text{max}}-\hat{t}^{(ij)})\) means that we terminate the game if ‘time of decision’ \(\hat{t}^{(ij)}\) [see Eq. (9)] becomes longer than the whole playing time \({\tau _\text{max}}=\text{1 } \text{[year] }\). Therefore, the number of active pairs is dependent on the thresholds \((\theta ,\epsilon , \Omega )\), and we see the details of the dependence in Tables 1 and 2 under the condition \(\Omega =2\theta -\varepsilon \) (see \(N_{w}+N_{l} \equiv N_{(\theta ,\varepsilon )}\) in the tables).

Table 1 Details of the result in 2012
Table 2 Details of the result in 2011

From Fig. 1, we are also confirmed that the number pairs \(N\) satisfying \(\rho _{ij} (t_{<}^{(ij)}) > \rho _{0}\) and \({\sigma _\text{min}} < \sigma _{i}(t_{<}^{(ij)}) < {\sigma _\text{max}}\) is extremely smaller (\(N \sim 300\)) than the number of combinations for all possible \(n\) stocks, namely, \(N \ll n(n-1)/2=1,590,436\).

2.3 Minimal portfolio and APT

It might be helpful for us to notice that the pairs trading could be regarded as a ‘minimal portfolio’ and it can obtain the profit even for the case that the stock average decreases. Actually, it is possible for us to construct such ‘market neutral portfolio’ (Livan et al. 2012) as follows. Let us consider the return of the two stocks \(i\) and \(j\) which are described as

$$\begin{aligned} \Delta \gamma _{i}(t)= & {} \beta _{i} \Delta \gamma _{m}(t) +q_{i}(t) \end{aligned}$$
(16)
$$\begin{aligned} \Delta \gamma _{j}(t)= & {} \beta _{j} \Delta \gamma _{m}(t) +q_{j}(t) \end{aligned}$$
(17)

where parameters \(\beta _{i},\beta _{j}\) denote the so-called ‘market betas’ for the stocks \(i,j\), and \(\Delta \gamma _{m}(t)\) stands for the return of the stock average, namely,

$$\begin{aligned} \beta _{i} = \frac{\overline{ \Delta \gamma _{m}(t) \Delta \gamma _{i}(t)}}{\sqrt{(\overline{\Delta \gamma _{m}(t)})^{2}}},\,\,\, \beta _{j} = \frac{\overline{\Delta \gamma _{m}(t) \Delta \gamma _{j}(t)}}{\sqrt{(\overline{\Delta \gamma _{m}(t)})^{2}}} \end{aligned}$$
(18)

and here we select them (\(\beta _{i},\beta _{j}\)) as positive values for simplicity. On the other hand, \(q_{i}(t),q_{j}(t)\) appearing in (16), (17) are residual parts (without any correlation with the stock average) of the returns of stocks \(i,j\). Then, let us assume that we take a short position (‘selling’ in future) of the stock \(j\) by volume \(r\) and a long position (‘buying’ in future) of the stock \(i\). For this action, we have the return of the portfolio as

$$\begin{aligned} \Delta \gamma _{ij}(t) = \Delta \gamma _{i}(t) -r \Delta \gamma _{j}(t) = (\beta _{i}-r\beta _{j})\Delta \gamma _{m}(t) + q_{i}(t) -r q_{j}(t). \end{aligned}$$
(19)

Hence, obviously, the choice of the volume \(r\) as

$$\begin{aligned} r = \frac{\beta _{i}}{\beta _{j}} \end{aligned}$$
(20)

leads to

$$\begin{aligned} \Delta \gamma _{ij}(t) = q_{i}(t) - \left( \frac{\beta _{i}}{\beta _{j}}\right) q_{j}(t) \end{aligned}$$
(21)

which is independent of the market (the average stock \(\Delta \gamma _{m}(t)\)). We should notice that \(\Delta \gamma _{ij}(t) = \Delta \gamma _{i}(t) -r \Delta \gamma _{j}(t)\) is rewritten in terms of the profit as follows:

$$\begin{aligned} \Delta \gamma _{ij}(t)&= \Delta \gamma _{i}(t) -r \Delta \gamma _{j}(t) = \{\gamma _{i}(t+1)-r \gamma _{j}(t+1)\}- \{\gamma _{i}(t)-r \gamma _{j}(t)\} \nonumber \\&\simeq d_{ij}(t+1)-d_{ij}(t) \end{aligned}$$
(22)

Therefore, in this sense, the profit \(\Delta \gamma _{ij}(t)\) is also independent of the market \(\Delta \gamma _{m}(t)\). This empirical fact might tell us the usefulness of pairs trading.

In the arbitrage pricing theory (APT) (Vidyamurthy 2004; Gatev et al. 2006), the condition for searching suitable pairs is the linear combination of ‘non-stationary’ time series \(\gamma _{i}(t)\) and \(\gamma _{j}(t)\), \(\gamma _{i}(t)-r \gamma _{j}(t)\) becomes co-integration, namely, it becomes ‘stationary’. Then, the quantity possesses the long-time equilibrium value \(\mu \) and we write

$$\begin{aligned} \gamma _{i}(t)-r \gamma _{j}(t)= & {} \mu +\omega \end{aligned}$$
(23)
$$\begin{aligned} \gamma _{i}(t +l)-r \gamma _{j}(t+l)= & {} \mu - \omega \end{aligned}$$
(24)

with a small deviation \(\omega ({>}0)\) from the mean \(\mu \). Therefore, we easily find

$$\begin{aligned} \gamma _{i}(t)-r \gamma _{j}(t) - \{ \gamma _{i}(t +l)-r \gamma _{j}(t+l) \} \simeq d_{ij}(t)-d_{ij}(t+l) =2\omega \end{aligned}$$
(25)

namely, we obtain the profit with a very small risk. Hence, the numerical checking for the stationarity of the linear combination \(\gamma _{i}(t)-r \gamma _{j}(t)\) by means of, for instance, exponentially fast decay of the auto-correlation function or various types of statistical test might be useful for us to select the possible pairs. However, it computationally cost us heavily for large-scale empirical data analysis. This is a reason why here we use the correlation coefficients and volatilities to investigate the active pairs instead of the co-integration-based analysis as given in the references (Vidyamurthy 2004; Whistler 2004; Gatev et al. 2006; Engle and Granger 1987; Stock and Watson 1988).

3 Procedures of empirical analysis and ‘rules of the game’

In this section, we explain rules of our game (trading) using the data set for the past three years 2010–2012 including 2009 to evaluate the quantities like correlation coefficient and volatility in 2010 by choosing \(\tau =250\) [days]. In the following, we explain how one evaluates the performance of pairs trading according to the rules.

3.1 A constraint for the thresholds

Obviously, the ability of the asset management by pairs trading is dependent on the choice of the thresholds \((\theta , \varepsilon , \Omega )\). Hence, we should investigate how much percentage of total active pairs can obtain a profit for a given set of \((\theta , \varepsilon , \Omega )\). To carry out the empirical analysis, we define the ratio between the profit \(d_{ij}(t_{<}^{(ij)})-d_{ij}(t_{>}^{(ij)}) ({>}0)\) and the loss \(d_{ij}(t_{*}^{(ij)})-d_{ij}(t_{<}^{(ij)}) ({>}0)\) for the marginal spread, namely, \(d_{ij}(t_{<}^{(ij)})=\theta , d_{ij}(t_{>}^{(ij)})=\varepsilon , d_{ij}(t_{*}^{(ij)})=\Omega \) as

$$\begin{aligned} \frac{\Omega -\theta }{\theta -\varepsilon } \equiv \alpha \end{aligned}$$
(26)

where \(\alpha ({>}0)\) is a control parameter. It should be noted that for positive constants \(\delta ,\delta ^{'}\), the gap of the spreads (profit) \(d_{ij}(t_{<}^{(ij)})-d_{ij}(t_{>}^{(ij)})\) is written as

$$\begin{aligned} d_{ij}(t_{<}^{(ij)})-d_{ij}(t_{>}^{(ij)}) = \theta +\delta - (\varepsilon -\delta ^{'})= \theta -\varepsilon +(\delta +\delta ^{'}) \ge \theta -\varepsilon . \end{aligned}$$
(27)

Therefore, the difference \(\theta -\varepsilon \) appearing in the denominator of Eq. (26) gives a lower bound of the profit. Although the numerator \(\Omega -\theta \) in (26) has no such an explicit meaning; however, implicitly it might be regarded as a ‘typical loss’ because the actually realized loss \(d_{ij}(t_{*}^{(ij)})-d_{ij}(t_{<}^{(ij)})\) fluctuates around the typical value and it is more likely to take a value which is close to \(\Omega -\theta \).

Hence, for \(\alpha >1\), the loss for the marginal spread \(\Omega -\theta \) is larger than the lowest possible profit once a transaction is taken place and vice versa for \(\alpha <1\). If we set \(\alpha <1\), it is more more likely to lose the money less than the lowest bound of the profit \(\theta -\varepsilon \), however, at the same time, it means that we easily lose due to the small gap between \(\Omega \) and \(\theta \). In other words, we might frequently lose with a small amount of losses. On the other hand, if we set \(\alpha >1\), we might hardly lose; however, once we lose, the total amount of the losses is quite large. Basically, it lies with traders to decide which to choose \(\alpha >1\) or \(\alpha <1\); however, here we set the marginal \(\alpha =1\) as a ‘neutral strategy’, that is

$$\begin{aligned} \Omega =2\theta -\varepsilon . \end{aligned}$$
(28)

Thus, we have now only two thresholds \((\theta , \varepsilon )\) for our pairs trading, and the \(\Omega \) should be determined as a ‘slave variable’ from Eq. (28). Actually this constraint (28) can reduce our computational time to a numerically tractable revel. Under the condition (28), we sweep the thresholds \(\theta , \varepsilon \) as \(0.01\le \theta \le 0.09\), \(0.0\le \varepsilon \le \theta \) (\(d\theta =0.01\)) and \(0.1\le \theta \le 1.0\), \(0.0\le \varepsilon \le \theta \) (\(d\theta =0.1\)) in our numerical calculations (see Tables 1 and 2).

3.2 Observables

In order to investigate the performance of pairs trading quantitatively, we should observe several relevant performance measurements. As such observables, here we define the following wining probability as a function of the thresholds:

$$\begin{aligned} p_{w} (\theta ,\varepsilon ) = \frac{ \sum _{i,j<i} \Theta (d_{ij}(t_{<}^{(ij)})-d_{ij}(\hat{t}^{(ij)})) \psi (d_{ij}(t_{<}^{(ij)}), \rho _{0},\sigma _\mathrm{min},\sigma _\mathrm{max}: \theta ,\varepsilon )}{\sum _{i,j<i} \psi (d_{ij}(t_{<}^{(ij)}), \rho _{0},\sigma _\mathrm{min},\sigma _\mathrm{max}: \theta , \varepsilon )} \end{aligned}$$
(29)

where we defined

$$\begin{aligned}&\psi (d_{ij}(t_{<}^{(ij)}), \rho _{0},\sigma _\mathrm{min},\sigma _\mathrm{max}: \theta ,\varepsilon ) \nonumber \\&\quad \equiv \Theta (\rho _{ij}(t_{<}^{(ij)})-\rho _{0})\{ \Theta (\sigma _{i}(t_{<}^{(ij)}) -\sigma _\mathrm{min})- \Theta (\sigma _{i}(t_{<}^{(ij)}) - \sigma _\mathrm{max}) \} \Theta (\tau _\mathrm{max}-\hat{t}^{(ij)}) \nonumber \\&\quad = \ll \Theta (d_{ij}(t_{<}^{(ij)})-d_{ij}(\hat{t}^{(ij)})) \gg \nonumber \\&\quad = \frac{N_{w}}{N_{(\theta ,\varepsilon )}} = 1-\frac{N_{l}}{N_{(\theta ,\varepsilon )}} \end{aligned}$$
(30)

where \(N_{w},N_{l}\) are numbers of wins and loses, respectively, and the conservation of the number of total active pairs

$$\begin{aligned} N_{(\theta ,\varepsilon )}=N_{w}+N_{l} \end{aligned}$$
(31)

should hold (see the definition of \(N_{(\theta ,\varepsilon ,\Omega )}\) in (15) under the condition \(\Omega =2\theta -\varepsilon \)). The bracket \(\ll \cdots \gg \) appearing in (30) is defined by

$$\begin{aligned} \ll \cdots \gg &\equiv \frac{ \sum _{i,j<i} (\cdots ) \psi (d_{ij}(t_{<}^{(ij)}), \rho _{0},\sigma _\mathrm{min},\sigma _\mathrm{max}: \theta ,\varepsilon ) }{ \sum _{i,j<i} \psi (d_{ij}(t_{<}^{(ij)}), \rho _{0},\sigma _\mathrm{min},\sigma _\mathrm{max}: \theta , \varepsilon ) }. \end{aligned}$$
(32)

We also define the profit rate:

$$\begin{aligned} \eta (\theta ,\varepsilon ) =\, \ll d_{ij}(t_{<}^{(ij)})- d_{ij}(\hat{t}^{(ij)}) \gg \end{aligned}$$
(33)

which is a slightly different measurement from the winning probability \(p_{w}\). We should notice that we now consider the case with the constraint (28) and in this sense, the explicit dependences of \(p_{w}\) and \(\eta \) on \(\Omega \) are omitted in the above descriptions. We also keep in mind that \(d_{ij}(t_{<}^{(ij)})- d_{ij}(\hat{t}^{(ij)})\) takes a positive value if we make up accounts for taking the arbitrage at \(\hat{t}^{(ij)}=t_{>}^{(ij)}\). On the other hand, the \(d_{ij}(t_{<}^{(ij)})- d_{ij}(\hat{t}^{(ij)})\) becomes negative if we terminate the trading due to loss-cutting. Therefore, the above \(\eta \) denotes a total profit for a given set of the thresholds \((\theta ,\varepsilon ).\)

3.3 Algorithm

We shall list the concrete algorithm for our empirical study on the pairs trading as follows:

  1. 1.

    We collect a pair of stocks \((i,j)\) from daily data for the past one year.

  2. 2.

    Do the following procedures from \(t=0\) to \(t=\tau (=250: \,{\rm the}\, {\rm number}\, {\rm of}\,{\rm daily} \,{\rm data}\, {\rm for} \,{\rm one} \,{\rm year}).\)

    1. (a)

      Calculate \(\rho _{ij}(t)\) and \(\sigma _{i}(t), \sigma _{j}(t)\) to determine whether the pair \((i,j)\) satisfies the start condition.

      Start condition:

      • If \(\sigma _\mathrm{min} < \sigma _{i}(t), \sigma _{j}(t) < \sigma _\mathrm{max}\) and \(\rho _{ij}(t) > \rho _{0}\) and \(d(t_{<}^{(ij)}) > \theta \), go to (c).

      • If not, go to (b).

    2. (b)

      \(t \leftarrow t+1\) and back to (a).

    3. (c)

      Termination condition:

      \(t \leftarrow t+1\) and go to the termination condition.

      • If \(d_{ij}(t_{>}^{(ij)})<\varepsilon \) (we ‘win’), go to the next pairs \((k,l) \ne (i,j).\)

      • If not, go back to (c). If \(d_{ij}(t_{*}^{(ij)})> \Omega \), we ‘lose’. If \(\hat{t}^{(ij)} > \tau _\mathrm{max}\), go to 1.

Then, we repeat the above procedure for all possible pairs of 1784 stocks listed in the first section of the Tokyo Stock Exchange leading up to totally \({}_{1784}C_{2}=1,590,436\) pairs. We play our game according to the above algorithm for each pair, and if a pair \((i,j)\) passes their decision time \(\hat{t}^{(ij)}\) resulting in the profit:

$$\begin{aligned} d_{ij}(t_{<}^{(ij)})-d_{ij}(\hat{t}^{(ij)}) \quad \text{ with }\quad \hat{t}^{(ij)}=t_{>}^{(ij)} \end{aligned}$$
(34)

or the loss:

$$\begin{aligned} d_{ij}(\hat{t}^{(ij)})-d_{ij}(t_{<}^{(ij)})\quad \text{ with }\quad \hat{t}^{(ij)}=t_{*}^{(ij)}, \end{aligned}$$
(35)

we discard the pair \((i,j)\) and never ‘recycle’ the pair again for pairs trading. Of course, such treatment might be hardly accepted in realistic pairs trading because traders tend to use the same pairs as the one which gave them a profit in the past markets. Nevertheless, here we shall utilize this somewhat ‘artificial’ treatment to quantify the performance of pairs trading through the measurements \(p_{w}\) and \(\eta \) systematically. We also simplify the game by restricting ourselves to the case in which each trader always makes a trade by a unit volume. In the next section, we show several result of empirical data analysis.

4 Empirical data analysis

Here we show several empirical data analyses done for all possible pairs of 1784 stocks listed in the first section of the Tokyo Stock Exchange leading up to \({}_{1784}C_{2}=1,590,436\) pairs. The daily data sets are collected for the past four years 2009–2010 from the web site http://finance.yahoo.co.jp. In our empirical analysis, we set \(\tau =250\) [days], \(\rho _{0}=0.6, \sigma _\mathrm{min}=0.05, \sigma _\mathrm{max}=0.2.\)

4.1 Preliminary experiments

Before we show our main result, we provide the two empirical distributions for the correlation coefficients and volatilities, which might posses very useful information about selecting the active pairs. We also discuss the distribution of the first-passage time to quantify the processing time roughly.

4.1.1 Correlation coefficients and volatilities

In Fig. 1, we plot the distributions of \(\{\rho _{ij}(t)\}\) (left) and \(\{\sigma _{i}(t)\}\) (right) for the past four years (2009–2012).

Fig. 1
figure 1

Distributions of Pearson estimator \(\{\rho _{ij}(t)\}\) (left) and volatility \(\{\sigma _{i}(t)\}\) (right) for the past four years data sets: 2009, 2010, 2011 and 2012. On the other hand, the distribution of the volatility is almost independent of year and it possess a single peak around \(0.02\)

From the left panel, we find that the distribution of correlation coefficients is apparently skewed for all years and the degree of skewness in 2011 is the highest among the four due to the great east Japan earthquake as reported in Ibuki et al. (2012a, b, 2013). Actually, we might observe that most of stocks in the multidimensional scaling plane shrink to a finite restricted region due to the strong correlations.

On the other hand, the distribution of the volatility is almost independent of the year and possess a peak around \(0.02\). We are confirmed from these empirical distributions that the choice of the system parameters \(\rho _{0}=0.6, \sigma _\mathrm{min}=0.05, \sigma _\mathrm{max}=0.2\) could be justified properly in the sense that the number of pairs \((i,j)\) satisfying the criteria \(\rho _{ij}(t) <\rho _{0}\) and \(\sigma _\mathrm{min} <\sigma _{i}(t) < \sigma _\mathrm{max}\) is not a vanishingly small fraction but reasonable number of pairs (\(\sim 300\)) can remain in the system.

4.1.2 First-passage times

We next show the distributions of the first-passage times for the data set in 2010. It should be noted that we observe the duration \(t\) as a first-passage time from the point \(t_{<}\) in time axis, hence, the distributions of the duration \(t\) are given for

$$\begin{aligned} P(t)= & {} P(t \equiv t_{>}-t_{<}) \quad \text{(for } \text{win) }, \end{aligned}$$
(36)
$$\begin{aligned} P(t)= & {} P(t \equiv t_{*}-t_{<})\quad \text{(for } \text{lose) }, \end{aligned}$$
(37)

respectively. We plot the results in Fig. 2.

Fig. 2
figure 2

The empirical distributions \(P(t)\) of first-passage times \(t \equiv \hat{t}-t_{<}\). The left panel is \(P(t \equiv t_{*}-t_{<})\), whereas the right is \(P(t \equiv t_{>}-t_{<})\). We find that one confirms the lose by loss-cutting by 50 days after the start point \(t_{<}\) in most cases and the decision is disclosed at latest by 250 days after the \(t_{<}\). On the other hand, we win within several days after the start and a single peak is actually located in the short time frame

From the left panel, we find that one confirms the lose by loss-cutting by 50 days after the start point \(t_{<}\) in most cases, and the decision is disclosed at latest by 250 days after the \(t_{<}\). On the other hand, we win within several days after the start and a single peak is actually located in the short time frame. These empirical findings tell us that in most cases, the spread between highly correlated two stocks actually shrink shortly even if the two prices of the stocks exhibit ‘mis-pricing’ leading up to a large spread temporally. Taking into account this fact, our findings also imply that the selection by correlation coefficients and volatilities works effectively to make the pairs trading useful.

4.2 Winning probability

As our main results, we first show the wining probability \(p_{w}\) as a function of thresholds \((\theta ,\varepsilon )\) defined by (30) in Fig. 3. To show it effectively, we display the results as three-dimensional plots with contours. From these panels, we find that the winning probability is unfortunately less than that of the ‘draw case’ \(p_{w}=0.5\) in most choices of the thresholds \((\theta ,\varepsilon )\). We also find that for a given \(\theta ({>}\varepsilon )\), the probability \(p_{w}\) is almost a monotonically increasing function of \(\theta \) in all the three years. This result is naturally accepted because the trader might take more careful actions on the starting of the pairs trading for a relatively large \(\theta \).

Fig. 3
figure 3

Winning probability \(p_{w}\) as a function of \((\theta ,\varepsilon )\). From the top most to the bottom, the results in 2012, 2011 and 2010 are plotted. The right panels are the plots for relatively small range of thresholds \((\theta ,\varepsilon )\). From these panels, we find that the winning probability is less than that of the ‘draw case’ \(p_{w}=0.5\) in most cases of the thresholds \((\theta ,\varepsilon )\). See also Tables 1 (2012) and 2 (2011) for the details

To see the result more carefully, we write the raw data produced by our analysis in Tables 1 (2012) and 2 (2011). From these two tables, we find that relatively higher winning probabilities \(p_{w} \sim 0.7\) are observed; however, for those cases, the number of wins \(N_{w}\) (or loses \(N_{l}\)) is small, and it should be more careful for us to evaluate the winning possibility of pairs trading from those limited data sets.

4.3 Profit rate

In order to consider the result obtained by our algorithm for pairs trading from a slightly different aspect, we plot the profit rate \(\eta \) given by (33) as a function of thresholds \((\theta ,\varepsilon )\) in Fig. 4. We clearly find that for almost all of the combinations \((\theta ,\varepsilon )\), one can obtain the positive profit rate \(\eta >0\), which means that our algorithm actually achieves almost risk-free asset management and it might be a justification of the usefulness of pairs trading.

At a glance, it seems that the result of the small winning probability \(p_{w}\) is inconsistent with that of the positive profit rate \(\eta >0\). However, the result can be possible to be obtained. To see it explicitly, let us assume that the pairs \((i,j)\) and \((k,l)\) lose and the pair \((m,n)\) wins for a specific choice of thresholds \((\theta _{+},\varepsilon _{+})\). Then, the wining probability is \(p_{w}=1/3\). However, the profits for these three pairs could satisfy the following inequality:

$$\begin{aligned} d_{mn}(t_{>}^{(mn)})-d_{mn}(t_{>}^{(mn)}) > \{d_{ij}(t_{>}^{(ij)})-d_{ij}(t_{*}^{(ij)})\} + \{d_{kl}(t_{>}^{(kl)})-d_{kl}(t_{*}^{(kl)})\} \end{aligned}$$
(38)

From the definition of the profit rate (33), we are immediately conformed as

$$\begin{aligned} \eta (\theta _{+},\varepsilon _{+})&= d_{mn}(t_{>}^{(mn)})-d_{mn}(t_{>}^{(mn)}) - \{d_{ij}(t_{>}^{(ij)})-d_{ij}(t_{*}^{(ij)})\} \nonumber \\&\quad- \{d_{kl}(t_{>}^{(kl)})-d_{kl}(t_{*}^{(kl)})\} > 0. \end{aligned}$$
(39)

Hence, an active pair producing a relatively large arbitrage can compensate the loss of wrong active pairs by choosing the threshold \((\theta ,\varepsilon )\) appropriately. It might be an ideal scenario for the pairs trading.

Fig. 4
figure 4

Profit rate \(\eta \) as a function of \((\theta ,\varepsilon )\). From the upper left to the bottom, the results in 2012, 2011 and 2010 are plotted. We clearly find that for almost all of the combinations \((\theta ,\varepsilon )\), one can obtain the positive profit rate \(\eta >0\), which means that our algorithm actually achieves almost risk-free asset management and it might be a justification of the usefulness of pairs trading

Finally, we should stress that the fact \(\eta > 0\) in most cases of thresholds \((\theta ,\varepsilon )\) implies that automatic pairs trading system could be constructed by applying our algorithm for all possible \((\theta ,\varepsilon )\) in parallel. However, it does not mean that we can always obtain positive profit ‘actually’. Our original motivation in this paper is just to examine (from the stochastic properties of spreads between two stocks) how much percentage of highly correlated pairs is suitable for the candidate in pairs trading in a specific market, namely, Tokyo Stock Exchange. In this sense, our result could not be used directly for practical trading. Nevertheless, as one can easily point out, we may pare down the candidates by introducing the additional transaction cost, and even for such a case, the game to calculate the winning probability etc. by regarding the trading as a mixture of first-passage processes might be useful.

4.4 Profit rate versus volatilities

In Fig. 5, we plot the profit rate \(\eta \) against the volatilities \(\sigma \) as a scattergram only for the winner pairs.

Fig. 5
figure 5

The profit rate \(\eta \) against the volatilities \(\sigma \) as a scattergram. We set the profit-taking threshold \(\varepsilon \) as \(\varepsilon = 0.1 \theta \) and vary the starting threshold \(\theta \) in the range of \(0.1 \le \theta \le 0.3\). We find that there exist two distinct clusters (components) in the winner pairs, namely, the winner pairs giving us the profit rate typically as much as \(\eta \simeq \theta -\varepsilon =\theta -0.1\theta =0.9\, \theta \simeq 0.2\) for the range of \(0.1 \le \theta \le 0.3\), which are almost independent of \(\sigma \), and the winner pairs having the profit linearly dependent on the volatility \(\sigma \). Note that the vertical axis is shown as ‘percentage’

In this plot, we set the profit-taking threshold \(\varepsilon \) as

$$\begin{aligned} \varepsilon = 0.1 \theta \end{aligned}$$
(40)

and vary the starting threshold \(\theta \) in the range of \(0.1 <\theta < 0.3\). For each active winner pair, we observe the profit rate \(\eta \) and the average volatility \(\sigma \) of the two stocks in each pair, and plot the set \((\sigma ,\eta )\) in the two-dimensional scattergram. From this figure, we find that there exist two distinct clusters (components) in the winner pairs, namely, the winner pairs giving us the profit rate typically as much as

$$\begin{aligned} \eta \simeq \theta -\varepsilon =\theta -0.1\theta =0.9\, \theta \simeq 0.2 \end{aligned}$$
(41)

for the range of \(0.1 \le \theta \le 0.3\), which are almost independent of \(\sigma \), and the winner pairs having the profit rate linearly dependent on the volatility \(\sigma \). The former is a low-risk group, whereas the latter is a high-risk group. The density of the points for the low-risk group in Fig. 5 is much higher than that of the high-risk group. Hence, we are confirmed that our selection procedure of the active pairs works effectively to manage the asset as safely as possible by reducing the risk which usually increases as the volatility grows.

Finally, it should be noted that as we discussed in Sect. 3.1, the value \(\theta -\varepsilon \) is a lower bound of the profit rate [see Eqs. (27) and (41)]. Therefore, in the above case, the lower bound for the profit rate \(\eta \) should be estimated for \(0.1 \le \theta \le 0.3\) as

$$\begin{aligned} \eta \ge \eta _\mathrm{min}=0.9\, \theta =0.9 \times 0.1 = 0.09. \end{aligned}$$
(42)

The lowest value for the profit rate (42) is consistent with the actually observed lowest value in the scattergram shown in Fig. 5.

4.5 Examples of winner pairs

Finally, we shall list several examples of active pairs to win the game. Of course, we cannot list all of the winner pairs in this paper, hence, we here list only three pairs as examples, each of which includes SANYO SPECIAL STEEL Co. Ltd. (ID: 5481) and the corresponding partners are HITACHI METALS. Ltd. (ID: 5486), MITSUI MINING & SMELTING Co. Ltd. (ID: 5706) and PACIFIC METALS Co. Ltd. (ID: 5541). Namely, the following three pairs

$$({\tt{5481}},{\tt{5486}}), ({\tt{5481}},{\tt{5706}}),({\tt{5481}},{\tt{5541}})$$

actually won in our empirical analysis of the game. Note that each ID in the above expression corresponds to each identifier used in Yahoo! Finance http://finance.yahoo.co.jp. As we expected as an example of coca cola and pepsi cola http://en.wikipedia.org/wiki/Pairs_trade, these are all the same type of industry (the steel industry). We would like to stress that we should act with caution to trade using the above pairs because the pairs just won the game in which the pairs once got a profit in the past are never recycled in future. Therefore, we need much more extensive analysis for the above pairs to use them in practice.

5 Summary

In this paper, we proposed a very simple and effective algorithm to make the pairs trading easily and automatically. We applied our algorithm to daily data of stocks in the first section of the Tokyo Stock Exchange. Numerical evaluations of the algorithm for the empirical data set were carried out for all possible pairs by changing the starting (\(\theta \)), profit-taking (\(\varepsilon \)) and stop-loss (\(\Omega \)) conditions to look for the best possible combination of the conditions \((\theta ,\varepsilon ,\Omega )\). We found that for almost all of the combinations \((\theta ,\varepsilon )\) under the constraint \(\Omega =2\theta - \varepsilon \), one can obtain the positive profit rate \(\eta >0\), which means that our algorithm actually achieves almost risk-free asset management at least for the past three years (2010–2012) and it might be a justification of the usefulness of pairs trading. Finally, we showed several examples of active pairs to win the game. As we expected before, the pairs are all the same type of industry (for these examples, it is the steel industry). We should conclude that the fact \(\eta > 0\) in most cases of thresholds \((\theta ,\varepsilon )\) implies that automatic pairs trading system could be constructed by applying our algorithm for all possible \((\theta ,\varepsilon )\) in parallel way.

Of course, the result does not mean directly that we can always obtain positive profit in a practical pairs trading. Our aim in this paper was to examine how much percentage of highly correlated pairs is suitable for the candidate in pairs trading in a specific market, namely, Tokyo Stock Exchange. In this sense, our result could not be used directly for practical pairs trading. Nevertheless, we may pare down the candidates by introducing the additional transaction cost, and even for such a case, the game to calculate the winning probability etc. by regarding the trading as a mixture of first-passage processes might be useful.

We are planning to consider pairs listed in different stock markets, for instance, one is in Tokyo and the other is in NY. Then, of course, we should also consider the effect of the exchange rate. Those analyses might be addressed as our future study.