1 Introduction

This paper addresses asset pricing on the basis of high-frequency data. We propose a model of a single asset’s rate of return that modifies the Fama and French (1993) asset pricing model by adding an ‘order imbalance factor’. Order imbalance is understood as a temporary imbalance between buy and sell orders. Estimating it first requires identifying which side of the market initiated each transaction. The study employs the original imbalance indicators proposed by Nowak (2014).

The empirical analysis covers selected stocks, all constituents of the WIG20 index, listed on the Warsaw Stock Exchange (WSE) over the period 2000 to 2016. The research hypothesis states that including in the asset pricing model an explanatory variable reflecting the imbalance on the single asset’s market improves the descriptive properties of the model.

The model is validated using, among others, the underidentification and weak identification tests (Kleibergen and Paap 2006), the overidentification test of all instruments (Hansen 1982) and the endogeneity test of the endogenous regressor.

The remainder of the paper is organized as follows. Section 2 provides a brief literature review. Section 3 defines order imbalance and describes the different methods of classifying transactions as buyer- or seller-initiated. It also recalls the original imbalance indicators presented in the author’s previous work (Nowak 2014). Section 4 introduces the proposed three-factor asset pricing model, comprising the market factor, the factor reflecting the size of the company and the order imbalance factor. Section 5 presents the empirical research methodology and procedure. Section 6 reports the major results, briefly summarizes the study and indicates areas for future research.

2 Brief Literature Review

The contributions of the paper are twofold. First, it contributes to the literature on market microstructure, since the temporary imbalance between buy and sell orders plays an important role in the stock price formation process. Second, it contributes to the broad strand of literature on asset pricing.

The relation between order imbalance and the rates of return of individual stocks listed on the NYSE in the period 1988–1998 was discussed by Chordia and Subrahmanyam (2004). The authors found a significant impact of both lagged and contemporaneous order imbalances on daily rates of return in the short-term horizon. They emphasized the accuracy of order imbalance as a measure of trading activity, as opposed to volume, which ‘alone is absolutely guaranteed to conceal some important aspects of trading’ (Chordia et al. 2002). It is pertinent to note that both papers mentioned above employ aggregate market-wide order imbalance measures. In contrast, we propose order imbalance indicators estimated for each stock individually. All these papers nonetheless belong to the market microstructure literature based on high-frequency data and reflect the contribution of order imbalance to the stock price formation process.

The literature on multifactor asset pricing models is large and comprehensive. Nonetheless, Fama and French argue in one of their most recent works (Fama and French 2015) that there is still room for extensions: the authors add two new factors to the well-known three-factor model, reflecting respectively the profitability and the investment opportunities of the company. Encouraged by their findings and their discussion of previous results [see Fama and French (2015) and the references therein], we propose an original approach based on high-frequency intraday data and on the concept of order imbalance as an innovative factor of the company’s activity and performance.

3 Definition of Order Imbalance

The literature describes two major reasons for stock price changes: the arrival of new information, either public or private, on the financial market, and the temporary imbalance between buy and sell orders. The latter reason is frequently ignored: some academic researchers argue that there is no imbalance (in volume), since the volume bought by some traders is always equal to the volume sold by others (Hopman 2007). Thus, in order to measure the imbalance existing on the market, the side initiating each transaction has to be identified and a distinction has to be made between the so-called buyer- and seller-initiated trades (Lee and Ready 1991; Ellis et al. 2000). Counting the number of buyer- and seller-initiated transactions makes it possible to indicate which side of the market, the buyers or the sellers, is more eager to trade. It is worth noting that such a distinction is useful only on markets where limit orders predominate. Many submitted orders then remain unexecuted, hence the existence of an imbalance (in volume) is understood in terms of submission, and not execution (Nowak 2014).

Four main trade classification rules are discussed in the literature: the quote rule, the tick rule, the Lee-Ready rule (Lee and Ready 1991) and the Ellis et al. rule (Ellis et al. 2000). These rules make it possible to distinguish between buyer- and seller-initiated trades. They were described in detail and compared with each other by Nowak (2014). The same paper proposed three groups of original imbalance indicators (see the summary in Table 1).

Table 1. Imbalance indicators proposed by Nowak (2014)

The imbalance indicators within group I reflect, respectively: the ratio of the number of transactions initiated by buyers and sellers to the total number of executed trades (imb1), the ratio of the number of buyer-initiated transactions to the number of trades initiated by either buyers or sellers (imb2), and the ratio of the difference between buyer- and seller-initiated transactions to the number of trades initiated by either buyers or sellers (imb3). The indicators within groups II and III were constructed analogously, referring respectively to the volume and the value of trades. Computing the ratios required counting, for each day, the transactions initiated by each side of the market as well as those left unclassified.
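The group I definitions can be illustrated with a short Python sketch. It reads the verbal definitions above literally; the function name, argument names and the example counts are hypothetical and are not taken from Nowak (2014).

```python
def group_one_indicators(n_buy, n_sell, n_unclassified):
    """Return (imb1, imb2, imb3) for one trading day from trade counts.

    Assumption: 'executed trades' = buyer-initiated + seller-initiated
    + unclassified trades, following a literal reading of the text.
    """
    classified = n_buy + n_sell
    total = classified + n_unclassified
    if classified == 0:
        raise ValueError("at least one classified trade is required")
    imb1 = classified / total            # classified trades in all trades
    imb2 = n_buy / classified            # buyer share among classified trades
    imb3 = (n_buy - n_sell) / classified # net buyer pressure among classified
    return imb1, imb2, imb3


# Hypothetical day: 60 buyer-initiated, 20 seller-initiated, 20 unclassified.
imb1, imb2, imb3 = group_one_indicators(60, 20, 20)
```

Replacing the counts with traded volumes or trade values would give the analogous indicators of groups II and III under the same reading.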

The side initiating a transaction was identified according to the quote rule, which classifies trades by comparing the transaction price with the mid-point price at time t. The transaction price was approximated by the close price. The mid-point price was calculated as the arithmetic mean of the lowest price \( {P}_t^l \) and the highest price \( {P}_t^h \), which served as approximations for the best ask price \( P_t(a) \) and the best bid price \( P_t(b) \), respectively. Trades with transaction prices higher (lower) than the mid-point price were classified as buyer- (seller-) initiated. Trades executed at the mid-point price were not classified.
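The quote rule as described above can be sketched in a few lines of Python. The function name and the example prices are hypothetical; the sketch only mirrors the comparison of the (close) transaction price with the mid-point of the lowest and highest prices.

```python
def classify_quote_rule(price, low, high):
    """Classify one trade by the quote rule.

    price: transaction price (approximated by the close price in the text)
    low, high: lowest and highest prices used to build the mid-point
    Returns "buy", "sell", or None when the trade is at the mid-point.
    """
    mid = (low + high) / 2.0
    if price > mid:
        return "buy"    # buyer-initiated
    if price < mid:
        return "sell"   # seller-initiated
    return None         # executed at the mid-point: left unclassified


# Hypothetical prices, for illustration only.
side_above = classify_quote_rule(price=11.0, low=9.0, high=11.0)
side_below = classify_quote_rule(price=9.5, low=9.0, high=11.0)
side_mid = classify_quote_rule(price=10.0, low=9.0, high=11.0)
```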

4 The Three-Factor Asset Pricing Model with Imbalance Indicator

The research hypothesis is verified on the basis of the model for the rate of return specified on the grounds of three theories: asset pricing theory, efficient market hypothesis and rational expectations theory. The general form of the model can be written as follows:

$$ {R}_{i, t+1}-{R}_{f, t+1}={\gamma}_0+{\boldsymbol{\gamma}}^{\prime}{\mathbf{Y}}_{t+1}+{\boldsymbol{\alpha}}^{\prime}{\mathbf{Z}}_{i t}+{\varepsilon}_{i, t+1}, $$
(1)

where \( R_{i,t+1} \) is the rate of return on asset i at time t+1 (t = 1, 2, …, T−1), \( R_{f,t+1} \) is the risk-free rate of return, \( \mathbf{Y}_{t+1} \) is the vector of variables reflecting factors proposed by asset pricing theory and referring to the market as a whole, \( \mathbf{Z}_{it} \) is the vector of variables reflecting factors proposed by the efficient market theory and referring to the individual risky asset, \( \gamma_0 \) is the intercept, \( \boldsymbol{\gamma} \) and \( \boldsymbol{\alpha} \) are vectors of parameters, and \( \varepsilon_{i,t+1} \) is the error term. Since daily risk-free rates of return were unavailable for the whole examined period, the \( R_{f,t+1} \) variable was ultimately omitted.

The model is based on a modified version of the Fama and French (1993) model; therefore, the vector \( \mathbf{Y}_{t+1} \) contains two variables: the market factor and the size factor. The first is approximated by the rate of return of the market portfolio, whereas the second is reflected by the difference between the rates of return of diversified portfolios built of big and small companies.

The third factor in the model belongs to the vector \( \mathbf{Z}_{it} \), refers to the individual asset and reflects the magnitude of the imbalance (disequilibrium) on the single asset’s market on the WSE. To measure this magnitude, the original indicators imb1–imb9 described in Sect. 3 can be employed. As pointed out by Nowak (2014), the average values of the following pairs of indicators: imb4 and imb7, imb5 and imb8, imb6 and imb9 turned out to be at very similar levels, which led to the conclusion that they may contain the same information. For this reason the imb7, imb8 and imb9 indicators were dropped. Moreover, considering that only the first ratio within each group incorporates all the trades executed, it appeared reasonable to apply only the imb1 and imb4 indicators in the research.

However, both the imb1 and imb4 variables turned out to be nonstationary and could not be used directly in the model. Therefore, a ‘day with imbalance’ on the single asset’s market was defined as a day on which the share of the number (volume) of trades initiated by buyers and sellers in the total number (volume) of trades exceeded 1%, 2.5%, 5%, 10% and 20%, respectively. Such a situation was reflected by the dummy variable lxx_1 (lxx_4), taking the value 1 on a ‘day with imbalance’ and the value 0 otherwise, namely:

$$ \begin{array}{l} l10\_1=\left\{\begin{array}{l}1\ \mathrm{when}\ imb1>0.01\\ {}0\ \mathrm{when}\ imb1\le 0.01\end{array}\right.,\kern1em l25\_1=\left\{\begin{array}{l}1\ \mathrm{when}\ imb1>0.025\\ {}0\ \mathrm{when}\ imb1\le 0.025\end{array}\right.,\\ {} l50\_1=\left\{\begin{array}{l}1\ \mathrm{when}\ imb1>0.05\\ {}0\ \mathrm{when}\ imb1\le 0.05\end{array}\right.,\kern1em l100\_1=\left\{\begin{array}{l}1\ \mathrm{when}\ imb1>0.1\\ {}0\ \mathrm{when}\ imb1\le 0.1\end{array}\right.,\\ {} l200\_1=\left\{\begin{array}{l}1\ \mathrm{when}\ imb1>0.2\\ {}0\ \mathrm{when}\ imb1\le 0.2\end{array}\right..\end{array} $$

Dummy variable lxx_4 was constructed analogously, basing on indicator imb4.
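The threshold dummies defined above can be generated mechanically. The following Python sketch assumes a daily indicator value is already available; the helper name and the example value 0.07 are hypothetical.

```python
# Thresholds taken from the definitions of l10, l25, l50, l100 and l200.
THRESHOLDS = {"l10": 0.01, "l25": 0.025, "l50": 0.05, "l100": 0.1, "l200": 0.2}


def imbalance_dummies(imb_value, suffix):
    """Return the five dummies for one day.

    imb_value: daily value of imb1 (suffix=1) or imb4 (suffix=4)
    A dummy equals 1 when the indicator strictly exceeds its threshold.
    """
    return {
        f"{name}_{suffix}": int(imb_value > cut)
        for name, cut in THRESHOLDS.items()
    }


# Hypothetical day with imb1 = 0.07: above 1%, 2.5% and 5%, below 10% and 20%.
dummies = imbalance_dummies(0.07, suffix=1)
```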

Consequently, model (1) was rewritten as

$$ {r}_{i, t+1}={\gamma}_0+{\gamma}_1{lwig}_{t+1}+{\gamma}_2{lbs}_{t+1}+\alpha \cdot {I}_{i t}+{\varepsilon}_{i, t+1}, $$
(2)

where \( r_{i,t+1} \) is the logarithmic rate of return on the i-th asset at time t+1, \( lwig_{t+1} \) is the logarithmic rate of return on the market portfolio, reflected by the WIG index, \( lbs_{t+1} \) is the difference between the logarithmic rates of return of diversified portfolios built of big and small companies, approximated by the WIG20 and sWIG80 indices respectively, \( I_{it} \) is the variable reflecting the imbalance on the i-th asset’s market at time t, defined as a dummy lxx_1 (lxx_4), \( \gamma_0, \gamma_1, \gamma_2, \alpha \) are structural parameters, and \( \varepsilon_{i,t+1} \) is the disturbance term.

5 Research Methodology and Procedure

Because model (2) is an errors-in-variables model, its parameters were estimated by the method of instrumental variables (IV). Additionally, the Newey-West heteroskedasticity and autocorrelation consistent covariance matrix estimator was used (with the Bartlett kernel function).
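For reference, a standard textbook form of the Newey-West estimator with Bartlett weights is shown below. The notation is introduced here only for illustration: \( \mathbf{z}_t \) denotes the vector of instruments, \( \hat{e}_t \) the IV residual, and L the lag truncation, whose value is not reported in the text.

$$ \hat{\mathbf{S}}={\hat{\boldsymbol{\Gamma}}}_0+\sum_{j=1}^{L}{w}_j\left({\hat{\boldsymbol{\Gamma}}}_j+{\hat{\boldsymbol{\Gamma}}}_j^{\prime}\right),\kern1em {w}_j=1-\frac{j}{L+1},\kern1em {\hat{\boldsymbol{\Gamma}}}_j=\frac{1}{T}\sum_{t=j+1}^{T}{\mathbf{z}}_t{\hat{e}}_t{\hat{e}}_{t-j}{\mathbf{z}}_{t-j}^{\prime} $$

The linearly declining weights \( w_j \) keep the estimated matrix positive semi-definite, which is the usual motivation for choosing the Bartlett kernel.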

For the variable \( I_{it} \) at least two sources of error can be indicated:

  1. The side initiating the transaction remains unknown, and identifying it with the chosen trade classification rule makes the variables imb1 and imb4 random.

  2. The variables imb1 and imb4 are based on high-frequency data which, however, do not reflect all the transactions, but are ‘rounded to the nearest second’.

The variables \( I_{i,t-1} \) and \( I_{i,t-2} \) were used as instrumental variables.
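The estimation set-up can be sketched as a plain two-stage least squares (2SLS) computation in NumPy. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the function names are hypothetical, only point estimates are produced (no Newey-West covariance matrix, no diagnostic tests), and the demonstration uses simulated continuous data rather than WSE series.

```python
import numpy as np


def align_for_model(r, lwig, lbs, imb_dummy):
    """Align daily series so that each row holds r[t+1], lwig[t+1], lbs[t+1],
    the regressor I[t] and the two instruments I[t-1], I[t-2]."""
    r, lwig, lbs, d = (np.asarray(a, dtype=float)
                       for a in (r, lwig, lbs, imb_dummy))
    y = r[3:]
    x_exog = np.column_stack([lwig[3:], lbs[3:]])
    x_endog = d[2:-1]
    z_excl = np.column_stack([d[1:-2], d[:-3]])
    return y, x_exog, x_endog, z_excl


def two_stage_least_squares(y, x_exog, x_endog, z_excl):
    """2SLS point estimates, ordered as: constant, exogenous regressors,
    endogenous regressor."""
    const = np.ones(len(y))
    X = np.column_stack([const, x_exog, x_endog])   # all regressors
    Z = np.column_stack([const, x_exog, z_excl])    # full instrument set
    # First stage: project the regressors on the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated check (hypothetical data): the shock u enters both the
# regressor x and the error, so OLS would be biased while 2SLS is not.
rng = np.random.default_rng(0)
n = 20000
z = rng.normal(size=(n, 2))          # excluded instruments
w = rng.normal(size=(n, 2))          # exogenous regressors
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + u + rng.normal(size=n)
y = 0.1 + w @ np.array([1.0, -0.5]) + 0.3 * x + u + rng.normal(size=n)
beta = two_stage_least_squares(y, w, x, z)
```

With real data, `align_for_model` would be applied first to the daily return, index and dummy series, and its four outputs passed to `two_stage_least_squares`.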

The research sample covers 10 selected stocks listed on the WSE and included in the WIG20 index in the period from November 17, 2000 to June 30, 2016. The intra-day data on executed transactions are ‘rounded to the nearest second’ and available at www.bossa.pl. The stocks were selected on the basis of three criteria: membership of the WIG20 index, liquidity and uninterrupted trading in the research period.

The list of shares with the number of intra-day and daily observations is presented in Table 2.

Table 2. Research sample

The first stage of the research was based on the intra-day data. The side of the market initiating each transaction was identified and the values of the original imbalance indicators imb1–imb9 were estimated. In the second stage the daily data were employed. First, the nonstationarity of the imb1 and imb4 variables was demonstrated. Consequently, the dummy variables lxx_1 and lxx_4 were calculated. Subsequently, model (2) was estimated using the IV method. The following three versions of the dependent variable, the logarithmic rate of return \( r_{i,t+1} \), were applied:

  1. The rate of return calculated for close prices: \( r_{1i,t+1} = \mathit{close}_{i,t+1} - \mathit{close}_{it} \)

  2. The ‘intra-day’ rate of return: \( r_{2i,t+1} = \mathit{close}_{i,t+1} - \mathit{open}_{i,t+1} \)

  3. The ‘night’ rate of return: \( r_{3i,t+1} = \mathit{open}_{i,t+1} - \mathit{close}_{it} \).

The results of the estimation and validation of model (2) with the imbalance variable based on imb1, calculated for the KGHM share, are summarized in Table 4 in the Appendix. The values of the following statistics, with the corresponding p-values, are reported under points 1–8 of Table 4:
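The three versions of the dependent variable can be computed as follows. The sketch assumes that close and open in the formulas above denote logarithms of prices, which is what makes the differences logarithmic rates of return; the function name and the example prices are hypothetical.

```python
import math


def three_log_returns(close_prev, open_next, close_next):
    """Return (r1, r2, r3) for day t+1 from price levels.

    Assumption: the formulas in the text are differences of log prices.
    """
    r1 = math.log(close_next) - math.log(close_prev)  # close-to-close
    r2 = math.log(close_next) - math.log(open_next)   # 'intra-day'
    r3 = math.log(open_next) - math.log(close_prev)   # 'night'
    return r1, r2, r3


# Hypothetical prices, for illustration only.
r1, r2, r3 = three_log_returns(close_prev=100.0, open_next=101.0, close_next=99.0)
```

Under this assumption the close-to-close return is the sum of the ‘night’ and ‘intra-day’ returns, so the three versions decompose one daily return rather than measure unrelated quantities.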

  1. Kleibergen-Paap underidentification test, under H 0 (the excluded instruments are not correlated with the endogenous regressor, in other words: the equation is underidentified), distributed as \( \chi^2(2) \) (Kleibergen and Paap 2006)

  2. Kleibergen-Paap weak identification test, under H 0 (correlations between the excluded instruments and the endogenous regressors are nonzero but ‘weak’), reported as the Wald F-statistic F(2, 3904) (Baum et al. 2007)

  3. Sargan-Hansen test (test of overidentifying restrictions), under H 0 (the instruments are uncorrelated with the error term and the excluded instruments are correctly excluded from the estimated equation), distributed as \( \chi^2(1) \)

  4. Endogeneity test for the endogenous regressor \( I_{it} \), under H 0 (the endogenous regressor can be treated as exogenous), distributed as \( \chi^2(1) \)

  5. Cumby and Huizinga test of first-order autocorrelation, under H 0 (the regression error has no first-order autocorrelation), distributed as \( \chi^2(1) \) (Cumby and Huizinga 1992)

  6. Cumby and Huizinga test of up to fifth-order autocorrelation, under H 0 (the regression error has no autocorrelation up to the fifth order), distributed as \( \chi^2(5) \)

  7. Heteroskedasticity and autocorrelation robust Pesaran-Taylor RESET test, under H 0 (there are no neglected nonlinearities in the choice of the functional form), distributed as \( \chi^2(1) \) (Pesaran and Taylor 1999)

  8. Pagan-Hall test, under H 0 (the disturbance term is homoskedastic), distributed as \( \chi^2(4) \) (Pagan and Hall 1983).

6 Empirical Results

The main results of empirical research can be summarized as follows.

  1. In total, 300 models were estimated: 30 for each share, with 150 models containing a dummy variable based on the imb1 indicator and 150 containing a dummy variable based on the imb4 indicator.

  2. The choice of the IV estimation method was appropriate according to the results of both Kleibergen-Paap tests. For the overwhelming majority of regressions the null hypothesis of underidentification was rejected (the exceptions being the models based on the l200_1 variable built for 3 assets: ORANGEPL, PGNiG and PKOBP).

  3. The instruments in the IV estimation were chosen properly. In the Sargan-Hansen test the null hypothesis was rejected for only 7 (4) models including a dummy variable based on the imb1 (imb4) indicator. However, in the endogeneity test for \( I_{it} \), the null hypothesis was rejected for 13 (18) models, respectively.

  4. In the light of the results of the Pesaran-Taylor RESET test, the choice of the functional form of the equation was appropriate for the majority of regressions. The null hypothesis was rejected for only 14 (12) models based on the imb1 (imb4) indicator.

  5. For about 50% of the estimated regressions the null hypothesis in the Cumby and Huizinga test of first-order (up to fifth-order) autocorrelation was rejected. The null hypothesis of no first-order autocorrelation was rejected for 70 (79) models built on the basis of the imb1 (imb4) indicator, whereas the null hypothesis of no autocorrelation up to the fifth order was rejected for 88 (82) models.

  6. For about 60% of regressions the null hypothesis in the Pagan-Hall test was rejected. The disturbance term was not homoskedastic in 91 (94) models including a dummy based on the imb1 (imb4) indicator.

  7. The market factor turned out to be statistically significant in 145 (150) models based on the imb1 (imb4) indicator.

  8. The size factor was statistically significant in 121 (125) models including a dummy variable based on the imb1 (imb4) indicator.

  9. The variable reflecting the imbalance on the single asset’s market was statistically significant in a considerably smaller number of cases. The number of such models is presented in Table 3.

Table 3. The number of models (2) with significant variable reflecting the imbalance

The imbalance factor was most frequently statistically significant in the models estimated for the ‘night’ rate of return \( r_{3i,t+1} \) (36 and 42 cases with a dummy variable based on the imb1 and imb4 indicator, respectively), and less frequently for the ‘intra-day’ rate of return (19 and 27 cases). In contrast, in all models describing the rate of return calculated on the basis of close prices, \( r_{1i,t+1} \), the imbalance factor turned out to be statistically insignificant. Comparing the findings for the models based on the imb1 and imb4 indicators, the variables constructed on the basis of imb4 were more frequently statistically significant (69 cases, 46%) than those constructed on the basis of imb1 (55 cases, 37%).

In summary, there is no reason to reject the research hypothesis that including in the asset pricing model an additional explanatory variable reflecting the order imbalance on the single asset’s market improves the descriptive properties of the model. This conclusion, however, applies only to the ‘intra-day’ rate of return and the ‘night’ rate of return calculated for stocks listed on the WSE in the period 2000–2016. For the model describing the rate of return calculated on the basis of close prices, the research hypothesis was rejected.

Further investigation will concern the predictability of WSE stock market returns on the basis of order imbalance information. The research hypothesis will refer to the good predictive properties of the asset pricing model including the imbalance indicator. Those properties will be assessed on the basis of traditional measures of forecast accuracy and the results of selected statistical tests, including the Diebold and Mariano (1995) test, the Pesaran and Timmermann (1992) test and the forecast encompassing test.
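The counts quoted above can be cross-checked with a few lines of Python. The breakdown of the 150 models per indicator into 10 stocks, 3 return versions and 5 thresholds is inferred from the description of the research design, not stated as a single formula in the text.

```python
# Models with a significant imbalance dummy, as reported in the text.
significant = {
    "imb1": {"close": 0, "intra-day": 19, "night": 36},
    "imb4": {"close": 0, "intra-day": 27, "night": 42},
}

# Inferred design: 10 stocks x 3 return versions x 5 thresholds.
models_per_indicator = 10 * 3 * 5

totals = {name: sum(cases.values()) for name, cases in significant.items()}
shares = {
    name: round(100 * total / models_per_indicator)
    for name, total in totals.items()
}

print(totals)  # {'imb1': 55, 'imb4': 69}
print(shares)  # {'imb1': 37, 'imb4': 46}
```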