
1 Introduction

Stock portfolio selection is generally treated as the problem of estimating the volatility of log returns, i.e. estimating a variance-covariance matrix [1]. Defining a portfolio, however, involves problems of a different nature. First, it is necessary to take the entire set of shares and identify, on the basis of a ranking, the subset from which the shares to be included in the portfolio are selected. Then the volatility matrix, and hence the investment risk, is estimated. Finally, the portfolio is defined according to the information obtained in the first two phases. The first problem is addressed in this work by proposing a selection criterion based on the differences between the log intrinsic value of the shares and their log returns. The volatility matrix of the selected shares is then estimated through the combined use of the Cointegrated Vector Autoregressive (CVAR) model [2, 5, 7] and the Baba-Engle-Kraft-Kroner (BEKK) model [3]. Once the volatility matrix has been estimated, the proportions in which each selected share is to be purchased are established by solving a quadratic optimization problem, i.e. the portfolio is selected.
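The ranking step of the selection criterion can be sketched as follows. The sketch assumes that one-step predictions of the log intrinsic value and of the log return are already available for each share; Section 2 describes how the paper obtains them with CVAR models. `ShareForecast` and `select_best_shares` are hypothetical names. The filter that keeps only shares with a positive difference and a positive predicted log return reflects our reading of the retention rule stated in Section 2.

```python
from dataclasses import dataclass


@dataclass
class ShareForecast:
    """Hypothetical container for the predictions of one share."""
    ticker: str
    log_intrinsic_value: float  # predicted log intrinsic value
    log_return: float           # predicted log return


def select_best_shares(forecasts: list[ShareForecast], n: int) -> list[str]:
    """Return the tickers of at most n "best" shares.

    A share is eligible only if both its predicted difference
    (log intrinsic value minus log return) and its predicted log return
    are positive; eligible shares are ranked by the difference, largest first.
    """
    eligible = [
        f for f in forecasts
        if f.log_intrinsic_value - f.log_return > 0 and f.log_return > 0
    ]
    eligible.sort(key=lambda f: f.log_intrinsic_value - f.log_return, reverse=True)
    return [f.ticker for f in eligible[:n]]
```

For example, a share with the largest difference but a negative predicted log return is excluded. The remaining shares are returned in decreasing order of their difference.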

2 Selection of Shares and Estimation of the Volatility Matrix

The need to select a (possibly large) subset of n shares for inclusion in the portfolio on the basis of their log returns stems from the size of the market. The set of available shares is so large that it would be impossible to define a portfolio through a simultaneous study of all their log returns. It therefore makes sense to concentrate on the subset of the “best” shares, defined here as those with the greatest difference between log intrinsic value and log return. The number n of shares to be selected is established by means of a criterion (the efficient frontier). For every fixed n, this criterion identifies the combinations of shares that offer each level of return at minimum risk, and these combinations differ in the quantity of each share to be purchased. The first problem to be addressed in the selection of shares is therefore the calculation of the difference between the log intrinsic value of a share and its log return. This is, however, a prediction problem, because the quantity of interest is the future value of the difference between the two magnitudes. It is this future value that makes it possible to decide which shares to select at the moment of investment. A model that makes this prediction possible must therefore be estimated. A CVAR(p) model is adopted to estimate both the log intrinsic value of the share and its log return. The starting point is the set of $K=150$ series of log returns $R_{k,t}$ on the shares, together with the average log return of the market $R_{M,t}$, $t=t_k,\dots,T$, $k=1,\dots,K$. For each series, the CVAR(p) model is considered for the random vector $y_t=[y_{1,t},y_{2,t}]'=[R_{k,t},R_{M,t}]'$ as

$$ \Delta y_t = \eta_t + \varPi y_{t-1} + A_1 \Delta y_{t-1}+\cdots+A_{p-1} \Delta y_{t-p+1} +u_t \tag{1} $$

where $\Delta$ is the usual difference operator and $\eta_t=\eta_0+\eta_1 t$. The $A_i$, $i=1,\dots,p-1$, are $2\times 2$ coefficient matrices, and $\varPi$ is the matrix of parameters containing information on the cointegration of the series [5]. The unknown coefficients are $\eta_0$, $\eta_1$, $A_i$ and $\varPi$. Finally, $u_t=[u_{1,t},u_{2,t}]'$ is the vector of errors, with $u_t \sim N(0,\Sigma_u)$. The CVAR(p) model is chosen to estimate the unknown coefficients of Eq. (1) because it allows for the possible presence of integration or cointegration between the two components of the random vector $y_t$. When the series present neither cointegration nor integration, Eq. (1) is not informative and a Vector Autoregressive (VAR) model must be estimated instead. The same procedure is adopted to estimate the model for the log intrinsic value. This model uses not only the historical series of log intrinsic value for the period under examination but also the average log intrinsic value of the sector of economic activity to which the share belongs. Once the two magnitudes have been predicted for every share $i$, their differences are computed. Only the shares for which both the difference and the log return are positive are retained. With the number n of shares to be selected initially set at 10, the volatility matrix is estimated element by element. In particular, a two-step procedure is adopted to estimate the variance of the log return of share $i$ (element $i$ on the main diagonal of the matrix). First, a CVAR(p) model is estimated on two historical series: the log return of share $i$ itself and the log return of the market index [7]. Second, an ARCH test [5] is carried out on the residuals of the CVAR model estimated in the first step. If it indicates the presence of heteroscedasticity, a BEKK model is applied to the same residuals, which makes it possible to interpret the temporal dynamics of the variance of the log return of share $i$ [3].
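Eq. (1) is the prediction model used throughout this section. For given coefficient values, the one-step prediction of $y_t$ can be sketched as below, with $u_t$ set to its zero mean. This sketch does not estimate the coefficients; in practice that would be done with a reduced-rank maximum-likelihood procedure of the Johansen type. The function name `cvar_one_step` and the numerical values in the usage note are hypothetical.

```python
import numpy as np


def cvar_one_step(y_hist, eta0, eta1, Pi, A_list, t):
    """Predict y_t from Eq. (1) given coefficients, with u_t set to its mean (zero).

    y_hist : (p, 2) array with y_{t-p}, ..., y_{t-1}, oldest first
    eta0, eta1 : length-2 arrays defining the trend eta_t = eta0 + eta1 * t
    Pi : (2, 2) matrix multiplying y_{t-1}
    A_list : [A_1, ..., A_{p-1}], each a (2, 2) matrix (empty when p = 1)
    t : time index used in the trend term
    """
    y_hist = np.asarray(y_hist, dtype=float)
    p = len(A_list) + 1
    if y_hist.shape != (p, 2):
        raise ValueError("y_hist must hold exactly p past observations of y")
    y_lag = y_hist[-1]                    # y_{t-1}
    dy = np.diff(y_hist, axis=0)          # rows: Δy_{t-p+1}, ..., Δy_{t-1}
    delta = (np.asarray(eta0, dtype=float) + np.asarray(eta1, dtype=float) * t
             + np.asarray(Pi, dtype=float) @ y_lag)
    for i, A in enumerate(A_list, start=1):
        delta = delta + np.asarray(A, dtype=float) @ dy[-i]   # A_i Δy_{t-i}
    return y_lag + delta                  # y_t = y_{t-1} + Δy_t
```

Consider $p=2$ with zero trend, $A_1=0.1I$ and $\varPi=\alpha\beta'$, where $\alpha=[-0.5,0]'$ and $\beta=[1,-1]'$. The term $\varPi y_{t-1}$ then pulls the share's log return toward the market's. Starting from $y_{t-2}=[0.01,0.01]'$ and $y_{t-1}=[0.02,0.01]'$, the prediction is $[0.016,0.01]'$.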
The off-diagonal elements of the volatility matrix are the covariances between the log returns of two shares $i$ and $j$. They are estimated with the same procedure, except that the CVAR model is applied to the series of log returns of the two shares. Here too, if the ARCH test indicates the presence of heteroscedasticity, a BEKK model is estimated. It can be shown that the two-step procedure converges asymptotically to the simultaneous estimation of the elements of the matrix [5]. Moreover, the estimate of the entire volatility matrix is obtained asymptotically as a composition of consistent and increasingly efficient estimates [5], so it shares these properties. The size n of the portfolio is then increased iteratively until all the “best” shares are included in the portfolio.
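The second step of the procedure is applied to the residuals of a bivariate CVAR: share $i$ with the market for a diagonal element, or shares $i$ and $j$ for an off-diagonal one. It can be sketched in two parts. The first is Engle's ARCH Lagrange-multiplier test, which compares $T R^2$ from a regression of squared residuals on their own lags with a $\chi^2_q$ critical value. The second is the BEKK(1,1) recursion $H_t = CC' + A'u_{t-1}u_{t-1}'A + B'H_{t-1}B$. This is a generic version of both techniques, not necessarily the exact variants in [3, 5]. Several choices are assumptions of the sketch:

* the (1,1) orders;
* testing each residual column separately;
* the default q=4 and the tabulated 5% critical values;
* the sample covariance as starting value and as fallback when no ARCH effect is found;
* all function names.

Maximum-likelihood estimation of C, A and B is omitted, so the parameters must be supplied.

```python
import numpy as np

# Tabulated 5% critical values of the chi-squared distribution (df = 1..5).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def arch_lm_test(resid, q=4):
    """Engle's ARCH LM test on a 1-D residual series.

    Regresses e_t^2 on a constant and e_{t-1}^2, ..., e_{t-q}^2 by OLS and
    returns (LM, reject), where LM = T * R^2 is asymptotically chi-squared
    with q degrees of freedom under the null of no ARCH effects.
    """
    if q not in CHI2_95:
        raise ValueError("this sketch only tabulates q = 1..5")
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[q:]
    T = len(y)
    # Column j - 1 holds e_{t-j}^2, aligned with y = e_t^2.
    lags = np.column_stack([e2[q - j:len(e2) - j] for j in range(1, q + 1)])
    X = np.column_stack([np.ones(T), lags])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1.0 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    lm = T * r2
    return float(lm), bool(lm > CHI2_95[q])


def bekk11_filter(resid, C, A, B, H0=None):
    """Conditional covariances of a BEKK(1,1) model for given parameters:

        H_t = C C' + A' u_{t-1} u_{t-1}' A + B' H_{t-1} B

    resid : (T, k) residuals u_t; C lower triangular; A, B are (k, k).
    Returns a (T, k, k) array; H0 defaults to the sample covariance.
    """
    u = np.asarray(resid, dtype=float)
    T, k = u.shape
    H = np.empty((T, k, k))
    H[0] = np.cov(u, rowvar=False) if H0 is None else H0
    CC = C @ C.T
    for t in range(1, T):
        outer = np.outer(u[t - 1], u[t - 1])
        H[t] = CC + A.T @ outer @ A + B.T @ H[t - 1] @ B
    return H


def second_step(resid, C, A, B, q=4):
    """Run the BEKK recursion only if some residual column shows ARCH effects;
    otherwise return the constant sample covariance for every t."""
    u = np.asarray(resid, dtype=float)
    if any(arch_lm_test(u[:, j], q)[1] for j in range(u.shape[1])):
        return bekk11_filter(u, C, A, B)
    return np.repeat(np.cov(u, rowvar=False)[None, :, :], len(u), axis=0)
```

For the pair of shares $i$ and $j$, `H[t][0, 1]` is the covariance entering the off-diagonal element of the volatility matrix. For the pair (share $i$, market), `H[t][0, 0]` is the variance of share $i$.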

3 Results and Conclusions

The proposed model was applied to the 150 highest-value shares on the Italian stock market. The maximum number of shares in the portfolio proved to be 25. Figure 1 (top left) shows the volatility and the log return obtained by solving the Markowitz optimization problem as the expected return $R_{p,T+1}$ and the portfolio size n vary ($n=10,11,\dots,n_{\max}$). The portfolio risk tends to decrease as n increases [6]. Figure 1 (top right) shows the efficient frontiers obtained by solving the same Markowitz problem for varying $R_{p,T+1}$ and n; each frontier is the part of its curve continuing upward from X. The optimal portfolio from a risk-averse standpoint corresponds to n=25, i.e. to the point furthest to the left in Fig. 1 (top right), marked F. The portfolio thus identified has an average monthly return of 0.00993, a monthly standard deviation of 0.0630 and a Sharpe ratio of 0.15771. Figure 1 (bottom) presents the estimates of the elements of the volatility matrix. It shows that the risk is mostly due to the variances of the shares, which correspond to the highest peaks. For some subsets of shares, however, the values of variance and covariance are comparable. To reduce computational complexity, it could therefore be useful to take covariances into account only for specific subgroups of shares and to use variances alone for the others. A criterion is needed to identify the groups of shares to be treated differently, based for example on the Granger causality principle or on cross-correlation analysis [4].
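The Markowitz step can be illustrated as a mean-variance problem. With $\Sigma$ here denoting the estimated volatility matrix of the n selected shares and $\mu$ their expected returns, the problem is to minimise $w'\Sigma w$ subject to $w'\mu = R_{p,T+1}$ and $\sum_i w_i = 1$. The text does not state whether short sales are allowed. This sketch assumes they are, so the solution follows from a linear KKT system rather than a general quadratic-programming solver. The function names are hypothetical. The system is singular if all entries of $\mu$ are equal.

```python
import numpy as np


def min_variance_weights(Sigma, mu, target):
    """Markowitz weights minimising w' Sigma w subject to w' mu = target
    and sum(w) = 1 (short sales allowed), via the KKT linear system."""
    Sigma = np.asarray(Sigma, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = len(mu)
    ones = np.ones(n)
    # KKT matrix [[2 Sigma, mu, 1], [mu', 0, 0], [1', 0, 0]].
    K = np.zeros((n + 2, n + 2))
    K[:n, :n] = 2.0 * Sigma
    K[:n, n] = mu
    K[:n, n + 1] = ones
    K[n, :n] = mu
    K[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target, 1.0]])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                # drop the two Lagrange multipliers


def frontier(Sigma, mu, targets):
    """(standard deviation, expected return) pairs for each target return."""
    Sigma = np.asarray(Sigma, dtype=float)
    points = []
    for r in targets:
        w = min_variance_weights(Sigma, mu, r)
        points.append((float(np.sqrt(w @ Sigma @ w)), float(w @ np.asarray(mu))))
    return points
```

Tracing `frontier` over a grid of targets for each n from 10 to $n_{\max}$, with the corresponding estimated volatility matrix, would give curves analogous to those in Fig. 1 (top right).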

Fig. 1. Top left: evolution with n. Top right: simulated portfolio frontiers. Bottom: volatility estimation.