1 Introduction

The seminal mean-variance model proposed by Markowitz (1952) lays the foundation for modern portfolio theory. As the name implies, expected returns, variances and covariances are the key input parameters of the model. In most real-world situations, however, these parameters are not known a priori. Much work in this vein postulates that the unknown moments can be estimated precisely by sample averages, which is usually known as the plug-in approach. However, point estimates are subject to estimation errors and may result in portfolio allocations far from the optimum. In fact, the problem of estimation risk is exacerbated in the mean-variance model, as many researchers have found that the portfolios it produces are very sensitive to the model parameters (see, e.g., Best and Grauer 1991a, b; Chopra et al. 1993; Chopra and Ziemba 1993); that is, a small fluctuation in an estimate may be amplified into a dramatic change in portfolio weights. Thus, the mean-variance model without consideration of parameter uncertainty is often criticized for entailing extreme positions in the optimal portfolio and delivering poor out-of-sample performance (see Litterman et al. 2004).

There is an extensive literature devoted to applying Bayesian approaches to relieve the adverse impacts of parameter uncertainty in portfolio optimization. Given available data and a portfolio choice problem, an investor might be inclined to discard potential features of uncertainty and make decisions based on nonparametric estimation methods (see Tsybakov 2008). The other extreme is the plug-in approach, in which the investor is convinced, through a statistical test or a heuristic argument, of an underlying data-generating model that incorporates specific features for predicting the future, assumes that the model parameters are exactly equal to the sample estimates, and ignores the estimation risk of the unknown parameters. In contrast to these two extremes, a Bayesian investor assumes a parametric return-generating model and treats the unknown model parameters as random variables with specified priors. A posterior/predictive probability distribution of asset returns, which depends only on the observed data, can be obtained by integrating out the unknown parameters according to Bayes’ rule and evolves automatically as new data are released. The research work on this topic includes Frost and Savarino (1986), Aguilar and West (2000), Polson and Tew (2000), Pástor and Stambaugh (2000), Wang (2005), Black and Litterman (1992), Kolm and Ritter (2017), Zhou (2009), Bauder et al. (2021), Bodnar et al. (2017), Anderson and Cheng (2016) and Marisu and Pun (2023). The majority of the relevant literature is, however, limited to the static framework of the mean-variance model. One exception is Winkler and Barry (1975), who consider Bayesian inference and learning in a multi-period setting, in which the investor is assumed to maximize a utility function of terminal wealth and the optimal strategy requires a case-by-case discussion. Their work first shows that even for simple linear and quadratic utilities, the corresponding multi-period model with one risky asset and one risk-free asset can, in general, only be solved numerically rather than analytically.

As intuitively expected, especially for dynamic problems, it is important for the investor to exploit the data observed gradually, recognizing the updated information about the unknown parameters and revising the portfolio accordingly. Nevertheless, as indicated in Winkler and Barry (1975), if the formulated stochastic dynamic program cannot be solved analytically, developing the optimal portfolio with information updating, which requires the computation of conditional expectations and an optimization at each time period, poses a great computational challenge even for simple investment cases. As a consequence, there have been few results in the literature on solutions for general dynamic portfolio optimization problems with unknown parameters. On the other hand, recent studies have made some progress by developing a variety of approximate solution methods. For instance, Barberis (2000) conducts backward induction by discretizing the state space. Brandt et al. (2005) take a Taylor series expansion of the expected utility to obtain an approximate closed-form solution. Soyer and Tanyeri (2006) adopt a surface fitting approach for a two-stage model. Skoulakis (2008) approximates the value functions using a feedforward neural network. Jurek and Viceira (2010) obtain an approximate analytical solution by log-linearizing the budget constraint for the log-normal return distribution. Unfortunately, these papers suffer from several deficiencies. First, almost all of the previous literature investigating Bayesian learning and dynamic portfolio choice relies on utilities defined over terminal wealth as objective functions, e.g., the power utility and the exponential utility. Although such utilities are popular in economic studies, investors and portfolio managers may be concerned more with an explicit measurement of investment risk, which cannot be easily seen and analyzed if a utility function of final wealth is used. Second, errors in the portfolios resulting from multi-period approximation and optimization are hard to detect and control, especially when the state variables are of high dimension. Third, due to the absence of analytical solutions, the traditional models lack a clear interpretation of how Bayesian learning affects investors’ decision making. The role of Bayesian learning in forming optimal portfolio allocations remains obscure.

In this paper, we focus on the investment needs of long-term conservative investors who are particularly risk averse. To appropriately model the decision features of these investors, we adopt an \(l_{\infty }\) risk function as the risk measure in our dynamic model formulation. Various risk measures have been proposed in the literature to capture the concerns of investors in different situations. These include, e.g., semi-variance (see Markowitz 1959), value at risk (see Duffie and Pan 1997), CVaR (see Rockafellar et al. 2000), and the \(l_{1}\) risk function (see Konno and Yamazaki 1991). Cai et al. (2000) propose the more conservative \(l_{\infty }\) risk function to reflect the risk attitude of relatively conservative investors. Specifically, the \(l_{\infty }\) risk measure is defined mathematically as follows:

$$\begin{aligned} l_{\infty }(\varvec{x}) = \max _{1\leqslant j\leqslant p} \mathbb {E}\left[ |r_{j}x_{j}-\mathbb {E}(r_{j})x_{j}|\right] , \end{aligned}$$

where \(r_{j}\) denotes the return rate of asset j, \(x_{j}\) is the amount of the fund allocated to asset j, p is the total number of assets and \(\varvec{x}=(x_{1},...,x_{p})^{\top }\) is the allocation vector. It is clear that under \(l_{\infty }(\varvec{x})\), the risk of holding a portfolio in one period is measured by the maximal expected absolute deviation of the individual asset positions from their expectations. Therefore, by minimizing the risk proxy \(l_{\infty }(\varvec{x})\), an investor sets up a minimax rule to construct the optimal portfolio. In the original work of Cai et al. (2000), the authors introduce the \(l_{\infty }\) risk function in a static portfolio setting and derive an analytical solution for the single-period model. By analyzing the features of the optimal investment strategy, they show in theory that their model with the \(l_{\infty }\) risk function exhibits some robustness to errors in the problem inputs. The empirical work in Cai et al. (2004) further supports that the portfolio derived from the \(l_{\infty }\) model is less sensitive to the input data than Markowitz’s mean-variance model. More studies on the \(l_{\infty }\) risk measure have been reported in the literature subsequently; see, e.g., Prigent (2007), Ryals et al. (2007), Park et al. (2019), Vercher and Bermúdez (2015), Sun et al. (2015) and Meng et al. (2022).
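
As a small illustration, the Python sketch below estimates the expected returns and mean absolute deviations from a window of sample returns and evaluates \(l_{\infty }(\varvec{x})\) for a given allocation; all array names and figures are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical sample of 120 monthly returns for p = 3 assets
returns = rng.normal(loc=[0.010, 0.020, 0.015], scale=[0.03, 0.06, 0.04], size=(120, 3))

m = returns.mean(axis=0)              # estimated expected returns
q = np.abs(returns - m).mean(axis=0)  # estimated mean absolute deviations

x = np.array([40.0, 25.0, 35.0])      # dollar allocation, summing to V = 100
l_inf = np.max(q * x)                 # l_infinity risk of the portfolio
print(m, q, l_inf)
```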

The contributions of this paper can be summarized as follows.

(1) We set up a novel dynamic portfolio selection framework in which the estimates of unknown parameters are updated via Bayesian learning and the \(l_{\infty }\) risk function is used as the risk measure. Specifically, the investor in our model is assumed to choose the optimal portfolio by (i) maximizing the expected terminal wealth; and (ii) minimizing the cumulative investment risk, defined as a weighted sum of the risks the investor will undertake in subsequent periods. We show that the proposed stochastic dynamic program has a closed-form optimal policy, independent of assumptions on the return-generating process. In contrast to previous studies relying on approximate solution methods, our work gives an analytical expression for the optimal portfolio allocation, making it possible to see clearly how the composition of the portfolio is determined and how Bayesian learning affects the investor’s decision in a dynamic setting. The investment strategy in each period can be intuitively viewed as a three-step decision scheme. First, we rank the individual assets in terms of their expected returns adjusted by the available information and the anticipation of future decisions. Then, we select the assets to be invested in by checking a sequence of inequality rules that exploit the information contained in the adjusted expected returns and risks. Finally, the actual amounts to be allocated to the selected assets are computed on the basis of the current wealth and their risks, i.e., the mean absolute deviations of the individual asset returns. For the implementation of the optimal policy, we introduce a least squares Monte Carlo method to approximate the complex conditional expectations.

(2) Under an i.i.d. normal return-generating process with unknown means and covariance matrix, we find that, besides providing a formal way to accommodate new information from observed data, Bayesian learning can also help diversify portfolio allocations and reduce the sensitivity of optimal portfolios to changes in model inputs. The major insight behind these properties is that incorporating the estimation risk of unknown parameters via Bayesian learning makes risky assets effectively riskier, which leads to a more conservative and robust investment policy under our model framework. A typical message from the previous literature is that the use of Bayesian learning has a negative effect on the portfolio weights of risky assets (see Barberis 2000; Brandt et al. 2005; Skoulakis 2008). Barberis (2000) also mentions that, in that setting, Bayesian learning affects the sensitivity of the optimal allocation to the state variables. However, these findings are mainly based on numerical observations, without clarifying the mechanism or the role of Bayesian learning in forming optimal portfolios. In contrast, our paper aims at interpreting the effects of Bayesian learning at the policy level in multi-period portfolio selection problems. Our conclusions are consistent with numerical findings in the related literature, but we enrich the understanding of Bayesian learning in a new dynamic model scheme.

(3) Our numerical results based on real market data indicate that, compared with a plug-in model, using Bayesian learning to account for parameter uncertainty and estimation risk in dynamic portfolio selection problems improves the policy’s out-of-sample performance. The performance gap between the models with and without Bayesian learning is, however, affected by the investor’s risk preference and the amount of data available. That is, we observe that incorporating Bayesian learning has a significant advantage in out-of-sample performance when the investor’s risk tolerance is high or the amount of available data is small. As the investor becomes more conservative and more data are observed, the gap between the two models narrows. We believe that these findings are valuable in answering questions about the practical benefit of incorporating Bayesian learning in investments.

The remainder of this paper is organized as follows. In Sect. 2, we formulate the dynamic portfolio selection model with unknown parameters. We solve the proposed stochastic dynamic program in Sect. 3. A plug-in model without learning is discussed in Sect. 4. An empirical study is provided in Sect. 5. We conclude our work in Sect. 6. The proofs of theorems and propositions are included in the “Appendix”.

2 A multi-period portfolio selection model under minimax rule

Assume that there is a capital market with p risky assets, \(S_{1},S_{2},...,S_{p}\). The decision times are discrete and indexed by \(\{0,1,...,T-1\}\). An investor joins this market with initial fund \(V_{0}\) and can reallocate his fund among these p assets at the beginning of each of the following T consecutive time periods. The returns of the risky assets are stochastic and denoted by \(\varvec{r}_{t}= (r_{t}^{1},r_{t}^{2},...,r_{t}^{p})^{\top }\), where \(r_{t}^{j}\) is the return of \(S_{j}\) in period t. Throughout the paper, we use boldface lower and upper case characters to denote vectors and matrices, respectively. \(\varvec{1}\) is the vector of all ones. We use [t;T] to denote the index set \(\{t,t+1,...,T\}\) for short. The monetary values of the risky asset holdings at the beginning of period t are described by a vector \(\varvec{x}_{t} = (x_{t}^{1},...,x_{t}^{p})^{\top }\). Let \(V_{t}\) be the wealth at the beginning of period t. Then, the budget constraint is \(\sum _{j=1}^{p}x_{t}^{j} = V_{t}\) for all \(t\in [0;T-1]\). Short-selling and transaction costs are not considered.

Uncertainty in risky asset returns is analyzed based on a probability space \((\Omega ,\mathcal {F},P)\) where \(\Omega\) is the set of possible outcomes with element \(\omega\), \(\mathcal {F}\) is a \(\sigma\)-algebra and P is a probability measure. The flow of information is modeled by a filtration \(\{\mathcal {F}_{t}\}_{t=0}^{T}\), where the \(\sigma\)-algebra \(\mathcal {F}_{t}\) describes the information available to the investor at the beginning of period t satisfying \(\mathcal {F}_{t}\subseteq \mathcal {F}_{t+1}\subseteq \mathcal {F}_{T}=\mathcal {F}\) for all \(t<T\). Therefore, the expectation operator conditional on available information can be formally defined as \(\mathbb {E}(\cdot |\mathcal {F}_{t})\), abbreviated as \(\mathbb {E}_{t}(\cdot )\).

In order to evaluate the risk of holding a portfolio \(\varvec{x}_{t}\) in period t, we define a single-period portfolio risk measure based on the \(l_{\infty }\) risk function as

$$\begin{aligned} l_{t}(\varvec{x}_{t}) = \max _{1\leqslant j\leqslant p} \mathbb {E}_{t}(|r_{t}^{j}-\mathbb {E}_{t}(r_{t}^{j})|)x_{t}^{j} = \max _{1\leqslant j\leqslant p} \mathbb {E}_{t}(|r_{t}^{j}-m_{t}^{j}|)x_{t}^{j} = \max _{1\leqslant j\leqslant p} q_{t}^{j}x_{t}^{j}, \end{aligned}$$
(1)

where we use \(m_{t}^{j} = \mathbb {E}_{t}(r_{t}^{j})\) and \(q_{t}^{j} = \mathbb {E}_{t}(|r_{t}^{j}-m_{t}^{j}|)\) to denote the (conditional) expected return and its mean absolute deviation (MAD), respectively. The corresponding vector forms are \(\varvec{m}_{t} = \mathbb {E}_{t}(\varvec{r}_{t})\) and \(\varvec{q}_{t} = \mathbb {E}_{t}(|\varvec{r}_{t}-\varvec{m}_{t}|)\). Note that under our notation, \(r_{t}^{j}\) is \(\mathcal {F}_{t+1}\)-measurable, while \(m_{t}^{j}\), \(q_{t}^{j}\) and \(l_{t}(\varvec{x}_{t})\) are adapted to \(\mathcal {F}_{t}\). Strictly speaking, since \(m_{t}^{j}\), \(q_{t}^{j}\) and \(l_{t}(\varvec{x}_{t})\) are measurable functions mapping from the sample space \(\Omega\) to the real line \(\mathbb {R}\), their complete forms should be \(m_{t}^{j}(\omega )\), \(q_{t}^{j}(\omega )\) and \(l_{t}(\varvec{x}_{t},\omega )\). For expositional brevity, we will henceforth suppress the dependency on the element \(\omega \in \Omega\) for functions measurable with respect to \(\{\mathcal {F}_{t}\}_{t=0}^{T}\), and point out the measurability only when necessary. In general, the dependency can easily be inferred from the context.

Combining minimization of the risk \(l_{t}(\varvec{x}_{t})\) with maximization of the one-period expected return, the so-called minimax rule in the single-period model can be formulated as follows:

$$\begin{aligned} \min _{\varvec{x}_{t}\in \mathcal {X}_{t}} \quad \lambda \left( \max _{1\leqslant j\leqslant p} q_{t}^{j}x_{t}^{j}\right) - (1-\lambda )\sum _{j=1}^{p}m_{t}^{j}x_{t}^{j}, \end{aligned}$$
(2)

where \(\mathcal {X}_{t} = \left\{ \varvec{x}_{t}:\sum _{j=1}^{p}x_{t}^{j} = V_{t}, x_{t}^{j}\geqslant 0,j\in [1;p]\right\}\) and \(\lambda \in (0,1)\) represents the investor’s risk aversion level: the larger the \(\lambda\), the more conservative the investor. In the multi-objective sense (maximizing the expected return and minimizing the risk), the optimal solution of Problem (2) for a given value of \(\lambda\) corresponds to an efficient point of the bi-criteria problem in (2) that considers both risk and return, and the set of optimal solutions over all \(\lambda \in (0,1)\) corresponds to the efficient frontier. Prior to solving Problem (2), we first define an ancillary function \(G(\varvec{a}_{1},\varvec{a}_{2},k)\) with input vectors \(\varvec{a}_{1}\), \(\varvec{a}_{2}\) and scalar \(k\in [0;p-1]\) such that for all \(k\in [1;p-1]\),

$$\begin{aligned} G(\varvec{a}_{1},\varvec{a}_{2},k) =\sum _{j=0}^{k-1} \frac{a_{1}^{i_{p-j}(\varvec{a}_{1})}-a_{1}^{i_{p-k}(\varvec{a}_{1})}}{a_{2}^{i_{p-j}(\varvec{a}_{1})}} \end{aligned}$$

and, in particular, \(G(\cdot ,\cdot ,0)=0\), where the function \(i_{j}(\varvec{a}_{1})\) outputs the index of the jth smallest element of a given vector \(\varvec{a}_{1}=(a_{1}^{1},...,a_{1}^{p})^{\top }\), i.e., \(a_{1}^{i_{1}(\varvec{a}_{1})}\leqslant a_{1}^{i_{2}(\varvec{a}_{1})}\leqslant ...\leqslant a_{1}^{i_{p}(\varvec{a}_{1})}\), so that \(i_{p}(\varvec{a}_{1})\) indexes the largest element. Then, the optimal portfolio allocation of Problem (2) can be presented in the following lemma.

Lemma 1

Given \(\lambda \in (0,1)\), the optimal solution of Problem (2) is that

$$\begin{aligned} x_{t}^{j*} = \left\{ \begin{aligned}&\frac{V_{t}}{q_{t}^{j}}\Bigg (\sum _{j\in \mathcal {A}_{t}^{*}} \frac{1}{q_{t}^{j}}\Bigg )^{-1},&j&\in \mathcal {A}_{t}^{*},\\&0,&j&\notin \mathcal {A}_{t}^{*}, \end{aligned} \right. \end{aligned}$$
(3)

where the set \(\mathcal {A}_{t}^{*}\) of assets to invest in can be constructed by the following rule: If there exists an integer \(k \in [0;p-2]\) such that

$$\begin{aligned} G(\varvec{m}_{t},\varvec{q}_{t},k)< \frac{\lambda }{1-\lambda } \quad \text {and}\quad G(\varvec{m}_{t},\varvec{q}_{t},k+1) \geqslant \frac{\lambda }{1-\lambda }, \end{aligned}$$
(4)

then \(\mathcal {A}_{t}^{*} = \{i_{p}(\varvec{m}_{t}),i_{p-1}(\varvec{m}_{t}),...,i_{p-k}(\varvec{m}_{t})\}\). Otherwise, if the condition above is not satisfied by any integer \(k\in [0;p-2]\), then \(\mathcal {A}_{t}^{*} = [1;p]\).

Lemma 1 can be obtained by solving a set of equations derived from the standard KKT conditions. Interested readers may refer to the appendix in Cai et al. (2000) for more details. The primary difference between Problem (2) and the original model in Cai et al. (2000) is that Problem (2) accounts for estimation risk and uses conditional expectations to highlight the value of the information flow, whereas the original model relies only on point estimates and assumes that the unknown parameters can be estimated precisely.
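
For concreteness, the selection and allocation rules of Lemma 1 can be coded directly. The Python sketch below (with hypothetical inputs) implements the ancillary function \(G\), the ranking \(i_{j}(\cdot )\) and the construction of \(\mathcal {A}_{t}^{*}\); the same functions implement the selection step of Theorem 1 in Sect. 3 once \(\varvec{m}_{t}\) is replaced by \(\varvec{v}_{t}\) and \(\varvec{q}_{t}\) by \(\gamma _{t}\varvec{q}_{t}\) inside G (the allocation step still uses \(\varvec{q}_{t}\)).

```python
import numpy as np

def G(a1, a2, k):
    """Ancillary function G(a1, a2, k); G(., ., 0) = 0 by convention."""
    if k == 0:
        return 0.0
    top = np.argsort(a1)[::-1]          # indices sorted from largest to smallest a1
    ref = a1[top[k]]                    # a1^{i_{p-k}(a1)}
    return sum((a1[i] - ref) / a2[i] for i in top[:k])

def minimax_portfolio(m, q, lam, V):
    """Single-period optimal allocation of Lemma 1."""
    p = len(m)
    order = np.argsort(m)[::-1]         # assets ranked by expected return, largest first
    k_star = None
    for k in range(p - 1):              # k in [0; p-2]
        if G(m, q, k) < lam / (1 - lam) <= G(m, q, k + 1):
            k_star = k
            break
    selected = order[:k_star + 1] if k_star is not None else order  # A_t^*
    z = 1.0 / np.sum(1.0 / q[selected])
    x = np.zeros(p)
    x[selected] = V * z / q[selected]   # allocation inversely proportional to MAD within A_t^*
    return x

# hypothetical inputs
m = np.array([0.012, 0.018, 0.009])
q = np.array([0.030, 0.055, 0.020])
print(minimax_portfolio(m, q, lam=0.5, V=100.0))
```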

Despite the wide popularity of single-period models, this static paradigm is known to be difficult to apply to long-term investors such as pension planners and insurance companies (Mulvey et al. 2003). Moreover, like a wise chess player, a successful manager or investor always thinks ahead and contemplates the inter-temporal effects of multi-period decisions. We next formulate a dynamic model that considers both portfolio optimization and Bayesian learning. Let \(l_{t}(\varvec{x}_{t})\) in (1) quantify the portfolio risk in period t. We then define the cumulative risk over the investment horizon as follows:

$$\begin{aligned} L_{{t:T}} = {\mathbb {E}}_{t}\left[ \sum _{k=t}^{T-1}\gamma _{k}l_{k}(\varvec{x}_{k})\right] = {\mathbb {E}}_{t}\left[ \sum _{k=t}^{T-1}\gamma _{k}\max _{1\leqslant j\leqslant p} q_{k}^{j}x_{k}^{j}\right] , \quad {t=0,1,...,T-1}, \end{aligned}$$
(5)

where \(\gamma _{k}\geqslant 0\) is the weight of the period-k risk \(l_{k}(\varvec{x}_{k})\). By tuning the values of \(\{\gamma _{k}\}_{k=t}^{T-1}\), one can differentiate the importance of risks in different periods.

Analogous to the single-period model (2), we now extend the minimax rule to the multi-period case and propose the dynamic model as follows:

$$\begin{aligned} \begin{aligned} \min _{\varvec{x}_{0}\in \mathcal {X}_{0}}\mathbb {E}_{0}&\Big [\min _{\varvec{x}_{1}\in \mathcal {X}_{1}}\mathbb {E}_{1}\Big [... \min _{\varvec{x}_{T-1}\in \mathcal {X}_{T-1}}\mathbb {E}_{T-1}\Big [\lambda L_{{0:T}} - (1-\lambda )V_{T}\Big ]... \Big ]\Big ]\\&\text {s.t.} \quad V_{t+1} = V_{t}+\varvec{r}_{t}^{\top }\varvec{x}_{t}, \quad t \in [0;T-1]. \end{aligned} \end{aligned}$$
(6)

Several clarifications on Problem (6) are in order. First, the cumulative risk defined in (5) is a natural way to evaluate the total risk the investor undertakes over the T periods, and it is widely accepted in the multi-period portfolio optimization literature; see, e.g., Calafiore (2008), Liu and Zhang (2015) and Boyd et al. (2017). Second, the formulation of (6) indicates that the optimal policy should be derived by maximizing the expected final wealth at a minimum level of cumulative risk. Hence, our model formulation is appropriate for investors who care about both the final wealth at expiration and the risks they undertake during the investment process. The non-negative weights \(\{\gamma _{t}\}_{t=0}^{T-1}\) provide extra model flexibility for users in practice. Third, in Problem (6), the investor chooses the optimal portfolio at each decision point with the possibility of rebalancing in future periods and recognizes that data realizations during the investment horizon contain useful information for updating beliefs about the unknown parameters. This decision-making process with learning can be described intuitively as follows:

$$\begin{aligned} \text {Decision}(\varvec{x}_{0}^{*})\rightsquigarrow \text {Observation}(\mathcal {F}_{1}) \rightsquigarrow \text {Learning} \rightsquigarrow \text {Decision}(\varvec{x}_{1}^{*}) \rightsquigarrow \\ \cdots \rightsquigarrow \text {Observation}(\mathcal {F}_{T-1}) \rightsquigarrow \text {Learning} \rightsquigarrow \text {Decision}(\varvec{x}_{T-1}^{*}). \end{aligned}$$

Prior to solving Problem (6), we further investigate the time-consistency property from the perspectives of the multi-period risk measure \(L_{t:T}\) and the problem’s optimal policy, respectively.

Let \(X_k = \gamma _{k}\max _{1\leqslant j\leqslant p} q_{k}^{j}x_{k}^{j}\). The sequence \(\bar{X}_{t} = \{ X_k\}_{k=t}^{T-1}\) can be viewed as a loss process, where a lower realization of \(X_{k}\) is preferred, and each \(X_k\) is adapted to \(\mathcal {F}_{k}\). Then, \(L_{t:T}(\bar{X}_{t}) = \mathbb {E}_{t}(\sum _{k=t}^{T-1}X_{k})\), that is, a conditional expectation of the sum of future losses. Given \(0\le t_1 < t_2 \le T-1\) and loss processes \(\bar{X}_{t_1}\) and \(\bar{X}_{t_1}^{\prime }\), if \(X_{k}=X_{k}^{\prime }\) for all \(k = t_1,...,t_{2}-1\) and \(L_{t_2:T}(\bar{X}_{t_2})\le L_{t_2:T}(\bar{X}_{t_2}^{\prime })\), where the equality and inequality between random variables are understood in the almost sure sense, then, by the tower property of the conditional expectation in \(L_{t_1:T}\), it is straightforward to obtain \(L_{t_1:T}(\bar{X}_{t_1}) \le L_{t_1:T}(\bar{X}_{t_1}^{\prime })\). Therefore, the multi-period risk measure \(L_{t:T}\) is time consistent in the sense that if a position is riskier than another one at some future time (\(t_2\)), then it should also be riskier from the perspective of today (\(t_1\)). For more discussion on time-consistent dynamic risk measures, see Ruszczyński (2010) and the references therein.

For the multi-period stochastic programming Problem (6), in which the terminal wealth is also considered, we follow the concept in Shapiro (2009): time consistency means that any optimal policy specified today remains optimal in future stages. By expanding \(V_T\) and \(L_{0:T}\), Problem (6) can be reformulated as

$$\begin{aligned} \min _{\varvec{x}_{0}\in \mathcal {X}_{0}}-(1-\lambda )V_0 +g_0(\varvec{x}_{0})+\mathbb {E}_{0}\Big [\min _{\varvec{x}_{1}\in \mathcal {X}_{1}}g_1(\varvec{x}_{1})+\mathbb {E}_{1}\Big [...+\mathbb {E}_{T-2}\Big [ \min _{\varvec{x}_{T-1}\in \mathcal {X}_{T-1}}g_{T-1}(\varvec{x}_{T-1})\Big ]... \Big ]\Big ], \end{aligned}$$

where \(V_0\) is known at the beginning of period 0 and \(g_{t}(\varvec{x}_t) = \lambda \gamma _{t}\max _{1\leqslant j\leqslant p} q_{t}^{j}x_{t}^{j} - (1-\lambda )\mathbb {E}_{t}[\varvec{r}_{t}^{\top }\varvec{x}_{t}]\) is \(\mathcal {F}_t\)-measurable. According to Example 2 of Shapiro (2009), we can conclude that Problem (6) is time consistent.

3 Optimal investment policy with learning

In this section, we first solve Problem (6) and give some structural results on the optimal policy. Then, we set up a Bayesian learning framework under an i.i.d. normal return-generating process with unknown means and covariance matrix. We finally introduce a least squares Monte Carlo method to estimate the complex conditional expectations required for implementation of the optimal policy.

3.1 An optimal investment policy

It turns out that the optimal policy of Problem (6) can be derived analytically. We state the result directly in Theorem 1 and defer its proof to “Appendix A”.

Theorem 1

Given non-negative \(\{\gamma _{t}\}_{t=0}^{T-1}\) and \(\lambda \in (0,1)\), the optimal policy of Problem (6) is such that for each \(t\in [0;T-1]\),

$$\begin{aligned} x_{t}^{j*} = \left\{ \begin{aligned}&\frac{V_{t}}{q_{t}^{j}}\Bigg (\sum _{j\in \mathcal {A}_{t}^{*}} \frac{1}{q_{t}^{j}}\Bigg )^{-1},&j&\in \mathcal {A}_{t}^{*},\\&0,&j&\notin \mathcal {A}_{t}^{*}, \end{aligned} \right. \end{aligned}$$
(7)

where the set \(\mathcal {A}_{t}^{*}\) of assets to invest in is determined by the following rule: When \(\gamma _{t}>0\), if there exists an integer \(k \in [0;p-2]\) such that

$$\begin{aligned} G(\varvec{v}_{t},\gamma _{t}\varvec{q}_{t},k)< \frac{\lambda }{1-\lambda } \quad \text {and}\quad G(\varvec{v}_{t},\gamma _{t}\varvec{q}_{t},k+1) \geqslant \frac{\lambda }{1-\lambda }, \end{aligned}$$
(8)

then \(\mathcal {A}_{t}^{*} = \{i_{p}(\varvec{v}_{t}),i_{p-1}(\varvec{v}_{t}),...,i_{p-k}(\varvec{v}_{t})\}\); otherwise, \(\mathcal {A}_{t}^{*} =[1;p]\). When \(\gamma _{t}=0\), \(\mathcal {A}_{t}^{*}=\{i_{p}(\varvec{v}_{t})\}\). The vector \(\varvec{v}_{t}\) is recursively defined as: For \(t=T-1\), \(\varvec{v}_{T-1} = \mathbb {E}_{T-1}(\varvec{r}_{T-1})\) and \(c_{T-1} = -(1-\lambda )\). For \(t\in [0;T-2]\), \(\varvec{v}_{t}= -\mathbb {E}_{t}[(c_{t+1}+\lambda \gamma _{t+1} z_{t+1} -(1-\lambda ) y_{t+1})\varvec{r}_{t}]/(1-\lambda )\) and \(c_{t} = \mathbb {E}_{t}[c_{t+1}+\lambda \gamma _{t+1} z_{t+1}-(1-\lambda )y_{t+1}]\) where in each stage \(t\in [0;T-1]\),

$$\begin{aligned} y_{t}=z_{t}\sum _{j\in \mathcal {A}_{t}^{*}}\frac{v_{t}^{j}}{q_{t}^{j}} \quad \text {and}\quad z_{t}=\Bigg (\sum _{j\in \mathcal {A}_{t}^{*}}\frac{1}{q_{t}^{j}}\Bigg )^{-1}. \end{aligned}$$
(9)

The optimal policy derived in Theorem 1 is nonanticipative in that \(V_{t}\), \(\varvec{v}_{t}\) and \(\varvec{q}_{t}\) are \(\mathcal {F}_{t}\)-measurable, so the decisions in \(\mathcal {A}_{t}^{*}\) and \(\varvec{x}_{t}^{*}\) depend only on what is known at the beginning of period t. Moreover, the policy can be intuitively viewed as a three-step decision scheme. First, we rank the p risky assets by their values in the vector \(\varvec{v}_{t}\). Then, we select the set of assets \(\mathcal {A}_{t}^{*}\) to be included in the portfolio by checking a sequence of inequalities based on their values in \(\varvec{v}_{t}\) and \(\varvec{q}_{t}\). Finally, the actual amount allocated from \(V_{t}\) to each selected asset depends on the MADs of the returns in \(\varvec{q}_{t}\), following Eq. (7). Structurally, the policy in Theorem 1 can be decomposed into a selection rule in (8) and an allocation rule in (7), and it has a close connection with the single-period solution in Lemma 1. To be specific, regarding the selection rule that determines \(\mathcal {A}_{t}^{*}\), the inequalities in (4) show that an asset with a high expected return is always considered before an asset with a low expected return and that the asset with the highest value in \(\varvec{m}_{t}\) is always included in \(\mathcal {A}_{t}^{*}\) (although the actual allocation to a selected asset may be close to zero if its risk is high). The policy in Theorem 1 acts in a similar way, but the selection criterion now depends on the comparison of values in \(\varvec{v}_{t}\) instead of the simple expected asset returns, and the MADs in the function G are weighted by \(\{\gamma _{t}\}_{t=0}^{T-1}\).

As shown in Theorem 1, the vector \(\varvec{v}_{t}\) is the crux of implementing the optimal policy. The calculation of \(\varvec{v}_{t}\) is, however, far from straightforward. We will introduce a least squares Monte Carlo method in Sect. 3.3 to approximate its value. For now, we digress and give the following proposition to better understand the meaning of \(\varvec{v}_{t}\) in Theorem 1.

Proposition 1

For \(t\in [0;T-2]\), let

$$\begin{aligned} \Delta _{t+1} = \sum _{s=t+1}^{T-1}\left( \frac{\varvec{v}_{s}^{\top }\varvec{x}_{s}^{*}}{V_{s}}-\frac{\lambda \gamma _{s}}{1-\lambda }z_{s}\right) , \end{aligned}$$
(10)

where \(\varvec{x}_{s}^{*}\) denotes the optimal asset holdings at the beginning of period s. Let \(\Delta _{T}=0\). Then, for all \(t\in [0;T-1]\), \(\varvec{v}_{t}\) can be written as

$$\begin{aligned} \varvec{v}_{t}&= \mathbb {E}_{t}[(1+\Delta _{t+1}) \varvec{r}_{t}]. \end{aligned}$$
(11)

An important observation from (11) is that the vector \(\varvec{v}_{t}\) can be regarded as a vector of adjusted expected returns (AERs) of the risky assets, because it is composed of expected returns adjusted from two perspectives. (i) It is adjusted by the anticipation of future decisions. This type of adjustment is reflected in \(\Delta _{t+1}\), which is obtained by anticipating the performance of the optimal portfolios \(\{\varvec{x}_{s}^{*}\}_{s=t+1}^{T-1}\) in future periods. Precisely, in (10), \(z_{s}\), as defined in (9), measures the overall risk of investing in the assets in \(\mathcal {A}_{s}^{*}\), while \(\varvec{v}_{s}^{\top }\varvec{x}_{s}^{*}/V_{s}\) represents the generalized return rate in period s, evaluated in terms of the AERs \(\varvec{v}_{s}\). Thus, this anticipation effect takes both risk and return into account, and it accumulates across all future periods. (ii) It is adjusted by newly observed data. This adjustment involves the calculation of the conditional expectation \(\mathbb {E}_{t}(\cdot )\). When new data are released, the distribution of \(\varvec{r}_{t}\) as well as the future prospects in \(\Delta _{t+1}\) are updated accordingly in a Bayesian fashion. Therefore, both the impact of future moves from dynamic programming and the role of learning are well reflected in \(\varvec{v}_{t}\) as presented in (11).

Special care should also be paid to the choice of the weights \(\{\gamma _{t}\}_{t=0}^{T-1}\). Investors are allowed to impose personal preferences on risks in different periods by setting \(\{\gamma _{t}\}_{t=0}^{T-1}\) properly. For example, if the final-period risk really matters, a portfolio manager may raise the value of \(\gamma _{T-1}\). Increasing \(\gamma _{T-1}\) alone will make \(\mathcal {A}_{T-1}^{*}\) include more assets, since the G function is decreasing in \(\gamma _{T-1}\) for given \(\varvec{v}_{T-1}\), \(\varvec{q}_{T-1}\) and k. Moreover, for all previous stages \(t\in [0;T-2]\), the weight of \(z_{T-1}\) will be increased accordingly, as shown in (10). At the other extreme, if one totally ignores the risk in period \(T-1\), he could set \(\gamma _{T-1}=0\). Theorem 1 tells us that in this case, the optimal decision at stage \(T-1\) is to invest the entire fund \(V_{T-1}\) in the asset with the highest expected return in \(\varvec{m}_{T-1}\), which is not surprising for such a risk-seeking investor. Similarly, for all previous stages \(t\in [0;T-2]\), the effect of \(z_{T-1}\) is removed by setting \(\gamma _{T-1}=0\) in (10).

Note that Theorem 1 solves the general stochastic dynamic programming problem proposed in (6), and thus the derived nonanticipative policy is optimal with respect to an arbitrary probability measure P that dominates the return process \(\{\varvec{r}_{t} \}_{t=0}^{T-1}\). That is, Theorem 1 provides a general optimal policy without specifying how to compute the conditional expectations or how to update the information. For practical implementation of the policy, we need to introduce some necessary assumptions on the returns and the unknown parameters, and develop procedures to learn from data. Next, we restrict our attention to a parameterized return-generating process and introduce the unknown parameters as well as Bayesian learning.

3.2 Bayesian learning framework

In this section, we specify a parameterized return-generating process and further show how the unknown parameters are updated in a Bayesian fashion.

To concentrate on parameter uncertainty and estimation risk, we conduct our analysis based on a popular i.i.d. normal return-generating process. Specifically, we assume that the return rates of the p risky assets in period t, \(t\in [0;T-1]\), follow a linear model given by

$$\begin{aligned} \varvec{r}_{t} = \varvec{\mu } + \varvec{\epsilon }_{t} \quad \text {with} \quad \varvec{\epsilon }_{t} \sim \mathcal {N}(0,\varvec{\Sigma }), \end{aligned}$$
(12)

where \(\varvec{\epsilon }_{0}\), \(\varvec{\epsilon }_{1}\),..., \(\varvec{\epsilon }_{T-1}\) are i.i.d. noises. Given the true parameters \(\varvec{\mu }\) and \(\varvec{\Sigma }\), the returns \(\{\varvec{r}_{t}\}_{t=0}^{T-1}\) are independently and identically distributed as \(\mathcal {N}(\varvec{\mu },\varvec{\Sigma })\). In most realistic situations, however, the investor cannot know the exact true values of the parameters \((\varvec{\mu }, \varvec{\Sigma })\). In the spirit of Bayesian learning, we first suppose that the unknown parameters are random and follow a specified prior distribution; then, according to Bayes’ rule, the posterior beliefs about the distributions of the unknown parameters can be updated gradually as new data are observed.

To start, suppose there are h data points before the investment horizon, denoted by \(\{\varvec{r}_{-h},...,\varvec{r}_{-1}\}\). The information set \(\mathcal {F}_{t}\) can now be formally defined as the \(\sigma\)-algebra generated by the returns up to time t, that is, \(\mathcal {F}_{t} = \sigma (\{\varvec{r}_{-h},...,\varvec{r}_{-1},\varvec{r}_{0},...,\varvec{r}_{t-1}\})\). Denote the set of return data by \(\mathcal {D}_{t} = \{\varvec{r}_{-h},...,\varvec{r}_{-1},\varvec{r}_{0},...,\varvec{r}_{t-1}\}\). Thus, the conditional expectation \(\mathbb {E}(\cdot |\mathcal {F}_{t})\) can be written as \(\mathbb {E}(\cdot |\mathcal {D}_{t})\). This transition allows us to work with conditional expectations defined directly in terms of the data process rather than the sequence of \(\sigma\)-algebras. The computations for Bayesian updating are presented in what follows.

Suppose first that the mean vector \(\varvec{\mu }\) is unknown while the covariance matrix \(\varvec{\Sigma }\) is known. The uncertainty about \(\varvec{\mu }\) is further assumed to be described by a multivariate normal prior, i.e., \(\varvec{\mu }\sim \mathcal {N}(\varvec{\mu }_{-1},\varvec{\Sigma }_{-1})\). By Bayes’ rule, it can be verified that \(\varvec{r}_{t}|\mathcal {D}_{t} \sim \mathcal {N}(\varvec{\mu }_{t},\varvec{\Sigma }_{t}+\varvec{\Sigma })\), where \(\varvec{\mu }_{t} =\varvec{\Sigma }_{t}(\varvec{\Sigma }^{-1}\varvec{r}_{t-1}+\varvec{\Sigma }_{t-1}^{-1}\varvec{\mu }_{t-1})\) and \(\varvec{\Sigma }_{t} = (\varvec{\Sigma }^{-1}+\varvec{\Sigma }_{t-1}^{-1})^{-1}\) for \(t> 0\). In particular, \(\varvec{\mu }_{0} =\varvec{\Sigma }_{0}(\varvec{\Sigma }^{-1}\sum _{s=-h}^{-1}\varvec{r}_{s}+\varvec{\Sigma }_{-1}^{-1}\varvec{\mu }_{-1})\) and \(\varvec{\Sigma }_{0} = (h\varvec{\Sigma }^{-1}+\varvec{\Sigma }_{-1}^{-1})^{-1}\).
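
A minimal Python sketch of one step of this conjugate update, with a hypothetical prior, covariance matrix and newly observed return:

```python
import numpy as np

def update_known_sigma(mu_prev, Sigma_prev, Sigma, r_new):
    """One Bayesian update of the mean when the covariance Sigma is known."""
    Sigma_inv = np.linalg.inv(Sigma)
    Sigma_t = np.linalg.inv(Sigma_inv + np.linalg.inv(Sigma_prev))                  # posterior covariance of mu
    mu_t = Sigma_t @ (Sigma_inv @ r_new + np.linalg.inv(Sigma_prev) @ mu_prev)      # posterior mean of mu
    return mu_t, Sigma_t

# hypothetical prior, known covariance and a newly observed return vector
mu_prior = np.array([0.010, 0.015])
Sigma_prior = 0.01 * np.eye(2)
Sigma = np.array([[0.0009, 0.0002], [0.0002, 0.0025]])
r_obs = np.array([0.02, -0.01])
mu_post, Sigma_post = update_known_sigma(mu_prior, Sigma_prior, Sigma, r_obs)
# the predictive distribution of the next return is N(mu_post, Sigma_post + Sigma)
```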

A more general case is that both \(\varvec{\mu }\) and \(\varvec{\Sigma }\) are unknown. To express some general and objective information on \((\varvec{\mu },\varvec{\Sigma })\) before the realization of data, we assume a conventional “uninformative” prior for \((\varvec{\mu },\varvec{\Sigma })\), namely,

$$\begin{aligned} f(\varvec{\mu },\varvec{\Sigma }) \propto \left| \varvec{\Sigma } \right| ^{-(p+1)/2}, \end{aligned}$$
(13)

where \(\left| \varvec{\Sigma } \right|\) is the determinant of the matrix \(\varvec{\Sigma }\). At the beginning of period 0, one can obtain the posterior distribution after observing h historical data, following the analysis in Zellner (1996). That is, given \(\mathcal {D}_{0}\), we have

$$\begin{aligned} \varvec{\mu }|\varvec{\Sigma }, \mathcal {D}_{0} \sim \mathcal {N}\left( \varvec{\mu }_{0}, \varvec{\Sigma }/h\right) \quad \text {and}\quad \varvec{\Sigma }|\mathcal {D}_{0} \sim \mathcal{I}\mathcal{W}\left( h-1,\varvec{\Sigma }_{0}\right) , \end{aligned}$$
(14)

where \(\mathcal{I}\mathcal{W}(h-1,\varvec{\Sigma }_{0})\) is an inverse Wishart distribution with \((h-1)\) degrees of freedom and scale matrix \(\varvec{\Sigma }_{0}\), \(\varvec{\mu }_{0}= \sum _{s=-h}^{-1}\varvec{r}_{s}/h\), \(\varvec{\Sigma }_{0} = (\varvec{R}_{0}-\varvec{1}\varvec{\mu }_{0}^{\top })^{\top }(\varvec{R}_{0}-\varvec{1}\varvec{\mu }_{0}^{\top })\) and \(\varvec{R}_{0} = (\varvec{r}_{-h},...,\varvec{r}_{-1})^{\top }\). It is well known that the normal-inverse-Wishart distribution is a conjugate prior for normally distributed data. At the beginning of period \(t>0\), we take the posterior from stage \(t-1\) as the new prior for \((\varvec{\mu },\varvec{\Sigma })\) and obtain the updated posterior as follows:

$$\begin{aligned} \varvec{\mu }|\varvec{\Sigma }, \mathcal {D}_{t} \sim \mathcal {N}\left( \varvec{\mu }_{t}, \varvec{\Sigma }/(h+t)\right) \quad \text {and}\quad \varvec{\Sigma }|\mathcal {D}_{t} \sim \mathcal{I}\mathcal{W}\left( h+t-1,\varvec{\Sigma }_{t}\right) , \end{aligned}$$
(15)

where

$$\begin{aligned} \varvec{\mu }_{t} = \frac{h+t-1}{h+t}\varvec{\mu }_{t-1}+\frac{1}{h+t}\varvec{r}_{t-1},\quad \varvec{\Sigma }_{t} = \varvec{\Sigma }_{t-1}+\frac{h+t-1}{h+t}(\varvec{r}_{t-1}-\varvec{\mu }_{t-1})(\varvec{r}_{t-1}-\varvec{\mu }_{t-1})^{\top }. \end{aligned}$$
(16)
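
The initialization below (14) and the recursions in (16) are straightforward to code; a Python sketch with hypothetical data:

```python
import numpy as np

def init_posterior(R0):
    """Initial quantities from h historical return vectors (rows of R0), as below (14)."""
    mu0 = R0.mean(axis=0)
    centered = R0 - mu0
    Sigma0 = centered.T @ centered              # sum-of-squares scale matrix
    return mu0, Sigma0

def update_posterior(mu_prev, Sigma_prev, r_new, h, t):
    """Sequential update of (mu_t, Sigma_t) as in (16), after observing r_{t-1}."""
    w = (h + t - 1) / (h + t)
    mu_t = w * mu_prev + r_new / (h + t)
    diff = (r_new - mu_prev).reshape(-1, 1)
    Sigma_t = Sigma_prev + w * (diff @ diff.T)
    return mu_t, Sigma_t

rng = np.random.default_rng(1)
R0 = rng.normal(0.01, 0.05, size=(60, 3))       # hypothetical h = 60 historical returns, p = 3
mu, Sigma = init_posterior(R0)
r_next = rng.normal(0.01, 0.05, size=3)         # a newly observed return vector r_0
mu, Sigma = update_posterior(mu, Sigma, r_next, h=60, t=1)   # gives (mu_1, Sigma_1)
```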

For posterior marginal distribution of \(\varvec{\mu }\), according to the results in Zellner (1996), one can further obtain that

$$\begin{aligned} \varvec{\mu }|\mathcal {D}_{t} \sim \mathcal {T}_{h+t-p}\left( \varvec{\mu }_{t},\frac{1}{(h+t)(h+t-p)}\varvec{\Sigma }_{t}\right) , \end{aligned}$$
(17)

which turns out to be a multivariate t-distribution with \((h+t-p)\) degrees of freedom, location vector \(\varvec{\mu }_{t}\) and shape matrix \(\varvec{\Sigma }_{t}/[(h+t)(h+t-p)]\). The predictive distribution of \(\varvec{r}_{t}\) is then given by

$$\begin{aligned} \varvec{r}_{t}|\mathcal {D}_{t} \sim \mathcal {T}_{h+t-p}\left( \varvec{\mu }_{t},\frac{h+t+1}{(h+t)(h+t-p)}\varvec{\Sigma }_{t}\right) . \end{aligned}$$
(18)

Based on posterior marginal distributions of unknown parameters \((\varvec{\mu }, \varvec{\Sigma })\) in (15) and (17), one can compute the first and second moments of \(\mu ^{j}|\mathcal {D}_t\) and \((\varvec{\Sigma })_{jj}|\mathcal {D}_t\) for each \(j\in [1;p]\) as follows,

$$\begin{aligned} \mathbb {E}_{t}(\mu ^{j}) = \mu _{t}^{j}, \quad \mathbb {E}_{t}((\varvec{\Sigma })_{jj}) = \frac{1}{h^{\prime }}(\varvec{\Sigma }_{t})_{jj}, \end{aligned}$$
(19)
$$\begin{aligned} {\text {Var}}(\mu ^{j}|\mathcal {D}_{t}) = \frac{(\varvec{\Sigma }_{t})_{jj}}{(h+t)h^{\prime }}\quad \text {and}\quad {\text {Var}}((\varvec{\Sigma })_{jj}|\mathcal {D}_{t}) = \frac{2(\varvec{\Sigma }_{t})_{jj}^{2}}{(h^{\prime }-2)h^{\prime 2}}, \end{aligned}$$
(20)

where \((\cdot )_{jj}\) denotes the jth diagonal element of the given matrix and \(h^{\prime }=h+t-p-2\).

To implement the policy with Bayesian learning, we need the following proposition to update \(\varvec{m}_{t}\) and \(\varvec{q}_t\) as new data are released.

Proposition 2

Suppose both \(\varvec{\mu }\) and \(\varvec{\Sigma }\) are unknown and the initial prior follows (13). At the beginning of period \(t\in [0;T-1]\), we have

$$\begin{aligned} \varvec{m}_{t} = \varvec{\mu }_{t} \quad \text {and} \quad \varvec{q}_{t} = \frac{2\sqrt{(h+t+1){\text {diag}}(\varvec{\Sigma }_{t})}}{\sqrt{h+t}(h+t-p-1)B\left( \frac{h+t-p}{2},\frac{1}{2}\right) }, \end{aligned}$$
(21)

where \({\text {diag}}(\cdot )\) is the operator that takes the diagonal elements of the given matrix and \(B(\cdot ,\cdot )\) is the beta function.
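
Given the state \((\varvec{\mu }_{t},\varvec{\Sigma }_{t})\), the quantities in (21) can be evaluated directly; a Python sketch using SciPy’s beta function, with hypothetical inputs:

```python
import numpy as np
from scipy.special import beta

def mad_under_learning(mu_t, Sigma_t, h, t, p):
    """Conditional expected returns m_t and MADs q_t from Proposition 2, eq. (21)."""
    n = h + t
    m_t = mu_t
    q_t = 2.0 * np.sqrt((n + 1) * np.diag(Sigma_t)) / (
        np.sqrt(n) * (n - p - 1) * beta((n - p) / 2.0, 0.5)
    )
    return m_t, q_t

# hypothetical posterior state with p = 2 assets, h = 60 observations, t = 3
mu_t = np.array([0.011, 0.016])
Sigma_t = 65 * np.array([[0.0009, 0.0001], [0.0001, 0.0016]])
m_t, q_t = mad_under_learning(mu_t, Sigma_t, h=60, t=3, p=2)
```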

According to the definition of \(l_{t}(\varvec{x}_t)\) in (1) and the result in (21), it is clear that the optimal portfolio of our proposed model does not depend on the covariances between assets. Interestingly enough, the total portfolio variance, \(\varvec{x}_{t}^{\top }\varvec{\Sigma }\varvec{x}_{t}\), can still be related to our model, albeit in an implicit way. The formal statement is presented in the following proposition.

Proposition 3

Suppose \(\varvec{r}_{t}|\mathcal {D}_{t}\sim \mathcal {N}(\varvec{\mu },\varvec{\Sigma })\) for all \(t\in [0;T-1]\). For arbitrary \(\xi >0\) and \(\varvec{x}_t \in \mathcal {X}_t\), it holds that

$$\begin{aligned} 2\left [1-\Phi \left (\frac{\xi }{\sqrt{\varvec{x}_{t}^{\top }\varvec{\Sigma }\varvec{x}_{t}}}\right )\right ]\leqslant \frac{p}{\xi }l_{t}(\varvec{x}_{t}), \end{aligned}$$

where \(\Phi (\cdot )\) denotes the cumulative distribution function of the standard normal distribution and \(l_{t}(\varvec{x}_{t})\) is defined in (1).

The inequality in Proposition 3 shows that under the given assumptions, the total portfolio variance, \(\varvec{x}_{t}^{\top }\varvec{\Sigma }\varvec{x}_{t}\), that explicitly contains the covariance matrix of asset returns, will be small if \(l_{t}(\varvec{x}_{t})\) is kept small (nevertheless, it may not be true the other way around).
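
For intuition, under the stated normality assumption the left-hand side equals \(P(|(\varvec{r}_{t}-\varvec{\mu })^{\top }\varvec{x}_{t}|\geqslant \xi \mid \mathcal {D}_{t})\), and the bound then follows from Markov’s inequality and the triangle inequality (recall \(x_{t}^{j}\geqslant 0\)); a brief sketch:

$$\begin{aligned} 2\left[ 1-\Phi \left( \frac{\xi }{\sqrt{\varvec{x}_{t}^{\top }\varvec{\Sigma }\varvec{x}_{t}}}\right) \right] = P\left( |(\varvec{r}_{t}-\varvec{\mu })^{\top }\varvec{x}_{t}|\geqslant \xi \,\big |\,\mathcal {D}_{t}\right) \leqslant \frac{\mathbb {E}_{t}\left[ |(\varvec{r}_{t}-\varvec{\mu })^{\top }\varvec{x}_{t}|\right] }{\xi } \leqslant \frac{1}{\xi }\sum _{j=1}^{p}q_{t}^{j}x_{t}^{j}\leqslant \frac{p}{\xi }\,l_{t}(\varvec{x}_{t}). \end{aligned}$$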

As pointed out by Best and Grauer (1991a), the more highly correlated the asset returns, the more sensitive the portfolio holdings of the mean-variance model are to the expected returns. Hence, although it is intuitive to consider covariances in making portfolio decisions, an investment policy that removes the explicit dependence on asset covariances in the allocation step, such as the dynamic model proposed in this paper, may benefit from this “counter-intuitive” property in out-of-sample performance. We will provide more numerical evidence later to confirm the potential benefits of this property. It should be noted that the purpose of this paper is not to show that the \(l_{\infty }\) risk function is universally better than the variance. Instead, we aim to offer alternative investment models, different from the traditional mean-variance and utility-based models, for heterogeneous users, especially relatively conservative long-term investors, and also to provide some insights into handling parameter uncertainty in multi-period problems via Bayesian learning.

3.3 Least squares Monte Carlo method

According to the analysis in the previous subsections, the adjusted expected return vector \(\varvec{v}_{t}\) is one of the keys to obtaining the optimal policy. However, with learning in the dynamic model, its value involves conditional expectations of quantities in future periods and cannot be computed directly. Instead, we introduce a least squares Monte Carlo method to obtain an estimate of \(\varvec{v}_{t}\). This numerical method was first proposed for pricing American options in Longstaff and Schwartz (2001) and Tsitsiklis and Van Roy (2001). Recently, it has been applied to dynamic portfolio selection problems to estimate complex conditional expectations; see, e.g., Brandt et al. (2005), van Binsbergen and Brandt (2007), Diris et al. (2014), Lan (2014), Denault and Simonato (2017) and Zhang et al. (2019). For convergence analysis of this approach, one can refer to Clément et al. (2002), Stentoft (2004) and Tsitsiklis and Van Roy (2001).

Briefly speaking, the least squares Monte Carlo method consists of two parts: (i) replace each conditional expectation by a projection on a finite set of basis functions of the state variables; (ii) use Monte Carlo simulation and least squares regression to compute the estimated values under the replacement in (i), recursively starting from the terminal stage. We now give more details on the implementation.

First, note that \(\varvec{v}_{t}\) is an expectation conditional on the data \(\mathcal {D}_{t}\). According to the update rules in (14)–(16), it is clear that \((\varvec{\mu }_{t},\varvec{\Sigma }_{t})\) are sufficient state variables to describe the probability distributions. Denote the set of elements of \(\varvec{\mu }_{t}\) and the unique elements of \(\varvec{\Sigma }_{t}\) by \(\varvec{\theta }_{t}\). The evolution of \(\varvec{\theta }_{t}\) depends on the newly observed returns, their squares, and their cross-products. For \(t\in [0;T-2]\) and \(j\in [1;p]\), we can write

$$\begin{aligned} v_{t}^{j}= \frac{-1}{1-\lambda }\mathbb {E}\Big [\Big (c_{t+1}+\lambda \gamma _{t+1} z_{t+1} -(1-\lambda ) y_{t+1}\Big )r_{t}^{j}\Big |\varvec{\theta }_{t}\Big ] \end{aligned}$$
(22)

and

$$\begin{aligned} c_{t} = \mathbb {E}\Big [c_{t+1}+\lambda \gamma _{t+1} z_{t+1} -(1-\lambda ) y_{t+1}\Big |\varvec{\theta }_{t}\Big ], \end{aligned}$$
(23)

which are the two types of conditional expectations that we have to estimate in order to obtain \(\varvec{x}_{t}^{*}\). Essentially, the conditional expectations \(v_{t}^{j}\) and \(c_{t}\) can be viewed as functions of \(\varvec{\theta }_{t}\). The theory of Hilbert spaces tells us that any function belonging to such a space can be represented as a countable linear combination of basis vectors of the space (see Royden and Fitzpatrick 1988). Therefore, it is reasonable to approximate \(v_{t}^{j}\) and \(c_{t}\) by a set of basis functions as follows:

$$\begin{aligned} v_{t}^{j} = \sum _{m=1}^{M}a_{mt}^{j}\phi _{m}(\varvec{\theta }_{t}) \quad \text {and} \quad c_{t} = \sum _{m=1}^{M}a_{mt}^{c}\phi _{m}(\varvec{\theta }_{t}), \end{aligned}$$
(24)

where \(\{\phi _{m}(\varvec{\theta }_{t})\}_{m=1}^{M}\) are the M basis functions, and \(\{a_{mt}^{j}\}_{m=1}^{M}\) and \(\{a_{mt}^{c}\}_{m=1}^{M}\) are the coefficients for \(v_{t}^{j}\) and \(c_{t}\), respectively. In particular, \(v_{T-1}^{j} = \mu _{T-1}^{j}\), \(j\in [1;p]\), and \(c_{T-1} = -(1-\lambda )\) are obviously simple functions of \(\varvec{\theta }_{T-1}\). Note that we do not need to estimate \(\varvec{q}_{t}\), since given \(\varvec{\theta }_{t}\) one can compute it directly from (21).

We next employ Monte Carlo simulation and least squares regression to estimate \(\{a_{mt}^{j}\}_{m=1}^{M}\) and \(\{a_{mt}^{c}\}_{m=1}^{M}\). Consider a set of N simulated return paths, denoted by \(\{\varvec{r}_{t}^{(n)}\}_{n=1}^{N}\), \(t\in [0;T-1]\), generated following the Bayesian update rules in Sect. 3.2 with both \(\varvec{\mu }\) and \(\varvec{\Sigma }\) unknown. Denote the realized values of the state variables along path n by \(\{\varvec{\theta }_{t}^{(n)}\}_{t=1}^{T-1}\) and the corresponding MADs of the returns by \(\{\varvec{q}_{t}^{(n)}\}_{t=1}^{T-1}\), \(n\in [1;N]\). The algorithm works backwards from \(t=T-1\) to the current decision point \(t=0\). At the beginning of period \(T-1\), given \(\varvec{\theta }_{T-1}^{(n)}\), one can easily solve a single-period last-stage problem following Theorem 1 and obtain the estimated values \(\widehat{\varvec{v}}_{T-1}^{(n)}=\sum _{s=-h}^{T-2}\varvec{r}_{s}^{(n)}/(T-1+h)\), \(\hat{c}_{T-1}^{(n)}=-(1-\lambda )\), \(\hat{z}_{T-1}^{(n)}\) and \(\hat{y}_{T-1}^{(n)}\) in each path. At the beginning of period \(t<T-1\), we already know the estimated values \(\widehat{\varvec{v}}_{t+1}^{(n)}\), \(\hat{c}_{t+1}^{(n)}\), \(\hat{z}_{t+1}^{(n)}\) and \(\hat{y}_{t+1}^{(n)}\), \(n\in [1;N]\). Then, the realized values of \(v_{t}^{j}\) and \(c_{t}\) in path n are

$$\begin{aligned} v_{t}^{j(n)}= \frac{-1}{1-\lambda }\Big (\hat{c}_{t+1}^{(n)}+\lambda \gamma _{t+1} \hat{z}_{t+1}^{(n)} -(1-\lambda ) \hat{y}_{t+1}^{(n)}\Big )r_{t}^{j(n)} \end{aligned}$$
(25)

and

$$\begin{aligned} c_{t}^{(n)}= \hat{c}_{t+1}^{(n)}+\lambda \gamma _{t+1} \hat{z}_{t+1}^{(n)} -(1-\lambda ) \hat{y}_{t+1}^{(n)}. \end{aligned}$$

On the other hand, we have the basis function values \(\{\phi _{m}(\varvec{\theta }_{t}^{(n)})\}_{m=1}^{M}\). Therefore, the estimated coefficients \(\{\hat{a}_{mt}^{j}\}_{m=1}^{M}\), \(j\in [1;p]\), and \(\{\hat{a}_{mt}^{c}\}_{m=1}^{M}\) can be obtained by regression, that is, they are the solutions of the following minimization problems:

$$\begin{aligned} \min _{\{a_{mt}^{j}\}_{m=1}^{M}}\sum _{n=1}^{N}\Big [\sum _{m=1}^{M}a_{mt}^{j}\phi _{m}(\varvec{\theta }_{t}^{(n)})-v_{t}^{j(n)} \Big ]^{2}, \end{aligned}$$

and

$$\begin{aligned} \min _{\{a_{mt}^{c}\}_{m=1}^{M}}\sum _{n=1}^{N}\Big [\sum _{m=1}^{M}a_{mt}^{c}\phi _{m}(\varvec{\theta }_{t}^{(n)})-c_{t}^{(n)} \Big ]^{2}. \end{aligned}$$

The fitted values of the regressions, denoted by \(\{\widehat{v}_{t}^{j(n)}\}_{j=1}^{p}\) and \(\hat{c}_{t}^{(n)}\), \(n\in [1;N]\), constitute the estimates of the conditional expectations in (22) and (23). These estimates, in turn, yield estimates \(\hat{z}_{t}^{(n)}\) and \(\hat{y}_{t}^{(n)}\) for each path n following the results in Theorem 1. At the decision point \(t=0\), since the state variable is fixed at \(\varvec{\theta }_0\) for all paths, the fitted value from the regression simply reduces to \(\hat{v}_{0}^{j} = \sum _{n=1}^{N} v_{0}^{j(n)}/N\), \(j\in [1;p]\) (\(\hat{c}_{0}\) is now irrelevant to the portfolio decision). Based on \(\widehat{\varvec{v}}_{0}\) and \(\varvec{q}_0\), the investor can optimally allocate his fund among the assets following Theorem 1.
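
The regression step can be carried out with ordinary least squares on a small polynomial basis. The Python sketch below uses a linear basis of the state variables and synthetic targets purely for illustration; array shapes and the target construction are hypothetical.

```python
import numpy as np

def fit_conditional_expectation(theta, targets):
    """Project realized values on a linear basis of the state variables.

    theta   : (N, d) array of state variables theta_t^{(n)} across N simulated paths
    targets : (N,) array of realized values v_t^{j(n)} (or c_t^{(n)}) across paths
    Returns the fitted values, i.e., least squares estimates of the conditional
    expectation evaluated at each path's state.
    """
    basis = np.column_stack([np.ones(len(theta)), theta])      # constant + linear terms
    coeffs, *_ = np.linalg.lstsq(basis, targets, rcond=None)   # estimated a_{mt}
    return basis @ coeffs

# hypothetical example: N = 5000 paths, d = 5 state variables
rng = np.random.default_rng(2)
theta_t = rng.normal(size=(5000, 5))
realized_v = 0.01 + (theta_t @ rng.normal(size=5)) * 0.1 + rng.normal(scale=0.02, size=5000)  # synthetic targets
fitted_v = fit_conditional_expectation(theta_t, realized_v)    # estimates of E[v_t^j | theta_t]
```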

There are many basis functions that can be used for evaluating the conditional expectations, including Hermite, Legendre, Chebyshev and Laguerre polynomials, among others. Numerical evidence in, e.g., Longstaff and Schwartz (2001), Brandt et al. (2005) and van Binsbergen and Brandt (2007) indicates that the order of the polynomial need not be very high to obtain reliable estimates, and even a first-order (linear) polynomial of the state variables is an effective choice in practice.

For the path simulation, it should be noted that the sample paths of asset returns are simulated in a Bayesian context in order to perform learning. In each path n, once new data are revealed, we compute the updated state variables in \(\varvec{\theta }_{t}^{(n)}\). Given \(\varvec{\theta }_{t}^{(n)}\), we then simulate a new return data point along the path according to the multivariate t-distribution derived in (18). More simulation paths will of course improve the regression fit, but at the expense of more computational time, as the number of portfolio selection problems the algorithm needs to solve increases linearly with the number of simulated paths. The complete implementation process is presented in Algorithm 1.
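
A sketch of this path-simulation step: each new return is drawn from the predictive t-distribution in (18) (constructed from a chi-square and a normal draw) and the state variables are then updated via (16). The Python code below is a hypothetical illustration.

```python
import numpy as np

def draw_predictive_return(mu_t, Sigma_t, h, t, p, rng):
    """Draw r_t from the multivariate t predictive distribution in (18)."""
    nu = h + t - p                                          # degrees of freedom
    shape = (h + t + 1) / ((h + t) * nu) * Sigma_t          # shape matrix
    z = rng.multivariate_normal(np.zeros(p), shape)
    w = rng.chisquare(nu) / nu
    return mu_t + z / np.sqrt(w)

def simulate_path(mu_0, Sigma_0, h, T, p, rng):
    """Simulate one return path r_0, ..., r_{T-1} with Bayesian updating of the state."""
    mu, Sigma = mu_0.copy(), Sigma_0.copy()
    path = []
    for t in range(T):
        r = draw_predictive_return(mu, Sigma, h, t, p, rng)
        path.append(r)
        # update the state with the newly simulated return r_t, as in (16)
        w = (h + t) / (h + t + 1)
        diff = r - mu
        mu = w * mu + r / (h + t + 1)
        Sigma = Sigma + w * np.outer(diff, diff)
    return np.array(path)

rng = np.random.default_rng(3)
R_hist = rng.normal(0.01, 0.05, size=(60, 3))               # hypothetical h = 60 historical returns
mu_0 = R_hist.mean(axis=0)
Sigma_0 = (R_hist - mu_0).T @ (R_hist - mu_0)               # scale matrix as below (14)
path = simulate_path(mu_0, Sigma_0, h=60, T=4, p=3, rng=rng)
```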

Algorithm 1 The Optimal Investment Policy with Bayesian Learning

4 Plug-in model

4.1 Model formulation

Unlike the Bayesian portfolio selection model, the plug-in model solves the multi-period investment problem in the belief that the unknown parameters can be estimated precisely by sample estimates given historical data. Let \(\tilde{\varvec{m}}_{0} = (\tilde{m}_{0}^{1},...,\tilde{m}_{0}^{p})^{\top }\) and \(\tilde{\varvec{q}}_{0} = (\tilde{q}_{0}^{1},...,\tilde{q}_{0}^{p})^{\top }\) denote the point estimates of the unconditional expected returns and MADs, respectively, at the beginning of period 0. Given the h historical data points in \(\mathcal {D}_{0}\) and the normality assumption on asset returns, we have

$$\begin{aligned} \tilde{\varvec{m}}_{0} =\frac{1}{h}\sum _{s=-h}^{-1}\varvec{r}_{s} \quad \text {and} \quad \tilde{\varvec{q}}_{0} =\sqrt{\frac{2}{\pi (h-1)}{\text {diag}}(\varvec{\Sigma }_{0})}, \end{aligned}$$
(26)

where \(\varvec{\Sigma }_{0}\) has been defined in (14). Consistent with the Bayesian model, the cumulative risk during the investment horizon in this case is defined as

$$\begin{aligned} \tilde{L}_{T} = \sum _{t=0}^{T-1}\gamma _{t}\max _{1\leqslant j\leqslant p} \tilde{q}_{0}^{j}x_{t}^{j}, \end{aligned}$$
(27)

where \(\tilde{\varvec{q}}_{0}\) is used for all future periods \(t\in [0;T-1]\), which reflects the ignorance of parameter uncertainty and estimation risk in decision making under the plug-in model.
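
A small Python sketch of the plug-in estimates in (26), with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(4)
R_hist = rng.normal(0.01, 0.05, size=(60, 3))        # hypothetical h = 60 historical returns, p = 3

m_tilde = R_hist.mean(axis=0)                        # sample mean returns
Sigma0 = (R_hist - m_tilde).T @ (R_hist - m_tilde)   # sum-of-squares matrix as in (14)
q_tilde = np.sqrt(2.0 * np.diag(Sigma0) / (np.pi * (len(R_hist) - 1)))   # plug-in MAD estimates
```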

Accordingly, we set up the following multi-period optimization problem for the plug-in model,

$$\begin{aligned} \begin{aligned}&\min _{\varvec{x}_{0}\in \mathcal {X}_{0},\cdots ,\varvec{x}_{T-1}\in \mathcal {X}_{T-1}}\mathbb {E}\Big [\lambda \tilde{L}_{T} - (1-\lambda )V_{T}\Big ]\\&\qquad \qquad \text {s.t.} \quad V_{t+1} = V_{t}+\varvec{r}_{t}^{\top }\varvec{x}_{t}, \quad t \in [0;T-1]. \end{aligned} \end{aligned}$$
(28)

The policy is self-financing, so we have \(\mathbb {E}(V_{T}) = V_{0}+\sum _{t=0}^{T-1}\mathbb {E}(\varvec{r}_{t})^{\top }\varvec{x}_{t}\). Again, in the plug-in model, we replace the unconditional expected returns \(\mathbb {E}(\varvec{r}_{t})\) with the known sample estimate \(\tilde{\varvec{m}}_{0}\) for all \(t\in [0;T-1]\) to emphasize that this model ignores parameter uncertainty.

4.2 The optimal policy

Problem (28) is a standard dynamic program and its solution can be obtained by backward induction. We present the optimal policy in the following proposition.

Proposition 4

Given non-negative \(\{\gamma _{t}\}_{t=0}^{T-1}\) and \(\lambda \in (0,1)\), the optimal policy of Problem (28) is such that for each stage \(t\in [0;T-1]\), if \(j\notin \tilde{\mathcal {A}}_{t}^{*}\), then \(\tilde{x}_{t}^{j*}= 0\); if \(j\in \tilde{\mathcal {A}}_{t}^{*}\), then \(\tilde{x}_{t}^{j*} =V_{t}/(\tilde{q}_{0}^{j}\sum _{j\in \tilde{\mathcal {A}}_{t}^{*}} 1/\tilde{q}_{0}^{j})\), where \(\tilde{\mathcal {A}}_{t}^{*}\) can be determined by the rule: When \(\gamma _{t}>0\), if there exists an integer \(k \in [0;p-2]\) such that \(G(\tilde{\varvec{v}}_{t},\gamma _{t}\tilde{\varvec{q}}_{0},k)<\lambda /(1-\lambda )\) and \(G(\tilde{\varvec{v}}_{t},\gamma _{t}\tilde{\varvec{q}}_{0},k+1) \geqslant \lambda /(1-\lambda )\), then \(\tilde{\mathcal {A}}_{t}^{*} = \{i_{p}(\tilde{\varvec{v}}_{t}),...,i_{p-k}(\tilde{\varvec{v}}_{t})\}\); otherwise, \(\tilde{\mathcal {A}}_{t}^{*} = [1;p]\). When \(\gamma _{t}=0\), \(\tilde{\mathcal {A}}_{t}^{*} = \{i_{p}(\tilde{\varvec{v}}_{t}) \}\). The vector \(\tilde{\varvec{v}}_{t}\) is recursively defined as: For \(t=T-1\), \(\tilde{\varvec{v}}_{t} = \tilde{\varvec{m}}_{0}\) and \(\tilde{c}_{t} = -(1-\lambda )\). For \(t\in [0;T-2]\), \(\tilde{\varvec{v}}_{t} = -\tilde{c}_{t}\tilde{\varvec{m}}_{0}/(1-\lambda )\) and \(\tilde{c}_{t} =\tilde{c}_{t+1} + \lambda \gamma _{t+1} \tilde{z}_{t+1}-(1-\lambda )\tilde{y}_{t+1}\) where for each \(t\in [0;T-1]\), \(\tilde{z}_{t}=1/(\sum _{j\in \tilde{\mathcal {A}}_{t}^{*}}1/\tilde{q}_{0}^{j})\) and \(\tilde{y}_{t}=\tilde{z}_{t}\sum _{j\in \tilde{\mathcal {A}}_{t}^{*}}\tilde{v}_{t}^{j}/\tilde{q}_{0}^{j}\).
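Because the plug-in recursion is deterministic given \(\tilde{\varvec{m}}_{0}\) and \(\tilde{\varvec{q}}_{0}\), the whole sequence of portfolio weights can be computed offline by backward induction. The Python sketch below (hypothetical inputs) reimplements the selection and allocation rules, in the same form as (7)–(8), applied to \((\tilde{\varvec{v}}_{t},\gamma _{t}\tilde{\varvec{q}}_{0})\).

```python
import numpy as np

def stage_solution(v, q, lam, gamma, V=1.0):
    """Selection rule applied to (v, gamma*q) and allocation rule of Proposition 4."""
    p, ratio = len(v), lam / (1 - lam)
    order = np.argsort(v)[::-1]                     # assets ranked by v, largest first
    def G(k):
        if k == 0:
            return 0.0
        ref = v[order[k]]
        return sum((v[i] - ref) / (gamma * q[i]) for i in order[:k])
    if gamma == 0:
        sel = order[:1]                             # invest only in the asset with the largest v
    else:
        sel = order
        for k in range(p - 1):
            if G(k) < ratio <= G(k + 1):
                sel = order[:k + 1]
                break
    z = 1.0 / np.sum(1.0 / q[sel])
    x = np.zeros(p)
    x[sel] = V * z / q[sel]
    y = z * np.sum(v[sel] / q[sel])
    return x, y, z

def plug_in_policy(m0, q0, lam, gammas):
    """Backward recursion of Proposition 4; returns the weights x_t / V_t per stage."""
    T = len(gammas)
    weights = [None] * T
    v, c = m0.copy(), -(1.0 - lam)                  # v_{T-1}, c_{T-1}
    for t in reversed(range(T)):
        weights[t], y, z = stage_solution(v, q0, lam, gammas[t])
        if t > 0:                                   # prepare c_{t-1} and v_{t-1}
            c = c + lam * gammas[t] * z - (1.0 - lam) * y
            v = -c * m0 / (1.0 - lam)
    return weights

# hypothetical inputs: p = 3 assets, T = 4 periods, equal risk weights
m0 = np.array([0.012, 0.018, 0.009])
q0 = np.array([0.030, 0.055, 0.020])
print(plug_in_policy(m0, q0, lam=0.6, gammas=[1.0, 1.0, 1.0, 1.0]))
```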

Similar to the case with Bayesian learning, we can further rewrite the AER vector \(\tilde{\varvec{v}}_{t}\) in Proposition 4 as \(\tilde{\varvec{v}}_{t} = (1+\tilde{\Delta }_{t+1})\tilde{\varvec{m}}_{0}\) where

$$\begin{aligned} \tilde{\Delta }_{t+1} = \sum _{s=t+1}^{T-1}\left( \frac{\tilde{\varvec{v}}_{s}^{\top }\tilde{\varvec{x}}_{s}^{*}}{V_{s}} -\frac{\lambda \gamma _{s}}{1-\lambda } \tilde{z}_{s}\right) . \end{aligned}$$
(29)

In addition, notice that given the sample estimates \(\tilde{\varvec{m}}_{0}\) and \(\tilde{\varvec{q}}_{0}\), the optimal portfolio weights as percentages of wealth, \(\{ \tilde{\varvec{x}}_{t}^{*}/V_{t}\}_{t=0}^{T-1}\), are exactly known at the beginning of period 0. Therefore, the plug-in investor can choose to follow the deterministic policy \(\{\tilde{\varvec{x}}_{t}^{*}/V_{t}\}_{t=0}^{T-1}\) to allocate his fund at each decision point, ignoring the newly released data. A “wiser” alternative is that the plug-in investor uses only \(\tilde{\varvec{x}}_{0}^{*}\) to decide the portfolio at time point 0 and updates the sample estimates with the newly observed returns for decisions in future periods. Specifically, according to (26), the update rule is simply that, for \(t\in [0;T-1]\),

$$\begin{aligned} \tilde{\varvec{m}}_{t} =\frac{1}{h+t}\sum _{s=-h}^{t-1}\varvec{r}_{s} \quad \text {and} \quad \tilde{\varvec{q}}_{t} =\sqrt{\frac{2}{\pi (h+t-1)}{\text {diag}}(\varvec{\Sigma }_{t})}, \end{aligned}$$

where \(\varvec{\Sigma }_{t}\) is defined in (16). Then, at the beginning of period t, the plug-in investor solves Problem (28) with updated point estimates \(\tilde{\varvec{m}}_{t}\) and \(\tilde{\varvec{q}}_{t}\), and only \(\tilde{\varvec{x}}_{t}^{*}\) is used to construct the portfolio in stage t. We will present the performance of the above two decision approaches for the plug-in investor in our numerical study.
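
As an illustration, a minimal sketch of this update rule, assuming \(\varvec{\Sigma }_{t}\) in (16) is the sample covariance matrix (with the usual \(1/(n-1)\) scaling) of all returns observed up to the end of period \(t-1\); the function name is ours.

```python
import numpy as np

def plugin_estimates(returns_history):
    """Point estimates (m_t, q_t) from all observed returns r_{-h}, ..., r_{t-1} (sketch).

    returns_history : (h + t, p) array, one row per observed monthly return vector.
    """
    n = returns_history.shape[0]                          # n = h + t observations
    m_t = returns_history.mean(axis=0)                    # sample mean
    Sigma_t = np.cov(returns_history, rowvar=False)       # sample covariance (1/(n-1) scaling)
    q_t = np.sqrt(2.0 / (np.pi * (n - 1)) * np.diag(Sigma_t))
    return m_t, q_t
```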

In the financial economics literature, the difference in optimal portfolios between a long-term and a short-term investor is often identified as the hedging demand, whose theoretical foundation dates back to Merton (1969). Here, we compare the multi-period (dynamic) and single-period (myopic) solutions under the plug-in model; the following four cases are considered.

(a) \(\tilde{\Delta }_{t+1}>0\). Because \(G(\tilde{\varvec{v}}_{t},\tilde{\varvec{q}}_{0},k)\geqslant G(\tilde{\varvec{m}}_{0},\tilde{\varvec{q}}_{0},k)\) for all \(k\in [0;p-1]\), the dynamic solution is more aggressive than the myopic solution under a positive future prospect. Specifically, the dynamic solution tends to select fewer assets and focuses on assets with high expected returns to increase the portfolio value.

(b) \(\tilde{\Delta }_{t+1}=0\). The dynamic solution degenerates to the myopic solution.

(c) \(-1\leqslant \tilde{\Delta }_{t+1}<0\). Because \(G(\tilde{\varvec{v}}_{t},\tilde{\varvec{q}}_{0},k)\leqslant G(\tilde{\varvec{m}}_{0},\tilde{\varvec{q}}_{0},k)\) for all \(k\in [0;p-1]\), the dynamic solution is more conservative than the myopic solution, selecting more assets to diversifyFootnote 4 the portfolio under a poor future prospect.

(d) \(\tilde{\Delta }_{t+1}<-1\). The plug-in investor has an extremely poor future prospect and the ordering of elements in \(\tilde{\varvec{v}}_{t}\) is reversed. Suppose that the MADs in \(\tilde{\varvec{q}}_{0}\) are ordered in the same way as their expected returns in \(\tilde{\varvec{m}}_{0}\). In this case, the dynamic solution prefers investing in assets with low expected returns (e.g., some bonds or treasury bills instead of stocks) to control risk.

4.3 Comparison with Bayesian model

So far, we have presented the optimal policies under the Bayesian and plug-in models in Theorem 1 and Proposition 4, respectively. We now discuss the impact of incorporating Bayesian learning on the investor’s decision making by comparing these two policies.

Comparing \(\tilde{\varvec{v}}_{t}\) and \(\varvec{v}_{t}\), we see that \(\tilde{\varvec{v}}_{t}\) is a deterministic function of the sample estimates \(\tilde{\varvec{m}}_{0}\) and \(\tilde{\varvec{q}}_{0}\) computed from historical returns, while \(\varvec{v}_{t}\) is a conditional expectation adapted to the available information \(\mathcal {F}_{t}\) in a Bayesian fashion. In particular, for each \(j\in [1;p]\), \(v_{t}^{j}\) can be understood as an expected value across different realizations of future returns and state variables given the data observed at the beginning of period t. That is, given \(\varvec{\theta }_{t}\), the evaluation of \(\varvec{v}_{t}\) anticipates that the future prospect may deviate from what has been revealed by the historical information in \(\varvec{\theta }_{t}\). In contrast, by ignoring parameter uncertainty, the plug-in model fully “trusts” the historical information and believes that risky assets will perform in the future exactly as they did in the past, which may lead to extreme AER values, especially under poor parameter estimates with large errors.

On the other hand, by accounting for the extra source of uncertainty in the unknown parameters, Bayesian learning makes the risky assets appear riskier, in the sense that the MADs estimated in the Bayesian model are larger than those of the plug-in model. Specifically, we can rewrite \(\varvec{q}_{0}\) and \(\tilde{\varvec{q}}_0\) as \(\varvec{q}_{0}= \beta _{L}(h)\sqrt{\text {diag}(\varvec{\Sigma }_{0})}\) and \(\tilde{\varvec{q}}_0=\beta _{P}(h)\sqrt{\text {diag}(\varvec{\Sigma }_{0})}\) according to (21) and (26), where

$$\begin{aligned} \beta _{L}(h) = \frac{2\sqrt{h+1}}{\sqrt{h}(h-p-1)B\left( \frac{h-p}{2},\frac{1}{2}\right) }\quad \text {and}\quad \beta _{P}(h)=\sqrt{\frac{2}{\pi (h-1)}}. \end{aligned}$$

Both \(\beta _{L}(h)\) and \(\beta _{P}(h)\) are functions of the amount of historical data h. A more quantitative comparison of these two functions is provided in Fig. 1, where it can be easily observed that \(\beta _{L}(h)>\beta _{P}(h)\), especially when h is small. As time passes and more data are revealed, the issue of parameter uncertainty is alleviated and thus \(\beta _{L}(h)\) gradually approaches \(\beta _{P}(h)\) in Fig. 1. Moreover, with larger MADs \(\varvec{q}_{0}\), more assets are likely to be selected for investment in the Bayesian model than in the plug-in model because \(G(\varvec{v}_{0}, \varvec{q}_{0}, k) \leqslant G(\varvec{v}_{0}, \tilde{\varvec{q}}_{0}, k)\) for all \(k\in [0;p-1]\) and any given \(\varvec{v}_{0}\).

Fig. 1 Comparison of \(\beta _{L}(h)\) and \(\beta _{P}(h)\)
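
For a quick numerical check of Fig. 1, the two factors can be evaluated directly from the expressions above; the sketch below fixes the number of assets p (e.g., p=17 as in the data set used later), which is our assumption about the figure's setting.

```python
import numpy as np
from scipy.special import beta as B   # Euler Beta function B(a, b)

def beta_L(h, p):
    # Bayesian-model factor; the expression requires h > p + 1
    return 2.0 * np.sqrt(h + 1) / (np.sqrt(h) * (h - p - 1) * B((h - p) / 2.0, 0.5))

def beta_P(h):
    # plug-in factor
    return np.sqrt(2.0 / (np.pi * (h - 1)))

p = 17
for h in (30, 60, 120, 240, 480):
    print(h, round(beta_L(h, p), 4), round(beta_P(h), 4))   # beta_L > beta_P, gap shrinks in h
```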

5 Numerical study

In this numerical study, we first investigate the role of Bayesian learning in the optimal portfolio decision. Then, an out-of-sample performance test is provided for models with and without Bayesian learning based on real market data.

5.1 Data

The market data used in this study consist of monthly return data of 17 industry portfolios from August 1989 to July 2019, that is, \(p=17\) with 30 years of monthly return data. These data are accessible on the website of Ken French.Footnote 5 In Table 1, we list the industry portfolio names and report the expectations and standard deviations of the posterior distributions of the unknown parameters in \(\varvec{\mu }\) and \(\varvec{\Sigma }\) given the full data sample. Specifically, in Table 1 we compute \(\mathbb {E}_{0}(\mu ^{j})\), \(\mathbb {E}_{0}((\varvec{\Sigma })_{jj})\), \(\sqrt{{\text {Var}}(\mu ^{j}|\mathcal {D}_{0})}\) and \(\sqrt{{\text {Var}}((\varvec{\Sigma })_{jj}|\mathcal {D}_{0})}\), \(j\in [1;17]\), following the equations in (19) and (20), with \(\mathcal {D}_{0}\) containing the full data sample from August 1989 to July 2019. The results in Table 1 show that the expectations of \(\{\mu ^{j}\}_{j=1}^{17}\) conditional on \(\mathcal {D}_{0}\) range from 0.468 to 0.992%, with standard deviations \(\{\sqrt{{\text {Var}}(\mu ^{j}|\mathcal {D}_{0})}\}_{j=1}^{17}\) varying from 0.209 to 0.443%. Compared with the standard deviations of the return variances \(\{\sqrt{{\text {Var}}((\varvec{\Sigma })_{jj}|\mathcal {D}_{0})}\}_{j=1}^{17}\), which range from 0.012 to 0.054%, it appears that the uncertainty in return means is the dominant source of parameter uncertainty for investors.

Table 1 Data description (\(\times 10^{-2}\))

5.2 The role of Bayesian learning

Table 2 Role of Bayesian learning (\(\times 10^{-2}\))

As analyzed in Sect. 4.3, the Bayesian model is likely to produce a more diversified portfolio than the plug-in model. We illustrate this phenomenon with an experiment whose results are presented in Table 2, where we report the conditional expectations \(\mathbb {E}_{0}(\varvec{\mu })\), their standard deviations \(\sqrt{{\text {Var}}(\varvec{\mu }|\mathcal {D}_{0})}\), the MADs of the two investors, and the portfolio positions under two scenarios, i.e., “Normal” and “High”, with monthly return data from February 2017 to July 2019 (i.e., \(h=30\))Footnote 6 contained in \(\mathcal {D}_{0}\). We use the least squares Monte Carlo method to estimate the AERs in \(\varvec{v}_{0}\) with \(N=20000\) and basis functions comprising the state variables and their quadratic values, that is, \(M=341\) and

$$\begin{aligned} \begin{aligned} (\phi _{1}(\varvec{\theta }_{0}),...,\phi _{341}(\varvec{\theta }_{0})) =&\Big (1,\mu _{0}^{1},...,\mu _{0}^{17},(\varvec{\Sigma }_{0})_{1,1},...,\{(\varvec{\Sigma }_{0})_{i,j}\}_{i\le j},...,(\varvec{\Sigma }_{0})_{17,17},\\&(\mu _{0}^{1})^{2},...,(\mu _{0}^{17})^{2},(\varvec{\Sigma }_{0})_{1,1}^{2},...,\{(\varvec{\Sigma }_{0})^{2}_{i,j}\}_{i\le j},...,(\varvec{\Sigma }_{0})_{17,17}^{2}\Big ). \end{aligned} \end{aligned}$$
(30)
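
The basis vector in (30) can be assembled directly from the state variables \(\varvec{\mu }_{0}\) and \(\varvec{\Sigma }_{0}\), as in the following sketch (function name ours); for \(p=17\) this gives \(1+17+153+17+153=341\) regressors.

```python
import numpy as np

def basis_functions(mu, Sigma):
    """Basis vector of (30): constant, state variables and their element-wise squares.

    mu    : length-p vector of posterior means mu_0
    Sigma : (p, p) posterior scale matrix Sigma_0
    """
    mu = np.asarray(mu, float)
    sig = np.asarray(Sigma, float)[np.triu_indices(len(mu))]   # entries (Sigma_0)_{i,j}, i <= j
    return np.concatenate(([1.0], mu, sig, mu**2, sig**2))
```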

In Table 2, the “Normal” scenario means that the results are obtained under the real estimates \(\varvec{\mu }_{0}=\tilde{\varvec{m}}_{0}=\sum _{s=-h}^{-1}\varvec{r}_{s}/h\), while “High” corresponds to the outcomes with \(\mu _{0}^{j}=\tilde{m}_{0}^{j}=\sum _{s=-h}^{-1}r_{s}^{j}/h+\sqrt{{\text {Var}}(\mu ^{j}|\mathcal {D}_{0})}\), \(j\in [1;17]\). In other words, we add a perturbation (plus one standard deviation) to the sample average of the historical returns in \(\mathcal {D}_{0}\) and examine the responses of \(\varvec{v}_{0}\), \(\tilde{\varvec{v}}_{0}\) and the optimal portfolio allocations to the new mean estimates in the “High” scenario. The investment horizon is set to \(T=6\) months with \(\lambda =0.4\) and \(\gamma _{t}=1\), \(\forall t\in [0;5]\).

According to Table 2, the results in the “High” scenario show that \(\varvec{v}_{0}^{\text {H}}\) is uniformly lower than \(\tilde{\varvec{v}}_{0}^{\text {H}}\), with a smaller range (\(1.713\times 10^{-2}\) in \(\varvec{v}_{0}^{\text {H}}\) versus \(1.938\times 10^{-2}\) in \(\tilde{\varvec{v}}_{0}^{\text {H}}\)), such that \(G(\varvec{v}_{0}^{\text {H}}, \varvec{q}_{0}, 16)<G(\varvec{v}_{0}^{\text {H}}, \tilde{\varvec{q}}_{0}, 16) < G(\tilde{\varvec{v}}_{0}^{\text {H}}, \tilde{\varvec{q}}_{0}, 16)\). It is thus reasonable to expect a more diversified portfolio in the Bayesian model than in the plug-in model. Consistent with our analysis, we note that in the “High” scenario the plug-in investor selects 6 assets whereas the Bayesian investor selects 8 assets, with their portfolio weights denoted by \(\tilde{\varvec{x}}_{0}^{*\text {H}}\) and \(\varvec{x}_{0}^{*\text {H}}\), respectively.

On the other hand, incorporating Bayesian learning to account for parameter uncertainty can also reduce the sensitivity of optimal portfolios to changes in model inputs. To show this, we follow a procedure similar to that of Best and Grauer (1991a) and focus on the results under the “Normal” scenario in Table 2, where \(\varvec{v}_{0}\) and \(\tilde{\varvec{v}}_{0}\) are close to each other and \(\varvec{x}_{0}^{*}\) is the same as \(\tilde{\varvec{x}}_{0}^{*}\). Among the 8 selected assets, for both the Bayesian and plug-in investors, asset \(S_{15}\) and asset \(S_{16}\) feature the maximal and the minimal AER values, respectively. We measure the model sensitivity by comparing the sizes of the shifts in the largest AER (\(v_{0}^{15}\) and \(\tilde{v}_{0}^{15}\)) required to drive \(S_{16}\) out of the original optimal portfolio (\(\varvec{x}_{0}^{*}\) and \(\tilde{\varvec{x}}_{0}^{*}\)). It turns out that, to drive \(S_{16}\) out of \(\varvec{x}_{0}^{*}\), \(v_{0}^{15}\) must increase by at least 132.6% (from \(1.688\times 10^{-2}\) to \(3.926\times 10^{-2}\)). In contrast, the required increase in \(\tilde{v}_{0}^{15}\) is only 16.1% (from \(1.669\times 10^{-2}\) to \(1.938\times 10^{-2}\)). Similar results are obtained by decreasing \(q_{0}^{15}\) and \(\tilde{q}_{0}^{15}\) to drive asset \(S_{16}\) out of \(\varvec{x}_{0}^{*}\) and \(\tilde{\varvec{x}}_{0}^{*}\) (a decrease of 78.2% in \(q_{0}^{15}\) versus 29.4% in \(\tilde{q}_{0}^{15}\)). These results provide evidence that the model with Bayesian learning is more robust to changes in model inputs than the plug-in model that ignores parameter uncertainty and estimation risk. Technically, the robustness gained by the Bayesian model comes from the higher estimates of the MADs \(\varvec{q}_{t}\), which attenuate the impact of changes in \(\varvec{v}_{t}\) and \(\varvec{q}_{t}\) on the function G. Again, consistent with our analysis, we observe that \(\varvec{x}_{0}^{*}\) remains unchanged in the “High” scenario, whereas the plug-in investor selects fewer assets in the “High” scenario, with 8 assets in \(\varvec{x}_{0}^{*}\) versus 6 assets in \(\tilde{\varvec{x}}_{0}^{*\text {H}}\).

5.3 Out-of-sample performance

In this section, we provide the out-of-sample performance of policies with and without Bayesian learning to further support our findings and analysis.

5.3.1 Models

Six models are considered in this out-of-sample test. We use BL to refer to the model with Bayesian learning, whose policy is derived by solving Problem (6). Two models for the plug-in investor based on Problem (28) are included, denoted PI-1 and PI-2. PI-2 follows the deterministic policy solved at the beginning of period 0, ignoring the newly released data. PI-1 keeps updating the point estimates \(\tilde{\varvec{m}}_{t}\) and \(\tilde{\varvec{q}}_{t}\), and only \(\tilde{\varvec{x}}_{t}^{*}\) is used to construct the optimal portfolio in period t. Although both BL and PI-1 utilize the newly released data, the difference is that BL incorporates estimation risk and Bayesian learning while PI-1 does not. We also introduce two single-period models denoted SP-1 and SP-2. SP-1 is a single-period model with the \(l_{\infty }\) risk measure,

$$\begin{aligned} \min _{\varvec{x}\in \mathcal {X}} \quad \lambda \max _{1\leqslant j\leqslant p} \tilde{q}_{0}^{j}x^{j} - (1-\lambda )\tilde{\varvec{m}}_{0}^{\top }\varvec{x}, \end{aligned}$$
(31)

where \(\mathcal {X} = \left\{ \varvec{x}:\sum _{j=1}^{p}x^{j} = V_{0}, x^{j}\geqslant 0,j\in [1;p]\right\}\). SP-2 is a mean-variance-type model. To make SP-1 and SP-2 comparable, we use the portfolio standard deviation, instead of the variance, as the risk measure in SP-2; that is, we solve the following problem in SP-2,

$$\begin{aligned} \min _{\varvec{x}\in \mathcal {X}} \quad \lambda \Vert \tilde{\varvec{\Sigma }}_{0}^{1/2}\varvec{x}\Vert _{2} - (1-\lambda )\tilde{\varvec{m}}_{0}^{\top }\varvec{x}, \end{aligned}$$
(32)

where \(\tilde{\varvec{\Sigma }}_{0}\) is the sample covariance matrix given \(\mathcal {D}_{0}\), \(\tilde{\varvec{\Sigma }}_{0}^{1/2}\) is its matrix square root and \(\Vert \cdot \Vert _{2}\) is the 2-norm. The naive equally weighted portfolio, denoted 1/p, is included as well.
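
Problem (31) can be reformulated as a linear program by introducing an epigraph variable for the maximum term. A minimal sketch follows; the solver choice and function name are ours, as the paper does not prescribe a solver for SP-1.

```python
import numpy as np
from scipy.optimize import linprog

def solve_sp1(m0, q0, lam, V0=1.0):
    """Solve (31) as an LP: min lam*s - (1-lam)*m0'x  s.t.  q0^j x^j <= s, sum(x) = V0, x >= 0.

    Decision vector is (x_1, ..., x_p, s), with s the epigraph variable for the max term.
    """
    m0, q0 = np.asarray(m0, float), np.asarray(q0, float)
    p = len(m0)
    c = np.concatenate((-(1.0 - lam) * m0, [lam]))
    A_ub = np.hstack((np.diag(q0), -np.ones((p, 1))))     # q0^j x^j - s <= 0 for every j
    b_ub = np.zeros(p)
    A_eq = np.concatenate((np.ones(p), [0.0]))[None, :]   # budget constraint sum_j x^j = V0
    b_eq = np.array([V0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (p + 1))
    return res.x[:p]                                      # optimal portfolio weights
```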

For BL, the AER vector \(\varvec{v}_{t}\) is estimated by the least squares Monte Carlo method with the same settings as those used in Table 2. The SP-2 model (32) is solved by CPLEX. The solutions of the other models follow Lemma 1, Theorem 1 and Proposition 4.

5.3.2 Setup

The out-of-sample test follows a rolling-horizon procedure. We first choose an estimation window with h data points as training data. The investment horizon is T, so the following T data points are used as out-of-sample test data. Every month, BL and PI-1 are allowed to use all available data to update their model inputs and rebalance their portfolio holdings. The parameter estimates of PI-2, SP-1 and SP-2 depend only on the h training data points. During the investment horizon, PI-2 follows its deterministic policy solved at the beginning of period 0, while SP-1 and SP-2 repeatedly use their myopic solutions solved at the beginning of period 0. At the end of the terminal stage, this investment process produces exactly T out-of-sample monthly portfolio returns for each of the six models. Then, we repeat the test for the next investment horizon by moving the data window forward by T months, again taking the first h data points as training data and the following T points as test data, until the end of the data set is reached. In this test, we set \(T=6\), \(V_0 =1\), \(\lambda \in \{0.2,0.8\}\), \(\gamma _{t}=1\) for all \(t\in [0;T-1]\) and \(h=30\).

Based on the selected real data sequence from August 1989 to July 2019 (360 months), we further randomly generate nine time-permuted return sequences. Note that we focus on the performance of the terminal portfolio return over the T-period investment. Since \(T=6\), for each return sequence we have 360/6 = 60 terminal portfolio returns and, in total, we collect \(60\times 10=600\) terminal portfolio returns over the 10 monthly return sequences. The T-period investment test is thus repeated 600 times. For ease of reference, we call the kth T-period investment test the kth iteration, where \(k=1,2,...,600\).

5.3.3 Metrics

The performance metrics for the models include the mean (MEAN), standard deviation (STD), Sharpe ratio (SR) and value-at-risk at the 95% and 99% levels (VaR95%, VaR99%) of the 600 out-of-sample terminal portfolio returns. Besides these, we also report the portfolio turnover, which is defined as

$$\begin{aligned} \text {PTO} = \frac{1}{600}\sum _{k=1}^{600}\sum _{t=0}^{T-1}\Vert \varvec{x}_{k,t}-\varvec{x}_{k,t^{-}} \Vert _{1}, \end{aligned}$$

where \(\Vert \cdot \Vert _{1}\) denotes the 1-norm, \(\varvec{x}_{k,t}\) is the desired portfolio weight vector in iteration k at the beginning of period t, and \(\varvec{x}_{k,t^{-}}\) is the portfolio weight before rebalancing but after the realization of the actual asset returns based on \(\varvec{x}_{k,t-1}\). We set \(\varvec{x}_{k,0^{-}}=\varvec{0}\). The metric PDM, short for portfolio diversification measure, records the average number of assets with positive weights in one period and is defined as

$$\begin{aligned} \text {PDM} = \frac{1}{600T}\sum _{k=1}^{600}\sum _{t=0}^{T-1}\sum _{j=1}^{p}\mathbbm {1}\{ \varvec{x}_{k,t}^{j}>0 \}. \end{aligned}$$

where \(\mathbbm {1}\{\cdot \}\) is the indicator function, which equals one if \(\varvec{x}_{k,t}^{j}>0\) and zero otherwise. Moreover, we also compute the sum of squares of portfolio weights (SSQ) as another index of the fund distribution, defined as

$$\begin{aligned} \text {SSQ} = \frac{1}{600}\sum _{k=1}^{600}\sum _{t=0}^{T-1}\Big \Vert \frac{\varvec{x}_{k,t}}{V_{k,t}} \Big \Vert _{2}, \end{aligned}$$

where \(V_{k,t}\) is the initial wealth at the beginning of period t in iteration k. Finally, the portfolio sensitivity measure (PSM) (see Palczewski and Palczewski 2014) is given by

$$\begin{aligned} \text {PSM} = \frac{1}{600}\sum _{k=1}^{600}\sum _{t=0}^{T-1}\Big \Vert \frac{\varvec{x}_{k,t}}{V_{k,t}}-\varvec{w}_{t}^{*} \Big \Vert _{2}, \end{aligned}$$

where \(\varvec{w}_{t}^{*}\) contains the optimal percentage portfolio weights at the beginning of period t under full knowledge of the return distribution. For BL, PI-1 and PI-2, \(\varvec{w}_{t}^{*}\) is approximated by the optimal policy of a plug-in model using the full data sample. For SP-1 and SP-2, \(\varvec{w}_{t}^{*}\) is approximated by the solutions of (31) and (32), respectively, given the full data sample.
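
For completeness, a sketch of how PTO, PDM, SSQ and PSM can be computed from the simulated weight trajectories; the array layout and names are ours, with `x_pre` standing for the pre-rebalancing weights \(\varvec{x}_{k,t^{-}}\).

```python
import numpy as np

def performance_metrics(x, x_pre, V, w_star):
    """Compute PTO, PDM, SSQ and PSM from simulated trajectories (sketch).

    x, x_pre : (K, T, p) arrays of post- and pre-rebalancing dollar positions x_{k,t}, x_{k,t^-}
    V        : (K, T) array of wealth V_{k,t} at the start of each period
    w_star   : (T, p) array of full-information percentage weights w_t^*
    """
    pto = np.abs(x - x_pre).sum(axis=2).sum(axis=1).mean()        # average total 1-norm turnover
    pdm = (x > 0).sum(axis=2).mean()                              # average number of assets held
    frac = x / V[..., None]                                       # percentage weights x_{k,t} / V_{k,t}
    ssq = np.linalg.norm(frac, axis=2).sum(axis=1).mean()
    psm = np.linalg.norm(frac - w_star[None, :, :], axis=2).sum(axis=1).mean()
    return pto, pdm, ssq, psm
```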

Following DeMiguel et al. (2009), we also consider transaction costs in an ex post way. Specifically, the transaction cost arising from the portfolio turnover at the beginning of period t is quantified by \(\kappa \Vert \varvec{x}_{k,t}-\varvec{x}_{k,t^{-}} \Vert _{1}\), where we set \(\kappa =0.002\). After deducting the total transaction costs over the T periods, we obtain a net terminal portfolio return for each iteration and then compute the Sharpe ratio net of transaction costs (NSR) based on the 600 net terminal portfolio returns. To test whether the Sharpe ratios (SR/NSR) of two models are statistically distinguishable, we also compute the p value of the difference following Jobson and Korkie (1981) and Memmel (2003).Footnote 7
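
For reference, one common implementation of this test uses the Jobson-Korkie statistic with Memmel's correction. The variance expression below is the form widely quoted in the portfolio literature (e.g., in DeMiguel et al. 2009) and is our reconstruction rather than the paper's own code; it should be checked against the cited references.

```python
import numpy as np
from scipy.stats import norm

def sharpe_difference_pvalue(r1, r2):
    """Two-sided p value for the difference in Sharpe ratios of two return samples,
    using the Jobson-Korkie statistic with Memmel's correction (commonly quoted form)."""
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    T = len(r1)
    m1, m2 = r1.mean(), r2.mean()
    s1, s2 = r1.std(ddof=1), r2.std(ddof=1)
    s12 = np.cov(r1, r2)[0, 1]
    theta = (2 * s1**2 * s2**2 - 2 * s1 * s2 * s12
             + 0.5 * m1**2 * s2**2 + 0.5 * m2**2 * s1**2
             - (m1 * m2 / (s1 * s2)) * s12**2) / T
    z = (s2 * m1 - s1 * m2) / np.sqrt(theta)          # asymptotically standard normal
    return 2.0 * (1.0 - norm.cdf(abs(z)))
```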

5.3.4 Results

Table 3 Out-of-sample performance (\(p=17\))
Table 4 p values of differences in SR and NSR (\(p=17\))

Table 3 contains the out-of-sample performance of the six models in terms of ten metrics, and Table 4 reports the p values of the differences in SR and NSR with BL as the benchmark. In Table 3, we see that under \(\lambda =0.2\), BL outperforms PI-1 and PI-2 in SR with both performance gaps significant at the 5% level, and the better performance of BL persists even after accounting for transaction costs (see NSR and the p values). Meanwhile, BL has a larger MEAN, smaller STD, and larger VaR95% and VaR99% than PI-1 and PI-2, which confirms the superiority of our proposed BL model. BL also has the largest PDM and smallest PSM among all models except 1/p under both \(\lambda =0.2\) and \(\lambda =0.8\); these observations are consistent with our analysis in the previous section, where we show that Bayesian learning can promote diversification and reduce sensitivity to data changes. Note that, under \(\lambda =0.2\), the metric results of PI-1, PI-2 and SP-1 are close to each other. The similar performance of PI-1 and PI-2 suggests that simply updating point estimates with new data may not improve the quality of the resulting portfolio. Meanwhile, the similar performance of PI-2 and SP-1 implies that the advantage of dynamic models can be diminished by parameter uncertainty. Together, these observations demonstrate the positive effect and the necessity of incorporating a Bayesian learning procedure in dynamic portfolio optimization problems. On the other hand, the positive effect of using the \(l_{\infty }\) risk function instead of the portfolio variance can be observed from the comparison of SP-1 and SP-2. Clearly, for both \(\lambda = 0.2\) and \(\lambda = 0.8\), SP-1 leads to better Sharpe ratios (SR and NSR); notice also that SP-2 has the smallest PDM and the largest SSQ and PSM in Table 3, which is aligned with the criticism of the classic mean-variance model that its solution is sensitive to model parameters and usually concentrates on a few assets (Litterman et al. 2004).

We have seen that BL can significantly outperform PI-1 under \(\lambda =0.2\). When \(\lambda =0.8\), their difference in SR narrows, e.g., 0.433 vs. 0.431 with p value 0.517 in Table 3. Recall that portfolio selection models with the \(l_{\infty }\) risk function first select assets to invest in according to (adjusted) expected returns and then determine the weight of each selected asset according to risks. So, when the investor is risk-seeking with a small \(\lambda\), the resulting policy focuses on only a few assets with large historical returns, and accounting for the extra parameter uncertainty can have a noticeable impact on portfolio choice by making the model include more assets than the case without parameter uncertainty and by reducing the sensitivity to input changes, as shown in Table 2. On the contrary, when \(\lambda\) is close to 1, the resulting policy invests in almost all the available assets and, in such a case, considering parameter uncertainty cannot affect much the decision on the wealth weights of the selected assets, which explains the similar performance of BL and PI-1 under \(\lambda =0.8\).

Another interesting observation is that, in contrast to the comparison of BL and PI-1, our proposed BL model significantly outperforms 1/p under \(\lambda =0.8\) (0.433 vs. 0.413 in SR with p value 0.000), even after accounting for transaction costs (0.432 vs. 0.413 in NSR with p value 0.000), but their differences in SR and NSR are not significant under \(\lambda =0.2\). The reason is that when \(\lambda\) is large, BL invests in most of the assets and actively distributes the fund over the selected assets by making use of the risk information, and thus it has a significant advantage over 1/p. When \(\lambda\) is small, BL considers only a few assets with high historical returns and pursues a high expected return at the cost of high volatility, which results in a higher out-of-sample return (0.046 vs. 0.045 in MEAN) but possibly a lower Sharpe ratio than 1/p, though the difference is statistically insignificant.

5.3.5 Additional tests

Table 5 Additional out-of-sample test on 12-asset data set (\(p=12\))
Table 6 p values of differences in SR and NSR (\(p=12\))
Table 7 More historical data (\(h=120\))
Table 8 Out-of-sample results of BL under linear basis

To check the robustness of our findings, we repeat the experiments on another real market data set with monthly returns of 12 risky assets,Footnote 8 i.e., \(p=12\). The settings are the same as those in Sects. 5.3.1, 5.3.2 and 5.3.3, and the out-of-sample performance results are presented in Table 5 (ten metrics) and Table 6 (p values). The numerical results in Tables 5 and 6 support the findings stated in Sect. 5.3.4, that is: (i) our proposed dynamic portfolio selection model with Bayesian learning, BL, significantly outperforms the plug-in models, PI-1 and PI-2, and the equally weighted portfolio 1/p; (ii) the significance of the performance gap is affected by the risk preference level \(\lambda\); (iii) compared with the mean-variance model, the use of the \(l_{\infty }\) risk function leads to better out-of-sample performance.

In addition, we also test the effect of Bayesian learning when more data are available, with out-of-sample results presented in Table 7, where we set the historical data length to \(h=120\) instead of the \(h=30\) used in Tables 3 and 5. In Table 7, for both data sets (\(p=17\) and \(p=12\)), BL, PI-1 and PI-2 have similar performance under \(\lambda =0.2\), in contrast to the observations in Tables 3 and 5. As predicted by the trend in Fig. 1, Table 7 numerically shows that the importance of incorporating Bayesian learning decreases as more data become available and the severity of parameter uncertainty reduces.

In Table 8, we report the out-of-sample results of BL with the AERs estimated by a linear basis; that is, we remove the quadratic terms in (30) and again run the regression following Algorithm 1. Compared with the results in Tables 3 and 5, which use the basis vector (30), the performance results in Table 8 with fewer regressors are almost the same, suggesting that our numerical findings are robust to the settings of the least squares Monte Carlo method.

6 Concluding remarks

The issue of parameter uncertainty and estimation risk has long been recognized as a crucial problem in portfolio management. In this paper, we incorporate Bayesian learning to deal with this issue in the framework of a proposed dynamic portfolio selection model in which an \(l_{\infty }\) risk function is used as the risk measure. The investor in our model is assumed to make decisions by maximizing expected terminal wealth at a minimal level of total risk, quantified by a weighted sum of the risks during the investment horizon. We show that the proposed stochastic dynamic program has a closed-form optimal policy that can be constructed intuitively. For implementation, we introduce a least squares Monte Carlo method to approximate the complex conditional expectations in the AERs. We discuss the impact of Bayesian learning on the investor’s decision making and show how it promotes diversification and reduces the sensitivity of optimal portfolios to changes in model inputs under an i.i.d. normal return-generating process with unknown means and covariance matrix. The numerical results based on real market data show that our proposed dynamic portfolio selection model with Bayesian learning can significantly outperform the plug-in models and the equally weighted portfolio, with the performance gaps affected by the risk preference level and the amount of data available.

For future research, one can discard the assumption that the return-generating process is known and study the effect of model ambiguity based on the dynamic portfolio selection model developed in this paper. Another direction is to consider transaction costs in an ex ante way. Transaction costs are an important element of practical portfolio selection models, and it would be interesting to investigate how the investment strategy of our dynamic model changes when they are included in the model formulation. Finally, introducing other advanced Bayesian learning techniques with side information could also be a meaningful way to extend our work.