10.1 Introduction

A financial portfolio is defined by a vector, w = (w 1, …, w p)′, say, whose element w j describes the fraction of the portfolio investment that is allocated to some asset x j. The random return of the portfolio is obtained as r = x 1w 1 + ⋯ + x pw p = w′x, where x is a vector of random returns. An investor is interested in holding a portfolio that maximizes the expected return E[r] at a given level of risk (var[r]) or, equivalently, minimizes the risk var[r] at a given level of the expected return. A portfolio satisfying such a maximization/minimization is called an optimal portfolio. Markowitz [20] developed a theory for such mean-variance portfolio optimization that still plays a fundamental role. The theory considers a fixed (non-random) portfolio w, which could be determined qualitatively, and gives a closed-form expression for the optimal portfolio, which in turn depends on the mean vector and covariance matrix of x.

While Markowitz’s theory is concerned with analyses and conclusions drawn from a fixed portfolio vector, there has been a growing interest during the last fifty years in how one can use statistical methods to estimate w. Because the optimal portfolio is a function of certain population parameters, one obtains the so-called standard estimator of w by substituting these unknown parameters by their sample counterparts. Some basic sampling properties such as low-order moments have long been known [6, 15], while the full normal-theory distribution of the standard estimator was derived by [22]. It has since been recognized, both from the theoretical distribution and from empirical findings, that the sampling variance of the standard estimator may be too large to be useful in investment strategies [4, 9, 17, 21].

As in most multivariate analyses, this applies in particular when the sample size n is close to the dimension p. A number of alternatives to the standard estimator have been derived in recent years. Jagannathan and Ma [13] noticed that imposing certain moment constraints leads to a reduction of sampling variance, while [4] proposed a generalized estimator that imposes a threshold constraint on the portfolio vector. Bayesian methods have been proposed by [15] and [16], while inference solutions for the case of a singular covariance matrix have been derived by [1].

In this paper our interest lies in a particular family of estimators defined as a weighted mean of the standard estimator and a constant vector. This type of estimator, which is related to a family of estimators originally proposed by James and Stein [14], has been considered by [2, 8] and [12]. One of the main features of this weighted estimator is that it depends on a tuning coefficient, which also introduces a bias in the estimator. We derive some properties of this Stein-type estimator with respect to different kinds of weighted squared loss functions. Particular focus is placed on the bias term, since this has previously been given little attention in the literature. We will restrict ourselves to the specific case of the global minimum variance portfolio (GMVP), although many of the concerns raised in the paper also apply to more general portfolio optimizations. We will not attempt to derive any explicit estimators. Our primary interest lies in deriving and comparing different risk measures for a commonly used Stein-type estimator and in discussing some of their differences.

10.2 Preliminaries

Consider a vector x : p × 1 of excess returns of p financial assets. A financial portfolio is defined as the weighted sum w′x, where w : p × 1 is called the portfolio weight vector. The vector w can be determined “qualitatively”, i.e., based on expert knowledge about the market, or it can be estimated from historical data of returns, which is the objective of this article. An efficient portfolio is usually determined by minimizing the portfolio variance subject to a given mean portfolio premium return and the additional constraint that the investment proportions sum to one. Following [20], an efficient portfolio weight w, assuming absence of short-sale constraints, is determined by

$$\displaystyle \begin{aligned} \min_{\mathbf{w}\in \mathbb R^p}\{\mathbf{w}'\varSigma\mathbf{w}\;|\; \mathbf{w}'\mathbf{1} = 1\}, \end{aligned} $$
(10.1)

where Σ is the covariance matrix of x and 1 : p × 1 is a vector of ones. The well-known solution to (10.1) is given by

$$\displaystyle \begin{aligned} \mathbf{w} = \frac{\varSigma^{-1}\mathbf{1}}{\mathbf{1}'\varSigma^{-1}\mathbf{1}}. \end{aligned} $$
(10.2)

The vector w is known as the global minimum variance portfolio (GMVP). It is also possible to define portfolios under more general constraints than those of (10.1), see [15] and [4] for alternative formulations.
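As a concrete illustration, the GMVP in (10.2) is straightforward to compute numerically. The sketch below (the covariance matrix is an arbitrary example, not taken from the chapter) also checks the defining property of (10.1): no other portfolio whose weights sum to one has smaller variance.

```python
import numpy as np

def gmvp(sigma):
    """Global minimum variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    theta = np.linalg.solve(sigma, ones)   # theta = Sigma^{-1} 1
    return theta / (ones @ theta)

# Example covariance matrix (illustrative values only)
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = gmvp(sigma)
assert np.isclose(w.sum(), 1.0)            # weights sum to one

# The GMVP variance w' Sigma w is no larger than that of any
# other portfolio v satisfying v'1 = 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.normal(size=3)
    v /= v.sum()
    assert w @ sigma @ w <= v @ sigma @ v + 1e-12
```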

Since the quantity in (10.2) depends on an unknown parameter, it needs to be estimated from data. We are thus concerned with the problem of using a set of n observations on random returns, say x 1, …, x n, to develop an estimator of w. We will assume a common setting where x i ∼iid N p(μ, Σ) under the assumptions max j|μ j| ≤ a 1 < ∞, max jλ j(Σ) ≤ a 2 < ∞, and 0 < a 3 ≤ min jλ j(Σ), where μ = (μ 1, …, μ p)′ and λ j(Σ) denotes the jth eigenvalue of Σ. Although these assumptions are not strictly necessary for making inference on the GMVP weight, they simplify the technical treatment considerably.

An obvious estimator of w is obtained by replacing the unknown covariance matrix by its sample counterpart. The resulting estimate is commonly referred to as the standard estimator, henceforth denoted by \(\widehat {\mathbf {w}}_0\). This estimator is central in the paper, and we state some basic properties for the sake of clarity.

Property 10.1

Assume x i ∼iidN p(μ, Σ), i = 1, …, n, p ≥ 4 and n ≥ p + 2. Let \(S = n^{-1}\sum _{i=1}^n ({\mathbf {x}}_i-\bar {\mathbf {x}})({\mathbf {x}}_i-\bar {\mathbf {x}})'\), where \(\bar {\mathbf {x}} = n^{-1}\sum _{i=1}^n{\mathbf {x}}_i\). Define

$$\displaystyle \begin{aligned} (i) & \quad \mathbf{w} = \frac{\varSigma^{-1}\mathbf{1}}{\mathbf{1}'\varSigma^{-1}\mathbf{1}},\\ (ii) & \quad \sigma^2 = \frac{1}{\mathbf{1}'\varSigma^{-1}\mathbf{1}},\\ (iii) & \quad \widehat{\mathbf{w}}_0 = (\widehat w_{0(1)},\dots,\widehat w_{0(p)})' = \frac{{\mathbf{S}}^{-1}\mathbf{1}}{\mathbf{1}'{\mathbf{S}}^{-1}\mathbf{1}}. \end{aligned} $$

Then

$$\displaystyle \begin{aligned} \widehat{\mathbf{w}}_0 \sim t_p\left(\mathbf{w}, \frac{\sigma^2\varSigma^{-1}-\mathbf{w}\mathbf{w}'}{n-p+1},n-p+1\right), \end{aligned} $$

where t p(⋅) denotes a p-dimensional singular t-distribution with n − p + 1 degrees of freedom, location vector w and dispersion matrix σ 2Σ −1 − ww′, with rank(σ 2Σ −1 − ww′) = p − 1.

Proof ([18, 22])

See also [19, Chapter 1], for a definition of the multivariate t-distribution. □
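A minimal sketch of the standard estimator \(\widehat {\mathbf {w}}_0\) of Property 10.1, computed from simulated normal returns (the dimensions and parameter values below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 60                       # p >= 4 and n >= p + 2, as in Property 10.1
mu = np.zeros(p)
sigma = 0.5 * np.eye(p) + 0.5      # equicorrelated covariance (illustrative)

x = rng.multivariate_normal(mu, sigma, size=n)   # n observed return vectors
xbar = x.mean(axis=0)
S = (x - xbar).T @ (x - xbar) / n                # ML covariance estimate, as in Property 10.1
ones = np.ones(p)
theta_hat = np.linalg.solve(S, ones)             # S^{-1} 1
w0_hat = theta_hat / (ones @ theta_hat)          # standard GMVP estimator

assert np.isclose(w0_hat.sum(), 1.0)             # valid portfolio weights
```

Note that S is invertible with probability one under the stated conditions, so the estimator is well defined almost surely.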

It is well known that the sampling variance of \(\widehat {\mathbf {w}}_0\) can be substantial when n is small relative to the dimension p, making the estimator of limited relevance to an investor. A considerable amount of research has been concerned with the development of improved estimators of w [2, 4, 8, 12]. A common approach is to first decide on a family of estimators, which usually depends on some tuning coefficient, and then use a risk function to identify an appropriate value of this coefficient. One concern with this approach, however, is that two different risk functions usually produce two different values of the tuning coefficient, and hence the distributional properties of the portfolio weight estimator may strongly depend on which specific risk function is being used. The next section will discuss this matter further.

10.3 Risk Measures and Portfolio Estimation

The original view of portfolio optimization and risk as stated by [20] is that the w j’s “are not random variables, but are fixed by the investor” and that “risk” is described by the variance of the return. The “risk” an investor is facing is accordingly determined by var[w′x] = w′Σw, where Σ = cov[x i]. However, the definition of “risk” is less obvious when w is estimated from data, because of the additional uncertainty due to sampling variance and bias. From the perspective of statistical inference, the term “risk” refers to the sampling variance and bias of a parameter estimator (say \(\widehat {\mathbf {w}}\)), while in portfolio theory “risk” primarily refers to the variance of the return vector x. Since the estimated portfolio return is defined by \(r_i = \widehat {\mathbf {w}}'{\mathbf {x}}_i\), it involves risk in both senses.

Following Markowitz’s view of a fixed (non-random) portfolio, the Lagrangian of the optimization problem in (10.1) may be formulated

$$\displaystyle \begin{aligned} L(\mathbf{w}, \varSigma,\lambda_0) = \frac{1}{2}\mathbf{w}'\varSigma\mathbf{w} - \lambda_0(\mathbf{w}'\mathbf{1}-1), \end{aligned} $$
(10.3)

where λ 0 is a Lagrange multiplier. Taking the derivative of L(w, Σ, λ 0) w.r.t. w and setting it equal to zero yields the condition

$$\displaystyle \begin{aligned} \mathbf{w} = \lambda_0\varSigma^{-1}\mathbf{1}. \end{aligned} $$
(10.4)

Since w′1 = 1 it follows that λ 0 = (1′Σ −11)−1 and we obtain (10.2). Note that the identity (10.4) is completely determined by Σ −11 in the sense that if we define θ = Σ −11, then the solution to the optimization problem depends on θ only. When it comes to random (estimated) portfolio weights, the optimization problem has been formulated in different ways in the literature. Jagannathan and Ma [13] specify a constrained portfolio variance minimization problem as

$$\displaystyle \begin{aligned} \min_{\mathbf{w}\in \mathbb R^p} \{\mathbf{w}'\widehat\varSigma\mathbf{w} |\mathbf{w}'\mathbf{1} = 1, 0\leq w_j, w_j \leq \varpi, j=1,\dots,p \}, \end{aligned} $$
(10.5)

where \(\widehat \varSigma \) is some estimator of Σ and ϖ is a finite constant such that w j ≤ ϖ defines an upper-bound constraint for the weights. Let \(\lambda = \begin {pmatrix} \lambda _1&\dots &\lambda _p\end {pmatrix}'\) be the Lagrange multiplier for 0 ≤ w j and \(\delta = \begin {pmatrix} \delta _1 &\dots & \delta _p\end {pmatrix}'\) the Lagrange multiplier for w j ≤ ϖ, and define \(\widetilde \varSigma = \widehat \varSigma + (\delta \mathbf {1}'+\mathbf {1}\delta ') - (\lambda \mathbf {1}'+\mathbf {1}\lambda ')\), where \(\widehat \varSigma \) is the normal-theory unconstrained maximum likelihood (ML) estimate of Σ. Jagannathan and Ma [13] showed that \(\widetilde \varSigma \) is positive semidefinite, and that constructing a constrained global minimum variance portfolio from \(\widehat \varSigma \) (i.e., solving (10.5)) is equivalent to constructing the unconstrained minimum variance portfolio from \(\widetilde \varSigma \). Thus the constraints can be imposed in the estimation stage instead of the optimization stage and the result will be the same. This is a fundamental result in portfolio theory because it connects Markowitz’s theoretical portfolio theory with statistical inference theory, including both frequentist and Bayesian treatments.

DeMiguel et al. [4] proposed a norm-constrained minimum variance portfolio as one that solves \(\min _{\mathbf {w}\in \mathbb R^p}\{\mathbf {w}'\varSigma \mathbf {w}\;|\; \mathbf {w}'\mathbf {1} = 1, \lVert \mathbf {w}\rVert \leq \varpi \}\), where \(\lVert \mathbf {w}\rVert \) denotes either the L 1 norm \(\lVert \mathbf {w}\rVert _1 = \sum _{j=1}^p |w_j|\) or the L 2 norm \(\lVert \mathbf {w}\rVert _2 = \sqrt {\mathbf {w}'Q\mathbf {w}}\) for some positive definite matrix Q. They showed that the solution under the constraint \(\lVert \mathbf {w}\rVert _1=1\) coincides with the short-sale-constrained estimator.

[8] take a rather different view of the optimization problem and argue that, for some estimator \(\widehat {\mathbf {w}}\) based on returns x 1, …, x n, the quantity \(var[{\mathbf {x}}_{n+1}^{\prime }\widehat {\mathbf {w}}|\mathcal {F}_n] = \widehat {\mathbf {w}}'\varSigma \widehat {\mathbf {w}}\), where \(\mathcal {F}_n\) is the information set up to time n, represents the actual variance of the return belonging to the portfolio \(\widehat {\mathbf {w}}\). References [2] and [23] adopt a similar approach.

There currently seems to be no consensus, or unified theory, on how to formulate a statistical version of Markowitz’s fixed-portfolio optimization problem. In what follows we will discuss some consequences of the view one takes on the optimization problem and the measure used to evaluate the properties of a weight estimator.

Portfolio Weight Estimators

The GMVP weight vector w = (1′Σ −11)−1Σ −11 involves only one unknown quantity, Σ −1. One can therefore specify w as a function of Σ −1, say w = f(Σ −1) = (1′Σ −11)−1Σ −11. A portfolio estimator may thus be obtained by substituting an estimator \(\widehat \varSigma ^{-1}\) into w to obtain \(\widehat {\mathbf {w}} = f(\widehat \varSigma ^{-1}) = (\mathbf {1}'\widehat \varSigma ^{-1}\mathbf {1})^{-1}\widehat \varSigma ^{-1}\mathbf {1}\). Any estimator \(f(\widehat \varSigma ^{-1})\) will be a sufficient statistic for w as long as \(\widehat \varSigma ^{-1}\) is a sufficient statistic for Σ −1. Such “plug-in” estimators are therefore completely legitimate from an inferential point of view. The literature on estimation of Σ −1 is, in turn, extensive. Some important references include [5, 7, 10,11,12, 24, 26]. A comprehensive survey of improved estimators of Σ −1, including Bayesian frameworks, is given in [25]. We will, however, not proceed along this path but instead consider a more restricted class of estimators.
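Any invertible estimate of Σ −1 can be plugged into f. The sketch below illustrates this with two choices: the inverse of the sample covariance, and the inverse of a generic linear shrinkage of the sample covariance toward the identity (the shrinkage form and its weight 0.8 are assumptions for the illustration, not a specific estimator from the cited references):

```python
import numpy as np

def plug_in_gmvp(sigma_inv_hat):
    """f(Sigma_inv) = (1' Sigma_inv 1)^{-1} Sigma_inv 1."""
    ones = np.ones(sigma_inv_hat.shape[0])
    theta = sigma_inv_hat @ ones
    return theta / (ones @ theta)

rng = np.random.default_rng(6)
p, n = 5, 40
sigma = 0.5 * np.eye(p) + 0.5             # illustrative true covariance
x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(x, rowvar=False)               # sample covariance

# Standard plug-in: invert S directly
w_std = plug_in_gmvp(np.linalg.inv(S))
# Alternative plug-in: invert a linear shrinkage of S toward the identity
S_shrunk = 0.8 * S + 0.2 * np.eye(p)
w_shr = plug_in_gmvp(np.linalg.inv(S_shrunk))

# Both are valid portfolios: the normalization in f forces the weights to sum to one
assert np.isclose(w_std.sum(), 1.0) and np.isclose(w_shr.sum(), 1.0)
```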

Let \(\widehat {\mathbf {w}}_0\) denote the standard estimator defined in Property 10.1 and w ref denote some pre-determined reference portfolio, which could be fixed or random (but independent of \(\widehat {\mathbf {w}}_0\)). We denote a weighted mean between these two quantities as follows

$$\displaystyle \begin{aligned} \widehat{\mathbf{w}}_{\alpha} = (1-\alpha)\widehat{\mathbf{w}}_0 + \alpha{\mathbf{w}}_{ref}, \quad 0\leq \alpha\leq 1. \end{aligned} $$
(10.6)

The estimator \(\widehat {\mathbf {w}}_{\alpha }\), which is closely related to a family of estimators proposed by [14], has been used by [2, 8, 18]. In order to determine an appropriate value of the tuning coefficient α one usually applies a loss function. A common squared loss is defined by \(l = (\widehat {\mathbf {w}} - \mathbf {w})'Q(\widehat {\mathbf {w}} - \mathbf {w})\), where Q is a positive definite matrix. Taking the expected value of l we obtain a quadratic risk function for \(\widehat {\mathbf {w}}_{\alpha }\) as

$$\displaystyle \begin{aligned} \mathfrak R(\widehat{\mathbf{w}}_{\alpha},Q) = E[(\widehat{\mathbf{w}}_{\alpha} - \mathbf{w})'Q(\widehat{\mathbf{w}}_{\alpha} - \mathbf{w})]. \end{aligned} $$
(10.7)

The value of α that minimizes \(\mathfrak R\) is then defined as the optimal α. By different choices of Q we obtain several common risk functions as special cases of (10.7). For example, \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I)\) is the common mean squared error (MSE), which consists of the variance plus the squared bias. The particular risk function \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma )\) has been used by [8] and [18], while [2] used the loss \(l = (\widehat {\mathbf {w}}_{\alpha } - \mathbf {w})'\varSigma (\widehat {\mathbf {w}}_{\alpha } - \mathbf {w})\) directly, rather than its expected value, to derive an optimal portfolio estimator. Yet another way to evaluate the properties of a portfolio weight estimator is through its predictive, or out-of-sample, properties. The predicted return of some estimator \(\widehat {\mathbf {w}}\) is obtained by \(\widehat {\mathbf {w}}'{\mathbf {x}}_t\), where x t is a vector of returns not used in \(\widehat {\mathbf {w}}\), i.e., x t∉{x 1, …, x n}. The mean squared error of prediction (MSEP) of \(\widehat {\mathbf {w}}\) may then be evaluated by \(E[(\widehat {\mathbf {w}}'{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\). Note that MSEP is also a special case of (10.7), for

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}'{\mathbf{x}}_t - \mathbf{w}'{\mathbf{x}}_t)^2] & = E[(\widehat{\mathbf{w}} - \mathbf{w})'{\mathbf{x}}_t{\mathbf{x}}_t^{\prime}(\widehat{\mathbf{w}} - \mathbf{w})] \\ & = E[(\widehat{\mathbf{w}} - \mathbf{w})'(\varSigma + \mu\mu')(\widehat{\mathbf{w}} - \mathbf{w})] = \mathfrak R(\widehat{\mathbf{w}},\varSigma+\mu\mu'). \end{aligned} $$
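The identity above rests on E[x tx t′] = Σ + μμ′ (and, when \(\widehat {\mathbf {w}}\) is random, on its independence of x t). For a fixed \(\widehat {\mathbf {w}}\) it can be checked by Monte Carlo simulation; all parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
mu = np.array([0.02, 0.05, 0.03])         # illustrative mean returns
sigma = np.diag([0.04, 0.09, 0.05])       # illustrative diagonal covariance
w = np.linalg.solve(sigma, np.ones(p))
w /= w.sum()                              # true GMVP weights
w_hat = np.array([0.5, 0.3, 0.2])         # some fixed estimate, for illustration
d = w_hat - w

x = rng.multivariate_normal(mu, sigma, size=400_000)  # out-of-sample returns
msep_mc = np.mean((x @ d) ** 2)                       # E[(w_hat'x - w'x)^2], Monte Carlo
msep_exact = d @ (sigma + np.outer(mu, mu)) @ d       # d'(Sigma + mu mu')d

# The simulated MSEP matches the closed form up to Monte Carlo error
assert np.isclose(msep_mc, msep_exact, rtol=0.05)
```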

In what follows we give explicit expressions of \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },Q)\) for some particular values of Q.

Proposition 10.1

Let \(\widehat {\mathbf {w}}_{\alpha } = (1-\alpha )\widehat {\mathbf {w}}_0 + \alpha p^{-1}\mathbf {1}\) for some 0 ≤ α ≤ 1, and let \(E[\widehat {\mathbf {w}}_0] = \mathbf {w}\), \(R=cov[\widehat {\mathbf {w}}_0] = \dfrac {\sigma ^2\varSigma ^{-1}-\mathbf {w}\mathbf {w}'}{n-p-1}\), where σ 2 = (1′Σ −11)−1. Let x t be an out-of-sample vector of returns, and define μ = E[x t], μ w = w′μ and \(\bar \mu = p^{-1}\mathbf {1}'\mu \). Then the following risk identities hold:

  (a) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = (1-\alpha )^2\mathrm {tr}\{R\}+\alpha ^2(\mathbf {w}'\mathbf {w}-p^{-1})\),

  (b) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma ) = \dfrac {(1-\alpha )^2}{n-p-1}\sigma ^2(p-1)+\alpha ^2(p^{-2}\mathbf {1}'\varSigma \mathbf {1} - \sigma ^2)\),

  (c) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\mu \mu ') = (1-\alpha )^2\mu 'R\mu + \alpha ^2(\mu _w-\bar \mu )^2\),

  (d) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma + \mu \mu ') = \dfrac {(1-\alpha )^2}{n-p-1}(\sigma ^2(p-1)+\sigma ^2\mu '\varSigma ^{-1}\mu -\mu _w^2)+\alpha ^2((p^{-2}\mathbf {1}'\varSigma \mathbf {1}-\sigma ^2)+(\mu _w-\bar \mu )^2)\).

Before the proof of Proposition 10.1 we will state a useful identity in the following lemma.

Lemma 10.1

Let \(\widehat {\mathbf {w}}_{\alpha } = (1-\alpha )\widehat {\mathbf {w}}_0 + \alpha p^{-1}\mathbf {1}\)for some 0 ≤ α ≤ 1, and let \(E[\widehat {\mathbf {w}}_0] = \mathbf {w}\), \(cov[\widehat {\mathbf {w}}_0] = R\). Then it holds that

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'] = (1-\alpha)^2R+\alpha^2(\mathbf{w}-p^{-1}\mathbf{1})(\mathbf{w}-p^{-1}\mathbf{1})'. \end{aligned} $$

Proof

See Appendix. □

We are now ready to give the proof of Proposition 10.1.

Proof

From Lemma 10.1 we find that

$$\displaystyle \begin{aligned} \mathfrak R(\widehat{\mathbf{w}}_{\alpha},I) = \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})']\} = (1-\alpha)^2\mathrm{tr}\{R\}+\alpha^2(\mathbf{w}'\mathbf{w}-p^{-1}), \end{aligned} $$

which establishes (a). Applying Lemma 10.1 again we obtain (b) since

$$\displaystyle \begin{aligned} \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'\varSigma]\} =(1-\alpha)^2\mathrm{tr}\{R\varSigma\}+\alpha^2(p^{-2}\mathbf{1}'\varSigma\mathbf{1} - \sigma^2) \end{aligned} $$

and

$$\displaystyle \begin{aligned} \mathrm{tr}\{R\varSigma\} = \frac{\mathrm{tr}\{(\sigma^2\varSigma^{-1}-\mathbf{w}\mathbf{w}')\varSigma\}}{n-p-1} = \frac{\mathrm{tr}\{\sigma^2I-\mathbf{w}\mathbf{w}'\varSigma\}}{n-p-1} = \frac{p-1}{n-p-1}\sigma^2. \end{aligned} $$

The identity (c) is similarly obtained by

$$\displaystyle \begin{aligned} \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'\mu\mu']\} =(1-\alpha)^2\mu'R\mu + \alpha^2(\mathbf{w}'\mu-p^{-1}\mathbf{1}'\mu)^2, \end{aligned} $$

while (d) is obtained by adding (b) and (c) and simplifying terms. □
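Since the risk identities of Proposition 10.1 reduce to matrix algebra once R and w are given, they can be verified numerically against the trace form supplied by Lemma 10.1. A sketch with illustrative Σ, μ, n and α:

```python
import numpy as np

p, n, alpha = 4, 30, 0.3                  # illustrative dimensions and tuning value
rng = np.random.default_rng(3)
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # a positive definite covariance matrix
mu = rng.normal(scale=0.1, size=p)
ones = np.ones(p)

sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)        # sigma^2 = (1' Sigma^{-1} 1)^{-1}
w = s2 * sig_inv @ ones                   # GMVP weights
R = (s2 * sig_inv - np.outer(w, w)) / (n - p - 1)
b = w - ones / p                          # bias direction w - p^{-1} 1

def risk(Q):
    """tr{E[(w_a - w)(w_a - w)'] Q}, with the expectation given by Lemma 10.1."""
    M = (1 - alpha) ** 2 * R + alpha ** 2 * np.outer(b, b)
    return np.trace(M @ Q)

mw, mbar = w @ mu, mu.mean()
# Closed forms (a)-(c) of Proposition 10.1; (d) is their combination (b) + (c)
a_cf = (1 - alpha) ** 2 * np.trace(R) + alpha ** 2 * (w @ w - 1 / p)
b_cf = (1 - alpha) ** 2 / (n - p - 1) * s2 * (p - 1) \
       + alpha ** 2 * (ones @ sigma @ ones / p ** 2 - s2)
c_cf = (1 - alpha) ** 2 * mu @ R @ mu + alpha ** 2 * (mw - mbar) ** 2

assert np.isclose(risk(np.eye(p)), a_cf)                         # identity (a)
assert np.isclose(risk(sigma), b_cf)                             # identity (b)
assert np.isclose(risk(np.outer(mu, mu)), c_cf)                  # identity (c)
assert np.isclose(risk(sigma + np.outer(mu, mu)), b_cf + c_cf)   # identity (d)
```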

Remark 10.1

The MSEP measure in (d), i.e., \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\), describes the variability of \(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t\) around the random quantity w′x t. An alternative way of defining the MSEP is \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\), which describes the variability of \(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t\) around the non-random point w′E[x t]. The two measures differ in that

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}^{\prime}_{\alpha}{\mathbf{x}}_t - \mathbf{w}'\mu)^2] = E[(\widehat{\mathbf{w}}^{\prime}_{\alpha}{\mathbf{x}}_t - \mathbf{w}'{\mathbf{x}}_t)^2] + (\mathbf{1}'\varSigma^{-1}\mathbf{1})^{-1}. \end{aligned} $$

See the Appendix for a proof of this identity. Note that the variance component of any of the risks (a)–(d) of Proposition 10.1 can be brought down to 0, attained at α = 1 (for which \(\widehat {\mathbf {w}}_{\alpha }\) is non-random), whereas for \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\) the minimum prediction variance is bounded below by (1′Σ −11)−1 regardless of the value of α.

For purposes of deriving an estimator of w, the choice between \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\) and \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\) makes no difference, but for calculation of prediction intervals our view of the centre point does matter.
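The decomposition in Remark 10.1 can be checked without simulation: for any fixed portfolio \(\widehat {\mathbf {w}}\) with \(\widehat {\mathbf {w}}'\mathbf {1} = 1\), independent of x t, both sides reduce to closed-form moments of the normal distribution. A sketch under illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 4
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # illustrative covariance
mu = rng.normal(scale=0.1, size=p)
ones = np.ones(p)

sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)        # (1' Sigma^{-1} 1)^{-1}
w = s2 * sig_inv @ ones                   # true GMVP weights

w_hat = rng.normal(size=p)
w_hat /= w_hat.sum()                      # any fixed portfolio with w_hat'1 = 1
d = w_hat - w

# E[(w_hat'x_t - w'mu)^2] = var(w_hat'x_t) + (w_hat'mu - w'mu)^2
lhs = w_hat @ sigma @ w_hat + (d @ mu) ** 2
# E[(w_hat'x_t - w'x_t)^2] + (1' Sigma^{-1} 1)^{-1}
rhs = d @ (sigma + np.outer(mu, mu)) @ d + s2

assert np.isclose(lhs, rhs)               # the two MSEP definitions differ by s2
```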

Remark 10.2

The bias terms vanish trivially if α = 0 and/or Σ = I. More generally, the bias terms of \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I)\), \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma )\) and \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\mu \mu ')\) vanish when p1′Σ −21 = (1′Σ −11)2, p −21′Σ1 = σ 2 and \(\mu _w = \bar \mu \), respectively. While the first two conditions hold whenever 1 is an eigenvector of Σ (of which Σ = I is a special case), the identity \(\mu _w=\bar \mu \) holds when the portfolio mean return equals the average asset mean.
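That the first two bias conditions do not require Σ = I can be seen from an equicorrelated covariance matrix, for which 1 is an eigenvector (the values below are illustrative):

```python
import numpy as np

p = 5
sigma = 0.2 * np.eye(p) + 0.1             # equicorrelated: the ones vector is an eigenvector
ones = np.ones(p)
sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)

# p 1'Sigma^{-2} 1 = (1'Sigma^{-1} 1)^2  -- bias of R(w_a, I) vanishes
assert np.isclose(p * ones @ sig_inv @ sig_inv @ ones, (ones @ sig_inv @ ones) ** 2)
# p^{-2} 1'Sigma 1 = sigma^2             -- bias of R(w_a, Sigma) vanishes
assert np.isclose(ones @ sigma @ ones / p ** 2, s2)
# and yet Sigma is not the identity matrix
assert not np.allclose(sigma, np.eye(p))
```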

Remark 10.3

The optimal value of the tuning coefficient α may be derived by setting \(\dfrac {\partial \mathfrak R(\widehat {\mathbf {w}}_{\alpha },Q)}{\partial \alpha }\) equal to zero and solving for α. The resulting value, say α opt, then yields an adaptive estimator \(\widehat {\mathbf {w}}_{\alpha _{opt}} = (1-\alpha _{opt})\widehat {\mathbf {w}}_0 + \alpha _{opt}p^{-1}\mathbf {1}\). However, α opt is by necessity a function of unknown population parameters. Operational, or “bona fide”, estimators are usually obtained by substituting estimators for these unknown parameters to obtain an approximation \(\widehat \alpha \). The adapted portfolio estimator is then defined by \(\widehat {\mathbf {w}}_{\widehat \alpha } = (1-\widehat \alpha )\widehat {\mathbf {w}}_0 + \widehat \alpha p^{-1}\mathbf {1}\). The coefficient \(\widehat \alpha \) is hence random, which in turn distorts the otherwise known distribution of \(\widehat {\mathbf {w}}_{\alpha }\). It is still possible to conduct inferential analysis (interval estimation, etc.) through the theory of oracle inequalities, but at the price of a more technically involved treatment (see [3] for a comprehensive survey of the topic). Another option is to determine α qualitatively, for example based on expert knowledge of the financial market. This method has the appealing property of retaining the sampling distribution of \(\widehat {\mathbf {w}}_{\alpha }\).
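As an example of the minimization described in Remark 10.3, take Q = I: the risk in Proposition 10.1(a) has the form (1 − α)²V + α²B with V = tr{R} and B = w′w − p⁻¹, so setting its derivative to zero gives α opt = V∕(V + B). The sketch below checks this against a grid search (parameter values are illustrative; in practice the population quantities V and B would be replaced by estimates, giving a random \(\widehat \alpha \)):

```python
import numpy as np

p, n = 4, 30
rng = np.random.default_rng(5)
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # illustrative covariance
ones = np.ones(p)
sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)
w = s2 * sig_inv @ ones
R = (s2 * sig_inv - np.outer(w, w)) / (n - p - 1)

V = np.trace(R)                           # variance term of risk (a)
B = w @ w - 1 / p                         # squared-bias term of risk (a); >= 0 since w'1 = 1
alpha_opt = V / (V + B)                   # stationary point of (1 - a)^2 V + a^2 B

grid = np.linspace(0, 1, 10_001)
risk = (1 - grid) ** 2 * V + grid ** 2 * B
assert abs(grid[np.argmin(risk)] - alpha_opt) < 1e-3   # grid minimum matches closed form
assert 0 <= alpha_opt <= 1
```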

Remark 10.4

The original portfolio problem as posed by [20] considers the return of a non-random portfolio at data point t (e.g., a time point), determined by w′x t with variance var[w′x t] = tr{cov[x t]ww′} = w′Σw, which is void of concerns about consistency, bias, etc. The fixed-portfolio theory uses Lagrange multipliers to derive the global minimum variance portfolio \(\mathbf {w} = \dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\), which minimizes var[w′x t] subject to the constraint \(\sum _{j=1}^pw_j=1\). How to proceed from there to statistical estimation is less obvious. It is customary to apply the criterion \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma ) = E[(\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})]\) to derive estimators of w, but it is not clear what is gained by minimizing this risk instead of the unweighted risk \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = E[(\widehat {\mathbf {w}} - \mathbf {w})'(\widehat {\mathbf {w}} - \mathbf {w})]\). For any consistent estimator, say \(\widehat {\mathbf {w}} = \dfrac {\widehat \varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\widehat \varSigma ^{-1}\mathbf {1}}\), where \(\widehat \varSigma ^{-1}\) is some consistent estimator of Σ −1, the asymptotic return variance is \(\lim _{n\rightarrow \infty }var[\widehat {\mathbf {w}}'{\mathbf {x}}_t] = \mathbf {w}'\varSigma \mathbf {w}\) regardless of which risk function was used to obtain \(\widehat {\mathbf {w}}\). The weight Q = Σ has already been used in the minimization of w′Σw to obtain the functional form \(\dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\), and there is no apparent gain in using the weighting \((\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})\) once again in the estimation stage. In fact, Σ −1 is the only unknown parameter in \(\dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\). Knowing Σ (up to multiplication by a scalar) is therefore equivalent to knowing w. Since Σ is a one-to-one transformation of Σ −1, the weighted loss \((\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})\) in a sense uses the true parameter to derive its own estimator. In this view the unweighted risk \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = E[(\widehat {\mathbf {w}} - \mathbf {w})'(\widehat {\mathbf {w}} - \mathbf {w})]\) may better reflect the actual risk of an estimator \(\widehat {\mathbf {w}}\).
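The claim about the asymptotic return variance can be illustrated by simulation: for a large sample, the return variance \(\widehat {\mathbf {w}}'\varSigma \widehat {\mathbf {w}}\) of the plug-in estimator is close to w′Σw. A sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 4, 50_000                          # large n: consistency regime
sigma = 0.3 * np.eye(p) + 0.1             # illustrative covariance
ones = np.ones(p)
w = np.linalg.solve(sigma, ones)
w /= w.sum()                              # population GMVP weights

x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(x, rowvar=False)
w_hat = np.linalg.solve(S, ones)
w_hat /= w_hat.sum()                      # consistent plug-in estimator

# Return variance of the estimated portfolio approaches that of the population GMVP;
# the gap (w_hat - w)' Sigma (w_hat - w) is of order 1/n.
assert np.isclose(w_hat @ sigma @ w_hat, w @ sigma @ w, rtol=1e-3)
```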