10.1 Introduction

A financial portfolio is defined by a vector, w = (w 1, …, w p)′, say, whose element w j describes the fraction of the portfolio investment that is allocated to some asset x j. The random return of the portfolio is obtained as r = x 1w 1 + ⋯ + x pw p = w′x, where x is a vector of random returns. An investor is interested in holding a portfolio that maximizes the expected return E[r] at a given level of risk (var[r]) or, equivalently, minimizes the risk var[r] at a given level of the expected return. A portfolio satisfying such a maximization/minimization is called an optimal portfolio. Markowitz [20] developed a theory for such mean-variance portfolio optimization that still plays a fundamental role. The theory considers a fixed (non-random) portfolio w, which could be determined qualitatively, and gives a closed-form expression for the optimal portfolio, which in turn depends on the mean vector and covariance matrix of x.

While Markowitz’s theory is concerned with analyses and conclusions drawn from a fixed portfolio vector, there has been a growing interest during the last fifty years in how one can use statistical methods to estimate w. Because the optimal portfolio is a function of certain population parameters, one obtains the so-called standard estimator of w by substituting these unknown parameters by their sample counterparts. Some basic sampling properties such as low-order moments have long been known [6, 15], while the full normal-theory distribution of the standard estimator was derived by [22]. It has since been recognized, both from the theoretical distribution and from empirical findings, that the sampling variance of the standard estimator may be too large to be useful in investment strategies [4, 9, 17, 21].

As in most multivariate analyses, this applies in particular when the sample size n is close to the dimension p. A number of alternatives to the standard estimator have been derived in recent years. Jagannathan and Ma [13] noticed that imposing certain moment constraints leads to a reduction of sampling variance, while [4] proposed a generalized estimator that imposes a threshold constraint on the portfolio vector. Bayesian methods have been proposed by [15] and [16], while inference solutions for the case of a singular covariance matrix have been derived by [1].

In this paper our interest lies in a particular family of estimators defined as a weighted mean of the standard estimator and a constant vector. This type of estimator, which is related to a family of estimators originally proposed by James and Stein [14], has been considered by [2, 8] and [12]. One of the main features of this weighted estimator is that it depends on a tuning coefficient, which also introduces a bias in the estimator. We derive some properties of this Stein-type estimator with respect to different kinds of weighted squared loss functions. Particular focus is placed on the bias term, since this has previously been given little attention in the literature. We will restrict ourselves to the specific case of the global minimum variance portfolio (GMVP), although many of the concerns raised in the paper also apply to more general portfolio optimizations. We will not attempt to derive any explicit estimators. Our primary interest lies in deriving and comparing different risk measures for a commonly used Stein-type estimator and in discussing some of their differences.

10.2 Preliminaries

Consider a vector x : p × 1 of excess returns of p financial assets. A financial portfolio is defined as the weighted sum w′x, where w : p × 1 is called the portfolio weight vector. The vector w can be determined “qualitatively”, i.e., based on expert knowledge about the market, or it can be estimated from historical data of returns, which is the objective of this article. An efficient portfolio is usually determined by minimizing the portfolio variance subject to a given mean portfolio premium return and the additional constraint that the investment proportions sum to one. Following [20], an efficient portfolio weight w, assuming absence of short-sale constraints, is determined by

$$\displaystyle \begin{aligned} \min_{\mathbf{w}\in \mathbb R^p}\{\mathbf{w}'\varSigma\mathbf{w}\;|\; \mathbf{w}'\mathbf{1} = 1\}, \end{aligned} $$
(10.1)

where Σ is the covariance matrix of x and 1 : p × 1 is a vector of ones. The well-known solution to (10.1) is given by

$$\displaystyle \begin{aligned} \mathbf{w} = \frac{\varSigma^{-1}\mathbf{1}}{\mathbf{1}'\varSigma^{-1}\mathbf{1}}. \end{aligned} $$
(10.2)

The vector w is known as the global minimum variance portfolio (GMVP). It is also possible to define portfolios under more general constraints than those of (10.1), see [15] and [4] for alternative formulations.
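As a concrete illustration, the GMVP in (10.2) is straightforward to compute numerically. The sketch below (the covariance matrix is an arbitrary example, not taken from the chapter) also checks the defining property of (10.1): no other portfolio whose weights sum to one has smaller variance.

```python
import numpy as np

def gmvp(sigma):
    """Global minimum variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    ones = np.ones(sigma.shape[0])
    theta = np.linalg.solve(sigma, ones)   # theta = Sigma^{-1} 1
    return theta / (ones @ theta)

# Example covariance matrix (illustrative values only)
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = gmvp(sigma)
assert np.isclose(w.sum(), 1.0)            # weights sum to one

# The GMVP variance w' Sigma w is no larger than that of any
# other portfolio v satisfying v'1 = 1.
rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.normal(size=3)
    v /= v.sum()
    assert w @ sigma @ w <= v @ sigma @ v + 1e-12
```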

Since the quantity in (10.2) depends on an unknown parameter, it needs to be estimated from data. We are thus concerned with the problem of using a set of n observations on random returns, say x 1, …, x n, to develop an estimator of w. We will assume a common setting where x i ∼iid N p(μ, Σ) under the assumptions max j|μ j| ≤ a 1 < ∞, max jλ j(Σ) ≤ a 2 < ∞, and 0 < a 3 ≤ min jλ j(Σ), where μ = (μ 1, …, μ p)′ and λ j(Σ) denotes the jth eigenvalue of Σ. Although these assumptions are not strictly necessary for making inference on the GMVP weight, they simplify the technical treatment considerably.

An obvious estimator of w is obtained by replacing the unknown covariance matrix by its sample counterpart. The resulting estimate is commonly referred to as the standard estimator, henceforth denoted by \(\widehat {\mathbf {w}}_0\). This estimator is central in the paper, and we state some basic properties for the sake of clarity.

Property 10.1

Assume x i ∼iidN p(μ, Σ), i = 1, …, n, p ≥ 4 and n ≥ p + 2. Let \(S = n^{-1}\sum _{i=1}^n ({\mathbf {x}}_i-\bar {\mathbf {x}})({\mathbf {x}}_i-\bar {\mathbf {x}})'\), where \(\bar {\mathbf {x}} = n^{-1}\sum _{i=1}^n{\mathbf {x}}_i\). Define

$$\displaystyle \begin{aligned} (i) & \quad \mathbf{w} = \frac{\varSigma^{-1}\mathbf{1}}{\mathbf{1}'\varSigma^{-1}\mathbf{1}},\\ (ii) & \quad \sigma^2 = \frac{1}{\mathbf{1}'\varSigma^{-1}\mathbf{1}},\\ (iii) & \quad \widehat{\mathbf{w}}_0 = (\widehat w_{0(1)},\dots,\widehat w_{0(p)})' = \frac{{\mathbf{S}}^{-1}\mathbf{1}}{\mathbf{1}'{\mathbf{S}}^{-1}\mathbf{1}}. \end{aligned} $$

Then

$$\displaystyle \begin{aligned} \widehat{\mathbf{w}}_0 \sim t_p\left(\mathbf{w}, \frac{\sigma^2\varSigma^{-1}-\mathbf{w}\mathbf{w}'}{n-p+1},n-p+1\right), \end{aligned} $$

where t p(⋅) denotes a p-dimensional singular t-distribution with n − p + 1 degrees of freedom, location vector w and dispersion matrix σ 2Σ −1 − ww′, with rank(σ 2Σ −1 − ww′) = p − 1.

Proof ([18, 22])

See also [19, Chapter 1], for a definition of the multivariate t-distribution. □
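A minimal sketch of the standard estimator \(\widehat {\mathbf {w}}_0\) of Property 10.1, computed from simulated normal returns (the dimensions and parameter values below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 60                       # p >= 4 and n >= p + 2, as in Property 10.1
mu = np.zeros(p)
sigma = 0.5 * np.eye(p) + 0.5      # equicorrelated covariance (illustrative)

x = rng.multivariate_normal(mu, sigma, size=n)   # n observed return vectors
xbar = x.mean(axis=0)
S = (x - xbar).T @ (x - xbar) / n                # ML covariance estimate, as in Property 10.1
ones = np.ones(p)
theta_hat = np.linalg.solve(S, ones)             # S^{-1} 1
w0_hat = theta_hat / (ones @ theta_hat)          # standard GMVP estimator

assert np.isclose(w0_hat.sum(), 1.0)             # valid portfolio weights
```

Note that S is invertible with probability one under the stated conditions, so the estimator is well defined almost surely.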

It is well known that the sampling variance of \(\widehat {\mathbf {w}}_0\) can be substantial when n is small relative to the dimension p, making the estimator of limited relevance to an investor. A considerable amount of research has been concerned with the development of improved estimators of w [2, 4, 8, 12]. A common approach is to first decide on a family of estimators, which usually depends on some tuning coefficient, and then use a risk function to identify an appropriate value of this coefficient. One concern with this approach, however, is that two different risk functions usually produce two different values of the tuning coefficient, and hence the distributional properties of the portfolio weight estimator may strongly depend on which specific risk function is being used. The next section will discuss this matter further.

10.3 Risk Measures and Portfolio Estimation

The original view of portfolio optimization and risk as stated by [20] is that the w j’s “are not random variables, but are fixed by the investor” and that “risk” is described by the variance of the return. The “risk” an investor is facing is accordingly determined by var[w′x] = w′Σw, where Σ = cov[x i]. However, the definition of “risk” is less obvious when w is estimated from data, because of the additional uncertainty due to sampling variance and bias. From the perspective of statistical inference, the term “risk” refers to the sampling variance and bias of a parameter estimator (say \(\widehat {\mathbf {w}}\)), while in portfolio theory “risk” primarily refers to the variance of the return vector x. Since the estimated portfolio return is defined by \(r_i = \widehat {\mathbf {w}}'{\mathbf {x}}_i\), it involves risk in both senses.

Following Markowitz’s view of a fixed (non-random) portfolio, the Lagrangian of the optimization problem in (10.1) may be formulated

$$\displaystyle \begin{aligned} L(\mathbf{w}, \varSigma,\lambda_0) = \frac{1}{2}\mathbf{w}'\varSigma\mathbf{w} - \lambda_0(\mathbf{w}'\mathbf{1}-1), \end{aligned} $$
(10.3)

where λ 0 is a Lagrange multiplier. Taking the derivative of L(w, Σ, λ 0) w.r.t. w and setting it equal to zero yields the condition

$$\displaystyle \begin{aligned} \mathbf{w} = \lambda_0\varSigma^{-1}\mathbf{1}. \end{aligned} $$
(10.4)

Since w′1 = 1 it follows that λ 0 = (1′Σ −11)−1 and we obtain (10.2). Note that the identity (10.4) is completely determined by Σ −11 in the sense that if we define θ = Σ −11, then the solution to the optimization problem depends on θ only. When it comes to random (estimated) portfolio weights, the optimization problem has been formulated in different ways in the literature. Jagannathan and Ma [13] specify a constrained portfolio variance minimization problem as

$$\displaystyle \begin{aligned} \min_{\mathbf{w}\in \mathbb R^p} \{\mathbf{w}'\widehat\varSigma\mathbf{w} |\mathbf{w}'\mathbf{1} = 1, 0\leq w_j, w_j \leq \varpi, j=1,\dots,p \}, \end{aligned} $$
(10.5)

where \(\widehat \varSigma \) is some estimator of Σ and ϖ is a finite constant such that w j ≤ ϖ defines an upper-bound constraint for the weights. Let \(\lambda = \begin {pmatrix} \lambda _1&\dots &\lambda _p\end {pmatrix}'\) be the Lagrange multiplier for 0 ≤ w j and \(\delta = \begin {pmatrix} \delta _1 &\dots & \delta _p\end {pmatrix}'\) the Lagrange multiplier for w j ≤ ϖ, and define \(\widetilde \varSigma = \widehat \varSigma + (\delta \mathbf {1}'+\mathbf {1}\delta ') - (\lambda \mathbf {1}'+\mathbf {1}\lambda ')\), where \(\widehat \varSigma \) is the normal-theory unconstrained maximum likelihood (ML) estimate of Σ. Jagannathan and Ma [13] showed that \(\widetilde \varSigma \) is positive semidefinite, and that constructing a constrained global minimum variance portfolio from \(\widehat \varSigma \) (i.e., solving (10.5)) is equivalent to constructing the unconstrained minimum variance portfolio from \(\widetilde \varSigma \). Thus the constraints can be imposed in the estimation stage instead of the optimization stage and the result will be the same. This is a fundamental result in portfolio theory because it connects Markowitz’s theoretical portfolio theory with statistical inference theory, including both frequentist and Bayesian treatments.

DeMiguel et al. [4] proposed a norm-constrained minimum variance portfolio as one that solves \(\min _{\mathbf {w}\in \mathbb R^p}\{\mathbf {w}'\varSigma \mathbf {w}\;|\; \mathbf {w}'\mathbf {1} = 1, \lVert \mathbf {w}\rVert \leq \varpi \}\), where \(\lVert \mathbf {w}\rVert \) denotes either the L 1 norm \(\lVert \mathbf {w}\rVert _1 = \sum _{j=1}^p |w_j|\) or the L 2 norm \(\lVert \mathbf {w}\rVert _2 = \sqrt {\mathbf {w}'Q\mathbf {w}}\) for some positive definite matrix Q. They showed that the solution under the constraint \(\lVert \mathbf {w}\rVert _1=1\) coincides with the short-sale-constrained estimator.

[8] take a rather different view of the optimization problem and argue that, for some estimator \(\widehat {\mathbf {w}}\) based on returns x 1, …, x n, the quantity \(var[{\mathbf {x}}_{n+1}^{\prime }\widehat {\mathbf {w}}|\mathcal {F}_n] = \widehat {\mathbf {w}}'\varSigma \widehat {\mathbf {w}}\), where \(\mathcal {F}_n\) is the information set up to time n, represents the actual variance of the return belonging to the portfolio \(\widehat {\mathbf {w}}\). References [2] and [23] adopt a similar approach.

There currently seems to be no consensus, or unified theory, on how to formulate a statistical version of Markowitz’s fixed-portfolio optimization problem. In what follows we will discuss some consequences of the view one takes on the optimization problem and the measure used to evaluate the properties of a weight estimator.

Portfolio Weight Estimators

The GMVP weight vector w = (1′Σ −11)−1Σ −11 involves only one unknown quantity, Σ −1. One can therefore specify w as a function of Σ −1, say w = f(Σ −1) = (1′Σ −11)−1Σ −11. A portfolio estimator may thus be obtained by substituting an estimator \(\widehat \varSigma ^{-1}\) into w to obtain \(\widehat {\mathbf {w}} = f(\widehat \varSigma ^{-1}) = (\mathbf {1}'\widehat \varSigma ^{-1}\mathbf {1})^{-1}\widehat \varSigma ^{-1}\mathbf {1}\). Any estimator \(f(\widehat \varSigma ^{-1})\) will be a sufficient statistic for w as long as \(\widehat \varSigma ^{-1}\) is a sufficient statistic for Σ −1. Such “plug-in” estimators are therefore completely legitimate from an inferential point of view. The literature on estimation of Σ −1 is, in turn, extensive. Some important references include [5, 7, 10,11,12, 24, 26]. A comprehensive survey of improved estimators of Σ −1, including Bayesian frameworks, is given in [25]. We will, however, not proceed along this path but instead consider a more restricted class of estimators.
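Any invertible estimate of Σ −1 can be plugged into f. The sketch below illustrates this with two choices: the inverse of the sample covariance, and the inverse of a generic linear shrinkage of the sample covariance toward the identity (the shrinkage form and its weight 0.8 are assumptions for the illustration, not a specific estimator from the cited references):

```python
import numpy as np

def plug_in_gmvp(sigma_inv_hat):
    """f(Sigma_inv) = (1' Sigma_inv 1)^{-1} Sigma_inv 1."""
    ones = np.ones(sigma_inv_hat.shape[0])
    theta = sigma_inv_hat @ ones
    return theta / (ones @ theta)

rng = np.random.default_rng(6)
p, n = 5, 40
sigma = 0.5 * np.eye(p) + 0.5             # illustrative true covariance
x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(x, rowvar=False)               # sample covariance

# Standard plug-in: invert S directly
w_std = plug_in_gmvp(np.linalg.inv(S))
# Alternative plug-in: invert a linear shrinkage of S toward the identity
S_shrunk = 0.8 * S + 0.2 * np.eye(p)
w_shr = plug_in_gmvp(np.linalg.inv(S_shrunk))

# Both are valid portfolios: the normalization in f forces the weights to sum to one
assert np.isclose(w_std.sum(), 1.0) and np.isclose(w_shr.sum(), 1.0)
```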

Let \(\widehat {\mathbf {w}}_0\) denote the standard estimator defined in Property 10.1 and w ref denote some pre-determined reference portfolio, which could be fixed or random (but independent of \(\widehat {\mathbf {w}}_0\)). We denote a weighted mean between these two quantities as follows

$$\displaystyle \begin{aligned} \widehat{\mathbf{w}}_{\alpha} = (1-\alpha)\widehat{\mathbf{w}}_0 + \alpha{\mathbf{w}}_{ref}, \quad 0\leq \alpha\leq 1. \end{aligned} $$
(10.6)

The estimator \(\widehat {\mathbf {w}}_{\alpha }\), which is closely related to a family of estimators proposed by [14], has been used by [2, 8, 18]. In order to determine an appropriate value of the tuning coefficient α one usually applies a loss function. A common squared loss is defined by \(l = (\widehat {\mathbf {w}} - \mathbf {w})'Q(\widehat {\mathbf {w}} - \mathbf {w})\), where Q is a positive definite matrix. Taking the expected value of l we obtain a quadratic risk function for \(\widehat {\mathbf {w}}_{\alpha }\) as

$$\displaystyle \begin{aligned} \mathfrak R(\widehat{\mathbf{w}}_{\alpha},Q) = E[(\widehat{\mathbf{w}}_{\alpha} - \mathbf{w})'Q(\widehat{\mathbf{w}}_{\alpha} - \mathbf{w})]. \end{aligned} $$
(10.7)

The value of α that minimizes \(\mathfrak R\) is then defined as the optimal α. By different choices of Q we obtain several common risk functions as special cases of (10.7). For example, \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I)\) is the common mean squared error (MSE), which consists of the variance plus the squared bias. The particular risk function \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma )\) has been used by [8] and [18], while [2] used the loss \(l = (\widehat {\mathbf {w}}_{\alpha } - \mathbf {w})'\varSigma (\widehat {\mathbf {w}}_{\alpha } - \mathbf {w})\) directly, rather than its expected value, to derive an optimal portfolio estimator. Yet another way to evaluate the properties of a portfolio weight estimator is through its predictive, or out-of-sample, properties. The predicted return of some estimator \(\widehat {\mathbf {w}}\) is obtained by \(\widehat {\mathbf {w}}'{\mathbf {x}}_t\), where x t is a vector of returns not used in \(\widehat {\mathbf {w}}\), i.e., x t∉{x 1, …, x n}. The mean squared error of prediction (MSEP) of \(\widehat {\mathbf {w}}\) may then be evaluated by \(E[(\widehat {\mathbf {w}}'{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\). Note that MSEP is also a special case of (10.7), for

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}'{\mathbf{x}}_t - \mathbf{w}'{\mathbf{x}}_t)^2] & = E[(\widehat{\mathbf{w}} - \mathbf{w})'{\mathbf{x}}_t{\mathbf{x}}_t^{\prime}(\widehat{\mathbf{w}} - \mathbf{w})] \\ & = E[(\widehat{\mathbf{w}} - \mathbf{w})'(\varSigma + \mu\mu')(\widehat{\mathbf{w}} - \mathbf{w})] = \mathfrak R(\widehat{\mathbf{w}},\varSigma+\mu\mu'). \end{aligned} $$
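The identity above rests on E[x tx t′] = Σ + μμ′ (and, when \(\widehat {\mathbf {w}}\) is random, on its independence of x t). For a fixed \(\widehat {\mathbf {w}}\) it can be checked by Monte Carlo simulation; all parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
mu = np.array([0.02, 0.05, 0.03])         # illustrative mean returns
sigma = np.diag([0.04, 0.09, 0.05])       # illustrative diagonal covariance
w = np.linalg.solve(sigma, np.ones(p))
w /= w.sum()                              # true GMVP weights
w_hat = np.array([0.5, 0.3, 0.2])         # some fixed estimate, for illustration
d = w_hat - w

x = rng.multivariate_normal(mu, sigma, size=400_000)  # out-of-sample returns
msep_mc = np.mean((x @ d) ** 2)                       # E[(w_hat'x - w'x)^2], Monte Carlo
msep_exact = d @ (sigma + np.outer(mu, mu)) @ d       # d'(Sigma + mu mu')d

# The simulated MSEP matches the closed form up to Monte Carlo error
assert np.isclose(msep_mc, msep_exact, rtol=0.05)
```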

In what follows we give explicit expressions of \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },Q)\) for some particular values of Q.

Proposition 10.1

Let \(\widehat {\mathbf {w}}_{\alpha } = (1-\alpha )\widehat {\mathbf {w}}_0 + \alpha p^{-1}\mathbf {1}\) for some 0 ≤ α ≤ 1, and let \(E[\widehat {\mathbf {w}}_0] = \mathbf {w}\), \(R=cov[\widehat {\mathbf {w}}_0] = \dfrac {\sigma ^2\varSigma ^{-1}-\mathbf {w}\mathbf {w}'}{n-p-1}\), where σ 2 = (1′Σ −11)−1. Let x t be an out-of-sample vector of returns, and define μ = E[x t], μ w = w′μ and \(\bar \mu = p^{-1}\mathbf {1}'\mu \). Then the following risk identities hold:

  (a) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = (1-\alpha )^2\mathrm {tr}\{R\}+\alpha ^2(\mathbf {w}'\mathbf {w}-p^{-1})\),

  (b) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma ) = \dfrac {(1-\alpha )^2}{n-p-1}\sigma ^2(p-1)+\alpha ^2(p^{-2}\mathbf {1}'\varSigma \mathbf {1} - \sigma ^2)\),

  (c) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\mu \mu ') = (1-\alpha )^2\mu 'R\mu + \alpha ^2(\mu _w-\bar \mu )^2\),

  (d) \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma + \mu \mu ') = \dfrac {(1-\alpha )^2}{n-p-1}(\sigma ^2(p-1)+\sigma ^2\mu '\varSigma ^{-1}\mu -\mu _w^2)+\alpha ^2((p^{-2}\mathbf {1}'\varSigma \mathbf {1}-\sigma ^2)+(\mu _w-\bar \mu )^2)\).

Before the proof of Proposition 10.1 we will state a useful identity in the following lemma.

Lemma 10.1

Let \(\widehat {\mathbf {w}}_{\alpha } = (1-\alpha )\widehat {\mathbf {w}}_0 + \alpha p^{-1}\mathbf {1}\)for some 0 ≤ α ≤ 1, and let \(E[\widehat {\mathbf {w}}_0] = \mathbf {w}\), \(cov[\widehat {\mathbf {w}}_0] = R\). Then it holds that

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'] = (1-\alpha)^2R+\alpha^2(\mathbf{w}-p^{-1}\mathbf{1})(\mathbf{w}-p^{-1}\mathbf{1})'. \end{aligned} $$

Proof

See Appendix. □

We are now ready to give the proof of Proposition 10.1.

Proof

From Lemma 10.1 we find that

$$\displaystyle \begin{aligned} \mathfrak R(\widehat{\mathbf{w}}_{\alpha},I) = \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})']\} = (1-\alpha)^2\mathrm{tr}\{R\}+\alpha^2(\mathbf{w}'\mathbf{w}-p^{-1}), \end{aligned} $$

which establishes (a). Applying Lemma 10.1 again we obtain (b) since

$$\displaystyle \begin{aligned} \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'\varSigma]\} =(1-\alpha)^2\mathrm{tr}\{R\varSigma\}+\alpha^2(p^{-2}\mathbf{1}'\varSigma\mathbf{1} - \sigma^2) \end{aligned} $$

and

$$\displaystyle \begin{aligned} \mathrm{tr}\{R\varSigma\} = \frac{\mathrm{tr}\{(\sigma^2\varSigma^{-1}-\mathbf{w}\mathbf{w}')\varSigma\}}{n-p-1} = \frac{\mathrm{tr}\{\sigma^2I-\mathbf{w}\mathbf{w}'\varSigma\}}{n-p-1} = \frac{p-1}{n-p-1}\sigma^2. \end{aligned} $$

The identity (c) is similarly obtained by

$$\displaystyle \begin{aligned} \mathrm{tr}\{E[(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})(\widehat{\mathbf{w}}_{\alpha}-\mathbf{w})'\mu\mu']\} =(1-\alpha)^2\mu'R\mu + \alpha^2(\mathbf{w}'\mu-p^{-1}\mathbf{1}'\mu)^2, \end{aligned} $$

while (d) is obtained by adding (b) and (c) and simplifying terms. □
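Since the risk identities of Proposition 10.1 reduce to matrix algebra once R and w are given, they can be verified numerically against the trace form supplied by Lemma 10.1. A sketch with illustrative Σ, μ, n and α:

```python
import numpy as np

p, n, alpha = 4, 30, 0.3                  # illustrative dimensions and tuning value
rng = np.random.default_rng(3)
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # a positive definite covariance matrix
mu = rng.normal(scale=0.1, size=p)
ones = np.ones(p)

sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)        # sigma^2 = (1' Sigma^{-1} 1)^{-1}
w = s2 * sig_inv @ ones                   # GMVP weights
R = (s2 * sig_inv - np.outer(w, w)) / (n - p - 1)
b = w - ones / p                          # bias direction w - p^{-1} 1

def risk(Q):
    """tr{E[(w_a - w)(w_a - w)'] Q}, with the expectation given by Lemma 10.1."""
    M = (1 - alpha) ** 2 * R + alpha ** 2 * np.outer(b, b)
    return np.trace(M @ Q)

mw, mbar = w @ mu, mu.mean()
# Closed forms (a)-(c) of Proposition 10.1; (d) is their combination (b) + (c)
a_cf = (1 - alpha) ** 2 * np.trace(R) + alpha ** 2 * (w @ w - 1 / p)
b_cf = (1 - alpha) ** 2 / (n - p - 1) * s2 * (p - 1) \
       + alpha ** 2 * (ones @ sigma @ ones / p ** 2 - s2)
c_cf = (1 - alpha) ** 2 * mu @ R @ mu + alpha ** 2 * (mw - mbar) ** 2

assert np.isclose(risk(np.eye(p)), a_cf)                         # identity (a)
assert np.isclose(risk(sigma), b_cf)                             # identity (b)
assert np.isclose(risk(np.outer(mu, mu)), c_cf)                  # identity (c)
assert np.isclose(risk(sigma + np.outer(mu, mu)), b_cf + c_cf)   # identity (d)
```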

Remark 10.1

The MSEP measure in (d), i.e., \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\), describes the variability of \(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t\) around the random quantity w′x t. An alternative way of defining the MSEP is \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\), which describes the variability of \(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t\) around the non-random point w′E[x t]. The two measures differ in that

$$\displaystyle \begin{aligned} E[(\widehat{\mathbf{w}}^{\prime}_{\alpha}{\mathbf{x}}_t - \mathbf{w}'\mu)^2] = E[(\widehat{\mathbf{w}}^{\prime}_{\alpha}{\mathbf{x}}_t - \mathbf{w}'{\mathbf{x}}_t)^2] + (\mathbf{1}'\varSigma^{-1}\mathbf{1})^{-1}. \end{aligned} $$

See the Appendix for a proof of this identity. Note that the variance component of any of the risks (a)–(d) of Proposition 10.1 can be brought down to 0, attained at α = 1 (for which \(\widehat {\mathbf {w}}_{\alpha }\) is non-random), whereas for \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\) the minimum prediction variance is bounded below by (1′Σ −11)−1 regardless of the value of α.

For purposes of deriving an estimator of w, the choice between \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'{\mathbf {x}}_t)^2]\) and \(E[(\widehat {\mathbf {w}}^{\prime }_{\alpha }{\mathbf {x}}_t - \mathbf {w}'\mu )^2]\) makes no difference, but for calculation of prediction intervals our view of the centre point does matter.
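The decomposition in Remark 10.1 can be checked without simulation: for any fixed portfolio \(\widehat {\mathbf {w}}\) with \(\widehat {\mathbf {w}}'\mathbf {1} = 1\), independent of x t, both sides reduce to closed-form moments of the normal distribution. A sketch under illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 4
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # illustrative covariance
mu = rng.normal(scale=0.1, size=p)
ones = np.ones(p)

sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)        # (1' Sigma^{-1} 1)^{-1}
w = s2 * sig_inv @ ones                   # true GMVP weights

w_hat = rng.normal(size=p)
w_hat /= w_hat.sum()                      # any fixed portfolio with w_hat'1 = 1
d = w_hat - w

# E[(w_hat'x_t - w'mu)^2] = var(w_hat'x_t) + (w_hat'mu - w'mu)^2
lhs = w_hat @ sigma @ w_hat + (d @ mu) ** 2
# E[(w_hat'x_t - w'x_t)^2] + (1' Sigma^{-1} 1)^{-1}
rhs = d @ (sigma + np.outer(mu, mu)) @ d + s2

assert np.isclose(lhs, rhs)               # the two MSEP definitions differ by s2
```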

Remark 10.2

The bias terms vanish trivially if α = 0 and/or Σ = I. More generally, the bias terms of \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I)\), \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma )\) and \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\mu \mu ')\) vanish when p1′Σ −21 = (1′Σ −11)2, p −21′Σ1 = σ 2 and \(\mu _w = \bar \mu \), respectively. While the first two conditions hold whenever 1 is an eigenvector of Σ (of which Σ = I is a special case), the identity \(\mu _w=\bar \mu \) holds when the portfolio mean return equals the average asset mean.
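That the first two bias conditions do not require Σ = I can be seen from an equicorrelated covariance matrix, for which 1 is an eigenvector (the values below are illustrative):

```python
import numpy as np

p = 5
sigma = 0.2 * np.eye(p) + 0.1             # equicorrelated: the ones vector is an eigenvector
ones = np.ones(p)
sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)

# p 1'Sigma^{-2} 1 = (1'Sigma^{-1} 1)^2  -- bias of R(w_a, I) vanishes
assert np.isclose(p * ones @ sig_inv @ sig_inv @ ones, (ones @ sig_inv @ ones) ** 2)
# p^{-2} 1'Sigma 1 = sigma^2             -- bias of R(w_a, Sigma) vanishes
assert np.isclose(ones @ sigma @ ones / p ** 2, s2)
# and yet Sigma is not the identity matrix
assert not np.allclose(sigma, np.eye(p))
```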

Remark 10.3

The optimal value of the tuning coefficient α may be derived by setting \(\dfrac {\partial \mathfrak R(\widehat {\mathbf {w}}_{\alpha },Q)}{\partial \alpha }\) equal to zero and solving for α. The resulting value, say α opt, then yields an adaptive estimator \(\widehat {\mathbf {w}}_{\alpha _{opt}} = (1-\alpha _{opt})\widehat {\mathbf {w}}_0 + \alpha _{opt}p^{-1}\mathbf {1}\). However, α opt is by necessity a function of unknown population parameters. Operational, or “bona fide”, estimators are usually obtained by substituting estimators for these unknown parameters to obtain an approximation \(\widehat \alpha \). The adapted portfolio estimator is then defined by \(\widehat {\mathbf {w}}_{\widehat \alpha } = (1-\widehat \alpha )\widehat {\mathbf {w}}_0 + \widehat \alpha p^{-1}\mathbf {1}\). The coefficient \(\widehat \alpha \) is hence random, which in turn distorts the otherwise known distribution of \(\widehat {\mathbf {w}}_{\alpha }\). It is still possible to conduct inferential analysis (interval estimation, etc.) through the theory of oracle inequalities, but at the price of a more technically involved treatment (see [3] for a comprehensive survey of the topic). Another option is to determine α qualitatively, for example based on expert knowledge of the financial market. This method has the appealing property of retaining the sampling distribution of \(\widehat {\mathbf {w}}_{\alpha }\).
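As an example of the minimization described in Remark 10.3, take Q = I: the risk in Proposition 10.1(a) has the form (1 − α)²V + α²B with V = tr{R} and B = w′w − p⁻¹, so setting its derivative to zero gives α opt = V∕(V + B). The sketch below checks this against a grid search (parameter values are illustrative; in practice the population quantities V and B would be replaced by estimates, giving a random \(\widehat \alpha \)):

```python
import numpy as np

p, n = 4, 30
rng = np.random.default_rng(5)
A = rng.normal(size=(p, p))
sigma = A @ A.T + p * np.eye(p)           # illustrative covariance
ones = np.ones(p)
sig_inv = np.linalg.inv(sigma)
s2 = 1.0 / (ones @ sig_inv @ ones)
w = s2 * sig_inv @ ones
R = (s2 * sig_inv - np.outer(w, w)) / (n - p - 1)

V = np.trace(R)                           # variance term of risk (a)
B = w @ w - 1 / p                         # squared-bias term of risk (a); >= 0 since w'1 = 1
alpha_opt = V / (V + B)                   # stationary point of (1 - a)^2 V + a^2 B

grid = np.linspace(0, 1, 10_001)
risk = (1 - grid) ** 2 * V + grid ** 2 * B
assert abs(grid[np.argmin(risk)] - alpha_opt) < 1e-3   # grid minimum matches closed form
assert 0 <= alpha_opt <= 1
```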

Remark 10.4

The original portfolio problem as posed by [20] considers the return of a non-random portfolio at data point t (e.g., a time point), determined by w′x t with variance var[w′x t] = tr{cov[x t]ww′} = w′Σw, which is void of concerns about consistency, bias, etc. The fixed-portfolio theory uses Lagrange multipliers to derive the global minimum variance portfolio \(\mathbf {w} = \dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\), which minimizes var[w′x t] subject to the constraint \(\sum _{j=1}^pw_j=1\). How to proceed from there to statistical estimation is less obvious. It is customary to apply the criterion \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },\varSigma ) = E[(\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})]\) to derive estimators of w, but it is not clear what is gained by minimizing this risk instead of the unweighted risk \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = E[(\widehat {\mathbf {w}} - \mathbf {w})'(\widehat {\mathbf {w}} - \mathbf {w})]\). For any consistent estimator, say \(\widehat {\mathbf {w}} = \dfrac {\widehat \varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\widehat \varSigma ^{-1}\mathbf {1}}\), where \(\widehat \varSigma ^{-1}\) is some consistent estimator of Σ −1, the asymptotic return variance is \(\lim _{n\rightarrow \infty }var[\widehat {\mathbf {w}}'{\mathbf {x}}_t] = \mathbf {w}'\varSigma \mathbf {w}\) regardless of which risk function was used to obtain \(\widehat {\mathbf {w}}\). The weight Q = Σ has already been used in the minimization of w′Σw to obtain the functional form \(\dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\), and there is no apparent gain in using the weighting \((\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})\) once again in the estimation stage. In fact, Σ −1 is the only unknown parameter in \(\dfrac {\varSigma ^{-1}\mathbf {1}}{\mathbf {1}'\varSigma ^{-1}\mathbf {1}}\). Knowing Σ (up to multiplication by a scalar) is therefore equivalent to knowing w. Since Σ is a one-to-one transformation of Σ −1, the weighted loss \((\widehat {\mathbf {w}} - \mathbf {w})'\varSigma (\widehat {\mathbf {w}} - \mathbf {w})\) in a sense uses the true parameter to derive its own estimator. In this view the unweighted risk \(\mathfrak R(\widehat {\mathbf {w}}_{\alpha },I) = E[(\widehat {\mathbf {w}} - \mathbf {w})'(\widehat {\mathbf {w}} - \mathbf {w})]\) may better reflect the actual risk of an estimator \(\widehat {\mathbf {w}}\).
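The claim about the asymptotic return variance can be illustrated by simulation: for a large sample, the return variance \(\widehat {\mathbf {w}}'\varSigma \widehat {\mathbf {w}}\) of the plug-in estimator is close to w′Σw. A sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 4, 50_000                          # large n: consistency regime
sigma = 0.3 * np.eye(p) + 0.1             # illustrative covariance
ones = np.ones(p)
w = np.linalg.solve(sigma, ones)
w /= w.sum()                              # population GMVP weights

x = rng.multivariate_normal(np.zeros(p), sigma, size=n)
S = np.cov(x, rowvar=False)
w_hat = np.linalg.solve(S, ones)
w_hat /= w_hat.sum()                      # consistent plug-in estimator

# Return variance of the estimated portfolio approaches that of the population GMVP;
# the gap (w_hat - w)' Sigma (w_hat - w) is of order 1/n.
assert np.isclose(w_hat @ sigma @ w_hat, w @ sigma @ w, rtol=1e-3)
```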