1 Introduction

The seminal work of Markowitz (1952) changed the landscape of asset allocation problems, which up to that point were usually tackled in an ad-hoc fashion, see Kolm et al. (2014). By casting portfolio selection as a well-defined optimization problem, he established the risk-return paradigm which is still the fundamental reference framework used by most investment professionals. In essence, he created the conceptual structure that gave birth to modern portfolio theory.

While Markowitz’s contribution cannot be overstated, the direct application of the method proposed in the original paper proved problematic in practice for a variety of reasons. First, the sensitivity of the solution to the inputs (expected returns and the covariance matrix) is considerable, and given that estimation errors are always present, the resulting portfolio is often deemed unreliable (see Kolm et al. 2014 and references therein). Second, it has been observed that in many cases the solution is not fully diversified. It is widely accepted that diversification is highly desirable in practice since it is an efficient way of protecting investors against unexpected events (e.g. situations of extreme volatility or geopolitical turmoil). However, forcing diversification in an ad-hoc fashion can lead to poor-quality solutions due to an overly constrained feasible set. In Green and Hollifield (1992) the authors derived theoretical conditions under which diversification occurs for the mean-variance problem.

In addition, using the variance as a risk measure brings both practical and theoretical limitations. Being symmetric, the variance is sensitive to extreme values on both ends of the distribution. In essence, it mixes both “positive” and “negative” events, offering a distorted assessment of the true risk the investor is exposed to. In the last 60 years there has been a significant amount of research into different ways of capturing risk (see the excellent surveys by Krokhmal et al. 2011; Mansini et al. 2014), and we make use of some of those risk measures in the present work.

Markowitz’s optimization model assumes the parameters are known exactly, which, of course, is not the case in practice. Several alternatives have been proposed to overcome this issue: popular choices include the assumption that the parameters follow a known distribution function, or that they belong to a specific uncertainty set. In the first case, a widely used methodology is the sample average approximation (SAA) approach described in Shapiro (2003). Given a distribution function for the returns, a deterministic approximation of the original stochastic problem can be constructed by drawing samples from the given distribution. Under certain conditions, it can be shown that, as the sample size increases, the optimal solution and optimal value of the SAA problem converge to their exact counterparts in the original stochastic problem. More traditional methods in stochastic programming often make assumptions regarding the underlying distribution of the problem. SAA is flexible, and it has been shown that even for problems with an astronomical number of scenarios good candidate solutions can be obtained by sampling a few thousand of those scenarios (e.g. Linderoth et al. 2006).

Another popular approach is robust optimization (RO) and its variations. First proposed by Soyster (1973) and later developed in Bertsimas and Sim (2004), RO pursues a worst-case approach: it attempts to protect the decision maker against all possible realizations of the random parameters by considering an uncertainty set. Thus, the decision maker solves a deterministic problem to obtain a solution that offers protection against all possible realizations within this set. RO also helps mitigate estimation errors, generating portfolios that are less sensitive to parameter changes. Examples of RO applications in finance include DeMiguel and Nogales (2009), Fernandes et al. (2016), Goldfarb and Iyengar (2003), Kawas and Thiele (2011), Wang et al. (2016) and Quaranta and Zaffaroni (2008).

Building on these methods, more recent approaches consider risk-averse SAA formulations that combine machine learning and regularization schemes to obtain diversified portfolios with good out-of-sample performance. Machine learning tools, which are widely used in other areas such as regression analysis (see Bishop 2006) and data mining (see Witten et al. 2011), have recently been making their way into portfolio management. Techniques developed in this discipline, such as cross-validation and classification algorithms, can be of great value for estimation procedures and as decision-support tools.

Regularization techniques, commonly used in high-dimensional regression problems (e.g. Belloni and Chernozhukov 2013; Candes and Tao 2007), have also made an entrance in portfolio optimization due to their ability to cope with numerical stability issues (e.g. Tikhonov 1963) or to induce sparsity of the solution (e.g. Lasso, Tibshirani 1996). In Brodie et al. (2009) the authors reformulated the mean-variance problem as a constrained least-squares regression and added an \(\ell _1\) penalty to the objective function that encourages sparse portfolios. The \(\ell _1\) regularization is the most common approach to stabilize solutions of portfolio problems; see Corsaro and De Simone (2019) and Dai and Wen (2018) for additional examples. In Fastrich et al. (2015) the authors build on \(\ell _1\) (Lasso) regularization and propose new non-convex regularization terms that exhibit good performance when applied to large data sets, using cross-validation to estimate the regularization parameter. Using tools from statistical learning theory, Still and Kondor (2010) use the \(\ell _2\) norm of the weight vector to induce diversification and achieve stability in out-of-sample experiments.

Recently, Ban et al. (2016) proposed a method called performance-based regularization (PBR) which focuses on both sides of the optimization problem: it acknowledges parameter estimation difficulties, and aims at generating solutions that are more diversified, which is key in practice. The central idea is to exclude solutions that are in-sample optimal, but have potentially high out-of-sample variability. The authors considered the problem of minimizing two risk measures, the variance and the Conditional Value-at-Risk (CVaR), subject to having returns higher than some threshold, in addition to regularization constraints.

In this paper, we formulate an optimization problem based on a maximization-of-return approach (instead of a minimization-of-risk approach), since we believe it is more intuitive from a practical standpoint. With that as background, our objective is twofold.

First, we consider three different risk measures to cast the constraints of the portfolio optimization problem: (i) integrated chance constraints, proposed by Haneveld (1986); (ii) quantile deviation (see Cotton and Ntaimo 2015); and (iii) absolute semi-deviation, proposed in Ogryczak and Ruszczyński (2002). We prove certain properties of these measures that are key to constructing the corresponding PBR version of the relevant constraints.

Second, an extensive numerical comparison between a pure SAA approach and a PBR-based approach is presented using the three risk measures just described, in addition to the CVaR. In light of the mounting evidence in favor of passive strategies vis-à-vis active portfolio selection strategies, we have chosen to cast the optimization problem based on indices rather than individual stocks or bonds (see Arnott et al. 2000; Bogle 1995; Elton and Blake 1996; Malkiel 1995, 1996), and we rebalance the portfolio only once a year. The experiments cover the 10-year period between January 2003 and December 2012, and thus they include the subprime crisis, the most challenging market environment of the last 50 years. The strategy is based on a rolling horizon scheme, in which past data are used to estimate the portfolio optimization parameters, and the performance of the optimal solution is tested with samples obtained from a parameterized model of returns. Across different samples and risk measures, the results show that the formulation with PBR constraints yields higher levels of diversification than the solutions obtained with the SAA approach. Nevertheless, in periods of relative market stability, SAA outperforms PBR because the solutions with more variability end up being the ones with higher expected returns.

The rest of the paper is organized as follows. Section 2 describes common regularized mean-risk formulations, including PBR, as well as our modification based on a max-return framework. In Sect. 3 we derive the relevant expressions needed to use the three risk measures mentioned above within the context of PBR. Section 4 shows the numerical simulation results, and Sect. 5 presents our conclusions.

2 Regularized mean-risk formulations

Consider the following mean-risk formulation:

$$\begin{aligned} \min _{w \in {\mathbb {R}}^{p}} \quad&\mathrm {Risk}(w^{T}X), \nonumber \\ s.t. \quad&w^{T}{\mathbf {1}}_{p} = 1,\nonumber \\&w^{T}\mu = R, \nonumber \\&(w \ge 0), \end{aligned}$$
(1)

where \(w \in {\mathbb {R}}^{p}\) is the investor’s portfolio, \({\mathbf {1}}_{p} \in {\mathbb {R}}^{p}\) is a vector of ones, \(X \in {\mathbb {R}}^{p}\) is a random vector representing the returns of the p assets, \(\mu = {\mathbb {E}}(X)\) is the vector of expected values of the components of X and \(\mathrm {Risk}: {\mathcal {X}} \rightarrow {\mathbb {R}}\) is a risk measure defined on some space of random variables, e.g. the \(L^1\) space. The parentheses around the last constraint indicate that it is optional (it corresponds to a no-short-selling requirement). For most risk measures, and most distribution functions, it is not possible to solve problem (1) explicitly. The SAA formulation associated with problem (1) is given by

$$\begin{aligned} \min _{w \in {\mathbb {R}}^{p}} \quad&\widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}}),\nonumber \\ s.t. \quad&w^{T}{\mathbf {1}}_{p} = 1, \nonumber \\&w^{T}{\hat{\mu }} = R, \nonumber \\&(w \ge 0), \end{aligned}$$
(2)

where \({\mathsf {X}} = (X_1,\ldots ,X_n)\) is the sample of n observed return vectors, \(\widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}})\) is the sample estimator of \(\mathrm {Risk}(w^{T}X)\) and \({\hat{\mu }} = (1/n)\sum _{i=1}^{n} X_{i}\) is the vector of sample averages based on the n observations of the random vector X.
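
As a concrete illustration, the following is a minimal sketch of the SAA problem (2) when the risk measure is the variance (the classical mean-variance case discussed below). The data are simulated and cvxpy is used only for brevity; the experiments in Sect. 4 rely on Gurobi.

```python
# Minimal sketch of the SAA mean-variance instance of problem (2).
# Illustrative data only; not the setup used in the experiments.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p, R = 250, 13, 0.06                      # observations, assets, target return
X = rng.normal(0.06, 0.15, size=(n, p))      # n observed return vectors X_1, ..., X_n

mu_hat = X.mean(axis=0)                      # sample mean returns
Xc = X - mu_hat                              # centered observations

w = cp.Variable(p)
risk_hat = cp.sum_squares(Xc @ w) / (n - 1)  # sample variance of w'X_i, i.e. w' Sigma_n w
prob = cp.Problem(cp.Minimize(risk_hat),
                  [cp.sum(w) == 1,           # budget constraint
                   mu_hat @ w == R,          # target expected return
                   w >= 0])                  # optional no-short-selling constraint
prob.solve()
print(np.round(w.value, 4))
```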

It is well-known that the solution of problem (2) can be highly unreliable due to estimation errors. This happens when \(\mathrm {Risk}(w^{T}X) = w^{T}\Sigma w\) (the mean-variance problem) as shown in Best and Grauer (1991), Broadie (1993), Chopra and Ziemba (1993), Frankfurter et al. (1971), Frost and Savarino (1986), Frost and Savarino (1988) and Michaud (1989), and also when the risk measure is the CVaR (Lim et al. 2011). Recall that the CVaR at a confidence level \(\alpha\) is defined as

$$\begin{aligned} \mathrm {CVaR}_{\alpha }(w^{T}X) := \min _{\eta \in {\mathbb {R}}} \eta + \frac{1}{1-\alpha }{\mathbb {E}}\left[ \big (w^{T}X - \eta \big )^{+}\right] , \end{aligned}$$

where \((a)^{+}\) denotes the maximum between \(a \in {\mathbb {R}}\) and 0. To overcome this problem, Ban et al. (2016) propose performance-based regularization (PBR) to control the instability of the SAA solution in terms of its out-of-sample behavior. The main idea behind their method is to constrain the variances of the estimators \(\widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}})\) and \(w^{T}{\hat{\mu }}\) in order to move the SAA solution away from portfolios with high variability, which tend to be less diversified and, in many cases, yield poor out-of-sample performance. This regularization method imposes two additional constraints on problem (2), as follows:

$$\begin{aligned} \mathrm {SV}\left[ \widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}}) \right]&\le U_{1}, \end{aligned}$$
(3)
$$\begin{aligned} \mathrm {SV}\left[ w^{T}{\hat{\mu }}\right]&\le U_{2}, \end{aligned}$$
(4)

where \(\mathrm {SV}(\cdot )\) is the standard unbiased sample estimator of the variance \(\mathrm {Var}(\cdot )\), and \(U_1\) and \(U_2\) are real numbers obtained using cross-validation on past data; the values selected are those whose solution yields the highest Sharpe ratio. Most importantly, the cross-validation procedure automatically defines the values of \(U_1\) and \(U_2\), so they are not inputs that need to be specified by the investor. Note that constraint (4) does not depend on the choice of the risk measure and can easily be converted to a quadratic constraint by noting that

$$\begin{aligned} \mathrm {Var}\left[ w^{T}{\hat{\mu }} \right] = w^{T}\Sigma w, \end{aligned}$$

where \(\Sigma\) is the covariance matrix of X, which gives

$$\begin{aligned} \mathrm {SV}\left[ w^{T}{\hat{\mu }} \right] = w^{T}\Sigma _{n} w, \end{aligned}$$

where \(\Sigma _{n}\) is the sample estimator of \(\Sigma\).
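
As a quick sanity check of the identity above (on illustrative data), the quadratic form \(w^{T}\Sigma _{n}w\) coincides with the unbiased sample variance of the projected returns \(w^{T}X_{i}\):

```python
# Numerical check (illustrative data): w' Sigma_n w equals the unbiased
# sample variance of the projected observations w'X_i.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.1, size=(200, 13))     # n = 200 observations of p = 13 returns
w = rng.dirichlet(np.ones(13))               # a random long-only portfolio

Sigma_n = np.cov(X, rowvar=False)            # unbiased sample covariance estimator
print(w @ Sigma_n @ w)                       # quadratic form
print(np.var(X @ w, ddof=1))                 # sample variance of w'X_i: same value
```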

Expression (3) is more involved; when the risk measure is the variance or the CVaR, Ban et al. (2016) derived explicit expressions for constraint (3). Recall that one of the purposes of this paper is to derive closed-form expressions for constraint (3) when using the three risk measures mentioned before. Our first step is to consider a mean-risk formulation different from problem (2), as follows:

$$\begin{aligned} \max _{w \in {\mathbb {R}}^{p}} \quad&w^{T}{\hat{\mu }}, \nonumber \\ s.t. \quad&w^{T}{\mathbf {1}}_{p} = 1, \nonumber \\&\widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}}) \le k, \nonumber \\&\mathrm {SV}\left[ \widehat{\mathrm {Risk}}(w^{T}{\mathsf {X}}) \right] \le U_{1}, \nonumber \\&w^{T}\Sigma _{n}w \le U_{2}, \nonumber \\&(w \ge 0), \end{aligned}$$
(5)

where k is a risk-tolerance parameter selected by the investor. We believe that formulating the optimization problem in reference to a return-maximization framework is more intuitive from a practical viewpoint. Additionally, we think it is easier to select a priori an appropriate risk parameter than a desired return. (Returns, unlike risk tolerance levels, can be, at least in theory, unbounded). In the next section we derive explicit convex formulations of problem (5) for the three different risk measures.

3 Extension to different risk measures

We start by defining the three risk measures we work with in this paper; a short numerical sketch of their sample estimators is given right after the list. In all cases, let \(X \in {\mathbb {R}}^{p}\) be a random vector and \(w \in {\mathbb {R}}^{p}\) a vector that represents decisions. All proofs of propositions and lemmas are relegated to the appendix.

  • Integrated chance constraints (ICC): In Haneveld (1986) the integrated chance constraints (ICC) are defined as

    $$\begin{aligned} {\mathbb {E}}_{\omega }\Big [ \big (h(\omega ) - w^{T}X(\omega ) \big )^{+}\Big ], \end{aligned}$$

    where \(h(\omega ) \in {\mathbb {R}}\) is a random benchmark (e.g. an index) and \((x)^{+} = \max \{x,0\}\). The ICC imposes that the average shortfall with respect to the target be bounded by a given threshold, which here will be the risk-tolerance parameter k. Let \({\mathsf {Y}} = (Y_{1}, \ldots , Y_{n})\) be a random vector with \(Y_{i} = (X_{i,1}, \ldots , X_{i,p}, h_{i})\), where \(X_{i,j}\) is the return of asset j in scenario i and \(h_{i}\) is the value of the stochastic benchmark \(h(\omega )\) in scenario i. We write \(X_{i} = (X_{i,1},\ldots , X_{i,p})\) for the vector representing the i-th sample of asset returns, and \({\mathsf {X}} = (X_{1}, \ldots , X_{n})\) is the random vector of returns, as in formulation (2). The sample estimator of \(\mathrm {ICC}\) is

    $$\begin{aligned} \widehat{\mathrm {ICC}}(w;{\mathsf {Y}}) = \widehat{\mathrm {ICC}}(w,h;{\mathsf {X}}) = \frac{1}{n} \sum _{i=1}^{n} (h_{i} - w^{T}X_{i})^{+}. \end{aligned}$$
    (6)
  • Absolute semideviation (ASD): In Ogryczak and Ruszczyński (2002) the absolute semideviation (ASD) is defined as

    $$\begin{aligned} \mathrm {ASD}(w) = {\mathbb {E}}\Big [ \big ( w^{T}X - w^{T}\mu \big )^{+} \Big ], \end{aligned}$$

    where \(\mu = {\mathbb {E}}[X]\) is the vector of mean returns. ASD measures the average one-sided excess with respect to the mean. The sample estimator of ASD is

    $$\begin{aligned} \widehat{\mathrm {ASD}}(w; {\mathsf {X}}) = \frac{1}{n} \sum _{i=1}^{n} \big (w^{T}X_{i} - w^{T}{\hat{\mu }} \big )^{+}, \end{aligned}$$
    (7)

    where \({\hat{\mu }} = \frac{1}{n} \sum _{i=1}^{n} X_{i}\).

  • Quantile deviation (QDEV): Let \(f:{\mathbb {R}}^{p} \times \Omega \rightarrow {\mathbb {R}}\) be a function such that \({\mathbb {E}}\big [|f(w, X)|\big ] < \infty\) for every \(w \in {\mathbb {R}}^{p}\). Let \(\alpha \in (0,1)\). In Ogryczak and Ruszczyński (2002), quantile deviation (QDEV) is defined as

    $$\begin{aligned} \mathrm {QDEV}_{\alpha }(w) = {\mathbb {E}}\Big [(1-\alpha ) \big ( \kappa _{\alpha } - f(w,X) \big )^{+} + \alpha \big ( f(w,X) - \kappa _{\alpha } \big )^{+} \Big ], \end{aligned}$$

    where \(\kappa _{\alpha }\) is the \(\alpha\)-quantile of the distribution of \(f(w,X)\). Similarly to the CVaR, \(\mathrm {QDEV}_{\alpha }(w)\) is shown in Ruszczyński and Shapiro (2006) to be equivalent to the optimal value of the following minimization problem:

    $$\begin{aligned} \mathrm {QDEV}_{\alpha }(w) = \min _{\eta \in {\mathbb {R}}} {\mathbb {E}}\Big [\epsilon _{1} \big ( \eta - w^{T}X \big )^{+} + \epsilon _{2} \big ( w^{T}X - \eta \big )^{+} \Big ], \end{aligned}$$
    (8)

    with \(f(w,X) = w^{T}X\), \(\alpha = \frac{\epsilon _{2}}{\epsilon _{1} + \epsilon _{2}}\), and \(\epsilon _{1}, \epsilon _{2} > 0\). The sample estimator of QDEV based on expression (8) is

    $$\begin{aligned} \widehat{\mathrm {QDEV}}_{\alpha }(w; {\mathsf {X}}) = \min _{\eta \in {\mathbb {R}}} \frac{1}{n} \sum _{i=1}^{n} \epsilon _{1}\big (\eta - w^{T}X_{i} \big )^{+} + \epsilon _{2}\big (w^{T}X_{i} - \eta \big )^{+}. \end{aligned}$$
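
The following numpy sketch (not taken from the paper; data and parameter values are illustrative) evaluates the three sample estimators for a given portfolio w. For QDEV it exploits the fact that the objective of the inner minimization is piecewise linear in \(\eta\), so the minimum is attained at one of the sample points \(w^{T}X_{i}\); this observation is made precise in Lemma 2 below.

```python
# Sample estimators of ICC (6), ASD (7) and QDEV for a given portfolio w.
# Illustrative sketch with simulated data.
import numpy as np

def icc_hat(w, X, h):
    """Eq. (6): average shortfall of w'X_i below the benchmark h_i."""
    return np.mean(np.maximum(h - X @ w, 0.0))

def asd_hat(w, X):
    """Eq. (7): average one-sided excess of w'X_i over the sample mean w'mu_hat."""
    z = X @ w
    return np.mean(np.maximum(z - z.mean(), 0.0))

def qdev_hat(w, X, eps1, eps2):
    """Sample QDEV: the inner problem is piecewise linear in eta, so it suffices
    to evaluate the objective at the sample points w'X_i (cf. Lemma 2)."""
    z = X @ w
    return min(np.mean(eps1 * np.maximum(eta - z, 0.0)
                       + eps2 * np.maximum(z - eta, 0.0)) for eta in z)

rng = np.random.default_rng(0)
X = rng.normal(0.06, 0.15, size=(100, 13))   # 100 sampled yearly returns, 13 indices
w = np.full(13, 1 / 13)                      # equal-weight portfolio
print(icc_hat(w, X, h=0.055), asd_hat(w, X), qdev_hat(w, X, eps1=1.0, eps2=9.0))
```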

Before presenting the expressions for \(\mathrm {SV}\) of the risk measures defined above, we introduce a useful definition and a lemma.

Definition 1

Let \(n \in {\mathbb {N}}\), \(n \ge 2\). We define \(\Omega _{n}\) as the matrix

$$\begin{aligned} \Omega _{n} := \frac{1}{n-1}\big [I_{n} - n^{-1}1_{n}1_{n}^{T}\big ], \end{aligned}$$

where \(I_{n}\) is the \(n \times n\) identity matrix and \(1_{n} = (1, 1, \ldots , 1) \in {\mathbb {R}}^{1 \times n}\).

Lemma 1

Let \({\mathsf {z}} = (z_{1}, \ldots , z_{n})\) be a sample from a given distribution \(F(\cdot )\). Then the sample variance is

$$\begin{aligned} \mathrm {SV}[ {\mathsf {z}} ] = {\mathsf {z}}^{T}\Omega _{n}{\mathsf {z}}. \end{aligned}$$
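
A quick numerical check of Definition 1 and Lemma 1 on an arbitrary simulated sample:

```python
# z' Omega_n z coincides with the unbiased sample variance of z (Lemma 1).
import numpy as np

n = 50
rng = np.random.default_rng(1)
z = rng.normal(size=n)

Omega_n = (np.eye(n) - np.ones((n, n)) / n) / (n - 1)   # Definition 1
print(z @ Omega_n @ z)                                   # quadratic form
print(np.var(z, ddof=1))                                 # unbiased sample variance: same value
```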

3.1 Regularized ICC constraint

For the ICC we have the following result:

Proposition 1

Let \(Y_{1}, \ldots , Y_{n} \overset{\mathrm {i.i.d}}{\sim }F\), where the cumulative distribution F has finite second moment, and let \(\widehat{\mathrm {ICC}}(w;{\mathsf {Y}})\) be defined as in (6). Then

$$\begin{aligned} \mathrm {Var} \left[ \widehat{\mathrm {ICC}}(w;{\mathsf {Y}}) \right] = \frac{1}{n} \mathrm {Var}\big [(h - w^{T}X)^{+}\big ]. \end{aligned}$$

Corollary 1

Under the assumptions of Proposition 1 we have

$$\begin{aligned} \mathrm {SV}\left[ \widehat{\mathrm {ICC}}(w;{\mathsf {Y}})\right] = \frac{1}{n}z^{T}\Omega _{n}z, \end{aligned}$$

where \(z = (z_{1}, \ldots , z_{n})\) and \(z_{i} = (h_{i} - w^{T}X_{i})^{+}\).

Proof

Direct application of Lemma 1 for \({\mathsf {z}} = (z_{1}, \ldots , z_{n})\). \(\square\)

3.2 Regularized ASD constraint

Similarly, for the ASD we have the following proposition.

Proposition 2

Let \(X_{1},\ldots ,X_{n} \overset{\mathrm {i.i.d}}{\sim }F\), where the cumulative distribution F has finite second moment, and let \(\widehat{\mathrm {ASD}}(w; {\mathsf {X}})\) be defined as in (7). Then

$$\begin{aligned} \mathrm {SV}\Big [\widehat{\mathrm {ASD}}(w; {\mathsf {X}}) \Big ] = \frac{1}{n} z^{T}\Omega _{n}z, \end{aligned}$$

where \(z = (z_{1}, \ldots , z_{n}), z_{i} = (w^{T}X_{i} - w^{T} {\hat{\mu }})^{+}\) and \({\hat{\mu }} = (1/n)\sum _{i=1}^{n} X_{i}\) is the vector of sample averages of \({\mathsf {X}}\).

3.3 Regularized QDEV constraint

Finally, for QDEV we have

Proposition 3

Let \(X_{1}, \ldots , X_{n} \overset{\mathrm {i.i.d}}{\sim }F\), where the cumulative distribution F has finite second moments. Let \(\widehat{\mathrm {QDEV}}_{\alpha }(w; {\mathsf {X}})\) be as above and let \(\eta ^{*} \in {\mathbb {R}}\) be such that

$$\begin{aligned} \widehat{\mathrm {QDEV}}_{\alpha }(w; {\mathsf {X}}) = \frac{1}{n} \sum _{i=1}^{n} \epsilon _{1}\big (\eta ^{*} - w^{T}X_{i} \big )^{+} + \epsilon _{2}\big (w^{T}X_{i} - \eta ^{*}\big )^{+}. \end{aligned}$$

Then

$$\begin{aligned} \mathrm {Var}\Big [\widehat{\mathrm {QDEV}}_{\alpha }(w; {\mathsf {X}}) \Big ] = \frac{1}{n} \Bigg \{\mathrm {Var}\Bigg [ \epsilon _{1}\big (\eta ^* - w^{T}X_{i} \big )^{+} + \epsilon _{2} \big (w^{T}X_{i} - \eta ^* \big )^{+} \Bigg ]\Bigg \}. \end{aligned}$$

Note that

$$\begin{aligned} 1-\alpha = 1- \frac{\epsilon _{2}}{\epsilon _{1} + \epsilon _{2}} = \frac{\epsilon _{1}}{\epsilon _{1} + \epsilon _{2}}. \end{aligned}$$
(9)

For simplicity, write \(Z_i = w^{T}X_i\). From expression (9), and using \(\epsilon _{1}, \epsilon _{2} > 0\) we have

$$\begin{aligned} \widehat{\mathrm {QDEV}}_{\alpha }(w; {\mathsf {X}}) = (\epsilon _{1} + \epsilon _{2}) \min _{\eta \in {\mathbb {R}}} \frac{1}{n} \sum _{i=1}^{n} (1-\alpha )\big (\eta - Z_{i} \big )^{+} + \alpha \big (Z_{i} - \eta \big )^{+}. \end{aligned}$$
(10)

The next lemma is essential to prove Proposition 3.

Lemma 2

Let \(p = \lceil n\alpha \rceil - n\alpha\). Following the notation above, if \(p > 0\), then \(\eta ^{*} = Z_{(\lceil n\alpha \rceil -1)}\) is the unique minimizer of problem (10). Otherwise, if \(p = 0\), then \(\eta ^{*} = Z_{(\lceil n\alpha \rceil )}\) is one of the minimizers of problem (10).
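
As a small numerical illustration of Lemma 2 (with simulated data; the code below selects the order statistic with zero-based indexing, which we take as the indexing convention of the lemma), the selected sample point attains the minimum of the piecewise-linear objective in (10):

```python
# The minimizer of (10) can be taken to be an order statistic of Z_i = w'X_i (Lemma 2).
# Zero-based indexing of the order statistics is assumed here.
import numpy as np

rng = np.random.default_rng(2)
n, alpha = 41, 0.9
Z = rng.normal(size=n)                               # Z_i = w'X_i

def obj(eta):                                        # inner objective of (10)
    return np.mean((1 - alpha) * np.maximum(eta - Z, 0.0)
                   + alpha * np.maximum(Z - eta, 0.0))

eta_star = np.sort(Z)[int(np.ceil(n * alpha)) - 1]   # candidate order statistic
brute = min(obj(eta) for eta in Z)                   # exact minimum over all breakpoints
print(np.isclose(obj(eta_star), brute))              # True
```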

Corollary 2

Under the assumptions of Proposition 3 we have

$$\begin{aligned} \mathrm {SV}\left[ \widehat{\mathrm {QDEV}}_{\alpha }(w;{\mathsf {X}})\right] = \frac{1}{n}z^{T}\Omega _{n}z, \end{aligned}$$

where \(z = (z_{1}, \ldots , z_{n})\) and \(z_{i} = \epsilon _1(\eta (p)-w^TX_i)^+ + \epsilon _2(w^TX_i - \eta (p))^+\), with \(\eta (p)\) denoting the minimizer \(\eta ^{*}\) of problem (10) identified in Lemma 2.

Proof

Direct application of Lemma 1 for \({\mathsf {z}} = (z_{1}, \ldots , z_{n})\). \(\square\)

Note that in all three cases the resulting PBR constraints are quadratic, which makes the corresponding optimization problems amenable to off-the-shelf convex commercial solvers. In the next section we use the expressions presented in Propositions 1, 2 and 3 to test the performance of PBR in practice.
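
For illustration, the sketch below assembles problem (5) for the ICC case with gurobipy. It is not the authors’ implementation: the data, the benchmark h and the values of k, \(U_1\) and \(U_2\) are placeholders (in the experiments, \(U_1\) and \(U_2\) come from cross-validation), and the positive parts are modeled exactly through Gurobi general MAX constraints, which is only one possible modeling choice and turns the sketch into a mixed-integer quadratically constrained program.

```python
# Sketch of problem (5) with the ICC risk measure and the PBR constraint of Corollary 1.
# Illustrative data and bounds; not the code used for the experiments in Sect. 4.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

rng = np.random.default_rng(0)
n, p = 100, 13                              # scenarios and assets (13 indices, cf. Table 1)
X = rng.normal(0.06, 0.15, size=(n, p))     # sampled yearly returns (placeholder data)
h = np.full(n, 0.055)                       # benchmark h(omega) = 5.5%, as in Sect. 4.1
k, U1, U2 = 0.02, 1e-4, 0.02                # risk tolerance and PBR bounds (illustrative)

mu_hat = X.mean(axis=0)
Sigma_n = np.cov(X, rowvar=False)           # unbiased sample covariance

m = gp.Model("pbr_icc")
w = m.addVars(p, lb=0.0, name="w")              # long-only portfolio weights
d = m.addVars(n, lb=-GRB.INFINITY, name="d")    # d_i = h_i - w'X_i
z = m.addVars(n, lb=0.0, name="z")              # z_i = (h_i - w'X_i)^+
zbar = m.addVar(lb=0.0, name="zbar")            # mean of the z_i

m.addConstr(gp.quicksum(w[j] for j in range(p)) == 1.0)
for i in range(n):
    m.addConstr(d[i] == h[i] - gp.quicksum(X[i, j] * w[j] for j in range(p)))
    m.addGenConstrMax(z[i], [d[i]], constant=0.0)            # exact positive part
m.addConstr(n * zbar == gp.quicksum(z[i] for i in range(n)))

# Risk constraint: ICC-hat(w) <= k, cf. (6)
m.addConstr(gp.quicksum(z[i] for i in range(n)) <= k * n)
# PBR constraint (3): SV[ICC-hat] = (1/n) z' Omega_n z <= U1 (Corollary 1)
m.addConstr(gp.quicksum((z[i] - zbar) * (z[i] - zbar) for i in range(n))
            <= U1 * n * (n - 1))
# PBR constraint (4): w' Sigma_n w <= U2
m.addConstr(gp.quicksum(Sigma_n[i, j] * w[i] * w[j]
                        for i in range(p) for j in range(p)) <= U2)

m.setObjective(gp.quicksum(mu_hat[j] * w[j] for j in range(p)), GRB.MAXIMIZE)
m.optimize()
if m.Status == GRB.OPTIMAL:
    print(np.round([w[j].X for j in range(p)], 4))
```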

4 Numerical results

Due to the recent boom in passive investments, in addition to mounting evidence that passive (index-based) approaches tend to outperform active investment strategies, we have designed our experiment using indices instead of individual assets. An additional advantage of choosing an investment strategy based on indices is that it offers a high degree of diversification while keeping the size of the optimization problem more manageable. In our study, following Walden (2015), we have selected thirteen indices that offer a wide exposure to the most popular asset classes, namely, stocks, bonds, real estate and commodities. The indices are described in Table 1. The period considered goes from January 2000 until December 2012; thus, it includes the subprime crisis, a time period of significant market turmoil, which we judge essential to assess the virtues of any investment strategy.

Table 1 List of indices

4.1 Design of experiments

We use monthly returns from a 3-year period to cast the optimization problem directly, with no parametric estimation, and then test the performance on year 4. In order to perform extensive computations for year 4, we need a parametric assumption on returns. To this end, we assume that the vector of yearly returns r follows a multivariate normal distribution \(N(\mu ,\Sigma )\), and estimate the corresponding parameters using past data from the 3-year window. Other parametric models could have been selected, as long as samples can easily be drawn from them. For each year starting in 2003 and for each risk measure, we sample 100 yearly returns and evaluate the optimal solutions obtained by the SAA and PBR formulations, using the same sample in both cases. Such a parametric approach allows for a more exhaustive and robust assessment of each method.

Thus, we initially start with [2000, 2001, 2002] and test our results with actual returns in 2003. We end with the window [2009, 2010, 2011], testing on 2012, for a total of 10 comparison years in a rolling horizon fashion. In each case (each 3-year window) we solve the data-driven optimization problem (no parametric model is needed in this step) using the four metrics (the three presented in Sect. 3, plus the CVaR, based on the results derived in Ban et al. (2016)), for both the SAA and the PBR-based approach.
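
One plausible reading of this experimental loop is sketched below. The helper functions are hypothetical placeholders (the paper does not specify how the yearly parametric model is fitted to the data in each window), so the skeleton only illustrates the structure of the rolling scheme.

```python
# Hypothetical skeleton of the rolling-horizon experiment in Sect. 4.1.
# fit_yearly_normal and solve_portfolio are placeholders, not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def fit_yearly_normal(window):
    """Placeholder: fit N(mu, Sigma) for yearly returns from data in the window."""
    return np.full(13, 0.06), 0.15 ** 2 * np.eye(13)

def solve_portfolio(X, pbr=False):
    """Placeholder for solving problem (5), with or without the PBR constraints."""
    return np.full(13, 1 / 13)

for test_year in range(2003, 2013):              # [2000-2002] -> 2003, ..., [2009-2011] -> 2012
    window = (test_year - 3, test_year - 1)      # 3-year estimation window
    mu, Sigma = fit_yearly_normal(window)
    X = rng.multivariate_normal(mu, Sigma, size=100)   # 100 sampled yearly returns
    w_saa = solve_portfolio(X, pbr=False)
    w_pbr = solve_portfolio(X, pbr=True)
    # ... record diversification and realized returns of both portfolios for test_year
```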

For the CVaR and QDEV we use \(\alpha =0.9\), which means we want to control average tail losses (beyond the \(1-0.9=0.1\) quantile) given that a loss has occurred. For the ICC we use \(h(\omega ) = 5.5\%\), a benchmark for yearly returns. Finally, the values of k for the experiments are the smallest numbers such that problem (5) is feasible for each risk measure, which explains why they vary from experiment to experiment.

The code was written in Python 2.7.13 and the problems were solved using Gurobi (version 7.0.2) on a MacBook Pro with a 2 GHz Intel Core i5 and 8 GB of RAM. In Table 2 we report, for SAA and PBR, the time (in seconds) to simulate and solve the 100 problems and to compute the statistics of interest for the 10 years under study. The difference in computational times between the two methods is remarkable. This is to be expected, considering that the cross-validation procedure that selects \(U_1\) and \(U_2\) involves solving several auxiliary optimization problems. Since both methods are implemented as passive investment strategies, the computational times do not prevent the implementation of PBR in practice.

Table 2 Computational times for each experiment

4.2 Diversification

From a practical viewpoint, a fundamental aspect of the resulting portfolio is its degree of diversification. Following Woerheide and Persson (1992), we use a normalized version of the complement of Herfindahl’s diversification index (DI) to measure this property:

$$\begin{aligned} {\text {DI}} = \frac{1 - \sum _{i=1}^{n} w_{i}^{2}}{1 - 1/n}, \end{aligned}$$

where \(w_{i}\) is the weight of index i and \(n>1\) is the number of assets available for investment. According to this measure, a concentrated portfolio (a single position among the n assets) has a DI of zero, while a portfolio of equally weighted assets (the so-called 1/n portfolio) has a DI of one, which corresponds to maximum diversification.
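
In code, the index is a one-liner, and the two extreme cases mentioned above are easy to verify:

```python
# Diversification index DI as defined above.
import numpy as np

def diversification_index(w):
    n = len(w)
    return (1.0 - np.sum(np.asarray(w, dtype=float) ** 2)) / (1.0 - 1.0 / n)

print(diversification_index([1, 0, 0, 0]))   # 0.0: fully concentrated portfolio
print(diversification_index([0.25] * 4))     # 1.0: the equally weighted 1/n portfolio
```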

Figure 1 shows the average DI values, for the SAA and PBR portfolios, for each year between 2003 and 2012. The results indicate that the PBR portfolios are more diversified, and often the difference is significant. Moreover, in several cases the SAA portfolios have a \({\text {DI}} = 0\) for all samples, whereas PBR portfolios always have a positive average DI (in all years and for all four risk measures). Figure 1b (0.9-QDEV) reveals an extreme case—in 7 out of 10 years SAA portfolios have a \({\text {DI}}=0\). The reason is that in cases where regularization is not present, there was one asset with a high return, and being fully invested in this asset did not violate the risk constraint. PBR constraints are designed to avoid that: solutions that have high returns are often infeasible because their variability is too high.

It is also interesting to note that diversification is persistent over time: the average \({\text {DI}}\) of the PBR portfolios is not only higher than that of the SAA portfolios, but also more stable over the 10-year period under study. Table 3 shows the area below the SAA and PBR diversification trajectories displayed in Fig. 1. Total diversification (the 1/n portfolio) would have an area of 10. Thus, the entries in Table 3 can be thought of as the number of years in which the strategy corresponds to complete diversification. We observe that the PBR values are roughly between 2 and 3 times those of the SAA portfolios, indicating, again, a much higher degree of diversification.

Fig. 1

Average \({\text {DI}}\) values for SAA and PBR simulated portfolios, using different risk measures and values of k. The red line corresponds to SAA and the blue line to PBR

Table 3 Area below diversification trajectories

4.3 Risk and returns

We now turn our attention to the performance of both methods with respect to returns and variability. The comparison will take place in three time windows: pre-crisis (2003–2007), crisis (2008) and post-crisis period (2011–2012). Results in the recovery years (2009–2010) were similar between SAA and PBR and are available from the authors upon request.

4.3.1 Pre-crisis period

Table 4 shows summary statistics for realized returns during the pre-crisis years (2003–2007). All in all, SAA portfolios outperform the PBR portfolios, with differences as high as 8% in a given year. This is not surprising, since the stability provided by the PBR algorithm comes at the expense of ruling out high-variability solutions, which, in turn, are the ones that yield the highest out-of-sample returns. It is also noteworthy that the differences in performance between the SAA and PBR portfolios are fairly consistent across all risk measures, which validates the robustness of both methods.

Table 4 Average realized returns of optimal portfolios determined by SAA and PBR, using four different risk measures, during the pre-crisis years (2003–2007)

4.3.2 The crisis of 2008

During the subprime crisis, the PBR portfolios have smaller losses, not only in average terms, but also when the maximum and minimum returns are considered, as shown in Table 5. During the crisis, PBR portfolios outperform SAA portfolios in every aspect and for all four risk measures. It should be noted that realized returns are very poor with both approaches because the parameter estimation was based on data that could not anticipate the crisis. It is certainly beyond the scope of this study to identify or propose indicators that could predict crises, but it suffices to say that not using regularization techniques can magnify the losses in those scenarios.

Table 5 Average realized returns of optimal portfolios determined by SAA and PBR, and using four different risk measures in 2008. The numbers in parentheses correspond to the minimum and maximum realized returns, respectively

4.3.3 Post-crisis period

Let us now compare the two methods in 2011 and 2012. As shown in Tables 6 and 7, the difference in performance is extraordinary, with the SAA portfolios exhibiting significantly more dispersion. Moreover, the PBR portfolios offer more protection against extreme losses, without suffering a noticeable reduction in terms of either average or maximum returns.

Table 6 Average realized returns of optimal portfolios determined by SAA and PBR, and using four different risk measures in 2011
Table 7 Average realized returns of optimal portfolios determined by SAA and PBR, and using four different risk measures in 2012

4.4 Discussion

For an investor dealing with an asset allocation problem it is by no means clear, unless additional information is provided, which risk measure will best fit his/her interests. The results suggest that the ASD produces more diversified portfolios, albeit with lower but more stable returns. On the other hand, the ICC results in the least diversified portfolios (lowest \({\text {DI}}\) values), combined with the highest realized returns, which, not surprisingly, come with the highest variability. The CVaR portfolios are somewhere in between. Their returns are higher than those obtained by the ASD portfolios, but not as high as those of the ICC portfolios. Diversification for the CVaR is on average higher, but it is the only risk measure that generates completely concentrated portfolios (\({\text {DI}}=0\)) for some, but never all, of the sampled returns in some years.

Lastly, QDEV exhibits a much more complex behavior, and its effects depend greatly on the value of \(\alpha\). Our experiments suggest that as this parameter approaches 0 or 1, the portfolios tend to be more concentrated and riskier, but also more rewarding. Interestingly, when \(\alpha\) is closer to 0.5, the solutions behave similarly to those produced by the ASD.

We close this section with a comment regarding the effect of the right-hand side constants, \(U_1\) and \(U_2\), on the resulting portfolio. The constraint controlled by \(U_2\) [variance of returns, constraint (4)] is the one responsible for diversification, while \(U_1\) [variance of the risk estimator, constraint (3)] induces minor changes in the portfolio allocations. The former is binding more often, and low values of \(U_2\) are a common cause of infeasibilities. It is therefore possible to infer that controlling the variance of returns excludes unreliable solutions, while controlling the variance of the risk measure improves the out-of-sample performance of the portfolio.

An important distinction must be made between the variability constraints defined by \(U_1\) and \(U_2\), and the risk constraint defined by k. If infeasibilities are caused by a value of k that is too small, then no combination of the indices will yield a portfolio with an acceptable level of risk; the only course of action is simply to increase the value of k until a solution is found. Infeasibilities induced by the \(U_2\) constraint are completely different. First, since this parameter is set via a machine learning procedure, as described in Section 5.4 of Ban et al. (2016), we have no room to maneuver. Second, and more importantly, the lack of feasible solutions should be taken as a warning: it means that the observed returns being used exhibit high variability, which should serve as an alert to reframe the optimization problem using more data.

5 Conclusions

Since the publication of Markowitz’s seminal paper, the trade-off between risk and return within an optimization framework has attracted the attention of academics and investors alike. Nevertheless, the practical implementation of solutions obtained from such models has been marred by estimation errors and has often resulted in poorly diversified portfolios. Several tools have been developed to overcome these shortcomings, and in this work we studied a performance-based regularization (PBR) scheme, a novel regularization tool that incorporates machine learning to find the parameters that produce better out-of-sample performance. Building on Ban et al. (2016), we developed explicit convex expressions to test the PBR formulation in combination with three risk measures: integrated chance constraints, absolute semi-deviation and quantile deviation.

Our numerical results show that PBR is capable of delivering portfolios that are more diversified than those of SAA, and also more stable over time. Additionally, the experiments show that PBR can effectively protect investors from portfolios with poor out-of-sample performance. In particular, during times of crisis, PBR’s performance was superior in terms of maximum, average and minimum observed returns for the simulated portfolios. In more stable years, SAA outperforms PBR, since the elimination of solutions with high variability, precisely the ones that perform better in those years, can hurt returns when market conditions are favorable. Finally, the right-hand sides of the regularized constraints are defined via cross-validation, freeing the investor from the problematic task of having to specify those parameters.

Our findings show that pure SAA techniques are not suitable for practical portfolio selection problems. It is critical to impose some regularization on the problem, and our work shows that PBR is a viable and tractable choice, especially for mid- to long-term investments. Future work should focus on comparing PBR-based methods with robust or distributionally robust optimization, in combination with machine learning techniques. Another avenue of research is to explore regularization schemes in the context of stochastic dynamic (multistage) portfolio problems. It would be interesting to compare different multistage frameworks that have been studied lately, such as Expected Conditional Risk measures (Homem-de-Mello and Pagnoncelli 2016), nested risk measures (Kozmík and Morton 2015) and Expected Conditional Stochastic Dominance (Escudero et al. 2018), and to understand the effect of including PBR in each case.