1 Introduction

The classical approaches to optimal portfolio selection call for finding a feasible portfolio that optimizes a risk measure, or a gain measure, or a combination thereof by means of a utility function or of a performance measure. However, the optimization approach might be misleading due to the difficulty of obtaining good estimates for the parameters involved in the function to be optimized and to the high sensitivity of the optimal solutions to the input data. Therefore, several authors have proposed alternative approaches where they look for the best way of diversifying the risk of the portfolio.

A straightforward approach to diversify the risk of a portfolio seems to be that of using an Equally Weighted (EW) portfolio. Pflug et al. [39] prove that the EW portfolio is the asymptotical best choice when uncertainty increases, whereas DeMiguel et al. [25] show that it can hardly be beaten by an optimized portfolio in practice. However, if the market contains securities with very different intrinsic risks, then the EW approach leads to a portfolio that is not balanced in terms of risk allocation to each security [34].

Another naive strategy, frequently used in practice to achieve approximately equal risk contribution of all securities, is to take weights proportional to \(1/\sigma _{i}\), where \(\sigma _{i}\) is the volatility of security i. The term “Risk Parity” was first introduced by Qian [40]. He developed the idea of Risk Parity Portfolios, where an equal amount of risk is allocated to stocks and bonds. Risk Parity (RP) has been later formalized in a model which aims at making the total risk contributions of all securities equal among them [34]. This naturally leads to solving a system of nonlinear equations and inequalities [34, 50], but the RP model can also be equivalently formulated as a nonlinear convex optimization problem. The risk measure commonly used in the RP approach is volatility. However alternative risk measures can be considered (see, e.g., [9, 12] and the comprehensive recent monograph by Roncalli [46] on Risk Parity and Risk Budgeting).

After the global financial crisis of 2008, the interest for the defensive strategy of the RP approach has continuously grown over the years, both among academics and among practitioners [2, 3, 5, 6, 9, 12, 18, 19, 22,23,24, 32,33,34, 40, 41, 46, 49, 50], thus becoming a popular asset allocation strategy,Footnote 1. Furthermore, several alternative approaches to diversify risk have been recently proposed in the literature (see, e.g., [17, 21, 37]).

Due to its formulation, all securities are selected in a RP portfolio, since, by construction, each security gives a positive (equal) contribution to total risk and must hence have a positive weight. It has been recently observed [15] that the Risk Parity approach is much more stable than the optimization approaches w.r.t. input data perturbation.

In this paper, we propose a new approach that tries to reduce the impact of data estimation errors (as discussed in Sect. 6.3), and to join the benefits of the optimization and of the diversification approaches by choosing the portfolio that is best diversified (w.r.t. Risk Parity) on a subset of assets of the market, and that globally optimizes an appropriate risk, or utility, or performance measure among all portfolios of this type. The portfolio obtained with this approach is generally small w.r.t. the size of the market. This seems to lead to a better out-of-sample performance, a phenomenon already observed in [14] and described here in Sect. 6.2. Our approach is based on a novel framework that allows to combine the diversification and optimization approaches to the portfolio selection problem through the global solution of a hard nonlinear mixed integer or pseudo Boolean problem. For this problem we propose an efficient and accurate heuristic that extends the classical single-threaded greedy approach to a multiple threaded setting. We show that this approach yields portfolios that are only slightly suboptimal in-sample, and generally show improved out-of-sample performance with respect to their purely diversified or purely optimized counterparts. We present empirical results on real-world data showing the better performance of the diversified optimal portfolios obtained with our new method with respect to the ones obtained with standard optimization or diversification techniques.

In this paper we also provide several new theoretical results for the Risk Parity approach for general risk measures, most of which have been presented at various conferences and workshops.Footnote 2

The paper is organized as follows. In Sect. 2, we show our new theoretical results. Then, after recalling the main optimization models for portfolio selection, we propose, in Sect. 4, our novel framework to the portfolio selection problem. In Sect. 5, we present our new multiple threaded greedy heuristic approach. Finally, in Sect. 6 we provide empirical results on real-world data along with the performance obtained with our new approach.

2 Diversification with Risk Parity for general risk measures

We consider a setting where n securities are available with rates of return described by random variables \(S_1, \ldots , S_n\). The aim of an investor is to select a portfolio composed with such securities that achieves some specified goal. In this section we recall the classical uniform diversification method for portfolio selection and we propose a general framework for the Risk Parity diversification approach where we establish new existence and uniqueness results.

A portfolio is identified with a vector \(x \in \mathbb {R}^{n}_{+} = \{x \in \mathbb {R}^{n}: x_i \ge 0, i = 1, \ldots , n\}\), where \(x_i\) represents the weight of the ith security in the portfolio. Thus, the random rate of return of the portfolio is given by \(P(x)=\sum _{i=1}^{n} x_i S_i\). Note that P(x) belongs to the convex cone \(\mathcal {P} = \{\sum _{i=1}^{n} x_i S_i: x \in \mathbb {R}^{n}_{+}\}.\) To each random variable P in \(\mathcal {P}\) we associate a real value \(\psi (P)\) which may interpreted as a risk (or gain, or utility) measure associated with P such as Var(P) or E[P]. Thus we obtain the mapping \(\rho : \mathbb {R}^{n}_{+} \rightarrow \mathbb {R}\) given by \(\rho (x) = \psi (P(x))\) that can also be viewed as a risk measure for the portfolio represented by x. The risk measure \(\rho \) is called:

  • positive iff \(\quad \rho (x)>0\)\(\quad \forall x \in \mathbb {R}^{n}_{+} {{\setminus }} \{0\}\);

  • convex iff \(\quad \rho (\eta x + (1-\eta ) y)) \le \eta \rho (x) + (1-\eta ) \rho (y)\), \(\quad \forall x,y \in \mathbb {R}^n_+\), \(\forall \eta \in [0,1]\);

  • positively homogeneous of degree \(\tau \) iff \(\quad \rho (\lambda x) = \lambda ^{\tau } \rho (x)\), \(\quad \forall x \in \mathbb {R}^{n}_{+} \quad \forall \lambda \in \mathbb {R}_{+}\).

The oldest and most intuitive way to obtain a diversified portfolio, going back to the Talmudian wisdom [53], is to equally distribute the capital among all stocks available in the market (capital diversification approach). In terms of normalized weights we have \(x_{i}^{EW}=1/n\). This is known as the Equally Weighted (also called naïve or uniform) portfolio. Clearly the choice of the Equally Weighted portfolio does not use any past or prospective information, nor involves any optimization approach. However, some authors claim that its practical out-of-sample performance is hard to beat on real-world data sets [25]. Furthermore, from the theoretical viewpoint, Pflug et al. [39] show that when increasing the amount of portfolio model uncertainty, i.e., the degree of ambiguity on the distribution of the securities returns, the optimal portfolio converges to the EW portfolio.

A more thorough approach to risk diversification requires to take into account and to formalize the concept of risk contribution of each security to the portfolio, and then to manage it by a model (risk diversification approach). This was done in the original Risk Parity approach of Maillard et al. [34] in the case where \(\rho (x)\) is the volatility of the portfolio P(x), and was applied to other risk measures in [9, 12].

We now describe a general framework for Risk Parity that can be applied under very mild conditions on the risk measure \(\rho \). For this we first recall Euler’s Homegeneous Function Theorem.

Theorem 1

(Euler’s homogeneous function theorem) Let \(f: \mathbb {R}^n \rightarrow \mathbb {R}\) be a continuously differentiable positively homogeneous function of degree \(\tau \). Then

$$\begin{aligned} f(x)=\frac{1}{\tau } \sum \limits _{i=1}^{n} x_{i} \frac{\partial f(x)}{ \partial x_{i}}. \end{aligned}$$

For a positively homogeneous risk measure of degree \(\tau \), a reasonable, although possibly not unique, measure of the risk contribution of each security to the total risk of the portfolio is provided by

$$\begin{aligned} \displaystyle { TRC}_{i}^{\rho }(x)=\frac{1}{\tau }x_{i} \frac{\partial \rho (x)}{\partial x_i}, \end{aligned}$$

where \({ TRC}_i^{\rho }(x)\) is called the total risk contribution of security i. Indeed, Euler’s homogeneous function theorem guarantees that

$$\begin{aligned} \rho (x)= \sum \limits _{i=1}^{n}{} { TRC}_{i}^{\rho }(x). \end{aligned}$$

The Risk Parity (also called Equal Risk Contribution) portfolio is characterized by the requirement of having equal total risk contribution from each security:

$$\begin{aligned} { TRC}_{i}^{\rho }(x)=TRC_{j}^{\rho }(x)\quad \forall i,j . \end{aligned}$$
(1)

Thus, the RP portfolio can be found by solving the following system of equations and inequalities:

$$\begin{aligned} \left\{ \begin{array}{l} { TRC}_{i}^{\rho }(x)=\alpha \qquad i=1,\ldots ,n \\ x\in \varDelta \ \ =\{x\in \mathbb {R}^{n}_{+}:\sum \limits _{i=1}^{n}x_{i}=1 \} \\ \alpha \in \mathbb {R} \end{array} \right. \end{aligned}$$
(2)

We now show that system (2) has a unique solution \((x^{RP},\alpha ^{RP})\), when \(\rho \) is positive, convex and positively homogeneous. For this we need to establish a correspondence between the solutions of (2) and those of an auxiliary unconstrained optimization problem similar to the one considered in [34] for the case of the volatility risk measure. Let \(\mathbb {R}^{n}_{++} = \{x \in \mathbb {R}^{n}: x_i > 0, i=1, \ldots , n \}\) and, for \(\alpha > 0\), set

$$\begin{aligned} g_\alpha (x) = \rho (x) - \alpha \sum \limits _{i=1}^{n}\ln x_{i} . \end{aligned}$$

Proposition 1

For \(\alpha > 0\) and for any differentiable function \(\rho \), if \(x^* \in \arg \min _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x)\), then \({ TRC}_{i}^{\rho }(x^*)= TRC_{j}^{\rho }(x^*)\) for all \(i,j = 1, \ldots , n\). If, in addition, \(\rho \) is convex, then the converse implication also holds.

Proof

The first implication easily follows from the observation that the first order conditions for the unconstrained minimization of \(g_\alpha \) are equivalent to (1). Indeed, for every i we have

$$\begin{aligned} \frac{\partial g_\alpha (x)}{\partial x_i} = \frac{\partial \rho (x)}{\partial x_i} - \frac{\alpha }{x_i} = 0 \quad \Leftrightarrow \quad x_i \frac{\partial \rho (x)}{\partial x_i} = \alpha . \end{aligned}$$
(3)

The converse implication is a straightforward consequence of the above equivalence and of the global optimality of stationary points for convex minimization problems. \(\square \)

We now show that any scalar multiple of a global minimizer for \(g_\alpha \) is a global minimizer for \(g_\beta \) for an appropriate \(\beta \).

Proposition 2

If \(\rho \) is a positively homogeneous function of degree \(\tau \), then

$$\begin{aligned} x^{*} \in \arg \min _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x) \quad \Leftrightarrow \quad \lambda x^{*} \in \arg \min _{x \in \mathbb {R}^{n}_{++}} g_{\alpha \lambda ^{\tau }}(x) . \end{aligned}$$
(4)

Proof

We observe that \(g_{\alpha \lambda ^{\tau }}(\lambda x) = \lambda ^{\tau } g_{\alpha }(x) - \alpha \lambda ^{\tau } n \ln \lambda \). Thus,

$$\begin{aligned} \arg \min _{x \in \mathbb {R}^{n}_{++}} g_{\alpha \lambda ^{\tau }}(\lambda x)=\arg \min _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x) , \end{aligned}$$

which is equivalent to (4). \(\square \)

From Propositions 1 and 2 we immediately derive the following corollary, which implies that an RP portfolio can always be obtained by minimizing a function \(g_\alpha \) for any \(\alpha \) and then normalizing the solution found.

Corollary 1

If \(x^{*} \in \arg \min _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x)\), then \(\frac{1}{\sum \nolimits _{i=1}^{n} x_{i}^{*}} x^{*}\) is a solution to (2), i.e., it is an RP portfolio.

Theorem 2

For a continuously differentiable risk measure \(\rho :\mathbb {R}^{n}_{+} \rightarrow \mathbb {R}\) we have that

  1. (a)

    if \(\rho \) is positive and positively homogeneous of degree \(\tau >0\), then there exists a Risk Parity portfolio;

  2. (b)

    if \(\rho \) is convex, then there exist at most one Risk Parity portfolio.

Proof

In view of Propositions 1 and 2 and of Corollary 1, to prove that (2) has at least one solution it is sufficient to show that \(\arg \min _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x) \ne \emptyset \) for some \(\alpha > 0\). For this we first show that for any \(\alpha > 0\) there exists \(K > 0\) such that

$$\begin{aligned} \inf _{x \in \mathbb {R}^{n}_{++}} g_\alpha (x) = \inf _{x \in B_K} g_\alpha (x). \end{aligned}$$
(5)

where \(B_{K} = \{x \in \mathbb {R}^{n}_{++}: \Vert x \Vert \le K \}\). Indeed, this is a straightforward consequence of the coercivity of \(g_\alpha \), that we will now prove. Note that, by Weierstrass Theorem, the function \( \rho \) attains a minimum value M on the compact set \(\{x\in \mathbb {R}^{n}_{+}: \Vert x \Vert = 1 \}\), and \(M > 0\) by the positivity assumption. Furthermore, for every \(x \in \mathbb {R}^{n}_{++}\) we have

$$\begin{aligned} g_\alpha (x)&= \rho (x) - \alpha \sum \limits _{i=1}^{n}\ln x_{i} = \Vert x \Vert ^\tau \rho \left( \frac{x}{\Vert x \Vert }\right) - \alpha \sum \limits _{i=1}^{n} \ln \frac{x_{i}}{\Vert x \Vert } - \alpha n \ln \Vert x \Vert \nonumber \\&\ge \Vert x \Vert ^\tau \rho \left( \frac{x}{\Vert x \Vert }\right) - \alpha n \ln \Vert x \Vert \ge M \Vert x \Vert ^\tau - \alpha n \ln \Vert x \Vert . \end{aligned}$$
(6)

Thus

$$\begin{aligned} \lim _{\Vert x \Vert \rightarrow +\infty } g_\alpha (x) \ge \lim _{\Vert x \Vert \rightarrow +\infty } (M \Vert x \Vert ^\tau - \alpha n \ln \Vert x \Vert ) = +\infty . \end{aligned}$$

Hence, \(g_\alpha \) is coercive and thus (5) holds for some \(K > 0\). Observe now that for all \(x \in B_{K}\) and for all \(i = 1, \ldots , n\) we have

$$\begin{aligned} g_\alpha (x) \ge - \alpha \ln x_{i} - \alpha (n-1) \ln K. \end{aligned}$$

Hence,

$$\begin{aligned} \lim _{\begin{array}{c} x \in B_K \\ x_i \rightarrow 0^+ \end{array}} g_\alpha (x) = +\infty . \end{aligned}$$

Thus, there exists \(\varepsilon > 0\) such that

$$\begin{aligned} \inf _{x \in B_K} g_\alpha (x) = \inf _{x \in B_{\varepsilon , K}} g_\alpha (x), \end{aligned}$$

where \(B_{\varepsilon , K} = \{x \in B_K: x_i \ge \varepsilon , i = 1, \ldots , n \}\). Since \(B_{\varepsilon , K}\) is compact, the function \(g_\alpha (x)\) attains its minimum on \(B_K\) at a point of \(B_{\varepsilon , K}\). Such point then yields a Risk Parity portfolio through normalization by Corollary 1. To prove (b) it is sufficient to observe that when \(\rho \) is convex, the function \(g_\alpha \) is strictly convex due to the strict convexity of \(\sum \nolimits _{i=1}^{n}\ln x_{i}\). The conclusion then follows from the uniqueness of minimizers of strictly convex functions. \(\square \)

Remark 1

We point out that strict convexity of the risk measure \(\rho \) is not required for the uniqueness of a Risk Parity portfolio. This is useful since several risk measures (e.g. volatility) are convex but not strictly convex, unless additional assumptions are made. Furthermore, the uniqueness result of Theorem 2 could be extended with the same proof also to nonconvex risk measures provided that \(g_\alpha \) is strictly convex for some \(\alpha \).

Corollary 2

If \(\rho \) is continuously differentiable, positive, convex, and positively homogeneous, then (2) has exactly one solution.

The unique vector of weights \(x^{RP}\) which solves (2) is called the Risk Parity portfolio and the corresponding value \(\alpha ^{RP}\) is the equal total risk contribution from each security in the RP portfolio. Note that \(n \alpha ^{RP}\) coincides with the risk of the RP portfolio or, equivalently,

$$\begin{aligned} \alpha ^{RP} = \frac{\rho (x^{RP})}{n}. \end{aligned}$$

3 Portfolio optimization models

Portfolio optimization consists in selecting the portfolio weights, among all those that are feasible, with the aim of maximizing or minimizing one or more objective functions typically representing gain, risk, utility or performance measures. We briefly recall here two of the main optimization approaches to portfolio selection (see, e.g., [38, 44]): the biobjective Gain-Risk analysis, which yields an efficient frontier of Pareto optimal solutions, and the related Gain-Risk ratio that provides a single optimal portfolio.

3.1 Gain-risk efficient portfolios

Following the seminal works by Markowitz [35, 36], the general portfolio selection problem based on the Gain-Risk approach can be formulated as follows

$$\begin{aligned} \begin{array}{lll} \max &{} (1-\eta ) \gamma (x) - \eta \rho (x) &{} \\ s.t. &{} &{} \\ &{} x \in \varDelta &{} \end{array} \end{aligned}$$
(7)

where \(\gamma : \mathbb {R}^{n} \rightarrow \mathbb {R}\) is a measure of gain, generally represented by the expected linear return of the portfolio, and \(\rho : \mathbb {R}^{n} \rightarrow \mathbb {R}\) is a risk measure. This approach aims at determining the fractions \(x_i\) of a given capital to be invested in each security i belonging to an investment universe in order to maximize a generic utility function such as the convex combination of the portfolio risk and gain. The parameter \(\eta \in [0,1]\) is related to the risk aversion of the investor. When \(\eta \) varies from 0 to 1, the optimal solutions to (7) form the whole efficient frontier of the biobjective problem.

Often in practical applications some additional constraints are added to Model (7) for incorporating real-world features (see, e.g., [1, 16] and references therein).

3.2 Portfolios with maximum gain-risk ratio

Another approach based on a trade-off between gain and risk is the one that consists in maximizing the Gain-Risk ratio, i.e., in maximizing the amount of gain per unit of risk.

$$\begin{aligned} \begin{array}{lll} \max &{}\displaystyle \frac{\gamma (x)}{\rho (x)} &{} \\ s.t. &{} &{} \\ &{} x \in \varDelta &{} \end{array} \end{aligned}$$
(8)

The resulting optimal portfolio belongs to the efficient frontier and is often called the market or tangent portfolio. Several works are devoted to empirically compare the performance of such portfolios for different Gain-Risk ratios [42, 43], while other works are mainly focused on optimization issues related to the maximization of Gain-Risk ratios (see [26, 31, 51] and references therein).

4 Diversified optimal portfolios

We recall that any optimization approach to portfolio selection might be misleading due to the difficulty of obtaining good estimates for the parameters involved in the function to be optimized (particularly due to errors in the estimates of the means of the securities returns, see, e.g., [7, 8, 20]) and to the high sensitivity of the optimal solutions to the input data [30].

In this section we propose a new approach to portfolio selection that tries to combine the advantages of the diversification and of the optimization strategies, described in Sects. 2 and 3, respectively.

4.1 Optimization + Risk Parity diversification: a mixed-integer formulation

For a given universe of securities, we showed that, under mild assumptions, the Risk Parity portfolio is unique. However, it obviously includes all the securities of the investment universe and this might not be a desirable feature if the exclusion of a subset of securities could lead to better theoretical and practical results. For example, Cesarone and Tardella [17] have shown that one can find RP portfolios on subsets of securities of the investment universe that have a smaller risk than the global RP portfolio and also smaller total risk contribution from each security. Thus we consider the problem of finding a subset of securities for which the risk (or utility) of the associated RP portfolio is optimal. This can be formulated as an MINLP as follows.

$$\begin{aligned} \max&u(x)&\\ s.t.&\nonumber \end{aligned}$$
(9)
$$\begin{aligned}&\displaystyle x_{i} \frac{\partial \rho (x)}{\partial x_i} = \alpha y_{i}&i=1,\ldots ,n \end{aligned}$$
(10)
$$\begin{aligned}&x_{i} \le y_{i}&i=1,\ldots ,n \end{aligned}$$
(11)
$$\begin{aligned}&y \in \{0,1 \}^{n}&\nonumber \\&x \in \varDelta&\nonumber \\&\alpha \in \mathbb {R}&\end{aligned}$$
(12)

where for instance, the objective function (9) could be \(u(x)=(1-\eta ) \gamma (x) - \eta \rho (x)\), or \(u(x)=\frac{\gamma (x)}{\rho (x)}\). Note that the unique RP portfolio on a subset S of securities \(x^{RP}(S)\) is obtained imposing constraints (10), (11), and (12).

This model is a hard nonconvex mixed-integer optimization problem whose precise structure depends on the choice of the risk measure \(\rho (x)\) and of the utility function u(x), since the gain measure is generally represented by the expected linear return of the portfolio which is a linear function of the portfolio weights x.

4.2 Optimization + Risk Parity diversification: a pseudo-boolean formulation

We present here a unified alternative method for finding diversified optimal portfolios by solving general nonlinear pseudoBoolean optimization problems.

Let \(N =\left\{ 1, \ldots , n \right\} \) be the set of indices of all the securities in the investment universe, and “Div” any (capital or risk) diversification approach as described in Sect. 2. For each subset \(S \subseteq N\), we denote by \(x^{Div}(S)\) the (unique) diversified portfolio that uses only securities with indices in S. Thus, we have \(x^{Div}(S) \equiv x^{RP}(S)\) in the case of the Risk Parity portfolio, or \(x^{Div}(S) \equiv x^{EW}(S)\) for the Equally Weighted portfolio.

In the optimization approach to portfolio selection (Sect. 3) we use an objective function \(u: \mathbb {R}^{n} \rightarrow \mathbb {R}\) that associates to every feasible portfolio x a (risk, performance, etc.) value that we seek to optimize. The convex combination of risk and gain and the gain-risk ratio are relevant examples of objective functions u and we will use them in our empirical analysis below. However, the method that we propose here can be straightforwardly applied to any pair of diversification approach and objective function.

A diversified optimal portfolio, for a diversification approach “Div” and an objective function u (to be maximized), is a portfolio that solves the following pseudoBoolean optimization problem:

$$\begin{aligned} \displaystyle \max _{S \subseteq N} u(x^{Div}(S)). \end{aligned}$$
(13)

5 The Multi-Greedy heuristic

In alternative to solving the specialized mixed-integer models described in Sect. 4.1, all diversified optimal portfolios can be obtained as solutions of the general pseudoBoolean problem (13). However, no satisfactory general purpose method for this problem seems to be available in the literature. Hence, we propose here a new heuristic procedure, that we call Multi-Greedy heuristic, for solving a general pseudoBoolean problem. The computational results in the following Sect.s suggest that our Multi-Greedy heuristic is very accurate and efficient, at least for the class of problems that we consider here.

We first need to introduce some notation. For any \(S \subseteq N=\{1,2, \cdots ,n \}\), we let \(Ad(S) = \{ T \subseteq N | T = S \cup {h} \; \text{ for } \; h \notin S, \; \text{ or } \; T= S {{\setminus }} {h} \; \text{ for } \; h \in S \}\) denote the 1-neighborhood of S. We let \(V_0=\{ S^1_0, \ldots , S^d_0 \}\) denote a set of d good candidate initial solutions (possibly \(V_0=\emptyset \)). The basic idea of the Multi-Greedy heuristic is to keep improving a set of candidate solutions with local search until no more improvements are possible. An important issue in this heuristic is the use of a bounding function \(m: \mathbb {N} \rightarrow \mathbb {N}\) for the size of the sets of candidate solutions at each step to control the accuracy and the efficiency of the heuristic.

The main steps of the Multi-Greedy heuristic are the following:

Multi-Greedy heuristic

figure a

We observe that we may associate to any given set \(S \subseteq N\) its characteristic vector \(x(S) \in \mathbb {R}^{|N|}\), where \(x^{S}_{j}=1\) if and only if \(j \in S\).

A straightforward way to construct the set of candidate solutions \(V_{i+1}\) from the current set \(V_i\) is by trying to increase or decrease every element of \(R\in V_i\) by one in all possible ways. Therefore, the greatest computational burden for the Multi-Greedy heuristic consists in the construction of the set \(U_{i+1}\) in Step 2.

We point out that the elements of the current set \(V_i\) can be processed in parallel to obtain \(V_{i+1}\), so that, each element of \(V_i\) can be increased or decreased independently from each other using a thread for each element of \(V_{i}\). In this way, our Multi-Greedy heuristic extends the classical single-threaded greedy approach to a, possibly more effective, multiple-threaded setting. In fact, a single-threaded strategy starts from one candidate solution at a time and then iteratively moves to one neighbor solution. This approach is clearly less time consuming than a multiple-threaded approach. However, it will likely stop at a low quality local optimum, while a multiple-threaded strategy explores a larger number of candidate solutions and typically finds better solutions. A comparison of the efficiency and effectiveness of the two greedy approaches is shown in Tables 12 and 3.

The efficiency of our Multi-Greedy method can also be improved with the use of an appropriate function \(m(\cdot )\) that limits the cardinality of the set \(V_{i+1}\) at each step. In order to widen or limit the search, the number of candidate solutions \(m(\cdot )\) may be a function of the specific iteration i. At the end of the procedure, a post optimization phase could be further applied in order to escape from local optima. In fact, on the set \(V^*\) that yields \(S^*\) in Step 5, one can perform a larger neighborhood search by transforming each solution \(R\in V^*\) into a different one by substituting a subset of indices in R with indices not in R. In this way, we obtain a new set of current solutions V from which one could restart with the basic steps of the Multi-Greedy heuristic. Clearly, the efficiency of the neighborhood search depends on the number of changes performed in the current solution R. In practice, limiting the search to a small number of changes, we have observed considerable reduction in the execution time without losing optimality in all real-world instances that have been solved both with our heuristic and with the state-of-the-art global optimization solver BARON (version 15.9.22, see [52]).

6 Computational and empirical results

For the practical implementation and experimentation of our approach, we consider the portfolio expected return and volatility as gain and risk measures, respectively. More precisely, \(\gamma (x)=\mu ^{T} x\) and \(\rho (x) = \sigma (x) = \sqrt{x^{T} \varSigma x}\), where \(\mu \) is the vector whose components \(\mu _i\) are the expected returns of the n securities, and \(\varSigma \) is their covariance matrix, i.e., its generic element \(\sigma _{ij}\) is the covariance of the returns of security i and security j. Then, the Risk Parity conditions (2) are expressed by the nonlinear constraints \(\displaystyle \frac{x_i \left( \varSigma x \right) _i}{\sqrt{x^{T}\varSigma x}} = \alpha \) for all \(i =1, \ldots , n\).

We test our approach on the following three real-world datasets belonging to major stock markets across the world.

  1. 1.

    NASDAQ100 (NASDAQ 100, USA), containing 82 securities and 596 observations (November 2004–April 2016);

  2. 2.

    FTSE100 (FTSE 100, UK), containing 83 securities and 717 observations (July 2002–April 2016);

  3. 3.

    SP500 (S&P 500, USA), containing 442 securities and 595 observations (November 2004–April 2016).

These datasets consist of weekly linear returns computed on daily prices data, adjusted for dividends and stock splits, obtained from Thomson Reuters Datastream. We included stocks with at least ten years of observations. These weekly returns time series for securities and indexes are publicly available in [10], and have been used in other empirical studies on portfolio selection [11, 13].

Our analysis is twofold. In Sect. 6.1, we solve Problem (13) with the utility function \(u(x)=(1-\eta ) \mu ^{T} x - \eta x^{T} \varSigma x\) for several values of \(\eta \in [0,1]\). We thus obtain the mean–variance Risk Parity efficient frontier for each dataset and we compare it with the efficient frontier of the corresponding classical mean–variance Markowitz model [Problem (7)]. Our experiments show that the additional Risk Parity constraints, while guaranteeing equal risk contribution from all securities, do not cause a significant worsening of the quality of the selected portfolios w.r.t. the classical Markowitz approach. Furthermore, in Sect. 6.1 we evaluate the efficiency and the accuracy of our heuristic procedure by comparing its results with those obtained by the global optimization solver BARON. In Sect. 6.2 we compare the out-of-sample performance of the market portfolios (i.e., those that maximize \(u(x)=\frac{\mu ^{T} x}{\sqrt{x^{T} \varSigma x}}\)) with the diversified optimal portfolios [the solution of (13] with the same utility function) and with those obtained with the pure Risk Parity approach [Model (2)]. We adopt a Rolling Time Window (RTW) scheme, namely we allow for the possibility of rebalancing the portfolio composition during the holding period at fixed intervals. We chose to adopt a period of 80 weeks for the in-sample window and of 12 weeks for the out-of-sample window, with rebalancing allowed every 12 weeks. All the procedures have been implemented in MATLAB 11.0 and executed on a workstation with Intel Core i7-4810MQ processor with 2.28 GhZ clock rate and 16 GB RAM under Windows 10. The quadratic programming problems have been solved using the TOMLAB/CPLEX toolbox [27].

6.1 In-sample analysis

We compare here the classical Markowitz efficient frontier with the one obtained by adding the Risk Parity constraints. More precisely, we first compute the Markowitz portfolios \(x^{*}_{\eta }\) by solving Problem (7) for 21 values of \(\eta \) in the interval [0, 1]. Note that for \(\eta =0\) the resulting portfolio only contains the security with maximum expected return, while for \(\eta =1\) we obtain the global minimum variance portfolio. For the same 21 values of \(\eta \in [0,1]\), we then compute the diversified optimal portfolios \(x^{RP}_{\eta }(S^*)\) by solving Problem (13) with \(u(x)=(1-\eta ) \mu ^{T} x - \eta x^{T} \varSigma x\). Finally, in the mean-variance plane we report the optimal solutions \(x^{*}_{\eta }\) (red line) and \(x^{RP}_{\eta }(S^*)\) (blue dashed line). For all the datasets considered, Figs. 1a, 2a, b show that the efficient frontiers obtained combining the optimization and the diversification strategies are, in fact, very close to those generated with the pure optimization approach. Thus the diversified optimal portfolios, while only slightly worse in terms of optimality, provide a significant gain in terms of balance of the risk contributions of the selected stocks. We graphically illustrate the values \(\alpha _i = \displaystyle \frac{x_i \left( \varSigma x \right) _i}{\sqrt{x^{T}\varSigma x}}\) of the total risk contributions of all assets included in the Markowitz optimal portfolios in Fig. 1b, and the equal values \(\alpha \) of the total risk contributions of the assets included in the optimal Risk Parity portfolios in Fig. 1c.

Fig. 1
figure 1

NASDAQ100

Fig. 2
figure 2

Efficient frontiers

To conclude this section, we evaluate the efficiency and accuracy of our more effective Multi-Greedy heuristic procedure by comparing the values found for the diversified optimal portfolios with those obtained by BARON. Furthermore, we also compute the (exact) values of the minimum variance portfolios that trivially constitute lower bounds on the values of the diversified optimal portfolios. This allows us to obtain an estimate for the accuracy of the Multi-Greedy heuristic in the cases where no global optimal solution is available for comparison. The efficiency and accuracy of the Multi-Greedy heuristic emerges from Tables 12 and 3. Here we report the values of the solutions found for the diversified optimal portfolios by the Multi-Greedy heuristic [(applied to Problem (13) with \(u(x)=\sigma ^{2}(x) = x^{T} \varSigma x\)] and by BARON [applied to the MINLP described in Sect. 4.1 with \(u(x)= \sigma ^{2}(x)\)], and the corresponding exact values of the minimum variance portfolios obtained by solving simple convex quadratic optimization problems. For each investment universe we solve these problems for several subsets of securities with cardinalities ranging from 10 to the size of the universe. Since BARON was unable to certify optimality within several hours of computation for all but the smallest sizes, we set a time limit of 10,000 s so that the values reported for BARON when the limit is achieved should be regarded as heuristic solutions. We point out that in Tables 12 and 3 (column 3) the values of the solutions found by the Multi-Greedy heuristic are typeset in italic when we are able to certify optimality by means of a complete enumeration (only for \(n=10,20\)). In the last column, we also provide the following Relative Accuracy Bound (RAB) on the quality of the solutions found by the Multi-Greedy heuristic

$$\begin{aligned} RAB=\frac{\sigma ^{2}(x_{MinV-RP}) - \sigma ^{2}(x_{MinV})}{\sigma ^{2}(x_{MinV})} . \end{aligned}$$

Here \(x_{MinV}\) denotes minimum variance portfolio and \(x_{MinV-RP}\) denotes minimum variance portfolio with RP constraints. For the cases where an exact diversified optimal portfolio could be found, the Multi-Greedy heuristic always found the same solution. In the other cases the quality of the solutions found by the Multi-Greedy heuristic is certified by the small values of the RAB and through the comparison with BARON.

Table 1 Efficiency and accuracy for NASDAQ100
Table 2 Efficiency and accuracy for FTSE100
Table 3 Efficiency and accuracy for SP500

6.2 Out-of-sample performance analysis

For the out-of-sample analysis we choose the following four performance measures often adopted in the literature (see, e.g., [4, 45]). We denote by \(R^{out}\) the out-of-sample portfolio return, by \(R_I^{out}\) the return of the Market Index in the out-of-sample period, and by \(r_f\) a constant risk free rate of return that we set equal to 0.

  • Sharpe ratio [47, 48] is defined as the ratio between the average of \(R^{out}-r_f\) and its standard deviation, namely:

    $$\begin{aligned} \frac{E[R^{out}-r_f]}{\sigma (R^{out})}. \end{aligned}$$

    The larger its value, the better is the portfolio performance.

  • Jensen’s alpha [28], defined as the intercept from the regression of portfolios out-of-sample excess returns on the out-of-sample returns of the benchmark, namely:

    $$\begin{aligned} \alpha = (E[R^{out}] - r_f) - \beta (E[R_I^{out}] - r_f) , \end{aligned}$$

    where \(\beta =Cov(R^{out},R_I^{out})/\sigma ^{2}(R_I^{out})\) is the regression coefficient representing the systematic risk measure of the market.

  • Average return, defined as the average \(E[R^{out}]\) of the out-of-sample returns of a portfolio.

  • Omega ratio (introduced by Keating and Shadwick [29] and recently used, e.g., in [26]) can be written as

    $$\begin{aligned} \Omega _{\eta }(x)=\frac{E[\max (0, R^{out} - E[R_I^{out}])]}{E[\min (0, R^{out} - E[R_I^{out}])]} . \end{aligned}$$
Table 4 Out-of-sample performances

For each dataset, in Table 4 we report the out-of-sample performance results of the diversified optimal portfolios (i.e., the Risk Parity portfolios that maximize the Sharpe ratio), of the portfolios that maximize the Sharpe ratio only (i.e., the pure optimization strategy), and of the Risk Parity portfolios (i.e., the pure diversification strategy). For each performance measure, the best values are in bold. We also report the Mean Relative Difference (MRD, expressed in percentage) between the maximum Sharpe ratio achieved by the diversified optimal and by the RP portfolios and that of the corresponding pure optimization approach. The mean is evaluated w.r.t. all the in-sample windows considered in the Rolling Time Window scheme. Furthermore, in the last column of Table 4 we also provide the average number of securities selected by each strategy. The empirical results show a clear superiority of the combined diversification-optimization approach with respect to pure diversification or optimization. Furthermore, the number of securities included in a diversified optimal portfolio is generally smaller than those of the securities included in the portfolios provided by the other approaches, and this might be an interesting feature for small investors or to reduce the impact of the estimation errors on the parameters required by the model.

Fig. 3
figure 3

Average dispersion of perturbed optimal portfolios for a multivariate normal market

6.3 Stability analysis

In this section, we provide an empirical stability analysis to compare the impact of data estimation errors on our diversified optimal portfolios w.r.t. the purely optimized or purely diversified ones. To this aim, we use here the same approach described in [30], which was also used in [15] for the analysis of several portfolio selection models including Risk Parity. The idea behind the study of the stability of portfolio selection models is to perturb their input parameters aiming at statistically representing the same random market returns. In the space of the portfolio weights, the perturbed data generate a cloud of optimal portfolios around the “true” optimal solution. We then evaluate a dispersion measure for the cloud of generated optimal portfolios, consisting in the average distance of these “perturbed” optimal portfolios from a single portfolio regarded as the true optimal solution. Similar to [30], we assume a multivariate normal market where all assets have expected return equal to 0.1 and the identity matrix as covariance matrix. In this case, all the models considered have the true (Equally-Weighted) optimal portfolio \(x^{0}_{k}=1/n\). For fixed n / T (where n is the number of assets in the investment universe, and T is the number of scenarios), we then generate M statistically equivalent samples via the Monte Carlo method. Correspondingly, we find M perturbed optimal portfolios \(x^{j}\) with \(j = 1, \ldots , M\). In our experiments we observed that the results were relatively stable for values of M greater than 50. Hence, for computational reasons we decided to set \(M = 50\). We evaluate the effect of the estimation errors of the input data using a dispersion measure in the space of the decision variables. More precisely, we use the average of the Euclidean norms of the differences between the true optimal portfolio \(x^{0}\) and the perturbed optimal portfolios \(x^{j}\) with \(j = 1, \ldots , M\). In Fig. 3 we show the boxplots of the average dispersion of the perturbed optimal portfolios around the true (Equally-Weighted) optimal portfolio for \(n/T=0.05, 0.1, 0.2, 0.4\). Note that the value of n / T can be interpreted as the intensity of the perturbation of the input data. We observe that the stability of the diversified optimal portfolio model is always better than that of the pure optimization models. However, as expected and as already found in [15], the Risk Parity diversification model is always the best. Furthermore, we also observe that the stability of our diversified optimal portfolio model approaches that of the minimum variance model when the intensity of the perturbation increases. This is compatible with the proximity of the efficient frontiers of the Mean-Variance and that of the Mean-Variance-Risk Parity models.

7 Conclusions

We have proposed a new framework that tries to join the benefits of the optimization and of the diversification strategies for portfolio selection and we have provided a more general formulation and new theoretical results for the Risk Parity diversification strategy. Since our approach requires the global solution of hard integer or mixed-integer optimization problems, we have also devised a new Multi-Greedy heuristic procedure for general pseudoBoolean problems that shows very promising results in terms of accuracy and efficiency.

Preliminary empirical analysis clearly shows encouraging out-of-sample performances of optimal diversified portfolios with respect to the pure diversification and optimization counterparts in the case of the volatility risk measure objective and of Risk Parity diversification. Future research will be directed to applications of the optimal diversified portfolio selection framework to other diversification-objective pairs.