1 Introduction

Financial decisions are directly linked to wealth creation through capital accumulation, sustainable economic development, and an increase in welfare (Patrick 1966). This striving for improvement suggests a mentality of optimization in financial decision-making. Firms and governments alike face investment decisions that consist in selecting, from an array of candidate projects, those that most successfully fulfill the organization's strategic objectives and ensure future profitable growth. This project portfolio selection problem (PPSP) is concerned with identifying efficient portfolios of projects instead of evaluating the suitability of individual projects in isolation (Urli and Terrien 2010).

Traditional approaches to the PPSP aim at building a ranking of the projects and allocating the available budget according to this ranking. Among the most widely employed are the analytical hierarchy process (Suh et al. 1994) and the scoring method (Coldrick et al. 2005). However, these approaches suffer from two major shortcomings (Carazo et al. 2010): (i) they typically assume independence among projects, thus neglecting synergy and cannibalism effects as well as other interdependences; and (ii) they fail to provide optimal solutions when the decision-maker wishes to consider further constraints beyond budget restrictions.

When considering realistic instances, this problem usually becomes NP-hard, since the budget-allocating entity typically pursues several conflicting objectives while taking into account a considerable number of constraining factors (Fernandez et al. 2015). As noticed by Urli and Terrien (2010), objectives can be quantitative in nature (such as net present value or market share) or pertain to qualitative measures (such as personnel capabilities or environmental impact). While exact methods for NP-hard combinatorial optimization problems (COPs) tend to be computationally expensive, metaheuristics can provide near-optimal solutions to such problems in reasonable computing times (Soler-Dominguez et al. 2017).

In this paper, we analyze a stochastic version of the PPSP: the goal is to maximize the expected net present value of the investment, while considering random cash flows and discount rates in future periods as well as a rich set of constraints. These constraints include the maximum risk allowed and other conditions defined by the decision-maker. In order to solve this version of the problem, we propose a simheuristic algorithm. As described in Juan et al. (2015a), simheuristic approaches integrate metaheuristics with simulation techniques in order to deal with the random nature of stochastic COPs. In particular, we use an extension of the variable neighbourhood search (VNS) metaheuristic (Hansen and Mladenović 2001) that integrates Monte Carlo simulation (MCS) techniques. In short, while the metaheuristic generates promising portfolios for a deterministic version of the problem, simulation techniques are applied to: (i) estimate the expected net present value and risk of these project portfolios under time-variant uncertainty; (ii) complete a risk analysis on each project portfolio; and (iii) provide feedback to the metaheuristic in order to better guide the search process. The VNS framework was chosen since, as discussed in Hansen and Mladenović (2014), it offers an excellent trade-off between simplicity and performance.

Thus, the main contributions of this paper are: (i) to propose a mathematical formulation for a rich version of the stochastic PPSP where the goal is to maximize the net present value of the investment; (ii) to develop a simheuristic algorithm able to solve this stochastic version of the PPSP; and (iii) to analyze, using the aforementioned algorithm, how the selected portfolio of projects varies as the uncertainty level increases.

We solve both the deterministic and the stochastic PPSP, and compare the near-optimal solutions. The deterministic PPSP indicates that portfolios consisting of risky projects have a higher NPV than portfolios consisting of relatively safe projects. However, it is worth noting that such a relation is not necessarily linear due to the presence of cardinality and quantity constraints. Also, the instances employed in our experiments vary in terms of the pairwise correlation between the cash flows of any two projects. The ensuing interdependencies among the projects can be regarded as a constraint on the volume of projects that can be included in a portfolio. Turning to the stochastic PPSP, we find that a portfolio of projects in a stochastic environment always yields a lower (expected) NPV than a portfolio in a deterministic environment. Furthermore, our findings indicate that a near-optimal solution to the deterministic PPSP is generally sub-optimal under uncertainty. Lastly, a near-optimal solution to the stochastic PPSP leads to a higher (expected) NPV than a near-optimal solution to the deterministic PPSP evaluated under uncertainty.

The remainder of the paper is structured as follows. Section 2 presents a literature review. Section 3 contains the description of the problem as well as a mathematical formulation. We propose our solving methodology in Sect. 4. Following this, the computational experiments are presented in Sect. 5 and their results analyzed in Sect. 6. Lastly, we present our conclusions and future research lines in Sect. 7.

2 Literature review on the PPSP

Early work on the PPSP (Ghasemzadeh and Archer 2000) considers a single weighted objective function and constraints concerning budget and man-hours. However, their test instances were very limited because they aimed at comparing manually computed portfolios with those constructed employing their decision support system. To solve a PPSP with further constraints, a two-stage procedure is proposed by Doerner et al. (2004, 2006). During the first phase, the Pareto frontier of efficient project portfolios is constructed through optimization. In the second phase, it is interactively explored by the decision-makers to account for personal preferences. They further take into account floor and ceiling constraints for the inclusion of projects from any given subset, as well as resource limitations and minimum benefit requirements for individual projects. Since possible synergies between projects should be evaluated in order to accurately estimate the benefits of a project portfolio, the authors make an attempt at incorporating these considerations into their methodology. The Pareto ant colony optimization approach is further enhanced by Stummer and Sun (2005), who suggest that their improved model performs better with many objective functions and a large set of efficient solutions, and is thus specifically suitable for real-life problems.

These interdependences show that portfolio optimization is not a trivial task, as the number of possible portfolios increases exponentially with the number of candidate projects. Thus, Urli and Terrien (2010) consider project interdependences modeled by an interaction matrix, as proposed by Schmidt (1993), in addition to restrictions on monetary and human resources. Rabbani et al. (2010) further consider that some projects may be mandatory or mutually exclusive. Furthermore, project interaction leads to the consideration of the timing of project implementation.

While previous research considers static optimization approaches, more recent research has also drawn on findings from other areas, such as scheduling: Gutjahr et al. (2008) and Gutjahr et al. (2010) also take into account employee competencies and the evolution of their knowledge scores over time through learning or depreciation. Carazo et al. (2010) further investigate this research line and include scheduling as a complementary concept, implemented both simultaneously with and following the project selection. As in previous work, they also consider certain interrelations between different projects and allow for the transfer of unused monetary resources to the next period. Urli and Terrien (2010) included continuous project portfolio adjustment over the respective time horizon and solved small and medium instances in satisfactory computation time. However, the determination of all non-dominated project portfolios remains difficult when considering large but practically relevant instances (100 projects or more). While this might not be relevant in most firm investment decisions, it is a significant drawback for governments or bodies awarding funding for projects, and even for financial institutions (Cruz et al. 2014).

A further important factor when considering the particular case of project portfolio selection processes of financial institutions is project divisibility (Urli and Terrien 2010). When the decision variables are no longer binary, the complexity increases further. While business projects are at least partially indivisible, research projects funded by governments can often be executed with partial funding, so a further question is how much of the sought-after funding is awarded, which introduces additional constraints on the budget allocation. Hence, more recent research increasingly focuses on large-scale instances and partial allocation. Cruz et al. (2014) solve a stationary project portfolio optimization problem in which partial support of the requested budget is allowed. Unlike previous research, they assume that the preferences of the decision-maker are to some extent known. Outranking is employed in an a priori preference system in order to model that decision-makers will have preferences towards different portfolios on the efficient frontier based on their personal goals concerning the achievement of objectives. Incorporating these preferences allows identifying those portfolios that lie on the efficient frontier and, simultaneously, are not outranked by another portfolio. They incorporate budgetary constraints by defining upper and lower bounds for the inclusion of projects from a particular group. Fernandez et al. (2015) further enhance this approach by including synergies in their optimization.

Due to the uncertainty present in different facets of project appraisal, simulation has been incorporated into the metaheuristic framework. Gabriel et al. (2006) employ MCS in their methodology to simulate possible cost scenarios for the respective optimization constraints. They analyzed a government agency facing a project portfolio decision and showed that their approach significantly improved the decision-making process and led to more robust results due to the incorporation of uncertainty. Medaglia et al. (2007) combine a multi-objective evolutionary algorithm with MCS in order to solve a project portfolio problem that allows for partial funding of projects, project interdependences, constrained resources, and uncertainty in the objective function regarding the preferences of the decision-maker. Huang (2007) treats the project parameters as uncertain and combines a genetic algorithm with random fuzzy simulation in order to account for this. An interesting application combining discrete-event simulation with a genetic algorithm to select security control portfolios is discussed in Kiesling et al. (2016). In the context of an IT infrastructure subject to a number of threats, the authors focus on selecting the best policy from efficient combinations of security controls. Another way to address uncertainty is by considering robust solutions that perform reasonably well across the full range of feasible parameter values. Thus, Liesiö et al. (2007) propose a multi-objective robust portfolio modeling methodology. Their approach relies on preference programming methods, and they illustrate its effectiveness in a case study involving real data from a road pavement project in Finland. In Liesiö et al. (2008), the same authors extend their previous work by also considering project interdependencies, incomplete cost information, and variable budget levels. Using a personal computer, they are able to solve, in reasonable computing times, instances of up to 60 projects, 5 optimization criteria, and 10 constraints. For large-size instances (e.g., with 200 projects or more), the authors suggest the use of heuristic-based approaches.

As can be seen from the reviewed examples, uncertainty can be considered for the project parameters, the modeled constraints, and the objective function. In this paper, we develop a simheuristic algorithm to address a rich and stochastic version of the PPSP with the goal of maximizing the net present value of the investment. Our approach thus addresses several gaps in the literature and combines different research challenges. Firstly, while most authors consider a particular subset of projects, such as R&D projects (Doerner et al. 2004) or government projects (Cruz et al. 2014), we formulate a general approach that can easily be applied to a range of project types, such as R&D projects, investments, or financial projects. We further include constraints on the number of projects included in the portfolio, as well as on the divisibility of the project funds requested. Likewise, we give the decision-maker the possibility to pre-select projects based on personal or strategic preferences (independently of the projects' risk-return characteristics). In addition, we create instances that can be employed for comparison in future analyses, as the previously studied examples are either very small in nature or not openly accessible.

3 Formal description of the stochastic and rich PPSP

We consider a stochastic and rich variant of the PPSP, in which a large set of candidate projects compete for a limited global budget. The budget allocation is limited in the following ways: (i) if funded, each project i has to receive a minimum amount, \(\epsilon _i \in (0,1]\), expressed as a percentage of the available budget (this is an amount below which the project could not yield successful results, so it would not make sense to include it in the portfolio); (ii) similarly, a project is funded at most with the amount originally requested, \(\delta _i \in [\epsilon _i,1]\), also expressed as a percentage of the available budget; (iii) the problem is further constrained by the decision-makers' option to include certain projects irrespective of their characteristics, for strategic or political reasons (this will be modeled by the binary variable \(q_i\), which takes the value 1 whenever project i has to be necessarily included); (iv) there is a minimum (\(k_{min}\)) and a maximum (\(k_{max}\)) number of projects that can be funded (these threshold values are decided by the evaluation committee in advance to ensure a certain diversification level); and (v) the risk of the portfolio of selected projects cannot exceed a given threshold, \(r_{max}\), where the risk is calculated employing a traditional variance-covariance matrix, \(\sigma _{ij} = \sigma _{ji}\), thus accounting for interdependences between projects i and j.

Under these conditions, the projects are evaluated based on their net present value (NPV), which is defined as the difference between the present value of future cash inflows and the present value of future cash outflows. Since predicting future cash flows might be subject to some degree of uncertainty (which will be higher as we move further into the future), these cash flows will be modeled as random variables. When computing the NPV associated with a given cash flow, an interest rate is used to transform future cash values into present ones. Since this interest rate might also be subject to some uncertainty, we will model it as a random variable too (Fig. 1). Notice that the use of random variables for modeling future cash flows as well as the interest rate transforms our COP into a stochastic one. In summary, under the aforementioned constraints, our main goal will be to maximize the expected NPV of the project portfolio.

Fig. 1 Illustrative scheme of the stochastic PPSP

More formally, consider a set of n projects, \(P = \{1, 2, \ldots , n\}\), and a set of m future times, \(T = \{1, 2,\ldots , m\}\). Let us assume that the actual cash flow generated by each project is directly proportional to the quantity invested in it. Thus, for each \(i \in P\) and for each \(t \in T\), the actual cash flow of project i at time t will be given by \(C_{it} \cdot x_i\), where \(C_{it}\) is a random variable representing the potential cash flow when project i receives the total requested funding and \(x_i\) represents the actual investment in project i (measured as a percentage of the total available budget). The periodical benefit of project i at period t, \(B_{it}(x_i)\), is defined as the actual cash flow of project i at time t adjusted by a random discount rate \(R_{it}\), i.e.:

$$\begin{aligned} B_{it}(x_i) = \frac{C_{it} \cdot x_i}{(1+R_{it})^t} \quad \forall i \in P, \; \forall t \in T \end{aligned}$$
(1)

The net present value associated with each project i, \(N_{i}(x_i)\), is computed by adding all the periodical benefits provided by the project over time, i.e.:

$$\begin{aligned} N_{i}(x_i) = \sum _{t=1}^{m} B_{it}(x_i) \quad \forall i \in P \end{aligned}$$
(2)

Similarly, the net present value of an investment plan \(x = (x_1, x_2, \ldots , x_n)\), N(x), is obtained as the aggregation of individual net present values, i.e.:

$$\begin{aligned} N(x) = \sum _{i=1}^{n} N_{i}(x_i) \end{aligned}$$
(3)

The goal is to find a project investment plan, x, that maximizes the expected net present value, i.e.:

$$\begin{aligned} \text {Max} \quad E[N(x)] = \sum _{i=1}^{n} \sum _{t=1}^{m} E[B_{it}(x_i)] \end{aligned}$$
(4)

Also, according to our previous discussion, the following constraints apply:

$$\begin{aligned}&\mathop {\sum }_{i=1}^{n} \mathop {\sum }_{j=1}^{n} \sigma _{ij}x_i x_j \le r_{max} \end{aligned}$$
(5)
$$\begin{aligned}&\sum _{i=1}^{n}x_i=1 \end{aligned}$$
(6)
$$\begin{aligned}&0 \le x_i \le 1 \quad \forall i \in P \end{aligned}$$
(7)
$$\begin{aligned}&0 \le \epsilon _i \le \delta _i \le 1 \quad \forall i \in P \end{aligned}$$
(8)
$$\begin{aligned}&z_i \in \{0,1\} \quad \forall i \in P \end{aligned}$$
(9)
$$\begin{aligned}&\epsilon _i z_i \le x_i \le \delta _i z_i \quad \forall i \in P \end{aligned}$$
(10)
$$\begin{aligned}&z_i \le w \cdot x_i \quad \forall i \in P \end{aligned}$$
(11)
$$\begin{aligned}&q_i \in \{0,1\} \quad \forall i \in P \end{aligned}$$
(12)
$$\begin{aligned}&q_i \le z_i \quad \forall i \in P \end{aligned}$$
(13)
$$\begin{aligned}&k_{min} \le \sum _{i=1}^{n}z_i \le k_{max} \end{aligned}$$
(14)

Constraint (5) quantifies and limits the risk exposure of the decision-maker. Equations (6) and (7) restrict the total investment to the available resources. Equation (8) guarantees that the lower and upper bounds for each project lie within the valid range. The auxiliary variables \(z_i\) are introduced in Eq. (9): \(z_i\) takes the value 1 if project i is actually included in the portfolio, and 0 otherwise. These binary variables are used in Eq. (10) to guarantee that the investment in each project stays within its bounds. In the auxiliary constraint (11), w is a sufficiently large positive value such that \({wx}_i\ge 1\) whenever \(x_i>0\); hence, \(z_i\) is forced to 0 whenever project i receives no funding. Equations (12) and (13) define and impose the pre-assignment constraints: if project i is pre-selected (i.e., \(q_i=1\)), it must be included in the solution (i.e., \(z_i=1\)) irrespective of its risk-return characteristics. Finally, the cardinality constraint in Eq. (14) guarantees that the number of selected projects fits within the allowed bounds.
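
For illustration, the Java sketch below checks whether a candidate investment plan satisfies constraints (5)-(14); it is a minimal stand-alone example rather than part of our implementation, and all identifiers (isFeasible, eps, delta, rMax, etc.) are illustrative names for \(\epsilon _i\), \(\delta _i\), \(r_{max}\), and the remaining model elements.

```java
/**
 * Minimal stand-alone sketch of a feasibility check for constraints
 * (5)-(14); not part of our implementation. Here eps, delta, q and sigma
 * stand for epsilon_i, delta_i, q_i and the variance-covariance matrix.
 */
public final class FeasibilityCheck {

    static boolean isFeasible(double[] x, double[] eps, double[] delta,
                              boolean[] q, double[][] sigma,
                              double rMax, int kMin, int kMax) {
        int n = x.length;
        double budget = 0.0, risk = 0.0;
        int selected = 0;
        for (int i = 0; i < n; i++) {
            if (x[i] < 0.0 || x[i] > 1.0) return false;   // Eq. (7)
            boolean zi = x[i] > 0.0;                      // z_i via Eqs. (9), (11)
            if (zi) selected++;
            if (q[i] && !zi) return false;                // pre-assignment, Eqs. (12)-(13)
            if (zi && (x[i] < eps[i] || x[i] > delta[i]))
                return false;                             // funding bounds, Eq. (10)
            budget += x[i];
            for (int j = 0; j < n; j++)
                risk += sigma[i][j] * x[i] * x[j];        // portfolio risk, Eq. (5)
        }
        if (Math.abs(budget - 1.0) > 1e-9) return false;  // full budget use, Eq. (6)
        if (risk > rMax) return false;                    // risk threshold, Eq. (5)
        return kMin <= selected && selected <= kMax;      // cardinality, Eq. (14)
    }
}
```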

4 Our simheuristic approach

In order to solve the stochastic and rich PPSP described in the previous section, a simheuristic approach is proposed. It combines simulation techniques with an adaptive variable neighborhood search (VNS) metaheuristic, which also integrates an acceptance criterion based on simulated annealing (SA). The metaheuristic component itself relies on a constructive heuristic that employs biased randomization (BR) techniques. The main ideas behind each of these components are briefly explained next. After that, we describe how these components are integrated into our approach.

4.1 Overview of the main components

As described in Juan et al. (2015a), simheuristics extend metaheuristics by introducing simulation techniques that assess the performance of promising solutions in a stochastic environment: a selected subset of the promising solutions generated by the metaheuristic component is simulated in order to estimate their performance under uncertainty. Usually, this performance is measured in terms of the expected value of some key indicator, but other statistics can be analyzed as well.

The main advantage of the VNS metaheuristic (Mladenović and Hansen 1997) lies in its systematic use of different neighborhood structures during the search, which makes the exploration of the solution space more flexible. This, in turn, potentially leads to better solutions compared to single-neighborhood local search algorithms. Many extensions of VNS have been proposed, most of them oriented to the solving of large-scale instances (Melián 2006; Moreno-Vega and Melián 2008; Höller et al. 2008; Hansen et al. 2008).

The SA search procedure is inspired by the process of physical annealing of solids, in which a crystalline solid is heated and then allowed to cool slowly until it achieves its most regular possible crystal lattice configuration (Nikolaev and Jacobson 2010). In a classical SA, the search starts with a high temperature, which implies a relatively high probability of moving to a worse solution; as the search continues, the temperature decreases and so does this transition probability (Azizi and Zolfaghari 2004). To avoid being trapped in a local minimum, our algorithm makes use of an adaptive cooling schedule, which includes the possibility of reheating.
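
A minimal sketch of such an acceptance criterion is shown below; the geometric cooling schedule, the reheating rule, and all parameter values are merely illustrative choices, not the exact components of our implementation.

```java
import java.util.Random;

/**
 * Illustrative SA-style acceptance criterion (maximization) with geometric
 * cooling and reheating. The schedule and parameter values are assumptions
 * made for illustration only.
 */
public final class Annealing {
    private double temperature;
    private final double coolingRate;  // e.g., 0.95: geometric cooling factor
    private final double reheatTemp;   // temperature restored when reheating
    private final Random rng = new Random();

    Annealing(double t0, double coolingRate, double reheatTemp) {
        this.temperature = t0;
        this.coolingRate = coolingRate;
        this.reheatTemp = reheatTemp;
    }

    /** Accept newNpv over baseNpv with the classical Metropolis rule. */
    boolean accept(double baseNpv, double newNpv) {
        if (newNpv >= baseNpv) return true;   // improvements are always accepted
        if (temperature <= 0.0) return false; // cold system: hill climbing only
        double p = Math.exp((newNpv - baseNpv) / temperature);
        return rng.nextDouble() < p;          // occasionally accept worse moves
    }

    void cool()   { temperature *= coolingRate; } // lower acceptance over time
    void reheat() { temperature = reheatTemp; }   // escape deep local optima
}
```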

Biased randomization (Juan et al. 2013; Grasas et al. 2017) is a mechanism to randomize deterministic heuristics. Heuristics include at least one step in which a choice has to be made, for instance, selecting an element from a sorted list. With BR, instead of always selecting the ‘most promising’ option, a probability is assigned to each candidate option in such a way that the underlying logic is maintained (i.e., more promising options receive higher probabilities). Hence, different solutions may be built at each execution, some of which are expected to outperform the one provided by the deterministic version.
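
A minimal sketch of a geometrically biased selection, in the spirit of the BR technique employed in Sect. 4.2, is given below; the parameter beta and all identifiers are illustrative.

```java
import java.util.List;
import java.util.Random;

/**
 * Minimal sketch of biased randomization over a candidate list sorted from
 * most to least promising, using a geometric distribution. The parameter
 * beta and all names are illustrative.
 */
public final class BiasedRandom {

    /** Geometric position: index 0 (the greedy choice) stays the most likely. */
    static int pickIndex(int size, double beta, Random rng) {
        int k = (int) (Math.log(rng.nextDouble()) / Math.log(1.0 - beta));
        return k % size;  // fold the geometric tail back onto the list
    }

    static <T> T pick(List<T> sortedCandidates, double beta, Random rng) {
        return sortedCandidates.get(pickIndex(sortedCandidates.size(), beta, rng));
    }
}
```

With beta close to 1 the behavior approaches that of the deterministic greedy heuristic, while smaller values yield more diversification.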

Algorithm 1

4.2 Algorithm description

Our approach is depicted in Algorithm 1 and is composed of three stages. In the first stage, a feasible initial solution is constructed. Then, during the second stage, an adaptive VNS metaheuristic improves the initial feasible solution by iteratively exploring the search space and conducting a small number of simulation runs, in which both cash flows and discount rates are randomly generated to estimate the expected net present value. From this stage, a reduced set of promising solutions is obtained. In the third stage, an extended simulation experiment yields a more accurate estimate of the expected NPV, as well as other statistics whenever required. The initial-solution stage and the VNS stage are explained next in more detail.

Given the stochastic instance, we consider its deterministic counterpart, obtained by replacing the random variables with their expected values. In order to generate an initial solution, initSol, we randomly choose a valid size s for the project portfolio (i.e., \(k_{min} \le s \le k_{max}\) and \(s \ge \sum q_i\)). First, the pre-selected projects are included (Algorithm 2). Then, we randomly select projects until a portfolio of size s is generated. In order to set the weight of each project in this portfolio, we apply LocalSolver, a powerful optimization software (http://www.localsolver.com). This software was selected due to its ability to consider quadratic expressions as constraints, as is the case of Eq. (5). The entire process is repeated until the randomly generated solution satisfies all the constraints (i.e., until a feasible solution is obtained). Notice that, being a project-investment plan, initSol is a feasible solution for both the deterministic and the stochastic versions of the problem.

Algorithm 2
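
The following Java fragment outlines this construction step; the call to LocalSolver and the outer feasibility loop are abstracted away, and all identifiers are illustrative rather than taken from our implementation.

```java
import java.util.*;

/**
 * Outline of the initial-solution construction (cf. Algorithm 2). The
 * weight assignment via LocalSolver and the feasibility loop are omitted;
 * all names are illustrative.
 */
public final class InitialSolution {

    static Set<Integer> buildPortfolio(int n, boolean[] q,
                                       int kMin, int kMax, Random rng) {
        Set<Integer> portfolio = new HashSet<>();
        for (int i = 0; i < n; i++)
            if (q[i]) portfolio.add(i);                 // pre-selected projects first
        // draw a valid portfolio size s with kMin <= s <= kMax and s >= sum(q_i)
        int s = Math.max(portfolio.size(), kMin + rng.nextInt(kMax - kMin + 1));
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++)
            if (!q[i]) candidates.add(i);
        Collections.shuffle(candidates, rng);           // random project order
        for (int i : candidates) {
            if (portfolio.size() >= s) break;           // fill up to size s
            portfolio.add(i);
        }
        // the weights x_i are then set by the external solver, and the whole
        // process is repeated until a feasible solution is obtained
        return portfolio;
    }
}
```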

In the second stage, initSol is improved using a VNS procedure. First, the feasible initSol is copied into baseSol and bestSol. Moreover, the size of the neighborhood, k, is set to one and the SA-related temperature is set to zero. Then, a new solution, newSol, is created by “shaking” the current one. This procedure consists of randomly deleting a number of non-pre-selected projects in the current portfolio and then randomly introducing new projects. Each newSol is sent to LocalSolver to determine the appropriate investment weights. In order to avoid calling the solver more than strictly necessary, a cache memory is implemented: it stores, in a hash map data structure, the weights assigned to previously analyzed project portfolios (see the sketch below). Next, newSol undergoes a local search phase in order to find the local minimum within the defined neighborhood structure. In this local search phase, we randomly replace projects in the current portfolio with projects outside the portfolio. This replacement takes into account the risk-affinity between projects (i.e., the covariance matrix), and a BR technique relying on a geometric distribution is employed (Juan et al. 2015b). If newSol is promising in terms of deterministic NPV, it is sent through a fast simulation process, consisting of 200 runs, to estimate its expected NPV under uncertainty. Whenever this expected NPV outperforms that of baseSol and/or bestSol, these solutions are updated to newSol and the process continues. Also, to reduce the odds of getting trapped in a local minimum, an SA-like acceptance criterion is used to update baseSol with newSol on some occasions, even if newSol does not outperform baseSol.
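
A minimal sketch of this cache is given below; the class and method names are illustrative, and the solver call is abstracted behind a function argument.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/**
 * Sketch of the hash-map cache that stores the weights assigned by
 * LocalSolver to previously analyzed project portfolios, so the solver is
 * only invoked for portfolios not seen before. Names are illustrative.
 */
public final class WeightCache {
    private final Map<Set<Integer>, double[]> cache = new HashMap<>();

    double[] weightsFor(Set<Integer> portfolio,
                        Function<Set<Integer>, double[]> solver) {
        // computeIfAbsent calls the (expensive) solver only on a cache miss;
        // copying the key set avoids aliasing with a mutable portfolio object
        return cache.computeIfAbsent(Set.copyOf(portfolio), solver);
    }
}
```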

Once the VNS stage ends, the algorithm returns a list with the top five solutions. For each of these solutions, we perform a more intensive simulation experiment, consisting of 15,000 runs, which provides a more accurate estimate of the expected NPV. Notice that the outcomes of this simulation experiment can also be used to complete a risk analysis on each proposed solution, as well as to obtain other relevant statistics (e.g., the NPV quartiles associated with each investment plan).
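
A condensed sketch of this simulation stage is given below. Each run samples cash flows and discount rates according to the distributional assumptions detailed in Sect. 5, computes the scenario NPV via Eqs. (1)-(3), and the runs are then aggregated into an expected NPV and empirical quartiles. All identifiers are illustrative, and practical details (e.g., truncating extreme draws) are omitted.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Condensed sketch of the Monte Carlo assessment of an investment plan x.
 * mu[i][t-1] and muR[i][t-1] hold E[C_it] and E[R_it]; gamma sets the
 * intra-period uncertainty level. All names are illustrative.
 */
public final class NpvSimulation {

    static double[] simulate(double[] x, double[][] mu, double[][] muR,
                             double gamma, int runs, Random rng) {
        int n = x.length, m = mu[0].length;
        double[] npv = new double[runs];
        for (int r = 0; r < runs; r++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                if (x[i] == 0.0) continue;                // skip unfunded projects
                for (int t = 1; t <= m; t++) {
                    // sigma_it = gamma * |mu_it| * t (two layers of uncertainty)
                    double c = mu[i][t - 1]
                             + gamma * Math.abs(mu[i][t - 1]) * t * rng.nextGaussian();
                    double rate = muR[i][t - 1]
                             + gamma * Math.abs(muR[i][t - 1]) * t * rng.nextGaussian();
                    total += (c * x[i]) / Math.pow(1.0 + rate, t);  // Eq. (1)
                }
            }
            npv[r] = total;                               // Eqs. (2)-(3)
        }
        Arrays.sort(npv);                                 // empirical distribution
        return new double[] {
            Arrays.stream(npv).average().orElse(0.0),     // expected NPV estimate
            npv[runs / 4], npv[runs / 2], npv[3 * runs / 4]  // quartiles
        };
    }
}
```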

5 Computational experiments

The algorithm was implemented as a Java application, and all experiments were performed on a standard personal computer equipped with an Intel Core i7 CPU at 2.9 GHz and 8 GB of RAM, running Windows 8. In order to test the algorithm, we created benchmark instances with a set of 10 projects and the corresponding required inputs. In a preliminary analysis, we identified a range of acceptable risk levels which, as is similarly done in traditional portfolio optimization, we then divided into 1,000 equidistant points to be used as risk constraints. The resulting benchmarks are deterministic and differ in terms of the interdependence (correlation) between projects. In this regard, we distinguish among six different instances. Instance 1 assumes zero correlation between each pair of projects. Instance 2 assumes that each pair of projects is correlated with a coefficient of 0.9. This coefficient is 0.5 for Instance 3, −0.5 for Instance 4, and −0.9 for Instance 5. Finally, Instance 6 randomly generates the correlation coefficients for each pair of projects. To evaluate the effects of uncertainty on project portfolio selection, we make several reasonable distributional assumptions about our stochastic variables \(C_{it}\) and \(R_{it}\).

It is assumed that the cash flow of project i at time t, \(C_{it}\), follows a normal distribution \(N(\mu _{it}, \sigma _{it})\), where \(\mu _{it} = E[C_{it}]\) is the deterministic value for the cash flow of project i at time t (given as an input), and \(\sigma _{it} = \gamma \cdot |\mu _{it}| \cdot t\). In this expression, \(\gamma > 0\) is an auxiliary parameter used to consider different variability levels in our experiments (the intra-period layer of uncertainty), while the factor t accounts for the fact that uncertainty grows as we move further into the future (the inter-period layer of uncertainty). In our experiments, we assume three different levels of intra-period uncertainty: low (\(\gamma = 1.05\)), medium (\(\gamma = 1.10\)), and high (\(\gamma = 1.15\)). For the purpose of our computational experiments, we have used a total of five periods, i.e., \(t \in \{1,2,3,4,5\}\). Similarly, it is assumed that the discount rate of project i at time t, \(R_{it}\), follows a normal distribution \(N(\mu '_{it}, \sigma '_{it})\), where \(\mu '_{it} = E[R_{it}]\) is the deterministic value for the discount rate of project i at time t (given as an input), and \(\sigma '_{it} = \gamma \cdot |\mu '_{it}| \cdot t\). As in the case of uncertain cash flows, the discount rate features two layers of uncertainty, the intra-period one and the inter-period one. It is worth noting that our methodology is robust to modifications of these distributional assumptions, whether theoretical or empirically fitted to real-life data.
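
To illustrate the two layers, consider a hypothetical project with an expected cash flow of \(\mu _{it} = 100\) monetary units in every period under the medium uncertainty level (\(\gamma = 1.10\)): the standard deviation grows from \(\sigma _{i1} = 1.10 \cdot 100 \cdot 1 = 110\) at \(t = 1\) to \(\sigma _{i5} = 1.10 \cdot 100 \cdot 5 = 550\) at \(t = 5\), so cash flows further in the future are sampled from increasingly dispersed distributions.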

The algorithm is executed ten times with different seeds, storing only the best solution found in each run. A maximum time of 150 s was allowed for each execution. In order to gauge the effect of uncertainty on project portfolio selection, we carried out two computational experiments: the first considers deterministic instances, while the second assumes stochastic cash flows and discount rates.

6 Analysis of results

In this section we present and discuss the results obtained in two computational experiments: Sect. 6.1 analyzes the results obtained for the deterministic version of the PPSP (i.e., assuming deterministic cash flows and discount rates), while Sect. 6.2 discusses the stochastic PPSP and shows that an optimal (or near-optimal) investment plan for the deterministic scenario might be a suboptimal plan for the stochastic one.

6.1 Analyzing the deterministic PPSP

Figure 2 presents, by instance, the results obtained for the deterministic PPSP. The figure sheds light on the relation between the maximum risk constraint and the associated NPV for the six instances.

Fig. 2 Results for the deterministic PPSP

Notice that relaxing the risk constraint leads to a higher NPV. However, several departures from this general finding stand out. First, the relation between the NPV and the maximum allowed risk is also driven by the coefficient of correlation between each pair of projects. Portfolios of projects that are highly correlated (Instances 2 and 3) typically generate a lower NPV than portfolios of less correlated projects (Instances 1, 4, and 5); Instance 6 also features positively correlated projects, although based on randomly drawn correlation coefficients. Specifically, if the coefficient of correlation is high and the maximum allowed risk is low, then the portfolio will be dominated by smaller projects that generate a lower NPV. Intuitively, the higher the correlation among the projects, the more limiting the maximum risk constraint becomes: to meet it, larger projects are crowded out (or cannibalized) by smaller ones. This is clearly visible for Instance 2, which achieves a lower or equal NPV for all levels of maximum risk compared to the remaining instances. As the coefficient of correlation decreases, the NPV increases for the same value of the maximum allowed risk (due to a weaker cannibalism effect). As anticipated, the NPV of the best-found portfolio increases with an increased allowance of risk exposure for each of the instances. Considering Instances 4 and 5 (with negative correlations), it becomes clear that the risk constraint (which is always defined as a positive monetary value) does not affect the solution, and that a near-optimal solution is found in most cases. The jumps in the graphs of Instances 1, 3, and 6 are due to the limited number of candidate projects, which causes projects with higher benefits to be included only after a certain risk threshold.

Table 1 The net present value (NPV) of deterministic (Dy) and stochastic (Sy) solutions in a deterministic (eD) or stochastic (eL, eM, eH) environment (Instance 6)
Table 2 The gap between the DD solution and each of the other solutions (Instance 6)

6.2 Analyzing the stochastic PPSP

For Instance 6 (random correlations), Table 1 presents the obtained results. The first column shows the risk threshold considered (i.e., the maximum risk allowed by the user). Column [DD] represents the NPV associated with the best deterministic solution. Columns [Dy] (with \(y \in \{L, M, H\}\)) show the expected NPV obtained when the best deterministic solution is evaluated in the stochastic PPSP with the corresponding level of uncertainty (L = low, M = medium, and H = high). Similarly, columns [Sy] show the expected NPV obtained for the stochastic PPSP using the solution provided by our simheuristic approach. Table 1 indicates that, in general, the (expected) NPV increases with the maximum risk allowed, regardless of the solution type (deterministic or stochastic). However, it is worth noting that the NPV curve becomes flat after the project portfolio reaches a certain risk threshold. For instance, when the risk threshold is 37,000 units, taking an additional unit of risk is not followed by a commensurate rise in the (expected) NPV. Furthermore, a deterministic solution evaluated in the stochastic PPSP yields a lower expected NPV than the corresponding stochastic solution obtained with our simheuristic approach. This finding suggests that using deterministic solutions in a stochastic environment can lead to significant biases in project appraisal. Table 2 contains the gaps (measured as percentage differences) between [DD] and [Dy], and between [DD] and [Sy]. The gaps are always positive, which indicates that a higher degree of uncertainty erodes the expected NPV of a project and hence drives a wedge between a deterministic and a stochastic solution. It is also worth noting that the gaps between [DD] and [Dy] are larger than the gaps between [DD] and [Sy].

Figure 3 shows the box-plots of the aforementioned gaps between DD and Dy, and between DD and Sy (where \(y \in \{L,M,H\}\)). It is important to remark that these gaps are always positive, meaning that the NPV in the deterministic PPSP can be seen as an upper bound for the expected NPV in the stochastic PPSP; i.e., ceteris paribus, the existence of uncertainty in the project selection problem reduces the NPV that can be attained. Indeed, the expected NPV is inversely related to the degree of uncertainty, since a higher degree of uncertainty translates into a larger standard deviation of the cash flows. However, such a reduction is larger if the existence of uncertainty is ignored in the model. Also, note that these gaps grow with the uncertainty level: eH box-plots are above eM box-plots, which in turn are above eL box-plots (where \(e \in \{D,S\}\)). Finally, it can be noticed that the expected NPV associated with the use of the best deterministic solution in a stochastic environment acts as a lower bound for the optimal value in the stochastic environment; i.e., our solutions for the stochastic PPSP always outperform the expected NPV generated by using the best deterministic solution in a stochastic environment. This observation not only adds credibility to the quality of our algorithm but also illustrates that using the best deterministic solution in a stochastic environment might be a bad decision, since it usually provides sub-optimal values.

Fig. 3 Box-plots of gaps w.r.t. the best solution for the deterministic PPSP (DD)

Finally, Fig. 4 shows that: (i) as the maximum risk allowed increases, the problem becomes less constrained and, therefore, the stochastic solutions get closer to the deterministic ones; and (ii) for each risk level, the gaps obtained from our stochastic solutions [DD-Sy] are lower than those obtained from our deterministic solutions evaluated in a stochastic environment [DD-Dy].

Fig. 4 Gaps [DD-ey] versus the maximum risk allowed (where ‘ey’ represents the different solutions)

6.3 Exploring a multi-objective scenario

In multi-objective optimization, two or more conflicting goals are considered. This subsection explores a bi-objective PPSP model in which the NPV associated with an investment plan is maximized (first goal) while, at the same time, its risk is minimized (second goal). Such a setting resembles Rooderkerk and van Heerde (2016), who sought to balance risk and return, albeit in a specific retail-assortment optimization problem. Importantly, the bi-objective formulation can be tailored to capture varying degrees of risk aversion of the decision-maker.

In order to address this multi-objective scenario, we make use of the multi-directional local search (MDLS) method introduced by Tricoire (2012). This method relies on the concept of Pareto dominance. A neighbor solution \(x'\) of x can only be efficient if \(x'\) is better than x for at least one objective. Hence, to find efficient neighbor solutions of x, it suffices to search in one direction (i.e., one objective) at a time using single-objective local search methods.

The MDLS requires an initial set F of non-dominated solutions to start an iterative procedure. As the initial set, we use the set F containing the top five stochastic non-dominated solutions generated by our VNS-based simheuristic. At each iteration, a solution x from F is randomly selected and then, for each objective, a corresponding local search method is employed to generate a neighbor solution \(x'\).

The single-objective local searches employed in this stage are the ones already described in Sect. 4.2. First, we use biased randomization to exchange projects in the current portfolio with projects outside the portfolio. This replacement takes into account the risk-affinity between projects. Then, the newly generated solution is sent to the commercial LocalSolver software, which determines the appropriate investment weights. We execute LocalSolver twice, once for each goal: in the first call, LocalSolver tries to maximize the NPV, while in the second one it focuses on minimizing the investment risk. Finally, at the end of each local search, a simulation process is carried out in order to estimate the expected NPV under uncertainty.

After that, the non-dominated set F is updated by merging the solutions in F with the new neighbor solutions provided by the local searches. The dominance rule is used to merge solutions, i.e., all dominated solutions are deleted from F (see the sketch below). The whole procedure is repeated until a maximum-allowed computing time is reached. At this point, the MDLS returns the set F of mutually non-dominated solutions.
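
A minimal sketch of this dominance-based update is given below; solutions are represented as (expected NPV, risk) pairs, where a higher NPV and a lower risk are preferred, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the dominance-based update of the non-dominated set F used by
 * MDLS. Class and method names are illustrative.
 */
public final class ParetoFront {

    record Sol(double npv, double risk) {}

    /** a dominates b if it is at least as good in both objectives
     *  and strictly better in at least one. */
    static boolean dominates(Sol a, Sol b) {
        return a.npv() >= b.npv() && a.risk() <= b.risk()
            && (a.npv() > b.npv() || a.risk() < b.risk());
    }

    /** Merge a candidate into F, deleting every solution it dominates. */
    static List<Sol> merge(List<Sol> front, Sol candidate) {
        for (Sol s : front)
            if (dominates(s, candidate)) return front;   // candidate rejected
        List<Sol> updated = new ArrayList<>();
        for (Sol s : front)
            if (!dominates(candidate, s)) updated.add(s); // keep survivors
        updated.add(candidate);
        return updated;
    }
}
```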

Figure 5 shows a Pareto chart with the results obtained after executing the MDLS method for Instance 6. Here, a maximum risk of 21,000 was allowed, and a medium (\(\gamma = 1.10\)) level of uncertainty was considered. The chart shows the Pareto frontier of non-dominated solutions for this particular scenario. Notice that there is a strong relationship between the two objective functions: as the NPV increases (i.e., as the investment plan offers a higher return), the risk also increases.

Fig. 5 Pareto frontier for Instance 6

7 Conclusions and future work

As far as we know, this is the first work addressing the relevant problem of maximizing the net present value in the project portfolio selection problem (PPSP) under uncertainty and rich conditions. In the stochastic version of the PPSP discussed here, both the periodical cash flows and the discount rates are modeled as random variables, which represents a step forward with respect to traditional approaches, in which they are assumed to be deterministic and known in advance. Since real-life financial activities are subject to plenty of uncertainty, adding randomness to these elements contributes to narrowing the gap between theory and practice.

After completing a literature review on related work and providing a formal model for the stochastic PPSP, we propose a simheuristic approach to obtain efficient solutions to it. Our algorithm integrates Monte Carlo simulation inside a variable neighborhood search framework. The algorithm also uses other components inspired by simulated annealing and biased-randomization techniques. A series of computational experiments allows us to validate the solving methodology. Notice that, although we have assumed specific probability distributions and covariance matrices during the numerical experiments, our methodology is general and could be used with any other probability distributions and covariance structures (in practice, data should be collected for each case study and then modeled using the appropriate probability distributions, correlation values among projects, etc.).

Our research findings are as follows. First, we find that the relation between the expected NPV and risk is not necessarily linear. Indeed, the presence of cardinality and quantity constraints generates non-linearities in the risk-return relation. Second, project interdependencies (as measured by the correlation between the cash flows of any two projects) can be regarded as a limit to the volume of projects that can be included in a portfolio. Third, a near-optimal solution to the deterministic PPSP is generally sub-optimal in a stochastic environment. Fourth, a near-optimal solution to the stochastic PPSP gives rise to a higher (expected) NPV than a near-optimal solution to the deterministic PPSP evaluated in a stochastic environment.

As research lines for future work, it would be worthwhile to explore the following topics, which represent limitations of our current work: (i) interdependencies could be defined as effects that dynamically influence the risk and/or benefit figures of the project portfolio based on its constituent projects; (ii) although this paper has already explored a bi-objective extension of our single-objective approach, developing a complete multi-objective model and solving methodology could be of great interest for both researchers and practitioners; and (iii) other metaheuristic frameworks (e.g., tabu search, simulated annealing, genetic algorithms) could be used as a base for our simheuristic approach, thus allowing a comparison of the performance of different approaches.