1 Introduction

There is increasing interest in the scientific community in addressing optimization problems that include uncertain, stochastic, and dynamic information in their mathematical formulation. Problem solving under uncertainty has a very high impact on the real world, since optimization problems arising in practice are becoming increasingly complex and dynamic, partly thanks to the fast development of telecommunications, which makes not only the perception but also the changes of the world more rapid, stochastic, and difficult to forecast. A relevant group of combinatorial optimization problems under uncertainty, which is the one considered in this survey, corresponds to the class of Stochastic Combinatorial Optimization Problems (SCOPs). The distinctive feature of SCOPs is that part of the information about the problem data is unknown, while knowledge about its probability distribution is assumed.

In recent years, metaheuristic algorithms such as Ant Colony Optimization (ACO), Evolutionary Computation (EC), Simulated Annealing (SA), Tabu Search (TS), and others have been emerging as successful alternatives to classical approaches based on mathematical and dynamic programming for solving SCOPs. In fact, due to the high complexity and difficulty of optimization problems under uncertainty, classical approaches (which guarantee to find the optimal solution) are often feasible only for small-size instances of the problems, and they may require a great deal of computational effort. In contrast, approaches based on metaheuristics are capable of finding good, and sometimes optimal, solutions to problem instances of realistic size in a generally smaller computation time. Table 1 lists some papers in the literature providing evidence of the advantages of solving SCOPs via metaheuristics instead of exact classical methods.

Table 1 Evidence about some of the advantages in solving SCOPs via metaheuristics instead of using exact classical methods

In this paper we provide a thorough review of the current applications of metaheuristics to the class of SCOPs, thereby filling a gap in the literature: a number of surveys and books about solving SCOPs via mathematical and dynamic programming exist, but none about using metaheuristics, even though the research literature in this field is already quite rich. Some of the metaheuristics that we consider (such as ACO, EC, and SA) are nature-inspired, while others (such as TS) are not. We include them all in this survey because (a) they achieve good performance in solving SCOPs by combining the concepts of “intensification” and “diversification”, which are a common feature of all metaheuristics, (b) there is a significant amount of literature involving these metaheuristics and SCOPs, and (c) many concepts used to apply one metaheuristic to a given SCOP may be profitably applied to other metaheuristics. By reviewing the literature about applying metaheuristics to SCOPs we identify: common issues that one faces when trying to solve a SCOP via a metaheuristic, differences in the maturity of the research about each metaheuristic as a SCOP-solving algorithm, and areas where more research is needed. This survey aims to be a useful instrument for those who want to find significant directions of research in solving SCOPs via metaheuristics, and at the same time it provides an introduction to both the field of metaheuristics and that of optimization under uncertainty.

The remainder of the paper is organized as follows. Section 2 presents the basic principles of the ACO, EC, SA, and TS metaheuristics in the context of deterministic combinatorial optimization problems. Optimization under uncertainty and SCOPs are introduced and formalized in Sect. 3. First, a classification of the modeling approaches to optimization under uncertainty is given, and the precise scope of SCOPs is stated. Then, the main formal definitions of static and dynamic SCOPs from the literature are introduced. Section 4 addresses the key algorithmic design issue to be considered when extending any metaheuristic from deterministic COPs to SCOPs: the computation of the objective function and of different types of objective function approximations. Section 5 reviews the main applications to SCOPs of each metaheuristic for which a significant amount of interesting literature exists. These include the metaheuristics introduced in Sect. 2 (ACO, EC, SA, and TS), plus some others (Stochastic Partitioning Methods, Progressive Hedging, Rollout algorithms, Particle Swarm Optimization, and Variable Neighborhood Search) that are still less widespread in the nature-inspired/metaheuristic community. Whilst Sect. 5 takes a vertical view, focusing separately on each metaheuristic, Sect. 6 takes a transversal view on the reviewed literature. From this point of view, issues common to all metaheuristics, open problems, and possible directions of research are proposed and discussed. Finally, Sect. 7 summarizes the conclusions of this survey.

2 Metaheuristics: basic principles

In this section we introduce the concept of metaheuristic, and we describe the working principles of the ACO, EC, SA, and TS metaheuristics in the context of deterministic combinatorial optimization problems. For a more extended review and comparison among some of these metaheuristics, see for instance the paper by Blum and Roli (2003) and the publications pointed to by the Metaheuristics Network Web site, http://www.metaheuristics.org/. The acronyms used to identify metaheuristics are recalled in Table 3.

Informally (Papadimitriou and Steiglitz 1982), a deterministic combinatorial optimization problem consists in finding the best object from a finite—or possibly countably infinite—set. This object, also called a solution, is typically an integer, a subset, a permutation, or a graph structure. As a formal definition of a deterministic combinatorial optimization problem, we propose the following.

Definition 1

(Deterministic Combinatorial Optimization Problem-DCOP) Given a finite set S of feasible solutions x, and a real valued cost function G(x), find

$$ \mathop{\hbox{min}}\limits_{x\, \in \, S } \; G(x). $$
(1)

The set S is usually called the search (or solution) space. Its structure may be made complex by the presence of constraints on solutions. The solution x* with minimal objective function value, that is, \(G(x^{\ast}) \leq G(x)\;\forall x \in S,\) is called a globally optimal solution. Definition 1 considers the minimization form for optimization problems. The maximization form is equivalent and can be derived in a straightforward manner by substituting the word ‘min’ with the word ‘max’. Unless explicitly mentioned otherwise, in this paper we always consider the minimization form for optimization problems.

For many DCOPs belonging to the class of NP-hard optimization problems (Garey and Johnson 1979), algorithms that guarantee to find the optimal solution within bounded time (we call these algorithms exact) may require exponential computation time. Even for small instances of a problem, exact algorithms may require too much computation time for practical purposes. This is the reason why there is a great interest in designing algorithms that find in a reasonable computation time a solution that is as good as possible, but not necessarily optimal (we call these algorithms approximate). Heuristics and Metaheuristics belong to this class.

The term heuristic derives from the Greek verb heuriskein \((\epsilon\upsilon\rho\iota\sigma\kappa\epsilon\iota\nu)\) that means “to find”. In fact, heuristics are basic approximate algorithms that search the solution space to find a good solution. There are mainly two types of heuristics: constructive algorithms and local search algorithms. Constructive algorithms build a solution by joining together “pieces”, or components, of a solution, which are added one after the other until the solution is complete. Local search algorithms start from a pre-existing solution (called the current solution) and try to improve it by modifying some of its components (such a modification is called a move).
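For concreteness, the following sketch (our own illustration, not drawn from any of the cited references) shows both types of heuristic on the well known Traveling Salesman Problem: a nearest-neighbor constructive algorithm, which adds the closest unvisited customer until the tour is complete, and a local search based on simple swap moves:

```python
def nearest_neighbor_tour(dist, start=0):
    """Constructive heuristic: repeatedly add the closest unvisited
    customer (a solution 'component') until the solution is complete."""
    n = len(dist)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[tour[-1]][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(tour, dist):
    """Objective G(x): total length of the closed tour."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def swap_local_search(tour, dist):
    """Local search: starting from a pre-existing solution, apply
    improving swap 'moves' until the tour is a local optimum."""
    improved = True
    while improved:
        improved = False
        for i in range(len(tour) - 1):
            for j in range(i + 1, len(tour)):
                cand = tour[:]
                cand[i], cand[j] = cand[j], cand[i]
                if tour_length(cand, dist) < tour_length(tour, dist):
                    tour, improved = cand, True
    return tour
```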

The Greek prefix “meta” used in the word metaheuristic means “beyond, at an upper level”. Thus, metaheuristics are algorithms that combine heuristics (which are usually very problem-specific) in a more general framework. According to Blum and Roli (2003),

metaheuristics are high level concepts for exploring search spaces by using different strategies. These strategies should be chosen in such a way that a dynamic balance is given between the exploitation of the accumulated search experience (which is commonly called intensification) and the exploration of the search space (which is commonly called diversification). This balance is necessary on one side to quickly identify regions in the search space with high quality solutions and on the other side not to waste too much time in regions of the search space which are either already explored or don’t provide high quality solutions.

Because of their practical relevance, most of the literature about metaheuristics is of an experimental nature. Nevertheless, formal theoretical results are also available for many metaheuristics, particularly concerning convergence. The general question is: will a given metaheuristic find an optimal solution when given enough resources? Due to the stochastic nature of metaheuristics, answers to this question are not trivial, and they are always probabilistic. Even if convergence proofs are usually not very useful in practice for implementing an efficient metaheuristic (since they typically require infinite computation time or memory space), they are still an important aspect to investigate, because they can provide insight into the working principles of an algorithm.

Let us now concentrate on specific metaheuristics. For each of them we illustrate: first, the working principles, inspiring concepts, and intensification/diversification strategies; second, a brief list of references to milestone and survey publications, including pointers to the literature about theoretical convergence results.

2.1 Ant Colony Optimization

ACO is one of the most successful nature-inspired metaheuristics. The inspiring concept that links optimization with biological ants is based on the observation of their foraging behavior: when walking on routes from the nest to a source of food, ants seem to find not simply a random route, but a quite ‘good’ one, in terms of shortness or, equivalently, in terms of travel time; thus, their behavior allows them to solve an optimization problem. This success of biological ants is entirely explained by their type of communication and by their way of deciding where to go: while walking, ants deposit a chemical called pheromone on the ground, and they tend to choose routes marked by strong pheromone concentrations. Given two initially unexplored routes, a short and a long one, between the nest and the source of food, ants initially choose at random which one to walk. Ants that have chosen, at first by chance, the shorter route are the first to reach the food and to start their return to the nest. Therefore, pheromone starts to accumulate faster on the shorter route than on the longer one. Subsequent ants tend to follow the shorter route because it has more pheromone, thus reinforcing it more and more, and further attracting other ants to the good route.

Combinatorial problems addressed by ACO are usually encoded by a construction graph G = (V, A), a completely connected graph whose nodes V are components of solutions, and arcs A are connections between components. Finding a solution means constructing a feasible walk in G. The construction graph encoding is also used in current ACO applications to SCOPs and to dynamic optimization problems. Some examples are also described in Dorigo and Stützle (2004).

The ACO algorithm is essentially the interplay of three procedures (Dorigo et al. 1999): \({\mathsf{ConstructAntsSolutions, UpdatePheromones}},\) and \({\mathsf{DaemonActions}},\) as represented by Algorithm 1.

\({\mathsf{ConstructAntsSolutions}}\) is the process by which artificial ants construct walks on the construction graph incrementally and stochastically. For a given ant, the probability \(p_{kl}\) to go from a node k to a feasible successor node l is an increasing function of \(\tau_{kl}\) and \(\eta_{kl}(u)\), where \(\tau_{kl}\) is the pheromone on arc (k,l), and \(\eta_{kl}(u)\) is the heuristic value of arc (k,l), which should be a reasonable guess of how good arc (k,l) is. The heuristic value may depend on the partial walk u.


\({\mathsf{UpdatePheromones}}\) is the process by which pheromone is decreased on arcs. Pheromone is decreased by each ant on each arc as soon as the arc is added to a partial walk on the construction graph; this operation is called local update. Decreasing pheromone on selected arcs is important in order to avoid too rapid convergence of the algorithm to suboptimal solutions, and to favor the exploration of new areas of the search space. Interestingly, pheromone also decreases in the biological environment, due to evaporation.

\({\mathsf{DaemonActions}}\) are centralized operations that use global knowledge about the search process of the algorithm. Here, the term ‘centralized’ is used as opposed to the term ‘local’, and identifies those operations that cannot be performed by single ants (since they operate locally). Daemon actions may include: evaluating and comparing the objective function values of different solutions produced by ants; applying a local search heuristic to solutions produced by ants, in order to intensify the search near some selected solutions; and collecting global information that can be used to increase pheromone on some solutions or on some solution components, in order to bias the search and intensify it around solutions with given characteristics (this centralized pheromone update is also called global update).
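To make the interplay of the three procedures concrete, the following sketch shows one possible, deliberately minimal instantiation of Algorithm 1; the parameter names and the specific update rules are our own simplifications (for instance, the local update is applied after each walk is completed), not a reference implementation:

```python
import random

def aco(nodes, cost, heuristic, n_ants=10, n_iter=100,
        tau0=1.0, rho=0.1, q=1.0):
    """A minimal ACO sketch on a fully connected construction graph.
    `cost` maps a walk to its objective value; `heuristic(k, l)` must
    be positive."""
    tau = {(k, l): tau0 for k in nodes for l in nodes if k != l}
    best, best_cost = None, float("inf")
    for _ in range(n_iter):
        walks = []
        for _ in range(n_ants):
            # ConstructAntsSolutions: the probability of arc (k, l)
            # grows with pheromone tau[k, l] and heuristic eta(k, l)
            walk = [random.choice(nodes)]
            while len(walk) < len(nodes):
                k = walk[-1]
                cand = [l for l in nodes if l not in walk]
                w = [tau[k, l] * heuristic(k, l) for l in cand]
                walk.append(random.choices(cand, weights=w)[0])
            for k, l in zip(walk, walk[1:]):
                tau[k, l] *= 1.0 - rho       # local update (decrease)
            walks.append(walk)
        # DaemonActions: compare solutions, reinforce the best so far
        for walk in walks:
            c = cost(walk)
            if c < best_cost:
                best, best_cost = walk, c
        for k, l in zip(best, best[1:]):
            tau[k, l] += q / best_cost       # global update (increase)
    return best, best_cost
```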

The first algorithms based on the ant colony analogy appeared at the beginning of the nineties in a paper by Dorigo et al. (1991), later published as Dorigo et al. (1996). ACO is now a widely studied metaheuristic for combinatorial optimization problems, as the recent book by Dorigo and Stützle (2004) testifies. Among recent overviews, we cite the one by Blum (2005), which also highlights new research trends. The theoretical convergence properties of the ACO metaheuristic have been recently reviewed in a paper by Dorigo and Blum (2005). The first convergence proofs were provided for an ACO algorithm called GBAS by Gutjahr (2000, 2002). Other proofs of convergence, given by Dorigo and Stützle (2002, 2004), apply to a class of ACO algorithms that constrain all pheromone values not to be smaller than a given positive lower bound. The lower bound prevents the probability of generating any solution from becoming zero.

2.2 Evolutionary Computation

EC is a collective term for all variants of optimization algorithms that are inspired by Darwinian evolution. In this context, a solution to a given optimization problem is called individual, and a set of solutions is called population. The basic structure of an EC algorithm is represented by Algorithm 2.


Every iteration of the algorithm corresponds to a generation: certain operators are applied to some individuals of the current population to generate the individuals of the population of the next generation (the \({\mathsf{Vary}}\) operator of Algorithm 2). At each generation, only some individuals are selected, on the basis of their fitness measure (this can be the objective function value, or some other kind of quality measure), for being processed by the variation operators or for being copied unchanged into the next generation. Individuals with higher fitness have a higher probability of being selected. In this metaheuristic, exploration is provided by the mutation and recombination operators, while exploitation of the information obtained on good solutions is enabled by the selection mechanism.
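The following sketch illustrates this basic generational loop; it is a minimal illustration assuming fitness-proportional selection and user-supplied mutate and recombine operators, with higher fitness meaning better individuals:

```python
import random

def evolve(init_population, fitness, mutate, recombine,
           n_generations=100):
    """A minimal EC sketch: fitness-proportional selection followed
    by variation (recombination and mutation)."""
    pop = init_population
    for _ in range(n_generations):
        # selection probability proportional to fitness (higher = better)
        weights = [fitness(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < len(pop):       # the Vary operator
            a, b = random.choices(pop, weights=weights, k=2)
            new_pop.append(mutate(recombine(a, b)))
        pop = new_pop
    return max(pop, key=fitness)
```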

In the literature there are mainly three categories of EC that have been developed independently of each other: Evolutionary Programming (EP), proposed by Fogel et al. (1966), Evolutionary Strategies (ES), proposed by Rechenberg (1973), and Genetic Algorithms (GA), proposed by Holland (1975). Presently, algorithms that fall in the EP and ES categories mostly apply to continuous optimization problems, while GA are more specific to discrete and combinatorial optimization problems. Recent overviews about EC include Hertz and Kobler (2000), Calégari et al. (1999), and Bäck et al. (1997). For the convergence properties of EC and GA, see for instance Rudolph (1996), Vose (1999), and Reeves and Rowe (2003).

2.3 Simulated Annealing

SA relies on a model developed by Metropolis et al. (1953) for simulating the physical annealing process, in which the particles of a solid arrange themselves into thermal equilibrium.

The algorithm is based on the principle of local search heuristics, and uses a pre-defined neighborhood structure on the search space S. A control parameter, called “temperature” in analogy to the physical annealing process, governs the search behavior. In each iteration, a neighbor solution y of the current solution x is computed. If y has a better objective function value than x, the solution y is “accepted”, that is, the current solution x is replaced by y. If, on the other hand, y has a worse objective function value than x, the solution y is only accepted with a certain probability depending on (i) the difference of the objective function values of x and y, and (ii) the temperature parameter. In SA, intensification is provided by the local search nature of the algorithm, while diversification is produced by the presence of “uphill” moves that are possible at non-zero temperature. In pseudocode, the SA metaheuristic can be represented as in Algorithm 3 (cf. Aarts and Korst 1990, p. 16).


Therein,

  • S is the search space and G is the objective function (see Definition 1);

  • x and y are feasible solutions from S;

  • \(T_1, T_2, \ldots\) is a (usually decreasing) sequence of values for the temperature parameter; the update of the values \(T_k\) is done according to a so-called cooling schedule;

  • the sets S(x) form the pre-defined neighborhood structure: to each feasible solution x ∈ S, a set \(S(x) \subseteq S \setminus \{x\}\) of “neighbor solutions” is assigned;

  • uniform[α,β] is a procedure selecting a uniformly distributed (pseudo-)random number from the interval [α,β].
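Putting these ingredients together, a minimal sketch of Algorithm 3 could look as follows; the geometric cooling schedule used here is only one possible choice, adopted for concreteness, and the acceptance rule is the standard Metropolis criterion \(\exp(-\Delta/T)\):

```python
import math
import random

def simulated_annealing(x0, G, neighbors, T0=1.0, alpha=0.99,
                        n_iter=10000):
    """Minimal SA sketch: local search with Metropolis acceptance
    and a geometric cooling schedule T_{k+1} = alpha * T_k."""
    x, T, best = x0, T0, x0
    for _ in range(n_iter):
        y = random.choice(neighbors(x))      # draw y from S(x)
        delta = G(y) - G(x)
        # accept improving moves always; accept worsening ("uphill")
        # moves with probability exp(-delta / T)
        if delta <= 0 or random.uniform(0.0, 1.0) < math.exp(-delta / T):
            x = y
        if G(x) < G(best):
            best = x
        T *= alpha                           # cooling schedule
    return best
```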

The SA metaheuristic was introduced in the area of combinatorial optimization by Kirkpatrick et al. (1983). An introduction to SA can be found in van Laarhoven and Aarts (1987) or Aarts and Korst (1990). Several results showing convergence of SA to the set of optimal solutions under suitable cooling schedules have been obtained by various authors, for example Geman and Geman (1984), Gelfand and Mitter (1985), or Hajek (1988). Essentially, convergence can be assured by a cooling schedule where \(T_k\) is decreased as \(\Gamma/\log k\), with sufficiently large Γ. In practice, cooling is usually done faster for computation time reasons. (For more details, see Aarts and Korst 1990.)

2.4 Tabu Search

TS is essentially a sophisticated and improved type of local search. The simplest local search heuristic is known as Hill Climbing, and works as follows. Consider a starting current solution, evaluate its neighboring solutions (according to a given neighborhood structure), and set the best neighbor, or the first found neighbor that is better than the current solution, as the new current solution. Iterate this process as long as an improving solution can be found in the neighborhood of the current solution. The local search stops when the current solution is better than all its neighbors, that is, when the current solution is a local optimum.

Such a simple and very general local search behaves quite poorly in practice, particularly because when a local optimum is found, the algorithm stops improving, and combinatorial problems often have local optima whose objective values are much worse than that of the global optimum. The strength of the TS metaheuristic with respect to Hill Climbing is that, by employing three TS-specific concepts, it avoids getting prematurely stuck in a local optimum. These TS-specific concepts are: best improvement, tabu lists, and aspiration criteria.

Best improvement means that the current solution is always replaced by its best neighbor, even if the best neighbor is worse than the current solution. This is clearly a way not to get stuck in local optima. Using best improvement poses the problem of possible cycling among already visited solutions, because it is possible, for example, that the best neighbor of a solution is indeed the last visited current solution. In order to avoid cycling, choosing recently visited solutions is forbidden by storing some attributes of these solutions in the so-called tabu lists. Whole solutions are not stored in a tabu list, because this would require too much memory for most combinatorial optimization problems. The choice of attributes is a delicate point. Typically, tabu lists store the ‘moves’ that should be performed in order to go from one solution to another, or the differences between solutions. In this way the memory requirement of tabu lists is manageable, but another problem arises: forbidding all solutions corresponding to a tabu attribute may also forbid solutions that have not yet been visited, and possibly also very good or optimal solutions. TS employs aspiration criteria to solve this problem. An aspiration criterion is a condition that, if satisfied, allows a solution obtained by performing a tabu move to be set as the new current solution. A typical example of an aspiration criterion is requiring that a solution be better than the best solution found since the beginning of the algorithm. In this metaheuristic, intensification is provided, as in SA, by the local search mechanism, while diversification is given by the use of tabu lists.


In pseudocode, the TS metaheuristic may be represented as in Algorithm 4, where x,y are feasible solutions of the combinatorial optimization problem, A(x,k) is the set of solutions among which the new current solution is chosen at iteration k, S(x) is the set of neighbors of x, T(x,k) is the set of tabu moves at iteration k, and \(\tilde{T}(x,k)\) is the set of tabu moves satisfying at least one aspiration criterion. In TS, typical stopping criteria may be a maximum CPU time, a maximum number of consecutive iterations not producing an improving solution, or the emptiness of the set A(x,k).
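A minimal sketch of this scheme follows; attribute handling is deliberately simplified (the tabu list stores whole moves, and the single aspiration criterion is improvement over the best solution found so far), so it should be read as an illustration rather than a reference implementation:

```python
from collections import deque

def tabu_search(x0, G, moves, apply_move, tenure=10, n_iter=1000):
    """Minimal TS sketch: best-improvement steps, a tabu list of
    recent moves, and a best-so-far aspiration criterion."""
    x, best = x0, x0
    tabu = deque(maxlen=tenure)              # fixed-length tabu list
    for _ in range(n_iter):
        candidates = []
        for m in moves(x):                   # neighborhood S(x)
            y = apply_move(x, m)
            aspirated = G(y) < G(best)       # aspiration criterion
            if m not in tabu or aspirated:   # admissible set A(x, k)
                candidates.append((G(y), m, y))
        if not candidates:                   # A(x, k) is empty: stop
            break
        _, m, x = min(candidates, key=lambda c: c[0])  # best neighbor,
        tabu.append(m)                                 # even if worse
        if G(x) < G(best):
            best = x
    return best
```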

The main ideas characterizing the TS metaheuristic were independently proposed in the eighties by Glover (1986) and Hansen (1986), and since then TS has been widely applied to combinatorial optimization problems. A comprehensive introduction to TS can be found in the book by Glover and Laguna (1997), or in Hertz et al. (1997). The theoretical convergence properties of TS to the optimal solutions have been analyzed only quite recently, by Hanafi (2000) and by Glover and Hanafi (2002). Both papers derive convergence results for a version of TS where the choice of a given neighborhood and a decision criterion for selecting moves force some solutions to be revisited before exploring other new ones.

3 The domain of stochastic combinatorial optimization

Optimization under uncertainty is a vast field, and the precise scope of SCOPs needs to be identified. For this purpose we first propose, in Sect. 3.1, a classification of the possible modeling approaches to optimization under uncertainty, in which the position of SCOPs can be viewed in perspective. Then, in Sect. 3.2, the main formal definitions of static and dynamic SCOPs from the literature are introduced.

3.1 Modeling approaches to uncertainty

In defining the scope of SCOPs one faces the difficulty of considering the many ways in which uncertainty may be formalized. Uncertainty is included in the formulation of optimization problems in order to come closer to real world conditions, but models must also remain somewhat simplified, in order to be tractable analytically or numerically. The efforts made in reaching a good trade-off between the usefulness of the model and the tractability of the problem have produced a multitude of formalizations of uncertainty. This is even more evident for metaheuristics, because, due to their simplicity, they may be easily applied to complex formulations that would be considered intractable for many classical algorithmic approaches.

When considering models of optimization problems under uncertainty, there are mainly two aspects to define: first, the way uncertain information is formalized, and second, the dynamicity of the model, that is, the time at which uncertain information is revealed with respect to the time at which decisions must be taken. The various modeling approaches differ in the way the first and/or the second aspect is defined. Here, we propose a classification of models according to these two aspects, uncertainty and dynamicity, as schematized in Fig. 1. For space limitations, this paper then focuses only on the subset of models that corresponds to our definition of SCOPs (and to the grey box in Fig. 1).

Fig. 1 Scheme for the conceptual classification of Combinatorial Optimization Problems (COPs) under uncertainty. This paper focuses on Stochastic COPs (SCOPs) and on solution methods based on metaheuristics

Uncertain information may be formalized in several ways (vertical axis of Fig. 1). The case of perfect knowledge about the data of the problem corresponds to the classical field of solving (Deterministic) Combinatorial Optimization Problems (DCOPs) (lower left corner of Fig. 1). Here, all information is available at the decision stage, and it is used by optimization algorithms to find a possibly optimal solution. The concrete application of a solution found would lead exactly to the cost of the solution as computed by the optimization algorithm; therefore DCOPs are also considered static problems, because from the point of view of the decision maker there is nothing else to be decided after the optimization has taken place. A typical example of a DCOP is the well known Traveling Salesman Problem (TSP) (Gutin and Punnen 2002), where, given a set of customers and the set of distance values between each pair of customers, one must find the Hamiltonian tour (that is, a tour visiting each customer once) of minimal length. Despite its simple formulation, the TSP is an NP-hard problem, like many DCOPs.

Let us now consider problem formulations involving uncertainty (upper levels of Fig. 1). One possibility is to describe uncertain information by means of random variables of known probability distributions. This is what we assume in SCOPs (a more precise definition and examples of SCOPs will be given in Sect. 3.2). Under this assumption, the optimization problem is stochastic, and the objective function strongly depends on the probabilistic structure of the model. Typically, the objective function involves quantities such as an expected cost, the probability of violation of some constraint, variance measures, and so on. In SCOPs one can distinguish a time before the actual realization of the random variables, and a time after the random variables are revealed, because the associated random events have happened. Static SCOPs are characterized by the fact that decisions, or, equivalently, the identification of a possibly optimal solution, take place before the actual realization of the random variables. This framework is applicable when a given solution may be applied with no modifications (or very small ones) once the actual realization of the random variables is known. The literature sometimes refers to this type of problem as ‘a priori’ optimization. As an example of this class of problems, consider the probabilistic TSP (PTSP), which consists in finding a Hamiltonian tour visiting all customers (the ‘a priori’ tour) of minimum expected cost, given that each customer has a known probability of requiring a visit. Once the information about which customers actually require a visit on a certain day is known, the customers requiring a visit are visited in the order of the ‘a priori’ tour, simply skipping the customers not requiring a visit.

Dynamic SCOPs arise when it is not possible or not convenient to design a solution that is usable as it is for any realization of the random variables. In this case, decisions that need an optimization effort must also be taken after the random events have happened. This may be done in stages, since it is often the case that the uncertain information is revealed not all at once, but in stages. As an example of a dynamic SCOP, consider for instance a TSP where new customers with known positions appear with a certain probability while the salesman has already started to visit the customers known a priori. In this case an a priori tour must be modified dynamically in order to include the new customers in the visiting tour.

Another way of formalizing uncertainty is to identify the uncertain information with fuzzy quantities (vectors or numbers), and constraints with fuzzy sets. This approach has its roots in Bellman and Zadeh (1970) and in Zimmermann (1991).

An approach which has been receiving increasing attention in recent years is that of robust optimization, which assumes that uncertain information is known in the form of interval values. For example, one could consider the robust TSP, where the cost of the arcs between pairs of customers is given by interval values. These costs could have the meaning of travel times, being small if there is no or little traffic, and being high in case of traffic congestion. The robustness approach consists in finding solutions that hedge against the worst contingency that may arise, given that no knowledge about the probability distribution of the random data is available. One possible way of quantifying robustness is the minmax criterion, under which the robust decision is that for which the highest level of cost taken across all possible future input data scenarios is as low as possible. Both static and dynamic versions of robust optimization problems may be formulated. For a good introduction to robust optimization, see for instance the book by Kouvelis and Yu (1997).
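In symbols, and restricting for concreteness to a finite set Ω of scenarios for the uncertain data, the minmax criterion may be sketched as

$$ \mathop{\hbox{min}}\limits_{x\, \in \, S} \; \mathop{\hbox{max}}\limits_{\omega\, \in \, \Omega} \; G(x, \omega), $$

that is, the chosen solution minimizes the cost incurred under its worst-case scenario.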

On the highest level of Fig. 1 we placed problems that we call Pure Online, where the input is modeled as a sequence of data which are supplied to the algorithm incrementally, but without making any assumption that can help to make a prediction about the new data. An algorithm for a Pure Online problem produces the output incrementally without knowing the complete input, and its performance is evaluated with respect to an abstract competitor who knows all the complete (past and future) data and is able to solve the offline problem optimally. This way of evaluating algorithms is called competitive analysis in the literature (Albers 2003; Borodin and El-Yaniv 1998). An example of a Pure Online problem is the Dynamic Traveling Repair Problem (Irani et al. 2004), where a set of servers move from point to point in a metric space; the speed of each server is constant, so the time it takes to travel from one point to another is proportional to the distance between the two points; time is continuous and at any moment a request for service can arrive at any point in the space; each job also specifies a deadline; if a job is serviced, a server must reach the point where the request originated by its deadline; the goal is to service as many incoming requests as possible by their deadlines.

We should again remark that in this paper we restrict our attention to SCOPs (the shaded box in Fig. 1). SCOPs are combinatorial optimization problems, that is, problems where the decision space is finite but possibly too big to be enumerated, and/or problems having a combinatorial structure because solutions are encoded by permutations, binary vectors, or other combinatorial objects. By this choice we neglect the vast field of continuous optimization under uncertainty, although the scheme we have just proposed for classifying problems under uncertainty applies equally to continuous problems.

SCOPs are relevant in many practical contexts, such as vehicle routing problems, where stochasticity is due to variable customer demands or variable travel times; routing on information networks, where stochasticity is due to the variability of traffic and the related speed of information packets; finance; scheduling; location problems; and many other contexts. All these problem domains may be, and usually are, also modeled as DCOPs. The advantage of using SCOPs over DCOPs is that the solutions produced may be more easily and better adapted to practical situations where uncertainty cannot be neglected, such as trash collection, cash collection from banks, location of emergency services, and so on. Of course, the use of SCOPs instead of DCOPs comes at a price: first, the objective function is typically much more computationally demanding in SCOPs than in DCOPs; second, for a practical application of SCOPs, there is the need to assess probability distributions from real data or subjectively, a task that is far from trivial. For a discussion of the issue of computational burden and complexity in certain SCOP formulations, see for instance Haneveld and van der Vlerk (1999), and Dyer and Stougie (2003). The ways this issue is managed in metaheuristics applied to SCOPs will be described in detail in Sect. 5.

3.2 Formal descriptions of SCOPs

The class of SCOPs is so important and has impact in so many domains that several research areas are dedicated to its study: Stochastic Integer Programming, Markov Decision Processes (which are part of Stochastic Dynamic Programming), and Simulation Optimization being the main ones. Each research area corresponds to a particular way of modeling, formulating and solving optimization problems under uncertainty, and each is often treated separately in the optimization literature. The application of metaheuristics to SCOPs is a quite recent and fast growing research area, and it is thus natural that many of the papers borrow their problem formulations from the classical SCOP literature. In this section we first give a general definition of a SCOP; then, in Sects. 3.2.1 and 3.2.2, we recall the main formal definitions of both static and dynamic SCOPs from the literature, giving pointers to the research areas that originally proposed them.

Let us now give a general definition of SCOP, as proposed by Kall and Wallace (1994).

Definition 2

(SCOP) Consider a probability space (Ω, Σ, P) (Grimmett and Stirzaker 2001), where Ω is the domain of random variables ω (typically a subset of \({\mathbb{R}}^k\)), Σ is a family of “events”, that is, subsets of Ω, and P is a probability distribution on Σ, with P(Ω) = 1. Consider also a finite set S of decision variables x; S is typically a subset of \({\mathbb{R}}^n.\) The random variable ω could also depend on the decision variable x, in which case it is denoted by \(\omega_x\). Given a cost function G and constraint functions \(H_i\), i = 1, 2, ..., m, mapping (x, ω) ∈ (S, Ω) to \({\mathbb{R}},\) find

$$ \left\{ \begin{array}{ll} \mathop{\hbox{``min''}}\limits_{x\, \in \, S }\; G(x, \omega),\\ \hbox{subject to } H_i(x, \omega) \leq 0, \; i=1,2,\ldots,m. \end{array} \right. $$
(2)

Note, however, that according to the above definition a SCOP is not yet well defined, since the meaning of “min”, as well as that of the constraints, is not clear at all (Kall and Wallace 1994). In fact, how could one take a decision on x before knowing the value of ω, and how could one verify whether \(H_i(x, \omega) \leq 0\) if ω is not yet known? Moreover, since ω is a random variable, \(G(x,\omega)\) and \(H_i(x,\omega)\) are also random variables. For these reasons, the definition of SCOPs must be refined. There are several possibilities to do this, giving rise to different SCOP variants, both static and dynamic. These are also called deterministic equivalents of Definition 2. Let us first focus on static SCOPs, and later on dynamic SCOPs.

3.2.1 Static SCOPs

Definition 3

(Stochastic Integer Program—SIP) Given a probability space (Ω,Σ,P), a finite set S of feasible solutions x, a real valued cost function G(x, ω) of the two variables x ∈ S and ω ∈ Ω, and denoting by \({\mathbb{E}}_P (G(x, \omega))\) the expected value of G(x,ω) over Ω according to P, find

$$ \mathop{\hbox{min}}\limits_{x \, \in \,S} \left\{g(x) := {\mathbb{E}}_P (G(x, \omega)) \right\}. $$
(3)

The above definition is perhaps the simplest SCOP formulation, and it does not consider random constraints (observe, though, that deterministic constraints could be implicitly included in the definition of the domain S of decision variables).

In some cases the cost function G is deterministic, that is, G only depends on x and not on the random variable ω, but the constraints do depend on the random variable ω. In such a situation it might be impossible to enforce \(H_i(x,\omega) \leq 0\) for all ω ∈ Ω. Thus, one could relax the notion of constraint satisfaction by allowing constraint violation, and by imposing that constraints are satisfied at least with some given probabilities. This leads to the following

Definition 4

(Chance Constrained Integer Program—CCIP) Given a probability space (Ω, Σ, P), a finite set S of feasible solutions x, a real valued cost function G(x), a set of real valued constraint functions \(H_i(x, \omega)\), and a set of constraint violation probabilities \(\alpha_i\), with \(0 \leq \alpha_i \leq 1\) and i = 1, 2, ..., m, find

$$ \left\{ \begin{array}{ll} \mathop{\hbox{min}}\limits_{x\, \in \, S } \; G(x),\\ \hbox{subject to } \hbox{Prob}\{ H_i(x,\omega) \leq 0 \} \geq 1- \alpha_i, \; i=1,2,\ldots,m. \end{array} \right. $$
(4)

Both the Stochastic and Chance Constrained Program formulations have been originally proposed in the context of Mathematical Programming applied to SCOPs, and this field is also called in the literature Stochastic Integer Programming (SIP), a subset of the broader field of Stochastic Programming (Birge and Louveaux 1997). The Stochastic Programming community has a very active website (Stochastic Programming Community http://stoprog.org/) where updated bibliographic references and papers are available. Recent surveys on SIP include Haneveld and van der Vlerk (1999) and Kenyon and Morton (2002) (the latter overviews SIP applications in the context of location routing problems). Let us now focus on some dynamic SCOP deterministic equivalents of Definition 2.

3.2.2 Dynamic SCOPs

Informally, a stochastic dynamic problem is a problem where decisions are taken at discrete times t = 1,..., T, the horizon T being finite or infinite. Decisions taken at time t may influence the random events that happen in the environment after t. In dynamic SCOPs the concept of solution used in static SCOPs is no longer valid. For example, in the dynamic TSP that we described in Sect. 3.1, a tour among the set of customers known at the beginning of the day cannot be traveled as it is in practice, but must be modified when new observations (new customers) are made. What the decision maker can do before the observation-decision process starts is to decide which policy (or strategy) to adopt, that is, to specify a set of rules that say what action will be taken for each possible random future event. For example, in the dynamic TSP, a possible policy consists in re-optimizing the portion of the route among the not-yet-visited customers each time a new customer appears. Another policy, which is less computationally expensive but possibly leads to a more costly tour, is to re-optimize in stages, only after a certain number of new customers have appeared. Note that in solving dynamic SCOPs one has to make a double effort: first, decide which policy to adopt; second, given the policy, solve the optimization sub-problems emerging dynamically. Both parts influence the final solution cost, but often the choice of the policy is due to factors that are outside the control of the decision maker. For instance, in the dynamic TSP one could be forced not to re-optimize every time a new customer arrives, in case customers want to know the vehicle arrival time in advance.

Among the dynamic formulations the most common ones are those belonging to the class of Stochastic Programming with Recourse (Two-stage and Multiple-stage Integer Stochastic Programs) and Markov Decision Processes.

Definition 5

(Two-stage Stochastic Integer Program—TSIP) Given a probability space (Ω, Σ, P), a finite set \(S_1\) of first-stage decisions \(x_1\), a finite set \(S_2\) of second-stage decisions \(x_2\), and real valued cost functions \(f_1\) and \(f_2\), find

$$ \mathop{\hbox{min}}\limits_{x_1\, \in \, S_1} \left\{g_1(x_1) := f_1(x_1) + {\mathbb{E}}_P \left( G(x_1,\omega)\right) \right\},$$
(5)

where

$$ G(x_1,\omega) :=\mathop{\hbox{min}}\limits_{x_2\, \in \, S_2(x_1,\omega)} f_2(x_1,x_2,\omega).$$
(6)

Given the above definition, solving a Two-stage Stochastic Integer Program consists in solving two problems: a DCOP for the second stage (Eq. 6), and a Stochastic Integer Program (Definition 3) for the first stage (Eq. 5). The meaning of the two-stage decision process is the following. The first-stage decision \(x_1\) must be taken before knowing the actual value of the random variable ω. After the value of ω is observed, it may be convenient or necessary to take some other decision (the second-stage decision \(x_2\)) in order to better adapt to the new situation discovered. The second-stage decision is also called the recourse action, because in some practical situations it has the effect of ‘repairing’ the consequences of an action (\(x_1\)) taken before knowing the value of the random variable. Informally, a Two-stage Stochastic Integer Program consists in finding the best decision now, under the hypothesis that I will also take the best decision when I know the value of the random quantities. A practical example of a Two-stage Stochastic Integer Program is the Vehicle Routing Problem with Stochastic Demands (VRPSD), where a set of customers is given, and for each customer a probability distribution of its demand is known, as depicted in Fig. 2. One vehicle of fixed capacity Q must travel among the customers to deliver (or pick up) some goods, minimizing the expected length of the tour. The vehicle tour is decided prior to knowing the actual demand of each customer. As the vehicle travels along the tour, the driver discovers the actual demand of a customer only when arriving at that customer. When a customer demand is known and the customer has been serviced, the next best decision may be to go back to the depot for replenishment, or to proceed to the next planned customer. The choice between these options is part of the second-stage optimization problem. In this context, the tour planned a priori may be interpreted as the first-stage decision \(x_1\), while the set of return trips to the depot may be interpreted as the second-stage decision \(x_2\).

Fig. 2 An instance of the vehicle routing problem with stochastic demands (VRPSD), which is an example of a Two-stage Stochastic Integer Programming problem (TSIP)
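To make the two-stage structure concrete, the following sketch (a minimal illustration of the recourse policy just described, with our own function names, and assuming each realized demand is at most Q) evaluates the second-stage cost of a given a priori tour for one realization of the demands:

```python
def vrpsd_recourse_cost(tour, demands, dist, capacity, depot=0):
    """Second-stage cost of an a priori tour (first-stage decision x1)
    for one realization of the customer demands, under the recourse
    policy 'return to the depot for replenishment whenever the
    remaining load is insufficient' (second-stage decision x2).

    tour:     customer indices in a priori order (depot excluded).
    demands:  realized demand of each customer (each assumed <= Q).
    dist:     matrix of travel distances; the depot is node `depot`.
    capacity: vehicle capacity Q.
    """
    load = capacity
    cost = dist[depot][tour[0]]              # depot -> first customer
    for i, c in enumerate(tour):
        if demands[c] > load:                # recourse: restocking trip
            cost += dist[c][depot] + dist[depot][c]
            load = capacity
        load -= demands[c]
        nxt = tour[i + 1] if i + 1 < len(tour) else depot
        cost += dist[c][nxt]                 # proceed along the tour
    return cost
```

The first-stage objective of Eq. 5 can then be estimated by averaging this second-stage cost over sampled demand scenarios, in the spirit of the sampling approximation discussed in Sect. 4.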

The Two-stage Stochastic Integer Program may be easily extended to the general Multi-stage case.

Definition 6

(Multi-stage Stochastic Integer Program—MSIP) Consider T decision stages t = 1, 2,..., T, and, correspondingly, T decision variables \(x_t \in S_t\) (with \(S_t\) finite sets depending on \((x_1, \ldots, x_{t-1}, \omega_1, \ldots, \omega_{t-1})\)), and T random variables \(\omega_t\) belonging to probability spaces \((\Omega_t, \Sigma_t, P_t)\). The problem consists in finding

$$ \mathop{\hbox{min}}\limits_{x_1\, \in \, S_1} \left\{g(x) := f_1(x_1) + {\mathbb{E}}_{P_1} ( G_1(x_1, \omega_1) ) \right\} $$
(7)

where, for t = 1,2,..., T − 2,

$$ \begin{aligned} G_t (x_1,\ldots, x_t,\omega_1,\ldots \omega_t)&=\mathop{\hbox{min}}\limits_{x_{t+1}\, \in \, S_{t+1}}\left[ f_{t+1}(x_1,\ldots, x_{t+1},\omega_1,\ldots, \omega_{t+1})\right.\\ &\left.+{\mathbb{E}}_{P_{t+1}} (G_{t+1}(x_1,\ldots, x_{t+1},\omega_1,\ldots, \omega_{t+1}))\right], \end{aligned} $$
(8)

and

$$ G_{T-1}(x_1,\ldots, x_{T-1},\omega_1,\ldots, \omega_{T-1})= \mathop{\hbox{min}}\limits_{x_{T} \, \in \, S_T} f_T(x_1,\ldots, x_T, \omega_1, \ldots, \omega_T).$$
(9)

Observe that, from the above definition, solving a Multi-stage Stochastic Integer Program consists in solving one DCOP for the last stage (Eq. 9), and T − 1 Stochastic Integer Programs for the intermediate stages (Eqs. 7, 8).

For the sake of completeness, we note that MSIPs are also related to the domain of Stochastic Dynamic Programming and Markov Decision Processes, which we do not treat in this survey. The interested reader may find a clear exposition of the relation between MSIP and Stochastic Dynamic Programming in Kall and Wallace (1994), and references to the few papers applying metaheuristics to Markov Decision Processes in Bianchi (2006).

4 Applying metaheuristics to SCOPs

In this section we introduce the key issue to be addressed when extending a metaheuristic from DCOPs to SCOPs: the computation of the objective function and of different types of objective function approximations.

As we have seen in the previous section, all SCOP formulations involve the computation of one or more expected values for evaluating the objective function. As a consequence, three different situations may arise when computing SCOP objective functions:

  1. closed-form expressions for the expected values are available, and the objective function is computed exactly based on these expressions;

  2. as in case 1, closed-form expressions for the expected values are available, but the objective function is considered to be too time consuming to be always computed during optimization. Therefore, ad hoc and fast approximations of the objective are designed and used during optimization (possibly alternating exact and approximated evaluations);

  3. the problem is so complex in terms of decision variables and/or in terms of probabilistic dependences that no closed-form expression exists for the expected values; therefore, the objective function is estimated by simulation.

All three of the above situations have been addressed in the metaheuristics literature, as summarized by Table 2. Let us now give some introductory information on the use of ad hoc and sampling approximations in SCOPs.

Table 2 Classification of metaheuristics papers according to SCOP formulation (rows) and type of objective function computation (columns)

4.1 Ad hoc approximations

The design of ad hoc approximations is strongly problem dependent, and no general rule exists for finding efficient approximations of the objective function. Examples of ad hoc approximations in the literature include: the use of the objective function of a DCOP similar in some respects to the SCOP considered; the use of truncated expressions for the expected values, neglecting terms that are estimated to be small; and the use of scenarios, instead of the true probabilistic model. While ad hoc approximations accelerate the evaluation and comparison of solutions, they also introduce a systematic error in the computation of objective function values. Usually, the systematic error cannot be reduced unless a different, more precise ad hoc approximation is designed, and it can only be evaluated by comparison with the exact objective value. Thus, metaheuristics typically alternate exact and approximated evaluations during the optimization process. More details about the different ways ad hoc approximations are used by metaheuristics will follow in Sect. 5.

4.2 Simulation approximation

When a closed-form expression for the expected value(s) is not available, one common choice is to estimate expectations by Monte Carlo-type simulations. For example, in the case of the Stochastic Integer Program (Definition 3), the objective function g(x) is typically approximated by the sample average

$$ g_N(x):= \frac{1}{N} \sum\limits_{j=1}^{N} G(x, \omega_j) $$
(10)

where \(\omega_1, \omega_2, \ldots, \omega_N\) is a random sample of N independent, identically distributed (i.i.d.) realizations of the random vector ω. The sample average is also referred to as the sample estimate, and the random realizations as random scenarios. In this paper, we will use these terms interchangeably.
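In code, the sample average of Eq. 10 amounts to the following (a minimal sketch with hypothetical function names):

```python
def sample_average(x, G, sample_omega, N):
    """Monte Carlo estimate g_N(x) of g(x) = E_P[G(x, omega)], Eq. 10.

    x:            a candidate solution.
    G:            cost function G(x, omega).
    sample_omega: callable drawing one i.i.d. scenario omega_j.
    N:            sample size.
    """
    return sum(G(x, sample_omega()) for _ in range(N)) / N
```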

The main difference between SCOPs requiring simulation for estimating the objective function and DCOPs (or SCOPs with an exactly computable objective function) is that, in the former case, it is not possible to decide with certainty whether one solution is better than another. This can only be tested by statistical sampling, obtaining a correct comparison result only with a certain probability. Thus, the way the simulation approximation is used in metaheuristics largely depends on the way solutions are compared and the best solution among a set of solutions is selected (‘selection-of-the-best’ method).

A huge research area devoted to solving problems with a simulated objective function is Simulation Optimization. Following the definition given by Fu (2003), Simulation Optimization means “searching for the settings of controllable decision variables that yield the maximum or minimum expected performance of a stochastic system that is presented by a simulation model.” A compact picture of the field is given by the reviews of the Winter Simulation Conference (Andradóttir 1998; Ólafsson and Kim 2002). The latest one, by Ólafsson and Kim (2002), emphasizes discrete problems and practical approaches, including some references to metaheuristics. Until a few years ago, the literature on Simulation Optimization was especially focussed on theoretical convergence results for mathematically elegant algorithms. Interestingly, as noted by Fu (2002), the many new commercial software packages for simulation do not take advantage of the theoretical results of the literature. On the contrary, most of them rely on metaheuristics such as GA and Neural Networks, which are more easily adaptable to complex real-world simulations, but their integration into commercial packages often lacks rigor and is not provably convergent. Fu speculates that an interesting direction of research would be the development of algorithms that take advantage of the theoretical results of the literature, but are still flexible and applicable to real-world situations, so as to fill the gap between theory and practice. Indeed, recent developments in Simulation Optimization, especially those relying on metaheuristics, go in this direction.

5 Literature review

This section focuses on the application to SCOPs of the metaheuristics introduced in Sect. 2, namely ACO, EC, SA, and TS, plus algorithms such as SPM, PH, and RO (see Table 3) that could be called metaheuristics, even if they are not commonly known as such, or that make use of metaheuristics as part of the algorithmic procedure.

Table 3 Acronyms used for the metaheuristics described in this paper

For each metaheuristic, the reviewed papers have been grouped into two paragraphs, focusing respectively on papers that make use of one or more types of ad hoc approximation, and on papers that make use of the simulation approximation.

5.1 ACO for SCOPs

The application of ACO to SCOPs is at an early stage, the first works having appeared at conferences after the year 2000. Nevertheless, the ACO literature already contains both theoretical and experimental works, covering both static and dynamic SCOPs.

Ad hoc approximation. The first SCOPs that have been addressed by ACO are the probabilistic TSP (PTSP), in Bianchi et al. (2002a, b) and Branke and Guntsch (2003, 2004), and the vehicle routing problem with stochastic demands (VRPSD) in Bianchi et al. (2004, 2006).

The PTSP and the VRPSD have in common the fact that their solution structure (and the corresponding construction graph) is very similar to that of their deterministic counterparts (the TSP and the capacitated VRP, respectively). The main difference from the respective deterministic counterpart is the much higher computational complexity of the objective function in the stochastic version of the problem. In the PTSP, the objective function is computable in \(O(n^2)\) time, n being the number of customers, while in the TSP it only requires O(n) time. In the VRPSD, the objective requires O(nKQ) time, where n is the number of customers, K is the number of possible demand values of each customer, and Q is the vehicle capacity, while the capacitated VRP objective only requires O(n) time. The fact that the difference between the stochastic and deterministic versions of these problems lies mainly in the objective function makes them particularly appropriate for studying a first application of ACO (or of any other metaheuristic) to SCOPs. In fact, in this case it is possible to apply to the stochastic problem an ACO algorithm originally designed for the deterministic problem with almost no modifications.
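As an illustration of the \(O(n^2)\) computation mentioned above, the following sketch evaluates the PTSP objective (the expected length of an a priori tour) using the standard closed-form expression for the homogeneous case, in which each customer independently requires a visit with probability p; function and variable names are our own:

```python
def ptsp_expected_length(tour, dist, p):
    """Expected length of an a priori PTSP tour, homogeneous case.

    Customers tour[i] and tour[(i + r) % n] are consecutive in the
    realized tour exactly when both require a visit (probability p^2)
    and the r - 1 customers between them do not (probability q^(r-1)).
    Summing over all pairs gives the O(n^2) closed form.
    """
    n = len(tour)
    q = 1.0 - p
    expected = 0.0
    for i in range(n):
        for r in range(1, n):
            expected += (dist[tour[i]][tour[(i + r) % n]]
                         * p ** 2 * q ** (r - 1))
    return expected
```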

In Bianchi et al. (2002a, b), the authors experimentally investigate two versions of ACO on the PTSP: ACS and pACS. ACS, which was originally designed for the TSP by Gambardella and Dorigo (1996) and by Dorigo and Gambardella (1997), solves the PTSP using the objective function of the TSP (the length of a Hamiltonian path) as a rough but fast approximation of the PTSP objective function. The second version of ACO considered in Bianchi et al. (2002a, b), pACS, is identical to ACS except for the fact that it uses the exact PTSP objective function (the expected length). The two ACO versions use the same TSP-specific heuristic information (the reciprocal of the distance between two customers). Experimental results on PTSP instances with homogeneous customer probabilities have shown that pACS is better than ACS, except when the customer probabilities are close to 1, in which case ACS is more efficient than pACS. This means that the overhead of the time consuming PTSP objective function is not justified in those cases where the approximate objective function, which can be computed much faster, is close enough to the exact one. The idea of employing faster approximations of the exact objective function has been further developed in Branke and Guntsch (2003, 2004). The authors propose an ad hoc approximation of the expected cost that neglects the least probable customer configurations. This approximation is shown experimentally to accelerate convergence without significantly worsening the solution quality. Another issue addressed by Branke and Guntsch (2003, 2004) is the design of PTSP-specific heuristics to guide the ants’ construction process. The authors experimentally analyze different heuristics, and show that one of them indeed improves the quality of the solutions constructed by the ants, but at the cost of a higher computational time.

An important aspect in designing ACO (and most metaheuristics) for SCOPs is the application of a local search procedure to improve solutions. In ACO, the local search is part of the \({\mathsf{DaemonActions}}\) of Algorithm 1. In order to be competitive with state-of-the-art algorithms, it has been necessary for ACO algorithms to use a local search both in the PTSP (Bianchi 2006; Branke and Guntsch 2004) and in the VRPSD (Bianchi et al. 2004). Unfortunately, designing an effective local search for a stochastic problem with a computationally expensive objective function may be quite a challenging task. The reason is that in local search it is very important to compute efficiently the change, or ‘delta’, of the objective function between two neighboring solutions. When the objective function is complex, as in most SCOPs, it is difficult to find a delta expression which is both exact and fast to compute. For the PTSP, Bianchi et al. (2005) and Bianchi and Campbell (2007) derived fast and exact recursive expressions for the objective function delta of two local search operators, the 1-shift and the 2-p-opt. The 1-shift and 2-p-opt are very efficient, since they can explore the whole neighborhood of a solution in \(O(n^2)\) time, the same time it would take for the same operators in the TSP. Note that this result could be applied not only to improve the performance of ACO, as done by Bianchi (2006), but also to improve the performance of EC algorithms, by applying the local search to some selected solutions. The 1-shift and 2-p-opt neighborhoods with fast delta computation could also be used as building blocks of SA and TS algorithms.

For the VRPSD, a local search operator is available, the OrOpt, with an efficient ad hoc approximated delta expression that has been introduced by Yang et al. (2000) (we call this the 'VRPSD approximation'). In Bianchi et al. (2004, 2006), besides the VRPSD approximation, an approximation based on computing the length difference between two neighboring solutions has been considered. This last approximation is equivalent to treating a VRPSD solution (which is a Hamiltonian path) like a solution for the TSP, and it is faster but less accurate than the VRPSD approximation. In Bianchi et al. (2004, 2006), the impact of using the two above types of delta approximation has been tested on several metaheuristics, namely ACO, EC, SA, TS, and Iterated Local Search. In ACO, the use of the rough but efficient TSP approximation led to better results than the VRPSD approximation (even though ACO was not able to reach the quality of the best performing metaheuristics, which were Iterated Local Search and EC).

Sampling approximation. When ACO is applied to this type of problem, the \({\mathsf{DeamonActions}}\) procedure (Algorithm 1) must implement ways of performing statistical tests to compare the sample average values of the solutions generated by the ants, in order to select the best solution (or a set of best solutions). Sampling could also be used in \({\mathsf{ConstructAntsSolutions}},\) in order to estimate the heuristic values \(\eta_{k,l}(u)\) when the chosen heuristic depends on random variables.

The first sampling-based ACO, called S-ACO, has been proposed and analyzed by Gutjahr in two papers (Gutjahr 2003, 2004). The first paper (Gutjahr 2003) theoretically analyzes S-ACO, proving convergence to the optimal solution with probability one. The second paper (Gutjahr 2004) experimentally studies S-ACO on two stochastic routing problems, the PTSP and the TSP with time windows and stochastic service times (TSPTW). S-ACO has been applied in a third paper by Rauner et al. (2005) to a policy optimization problem in healthcare management. Algorithm 5 summarizes the functioning of S-ACO, showing in particular how sampling is used; for details about procedures \({\mathsf{ConstructAntsSolutions}}\) and \({\mathsf{UpdatePheromone}},\) see Gutjahr (2003, 2004). In every iteration, after the ants have constructed their solutions \(x_\sigma,\) only one ant solution x is selected for further comparison with the current best solution (step 3 of Algorithm 5). Interestingly, for the sake of convergence, it does not matter how the ant solution is selected (Gutjahr 2003). A possible way to do it, which has been chosen in Gutjahr (2004), is to evaluate each \(x_\sigma\) on a same random scenario drawn specifically for a given iteration, and to take x as the solution with the best value. In the case of the more complex problem treated in Rauner et al. (2005), selecting x based on a single random scenario turned out to be suboptimal; better results were obtained by choosing several (but not too many) scenarios. After x has been selected, it is evaluated again, together with the current best solution x*, in order to decide whether it is better than x*. This is done by estimating x by sampling over \(N_k\) scenarios \(\omega_\nu\) and x* over \(N_k\) scenarios \(\omega'_\nu.\) In the convergence proof of Gutjahr (2003), it has been necessary to impose that \(\omega_\nu\) and \(\omega'_\nu\) are independent, but in practice (Gutjahr 2004), S-ACO also performs well if \(\omega_\nu = \omega'_\nu.\) The number of sample scenarios \(N_k\) is a critical parameter of S-ACO: if it is too small, the estimation and comparison of solutions will often be faulty, but if \(N_k\) is too big, the computational time required for one solution evaluation could become a problem. As shown in Gutjahr (2003), for proving convergence it is sufficient that \(N_k\) increases linearly with the iteration number k. This result is especially interesting if compared with the faster-than-quadratic increase recognized as necessary for the corresponding SA approach in Gutjahr and Pflug (1996) and Homem-de-Mello (2000). In practical implementations of S-ACO, it may be more convenient to choose the sample size \(N_k\) adaptively, based on some statistical test. In Gutjahr (2004), one version of S-ACO establishes the sample size by means of a parametric statistical test: \(N_k\) is gradually increased until the difference between the sample estimates for the two solutions being compared is larger than 3 times their estimated standard deviation. This kind of sample schedule, also known as a variable-sample strategy, has been theoretically analyzed in the context of random search algorithms by Homem-de-Mello (2003).
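The core comparison step can be sketched as follows (a minimal illustration, not Gutjahr's implementation: `eval_scenario(x, w)` returning G(x, ω), `draw_scenario`, and the growth constant `c` are our own assumptions; independent scenario sets are drawn for the two solutions, as the convergence proof requires).

```python
def sample_average(x, eval_scenario, scenarios):
    """Sample average estimate of g(x) over a list of scenarios."""
    return sum(eval_scenario(x, w) for w in scenarios) / len(scenarios)

def saco_compare(x, x_best, eval_scenario, draw_scenario, k, c=5):
    """S-ACO-style comparison of the selected ant solution x with x*.

    The sample size N_k grows linearly with the iteration number k,
    which suffices for convergence (Gutjahr 2003); in practice the same
    scenario set may be reused for both solutions (Gutjahr 2004).
    """
    n_k = c * k
    g_x = sample_average(x, eval_scenario,
                         [draw_scenario() for _ in range(n_k)])
    g_best = sample_average(x_best, eval_scenario,
                            [draw_scenario() for _ in range(n_k)])
    return x if g_x < g_best else x_best
```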

More recently, the ACO/F-Race algorithm has been proposed by Birattari et al. (2005, 2006), where at each iteration the selection of the new best solution (steps 3–12 of Algorithm 5) is done with a procedure called F-Race, which is more sophisticated than the simple parametric test of S-ACO. As explained in Birattari et al. (2005), F-Race consists of a series of steps, at each of which a new scenario ω is sampled and used for evaluating the solutions that are still in the race (at the beginning, all solutions generated by ants in a given iteration, together with the current best solution, are in the race). At each step, a Friedman test is performed and solutions that are statistically dominated by at least one other solution are discarded from the race. The solution that wins the race is stored as the new current best solution. Preliminary experiments on homogeneous instances of the PTSP have shown that ACO/F-Race improves over the parametric procedure adopted by S-ACO.
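For illustration, a much simplified race can be sketched as follows (assuming minimization; `eval_scenario` and `draw_scenario` are hypothetical callbacks, and the crude 'keep the better half' elimination rule stands in for the proper post-hoc pairwise comparisons of the actual F-Race procedure).

```python
from scipy.stats import friedmanchisquare

def f_race(candidates, eval_scenario, draw_scenario,
           min_steps=5, max_steps=100, alpha=0.05):
    """Simplified F-Race: evaluate surviving candidates on a common
    stream of sampled scenarios; once a Friedman test signals significant
    differences, discard the candidates with the worse sample averages."""
    alive = list(range(len(candidates)))
    results = [[] for _ in candidates]
    for step in range(max_steps):
        w = draw_scenario()                      # one scenario per step
        for i in alive:
            results[i].append(eval_scenario(candidates[i], w))
        if step + 1 >= min_steps and len(alive) > 2:
            _, pval = friedmanchisquare(*(results[i] for i in alive))
            if pval < alpha:
                means = {i: sum(results[i]) / len(results[i]) for i in alive}
                cutoff = sorted(means.values())[len(alive) // 2]
                alive = [i for i in alive if means[i] <= cutoff]
        if len(alive) == 1:
            break
    return candidates[min(alive,
                          key=lambda i: sum(results[i]) / len(results[i]))]
```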

5.2 EC for SCOPs

There is a very large amount of literature on applying EC to optimization problems 'under uncertainty', such as problems with noisy fitness, with time-varying and dynamic fitness, and with approximated fitness. This huge stream of research is certainly related to SCOPs, but it does not exactly match the scope of our survey, as we have defined it in Sect. 3. In fact, although SCOPs are a particular case of optimization under uncertainty, the translation to the SCOP domain of the results from the EC literature on optimization under uncertainty is not easy. The main difficulty is that most papers focus on continuous optimization, and they often restrict their attention to optimization problems characterized by ad hoc test functions, such as the 'spherical' objective function \(f({\mathbf{x}}) = {\mathbf{x}}^T {\mathbf{x}}, {\mathbf{x}} \, \in \, {\mathbb{R}}^N.\) Even when discrete optimization problems are considered, experiments are often restricted to the 'onemax bit-counting' function, which can be regarded as the counterpart of the spherical objective function in binary search spaces.

In the following, we mainly consider contributions of EC that precisely belong to the SCOP domain. A survey of the broader field of EC for problems 'under uncertainty' would be out of scope and impossible within the space limitations; therefore, we just highlight the main ideas and methods that may be relevant for SCOPs. The reader interested in EC applied to optimization with noisy fitness or with time-varying information may find references in the survey by Jin and Branke (2005). Other reviews include a book by Arnold (2002) and a paper by Beyer (2000). For problems involving time-varying information, there are excellent surveys by Morrison (2004) and Branke (2001, 2002).

Ad hoc approximation. We have identified two groups of papers. In the first group (Easton and Mansour 1999; Mak and Guo 2004), EC algorithms use the exactly computable objective function as it is, even if computationally expensive, while in the second group (Bianchi et al. 2004, 2006, and the references cited in Jin 2005) EC also exploits computationally more efficient objective function (fitness) approximations. Let us briefly analyze these two groups of papers.

Easton and Mansour (1999) apply a distributed GA to three different labor scheduling problems, one of which is formulated as a stochastic goal programming problem. Their algorithm operates in parallel on a network of three workstations. Separate sub-populations evolve independently on each processor, but occasionally the fittest solutions migrate over the network to join the other sub-populations. Infeasible solutions are also accepted (with a fitness penalty) in order to encourage the exploration of promising regions of the search space. The proposed GA is compared experimentally to a SA and a TS metaheuristic previously developed by other authors (in Brusco and Jacobs (1993a, b) and in Easton and Rossin (1996), respectively), and it is shown to outperform both of them.

Mak and Guo (2004) consider a vehicle routing problem with stochastic demand and soft time windows, which is formulated as a Two-stage Stochastic Integer Program (Definition 5). The authors propose an EC algorithm called Age-GA where, instead of being replaced by their offspring after each iteration, individuals may grow up and generate new offspring continuously before death, and the population comprises individuals from various age-groups. With the same amount of computational effort, it is possible to use a larger population size in Age-GA than in a canonical GA. The paper shows that, on a set of eighteen randomly generated instances, Age-GA outperforms a canonical GA without the aging mechanism.

Liu (2007) considers a hybrid Scatter Search evolutionary algorithm for the PTSP that incorporates a nearest-neighbour constructive heuristic, a threshold-accepting screening mechanism, and a crossover operator. The algorithm is shown experimentally to outperform other heuristic and metaheuristic approaches based on SA. Liu et al. (2007) investigate the effectiveness of a diversified crossover operator within an evolutionary algorithm framework to solve the PTSP. They combine different crossover operators, such as order crossover, order-based crossover, and position-based crossover, and test their effectiveness experimentally.

To the group of papers using ad hoc approximations of the fitness function in SCOPs belong the papers by Bianchi et al. (2004, 2006) (also cited in Sect. 5.1), which compare a simple EC with other metaheuristics (ACO, SA, TS, and Iterated Local Search) on the VRPSD. Similarly to the other metaheuristics, EC is integrated with the OrOpt local search operator, where two approximations for the objective value difference between neighboring solutions have been tested, the VRPSD and the TSP approximation. The exact VRPSD objective function is used for accepting a new solution in the local search, and for the selection of a new population. EC, like ACO and Iterated Local Search, performs better with the TSP approximation than with the VRPSD approximation, as shown in Fig. 3. Interestingly, EC improves even further when, in Bianchi et al. (2006), a more TSP-specific local search (3-opt) is used instead of OrOpt. EC, together with Iterated Local Search, is shown to be the best performing among the tested metaheuristics.

Fig. 3 Effect of using two different ad hoc approximations (called, respectively, TSP approximation and VRPSD approximation) in EC and other metaheuristics analyzed in Bianchi et al. (2006)

Here, it is useful to note that there is a thread in the EC literature that focuses on the use of computationally efficient approximations of the original fitness in continuous optimization problems. Some aspects of this issue that have been developed in the context of continuous optimization may be relevant to SCOPs as well. Fitness approximations are also known as approximate models, meta-models, or surrogates. A comprehensive survey on fitness approximation in EC has been written by Jin (2005). This growing research area is particularly oriented to continuous optimization problems with extremely time-consuming objective function computations, such as, for instance, structural design optimization (Barthelemy and Haftka 1993), where one single fitness evaluation may take over ten hours on a high-performance computer. The issue of how the approximate model can be incorporated in the EC algorithm, which has been widely addressed by the EC literature on fitness approximation, is quite independent of the continuous or discrete nature of the optimization problem. Nevertheless, most of these ideas have not yet been applied to SCOPs. For a review and pointers to existing literature, see Sect. 4 of Jin (2005).

Sampling approximation. The EC literature on optimization with noisy fitness functions is also relevant for SCOPs with sampling-estimated objective function. In fact, noise is mostly assumed to be additive with zero mean, which is the case when Monte Carlo sampling is used to estimate the objective function. Section II of Jin and Branke (2005) is a good overview of the methodological approaches used in EC to deal with noisy fitness functions. The authors identify three main strategies for dealing with noise: explicit averaging, implicit averaging, and modifying the selection process. In the following we explain the working principles of these strategies; for a detailed analysis and complete references, see Jin and Branke (2005).

Explicit averaging corresponds to computing sample averages of the fitness function: repeated measures of the noisy fitness are performed and their average is taken. This is very similar to the Simulation Optimization technique we have illustrated in Sect. 4.2. The second type of strategy for dealing with noise in EC is implicit averaging. Its aim is to reduce the influence of noise by using a large population size, instead of performing more fitness measures of a single individual. The intuition behind implicit averaging is the following (Jin and Branke 2005, p. 305): because promising areas of the search space are sampled repeatedly by the EC algorithm, and there are usually many similar solutions in the population, when the population is large, the influence of noise in evaluating an individual is very likely to be compensated by that of a similar individual. This can be regarded as an implicit averaging effect. The third strategy used in EC to reduce the influence of noise is modifying the selection process of individuals. One example is to accept an offspring individual only if its fitness is better than that of its parents by at least a predefined threshold. Another way of modifying the selection process with respect to standard EC is to eliminate random choices during selection, in order to exploit the uncertainty due to the noisy fitness as a sort of randomization effect. Convergence properties of EC and the dynamics of the fitness function when noise is present have been analyzed in several papers, for example by Miller and Goldberg (1997) and by Beyer (2000).
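Explicit averaging, in particular, is straightforward to sketch (our own illustration; `noisy_fitness` is a hypothetical evaluation callback):

```python
def explicit_average_fitness(individual, noisy_fitness, m=30):
    """Explicit averaging: re-sample the noisy fitness m times and average.

    For additive zero-mean noise this reduces the noise standard deviation
    by a factor sqrt(m), at m times the evaluation cost; implicit averaging
    instead spends the same budget on a larger population.
    """
    return sum(noisy_fitness(individual) for _ in range(m)) / m
```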

In all the above-cited papers aiming at reducing the influence of noise via implicit and explicit averaging, or via modifications of the selection process, the computational experience is unfortunately limited to ad hoc continuous or discrete test functions. It appears that an experimental validation of the various techniques in the SCOP domain is still missing in the literature. In fact, the few papers applying EC to SCOPs that we are going to outline below either use very simple techniques for dealing with noise or rely on methods that are unrelated to the main EC literature on noisy fitness functions.

Watson et al. (1999) address a stochastic warehouse scheduling problem where the objective function must be estimated by simulation. The authors consider a GA, a solution construction heuristic specific to that problem, two local search algorithms, and a random search algorithm. Two versions of the GA and the local search algorithms are considered, where the (set of) starting solution(s) is randomly generated in one case and provided by the constructive heuristic in the other. In order to keep the run time of the algorithms feasible, the simulator is used in a fast but inaccurate mode. Only the final solutions are eventually evaluated with a more accurate (two orders of magnitude slower) simulator mode. The constructive heuristic exploits specific knowledge about the internal states of the simulator in order to construct a solution. In the GA and the local search algorithms, instead, the simulator is used as a black box that, given a solution, returns a real value indicating the solution quality. Experimental results show that the GA initialized with the domain-specific construction heuristic outperforms all the other algorithms. Moreover, all algorithms perform worse when initialized with random solutions. The results also highlight an interesting phenomenon related to the use of a black-box, fast but inaccurate simulator for the evaluation of solutions during the execution of the GA. As better and better solutions according to this simulator are found, it is observed that the correlation with the solution values given by the slow, accurate simulator (evaluated a posteriori) decreases. This implies that the final solution returned by the GA as the best solution may be quite bad with respect to the nearly exact objective value. It is reasonable to think that this is a general phenomenon that can happen in any metaheuristic exploiting a non-exact or noisy objective function evaluation, particularly when estimating the objective function by sampling with a fixed (low) number of samples. One possibility to overcome this problem is to keep in memory a set of promising solutions encountered during the execution of the algorithm and to evaluate them a posteriori with the accurate simulator, or to apply more sophisticated adaptive sampling techniques.

Yoshitomi (2002) and Yoshitomi and Yamaguchi (2003) use GA for solving the stochastic job-shop scheduling problem. In both papers, the best solution is extracted from the set of solutions that have been most frequently present through the generations of the GA. In Yoshitomi and Yamaguchi (2003), Monte Carlo sampling is used to select the best final solution from the set of most frequent solutions.

Other applications of GA based on Monte Carlo sampling are those by Sudhir Ryan Daniel and Rajendran (2005), who apply GA to an inventory optimization problem in a serial supply chain; by Jellouli and Châtelet (2001), who use GA for addressing a supply-chain management problem in a stochastic environment; and by Wang et al. (2008), who apply GA to solve a nonlinear mixed-integer stochastic program arising in resource portfolio planning optimization under demand uncertainty.

5.3 SA for SCOPs

In the literature, several extensions of the SA algorithm described above have been suggested for treating Stochastic Integer Programs (Definition 3), both in the case of ad hoc approximations and in that of sampling approximation.

Ad hoc approximation. One early application of SA in the context of SCOPs is due to Teodorović and Pavković (1992). The authors address a VRPSD with multiple vehicles and use SA in two stages: first to partition the customers among the different vehicles, and then to improve the single-vehicle routes. In this preliminary work, computational results are reported only for one instance with 50 customers.

More recently, the already cited papers by Bianchi et al. (2004, 2006) (see Sects. 5.1, 5.2) have applied to the VRPSD a simple SA algorithm, together with other metaheuristics (ACO, EC, TS, and Iterated Local Search). Similarly to the other metaheuristics, two approximations for the objective value difference between neighboring solutions generated according to the OrOpt scheme have been tested, the VRPSD and the TSP approximation. In contrast to what happens for ACO, SA performs better when using the more accurate but more computationally expensive VRPSD approximation. On average, SA does not perform significantly differently from ACO, and it is not able to reach the quality of the best performing metaheuristics, which are EC and Iterated Local Search.

Sampling approximation. Algorithm 6 shows a typical basic structure of an SA modification for the solution of Stochastic Integer Programs (Definition 3) with sampling-estimated objective function. The approaches from the literature outlined below follow this general scheme. Differences lie particularly in the way step 5 (estimation of the objective value), step 11 (choice of a new approximation of the optimal solution), and step 12 (temperature level) are implemented in Algorithm 6.
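The scheme can be sketched as follows (assuming minimization; the logarithmic cooling schedule, the linearly growing sample size, and all names are illustrative choices, not the specific settings of any of the papers discussed below).

```python
import math
import random

def stochastic_sa(x0, neighbor, eval_scenario, draw_scenario,
                  iters=1000, t0=1.0, c=2):
    """SA skeleton for a SCOP with sampling-estimated objective."""
    def estimate(x, n):  # step 5: sample average over n fresh scenarios
        return sum(eval_scenario(x, draw_scenario()) for _ in range(n)) / n

    x, x_best, g_best = x0, x0, float("inf")
    for k in range(1, iters + 1):
        t_k = t0 / math.log(k + 1)     # step 12: cooling schedule
        n_k = c * k                    # growing sample size
        y = neighbor(x)
        g_x, g_y = estimate(x, n_k), estimate(y, n_k)
        for cand, g in ((x, g_x), (y, g_y)):
            if g < g_best:             # step 11: best estimate so far
                x_best, g_best = cand, g
        if g_y <= g_x or random.random() < math.exp(-(g_y - g_x) / t_k):
            x = y                      # Metropolis acceptance on estimates
    return x_best
```

Keeping the solution with the best estimate so far is only one variant of step 11; since estimates from different iterations are noisy, the approaches below refine this step, e.g., by cumulating samples, counting visits, or using statistical tests.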

Gelfand and Mitter (1989) investigate the case where the observation of the objective function g(x) is disturbed by random noise \(W_k\) in iteration k of the SA process, such that instead of g(x), the estimate \(g_k(x) = g(x) + W_k\) is observed. They show that if \(W_k\) is normally distributed with mean zero and variance \(\sigma_k^2,\) if certain conditions on the values \(\sigma_k\) and on the acceptance probabilities are satisfied, and if the cooling schedule, i.e., the sequence of values \(T_k,\) as required for convergence of ordinary SA, is used, then the convergence property of ordinary SA remains valid.

Gutjahr and Pflug (1996) follow a similar approach by showing that under suitable conditions on the "peakedness" (Birnbaum 1948) of the noise distribution, convergence of the current solutions to the set of optimal solutions can be guaranteed. To be more specific, let us call a symmetric distribution \(\mu_1\) more peaked around zero than a symmetric distribution \(\mu_2\) if, for all t > 0, the probability mass on the interval [−t, t] is larger than or equal under \(\mu_1\) compared with \(\mu_2.\) Then, if the distribution of the noise \(W_k\) is more peaked around zero than a normal distribution \(N(0,\sigma_k^2),\) where \(\sigma_k = O(k^{-\gamma})\) with a constant γ > 1, the distribution of the solution in iteration k converges as \(k \rightarrow \infty\) to the uniform distribution on the set of global optimizers, provided that a suitable cooling schedule (ensuring convergence of ordinary SA) is used. Decreasing \(\sigma_k\) at the required rate can be achieved by increasing the sample size \(N_k\) more than quadratically in k, that is, by imposing that \(N_k = O(k^{\mu})\) with μ > 2. An application of the technique of Gutjahr and Pflug (1996) to a discrete time/cost tradeoff problem in activity planning has been reported in Gutjahr et al. (2000a).

Other approaches have been presented by Roenko (1990), who proposes to store the feasible solutions produced during the execution of the algorithm and to compare them with the solution generated in the current iteration, and by Fox and Heine (1995), who derive a convergence result based on the assumption that, with probability one, the objective function estimates \(g_k(x)\) coincide after some finite time with the true objective function values g(x), as can be achieved by consistent estimators in the case of only finitely many possible objective function values. The latter assumption can also be relaxed if some more complicated condition can be verified, but Fox and Heine argue that in any computer representation, objective function values are taken from some finite domain (given by the machine number precision) anyway. The algorithm indicated by Fox and Heine does not use independent sampling from scratch in each iteration, as is done in Gelfand and Mitter (1989) and Gutjahr and Pflug (1996), but cumulates the sampling results, which is of course advantageous from a computation time viewpoint.

Alrefaei and Andradóttir (1999) pursue a different idea by keeping the temperature parameter \(T_k\) constant during the process instead of decreasing it toward zero (as usual in ordinary SA). To obtain convergence, two alternative techniques are suggested. The first, let us call it A1, consists of the following procedure: in each iteration k, for the current solution x chosen in this iteration, a counter \(V_k(x)\) is increased by one in order to register how often x has been visited since the start of the algorithm. The current number \(V_k(x)\) of visits is divided by the number D(x) of neighbors of x. The estimated optimal solution x* at iteration k is then defined as the solution x for which \(V_k(x)/D(x)\) is maximal, among all the solutions that have been encountered so far. The second technique, let us call it A2, is to estimate the objective function values of solutions x and y (step 5 of Algorithm 6) by cumulating previous estimates of x and y (if any), and then to choose as the new approximation x* of the optimal solution at iteration k the solution with the smallest estimated objective value, among all solutions evaluated so far. Both A1 and A2 compute sample averages with an increasing number of samples at each iteration k.

Alrefaei and Andradóttir show that both alternatives guarantee, under mild conditions, convergence with probability 1 to the set of optimal solutions. Their article also reports on experimental comparisons showing the superiority of the new algorithms over the previous approaches of Gelfand and Mitter (1989), Gutjahr and Pflug (1996), and Fox and Heine (1995); among the two new algorithms, A2 turns out to yield better results than A1. The experiments are restricted to a test instance with only 50 feasible solutions; therefore, it is not clear whether the results can be generalized to larger search spaces. Nevertheless, the empirical findings give some evidence that using the solution with the best objective function estimate so far as the proposed solution may be a very good choice. Interestingly, for the considered test instance, a random-search-like neighborhood structure including all elements of S (different from x) in the neighborhood S(x) of x produces, for all tested algorithms, better results than a more restricted neighborhood. This seems to indicate that in the stochastic case, the hill-climbing feature of SA gains importance only for larger solution spaces S.

A further important contribution of Alrefaei and Andradóttir (1999) is that the article discusses optimization both in a transient and in a steady-state simulation context. It is shown that if g(x) is given as the expectation of a functional G(x,ω) of a stochastic process in either a transient or a steady-state situation, then the theoretical result derived for the simple static SCOP case (corresponding to our Definition 3) remains valid.

One practical limitation of approaches such as the two just described by Alrefaei and Andradóttir (1999) and the one by Roenko (1990) is that they require the storage of information about all or most of the solutions encountered by the algorithm, which is an infeasible task for problems of a combinatorial nature.

Alkhamis et al. (1999) again use a decreasing cooling schedule for the parameters \(T_k.\) They propose to decide on acceptance or rejection of a neighbor solution y by means of a statistical significance test: a confidence interval for the difference between the true objective function values at x and y is computed; depending on the position of the value zero in relation to this confidence interval, the neighbor solution is judged as equal, better, or worse than the current solution x. After that, the usual acceptance rule of SA is applied. The authors are able to show that under certain conditions on sample size and cooling schedule, the classical SA convergence property is still satisfied.
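Such an acceptance test can be sketched as follows (our own minimal illustration using a normal-approximation confidence interval; `obs_x` and `obs_y` are the observation samples at the current and the neighbor solution, assumed to contain at least two values each).

```python
import math

def compare_by_confidence_interval(obs_x, obs_y, z=1.96):
    """Classify neighbor y against current x via a confidence interval
    for the difference of the true objective values (minimization)."""
    n_x, n_y = len(obs_x), len(obs_y)
    mean_x, mean_y = sum(obs_x) / n_x, sum(obs_y) / n_y
    var_x = sum((o - mean_x) ** 2 for o in obs_x) / (n_x - 1)
    var_y = sum((o - mean_y) ** 2 for o in obs_y) / (n_y - 1)
    half = z * math.sqrt(var_x / n_x + var_y / n_y)
    d = mean_y - mean_x
    if d + half < 0:
        return "better"     # whole interval below zero: y judged better
    if d - half > 0:
        return "worse"      # whole interval above zero: y judged worse
    return "equal"          # zero inside the interval
```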

Homem-de-Mello (2000, 2003) presents a comprehensive framework for describing and analyzing variants of SA for SCOPs. The framework enables a thorough theoretical analysis and opens a broader range of flexibility in the choice of sampling distributions. Using ergodicity theory, Homem-de-Mello (2000) proves a rather general convergence theorem for a variable-sample modification of SA. The theorem includes the result in Gutjahr and Pflug (1996) as a special case, but no longer makes use of any normality assumptions on the noise distributions. In Homem-de-Mello (2003), this approach is further generalized beyond the area of SA, although the described analytical techniques and algorithmic ideas remain applicable in an SA context, as well as in the context of other metaheuristics dealing with SCOPs with objective function estimated by sampling. In particular, the author presents the interesting idea of adaptively modifying the sample size \(N_k\) during the iterations of the algorithm, in such a way that \(N_k\) is usually only increased if the result of a t-test indicates that higher accuracy of the objective function estimates is required. To preserve the convergence property, the sample size is increased at some specific points in time regardless of the t-test.

In Alkhamis and Ahmed (2004), the acceptance rule based on confidence intervals developed in Alkhamis et al. (1999) is modified by applying the constant-temperature schedule of Alrefaei and Andradóttir (1999) instead of the classical decreasing temperature schedule. As the current estimated solution, the authors take the solution with the maximum (normalized) number of visits so far. Again, a convergence result is given.

There are also some purely experimental papers involving SA and SCOPs with sampling-estimated objective function. The earliest is a paper by Bulgak and Sanders (1988) addressing a buffer allocation problem in the context of a complex manufacturing system. The objective function to be maximized (the efficiency of the system) is estimated by means of a discrete event simulator. Similarly to Homem-de-Mello (2003), an adaptive sampling procedure is used, where the number of samples is gradually increased for testing whether a candidate solution is statistically better than the current best solution.

Haddock and Mittenthal (1992) investigate the feasibility of using an SA algorithm in conjunction with a simulation model to find the optimal parameter levels at which to operate a system. The authors modify the SA algorithm of Kirkpatrick et al. (1983) by substituting an estimate of the expected value of the system response (the objective function) in all places requiring a deterministic objective function value.

Rosen and Harmonosky (2005) propose a combined procedure, called the RS team method, that improves the SA of Haddock and Mittenthal (1992) by initially searching for good solutions, which are then employed as starting solutions by SA. The initial search for good starting solutions is done by the use of first-order linear approximations of the model, adapting the technique of response surface methodology to the case of a discrete decision space. The RS team method is tested on a simulation model of a semiconductor manufacturing process consisting of over 40 workstations, and it is experimentally compared with the SA algorithm of Haddock and Mittenthal (1992).

Bowler et al. (2003) use a stochastic SA algorithm to experimentally analyze the asymptotic behavior of (sub)optimal homogeneous PTSP solutions, in the limit of pn (customer probability times number of customers) going to infinity. The PTSP objective function is estimated by sampling, and the sampling estimation error is used in place of the annealing temperature. The temperature decrease during the execution of the SA algorithm is mimicked by an increase in the accuracy of the objective function estimation, which, in turn, is obtained by increasing the number of samples.

Finally, two papers, Gutjahr (2004) and Pichitlamken and Nelson (2003), focus on different metaheuristics but involve SA in experimental comparisons. The paper by Gutjahr (2004), which we also cited in Sect. 5.1, focuses on S-ACO and reports experimental comparisons between S-ACO and the S-SA algorithm of Gutjahr and Pflug (1996). Pichitlamken and Nelson (2003), while focusing on a Stochastic Partitioning Method that will be described in Sect. 5.5.1, use the SA algorithm of Alrefaei and Andradóttir (1999) as a term of comparison in the experimental analysis of their algorithm.

Although variants of SA for SCOPs have received a great deal of attention in the last decade, so that, for example, the question under which conditions convergence to the optimal solution is ensured can now be considered relatively well understood, there is a comparably small body of comprehensive experimental results.

5.4 TS for SCOPs

The very few papers in the literature applying TS to SCOPs are of an experimental nature, and address static SCOPs both in the case of ad hoc approximations and in that of sampling approximation.

Ad hoc approximation. As we have already pointed out, one of the major difficulties when solving SCOPs is that the objective function, even if explicitly computable, is computationally expensive. In local-search-based algorithms such as TS, it is crucial to be able to evaluate the neighborhood of a solution efficiently. Therefore, one of the main issues when applying TS to SCOPs is to find efficient approximations of the objective value difference between pairs of neighboring solutions.

Gendreau et al. (1996) propose a TS algorithm for solving the vehicle routing problem with stochastic demands and customers. One of the major contributions of their paper is indeed the development of an easily computed approximation of the objective function, used for the evaluation of potential moves. The proposed TS was quite successful in experiments: for instances with up to about 50 customers, it was able to find optimal solutions in about 90% of the cases, with an average deviation of 0.38% from optimality.

Other papers applying TS to SCOPs are the already cited ones by Bianchi et al. (2004, 2006) (see Sects. 5.1–5.3), where a simple TS algorithm has been compared with other metaheuristics (ACO, EC, SA, and Iterated Local Search). Similarly to the other metaheuristics, two approximations for the objective value difference between neighboring solutions generated according to the OrOpt scheme have been tested, the VRPSD and the TSP approximation. Even though the two approximations have different characteristics (the first is more accurate but more computationally expensive than the second), the quality of the results produced by the two versions of TS seemed to be quite insensitive to the type of approximation (see Fig. 3). In Bianchi et al. (2006), TS obtained better results than ACO and SA, but worse than EC.

Haugland et al. (2007) formulate the problem of designing delivery districts for vehicle routing with stochastic demands, and they propose a TS and a multistart heuristic to solve the problem. An ad hoc approximation of the objective function is used to verify the feasibility of solutions in the proposed algorithms, since the objective function is computationally expensive. TS is shown to outperform multistart.

Sampling approximation. In the literature, two types of contributions may be distinguished: papers that use simulation as a black box for the evaluation of the objective value of solutions, and papers that adapt the simulation procedure to the different components of TS, such as neighborhood exploration, the setting of tabu moves, and the verification of aspiration criteria, in order to speed up the computation.

To the first group belong the papers by Lutz et al. (1998), Finke et al. (2002), and Dengiz and Alabas (2000). These papers apply quite standard TS techniques that are very time-consuming, since the evaluation of solutions by simulation is a time-consuming process, often relying on external or commercial simulation packages. The advantage of using simulation is that in this way the real objective function is considered, in problems where a rigorous mathematical programming formulation would impose severe and unrealistic restrictions.

Among the second group of papers (adapting simulation to the different components of TS), there are Costa and Silver (1998), Aringhieri (2004), and Cheung et al. (2007). Costa and Silver (1998) describe a TS algorithm for a problem in the context of cause-effect analysis, where the true cause of an undesirable effect must be recognized and eliminated. Given that the time to investigate a cause is a random variable with known probability distribution, the goal is to establish a fixed sequence of n causes so as to maximize the expected reward associated with discovering the true cause within a specified time horizon. This problem is also called the stochastic ordering problem with time constraint (SOPTC).

The TS developed in this context, called NTS (Noisy TS), is based on sampling and statistical tests, and is suited for all optimization problems where the evaluation of the objective function is computationally expensive due to the presence of noise in the problem definition. In the following we only describe the characteristics of NTS that are directly related to the stochastic nature of the problem; what we do not describe is part of standard TS techniques for permutation problems. The objective value of a new current solution is computed by a sample average of the type of Eq. 10. The N samples are generated according to the so-called descriptive sampling technique, as described in Jönsson and Silver (1996), in order to obtain a substantial variance reduction with respect to other sampling methods. Descriptive sampling has been adopted by Costa and Silver also because in this way the quality of the estimation of the exact objective value does not depend on the quality of the pseudo-random generator used. The estimation of the objective function takes \(O(Nn)\) time, and if N is large enough to guarantee a good estimation quality, this computation may be quite time-consuming. For this reason, the evaluation of the (possibly many) neighbors of a solution is done with the following method, which relies on a smaller number of samples. A statistical test is used to decide whether a considered neighbor \(y_c \, \in \, A(x,k)\) is better than the best neighbor \(y_b \, \in \, A(x,k)\) examined so far in the current iteration k. The decision is made in two phases. First, a small number \(N_c \ll N\) of samples is randomly generated for estimating the expected value \({g_{N_c}}(y_c)\) of \(y_c.\) The decision as to whether the true objective value of \(y_c\) is higher than that of \(y_b\) is made by hypothesis testing. Second, if the test 'has decided' that \(y_c\) is better than \(y_b,\) this is further ascertained by using all the N samples. If it results that \(g_N(y_c) > g_N(y_b),\) then \(y_b\) is replaced by \(y_c.\) Since N is finite, notwithstanding the use of this double-check procedure, there is a certain probability that \(y_b\) is not truly the best feasible neighbor, and that the best solution so far is updated with a solution that is not truly the best one. In order to lessen the risk of missing a very good solution due to the bad quality of sampling, NTS keeps track of the \(n_s\) best solutions encountered so far. At the end of the run, all solutions in this list are re-evaluated with a number of samples \(\kappa > N,\) and the best solution according to this new estimation is the solution returned by NTS.
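Descriptive sampling itself can be sketched compactly (assuming the inverse cumulative distribution function of the input distribution is available; names are illustrative): the n sampled values are fixed at evenly spaced quantiles, and only their order is randomized, which removes the variability of the values themselves.

```python
import random

def descriptive_sample(inverse_cdf, n):
    """Descriptive sampling: deterministic, evenly spread quantile points
    mapped through the inverse CDF, presented in random order."""
    probs = [(i + 0.5) / n for i in range(n)]   # deterministic spread
    random.shuffle(probs)                        # randomize order only
    return [inverse_cdf(u) for u in probs]
```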

In Costa and Silver (1998), the influence of several factors on the performance of NTS has been experimentally analyzed: the hypothesis testing technique (the t-test, the Wilcoxon test, and the Median test have been compared), and the numbers of samples N, \(N_c,\) and \(\kappa\) to be used in the different phases of NTS. NTS has also been compared with a TS that is identical to NTS, except for the fact that the objective function is computed exactly on the basis of a closed-form expression available for the SOPTC, and no hypothesis test is performed. TS outperforms NTS both in computation time and in solution quality, but its solution quality is only slightly better than that of NTS. This is a result that encourages the use of NTS for problems with very complex, or impossible to compute, objective functions. Note, however, that when a closed-form expression for the objective function is available, even if it is quite computationally expensive as in the SOPTC, it may still be more efficient to use the classical TS algorithm instead of NTS.

An application where sampling is employed to save time with respect to using the exact objective function is the one by Aringhieri (2004), who applies TS to a Chance Constrained Program (Definition 4). Constraints are supposed to be linear functions, that is, in Definition 4 we pose \(H_i(x,\omega) = \sum_j a_{ij} x_j - b_i(\omega),\) with j = 1, 2,…, n and \(x \, \in \, S \subset {\mathbb{R}}^n.\) Note that in this problem, only the vector b is assumed to be random. In the proposed TS, sampling is used to estimate the probability \(p_i(x,k)\) that at iteration k solution x violates constraint i. Given a set of Nm random samples \(b_{i,r},\) i = 1, 2,…, m, r = 1, 2,…, N, the probabilities are estimated as follows

$$ p_i(x,k) = \frac{\sum_{r=1}^N \delta_{i,r}}{N}, \; \hbox{ where } \delta_{i,r} = \left\{ \begin{array}{ll} 1 &\hbox{ if } \sum_j a_{ij} x_j - b_{i,r} > 0 \\ 0 & \hbox{ otherwise} \end{array} \right.$$
(11)
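A direct transcription of this estimator reads as follows (our own sketch; `a` is the m × n coefficient matrix and `b_samples[r][i]` the r-th sampled value of \(b_i\)).

```python
def violation_probabilities(x, a, b_samples):
    """Monte Carlo estimate of the probabilities p_i(x, k) of Eq. 11."""
    m, big_n = len(a), len(b_samples)
    # left-hand sides sum_j a_ij * x_j, computed once per constraint
    lhs = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    return [sum(1 for r in range(big_n) if lhs[i] - b_samples[r][i] > 0) / big_n
            for i in range(m)]
```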

Probabilities \(p_i(x,k)\) are used to define the concept of probably tabu moves, which in practice extends the set of tabu moves. A move is probably tabu at iteration k if it leads to a solution x for which \(p_i(x,k) > \alpha_i,\) i = 1,2,…,m (compare this with Eq. 4). Given the set P(x,k) of probably tabu neighbors of x, the new TS, called SIMTS-CCP (simulation TS for Chance Constrained Programs), can be obtained from Algorithm 4 by modifying the computation of A(x,k) as

$$ A(x,k) = \{y \, \in \, S(x)\backslash T(x,k) \backslash P(x,k) \cup \tilde{T}(x,k) \}. $$
(12)

In Aringhieri (2004) the SIMTS-CCP algorithm has been applied to two NP-hard optimization problems arising in the design of telecommunication networks. Preliminary computational results show that solution quality is comparable to that obtained by a TS algorithm that addresses the problem as deterministic, and the increase in computation time is acceptable.

Finally, Cheung et al. (2007) apply sampling to estimate the cost of TS moves for solving the two-dispatch delivery problem, formulated as a Two-stage Stochastic Integer Programming problem.

5.5 Other metaheuristic approaches for SCOPs

In this section we present metaheuristic approaches, namely Stochastic Partitioning Methods, Progressive Hedging, Rollout algorithms, Particle Swarm Optimization, and Variable Neighborhood Search, that are still less widespread in the nature-inspired/metaheuristic community than the previous ones, but that are emerging as effective methods for combinatorial optimization in general, and for SCOPs in particular.

5.5.1 Stochastic Partitioning Methods

We have grouped under the name of SPM the Beam Search heuristic applied to SCOPs (Beraldi and Ruszczyński 2005; Erel et al. 2005), the Stochastic Branch and Bound (Norkin et al. 1998a, b), and the combined procedure inspired by Nested Partitions (Pichitlamken and Nelson 2003). These methods, explicitly designed for SCOPs, follow in different ways the same search strategy: the search space is recursively partitioned into sub-spaces, and the computational effort is concentrated on the sub-spaces that are estimated to be the most promising ones. SPM are not usually considered as belonging to the class of metaheuristics, but they could be, since inside the general search strategy several heuristics may be employed for the evaluation of search sub-spaces, for the improvement of solutions, and for the estimation of and comparison among solutions. In the following, we introduce the different SPM methods in the context of the type of SCOP that each method mainly focuses on.

Ad hoc approximation. Beam Search (BS) is a heuristic strategy closely related to Branch and Bound, where the search space is recursively partitioned into sub-spaces, for which upper and lower bounds of the objective function are computed in order to guide the search toward the more promising partitions. Unlike Branch and Bound, BS reduces the width of the search by moving downward in the search tree only from a limited number of most promising nodes. The success of BS depends on the evaluation function that is used to select the nodes that will be further explored. Typically, different evaluation functions are used in BS: first, a simple but imprecise evaluation function is used to discard some nodes (this phase is called filtering); second, nodes that survive filtering are subject to a more precise and time-consuming evaluation. Thus, the main principles behind BS (partitioning the search space and concentrating the computational effort on specific partitions) are similar to those of SBB and NP, described below. BS has only recently been applied to SCOPs. Beraldi and Ruszczyński (2005) consider chance constrained problems like the ones we described in Sect. 3.2.1. They apply BS to a set covering problem with probabilistic constraints, and show experimentally that BS allows a considerable time saving with respect to an exact Branch and Bound algorithm, while the solution quality of BS ranges from optimal to 5% worse than optimal. Erel et al. (2005) present a BS-based method for the stochastic assembly line balancing problem in U-lines. Computational experiments indicate that the average performance of the proposed method is better than the best-known heuristic in the literature for the traditional straight-line problem.

Sampling approximation. Stochastic Branch and Bound (SBB) was first proposed by Norkin et al. (1998a) as a method for solving problems where the objective function must be estimated by sampling, as described in Sect. 4.2. This algorithm extends to SCOPs the main principle of classical Branch and Bound, that is, the computation of upper and lower bounds of the objective function over portions of the search space in order to guide the search. The main difference with respect to classical Branch and Bound is that here, due to the stochastic and non-exact estimation of the objective function (and thus of the upper and lower bounds), sub-spaces cannot in general be cut during the search, but a sort of backtracking into previously evaluated sub-spaces may be necessary.

The SBB algorithm proposed in Norkin et al. (1998a) is represented by the pseudocode of Algorithm 7, and works as follows. Given the search space S, the algorithm constructs increasingly finer partitions of S, denoted by \({\mathcal{P}}= \{S^1,S^2,\ldots\}.\) The original problem of finding \(g^{\ast}(S) := min_{x\, \in \, S} \{ g(x) \}\) (see Eq. 3) is divided into the sub-problems of finding \(g^{\ast}(S^r) := min_{x\, \in \, S^r} \{ g(x) \},\) with r = 1,2,…, and \(g^{\ast}(S) = min_{S^r \, \in \, {\mathcal{P}}} \{g^{\ast}(S^r)\}.\) Assume that there exist functions L and U from \({\mathcal{P}}\) to \({\mathbb{R}}\) such that, for each \(S^r \, \in \, {\mathcal{P}},\) \(L(S^r) \leq g^{\ast}(S^r) \leq U(S^r),\) with \(U(S^r) = g(\bar{x})\) for some \(\bar{x} \, \in \, S^r,\) and such that if \(S^r\) is a singleton set, then \(L(S^r) = g^{\ast}(S^r) = U(S^r).\) Suppose that the lower and upper bounds \(L(S^r)\) and \(U(S^r)\) cannot be exactly computed, but instead estimates \(\lambda^l(S^r)\) and \(\nu^m(S^r)\) are used, respectively, assuming that almost surely \(\hbox{lim}_{l\rightarrow \infty} \lambda ^l(S^r) = L(S^r)\) and \(\hbox{lim}_{m\rightarrow \infty} \nu^m(S^r) = U(S^r).\)

Norkin et al. (1998a) proved the following convergence result: suppose the indices \(l_k\) and \(m_k\) are chosen in such a way that whenever a subset \(S^r\) is an element of \({\mathcal{P}}_k\) for infinitely many k, then \(\hbox{lim}_{k\rightarrow \infty } l_k = \infty \) and \(\hbox{lim}_{k\rightarrow \infty } m_k = \infty.\) Then with probability one there exists an iteration \(k_0\) such that for all \(k \geq k_0,\) the lowest-bound subsets \(\bar{S}_k\) are singletons and contain optimal solutions only. As suggested in Norkin et al. (1998a), the estimation of a lower bound \(L(S^r)\) for \(g^{\ast}(S^r)\) may be done by exchanging the minimization and the expectation operators, since \(g^{\ast}(S^r) = \hbox{min} _{x\, \in \, S^r} g(x) = \hbox{min} _{x\, \in \, S^r}{\mathbb{E}}_P (G(x, \omega)) \geq {\mathbb{E}}_P \left( \hbox{min} _{x\, \in \, S^r} G(x, \omega) \right).\) Thus, one may choose \(L(S^r) = {\mathbb{E}}_P \left( \hbox{min} _{x\, \in \, S^r} G(x, \omega) \right),\) and the estimate of the lower bound \(L(S^r)\) may be computed by the sample average

$$ \lambda^N (S^r)= \frac{1}{N} \sum _{j=1}^N \hbox{min} _{x\, \in \, S^r} G(x, \omega_j), $$
(13)

where \(\omega_1, \omega_2, \ldots, \omega_N\) is an independent, identically distributed (i.i.d.) random sample of N realizations of the random vector ω.

In general, the practical application of SBB implies one major difficulty: computing an estimate of the lower bound by Eq. 13 requires solving a possibly NP-hard deterministic combinatorial optimization problem, \(\hbox{min} _{x\, \in \, S^r} G(x, \omega_j),\) for every sample scenario \(\omega_j,\) and this is infeasible in a reasonable amount of computation time, unless very small problem instances are addressed.
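In code, the estimator of Eq. 13 is a plain sample average over per-scenario subproblem optima; the whole difficulty is hidden in the subproblem solver, which, as discussed below, may be replaced by a heuristic at the price of an approximate bound (a minimal sketch with illustrative names).

```python
def lower_bound_estimate(scenarios, solve_subproblem):
    """Sample-average lower bound lambda^N(S^r) of Eq. 13.

    solve_subproblem(omega) should return min over x in S^r of G(x, omega)
    for one scenario; since each such deterministic subproblem may itself
    be NP-hard, a heuristic solver can be plugged in here.
    """
    return sum(solve_subproblem(w) for w in scenarios) / len(scenarios)
```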

Gutjahr et al. (1999) use SBB to solve small instances of the single-machine-tardiness scheduling problem. They consider different sampling techniques for estimating lower bounds, and report computational experiments.

As a way to make SBB more efficient, Gutjahr et al. (2000b) propose to use heuristics or metaheuristics to approximately solve the deterministic subproblems for the lower bound estimation of Eq. 13, as schematized in Fig. 4. The authors focus on the problem of Activity Crashing in Project Management, and show experimentally that the replacement of an exact solution of the deterministic subproblems by a heuristic one (in this case a local search algorithm) is very advantageous. The authors also note that it is possible to extend the convergence results of Norkin et al. (1998a) to cases in which the deterministic subproblems are approximately solved by a search heuristic with a random starting point, keeping track of the best solution found so far. Another practical enhancement of SBB proposed in Gutjahr et al. (2000b) is the use of Importance Sampling as a technique to reduce the variance of the sample average estimates. Without a variance-reduction technique, the number of Monte Carlo samples (and thus the computation time) required to obtain a sample average with the same variance would be much greater.

Fig. 4 Possible ways of solving the deterministic subproblems for the computation of the lower bound (Eq. 13) in SBB

Doerner et al. (2006) further develop the algorithm proposed in Gutjahr et al. (2000b) by applying SBB to workflow management.

Pichitlamken and Nelson (2003) propose a combined procedure extending the Nested Partitions (NP) method by Shi and Ólafsson (2000) to SCOPs where the objective is estimated by sampling. NP is based on identifying a sequence of 'most promising' subsets of the search space S, and concentrating the search for good solutions there. At each iteration, the most promising subset of S is partitioned into M subsets, and the entire surrounding region is aggregated into one single subset of S. Thus, at each iteration NP looks at a partition of M + 1 subsets of the search space S. From each of these M + 1 subsets, a random solution is chosen using some random sampling scheme, and the objective value of each solution is evaluated in order to decide which is the most promising subset of the next iteration. With respect to NP, the combined procedure of Pichitlamken and Nelson applied to SCOPs includes a number of enhancements. First, in each of the current M + 1 subsets, more than one solution is randomly chosen for evaluating the most promising subset. Second, solutions are evaluated here by the sample average estimate of the objective value (see Eq. 10). Moreover, in order to select the best solution of each subset, and the best solution over all subsets, a statistical procedure called Sequential Selection with Memory (SMM) is used. SMM guarantees to select the best or near-best alternative among a set of solutions with a user-specified probability. It also exploits memorized information (samples and sample averages) on previously encountered solutions. The spirit of SMM is similar to that of the F-Race procedure proposed in the context of ACO (Birattari et al. 2005) (see Sect. 5.1), since it consists of a series of steps in which a set of competing solutions is evaluated and the worst of them are eliminated. For details about SMM, see also Pichitlamken (2001, 2002). The combined procedure based on NP also applies a Hill Climbing local search (HC) to the best solution of each iteration. In this way, the computational effort is concentrated on the most promising subset of the search space. Another specific characteristic of the combined procedure of Pichitlamken and Nelson is that at the end of the algorithm, the solution having the smallest sample average accumulated over all visits to that solution is returned as the final solution. Pichitlamken and Nelson call their combined procedure NP + SMM + HC, a name that underlines its main building blocks just described. In Pichitlamken and Nelson (2003), a proof is provided that, with probability one, NP + SMM + HC finds one of the optimal solutions as the number of iterations goes to infinity. Moreover, numerical experiments applying the algorithm to an (s,S) Inventory Problem and to a Three-Stage Buffer Allocation Problem show that NP + SMM + HC performs well in comparison to a pure random search and to an SA algorithm. While the convergence guarantee of NP + SMM + HC is due to the global guidance system provided by NP, the practical performance is enhanced by the use of the SMM selection-of-the-best method and the HC local search.

5.5.2 Progressive Hedging

Progressive Hedging (PH) is an algorithm proposed by Rockafellar and Wets (1991) for solving multistage stochastic programs. It is based on considering a small set of representative scenarios that capture the uncertain future; for each of these scenarios, a deterministic optimization subproblem is solved; in this way one ends up with several solutions, none of which is, in general, feasible for the original problem. Therefore, a sort of averaging procedure among the solutions of the subproblems is performed, in order to obtain a 'blended' solution that hedges against future uncertainty. Some extensions of PH involve the use of heuristics or metaheuristics to solve the deterministic progressive hedging subproblems. For example, Løkketangen and Woodruff (1996) use TS, while Hvattum and Løkketangen (in press) use the GRASP metaheuristic (Resende and Ribeiro 2003). Haugen et al. (2001) propose an extension of PH that is explicitly presented as a metaheuristic: rather than using a heuristic algorithm to solve deterministic subproblems, it uses an algorithm for the subproblems that is exact in its usual context, but serves as a heuristic for the proposed PH metaheuristic.

5.5.3 Rollout algorithms

Rollout algorithms (RO) are an emerging class of methods for solving combinatorial optimization problems that are capable of improving the performance of a given heuristic through sequential applications. Originally, the ideas underlying RO were proposed by Tesauro and Galperin (1997) for developing a simulation-based algorithm to play backgammon. In the same year, Bertsekas et al. (1997) formalized RO for combinatorial optimization problems by applying them to a machine maintenance and repair problem. RO are based on the Policy Iteration algorithm, which is part of the Dynamic Programming framework for solving MDPs (Bertsekas 1995) (see the paragraph about Markov Decision Processes in Sect. 3.2.2). Some authors (Bertsekas et al. 1997, Sect. 2; Bertsekas 1998, p. 528) emphasize that RO also share some ideas with Tabu Search, particularly with the sequential fan candidate list strategy (Glover and Laguna 1997) and its extended variant, the fan and filter method (Glover 1998). Among the papers that apply RO to SCOPs, we cite the works of Secomandi on the vehicle routing problem with stochastic demands (Secomandi 2000, 2001) and on the TSP with stochastic travel times (Secomandi 2003), and the paper by Bertsekas and Castañon (1998) on stochastic scheduling.

5.5.4 Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a nature-inspired metaheuristic based on the idea of simulating the flight of bird flocks, originally proposed by Eberhart and Kennedy (1995) and Kennedy (1997). Recent and comprehensive overviews of PSO can be found in Banks et al. (2007, 2008) and Poli et al. (2007). In PSO, a set of particles is placed in the search space of a given optimization problem, and each particle evaluates the objective function at its current location. Each particle then determines a move through the search space by combining the history of its own current and best locations with those of one or more particles of the swarm, with some random perturbations. After all particles have been moved, the next iteration starts. In this way the particle swarm moves similarly to a flock of birds collectively foraging for food, with a high probability of 'hitting' a local optimum of the problem's objective function.
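A minimal sketch of the canonical (continuous) PSO move just described follows; the inertia and attraction parameters are common textbook defaults, not those of the SCOP applications cited below.

```python
import random

def pso_step(positions, velocities, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: each particle's velocity blends its previous
    velocity (inertia w) with pulls toward its own best location (c1)
    and the swarm's best location (c2), randomly perturbed."""
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1, r2 = random.random(), random.random()
            v[d] = (w * v[d]
                    + c1 * r1 * (personal_best[i][d] - x[d])
                    + c2 * r2 * (global_best[d] - x[d]))
            x[d] += v[d]
```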

Applications of PSO to SCOPs are quite recent, and involve problems that are very close to real-world situations, such as power generation and dispatch (Pappala and Erlich 2007; Wang and Singh 2008), warehousing (Brodersen and Schumann 2007), vehicle routing (Lu and Tan 2006; Lu et al. 2006; Zhao 2007), and scheduling (Liu et al. 2005).

5.5.5 Variable Neighborhood Search

Variable Neighborhood Search (VNS), a metaheuristic developed by Hansen and Mladenović (2001) less than a decade ago, has recently attracted much interest because of its good empirical performance, especially in the vehicle routing field. The idea of VNS is similar to that of SA. VNS is also a stochastic local search procedure, but contrary to SA, it does not use a single neighborhood structure; instead, it uses a hierarchy of narrower as well as broader neighborhood definitions. From an “incumbent” solution x, the procedure jumps to a random neighbor solution x′ contained in one of these neighborhoods (a step called shaking), improves the solution x′ by local search according to the smallest pre-defined neighborhood, which yields a solution x″, and accepts or rejects x″ depending on whether it is better than x or not. In the case of rejection, the current neighborhood size for the shaking step is increased by one; in the case of acceptance, it is reset to the smallest value.
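
The loop just described can be summarized in a few lines; this is a bare-bones sketch in which neighborhoods[k](x) is assumed to return a random neighbor of x in the k-th (increasingly broad) neighborhood structure.

```python
def vns(x0, neighborhoods, local_search, f, iterations=1000):
    """Basic VNS skeleton (illustrative); `neighborhoods` is an ordered list
    of shaking operators, from the narrowest to the broadest."""
    x, fx = x0, f(x0)
    for _ in range(iterations):
        k = 0
        while k < len(neighborhoods):
            x1 = neighborhoods[k](x)   # shaking: random jump in neighborhood k
            x2 = local_search(x1)      # descent in the smallest neighborhood
            fx2 = f(x2)
            if fx2 < fx:
                x, fx = x2, fx2        # accept: reset to the smallest neighborhood
                k = 0
            else:
                k += 1                 # reject: broaden the shaking neighborhood
    return x, fx
```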

Applications of VNS to SCOPs are still rare at the moment. Jovanović et al. (2007) investigated a probabilistic version, PSAT, of the well-known satisfiability (SAT) problem. Their VNS approach was combined with the Nelder–Mead nonlinear optimization method to solve large PSAT instances.

A general-purpose VNS algorithm for SCOPs based on Monte-Carlo sampling was introduced by Gutjahr et al. (2007) under the name S-VNS. Like the S-ACO algorithm described before, S-VNS uses the idea of comparing a current solution x proposed by the basic search mechanism to the solution x* considered best so far by means of a sample average estimate of variable sample size. A convergence result similar in flavor to that for S-ACO (albeit somewhat stronger) has been shown. Furthermore, in Gutjahr et al. (2007), the procedure S-VNS has been investigated experimentally on a stochastic project portfolio selection problem combining a project subset selection decision with scheduling and staff assignment under stochastic work times.
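
The variable-sample-size comparison at the heart of S-ACO and S-VNS can be caricatured as follows; the linear sample-size schedule and the interface names are placeholders of our own, not the schedules analyzed in the cited convergence proofs.

```python
def keep_incumbent(x_star, x, eval_on_scenario, draw_scenario, iteration,
                   growth=lambda t: 10 + t):
    """Variable-sample-size comparison in the spirit of S-ACO / S-VNS (sketch).

    Both solutions are evaluated on the *same* freshly drawn scenarios, and
    the sample size grows with the iteration counter, so that estimates of
    good solutions become increasingly accurate as the search progresses.
    """
    scenarios = [draw_scenario() for _ in range(growth(iteration))]
    def mean(sol):
        return sum(eval_on_scenario(sol, s) for s in scenarios) / len(scenarios)
    return x_star if mean(x_star) <= mean(x) else x
```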

6 Discussion

In this section we take a transversal view on the metaheuristics literature presented in the previous sections. We highlight issues common to all metaheuristics, open problems, and possible directions of research.

6.1 New balance between intensification and diversification

In order to efficiently explore the search space, any metaheuristic must realize a dynamic balance between intensification (the exploitation of the accumulated search experience) and diversification (the exploration of the search space). As we have seen in Sect. 2, diversification mechanisms used by metaheuristics when solving DCOPs include: pheromone evaporation in ACO, mutation in EC, a nonzero temperature in SA, and tabu lists in TS.

When solving SCOPs, a new balance between intensification and diversification must be reached, due to the fact that the objective function is most of the time computed in an approximate way. More precisely, the approximate nature of objective function values introduces noise into a metaheuristic (we call this approximate-evaluation noise). Approximate-evaluation noise makes the acceptance of low-quality solutions possible, which, in turn, opens up the exploration of areas of the search space far from the current best solution. Thus, approximate-evaluation noise can be seen as an additional diversification mechanism that adds to (or may even substitute for) metaheuristic-specific diversification. In practice, its effect is similar to the effect of temperature in SA, or of tabu lists in TS. We expect this additional diversification mechanism to have a great influence on the tuning of a metaheuristic's parameters, and it should be taken into account. For example, in SA one could find that, after tuning the algorithm for a given SCOP, the best temperature value is zero. Indeed, Bowler et al. (2003) use the sampling estimation error as a sort of temperature in the annealing process, instead of an explicit temperature parameter.

This new balance between intensification and diversification, required whenever a metaheuristic relies on approximate evaluations, is an interesting research topic that has so far received little attention in the literature.

6.2 Issues in using the simulation approximation

We have seen that the selection-of-the-best method that a metaheuristic uses for computing sample averages and for comparing solutions can have a great impact on the effectiveness of the algorithm. It is still hard to say, however, which method is the most effective in relation to the metaheuristic where it is employed, and this remains an interesting open issue.

Table 4 reports some successful selection-of-the-best methods described in the previous sections in the context of the metaheuristics where they have been used. In some cases (Alkhamis et al. 1999; Alrefaei and Andradóttir 1999; Gutjahr 2003; Gutjahr and Pflug 1996; Homem-de-Mello 2000), the use of a particular method has been justified mainly by the need to derive rigorous convergence properties, and its application to other metaheuristics is not very meaningful. But in more experimentally oriented papers, a method which is particularly efficient in one metaheuristic could be advantageous also in others. This is the case, for instance, of F-Race (Birattari et al. 2005), SMM (Pichitlamken and Nelson 2003), and the adaptive sampling procedures used in Gutjahr (2004), Homem-de-Mello (2003), and Costa and Silver (1998). In looking for efficient selection-of-the-best methods to be applied in metaheuristics, the literature about statistical procedures of ranking, selection, and multiple comparisons (see, for example, Fu (2002), Swisher et al. (2003), and the references cited therein) could be a good source of inspiration. Moreover, for speeding up the sample average computations, the application of variance-reduction techniques could be useful, such as, for example, those belonging to the field of Rare Event Simulation (Rubinstein 1981). A deeper analysis of simulation-based local search also seems advisable, since most metaheuristics exploit or may exploit some kind of local search. Balaprakash et al. (2007a) make a first step in this direction.
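
To illustrate the flavor of racing-style selection-of-the-best methods such as F-Race and SMM, the following simplified sketch repeatedly samples all surviving candidates and discards those that trail the current leader; the ad hoc standard-error elimination rule is our own stand-in for the proper statistical tests used by the real procedures.

```python
import statistics

def race(candidates, noisy_eval, rounds=20, batch=5, keep_margin=1.0):
    """Racing-style selection-of-the-best (simplified sketch, minimization).

    Each round adds `batch` samples to every surviving candidate; a candidate
    is discarded when its running mean trails the leader's by more than
    `keep_margin` standard errors. Candidates are assumed hashable.
    """
    samples = {c: [] for c in candidates}
    survivors = list(candidates)
    for _ in range(rounds):
        for c in survivors:
            samples[c].extend(noisy_eval(c) for _ in range(batch))
        means = {c: statistics.fmean(samples[c]) for c in survivors}
        leader = min(survivors, key=means.get)
        def still_in(c):
            if c is leader or len(samples[c]) < 2:
                return True
            se = statistics.stdev(samples[c]) / len(samples[c]) ** 0.5
            return means[c] <= means[leader] + keep_margin * se
        survivors = [c for c in survivors if still_in(c)]
        if len(survivors) == 1:
            break
    return min(survivors, key=lambda c: statistics.fmean(samples[c]))
```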

Table 4 Main selection-of-the-best methods used in the metaheuristics literature for SCOPs where the objective function must be estimated by sampling

Given the above observations, a selection-of-the-best method that works as a black box, without allowing one to specify how samples are chosen, is not advisable. Another requirement that seems necessary is the possibility to increase the accuracy of objective function estimates, particularly when the algorithm has identified good or near-optimal solutions. The intuitive reason is that SCOPs often have many local optima, whose values may also be quite close, and in order to discriminate between local optima one needs the estimation error to be small with respect to the difference between the exact values of the local optima. We have seen a practical confirmation of this in several experimental papers, for instance Watson et al. (1999) and Costa and Silver (1998). A more rigorous argument in favor of this requirement is that all metaheuristics with provable convergence properties need to use a number of samples that increases with the iteration counter.

Finally, a good source of inspiration for promising selection-of-the-best methods to be integrated into metaheuristics for SCOPs is given by the numerous papers that we cited from the EC literature focusing on continuous optimization and/or on ad hoc test functions.

6.3 Level of stochasticity and algorithm performance

It has been recognized (cf. Fu 2002) that completely different optimization algorithms may be needed for small and for large search spaces, respectively. Similarly, we think that the “level of stochasticity” of a problem instance is also an important factor. By level of stochasticity we mean, informally, the amount of randomness in the problem instance. Quantitatively, the level of stochasticity can be measured by the variance of the random variables involved. This quantity is particularly meaningful in SCOPs where stochasticity may be parametrized in such a way that, for some parameter value(s), the problem reduces to a DCOP. Thus, a SCOP instance with ‘zero stochasticity’ corresponds to an instance of a particular DCOP. Very often, the DCOP to which a SCOP can be reduced is a very well known problem in the optimization literature, for which many efficient and ready-to-use algorithms are available. This is the case, for example, of the PTSP, which can be reduced to the TSP, and of the VRPSD, which can be reduced to the capacitated VRP. In all these situations it is reasonable to expect that, for instances with small stochasticity, a state-of-the-art algorithm designed for the DCOP is no worse than a metaheuristic designed for the SCOP. It is also possible that the state-of-the-art algorithm for the DCOP outperforms the metaheuristic on a non-negligible set of SCOP instances. It is therefore very important to investigate the performance of a metaheuristic with respect to the level of stochasticity, and in particular to compare it with algorithms for the corresponding DCOPs already available in the literature. This type of analysis has been done in a few works (Bianchi 2006; Bianchi et al. 2002a, b) in the context of the PTSP and ACO, but it remains to be carried out for other SCOPs and metaheuristics.
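
As a small worked illustration (our own, assuming the standard PTSP formulation in which customer i requires a visit with probability p_i), the level of stochasticity of a PTSP instance can be read off the variance of the Bernoulli presence indicators:

```latex
\[
  X_i \sim \mathrm{Bernoulli}(p_i), \qquad
  \operatorname{Var}(X_i) = p_i\,(1 - p_i) \;\to\; 0
  \quad \text{as } p_i \to 1 .
\]
```

As all p_i tend to 1 the variance vanishes and the instance degenerates to a deterministic TSP, which is precisely the regime in which state-of-the-art TSP algorithms can be expected to dominate.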

6.4 Experimental comparisons among different metaheuristics

At the moment, most of the papers in the SCOP literature focus on one single metaheuristic, which is compared either to variants of the same metaheuristic, or to simple heuristics such as random search, or to exact methods when these are available. Only a very small number of papers perform comparisons among different metaheuristics, as reported in Table 5. Thus, it is still impossible to give guidelines on which metaheuristic is better in which situation.

Table 5 Papers with comparisons among different metaheuristics

One important aspect that experimentation with different metaheuristics could reveal is whether the effectiveness of a metaheuristic is due to the particular adjustments made to speed up the computation (such as approximating the objective function or using carefully designed sampling and statistical comparison strategies), or to the intrinsic search trajectory of the metaheuristic.

6.5 Theoretical convergence properties

Papers analyzing theoretical convergence properties of metaheuristics applied to SCOPs are summarized in Table 6. Note that SCOPs with an exactly computable objective function are missing from the table. In fact, when a metaheuristic always uses an exact expression for the objective value of a solution, its convergence behavior is equivalent, from a theoretical point of view, to that of applying the metaheuristic to a DCOP (pointers to theoretical analyses of metaheuristics for DCOPs have been provided in the previous sections while introducing each metaheuristic). On the contrary, when ad hoc approximations of the objective function are used, the systematic nature of the error makes a theoretical analysis very difficult.

Table 6 Papers with theoretical convergence proofs on SCOPs

Theoretical convergence analyses do exist for static SCOPs with a sampling-estimated objective function. The most studied metaheuristic from this point of view is certainly SA, followed by ACO and SPM. TS and EC still lack this kind of analysis. Interestingly, Homem-de-Mello (2003) suggests that the results he derives for a simple variable-sample random search algorithm can be readily adapted to show the convergence of variable-sample versions of more sophisticated methods, in particular those methods whose proof of convergence in the DCOP domain relies on the convergence of pure random search. The author explicitly indicates EC (GA) as one of these, by referring to the work of Rudolph (1996).

Finally, metaheuristics with provable convergence properties (ACO and EC) have been designed to solve MDPs. At the moment, however, MDPs have been addressed by metaheuristics only theoretically. There is therefore a need to validate their effectiveness through experimental investigations as well.

7 Conclusion

In this paper, a wide class of combinatorial optimization problems under uncertainty addressed by metaheuristics is considered. The domain of SCOPs is clearly identified by providing the formal description of several SCOPs. The metaheuristics that have been applied so far to SCOPs are introduced, and the related literature is thoroughly reviewed.

The following properties of metaheuristics have emerged from this survey: they are a valid alternative to exact classical optimization algorithms, particularly for addressing realistically sized SCOPs; and they are flexible, since they can be quite easily adapted to solve different SCOP formulations, both static and dynamic.

The main issues that in our opinion need further investigation are the following. Given a particular SCOP and metaheuristic, what is the best way to use objective function approximations in the optimization process? In particular, when the sampling approximation is considered, the selection-of-the-best method used may have a great impact on algorithm performance, and should be carefully chosen. This survey highlights which methods seem more or less promising, providing some guidelines and references for future research. Another important issue is the investigation of algorithm performance with respect to different levels of stochasticity of the problem instance for a given SCOP, in order to identify groups of instances where the metaheuristic outperforms algorithms originally designed to solve similar deterministic problems. In this survey we have also summarized the achievements in theoretical proofs of convergence, so that the interested researcher may easily find space for new investigations.