Introduction

The design of heuristics for difficult optimization problems is itself a heuristic process that often involves the following main steps.

After a clever analysis of the problem at hand and of the acceptable simplifications in its definition, one tries to set up an effective mathematical programming (MP) model and to solve it by a general-purpose piece of software—often a mixed-integer linear programming (MIP) solver. Due to the impressive improvement of general-purpose solvers in recent years, this approach can actually solve the instances of interest to proven optimality (or with an acceptable approximation) within a reasonable computing time, in which case of course no further effort is needed.

If this is not the case, one can insist on the MP approach and try to obtain better and better results by improving the model and/or by enhancing the solver by specialized features (cutting planes, branching, etc.). Or one can forget about MP and resort to ad hoc heuristics not based on the MP model. In this latter case, the MP model is completely disregarded or just used for illustrating the problem characteristics and/or for getting an off-line indication of the typical approximation error on a set of sample instances.

A third approach is however possible that consists in using the MP solver as a basic tool within the heuristic framework. This hybridization of MP with metaheuristics leads to the matheuristic approach, where the heuristic is built around the MP model. Matheuristics became popular in recent years, as witnessed by the publication of dedicated volumes and journal special issues [8, 19, 24] and by the dedicated sessions at MP and metaheuristics conferences.

Designing an effective heuristic is an art that cannot be framed into strict rules. This is particularly true when addressing a matheuristic, which is not a rigid paradigm but a conceptual framework for the design of mathematically sound heuristics. In this chapter, we will therefore try to illustrate some of the main matheuristic features with the help of different examples of application.

Section “General-Purpose MIP-Based Heuristics” describes powerful general-purpose MIP heuristics that can be used within the matheuristic framework. Interestingly, these heuristics can themselves be viewed as the first successful applications of the matheuristic idea of hybridizing MP and metaheuristics. Indeed, as noticed in [8], one of the very first illustrations of the power of the matheuristic idea is the general-purpose local branching [7] paradigm, where a black-box MIP solver is used to explore a solution neighborhood defined by invalid constraints added to the MIP model for the sake of easing its solution.

Section “Application 1: Wind Farm Layout Optimization” addresses the design of a matheuristic for wind farm optimization. This application is used to illustrate the importance of the choice of the MIP model: models that are weak in polyhedral terms can be preferred to tighter—but computationally much harder—models when heuristic (as opposed to exact) solutions are required.

Section “Application 2: Prepack Optimization” addresses a packing problem where the model is nonlinear, and the matheuristic is based on various ways to linearize it after a heuristic fixing of some variables.

Finally, section “Application 3: Vehicle Routing” is used to illustrate an advanced feature of matheuristics, namely, the solution of auxiliary MP models that describe a subproblem in the solution process. In particular, we address a vehicle routing problem and derive a matheuristic based on a set-partitioning MIP model asking for the reallocation of a subset of customer sequences subject to capacity and distance constraints.

The present chapter is based on previously published work; in particular, sections “General-Purpose MIP-Based Heuristics”, “Application 1: Wind Farm Layout Optimization”, “Application 2: Prepack Optimization”, and “Application 3: Vehicle Routing” are based on [8, 11, 13, 15], respectively.

General-Purpose MIP-Based Heuristics

Heuristics for general-purpose MIP solvers form the basis of the matheuristic’s toolkit. Their relevance for our chapter is twofold. On the one hand, they are invaluable tools for the solution of the subproblems defined by the matheuristic when applied to a specific problem. On the other hand, they illustrate the benefits that a general-purpose MIP solver can derive from metaheuristic concepts such as local search and evolutionary methods.

Modern MIP solvers exploit a rich arsenal of tools to attack hard problems. It is widely accepted that the solution of hard MIPs can take advantage of the solution of a series of auxiliary linear programs (LPs) intended to enhance the performance of the overall MIP solver. For example, auxiliary LPs may be solved to generate powerful disjunctive cuts or to implement a strong branching policy. On the other hand, it is a common experience that finding good-quality heuristic MIP solutions often requires a computing time that is just comparable to that needed to solve the LP relaxation. So, it makes sense to think of exact/heuristic MIP solvers where auxiliary MIPs (as opposed to LPs) are heuristically solved on the fly, with the aim of bringing the MIP technology within the MIP solver itself. This leads to the idea of “translating into a MIP model” (MIPping in the jargon of [9]) some crucial decisions to be taken when designing a MIP-based algorithm.

We next describe the new generation of MIP heuristics that emerged in the late 1990s, which are based on the idea of systematically using a “black-box” external MIP solver to explore a solution neighborhood defined by invalid linear constraints. We address a generic MIP of the form

$$ (MIP)\qquad \min\ c^T x $$
(1)
$$ A x \geq b, $$
(2)
$$ x_j \in \{0,1\}, \quad \forall j \in \mathcal{B}, $$
(3)
$$ x_j \ \mbox{integer}, \quad \forall j \in \mathcal{G}, $$
(4)
$$ x_j \ \mbox{continuous}, \quad \forall j \in \mathcal{C}, $$
(5)

where A is an m × n input matrix and b and c are input vectors of dimension m and n, respectively. Here, the variable index set \(\mathcal {N}:=\{ 1,\dots , n \}\) is partitioned into \(({\mathcal {B}}, {\mathcal {G}}, {\mathcal {C}})\), where \(\mathcal {B}\) is the index set of the 0-1 variables (if any), while sets \(\mathcal {G}\) and \(\mathcal {C}\) index the general integer and the continuous variables, respectively. Removing the integrality requirement on variables indexed by \(\mathcal {I} := \mathcal {B} \cup \mathcal {G}\) leads to the so-called LP relaxation.

Local Branching

The local branching (LB) scheme of Fischetti and Lodi [7] appears to be one of the first general-purpose heuristics using a black-box MIP solver applied to subMIPs, and it can be viewed as a precursor of matheuristics. Given a reference solution \(\bar{x}\) of a MIP with \(\mathcal{B}\neq \emptyset\), one aims at finding an improved solution that is “not too far” from \(\bar{x}\), in the sense that not too many binary variables need to be flipped. To this end, one can define the k-opt neighborhood \(\mathcal{N}(\bar{x}, k)\) of \(\bar{x}\) as the set of the MIP solutions satisfying the invalid local branching constraint

$$\displaystyle \begin{aligned} \Delta(x,\bar{x}) := \sum_{j\in \mathcal{B}: \bar{x}_j=0} x_j +\sum_{j\in \mathcal{B}: \bar{x}_j=1} (1 - x_j) \leq k, \end{aligned} $$
(6)

for a small neighborhood radius k—an integer parameter typically set to 10 or 20. The neighborhood is then explored (possibly heuristically, i.e., with some small node or time limit) by means of a black-box MIP solver. Experimental results [10] show that the introduction of the local branching constraint typically has the positive effect of driving to integrality many components of the optimal solution of the LP relaxation, improving the so-called relaxation grip and hence the capability of the MIP solver to find (almost) optimal integer solutions within short computing times. Of course, this effect is lost if parameter k is set to a large value—a mistake that would make local branching completely ineffective.

LB is in the spirit of local search metaheuristics and, in particular, of large-neighborhood search (LNS) [29], with the novelty that neighborhoods are obtained through “soft fixing,” i.e., through invalid cuts to be added to the original MIP model. Diversification cuts can be defined in a similar way, thus leading to a flexible toolkit for the definition of metaheuristics for general MIPs.
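
To make the mechanics concrete, the following minimal sketch (in Python, using the open-source PuLP modeling library; the data structures are our own illustrative assumptions, not the implementation of [7]) adds the local branching constraint (6) around a reference solution and reoptimizes:

```python
import pulp

def local_branching_step(prob, x, x_bar, k=10):
    """One LB iteration: add constraint (6) around the reference solution
    x_bar and re-solve with a black-box solver. Here x is a dict of binary
    LpVariables and x_bar a dict of 0/1 values (illustrative assumptions)."""
    # Delta(x, x_bar) = sum_{j: x_bar_j=0} x_j + sum_{j: x_bar_j=1} (1 - x_j)
    delta = pulp.lpSum(x[j] if x_bar[j] == 0 else 1 - x[j] for j in x)
    prob += delta <= k, "local_branching"      # the invalid "soft fixing" cut
    prob.solve(pulp.PULP_CBC_CMD(msg=False))   # possibly under a node/time limit
    if pulp.LpStatus[prob.status] == "Optimal":
        # best solution found in the k-opt neighborhood of x_bar
        return {j: round(x[j].value()) for j in x}
    return None
```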

Relaxation-Induced Neighborhood Search

The relaxation-induced neighborhood search (RINS) heuristic of Danna, Rothberg, and Le Pape [4] also uses a black-box MIP solver to explore a neighborhood of a given solution \(\bar{x}\) and was originally designed to be integrated in a branch-and-bound solution scheme. At specified nodes of the branch-and-bound tree, the solution x of the current LP relaxation and the incumbent \(\bar{x}\) are compared, and all integer-constrained variables that agree in value are fixed. The resulting MIP is typically easy to solve, as fixing reduces its size considerably, and often provides improved solutions with respect to \(\bar{x}\).
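
A sketch of the fixing step (Python/PuLP; the maps lp_value and incumbent are illustrative assumptions, not part of the original implementation) could read:

```python
import pulp

def rins_submip(prob, int_vars, lp_value, incumbent, tol=1e-6):
    """RINS-style sub-MIP: fix every integer-constrained variable whose
    LP-relaxation value agrees with the incumbent, then re-solve the
    (much smaller) model; lp_value/incumbent map variable names to values."""
    for v in int_vars:
        if abs(lp_value[v.name] - incumbent[v.name]) < tol:
            v.lowBound = v.upBound = incumbent[v.name]   # hard variable fixing
    prob.solve(pulp.PULP_CBC_CMD(msg=False))             # explore the neighborhood
    return {v.name: v.value() for v in int_vars}
```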

Polishing a Feasible Solution

The polishing algorithm of Rothberg [27] implements an evolutionary MIP heuristic which is invoked at selected nodes of a branch-and-bound tree and includes all classical ingredients of genetic computation, namely:

  • Population: A fixed-size population of feasible solutions is maintained. Those solutions are either obtained within the branch-and-bound tree (by other heuristics) or computed by the polishing algorithm itself.

  • Combination: Two or more solutions (the parents) are combined with the aim of creating a new member of the population (the child) with improved characteristics. The RINS scheme is adopted, i.e., all variables whose value coincides in the parents are fixed, and the reduced MIP is heuristically solved by a black-box MIP solver within a limited number of branch-and-bound nodes. This scheme is clearly much more time-consuming than a classical combination step in evolutionary algorithms, but it guarantees feasibility of the child solution.

  • Mutation: Diversification is obtained by performing a classical mutation step that (i) randomly selects a “seed” solution in the population, (ii) randomly fixes some of its variables, and (iii) heuristically solves the resulting reduced MIP.

  • Selection: Selection of the two parents to be combined is performed by randomly picking a solution in the population and then choosing, again at random, the second parent among those solutions with a better objective value.

Proximity Search

Proximity search [10] is a “dual version” of local branching that tries to overcome the issues related to the choice of the neighborhood radius k. Instead of hard-fixing the radius, proximity search fixes the minimum improvement of the solution value and changes the objective function to favor the search of solutions at small Hamming distance with respect to the reference one.

The approach works in stages, each aimed at producing an improved feasible solution. As in LB or RINS, at each stage a reference solution \(\bar {x}\) is given, and one aims at improving it. To this end, an explicit cutoff constraint

$$\displaystyle \begin{aligned} c^T x \le c^T \bar{x} - \theta {} \end{aligned} $$
(7)

is added to the original MIP, where θ > 0 is a given tolerance that specifies the minimum improvement required. The objective function of the problem can then be replaced by the proximity function \(\Delta (x, \bar {x})\) defined in (6), to be minimized. One then applies the MIP solver, as a black box, to the modified problem in the hope of finding a solution better than \(\bar {x}\). Computational experience confirms that this approach is quite successful (at least, on some classes of problems), due to the action of the proximity objective function that improves the “relaxation grip” of the model.

A simple variant of the above scheme, called “proximity search with incumbent,” is based on the idea of providing \(\bar{x}\) to the MIP solver as a starting solution. To avoid \(\bar{x}\) being rejected because of the cutoff constraint (7), the latter is weakened to its “soft” version

$$ c^T x \le c^T \bar{x} - \theta (1-\xi) $$
(8)

while minimizing \(\Delta (x,\bar {x}) + M \xi \) instead of just \(\Delta (x,\bar {x})\), where ξ ≥ 0 is a continuous slack variable and M ≫ 0 is a large penalty.
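
As an illustration, the following minimal sketch (Python with the PuLP modeling library; the objective vector c and the variable dictionary x are our own illustrative assumptions) implements one proximity-search stage, in both the hard (7) and soft (8) variants:

```python
import pulp

def proximity_search_stage(prob, x, x_bar, c, theta, soft=True, M=1e6):
    """One stage of proximity search: impose the cutoff (7) (or its soft
    version (8)) and minimize the Hamming distance to x_bar instead of the
    original objective c^T x; x maps indices to binary LpVariables."""
    obj_bar = sum(c[j] * x_bar[j] for j in x)          # c^T x_bar
    delta = pulp.lpSum(x[j] if x_bar[j] == 0 else 1 - x[j] for j in x)
    if soft:   # "proximity search with incumbent": x_bar stays feasible
        xi = pulp.LpVariable("xi", lowBound=0)
        prob += pulp.lpSum(c[j] * x[j] for j in x) <= obj_bar - theta * (1 - xi)
        prob.setObjective(delta + M * xi)              # minimize Delta + M*xi
    else:      # hard cutoff (7): any feasible point improves by >= theta
        prob += pulp.lpSum(c[j] * x[j] for j in x) <= obj_bar - theta
        prob.setObjective(delta)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {j: round(x[j].value()) for j in x}
```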

Application 1: Wind Farm Layout Optimization

Green energy has become a topic of great interest in recent years, as environmental sustainability calls for a considerable reduction in the use of fossil fuels. The wind farm layout optimization problem aims at finding an allocation of turbines in a given site so as to maximize power output. This strategic problem is extremely hard in practice, both for the size of the instances in real applications and for the presence of several nonlinearities to be taken into account. A typical nonlinear feature of this problem is the interaction among turbines, also known as the wake effect: if two turbines are located close to each other, the upwind one creates a shadow on the one behind. Interference is therefore of great importance in the design of the layout, as it results in a loss of power production for the turbine downstream.

We next outline the main steps in the design of a sound matheuristic scheme for wind farm layout optimization that is able to address the large-size instances arising in practical applications.

Choice of the MIP Model

Different models have been proposed in the literature to describe interference. We will consider first a simplified model from the literature [5], where the overall interference is the sum of pairwise interferences between turbine pairs. The model addresses the following constraints:

  (a) a minimum and maximum number of turbines that can be built is given;

  (b) there should be a minimal separation distance between two turbines to ensure that the blades do not physically clash (turbine distance constraints);

  (c) if two turbines are installed, their interference will cause a loss in the power production that depends on their relative position and on wind conditions.

Let V denote the set of possible positions for a turbine, called “sites” in what follows, and let

  • \(N_{\mathrm{MIN}}\) and \(N_{\mathrm{MAX}}\) be the minimum and maximum number of turbines that can be built, respectively;

  • \(D_{\mathrm{MIN}}\) be the minimum distance between two turbines;

  • \(dist(i, j)\) be the Euclidean distance between sites i and j;

  • \(I_{ij}\) be the interference (loss of power) experienced by site j when a turbine is installed at site i, with \(I_{jj} = 0\) for all \(j \in V\);

  • \(P_i\) be the power that a turbine would produce if built (alone) at site i.

In addition, let G I = (V, E I) denote the incompatibility graph with

$$\displaystyle \begin{aligned} E_I = \{ [i,j] \in V \times V: \, dist(i,j) < D_{\mathrm{MIN}}, \, i<j\}\end{aligned} $$

and let n := |V | denote the total number of sites. Two sets of binary variables are defined:

$$\displaystyle \begin{gathered} x_i = \left\{\begin{array}{ll} 1 & \, \mbox{if a turbine is built at site {$i$}}; \\ 0 & \, \mbox{otherwise} \end{array} \right. \quad(i \in V) \\ z_{ij} = \left\{\begin{array}{ll} 1 & \, \mbox{if two turbines are built at both sites {$i$} and {$j$}}; \\ 0 & \, \mbox{otherwise} \end{array} \right. \quad(i,j \in V, i < j) \end{gathered} $$

The model then reads

$$ \max \sum_{i \in V} P_i x_i - \sum_{i \in V}\sum_{j \in V, i < j} (I_{ij}+I_{ji})\, z_{ij} $$
(9)
$$ \mbox{s.t.}\quad N_{\mathrm{MIN}} \leq \sum_{i \in V} x_i \leq N_{\mathrm{MAX}} $$
(10)
$$ x_i + x_j \leq 1 \quad \forall [i,j] \in E_I $$
(11)
$$ x_i + x_j - 1 \leq z_{ij} \quad \forall i, j \in V,\ i<j $$
(12)
$$ x_i \in \{0, 1\} \quad \forall i \in V $$
(13)
$$ z_{ij} \in \{0, 1\} \quad \forall i,j \in V,\ i < j $$
(14)

Objective function (9) maximizes the total power production by taking interference losses \(I_{ij}\) into account. Constraints (11) model pairwise site incompatibility. Constraints (12) force \(z_{ij} = 1\) whenever \(x_i = x_j = 1\); because of the objective function, this is in fact equivalent to imposing \(z_{ij} = x_i x_j\).

The definition of the turbine power vector (P i) and of interference matrix (I ij) depends on the wind scenario considered, which greatly varies in time. Using statistical data, one can in fact collect a large number K of wind scenarios k, each associated with a pair (P k, I k) with a probability π k, and define the average power and interference to be used in the model as:

$$ P_i := \sum_{k=1}^K \pi_k P_{i}^k \quad \forall i \in V $$
(15)
$$ I_{ij} := \sum_{k=1}^K \pi_k I_{ij}^k \quad \forall i, j \in V $$
(16)

While (9)–(14) turns out to be a reasonable model when just a few sites have to be considered (say n ≈ 100), it becomes hopeless when n ≥ 1,000 because of the huge number of variables and constraints involved, which grows quadratically with n. Therefore, when facing instances with several thousand sites, an alternative (possibly weaker) model is required, where interference can be handled by a number of variables and constraints that grows just linearly with n. The model below is a compact reformulation of model (9)–(14) that follows a recipe of Glover [17] widely used, e.g., for the quadratic assignment problem [12, 32]. The original objective function (to be maximized), rewritten as

$$ \sum_{i \in V} P_i x_i - \sum_{i \in V} \left(\sum_{j \in V} I_{ij} x_j\right) x_i $$
(17)

is restated as

$$ \sum_{i \in V} (P_i x_i - w_i) $$
(18)

where

$$\displaystyle \begin{aligned} w_i := \left(\sum_{j \in V} I_{ij} x_j \right)\, x_i = \left\{\begin{array}{ll} \sum_{j \in V} I_{ij} x_j & \mbox{if {$x_i=1$}} \\ 0 & \mbox{if {$x_i=0$}} \end{array} \right. \end{aligned}$$

denotes the total interference caused by site i. Our compact model then reads

$$ \max\ z = \sum_{i \in V} (P_i x_i - w_i) $$
(19)
$$ \mbox{s.t.}\quad N_{\mathrm{MIN}} \leq \sum_{i \in V} x_i \leq N_{\mathrm{MAX}} $$
(20)
$$ x_i + x_j \leq 1 \quad \forall [i,j] \in E_I $$
(21)
$$ \sum_{j \in V} I_{ij} x_j \leq w_i + M_i (1-x_i) \quad \forall i \in V $$
(22)
$$ x_i \in \{0, 1\} \quad \forall i \in V $$
(23)
$$ w_i \geq 0 \quad \forall i \in V $$
(24)

where the big-M term \(M_i = \sum_{j \in V: [i,j] \not\in E_I} I_{ij}\) is used to deactivate constraint (22) when \(x_i = 0\), in which case \(w_i = 0\) because of the objective function.
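
For concreteness, a sketch of the compact model (19)–(24) in Python/PuLP follows; the data structures P, I, and E_I are illustrative assumptions, not taken from the original C implementation:

```python
import pulp

def compact_windfarm_model(P, I, E_I, n_min, n_max):
    """Compact wind farm model (19)-(24). P[i]: power at site i;
    I[i][j]: pairwise interference; E_I: set of incompatible pairs (i, j)
    with i < j (illustrative data layout)."""
    V = range(len(P))
    m = pulp.LpProblem("windfarm", pulp.LpMaximize)
    x = {i: pulp.LpVariable(f"x_{i}", cat=pulp.LpBinary) for i in V}
    w = {i: pulp.LpVariable(f"w_{i}", lowBound=0) for i in V}
    m += pulp.lpSum(P[i] * x[i] - w[i] for i in V)              # objective (19)
    m += pulp.lpSum(x[i] for i in V) >= n_min                   # (20)
    m += pulp.lpSum(x[i] for i in V) <= n_max
    for i, j in E_I:                                            # clashes (21)
        m += x[i] + x[j] <= 1
    for i in V:                                                 # big-M rows (22)
        M_i = sum(I[i][j] for j in V
                  if j != i and (min(i, j), max(i, j)) not in E_I)
        m += pulp.lpSum(I[i][j] * x[j] for j in V) <= w[i] + M_i * (1 - x[i])
    return m, x, w
```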

Choice of Ad Hoc Heuristics

A simple 1-opt heuristic can be designed along the following lines. At each step, we have an incumbent solution, say \(\tilde {x}\), that describes the best-known turbine allocation (\(\tilde {x}_i=1\) if a turbine is built at site i, 0 otherwise), and a current solution x. Let

$$\displaystyle \begin{aligned} z = \sum_{i \in V} P_i x_i - \sum_{i \in V}\sum_{j \in V} I_{ij}\, x_i \, x_j \end{aligned}$$

be the profit of the current solution, \(\gamma = \sum_{i \in V} x_i\) be its cardinality, and define for each \(j \in V\) the extra-profit \(\delta_j\) incurred when flipping \(x_j\), namely:

$$\displaystyle \begin{aligned} \delta_j = \left\{\begin{array}{ll} P_j - \displaystyle{\sum_{i\in V: x_i=1} (I_{ij}+I_{ji})} & \mbox{if {$x_j=0$};} \\*[1ex] -P_j + \displaystyle{\sum_{i\in V: x_i=1} (I_{ij}+I_{ji})} & \mbox{if {$x_j=1$}} \end{array} \right.\end{aligned} $$

where we assume \(I_{ij} = BIG\) for all incompatible pairs \([i, j] \in E_I\), where \(BIG > \sum_{i \in V} P_i\) is a large penalty value, while \(I_{ii} = 0\) as usual.

We start with x = 0, z = 0, and γ = 0 and initialize \(\delta_j = P_j\) for all \(j \in V\). Then, we iteratively improve x by a sequence of 1-opt moves, according to the following scheme. At each iteration, we look in O(n) time for the site j with maximum \(\delta_j + FLIP(j)\), where function FLIP(j) takes the cardinality constraints into account, namely

$$\displaystyle \begin{aligned} FLIP(j) = \left\{\begin{array}{ll} - HUGE & \mbox{if {$x_j=0$} and {$\gamma \geq N_{MAX}$}}\\ - HUGE & \mbox{if {$x_j=1$} and {$\gamma \leq N_{MIN}$}}\\ + HUGE & \mbox{if {$x_j=0$} and {$\gamma < N_{MIN}$}}\\ + HUGE & \mbox{if {$x_j=1$} and {$\gamma > N_{MAX}$}}\\ ~~0 & \mbox{otherwise} \end{array} \right.\end{aligned} $$

with HUGE ≫ BIG (recall that function \(\delta_j + FLIP(j)\) has to be maximized).

Once the best site has been found, say \(j^*\), if \(\delta_{j^*}+FLIP(j^*)>0\), we flip \(x_{j^*}\); update x, z, and γ in O(1) time; update all \(\delta_j\)’s in O(n) time (through the parametric technique described in [11]); and repeat. In this way, a sequence of improving solutions is obtained, until a locally optimal solution that cannot be improved by a single flip is found. To escape local minima, a simple perturbation scheme can be implemented; see again [11] for details.
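
A compact sketch of this 1-opt loop in plain Python (our own illustrative rendering of the scheme above; I is assumed to already contain the BIG penalty on incompatible pairs, with big > sum(P)):

```python
def one_opt(P, I, n_min, n_max, big):
    """1-opt for the wind farm layout problem: start from the empty layout
    and repeatedly flip the variable with the largest delta_j + FLIP(j),
    updating the delta vector parametrically in O(n) per move."""
    n = len(P)
    huge = 1000.0 * big                 # HUGE >> BIG
    x, gamma = [0] * n, 0
    delta = list(P)                     # delta_j = P_j when x = 0

    def flip(j):                        # the FLIP(j) cardinality term
        if x[j] == 0 and gamma >= n_max: return -huge
        if x[j] == 1 and gamma <= n_min: return -huge
        if x[j] == 0 and gamma < n_min:  return +huge
        if x[j] == 1 and gamma > n_max:  return +huge
        return 0.0

    while True:
        j = max(range(n), key=lambda t: delta[t] + flip(t))
        if delta[j] + flip(j) <= 0:     # 1-opt local optimum reached
            break
        sign = 1 if x[j] == 0 else -1   # build (+1) or remove (-1) a turbine
        x[j] = 1 - x[j]
        gamma += sign
        for t in range(n):              # parametric O(n) update of the deltas
            if t == j:
                delta[t] = -delta[t]
            else:
                adj = sign * (I[j][t] + I[t][j])
                delta[t] += adj if x[t] == 1 else -adj
    return x
```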

A 2-opt heuristic can similarly be implemented to allow a single turbine to move to a better site—a move that requires flipping two variables. Each 2-opt exchange requires \(O(n^2)\) time, as it amounts to trying n 1-opt exchanges and applying the best one.

The Overall Matheuristic

Our final approach is a mixture of ad hoc (1- and 2-opt) and general MIP (proximity search with incumbent) heuristics and works as shown in Algorithm 1.

Algorithm 1: The overall matheuristic framework

At Step 2, the heuristics of section “Choice of Ad Hoc Heuristics” are applied in their “initial-solution” mode where one starts with \(\tilde {x}=x=0\) and aborts execution when 1-opt is invoked 10,000 consecutive times without improving \(\tilde {x}\). At Step 4, instead, a faster “cleanup” mode is applied. As we already have a hopefully good incumbent \(\tilde {x}\) to refine, we initialize \(x=\tilde {x}\) and repeat the procedure until we count 100 consecutive 1-opt calls with no improvement of \(\tilde {x}\). As to time-consuming 2-opt exchanges, they are applied with a certain frequency and in any case just before the final \(\tilde {x}\) is returned.

Two different MIP models are used to feed the proximity-search heuristic at Step 6. During the first part of the computation, we use a simplified MIP model obtained from (19)–(24) by removing all interference constraints (22), thus obtaining a much easier problem. A short time limit is imposed for each call of proximity search when this simplified model is solved. In this way we aggressively drive the solution \(\tilde{x}\) to increase the number of built turbines, without being bothered by interference considerations and taking only pairwise incompatibility (21) into account. This approach quickly finds better and better solutions (even in terms of the true profit), until either (i) no additional turbine can be built or (ii) the addition of new turbines in fact reduces the true profit of the new solution because of the neglected interference. At this point we switch to the complete model (19)–(24) with all interference constraints, which is used in all subsequent executions of Step 6. Note that the simplified model is only used at Step 6, while all other steps of the procedure always use the true objective function that takes interference fully into account.

Computational Results

The following alternative solution approaches were implemented in C language, some of which use the commercial MIP solver IBM ILOG Cplex 12.5.1 [21]; because of the big-Ms involved in the models, all Cplex-based codes use zero as integrality tolerance (CPX_PARAM_EPINT = 0.0).

  (a) proxy: The matheuristic outlined in the previous section, built on top of Cplex with the following aggressive parameter tuning: all cuts deactivated, CPX_PARAM_RINSHEUR = 1, CPX_PARAM_POLISHAFTERTIME = 0.0, CPX_PARAM_INTSOLLIM = 2;

  (b) cpx_def: The application of IBM ILOG Cplex 12.5.1 in its default setting, starting from the same heuristic solution \(\tilde{x}\) available right after the first execution of Step 2 of Algorithm 1;

  (c) cpx_heu: Same as cpx_def, with the following internal tuning intended to improve Cplex’s heuristic performance: all cuts deactivated, CPX_PARAM_RINSHEUR = 100, CPX_PARAM_POLISHAFTERTIME = 20% of the total time limit;

  (d) loc_sea: A simple heuristic not based on any MIP solver, which just loops on Step 4 of Algorithm 1 and randomly removes installed turbines from the current best solution after 10,000 iterations without improvement of the incumbent.

For each algorithm, we recorded the best solution found within a given time limit.

In our view, loc_sea is representative of a clever but not oversophisticated metaheuristic, as typically implemented in practice, while cpx_def and cpx_heu represent a standard way of exploiting a MIP model once a good feasible solution is known.

Our test bed refers to an offshore 3,000 × 3,000 (m) square with \(D_{\mathrm{MIN}}\) = 400 (m) minimum turbine separation and no limit on the number of turbines to be built (i.e., \(N_{\mathrm{MIN}} = 0\) and \(N_{\mathrm{MAX}} = +\infty\)). Turbines are all of Siemens SWT-2.3-93 type (rotor diameter 93 m), which produces a power of 0.0 MW for wind speeds up to 3 m/s, of 2.3 MW for wind speeds greater than or equal to 16 m/s, and intermediate values for winds in the range 3–16 m/s according to a nonlinear power curve [30]. Pairwise interference (in MW) was computed using Jensen’s model [22], by averaging 250,000+ real-world wind samples. Those samples were grouped into about 500 macro-scenarios to reduce the computational time spent for the definition of the interference matrix. A pairwise average interference of 0.01 MW or less was treated as zero. The reader is referred to [6] for details.

We generated five classes of medium-to-large problems with n ranging from 1,000 to 20,000. For each class, ten instances have been considered by generating n uniformly random points in the 3,000 × 3,000 square. (Although in the offshore case turbine positions are typically sampled on a regular grid, we decided to randomly generate them to be able to compute meaningful statistics for each value of n.)

In what follows, reported computing times are CPU seconds on an Intel Xeon E3-1220 V2 quad-core PC with 16GB of RAM and do not take Step 1 of Algorithm 1 into account, as the interference matrix is assumed to be precomputed and reused at each run.

Computational results on our instances are given in Table 1, where each entry refers to the performance of a given algorithm at a given time limit. In particular, the left part of the table reports, for each algorithm and time limit, the number of wins, i.e., the number of instances for which a certain algorithm produced the best-known solution at the given time limit (ties allowed).

Table 1 Number of times each algorithm finds the best-known solution within the time limit (wins) and optimality ratio with respect to the best-known solution—the larger, the better

According to the table, proxy outperforms all competitors by a large margin on medium-to-large instances. As expected, cpx_heu performs better on instances with n = 1,000, as it is allowed to explore a large number of enumeration nodes for the original model and objective function. Note that loc_sea performs well for short time limits and/or for large instances, thus confirming its effectiveness, whereas cpx_heu is significantly better than loc_sea only for small instances and large time limits.

A different performance measure is given in the right-hand part of Table 1, where each entry gives the average optimality ratio, i.e., the average ratio between the solution value produced by an algorithm (on a given instance at a given time limit) and the best-known solution value for that instance—the closer to one, the better. It should be observed that an improvement of just 1% has a very significant economic impact due to the very large profits involved in the wind farm context. The results show that proxy is always able to produce solutions that are quite close to the best one. As before, loc_sea is competitive for large instances when a very small computing time is allowed, whereas cpx_def and cpx_heu exhibit a good performance only for small instances and are dominated even by loc_sea for larger ones.

Application 2: Prepack Optimization

Packing problems play an important role in industrial applications. In these problems, a given set of items has to be packed into one or more containers (bins) so as to satisfy a number of constraints and to optimize some objective function.

Most of the contributions from the literature are devoted to the case where all the items have to be packed into a minimum number of bins so as to minimize, e.g., transportation costs; within these settings, only loading costs are taken into account. The resulting problem is known as the bin packing problem and has been widely studied in the literature both in its one-dimensional version [25] and in its higher-dimensional variants [23].

We will next consider a different packing problem arising in inventory allocation applications, where the operational cost for packing the bins is comparable to, or even higher than, the cost of the bins themselves. This is the case, for example, for warehouses that have to manage a large number of different customers (e.g., stores), each requiring a given set of items. Assuming that automatic systems are available for packing, the required workforce is related to the number of different ways that are used to pack the bins to be sent to the customers. To limit this cost, a hard constraint can be imposed on the total number of different box configurations that are used.

Prepacking items into box configurations has obvious benefits in terms of easier and cheaper handling, as it reduces the amount of material handled by both the warehouse and the customers. However, the approach can considerably reduce the flexibility of the supply chain, leading to situations in which the set of items actually shipped to each customer may slightly differ from the required one—at the expense of some cost in the objective function. In addition, an upper bound on overstocking is usually imposed for each store.

The resulting problem, known as prepack optimization problem (POP), was recently addressed in [20], where a real-world application in the fashion industry is presented, and heuristic approaches are derived using both constraint programming (CP) and MIP techniques.

Mathematical Model

In this section we briefly formalize POP and review the mathematical model introduced in [20]. We are given a set I of product types and a set S of stores. Each store s ∈ S requires an integer number \(r_{is}\) of products of type i ∈ I. Bins with different capacities are available for packing items: we denote by \(K \subset \mathbb{Z}_+\) the set of available bin capacities.

Bins must be completely filled and are available in an unlimited number for each type. A box configuration describes the packing of a bin, in terms of number of products of each type that are packed into it. We denote by NB the maximum number of box configurations that can be used for packing all products and by B = {1, …, NB} the associated set.

The packing of products into boxes is described by integer variables \(y_{bi}\): for each product type i ∈ I and box configuration b ∈ B, the associated variable \(y_{bi}\) indicates the number of products of type i that are packed into the b-th box configuration. In addition, integer variables \(x_{bs}\) are used to denote the number of bins loaded according to box configuration b that have to be shipped to store s ∈ S.

Understocking and overstocking of product i at store s are expressed by decision variables \(u_{is}\) and \(o_{is}\), respectively. Positive costs α and β penalize each unit of under- and overstocking, respectively, and an upper bound \(\delta_{is}\) on the maximum overstocking of each product at each store is also imposed.

Finally, for each box configuration b ∈ B and capacity value k ∈ K, a binary variable \(t_{bk}\) is introduced that takes value 1 if box configuration b corresponds to a bin of capacity k.

Additional integer variables used in the model are \(q_{bis} = x_{bs} y_{bi}\) (the number of items of type i sent to store s in boxes loaded according to configuration b); hence, \(\sum_{b \in B} q_{bis}\) gives the total number of products of type i that are shipped to store s.

A mixed-integer nonlinear programming (MINLP) model then reads:

$$ \min \sum_{s \in S} \sum_{i \in I} (\alpha u_{is} + \beta o_{is}) $$
(25)
$$ q_{bis} = x_{bs}\, y_{bi} \quad (b\in B;\ i \in I;\ s \in S) $$
(26)
$$ \sum_{b\in B} q_{bis} - o_{is} + u_{is} = r_{is} \quad (i \in I;\ s \in S) $$
(27)
$$ \sum_{i \in I} y_{bi} = \sum_{k \in K} k \, t_{bk} \quad (b\in B) $$
(28)
$$ \sum_{k \in K} t_{bk} = 1 \quad (b\in B) $$
(29)
$$ o_{is} \leq \delta_{is} \quad (i \in I;\ s \in S) $$
(30)
$$ t_{bk} \in \{0, 1\} \quad (b\in B;\ k \in K) $$
(31)
$$ x_{bs} \geq 0 \ \mbox{integer} \quad (b\in B;\ s \in S) $$
(32)
$$ y_{bi} \geq 0 \ \mbox{integer} \quad (b\in B;\ i \in I) $$
(33)

The model is of course nonlinear, as the bilinear constraints (26) involve products of decision variables. To derive a linear MIP model, the following standard technique can be used. Each \(x_{bs}\) variable is decomposed into its binary expansion using binary variables \(v_{bsl}\) (l = 0, …, L), where L is easily computed from an upper bound on \(x_{bs}\). When these variables are multiplied by \(y_{bi}\), the corresponding products \(w_{bisl} = v_{bsl} y_{bi}\) are linearized through the addition of suitable constraints.

Our final MIP is therefore obtained from (25)–(33) by adding

$$ x_{bs} = \sum_{l=0}^L 2^l v_{bsl} \quad (b\in B;\ s \in S) $$
(34)
$$ v_{bsl} \in \{0, 1\} \quad (b\in B;\ s \in S;\ l=0,\ldots, L) $$
(35)

and by replacing each nonlinear equation (26) with the following set of new variables and constraints:

$$ q_{bis} = \sum_{l=0}^L 2^l w_{bisl} \quad (b \in B;\ i \in I;\ s \in S) $$
(36)
$$ w_{bisl} \leq \overline Y v_{bsl} \quad (b\in B;\ i \in I;\ s \in S;\ l=0,\ldots, L) $$
(37)
$$ w_{bisl} \leq y_{bi} \quad (b\in B;\ i \in I;\ s \in S;\ l=0,\ldots, L) $$
(38)
$$ w_{bisl} \geq y_{bi} - \overline Y (1-v_{bsl}) \quad (b\in B;\ i \in I;\ s \in S;\ l=0,\ldots, L) $$
(39)
$$ w_{bisl} \geq 0 \quad (b\in B;\ i \in I;\ s \in S;\ l=0,\ldots, L) $$
(40)

where \(\overline Y\) denotes an upper bound on the y variables.
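
The following sketch (Python/PuLP; a hypothetical helper of ours, not code from [20]) generates the expansion (34) and the product constraints (36)–(40) for a single pair of variables:

```python
import pulp

def linearize_product(m, x, y, x_ub, y_ub, tag):
    """Linearize q = x*y for integer x in [0, x_ub] and y in [0, y_ub]
    via the binary expansion (34) and constraints (36)-(40); returns q.
    m is a pulp.LpProblem; tag disambiguates variable names."""
    L = max(x_ub, 1).bit_length() - 1   # smallest L with 2^(L+1) - 1 >= x_ub
    v = [pulp.LpVariable(f"v_{tag}_{l}", cat=pulp.LpBinary) for l in range(L + 1)]
    w = [pulp.LpVariable(f"w_{tag}_{l}", lowBound=0) for l in range(L + 1)]
    m += x == pulp.lpSum(2**l * v[l] for l in range(L + 1))       # (34)
    for l in range(L + 1):
        m += w[l] <= y_ub * v[l]                                  # (37)
        m += w[l] <= y                                            # (38)
        m += w[l] >= y - y_ub * (1 - v[l])                        # (39)
    q = pulp.LpVariable(f"q_{tag}", lowBound=0)
    m += q == pulp.lpSum(2**l * w[l] for l in range(L + 1))       # (36)
    return q
```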

In case all capacities are even, the following constraint—though redundant—plays a very important role in improving the LP bound of our MIP model:

$$ \sum_{i \in I} (u_{is}+o_{is}) \ge 1 \quad \left(s \in S: \sum_{i\in I} r_{is} \mbox{ is odd}\right) $$
(41)

Matheuristics

The MIP model of the previous subsection is far too difficult to be addressed by standard solvers. As a matter of fact, for real-world cases even the LP relaxation at each node turns out to be very time consuming. So we designed ad hoc heuristic approaches that exploit the special structure of our MIP, following the matheuristic paradigm.

Each heuristic is based on the idea of iteratively solving a restricted problem obtained by fixing a subset of variables, so as to obtain a subproblem which is (reasonably) easy to solve by a commercial MIP solver, but still able to produce improved solutions.

Two kinds of heuristics can be implemented: constructive and refinement heuristics. Constructive heuristics are used to find a solution H starting from scratch. In a refinement heuristic, instead, we are given a heuristic solution \(H = (x^H, y^H)\) that we would like to improve. We first fix some x and/or y variables to their values in H, thus defining a solution neighborhood \({\mathcal{N}}(H)\) of H. We then search \({\mathcal{N}}(H)\) by applying a general-purpose MIP solver to the model resulting from the fixing. If an improved solution is found within the given time limit, we update H and repeat; otherwise, a new neighborhood is defined in the attempt to escape the local optimum.

Fixing all x or y Variables

A first obvious observation is that our basic MINLP model (25)–(33) reduces to a linear MIP if all the x (or all the y) variables are fixed, as constraints (26) trivially become linear. According to our experience, the resulting MIP (though nontrivial) is typically solved very quickly by a state-of-the-art solver, meaning that one can effectively solve a sequence of restricted MIPs where x and y are fixed, in turn, until no further improvement can be obtained.

Let \({\mathcal{N}}_y(H)\) and \({\mathcal{N}}_x(H)\) denote the solution neighborhoods of H obtained by leaving y or x free, i.e., by imposing \(x = x^H\) or \(y = y^H\), respectively.

A basic tool that we use in our heuristics is function \(REOPT(S', y^H)\), which considers a store subset S′ ⊆ S and a starting \(y^H\) and returns the best solution H obtained by iteratively optimizing over the neighborhoods \({\mathcal{N}}_x(H)\), \({\mathcal{N}}_y(H)\), \({\mathcal{N}}_x(H)\), etc., after having removed all stores not in S′, H being updated after each optimization.
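
In pseudocode form, REOPT can be sketched as follows (plain Python; solve_with_fixed is a hypothetical callback of ours that fixes one variable group, solves the resulting linear MIP, and returns the new solution with its cost):

```python
def reopt(solve_with_fixed, x_start, y_start, eps=1e-9):
    """REOPT: alternately fix all y (exploring N_x) and all x (exploring
    N_y), re-solving the linear MIP obtained in each case, until the
    objective stops improving; minimization is assumed."""
    x, y = x_start, y_start
    best = float("inf")
    while True:
        x, y, z = solve_with_fixed(fix="y", x=x, y=y)   # N_x(H): x free
        x, y, z = solve_with_fixed(fix="x", x=x, y=y)   # N_y(H): y free
        if z >= best - eps:                             # no further improvement
            return x, y, best
        best = z
```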

Fixing y Variables For All but One Configuration

Another interesting neighborhood, say \({\mathcal{N}}_x(H, y_{\beta})\), is obtained by leaving all x variables free and by fixing \(y_{bi}=y^H_{bi}\) for all i ∈ I and b ∈ B ∖{β}, for a given β ∈ B. In other words, we allow just one (out of NB) configurations of the current solution to change, and leave the solver the possibility of changing the x variables as well.

In our implementation, we first define a random permutation \(\{\beta_1, \ldots, \beta_{NB}\}\) of B. We then optimize, in a circular sequence, the neighborhoods \({\mathcal{N}}_x(H, y_{\beta_t})\) for t = 1, …, NB, 1, … Each time an improved solution is found, we update H and further refine it through function \(REOPT(S, y^H)\). The procedure is stopped when there is no hope of finding an improved solution, i.e., after NB consecutive optimizations that do not improve the current H.

A substantial speedup can be obtained by heuristically imposing a tight upper bound on the x variables, so as to reduce the number L + 1 of binary variables v bsl in the binary expansion (34). An aggressive policy (e.g., imposing x bs ≤ 1) is however rather risky as the optimal solution could be cut off; hence, the artificial bounds must be relaxed if an improved solution cannot be found.

Working with a Subset of Stores

Our basic constructive heuristic is based on the observation that removing stores can produce a substantially easier model. Note that a solution H′ = (x′, y′) with a subset of stores can easily be converted into a solution H = (x, y) of the whole problem by just invoking function REOPT(S, y′).

In our implementation, we first define a store permutation \(s_1, \ldots, s_{|S|}\) according to a certain criterion (to be discussed later). We then address, in sequence, the subproblem with store set \(S_t = \{s_1, \ldots, s_t\}\) for t = 0, …, |S|.

For t = 0, store set \(S_0\) is empty, and our MIP model just produces a y solution with random (possibly repeated) configurations.

For each subsequent t, we start with the best solution \(H_{t-1} = (x^{t-1}, y^{t-1})\) of the previous iteration and convert it into a solution \(H_t = (x^t, y^t)\) of the current subproblem through the refining function \(REOPT(S_t, y^{t-1})\). Then we apply the refinement heuristics described in the previous subsection to \(H_t\), reoptimizing one configuration at a time in a circular fashion. To reduce computing time, this latter step can be skipped with a certain frequency—except of course in the very last step, when \(S_t = S\).

Each time a solution \(H_t = (x^t, y^t)\) is found, we quickly compute a solution H = (x, y) of the overall problem through function \(REOPT(S, y^t)\) and update the overall incumbent, in which all stores are active.

As to store sequence s 1, …, s |S|, we have implemented three different strategies to define it. For each store pair a, b, let the dissimilarity index dist(a, b) be defined as the distance between the two demand vectors (r ia : i ∈ I) and (r ib : i ∈ I).

  • random: The sequence is just a random permutation of the integers 1, …, |S|;

  • most_dissimilar: We first compute the two most dissimilar stores (a, b), i.e., such that a < b and dist(a, b) is maximum, and initialize \(s_1 = a\). Then, for t = 2, …, |S|, we define \(S' = \{s_1, \ldots, s_{t-1}\}\) and let

    $$ s_t = \text{argmax}_{a\in S\setminus S'} \, \min\{ dist(a,b): b\in S' \} $$

  • most_similar: This is just the same procedure as in the previous item, with the max and min operators reverted.

The rationale behind the most_dissimilar policy is to attack first a “core problem” defined by the pairwise most dissimilar stores (those at the beginning of the sequence). The assumption here is that our method performs better in its first iterations (small values of t), as the size of the subproblem is smaller and we have plenty of configurations to accommodate the initial requests. The “similar stores” are therefore addressed only at a later time, in the hope that the configurations already found will work well for them.

A risk with the above policy is that the core problem soon becomes too difficult for our simple refining heuristic, so the current solution is not updated after the very first iterations. In this respect, policy most_similar is more conservative: taking for granted that we proceed by successively refining a previous solution with one less store, it seems reasonable not to inject too much innovation in a single iteration—as most_dissimilar does when it adds a new store whose demands are very different from the previous ones.
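
The two greedy sequencing policies can be sketched as follows (plain Python; dist is the dissimilarity index defined above, and the store identifiers are assumed to be comparable):

```python
def store_sequence(stores, dist, policy="most_dissimilar"):
    """Greedy store permutation: seed with one store of the most dissimilar
    (resp. most similar) pair, then repeatedly append the store maximizing
    (resp. minimizing) its min (resp. max) distance to the sequenced set."""
    outer = max if policy == "most_dissimilar" else min
    inner = min if policy == "most_dissimilar" else max
    pairs = [(a, b) for a in stores for b in stores if a < b]
    a, _ = outer(pairs, key=lambda p: dist(p[0], p[1]))   # seed store s_1
    seq, rest = [a], set(stores) - {a}
    while rest:
        nxt = outer(rest, key=lambda s: inner(dist(s, t) for t in seq))
        seq.append(nxt)
        rest.remove(nxt)
    return seq
```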

Computational Experiments

The heuristics described in the previous section have been implemented in C language. IBM ILOG CPLEX 12.6 [21] was used as MIP solver. Reported computing times are CPU seconds on an Intel Xeon E3-1220 V2 quad-core PC with 16GB of RAM. For each run, a time limit of 900 s (15 min) was imposed.

Four heuristic methods have been compared: the three constructive heuristics random, most_dissimilar, and most_similar of section “Working with a Subset of Stores,” plus a fourth method, fast_heu, a refinement heuristic based on the neighborhoods described in the previous subsections.

All heuristics are used in a multi-start mode, i.e., after completion they are restarted from scratch until the time limit is exceeded. At each restart, the internal random seed is not reset; hence, all methods natively using a random permutation (namely, fast_heu and random) will follow a different search path at each run, as the permutations will be different. As to most_dissimilar and most_similar, after each restart the sequence \(s_1, \ldots, s_{|S|}\) is slightly perturbed by five random pair swaps. In addition, after each restart the CPLEX random seed is changed so as to inject diversification among runs even within the MIP solver.

Due to their heuristic nature, our methods—though deterministic—exhibit a large dependency on the initial conditions, including the random seeds used both within our code and in CPLEX. We therefore repeated several times each experiment, starting with different (internal/CPLEX) random seeds at each run, and also report average figures.

In case all capacities are even (as is the case in our test bed), we compute the following trivial lower bound based on constraint (41)

$$ LB := \min\{\alpha, \beta\} \cdot \left| \left\{s \in S: \sum_{i\in I} r_{is} \mbox{ is odd} \right\} \right| $$
(42)

and abort the execution as soon as we find a solution whose value meets the lower bound.
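
Computing (42) is straightforward (plain Python sketch; the nested layout r[s][i], mapping each store to its demand vector, is an illustrative assumption):

```python
def parity_lower_bound(r, alpha, beta):
    """Trivial lower bound (42), valid when all bin capacities are even:
    every store whose total demand is odd pays at least min(alpha, beta)."""
    odd_stores = sum(1 for s in r if sum(r[s].values()) % 2 == 1)
    return min(alpha, beta) * odd_stores
```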

Test Bed

Our test bed coincides with the benchmark proposed in [20] and contains a number of subinstances of a real-world problem (named AllColor58) with 58 stores that require 24 (= 6 × 4) different items: T-shirts available in six different sizes and four different colors (black, blue, red, and green). The available box capacities are 4, 6, 8, and 10. Finally, each item has a given overstock limit (0 or 1) for all stores but no understock limits, and the overstock and understock penalties are β = 1 and α = 10, respectively.

Note that our testing environment is identical to that used in [20] (assuming the PCs used are substantially equivalent), so our results can fairly be benchmarked against those reported therein.

Comparison Metric

To better compare the performance of our different heuristics, we also make use of an indicator recently proposed in [1, 2] and aimed at measuring the trade-off between the computational effort required to produce a solution and the quality of the solution itself. In particular, let \(\tilde{z}_{opt}\) denote the optimal solution value, and let z(t) be the value of the best heuristic solution found up to time t. Then, a primal gap function p can be computed as

$$\displaystyle \begin{aligned} p(t) = \left\{\begin{array}{ll} 1 & \mbox{if no incumbent found until time {$t$}} \\ \gamma(z(t)) & \mbox{otherwise} \end{array} \right. \end{aligned} $$
(43)

where γ(⋅) ∈ [0, 1] is the primal gap, defined as follows

$$\displaystyle \begin{aligned} \gamma(z) = \left\{\begin{array}{ll} 0 & \mbox{if {$|\tilde{z}_{opt}| = |z| = 0$},} \\ 1 & \mbox{if {$\tilde{z}_{opt} \cdot z < 0$},} \\ \frac{z - \tilde{z}_{opt}}{\max\{|\tilde{z}_{opt}|, |z| \}} & \mbox{otherwise.} \end{array} \right. \end{aligned} $$
(44)

Finally, the primal integral of a run until time \(t_{\max }\) is defined as

$$\displaystyle \begin{aligned} P(t_{\max}) = \frac{\int_{0}^{t_{\max}} p(t) \, \mathrm{d} t}{t_{\max}} {}\end{aligned} $$
(45)

and is actually used to measure the quality of primal heuristics—the smaller \(P(t_{\max })\), the better the expected quality of the incumbent solution if we stopped computation at an arbitrary time before \(t_{\max }\).
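
For reference, the primal integral (45) can be computed from the incumbent trace of a run as follows (plain Python sketch; events is an assumed list of (time, incumbent value) pairs sorted by time):

```python
def primal_integral(events, z_opt, t_max, eps=1e-12):
    """Primal integral (45): integrate the step function p(t) of (43)-(44),
    with p(t) = 1 before the first incumbent is found."""
    def gap(z):                                   # primal gap (44)
        if abs(z_opt) < eps and abs(z) < eps:
            return 0.0
        if z_opt * z < 0:
            return 1.0
        return abs(z - z_opt) / max(abs(z_opt), abs(z))
    total, t_prev, p_prev = 0.0, 0.0, 1.0
    for t, z in events:
        if t > t_max:
            break
        total += p_prev * (t - t_prev)            # p is constant between events
        t_prev, p_prev = t, gap(z)
    total += p_prev * (t_max - t_prev)            # tail up to t_max
    return total / t_max
```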

Computational Results

We addressed the instances provided in [20], with the aim of benchmarking our matheuristics against the methods therein proposed. Results for the easiest cases involving only NB = 4 box configurations (namely, instances Black58, Blue58, Red58, and Green58) are not reported as the corresponding MIP model can be solved to proven optimality in less than one second by our solver—thus confirming the figures given in [20].

Tables 2 and 3 report the performance of various heuristics in terms of solution value and time and refer to a single run for each heuristic and for each instance.

Table 2 Performance of CPLEX and LNS heuristics from [20]. Single run for each instance. Times in CPU seconds (time limit of 900 s)
Table 3 Performance of matheuristics. Single run for each instance. Times in CPU seconds (time limit of 900 s)

Table 2 is taken from [20], where a two-phase hybrid metaheuristic was proposed. In the first phase, the approach uses a memetic algorithm to explore the solution space and builds a pool of interesting box configurations. In the second phase, a box-to-store assignment problem is solved to choose a subset of configurations from the pool—and to decide how many boxes of each configuration should be sent to each store. The box-to-store assignment problem is modeled as a (very hard in practice) MIP and heuristically solved either by a commercial solver (CPLEX) or by a sophisticated large-neighborhood search (LNS) approach.

Table 3 reports the performance of our four matheuristics, as well as the lower bound value LB computed through (42)—this latter value turned out to coincide with the optimal value for all instances under consideration in the present subsection.

Comparing Tables 2 and 3 shows that matheuristics outperform the LNS heuristics analyzed in [20]. In particular, fast_heu is able to find very good solutions (actually, an optimal one in 3 out of 4 cases) within very short computing times. For the largest instance (AllColor58), however, most_similar qualifies as the best heuristic both in terms of quality and speed.

To get more reliable information about the matheuristics’ performance, we ran them 100 times for each instance, with different random seeds, and took detailed statistics on each run. Table 4 reports, for each instance and for each heuristic, the average completion time (time), the average time to find its best solution (time_best), the primal integral after 900 s (pint, the lower the better), and the number of provably optimal solutions found (#opt) out of the 100 runs. Note that, for all instances, a solution matching the simple lower bound (42) was eventually found by at least one of our heuristics. The 100-run statistics confirm that fast_heu is very effective in all cases, though it is outperformed by most_similar on the largest instance AllColor58 with respect to the #opt criterion. The results suggest that a hybrid method running fast_heu and most_similar (possibly in parallel) qualifies as a robust heuristic with a very good performance on all instances.

Table 4 Average performance (out of 100 runs) of our heuristics

Application 3: Vehicle Routing

In this section we address the NP-hard distance-constrained capacitated vehicle routing problem (DCVRP) that can be defined as follows. We are given a central depot and a set of n − 1 customers, which are associated with the nodes of a complete undirected graph G = (V, E) where |V | = n, node 1 representing the depot. Each edge [i, j] ∈ E has an associated finite cost c ij ≥ 0. Each node j ∈ V has a request d j ≥ 0 (d 1 = 0 for depot node 1). Customers need to be served by k cycles (routes) passing through the depot, where k is fixed in advance. Each route must have a total duration (computed as the sum of the edge costs in the route) not exceeding a given limit D and can visit a subset S of customers whose total request ∑jS d j does not exceed a given capacity C. The problem then consists of finding a feasible solution covering exactly once all the nodes v ∈ V ∖{1} and having a minimum overall cost; see, e.g., [3, 31].

We will next outline the refinement matheuristic for DCVRP proposed in [15].

The ASSIGN Neighborhood for TSP

Sarvanov and Doroshko (SD) investigated in [28] the so-called ASSIGN neighborhood for the pure traveling salesman problem (TSP), i.e., for the problem of finding a min-cost Hamiltonian cycle in a graph. Given a certain TSP solution (viewed as the node sequence \(\langle v_1 = 1, v_2, \cdots, v_n \rangle\)), the neighborhood contains all the ⌊n∕2⌋! TSP solutions that can be obtained by permuting, in any possible way, the nodes in even position in the original sequence. In other words, any solution \((\psi_1, \psi_2, \cdots, \psi_n)\) in the neighborhood is such that \(\psi_i = v_i\) for all odd i. An interesting feature of the neighborhood is that it can be explored exactly in polynomial time, though it contains an exponential number of solutions. Indeed, for any given starting solution, the min-cost TSP solution in the corresponding ASSIGN neighborhood can be found efficiently by solving a min-cost assignment problem on a ⌊n∕2⌋×⌊n∕2⌋ cost matrix; see, e.g., [18]. Starting from a given solution, an improving heuristic then consists of exploring the ASSIGN neighborhood according to the following two phases:

  • node extraction, during which the nodes in even position (w.r.t. the current solution) are removed from the tour, thus leaving an equal number of “free holes” in the sequence;

  • node reinsertion, during which the removed nodes are reallocated to the available holes in an optimal way by solving a min-sum assignment problem.

The simple example in Fig. 1 illustrates the kind of “improving moves” involved in the method. The figure shows part of a tour, corresponding to the node sequence \(\langle v_1, v_2, \cdots, v_9 \rangle\). In the node extraction phase, the nodes in even position \(v_2\), \(v_4\), \(v_6\), and \(v_8\) are removed from the sequence, whereas all the other nodes retain their position. In Fig. 1b the black nodes represent the fixed ones, while the holes left by the extracted nodes are represented as white circles. If we use the symbol “−” to represent a free hole, the sequence corresponding to Fig. 1b is therefore \(\langle v_1, -, v_3, -, v_5, -, v_7, -, v_9 \rangle\). The second step of the procedure, i.e., the optimal node reallocation, is illustrated in Fig. 1c, where nodes \(v_4\) and \(v_6\) swap their positions, whereas \(v_2\) and \(v_8\) are reallocated as in the original sequence. This produces the improved partial tour \(\langle v_1, v_2, v_3, v_6, v_5, v_4, v_7, v_8, v_9 \rangle\).

Fig. 1 A simple example of node extraction and reallocation

In the example, the same final tour could have been obtained by a simple 2-opt move. However, in more realistic cases the number of possible reallocations is exponential in the number of extracted nodes; hence, the possible reallocation patterns are much more complex and allow, e.g., for a controlled worsening of some parts of the solution that is compensated by large improvements in other parts.
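
A compact sketch of the exact exploration of the ASSIGN neighborhood (Python with scipy; the nested cost matrix c and the tour encoding are illustrative assumptions) may help clarify the two phases:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def explore_assign_neighborhood(tour, c):
    """Extract the nodes in even position of the tour and reinsert them
    optimally into the free holes via a min-cost assignment: each hole's
    cost only involves its two fixed neighbors, so the problem decomposes
    into an assignment on a floor(n/2) x floor(n/2) matrix."""
    n = len(tour)
    holes = list(range(1, n, 2))              # 0-based indices of even positions
    extracted = [tour[p] for p in holes]
    # cost of placing node v in hole p = edges to the two fixed neighbors
    cost = np.array([[c[tour[p - 1]][v] + c[tour[(p + 1) % n]][v] for p in holes]
                     for v in extracted])
    rows, cols = linear_sum_assignment(cost)  # optimal node-to-hole matching
    new_tour = tour[:]
    for r, k in zip(rows, cols):
        new_tour[holes[k]] = extracted[r]
    return new_tour
```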

From TSP to DCVRP

One can conjecture that the ASSIGN neighborhood works well when applied to VRP problems. Indeed, due to the presence of several routes and of the associated route constraints, in VRP problems the node sequence is not the only issue to be considered when constructing a good solution: an equally important aspect of the optimization is to find a balanced distribution of the nodes among the routes. In this respect, heuristic refinement procedures involving complex patterns of node reallocations among the routes are likely to be quite effective in practice.

We can therefore extend the SD method to DCVRP so as to allow for more powerful move patterns, while generalizing its basic scheme so as to replace the overly simple min-sum assignment model for node reallocation with a more flexible reallocation model based on the (heuristic) solution of a more complex MIP. The resulting matheuristic will be introduced, step by step, with the help of some illustrative examples.

Let us consider Fig. 2, where a nonoptimal part of a VRP solution is depicted. It is clear that the position of node \(v_3\) is not very clever, in that inserting \(v_3\) between nodes \(v_1\) and \(v_2\) is likely to produce a better solution (assuming this new solution is not infeasible because of the route constraints). Even if \(v_3\) were an even-position node, however, this move would be beyond the possibility of the pure SD method, where the extracted nodes can only be assigned to a hole left free by the removal of another node—while no hole between \(v_1\) and \(v_2\) exists which could accommodate \(v_3\). The example then suggests a first extension of the basic SD method, consisting of removing the 1-1 correspondence between extracted nodes and empty holes. We therefore consider the concept of insertion point: after having extracted the selected nodes, we construct a restricted solution through the remaining nodes, obtained from the original one by short-cutting the removed nodes. All the edges in the restricted solution are then viewed as potential insertion points for the extracted nodes. In the example, removing \(v_3\) but not \(v_1\) and \(v_2\) would produce the restricted solution depicted in Fig. 3a, where all dashed edges are possible insertion points for the extracted nodes—this allows the method to produce the solution in Fig. 3b.
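As a small illustration of the insertion-point idea, the sketch below builds the restricted route and its insertion points by short-cutting the extracted nodes; the list-based route encoding (depot first, routes treated as closed cycles) is an assumption of ours, not a prescription from [15].

```python
def restrict_route(route, extracted):
    """Short-cut the nodes in `extracted` out of `route` and return the
    restricted route together with its insertion points.

    `route` is a list of node indices with the depot first; the route
    is closed, so the edge back to the depot is an insertion point too.
    """
    restricted = [v for v in route if v not in extracted]
    insertion_points = [(restricted[k], restricted[(k + 1) % len(restricted)])
                        for k in range(len(restricted))]
    return restricted, insertion_points

# Example: extracting node 3 from the route 0-1-3-2 (0 = depot) leaves
# restrict_route([0, 1, 3, 2], {3}) == ([0, 1, 2], [(0, 1), (1, 2), (2, 0)]),
# so node 3 may now be reinserted into any of the three remaining edges.
```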

Fig. 2

The assignment of node \(v_3\) to route \(r_1\) is nonoptimal

Fig. 3

Improving the solution depicted in Fig. 2

A second important extension comes from a more flexible node extraction scheme that allows for the removal of sequences of nodes; see Fig. 4 for an illustration. Once the node sequences have been extracted, one can use a heuristic procedure to generate new sequences through the extracted nodes, to be allocated to different insertion points. To be more specific, starting from the extracted node sequences, one can create new derived sequences that combine the extracted nodes in a different way and consider the possibility of assigning each derived sequence to a different insertion point. Of course, one never knows in advance which sequences are best, so all the (original and derived) sequences should be available when solving the reallocation problem.

Fig. 4

Removing a sequence of nodes to better reallocate them (possibly in a different order)

The above considerations call for a reallocation model which goes far beyond the scope of the original one, based on the solution of an easy min-cost assignment problem. Indeed, the new reallocation model becomes a MIP that receives as input the set of insertion points along with a (typically large) set of node sequences through the extracted nodes, and provides an (almost) optimal allocation of at most one sequence to each insertion point, with the constraint that each extracted node has to belong to one of the allocated sequences, while fulfilling the capacity and distance constraints on the routes. This model will be described in more detail in the next section.

The Overall Matheuristic

Here is a possible implementation of the ideas outlined in the previous section, leading to the so-called selection, extraction, recombination, and reallocation (SERR) matheuristic.

  (i) (Initialization). Apply a fast heuristic method to find a first (possibly infeasible, see below) DCVRP solution.

  (ii) (Selection). Apply one of the available criteria (to be described later) to determine the nodes to be extracted—the nodes need not be consecutive, and any node subset qualifies as a valid choice.

  (iii) (Extraction). Extract the nodes selected in the previous step and construct the corresponding restricted DCVRP solution obtained by short-cutting the extracted nodes. All edges in the restricted solution are put in the list \(\mathcal {I}\) of the available insertion points.

  (iv) (Recombination). The node sequences extracted in the previous step (called basic in the sequel) are initially stored in a sequence pool. Thereafter, heuristic procedures (to be described later) are applied to derive new sequences through the extracted nodes, which are added to the sequence pool. During this phase, dual information derived from the LP relaxation of the reallocation model can be used to find new profitable sequences—the so-called pricing step. Each sequence s in the final pool is then associated with a (heuristically determined) subset \(\mathcal {I}_s\) of the available insertion points in \(\mathcal {I}\). For each basic sequence s, we assume that \(\mathcal {I}_s\) contains (among others) the pivot insertion point associated with s in the original tour, so that the original solution can be retrieved by simply reallocating each basic sequence to its pivot insertion point.

  (v) (Reallocation). A suitable MIP (to be described later in greater detail) is set up and solved heuristically through a general-purpose MIP solver. This model has a binary variable \(x_{si}\) for each pair (s, i), where s is a sequence in the pool and \(i\in \mathcal {I}_s\), whose value 1 means that s has to be allocated to i. The constraints in the MIP stipulate that each extracted node has to be covered by exactly one of the selected sequences, while each insertion point i can be associated with at most one sequence. Further constraints impose the capacity and distance restrictions on each route. Once an (almost) optimal MIP solution has been found, the corresponding DCVRP solution is constructed and the current best solution is possibly updated (in which case each route in the new solution is processed by a 3-opt [26] exchange heuristic in an attempt to improve it further).

  (vi) (Termination). If at least one improved DCVRP solution has been found in the last n iterations, we repeat from step (ii); otherwise the method terminates.
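Putting steps (i)–(vi) together, the loop below sketches one possible SERR driver. Every helper function (select_nodes, extract, recombine, solve_reallocation, cost, three_opt) is a hypothetical placeholder for the corresponding component described above and detailed in the following subsections; the stall counter mirrors the termination rule of step (vi).

```python
import random

def serr(instance, first_solution, n_stall):
    """High-level sketch of the SERR loop; all helpers below are
    hypothetical placeholders for the steps described in the text."""
    best = current = first_solution                        # step (i)
    stall = 0
    while stall < n_stall:                                 # step (vi)
        scheme = random.choice(
            ["random-alternate", "scattered", "neighborhood"])
        extracted = select_nodes(current, scheme)          # step (ii)
        restricted, ins_pts = extract(current, extracted)  # step (iii)
        pool = recombine(extracted, ins_pts, instance)     # step (iv)
        candidate = solve_reallocation(pool, restricted, instance)  # step (v)
        if candidate is not None and cost(candidate) < cost(best):
            best = three_opt(candidate, instance)          # per-route 3-opt polish
            current, stall = best, 0
        else:
            stall += 1
    return best
```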

Finding a Starting Solution

Finding a DCVRP solution that is guaranteed to be feasible is an NP-hard problem; hence, we have to content ourselves with the construction of solutions that, in some hard cases, may be infeasible—typically because the total-distance-traveled constraint is violated for some routes. In this case, the overall infeasibility of the starting solution can hopefully be driven to zero by a modification of the SERR reallocation model where the capacity and distance constraints are treated in a soft way through the introduction of highly penalized slack variables.

As is customary in VRP problems, we assume that each node is assigned a coordinate pair (x, y) giving the geographical position of the corresponding customer/depot on a two-dimensional map.

One option for the initialization of the current solution required at step (i) of the SERR method is to apply the classical two-phase method of Fisher and Jaikumar (FJ) [14]. This method can be implemented in a very natural way in our context in that it is based on a (heuristic) solution of a MIP whose structure is close to that of the reallocation model.

According to computational experience, however, the solution provided by the FJ heuristic is sometimes “too balanced,” in the sense that the routes are filled so tightly that not enough freedom is left for the subsequent steps of the SERR procedure. Better results are sometimes obtained by starting from a less-optimized solution whose cost significantly exceeds the optimal one, such as the one obtained by a simplified SWEEP method [16].

A second possibility is instead to start from an extremely good solution provided by a highly effective (and time-consuming) heuristic or metaheuristic, in an attempt to improve this solution even further.

Node Selection Criteria

At each execution of step (ii), one of the following selection schemes is applied.

  • scheme Random-Alternate: This criterion is akin to the SD one and selects, in some randomly chosen routes, all the nodes in even position, while in the remaining routes the extracted nodes are those in odd position—the position parity being determined by visiting each route in a random direction.

  • scheme Scattered: Each node has a uniform probability of 50% of being extracted; this scheme allows for the removal of consecutive nodes, i.e., of route subsequences.

  • scheme Neighborhood: Here one concentrates on a seed node, say \(v^*\), and removes each node v with a probability that is inversely proportional to the distance \(c_{vv^*}\) of v from \(v^*\).

Schemes Random-Alternate and Scattered appear particularly suited to improving the first solutions, whereas the Neighborhood scheme seems more appropriate for the solutions available after the first iterations.
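For concreteness, the three schemes could be implemented along the following lines; the route encoding (lists with the depot at position 0) and the distance scaling used in the Neighborhood scheme are assumptions of this sketch, not details fixed by [15].

```python
import random

def random_alternate(routes):
    """Random-Alternate: per route, extract either the even- or the
    odd-position nodes, the parity being picked at random (the random
    visiting direction of each route is omitted for brevity)."""
    out = set()
    for r in routes:
        out |= set(r[random.choice((1, 2))::2])  # r[0] is the depot
    return out

def scattered(routes):
    """Scattered: every customer is extracted with probability 0.5."""
    return {v for r in routes for v in r[1:] if random.random() < 0.5}

def neighborhood(routes, c, seed):
    """Neighborhood: extract node v with probability inversely
    proportional to c[v][seed], scaled so that the node closest to the
    seed has probability 1 (the scaling is our own assumption)."""
    nodes = [v for r in routes for v in r[1:] if v != seed]
    dmin = min(c[v][seed] for v in nodes)
    return {seed} | {v for v in nodes if random.random() < dmin / c[v][seed]}
```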

Reallocation Model

Given the sequences stored in the pool and the associated insertion points (defined through the heuristics outlined in the next subsection), our aim is to reallocate the sequences so as to find a feasible solution of improved cost (if any). To this end, we need to introduce some additional notation.

Let \(\mathcal {F}\) denote the set of the extracted nodes, \(\mathcal {S}\) the sequence pool, and \(\mathcal {R}\) the set of routes r in the restricted solution. For any sequence \(s \in \mathcal {S}\), let c(s) be the sum of the costs of the edges in the sequence, and let d(s) be the sum of the requests \(d_j\) associated with the internal nodes of s.

For each insertion point \(i\in \mathcal {I}\), we define the extra-cost \(\gamma_{si}\) for assigning sequence s (in its best possible orientation) to the insertion point i. For each route \(r\in \mathcal {R}\) in the restricted solution, let \(\mathcal {I}(r)\) denote the set of the insertion points (i.e., edges) associated with r, and let \(\tilde {d}(r)\) and \(\tilde {c}(r)\) denote, respectively, the total request and distance computed for route r—still in the restricted tour. As already mentioned, our MIP model is based on the following decision variables:

$$\displaystyle \begin{aligned} x_{si}=\begin{cases} 1 & \mbox{if sequence } s \mbox{ is allocated to insertion point } i\in \mathcal{I}_s\\ 0 & \mbox{otherwise}\end{cases}{} \end{aligned} $$
(46)

The model then reads:

$$\displaystyle \begin{aligned} \min\; \sum_{r \in \mathcal{R}} \tilde{c}(r) + \sum_{s\in\mathcal{S}}\sum_{i\in \mathcal{I}_s} \gamma_{si} x_{si} {} \end{aligned} $$
(47)

subject to:

$$\displaystyle \begin{gathered} {\displaystyle \sum_{s\in\mathcal{S}:\, v\in s}\;\sum_{i\in\mathcal{I}_s}x_{si}=1\,\,\,\forall v\in\mathcal{F}}{} \end{gathered} $$
(48)
$$\displaystyle \begin{gathered} {\displaystyle \sum_{s\in\mathcal{S}: i \in \mathcal{I}_s}x_{si}\le1}\,\,\,\forall i\in\mathcal{I}{} \end{gathered} $$
(49)
$$\displaystyle \begin{aligned} {\displaystyle \tilde d(r)+\sum_{s\in\mathcal{S}}\sum_{i\in \mathcal{I}_s \cap \mathcal{I}(r)}d(s)x_{si}\le C ~~~~ \forall r\in\mathcal{R}}{} \end{aligned} $$
(50)
$$\displaystyle \begin{aligned} {\displaystyle \tilde{c}(r)+\sum_{s\in\mathcal{S}}\sum_{i\in \mathcal{I}_s \cap \mathcal{I}(r)}\gamma_{si} x_{si}\le D ~~~~ \forall r\in\mathcal{R}}{} \end{aligned} $$
(51)
$$\displaystyle \begin{aligned} x_{si}\in\{0,1\} ~~~~ \forall s\in\mathcal{S}, \ i\in\mathcal{I}_s{} \end{aligned} $$
(52)

The objective function, to be minimized, gives the cost of the final DCVRP solution. Indeed, each objective coefficient gives the MIP cost of an inserted sequence, including the linking cost, minus the cost of the “saved” edge in the restricted solution. Constraints (48) impose that each extracted node belongs to exactly one of the selected sequences, i.e., that it is covered exactly once in the final solution. Note that, in the case of triangular costs, one could replace = by ≥ in (48), thus obtaining a typically easier-to-solve MIP having the structure of a set-covering (instead of set-partitioning) problem with side constraints. Constraints (49) prevent the same insertion point from being used to allocate two or more sequences. Finally, constraints (50) and (51) impose that each route in the final solution fulfills the capacity and distance restrictions, respectively.

To avoid overloading the model with an excessive number of variables, particular attention has to be paid to reducing the number of sequences and, for each sequence, the number of associated insertion points.
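As an illustration, model (46), (47), (48), (49), (50), (51), and (52) could be stated with an open-source modeler such as PuLP as follows. This is a sketch, not the implementation of [15] (which relied on ILOG Cplex): all data containers are assumed to be prepared beforehand, and sequences are assumed to be tuples of nodes so that membership tests work.

```python
import pulp

def reallocation_mip(pool, routes, gamma, d_seq, d_tilde, c_tilde,
                     extracted, C, D):
    """Sketch of the reallocation model.

    pool: dict sequence s -> admissible insertion points I_s (sequences
    are tuples of nodes); routes: dict r -> set of insertion points
    I(r); gamma[s, i]: extra-cost of allocating s to i; d_seq[s]:
    demand of s; d_tilde[r], c_tilde[r]: demand/length of the
    restricted route r; extracted: set of removed nodes.
    """
    prob = pulp.LpProblem("reallocation", pulp.LpMinimize)
    keys = [(s, i) for s in pool for i in pool[s]]
    x = pulp.LpVariable.dicts("x", keys, cat="Binary")           # (46)/(52)
    # (47): the restricted-route costs sum(c_tilde) are a constant
    prob += pulp.lpSum(gamma[s, i] * x[s, i] for (s, i) in keys)
    for v in extracted:                                          # (48)
        prob += pulp.lpSum(x[s, i] for (s, i) in keys if v in s) == 1
    for i in {i for pts in pool.values() for i in pts}:          # (49)
        prob += pulp.lpSum(x[s, j] for (s, j) in keys if j == i) <= 1
    for r, I_r in routes.items():                                # (50)-(51)
        prob += d_tilde[r] + pulp.lpSum(
            d_seq[s] * x[s, i] for (s, i) in keys if i in I_r) <= C
        prob += c_tilde[r] + pulp.lpSum(
            gamma[s, i] * x[s, i] for (s, i) in keys if i in I_r) <= D
    prob.solve()
    return {(s, i) for (s, i) in keys if x[s, i].value() > 0.5}
```

The soft variant mentioned earlier for infeasible starting solutions would amount to adding nonnegative slack variables to the last two constraint families and penalizing them heavily in the objective.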

Node Recombination and Construction of Derived Sequences

This is a crucial step of the SERR method. It consists not just of generating new “good” sequences through the extracted nodes but also of associating each sequence with a clever set of possible insertion points that can conveniently accommodate it. There are therefore two complementary approaches to attack this problem: (a) start from the insertion points and, for each insertion point, try to construct a reasonable number of new sequences which are likely to “fit well” or (b) start from the extracted nodes and try to construct new sequences of small cost, regardless of the position of the insertion points. The following two-phase method turned out to be a good strategy in practice.

In the first phase, the sequence pool is initialized by means of the original (basic) sequences, and each of them is associated with its corresponding (pivot) insertion point. This choice guarantees that the current DCVRP solution can be reconstructed by simply selecting all the basic sequences and putting them back in their pivot insertion points. Moreover, when the Neighborhood selection scheme is used, a further set of sequences is generated as follows. Let \(v^*\) be the extracted seed node, and let \(N(v^*)\) contain \(v^*\) plus the, say, k closest extracted nodes (k = 4 in our implementation). A complete enumerative scheme is applied to generate all the sequences through \(N(v^*)\), which are added to the pool; a possible rendering of this enumeration is sketched below. This choice is intended to increase the chances of locally improving the current solution, by exploiting appropriate sequences to reallocate the nodes in \(N(v^*)\) in an optimal way.
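The sketch assumes, as before, that sequences are stored as tuples of nodes and that c is the distance matrix; the function name is hypothetical.

```python
from itertools import permutations

def enumerate_local_sequences(seed, extracted, c, k=4):
    """All sequences through N(seed) = seed plus its k closest extracted
    nodes; with k = 4 this yields 5! = 120 sequences for the pool."""
    others = sorted((v for v in extracted if v != seed),
                    key=lambda v: c[v][seed])
    N = [seed] + others[:k]
    return [tuple(p) for p in permutations(N)]
```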

The second phase is only applied for the Neighborhood and Scattered selection schemes and corresponds to a pricing loop based on the dual information available after solving the LP relaxation of the current reallocation model. The reader is again referred to [15] for details.

Examples

Extensive computational results for the SERR matheuristic have been reported in [15] and are omitted here for reasons of space. We only report, in Figs. 5 and 6, a plot of the incumbent SERR solution over time for some instances from the literature. Computing time is given in CPU seconds of an old AMD Athlon XP 2400+ PC with 1 GByte RAM, using ILOG Cplex 8.0 as MIP solver. The figures show that SERR is able to significantly improve the starting solution in the early part of its computation.

Fig. 5

Time evolution of the SERR solution for various CVRP instances, with FJ [14] initial solution

Fig. 6

Time evolution of the SERR solution for various CVRP instances, with SWEEP [16] initial solution

Conclusion

The contamination of metaheuristics with mathematical programming leads to the concept of “matheuristics.” The result is a general approach to designing mathematically sound heuristics. In this chapter we presented the main ideas underlying matheuristics and used some case studies to illustrate them. For each application, we described the specific problem at hand, the mathematical programming model that formalizes it, and the way the model—or a simplification of it—can be used to produce heuristic (as opposed to exact) solutions in an effective way.

Cross-References