
3.1 Introduction

Optimization tools have greatly improved during the last two decades. This is due to several factors: (1) progress in mathematical programming theory and algorithmic design; (2) rapid improvement in computer performance; (3) better communication of new ideas and their integration into widely used, complex software. Consequently, many problems long viewed as out of reach are currently solved, sometimes in very moderate computing times. This success, however, has led researchers and practitioners to address much larger instances and more difficult classes of problems. Many of these may again only be solved heuristically. Therefore thousands of papers describing, evaluating and comparing new heuristics appear each year. Keeping abreast of such a large literature is a challenge. Metaheuristics, or general frameworks for building heuristics, are therefore needed in order to organize the study of heuristics. As evidenced by the Handbook, there are many of them. Some desirable properties of metaheuristics [58, 59, 68] are listed in the concluding section of this chapter.

Variable neighborhood search (VNS) is a metaheuristic proposed by some of the present authors about 20 years ago [80]. Earlier work that motivated this approach can be found in [25, 36, 44, 78]. It is based upon the idea of a systematic change of neighborhood, both in a descent phase to find a local optimum and in a perturbation phase to get out of the corresponding valley. Originally designed for the approximate solution of combinatorial optimization problems, it was extended to address mixed integer programs, nonlinear programs and, recently, mixed integer nonlinear programs. In addition, VNS has been used as a tool for automated or computer-assisted graph theory. This led to the discovery of over 1500 conjectures in that field and the automated proof of more than half of them. This is to be compared with the unassisted proof of about 400 of these conjectures by many different mathematicians.

Applications are rapidly increasing in number and pertain to many fields: location theory, cluster analysis, scheduling, vehicle routing, network design, lot-sizing, artificial intelligence, engineering, pooling problems, biology, phylogeny, reliability, geometry, telecommunication design, etc. References are too numerous to be listed here, but many of them can be found in [69] and special issues of IMA Journal of Management Mathematics [76], European Journal of Operational Research [68] and Journal of Heuristics [87] that are devoted to VNS.

This chapter is organized as follows. In the next section we present the basic schemes of VNS, i.e., variable neighborhood descent (VND), reduced VNS (RVNS), basic VNS (BVNS) and general VNS (GVNS). Two important extensions are presented in Sect. 3.3: Skewed VNS and Variable Neighborhood Decomposition Search (VNDS). A further recent development called Formulation Space Search (FSS) is discussed in Sect. 3.4. The remainder of the chapter describes applications of VNS to several classes of large-scale and complex optimization problems for which it has proven to be particularly successful. Section 3.5 is devoted to primal-dual VNS (PD-VNS) and its application to location and clustering problems. Finding feasible solutions to large mixed integer linear programs with VNS is discussed in Sect. 3.6. Section 3.7 addresses ways to apply VNS in continuous global optimization. The more difficult case of solving mixed integer nonlinear programming problems by VNS is considered in Sect. 3.8. Applying VNS to graph theory per se (and not just to particular optimization problems defined on graphs) is discussed in Sect. 3.9. Brief conclusions are drawn in Sect. 3.10.

3.2 Basic Schemes

A deterministic optimization problem may be formulated as

$$\displaystyle{ \min \{f(x)\vert x \in X,X \subseteq \mathcal{S}\}, }$$
(3.1)

where \(\mathcal{S},X,x\) and f denote the solution space, the feasible set, a feasible solution and a real-valued objective function, respectively. If \(\mathcal{S}\) is a finite but large set, a combinatorial optimization problem is defined. If \(\mathcal{S} = \mathbb{R}^{n}\), we refer to continuous optimization. A solution \(x^{{\ast}} \in X\) is optimal if

$$\displaystyle{f(x^{{\ast}}) \leq f(x),\;\forall x \in X.}$$

An exact algorithm for problem (3.1), if one exists, finds an optimal solution \(x^{{\ast}}\), together with the proof of its optimality, or shows that there is no feasible solution, i.e., X = ∅, or the solution is unbounded. Moreover, in practice, the time needed to do so should be finite (and not too long). For continuous optimization, it is reasonable to allow for some degree of tolerance, i.e., to stop when sufficient convergence is detected.

Let us denote by \(\mathcal{N}_{k}\) (k = 1, …, k max) a finite set of pre-selected neighborhood structures, and by \(\mathcal{N}_{k}(x)\) the set of solutions in the kth neighborhood of x. Most local search heuristics use only one neighborhood structure, i.e., k max = 1. Often successive neighborhoods \(\mathcal{N}_{k}\) are nested and may be induced from one or more metric (or quasi-metric) functions introduced into the solution space \(\mathcal{S}\). An optimal solution x opt (or global minimum) is a feasible solution where the minimum of (3.1) is reached. We call x′ ∈ X a local minimum of (3.1) with respect to \(\mathcal{N}_{k}\) (w.r.t. \(\mathcal{N}_{k}\) for short) if there is no solution \(x \in \mathcal{N}_{k}(x') \subseteq X\) such that f(x) < f(x′). Metaheuristics (based on local search procedures) try to continue the search by other means after finding the first local minimum. VNS is based on three simple facts:

Fact 1

A local minimum w.r.t. one neighborhood structure is not necessarily so for another;

Fact 2

A global minimum is a local minimum w.r.t. all possible neighborhood structures;

Fact 3

For many problems, local minima w.r.t. one or several \(\mathcal{}N_{k}\) are relatively close to each other.

This last observation, which is empirical, implies that a local optimum often provides some information about the global one. For instance, there may be several variables sharing the same values in both solutions. Since these variables usually cannot be identified in advance, one should conduct an organized study of the neighborhoods of a local optimum until a better solution is found.

In order to solve (3.1) by using several neighborhoods, Facts 1–3 can be used in three different ways: (1) deterministic; (2) stochastic; (3) both deterministic and stochastic.

We first examine in Algorithm 1 the solution move and neighborhood change function that will be used within a VNS framework. Function NeighborhoodChange() compares the incumbent value f(x) with the new value f(x′) obtained from the kth neighborhood (line 1). If an improvement is obtained, the incumbent is updated (line 2) and k is returned to its initial value (line 3). Otherwise, the next neighborhood is considered (line 4).

Algorithm 1 Neighborhood change
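Since the listing of Algorithm 1 is not reproduced here, the following Python sketch, with hypothetical function and variable names, illustrates the move-or-advance logic just described.

```python
def neighborhood_change(x, x_new, k, f):
    """Move-or-advance step of VNS (a sketch of Algorithm 1).

    If the trial point x_new improves on the incumbent x, accept it and
    restart the neighborhood counter; otherwise move to the next neighborhood.
    """
    if f(x_new) < f(x):          # line 1: improvement test
        return x_new, 1          # lines 2-3: update incumbent, reset k
    return x, k + 1              # line 4: keep incumbent, try next neighborhood
```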

Below we discuss Variable Neighborhood Descent and Reduced Variable Neighborhood Search and then build upon this to construct the framework for Basic and General Variable Neighborhood Search.

(i) The Variable Neighborhood Descent (VND) method (Algorithm 2) performs a change of neighborhoods in a deterministic way. These neighborhoods are denoted as N k, k = 1, …, k max.

Algorithm 2 Variable neighborhood descent
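As the listing of Algorithm 2 is not reproduced here, a minimal Python sketch of the sequential VND loop follows; each entry of neighborhoods is a hypothetical routine performing a (best-improvement) search restricted to one neighborhood and returning the best point found there (x itself if no better point exists).

```python
def vnd(x, f, neighborhoods):
    """Variable Neighborhood Descent (a sketch of Algorithm 2)."""
    k = 0
    while k < len(neighborhoods):
        x_new = neighborhoods[k](x)          # best point found in N_{k+1}(x)
        if f(x_new) < f(x):                  # improvement: restart from N_1
            x, k = x_new, 0
        else:                                # no improvement: next neighborhood
            k += 1
    return x                                 # local minimum w.r.t. all k_max neighborhoods
```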

Most local search heuristics use one or sometimes two neighborhoods for improving the current solution (i.e., k max ≤ 2). Note that the final solution should be a local minimum w.r.t. all k max neighborhoods, and thus a global optimum is more likely to be reached than with a single neighborhood structure. Besides this sequential order of neighborhood structures in VND, one can develop a nested strategy. Assume, for example, that k max = 3; then a possible nested strategy is to perform VND with Algorithm 2 for the first two neighborhoods from each point x′ that belongs to the third one (x′ ∈ N 3(x)). Such an approach has been applied successfully in [22, 26, 57].

(ii) The \(\underline{\mathbf{Reduced\ VNS}}\) (RVNS) method is obtained when a random point is selected from \(\mathcal{N}_{k}(x)\) and no descent is attempted from this point. Rather, the value of the new point is compared with that of the incumbent and an update takes place in case of improvement. We also assume that a stopping condition has been chosen, such as the maximum CPU time allowed t max or the maximum number of iterations between two improvements. To simplify the description of the algorithms, we always use t max below. Therefore, RVNS (Algorithm 3) uses two parameters: t max and k max.

Algorithm 3 Reduced VNS

The function Shake in line 4 generates a point x′ at random from the kth neighborhood of x, i.e., \(x' \in \mathcal{N}_{k}(x)\). It is given in Algorithm 4, where it is assumed that the points from \(\mathcal{N}_{k}(x)\) are numbered as \(\{x^{1},\ldots,x^{\vert \mathcal{N}_{k}(x)\vert }\}\). Note that a different notation is used for the neighborhood structures in the shake operation, since these are generally different than the ones used in VND.

Algorithm 4 Shaking function
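The listings of Algorithms 3 and 4 are not reproduced here; the sketch below, with hypothetical helper names, combines the shaking step with the RVNS loop described above. The routine neighborhood(x, k) is assumed to enumerate (or sample) the points of N_k(x).

```python
import random
import time

def shake(x, k, neighborhood):
    """Return a point drawn at random from the kth neighborhood of x (cf. Algorithm 4)."""
    return random.choice(neighborhood(x, k))

def rvns(x, f, neighborhood, k_max, t_max):
    """Reduced VNS (a sketch of Algorithm 3): shake and accept only on improvement."""
    start = time.time()
    while time.time() - start < t_max:
        k = 1
        while k <= k_max:
            x_new = shake(x, k, neighborhood)   # random point in N_k(x), no descent
            if f(x_new) < f(x):                 # NeighborhoodChange step
                x, k = x_new, 1
            else:
                k += 1
    return x
```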

RVNS is useful for very large instances, for which local search is costly. It can also be used for finding initial solutions for large problems before decomposition. It has been observed that the best value for the parameter k max is often 2 or 3. In addition, a maximum number of iterations between two improvements is typically used as the stopping condition. RVNS is akin to a Monte-Carlo method, but is more systematic (see, e.g., [81], where results obtained by RVNS were 30% better than those of the Monte-Carlo method in solving a continuous min-max problem). When applied to the p-median problem, RVNS gave solutions as good as those of the Fast Interchange heuristic of [102] while being 20 to 40 times faster [63].

(iii) The \(\underline{\mathbf{Basic\ VNS}}\) (BVNS) method [80] combines deterministic and stochastic changes of neighborhood. The deterministic part is represented by a local search heuristic. It consists in (1) choosing an initial solution x, (2) finding a direction of descent from x (within a neighborhood N(x)), and (3) moving to the minimum of f within N(x) along that direction. If there is no direction of descent, the heuristic stops; otherwise it is iterated. Usually the steepest descent direction, also referred to as best improvement, is used (see also Algorithm 2, where best improvement is used within each neighborhood of the VND). This is summarized in Algorithm 5, where we assume that an initial solution x is given. The output consists of a local minimum, also denoted by x, and its value.

Algorithm 5 Best improvement (steepest descent) heuristic

As steepest descent may be time-consuming, an alternative is to use a first descent (or first improvement) heuristic. Points x i ∈ N(x) are then enumerated systematically and a move is made as soon as a direction of descent is found. This is summarized in Algorithm 6.

Algorithm 6 First improvement (first descent) heuristic
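A minimal sketch of the two descent variants follows (the listings of Algorithms 5 and 6 are not reproduced here); the neighborhood enumerator neighbors is a hypothetical helper, and N(x) is assumed finite and non-empty.

```python
def best_improvement(x, f, neighbors):
    """Steepest descent (cf. Algorithm 5): move to the best neighbor while it improves."""
    while True:
        best = min(neighbors(x), key=f)      # exhaustive scan of N(x)
        if f(best) >= f(x):
            return x                         # local minimum reached
        x = best

def first_improvement(x, f, neighbors):
    """First descent (cf. Algorithm 6): move as soon as an improving neighbor is found."""
    improved = True
    while improved:
        improved = False
        for y in neighbors(x):               # systematic enumeration of N(x)
            if f(y) < f(x):
                x, improved = y, True
                break
    return x
```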

The stochastic phase of BVNS (see Algorithm 7) is represented by the random selection of a point x′ from the kth neighborhood in the shake operation. Note that point x′ is generated at random in Step 5 in order to avoid cycling, which might occur with a deterministic rule.

Algorithm 7 Basic VNS
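Putting the pieces together, a compact Python sketch of the BVNS loop could look as follows (names are hypothetical); local_search may be either of the descent heuristics sketched above.

```python
import time

def bvns(x, f, shake, local_search, k_max, t_max):
    """Basic VNS (a sketch of Algorithm 7): shake, descend, then move or not."""
    start = time.time()
    while time.time() - start < t_max:
        k = 1
        while k <= k_max:
            x_shaken = shake(x, k)                 # random point in N_k(x)
            x_local = local_search(x_shaken)       # descent from the shaken point
            if f(x_local) < f(x):                  # NeighborhoodChange
                x, k = x_local, 1
            else:
                k += 1
    return x
```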

Example

We illustrate the basic steps on a minimum k-cardinality tree instance taken from [72], see Fig. 3.1. The minimum k-cardinality tree problem on graph G (k-card for short) consists of finding a subtree of G with exactly k edges whose sum of weights is minimum.

Fig. 3.1 4-Cardinality tree problem

Fig. 3.2 Steps of the Basic VNS for solving the 4-cardinality tree problem

The steps of BVNS for solving the 4-card problem are illustrated in Fig. 3.2. In Step 0 the objective function value, i.e., the sum of edge weights, is equal to 40; it is indicated in the bottom right corner of the figure. This first solution is a local minimum with respect to the edge-exchange neighborhood structure (one edge in, one out). After shaking, the objective function value is 60, and after another local search, we are back at the same solution. Then, in Step 3, we take out two edges and add another two at random, and after a local search, an improved solution is obtained with a value of 39. Continuing in that way, the optimal solution with an objective function value of 36 is obtained in Step 8.

(iv) \(\underline{\mathbf{General\ VNS.}}\) Note that the local search step (line 6 in BVNS, Algorithm 7) may also be replaced by VND (Algorithm 2). This General VNS (VNS/VND) approach has led to some of the most successful applications reported in the literature (see, e.g., [1, 26–29, 31, 32, 39, 57, 66, 92, 93]). General VNS (GVNS) is outlined in Algorithm 8 below. Note that neighborhoods \(N_{1},\ldots,N_{l_{max}}\) are used in the VND step, while a different series of neighborhoods \(N_{1},\ldots,N_{k_{max}}\) applies to the Shake step.

Algorithm 8 General VNS
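A sketch of GVNS is then obtained from the BVNS loop by replacing the descent step with VND over its own list of neighborhoods (all names as in the earlier hypothetical sketches).

```python
import time

def gvns(x, f, shake, vnd_neighborhoods, k_max, t_max):
    """General VNS (a sketch of Algorithm 8): shaking plus VND as the local search."""
    start = time.time()
    while time.time() - start < t_max:
        k = 1
        while k <= k_max:
            x_shaken = shake(x, k)                          # perturbation in N_k(x)
            x_local = vnd(x_shaken, f, vnd_neighborhoods)   # VND with its own neighborhood series
            if f(x_local) < f(x):
                x, k = x_local, 1
            else:
                k += 1
    return x
```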

3.3 Some Extensions

(i) The \(\underline{\mathbf{Skewed\ VNS}}\) (SVNS) method [62] addresses the problem of exploring valleys far from the incumbent solution. Indeed, once the best solution in a large region has been found, it is necessary to go quite far to obtain an improved one. Solutions drawn at random in far-away neighborhoods may differ substantially from the incumbent, and VNS may then degenerate, to some extent, into a Multistart heuristic (in which descents are made iteratively from solutions generated at random, and which is known to be inefficient). So some compensation for distance from the incumbent must be made, and a scheme called Skewed VNS (SVNS) is proposed for that purpose. Its steps are presented in Algorithms 9, 10 and 11. The KeepBest(x, x′) function (Algorithm 9) in SVNS simply keeps the better of the solutions x and x′. The NeighborhoodChangeS function (Algorithm 10) performs the move and neighborhood change for SVNS.

Algorithm 9 Keep best solution

Algorithm 10 Neighborhood change for Skewed VNS

Algorithm 11 Skewed VNS

SVNS makes use of a function ρ(x, x″) to measure the distance between the current solution x and the local optimum x″. The distance function used to define \(\mathcal{N}_{k}\) could also be used for this purpose. The parameter α must be chosen to allow movement to valleys far away from x when f(x″) is larger than f(x) but not too much larger (otherwise one will always leave x). A good value for α is found experimentally in each case. Moreover, in order to avoid frequent moves from x to a close solution, one may take a smaller value for α when ρ(x, x″) is small. More sophisticated choices for selecting a function of αρ(x, x″) could be made through some learning process.
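In SVNS the move is accepted when f(x″) − αρ(x, x″) < f(x), i.e., a slightly worse solution is accepted if it lies far from the incumbent. A minimal Python sketch of this relaxed neighborhood-change step follows; names are hypothetical.

```python
def neighborhood_change_skewed(x, x_new, k, f, rho, alpha):
    """Skewed move-or-advance step (a sketch of Algorithm 10).

    Accept x_new if its value, discounted by alpha times its distance from x,
    beats the incumbent; otherwise advance to the next neighborhood.
    """
    if f(x_new) - alpha * rho(x, x_new) < f(x):
        return x_new, 1                      # accept a (possibly slightly worse) far solution
    return x, k + 1
```

Since the accepted point may be worse than the incumbent, SVNS additionally keeps the best solution found so far via KeepBest (Algorithm 9).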

(ii) The Variable neighborhood decomposition search (VNDS) method [63] extends the basic VNS into a two-level VNS scheme based upon decomposition of the problem. It is presented in Algorithm 12, where t d is an additional parameter that represents the running time allowed for solving decomposed (smaller-sized) problems by Basic VNS (line 5).

Algorithm 12 Variable neighborhood decomposition search

For ease of presentation, but without loss of generality, we assume that the solution x represents a set of attributes. In Step 4 we denote by y a set of k solution attributes present in x′ but not in x (y = x′ ∖ x). In Step 5 we find the local optimum y′ in the space of y; we then denote by x″ the corresponding solution in the whole space X (x″ = (x′ ∖ y) ∪ y′). We notice that exploiting some boundary effects in a new solution can significantly improve solution quality. That is why, in Step 6, the local optimum x‴ is found in the whole space X using x″ as an initial solution. If this is time consuming, then at least a few local search iterations should be performed.
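A schematic Python rendering of the decomposition step just described is given below; shake, solve_subproblem and local_search are hypothetical helpers, and solutions are modeled as sets of attributes, as in the text.

```python
import time

def vnds(x, f, shake, solve_subproblem, local_search, k_max, t_max, t_d):
    """Variable Neighborhood Decomposition Search (a sketch of Algorithm 12)."""
    start = time.time()
    while time.time() - start < t_max:
        k = 1
        while k <= k_max:
            x_shaken = shake(x, k)                           # Step 3: x' in N_k(x)
            y = x_shaken - x                                 # Step 4: attributes in x' but not in x
            y_local = solve_subproblem(y, time_limit=t_d)    # Step 5: BVNS on the smaller subproblem
            x_full = (x_shaken - y) | y_local                # lift y' back to the whole space
            x_improved = local_search(x_full)                # Step 6: exploit boundary effects
            if f(x_improved) < f(x):
                x, k = x_improved, 1
            else:
                k += 1
    return x
```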

VNDS can be viewed as embedding the classical successive approximation scheme (which has been used in combinatorial optimization at least since the sixties, see, e.g., [48]) in the VNS framework. Let us mention here a few applications of VNDS: p-median problem [63]; simple plant location problem [67]; k-cardinality tree problem [100]; 0-1 mixed integer programming problem [51, 74]; design of MBA student teams [37], etc.

3.4 Changing Formulation Within VNS

A traditional approach to tackle an optimization problem is to consider a given formulation and search in some way through its feasible set X. Given that the same problem can often be formulated in different ways, it is possible to extend search paradigms to include jumps from one formulation to another. Each formulation should lend itself to some traditional search method, its ‘local search’ that works totally within this formulation, and yields a final solution when started from some initial solution. Any solution found in one formulation should easily be translatable to its equivalent solution in any other formulation. We may then move from one formulation to another by using the solution resulting from the local search of the former as an initial solution for the local search of the latter. Such a strategy will of course only be useful when local searches in different formulations behave differently. Here we discuss two such possibilities.

3.4.1 Variable Neighborhood-Based Formulation Space Search

The idea of changing the formulation of a problem was investigated in [82, 83] using an approach that systematically alternates between different formulations for solving various Circle Packing Problems (CPP). It is shown there that a stationary point for a nonlinear programming formulation of CPP in Cartesian coordinates is not necessarily a stationary point in polar coordinates. A method called Reformulation Descent (RD), which alternates between these two formulations until the final solution is stationary with respect to both, is suggested. Results obtained were comparable with the best known values, but were achieved about 150 times faster than with an alternative single-formulation approach. In that work, the idea of Formulation Space Search (FSS) suggested above is also introduced, using more than two formulations. Some research in that direction has also been reported in [70, 79, 90]. One methodology that uses the variable neighborhood idea when searching through the formulation space is given in Algorithms 13 and 14. Here ϕ (ϕ′) denotes a formulation from a given space \(\mathcal{F}\), x (x′) denotes a solution in the feasible set defined with that formulation, and ℓ max is the maximum formulation neighborhood index. Note that Algorithm 14 uses a reduced VNS strategy in the formulation space \(\mathcal{F}\). Note also that the ShakeFormulation() function must provide a search through the solution space \(\mathcal{S}'\) (associated with formulation ϕ′) in order to obtain a new solution x′. Any appropriate method can be used for this purpose.

Algorithm 13 Formulation change

Algorithm 14 Reduced variable neighborhood FSS
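The formulation-change logic can be sketched in the same style as the earlier neighborhood-change step; all names below are hypothetical and only illustrate the move-or-advance decision in formulation space.

```python
def formulation_change(x, x_new, phi, phi_new, ell, f):
    """Move-or-advance in formulation space (a sketch of Algorithm 13).

    f(phi, x) evaluates solution x under formulation phi; on improvement the
    search restarts from the first formulation neighborhood.
    """
    if f(phi_new, x_new) < f(phi, x):
        return x_new, phi_new, 1
    return x, phi, ell + 1
```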

3.4.2 Variable Formulation Search

Many optimization problems in the literature, e.g., min-max problems, exhibit a flat landscape. This means that, given a formulation of the problem, many neighbors of a solution have the same objective function value. When this happens, it is difficult to determine which neighboring solution is more promising for continuing the search. To address this drawback, the use of alternative formulations of the problem within VNS is proposed in [85, 86, 89]. In [89] it is named Variable Formulation Search (VFS). It combines a change of neighborhood within the VNS framework with the use of alternative formulations.

Let us assume that, besides the original formulation and the corresponding objective function f 0(x), there are p other formulations denoted as f 1(x), …, f p(x), x ∈ X. Note that two formulations are defined as equivalent if the optimal solution of one is the optimal solution of the other, and vice versa. For simplicity, we will denote different formulations as different objectives f i(x), i = 1, …, p. The idea of VFS is to add the procedure Accept(x, x′, p), given in Algorithm 15, to all three basic steps of BVNS: Shaking, LocalSearch and NeighborhoodChange. Clearly, if a better solution is not obtained by any of the p + 1 formulations, the move is rejected. The next iteration in the loop of Algorithm 15 takes place only if the objective function values according to all previous formulations are equal.

Algorithm 15 Accept procedure with p secondary formulations

If Accept(x, x′, p) is included in the LocalSearch subroutine of BVNS, then the local search will not stop the first time a non-improving solution is found. In order to stop LocalSearch, and thus claim that x′ is a local minimum, x′ should not be improved by any of the p different formulations. Thus, for any particular problem, one needs to design different formulations of the problem considered and decide the order in which they will be used in the Accept subroutine. Answers to those two questions are problem specific and sometimes not easy. The Accept(x, x′, p) subroutine can obviously be added to the NeighborhoodChange and Shaking steps of BVNS from Algorithm 7 as well.
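The tie-breaking logic of Accept(x, x′, p) can be sketched as follows: f 0 is the original objective and f 1, …, f p are only consulted on ties (a minimal Python sketch, not the authors' code).

```python
def accept(x, x_new, objectives):
    """Accept procedure with secondary formulations (a sketch of Algorithm 15).

    `objectives` is the list [f0, f1, ..., fp]; the move to x_new is accepted
    if it strictly improves the first formulation on which the two solutions differ.
    """
    for f in objectives:
        if f(x_new) < f(x):
            return True           # strictly better under this formulation
        if f(x_new) > f(x):
            return False          # strictly worse: reject
        # equal under f: consult the next formulation
    return False                  # equal under all formulations: reject the move
```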

In [85], three evaluation functions, or acceptance criteria, are used within the Neighborhood Change step in solving the Bandwidth Minimization Problem. This min-max problem consists of finding permutations of rows and columns of a given square matrix that minimize the maximal distance of the nonzero elements from the main diagonal in the corresponding rows. Solution x may be represented as a labeling of a graph and the move from x to x′ as x → x′. Three criteria are used:

  1. the bandwidth length f 0(x) (f 0(x′) < f 0(x));

  2. the total number of critical vertices f 1(x) (f 1(x′) < f 1(x)), if f 0(x′) = f 0(x);

  3. f 3(x, x′) = ρ(x, x′) − α, if f 0(x′) = f 0(x) and f 1(x′) = f 1(x). Here, we want f 3(x, x′) > 0, because we assume that x and x′ are sufficiently far from one another when ρ(x, x′) > α, where α is an additional parameter. The idea of moving to an even worse solution, provided it is very far away, is used within Skewed VNS. However, a move to a solution with the same value is only performed in [85] if its Hamming distance from the incumbent is greater than α.

In [86] a different mathematical programming formulation of the original problem is used as a secondary objective within the Neighborhood Change function of VNS. There, two combinatorial optimization problems on a graph are considered: the Metric Dimension Problem and Minimal Doubly Resolving Set Problem.

A more general VFS approach is given in [89], where the Cutwidth Graph Minimization Problem (CWP) is considered. CWP also belongs to the min-max problem family. For a given graph, one needs to find a sequence of nodes such that the maximum cutwidth is minimum. The cutwidth of a graph should be clear from the example provided in Fig. 3.3 for the graph with six vertices and nine edges shown in (a).

Fig. 3.3 Cutwidth minimization example as in [89]

Figure 3.3b shows an ordering x of the vertices of the graph in (a) with the corresponding cutwidth value CW of each vertex. The CW value of a vertex is the number of edges cut between it and the next vertex in the ordering x. The cutwidth value f 0(x) = CW(x) of the ordering x = (A, B, C, D, E, F) is equal to f 0(x) = max{4, 5, 6, 4, 2} = 6. Thus, one needs to find an ordering x that minimizes the maximum cutwidth value over all vertices.
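As a small illustration, the cutwidth of an ordering can be computed directly from the edge list; the graph below is a hypothetical one, not the instance of Fig. 3.3.

```python
def cutwidth(order, edges):
    """Return f0(x): the maximum, over cut positions, of the number of edges crossing the cut."""
    position = {v: i for i, v in enumerate(order)}
    cuts = []
    for i in range(len(order) - 1):
        # edges with one endpoint at position <= i and the other at position > i
        crossing = sum(1 for u, v in edges
                       if min(position[u], position[v]) <= i < max(position[u], position[v]))
        cuts.append(crossing)
    return max(cuts)

# Hypothetical 6-vertex graph (not the one shown in Fig. 3.3)
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "E"), ("D", "E"), ("D", "F"), ("E", "F")]
print(cutwidth(["A", "B", "C", "D", "E", "F"], edges))   # prints 3 for this ordering
```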

Besides minimizing the cutwidth f 0, two additional formulations, denoted f 1 and f 2, are used in [89] and implemented within a VND local search. Results are compared among themselves (Table 3.1) and with a few heuristics from the literature (Table 3.2), using the following usual data sets:

  • “Grid”: This data set consists of 81 matrices constructed as the Cartesian product of two paths. They were originally introduced by Rolim et al. [94]. For this set of instances, the vertices are arranged on a grid of dimension width × height, where width and height are selected from the set {3, 6, 9, 12, 15, 18, 21, 24, 27}.

  • “Harwell-Boeing” (HB): This data set is a subset of the public-domain Matrix Market library. This collection consists of a set of standard test matrices M = (M ij) arising from problems in linear systems, least squares, and eigenvalue calculations in a wide variety of scientific and engineering disciplines. Graphs were derived from these matrices by considering an edge (i, j) for every element M ij ≠ 0. The data set is formed by selecting the 87 instances with n ≤ 700. Their number of vertices ranges from 30 to 700 and the number of edges from 46 to 41,686.

Table 3.1 presents the results obtained with four different VFS variants, after executing them for 30 s on each instance. The column ‘BVNS’ of Table 3.1 represents a heuristic based on BVNS which makes use only of the original formulation f 0 of the CWP. VFS1 denotes a BVNS heuristic that uses only one secondary criterion, i.e., f 0 and f 1. VFS2 is equivalent to the previous one, with the difference that now f 2 is considered (instead of f 1). Finally, the fourth column of the table, denoted as VFS3, combines the original formulation of the CWP with the two alternative ones, in the way presented in Algorithm 15. All algorithms were configured with k max = 0.1n and start from the same random solution.

Table 3.1 Comparison of alternative formulations within 30 s for each test, by average objective values and % deviation from the best known solution

It appears that significant improvements in solution quality are obtained when at least one secondary formulation is used in case of ties (compare e.g., 192.44% and 60.40% deviations from the best known solutions obtained by BVNS and VFS1, respectively). An additional improvement is obtained when all three formulations are used in VFS3.

A comparison of VFS3 with state-of-the-art heuristics is given in Table 3.2. There, the stopping condition is increased from 30 s to 300 and 600 s for the first and the second set of instances, respectively. Besides average values and % deviation, the methods are compared based on the number of wins (the third row) and the total CPU time in seconds. Overall, the best quality results are obtained by VFS in less computing time.

Table 3.2 Comparison of VFS with the state-of-the-art heuristics over the “Grid” and “HB” data sets, within 300 and 600 s respectively

3.5 Primal-Dual VNS

For most modern heuristics, the difference in value between the optimal solution and the obtained approximate solution is not precisely known. Guaranteed performance of the primal heuristic may be determined if a lower bound on the objective function value can be found. To this end, the standard approach is to relax the integrality condition on the primal variables, based on a mathematical programming formulation of the problem. However, when the dimension of the problem is large, even the relaxed problem may be impossible to solve exactly by standard commercial solvers. Therefore, it seems to be a good idea to solve dual relaxed problems heuristically as well. In this way we get guaranteed bounds on the primal heuristic performance. The next difficulty arises if we want to get an exact solution within a branch-and-bound framework, since having only an approximate value of the relaxed dual does not allow us to branch in an easy way, for example by exploiting complementary slackness conditions. Thus, the exact value of the dual is necessary. A general approach to get both guaranteed bounds and an exact solution is proposed in [67], and referred to as Primal-Dual VNS (PD-VNS). It is given in Algorithm 16.

Algorithm 16 Basic PD-VNS

In the first stage, a heuristic procedure based on VNS is used to obtain a near optimal solution. In [67] it is shown that VNS with decomposition is a very powerful technique for large-scale simple plant location problems (SPLP) with up to 15,000 facilities and 15,000 users. In the second phase, the objective is to find an exact solution of the relaxed dual problem. Solving the relaxed dual is accomplished in three stages: (1) find an initial dual solution (generally infeasible) using the primal heuristic solution and complementary slackness conditions; (2) find a feasible solution by applying VNS to the unconstrained nonlinear form of the dual; (3) solve the dual exactly, starting from the feasible solution found, using a customized “sliding simplex” algorithm that applies “windows” on the dual variables, thus substantially reducing the problem size. On all problems tested, including instances much larger than those previously reported in the literature, the procedure was able to find the exact dual solution in reasonable computing time. In the third and final phase, armed with tight upper and lower bounds obtained from the heuristic primal solution in phase one and the exact dual solution in phase two, respectively, a standard branch-and-bound algorithm is applied to find an optimal solution of the original problem. The lower bounds are updated with the dual sliding simplex method, and the upper bounds are updated whenever new integer solutions are obtained at the nodes of the branching tree. In this way it was possible to solve exactly problem instances with sizes of up to 7000 facilities × 7000 users for uniform fixed costs, and 15,000 facilities × 15,000 users otherwise.

3.6 VNS for Mixed Integer Linear Programming

The Mixed Integer Linear Programming (MILP) problem consists of maximizing or minimizing a linear function, subject to equality or inequality constraints and integrality restrictions on some of the variables. It can be expressed as:

$$\displaystyle\begin{array}{rcl} (MILP)\quad \left [\begin{array}{ll} \min \;\;\sum _{j=1}^{n}c_{j}x_{j} & \\ \mathrm{s.t.}\quad \sum _{j=1}^{n}a_{ij}x_{j} \geq b_{i} &\forall i \in M =\{ 1,2,\ldots,m\} \\ \hspace{22.76228pt} x_{j} \in \{ 0,1\} &\forall j \in \mathcal{ B} \\ \hspace{22.76228pt} x_{j} \geq 0,\mathtt{integer}&\forall j \in \mathcal{G} \\ \hspace{22.76228pt} x_{j} \geq 0 &\forall j \in \mathcal{ C}\\ \end{array} \right.& & {}\\ \end{array}$$

where the set of indices N = {1, 2, …, n} is partitioned into three subsets \(\mathcal{B},\mathcal{G}\) and \(\mathcal{C}\), corresponding to binary, general integer and continuous variables, respectively.

Numerous combinatorial optimization problems, including a wide range of practical problems in business, engineering and science, can be modeled as MILPs. Several special cases, such as knapsack, set packing, cutting and packing, network design, protein alignment, traveling salesman and other routing problems, are known to be NP-hard [46].

Many commercial solvers such as CPLEX [71] are available for solving MILPs. Methods included in such software packages are usually of the branch-and-bound (B&B) or of branch-and-cut (B&C) types. Basically, those methods enumerate all possible integer values in some order, and prune the search space for the cases where such enumeration cannot improve the current best solution.

3.6.1 Variable Neighborhood Branching

The connection between local search based heuristics and exact solvers may be established by introducing the so-called local branching constraints [43]. By adding just one constraint to (MILP), as explained below, the kth neighborhood of (MILP) is defined. This allows the use of all local search based metaheuristics, such as Tabu search, Simulated annealing, VNS, etc. More precisely, given two solutions x and y of (MILP), the distance between x and y is defined as:

$$\displaystyle{\delta (x,y) =\sum _{j\in \mathcal{B}}\mid x_{j} - y_{j}\mid.}$$

Let X be the solution space of (MILP). The neighborhood structures \(\{\mathcal{N}_{k}\mid k = 1,\ldots,k_{max}\}\) can be defined, knowing the distance δ(x, y) between any two solutions x, y ∈ X. The set of all solutions in the kth neighborhood of y ∈ X is denoted as \(\mathcal{N}_{k}(y)\), where

$$\displaystyle{\mathcal{N}_{k}(y) =\{ x \in X\mid \delta (x,y)\; \leq \; k\}.}$$

For the pure 0-1 MILP given above (i.e., (MILP) with \(\mathcal{G} =\emptyset\)), δ(. , . ) represents the Hamming distance and \(\mathcal{N}_{k}(y)\) may be expressed by the following local branching constraint

$$\displaystyle{ \delta (x,y) =\sum _{j\in S}(1 - x_{j}) +\sum _{j\in \mathcal{B}\setminus S}x_{j} \leq k, }$$
(3.2)

where \(S =\{ j \in \mathcal{B}\;\vert \;y_{j} = 1\}\).
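The constraint (3.2) depends only on the incumbent 0-1 vector y; a solver-independent sketch of how its coefficient vector and right-hand side could be assembled is shown below (variable indexing and names are hypothetical, not tied to any particular MIP interface).

```python
def local_branching_constraint(y_binary, k):
    """Return (coeffs, constant, rhs) so that sum_j coeffs[j]*x_j + constant <= rhs
    encodes delta(x, y) <= k for the incumbent 0-1 vector y_binary (cf. Eq. (3.2))."""
    coeffs, constant = {}, 0
    for j, y_j in enumerate(y_binary):
        if y_j == 1:                 # j in S: contributes (1 - x_j)
            coeffs[j] = -1
            constant += 1
        else:                        # j in B \ S: contributes x_j
            coeffs[j] = 1
    return coeffs, constant, k

# Example: incumbent y = (1, 0, 1, 0), neighborhood N_2(y):
# (1 - x_0) + x_1 + (1 - x_2) + x_3 <= 2, i.e. -x_0 + x_1 - x_2 + x_3 + 2 <= 2
coeffs, constant, rhs = local_branching_constraint([1, 0, 1, 0], 2)
```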

In [66] a general VNS procedure for solving 0-1 MILPs is presented (see Algorithm 17). An exact MILP solver (MIPSOLVE() within CPLEX) is used as a black box for finding the best solution in the neighborhood, based on the given formulation (MILP) plus the added local branching constraints. Shaking is performed using the Hamming distance defined above. A detailed description of this VNS branching method is provided in Algorithm 17. The variables and constants used in the algorithm are defined as follows [66]:

  • UB—input variable for the CPLEX solver which represents the current upper bound.

  • first—logical input variable for CPLEX solver which is true if the first solution lower than UB is asked for in the output; if first = false, CPLEX returns the best solution found so far.

  • TL—maximum time allowed for running CPLEX.

  • rhs—right hand side of the local branching constraint; it defines the size of the neighborhood within the inner or VND loop.

  • cont—logical variable which indicates if the inner loop continues (true) or not (false).

  • x_opt and f_opt—incumbent solution and corresponding objective function value.

  • x_cur, f_cur, k_cur—current solution, objective function value and neighborhood from where the VND local search starts (lines 6–20).

  • x_next and f_next—solution and corresponding objective function value obtained by CPLEX in the inner loop.

Algorithm 17 VNS branching

In line 2, a commercial MIP solver is run to get an initial feasible solution, i.e., logical variable ‘first’ is set to value true. The outer loop starts from line 4. VND based local search is performed in the inner loop that starts from line 6 and finishes at line 24. There are four different outputs from subroutine MIPSOLVE provided by variable stat. They are coded in lines 11–20. The shaking step also uses the MIP solver. It is presented in the loop that starts at line 25.

3.6.2 VNDS Based Heuristics for MILP

It is well known that heuristics and relaxations are useful for providing upper and lower bounds on the optimal value of large and difficult optimization problems. A hybrid approach for solving 0-1 MILPs is presented in this section. A more detailed description may be found in [51]. It combines variable neighborhood decomposition search (VNDS) [63] and a generic MILP solver for upper bounding purposes, and a generic linear programming solver for lower bounding. VNDS is used to define a variable fixing scheme for generating a sequence of smaller subproblems, which are normally easier to solve than the original problem. Different heuristics are derived by choosing different strategies for updating lower and upper bounds, and thus defining different schemes for generating a series of subproblems. We also present in this section a two-level decomposition scheme, in which subproblems created according to the VNDS rules are further divided into smaller subproblems using another criterion, derived from the mathematical formulation of the problem.

3.6.2.1 VNDS for 0-1 MILPs with Pseudo-Cuts

Variable neighborhood decomposition search is a two-level variable neighborhood search scheme for solving optimization problems, based upon the decomposition of the problem (see Algorithm 12). We discuss here an algorithm which solves exactly a sequence of reduced problems obtained from a sequence of linear programming relaxations. The set of reduced problems for each LP relaxation is generated by fixing a certain number of variables according to VNDS rules. That way, two sequences of upper and lower bounds are generated, until an optimal solution of the problem is obtained. Also, after each reduced problem is solved, a pseudo-cut is added to guarantee that this subproblem is not revisited. Furthermore, whenever an improvement in the objective function value occurs, a local search procedure is applied in the whole solution space to attempt a further improvement (the so-called boundary effect within VNDS). This procedure is referred to as VNDS-PC, since it employs VNDS to solve 0-1 MILPs, while incorporating pseudo-cuts to reduce the search space [51].

If \(J \subseteq \mathcal{ B}\), we define the partial distance between x and y, relative to J, as \(\delta (J,x,y) =\sum _{j\in J}\mid x_{j} - y_{j}\mid\). Obviously, \(\delta (\mathcal{B},x,y) =\delta (x,y)\). More generally, let \(\bar{x}\) be an optimal solution of LP(P), the LP relaxation of the problem P considered (not necessarily MIP feasible), and \(J \subseteq B(\bar{x}) =\{ j \in N\mid \bar{x}_{j} \in \{ 0,1\}\}\) an arbitrary subset of indices. The partial distance \(\delta (J,x,\overline{x})\) can be linearized as follows:

$$\displaystyle{\delta (J,x,\overline{x}) =\sum _{j\in J}[x_{j}(1 -\overline{x}_{j}) + \overline{x}_{j}(1 - x_{j})].}$$

Let X be the solution space of problem P. The neighborhood structures \(\{\mathcal{N}_{k}\mid k = k_{min},\ldots,k_{max}\}\), 1 ≤ k min ≤ k max ≤ p, can be defined knowing the distance \(\delta (\mathcal{B},x,y)\) between any two solutions x, y ∈ X. The set of all solutions in the kth neighborhood of x ∈ X is denoted as \(\mathcal{N}_{k}(x)\), where

$$\displaystyle{\mathcal{N}_{k}(x) =\{ y \in X\mid \delta (\mathcal{B},x,y)\; \leq \; k\}.}$$

From the definition of \(\mathcal{N}_{k}(x)\), it follows that \(\mathcal{N}_{k}(x) \subset \mathcal{ N}_{k+1}(x)\) for any k ∈ {k min, k min + 1, …, k max − 1}, since \(\delta (\mathcal{B},x,y)\; \leq \; k\) implies \(\delta (\mathcal{B},x,y)\; \leq \; k + 1\). Clearly, if we completely explore neighborhood \(\mathcal{N}_{k+1}(x)\), it is not necessary to explore neighborhood \(\mathcal{N}_{k}(x)\).

\(\underline{\mathbf{Ordering\ variables\ w.r.t.\ the\ LP\ relaxation.}}\) The first variant of VNDS-PC, denoted as VNDS-PC1, is considered here for the maximization case. See Algorithm 18 for the pseudo-code of this algorithm, which can easily be adjusted for minimization problems. Input parameters for the algorithm are an instance P of the 0-1 MILP problem, a parameter d which defines the number of variables to be released in each iteration, and an initial feasible solution x of P. The algorithm returns the best solution found until the stopping criterion defined by the variable proceed1 is met.

Algorithm 18 VNDS for MIPs with pseudo-cuts

This variant of VNDS-PC is based on the following choices. Variables are ordered according to the distances of their values from the corresponding LP relaxation solution values (see lines 4, 6 and 7 in Algorithm 18). More precisely, we compute distances \(\delta _{j} = \mid x_{j} -\overline{x}_{j}\mid\) for \(j \in \mathcal{ B}\), where x j is a variable value of the current incumbent (feasible) solution and \(\overline{x}_{j}\) a variable value of the LP relaxation. We then index the variables \(x_{j},j \in \mathcal{ B}\), so that δ 1 ≤ δ 2 ≤ … ≤ δ p, \(p = \mid \mathcal{B}\mid\). Parameters k min, k step and k max (see line 9 in Algorithm 18) are determined in the following way. Let q be the number of binary variables which have different values in the LP relaxation solution and in the incumbent solution (\(q = \mid \{j \in \mathcal{ B}\mid \delta _{j}\neq 0\}\mid\)), and let d be a given parameter (whose value is found experimentally) which controls the neighborhood size. Then we set k min = p − q, k step = ⌊q∕d⌋ and k max = p − k step. We also allow the value of k to be less than k min (see lines 17 and 18 in Algorithm 18). In other words, we allow the variables which have the same integer value in the incumbent and LP relaxation solutions to be freed anyway. When k < k min, k step is set to (approximately) half the number of the remaining fixed variables. Note that the maximum value of parameter k (which is k max) indicates the maximum possible number of fixed variables, which implies the minimum number of free variables and therefore the minimum possible neighborhood size in the VNDS scheme.
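The ordering and the k schedule described above can be sketched independently of the MIP solver; x_inc and x_lp are hypothetical names for the incumbent and LP relaxation values of the binary variables.

```python
def vnds_pc1_schedule(x_inc, x_lp, d):
    """Order binary variables by |x_inc_j - x_lp_j| and derive k_min, k_step, k_max."""
    deltas = [abs(xi - xl) for xi, xl in zip(x_inc, x_lp)]
    order = sorted(range(len(deltas)), key=lambda j: deltas[j])   # ascending delta_j
    p = len(deltas)
    q = sum(1 for dj in deltas if dj != 0)    # variables disagreeing with the LP solution
    k_min = p - q
    k_step = max(1, q // d)
    k_max = p - k_step
    return order, k_min, k_step, k_max

# Fixing the first k variables in `order` (smallest deltas) defines the subproblem P(x*, J_k).
```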

If an improvement occurs after solving the subproblem P(x , J k), where x is the current incumbent solution (see line 12 in Algorithm 18), we perform a local search on the complete solution, starting from x′ (see line 14 in Algorithm 18). The local search applied at this stage is the variable neighborhood descent for 0-1 MILPs, as described in [66]. Note that, in Algorithm 18 and in the pseudo-codes that follow, the statement y = MILPSOLVE(P, x) denotes a call to a generic MILP solver, for a given 0-1 MILP problem P, starting from a given solution x and returning a new solution y (if P is infeasible, then the value of y remains the same as the one before the call to the MILP solver).

In practice, when used as a heuristic with a time limit as the stopping criterion, VNDS-PC1 has a good performance. One can observe that, if pseudo-cuts (line 13 in Algorithm 18) and objective cuts (lines 2 and 16) are not added, the algorithm from [74] is obtained, which is a special case of VNDS-PC with a fixed LP relaxation reference solution.

\(\underline{\mathbf{Ordering\ variables\ w.r.t.\ the\ minimum\ and\ maximum\ distances\ from\ the\ incumbent\ solution.}}\)

In the VNDS variant above, the variables in the incumbent integer solution are ordered according to the distances of their values to the values of the current linear relaxation solution. However, it is possible to employ different ordering strategies. For example, in the case of maximization of c Tx, consider the following two problems:

$$\displaystyle{\begin{array}{ll} (LP_{x^{{\ast}}}^{-})\left [\begin{array}{r l} &\min \delta (x^{{\ast}},x) \\ \text{s.t.:}&Ax \leq b \\ &c^{\text{T}}x \geq LB + 1 \\ &x_{j} \in \left [0,1\right ]\text{, }j \in \mathcal{B} \\ &x_{j} \geq 0\text{, }j \in N \end{array} \right.&\quad \quad \quad \quad (LP_{x^{{\ast}}}^{+})\left [\begin{array}{r l} &\max \delta (x^{{\ast}},x) \\ \text{s.t.:}&Ax \leq b \\ &c^{\text{T}}x \geq LB + 1 \\ &x_{j} \in \left [0,1\right ]\text{, }j \in \mathcal{B} \\ &x_{j} \geq 0\text{, }j \in N \end{array} \right. \end{array} }$$

where \(x^{{\ast}}\) is the best known integer feasible solution and LB is the best lower bound found so far (i.e., \(LB = c^{\text{T}}x^{{\ast}}\)). Of course, in the case of solving \(\min c^{\text{T}}x\), the inequality \(c^{\text{T}}x \geq LB + 1\) from models (\(LP_{x^{{\ast}}}^{-}\)) and (\(LP_{x^{{\ast}}}^{+}\)) should be replaced with \(c^{\text{T}}x \leq UB - 1\), where the upper bound \(UB = c^{\text{T}}x^{{\ast}}\). If \(\overline{x}^{-}\) and \(\overline{x}^{+}\) are optimal solutions of the LP relaxation problems \(LP_{x^{{\ast}}}^{-}\) and \(LP_{x^{{\ast}}}^{+}\), respectively, then the components of \(x^{{\ast}}\) could be ordered in ascending order of the values \(\vert \overline{x}_{j}^{-}-\overline{x}_{j}^{+}\vert\), \(j \in \mathcal{ B}\). Since both solution vectors \(\overline{x}^{-}\) and \(\overline{x}^{+}\) are real-valued (i.e., from \(\mathbb{R}^{n}\)), this ordering technique is expected to be more sensitive than the standard one, i.e., the number of pairs (j, j′), j, j′ ∈ N, j ≠ j′, for which \(\vert \overline{x}_{j}^{-}-\overline{x}_{j}^{+}\vert \neq \vert \overline{x}_{j'}^{-}-\overline{x}_{j'}^{+}\vert\) is expected to be greater than the number of pairs (h, h′), h, h′ ∈ N, h ≠ h′, for which \(\vert x_{h}^{{\ast}}-\overline{x}_{h}\vert \neq \vert x_{h'}^{{\ast}}-\overline{x}_{h'}\vert\), where \(\overline{x}\) is an optimal solution of the LP relaxation LP(P).

Also, according to the definition of \(\overline{x}^{-}\) and \(\overline{x}^{+}\), it is intuitively more likely for the variables x j, j ∈ N, for which \(\overline{x}_{j}^{-} = \overline{x}_{j}^{+}\), to have that same value \(\overline{x}_{j}^{-}\) in the final solution, than it is for variables x j, j ∈ N, for which \(x_{j}^{{\ast}} = \overline{x}_{j}\) (and \(\overline{x}_{j}^{-}\neq \overline{x}_{j}^{+}\)), to have the final value \(x_{j}^{{\ast}}\). In practice, if \(\overline{x}_{j}^{-} = \overline{x}_{j}^{+}\), j ∈ N, then usually \(x_{j}^{{\ast}} = \overline{x}_{j}^{-}\), which justifies ordering the components of \(x^{{\ast}}\) in the described way. However, if we want to keep the number of iterations in one pass of VNDS approximately the same as with the standard ordering, i.e., if we want to use the same value for parameter d, then the subproblems examined will be larger than with the standard ordering, since the value of q will be smaller (see line 8 in Algorithm 19). The pseudo-code of this variant of VNDS-PC, denoted as VNDS-PC2, is provided in Algorithm 19.

Algorithm 19 VNDS for MIPs with pseudo-cuts and another ordering strategy

3.6.2.2 A Double Decomposition Scheme

In this section we propose the use of a second-level decomposition scheme within VNDS for the 0-1 MILP. The 0-1 MILP is tackled by decomposing the problem into several subproblems, in each of which the number of binary variables with value 1 is fixed at a given integer value. Fixing the number of variables with value 1 to a given value \(h \in \mathbb{N} \cup \{ 0\}\) can be achieved by adding the constraint x 1 + x 2 + ⋯ + x p = h or, equivalently, \(e^{\text{T}}x = h\), where e is the vector of all ones. Solving the 0-1 MILP by tackling each of the subproblems P h separately appears to be an interesting approach for the case of the multidimensional knapsack problem [101], especially because the additional constraint \(e^{\text{T}}x = h\) provides tighter upper bounds than the classical LP relaxation.

Formally, let P h be the subproblem obtained from the original problem by adding the hyperplane constraint \(e^{\text{T}}x = h\), for a given h, and enriching it with an objective cut:

$$\displaystyle{(P_{h})\left [\begin{array}{r l} \max &c^{\text{T}}x\\ \text{s.t.:} &Ax \leq b \\ &c^{\text{T}}x \geq LB + 1 \\ &e^{\text{T}}x = h \\ &x \in \{ 0,1\}^{p} \times \mathbb{R}_{+}^{n-p} \end{array} \right.}$$

Let h min and h max denote lower and upper bounds on the number of variables with value 1 in an optimal solution of the problem. Then it is obvious that ν(P) = max{ν(P h) ∣ h min ≤ h ≤ h max}. The bounds \(h_{min} = \left \lceil \nu (LP_{0}^{-})\right \rceil\) and \(h_{max} = \left \lfloor \nu (LP_{0}^{+})\right \rfloor\) can be computed by solving the following two problems:

$$\displaystyle{\begin{array}{ll} (LP_{0}^{-})\left [\begin{array}{r l} \min &e^{\text{T}}x\\ \text{s.t.:} &Ax \leq b \\ &c^{\text{T}}x \geq LB + 1 \\ &x \in \left [0,1\right ]^{p} \times \mathbb{R}_{+}^{n-p} \end{array} \right .&\quad \quad \quad (LP_{0}^{+})\left [\begin{array}{r l} \max &e^{\text{T}}x\\ \text{s.t.:} &Ax \leq b \\ &c^{\text{T}}x \geq LB + 1 \\ &x \in \left [0,1\right ]^{p} \times \mathbb{R}_{+}^{n-p} \end{array} \right . \end{array} }$$

We define the order of the hyperplanes at the beginning of the algorithm, and then explore them one by one in that order. The ordering can be done according to the objective values of the linear programming relaxations LP(P h), h ∈ H = {h min, …, h max}. In each hyperplane, VNDS-PC1 is applied and, if there is no improvement, the next hyperplane is explored. We refer to this method as VNDDS (short for Variable Neighborhood Double Decomposition Search); it corresponds to the pseudo-code in Algorithm 20. This idea is inspired by the approach proposed in [91], where the ordering of the neighborhood structures in Variable Neighborhood Descent is determined dynamically by solving relaxations of the problems. The problems differ in one constraint, which defines the Hamming distance h (h ∈ H = {h min, …, h max}).

Algorithm 20 Two levels of decomposition with hyperplanes ordering

It is important to note that the exact variant of VNDDS, i.e., without any limitations regarding the running time or the number of iterations, converges to an optimal solution in a finite number of steps [51].

3.6.2.3 Comparison

For comparison purposes, five algorithms are ranked according to their objective values for the MIP benchmark instances in MIPLIB [77] and the benchmark instances for the Multidimensional Knapsack Problem (MKP) in [21]. Tables 3.3 and 3.4 report the average differences between the ranks of every pair of algorithms for the MIPLIB and MKP test sets, respectively.

Table 3.3 Objective value average rank differences on the MIPLIB set
Table 3.4 Objective value average rank differences on the MKP set

It appears that VNDS-MIP outperforms the other four methods on MIPLIB instances, while for the MKP set, the best performance is obtained with the VNDS-PC1 heuristic.

3.7 Variable Neighborhood Search for Continuous Global Optimization

The general form of the continuous constrained nonlinear global optimization problem (GOP) is given as follows:

$$\displaystyle\begin{array}{rcl} (GOP)\quad \left [\begin{array}{ll} \min \;\;f(x) & \\ \mathrm{s.t.}\quad g_{i}(x) \leq 0 &\forall i \in \{ 1,2,\ldots,m\} \\ \hspace{22.76228pt} h_{i}(x) = 0 &\forall i \in \{ 1,2,\ldots,r\} \\ \hspace{22.76228pt} a_{j} \leq x_{j} \leq b_{j}&\forall j \in \{ 1,2,\ldots,n\}\\ \end{array} \right.& & {}\\ \end{array}$$

where \(x \in \mathbb{R}^{n}\), \(f: \mathbb{R}^{n} \rightarrow \mathbb{R}\), \(g_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}\), i = 1, 2, …, m, and \(h_{i}: \mathbb{R}^{n} \rightarrow \mathbb{R}\), i = 1, 2, …, r, are possibly nonlinear continuous functions, and \(a, b \in \mathbb{R}^{n}\) are the variable bounds. A box-constrained GOP is defined when only the variable bound constraints are present in the model.

GOPs naturally arise in many applications, e.g. in advanced engineering design, data analysis, financial planning, risk management, scientific modeling, etc. Most cases of practical interest are characterized by multiple local optima and, therefore, a search effort of global scope is needed to find the globally optimal solution.

If the feasible set X is convex and the objective function f is convex, then (GOP) is relatively easy to solve, i.e., the Karush-Kuhn-Tucker conditions can be applied. However, if X is not a convex set or f is not a convex function, we may have many local optima and the problem may not be solvable by classical techniques. For solving (GOP), VNS has been used in two different ways: (1) with neighborhoods induced by an \(\ell_{p}\) norm; (2) without using an \(\ell_{p}\) norm.

(i) \(\underline{\mathbf{VNS\ with\ }\ell_{p}\mathbf{\ norm\ neighborhoods}}\) [40, 75, 81, 84]. A natural approach when applying VNS to GOPs is to induce the neighborhood structures \(\mathcal{N}_{k}(x)\) from the \(\ell_{p}\) metric given as:

$$\displaystyle{ \rho (x,y) = \left (\sum _{i=1}^{n}\vert x_{ i} - y_{i}\vert ^{p}\right )^{1/p}\text{ },\quad p \in [1,\infty ) }$$
(3.3)

and

$$\displaystyle{ \rho (x,y) =\max _{1\leq i\leq n}\vert x_{i} - y_{i}\vert,\quad \text{}p \rightarrow \infty. }$$
(3.4)

The neighborhood \(\mathcal{N}_{k}(x)\) denotes the set of solutions in the k-th neighborhood of x based on the metric ρ. It is defined as

$$\displaystyle{ \mathcal{N}_{k}(x) =\{ y \in X\;\vert \;\rho (x,y) \leq \rho _{k}\}, }$$
(3.5)

or

$$\displaystyle{ \mathcal{N}_{k}(x) =\{ y \in X\;\vert \;\rho _{k-1} <\rho (x,y) \leq \rho _{k}\}, }$$
(3.6)

where ρ k, known as the radius of \(\mathcal{N}_{k}(x)\), is monotonically increasing with k (k ≥ 2).
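For illustration, a point in the shell-type neighborhood (3.6) can be generated by rejection sampling in the enclosing box when the \(\ell_{\infty}\) norm is used; this is just one possible sketch, not the procedure of [40] or [75], and it assumes the shell intersects the feasible box.

```python
import random

def shake_linf(x, rho_prev, rho_k, bounds):
    """Sample y with rho_prev < ||y - x||_inf <= rho_k, clipped to the box bounds.

    `bounds` is a list of (a_j, b_j) pairs; rejection sampling is used, so the
    shell N_k(x) is assumed to have nonempty intersection with the box.
    """
    while True:
        y = [max(a, min(b, xi + random.uniform(-rho_k, rho_k)))
             for xi, (a, b) in zip(x, bounds)]
        dist = max(abs(yi - xi) for yi, xi in zip(y, x))
        if rho_prev < dist <= rho_k:
            return y
```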

For solving box-constrained GOPs, both [40] and [75] use the neighborhoods defined in (3.6). The basic differences between the two algorithms reported there are as follows: (1) in the procedure suggested in [75] the \(\ell_{\infty}\) norm is used, while in [40] the choice of metric is either left to the analyst or changed automatically in some predefined order; (2) the commercial solver SNOPT [47] is used as the local search procedure within VNS in [75], while in [40] the analyst may choose one out of six different convex minimizers. A VNS based heuristic for solving the generally constrained GOP is suggested in [84]. There, the problem is first transformed into a sequence of box-constrained problems within the well known exterior point method:

$$\displaystyle{ \min _{a\leq x\leq b}F_{\mu,q}(x) = f(x) + \frac{1} {\mu } \;\sum _{i=1}^{m}(\max \{0,g_{ i}(x)\})^{q} +\sum _{ i=1}^{r}\vert h_{ i}(x)\vert ^{q}, }$$
(3.7)

where μ and q ≥ 1 are a positive penalty parameter and penalty exponent, respectively. Algorithm 21 outlines the steps for solving the box constraint subproblem as proposed in [84].
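The penalty function (3.7) translates directly into code; the sketch below only evaluates F μ,q, with the constraint functions passed as plain Python callables, and reads (3.7) with the weight 1∕μ applied to both constraint terms, which is one common reading of the exterior penalty.

```python
def exterior_penalty(f, g_list, h_list, mu, q):
    """Return a callable F(x) = f(x) + (1/mu) * [sum max(0, g_i(x))^q + sum |h_i(x)|^q], cf. (3.7)."""
    def F(x):
        penalty = sum(max(0.0, g(x)) ** q for g in g_list) \
                + sum(abs(h(x)) ** q for h in h_list)
        return f(x) + penalty / mu
    return F

# Hypothetical example: minimize x0^2 + x1^2 subject to x0 + x1 >= 1, i.e. g(x) = 1 - x0 - x1 <= 0
F = exterior_penalty(lambda x: x[0]**2 + x[1]**2, [lambda x: 1 - x[0] - x[1]], [], mu=0.01, q=2)
```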

Algorithm 21 VNS using an \(\ell_{p}\) norm

The Glob-VNS procedure from Algorithm 21 contains the following parameters in addition to k max and t max: (1) the values of the radii ρ k, k = 1, …, k max, which may be defined by the user or calculated automatically during the minimizing process; (2) the geometry of the neighborhood structures \(\mathcal{N}_{k}\), defined by the choice of metric. Usual choices are the \(\ell_{1}\), \(\ell_{2}\), and \(\ell_{\infty}\) norms; (3) the distributions used for obtaining random points y from \(\mathcal{N}_{k}\) in the Shaking step. A uniform distribution in \(\mathcal{N}_{k}\) is the obvious choice, but other distributions may lead to much better performance on some problems. Different choices of neighborhood structures and random point distributions lead to different VNS-based heuristics.

(ii) \(\underline{\mathbf{VNS\ without\ using\ }\ell_{p}\mathbf{\ norm\ neighborhoods}}\). Two different neighborhoods, N 1(x) and N 2(x), are used in the VNS based heuristic suggested in [99]. In N 1(x), r (a parameter) random directions from the current point x are generated and a one-dimensional search along each direction is performed. The best point (out of r) is selected as the new starting solution for the next iteration, if it is better than the current one. If not, as in VND, the search is continued within the next neighborhood N 2(x). The new point in N 2(x) is obtained as follows. The current solution is moved, for each x j (j = 1, …, n), by a value Δ j taken at random from the interval (−α, α): either x j(new) = x j + Δ j or x j(new) = x j − Δ j. The points obtained by taking the plus or the minus sign for each variable define the neighborhood N 2(x). If a relative increase of 1% in x j(new) produces a better solution than x(new), the + sign is chosen; otherwise the − sign is chosen.

Neighborhoods N 1(x) and N 2(x) are used for designing two algorithms. The first, called VND, iterates over these neighborhoods until there is no improvement in the solution value. In the second variant, a local search is performed with N 2 and k max is set to 2 for the shaking step.

It is interesting to note that computational results reported by all VNS based heuristics were very promising. They usually outperformed other recent approaches from the literature.

3.8 Variable Neighborhood Programming (VNP): VNS for Automatic Programming

Building an intelligent machine is an old dream that, thanks to computers, has begun to take shape. Automatic programming is an efficient technique that has led to important developments in the field of artificial intelligence. Genetic programming (GP) [73], inspired by the genetic algorithm (GA), is among the few evolutionary algorithms used to evolve a population of programs. The main difference between GP and GA is the representation of a solution. An individual in GA can be a string, while in GP the individuals are programs. A tree is the usual way to represent a program in GP. For example, assume that the current solution of a problem is the following function:

$$\displaystyle{f(x_{1},\ldots,x_{5}) = \frac{x_{1}} {x_{2} + x_{3}} + x_{4} - x_{5}.}$$

Then the code (tree) that calculates f using GP may be represented as in Fig. 3.4a.

Elleuch et al. [41, 42] recently adapted VNS rules for solving automatic programming problems. They first suggested an extended solution representation obtained by adding coefficients to variables. Each terminal node is attached to its own parameter value. These parameters give a weight to each terminal node, with values from the interval [0, 1]. This type of representation allows VNP to examine parameter values and the tree structure in the same iteration, increasing the probability of finding a good solution faster. Let G = {α 1, α 2, …, α n} denote the parameter set. In Fig. 3.4b an example of a solution representation in VNP is illustrated.

Fig. 3.4
figure 4

Current solution representation in automatic programming problem: (a) \(\frac{x_{1}} {x_{2}+x_{3}} + x_{4} - x_{5}\); (b) \(\frac{\alpha _{1}x_{1}} {\alpha _{2}x_{2}+\alpha _{3}x_{3}} +\alpha _{4}x_{4} -\alpha _{5}x_{5}\)

\(\underline{\mathbf{(i)\ Neighborhood\ structures}}\). Nine different neighborhood structures are proposed in [42] based on a tree representation. To save space, we will just mention some of them:

  • N 1(T)—Changing a node value operator. This neighborhood preserves the tree structure and changes only the values of a functional or a terminal node. Each node has a set of allowed values from which one can be chosen. Let x i be the current solution; then a neighbor x i+1 differs from x i by just a single node. A move within this neighborhood is shown in Fig. 3.5.

    Fig. 3.5 Neighborhood N 1: changing a node value

  • N 2(T)-Swap operator. Here, a subtree from the current tree is randomly selected and a new random subtree is generated as shown in Fig. 3.6a1 and a2. Then the new subtree replaces the current one (see Fig. 3.6b). In this move, any constraint related to the maximum tree size should be respected.

    Fig. 3.6 Neighborhood N 2: swap operator. (a1) The current solution. (a2) New generated subtree. (b) The new solution

  • N 3(T)—Changing parameter values. In the two previous neighborhoods, the tree structure and the node values were considered. In the N 3(T) neighborhood, attention is paid to the parameters. So, the position and value of nodes are kept in order to search the neighbors in the parameter space. Figure 3.7 illustrates the procedure where the change from one value to another is performed at random.

    Fig. 3.7 Neighborhood N 3: change parameters

These neighborhoods may be used both in the local search step (\(N_{\ell },\ell \in [1,\ell _{max}]\)) and in the shaking step (\(\mathcal{N}_{k},k \in [1,k_{max}]\)) of the VNP.
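As announced above, here is a minimal Python sketch of the N 1(T) move (changing a node value). It assumes the tree is stored as nested lists [operator, left, right] with variable names as terminal strings; the allowed function and terminal sets below are illustrative assumptions.

```python
import copy
import random

FUNCTIONS = ['+', '-', '*', '/']
TERMINALS = ['x1', 'x2', 'x3', 'x4', 'x5']

def internal_nodes(tree):
    """Collect references to every internal node (list) of the tree."""
    nodes, stack = [], [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            nodes.append(node)
            stack.extend(node[1:])
    return nodes

def change_node_value(tree):
    """Return a neighbor of `tree` that differs in exactly one node value."""
    neighbor = copy.deepcopy(tree)
    node = random.choice(internal_nodes(neighbor))
    # candidate positions: the operator itself plus any terminal children
    positions = [0] + [i for i, c in enumerate(node) if i > 0 and isinstance(c, str)]
    pos = random.choice(positions)
    if pos == 0:
        node[0] = random.choice([op for op in FUNCTIONS if op != node[0]])
    else:
        node[pos] = random.choice([t for t in TERMINALS if t != node[pos]])
    return neighbor

T = ['-', ['+', ['/', 'x1', ['+', 'x2', 'x3']], 'x4'], 'x5']
print(change_node_value(T))
```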

(ii) VNP shaking. The shaking step allows diversification in the search space. The proposed VNP algorithm does not use exactly the same neighborhood structures for shaking as for the local search. Thus, we denote the neighborhoods used in the shaking phase as \(\mathcal{N}_{k}(T),k = 1,\ldots,k_{max}\). \(\mathcal{N}_{k}(T)\) may be constructed by repeating k times one or more moves from the set \(\{N_{\ell }(T)\mid \ell = 1,\ldots,\ell _{max}\}\). Consider, for example, the swap operator N 2(T). Let m denote the maximum number of nodes in the tree representation of a solution. We can get a solution from the kth neighborhood of T using the swap operator, where k represents the number of nodes of the newly generated sub-tree. If n denotes the number of nodes in the original tree after deleting the old sub-tree, then n + k ≤ m. The objective of the shaking phase is to provide a good starting point for the local search.
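A hedged sketch of this shaking step: a point in the k-th shaking neighborhood is obtained here by applying an elementary move k times. The `move` argument is assumed to be one of the N(T) operators, for instance the change_node_value sketch above; the stand-in move below is only there to keep the example runnable.

```python
import copy

def shake(tree, k, move):
    """Apply the elementary move k times, yielding a point in the k-th shaking neighborhood."""
    shaken = copy.deepcopy(tree)
    for _ in range(k):
        shaken = move(shaken)
    return shaken

# stand-in elementary move: swap the two children of the root node
def swap_root_children(tree):
    return [tree[0], tree[2], tree[1]] if isinstance(tree, list) else tree

print(shake(['-', ['+', 'x1', 'x2'], 'x3'], k=3, move=swap_root_children))
```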

(iii) VNP objective function. The evaluation consists of defining a fitness (or objective) function to assess a solution. This function depends on the problem considered. After running each solution (program) on a training data set, the fitness may be measured by counting the training cases where the returned solution is correct or close to the exact solution.

(iv) An example: Time series forecasting (TSF) problem. Two widely used benchmark data sets of the TSF problem are considered in [42] to study the VNP capabilities: the Mackey-Glass series and the Box-Jenkins set. The parameters for the VNP implementation that were chosen after some preliminary testing are given in Table 3.5.

Table 3.5 VNP parameter settings for the forecasting problem

The root mean square error (RMSE) is used as the fitness function, as is normally done in the literature:

$$\displaystyle{f(T) = \sqrt{ \frac{1} {n}\sum _{j=1}^{n}(y_{t}^{j} - y_{out}^{j})^{2}}}$$

where n is the total number of samples, and \(y_{out}^{j}\) and \(y_{t}^{j}\) are the output of the VNP model and the desired output for sample j, respectively. Next we illustrate with a comparison on a single Box-Jenkins instance.
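A small sketch of this fitness computation, assuming the candidate program has already been run on the training set to produce an output for each of the n samples:

```python
import math

def rmse(y_target, y_out):
    """Root mean square error between desired outputs and model outputs."""
    n = len(y_target)
    return math.sqrt(sum((t - o) ** 2 for t, o in zip(y_target, y_out)) / n)

print(rmse([52.4, 53.1, 53.8], [52.6, 53.0, 53.5]))   # about 0.216
```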

The gas furnace data for this instance were collected from a combustion process of a methane-air mixture [20]. This time series has found widespread use as a benchmark for testing prediction algorithms. The data set contains 296 pairs of input-output values. The input u(t) corresponds to the gas flow, and the output y(t) is the CO2 concentration in the outlet gas. The inputs of the model are u(t − 4) and y(t − 1), and the output is y(t). In this work, 200 samples are used in the training phase and the remaining samples are used in the testing phase. The performance of the evolved VNP model is evaluated by comparing it with existing approaches. The RMSE achieved by the VNP output model is 0.00038, which is better than the RMSE obtained by other approaches, as shown in Table 3.6.
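A hedged sketch of how the gas-furnace samples could be arranged according to the lag structure stated above (inputs u(t − 4) and y(t − 1), output y(t)); here `u` and `y` are assumed to be the raw series of 296 measurements each, loaded elsewhere, and the 200/remaining split follows the text.

```python
def make_samples(u, y, train_size=200):
    """Build ((u[t-4], y[t-1]), y[t]) pairs and split them into train and test sets."""
    samples = [((u[t - 4], y[t - 1]), y[t]) for t in range(4, len(y))]
    return samples[:train_size], samples[train_size:]

# train, test = make_samples(u, y)   # u, y: lists of 296 measurements
```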

Table 3.6 Comparison of testing error on Box-Jenkins dataset

3.9 Discovery Science

In all the above applications, VNS is used as an optimization tool. It can also lead to results in “discovery science”, i.e., for the development of new theories. This has been done for graph theory in a long series of papers with the common title “Variable neighborhood search for extremal graphs” that report on the development and applications of the AutoGraphiX (AGX) system [10, 28, 29]. This system addresses the following problems:

  • Find a graph satisfying given constraints.

  • Find optimal or near optimal graphs for an invariant subject to constraints.

  • Refute a conjecture.

  • Suggest a conjecture (or repair or sharpen one).

  • Provide a proof (in simple cases) or suggest an idea of proof.

A basic idea then is to address all of these problems as parametric combinatorial optimization problems on the infinite set of all graphs (or in practice some smaller subset) using a generic heuristic to explore the solution space. This is being accomplished using VNS to find extremal graphs with a given number n of vertices (and possibly also a given number of edges). Extremal graphs may be viewed as a family of graphs that maximize some invariant such as the independence number or chromatic number, possibly subject to constraints. We may also be interested in finding lower and upper bounds on some invariant for a given family of graphs. Once an extremal graph is obtained, VND with many neighborhoods may be used to build other such graphs. Those neighborhoods are defined by modifications of the graphs such as the removal or addition of an edge, rotation of an edge, and so forth. Once a set of extremal graphs, parameterized by their order, is found, their properties are explored with various data mining techniques, leading to conjectures, refutations and simple proofs or ideas of proof.
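To make the neighborhood idea concrete, here is a minimal sketch of such edge-based moves using the networkx library. The three moves (removal, addition and rotation of an edge) follow the description above, but their exact implementation in AGX may differ.

```python
import random
import networkx as nx

def remove_edge_move(G):
    """Neighbor obtained by deleting one randomly chosen edge."""
    H = G.copy()
    if H.number_of_edges():
        H.remove_edge(*random.choice(list(H.edges())))
    return H

def add_edge_move(G):
    """Neighbor obtained by adding one randomly chosen missing edge."""
    H = G.copy()
    non_edges = list(nx.non_edges(H))
    if non_edges:
        H.add_edge(*random.choice(non_edges))
    return H

def rotate_edge_move(G):
    """Neighbor obtained by replacing an edge (u, v) with (u, w) for some non-neighbor w of u."""
    H = G.copy()
    if H.number_of_edges():
        u, v = random.choice(list(H.edges()))
        candidates = [w for w in H.nodes() if w not in (u, v) and not H.has_edge(u, w)]
        if candidates:
            H.remove_edge(u, v)
            H.add_edge(u, random.choice(candidates))
    return H

G = nx.path_graph(6)
print(sorted(rotate_edge_move(G).edges()))
```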

More recent applications include [31, 32, 45, 50, 55] in chemistry, [8, 29] for finding conjectures, [16, 35] for largest eigenvalues, [23, 56, 64] for extremal values in graphs, independence [17, 18], specialty indexes [11, 15, 19, 61] and others [13, 60, 95, 96]. See [9] for a survey with many further references.

The current list of references in the series “VNS for extremal graphs” corresponds to [3, 8, 10–19, 23, 28, 29, 31, 32, 35, 45, 50, 55, 56, 60, 61, 64, 95, 96]. Another list of papers, not included in this series, is [4–7, 9, 30, 33, 49, 52–54, 65, 97]. Papers in these two lists cover a variety of topics:

  1. Principles of the approach [28, 29] and its implementation [10];

  2. Applications to spectral graph theory, e.g., bounds on the index for various families of graphs, graphs maximizing the index subject to some conditions [16, 19, 23, 35, 65];

  3. Studies of classical graph parameters, e.g., independence, chromatic number, clique number, average distance [3, 9, 12, 17, 18, 95, 96];

  4. Studies of little known or new parameters of graphs, e.g., irregularity, proximity and remoteness [4, 56];

  5. New families of graphs discovered by AGX, e.g., bags, which are obtained from complete graphs by replacing an edge by a path, and bugs, which are obtained by cutting the paths of a bag [14, 60];

  6. Applications to mathematical chemistry, e.g., study of chemical graph energy, and of the Randić index [11, 15, 32, 45, 49, 50, 52, 53, 55];

  7. Results of a systematic study of 20 graph invariants, which led to almost 1500 new conjectures, more than half of which were proved by AGX and over 300 by various mathematicians [13];

  8. Refutation or strengthening of conjectures from the literature [8, 30, 53];

  9. Surveys and discussions about various discovery systems in graph theory, assessment of the state of the art and of the forms of interesting conjectures, together with proposals for the design of more powerful systems [33, 54].

3.10 Conclusions

The general schemes of variable neighborhood search have been presented and discussed. In order to evaluate research development related to VNS, one needs a list of the desirable properties of metaheuristics.

  1. Simplicity: the metaheuristic should be based on a simple and clear principle, which should be widely applicable;

  2. Precision: the steps of the metaheuristic should be formulated in precise mathematical terms, independent of possible physical or biological analogies which may have been the initial source of inspiration;

  3. Coherence: all steps of heuristics developed for solving a particular problem should follow naturally from the metaheuristic principles;

  4. Effectiveness: heuristics for particular problems should provide optimal or near-optimal solutions for all known or at least the most realistic instances. Preferably, they should find optimal solutions for most benchmark problems for which such solutions are known;

  5. Efficiency: heuristics for particular problems should take a moderate computing time to provide optimal or near-optimal solutions, or comparable or better solutions than the state-of-the-art;

  6. Robustness: the performance of the metaheuristic should be consistent over a variety of instances, i.e., not merely fine-tuned to some training set and not so good elsewhere;

  7. User-friendliness: the metaheuristic should be clearly expressed, easy to understand and, most importantly, easy to use. This implies it should have as few parameters as possible, ideally none;

  8. Innovation: the principle of the metaheuristic and/or the efficiency and effectiveness of the heuristics derived from it should lead to new types of application;

  9. Generality: the metaheuristic should lead to good results for a wide variety of problems;

  10. Interactivity: the metaheuristic should allow the user to incorporate his knowledge to improve the resolution process;

  11. Multiplicity: the metaheuristic should be able to produce several near-optimal solutions from which the user can choose.

We have tried to show here that VNS possesses, to a great extent, all of the above properties. This framework has led to heuristics which are among the very best ones for many problems. Interest in VNS is growing quickly, as evidenced by the increasing number of papers published each year on this topic: 20 years ago, only a few; 15 years ago, about a dozen; 10 years ago, about 50; and more than 250 papers in 2016.

Figure 3.8 shows the parallel increase in the number of papers on VNS and on metaheuristics in general. The data were obtained with the Scopus search tool by looking for the terms “Variable Neighborhood Search” (VNS) and “Metaheuristics” (MH) in the abstracts of papers in that database. The period covered is 2000 to 2017, with only the first six months of 2017 (January to June) included. For comparison purposes, the number of papers containing MH is divided by 4.

Fig. 3.8 VNS versus MH

Figure 3.9 shows the parallel increase in the number of papers on VNS and on the other best-known metaheuristics. Data were again collected with the Scopus search tool, looking for the terms Variable Neighborhood Search (VNS), Tabu Search (TS), Genetic Algorithms (GA) and Simulated Annealing (SA). For better illustration, the numbers of appearances of TS, GA and SA are divided by 3, 50 and 10, respectively.

Fig. 3.9 VNS versus other main MHs

From the last figure, one can easily see that the relative increase in the number of papers on VNS is larger than that of the other major metaheuristics, especially in the last 5 years.

In addition, the 18th EURO Mini Conference, held in Tenerife in November 2005, was entirely devoted to VNS. It led to special issues of the IMA Journal of Management Mathematics in 2007 [76] and of the European Journal of Operational Research [68] and the Journal of Heuristics [87] in 2008. Since then, VNS conferences have taken place in Herceg Novi, Montenegro (2012), Djerba, Tunisia (2014), Málaga, Spain (2016) and Ouro Preto, Brazil (2017). Each meeting was accompanied by pre-conference proceedings (in Electronic Notes in Discrete Mathematics) and by at least one post-conference special issue in a leading OR journal: Computers & Operations Research, Journal of Global Optimization, IMA Journal of Management Mathematics, and International Transactions in Operational Research.