Keywords

Introduction

Optimization is a key discipline in fields such as computer science, artificial intelligence, and operations research. Outside these scientific communities, the meaning of optimization becomes quite vague, going to mean simply “do it as better as you can.” In the context of this chapter, the concept of optimization is conceived as the process of trying to find the best possible solution to an optimization problem, usually in a limited time horizon.

In the simplest case, an optimization problem may be defined by a 2-tuple (X, f), where X represents the set of feasible solutions and f is an objective function that assigns a real number to each solution x ∈ X, which represents its quality or fitness. Then, the main objective of an optimization problem is to find a solution x ∈ X with the best objective function value among all solutions in the search space. Therefore, in a minimization problem, x ∈ X is a minimum point if f(x) ≤ f(x), ∀x ∈ X. Notice that minimization of f is equivalent to maximization of − f.

In optimization problems, there is usually either finite but a huge number or infinity number of solutions and a clear criterion for the comparison among them. Some well-known examples of optimization problems are the traveling salesman problem (TSP), the vehicle routing problem (VRP), the quadratic assignment problems (QAP), or scheduling problems, among others. A detailed description of these problems can be found in [3, 36]. The difficulty of solving these problems has been studied since the late 1970s [12]. These studies concluded that there is a subset of problems where it is possible to design an algorithm, which presents a polynomial computational complexity, i.e., the execution time of these algorithms polynomially grows with the problem size. Such problems belong to the class P, and they are considered “easy to solve.” Examples of these problems are the shortest path problem (Dijkstra algorithm), the minimum spanning tree (Prim or Kruskal algorithms), or flows in networks (Ford-Fulkerson algorithm). However, computing optimal solutions is intractable for many optimization problems of industrial and scientific importance (i.e., there is no known algorithm with polynomial complexity to solve it optimally). This type of problems belongs to a class known as NP, and they are considered “hard to solve.”

In practice, we are usually satisfied with “good” solutions, which are obtained by heuristic algorithms. In particular, metaheuristics (MHs) represent a family of approximate [42] optimization techniques that gained a lot of popularity in the past two decades, becoming the most promising and successful techniques for solving hard problems. Unlike exact optimization algorithms, metaheuristics do not guarantee the optimality of the obtained solutions. Additionally, metaheuristics do not define how close the obtained solutions are from the optimal ones, in contrast with approximation algorithms. MHs provide acceptable solutions in a reasonable computing time for solving hard and complex problems in science and engineering.

The term metaheuristic was coined in 1986 [13] as a way of defining a master process that guides and modifies other subordinate heuristics to explore solutions beyond simple local optimality. MHs are among the most prominent and successful techniques to solve a large amount of complex and computationally hard combinatorial and numerical optimization problems arising in human activities, such as economics, industry, or engineering. MHs can be seen as general algorithmic frameworks that require relatively few modifications to be adapted to tackle a specific problem. They constitute a very diverse family of optimization algorithms including methods such as simulated annealing (SA), Tabu search (TS), genetic algorithms (GA), ant colony optimization (ACO), or variable neighborhood search (VNS).

Metaheuristics are high-level strategies for exploring the search space using different methods. The search strategies are highly dependent on the philosophy of the metaheuristic itself. In particular, trajectory-based metaheuristics can be seen as intelligent extensions of traditional local search methods. The goal of this kind of MH is to escape from a local optimum in order to proceed in the exploration of the search space and move on to find other hopefully better local optimum. Examples of these MHs are Tabu search [13], simulated annealing [23], or variable neighborhood search [15], among others. Population-based metaheuristics deal with a set of solutions instead of dealing with only one solution. These techniques proved a natural and intrinsic way for the exploration of the search space. The final performance of these methods strongly depends on how the population is managed. Examples of population-based metaheuristics are genetic algorithms [19], scatter search [14], or memetic algorithms [33], among others. Some authors consider a third kind of metaheuristics called constructive-based metaheuristics, where the main effort is put in the intelligent construction of the solution. In other words, instead of starting the search from a random solution, these methods try to construct a high-quality initial solution. Examples of constructive-based metaheuristics are GRASP [10], ant colony optimization [5], or iterated greedy [39], among others.

The number of new proposed metaheuristics has amazingly increased in the last 25 years. Nowadays, the portfolio of MHs contains more than 50 variants, only considering the most stabilized ones (MHs successfully applied to a relatively large set of optimization problems). However, at the end, when designing a metaheuristic for an optimization problem, we face with two contradictory criteria: intensification (exploitation) and diversification (exploration). In fact, the performance of a metaheuristic basically relies on how it balances both criteria. The intensification of an algorithm describes its ability to thoroughly explore the promising regions in the hope to find better solutions. On the other hand, diversification describes the ability of the metaheuristic to explore non-visited regions of the search space in order to assure the evenly exploration of the search space and to avoid the confinement to the procedure to a reduced number of regions. Therefore, when tackling an optimization problem, it is necessary to search for the equilibrium between both criteria.

Variable neighborhood search (VNS) is a metaheuristic which was proposed in [32] as a general framework to solve hard optimization problems. This methodology is based on performing systematic changes of neighborhoods during the search space exploration. VNS has evolved in recent years, resulting in a large variety of strategies. Some of the most relevant variants are reduced VNS (RVNS), variable neighborhood descent (VND), basic VNS (BVNS), skewed VNS (SVNS), general VNS (GVNS), or variable neighborhood decomposition search (VNDS), among others (see [18] for a survey on VNS). We refer the reader to [6, 8, 37, 40, 41] to some recent and successful applications of VNS to hard optimization problems.

In this chapter, we focus on the deterministic variant of VNS, namely, variable neighborhood descent (VND). VND deserves separate attention since it is usually used in context of other metaheuristics as a local search routine. Once the set of neighborhood structures is selected, to be used in a deterministic manner, the VND-based local searches may be designed in three possible ways: (i) sequential VND, where neighborhoods are placed in the list with a given order and always explored in that order; (ii) nested or composite VND, where neighborhood operators are composed, i.e., N1(N2(N3(…(x)))) (neighborhood one of neighborhood two of neighborhood three, etc. of x; and (iii) mixed nested VND, where the two previous strategies are combined.

In this chapter, we first give some possible classification of neighborhood structures that are usually used in solving continuous and discrete optimization problems. Then we provide pseudocodes of sequential, composite, and mixed nested variants of VND. Variants of sequential VND are compared on traveling salesman problem.

Neighborhoods

The representation (encoding) of a solution in an optimization problem plays a relevant topic in the design of an algorithm. In fact, as it is well documented, this representation determines the difficulty of solving a problem [12] and also the complexity of some routines within algorithm. In addition, this encoding strongly influences the way in which the neighborhoods of a given solution are defined. Therefore, it is not possible to separate the neighborhood of a solution from its corresponding representation in the computer memory.

Let us assume without loss of generality that each solution x ∈ X is represented by a vector x = (x1, …, xn), being n the size of the problem. Depending on the values of each xi, we can distinguish among different types of problems: continuous (\(x_i \in \mathbb {R}\)), binary (xi ∈{0, 1}), integer (\(x_i \in \mathbb {N}\)), or permutations (\(x_i \in \mathbb {N}\), 1 ≤ xi ≤ n and xi ≠ xj if i ≠ j). We could also identify a fifth class of problems which encompasses mixed variables. In this section, we describe the most common neighborhoods defined in the related literature as well as some basic properties.

Neighborhoods for Continuous Optimization Problems

The continuous constrained nonlinear global optimization problem (GOP) in general form is given as follows:

$$\displaystyle \begin{aligned} (GOP)\quad\left[ \begin{array}{ll} \min \;\; f(x)&\\ \mathrm{s.t.}\quad g_i(x) \leq 0& \forall i \in \{1,2,\ldots, m\}\\ \quad\quad\ h_i(x) = 0 & \forall i \in \{1,2, \ldots, r\}\\ \quad\quad\ a_j \leq x_j\leq b_j& \forall j \in \{1,2, \ldots, n\}\\ \end{array} \right.\end{aligned} $$

where \(x\in \mathbb {R}^n\), \(f:\mathbb {R}^n \to \mathbb {R}\), \(g_i:\mathbb {R}^n \to \mathbb {R}\), i = 1, 2, …, m, and \(h_i:\mathbb {R}^n \to \mathbb {R}\), i = 1, 2, …, r are possibly nonlinear continuous functions and \(a,b\in \mathbb {R}^n\) are the variable bounds.

GOP naturally arises in many applications, e.g., in advanced engineering design, data analysis, financial planning, risk management, scientific modeling, etc. Most cases of practical interest are characterized by multiple local optima, and, therefore, in order to find the globally optimal solution, a global scope search effort is needed.

If the feasible set X is convex and objective function f is convex, then GOP is relatively easy to solve, i.e., the Karush-Kuhn-Tucker conditions may be applied. However, if X is not a convex set or f is not a convex function, we could have many local minima, and thus, the problem may not be solved by using classical techniques.

For solving GOP, neighborhood structures \(\mathscr {N}_{k}(x)\) are usually induced from the p metric:

$$\displaystyle \begin{aligned} \rho (x,y)=\left( \sum_{i=1}^{n}|x_{i}-y_{i}|{}^{p}\right)^{1/p}\text{ (} 1\leq p<\infty \text{)}{}\end{aligned} $$
(1)

or

$$\displaystyle \begin{aligned} \rho (x,y)=\max_{1\leq i\leq n}|x_{i}-y_{i}|\quad\text{(} p\rightarrow \infty \text{).}{}\end{aligned} $$
(2)

The neighborhood \(\mathscr {N}_{k}(x)\) denotes the set of solutions in the k–th neighborhood of x, and using the metric ρ, it is defined as:

$$\displaystyle \begin{aligned} \mathscr{N}_{k}(x)=\{y\in X\;|\;\rho (x,y)\leq \rho _{k}\}, {}\end{aligned} $$
(3)

or

$$\displaystyle \begin{aligned} \mathscr{N}_{k}(x)=\{y\in X\;|\;\rho _{k-1}\leq \rho (x,y)\leq \rho _{k}\}, {} \end{aligned} $$
(4)

where ρk, known as the radius of \(\mathscr {N}_{k}(x)\), is monotonically increasing with k.

Note that in some papers neighborhoods structures in \(\mathbb {R}^n\) are not induced from the p metric (see e.g., [1]).

Neighborhoods for Binary Problems

There are a large family of optimization problems, whose solution is usually represented as a binary array, where the presence or absence of an element is described by means of a binary variable. The knapsack problem [28], the max-cut problem (MCP) [29], or the maximum diversity problem [7] are examples of these problems. For solving binary problems, neighborhood structures \(\mathscr {N}_{k}(x)\) are usually induced from the Hamming metric:

$$\displaystyle \begin{aligned} d_H (x,y)=\sum_{i=1}^{n}|x_{i}-y_{i}|{}\end{aligned} $$
(5)

More precisely, the k–th neighborhood of a solution x, i.e., \(\mathscr {N}_{k}(x)\), relatively to Hamming metric, is defined as

$$\displaystyle \begin{aligned} \mathscr{N}^{Bin}_{k}(x)=\{y\in X\;|\;d_H (x,y)=k\} {}\end{aligned} $$
(6)

We will use the max-cut problem to illustrate the most relevant characteristics of binary problems. Consider a graph G = (V, E) with vertex set V and edge set E. Let wij be the weight associated with edge (i, j) ∈ E. A cut(W, W) is a partition of V into two sets W, W = V ∖ W, and its value is given by the expression:

$$\displaystyle \begin{aligned} \mathit{cut}(W,W^{\prime}) = \sum_{i\in W \land j \in W^{\prime}} w_{ij}\end{aligned} $$

The max-cut problem (MCP) consists of finding a cut in G with maximum value. Karp [22] showed that MCP is an NP-hard problem. Figure 1a shows an example graph with five vertices and seven edges where the number close to each edge represents the corresponding weight. Figure 1b shows a possible solution, x = (W, W), where W = {1, 2} and W = {3, 4, 5}. The value of this solution is cut(W, W) = 9 + 14 + 10 + 5 = 38, computed as the sum of the edges whose endpoints are in different sets (dashed edges). This solution can be represented as a binary vector x = {1, 1, 0, 0, 0} where xi = 1 indicates that the corresponding vertex is in W, while xi = 0 means that the vertex is in W.

Fig. 1
figure 1

Example of graph with five vertices and seven edges and a possible solution for the MCP. (a) Example of a graph. (b) Solution x = {1, 1, 0, 0, 0}

For a given solution x of a MCP, we define \(\mathscr {N}^{\mathit {Bin}}_{\mathit {drop}}(x)\) neighborhood using the drop(x, i) move operator. This operator is responsible of changing the value of a variable xi from 1 to 0, producing a new solution x. The associated neighborhood, \(\mathscr {N}^{\mathit {Bin}}_{\mathit {drop}}(x)\), has size n (in the worst case) and is a subset of \(\mathscr {N}^{Bin}_{1}(x)\). Formally, \(\mathscr {N}^{\mathit {Bin}}_{\mathit {drop}}(x)\) is defined as:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Bin}}_{\mathit{drop}}(x) = \{x^{\prime} \leftarrow \mathit{drop}(x,i): x_i = 1 \land 1\leq i \leq n\} \end{aligned}$$

Figure 2a shows an example of a this type of move. In particular, drop(x, 2) considers the solution x (depicted in Fig. 1b), removes the vertex 2 from W, and includes it in W, producing a new solution xdrop = {1, 0, 0, 0, 0}. The Hamming distance between x and x is 1 since there is only one vertex located in a different group.

Fig. 2
figure 2

Solutions generated when performing the different moves to the one depicted in Fig. 1b. (a) xdrop(x, 2). (b) x′′add(x, 5). (c) x′′′swap(x, 1, 3)

Symmetrically, we define the add(x, i) as the move operator responsible of changing the value of a variable xi from 0 to 1, producing a new solution xadd. Considering the MCP, this move operator removes a vertex from W and includes it in W. This move is represented as x′′add(x, i), and the neighborhood, \(\mathscr {N}^{\mathit {Bin}}_{\mathit {add}}(x)\), with size n in the worst case, is:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Bin}}_{\mathit{add}}(x) = \{x^{\prime\prime} \leftarrow \mathit{add}(x,i): x_i = 0 \land 1\leq i \leq n\} \end{aligned}$$

Figure 2b shows the move add(x, 1). It considers again the solution x (depicted in Fig. 1b), removes the vertex 1 from W including it in W. The Hamming distance between the new produced solution, x′′ = {1, 1, 0, 0, 1}, and x is again 1.

Finally, we define \(\mathscr {N}^{\mathit {Bin}}_{\mathit {swap}}(x)\) neighborhood as a subset of \(\mathscr {N}^{Bin}_{2}(x)\) defined by swap move. Swap move is defined as an operation that changes the value of one variable xi from 1 to 0, and simultaneously a different variable xj change the value from 0 to 1. This move interchanges a vertex from W to W and, simultaneously a different vertex from W to W. This move is represented as x′′′swap(x, i, j), and the neighborhood, \(\mathscr {N}^{\mathit {Bin}}_{\mathit {swap}}(x)\) (with size n2 in the worst case), is:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Bin}}_{\mathit{swap}}(x) = \{x^{\prime\prime\prime} \leftarrow \mathit{swap}(x,i, j): x_i \neq x_j \land 1\leq i,j \leq n\}. \end{aligned}$$

Figure 2c shows the move swap(x, 1, 3). It considers again the solution x (depicted in Fig. 1b) and interchanges the vertex 1 and 3 between W and W, respectively. The Hamming distance between the new produced solution, x′′′ = {0, 1, 1, 0, 0}, and x is in this case 2.

Neighborhoods for Integer Problems

There are some optimization problems where the solution x is represented as a vector of n variables, x = (x1, x2, …, xn), such that l ≤ xj ≤ u for all j (i.e., assuming that all variables are integer and bounded within the interval [l, u]). For example, the solutions to the set covering problem [9], the capacitated task allocation problem [24], or the maximally diverse grouping problems [11] can be represented as a vector of integer values. For solving these problems, neighborhood structures \(\mathscr {N}^{\mathit {Int}}_{k}(x)\) are usually induced from the metric defined as:

$$\displaystyle \begin{aligned} \rho_{\mathit{int}} (x,y)= \sum_{i=1}^{n}|x_{i}-y_{i}|{} \end{aligned} $$
(7)

or

$$\displaystyle \begin{aligned} \rho_{\mathit{int}} (x,y)=\max_{1\leq i\leq n}|x_{i}-y_{i}|{} \end{aligned} $$
(8)

The neighborhood \(\mathscr {N}^{\mathit {Int}}_{k}(x)\) denotes the set of solutions in the k–th neighborhood of x, and it is defined as

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Int}}_{k}(x)=\{y\in X\;|\;\rho_{\mathit{int}} (x,y)\leq k\} {} \end{aligned} $$
(9)

or

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Int}}_{k}(x)=\{y\in X\;|\;{k-1}\leq \rho_{\mathit{int}} (x,y)\leq{k}\} {} \end{aligned} $$
(10)

We use the maximally diverse grouping problem to illustrate neighborhoods for integer problems. It consists of grouping a set of M elements into G mutually disjoint groups in such a way that the diversity among the elements in each group is maximized. The diversity among the elements in a group is calculated as the sum of the individual distances between each pair of elements, where the notion of distance depends on the specific application context. The objective of the problem is to maximize the overall diversity, that is, the sum of the diversity of all groups. Figure 3 shows an example of a graph with ten vertices where we must select three groups of elements. For the sake of clarity, we only represent the distances among selected elements. In addition, we use different shapes to differentiate the vertices on each group. In particular, group 1 (diamond shape) is G1 = {1, 4, 9} whose distances among edges are represented with dashed lines; group 2 (rectangle shape) is G2 = {3, 5, 8} represented with dotted lines; and group 3 (circle shape) is G3 = {2, 6, 7, 10} represented with solid lines. Therefore, this solution can be represented as a vector x = {1, 3, 2, 1, 2, 3, 3, 2, 1, 3} (i.e., each vertex i is located in group xi).

Fig. 3
figure 3

Example of the maximally diverse grouping problem

With this representation, we define the following moves:

  • Exchange of values (Swap move): given a solution x, the exchange of the values of the variables xi and xj is denoted by swap(x, i, j) and generates a new solution x such that \(x_k^{\prime } = x_k\), ∀k ≠ i, j, and \(x_i^{\prime } = x_j\), \(x_j^{\prime } = x_i\). Note that resulting solution x belongs to the neighborhood

    $$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Int}}_{0}(x)=\{y\in X\;|\;\max_{1\leq i\leq n}|x_{i}-y_{i}|=0\}.\end{aligned} $$
  • Replacement of a single value (Replacement move): given a solution x, the replacement move denoted by replace(x, j, i) creates a solution x for which \(x_k^{\prime } = x_k\), ∀k ≠ j, and \(x_j^{\prime } = i\). The move is such that xj ≠ i. In this case, the resulting solution x belongs to the neighborhood

    $$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Int}}_{m}(x)=\{y\in X\;|\;\sum_{1\leq i\leq n}|x_{i}-y_{i}| \leq m\},\end{aligned} $$

    where m = |xj − i|.

Given the solution x = {1, 3, 2, 1, 2, 3, 3, 2, 1, 3}, the move swap(x, 6, 5) interchanges the corresponding group of vertices 6 and 5, producing the solution x = {1, 3, 2, 1, 3, 2, 3, 2, 1, 3} (see Fig. 4a). Similarly, the move replace(x, 2, 1) includes the vertex 2 in group 1 (removing it from its original group). The final solution in this case is x′′ = {1, 1, 2, 1, 2, 3, 3, 2, 1, 3} (see Fig. 4b).

Fig. 4
figure 4

Example of moves applied to a solution for the maximally diverse grouping problem. (a) xswap(x, 6, 5). (b) x′′replace(x, 2, 1)

Neighborhoods for Permutation Problems

In permutation-based representations, a solution is typically expressed as a labeling (permutation), where each element receives a unique label from 1 to n, being n the size of the problem. The traveling salesman problem [35], the linear ordering problem [30], or the cutwidth minimization problem [34] are examples of this kind of problems. For solving these problems, neighborhood structures \(\mathscr {N}^{\mathit {Int}}_{k}(x)\) may be induced using several metrics (see e.g., [4] for possible metric). However, neighborhoods are usually induced from the Cayley distance:

$$\displaystyle \begin{aligned} \rho_{\mathit{perm}} (x,y):= \text{the minimum number of transpositions needed to obtain y from x}{}\end{aligned} $$
(11)

So, the neighborhood \(\mathscr {N}^{\mathit {Perm}}_{k}(x)\) denotes the set of solutions in the k–th neighborhood of x, and it is defined as:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Perm}}_{k}(x)=\{y\in X\;|\;\rho_{\mathit{perm}} (x,y)= k\} {} \end{aligned} $$
(12)

We use the cutwidth minimization problem (CMP) to illustrate some neighborhoods for permutation-based problem. It can be easily described in mathematical terms. Given a graph G = (V, E) with n = |V | and m = |E|, a labeling or linear arrangement x of G assigns the integers {1, 2, …, n} to the vertices in V , where each vertex receives a different label. The cutwidth of a vertex v with respect to x, CWx(v), is the number of edges (u, w) ∈ E satisfying x(u) ≤ x(v) < x(w). In mathematical terms:

$$\displaystyle \begin{aligned} CW_x(v)=|\{(u,w) \in E : x(u) \leq x(v) < x(w)\}| \end{aligned}$$

The cutwidth of G with respect to x is defined as the maximum value of all CWx(v) for v ∈ V . More formally:

$$\displaystyle \begin{aligned} CW_x(G) = \max_{v \in V} CW_x(v) \end{aligned}$$

The optimum cutwidth of G is then defined as the minimum CWx(G) over all possible layouts of G. This optimization problem is NP-hard even for graphs with a maximum degree of three [27]. Figure 5a shows an example of an undirected graph with six vertices and seven edges. Figure 5b shows a labeling, x, of the graph in Fig. 5a, setting the vertices in a line with the order of the labeling, as commonly represented in the CMP. We represent x with the ordering (C, A, D, E, B, F) meaning that vertex C is located in the first position (label 1), vertex A is located in the second position (label 2), and so on. In Fig. 5c, the cutwidth of each vertex is represented as a dashed line with its corresponding value. For example, the cutwidth of vertex C is CWx(C) = 1, because the edge (C, B) has an endpoint in C labeled with 1 and the other endpoint in a vertex labeled with a value larger than 1. In a similar way, we can compute the cutwidth of vertex A, CWx(A) = 4, by counting the appropriate number of edges: (C, B), (A, B), (A, E), and (A, D). Then, since the cutwidth of the graph G, CWx(G), is the maximum of the cutwidth of all vertices in V , in this particular example, we obtain CWx(G) = CWx(D) = 5.

Fig. 5
figure 5

Graph with six vertices and seven edges and an example solution for the CWP. (a) Example graph. (b) Cutwidth of G for a labeling x

The associated neighborhoods for this optimization problem are typically based on two different move operators. The first one is referred to as exchange. Given a solution x = (v1, …, vi, …, vj, …, vn), we define exchange(x, i, j) as exchanging in x the vertex in position i (i.e., vertex vi) with the vertex in position j (i.e., vertex vj), producing a new solution x = (v1, …, vi−1, vj, vi+1, …, vj−1, vi, vj+1, …, vn). So, exchange move represents a transposition. For the sake of simplicity, we denote x = exchange(x, i, j). The associated neighborhood \(\mathscr {N}^{\mathit {Perm}}_{\mathit {exchange}}\), obtained applying all possible exchange moves, has size n(n − 1)∕2, and it is formally defined as:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Perm}}_{\mathit{exchange}}(x) = \{x^{\prime} \leftarrow \mathit{exchange}(x,i,j): i \neq j \land 1\leq i,j \leq n\} \end{aligned}$$

This neighborhood is actually \(\mathscr {N}^{\mathit {Perm}}_{1}\) neighborhood.

The second move operator for CMP is known as insert. Specifically, given a solution x, we define insert(x, j, vi) as the move consisting of deleting vi from its current position i and inserting it in position j. This operation results in the new solution x as follows:

  • If i ≥ j, then x = (…, vj−1, vj, vj+1, …, vi−1, vi, vi+1, …), and the vertex vi is inserted just before the vertex vj , obtaining x = (…, vj−1, vi, vj, vj+1, …, vi−1, vi+1, …).

  • If i < j, then x = (…, vi−1, vi, vi+1, …, vj−1, vj, vj+1, …), and the vertex vi is inserted just after the vertex vj , obtaining x = (…, vi−1, vi+1, …, vj−1, vj, vi, vj+1, …).

Thus, a move insert(x, j, vi) generates a solution belonging to the neighborhood \(\mathscr {N}^{\mathit {Perm}}_{|j-i|}(x)\). The associated neighborhood, \(\mathscr {N}^{\mathit {Perm}}_{\mathit {insert}}\), obtained applying all possible insert moves has size n(n − 1)∕2, and it is formally defined as:

$$\displaystyle \begin{aligned} \mathscr{N}^{\mathit{Perm}}_{\mathit{insert}}(x) = \{x^{\prime} \leftarrow \mathit{insert}(x,j,v_i): i \neq j \land 1\leq i,j \leq n\} \end{aligned}$$

The following example illustrates how the insert is implemented. Let x = (C, A, D, E, B, F) be a solution of the cutwidth problem. Suppose that we perform insert(x, 2, B), obtaining solution x = (C, B, A, D, E, F). Figure 6 graphically shows the solutions before and after the move.

Fig. 6
figure 6

Example of performing an insert move in a permutation solution. (a) Solution x before the move. (b) Resulting solution after insert(x, 2, B)

Local Search Methods

Local search methods are likely the oldest and simplest heuristic methods. Starting from a given feasible solution, these procedures explore a determined neighborhood in each iteration, replacing the current solution if a neighbor improves the objective function. The search ends when all neighbor solutions are worse (i.e., larger objective function value in a minimization problem) meaning that a local optimum is found.

There exist two typical strategies to explore the corresponding neighborhood: best improvement and first improvement. In the former (also known as steepest descent), the associated neighborhood is completely explored by a fully deterministic procedure, performing the best associated move. Therefore, no matter how the neighborhood is scanned, since all neighbor solutions are visited. Algorithm 1 shows the typical pseudocode of this local search method for a minimization problem. The algorithm starts by initializing the control variable improve (step 1). Then, the best improvement strategy performs iterations until it finds a local optimum with respect to neighborhood N(x) (steps 2–9). Given a solution x, the best neighbor solution x is determined in step 4. This instruction has a computational complexity of |N(x)|. In steps 5–8, it is decided whether to perform a new iteration (by updating x to x) or not (abandoning the search).

Algorithm 1: Best improvement(x)

The first improvement strategy tries to avoid the time complexity of exploring the whole neighborhood by performing the first improving move encountered during the exploration of the corresponding neighborhood. In this kind of exploration, the order in which the neighbors are inspected can have a significant influence of the efficiency of the search. Instead of using a fixed ordering for scanning the neighbors, random orders are usually suggested, since the first scanning strategy always drives to the same local optimum, while the second one can reach different local optima.

The pseudocode of this strategy is equivalent to the one presented in Algorithm 1. The only difference is in step 4, where, instead of selecting the best neighbor, it selected the first neighbor which improves the incumbent solution (in terms of objective function value). Additionally, if we follow a predefined order (e.g., a lexicographic order), the first positions in that order (e.g., 1, 2, …, in the lexicographic order) are favored, producing a kind of bias in the search.

Figure 7 shows the performance of two iterations of both strategies starting from the same solution, where the numbers over the arrows indicate the order of exploration of the solutions. Specifically, considering the MCP described in section “Neighborhoods for Binary Problems”, the initial solution is x1 = [1, 1, 0, 0, 0], representing W = [1, 2] and W = [3, 4, 5], with an objective function value of 38. The neighborhood selected to be explored is \( \mathscr {N}^{\mathit {Bin}}_{\mathit {add}}(x_1) \), where each neighbor is generated by adding a new vertex to W (removing it from W). In the first iteration, the best improvement strategy (Fig. 7a) generates all possible neighbor solutions for x1 (i.e., x2, x3, x4). The exploring order, as stated before, is irrelevant, since it is going to explore the whole neighborhood. Then, the method selects the best neighbor solution, which is x3 in Fig. 7a, with a higher (better) objective function value of 43. Finally, in the next iteration, the method explores the entire x3 neighborhood, stopping after the exploration, since there is no better neighbor solution.

Fig. 7
figure 7

Comparison of best and first improvement strategies when starting from the same solution for the MCP. (a) Best improvement. (b) First improvement

Regarding the first improvement method (Fig. 7b), the exploration order (which is relevant in this case) is selected at random, starting with solution x2. As the objective function value of x2 is lower (worse) than x1, the method explores another neighbor, which is x4. The objective function value for x4, 41, is better than for x1, 38, so in this case, the first improvement strategy stops the iteration, starting the next one from this new best solution x4. Notice that although solution x3 is better than x4, it is not explored in this strategy, since the order selected has lead the method to find a better solution before reaching x3. Finally, in the next iteration, the method starts from x4 and then explores, in this order, x6 and x5, stopping the search since there is no improvement in any neighbor.

In the context of large neighborhoods, there is a compromise between the number of iterations required to find a local optimum and the associated computing time. In general, iterations performed in the first improvement strategy are more efficient than those in the best improvement one, since the former only evaluates part of the neighborhood, while the latter explores it completely. On the other hand, the improvement obtained in the first improvement strategy is typically smaller than the one achieved by the best improvement strategy, requiring in general more iterations to obtain the local optimum. Additionally, the best improvement strategy is usually more adequate to perform efficient catching and updating mechanisms, which allows the search to efficiently explores the neighborhood [20].

In [17], an empirical study on traveling salesman problem was conducted in order to compare the first and the best improvement strategy within 2-opt neighborhood structure. It appeared that the quality of the final solution depends on the quality of the initial solution: (i) if random initial solution is chosen, the better and faster is the first improvement strategy; (ii) the opposite holds if the greedy solution is taken as initial one.

VND Variants

The Variable Neighborhood Search (VNS) is a metaheuristic proposed in [32] as a general framework to solve hard problems. It is based on a simple idea: systematical changes of neighborhood structures within the search procedure. Let \({\mathscr {N}}=\{\mathscr {N}_1, \dots , \mathscr {N}_{k_{\mathrm {max}}}\}\) be set of operators such that each operator \(\mathscr {N}_k\), 1 ≤ k ≤ kmax maps a given solution x to a neighborhood structure \(\mathscr {N}_k(x)\). Note that the order of operators taken from the set \({\mathscr {N}}\) defines also the order of neighborhood structures of a given solution x examined. When solving an optimization problem by using different neighborhood structures, VNS methodology proposes to explore them in three different ways: (i) at random, (ii) deterministically, or (iii) mixed (both, in deterministic and random fashion).

Variable neighborhood descent (VND) is a variant of VNS that explores neighborhoods in a deterministic way. In general, VND explores small neighborhoods until a local optimum is encountered. At that point, the search process switches to a different (typically larger) neighborhood that might allow further progress. This approach is based on the fact that a local optimum is defined with respect to a neighborhood relation, such that if a candidate solution x is locally optimal in a neighborhood \(\mathscr {N}_i(x)\), it is not necessarily a local optimum for another neighborhood \(\mathscr {N}_j(x)\). Thus, VND explores the solution space using several neighborhood structures either in a (i) sequential, (ii) a nested (or composite), or (iii) mixed nested way [21, 43].

Sequential Variable Neighborhood Descent Procedures

Most typical VND variants traverse the list of neighborhood structures in a sequential way. Within this variants, the basic VNS, the pipe VND, the cyclic VND, and the union VND emerge as the most representative search procedures. These variants differ in how they implement the neighborhood change procedures. Specifically, if an improvement has been detected in some neighborhood, this is how the search (after updating the incumbent solution) is continued:

  • Basic VND (B-VND) – returns to the first neighborhood from the list

  • Pipe VND (P-VND) – continues the search in the same neighborhood

  • Cyclic VND (C-VND) – resumes the search in the next neighborhood from the list

  • Union VND (U-VND) (sometimes called multiple neighborhood search [44], where the single neighborhood is obtained as the union of all predefined neighborhoods) – continues the search in the same large neighborhood. U-VND is recently proposed in [2, 26, 44] and used within Tabu search.

All these VND procedures follow the steps given in Algorithm 2 and start from a given solution x. In each VND iteration, a local search procedure through a given neighborhood structure is applied, followed by a neighborhood change function (Step 7). The neighborhood change function defines the neighborhood structure that will be examined in the next iteration. Each VND variant stops when there is no improvement with respect to any of the considered neighborhood structures. Thus, the solution obtained by any sequential VND is a local optimum with respect to all neighborhood structures.

Algorithm 2: Sequential variable neighborhood descent

We show the performance of the three aforementioned variants considering the example introduced in [31]. In particular, it shows an empirical study on different VND variants used to solve traveling salesman problem. For testing purposes, 15,200 random test instances were generated in the way described in [17]. Within VND variants, three classical TSP neighborhood structures are considered: 2-opt (Fig. 8), Insertion-1 (Fig. 9), and Insertion-2 (Fig. 10). On each instance from this data set, each VND variant, except U-VND, is tested under 24 different settings. Each setting corresponds to choosing the following: (i) one out of two common ways for getting an initial solution, at random (solution generated as a random permutation of nodes) or greedy; (ii) one out of six possible neighborhood orders; and (iii) the best or the first improvement search strategy. This gives 2 × 6 × 2 = 24 different search methods that use 2-opt, Insertion-1, and Insertion-2 neighborhoods. On the other hand, U-VND is tested under only two different settings as: U-VND that uses the best improvement search strategy and the greedy initial solution and U-VND that uses the best improvement search strategy and the random initial solution. Note that if the first improvement search strategy is used within U-VND, then U-VND is equivalent to basic VND.

Fig. 8
figure 8

Example of a 2-opt move involving vertices 1, 2, 4, and 5

Fig. 9
figure 9

Example of an Insertion-1 move where vertex 5 is inserted between vertices 1 and 2 and removed from its corresponding former position

Fig. 10
figure 10

Example of an Insertion-2 move where vertices 4 and 5 are inserted between vertices 1 and 2, removing them from its corresponding former position

Table 1 summarizes results considering the entire set of 15,200 instances as a test case. Namely, the average solution values (Column ‘av. best value’) and average CPU times (Column ‘av. time’) over all test instances in the data set, attained by VND variants under the best settings, are reported.

Table 1 Comparison of VND variants

From the results presented in Table 1, the following conclusions may be drawn: (i) U-VND is slightly better than the others VNDs, regarding the best average values attained, but much slower than the others. This is explained by the fact that U-VND in each iteration performs exploration of large part of the solution space before deciding to re-center the search. Obviously, such principle is suitable for reaching good final solution, but requires a large CPU time; (ii) comparing VNDs that re-center the search in the inner loop (i.e., B-VND, P-VND, and C-VND), it follows that B-VND is able to provide the best solution values. Regarding average CPU times consumed by B-VND, P-VND, and C-VND to find the best-reported average solution value, their ranking is as follows: the fastest is P-VND, B-VND is ranked as the second, while C-VND is the slowest one. However, since, B-VND consumed negligible more CPU time to find the best reported average solution value comparing to CPU time that P-VND consumed to do so, we may conclude that B-VND is the most appropriate sequential VND version.

Nested Variable Neighborhood Descent

A nested (composite) variable neighborhood descent procedure [21] explores a large neighborhood structure obtained as a composition of several neighborhoods. More precisely, let \({\mathscr {N}}=\{N_1, \dots N_{k_{\mathrm {max}}}\}\) again be set of move operators such that each one Nk, 1 ≤ k ≤ kmax maps a given solution x to a predefined neighborhood structure Nk(x). Then, the neighborhood explored within a nested variable neighborhood procedure is defined with operator \(N_{{nested}}=N_1\circ N_2\circ \cdots \circ N_{k_{\mathrm {max}}}\). More precisely, the composite neighborhood of solution x is formed by first applying the move operator N1, obtaining N1(x). Then, the move operator N2 is applied to all solutions in N1(x), forming the set N1(N2(x)) and so on. Obviously, the cardinality of a neighborhood structure \(N_{{nested}}(x)=N_1(N_2(\dots (N_{k_{\mathrm {max}}}(x))))\) of some solution x is less than or equal to the product of cardinalities of nested (composed) neighborhoods, i.e.,

$$\displaystyle \begin{aligned} |N(x)|\leq \prod_{k=1}^{k_{\mathrm{max}}} |N_k(x)|.\end{aligned} $$

Such cardinality obviously increases chances to find an improvement in the neighborhood. The nested VND is illustrated in Algorithm 3. The neighborhood N(x) may be explored by using either the first or the best improvement search strategy. However, since its cardinality is usually very large, the first improvement is used more often [21, 43].

Algorithm 3: Steps of best improvement nested VND

We use the uncapacitated r-allocation p-hub median problem (r-p-HMP) to illustrate how this VND strategy works. Specifically, the r-p-HMP may be stated as follows. Given n nodes, this problem considers for each pair of nodes i and j, the distance dij and the amount of flow tij ≥ 0 that needs to be transferred from i to j. It is generally assumed that the transportation between non-hub nodes i and j is only possible via hub nodes hi and hj, to which nodes i and j are assigned, respectively. Transferring tij units of flow through path i → hi → hj → j induces a cost cij(hi, hj), which is computed as

$$\displaystyle \begin{aligned} c_{ij}(h_i,h_j)=t_{ij}(\gamma d_{ih_i}+\alpha d_{h_ih_j}+\delta d_{h_jj}).\end{aligned} $$

Parameters γ, α, and δ represent unit rates for collection (origin-hub), transfer (hub-hub), and distribution (hub-destination), respectively. Note that the hub nodes hi and hj may be equal.

We represent a solution of r-p-HMP by a set H containing p hubs and a matrix A, where each row i contains r hubs assigned to node i (i.e., i-th row coincides with the set Hi). Thus, our solution is represented as x = (H, A). The initial solution is obtained using the greedy heuristic described in [38]. We consider two neighborhood structures of a given solution x = (H, A). The first neighborhood structure, denoted by \({\mathscr {N}}_H\), is obtained by replacing one hub node from H by another non-hub node from N ∖ H (see Fig. 11). More formally,

$$\displaystyle \begin{aligned} {\mathscr{N}}_H(x)=\{x^{\prime} \;|\; x^{\prime}=(H^{\prime},A^{\prime}), |H\cap H^{\prime}|=p-1 \}. \end{aligned}$$

The second neighborhood, denoted by \({\mathscr {N}}_A\), is obtained by replacing one hub assigned to some node with another hub, while the set H remains unchanged (see Fig. 12):

$$\displaystyle \begin{aligned} {\mathscr{N}}_A(x)=\{x^{\prime}|x^{\prime}=(H,A^{\prime}), |A\backslash A^{\prime}|=1 \}. \end{aligned}$$
Fig. 11
figure 11

Interchange neighborhood \({\mathscr {N}}_H\). Yellow dots are the possible new hub location for the one represented in green

Fig. 12
figure 12

Allocate neighborhood \({\mathscr {N}}_A\). Yellow dot represents the non-hub node which is reallocated to the another hub

Unfortunately, evaluating the objective function value of a solution from \({\mathscr {N}}_H\) requires to solve an allocation problem, which is \(\mathscr {NP}\) hard by itself [25]. Therefore, solving exactly the associated allocation problem will be quite time consuming. In order to deal with this drawback, we find near-optimal allocation A of some solution \(x^{\prime \prime }\in {\mathscr {N}}_H\) as follows. For each node, we firstly determine node-to-hub allocation using the greedy procedure (see Algorithm 4).

Algorithm 4: Greedy allocation

The solution x′′ = (H′′, A′′) obtained in this way is then improved by exploring the second neighborhood \({\mathscr {N}}_A(x^{\prime \prime })\). In that way, the so-called nested variable neighborhood descent (Nest-VND) is defined (see Algorithm 5).

Algorithm 5: Nested VND for r-p-HMP

In [43], the above nested VND was used as a local search within a general variable neighborhood search (GVNS)-based heuristic. The performance of this heuristic, named GVNS_RP, was compared with the GRASP heuristic proposed in [38]. In Table 2, we provide a summarized results of comparison of GVNS_RP and GRASP on instances from AP data set (see [43]). The average value of best found solutions and average CPU times needed for finding these solutions over all instances with the same number of nodes are reported. The column headings are defined as follows. In the first column of Table 2, we report the number of nodes in the considered instances, whereas in the columns “GRASP” and “GVNS_RP,” we provide the average of best solution values found by GRASP and GVNS_RP, respectively. In columns “time,” the average time needed to reach best found solutions for instances with n nodes is given, while in column “impr.( <percent>) ,” we report the percentage improvement obtained by GVNS_RP compared with the current best known values. From the reported results, it follows that within each set of instances with the same number of nodes, there is at least one instance where the best known solution is improved by GVNS_RP. Moreover, the average improvement on AP data set achieved by GVNS variants is around 0.25 %.

Table 2 Comparison of GRASP and GVNS on AP instances

Mixed Variable Neighborhood Descent

Mixed variable neighborhood descent [21] combines ideas of sequential and nested variable neighborhood descent. Namely, it uses a set of move operators \({\mathscr {N}}=\{N_1, \dots N_{b}\}\) to define a nested neighborhood, and, after that, on each element in this nested neighborhood, it applies a sequential variable neighborhood descent variant defined by a set of move operators \(\mathscr {N}^{\prime }=\{N_{b+1}, \dots N_{k_{\mathrm {max}}}\}\). The cardinality of the set explored in one iteration of a mixed VND is bounded by:

$$\displaystyle \begin{aligned} |N_{mixed}(x)|\leq \prod_{\ell=1}^{b} |{N}_{\ell}(x)|\times \sum_{\ell=b+1}^{k_{\mathrm{max}}}|{N}_{\ell}(x)|, x \in S . \end{aligned}$$

In Algorithm 6, we show the pseudocode of a mixed VND. Note that if the set \({\mathscr {N}}\) is the empty set (i.e., b = 0), we get pure sequential VND. If b = kmax, we get pure nested VND. Since nested VND intensifies the search in a deterministic way, boost parameter b may be seen as a way of balancing intensification and diversification in deterministic local search with several neighborhoods.

Algorithm 6: Steps of mixed VND

In [21], two mixed VND heuristics along with one basic sequential VND heuristic were proposed for solving the incapacitated single allocation p-hub median problem (USApHMP). Neighborhood structures examined within these VND variants are cluster based. A cluster represents one hub with all locations assigned to it. In particular, the following neighborhood structures are distinguished: (i) Allocate- change membership of a non-hub node by reallocating it to the another hub without changing the location of hubs; (ii) Alternate-change the location of the hub in one cluster; (iii) Locate - select one cluster C with hub h and a location node that is not in this cluster. The selected node becomes a hub, and all locations from the cluster C are assigned to the closest hub (including the new one).

The proposed basic sequential VND heuristic, named Seq-VND, examines neighborhood structures Allocate, Alternate, and Locate in that order. On the other hand, the first mixed VND heuristic, named Mix-VND1, takes several random points from Locate neighborhood and starting from each of them carries out search applying a basic sequential VND heuristic Seq-VND1. The used Seq-VND1 explores Allocate and Alternate neighborhoods in that order. The second mixed VND heuristic, named Mix-VND2, performs more thoroughly exploration of the solution space than Mix-VND1 applying Seq-VND1 on each point from Locate neighborhood. These three VND variants together with three local searches in the three defined neighborhood structures (Allocate, Alternate, and Locate) are experimentally compared on AP instance with n = 100 and p = 15 (see [21]). One thousand initial points are generated at random in the solution space, and a local search is conducted from each one in each type of neighborhood. The distance-to-target diagram [16] presented at Fig. 13 shows the distribution of local minima. Each point (x, y) plots the distance x and percentage deviation y of the local minimum from the best known solution. The results are summarized in Table 3.

Fig. 13
figure 13

Multistart on 1000 random initial solutions for AP instance with n = 100 and p = 15

Table 3 Comparison of six local searches on AP instance with n = 100 and p = 15

Comparing the results in Table 3, we observe that (i) the search in first (Allocate) and second (Alternate) local neighborhoods is faster than the others, but they have the worse quality; (ii) Seq-VND gives better results than the third local search (Locate) and is considerably faster; (iii) all three multistart VND local searches found the best known solution in the AP instance; and (iv) the mixed VND versions give the best results, although, as expected, solution times are longer than solution time of Seq-VND.

We see in Fig. 13 that the number of local minima is reduced when VND is used. This is due to the larger neighborhood in VND. For example, the Seq-VND neighborhood clearly contains each of the individual neighborhoods, Allocate, Alternate, and Locate. It is remarkable that Mix-VND2, which utilizes the largest neighborhood, yields only four local minima for this problem instance .

Conclusions

Local search represents one of the most popular classical heuristic technique that improves (locally) the current feasible solution of some continuous or discrete optimization problem. For that purposes, usually only one neighborhood structure is defined and explored to improve the incumbent solution. Searching for the better solution in such neighborhood is repeated until there is no better solution, i.e., until the local minimum with respect to that predefined neighborhood is reached.

Since the local minimum with respect to one neighborhood structure is not necessary local in another, one needs to construct deterministic local search in cases when more than one neighborhood is used. Such procedures are known as variable neighborhood descent (VND). In this VND survey, we first propose a possible classification of neighborhood structures in solving optimization problems and also provide some simple examples to clearly illustrate the basic ideas. Then, we discuss possible general ways of combining several neighborhoods in the deterministic fashion. Needless to say that the number of possible combination of VND local search variants is large, and they could include problem specific knowledge in building heuristic for each particular problem. Therefore, VND area is an open avenue for the future research in the area of optimization.

Cross-References