1 Introduction

Any combinatorial optimization problem involves a finite number of feasible solutions and is completely defined by a ground set \({{\mathcal {E}}} = \{1,\ldots ,n\}\), an objective function \(f: 2^{{\mathcal {E}}} \mapsto {{\mathbb {R}}}\), and the set of feasible solutions \({\mathcal {X}}\subseteq 2^{{\mathcal {E}}}\). In case of minimization (resp. maximization), one searches for an optimal solution \(x^*\in {{\mathcal {X}}}\) such that \(f(x^*)\le f(x)\) (resp. \(f(x^*)\ge f(x)\)), \(\forall \; x\in {{\mathcal {X}}}\). As a first example, consider the Traveling Salesman Problem, a classical combinatorial optimization problem defined on an undirected edge-weighted graph \(G=(V,E)\). In this case, the ground set \({{\mathcal {E}}}\) is the set of edges connecting the nodes in V to be visited, \({{\mathcal {X}}}\) is formed by all edge subsets that determine a Hamiltonian cycle, and the objective function value f(x) is the sum of the costs of all edges in solution x. As a further example of how to define a combinatorial optimization problem through the ground set \({\mathcal E}\), the objective function f, and the set of feasible solutions \({\mathcal {X}}\), consider the Maximum Cut Problem, also defined on an undirected edge-weighted graph \(G=(V,E)\). Here, the ground set \({\mathcal {E}}\) is the set of all edges in E, \({\mathcal {X}}\) is the set of all subsets of \({\mathcal {E}}\) made of edges with endpoints in two different node subsets defining a partition of V, and f(x) is the sum of the weights of the edges in solution x.
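To make the triple \(({\mathcal {E}}, {\mathcal {X}}, f)\) concrete, the following sketch (in Python, with names of our own choosing; it is an illustration, not part of the methods discussed in this paper) encodes a problem by its ground set, a feasibility predicate, and an objective, and finds an optimum by exhaustive enumeration. Enumeration is viable only for tiny instances, which is precisely why the heuristics discussed below exist.

```python
from itertools import chain, combinations

def powerset(ground_set):
    """All subsets of the ground set E (2^|E| of them)."""
    s = list(ground_set)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def brute_force_optimum(ground_set, is_feasible, f, maximize=True):
    """Enumerate every feasible x in X and return the optimum."""
    best, best_val = None, None
    for x in powerset(ground_set):
        if not is_feasible(x):
            continue
        v = f(x)
        if best_val is None or (v > best_val if maximize else v < best_val):
            best, best_val = x, v
    return best, best_val

# Toy example: pick the 2-element subset of {1,2,3,4} with maximum sum.
best, val = brute_force_optimum({1, 2, 3, 4}, lambda x: len(x) == 2, sum)
```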

Combinatorial optimization problems arise in several heterogeneous domains [87]; among many others, we recall routing, scheduling, production planning, decision making, and location problems. All these problems have both a theoretical relevance and a practical impact, given their applicability to real-world scenarios [89].

Many combinatorial optimization problems are computationally intractable, in the sense that, to date, no polynomial-time algorithm is known to solve them exactly [45]. In the last decades, several optimum-seeking methods that do not explicitly examine all feasible solutions have been developed, such as Branch & Bound, Cutting Planes, and Dynamic Programming. Nevertheless, most real-world problems are either computationally intractable by their nature, or sufficiently large so as to preclude the use of exact algorithms. In such cases, heuristic methods are usually employed to find good, but not necessarily guaranteed optimal, solutions.

Starting from one of the pioneering papers of Kirkpatrick on Simulated Annealing [69], which appeared in 1984, the most promising heuristic methods concentrate their effort on avoiding entrapment in local optima and on exploiting the basic structure and properties of the problem they solve. Such techniques include Tabu Search [47–50], Genetic Algorithms [54], Variable Neighborhood Search [59, 83], and GRASP (Greedy Randomized Adaptive Search Procedure) [30, 31, 39–42].

A GRASP is a multi-start or iterative process introduced by Feo and Resende [30], following in the spirit of the pioneering idea proposed in 1973 by Lin and Kernighan for the Traveling Salesman Problem [77]. Each GRASP iteration is usually made up of a construction phase, where a feasible solution is constructed, and a local search phase, which starts at the constructed solution and applies iterative improvement until a locally optimal solution is found. Repeated applications of the construction procedure yield diverse starting solutions for the local search, and the best overall locally optimal solution is kept as the result. Since its proposal, GRASP has been applied to solve decision and optimization problems arising in several contexts. Recent areas of application include routing [9, 11, 19], logic [36, 93, 94], covering and partition [6, 30, 92], location [23, 25], network optimization [7, 85, 97], assignment [82, 99], timetabling and scheduling [3–5, 14, 73, 74, 98], graph and map drawing [37, 38, 72, 80, 97], and, very recently, computational biology [28, 32–35, 43, 51, 64].

The aim of this paper is to propose a new variant of the classical GRASP framework that uses a nonmonotone strategy to explore the neighborhood of the current solution. Inspired by an idea proposed in 1986 for Newton's method [55], this strategy controls uphill solutions without using a “tabu list”, but by simply maintaining a set W of a given number of previously computed objective function values. A new solution is accepted if its function value improves on the worst value in W.

The remainder of this paper is organized as follows. In Sect. 2, the main ingredients of a classical GRASP framework are described. In Sect. 3, we illustrate a GRASP for each of three selected hard combinatorial optimization problems, namely the MAX-CUT, the MAX-SAT, and the QAP. Section 4 is devoted to the description and analysis of a nonmonotone variant of GRASP (NM-GRASP). In Sect. 5, we illustrate the effectiveness of our Nonmonotone GRASP by comparing it with the classical GRASP on standard test problems from the literature for the three combinatorial optimization problems described in Sect. 3. The experiments empirically show that the new GRASP framework results in better-quality solutions. Concluding remarks are given in the last section.

2 The classical GRASP

Given a combinatorial optimization problem specified by the ground set \({{\mathcal {E}}}\), the real-valued objective function f, and the finite set of feasible solutions \({\mathcal {X}}\), a classical GRASP metaheuristic is a multi-start or iterative method in which each iteration consists of two phases: construction of a solution and local search. The pseudo-code in Fig. 1 illustrates the main blocks of a GRASP procedure for minimization, in which MaxIterations iterations are performed and Seed is used as the initial seed for the pseudorandom number generator.

Fig. 1 Pseudo-code of a classical GRASP for a minimization problem

The construction phase builds a solution x that may be infeasible (line 3). In this case, feasibility is restored by invoking a repair procedure in line 5. Once a feasible solution x is obtained, its neighborhood is investigated by the local search until a local minimum is found (line 7). The best overall locally optimal solution is kept as the result (line 12).
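The overall flow can be summarized by the following minimal sketch, written in Python rather than in the paper's pseudo-code; construct, repair, local_search, is_feasible, and f are problem-specific placeholders we assume here, and the line references in the comments point to Fig. 1.

```python
import random

def grasp(construct, repair, local_search, is_feasible, f,
          max_iterations, seed):
    """Skeleton of a classical GRASP for minimization (cf. Fig. 1)."""
    rng = random.Random(seed)            # Seed for the pseudorandom generator
    best_x, best_f = None, float("inf")
    for _ in range(max_iterations):
        x = construct(rng)               # greedy randomized construction
        if not is_feasible(x):
            x = repair(x, rng)           # repair procedure (line 5)
        x = local_search(x)              # iterative improvement (line 7)
        fx = f(x)
        if fx < best_f:                  # keep best overall solution (line 12)
            best_x, best_f = x, fx
    return best_x, best_f
```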

Fig. 2 Basic GRASP construction phase pseudo-code

The pseudo-code in Fig. 2 illustrates the main ingredients of a typical GRASP construction phase, which proceeds by applying a greedy, randomized, and adaptive criterion. In the spirit of the pioneering semi-greedy idea proposed by Hart and Shogan in 1987, the construction procedure starts from an empty solution (line 1) and iteratively, one element at a time, builds a complete solution (loop in lines 3–9). At each iteration, the choice of the next element to be added to the partial solution under construction is determined by ordering all candidate elements (i.e., those that can be added to the solution) in a candidate list C with respect to a greedy function \(g: C \rightarrow {{\mathbb {R}}}\) that myopically measures the benefit, in terms of objective function value, of selecting each candidate element. The heuristic is adaptive because the benefits associated with every element are updated at each iteration of the construction phase to reflect the changes brought on by the selection of the previous element (line 8). The probabilistic component of a GRASP construction procedure is characterized by randomly choosing one of the best candidates in the list, but not necessarily the top candidate (line 5). The list of best candidates is called the restricted candidate list (RCL). This choice technique allows different solutions to be obtained at each GRASP iteration, but does not necessarily compromise the power of the adaptive greedy component of the method.
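A generic rendering of this semi-greedy scheme might look as follows (a sketch under our own conventions: candidates, greedy_value, and add are problem-specific callbacks we assume, and alpha controls a value-based RCL threshold of the kind used later for the MAX-CUT; here larger greedy values are assumed to be better, with the inequality reversed for a minimization-style greedy function).

```python
def greedy_randomized_construction(candidates, greedy_value, add, rng,
                                   alpha=0.5):
    """Value-based semi-greedy construction: start from an empty solution
    and repeatedly pick a random element from the restricted candidate
    list (RCL) of near-best candidates."""
    solution = []
    C = candidates(solution)                     # initial candidate list
    while C:
        g = {e: greedy_value(e, solution) for e in C}
        g_min, g_max = min(g.values()), max(g.values())
        mu = g_min + alpha * (g_max - g_min)     # cut-off value
        rcl = [e for e in C if g[e] >= mu]       # near-best candidates
        e = rng.choice(rcl)                      # probabilistic selection
        add(solution, e)
        C = candidates(solution)                 # adaptive update
    return solution
```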

Fig. 3 Pseudo-code of a generic local search procedure

Once a feasible solution x is obtained, its neighborhood is investigated by the local search until a local minimum is found. Figure 3 illustrates the pseudo-code of a generic local search procedure for a minimization problem.

As with any local search algorithm, a typical GRASP local search procedure requires the definition of a proper neighborhood structure N for the specific problem to be solved. The neighborhood structure N relates a solution x of the problem to a subset of solutions N(x) “close” to x; a solution x is said to be locally optimal with respect to N(x) if within N(x) there is no better solution in terms of objective function value.

Once a suitable neighborhood N(x) of the current solution x has been defined and computed (line 1), a GRASP local search works in an iterative fashion by successively replacing the current solution x with a better solution in the neighborhood N(x). It terminates when no better solution is found in the neighborhood, i.e., when a local minimum is found and returned to the main algorithm.
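In code, such a monotone descent can be sketched as follows (assuming a neighbors callback that enumerates N(x); a best-improvement variant would scan the whole neighborhood before moving).

```python
def local_search(x, f, neighbors):
    """Generic monotone local search for minimization (cf. Fig. 3):
    move to an improving neighbor until none exists."""
    improved = True
    while improved:
        improved = False
        for y in neighbors(x):          # explore N(x)
            if f(y) < f(x):             # strictly improving move
                x = y
                improved = True
                break                   # first-improvement strategy
    return x                            # x is a local minimum w.r.t. N
```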

It can be easily seen that the key to success for a local search algorithm consists of the suitable choice of a neighborhood structure, efficient neighborhood search techniques, and a good starting solution.

3 A GRASP for the MAX-CUT, the MAX-SAT, and the QAP

In this section, we briefly illustrate the main features of state-of-the-art classical GRASPs proposed for three classical hard combinatorial optimization problems: the MAX-CUT, the MAX-SAT, and the QAP.

3.1 MAX-CUT

Given an undirected graph \(G=(V,E)\), where \(V=\{1,\dots ,n\}\) is the set of vertices and E is the set of edges, and weights \(w_{ij}\) associated with the edges \((i,j)\in E\), the MAX-CUT consists in finding a partition \((S,{\bar{S}})\) of V such that the weight of the cut induced by \((S,{\bar{S}})\), defined as

$$\begin{aligned} w(S,\bar{S})=\displaystyle {\sum _{i\in S,j\in \bar{S}} w_{ij}} \end{aligned}$$

is maximized.

The problem can be formulated as the following integer quadratic program:

$$\begin{aligned} \begin{array}{llll} \max &{} \quad \displaystyle {{1\over 2}\sum _{1 \le i<j \le n} w_{ij}(1-y_iy_j)} \\ \text{ s.t. } \\ &{} \quad y_i\in \{-1,1\}, \quad \forall \; i\in V. \end{array} \end{aligned}$$

Each set \(S=\{i \in V: y_i=1\}\) induces a cut \((S,{\bar{S}})\) with weight

$$\begin{aligned} w(S,{\bar{S}})=\displaystyle {{1\over 2}\sum _{1 \le i<j \le n} w_{ij}(1-y_iy_j)}. \end{aligned}$$

In spite of the very simple statement of this well-known combinatorial optimization problem, its decision version was proved to be NP-complete by Karp already in 1972 [68]. In 1991, it was shown that MAX-CUT is APX-complete [86], meaning that, unless P=NP, it does not admit a polynomial-time approximation scheme [108]. Polynomially solvable cases include planar graphs [58], weakly bipartite graphs with nonnegative weights [57], and graphs without \(K_5\) minors [10].

Given the intractability of the problem, many researchers have devoted their effort both to studying the computational nature of the problem more deeply and to designing good approximate and heuristic solution techniques (see, e.g., [13, 22]). Along this research line, the idea that the MAX-CUT can be naturally relaxed to a semidefinite programming problem was first observed by Lovász [79] and Shor [103]. Goemans and Williamson [53] proposed a randomized algorithm that uses semidefinite programming to achieve a performance guarantee of 0.87856 if the weights are nonnegative. Since then, many approximation algorithms for NP-hard problems have been devised using SDP relaxations [56, 66, 91].

In the following, we describe the classical GRASP proposed by Festa et al. in 2002 [38]. As any GRASP, it proceeds in iterations. At each iteration, a greedy randomized adaptive solution is built and used as the starting point of a local search procedure. The best overall locally optimal solution is returned as an approximation of the global optimum.

The construction procedure uses an adaptive greedy function, a construction mechanism for the restricted candidate list, and a probabilistic selection criterion. In the case of the MAX-CUT, it is intuitive to relate the greedy function to the sum of the weights of the edges in each cut. More formally, let \((S,{\bar{S}})\) be a cut. Then, for each vertex \(v \not \in S \cup \bar{S}\), the following two quantities are computed:

$$\begin{aligned} \displaystyle {\sigma _{S}(v)=\sum _{u\in S}w_{vu}}\quad \text{ and }\quad \displaystyle {\sigma _{{\bar{S}}}(v)=\sum _{u\in \bar{S}}w_{vu}}. \end{aligned}$$

The greedy function, \(g(v) = \max \{ \sigma _{S}(v), \sigma _{\bar{S}}(v) \}\), measures how much additional weight will result from the assignment of vertex v to S or \(\bar{S}\). The greedy choice consists in selecting the vertex v with the highest greedy function value. If \(\sigma _{S}(v)>\sigma _{{\bar{S}}}(v)\), then v is placed in \({\bar{S}}\); otherwise it is placed in S. To define the construction mechanism for the restricted candidate list (RCL), let

$$\begin{aligned} w_{\min }= \min \left\{ \min _{v\in V'}\sigma _S(v), \; \min _{v\in V'}\sigma _{{\bar{S}}}(v)\right\} \end{aligned}$$

and

$$\begin{aligned} w_{\max }= \max \left\{ \max _{v\in V'}\sigma _S(v), \; \max _{v\in V'}\sigma _{{\bar{S}}}(v)\right\} = \max _{v\in V'} \{ g(v) \}, \end{aligned}$$

where \(V'=V{\setminus } (S\cup {\bar{S}})\) is the set of vertices that are still candidate elements, i.e., not yet assigned to either subset S or subset \({\bar{S}}\). Denoting by \(\mu =w_{\min }+\alpha \cdot (w_{\max }-w_{\min })\) the cut-off value, where \(\alpha \in [0,1]\), the RCL is made up of all vertices whose greedy function value is greater than or equal to \(\mu \). A vertex is randomly selected from the RCL.
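A sketch of this construction phase follows, under the assumption (ours, not the paper's) that the graph is stored as a symmetric weight dictionary w[u][v].

```python
def maxcut_construction(V, w, rng, alpha=0.5):
    """Greedy randomized construction for MAX-CUT: vertices enter S or
    S_bar one at a time, guided by sigma_S / sigma_Sbar."""
    S, S_bar = set(), set()
    unassigned = set(V)
    while unassigned:
        sigma_S = {v: sum(w[v].get(u, 0) for u in S) for v in unassigned}
        sigma_B = {v: sum(w[v].get(u, 0) for u in S_bar) for v in unassigned}
        g = {v: max(sigma_S[v], sigma_B[v]) for v in unassigned}
        w_min = min(min(sigma_S.values()), min(sigma_B.values()))
        w_max = max(g.values())
        mu = w_min + alpha * (w_max - w_min)        # cut-off value
        rcl = [v for v in unassigned if g[v] >= mu]
        v = rng.choice(rcl)
        # place v on the side OPPOSITE its heavier bundle of edges,
        # so that those edges are cut
        (S_bar if sigma_S[v] > sigma_B[v] else S).add(v)
        unassigned.remove(v)
    return S, S_bar
```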

Once a greedy, randomized, and adaptive solution x is built, the local search procedure is invoked. Given the current solution x, it implements an elementary move that consists in moving a vertex from one subset of the cut to the other. More formally, let \((S,{\bar{S}})\) be the current solution. To each vertex \(v \in V\) we associate either the neighbor \((S {\setminus } \{v\}, \bar{S}\cup \{v\})\) if \(v \in S\), or the neighbor \((S \cup \{v\}, {\bar{S}} {\setminus } \{v\})\) otherwise. The value

$$\begin{aligned} \delta (v) = \left\{ \begin{array}{ll} \sigma _{S}(v) - \sigma _{{\bar{S}}}(v), &{} \quad \hbox {if }\;v \in S;\\ \sigma _{{\bar{S}}}(v) - \sigma _{S}(v), &{} \quad \hbox {if }\;v \in \bar{S} \end{array} \right. \end{aligned}$$

represents the change in the objective function associated with moving vertex v from one subset of the cut to the other.

In [38], all possible moves are investigated and the acceptance criterion follows a monotone strategy, i.e., the current solution is replaced by its best improving neighbor, and the search stops when all possible moves have been evaluated and no improving neighbor has been found.
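The corresponding monotone local search, with the same graph representation as above, can be sketched as follows.

```python
def maxcut_local_search(S, S_bar, w):
    """Best-improvement local search for MAX-CUT: repeatedly move the
    vertex with the largest positive gain delta(v) to the other side."""
    def delta(v):
        sig_S = sum(w[v].get(u, 0) for u in S)
        sig_B = sum(w[v].get(u, 0) for u in S_bar)
        return sig_S - sig_B if v in S else sig_B - sig_S
    while True:
        gains = {v: delta(v) for v in S | S_bar}
        v_best = max(gains, key=gains.get)
        if gains[v_best] <= 0:           # no improving move: local optimum
            return S, S_bar
        if v_best in S:
            S.remove(v_best); S_bar.add(v_best)
        else:
            S_bar.remove(v_best); S.add(v_best)
```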

3.2 MAX-SAT

A propositional formula \({\varPhi }\) on a set of n Boolean variables \(V=\{x_{1},\ldots ,x_{n}\}\) in conjunctive normal form (CNF) is the conjunction of a set of m clauses \({\mathbb {C}} =\{C_{1},\ldots ,C_{m}\}\). Each clause \(C_{i}\) is a disjunction of \(|C_{i}|\) literals, where each literal \(l_{ij}\) is either a variable \(x_{j}\) or its negation \(\lnot x_{j}\). \({\varPhi }\) can formally be written as

$$\begin{aligned} {\varPhi } = \bigwedge _{i=1}^{m} C_{i} = \bigwedge _{i=1}^{m}\left( \bigvee _{j=1}^{|C_{i}|} l_{ij} \right) . \end{aligned}$$

A clause is satisfied if at least one of its literals evaluates to 1 (true), which means that either an unnegated Boolean variable has the value 1 or a negated variable has the value 0. \({\varPhi }\) is said to be satisfied if all of its clauses are satisfied. In the satisfiability problem (SAT), one must decide whether there exists an assignment of values to the variables such that a given propositional formula is satisfied. SAT was the first problem shown to be NP-complete [24]. The MAX-SAT is a generalization of SAT where, given a propositional formula, one is interested in finding an assignment of values to the variables that maximizes the number of satisfied clauses. Generalizing even further, if we introduce a positive weight \(w_{i}\) for each clause \(C_{i}\), then the weighted MAX-SAT consists of finding an assignment of values to the variables such that the sum of the weights of the satisfied clauses is maximized. The MAX-SAT has many applications, both theoretical and practical, in areas such as complexity theory, combinatorial optimization, and artificial intelligence [12]. It is an intractable problem in the sense that no polynomial-time algorithm exists for solving it unless \(\mathrm {P}=\mathrm {NP}\), which is evident since it generalizes the satisfiability problem [45].

The first approximation algorithms for the MAX-SAT were introduced in [65], where Johnson presented two algorithms with performance ratios \((k-1)/k\) and \((2^{k}-1)/2^{k}\), where k is the least number of literals in any clause. For the general case \(k=1\), they both translate to a 1/2-approximation algorithm, while it has been shown in [20] that the second algorithm is in fact a 2/3-approximation algorithm. A 3/4-approximation algorithm, based on network flow theory, was presented by Yannakakis in [110] and also in [52] by Goemans and Williamson. Currently, one of the best deterministic polynomial-time approximation algorithms for the MAX-SAT achieves a performance ratio of 0.758 and is based on semidefinite programming [53], while there is also a randomized algorithm with performance ratio 0.77 [8]. Better approximation bounds for special cases of the problem, in which, for instance, we restrict the number of literals per clause or impose the condition that the clauses are satisfiable, have also been found [29, 67, 107]. With respect to inapproximability results, it is known [60] that, unless \(\mathrm {P}=\mathrm {NP}\), there is no approximation algorithm with performance ratio greater than 7/8 for the MAX-SAT in which every clause contains exactly three literals, which limits the general case as well. In 1997, a GRASP was proposed to heuristically solve the problem [95]. In [96], a complete Fortran implementation of the algorithm is given, along with extensive computational runs. In the following, we provide a brief description of the main ingredients of the classical GRASP for the MAX-SAT.

Given a set of clauses \({\mathbb {C}}\) and a set of Boolean variables V, let \({\mathbf {x}}\in \{0,1\}^{n}\) be a truth assignment (i.e., the vector of truth values assigned to the variables) and let \(c({\mathbf {x}})\) be the sum of the weights of the clauses satisfied under \({\mathbf {x}}\). Without loss of generality, all the weights \(w_{i}\) of the clauses are assumed to be positive integers. Given any two truth assignments \(\mathbf {x,y}\in \{0,1\}^{n}\), let \({\varDelta }(\mathbf {x,y})\) denote their difference set, i.e.,

$$\begin{aligned} {\varDelta }(\mathbf {x,y}) := \{ i : x_{i}\ne y_{i}, i=1,\ldots ,n \} \end{aligned}$$
(1)

and their distance

$$\begin{aligned} d(\mathbf {x,y}) := | {\varDelta }(\mathbf {x,y}) | = \sum _{i=1}^{n} |x_{i}-y_{i}|, \end{aligned}$$
(2)

which is the Hamming distance, and will be used as a measure of proximity between two solutions. As detailed in [95], in the construction phase of the algorithm a solution is built one element at a time in a greedy randomized fashion, by maintaining an RCL throughout the procedure, which contains elements that correspond to assignments of yet-unassigned variables to either 1 (true) or 0 (false). Choosing an element to be added to a partial solution from the RCL corresponds to setting the respective truth value of the given variable. Given any partial solution, which corresponds to a set of satisfied clauses, the next element to be added to the solution is chosen taking into account the total weight of the unsatisfied clauses that would become satisfied by the candidate assignment. More formally, let \(N = \{1,2,\ldots ,n\}\) and \(M = \{1,2,\ldots ,m\}\) be sets of indices for the set of variables and clauses, respectively. Moreover, for \(i \in N\), let \({\varGamma }_i^+\) be the set of unsatisfied clauses that would become satisfied if variable \(x_i\) were set to true, and \({\varGamma }_i^-\) be the set of unsatisfied clauses that would become satisfied if variable \(x_i\) were set to false. Let \(\gamma _{i}^{+}\) and \(\gamma _{i}^{-}\) be the gains in the objective function value if we set the unassigned variable \(x_{i}\) to 1 and 0, respectively. Formally, they are defined as follows:

$$\begin{aligned} \gamma _i^+ = \sum _{j\in {\varGamma }_i^+} w_j \quad \text{ and } \quad \gamma _i^- = \sum _{j \in {\varGamma }_i^-} w_j. \end{aligned}$$

If \(X\subseteq V\) is the set of already assigned variables, the best gain \(\gamma ^{*}\) is computed as

$$\begin{aligned} \gamma ^{*} := \max \{\gamma _{j}^{+},\gamma _{j}^{-} \, : \, x_{j}\in V{\setminus } X\} \end{aligned}$$

and the RCL keeps only those assignments with \(\gamma _{j}^{+}\) or \(\gamma _{j}^{-}\) greater than or equal to \(\alpha \cdot \gamma ^{*}\), where \(0\le \alpha \le 1\) is a parameter. A random choice from the RCL corresponds to a new assignment \(x_{s}=1\) (\(x_{s}=0\)), which is added to the partial solution, \(X=X\cup \{x_{s}\}\). After each such addition to the partial solution, \({\varGamma }^+_i\), \({\varGamma }^-_i\), \(\gamma _{i}^{+}\), and \(\gamma _{i}^{-}\) are updated accordingly, in an adaptive fashion. The process is repeated until \(|X| = n\).
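A sketch of the gain computation driving this construction follows, assuming (as a representation of our choosing, not the paper's) that clauses are given as lists of signed integers in DIMACS style, where literal +j stands for \(x_j\) and -j for \(\lnot x_j\), with weights w[i]. The RCL is then formed from the assignments whose gain reaches \(\alpha \cdot \gamma ^{*}\), and one is drawn at random.

```python
def maxsat_gains(clauses, w, assignment):
    """Compute gamma^+ / gamma^- for every unassigned variable: the
    total weight of not-yet-satisfied clauses that the assignment
    x_j = 1 (resp. x_j = 0) would newly satisfy."""
    gamma_pos, gamma_neg = {}, {}
    for i, clause in enumerate(clauses):
        # skip clauses already satisfied by the partial assignment
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue
        for l in clause:
            j = abs(l)
            if j in assignment:          # variable already fixed
                continue
            if l > 0:
                gamma_pos[j] = gamma_pos.get(j, 0) + w[i]
            else:
                gamma_neg[j] = gamma_neg.get(j, 0) + w[i]
    return gamma_pos, gamma_neg
```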

Having completed a truth assignment \({\mathbf {x}}\), a local search is applied in order to guarantee local optimality. The local search uses the 1-flip neighborhood, defined as

$$\begin{aligned} N_{1}({\mathbf {x}}) := \{ {\mathbf {y}}\in \{0,1\}^{n}: d(\mathbf {x,y}) \le 1\}. \end{aligned}$$
(3)

If \(w({\mathbf {x}})\) is the total weight of the satisfied clauses for the truth assignment \({\mathbf {x}}\), then \({\mathbf {x}}\) is a local maximum if and only if \(w({\mathbf {x}}) \ge w({\mathbf {y}})\) for all \({\mathbf {y}}\in N_{1}({\mathbf {x}})\).
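The 1-flip local search can then be sketched as follows (same clause representation as above; for clarity the total weight is recomputed from scratch at every flip, whereas an efficient implementation would update it incrementally).

```python
def maxsat_local_search(x, clauses, w):
    """1-flip local search: accept any flip that increases the total
    weight of satisfied clauses; stop at a local maximum."""
    def weight(assign):
        return sum(w[i] for i, c in enumerate(clauses)
                   if any(assign[abs(l)] == (l > 0) for l in c))
    current = weight(x)                 # x maps each variable to a bool
    improved = True
    while improved:
        improved = False
        for j in list(x):
            x[j] = not x[j]             # flip variable j
            new = weight(x)
            if new > current:
                current = new
                improved = True
            else:
                x[j] = not x[j]         # undo non-improving flip
    return x
```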

3.3 QAP

Given a set \(N = \{1, 2, \ldots , n\}\), the set \({\varPi }_N\) of all permutations of the elements in N, and two \(n\times n\) matrices F and D such that \(f_{ij}, d_{ij}\in {{\mathbb {R}}}^+\) for \(i,j\in \{1, 2, \ldots , n\}\), the QAP aims at finding a permutation \(\pi ^*\in {\varPi }_N\) such that

$$\begin{aligned} \pi ^*=\mathop {\text {arg min}}_{\pi \in {\varPi }_N}\; \displaystyle {\sum _{i=1}^n \sum _{j=1}^n f_{ij}\cdot d_{\pi (i)\pi (j)}}. \end{aligned}$$

The QAP was first proposed in 1957 by Koopmans and Beckmann [70] while studying the plant location problem. In the location theory context, one is given a set \({{\mathcal {F}}}=\{f_1,\ldots ,f_n\}\) of n facilities and a set N of n locations. Matrices F and D represent the flow matrix and the distance matrix, respectively, and the objective is to determine to which location each facility must be assigned so as to minimize the total cost of the assignment. Since its first formulation, the QAP has appeared in several practical applications, including economics [61, 62], scheduling [46], and numerical analysis [15]. Recent surveys on the QAP are given in [78] and in [26], besides two comprehensive books [17, 90].

The QAP is one of the most difficult combinatorial optimization problems. In 1976, Sahni and Gonzalez [101] showed that it is NP-hard and that, unless P \(=\) NP, it is not possible to find an \(\epsilon \)-approximation algorithm for any constant \(\epsilon \); this result holds even under the hypothesis that F and D are symmetric coefficient matrices. Due to its high computational complexity, in order to find good-quality solutions for the QAP in reasonable running times, Li et al. [76] proposed a GRASP, whose Fortran implementation is described in [88].

In the GRASP for the QAP, the construction phase consists of two stages. In the first stage, only two assignments are produced. After sorting the inter-site distances in increasing order and the inter-facility flows in decreasing order, the idea in this first stage is to assign facilities with high interaction (i.e., high \(f_{ij}\) values) to nearby sites (i.e., sites with low \(d_{kl}\) values). Accordingly, the pairs of assignments having the smallest \(d_{kl}\cdot f_{ij}\) products are inserted in the RCL, a pair is selected at random, and the corresponding assignments are established. The remaining \(n-2\) facility-site assignments are then made sequentially in the second stage. The idea now is to favor assignments that have a small interaction cost with the set of previously made assignments. To do this, at each iteration, the procedure keeps all costs of unassigned facility-site pairs sorted in increasing order. More specifically, let \({\varGamma } = \{(i_1, k_1),\ldots ,(i_q, k_q)\}\) be the set of q assignments at a certain iteration of the construction phase. Then, the cost \(c_{jl}\) of assigning facility j to site l, with respect to the already-made q assignments, is computed as follows:

$$\begin{aligned} c_{jl}=\displaystyle {\sum _{(i,k)\in {\varGamma }} d_{kl}\cdot f_{ij}}. \end{aligned}$$

The \(\alpha \cdot \vert {\varGamma }\vert \) pairs with the least costs, \(\alpha \in (0,1]\), are inserted in the RCL, and one of them is selected at random and added to \({\varGamma }\). The procedure is repeated until \(n-1\) assignments are made. The remaining facility is then assigned to the remaining site. In the local search phase, a simple 2-exchange neighborhood structure is defined, and the local improvement procedure considers all possible 2-swaps of facility-location pairs. If a swap improves the cost of the assignment, it is accepted. The procedure continues until no swap improves the solution value.
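A sketch of the assignment cost and of the 2-exchange local search follows, assuming F and D are given as lists of lists and pi[i] is the site assigned to facility i (our representation; the swap gain could be computed in O(n) with standard delta formulas, which we omit for brevity).

```python
def qap_cost(F, D, pi):
    """Total assignment cost: sum of f_ij * d_{pi(i)pi(j)}."""
    n = len(pi)
    return sum(F[i][j] * D[pi[i]][pi[j]] for i in range(n) for j in range(n))

def qap_2exchange(F, D, pi):
    """Monotone 2-exchange local search: swap the sites of two
    facilities whenever the swap strictly improves the cost."""
    best = qap_cost(F, D, pi)
    n = len(pi)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            for j in range(i + 1, n):
                pi[i], pi[j] = pi[j], pi[i]        # tentative swap
                c = qap_cost(F, D, pi)
                if c < best:
                    best, improved = c, True       # keep improving swap
                else:
                    pi[i], pi[j] = pi[j], pi[i]    # undo
    return pi, best
```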

4 A nonmonotone GRASP

To find approximate solutions of hard combinatorial optimization problems, we propose a Nonmonotone GRASP (NM-GRASP). The main difference between NM-GRASP and a classical GRASP is the use of a nonmonotone local search strategy, based on the ideas described in [55].

The pseudo-code in Fig. 4 illustrates how the nonmonotone local search works for a minimization problem. As in the classical GRASP local search, the first step of the nonmonotone local search is the computation of a suitable neighborhood N(x) of the current solution x (line 1). The search is then carried out by successively replacing the current solution x with a solution \(y\in N(x)\) that improves a given reference objective function value \(\bar{f}\), rather than the best value obtained so far. Hence, we require \(f(y) < {\bar{f}}\), which clearly allows for uphill steps, thus giving rise to a nonmonotone local search. In order to avoid cycling of the local search routine, the reference value must be updated according to a rigorous criterion. To perform such an update, the routine employs a queue W of maximum size M containing the most recently accepted function values. The queue is managed according to a first-in-first-out policy by means of the following two operations: push(f, W), which inserts into W a new function value f, and pop(W), which drops from W the least recently inserted function value.
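A sketch of the procedure follows, with W realized as a bounded FIFO queue (Python's collections.deque handles push and pop automatically via maxlen). The rule for selecting within the improving set H is left open in the pseudo-code, so taking the first improving neighbor here is just one admissible choice.

```python
from collections import deque

def nonmonotone_local_search(x, f, neighbors, M=10):
    """Nonmonotone local search for minimization (cf. Fig. 4): accept any
    neighbor improving the reference value f_bar, the maximum of the
    last M accepted objective values kept in the FIFO queue W."""
    W = deque(maxlen=M)                 # bounded queue: push/pop via maxlen
    W.append(f(x))
    f_bar = max(W)                      # reference value
    while True:
        H = [y for y in neighbors(x) if f(y) < f_bar]
        if not H:                       # condition (4): x is locally optimal
            return x
        x = H[0]                        # Select(H); any rule is admissible
        W.append(f(x))                  # oldest value dropped automatically
        f_bar = max(W)                  # update reference value
```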

Inspired by an idea proposed in 1986 for Newton's method [55], and unlike Tabu Search [47, 48, 50], the nonmonotone local search controls uphill solutions without using a “tabu list” of already generated solutions, but by simply maintaining a set W of a given number of previously computed objective function values; a new solution is accepted if its function value improves on the worst value in W. Furthermore, the local search of a Tabu Search approach always moves towards a non-prohibited solution that strictly improves the current solution in terms of objective function value. The nonmonotone local search, instead, accepts and moves towards a solution that does not necessarily improve the best objective function value obtained so far. In this respect, the nonmonotone local search resembles classical Simulated Annealing [69]. The difference between the nonmonotone local search and Simulated Annealing lies in the strategy used to implement the idea of performing uphill steps. Simulated Annealing uses a probabilistic acceptance criterion, while in our nonmonotone local search a solution is accepted if it improves a given reference objective function value \({\bar{f}}\) instead of the best value obtained so far.

Fig. 4 Pseudo-code of the nonmonotone local search procedure

Looking at the pseudo-code in Fig. 4, procedure NonmonotoneLocalSearch successively replaces the current solution x with a new one belonging to the set H of improving solutions with respect to the given reference value \({\bar{f}}\), and (possibly) updates the reference value \({\bar{f}}\) itself. The search terminates when there is no solution in the neighborhood that improves \({\bar{f}}\). More precisely, the nonmonotone local search terminates at x if

$$\begin{aligned} f(y)\ge {\bar{f}} \ge f(x), \quad \forall \; y\in N(x). \end{aligned}$$
(4)

Note that condition (4) implies that x is locally optimal with respect to N(x).

In order to show that NonmonotoneLocalSearch cannot cycle indefinitely between lines 4 and 10 of the while-cycle, we need to explicitly define the sequences of points and function values generated by the procedure. To this aim, let us denote by \(x^0\) the starting solution of the local search, and by \(x^k\) and \(\bar{f}^k\) the solution and the reference value at the beginning of each iteration of the while-cycle, respectively. Moreover, let

$$\begin{aligned} H^k = \{y\in N(x^k): f(y) < {\bar{f}}^k\}. \end{aligned}$$

Consequently, line 5 of the while-cycle can be written as

$$\begin{aligned} x^{k+1} := \text{ Select }(H^k); \end{aligned}$$

We remark that the updating of the reference value performed by the procedure is such that \({\bar{f}}^k\) can formally be written as

$$\begin{aligned} {\bar{f}}^k = \max _{0 \le i \le \min \{k,\ M\}} f(x^{k-i}), \end{aligned}$$
(5)

where \(M\in {{\mathbb {N}}}^+\) is fixed.

We can now formally introduce the sequences \(\{x^k\}\), \(\{f(x^k)\}\), and \(\{{\bar{f}}^k\}\) generated by NonmonotoneLocalSearch. Note that, even though

$$\begin{aligned} f(x^{k+1}) < {\bar{f}}^k, \quad \text{ for } \text{ every } \text{ index } \; k, \end{aligned}$$

the sequence \(\{f(x^k)\}\) can still be nonmonotone.

Proposition 1

Let \(\{x^k\}\) be the sequence of solutions generated by the nonmonotone local search, and \(\{{\bar{f}}^k\}\) be the sequence of reference values defined as in (5).

Then, NonmonotoneLocalSearch cannot cycle and produces a local minimum point.

Proof

First, we observe that the sequence \(\{{\bar{f}}^k\}\) is bounded from below, since the number of solutions in the feasible set \({\mathcal {X}}\) is finite.

Moreover, at each iteration k, we have that

$$\begin{aligned} f(x^{k+1})<{\bar{f}}^k. \end{aligned}$$
(6)

From (5) and (6), we can write:

$$\begin{aligned} {\bar{f}}^{k+1}\le {\bar{f}}^k. \end{aligned}$$
(7)

Then, recalling that \(|{{\mathcal {X}}}| < \infty \), we can define

$$\begin{aligned} 0 < \delta = \min _{x,y\in {{\mathcal {X}}}}\Big \{|f(x)-f(y)|: f(x)\ne f(y)\Big \}, \end{aligned}$$

so that we have

$$\begin{aligned} {\bar{f}}^{k+M}<{\bar{f}}^k - \delta . \end{aligned}$$
(8)

By contradiction, let us assume now that the procedure does not terminate and that a solution \({\tilde{x}}\) (which is not a local minimum) is generated an infinite number of times. By (7) and (8), there exists an iteration \({\tilde{k}}\) such that

$$\begin{aligned} {\bar{f}}^{{\tilde{k}}} \le f({\tilde{x}}). \end{aligned}$$

Furthermore, as \({\tilde{x}}\) is generated an infinite number of times, there exists an iteration \({\hat{k}}\ge {\tilde{k}}\) such that

$$\begin{aligned} f({\tilde{x}}) < {\bar{f}}^{{\hat{k}}}. \end{aligned}$$

Hence, we have

$$\begin{aligned} f({\tilde{x}})<{\bar{f}}^{{\hat{k}}}\le {\bar{f}}^{{\tilde{k}}} \le f({\tilde{x}}), \end{aligned}$$

which is a contradiction; hence, the local search procedure cannot cycle. Moreover, the point produced at termination is such that \(|H| = 0\), and therefore we have

$$\begin{aligned} f(x) \le {\bar{f}} \le f(y), \quad \text{ for } \text{ all } \; y\in N(x), \end{aligned}$$

which implies that x is a local minimum. \(\square \)

For the three combinatorial optimization problems considered in Sect. 3, namely the MAX-CUT, the MAX-SAT, and the QAP, we have designed a Nonmonotone GRASP (NM-GRASP) that applies the procedure

$$\begin{aligned} {\mathtt{NonmonotoneLocalSearch}}(x, f(\cdot )). \end{aligned}$$

NonmonotoneLocalSearch is based on the same neighborhood structures as the classical GRASPs. In the case of maximization problems (as for the MAX-CUT and the MAX-SAT), the sign \(>\) is used in lines 3 and 9. Furthermore, \({\bar{f}}\) is updated as the minimum among the objective function values in W, i.e., on line 8 we have \({\bar{f}} := \min \{f \in W\}\).

NonmonotoneLocalSearch accepts a move if the resulting objective function value improves on \({\bar{f}}\). The current solution is then successively replaced, and the search stops when all possible moves have been evaluated and no neighbor improving \({\bar{f}}\) has been found.

5 Computational results

In this section, we present our computational experience on the MAX-CUT, the MAX-SAT, and the QAP. In order to compare the performance of the classical GRASP and the NM-GRASP, we tested the two heuristics on benchmark test problems from the literature. The instances used for the tests on the MAX-CUT and the MAX-SAT are downloadable from Mauricio G.C. Resende's webpage at http://mauricio.resende.info/data/index.html. The instances used for the tests on the QAP are downloadable from http://anjos.mgi.polymtl.ca/qaplib/inst.html#BO.

As for the MAX-CUT, the following problem instances were used:

g problems These test problems were used by Fujisawa et al. in [44]. They consist of sparse graphs whose number of nodes varies from 10 to 1250.

sg3dl problems Proposed by Burer, Monteiro, and Zhang [16], these instances correspond to cubic lattice graphs modeling Ising spin glasses. The graphs vary in sparsity and size: the larger the number of nodes (from 1000 to about 3000), the lower the density (from 0.60 to 0.22 %).

torus problems This is another set of instances from the Ising model of spin glasses. The complete problem library of the 7th DIMACS Implementation Challenge is downloadable as a gzip-compressed tar file from http://dimacs.rutgers.edu/Challenges/Seventh/Instances/.

B problems The graphs in this set of instances are sparse and vary in size from 5000 to 8000 nodes.

G problems These test problems were created by Helmberg and Rendl [63]. They consist of toroidal, planar, and randomly generated graphs of varying sparsity and sizes. These graphs vary in size from 800 to 3000 nodes and in density from 0.17 to 6.12 %.

As for the MAX-SAT, the instances have been generated from the jnh SAT problems class of the 2nd DIMACS Implementation Challenge by randomly generating clause weights uniformly between 1 and 1000. In these instances, the number of variables is 100, while the number of clauses ranges from 800 to 900.

For the QAP, benchmark problem instances have been proposed by Burkard et al. [18] and are known as QAPLIB - A Quadratic Assignment Problem Library:

chr problems These test problems were used by Christofides and Benavent in [21]. They are characterized by an \(n\times n\) adjacency matrix of a weighted tree and an \(n\times n\) adjacency matrix of a complete graph, with n varying from 12 to 25.

els19 problem In this instance, the data describe the distances of \(n=19\) different facilities of a hospital and the flow of patients between those locations. It has been used by Mautor in [81].

esc problems These test problems were used by Eschermann and Wunderlich in [27] in a computer science application aimed at minimizing the amount of additional hardware needed for the testing of self-testable sequential circuits. In these instances, n varies from 16 to 128.

kra problems These are real-world instances used to plan the Klinikum Regensburg in Germany and described by Krarup and Pruzan in [71]. Here, \(n=30\).

lipa problems These test problems were randomly generated by Li and Pardalos [75]. They are asymmetric instances with known optimal solutions and n ranging from 20 to 90.

nug problems These test problems were used by Nugent et al. in [84]. They are characterized by a distance matrix containing the Manhattan distances of rectangular grids. The size n ranges from 14 to 30.

rou problems These instances were used by Roucairol in [100]. The entries of the matrices are randomly generated between 1 and 100, and \(n\in \{12,15,20\}\).

scr, tho, and wil problems In all these instances, the distance matrices are those of rectangular grids; only the size changes. In the scr problems, \(n\in \{12,15,20\}\), and they were used by Scriabin and Vergin in [102]. In the tho problems, \(n\in \{30,40,150\}\), and they were used by Thonemann and Bölte in [106]. Finally, in the wil problems, \(n\in \{50,100\}\), and they were used by Wilhelm and Ward in [109].

sko problems In these instances, the distance matrices are those of rectangular grids, the entries of the flow matrix are pseudorandom numbers, and n ranges from 42 to 100. They were used by Skorin-Kapov in [104].

ste problems These instances refer to the backboard wiring problem and have size \(n=36\). In total, they constitute a set of three instances characterized by data representing Manhattan, squared Euclidean, and Euclidean distances. They were used by Steinberg in [105].

We performed ten random runs for each instance considered. For each run, with a time limit of one hour, we stored the solution found with the best objective function value, the CPU time, and the iteration at which such a solution was found.

The experiments were performed on an Intel Xeon E5-2670 processor, running at 2.60 GHz with 64 GB of RAM. All runs were done using a single processor. All codes were written in Fortran 77 and compiled with the gfortran compiler.

As for the fine-tuning of the parameter M used in the nonmonotone local search, the best value resulting from our experiments was 10 for the MAX-CUT and QAP instances and 5 for the MAX-SAT instances.

Fig. 5 Performance comparison between NM-GRASP (dashed line) and the original GRASP (dash-dot line) on the MAX-CUT problems (worst-left; average-center; best-right)

Fig. 6 Performance comparison between NM-GRASP (dashed line) and the original GRASP (dash-dot line) on the MAX-SAT problems (worst-left; average-center; best-right)

Fig. 7 Performance comparison between NM-GRASP (dashed line) and the original GRASP (dash-dot line) on the QAP problems (worst-left; average-center; best-right)

Figures 5, 6, and 7 plot the performance of the NM-GRASP and the classical GRASP in terms of objective function value for the MAX-CUT, the MAX-SAT, and the QAP, respectively. The dashed line gives on the y-axis the percentage of instances in which the absolute improvement in terms of objective function value of NM-GRASP with respect to GRASP is greater than or equal to the value given on the x-axis. Likewise, the dash-dot line gives on the y-axis the percentage of instances in which the absolute improvement of GRASP with respect to NM-GRASP is greater than or equal to the value given on the x-axis. More specifically, let \(f^{NM}_i\) and \(f^{OR}_i\) be the objective function values found by NM-GRASP and the classical GRASP on instance i, respectively, and let N be the total number of instances considered for each problem. The dashed line for the MAX-CUT and the MAX-SAT is the plot of

$$\begin{aligned} y(x) = \frac{|V(x)|}{N}, \end{aligned}$$
(9)

where \(V(x) = \{i:\ f^{NM}_i - f^{OR}_i \ge x\}\). The dash-dot line for the MAX-CUT and the MAX-SAT is the plot of

$$\begin{aligned} y(x) = \frac{|W(x)|}{N}, \end{aligned}$$
(10)

where \(W(x) = \{i:\ f^{OR}_i - f^{NM}_i \ge x\}\). As for the QAP, since it is a minimization problem, the dashed line is the plot of (10), while the dash-dot line is the plot of (9).
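The curves in Figs. 5, 6, and 7 can be reproduced from the raw objective values with a few lines (a sketch; f_nm and f_or are hypothetical arrays holding the per-instance values \(f^{NM}_i\) and \(f^{OR}_i\)).

```python
import numpy as np

def improvement_curve(f_nm, f_or, xs):
    """Fraction of instances where NM-GRASP improves on GRASP by at
    least x, i.e., Eq. (9); swapping the arguments yields Eq. (10)."""
    diff = np.asarray(f_nm) - np.asarray(f_or)
    return [np.mean(diff >= x) for x in xs]
```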

In every figure we report: (1) on the left, the plot related to the worst objective function value obtained among the ten runs; (2) in the center, the one related to the average objective function value obtained over the ten runs; and (3) on the right, the plot related to the best objective function value obtained among the ten runs by the two heuristics.

On the basis of the figures, we notice that NM-GRASP is generally able to guarantee better performance than GRASP in all three scenarios (worst, average, and best).

Table 1 Comparison between NM-GRASP and the classical GRASP on MAX-CUT instances (average results over ten runs)
Table 2 Comparison between NM-GRASP and the classical GRASP on MAX-SAT instances (average results over ten runs)
Table 3 Comparison between NM-GRASP and the classical GRASP on QAP instances (average results over ten runs)
Table 4 Average deviation from the best known/optimal solution

To investigate further and confirm the better performance of NM-GRASP, Tables 1, 2, and 3 summarize the details of the results obtained by comparing the two algorithms on the benchmark instances of the three selected combinatorial optimization problems. The first column of each table reports the name of the instance. The remaining columns report, for each of the two approaches, the average CPU time (Time), the average number of iterations (Iter), the average objective function value (Obj) with the standard deviation in brackets, and the best objective function value obtained over the ten runs (Best Obj) with the number of times the best value was obtained in brackets.

As for the 117 benchmark instances of the MAX-CUT, the 44 instances of the MAX-SAT, and the 82 instances of the QAP, we notice that, for the great majority of problems, NM-GRASP found solutions whose objective function value is better than or equal to (and often strictly better than) the objective function value of the solution found by the classical GRASP. Moreover, the number of times NM-GRASP found the best objective function value over the ten runs is higher for all QAP instances, and for almost all instances of the other two problems, the exceptions being below 5 % of the cases for the MAX-CUT and below 11 % for the MAX-SAT.

In order to see whether there exist significant differences in the results, in terms of solution quality, between NM-GRASP and the original GRASP, we applied the Friedman non-parametric statistical test, followed by a post-hoc test, to the results in the tables. The post-hoc analysis shows that NM-GRASP statistically outperforms the original GRASP on both the MAX-CUT and the QAP instances, with p-values of 1.05973563e-10 and 3.84942067e-09, respectively. The difference between the two algorithms is statistically less significant for the MAX-SAT, with a p-value of 6.47928279e-02.

As an additional comparison between GRASP and NM-GRASP, we consider their performance with respect to the average CPU time. We recall that, for each run, we stored the CPU time needed to find the solution with the best objective function value in that run. In Fig. 8, we report the box plots related to the distribution of the average CPU time over the ten runs for each problem (MAX-CUT on the left, MAX-SAT in the center, and QAP on the right). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. From each plot, we see that the median of the NM-GRASP is lower than the median of the classical GRASP. For the QAP, NM-GRASP generally finds its best solutions much faster than the classical GRASP.

Fig. 8 Box plots related to the CPU time (average results over ten runs)

As a further experiment for the MAX-CUT, we considered the empirical distributions of the random variable time-to-target-solution-value (see [1, 2] for further details) on instances g1250.n, G40, sg3dl142000.mc, and toruspm3-15-50, using different target values. We performed 100 independent runs of each heuristic and recorded the time needed to find a solution at least as good as the target solution. For each run, we set one hour as the time limit. The time-to-target-plot analysis is reported in Appendix 1.

Finally, we compared the solutions obtained by both GRASP and NM-GRASP with the best known solutions (optimal solutions, when available) from the literature. More specifically, for the MAX-CUT we considered the best known solutions for the sg3dl, torus, and G instances; for the MAX-SAT we considered the optimal solutions for all the available instances; for the QAP we considered the best known/optimal solutions for all the QAPLIB instances. For both the MAX-CUT and the MAX-SAT, we calculated the deviation \(\rho _{max}=(f_{best}-f)/f_{best}\) from the best known solution value \(f_{best}\), where f is the best objective value attained by a given approach. For the QAP, we instead calculated \(\rho _{min}=(f-f_{best})/f_{best}\). In Table 4, we report for both approaches the average deviation obtained on the three classes of instances. As can be easily noticed, NM-GRASP always achieves a better average deviation than the classical GRASP.
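For completeness, the deviations reported in Table 4 correspond to the following computation (a trivial sketch).

```python
def deviation(f, f_best, minimization=False):
    """rho_max = (f_best - f)/f_best for maximization problems,
    rho_min = (f - f_best)/f_best for minimization (the QAP)."""
    return (f - f_best) / f_best if minimization else (f_best - f) / f_best
```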

Summarizing, our computational experience shows that using a nonmonotone local search in the GRASP heuristic often gives a significant improvement in the quality of the solution, and that this improvement is achieved without deteriorating the CPU time.

6 Concluding remarks

In this paper, we introduced a new nonmonotone strategy to explore the neighborhood of the current solution during a local search phase, and we formally stated the convergence of the resulting nonmonotone local search to a locally optimal solution. To illustrate its effectiveness, we used it as the local search procedure in a GRASP framework and compared the resulting Nonmonotone GRASP (NM-GRASP) with a classical GRASP on three selected hard combinatorial optimization problems: the MAX-CUT, the MAX-SAT, and the QAP. The comparison showed that the newly proposed approach is very competitive, outperforming the classical GRASP.