1 Introduction

The capacitated vehicle routing problem (CVRP) was defined by Dantzig and Ramser [10] as follows. The input consists of a set of \(n+1\) points, a depot and n customers; an \((n+1) \times (n+1)\) matrix \([c_{ij}]\) with the travel costs between every pair of points i and j; an n-dimensional demand vector \([d_i]\) giving the amount to be collected from customer i; and a vehicle capacity Q. A solution is a set of routes, starting and ending at the depot, that visit every customer exactly once. The only constraint on a route is that the sum of the demands of its customers must not exceed the vehicle capacity Q. The objective is to find a solution with minimum total cost. Many authors also assume that the number of routes is fixed to an additional input number K. The CVRP is widely studied because of its direct applications, since it adequately models a significant number of real logistic systems. Furthermore, it plays a particularly important role in general vehicle routing research, being the most basic and prototypical VRP variant.

Column generation has been used in exact algorithms for the vehicle routing problem with time windows (VRPTW) since Desrochers et al. [11]. This technique performed very well on tightly constrained instances (those with narrow time windows). As the CVRP can be regarded as the particular case of the VRPTW where time windows are arbitrarily large, column generation was viewed as an unpromising approach for the problem. In fact, in the early 2000s, the best performing algorithms for the CVRP were branch-and-cut algorithms that separated quite complex families of cuts identified by polyhedral investigation (see Naddef and Rinaldi [22]). In spite of their sophistication, some instances from the literature with only 50 customers could not be solved to optimality [20]. At that point, the branch-cut-and-price (BCP) algorithm in Fukasawa et al. [12] showed that the combination of cut and column generation could be much more effective than either of those techniques taken alone.

According to the classification proposed in Poggi and Uchoa [28], the BCP algorithm in [12] only uses robust cuts. A cut is said to be robust when the value of its associated dual variable can be translated into costs in the pricing subproblem. Therefore, the structure and the size of that subproblem remain unaltered, regardless of the number of robust cuts added. On the other hand, non-robust cuts are those that change the structure and/or the size of the pricing subproblem; each additional cut makes it harder. Robustness is a desirable property of a BCP. Even with good pricing heuristics, at least one call to the exact pricing is necessary to establish a valid dual bound. With the addition of non-robust cuts, there is a risk of having to solve an intractable subproblem to optimality. Nevertheless, since Jepsen et al. [17] and Baldacci et al. [2], it is known that some non-robust cuts can be used effectively, at least if their separation is restricted in order to avoid an excessive impact on the pricing. While the adjective non-robust focuses on their negative aspect, some authors call them strong cuts, focusing on their positive aspect: a greater potential for significantly reducing the integrality gaps.

1.1 Literature review

We briefly review the most recent works proposing exact algorithms for the CVRP, all of which are based on the combination of column and cut generation. A deeper review can be found in [29].

  1. Fukasawa et al. [12] presented a BCP algorithm where the columns are associated with q-routes without k-cycles [16]. The separated cuts are the same as those used in previous algorithms over the edge formulation of the CVRP. Those cuts are robust with respect to q-route pricing. If the column generation at the root node is found to be too slow, the algorithm may automatically switch to a branch-and-cut, which is advantageous in some instances. All benchmark instances from the literature with up to 134 customers could be solved to optimality, a significant improvement over previous methods.

  2. Baldacci et al. [2] presented an algorithm based on column and cut generation. The columns are associated with elementary routes. Besides cuts for the edge formulation, strengthened capacity cuts and clique cuts are separated. Those latter cuts are effective but non-robust. An important new idea is introduced: instead of branching, the algorithm finishes at the root node (therefore, it is not a BCP) by enumerating all elementary routes with reduced cost smaller than the duality gap. A set-partitioning problem containing all those routes is then given to a MIP solver. The algorithm could solve almost all instances solved in [12], usually taking much less time. However, the exponential nature of some algorithmic elements, in particular the route enumeration, made it fail on some instances with many customers per route.

  3. Pessoa et al. [24] presented some improvements over [12]. Cuts from an extended formulation with load indices were also separated. Those cuts do not change the complexity of the dynamic programming pricing of q-routes. Additionally, the idea of performing elementary route enumeration and MIP solving to finish a node was borrowed from [2]. However, in order to avoid a premature failure when the root gap is too large, it was hybridized with traditional branching. The overall improvement was not sufficient for solving larger instances.

  4. The algorithm by Baldacci et al. [3] improved upon [2]. It introduces the concept of ng-routes, a route relaxation that is more effective than the q-routes without k-cycles. Instead of clique cuts, subset row cuts [17] and weak subset row cuts, which have less impact on the pricing, are separated. The resulting algorithm is not only faster on average but also much more stable than the algorithm in [2], being able to solve even some instances with many customers per route.

  5. Contardo [8] introduced new twists on the use of non-robust cuts and on route enumeration. The columns are associated with q-routes without 2-cycles, a relatively poor relaxation. The partial elementarity of the routes is enforced by non-robust strong degree cuts. Robust cuts from the edge formulation, as well as non-robust strengthened capacity cuts and subset row cuts, are also separated. The enumeration of elementary routes is directed into a pool of columns. As soon as the duality gap is sufficiently small to produce a pool of reasonable size (a few million routes), the pricing starts to be performed by inspection. From that point, an aggressive separation of non-robust cuts can be performed, leading to very small gaps. The reported computational results are very consistent. In particular, the hard instance M-n151-k12 (150 customers, 12 routes) was solved to optimality for the first time (in 6 hours).

  6. Røpke [32] went back to robust BCP. His linear relaxation differs from [12] only by the use of the more effective ng-routes. The power of the algorithm comes mainly from a sophisticated and aggressive strong branching, able to significantly reduce the average size of the enumeration trees. The overall results are comparable with those in [3, 8]. A long run of that algorithm (5.5 days) could also solve M-n151-k12.

  7. Finally, Contardo and Martinelli [9] improved upon [8]. Instead of q-routes without 2-cycles, ng-routes are used. Moreover, the performance of the dynamic programming pricing was enhanced by the use of the DSSR technique [4, 31], and edge variables are fixed by reduced costs, using the procedure proposed in [15]. The computational results were also improved. For example, instance M-n151-k12 could be solved in less than 3 hours.

1.2 New branch-cut-and-price algorithm

The branch-cut-and-price algorithm proposed in this work combines elements from all those previous algorithms, usually enhanced, with new ones. The features of this BCP are listed below:

  • It uses limited memory subset row cuts (lm-SRCs), the most important original contribution introduced here. While the traditional SRCs are known to be effective, their practical use has been restricted by their large impact on the pricing. The lm-SRCs are a weakening of the SRCs. This weakening can be controlled and dynamically adjusted, making the lm-SRCs as effective in improving the lower bounds as the traditional SRCs, but still much less costly in the pricing. In fact, in many instances from the literature, including quite large ones, it is possible to separate lm-SRCs to obtain bounds as good as those that would be obtained by separating all SRCs with cardinality up to 5.

  • The underlying formulation used in the BCP has extended arc-load variables. This allows a new and effective scheme for fixing variables using Lagrangean bounds (superior to the fixing in [15]), with direct benefits on the pricing.

  • The columns in the BCP are associated with ng-routes. The corresponding pricing subproblem is solved by a labeling algorithm that must also consider the dual variables of the lm-SRCs. Its implementation is quite critical for the overall BCP performance. After experiments with a number of alternatives, the best performance was obtained by a bidirectional search that differs a little from the one proposed in [30], because the concatenation of the labels is not necessarily performed at half the capacity. Completion bounds (similar to those in [8]) are also used for eliminating labels. In any case, the exact pricing algorithm is called just a few times per BCP node; most of the iterations use effective heuristics.

  • In some instances, usually those with an average of more than 15 customers per route, column generation is prone to severe convergence problems. When this situation is detected, the BCP automatically starts performing dual stabilization, as described in [25].

  • Like in [24], the BCP hybridizes branching with route enumeration. Actually, it performs an aggressive hierarchical strong branching, with up to n candidates (partially) evaluated in the root node. The strong branching effort in each node depends on an estimate of the size of the subtree rooted at that node. The branching mechanism also keeps a history of candidate evaluations to help with future decisions.

  • As soon as the gap of a BCP node is sufficiently small, the elementary routes that can be part of the optimal solution are enumerated into a pool. From that point, since the pricing will be performed by inspection, all lm-SRCs may be immediately lifted to SRCs and additional non-robust cuts, including cliques, may be separated. On larger instances this may not be enough to reduce the number of routes to a size tractable by a MIP solver. In those cases, a new idea is used: performing ordinary BCP branching. The pricing continues to be performed by pool inspection in both child nodes. Nodes are only finished by the MIP solver when the number of remaining routes is quite small.

  • The lm-SRCs are still non-robust. There are cases where several hundred such cuts are being handled normally by the pricing algorithm, and then the separation of a few dozen additional lm-SRCs makes this algorithm 100 times slower. In this situation the BCP performs a rollback, in which the offending cuts are removed even if the lower bound of the node decreases. This is a completely original feature, not appearing in previous works.

Overall, we believe that this is possibly the most sophisticated BCP algorithm ever implemented for a particular problem. The techniques introduced in this work, including the lm-SRCs, have the potential of being applied to many other problems, including several other VRP variants, parallel machine scheduling, and network design.

This article is organized as follows. Section 2 presents the formulations used. Section 3 introduces the limited memory SRCs and their separation strategy. Section 4 describes the labeling algorithms used for pricing, for fixing variables by reduced costs, and for enumerating routes. Section 5 presents the strong branching procedure, fully hybridized with the enumeration. Section 6 reports computational experiments. Finally, Sect. 7 contains a number of conclusions.

2 Formulations

This work starts from an extended formulation for the asymmetric CVRP presented in [24] (also in [13] for the unit demand case). Let \(G=(V,A)\) be a complete directed graph where \(V = \{0, \dots , n\}\) is the vertex set and A is the arc set. Vertices in \(V_+ = \{1, \dots , n\}\) correspond to the customers, whereas vertex 0 corresponds to the depot. A cost \(c_a\) is associated with each arc \(a=(i,j) \in A\); symmetric CVRP instances have symmetric costs, i.e., \(c_{ij}=c_{ji}\). A positive integral demand \(d_i\) is associated with each customer \(i \in V_+\); \(d_0\) is defined as 0. Let Q denote the vehicle capacity. We assume that the number of routes is fixed to K. Let \(G_Q=(V,A_Q)\) be a multigraph where \(A_Q\) contains arcs \((i,j)^q\), for all \(i \in V_+, j \in V\) and for all \(q=d_i,\ldots ,Q\), plus arcs \((0,j)^0\), for all \(j \in V_+\). For any set \(S \subseteq V\), \(\delta ^-(S) = \{ (i,j)^q \in A_Q: i \in V \setminus S, j \in S \}\), and \(\delta ^+(S)\) is defined in a similar way. For each \((i,j)^q \in A_Q\), a binary variable \(x_{ij}^q\) indicates that some vehicle goes from i to j carrying a load—the sum of the demands of vertex i and of its preceding vertices in the route—of exactly q units. The arc-load indexed formulation is:

$$\begin{aligned} (\hbox {ALF})\; \min&\qquad \sum \limits _{a^q \in A_Q} c_a x_a^q\end{aligned}$$
(1)
$$\begin{aligned} \hbox {subject to}&\nonumber \\&\sum \limits _{a^0 \in \delta ^+(\{0\})} x_a^0 = K, \end{aligned}$$
(2)
$$\begin{aligned}&\sum \limits _{a^q \in \delta ^+(\{i\})} x_a^q = 1, \,\quad \forall i \in V_+, \end{aligned}$$
(3)
$$\begin{aligned}&{\sum \limits _{a^{q-d_i} \in \delta ^-(\{i\})} x_a^{q-d_i} - \sum \limits _{a^{q} \in \delta ^+(\{i\})} x_a^{q} = 0, \quad } \forall i \in V_+, q=d_i,\ldots ,Q,\nonumber \\\end{aligned}$$
(4)
$$\begin{aligned}&x_a^q \ge 0, \quad \forall a^q \in A_Q, \end{aligned}$$
(5)
$$\begin{aligned}&x \hbox { integer}. \end{aligned}$$
(6)

Equations (2) and (3) are depot and customer outdegree constraints. Balance equations (4) state that if an arc with index \(q-d_i\) enters vertex i, then an arc with index q must leave i. This formulation can be viewed as defining an acyclic network \(\mathcal{N}=(V_Q,A_Q)\) with a set of nodes \(V_Q=\{(i,q): i \in V; q = d_i,\ldots ,Q\}\). The set of arcs is also \(A_Q\), but an arc \((i,j)^q \in A_Q\) is interpreted as going from \((i,q)\) to \((j,q+d_j)\). Figure 1 gives an example of an integral solution (routes \((0-1-5-0), (0-2-0)\), and \((0-3-4-0)\)) depicted as a set of paths in such a network, where \(n=Q=5, K=3, d_1=d_3=d_4=2\), and \(d_2=d_5=3\).

Fig. 1 Representation of a solution as a set of paths in \(\mathcal{N}\)

A q-route is a walk that starts at the depot, traverses a sequence of customers with total demand at most Q, and returns to the depot [7]. The q-routes are not necessarily elementary, but each time a customer is revisited its demand is counted again. Let \(\varOmega \) be the set of all q-routes. For each \(r \in \varOmega \) define a non-negative variable \(\lambda _r\) and binary coefficients \(a^{ij}_{rq}\), for each \((i,j)^q \in A_Q\), indicating whether \((i,j)\) is traversed with load q in route r. Equations (4) in ALF can be replaced by:

$$\begin{aligned} \sum \limits _{r \in \varOmega } a^{ij}_{rq} \lambda _r = x_{ij}^q, \forall (i,j)^q \in A_Q. \end{aligned}$$
(7)

Substituting the x variables and relaxing the integrality, the Dantzig–Wolfe master LP is written as:

$$\begin{aligned} (\text{ DWM })\quad&\min ~ \sum \limits _{r \in \varOmega } \left( \sum \limits _{(i,j)^q \in A_Q} a^{ij}_{rq} c_{ij} \right) \lambda _r \end{aligned}$$
(8)
$$\begin{aligned} \hbox {subject to }&\nonumber \\&\sum \limits _{r \in \varOmega } \left( \sum \limits _{(i,j)^q \in \delta ^+(\{0\})} a^{ij}_{rq} \right) \lambda _r = K,\end{aligned}$$
(9)
$$\begin{aligned}&\sum \limits _{r \in \varOmega } \left( \sum \limits _{(i,j)^q \in \delta ^+(\{i\})} a^{ij}_{rq} \right) \lambda _r = 1, \quad \forall i \in V_+,\end{aligned}$$
(10)
$$\begin{aligned}&\lambda _r \ge 0\quad \forall r \in \varOmega . \end{aligned}$$
(11)

A generic constraint l of format \(\sum _{(i,j)^q \in A_Q} \alpha _{ij}^{lq} x_{ij}^q \ge b_l\) can also be included in the DWM, using the variable substitution (7), as \(\sum _{r \in \varOmega } (\sum _{(i,j)^q \in A_Q} \alpha _{ij}^{lq} a^{ij}_{rq}) \lambda _r \ge b_l\). Suppose that, at a given instant, there are \(n_R\) constraints over the x variables (including (9) and (10)) in the DWM. Equality (9) is numbered as 0, equalities (10) are numbered from 1 to n, according to the corresponding \(i \in V_+\), and the other constraints are numbered from \(n+1\) to \(n_R-1\). The dual variable of the constraint with number l is denoted by \(\pi _l\). The reduced cost of an arc \((i,j)^q\) is defined as:

$$\begin{aligned} \bar{c}_{ij}^q = c_{ij} -\sum _{l=0}^{n_R-1} \alpha _{ij}^{lq} \pi _l. \end{aligned}$$
(12)

The pricing subproblem for solving the DWM consists of finding a shortest path in \(\mathcal {N}\) from node (0, 0) to nodes \((0,q), 1 \le q \le Q\), with respect to the arc reduced costs \(\bar{c}_{ij}^q\). This can be done in \(O(n^2Q)\) time.
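
To make the structure of this subproblem concrete, the following C++ sketch implements the \(O(n^2Q)\) dynamic program over \(\mathcal{N}\), ignoring the ng-sets and lm-SRC states introduced later (so it prices ordinary q-routes). The function signature and the flat data layout are illustrative assumptions, not the actual implementation.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// A sketch of the O(n^2 Q) pricing over the acyclic network N.
// cbar[i][j][q] holds the reduced cost (12) of arc (i,j)^q and d[i] the
// demands (d[0] = 0); the layout is an illustrative assumption.
std::vector<double> priceQRoutes(
        int n, int Q, const std::vector<int>& d,
        const std::vector<std::vector<std::vector<double>>>& cbar) {
    const double INF = std::numeric_limits<double>::infinity();
    // best[i][q] = cheapest partial path from node (0,0) to node (i,q)
    std::vector<std::vector<double>> best(n + 1, std::vector<double>(Q + 1, INF));
    for (int j = 1; j <= n; ++j)           // arcs (0,j)^0: leave the depot
        best[j][d[j]] = cbar[0][j][0];
    std::vector<double> route(Q + 1, INF); // route[q] = best route with load q
    for (int q = 1; q <= Q; ++q)           // loads only increase: process in order
        for (int i = 1; i <= n; ++i) {
            if (best[i][q] == INF) continue;
            route[q] = std::min(route[q], best[i][q] + cbar[i][0][q]); // arc (i,0)^q
            for (int j = 1; j <= n; ++j)   // arc (i,j)^q: (i,q) -> (j, q + d[j])
                if (j != i && q + d[j] <= Q)
                    best[j][q + d[j]] = std::min(best[j][q + d[j]],
                                                 best[i][q] + cbar[i][j][q]);
        }
    return route; // negative entries correspond to columns to add to the DWM
}
```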

A significantly stronger linear relaxation would be obtained if \(\varOmega \) were redefined as the set of elementary routes. On the other hand, the pricing subproblem would become strongly NP-hard. While carefully designed labeling algorithms are now capable of pricing elementary routes on most instances from the literature with up to 199 customers [21], this is still too costly. A possible alternative is pricing q-routes without k-cycles [16], a relaxation of the elementary routes that allows multiple visits to a customer, on the condition that at least k other customers are visited between successive visits. While pricing q-routes without 3-cycles is not much more costly than pricing ordinary q-routes [12], using values of k larger than 4 can make the algorithm too slow. A more recent alternative for imposing partial elementarity, used in this work, is given by the ng-routes [3]. For each customer \(i \in V_+\), let \(NG(i) \subseteq V_+\) be the ng-set of i, defining its neighborhood; it usually consists of the |NG(i)| closest customers (this cardinality is decided a priori) and includes i itself. An ng-route allows multiple visits to a customer i, on the condition that at least one customer j such that \(i \notin NG(j)\) is visited between successive visits. Extensive experiments performed in [21] showed that ng-sets of size 8 already yield lower bounds comparable to those from q-routes without 5-cycles, while spending significantly less time. From now on, \(\varOmega \) is redefined to be a set of ng-routes.

3 Cuts

Even if \(\varOmega \) only contains elementary routes, the bounds given by (8)–(11) are not good enough to be the basis of efficient exact algorithms (gaps between 1 and 4 % are typical in the instances from the literature). For this purpose, the formulation must be reinforced with additional cuts. Cuts for the undirected edge formulation can be included in the DWM by using the transformation \(x_{ij} = \sum _{(i,j)^q \in A_Q} x_{ij}^q + \sum _{(j,i)^q \in A_Q} x_{ji}^q\). In this work, rounded capacity cuts and strengthened combs are separated by the CVRPSEP package [19]. All those cuts are robust; the effect of their dual variables is captured in the arc-load reduced costs (12).

Jepsen et al. [17] introduced a family of inequalities defined over the route variables. Let \(a_i^r = \sum _{(i,j)^q \in A_Q} a^{ij}_{rq}\) be the number of times that vertex i appears in route r. Given a base set \(C \subseteq V_+\) and a multiplier \(p, 0< p < 1\), the following \((C,p)\)-Subset Row Cut (SRC)

$$\begin{aligned} \sum \limits _{r \in \varOmega } \left\lfloor p \sum \limits _{i \in C} a_i^r \right\rfloor \lambda _r \le \left\lfloor p|C| \right\rfloor \end{aligned}$$
(13)

is valid, since it can be obtained by a Chvátal-Gomory rounding of the corresponding constraints in (10). For each integer \(d, 1 \le d \le n\), define a non-negative integer variable \(y_C^d\) as the sum of all variables \(\lambda _r\) such that \(\sum _{i \in C} a_i^r = d\). Variables with \(d > |C|\) can only be non-zero if \(\varOmega \) contains non-elementary routes. The interesting SRCs, for sets C with cardinality up to 5, are the following:

  • The cuts where \(|C|=3\) and \(p=1/2\) are called 3-subset row cuts (3SRCs) and can be expressed as \(y_C^2 + y_C^3 + 2 y_C^4 + 2y_C^5 + \ldots \le 1\). Although they are very effective in improving the lower bounds, only a relatively small number of those cuts could be separated in [3, 8], and [9], in order to keep the pricing tractable. The authors of [3] also used the weak 3SRCs, a weakening of the 3SRCs where only the variables corresponding to routes that use an edge \((i,j)\) such that \(i,j \in C\) have coefficient 1.

  • Taking \(|C|=1\) and \(p=1/2\), the 1-subset row cuts (1SRCs) \(y_C^2 + y_C^3 + 2y_C^4 + \ldots \le 0\) are obtained. They are equivalent to the strong degree cuts \(y_C^1 \ge 1\) introduced in [8], in the sense that both families forbid cycles over a vertex i (\(C=\{i\}\)). Contardo also defined the weaker k-cycle elimination cuts, which only forbid cycles over i of size k or less. Of course, these cuts can only be useful when the \(\varOmega \) set contains non-elementary routes.

  • The cuts where \(|C|=4\) and \(p=2/3\) are 4SRCs, expressed as \(y_C^2 + 2y_C^3 + 2y_C^4 + 3y_C^5 + 4y_C^6 + \ldots \le 2\).

  • There are two interesting families of cuts with \(|C|=5\). Those with \(p=1/3\) will be called 5,1SRCs, \(y_C^3 + y_C^4 + y_C^5 + 2y_C^6 + \ldots \le 1\); whereas those with \(p=1/2\) are 5,2SRCs, having the format \(y_C^2 + y_C^3 + 2y_C^4 + 2y_C^5 + 3y_C^6 + \ldots \le 2\). The latter family was already used in [8] and [9].

The newly proposed limited memory subset row cuts (lm-SRCs) are a generalization of the SRCs. Each such cut is defined by a triplet \((C,M,p)\), where M is the memory set (\(C \subseteq M \subseteq V_+\)). Remark that each lm-SRC has its own memory set. An lm-SRC is written as:

$$\begin{aligned} \sum \limits _{r \in \varOmega } \alpha (C,M,p,r) \lambda _r \le \left\lfloor p|C| \right\rfloor , \end{aligned}$$
(14)

where the coefficients \(\alpha \) are a function of C, M, p, and r, computed by Algorithm 1.

Algorithm 1: computation of the coefficient \(\alpha (C,M,p,r)\)

Variable coeff stores the coefficient to be returned. Each time a vertex in C is visited, variable state is increased by p. When state becomes greater than or equal to 1, its value is reduced by 1 unit and coeff is incremented. When \(M=V_+\), function \(\alpha \) returns \(\lfloor p\sum _{i \in C} a_i^r \rfloor \) and the lm-SRC is identical to an SRC. On the other hand, when M is strictly contained in \(V_+\), the lm-SRC may be a weakening of its corresponding SRC. This happens because every time the route r leaves M, the variable state is reset to zero, potentially decreasing the returned coefficient. Function \(\alpha \) indicates how the lm-SRCs should be taken into account in the labeling algorithms used in the pricing. In fact, that procedural function is executed along the algorithm. Each label should have an additional dimension for each lm-SRC, storing its state in the corresponding partial path. However, the coefficients do not need to be stored in the labels. Instead, whenever a label extension causes the increment of the coefficient of an lm-SRC, according to function \(\alpha \), the value of its dual variable is immediately subtracted from the reduced cost of the new label. We remark that the number of possible states of an lm-SRC depends on its p. For example, in the frequent case where \(p=1/2\), the state can only be 0 or 1/2. Therefore, it can be represented by a single bit.
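
Based on the description above, the following is a minimal C++ rendering of function \(\alpha\); the container choices and names are our own assumptions.

```cpp
#include <unordered_set>
#include <vector>

// A sketch of Algorithm 1 (function alpha), following the textual
// description above. route lists the customers visited by r, in order
// (depot visits omitted); C is a subset of M.
int alpha(const std::unordered_set<int>& C, const std::unordered_set<int>& M,
          double p, const std::vector<int>& route) {
    int coeff = 0;
    double state = 0.0;
    for (int v : route) {
        if (!M.count(v)) {
            state = 0.0;                 // leaving the memory resets the state
        } else if (C.count(v)) {
            state += p;                  // each visit to the base set adds p
            if (state >= 1.0 - 1e-9) {   // tolerance for floating-point p
                state -= 1.0;
                ++coeff;                 // one more unit of the cut coefficient
            }
        }
    }
    return coeff;
}
```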

The potential advantage of the lm-SRCs over classical SRCs is their much reduced impact on the labeling algorithm used in the pricing when \(|M| \ll |V_+|\). The reasons for that reduction will be explained in Sect. 4.1.1. In order to obtain small memory sets, we propose the following separation strategy for the lm-SRCs. First, identify a violated \((C,p)\)-SRC. Then, function calculateM (Algorithm 2) is used to obtain a minimal memory set such that the lm-\((C,M,p)\)-SRC has the same violation.

Algorithm 2: function calculateM

For example, suppose that, in a given fractional solution, the paths that visit \(C=\{1,2,3\}\) at least twice are: \(r_1=(0-1-4-5-3-6-2-7-1-0)\) with \(\lambda _{r_1} = 0.2\), \(r_2=(0-7-2-8-3-0)\) with \(\lambda _{r_2}=0.3\), and \(r_3=(0-5-3-4-1-7-9-2-0)\) with \(\lambda _{r_3}=0.4\). The \((C,1/2)\)-SRC \(2\lambda _{r_1}+\lambda _{r_2}+\lambda _{r_3} \le 1\) has a violation of 0.1. The minimal set M obtained that yields an lm-\((C,M,1/2)\)-SRC with the same violation is \(M = C \cup \{4,5\} \cup \{7\} \cup \{8\} \cup \{4\} = \{1,2,3,4,5,7,8\}\), where the first two added subsets come from \(r_1\), the third from \(r_2\), and the fourth from \(r_3\).
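
The sketch below shows one plausible implementation of calculateM consistent with this example, for the frequent case \(p=1/2\): vertices visited while the state is positive must be remembered, whereas segments traversed with state zero can be forgotten at no loss. It reproduces \(M = \{1,2,3,4,5,7,8\}\) on the example above; the interface is an assumption.

```cpp
#include <unordered_set>
#include <vector>

// A hedged sketch of function calculateM (Algorithm 2) for p = 1/2, where
// the state is a single bit. routes are the paths of the fractional solution
// with positive SRC coefficient, listed as customer sequences.
std::unordered_set<int> calculateM(const std::unordered_set<int>& C,
                                   const std::vector<std::vector<int>>& routes) {
    std::unordered_set<int> M(C.begin(), C.end()); // C is always remembered
    for (const auto& route : routes) {
        bool state = false;              // state == 1/2 iff true
        std::vector<int> buffer;         // vertices seen since last visit to C
        for (int v : route) {
            if (!C.count(v)) { buffer.push_back(v); continue; }
            if (state) {                 // this visit increments the coefficient,
                M.insert(buffer.begin(), buffer.end()); // so the segment matters
                state = false;
            } else {                     // first visit of a pair; the preceding
                state = true;            // segment was traversed with state 0
            }                            // and can safely be forgotten
            buffer.clear();
        }
    }
    return M;
}
```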

4 Labeling algorithms

In this section we present the algorithms used for pricing ng-routes, for fixing arc-load variables by reduced costs, and for enumerating elementary routes. Those algorithms must take into account the following points: (i) the reduced cost of an arc, defined in (12), depends on its load q; (ii) some variables \(x_{a}^{q}\) may already be fixed to zero; and (iii) non-robust lm-SRCs may be present.

4.1 Pricing

4.1.1 Forward labeling

The forward dynamic programming labeling algorithm for the pricing problem represents an ng-feasible partial path \(P=(0,\ldots ,i), i \in V\), as a label \({L}(P) = (\bar{c}(P),v(P)=i,q(P),\varPi (P), S(P), pred(P))\) storing its reduced cost, end vertex, load, set of vertices forbidden as immediate extensions due to the ng-sets, vector of states corresponding to the \(n_S\) lm-SRCs with non-zero dual variables in the current master LP solution, and a pointer to its predecessor label. Each \((i,q) \in V_Q\) defines a bucket \(F(i,q)\). A label L(P) is stored in bucket \(F(v(P), q(P))\). A label \(L(P_1)\) dominates a label \(L(P_2)\) if every feasible completion of \(P_2\) yields a route with reduced cost not smaller than that of the feasible route obtained by applying the same completion to \(P_1\). The following four conditions, together, are sufficient to ensure such a domination:

$$\begin{aligned}&(\mathrm{i})\; v(P_1) = v(P_2),\quad (\mathrm{ii})\; q(P_1) = q(P_2),\quad (\mathrm{iii})\; \varPi (P_1) \subseteq \varPi (P_2), \text{ and } \\&(\mathrm{iv})\; \bar{c}(P_1) \le \bar{c}(P_2) + \sum _{1 \le s \le n_S:\, S(P_1)[s] > S(P_2)[s]} \sigma _s, \end{aligned}$$

where \(\sigma _s < 0\) is the dual variable associated with lm-SRC s. Remark that the reduced costs depending on q, as well as the fixing of some \(x_a^q\) variables to zero, prevent (ii) from being strengthened to \(q(P_1) \le q(P_2)\). This happens because, if \(q(P_1) \ne q(P_2)\), the reduced cost of a completion for \(P_2\) may differ from the reduced cost of the same completion for \(P_1\). In fact, due to the fixing, a feasible completion for \(P_2\) may be infeasible for \(P_1\). Note also that the second term in the right-hand side of (iv) is an upper bound on what a completion of \(P_2\) can gain over the same completion of \(P_1\), by avoiding the penalizations of revisiting the lm-SRCs s for which \(S(P_1)[s] > S(P_2)[s]\). Only non-dominated labels are kept in the buckets. To accelerate the check for dominated labels, it is convenient to keep the labels of the same bucket ordered by reduced cost. The base set, multiplier and memory set of an lm-SRC s are denoted by C(s), p(s), and M(s), respectively. Consider NG(0) as \(\{0\}\). Algorithm 3 presents the pseudocode of the forward labeling procedure.

Algorithm 3: forward labeling algorithm
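
To illustrate how the four dominance conditions translate into code, the sketch below tests whether \(L(P_1)\) dominates \(L(P_2)\) within a bucket, so (i) and (ii) hold by construction. The data layout, in particular the bitmask representation of \(\varPi\), is an illustrative assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pi is kept as a bitmask over the customers and S as the vector of lm-SRC
// states; sigma[s] < 0 is the dual of lm-SRC s. Member names are assumptions.
struct Label {
    double cbar;              // reduced cost of the partial path
    uint64_t Pi;              // ng-forbidden extensions, as a bitmask
    std::vector<double> S;    // states of the n_S active lm-SRCs
};

bool dominates(const Label& L1, const Label& L2, const std::vector<double>& sigma) {
    if ((L1.Pi & ~L2.Pi) != 0) return false;  // (iii) Pi(P1) subset of Pi(P2)
    // (iv): a completion of P2 can gain at most the penalties of the lm-SRCs
    // in which P1 is "ahead"; charge them (sigma[s] < 0) to the right-hand side
    double rhs = L2.cbar;
    for (std::size_t s = 0; s < sigma.size(); ++s)
        if (L1.S[s] > L2.S[s]) rhs += sigma[s];
    return L1.cbar <= rhs;
}
```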

At the end of the algorithm, each non-empty bucket \(F(0,q), 1 \le q \le Q\), will contain only one label, representing the minimum reduced cost route with load q.

Now it is possible to explain why the lm-SRCs have a reduced impact on the pricing when their memory sets are small. There are O(nQ) buckets in the labeling algorithm. In a rough analysis, the time spent processing each bucket grows quadratically with the average number of non-dominated labels in it and linearly with \(n_S\).

  • A first observation is that the state of an lm-SRC s only needs to be explicitly present in labels corresponding to partial paths ending in M(s). For the remaining labels, this state is zero by definition, which means that \(\sigma _s\) will not play any role in the processing of their buckets. In other words, only the vertices in M(s) need to know about the existence of that cut. This allows the acceleration of the algorithm by a factor of \(\varTheta (n/M_{avg})\), where \(M_{avg}\) is the average size of the memory sets.

  • However, the crucial point for the improved algorithm efficiency is related to the dominance. If there are no SRCs, the maximum number of non-dominated labels in a bucket \(F(i,q)\) is bounded by \(2^{|NG(i)|-1}\), as follows from dominance conditions (iii) and (iv). If the cardinality of the ng-sets is small (we used 8 in this work), the pricing is guaranteed to be reasonably fast (unless Q is very large). However, if a traditional SRC is added, its dual variable may make condition (iv) weaker in all buckets. As other SRCs are separated, this may quickly result in an exponential proliferation of non-dominated labels. In practice, this severely limits the number of SRCs that can be used. In contrast, an lm-SRC s with a small memory has much less impact, because it can only weaken the dominance in the buckets of M(s). In practice, many more lm-SRCs can be separated before the exponential proliferation of labels is observed. This is exemplified in Fig. 2. Let \(P_1\) be the solid path and \(P_2\) the dashed one, both paths ending in vertex i and having load q. A 3SRC with base set \(C=\{1,2,3\}\) may prevent \(L(P_1)\) from dominating \(L(P_2)\), even though no good completion of \(P_2\) visits C. On the other hand, an lm-3SRC over the same C, having the memory set represented in the figure by filled circles, would not interfere with that dominance.

Fig. 2 Example illustrating the performance gain in the pricing when using lm-SRCs

4.1.2 Bidirectional labeling

The labeling algorithm for the pricing can also be performed backwards. In that case, the labels represent ng-feasible partial paths \(P=(i,\ldots ,0), i \in V_+\). The initial labels are put in buckets \(B(0,q), 1 \le q \le Q\), and the algorithm proceeds in a reversed way, until the label corresponding to the route with minimum reduced cost is found in bucket B(0, 0), as shown in Algorithm 4.

Algorithm 4: backward labeling algorithm

The forward and backward variants of the labeling are equivalent in terms of computational cost. However, as pointed out in [30], when forward labeling is used, most of the computational effort is spent in buckets with larger values of q, close to Q. This happens for combinatorial reasons: there are many more possible paths converging into a bucket \(F(i,q)\) when q is larger. In a similar way, when backward labeling is used, most of the computational effort is spent in buckets with small values of q. Therefore, it is often advantageous to perform bidirectional search: use the forward labeling for filling the buckets \(F(i,q)\) with \(q \le Q/2\) and backward labeling for filling the buckets \(B(i,q)\) with \(q > Q/2\). The minimum reduced cost paths can then be obtained by an additional concatenation step. After implementing this bidirectional algorithm, we realized that the number of labels in the backward part was consistently larger (3–10 times more is typical) than in the forward part. This happens because the backward labeling has more starting labels. Therefore, the algorithm performance can be improved by better balancing both parts. This means that the concatenation will occur at a value (dynamically determined) larger than Q/2.

Let nFL (nBL) denote the current number of non-dominated forward (backward) labels generated so far. Algorithm 5 presents the pseudocode of the bidirectional labeling procedure. The algorithm starts by running the forward and backward labeling in an alternating way: if nFL is smaller than nBL, the buckets F with load qf are processed and qf is incremented; otherwise, the buckets B with load qb are processed and qb is decremented. The process ends when \(qf = qb-1\). The minimum reduced cost route with load \(1 \le q \le qf\) is obtained from bucket F(0, q), which contains at most one label. However, routes with load \(q > qf\) must be obtained from the concatenation of forward and backward labels, which is a potentially costly operation. We use the fact that the labels are ordered by reduced cost inside the buckets, so many concatenations that would not yield negative reduced cost routes are quickly discarded. This is possible because the lm-SRCs only penalize the concatenation of labels, since their duals are negative. Procedure Save stores pointers to the pairs of labels corresponding to the routes with negative reduced cost, for subsequent use in the column generation.

Algorithm 5: bidirectional labeling algorithm
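
The following schematic sketch shows only the dynamic balancing loop of Algorithm 5; processing all buckets F(*, qf) (resp. B(*, qb)) and the final concatenation are hidden behind callbacks, whose interface is an illustrative assumption. Each callback is assumed to return the number of non-dominated labels it created.

```cpp
#include <functional>

// Dynamic balancing of the forward and backward parts: at each step, the side
// that has generated fewer non-dominated labels so far is extended.
int bidirectionalLabeling(int Q,
                          const std::function<long(int)>& extendForward,
                          const std::function<long(int)>& extendBackward) {
    int qf = 1, qb = Q;       // next forward / backward load levels to process
    long nFL = 0, nBL = 0;    // non-dominated labels generated so far
    while (qf < qb - 1) {
        if (nFL <= nBL) nFL += extendForward(qf++);  // grow the cheaper side
        else            nBL += extendBackward(qb--);
    }
    return qf;  // concatenation point: loads <= qf are read from F(0,q), the
}               // rest come from concatenating forward and backward labels
```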

4.1.3 Completion bounds

Besides dominance, there is a second mechanism for eliminating labels along the forward and backward labeling algorithms. If it can be proved that a partial path P cannot be completed into a route with negative reduced cost, then L(P) can be removed from its bucket. Of course, it is not possible to find the exact cost of the best completion for every P without running the full original algorithms. Instead, one needs to devise completion bounds, i.e., lower bounds on the cost of the completions, that can be obtained in a relatively fast way. For example, completion bounds can be obtained by runs of the labeling algorithms that only consider, for each label L, the dimensions of S(L) that correspond to \(n_S'\) active lm-SRCs out of the \(n_S\) existing ones. Since the effect of an lm-SRC in the pricing is penalizing the reduced cost of some routes, this corresponds to a relaxation of the original problem. However, as proposed in [8], better completion bounds can be obtained by incorporating part of the effect of the remaining \(n_S - n_S'\) SRCs into the reduced costs as follows:

  • For a 3SRC or a 5,2SRC s, the value \(\sigma _s/2\) can be subtracted from the reduced cost of the arcs inside C(s).

  • For a 4SRC s, the value \(2\sigma _s/3\) can be subtracted from the reduced cost of the arcs inside C(s).

However, it is not possible to transfer part of the dual variables corresponding to 1SRCs or to 5,1SRCs into arc reduced costs. In the first case, because there are no arcs inside C; in the second case, because a route that uses just one arc inside the C set of a 5,1SRC may have coefficient zero in the cut.

The completion bounds for the forward labeling are obtained by first running the relaxation defined above of the backward labeling. Let \(\widehat{F}(i, q)\) be the minimum reduced cost of a label in a bucket \(B'(i,q), (i,q) \in V_Q\), filled by that relaxation. Then, the extension of label \(L(P) = (\bar{c}(P),i,q,\_, \_, \_)\) to a customer j in Algorithm 3 is avoided if the following condition holds:

$$\begin{aligned} \bar{c}(P) + \bar{c}_{ij}^{q} + \widehat{F}(j, q + d_j) \ge 0. \end{aligned}$$
(15)

Analogously, the backward Algorithm 4 uses completion bounds obtained from a relaxed run of the forward labeling. Therefore, Algorithm 5 uses both forward and backward completion bounds.

Table 10 in the appendix shows detailed results, over a number of hard instances, of five variants of the pricing: 1—forward labeling (Algorithm 3); 2—bidirectional labeling with concatenation at Q/2; 3—bidirectional labeling with dynamic determination of the concatenation point (Algorithm 5); 4—the latter algorithm with mild use of completion bounds (\(n_S'\) is small); 5—the same algorithm with aggressive use of completion bounds (\(n_S'\) is larger).

4.1.4 Non-robustness control

Even with all the previously described enhancements, the addition of too many lm-SRCs will still cause an exponential explosion of the number of labels in the pricing algorithms. Worse, it is not possible to predict when the explosion will occur. We have seen examples of runs where several hundred such cuts are being handled normally by the pricing algorithm and then, at some node deep in the tree, the separation of a few dozen additional lm-SRCs makes this algorithm 100 or even 1000 times slower. In some cases the BCP crashed due to memory overflow.

The first strategy tried for avoiding such failures was setting a conservative limit on \(n_S\) and stopping the separation of lm-SRCs after it is reached. A much better strategy is to handle the lm-SRCs dynamically. At the beginning of the root node, before any lm-SRCs are added, the pricing still has pseudo-polynomial complexity (assuming that the size of the ng-sets is fixed). Those robust runs of the pricing are used to establish a baseline BL on the number of labels. The algorithm proceeds by separating rounds of lm-SRCs. The number of labels in the pricing is likely to increase; values up to 5BL are always acceptable, and larger values may be tolerated if the rate of lower bound improvement is still good. This mechanism determines, for each node, when separation will stop and enumeration/branching will be called. However, if the number of labels in a call of the pricing exceeds 50BL, the BCP concludes that an exponential explosion is occurring. The pricing is aborted and the node is rolled back to its state before the last round of cuts. This means that the lm-SRCs added in that round, even if they are active, are removed.
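
The following condensed sketch summarizes this control policy; the names and the exact placement of the "bound still improving" test are assumptions.

```cpp
// BL is the label count observed in the robust pricing runs at the root; the
// 5x / 50x thresholds follow the text above.
enum class CutPolicy {
    KeepSeparating,  // <= 5*BL, or tolerated while the lower bound improves well
    StopSeparating,  // too many labels for too little gain: enumerate or branch
    Rollback         // > 50*BL: abort pricing, remove the last round of lm-SRCs
};

CutPolicy controlNonRobustness(long labels, long BL, bool boundImprovingWell) {
    if (labels > 50 * BL) return CutPolicy::Rollback;
    if (labels <= 5 * BL || boundImprovingWell) return CutPolicy::KeepSeparating;
    return CutPolicy::StopSeparating;
}
```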

4.1.5 Heuristic pricing

Even with the use of the accelerating techniques mentioned in this section, the exact pricing algorithm can still be quite time-consuming. Therefore, the column generation can be accelerated by also using faster pricing heuristics. In fact, the exact pricing may only be called when the pricing heuristics cannot find routes with negative reduced cost. Effective and simple heuristics can be obtained by modifying the label setting algorithms. The bucket pruning heuristic (see, for example, [12]) used in this work consists of storing only a fixed (small) number of labels per bucket, as sketched below. This means that many non-dominated labels, those that are less likely to lead to optimal solutions, may be dropped.
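
A minimal sketch of such a pruned bucket insertion, assuming the Label type exposes its reduced cost as cbar and that maxLabels is the fixed per-bucket limit; both names are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Buckets are kept sorted by reduced cost, as they already are for the
// dominance checks; the insertion keeps only the maxLabels cheapest labels.
template <typename Label>
void insertPruned(std::vector<Label>& bucket, Label l, std::size_t maxLabels) {
    auto pos = std::lower_bound(bucket.begin(), bucket.end(), l,
        [](const Label& a, const Label& b) { return a.cbar < b.cbar; });
    bucket.insert(pos, std::move(l));
    if (bucket.size() > maxLabels)
        bucket.pop_back();  // drop the most expensive label: it may well be
}                           // non-dominated, which is what makes this heuristic
```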

Algorithm 6: fixing of arc-load variables by reduced costs

4.2 Variable fixing by reduced costs

The labeling algorithms are also employed in a key part of the BCP algorithm: the elimination of variables by reduced costs. Full separate runs of both forward and backward labeling should be performed. The minimum reduced cost of a route passing through an arc \((i,j)^q \in A_Q\), denoted by \(\bar{C}_{ij}^q\), can be obtained by concatenating the labels in \(F(i,q)\) from the forward run with the labels in \(B(j,q+d_j)\) from the backward run. If \(\bar{C}_{ij}^q\) is larger than the gap of the Lagrangean bound associated with the current dual solution with respect to the best known integer solution (see [26]), then \(x_{ij}^q\) can be fixed to zero. The pseudocode of our variable fixing is presented in Algorithm 6. It is quite similar to Algorithm 5, the bidirectional labeling procedure. Note that the concatenation here also relies on the fact that labels are ordered by reduced cost inside the buckets, as sketched below.
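
The sketch below shows the fixing test for one arc \((i,j)^q\), with the ng and lm-SRC checks done at concatenation hidden behind a callback. Since the lm-SRC duals can only penalize a concatenation, the additive cost below is a valid lower bound on \(\bar{C}_{ij}^q\), so the early exits and the fixing decision remain safe. The interface is an assumption.

```cpp
#include <functional>
#include <limits>
#include <vector>

struct Label { double cbar; /* plus ng set, lm-SRC states, ... */ };

// F and B are the buckets F(i,q) and B(j,q+d_j), each sorted by reduced cost,
// so both scans stop as soon as no better pair is possible.
bool canFixToZero(const std::vector<Label>& F, const std::vector<Label>& B,
                  double cbar_ijq, double gap,
                  const std::function<bool(const Label&, const Label&)>& compatible) {
    double best = std::numeric_limits<double>::infinity();
    for (const Label& f : F) {
        if (B.empty() || f.cbar + cbar_ijq + B.front().cbar >= best) break;
        for (const Label& b : B) {
            if (f.cbar + cbar_ijq + b.cbar >= best) break; // sorted: stop early
            if (compatible(f, b))
                best = f.cbar + cbar_ijq + b.cbar; // cheapest route using (i,j)^q
        }
    }
    return best > gap; // no improving route can use the arc: fix x_ij^q to zero
}
```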

A similar fixing procedure was also proposed in [15], but it is weaker, because it only removes an arc \((i,j) \in A\) if a single particular dual solution allows removing \((i,j)^q\) for all values of q. On the other hand, as our BCP already works on the arc-load formulation, individual arcs \((i,j)^q\) can be naturally removed. For instance, it is quite typical that, at a certain point of the BCP, 95 % of the arc-load variables have already been fixed to zero, while the fixing on arcs would not achieve 80 %.

4.3 Route enumeration

Baldacci et al. [2] introduced a route enumeration based approach in order to close the duality gap after the root node. An elementary route can only be part of a solution that improves the best known upper bound if its reduced cost is smaller than the gap. The enumeration of elementary routes may be performed by a label setting algorithm, producing a set partitioning problem that is given to a general MIP solver. This may work very well if the size of the set R of enumerated routes is not too large. If \(|R|<\) 20,000, the set partitioning is usually solved in less than 1 minute on a modern machine. Larger values of |R| may cause the MIP solver to take too much time; \(|R|>\) 100,000 is often impractical.

Contardo [8] proposed another strategy in order to better profit from the route enumeration. The enumeration is performed even if the resulting R has a few million routes, which are stored in a pool, and the column and cut generation proceeds. However, instead of using a labeling algorithm, the pricing starts to be performed by inspection of the pool. Now the non-robust cuts have little impact on the pricing complexity and may therefore be separated in a very aggressive way. This usually increases the lower bounds substantially, allowing reductions in the pool size by fixing variables (that now are routes) by reduced costs. For example, he reported that the enumeration of instance M-n151-k12 with a gap of 5 units produced a pool with 4M routes. The separation of non-robust cuts then reduced the gap to 1.5, and the final pool had only 13K routes. The resulting set partitioning was easily solved, finishing the instance.

In this work we used the enumeration scheme proposed in [8], allowing pools with up to 50M routes. After enumeration, SRCs and also clique cuts (separated using the routines from [33]) are added. Since the enumeration of so many routes may be very time-consuming, the implementation of the labeling algorithms used for that purpose becomes critical. The forward route enumeration algorithm represents a feasible (elementary) path \(P=(0,\ldots ,i), i \in V\), as a label \({L}(P) = (c(P), \bar{c}(P),v(P)=i,q(P),V(P), S(P), h(P), pred(P))\) storing its original cost, reduced cost, end vertex, load, set of visited vertices, vector of states corresponding to the \(n_S\) lm-SRCs with non-zero dual variables in the current master LP solution, a hash value depending on V(P), and a pointer to its predecessor label. A label \(L(P_1)\) dominates a label \(L(P_2)\) if:

$$\begin{aligned} (\mathrm{i})\; v(P_1) = v(P_2),\quad (\mathrm{ii})\; V(P_1) = V(P_2),\quad (\mathrm{iii})\; c(P_1) \le c(P_2). \end{aligned}$$

It is very important to also use completion bounds to eliminate labels that cannot be extended to routes with reduced cost smaller than the current gap, denoted by gap. Let \(\widehat{F}\) be the forward completion bounds obtained by running the backward labeling procedure with respect to the current values of \(\bar{c}\) and \(\bar{S}\). The forward route enumeration extends a label L(P) with \(v(P) = i\) to a customer j only if \(j \notin V(P)\) and:

$$\begin{aligned} \bar{c}(P) + \bar{c}_{ij}^{q} + \widehat{F}(j, q(P) + d_j) < gap. \end{aligned}$$

Remark that all the \(n_S\) lm-SRCs are taken into account in the calculation of \(\widehat{F}\); the relaxation consists of allowing ng-routes instead of only elementary routes. To accelerate the label dominance checking, the labels are stored in a hash table H having |H| buckets, where |H| is a power of 2. Each vertex i in V receives two random numbers between 0 and \(|H|-1\), \(r_1(i)\) and \(r_2(i)\). A label L(P) is stored in the bucket calculated by taking the bitwise exclusive-or of the numbers \(r_1(i), i \in V(P)\), and of \(r_2(v(P))\). This ensures that labels that may have a dominance relationship will be in the same bucket. The pseudocode of the forward enumeration procedure is shown in Algorithm 7. The algorithm also keeps a list \(\mathcal {Q}\) of unprocessed labels, initially containing only the starting label. At the end of the algorithm, the labels L(P) with \(v(P) = 0\) represent all elementary routes with reduced costs not greater than the duality gap.

Algorithm 7: forward route enumeration algorithm
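
The hashing scheme just described can be sketched as follows; names and the seeding policy are assumptions. Since xor is order-independent, two labels with the same visited set and the same end vertex always land in the same bucket, which is exactly what the dominance check needs.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// |H| is a power of two; every vertex gets two random numbers r1, r2 in
// [0, |H|-1], used respectively for the visited set and for the end vertex.
struct EnumHash {
    std::vector<uint32_t> r1, r2;
    uint32_t mask;  // |H| - 1

    EnumHash(int nVertices, uint32_t bucketsPow2, uint32_t seed)
        : r1(nVertices), r2(nVertices), mask(bucketsPow2 - 1) {
        std::mt19937 gen(seed);
        for (int i = 0; i < nVertices; ++i) {
            r1[i] = gen() & mask;
            r2[i] = gen() & mask;
        }
    }
    // visited holds the vertices in V(P); end is v(P)
    uint32_t bucket(const std::vector<int>& visited, int end) const {
        uint32_t h = r2[end];
        for (int i : visited) h ^= r1[i];
        return h;  // already in [0, |H|-1]
    }
};
```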

A similar enumeration procedure could also be built by extending labels backwards. For this purpose, the list \(\mathcal {Q}\) is initialized with starting labels at the depot carrying all possible loads \(1 \le q \le Q\). The backward enumeration by itself has no advantage. However, as happens with the pricing, it can be used as a component of a significantly faster bidirectional enumerator (as already done in [2] and [9]). In this case, the forward and backward enumeration algorithms are run up to a point around Q/2 and a concatenation phase is used for retrieving the elementary paths with reduced cost smaller than the gap.

5 Strong branching

It is clear that route enumeration is an important element in some of the best performing algorithms for the CVRP, being able to drastically reduce the running times on some instances. Nevertheless, it is disturbing to consider that route enumeration, no matter how cleverly done, is an inherently exponential space procedure (it would remain exponential even if P \(=\) NP) that is bound to fail on larger/harder instances. However, one does not need to take the radical stance of completely avoiding branching. The hybrid strategy used in [24] and [27] performs route enumeration after solving each node. If a limit on the number of routes is reached, the enumeration is aborted and the BCP proceeds by traditional branching. Of course, since deeper nodes have smaller gaps, at some point the enumeration will work.

The branching in our BCP is performed over sets \(S \subseteq V_+\), imposing the disjunction \((\sum _{a \in \delta ^-(S)} x_a = 1) \vee (\sum _{a \in \delta ^-(S)} x_a \ge 2)\). Those branching constraints are robust and can be translated into arc reduced costs for the pricing in both child nodes. The BCP in [12] adopted a similar branching over sets (already used in [20]). In order to better choose the branching set, that BCP used strong branching: each set in a collection containing from 5 to 10 candidate sets was heuristically evaluated by applying a small number of column generation iterations to its child nodes. Remark that the exact evaluation of each candidate, performing full column generation (perhaps also cut generation) in both child nodes, would be too expensive.

The recent work by Røpke [32] showed that it is possible to obtain major improvements in BCP performance by performing a more sophisticated and aggressive strong branching:

  • The simpler branching over individual arcs was used. The procedure starts by performing a quick evaluation of 30 candidate arcs, producing a ranking. The best ranked candidate is then fully evaluated and becomes the incumbent winner. Then, other well-ranked candidates are evaluated more thoroughly, but only while they have a reasonable chance of beating the incumbent winner.

  • It is possible to collect statistics along the enumeration tree to help in choosing the candidates. The previous evaluations of an arc are good predictors of future evaluations.

His extensive experiments were performed both on CVRP and VRPTW instances. This strong branching was the key algorithmic element that allowed his BCP to solve the hard instance M-n151-k12, with optimum 1015, starting from the rather modest root lower bound of 1001.5. His BCP was also the first algorithm to solve all 56 Solomon VRPTW instances with 100 customers.

Inspired by those good results, we devised a hierarchical strong branching procedure for our BCP:

  • Phase zero performs the first selection of candidate sets. In the root node, this is a collection of \(\min \{n,300\}\) sets, chosen by the proximity of \(\sum _{a \in \delta ^-(S)} \bar{x}_a\) to 1.5 in the current fractional solution. For a non-root node v, the collection has cardinality between 50 and 10, depending on TS(v), the estimate of the size of the subtree rooted at v. Half of the candidates are chosen based on the history of previous calls to the strong branching procedure. In contrast, the choice of the remaining candidates favors fresh sets, that is, sets never evaluated before.

  • Phase one performs a rough evaluation of each candidate by solving the current restricted master LP twice, adding the constraint corresponding to each child node. Column and cut generation are not performed. The resulting improvements in the lower bounds are usually overestimates of the true improvements obtained if that candidate is selected. The candidates are ranked by the product rule [1] and the best (between 20 and 3) candidates go to phase two. If TS(v) is very small, phase two is skipped and the branching is performed with the best candidate of phase one.

  • Phase two performs quite precise evaluations. Column and cut generation are performed; the only difference from the standard node solving is that only heuristic pricing is used. Actually, if TS(v) is large, even the exact pricing may be called. The results of phase two are not only used to select the candidate for the current branching, they are also stored in tables (the branching history) for subsequent use in phase zero.

The whole procedure is guided by the principle that the strong branching effort in a node should depend on the expected subtree size. The logic behind this is the following. If TS(v) is large, even a small improvement in that branching will pay the computational cost of the precise evaluation of several candidates. On the other hand, if TS(v) is small, the branching should be fast, relying on the historical data and on the rough evaluations of phase one. The estimate TS(v), following the model proposed in [18], is calculated from the node gap (the upper bound minus the lower bound of v) and from the values of \(\Delta _l\) and \(\Delta _r\), the estimates of the average lower bound improvements in the left children (corresponding to constraints \(=1\)) and in the right children (constraints \(\ge 2\)) of the subtree rooted at v. It is typical that \(\Delta _l\) is larger than \(\Delta _r\), reflecting a quite unbalanced tree. In the first nodes, the evaluations of the candidates performed by the strong branching itself are used as proxies for \(\Delta _l\) and \(\Delta _r\).
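
One natural reading of such a model is the recursion below, in which a node with gap g spawns a left child with gap \(g-\Delta_l\) and a right child with gap \(g-\Delta_r\), becoming a leaf once the gap is closed. This is our own schematic interpretation, not the exact formula of [18].

```cpp
#include <algorithm>

// Schematic subtree-size estimate from the node gap and the average left /
// right lower bound improvements dl and dr; cap avoids blowing up on tiny
// deltas (and also guards against degenerate non-positive estimates).
long estimateSubtreeSize(double gap, double dl, double dr, long cap = 1000000) {
    if (gap <= 0) return 1;              // gap closed: the node is pruned
    if (dl <= 0 || dr <= 0) return cap;  // degenerate estimates: give up
    long left = estimateSubtreeSize(gap - dl, dl, dr, cap);
    if (left >= cap) return cap;
    long right = estimateSubtreeSize(gap - dr, dl, dr, cap);
    return std::min(cap, 1 + left + right);
}
```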

As mentioned before, our BCP uses a hybrid strategy: enumeration is tried after solving each node. However, strong branching can still be performed after a successful enumeration. This happens when the final set of routes R is too large (we use 20,000 as the limit) for the MIP solver. Of course, in both children of an enumerated node the pricing will continue to be done by inspection of the pool.

6 Computational experiments

Our BCP was coded in C++. IBM ILOG CPLEX Optimizer 12.5 was used as the LP solver and as the MIP solver in the exact method; the CVRPSEP package [19] was used to separate rounded capacity cuts and strengthened combs; and routines from [33] were used to separate generic clique cuts after enumeration. The experiments were conducted on a single core of an Intel Xeon CPU E5-2667 v2 3.30 GHz with 264 GB RAM running CentOS Linux. We report results over the standard classes of instances (A, B, E, F, M, and P) used for testing exact methods for the CVRP. Since larger instances came into reach of the proposed BCP, results are also reported for other instances with up to 360 customers proposed in the literature. All instances used in this paper are available on the CVRPLIB website (http://vrp.atd-lab.inf.puc-rio.br/).

Table 1 summarizes the performance of the new BCP, comparing it with the recent exact algorithms for the CVRP on the standard benchmarks. As usual in the literature where similar tables appear, classes E and M are grouped together. Columns Opt indicate the number of instances solved to optimality. Columns Gap and T(s) are the average percent gap in the root node (before sending the set of enumerated routes to a MIP solver, if the method uses that technique) and the average times in seconds on the indicated machines. In order to provide an indication of the relative speed of each machine, the scores found at the PassMark site (https://cpubenchmark.net/singleThread.html) are reported in parentheses. The labels LLE04, FLL+06, BCM08, BMR11, Con12, CM14, and Rop12 refer to the algorithms proposed in [20], [12], [2], [3], [8], [9], and [32], respectively. Label BCP refers to the proposed BCP. All averages of each method are computed only over the instances solved by it. This explains, for example, why BCP has larger average times in series E-M than other methods that could not solve some of its instances.

Table 1 Comparison of recent algorithms on series A, B, E, F, M, and P
Table 2 Detailed results over selected instances

The new BCP algorithm has a good performance and could solve all those 81 instances to optimality. On instance M-n200-k16, it showed that the previous best known solution was not optimal. We remark that instances F-n72-k4, P-n101-k4, and F-n135-k7 are still better solved by a branch-and-cut algorithm, like the one in [20]. As in [12], we could have used a hybrid method able to automatically switch to branch-and-cut after severe column generation convergence problems are found. However, in order to stick to the BCP paradigm, we prefer to report the results of a version where convergence problems trigger a dual stabilization strategy, similar to the one described in [25]. In fact, without dual stabilization, F-n135-k7 cannot be solved in reasonable time.

Table 2 presents detailed information on the resolution of a selected set of larger instances. Besides M-n151-k12, M-n200-k16, and M-n200-k17, it includes results on two instances from [6] and seven instances from [14]. In those cases, the number of customers is displayed in parentheses. For each instance and method, column IUB presents the initial upper bound used by the method. Columns RLB1, ER1, RLB2, ER2, and RT(s) contain root node information: the lower bound obtained before enumeration (RLB1); the number of routes enumerated, if the method performs enumeration and if it succeeds (ER1); the improved root lower bound after route enumeration, obtained by adding additional non-robust cuts if Contardo-style enumeration is performed (RLB2); the number of enumerated routes remaining after that (ER2); and the total root node computing time (RT(s)). The final lower bound (FLB), in bold when optimal, the number of nodes in the search tree (Nds), and the total computing time in seconds (TT(s)) complete the table columns. Detailed results for each individual instance and the description of additional mechanisms to detect and exploit the symmetry present in instances G9, G10, G13, G14, G17, G18, and G19 are provided in the appendix.

7 Conclusions

This paper has described a BCP algorithm for the CVRP that is the result of a deliberate effort of testing, improving, and combining ideas proposed by several authors. The obtained results were very positive, considerably increasing the size of the instances that can be expected to be solved to optimality. We conclude by stating more personal views on what general BCP algorithm construction can learn from this CVRP experience:

  • The separation of non-robust cuts should be integrated with the pricing as much as possible. More precisely, besides taking classical polyhedral considerations into account, the non-robust cuts should be designed in order to minimize their impact on the specific kind of algorithm used in the pricing. In the BCP presented in this paper, the limited memory SRCs have a quite odd algorithmic definition that only makes sense in the context of the labeling algorithm. For example, suppose an alternative BCP for the CVRP where a MIP model is used to price elementary routes. The limited memory SRCs would not fit in that algorithm; they would probably cause more negative impact in the pricing than ordinary SRCs.

  • When designing non-robust cuts, it is desirable to have a parameter that allows a smooth control of the trade-off between cut strength and impact on the pricing. In the case of the limited memory SRCs, the memory set M has that role.

  • Even if designed and separated in a careful way, at some point the non-robust cuts can indeed make the pricing intractable, halting the BCP algorithm. Therefore, it is advisable to have suitable escape mechanisms, like the rollback introduced in this work.

  • Fixing variables from the original formulation by reduced costs can help a lot. But this fixing can be more effective if the original formulation is chosen to match the algorithm used in the pricing. In fact, in the arc-load formulation, the vertices in \(V_Q\) have a one-to-one correspondence with the buckets in the labeling algorithm.

  • The idea of finishing a BCP node by enumerating into a pool all columns that may be part of the optimal solution, performing the pricing by inspection after that, and eventually sending a much reduced problem to a MIP solver can be quite effective on some instances. However, for consistency reasons, it should be hybridized with traditional branching.

  • Strong branching is now a standard technique for improving branch-and-bound or branch-and-cut performance. Our experience confirms the statement in [32] that it can also greatly improve the performance of BCP algorithms.