1 Introduction

Bilevel optimization is a very challenging topic that received much of attention in recent years, as witnessed by the flourishing recent literature. In this paper we address a generic mixed-integer bilevel linear program (MIBLP), i.e., a bilevel optimization problem where all objective functions and constraints are linear, and some/all variables are required to take integer values.

While many papers in the literature proposed ad-hoc methods for specific bilevel problems, we aim at developing a general-purpose exact algorithm applicable to a general MIBLP in which both the leader and the follower are of mixed-integer type, provided that the leader variables influencing follower’s decisions are integer and bounded.

Our algorithmic design choice was to look for non-invasive supplements needed to convert an effective branch-and-cut mixed-integer linear programming (MILP) exact code into a valid MIBLP solver. In this way, we inherit the wide arsenal of tools (cuts, heuristics, propagations, etc.) available in modern MILP solvers, and do not have to address non-bilevel specific issues like numerical stability, effective LP parametrization, and multi-threading support. This is a distinguishing feature of our work: instead of proposing sophisticated bilevel-specific solution schemes for MIBLP, as in most papers from the literature, we build on a stable and powerful MILP solver and add bilevel-specific features to it.

We first introduce necessary modifications of branching, node-evaluation and fathoming rules of a standard branch-and-bound based MILP solver, and prove that they lead to an exact and (under appropriate conditions) finitely-convergent MIBLP solver. We then propose the use of intersection cuts (ICs) [3] within this branch-and-cut framework. Our approach relies on defining appropriate convex feasible-free sets that can be used to cut off bilevel infeasible points obtained from a problem relaxation. As far as we know, this is the first time ICs have been exploited in the context of bilevel programming. Many of the generic MIBLP approaches proposed in the literature are illustrated using small numerical examples involving few decision variables only (see, e.g., [2, 12, 16, 19]). On the contrary, in this article we present extensive computational results on a testbed containing more than 300 instances from the literature with up to 80,000 variables. An outcome of our experiments is that our approach outperforms available state-of-the-art MIBLP solvers by a large margin.

The paper is organized as follows. Section 2 gives a brief introduction to general bilevel optimization and to MIBLP. Section 3 presents a basic MILP-based branch-and-bound algorithm for MIBLP, and proves its finite convergence under appropriate assumptions. An improved branch-and-cut algorithm is then described in Sect. 4, where we present two new families of valid MIBLP cuts along with their separation procedures. In Sect. 5, results of our computational study are provided. Some conclusions and possible directions for future work are finally addressed in Sect. 6.

A preliminary work appeared in [11]; the current paper expands it as follows: (i) a detailed description of a finitely-convergent branch-and-bound algorithm for MIBLPs is added, including a discussion on how to deal with infeasible and unbounded followers; (ii) the separation of intersection cuts is described in more details, and two alternative procedures for separating ICs are given; (iii) the computational study is vastly expanded and includes a re-implementation of the recently proposed solution method of [7] to give a fair comparison with a state-of-the-art approach.

2 The problem

A general bilevel optimization problem is defined as

$$\begin{aligned} \min _{x\in {\mathbb {R}}^{n_1},y\in {\mathbb {R}}^{n_2}}&F(x,y) \end{aligned}$$
(1)
$$\begin{aligned}&G(x,y) \le 0 \end{aligned}$$
(2)
$$\begin{aligned}&y \in \arg \min _{y'\in {\mathbb {R}}^{n_2} } \{ f(x,y') : g(x,y') \le 0 \, \} , \end{aligned}$$
(3)

where \(F,f: {\mathbb {R}}^{n_1 + n_2} \rightarrow {\mathbb {R}}, G: {\mathbb {R}}^{n_1+n_2} \rightarrow {\mathbb {R}}^{m_1}\), and \(g: {\mathbb {R}}^{n_1+n_2} \rightarrow {\mathbb {R}}^{m_2}\). Let \(n=n_1+n_2\) denote the total number of decision variables, and let \(N_x=\{1,\dots , n_1\}\) and \(N_y=\{1,\dots , n_2\}\) denote the index sets of the x and y variables, respectively.

We refer to F(xy) and \(G(x,y)\le 0\) as the leader objective function and constraints, respectively, and to (3) as the follower subproblem. In case the follower subproblem has multiple optimal solutions, we assume that one with minimum leader cost among those with \(G(x,y)\le 0\) is chosen—i.e., we consider the optimistic version of bilevel optimization. The reader is referred to [15] for a comprehensive discussion on this topic.

By defining the follower value function for a given \(x \in {\mathbb {R}}^{n_1}\)

$$\begin{aligned} \varPhi (x) = \min _{y\in {\mathbb {R}}^{n_2} } \{ f(x,y) : g(x,y) \le 0 \, \}, \end{aligned}$$
(4)

one can restate the bilevel optimization problem as follows:

$$\begin{aligned} \min F(x,y)&\end{aligned}$$
(5)
$$\begin{aligned} G(x,y)&\le 0&\end{aligned}$$
(6)
$$\begin{aligned} g(x,y)&\le 0&\end{aligned}$$
(7)
$$\begin{aligned} (x,y)&\in {\mathbb {R}}^n&\end{aligned}$$
(8)
$$\begin{aligned} f(x,y)&\le \varPhi (x).&\end{aligned}$$
(9)

Note that the above optimization problem would be hard (both theoretically and in practice) even if one would assume convexity of FGf and g (which would imply that of \(\varPhi \)), due to the intrinsic nonconvexity of (9).

Dropping condition (9) leads to the so-called High Point Relaxation (HPR). Clearly, if HPR is infeasible, so is its bilevel counterpart. Thus, without loss of generality, we assume HPR is feasible.

As HPR contains all the follower constraints, any HPR solution (xy) satisfies \(f(x,y) \ge \varPhi (x)\), hence (9) actually implies \(f(x,y) = \varPhi (x)\). An HPR solution (xy) is called bilevel infeasible if it violates (9). A point \((x,y) \in {\mathbb {R}}^n\) is called bilevel feasible if it satisfies all constraints (6)–(9).

2.1 Mixed-integer bilevel linear program

In the remaining part of the paper we focus on the relevant special case in which some/all variables are required to be integer, and all leader/follower constraints and objective functions are linear/affine functions. To be more specific, we consider the following mixed-integer bilevel linear program (MIBLP), in its value-function reformulation (5)–(9):

$$\begin{aligned} \min c_x^T x + c_y^T y&\end{aligned}$$
(10)
$$\begin{aligned} G_x x + G_y y&\le q&\end{aligned}$$
(11)
$$\begin{aligned} A x + B y&\le b&\end{aligned}$$
(12)
$$\begin{aligned} x^- \le x&\le x^+&\end{aligned}$$
(13)
$$\begin{aligned} {l^L} \le y&\le {u^L}&\end{aligned}$$
(14)
$$\begin{aligned} x_j \hbox {~integer}&, \ \ \forall j \in J_x&\end{aligned}$$
(15)
$$\begin{aligned} y_j \hbox {~integer}&, \ \ \forall j \in {J_y}&\end{aligned}$$
(16)
$$\begin{aligned} d^T y&\le \varPhi (x)&\end{aligned}$$
(17)

where \(c_x, c_y, G_x, G_y, q, A, B, b, x^-, x^+, {l^L}\), and \({u^L}\) are given rational matrices/vectors of appropriate size, while sets \(J_x\subseteq N_x\) and \({J_y}\subseteq \ N_y\) identify the (possibly empty) indices of the integer-constrained variables in x and y, respectively. Constraints (13)–(14) define explicit lower/upper bounds on the variables; as customary, we allow some entries in \(x^-, x^+, {l^L}, {u^L}\) to be \(\pm \infty \).

As to the value function \(\varPhi (x)\) for a given x, it is computed by solving the follower MILP:

$$\begin{aligned} \varPhi (x) := \min d^T y&\end{aligned}$$
(18)
$$\begin{aligned} A x + B y&\le b&\end{aligned}$$
(19)
$$\begin{aligned} {l^F} \le y&\le {u^F} \end{aligned}$$
(20)
$$\begin{aligned} y_j ~&\hbox {integer}, \ \ \forall j \in J_y&\end{aligned}$$
(21)

where \({l^F}\) and \({u^F}\) are, respectively, lower and upper bounds on the y variables in the follower—possibly \({l_j^F}=-\infty \) and/or \({u_j^F}=+\infty \) for some j. Without loss of generality, one can assume \( {l^F} \le {l^L}\) and \( {u^F} \ge {u^L} \), i.e., the bound constraints (14) on the y variables are not weaker than the corresponding bounds (20) in the follower problem. Note that the follower objective function (18) does not depend on x, as the latter would just introduce a constant term without changing the set of the optimal follower solutions.

Let m denote the number of rows of matrix A, let \(A_j\) denote its j-th column, and let \(A_{ij}\) denote its generic entry. In what follows we use notation

$$\begin{aligned} J_F := \{ j \in N_x : A_j \not = 0 \} \end{aligned}$$
(22)

to denote the index set of the leader variables \(x_j\) (not necessarily integer-constrained) appearing in the follower problem.

Dropping the nonconvex condition (17) from (10)–(17) leads to HPR, which is a MILP in this setting. Dropping integrality conditions as well leads to the LP relaxation of HPR, which is denoted by \(\overline{\hbox {HPR}}\).

Solving MIBLPs is much more challenging than single-level MILPs: first of all, MIBLPs are \(\varSigma ^P_2\)-hard [14]. Furthermore, it is known that allowing continuous variables in the leader and integer variables in the follower, may lead to bilevel problems whose optimal solutions are unattainable (see, e.g., [13, 17]). Finally, in contrast to single-level MILPs, unboundedness of a relaxation of the problem (namely, the \(\overline{\hbox {HPR}}\)-relaxation) does not allow to draw conclusions on the optimal solution of MIBLP. More precisely, MIBLPs with unbounded \(\overline{\hbox {HPR}}\)-relaxation value can be unbounded, infeasible, or admit an optimal solution.

To deal with the above difficulties, in the remainder of this article we impose the following assumptions:

Assumption 1

All the integer-constrained variables x and y have finite lower and upper bounds both in HPR and in the follower MILP.

Assumption 2

\(J_F \subseteq J_x\), i.e., continuous leader variables \(x_j\) (if any) do not appear in the follower problem—hence they are immaterial for the computation of the value function \(\varPhi (x)\).

3 A finitely-convergent branch-and-bound algorithm

We next show how a classical branch-and-bound (B&B) scheme for MILP can be modified to obtain an exact MIBLP solver. Correctness and finite convergence is proved under the Assumptions 1 and 2 stated above.

If for all HPR solutions, the follower MILP is unbounded, one may safely conclude that the MIBLP is infeasible. In Sect. 3.1 we first show that such a situation may be detected by solving a single LP. This check can be seen as a preprocessing step, so that without loss of generality we may assume that for an arbitrary HPR solution, the follower MILP is well defined.

In Sects. 3.2 and 3.3 we detail the bilevel-feasibility check and refinement procedures that need to be “plugged-in” at some critical branch-and-bound nodes. Observe that under our assumptions, the HPR feasible set can be unbounded (we impose no bounds on continuous variables), which may result into an unbounded \(\overline{\hbox {HPR}}\) solution value. We show that our algorithm is able to detect whether a given MIBLP is in fact unbounded, infeasible, or admits an optimal solution.

3.1 Dealing with infeasible/unbounded follower MILPs

We now address the special cases where, for a given x, function \(\varPhi (x)\) is not well-defined as the follower MILP (18)–(21) is either infeasible or unbounded.

In case the follower MILP is infeasible for a certain x, no bilevel-feasible solution of the type \((x,\cdot )\) exists. This situation of course cannot occur when (xy) is a feasible HPR solution, in that HPR includes all follower constraints (19)–(21).

The situation where the follower MILP is unbounded for a certain x was instead analyzed in [20, Lemma 2], where it is proven that this implies the infeasibility of the MIBLP model (10)–(17). The following result is a consequence of that lemma, and shows that the solution of a single LP (not depending on x anymore) is enough to handle all cases of unboundedness of the follower MILP; to be self contained, a short proof is given.

Theorem 1

Relax Assumptions 1 and 2, and recall that B is a rational matrix. Let \(v^*\) be the optimal value of the following LP:

$$\begin{aligned} v^* := \min d^T \Delta y&\end{aligned}$$
(23)
$$\begin{aligned} B \Delta y&\le 0 \end{aligned}$$
(24)
$$\begin{aligned} \Delta y_j&\le 0,&\forall j \in {N_y} : {u_j^F} < +\infty \end{aligned}$$
(25)
$$\begin{aligned} \Delta y_j&\ge 0,&\forall j \in {N_y} : {l_j^F} > -\infty \end{aligned}$$
(26)
$$\begin{aligned} -1 \le&\Delta y \le 1. \end{aligned}$$
(27)

If \(v^*<0\), then the MIBLP model (10)–(17) is infeasible, otherwise the value function \(\varPhi (x)\) is well defined for every HPR solution (xy).

Proof

Observe that the LP (23)–(27) is not unbounded (because of (27)) nor infeasible (as \(\Delta y=0\) is a feasible solution). So let \(\Delta y^*\) be any optimal extreme solution of it. The rationality of B implies that of \(\Delta y^*\), so we can multiply the latter by a positive scalar to get an integer vector \(\overline{\Delta y}\) that satisfies (24)–(26). Now take any HPR solution (xy).

If \(v^*<0\), we have \(d^T \, \overline{\Delta y}<0\). Thus, for any follower solution \(y'\) satisfying (19)–(21), the new solution \(y'' = y' + \alpha \, \overline{\Delta y}\) is feasible, and has a smaller value, for every integer \(\alpha > 0\). This means that the follower MILP is unbounded, i.e., it does not have any optimal solution—hence no bilevel-feasible solution of the form \((x,\cdot )\) exists.

If \(v^* \ge 0\), instead, the LP relaxation of the follower MILP cannot be unbounded, as any unbounded ray would correspond to a solution \(\Delta y\) of (23)–(27) with \(d^T \Delta y < 0\). So the follower MILP itself cannot be unbounded, and \(\varPhi (x)\) is well defined.

The claim then follows from the arbitrariness of (xy). \(\square \)

Of course, in case all follower variables have finite lower and upper bounds, conditions (25)–(26) imply \(\Delta y= 0\) and hence \(v^*=0\), thus guaranteeing that \(\varPhi (x)\) is well defined for all HPR feasible solutions (xy). Case \(v^* \ge 0\) does not rule out instead the possible infeasibility of the MIBLP model (10)–(17).

As anticipated, Theorem 1 can be used as a preprocessing step to exclude unboundedness of all follower MILPs by solving a single LP in the space of the follower variables only. In what follows, we will therefore assume that the value function \(\varPhi (x)\) is well defined for every HPR solution (xy).

3.2 Feasibility check and refinement procedure

Given a feasible solution \((x^*,y^*)\) of HPR, checking bilevel feasibility requires solving the follower MILP (18)–(21) for \(x=x^*\) to compute \(\varPhi (x^*)\), and to check whether \(d^T y^* \le \varPhi (x^*)\) holds. Observe that the follower MILP for \(x=x^*\) cannot be infeasible, as the input \((x^*, \cdot )\) is a feasible HPR solution, nor unbounded by assumption (see Sect. 3.1).

The refinement procedure described below is intended to solve the MIBLP subproblem arising after fixing all \(x_j\)’s with \(j\in J_F\), for which (17) can be handled easily as \(\varPhi (x)\) becomes a constant.

figure a

At Step 1 we solve the follower problem for \(x_j=x^*_j\) for all \(j\in J_F\). At Steps 2 and 3, one defines a restricted HPR and computes a best bilevel-feasible solution \((\hat{x},\hat{y})\) with \(\hat{x}_j=x^*_j\) for all \(j\in J_F\), respectively; this is just a MILP because, for the restricted HPR, (17) becomes \(d^T y \le \varPhi (x^*)\) = const.

Special cases arise when the restricted HPR defined at Step 2 is either unbounded or infeasible. In the former case (Step 4), the bilevel problem (10)–(17) is unbounded as we can exhibit a bilevel-feasible solution \((\hat{x},\hat{y})\) of arbitrarily small cost \(c_x^T \hat{x}+ c_y^T \hat{y}\). In the latter case, no bilevel-feasible HPR solution of the form \((\hat{x},\cdot )\) exists, and the procedure returns a dummy solution of HPR-cost conventionally set to \(+\infty \).

3.3 The overall branch-and-bound framework

Recall that our main goal is to solve MIBLP by using a standard LP-based B&B algorithm applied to HPR, where the value-function constraint (17) is handled implicitly. In our basic scheme, branching is performed by restricting the domain of the integer-valued x and y variables, i.e., by modifying the lower/upper HPR bounds \((x^-_j, x^+_j, {l_j^L}, {u_j^L})\) appearing in (13) and (14) for the integer-constrained variables \(x_j\) with \(j\in J_x\) and \(y_j\) with \(j\in {J_y}\).

Finite convergence of the above scheme can be guaranteed in case:

  1. (a)

    The B&B algorithm for the underlying HPR, as well as for the follower MILP, is finely-convergent.

  2. (b)

    One can always prune a given B&B node where all variables \(x_j\) with \(j\in J_x\) have been fixed by branching.

Points (a) and (b) are in fact handled by Assumptions 1 and 2. A simple B&B algorithm that solves MIBLP in a finite number of iterations is then illustrated in Algorithm 2. For the sake of conciseness, only the bilevel-specific features are described in full details.

figure b

In Steps 2–17, we handle B&B nodes in which the standard branching on a fractional variable is not possible. This may happen because integrality requirements are met, or the current \(\overline{\hbox {HPR}}\) is unbounded. Nodes for which \(\overline{\hbox {HPR}}\) is infeasible do not need any special treatment instead, and can be fathomed as usual.

As customary, the current node is fathomed in case the current relaxation is finite and has an optimal solution that is bilevel feasible (Step 7).

Compared to a standard B&B algorithm, the following modified rules need to be applied:

  • Fathoming is inhibited in case of an unbounded \(\overline{\hbox {HPR}}\) value. Instead, at Step 15 one continues to branch on an integer \(x_j\) (even if \(x^*_j\) is integer) until its value is fixed by branching.

  • If \(\overline{\hbox {HPR}}\) value is unbounded and all variables \(x_j\) (\(j\in J_F\)) are fixed by branching, we apply the refinement algorithm of Step 11. If the refinement procedure returns “MIBLP is UNBOUNDED”, the algorithm terminates. Otherwise, the incumbent is possibly updated (if the refinement procedure returns a finite value), and the node is eventually fathomed.

  • Branching is also not required if all variables \(x_j\) (\(j \in J_F\)) have been fixed by branching (Step 10). Indeed, under Assumption 2, one can use Algorithm 1 to compute the best bilevel-feasible solution \((\hat{x},\hat{y})\) for the node.

As customary, the algorithm returns “MIBLP is INFEASIBLE”, if upon the enumeration of all B&B nodes, no feasible solution is found. Observe that, even in case of an unbounded \(\overline{\hbox {HPR}}\) value, this situation is handled correctly, due to the refinement algorithm of Step 11. In the worst case, this step is executed for every possible combination of feasible \(x_j\) integer values, \(j \in J_F\), until all of the B&B nodes are discarded.

Theorem 2

Under Assumptions 1 and 2, Algorithm 2 correctly solves MIBLP in a finite number of iterations.

Proof

Finiteness follows immediately by Assumption 1, as each branching operation strictly reduces the domain of an integer-constrained variable. Thus a finite number of B&B nodes is generated, and each node requires a finite number of operations to be processed. Correctness follows from the fact that, because of Assumption 2, Step 11 actually computes the best bilevel-feasible solution \((\hat{x},\hat{y})\) for the current node (if any), so the node can be pruned after the incumbent update. \(\square \)

Note that we allow branching on y variables, though this could be avoided by a simple modification of our scheme. The rationale of this choice is that we prefer to exploit the underlying MILP solver as much as possible, and to deviate from it only when strictly needed for the correctness of the approach. Also observe that, contrary to other approaches from the literature, branching on variables \(y_j\) (by changing their lower/upper bounds \(l^L_j/u^L_j\) in the HPR relaxation) is a legitimate option in our setting because it does not affect the follower lower/upper bound vectors \(l^F/u^F\), hence value function \(\varPhi (x^*)\) computed at Step 4 remains “completely blind” with respect to branching.

3.4 Comparison with the literature

The first generic branch-and-bound approach to MIBLP was given in [17]. The approach works for problems without y variables in the leader constraints. Our algorithm is much less restrictive than the one in [17], for which the authors are able to prove convergence only under the assumptions that the HPR feasible region is compact and that either all leader variables are integer (i.e., \(J_x = N_x\)), or all follower variables are continuous (i.e., \(J_y = \emptyset \)).

A MILP-based branch-and-cut algorithm was introduced in [7, 8]. This approach builds upon the ideas from [17], and cuts off integer bilevel infeasible solutions by adding cuts that exploit the integrality property of the leader and the follower variables. However, this method works with integer variables only and does not allow y variables in the leader constraints.

In the branch-and-sandwich algorithm in [12] (which is a more general approach for nonlinear bilevel problems) novel ideas for deriving lower and upper bounds on the follower value function \(\varPhi (x)\) are proposed. Also here, it is assumed that the HPR feasible region is compact. On the other hand, continuous x variables influencing the follower’s decision are allowed (i.e., \(J_F \subseteq N_x\)), in which case only \(\epsilon \)-convergence can be guaranteed.

In [20], an exact approach based on multi-way branching on the slack variables on the follower constraints is proposed. The algorithm solves a series of MILPs, obtained by restricting the slack variables, until the convergence is reached. Again, all x variables are required to be integer and bounded. In contrast to our approach, no continuous variables at the leader are allowed, and the constraint matrix A is assumed to be integer.

Recently, [5] proposed a method that works for integer x and y variables only: HPR is embedded into a branch-and-bound tree, bilevel infeasible solutions being cut off by linearizing the bilevel continuous problems resulting from the separation procedure.

A Benders-like decomposition scheme for general MIBLPs is given in [19], whereas an approach for nonlinear MIBLPs is introduced in [16]; in both cases, the algorithms assume the HPR feasible region be compact.

In [21] an algorithm based on a single-level reformulation, which is then solved by a column-and-constraint generation scheme, is presented. Similarly to our approach, the algorithm requires the integer variables to be bounded and the continuous variables at the leader level are also allowed. In contrast to our approach, the authors allow the continuous variables of the leader to control follower’s decisions; however, they make a strong assumption that the optimal solution in that case is attainable, thus avoiding the discussion on how to derive \(\epsilon \)-optimal solutions.

To conclude, our algorithm is one of the few approaches that are capable of handling continuous x variables (as long as they do not influence the follower decisions), in combination with a mixed-integer follower. Our algorithm, together with the one introduced in [20], is also one of the few finitely-convergent branch-and-bound approaches that return a provably optimal solution (if such exists) without assuming the HPR feasible region to be compact (only integer variables need to be bounded).

4 A branch-and-cut algorithm

We next elaborate the B&B algorithm described in the previous section, and introduce new families of linear cuts used to speedup its convergence within a branch-and-cut (B&C) scheme.

4.1 Intersection cuts

Intersection cuts have been introduced by Balas [3] in the 1970’s, and are widely used in Mixed-Integer Programming; the reader is referred to Ch. 6 “Intersection Cuts and Corner Polyhedra” in [6] for a recent in-depth treatment of the subject. Their use for MIBLP was not investigated until our very recent work [11], where for the first time we showed their usefulness in the context of bilevel optimization.

The definition of an IC violated by a given point \((x^*,y^*)\) requires the definition of two sets:

  1. (1)

    a cone pointed at \((x^*,y^*)\) that contains all the bilevel feasible solutions, and

  2. (2)

    a convex set \(S\) that contains \((x^*,y^*)\)—but no bilevel feasible solutions—in its interior.

It is worth observing that condition (2) is new, in that it replaces the usual “lattice free” property (exploited in classical MILP) with the bilevel-free one. The larger the convex set, and the smaller the cone, the better the resulting IC. We next briefly address point (1), while point (2) is the subject of the next section.

As customary in mixed-integer programming, our ICs are generated for vertices \((x^*,y^*)\) of the \(\overline{\hbox {HPR}}\) polyhedron, so a suitable cone is just the corner polyhedron associated with the corresponding optimal basis. All relevant information about this cone is readily available in the “optimal tableau”. As the \(\overline{\hbox {HPR}}\) at a given B&B node exploits locally-valid information (notably, the reduced variable domain resulting from branching), our ICs are locally (as opposed to globally) valid as well.

4.2 Bilevel-free polyhedra

We next describe how to derive bilevel-free polyhedra to be used to generate valid cuts as outlined in the previous section.

Theorem 3 below was implicit in some early references (including [20]), where it was only used as a branching rule in a B&B setting.

Theorem 3

For any \(\hat{y}\in {\mathbb {R}}^{n_2}\) that satisfies (20)–(21), the set

$$\begin{aligned} S(\hat{y}) = \{ (x,y) \in {\mathbb {R}}^n: d^T y > d^T \hat{y}, \, A x + B \hat{y}\le b \} \end{aligned}$$
(28)

does not contain any bilevel feasible point.

Proof

We have to prove that no bilevel feasible (xy) exists such that \(d^T y > d^T \hat{y}\) and \(A x + B \hat{y}\le b\). Indeed, for any bilevel feasible solution (xy) with \(A x + B \hat{y}\le b\), one has

hence \((x,y) \not \in S(\hat{y})\). \(\square \)

Note that inequalities \((A x + B y)_i \le b_i\) in the follower MILP that do not involve x variables, if any, would lead to useless inequalities of the form \(0\le b_i - (B\hat{y})_i\) (= a nonnegative constant), so they do not contribute to the definition of \(S(\hat{y})\). This is the reason why, for example, the bound constraints \({l^F} \le y \le {u^F}\) do not appear in the definition of \(S(\hat{y})\).

Unfortunately, the above set is not always suitable for IC generation, as one needs to ensure that any bilevel-infeasible HPR solution \((x^*,y^*)\) to be cut belongs to the interior of the bilevel-free polyhedron. To be sure that a violated IC can be derived, one therefore needs to address an extended version of the above set, whose validity requires the following assumption.

Assumption 3

\(A x + B y - b\) is integer for all HPR solutions (xy).

The above assumption holds true, in particular, when all the x and y variables appearing in the follower MILP are constrained to be integer, and (ABb) is integer. The latter is in fact a very common assumption in the MIBLP literature (see Sect. 3.4), which is also required by the MIBLP solver proposed in [7, 8] and used as a benchmark code in our computational experiments.

Let \( \mathbf {1}=(1,\cdots ,1)\) denote a vector of all ones of suitable size.

Theorem 4

Under Assumption 3, for any \(\hat{y}\in {\mathbb {R}}^{n_2}\) that satisfies (20)–(21), the extended polyhedron

$$\begin{aligned} S^+(\hat{y}) = \{(x,y) \in {\mathbb {R}}^n: d^T y \ge d^T \hat{y}, \, A x + B \hat{y}\le b+ \mathbf {1}\} \end{aligned}$$
(29)

does not contain any bilevel feasible point in its interior.

Proof

To be in the interior of \(S^+(\hat{y})\), a bilevel feasible (xy) should satisfy \(d^T y > d^T \hat{y}\) and \(A x + B \hat{y}< b+ \mathbf {1}\). By assumption, the latter condition can be replaced by \(A x + B \hat{y}\le b\), hence the claim follows from Theorem 3. \(\square \)

It is worth observing that violated ICs can sometimes be derived even from \(S(\hat{y})\), i.e., when Assumption 3 does not hold and the enlarged set \(S^+(\hat{y})\) cannot be used as it might contain bilevel-feasible points in its interior. In this latter case, the IC separation on \(S(\hat{y})\) can still be used as a heuristic way to improve the LP relaxation at the given B&B node through locally-valid cutting planes. This situation is illustrated in the following example, where we generate violated ICs both from \(S(\hat{y})\) and \(S^+(\hat{y})\)—those derived from the latter set being of course deeper.

Example

Figure 1 illustrates the application of our ICs to a notorious example from [17] which is frequently used in the literature, namely:

$$\begin{aligned} \min _{x\in {\mathbb {Z}}} \ -x - 10y&\end{aligned}$$
(30)
$$\begin{aligned} \ \ y \in \arg \min _{y'\in {\mathbb {Z}}}&\{ \ y' :&\end{aligned}$$
(31)
$$\begin{aligned}&-25x +20y'&\le 30&\end{aligned}$$
(32)
$$\begin{aligned}&x +2 y'&\le 10&\end{aligned}$$
(33)
$$\begin{aligned}&2 x - y'&\le 15&\end{aligned}$$
(34)
$$\begin{aligned}&2 x + 10y'&\ge 15 \ \}.&\end{aligned}$$
(35)
Fig. 1
figure 1

Illustration of the effect of alternative intersection cuts applied to a notorious example from [17]. Shaded regions correspond to the bilevel-free sets from which the cut is derived

In this all-integer example, there are 8 bilevel feasible points (depicted as crossed squares in Fig. 1), and the optimal bilevel solution is (2, 2). The drawn polytope corresponds to the \(\overline{\hbox {HPR}}\) feasible set.

We first apply the definition of the bilevel-free set from Theorem 3 with \(\hat{y}\) defined as the follower optimal solution for \(x=x^*\). After solving the first \(\overline{\hbox {HPR}}\), the point \(A=(2,4)\) is found. This point is bilevel infeasible, as for \(x^*=2\) we have \(d^T y^* = y^*=4\) while \(\varPhi (x^*) = 2\). Solving the follower for \(x=2\) we compute \(\hat{y}= 2\) and derive \(S(\hat{y}) = \{(x,y): \frac{2}{5} \le x \le 6, y > 2\}\) and the associated intersection cut \(y \le 2\), see Fig. 1(a). In the next iteration, the optimal \(\overline{\hbox {HPR}}\) solution moves to \(B =(6,2)\). Again, for \(x^*=6, f(x^*,y^*) = y^*=2\) while \(\varPhi (x^*) = 1\). So we compute \(\hat{y}= 1\) and generate the IC induced by the associated \(S(\hat{y}) = \{(x,y): \frac{5}{2} \le x \le 8, y > 1 \}\), namely \(2x + 11y \le 27\) (cf. Fig. 1(b)). In the next iteration, the fractional point \(C=(5/2,2)\) is found and \(\hat{y}=1\) is again computed. In this case, C is not in the interior of \(S(\hat{y})\) so we cannot generate an IC cut from C but we should proceed and optimize HPR to integrality by using standard MILP tools such as MILP cuts or branching. This produces the optimal HPR solution (2, 2) which is bilevel feasible and hence optimal.

We next apply the definition of the enlarged bilevel-free set from Theorem 4 (whose assumption is fulfilled in this example) with \(\hat{y}\) defined as before; see Figures 1(c) and (d). After the first iteration, the point \(A=(2,4)\) is cut off by a slightly larger \(S^+(\hat{y}=2) = \{ (x,y): \frac{9}{25} \le x \le 2, y \ge 2\}\). Note that, however, we obtain the same IC as before (\(y \le 2\)). After the second iteration, from the bilevel infeasible point \(B=(6,2)\) we derive a larger set \(S^+(\hat{y}=1) = \{ (x,y): 2 \le x \le \frac{17}{2}, y \ge 1 \}\) and a stronger IC (\(x +6y \le 14\)). In the third iteration, solution \(D=(2,2)\) is found which is the optimal bilevel solution, so no branching at all is required in this example.

4.3 Relaxed bilevel-free polyhedra

It is well known from IC theory [3, 6] that the larger the bilevel-free polyhedron, the deeper the resulting IC. Therefore one is interested in enlarging that polyhedron, e.g., by relaxing some of its defining inequalities—as long as this does not affect the bilevel-free property of course. To this end, one can apply the following simple (yet very powerful) argument.

Theorem 5

Let \(S=\{ (x,y) \in {\mathbb {R}}^n: \alpha _i^T x + \beta _i^T y \le \gamma _i, \, i=1,\dots ,k\}\) be any polyhedron not containing bilevel-feasible points in its interior. Then one can remove from the definition of \(S\) all the inequalities \(\alpha _i^T x + \beta _i^T y \le \gamma _i\) such that the half-space \(\{ (x,y)\in {\mathbb {R}}^n: \alpha _i^T x + \beta _i^T y \ge \gamma _i \}\) does not contain any bilevel-feasible solution.

Proof

Obvious, as the removal of any such inequality from the definition of \(S\) cannot bring bilevel-feasible points into it. \(\square \)

Note that the condition on the theorem refers to a closed half-space, i.e., the condition is \(\ge \gamma _i\) and not \(> \gamma _i\). This is consistent with the fact that \(S\) is allowed to contain bilevel-feasible points on its boundary.

Corollary 1

Let \(S=\{ (x,y) \in {\mathbb {R}}^n: \alpha _i^T x + \beta _i^T y \le \gamma _i, \, i=1,\dots ,k\}\) be any polyhedron not containing bilevel-feasible points in its interior. Then one can remove from the definition of \(S\) all the inequalities \(\alpha _i^T x + \beta _i^T y \le \gamma _i\) such that

$$\begin{aligned} \sum _{j=1}^{n_1} \max \{\alpha _{ij} x^-_j, \alpha _{ij} x^+_j\} + \sum _{j=1}^{n_2} \max \{\beta _{ij} {l^L_j}, \beta _{ij} {u^L_j}\} < \gamma _i. \end{aligned}$$

In our implementation, the above corollary is automatically applied within IC separation, by using the current lower/upper bounds \((x^-, x^+, {l^L}, {u^L})\) at the given B&B node.

Corollary 2

Let \(S^+(\hat{y})\) be the bilevel-free polyhedron defined by (29) and assume a lower bound FLB on \(d^T y\) is known. If \(d^T \hat{y}< FLB\), one can remove the inequality \(d^T y \ge d^T \hat{y}\) from the definition of \(S^+(\hat{y})\).

In our code, the above property is only exploited for “zero-sum” instances where the leader and follower objective functions satisfy \(F(x,y) = -f(x,y)\), i.e., \(c_x=0\) and \(c_y = -d\) in (10). Indeed, at every B&B node, the incumbent value \(z^*\) (say) is an upper bound for the leader objective function, hence \(FLB = -z^*\) is a lower bound for the follower’s one.

4.4 Separation of intersection cuts

We now address the question of how to cut a given vertex \((x^*,y^*)\) of \(\overline{\hbox {HPR}}\) by using an IC. Given that we use the cone associated with the current LP basis, as explained in Sect. 4.1, what remains is the choice of the bilevel-free set to be used.

As ICs require that \((x^*,y^*)\) belongs to the interior of the bilevel-free set, we concentrate on the extended polyhedron \(S^+(\hat{y})\) from Theorem 4. In doing so, we have to restrict ourselves to the cases where Assumption 3 holds. We also assume that (possibly after scaling) d is an integer vector, so as to replace the bilevel-infeasibility conditions of the type \(d^T y < d^T y^*\) by \(d^T y \le d^T y^*-1\).

As our ICs are locally valid cuts, in this section we denote by \((x^-, x^+, {l^L}, {u^L})\) the variable bounds at the current branching node, as modified by branching. Needless to say, notation \((A,B,b,d,{l^F,u^F})\) refers instead to the original follower MILP, that is required to be completely blind with respect to branching.

A very natural option for defining \(S^+(\hat{y})\) is to choose \(\hat{y}\) as an optimal solution of the follower MILP for \(x=x^*\), namely:

$$\begin{aligned} \mathbf{SEP-1:~~} \hat{y}\in \arg \min d^T y \end{aligned}$$
(36)
$$\begin{aligned} A x^* + B y&\le b \end{aligned}$$
(37)
$$\begin{aligned} {l^F} \le y&\le {u^F}\end{aligned}$$
(38)
$$\begin{aligned} y_j&\hbox { integer }&\forall j \in J_y \end{aligned}$$
(39)

Note that the above problem is always feasible when \((x^*,y^*)\) is an HPR solution. Also note that one cannot write \(\le b + \mathbf {1}\) in (37) as this would allow \((x^*,\cdot )\) to belong to the boundary of \(S^+(\hat{y})\).

In our B&C scheme, the computation of \(\hat{y}\) typically does not require extra effort as the follower MILP is solved anyway for the sake of checking bilevel feasibility of \((x^*,y^*)\) and, possibly, producing heuristic solutions.

By minimizing \(d^T \hat{y}\), the above model maximizes the distance of \((x^*,y^*)\) from the face of \(S^+(\hat{y})\) induced by \(d^T y \ge d^T \hat{y}\). This is an aggressive policy that works very well in some cases. However, a possible drawback is that the other faces of \(S^+(\hat{y})\) can be quite close to \((x^*,y^*)\), which tends to reduce the depth of the derived IC.

An alternative approach is to define \(\hat{y}\) so as to have a large number of removable inequalities according to Corollary 1, i.e., inequalities of type \((A x + B \hat{y})_i \le b_i+1\) with

$$\begin{aligned} \sum _{j \in N_x} \max \{A_{ij} x^-_j, A_{ij} x^+_j\} + (B \hat{y})_i \le b_i. \end{aligned}$$
(40)

This leads to the following alternative separation MILP (recall that m denotes the number of rows of B).

$$\begin{aligned} \mathbf{SEP-2:~~} \hat{y}\in \arg \min \sum _{i=1}^{m} w_i \end{aligned}$$
(41)
$$\begin{aligned} d^T y&\le d^T y^* - 1 \end{aligned}$$
(42)
$$\begin{aligned} B y + s&= b \end{aligned}$$
(43)
$$\begin{aligned} s_i + (L^{max}_i - L^*_i) \, w_i&\ge L^{max}_i,&\forall i=1,\dots ,m \end{aligned}$$
(44)
$$\begin{aligned} {l^F} \le y&\le {u^F} \end{aligned}$$
(45)
$$\begin{aligned} y_j&\hbox {~integer},&\forall j \in J_y \end{aligned}$$
(46)
$$\begin{aligned} s&\hbox {~free} \end{aligned}$$
(47)
$$\begin{aligned} w&\in \{0,1\}^m \end{aligned}$$
(48)

where, for each \(i=1,\dots ,m\),

$$\begin{aligned} L^*_i := \sum _{j \in N_x} A_{ij} x^*_j \ \le \ L^{max}_i := \sum _{j \in N_x} \max \{A_{ij} x^-_j, A_{ij} x^+_j\}. \end{aligned}$$

In the model, the binary variable \(w_i\) attains value 0 if the inequality \((A x + B \hat{y})_i \le b_i+1\) can be removed according to (40), \(w_i=1\) otherwise. The objective function (41) then minimizes the number of inequalities that cannot be removed.

Equations (43) define the free variable \(s_i = (b - B y)_i\) for each constraint, hence condition \(s \ge L^*\) implied by (44) actually imposes that the final solution \(\hat{y}\) satisfies \(A x^* + B \hat{y}\le b\). Together with (42), this guarantees that \((x^*,y^*)\) belongs to the interior of \(S^+(\hat{y})\).

In case \(w_i=0\), constraint (44) actually enforces the stronger condition \(s_i \ge L^{max}_i\), i.e., \(\sum _{j \in N_x} \max \{A_{ij} x^-_j, A_{ij} x^+_j\} + (B \hat{y})_i \le b_i\). Hence condition (40) holds and the corresponding inequality can be removed, as claimed.

4.5 Informed No-Good cuts

A known drawback of ICs is their dependency on the LP basis associated with the point to cut, which can create cut accumulation in the LP relaxation and hence shallow cuts and numerical issues. Moreover, ICs are not directly applicable if the point to cut is not a vertex of a certain LP relaxation of the problem at hand, as it happens e.g., when it is computed by the internal MILP heuristics.

We next describe a general-purpose variant of ICs whose derivation does not require any LP basis and is based on the well-known interpretation of ICs as disjunctive cuts. In a sense, we are replacing the cone defined by the LP basis by the cone induced by the tight bound constraints. It turns out that the resulting inequality is valid and violated by any bilevel infeasible solution of HPR in the relevant special case where all x and y variables are binary.

We are given a point \(\xi ^* =(x^*,y^*) \in {\mathbb {R}}^n\) and a polyhedron

$$\begin{aligned} S=\{ \xi \in {\mathbb {R}}^n: g_i^T \xi \le g_{i0}, \, i=1,\dots ,k\} \end{aligned}$$

whose interior contains \(\xi ^*\) but no bilevel-feasible points. Assume that variable-bound constraints \(\xi ^-\le \xi \le \xi ^+\) are present in the \(\overline{\hbox {HPR}}\), where some entries of \(\xi ^-\) or \(\xi ^+\) can be \(-\infty \) or \(+\infty \), respectively. Given \(\xi ^*\) with \(\xi ^-\le \xi ^* \le \xi ^+\), let

$$\begin{aligned} L :=\{ j\in \{1,\dots ,n\} : | \xi ^*_j - \xi ^-_j | \le |\xi ^*_j - \xi ^+_j |\} \end{aligned}$$

and \(U := \{1,\dots ,n\}\setminus L\). In other words, L contains the indices of the variables that are closer to their lower bound than to their upper bound, and vice-versa for U. For the relevant case where \(\xi ^*_j \in \{\xi ^-_j, \xi ^+_j\}\) for all \(j\in \{1,\dots ,n\}\), sets L and U then contain the indices of the variables at their lower or upper bound, respectively. Given the partition (LU), we define the corresponding linear mapping \(\xi \mapsto \overline{\xi }\in {\mathbb {R}}^n\) with

$$\begin{aligned} \overline{\xi }_j := {\left\{ \begin{array}{ll} \xi _j-\xi ^-_j, \quad \hbox { for } j\in L\\ \xi ^+_j - \xi _j, \quad \hbox { for } j\in U\\ \end{array}\right. } \end{aligned}$$

(variable shift and complement). By assumption, any feasible point \(\xi \) must satisfy the k-way disjunction

$$\begin{aligned} \bigvee _{i=1}^k \ \left( \sum _{j=1}^ng_{ij} \xi _j \ge g_{i0}\right) , \end{aligned}$$
(49)

whereas \(\xi ^*\) violates all the above inequalities. Now, each term of (49) can be rewritten in terms of \(\overline{\xi }\) as

$$\begin{aligned} \sum _{j=1}^n \, \overline{g}_{ij} \, \overline{\xi }_j \ge \overline{\beta }_i := g_{i0} - \sum _{j\in L} g_{ij} \xi ^-_j - \sum _{j\in U} g_{ij} \xi ^+_j, \end{aligned}$$
(50)

with \(\overline{g}_{ij} := g_{ij}\) if \(j\in L, \overline{g}_{ij}=-g_{ij}\) otherwise. If \(\overline{\beta }_i > 0\) for all \(i=1,\dots ,k\), one can normalize the above inequalities to get \(\sum _{j=1}^n (\overline{g}_{ij}/\overline{\beta }_i)\, \overline{\xi }_j\) \(\ge 1\) and derive the valid disjunctive cut in the \(\overline{\xi }\) space

$$\begin{aligned} \sum _{j=1}^n \overline{\gamma }_j \overline{\xi }_j \ge 1, \end{aligned}$$
(51)

where \(\overline{\gamma }_j := \max \{\overline{g}_{ij}/\overline{\beta }_i: i=1,\dots ,k\}\). Finally, one can transform it back to the \(\xi \) space in the obvious way.

It is easy to see that, in case \(\xi ^*_j \in \{\xi ^-_j,\xi ^+_j\}\) for all \(j\in \{1,\dots ,n\}\), one has \(\overline{\beta }> 0\), hence the above cut is indeed valid and obviously violated as \(\overline{\xi }^*=0\). In all other cases, the above cut separation is just heuristic.

Inequality (51) is called an Informed No-Good (ING) cut as it can be viewed as a strengthening of the following no-good cut often used for bilevel problems with all-binary variables—and in many other Constraint Programming (CP) and Mathematical Programming (MP) contexts:

$$\begin{aligned} \sum _{j\in L} \xi _j + \sum _{j \in U} (1-\xi _j) \ge 1. \end{aligned}$$
(52)

The cut above corresponds to the very generic choice

$$\begin{aligned} S=\{ \xi \in {\mathbb {R}}^n: \xi _j \le 1 \, \forall j \in L, \ 1-\xi _j \le 1 \, \forall j \in U \} \end{aligned}$$

and is violated by \(\xi ^*\) but is satisfied by any other binary point, hence resulting into a very weak cut. To the best our knowledge, ING cuts are new; they will hopefully be useful in other CP and MP contexts.

5 Computational results

To evaluate the performance of our B&C solution method, we implemented it (in C language) on top of the general-purpose MILP solver IBM ILOG CPLEX 12.6.3 using callbacks.

Internal CPLEX’s heuristics as well as preprocessing have been deactivated in all experiments. IC separation is applied both to fractional solutions (in the so-called usercut callback) and to integer solutions (in the so-called lazyconstraint callback). For fractional solutions, ICs whose normalized violation is very small are just skipped, and a maximum number of cuts max_node_cuts (a given parameter) is allowed to be generated at each node.

Internal numerical-precision thresholds for integrality and constraint-satisfaction tests are set to a very small value (\(10^{-9}\)) so as to guarantee a very precise overall computation.

Our computational study has been performed on an Intel Xeon E3-1220V2 3.1GHz, with 16GB of RAM. This is a quad-core processor launched by Intel in 2012 and credited for 1892 Mflop/s in the Linpack benchmark report of Dongarra [10].

Computing times reported in what follows are in wall-clock seconds and refer to 4-thread runs. The time limit for each run was set to 3600 wall-clock seconds.

Table 1 Our testbed

5.1 Testbed

Table 1 summarizes details about the data sets that have been considered in our computational study. The following four data sources with a total number of 302 instances have been considered: CARAMIA-MARI (available from the authors of [5] upon request), DENEGRE and INTERDICTION (both available at https://github.com/tkralphs/MibS/tree/library/data in February 2016), and MIPLIB (available at https://msinnl.github.io/pages/bilevel.html).

Class CARAMIA-MARI contains 70 randomly generated instances with \(n_1,n_2\in \{5,10,15\}\), with no leader constraints and with \(m=n_1 + n_2\) follower constraints. All variables are required to be integer and all coefficients are randomly generated integer values; see [5] for further details.

Instances of class DENEGRE have been proposed in [7]. They consist of \(n_1 \in \{5,10,15\}\) integer leader variables, and the number of integer follower variables \(n_2\) is set such that \(n_1 + n_2 = 20\). There are \(m \in \{20, 30, 40\}\) follower constraints and no constraints at the leader. All coefficients are integers in the range \([-50,50]\).

Instances of class INTERDICTION have the following structure. All variables are binary and each variable in the leader is associated with a variable in the follower, and vice-versa. The leader has a single constraint that is called the interdiction-budget constraint. The follower problem consists of some combinatorial optimization problem defined on the y variables only, plus additional constraints that involve both x and y variables and have the form \(x_i+y_i \le 1\). In other words, the leader is allowed to interdict to the follower the use of some items, subject to the interdiction budget. The leader and follower problems share the same objective function, with opposite signs. In class INTERDICTION, the follower problem is an assignment problem (25 instances with \(25+25\) variables) or a knapsack problem (100 instances with up to \(50+50\) variables). Although interdiction problems can in some cases be treated with specialized ad-hoc algorithms exploiting the problem structure (as, e.g., done in [7]), in our computational study they are treated as general bilevel problems.

Finally, class MIPLIB has been introduced in [11] with the aim of producing very challenging instances for MIBLP solvers. It is derived from 19 instances of MILPLIB 3.0 [4] containing only binary variables. They have been converted into bilevel problems by labeling the first \(Y\%\) (rounded up) variables as y’s, and the remaining ones as x’s, with \(Y \in \{10,50,90\}\), thus resulting in 57 instances in total. All constraints in the resulting model belong to the follower subproblem, while the objective function is used as the leader objective \(c_x^T x + c_y^T y\). Finally, the follower objective is defined as \(d^T y = - c_y^T y\). In [11], only 30 out of 57 instances from this class have been considered—those containing equality constraints were left out, for the sake of comparison with an alternative solver which could not handle equality constraints. By design, instances in class MIPLIB are much larger (and difficult) than those in the other classes, and involve up to about 80,000 HPR variables and up to about 5000 follower constraints.

5.2 Benchmark solver

In order to have a benchmark code, we implemented the solution method recently proposed in [7], called BENCHMARK in what follows. To have an apple-to-apple comparison, we decided to embed the cuts proposed in [7, cf. Proposition 2.2] within our own CPLEX-based code. These cuts can only be applied to pure integer MIBLP instances in which all variables are integer-constrained, and all constraint coefficients are integer as well. They are used to cut-off a bilevel-infeasible integer vertex \((x^*,y^*)\) of \(\overline{\hbox {HPR}}\) at a given node, so they have been implemented in the CPLEX’s lazyconstraint callback. Each such cut is obtained by adding up all tight constraints (including variable bounds) at \((x^*,y^*)\), written in their \(\ge \) form, and then adding 1 to the right-hand side. Note that the resulting cut is locally valid, and requires that all coefficients in the node-LP matrix are integer (meaning that one is not allowed to generate other cuts with non-integer coefficients); see [7] for details.

A comparison with the results on 30 instances (that were available online) from the set DENEGRE reported in [7, cf. Table 2.4] is given in Table 2. Computing times for the original implementation refer to an eight-core AMD Opteron Processor 6128 @2.0 Ghz with 32GB of memory, launched in March 2010. Time limit for BENCHMARK was set to 3600 wall-clock seconds, while a time limit of 30,000 seconds was used in [7]. Table 2 reports the value of the best solution found (BestSol), the computing time in seconds (t[s]), the percentage gap between the final lower and upper bounds (%GAP), and the speedup of BENCHMARK with respect to the original implementation (column speedup, not reported in case of both hit the time limit).

According to the table, our BENCHMARK implementation is about 10-100 times faster (and solves to proven optimality two more instances) than the original one. Moreover, for problems not solved to proven optimality, BENCHMARK systematically produces better lower and upper bounds. This is not surprising, as our implementation is based on the commercial software CPLEX, while the original one uses the open-source software COIN-OR (BLIS).

We can therefore conclude that our benchmark code BENCHMARK represents in fair way a state-of-the-art exact solver for general MIBLPs.

Table 2 Performance of our BENCHMARK code

5.3 Results

In this subsection we compare the performance of six alternative settings for our MIBLP solver, namely:

  • SEP-1a: our B&C solver using model SEP-1 of Sect. 4.4 for IC separation, and generating at most max_node_cuts=20 cuts at each node (including root);

  • SEP-2a: our B&C solver with SEP-2 and max_node_cuts=20 for all nodes (including root);

  • SEP-1b: our B&C solver with SEP-1 and max_node_cuts=20 for the root node only (=0 for all other nodes);

  • SEP-2b: our B&C solver with SEP-2 and max_node_cuts=20 for the root node only (=0 for all other nodes);

  • ING: our B&C solver with ING cuts applied to the SEP-1 bilevel-free polyhedron; only integer points being separated, i.e., max_node_cuts=0 for all nodes;

  • BENCHMARK: our benchmark code implementing cuts in [7].

The computational analysis reported in [11] shows that ING cuts significantly outperform standard no-good cuts, hence the latter were not considered in our computational tests. Note that, similarly to no-good cuts, ING cuts only work for binary problems, thus setting ING cannot be applied to the instances of classes CARAMIA-MARI and DENEGRE, as they contain general-integer variables.

Table 3 Summary of results obtained for the CARAMIA-MARI class. All 70 instances are solved to optimality by each of the settings

First, we provide a summary of our results on the class of CARAMIA-MARI instances, see Table 3. Given that these instances are solved by each of our settings within fractions of a second, we conclude that they are too easy and do not consider them in the remainder of this computational study. Note that the best performing approach presented in [5] needed on average 35 seconds for this class, on a PC Pentium Core 2 Duo with a 2 GHz processor and 1 GB RAM and using CPLEX 12.3.

We next compare the performance of our different settings using performance profiles [9]. To construct performance profiles, for each setting \(s \in S\) and instance \(p \in P\), a performance ratio

$$\begin{aligned} r_{p,s}=\frac{t_{p,s}}{\min _{s' \in S}\{t_{p,s'}\}} \end{aligned}$$

is calculated, where \(t_{p,s}\) is the time setting s needs to solve instance p to optimality. If a setting s does not solve instance \(p, r_{p,s}\) is set to \(r_M\), which is a value larger than any \(r_{p,s}\), e.g., \(r_M = \max _{s \in S, p \in P} r_{p,s}+1\). In the profiles, the cumulative distribution function of the performance ratio

$$\begin{aligned} \rho _s(\tau )=\frac{100}{|P|}\big |\{p \in P: r_{ps} \le \tau \}\big | \end{aligned}$$

is displayed for each setting \(s \in S\). In particular, the value \(\rho _s(1)\) is the percentage of instances, for which setting s is the fastest, and \(\rho _s(r_M-1)\) gives the percentage of instances setting s manages to solve to optimality. In the performance profile plot, those two values correspond, respectively, to the leftmost and rightmost point of the graph for setting s.

Fig. 2
figure 2

Performance profile plot over all instances (classes DENEGRE, INTERDICTION and MIPLIB); setting ING is missing as it only works for binary problems. For each plotted line, the leftmost value gives the percentage of instances for which the corresponding setting was the fastest, while the rightmost value gives the percentage of instances solved to proven optimality

The performance profile plot for all 232 instances from the classes MIPLIB, INTERDICTION and DENEGRE is given in Figure 2. Note that the horizontal axis is log-scaled. Analyzing the obtained results, we may conclude that, consistently over all different instance classes, our intersection cuts embedded in the branch-and-cut framework significantly outperform the BENCHMARK code. Setting SEP-2a turns out to be the best performing one: for 28% of all instances, SEP-2a is the fastest approach, and it manages to solve to optimality 71% of them within the time limit of one hour. In comparison, the BENCHMARK code is the fastest one for only 8% of all instances and manages to solve only 42% of them to optimality.

To evaluate usefulness of ING cuts, we separately considered only binary instances (namely, those from classes INTERDICTION and MIPLIB) and compared our setting ING with the five remaining ones. The corresponding performance profile is reported in Figure 3. We observe that ING outperforms BENCHMARK by a large margin, being the fastest performing approach for about 15% of all binary instances, and solving to optimality 48% of them. On the contrary, BENCHMARK is almost never the fastest one, and it manages to solve only 32% of all binary instances to optimality.

Fig. 3
figure 3

Performance profile plot over all binary instances (classes INTERDICTION and MIPLIB). Setting ING is now included

In Table 4 we further summarize the obtained results. We provide the number of solved instances (#) per each class, the shifted geometric mean of computing times and the number of branch-and-cut nodes, as well as the average gap. The shifted geometric mean for a given shift s and values \(v_1, v_2, \ldots , v_n\) is defined as \(\root n \of {\prod (v_i+s)}-s\) (see, e.g., [1]). Computing times and number of nodes are shifted by 10 seconds and 100 nodes, respectively. The percentage gap for each instance is calculated as \(\min \{100, \; 100\cdot (UB-LB)/(|UB|+10^{-10})\}\), i.e., gaps are clipped to 100% to avoid too-large values that would make the comparison harder.

Table 4 Summary of obtained results. We report the number of solved instances (#), the shifted geometric mean for computing time (t[s]) and for number of nodes (nodes), and the average gaps (\(g[\%]\))

Analyzing Table 4, one easily concludes that the most-challenging instances considered in our computational study are those from the MIPLIB class. Computing times, number of nodes and final gaps are much larger than the respective values of DENEGRE class. The setting SEP-2a turns out to be the best performing one for classes MIPLIB and INTERDICTION. Its success is particularly striking for the INTERDICTION class, for which the average gap is only 4.64%, whereas for the remaining five settings this gap remains larger than 30%. This indicates that SEP-2 model for separating ICs has a practical value when the follower has a clean (combinatorial) substructure, which is exploited by our inequality-removal scheme for enlarging the bilevel-free polyhedron.

A slightly different behavior can be observed for instances from the DENEGRE class. Recall that this class contains very small instances with up to 20 integer variables. Most of these instances are very easy and can be solved within a fraction of a second. See, for example, Table 1, which reports the computing times for BENCHMARK over the largest 30 instances from this class. For such small instances, separating intersection cuts at every node of the B&C tree does not pay off in general. This explains why the settings with max_node_cuts=0 at the non-root B&C nodes (SEP-1b, SEP-2b and also BENCHMARK) perform much better for these particular cases.

6 Conclusions

We have presented a finitely-convergent (under appropriate conditions) branch-and-cut method for MIBLP, that is intended to be a modification of a classical MILP scheme. By design, our approach focuses on the add-on extras required to convert a branch-and-cut MILP exact code into a valid MIBLP solver. In this way, we inherit the rich set of tools (cuts, heuristics, propagations, etc.) available in modern MILP solvers, and concentrate on bilevel-specific issues.

Valid intersection cuts for MIBLP have been proposed, along with the corresponding separation procedures. We have also introduced a class of “Informed No-Good” (ING) cuts that can be used in the pure-binary case. An extensive computational analysis on different classes of instances from the literature has been reported, showing that the proposed approach outperforms previous proposals by a large margin.

Future work should address the use of different bilevel-free polyhedra for deriving deeper intersection cuts. Different kinds of cuts not requiring the LP basis (like ING cuts) are also of interest and should be investigated as well.