1 Introduction

Consider the quadratic optimization problem with indicator variables

$$\begin{aligned} \text {(QOI)}\quad \min \bigg \{ a'x + b'y + y'Ay \ : \ (x,y) \in C, \ 0 \le y \le x, \ x \in \{0,1\}^N \bigg \}, \end{aligned}$$

where \(N=\{1,\ldots ,n\}\), a and b are n-vectors, A is an \(n \times n\) symmetric matrix and \(C \subseteq \mathbb {R}^{N} \times \mathbb {R}^{N}\). The binary variables x indicate a selected subset of N and are often used to model non-convexities such as cardinality constraints and fixed charges. (QOI) arises in best subset selection for linear regression [10], control [23], filter design [47], and portfolio optimization [11], among others. In this paper, we give strong convex relaxations for the related mixed-integer set

$$\begin{aligned} S=\big \{(x,y,t)\in \{0,1\}^N\times \mathbb {R}^{N}\times \mathbb {R}:y'Qy\le t,\; 0 \le y_i\le x_i \text { for all }i\in N\big \}, \end{aligned}$$

where Q is an M-matrix [43], i.e., \(Q\succeq 0\) and \(Q_{ij}\le 0\) if \(i\ne j\). M-matrices arise in the analysis of Markov chains [30]. Convex quadratic programming with an M-matrix is also studied in its own right [37]. Quadratic minimization with an M-matrix arises directly in a variety of applications including portfolio optimization with transaction costs [33] and image segmentation [27].

There are numerous approaches in the literature for deriving strong formulations for (QOI) and S. Dong and Linderoth [19] describe lifted inequalities for (QOI) from its continuous quadratic optimization counterpart over bounded variables. Bienstock and Michalka [12] give a characterization of linear inequalities obtained by strengthening gradient inequalities of a convex objective function over a non-convex set. Convex relaxations of S can also be constructed from the mixed-integer epigraph of the bilinear function \(\sum _{i\ne j}Q_{ij}y_iy_j\). There is an increasing amount of recent work focusing on bilinear functions [e.g., [13, 14, 35]]. However, the convex hull of such functions is not fully understood even in the continuous case. More importantly, considering the bilinear functions independently of the quadratic function \(\sum _{i\in N}Q_{ii}y_i^2\) may result in weaker formulations for S. Another approach, applicable to general mixed-integer optimization, is to derive a strong formulation based on disjunctive programming [8, 17, 45]. Specifically, if a set is defined as the disjunction of convex sets, then its convex hull can be represented in an extended formulation using perspective functions. Such extended formulations, however, require creating a copy of each variable for each disjunction, and lead to prohibitively large formulations even for small-scale instances. There is also an increasing body of work on characterizing the convex hulls in the original space of variables, but such descriptions may be highly complex even for a single disjunction, e.g., see [7, 9, 31, 39].

The convex hull of S is well-known for a couple of special cases. When the matrix Q is diagonal, the quadratic function \(y'Qy\) is separable and the convex hull of S can be described using the perspective reformulation [21]. This perspective formulation has a compact conic quadratic representation [2, 24] and is by now a standard model strengthening technique for mixed-integer nonlinear optimization [15, 25, 38, 48]. In particular, a convex quadratic function \(y'Ay\) is decomposed as \(y'Dy+y'Ry\), where \(A=D+R\), \(D, R\succeq 0\) and D is diagonal, and then each diagonal term \(D_{ii} y_i^2 \le t_i\), \(i \in N\), is reformulated as \(y_i^2 \le t_i x_i\). Such decomposition and strengthening of the diagonal terms are also standard for the binary restriction, where \(y_i=x_i\), \(i\in N\), in which case \(x'Ax = \sum _{i\in N}D_{ii}x_i+x'Rx\) [e.g., [3, 44]]. The binary restriction of S, where \(y_i=x_i\) and \(Q_{ij}\le 0\), \(i \ne j\), is also well-understood, since in that case the quadratic function \(x'Qx\) is submodular [40] and min \(\{a' x + x'Qx: x \in \{0,1\}^n\}\) is a minimum cut problem [28, 42] and, therefore, is solvable in polynomial time.

While the set S with an M-matrix is interesting in its own right, the convexification results on S can also be used to strengthen a general quadratic \(y'Ay\) by decomposing A as \(A=Q+R\), where Q is an M-matrix, and then applying the convexification results in this paper only to the \(y'Qy\) term with negative off-diagonal coefficients, generalizing the perspective reformulation approach above. We demonstrate this approach for portfolio optimization problems with negative as well as positive correlations; our computations indicate significant additional strengthening over the perspective formulation through exploiting the negative correlations.

The key idea for deriving strong formulations for S is to decompose the quadratic function in the definition of S as the sum of quadratic functions involving one or two variables:

$$\begin{aligned} y'Qy=\sum _{i=1}^n\left( \sum _{j=1}^nQ_{ij}\right) y_i^2-\sum _{i=1}^n\sum _{j=i+1}^n Q_{ij}(y_i-y_j)^2. \end{aligned}$$
(1)
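As a quick sanity check of identity (1), the following minimal sketch (assuming numpy is available; the construction of the random M-matrix is ours) verifies the decomposition numerically.

```python
# Minimal numerical check of decomposition (1); numpy assumed available.
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Build a random M-matrix: nonpositive off-diagonal entries and diagonal dominance.
off = -rng.random((n, n))
off = np.triu(off, 1) + np.triu(off, 1).T            # symmetric, nonpositive off-diagonal
Q = off + np.diag(-off.sum(axis=1) + rng.random(n))  # diagonally dominant, hence Q >= 0

y = rng.random(n)
lhs = y @ Q @ y
rhs = sum(Q[i].sum() * y[i] ** 2 for i in range(n)) \
      - sum(Q[i, j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(i + 1, n))
assert np.isclose(lhs, rhs)
```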

Since a univariate quadratic function with an indicator is well-understood, we turn our attention to studying the mixed-integer set with two continuous and two indicator variables:

$$\begin{aligned} X=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}^2\times \mathbb {R}: (y_1-y_2)^2\le t,\; 0 \le y_i\le x_i, \ i=1,2\right\} . \end{aligned}$$

Frangioni et al. [22] also construct strong formulations for (QOI) based on \(2\times 2\) decompositions. In particular, they characterize quadratic functions that can be decomposed as the sum of convex quadratic functions with at most two variables. They utilize the disjunctive convex extended formulation for the mixed-integer quadratic set

$$\begin{aligned} {\hat{X}}=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}^2\times \mathbb {R}: q(y)\le t,\; 0 \le y_i\le x_i, \ i=1,2\right\} , \end{aligned}$$

where q(y) is a general convex quadratic function. The authors report that the formulations are weaker when the matrix A is an M-matrix, and remark on the high computational burden of solving the convex relaxations due to the large number of additional variables. Additionally, Jeon et al. [29] give conic quadratic valid inequalities for \({\hat{X}}\), which can be easily projected into the original space of variables, and demonstrate their effectiveness via computations. However, a convex hull description of \({\hat{X}}\) in the original space of variables is unknown.

In this paper, we improve upon previous results for the sets S and X. In particular, our main contributions are (i) showing, under mild assumptions, that the minimization of a quadratic function with an M-matrix and indicator variables is equivalent to a submodular minimization problem and, hence, solvable in polynomial time; (ii) giving the convex hull description of X in the original space of variables—the resulting formulations for S are at least as strong as the ones used by Frangioni et al. and require substantially fewer variables; (iii) proposing conic quadratic inequalities amenable to use with conic quadratic MIP solvers—the proposed inequalities dominate the ones given by Jeon et al.; (iv) demonstrating the strength and performance of the resulting formulations for (QOI).

Outline The rest of the paper is organized as follows. In Sect. 2 we review the previous results for S and X. In Sect. 3 we study the relaxations of S and X, where the constraints \(0 \le y_i\le x_i\) are relaxed to \(y_i(1-x_i)=0\), and the related optimization problem. In Sect. 4 we give the convex hull description of X. The convex hulls obtained in Sects. 3 and 4 cannot be immediately implemented with off-the-shelf solvers in the original space of variables. Thus, in Sect. 5 we propose valid conic quadratic inequalities and discuss their strength. In Sect. 6 we give extensions to quadratic functions with positive off-diagonal entries and continuous variables unrestricted in sign. In Sect. 7 we provide a summary of computational experiments and in Sect. 8 we conclude the paper.

Notation Throughout the paper, we use the following convention for division by 0: \(\nicefrac {0}{0}=0\) and \(\nicefrac {a}{0}=\infty \) if \(a>0\). In particular, the function \(p:[0,1]\times \mathbb {R}_+\rightarrow \mathbb {R}_+\) given by \(p(x,y)=\nicefrac {y^2}{x}\) is the closure of the perspective function of the quadratic function \(q(y)=y^2\), and is convex [e.g., [26], p. 160]. For a set \(X\subseteq \mathbb {R}^N\), \(\text {conv}(X)\) denotes the convex hull of X. Throughout, Q denotes an \(n\times n\) M-matrix, i.e., \(Q \succeq 0\) and \(Q_{ij}\le 0\) for \(i\ne j\).

2 Preliminaries

In this section we briefly review the relevant results on the binary restriction of S and the previous results on set X.

2.1 The binary restriction of S

Let \(S_B\) be the binary restriction of S, i.e. \(y=x \in \{0,1\}^n\). In this case, the decomposition

$$\begin{aligned} x'Qx = \sum _{i=1}^n\left( \sum _{j=1}^nQ_{ij}\right) x_i^2-\sum _{i=1}^n\sum _{j=i+1}^n Q_{ij}(x_i-x_j)^2 \le t \end{aligned}$$
(2)

leads to \(\text {conv}(S_B)\), by simply taking the convex hull of each term. Indeed, the quadratic problem \(\min \big \{x'Qx: x \in \{0,1\}^n \big \}\) is equivalent to an undirected min-cut problem [e.g. [42]] and can be formulated as

$$\begin{aligned} \min \sum _{i=1}^n\left( \sum _{j=1}^nQ_{ij}\right) x_i - \sum _{i=1}^n\sum _{j=i+1}^n Q_{ij} t_{ij}: x_i - x_j \le t_{ij}, \ x_j - x_i \le t_{ij}, \ 0 \le x \le 1. \end{aligned}$$

Decomposition (2) leading to a simple convex hull description of \(S_B\) in the binary case is our main motivation for studying decomposition (1) with the indicator variables.
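For illustration, the min-cut LP above can be assembled directly for a given M-matrix Q; a minimal sketch, assuming scipy is available (the function name and variable ordering are ours):

```python
import numpy as np
from scipy.optimize import linprog

def binary_minimization_lp(Q):
    """LP formulation above for min{x'Qx : x in {0,1}^n} with an M-matrix Q."""
    n = Q.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Objective: sum_i (sum_j Q_ij) x_i - sum_{i<j} Q_ij t_ij.
    c = np.concatenate([Q.sum(axis=1), [-Q[i, j] for (i, j) in pairs]])
    A, b = [], []
    for k, (i, j) in enumerate(pairs):
        for sign in (1.0, -1.0):
            row = np.zeros(n + len(pairs))
            row[i], row[j], row[n + k] = sign, -sign, -1.0   # +/-(x_i - x_j) <= t_ij
            A.append(row)
            b.append(0.0)
    bounds = [(0, 1)] * n + [(None, None)] * len(pairs)
    res = linprog(c, A_ub=np.array(A) if A else None,
                  b_ub=np.array(b) if b else None, bounds=bounds)
    return res.x[:n], res.fun
```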

2.2 Previous results for set X

Here we review the valid inequalities of Jeon et al. [29] for X. Although their construction is not directly applicable as they assume a strictly convex function, one can utilize it to obtain limiting inequalities. For \(q(y)=y'Ay\) the inequalities of Jeon et al. are described via the inverse of the Cholesky factor of A. However, for X, we have \(q(y)=(y_1-y_2)^2\), i.e., \(q(y)=y'Ay\) with \(A=\left[ {\begin{matrix} 1 &{} -1 \\ -1 &{} 1\end{matrix}} \right] \), a singular matrix whose Cholesky factor is not invertible.

However, if the matrix is given by \(A= \left[ {\begin{matrix} d_1 &{} -1 \\ -1 &{} d_2\end{matrix}} \right] \) with \(d_1,d_2> 1\), then their approach yields three valid inequalities:

$$\begin{aligned}&d_2\frac{y_2^2}{x_2}-\frac{1}{d_1}x_1+\left( \frac{d_1d_2-1}{d_1}\right) \frac{y_2^2}{x_2}\le t\\&(d_2-1)\frac{y_2^2}{x_2}+d_1\frac{y_1^2}{x_1}+\frac{x_2}{d_1}-2x_2\le t\\&\left( \frac{d_1d_2-1}{d_1}\right) \frac{y_2^2}{x_2}+\frac{\left( \sqrt{d_1}y_1-\sqrt{\frac{1}{d_1}}y_2\right) ^2}{x_1+x_2}\le t. \end{aligned}$$

As \(d_1, d_2 \rightarrow 1\), we arrive at three limiting valid inequalities for X.

Proposition 1

The following convex inequalities are valid for X:

$$\begin{aligned} \frac{y_2^2}{x_2}-x_1&\le t, \end{aligned}$$
(3)
$$\begin{aligned} \frac{y_1^2}{x_1}-x_2&\le t, \end{aligned}$$
(4)
$$\begin{aligned} \frac{\left( y_1-y_2\right) ^2}{x_1+x_2}&\le t. \end{aligned}$$
(5)

For completeness, we verify here the validity of the limiting inequalities directly. The validity of inequality (3) is easy to see: observe that \(\nicefrac {y_2^2}{x_2}\le 1\) for \((x,y)\in X\); then, for \(x_1=0\), (3) reduces to the perspective formulation for the quadratic constraint \(y_2^2\le t\), and for \(x_1=1\) we have \(\nicefrac {y_2^2}{x_2}-x_1\le 0 \le t\). The validity of inequality (4) is proven identically. Finally, inequality (5) is valid since it forces \(y_1=y_2\) when \(x_1=x_2=0\), and is dominated by the original inequality \((y_1-y_2)^2\le t\) for other integer values of x.
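The validity of (3)–(5) can also be confirmed by brute force over the integer values of x; a small numerical sketch, assuming numpy is available and using the division conventions from the Notation paragraph (the helper frac is ours):

```python
# Brute-force check of Proposition 1 on sampled points of X with t = (y1 - y2)^2.
import itertools
import numpy as np

def frac(num, den):  # paper's convention: 0/0 = 0 and a/0 = +inf for a > 0
    return 0.0 if num == 0.0 else (num / den if den > 0.0 else float("inf"))

rng = np.random.default_rng(0)
for x1, x2 in itertools.product([0, 1], repeat=2):
    for _ in range(1000):
        y1, y2 = x1 * rng.random(), x2 * rng.random()
        t = (y1 - y2) ** 2
        assert frac(y2 ** 2, x2) - x1 <= t + 1e-9          # inequality (3)
        assert frac(y1 ** 2, x1) - x2 <= t + 1e-9          # inequality (4)
        assert frac((y1 - y2) ** 2, x1 + x2) <= t + 1e-9   # inequality (5)
```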

Inequalities (3)–(5) are not sufficient to describe conv(X) though. In the next two sections we describe conv(X) and give new conic quadratic valid inequalities dominating (3)–(5) for X.

3 The unbounded relaxation

In this section we study the unbounded relaxations of S and X obtained by dropping the upper bound on the continuous variables:

$$\begin{aligned} S_U&=\left\{ (x,y,t)\in \{0, 1\}^N\times \mathbb {R}_+^{N}\times \mathbb {R}:y'Qy\le t,\;y_i(1-x_i)=0 \text { for all }i\in N\right\} ,\\ X_U&=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}_+^2\times \mathbb {R}: (y_1-y_2)^2\le t,\; y_i(1-x_i)=0,\; i=1,2\right\} . \end{aligned}$$

In Sect. 3.1 we show that the minimization of a linear function over \(S_U\) is equivalent to a submodular minimization problem and, consequently, solvable in polynomial time. In Sect. 3.2, we describe \(\text {conv}(X_U)\) and in Sect. 3.3 we use the results in Sect. 3.2 to derive valid inequalities for \(S_U\).

3.1 Optimization over \(S_U\)

We now show that the optimization of a linear function over \(S_U\) can be solved in polynomial time under a mild assumption on the objective function. Consider the problem

$$\begin{aligned} \text {(P)} \quad \min \left\{ a'x+b'y+t: (x,y,t) \in S_U \right\} , \end{aligned}$$

where Q is a positive definite M-matrix and \(b\le 0\). We show that (P) is a submodular minimization problem. The positive definiteness assumption on Q ensures that an optimal solution exists. Otherwise, if there is \(y \ge 0\) with \(y'Qy = 0\), the problem may be unbounded. The assumption \(b\le 0\) is satisfied in most applications (e.g., see Sects. 7.1 and 7.3). If \(b > 0\), then \(y=0\) in any optimal solution.

Proposition 2

(Characterization 15 [43]) A positive definite M-matrix Q is inverse-positive, i.e., its inverse satisfies \((Q^{-1})_{ij}\ge 0\) for all \(i,j\).

Proposition 3

Problem (P) is equivalent to a submodular minimization problem and it is, therefore, solvable in polynomial time.

Proof

We assume that \(a\ge 0\) (otherwise \(x=1\) in any optimal solution) and that an optimal solution exists. Given an optimal solution \((x^*,y^*)\) to (P), let \(T=\left\{ i\in N: y_i^*>0\right\} \), let \(b_T\) be the subvector of b induced by T, and let \(Q_T\) be the submatrix of Q induced by T. Then, from the KKT conditions, we find \( b_T+2Q_T y_T=0 \Leftrightarrow y_T=-\nicefrac {Q_T^{-1}b_T}{2} \cdot \) Thus, an optimal solution satisfies \(b'y^*+{y^*}'Qy^*=-\frac{b_T'Q_T^{-1}b_T}{4} \cdot \)

Consequently, defining \(\theta _{ij}:2^N\rightarrow \mathbb {R}\) for \(i,j\in N\) as \( \theta _{ij}(T)= (Q_T^{-1})_{ij} \text { if }i, j\in T \text { and } 0 \text { o.w.,} \) observe that (P) is equivalent to the binary minimization problem

$$\begin{aligned} \min _{T\subseteq N} \ \ a(T)-\frac{1}{4}\sum _{i\in N}\sum _{j\in N}b_ib_j \theta _{ij}(T) \cdot \end{aligned}$$

Note that since \(Q_T\) is a positive definite M-matrix for any \(T\subseteq N\), \(Q_T= \mu I_T-P_T\), where \(P_T\) is a nonnegative matrix and the largest eigenvalue of \(P_T\) is less than \(\mu \). By scaling, we may assume that \(\mu =1\). Moreover, \(Q_T^{-1}=(I-P_T)^{-1}=\sum _{\ell =0}^\infty P_T^{\ell }\) [e.g. [49]]. For \(\ell \in \mathbb {Z}_+\) and all \(i,j\in N\) let \( {\bar{\theta }}_{ij}^\ell (T)=(P_T^\ell )_{ij} \text { if }i,j\in T, \text { and } 0 \text { o.w.} \) Note that \(\theta _{ij}(T)=\sum _{\ell =0}^\infty {\bar{\theta }}_{ij}^\ell (T)\). Finally, define for \(k\in N\) and \(T\subseteq N{\setminus }\{k\}\) the increment function \(\rho _{ij}^\ell (k,T)= {\bar{\theta }}_{ij}^\ell (T\cup \{k\})-{\bar{\theta }}_{ij}^\ell (T)\).

Claim

For all \(i,j\in N\) and \(\ell \in \mathbb {Z}_+\), \({\bar{\theta }}_{ij}^\ell \) is a monotone supermodular function.

Proof

The claim is proved by induction on \(\ell \).

  • Base case, \(\ell =0\): Let \(k\in N\) and \(T\subseteq N{\setminus } \{k\}\). Note that \(P_T^0=I_T\). Thus \(\rho _{kk}^0(k,T)=1\), and \(\rho _{ij}^0(k,T)=0\) for all cases except \(i=j=k\). Thus, the marginal contributions are constant and \({\bar{\theta }}_{ij}^0\) is supermodular. Monotonicity can be checked easily.

  • Induction step: Suppose \({\bar{\theta }}_{ij}^\ell \) is supermodular and monotone for all \(i,j\in N\). Observe that \({\bar{\theta }}_{ij}^{\ell +1}(T)=\sum _{t\in N}{\bar{\theta }}_{it}^{\ell }(T)P_{tj}\) if \(i,j\in T\) and \({\bar{\theta }}_{ij}^{\ell +1}(T)=0\) otherwise. Monotonicity of \({\bar{\theta }}_{ij}^{\ell +1}\) follows immediately from the monotonicity of the functions \({\bar{\theta }}_{it}^{\ell }\). Now let \(k\in N\) and \(T_1\subseteq T_2\subseteq N{\setminus }\{k\}\). To prove supermodularity, we check that \(\rho _{ij}^{\ell +1}(k,T_2)-\rho _{ij}^{\ell +1}(k,T_1)\ge 0\) by considering all cases:

    • \(k\not \in \{i,j\}\): If \(\{i,j\}\subseteq T_1\) then \(\rho _{ij}^{\ell +1}(k,T_2)-\rho _{ij}^{\ell +1}(k,T_1)=\sum _{t\in N}(\rho _{it}^{\ell }(k,T_2)-\rho _{it}^{\ell }(k,T_1))P_{tj}\ge 0\) by supermodularity of functions \({\bar{\theta }}_{it}^\ell \); if \(\{i,j\}\not \subseteq T_1\) and \(\{i,j\}\subseteq T_2\) then \(\rho _{ij}^{\ell +1}(k,T_2)-\rho _{ij}^{\ell +1}(k,T_1)=\rho _{ij}^{\ell +1}(k,T_2)\ge 0\) by monotonicity; finally, if \(\{i,j\}\not \subseteq T_2\) then \(\rho _{ij}^{\ell +1}(k,T_2)-\rho _{ij}^{\ell +1}(k,T_1)=0\).

    • \(k=i\): If \(j\in T_1\) then \(\rho _{kj}^{\ell +1}(k,T_2)-\rho _{kj}^{\ell +1}(k,T_1)=\sum _{t\in N}(\rho _{kt}^{\ell }(k,T_2)-\rho _{kt}^{\ell }(k,T_1))P_{tj}\ge 0\) by supermodularity of functions \({\bar{\theta }}_{kt}^\ell \); if \(j\not \in T_1\) and \(j\in T_2\) then \(\rho _{kj}^{\ell +1}(k,T_2)-\rho _{kj}^{\ell +1}(k,T_1)=\bar{\theta }_{kj}^{\ell +1}(T_2\cup \{k\})\ge 0\); finally, if \(j\not \in T_2\) then \(\rho _{kj}^{\ell +1}(k,T_2)-\rho _{kj}^{\ell +1}(k,T_1)=0\). The case \(k=j\) is identical.\(\square \)

As \(\theta _{ij}(T)=\sum _{\ell =0}^\infty {\bar{\theta }}_{ij}^\ell (T)\) is a sum of supermodular functions, it is supermodular. Consequently, \(\nicefrac {1}{4}\sum _{i\in N}\sum _{j\in N}b_ib_j \theta _{ij}(T)\) is a supermodular function and (P) is a submodular minimization problem, solvable with a strongly polynomial number of calls to a value oracle [e.g.[41]]. Evaluating the submodular function for a given set T, i.e., computing \(a(T)-\nicefrac {b_T'Q_T^{-1}b_T}{4}\), requires only matrix multiplication and inversion, and can be done in strongly polynomial time. Therefore (P) is solvable in strongly polynomial time. \(\square \)
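The value oracle used at the end of the proof amounts to forming \(Q_T\) and solving one linear system; a minimal sketch, assuming numpy is available (the function name is ours):

```python
# Value oracle for the submodular function in the proof: a(T) - b_T' Q_T^{-1} b_T / 4.
import numpy as np

def oracle_value(a, b, Q, T):
    T = sorted(T)
    value = float(sum(a[i] for i in T))
    if T:
        Q_T = Q[np.ix_(T, T)]
        b_T = np.asarray([b[i] for i in T], dtype=float)
        value -= b_T @ np.linalg.solve(Q_T, b_T) / 4.0
    return value
```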

3.2 Convex hull of \(X_U\)

Consider the function \(f:[0,1]^2\times \mathbb {R}_+^2\rightarrow \mathbb {R}_+\) defined as

$$\begin{aligned} f(x,y)={\left\{ \begin{array}{ll}\frac{(y_1-y_2)^2}{x_1}&{} \text {if }y_1\ge y_2\\ \frac{(y_2-y_1)^2}{x_2}&{} \text {if }y_1\le y_2\end{array}\right. } \end{aligned}$$
(6)

and the corresponding nonlinear inequality

$$\begin{aligned} f(x,y)\le t. \end{aligned}$$
(7)

Remark 1

Observe that inequality (7) dominates inequality (5) since

$$\begin{aligned} \frac{(y_1-y_2)^2}{x_1+x_2}\le \frac{(y_1-y_2)^2}{\max \{x_1,x_2\}}\le f(x,y). \end{aligned}$$

Inequalities (3)–(4) are not valid for the unbounded relaxation as the conditions \(\nicefrac {y_i^2}{x_i}\le 1\) are not satisfied by all feasible points in \(X_U\). For example, feasible points with \(x_1=x_2=1\), \(y_1=y_2>1\) and \(t=0\) are cut off by (3)–(4).

Proposition 4

Inequality (7) is valid for \(X_U\).

Proof

There are four cases to consider. If \(x_1=x_2=1\), then f(x, y) reduces to the original quadratic function \((y_1-y_2)^2\), thus the inequality is valid. If \(x_1=x_2=0\), then the points in \(X_U\) satisfy \(y_1=y_2=0\) and \(t\ge 0\); since \(f(0,0)=0\), none of these points are cut off by (7). If \(x_1=1\) and \(x_2=0\), then \(y_2=0\) in any point in \(X_U\) and, in particular, \(y_1\ge y_2\); thus (7) reduces to the original inequality. The case where \(x_1=0\) and \(x_2=1\) is similar. \(\square \)

Observe that function f is a piecewise nonlinear function, where each piece is conic quadratic representable. However, the pieces are not valid outside of the region where they are defined, e.g., \((y_1-y_2)^2\le tx_1\) is invalid when \(y_2>y_1\) as it cuts off feasible points with \(x_1=y_1=0\) and \(y_2>0\). Thus, inequality (7) is not equivalent to the system given by \((y_1-y_2)^2\le tx_i\), \(i=1,2\). Nevertheless, as shown in Proposition 5 below, (7) is a convex inequality.

Proposition 5

The function f is convex on its domain.

Proof

Let \(({\bar{x}},{\bar{y}}),({\hat{x}},{\hat{y}})\in [0,1]^2\times \mathbb {R}_+^2\) and let \((x^*,y^*)=(1-\lambda )({\bar{x}},{\bar{y}})+\lambda ({\hat{x}},{\hat{y}})\) for \(0\le \lambda \le 1\) be a convex combination of \(({\bar{x}},{\bar{y}})\) and \(({\hat{x}},{\hat{y}})\). We need to prove that

$$\begin{aligned} f(x^*,y^*)\le (1-\lambda )f({\bar{x}},{\bar{y}}) + \lambda f({\hat{x}},{\hat{y}}). \end{aligned}$$
(8)

If \({\bar{y}}_1\ge {\bar{y}}_2\) and \({\hat{y}}_1\ge {\hat{y}}_2\), or \({\bar{y}}_1\le {\bar{y}}_2\) and \({\hat{y}}_1\le {\hat{y}}_2\), inequality (8) holds by convexity of the individual functions in the definition of f. Otherwise, assume, without loss of generality, that \({\bar{y}}_1\ge {\bar{y}}_2\), \({\hat{y}}_1\le {\hat{y}}_2\), and \(y_1^*\le y_2^*\). Letting \(\gamma =\lambda -(1-\lambda )\frac{{\bar{y}}_1-{\bar{y}}_2}{{\hat{y}}_2-{\hat{y}}_1}\), observe that

  • \(\gamma \le \lambda \le 1\).

  • \(\gamma \ge 0\), which is equivalent to \(y_2^*-y_1^*\ge 0\).

  • \(y_2^*-y_1^*=\gamma ({\hat{y}}_2-{\hat{y}}_1)\).

  • \(\gamma {\hat{x}}_2\le \lambda {\hat{x}}_2\le x_2^*\).

Then, we find

$$\begin{aligned} f(x^*,y^*)&=\frac{(y_2^*-y_1^*)^2}{x_2^*}\le \frac{(y_2^*-y_1^*)^2}{\gamma {\hat{x}}_2}\\&=\gamma \frac{({\hat{y}}_2-{\hat{y}}_1)^2}{ {\hat{x}}_2} \le \lambda f({\hat{x}},{\hat{y}})+(1-\lambda )f({\bar{x}},{\bar{y}}). \end{aligned}$$

\(\square \)

A consequence of Proposition 5 is that the convex inequality (7) can be implemented (with off-the-shelf solvers) using subgradient inequalities: for a subgradient \(\xi \in \partial f({\bar{x}},{\bar{y}})\) at a given point \(({\bar{x}},{\bar{y}})\), we have \(f({\bar{x}},{\bar{y}})+\xi '(x-{\bar{x}},y-{\bar{y}})\le f(x,y)\) for all points (x, y) in the domain of the convex function f. In particular, the linear cuts

$$\begin{aligned} f({\bar{x}},{\bar{y}})+\xi '(x-{\bar{x}},y-{\bar{y}})\le t \text { for } \xi \in \partial f({\bar{x}},{\bar{y}}) \end{aligned}$$
(9)

provide an outer-approximation of \(f(x,y) \le t\) at \(({\bar{x}},{\bar{y}})\) and are valid everywhere on the domain. A subgradient \(\xi \) can be found simply by taking the gradient of the relevant piece of the function at \(({\bar{x}},{\bar{y}})\). In particular, for \({\bar{y}}_1\ge {\bar{y}}_2\) and \({{\bar{x}}}_1>0\), a subgradient inequality is

$$\begin{aligned} -\left( \frac{{\bar{y}}_1-{\bar{y}}_2}{{{\bar{x}}}_1}\right) ^2x_1+2\left( \frac{{\bar{y}}_1-{\bar{y}}_2}{{{\bar{x}}}_1}\right) (y_1-y_2)\le t. \end{aligned}$$
(10)

The process outlined here to find subgradient cuts (9) for f can be utilized for any convex piecewise nonlinear function, and will be used for other functions in the rest of the paper. Convex piecewise nonlinear functions also arise in strong formulations for mixed-integer conic quadratic optimization [5], and subgradient linear cuts for such functions were recently used in the context of the pooling problem [36].
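For concreteness, the subgradient cut (10) and its counterpart for the second piece of f can be generated as follows (a minimal sketch; the function name is ours). Note that, because each piece of f is positively homogeneous of degree one, the cut has no constant term.

```python
def f_subgradient_cut(xbar, ybar):
    """Coefficients (cx, cy) of the linear cut cx'x + cy'y <= t, cf. (9)-(10)."""
    x1, x2 = xbar
    y1, y2 = ybar
    if y1 >= y2:                       # active piece of f: (y1 - y2)^2 / x1
        r = (y1 - y2) / x1
        return (-r * r, 0.0), (2 * r, -2 * r)
    r = (y2 - y1) / x2                 # active piece of f: (y2 - y1)^2 / x2
    return (0.0, -r * r), (-2 * r, 2 * r)
```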

As Theorem 1 below states, inequality (7) and bound constraints for the binary variables describe the convex hull of \(X_U\).

Theorem 1

(Convex hull of \(X_U\))

$$\begin{aligned} \text {conv}(X_U)=\left\{ (x,y,t)\in [0,1]^2\times \mathbb {R}_+^2\times \mathbb {R}: f(x,y)\le t\right\} . \end{aligned}$$

Proof

Consider the optimization problems

$$\begin{aligned} (P_0)&\qquad \min _{(x,y,t)\in X_U} a'x+b'y+ct;\\ (P_1)&\qquad \min _{(x,y,t)\in [0,1]^2\times \mathbb {R}_+^2\times \mathbb {R}} a'x+b'y+ct\text { s.t. } f(x,y)\le t. \end{aligned}$$

To prove the result we show that for any value of \(a, b, c\), either \((P_0)\) and \((P_1)\) are both unbounded, or there exists a solution integral in x that is optimal for both problems. If \(c<0\), then \((P_0)\) and \((P_1)\) are both unbounded, and if \(c=0\) then \((P_1)\) corresponds to an optimization problem over an integral polyhedron and it is easily checked that \((P_0)\) and \((P_1)\) are equivalent. Thus, the interesting case is \(c>0\) or, by scaling, \(c=1\). Note that \(t=(y_1-y_2)^2\) in any optimal solution of \((P_0)\), and \(t=f(x,y)\) in any optimal solution of \((P_1)\). If \(b_1, b_2\ge 0\), then \(y_1=y_2=0\) is optimal with corresponding integer x optimal for both \((P_0)\) and \((P_1)\).

Moreover, if \(b_1+b_2<0\), then both problems are unbounded: \(x_1=x_2=1\), \(y_1=y_2=\lambda \) is feasible for any \(\lambda > 0\) for both problems. Thus, one needs to consider only the case where \(b_1+b_2 \ge 0\) and \(b_1 < 0\) or \(b_2 < 0\). Without loss of generality, let \(b_1<0\) and \(b_2>0\).

Optimal solutions of \((P_0)\). There exists an optimal solution with \(y_2=0\) (if \(0<y_2 \le y_1\), subtracting \(\epsilon >0\) from both \(y_1\) and \(y_2\) does not increase the objective – and if \(y_2>y_1\), then swapping the values of \(y_1\) and \(y_2\) reduces the objective). Thus, \(y_2=0\), \(x_2=0\) if \(a_2\ge 0\) and \(x_2=1\) otherwise, and either \(x_1=y_1=0\) or \(x_1=1\) and \(y_1=-\frac{b_1}{2}\), which is the stationary point of \(b_1 y_1 + y_1^2\).

Optimal solutions of \((P_1)\). Note that there exists an optimal solution of \((P_1)\) where at least one of the continuous variables is 0 (if \(0<y_1,y_2\), subtracting \(\epsilon >0\) from both variables does not increase the objective value — this operation does not change the relative order of \(y_1\) and \(y_2\)). Then, we conclude that \(y_2=0\) in an optimal solution (if \(y_1=0\) and \(y_2>0\), then setting \(y_2=0\) reduces the objective value). Moreover, when \(y_2=0\), then \(f(x,y)=y_1^2/x_1\). Thus, in the optimal solution \(y_1=-b_1x_1/2\). Substituting in the objective, we see that \((P_1)\) simplifies to \( \min _{0\le x_1, x_2\le 1} a_2x_2+ \big (a_1-b_1^2/4 \big )x_1. \) For an optimal solution, \(x_2=0\) if \(a_2\ge 0\) and \(x_2=1\) otherwise, and \(x_1=0\) if \(a_1-b_1^2/4\ge 0\) and \(x_1=1\) otherwise. And, if \(x_1=1\), then \(y_1=-b_1/2\). Hence, the optimal solutions coincide. \(\square \)

3.3 Valid inequalities for \(S_U\)

Inequalities in an extended formulation

Let \({\bar{Q}}_i = \sum _{j=1}^nQ_{ij}\), \(P = \{i \in N: {\bar{Q}}_i > 0\}\) and \({\bar{P}} = N{\setminus }P\). Using decomposition (1) and introducing \(t_{ij}\), \(1 \le i < j \le n\), one can write a convex relaxation of \(S_U\) as

$$\begin{aligned} \sum _{i \in {\bar{P}}} {\bar{Q}}_i y_i + \sum _{i \in P} {\bar{Q}}_i y_i^2/x_i - \sum _{i=1}^n \sum _{j=i+1}^n Q_{ij} t_{ij}&\le t \\ f(x_i, x_j, y_i, y_j)&\le t_{ij}, \ \ 1 \le i < j \le n. \end{aligned}$$

Inequalities in the original space of variables

By projecting out the auxiliary variables \(t_{ij}\) one obtains valid inequalities in the original space of variables. By re-indexing variables if necessary, assume that \(y_1\ge y_2\ge \ldots \ge y_n\) to obtain the convex inequality

$$\begin{aligned} \sum _{i \in {\bar{P}}} {\bar{Q}}_i y_i + \sum _{i \in P} {\bar{Q}}_i y_i^2/x_i - \sum _{i=1}^n\sum \limits _{j=i+1}^n Q_{ij}(y_i-y_j)^2/x_i\le t. \end{aligned}$$
(11)

Observe that the nonlinear inequality (11) is valid only if \(y_1\ge \ldots \ge y_n\) holds. However, we can obtain linear inequalities that are valid for \(S_U\) by underestimating the convex function \( \sum _{i \in {\bar{P}}} {\bar{Q}}_i y_i + \sum _{i \in P} {\bar{Q}}_i y_i^2/x_i - \sum _{i=1}^n \sum _{j=i+1}^n Q_{ij} f(x_i,x_j,y_i,y_j)\) by its subgradients. Let \(({\bar{x}},{\bar{y}})\in [0,1]^N\times \mathbb {R}_+^N\) be such that \({\bar{y}}_1\ge \ldots \ge {\bar{y}}_n\) and \({\bar{x}}>0\). Then, the subgradient inequality

$$\begin{aligned}&-\sum _{i\in P}{\bar{Q}}_i \left( \frac{{\bar{y}}_i}{{{\bar{x}}}_i}\right) ^2 x_i+\sum _{i=1}^n \left( \sum \limits _{j=i+1}^n\frac{ Q_{ij}({\bar{y}}_i-{\bar{y}}_j)^2}{{\bar{x}}_i^2}\right) x_i\\&\quad +2\sum _{i\in P}{\bar{Q}}_i \frac{{\bar{y}}_i}{{{\bar{x}}}_i} y_i+\sum _{i\in {\bar{P}}}{\bar{Q}}_i y_i\\&\quad +2\sum _{i=1}^n\left( \sum _{j=1}^{i-1}\frac{Q_{ij}({\bar{y}}_j-{\bar{y}}_i)}{{\bar{x}}_j} -\sum \limits _{j=i+1}^n\frac{ Q_{ij}({\bar{y}}_i-{\bar{y}}_j)}{{\bar{x}}_i} \right) y_i\le t, \end{aligned}$$

corresponding to a first order approximation of (11) around \(({\bar{x}},{\bar{y}})\), is valid for \(S_U\) (regardless of the ordering of the variables).

4 The bounded set X

Let \(g:[0,1]^2\times \mathbb {R}_+^2\rightarrow \mathbb {R}_+\) be defined as

$$\begin{aligned} g(x,y)={\left\{ \begin{array}{ll} \frac{(y_1-x_2)^2}{x_1-x_2}+\frac{(x_2-y_2)^2}{x_2} &{} \text {if }y_2\le x_2\le y_1 \text { and }x_2(x_1-y_1)\le y_2(x_1-x_2) \\ \frac{(y_2-x_1)^2}{x_2-x_1}+\frac{(x_1-y_1)^2}{x_1} &{} \text {if }y_1\le x_1\le y_2 \text { and }x_1(x_2-y_2)\le y_1(x_2-x_1)\\ f(x,y) &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(12)

where f is the function defined in (6). This section is devoted to proving the main result:

Theorem 2

(Convex hull of X)

$$\begin{aligned} \text {conv}(X)=\left\{ (x,y,t)\in [0,1]^2\times \mathbb {R}_+^3: g(x,y)\le t,\; y_i \le x_i,\;i=1,2 \right\} . \end{aligned}$$

Remark 2

Observe that for the binary restriction \(X_B\) with \(y_i=x_i\), \(i=1,2\), \(g(x,y) \le t\) reduces to \(|x_1 - x_2| \le t\), which together with the bound constraints describe \(\text {conv}(X_B)\).
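For reference, g can be evaluated directly from its piecewise definition; a minimal sketch (names ours), using the division conventions from the Notation paragraph. On the binary restriction with \(y=x\) it returns \(|x_1-x_2|\), as noted in Remark 2.

```python
def frac(num, den):
    """Division with the paper's conventions: 0/0 = 0 and a/0 = +inf for a > 0."""
    return 0.0 if num == 0.0 else (num / den if den > 0.0 else float("inf"))

def f(x1, x2, y1, y2):                 # the function defined in (6)
    return frac((y1 - y2) ** 2, x1) if y1 >= y2 else frac((y2 - y1) ** 2, x2)

def g(x1, x2, y1, y2):                 # the function defined in (12)
    if y2 <= x2 <= y1 and x2 * (x1 - y1) <= y2 * (x1 - x2):
        return frac((y1 - x2) ** 2, x1 - x2) + frac((x2 - y2) ** 2, x2)
    if y1 <= x1 <= y2 and x1 * (x2 - y2) <= y1 * (x2 - x1):
        return frac((y2 - x1) ** 2, x2 - x1) + frac((x1 - y1) ** 2, x1)
    return f(x1, x2, y1, y2)

# Binary restriction of Remark 2: g(x, x) = |x1 - x2|.
assert all(g(x1, x2, x1, x2) == abs(x1 - x2) for x1 in (0, 1) for x2 in (0, 1))
```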

The rest of this section is organized as follows. In Sect. 4.1 we give the convex hull description of the intermediate set with two continuous variables and one indicator variable:

$$\begin{aligned} X_1=\left\{ (x,y,t)\in \{0,1\}\times \mathbb {R}_+^2\times \mathbb {R}: (y_1-y_2)^2\le t,\; y_1\le x,\; y_2\le 1\right\} . \end{aligned}$$

In Sect. 4.2 we use these results to prove Theorem 2. Finally, in Sect. 4.3 we give valid inequalities for S. Unlike in Sect. 3, the convex hull proofs in this section are constructive, i.e., we show how g is constructed from the mixed-binary description of X, instead of just verifying that g does indeed result in conv(X).

4.1 Convex hull description of \(X_1\)

Let \(g_1:[0,1]\times \mathbb {R}_+^2\rightarrow \mathbb {R}_+\) be given by

$$\begin{aligned} g_1(x,y_1, y_2)={\left\{ \begin{array}{ll}\frac{\left( y_2-x\right) ^2}{1-x}+\frac{\left( x-y_1\right) ^2}{x} &{} \text {if }x-y_1\le x(y_2-y_1)\\ \frac{\left( y_1-y_2\right) ^2}{x} &{} \text {if }y_2\le y_1\\ (y_2-y_1)^2 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Proposition 6

\(\text {conv}(X_1)=\left\{ (x,y,t)\in [0,1]\times \mathbb {R}_+^2\times \mathbb {R}: g_1(x,y_1,y_2)\le t, y_1\le x, \; y_2\le 1 \right\} \).

Proof

Note that a point (x, y, t) belongs to \(\text {conv}(X_1)\) if and only if there exists \(({\bar{x}},{\bar{y}},{\bar{t}})\), \(({\hat{x}},{\hat{y}},{\hat{t}})\) and \(0\le \lambda \le 1\) such that

$$\begin{aligned}&t=(1-\lambda ){\bar{t}}+\lambda {\hat{t}}\end{aligned}$$
(13)
$$\begin{aligned}&x=(1-\lambda ){\bar{x}}+\lambda {\hat{x}}\end{aligned}$$
(14)
$$\begin{aligned}&y_1=(1-\lambda ){\bar{y}}_1+\lambda {\hat{y}}_1\end{aligned}$$
(15)
$$\begin{aligned}&y_2=(1-\lambda ){\bar{y}}_2+\lambda {\hat{y}}_2\end{aligned}$$
(16)
$$\begin{aligned}&{\bar{x}}=0,\; {\hat{x}}=1\end{aligned}$$
(17)
$$\begin{aligned}&{\bar{y}}_1=0,\; 0\le {\hat{y}}_1\le 1\end{aligned}$$
(18)
$$\begin{aligned}&0\le {\bar{y}}_2, \ {\hat{y}}_2\le 1\end{aligned}$$
(19)
$$\begin{aligned}&{\bar{t}}\ge {\bar{y}}_2^2\end{aligned}$$
(20)
$$\begin{aligned}&{\hat{t}}\ge ({\hat{y}}_1-{\hat{y}}_2)^2. \end{aligned}$$
(21)

The non-convex system (13)–(21) follows directly from the definition of the convex hull. Note that a convex extended formulation of conv(\(X_1\)) could also be obtained using the approach proposed by [17]. See also Vielma [46] for a recent approach to eliminate the auxiliary variables using Cayley embedding. We now show how to project out the additional variables \(({\bar{x}},{\bar{y}},{\bar{t}})\), \(({\hat{x}},{\hat{y}},{\hat{t}})\) to find conv\((X_1)\) in the original space of variables, which can be done directly from the non-convex formulation above.

From constraints (14) and (17) we see \(\lambda =x\), from constraint (15) \({\hat{y}}_1=\frac{y_1}{x}\), from (18) \(y_1\le x\), from (16) we find \({\bar{y}}_2=\frac{y_2-x{\hat{y}}_2}{1-x}\), and from (19) we get \(0\le {\hat{y}}_2\le 1\) and \(0\le \frac{y_2-x{\hat{y}}_2}{1-x}\le 1\). Thus, (13)–(21) is feasible if and only if \(0\le y_1 \le x\), \(0\le y_2 \le 1\) and there exists \({\hat{y}}_2\) such that

$$\begin{aligned}&t\ge \frac{\left( y_2-x{\hat{y}}_2\right) ^2}{1-x}+\frac{\left( x{\hat{y}}_2-y_1\right) ^2}{x}, \ \ 0\le {\hat{y}}_2\le 1, \ \ \frac{y_2}{x}-\frac{1-x}{x}\le {\hat{y}}_2\le \frac{y_2}{x} \cdot \end{aligned}$$

The existence of such \({\hat{y}}_2\) can be checked by solving the convex optimization problem

$$\begin{aligned} \text {(M1)}&\qquad \min \; \varphi ({\hat{y}}_2):= \frac{\left( y_2-x{\hat{y}}_2\right) ^2}{1-x}+\frac{\left( x{\hat{y}}_2-y_1\right) ^2}{x}\\&\qquad \text {s.t.}\;\max \left\{ 0,\frac{y_2}{x}-\frac{1-x}{x} \right\} \le {\hat{y}}_2\le \min \left\{ 1, \frac{y_2}{x}\right\} . \end{aligned}$$

The equation \(\varphi '({\hat{y}}_2)=0\) yields

$$\begin{aligned}&-\frac{\left( y_2-x{\hat{y}}_2\right) }{1-x}+\frac{\left( x{\hat{y}}_2-y_1\right) }{x}=0\\&\quad \Leftrightarrow {\hat{y}}_2=y_2+y_1\frac{1-x}{x}:=\eta (x,y). \end{aligned}$$

Let \({\hat{y}}_2^*\) be an optimal solution to (M1). Note that \({\hat{y}}_2^*> 0\) whenever \( \eta (x,y)> 0\). Moreover, \(\eta (x,y)\le \frac{y_2}{x}-\frac{1-x}{x}\implies y_1+1\le y_2\), which can only happen if \(y_1=0\) and \(y_2=1\), in which case \(\frac{y_2}{x}-\frac{1-x}{x}=1\) . Thus, we may assume that \({\hat{y}}_2^*\) is not equal to one of its lower bounds.

Now observe that \(\frac{y_2}{x}\le \eta (x,y)\Leftrightarrow y_2\le y_1\), in which case \(\eta (x,y)\le \frac{y_1}{x}\le 1\). Additionally, if \(1\le \eta (x,y)\), then \(x\le y_2\) and in particular \(y_1\le y_2\). Therefore, the cases \( \eta (x,y) \le \min \{1,\frac{y_2}{x}\}\), \(\eta (x,y)\ge 1\), and \(\eta (x,y)\ge \frac{y_2}{x}\) are mutually exclusive if \(\frac{y_2}{x}\ne x\), and the optimal solution of (M1) corresponds to setting \({\hat{y}}_2^*=\eta (x,y)\), \({\hat{y}}_2^*=1\), or \({\hat{y}}_2^*=\frac{y_2}{x}\), respectively. By calculating the objective function of (M1) with the appropriate value of \({\hat{y}}_2^*\), we find \(\varphi ({\hat{y}}_2^*) = g_1(x,y_1,y_2)\). Hence, \((x,y,t)\in \text {conv}(X_1)\) if and only if \(t\ge g_1(x,y_1,y_2)\) and \(0\le y_1\le x\le 1\), \(0\le y_2\le 1\). \(\square \)
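The closed form \(g_1\) can be checked against a direct one-dimensional minimization of (M1); a small numerical sketch, assuming scipy is available (sampling and tolerances are ours):

```python
# Check that g_1 matches the optimal value of (M1) on random feasible points.
import numpy as np
from scipy.optimize import minimize_scalar

def g1(x, y1, y2):
    if x - y1 <= x * (y2 - y1):
        return (y2 - x) ** 2 / (1 - x) + (x - y1) ** 2 / x
    if y2 <= y1:
        return (y1 - y2) ** 2 / x
    return (y2 - y1) ** 2

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.uniform(0.05, 0.95)
    y1, y2 = x * rng.random(), rng.random()
    phi = lambda z: (y2 - x * z) ** 2 / (1 - x) + (x * z - y1) ** 2 / x
    lo, hi = max(0.0, (y2 - (1 - x)) / x), min(1.0, y2 / x)
    res = minimize_scalar(phi, bounds=(lo, hi), method="bounded")
    assert np.isclose(g1(x, y1, y2), res.fun, atol=1e-3)
```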

4.2 Convex hull description of X

We use a similar argument as in the proof of Proposition 6 to prove Theorem 2. Let (x, y, t) be a point such that \(0\le y_i\le x_i\le 1\); we additionally assume that \(y_1\ge y_2\). A point (x, y, t) belongs to \(\text {conv}(X)\) if and only if there exists \(({\bar{x}},{\bar{y}},{\bar{t}})\), \(({\hat{x}},{\hat{y}},{\hat{t}})\), and \(0\le \lambda \le 1\) such that

$$\begin{aligned}&t=(1-\lambda ){\bar{t}}+\lambda {\hat{t}}\end{aligned}$$
(22)
$$\begin{aligned}&x_1=(1-\lambda ){\bar{x}}_1+\lambda {\hat{x}}_1\end{aligned}$$
(23)
$$\begin{aligned}&x_2=(1-\lambda ){\bar{x}}_2+\lambda {\hat{x}}_2\end{aligned}$$
(24)
$$\begin{aligned}&y_1=(1-\lambda ){\bar{y}}_1+\lambda {\hat{y}}_1\end{aligned}$$
(25)
$$\begin{aligned}&y_2=(1-\lambda ){\bar{y}}_2+\lambda {\hat{y}}_2\end{aligned}$$
(26)
$$\begin{aligned}&{\bar{x}}_2=0,\; {\hat{x}}_2=1\end{aligned}$$
(27)
$$\begin{aligned}&{\bar{y}}_2=0,\; 0\le {\hat{y}}_2\le 1\end{aligned}$$
(28)
$$\begin{aligned}&0\le {\bar{y}}_1\le {\bar{x}}_1\le 1, \; 0\le {\hat{y}}_1\le {\hat{x}}_1\le 1\end{aligned}$$
(29)
$$\begin{aligned}&{\bar{t}}\ge {\bar{y}}_1^2/{\bar{x}}_1\end{aligned}$$
(30)
$$\begin{aligned}&{\hat{t}}\ge g_1({\hat{x}}_1,{\hat{y}}_1,{\hat{y}}_2). \end{aligned}$$
(31)

The system (22)–(31) corresponds to \(\text {conv}(K_0\cup K_1)\), where \(K_0=\{(x,y,t)\in [0,1]^2\times \mathbb {R}_+^2\times \mathbb {R}:\nicefrac {y_1^2}{x_1}\le t,\; y_2=x_2=0\}\) and \(K_1=\{(x,y,t)\in [0,1]^2\times \mathbb {R}_+^2\times \mathbb {R}:g_1(x_1,y_1,y_2)\le t,\; x_2=1\}\). Observe that \(K_0\) and \(K_1\) are the convex hulls of the restrictions of X, where \(x_2=0\) and \(x_2=1\), respectively.

Using reasoning similar to that in the proof of Proposition 6, we find \(\lambda =x_2\), \({\hat{y}}_2=\frac{y_2}{x_2}\), \({\bar{x}}_1=\frac{x_1-x_2{\hat{x}}_1}{1-x_2}\), \({\bar{y}}_1=\frac{y_1-x_2{\hat{y}}_1}{1-x_2}\), and

$$\begin{aligned} \text {(M2)} \ \ \ \ \ \ \ \ \ t\ge \min _{{\hat{x}}_1,{\hat{y}}_1}\;&\psi ({\hat{x}}_1,{\hat{y}}_1)\nonumber \\ \text {s.t.}\;&0\le {\hat{y}}_1\le {\hat{x}}_1\le 1 \end{aligned}$$
(32)
$$\begin{aligned}&{\hat{y}}_1\le \frac{y_1}{x_2},\; {\hat{x}}_1-{\hat{y}}_1\le \frac{x_1-y_1}{x_2},\; \frac{x_1}{x_2}-\frac{1-x_2}{x_2}\le {\hat{x}}_1, \end{aligned}$$
(33)

where

$$\begin{aligned} \psi ({\hat{x}}_1,{\hat{y}}_1):=\frac{\left( y_1-x_2{\hat{y}}_1\right) ^2}{x_1-x_2{\hat{x}}_1}+ x_2 g_1({\hat{x}}_1, {\hat{y}}_1, y_2/x_2) \cdot \end{aligned}$$

Thus, to find the convex hull of X, we need to compute in closed form the solutions of the optimization problem (M2).

Lemma 1

There exists an optimal solution \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) to (M2) such that \({\hat{y}}_1^*\ge \frac{y_2}{x_2}\).

Proof

Note that if \({\hat{y}}_1< \frac{y_2}{x_2}\), the function \(\psi \) is non-increasing in \({\hat{y}}_1\) for any value of \({\hat{x}}_1\). Thus there exists an optimal solution where \({\hat{y}}_1\) is set to one of its upper bounds, i.e., either \({\hat{y}}_1^*=\nicefrac {y_1}{x_2}\) or \({\hat{y}}_1^*={\hat{x}}_1^*\). Since we assume \(y_1\ge y_2\) and \({\hat{y}}_1< \nicefrac {y_2}{x_2}\), the case \({\hat{y}}_1^*=\nicefrac {y_1}{x_2}\) is not possible.

Now suppose that \({\hat{y}}_1={\hat{x}}_1\). Then observe that \(1\le \frac{y_2}{x_2} + {\hat{y}}_1\frac{1-{\hat{x}}_1}{{\hat{x}}_1}\Leftrightarrow {\hat{x}}_1\le \frac{y_2}{x_2}\). Thus

$$\begin{aligned} \psi ({\hat{x}}_1)=\frac{\left( y_1-x_2{\hat{x}}_1\right) ^2}{x_1-x_2{\hat{x}}_1}+\frac{\left( y_2-x_2{\hat{x}}_1\right) ^2}{x_2-x_2{\hat{x}}_1} \end{aligned}$$

in this case (substituting \({\hat{y}}_1={\hat{x}}_1\)). Taking the derivative, we find

$$\begin{aligned} \psi '({\hat{x}}_1)&=x_2\frac{y_1-x_2{\hat{x}}_1}{(x_1-x_2{\hat{x}}_1)^2} \left( -2x_1+x_2{\hat{x}}_1+y_1\right) \\&+\,x_2\frac{(y_2-x_2{\hat{x}}_1)}{(x_2-x_2{\hat{x}}_1)^2}\left( -2x_2+x_2{\hat{x}}_1+y_2\right) \cdot \end{aligned}$$

Note that \(y_1-x_2{\hat{x}}_1\ge 0\) since \({\hat{x}}_1={\hat{y}}_1\le \nicefrac {y_1}{x_2}\) in any feasible solution, and \(y_2-x_2{\hat{x}}_1\ge 0\), by assumption. Additionally

  • since \(y_1\le x_1\) and \({\hat{x}}_1={\hat{y}}_1\le \nicefrac {y_1}{x_2}\le \nicefrac {x_1}{x_2}\), we find that \(-2x_1+x_2{\hat{x}}_1+y_1\le 0\),

  • since \(y_2\le x_2\) and \({\hat{x}}_1\le 1\), we find that \(-2x_2+x_2{\hat{x}}_1+y_2\le 0\).

Therefore, \(\psi '({\hat{x}}_1)\) is non-positive, i.e., \(\psi \) is non-increasing. Then, by increasing \({\hat{y}}_1={\hat{x}}_1\), another optimal solution can be found. In particular, an optimal solution with \({\hat{y}}_1^*\ge \nicefrac {y_2}{x_2}\) exists. \(\square \)

From Lemma 1 we can assume, without loss of generality, that

$$\begin{aligned} \psi ({\hat{x}}_1,{\hat{y}}_1)=\frac{(y_1-x_2{\hat{y}}_1)^2}{x_1-x_2{\hat{x}}_1}+\frac{(x_2{\hat{y}}_1-y_2)^2}{x_2{\hat{x}}_1} \cdot \end{aligned}$$
(34)

Taking partial derivatives, we find that

$$\begin{aligned} \frac{\partial \psi }{\partial {\hat{y}}_1}({\hat{x}}_1,{\hat{y}}_1)=&\ 2x_2\left( -\frac{y_1-x_2{\hat{y}}_1}{x_1-x_2{\hat{x}}_1}+\frac{x_2{\hat{y}}_1-y_2}{x_2{\hat{x}}_1}\right) ,\\ \frac{\partial \psi }{\partial {\hat{x}}_1}({\hat{x}}_1,{\hat{y}}_1)=&\ x_2 \left( \frac{y_1-x_2{\hat{y}}_1}{x_1-x_2{\hat{x}}_1}\right) ^2- x_2 \left( \frac{x_2{\hat{y}}_1-y_2}{x_2{\hat{x}}_1}\right) ^2. \end{aligned}$$

Lemmas 2–4 characterize the optimal solutions of (M2), depending on the values of (x, y). Note that if

$$\begin{aligned} {\hat{y}}_1=\frac{y_2}{x_2}+\frac{{\hat{x}}_1}{x_1}(y_1-y_2), \end{aligned}$$
(35)

then \(\frac{\partial \psi }{\partial {\hat{y}}_1}({\hat{x}}_1,{\hat{y}}_1)=\frac{\partial \psi }{\partial {\hat{x}}_1}({\hat{x}}_1,{\hat{y}}_1)=0\), independently of the values of \({\hat{x}}_1\) and \({\hat{y}}_1\). Thus, any feasible point that satisfies (35) is an optimal solution of (M2), as is the case for Lemmas 2 and 3. In contrast, under the conditions of Lemma 4, no feasible point satisfies (35) as it would violate upper bound constraints.

Lemma 2

If \(x_1\le x_2\) then \({\hat{x}}_1^*=\frac{x_1-\epsilon }{x_2}\), where \(\epsilon >0\) is a sufficiently small number, and \({\hat{y}}_1^*=\frac{y_2}{x_2}+\frac{{\hat{x}}_1^*}{x_1}(y_1-y_2)\) is an optimal solution to (M2) with objective \(\psi ({\hat{x}}_1^*,{\hat{y}}_1^*)=\frac{(y_1-y_2)^2}{x_1} \cdot \)

Proof

We have \(\frac{\partial \psi }{\partial {\hat{y}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=\frac{\partial \psi }{\partial {\hat{x}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=0\) and \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) satisfies all constraints (32)–(33). Thus, \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) is a KKT point and, by convexity, is an optimal solution. Substituting in (34), we get the result. \(\square \)

Lemma 3

If \(x_1> x_2\) and \(y_2(x_1-x_2)+y_1x_2\le x_2x_1\), then \({\hat{x}}_1^*=1\) and \({\hat{y}}_1^*=\frac{y_2}{x_2}+\frac{{\hat{x}}_1^*}{x_1}(y_1-y_2)\) is an optimal solution to (M2) with objective \(\psi ({\hat{x}}_1^*,{\hat{y}}_1^*)=\frac{(y_1-y_2)^2}{x_1} \cdot \)

Proof

Observe that \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) is feasible as \( {\hat{y}}_1^*=\frac{y_2}{x_2}+\frac{y_1-y_2}{x_1}\le \frac{y_2}{x_2}+\frac{y_1-y_2}{x_2}=\frac{y_1}{x_2}; {\hat{y}}_1^*=\frac{y_2}{x_2}+\frac{y_1-y_2}{x_1}=\frac{y_2x_1+y_1x_2-y_2x_2}{x_1x_2}\le 1={\hat{x}}_1^*; {\hat{x}}_1^*-{\hat{y}}_1^*= 1-\frac{y_2}{x_2}-\frac{y_1-y_2}{x_1}\le 1-\frac{y_2}{x_1}-\frac{y_1-y_2}{x_1}=\frac{x_1-y_1}{x_1}\le \frac{x_1-y_1}{x_2}; \frac{x_1}{x_2}-\frac{1-x_2}{x_2}=\frac{x_1-1}{x_2}+1\le 1= {\hat{x}}_1^*.\) Additionally, note that \(\frac{\partial \psi }{\partial {\hat{y}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=\frac{\partial \psi }{\partial {\hat{x}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=0\). Thus, \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) is a KKT point and, by convexity, is an optimal solution. Substituting in (34), we find the result. \(\square \)

Lemma 4

If \(x_1> x_2\) and \(y_2(x_1-x_2)+y_1x_2\ge x_2x_1\), then \({\hat{x}}_1^*=1\) and \({\hat{y}}_1^*=1\) is an optimal solution to (M2) with objective \(\psi ({\hat{x}}_1^*,{\hat{y}}_1^*)=\frac{(y_1-x_2)^2}{x_1-x_2}+\frac{(x_2-y_2)^2}{x_2} \cdot \)

Proof

Note that since \(x_2\ge y_2\) and \(y_2(x_1-x_2)+y_1x_2\ge x_2x_1\), we have \(x_2(x_1-x_2)+y_1x_2\ge x_2x_1\Leftrightarrow y_1\ge x_2\) and, in particular, \({\hat{y}}_1^*\le \frac{y_1}{x_2}\). Additionally, it is easily checked that all other constraints (32)–(33) are satisfied. From \(y_2(x_1-x_2)+y_1x_2\ge x_2x_1\) we find that \(\frac{x_2-y_2}{x_2}\le \frac{y_1-x_2}{x_1-x_2}\). Now let \(\mu _1\) and \(\mu _2\) be the dual variables associated with constraints \({\hat{y}}_1\le {\hat{x}}_1\) and \({\hat{x}}_1\le 1\), respectively. Since both constraints are satisfied at equality at \(({\hat{x}}_1^*,{\hat{y}}_1^*)\), the dual variables \(\mu _1\) and \(\mu _2\) may take positive values without violating complementary slackness. In particular, let \(\mu _1^*=2x_2\left( \frac{y_1-x_2}{x_1-x_2}-\frac{x_2-y_2}{x_2}\right) \ge 0\) and \(\mu _2^*=x_2\left( \frac{y_1-x_2}{x_1-x_2}-\frac{x_2-y_2}{x_2}\right) \left( \frac{x_1-y_1}{x_1-x_2}+\frac{y_2}{x_2}\right) \ge 0\). Then, \( \frac{\partial \psi }{\partial {\hat{y}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=\mu _1^* \text { and } \frac{\partial \psi }{\partial {\hat{x}}_1}({\hat{x}}_1^*,{\hat{y}}_1^*)=-\mu _1^*+\mu _2^*. \) Thus \(({\hat{x}}_1^*,{\hat{y}}_1^*)\) corresponds to a KKT point and, by convexity, is optimal. Substituting in (34) gives the result. \(\square \)

Note that Lemmas 2, 3 and 4 cover all cases with \(y_1\ge y_2\). We can now prove the main result.

Proof (Theorem 2)

If \(y_1\ge y_2\), the description of the convex hull follows directly from Lemmas 2, 3 and 4. If \(y_1\le y_2\), the result follows from symmetry. \(\square \)

4.3 Valid inequalities for S

Similar to the discussion in Sect. 3.3, the description of \(\text {conv}(X)\) can be used to derive strong extended convex relaxations for S. In order to obtain (nonlinear) inequalities in the original space of variables, we project out the auxiliary variables for a given ordering \(y_1\ge \cdots \ge y_n\) of the continuous variables with additional restrictions corresponding to the conditions \(x_j(x_i-y_i)\le y_j(x_i-x_j)\) in (12). Finally, to obtain linear inequalities valid independently of these conditions, we derive first order approximations.

Suppose \(y_1\ge \cdots \ge y_n\), and \(x_j(x_i-y_i)\le y_j (x_i-x_j)\) for \(j>i\), which holds, in particular, if \(x=y\). By eliminating the auxiliary variables under these conditions we obtain the inequality

$$\begin{aligned} \phi (x,y)=\sum _{i \in {\bar{P}}} {\bar{Q}}_i y_i + \sum _{i \in P} {\bar{Q}}_i y_i^2/x_i - \sum _{i=1}^n\sum \limits _{j=i+1}^n Q_{ij}\left( \frac{(y_i-x_j)^2}{x_i-x_j}+\frac{(x_j-y_j)^2}{x_j}\right) \le t. \end{aligned}$$
(36)

Inequality (36) is only valid for the particular permutation of the continuous variables and when the conditions \(x_j(x_i-y_i)\le y_j (x_i-x_j)\) for \(j>i\) hold. Since \( \sum _{i \in {\bar{P}}} {\bar{Q}}_i {\bar{y}}_i + \sum _{i \in P} {\bar{Q}}_i {\bar{y}}_i^2/{{\bar{x}}}_i - \sum _{i=1}^n \sum _{j=i+1}^n Q_{ij} g({{\bar{x}}}_i,{{\bar{x}}}_j,{\bar{y}}_i,{\bar{y}}_j)=\phi ({{\bar{x}}}, {\bar{y}})\), we can find valid subgradient inequalities by taking gradients of the left-hand-side of (36). Let \(\pi _i=Q_{ii}+2\sum _{j=1}^{i-1}Q_{ij}\) and \(\alpha _i=2\sum _{j=1}^iQ_{ij}\), and recall \({\bar{Q}}_i=\sum _{j=1}^nQ_{ij}\). The partial derivatives of \(\phi \) evaluated at a point \(({\bar{x}},{\bar{y}})\) where \({\bar{x}}={\bar{y}}\) are as follows:

$$\begin{aligned} \frac{\partial \phi }{\partial x_i}({\bar{x}},{\bar{y}})&=\sum _{j=i+1}^n Q_{ij}+\sum _{j=1}^{i-1}Q_{ij}-\bar{Q}_i=-Q_{ii}=\pi _i-\alpha _i,&\quad i\in P\\ \frac{\partial \phi }{\partial x_i}({\bar{x}},{\bar{y}})&=\sum _{j=i+1}^n Q_{ij}+\sum _{j=1}^{i-1}Q_{ij}=\pi _i-\alpha _i+{\bar{Q}}_i,&\quad i\in {\bar{P}}\\ \frac{\partial \phi }{\partial y_i}({\bar{x}},{\bar{y}})&=-2\sum _{j=i+1}^n Q_{ij}+2\bar{Q}_i=\alpha _i,&\quad i \in P\\ \frac{\partial \phi }{\partial y_i}({\bar{x}},{\bar{y}})&=-2\sum _{j=i+1}^n Q_{ij}+\bar{Q}_i=\alpha _i-{\bar{Q}}_i,&\quad i \in {\bar{P}}. \end{aligned}$$

Thus, since \(\phi ({\bar{x}},{\bar{y}})+\nabla \phi ({\bar{x}},{\bar{y}})(x-{\bar{x}},y-{\bar{y}})\le \sum _{i \in {\bar{P}}} {\bar{Q}}_i y_i + \sum _{i \in P} {\bar{Q}}_i y_i^2/x_i - \sum _{i=1}^n \sum _{j=i+1}^n Q_{ij}\, g(x_i,x_j,y_i,y_j)\le t\), we obtain the linear inequality

$$\begin{aligned} \sum _{i=1}^n\pi _i x_i\le t+\sum _{i=1}^n\alpha _i(x_i-y_i)-\sum _{i\in {\bar{P}}}{\bar{Q}}_i(x_i-y_i). \end{aligned}$$
(37)

Observe that inequality (37) depends only on the ordering of \({\bar{x}}\), but not on the actual values.
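A minimal sketch (assuming numpy; the function name is ours) of computing the coefficients \(\pi \) and \(\alpha \) of inequality (37) for a given ordering of the variables:

```python
import numpy as np

def polymatroid_cut(Q, order):
    """Coefficients of (37): pi'x <= t + alpha'(x - y) - sum_{i in Pbar} Qbar_i (x_i - y_i)."""
    n = Q.shape[0]
    pi, alpha = np.zeros(n), np.zeros(n)
    for rank, i in enumerate(order):
        prev = list(order[:rank])                 # variables ranked before i
        pi[i] = Q[i, i] + 2 * Q[i, prev].sum()
        alpha[i] = 2 * (Q[i, prev].sum() + Q[i, i])
    Qbar = Q.sum(axis=1)                          # Pbar = {i : Qbar[i] <= 0}
    return pi, alpha, Qbar
```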

Remark 3

Consider the submodular function given by \(q(x)=x'Qx\). The extreme points of the extended polymatroid [20] associated with q, \(\Pi \), correspond to the vectors \(\pi \) in inequality (37); thus, the convex lower envelope of q is described by the function \({\bar{q}}(x)=\max _{\pi \in \Pi }\pi 'x\) [34]. Atamtürk and Bhardwaj [4] employ these polymatroid inequalities for the binary case. For the mixed-integer case, the inequality (37) is tight for the binary restriction \(x=y\), and the right hand side is relaxed as the distance between x and y increases.

Remark 4

The values \(\alpha _i\) in inequality (37) correspond to the derivative of q(x) with respect to \(x_i\) when \(x_j=1\) for all \(j\le i\) and \(x_j=0\) for \(j>i\). Atamtürk and Jeon [6] use lifting to derive similar inequalities for another class of nonlinear functions with indicator variables and submodular binary restriction.

5 Valid conic quadratic inequalities for X

The inequalities \(f(x,y)\le t\) and \(g(x,y)\le t\) derived in Sects. 3 and 4 for \(X_U\) and X, respectively, cannot be directly used within off-the-shelf solvers in the original space of variables as they are piecewise functions. However, since they are convex, they can be implemented using gradient outer-approximations at differentiable points (as discussed in Sects. 3.3 and 4.3): given a fractional point \(({\bar{x}},{\bar{y}})\) with \({\bar{x}}>0\) and a subgradient \(\xi \in \partial g({\bar{x}},{\bar{y}})\), the inequality

$$\begin{aligned} g({\bar{x}},{\bar{y}})+\xi '(x-{\bar{x}},y-{\bar{y}})\le t \end{aligned}$$
(38)

can be used as a cutting plane to improve the continuous relaxation. However, such an approach may require adding too many inequalities (38) to the formulation, possibly resulting in poor performance (see also Sects. 7.1 and 7.3 for additional discussion on computations). Alternatively, an extended formulation could be used [e.g., [17, 22]]; however, such formulations may require a prohibitively large number of variables, resulting in hard-to-solve convex formulations and poor performance in branch-and-bound algorithms. Therefore, in this section we give valid conic quadratic inequalities that provide a strong approximation of \(\text {conv}(X)\) and can be readily used within conic quadratic solvers.

5.1 Derivation of the inequalities

Let \(L_2=\left\{ (x,y,t)\in X: x_2=0\right\} \) and observe that

$$\begin{aligned} \text {conv}(L_2)=\left\{ (x,y,t)\in [0,1]^2\times \mathbb {R}_+^2\times \mathbb {R}:\frac{y_1^2}{x_1}\le t,\; y_1\le x_1,\; x_2=y_2=0\right\} . \end{aligned}$$

We now consider inequalities obtained by lifting the valid inequality \(\frac{y_1^2}{x_1}\le t\) for \(\text {conv}(L_2)\), i.e., inequalities of the form

$$\begin{aligned} \frac{y_1^2}{x_1}+h(x_2,y_2)\le t \end{aligned}$$
(39)

for X, where \(h:[0,1]\times \mathbb {R}_+\rightarrow \mathbb {R}\). We additionally require the left hand side of (39) to be convex, which is the case if and only if h is convex.

Proposition 7

Inequality

$$\begin{aligned} \frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}-2y_2\le t \end{aligned}$$
(40)

is valid for X and is the strongest convex inequality of the form (39).

Proof

Any valid inequality of the form (39) needs to satisfy

$$\begin{aligned} h(x_2,y_2)\le \alpha =\min \; \left\{ (y_1-y_2)^2 -\frac{y_1^2}{x_1} \ : \ 0\le y_1\le x_1,\; x_1\in \{0,1\} \right\} \cdot \end{aligned}$$

If \(x_1=0\), then \(\alpha =y_2^2\); else, \(\alpha =-2y_1y_2+y_2^2\). Thus, \(y_1=x_1=1\) is a minimizer. We also find that \(h(x_2,y_2)\le y_2^2-2y_2\) for \(x_2\in \{0,1\}\). To find the strongest convex inequality, we compute \(\text {conv}(W)\), where \(W=\left\{ (x_2,y_2,t_2)\in \{0,1\}\times \mathbb {R}_+^2: y_2^2-2y_2\le t_2,\; y_2\le x_2\right\} .\) Using the perspective reformulation, one sees that

$$\begin{aligned} \text {conv}(W)=\left\{ (x_2,y_2,t_2)\in [0,1]\times \mathbb {R}_+^2: \frac{y_2^2}{x_2}-2y_2\le t_2,\; y_2\le x_2\right\} , \end{aligned}$$

and we get inequality (40). \(\square \)

By changing the lifting order, we also get the valid inequality \( \frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}-2y_1\le t \); writing the two inequalities more compactly, we arrive at the convex valid inequality

$$\begin{aligned} \frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}-2\min \{y_1,y_2\}\le t. \end{aligned}$$
(41)

Remark 5

Observe that inequality (41) dominates inequality (4) since

$$\begin{aligned} \frac{y_1^2}{x_1}-x_2=\frac{y_1^2}{x_1}-y_2-(x_2-y_2)\le \frac{y_1^2}{x_1}-y_2-(x_2-y_2)\frac{y_2}{x_2}= \frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}-2y_2. \end{aligned}$$

Similarly, we find that (41) dominates inequality (3).

Remark 6

For the binary case, \(y_i=x_i\), \(i=1,2\), (41) reduces to \(|x_1 - x_2| \le t\).
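Because of the min term, (41) is typically passed to a conic solver as the pair of inequalities (40) and its counterpart with the lifting order reversed. A minimal sketch of the resulting continuous relaxation of X, assuming cvxpy is available (the objective coefficients are arbitrary and ours; cf. Proposition 8 below, which suggests that for such coefficient signs an optimal solution integral in x exists):

```python
import cvxpy as cp

x, y, t = cp.Variable(2), cp.Variable(2), cp.Variable()
constraints = [
    cp.square(y[0] - y[1]) <= t,                                                  # original constraint
    cp.quad_over_lin(y[0], x[0]) + cp.quad_over_lin(y[1], x[1]) - 2 * y[1] <= t,  # inequality (40)
    cp.quad_over_lin(y[0], x[0]) + cp.quad_over_lin(y[1], x[1]) - 2 * y[0] <= t,  # reversed lifting order
    y >= 0, y <= x, x <= 1,
]
# a = (1, 1) >= 0 and b = (-1.5, -1.5) <= 0, matching the sign pattern of Proposition 8.
prob = cp.Problem(cp.Minimize(x[0] + x[1] - 1.5 * (y[0] + y[1]) + t), constraints)
prob.solve()
print(x.value, y.value, t.value)
```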

5.2 Strength of the inequalities

In order to assess the strength of inequality (41), we consider the optimization problem

$$\begin{aligned} \min \;&a_1x_1+a_2x_2+b_1y_1+b_2y_2+t\\ \text {s.t.}\;&(y_1-y_2)^2\le t\\ \text {(SR)} \ \ \ \ \ \ \ \ \,&\frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}-2\min \{y_1,y_2\}\le t\\&0\le y_1\le x_1\le 1\\&0 \le y_2\le x_2 \le 1. \end{aligned}$$

Inequalities (41) are not sufficient to guarantee the integrality of x in the optimal solutions of (SR) for all values of a and b, since they do not describe \(\text {conv}(X)\) (given in Sect. 4). However, we now show that optimal solutions of (SR) are indeed integral under mild assumptions on the coefficients a and b. First, we prove an auxiliary lemma.

Lemma 5

If there exists an optimal solution to (SR) with \(y_i \in \{0,1\}\) for some \(i\in \{1,2\}\), then there exists an optimal solution that is integral in x.

Proof

If \(y_1=0\), then clearly there is an optimal solution with \(x_1 \in \{0,1\}\), depending on the sign of \(a_1\). Moreover, (SR) reduces to \(\min _{0\le y_2\le x_2\le 1}\left\{ a_2x_2+b_2y_2+y_2^2/x_2\right\} ,\) which has an optimal integral solution in \(x_2\). On the other hand, if \(y_1=x_1=1\), then (SR) reduces to \(\min _{0\le y_2\le x_2\le 1}\left\{ a_2x_2+(b_2-2)y_2+y_2^2/x_2\right\} ,\) which, again, has an optimal integral solution in \(x_2\). The case with \(y_2 \in \{0,1\}\) is symmetric. \(\square \)

Proposition 8

If \(a_1,a_2\) have the same sign and \(b_1,b_2\) have the same sign, then (SR) has an optimal solution that is integral in x.

Proof

Note that if \(a_1, a_2\le 0\), then \(x_1=x_2=1\) for an optimal solution of (SR). Also, if \(b_1,b_2\ge 0\), then \(y_1=y_2=0\) in an optimal solution of (SR), in which case x is integral in extreme point solutions. It remains to show that if \(a_1,a_2\ge 0\) and \(b_1,b_2\le 0\), then there exists an optimal solution of (SR) that is integral in x.

Suppose that \(y_1=y_2=y\) in an optimal solution. Then \((y_1-y_2)^2=0\) and \(\frac{y^2}{x_1}+\frac{y^2}{x_2}-2y\le 0\). Thus, \(t=0\) and (SR) reduces to \( \min \left\{ a_1x_1+a_2x_2+(b_1+b_2)y : 0\le y\le \min \{x_1,x_2\}\le 1 \right\} , \) which has an optimal solution integral in x.

Now suppose, without loss of generality, there is an optimal solution with \(1> y_1>y_2>0\) (if \(y_1=1\) or \(y_2=0\) then by Lemma 5 the solution is integral in x). Then observe that, in this case, the functions \((y_1-y_2)^2\) and \(y_2^2/x_2-2y_2\) are non-increasing in \(y_2\). Since \(b_2\le 0\), there exists a solution where \(y_2\) is at its upper bound, i.e., \(y_2=x_2\). Thus problem (SR) reduces to

$$\begin{aligned} (\text {SR}^{\prime }) \ \ \min \left\{ a_1x_1+b_1y_1+(a_2+b_2)y_2+t: (y_1-y_2)^2\le t,\ \frac{y_1^2}{x_1}-y_2\le t,\ y_1\le x_1\le 1 \right\} \cdot \end{aligned}$$

Let \((\lambda , \mu , \alpha , \beta )\) be the dual variables associated with the \(\le \) constraints displayed in the order above and consider the dual feasibility conditions of problem (\(\hbox {SR}^{\prime }\))

$$\begin{aligned}&-a_1=-\mu \frac{y_1^2}{x_1^2}-\alpha +\beta \\&-b_1=2\lambda (y_1-y_2)+2\mu \frac{y_1}{x_1}+\alpha \\&-(a_2+b_2)=-2\lambda (y_1-y_2)-\mu \\&1=\lambda +\mu \\&0\le \lambda ,\mu ,\alpha ,\beta . \end{aligned}$$

Let \(({\bar{x}}_1,{\bar{y}}_1,{\bar{y}}_2,{\bar{t}})\) be a KKT point with multipliers \(({\bar{\lambda }},{\bar{\mu }},{\bar{\alpha }}, {\bar{\beta }})\) and suppose that \({\bar{x}}_1<1\). Then observe that for small \(\epsilon >0\), \((\frac{{\bar{y}}_1+\epsilon }{{\bar{y}}_1}{\bar{x}}_1,{\bar{y}}_1+\epsilon ,{\bar{y}}_2+\epsilon ,{\bar{t}})\) is also a KKT point with the same multipliers. In particular, by choosing \(\epsilon \) so that \(1=\frac{{\bar{y}}_1+\epsilon }{{\bar{y}}_1}{\bar{x}}_1\), we see that there is an optimal solution with \(x_1=1\). Then, problem (\(\hbox {SR}^{\prime }\)) further simplifies to

$$\begin{aligned} \text {(SR}^{\prime \prime }) \ \ \ \ \min \{ b_1y_1+(a_2+b_2)y_2+t: (y_1-y_2)^2\le t, y_1^2-y_2\le t \} \cdot \end{aligned}$$

It remains to show that \(y_2=x_2\) is integral. Note that

$$\begin{aligned} y_1^2-2y_1y_2+y_2^2=y_1^2-y_2(2y_1-1)\ge y_1^2-y_2, \end{aligned}$$

and, therefore, constraint \(y_1^2-y_2\le t\) is not binding when \(y_1 < 1\). So, (\(\hbox {SR}^{\prime \prime }\)) is equivalent to \(\min b_1y_1+(a_2+b_2)y_2+(y_1-y_2)^2\). However, by increasing or decreasing \(y_1\) and \(y_2\) by the same amount it is easy to check that there exists an optimal solution where either \(y_1=1\) or \(y_2=0\), and from Lemma 5 there exists an optimal integral solution. \(\square \)

Proposition 8 provides some insight on the problems for which inequalities (41) may be particularly effective: if the coefficients of the binary variables and the continuous variables have the same sign, then the relaxation induced by (41) may be close to ideal; otherwise, using subgradient inequalities may be required to find strong formulations. In our computations, this simple rule of thumb indeed results in the best performance.

6 Extensions to other quadratic functions with two indicator variables

In this paper we focus on the set X, i.e., a mixed-integer set with non-negative continuous variables and non-positive off-diagonal entries in the quadratic matrix. Although an in-depth study of more general quadratic functions is outside the scope of this paper, the approach used in Sect. 5 can be naturally extended to other quadratic functions. We briefly discuss two such extensions.

6.1 General quadratic functions

Observe that a general quadratic function \(y'Ay\) can be decomposed as

$$\begin{aligned} y'Ay=\sum _{i=1}^n\left( \left( A_{ii}-\sum _{j\ne i}|A_{ij}|\right) y_i^2-\sum _{j>i:A_{ij}<0} A_{ij}(y_i-y_j)^2+\sum _{j>i:A_{ij}>0} A_{ij}(y_i+y_j)^2\right) . \end{aligned}$$
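As a quick sanity check, the following snippet (ours, not from the paper) verifies numerically that this decomposition is an algebraic identity for an arbitrary symmetric matrix.

```python
# Verify the decomposition of y'Ay given above on a random symmetric matrix.
import numpy as np

rng = np.random.default_rng(0)
n = 6
B = rng.normal(size=(n, n))
A = (B + B.T) / 2                      # arbitrary symmetric matrix
y = rng.normal(size=n)

lhs = y @ A @ y
rhs = 0.0
for i in range(n):
    rhs += (A[i, i] - sum(abs(A[i, j]) for j in range(n) if j != i)) * y[i] ** 2
    for j in range(i + 1, n):
        if A[i, j] < 0:
            rhs += -A[i, j] * (y[i] - y[j]) ** 2
        elif A[i, j] > 0:
            rhs += A[i, j] * (y[i] + y[j]) ** 2
print(np.isclose(lhs, rhs))            # True: the two expressions coincide
```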

Thus, stronger formulations for general quadratic functions may be obtained by studying the set with two continuous and two indicator variables and positive off-diagonal term

$$\begin{aligned} X_+=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}_+^2 \times \mathbb {R}: (y_1+y_2)^2\le t,\; y_i\le x_i, \ i=1,2\right\} . \end{aligned}$$

Proposition 9

Inequality

$$\begin{aligned} \frac{y_1^2}{x_1}+\frac{y_2^2}{x_2}\le t \end{aligned}$$
(42)

is valid for \(X_+\) and is the strongest among inequalities of the form (39).

The proof is analogous to the proof of Proposition 7 and is omitted for brevity. Although inequality (42) is similar in spirit to (40) and is the strongest among inequalities of the form (39), it is not as strong for \(X_+\) as (40) is for X. In particular, an integrality result similar to Proposition 8 does not hold for (42).
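As a simple illustration of the validity claim, the following snippet (ours) samples random points of \(X_+\) with \(t=(y_1+y_2)^2\) and confirms that inequality (42) is never violated (using the convention \(y_i^2/x_i=0\) when \(x_i=0\)).

```python
# Sampling check (ours) of the validity of (42) on X_+.
import numpy as np

rng = np.random.default_rng(0)

def persp(y, x):                       # y^2 / x with the convention 0/0 = 0
    return 0.0 if x == 0 else y * y / x

violations = 0
for _ in range(100_000):
    x = rng.integers(0, 2, size=2)     # x in {0,1}^2
    y = rng.uniform(size=2) * x        # 0 <= y_i <= x_i
    t = (y[0] + y[1]) ** 2
    if persp(y[0], x[0]) + persp(y[1], x[1]) > t + 1e-9:
        violations += 1
print("violations:", violations)       # expected output: 0
```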

6.2 Quadratic functions with continuous variables unrestricted in sign

Consider the set

$$\begin{aligned} X_{\pm }=\left\{ (x,y,t)\in \{0,1\}^2\times \mathbb {R}^2\times \mathbb {R}: (y_1\pm y_2)^2\le t,\; -x_i\le y_i\le x_i \text { for }i=1,2\right\} . \end{aligned}$$

Observe that, since the continuous variables can be positive or negative, the sign inside the quadratic expression does not matter (e.g., it can be flipped via the transformation \({\bar{y}}_2=-y_2\)). Thus we assume, without loss of generality, that it is a minus sign.

Proposition 10

Inequality (4), originally proposed by Jeon et al. [29], is valid for \(X_\pm \) and is the strongest among inequalities of the form (39).

Proof

Any valid inequality for \(X_{\pm }\) of the form (39) needs to satisfy

$$\begin{aligned} h(x_2,y_2)\le \alpha =\min \;&\left\{ (y_1-y_2)^2 -\frac{y_1^2}{x_1} \ : \ -x_1\le y_1\le x_1,\; x_1\in \{0,1\} \right\} \cdot \end{aligned}$$

If \(x_1=0\), then \(y_1=0\) and \(\alpha =y_2^2\). Otherwise, \(\alpha =\min _{-1\le y_1\le 1}\;-2y_1y_2+y_2^2\); in this case, the minimum is attained at \(y_1^*=1\) if \(y_2\ge 0\) and at \(y_1^*=-1\) otherwise. Thus, we find that \(h(x_2,y_2)\le y_2^2-2|y_2|\) for \(x_2\in \{0,1\}\). To find the strongest convex inequality, we compute \(\text {conv}(W_{\pm })\), where \(W_{\pm }=\left\{ (y_2,x_2,t_2)\in \mathbb {R}\times \{0,1\}\times \mathbb {R}: y_2^2-2|y_2|\le t_2,\; -x_2\le y_2\le x_2\right\} .\) The convex lower envelope of the one-dimensional non-convex function \(h_1(y_2)=y_2^2-2|y_2|\) for \(y_2\in [-1,1]\) is the constant function equal to \(-1\). Moreover, it can be shown that

$$\begin{aligned} \text {conv}(W_{\pm })=\left\{ (y_2,x_2,t_2)\in \mathbb {R}\times [0,1]\times \mathbb {R}: -x_2\le t_2,\; -x_2\le y_2\le x_2\right\} \end{aligned}$$

and we get the convex valid inequality \(\frac{y_1^2}{x_1}-x_2\le t\) for \(X_{\pm }\). \(\square \)

In light of Proposition 10, inequalities (40)–(41) can be interpreted as strengthenings of the valid inequalities proposed by Jeon et al. [29] that additionally account for the non-negativity of the continuous variables. Moreover, although not explicitly considered by Jeon et al., their inequalities may be particularly effective for quadratic optimization problems with indicator variables and continuous variables unrestricted in sign. Observe that inequalities (3)–(5) are indeed valid even if the variables are not required to be non-negative, in contrast with the inequalities \(f(x,y)\le t\), \(g(x,y)\le t\) and (41), which exploit the non-negativity of the variables and are valid only in that case.

7 Computations

In this section we report a summary of computational experiments performed to test the effectiveness of the proposed inequalities in a branch-and-bound algorithm. All experiments are conducted using the Gurobi 7.5 solver on a workstation with a 3.60GHz Intel® Xeon® E5-1650 CPU and 32 GB main memory, using a single thread. The time limit is set to one hour and Gurobi’s default settings are used (except for the parameter “PreCrush”, which is set to 1 in order to use cuts). Cuts (if used) are added only at the root node using the callback features of Gurobi, and the reported times include the time used to add cuts.

7.1 Image segmentation with \(\ell _0\) penalty

Given a finite set N, functions \(d_i:\mathbb {R}\rightarrow \mathbb {R}_+\) for \(i\in N\) and \(s_{ij}:\mathbb {R}\rightarrow \mathbb {R}_+\) for \(i\ne j\), consider

$$\begin{aligned} (D)\ \ \ \ \ \ \ \ \ \ \min _{y\in Y}\;&\sum _{i\in N}d_i(y_i)+\sum _{i\ne j}s_{ij}(y_i-y_j), \end{aligned}$$

where \(Y\subseteq \mathbb {R}_+^N\). Problem (D) arises as the Markov Random Fields (MRF) problem for image segmentation, see [16, 32]. In the MRF context, \(d_i\) are the deviation penalty functions, used to model the cost of changing the value of a pixel from the observed value \(p_i\) to \(y_i\), e.g., \(d_i(y_i)=c_i (p_i-y_i)^2\) with \(c_i\in \mathbb {R}_+\); functions \(s_{ij}\) are the separation penalty functions, used to model the cost of having adjacent pixels with different values, e.g., \(s_{ij}(y_i-y_j)=c_{ij}(y_i-y_j)^2\) with \(c_{ij} > 0\) if pixels i and j are adjacent, and \(s_{ij}(y_i-y_j)=0\) otherwise. Often, \(Y=[0,1]^N\) or is given by a suitable discretization, i.e., y is a vector of integer multiples of a parameter \(\varepsilon \). We consider in our computations the case \(Y=[0,1]^N\), but the proposed approach can be used with any Y.

Problem (D) can be cast as the nonlinear dual of the undirected minimum cost network flow problem [1], and efficient algorithms exist when all functions are convex [27]. In contrast, we consider here the case where the deviation functions involve a non-convex \(\ell _0\) penalty, which is often used to induce sparsity, e.g., restricting the number of pixels that can have a color different from the background color. In particular, \(d_i(y_i)=a_i\Vert y_i\Vert _0+\bar{d}_i(y_i)\) with \(\bar{d}_i(y_i)=c_i(p_i-y_i)^2\). Thus, the problem can be formulated as

$$\begin{aligned} \min \; \sum _{i\in N}a_ix_i+\sum _{i\in N}c_i(p_i-y_i)^2+\sum _{i\ne j}c_{ij}t_{ij} \text { s. t. }(x_i,x_j,y_i,y_j,t_{ij})\in X,\; \forall i\ne j. \end{aligned}$$
(43)

Instances The instances are constructed as follows. The elements of N correspond to points in a \(k \times k\) grid, thus \(n=k^2\), and separation functions \(s_{ij}\) are non-zero whenever the corresponding points are adjacent in the grid. The parameters \(p_i\) for \(i\in N\), and \(c_{ij}\) for each pair of adjacent points \(i,j\in N\) are drawn uniformly between 0 and 1. We set \(a_i=c_i\), where \(c_i\) is generated as follows: first we draw \(\tilde{c}_i\) uniformly between 0 and 1 for all \(i\in N\), let \(C_1=\sum _{i\in N}\tilde{c}_i\) and \(C_2=\sum _{i:p_i\ge 0.5}(2p_i-1)\); then we set \(c_i=\tilde{c}_i\frac{C_1}{C_2}\). Instances generated with these parameters are observed to have large integrality gaps.
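For concreteness, a possible implementation of this instance generator is sketched below (variable names and the order of the random draws are ours).

```python
# Instance generator for Sect. 7.1, following the description above.
import numpy as np

def generate_instance(k, seed=0):
    rng = np.random.default_rng(seed)
    n = k * k

    def idx(r, c):                               # pixel (r, c) -> index in {0, ..., n-1}
        return r * k + c

    edges = ([(idx(r, c), idx(r, c + 1)) for r in range(k) for c in range(k - 1)]
             + [(idx(r, c), idx(r + 1, c)) for r in range(k - 1) for c in range(k)])
    p = rng.uniform(size=n)                      # observed pixel values p_i
    c_sep = {e: float(rng.uniform()) for e in edges}   # separation weights c_ij (adjacent pairs)
    c_tilde = rng.uniform(size=n)
    C1 = c_tilde.sum()
    C2 = sum(2 * p[i] - 1 for i in range(n) if p[i] >= 0.5)
    c = c_tilde * C1 / C2                        # deviation weights c_i
    a = c.copy()                                 # fixed charges a_i = c_i
    return p.tolist(), c.tolist(), a.tolist(), c_sep, edges
```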

Formulations We test the following formulations for solving problem (43):

Basic :

The natural formulation

$$\begin{aligned} \min \;\sum _{i\in N}a_ix_i+\sum _{i\in N}c_i(p_i-y_i)^2+\sum _{i\ne j}c_{ij}(y_i-y_j)^2 \text { s.t. }0\le y\le x,\; x\in \{0,1\}^N. \end{aligned}$$
Perspective :

The perspective reformulation implemented with rotated cone constraints

$$\begin{aligned} \sum _{i\in N}c_ip_i^2+\min \;&\sum _{i\in N}a_ix_i+\sum _{i\in N}c_i\left( -2p_iy_i+z_i\right) +\sum _{i\ne j}c_{ij}(y_i-y_j)^2\\ \text { s.t.}\;&y_i^2\le z_ix_i,\; \forall i\in N\\&0\le y\le x,\;z\ge 0,\; x\in \{0,1\}^N. \end{aligned}$$
Conic :

The formulation with the conic quadratic inequalities (41)

$$\begin{aligned} \sum _{i\in N}c_ip_i^2+\min \;&\sum _{i\in N}a_ix_i+\sum _{i\in N}c_i\left( -2p_iy_i+z_i\right) +\sum _{i\ne j}c_{ij}t_{ij}\\ \text { s.t.}\;&y_i^2\le z_ix_i,\; \forall i\in N\\&(y_i-y_j)^2\le t_{ij},\; z_i+z_j-2y_i\le t_{ij},\;z_i+z_j-2y_j\le t_{ij},\; \forall i\ne j\\&0\le y\le x,\;z\ge 0,\; x\in \{0,1\}^N. \end{aligned}$$

Furthermore, we also test models Perspective+cuts and Conic+cuts, where the subgradient inequalities (38) are used as cutting planes to strengthen the Perspective and Conic formulations, respectively. If \({\bar{x}}_i=0\) for some \(i\in N\) then we use the first-order expansion around \({\bar{x}}_i=10^{-5}\) instead.
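For illustration, a minimal gurobipy sketch (ours, not the authors' implementation) of the Conic formulation is given below. It builds the model on a small grid with simplified uniform data standing in for the generator sketched earlier, and it introduces \(t_{ij}\) only for adjacent pairs, since \(c_{ij}=0\) otherwise.

```python
# A gurobipy sketch (ours) of the Conic formulation above on a small k x k grid.
import gurobipy as gp
from gurobipy import GRB
import numpy as np

rng = np.random.default_rng(1)
k = 4
n = k * k

def idx(r, c):
    return r * k + c

E = ([(idx(r, c), idx(r, c + 1)) for r in range(k) for c in range(k - 1)]
     + [(idx(r, c), idx(r + 1, c)) for r in range(k - 1) for c in range(k)])
p = rng.uniform(size=n).tolist()       # observed pixel values p_i
c = rng.uniform(size=n).tolist()       # deviation weights c_i
a = list(c)                            # fixed charges a_i = c_i
s = {e: float(rng.uniform()) for e in E}   # separation weights c_ij (adjacent pairs only)

m = gp.Model("conic")
x = m.addVars(n, vtype=GRB.BINARY, name="x")
y = m.addVars(n, ub=1.0, name="y")
z = m.addVars(n, name="z")
t = m.addVars(E, name="t")             # one t_ij per adjacent pair (c_ij = 0 otherwise)

m.addConstrs((y[i] <= x[i] for i in range(n)))
m.addConstrs((y[i] * y[i] <= z[i] * x[i] for i in range(n)))   # perspective terms
for i, j in E:                                                 # inequalities (41)
    m.addConstr((y[i] - y[j]) * (y[i] - y[j]) <= t[i, j])
    m.addConstr(z[i] + z[j] - 2 * y[i] <= t[i, j])
    m.addConstr(z[i] + z[j] - 2 * y[j] <= t[i, j])

m.setObjective(
    sum(c[i] * p[i] * p[i] for i in range(n))                  # constant term
    + gp.quicksum(a[i] * x[i] + c[i] * (-2 * p[i] * y[i] + z[i]) for i in range(n))
    + gp.quicksum(s[i, j] * t[i, j] for i, j in E),
    GRB.MINIMIZE)
m.optimize()
```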

Results Table 1 shows a comparison of the performance of the algorithm for each formulation for varying grid sizes. Each row in the table represents the average for five instances for a grid size. Table 1 displays the initial gap (igap), the root gap improvement (rimp), the number of branch and bound nodes (nodes), the elapsed time in seconds (time), and the end gap at termination (egap) (in brackets, we report the number of instances solved to optimality within the time limit). The initial gap is computed as \(\texttt {igap}=\frac{\texttt {obj}_{\texttt {best}}-\texttt {obj}_{\texttt {cont}}}{\left| \texttt {obj}_{\texttt {best}}\right| }{\times 100}\), where \(\texttt {obj}_{\texttt {best}}\) is the objective value of the best feasible solution found and \(\texttt {obj}_{\texttt {cont}}\) is the objective of the continuous relaxation of Basic. The root improvement is computed as \(\texttt {rimp}= \frac{\texttt {obj}_{\texttt {relax}}-\texttt {obj}_{\texttt {cont}}}{\texttt {obj}_{\texttt {best}}-\texttt {obj}_{\texttt {cont}}}{\times 100}\), where \(\texttt {obj}_{\texttt {relax}}\) is the objective value of the relaxation obtained after processing the first node of the branch-and-bound tree for a given formulation, obtained by querying Gurobi’s attribute “ObjBound” at the root node using a callback.

We observe that the Basic formulation requires a substantial amount of branching before proving optimality, resulting in long solution times. The Perspective formulation results in a root gap improvement close to 50% and in better times and end gaps than the Basic formulation. However, even with the Perspective formulation, instances with \(k \times k=400\) and larger cannot be solved to optimality, leaving end gaps of 15.3% or more. In contrast, formulation Conic results in root gap improvements close to 100%, and the performance of the branch-and-bound algorithm is orders-of-magnitude better than with the Basic and Perspective formulations: instances with \(k \times k=400\) that are not close to being solved after one hour of computation with Basic and Perspective are solved to optimality in one second; while formulation Basic is able to solve instances with 100 variables in five minutes, formulation Conic solves instances with 2,500 variables, i.e., instances 25 times larger, in the same amount of time.

Table 1 Experiments with image segmentation with \(\ell _0\) penalty

Formulation Conic+cuts results in a very modest improvement in the strength of the continuous relaxation when compared with Conic (less than 0.3% additional root gap improvement) and in almost no difference in terms of nodes, times or end gaps. Observe that in (43) the coefficients of the linear objective terms corresponding to the discrete and continuous variables have the same sign, and the experimental results are consistent with Proposition 8: Conic indeed is a very close approximation of inequalities (38) in this case.

Note that if cuts are added without the approximation given by inequalities (41) (formulation Perspective+cuts), the root improvement is substantial for small instances but it degrades as the size increases. We conjecture that the required number of cuts to obtain an adequate relaxation increases with the size of the instances. Thus, for larger instances, Gurobi may stop adding cuts before obtaining a strong relaxation. Additionally, to solve second-order conic subproblems in branch-and-bound, solvers like Gurobi construct a linear outer approximation of the convex sets; adding a large number of cuts may interfere with the construction of the outer approximation, leading to weak relaxations of the convex set, which is observed for instances with \(k\times k = 10,000\). Using the approximation of the convex hull derived in Sect. 5 as a starting point appears to circumvent such numerical difficulties.

Finally, we remark that for the larger instances that are not solved to optimality by Conic, high-quality solutions and tight lower bounds are found within a few seconds, but branching is ineffective at closing the remaining gap. To illustrate, Figure 1 presents the time to prove an optimality gap of at most 1%, as a function of the dimension n of the problem. We see that the proposed approach scales very well (almost linearly) up to \(n=20{,}000\). In particular, the lower bound found corresponds to the one obtained at the root node, and the feasible solutions are found within a small number (50–60) of branch-and-bound nodes. The memory limit is reached for instances with \(n>20{,}000\).

Fig. 1 Time to prove an optimality gap of 1% with Conic as a function of the dimension \(n=k\times k\)

7.2 Portfolio optimization with transaction costs

Consider a simple portfolio optimization problem with transaction costs similar to the one discussed in [18, p.146]. However, in our case, transactions have a fixed cost and there is a restricted number of transactions. For simplicity, we first consider assets with uncorrelated returns. In this context, an M-matrix arises directly due to the buying and selling decisions. In Sect. 7.3 we present computations with a general covariance matrix, from which an M-matrix corresponding to the negatively correlated assets can be extracted to apply the reformulations.

Let N be the set of assets, \(\mu ,\sigma \in \mathbb {R}_+^N\) be the vectors of expected returns and standard deviations of returns. Let \(w\in \mathbb {R}_+^N\) denote the current holdings in each asset, let \(a^+,a^-\in \mathbb {R}_+^N\) be the fixed transaction costs associated with buying and selling any quantity, \(c^+,c^-\in \mathbb {R}^N\) be the variable transaction costs and profits of buying and selling each asset, let \(u^+,u^-\in \mathbb {R}_+^N\) be the upper bounds on the transactions, and let k be the maximum number of transactions. Then the problem of finding a minimum risk portfolio that satisfies a given expected return \(b\in \mathbb {R}\) with at most k transactions can be formulated as the mixed-integer quadratic problem:

$$\begin{aligned} \min \;&v(y)=\sum _{i\in N}\sigma _i^2 (w_i+y_i^+-y_i^-)^2\\ \text {s.t.}\;&\sum _{i\in N}\left( \mu _iw_i+y_i^+(\mu _i-c_i^+)-y_i^-(\mu _i-c_i^-)-a_i^+x_i^+-a_i^-x_i^-\right) \ge b\\&\sum _{i\in N}(x_i^++x_i^-)\le k\\&0\le y_i^+\le u_i^+x_i^+,\; 0\le y_i^-\le u_i^-x_i^-,{\;x_i^++x_i^-\le 1,}\quad \forall i\in N\\&(x^+,x^-,y^+,y^-)\in \{0,1\}^N\times \{0,1\}^N \times \mathbb {R}_+^N\times \mathbb {R}_+^N, \end{aligned}$$

where v(y) is the variance of the new portfolio, the decision variables \(y_i^+\) (\(y_i^-\)) indicate the amount bought (sold) in asset i, and the variables \(x_i^+\) (\(x_i^-\)) indicate whether asset i is bought (sold). Note that the quadratic objective function is nonseparable and the corresponding quadratic matrix is positive semi-definite but not positive definite; therefore, the classical perspective reformulation cannot be used. Additionally, observe that the portfolio optimization problem can be reformulated by adding continuous variables \(t\in \mathbb {R}_+^N\) and the constraints \((x_i^+,x_i^-,y_i^+,y_i^-,t_{i})\in X\) for all \(i\in N\), and minimizing the linear objective

$$\begin{aligned} \sum _{i\in N} \sigma _i^2(2 w_i(y_i^+-y_i^-)+t_{i}) \cdot \end{aligned}$$
(44)

Note that since each continuous variable is involved in exactly one term in the objective, the extended formulation given by (44) and constraints \((x_i^+,x_i^-,y_i^+,y_i^-,t_{i})\in \text {conv}(X)\) results in the convex envelope of v(y).

Instances The instances are constructed as follows. We set \(w_i=u_i^+=u_i^-=1\) for all \(i\in N\). Coefficients \(\sigma _i\) are drawn uniformly between 0 and 1, \(\mu _i\) are drawn uniformly between 0 and \(2\sigma _i\), the transaction costs and profits \(c_i^+\) and \(c_i^-\) are drawn uniformly between 0 and \(\mu _i\), and the fixed costs \(a_i^+\) and \(a_i^-\) are drawn uniformly between 0 and \((\mu _i-c_i^+)\) and \((\mu _i-c_i^-)\), respectively. The target return is set to \(\beta \sum _{i\in N}\mu _i\), where \(\beta > 0\) is a parameter; k is set to n/10.

Formulations We test the formulations Basic, Basic+cuts, Conic, and Conic+cuts, as defined in Sect. 7.1. As mentioned above, the perspective reformulation cannot be used for these instances.
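A minimal gurobipy sketch (ours, not the authors' implementation) of the Conic formulation for this problem is given below. It implements the extended reformulation with objective (44) and approximates \((x_i^+,x_i^-,y_i^+,y_i^-,t_{i})\in \text {conv}(X)\) with the same conic inequalities used in Sect. 7.1; the value \(\beta =0.3\) is an arbitrary choice for illustration.

```python
# Portfolio model with the conic strengthening (a sketch under the data
# generation described above; beta = 0.3 is our choice).
import gurobipy as gp
from gurobipy import GRB
import numpy as np

rng = np.random.default_rng(0)
n = 50
k = n // 10                                   # at most k transactions
w = [1.0] * n                                 # current holdings, w_i = 1
sigma = rng.uniform(size=n)
mu = rng.uniform(0, 2 * sigma)
cplus, cminus = rng.uniform(0, mu), rng.uniform(0, mu)            # c^+, c^-
aplus, aminus = rng.uniform(0, mu - cplus), rng.uniform(0, mu - cminus)  # a^+, a^-
sigma, mu = sigma.tolist(), mu.tolist()
cplus, cminus, aplus, aminus = (v.tolist() for v in (cplus, cminus, aplus, aminus))
beta = 0.3
b = beta * sum(mu)                            # target expected return

m = gp.Model("portfolio_conic")
xp = m.addVars(n, vtype=GRB.BINARY)
xm = m.addVars(n, vtype=GRB.BINARY)
yp = m.addVars(n, ub=1.0)                     # u_i^+ = u_i^- = 1
ym = m.addVars(n, ub=1.0)
zp = m.addVars(n)
zm = m.addVars(n)
t = m.addVars(n)

m.addConstr(gp.quicksum(mu[i] * w[i] + yp[i] * (mu[i] - cplus[i]) - ym[i] * (mu[i] - cminus[i])
                        - aplus[i] * xp[i] - aminus[i] * xm[i] for i in range(n)) >= b)
m.addConstr(xp.sum() + xm.sum() <= k)
for i in range(n):
    m.addConstr(yp[i] <= xp[i])
    m.addConstr(ym[i] <= xm[i])
    m.addConstr(xp[i] + xm[i] <= 1)
    m.addConstr(yp[i] * yp[i] <= zp[i] * xp[i])               # perspective terms
    m.addConstr(ym[i] * ym[i] <= zm[i] * xm[i])
    m.addConstr((yp[i] - ym[i]) * (yp[i] - ym[i]) <= t[i])    # conic approximation of conv(X)
    m.addConstr(zp[i] + zm[i] - 2 * yp[i] <= t[i])
    m.addConstr(zp[i] + zm[i] - 2 * ym[i] <= t[i])

m.setObjective(gp.quicksum(sigma[i] ** 2 * (2 * w[i] * (yp[i] - ym[i]) + t[i])
                           for i in range(n)), GRB.MINIMIZE)
m.optimize()
```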

Results Table 2 shows the results for varying numbers of assets n and values of the expected return parameter \(\beta \). Observe that instances with lower values of \(\beta \) are more difficult to solve for the Basic formulation: low \(\beta \) results in more feasible solutions, and more branch-and-bound nodes need to be explored before proving optimality. We also see that the Basic formulation is not effective for instances with 250 or more assets, where most instances (27 out of 30) are not solved to optimality within the time limit, leaving large end gaps at termination. On the other hand, the other three formulations achieve root improvements of over \(90\%\) in most cases, and lead to much lower solution times and end gaps.

Observe that for the portfolio problem, the coefficients of \(y_i^+\) and \(y_i^-\) in the objective and return constraints have opposite signs. Thus, we expect the approximation given by \(\texttt {Conic}\) not to be as effective as in Sect. 7.1 and, therefore, the cuts to have a larger impact in closing the root gaps. Indeed, we see in these experiments that adding cuts leads to an additional 2% to 4% root improvement (compared to the 0.3% improvement observed in Sect. 7.1). In particular, formulation Basic+cuts is able to solve all instances in seconds, even instances with low values of \(\beta \) where all other formulations struggle.

Table 2 Experiments with portfolio optimization with fixed transaction costs

7.3 General convex quadratic functions

The quadratic matrices used in the previous computations had specific structures, given by the applications considered. Although our results are for M-matrices, in this section we test the strength of the formulations for more general problems, with dense matrices having both positive and negative off-diagonal entries. To employ the results developed for M-matrices, we simply apply the strengthening to the pairs of variables with a negative off-diagonal entry. Toward this end, we consider the mean-variance portfolio optimization problem

$$\begin{aligned} \min \;&y'Ay\\ \text {s.t.}\;&b'y\ge r\\ (MV)\ \ \ \ \ \ \ \ \ \,&1'x \le k\\&0\le y\le x \\&x\in \{0,1\}^n. \end{aligned}$$

Here the objective is to minimize the portfolio variance \(y'Ay\), where A is a covariance matrix, subject to meeting a target return and satisfying a sparsity constraint.

Instances In order to test the effect of positive off-diagonal elements and diagonal dominance, the matrix A is constructed as follows. Let \(\rho \ge 0\) be a parameter that controls the magnitude of the positive off-diagonal entries of A, and let \(\delta \ge 0\) be a parameter that controls the diagonal dominance of A. First, we construct a factor covariance matrix \(F=GG'\), where each entry of \(G_{20\times 20}\) is drawn uniformly from \([-1,1]\), and a factor exposure matrix \(X_{n\times 20}\) such that \(X_{ij}=0\) with probability 0.8, and \(X_{ij}\) is drawn uniformly from [0, 1] otherwise. We then construct an auxiliary matrix \(\bar{A}=XFX'\) and, for \(i\ne j\), set \(A_{ij}=\bar{A}_{ij}\) if \(\bar{A}_{ij}\le 0\) and \(A_{ij}=\rho \bar{A}_{ij}\) otherwise. Finally, \(\upsilon _i\) is drawn uniformly from \([0, \delta \bar{\sigma }]\), where \(\bar{\sigma }=\frac{1}{n}\sum _{i\ne j}|A_{ij}|\), and we set \(A_{ii}=\sum _{j\ne i}|A_{ij}|+\upsilon _i\). Observe that the auxiliary matrix \({\bar{A}}\) represents a low-rank matrix obtained from a 20-factor model, and \(\text {diag}(\upsilon )\) is a diagonal matrix representing the residual variances not explained by the factor model. The matrix A is obtained by scaling the positive off-diagonal entries of \({\bar{A}}\) by \(\rho \), and updating the diagonal entries to ensure positive definiteness by imposing diagonal dominance. Additionally, \(b_i\) is drawn uniformly between \(0.5U_{ii}\) and \(1.5U_{ii}\). We let \(r=0.25 \times \sum _{i\in N}b_i\) and \(k=n/5\) for “small” instances, and \(r=0.125 \times \sum _{i\in N}b_i\) and \(k=n/10\) for “large” instances.

Table 3 Experiments with non-positive off diagonal entries and varying diagonal dominance, \(k=n/5\)
Table 4 Experiments with constant diagonal dominance and varying positive off-diagonal entries, \(k=n/5\)

Formulations We test the same formulations as in Sect. 7.1. In this case, the diagonal matrix \(\text {diag}(\upsilon )\) is used for the Perspective formulation. In particular, formulations Perspective+cuts, Conic and Conic+cuts are based on the decomposition of the objective function given by

$$\begin{aligned} \min \;&\sum _{i\in N}\upsilon _i z_i+\sum _{A_{ij}< 0}|A_{ij}|t_{ij}+y'(A-Q-\text {diag}(\upsilon ))y \\ \text { s.t.}\;&y_i^2\le z_ix_i,\;\forall i\in N, \quad (x_i,x_j,y_i,y_j,t_{ij})\in X,\; \forall i\ne j: A_{ij}< 0, \end{aligned}$$

where \(Q_{ij}=\min \{0,A_{ij}\}\) for \(i\ne j\) and \(Q_{ii}=-\sum _{j\ne i}Q_{ij}\). By construction, \(A-Q-\text {diag}(\upsilon )\) is positive semi-definite.
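The following snippet (ours) illustrates this construction and decomposition numerically: it builds an instance as described above (with example values \(\rho =0.5\), \(\delta =1\) and our reading \(A_{ii}=\sum _{j\ne i}|A_{ij}|+\upsilon _i\)), extracts Q, checks that both Q and the remainder \(A-Q-\text {diag}(\upsilon )\) are positive semidefinite, and verifies the resulting identity for \(y'Ay\).

```python
# Extract the M-matrix part Q of A and verify the decomposition used in Sect. 7.3
# (a sketch; rho, delta and the random draws are example choices of ours).
import numpy as np

rng = np.random.default_rng(0)
n, nf, rho, delta = 30, 20, 0.5, 1.0
G = rng.uniform(-1, 1, size=(nf, nf))
F = G @ G.T                                        # factor covariance matrix
Xf = rng.uniform(size=(n, nf)) * (rng.uniform(size=(n, nf)) > 0.8)   # factor exposures
A = Xf @ F @ Xf.T                                  # auxiliary matrix \bar{A}
A = np.where(A <= 0, A, rho * A)                   # scale positive off-diagonals by rho
np.fill_diagonal(A, 0.0)
sigma_bar = np.abs(A).sum() / n
upsilon = rng.uniform(0, delta * sigma_bar, size=n)
np.fill_diagonal(A, np.abs(A).sum(axis=1) + upsilon)   # A_ii = sum_{j != i}|A_ij| + upsilon_i

Q = np.minimum(A, 0.0)                             # Q_ij = min{0, A_ij} for i != j
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))                # Q_ii = -sum_{j != i} Q_ij
R = A - Q - np.diag(upsilon)                       # remainder kept as a quadratic term

y = rng.uniform(size=n)
lhs = y @ A @ y
rhs = (upsilon @ y**2
       + sum(-Q[i, j] * (y[i] - y[j]) ** 2 for i in range(n) for j in range(i + 1, n))
       + y @ R @ y)
print("Q PSD:", np.linalg.eigvalsh(Q).min() >= -1e-8)
print("A - Q - diag(upsilon) PSD:", np.linalg.eigvalsh(R).min() >= -1e-8)
print("decomposition matches y'Ay:", np.isclose(lhs, rhs))
```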

Results Table 3 presents the results for matrices with non-positive off-diagonal entries (i.e., \(\rho =0\)) and varying diagonal dominance \(\delta \). Table 4 presents the results for matrices with fixed diagonal dominance and varying magnitudes \(\rho \) of the positive off-diagonal entries. We see that, in all cases, formulation Conic results in better root gap improvements than Perspective and Basic. The gap improvements depend on the parameters \(\delta \) and \(\rho \). In Table 3 we see that formulation Conic closes an additional 30% to 40% gap with respect to Perspective (independent of the diagonal dominance \(\delta \)). In Table 4 we observe that, as expected, formulation Conic is more effective at closing root gaps when the magnitude \(\rho \) of the positive off-diagonal entries is small. Nevertheless, for all instances, formulations Conic and Conic+cuts result in significantly stronger root improvements than Perspective (at least 15%, and often much more), and the number of nodes required to solve the instances is reduced by at least an order of magnitude.

Observe that the stronger formulations Conic and Conic+cuts do not necessarily lead to better solution times for small instances. Nevertheless, for the larger instances (\(n=100\)), using the Conic formulation leads to faster solution times, lower end gaps and more instances solved to optimality for all values of \(\delta \) and \(\rho \). As in Sect. 7.1, we observe little difference between Conic and Conic+cuts, consistent with Proposition 8, and we observe that Perspective+cuts is not effective in closing the root gap. Approximating the nonlinear function with gradient inequalities appears to cause numerical issues, as adding cuts weakens the relaxation, contrary to expectations; see our comments at the end of Sect. 7.1.

Finally, observe that the formulations tested require adding \(O(n^2)\) additional variables, one for each negative off-diagonal entry in A. Thus, solving the continuous relaxations may be computationally expensive for large values of n. Table 5 illustrates this point for matrices with \(\rho =0\) and \(\delta =1\). It shows, for the Basic, Perspective and Conic formulations, the value of the best feasible solution found (sol), the value of the lower bound after one hour of branch and bound (ebound), the value of the lower bound after processing the root node (rbound), the time used to process the root node in seconds (rtime), and the number of nodes explored in one hour (nodes). Each row represents the average over five instances, and the values of sol, ebound and rbound are scaled so that the best feasible solution found for a given instance has value 100. Observe that for \(n\ge 150\) the lower bound found by Conic at the root node is stronger than the lower bounds found by other formulations after one hour of branch-and-bound. However, the continuous relaxations of Conic are difficult to solve for large values of n, leading to few branch-and-bound nodes explored and few or no feasible solutions found within the time limit.

Table 5 Experiments with \(n\ge 100\) and \(k=n/10\)

A possible approach that achieves a compromise between the strength and the size of the formulation is to apply the proposed conic inequalities for a subset of the matrix: given an M-matrix Q, choose \(I\subset \left\{ (i,j)\in N\times N: Q_{ij}<0\right\} \) and use the formulation

$$\begin{aligned} \min \;&\sum _{i\in P}{\bar{Q}}_{i} z_i+ \sum _{i \in {\bar{P}}} {\bar{Q}}_{i} y_i -\sum _{(i,j)\in I}Q_{ij}t_{ij}-\sum _{(i,j)\not \in I}Q_{ij}(y_i-y_j)^2 \\ \text { s.t.}\;&y_i^2\le z_ix_i,\;\forall i\in P, \quad (x_i,x_j,y_i,y_j,t_{ij})\in X,\; \forall (i,j)\in I. \end{aligned}$$

In particular, if \(|I|\approx 4n\), then the results in Sect. 7.1 suggest that the formulations would scale well. Additionally, the component corresponding to the remainder, \(-\sum _{(i,j)\not \in I}Q_{ij}(y_i-y_j)^2\), could be further strengthened by linear inequalities (37) (and other subgradient inequalities corresponding to points where \({\bar{y}}\ne {{\bar{x}}}\)) in the original space of variables instead of extended reformulations. An effective implementation of such a partial strengthening is beyond the scope of the current paper.

8 Conclusions

In this paper we show, under mild assumptions, that the minimization of a quadratic function defined by an M-matrix with indicator variables is a submodular minimization problem and, hence, is solvable in polynomial time. We derive strong formulations using the convex hull description of non-separable quadratic terms with two indicator variables arising from a decomposition of the quadratic function. Additionally, we provide strong conic quadratic valid inequalities approximating the convex hulls. The derived formulations generalize previous results for the binary and the separable cases, and the inequalities dominate valid inequalities given in the literature. Computational experiments indicate that the proposed conic formulations may be significantly more effective than the natural convex relaxation and the perspective reformulation.