1 Introduction

Mixed Integer Quadratic Programming (MIQP) is an optimization problem where the objective function is a general quadratic function, the constraints are linear inequalities, and some of the variables are required to be integers. Formally, given a symmetric matrix \(H \in {\mathbb {Q}}^{n \times n}\), a matrix \(W \in {\mathbb {Q}}^{m \times n}\), vectors \(h \in {\mathbb {Q}}^n\), \(w \in {\mathbb {Q}}^m\), and \(p \in \{0,1,\dots ,n\}\), we seek a vector \(x \in {\mathbb {R}}^n\) that attains

$$\begin{aligned} \begin{aligned} \min&\quad x^{\textsf{T}}H x + h^{\textsf{T}}x \\ {{\,\mathrm{s.t.}\,}}&\quad Wx \le w \\&\quad x \in {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}. \\ \end{aligned} \end{aligned}$$
(MIQP)

Many important applications can be modeled as MIQPs, in areas such as operations research, engineering, computer science, physics, biology, finance, economics, and artificial intelligence. MIQP reduces to Mixed Integer Linear Programming (MILP) when H is a zero matrix, and to Quadratic Programming (QP) when \(p=0\). Moreover, MIQP is a prototypical Mixed Integer Nonlinear Programming (MINLP) problem, as it captures the critical elements of those models in the simplest possible way, making it the natural first step toward efficient algorithms for MINLP.

The decision version of MIQP lies in the complexity class NP [10]. Furthermore, MIQP is strongly NP-hard [15], and remains NP-hard even if H has rank one and \(p=0\) [28]. This implies that, unless P=NP, there is no efficient algorithm for solving this class of optimization problems in its full generality.

The main result of this paper is an approximation algorithm for MIQP. In order to state our result, we first give the definition of \(\epsilon \)-approximate solution. Consider an instance of (MIQP), and assume that it has an optimal solution \(x^*\). Let f(x) denote the objective function, and let \(f_{\max }\) be the maximum value of f(x) on the feasible region. For \(\epsilon \in [0,1]\), we say that a feasible point \(x^\diamond \) is an \(\epsilon \)-approximate solution if

$$\begin{aligned} f(x^\diamond ) - f(x^*) \le \epsilon \cdot (f_{\max } - f(x^*)). \end{aligned}$$

Note that only optimal solutions are 0-approximate solutions, while any feasible point is a 1-approximate solution. The definition of \(\epsilon \)-approximate solution has some useful invariance properties which make it a natural choice in this setting. For instance, it is preserved under dilation and translation of the objective function, and it is insensitive to affine transformations of the objective function and of the feasible region, such as changes of basis. Our definition of approximation has been used in earlier works, and we refer to [1, 22, 27, 31] for more details. We can now state our main result.
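
As a sanity check, the definition translates into a one-line predicate. The following sketch assumes that the objective f, the optimal value \(f(x^*)\), and the maximum \(f_{\max }\) are all available (computing them is, of course, the hard part); the function name is ours.

```python
def is_eps_approximate(f, x_diamond, f_opt, f_max, eps):
    """True iff f(x_diamond) - f(x*) <= eps * (f_max - f(x*)).

    f_opt = f(x*) and f_max are the optimal and maximum objective
    values over the feasible region; x_diamond must be feasible.
    """
    return f(x_diamond) - f_opt <= eps * (f_max - f_opt)
```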

Theorem 1

For every \(\epsilon \in (0,1]\), there is an algorithm that finds an \(\epsilon \)-approximate solution to a bounded (MIQP), if it exists. The running time of the algorithm is polynomial in the size of the input and in \(1/\epsilon \), provided that the rank k of the matrix H and the number of integer variables p are fixed numbers.

This is the first known polynomial time approximation algorithm for MIQP with k and p fixed. In particular, note that the dimension n of the problem is not required to be fixed. The running time of the algorithm exhibits a polynomial dependence on the size of the instance and on \(1/\epsilon \), and an exponential dependence on k and on p. This exponential dependence is to be expected unless P=NP, and we refer the reader to the discussion below the statement of Theorem 1 in [8].

One might wonder if the boundedness assumption can be relaxed in Theorem 1, with the understanding that, if the input MIQP is unbounded, the algorithm should return at least a statement that the instance is unbounded. The next theorem implies that the boundedness assumption cannot be removed unless P=NP.

Theorem 2

Determining whether (MIQP) is unbounded is NP-complete, even if the rank k of the matrix H equals three and the number p of integer variables is zero.

Proof

From Theorem 4 in [10], the decision problem in the statement is in NP, thus we only need to show the NP-hardness. In Sects. 2 and 3 in [28], the authors present a QP of the form

$$\begin{aligned} \min \{ x_1 - x_2^2 : Wx \le w, \ x \in {\mathbb {R}}^n\} \end{aligned}$$
(1)

with nonnegative optimum objective value, and for which it is NP-hard to determine if the optimum value is zero. Since every bounded QP has an optimal solution of polynomial size [29], there is a number \(\phi \) which is polynomial in the size of the input QP (1) for which the optimum objective value is either zero or strictly larger than \(2^{-\phi }\).

Consider now the QP

$$\begin{aligned} \min \{ x_1 x_{n+1} - x_2^2 - 2^{-\phi } x_{n+1}^2: ( W \mid -w ) x \le 0, \ x_{n+1} \ge 1, \ x \in {\mathbb {R}}^{n+1}\}. \end{aligned}$$
(2)

Notice that the matrix of the quadratic part of the objective function has rank three: its only nonzero entries lie in the rows and columns indexed by \(x_1, x_2, x_{n+1}\), and the corresponding \(3 \times 3\) block is nonsingular. Thus, to conclude the proof of the theorem we only need to show that (2) is unbounded if and only if the optimum value of (1) is zero.

Assume that the optimum value of (1) is zero. Then there is a point \({\bar{x}} \in {\mathbb {R}}^n\) with \(W {\bar{x}} \le w\) and \(\bar{x}_1 - {\bar{x}}_2^2=0\). Consider now the set of vectors in \({\mathbb {R}}^{n+1}\) given by \((\lambda {\bar{x}}, \lambda )\), for \(\lambda \ge 1\). Note that all these vectors are feasible to (2). Furthermore, the objective value of \((\lambda {\bar{x}}, \lambda )\) is \(\lambda ^2 (\bar{x}_1 - {\bar{x}}_2^2 - 2^{-\phi }) = - \lambda ^2 2^{-\phi }\) which goes to \(-\infty \) as \(\lambda \rightarrow \infty \). Therefore, (2) is unbounded.

Next, assume that the optimum value of (1) is positive, therefore strictly larger than \(2^{-\phi }\). Consider a vector feasible to (2) and note that it can be written as \(({\bar{\lambda }} {\bar{x}}, {\bar{\lambda }})\), where \({\bar{\lambda }} \ge 1\) and \({\bar{x}}\) satisfies \(W {\bar{x}} \le w\). The objective value of \(({\bar{\lambda }} {\bar{x}}, {\bar{\lambda }})\) is \({\bar{\lambda }}^2 (\bar{x}_1 - {\bar{x}}_2^2 - 2^{-\phi })\). Since \({\bar{x}}\) is feasible to (1), we have \({\bar{x}}_1 - {\bar{x}}_2^2 > 2^{-\phi }\), thus the objective value of \(({\bar{\lambda }} {\bar{x}}, {\bar{\lambda }})\) is positive. In particular, (2) is bounded. \(\square \)

In particular, Theorem 2 strengthens the result by Murty and Kabadi [26] that deciding whether a QP is bounded or not is NP-hard.
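
The reduction in the proof of Theorem 2 is a mechanical rewriting of the instance data of (1). The sketch below, with a hypothetical function name and numpy arrays assumed for W and w, builds the constraint data of (2) and the symmetric matrix of its quadratic form; the \(3 \times 3\) nonsingular block mentioned in the proof is visible in the matrix Q.

```python
import numpy as np

def homogenize(W, w, phi):
    """Build the data of QP (2) from an instance of QP (1).

    Feasible set of (2): (W | -w) x <= 0 and x_{n+1} >= 1, for x in R^{n+1}.
    Objective of (2):    x_1 x_{n+1} - x_2^2 - 2^{-phi} x_{n+1}^2 = x^T Q x.
    """
    m, n = W.shape
    A = np.block([[W, -w.reshape(-1, 1)],                 # (W | -w) x <= 0
                  [np.zeros((1, n)), -np.ones((1, 1))]])  # -x_{n+1} <= -1
    b = np.concatenate([np.zeros(m), [-1.0]])
    Q = np.zeros((n + 1, n + 1))
    Q[0, n] = Q[n, 0] = 0.5            # x_1 x_{n+1} term, split symmetrically
    Q[1, 1] = -1.0                     # -x_2^2 term
    Q[n, n] = -2.0 ** (-phi)           # -2^{-phi} x_{n+1}^2 term
    return A, b, Q
```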

1.1 Literature review

In this section, we review the known exact and approximation algorithms for MIQP with a polynomial running time.

MIQP admits a polynomial time approximation algorithm if the dimension n is fixed [6]. MIQP is polynomially solvable if the dimension n is fixed and the objective is convex [21] or concave [3, 4, 18]. If the objective is concave with a fixed number of negative eigenvalues and the number p of integer variables is fixed, there is a polynomial time approximation algorithm [8].

Next, we survey Integer Quadratic Programming (IQP), which is the special case of MIQP where all variables are integer, i.e., \(p=n\). IQP is solvable in polynomial time in dimensions one and two [11]. Furthermore, there is a polynomial time approximation algorithm if the dimension is fixed and the objective is homogeneous with at most one positive or negative eigenvalue [19]. If the objective function is separable and convex, and the constraint matrix W is totally unimodular, then IQP can be solved in polynomial time [20]. IQP admits a polynomial time approximation algorithm if the objective is separable and concave, with a fixed number of negative eigenvalues, and the largest absolute value of the subdeterminants of the constraint matrix is bounded by two [9]. Other IQP tractability results under specific structural restrictions can be found in [13, 24].

Finally, we discuss Quadratic Programming (QP), the special case of MIQP where all variables are continuous, i.e., \(p=0\). QP can be solved in polynomial time if the dimension is fixed [10, 29]. Furthermore, QP admits a polynomial time approximation algorithm if the number of negative eigenvalues of H is fixed [30], and it admits a weaker polynomial time approximation algorithm in general [32]. If the objective is convex, then QP can be solved in polynomial time [23].

1.2 Overview of the results and organization of the paper

Our approximation algorithm is based on the novel concepts of spherical form MIQP and of aligned vectors. These two notions significantly enhance the available mathematical toolkit for the design and analysis of algorithms for MIQP, and therefore their importance is not limited to this work.

Sections 2 and 3 are devoted to finding a change of basis that transforms a MIQP into spherical form. In a spherical form MIQP the objective is separable and the polyhedron has a “spherical appearance”. Moreover, the set \({\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}\) is replaced by a set of the form \(\Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp \), for a lattice \(\Lambda \) of rank p. The formal definition is given in Sect. 3. In order to obtain this change of basis we develop a number of results of independent interest.

Since a spherical form MIQP has a separable objective function, we need in particular to find an invertible matrix L and a diagonal matrix D such that \(H = LDL^{\textsf{T}}\). In Sect. 2 we focus on this simpler task and, in Theorem 3 and Corollary 1, we present a symmetric decomposition algorithm that constructs such matrices L, D in strongly polynomial time. This is the first known polynomial time algorithm for this problem.

In Sect. 3, we build on this algorithm and obtain, in Proposition 1, a rational version of theorems on simultaneous diagonalization of matrices. In particular, we show that we can find in polynomial time an invertible matrix L that at the same time diagonalizes a given matrix H and provides the shape of an ellipsoid that approximates a given polytope within a factor depending only on the dimension. This result is the main building block that allows us to obtain, in Proposition 2, a polynomial time algorithm to transform a MIQP into spherical form.

In Sect. 4 we introduce the concept of aligned vectors for a spherical form MIQP. Roughly speaking, these are two feasible vectors that are “far” in the direction where the objective is “most curved” and “almost aligned” in all other directions, and whose midpoint is feasible as well. We then show, in Proposition 3, that if a spherical form MIQP has two aligned vectors, then it is possible to find an \(\epsilon \)-approximate solution by solving a number of MILPs. This number is polynomial in \(1/\epsilon \) if both k and p are fixed in the original (MIQP).

In Sect. 5 we focus on the problem of deciding whether a spherical form MIQP has two aligned vectors or not. In Proposition 5 we give a polynomial time algorithm that either finds two aligned vectors, or finds a vector \(v \in {{\,\textrm{span}\,}}(\Lambda )\) along which the polyhedron is “flat”. The vector v allows us to decompose the problem into a number of MIQPs with fewer integer variables. Furthermore, this number depends only on k and p, and thus is a constant if both k and p are fixed.

In Sect. 6 we then present our approximation algorithm for MIQP and provide a proof of Theorem 1. The algorithm first uses Proposition 2 to find a change of basis that transforms the input MIQP into spherical form. Then, it employs Proposition 5 and either finds two aligned vectors, or finds a vector \(v \in {{\,\textrm{span}\,}}(\Lambda )\) along which the polyhedron is “flat”. In the first case, we use Proposition 3 to find an \(\epsilon \)-approximate solution. In the second case, the input MIQP is decomposed into a constant number of instances with fewer integer variables, and the algorithm is recursively applied to these instances. At the end of the execution, the algorithm returns the best solution found, and we show that it is an \(\epsilon \)-approximate solution to the input MIQP.

In this paper, we will be using several concepts from computational complexity. The definition of strong polynomiality mixes the Turing model and the arithmetic model of computation: a strongly polynomial algorithm is a polynomial space algorithm in the Turing model and a polynomial time algorithm in the arithmetic model. In particular, a strongly polynomial algorithm is also a polynomial time algorithm. Throughout the paper, unless we state a different model, we mean the Turing model. For more details on time and space complexity we refer the reader to [17].

2 A strongly polynomial algorithm for symmetric decomposition

Given a rational symmetric \(n \times n\) matrix \(H\), a symmetric decomposition of \(H\) is a decomposition of the form \(BHB^{\textsf{T}}= D\), where B is an \(n \times n\) nonsingular matrix and D is an \(n \times n\) diagonal matrix. The goal of this section is to give an algorithm that constructs a symmetric decomposition of any rational symmetric matrix \(H\) with two fundamental properties: (i) the algorithm is strongly polynomial, (ii) the Frobenius norms of B and \(B^{-1}\) are upper bounded by an integer of size polynomial in n. To the best of our knowledge, our algorithm is the first known polynomial time algorithm that finds a symmetric decomposition of any rational symmetric matrix. Note that the spectral decomposition, the Schur decomposition, and Takagi’s factorization yield a symmetric decomposition of a rational symmetric matrix. However, none of these decompositions can be performed in polynomial space, since the resulting matrices generally contain irrational elements. Other related matrix decompositions are the Cholesky decomposition and the \(LDL^{\textsf{T}}\) decomposition, but they are not applicable to indefinite matrices. For more details on matrix decompositions we refer the reader to [16].

By introducing pivoting operations that perform symmetric additions of rows and columns, as well as symmetric interchanges, Dax and Kaniel [5] describe an algorithm that constructs a symmetric decomposition of any symmetric \(n \times n\) matrix \(H\). Their algorithm performs a number of arithmetic operations that is polynomial in n, thus it is a polynomial time algorithm in the arithmetic model. However, it is unknown if it is a polynomial time algorithm or a polynomial space algorithm.

In this section, we present a strongly polynomial version of Dax and Kaniel’s algorithm. To this end, we show that all numbers stored during the execution of our version of the algorithm have size that is polynomial in the size of the input matrix \(H\). This in particular implies that the output matrices B and D have polynomial size. The proof builds on the technique introduced by Edmonds to perform Gaussian elimination in strongly polynomial time [12], but it is more involved due to the “complete pivoting” performed at each iteration. In particular, while in Gaussian elimination every number stored during the algorithm is a ratio of subdeterminants of the original matrix, every number stored in our version of Dax and Kaniel’s algorithm at iteration k is shown to be a ratio of subdeterminants of the matrix obtained from \(H\) by performing only the first k pivoting operations.

Another fundamental property of our symmetric decomposition algorithm is that the Frobenius norms of B and \(B^{-1}\) are upper bounded by an integer of size polynomial in n. In particular, this integer depends only on n and not on the input matrix. This property will be fundamental in the next sections of the paper, where the symmetric decomposition algorithm will be used to obtain a change of basis for our MIQP. Recall that the Frobenius norm of an \(m \times n\) matrix A is defined by \(\Vert A\Vert _F := \sqrt{\sum _{i=1}^m \sum _{j=1}^n A_{ij}^2}.\)

Therefore, the purpose of this section is to prove the following result.

Theorem 3

Let \(H\) be a rational symmetric \(n \times n\) matrix. There is a strongly polynomial algorithm that finds matrices B, D such that \(B HB^{\textsf{T}}= D\) is a symmetric decomposition of \(H\). Furthermore, \(\Vert B\Vert _F\) and \(\Vert B^{-1}\Vert _F\) are upper bounded by \((5n)^{n/2}\).

If we set \(L:=B^{-1}\) in Theorem 3, we obtain \(H= LDL^{\textsf{T}}\). Since the inverse can be computed in strongly polynomial time [12], this decomposition can also be obtained in strongly polynomial time.

Corollary 1

Let \(H\) be a rational symmetric \(n \times n\) matrix. There is a strongly polynomial algorithm that finds an invertible \(n \times n\) matrix L and an \(n \times n\) diagonal matrix D such that \(H= L D L^{\textsf{T}}\). Furthermore, \(\Vert L\Vert _F\) and \(\Vert L^{-1}\Vert _F\) are upper bounded by \((5n)^{n/2}\).

Corollary 1 then provides a strongly polynomial algorithm to compute a change of basis that transforms a general (MIQP) into a separable form. Namely, compute the decomposition \(H = LDL^{\textsf{T}}\) and set \(y := L^{\textsf{T}}x\). In particular, our approach can be substituted for the techniques used in [7, 8, 19, 31] to transform the original QP or MIQP into a separable form.
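
To make the change of variables concrete, here is a minimal floating-point sketch (numpy assumed; the name separable_form is ours, and it relies on a symmetric_decomposition routine as sketched in Sect. 2.1 below, whereas the paper's algorithm works in exact rational arithmetic):

```python
import numpy as np

def separable_form(H, h):
    """Transform x^T H x + h^T x into separable form via Corollary 1."""
    B, D = symmetric_decomposition(H)        # B H B^T = D, exact arithmetic
    B = np.array(B, dtype=float)
    D = np.array(D, dtype=float)
    L = np.linalg.inv(B)                     # H = L D L^T with L := B^{-1}
    c = np.linalg.solve(L, np.asarray(h, dtype=float))  # c := L^{-1} h
    # Under y := L^T x, the objective becomes y^T D y + c^T y,
    # which is separable in the y variables.
    return L, np.diag(D), c
```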

In the remainder of this section all matrices are \(n \times n\), and we do not repeat this specification.

2.1 Description of the symmetric decomposition algorithm

In this section, we describe the symmetric decomposition algorithm that we analyze. It is the version of Dax and Kaniel’s algorithm where the parameter \(\gamma \) is always chosen in \(\{+1,-1\}\).

Let \(H\) be the rational symmetric \(n \times n\) matrix given in the input. Let \(H^{(0)} := H\), and for every \(k=1,\dots ,n-1\), we denote by \(H^{(k)}\) the \(n \times n\) matrix obtained after k iterations of the algorithm. The matrix \(H^{(k)}\) is symmetric and all the off-diagonal elements in the first k rows and columns equal zero. In particular, \(H^{(n-1)}\) is a diagonal matrix and coincides with the matrix D in the output.

For any \(k = 1,\dots ,n-1\), we now describe the iteration k of the symmetric decomposition algorithm, where the matrix \(H^{(k)}\) is obtained from \(H^{(k-1)}\). The iteration is subdivided into two stages, called “pivoting” and “elimination”.

Pivoting. The goal of the pivoting stage is to ensure that the pivotal element, which is the element in the (k, k) position, is an element of largest absolute value among rows and columns \(k,\dots ,n\). Let s and r be indices such that \(|H^{(k-1)}_{sr}| = \max _{i,j \in \{k,\dots ,n\}} |H^{(k-1)}_{ij}|.\) Since \(H^{(k-1)}_{sr}=H^{(k-1)}_{rs}\) we can assume without loss of generality that \(s \le r\). Let \(\tilde{H}\) be the symmetric \(n \times n\) matrix obtained from \(H^{(k-1)}\) by interchanging rows s and k, and interchanging columns s and k. If \(s=r\), then \(H^{(k-1)}_{sr} = {\tilde{H}}_{kk}\). In this case, we have achieved our goal and the pivoting is terminated. Thus, we now assume \(s < r\). This implies that \(H^{(k-1)}_{sr} = {\tilde{H}}_{rk}\). We define

$$\begin{aligned} \gamma := {\left\{ \begin{array}{ll} +1 &{} \text {if } {\tilde{H}}_{rk} ({\tilde{H}}_{kk} + {\tilde{H}}_{rr}) \ge 0 \\ -1 &{} \text {if } {\tilde{H}}_{rk} ({\tilde{H}}_{kk} + {\tilde{H}}_{rr}) < 0, \end{array}\right. } \end{aligned}$$

and we let \({\tilde{\tilde{H}}}\) be the symmetric \(n \times n\) matrix obtained from \({\tilde{H}}\) by adding row r multiplied by \(\gamma \) to row k, and adding column r multiplied by \(\gamma \) to column k. It is simple to check that the new (k, k) element is the one with largest absolute value among rows and columns \(k,\dots ,n\), i.e., \(|{\tilde{\tilde{H}}}_{kk}| = \max _{i,j \in \{k,\dots ,n\}} |{\tilde{\tilde{H}}}_{ij}|\).

Pivoting can be achieved via matrix multiplication. We define the matrix \({\tilde{P}}_k\) which interchanges rows s and k, thus it is the permutation matrix obtained from the identity matrix by interchanging rows s and k (note that, if \(s=k\), then \(\tilde{P}_k\) is the identity matrix). The matrix \({\tilde{\tilde{P}}}_k\) adds (if necessary) the row r multiplied by \(\gamma \) to row k; therefore, it is the identity matrix if \(s=r\), or it is obtained from the identity matrix by replacing the zero element in the (k, r) position with the scalar \(\gamma \). The matrix \({\tilde{H}}\) can then be written as \({\tilde{H}} = {\tilde{P}}_k H^{(k-1)} \tilde{P}_k^{\textsf{T}},\) while the matrix \({\tilde{\tilde{H}}}\) is the product \({\tilde{\tilde{H}}}= {\tilde{\tilde{P}}}_k {\tilde{H}} {\tilde{\tilde{P}}}_k^{\textsf{T}}= P_k H^{(k-1)} P_k^{\textsf{T}},\) where \(P_k := {\tilde{\tilde{P}}}_k {\tilde{P}}_k.\)

Elimination. The goal of this stage is to obtain zeros in the off-diagonal elements of row and column k. We first perform row elimination and then column elimination. The row elimination is done as in Gaussian elimination: For each \(i=k+1,\dots ,n\), add row k multiplied by \(-{\tilde{\tilde{H}}}_{ik} / {\tilde{\tilde{H}}}_{kk}\) to row i. The column elimination is done symmetrically: for each \(j=k+1, \dots , n\), add column k multiplied by \(-{\tilde{\tilde{H}}}_{kj} / {\tilde{\tilde{H}}}_{kk}\) to column j.

Row elimination is performed by multiplying on the left by the matrix \((I-E_k)\), where \(I\) denotes the \(n \times n\) identity matrix and the elements of \(E_k\) are given by

$$\begin{aligned} (E_k)_{ik} := {\tilde{\tilde{H}}}_{ik} / {\tilde{\tilde{H}}}_{kk} \qquad i=k+1,\dots ,n, \end{aligned}$$
(3)

and all the other elements are zeros. Symmetrically, column elimination is performed by multiplying on the right by the matrix \((I-E_k)^{\textsf{T}}\). Therefore, the matrix \(H^{(k)}\) is obtained from \(H^{(k-1)}\) via the matrix product

$$\begin{aligned} H^{(k)} := [(I-E_k) P_k] H^{(k-1)} [(I-E_k) P_k]^{\textsf{T}}. \end{aligned}$$
(4)

This completes the description of the iteration k of the symmetric decomposition algorithm. At the end of iteration \(n-1\) the algorithm returns the diagonal matrix \(D := H^{(n-1)}\) and the nonsingular matrix

$$\begin{aligned} B := (I- E_{n-1}) P_{n-1} \cdots (I- E_1) P_1. \end{aligned}$$

It follows directly from the description of the algorithm that the algorithm is correct, i.e., \(BHB^{\textsf{T}}= D\) is a symmetric decomposition of \(H\).
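
The iteration just described is easy to express in exact rational arithmetic. The sketch below (the name symmetric_decomposition is ours) follows Sect. 2.1 directly: it tracks the left factor B by applying to it the same row operations, and applies each symmetric row/column pair one index at a time, which yields the same result as (4) because the elimination factors are unaffected by the earlier updates.

```python
from fractions import Fraction

def symmetric_decomposition(H):
    """Return (B, D) with B H B^T = D diagonal, for H rational symmetric.

    H is a list of lists of numbers convertible to Fraction; exact
    arithmetic mirrors the strong polynomiality analysis of Sect. 2.2.
    """
    n = len(H)
    A = [[Fraction(x) for x in row] for row in H]   # working copy, ends as D
    B = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

    def sym_swap(i, j):                 # interchange rows and columns i, j
        A[i], A[j] = A[j], A[i]
        for row in A:
            row[i], row[j] = row[j], row[i]
        B[i], B[j] = B[j], B[i]         # track the row operation in B

    def sym_add(dst, src, g):           # row dst += g*row src, same for columns
        for c in range(n):
            A[dst][c] += g * A[src][c]
        for r in range(n):
            A[r][dst] += g * A[r][src]
        for c in range(n):
            B[dst][c] += g * B[src][c]  # track the row operation in B

    for k in range(n - 1):
        # Pivoting: bring an entry of largest absolute value of the
        # trailing block to position (k, k).
        s, r = max(((i, j) for i in range(k, n) for j in range(i, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        sym_swap(s, k)
        if s < r:                       # the large entry now sits at (r, k)
            g = 1 if A[r][k] * (A[k][k] + A[r][r]) >= 0 else -1
            sym_add(k, r, g)
        if A[k][k] == 0:                # whole trailing block is zero
            continue
        # Elimination: zero out the off-diagonal entries of row/column k.
        for i in range(k + 1, n):
            sym_add(i, k, -A[i][k] / A[k][k])
    return B, A
```

For instance, on H = [[0, 1], [1, 0]] this sketch returns B = [[1, 1], [-1/2, 1/2]] and D = diag(2, -1/2).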

2.2 Analysis of the algorithm

In this section, we prove the first part of Theorem 3. Namely, we show that the symmetric decomposition algorithm presented in Sect. 2.1 runs in strongly polynomial time. Clearly, the number of arithmetic operations performed is polynomial in n. Therefore, we only need to show that the size of each matrix constructed during the execution is polynomial in the size of \(H\). For the matrices \(\tilde{P}_k, {\tilde{\tilde{P}}}_k, P_k\), for \(k = 1,\dots ,n-1\), this follows directly from their definition. Hence, it suffices to show that each matrix \(H^{(k)}\), for \(k = 1,\dots ,n-1\), has size polynomial in the size of \(H\). Indeed, once this is proven, we obtain that \(E_k\) and the returned matrix B also have size polynomial in the size of \(H\).

Thus, we now focus our attention on the matrix \(H^{(k)}\). From the equality (4) we deduce that

$$\begin{aligned} H^{(k)} = B^{(k)} H{B^{(k)}}^{\textsf{T}}, \end{aligned}$$

where

$$\begin{aligned} B^{(k)} := (I- E_k) P_k (I- E_{k-1}) P_{k-1} \cdots (I- E_1) P_1. \end{aligned}$$

As noticed on page 224 in [5], it is simple to verify that for every \(t,j \in \{1,\dots , n-1\}\) with \(t < j\), we have \(E_t P_j = E_t\). This in turn implies that for every \(t,j \in \{1,\dots , n-1\}\) with \(t < j\), we have

$$\begin{aligned} P_j (I- P_{j-1} P_{j-2} \cdots P_{t+1} E_t) = (I- P_j P_{j-1} \cdots P_{t+1} E_t) P_j, \end{aligned}$$

which allows us to write \(B^{(k)}\) in the form

$$\begin{aligned} B^{(k)}&= (I- E_k) (I- P_k E_{k-1}) \cdots (I- P_k \cdots P_2 E_1) P_k \cdots P_1. \end{aligned}$$
(5)

Therefore \(H^{(k)}\) can be written as

$$\begin{aligned} H^{(k)}&= E^{(k)} G^{(k)} {E^{(k)}}^{\textsf{T}}, \end{aligned}$$
(6)

where

$$\begin{aligned} G^{(k)}&:= (P_k \cdots P_1) H(P_k \cdots P_1)^{\textsf{T}}, \\ E^{(k)}&:= (I- E_k) (I- P_k E_{k-1}) \cdots (I- P_k \cdots P_2 E_1). \end{aligned}$$

In the next lemma, we analyze the matrices \(P_{k} P_{k-1} \cdots P_{t+1} E_t\) in the definition of \(E^{(k)}\). The second part of the statement will only be used later in Sect. 2.3.

Lemma 1

For each \(t \in \{1,\dots ,k\}\), the matrix \(P_k P_{k-1} \cdots P_{t+1} E_{t}\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). Furthermore, the elements in rows \(t+1,\dots ,k\) are bounded by two in absolute value, while the elements in rows \(k+1,\dots ,n\) are bounded by one in absolute value.

Proof

We show this lemma by induction on \(k-t\). In the base case we have \(k-t = 0\), thus we are considering the matrix \(E_t\). By definition, \(E_t\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\), and from (3) all nonzeros are bounded by one in absolute value.

For the inductive step we assume \(k-t \ge 1\) and consider the matrix \(P_k ( P_{k-1} \cdots P_{t+1} E_{t})\). By induction, \(P_{k-1} \cdots P_{t+1} E_{t}\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). Furthermore, the elements in rows \(t+1,\dots ,k-1\) are bounded by two in absolute value, while the elements in rows \(k,\dots ,n\) are bounded by one in absolute value. We have \(P_k = {\tilde{\tilde{P}}}_k {\tilde{P}}_k,\) where the matrix \({\tilde{P}}_k\) interchanges two rows in \(\{k,\dots ,n\}\), and the matrix \({\tilde{\tilde{P}}}_k\) adds or subtracts (if necessary) a row in \(\{k+1,\dots ,n\}\) to row k. Since \(k \ge t+1\), the matrix \(P_k ( P_{k-1} \cdots P_{t+1} E_{t})\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). The elements in rows \(t+1,\dots ,k-1\) are left unchanged, thus they are bounded by two in absolute value. The element in row k is now bounded by two in absolute value, while the elements in rows \(k+1,\dots ,n\) remain bounded by one in absolute value. \(\square \)

Next, we use Lemma 1 to discuss the effect of multiplying a matrix on the left by \(E^{(k)}\). Note that a multiplication of this type is performed in (6).

Lemma 2

Multiplying a matrix on the left by \(E^{(k)}\) results in a sequence of elementary row operations in which a multiple of a row \(t \in \{1,\dots ,k\}\) is added to a row in \(\{t+1, \dots , n\}\).

Proof

Due to the definition of \(E^{(k)}\), it suffices to show that multiplying a matrix on the left by \((I- P_k \cdots P_{t+1} E_{t}),\) for \(t \in \{1,\dots ,k\}\), results in a sequence of elementary row operations in which a multiple of row t is added to a row in \(\{t+1,\dots , n\}\).

From Lemma 1, the matrix \(P_k \cdots P_{t+1} E_{t}\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). Hence, the multiplication on the left by matrix \((I- P_k \cdots P_{t+1} E_{t})\) preserves the first t rows, and each subsequent row is obtained by adding a multiple of row t to the original row. \(\square \)

We are finally ready to show, in the next claim, that each matrix \(H^{(k)}\) has size polynomial in the size of \(H\). This concludes the proof that our symmetric decomposition algorithm runs in strongly polynomial time.

Claim 1

For each \(k \in \{1,\dots ,n-1\}\), the size of \(H^{(k)}\) is polynomial in the size of \(H\).

Proof

Let \(G^{(k)}_k\) denote the \(k \times k\) submatrix of \(G^{(k)}\) determined by the first k rows and columns, and let \(G^{(k)}_{k\mid ij}\), for \(i,j \in \{k+1,\dots ,n\}\), denote the \((k+1) \times (k+1)\) submatrix of \(G^{(k)}\) determined by rows \(\{1,\dots ,k,i\}\) and columns \(\{1,\dots ,k,j\}\).

It suffices to show that for every \(k \in \{1,\dots ,n-1\}\) and for every \(i,j \in \{k+1,\dots ,n\}\), we have

$$\begin{aligned} H_{ij}^{(k)} = \det (G^{(k)}_{k\mid ij}) / \det (G^{(k)}_k). \end{aligned}$$
(7)

In fact, the definition of \(G^{(k)}\) implies that its size is polynomial in the size of \(H\). Therefore, also \(\det (G^{(k)}_{k\mid ij})\) and \(\det (G^{(k)}_k)\) have size polynomial in the size of \(H\), and so does each element of \(H^{(k)}\) due to (7). Therefore, in the remainder of the proof we show (7).

Consider the product \(E^{(k)} G^{(k)}\) in (6). From Lemma 2, the resulting matrix is obtained from \(G^{(k)}\) via a sequence of elementary row operations. Among all these elementary row operations, only a subset modifies the first k rows of the matrix \(G^{(k)}\). From Lemma 2, in each of these elementary row operations, a multiple of a row \(t \in \{1,\dots ,k-1\}\) is added to a row in \(\{t+1,\dots ,k\}\). Similarly, among all the elementary column operations performed by \({E^{(k)}}^{\textsf{T}}\), only a subset modifies the first k columns of the matrix \(G^{(k)}\). In each of these elementary column operations, a multiple of a column \(t \in \{1,\dots ,k-1\}\) is added to a column in \(\{t+1,\dots ,k\}\). We perform these subsets of elementary row and column operations on the matrix \(G^{(k)}_k\). From (6), the resulting matrix is precisely the submatrix of \(H^{(k)}\) given by the first k rows and columns, hence it is diagonal with elements \(H_{11}^{(k)}, \dots ,H_{kk}^{(k)}\) on the diagonal. Note that each elementary operation considered preserves the determinant of \(G^{(k)}_k\). Thus,

$$\begin{aligned} \det (G^{(k)}_k) = H_{11}^{(k)} \cdots H_{kk}^{(k)}. \end{aligned}$$
(8)

A similar argument can be applied to the matrix \(G^{(k)}_{k\mid ij}\). Among all the elementary row operations performed by \(E^{(k)}\), only a subset modifies rows \(\{1,\dots ,k,i\}\) of the matrix \(G^{(k)}\). From Lemma 2, in each of these elementary row operations, a multiple of a row \(t \in \{1,\dots ,k\}\) is added to a row in \(\{t+1,\dots ,k,i\}\). Similarly, among all the elementary column operations performed by \({E^{(k)}}^{\textsf{T}}\), only a subset modifies columns \(\{1,\dots ,k,j\}\) of the matrix \(G^{(k)}\). In each of these elementary column operations, a multiple of a column \(t \in \{1,\dots ,k\}\) is added to a column in \(\{t+1,\dots ,k,j\}\). We perform this subset of elementary operations on the matrix \(G^{(k)}_{k\mid ij}\). From (6), the resulting matrix is precisely the submatrix of \(H^{(k)}\) determined by rows \(\{1,\dots ,k,i\}\) and columns \(\{1,\dots ,k,j\}\). Hence it is diagonal with elements \(H_{11}^{(k)}, \dots ,H_{kk}^{(k)},H_{ij}^{(k)}\) on the diagonal. Each elementary operation considered preserves the determinant of \(G^{(k)}_{k\mid ij}\). Thus, we have

$$\begin{aligned} \det (G^{(k)}_{k\mid ij}) = H_{11}^{(k)} \cdots H_{kk}^{(k)} \cdot H_{ij}^{(k)}. \end{aligned}$$

Dividing the latter equation by Eq. (8), we obtain (7). \(\square \)

2.3 Frobenius norm of B and \(B^{-1}\)

In this section, we prove the second part of Theorem 3. Namely, we show that for the matrix B returned by the algorithm described in Sect. 2.1, both its Frobenius norm and the Frobenius norm of its inverse are upper bounded by an integer of size polynomial in n. We will use the fact that the Frobenius norm is submultiplicative, i.e., for matrices \(A,A'\) we have \(\Vert AA'\Vert _F \le \Vert A\Vert _F \cdot \Vert A'\Vert _F\).

Claim 2

The Frobenius norm of matrices B and \(B^{-1}\) is upper bounded by \((5n)^{n/2}\).

Proof

From (5), we can write \(B = B^{(n-1)}\) as the product \(B = EP\), where

$$\begin{aligned} E&:= E^{(n-1)} = (I- E_{n-1}) (I- P_{n-1} E_{n-2}) \cdots (I- P_{n-1} \cdots P_2 E_1), \\ P&:= P_{n-1} P_{n-2} \cdots P_{1}. \end{aligned}$$

In order to bound the Frobenius norm of B and \(B^{-1}\), we bound separately the Frobenius norm of \(P, P^{-1}, E\), and \(E^{-1}\).

\({\underline{\hbox {Norm of }P}}\). Recall that each matrix \(P_k\) is the product of two matrices \(P_k = {\tilde{\tilde{P}}}_k \tilde{P}_k\), where the matrix \({\tilde{P}}_k\) interchanges rows s and k, where \(s \ge k\), and the matrix \({\tilde{\tilde{P}}}_k\) adds (if necessary) row r multiplied by \(\gamma \) to row k, where \(r > k\). Therefore, for each \(k=n-1,\dots ,1\), in the matrix \(P_k \cdots P_1\), the last \(n-k\) rows are permuted rows of the identity matrix, while each of the first k rows has at most two nonzero elements, each one being \(\pm 1\). We obtain

$$\begin{aligned} \Vert P\Vert _F&= \Vert P_{n-1} \cdots P_1\Vert _F \le \sqrt{2n-1}. \end{aligned}$$

\({\underline{\hbox {Norm of}~P^{-1}}}\). Each matrix \(P_k^{-1}\) is the product \(P_k^{-1} = \tilde{P}_k^{-1} {\tilde{\tilde{P}}}_k^{-1}\), where the matrix \({\tilde{\tilde{P}}}_k^{-1}\) adds (if necessary) row r multiplied by \(-\gamma \) to row k, where \(r > k\), and the matrix \({\tilde{P}}_k^{-1}\) interchanges rows s and k, where \(s \ge k\). Therefore, for each \(k=n-1,\dots ,1\), in the matrix \(P_k^{-1} P_{k+1}^{-1} \cdots P_{n-1}^{-1}\), the first \(k-1\) rows coincide with the first \(k-1\) rows of the identity matrix, and the remaining rows are a permutation of the rows from k to n of an upper triangular matrix with elements in \(\{0, \pm 1\}\). We obtain

$$\begin{aligned} \Vert P^{-1}\Vert _F = \Vert P_1^{-1} P_2^{-1} \cdots P_{n-1}^{-1}\Vert _F \le \sqrt{(n^2 + n)/2}. \end{aligned}$$

\({\underline{\hbox {Norm of }E.}}\) From Lemma 1 with \(k=n-1\), for each \(t \in \{1,\dots ,n-1\}\), the matrix \(P_{n-1} P_{n-2} \cdots P_{t+1} E_t\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). Furthermore, the elements in rows \(t+1,\dots ,n-1\) are bounded by two in absolute value, while the element in the last row is bounded by one in absolute value. Thus, we obtain

$$\begin{aligned} \Vert I-P_{n-1} P_{n-2} \cdots P_{t+1} E_t\Vert _F \le \sqrt{(n+1) + 4(n-t-1)} \le \sqrt{5(n-1)}. \end{aligned}$$

Hence

$$\begin{aligned} \Vert E\Vert _F&\le \Vert I- E_{n-1}\Vert _F \cdot \Vert I- P_{n-1} E_{n-2}\Vert _F \cdots \Vert I- P_{n-1} \cdots P_2 E_1\Vert _F \\&\le \sqrt{(5(n-1))^{n-1}}. \end{aligned}$$

\({\underline{\hbox {Norm of } E^{-1}.}}\) Once again, from Lemma 1 with \(k=n-1\), we know that for each \(t \in \{1,\dots ,n-1\}\), the matrix \(P_{n-1} P_{n-2} \cdots P_{t+1} E_t\) can have nonzeros only in positions \((t+1,t),\dots ,(n,t)\). This fact allows us to write \(E^{-1}\) as

$$\begin{aligned} E^{-1}&= (I- P_{n-1} \cdots P_2 E_1)^{-1} \cdots (I- P_{n-1} E_{n-2})^{-1} (I- E_{n-1})^{-1} \\&= (I+ P_{n-1} \cdots P_2 E_1) \cdots (I+ P_{n-1} E_{n-2}) (I+ E_{n-1}) \\&= I+ P_{n-1} \cdots P_2 E_1 + \cdots + P_{n-1} E_{n-2} + E_{n-1}. \end{aligned}$$

In particular, the matrix \(E^{-1}\) is unit lower triangular, i.e., lower triangular with all elements on the main diagonal equal to one. The second part of Lemma 1 then implies that the elements in rows \(1,\dots ,n-1\) are bounded by two in absolute value, while the elements in the last row are bounded by one in absolute value. We obtain

$$\begin{aligned} \Vert E^{-1}\Vert _F \le \sqrt{(2n-1) + 4 (n^2 - 3n +2)/2} = \sqrt{2 n^2 - 4n + 3}. \end{aligned}$$

\({\underline{\hbox {Norm of }B \hbox { and } B^{-1}.}}\) Using the obtained bounds on the Frobenius norm of P, \(P^{-1}\), E, \(E^{-1}\), we derive

$$\begin{aligned} \Vert B\Vert _F&= \Vert EP\Vert _F \le \Vert E\Vert _F \, \Vert P\Vert _F \le \sqrt{(5(n-1))^{n-1} (2n-1)}, \\ \Vert B^{-1}\Vert _F&= \Vert P^{-1} E^{-1}\Vert _F \le \Vert P^{-1}\Vert _F \, \Vert E^{-1}\Vert _F \le \sqrt{(n^2 + n) (2 n^2 - 4n + 3)/2}. \end{aligned}$$

It can be checked that \((5n)^{n/2}\) is larger than both upper bounds for any n. Therefore, the claim follows. \(\square \)

While the bound on the Frobenius norm of matrices B and \(B^{-1}\) in Claim 2 is sufficient for our task, we remark that a better bound can be obtained by providing a better bound on \(\Vert E\Vert _F\). This can be done by bounding the largest absolute value of an element in E, instead of using the fact that the Frobenius norm is submultiplicative.
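
Both parts of Theorem 3 are easy to sanity-check numerically. The snippet below (assuming the symmetric_decomposition sketch from Sect. 2.1) verifies, on a random rational symmetric matrix, that \(BHB^{\textsf{T}}= D\) holds exactly, that D is diagonal, and that the Frobenius norm bound is respected; this is of course a test on one instance, not a proof.

```python
import random
from fractions import Fraction

n = 6
H = [[Fraction(random.randint(-9, 9)) for _ in range(n)] for _ in range(n)]
for i in range(n):
    for j in range(i):
        H[i][j] = H[j][i]                       # make H symmetric

B, D = symmetric_decomposition(H)

# exact check of B H B^T = D and of diagonality of D
BH = [[sum(B[i][t] * H[t][j] for t in range(n)) for j in range(n)]
      for i in range(n)]
BHBt = [[sum(BH[i][t] * B[j][t] for t in range(n)) for j in range(n)]
        for i in range(n)]
assert BHBt == D
assert all(D[i][j] == 0 for i in range(n) for j in range(n) if i != j)

frob = lambda M: float(sum(x * x for row in M for x in row)) ** 0.5
assert frob(B) <= (5 * n) ** (n / 2)            # Theorem 3 bound on ||B||_F
```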

3 Simultaneous diagonalization and spherical form MIQP

In this section, a fundamental role is played by the spherical form MIQP. To formally define this problem, we now briefly recall the notion of lattice, and introduce some notation.

Given linearly independent vectors \(b^1, \dots , b^p\) in \({\mathbb {R}}^d\), the lattice generated by \(b^1, \dots , b^p\) is the set \(\Lambda := \left\{ \sum _{i=1}^p v_i b^i : v_i \in {\mathbb {Z}}\ \forall i=1,\dots ,p\right\} \) of integer linear combinations of the vectors \(b^i\), for \(i=1,\dots ,p\). The rank of \(\Lambda \) is p and the dimension of \(\Lambda \) is \(d\). If \(p = d\), then \(\Lambda \) is said to be a full rank lattice. Note that, in this paper, we will consider mainly lattices that are not full rank. The vectors \(b^1, \dots , b^p\) are called a lattice basis. Given a vector \(a\in {\mathbb {R}}^d\) and a nonnegative scalar r, we denote by \({\mathcal {B}}(a,r)\) the closed ball with center \(a\) and radius r. Formally,

$$\begin{aligned} {\mathcal {B}}(a,r) := \{x \in {\mathbb {R}}^d: \Vert x-a\Vert \le r\}. \end{aligned}$$

Note that, throughout the paper, we use the Euclidean vector norm defined as \(\Vert x\Vert := \sqrt{x^{\textsf{T}}x}.\) Given vectors \(x^1,\dots ,x^t\), we denote by \((x^1,\dots ,x^t)\) the vector \(({x^1}^{\textsf{T}},\dots ,{x^t}^{\textsf{T}})^{\textsf{T}}\). The orthogonal complement of a linear space \({\mathcal {L}}\) is denoted by \({\mathcal {L}}^\perp \).
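
As a toy illustration of these definitions (made-up basis, numpy assumed), the following builds a lattice of rank 2 and dimension 3, which is not full rank, and tests membership in a ball:

```python
import itertools
import numpy as np

# A lattice of rank 2 and dimension 3, generated by the basis b1, b2:
# its points are exactly the integer combinations v1*b1 + v2*b2.
b1 = np.array([1.0, 0.0, 1.0])
b2 = np.array([0.0, 2.0, 1.0])
points = [v1 * b1 + v2 * b2
          for v1, v2 in itertools.product(range(-2, 3), repeat=2)]

# Membership in the closed ball B(a, r) for the Euclidean norm:
a, r = np.zeros(3), 2.5
in_ball = [p for p in points if np.linalg.norm(p - a) <= r]
```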

We are now in a position to give the formal definition of a spherical form MIQP. A spherical form MIQP is an optimization problem of the form

$$\begin{aligned} \begin{aligned} \min&\quad y^{\textsf{T}}D y + c^{\textsf{T}}y + l^{\textsf{T}}z \\ {{\,\mathrm{s.t.}\,}}&\quad (y,z) \in {\mathcal {P}}\\&\quad y \in \Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp , \ z \in {\mathbb {R}}^{n-d}. \end{aligned} \end{aligned}$$
(S-MIQP)

In this formulation, the variables are \(y \in {\mathbb {R}}^d\) and \(z \in {\mathbb {R}}^{n-d}\). The matrix \(D \in {\mathbb {Q}}^{d\times d}\) is diagonal and its diagonal elements satisfy \(|D_{11}| \ge \cdots \ge |D_{dd}|\). Furthermore, \(c \in {\mathbb {Q}}^d\), and \(l\in {\mathbb {Q}}^{n-d}\). The set \(\Lambda \) is a lattice of rank p and dimension \(d\), and is given via a rational lattice basis. Finally, the set \({\mathcal {P}}\subseteq {\mathbb {R}}^n\) is a polytope given via a finite system of rational linear inequalities, and it satisfies

$$\begin{aligned} {\mathcal {B}}(a,1) \subset {{\,\textrm{proj}\,}}_y {\mathcal {P}}\subset {\mathcal {B}}(a,r_d), \end{aligned}$$
(9)

where \({{\,\textrm{proj}\,}}_y {\mathcal {P}}\) denotes the orthogonal projection of \({\mathcal {P}}\) onto the space \({\mathbb {R}}^d\) of the y variables, a is a given vector in \({\mathbb {Q}}^d\), and \(r_d\) is an integer of size polynomial in \(d\).

The symmetric decomposition algorithm described in Sect. 2 allows us to obtain, in strongly polynomial time, a change of basis that directly transforms (MIQP) into a separable form. In this section, our main goal is to obtain another change of basis that not only maps (MIQP) into a separable form, but also guarantees that the resulting problem is in spherical form. The additional requirements on the change of basis will result in an algorithm that is polynomial time instead of strongly polynomial. To obtain this change of basis, we rely on two key results: (i) the symmetric decomposition algorithm discussed in Sect. 2, and (ii) the existence of an algorithm based on linear programming that, for every full-dimensional polytope \({\mathcal {P}}\), constructs a pair of concentric ellipsoids \({\mathcal {E}}_1\), \({\mathcal {E}}_2\) such that \({\mathcal {E}}_1 \subset {\mathcal {P}}\subset {\mathcal {E}}_2\) and \({\mathcal {E}}_1\) is obtained by shrinking \({\mathcal {E}}_2\) by a factor depending only on the dimension [25].

3.1 Simultaneous diagonalization

The first result of this section does not deal directly with MIQP, but it is the main building block that will allow us to transform a MIQP into spherical form. This result can be interpreted as a rational version of classic theorems on simultaneous diagonalization of matrices (see Sect. 8.7 in [16]).

In order to present our result we need to introduce ellipsoids. An ellipsoid in \({\mathbb {R}}^n\) is an affine transformation of the unit ball. That is, an ellipsoid is a set of the form

$$\begin{aligned} {\mathcal {E}}(a,L) = \{x \in {\mathbb {R}}^n : \Vert L^{\textsf{T}}(x-a)\Vert \le 1\}, \end{aligned}$$

where \(a\in {\mathbb {R}}^n\) and L is an \(n \times n\) invertible matrix. Note that \( {\mathcal {B}}(a,r) = {\mathcal {E}}(a,I_n/r), \) where \(I_n\) denotes the \(n \times n\) identity matrix.
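In code, membership in \({\mathcal {E}}(a,L)\) is a one-liner (numpy assumed; the helper name is ours), and the identity \( {\mathcal {B}}(a,r) = {\mathcal {E}}(a,I_n/r)\) is immediate:

```python
import numpy as np

def in_ellipsoid(x, a, L):
    """Membership in E(a, L) = { x : ||L^T (x - a)|| <= 1 }."""
    return np.linalg.norm(L.T @ (x - a)) <= 1.0

# B(a, r) = E(a, I/r): with L = I/r the test reads ||x - a|| / r <= 1.
```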

In what follows, we will often work with rational linear subspaces. In the context of polynomial time algorithms, it is not important whether they are given to us via a system of linear equations or via a basis, since each description can be obtained in polynomial time from the other. Given a linear subspace \({\mathcal {L}}\) of \({\mathbb {R}}^n\) of dimension \(d\), a basis matrix of \({\mathcal {L}}\) is an \(n \times d\) matrix whose columns \(b^1,\dots ,b^d\) form a basis of \({\mathcal {L}}\). An \({\mathcal {L}}\)-ellipsoid is a set of the form

$$\begin{aligned} {\mathcal {E}}_{\mathcal {L}}(a,L) = \{x \in {\mathcal {L}}: \Vert L^{\textsf{T}}(x-a)\Vert \le 1\}, \end{aligned}$$

where \({\mathcal {L}}\) is a linear subspace of \({\mathbb {R}}^n\), \(a\in {\mathcal {L}}\), and L is a basis matrix of \({\mathcal {L}}\). Given a linear subspace \({\mathcal {L}}\) of \({\mathbb {R}}^n\) and a set \({\mathcal {S}}\subseteq {\mathbb {R}}^n\), we denote by \({{\,\textrm{proj}\,}}_{\mathcal {L}}({\mathcal {S}})\) the orthogonal projection of \({\mathcal {S}}\) onto \({\mathcal {L}}\). We also say that a polyhedron \(\{x : Wx \le w\}\) is rational if W and w are rational. We are now ready to present the first result of this section.

Proposition 1

Let H be a rational symmetric \(n \times n\) matrix of rank k, let \(\{x \in {\mathbb {R}}^n: Wx \le w\}\) be a full-dimensional rational polytope, and let \({\mathcal {M}}\) be a rational linear subspace of \({\mathbb {R}}^n\) of dimension p. There is a polynomial time algorithm that finds a linear subspace \({\mathcal {L}}\) of \({\mathbb {R}}^n\) containing \({\mathcal {M}}\) and of dimension \(d\) with \(\max \{k,p\} \le d\le k+p\), a \(d\times d\) diagonal matrix D, and an \({\mathcal {L}}\)-ellipsoid \({\mathcal {E}}_{{\mathcal {L}}} (a,L)\) such that

  1. (i)

    \(H = L D L^{\textsf{T}}\),

  2. (ii)

    \({\mathcal {E}}_{{\mathcal {L}}} (a, L) \subset {{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\} \subset {\mathcal {E}}_{{\mathcal {L}}} (a,L / (2 d^{3/2} \lceil (5d)^{d/2}\rceil ^2)).\)

Proof

By Corollary 1 there is a strongly polynomial algorithm that computes an invertible \(n \times n\) matrix \(L_1\) and an \(n \times n\) diagonal matrix \(D_1\) such that \(H = L_1 D_1 L_1^{\textsf{T}}\). Since H has rank k and \(L_1\) is invertible, the matrix \(D_1\) also has rank k. Let \(D_2\) be the matrix obtained from \(D_1\) by deleting row i and column i for each i such that the ith diagonal element of \(D_1\) is zero. Clearly, \(D_2\) is an invertible \(k \times k\) diagonal matrix. We also define the matrix \(L_2\), obtained from \(L_1\) by deleting column i for each i such that the ith diagonal element of \(D_1\) is zero. Since row and column i of \(D_1\) have all zero elements for each deleted index i, we have \(H = L_1 D_1 L_1^{\textsf{T}}= L_2 D_2 L_2^{\textsf{T}}\). The matrix \(L_2\) is then an \(n \times k\) matrix of rank k.

Let \({\mathcal {L}}\) be the linear subspace of \({\mathbb {R}}^n\) obtained as the Minkowski sum of \({\mathcal {M}}\) and of the linear space spanned by the k columns of \(L_2\). Clearly, \({\mathcal {L}}\) contains \({\mathcal {M}}\), and its dimension \(d\) satisfies \(\max \{k,p\} \le d\le k+p\). Note that \({{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\}\) is full-dimensional. It then follows from Sections 2 and 5 in [25] that there is a polynomial time algorithm which computes an \({\mathcal {L}}\)-ellipsoid \({\mathcal {E}}_{{\mathcal {L}}} (a,C)\) such that

$$\begin{aligned} {\mathcal {E}}_{{\mathcal {L}}}(a, C) \subset {{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\} \subset {\mathcal {E}}_{{\mathcal {L}}}(a,C / (2 d^{3/2})). \end{aligned}$$
(10)

Since the \(n \times d\) matrix C is a basis matrix of \({\mathcal {L}}\) and each column of \(L_2\) is a vector in \({\mathcal {L}}\), we can compute in polynomial time a \(d\times k\) matrix M such that \(L_2 = C M\). We obtain

$$\begin{aligned} H = L_2 D_2 L_2^{\textsf{T}}= C M D_2 M^{\textsf{T}}C^{\textsf{T}}= C {\tilde{H}} C^{\textsf{T}}, \end{aligned}$$

where \({\tilde{H}} := M D_2 M^{\textsf{T}}\) is a \(d\times d\) symmetric matrix.

By Corollary 1, applied to \({\tilde{H}}\), there is a strongly polynomial algorithm which computes an invertible \(d\times d\) matrix \({\tilde{L}}\) and a \(d\times d\) diagonal matrix \({\tilde{D}}\) such that \({\tilde{H}} = {\tilde{L}} {\tilde{D}} {\tilde{L}}^{\textsf{T}}\). Furthermore, \(\Vert {\tilde{L}}\Vert _F\) and \(\Vert {\tilde{L}}^{-1}\Vert _F\) are upper bounded by \(q_d:= \lceil (5d)^{d/2}\rceil \). Note that \(q_d\) is an integer of size polynomial in \(d\). We obtain \(H = C {\tilde{L}} {\tilde{D}} \tilde{L}^{\textsf{T}}C^{\textsf{T}}.\) By defining the \(d\times d\) matrix D and the \(n \times d\) matrix L in the statement as \(D := \tilde{D} / q_d^2,\) \(L := q_dC {\tilde{L}},\) we obtain \(H = L D L^{\textsf{T}}\). Clearly, D is diagonal, thus condition (i) in the statement holds.

Note that the vector \(a\) is in \({\mathcal {L}}\). Moreover, since C is a basis matrix of \({\mathcal {L}}\) and \({\tilde{L}}\) is invertible, we have that also L is a basis matrix of \({\mathcal {L}}\). Hence \({\mathcal {E}}_{{\mathcal {L}}} (a,L)\) is an \({\mathcal {L}}\)-ellipsoid. We now show that condition (ii) is satisfied. Using the fact that the Frobenius norm is submultiplicative and that \(\Vert {\tilde{L}}\Vert _F\) and \(\Vert {\tilde{L}}^{-1}\Vert _F\) are upper bounded by \(q_d\), we obtain

$$\begin{aligned} \Vert C^{\textsf{T}}(x - a)\Vert&= \Vert {\tilde{L}}^{-{\textsf{T}}} L^{\textsf{T}}(x - a)\Vert /q_d\le \Vert {\tilde{L}}^{-1}\Vert _F \, \Vert L^{\textsf{T}}(x - a)\Vert / q_d\le \Vert L^{\textsf{T}}(x - a)\Vert , \\ \Vert L^{\textsf{T}}(x - a)\Vert&= q_d\Vert \tilde{L}^{\textsf{T}}C^{\textsf{T}}(x - a)\Vert \le q_d\Vert {\tilde{L}}\Vert _F \, \Vert C^{\textsf{T}}(x - a)\Vert \le q_d^2 \Vert C^{\textsf{T}}(x - a)\Vert . \end{aligned}$$

The first chain of inequalities and (10) imply

$$\begin{aligned} {\mathcal {E}}_{{\mathcal {L}}} (a, L) \subseteq {\mathcal {E}}_{{\mathcal {L}}} (a, C) \subset {{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\}. \end{aligned}$$

The second chain of inequalities implies \({\mathcal {E}}_{{\mathcal {L}}} (a,q_d^2 C) \subseteq {\mathcal {E}}_{{\mathcal {L}}} (a, L)\), thus from (10),

$$\begin{aligned} {{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\} \subset {\mathcal {E}}_{{\mathcal {L}}} (a, C / (2 d^{3/2})) \subseteq {\mathcal {E}}_{{\mathcal {L}}} (a, L / (2 d^{3/2} q_d^2)). \end{aligned}$$

\(\square \)

Consider now the simplest case of Proposition 1, where we set \({\mathcal {M}}:= {\mathbb {R}}^n\). Then \({\mathcal {L}}= {\mathbb {R}}^n\), \(d=n\), the \({\mathcal {L}}\)-ellipsoids are just ellipsoids, and the polytope \({{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\}\) is simply \(\{x: Wx \le w\}\). In this case, Proposition 1 provides a matrix L that at the same time diagonalizes H and provides the shape of an ellipsoid that approximates the given polytope within a factor depending only on the dimension. This special case can then be interpreted as a rational version of theorems on simultaneous diagonalization of matrices. If we perform the change of basis \(y := L^{\textsf{T}}x\), the given matrix H is diagonalized, and the ellipsoids are just balls.

3.2 Reduction to spherical form MIQP

Next, we employ Proposition 1 to show that (MIQP) can be transformed into spherical form (S-MIQP). Throughout the paper, we denote by \(e^1, e^2, \dots ,e^n\) the standard basis of \({\mathbb {R}}^n\).

Proposition 2

Consider (MIQP), assume that \(\{x: Wx \le w\}\) is a full-dimensional polytope, and let k denote the rank of H. There is a polynomial time algorithm that finds a change of basis that transforms (MIQP) into spherical form (S-MIQP), where \(d\) satisfies \(\max \{k,p\} \le d\le k+p\), the rank of the matrix D is k, and \(r_d\) in (9) is the ceiling of \(2 d^{3/2} \lceil (5d)^{d/2}\rceil ^2\).

Proof

Consider (MIQP), assume that \(\{x: Wx \le w\}\) is a full-dimensional polytope, and let k denote the rank of H. By Proposition 1 with \({\mathcal {M}}:= {\mathbb {R}}^p \times \{0\}^{n-p}\), we obtain in polynomial time a linear subspace \({\mathcal {L}}\) of \({\mathbb {R}}^n\) containing \({\mathcal {M}}\) and of dimension \(d\) with \(\max \{k,p\} \le d\le k+p\), a \(d\times d\) diagonal matrix D, and an \({\mathcal {L}}\)-ellipsoid \({\mathcal {E}}_{{\mathcal {L}}}(a, L_y)\) such that \(H = L_y D L_y^{\textsf{T}}\) and

$$\begin{aligned} {\mathcal {E}}_{{\mathcal {L}}} (a, L_y) \subset {{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x: Wx \le w\} \subset {\mathcal {E}}_{{\mathcal {L}}} (a, L_y / r_d), \end{aligned}$$
(11)

where we define \(r_d\) as the ceiling of \(2 d^{3/2} \lceil (5d)^{d/2}\rceil ^2\). Since \(L_y\) is an \(n \times d\) matrix of rank \(d\), it is simple to check that the rank of D coincides with the rank of H.

We now compute an \(n \times (n-d)\) basis matrix \(L_z\) of the orthogonal complement \({\mathcal {L}}^\perp \) of \({\mathcal {L}}\). Denote by L the \(n \times n\) invertible matrix \((L_y \mid L_z)\). We perform the change of basis \(x \mapsto (y,z)\), where \((y,z) \in {\mathbb {R}}^n\) is defined by \((y,z) := L^{\textsf{T}}x\), i.e., \(y \in {\mathbb {R}}^d\) is defined by \(y := L_y^{\textsf{T}}x\), and \(z \in {\mathbb {R}}^{n-d}\) is defined by \(z := L_z^{\textsf{T}}x\).

Next, we consider the problem obtained from (MIQP) via the above change of basis, and we show that it coincides with (S-MIQP). The objective function of the new problem is

$$\begin{aligned} x^{\textsf{T}}H x + h^{\textsf{T}}x = x^{\textsf{T}}L_y D L_y^{\textsf{T}}x + h^{\textsf{T}}x = y^{\textsf{T}}D y + h^{\textsf{T}}L^{-{\textsf{T}}} (y,z), \end{aligned}$$

which coincides with the objective function of (S-MIQP) if we define the vectors \(c \in {\mathbb {Q}}^d\) and \(l\in {\mathbb {Q}}^{n-d}\) by \((c,l) := L^{-1} h\). The image of the polytope \(\{x: Wx \le w\}\) is the set \({\mathcal {P}}:= \{(y,z) : WL^{-{\textsf{T}}} (y,z) \le w\}\). Clearly, \({\mathcal {P}}\) is a polytope defined by a finite system of rational linear inequalities.

By definition of \(L_z\), the linear subspace \({\mathcal {L}}\) can be written as \({\mathcal {L}}= \{x : L_z^{\textsf{T}}x = 0\}\), thus the image of \({\mathcal {L}}\) under the change of basis is \(\{(y,z) : z = 0\} = {\mathbb {R}}^d\times \{0\}^{n-d}\). Similarly, the linear subspace \({\mathcal {L}}^\perp \) can be written as \({\mathcal {L}}^\perp = \{x : L_y^{\textsf{T}}x = 0\}\), thus the image of \({\mathcal {L}}^\perp \) is \(\{0\}^d\times {\mathbb {R}}^{n-d}\).

Next we show that (11) implies (9). The above discussion implies that a point \({{\,\textrm{proj}\,}}_{{\mathcal {L}}} (x)\) is mapped to \({{\,\textrm{proj}\,}}_y (L^{-{\textsf{T}}} (y,z)) \times \{0\}^{n-d}\). Thus, \({{\,\textrm{proj}\,}}_{{\mathcal {L}}} \{x : Wx \le w\}\) is mapped to \({{\,\textrm{proj}\,}}_y {\mathcal {P}}\times \{0\}^{n-d}\). The set \({\mathcal {E}}_{{\mathcal {L}}} (a, L_y)\) is mapped to

$$\begin{aligned} \left\{ (y,z) : \Vert y-L_y^{\textsf{T}}a\Vert \le 1\right\} \cap ({\mathbb {R}}^d\times \{0\}^{n-d})&= {\mathcal {E}}(L_y^{\textsf{T}}a, I_d) \times \{0\}^{n-d} \\&= {\mathcal {B}}(L_y^{\textsf{T}}a,1) \times \{0\}^{n-d}. \end{aligned}$$

Similarly, the set \({\mathcal {E}}_{{\mathcal {L}}} (a, L_y/r_d)\) is mapped to

$$\begin{aligned} \left\{ (y,z) : \Vert (y-L_y^{\textsf{T}}a)/r_d\Vert \le 1\right\} \cap ({\mathbb {R}}^d\times \{0\}^{n-d})&= {\mathcal {E}}(L_y^{\textsf{T}}a, I_d/ r_d) \times \{0\}^{n-d} \\&= {\mathcal {B}}(L_y^{\textsf{T}}a,r_d) \times \{0\}^{n-d}. \end{aligned}$$

From (11), we obtain \({\mathcal {B}}(L_y^{\textsf{T}}a,1) \subset {{\,\textrm{proj}\,}}_y {\mathcal {P}}\subset {\mathcal {B}}(L_y^{\textsf{T}}a,r_d),\) which coincides with (9) if we redefine the vector \(a\in {\mathbb {Q}}^d\) to be \(L_y^{\textsf{T}}a\).

We now consider the image of \({\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}\). The set \({\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}\) can be written as the Minkowski sum \(({\mathbb {Z}}^p \times \{0\}^{n-p}) + {\mathcal {N}}+ {\mathcal {L}}^\perp \), where \({\mathcal {N}}\) is the orthogonal complement of \({\mathcal {M}}\) in \({\mathcal {L}}\). Since \({\mathcal {M}}\subseteq {\mathcal {L}}\) and the image of \({\mathcal {L}}\) is \({\mathbb {R}}^d\times \{0\}^{n-d}\), we have that the image of \({\mathbb {Z}}^p \times \{0\}^{n-p}\) is \(\Lambda \times \{0\}^{n-d}\), where \(\Lambda \) is a lattice of rank p and dimension \(d\). Furthermore, the image of the vectors \(e^1, e^2, \dots ,e^p\) forms a lattice basis \(b^1,\dots ,b^p\) of \(\Lambda \). Since \({\mathcal {N}}\subseteq {\mathcal {L}}\), the image of \({\mathcal {N}}\) is \({\mathcal {N}}' \times \{0\}^{n-d}\), where \({\mathcal {N}}'\) is a linear subspace of \({\mathbb {R}}^d\) of dimension \(d-p\). Since \({\mathcal {M}}\) and \({\mathcal {N}}\) are orthogonal, we have that \(\Lambda + {\mathcal {N}}'\) has dimension \(d\). Finally, we know that the image of \({\mathcal {L}}^\perp \) is \(\{0\}^d\times {\mathbb {R}}^{n-d}\). We conclude that the image of \({\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}\) is \((\Lambda + {\mathcal {N}}') \times {\mathbb {R}}^{n-d}\). Let \(\Lambda '\) be the orthogonal projection of \(\Lambda \) onto \({{\mathcal {N}}'}^\perp \). Then \(\Lambda '\) is a lattice of rank p and dimension \(d\), and the image of \({\mathbb {Z}}^p \times {\mathbb {R}}^{n-p}\) is \((\Lambda ' + {{\,\textrm{span}\,}}(\Lambda ')^\perp ) \times {\mathbb {R}}^{n-d}\) as desired. A basis of \(\Lambda '\) can be obtained by taking the orthogonal projection of \(b^1,\dots ,b^p\) onto \({{\mathcal {N}}'}^\perp \).

By reordering the components of the vector y if necessary, and accordingly the data of the problem, we obtain that the diagonal elements of the matrix D satisfy \(|D_{11}| \ge \cdots \ge |D_{dd}|\). \(\square \)
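
The linear-algebra core of this proof is compact. A floating-point sketch (numpy assumed; the paper's algorithm is exact): given the matrix \(L_y\) from Proposition 1, complete it to a basis of \({\mathbb {R}}^n\) with a basis matrix \(L_z\) of \({\mathcal {L}}^\perp \) and transform the data accordingly.

```python
import numpy as np

def spherical_change_of_basis(L_y, W, w, h):
    """Change of basis (y, z) := L^T x with L = (L_y | L_z), as in Prop. 2."""
    n, d = L_y.shape
    _, _, Vt = np.linalg.svd(L_y.T)      # L_y^T is d x n and has rank d
    L_z = Vt[d:].T                       # columns span L^perp = ker(L_y^T)
    L = np.hstack([L_y, L_z])            # invertible n x n matrix
    L_inv = np.linalg.inv(L)
    W_new = W @ L_inv.T                  # constraints: W L^{-T} (y, z) <= w
    c_l = L_inv @ h                      # (c, l) := L^{-1} h
    return L, W_new, w, c_l[:d], c_l[d:]
```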

Next, we briefly discuss how Proposition 2 simplifies in the pure integer setting and in the pure continuous setting. In the pure integer setting we have \(p=n\) in (MIQP), and Proposition 2 implies \(d= n\). Therefore, in (S-MIQP) we have no z variables and the constraint \(y \in \Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp \) is replaced by \(y \in \Lambda \) since the set \(\Lambda \) is a full rank lattice of dimension n. Furthermore, in (9), the set \({{\,\textrm{proj}\,}}_y {\mathcal {P}}\) is replaced by \({\mathcal {P}}\). In the pure continuous setting we have \(p=0\) in (MIQP), and Proposition 2 implies \(d= k\). Therefore, in (S-MIQP) the constraint \(y \in \Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp \) is replaced by \(y \in {\mathbb {R}}^d\) since the set \(\Lambda \) is a lattice of rank zero.

We remark that a change of basis similar to the one given by Proposition 2 can be obtained through the use of eigenvalue methods like the Schur decomposition [16], instead of our symmetric decomposition algorithm. These techniques have been used by Vavasis to obtain a related change of basis for QP (see page 282 in [30]). Unfortunately, these methods do not yield polynomial time algorithms since symmetric integer matrices can have irrational eigenvalues.

4 Aligned vectors

In this section, we introduce the notion of aligned vectors. Given an instance of problem (S-MIQP), two vectors \(y^+,y^- \in {\mathbb {R}}^d\) are said to be aligned if \(y^+,y^- \in {\mathcal {B}}(a,1) \cap (2\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp )\), \(y^+_1 - y^-_1 \ge 1\), and \(\sum _{i=2}^d(y^+_i - y^-_i)^2 \le 1/4\). The end goal of this section is to show that, if there exist two aligned vectors, then, for every \(\epsilon \in (0,1]\), it is possible to find an \(\epsilon \)-approximate solution to (S-MIQP) by solving a number of MILPs.
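
The two metric conditions in this definition are straightforward to test; a small sketch follows (numpy assumed, helper name ours), which leaves the membership condition \(y^+,y^- \in {\mathcal {B}}(a,1) \cap (2\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp )\) to be checked separately:

```python
import numpy as np

def aligned_metric_conditions(y_plus, y_minus):
    """Check y+_1 - y-_1 >= 1 and sum_{i>=2} (y+_i - y-_i)^2 <= 1/4."""
    y_plus, y_minus = np.asarray(y_plus), np.asarray(y_minus)
    gap = y_plus[0] - y_minus[0]          # "far" in the first direction
    drift = y_plus[1:] - y_minus[1:]      # "almost aligned" in the others
    return gap >= 1.0 and float(drift @ drift) <= 0.25
```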

We begin by showing, in Lemma 4, how aligned vectors allow us to obtain a lower bound on the gap between maximum and minimum of a separable quadratic function evaluated on the two vectors and their midpoint. In the proof of Lemma 4 we use the following simple lemma; its proof is that of Lemma 3 in [30], even though our statement is slightly stronger.

Lemma 3

Let \(q(\lambda ) = a \lambda ^2 + b \lambda + c\) be a univariate quadratic function and let \(u, \ell \in {\mathbb {R}}\). Let \(\underline{q}\) and \({\overline{q}}\) be the minimum and maximum values attained by q on the three points \(u, \ell , (u+ \ell )/2\). Then \({\overline{q}} - \underline{q} \ge |a| (u-\ell )^2/4.\)
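
For completeness, we sketch the short computation behind Lemma 3. Evaluating q at the three points, the linear and constant terms cancel and

$$\begin{aligned} q(u) + q(\ell ) - 2\, q\left( \frac{u+\ell }{2}\right) = a \left( u^2 + \ell ^2 - \frac{(u+\ell )^2}{2}\right) = \frac{a (u-\ell )^2}{2}. \end{aligned}$$

Since each of the three values \(q(u), q(\ell ), q((u+\ell )/2)\) lies between \(\underline{q}\) and \({\overline{q}}\), the left-hand side is at most \(2({\overline{q}} - \underline{q})\) in absolute value, and the bound \({\overline{q}} - \underline{q} \ge |a|(u-\ell )^2/4\) follows.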

Lemma 4

Let \(f : {\mathbb {R}}^d\times {\mathbb {R}}^{n-d} \rightarrow {\mathbb {R}}\) be a quadratic function of the form \(f(y,z) = y^{\textsf{T}}D y + c^{\textsf{T}}y + l^{\textsf{T}}z\), where D is diagonal and \(D_{11}\) is the element of D with the largest absolute value. Let \((y^+,z^+), (y^-,z^-) \in {\mathbb {R}}^d\times {\mathbb {R}}^{n-d}\) be such that \(y^+_1 - y^-_1 \ge 1\) and \(\sum _{i=2}^d(y^+_i - y^-_i)^2 \le 1/4\). Let \(\underline{f}\) and \({\overline{f}}\) be the minimum and maximum values attained by f on the three vectors \((y^+,z^+), (y^-,z^-), (y^+,z^+)/2 + (y^-,z^-)/2\). Then \({\overline{f}} - \underline{f} \ge \frac{3}{16} |D_{11}|.\)

Proof

By possibly replacing f with \(-f\), we can assume without loss of generality that \(D_{11} \ge 0\). Let \(q : {\mathbb {R}}\rightarrow {\mathbb {R}}\) be defined by

$$\begin{aligned} q(\lambda ) := f\left( (y^-,z^-) + \lambda \left( (y^+,z^+) - (y^-,z^-)\right) \right) . \end{aligned}$$

Using the separability of f, we obtain

$$\begin{aligned} q(\lambda ) = \sum _{i=1}^dD_{ii} \left( y^-_i + \lambda (y^+_i - y^-_i)\right) ^2 + O(\lambda ) = \lambda ^2 \cdot \sum _{i=1}^dD_{ii} (y^+_i - y^-_i)^2 + O(\lambda ). \end{aligned}$$

To conclude the proof we just need to show that

$$\begin{aligned} \left|\sum _{i=1}^dD_{ii} (y^+_i - y^-_i)^2\right| \ge \frac{3}{4} D_{11}. \end{aligned}$$
(12)

In fact, by noting that \(q(0) = f(y^-,z^-)\), \(q(1) = f(y^+,z^+)\), and \(q(1/2) = f \left( (y^+,z^+)/2 + (y^-,z^-)/2\right) \), we can apply Lemma 3 to q and the points \(0,1 \in {\mathbb {R}}\) and obtain

$$\begin{aligned} {\overline{f}} - \underline{f} = {\overline{q}} - \underline{q} \ge \frac{1}{4} \left|\sum _{i=1}^dD_{ii} (y^+_i - y^-_i)^2\right| \ge \frac{3}{16} D_{11}. \end{aligned}$$

To prove inequality (12), we bound its left hand side as follows:

$$\begin{aligned} \left|\sum _{i=1}^dD_{ii} (y^+_i - y^-_i)^2\right|&\ge \sum _{i=1}^dD_{ii} (y^+_i - y^-_i)^2 \\&= \sum _{i : D_{ii} \ge 0} D_{ii} (y^+_i - y^-_i)^2 - \sum _{i : D_{ii}< 0} -D_{ii} (y^+_i - y^-_i)^2. \end{aligned}$$

We can now separately bound the two nonnegative sums using the assumption on \(D_{11}\) and the conditions \(y^+_1 - y^-_1 \ge 1\) and \(\sum _{i=2}^d(y^+_i - y^-_i)^2 \le 1/4\):

$$\begin{aligned} \sum _{i : D_{ii} \ge 0} D_{ii} (y^+_i - y^-_i)^2&\ge D_{11} (y^+_1 - y^-_1)^2 \ge D_{11}, \\ \sum _{i : D_{ii}< 0} -D_{ii} (y^+_i - y^-_i)^2&\le D_{11} \sum _{i : D_{ii} < 0} (y^+_i - y^-_i)^2 \le D_{11} / 4. \end{aligned}$$

Hence inequality (12) holds. \(\square \)

We are now ready to discuss our approximation algorithm for spherical form MIQPs for which there exist two aligned vectors. This algorithm is based on the classic technique of mesh partition and linear underestimators. This natural approach consists in replacing the nonlinear objective function with a piecewise linear approximation, an idea known in the field of optimization since at least the 1950s. Mesh partition and linear underestimators have proven to be a very successful technique for obtaining approximation algorithms for several special classes of MIQP [7,8,9, 30, 31]. In this section, we apply, for the first time, mesh partition and linear underestimators to MIQPs that, at the same time, have integer variables and an indefinite quadratic objective function. The generality of this setting poses a number of additional challenges, and the results presented in the paper so far provide the key to successfully applying these techniques. In the proof, we will use the following standard lemma.

Lemma 5

Let \(q(\lambda ) = a \lambda ^2 + b \lambda + c\) be a univariate quadratic function and let \(u, \ell \in {\mathbb {R}}\). Let \(q'(\lambda )\) be the affine univariate function that attains the same values as q at \(\ell ,u\). Then \(|q'(\lambda ) - q(\lambda )| \le |a| (u-\ell )^2/4\) for every \(\lambda \in [\ell ,u]\).
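To see why, note that \(q' - q\) is a quadratic function with leading coefficient \(-a\) that vanishes at \(\ell \) and \(u\), hence

$$\begin{aligned} q'(\lambda ) - q(\lambda ) = -a (\lambda - \ell )(\lambda - u), \end{aligned}$$

and \((\lambda - \ell )(u - \lambda ) \le (u-\ell )^2/4\) for every \(\lambda \in [\ell ,u]\), with equality at the midpoint.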

Proposition 3

Consider (S-MIQP), assume that there exist two aligned vectors, and let k be the rank of the matrix D. For every \(\epsilon \in (0,1]\), there is an algorithm that finds an \(\epsilon \)-approximate solution, if it exists, by solving at most \(\left\lceil 4 r_{d} \sqrt{k/(3 \epsilon )}\right\rceil ^k\) MILPs of the same size as (S-MIQP) and with p integer variables.

Proof

We start by describing the approximation algorithm. We define \(\varphi ^k\) boxes in \({\mathbb {R}}^{k}\), where \(\varphi := \left\lceil 4 r_d\sqrt{k/(3 \epsilon )}\right\rceil \):

$$\begin{aligned} {\mathcal {C}}_{j_1,\dots ,j_k} := \prod _{i=1}^k \left( \left\{ a_i - r_d\right\} + \frac{2 r_d}{\varphi } \left[ j_i - 1 , j_i \right] \right) \quad \forall j_1,\dots ,j_k \in \{1,\dots , \varphi \}. \end{aligned}$$
(13)

Note that the union of these \(\varphi ^k\) boxes is the polytope

$$\begin{aligned} \{(y_1,\dots ,y_k) \in {\mathbb {R}}^{k} : a_i - r_d\le y_i \le a_i + r_d\ \forall i =1,\dots ,k\}, \end{aligned}$$

which contains the projection of \({\mathcal {P}}\) onto the space defined by the first k coordinates of y, since \({{\,\textrm{proj}\,}}_y {\mathcal {P}}\subset {\mathcal {B}}(a,r_d)\) from (9).

For each box \({\mathcal {C}}= \prod _{i=1}^k [\ell _i,u_i]\) among those defined in (13), we construct the affine functions \(g_i : {\mathbb {R}}\rightarrow {\mathbb {R}}\) that attain the same values as \(D_{ii} y_i^2\) at \(\ell _i,u_i\), for \(i=1,\dots ,k\):

$$\begin{aligned} g_i(y_i) := D_{ii}(\ell _i+u_i) y_i - D_{ii} \ell _i u_i \qquad \forall i = 1,\dots , k. \end{aligned}$$

We define \(\gamma := |D_{11}|\) and the affine function \(g: {\mathbb {R}}^{k} \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} g(y_1,\dots ,y_k) := \sum _{i=1}^k g_i(y_i) - \frac{\gamma r_d^2}{\varphi ^2} |\{i \in \{1,\dots ,k\} : D_{ii} > 0\}|. \end{aligned}$$
(14)

We solve the MILP obtained from (S-MIQP) by substituting \(y^{\textsf{T}}D y\) with \(g(y_1,\dots ,y_k)\) and adding the constraint \((y_1,\dots ,y_k) \in {\mathcal {C}}\):

$$\begin{aligned} \begin{aligned} \min&\quad g(y_1,\dots ,y_k) + c^{\textsf{T}}y + l^{\textsf{T}}z \\ {{\,\mathrm{s.t.}\,}}&\quad (y,z) \in {\mathcal {P}}\\&\quad (y_1,\dots ,y_k) \in {\mathcal {C}}\\&\quad y \in \Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp , \ z \in {\mathbb {R}}^{n-d}. \end{aligned} \end{aligned}$$
(15)

To see that (15) is indeed a MILP, one just needs to perform a change of basis that maps \(\Lambda \) to \({\mathbb {Z}}^p \times \{0\}^{d-p}\) and \({{\,\textrm{span}\,}}(\Lambda )^\perp \) to \(\{0\}^{p} \times {\mathbb {R}}^{d-p}\).

The approximation algorithm returns the best solution \((y^\diamond , z^\diamond )\) among the optimal solutions of the (at most) \(\varphi ^k\) MILPs (15) just solved. If all the MILPs (15) are infeasible, the algorithm returns that (S-MIQP) is infeasible. This concludes the description of the algorithm.
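For concreteness, the construction of the boxes (13) and of the affine underestimators (14) can be sketched as follows. The naming is ours, the exhaustive enumeration of all \(\varphi ^k\) boxes is for illustration only, and the solution of the MILPs (15) is not included.

```python
import itertools
import math
import numpy as np

def mesh_underestimators(a, r_d, D_diag, k, eps):
    """Yield, for each box (13), the box bounds together with the affine
    underestimator g of (14), encoded so that g(y) = coef @ y[:k] + const."""
    phi = math.ceil(4 * r_d * math.sqrt(k / (3 * eps)))
    gamma = abs(D_diag[0])  # D_11 has the largest absolute value
    shift = gamma * r_d**2 / phi**2 * sum(1 for i in range(k) if D_diag[i] > 0)
    for j in itertools.product(range(1, phi + 1), repeat=k):
        lower = np.array([a[i] - r_d + 2 * r_d * (j[i] - 1) / phi for i in range(k)])
        upper = lower + 2 * r_d / phi
        # g_i(y_i) = D_ii (l_i + u_i) y_i - D_ii l_i u_i interpolates D_ii y_i^2 at l_i, u_i
        coef = np.array([D_diag[i] * (lower[i] + upper[i]) for i in range(k)])
        const = -sum(D_diag[i] * lower[i] * upper[i] for i in range(k)) - shift
        yield lower, upper, coef, const
```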

Next, we show that \((y^\diamond ,z^\diamond )\) is an \(\epsilon \)-approximate solution to (S-MIQP). To simplify the notation, in this proof we denote the objective function of (S-MIQP) by

$$\begin{aligned} f(y, z) := y^{\textsf{T}}D y + c^{\textsf{T}}y + l^{\textsf{T}}z = \sum _{i=1}^k D_{ii} y_i^2 + c^{\textsf{T}}y + l^{\textsf{T}}z. \end{aligned}$$

In order to show that \((y^\diamond ,z^\diamond )\) is an \(\epsilon \)-approximate solution, we derive two bounds: (i) an upper bound on \(f(y^\diamond ,z^\diamond ) - f(y^*,z^*)\), where \((y^*, z^*)\) is an optimal solution to (S-MIQP), and (ii) a lower bound on \(f_{\max } - f(y^*,z^*)\), where \(f_{\max }\) is the maximum value of f(y,z) on the feasible region of (S-MIQP). Note that both bounds will depend linearly on \(\gamma \). This dependence is what allows us to solve a number of MILPs that is independent of \(\gamma \).

\({\underline{\hbox {An upper bound on } f(y^\diamond ,z^\diamond ) - f(y^*,z^*)}}\). Let \({\mathcal {C}}\subset {\mathbb {R}}^k\) be a box constructed in (13), say \({\mathcal {C}}= \prod _{i=1}^k [\ell _i,u_i]\). For each \(i = 1,\dots , k,\) we apply Lemma 5 to the univariate quadratic function \( D_{ii} y_i^2 \) and the points \(\ell _i, u_i\). Since \(u_i - \ell _i = 2 r_d/ \varphi \) and \(|D_{ii}| \le \gamma \) for \(i=1,\dots ,k\), we obtain that, for every \((y_1,\dots ,y_k) \in {\mathcal {C}}\),

$$\begin{aligned} g_i(y_i) - \gamma r_d^2 /\varphi ^2&\le D_{ii} y_i^2 \le g_i(y_i)&\qquad&\text {if } D_{ii}> 0,\\ g_i(y_i)&\le D_{ii} y_i^2 \le g_i(y_i) + \gamma r_d^2 /\varphi ^2&\qquad&\text {if } D_{ii} < 0. \end{aligned}$$

We sum up all these inequalities for \(i=1,\dots ,k\) and obtain that for every \((y_1,\dots ,y_k) \in {\mathcal {C}}\),

$$\begin{aligned} g(y_1,\dots ,y_k) \le \sum _{i=1}^k D_{ii} y_i^2 \le g(y_1,\dots ,y_k) + \gamma k r_d^2 / \varphi ^2. \end{aligned}$$
(16)

Let \({\mathcal {C}}^\diamond \subset {\mathbb {R}}^k\) be the box constructed in (13) that yields the solution \((y^\diamond ,z^\diamond )\) and let \(g^\diamond \) be the corresponding affine function defined in (14). Let \({\mathcal {C}}^* \subset {\mathbb {R}}^k\) be a box constructed in (13) such that \((y^*,z^*) \in {\mathcal {C}}^*\) and let \(g^*\) be the corresponding affine function. We have

$$\begin{aligned} \begin{aligned} f(y^\diamond ,z^\diamond )&\le g^\diamond (y_1^\diamond ,\dots ,y_k^\diamond ) + c^{\textsf{T}}y^\diamond + l^{\textsf{T}}z^\diamond + \gamma k r_d^2 / \varphi ^2 \\&\le g^*(y_1^*,\dots ,y_k^*) + c^{\textsf{T}}y^* + l^{\textsf{T}}z^* + \gamma k r_d^2 / \varphi ^2 \\&\le f(y^*,z^*) + \gamma k r_d^2 / \varphi ^2. \end{aligned} \end{aligned}$$
(17)

The first inequality follows by applying the right inequality in (16) to \({\mathcal {C}}^\diamond \) and \(y^\diamond \). The second inequality holds by definition of \((y^\diamond ,z^\diamond )\). The third inequality follows by applying the left inequality in (16) to \({\mathcal {C}}^*\) and \(y^*\).

\({\underline{\hbox {A lower bound on } f_{\max } - f(y^*, z^*)}}\). By assumption, there exist two aligned vectors \(y^+,y^-\) for (S-MIQP). From (9) we have \({\mathcal {B}}(a,1) \subset {{\,\textrm{proj}\,}}_y {\mathcal {P}}\), thus there exist \(z^+,z^- \in {\mathbb {R}}^{n-d}\) such that the vectors \((y^+,z^+), (y^-,z^-) \in {\mathbb {R}}^d\times {\mathbb {R}}^{n-d}\) are in \({\mathcal {P}}\). We define the midpoint of the segment joining \((y^+,z^+)\) and \((y^-,z^-)\) as \((y^\circ ,z^\circ ) := (y^+,z^+)/2 + (y^-,z^-)/2.\) By convexity, the vector \((y^\circ ,z^\circ )\) is in \({\mathcal {P}}\). Moreover, as both vectors \(y^+ /2\), \(y^- /2\) are in \(\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp \), so is their sum \(y^\circ \). Let \(\underline{f}\) and \({\overline{f}}\) be the minimum and maximum values attained by f on the three vectors \((y^+,z^+)\), \((y^-,z^-)\), \((y^\circ ,z^\circ )\). Then, by Lemma 4, \( {\overline{f}} - \underline{f} \ge \frac{3}{16} |D_{11}| = \frac{3}{16} \gamma . \) Since all three vectors are feasible for (S-MIQP), we conclude that

$$\begin{aligned} f_{\max } - f(y^*, z^*) \ge \frac{3}{16} \gamma . \end{aligned}$$
(18)

We are now ready to show that \((y^\diamond ,z^\diamond )\) is an \(\epsilon \)-approximate solution to (S-MIQP). We have

$$\begin{aligned} f(y^\diamond ,z^\diamond ) - f(y^*,z^*) \le \frac{16 \, k \, r_d^2}{3 \varphi ^2} \left( f_{\max } - f(y^*,z^*)\right) \le \epsilon \left( f_{\max } - f(y^*,z^*)\right) . \end{aligned}$$

In the first inequality we used (17) and (18), and the last inequality holds by the definition of \(\varphi \) given at the beginning of the proof. \(\square \)

In particular, note that the number of MILPs solved in Proposition 3 is polynomial in \(1/\epsilon \) if k and d are fixed. Due to Proposition 2, this is indeed the case if both k and p are fixed in the original (MIQP).

5 Flatness and decomposition of spherical form MIQP

In Sect. 4 we saw that, if a spherical form MIQP has two aligned vectors, then we can find an \(\epsilon \)-approximate solution. But what if there are no aligned vectors? In this section, we show that in this case we can decompose the problem into a number of MIQPs with fewer integer variables. This result will play a crucial role in our approximation algorithm for MIQP. Before stating our theorem, we recall the concepts of width and of reduced basis.

Let \({\mathcal {S}}\subseteq {\mathbb {R}}^d\) be a bounded closed convex set. Given a vector \(v\in {\mathbb {R}}^d\), we define the width of \({\mathcal {S}}\) along \(v\) to be

$$\begin{aligned} {{\,\textrm{width}\,}}_v({\mathcal {S}}) := \max \{ v^{\textsf{T}}y : y \in {\mathcal {S}}\} - \min \{ v^{\textsf{T}}y : y \in {\mathcal {S}}\}. \end{aligned}$$
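When \({\mathcal {S}}\) is a polytope given by linear inequalities, the width along \(v\) amounts to two linear programs. A minimal sketch, assuming \({\mathcal {S}}= \{y : Ay \le b\}\) is bounded and nonempty:

```python
import numpy as np
from scipy.optimize import linprog

def width_along(v, A, b):
    """width_v(S) = max{v^T y : Ay <= b} - min{v^T y : Ay <= b}."""
    free = [(None, None)] * len(v)  # scipy defaults to y >= 0, so make y free
    lo = linprog(c=np.asarray(v), A_ub=A, b_ub=b, bounds=free)
    hi = linprog(c=-np.asarray(v), A_ub=A, b_ub=b, bounds=free)
    return -hi.fun - lo.fun
```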

Let \(\Lambda \) be a lattice of rank p and dimension \(d\), and let \(b^1, \dots , b^p \in {\mathbb {R}}^d\) be a lattice basis of \(\Lambda \). Consider now a vector \(v\in {\mathbb {R}}^d\) that satisfies \(v^{\textsf{T}}b^i \in {\mathbb {Z}}\) for every \(i =1,\dots ,p\). Then \(v^{\textsf{T}}y\) is an integer for every \(y \in \Lambda \) since y can be written as an integer linear combination of the \(b^i\). It follows that \({{\,\textrm{width}\,}}_v({\mathcal {S}}) + 1\) is an upper bound on the number of hyperplanes orthogonal to \(v\) that contain points of \({\mathcal {S}}\cap \Lambda \).

Next, we recall the notion of reduced basis. Let \(\Lambda \) be a lattice of rank p and dimension \(d\), and let \(b^1, \dots , b^p \in {\mathbb {R}}^d\) be a lattice basis of \(\Lambda \). The \(d\times p\) matrix B formed by taking the columns to be the basis vectors \(b^i\) is called a basis matrix of \(\Lambda \). The determinant of \(\Lambda \) is the volume of the fundamental parallelepiped of any basis for \(\Lambda \), that is, \(\det (\Lambda ) := \sqrt{\det (B^{\textsf{T}}B)}\).

Lovász introduced the notion of a reduced basis, using a Gram-Schmidt orthogonal basis as a reference. The Gram-Schmidt procedure is as follows. Define \(g^1 := b^1\) and, recursively, for \(i = 2, \dots , p\), define \(g^i \in {\mathbb {R}}^d\) as the projection of \(b^i\) onto the orthogonal complement of the linear space spanned by \(b^1, \dots , b^{i-1}\). Formally, for \(i = 2, \dots , p\), \(g^i\) is defined by

$$\begin{aligned} g^i&:= b^i - \sum _{j=1}^{i-1} \mu _{ij} g^j, \qquad \text {where } \mu _{ij} := \frac{(b^i)^{\textsf{T}}g^j}{\Vert g^j\Vert ^2}&\qquad \forall j=1,\dots ,i-1. \end{aligned}$$
(19)

By construction, the Gram-Schmidt basis \(g^1 , \dots , g^p\) is an orthogonal basis of \({{\,\textrm{span}\,}}(\Lambda )\) with the property that, for \(i = 1,\dots ,p\), the linear spaces spanned by \(b^1,\dots , b^i\) and by \(g^1,\dots , g^i\) coincide. Moreover, we have \(\Vert b^i\Vert \ge \Vert g^i\Vert \) for \(i = 1,\dots , p\), and \(\Vert g^1\Vert \cdots \Vert g^p\Vert = \det (\Lambda )\).

A basis \(r^1, \dots ,r^p\) of the lattice \(\Lambda \) is said to be reduced if it satisfies the following two conditions:

$$\begin{aligned} |\mu _{ij}| \le \frac{1}{2}&\qquad \text {for } 1 \le j < i \le p \\ \Vert g^i + \mu _{i,i-1}g^{i-1}\Vert ^2 \ge \frac{3}{4} \Vert g^{i-1}\Vert ^2&\qquad \text {for } 2 \le i \le p, \end{aligned}$$

where \(g^1, \dots , g^p\) is the output of the Gram-Schmidt procedure when applied to \(r^1, \dots , r^p\). Lovász’ celebrated basis reduction algorithm yields a reduced basis, and it runs in polynomial time in the size of the original basis. If a basis \(r^1, \dots , r^p\) of \(\Lambda \) is reduced, then it is “nearly orthogonal”, in the sense that it satisfies

$$\begin{aligned} \Vert r^1\Vert \cdots \Vert r^p\Vert \le 2^{p(p-1)/4}\det (\Lambda ). \end{aligned}$$
(20)

See for example [2] for more details on lattices and reduced bases, or [14] for an exposition that does not consider only full rank lattices.
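The Gram-Schmidt procedure (19) and the two conditions defining a reduced basis translate directly into code. The following sketch checks whether a given basis is reduced; it does not implement the basis reduction algorithm itself, and all names are ours.

```python
import numpy as np

def gram_schmidt(B):
    """Gram-Schmidt vectors g^1,...,g^p of (19) for the columns of the
    d x p basis matrix B, together with the coefficients mu_ij."""
    d, p = B.shape
    G = np.array(B, dtype=float)
    mu = np.zeros((p, p))
    for i in range(p):
        for j in range(i):
            mu[i, j] = (B[:, i] @ G[:, j]) / (G[:, j] @ G[:, j])
            G[:, i] -= mu[i, j] * G[:, j]
    return G, mu

def is_reduced(B):
    """Check the size condition |mu_ij| <= 1/2 and the Lovasz condition."""
    _, p = B.shape
    G, mu = gram_schmidt(B)
    size_ok = all(abs(mu[i, j]) <= 0.5 for i in range(p) for j in range(i))
    lovasz_ok = all(
        np.sum((G[:, i] + mu[i, i - 1] * G[:, i - 1]) ** 2)
        >= 0.75 * np.sum(G[:, i - 1] ** 2)
        for i in range(1, p)
    )
    return size_ok and lovasz_ok
```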

In order to show our decomposition result for spherical form MIQP, we first prove the following Lenstra-type proposition.

Proposition 4

Let \(a \in {\mathbb {Q}}^d\), \(\delta \in {\mathbb {Q}}\) with \(\delta \ge 0\), and let \(\Lambda \) be a lattice of rank p and dimension \(d\) with basis matrix \(B\in {\mathbb {Q}}^{d\times p}\). There is a polynomial time algorithm which either finds a vector \({\bar{y}} \in {\mathcal {B}}(a,\delta ) \cap (\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp )\), or finds a vector \(v\in {{\,\textrm{span}\,}}(\Lambda )\) with \(v^{\textsf{T}}B\) integer such that \({{\,\textrm{width}\,}}_v({\mathcal {B}}(a,\delta )) \le p 2^{p(p-1)/4}\).

Proof

If \(p=0\), then the algorithm simply returns \({\bar{y}} = a\), thus we now assume \(p \ge 1\). The basis reduction algorithm gives in polynomial time a reduced basis \(r^1, \dots , r^p \in {\mathbb {Q}}^d\) of the lattice \(\Lambda \). Let \({\hat{r}}^1, \dots , {\hat{r}}^p \in {\mathbb {Q}}^d\) be obtained by reordering \(r^1, \dots , r^p\) so that the vector in the last position has maximum norm, and denote by \({\hat{R}} \in {\mathbb {Q}}^{d\times p}\) the corresponding basis matrix. Since \(B\) and \({\hat{R}}\) are basis matrices of the same lattice \(\Lambda \), it is well known that we can find in polynomial time a \(p \times p\) unimodular matrix U such that \(B= {\hat{R}}U\).

Let \(a_\Lambda := {{\,\textrm{proj}\,}}_{{{\,\textrm{span}\,}}(\Lambda )} a\in {\mathbb {Q}}^d\), let \(\lambda \in {\mathbb {Q}}^p\) be such that \({\hat{R}} \lambda = a_\Lambda \), and define \(y_\Lambda := {\hat{R}} \lfloor \lambda \rceil \in {\mathbb {Q}}^d\), where \(\lfloor \lambda \rceil = (\lfloor \lambda _1\rceil ,\dots ,\lfloor \lambda _p\rceil )\) and \(\lfloor \lambda _i\rceil \) denotes a nearest integer to \(\lambda _i\). Clearly, \(y_\Lambda \in \Lambda \). Consider first the case \(y_\Lambda \in {{\,\textrm{proj}\,}}_{{{\,\textrm{span}\,}}(\Lambda )}({\mathcal {B}}(a,\delta ))\). This implies that the vector \({\bar{y}} := (a+ {{\,\textrm{span}\,}}(\Lambda )) \cap (y_\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp ) \in {\mathbb {Q}}^d\) is in \({\mathcal {B}}(a,\delta )\). Since \(y_\Lambda \in \Lambda \), we have that \({\bar{y}} \in \Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp \). Therefore, in this case we are done. Hence, in the remainder of the proof we consider the case \(y_\Lambda \notin {{\,\textrm{proj}\,}}_{{{\,\textrm{span}\,}}(\Lambda )}({\mathcal {B}}(a,\delta ))\).
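The rounding step in this proof is easy to express with a least-squares solve, which computes exactly the projection of \(a\) onto \({{\,\textrm{span}\,}}(\Lambda )\) in the basis \({\hat{R}}\). A sketch with our own naming:

```python
import numpy as np

def babai_style_rounding(R_hat, a):
    """Compute a_Lambda = proj_{span(Lambda)} a and y_Lambda = R_hat round(lambda),
    where R_hat @ lambda = a_Lambda, as in the proof of Proposition 4."""
    lam, *_ = np.linalg.lstsq(R_hat, a, rcond=None)
    a_lattice = R_hat @ lam               # projection of a onto span(Lambda)
    y_lattice = R_hat @ np.rint(lam)      # nearby lattice point
    # The candidate bar{y} = a + y_Lambda - a_Lambda lies in B(a, delta)
    # if and only if ||y_Lambda - a_Lambda|| <= delta.
    return y_lattice, a_lattice
```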

Since \(B\) is a \(d\times p\) matrix of rank p, the matrix \(B^{\textsf{T}}B\) is an invertible \(p \times p\) symmetric matrix, thus we can define the \(p \times d\) matrix \(B^\dagger := (B^{\textsf{T}}B)^{-1} B^{\textsf{T}}\). The matrix \(B^\dagger \) is a left inverse of \(B\), i.e., \(B^\dagger B\) is the identity matrix \(I_p\). Let \(u\in {\mathbb {Z}}^{1 \times p}\) be the last row of U, and define the vector \(v:= (uB^\dagger )^{\textsf{T}}\in {\mathbb {Q}}^d\). We have that \(v\in {{\,\textrm{span}\,}}(\Lambda )\): for every vector \(t \in ({{\,\textrm{span}\,}}(\Lambda ))^\perp \) we have \(v^{\textsf{T}}t = uB^\dagger t = u(B^{\textsf{T}}B)^{-1} B^{\textsf{T}}t = 0,\) since each column of \(B\) lies in \({{\,\textrm{span}\,}}(\Lambda )\). Moreover, the vector \(v^{\textsf{T}}B\) is integer since \(v^{\textsf{T}}B= uB^\dagger B= uI_p = u.\) Hence, to complete the proof, we only need to show \({{\,\textrm{width}\,}}_v({\mathcal {B}}(a,\delta )) \le p 2^{p(p-1)/4}\).

The assumption \(y_\Lambda \notin {{\,\textrm{proj}\,}}_{{{\,\textrm{span}\,}}(\Lambda )}({\mathcal {B}}(a,\delta ))\) is equivalent to \(\Vert y_\Lambda - a_\Lambda \Vert > \delta \). Since \(y_\Lambda = {\hat{R}} \lfloor \lambda \rceil \) and \(a_\Lambda = {\hat{R}} \lambda \), we have

$$\begin{aligned} \Vert y_\Lambda - a_\Lambda \Vert&= \Vert {\hat{R}}(\lfloor \lambda \rceil - \lambda )\Vert = \left\Vert \sum _{i=1}^p (\lfloor \lambda _i\rceil - \lambda _i) {\hat{r}}^i\right\Vert \\&\le \sum _{i=1}^p \left|\lfloor \lambda _i\rceil - \lambda _i\right| \ \Vert {\hat{r}}^i\Vert \le p \Vert {\hat{r}}^p\Vert /2. \end{aligned}$$

We obtain that \(\Vert {\hat{r}}^p\Vert > 2 \delta /p\). Consider the Gram-Schmidt orthogonal basis \({\hat{g}}^1, \dots , {\hat{g}}^p \in {\mathbb {Q}}^d\) obtained from \({\hat{r}}^1,\dots ,{\hat{r}}^p\). Using (20) we have

$$\begin{aligned} \Vert {\hat{r}}^1\Vert \cdots \Vert {\hat{r}}^p\Vert = \Vert r^1\Vert \cdots \Vert r^p\Vert \le 2^{p(p-1)/4}\det (\Lambda ) = 2^{p(p-1)/4}\Vert {\hat{g}}^1\Vert \cdots \Vert {\hat{g}}^p\Vert . \end{aligned}$$

Moreover, as \(\Vert {\hat{r}}^i\Vert \ge \Vert {\hat{g}}^i\Vert \) for \(i = 1,\dots ,p-1\), it follows that \(\Vert {\hat{r}}^p\Vert \le 2^{p(p-1)/4}\Vert {\hat{g}}^p\Vert \). Since \(\Vert {\hat{r}}^p\Vert > 2 \delta / p\), we obtain

$$\begin{aligned} \Vert {\hat{g}}^p\Vert > \frac{2 \delta }{p2^{p(p-1)/4}}. \end{aligned}$$
(21)

We define the \(p \times d\) matrix \({\hat{R}}^\dagger := ({\hat{R}}^{\textsf{T}}{\hat{R}})^{-1} {\hat{R}}^{\textsf{T}}\), which is a left inverse of \({\hat{R}}\). Using \(B= {\hat{R}} U\), we obtain the relation

$$\begin{aligned} B^\dagger&= (B^{\textsf{T}}B)^{-1} B^{\textsf{T}}= (U^{\textsf{T}}{\hat{R}}^{\textsf{T}}{\hat{R}} U)^{-1} U^{\textsf{T}}{\hat{R}}^{\textsf{T}}\\&= U^{-1} ({\hat{R}}^{\textsf{T}}{\hat{R}})^{-1} U^{-{\textsf{T}}} U^{\textsf{T}}{\hat{R}}^{\textsf{T}}= U^{-1} ({\hat{R}}^{\textsf{T}}{\hat{R}})^{-1} {\hat{R}}^{\textsf{T}}= U^{-1} {\hat{R}}^\dagger . \end{aligned}$$

It is simple to check that \({{\,\textrm{width}\,}}_v({\mathcal {B}}(a,\delta )) = 2 \delta \Vert v\Vert \). If we denote by \({\hat{r}}\in {\mathbb {Q}}^{1 \times d}\) the last row of \({\hat{R}}^\dagger \), we have

$$\begin{aligned} {{\,\textrm{width}\,}}_v({\mathcal {B}}(a,\delta )) = 2 \delta \Vert v\Vert = 2 \delta \Vert (uB^\dagger )^{\textsf{T}}\Vert = 2 \delta \Vert (uU^{-1}{\hat{R}}^\dagger )^{\textsf{T}}\Vert = 2 \delta \Vert {\hat{r}}^{\textsf{T}}\Vert , \end{aligned}$$
(22)

where the last equality holds since \(u\) is the last row of U.

We now show that \({\hat{r}}^{\textsf{T}}= {\hat{g}}^p/\Vert {\hat{g}}^p\Vert ^2\). First, note that both \({\hat{r}}^{\textsf{T}}\) and \({\hat{g}}^p\) live in \({{\,\textrm{span}\,}}(\Lambda )\). For \({\hat{g}}^p\) this follows from the fact that \({\hat{g}}^1,\dots ,{\hat{g}}^p\) is a basis of \({{\,\textrm{span}\,}}(\Lambda )\). For \({\hat{r}}^{\textsf{T}}\), this follows since it is orthogonal to each vector \(t \in ({{\,\textrm{span}\,}}(\Lambda ))^\perp \): as \({\hat{r}}\) is the last row of \({\hat{R}}^\dagger \), we have \({\hat{R}}^\dagger t = ({\hat{R}}^{\textsf{T}}{\hat{R}})^{-1} {\hat{R}}^{\textsf{T}}t = 0\), since each column of \({\hat{R}}\) lies in \({{\,\textrm{span}\,}}(\Lambda )\). Since \({\hat{g}}^p\) is orthogonal to \({\hat{g}}^1,\dots , {\hat{g}}^{p-1}\), it follows from (19) that \(({\hat{g}}^p)^{\textsf{T}}{\hat{r}}^i = 0\) for \(i = 1,\dots ,p-1\) and \(({\hat{g}}^p)^{\textsf{T}}{\hat{r}}^p = \Vert {\hat{g}}^p\Vert ^2\). Since \({\hat{r}}\) is the last row of \({\hat{R}}^\dagger \), we have \({\hat{r}}{\hat{r}}^i = 0\) for \(i = 1,\dots ,p-1\) and \({\hat{r}}{\hat{r}}^p = 1\). Since \({\hat{r}}^{\textsf{T}}\) and \({\hat{g}}^p/\Vert {\hat{g}}^p\Vert ^2\) both lie in \({{\,\textrm{span}\,}}(\Lambda )\) and have the same inner products with the basis \({\hat{r}}^1,\dots ,{\hat{r}}^p\) of \({{\,\textrm{span}\,}}(\Lambda )\), we conclude that \({\hat{r}}^{\textsf{T}}= {\hat{g}}^p/\Vert {\hat{g}}^p\Vert ^2\).

Thus, by (22) and (21),

$$\begin{aligned} {{\,\textrm{width}\,}}_v({\mathcal {B}}(a,\delta )) = 2 \delta \Vert {\hat{r}}^{\textsf{T}}\Vert = \frac{2 \delta }{\Vert {\hat{g}}^p\Vert } \le p 2^{p(p-1)/4}. \end{aligned}$$

\(\square \)

We are now ready to give our decomposition result.

Proposition 5

There is a polynomial time algorithm which either finds two aligned vectors for (S-MIQP), or finds a vector \(v\in {{\,\textrm{span}\,}}(\Lambda )\) with \(v^{\textsf{T}}B\) integer such that \({{\,\textrm{width}\,}}_v({\mathcal {P}}) \le r_ds_p\), where \(s_p := 14 p 2^{p(p-1)/4}\).

Proof

Let \(a^+ := a+ \frac{3}{4} e^1 \in {\mathbb {Q}}^d\), where \(e^1\) denotes the first vector of the standard basis of \({\mathbb {R}}^d\). Since \(\Vert a^+ - a\Vert = 3/4\), it is simple to verify that

$$\begin{aligned} {\mathcal {B}}(a^+, 1/4) \subseteq {\mathcal {B}}(a,1) \subseteq {\mathcal {B}}(a^+, 7/4). \end{aligned}$$
(23)

Denote by \(B \in {\mathbb {Q}}^{d\times p}\) the given basis matrix of the lattice \(\Lambda \). We apply Proposition 4 to \({\mathcal {B}}(a^+, 1/4)\) and the lattice \(2 \Lambda \) with basis matrix \(2B\). Consider first the case where Proposition 4 finds a vector \(v\in {{\,\textrm{span}\,}}(\Lambda )\) with \(v^{\textsf{T}}(2B)\) integer such that \({{\,\textrm{width}\,}}_v({\mathcal {B}}(a^+,1/4)) \le p 2^{p(p-1)/4}\). We then set \(v' := 2v\) and note that \(v' \in {{\,\textrm{span}\,}}(\Lambda )\) with \({v'}^{\textsf{T}}B\) integer. Furthermore, it follows from (23) that

$$\begin{aligned} {{\,\textrm{width}\,}}_{v'}({\mathcal {B}}(a,1))&= 2 {{\,\textrm{width}\,}}_v({\mathcal {B}}(a,1)) \le 2 {{\,\textrm{width}\,}}_v({\mathcal {B}}(a^+, 7/4)) \\&= 14 {{\,\textrm{width}\,}}_v({\mathcal {B}}(a^+,1/4)) \le 14 p 2^{p(p-1)/4}= s_p. \end{aligned}$$

Using (9) we obtain

$$\begin{aligned} {{\,\textrm{width}\,}}_{v'}({\mathcal {P}})&= {{\,\textrm{width}\,}}_{v'}({{\,\textrm{proj}\,}}_y {\mathcal {P}}) \le {{\,\textrm{width}\,}}_{v'}({\mathcal {B}}(a,r_d)) \\&\le r_d{{\,\textrm{width}\,}}_{v'}({\mathcal {B}}(a,1)) \le r_ds_p. \end{aligned}$$

Hence the statement of the proposition holds. Therefore, we now assume that Proposition 4 finds a vector \(y^+ \in {\mathcal {B}}(a^+,1/4) \cap (2\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp )\). Clearly, (23) implies that \(y^+ \in {\mathcal {B}}(a,1)\).

Next, we define \(a^- := a- \frac{3}{4} e^1 \in {\mathbb {Q}}^d\), and we apply Proposition 4 to \({\mathcal {B}}(a^-, 1/4)\) and the lattice \(2 \Lambda \) with basis matrix \(2B\). Symmetrically, we can assume that Proposition 4 finds a vector \(y^- \in 2\Lambda +{{\,\textrm{span}\,}}(\Lambda )^\perp \) that is in \({\mathcal {B}}(a^-,1/4)\) and therefore in \({\mathcal {B}}(a,1)\).

To conclude the proof, we only need to show that the vectors \(y^+,y^-\) are aligned for (S-MIQP). Since \(y^+ \in {\mathcal {B}}(a^+,1/4)\) and \(y^- \in {\mathcal {B}}(a^-,1/4)\), we obtain \(y^+_1 - y^-_1 \ge (a_1 + 1/2) - (a_1 - 1/2) = 1.\) For a vector \(y \in {\mathbb {R}}^d\) we denote by \(y_{-1}\) the vector in \({\mathbb {R}}^{d-1}\) obtained by deleting the first component from y. Using the triangle inequality and the fact that \(a^+_{-1} = a^-_{-1} = a_{-1}\), we obtain

$$\begin{aligned} \sum _{i=2}^d(y^+_i - y^-_i)^2&= \Vert y^+_{-1} - y^-_{-1}\Vert ^2 \le (\Vert y^+_{-1} - a_{-1}\Vert + \Vert y^-_{-1} - a_{-1}\Vert )^2 \\&= (\Vert y^+_{-1} - a^+_{-1}\Vert + \Vert y^-_{-1} - a^-_{-1}\Vert )^2 \le (\Vert y^+ - a^+\Vert + \Vert y^- - a^-\Vert )^2 \\&\le (1/4 + 1/4)^2 = 1/4. \end{aligned}$$

Hence \(y^+,y^-\) are aligned for (S-MIQP). \(\square \)

6 Approximation algorithm

In this section, we present our approximation algorithm for (MIQP) and we prove Theorem 1. First, we present two lemmas that allow us to reduce the number of variables in MIQPs whose polyhedra are not full-dimensional. The arguments are direct extensions of those for pure integer linear programs (see, e.g., [2]). Proofs are given for completeness.

Lemma 6

Let \(a \in {\mathbb {Q}}^n \setminus \{0\}\), \(\beta \in {\mathbb {Q}}\), \(p \in \{0,\dots ,n\}\). There is a polynomial time algorithm that determines whether the set \({\mathcal {S}}:=\{x \in {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p} : a^{\textsf{T}}x = \beta \}\) is empty or not. If \({\mathcal {S}}\ne \emptyset \), the algorithm finds a vector \({\bar{x}} \in {\mathbb {Q}}^n\) and a matrix \(M \in {\mathbb {Q}}^{n\times (n-1)}\) such that

$$\begin{aligned} {\mathcal {S}}= \{{\bar{x}} + My : y \in {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p-1}\}&\qquad \hbox { if } a_i \ne 0 \hbox { for some } i \in \{p+1, \dots ,n\} \\ {\mathcal {S}}= \{{\bar{x}} + My : y \in {\mathbb {Z}}^{p-1} \times {\mathbb {R}}^{n-p}\}&\qquad \text { if }a_i = 0 \text { for all }i \in \{p+1, \dots ,n\}. \end{aligned}$$

Proof

First, consider the case \(a_i \ne 0\) for some \(i \in \{p+1, \dots ,n\}\). We can then rewrite \(a^{\textsf{T}}x = \beta \) in the form \(x_i = (\beta - \sum _{j \in \{1,\dots ,n\} \setminus \{i\}} a_j x_j)/a_i\). Since \(x_i\) is a continuous variable, we obtain that \({\mathcal {S}}\) is nonempty. We define the vector \({\bar{x}} \in {\mathbb {Q}}^n\) with entry \(\bar{x}_i:=\beta /a_i\) and all other entries zero. We also define the matrix \(M \in {\mathbb {Q}}^{n \times (n-1)}\) obtained from the \(n \times n\) identity matrix by replacing the ith row with the horizontal vector \(-a^{\textsf{T}}/a_i\) and deleting column i. With these definitions of \({\bar{x}}\) and M, we obtain

$$\begin{aligned} {\mathcal {S}}= \{{\bar{x}} + My : y \in {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p-1}\}. \end{aligned}$$
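The construction of \({\bar{x}}\) and M in this case is purely mechanical; a minimal sketch, with 0-based indices and all names ours:

```python
import numpy as np

def eliminate_continuous_variable(a, beta, i):
    """Solve a^T x = beta for the continuous variable x_i (a_i != 0),
    returning x_bar and M with S = {x_bar + M y}."""
    n = len(a)
    x_bar = np.zeros(n)
    x_bar[i] = beta / a[i]
    M = np.eye(n)
    M[i, :] = -np.asarray(a, dtype=float) / a[i]  # replace row i by -a^T / a_i
    M = np.delete(M, i, axis=1)                   # delete column i
    return x_bar, M
```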

Next, consider the case \(a_i = 0\) for all \(i \in \{p+1, \dots ,n\}\). Possibly by multiplying the equation \(a^{\textsf{T}}x = \beta \) by the least common multiple of the denominators of the entries of a, we may assume that a is an integral vector. Possibly by dividing the equation \(a^{\textsf{T}}x = \beta \) by the greatest common divisor of the entries of a, we may assume that a has relatively prime entries. If \(\beta \notin {\mathbb {Z}}\), then \({\mathcal {S}}\) is empty and we are done. Thus, we now assume \(\beta \in {\mathbb {Z}}\). Since \(a_1,\dots ,a_p\) are relatively prime, by Corollary 1.9 in [2], the equation \(\sum _{j=1}^p a_j x_j = \beta \) has an integral solution \({\tilde{x}} \in {\mathbb {Z}}^p\), thus \({\mathcal {S}}\) is nonempty. Furthermore, there exists a unimodular matrix \(U \in {\mathbb {Z}}^{p \times p}\) such that \({\tilde{a}}^{\textsf{T}}U = {e_1}^{\textsf{T}}\), where \({\tilde{a}}\) is the vector of the first p coordinates of a, and \(e_1\) denotes the first unit vector in \({\mathbb {R}}^p\). From the proof of Corollary 1.9 in [2], both \({\tilde{x}}\) and U can be computed in polynomial time. If we let \(N \in {\mathbb {Z}}^{p \times (p-1)}\) be the matrix formed by the last \(p-1\) columns of U, we have

$$\begin{aligned} \left\{ x \in {\mathbb {Z}}^p : \sum _{j=1}^p a_j x_j= \beta \right\} = \{{\tilde{x}} + Ny : y \in {\mathbb {Z}}^{p-1}\}. \end{aligned}$$

We define the vector \({\bar{x}} \in {\mathbb {Z}}^n\) by \({\bar{x}}_j := {\tilde{x}}_j\) for \(j \in \{1,\dots ,p\}\) and \({\bar{x}}_j := 0\) for \(j \in \{p+1,\dots ,n\}\). We also define the matrix \(M \in {\mathbb {Q}}^{n \times (n-1)}\) whose block formed by the first p rows and first \(p-1\) columns equals N, whose block formed by the last \(n-p\) rows and last \(n-p\) columns equals the identity matrix \(I_{n-p}\), and whose remaining entries are zero. Since \(a_i = 0\) for all \(i \in \{p+1, \dots ,n\}\), we conclude

$$\begin{aligned} {\mathcal {S}}= \{{\bar{x}} + My : y \in {\mathbb {Z}}^{p-1} \times {\mathbb {R}}^{n-p}\}. \end{aligned}$$

\(\square \)

Lemma 7

Consider an instance of (MIQP) with a nonempty feasible region. There is a polynomial time algorithm that determines whether \(\{x \in {\mathbb {R}}^n : Wx \le w \}\) is full-dimensional. If not, it rewrites the instance as an instance of (MIQP) with one fewer variable.

Proof

It is well-known [2] that there is a polynomial time algorithm that determines whether \(\{x \in {\mathbb {R}}^n : Wx \le w \}\) is full-dimensional, and if not, finds a rational hyperplane \(\{x \in {\mathbb {R}}^n : a^{\textsf{T}}x = \beta \}\) that contains it. In the latter case, we let \({\bar{x}} \in {\mathbb {Q}}^n\) and \(M \in {\mathbb {Q}}^{n\times (n-1)}\) be as in Lemma 6, and we define \(H' := M^{\textsf{T}}H M\), \(h' := 2M^{\textsf{T}}H^{\textsf{T}}{\bar{x}} + M^{\textsf{T}}h\), \(c := \bar{x}^{\textsf{T}}H {\bar{x}} + h^{\textsf{T}}{\bar{x}}\), \(W' := WM\), \(w' := w - W {\bar{x}}\). By Lemma 6, our instance of (MIQP) can be rewritten as

$$\begin{aligned} \min&\quad x^{\textsf{T}}H' x + h'^{\textsf{T}}x + c \\ {{\,\mathrm{s.t.}\,}}&\quad W'x \le w' \\&\quad x \in \Lambda , \end{aligned}$$

where

$$\begin{aligned} \Lambda := {\left\{ \begin{array}{ll} {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p-1} &{}\qquad \hbox { if } a_i \ne 0 \hbox { for some } i \in \{p+1, \dots ,n\} \\ {\mathbb {Z}}^{p-1} \times {\mathbb {R}}^{n-p} &{}\qquad \text { if }a_i = 0 \text { for all }i \in \{p+1, \dots ,n\}. \end{array}\right. } \end{aligned}$$

\(\square \)
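The substitution \(x = {\bar{x}} + My\) in the proof above only involves a few matrix products; a minimal sketch of the data transformation, mirroring the formulas of Lemma 7:

```python
import numpy as np

def restrict_instance(H, h, W, w, x_bar, M):
    """Data (H', h', c, W', w') of the reduced instance in Lemma 7."""
    H_new = M.T @ H @ M
    h_new = 2 * M.T @ H @ x_bar + M.T @ h   # H is symmetric, so H^T x_bar = H x_bar
    c = x_bar @ H @ x_bar + h @ x_bar       # constant offset of the objective
    W_new = W @ M
    w_new = w - W @ x_bar
    return H_new, h_new, c, W_new, w_new
```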

6.1 Description of the approximation algorithm

We are now in a position to present our approximation algorithm for (MIQP). We will make use of Proposition 2, Proposition 3, Proposition 5, and Lemma 7.

The input of the algorithm consists of an instance of a bounded MIQP. Theorem 4 in [10] implies that, if there is an optimal solution, there is one of size bounded by an integer \(\psi \), which is polynomial in the size of the input MIQP. Therefore, we obtain an equivalent MIQP instance by restricting each variable to the segment \([-2^\psi , 2^\psi ]\). The size of the latter instance is polynomial in the size of the former. Furthermore, it is simple to check that an \(\epsilon \)-approximate solution to the latter instance is also an \(\epsilon \)-approximate solution to the former, for every \(\epsilon \in [0,1]\). Therefore, we can now assume that our input MIQP has a bounded feasible region.

We initialize the set \({\mathscr {I}}\) of MIQP instances to be solved as a set containing only our input MIQP, and the set of possible approximate solutions as \({\mathscr {A}}:= \emptyset \). Throughout the algorithm, each instance in \({\mathscr {I}}\) will be our input MIQP with a number of additional linear equality constraints. On the other hand, the set \({\mathscr {A}}\) will contain a number of feasible solutions to the input MIQP.

Step 1: Output, feasibility, full-dimensionality, and linear case.

Output. If \({\mathscr {I}}= \emptyset \), then we return the solution in \({\mathscr {A}}\) with the minimum objective function value if \({\mathscr {A}}\ne \emptyset \), and we return “infeasible” if \({\mathscr {A}}= \emptyset \). Otherwise, \({\mathscr {I}}\ne \emptyset \); we choose a MIQP instance in \({\mathscr {I}}\) and remove it from \({\mathscr {I}}\). Without loss of generality, the chosen MIQP instance is (MIQP).

Feasibility. Using Lenstra’s algorithm [25], we check if the feasible region \(\{x \in {\mathbb {Z}}^p \times {\mathbb {R}}^{n-p} : Wx \le w\}\) of (MIQP) is empty. If so, we go back to Step 1. Otherwise, (MIQP) is feasible and we continue.

Full-dimensionality. We apply Lemma 7 recursively until the polyhedron describing the feasible region is full-dimensional. For ease of notation, we denote the obtained instance again by (MIQP), and we now assume that \(\{x \in {\mathbb {R}}^n : Wx \le w\}\) is full-dimensional.

Linear case. Let k be the rank of the matrix H. If \(k=0\), (MIQP) is a MILP, and we find an optimal solution using Lenstra’s algorithm. We construct the corresponding feasible solution to the input MIQP by inverting the linear transformation just performed in “Full-dimensionality”, and we add it to \({\mathscr {A}}\). Otherwise, we have \(k \ge 1\) and we continue.

Step 2: Reduction to spherical form.

By Proposition 2, we perform a change of basis that transforms (MIQP) into spherical form (S-MIQP), where \(d\) satisfies \(d\le k+p\), the rank of the matrix D is k, and \(r_d\) in (9) is the ceiling of \(2 d^{3/2} \lceil (5d)^{d/2}\rceil ^2\).

Let \(B \in {\mathbb {Q}}^{d\times p}\) be the obtained basis matrix of the lattice \(\Lambda \). By Proposition 5, we either find two aligned vectors \(y^+,y^-\) for (S-MIQP), or we find a vector \(v\in {{\,\textrm{span}\,}}(\Lambda )\) with \(v^{\textsf{T}}B\) integer such that \({{\,\textrm{width}\,}}_v({\mathcal {P}}) \le r_ds_p\), where \(s_p = 14 p 2^{p(p-1)/4}\). In the first case, we continue with Step 3; in the second case, we go to Step 4.

Step 3: Mesh partition and linear underestimators.

By Proposition 3 we obtain an \(\epsilon \)-approximate solution \((y^\diamond ,z^\diamond )\) to (S-MIQP). This requires solving, with Lenstra’s algorithm, at most \(\left\lceil 4 r_{d} \sqrt{k/(3 \epsilon )}\right\rceil ^k\) MILPs of the same size as (S-MIQP) and with p integer variables. We construct the corresponding \(\epsilon \)-approximate solution \(x^\diamond \) to the (MIQP) chosen at the beginning of this iteration of the algorithm by inverting the linear transformations in Step 2 and in Step 1, and we add it to \({\mathscr {A}}\). Then, we go back to Step 1.

Step 4: Decomposition.

Since \({{\,\textrm{width}\,}}_v({\mathcal {P}}) \le r_ds_p\), each point \((y,z) \in {\mathcal {P}}\) with \(y \in \Lambda + {{\,\textrm{span}\,}}(\Lambda )^\perp \) is contained in one of the following polytopes:

$$\begin{aligned} {\mathcal {P}}_t := \{(y,z) \in {\mathcal {P}}: v^{\textsf{T}}y = t\}&\qquad \text {for } t= \lceil \mu \rceil , \dots , \lfloor \mu + r_ds_p\rfloor , \end{aligned}$$

where \(\mu := \min \{v^{\textsf{T}}y : (y,z) \in {\mathcal {P}}\}\).

For each \(t= \lceil \mu \rceil , \dots , \lfloor \mu + r_ds_p\rfloor \), we consider the instance obtained from (S-MIQP) by replacing the polytope \({\mathcal {P}}\) with \({\mathcal {P}}_t\), and we add to \({\mathscr {I}}\) the MIQP obtained by inverting the linear transformations in Step 2 and in Step 1. Note that the instances that we just added to \({\mathscr {I}}\) differ from the one chosen at the beginning of this iteration of the algorithm only by the additional constraint obtained from \(v^{\textsf{T}}y = t\) by inverting the two linear transformations. Finally, we go back to Step 1.
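The control flow of Steps 1-4 can be summarized as a worklist loop. In the sketch below, the helper names bundled in ops are placeholders for the subroutines described above (Lenstra’s algorithm, Lemma 7, Propositions 2, 5, and 3, and the decomposition of Step 4), not an actual implementation; mapping the solutions back through the linear transformations of Steps 1 and 2 is assumed to happen inside the helpers.

```python
def approximate_miqp(instance, eps, ops):
    """Worklist version of the algorithm in Sect. 6.1."""
    worklist, candidates = [instance], []
    while worklist:                                  # Step 1: selection
        inst = worklist.pop()
        if not ops.is_feasible(inst):                # Lenstra's algorithm
            continue
        inst = ops.make_full_dimensional(inst)       # Lemma 7, recursively
        if ops.objective_rank(inst) == 0:            # linear case: a MILP
            candidates.append(ops.solve_milp(inst))
            continue
        sph = ops.to_spherical_form(inst)            # Step 2: Proposition 2
        found, direction = ops.aligned_or_thin(sph)  # Proposition 5
        if found:                                    # Step 3: Proposition 3
            candidates.append(ops.mesh_partition_solve(sph, eps))
        else:                                        # Step 4: decomposition
            worklist.extend(ops.decompose_along(sph, direction))
    if not candidates:
        return "infeasible"
    return min(candidates, key=ops.objective_value)
```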

6.2 Analysis of the algorithm

First, we show that the algorithm described in Sect. 6.1 is correct.

Claim 3

The algorithm in Sect. 6.1 returns an \(\epsilon \)-approximate solution, if it exists.

Proof

Clearly, if the input MIQP is infeasible, the algorithm correctly detects it in Step 1, thus we now assume that it is feasible. In this case, we need to show that the algorithm returns an \(\epsilon \)-approximate solution to the input MIQP. To prove this, we only need to show that the algorithm eventually adds to the set \({\mathscr {A}}\) an \(\epsilon \)-approximate solution \(x^\epsilon \) to the input MIQP. In fact, the vector returned at the end of the algorithm has objective value at most that of \(x^\epsilon \), and so it is an \(\epsilon \)-approximate solution to the input MIQP as well.

Let \(x^* \in {\mathbb {R}}^n\) be an optimal solution to the input MIQP. Let MIQP\(^*\) be an instance stored at some point in \({\mathscr {I}}\) whose feasible region contains the vector \(x^*\). Among all these possible instances, we assume that MIQP\(^*\), after the “Full-dimensionality” transformation in Step 1, has a minimal number of integer variables. Note that MIQP\(^*\) does not get decomposed in Step 4. Otherwise, the vector \(x^*\) would be feasible for one of the instances generated in Step 4 from MIQP\(^*\), which after the “Full-dimensionality” transformation will have fewer integer variables than MIQP\(^*\), contradicting the minimality in our choice of MIQP\(^*\). Hence, when the algorithm selects MIQP\(^*\) from \({\mathscr {I}}\), it performs Step 3 of the algorithm, and so by Proposition 3 it adds to \({\mathscr {A}}\) a vector \(x^\epsilon \) that is an \(\epsilon \)-approximate solution to MIQP\(^*\). Since the feasible region of MIQP\(^*\) is contained in the feasible region of the input MIQP, and since the vector \(x^*\) is feasible for MIQP\(^*\), it is simple to check that \(x^\epsilon \) is an \(\epsilon \)-approximate solution to the input MIQP. \(\square \)

We complete the proof of Theorem 1 by showing that the running time of the algorithm matches the one stated in Theorem 1.

Claim 4

The running time of the algorithm in Sect. 6.1 is polynomial in the size of the input and in \(1/\epsilon \), provided that the rank k of the matrix H and the number of integer variables p are fixed numbers.

Proof

First, we show that the algorithm performs at most \((r_{k+p} s_p +1)^{p+1}\) iterations, which is a fixed number if both k and p are fixed. Note that the number of iterations coincides with the total number of instances that are stored in \({\mathscr {I}}\) throughout the execution of the algorithm. Instances are added to \({\mathscr {I}}\) only in Step 4, where the MIQP chosen in that iteration gets replaced in \({\mathscr {I}}\) with at most \(r_{k'+p'} s_{p'} +1\) new instances. Here, \(k'\) denotes the rank of the quadratic objective and \(p'\) denotes the number of integer variables of the chosen instance after the “Full-dimensionality” transformation in Step 1. In the new instances added to \({\mathscr {I}}\), the rank of the quadratic objective is at most \(k'\), and the number of integer variables is at most \(p'-1\). In particular, this implies that for every chosen instance we have \(k' \le k\) and \(p' \le p\). Finally, note that Step 4 may be triggered only if \(p' \ge 1\). Therefore, the total number of MIQPs that are eventually stored in \({\mathscr {I}}\) is at most \(\sum _{j=0}^p (r_{k+p} s_p +1)^{j} \le (r_{k+p} s_p +1)^{p+1}.\)

It is simple to check that each instance constructed by the algorithm and each number generated has size polynomial in the size of the input MIQP. Thus, to conclude the proof we only need to analyze the running time of a single iteration of the algorithm. Each MILP encountered (in Step 1 and Step 3) has at most p integer variables. Since p is fixed, they can be solved with Lenstra’s algorithm [25] in time polynomial in the size of the input MIQP. Step 1 of the algorithm can then be performed in time polynomial in the size of the input MIQP. By Proposition 2 and Proposition 5, also Step 2 can be performed in time polynomial in the size of the input MIQP. In Step 3, the algorithm solves at most \(\left\lceil 4 r_{k+p} \sqrt{k/(3 \epsilon )}\right\rceil ^k\) MILPs with at most p integer variables. Since k and p are fixed, this number is polynomial in \(1/\epsilon \). Therefore, Step 3 of the algorithm can be performed in time polynomial in the size of the input MIQP and in \(1/\epsilon \). Step 4 only solves one linear program to find \(\mu \) and stores at most \(r_{k+p} s_p +1\) MIQPs, which is a fixed number if both k and p are fixed. \(\square \)