1 Introduction

We consider a linearly constrained quadratic optimization problem (QOP) with complementarity constraints:

$$\begin{aligned} \text{ minimize } \left\{ \varvec{u}^T \varvec{Q}\varvec{u}+ 2 \varvec{c}^T \varvec{u}\left| \begin{array}{l} \varvec{u}\in \mathbb {R}^m_+, \; \varvec{A}\varvec{u}+ \varvec{b}= 0, \\ u_iu_j = 0 \; ((i,j) \in \mathcal{{E}}) \end{array} \right. \right\} \end{aligned}$$
(1)

where \(\varvec{A}\in \mathbb {R}^{q\times m}\), \(\varvec{b}\in \mathbb {R}^q\), \(\varvec{c}\in \mathbb {R}^m\), \(\varvec{Q}\in \mathbb {S}^m\), and \(\mathcal{{E}}\subset \left\{ (i,j) : 1 \le i < j \le m \right\} \) are given data. Note that the binary constraint \(u_i(1-u_i) = 0\) can be converted to a complementarity constraint \(u_iv_i = 0\) with a slack variable \(v_i = 1-u_i \ge 0\); thus QOP (1) can model nonconvex quadratic problems with linear, binary and complementarity constraints. We assume that the linear constraint set \(\left\{ \varvec{u}\in \mathbb {R}^m_+ : \varvec{A}\varvec{u}+ \varvec{b}= \mathbf{0} \right\} \) is bounded. The QOP model (1) satisfying this assumption includes various combinatorial optimization problems, for instance, the binary integer quadratic problem, the maximum stable set problem, the quadratic multiple knapsack problem, and the quadratic assignment problem [10, 26, 28].

The completely positive programming (CPP) relaxation of linearly constrained QOPs in binary and continuous variables proposed by Burer [9] has gained considerable attention since 2009. His work was extended by Eichfelder and Povh [13, 16] and by Arima et al. [1], who established theoretical results for more general classes of QOPs. They showed that the exact optimal values of QOPs in their classes coincide with the optimal values of the corresponding CPP relaxation problems.

Such CPP relaxations are numerically intractable since determining whether a given matrix lies in the completely positive cone or the copositive cone is an NP-complete or co-NP-complete problem, respectively, as shown in [24]. Replacing the completely positive cone by the doubly nonnegative (DNN) cone and solving the resulting problem by a primal–dual interior-point method has been a popular approach [20, 36]. The computational cost of this approach, however, is very expensive as the number of nonnegativity constraints on the variables grows rapidly with the size of the problem.

Another approach to approximately solving the CPP relaxation of (1) was developed in [8] via adaptive polyhedral inner and outer approximations of the copositive cone. Impressive numerical results were reported: for certain randomly generated nonconvex QOPs over the standard simplex, the adaptive linear approximation method solved problems with \(n=2000\) in about 30 seconds on average. However, the paper [8] also reported that the linear approximation method was not able to generate good bounds (even given substantial computing time) for more realistic QOPs such as those arising from quadratic assignment problems or maximum stable set problems.

Recently, Arima et al. [2] introduced the simplified Lagrangian–CPP relaxation of a linearly constrained QOP in continuous and binary variables with a single parameter \(\lambda \in \mathbb {R}\). It was derived by reducing the original QOP to an equivalent QOP with a single quadratic equality constraint in nonnegative variables, and applying the Lagrangian relaxation to the resulting QOP. As a result, an unconstrained QOP in nonnegative variables with a Lagrangian multiplier \(\lambda \in \mathbb {R}\) was obtained. From the computational point of view, this Lagrangian relaxation is one of the simplest forms to handle with a solution method. Applying the CPP relaxation to the unconstrained QOP with \(\lambda \) led to the Lagrangian–CPP relaxation. It was shown in [2] that the optimal values of the Lagrangian relaxation as well as its CPP relaxation monotonically converge to the exact optimal value of the original QOP as \(\lambda \) tends to \(\infty \).

The main goals of this paper are twofold. First, we propose an efficient and effective numerical method for the QOP model (1) by extending the framework of the Lagrangian–CPP relaxation to one that includes the Lagrangian–DNN relaxation of (1). Second, generalizing the brief discussion on such an extension in [2], we present a theoretical framework for the Lagrangian–conic relaxation of (1) that covers both the Lagrangian–CPP and Lagrangian–DNN relaxations. We use the primal–dual pair of the Lagrangian–DNN relaxation with a sufficiently large \(\lambda \in \mathbb {R}\) to compute a tight lower bound for the optimal value of (1). The main features of the Lagrangian–DNN relaxation are:

  • The primal is an unconstrained DNN problem with a matrix variable whose upper-left corner element is fixed to \(1\). Thus, its dual becomes a simple problem with just a single variable.

  • The primal DNN problem is strictly feasible, i.e., the primal feasible region intersects with the interior of the DNN cone.

  • A common optimal objective value, shared by the primal–dual pair with a parameter \(\lambda > 0\), monotonically converges to the optimal objective value of the DNN relaxation of (1). Hence, a lower bound with almost the same quality as the one obtained from the DNN relaxation of (1) can be computed for the optimal objective value of (1) via the simple Lagrangian–DNN relaxation with a sufficiently large \(\lambda \in \mathbb {R}\).

The computational efficiency of solving the Lagrangian–DNN relaxation can be expected from the first and second features mentioned above. If a primal–dual interior-point method [7, 18, 31, 32] for SDPs is used to solve the Lagrangian–DNN relaxation, then the inequality constraints induced from the nonnegativity of all elements of the DNN matrix variable may incur a prohibitive computational burden. More precisely, if the size of the DNN matrix is \(n\), then the number of inequalities to be added to the SDP problem amounts to \(n(n-1)/2\). Thus, the conversion of a DNN problem into a standard SDP is computationally inefficient when \(n\) is large. To avoid such inefficiency of using a primal–dual interior-point method, the numerical method proposed in this paper employs first-order algorithms [4] for the Lagrangian–DNN relaxation without converting it into a standard SDP. The first-order algorithms in our approach are used to compute the projection onto the intersection of two cones, the SDP and nonnegative matrix cones. The first and second features mentioned above considerably increase the efficiency and the numerical stability, respectively, of our first-order algorithms. We should mention that for solving linear SDPs with doubly nonnegative cone constraints, Wen et al. [34] have applied a direct extension of the alternating direction method of multipliers (ADMM) to solve the dual problems. However, it has been shown recently in [11] that such a direct extension of the ADMM may not necessarily be convergent.

The numerical results in Sect. 5 show that the Lagrangian–DNN relaxation provides tighter lower bounds more efficiently than the DNN relaxation of the QOP (1). When the proposed method is tested on problems such as the binary integer quadratic problem, the maximum stable set problem, the quadratic multiple knapsack problem, and the quadratic assignment problem, the lower bounds it obtains are tight when compared with the known optimal values. As mentioned in the third main feature, a sufficiently large \(\lambda \) results in a tight lower bound. The proposed method also solves the problems efficiently; in particular, it obtains the lower bound for the quadratic assignment problem much faster than SDPNAL, an advanced large-scale SDP solver [37], applied to the DNN relaxation of the problem.

In Sect. 2, we list the notation and symbols used throughout the paper, and extend the QOP (1) to a general QOP model from which effective conic and Lagrangian–conic relaxations are derived. In Sect. 3, the conic and Lagrangian–conic relaxations and their relations are discussed. In particular, the Lagrangian–DNN relaxation, a special case of the Lagrangian–conic relaxations, is used to obtain a tight lower bound for the optimal value of the general QOP. In Sect. 4, we present a bisection method together with the proximal alternating direction multiplier method [17] and the accelerated proximal gradient method [4] for solving the Lagrangian–DNN relaxation of the general QOP. Numerical results on the binary integer quadratic problem, the maximum stable set problem, the quadratic multiple knapsack problem, and the quadratic assignment problem are presented in Sect. 5. Finally, we conclude in Sect. 6.

2 Preliminaries

2.1 Notation and symbols

Let \(\mathbb {R}\) denote the set of real numbers, \(\mathbb {R}_+\) the set of nonnegative real numbers, \(\mathbb {S}^n\) the space of \(n \times n\) symmetric matrices, and \(\mathbb {S}^n_+\) the cone of \(n \times n\) symmetric positive semidefinite matrices. We let \(\mathbb {C}= \left\{ \varvec{A}\in \mathbb {S}^n : \varvec{x}^T\varvec{A}\varvec{x}\ge 0 \ \hbox {for all } \varvec{x}\in \mathbb {R}^n_+ \right\} \) (the copositive cone), \(\varvec{\Gamma }= \left\{ \varvec{x}\varvec{x}^T : \varvec{x}\in \mathbb {R}^n_+ \right\} \), \(\mathbb {C}^*\) the convex hull of \(\varvec{\Gamma }\) (the completely positive cone, the dual of \(\mathbb {C}\)), and \(\mathbb {N}\) the cone of \(n \times n\) symmetric matrices with nonnegative elements. For every \(\varvec{Y}, \varvec{Z}\in \mathbb {S}^n\), \(\varvec{Y}\bullet \varvec{Z}\) denotes the trace of \(\varvec{Y}\varvec{Z}\).

The following relations for \(\varvec{\Gamma }, \ \mathbb {S}^n_+\), \(\mathbb {C}\), \(\mathbb {C}^*\) and \(\mathbb {N}\) are well-known:

$$\begin{aligned}&\varvec{\Gamma }\subset \mathbb {C}^* \subset \mathbb {S}^n_+ \cap \mathbb {N}\subset \mathbb {S}^n_+ \subset \mathbb {S}^n_+ + \mathbb {N}\subset \mathbb {C}, \ \varvec{\Gamma }= \left\{ \varvec{X}\in \mathbb {S}^n_+ \cap \mathbb {N}: \text{ rank }(\varvec{X}) = 1 \right\} , \\&\mathbb {S}^n_+ \cap \mathbb {N}= \left( \mathbb {S}^n_+ + \mathbb {N}\right) ^* \ \end{aligned}$$

We call \(\mathbb {S}^n_+ \cap \mathbb {N}\) the doubly nonnegative cone.

For \(\varvec{x}\in \mathbb {R}^n\), \(\varvec{x}^T\) denotes the transpose of \(\varvec{x}\). We use the notation \((\varvec{t},\varvec{u}) \in \mathbb {R}^{\ell +m}\) for the \((\ell +m)\)-dimensional column vector consisting of \(\varvec{t}\in \mathbb {R}^{\ell }\) and \(\varvec{u}\in \mathbb {R}^m\). In the subsequent discussions, the quadratic form \(\varvec{x}^T \varvec{Q}\varvec{x}\) associated with a matrix \(\varvec{Q}\in \mathbb {S}^n\) is represented as \(\varvec{Q}\bullet \varvec{x}\varvec{x}^T\) to suggest that \(\varvec{Q}\bullet \varvec{x}\varvec{x}^T\) with \(\varvec{x}\in \mathbb {R}^n_+\) is relaxed to \(\varvec{Q}\bullet \varvec{X}\) with \(\varvec{X}\in \mathbb {C}^*\), \(\varvec{X}\in \mathbb {S}^n_+ \cap \mathbb {N}\) or \(\varvec{X}\in \mathbb {S}^n_+\).
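The identity \(\varvec{x}^T \varvec{Q}\varvec{x}= \varvec{Q}\bullet \varvec{x}\varvec{x}^T\) underlying this lifting is easy to verify numerically; the following minimal numpy sketch (ours, not part of the paper) checks it on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
Q = rng.standard_normal((n, n))
Q = (Q + Q.T) / 2          # a matrix Q in S^n
x = rng.random(n)          # a nonnegative vector x in R^n_+

# x^T Q x coincides with the trace inner product Q • (x x^T),
# which is the quantity relaxed to Q • X with X in a cone.
assert np.isclose(x @ Q @ x, np.trace(Q @ np.outer(x, x)))
```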

2.2 A quadratic optimization model

We introduce a QOP of the form

$$\begin{aligned} \zeta \,{:=}\, \inf \left\{ \varvec{Q}_0 \bullet \varvec{x}\varvec{x}^T \left| \begin{array}{l} \varvec{x}\in \mathbb {R}^n_+, \ \varvec{H}_0 \bullet \varvec{x}\varvec{x}^T = 1, \\ \varvec{Q}_p \bullet \varvec{x}\varvec{x}^T = 0 \ (p=1,2,\ldots ,q) \end{array} \right. \right\} , \end{aligned}$$
(2)

where \(\varvec{H}_0 \in \mathbb {S}^n\) and \(\varvec{Q}_p \in \mathbb {S}^n\) \((p=0,1,\ldots ,q)\), to describe a class of conic relaxations and their further Lagrangian relaxations in Sect. 3. This form of QOP was introduced in [1] to establish an equivalence to its CPP relaxation under a set of conditions. If we are concerned with only the CPP relaxation among the conic relaxations, the basic theoretical results presented in Sect. 3 can be obtained from [1, 2]. Our main emphasis here is on extending their framework to a larger class of conic and Lagrangian–conic relaxations. In particular, the class includes a Lagrangian–DNN relaxation of QOP (2), which is shown to work very effectively and efficiently for large-scale QOPs in Sect. 5 with the first-order methods described in Sect. 4.

We assume the following three conditions throughout the paper. These conditions are stronger than the ones assumed in [1, 2], because the framework includes not only the CPP relaxation but also the DNN relaxation.

Condition (a) The feasible region

$$\begin{aligned} \left\{ \varvec{x}\in \mathbb {R}^n_+ : \varvec{H}_0 \bullet \varvec{x}\varvec{x}^T = 1, \ \varvec{Q}_p \bullet \varvec{x}\varvec{x}^T = 0 \ (p=1,2,\ldots ,q) \right\} \text{ is } \text{ nonempty. } \end{aligned}$$
(3)

Condition (b) \(\varvec{O}\not = \varvec{H}_0 \in \mathbb {S}^n_+ + \mathbb {N}\) and \(\varvec{O}\not = \varvec{Q}_p \in \mathbb {S}^n_+ + \mathbb {N}\) \((p=1,2\ldots ,q)\).

Condition (c) \(\varvec{D}= \varvec{O}\) if \(\varvec{D}\in \mathbb {S}^n_+ \cap \mathbb {N}\), \(\varvec{H}_0 \bullet \varvec{D}= 0\) and \(\varvec{Q}_p \bullet \varvec{D}= 0\) \((p=1,2,\ldots ,q)\).

Notice that if \(\varvec{H}_0 = \varvec{O}\) then QOP (2) is infeasible, and if \(\varvec{Q}_p = \varvec{O}\) for some \(p\) then the redundant constraint \(\varvec{Q}_p \bullet \varvec{x}\varvec{x}^T = 0\) can be eliminated. Thus, \(\varvec{H}_0 \not = \varvec{O}\) and \(\varvec{Q}_p \not = \varvec{O}\) \((p=1,2,\ldots ,q)\) in Condition (b) can be assumed without loss of generality. We note that Condition (c) together with Condition (a) implies that the feasible region is nonempty and bounded. For the proof of this fact, see (i) of Lemma 2 and its proof. Hence “inf” in (2) can be replaced by “min”. We let \(\varvec{x}^*\) be an optimal solution of QOP (2) with the finite optimal value \(\zeta \).

We can easily transform QOP (1) with linear and complementarity constraints to a QOP of the form (2) satisfying Conditions (a), (b) and (c) as follows. Define

$$\begin{aligned} \varvec{x}&= (x_1,\varvec{u}) \in \mathbb {R}^{n} \ \hbox {with } n = 1+m, \\ \varvec{Q}_0&= \left( \begin{array}{cc} 0 &{}\quad \varvec{c}^T \\ \varvec{c}&{}\quad \varvec{Q}\end{array}\right) \in \mathbb {S}^n, \quad \varvec{H}_0 = \left( \begin{array}{cc} 1 &{}\quad \mathbf{0}^T \\ \mathbf{0} &{}\quad \varvec{O}\end{array}\right) \in \mathbb {S}^n, \quad \varvec{Q}_{01} = \left( \begin{array}{cc} \varvec{b}^T\varvec{b}&{}\quad \varvec{b}^T\varvec{A}\\ \varvec{A}^T\varvec{b}&{}\quad \varvec{A}^T\varvec{A}\end{array}\right) \in \mathbb {S}^n, \\ \varvec{C}_{ij}&= \hbox {the } m \times m \hbox { matrix with } (i,j)\hbox {th component } 1/2 \hbox { and } 0 \hbox { elsewhere}, \\ \varvec{Q}_{ij}&= \left( \begin{array}{cc} 0 &{}\quad \mathbf{0}^T \\ \mathbf{0} &{}\quad \varvec{C}_{ij} + \varvec{C}_{ij}^T \end{array}\right) \in \mathbb {S}^n \quad ((i,j) \in \mathcal{{E}}). \end{aligned}$$

Then QOP (1) is equivalent to

$$\begin{aligned} \text{ minimize } \left\{ \varvec{Q}_0 \bullet \varvec{x}\varvec{x}^T \left| \begin{array}{l} \varvec{x}\in \mathbb {R}^n_+, \ \varvec{H}_0 \bullet \varvec{x}\varvec{x}^T = 1, \\ \varvec{Q}_{01} \bullet \varvec{x}\varvec{x}^T = 0, \ \varvec{Q}_{ij} \bullet \varvec{x}\varvec{x}^T = 0 \ ((i,j) \in \mathcal{{E}}) \end{array} \right. \right\} \end{aligned}$$
(4)

in the sense that \(\varvec{u}\in \mathbb {R}^m\) is a feasible solution of (1) with the objective value \(\varvec{u}^T\varvec{Q}\varvec{u}+ 2\varvec{c}^T\varvec{u}\) if and only if \(\varvec{x}= (1,\varvec{u}) \in \mathbb {R}^n\) is a feasible solution of (4) with the same objective value \(\varvec{Q}_0 \bullet \varvec{x}\varvec{x}^T\).

Lemma 1

Suppose that the feasible region of QOP (1) is nonempty and that the polyhedral set, \(\left\{ \varvec{u}\in \mathbb {R}^m_+ : \varvec{A}\varvec{u}+ \varvec{b}= \mathbf{0}\right\} \), is bounded. Then QOP (4) induced from QOP (1) satisfies Conditions (a), (b) and (c). Here we assume that the subscripts of the matrices \(\varvec{Q}_{01}\) and \(\varvec{Q}_{ij}\) \(((i,j) \in \mathcal{{E}})\) have been renumbered to \(1,2,\ldots ,q\) for some \(q\).

Proof

We only prove Condition (c) because Conditions (a) and (b) are obvious. Assume that \(\varvec{H}_0 \bullet \varvec{D}= 0\), \(\varvec{Q}_{01} \bullet \varvec{D}= 0\) and \(\varvec{Q}_{ij} \bullet \varvec{D}= 0\) \(((i,j) \in \mathcal{{E}})\) for some \(\varvec{D}\in \mathbb {S}^{1+m}_+ \cap \mathbb {N}\). Then, we see that

$$\begin{aligned} 0 = \varvec{H}_0 \bullet \varvec{D}= D_{11}, \ \text{ and } 0 = \varvec{Q}_{01} \bullet \varvec{D}= \left( \begin{array}{cc} \varvec{b}^T\varvec{b}&{}\quad \varvec{b}^T\varvec{A}\\ \varvec{A}^T\varvec{b}&{}\quad \varvec{A}^T\varvec{A}\end{array}\right) \bullet \varvec{D}. \end{aligned}$$
(5)

Now we write \(\varvec{D}\in \mathbb {S}^{1+m}_+ \cap \mathbb {N}\) as \(\left( \begin{array}{cc} D_{11} &{} \quad \varvec{D}_{12} \\ \varvec{D}_{12}^T &{}\quad \varvec{D}_{22} \end{array}\right) \). From \(0 = D_{11}\) and \(\varvec{D}\in \mathbb {S}^{1+m}_+\), we get \(\varvec{D}_{12} = \mathbf{0}\). As a result, the last relation in (5) implies that \(\varvec{A}^T\varvec{A}\bullet \varvec{D}_{22} = 0\). Since \(\varvec{A}^T\varvec{A}\in \mathbb {S}^m_+\) and \(\varvec{D}_{22} \in \mathbb {S}^m_+\), we obtain \(\varvec{A}^T\varvec{A}\varvec{D}_{22} = \varvec{O}\) and \(\varvec{A}\varvec{D}_{22} = \varvec{O}\). On the other hand, by the assumption of the lemma, there does not exist a nonzero \(\varvec{d}\in \mathbb {R}^m\) such that \(\varvec{d}\ge \mathbf{0}\) and \(\varvec{A}\varvec{d}= \mathbf{0}\). By applying Tucker’s theorem of the alternative [33], we can take a \(\varvec{y}\in \mathbb {R}^m\) such that \(\varvec{A}^T\varvec{y}> \mathbf{0}\). Multiplying the identity \(\varvec{A}\varvec{D}_{22} = \varvec{O}\) on the left by \(\varvec{y}^T\), we obtain \((\varvec{y}^T\varvec{A})\varvec{D}_{22} = \mathbf{0}^T\); since \(\varvec{y}^T\varvec{A}> \mathbf{0}^T\) and \(\varvec{D}_{22} \in \mathbb {N}\), this implies \(\varvec{D}_{22} = \varvec{O}\). Thus we have shown that \(\varvec{D}= \varvec{O}\). \(\square \)

Condition (b) implies that the inequalities \( \varvec{Q}_{p} \bullet \varvec{x}\varvec{x}^T \ge 0 \ (p=1,2,\ldots ,q) \) hold for any \(\varvec{x}\in \mathbb {R}^n_+\). Hence, the set of equalities \( \varvec{Q}_{p} \bullet \varvec{x}\varvec{x}^T = 0 \ (p=1,2,\ldots ,q) \) in QOP (2) can be combined into a single equality \(\varvec{H}_1 \bullet \varvec{x}\varvec{x}^T = 0\), where \( \varvec{H}_1 = \sum _{p=1}^q \varvec{Q}_p. \) Consequently, we obtain a simplified QOP:

$$\begin{aligned} \zeta \,{:=}\, \min \left\{ \varvec{Q}_0 \bullet \varvec{x}\varvec{x}^T \mid \begin{array}{ll} \varvec{x}\in \mathbb {R}^n_+, \ \varvec{H}_0 \bullet \varvec{x}\varvec{x}^T = 1, \ \varvec{H}_1 \bullet \varvec{x}\varvec{x}^T = 0 \end{array} \right\} , \end{aligned}$$
(6)

which is equivalent to QOP (2). Specifically, (2) and (6) share the common feasible region (3) and a common optimal solution \(\varvec{x}^*\) with the optimal value \(\zeta \). We note that QOP (4) (hence (1)) is converted to QOP (6) if we define \(\varvec{H}_1 = \varvec{Q}_{01}+\sum _{(i,j) \in \mathcal{{E}}} \varvec{Q}_{ij}\).
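The conversion from the data \((\varvec{Q}, \varvec{c}, \varvec{A}, \varvec{b}, \mathcal{{E}})\) of QOP (1) to the matrices \(\varvec{Q}_0, \varvec{H}_0, \varvec{Q}_{01}, \varvec{Q}_{ij}\) and \(\varvec{H}_1\) is purely mechanical. A minimal numpy sketch follows; the helper name is ours, and \(\mathcal{{E}}\) is taken as a list of 1-based index pairs.

```python
import numpy as np

def qop_to_standard_form(Q, c, A, b, E):
    """Build Q0, H0, Q01, {Qij} of QOP (4) and H1 of QOP (6)
    from the data (Q, c, A, b, E) of QOP (1)."""
    m = Q.shape[0]
    n = 1 + m                  # x = (x_1, u); x_1 is the homogenizing variable
    Q0 = np.zeros((n, n))      # [[0, c^T], [c, Q]]
    Q0[0, 1:] = c
    Q0[1:, 0] = c
    Q0[1:, 1:] = Q
    H0 = np.zeros((n, n))      # picks out (x_1)^2 from x x^T
    H0[0, 0] = 1.0
    Bb = np.hstack([b.reshape(-1, 1), A])
    Q01 = Bb.T @ Bb            # encodes ||A u + b||^2 = 0
    Qs = []
    for (i, j) in E:           # u_i sits at 0-based position i of x
        Qij = np.zeros((n, n))
        Qij[i, j] = Qij[j, i] = 1.0
        Qs.append(Qij)
    H1 = Q01 + sum(Qs)         # single constraint H1 • x x^T = 0
    return Q0, H0, Q01, Qs, H1
```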

3 Main results

We present a class of conic relaxations of QOP (6), which includes the CPP and DNN relaxations of QOP (6), in Sect. 3.1, and their further Lagrangian relaxations, called Lagrangian–conic relaxations, in Sect. 3.2. From the theoretical results shown in Lemmas 2, 3 and 4, we can conclude that the Lagrangian–DNN relaxation of QOP (6) is almost as effective as the DNN relaxation applied to the original QOP (2), and that solving the Lagrangian–DNN relaxation is expected to be more efficient and stable than solving the direct DNN relaxation of (2).

3.1 A class of conic relaxations

We rewrite QOP (2) as

$$\begin{aligned}&\zeta \,{:=}\, \min \left\{ \varvec{Q}_0 \bullet \varvec{X}\left| \begin{array}{ll} \varvec{X}\in \varvec{\Gamma }, \ \varvec{H}_0 \bullet \varvec{X}= 1, \\ \varvec{Q}_p \bullet \varvec{X}= 0 \ (p=1,2,\ldots ,q) \end{array} \right. \right\} \end{aligned}$$

and QOP (6) as

$$\begin{aligned} \eta \,{:=}\, \min \left\{ \varvec{Q}_0 \bullet \varvec{X}\left| \varvec{H}_0 \bullet \varvec{X}= 1, \ \varvec{H}_1 \bullet \varvec{X}= 0, \ \varvec{X}\in \varvec{\Gamma }\right. \right\} . \end{aligned}$$

Recall that \(\varvec{\Gamma }= \left\{ \varvec{x}\varvec{x}^T : \varvec{x}\in \mathbb {R}^n_+\right\} \). If \(\varvec{\Gamma }\) is replaced by a closed convex cone \(\mathbb {K}\) in \(\mathbb {S}^n\) satisfying \(\varvec{\Gamma }\subset \mathbb {K}\), then the following convex relaxations of QOPs (2) and (6) are obtained:

$$\begin{aligned}&\zeta (\mathbb {K}) := \inf \left\{ \varvec{Q}_0 \bullet \varvec{X}\left| \begin{array}{ll} \varvec{X}\in \mathbb {K}, \ \varvec{H}_0 \bullet \varvec{X}= 1, \\ \varvec{Q}_p \bullet \varvec{X}= 0 \ (p=1,2,\ldots ,q) \end{array} \right. \right\} \end{aligned}$$
(7)
$$\begin{aligned}&\eta (\mathbb {K}) \,{:=}\, \inf \left\{ \varvec{Q}_0 \bullet \varvec{X}\mid \varvec{H}_0 \bullet \varvec{X}= 1, \ \varvec{H}_1 \bullet \varvec{X}= 0, \ \varvec{X}\in \mathbb {K}\right\} . \end{aligned}$$
(8)

Note that the dual problem of (8) is given by

$$\begin{aligned} \eta ^d(\mathbb {K}) := \sup \Big \{ y_0 \mid \varvec{Z}+y_0\varvec{H}_0+y_1\varvec{H}_1 = \varvec{Q}_0, \varvec{Z}\in \mathbb {K}^*, \ \varvec{y}= (y_0,y_1) \in \mathbb {R}^2 \Big \}. \end{aligned}$$
(9)

When \(\mathbb {K}\) is chosen to be \(\mathbb {C}^*\), \(\mathbb {S}^n_+ \cap \mathbb {N}\) or \(\mathbb {S}^n_+\), problem (7) [or problem (8)] is known as a CPP relaxation, a DNN relaxation or an SDP relaxation of QOP (2) [or QOP (6)], respectively. As the CPP relaxation attains the exact optimal value \(\zeta \) of QOP (2) [or QOP (6)] (see Theorem 3.5 of [1]), it is theoretically the most important among the three relaxations. It is, however, numerically intractable while the DNN and SDP relaxations are numerically implementable. From \(\varvec{\Gamma }\subset \mathbb {C}^* \subset \mathbb {S}^n_+ \cap \mathbb {N}\subset \mathbb {S}^n_+\) and

$$\begin{aligned} \zeta (\mathbb {S}^n_+) \le \zeta (\mathbb {S}^n_+ \cap \mathbb {N}) \le \zeta (\mathbb {C}^*) = \zeta (\varvec{\Gamma }) = \zeta , \end{aligned}$$

the DNN relaxation of QOP (2) provides a lower bound for the minimum value \(\zeta \) of (2) at least as effectively as the SDP relaxation. Furthermore, under Conditions (a), (b) and (c), the Lagrangian–DNN relaxation of (6) [to which (2) is equivalent] satisfies additional properties which are conducive to numerically solving the problem, especially with first-order methods [4, 37].
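For small instances, the DNN relaxation (8) can be written down directly with a generic conic modeling tool. The following sketch assumes cvxpy with the SCS solver and is meant only to make the formulation concrete; it does not scale to large \(n\), which is precisely the motivation for Sect. 4.

```python
import cvxpy as cp

def dnn_relaxation_value(Q0, H0, H1):
    """Compute eta(K) in (8) with K = S^n_+ ∩ N for small n."""
    n = Q0.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0,                      # X in S^n_+
                   X >= 0,                      # X in N (elementwise)
                   cp.trace(H0 @ X) == 1,
                   cp.trace(H1 @ X) == 0]
    problem = cp.Problem(cp.Minimize(cp.trace(Q0 @ X)), constraints)
    problem.solve(solver=cp.SCS)
    return problem.value
```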

We present three lemmas to show those properties in the remainder of Sect. 3 for the general case \(\mathbb {K}\) where \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\). They are significant in their own right, although a numerical method is proposed only for \(\mathbb {K}= \mathbb {S}^n_+ \cap \mathbb {N}\) in Sect. 4.

Lemma 2

Suppose that \(\mathbb {K}\) is a closed (not necessarily convex) cone in \(\mathbb {S}^n\) satisfying \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\).

  1. (i)

    The feasible region of problem (7) with \(\mathbb {K}\) is nonempty and bounded; hence \( \zeta (\mathbb {K}) > -\infty \).

  2. (ii)

    \(\eta (\mathbb {K}) = \zeta (\mathbb {K}) \le \zeta \).

Proof

By Condition (a), we know that \(\varvec{x}^*(\varvec{x}^*)^T\) is a feasible solution of problem (7) with \(\mathbb {K}\). For assertion (i), it suffices to show that the feasible region of problem (7) with \(\mathbb {K}\) is bounded. Assume on the contrary that there exists a sequence \(\left\{ \varvec{X}^k \in \mathbb {K}\right\} \) with \(\lim _{k \rightarrow \infty } \Vert \varvec{X}^k \Vert = \infty \) such that

$$\begin{aligned} \varvec{H}_0 \bullet \varvec{X}^k = 1 \ \text{ and } \varvec{Q}_p \bullet \varvec{X}^k = 0 \ \quad (p=1,2,\ldots ,q). \end{aligned}$$

We may assume without loss of generality that \(\varvec{X}^k/\Vert \varvec{X}^k\Vert \) converges to some \(\varvec{D}\in \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\). Dividing the identities above by \(\Vert \varvec{X}^k\Vert \) and taking their limit as \(k \rightarrow \infty \), we have that

$$\begin{aligned} \varvec{O}\not = \varvec{D}\in \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}, \quad \varvec{H}_0 \bullet \varvec{D}= 0 \ \text{ and } \varvec{Q}_p \bullet \varvec{D}= 0 \quad (p=1,2,\ldots ,q). \end{aligned}$$

This contradicts the given Condition (c), and we have shown assertion (i).

By the assumption, \(\varvec{\Gamma }\subset \mathbb {K}\). Hence \(\zeta (\mathbb {K}) \le \zeta (\varvec{\Gamma }) = \zeta \). Since \(\varvec{H}_1 = \sum _{p=1}^q \varvec{Q}_p\), if \(\varvec{X}\in \mathbb {K}\) is a feasible solution of (7) with the objective value \(\varvec{Q}_0 \bullet \varvec{X}\), then it is a feasible solution of (8) with the same objective value. Thus, \(\eta (\mathbb {K}) \le \zeta (\mathbb {K})\) follows. To prove the converse inequality, suppose that \(\varvec{X}\in \mathbb {K}\) is a feasible solution of (8) with \(\mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\). By Condition (b), we know \(\varvec{Q}_p \in \mathbb {S}^n_+ + \mathbb {N}\) \((p=1,2,\ldots ,q)\). Hence \(\varvec{Q}_p \bullet \varvec{X}\ge 0\) \((p=1,2,\ldots ,q)\), which together with \(0 = \varvec{H}_1 \bullet \varvec{X}= \sum _{p=1}^q (\varvec{Q}_p \bullet \varvec{X})\) imply that \(\varvec{Q}_p \bullet \varvec{X}= 0\) \((p=1,2,\ldots ,q)\). Therefore, \(\varvec{X}\) is a feasible solution of (7) with the objective value \(\varvec{Q}_0 \bullet \varvec{X}\), and the inequality \(\eta (\mathbb {K}) \ge \zeta (\mathbb {K})\) follows.\(\square \)

Observe that if the relaxation technique discussed above is directly applied to QOP (1), then we have the following problem:

$$\begin{aligned} \text{ minimize } \left\{ \varvec{Q}\bullet \varvec{U}+ 2\varvec{c}^T\varvec{u}\left| \begin{array}{ll} \displaystyle \varvec{X}= \left( \begin{array}{cc} 1 &{} \varvec{u}^T \\ \varvec{u}&{} \varvec{U}\end{array} \right) \in \mathbb {K}, \ \varvec{A}\varvec{u}+ \varvec{b}= \mathbf{0}, \\ \varvec{Q}_{ij} \bullet \varvec{X}= 0 \ ((i,j) \in \mathcal{{E}}) \end{array} \right. \right\} . \end{aligned}$$
(10)

For \(\mathbb {K}= \varvec{\Gamma }\), problems (7) induced from (4) and (10) induced from (1) are equivalent, and both problems represent QOP (1). If we choose a closed convex cone \(\mathbb {K}\) with \(\varvec{\Gamma }\subset \mathbb {K}\), both problems serve as convex relaxations of QOP (1), but they are not equivalent in general. The essential difference lies in \(\varvec{A}\varvec{u}+ \varvec{b}= \mathbf{0}\) and \(\varvec{Q}_{01} \bullet \varvec{X}= 0\). Suppose that \(\displaystyle \varvec{X}= \left( \begin{array}{cc} 1 &{} \varvec{u}^T \\ \varvec{u}&{} \varvec{U}\end{array} \right) \in \mathbb {K}\) is a feasible solution of problem (7) with \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+\). Then it satisfies

$$\begin{aligned} 0&= \varvec{Q}_{01} \bullet \left( \begin{array}{cc} 1 &{}\quad \varvec{u}^T \\ \varvec{u}&{}\quad \varvec{U}\end{array} \right) = \left( \begin{array}{cc} \varvec{b}^T\varvec{b}&{} \quad \varvec{b}^T\varvec{A}\\ \varvec{A}^T\varvec{b}&{} \quad \varvec{A}^T\varvec{A}\end{array} \right) \bullet \left( \begin{array}{cc} 1 &{} \quad \varvec{u}^T \\ \varvec{u}&{} \quad \varvec{U}\end{array} \right) . \end{aligned}$$

Since \(\varvec{Q}_{01} \in \mathbb {S}^n_+\) and \(\varvec{X}\in \mathbb {S}^n_+\), we see that \(\varvec{Q}_{01}\varvec{X}= \varvec{O}\). It follows that

$$\begin{aligned} \varvec{A}\varvec{u}+ \varvec{b}= \mathbf{0} \ \text{ and } \varvec{b}\varvec{u}^T + \varvec{A}\varvec{U}= \varvec{O}. \end{aligned}$$

From the first equality above, we know that \(\varvec{X}\) is a feasible solution of problem (10); hence problem (7) [hence (8)] provides a convex relaxation at least as good as problem (10). Furthermore, the second equality, which is not involved in (10) unless rank\((\varvec{X}) = 1\), often contributes to strengthening the relaxation.

3.2 A class of Lagrangian–conic relaxations

For each closed cone \(\mathbb {K}\) (not necessarily convex), we consider a Lagrangian relaxation of problem (8) and its dual:

$$\begin{aligned}&\eta ^p(\lambda ,\mathbb {K}) := \inf \left\{ \varvec{Q}_0 \bullet \varvec{X}+ \lambda \varvec{H}_1 \bullet \varvec{X}\mid \begin{array}{l} \varvec{H}_0 \bullet \varvec{X}= 1, \ \varvec{X}\in \mathbb {K}\end{array} \right\} , \end{aligned}$$
(11)
$$\begin{aligned}&\eta ^d(\lambda ,\mathbb {K}) := \sup \left\{ y_0 \mid \varvec{Q}_0 + \lambda \varvec{H}_1 - y_0\varvec{H}_0 \in \mathbb {K}^* \right\} , \end{aligned}$$
(12)

where \(\lambda \in \mathbb {R}\) denotes a Lagrangian parameter. We call either of (11) or (12) a Lagrangian–conic relaxation of QOP (6), and in particular a Lagrangian–DNN relaxation when \(\mathbb {K}= \mathbb {S}^n_+ \cap \mathbb {N}\). (We use these names for simplicity, although (12) is precisely the dual of the Lagrangian–conic or Lagrangian–DNN relaxation.) It is easily verified that the weak duality relation \(\eta ^d(\lambda ,\mathbb {K}) \le \eta ^p(\lambda ,\mathbb {K}) \le \eta (\mathbb {K})\) holds for every \(\lambda \in \mathbb {R}\). Note that by Condition (b), problem (11) is strictly feasible when \(\mathbb {K}\) includes the completely positive cone \(\mathbb {C}^*\), which has nonempty interior with respect to \(\mathbb {S}^n\) (see, for example, [15]).

Suppose that \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\). Then we see by Condition (b) that \(\varvec{H}_1 \in \mathbb {S}^n_+ + \mathbb {N}\). Hence \(\varvec{H}_1 \bullet \varvec{X}\ge 0\) for every \(\varvec{X}\in \mathbb {K}\). Thus, the second term \(\lambda \varvec{H}_1 \bullet \varvec{X}\) of the objective function of (11) serves as a penalty function for the equality constraint \(\varvec{H}_1 \bullet \varvec{X}= 0\), in the sense that for each \(\varvec{X}\in \mathbb {K}\),

$$\begin{aligned} \begin{array}{lll} {\varvec{H}_1 \bullet \varvec{X}\ge 0} \ \text{ and } \lambda \varvec{H}_1 \bullet \varvec{X}\rightarrow \infty \ \text{ as } \lambda \rightarrow \infty \quad \hbox {if and only if } \varvec{H}_1 \bullet \varvec{X}\not = 0 . \end{array} \end{aligned}$$
(13)

By Lemma 2, we also know that \(-\infty < \eta (\mathbb {K}) = \zeta (\mathbb {K}) \le \zeta \). Using these relations, we establish the following result.

Lemma 3

Suppose that \(\mathbb {K}\) is a closed cone (not necessarily convex) in \(\mathbb {S}^n\) satisfying \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\). Then the following statements hold.

  1. (i)

    \(\eta ^p(\lambda _1,\mathbb {K}) \le \eta ^p(\lambda _2,\mathbb {K}) \le \eta (\mathbb {K})\) if \(\lambda _1 < \lambda _2\).

  2. (ii)

    \(\eta ^d(\lambda _1,\mathbb {K}) \le \eta ^d(\lambda _2,\mathbb {K}) \le \eta (\mathbb {K})\) if \( \lambda _1 < \lambda _2\).

  3. (iii)

    \(\eta ^p(\lambda ,\mathbb {K})\) converges to \(\eta (\mathbb {K})\) as \(\lambda \rightarrow \infty \).

  4. (iv)

    Moreover, if \(\mathbb {K}\) is convex, then both problems (11) and (12) have optimal solutions with a common optimal value \(\eta ^p(\lambda ,\mathbb {K}) = \eta ^d(\lambda ,\mathbb {K})\) for every sufficiently large \(\lambda \in \mathbb {R}\). Thus, “sup” can be replaced by “maximize”.

Proof

Assertion (i) follows from the first relation of (13) and the weak duality of the Lagrangian relaxation. Assertion (ii) follows from the fact that \(\varvec{H}_1\in \mathbb {S}^n_+ +\mathbb {N}\subset \mathbb {K}^* \subset \varvec{\Gamma }^*= \mathbb {C}\). To prove (iii), define the level set with the objective value \(\eta (\mathbb {K}) > -\infty \) by

$$\begin{aligned} L(\lambda ,\mathbb {K})&= \left\{ \varvec{X}\in \mathbb {K}: \varvec{H}_0 \bullet \varvec{X}= 1, \ \varvec{Q}_0 \bullet \varvec{X}+ \lambda \varvec{H}_1 \bullet \varvec{X}\le \eta (\mathbb {K})\right\} \end{aligned}$$

for problem (11) with each \(\lambda \in \mathbb {R}\). Then \(L(\lambda ,\mathbb {K})\) contains an optimal solution \(\widetilde{\varvec{X}}\) of (8) for every \(\lambda \in \mathbb {R}\), and \(L(\lambda _1,\mathbb {K}) \supset L(\lambda _2,\mathbb {K})\) if \(0 < \lambda _1 < \lambda _2\). We will show that \(L(\lambda ,\mathbb {K})\) is bounded for a sufficiently large \(\lambda \in \mathbb {R}\). Assume on the contrary that there exists a sequence \(\left\{ (\lambda ^k,\varvec{X}^k) \in \mathbb {R}_+ \times \mathbb {K}\right\} \) such that \(\varvec{X}^k \in L(\lambda ^k,\mathbb {K})\), \(0 < \lambda ^k \rightarrow \infty \) and \(0 < \Vert \varvec{X}^k\Vert \rightarrow \infty \) as \(k \rightarrow \infty \). Then, we have

$$\begin{aligned}&\frac{\varvec{X}^k}{\Vert \varvec{X}^k\Vert } \in \mathbb {K}, \quad \varvec{H}_1 \bullet \frac{\varvec{X}^k}{\Vert \varvec{X}^k\Vert } \ge 0 \ (\hbox {by }\varvec{H}_1 \in \mathbb {S}^n_+ + \mathbb {N}), \quad \varvec{H}_0 \bullet \frac{\varvec{X}^k}{\Vert \varvec{X}^k\Vert } = { \frac{1}{\Vert \varvec{X}^k\Vert } }. \end{aligned}$$

We may assume without loss of generality that \(\varvec{X}^k / \Vert \varvec{X}^k\Vert \) converges to a nonzero \(\varvec{D}\in \mathbb {K}\). By taking the limit as \(k \rightarrow \infty \), we obtain that

$$\begin{aligned} \varvec{O}\not = \varvec{D}\in \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}, \quad \varvec{H}_0 \bullet \varvec{D}= 0, \quad \varvec{H}_1 \bullet \varvec{D}= 0. \end{aligned}$$

This contradicts the given Condition (c). Therefore, we have shown that \(L(\bar{\lambda },\mathbb {K})\) is bounded for some sufficiently large \(\bar{\lambda } > 0\) and \(\widetilde{\varvec{X}} \in L(\lambda ,\mathbb {K}) \subset L(\bar{\lambda },\mathbb {K})\) for every \(\lambda \ge \bar{\lambda } > 0\).

Let \(\left\{ \lambda ^k \ge \bar{\lambda } \right\} \) be a sequence that diverges to \(\infty \). Since the nonempty level set \(L(\lambda ^k,\mathbb {K})\) is contained in a bounded set \(L(\bar{\lambda },\mathbb {K})\), problem (11) with each \(\lambda = \lambda ^k\) has an optimal solution \(\varvec{X}^k\) with the objective value \(\eta ^p(\lambda ^k,\mathbb {K}) = \varvec{Q}_0 \bullet \varvec{X}^k + \lambda ^k \varvec{H}_1 \bullet \varvec{X}^k\) in the level set \(L(\lambda ^k,\mathbb {K})\). We may assume without loss of generality that \(\varvec{X}^k\) converges to some \(\overline{\varvec{X}} \in L(\bar{\lambda },\mathbb {K})\). It follows that

$$\begin{aligned}&\varvec{H}_0 \bullet \varvec{X}^k = 1,\quad \frac{\varvec{Q}_0 \bullet \varvec{X}^k }{\lambda ^k}+ \varvec{H}_1 \bullet \varvec{X}^k \le \frac{\eta (\mathbb {K})}{\lambda ^k}, \\&\varvec{Q}_0 \bullet \varvec{X}^k \le \eta (\mathbb {K}), \quad \varvec{H}_1 \bullet \varvec{X}^k \ge 0 \quad \text{[both } \text{ by } \text{ Condition } \text{(b)] }. \end{aligned}$$

By taking the limit as \(k \rightarrow \infty \), we obtain that

$$\begin{aligned}&\overline{\varvec{X}} \in \mathbb {K}, \ \varvec{H}_0 \bullet \overline{\varvec{X}} = 1, \ \varvec{H}_1 \bullet \overline{\varvec{X}} = 0, \ \varvec{Q}_0 \bullet \overline{\varvec{X}} \le \eta (\mathbb {K}). \end{aligned}$$

This implies that \(\overline{\varvec{X}}\) is an optimal solution of problem (8). Hence, \(\varvec{Q}_0 \bullet \varvec{X}^k\) converges to \(\eta (\mathbb {K})\) as \(k \rightarrow \infty \). We also see from

$$\begin{aligned} \varvec{Q}_0 \bullet \varvec{X}^k \le \eta ^p(\lambda ^k,\mathbb {K}) = \varvec{Q}_0 \bullet \varvec{X}^k + \lambda ^k \varvec{H}_1 \bullet \varvec{X}^k \le \eta (\mathbb {K}) \end{aligned}$$

that \(\eta ^p(\lambda ^k,\mathbb {K})\) converges to \(\eta (\mathbb {K})\). Thus, we have shown assertion (iii).

Finally, we prove assertion (iv). We first see that problem (11) has an interior feasible solution by Condition (b). Indeed, if \(\widetilde{\varvec{X}}\) is an interior point of \(\mathbb {C}^*\), which also lies in the interior of any closed convex cone \(\mathbb {K}\) satisfying \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\), then \(\varvec{H}_0 \bullet \widetilde{\varvec{X}} > 0\). Hence \(\widetilde{\varvec{X}} /\left( \varvec{H}_0 \bullet \widetilde{\varvec{X}} \right) \) serves as an interior feasible solution of problem (11) with \(\mathbb {K}\). On the other hand, we have observed in the proof of assertion (iii) above that problem (11) has optimal solutions if \(\lambda > \bar{\lambda }\). Therefore, by the duality theorem for linear optimization problems over closed convex cones (see, for example, Theorem 4.2.1 of [25]), we obtain that \(\eta ^d(\lambda ,\mathbb {K}) = \eta ^p(\lambda ,\mathbb {K})\) for every \(\lambda \ge \bar{\lambda }\). \(\square \)
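The monotonicity in (i) and the convergence in (iii) of Lemma 3 can be observed numerically by solving (11) for an increasing sequence of \(\lambda \); a small cvxpy sketch, for tiny instances only and under the same assumptions as the earlier sketch:

```python
import cvxpy as cp

def lagrangian_dnn_value(Q0, H0, H1, lam):
    """eta^p(lambda, K) in (11) with K = S^n_+ ∩ N."""
    n = Q0.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0, X >= 0, cp.trace(H0 @ X) == 1]
    problem = cp.Problem(cp.Minimize(cp.trace((Q0 + lam * H1) @ X)),
                         constraints)
    problem.solve(solver=cp.SCS)
    return problem.value

# eta^p(lambda, K) increases with lambda toward eta(K), cf. Lemma 3:
# for lam in [1e0, 1e2, 1e4, 1e6]:
#     print(lam, lagrangian_dnn_value(Q0, H0, H1, lam))
```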

Lemma 4

Suppose that \(\mathbb {K}\) is a closed convex cone in \(\mathbb {S}^n\) satisfying \(\varvec{\Gamma }\subset \mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\).

  1. (i)

    Suppose \(\eta ^p(\lambda ,\mathbb {K}) < \eta (\mathbb {K})\) for all \(\lambda \), i.e., \(\eta (\mathbb {K})\) is not attained by \(\eta ^p(\lambda ^*,\mathbb {K})\) for any finite \(\lambda ^*\); then the optimal value of problem (9) is not attained at any feasible solution of the problem.

  2. (ii)

    Suppose that \(q \ge 1\); then the feasible set of (8), which is nonempty by Lemma 2, has no feasible point in the interior of \(\mathbb {K}\).

Proof

(i) We prove the result by contradiction. Suppose that (9) attains its optimal value at some \(\varvec{y}^*=(y_0^*,y_1^*) \in {\mathbb {R}^2}\) with \(y_0^*=\eta ^d(\mathbb {K})\). By definition, it is clear that \( \eta ^d(-y_1^*,\mathbb {K}) \le \eta ^d(\mathbb {K})\). On the other hand, since \(y_0^*\) is feasible for (12) with parameter equal to \(-y_1^*\), we have \( \eta ^d(-y_1^*,\mathbb {K}) \ge y_0^*\). Thus \( y_0^*=\eta ^d(\mathbb {K}) = \eta ^d(-y_1^*,\mathbb {K}). \) Now for \(\lambda \ge -y_1^*\), we have

$$\begin{aligned} \eta ^d(\mathbb {K}) = \eta ^d(-y_1^*,\mathbb {K}) \le \eta ^d(\lambda ,\mathbb {K}) \le \eta ^d(\mathbb {K}). \end{aligned}$$

Hence \(\eta ^d(\mathbb {K}) = \eta ^d(-y_1^*,\mathbb {K}) = \eta ^d(\lambda ,\mathbb {K}) = \eta ^p(\lambda ,\mathbb {K})\) for every sufficiently large \(\lambda \ge -y_1^*\), where the last equality follows from assertion (iv) in Lemma 3. By letting \(\lambda \rightarrow \infty \), we get by using assertion (iii) of Lemma 3 that \( \eta ^d(-y_1^*,\mathbb {K}) = \eta (\mathbb {K}). \) Since \(\eta ^d(-y_1^*,\mathbb {K}) \le \eta ^p(-y_1^*,\mathbb {K}) \le \eta (\mathbb {K})\), we deduce that \(\eta ^p(-y_1^*,\mathbb {K}) = \eta (\mathbb {K})\). However, this contradicts the condition that \(\eta ^p(\lambda ,\mathbb {K}) < \eta (\mathbb {K})\) for all \(\lambda \). Hence the optimal value \(\eta ^d(\mathbb {K})\) is not attained.

(ii) Let \(\varvec{X}\) be an interior point of \(\mathbb {K}\). By Condition (b), we know that \(\varvec{O}\not = \varvec{Q}_p \in \mathbb {S}^n_+ + \mathbb {N}\) \((p=1,2,\ldots ,q)\). Since \(\varvec{X}\) lies in the interior of \(\mathbb {K}\), there exists a positive number \(\epsilon \) such that \(\varvec{X}-\epsilon \varvec{Q}_p\) \((p=1,2,\ldots ,q)\) remain in \(\mathbb {K}\subset \mathbb {S}^n_+ \cap \mathbb {N}\); hence \(\varvec{Q}_p \bullet (\varvec{X}- \epsilon \varvec{Q}_p) \ge 0\) \((p=1,2,\ldots ,q)\). It follows that

$$\begin{aligned} \varvec{Q}_p \bullet \varvec{X}> \varvec{Q}_p \bullet \varvec{X}- \epsilon \varvec{Q}_p \bullet \varvec{Q}_p = \varvec{Q}_p \bullet (\varvec{X}- \epsilon \varvec{Q}_p) \ge 0 \quad (p=1,2,\ldots ,q). \end{aligned}$$

Since \(q \ge 1\) by the assumption, we obtain that \(\varvec{H}_1 \bullet \varvec{X}= \sum _{p=1}^q \varvec{Q}_p \bullet \varvec{X}> 0\), and any interior point of \(\mathbb {K}\) cannot be a feasible solution of (8). \(\square \)

4 Algorithms

Here we design an efficient algorithm to solve problem (12) with \(\mathbb {K}= \mathbb {S}^n_+ \cap \mathbb {N}\) [the Lagrangian–DNN relaxation of QOP (6)]. Before we present our algorithm, we should note that (12) with \(\mathbb {K}= \mathbb {S}^n_+ \cap \mathbb {N}\) can actually be reformulated as the following standard linear SDP in dual form by introducing an auxiliary variable \(\varvec{U}\):

$$\begin{aligned} \sup \{ y_0 \mid y_0\varvec{H}_0+\varvec{U}+ \varvec{S}= \varvec{Q}_0+\lambda \varvec{H}_1, -\varvec{U}+ \varvec{Z}= \mathbf{0}, \varvec{S}\in \mathbb {S}^n_+, \varvec{Z}\in \mathbb {N}\}. \end{aligned}$$
(14)

By considering \((y_0,\varvec{U})\) and \((\varvec{S},\varvec{Z})\) as two separate blocks, one can in theory apply the classical convergent 2-block ADMM, which was first introduced by Glowinski and Marrocco [22] and Gabay and Mercier [19], to solve (14). However, the practical performance of the classical 2-block ADMM in solving (14) is far from satisfactory. Typically it has great difficulty in obtaining a dual feasible solution even after a few thousand iterations. As an example, for the quadratic assignment problem nug25 considered in Table 4, running 2000 ADMM iterations produced an approximate solution \((\hat{y}_0,\hat{\varvec{U}},\hat{\varvec{S}},\hat{\varvec{Z}})\) with \(\hat{y}_0 = 4.6578\times 10^3\) and relative dual infeasibility \((\Vert \hat{\varvec{U}}-\hat{\varvec{Z}}\Vert +\Vert \hat{y}_0\varvec{H}_0+\hat{\varvec{U}}+\hat{\varvec{S}}-(\varvec{Q}_0+\lambda \varvec{H}_1)\Vert )/\Vert \varvec{Q}_0+\lambda \varvec{H}_1\Vert =1.07\times 10^{-6}\). Since \(\hat{y}_0\) exceeds the optimal value \(3.7440\times 10^3\) of the combinatorial problem, it is not a valid lower bound; moreover, it is far weaker than the valid lower bound of \(3.6250\times 10^3\) produced by the method we will design below. We should mention that using the convergent ADMM [35] designed to solve a problem of the form (12) also does not produce significant improvement. Finally, we also applied the SDPNAL algorithm [37] to solve (14). Although the performance of SDPNAL is generally much better than that of the ADMM in terms of the achievable accuracy of the computed solution, the accuracy is still not enough to generate a good valid lower bound. For example, for the problem nug25 just mentioned, SDPNAL took \(418\) seconds to generate an approximate solution \(\hat{y}_0 = 3.8262\times 10^3 \) (which is again an invalid bound) with relative dual infeasibility of \(8.95\times 10^{-8}\). The unsatisfactory performance of the ADMM and SDPNAL in solving (14) thus shows the necessity of designing an alternative algorithm to solve the problem.

For the subsequent discussion, we let \( \mathbb {K}_1 = \mathbb {S}_+^n \) and \( \mathbb {K}_2 =\mathbb {N}\), and hence \(\mathbb {K}^* = \mathbb {K}_1^*+\mathbb {K}_2^*\). We use \(\Pi _{\mathbb {K}}(\cdot )\), \(\Pi _{\mathbb {K}^*}(\cdot )\), \(\Pi _i(\cdot )\) and \(\Pi _i^*(\cdot )\) \((i=1,2)\) to denote the (metric) projections onto \(\mathbb {K}\), \(\mathbb {K}^*\), \(\mathbb {K}_i\) and \(\mathbb {K}_i^*\) \((i=1,2)\), respectively.

Suppose for the moment that we have an efficient algorithm to compute \(\Pi _{\mathbb {K}^*}(\varvec{G})\) for any given \(\varvec{G}\in \mathbb {S}^n\). (Note that to compute \(\Pi _{\mathbb {K}^*}(\varvec{G})\), we will present an accelerated proximal gradient method [4] described as Algorithm C in Sect. 4.1.) Then we can design a bisection method to solve (12), as we shall describe next.

For a sufficiently large and fixed \(\lambda > 0\), define the function

$$\begin{aligned} g_\lambda (y_0) = \Vert \varvec{G}_\lambda (y_0) -\Pi _{\mathbb {K}^*}( \varvec{G}_\lambda (y_0) )\Vert = \Vert \Pi _{\mathbb {K}}(-\varvec{G}_\lambda (y_0) )\Vert \end{aligned}$$

where \(\varvec{G}_\lambda (y_0) = \varvec{Q}_0 + \lambda \varvec{H}_1 - y_0\varvec{H}_0\) and \(\Vert \cdot \Vert \) is the Frobenius norm induced from the inner product in \(\mathbb {S}^n.\) It is clear that problem (12) is equivalent to

$$\begin{aligned} \text{ maximize } \{ y_0 \mid g_\lambda (y_0) = 0\}. \end{aligned}$$

We recall that problem (12) has an optimal solution for every sufficiently large \(\lambda \in \mathbb {R}\) by (iv) of Lemma 3. Thus we can solve (12) by the following simple bisection method.

Algorithm A: a bisection method for solving (12).
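A minimal sketch of the bisection, assuming a routine g that evaluates \(g_\lambda (y_0)\) approximately (e.g., via Algorithm C below); the stopping rule on the interval length is our choice.

```python
def bisection(g, a0, b0, eps=1e-12, tol=1e-6):
    """Algorithm A (sketch): maximize { y0 | g_lambda(y0) = 0 }.
    Assumes a0 is feasible (g(a0) numerically zero) and b0 is infeasible."""
    a, b = a0, b0
    while b - a > tol * max(1.0, abs(a), abs(b)):
        y0 = 0.5 * (a + b)
        if g(y0) < eps * max(1.0, abs(y0)):
            a = y0        # y0 feasible: raise the lower end
        else:
            b = y0        # y0 infeasible: lower the upper end
    return a              # an (approximately) valid lower bound
```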

In our numerical experiments, we choose \(\varepsilon = 10^{-12}\) in Algorithm A.

To determine an interval \([a_0,b_0]\) which contains \(\eta ^d(\lambda ,\mathbb {K})\), we can loosely solve (12) by Algorithm B described in Sect. 4.1 to produce an \(\varvec{X}_0 \in \mathbb {K}\). Assuming that \(\varvec{H}_0\bullet \varvec{X}_0 \not = 0\), we can generate a feasible point \(\hat{\varvec{X}}\) for (11) by setting \(\hat{\varvec{X}} = \varvec{X}_0/(\varvec{H}_0\bullet \varvec{X}_0)\). As a result, we have

$$\begin{aligned} \eta ^d(\lambda ,\mathbb {K}) \le \eta ^p(\lambda ,\mathbb {K}) \le (\varvec{Q}_0+\lambda \varvec{H}_1) \bullet \hat{\varvec{X}} =: b_\mathrm{test}. \end{aligned}$$

The following cases are considered in order to determine the interval \([a_0,b_0]\); a code sketch implementing them follows the list:

  1. (i)

    If \(b_\mathrm{test} \le -1\), then consider the set

    $$\begin{aligned} \mathcal{J}_- = \left\{ l \mid \kappa ^l b_\mathrm{test}\hbox { is feasible for } (12), \ l \ \hbox {is a positive integer} \right\} \end{aligned}$$

    where \(\kappa > 1\) is a given constant, say 2. Let \(l^*\) be the smallest integer in \(\mathcal{J}_-\). We can set \(a_0 = \kappa ^{l^*} b_\mathrm{test}\) and \(b_0= \kappa ^{l^*-1} b_\mathrm{test}\). Numerically, we regard \(\kappa ^{l} b_\mathrm{test}\) as feasible if \(g_\lambda (\kappa ^{l} b_\mathrm{test}) < \varepsilon \max \{1,|\kappa ^{l} b_\mathrm{test}|\}\).

  2. (ii)

    If \(b_\mathrm{test} \ge 1\), then consider the set

    $$\begin{aligned} \mathcal{J}_+ = \left\{ l\mid \kappa ^{-l} b_\mathrm{test}\hbox { is feasible for } (12), \ l \ \hbox {is a positive integer} \right\} . \end{aligned}$$

    If \(\mathcal{J}_+\) is nonempty, let \(l^*\) be the smallest integer in \(\mathcal{J}_+\). We can set \(a_0 = \kappa ^{-l^*}b_\mathrm{test}\) and \(b_0 = \kappa ^{1-l^*}b_\mathrm{test}\). On the other hand, if \(\mathcal{J}_+\) is empty, then we know that \(\eta ^d(\lambda ,\mathbb {K}) \le 0\). Thus if \(y_0=-1\) is feasible for (12), we can set \(a_0 = -1\), \(b_0=0\). Otherwise, we know that \(\eta ^d(\lambda ,\mathbb {K}) < -1\), and hence we can set \(b_\mathrm{test} = -1\) and determine \(a_0,b_0\) as in case (i).

  3. (iii)

    If \(b_\mathrm{test} \in (-1,1)\), we can set \(b_0 = b_\mathrm{test}\) and \(a_0 = -1\) if \(y_0=-1\) is feasible for (12). Otherwise, we set \(b_\mathrm{test}=-1\) and determine \(a_0,b_0\) as in case (i).
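A sketch of this interval search, with is_feasible(y0) testing whether \(g_\lambda (y_0)\) is numerically zero as above; the iteration cap l_max is our addition to make the search for \(\mathcal{J}_+\) finite.

```python
def initial_interval(is_feasible, b_test, kappa=2.0, l_max=60):
    """Determine [a0, b0] containing eta^d(lambda, K) from b_test."""

    def case_i(bt):                      # bt <= -1: scale by kappa > 1
        for l in range(1, l_max + 1):
            if is_feasible(kappa ** l * bt):
                return kappa ** l * bt, kappa ** (l - 1) * bt
        raise RuntimeError("no feasible point found; increase l_max")

    if b_test <= -1.0:                   # case (i)
        return case_i(b_test)
    if b_test >= 1.0:                    # case (ii)
        for l in range(1, l_max + 1):
            if is_feasible(kappa ** (-l) * b_test):
                return kappa ** (-l) * b_test, kappa ** (1 - l) * b_test
        if is_feasible(-1.0):            # J_+ empty: eta^d <= 0
            return -1.0, 0.0
        return case_i(-1.0)
    if is_feasible(-1.0):                # case (iii): b_test in (-1, 1)
        return -1.0, b_test
    return case_i(-1.0)
```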

4.1 A proximal alternating direction multiplier method for solving (12)

Here we design a proximal alternating direction multiplier (PADM) method [17] for solving the following problem:

$$\begin{aligned} \text{ minimize } \left\{ -\varvec{b}^T\varvec{y}\mid \mathcal{{A}}^*(\varvec{y})+\varvec{Z}_1+\varvec{Z}_2 = \varvec{C}, \ \varvec{Z}_1 \in \mathbb {S}^n_+, \varvec{Z}_2\in \mathbb {N}\right\} , \end{aligned}$$
(15)

where \(\varvec{C}\in \mathbb {S}^n\), \(\varvec{b}\in \mathbb {R}^m\) are given data and \(\mathcal{{A}}: \mathbb {S}^n \rightarrow \mathbb {R}^m\) is a given linear map. Note that (12) is a special case of (15) with \(\mathcal{{A}}^* (y_0) = y_0\varvec{H}_0\), \(\varvec{b}= 1\), and \(\varvec{C}= \varvec{Q}_0+\lambda \varvec{H}_1\).

Consider the following augmented Lagrangian function associated with (15):

$$\begin{aligned} L(\varvec{y},\varvec{Z}_1,\varvec{Z}_2; \varvec{X})&= -\varvec{b}^T \varvec{y}+ {\varvec{X}}\bullet (\mathcal{{A}}^*\varvec{y}+ \varvec{Z}_1 + \varvec{Z}_2-\varvec{C}) + \frac{\sigma }{2} \Vert \mathcal{{A}}^*\varvec{y}+\varvec{Z}_1 + \varvec{Z}_2-\varvec{C}\Vert ^2 \nonumber \\&= -\varvec{b}^T\varvec{y}+ \frac{\sigma }{2} \Vert \mathcal{{A}}^*\varvec{y}+ \varvec{Z}_1 + \varvec{Z}_2 +\frac{1}{\sigma }\varvec{X}-\varvec{C}\Vert ^2 - \frac{1}{2\sigma } \Vert \varvec{X}\Vert ^2 \end{aligned}$$
(16)

where \(\varvec{X}\in \mathbb {S}^n, \varvec{y}\in \mathbb {R}^m, \varvec{Z}_1\in \mathbb {S}^n_+, \varvec{Z}_2\in \mathbb {N}\), and \(\sigma >0\) is the penalty parameter. Let \(\mathcal{{T}}\) be a given self-adjoint positive semidefinite linear operator defined on \(\mathbb {S}^n\times \mathbb {S}^n\). The PADM method solves (15) by performing the following steps at the \(k\)th iteration:

$$\begin{aligned}&\varvec{y}^{k+1} = \text{ argmin } \Big \{ L(\varvec{y},\varvec{Z}_1^k,\varvec{Z}_2^k; \varvec{X}^k)\,\mid \, \varvec{y}\in \mathbb {R}^m \Big \} \\&(\varvec{Z}_1^{k+1},\varvec{Z}_2^{k+1}) \\&\quad = \text{ argmin } \left\{ \begin{array}{l} L(\varvec{y}^{k+1},\varvec{Z}_1,\varvec{Z}_2;\varvec{X}^k) \\ + \displaystyle \frac{\sigma }{2} \left( \begin{array}{c} \varvec{Z}_1-\varvec{Z}_1^k \\ \varvec{Z}_2-\varvec{Z}_2^k\end{array}\right) \bullet \mathcal{{T}}\left( \begin{array}{c} \varvec{Z}_1-\varvec{Z}_1^k \\ \varvec{Z}_2-\varvec{Z}_2^k\end{array}\right) \end{array} \Big | \varvec{Z}_1 \in \mathbb {S}^n_+,\varvec{Z}_2 \in \mathbb {N}\right\} \\&\quad = \text{ argmin } \left\{ \begin{array}{l} 2 \varvec{R}_d^k\bullet (\varvec{Z}_1-\varvec{Z}_1^k+\varvec{Z}_2-\varvec{Z}_2^k) \\ + \left( \begin{array}{c} \varvec{Z}_1-\varvec{Z}_1^k \\ \varvec{Z}_2-\varvec{Z}_2^k\end{array}\right) \bullet \left( \left( \begin{array}{cc} \mathcal{I}&{} \mathcal{I}\\ \mathcal{I}&{} \mathcal{I}\end{array}\right) + \mathcal{{T}}\right) \left( \begin{array}{c} \varvec{Z}_1-\varvec{Z}_1^k \\ \varvec{Z}_2-\varvec{Z}_2^k\end{array}\right) \end{array} \Big | \varvec{Z}_1 \in \mathbb {S}^n_+,\varvec{Z}_2 \in \mathbb {N}\right\} \\&\varvec{X}^{k+1} = \varvec{X}^k + \beta \sigma (\mathcal{{A}}^*\varvec{y}^{k+1}+\varvec{Z}_1^{k+1} + \varvec{Z}_2^{k+1}-\varvec{C}), \end{aligned}$$

where \(\beta \in (0,\frac{1+\sqrt{5}}{2})\) is the step-length and \(\varvec{R}_d^k = \mathcal{{A}}^*\varvec{y}^{k+1} +\varvec{Z}_1^k+\varvec{Z}_2^k +\sigma ^{-1}\varvec{X}^k -\varvec{C}\). In our implementation, we fix \(\beta = 1.618\).

By specifically choosing \(\mathcal{{T}}\) to be \(\mathcal{{T}}= \left( \begin{array}{cc} \mathcal{I}&{} -\mathcal{I}\\ -\mathcal{I}&{} \mathcal{I}\end{array}\right) \), the computation of \(\varvec{Z}^{k+1}_1, \varvec{Z}^{k+1}_2\) above then reduces to computing the projections onto \(\mathbb {S}^n_+\) and \(\mathbb {N}\), respectively. As a result, we obtain the following efficient PADM method for solving (15). It can be shown that the PADM method converges; we refer the reader to [17] for the proof.

Algorithm B: a proximal alternating direction multiplier method for solving (15).
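A compact numpy sketch of Algorithm B for the special case (12), i.e. \(\mathcal{{A}}^*(y_0) = y_0\varvec{H}_0\), \(\varvec{b}= 1\) and \(\varvec{C}= \varvec{Q}_0 + \lambda \varvec{H}_1\); parameter values and the fixed iteration count are illustrative only.

```python
import numpy as np

def proj_psd(M):
    """Projection onto S^n_+ via an eigenvalue decomposition."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def padm(Q0, H0, H1, lam, sigma=1.0, beta=1.618, max_iter=500):
    """Algorithm B (sketch) applied to (12)."""
    n = Q0.shape[0]
    C = Q0 + lam * H1
    Z1 = np.zeros((n, n)); Z2 = np.zeros((n, n)); X = np.zeros((n, n))
    h0h0 = np.sum(H0 * H0)
    for _ in range(max_iter):
        # Step 1: exact minimization of (16) over the scalar y0
        M = Z1 + Z2 + X / sigma - C
        y0 = (1.0 / sigma - np.sum(H0 * M)) / h0h0
        # Step 2: with the chosen T, the (Z1, Z2) update decouples
        # into two projections shifted by Rd/2
        Rd = y0 * H0 + Z1 + Z2 + X / sigma - C
        Z1 = proj_psd(Z1 - Rd / 2)
        Z2 = np.maximum(Z2 - Rd / 2, 0.0)   # projection onto N
        # Step 3: multiplier update with step-length beta
        X = X + beta * sigma * (y0 * H0 + Z1 + Z2 - C)
    return y0, X
```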

We can apply Algorithm B to (12) by taking \(\varvec{C}= \varvec{Q}_0+\lambda \varvec{H}_1\), \(\mathcal{{A}}(\varvec{X}) = \varvec{H}_0 \bullet \varvec{X}\) and \(\varvec{b}=1 \in \mathbb {R}\). Note that since the dual variable \(\varvec{y}\) is one-dimensional, solving the linear equation in Step 1 is very easy.

4.2 Computing \(\Pi _{\mathbb {K}}(\varvec{G})\) and \(\Pi _{\mathbb {K}^*}(-\varvec{G})\)

Observe that checking whether a given scalar \(y_0\) is feasible for (12) amounts to checking whether \(\varvec{G}_{\lambda }(y_0) = \varvec{Q}_0 + \lambda \varvec{H}_1 - y_0\varvec{H}_0 \in \mathbb {K}^*\), i.e., whether \(\Pi _{\mathbb {K}} (-\varvec{G}_{\lambda }(y_0)) = \varvec{O}\). To compute \(\Pi _{\mathbb {K}}(\varvec{G})\) for a given \(\varvec{G}\in \mathbb {S}^n\), we consider the following projection problem:

$$\begin{aligned} \text{ minimize } \Big \{\frac{1}{2}\Vert \varvec{X}-\varvec{G}\Vert ^2 \, \left| \, \varvec{X}\in \mathbb {K}\right. \Big \} \end{aligned}$$
(17)

where the unique solution gives the projection \(\Pi _{\mathbb {K}} (\varvec{G})\) of \(\varvec{G}\) onto \(\mathbb {K}\). Note that for any \(\varvec{G}\in \mathbb {S}^n\), we have \(\varvec{G}= \Pi _{\mathbb {K}}(\varvec{G}) - \Pi _{\mathbb {K}^*}(-\varvec{G})\) by the decomposition theorem of Moreau [23]. Using this equality, we can prove that \(\varvec{X}= \Pi _{\mathbb {K}} (\varvec{G})\), \( \varvec{Y}_1+\varvec{Y}_2 = \Pi _{\mathbb {K}^*} (-\varvec{G})\), \(\varvec{Y}_1 \in \mathbb {K}_1^*\) and \(\varvec{Y}_2 \in \mathbb {K}_2^*\) if and only if

$$\begin{aligned}&\varvec{X}-\varvec{G}- \varvec{Y}_1 - \varvec{Y}_2=\mathbf{0}, \;\; \varvec{X}\bullet \varvec{Y}_1 = 0, \; \varvec{X}\bullet \varvec{Y}_2 =0, \nonumber \\&\quad \varvec{X}\in \mathbb {K}_1, \; \varvec{X}\in \mathbb {K}_2, \; \varvec{Y}_1 \in \mathbb {K}_1^*, \; \varvec{Y}_2 \in \mathbb {K}_2^*, \end{aligned}$$
(18)

which can also be derived as the KKT conditions for (17).
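For the self-dual cone \(\mathbb {K}_1 = \mathbb {S}^n_+\), Moreau's decomposition and the complementarity in (18) can be checked directly; a small numerical sanity check, not from the paper:

```python
import numpy as np

def proj_psd(M):
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5)); G = (G + G.T) / 2

# Moreau: G = Pi_K(G) - Pi_{K*}(-G); here K = K* = S^n_+.
assert np.allclose(G, proj_psd(G) - proj_psd(-G))
# The two parts are orthogonal, matching X • Y = 0 in (18).
assert abs(np.sum(proj_psd(G) * proj_psd(-G))) < 1e-10
```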

In Algorithm C described below, we will measure the accuracy of an approximation \((\hat{\varvec{X}},\hat{\varvec{Y}}_1,\hat{\varvec{Y}}_2)\) of \((\varvec{X},\varvec{Y}_1,\varvec{Y}_2)\) satisfying (18) by computing the following residual:

$$\begin{aligned}&\delta _{\mathbb {K}} \!=\! \frac{1}{1\!+\!\Vert \varvec{G}\Vert }\max \left\{ \begin{array}{l} \Vert \hat{\varvec{X}}-\varvec{G}-\hat{\varvec{Y}}_1-\hat{\varvec{Y}}_2\Vert , \frac{|\langle \hat{\varvec{X}}, \, \hat{\varvec{Y}}_1\rangle |}{1+\Vert \hat{\varvec{Y}}_1\Vert }, \frac{|\langle \hat{\varvec{X}}, \, \hat{\varvec{Y}}_2\rangle |}{1+\Vert \hat{\varvec{Y}}_2\Vert }, \\ \Vert \Pi _1^*(-\hat{\varvec{X}})\Vert ,\Vert \Pi _2^*(-\hat{\varvec{X}})\Vert , \Vert \Pi _1(-\hat{\varvec{Y}}_1)\Vert ,\Vert \Pi _2(-\hat{\varvec{Y}}_2)\Vert \end{array} \right\} \!. \end{aligned}$$
(19)

It turns out that to solve (17) for a given \(\varvec{G}\in \mathbb {S}^n\), it is more efficient to consider the following problem for computing the projection of \(-\varvec{G}\) onto \(\mathbb {K}^* = \mathbb {K}_1^* + \mathbb {K}_2^*\):

$$\begin{aligned}&\text{ minimize } \; \Big \{ \text{ minimize } \; \Big \{ \frac{1}{2}\Vert \varvec{G}+\varvec{Y}_1+\varvec{Y}_2\Vert ^2 \mid \varvec{Y}_2 \in {\mathbb {K}_2^*} \Big \} \mid \varvec{Y}_1\in {\mathbb {K}_1^*} \Big \}, \end{aligned}$$

which can equivalently be formulated as follows:

$$\begin{aligned} \text{ minimize } \Big \{ f(\varvec{Y}_1) := \frac{1}{2} \Vert \Pi _2(\varvec{G}+\varvec{Y}_1)\Vert ^2 \,\mid \, \varvec{Y}_1\in \mathbb {K}_1^*\Big \}. \end{aligned}$$
(20)

It is easy to show that if \(\varvec{Y}_1^*\) is an optimal solution of (20), then \(\varvec{X}^* = \Pi _2(\varvec{G}+\varvec{Y}_1^*)\) is an optimal solution of (17). It can be shown that \(f(\cdot )\) is continuously differentiable on \(\mathbb {S}^n\) with \(\nabla f(\varvec{Y}_1) = \Pi _2(\varvec{G}+\varvec{Y}_1)\), and the gradient is Lipschitz continuous with modulus \(L=1\). However, note that \(f(\cdot )\) is not twice continuously differentiable on \(\mathbb {S}^n\).

The KKT conditions for (20) are given by:

$$\begin{aligned} \Pi _2(\varvec{G}+\varvec{Y}_1) - \varvec{X}= \mathbf{0}, \quad \varvec{Y}_1\bullet \varvec{X}= 0, \quad \varvec{Y}_1\in \mathbb {K}_1^*, \; \varvec{X}\in \; \mathbb {K}_1. \end{aligned}$$

To solve problem (20), we can either use a projected gradient method [5] or the accelerated proximal gradient (APG) method in [4]. We prefer the latter because of its superior iteration complexity. The details of the APG method are described next.

Algorithm C: an accelerated proximal gradient method for computing \(\Pi _{\mathbb {K}}(\varvec{G})\) and \(\Pi _{\mathbb {K}^*}(-\varvec{G})\).
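A sketch of Algorithm C: an APG (FISTA-type) iteration on (20), using that \(\mathbb {K}_1^* = \mathbb {S}^n_+\) is self-dual, \(\nabla f(\varvec{Y}_1) = \Pi _2(\varvec{G}+\varvec{Y}_1)\) is elementwise clipping, and \(L = 1\); the stopping rule and iteration cap are our choices (the paper's stopping test is based on the residual \(\delta _{\mathbb {K}}\) in (19)).

```python
import numpy as np

def proj_psd(M):
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def apg_projection(G, max_iter=2000, tol=1e-12):
    """Approximate Pi_K(G) and Pi_{K*}(-G) for K = S^n_+ ∩ N
    by solving (20) with an accelerated proximal gradient method."""
    n = G.shape[0]
    Y1 = np.zeros((n, n)); Y1_old = Y1.copy()
    t, t_old = 1.0, 1.0
    for _ in range(max_iter):
        W = Y1 + ((t_old - 1.0) / t) * (Y1 - Y1_old)  # extrapolation
        grad = np.maximum(G + W, 0.0)                 # grad f(W) = Pi_2(G + W)
        Y1_new = proj_psd(W - grad)                   # step 1/L with L = 1
        Y1_old, Y1 = Y1, Y1_new
        t_old, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        if np.linalg.norm(Y1 - Y1_old) <= tol * (1.0 + np.linalg.norm(Y1)):
            break
    X = np.maximum(G + Y1, 0.0)    # X approximates Pi_K(G)
    Y2 = X - G - Y1                # from (18); Y1 + Y2 approximates Pi_{K*}(-G)
    return X, Y1, Y2
```

In the setting of Algorithm A, apg_projection would be called with \(\varvec{G}= -\varvec{G}_\lambda (y_0)\), and \(g_\lambda (y_0)\) taken as \(\Vert \varvec{X}\Vert \) for the returned \(\varvec{X}\).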

We note that Algorithm C is called in Algorithm A with \(\varvec{G}= -\varvec{G}_{\lambda }(y_0^k)\) to determine whether \(y_0^k\) is feasible for (12). It is also used to check whether \(\kappa ^l b_\mathrm{test}\) and \(\kappa ^{-l} b_\mathrm{test}\) are feasible for (12). As Algorithm C generates approximations of \(\Pi _{\mathbb {K}}(\varvec{G})\) and \(\Pi _{\mathbb {K}^*}(-\varvec{G})\), it is important to pick a small tolerance \(\epsilon \) (\(10^{-12}\) in our numerical experiments) in the algorithm in order to determine the feasibility of \(y_0^k\) unambiguously.

We close this section with the following iteration complexity result for Algorithm C.

Theorem 1

Let \(\{ \varvec{Y}^k_1\}_{k=0}^\infty \) be generated by Algorithm C, and let \(\varvec{Y}^*_1\) be an optimal solution of (20). Then for any \(k\ge 1\), we have

$$\begin{aligned} 0 \le f(\varvec{Y}^k_1) - f(\varvec{Y}^*_1) \le \frac{2\Vert \varvec{Y}_1^*-\varvec{Y}_1^0\Vert ^2 }{(k+1)^2}. \end{aligned}$$

Proof

It follows from [4, Theorem 4.1] by noting that \(\nabla f(\cdot )\) is Lipschitz continuous with modulus \(L=1\). \(\square \)

5 Numerical experiments

To apply the numerical methods presented in the previous section, we take \(\mathbb {K}= \mathbb {S}^n_+ \cap \mathbb {N}\). We demonstrate the efficiency and effectiveness of solving the Lagrangian–DNN relaxation (12) using Algorithm A by comparing with the DNN relaxation (7) of (6) in terms of the quality of lower bounds and CPU time.

The test problems include binary integer quadratic problems, quadratic multiple knapsack problems, maximum stable set problems, and quadratic assignment problems. All the experiments were performed in Matlab on a Dell Precision T3500 desktop with an Intel Xeon quad-core CPU (2.80 GHz) and 24 GB memory.

5.1 Binary integer quadratic problems

The binary integer quadratic (BIQ) problem is described as

$$\begin{aligned} v^* = \min \; \{ \varvec{x}^T \varvec{Q}\varvec{x}\mid \varvec{x}\in \{0,1\}^m\}, \end{aligned}$$
(21)

where \(\varvec{Q}\) is an \(m \times m\) symmetric matrix (not necessarily positive semidefinite). Following [9], a natural DNN relaxation of problem (21) is given by:

$$\begin{aligned} v^{(0)} \,{:=}\, \min \; \left\{ \varvec{Q}\bullet \varvec{X}\; \left| \; \begin{array}{l} \mathrm{diag}{(\varvec{X})}-\varvec{x}= 0, \\ \left[ \begin{array}{cc} 1 &{} \varvec{x}^T \\ \varvec{x}&{} \varvec{X}\end{array}\right] \in \mathbb {S}^{m+1}_+ \cap \mathbb {N}^{m+1} \end{array} \right. \right\} . \end{aligned}$$
(22)

Note that by introducing slack variables \(v_i = 1-x_i\), \(i=1,\dots ,m\), problem (21) can be reformulated in the form (1) as follows:

$$\begin{aligned} v^* = \min \; \{ \varvec{x}^T \varvec{Q}\varvec{x}\mid \varvec{x}+ \varvec{v}= \varvec{e}, \ \varvec{x}\circ \varvec{v}=0,\ \varvec{x}\ge \mathbf{0}, \varvec{v}\ge \mathbf{0}\}, \end{aligned}$$
(23)

where \(\circ \) denotes elementwise (Hadamard) multiplication. Observe that the direct DNN relaxation of the form (10) for problem (23) is given by

$$\begin{aligned} v^{(1)} \,{:=}\, \min \; \left\{ \varvec{Q}\bullet \varvec{X}\; \left| \; \begin{array}{l} \mathrm{diag}{(\varvec{X})}-\varvec{x}= 0,\; \mathrm{diag}{(\varvec{V})}-\varvec{v}= 0,\; \mathrm{diag}{(\varvec{W})} = 0\\ \varvec{x}+\varvec{v}=\varvec{e}, \; \left[ \begin{array}{ccc} 1 &{}\quad \varvec{x}^T &{}\quad \varvec{v}^T \\ \varvec{x}&{}\quad \varvec{X}&{}\quad \varvec{W}^T \\ \varvec{v}&{}\quad \varvec{W}&{}\quad \varvec{V}\end{array}\right] \in \mathbb {S}^{2m+1}_+\cap \mathbb {N}^{2m+1} \end{array} \right. \right\} .\nonumber \\ \end{aligned}$$
(24)

We can consider the relaxation of the form (7) for (23) with \(\mathbb {K}= \mathbb {S}^{2m+1}_+\cap \mathbb {N}^{2m+1}\), and the subsequent Lagrangian–DNN relaxation (11) and its dual (12) to generate a lower bound \(v^{(2)}\) for (21).
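
For small \(m\), the direct DNN relaxation (24) can be prototyped with an off-the-shelf modeling tool. The following cvxpy sketch is for illustration only: it calls a generic conic solver rather than Algorithm A or SDPNAL, and it does not scale to the problem sizes treated below.

import cvxpy as cp

def biq_direct_dnn_bound(Q):
    # Sketch of the direct DNN relaxation (24) of the BIQ problem; small m only
    m = Q.shape[0]
    Z = cp.Variable((2 * m + 1, 2 * m + 1), symmetric=True)
    x, v = Z[0, 1:m + 1], Z[0, m + 1:]
    X = Z[1:m + 1, 1:m + 1]
    V = Z[m + 1:, m + 1:]
    W = Z[m + 1:, 1:m + 1]
    cons = [Z >> 0, Z >= 0, Z[0, 0] == 1,       # DNN cone S^{2m+1}_+ cap N^{2m+1}
            cp.diag(X) == x, cp.diag(V) == v,   # diag(X) = x, diag(V) = v
            cp.diag(W) == 0, x + v == 1]        # diag(W) = 0, x + v = e
    prob = cp.Problem(cp.Minimize(cp.trace(Q @ X)), cons)
    prob.solve()                                # assumes an SDP-capable solver (e.g. SCS)
    return prob.value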

Table 1 BIQ problems with \(m=100\) and \(m=250\): \(\lambda = 10^6\Vert \varvec{Q}_0\Vert /\Vert \varvec{H}_1\Vert \)

Table 1 presents the numerical results we obtain for solving (22), the direct DNN relaxation (24), and the Lagrangian–DNN relaxation of (23). The test problems are the Billionnet–Elloumi instances from BIQMAC library [6]. The lower bounds \(v_\mathrm{LB}^{(0)}\) obtained from (22) are the current state-of-the-art, and they are computed by the advanced large scale SDP solver, SDPNAL, originally developed in [37]. The stopping tolerance for SDPNAL is set to \(10^{-5}\) rather than the default value of \(10^{-6}\) since the former is more efficient for the purpose of computing a lower bound for (22). We should mention that as the solver SDPNAL only produces an approximately primal–dual feasible solution for (22), the procedure in [21] is used to generate a valid lower bound \(v_\mathrm{LB}^{(0)}\) for \(v^{(0)}\) based on the approximately feasible solution.

From the table, we may observe that the Lagrangian–DNN relaxation (12) of (23) can produce much stronger lower bounds \(v^{(2)}_\mathrm{LB}\) than the bounds \(v^{(0)}_\mathrm{LB}\) generated from the standard DNN relaxation problem (22), while the CPU times taken to compute the bounds \(v^{(2)}_\mathrm{LB}\) by Algorithm A are at most 2.5 times those taken to compute \(v^{(0)}_\mathrm{LB}\) by SDPNAL. The fact that more time is needed to compute \(v^{(2)}_\mathrm{LB}\) should not come as a surprise since the matrix variable involved has dimension \(2m+1\) whereas the corresponding variable for \(v^{(0)}_\mathrm{LB}\) has dimension \(m+1\). Nevertheless, the formulation (23) is able to produce a much stronger lower bound than that produced by the standard DNN relaxation (22) of (21).

Next we test the strength of the Lagrangian–DNN relaxation (12) of (23) by comparing the lower bound \(v^{(2)}_\mathrm{LB}\) with the lower bound \(v^{(1)}_\mathrm{LB}\) produced by the direct DNN relaxation (24) of (23). Notice that the bounds \(v^{(2)}_\mathrm{LB}\) are tighter than \(v^{(1)}_\mathrm{LB}\) for almost all test problems, except for bqp100-3 and bqp100-4, where the differences between \(v^{(2)}_\mathrm{LB}\) and \(v^{(1)}_\mathrm{LB}\) are very small. The bounds \(v^{(1)}_\mathrm{LB}\) are much closer to \(v^{(2)}_\mathrm{LB}\) than the bounds \(v^{(0)}_\mathrm{LB}\) are; however, computing \(v^{(1)}_\mathrm{LB}\) takes much longer than computing \(v^{(2)}_\mathrm{LB}\) or \(v^{(0)}_\mathrm{LB}\). Thus we see that solving the Lagrangian–DNN relaxation of (23) by our proposed Algorithm A can generate the tight lower bounds \(v^{(2)}_\mathrm{LB}\) efficiently.

5.2 Quadratic multiple knapsack problems

The problem under consideration is the following:

$$\begin{aligned} v^* \,{:=}\, \min \; \left\{ \varvec{x}^T \varvec{Q}\varvec{x}\; \left| \; \varvec{A}\varvec{x}+\varvec{s}= \varvec{b}, \ \varvec{x}\in \{0,1\}^m, \varvec{s}\ge 0 \right. \right\} , \end{aligned}$$
(25)

where \(\varvec{A}\in \mathbb {R}^{q\times m}\) and \(\varvec{b}\in \mathbb {R}^q\) have all their entries positive. The problem (25), studied in [10, 29], is a generalization of the binary quadratic single knapsack problem. In our numerical experiments, we set \(q = 10\), \(m = 100\), and use the same matrix \(\varvec{Q}\) with \(m=100\) as in the BIQ problems in Sect. 5.1. The matrix \(\varvec{A}\) is generated randomly with its elements independently drawn from the uniform distribution on the interval \([0,10]\). The vector \(\varvec{b}\) is set to \(\varvec{b}=m\varvec{e}\).
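
For reproducibility, the random data can be generated along these lines (a numpy sketch; the seed is our assumption, chosen only for illustration):

rng = np.random.default_rng(0)              # seed is an assumption
q, m = 10, 100
A = rng.uniform(0.0, 10.0, size=(q, m))     # entries i.i.d. uniform on [0, 10]
b = m * np.ones(q)                          # b = m*e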

By the formulation of Burer in [9], the natural DNN relaxation of (25) is given by

$$\begin{aligned} v^{(0)} \,{:=}\, \min \; \left\{ \varvec{Q}\bullet \varvec{X}\; \left| \; \begin{array}{l} \varvec{A}\varvec{x}+\varvec{s}= \varvec{b}, \ \mathrm{diag}{(\varvec{X})}-\varvec{x}= 0 \\ \left( \left[ \begin{array}{c} \varvec{a}_i \\ \varvec{e}_i\end{array}\right] \left[ \begin{array}{c} \varvec{a}_i\\ \varvec{e}_i\end{array}\right] ^T \right) \bullet \left[ \begin{array}{cc} \varvec{X}&{}\quad \varvec{W}^T\\ \varvec{W}&{}\quad \varvec{S}\end{array}\right] = b_i^2 \; (i=1,\dots ,q)\\ \left[ \begin{array}{ccc} 1 &{}\quad \varvec{x}^T &{}\quad \varvec{s}^T\\ \varvec{x}&{} \quad \varvec{X}&{} \quad \varvec{W}^T\\ \varvec{s}&{}\quad \varvec{W}&{}\quad \varvec{S}\end{array}\right] \in \mathbb {S}^{m+q+1}_+\cap \mathbb {N}^{m+q+1} \\ \end{array} \right. \right\} \end{aligned}$$
(26)

where \(\varvec{a}_i^T\) denotes the \(i\)th row of \(\varvec{A}\), and \(\varvec{e}_i\) is the \(i\)th unit vector in \(\mathbb {R}^q\). We can introduce slack variables \(v_i = 1-x_i\), \(i=1,\dots ,m\), just as for the BIQ problems in Sect. 5.1, to reformulate (25) in the form (1) as follows:

$$\begin{aligned} v^* = \min \; \left\{ \varvec{x}^T \varvec{Q}\varvec{x}\; \left| \; \begin{array}{l} \left[ \begin{array}{ccc} \varvec{I}_m &{} \varvec{I}_m &{} 0 \\ \varvec{A}&{} 0 &{} \varvec{I}_q \end{array} \right] \left[ \begin{array}{c} \varvec{x}\\ \varvec{v}\\ \varvec{s}\end{array}\right] = \left[ \begin{array}{c} \varvec{e}\\ \varvec{b}\end{array}\right] , \\ \varvec{x}\circ \varvec{v}= 0,\ \varvec{x}\ge \mathbf{0}, \varvec{v}\ge \mathbf{0}, \varvec{s}\ge \mathbf{0} \end{array} \right. \right\} . \ \end{aligned}$$
(27)
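
The block constraint matrix in (27) can be assembled directly; a small numpy sketch:

def knapsack_qop_data(A):
    # Builds the block matrix [I_m I_m 0; A 0 I_q] and right-hand side [e; b] of (27)
    q, m = A.shape
    M = np.block([[np.eye(m), np.eye(m), np.zeros((m, q))],
                  [A, np.zeros((q, m)), np.eye(q)]])
    rhs = np.concatenate([np.ones(m), m * np.ones(q)])   # b = m*e as in Sect. 5.2
    return M, rhs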

In the numerical results presented in Table 2, we report the lower bound \(v^{(0)}_\mathrm{LB}\) computed by SDPNAL for (26), the lower bound \(v^{(1)}_\mathrm{LB}\) computed by SDPNAL for the DNN relaxation of (27) based on Burer’s formulation in [9], and the lower bound \(v^{(2)}_\mathrm{LB}\) computed by Algorithm A for the Lagrangian–DNN relaxation associated with (27).

As we can see from the numerical results, the lower bound \(v^{(0)}_\mathrm{LB}\) is much weaker than \(v^{(1)}_\mathrm{LB}\) and \(v^{(2)}_\mathrm{LB}\); thus we will mainly compare the bounds \(v^{(1)}_\mathrm{LB}\) and \(v^{(2)}_\mathrm{LB}\). Table 2 shows that the bound \(v^{(2)}_\mathrm{LB}\) based on the Lagrangian–DNN relaxation introduced in this paper can be computed much more efficiently than the lower bound \(v^{(1)}_\mathrm{LB}\). Furthermore, \(v^{(2)}_\mathrm{LB}\) is stronger than \(v^{(1)}_\mathrm{LB}\).

Table 2 Quadratic multiple knapsack problems: \(\lambda = 10^6\Vert \varvec{Q}_0\Vert /\Vert \varvec{H}_1\Vert \)

5.3 Maximum stable set problems

For a graph \(G\) with \(m\) nodes and edge set \(\mathcal E\), the stability number \(\alpha (G)\) is the cardinality of a maximum stable set of \(G\) [12], and

$$\begin{aligned} -\alpha (G)&:= \min \; \{-\varvec{e}^T \varvec{x}\; \mid \; x_ix_j = 0 \ ((i, j) \in \mathcal{E}), \ \varvec{x}\in \{0, 1\}^{m}\} \end{aligned}$$
(28)
$$\begin{aligned}&\ge \min \left\{ (-\varvec{e}\varvec{e}^T) \bullet (\varvec{z}\varvec{z}^T) \; \left| \; \begin{array}{l} \sum \nolimits _{i=1}^m z_i^2 = 1, \; z_iz_j = 0 \\ \; ((i,j)\in \mathcal{E}), \; \varvec{z}\ge \mathbf{0} \end{array} \right. \right\} . \end{aligned}$$
(29)

Note that (29) is derived from (28) by setting \(\varvec{z}= \varvec{x}/\sqrt{\varvec{e}^T\varvec{x}}\) for \(0\not =\varvec{x}\in \{0,1\}^m\).

Let \(\varvec{E}_{ij}\) be the \(m\times m\) symmetric matrix whose elements are all zeros, except the \((i,j)\) and \((j,i)\) elements which are equal to \(1\). By setting

$$\begin{aligned} \varvec{Q}_0 =-\varvec{e}\varvec{e}^T, \quad \varvec{H}_0 = \varvec{I},\quad \varvec{H}_1 = \sum _{(i,j)\in \mathcal{{E}}} \varvec{E}_{ij} \end{aligned}$$
(30)

it is readily shown that (29) is equivalent to problem (8) with \(\mathbb {K}= \varvec{\Gamma }\). A well-known DNN relaxation of (29), with \(n=m\), is the following:

$$\begin{aligned} v^{(1)} = \min \, \left\{ \varvec{Q}_0 \bullet \varvec{X}\; \left| \; \varvec{I}\bullet \varvec{X}= 1,\, \varvec{E}_{ij} \bullet \varvec{X}= 0 \ ((i, j)\in \mathcal{E}), \ \varvec{X}\in \mathbb {S}^n_+\cap \mathbb {N}\right. \right\} .\quad \quad \end{aligned}$$
(31)
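
For concreteness, the data (30) can be assembled from the edge list of \(G\) as follows (a numpy sketch; zero-based vertex indices are assumed):

def stable_set_data(m, edges):
    # Q0 = -e e^T, H0 = I, H1 = sum of E_ij over the edges (i, j) of G
    Q0 = -np.ones((m, m))
    H0 = np.eye(m)
    H1 = np.zeros((m, m))
    for i, j in edges:
        H1[i, j] = H1[j, i] = 1.0
    return Q0, H0, H1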

Table 3 presents the comparison of the lower bounds and computation times for problem (29). The test problems are graph instances (arising from coding theory) collected by Sloane [30]. The lower bounds \(v_\mathrm{LB}^{(1)}\) generated from (31) are the current state-of-the-art, and they are computed by the advanced large scale SDP solver SDPNAL. The lower bounds \(v_\mathrm{LB}^{(2)}\) are computed for problem (29) based on the Lagrangian–DNN relaxation (12) by Algorithm A. We can see that the difference between the bounds \(v_\mathrm{LB}^{(1)}\) and \(v_\mathrm{LB}^{(2)}\) is less than \(10^{-2}\) for all test problems, and the CPU times to compute \(v_\mathrm{LB}^{(1)}\) and \(v_\mathrm{LB}^{(2)}\) are comparable, except for the problems 1dc.1024, 1tc.1024, 1zc.1024. For these problems, computing \(v_\mathrm{LB}^{(1)}\) by SDPNAL takes more than twice the time taken to compute \(v_\mathrm{LB}^{(2)}\) by Algorithm A.

We have also tested Algorithm A on the Hamming 6-4 instance, a graph with 64 vertices and \(\alpha (G)=4\) from the Second DIMACS Challenge [14]. For this problem, it was reported in [8] that the linear approximation method took about 58 min to solve the problem. In contrast, our algorithm solved problem (12) in less than 2 s and generated the lower bound \(-\)4.00024414 for \(-\alpha (G)\).

Table 3 Maximum stable set problems with \(n= 256, 512\) and \(1024\): \(\lambda = 10^6\Vert \varvec{Q}_0\Vert /\Vert \varvec{H}_1\Vert \)

5.4 Quadratic assignment problems

We will identify a matrix \(\varvec{X}= [\varvec{x}_1,\dots ,\varvec{x}_n]\in \mathbb {R}^{n\times n}\) with the \(n^2\)-vector \(\varvec{x}= \mathrm{vec}(\varvec{X}) = [\varvec{x}_1; \dots ; \varvec{x}_n]\). Given matrices \(\varvec{A}, \varvec{B}\in \mathbb {R}^{n\times n}\), the quadratic assignment problem is:

$$\begin{aligned} v^*&:= \min \; \left\{ \varvec{X}\bullet \varvec{A}\varvec{X}\varvec{B}^T \; \left| \; \begin{array}{l} \varvec{e}^T \varvec{X}\varvec{e}_j =1= \varvec{e}_j^T \varvec{X}\varvec{e}\\ (j=1,\dots ,n), \; \varvec{X}\in \{0,1\}^{n\times n} \end{array} \right. \right\} \nonumber \\&= \!\min \; \left\{ \varvec{x}^T(\varvec{B}\otimes \varvec{A}) \varvec{x}\; \left| \; \begin{array}{l} (\varvec{e}_j^T\otimes \varvec{e}^T)\varvec{x}=1= (\varvec{e}^T\otimes \varvec{e}_j^T)\varvec{x}\ \\ (j=1,\dots ,n), \ \varvec{x}\in \{0,1\}^{n^2} \end{array} \right. \right\} . \end{aligned}$$
(32)

Here \(\varvec{e}\in \mathbb {R}^n\) denotes the vector of ones, \(\varvec{e}_j \in \mathbb {R}^n\) the \(j\)th coordinate unit vector, and \(\otimes \) denotes the Kronecker product. We can express the above problem in the form of (1) by introducing \(n\) additional vector variables \(\varvec{v}_i = \varvec{e}-\varvec{x}_i\), \(i=1,\dots ,n\), but the resulting Lagrangian–DNN relaxation (12) of (1) will involve matrices of dimension \(2n^2\times 2n^2\). From the computational point of view, such a doubling of the matrix dimension is expensive, and our numerical experience also shows that it is costly and difficult to solve (12) to high accuracy with such a formulation. For this reason, we will consider the following alternative reformulation of (32) introduced by Povh and Rendl [26]:

$$\begin{aligned} v^*&:= \!\min \; \Big \{ \varvec{X}\bullet (\varvec{A}\varvec{X}\varvec{B}^T) \; \mid \; \varvec{X}^T\varvec{X}\!=\!\varvec{I}\!=\!\varvec{X}\varvec{X}^T, \varvec{X}\!\ge \! 0 \Big \} \end{aligned}$$
(33)
$$\begin{aligned}&= \!\!\min \; \left\{ (\varvec{B}\otimes \varvec{A}) \bullet \varvec{Y}\; \left| \; \begin{array}{l} \sum \nolimits _{i=1}^n \varvec{Y}^{ii} \!=\! \varvec{I}, \ \varvec{I}\bullet \varvec{Y}^{ij} \!=\! \delta _{ij} \ (1 \!\le \! i, \ j \!\le \! n), \\ \varvec{E}\bullet \varvec{Y}= n^2, \ \varvec{Y}\in \mathbb {C}^* \end{array} \right. \right\} \!.\quad \quad \end{aligned}$$
(34)

Here \(\varvec{Y}^{ij}\) corresponds to the \((i,j)\)-block in the expansion of \(\varvec{x}\varvec{x}^T\) as an \(n^2\times n^2\) matrix, and \(\delta \) denotes Kronecker's delta. A natural DNN relaxation of (34) is the following:

$$\begin{aligned} v^{(1)}&:= \min \; \left\{ (\varvec{B}\otimes \varvec{A}) \bullet \varvec{Y}\; \left| \; \begin{array}{l} \sum \nolimits _{i=1}^n \varvec{Y}^{ii} = \varvec{I}, \ \varvec{I}\bullet \varvec{Y}^{ij} = \delta _{ij} \ (1 \le i, \ j \le n), \\ \varvec{E}\bullet \varvec{Y}= n^2, \ \varvec{Y}\in \mathbb {S}^{n^2}_+\cap \mathbb {N}\end{array} \right. \right\} .\quad \quad \quad \end{aligned}$$
(35)
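
As a quick numerical sanity check of the vectorization identity \(\varvec{X}\bullet (\varvec{A}\varvec{X}\varvec{B}^T) = \varvec{x}^T(\varvec{B}\otimes \varvec{A})\varvec{x}\) underlying (32)–(35), one may run the following numpy snippet (the random data are purely illustrative):

rng = np.random.default_rng(1)
n = 4
A, B, X = rng.standard_normal((3, n, n))
x = X.reshape(-1, order='F')            # column-stacking vec(X) = [x_1; ...; x_n]
lhs = np.sum(X * (A @ X @ B.T))         # X . (A X B^T)
rhs = x @ np.kron(B, A) @ x             # x^T (B kron A) x
assert np.isclose(lhs, rhs)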

Using the fact that \(\{ \varvec{X}\in \mathbb {R}^{n\times n}\mid \varvec{X}^T\varvec{X}= \varvec{I}_n, \varvec{X}\ge 0\}\) completely characterizes the set of \(n\times n\) permutation matrices, we can show that (33) can equivalently be formulated as follows:

$$\begin{aligned} v^*&:= \min \; \left\{ \varvec{X}\bullet (\varvec{A}\varvec{X}\varvec{B}^T) \; \left| \; \begin{array}{l} \varvec{e}^T\varvec{X}\varvec{e}_i = 1 = \varvec{e}_i^T\varvec{X}\varvec{e}\ ( i=1,\ldots ,n), \\ X_{it}X_{jt} = 0 = X_{ti}X_{tj} \ (t=1,\ldots ,n, \; i\not =j ), \\ \varvec{X}\ge \varvec{O}\end{array} \right. \right\} .\quad \quad \quad \end{aligned}$$
(36)

By considering the Lagrangian–DNN relaxation (11) and its dual (12) for (36), we can generate a lower bound for \(v^*\), denoted by \(v_\mathrm{LB}^{(2)}\).

Table 4 presents the lower bounds and computation times corresponding to the relaxation problems (35) and the Lagrangian–DNN relaxation of (36) with \(\mathbb {K}= \mathbb {S}^{n^2}_+\cap \mathbb {N}\). The test problems were downloaded from [27]. We observe that the bounds \(v_\mathrm{LB}^{(2)}\) are slightly closer to \(v^{*}\) for the problems nug**, tai30*, and are comparable for the other problems. However, the lower bounds \(v_\mathrm{LB}^{(2)}\) can be computed much faster by Algorithm A than \(v_\mathrm{LB}^{(1)}\) by SDPNAL.

Table 4 Quadratic assignment problems: \(\lambda = 10^5\Vert \varvec{Q}_0\Vert /\Vert \varvec{H}_1\Vert \)
Table 5 Performance of PADM (Algorithm B with a maximum of 10,000 iterations) and SDPNAL in solving (9) with \(\mathbb {K}= \mathbb {S}^n_+\cap \mathbb {N}\)

5.5 Comparison of the DNN relaxation (9) and the Lagrangian–DNN relaxation (12)

To give the reader an idea of how difficult it is to solve problem (9) with \(\mathbb {K}=\mathbb {S}^n_+\cap \mathbb {N}\) accurately, we apply Algorithm B to the problem; the numerical results are presented in Table 5. In the table, the quantities \(R_p\) and \(R_d\) are the relative primal and dual infeasibilities for problems (8) and (9), respectively. As can be observed from the table, even after running Algorithm B for 10,000 iterations, the dual infeasibility at the final iterate is still quite large except for the maximum stable set problems. As a result, the corresponding dual objective value does not provide a valid lower bound for (8). However, by using the computed dual variables \(\hat{\varvec{y}}=(\hat{y}_0,\hat{y}_1)\), we can attempt to generate a valid lower bound as follows. Let \(\varvec{G}(y_0) = \varvec{Q}_0 -\hat{y}_1\varvec{H}_1 - y_0 \varvec{H}_0\). Set

$$\begin{aligned} v_\mathrm{LB}^{(3)} \!=\! \left\{ \begin{array}{ll} \max \{ y_0^j := \hat{y}_0(1+0.01\times j) \mid \varvec{G}(y_0^j) \!\in \! \mathbb {K}^*, j=0,1,\dots \} &{}\quad \hbox {if }\hat{y}_0 < 0\\ \max \{ y_0^j := \hat{y}_0(1-0.01\times j) \mid \varvec{G}(y_0^j) \!\in \! \mathbb {K}^*, j=0,1,\dots \} &{}\quad \hbox {if }\hat{y}_0 > 0\\ \max \{ y_0^j := -0.01\times j \mid \varvec{G}(y_0^j) \!\in \! \mathbb {K}^*, j=0,1,\dots \} &{}\quad \hbox {if }\hat{y}_0=0. \end{array}\right. \quad \end{aligned}$$
(37)
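
Since each candidate sequence in (37) is nonincreasing in \(j\), the maximum is attained at the first \(j\) for which \(\varvec{G}(y_0^j)\in \mathbb {K}^*\). A minimal sketch of the search follows, where in_dual_cone is a hypothetical membership test of the kind sketched after Algorithm C (checking that \(\Vert \Pi _{\mathbb {K}}(-\varvec{G}(y_0^j))\Vert \) is essentially zero):

def search_valid_y0(y0_hat, in_dual_cone, max_steps=10000):
    # Scans the decreasing candidates of (37); the first y0 with G(y0) in K*
    # attains the maximum and yields the valid lower bound v_LB^(3)
    for j in range(max_steps):
        if y0_hat < 0:
            y0 = y0_hat * (1 + 0.01 * j)
        elif y0_hat > 0:
            y0 = y0_hat * (1 - 0.01 * j)
        else:
            y0 = -0.01 * j
        if in_dual_cone(y0):
            return y0
    return None                         # no valid bound found within max_steps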

As we can observe from Table 5, it is generally difficult to solve (9) to high accuracy by Algorithm B or the solver SDPNAL, except for the maximum stable set problems. This is not surprising since, by Lemma 4(i), the optimal values of these problems are generally not attained. Moreover, the time taken to solve (9) and to subsequently generate the valid lower bound \(v_\mathrm{LB}^{(3)}\) by Algorithm B or SDPNAL is generally much longer than that taken to compute \(v_\mathrm{LB}^{(2)}\) based on (12) by Algorithm A. Worse still, the lower bound \(v_\mathrm{LB}^{(3)}\) generated is also inferior to \(v_\mathrm{LB}^{(2)}\), except for the maximum stable set problems.

6 Concluding remarks

We have presented a theoretical framework for the Lagrangian–conic relaxation of the QOP (1) and a computational method for the Lagrangian–DNN relaxation to improve the effectiveness and efficiency of obtaining the lower bounds for (1).

The theoretical results on the equivalence between the optimal value of the DNN relaxation of (1) and that of the Lagrangian–DNN relaxation shown in Sect. 3 provide the theoretical support for the tight lower bounds obtained numerically in Sect. 5. The computational efficiency of the proposed method has been achieved by the simple bisection method combined with efficient first-order methods, namely, the proximal alternating direction method of multipliers and the accelerated proximal gradient method.

As shown in Sect. 5, the proposed method can efficiently compute lower bounds of improved quality for the test problems. In particular, the quadratic assignment problem, known to be a very hard problem to solve, could be handled very efficiently.

The approach proposed in this paper for the QOP (1) can be further extended to solve a general class of polynomial optimization problems (POPs). Currently, the numerical methods based on SDPs for POPs suffer from numerical inefficiency caused by the high degrees and large dimensions of POPs. Some of these difficulties can be dealt with by the ideas proposed in this paper. See also [3]. We intend to work on this project in the future.