Keywords

Mathematics Subject Classification (2010):

1 Introduction

In mathematical programming, the problem of minimizing a nonconvex quadratic function over a sphere constraint is known as the trust region subproblem, which arises in trust region methods [1, 2]. Here, we formulate it as

$$\displaystyle{ (\mathcal{P})\ \ \ \ \min \{P(\boldsymbol{x}) =\boldsymbol{ x}^{T}\boldsymbol{Q}\boldsymbol{x} - 2\boldsymbol{f}^{T}\boldsymbol{x}\ \vert \ \boldsymbol{x} \in \mathcal{X}_{ a}\} }$$
(1)

where the given matrix \(\boldsymbol{Q} \in \mathbb{R}^{n\times n}\) is assumed to be symmetric, \(\boldsymbol{f} \in \mathbb{R}^{n}\) is a given vector, and the feasible region is defined as

$$\displaystyle{ \mathcal{X}_{a} = \left \{\boldsymbol{x} \in \mathbb{R}^{n}\ \vert \ \|\boldsymbol{x}\| \leq r\right \}, }$$
(2)

in which r is a positive real number and \(\|\boldsymbol{x}\| =\|\boldsymbol{ x}\|_{2}\) represents 2 norm in \(\mathbb{R}^{n}\). In literatures, two types of similar problems are also discussed: one considers a general quadratic constraint, i.e. the convexity is removed [3, 4]; the other one is equipped with a two-sided (lower and upper bound) quadratic constraint [5, 6].

It is proved that the problem (\(\mathcal{P}\)) possesses hidden convexity, i.e. it is actually equivalent to a convex optimization problem [5]. Thus, if the vector \(\bar{\boldsymbol{x}}\) is a solution of (\(\mathcal{P}\)), there exists a Lagrange multiplier \(\bar{\mu }\) such that besides the KKT conditions

$$\displaystyle{ (\boldsymbol{Q} +\bar{\mu }\boldsymbol{ I})\bar{\boldsymbol{x}} =\boldsymbol{ f},\ \ \|\bar{\boldsymbol{x}}\| \leq r,\ \ \bar{\mu } \geq 0,\ \ \bar{\mu }(\|\bar{\boldsymbol{x}}\| - r) = 0, }$$
(3)

we also have [7]

$$\displaystyle{ \boldsymbol{Q} +\bar{\mu }\boldsymbol{ I}\succeq 0, }$$
(4)

which demonstrates the hidden convexity. If we let λ 1 be the smallest eigenvalue of the matrix \(\boldsymbol{Q}\), from conditions (3) and (4), it is true that \(\bar{\mu }\geq \max \{ 0,-\lambda _{1}\}\). If the problem (\(\mathcal{P}\)) has no solution on the boundary of \(\mathcal{X}_{a}\), then \(\boldsymbol{Q}\) must be positive definite and \(\|\boldsymbol{Q}^{-1}\boldsymbol{f}\| < r\), which leads to \(\bar{\mu }= 0\). While if (\(\mathcal{P}\)) has a solution on the boundary of \(\mathcal{X}_{a}\) and \((\boldsymbol{Q} +\bar{\mu }\boldsymbol{ I}) \succ 0\), we have \(\|(\boldsymbol{Q} +\bar{\mu }\boldsymbol{ I})^{-1}\boldsymbol{f}\| = r\). In this case, the multiplier \(\bar{\mu }\) can be easily found. However, if the solution \(\bar{\boldsymbol{x}}\) is located on the boundary of \(\mathcal{X}_{a}\) and \(\mathrm{det}(\boldsymbol{Q} +\bar{\mu }\boldsymbol{ I}) = 0\), this situation is the so-called hard case (see [8]), which leads to numerical difficulties [913]. As pointed in [9, 1214], the hard case always implies that \(\boldsymbol{f}\) is perpendicular to the subspace generated by all the eigenvectors corresponding to λ 1. We will show by Theorem 3 in this paper that this condition is only a necessary condition for the hard case. Many methods have been proposed for solving this spherical constrained quadratic minimization problem, especially focusing on the hard case. They include Newton type methods [8, 15], methods recasting the problem in terms of a parameterized eigenvalue problem [12, 13], methods sequential searching Krylov subspaces [14, 16], semidefinite programming methods [9, 11], and the D.C. (difference of convex functions) method [17].

The canonical duality theory was developed from Gao and Strang’s original work [18] for solving the nonconvex/nonsmooth variational problems. It is a powerful methodological theory which has been used successfully for solving a large class of difficult problems (nonconvex, nonsmooth or discrete) in global optimization (see [19, 20]) within a unified framework. This theory is mainly comprised of (1) a canonical dual transformation, which can be used to reformulate nonconvex/discrete problems from different systems as a unified canonical dual problem without duality gaps; (2) a complementary-dual principle, which provides a unified analytical solution form in terms of the canonical dual variable; and (3) a triality theory, which can be used to identify both global and local extrema.

The goal of this paper is to apply the canonical dual approach to find global solutions for the problem (\(\mathcal{P}\)), especially when it is in the hard case. We first show in the next section that the canonical dual problem is canonically (i.e., perfectly) dual to \((\mathcal{P})\) in the sense that both problems have the same set of KKT solutions. Then sufficient and necessary conditions are provided for identifying global optimal solutions. In Sect. 3, a perturbation method is proposed for problems in the hard case. Numerical results are presented in Sect. 4. The paper is ended with some conclusion remarks.

2 Canonical Dual Problem

Let \(\mathcal{S}_{a} =\{\sigma \in \mathbb{R}\ \vert \ \sigma \geq 0,\ \mathrm{det}\boldsymbol{G}(\sigma )\neq 0\;\}\), where \(\boldsymbol{G}(\sigma ) =\boldsymbol{ Q} +\sigma \boldsymbol{ I}\). The canonical dual function \(P^{d}: \mathcal{S}_{a} \rightarrow \mathbb{R}\) is defined by

$$\displaystyle{ P^{d}(\sigma ) = -\boldsymbol{f}^{T}\boldsymbol{G}(\sigma )^{-1}\boldsymbol{f} - r^{2}\sigma, }$$
(5)

and the stationary canonical dual problem [21] is defined by

$$\displaystyle{ \mathrm{sta}\{P^{d}(\sigma )\ \vert \;\sigma \in \mathcal{S}_{ a}\}. }$$
(6)

Theorem 1 (Analytical Solution and Complementary-Dual Principle [20, 22]).

The problem(6)is canonically dual to the problem ( \(\mathcal{P}\) ) in the sense that if \(\bar{\sigma }\in \mathcal{S}_{a}\) is a critical point of P d (σ), then \(\bar{\boldsymbol{x}} =\boldsymbol{ G}_{a}(\bar{\sigma })^{-1}\boldsymbol{f}\) is a KKT point of the primal problem ( \(\mathcal{P}\) ), and we have \(P(\bar{\boldsymbol{x}}) = P^{d}(\bar{\sigma })\) .

Here, we focus the discussion on global optimal solutions and define the canonical dual problem to \((\mathcal{P})\) as the following maximization problem:

$$\displaystyle{ (\mathcal{P}^{d})\ \ \ \ \max \{P^{d}(\sigma )\ \vert \ \sigma \in \mathcal{S}_{ a}^{+}\}, }$$
(7)

where \(\mathcal{S}_{a}^{+} = \left \{\sigma \in \mathcal{S}_{a}\ \vert \ \boldsymbol{G}(\sigma ) \succ \mathbf{0}\right \}\).

Theorem 2 (Global Optimality Condition [20, 22]).

If \(\bar{\sigma }\in \mathcal{S}_{a}^{+}\) is a critical point of P d (σ), then \(\bar{\sigma }\) is a global maximal solution of the problem ( \(\mathcal{P}^{d}\) ) and \(\bar{\boldsymbol{x}} =\boldsymbol{ G}(\bar{\sigma })^{-1}\boldsymbol{f}\) is a global minimal solution of the primal problem ( \(\mathcal{P}\) ), i.e. \(P(\bar{\boldsymbol{x}}) =\min _{\boldsymbol{x}\in \mathcal{X}_{a}}P(\boldsymbol{x}) =\max _{\sigma \in \mathcal{S}_{a}^{+}}P^{d}(\sigma ) = P^{d}(\bar{\sigma })\) .

According to the triality theorem [22, 23], the global optimality condition is called canonical min–max duality. Similar results are also discussed by Corollary 5.3 in [6] and Theorem 1 in [11].

By the symmetry of the matrix \(\boldsymbol{Q}\), there exist diagonal matrix \(\Lambda \) and orthogonal matrix \(\boldsymbol{U}\) such that \(\boldsymbol{Q} =\boldsymbol{ U}\Lambda \boldsymbol{U}^{T}\). The diagonal entities of \(\Lambda \) are the eigenvalues of the matrix Q and are arranged in nondecreasing order, \(\lambda _{1} = \cdots =\lambda _{k} <\lambda _{k+1} \leq \cdots \leq \lambda _{n}.\) The columns of \(\boldsymbol{U}\) are corresponding eigenvectors. Let \(\hat{\boldsymbol{f}} =\boldsymbol{ U}^{T}\boldsymbol{f}\).

Theorem 3 (Existence Conditions [24]).

Suppose that for any given symmetric matrix \(\boldsymbol{Q} \in \mathbb{R}^{n\times n}\) and vector \(\boldsymbol{f} \in \mathbb{R}^{n}\) , λ i and \(\hat{f}_{i}\) are defined as above. The canonical dual problem \((\mathcal{P}^{d})\) has a unique solution \(\bar{\sigma }\in (-\lambda _{1},+\infty )\) if and only if either \(\sum _{i=1}^{k}\hat{f}_{i}^{2}\neq 0\) or \(\sum _{i=k+1}^{n} \frac{\hat{f}_{i}^{2}} {(\lambda _{i}-\lambda _{1})^{2}} > r^{2}\) . If λ 1 ≤ 0, \(\bar{\boldsymbol{x}} =\boldsymbol{ G}(\bar{\sigma })^{-1}\boldsymbol{f}\) is a unique global solution of the problem ( \(\mathcal{P}\) ). Moreover, given that ( \(\mathcal{P}^{d}\) ) has no solutions in \((-\lambda _{1},+\infty )\) , the problem \((\mathcal{P})\) has exactly two global solutions if the multiplicity of λ 1 is k = 1 and infinite number of solutions if k > 1.

Generally speaking, the case that the canonical dual problem \((\mathcal{P}^{d})\) has no critical point in \(\mathcal{S}_{a}^{+}\) does not imply that the problem \((\mathcal{P})\) is in the hard case. For example, if λ 1 > 0, i.e. the matrix \(\boldsymbol{Q}\) is positive definite, \((\mathcal{P}^{d})\) may have no critical point in \(\mathcal{S}_{a}^{+} = [0,+\infty )\). But, the matrix \(\boldsymbol{G}\) is not singular and therefore, the problem is not in the hard case. If λ 1 ≤ 0, the hard case of \((\mathcal{P})\) is equivalent to that \((\mathcal{P}^{d})\) has no critical point in \(\mathcal{S}_{a}^{+} = (-\lambda _{1},+\infty )\). In this case, \(P^{d}(-\lambda _{1}) =\sup \{ P^{d}(\sigma )\vert \;\sigma \in \mathcal{S}_{a}^{+}\}\). The theorem can also be stated equivalently as: If \(\lambda _{1} \leq 0\) , the nonconvex problem \((\mathcal{P})\) is in the hard case if and only if \(\sum _{i=1}^{k}\hat{f}_{i}^{2} = 0\) and \(\sum _{i=k+1}^{n} \frac{\hat{f}_{i}^{2}} {(\lambda _{i}-\lambda _{1})^{2}} \leq r^{2}\) . The first condition indicates that a problem could be in the hard case only when the coefficient \(\boldsymbol{f}\) is perpendicular to the subspace generated by eigenvectors of the smallest eigenvalue. The second one adds that if the norm of \(\boldsymbol{f}\) is relative small, comparing to the radius r and differences between the smallest eigenvalue and all other eigenvalues, the hard case would happen.

3 Perturbation Methods

In order to reinforce the existence conditions, a perturbation \(\sum _{i=1}^{k}\alpha _{i}\boldsymbol{U}_{i}\) to \(\boldsymbol{f}\) with parameters \(\boldsymbol{\alpha }=\{\alpha _{i}\}_{i=1}^{k}\neq 0\) is introduced. Let \(\boldsymbol{p} =\boldsymbol{ f} +\sum _{ i=1}^{k}\alpha _{i}\boldsymbol{U}_{i}\) and \(\hat{\boldsymbol{p}} =\boldsymbol{ U}^{T}\boldsymbol{p}\). The perturbed problem is

$$\displaystyle{ (\mathcal{P}_{\alpha })\ \ \ \ \min \{P_{\alpha }(\boldsymbol{x}) =\boldsymbol{ x}^{T}\boldsymbol{Q}\boldsymbol{x} - 2\boldsymbol{p}^{T}\boldsymbol{x}\ \vert \ \boldsymbol{x} \in \mathcal{X}_{ a}\}. }$$
(8)

It is true that the existence condition, \(\sum _{i=1}^{k}\hat{p}_{i}^{2}\neq 0\), holds for the perturbed problem.

Theorem 4.

[24]Suppose that λ 1 ≤ 0, there is no critical point of P d (σ) in \(\mathcal{S}_{a}^{+}\) , and \(\bar{\boldsymbol{x}}^{{\ast}}\) is the optimal solution of the problem ( \(\mathcal{P}_{\alpha }\) ). Then, there is a global solution of the problem ( \(\mathcal{P}\) ), denoted as \(\bar{\boldsymbol{x}}\) , which is on the boundary of \(\mathcal{X}_{a}\) and, for any \(\varepsilon > 0\) , if the parameter \(\boldsymbol{\alpha }\) satisfies

$$\displaystyle{ \|\boldsymbol{\alpha }\|^{2} \leq (\lambda _{ 2} -\lambda _{1})^{2}\left (r^{2} -\sum _{ i=k+1}^{n} \frac{\hat{f}_{i}^{2}} {(\lambda _{i} -\lambda _{1})^{2}}\right )(1/\sqrt{2(1 -\cos (\varepsilon /r))} - 1)^{-2}, }$$
(9)

we have \(\|\bar{\boldsymbol{x}}^{{\ast}}-\bar{\boldsymbol{ x}}\| \leq \varepsilon\) .

Table 1 General case and \(\alpha = 1e - 3\)

Actually, if the perturbation parameter \(\boldsymbol{\alpha }\) is properly chosen, each solution of the problem (\(\mathcal{P}\)) can be approximated. When the multiplicity of λ 1 is equal to one, as stated in Theorem 3, there are exactly two global solutions. In this case, \(\boldsymbol{\alpha }\) becomes a scalar and has exactly two possible directions, which are mutual opposite and, respectively, lead to the two global solutions. For general cases, there may be infinite number of global solutions for the problem (\(\mathcal{P}\)), and we can show that between solutions of the problem (\(\mathcal{P}\)) and directions of \(\boldsymbol{\alpha }\) there is a one-to-one correspondence [24].

4 Numerical Results

Based on the perturbation method discussed in the previous section, a canonical primal-dual algorithm is developed [24], which is matrix inverse free and the essential cost of calculation is only the matrix-vector multiplication.

One hundred examples are randomly generated, containing 50 examples of the general case and 50 examples of the hard case. Both cases have ten examples for dimensions of 500, 1,000, 2,000, 3,000, and 5,000. All elements of the coefficients, \(\boldsymbol{Q}\), \(\boldsymbol{f}\) and r, are integer numbers in [−100, 100]. For each example of the hard case, in order to make \(\boldsymbol{f}\) can be easily chosen, we use a matrix \(\boldsymbol{Q}\) of whom the multiplicity of the smallest eigenvalue is equal to one. Then, the vector \(\boldsymbol{f}\) is constructed such that it is perpendicular to the eigenvector of the smallest eigenvalue, and a proper radius \(r\) is selected such that the existence conditions are violated.

For the hard case, a perturbation \(\alpha \boldsymbol{U}_{1}\) is added to the vector \(\boldsymbol{f}\), where \(\boldsymbol{U}_{1}\) is the eigenvector of the smallest eigenvalue, and two values of α, 1e−3 and 1e−4, are tried. The algorithm is implemented on Matlab 7.13, which was runned in the platform with a Linux 64-bit system and a quad-core CPU.

Results are shown in Tables 123, and 4, and they contain the number of examples which are successfully solved (Succ.Solv.), the distance of the optimal solution to the boundary of the sphere (Dist.Boun.), the number of iterations in Algorithm: Main (Numb.Iter.) and the running time (in second) of the algorithm (Runn.Time). The values in the columns of Dist.Boun., Numb.Iter. and Runn.Time are averages of the examples successfully solved. We compare the results of the algorithm adopting “left division” and that of the algorithm adopting “quadprog” in the same table, where LD denotes “left division” and QP denotes “quadprog.”

Table 2 General case and \(\alpha = 1e - 4\)
Table 3 Hard case and \(\alpha = 1e - 3\)
Table 4 Hard case and \(\alpha = 1e - 4\)

We can see that the examples are solved very accurately with error allowance being less than 1e-09. The failure in solving some examples is due to “left division” and “quadprog” being unable to handle very nearly singular matrices. For general cases, all the examples can be solved within no more than 30 iterations, while for hard cases, the number of iterations is around 40. From the running time, we notice that our method is capable to handle very large problems in reasonable time. The algorithms using “left division” and “quadprog” have similar performances in the accuracy and the number of iterations. The one using “left division” needs much less time than that of the one using “quadprog”, but “quadprog” is able to solve more examples successfully.

5 Conclusion Remarks

We have presented a detailed study on the quadratic minimization problem with a sphere constraint. By the canonical duality, this nonconvex optimization is equivalent to a concave maximization dual problem over a convex domain \(\mathcal{S}_{a}^{+}\), which is true also for many other global optimization problems (see [2531]). Based on this canonical dual problem, sufficient and necessary conditions are obtained for both general and hard cases. In order to solve hard-case problems, a perturbation method and the associated algorithm are proposed. Numerical results demonstrate that the proposed approach is able to handle the problem effectively. Combining with the trust region method, the results presented in this paper can be used to solve general global optimizations.