1 Introduction

We consider the following quadratic minimization problem:

$$\begin{aligned} ({\mathscr {P}})~~~~\min&~~ P({x})={x}^T\varvec{Q}{x}-2\varvec{f}^T{x}\\ \text {s.t.}&~~ {x}\in {\mathscr {X}}_a , \nonumber \end{aligned}$$

where the given matrix \(\varvec{Q}\in \mathbb {R}^{n\times n}\) is assumed to be symmetric, \(\varvec{f}\in \mathbb {R}^n\) is an arbitrarily given vector, and the feasible region is defined as

$$ {\mathscr {X}}_a =\left\{ {x}\in \mathbb {R}^n~|~\Vert {x}\Vert \le r\right\} ,\nonumber $$

with r being a positive real number and \(\Vert {x}\Vert = \Vert {x}\Vert _2\) representing \(\ell _2\) norm in \(\mathbb {R}^n\).

Problem \(({\mathscr {P}})\) arises naturally in computational mathematical physics with extensive applications in engineering sciences. From the point of view of systems theory, if the vector \(\varvec{f}\in \mathbb {R}^n\) is considered as an input (or source), then the solution \({x}\in \mathbb {R}^n\) is referred to as the output (or state) of the system. Since the capacity of any given system is limited, the spherical constraint in \({\mathscr {X}}_a\) is naturally required for virtually every real-world system. For example, in engineering structural analysis, if the applied force \(\varvec{f}\in \mathbb {R}^\infty \) is large enough, the stress distribution in the structure will reach its elastic limit and the structure will collapse. For elasto-perfectly plastic materials, the well-known von Mises yield condition is a nonlinear inequality constraint \(\Vert {x}\Vert _2 \le r\) imposed on each material point (see Chap. 7, [1]). By the finite element method, the variational problem in structural limit analysis can be formulated as a large-scale nonlinear optimization problem with m quadratic inequality constraints (m depends on the total number of finite elements). Such problems have been studied extensively in computational mechanics for more than fifty years, and the so-called penalty-duality finite element programming [2, 3] is one of the well-developed efficient methods for solving this type of problem in engineering sciences.

In mathematical programming, the problem \(({\mathscr {P}})\) is known as a trust region subproblem, which arises in trust region methods [4, 5]. In the literature, two similar problems are also discussed: in [6,7,8], the convexity of the quadratic constraint is removed; while in [9, 10], the constraint is replaced by a two-sided (lower and upper bounded) quadratic constraint. Although the function \(P({x})\) may be nonconvex, it is proved that the problem (\({\mathscr {P}}\)) possesses hidden convexity, i.e., (\({\mathscr {P}}\)) is actually equivalent to a convex optimization problem [10], and for each optimal solution \(\bar{\varvec{x}}\), there exists a Lagrange multiplier \(\bar{\mu }\) such that the following conditions hold [11]:

$$\begin{aligned}&(\varvec{Q}+\bar{\mu }\varvec{I})\bar{\varvec{x}}=\varvec{f}, \end{aligned}$$
(1)
$$\begin{aligned}&\varvec{Q}+\bar{\mu }\varvec{I}\succeq 0, \end{aligned}$$
(2)
$$\begin{aligned}&\Vert \bar{\varvec{x}}\Vert \le r , \;\;\; \bar{\mu }\ge 0 , \;\;\; \bar{\mu }(\Vert \bar{\varvec{x}}\Vert -r)= 0 . \end{aligned}$$
(3)

Let \(\lambda _1\) be the smallest eigenvalue of the matrix \(\varvec{Q}\). From conditions (2) and (3), we have

$$ \bar{\mu }\ge \max \{0,-\lambda _1\}. $$

If the problem (\({\mathscr {P}}\)) has no solutions on the boundary of \({\mathscr {X}}_a\), then \(\varvec{Q}\) must be positive definite, and \(\Vert \varvec{Q}^{-1}\varvec{f}\Vert <r\), which leads to \(\bar{\mu }=0\). Now suppose the solution \(\bar{\varvec{x}}\) is on the boundary of \({\mathscr {X}}_a\). If \((\varvec{Q}+\bar{\mu }\varvec{I})\succ 0\), we have \(\Vert (\varvec{Q}+\bar{\mu }\varvec{I})^{-1}\varvec{f}\Vert =r\) and the multiplier \(\bar{\mu }\) can be easily found. If, however, \(\det (\varvec{Q}+\bar{\mu }\varvec{I}) = 0\), the problem becomes very challenging to solve [12,13,14,15,16] and the situation is referred to as the ‘hard case’ (see [17]). Mathematically speaking, when the problem is in the hard case, there are multiple solutions for the equation \((\varvec{Q}+\bar{\mu }\varvec{I}){x}=\varvec{f}\) and they are of the form \({x}=(\varvec{Q}+\bar{\mu }\varvec{I})^\dagger \varvec{f}+\tau \tilde{\varvec{x}}\) with \((\varvec{Q}+\bar{\mu }\varvec{I})\tilde{\varvec{x}}=0\). As pointed out in [12, 15, 16, 18], the hard case always implies that \(\varvec{f}\) is perpendicular to the subspace generated by all the eigenvectors corresponding to \(\lambda _1\). We show by Theorem 3 and Example 2 in this paper that this condition is only a necessary condition for the problem being in the hard case. Many methods have been proposed for handling the problem (\({\mathscr {P}}\)), especially focusing on the hard case: Newton type methods [17, 19], methods recasting the problem in terms of a parameterized eigenvalue problem [12, 15], methods sequentially searching Krylov subspaces [18, 20], semidefinite programming methods [13, 16], and the D.C. (difference of convex functions) method [21].
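To make the non-hard boundary case concrete, the following is a minimal sketch (not a method taken from the cited references) that locates the multiplier by bisection on the scalar equation \(\Vert (\varvec{Q}+\mu \varvec{I})^{-1}\varvec{f}\Vert =r\). It assumes that the eigenvalues of \(\varvec{Q}\) and the coordinates of \(\varvec{f}\) in the eigenbasis are already available; the function name `boundary_multiplier` and its input format are hypothetical.

```python
def boundary_multiplier(eigs, f_hat, r, tol=1e-12):
    """Find mu > max(0, -lambda_1) with ||(Q + mu I)^{-1} f|| = r.

    eigs  : eigenvalues of Q in nondecreasing order (assumed input format)
    f_hat : coordinates of f in the corresponding eigenbasis
    Assumes the non-hard boundary case: the norm exceeds r just to the right
    of the lower end of the search interval, so a root exists.
    """
    def norm_sq(mu):
        # ||x(mu)||^2 with x(mu) = (Q + mu I)^{-1} f, written in the eigenbasis
        return sum(fi * fi / (li + mu) ** 2 for li, fi in zip(eigs, f_hat))

    lo = max(0.0, -eigs[0])      # never evaluated: it may be a pole of norm_sq
    hi = lo + 1.0
    while norm_sq(hi) > r * r:   # norm_sq decays to 0, so this loop terminates
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:         # plain bisection on the decreasing function
        mid = 0.5 * (lo + hi)
        if norm_sq(mid) > r * r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative data (not from the paper): Q = diag(-1, 2), f = (1, 1), r = 1.
mu = boundary_multiplier([-1.0, 2.0], [1.0, 1.0], 1.0)
x_hat = [fi / (li + mu) for li, fi in zip([-1.0, 2.0], [1.0, 1.0])]
```

Bisection is used here only because the squared norm is monotonically decreasing to the right of \(\max \{0,-\lambda _1\}\); the Newton type methods cited above would converge faster.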

Canonical duality theory is a powerful methodological theory which has been used successfully for solving a large class of difficult (nonconvex, nonsmooth, and discrete) problems in global optimization (see [22, 23]), within a unified framework. This theory is mainly comprised of (1) a canonical dual transformation, which can be used to reformulate nonconvex/discrete problems from different systems as a unified canonical dual problem without duality gaps; (2) a complementary-dual principle, which provides a unified analytical solution form in terms of the canonical dual variable; and (3) a triality theory, which is composed of canonical min–max duality, double-min duality, and double-max duality. The canonical min–max duality can be used to find a global optimal solution for the primal problem, while the double-min and double-max dualities can be used to identify the biggest local minimizer and the biggest local maximizer, respectively.

The canonical duality-triality theory was developed from Gao and Strang’s original work [24], which discusses the nonconvex/nonsmooth variational problem

$$\begin{aligned} \min \{ P({u}) = W(\varvec{D}{u}) + F({u}) \} , \end{aligned}$$
(4)

where the variational argument \({u}\) is a continuous function in an infinite-dimensional space, \(\varvec{D}\) is a linear operator, \(W(w)\) is the stored energy, which is an objective functional and depends only on the mathematical model, and \( F({u})\) is the external energy, which is a “subjective” functional and depends on each problem (boundary-initial conditions). It is well known in nonlinear analysis [25] and continuum physics (see [1], p. 288) that a real-valued function \(W(w)\) is called objective only if \(W(w)\) satisfies the frame-invariance principle, i.e., \(W(w) = W(\varvec{R}w)\) for any rotation matrix \(\varvec{R}\) such that \(\varvec{R}^T = \varvec{R}^{-1}\) and \(\det \varvec{R}= 1\). It was emphasized in [25] that objectivity is not an assumption but an axiom. This means that the objective function depends only on the constitutive property of the system. Geometrically speaking, the objective function should be invariant under orthogonal transformations. This concept lays a foundation for the canonical duality theory, i.e., instead of the design variable \({u}\) (the linear operator \(\varvec{D}\) cannot change the nonconvexity of \(W(\varvec{D}{u})\)), the canonical dual transformation is to choose a geometrically admissible (say objective) measure \(\varvec{\xi }= \varLambda ({u})\) and a convex function \(V(\varvec{\xi })\) such that \(W(\varvec{D}{u}) = V(\varLambda ({u}))\) and the duality relation \(\varvec{\xi }^* = \nabla V(\varvec{\xi })\) is invertible. Such one-to-one duality is called the canonical duality. The simplest objective measure is the squared \(\ell _2\) norm \(\varLambda ({u}) = {u}^T {u}\) since \(\varLambda (\varvec{R}{u}) = \varLambda ({u})\). Thus, the objective function \(W(w)\) cannot be linear. On the other hand, the so-called subjective function \( F({u})\) depends on input (such as external force, market demand, cost/price, etc.) and boundary-initial constraints for each problem, which must be linear. Therefore, the combination of \(W(w)\) and \( F({u})\) can be used to model general problems in complex systems [1, 27]. Using numerical discretization (say, the finite element method) for the unknown variable \({u}({x})\), the general variational problem (4) becomes a very general global optimization problem in finite dimensional space (see [2, 28]). This is the basic reason why the canonical duality theory can be used for solving a large class of problems from different fields. However, the objective function in mathematical programming has often been conflated with other concepts such as cost, target, utility, and energy functions. As a result, the canonical duality theory has been challenged (cf. [29]) by counterexamples that use a linear \(W(w)\) and a nonlinear \(F({u})\), the opposite of the setting above (see [30]). These conceptual mistakes show a big gap between mathematical physics and optimization.

The goal of this paper is to find global solutions for the problem (\({\mathscr {P}}\)), especially when it is in the hard case. We first show in the next section that by the canonical dual transformation, this constrained nonconvex problem can be reformulated as a one-dimensional optimization problem. The complementary-dual principle shows that this one-dimensional problem is canonically dual to \(({\mathscr {P}})\) in the sense that both problems have the same set of KKT solutions. Meanwhile, the canonical min–max duality in the triality theory provides a necessary and sufficient condition for identifying global optimal solutions. In order to solve the hard case, a perturbation method is proposed in Sect. 4 and, accordingly, a canonical primal-dual algorithm is developed in Sect. 5. Numerical results are presented in Sect. 6. The paper ends with some concluding remarks.

2 Canonical Dual Problem

Since the condition \(\Vert {x}\Vert \le r\) is a physical constraint (required by the mathematical model), it must be written in canonical form. Therefore, instead of the \(\ell _2\) norm, the canonical dual transformation introduces a quadratic (objective) measure \(\xi = \varLambda ({x}) = {x}^T {x}: \mathbb {R}^n \rightarrow {\mathscr {E}}_a = \{ \xi \in \mathbb {R}| \;\; \xi \ge 0 \}\) and a convex function \(V:{\mathscr {E}}_a \rightarrow \mathbb {R}\cup \{ +\infty \}\)

$$ V(\xi ) = \left\{ \begin{array}{ll} 0 &{} \text{ if } \xi \le r^2 ,\\ + \infty &{} \text{ otherwise } \end{array} \right. \nonumber $$

such that the constrained problem \(({\mathscr {P}})\) can be written equivalently in the following canonical form [22, 26, 27, 31]

$$ \min \big \{ \varPi ({x}) = V(\varLambda ({x})) - U({x}) ~|~ {x}\in \mathbb {R}^n \big \} ,\nonumber $$

where \(U({x}) = - {x}^T \varvec{Q}{x}+ 2 \varvec{f}^T {x}\). By the Fenchel transformation, the conjugate of \(V(\xi )\) can be uniquely defined as

$$ V^*(\sigma ) = \sup \{ \xi \sigma - V(\xi ) ~|~ \xi \in {\mathscr {E}}_a \} = \left\{ \begin{array}{ll} r^2 \sigma &{} \text{ if } \sigma \ge 0 ,\\ + \infty &{} \text{ otherwise }. \end{array} \right. \nonumber $$

Clearly, \(V^*(\sigma )\) is convex, lower semi-continuous on \({\mathscr {E}}^*_a = \mathbb {R}\). According to convex analysis [32], we have the following equivalent relations on \({\mathscr {E}}_a \times {\mathscr {E}}^*_a\):

$$ \sigma \in \partial V(\xi ) \; \; \Longleftrightarrow \;\; \xi \in \partial V^*(\sigma ) \; \; \Longleftrightarrow \;\; V(\xi ) + V^*(\sigma ) = \xi \sigma . $$

By the canonical duality theory, a pair \((\xi , \sigma )\) satisfying the above relations is called a (generalized) canonical duality pair (see [31] and Remark 1 in [22]). Clearly, these canonical duality relations are equivalent to

$$ \xi - r^2 \le 0 , \;\; \sigma \ge 0 ,\;\; \sigma (\xi - r^2) = 0 . $$

This shows that the KKT conditions in (3) are equivalently relaxed by one of the canonical duality relations above. Replacing \(V(\xi )\) in \(\varPi ({x})\) by the Fenchel-Young equality \(V(\xi ({x})) = \xi ({x}) \sigma - V^*(\sigma )\), the Gao-Strang total complementary function can be naturally obtained as [26, 27]:

$$ \varXi ({x}, \sigma ) = \xi ({x}) \sigma - V^*(\sigma ) -U({x}) = {x}^T \varvec{G}(\sigma ) {x}- 2 \varvec{f}^T {x}-V^*(\sigma ), $$

where \(\varvec{G}(\sigma )=\varvec{Q}+\sigma \varvec{I}\). Let

$$ {\mathscr {S}}_a=\{\sigma \in \mathbb {R}\;|\;\sigma \ge 0,~ \det \, \varvec{G}(\sigma ) \ne 0 \; \} $$

be a canonical dual feasible space. Then for any given \(\sigma \in {\mathscr {S}}_a\), the canonical dual function \(P^d: {\mathscr {S}}_a \rightarrow \mathbb {R}\) can be defined by

$$ P^d(\sigma )= \mathrm{sta}\big \{ \varXi ({x}, \sigma ) \;| \; {x}\in \mathbb {R}^n \big \} = -\varvec{f}^T\varvec{G}(\sigma )^{-1}\varvec{f}-r^2\sigma , $$

where the notation \(\mathrm{sta}\{ \varXi ({x}, \sigma )\; | \; {x}\in \mathbb {R}^n\}\) stands for computing stationary points of \(\varXi ({x},\sigma )\) with respect to \({x}\). Therefore, the stationary canonical dual problem is to find KKT points \(\bar{\sigma }\) of \(P^d(\sigma )\) such that [33]

$$ P^d(\bar{\sigma }) = \mathrm{sta}\{ P^d(\sigma ) \;| \; \sigma \in {\mathscr {S}}_a \} . $$
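As a brief sketch of how the closed form of \(P^d(\sigma )\) above arises (using only the definitions of \(\varXi \) and \(\varvec{G}(\sigma )\), with \(V^*(\sigma )=r^2\sigma \) for \(\sigma \ge 0\); the notation \({x}(\sigma )\) is introduced here for convenience): for a fixed \(\sigma \in {\mathscr {S}}_a\), stationarity of \(\varXi \) in \({x}\) gives

$$ \nabla _{{x}}\varXi ({x},\sigma )=2\varvec{G}(\sigma ){x}-2\varvec{f}=0 \;\;\Longrightarrow \;\; {x}(\sigma )=\varvec{G}(\sigma )^{-1}\varvec{f}, $$

and substituting back yields

$$ \varXi ({x}(\sigma ),\sigma )=\varvec{f}^T\varvec{G}(\sigma )^{-1}\varvec{f}-2\varvec{f}^T\varvec{G}(\sigma )^{-1}\varvec{f}-r^2\sigma =-\varvec{f}^T\varvec{G}(\sigma )^{-1}\varvec{f}-r^2\sigma . $$

Differentiating with \(\frac{d}{d\sigma }\varvec{G}(\sigma )^{-1}=-\varvec{G}(\sigma )^{-2}\) then gives

$$ (P^d(\sigma ))'=\varvec{f}^T\varvec{G}(\sigma )^{-2}\varvec{f}-r^2=\Vert {x}(\sigma )\Vert ^2-r^2 , $$

so a critical point of \(P^d(\sigma )\) with \(\sigma >0\) corresponds to a point \({x}(\sigma )\) on the sphere \(\Vert {x}\Vert =r\).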

We emphasize that \(P^d(\sigma )\) is a function of a scalar variable \(\sigma \in {\mathscr {S}}_a \subset \mathbb {R}\), regardless of the dimension of the primal problem, and that the condition \(\det \, \varvec{G}(\sigma ) \ne 0 \) is actually not a constraint (the Lagrange multiplier for this condition is zero). Therefore, the KKT points of this canonical dual problem are much easier to obtain than those of the primal problem. By the canonical duality theory, we have the following theorem.

Theorem 1.

(Analytical Solution and Complementary-Dual Principle [33]) Suppose that the symmetric matrix \(\varvec{Q}\) has m (\(\le n\)) distinct eigenvalues \(\lambda _i, i = 1, \dots , m \) and \(i_d \le m\) of them are strictly negative such that \(\lambda _1< \lambda _2< \dots< \lambda _{i_d}< 0 \le \lambda _{i_d + 1}< \dots < \lambda _m \). Then for a given vector \(\varvec{f}\in \mathbb {R}^n\) and a sufficiently large \(r > 0\), the stationary canonical dual problem above has at most \(2i_d + 1\) KKT points \(\bar{\sigma }_i\) satisfying

$$ \bar{\sigma }_1> - \lambda _1> \bar{\sigma }_2 \ge \bar{\sigma }_3> - \lambda _2> \dots> - \lambda _{i_d}> \bar{\sigma }_{2i_d} \ge \bar{\sigma }_{2 i_d + 1} > 0 . $$

For each \(\bar{\sigma }_i, \;\; i = 1, \dots , 2 i_d + 1\), the vector

$$\begin{aligned} \bar{{x}} _i =\varvec{G}(\bar{\sigma }_i )^{-1}\varvec{f}\end{aligned}$$
(5)

is a KKT point of the primal problem (\({\mathscr {P}}\)), and we have

$$ P(\bar{{x}}_j ) \ge P(\bar{{x}}_i)=\varXi (\bar{{x}}_i, \bar{\sigma }_i) = P^d(\bar{\sigma }_i) \le P^d(\bar{\sigma }_j) \;\; \forall i, j = 1, \dots , 2 i_d + 1, \;\; i \le j . $$

This theorem shows that the nonconvex function \(P({x})\) is canonically dual (without duality gaps) to \(P^d(\sigma )\) at each KKT point \((\bar{{x}}_i,\bar{\sigma }_i)\), and that the values \(P^d(\bar{\sigma }_i)\) are in the opposite order to the critical points \(\bar{\sigma }_1 > \bar{\sigma }_2 \ge \dots \) (see Fig. 1). Clearly, the KKT solution \(\bar{{x}}_1\) is a global minimizer of the primal problem \(({\mathscr {P}})\).

Fig. 1 The graph of canonical dual function \(P^d(\sigma )\) for \(n=4\) (see Example 3 for details)

In order to identify global optimal solutions among all the critical points of \(P^d(\sigma )\), a subset of \({\mathscr {S}}_a\) is needed:

$$ {\mathscr {S}}_a^+ = \left\{ \sigma \in {\mathscr {S}}_a ~|~ \varvec{G}(\sigma )\succ \mathbf {0} \right\} . \nonumber $$

The problem canonically dual to \(({\mathscr {P}})\) can be proposed as the following

$$ ({\mathscr {P}}^d)~~~~\max \big \{P^d(\sigma )~|~ \sigma \in {\mathscr {S}}_a^+\big \}. $$

Theorem 2.

(Global Optimality Condition [1, 23]) Suppose that \(\bar{\sigma }\) is a critical point of \(P^d(\sigma )\). If \(\bar{\sigma } \in {\mathscr {S}}_a^+\), then \(\bar{\sigma }\) is a global maximal solution of the problem (\({\mathscr {P}}^d\)) on \({\mathscr {S}}^+_a\) and \(\bar{{x}} =\varvec{G}(\bar{\sigma })^{-1}\varvec{f}\) is a global minimal solution of the primal problem (\({\mathscr {P}}\)), i.e.,

$$ P( \bar{{x}} )=\min _{{x}\in {\mathscr {X}}_a} P({x})=\max _{\sigma \in {\mathscr {S}}_a^+}P^d(\sigma )=P^d( \bar{\sigma }). $$

According to the triality theorem [1, 29], the global optimality condition in Theorem 2 is called the canonical min–max duality. Since \(P^d(\sigma )\) is strictly concave on the (open) convex set \({\mathscr {S}}^+_a\), this theorem guarantees that if there is a critical point in \({\mathscr {S}}_a^+\), it must be unique and the nonconvex minimization problem (\({\mathscr {P}}\)) is equivalent to a concave maximization problem \(({\mathscr {P}}^d)\). A similar result is also discussed in Corollary 5.3 of [9] and Theorem 1 of [13]. Moreover, for the case when \(n=1\), the double-min duality statement in the weak-triality theory proven recently (see [29, 34, 35]) shows that the problem (\({\mathscr {P}}\)) has at most one local minimizer, which corresponds to a critical point \(\bar{\sigma } \in {\mathscr {S}}^-_a = \{ \sigma \in {\mathscr {S}}_a | \; \varvec{G}(\sigma ) \prec 0 \} \). All these previous results show that the canonical duality-triality theory provides detailed information on a complete set of solutions to the nonconvex problem \(({\mathscr {P}})\).
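The zero duality gap stated in Theorem 2 can be checked numerically on a small instance. The sketch below is illustrative only: the data \(\varvec{Q}=\text {diag}(-1,2)\), \(\varvec{f}=(1,1)^T\), \(r=1\) are made up for this check, the dual critical point is found by plain bisection on \((P^d(\sigma ))'\), and the primal minimum is approximated by sampling the circle \(\Vert {x}\Vert =r\) (sufficient here because this \(\varvec{Q}\) is indefinite, so no minimizer lies in the interior).

```python
import math

# Hypothetical 2-D instance, already in the eigenbasis: Q = diag(-1, 2).
EIGS = [-1.0, 2.0]
F = [1.0, 1.0]
R = 1.0


def primal(x):
    """P(x) = x^T Q x - 2 f^T x for the diagonal Q above."""
    quad = sum(l * xi * xi for l, xi in zip(EIGS, x))
    return quad - 2.0 * sum(fi * xi for fi, xi in zip(F, x))


def dual(sigma):
    """P^d(sigma) = -f^T G(sigma)^{-1} f - r^2 sigma."""
    return -sum(fi * fi / (l + sigma) for l, fi in zip(EIGS, F)) - R * R * sigma


def dual_prime(sigma):
    """(P^d)'(sigma) = ||G(sigma)^{-1} f||^2 - r^2."""
    return sum(fi * fi / (l + sigma) ** 2 for l, fi in zip(EIGS, F)) - R * R


def dual_critical_point(tol=1e-13):
    """Bisection for the root of (P^d)' to the right of max(0, -lambda_1)."""
    lo = max(0.0, -EIGS[0])          # possible pole, never evaluated
    hi = lo + 1.0
    while dual_prime(hi) > 0.0:      # move right until the derivative is negative
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dual_prime(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma_bar = dual_critical_point()
x_bar = [fi / (l + sigma_bar) for l, fi in zip(EIGS, F)]

# Crude reference value: sample the boundary circle ||x|| = R.
N = 200000
brute_min = min(
    primal([R * math.cos(2.0 * math.pi * j / N), R * math.sin(2.0 * math.pi * j / N)])
    for j in range(N)
)
```

Comparing `primal(x_bar)`, `dual(sigma_bar)`, and `brute_min` gives a quick sanity check of the equality chain in Theorem 2 for this instance.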

Remark 1.

Duality theory for quadratic minimization problems with \(\ell _2\)-norm constraints was discussed extensively in plastic mechanics fifty years ago. It was shown by Gao in [3] that for the quadratic \(\ell _2^2\) constraint, the canonical dual can be easily formulated, and a primal-dual finite element programming algorithm was first developed for solving minimal potential variational problems in infinite dimensional space [2]. Since the geometrical measure \(\xi ({x}) = {x}^T {x}\) is quadratic, the first term in \(\varXi ({x},\sigma )\) is the so-called (generalized) complementary gap function [26, 27] denoted by

$$ G_{ap}({x},\sigma ) = \xi ({x}) \sigma + {x}^T \varvec{Q}{x}= {x}^T \varvec{G}(\sigma ) {x}. \nonumber $$

Clearly, \(G_{ap}({x},\sigma ) \ge 0 \;\; \forall {x}\in \mathbb {R}^n\) if and only if \(\sigma \in {\mathscr {S}}^+_a\). Therefore, \(\varXi ({x}, \sigma )\) is a saddle function on \(\mathbb {R}^n \times \mathbb {R}\) if \(G_{ap}({x},\sigma ) \ge 0\; \; \forall {x}\in \mathbb {R}^n\). This result was first discovered by Gao and Strang in nonconvex mechanics [24], where they proved that this gap function recovers a broken symmetry in geometrically nonlinear systems and provides a global optimality condition for general nonconvex variational problems in mathematical physics. Particularly, the total complementary function \(\varXi ({x}, \sigma )\) on \(\mathbb {R}^n \times \mathbb {R}_+ = \{ \sigma \in \mathbb {R}| \; \sigma \ge 0 \}\) has a simple form

$$ \varXi ({x}, \sigma ) = {x}^T \varvec{G}(\sigma ) {x}- 2 {x}^T \varvec{f}- r^2 \sigma = P({x}) + \sigma ({x}^T {x}- r^2), \nonumber $$

which can be viewed as a Lagrangian of \(({\mathscr {P}})\) for the \(\ell _2^2\)-norm constraint \({x}^T {x}\le r^2\). Indeed, the total complementary function \(\varXi ({x}, \sigma )\) was also called the nonlinear Lagrangian in [1] or the extended Lagrangian in [31]. However, for a nonconvex target function \(P({x})\), the classical Lagrangian duality theory will produce a well-known duality gap unless the global optimality condition \(G_{ap}({x},\sigma ) \ge 0 \;\;\forall {x}\in \mathbb {R}^n\) is satisfied. Therefore, the Lagrangian duality theory is only a special case of the canonical duality theory for certain problems. Also, since a large class of nonconvex/discrete global optimization problems can be equivalently reformulated in a unified canonical dual form (see [22, 26, 27]), which is equivalent to a convex minimization problem over a convex feasible set, the so-called “hidden convexity” is indeed a special case of the canonical min–max duality theory.

In the hard case, the matrix \( \varvec{G}( {\sigma } ) \) is singular at the KKT point \(\bar{\sigma }\), and the canonical dual \(P^d(\sigma )\) should be replaced by (see [36])

$$ P^d(\sigma )=-\varvec{f}^T \varvec{G}(\sigma )^\dag \varvec{f}-r^2\sigma , \nonumber $$

where \(\varvec{G}(\sigma )^\dag \) stands for a generalized inverse of \(\varvec{G}(\sigma )\). In [9, 13], the dual function is also presented in discussions of the strong duality. Since this function is not strictly concave on the closure of \({\mathscr {S}}_a^+\), it may have multiple critical points located on the boundary of \({\mathscr {S}}_a^+\). In the following sections, we will first study the existence conditions of these critical points, and then study an associated algorithm for computing these solutions.

3 Existence Conditions

As \(\varvec{Q}\) is symmetric, there exist a diagonal matrix \(\varvec{\Lambda }\) and an orthogonal matrix \(U\) such that \(\varvec{Q}=U\varvec{\Lambda }U^T\). The diagonal entries of \(\varvec{\Lambda }\) are the eigenvalues of \(\varvec{Q}\) and are arranged in nondecreasing order,

$$ \lambda _1=\cdots =\lambda _k<\lambda _{k+1}\le \cdots \le \lambda _n. $$

The columns of \(U\) are the corresponding eigenvectors.

Let \(\hat{\varvec{f}}=U^T\varvec{f}\). Because \((\varvec{Q}+\sigma \varvec{I})^{-1}=U(\varvec{\Lambda }+\sigma \varvec{I})^{-1}U^T\), we can rewrite the canonical dual function \(P^d(\sigma )\) as

$$ P^d(\sigma )=-\frac{\sum _{i=1}^k\hat{f}_i^2}{\lambda _1+\sigma }-\sum _{i=k+1}^n\frac{\hat{f}_i^2}{\lambda _i+\sigma }-r^2\sigma , $$

where \(\hat{f}_i, i=1,\ldots , n \) are the elements of \(\hat{\varvec{f}}\). It is now easy to see that as long as \(\varvec{f}\ne 0\), \(P^d(\sigma )\) has stationary points in \({\mathscr {S}}_a\) and thus the canonical dual problem is well defined. For the case when \(\varvec{f}=0\), a perturbation should be introduced, which is discussed in the next section.

Theorem 3.

(Existence Conditions) Suppose that, for given \(\varvec{Q}\in \mathbb {R}^{n\times n}\) and \(\varvec{f}\in \mathbb {R}^n\), \(\lambda _i\) and \(\hat{f}_i\) are defined as above.

The canonical dual function \(P^d(\sigma )\) has a critical point \(\bar{\sigma }\) in \((-\lambda _1,+\infty )\) if and only if either \(\sum _{i=1}^k\hat{f}_i^2\ne 0\) or \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}>r^2\) holds true. Furthermore, if \(\lambda _1\le 0\), then \(\bar{{x}}=\varvec{G}(\bar{\sigma })^{-1}\varvec{f}\) is the unique solution of the primal problem (\({\mathscr {P}}\)).

If \(P^d(\sigma )\) has no critical points in \((-\lambda _1,+\infty )\), the primal problem \(({\mathscr {P}})\) has exactly two global solutions when the multiplicity of \(\lambda _1\) is \(k=1\) and has an infinite number of solutions when \(k > 1\).

Proof: First, we prove that the existence of a critical point of \(P^d(\sigma )\) in \((-\lambda _1,+\infty )\) implies that either \(\sum _{i=1}^k\hat{f}_i^2\ne 0\) or \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}>r^2\) holds true. Equivalently, we prove that if \(\sum _{i=1}^k\hat{f}_i^2=0\) and \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}\le r^2\), the dual function \(P^d(\sigma )\) has no critical points in \((-\lambda _1,+\infty )\). The first term in the spectral expression of \(P^d(\sigma )\) given above vanishes when \(\sum _{i=1}^k\hat{f}_i^2=0\). Then because \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}\le r^2\), the first-order derivative of the dual function

$$ (P^d(\sigma ))'=\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i+\sigma )^2}-r^2 \nonumber $$

is always negative in \((-\lambda _1,+\infty )\). Therefore, the dual function \(P^d(\sigma )\) will have no critical points in \((-\lambda _1,+\infty )\).

Next we will give the proof of the sufficiency, which is divided into two parts:

(1) If \(\sum _{i=1}^k\hat{f}_i^2\ne 0\), then \(\sigma =-\lambda _1\) is a pole of \(P^d(\sigma )\), i.e., as \(\sigma \) approaches \(-\lambda _1\) from the right side, \(P^d(\sigma )\) approaches \(-\infty \). The value of \(P^d(\sigma )\) also approaches \(-\infty \), when \(\sigma \) approaches \(+\infty \). Thus, \(- P^d(\sigma )\) is coercive on \((-\lambda _1,+\infty )\). Since, for any \(\sigma \in (-\lambda _1,+\infty )\), \(\varvec{G}(\sigma )\) is positive definite, \(P^d(\sigma )\) is strictly concave on \((-\lambda _1,+\infty )\). Thus there exists a unique critical point in \((-\lambda _1,+\infty )\).

(2) If \(\sum _{i=1}^k\hat{f}_i^2=0\) and \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}>r^2\), \((P^d(\sigma ))'\) is positive at \(\sigma =-\lambda _1\). Moreover, \((P^d(\sigma ))'\) approaches \(-r^2\) as \(\sigma \) approaches \(\infty \). Therefore, there exists at least one root for the equation \((P^d(\sigma ))'=0\) in \((-\lambda _1,+\infty )\), which means \(P^d(\sigma )\) has at least one critical point in \((-\lambda _1,+\infty )\). Similarly, because of the strict concavity of \(P^d(\sigma )\) over \((-\lambda _1,+\infty )\), the critical point is unique.

Suppose \(\lambda _1\le 0\). The uniqueness of the global solution \(\bar{\varvec{x}}\) will be proved if it can be shown that \((\bar{\varvec{x}},\bar{\sigma })\) is the only pair that satisfies the KKT conditions (1)–(3). As mentioned above, the dual function \(P^d(\sigma )\) is strictly concave on \((-\lambda _1,+\infty )\), which, together with the criticality of \(\bar{\sigma }\), implies that \((P^d(\sigma ))'=\Vert {x}\Vert ^2-r^2>0\) for \(\sigma \in (-\lambda _1,\bar{\sigma })\) and \(<0\) for \(\sigma \in (\bar{\sigma },+\infty )\), where \({x}=\varvec{G}(\sigma )^{-1}\varvec{f}\). Thus, for any \(\sigma \ne \bar{\sigma }\) in \((-\lambda _1,+\infty )\), there is no \({x}\) such that \(({x},\sigma )\) satisfies the KKT conditions (1)–(3). Apart from the interval \((-\lambda _1,+\infty )\), \(\sigma =-\lambda _1\) is the last candidate. However, if \(\sum _{i=1}^k\hat{f}_i^2\ne 0\), the equation \(\varvec{G}(-\lambda _1){x}=\varvec{f}\) has no solutions, and if \(\sum _{i=1}^k\hat{f}_i^2=0\) and \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}>r^2\), every solution of \(\varvec{G}(-\lambda _1){x}=\varvec{f}\) is infeasible because \(\Vert {x}\Vert ^2-r^2\ge \sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}-r^2>0\). Hence, \(\sigma =-\lambda _1\) cannot make the KKT conditions hold true. Therefore, \((\bar{\varvec{x}},\bar{\sigma })\) is the unique pair that satisfies the KKT conditions (1)–(3).

Finally, suppose that there are no critical points in \((-\lambda _1,+\infty )\), which, from the above proof, is equivalent to \(\sum _{i=1}^k\hat{f}_i^2 = 0\) and \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2} \le r^2\). Then, for any global solution, we have \(\bar{\sigma }= - \lambda _1\). Let \(\bar{\varvec{x}}\) be a global solution and \(\bar{\varvec{y}}=U^T\bar{\varvec{x}}\). Then the canonical equilibrium equation \(\varvec{G}(\bar{\sigma })\bar{\varvec{x}}=\varvec{f}\) can be equivalently transformed into \(\text {diag}(\{\lambda _i+\bar{\sigma }\})\bar{\varvec{y}}=\hat{\varvec{f}}\). If \(k=1\), i.e., the multiplicity of \(\lambda _1\) is one, the equation uniquely determines \(\bar{y}_i, i=2,\ldots ,n\), but not \(\bar{y}_1\). Since \(\bar{\varvec{y}}^{T}\bar{\varvec{y}}=r^2\), \(\bar{y}_1\) has exactly two values, corresponding to the two global solutions of \(({\mathscr {P}})\). If instead \(k>1\), i.e., the matrix \(\varvec{Q}\) has at least two repeated eigenvalues \(\lambda _1 = \lambda _2 = \dots = \lambda _k \le 0\), the equations \(\text {diag}(\{\lambda _i+\bar{\sigma }\})\bar{\varvec{y}}=\hat{\varvec{f}}\) and \(\bar{\varvec{y}}^{T}\bar{\varvec{y}}=r^2\) have an infinite number of solutions.    \(\square \)

Remark 2.

The complementarity relations between the primal problem \(({\mathscr {P}})\) and its canonical dual problem \(({\mathscr {P}}^d)\) are significant. When \(\lambda _1 > 0\), i.e., \(\varvec{Q}\) is positive definite, if \(({\mathscr {P}})\) has a global solution in the interior of \({\mathscr {X}}_a\), which must be the stationary point of \(P({x})\) and can be easily calculated, its canonical dual \(({\mathscr {P}}^d)\) has no critical point in \({\mathscr {S}}^+_a=[0,+\infty )\) due to \( (P^d(0))'=\Vert \bar{\varvec{x}}\Vert ^2-r^2<0\), where \(\bar{\varvec{x}}=\varvec{G}(0)^{-1}\varvec{f}\) is the stationary point of \(P({x})\). Dually, when \(\lambda _1 \le 0\), the primal function \(P({x})\) is nonconvex and the global minimizer of \(({\mathscr {P}})\) must be on the boundary of \({\mathscr {X}}_a\). In this case, if the canonical dual \(({\mathscr {P}}^d)\) has a critical point in \({\mathscr {S}}^+_a=(-\lambda _1,+\infty )\), the primal problem \(({\mathscr {P}})\) is then not in the hard case and has a unique solution, which can be easily obtained by solving the canonical dual problem. Whereas if \(({\mathscr {P}}^d)\) has no critical points in \({\mathscr {S}}^+_a\), i.e., \(P^d(-\lambda _1) = \sup \{ P^d(\sigma ) | \; \sigma \in {\mathscr {S}}^+_a \}\), the primal problem \(({\mathscr {P}})\) is in the hard case, because, for any \(\sigma \in {\mathscr {S}}_a^+\) and \({x}=\varvec{G}(\sigma )^{-1}\varvec{f}\), we have \( (P^d(\sigma ))'=\Vert {x}\Vert ^2-r^2<0\), which destroys the complementary condition in (3), and only \(\sigma =-\lambda _1\) can make the KKT conditions (1)–(3) hold.

Therefore, combining with Theorem 3, we have the following result.

Corollary 1.

If \(\lambda _1\le 0\), the nonconvex problem \(({\mathscr {P}})\) is in the hard case if and only if both conditions (i) \(\sum _{i=1}^k\hat{f}_i^2=0\) and (ii) \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}\le r^2\) hold true.

The condition (i) is well known: the trust region subproblem could be in the hard case only if the coefficient \(\varvec{f}\) is perpendicular to the subspace generated by eigenvectors of the smallest eigenvalue. The condition (ii) is new, which shows that the hard case of \(({\mathscr {P}})\) depends not only on the direction of \(\varvec{f}\), but also on its norm.
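Conditions (i) and (ii) of Corollary 1 are cheap to test once the spectral data are known. The following minimal sketch assumes the eigenvalues are sorted in nondecreasing order and that \(\hat{\varvec{f}}=U^T\varvec{f}\) has already been computed; the function name `is_hard_case` and the tolerance handling are illustrative choices, not part of the paper.

```python
def is_hard_case(eigs, f_hat, r, tol=1e-12):
    """Check conditions (i) and (ii) of Corollary 1.

    eigs  : eigenvalues of Q, nondecreasing (assumed input format)
    f_hat : coordinates of f in the matching eigenbasis
    tol   : tolerance used to detect repeated eigenvalues and zero components
    """
    lam1 = eigs[0]
    if lam1 > 0:
        # Outside the corollary's scope: the multiplier is nonnegative, so
        # Q + mu I stays nonsingular and the hard case cannot occur.
        return False
    head = [i for i, l in enumerate(eigs) if l - lam1 <= tol]   # eigenspace of lambda_1
    tail = [i for i, l in enumerate(eigs) if l - lam1 > tol]
    cond_i = all(abs(f_hat[i]) <= tol for i in head)
    cond_ii = sum(f_hat[i] ** 2 / (eigs[i] - lam1) ** 2 for i in tail) <= r * r
    return cond_i and cond_ii


# Illustrative calls with made-up spectral data.
case_a = is_hard_case([-1.0, -1.0, 1.0], [0.0, 0.0, -1.8], 2.0)   # both conditions hold
case_b = is_hard_case([-1.0, -1.0, 1.0], [0.0, 0.0, -1.8], 0.5)   # (ii) fails
case_c = is_hard_case([-1.0, -1.0, 1.0], [0.1, 0.0, -1.8], 2.0)   # (i) fails
```

In floating-point arithmetic condition (i) is an equality test, so the outcome depends on the chosen tolerance.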

Theorem 3 and Corollary 1 show that the given vector \(\varvec{f}\) plays an important role in the solutions of the problem \(({\mathscr {P}})\). From the point of view of solid mechanics, if \(\varvec{f}\) is considered as an applied force, then the decision variable \({x}\) is the displacement and the spherical constraint \(\Vert {x}\Vert \le r\) corresponds to the von Mises yield condition, which represents the capacity of the system. If the norm of \(\varvec{f}\) is big enough, the deformation \({x}\) should reach the limit \(\Vert {x}\Vert = r\) and the problem \(({\mathscr {P}})\) has a solution on the boundary of \({\mathscr {X}}_a\). By the canonical duality, the problem \(({\mathscr {P}}^d)\) must have a critical point in \({\mathscr {S}}_a^+\). If the norm of \(\varvec{f}\) is too small, the primal problem \(({\mathscr {P}})\) could have multiple solutions. In this case, \(({\mathscr {P}}^d)\) has no critical point in \({\mathscr {S}}_a^+\) and \(({\mathscr {P}})\) could be in the hard case.

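To illustrate Theorem 3, let us consider a 3-dimensional problem with coefficients

$$ \varvec{Q}=\text {diag}(-1,-1,1),\quad \varvec{f}=(0,0,-1.8)^T,\quad r=2. $$

In this case, the eigenvalues of \(\varvec{Q}\) are \(\lambda _1 = \lambda _2 = -1\), and \(\lambda _3 = 1\). So we have \(k=2\) and the target function (scaled by the factor 1/2, which does not change the minimizers)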

$$ P({x})=-\frac{1}{2}(x_1^2+x_2^2)+\frac{1}{2}x_3^2+1.8x_3 $$

is nonconvex, whose minimizers are on the boundary of the feasible region. Replacing \(x_1^2+x_2^2\) with \(r^2-x_3^2\), the target function \(P({x})\) can be reformulated as a univariate function of \(x_3\),

$$ g(x_3)=x_3^2+1.8x_3- 2, $$

which achieves the minimum at \(x_3=-0.9\). Then we obtain the following equation

$$ x_1^2+x_2^2=r^2-x_3^2=2^2-(-0.9)^2=3.19. $$

So all \(\bar{\varvec{x}}\in \mathbb {R}^3\) satisfying \(\bar{x}_1^2+\bar{x}_2^2=3.19\) and \(\bar{x}_3=-0.9\) are global minimizers of the problem.

Since \(\sum _{i=1}^2 \hat{f}_i^2 = 0\) and \(\sum _{i=2+1}^3\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2} = (-1.8)^2/(1+1)^2 \le r^2 = 4\), Theorem 3 shows that \(P^d(\sigma )\) has no critical point in \({\mathscr {S}}_a^+\), and \(({\mathscr {P}})\) is indeed in the hard case and has an infinite number of global solutions. If we choose either a smaller r or a vector \(\varvec{f}\) with a larger magnitude such that \(\sum _{i=2+1}^3\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2} > r^2 \), the global solution will be unique. For example, let \( r=0.5 \). Then \(x_3=-0.9\) is no longer feasible, and the problem \( \min \{ g(x_3) | \; x_3^2 \le 0.5^2 \}\) leads to \(x_3=-0.5\). From \( x_1^2+x_2^2=r^2-x_3^2=0.5^2-(-0.5)^2=0\), we know the unique global solution of \(({\mathscr {P}})\) is \(\bar{\varvec{x}}=(0,0,-0.5)^T\).
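The arithmetic of this example can be replayed in a few lines. The sketch below works directly with the displayed target function \(P({x})\) restricted to the sphere; the helper names `g` and `boundary_minimizer` are hypothetical, and the radius values 2 and 0.5 are the ones used in the text.

```python
def g(x3, r):
    # P restricted to the sphere, using x1^2 + x2^2 = r^2 - x3^2
    return -0.5 * (r**2 - x3**2) + 0.5 * x3**2 + 1.8 * x3

def boundary_minimizer(r):
    # g is a convex parabola in x3 with vertex at -0.9; clip the vertex to [-r, r]
    x3 = max(-r, min(r, -0.9))
    return x3, r**2 - x3**2            # (x3, x1^2 + x2^2)

for r in (2.0, 0.5):
    x3, rho2 = boundary_minimizer(r)
    lhs = (-1.8) ** 2 / (1 + 1) ** 2    # the sum in condition (ii)
    print(r, x3, round(rho2, 2), lhs <= r**2)
```

For \(r=2\) the remaining radius \(x_1^2+x_2^2\) is positive, so a whole circle of minimizers exists; for \(r=0.5\) it collapses to zero and the minimizer is unique.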

In [37], Martinez investigated the ‘local-nonglobal minimizers’ of the problem (\({\mathscr {P}}\)), of which the main results (Theorem 3.1 in [37]) can be restated in the following theorem.

Theorem 4.

(i) If \(\bar{\varvec{x}}\) is a local-nonglobal minimizer of (\({\mathscr {P}}\)), then there is a \(\bar{\sigma }\) \(\in (\max \{0,-\lambda _2\},-\lambda _1)\) such that \(\varvec{G}(\bar{\sigma })\bar{\varvec{x}}\)=\(\varvec{f}\) and \((P^d(\bar{\sigma }))''\ge 0\).

(ii) There exists at most one local-nonglobal minimizer of (\({\mathscr {P}}\)).

(iii) If \(\Vert \bar{\varvec{x}}\Vert =r\), \(\varvec{G}(\bar{\sigma })\bar{\varvec{x}}=\varvec{f}\) for some \(\bar{\sigma }\in (-\lambda _2,-\lambda _1)\), \(\bar{\sigma }>0\) and \((P^d(\bar{\sigma }))''>0\), then \(\bar{\varvec{x}}\) is a strict local minimizer of (\({\mathscr {P}}\)).

From the point of view of the canonical duality theory, the \(\bar{\sigma }\) in this theorem is actually a critical point of \(P^d(\sigma )\). The case of \(({\mathscr {P}})\) having no local-nonglobal minimizers implies that all the local minimizers are global solutions. The situations that lead to this case include (i) the multiplicity of \(\lambda _1\) being larger than one; (ii) no critical point in \((\max \{0,-\lambda _2\},-\lambda _1)\), and (iii) \(\varvec{f}\) being perpendicular to the eigenvector of \(\lambda _1\). The first situation results in \((-\lambda _2,-\lambda _1)=\emptyset \). The last situation violates the necessary condition \((P^d(\sigma ))''\ge 0\), which can be observed from the expression of \((P^d(\sigma ))''\),

$$ (P^d(\sigma ))''=-2\sum _{i=1}^n\frac{\hat{f}^2_i}{(\lambda _i+\sigma )^3}. \nonumber $$

For any \(\sigma \in (-\lambda _2,-\lambda _1)\), the only nonnegative term in \((P^d(\sigma ))''\) is the first term \(-2\hat{f}_1^2/(\lambda _1+\sigma )^3\). Thus \((P^d(\sigma ))''\) will be negative if \(\hat{f}_1^2=0\). As shown in Fig. 1, there is a critical point \(\bar{\sigma }_2 \in (-\lambda _2, -\lambda _1)=(4.37,10.51)\) and the corresponding solution \(\bar{{x}}_2\) obtained from Eq. (5) is a local minimizer.
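The sign argument for \((P^d(\sigma ))''\) can be checked numerically in eigen-coordinates. This is a small sketch with a made-up spectrum and made-up coefficients \(\hat{f}_i\) (all values below are assumptions chosen only to illustrate the sign pattern).

```python
def dual_second_derivative(sigma, lam, f_hat):
    # (P^d)''(sigma) = -2 * sum_i f_hat_i^2 / (lam_i + sigma)^3
    return -2.0 * sum(fi**2 / (li + sigma) ** 3 for li, fi in zip(lam, f_hat))

lam = [-3.0, -1.0, 2.0]                        # hypothetical spectrum
grid = [1.0 + 0.1 * j for j in range(1, 20)]   # points inside (-lam_2, -lam_1) = (1, 3)

with_f1 = [dual_second_derivative(s, lam, [1.0, 0.5, 0.5]) for s in grid]
without_f1 = [dual_second_derivative(s, lam, [0.0, 0.5, 0.5]) for s in grid]

print(all(v < 0 for v in without_f1))   # f_hat_1 = 0: negative everywhere on the grid
print(any(v >= 0 for v in with_f1))     # f_hat_1 != 0: nonnegative values can occur
```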

4 Perturbation Methods

This section is devoted to computing solutions of the problem when the canonical dual problem \(({\mathscr {P}}^d)\) has no critical point in \((-\lambda _1,+\infty )\). Since a necessary condition for the hard case is \(\sum _{i=1}^k\hat{f}_i^2=0\), a perturbation can be introduced such that this condition no longer holds. Once we obtain the critical point in \({\mathscr {S}}^+_a\), all the global solutions can be determined. Our approach has been applied successfully in canonical duality theory for solving nonlinear algebraic equations [38], chaotic dynamical systems [39], as well as a class of NP-hard problems in global optimization [36, 40, 41].

In order to establish the existence conditions, a perturbation \(\sum _{i=1}^k\alpha _iU_i\) with parameters

$$ \varvec{\alpha }=\{\alpha _i\}_{i=1}^k\ne 0 $$

is introduced to \(\varvec{f}\). Let

$$ \varvec{p}=\varvec{f}+\sum _{i=1}^k\alpha _iU_i,~~ \hat{\varvec{p}}=U^T\varvec{p}, {\text {and }} P_\alpha ({x})={x}^T\varvec{Q}{x}-2\varvec{p}^T{x}. \nonumber $$

The existence conditions hold for the perturbed problem

$$ ({\mathscr {P}}_\alpha )~~~~\min \{P_\alpha ({x}) \,|\, {x}\in {\mathscr {X}}_a\}, $$

because \(\sum _{i=1}^k\hat{p}_i^2\ne 0\) is guaranteed by (4).
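The construction of \(\varvec{p}\) can be sketched as follows. This is an illustration under stated assumptions: the helper `perturb`, the tolerance used to detect the multiplicity k, and the diagonal test matrix are not taken from the paper.

```python
import numpy as np

def perturb(Q, f, alpha, tol=1e-10):
    """Return p = f + sum_i alpha_i U_i, with U_1..U_k spanning the eigenspace of lambda_1."""
    lam, U = np.linalg.eigh(Q)
    k = int(np.sum(lam - lam[0] <= tol * max(1.0, abs(lam[0]))))
    alpha = np.asarray(alpha, dtype=float)
    if alpha.shape != (k,) or not np.any(alpha != 0):
        raise ValueError("alpha must be a nonzero vector of length k")
    p = f + U[:, :k] @ alpha
    p_hat = U.T @ p                      # coordinates of p in the eigenbasis
    return p, p_hat, k

Q = np.diag([-1.0, -1.0, 1.0])           # illustrative hard-case data
f = np.array([0.0, 0.0, -1.8])
p, p_hat, k = perturb(Q, f, alpha=[0.01, -0.02])
print(k, np.sum(p_hat[:k] ** 2))         # the sum equals ||alpha||^2 when f_hat_1..k = 0
```

Because \(\hat{f}_i=0\) for \(i\le k\) in the hard case, the first k components of \(\hat{\varvec{p}}\) carry exactly the perturbation, which is why condition (i) fails for the perturbed problem.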

The following theorem states that if the parameter \(\varvec{\alpha }\) is chosen appropriately, the optimal solution of the perturbed problem approximates that of the primal problem \(({\mathscr {P}})\).

Theorem 5.

Suppose that \(\lambda _1\le 0\), there is no critical point of \(P^d(\sigma )\) in \({\mathscr {S}}_a^+\), and \(\bar{\varvec{x}}^*\) is the optimal solution of the problem (\({\mathscr {P}}_\alpha \)). Then, there is a global solution of the problem (\({\mathscr {P}}\)), denoted as \(\bar{\varvec{x}}\), which is on the boundary of \({\mathscr {X}}_a\) and, for any \(\varepsilon >0\), if the parameter \(\varvec{\alpha }\) satisfies

$$ \Vert \varvec{\alpha }\Vert ^2\le (\lambda _2-\lambda _1)^2 \left( r^2-\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2} \right) (1/\sqrt{2(1-\cos (\varepsilon /r))}-1)^{-2} , $$

we have \(\Vert \bar{\varvec{x}}^*-\bar{{x}}\Vert \le \varepsilon \).

Proof.

For simplicity, the coordinate system is rotated and let \(\varvec{y}=U^T{x}\), \(\varvec{y}_k=\{y_i\}_{i=1}^k\) and \(\varvec{y}_{\ell } =\{y_i\}_{i=k+1}^n\). Since \(\hat{f}_i=0\) for \(i=1,\ldots ,k\), variables \(y_i\) for \(i=1,\ldots ,k\) appear in the target function only in the form of squares. On the boundary of \({\mathscr {X}}_a\), the problem (\({\mathscr {P}}\)) is then equivalent to the following problem in \(\mathbb {R}^{n-k}\):

$$ \min _{\Vert \varvec{y}_{\ell } \Vert \le r} ~P^{\ell } (\varvec{y}_{\ell } )=\sum _{i=k+1}^n(\lambda _i-\lambda _1)y_i^2-\sum _{i=k+1}^n2\hat{f}_iy_i+\lambda _1r^2. $$

Since \(P^{\ell } (\varvec{y}_{\ell } )\) is a strictly convex function, it has a unique stationary point,

$$ \bar{\varvec{y}}_{\ell }=\left\{ \frac{\hat{f}_i}{\lambda _i-\lambda _1}\right\} _{i=k+1}^n. $$

Combining with the assumption of no critical point in \({\mathscr {S}}_a^+\), we know that this stationary point is the global optimal solution of the problem (4). Then, all \(\bar{\varvec{y}}\) that satisfy \(\bar{\varvec{y}}_k^T\bar{\varvec{y}}_k=r^2-\bar{\varvec{y}}_{\ell }^T\bar{\varvec{y}}_{\ell }\) are solutions of the problem \(({\mathscr {P}})\). Here we choose one particular solution with

$$ \bar{\varvec{y}}_k=h\bar{\varvec{y}}_k^*, ~~h=\frac{1}{\Vert \bar{\varvec{y}}_k^*\Vert }\sqrt{r^2-\bar{\varvec{y}}_{\ell }^T\bar{\varvec{y}}_{\ell }}, $$

where \(\bar{\varvec{y}}^* = U^T\bar{\varvec{x}}^*\), and let \(\bar{\varvec{x}}=U\bar{\varvec{y}}\).

By canceling variables \(y_i,i=1,\ldots ,k\), the perturbed problem (4) with the equality constraint is equivalent to

$$ \min _{\Vert \varvec{y}_{\ell } \Vert \le r} P_\alpha ^{\ell } (\varvec{y}_{\ell } )=\sum _{i=k+1}^n(\lambda _i-\lambda _1)y_i^2-\sum _{i=k+1}^n2\hat{f}_iy_i+\lambda _1r^2-2\Vert \varvec{\alpha }\Vert \sqrt{r^2-\varvec{y}_{\ell }^T\varvec{y}_{\ell }}. $$

The function \(P_\alpha ^{\ell } (\varvec{y}_{\ell } )\) is also strictly convex. Moreover, for any \(\Vert \varvec{y}_{\ell } \Vert < r\), we have \(P_\alpha ^{\ell } (\varvec{y}_{\ell } )<P^{\ell } (\varvec{y}_{\ell } )\), while for any \(\Vert \varvec{y}_{\ell } \Vert =r\), we have \(P_\alpha ^{\ell } (\varvec{y}_{\ell } )=P^{\ell } (\varvec{y}_{\ell } )\). This indicates that the unique stationary point of \(P_\alpha ^{\ell } (\varvec{y}_{\ell } )\) is in the interior of \(\Vert \varvec{y}_{\ell } \Vert \le r\). Thus the global solution \(\bar{\varvec{y}}_{\ell } ^*\) is a stationary point of the problem (4) and then satisfies

$$ \bar{y}_i^*=\frac{\hat{f}_i}{\lambda _i-\lambda _1+\Vert \varvec{\alpha }\Vert (r^2-\bar{\varvec{y}}_{\ell }^{*T}\bar{\varvec{y}}_{\ell }^*)^{-\frac{1}{2}}}, i=k+1,\ldots ,n. $$

and

$$ |\bar{y}_i^*|<|\bar{y}_i|, i=k+1,\ldots ,n. $$

We will prove that as \(\Vert \varvec{\alpha }\Vert \) approaches zero, \(\bar{\varvec{y}}^*\) will approach \(\bar{\varvec{y}}\). First, we have the following relationship

$$\begin{aligned} \bar{\varvec{y}}^{* T}\bar{\varvec{y}}&=\sqrt{r^2-\bar{\varvec{y}}_{\ell } ^{* T}\bar{\varvec{y}}_{\ell } ^*}\sqrt{r^2-\bar{\varvec{y}}_{\ell } ^{T}\bar{\varvec{y}}_{\ell } } +\bar{\varvec{y}}^{*T}_{\ell } \bar{\varvec{y}}_{\ell } \nonumber \\&\le \frac{1}{2}\left( r^2-\bar{\varvec{y}}_{\ell } ^{* T}\bar{\varvec{y}}_{\ell } ^*+r^2-\bar{\varvec{y}}_{\ell } ^{T}\bar{\varvec{y}}_{\ell } \right) +\bar{\varvec{y}}^{* T}_{\ell } \bar{\varvec{y}}_{\ell } \nonumber \\&=r^2-\frac{1}{2}\Vert \bar{\varvec{y}}_{\ell } ^*-\bar{\varvec{y}}_{\ell } \Vert ^2,\nonumber \end{aligned}$$

where the first equality is derived from the definition of \(\bar{\varvec{y}}_k\) and the fact that \(\bar{\varvec{y}}^*\) lies on the surface of the sphere. Based on the relationship

$$ \Vert \bar{\varvec{y}}^*-\bar{\varvec{y}}\Vert \le r\arccos \left( \frac{\bar{\varvec{y}}^{* T}\bar{\varvec{y}}}{r^2 } \right) \le r\arccos \left( \frac{r^2-\frac{1}{2}\Vert \bar{\varvec{y}}_{\ell } ^*-\bar{\varvec{y}}_{\ell } \Vert ^2}{r^2} \right) , \nonumber $$

we will have \(\Vert \bar{\varvec{y}}^*-\bar{\varvec{y}}\Vert \le \varepsilon \), if \(\Vert \bar{\varvec{y}}_{\ell } ^*-\bar{\varvec{y}}_{\ell } \Vert ^2\le 2r^2(1-\cos \frac{\varepsilon }{r})\). Then, it can be verified that

$$ \Vert \bar{\varvec{y}}_{\ell } ^*-\bar{\varvec{y}}_{\ell }\Vert ^2 \le \frac{r^2}{\left( (\lambda _2-\lambda _1)\Vert \varvec{\alpha }\Vert ^{-1}\sqrt{r^2-\bar{\varvec{y}}_{\ell } ^{* T}\bar{\varvec{y}}_{\ell } ^*}+1\right) ^2}. $$

If we let the right-hand side of Eq. (4) be less than or equal to \(2r^2(1-\cos \frac{\varepsilon }{r})\), we obtain

$$ \Vert \varvec{\alpha }\Vert ^2\le \frac{(\lambda _2-\lambda _1)^2(r^2-\bar{\varvec{y}}_{\ell } ^{*T}\bar{\varvec{y}}_{\ell } ^*)}{(1/\sqrt{2(1-\cos \frac{\varepsilon }{r})}-1)^2}.\nonumber $$

Combining with relations in (4), we can state that \(\Vert \bar{\varvec{y}}^*-\bar{\varvec{y}}\Vert \le \varepsilon \) if the following inequality is true

$$ \Vert \varvec{\alpha }\Vert ^2\le \frac{(\lambda _2-\lambda _1)^2(r^2-\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2})}{(1/\sqrt{2(1-\cos \frac{\varepsilon }{r})}-1)^2}. $$

Since \(\Vert \bar{\varvec{x}}^*-\bar{\varvec{x}}\Vert =\Vert \bar{\varvec{y}}^*-\bar{\varvec{y}}\Vert \), Eq. (4) implies that \(\Vert \bar{\varvec{x}}^*-\bar{\varvec{x}}\Vert \le \varepsilon \).    \(\square \)
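The bound of Theorem 5 can be evaluated directly to pick a perturbation size for a target accuracy \(\varepsilon \). The sketch below is hedged in two ways: the function name `alpha_norm_bound` is hypothetical, and the factor \(\lambda _2-\lambda _1\) is interpreted as the gap between \(\lambda _1\) and the smallest eigenvalue strictly larger than it (index k in a zero-based, ascending list), which is an assumption when \(k>1\).

```python
import math

def alpha_norm_bound(eps, r, lam, f_hat, k):
    """Largest ||alpha|| allowed by the bound of Theorem 5 (lam sorted ascending)."""
    lam1, lam_next = lam[0], lam[k]      # lam[k]: first eigenvalue above lambda_1 (assumed reading)
    slack = r**2 - sum(f_hat[i] ** 2 / (lam[i] - lam1) ** 2 for i in range(k, len(lam)))
    denom = 1.0 / math.sqrt(2.0 * (1.0 - math.cos(eps / r))) - 1.0
    if slack < 0 or denom <= 0:
        raise ValueError("bound not applicable for these inputs")
    return (lam_next - lam1) * math.sqrt(slack) / denom

# illustrative two-dimensional hard-case data: spectrum (-1, 1), f_hat = (0, -1.8), r = 1
bound = alpha_norm_bound(eps=0.01, r=1.0, lam=[-1.0, 1.0], f_hat=[0.0, -1.8], k=1)
print(bound)
```

For small \(\varepsilon /r\) the denominator behaves like \(r/\varepsilon -1\), so the admissible \(\Vert \varvec{\alpha }\Vert \) shrinks roughly linearly with \(\varepsilon \).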

Theorem 5 shows that with a proper parameter \(\varvec{\alpha }\), the existence condition is guaranteed to hold true for the perturbed problem and the perturbation method can be used to solve the hard case approximately. As the perturbation parameters approach zero, the perturbed solutions will approach one of the global solutions of \(({\mathscr {P}})\). By the projection theorem, the nearest points to \({\bar{\varvec{x}}}\) and \(\bar{\varvec{x}}^*\) in the subspace spanned by \(\{U_1,\ldots ,U_k\}\) are \(\sum _{i=1}^k(\bar{\varvec{x}}^{T}U_i)U_i\) and \(\sum _{i=1}^k(\bar{\varvec{x}}^{* T}U_i)U_i\), respectively. Then we have the following relationship

$$ \Vert \bar{\varvec{x}}^*-\sum _{i=1}^k(\bar{\varvec{x}}^{*T}U_i)U_i\Vert ^2<\Vert {\bar{\varvec{x}}}-\sum _{i=1}^k(\bar{\varvec{x}}^{T}U_i)U_i\Vert ^2, $$

which means that the perturbed solution \(\bar{\varvec{x}}^*\) is closer to the subspace spanned by \(\{U_1,\ldots ,U_k\}\) than the solution \({\bar{\varvec{x}}}\).

Furthermore, each solution of the problem (\({\mathscr {P}}\)) can be approximated if the perturbation parameter \(\varvec{\alpha }\) is properly chosen. When the multiplicity of \(\lambda _1\) is equal to one, as stated in Theorem 3, there are exactly two global solutions. In this case, \(\varvec{\alpha }\) becomes a scalar and has exactly two possible directions, which are mutually opposite and lead, respectively, to the two global solutions (see Example 1). In general, there may be an infinite number of global solutions for the problem (\({\mathscr {P}}\)), and we will show that there is a one-to-one correspondence between solutions of the problem (\({\mathscr {P}}\)) and directions of \(\varvec{\alpha }\). In the problem (4), variables \(y_i,i=1,\ldots ,k\) are removed by solving the following minimization problem

$$ \min \{-2\varvec{\alpha }^T\varvec{y}_k~~|~\varvec{y}_k^{T}\varvec{y}_k=r^2-\varvec{y}_{\ell }^T\varvec{y}_{\ell },\, \varvec{y}_k\in \mathbb {R}^k\}. $$

Its solution is

$$ \varvec{y}_k=h\varvec{\alpha },~~h=\frac{1}{\Vert \varvec{\alpha }\Vert }\sqrt{r^2-\varvec{y}_{\ell }^T\varvec{y}_{\ell }}, $$

i.e., the point falls on the boundary of the sphere in (4) and has the same direction as \(\varvec{\alpha }\). If \(\Vert \varvec{\alpha }\Vert \) remains unchanged, the problem (4) always has the same solution and the scalar h also remains unchanged. Thus, each direction of \(\varvec{\alpha }\) corresponds to a solution \(\{y_i\}_{i=1}^k\), and all the solutions comprise the surface of a sphere centered at the origin in \(\mathbb {R}^k\). On the other hand, from the problem (4), we have \(\bar{\varvec{y}}_k^T\bar{\varvec{y}}_k=r^2-\bar{\varvec{y}}_{\ell }^T\bar{\varvec{y}}_{\ell }\), which means all global solutions of the problem (\({\mathscr {P}}\)) also comprise the surface of a sphere. Combining this with Theorem 5, we conclude that each solution of the problem (\({\mathscr {P}}\)) can be approached when the direction of \(\varvec{\alpha }\) is properly chosen and \(\Vert \varvec{\alpha }\Vert \) approaches zero.
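The closed-form solution \(\varvec{y}_k=h\varvec{\alpha }\) of the subproblem in \(\varvec{y}_k\) follows from the Cauchy–Schwarz inequality. The step is written out here for completeness; it is a standard argument rather than a quotation from the paper. For any feasible \(\varvec{y}_k\),

$$ -2\varvec{\alpha }^T\varvec{y}_k\ge -2\Vert \varvec{\alpha }\Vert \,\Vert \varvec{y}_k\Vert =-2\Vert \varvec{\alpha }\Vert \sqrt{r^2-\varvec{y}_{\ell }^T\varvec{y}_{\ell }}, $$

with equality if and only if \(\varvec{y}_k\) is a nonnegative multiple of \(\varvec{\alpha }\), that is, \(\varvec{y}_k=h\varvec{\alpha }\) with \(h=\sqrt{r^2-\varvec{y}_{\ell }^T\varvec{y}_{\ell }}/\Vert \varvec{\alpha }\Vert \). The optimal value on the right-hand side is exactly the extra term that appears in \(P_\alpha ^{\ell } (\varvec{y}_{\ell } )\).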

5 Canonical Primal-Dual Algorithm

Based on the results obtained above, a canonical primal-dual algorithm is developed, which is matrix inverse free and the essential cost of calculation is only the matrix–vector multiplication.

The main step of this algorithm is to solve the following perturbed canonical dual problem:

$$ ({\mathscr {P}}^d_\alpha ) ~~~~ \max \big \{ P_\alpha ^d(\sigma )=-\varvec{p}^T\varvec{G}(\sigma )^{-1}\varvec{p}-r^2\sigma \, | \, \sigma \in {\mathscr {S}}_a^+ \big \} $$

Let \(\psi (\sigma )\) be its first-order derivative, i.e.,

$$ \psi (\sigma )=(P_\alpha ^d(\sigma ))^\prime =\varvec{p}^T\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{p}-r^2. $$

Then the critical point of \(P_\alpha ^d(\sigma )\) in \({\mathscr {S}}_a^+\) is corresponding to the solution of the equation \(\psi (\sigma )=0\) in \({\mathscr {S}}_a^+\). The first- and second-order derivatives of \(\psi (\sigma )\) are

$$\begin{aligned}&\psi ^\prime (\sigma )=-2\varvec{p}^T\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{p},\nonumber \\&\psi ^{\prime \prime }(\sigma )=6\varvec{p}^T\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{G}(\sigma )^{-1}\varvec{p}.\nonumber \end{aligned}$$

Note that \(\psi (\sigma )\) is strictly decreasing and strictly convex over \({\mathscr {S}}_a^+\), that \(\psi (\sigma )\) approaches \(-r^2\) as \(\sigma \) approaches infinity, and that \(\sigma =-\lambda _1\) is a pole of \(\psi (\sigma )\).

We use the Lanczos method to compute an approximation for the smallest eigenvalue of \(\varvec{Q}\) and a corresponding eigenvector, denoted, respectively, by \(\tilde{\lambda }_1\) and \(\tilde{U}_1\), where the latter is a unit vector. For choosing an effective perturbation, it is not necessary to calculate all eigenvectors of the smallest eigenvalue, since any one of them will be sufficient to divert the direction of \(\varvec{f}\). Here we use \(\alpha \tilde{U}_1\) as a perturbation to \(\varvec{f}\).

Although the perturbed canonical dual problem \(({\mathscr {P}}^d_\alpha )\) is strictly concave on \({\mathscr {S}}^+_a\), its derivative \(\psi (\sigma )\) becomes ill-conditioned when \(\sigma \) approaches the pole. Therefore, instead of nonlinear optimization techniques, a bisection method is used to find the root of \(\psi (\sigma )\) in \((-\lambda _1,+\infty )\). Each time a dual solution \(\sigma >-\lambda _1\) is obtained, the value of \(\psi (\sigma )\) is calculated and checked to see whether it is equal to zero. For moderate-size problems, it is not hard to calculate \(\varvec{G}(\sigma )^{-1}\varvec{p}\) by computing the inverse or a decomposition of \(\varvec{G}(\sigma )\), but this is not possible for very large-size problems, especially when memory is very limited. One alternative approach is to solve the following strictly convex minimization problem,

$$ \min _{{x}\in \mathbb {R}^n}~{x}^T\varvec{G}(\sigma ){x}-2\varvec{p}^T{x}, $$

whose optimal solution is \({x}=\varvec{G}(\sigma )^{-1}\varvec{p}\). Actually, during iterations, we do not need to calculate \(\psi (\sigma )\) every time, especially when \(\sigma \) is on the left side of the root and close to the pole. It is discovered that for a given \(\sigma \), the value of \(\psi (\sigma )\) is equal to the optimal value of the following unconstrained concave maximization problem

$$ \max _{\varvec{z}\in \mathbb {R}^n}~-\varvec{z}^T\varvec{G}(\sigma )\varvec{G}(\sigma )\varvec{z}+2\varvec{p}^T\varvec{z}-r^2. $$

Since the value of the target function increases during the iterations, we can stop solving the problem (5) as soon as the target function is larger than a threshold, and then we claim that \(\sigma \) must be on the left side of the root. Thus, the ill-conditioning in computing \(\psi (\sigma )\) can be avoided as \(\sigma \) approaches the pole. Since the optimal value is equal to zero when \(\sigma \) is a root of \(\psi (\sigma )\), any nonnegative value can be a threshold.
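The identity between \(\psi (\sigma )\) and the optimal value of the concave problem, and the early-stopping argument built on it, can be verified on a small instance. This is a sketch only: `psi` and `lower_bound` are hypothetical helper names, the data are made up, and a dense solve stands in for whatever iterative solver is used in practice.

```python
import numpy as np

def psi(Q, p, r, sigma):
    # psi(sigma) = p^T G^{-2} p - r^2 with G = Q + sigma I
    x = np.linalg.solve(Q + sigma * np.eye(len(p)), p)
    return x @ x - r**2

def lower_bound(Q, p, r, sigma, z):
    # value of the concave objective at any z; it never exceeds psi(sigma)
    Gz = (Q + sigma * np.eye(len(p))) @ z
    return -Gz @ Gz + 2.0 * p @ z - r**2

Q = np.diag([-1.0, 1.0]); p = np.array([0.5, -1.8]); r = 1.0; sigma = 1.2
G = Q + sigma * np.eye(2)
z_opt = np.linalg.solve(G, np.linalg.solve(G, p))     # maximizer z = G^{-2} p
print(psi(Q, p, r, sigma), lower_bound(Q, p, r, sigma, z_opt))
print(lower_bound(Q, p, r, sigma, 0.5 * z_opt) > 0)   # a rough iterate already certifies psi > 0
```

Any iterate z with a positive objective value proves \(\psi (\sigma )>0\), so the solver can stop long before convergence when \(\sigma \) is close to the pole.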

An uncertainty interval must be initialized before the bisection method is applied; it ensures that the root always lies within the intervals generated by the bisection method. For the right end of the interval, any large enough number can be a candidate. An upper bound can be calculated and then be chosen to be the right end of the uncertainty interval. Let \(\bar{\sigma }^* \in (-\lambda _1,+\infty )\) be the root of \(\psi (\sigma )\). From the definition of \(\psi (\sigma )\), we have

$$ \frac{1}{(\lambda _1+\bar{\sigma }^*)^2}\hat{\varvec{p}}^T\hat{\varvec{p}}-r^2\ge 0. $$

Hence \(\lambda _1+\bar{\sigma }^*\le \sqrt{\hat{\varvec{p}}^T\hat{\varvec{p}}}/r=\Vert \varvec{p}\Vert /r\), that is, \(-\lambda _1+\Vert \varvec{p}\Vert /r\) is an upper bound for the root \(\bar{\sigma }^*\). However, this bound may not be tight. A practical way is to take \(\sigma =-\lambda _1\) as a starting point and then to update \(\sigma \) recursively by moving a fixed step to the right each time. If the first \(\sigma \) that makes the value of \(\psi (\sigma )\) negative is smaller than the upper bound, it is a tighter right end for the uncertainty interval.

  • Algorithm 1 (Initialization)

  • Input: Coefficients \(\varvec{Q}\), \(\varvec{f}\) and r, and an error tolerance \(\varepsilon \).

  • The smallest eigenvalue: Use Lanczos method to obtain \(\tilde{\lambda }_1\) and \(\tilde{U}_1\).

  • Perturbation: If existence conditions do not hold, a perturbation is introduced and let

    $$ \varvec{p}=\varvec{f}+\alpha \tilde{U}_1; $$

    otherwise, let \(\varvec{p}= \varvec{f}\).

  • Uncertainty interval: set a step size \(s_t\) and a threshold \(\varepsilon _t\); let \(\sigma =\sigma _{\ell } =-\tilde{\lambda }_1\).

    • step 1: Solve the problem (5). If the value of the target function is larger than the threshold \(\varepsilon _t\), stop the iteration, let \(\sigma =\sigma +s_t\) and go to step 1; otherwise, go to step 2.

    • step 2: Calculate the value of \(\psi (\sigma )\). If \(\psi (\sigma )>0\), set \(\sigma _{\ell } =\sigma \), \(\sigma =\sigma +s_t\) and go to step 2; otherwise, let \(\sigma _u=\sigma \) and stop.
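The marching procedure of Algorithm 1 can be sketched as below. This is a simplified illustration, not the authors' implementation: it evaluates \(\psi (\sigma )\) with a dense solve at every step instead of using the threshold test on problem (5), it skips the pole by starting one step to its right, and the names `uncertainty_interval` and `steps_per_bound` are assumptions. The step size \(\Vert \varvec{p}\Vert /(200r)\) is the one reported in Sect. 6.

```python
import numpy as np

def psi(Q, p, r, sigma):
    x = np.linalg.solve(Q + sigma * np.eye(len(p)), p)
    return x @ x - r**2

def uncertainty_interval(Q, p, r, lam1, steps_per_bound=200):
    """March right from the pole -lam1 until psi becomes nonpositive."""
    s_t = np.linalg.norm(p) / (steps_per_bound * r)   # step size
    sigma_l = -lam1
    sigma = sigma_l + s_t                 # skip the pole itself (simplification)
    # the root is at most -lam1 + ||p||/r, so the loop ends after about 200 steps
    while psi(Q, p, r, sigma) > 0:
        sigma_l, sigma = sigma, sigma + s_t
    return sigma_l, sigma                 # the root lies in [sigma_l, sigma_u]

Q = np.diag([-1.0, 1.0]); p = np.array([0.5, -1.8]); r = 1.0   # illustrative data
sigma_l, sigma_u = uncertainty_interval(Q, p, r, lam1=-1.0)
print(sigma_l, sigma_u)
```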

Once the uncertainty interval \([\sigma _{\ell }, \sigma _u]\) is obtained, the bisection method is applied to find the next iterate for \(\sigma \), by setting \(\sigma \) to be the midpoint of the uncertainty interval. The main part of the algorithm is given as follows:

  • Algorithm 2 (Main)

  • Do

    • set \(\sigma =(\sigma _{\ell } +\sigma _u)/2\) and calculate the value of \(\psi (\sigma )\);

    • If \(|\psi (\sigma )|<\varepsilon \), then STOP and return \(\sigma \) and \({x}\);

    • Else if \(\psi (\sigma )>0\), update \(\sigma _{\ell } =\sigma \);

    • Else update \(\sigma _u=\sigma \);

    • End if

  • End do
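The bisection loop of Algorithm 2 can be written compactly as follows. It is a sketch under assumptions: a dense linear solve replaces the matrix-free evaluation of \(\varvec{G}(\sigma )^{-1}\varvec{p}\), the cap `max_iter` is added as a safeguard, and the small diagonal instance is chosen only so that the root (\(\sigma =2\)) is known in closed form.

```python
import numpy as np

def bisection(Q, p, r, sigma_l, sigma_u, eps=1e-8, max_iter=200):
    n = len(p)
    for _ in range(max_iter):
        sigma = 0.5 * (sigma_l + sigma_u)
        x = np.linalg.solve(Q + sigma * np.eye(n), p)   # x = G(sigma)^{-1} p
        value = x @ x - r**2                             # psi(sigma)
        if abs(value) < eps:
            break
        if value > 0:
            sigma_l = sigma       # root is to the right
        else:
            sigma_u = sigma       # root is to the left
    return sigma, x

Q = np.diag([-1.0, 1.0]); p = np.array([0.0, -3.0]); r = 1.0
# interval (-lam_1, -lam_1 + ||p||/r) = (1, 4); only midpoints are evaluated
sigma, x = bisection(Q, p, r, 1.0, 4.0)
print(sigma, x)
```

For this instance \(\psi (\sigma )=9/(1+\sigma )^2-1\), so the iterates converge to \(\sigma =2\) and \({x}=(0,-1)^T\).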

6 Numerical Experiments

First, three small-size examples are used to illustrate the application of the canonical duality theory. Then, randomly generated examples for \(n \in [ 500, 5000]\) are presented to demonstrate the efficiency of our method.

6.1 Small-Size Examples

Example 1

The given coefficients are

The existence conditions do not hold true for this example. There are two global solutions, \(\bar{\varvec{x}}_1=(0.437,-0.9)^T\) and \(\bar{\varvec{x}}_2=(-0.437,-0.9)^T\), which are shown as red points in Fig. 2. In order to show how the perturbation method works, a large perturbation is first introduced to the linear coefficient \(\varvec{f}\) by letting

$$\varvec{p}=(0.5,-1.8)^T.$$

A critical point appears in the interior of \({\mathscr {S}}_a^+\), which is \(\bar{\sigma }=1.676\) (see Fig. 2b). The corresponding optimal solution for the perturbed problem is \(\bar{\varvec{x}}^*_1=(0.74,-0.673)^T\), which is shown as a green point in Fig. 2a. As the perturbation becomes smaller, the solution of the perturbed problem should approach that of the original problem. We then let

$$ \varvec{p}=(0.01,-1.8)^T. $$

The critical point now is \(\bar{\sigma }=1.022\) and the corresponding solution is \(\bar{\varvec{x}}^*_1=(0.456,-0.89)^T\) (see Fig. 2d and c).

As pointed out above, the other global solution, \(\bar{\varvec{x}}_2\), can also be approximated by just choosing a perturbation with the opposite direction.

Let \(\varvec{p}=(-0.5,-1.8)^T\) and \(\varvec{p}=(-0.01,-1.8)^T\). The critical points are the same as those for \(\bar{\varvec{x}}^*_1\), namely \(\bar{\sigma }=1.676\) and \(\bar{\sigma }=1.022\), and the corresponding primal solutions are \(\bar{\varvec{x}}^*_2=(-0.74,-0.673)^T\) and \(\bar{\varvec{x}}^*_2=(-0.456,-0.89)^T\), respectively.
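The numbers quoted in this example can be replayed with a few lines of bisection in eigen-coordinates. The coefficient display of Example 1 is not reproduced in this text, so the sketch assumes \(\varvec{Q}=\mathrm {diag}(-1,1)\) and \(r=1\), values inferred from \(-\lambda _2=-1\), \(-\lambda _1=1\) and the reported solutions; the helper names are hypothetical.

```python
def psi(sigma, lam, p_hat, r):
    return sum(pi**2 / (li + sigma) ** 2 for li, pi in zip(lam, p_hat)) - r**2

def solve_perturbed(lam, p_hat, r, tol=1e-12):
    # the root lies in (-lam_1, -lam_1 + ||p||/r); bisect on the sign of psi
    lo = -lam[0]
    hi = -lam[0] + sum(pi**2 for pi in p_hat) ** 0.5 / r
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, lam, p_hat, r) > 0:
            lo = mid
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sigma, [pi / (li + sigma) for li, pi in zip(lam, p_hat)]

lam, r = [-1.0, 1.0], 1.0                 # assumed: Q = diag(-1, 1), r = 1
for a in (0.5, 0.01, -0.5, -0.01):        # the four perturbations used in the text
    sigma, x = solve_perturbed(lam, [a, -1.8], r)
    print(a, round(sigma, 3), [round(v, 3) for v in x])
```

Under these assumptions the printed values should agree with the reported critical points and solutions to about three decimals; flipping the sign of the perturbation flips the sign of the first solution component and leaves \(\bar{\sigma }\) unchanged.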

In Fig. 2b, we can see that there is no critical point between \(-\lambda _2=-1\) and \(-\lambda _1=1\), which suggests that there will be no local-nonglobal solution. In contrast, there is a critical point between \(-\lambda _2=-1\) and \(-\lambda _1=1\) in Fig. 2d, so by Theorem 4 there must be a local-nonglobal solution, and it should lie near one of the global solutions, depending on the perturbation.

Fig. 2 Example 1: a and c are contours of the primal function and the boundary of the sphere; b and d are the graphs of the dual function

Example 2

The matrix \(\varvec{Q}\) and radius r are the same as those in Example 1 and \(\varvec{f}\) is changed to

$$ \varvec{f}=\begin{pmatrix}0 \\ -3 \end{pmatrix}, $$

which is in the same direction as that in Example 1 but has a larger length. We notice that although \(\sum _{i=1}^k\hat{f}_i^2\ne 0\) is violated, the condition \(\sum _{i=k+1}^n\frac{\hat{f}_i^2}{(\lambda _i-\lambda _1)^2}>r^2\) holds true. Thus, the problem is not in the hard case. There is a critical point in the interior of \({\mathscr {S}}_a^+\), which is shown in Fig. 3b, and it corresponds to the unique global solution of the primal problem, which is the green point in Fig. 3a.

Fig. 3 Example 2: a is the contour of the primal function and boundary of the sphere; b is the graph of the dual function

Example 3

We consider a four-dimensional problem with \(\varvec{Q}\), \(\varvec{f}\) and r being

$$\begin{aligned} \varvec{Q}=\begin{pmatrix} -10 & 0 & 2 & -2 \\ 0 & -3 & -4 & 2 \\ 2 & -4 & 7 & -4\\ -2 & 2 & -4 & 1 \end{pmatrix}, ~~ \varvec{f}=\begin{pmatrix} -10 \\ 6 \\ 10 \\ 9 \end{pmatrix}, {\text {and}}\,r=5. \end{aligned}$$

As shown in Fig. 1, the canonical dual function \(P^d(\sigma )\) has six critical points

$$\begin{aligned} \bar{\sigma }_6=-11.1<\bar{\sigma }_5=-10.49<\bar{\sigma }_4=-1.84<\bar{\sigma }_3=6.08<\bar{\sigma }_2=8.23<\bar{\sigma }_1 =12.58. \end{aligned}$$

It can be verified that \(\bar{\sigma }_1\) belongs to \({\mathscr {S}}_a^+\), i.e., \(\varvec{G}(\bar{\sigma }_1)\succ 0\), which can also be observed from Fig. 1 where all the vertical lines represent eigenvalues of matrix \(\varvec{Q}\). Thus the corresponding solution

$$ \bar{\varvec{x}}_1=(-4.71,1.11,1.25, 0.18)^T $$

is the global solution of the primal problem. Meanwhile, \(\bar{\sigma }_2=8.23\) is a local minimizer of \(P^d(\sigma )\) in \((-\lambda _2,-\lambda _1)\), and thus the corresponding solution

$$ \bar{\varvec{x}}_2=(4.33, 1.05, 0.91, 2.08)^T $$

is the local-nonglobal minimizer.
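The global solution of this example can be recomputed from the stated \(\varvec{Q}\), \(\varvec{f}\) and r by bisection on \(\psi (\sigma )\) over \({\mathscr {S}}_a^+\). The sketch below uses a dense eigen-solver and linear solve for brevity, which is a simplification of the matrix-free procedure described in Sect. 5.

```python
import numpy as np

Q = np.array([[-10.0, 0, 2, -2],
              [0, -3, -4, 2],
              [2, -4, 7, -4],
              [-2, 2, -4, 1]])
f = np.array([-10.0, 6, 10, 9])
r = 5.0

lam = np.linalg.eigvalsh(Q)                 # ascending; lam[0] is lambda_1
lo, hi = -lam[0], -lam[0] + np.linalg.norm(f) / r
for _ in range(100):                        # bisection on psi(sigma) = ||x||^2 - r^2
    sigma = 0.5 * (lo + hi)
    x = np.linalg.solve(Q + sigma * np.eye(4), f)
    if x @ x > r**2:
        lo = sigma                          # psi > 0: root is to the right
    else:
        hi = sigma
print(round(sigma, 2), np.round(x, 2), np.linalg.norm(x))
```

The output should be close to the reported \(\bar{\sigma }_1\) and \(\bar{\varvec{x}}_1\), with \(\Vert {x}\Vert \) equal to r up to rounding.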

6.2 Large-Size Examples

Examples with dimensions of 500, 1000, 2000, 3000, and 5000 are randomly generated, including both general and hard cases. For each given dimension, both cases are tested by ten examples each. Thus, there are one hundred examples in total. All elements of the coefficients, \(\varvec{Q}\), \(\varvec{f}\), and r, are integer numbers in \([-100,100]\). For each example of the hard case, in order to make \(\varvec{f}\) easy to choose, we use a matrix \(\varvec{Q}\) whose smallest eigenvalue has multiplicity one. The vector \(\varvec{f}\) is constructed such that it is perpendicular to the eigenvector of the smallest eigenvalue, and then a proper radius r is selected such that the existence conditions are violated.
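One possible way to generate a hard-case instance of this kind is sketched below. The paper does not describe its generator in detail, so everything here is an assumption: the seed, the small dimension, the projection used to make \(\varvec{f}\) orthogonal to the first eigenvector (which makes \(\varvec{f}\) non-integer), and the rule for picking r.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                         # small size for illustration
A = rng.integers(-100, 101, size=(n, n)).astype(float)
Q = np.triu(A) + np.triu(A, 1).T               # symmetric, integer entries in [-100, 100]
lam, U = np.linalg.eigh(Q)
u1 = U[:, 0]                                   # eigenvector of the smallest eigenvalue

g = rng.integers(-100, 101, size=n).astype(float)
f = g - (g @ u1) * u1                          # remove the component along u1
f_hat = U.T @ f
bound = np.sum(f_hat[1:] ** 2 / (lam[1:] - lam[0]) ** 2)
r = np.ceil(np.sqrt(bound)) + 1.0              # any r with r^2 >= bound violates the existence conditions
print(abs(f @ u1), bound <= r**2)
```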

Table 1 General case and \(\alpha =1e-3\)
Table 2 General case and \(\alpha =1e-4\)
Table 3 Hard case and \(\alpha =1e-3\)

Two approaches are used to calculate the value of \(\psi (\sigma )\): one uses decomposition methods to calculate \(\varvec{G}(\sigma )^{-1}\varvec{p}\), for which we use the ‘left division’ in Matlab, and the other solves the problem (5), for which we use the Matlab function ‘quadprog’. The tolerance parameter ‘TolFun’ of ‘quadprog’ is set to 1e-12. The Lanczos method is implemented by the Matlab function ‘eigs’. Matlab version 7.13 is used, running on a 64-bit Linux platform with quad CPUs.

The step size \(s_t\), the threshold \(\varepsilon _t\) and the termination tolerance \(\varepsilon \) are set to \(\Vert \varvec{p}\Vert /(200r)\), 0, and 1e-8, respectively. For the hard case, a perturbation \(\alpha U_1\) is added to the vector \(\varvec{f}\), and two values of \(\alpha \), 1e-3, and 1e-4, are tried.

Results are shown in Tables 1, 2, 3, and 4. They report the number of examples that are successfully solved (Succ.Solv.), the distance of the optimal solution to the boundary of the sphere (Dist.Boun.), the number of iterations in Algorithm 2 (Main) (Numb.Iter.), and the running time (in seconds) of the algorithm (Runn.Time). The values in the columns of Dist.Boun., Numb.Iter., and Runn.Time are averages over the examples successfully solved. We compare the results of the algorithm adopting ‘left division’ and those of the algorithm adopting ‘quadprog’ in the same table, where LD denotes ‘left division’ and QP denotes ‘quadprog’.

Table 4 Hard case and \(\alpha =1e-4\)

We can see that the examples are solved very accurately, with errors less than 1e-09. The failure in solving some examples is due to ‘left division’ and ‘quadprog’ being unable to handle very nearly singular matrices. For general cases, all the examples can be solved within no more than 30 iterations, while for hard cases, the number of iterations is around 40. From the running time, we notice that our method is capable of handling very large problems in reasonable time. The algorithms using ‘left division’ and ‘quadprog’ have similar performance in accuracy and number of iterations, whereas the one using ‘left division’ needs much less time than the one using ‘quadprog’. However, the one using ‘quadprog’ is able to solve more examples successfully.

7 Concluding Remarks

We have presented a detailed study on the quadratic minimization problem with a sphere constraint. By the canonical duality, this nonconvex optimization problem is equivalent to a unified concave maximization dual problem over a convex domain \({\mathscr {S}}^+_a\), which is true also for many other global optimization problems under certain conditions (see [26, 42,43,44,45,46,47]). Based on this canonical dual problem, necessary and sufficient conditions are obtained for both general and hard cases. In order to solve hard-case problems, a perturbation method and the associated polynomial algorithm are proposed. Numerical results demonstrate that the proposed approach is able to solve large-size problems deterministically and efficiently. Combined with the trust region method, the theory and method presented in this paper can be used to solve general global optimization problems.