1 Introduction

We consider the following quadratic minimization problem, which we call the interval bounded generalized trust region subproblem (GTRS):

$$ \text{(GTRS)}\qquad \begin{array}{r@{\quad }l} p^* := \inf& q_0(x):= x^TAx - 2a^Tx \\ \mbox{s.t.} & \ell\leq\underbrace{x^TBx - 2b^Tx}_{q_1(x)} \leq u. \end{array} $$
(1.1)

Here A, \(B \in { \mathcal{S}^{n}}\), the space of real n×n symmetric matrices, a, \(b\in {\mathbb{R}}^{n}\), and −∞<ℓ≤u<∞. To avoid trivialities, we assume that B≠0 and that the GTRS is feasible. We emphasize that the quadratic constraint is two-sided, and both the objective and constraint functions are possibly nonconvex. Thus we are essentially considering a general nonlinear program with a quadratic objective and two quadratic constraints.

Problem (1.1) is most studied in the one-sided ball-constrained case, i.e., the special case when B=I, b=0, and ℓ=0<u, known as the trust region subproblem (TRS):

$$ \text{(TRS)}\qquad \begin{array}{r@{\quad }l} \inf& x^TAx - 2a^Tx \\ \mbox{s.t.} & x^Tx \le u. \end{array} $$
(1.2)

These problems arise in regularization or trust region methods for unconstrained and constrained nonlinear programming; see, e.g., [7] for a comprehensive discussion. The TRS provides a quadratic model in a trust region around the current point. The use of scaled trust regions with a general B≻0 in the constraint (see, e.g., [15]) motivates our consideration of GTRS. In our general consideration of (1.1), the scaled trust region is defined with a possibly indefinite B; hence, x^TBx can be viewed as the quadratic form induced by the indefinite inner product ⟨x,y⟩:=x^TBy; see, e.g., [13]. This, together with the two-sided constraint, models annular/hyperbolic type regions that allow minimum as well as maximum steplengths. In addition, the possibly nonzero b accounts for a shift of the center of the trust region, which allows trust regions to be built around previous iterates.

Problem (1.2) is explicitly nonconvex, since A is not necessarily positive semidefinite. Nevertheless, it is implicitly convex in that necessary and sufficient optimality conditions have been derived, see Gay [12] and Moré and Sorensen [25]. The conditions are rephrased in a modern primal-dual paradigm in, for example, [7, 10, 11, 15, 29, 34]. The optimality conditions of the TRS can also be derived using the S-lemma; see, e.g., [28]. The S-lemma, developed by Yakubovich [36] in 1971, characterizes when a quadratic (in)equality is a consequence of other quadratic (in)equalities. The earliest results of this kind date back to Finsler [9] and Hestenes and McShane [19], and we refer the readers to [28] for a detailed account of its history and its applications. Turning to the more general problem (1.1), necessary and sufficient optimality conditions have been studied in [24, 34], under certain constraint qualifications, in the special cases when b=0 or u=ℓ. The general case (1.1) was considered in [37, Sect. 2.1] under a dual strict feasibility assumption; see also [3, 8, 20]. Further references are available in the online bibliography [17].

The necessary and sufficient optimality conditions for (1.2) have been the basis for developing efficient algorithms for solving TRS. The classical algorithm [25] by Moré and Sorensen (MS algorithm) applies Newton’s method with backtracking to the so-called secular function, and takes a primal step to the boundary for near hard case instances. This algorithm can be expensive since each iteration requires a full Cholesky factorization. The MS algorithm was later incorporated into the generalized Lanczos trust-region (GLTR) method [15, 16]. The GLTR algorithm uses the Lanczos procedure to obtain a sequence of TRS on low dimensional subspaces with tridiagonal matrices as objectives. The resulting sequence of TRS is then solved efficiently by a variant of the MS algorithm; see [15, Sect. 5.2]. Another related algorithm is the sequential subspace method (SSM) [18]. This algorithm constructs very low dimensional subspaces that include a point obtained from a sequential quadratic programming subproblem. The resulting sequence of problems is diagonalized and solved by a Newton-secant type method. Another line of algorithms involves reformulating the TRS into a parameterized eigenvalue problem. This includes, e.g., the Rendl-Wolkowicz (RW) algorithm [11, 29], the large-scale trust-region subproblem (LSTRS) by Rojas et al. [31, 32] and its variant [21]. These algorithms are factorization free, and can take advantage of well developed eigensolvers for large, sparse matrices. Other algorithms for TRS can be found in, e.g., [33, 35]. Despite the many algorithms developed for TRS, there are currently no algorithms specifically designed for the general problem (1.1), though the special case when B≻0, b=0 and ℓ=0 can be solved by the GLTR method.

In this paper we present characterizations of optimality and propose an algorithm for solving large-scale instances of GTRS (1.1). Specifically, we obtain optimality conditions under a constraint qualification and show that it can be assumed without loss of generality. Our results show that even though the GTRS is a problem consisting of general, possibly nonconvex, quadratic functions, it has both necessary and sufficient optimality conditions as in convex programming, and it has strong duality results as in linear programming, in the sense that when the constraint qualification fails, the problem can be explicitly solved. Thus, the GTRS is implicitly convex like the classical TRS. It sits on the boundary between linear and nonlinear programming, and on the boundary between convex and general nonconvex programming. We also discuss in detail the so-called easy and hard cases corresponding to (non)singularity of the Hessian of the Lagrangian, which was previously only studied for some special cases of (1.1) in [34]. Moreover, as in [11], we include a shift and deflate operation that finds an explicit solution in the hard case (case 2). We then demonstrate that the GTRS can be reduced to an equality constrained problem (i.e., an instance of GTRS with u=ℓ) after finding some suitable generalized eigenvalues and possibly solving a sparse system. To solve this equality constrained problem, we generalize the ideas in [11, 29] and transform the problem into a parameterized generalized eigenvalue problem. The latter problem can then take advantage of specialized solvers for finding generalized eigenvalues/eigenvectors, e.g., the eigifp solver developed in [14]. We compare this approach with the GLTR algorithm (when B≻0) and a simple implementation of Newton’s method with the Armijo line search rule as applied to solve the dual problem of (1.1). Our computational results on large-scale instances show that our approach is competitive with the Newton method and usually outperforms the GLTR algorithm for random sparse positive definite B in both runtime and solution accuracy. In the case when B is indefinite, our approach requires additional inputs for initialization; see Sect. 3.3 for details. Granting such inputs, our computational results show that our approach is competitive with the Newton method.

1.1 Outline

We complete this section with some preliminary notations in Sect. 1.2. We then present the characterizations of optimality in Sect. 2.1. This includes a constraint qualification that can be assumed without loss of generality in the sense that if it fails, then an explicit solution of the GTRS (1.1) can be obtained. In Sect. 2.2, we discuss the so-called easy and hard cases, the shift and deflation procedure to obtain an explicit solution in the hard case, and the reduction to an equality constrained problem. The algorithm for this equality constrained problem and its implementation details are discussed in Sect. 3. We present numerical tests in Sect. 3.4 and finally some concluding remarks in Sect. 4.

1.2 Notation

In this paper, the symbol \({ {\mathbb{R}}^{n}}\) denotes the n-dimensional Euclidean space. For \(v\in { {\mathbb{R}}^{n}}\), ∥v∥ denotes the Euclidean norm of v. The space of n×n symmetric matrices equipped with the trace inner-product is denoted by \({ \mathcal{S}^{n}}\). For \(C\in { \mathcal{S}^{n}}\), λ_max(C) and λ_min(C) denote the largest and smallest eigenvalue of C, respectively. For \(C,D\in { \mathcal{S}^{n}}\), C⪰D and C≻D denote that C−D is positive semidefinite and positive definite, respectively. For a (not necessarily symmetric) square matrix M, tr(M) denotes the trace of M, M^† denotes the Moore-Penrose generalized inverse of M, and \(\operatorname{{Null}}(M)\) and \(\operatorname{{Range}}(M)\) denote its null and range spaces, respectively. The identity matrix is denoted by I, whose dimension should be clear from the context.

For an interval J on the real line, the symbol cl(J) denotes its closure, and ri(J) denotes its relative interior. Finally, for a subspace V of \({ {\mathbb{R}}^{n}}\), the orthogonal complement of V is denoted by V^⊥, and if a vector \(v\in { {\mathbb{R}}^{n}}\) is orthogonal to V, we write v⊥V.

We use a number of acronyms in this paper. TRS refers to the standard trust region subproblem in (1.2), while GTRS refers to the generalized trust region subproblem in (1.1). GTRS_= refers to the special equality constrained GTRS in (3.1). The acronyms D-GTRS, D_≻-GTRS, SDP-GTRS and DSDP-GTRS correspond to various reformulations of the GTRS and are defined in the beginning of Sect. 2.1. RICQ stands for the relative interior constraint qualification defined in (2.6). Finally, RW algorithm stands for the Rendl-Wolkowicz algorithm studied extensively in [11, 29] and Sect. 3, while the GLTR algorithm stands for the generalized Lanczos trust-region method in [15, 16].

2 Properties of GTRS

2.1 Duality and optimality conditions

In this section, we present the characterizations of optimality and duality properties for GTRS (1.1). In particular, we include assumptions that can be made without loss of generality. These assumptions include a constraint qualification, whose failure means that an explicit solution for GTRS can be obtained.

We start by writing down the Lagrangian dual of (1.1). The Lagrangian for (1.1) can be written with one free Lagrange multiplier λ, or with two nonnegative Lagrange multipliers

$$ \lambda_+ := \max\{\lambda,0\}, \lambda_- := -\min\{\lambda,0\}, \lambda=(\lambda_+-\lambda_-): $$
(2.1)
$$\begin{aligned} L(x,\lambda_+,\lambda_-) =& x^TAx - 2a^T x +\lambda_+\bigl(\ell-\bigl(x^TBx - 2b^T x\bigr)\bigr)\\ &{}+\lambda_-\bigl(x^TBx - 2b^T x-u\bigr) \\ =& x^T\bigl(A-(\lambda_+-\lambda_-) B\bigr)x -2\bigl(a-(\lambda_+-\lambda_-) b\bigr)^Tx +\lambda_+\ell-\lambda_-u \\ =& x^T(A-\lambda B)x -2(a-\lambda b)^Tx +\lambda_+\ell-\lambda_-u. \end{aligned}$$

The Lagrangian dual problem of (1.1) can then be reduced to the following problem:

$$ \text{(D-GTRS)}\qquad \begin{array}{r@{\quad }l} d^* := \sup& h(\lambda) + \ell\lambda_+ - u\lambda_-\\ \mbox{s.t.} & A - \lambda B \succeq0, \end{array} $$
(2.2)

where the dual functional

$$\begin{aligned} h(\lambda) =& \inf_x\ x^T(A-\lambda B)x - 2(a - \lambda b)^T x \\ =& \begin{cases} -(a-\lambda b)^T(A - \lambda B)^\dagger(a-\lambda b) & \mathrm{if}\ a-\lambda b\in \operatorname{{Range}}(A-\lambda B) \\ &\mbox{and } A-\lambda B\succeq0, \\ -\infty& \mathrm{otherwise}. \end{cases} \end{aligned}$$

Note that the objective function in D-GTRS is concave. In this paper, we will also look at the following closely related problem formed by enforcing positive definiteness in (2.2) and thus reducing the size of the feasible set:

$$ \text{(D$_{\succ}$-GTRS)}\qquad \begin{array}{r@{\quad }l} d_{\succ}^* := \sup& h_{\succ}(\lambda) + \ell\lambda_+ - u\lambda_- \\ \mbox{s.t.} & A - \lambda B \succ0, \end{array} $$
(2.3)

where

$$ h_\succ(\lambda) := -(a - \lambda b)^T(A-\lambda B)^{-1}(a - \lambda b). $$
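
To make the dual functional concrete, the following sketch (in Python/NumPy, used purely for illustration and not part of the original development; the helper names dual_value and dual_objective are ours) evaluates h(λ) through a spectral decomposition of A−λB, returning −∞ when λ is dual infeasible, and then assembles the objective of (2.2).

```python
import numpy as np

def dual_value(A, B, a, b, lam, tol=1e-10):
    """h(lam) = -(a - lam*b)^T (A - lam*B)^dagger (a - lam*b), or -inf when lam
    is dual infeasible.  A sketch following (2.2), not the authors' code."""
    M = A - lam * B
    r = a - lam * b
    w, V = np.linalg.eigh(M)                 # spectral decomposition of A - lam*B
    if w.min() < -tol:
        return -np.inf                       # A - lam*B is not positive semidefinite
    c = V.T @ r
    if np.any(np.abs(c[np.abs(w) <= tol]) > tol):
        return -np.inf                       # a - lam*b not in Range(A - lam*B)
    pos = w > tol
    return -np.sum(c[pos] ** 2 / w[pos])

def dual_objective(A, B, a, b, ell, u, lam):
    """Objective of (2.2): h(lam) + ell*lam_+ - u*lam_-."""
    return dual_value(A, B, a, b, lam) + ell * max(lam, 0.0) + u * min(lam, 0.0)
```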

It is useful to study this problem since positive definiteness as in (2.3) is maintained in the algorithm presented below. Finally, we write down the semidefinite programming (SDP) relaxation of GTRS

$$ \text{(SDP-GTRS)}\qquad \begin{array}{r@{\quad }l} p_{SDP}^* := \inf& \mathrm{tr}(AX) - 2a^Tx\\ \mathrm{s.t.}& \ell\le\mathrm{tr}(BX) - 2b^Tx\le u,\\ & X \succeq xx^T. \end{array} $$
(2.4)

By a Schur complement argument, the final inequality is equivalent to the linear (semidefinite) constraint \(\left[ \begin{array}{c@{\quad }c} 1 & x^T\\ x & X \end{array}\right] \succeq0\).

Proposition 2.1

The dual of SDP-GTRS is

$$ \textit{(DSDP-GTRS)}\qquad \begin{array}{r@{\quad }l} d_{DSDP}^* := \sup& \ell\lambda_+ -u \lambda_- - \gamma\\ \mathrm{s.t.}& \begin{bmatrix} \gamma& -(a-\lambda b)^T\\ -(a-\lambda b) & A - \lambda B \end{bmatrix} \succeq0, \end{array} $$
(2.5)

where the multipliers are defined in (2.1). Moreover, (2.5) is equivalent to (2.2) with \(d_{DSDP}^{*} =d^{*}\).

Proof

It is routine to show that the dual of (2.4) is given by (2.5). Furthermore, by considering the Schur complement, for any γ, \(\lambda\in {\mathbb{R}}\), we have

$$\left[ \begin{array}{c@{\quad }c} \gamma& -(a-\lambda b)^T\\ -(a-\lambda b) & A - \lambda B \end{array}\right] \succeq0 \quad \Leftrightarrow\quad \begin{cases} \gamma\ge(a-\lambda b)^T(A-\lambda B)^{\dagger}(a-\lambda b)\\ a - \lambda b\in \operatorname{{Range}}(A-\lambda B)\\ A-\lambda B\succeq0. \end{cases} $$

Thus, (2.5) is equivalent to (2.2). □

2.1.1 Assumptions and properties

We consider the following assumptions on the GTRS (1.1).

Assumption 2.1

  1. 1.

    B≠0.

  2. 2.

    GTRS (1.1) is feasible.

  3. 3.

    The following relative interior constraint qualification holds

    $$ (\mathit{RICQ}) \qquad \mathrm{tr}(B \hat{X}) - 2b^T\hat{x} \in\mathrm{ri}\bigl([\ell,u]\bigr), \quad \text{ for some } \hat{X}\succ\hat{x}{\hat{x}}^T. $$
    (2.6)
  4. 4.

    GTRS is bounded below.

  5. 5.

    D-GTRS (2.2) is feasible.

Note that Assumption 2.1 together with weak duality yields

$$ -\infty< d^* \le p^* < +\infty. $$
(2.7)

We show in the next theorem that Assumption 2.1 is reasonable in the sense that it can be made without loss of generality, i.e., if the assumption fails, then an explicit solution or a simplification can easily be obtained.

Theorem 2.1

The following holds for the Items in Assumption 2.1.

  1. (i)

    The Items 1, 2, 3 in Assumption 2.1 can be made in the order given, without loss of generality, i.e., if an assumption fails then an explicit solution can easily be obtained.

  2. (ii)

    If Items 1, 2, 3 in Assumption 2.1 hold and b=0, then Item 4 implies Item 5.

  3. (iii)

    If Item 4 in Assumption 2.1 fails, then Item 5 fails.

Proof

Let \(x^{*}:=B^{\dagger}b\) and \(q_{1}^{*}:=q_{1}(x^{*})\). We now provide the details about how the assumptions hold in the order given.

  1. (i)
    • Suppose that B=0, i.e., the constraint is linear. If \(A\succeq0\), \(a\in \operatorname{{Range}}(A)\) and the unconstrained minimum \(\bar{x}=A^{\dagger}a\) satisfies ℓ≤−2b^T(A^†a)≤u, then \(\bar{x}\) solves GTRS. Otherwise, the optimum, if it exists, is attained on one of the two boundaries. Therefore, we can change the linear inequality constraint to an equality, and we can again check for an unconstrained minimum after the appropriate substitution using the linear constraint. Therefore, the assumption that B≠0 can be made without loss of generality.

    • First, GTRS is infeasible if, and only if, the following three conditions hold:

      $$ \begin{array}{l} B \text{ is semidefinite};\\ b\in \operatorname{{Range}}(B) \quad\text{(equivalently $b=Bx^{*}$)};\\ q^*_1 < \ell\text{ if } B \preceq0; \quad\text{or} \quad q^*_1 > u \text{ if } B \succeq0. \end{array} $$

      More precisely, the characterization for infeasibility follows from the fact that q_1 is bounded below with minimum at x^* (resp. bounded above with maximum at x^*) if, and only if, \(b\in \operatorname{{Range}}(B)\) and B⪰0 (resp. B⪯0). We can verify the semidefiniteness of B by finding the largest and smallest eigenvalues. The range condition follows from finding the best least squares solution of Bx=b. If the range condition holds, then the final inequalities can be checked by evaluating q_1 at x^*. Thus, we conclude that feasibility can be verified by finding λ_max(B) and λ_min(B) and solving a system of equations (a small numerical sketch of this test is given after the proof), and hence, Assumption 2.1, Item 2, can be made without loss of generality.

    • Suppose that Assumption 2.1, Items 1–2 hold, but the RICQ Assumption 2.1, Item 3 fails. Then we can find an explicit solution for GTRS. More precisely, since the RICQ fails, we have \(b\in \operatorname{{Range}}(B)\), and either B⪯0 with ℓ=sup_x x^TBx−2b^Tx or B⪰0 with u=inf_x x^TBx−2b^Tx. In either case, the conditions imply that the feasible set is \(x^{*}+\operatorname{{Null}}(B)\). We can then use a nullspace representation and substitute into the objective function to obtain an explicit solution or realize that the problem is unbounded below. In conclusion, if RICQ fails, we can obtain an explicit solution or realize that the problem is unbounded.

  2. (ii)

    This conclusion is proved as Proposition A.1 in the Appendix.

  3. (iii)

    If Assumption 2.1, Item 5 holds, then d^*>−∞. Combining this with weak duality, we see that p^*≥d^*>−∞, i.e., Item 4 holds.

 □
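
The feasibility test described in the proof of Item 2 above can be carried out numerically. Below is a minimal Python/NumPy sketch (our illustration, with ad hoc tolerances), assuming B≠0 as in Item 1; the function name is ours.

```python
import numpy as np

def gtrs_is_feasible(B, b, ell, u, tol=1e-10):
    """Check feasibility of ell <= x^T B x - 2 b^T x <= u following the
    characterization in the proof of Theorem 2.1 (illustrative sketch only)."""
    lam_min, lam_max = np.linalg.eigvalsh(B)[[0, -1]]
    if lam_min < -tol and lam_max > tol:
        return True                    # B indefinite: q_1 takes all real values
    # B is (numerically) semidefinite; check whether b lies in Range(B).
    x_star, *_ = np.linalg.lstsq(B, b, rcond=None)
    if np.linalg.norm(B @ x_star - b) > tol * max(1.0, np.linalg.norm(b)):
        return True                    # b not in Range(B): q_1 is unbounded both ways
    q1_star = x_star @ B @ x_star - 2 * b @ x_star
    if lam_max <= tol:                 # B <= 0: q1_star is the maximum of q_1
        return q1_star >= ell
    else:                              # B >= 0: q1_star is the minimum of q_1
        return q1_star <= u
```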

Remark 2.1

From the proof of Theorem 2.1(i) above, it is not hard to see that if we assume Items 1 and 2 of Assumption 2.1, then the RICQ (2.6) fails if and only if the constraint is convex and no Slater point (strict feasibility) exists, or equivalently, when the range of values of the quadratic q_1(x) in the constraint meets the interval [ℓ,u] at only one point. Thus, in the case of an equality constraint, i.e., when u=ℓ, the RICQ holds if and only if inf_x q_1(x)<ℓ=u<sup_x q_1(x), which is precisely condition (3.2) in [24].

For the rest of the paper, we assume that Assumption 2.1 holds.

2.1.2 Weak duality

We have the following weak duality result describing the relationship between the optimal values of the above optimization problems.

Proposition 2.2

The optimal values satisfy

$$ -\infty\leq d^*_\succ\leq d^*=d^*_{DSDP} \leq p^*_{SDP} \leq p^* < +\infty. $$
(2.8)

Moreover, if \(d^{*}_{\succ}> -\infty\), then \(d^{*}_{\succ}= d^{*}\).

Proof

The proof of (2.8) follows from the feasibility Assumption 2.1, weak duality, Proposition 2.1 and the definitions.

Now suppose that \(d^{*}_{\succ}\) is finite and suppose to the contrary that \(d^{*}_{\succ}< d^{*}\). Then there exist λ_1 and λ_2 feasible for (2.2) and (2.3), respectively, such that \(f(\lambda_{1}) > d^{*} - \frac{d^{*} - d^{*}_{\succ}}{2}\) and \(f(\lambda_{2}) > d^{*}_{\succ}- \frac{d^{*} - d^{*}_{\succ}}{2}\), where f(λ) denotes the common objective function of the two problems. Since f(λ) is concave and \(\frac{\lambda_{1}+\lambda_{2}}{2}\) is feasible for (2.3), we obtain that

$$ d^*_\succ\ge f \biggl(\frac{\lambda_1+\lambda_2}{2} \biggr)\ge \frac{f(\lambda_1) + f(\lambda_2)}{2} > \frac{d^*+ d^*_\succ}{2} - \frac{d^*- d^*_\succ}{2} = d^*_\succ, $$

a contradiction. This completes the proof. □

2.1.3 Strong duality and characterization of optimality

We show below that equality holds for four of the five optimal values in (2.8); and, moreover, two of them are attained. We start with the following technical lemma in which the quadratic forms in (1.1) are linearized. Recall that a Schur complement argument implies that the quadratic constraint X⪰xx^T in (2.9) is equivalent to the linear constraint \(\left[ \begin{array}{c@{\quad }c} 1 & x^T\\ x & X \end{array}\right] \succeq0\).

Lemma 2.1

Let C≻A and let

$$ \begin{array}{r@{\quad }l} p_{C,SDP}^*:=\inf& \mathrm{tr}(CX) - 2a^Tx\\ \mathrm{s.t.}& \ell\le\mathrm{tr}(BX) - 2b^Tx\le u,\\ & X \succeq xx^T. \end{array} $$
(2.9)

Then, the optimal value \(p_{C,SDP}^{*}\) is finite and attained.

Proof

The programs (D-GTRS) and (SDP-GTRS) are dual to each other. Assumption 2.1 implies that (2.9) is feasible and that there exists \(\hat{\lambda}\) satisfying the Slater condition \(C-\hat{\lambda}B \succ A-\hat{\lambda}B \succeq 0\), i.e., a Slater point exists for (D-GTRS) when A is replaced by C. By conic duality, this means that the optimal value of (2.9), the dual of this Slater-feasible program, is finite and attained. □

We next prove strong duality between (1.1) and (2.2). This technical result is used repeatedly throughout this paper. Similarly as in [6], we make use of bounds on the rank of the extreme points of SDP representable sets [1, 2, 27] to prove the exactness of the SDP relaxations.

Theorem 2.2

Recall that Assumption 2.1 holds. Then the following holds for GTRS:

  1. (i)

    The optimal values of GTRS and its SDP relaxation are equal,

    $$p_{SDP}^* = p^*. $$
  2. (ii)

    Strong duality holds for GTRS, i.e., p^*=d^* and the dual optimal value d^* is attained. Moreover, equality holds for four of the five optimal values in (2.8),

    $$ d^*_\succ\le d^* = d^*_{DSDP} = p^*_{SDP} = p^*. $$
    (2.10)

    Furthermore, if \(d^{*}_{\succ}> -\infty\), then all quantities in (2.10) are equal.

Proof

(i) For each ϵ≥0, let \(A_{\epsilon}:=A+\epsilon I\). Consider the following perturbation of (1.1)

$$\begin{aligned} p_\epsilon^* & = \inf\bigl\{ x^TA_\epsilon x - 2a^Tx:\; \ell\le x^TBx - 2b^Tx\le u\bigr\} , \end{aligned}$$

and its SDP relaxation

$$\begin{aligned} v_\epsilon^* & = \inf\bigl\{ \mathrm{tr}(A_\epsilon X) - 2a^Tx:\; \ell\le \mathrm{tr}(BX) - 2b^Tx \le u, X \succeq xx^T\bigr\} \\ & = \inf \biggl\{ \mathrm{tr}(A_\epsilon X) - 2a^Tx:\; \ell\le\mathrm{tr}(BX) - 2b^Tx \le u, U = \left[\begin{array}{c@{\quad }c} 1&x^T\\ x&X \end{array} \right] \succeq0 \biggr\} . \end{aligned}$$

Then \(p_{\epsilon}^{*}\ge v_{\epsilon}^{*}\), for all ϵ≥0.

We proceed by first showing that \(p_{\epsilon}^{*} = v_{\epsilon}^{*}\) for each fixed ϵ>0. We start with the case when ℓ=u. In this case, the SDP relaxation can be written as

$$ \begin{array}{r@{\quad }l} \inf&\mathrm{tr} \left( \left[ \begin{array}{c@{\quad }c} 0&-a^T\\ -a&A_\epsilon \end{array}\right] U \right) \\ \mathrm{s.t.}&\mathrm{tr} \left( \left[ \begin{array}{c@{\quad }c} 0&-b^T\\ -b&B \end{array}\right] U \right) = u,\\ & U_{1,1} = 1, U \succeq0. \end{array} $$

The optimal set of the above SDP is nonempty by Lemma 2.1. Since the cone of positive semidefinite matrices does not contain lines, the optimal value must be attained at an extreme point U^* of the feasible set. Since there are two equality constraints, by [27, Theorem 2.2], the rank r_U of the extreme point satisfies

$$ r_U(r_U+1) \le4. $$

Since U^*≠0, we must have r_U=1 and hence \(U^{*}=\left[ \begin{array}{c} 1\\ x^{*} \end{array}\right] \left[ \begin{array}{c} 1\\ x^{*} \end{array}\right]^{T}\), for some x^*. Then x^* is feasible for (1.1) and we conclude that \(p_{\epsilon}^{*} = v_{\epsilon}^{*}\).

Next, we consider the case when ℓ<u. In this case, the SDP relaxation can be written as

$$ \begin{array}{r@{\quad }l} \inf&\mathrm{tr} \left( \left[ \begin{array}{c@{\quad }c} 0&-a^T\\ -a&A_\epsilon \end{array}\right] U\right )\\ \mathrm{s.t.}& \mathrm{tr} \left( \left[ \begin{array}{c@{\quad }c} 0&-b^T\\ -b&B \end{array}\right] U \right) - \alpha= \ell,\\ & \mathrm{tr} \left( \left[ \begin{array}{c@{\quad }c} 0&-b^T\\ -b&B \end{array}\right] U \right) +\beta= u, \\ & U_{1,1} = 1, U \succeq0, \alpha\ge0, \beta\ge0. \end{array} $$

The optimal set of the above SDP is nonempty, again by Lemma 2.1. Hence, the optimal value must be attained at an extreme point (U^*,α^*,β^*) of the feasible set. Since there are three equality constraints, by [27, Theorem 2.2], the ranks r_U, r_α and r_β of this extreme point have to satisfy

$$ r_U(r_U+1) + r_\alpha(r_\alpha+1) + r_\beta(r_\beta+1) \le6. $$
(2.11)

Notice that at optimality, α^* and β^* cannot both be zero. This fact together with (2.11) and the fact that U^*≠0 shows that r_U=1 and hence \(U^{*}=\left[ \begin{array}{c} 1\\ x^{*} \end{array}\right] \left[ \begin{array}{c} 1\\ x^{*} \end{array}\right]^{T}\), for some x^*. Then x^* is feasible for (1.1) and we again conclude that \(p_{\epsilon}^{*} = v_{\epsilon}^{*}\).

Hence, we have shown that \(p_{\epsilon}^{*} = v_{\epsilon}^{*}\), for all ϵ>0. Now, let (x,X) be feasible for the SDP relaxation (2.4). Then we have

$$ v^*_0 \le p_0^*\le\lim_{\epsilon\downarrow0} p_\epsilon^* = \lim_{\epsilon\downarrow0} v_\epsilon^* \le\lim _{\epsilon\downarrow0} \mathrm{tr}(A_\epsilon X) -2a^Tx = \mathrm{tr}(A X) -2a^Tx. $$

Taking the infimum over the feasible set of the SDP relaxation gives the desired equality \(v_{0}^{*} = p_{0}^{*}\). This completes the proof.

(ii) Recall that the Lagrangian dual of the SDP relaxation (2.4) is given by (2.5). Moreover, from RICQ (2.6), the generalized Slater condition for (2.4) holds with the Slater point \((\hat{x},\hat{X})\). Hence \(p_{0}^{*}=v_{0}^{*} = d^{*}_{DSDP} = d^{*}\) and the dual optimal values are attained. This proves (2.10). The rest of the claim follows from Proposition 2.2. □
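
For small and medium instances, the relaxation (2.4) can also be handed directly to a generic SDP solver to cross-check p^* numerically. The sketch below is only an illustration of the lifted formulation, not the algorithm proposed in Sect. 3; it assumes the cvxpy modeling package (with an SDP-capable backend such as the bundled SCS solver) is available, and the function name is ours.

```python
import cvxpy as cp

def gtrs_sdp_bound(A, B, a, b, ell, u):
    """Solve the SDP relaxation (2.4) in the lifted variable U = [[1, x^T], [x, X]];
    A, B are symmetric NumPy matrices, a, b are 1-D arrays (illustrative sketch)."""
    n = A.shape[0]
    U = cp.Variable((n + 1, n + 1), PSD=True)   # encodes X >= x x^T via U >= 0
    X, x = U[1:, 1:], U[1:, 0]
    q1 = cp.trace(B @ X) - 2 * b @ x
    prob = cp.Problem(cp.Minimize(cp.trace(A @ X) - 2 * a @ x),
                      [U[0, 0] == 1, q1 >= ell, q1 <= u])
    prob.solve()     # under Assumption 2.1, Theorem 2.2 gives prob.value = p*
    return prob.value, x.value
```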

We remark that, when ℓ<u in the above proof, the equality \(p^{*}_{\epsilon}= v^{*}_{\epsilon}\), for all ϵ>0, can also be obtained as a consequence of [37, Theorem 2.3]. We are now ready to characterize optimality for GTRS. We note that our constraint qualification (2.6) is different from that of [34, Theorem 2.1]. In particular, we do not require ℓ<u when B is indefinite with the optimal solution in its kernel and b=0. Moreover, as seen in Theorem 2.1, Assumption 2.1 can be made without loss of generality.

Theorem 2.3

Recall that Assumption 2.1 holds. A point x^* is a solution to GTRS (1.1) if, and only if, for some (Lagrange multiplier) \(\lambda^{*} \in {\mathbb{R}}\), we have

$$ \begin{array}{cc} \left . \begin{array}{cc} (A-\lambda^*B)x^* = a - \lambda^* b, \\ A-\lambda^*B \succeq0, \end{array} \right \} \quad \textit{dual feasibility} \\ \ell\le{x^*}^T B x^* - 2b^Tx^* \leq u, \quad \textit{primal feasibility} \\ \left. \begin{aligned} \bigl(\lambda^*\bigr)_+ \bigl(\ell- {x^*}^T B x^* + 2b^T x^*\bigr) = 0, \\ \bigl(\lambda^*\bigr)_- \bigl({x^*}^T B x^* - 2b^T x^* - u\bigr) = 0. \end{aligned} \right\} \quad \textit{complementary slackness} \\ \end{array} $$
(2.12)

Proof

Suppose first that x^* is a solution to (1.1). Then the third relation (primal feasibility) in (2.12) holds. Furthermore, by Theorem 2.2, strong duality holds. Hence, there exists λ^* such that

$$\begin{aligned} p^* & = {x^*}^TA{x^*} - 2a^T{x^*} = \inf _{\ell\le x^TBx-2b^Tx\le u}\ x^TAx - 2a^Tx \end{aligned}$$
(2.13)
$$\begin{aligned} & = \sup_{\lambda}\inf_{x}\ x^TAx - 2a^Tx + \lambda_+\bigl(\ell- x^T B x + 2b^Tx\bigr) + \lambda_-\bigl(x^T B x - 2b^Tx - u\bigr) \\ & = \inf_x \ x^TAx - 2a^Tx + \lambda^*_+\bigl(\ell- x^T B x + 2b^Tx\bigr) + \lambda^*_- \bigl(x^T B x - 2b^Tx - u\bigr). \end{aligned}$$
(2.14)

The first two relations in (2.12) follow immediately from the optimality condition of the unconstrained optimization problem (2.14). Moreover, from (2.14) and primal feasibility of x^*, we have

$$\begin{aligned} p^* = & \inf_x \ x^TAx - 2a^Tx + \lambda^*_+\bigl(\ell- x^T B x + 2b^Tx\bigr) + \lambda^*_- \bigl(x^T B x - 2b^Tx - u\bigr) \\ \le & {x^*}^TA{x^*} - 2a^T{x^*} + \lambda^*_+\bigl( \ell- {x^*}^T B {x^*} + 2b^Tx^*\bigr) + \lambda^*_- \bigl({x^*}^T B {x^*} - 2b^Tx^* - u\bigr)\\ \le&{x^*}^TA{x^*} - 2a^T{x^*}. \end{aligned}$$

Comparing this last relation with (2.13), we obtain the complementary slackness expressions in the last two relations of (2.12).

Next, assume that x^* is primal feasible and that there exists λ^* so that (2.12) holds. Then we have the following chain of inequalities.

$$\begin{aligned} p^* \ge d^* = &\sup_{\lambda}\inf_x \ x^TAx - 2a^Tx + \lambda _+\bigl(\ell - x^T B x + 2b^Tx\bigr)\\ &{} + \lambda_-\bigl(x^T B x - 2b^Tx - u\bigr) \\ \ge &\inf_x \ x^TAx - 2a^Tx + \lambda^*_+\bigl(\ell- x^T B x + 2b^Tx\bigr) + \lambda^*_- \bigl(x^T B x - 2b^Tx - u\bigr) \\ = & {x^*}^TA{x^*} - 2a^T{x^*} + \lambda^*_+\bigl(\ell- {x^*}^T B {x^*} + 2b^Tx^*\bigr)\\ &{} + \lambda^*_- \bigl({x^*}^T B {x^*} -2b^Tx^* - u\bigr) \\ = & {x^*}^TA{x^*} - 2a^T{x^*} \ge p^*, \end{aligned}$$

where the first equality follows from the definition of the Lagrangian dual problem, the second equality follows from the first two relations of (2.12), the third equality follows from the last two relations of (2.12), while the last inequality follows from the primal feasibility of x^*. Thus, in particular, we have \({x^*}^TA{x^*} - 2a^T{x^*} = p^*\), and so x^* solves (1.1). □
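
Given a candidate pair (x^*, λ^*), the conditions (2.12) are straightforward to verify numerically. The following Python/NumPy sketch (our illustration; tolerances are arbitrary) checks each group of conditions from Theorem 2.3.

```python
import numpy as np

def check_gtrs_optimality(A, B, a, b, ell, u, x, lam, tol=1e-8):
    """Check the optimality conditions (2.12) at a candidate (x, lam).
    Returns a dict of booleans; an illustrative sketch, not the paper's code."""
    M = A - lam * B
    q1 = x @ B @ x - 2 * b @ x
    lam_p, lam_m = max(lam, 0.0), -min(lam, 0.0)   # lam = lam_+ - lam_- as in (2.1)
    return {
        "stationarity": np.linalg.norm(M @ x - (a - lam * b)) <= tol,
        "dual_psd":     np.linalg.eigvalsh(M)[0] >= -tol,
        "primal_feas":  ell - tol <= q1 <= u + tol,
        "compl_lower":  abs(lam_p * (ell - q1)) <= tol,
        "compl_upper":  abs(lam_m * (q1 - u)) <= tol,
    }
```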

Remark 2.2

We note that in proving the optimality conditions (i.e., versions of Theorem 2.3) in [24, 34] for the special cases of GTRS (1.1), the authors did not assume Items 4 and 5 of Assumption 2.1. Indeed, suppose we only assume Items 1 through 3 of Assumption 2.1. When b=0, if x^* is known to be a solution of the GTRS (1.1), then the problem is bounded below and thus, by Theorem 2.1(ii), we conclude that Assumption 2.1 holds. On the other hand, if there exist λ^* and x^* such that (2.12) holds, then we conclude from Theorem 2.1(iii) that Assumption 2.1 holds. Thus, our result is a strict improvement of [34, Theorem 2.1], which assumes in addition b=0, since our RICQ (2.6) is weaker than condition (2.5) in [34, Theorem 2.1]. However, since it is not known to us whether Item 4 implies Item 5 when b≠0, we cannot assert that our Theorem 2.3 reduces to [24, Theorem 3.2], even though our RICQ is equivalent to their constraint qualifications.

Example 2.1

We illustrate that (2.10) can fail if Assumption 2.1 is violated. Moreover, attainment for GTRS can fail. Consider

$$ \begin{array}{r@{\quad }l} \inf& x_1x_2\\ \mathrm{s.t.}& x_1^2 = 0. \end{array} $$

Then clearly, the optimal value of the above optimization problem is zero. Moreover, the optimal value of its SDP relaxation, which is given below, is also zero:

$$ \begin{array}{r@{\quad }l} \inf& X_{12}\\ \mathrm{s.t.}& X_{11} = 0,\ X\succeq0. \end{array} $$

However, it is not hard to observe that the dual problem is infeasible and thus the dual optimal value d^*=−∞. Hence, we have \(p^{*} = p^{*}_{SDP} > d^{*}\). Indeed, Assumption 2.1 Items 3 and 5 are violated and thus Theorem 2.2 does not apply.

In the unconstrained case, boundedness below of q_0(x) yields attainment. This is not the case for GTRS. Consider

$$ \begin{array}{r@{\quad }l} \inf& x_2^2\\ \mathrm{s.t.}& x_1x_2 = 1. \end{array} $$

The optimal value 0 is unattained. Note that λ=0 is the unique dual feasible point. Further results on attainment can be found in, e.g., [23].

2.2 Easy and hard cases and regular pencils

From the first two relations in (2.12), it is obvious that for TRS we have to work with A−λI and the eigenvalues of A. Similarly, for GTRS, we have to deal with generalized eigenvalues and the matrix pencil A−λB corresponding to the matrix pair (A,B). In the latter case, due to the arbitrariness of B, several singular cases may arise, see, e.g., [26]. Below are some prototypical examples.

  1. 1.

    Consider the case when A and B have a common nullspace vector, i.e., Av=Bv=0 for some v≠0. Then the corresponding generalized eigenvalues include all of \({\mathbb{R}}\), i.e., the determinant det(A−tB) is identically 0 in t. This case is problematic since a small perturbation in the matrices can cause a big change in the solution set.

  2. 2.

    Consider a pair (A,B) whose generalized eigenpairs include complex eigenvalues. This case is problematic as the dual problem is infeasible.

2.2.1 Regular case

To avoid the aforementioned singular matrix cases, we assume from now on that the matrix pencil is definite. This corresponds to the so-called (positively) regular case in [34].

$$ \text{Regular Pencil } \qquad \widehat{C}:=A - \widehat{\lambda}B \succ0, \text{ for some } \widehat{\lambda}\in {\mathbb{R}}. $$
(2.15)

Condition (2.15) covers most of the interesting cases. Indeed, it is shown in [34, Lemma 2.3] that if (2.15) fails and det(A−tB) is not an identically zero polynomial in t, then the feasible set of (2.2) is a singleton set. Condition (2.15) also guarantees the existence of a solution to (1.1); the proof of this fact follows similarly to that of Lemma 2.1. For convenience in the discussions below, we state this result as a proposition.

Proposition 2.3

If condition (2.15) holds, then the GTRS problem (1.1) has an optimal solution.

Proof

See the proofs of Lemma 2.1 and Theorem 2.2(i). □

Under the additional regularity assumption (2.15), we would like to classify the GTRS into the so-called easy case and hard cases analogously to the classical trust region subproblem. To this end, we follow a similar line of arguments as in [34, Sect. 3], which analyzes the case when ℓ=u=1 and b=0. We make use of the fact that under (2.15), the matrix pencil A−λB is diagonalizable by a congruence, see, e.g., [22, Theorem 10.1]. We include a proof in Theorem 2.4 below for the convenience of the readers.

Theorem 2.4

Recall that the pencil is (positively) regular. Then, there exists an invertible matrix S and diagonal matrices D_1 and D_2 such that

$$ A = S D_1 S^T \quad \textit{and}\quad B = S D_2 S^T. $$

Furthermore, A−λB⪰0 if, and only if,

$$ \underline{\lambda} := \max_{\{i:\beta_i < 0\}} \frac{\alpha _i}{\beta _i} \le\lambda\le\min_{\{i: \beta_i > 0\}}\frac{\alpha_i}{\beta _i} =: \overline{\lambda}, $$
(2.16)

where α_i and β_i are the ith diagonal entries of D_1 and D_2, respectively. Moreover, \((\underline{\lambda},\overline{\lambda})\) is a nonempty open interval; at least one of the end points of the interval has to be finite; and,

$$|\alpha_i| + |\beta_i| > 0,\quad \forall i. $$

Proof

Let \(\widehat{C}\) be as in (2.15). Without loss of generality, we can assume \(\widehat{\lambda}\neq0\). Let \(\widehat{C} =PDP^{T}\) denote the spectral decomposition (orthogonal diagonalization) of \(\widehat{C}\). Then the diagonal matrix D satisfies \(0\prec D=P^{T}(A-\widehat{\lambda}B)P\) and \(I=D^{-1/2} P^{T}(A-\widehat{\lambda}B)PD^{-1/2}\). Let Q give the orthogonal diagonalization of \(D^{-1/2}P^{T}APD^{-1/2}\), i.e., \(Q^{T}D^{-1/2}P^{T}APD^{-1/2}Q=D_{1}\). Define \(S=PD^{1/2}Q\). Then it follows that \(A=SD_{1}S^{T}\) and \(B = -\frac{1}{\widehat{\lambda}}S(I - D_{1})S^{T}\).

Since positive semidefiniteness is preserved under congruence, we see immediately that A−λB⪰0 if, and only if,

$$ D_1 - \lambda D_2\succeq0. $$

Since D_1 and D_2 are diagonal, we obtain further that A−λB⪰0 is equivalent to

$$ \underline{\lambda} := \max_{\{i:\beta_i < 0\}}\frac{\alpha _i}{\beta _i} \le\lambda \le\min_{\{i: \beta_i > 0\}}\frac{\alpha_i}{\beta _i} =: \overline{\lambda}, $$

where α_i and β_i are the ith diagonal entries of D_1 and D_2, respectively. Next, note that \(\lambda\in(\underline{\lambda},\overline{\lambda})\) if, and only if, A−λB≻0. Thus, by (2.15), \((\underline{\lambda},\overline{\lambda})\) is a nonempty open interval. Furthermore, since B≠0, it follows that at least one of the end points of the interval has to be finite. Finally, we note that α_i and β_i cannot both be zero, because of (2.15). □
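
The construction in the proof translates directly into a numerical routine. The dense Python/NumPy sketch below (our illustration, not the authors' implementation) assumes a known λ̂≠0 with A−λ̂B≻0 and returns S together with the diagonals α, β of D_1 and D_2; the interval of (2.16) then follows from the second helper.

```python
import numpy as np

def diagonalize_pencil(A, B, lam_hat):
    """Simultaneous diagonalization by congruence following the proof of
    Theorem 2.4; requires C_hat = A - lam_hat*B > 0 and lam_hat != 0.
    Returns S, alpha, beta with A = S*diag(alpha)*S^T, B = S*diag(beta)*S^T."""
    C = A - lam_hat * B
    d, P = np.linalg.eigh(C)                   # C = P diag(d) P^T with d > 0
    Dih = P / np.sqrt(d)                       # P D^{-1/2}
    M = Dih.T @ A @ Dih                        # D^{-1/2} P^T A P D^{-1/2}
    alpha, Q = np.linalg.eigh(M)               # Q^T M Q = diag(alpha)
    S = (P * np.sqrt(d)) @ Q                   # S = P D^{1/2} Q
    beta = (alpha - 1.0) / lam_hat             # from B = -(1/lam_hat) S (I - D_1) S^T
    return S, alpha, beta

def multiplier_interval(alpha, beta):
    """The endpoints lam_lower, lam_upper of (2.16), on which A - lam*B >= 0."""
    neg, pos = beta < 0, beta > 0
    lam_lower = np.max(alpha[neg] / beta[neg]) if neg.any() else -np.inf
    lam_upper = np.min(alpha[pos] / beta[pos]) if pos.any() else np.inf
    return lam_lower, lam_upper
```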

We remark that the equivalence in (2.16) was established earlier in [24, Theorem 5.3]; see also [34, Lemma 3.1]. Next, for each real \(\lambda\in\mathrm{cl}{(\underline{\lambda},\overline {\lambda})}\), the closure of the interval, we define the first order stationary point

$$ x(\lambda):=(A - \lambda B)^\dagger(a-\lambda b), $$
(2.17)

and the constraint evaluated at x(λ)

$$\begin{aligned} \psi(\lambda) :=& q_1\bigl(x(\lambda)\bigr) \\ =& (a - \lambda b)^T(A - \lambda B)^\dagger B (A - \lambda B)^\dagger(a - \lambda b) \\ &{} - 2b^T (A - \lambda B)^\dagger(a - \lambda b). \end{aligned}$$
(2.18)
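
The quantities (2.17) and (2.18) can be evaluated directly for dense data; a short Python/NumPy sketch (the helper names are ours) is as follows. For λ strictly inside \((\underline{\lambda},\overline{\lambda})\), an ordinary linear solve could replace the pseudoinverse.

```python
import numpy as np

def x_of_lambda(A, B, a, b, lam):
    """First order stationary point x(lam) = (A - lam*B)^dagger (a - lam*b), cf. (2.17)."""
    return np.linalg.pinv(A - lam * B) @ (a - lam * b)

def psi(A, B, a, b, lam):
    """psi(lam) = q_1(x(lam)) as in (2.18)."""
    x = x_of_lambda(A, B, a, b, lam)
    return x @ B @ x - 2 * b @ x
```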

Recall that in the TRS case where B=I and b=0, we can orthogonally diagonalize A and get \(\underline{\lambda} = -\infty< \lambda^{*} \leq\overline{\lambda}=\lambda_{\min}(A)\). In addition, if we also have a=0 (homogeneous case), then both x(λ) and ψ(λ) are constant functions, identically 0. This property extends to the general case as follows, whose proof can be found in [24, p. 202].

Lemma 2.2

Let \(\lambda\in(\underline{\lambda},\overline{\lambda})\). Then the functions x(λ) and ψ(λ) in (2.17) and (2.18), respectively, are differentiable with derivatives given by

$$\begin{aligned} x^\prime(\lambda) =& (A - \lambda B)^{-1}\bigl(Bx(\lambda) - b\bigr), \\ \psi'(\lambda) =& 2x'(\lambda)^T(A-\lambda B) x'(\lambda). \end{aligned}$$

Moreover, on \((\underline{\lambda},\overline{\lambda})\), ψ(λ) is either constant or strictly increasing with

$$ \begin{array}{r@{\quad }c@{\quad }l} \psi\textit{ constant in } (\underline{\lambda},\overline{\lambda}) &\Leftrightarrow& x'(\lambda) = 0, \textit{ for some } \lambda\in (\underline{\lambda},\overline{\lambda}) \\ &\Leftrightarrow& x'(\lambda) = 0, \forall\lambda\in(\underline{\lambda},\overline {\lambda}) \\ &\Leftrightarrow& \begin{pmatrix} a \cr b \end{pmatrix} \in \operatorname{{Range}}\left( \begin{bmatrix} A \cr B \end{bmatrix} \right). \end{array} $$
(2.19)

Remark 2.3

Notice that if (2.19) holds, then a−λb=(A−λB)v for some v independent of λ. Therefore, the objective in (2.2) becomes −v^T(A−λB)v+ℓλ_+−uλ_−. Thus (2.2) reduces to a linear programming problem.

We now define the easy case and the hard cases. Unlike the case for TRS, we have to consider separately the cases when \(\overline{\lambda}\) or \(\underline {\lambda}\) is infinite.

Definition 2.1

The easy case occurs if one of the following three cases holds:

  1. 1.

    both \(\overline{\lambda}\) and \(\underline{\lambda}\) are finite, and \(-\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda) = \lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = +\infty\);

  2. 2.

    only \(\overline{\lambda}\) is finite, and \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = +\infty\);

  3. 3.

    only \(\underline{\lambda}\) is finite, and \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda) = -\infty\).

Otherwise, we have the hard case.

From the above definition and Lemma 2.2, we see immediately that in the easy case, the function ψ has to be strictly increasing on \((\underline {\lambda},\overline{\lambda})\). Sample shapes of the function ψ in the easy and hard cases are shown in Figs. 1 and 2. The next lemma and the discussion following it justify the terminology easy case.

Fig. 1 ψ(λ) when B is indefinite. The dotted lines are \(\lambda= \underline{\lambda}\) and \(\lambda= \bar{\lambda}\)

Fig. 2 ψ(λ) when B is positive definite. The dotted line is \(\lambda= \bar{\lambda}\)

Lemma 2.3

If \(\overline{\lambda}=+\infty\), then \(\lim_{\lambda\uparrow \overline{\lambda}}\psi(\lambda) = \sup x^{T}Bx-2b^{T}x > \ell\). Similarly, if \(\underline{\lambda}=-\infty\), then \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda) = \inf x^{T}Bx-2b^{T}x < u\).

Proof

Suppose first that \(\overline{\lambda} = +\infty\). Then, from the definition of \(\overline{\lambda}\), we see that B⪯0, i.e., B is negative semidefinite. We consider two cases: \(b\in \operatorname{{Range}}(B)\) and \(b\notin \operatorname{{Range}}(B)\).

Suppose that \(b\in \operatorname{{Range}}(B)\). Notice first from the definition of \(\underline{\lambda}\) and \(\overline{\lambda}\) that A−λB≻0, and so also \(B - \frac{1}{\lambda}A\prec0\), for all sufficiently large λ. From this and the definition of x(λ), for all sufficiently large λ and for any \(x\in {\mathbb{R}}^{n}\), we see that

$$ \psi(\lambda) \ge x^T \biggl(B-\frac{1}{\lambda}A \biggr)x - 2 \biggl(b-\frac{1}{\lambda}a \biggr)^Tx + \frac{1}{\lambda}q_0\bigl(x(\lambda)\bigr). $$
(2.20)

Furthermore,

$$ \frac{1}{\lambda}q_0\bigl(x(\lambda)\bigr) = \sum _{i=1}^n \biggl(\frac{\alpha_i(\xi_i-\lambda\gamma _i)^2}{\lambda (\alpha_i-\lambda\beta_i)^2} - \frac{2\xi^2_i-2\lambda\xi_i\gamma_i}{\lambda(\alpha_i-\lambda \beta _i)} \biggr), $$

where α_i and β_i are defined as above, while ξ=S^{−1}a and γ=S^{−1}b. Since \(b\in \operatorname{{Range}}(B)\) means that γ_i=0 whenever β_i=0, we see immediately that

$$ \lim_{\lambda\uparrow\overline{\lambda}}\frac{1}{\lambda }q_0\bigl(x(\lambda ) \bigr) = 0. $$

Taking limits in λ on both sides of (2.20) and then taking the supremum over all \(x\in {\mathbb{R}}^{n}\), we obtain that \(\lim_{\lambda\uparrow\overline{\lambda}}\psi (\lambda) \ge\sup x^{T}Bx -2b^{T}x > \ell\), where the last strict inequality is a consequence of RICQ (2.6). Hence, the first conclusion follows when \(b\in \operatorname{{Range}}(B)\).

Next, suppose \(b\notin \operatorname{{Range}}(B)\). Then by the definition of ψ(λ)=q_1(x(λ)), we have

$$ q_1\bigl(x(\lambda)\bigr) = \sum_{i=1}^n \biggl(\frac{\beta_i(\xi_i-\lambda\gamma _i)^2}{(\alpha _i-\lambda\beta_i)^2} -\frac{2\gamma_i\xi_i-2\lambda\gamma^2_i}{(\alpha_i-\lambda\beta _i)} \biggr). $$

The assumption on b implies that there exists i_0 such that \(\beta_{i_{0}}=0\) but \(\gamma_{i_{0}}\neq0\) (notice that by (2.15), necessarily the corresponding \(\alpha_{i_{0}} > 0\)). Thus, \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = +\infty\) and the first conclusion holds trivially in this case.

The case when \(\underline{\lambda} = -\infty\) can be proved similarly. □

From Lemmas 2.2 and 2.3, we see that, in the easy case, ψ(λ)=s has a unique solution in the open interval \((\underline{\lambda},\overline{\lambda})\) for any s∈(infψ,supψ). We next show that we can alternatively characterize the easy case and hard cases via the null space of \(A - \overline{\lambda} B\) or \(A - \underline {\lambda} B\); see also [34, Sect. 3.1] for a similar analysis on the special case of GTRS (1.1) with u=ℓ=1 and b=0.

Lemma 2.4

If \(\underline{\lambda}\) is finite, then \(\lim_{\lambda\downarrow \underline{\lambda}}\psi(\lambda) = -\infty\) if, and only if, \(a-\underline{\lambda}b\notin \operatorname{{Range}}(A - \underline{\lambda}B)\). Similarly, if \(\overline{\lambda}\) is finite, then \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = +\infty \) if, and only if, \(a-\overline{\lambda}b \notin \operatorname{{Range}}(A - \overline{\lambda}B)\).

Proof

Suppose first that \(\underline{\lambda}\) is finite. Then the set of indices \(\underline{I}\) such that \(\alpha_{i} - \underline {\lambda}\beta_{i} = 0\) is nonempty. Moreover, from the definition of \(\underline{\lambda}\) and the fact that \(\underline{\lambda} < \overline{\lambda}\), we obtain that β_i<0 for all \(i\in\underline{I}\). Next, notice that for any \(\lambda\in(\underline{\lambda},\overline{\lambda})\), we have from (2.18) that

$$\begin{aligned} q_1\bigl(x(\lambda)\bigr) =& \sum_{i\in\underline{I}} \biggl(\frac{\beta_i(\xi _i-\lambda\gamma_i)^2}{(\alpha_i-\lambda\beta_i)^2} -\frac{2\gamma_i\xi_i-2\lambda\gamma^2_i}{(\alpha_i-\lambda\beta _i)} \biggr)\\ &{} + \sum _{i\notin\underline{I}} \biggl(\frac{\beta_i(\xi _i-\lambda\gamma_i)^2}{(\alpha_i-\lambda\beta_i)^2} -\frac{2\gamma_i\xi_i-2\lambda\gamma^2_i}{(\alpha_i-\lambda\beta _i)} \biggr), \end{aligned}$$

with α_i, β_i defined as above and ξ=S^{−1}a, γ=S^{−1}b. Since \(a-\underline{\lambda}b \notin \operatorname{{Range}}(A - \underline{\lambda}B)\) if, and only if, there exists \(i_{0}\in\underline{I}\) with \(\xi_{i_{0}} - \underline{\lambda}\gamma_{i_{0}}\neq0\), the first conclusion now follows immediately. The second statement can be proved similarly. □

Using Lemma 2.4, we see that the hard and easy cases can be alternatively characterized as in Table 1.

Table 1 The three different cases for GTRS

Before describing how the easy case and hard cases can be tackled, we need the following technical lemma concerning the smallest and largest generalized eigenvalues \(\underline{\lambda}\) and \(\overline{\lambda}\). This result was briefly discussed in [24, Sect. 5]. We include a proof here for the convenience of the readers.

Lemma 2.5

If \(\overline{\lambda}\) is finite, then v^TBv>0 for all \(v\in \operatorname{{Null}}(A - \overline{\lambda} B)\backslash\{0\}\). Similarly, if \(\underline{\lambda}\) is finite, then v^TBv<0 for all \(v\in \operatorname{{Null}}(A - \underline{\lambda} B)\backslash\{0\}\).

Proof

Suppose first that \(\overline{\lambda}\) is finite and let \(v\in \operatorname{{Null}}(A - \overline{\lambda} B)\backslash\{0\}\). Take any \(\widetilde{\lambda}\in(\underline{\lambda},\overline {\lambda })\). Then \(A - \widetilde{\lambda}B \succ0\) and hence we have

$$ 0 < v^T(A - \widetilde{\lambda}B)v = v^T(A - \overline{ \lambda} B)v + (\overline{\lambda} - \widetilde{\lambda})v^TBv = ( \overline{\lambda } - \widetilde{\lambda})v^TBv. $$

Since \(\overline{\lambda} > \widetilde{\lambda}\), we conclude that v^TBv>0. This proves the first part. The second conclusion can be proved similarly. □

2.2.2 Three intervals for λ

We are now ready to describe how GTRS can be tackled. Our discussion is different from that in the concluding remarks of [24] in several aspects. First, our approach does not require solving two GTRS with equality constraints. Second, we include detailed discussion about the easy case and the hard cases, with hard case, case 2, solved explicitly. In what follows, we consider three cases dependent on where 0 is located relative to the interval \((\underline{\lambda}, \overline{\lambda})\).

Case 1: \(\underline{\lambda}< 0< \overline{\lambda}\)

  • Easy case: If ℓ≤ψ(0)≤u, then it is easy to check that the optimality conditions of GTRS are satisfied with λ^*=0 and x^*=A^{−1}a. We have an interior solution in this case.

    Otherwise, suppose we have ψ(0)<ℓ instead. Since ψ is strictly increasing in \((\underline{\lambda},\overline{\lambda})\), we observe that the optimal Lagrange multiplier satisfies λ^*>0, i.e., it has to be positive. From the complementary slackness condition in (2.12), we observe further that such a multiplier is the unique solution of ψ(λ)=ℓ. The optimal solution is then given by x(λ^*).

    The case when ψ(0)>u can be considered similarly, where the optimal Lagrange multiplier λ^*<0 solves ψ(λ)=u, and the optimal solution is x(λ^*).

  • Hard case: If ℓ≤ψ(0)≤u, then x(0)=x^*=A^{−1}a and we again obtain an interior solution.

    Otherwise, we take a primal step to the boundary. First, suppose we have ψ(0)<ℓ. Since ψ is increasing, we necessarily have λ^*>0. If \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) > \ell \), then the equation ψ(λ)=ℓ is solvable and gives the optimal Lagrange multiplier \(\lambda^{*} < \overline{\lambda}\). The optimal solution is given by x(λ^*). On the other hand, consider \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) \le \ell\). Then \(\overline{\lambda}\) is finite by Lemma 2.3 and thus \(q_{1}(x(\overline{\lambda}))=\lim_{\lambda\uparrow \overline{\lambda}}\psi(\lambda)\). Hence Lemma 2.5 implies that there exists \(v\in \operatorname{{Null}}(A - \overline{\lambda}B)\) satisfying v^TBv>0. Scale such v so that \(q_{1}(x(\overline{\lambda}) + v) = \ell\). Then an optimal solution is given explicitly by \(x(\overline{\lambda}) + v\), with optimal Lagrange multiplier equal to \(\overline{\lambda}\).

    Finally, suppose we have ψ(0)>u instead. Then we necessarily have λ^*<0. If \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda) < u\), then the equation ψ(λ)=u is solvable and gives the optimal Lagrange multiplier \(\lambda^{*} > \underline{\lambda}\). The optimal solution is given by x(λ^*). On the other hand, if \(\lim_{\lambda\downarrow\underline{\lambda}}\psi (\lambda) \ge u\), then \(\underline{\lambda}\) is finite by Lemma 2.3 and thus \(q_{1}(x(\underline{\lambda}))=\lim_{\lambda\downarrow \underline{\lambda}}\psi(\lambda)\). From Lemma 2.5, there exists \(v\in \operatorname{{Null}}(A - \underline {\lambda}B)\) satisfying v^TBv<0. Scale such v so that \(q_{1}(x(\underline{\lambda}) + v) = u\). Then an optimal solution is given explicitly by \(x(\underline{\lambda}) + v\), with the corresponding Lagrange multiplier being \(\underline {\lambda}\).

Case 2: \(\overline{\lambda}\le0\)

Notice that the optimal Lagrange multiplier has to be nonpositive and \(\overline{\lambda}\) has to be finite.

  • Easy case: We see that the optimal Lagrange multiplier λ^* has to be negative, and thus, a solution of ψ(λ)=u. The optimal solution is then given by x(λ^*).

  • Hard case: If \(\lim_{\lambda\downarrow \underline{\lambda}}\psi(\lambda) < u < \lim_{\lambda\uparrow \overline{\lambda}}\psi(\lambda)\), then ψ(λ)=u is solvable with the optimal Lagrange multiplier λ^* as the unique solution. The optimal solution is again given by x(λ^*).

    On the other hand, if \(\lim_{\lambda\downarrow\underline{\lambda }}\psi (\lambda)\ge u\), then by Lemma 2.3, \(\underline{\lambda}\) is finite and thus \(q_{1}(x(\underline{\lambda})) =\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda)\). Hence, by Lemma 2.5 there exists \(v\in \operatorname{{Null}}(A - \underline{\lambda}B)\) satisfying v^TBv<0. Scale such v so that \(q_{1}(x(\underline{\lambda}) + v) = u\). Then an optimal solution is given explicitly by \(x(\underline{\lambda})+v\), and \(\lambda^{*} = \underline{\lambda}<0\). Finally, if \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda )\le u\), then \(q_{1}(x(\overline{\lambda})) =\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda)\). Thus, by Lemma 2.5 there exists \(v\in \operatorname{{Null}}(A - \overline{\lambda}B)\) satisfying v^TBv>0. Scale such v so that \(q_{1}(x(\overline{\lambda}) + v) = u\). Then an optimal solution is given explicitly by \(x(\overline{\lambda}) + v\), and \(\lambda^{*} = \overline{\lambda}\le0\).

Case 3: \(\underline{\lambda}\ge0\)

Notice that the optimal Lagrange multiplier has to be nonnegative and \(\underline{\lambda}\) has to be finite.

  • Easy case: Arguing similarly as in Case 2, we conclude that in the easy case, the optimal Lagrange multiplier λ^* is a solution of ψ(λ)=ℓ and the optimal solution is given by x(λ^*).

  • Hard case: If \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda) < \ell< \lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda)\), then the optimal Lagrange multiplier λ^*>0 solves ψ(λ)=ℓ and the optimal solution is given by x(λ^*). Furthermore, if \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda)\ge \ell\), then an optimal solution is \(x(\underline{\lambda}) + v\), where \(v\in \operatorname{{Null}}(A - \underline{\lambda}B)\) satisfies v^TBv<0 and \(q_{1}(x(\underline{\lambda}) + v) = \ell\), and \(\lambda^{*} = \underline{\lambda}\). Finally, if \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda )\le \ell\), then an optimal solution is \(x(\overline{\lambda}) + v\), where \(v\in \operatorname{{Null}}(A - \overline{\lambda}B)\) satisfies v^TBv>0 and \(q_{1}(x(\overline{\lambda}) + v) = \ell\), and \(\lambda^{*} = \overline {\lambda}\).

From the above discussion, we see immediately that, unless A≻0 and A^{−1}a is an interior solution, the GTRS always has a solution on the boundary of the feasible set. Following the above procedures, we either end up solving an equality constrained problem, or obtain a closed form solution by moving a point to the suitable boundary along a suitable generalized eigenvector. In the next section, we will adapt the Rendl-Wolkowicz (RW) algorithm for TRS in [29] to tackle the equality constrained problem.
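
In the easy case, ψ is continuous and strictly increasing on \((\underline{\lambda},\overline{\lambda})\), so the multiplier solving ψ(λ)=s (with s=ℓ or s=u, as dictated by the case analysis above) can be found by any safeguarded one-dimensional root finder. A plain bisection sketch, reusing the psi helper sketched after (2.18), is given below; it assumes a bracket [lam_lo, lam_hi] inside \((\underline{\lambda},\overline{\lambda})\) with ψ(lam_lo)≤s≤ψ(lam_hi), and is only illustrative (the algorithm proposed in Sect. 3 is eigenvalue based, not a bisection).

```python
def solve_psi_equals(A, B, a, b, s, lam_lo, lam_hi, tol=1e-10, max_iter=200):
    """Bisection for psi(lam) = s on a bracket inside (lam_lower, lam_upper),
    valid in the easy case where psi is strictly increasing (sketch only)."""
    lo, hi = lam_lo, lam_hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if psi(A, B, a, b, mid) < s:     # psi increasing: the root lies to the right
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, abs(mid)):
            break
    return 0.5 * (lo + hi)
```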

Before closing this section, we discuss how to compute \(\lim_{\lambda \uparrow\overline{\lambda}}\psi(\lambda)\) and \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda)\) efficiently, which are important for the above case analysis. We only outline the procedure for computing \(\lim_{\lambda\uparrow \overline{\lambda}}\psi(\lambda)\). The one for computing \(\lim_{\lambda\downarrow\underline{\lambda}}\psi (\lambda )\) is similar.

Recall that \(A - \overline{\lambda}B\) is singular and positive semidefinite. Thus, a nullspace vector of \(A - \overline {\lambda}B\) can be found by finding an eigenvector corresponding to the smallest eigenvalue, i.e., 0.

Procedure for computing \(\lim_{\lambda\uparrow\overline {\lambda }}\psi(\lambda)\) .

  1. Step 1.

    Take \(v\in \operatorname{{Null}}(A - \overline{\lambda}B)\backslash\{0\}, \| v\|=1\).

    If \(v^{T}(a - \overline{\lambda}b)\neq0\), this means \((a - \overline{\lambda}b)\notin \operatorname{{Range}}(A - \overline {\lambda}B)\) and thus \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = +\infty\); quit. Else, update/deflate A←A+αvv^T for some α>0. Repeat Step 1 if \(\operatorname{{Null}}(A - \overline{\lambda}B)\neq\{0\}\).

  2. Step 2.

    Solve for \(\bar{x} = (A-\overline{\lambda}B)^{-1}(a - \overline {\lambda}b)\). Then \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda) = q_{1}(\bar{x})\).
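
For dense data, this procedure can be written out directly; the following Python/NumPy sketch (our illustration; the paper works with sparse eigensolvers instead) implements Steps 1 and 2, and terminates after at most \(\dim\operatorname{{Null}}(A - \overline{\lambda}B)\) deflations since each deflation removes exactly one null direction.

```python
import numpy as np

def limit_psi_at_lam_bar(A, B, a, b, lam_bar, alpha=1.0, tol=1e-8):
    """Compute lim_{lam -> lam_bar} psi(lam) by the deflation procedure above."""
    A = A.copy()                           # do not overwrite the caller's matrix
    while True:
        M = A - lam_bar * B
        w, V = np.linalg.eigh(M)
        if w[0] > tol:                     # Null(A - lam_bar*B) = {0}: go to Step 2
            x_bar = np.linalg.solve(M, a - lam_bar * b)
            return x_bar @ B @ x_bar - 2 * b @ x_bar
        v = V[:, 0]                        # unit nullspace vector (Step 1)
        if abs(v @ (a - lam_bar * b)) > tol:
            return np.inf                  # a - lam_bar*b not in Range(A - lam_bar*B)
        A = A + alpha * np.outer(v, v)     # deflate and repeat Step 1
```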

3 A method for solving GTRS: the extended Rendl-Wolkowicz algorithm

In this section, we discuss how the Rendl-Wolkowicz (RW) algorithm proposed in [29] can be adapted to solve the following equality constrained problem:

$$ (\mathit{GTRS}_{=})\qquad \begin{array}{r@{\quad }l} q^* = \inf& q_0(x)\\ \mbox{s.t.} & x^TBx - 2b^Tx = s, \end{array} $$
(3.1)

where \(s\in {\mathbb{R}}\). Recall that we assume Assumption 2.1 holds and that the regularity in (2.15) holds. Thus, by Proposition 2.3, (3.1) has an optimal solution. Following [29], we consider the following chain of inequalities.

$$\begin{aligned} \begin{array}{rcl} q^* & =& \inf_{\stackrel{x^T Bx - 2b^Txy_0 = s}{y_0^2 = 1}}\ x^TAx - 2a^T x y_0 \\ & =& \sup_t\inf_{\stackrel{x^T Bx - 2b^Txy_0 = s}{ y_0^2 = 1}}\ x^T A x - 2a^T xy_0 + t\bigl(y_0^2 - 1\bigr)\\ \\ &\ge& \sup_t\inf_{x^T Bx - 2b^Txy_0 + y_0^2 = s + 1}\ x^T A x - 2a^T xy_0 + t\bigl(y_0^2 - 1\bigr) \\ & \ge& \sup_{r,t}\inf_{x,y_0}\ x^T A x - 2a^T xy_0 + t\bigl(y_0^2 - 1\bigr) + r\bigl(x^T Bx - 2b^Txy_0 + y_0^2 - s - 1\bigr) \\ & =& \sup_{r,\tau}\inf_{x,y_0}\ x^T A x - 2a^T xy_0 + \tau\bigl(y_0^2 - 1\bigr) + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \\ & =& \sup_{r} \bigl(\sup_{\tau}\inf _{x,y_0}\ x^T A x - 2a^T xy_0 + \tau\bigl(y_0^2 - 1\bigr) + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \bigr) \end{array} \end{aligned}$$
(3.2)

Let Ω denote the set of all real numbers r such that A+rB⪰0 and \(a+rb\in \operatorname{{Range}}(A+rB)\). This set is nonempty by Assumption 2.1, Item 5. Furthermore, if r∉Ω, then it is easy to see that

$$\begin{aligned} &\inf_{x,y_0}\ x^T A x - 2a^T xy_0 + \tau \bigl(y_0^2 - 1\bigr) + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \\ &\quad {}= \inf_{y_0}\inf_x\ x^T A x - 2a^T xy_0 + \tau\bigl(y_0^2 - 1 \bigr) \\ &\qquad {} + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) = -\infty, \end{aligned}$$
(3.3)

regardless of τ. Thus, continuing from (3.2) we have

$$\begin{aligned} & \sup_{r} \Bigl(\sup _{\tau}\inf_{x,y_0}\ x^T A x - 2a^T xy_0 + \tau \bigl(y_0^2 - 1 \bigr) + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \Bigr) \\ &\quad= \sup_{r\in\varOmega} \Bigl(\sup_{\tau}\inf _{x,y_0}\ x^T A x - 2a^T xy_0 + \tau\bigl(y_0^2 - 1\bigr) + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \Bigr) \\ &\quad = \sup_{r\in\varOmega}\inf_{x,y^2_0=1}\ x^T A x - 2a^T xy_0 + r\bigl(x^T Bx - 2b^Txy_0 - s\bigr) \\ &\quad = \sup_{r\in\varOmega}\inf_{x}\ x^T A x - 2a^T x + r\bigl(x^T Bx - 2b^Tx - s\bigr) \\ &\quad = \sup_{r}\inf_{x}\ x^T A x - 2a^T x + r\bigl(x^T Bx - 2b^Tx - s\bigr) \\ &\quad = \inf_{x^T Bx - 2b^Tx = s}\ x^TAx - 2a^T x = q^*, \end{aligned}$$

where: the first equality follows from the observation in (3.3); while the second equality follows from Theorem 2.2, since for any r∈Ω, a Schur complement argument implies that there exists sufficiently large τ such that

$$ \left[ \begin{array}{c@{\quad }c} \tau&-(a+rb)^T\\ -(a+rb)& A+rB \end{array} \right] \succeq0; $$

the third equality follows from the homogenization; the fourth equality follows from an observation similar to (3.3) for the inner optimization problem; while the last equality follows from Theorem 2.2. Thus, equality holds throughout in (3.2). In particular, from the third line in (3.2), we have

$$ q^* = \sup_t \underbrace{k_0(t) - t}_{k(t)}, $$

where

$$\begin{aligned} k_0(t) =& \inf_{x^T Bx - 2b^Txy_0 + y_0^2 = s + 1} x^T A x - 2a^T xy_0 + ty_0^2\\ =&\inf \left\{ y^T \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array}\right] y:\; y^T \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] y = s + 1 \right\}. \end{aligned}$$

The RW algorithm adapted to our context amounts to solving (3.1) via maximizing k(t).

3.1 Properties of k 0(t)

In this subsection, we discuss differentiability of the function k 0(t) and show that a maximizer of k(t):=k 0(t)−t necessarily exists. In addition to Assumption 2.1 and the regularity in (2.15), we also assume that s+1≠0, which can always be satisfied by scaling B and b. Furthermore, to avoid triviality, we assume that (2.19) fails so that ψ is strictly increasing. Indeed, in view of Remark 2.3, if (2.19) holds, (2.2) is a simple linear programming problem and can be readily solved.

Our analysis is based on the following function:

$$ d(\lambda) := \lambda+ (a-\lambda b)^T(A-\lambda B)^\dagger (a-\lambda b). $$
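
For reference, d(λ) can be evaluated directly from the stationary point x(λ); a one-line sketch (reusing the x_of_lambda helper from the snippet after (2.18), our naming) is:

```python
def d_of_lambda(A, B, a, b, lam):
    """d(lam) = lam + (a - lam*b)^T (A - lam*B)^dagger (a - lam*b); on
    (lam_lower, lam_upper) one has d'(lam) = 1 + psi(lam)."""
    x = x_of_lambda(A, B, a, b, lam)
    return lam + (a - lam * b) @ x
```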

This function reduces to the d(λ) considered in [29] for problem (1.2). Since d′(λ)=1+ψ(λ) on \((\underline{\lambda},\overline{\lambda})\) and (2.19) fails, we see that this function is strictly convex with strictly increasing derivative on \((\underline{\lambda },\overline {\lambda})\). With a Schur complement argument, we get the following result.

Lemma 3.1

The following four properties hold.

  1. (i)

    Suppose \(\lambda\in(\underline{\lambda},\overline{\lambda })\). Then

    $$\begin{aligned} \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succ0 \quad \Leftrightarrow\quad t > d(\lambda), \\ \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \succeq0 \quad \Leftrightarrow\quad t \ge d(\lambda). \end{aligned}$$
  2. (ii)

    Suppose \(\lim_{\lambda\downarrow\underline{\lambda }}d(\lambda)\) is finite and \(\underline{\lambda}\) is finite. Then

    $$ \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] - \underline{\lambda} \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0 \quad \Leftrightarrow\quad t \ge d(\underline{\lambda}). $$
  3. (iii)

    Suppose \(\lim_{\lambda\uparrow\overline{\lambda}}d(\lambda )\) is finite and \(\overline{\lambda}\) is finite. Then

    $$ \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \overline{\lambda} \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0 \quad \Leftrightarrow\quad t \ge d(\overline{\lambda}). $$
  4. (iv)

    For any \(s\in {\mathbb{R}}\) and any \(t > \inf_{\underline{\lambda }<\lambda <\overline{\lambda}}d(\lambda)\), we have

    $$\begin{aligned} &\sup \biggl\{ (s + 1)\lambda:\; \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \succeq0 \biggr\} \\ &\quad =\sup \bigl\{ (s + 1)\lambda:\; \lambda\in(\underline{\lambda },\overline{\lambda}), t\ge d(\lambda) \bigr\} . \end{aligned}$$

Proof

For part (i), notice that \(\lambda\in(\underline{\lambda},\overline {\lambda})\) if, and only if, AλB≻0. Hence the conclusion follows immediately from an application of the Schur complement. For part (ii), notice first that \(\lim_{\lambda\downarrow\underline{\lambda}}d(\lambda)\) being finite implies \(d(\underline{\lambda})=\lim_{\lambda\downarrow\underline{\lambda }}d(\lambda)\) and hence d is continuous and convex on \([\underline{\lambda },\overline{\lambda})\). Next, by taking \(\lambda\in(\underline{\lambda},\overline{\lambda})\) and r>d(λ), we have from part (i) that

$$ \left[ \begin{array}{c@{\quad }c} r & -a^T\\ -a& A \end{array} \right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0. $$

Hence, the convexity of d(⋅) and the continuity of d(⋅) at \(\underline{\lambda}\) yields

$$\begin{aligned} &\left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \underline{\lambda} \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0 \\ & \Leftrightarrow\quad \left[ \begin{array}{c@{\quad }c} (1-\epsilon)t + \epsilon r & -a^T\\ -a& A \end{array}\right] - \bigl[(1-\epsilon)\underline{\lambda} + \epsilon\lambda\bigr] \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0,\quad \forall0<\epsilon<1, \\ &t \ge d(\underline{\lambda})\quad \Leftrightarrow\quad (1-\epsilon)t + \epsilon r \ge d\bigl((1-\epsilon)\underline{\lambda} + \epsilon\lambda\bigr),\quad \forall 0<\epsilon<1. \end{aligned}$$

Since \((1-\epsilon)\underline{\lambda} + \epsilon\lambda\in (\underline{\lambda},\overline{\lambda})\) for all 0<ϵ<1, the conclusion of part (ii) now follows immediately from the above two relations and part (i). Part (iii) can be proved similarly. Finally, for part (iv), since \(t > \inf_{\underline{\lambda}<\lambda<\overline{\lambda }}d(\lambda)\), the optimal value on the left hand side is unchanged if we intersect the feasible region further with \((\underline{\lambda},\overline {\lambda})\). The conclusion now follows from this observation and part (i). □

We next consider three cases: B is indefinite, positive semidefinite and negative semidefinite. We only discuss the first two cases in detail since the third case is analogous to the second case. For this purpose, we will need to consider the following quantities:

$$\begin{aligned} \begin{array}{l@{\qquad}l} t_0 := \displaystyle\inf_{\underline{\lambda}<\lambda<\overline{\lambda}}d(\lambda), & \overline{t}:= \displaystyle\lim_{\lambda\uparrow\overline{\lambda}} d(\lambda) \quad \mathrm{if}\ \overline{\lambda} < \infty,\ \mathrm{or\ if}\ \overline{\lambda} = \infty\ \mathrm{and}\ \displaystyle\lim_{\lambda\uparrow\overline{\lambda}} d(\lambda) > t_0, \\[2mm] s_0:=\displaystyle\inf_x\ x^TBx - 2b^Tx, & \underline{t}:= \displaystyle\lim_{\lambda\downarrow\underline{\lambda}} d(\lambda) \quad \mathrm{if}\ \underline{\lambda} > -\infty,\ \mathrm{or\ if}\ \underline{\lambda} = -\infty\ \mathrm{and}\ \displaystyle\lim_{\lambda\downarrow\underline{\lambda}} d(\lambda) > t_0. \end{array} \end{aligned}$$
(3.4)

Intuition behind the definitions of \(\overline{t}\) and \(\underline{t}\) will be described briefly after Corollary 3.3. We remark that these four quantities are not necessarily finite. Furthermore, it follows from weak duality that

$$ t_0 \ge-\inf_{x^TBx - 2b^Tx=-1}x^TAx - 2a^Tx. $$

Hence, \(t_0\) is finite when \(s_0\le-1\).

Case 1: B is indefinite

Notice that in this case, the quantities \(t_0\), \(\underline{\lambda}\) and \(\overline{\lambda}\) are finite. In the next theorem, we show that \(k_0(t)\) is well-defined on a closed interval, and is differentiable in the interior of that interval, except possibly at one point.

Theorem 3.1

The function \(k_0(t)\) is well-defined and continuous on \(t\ge t_0\). Moreover:

  1. (i)

    when s+1>0, \(k_0(t)\) is differentiable on \((t_{0},\overline{t})\cup(\overline{t},\infty)\), and \(k_{0}(t) = (s+1)\overline{\lambda}\) on \(t> \overline{t}\);

  2. (ii)

    when s+1<0, \(k_0(t)\) is differentiable on \((t_{0},\underline{t})\cup(\underline{t},\infty)\), and \(k_{0}(t) = (s+1)\underline{\lambda}\) on \(t> \underline{t}\).

Proof

Suppose first that \(t>t_0\). Then from the definition of \(t_0\), there exists λ so that

$$ \lambda\in(\underline{\lambda},\overline{\lambda}),\qquad t - d(\lambda) > 0. $$

It follows from this, Lemma 3.1(i), (iv) and Theorem 2.2 that

$$ \begin{aligned} k_0(t) &= \inf \biggl\{ y^T \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array} \right] y:\; y^T \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] y = s + 1 \biggr\} \\ &= \sup \biggl\{ (s + 1)\lambda:\; \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \succeq0 \biggr\} > -\infty. \end{aligned} $$
(3.5)

Thus, \(k_0(t)\) is finite. Next, suppose \(t=t_0\). Notice that the infimum value \(t_0\) in (3.4) has to be attained at some \(\lambda_{0}\in[\underline{\lambda},\overline{\lambda}]\). Thus, we have from Lemma 3.1 that

$$\left[ \begin{array}{c@{\quad }c} t_0 & -a^T\\ -a& A \end{array} \right] - \lambda_0 \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq0. $$

Hence, from Theorem 2.2, we conclude that (3.5) holds with \(t=t_0\). Finally, since \(k_0\) is a one variable concave function, it is continuous in the interior of its domain [30, Theorem 10.1] and is lower semicontinuous up to the boundary [30, Theorem 10.2]. Furthermore, since \(k_0\) is the infimum of a family of continuous functions

$$t\mapsto y^T \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] y, $$

it also follows that \(k_0\) has to be upper semicontinuous. Thus, \(k_0\) is continuous on \(t\ge t_0\).

We now turn to part (i). Since d(λ) is a strictly convex function, we deduce that

$$ d(\lambda) = t $$
(3.6)

has at most two solutions in \((\underline{\lambda},\overline{\lambda})\) for any \(t>t_0\), at which d has slopes of opposite signs.

We first consider the case when \(\overline{t}\) is infinite. In this case, there must be a unique solution \(\mu\in(\underline{\lambda},\overline{\lambda})\) to (3.6) with \(d'(\mu)>0\). Since s+1>0, we see from Lemma 3.1(iv) and (3.5) that \(k_0(t)=(s+1)\mu\). Moreover, since \(d'(\mu)>0\), by the inverse function theorem, μ is differentiable as a function of t and \(\mu'(t)=\frac{1}{d'(\mu)}\). This shows that \(k_0\) is differentiable at t.

We next assume that \(\overline{t}\) is finite. When \(t_{0}<t<\overline{t}\), it still holds true that (3.6) has a unique solution \(\mu\in(\underline{\lambda},\overline{\lambda})\) satisfying \(d'(\mu)>0\) and \(k_0(t)=(s+1)\mu\) as above. Similarly, it can be shown that \(k_0(t)\) is differentiable on \(t_{0}<t<\overline{t}\).

On the other hand, when \(t> \overline{t}=d(\overline{\lambda})\), we see from Lemma 3.1(iii) that

$$ \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] - \overline{\lambda} \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \succeq0. $$

Hence, from Lemma 3.1(iv) and (3.5), we conclude that \(k_{0}(t) = (s+1)\overline{\lambda}\). This completes the proof of part (i).

The case when s+1<0 in part (ii) can be proved similarly, by noting that \(k_0(t)=(s+1)\nu\) for \(t_{0} < t < \underline{t}\), where ν=ν(t) is the unique root of (3.6) in \((\underline{\lambda},\overline{\lambda})\) with \(d'(\nu)<0\). □

Before proceeding, we collect several facts that are readily obtained from the proof of Theorem 3.1.

Corollary 3.1

  1. (i)

    If s+1>0 and \(t_{0}<t<\overline{t}\), then \(k_0(t)=(s+1)\mu\), where μ=μ(t) is the unique root of (3.6) on \((\underline{\lambda},\overline{\lambda})\) with \(d'(\mu)>0\); moreover, \(\mu'(t) = \frac{1}{d'(\mu)} > 0\).

  2. (ii)

    If s+1<0 and \(t_{0}<t<\underline{t}\), then \(k_0(t)=(s+1)\nu\), where ν=ν(t) is the unique root of (3.6) on \((\underline{\lambda},\overline{\lambda})\) with \(d'(\nu)<0\); moreover, \(\nu'(t) = \frac{1}{d'(\nu)} < 0\).

In the next corollary, we confirm that the domain of \(k_0(t)\) is exactly \(t\ge t_0\), i.e., \(k_0(t)=-\infty\) whenever \(t<t_0\). This is a consequence of the more general result in Proposition A.1.

Corollary 3.2

When \(t<t_0\), \(k_0(t)=-\infty\).

Notice that \(k'(t) = k_{0}'(t)-1\) whenever \(t>t_0\) and \(k_0\) is differentiable at t. Moreover, \(\lim_{t\uparrow\infty}k_{0}'(t) = 0\), in view of Corollary 3.1 and Theorem 3.1. Thus, a maximizer of k(t) must exist.

A plot of d(λ) is shown in Fig. 3.

Fig. 3 d(λ) when B is indefinite. The dotted lines are \(\lambda= \underline{\lambda}\) and \(\lambda= \overline{\lambda}\).

Case 2: B is positive semidefinite

In this case, we have \(\overline{\lambda}\) being finite and \(\underline{\lambda} = -\infty\). Moreover, we have the following result concerning the behavior of d(λ) as λ goes to −∞. See Figs. 4–5 for plots of d(λ) in this case.

Fig. 4 d(λ) when B is positive definite with \(s_0<-1\). The dotted line is \(\lambda= \overline{\lambda}\).

Fig. 5 d(λ) when B is positive definite with \(s_0>-1\). The dotted line is \(\lambda= \overline{\lambda}\).

Proposition 3.1

It holds that \(\lim_{\lambda\rightarrow-\infty} d'(\lambda)=1+s_0\). Hence

  1. (i)

    If \(s_0<-1\), then \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=\infty\);

  2. (ii)

    If \(s_0=-1\), then \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=t_0>-\infty\);

  3. (iii)

    If \(s_0>-1\), then \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=-\infty\).

Proof

The fact that \(\lim_{\lambda\rightarrow-\infty} d'(\lambda)=1+s_0\) follows from \(d'(\lambda)=1+\psi(\lambda)\) and Lemma 2.3. The rest of the conclusion then follows immediately. □

In view of the definition of \(s_0\) and Assumption 2.1, Item 3, we only need to consider \(s>s_0\) in the subsequent analysis. We discuss the differentiability properties of \(k_0(t)\) in the next theorem.

Theorem 3.2

  1. (i)

    Suppose that \(s_0<-1\). Then \(k_0(t)\) is continuous and well-defined on \(t\ge t_0\). Furthermore, when s+1>0, \(k_0(t)\) is differentiable on \((t_{0},\overline{t})\cup(\overline{t},\infty)\) and \(k_{0}(t) = (s+1)\overline{\lambda}\) on \(t> \overline{t}\); when s+1<0, \(k_0(t)\) is differentiable on \((t_0,\infty)\).

  2. (ii)

    Suppose that \(s_0=-1\). Then \(k_0(t)\) is continuous and well-defined on \(t>t_0\). Furthermore, when s+1>0, \(k_0(t)\) is differentiable on \((t_{0},\overline{t})\cup(\overline{t},\infty)\) and \(k_{0}(t) = (s+1)\overline{\lambda}\) on \(t> \overline{t}\).

  3. (iii)

    Suppose that \(s_0>-1\). Then \(k_0(t)\) is continuous and well-defined everywhere. Furthermore, when s+1>0, \(k_0(t)\) is differentiable on \((-\infty,\overline{t})\cup(\overline{t},\infty)\) and \(k_{0}(t) = (s+1)\overline{\lambda}\) on \(t> \overline{t}\).

Proof

The theorem can be proved similarly to Theorem 3.1 once we show that (3.5) holds in each case. To prove this, it suffices to show that Assumption 2.1, Item 3 is satisfied for the corresponding GTRS in (3.5) in each case. Suppose first that \(s_0+1<0\). Then it is easy to see that \(\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]\) has to be indefinite and so Assumption 2.1, Item 3 is satisfied. On the other hand, if \(s_0+1\ge0\), then

$$\begin{aligned} \inf_y y^T \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] y &= \min \Bigl\{ \inf_{y_1\neq0} y^T \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] y, \inf_y y^TBy \Bigr\} \\ &\ge\min \Bigl\{ \inf_{y_1\neq0} y_1^2(s_0+1), \inf_y y^TBy \Bigr\} \ge0, \end{aligned}$$

showing that \(\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]\) is positive semidefinite. Since \(s+1>s_0+1\), we see that Assumption 2.1, Item 3 is also satisfied in this case. □

As in Corollary 3.1, we have the following relationship between \(k_0(t)\) and the roots of (3.6), which can be obtained from a detailed proof of Theorem 3.2.

Corollary 3.3

  1. (i)

    If s+1>0 and \(t_{0}<t<\overline{t}\), then \(k_0(t)=(s+1)\mu\), where μ=μ(t) is the unique root of (3.6) on \((\underline{\lambda},\overline{\lambda})\) with \(d'(\mu)>0\); moreover, \(\mu'(t) = \frac{1}{d'(\mu)} > 0\);

  2. (ii)

    If s+1<0 and \(t_0<t\), then \(k_0(t)=(s+1)\nu\), where ν=ν(t) is the unique root of (3.6) on \((\underline{\lambda},\overline{\lambda})\) with \(d'(\nu)<0\); moreover, \(\nu'(t) = \frac{1}{d'(\nu)} < 0\).

Combining with Corollary 3.1, we have \(k_0(t)=(s+1)\mu(t)\) for \(t_0<t<\overline{t}\) and s+1>0 in both Case 1 (indefinite B) and Case 2 (positive semidefinite B). Also, using the definition of \(\underline{t}\) and Proposition 3.1, we have a symmetric statement for s+1<0 in these two cases: \(k_0(t)=(s+1)\nu(t)\) for \(t_0<t<\underline{t}\). The definitions of \(\overline{t}\) and \(\underline{t}\) were indeed introduced so that the above representations of \(k_0(t)\) hold for any nonzero symmetric matrix B.

The existence of a maximizer of k(t) in this case is less trivial and is obtained as a consequence of the next theorem.

Theorem 3.3

  1. (i)

    Suppose that \(s_0<-1\). When s+1>0, we have \(\lim_{t\rightarrow\infty} k'(t)=-1\). When s+1<0, we have \(\lim_{t\rightarrow\infty} k'(t) = \frac{s+1}{s_{0}+1}-1<0\). Furthermore, in either case, \(k(t)=-\infty\) for \(t<t_0\).

  2. (ii)

    Suppose that \(s_0=-1\). When s+1>0, we have \(\lim_{t\rightarrow\infty} k'(t)=-1\) and \(\lim_{t\downarrow t_{0}}k'(t) = \infty\). Furthermore, \(k(t)=-\infty\) for \(t\le t_0\).

  3. (iii)

    Suppose that \(s_0>-1\). When s+1>0, we have \(\lim_{t\rightarrow\infty} k'(t)=-1\) and \(\lim_{t\rightarrow-\infty}k'(t) = \frac{s+1}{s_{0}+1}-1>0\).

Proof

First of all, in all three cases, when s+1>0, the limit of k′(t) as t goes to ∞ can be found similarly to the case when B is indefinite.

We now consider part (i). When s+1<0, by Corollary 3.3, we have \(k'(t) = \frac{s+1}{d'(\nu)}-1\). Moreover, from Proposition 3.1, we see that \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=\infty\). Thus, ν is defined for all sufficiently large t and \(\nu\rightarrow-\infty\) as \(t\rightarrow\infty\). We conclude from these results and Proposition 3.1 that \(\lim_{t\rightarrow\infty} k'(t) = \frac{s+1}{s_{0}+1}-1<0\). The conclusion that \(k(t)=-\infty\) for \(t<t_0\) follows from Proposition A.1.

We next turn to part (ii). Since d is strictly increasing, it follows from Proposition 3.1 that

$$\bar{t} = \lim _{\lambda\uparrow\overline{\lambda}}d(\lambda )> \lim _{\lambda\rightarrow-\infty}d(\lambda)= t_0. $$

By Corollary 3.3, we have \(k'(t) = \frac{s+1}{d'(\mu)}-1\) for \(\bar{t}> t> t_{0}\). Furthermore, from Proposition 3.1, we see that \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=t_0>-\infty\) and hence \(\mu\rightarrow-\infty\) as \(t\downarrow t_0\). We conclude from this, Proposition 3.1 and the fact that \(s_0+1=0\) that \(\lim_{t\downarrow t_{0}} k'(t) = \infty\). Furthermore, since \(k(t)=k_0(t)-t\) by definition, we have by Corollary 3.3 that

$$\lim_{t\downarrow t_0}k(t) = (s+1)\lim_{t\downarrow t_0} \mu(t)-t_0 = -\infty. $$

It then follows from the concavity of k that \(k(t)=-\infty\) when \(t\le t_0\).

Finally, for part (iii), note that from Proposition 3.1, we have \(\lim_{\lambda\rightarrow-\infty} d(\lambda)=-\infty\) and hence \(\mu\rightarrow-\infty\) as \(t\rightarrow-\infty\). Thus, we conclude further from Proposition 3.1 that \(\lim_{t\rightarrow-\infty} k'(t) = \frac{s+1}{s_{0}+1}-1>0\). □

3.2 Recovering solution of GTRS from maximizing k(t)

In this subsection, we discuss how one can obtain a solution to (3.1) after obtaining a maximizer \(t^*\) of k(t). In addition, we will argue that such a maximizer has to be unique.

We focus on the case when s+1>0. We will briefly comment on the case when s+1<0 at the end of this subsection. Due to the definition of \(\overline{t}\), we have \(k_0(t)=(s+1)\mu(t)\) for \(t_0<t<\overline{t}\), where μ is the unique root of (3.6) in \((\underline{\lambda},\overline{\lambda})\) with \(d'(\mu)>0\). Furthermore, any maximizer of k(t) has to lie in \(\mathrm{cl}(t_{0},\overline{t})\). Moreover, for any \(t\in(t_{0},\overline{t})\), we see from the definition of \(t_0\) that the maximization problem in (3.5) is strictly feasible, and hence the infimum in (3.5) is attained. By Theorem 2.3, the infimum is attained at some generalized eigenvector of the matrix pair

$$ \left( \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array}\right], \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \right), $$
(3.7)

with corresponding eigenvalue μ. Let \(y(t)=\begin{bmatrix}y_0(t)\\z(t)\end{bmatrix}\) be a generalized eigenvector attaining the infimum in (3.5). Since \(\mu\in(\underline{\lambda},\overline{\lambda})\), it follows that the first coordinate \(y_0(t)\) of y(t) must be nonzero. Thus, we may further scale the vector so that \(y_0(t)>0\). Moreover, since s+1>0, we can scale the vector again so that \(y(t)^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y(t)=1\).

We claim that such a vector is unique. To see this, notice first that for any \(\lambda\in(\underline{\lambda},\overline{\lambda})\), we have

$$ \det \left( \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \lambda \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \right) = \bigl(t -d(\lambda)\bigr) \det(A - \lambda B). $$

It thus follows from this, det(AμB)>0 and d′(μ)>0 that μ is a root of multiplicity one of the polynomial

$$\lambda\mapsto\det \left( \left[ \begin{array}{c@{\quad }c} t-\lambda& -(a-\lambda b)^T\\ -(a-\lambda b)& A-\lambda B \end{array} \right] \right). $$

Hence, the dimension of the nullspace of \(\left[\begin{smallmatrix}t&-a^T\\-a&A\end{smallmatrix}\right]-\mu\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]\) is one; i.e., any generalized eigenvector corresponding to μ differs only by a scaling. Since the scaling is uniquely determined by the normalization \(y(t)^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y(t)=1\) and \(y_0(t)>0\), the generalized eigenvector constructed under these two restrictions is unique. Using standard arguments and the implicit function theorem, one can show in addition that y(t) is differentiable for \(t_0<t<\overline{t}\).

By [5, Theorem 4.13] and using the above notations, the derivative of k(t) at any \(t\in(t_{0},\overline{t})\) is given by

$$ k'(t) = (s+1)y_0^2(t)-1 = (s+1)\mu'(t) - 1. $$
(3.8)

We next analyze the following cases. Recall that we currently assume s+1>0 and that \(t^*\) is a maximizer of k(t).

Case 1: \(t^{*}\in(t_{0},\overline{t})\)

In this case, k is differentiable at \(t^*\). Hence, necessarily

$$y_0^2\bigl(t^*\bigr) = \frac{1}{s+1}. $$

It follows, after a simple calculation, that \(x^{*}:=\frac{z(t^{*})}{y_{0}(t^{*})}\) satisfies \({x^*}^TBx^*-2b^Tx^*=s\), from which it follows by checking the optimality conditions that \(x^*\) is an optimal solution to (3.1), with the Lagrange multiplier given by \(\mu(t^{*})\in(\underline{\lambda},\overline{\lambda})\).

Case 2: \(t^{*} = t_{0} < \overline{t}\)

In this case, \(k'(t)\le0\) for all \(t>t_0\) sufficiently close to \(t_0\). Hence

$$ \limsup_{t\downarrow t_0}y_0^2(t) \le\frac{1}{s+1}. $$
(3.9)

Furthermore, since \(t^*\in\mathrm{dom}(k)\) by the definition of maximizer, the assumption implies that \(t_0\in\mathrm{dom}(k)\) and thus \(k_0\) is (right) continuous at \(t_0\). Then, since \(k_0(t)=(s+1)\mu(t)\) for \(t_0<t<\overline{t}\), we see that \(\lim_{t\downarrow t_{0}}\mu(t)\) exists (and is finite). We claim that \(\lim_{t\downarrow t_{0}}\mu(t) = \underline{\lambda}\). To see this, note that if \(\lim_{t\downarrow t_{0}}\mu(t) \in(\underline{\lambda},\overline{\lambda})\), then \(d'(\lim_{t\downarrow t_{0}}\mu(t)) = 0\), which implies that \(\mu'(t)=\frac{1}{d'(\mu(t))}>0\) becomes arbitrarily large as \(t\downarrow t_0\). In view of (3.8), this contradicts the fact that \(k'(t)\le0\) for t close to \(t_0\). On the other hand, if \(\lim_{t\downarrow t_{0}}\mu(t)=\overline{\lambda}\), then \(\overline{\lambda}\) is finite. But by the definition of μ(t), it is the unique root of \(d(\lambda)=t\) on \((\underline{\lambda},\overline{\lambda})\) with positive slope, and thus \(\mu(t) > \lim_{t\downarrow t_{0}}\mu(t) = \overline{\lambda}\) for all \(\overline{t} > t > t_{0}\). But \(\mu(t)<\overline{\lambda}\) by definition, and we arrive at a contradiction. Thus, we must have \(\lim_{t\downarrow t_{0}}\mu(t) = \underline{\lambda}\). In particular, \(\underline{\lambda}\) is finite.

We next claim that there exists a generalized eigenvector \(y^{*} = \begin{bmatrix} y_{0}^{*}\\z^{*} \end{bmatrix} \) of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} t_0&-a^T\\ -a&A \end{array} \right], \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \right) $$

corresponding to \(\underline{\lambda}\) such that \(y_{0}^{*}\neq0\) and \({y^*}^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y^*=1\).

To prove this, fix \(t_{1}\in(t_{0},\overline{t})\) and consider \(\{y(t):\, t_0<t<t_1\}\). Since \(t_1>t_0\), by the definition of \(t_0\), there exists \(\hat{\lambda}>\underline{\lambda}\) such that

$$ \left[ \begin{array}{c@{\quad }c} t_1&-a^T\\ -a&A \end{array} \right] -\hat{\lambda}\left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq\epsilon I, $$
(3.10)

for some ϵ>0. Then, for any \(t_0<t<t_1\), we have that

$$\begin{aligned} 0 &= y(t)^T \biggl( \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array}\right] - \mu(t) \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \biggr)y(t)\\ &= y(t)^T \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] y(t) - \mu(t) \\ &= y^2_0(t) (t-t_1) + y(t)^T \left[ \begin{array}{c@{\quad }c} t_1 & -a^T\\ -a& A \end{array} \right] y(t) - \mu(t) \\ &= y^2_0(t) (t-t_1) + y(t)^T \biggl( \left[ \begin{array}{c@{\quad }c} t_1 & -a^T\\ -a& A \end{array} \right] -\hat{\lambda}\left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \biggr)y(t) - \bigl(\mu(t) - \hat{\lambda}\bigr) \\ &\ge y^2_0(t) (t-t_1) - \bigl(\mu(t) - \hat{\lambda}\bigr) + \epsilon\bigl\| y(t)\bigr\| ^2, \end{aligned}$$

where the last inequality follows from (3.10). Thus,

$$ y^2_0(t) (t_1-t) + \bigl( \mu(t) - \hat{\lambda}\bigr) \ge\epsilon\bigl\| y(t)\bigr\| ^2. $$
(3.11)

This together with (3.9) shows that \(\{y(t):\, t_0<t<t_1\}\) is bounded. Consider any cluster point \(y^*\) of {y(t)} as \(t\downarrow t_0\). Then \({y^*}^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y^*=1\), and hence in particular \(y^*\neq0\). Moreover, since \(\mu(t)-\hat{\lambda}\rightarrow\underline{\lambda} -\hat{\lambda}<0\), we see further from (3.11) that \(y_{0}^{*} > 0\). Finally, it is easy to check that \(y^*\) is a generalized eigenvector of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} t_0&-a^T\\ -a&A \end{array} \right], \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \right) $$

corresponding to the eigenvalue \(\underline{\lambda}\).

Next, define \(x^{*} := \frac{1}{y_{0}^{*}}z^{*}\). Then it follows from (3.9) that \({x^*}^TBx^*-2b^Tx^*\ge s\) and from the definition of y(t) that \((A-\underline{\lambda}B)x^{*}= a-\underline{\lambda}b\). By Lemma 2.5, there exists \(v\in \operatorname{{Null}}(A-\underline{\lambda}B)\) with \(v^TBv<0\), and thus \(x^*+\alpha v\) would solve (3.1), for some suitable α>0.

Case 3: \(t^{*}=\overline{t} > t_{0}\)

In this case, we must have k′(t)≥0 for all \(t_{0}<t<\overline{t}\). Hence

$$ \liminf_{t\uparrow\overline{t}}y_0^2(t) \ge\frac{1}{s+1}. $$
(3.12)

Furthermore, since \(\overline{t}\) is finite by assumption, we must have \(\overline{\lambda}\) finite from the definition of \(\overline{t}\) and thus \(d(\overline{\lambda}) = \lim_{\lambda\uparrow\overline{\lambda}}d(\lambda) = \overline{t}\). This, together with the definition of μ(t) and the fact that d(λ) is strictly increasing when \(\lambda>\lim_{t\downarrow t_{0}}\mu(t)\), implies that \(\lim_{t\uparrow\overline{t}}\mu(t) = \overline{\lambda}\). We claim that there exists a generalized eigenvector \(y^{*} = \begin{bmatrix} y_{0}^{*}\\z^{*} \end{bmatrix}\) of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} \overline{t}&-a^T\\ -a&A \end{array}\right], \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \right) $$

corresponding to \(\overline{\lambda}\) such that \(y_{0}^{*}\neq0\) and \({y^*}^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y^*=1\).

We first show that the parameterized set of eigenvectors \(\{y(t):\; t_{1}\le t<\overline{t}\}\) is bounded for any fixed \(t_{1}\in(t_{0},\overline{t})\). Since \(t_{1}\in (t_{0},\overline {t})\), by the definition of \(t_0\) and Lemma 3.1(i), there exists \(\hat{\lambda}<\overline{\lambda}\) such that

$$ \left[ \begin{array}{c@{\quad }c} t_1&-a^T\\ -a&A \end{array} \right] -\hat{\lambda}\left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array}\right] \succeq\epsilon I, $$
(3.13)

for some ϵ>0. Using this and the definition of generalized eigenvector and eigenvalue, we have, for any \(t_1\le t<\overline{t}\), that

$$\begin{aligned} 0 &= y(t)^T \biggl( \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] - \mu(t) \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \biggr)y(t)\\ &= y(t)^T \left[ \begin{array}{c@{\quad }c} t & -a^T\\ -a& A \end{array} \right] y(t) - \mu(t) \\ &= y^2_0(t) (t-t_1) + y(t)^T \left[ \begin{array}{c@{\quad }c} t_1 & -a^T\\ -a& A \end{array} \right] y(t) - \mu(t) \\ &= y^2_0(t) (t-t_1) + y(t)^T \biggl( \left[ \begin{array}{c@{\quad }c} t_1 & -a^T\\ -a& A \end{array}\right] -\hat{\lambda}\left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \biggr)y(t) - \bigl(\mu(t) - \hat{\lambda}\bigr) \\ &\ge- \bigl(\mu(t) - \hat{\lambda}\bigr) + \epsilon\bigl\| y(t)\bigr\| ^2, \end{aligned}$$

where the last inequality follows from (3.13). This shows that \(\{y(t):\; t_{1}\le t<\overline{t}\}\) is bounded. Consider any cluster point \(y^*\) of {y(t)} as \(t\uparrow\overline{t}\). Due to (3.12), the first coordinate \(y_{0}^{*}\) of \(y^*\) is nonzero. It is easy to check that \({y^*}^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y^*=1\) and that \(y^*\) is a generalized eigenvector of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} \overline{t}&-a^T\\ -a&A \end{array}\right], \left[ \begin{array}{c@{\quad }c} 1&-b^T\\ -b&B \end{array} \right] \right) $$

corresponding to the eigenvalue \(\overline{\lambda}\).

Next, define \(x^{*} := \frac{1}{y_{0}^{*}}z^{*}\). Then it follows from (3.12) that \({x^*}^TBx^*-2b^Tx^*\le s\) and from the definition of y(t) that \((A-\overline{\lambda}B)x^{*}= a - \overline{\lambda}b\). By Lemma 2.5, there exists \(v\in \operatorname{{Null}}(A-\overline{\lambda}B)\) with \(v^TBv>0\), and thus \(x^*+\alpha v\) would solve (3.1), for some suitable α>0.

Case 4: \(t^{*}=t_{0}=\overline{t}\)

In this case, from the definitions of \(t_0\) and \(\overline{t}\), we have that \(\overline{\lambda}\) is finite and the infimum of d is attained at the right end point of the interval \((\underline{\lambda},\overline{\lambda})\), i.e., at \(\overline{\lambda}\). From the definition of d, this implies in particular that \(a - \overline{\lambda}b\in \operatorname{{Range}}(A - \overline{\lambda}B)\). Furthermore,

$$ 0\ge d'(\overline{\lambda}) = 1 + \psi(\overline{\lambda}). $$

Hence, using the fact that s+1>0, we see that

$$ \psi(\overline{\lambda}) = {x^*}^T B x^* - 2b^T x^*\le-1 < s, $$

where \(x^{*} := x(\overline{\lambda})\). Finally, recall from Lemma 2.5 that there exists \(v\in \operatorname{{Null}}(A-\overline{\lambda}B)\) with \(v^TBv>0\). Combining these facts, we conclude that \(x^*+\alpha v\) solves (3.1), for some suitable α>0.

Remark 3.1

(Uniqueness of maximizer of k)

In passing, we remark that k(t) must have a unique maximizer. First of all, it is easy to see that the four cases analyzed above are mutually exclusive for a maximizer \(t^*\in\mathrm{cl}(t_{0},\overline{t})\) of k(t). If Case 4 happens, then clearly the maximizer is unique and equals \(t_{0} = \overline{t}\). Otherwise, in each of the other three cases, we see that to each maximizer \(t^*\) there corresponds at least one dual solution \(\lambda^*\) of the GTRS= (3.1). The \(\lambda^*\) constructed in those cases are different for different \(t^*\), thanks to the strict monotonicity of μ(t) on \((t_{0},\overline{t})\). Since the GTRS= (3.1) must have a unique dual optimal solution in view of the analysis in Sect. 2.2.2 and our assumption that ψ is strictly increasing on \(\mathrm{cl}(\underline{\lambda},\overline{\lambda})\), we conclude that k(t) must have a unique maximizer.

The above arguments for s+1<0 are completely analogous. Note that we need to scale by \(\sqrt{-(s+1)}\) instead of \(\sqrt{s+1}\) to satisfy the constraint of the minimization problem in (3.5). Hence \(y(t)^T\left[\begin{smallmatrix}1&-b^T\\-b&B\end{smallmatrix}\right]y(t)=-1\) and moreover \(k'(t) = -(s+1)y_{0}^{2}(t)-1\).

In this subsection, we have given a complete analysis of how a solution of (3.1) can be recovered after maximizing k(t). Comparing this with the procedures outlined in Sect. 2.2, we see that if the problem (3.1) is obtained from those procedures, then \(\lambda^{*}\in(\underline{\lambda},\overline{\lambda})\). Thus, we must be in Case 1 and hence k(t) has to be differentiable at \(t^*\). On the other hand, it is easy to see that closed form solutions are obtained for the other cases using the procedures in Sect. 2.2.

3.3 Implementation details

In this subsection, we discuss implementation details of our algorithm. For simplicity, we only consider the case when B is nonsingular. In this case, by translating the optimization variable by \(B^{-1}b\) if necessary, we may further assume without loss of generality that b=0. Moreover, we only need to consider the cases when B is positive definite or indefinite. Furthermore, we still assume Assumption 2.1, the regularity (2.15) and s+1≠0. To explicitly guarantee that (2.19) never holds, we also assume a≠0.

When B is positive definite, \(\underline{\lambda}\) is negative infinity and \(\overline{\lambda}\) is the minimum generalized eigenvalue of the matrix pair (A,B), which can be computed efficiently. On the other hand, when B is indefinite, the interval \((\underline{\lambda},\overline{\lambda})\) cannot easily be determined in general. Thus, in this case, we consider the further subcase that A is positive definite. Then we know that \(0\in (\underline{\lambda},\overline{\lambda})\). Observe that for any \(\lambda\in[\underline{\lambda},0)\), we have

$$ A - \lambda B = -\lambda \biggl(B - \frac{1}{\lambda}A \biggr). $$
(3.14)

Since \(\lambda\in[\underline{\lambda},0)\) if, and only if, \(\frac {1}{\lambda} \in(-\infty,\frac{1}{\underline{\lambda}}]\), we conclude from (3.14) that \(\underline{\lambda} = \frac {1}{\overline{\rho}_{1}}\), where \(\overline{\rho}_{1}\) is the minimum generalized eigenvalue of the matrix pair (B,A). Similarly, one can show that \(\overline{\lambda} = -\frac {1}{\overline {\rho}_{2}}\), with \(\overline{\rho}_{2}\) being the minimum generalized eigenvalue of the matrix pair (−B,A). The quantities \(\overline{\rho}_{1}\) and \(\overline{\rho}_{2}\) can both be computed efficiently.
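This computation can be sketched in MATLAB as follows; dense eig is used here only for illustration (for the large sparse pairs in our experiments an iterative eigensolver such as eigifp would be used instead).

% Sketch: locate (lambda_lo, lambda_hi) when A is positive definite and B is indefinite.
rho1 = min(eig(full(B),  full(A)));   % minimum generalized eigenvalue of (B,A);  rho1 < 0
rho2 = min(eig(full(-B), full(A)));   % minimum generalized eigenvalue of (-B,A); rho2 < 0
lambda_lo =  1/rho1;                  % underline{lambda}
lambda_hi = -1/rho2;                  % overline{lambda}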

Case check

We first carry out the case check as described in Sect. 2.2.2. We obtain an explicit solution when the GTRS (1.1) has an interior solution \(A^{-1}a\) or when the GTRS instance is a hard case, case 2 instance.

Shift and deflation for the hard case, case 1, when B is positive definite

After the case check, the bound side has been determined and so we focus on the equality constrained case \(x^TBx=s\). In the next proposition, we describe a shift and deflation technique that transforms a GTRS= instance from the hard case, case 1, to the easy case, when B is positive definite.

Proposition 3.2

Suppose that we consider the GTRS= with the equality constraint \(x^TBx=s\) and B≻0. Let \(\overline{\lambda}\) be the smallest generalized eigenvalue of the matrix pair (A,B) and suppose that a corresponding nonzero eigenvector v satisfies \(a^Tv=0\). Furthermore, let α>0. Then

$$ \begin{array}{c} (x^*,\lambda^*)\ \textit{solves}\ \mathit{GTRS}_= \\ \textbf{if},\ \textbf{and}\ \textbf{only}\ \textbf{if},\\ (x^*,\lambda^*)\ \textit{solves}\ \mathit{GTRS}_=\ \mathit{with}\ A\ \mathit{replaced}\ \mathit{by}\ A-\overline {\lambda}B+\alpha(Bv)(Bv)^T. \end{array} $$

Proof

First, since \(x^TBx=s\), it is clear that \((x^*,\lambda^*)\) solves GTRS= if, and only if, \((x^*,\lambda^*)\) solves GTRS= with A replaced by \(A-\overline{\lambda}B\). Next, consider the GTRS= with A replaced by \(A-\overline{\lambda}B\). Let \(y=B^{\frac{1}{2}} x\) and use the substitution \(x=B^{-\frac{1}{2}}y\). This results in the following TRS:

$$ \begin{array}{r@{\quad }l} \inf& y^T \bar{A} y - 2\bar{a}^T y\\ \mathrm{s.t.} & y^Ty = s, \end{array} $$

where \(\bar{A}= B^{-\frac{1}{2}}(A-\overline{\lambda}B)B^{-\frac{1}{2}}\) and \(\bar{a} = B^{-\frac{1}{2}}a\).

We now apply the shift and deflate lemma in [11] stated for the above TRS and obtain that

$$ \begin{array}{c} (y^*,\lambda^*)\ \mathrm{solves}\ \mathrm{TRS}\\ \mathrm{if},\ \mathrm{and}\ \mathrm{only}\ \mathrm{if},\\ (y^*,\lambda^*)\ \mathrm{solves}\ \mathrm{TRS}\ \mathrm{when}\ \bar{A}\ \mathrm{is}\ \mathrm{replaced}\ \mathrm{by}\ \bar{A}+\alpha ww^T, \end{array} $$

where \(w = B^{\frac{1}{2}} v\) is the corresponding eigenvector and hence satisfies \(w^{T} \bar{a}=0\) by assumption and the definition of \(\bar{a}\). After substituting again using \(y = B^{\frac{1}{2}}x\), we see that

$$y^T \bigl( B^{-\frac{1}{2}}(A-\overline{\lambda}B)B^{-\frac{1}{2}} + \alpha ww^T \bigr)y= x^T \bigl( A-\overline{\lambda}B + \alpha(Bv) (Bv)^T \bigr)x. $$

The linear term and constraint follow similarly. □
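A minimal MATLAB sketch of the resulting transformation, assuming lam_bar is the smallest generalized eigenvalue of (A,B), v is a corresponding eigenvector with a'*v = 0, and alpha > 0 is a chosen shift parameter (the variable names are ours):

% Sketch: shift and deflation of Proposition 3.2 (B positive definite).
w     = B*v;                           % deflation direction Bv
A_new = A - lam_bar*B + alpha*(w*w');  % A <- A - lam_bar*B + alpha*(Bv)(Bv)^T
% In practice the rank-one term would be kept in factored form rather than formed densely.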

Initialization

After the case check and the possible shift and deflation, we proceed to solve the equality constrained case by the RW algorithm. We discuss initialization in this subsection.

We first consider the case when B is positive definite. In this case, k(t) is well-defined for all t by Theorem 3.2. Let \(t^*\) be the optimal solution of \(\max_t k(t)\). As in [29], we derive an interval that contains \(t^*\) in the next proposition for initializing the RW algorithm.

Proposition 3.3

Suppose that \(\lambda^{*} < \overline{\lambda}\). Then

$$ \overline{\lambda} - \sqrt{\frac{a^TB^{-1}a}{s}} \le t^* \le \overline {\lambda} + \sqrt{s\,a^TB^{-1}a}. $$
(3.15)

Proof

The assumption \(\lambda^{*} < \overline{\lambda}\) implies that we can recover a solution of GTRS= from the maximizer of k(t). Let \(x^*\) be the solution of GTRS= thus recovered. Then \(\begin{bmatrix}1\\x^*\end{bmatrix}\) is a generalized eigenvector of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} t^*&-a^T\\ -a&A \end{array}\right], \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array} \right] \right) $$

with generalized eigenvalue \(\lambda^*\). Hence, in particular,

$$\begin{aligned} t^* - a^T x^* = \lambda^*\quad \Rightarrow\quad t^* = \lambda^* +a^Tx^* \le& \overline{\lambda} + \sqrt{{x^*}^TBx^*} \sqrt{a^TB^{-1}a}\\ =& \overline {\lambda} + \sqrt{s\ a^TB^{-1}a}. \end{aligned}$$

This proves the upper bound in (3.15). We next derive the lower bound. In this case, we define \(\delta= \overline{\lambda} - \lambda^{*} > 0\). Then we have

$$ A - \lambda^*B \succeq\delta B \succ0 $$
(3.16)

and hence

$$\begin{aligned} t^* - \lambda^* = a^T x^* = {x^*}^T\bigl(A - \lambda^*B\bigr)x^* \ge\delta {x^*}^T B x^* = \delta s. \end{aligned}$$
(3.17)

In addition, from the definition of d(λ), the definition of \(\lambda^*\) and (3.16), we have

$$\begin{aligned} t^* - \lambda^* = a^T\bigl(A - \lambda^*B \bigr)^{-1}a \le\frac{1}{\delta}a^T B^{-1}a. \end{aligned}$$
(3.18)

Combining (3.17) and (3.18), we see that

$$ \delta\le\sqrt{\frac{a^TB^{-1}a}{s}}. $$

Finally, we observe from (3.18) that \(t^*\ge\lambda^*\) and hence \(\delta\ge\overline{\lambda} - t^*\). The lower bound in (3.15) now immediately follows. □
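In MATLAB, the interval (3.15) can thus be obtained with a single linear solve; the sketch below assumes B ≻ 0, b = 0 and that lam_bar holds \(\overline{\lambda}\) (the variable names are ours).

% Sketch: initial interval for t* from Proposition 3.3.
aBa   = a' * (B \ a);               % a^T B^{-1} a
t_low = lam_bar - sqrt(aBa / s);
t_up  = lam_bar + sqrt(s * aBa);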

On the other hand, when B is indefinite, according to Theorem 3.1 and Corollary 3.2, we conclude that \(t_0\) is finite and the function k(t) has \(t_0\) as the left end point of its domain. We currently do not know of any efficient way of computing/estimating \(t_0\). However, if \(t_0\) and the corresponding \(\lambda_0\) such that \(d(\lambda_0)=t_0\) are known, then the function value and the derivative of k at any \(t>t_0\) can be efficiently computed; see the paragraph below Proposition 3.4, with \(\mu=\lambda_0\) in (3.23). Furthermore, for initialization, one can find an interval containing \(t^*\) as follows; a MATLAB sketch of this bracketing procedure is given after the steps:

Procedure for finding an interval containing \(t^*\) when B is indefinite.

  1. Step 1.

    Take ϵ>0 and compute \(k'(t_0+\epsilon)\).

  2. Step 2.

    If \(k'(t_0+\epsilon)>0\), set \(t_+=t_0+\epsilon\). Otherwise, set \(t_-=t_0+\epsilon\) and i=1.

  3. Step 3.

    While \(k'(t_-)<0\), update \(t_+\leftarrow t_-\) and \(t_-\leftarrow t_0+\zeta^i\epsilon\) for some fixed ζ∈(0,1). Update \(i\leftarrow i+1\). Repeat this step.

  4. Step 4.

    While \(k'(t_+)>0\), update \(t_-\leftarrow t_+\) and \(t_+\leftarrow\kappa(t_+-\lambda_0)+\lambda_0\) for some fixed κ>1. Repeat this step.

  5. Step 5.

    By construction, \(k'(t_-)>0\) and \(k'(t_+)<0\). Thus, \(t^*\in(t_-,t_+)\).
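The following MATLAB sketch mirrors Steps 1–5; kprime is assumed to be a function handle returning k′(t) (computed as discussed below Proposition 3.4), and eps0, zeta and kappa play the roles of ε, ζ and κ (all names are ours).

% Sketch: bracket t* when B is indefinite, given t0 and lambda0 with d(lambda0) = t0.
function [t_minus, t_plus] = bracket_tstar(kprime, t0, lambda0, eps0, zeta, kappa)
  t_trial = t0 + eps0;                              % Step 1
  if kprime(t_trial) > 0                            % Step 2
    t_plus = t_trial;  t_minus = -Inf;
  else
    t_minus = t_trial; t_plus = Inf; i = 1;
  end
  while isfinite(t_minus) && kprime(t_minus) < 0    % Step 3: shrink towards t0
    t_plus  = t_minus;
    t_minus = t0 + zeta^i*eps0;
    i = i + 1;
  end
  while isfinite(t_plus) && kprime(t_plus) > 0      % Step 4: expand away from lambda0
    t_minus = t_plus;
    t_plus  = kappa*(t_plus - lambda0) + lambda0;
  end
end                                                 % Step 5: k'(t_minus) > 0 > k'(t_plus)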

Computing \(k_0(t)\) and \(k'_{0}(t)\)

We next discuss how \(k_0(t)\) and \(k'_0(t)\) for \(t>t_0\), when they exist, can be computed efficiently. Recall from the definition of \(k_0(t)\) and (3.8) that

$$\begin{aligned} k_0(t) = \inf \left\{ u^T \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array}\right] u:\; u^T \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array} \right] u = s + 1 \right \},\qquad k_0'(t) = u_0^2(t), \end{aligned}$$

where \(u_0(t)\ge0\) is the first coordinate of a vector u(t) attaining the infimum defining \(k_0(t)\). In the case when B is positive definite, \(k_0(t)\) can be computed from the minimum generalized eigenvalue of the matrix pair

$$\left( \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array} \right], \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array} \right] \right), $$

and \(u_0(t)\) is obtained from the corresponding eigenvector. However, when B is not positive definite, \(k_0(t)\) and its derivative cannot be obtained directly from an extremal generalized eigenpair. We now discuss how to compute \(k_0\) and its derivative in this case.
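Before turning to the indefinite case, the positive definite computation just described can be sketched as follows (b = 0 and s+1 > 0 here); dense eig is used only for illustration, whereas the experiments rely on the sparse eigensolver eigifp.

% Sketch: k_0(t) and k_0'(t) from the minimum generalized eigenpair when B > 0 (b = 0).
M = [t, -a'; -a, A];
N = blkdiag(1, B);                       % [1 0; 0 B], positive definite here
[V, E]   = eig(full(M), full(N), 'chol');
[lam, j] = min(diag(E));                 % minimum generalized eigenvalue
u = V(:, j);
u = u * sqrt((s+1) / (u'*N*u));          % rescale so that u'*N*u = s + 1
k0  = (s+1) * lam;                       % k_0(t)
k0p = u(1)^2;                            % k_0'(t) = u_0(t)^2
kp  = k0p - 1;                           % k'(t)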

To this end, we first consider a closely related problem. Let C and D be symmetric matrices. Consider the following program.

$$ \begin{array}{r@{\quad }l} \mathrm{val}:=\inf& x^T C x\\ \mathrm{s.t.} & x^T D x = 1. \end{array} $$
(3.19)

Then we have the following result.

Proposition 3.4

Suppose that (3.19) is feasible and that there exists λ so that \(C-\lambda D\succeq0\). Suppose further that val≠0.

  1. (i)

    If val>0 and if \(x^*\) is a solution to (3.19), then \(\frac{1}{\sqrt{\mathrm{val}}}x^{*}\) is a solution to the following optimization problem, whose optimal value is \(-\frac{1}{\mathrm{val}}\):

    $$ \begin{array}{r@{\quad }l} \inf& -x^T D x\\ \mathrm{s.t.} & x^T C x = 1. \end{array} $$
    (3.20)
  2. (ii)

    If val<0 and if \(x^*\) is a solution to (3.19), then \(\frac{1}{\sqrt{-\mathrm{val}}}x^{*}\) is a solution to the following optimization problem, whose optimal value is \(-\frac{1}{\mathrm{val}}\):

    $$ \begin{array}{r@{\quad }l} \inf& x^T D x\\ \mathrm{s.t.} & -x^T C x = 1. \end{array} $$
    (3.21)

Proof

By Theorem 2.3, we see that \(x^*\) is an optimal solution to (3.19) if, and only if, there exists \(\lambda^*\) such that

$$\begin{aligned} \begin{aligned} \bigl(C - \lambda^* D\bigr) x^*& = 0, \\ C - \lambda^* D &\succeq0, \\ {x^*}^TDx^* & = 1. \end{aligned} \end{aligned}$$
(3.22)

Using the first and third relations of (3.22), we see that

$$ \mathrm{val} = {x^*}^T C x^* = \lambda^* {x^*}^T D x^* = \lambda^*. $$

Since val≠0, we obtain that \(\lambda^*=\mathrm{val}\neq0\). For part (i), we have \(\lambda^*>0\). From (3.22) and the relation \(\mathrm{val}={x^*}^TCx^*\), we see that

$$\begin{aligned} \biggl(\frac{1}{\lambda^*}C - D \biggr) \frac{x^*}{\sqrt{\lambda ^*}} =& 0, \\ \frac{1}{\lambda^*}C - D \succeq&0, \\ \biggl(\frac{x^*}{\sqrt{\lambda^*}} \biggr)^TC \biggl(\frac {x^*}{\sqrt {\lambda^*}} \biggr) =& \frac{\mathrm{val}}{{\lambda^*}} = 1. \end{aligned}$$

Hence, \(\frac{x^{*}}{\sqrt{\lambda^{*}}} = \frac{x^{*}}{\sqrt{\mathrm{val}}}\) solves (3.20). On the other hand, for part (ii), \(\lambda^*<0\). Hence, we have

$$\begin{aligned} \biggl(-\frac{1}{\lambda^*}C + D \biggr) \frac{x^*}{\sqrt{-\lambda^*}} = 0, \\ -\frac{1}{\lambda^*}C + D \succeq&0, \\ \biggl(\frac{x^*}{\sqrt{-\lambda^*}} \biggr)^TC \biggl(\frac {x^*}{\sqrt {-\lambda^*}} \biggr) =& \frac{\mathrm{val}}{-\lambda^*} = -1. \end{aligned}$$

Thus, \(\frac{x^{*}}{\sqrt{-\lambda^{*}}} = \frac{x^{*}}{\sqrt{-\mathrm{val}}}\) solves (3.21). □

We now complete our discussion on the efficient computation of \(k_0(t)\) and its derivative when B is indefinite. Suppose that we know a μ such that

$$ \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array} \right] - \mu \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array}\right] \succ0. $$
(3.23)

Since \(\left[\begin{smallmatrix}1&0\\0&B\end{smallmatrix}\right]\) is indefinite, we see immediately that

$$ \inf \left\{-\mathrm{sign}(s+1)y^T \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array}\right] y:\; y^T \left( \left[ \begin{array}{c@{\quad }c} t&-a^T\\ -a&A \end{array} \right] - \mu \left[ \begin{array}{c@{\quad }c} 1&0\\ 0&B \end{array}\right] \right)y = 1 \right\} < 0. $$
(3.24)

Thus, in view of Proposition 3.4, we see that \(k_0(t)\) and \(k_0'(t)\), upon scaling, can be computed by considering the optimization problem (3.24). Since the quadratic form in the constraint of (3.24) is positive definite, the optimal value and solution of (3.24) can be obtained from an extremal generalized eigenpair.
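Concretely, with b = 0 and a μ satisfying (3.23), the computation can be sketched as below; the rescaling constants are obtained from Proposition 3.4 together with the substitution \(u=\sqrt{|s+1|}\,x\), and dense eig is again shown only for illustration (the experiments use eigifp on the sparse pair instead).

% Sketch: k_0(t) and k_0'(t) when B is indefinite (b = 0), via (3.24) and Proposition 3.4.
M = [t, -a'; -a, A];
N = blkdiag(1, B);                          % indefinite here
D = M - mu*N;                               % positive definite by (3.23)
C = -sign(s+1)*N;
[V, E]   = eig(full(C), full(D), 'chol');
[val, j] = min(diag(E));                    % optimal value of (3.24); val < 0
y = V(:, j);
y = y / sqrt(y'*D*y);                       % normalize so that y'*D*y = 1
u = sqrt(abs(s+1)/(-val)) * y;              % minimizer of (3.5): u'*N*u = s + 1
k0  = mu*(s+1) - abs(s+1)/val;              % k_0(t) = mu*(s+1) + u'*D*u
k0p = u(1)^2;                               % k_0'(t)
kp  = k0p - 1;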

Techniques for maximizing k(t)

These techniques are adapted from the original Rendl-Wolkowicz algorithm [11] for solving TRS (1.2).

Vertical cut

Suppose there exist \(t_g\) with \(k'(t_g)<0\) and \(t_b\) with \(k'(t_b)>0\). Suppose further that \(k(t_g)<k(t_b)\). Then we can use the so-called vertical cut to reduce the interval that contains \(t^*\). More precisely, we obtain our new approximation \(t_+\) by intersecting the tangent line of k at \(t_g\) with the horizontal line \(k=k(t_b)\). Theoretically, we always get \(t_+\in[t_b,t_g]\) and \(k'(t_+)<0\). However, approximate evaluation of slopes and function values of k can cause \(t_+\notin[t_b,t_g]\). When this happens, the vertical cut fails. The case when \(k(t_g)>k(t_b)\) can be treated similarly.

Triangle interpolation

Suppose there exist \(t_g\) with \(k'(t_g)<0\) and \(t_b\) with \(k'(t_b)>0\). We then intersect the tangent lines of k(t) at \(t_g\) and \(t_b\) to obtain a new estimate \(t_+\). Theoretically, we always get \(t_+\in[t_b,t_g]\). However, since we only compute the slope and the function value of k approximately, it can happen that \(t_+\notin[t_b,t_g]\). When this happens, the triangle interpolation fails.

Inverse linear interpolation

Notice that \(k'(t) = |s+1|y_{0}^{2}(t) - 1\), with \(y_0(t)\) defined as in Sect. 3.2. Thus, k′(t)=0 is equivalent to

$$ \phi(t):= \sqrt{|s+1|} - \frac{1}{y_0(t)} = 0. $$

We consider approximating the inverse function of ϕ(t) by a linear function, i.e.,

$$t(\phi)=a \phi+ b, $$

for some real numbers a, b, and then set our next approximation as \(t_+=t(0)\) if \(t(0)\in[t_b,t_g]\). This technique is different from the previous techniques in the sense that it can also operate on two points whose slopes have the same sign.
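The three update rules can be summarized by the following MATLAB sketch; kg, kb, kpg, kpb stand for \(k(t_g)\), \(k(t_b)\), \(k'(t_g)\), \(k'(t_b)\), and (t1,phi1), (t2,phi2) are two evaluated points of ϕ (all names are ours).

% Sketch of the three t-updates; recall k'(t_g) < 0 and k'(t_b) > 0, so t_b < t* < t_g.
t_vc  = t_g + (kb - kg)/kpg;                       % vertical cut (assuming k(t_g) < k(t_b))
t_tri = (kb - kg + kpg*t_g - kpb*t_b)/(kpg - kpb); % triangle interpolation
t_inv = t1 - phi1*(t2 - t1)/(phi2 - phi1);         % inverse linear interpolation at phi = 0
% Each candidate is accepted only if it lands in the current interval [t_b, t_g].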

Primal step to feasibility

As in the RW algorithm for solving TRS, we keep track of the point \(x(t):=\frac{1}{y_{0}(t)}z(t)\), where \(y(t)=\begin{bmatrix}y_0(t)\\z(t)\end{bmatrix}\) is defined as in Sect. 3.2. In the easy case/hard case, case 1, \(y_0(t)\) is nonzero as the algorithm proceeds and one can show that x(t) converges to the optimal solution of GTRS (1.1) as t converges to \(t^*\). Notice that if there exist \(t_g\) with \(k'(t_g)<0\) and \(t_b\) with \(k'(t_b)>0\), then \((x(t_g)^TBx(t_g)-s)(x(t_b)^TBx(t_b)-s)<0\). Thus, by choosing a suitable α, the point \(\alpha x(t_g)+(1-\alpha)x(t_b)\) will be primal feasible. Furthermore, the resulting sequence is still convergent to \(x(t^*)\).
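With b = 0, the feasible combination can be found by solving a scalar quadratic, as in the following sketch; xg and xb denote \(x(t_g)\) and \(x(t_b)\) (the names are ours).

% Sketch: primal step to feasibility; find alpha with (alpha*xg + (1-alpha)*xb)'*B*(...) = s.
w  = xg - xb;
c2 = w'*B*w;  c1 = 2*(w'*(B*xb));  c0 = xb'*B*xb - s;
r  = roots([c2, c1, c0]);
r  = r(imag(r) == 0 & r >= 0 & r <= 1);   % a root in [0,1] exists by the sign change
alpha  = r(1);
x_feas = alpha*xg + (1-alpha)*xb;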

Before closing this section, we summarize our algorithm in the following flowchart.

Flowchart for the extended RW algorithm

  • Check for the hard cases/interior solution; shift and deflate; find an initial interval containing \(t^*\).

  • Main loop: iterate until a termination criterion is met:

    1. 1.

      update t.

      1. (a)

        Set t to be the midpoint of the interval of uncertainty for t.

      2. (b)

        If points at which k has positive and negative slope, respectively, are known:

        1. (i)

          Do vertical cut; reduce the interval of uncertainty for t if possible. Update t if vertical cut is successful.

        2. (ii)

          Do triangle interpolation. Update t if triangle interpolation is successful.

      3. (c)

        Do inverse linear interpolation. Update t if inverse linear interpolation is successful.

    2. 2.

      At the new t, compute k(t) and k′(t) using techniques discussed above and eigifp [14].Footnote 3 Take a primal step to satisfy feasibility if iterates at which k has positive and negative slopes are available.

  • End loop.

3.4 Numerical experiments with new RW algorithm

In this section, we study the numerical performance of our new RW algorithm for GTRS. We compute \(\lim_{\lambda\uparrow\overline{\lambda}}\psi(\lambda)\) and \(\lim_{\lambda\downarrow\underline{\lambda}}\psi(\lambda)\) as described in Sect. 2.2.2 with \(\alpha=\frac{\|A\| _{2}}{\sqrt{n}}\), where ∥A2 is the (approximate) spectral norm found using the MATLAB command normest(A,1). In particular, we check for \(\frac{|a^{T}v|}{\|a\|}<1e-8\) for every vector v from an orthonormal basis of the nullspaces \(\operatorname{{Null}}(A -\overline{\lambda}B)\) or \(\operatorname{{Null}}(A -\underline{\lambda}B)\). If the instance is not a hard case, case 2, instance, we perform shift and deflation as described in Proposition 3.2 using these vectors v and the same α as above, when B≻0. We then initialize the RW algorithm (for the equivalent equality constrained problem) as discussed in the previous section, and terminate the algorithm when

$$\begin{aligned} \begin{array} {r@{\quad }c} \displaystyle&\max \biggl\{ \displaystyle\frac{|k'(t)|^2}{|s+1|^2}, \frac{|q_0(x) - k(t)|}{|{q_0}^\mathrm{best}|+1}, \frac{|q_1(x) - s|}{|s|+ 1}, \frac{\|Ax - \lambda Bx - a\|^2}{(\|A\|_2+\|a\|+1)^2} \biggr\} < 10^{-13}, \\ \mathrm{or}\ &\displaystyle\frac{|\mathrm{high} - \mathrm{low}|}{|\mathrm{high}|+|\mathrm{low}|} < 10^{-15}, \end{array} \end{aligned}$$
(3.25)

or when the number of iterations hits 30; here x is recovered from a generalized eigenvector and then shifted to become approximately feasible, ∥A2 is the (approximate) spectral norm found using the MATLAB command normest(A,1), \(q_{0}^{\mathrm{best}}\) is the smallest primal objective value up to the current iteration. Our code is written in MATLAB. All numerical experiments are performed on an SGI XE340 system, with two 2.4 GHz quad-core Intel E5620 Xeon 64-bit CPUs and 48 GB RAM, equipped with SUSE Linux Enterprise server 11 SP1 and MATLAB 7.14 (R2012a). All routines are timed using the tic-toc function in MATLAB.
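As an illustration, the stopping test (3.25) amounts to a check of the following form, where kt, kpt, lam, x, high, low and q0best hold the current values of k(t), k′(t), the multiplier estimate, the (approximately feasible) iterate, the bracketing endpoints and the best primal value so far (all names are ours):

% Sketch of the stopping test (3.25), with b = 0.
q0    = x'*(A*x) - 2*(a'*x);
q1    = x'*(B*x);
normA = normest(A, 1);
ratios = [ (abs(kpt)/abs(s+1))^2, ...
           abs(q0 - kt)/(abs(q0best) + 1), ...
           abs(q1 - s)/(abs(s) + 1), ...
           norm(A*x - lam*(B*x) - a)^2/(normA + norm(a) + 1)^2 ];
stop = max(ratios) < 1e-13 || abs(high - low)/(abs(high) + abs(low)) < 1e-15;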

3.4.1 B is positive definite

In this subsection, we consider GTRS (1.1) with a positive definite B. For simplicity, we also assume without loss of generality that b=0. We consider two cases: A is not positive semidefinite or A is positive definite.

A is not positive semidefinite

In this case, the GTRS is equivalent to (3.1) with s=u, which is also equivalent to (1.1) with l=0. We compare three algorithms:

  • The RW algorithm for solving (1.1);

  • The GLTR algorithm [15], an algorithm in the Galahad package (Version 2.50000 pre-release), to solve (1.1) with l=0. The GLTR algorithm is written in Fortran and implemented as a MATLAB mex file;

  • The Newton’s method with (Armijo) line search for maximizing the dual objective function to solve (3.1) with s=u. We code this in MATLAB.

For the GLTR algorithm, we use its default settings in our experiments. For the Newton’s method, we initialize at \(\lambda^{0} = \overline{\lambda}-1\) and terminate when

$$\begin{aligned} \max \biggl\{ \frac{|q_0(x) - d(\lambda)|}{|q_0(x)|+1},\frac{|q_1(x) - s|}{|s|+ 1},\frac{\|Ax - \lambda Bx - a\|^2}{(\|A\|_2+\|a\| +1)^2} \biggr\} < 10^{-12}, \end{aligned}$$

or when the stepsize falls below 1e−10 or the number of iterations hits 10; here x is generated from λ via x=(AλB)−1 a. The Newton direction in the Newton’s method is computed via the MATLAB built-in function pcg, with termination tolerance 1e−14 and iteration bound 1000.

In our tests, we generate both easy and hard case instances, for dimensions n=10000, 15000 and 20000.

To generate an easy case instance, we first randomly generate a sparse symmetric matrix A and a sparse positive definite matrix B via A = sprandsym(n,1e-2) and B = sprandsym(n,1e-2,0.1,2). We then compute \(\overline{\lambda}\), the minimum generalized eigenvalue of the matrix pair (A,B). We next generate \(x_0\) with Gaussian entries of mean 0 and standard deviation 0.1 and set \(a = (A - \tilde{\lambda}B)x_{0}\), \(s = x_{0}^{T}Bx_{0}\), where \(\tilde{\lambda}= \overline{\lambda}- r\) for some r chosen uniformly from [5,10]. Finally, we set u=1.2s and l=0.8s. From the optimality conditions and the fact that r>0, we see that the above construction likely gives an easy case instance.
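In MATLAB, this construction reads roughly as follows; the minimum generalized eigenvalue is computed here with eigs purely for illustration (the option 'sa' may need to be 'smallestreal' in newer releases, and the experiments use eigifp instead).

% Sketch: generate a random easy case instance with positive definite B (b = 0).
A = sprandsym(n, 1e-2);
B = sprandsym(n, 1e-2, 0.1, 2);        % sparse positive definite
lam_bar = eigs(A, B, 1, 'sa');         % minimum generalized eigenvalue of (A,B)
x0 = 0.1*randn(n, 1);
r  = 5 + 5*rand;                       % r uniform on [5,10]
a  = (A - (lam_bar - r)*B)*x0;
s  = x0'*(B*x0);
u  = 1.2*s;  l = 0.8*s;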

To generate a hard case instance, we follow the same procedure as above to obtain A, B, \(x_0\), \(\overline{\lambda}\) and s, but we set \(a = (A - \overline{\lambda}B)x_{0}\) instead. If we take u=0.6s and l=0.6u, then we likely end up with a hard case, case 1 instance. On the other hand, by setting u=1.2s, l=1.1s, we likely end up with a hard case, case 2 instance.

For each n, we generate 10 easy, 10 hard case, case 1 and 10 hard case, case 2 instances as described above. The computational results are reported in Table 2, where we report the number of iterations (iter), CPU time in seconds (cpu), the primal objective value (fval) at termination and also the primal infeasibility for the equivalent equality constrained problem measured by \(|x^TBx-u|\) (feaseq),Footnote 4 averaged over the 10 random instances. We observe that the RW algorithm is faster than Newton+Armijo on easy case instances, but is slower on hard case, case 1 instances because of the extra time taken for preprocessing. The preprocessing is justified by the observation that the simple Newton+Armijo strategy fails to obtain even two digits of accuracy for hard case, case 2 instances. We also note that the GLTR algorithm is the slowest algorithm, mainly because of the cost of factorizing the matrix B, whose Cholesky factorization has a high percentage of fill-in.Footnote 5

Table 2 Computational results for indefinite A and positive definite B

A is positive definite

In this case, it is not certain whether the GTRS (1.1) is equivalent to (3.1) with s=u or s=l, or whether the GTRS has an interior solution. Hence, for comparison, we first solve the GTRS using the RW algorithm, from which we can determine whether the GTRS has a boundary solution. If the GTRS is equivalent to an instance of (3.1), we solve this instance of (3.1) using the Newton’s method that maximizes its dual. Furthermore, if the GTRS is equivalent to (3.1) with s=u, then the GLTR algorithm can be applied directly to solve the instance. On the other hand, even if the GTRS turns out to be equivalent to (3.1) with s=l instead, it is not hard to see that the instance can be solved by applying the GLTR algorithm with \(A\leftarrow A - (\overline{\lambda}+ 1)B\), \(u\leftarrow l\) and \(l\leftarrow 0\). We initialize and terminate all three algorithms as in the previous test.

In our tests, similarly as above, we generate both easy and hard case instances, for dimensions n=10000, 15000 and 20000. We follow the same procedure as in the previous case, except that instead of a random sparse symmetric A, we generate another matrix C similarly as we generated B, and then set A=C+10B so that the resulting GTRS (1.1) likely has a solution on the lower boundary of the feasible region.

Test results are reported in Table 3, averaged over 10 random instances.Footnote 6 We note that the CPU time for GLTR+shift includes both the GLTR algorithm run time and the time taken to find \(\overline{\lambda}\) for the transformation \(A\leftarrow A - (\overline{\lambda}+ 1)B\), with the latter time also reported separately in parenthesis. We again observe that the RW algorithm is faster than Newton+Armijo on easy case instances, and is slower on hard case, case 1 instances due to the extra preprocessing time. The failure of the simple Newton+Armijo strategy on hard case, case 2 instances again justifies the use of preprocessing. We also note that the GLTR algorithm is the slowest due to the cost for factorizing B.

Table 3 Computational results for positive definite A and positive definite B

3.4.2 B is indefinite

In this subsection, we consider GTRS (1.1) with an indefinite (and nonsingular) B. For simplicity, we also assume without loss of generality that b=0. Furthermore, in order that the interval \((\underline{\lambda },\overline {\lambda})\) can be located efficiently, we restrict our attention to the case when A is positive definite; see Sect. 3.3. There are currently no algorithms in the literature specialized at solving such GTRS instances. Thus, we only compare our algorithm with the Newton’s method. Specifically, we first use our algorithm to determine whether the GTRS is equivalent to an instance of (3.1) with s=u or s=l. We then solve this instance of (3.1) using the Newton’s method that maximizes its dual.

For simplicity, we only generate easy case and hard case, case 2, instances in our numerical experiments. For dimensions n=10000, 15000 and 20000, we first generate sparse symmetric matrices \(B_1\) and \(A_1\) via the commands A_1 = sprandsym(n,1e-2,0.1,2) and B_1 = sprandsym(n,1e-2). We then locate the interval \((\underline{\lambda},\overline{\lambda})\) for the matrix pair \((A_1,B_1)\).

To generate an easy case instance, we first set \(\tilde{\lambda}= \frac{\overline{\lambda}+\underline{\lambda}}{2}\). We next generate \(x_0\) with Gaussian entries of mean 0 and standard deviation 0.1, and set \(a = (A_{1} - \tilde{\lambda}B_{1})x_{0}\) and \(s_{1} = x_{0}^{T}B_{1}x_{0}\). We now further modify \(A_1\) and \(B_1\) so that we obtain an instance with explicit knowledge of \(t_0\) and the corresponding \(\lambda_{0}\in(\underline{\lambda},\overline{\lambda})\) with \(d(\lambda_0)=t_0\). To this end, define \(B:=B_{1}/(-|s_{1}|)\) and \(A:=A_{1}\). Then by construction, we have

$$\inf_{x^TBx=-1}x^TAx - 2a^Tx = x_0^TAx_0 - 2a^Tx_0 = -d(\lambda_0), $$

where \(\lambda_{0} = -|s_{1}|\tilde{\lambda}\). Finally, with the above A, a and B, setting \(s = \alpha\cdot\sqrt{n}\) and \(l=u-1=s\), where α follows the standard Gaussian distribution, we likely obtain an easy case instance with explicitly known \(t_0=d(\lambda_0)\). This \(t_0\) can be used for initialization of the RW algorithm, and \(\lambda_0\) is used as in (3.23) in place of μ. In particular, we find \(t_-\) and \(t_+\) as described in Sect. 3.3 with ϵ=1, ζ=0.2 and κ=1.2. We note that the computation of \(k'(t_-)\) outlined in Sect. 3.3 can become inefficient when \(t_-\) gets too close to \(t_0\).Footnote 7

Generating a hard case, case 2 instance with known \(t_0\) is a bit more complicated. We first describe how to generate such an instance so that the optimal solution \(x_*\) satisfies \(x_{*}^{T}Bx_{*} = l\). To this end, we set \(\tilde{\lambda}= \underline{\lambda}\). We then generate \(x_0\) with Gaussian entries of mean 0 and standard deviation 0.1, and set \(a = (A_{1} - \tilde{\lambda}B_{1})x_{0}\) and \(s_{1} = x_{0}^{T}B_{1}x_{0}\). We now further modify \(A_1\) and \(B_1\) as in the previous paragraph to obtain A, B, \(t_0\) and \(\lambda_0\). Finally, with these A, a and B, setting \(s = \alpha\cdot\sqrt{n}\) and \(l=u-1=s\), where α is uniformly chosen from [1,2], we likely obtain a hard case, case 2 instance whose optimal solution \(x_*\) satisfies \(x_{*}^{T}Bx_{*} = l\). Next, to obtain a hard case, case 2 instance whose optimal solution \(x_*\) satisfies \(x_{*}^{T}Bx_{*} = u\), we follow the same procedure as above except that we set \(\tilde{\lambda}= \overline{\lambda}\) and choose α uniformly from [−2,−1].

Test results are reported in Table 4, averaged over 10 random instances.Footnote 8 We observe that the RW algorithm is slightly faster than Newton+Armijo on easy case instances, and it produces a solution with better quality in terms of feasibility (of the equivalent equality constrained GTRS). Furthermore, we again observe that the simple Newton+Armijo strategy fails on hard case, case 2 instances.

Table 4 Computational results for positive definite A and indefinite B

4 Conclusion

We have presented optimality conditions for GTRS (1.1) that hold under a constraint qualification, the RICQ (2.6). Furthermore, if the RICQ fails, we showed how the GTRS can be solved explicitly. In this sense, the GTRS enjoys strong duality results as in linear programming. We also demonstrated that the GTRS can be classified into easy and hard cases as for the classical TRS, and that the problem can be preprocessed to identify an explicit solution in the hard case/interior solution situations and to reduce the instance to an equality constrained GTRS (3.1). We then discussed in detail how the Rendl-Wolkowicz algorithm in [29] can be extended to solve such instances, which involves transforming the instance into a parameterized generalized eigenvalue problem. We also illustrated our algorithm numerically in various cases, including cases when B is indefinite. When B is a random sparse positive definite matrix, our algorithm is competitive with a simple Newton’s method implementation, and is faster than the GLTR algorithm. On the other hand, when B is indefinite, our algorithm requires additional inputs for initialization. Given such inputs, our algorithm is also competitive with the Newton’s method.