1 Introduction

Consider the classical trust region subproblem

$$\begin{aligned} (\text {TRS})~&\min&f(x) = \frac{1}{2}{x^T}Ax + {b^T}x \\&\text {s.t.}&{x^T}x \le 1, \end{aligned}$$

where \(A\in {\mathbb {R}}^{n\times n}\) is symmetric and possibly indefinite and \(b\in {\mathbb {R}}^{n}\). (TRS) plays an important role in trust region methods [2, 14] for nonlinear programming. Efficient numerical algorithms for (TRS) can be found in [5, 8]. When \(b = 0\), (TRS) reduces to the maximal eigenvector problem, which can be approximated in linear time [4]. Recently, the first linear-time approximation algorithm for (TRS) was proposed in [6], based on strong duality, an approximate eigenvector oracle and a bisection scheme.

In this paper, we present a new and simple linear-time approximation scheme for solving (TRS). In fact, our algorithm works for a more general version of (TRS), the two-sided trust region subproblem:

$$\begin{aligned} (\text {TTRS})~&\min&f(x) = \frac{1}{2}{x^T}Ax + {b^T}x \end{aligned}$$
(1)
$$\begin{aligned}&\text {s.t.}&\alpha \le {x^T}x \le 1, \end{aligned}$$
(2)

where \(0\le \alpha \le 1\). (TTRS) was first introduced in [11]. For a recent survey on (TTRS), we refer to [10].

In the following, we study (TTRS) and keep in mind that (TRS) is the special case of (TTRS) with \(\alpha = 0\). Our approximation scheme consists of two main steps. First, we establish a new convex programming reformulation of (TTRS), which differs from the existing convex reformulations [1, 13]. Second, we employ Nesterov’s accelerated gradient descent algorithm [9] to solve the convexified version of (TTRS). The total worst-case time complexity is even less than that of the recent linear-time algorithm [6], which only works for (TRS).

The remainder of this paper is organized as follows. Section 2 presents a new hidden convexity result for (TTRS). Section 3 applies Nesterov’s fast gradient method to solve the convexified formulation of (TTRS) and then analyzes the worst-case time complexity, which is even less than that of the recent linear-time algorithm for (TRS). Conclusions are made in Sect. 4.

2 Hidden convexity

In this section, we present a new convex quadratically constrained quadratic programming reformulation of (TTRS).

Let \(v(\cdot )\) be the optimal value of \((\cdot )\). Denote by \({\lambda _1}\) the minimal eigenvalue of A. Let \(v\ne 0\) be the corresponding eigenvector. Without loss of generality, we assume \(b^Tv\le 0\).

Theorem 1

(TTRS) is equivalent to the following convex programming relaxation

$$\begin{aligned} (\text {C})~&\min _{x,t}&\frac{1}{2}{x^T}(A-\lambda _1 I)x + {b^T}x+\frac{1}{2}\lambda _1t \end{aligned}$$
(3)
$$\begin{aligned}&\text {s.t.}&x^Tx\le t,~ \alpha \le t \le 1, \end{aligned}$$
(4)

in the sense that \(v(\mathrm{TTRS}) = v(\mathrm{C})\). Moreover, for any optimal solution of \((\mathrm{C})\), denoted by \((x^*,t^*)\),

$$\begin{aligned} x^*+\frac{\sqrt{(x^{*T}v)^2-v^Tv(x^{*T}x^*-t^*)}-x^{*T}v}{v^Tv} v \end{aligned}$$
(5)

globally solves \((\mathrm{TTRS})\).

Proof

Let \({\widetilde{x}}\) be an optimal solution of \((\mathrm{TTRS})\). Then, \(({\widetilde{x}},\widetilde{t}:={\widetilde{x}}^T{\widetilde{x}})\) is a feasible solution of (C) whose objective value equals \(v(\mathrm{TTRS})\). Therefore, we have \(v(\mathrm{TTRS})\ge v(\mathrm{C})\). It remains to show that the lower bound \(v(\mathrm{C})\) is attained by a feasible point of (TTRS).

Let \((x^*,t^*)\) be an optimal solution of \((\mathrm{C})\). We assume \(x^{*T}x^*<t^*\), otherwise, we are done. Define

$$\begin{aligned} x(\gamma )=x^*+\gamma v. \end{aligned}$$

Then, \((x(\gamma ),t^*)\) is a feasible solution of (C) for any \(\gamma \) such that \(x(\gamma )^Tx(\gamma )\le t^*\). Moreover, since \((A-\lambda _1 I)v=0\) and \(b^Tv\le 0\), the objective value of \((x(\gamma ),t^*)\) is nonincreasing as \(\gamma \ge 0\) increases. Consequently, \((x(\gamma ),t^*)\) remains optimal for every \(\gamma \ge 0\) with \(x(\gamma )^Tx(\gamma )\le t^*\). Notice that the equation \(x(\gamma )^Tx(\gamma )= t^*\) in \(\gamma \) has the positive root \(\gamma ^*=\frac{\sqrt{(x^{*T}v)^2-v^Tv(x^{*T}x^*-t^*)}-x^{*T}v}{v^Tv}\). The point \(x(\gamma ^*)\), which is exactly (5), satisfies \(\alpha \le x(\gamma ^*)^Tx(\gamma ^*)=t^*\le 1\), and hence it is feasible for (TTRS) with objective value \(v(\mathrm{C})\). The proof is complete. \(\square \)
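For illustration, the recovery step (5) can be implemented as in the following Python sketch; the function name and arguments are ours, and an eigenvector \(v\) of \(\lambda _1\) with \(b^Tv\le 0\) is assumed to be available.

```python
import numpy as np

def recover_ttrs_solution(x_star, t_star, v):
    """Map an optimal solution (x*, t*) of (C) to a global solution of (TTRS)
    via formula (5): move x* along an eigenvector v of the minimal eigenvalue
    of A (with b^T v <= 0; replace v by -v otherwise) until x^T x = t*."""
    gap = x_star @ x_star - t_star        # <= 0 for any feasible (x*, t*) of (C)
    if gap >= 0:                          # already on the boundary x^T x = t*
        return x_star
    xv = x_star @ v
    vv = v @ v
    gamma = (np.sqrt(xv ** 2 - vv * gap) - xv) / vv  # positive root of ||x* + gamma v||^2 = t*
    return x_star + gamma * v
```

In exact arithmetic the returned point \({\widetilde{x}}\) satisfies \({\widetilde{x}}^T{\widetilde{x}}=t^*\in [\alpha ,1]\) and \(f({\widetilde{x}})=v(\mathrm{C})\).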

When \(\alpha =0\), Theorem 1 reduces to the following result for (TRS).

Corollary 1

([3], Theorem 1) Let \(\mu =\min \{\lambda _1,0\}\). \((\mathrm{TRS})\) is equivalent to the convex programming relaxation

$$\begin{aligned} \mathrm{(C_0)}~~ \min _{x^Tx \le 1} \frac{1}{2}{x^T}(A - \mu I)x + {b^T}x + \frac{1}{2}\mu \end{aligned}$$

in the sense that \(v(\mathrm{TRS})=v(\mathrm{C_0})\). Moreover, when \(\lambda _1<0\), for any optimal solution of \((\mathrm{C_0})\), denoted by \(x^*\),

$$\begin{aligned} x^*+\frac{\sqrt{(x^{*T}v)^2-v^Tv(x^{*T}x^*-1)}-x^{*T}v}{v^Tv} v \end{aligned}$$
(6)

globally solves \((\mathrm{TRS})\).

To approximately compute \({\lambda _1}\), we need the following estimate [6] of the maximal eigenvalue of A, denoted by \(\lambda _{\max }(A)\), which is based on the analysis of Kuczynski and Wozniakowski [7].

Theorem 2

[6, 7] Given a symmetric matrix \(A \in {\mathbb {R}}^{n\times n}\) with \(\left\| A \right\| \le \rho \), parameters \(\varepsilon , \delta >0\), and an upper bound \(N(\ge n)\) on the number of non-zero entries in A, the Lanczos method [4] runs in time \(O(\frac{{N\sqrt{\rho }}}{{\sqrt{\varepsilon }}}\log \frac{n}{\delta })\) and returns a unit vector \(x\in {\mathbb {R}}^n\) for which

$$\begin{aligned} x^TAx\ge \lambda _{\max }(A)-\epsilon \end{aligned}$$

with probability at least \(1 - \delta \).

Notice that \(\left\| \rho I-A \right\| \le 2\rho \) and the maximal eigenvalue of \(\rho I-A\) equals \(\rho -\lambda _1\). Thus, applying Theorem 2 to \(\rho I-A\) yields the following estimate of \(\lambda _1\).

Corollary 2

Given a symmetric matrix \(A \in {\mathbb {R}}^{n\times n}\) with \(\left\| A \right\| \le \rho \), parameters \(\varepsilon , \delta >0\), and an upper bound \(N(\ge n)\) on the number of non-zero entries in A, the Lanczos method [4] runs in time \(O(\frac{{N\sqrt{\rho }}}{{\sqrt{\varepsilon }}}\log \frac{n}{\delta })\) and returns a unit vector \({\widetilde{v}}\in {\mathbb {R}}^n\) for which

$$\begin{aligned} (\lambda _1\le ) {\widetilde{v}}^TA\widetilde{v}\le \lambda _{1}+\epsilon \end{aligned}$$

with probability at least \(1 - \delta \).

Define \({\widetilde{\lambda }}_1={\widetilde{v}}^TA{\widetilde{v}}-\epsilon \). Then,

$$\begin{aligned} A-{\widetilde{\lambda }}_1 I\succeq A-\lambda _1 I\succeq 0. \end{aligned}$$
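For illustration, the quantities \({\widetilde{\lambda }}_1\) and \({\widetilde{v}}\) can be computed as in the following Python sketch; an off-the-shelf sparse eigensolver is used here in place of the randomized Lanczos method of [4], and the function name is ours. Any routine returning a unit vector satisfying the guarantee of Corollary 2 can be substituted.

```python
import scipy.sparse.linalg as spla

def approx_min_eig(A, eps):
    """Estimate the minimal eigenvalue lambda_1 of the symmetric matrix A and
    return (lambda_1~, v~) as defined in the text.  Any oracle returning a
    unit vector v~ with v~^T A v~ <= lambda_1 + eps can be used instead."""
    vals, vecs = spla.eigsh(A, k=1, which='SA', tol=eps)  # smallest algebraic eigenvalue
    v_tilde = vecs[:, 0]                 # approximate unit eigenvector
    rq = float(v_tilde @ (A @ v_tilde))  # Rayleigh quotient, always >= lambda_1
    lam_tilde = rq - eps                 # so that A - lambda_1~ I is PSD whenever
                                         # the eps-accuracy guarantee holds
    return lam_tilde, v_tilde
```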

As an approximation of (C), we turn to solve the following convex programming problem

$$\begin{aligned} (\mathrm{AC})~&\min _{x,t}&g(x,t):= \frac{1}{2}{x^T}(A-{\widetilde{\lambda }}_1 I)x + {b^T}x+\frac{1}{2}{\widetilde{\lambda }}_1t \end{aligned}$$
(7)
$$\begin{aligned}&\mathrm{s.t.}&(x,t)\in X:=\{(x,t)\mid ~ x^Tx\le t,~ \alpha \le t \le 1\}. \end{aligned}$$
(8)

It is not difficult to verify that

$$\begin{aligned} |v(\mathrm{AC})-v(\mathrm{C})|\le \epsilon . \end{aligned}$$
(9)
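For completeness, one way to verify (9) is the following. Letting \(f_{\mathrm{C}}\) denote the objective function of (C), on the feasible set \(X\) we have

$$\begin{aligned} f_{\mathrm{C}}(x,t)-g(x,t)=\frac{1}{2}\left( \lambda _1-{\widetilde{\lambda }}_1\right) \left( t-x^Tx\right) \in \left[ 0,\frac{\epsilon }{2}\right] , \end{aligned}$$

since \(0\le \lambda _1-{\widetilde{\lambda }}_1\le \epsilon \) by Corollary 2 and the definition of \({\widetilde{\lambda }}_1\), and \(0\le t-x^Tx\le 1\) on \(X\). Taking minima over \(X\) then gives (9).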

3 Nesterov’s accelerated gradient method and complexity

In this section, we employ Nesterov’s accelerated gradient descent method [9] to solve (AC) (7), (8) and then study the worst-case time complexity.

Notice that \(g(x,t)\) defined in (7) is a convex quadratic function. Hence the gradient of g, denoted by \(\nabla g\), is Lipschitz continuous:

$$\begin{aligned} \left\| {\nabla {g}(x_1,t_1) - \nabla {g}(x_2,t_2)} \right\| \le 2\rho \left\| {(x_1,t_1) - (x_2,t_2)} \right\| \end{aligned}$$

where \(\rho \) is an upper bound of \(\left\| A \right\| \).

To solve (AC) (7), (8), we use the following algorithm in [12], which is a slightly generalized version of Nesterov’s accelerated gradient method [9].

Algorithm 1

  1. Choose \({\theta _0} = {\theta _{ - 1}} \in (0,1]\), \((x_0,t_0)=(x_{ - 1}, t_{ - 1})\in X\). Let \(k:=0\).

  2. Let

    $$\begin{aligned} (y_k,s_k) = (x_k,t_k) + {\theta _k}(\theta _{k - 1}^{ - 1} - 1)((x_k,t_k)- (x_{k-1},t_{k-1})) \end{aligned}$$

    and \(\nabla g_k=\nabla g(y_k,s_k)\). Update

    $$\begin{aligned} (x_{k+1},t_{k+1})= \arg \min _{(x,t) \in X}\nabla g_k^T(x,t)+\rho \Vert (x,t)-(y_k,s_k)\Vert ^2. \end{aligned}$$
    (10)
  3. Choose \({\theta _{k + 1}} \in (0,1]\) satisfying

    $$\begin{aligned} \frac{{1 - {\theta _k}}}{{\theta _{k + 1}^2}} \le \frac{1}{{\theta _k^2}}. \end{aligned}$$

    If the stopping criterion does not hold, update \(k:=k + 1\), and go to step 2.
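For illustration, a minimal Python sketch of Algorithm 1 applied to (AC) is given below. The function name, the initial point, the \(\theta \)-update (one standard choice satisfying the inequality in step 3) and the stopping rule are ours; the subproblem (10) is delegated to a routine solve_subproblem, sketched after the KKT case analysis later in this section.

```python
import numpy as np

def accelerated_gradient(A, b, lam_tilde, rho, alpha, eps, max_iter=10000):
    """Minimal sketch of Algorithm 1 for (AC).  Each iteration performs the
    extrapolation of step 2, solves subproblem (10), and updates theta."""
    n = b.shape[0]
    x_prev = x = np.zeros(n)
    t_prev = t = alpha                    # (x_0, t_0) = (x_{-1}, t_{-1}) lies in X
    theta_prev = theta = 1.0
    for _ in range(max_iter):
        # step 2: extrapolation
        coef = theta * (1.0 / theta_prev - 1.0)
        y = x + coef * (x - x_prev)
        s = t + coef * (t - t_prev)
        # gradient of g at (y, s): ((A - lam_tilde I) y + b, lam_tilde / 2)
        grad_x = A @ y - lam_tilde * y + b
        grad_t = 0.5 * lam_tilde
        x_prev, t_prev = x, t
        x, t = solve_subproblem(grad_x, grad_t, y, s, rho, alpha)  # solves (10)
        # step 3: theta_{k+1} with (1 - theta_{k+1}) / theta_{k+1}^2 = 1 / theta_k^2
        theta_prev, theta = theta, (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2
        # a simple stopping heuristic; any valid criterion can be used instead
        if np.linalg.norm(x - x_prev) <= eps * max(1.0, np.linalg.norm(x)):
            break
    return x, t
```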

Theorem 3

[9, 12] Let \(\left\{ {({x_k},{t_k})} \right\} \) be generated by Algorithm 1 and let \((x^*,t^*)\) be a global minimizer of (AC) (7), (8). Then

$$\begin{aligned} g(x_k,t_k) - g(x^*,t^*) \le O\left( \frac{\rho }{{{k^2}}}\right) ,\forall k. \end{aligned}$$
(11)

That is, Algorithm 1 returns an \(\epsilon \)-approximate solution of (AC) in \(O(\sqrt{\rho /\varepsilon })\) iterations.

In each iteration of Algorithm 1, the main computational cost is solving (10). In the following, we show how to solve (10) explicitly. We first rewrite the objective function of (10) as follows:

$$\begin{aligned} G(x,t)&:= \nabla g_k^T(x,t)+\rho \Vert (x,t)-(y_k,s_k)\Vert ^2\\&= \rho x^Tx + [(A - {\widetilde{\lambda }}_1 I - 2\rho I){y_k} + b]^Tx + \rho t^2 + ({\widetilde{\lambda }}_1/2 - 2\rho s_k)t+e\\&= \rho x^Tx + c^Tx + \rho t^2 + dt+e, \end{aligned}$$

where \(c=(A - {\widetilde{\lambda }}_1 I - 2\rho I){y_k} + b\), \(d={\widetilde{\lambda }}_1/2 - 2\rho s_k\) and \(e=\rho \Vert (y_k,s_k)\Vert ^2\).

Now, (10) is equivalent to the following convex program:

$$\begin{aligned}&\min&\rho x^Tx + c^Tx + \rho t^2 + dt \end{aligned}$$
(12)
$$\begin{aligned}&\mathrm{s.t.}&x^Tx \le t, \end{aligned}$$
(13)
$$\begin{aligned}&t^2-(\alpha +1)t+\alpha \le 0. \end{aligned}$$
(14)

It is sufficient to solve the KKT system of (12)–(14), which reads as follows:

$$\begin{aligned}&2\rho x + c +2 \mu _1 x=0, \end{aligned}$$
(15)
$$\begin{aligned}&2\rho t + d -\mu _1 +2 \mu _2 t-(\alpha +1)\mu _2=0, \end{aligned}$$
(16)
$$\begin{aligned}&\mu _1(x^Tx - t) =0, \end{aligned}$$
(17)
$$\begin{aligned}&\mu _2(t^2-(\alpha +1)t+\alpha )=0, \end{aligned}$$
(18)
$$\begin{aligned}&x^Tx\le t,~ \alpha \le t\le 1,~ \mu _1\ge 0, ~\mu _2\ge 0, \end{aligned}$$
(19)

where \(\mu _1\) and \(\mu _2\) are the Lagrangian multipliers corresponding to (13) and (14), respectively.

Notice that \(\rho >0\), \(\mu _1\ge 0\) and \(\mu _2\ge 0\). It follows from (15) and (16) that

$$\begin{aligned} x(\mu _1)= & {} \frac{-c}{2\rho +2 \mu _1}, \end{aligned}$$
(20)
$$\begin{aligned} t(\mu _1,\mu _2)= & {} \frac{\mu _1+(\alpha +1)\mu _2-d}{2\rho +2 \mu _2}. \end{aligned}$$
(21)

Consider the following different cases.

  (a) Suppose \(\mu _1=0\) and \(\mu _2=0\). If \(x(0)^Tx(0)\le t(0,0)\) and \(\alpha \le t(0,0)\le 1\), then \((x(0), t(0,0))\) is the optimal solution of (10).

  (b) Suppose \(\mu _1=0\) and \(\mu _2>0\). According to (18), \(t(0,\mu _2)=\alpha \) or 1. Hence, \(\mu _2\) can be solved from (21). If \(x(0)^Tx(0)\le t(0,\mu _2)\), then \((x(0),t(0,\mu _2))\) is the optimal solution of (10).

  (c) Suppose \(\mu _1>0\) and \(\mu _2=0\). According to (17), we have \(x(\mu _1)^Tx(\mu _1)= t(\mu _1,0)\). Consequently, \(\mu _1\) is obtained by solving this cubic equation. If \(\alpha \le t(\mu _1,0)\le 1\), then \((x(\mu _1),t(\mu _1,0))\) is the optimal solution of (10).

  (d) Suppose \(\mu _1>0\) and \(\mu _2>0\). It follows from (17) and (18) that \(x(\mu _1)^Tx(\mu _1)= t(\mu _1,\mu _2)\) and \(t(\mu _1,\mu _2)=\alpha \) or 1. The first equation is cubic and the second is linear. Therefore, combining both yields a cubic equation in \(\mu _1\), which also admits an explicit solution. For the obtained \((\mu _1,\mu _2)\), if \(x(\mu _1)^Tx(\mu _1)\le t(\mu _1,\mu _2)\) and \(\alpha \le t(\mu _1,\mu _2)\le 1\), then \((x(\mu _1),t(\mu _1,\mu _2))\) is the optimal solution of (10).
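For illustration, the case analysis (a)–(d) can be implemented as in the following Python sketch. It matches the interface assumed in the sketch of Algorithm 1 above, assumes \(0\le \alpha <1\) and \(c\ne 0\), and omits degenerate cases.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def solve_subproblem(grad_x, grad_t, y, s, rho, alpha, tol=1e-9):
    """Sketch of the KKT case analysis (a)-(d) for subproblem (10)/(12)-(14).
    Returns the first feasible KKT point, which is optimal by convexity."""
    c = grad_x - 2.0 * rho * y     # c = (A - lam_tilde I - 2 rho I) y_k + b
    d = grad_t - 2.0 * rho * s     # d = lam_tilde/2 - 2 rho s_k
    cc = float(c @ c)

    def feasible(x, t):
        return x @ x <= t + tol and alpha - tol <= t <= 1.0 + tol

    # (a) mu1 = mu2 = 0: unconstrained minimizer of (12)
    x0, t0 = -c / (2.0 * rho), -d / (2.0 * rho)
    if feasible(x0, t0):
        return x0, t0

    # (b) mu1 = 0, mu2 > 0: t sits at alpha or 1, x stays unconstrained
    for t_val in (alpha, 1.0):
        denom = 2.0 * t_val - (alpha + 1.0)
        if abs(denom) > tol:
            mu2 = -(2.0 * rho * t_val + d) / denom     # from (16) with mu1 = 0
            if mu2 > 0 and feasible(x0, t_val):
                return x0, t_val

    # (c) mu1 > 0, mu2 = 0: ||x(mu1)||^2 = t(mu1, 0) is cubic in mu1
    cubic = 4.0 * Poly([-d, 1.0]) * Poly([rho, 1.0]) ** 2 - 2.0 * rho * cc
    for root in cubic.roots():
        if abs(root.imag) < tol and root.real > 0:
            mu1 = root.real
            x1 = -c / (2.0 * rho + 2.0 * mu1)
            t1 = (mu1 - d) / (2.0 * rho)               # t(mu1, 0) from (21)
            if feasible(x1, t1):
                return x1, t1

    # (d) mu1 > 0, mu2 > 0: with t fixed at alpha or 1, ||x(mu1)||^2 = t gives mu1
    for t_val in (alpha, 1.0):
        denom = 2.0 * t_val - (alpha + 1.0)
        if t_val > tol and abs(denom) > tol:
            mu1 = (np.sqrt(cc / t_val) - 2.0 * rho) / 2.0
            if mu1 > 0:
                mu2 = (mu1 - d - 2.0 * rho * t_val) / denom   # from (16)
                if mu2 > 0:
                    return -c / (2.0 * rho + 2.0 * mu1), t_val

    raise RuntimeError("no KKT point found; a degenerate case was hit")
```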

From the above case analysis, (10) can be set up in O(N) time and then solved in O(n) time, where \(N(\ge n)\) is an upper bound on the number of non-zero entries in A. According to Theorem 3, we have the following estimate.

Lemma 1

The time complexity of approximately solving (AC) is

$$\begin{aligned} O\left( N \sqrt{\frac{\rho }{\varepsilon }} \right) . \end{aligned}$$
(22)

Furthermore, combining (9), Theorem 1, Corollary 2, and Lemma 1 yields the following main result.

Theorem 4

Given the parameters \(\epsilon >0\), \(\delta >0\), with probability at least \(1-\delta \) over the randomization of the approximate eigenvector oracle, we can find an \(\epsilon \)-approximate solution \({\widetilde{x}}\in {\mathbb {R}}^n\) of (TTRS) satisfying \(\alpha \le {\widetilde{x}}^T{\widetilde{x}}\le 1\) and

$$\begin{aligned} f({\widetilde{x}})\le v(\mathrm{TTRS}) + \epsilon , \end{aligned}$$

in total time

$$\begin{aligned} O\left( \frac{N\sqrt{\Vert A\Vert } }{{\sqrt{\varepsilon }}}\log \frac{n}{\delta }\right) . \end{aligned}$$
(23)

We note that Hazan and Koren’s approximation scheme [6] is tailored to (TRS). To the best of our knowledge, whether it can be extended to solve (TTRS) remains open. Nevertheless, even for (TRS), our complexity bound (23) is clearly less than Hazan and Koren’s [6]:

$$\begin{aligned} O\left( \frac{N\sqrt{\beta }}{\sqrt{\varepsilon }}\log \frac{\beta }{\varepsilon }\log \left( \frac{n}{\delta }\log \frac{\beta }{\varepsilon }\right) \right) , \end{aligned}$$

where \(\beta = \max ( \left\| A \right\| + \left\| b \right\| , 1)\).

4 Conclusions

Recently, the first linear-time algorithm for the trust region subproblem (TRS) was proposed based on strong duality, an approximate eigenvector oracle and a bisection scheme. In this paper, we present a new linear-time approximation scheme for solving (TRS). It simply employs Nesterov’s accelerated gradient descent algorithm to solve a convex quadratically constrained quadratic programming reformulation of (TRS), yet the total time complexity of our scheme is less than that of the recent algorithm. Our approach further extends to the two-sided trust region subproblem (TTRS). Future work includes extensions to other nonconvex quadratic programming problems.