1 Introduction

Let \({\mathcal {S}}^n\) denote the vector space of real symmetric \(n\times n\) matrices. The standard inner product on \({\mathcal {S}}^n\) is

$$\begin{aligned} A\bullet B=\mathrm{tr}(AB) =\sum _{i,j}A_{ij}B_{ij}, \end{aligned}$$

where \(\mathrm{tr}(\cdot )\) denotes the trace.

By \(X\succeq 0\) (\(X \succ 0\)), where \(X \in {\mathcal {S}}^n\), we mean that X is positive semidefinite (positive definite). Consider the following convex semidefinite programming (SDP) problem

$$\begin{aligned} \begin{array}{rcl} \min \limits _{X\in {\mathcal {S}}^n} &{} &{} f(X) \\ \mathrm{s.t.\ } &{} &{}A_k\bullet X= b_k, \ k=1,\dots ,m,\\ &{} &{} X\succeq 0, \end{array} \end{aligned}$$
(P)

where \(f : {\mathcal {S}}^n \rightarrow R\) is convex and continuously differentiable, \(b\in R^m\), and \(A_k\in {\mathcal {S}}^n\), \(k=1,\dots ,m\). As a blanket assumption, we assume that the optimal value of problem (P) is finite and attained; therefore, we use \(\min \) rather than \(\inf \) in problem (P).

The following notation will be used in our subsequent discussion:

$$\begin{aligned}&{R^n_+}=\{x \in R^n| x\ge 0\}, \quad {R^n_{++}}=\{x \in R^n| x > 0 \},\\&{{\mathcal {S}}^n_+}=\{X \in {\mathcal {S}}^n| X\succeq 0\}, \quad {{\mathcal {S}}^n_{++}}=\{X \in {\mathcal {S}}^n| X \succ 0 \},\\&\mathcal{P^+}=\{X \in {\mathcal {S}}^n|A_k\bullet X= b_k, \ k=1,\dots ,m,\ X\succeq 0\},\\&\mathcal{P^{++}}=\{X \in {\mathcal {S}}^n|A_k\bullet X= b_k, \ k=1,\dots ,m, \ X \succ 0 \}. \end{aligned}$$

A comprehensive study of SDP with a linear objective function can be found in [35].

There are many interior point algorithms for solving problem (P): for example, [1, 16, 17, 19, 25, 28, 38] for linear f(X), [12, 21, 30, 31] for convex quadratic f(X), and [36, 37] for general nonlinear semidefinite programming. Some related continuous trajectories have been studied for semidefinite programming, for instance, in [8, 10, 11, 13, 15]. Most interior point algorithms are primal–dual path-following algorithms that extend the primal–dual path-following algorithm for linear programming (LP). The affine scaling algorithm for LP was originally proposed by Dikin [5] and further discussed by Barnes [2] and Vanderbei et al. [33]. For more details on the development of the affine scaling algorithm, see [7, 20] and the references therein. Unfortunately, the affine scaling algorithm for linear SDP may fail with either the short or the long step version [20], even though the affine scaling continuous trajectory, which is contained in the Cauchy trajectories, converges to the optimal solution set of problem (P) [13].

In [20], Muramatsu gave an example of linear SDP for which the affine scaling algorithm converges to a non-optimal point. In that example, for both the short and the long step version of the affine scaling algorithm, there exists a region of starting points from which the generated sequence converges to a non-optimal point. In the concluding remarks of [20], Muramatsu pointed out that it may still be possible to prove global convergence from well-chosen starting points, or by allowing variable step sizes. In this paper, we focus on the second strategy, allowing variable step sizes, and propose a new step size rule which is similar to the Armijo-type rule [3]. Under this new step size rule, we prove that, starting from any interior feasible point, every accumulation point of the affine scaling algorithm is an optimal solution of problem (P). It should be noted that Renegar and Sondjaja studied a polynomial-time affine scaling method for linear semidefinite and hyperbolic programming in [22]; their method is a variant of Dikin’s affine scaling method. The algorithm in [22] shares a similar spirit with Dikin’s method in that at each iteration the cone in the original optimization problem is replaced by an ellipsoidal cone centered at the current iterate, rather than an ellipsoid as in Dikin’s method. It differs from Dikin’s method in that at each iteration the ellipsoidal cone is chosen to contain the original cone rather than to be contained in it.

In Sect. 5, we will consider the special case of (P) in which X and \(A_i\) (\(i = 1,\ldots ,m\)) are all diagonal, together with a slightly different but less restrictive step size rule. In this special case, problem (P) becomes a linearly constrained convex program, which has been studied in [7] in the context of affine scaling algorithms and in [32] through a first-order interior point method that includes the affine scaling algorithm as a special case. However, to ensure optimality, both require nondegeneracy assumptions, and the strong convergence of the (first-order) affine scaling algorithm is still open. The line search procedure in [7] is quite general; in order to guarantee optimality, the step size in [32] needs to be bounded. The second-order affine scaling algorithm for linearly constrained convex programming has been studied in [18, 27]. In [27], Sun proved global convergence at a local linear rate under a Hessian similarity condition on the objective function, and an \(\epsilon \)-optimal solution can be obtained without nondegeneracy assumptions if the step size is of order \(O(\epsilon )\). In [18], Monteiro and Wang studied a version of the second-order affine scaling algorithm, in which the fraction of the ellipsoid used at each iteration is selected according to a trust region strategy, for linearly constrained convex and concave programs. In the convex case of [18], optimality is obtained under the primal nondegeneracy assumption, and it was shown that the sequences of iterates and objective function values generated by the algorithm converge R-linearly and Q-linearly, respectively, under a certain invariance condition on the Hessian of the objective function.

For simplicity, in what follows, we use \(\Vert \cdot \Vert \) to denote either the 2-norm of a vector or the spectral norm of a matrix, and use \(\Vert \cdot \Vert _F\) to denote the Frobenius norm of a matrix. \(x_i\) denotes the ith component of a vector x, and I stands for the identity matrix whose dimension will be clear from the context. \(\mathrm{diag}(x)\) for a vector x denotes the diagonal matrix whose diagonal entries are those of x. For any \(Q\in {{\mathcal {S}}^n}\), \(\lambda _{\min }(Q)\) denotes the smallest eigenvalue of Q.

The definition of the affine scaling direction in linear semidefinite programming can be found in [6] or [20]; in the convex case, it can be defined similarly. Following [20], the affine scaling direction for problem (P) is obtained by first defining the associated dual estimate. For a point \(X \in \mathcal{P^{++}}\), we define the dual estimate \((u(X), S(X)) \in R^m \times {\mathcal {S}}^n\) as the unique solution of the following optimization problem

$$\begin{aligned} \begin{array}{rcl} \min \limits _{S\in {\mathcal {S}}^n, u\in R^m} &{} &{} \Vert X^{\frac{1}{2}}SX^{\frac{1}{2}}\Vert _F^2 \\ \mathrm{s.t. \ }&{} &{} S + \sum \limits _{i=1}^m u_iA_i = \frac{\partial f}{\partial X}, \end{array} \end{aligned}$$
(1)

where \(X^\frac{1}{2} \in {{\mathcal {S}}^n_{++}}\) is the unique square root matrix of X. Using the KKT conditions of problem (1), (u(X), S(X)) can be expressed as (see (4) in [20])

$$\begin{aligned} u(X) = G(X)^{-1}p(X), \ \ \ S(X) = \frac{\partial f}{\partial X} - \sum \limits _{i=1}^m u_i(X)A_i, \end{aligned}$$
(2)

where \(G(X) \in {\mathcal {S}}^m\) and \(p(X) \in R^m\) are such that \(G_{ij}(X) = \mathrm{tr}(A_iXA_jX)\) and \(p_j(X) = \mathrm{tr}(A_jX\frac{\partial f}{\partial X}X)\) for \(i, j = 1, 2,\ldots , m\), respectively. Assumption 2 below guarantees that G(X) is invertible (see Proposition 1 in [20]).
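
For the reader's convenience, here is a brief sketch of how (2) arises: substituting the equality constraint of (1) into its objective and differentiating with respect to \(u_j\) gives

$$\begin{aligned} 0 = \frac{\partial }{\partial u_j}\left\| X^{\frac{1}{2}}\left( \frac{\partial f}{\partial X}-\sum \limits _{i=1}^m u_iA_i\right) X^{\frac{1}{2}}\right\| _F^2 = -\,2\,\mathrm{tr}\left( A_jX\left( \frac{\partial f}{\partial X}-\sum \limits _{i=1}^m u_iA_i\right) X\right) , \quad j=1,\dots ,m, \end{aligned}$$

which is exactly the linear system \(G(X)u = p(X)\); S(X) is then recovered from the equality constraint of (1).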

Then the affine scaling direction D(X) for problem (P) can be defined as

$$\begin{aligned} D(X) = -\, XS(X)X = -\,X \left( \frac{\partial f}{\partial X} - \sum \limits _{i=1}^m u_i(X)A_i \right) X. \end{aligned}$$
(3)

The rest of this paper is organized as follows. In Sect. 2, we study some properties of the affine scaling direction. In Sect. 3, we propose an affine scaling algorithm with a new step size rule, which is similar to the Armijo-type rule. In Sect. 4, we prove that, for any starting interior feasible point and without nondegeneracy assumptions, any accumulation point of the affine scaling algorithm with the new step size rule is an optimal solution of problem (P). In Sect. 5, we consider a special case of problem (P) where X and the \(A_i\)’s are diagonal; a slightly different and less restrictive step size rule is proposed for the affine scaling algorithm, and optimality of any accumulation point of the resulting algorithm is obtained as well. In Sect. 6, the convergence of our affine scaling algorithm on the counterexample in [20] and on some randomly generated linear SDP problems is illustrated. Finally, some concluding remarks are drawn in Sect. 7.

2 Properties of the affine scaling direction

The following assumptions are made throughout this paper.

Assumption 1

\(\mathcal{P^{++}}\) is nonempty.

Assumption 2

The matrix \(\mathcal {A}\), which will be defined below, has full row rank m.

Assumption 3

The optimal solution set of problem (P) is nonempty and bounded.

Now we introduce another form of the affine scaling direction D(X) in (3). First we need the following notation.

  • We define the map \(\mathrm{svec}:\ {{\mathcal {S}}^n} \rightarrow R^{n(n+1)/2}\) as

    $$\begin{aligned} \mathrm{svec}(U) := (u_{11}, \sqrt{2}u_{21}, \dots , \sqrt{2}u_{n1}, u_{22}, \sqrt{2}u_{32}, \dots , \sqrt{2}u_{n2}, \dots , u_{nn})^T, \end{aligned}$$

    where \(U \in {\mathcal {S}}^n\) and “T” denotes the transpose.

  • The symmetrized Kronecker product \(\circledast \) is defined as

    $$\begin{aligned} (G\circledast K)\mathrm{svec}(H)=\frac{1}{2}\mathrm{svec}(KHG^T+GHK^T), \end{aligned}$$

    where \(G, K \in R^{n\times n}\) and \(H\in {\mathcal {S}}^n\). The properties of the operator \(\circledast \) can be found in the Appendix of [1, 24]; a small numerical sketch of \(\mathrm{svec}\) and \(\circledast \) is given after this list.

  • Let matrix \(\mathcal {A}\) be defined as follows

    $$\begin{aligned} \mathcal{A}=\begin{pmatrix} \mathrm{svec}(A_1)^T \\ \vdots \\ \mathrm{svec}(A_m)^T \end{pmatrix} \in R^{m\times {n(n+1)/2}}. \end{aligned}$$
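
As a quick numerical sanity check of the two maps defined above, the following sketch (in Python with NumPy; the helper names are ours and purely illustrative) implements \(\mathrm{svec}\) and builds the matrix representation of \(G\circledast K\) column by column from its defining identity.

```python
import numpy as np

def svec(U):
    """svec(U): stack the lower triangle of a symmetric U column by column,
    multiplying the off-diagonal entries by sqrt(2)."""
    n = U.shape[0]
    out = []
    for j in range(n):
        out.append(U[j, j])
        out.extend(np.sqrt(2.0) * U[j + 1:, j])
    return np.array(out)

def skron(G, K):
    """Matrix representation of the symmetrized Kronecker product G (*) K,
    acting on svec(H) via (G (*) K) svec(H) = 0.5 * svec(K H G^T + G H K^T)."""
    n = G.shape[0]
    d = n * (n + 1) // 2
    M = np.zeros((d, d))
    col = 0
    # Build the operator column by column from symmetric basis matrices E
    # chosen so that svec(E) is the corresponding unit coordinate vector.
    for j in range(n):
        for i in range(j, n):
            E = np.zeros((n, n))
            if i == j:
                E[i, j] = 1.0
            else:
                E[i, j] = E[j, i] = 1.0 / np.sqrt(2.0)
            M[:, col] = 0.5 * svec(K @ E @ G.T + G @ E @ K.T)
            col += 1
    return M

# Check the defining identity on random data.
rng = np.random.default_rng(1)
n = 4
G = rng.standard_normal((n, n)); K = rng.standard_normal((n, n))
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
lhs = skron(G, K) @ svec(H)
rhs = 0.5 * svec(K @ H @ G.T + G @ H @ K.T)
print(np.allclose(lhs, rhs))   # True
```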

From (2), we can rewrite \(G_{ij}(X)\) and \(p_j(X)\) as \(\mathrm{svec}(A_i)^T(X\circledast X)\mathrm{svec}(A_j)\) and \(\mathrm{svec}(A_j)^T(X\circledast X)\mathrm{svec}\left( \frac{\partial f}{\partial X}\right) \), respectively. Therefore, G(X) and p(X) can be written as \(\mathcal{{A}}(X\circledast X)\mathcal{{A}}^T\) and \(\mathcal{{A}}(X\circledast X)\mathrm{svec}\left( \frac{\partial f}{\partial X}\right) \), respectively, and consequently

$$\begin{aligned} u(X) = (\mathcal{{A}}(X\circledast X)\mathcal{{A}}^T)^{-1}\mathcal{{A}}(X\circledast X)\mathrm{svec}\left( \frac{\partial f}{\partial X}\right) , \end{aligned}$$
(4)

and

$$\begin{aligned} \mathrm{svec}(S(X)) = ( I-P_{\mathcal{{A}}X}(X\circledast X))\mathrm{svec}\left( \frac{\partial f}{\partial X} \right) , \end{aligned}$$
(5)

where we denote \(P_{\mathcal{{A}}X}=\mathcal{{A}}^T(\mathcal{{A}}(X\circledast X)\mathcal{{A}}^T)^{-1}\mathcal{{A}}\) for simplicity. Now we can present another form of the affine scaling direction D(X) as

$$\begin{aligned} \mathrm{svec}(D(X))= -\,\left( I-(X\circledast X)P_{\mathcal{{A}}X}\right) (X\circledast X)\mathrm{svec}\left( \frac{\partial f}{\partial X} \right) . \end{aligned}$$
(6)

Using the above notation, the affine scaling direction in (3) can also be derived from the following optimization problem

$$\begin{aligned} \begin{array}{rcl} \min \limits _{D\in R^{n\times n}} &{} &{} \frac{\partial f}{\partial X}\bullet D \\ \mathrm{s.t.\ } &{} &{}A_k\bullet D= 0, \ k=1,\dots ,m,\\ &{} &{} \Vert X^{-\frac{1}{2}}DX^{-\frac{1}{2}} \Vert _F^2 \le {{\tilde{\beta }}}^2 <1, \end{array} \end{aligned}$$
(7)

where \(X^{-\frac{1}{2}}\) is the inverse of \(X^{\frac{1}{2}}\). In fact, if the current point \(X \in \mathcal{P^{++}}\) is not an optimal solution of problem (P), then from the KKT conditions of problem (7), it is not difficult to obtain the solution of problem (7) as

$$\begin{aligned} \mathrm{svec}(D) = -\,\frac{{{\tilde{\beta }}}\left( I-(X\circledast X)P_{\mathcal{{A}}X}\right) (X\circledast X)\mathrm{svec}\left( \frac{\partial f}{\partial X} \right) }{\left\| \left( I-\left( X^{\frac{1}{2}}\circledast X^{\frac{1}{2}}\right) P_{\mathcal{{A}}X}\left( X^{\frac{1}{2}}\circledast X^{\frac{1}{2}}\right) \right) \left( X^{\frac{1}{2}}\circledast X^{\frac{1}{2}}\right) \mathrm{svec}\left( \frac{\partial f}{\partial X} \right) \right\| }, \end{aligned}$$

or equivalently

$$\begin{aligned} D = -\,\frac{{{\tilde{\beta }}}XS(X)X}{\Vert X^{\frac{1}{2}}S(X)X^{\frac{1}{2}} \Vert _F}. \end{aligned}$$

We can see that this D and the D(X) in (3) represent the same direction.

For linear SDP, the properties of the affine scaling direction have been discussed in [20]. For convex SDP, these properties can be obtained similarly. The proof of Theorem 1 below is identical to the proof of Proposition 2 in [20], except that the matrix C is replaced by \(\frac{\partial f}{\partial X}\); hence the proof is omitted.

Theorem 1

We have

$$\begin{aligned} A_i\bullet D(X) = 0, \end{aligned}$$
(8)

for all \(i=1,\ldots ,m\), and

$$\begin{aligned} \frac{\partial f}{\partial X}\bullet D(X) = -\,\Vert X^{\frac{1}{2}}S(X)X^{\frac{1}{2}}\Vert _F^2. \end{aligned}$$
(9)
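
As a concrete numerical illustration of (2), (3), and Theorem 1, the following minimal sketch (in Python with NumPy; it is not the Matlab code used in Sect. 6) assembles G(X) and p(X), computes the dual estimate and the affine scaling direction, and checks (8) and (9). The instance (a single trace constraint and a linear objective) and all names are hypothetical placeholders chosen for illustration only.

```python
import numpy as np

def affine_scaling_direction(X, A_list, grad_f):
    """Compute u(X), S(X), and D(X) from (2)-(3) at a strictly feasible X.

    X       : (n, n) symmetric positive definite matrix
    A_list  : list of m symmetric constraint matrices A_i
    grad_f  : (n, n) symmetric matrix, the gradient of f at X
    """
    # G_ij(X) = tr(A_i X A_j X) and p_j(X) = tr(A_j X grad_f X)
    G = np.array([[np.trace(Ai @ X @ Aj @ X) for Aj in A_list] for Ai in A_list])
    p = np.array([np.trace(Aj @ X @ grad_f @ X) for Aj in A_list])
    u = np.linalg.solve(G, p)                                # u(X) = G(X)^{-1} p(X)
    S = grad_f - sum(ui * Ai for ui, Ai in zip(u, A_list))   # S(X) in (2)
    D = -X @ S @ X                                           # D(X) in (3)
    return u, S, D

# Hypothetical instance: f(X) = C . X with the single constraint tr(X) = b_1 (A_1 = I).
n = 3
rng = np.random.default_rng(0)
C = rng.standard_normal((n, n)); C = (C + C.T) / 2
A_list = [np.eye(n)]
X = np.eye(n)                                    # a strictly feasible interior point
u, S, D = affine_scaling_direction(X, A_list, C)
print(np.trace(A_list[0] @ D))                           # (8): ~ 0
print(np.trace(C @ D) + np.linalg.norm(S, 'fro') ** 2)   # (9): ~ 0, since X = I here
```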

3 A new step size rule

It has been shown in [20] that the affine scaling algorithm for linear SDP can fail no matter whether it uses a long step or a short step strategy. Hence we design a new step size strategy which is similar to the Armijo-type rule [3]. For any initial point \(X_0 \in \mathcal{P^{++}}\), the iterations in the affine scaling algorithm have the form

$$\begin{aligned} X_{k+1} = X_{k} + \alpha ^kD(X_k),\ k=0,1,\ldots , \end{aligned}$$
(10)

where \(\alpha ^k > 0\) is the step size and D(X) is given in (3). In order to state our step size strategy, we first introduce some notations. Let

$$\begin{aligned} \rho (X) = \sup \{ \rho >0 | X + \rho D(X) \succ 0 \}, \end{aligned}$$

for any \(X \in {{\mathcal {S}}^n_{++}}\) and select a positive sequence \(\{ a_i\}_{i=0}^{+\infty }\) such that \(\lim \nolimits _{i\rightarrow +\infty } a_i= 0\) and \(\lim \nolimits _{s\rightarrow +\infty } \sum \nolimits _{i=0}^sa_i = +\,\infty \). For instance, the sequence can be \(\{ \frac{1}{(i+1)^\alpha }\}_{i=0}^{+\infty }\) with \(0<\alpha \le 1\) or \(\{ \frac{1}{\ln (i+2)}\}_{i=0}^{+\infty }\). Then \(\alpha ^k\) in (10) can be defined from the following two steps:

  • Step 1:

    $$\begin{aligned} \alpha _0^k = \min \left\{ \frac{a_k}{\Vert S_kX_k\Vert c(\Vert S_k\Vert )}, \tau \rho (X_k) \right\} >0 , \end{aligned}$$
    (11)

    where \(S_k = S(X_k)\), \(0< \tau < 1 \) is a constant, and c(x) is a scalar function which satisfies \(c_1x \le c(x) \le \max (c_2x, c_3)\), where \(0<c_1 \le c_2\), \(c_3>0\) are constants.

  • Step 2: \(\alpha ^k\) is the largest \(\alpha \in \{ \alpha _0^k \beta ^l \}_{l=0,1,\ldots }\) satisfying

    $$\begin{aligned} f(X_k + \alpha D(X_k)) \le f(X_k) + \sigma \alpha G_k\bullet D(X_k), \end{aligned}$$
    (12)

    where \(G_k = \frac{\partial f}{\partial X}|_{X=X_k}\), \(0< \beta , \ \sigma <1\) are constants.

It should be noted that \(S_k\) in (11) must be a nonzero matrix. In fact, if \(S_k = 0\), then it is easy to verify from the KKT conditions that \(X_k\) is an optimal solution, and the iteration should stop; hence we do not consider this trivial case.
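
For concreteness, one iteration of (10) with this step size rule can be sketched as follows (in Python with NumPy; this is not the Matlab code of Sect. 6). The choice \(c(x)=x\) and the constants \(\tau \), \(\beta \), \(\sigma \) passed below are illustrative, and the routine assumes \(S_k\ne 0\), so that \(G_k\bullet D(X_k)<0\) and the backtracking in Step 2 terminates.

```python
import numpy as np

def rho(X, D, tol=1e-12):
    """rho(X) = sup{ r > 0 : X + r*D is positive definite }, computed from the
    most negative eigenvalue of X^{-1/2} D X^{-1/2} (same spectrum as X^{-1} D)."""
    w = np.linalg.eigvals(np.linalg.solve(X, D)).real
    return np.inf if w.min() >= -tol else -1.0 / w.min()

def step_size(f, grad_f, X, S, D, a_k, tau=0.5, beta=0.5, sigma=0.1):
    """Two-step rule: (11) with the illustrative choice c(x) = x, then (12)."""
    # Step 1: alpha_0^k = min{ a_k / (||S X|| c(||S||)), tau * rho(X) }
    alpha0 = min(a_k / (np.linalg.norm(S @ X, 2) * np.linalg.norm(S, 2)),
                 tau * rho(X, D))
    # Step 2: largest alpha in {alpha_0 * beta^l} with sufficient decrease
    gD = np.trace(grad_f(X) @ D)          # G_k . D(X_k), negative when S != 0
    alpha, fX = alpha0, f(X)
    while f(X + alpha * D) > fX + sigma * alpha * gD:
        alpha *= beta
    return alpha
```

Combined with the direction routine sketched at the end of Sect. 2, this yields the update \(X_{k+1} = X_k + \alpha ^kD(X_k)\).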

4 Optimality of the affine scaling algorithm

In this section, we will show that the affine scaling algorithm with our step size strategy (12) succeeds without nondegeneracy assumptions. We begin our discussion with the following lemmas.

Lemma 1

(Section 3.1.3, [4]) Suppose f is differentiable (i.e., its gradient \(\nabla f\) exists at each point in \(\mathrm{dom}\, f\)). Then f is convex if and only if \(\mathrm{dom}\, f\) is convex and

$$\begin{aligned} f(y) \ge f(x)+\nabla f(x)^T(y-x) \end{aligned}$$
(13)

holds for all x, \(y\in \mathrm{dom}\, f\).

According to [9], a vector \(\beta \) is said to majorize a vector \(\alpha \) if

$$\begin{aligned} \min \left\{ \sum _{j=1}^k \beta _{i_j}: 1\le i_1<\cdots<i_k\le n \right\} \ge \min \left\{ \sum _{j=1}^k \alpha _{i_j}: 1\le i_1<\cdots <i_k\le n \right\} , \end{aligned}$$

for any \(k=1,2,\dots ,n\) with equality for \(k=n\).

Lemma 2

(Theorem 4.3.26, [9]) Let A be Hermitian. Then the vector composed of the diagonal entries of A majorizes the vector composed of the eigenvalues of A.

Lemma 3

The level set \(\mathcal{{F}} = \{ X\in \mathcal{P^+} | f(X) \le f(X_0) \}\) is bounded.

Proof

Let \(\delta _{\mathcal{P^+}}(X)\) be the indicator function of \(\mathcal{P^+}\) which is defined by

$$\begin{aligned} \delta _\mathcal{P^+}(X) = {\left\{ \begin{array}{ll} 0 &{}\quad \text{ if } X\in \mathcal{P^+},\\ +\,\infty &{}\quad \text{ otherwise }. \end{array}\right. } \end{aligned}$$

Then \(f(X) + \delta _{\mathcal{P^+}}(X)\) is a closed proper convex function, and the optimal solution set of problem (P) can be expressed as

$$\begin{aligned} \{ X\in {\mathcal {S}}^n | f(X) + \delta _{\mathcal{P^+}}(X) \le f^* \}, \end{aligned}$$

where \(f^*\) is the optimal objective value for problem (P). According to Assumption 3, the optimal solution set of problem (P) is nonempty and bounded, hence Corollary 8.7.1 in [23] implies that

$$\begin{aligned} \{ X\in {\mathcal {S}}^n | f(X) + \delta _{\mathcal{P^+}}(X) \le f(X_0) \} = \mathcal{{F}} = \{ X\in \mathcal{P^+} | f(X) \le f(X_0) \} \end{aligned}$$
(14)

is also bounded. \(\square \)

Theorem 2

Let \(\{ X_k \}\) be generated by the affine scaling algorithm (10) with the step size \(\{ \alpha ^k \}\) chosen by (12). Then

  1. (i)

    \(X_k \in \mathcal{P^{++}}\), \(\{ f(X_k) \}\) is nonincreasing, \(\{X_k \}\) and \(\{D(X_k) \}\) are bounded;

  2. (ii)

    every accumulation point of \(\{X_k \}\) is an optimal solution for problem (P).

Proof

Proof of (i). Since \(X_0 \in \mathcal{P^{++}}\) and \(\alpha ^k \le \tau \rho (X_k)\) at each step, by using an induction argument on k, we have that \(X_k \in {{\mathcal {S}}^n_{++}}\) for all k. Moreover, from (8) in Theorem 1 and \(X_0 \in \mathcal{P^{++}}\), we have \(X_k \in \mathcal{P^{++}}\) for all k. Also, since f(X) is continuously differentiable, \(\alpha ^k_0 > 0\), and \(0<\sigma <1\) in (12), the backtracking in Step 2 terminates with some \(l\ge 0\), and hence \(\alpha ^k>0\) for all k.

Combining (9) and (12), we have for all k

$$\begin{aligned} f(X_{k+1}) - f(X_k)\le \sigma \alpha ^k G_k\bullet D(X_k) = -\, \sigma \alpha ^k \left\| X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right\| _F^2 \le 0, \end{aligned}$$
(15)

thus \(\{ f(X_k) \}\) is nonincreasing. Then \(X_k \in \mathcal{{F}}\) for all k. Since the level set \(\mathcal{{F}}\) in (14) is bounded from Lemma 3, we know \(\{X_k \}\) is bounded as well. For \(D(X_k)\), note that

$$\begin{aligned} \mathrm{svec}(D(X_k)) = -\,\left( X_k^{\frac{1}{2}}\circledast X_k^{\frac{1}{2}}\right) \mathcal{{P}}_k\left( X_k^{\frac{1}{2}}\circledast X_k^{\frac{1}{2}}\right) \mathrm{svec}(G_k), \end{aligned}$$

where \(\mathcal{{P}}_k = I - (X_k^{\frac{1}{2}}\circledast X_k^{\frac{1}{2}})P_{\mathcal{{A}}X_k}(X_k^{\frac{1}{2}}\circledast X_k^{\frac{1}{2}})\) is a symmetric idempotent matrix, which implies \(\Vert \mathcal{{P}}_k\Vert \le 1\) for all k. Along with the facts that \(\{X_k \}\) is bounded and f(X) is continuously differentiable, this shows that \(\{D(X_k) \}\) is also bounded.

Proof of (ii). From (i), \(\{X_k \}\) is bounded; hence \(\{X_k \}\) must have at least one accumulation point. Let \({\bar{X}}\) be any accumulation point of \(\{X_k \}\); we will show by contradiction that it is an optimal solution of problem (P).

Assume \({\bar{X}}\) is not an optimal solution of problem (P). First, since \(X_k \in \mathcal{P^{++}}\), we have \({\bar{X}} \in \mathcal{P^{+}}\). From Assumption 3, we can choose a point \(X^* \in \mathcal{P^{+}}\) such that \(X^*\) is an optimal solution for problem (P). According to the hypothesis and the fact that \(\{ f(X_k) \}\) is nonincreasing from (i), we have

$$\begin{aligned} f(X_0) \ge f({\bar{X}}) = \lim \limits _{k\rightarrow +\infty } f(X_k) > f(X^*). \end{aligned}$$

Let us define

$$\begin{aligned} {\bar{Y}}=\frac{f({\bar{X}})-f(X^*)}{2(f(X_0)-f(X^*))}X_0+ \left[ 1-\frac{f({\bar{X}})-f(X^*)}{2(f(X_0)-f(X^*))}\right] X^*, \end{aligned}$$

then \({\bar{Y}} \in \mathcal{P^{++}}\). Since f(X) is convex, we have

$$\begin{aligned} f({\bar{Y}})\le & {} \frac{f({\bar{X}})-f(X^*)}{2(f(X_0)-f(X^*))}f(X_0) + \left[ 1-\frac{f({\bar{X}})-f(X^*)}{2(f(X_0)-f(X^*))}\right] f(X^*)\\= & {} \frac{f({\bar{X}})+f(X^*)}{2}. \end{aligned}$$

Let us define

$$\begin{aligned} V(X) = \ln \det X + \mathrm{tr}(X^{-1}{\bar{Y}}), \end{aligned}$$

where \(X \in {{\mathcal {S}}^n_{++}}\). (Remark: The definition of V(X) is inspired by the potential function in Losert and Akin [14].) Then at \(X_k\), we can define a scalar function \(V_k(\alpha )\) as

$$\begin{aligned} V_k(\alpha ) = V(X_k+\alpha D(X_k)), \end{aligned}$$

where \(0\le \alpha \le \alpha ^k_0\). Obviously, \(V_k(0) = V(X_k)\) and \(V_k(\alpha ^k) = V(X_{k+1})\). Moreover,

$$\begin{aligned} \frac{dV_k(\alpha )}{d\alpha }= & {} \mathrm{tr}(X(\alpha )^{-1}D(X_k)) - \mathrm{tr}(X(\alpha )^{-1}{{\bar{Y}}}X(\alpha )^{-1}D(X_k))\\= & {} \mathrm{tr}\left[ (X(\alpha )-{\bar{Y}})X(\alpha )^{-1}D(X_k)X(\alpha )^{-1}\right] \\= & {} \mathrm{tr}\left[ ({\bar{Y}}-X(\alpha ))X(\alpha )^{-1}X_kS_kX_kX(\alpha )^{-1}\right] , \end{aligned}$$

where \(X(\alpha ) = X_k + \alpha D(X_k) = X_k - \alpha X_kS_kX_k\). Notice \(X_k^{-1}X(\alpha ) = I - \alpha S_kX_k\), which implies

$$\begin{aligned} X(\alpha )^{-1}X_k = I + \alpha S_kX_k(I-\alpha S_kX_k)^{-1}, \end{aligned}$$
(16)

hence

$$\begin{aligned} \frac{dV_k(\alpha )}{d\alpha }= & {} \mathrm{tr}\left[ ({\bar{Y}}-X(\alpha ))X(\alpha )^{-1}X_kS_kX_kX(\alpha )^{-1}\right] \\= & {} \mathrm{tr}\left\{ ({\bar{Y}}-X(\alpha ))\left[ I + \alpha S_kX_kW \right] S_k\left[ I + \alpha W^TX_kS_k \right] \right\} \\= & {} \mathrm{tr}\left\{ ({\bar{Y}}-X(\alpha ))\left[ S_k + \alpha S_kX_kWS_k \right] \right\} \\&+\,\mathrm{tr}\left\{ ({\bar{Y}}-X(\alpha ))\left[ \alpha S_kW^TX_kS_k + {\alpha }^2S_kX_kWS_kW^T X_kS_k\right] \right\} , \end{aligned}$$

where \(W = X(\alpha )^{-1}X_k = (I-\alpha S_kX_k)^{-1}\). Next we show that when \(0\le \alpha \le \alpha ^k\), \(\frac{dV_k(\alpha )}{d\alpha }\) is always negative if k is large enough. Since the level set \(\mathcal{{F}}\) in (14) is bounded and f(X) is continuously differentiable, there exist \(M_1, \ M_2 > 0\) such that \(\Vert X\Vert \le M_1\), \(\Vert \frac{\partial f}{\partial X}\Vert \le M_2\) if \(X \in \mathcal{{F}}\). From Theorem 1 and Lemma 1, we have

$$\begin{aligned} \mathrm{tr}(({\bar{Y}} - X_k)S_k)= & {} \mathrm{tr}(({\bar{Y}} - X_k)G_k) \le f({\bar{Y}}) - f(X_k)\\\le & {} f({\bar{Y}}) - f({\bar{X}}) \le \frac{f(X^*)-f({\bar{X}})}{2} < 0, \end{aligned}$$

which implies

$$\begin{aligned} \frac{f({\bar{X}})-f(X^*)}{2} \le \mathrm{tr}((X_k -{\bar{Y}})S_k) \le \Vert X_k -{\bar{Y}}\Vert \cdot \Vert S_k\Vert \le 2M_1\Vert S_k\Vert , \end{aligned}$$

thus

$$\begin{aligned} \Vert S_k\Vert \ge \frac{f({\bar{X}})-f(X^*)}{4M_1}, \end{aligned}$$
(17)

for all k. Then when \(0\le \alpha \le \alpha ^k\) we have

$$\begin{aligned} \mathrm{tr}(({\bar{Y}} - X(\alpha ))S_k)= & {} \mathrm{tr}\left[ ({\bar{Y}} - X_k + \alpha X_kS_kX_k)G_k\right] \\\le & {} \frac{f(X^*)-f({\bar{X}})}{2} + \alpha M_1M_2\Vert S_kX_k\Vert \\\le & {} \frac{f(X^*)-f({\bar{X}})}{2} + M_1M_2\frac{a_k}{c_1\Vert S_k\Vert } \\\le & {} \frac{f(X^*)-f({\bar{X}})}{2} + \frac{4M_1^2M_2a_k}{c_1(f({\bar{X}})-f(X^*))}. \end{aligned}$$

From (16), if \(0< a_k < \frac{c_1(f({\bar{X}})-f(X^*))}{4M_1}\) and \(0\le \alpha \le \alpha ^k\), we have

$$\begin{aligned} \Vert W\Vert \le 1 + \alpha \Vert S_kX_k\Vert \cdot \Vert W\Vert \le 1 + \frac{4M_1a_k}{c_1(f({\bar{X}})-f(X^*))}\Vert W\Vert , \end{aligned}$$

which indicates that

$$\begin{aligned} \Vert W^T\Vert = \Vert W\Vert \le \frac{1}{1-\frac{4M_1a_k}{c_1(f({\bar{X}})-f(X^*))}}. \end{aligned}$$
(18)

Therefore if \(0< a_k < \frac{c_1(f({\bar{X}})-f(X^*))}{4M_1}\) and \(0\le \alpha \le \alpha ^k\), we have

$$\begin{aligned} \frac{dV_k(\alpha )}{d\alpha }\le & {} \frac{f(X^*)-f({\bar{X}})}{2} + \frac{4M_1^2M_2a_k}{c_1(f({\bar{X}})-f(X^*))} + 4\alpha M_1\Vert S_kX_k\Vert \cdot \Vert W\Vert \cdot \Vert S_k\Vert \\&+\, 2\alpha ^2M_1\Vert S_kX_k\Vert ^2\cdot \Vert W\Vert ^2\cdot \Vert S_k\Vert \\\le & {} \frac{f(X^*)-f({\bar{X}})}{2} + \frac{4M_1^2M_2a_k}{c_1(f({\bar{X}})-f(X^*))} + \frac{4M_1a_k\Vert W\Vert }{c_1} + \frac{2M_1a_k^2\Vert W\Vert ^2}{c_1^2\Vert S_k\Vert }, \end{aligned}$$

then from (17), (18), and the fact that \(\lim \nolimits _{k\rightarrow +\infty } a_k= 0\), we know there exists a \(K > 0\) such that for all \(k\ge K\), if \(0\le \alpha \le \alpha ^k\), then \(\frac{dV_k(\alpha )}{d\alpha } < 0\). In particular, we have

$$\begin{aligned} V(X_{k+1}) = V_k(\alpha ^k) < V_k(0) = V(X_k), \end{aligned}$$

for all \(k\ge K\). Hence there exists an \(M_3 \in R\) such that \(V(X_k) \le M_3\) for all k. When \(X \in {{\mathcal {S}}^n_{++}}\), let \(X=Q\Lambda Q^T\) be an eigenvalue decomposition of X, and \(\{\lambda _i\}_{i=1}^n\) be the eigenvalues of X. Then

$$\begin{aligned} V(X) = \ln \det X + \mathrm{tr}\big ( Q{\Lambda }^{-1}Q^T{{\bar{Y}}}\big ) = \sum _{i=1}^n \ln \lambda _i + \mathrm{tr}\big ( {\Lambda }^{-1}Q^T{{\bar{Y}}}Q \big ), \end{aligned}$$

since \({{\bar{Y}}}\in \mathcal{P}^{++}\), we have

$$\begin{aligned} \lambda _{\min }(Q^T{{\bar{Y}}}Q) = \lambda _{\min }({\bar{Y}}) > 0. \end{aligned}$$

Therefore from Lemma 2, we have

$$\begin{aligned} V(X)= & {} \sum _{i=1}^n \ln \lambda _i + \mathrm{tr}({\Lambda }^{-1}Q^T{{\bar{Y}}}Q)\\\ge & {} \sum _{i=1}^n \ln \lambda _i + \sum _{i=1}^n {\lambda _i}^{-1}\lambda _{\min }({\bar{Y}})\\= & {} \sum _{i=1}^n (\ln \lambda _i + {\lambda _i}^{-1}\lambda _{\min }({\bar{Y}}) ). \end{aligned}$$

For each i, \(\ln \lambda _i + {\lambda _i}^{-1}\lambda _{\min }({\bar{Y}}) \ge \ln \lambda _{\min }({\bar{Y}}) + 1\) and \(\lim \nolimits _{\lambda _i \rightarrow 0}[\ln \lambda _i + {\lambda _i}^{-1}\lambda _{\min }({\bar{Y}})] = +\infty \). Thus, by \(V(X_k) \le M_3\) for all k, we know there exists an \(M_4>0\) such that for all k, we have

$$\begin{aligned} \lambda _{\min }(X_k) \ge M_4 > 0. \end{aligned}$$

Let us define

$$\begin{aligned} \mathcal{H} = \{ X\in {{\mathcal {S}}^n_{++}} | \ \lambda _{\min }(X)\ge M_4 \}. \end{aligned}$$

Then for all k, \(X_k \in \mathcal{H} \cap \mathcal{{F}}\) which implies \(\Vert X_k^{-\frac{1}{2}}\Vert \le \frac{1}{\sqrt{M_4}}\). Along with (17), we get for all k,

$$\begin{aligned} \left\| X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right\| _F \ge \left\| X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right\| \ge \frac{\Vert S_k\Vert }{\Vert X_k^{-\frac{1}{2}}\Vert ^2} \ge \frac{M_4(f({\bar{X}})-f(X^*))}{4M_1} > 0. \end{aligned}$$
(19)

From (15), (19), and \(\lim \nolimits _{k \rightarrow +\infty } f(X_{k+1})-f(X_k) = 0\), we know \(\lim \nolimits _{k\rightarrow +\infty } \alpha ^k = 0\). Next we show that the index set

$$\begin{aligned} \mathcal{I} = \{ \ k \ | \ \alpha ^k = \alpha _0^k \beta ^l ,\ l\ge 1 \ \text{ in } \ (12) \} \end{aligned}$$

is finite. If not, then we can choose a subsequence \(\{X_k\}_{k\in \mathcal{K}}\) (\(\mathcal{K} \subseteq \{ 0, 1, \ldots \}\)) such that \(\mathcal{K} \subseteq \mathcal{I}\) and \(\lim \nolimits _{k\in \mathcal{K}, \ k\rightarrow +\infty } X_k = {\tilde{X}} \in \mathcal{H} \cap \mathcal{{F}}\). Then for \(k\in \mathcal{K}\), the condition (12) is violated by \(\alpha = \alpha ^k/\beta \), i.e.,

$$\begin{aligned} \frac{f\left( X_k + \frac{\alpha ^k}{\beta }D(X_k)\right) - f(X_k)}{\frac{\alpha ^k}{\beta }} > \sigma G_k \bullet D(X_k). \end{aligned}$$
(20)

Since \(\lim \nolimits _{k\rightarrow +\infty } \alpha ^k = 0\) and f(X) is continuously differentiable, from (20) we have \({\tilde{G}} \bullet D({{\tilde{X}}}) \ge \sigma {\tilde{G}} \bullet D({\tilde{X}})\) where \({\tilde{G}} = \frac{\partial f}{\partial X}|_{X={\tilde{X}}}\), which implies

$$\begin{aligned} -\,\left\| {{\tilde{X}}}^{\frac{1}{2}}S({\tilde{X}}){{\tilde{X}}}^{\frac{1}{2}}\right\| ^2_F = {\tilde{G}} \bullet D({\tilde{X}}) \ge 0, \end{aligned}$$

but this contradicts (19). Thus the index set \(\mathcal{I}\) is finite and there must exist an \(N_1 > 0\) such that for all \(k\ge N_1\), \(\alpha ^k = \alpha ^k_0\) in (12). From (4), we know that \(\Vert u(X)\Vert \) is a continuous function on \(\mathcal{H} \cap \mathcal{{F}}\), which is closed and bounded; thus \(\Vert u(X)\Vert \) is bounded on \(\mathcal{H} \cap \mathcal{{F}}\). Along with (2) and (17), there exists an \(M_5>0\) such that for all k, \(\Vert S_k\Vert \le M_5\) and

$$\begin{aligned} \frac{c_1(f({\bar{X}})-f(X^*))}{4M_1} \le c(\Vert S_k\Vert ) \le M_5. \end{aligned}$$
(21)

Since \(\lim \nolimits _{k \rightarrow +\infty } a_k = 0\), there exists an \(N_2>0\) such that for all \(k \ge N_2\), we have \(a_k < \frac{\tau c_1 (f({\bar{X}})-f(X^*))}{4M_1}\). Then for \(k\ge N_2\), by using Theorem 1.3.20 in [9] and (21), for \(\alpha = \frac{a_k}{\tau \Vert S_kX_k\Vert c(\Vert S_k\Vert )}\), we have

$$\begin{aligned} \left\| \alpha X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right\|= & {} \alpha r\left( X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right) = \alpha r(S_kX_k)\\\le & {} \alpha \Vert S_kX_k\Vert = \frac{a_k}{\tau c(\Vert S_k\Vert )} \le \frac{4M_1a_k}{\tau c_1 (f({\bar{X}})-f(X^*))} < 1, \end{aligned}$$

where r(A) denotes the spectral radius of the matrix A; this implies

$$\begin{aligned} X_k + \alpha D(X_k) = X_k^{\frac{1}{2}}\left( I - \alpha X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right) X_k^{\frac{1}{2}} \in {{\mathcal {S}}^n_{++}}, \end{aligned}$$

therefore \(\alpha \le \rho (X_k)\) and then \(\frac{a_k}{\Vert S_kX_k\Vert c(\Vert S_k\Vert )} \le \tau \rho (X_k)\). Thus for all \(k\ge N_2\), by (21) we have

$$\begin{aligned} \alpha ^k_0 = \frac{a_k}{\Vert S_kX_k\Vert c(\Vert S_k\Vert )} \ge \frac{a_k}{\Vert S_k\Vert \cdot \Vert X_k\Vert c(\Vert S_k\Vert )} \ge \frac{a_k}{M_1M_5^2}. \end{aligned}$$
(22)

Combining (15), (19), and (22), we know for all \(k\ge N_3 = \max (N_1, N_2)\),

$$\begin{aligned} f(X_{k+1}) - f(X_k) \le -\,\sigma \alpha ^k_0 \frac{M_4^2(f({\bar{X}})-f(X^*))^2}{16M_1^2} \le -\,\frac{\sigma M_4^2(f({\bar{X}})-f(X^*))^2}{16M_1^3M_5^2}a_k, \end{aligned}$$

this implies

$$\begin{aligned} f({\bar{X}}) - f(X_{N_3}) \le -\,\frac{\sigma M_4^2(f({\bar{X}})-f(X^*))^2}{16M_1^3M_5^2}\sum \limits _{k= N_3}^{+\infty } a_k = -\,\infty , \end{aligned}$$

which is a contradiction. Therefore any accumulation point of \(\{X_k \}\) is an optimal solution for problem (P). \(\square \)

5 A special case of problem (P)

In this section, we consider a special case of problem (P) where X and \(A_i\) (\(i = 1,\ldots ,m\)) are all diagonal. The results of this section are therefore applicable to linearly constrained convex programming. We will show that in this special case the step size in (12) can be larger, in the sense that the positive sequence \(\{ a_i\}_{i=0}^{+\infty }\) is not required. If \(X = \mathrm{diag}(x)\), where \(x \in R^n_{++}\), then S(X), D(X), and \(\frac{\partial f}{\partial X}\) are all diagonal matrices. Let \(A \in R^{m\times n}\) be such that \(A_{ij} = (A_i)_{jj}\) and \(\nabla f \in R^n\) be such that \((\nabla f)_i = (\frac{\partial f}{\partial X})_{ii}\). Then from (4), u(X) can be written as

$$\begin{aligned} u(X) = (AX^2A^T)^{-1}AX^2\nabla f. \end{aligned}$$
(23)

In this special case, the step size \(\alpha ^k\) is also chosen by (12) but \(\alpha ^k_0\) is defined as

$$\begin{aligned} 0 < \alpha _0^k = \min \left\{ \frac{c_5}{\Vert S_kX_k\Vert ^{c_4}}, \tau \rho (X_k) \right\} , \end{aligned}$$
(24)

where \(0\le c_4<1\), \(c_5>0\) are constants.
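
In this diagonal case everything reduces to vector operations, as in the following sketch (in Python with NumPy; the function name and the constants are illustrative, and the trivial case \(S(X)=0\) is excluded as before).

```python
import numpy as np

def diagonal_affine_scaling_step(f, grad_f, A, x, c4=0.5, c5=1.0,
                                 tau=0.5, beta=0.5, sigma=0.1):
    """One iteration of (10) in the diagonal case X = diag(x) with x > 0.

    A       : (m, n) matrix with A[i, j] = (A_i)_{jj}
    grad_f  : function returning the gradient vector of f at x
    The step size is alpha_0^k from (24) followed by the Armijo-type test (12)."""
    g = grad_f(x)
    X2 = x ** 2
    u = np.linalg.solve((A * X2) @ A.T, A @ (X2 * g))   # u(X) in (23)
    s = g - A.T @ u                                     # diagonal of S(X)
    d = -X2 * s                                         # diagonal of D(X) = -X S(X) X
    neg = d < 0
    rho = np.min(-x[neg] / d[neg]) if neg.any() else np.inf   # rho(X)
    sx_norm = np.max(np.abs(s * x))                     # ||S(X) X|| in the diagonal case
    alpha = min(c5 / sx_norm ** c4, tau * rho)          # alpha_0^k in (24)
    gd = g @ d                                          # G_k . D(X_k) < 0 when s != 0
    while f(x + alpha * d) > f(x) + sigma * alpha * gd:
        alpha *= beta
    return x + alpha * d
```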

Theorem 3

Let \(\{ X_k \}\) be generated by the affine scaling algorithm (10) with the step size \(\{ \alpha ^k \}\) chosen by (12) and (24). Then

  1. (i)

    \(X_k \in \mathcal{P^{++}}\), \(\{ f(X_k) \}\) is nonincreasing, \(\{X_k \}\) and \(\{D(X_k) \}\) are bounded;

  2. (ii)

    \(\lim \nolimits _{k \rightarrow +\infty } \Vert X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\Vert _F = 0\);

  3. (iii)

    every accumulation point of \(\{X_k \}\) is an optimal solution for problem (P).

Proof

Proof of (i). Similar to the proof of (i) in Theorem 2.

Proof of (ii). We prove this by contradiction. Assume that \(\lim \nolimits _{k \rightarrow +\infty } \Vert X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\Vert _F = 0\) does not hold. Then, since \(\{X_k \}\) and \(\{D(X_k) \}\) are both bounded, there must exist a subsequence \(\{X_k\}_{k\in \mathcal{K}}\) (\(\mathcal{K} \subseteq \{ 0, 1, \ldots \}\)) and an \({{\bar{M}}}_1 > 0\) such that \(\lim \nolimits _{k\in \mathcal{K},\ k \rightarrow +\infty } \Vert X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\Vert _F = {{\bar{M}}}_1\), \(\lim \nolimits _{k\in \mathcal{K},\ k \rightarrow +\infty } X_k = {\hat{X}}\), and \(\lim \nolimits _{k\in \mathcal{K},\ k \rightarrow +\infty } D(X_k) = {\hat{D}}\).

Then from (15) and \(\lim \nolimits _{k\rightarrow +\infty } f(X_{k+1})-f(X_k) = 0\), we know

$$\begin{aligned} \lim \limits _{k\in \mathcal{K},\ k \rightarrow +\infty } \alpha ^k = 0. \end{aligned}$$
(25)

From Lemma 3 and the Remark in Sun [26], we know that if \(x>0\), then every entry of \((AX^{2}A^T)^{-1}AX^{2}\) is bounded, and the bound depends only on A and n. Thus from (23), we know that \(u(X_k)\) is bounded, which implies that \(S(X_k)\) is also bounded. Hence there exists an \({{\bar{M}}}_2 > 0\) such that \(\Vert S_kX_k\Vert \le {{\bar{M}}}_2\) for all k. If \(\alpha = \frac{1}{2{{\bar{M}}}_2}\), then

$$\begin{aligned} \left\| \alpha X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right\| = \alpha \Vert S_kX_k\Vert < 1, \end{aligned}$$

which implies

$$\begin{aligned} X_k + \alpha D(X_k) = X_k^{\frac{1}{2}}\left( I - \alpha X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\right) X_k^{\frac{1}{2}} \in {{\mathcal {S}}^n_{++}}, \end{aligned}$$

therefore \(\rho (X_k) \ge \frac{1}{2{{\bar{M}}}_2}\) for all k. Let \({{\bar{M}}}_3 = \min (\frac{c_5}{{{{\bar{M}}}_2}^{c_4}}, \tau \frac{1}{2{{\bar{M}}}_2})\). Then from (24), we have \(\alpha ^k_0 \ge {{\bar{M}}}_3 > 0\). Hence from (25), we know for all \(k \in \mathcal{K}\) sufficiently large, \(\alpha ^k < \alpha ^k_0\), which implies that condition (12) is violated by \(\alpha = \alpha ^k/\beta \), i.e.,

$$\begin{aligned} \frac{f\left( X_k + \frac{\alpha ^k}{\beta }D(X_k)\right) - f(X_k)}{\frac{\alpha ^k}{\beta }} > \sigma G_k \bullet D(X_k). \end{aligned}$$
(26)

Since \(\lim \nolimits _{k \in \mathcal{K},\ k\rightarrow +\infty } \alpha ^k = 0\) and f(X) is continuously differentiable, from (26) we have \({\hat{G}} \bullet {\hat{D}} \ge \sigma {\hat{G}} \bullet {\hat{D}}\) where \({\hat{G}} = \frac{\partial f}{\partial X}|_{X={\hat{X}}}\), which implies

$$\begin{aligned} {\hat{G}} \bullet {\hat{D}} = 0, \end{aligned}$$

but this contradicts \(\lim \nolimits _{k\in \mathcal{K},\ k \rightarrow +\infty } \Vert X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\Vert _F = {{\bar{M}}}_1 > 0\). Hence the hypothesis is false, and \(\lim \nolimits _{k \rightarrow +\infty } \Vert X_k^{\frac{1}{2}}S_kX_k^{\frac{1}{2}}\Vert _F = 0\).

Proof of (iii). Similar to Theorem 2, we also prove this by contradiction. From (ii), we have

$$\begin{aligned} 0\le \lim \inf \limits _{k\rightarrow +\infty } \alpha ^k \Vert S_kX_k\Vert\le & {} \lim \sup \limits _{k\rightarrow +\infty } \alpha ^k \Vert S_kX_k\Vert \\\le & {} \lim \sup \limits _{k\rightarrow +\infty } \alpha ^k_0 \Vert S_kX_k\Vert \le \lim \sup \limits _{k\rightarrow +\infty } c_5 \Vert S_kX_k\Vert ^{1-c_4} =0, \end{aligned}$$

which implies \(\lim \nolimits _{k\rightarrow +\infty }\alpha ^k\Vert S_kX_k\Vert = 0\). Moreover, \(\Vert S_k\Vert \) is also bounded; hence, similar to the proof of Theorem 2, we can again obtain (19) (with possibly different constants \(M_1, M_4>0\)), which contradicts (ii). Hence every accumulation point of \(\{X_k \}\) is an optimal solution of problem (P). This completes the proof. \(\square \)

6 Numerical experiments

In this section, we provide computational results of our affine scaling algorithm on the counterexample in [20] and on some randomly generated linear SDP problems. The algorithm is implemented in Matlab (version 2017a) on a PC; it should be mentioned that the main goal of this paper is theoretical, so the program is coded only for demonstration purposes. In our algorithm, for linear SDP, \(\alpha ^k = \alpha _0^k\) (for linear f, condition (12) is already satisfied at \(l=0\) since \(G_k\bullet D(X_k)\le 0\) and \(0<\sigma <1\)), and for simplicity, we set \(c_1=c_2=c_3=1\) and \(c(\Vert S_k\Vert ) = \Vert S_k\Vert \) in the following numerical experiments.

The example in [20] is the following SDP problem:

$$\begin{aligned} \min \limits _{X\in {\mathcal {S}}^n} \begin{pmatrix}1 &{} 0\\ 0 &{} 1\end{pmatrix} \bullet X \ \ \mathrm{s.t.\ } \begin{pmatrix}0 &{} 1 \\ 1 &{} 0\end{pmatrix} \bullet X= 2, \ X\succeq 0. \end{aligned}$$
(27)

The equality constraint in problem (27) implies that \(X = \begin{pmatrix} x &{} 1\\ 1 &{} y\end{pmatrix}\) for \(x, y\in R\). In fact, problem (27) has the following equivalent form (see (18) in [20]):

$$\begin{aligned} \min \limits _{x, y\in R} x+y \ \ \mathrm{s.t.\ } x\ge 0, \ \ y\ge 0, \ \ xy\ge 1. \end{aligned}$$
(28)

More details related to problem (27) can be found in Sect. 3 in [20]. From (24)-(26) in [20], we have

$$\begin{aligned} u(X) = \frac{x+y}{xy+1}, \ \ \ S(X) = \begin{pmatrix} 1 &{} -\,u(X) \\ -\,u(X) &{} 1 \end{pmatrix}, \end{aligned}$$

and

$$\begin{aligned} D(X) = -\,\frac{xy-1}{xy+1}\begin{pmatrix} x^2-1 &{} 0 \\ 0 &{} y^2-1 \end{pmatrix}. \end{aligned}$$

The initial point \((x^0, y^0)\) in [20] is chosen from the set

$$\begin{aligned} \mathcal{{L}} = \{ (x, y) | xy>1, \ x<1, \ y>1 \}, \end{aligned}$$

and in this situation, from Proposition 3 and (32) in [20], we can obtain

$$\begin{aligned} \rho (X) = \frac{xy+1}{xy-1}\cdot \frac{-\,(x+y)+\sqrt{(x+y)^2+4R(x,y)}}{2R(x,y)}, \end{aligned}$$

where \(R(x,y) = \frac{(1-x^2)(y^2-1)}{xy-1}\). In the numerical tests, we also choose the initial point \((x^0, y^0)\) from \(\mathcal{{L}}\). In particular, we choose an \(x^0\) in (0, 1) and set \(\epsilon (x^0) = \frac{1-x^0}{2}\). Finally, we choose a \(y^0>1\) such that

$$\begin{aligned} \sqrt{x^0y^0-1} < \frac{\tau ^2\epsilon (x^0)(1-\epsilon (x^0)-x^0)}{2}. \end{aligned}$$

From Theorem 7 in [20], the limit point of the iterates \(\{(x^k, y^k) \}\) will be contained in the closure of \(\mathcal{{L}}_{\epsilon (x^0)} = \{ (x, y)\in \mathcal{{L}} | x<1-\epsilon (x^0) \}\) if \(\alpha ^k = \tau \rho (X_k)\) at each iteration. In the numerical tests, we set \(a_k = \frac{1}{\sqrt{x^0y^0-1}\cdot \ln (2+k)}\) so that \(a_0\) and \(\rho (X_0)\) have the same magnitude.
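
A minimal sketch of this experiment (in Python rather than the Matlab used for the reported results) is given below; it uses the closed-form expressions for u(X), S(X), and D(X) above, the stopping criterion \(|x+y-2|<10^{-6}\) described below, and, in the last line, one hypothetical starting point from \(\mathcal{{L}}\).

```python
import numpy as np

def run_example(x0, y0, tau=0.5, max_iter=10**6, tol=1e-6):
    """Affine scaling iteration for problem (27) in the variables (x, y), with
    a_k = 1/(sqrt(x0*y0 - 1) * ln(2 + k)) and c(||S_k||) = ||S_k||.
    For linear SDP the test (12) is automatic, so alpha^k = alpha_0^k."""
    x, y = x0, y0
    scale = np.sqrt(x0 * y0 - 1.0)
    for k in range(max_iter):
        if abs(x + y - 2.0) < tol:                    # optimal value of (28) is 2
            return x, y, k
        u = (x + y) / (x * y + 1.0)
        S = np.array([[1.0, -u], [-u, 1.0]])
        X = np.array([[x, 1.0], [1.0, y]])
        d = -(x * y - 1.0) / (x * y + 1.0) * np.array([x**2 - 1.0, y**2 - 1.0])
        D = np.diag(d)                                # closed-form D(X) from [20]
        w = np.linalg.eigvals(np.linalg.solve(X, D)).real
        rho = np.inf if w.min() >= 0.0 else -1.0 / w.min()   # rho(X_k)
        a_k = 1.0 / (scale * np.log(2.0 + k))
        alpha = min(a_k / (np.linalg.norm(S @ X, 2) * np.linalg.norm(S, 2)),
                    tau * rho)                        # alpha^k = alpha_0^k in (11)
        x, y = x + alpha * d[0], y + alpha * d[1]
    return x, y, max_iter

# (x0, y0) = (0.5, 2.0001) lies in L: x0*y0 > 1, x0 < 1, y0 > 1.
print(run_example(x0=0.5, y0=2.0001))
```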

The following two tables list the numerical results of our algorithm for problem (27) with different \(\tau \) values and initial points. The unique solution of problem (28) is (1, 1), and the stopping criterion of the program is \(|x+y-2|<10^{-6}\). \(Iter_{es}\) denotes the iteration number at which \((x^k, y^k)\) escapes from \(\mathcal{{L}}_{\epsilon (x^0)}\) for the first time, and Iter denotes the total iteration number when the algorithm stops. We also report the behavior of the long step affine scaling algorithm in [20]: we set \(\alpha ^k = \tau \rho (X_k)\) at each iteration and use \(x_f\) to denote the value of x after \(10^7\) iterations.

Table 1 Numerical results for problem (27) with \(\tau =0.5\)
Table 2 Numerical results for problem (27) with \(\tau =0.9\)

From Tables 1 and 2, we can see that for the long step affine scaling algorithm, where \(\alpha ^k = \tau \rho (X_k)\) at each iteration, the numerical results confirm the findings of Theorem 7 in [20]: even after \(1.0\times 10^7\) iterations, \((x^k, y^k)\) is still contained in \(\mathcal{{L}}_{\epsilon (x^0)}\). However, by adopting our new step size rule, we obtain the optimal solution within \(10^{-6}\) accuracy in all cases.

Next we report the numerical results of our affine scaling algorithm on some randomly generated linear SDP problems. Let \(C\bullet X\) be the objective function of the randomly generated linear SDP. Then each element of C and \(A_i \ (i = 2,\ldots ,m)\) is chosen randomly and uniformly in \([-\,1, 1]\), while \(A_1\) is set to the identity matrix I. Finally, we set \(b_i = A_i\bullet I \ (i=1, \ldots , m)\). In this way, I is an interior feasible point for problem (P), and we use I as the initial point in our numerical experiments. These randomly generated SDP problems are the same as the ones in Section 4.1 of [34].
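
The instance generation just described can be sketched as follows (in Python with NumPy); symmetrizing the random matrices is our reading of the construction, since C and the \(A_i\) must lie in \({\mathcal {S}}^n\).

```python
import numpy as np

def random_linear_sdp(m, n, seed=0):
    """Generate a random linear SDP instance: min C . X s.t. A_i . X = b_i, X >= 0.

    Entries of C and A_2, ..., A_m are uniform in [-1, 1] (symmetrized here),
    A_1 = I, and b_i = A_i . I, so that X_0 = I is an interior feasible point."""
    rng = np.random.default_rng(seed)

    def sym_uniform():
        M = rng.uniform(-1.0, 1.0, size=(n, n))
        return (M + M.T) / 2.0

    C = sym_uniform()
    A = [np.eye(n)] + [sym_uniform() for _ in range(m - 1)]
    b = np.array([np.trace(Ai) for Ai in A])    # b_i = A_i . I
    return C, A, b
```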

In our tests, after a linear SDP problem is generated, we first use SDPT3 [29] to obtain the optimal objective function value \(f^*\) with the default accuracy tolerance of \(10^{-8}\). Then we run our program from the initial point \(X_0 = I\), and the stopping criterion is set as \(\frac{|f(X_k) - f^*|}{|f^*|+1} < 10^{-6}\). For each (m, n) pair, we run the program 20 times, and each number reported in Table 3 is the average of the 20 runs. In all of our tests, we set \(a_k = \frac{mn}{\ln (k+2)}\) and \(\tau = 0.5\).

Table 3 Numerical results for randomly generated linear SDPs

7 Concluding remarks

In this paper, an affine scaling algorithm with a new step size rule for linearly constrained convex semidefinite programming is proposed. It is proven that, starting from any feasible interior point, the accumulation points of the sequence generated by this algorithm are optimal solutions if the optimal solution set is nonempty and bounded. This global convergence does not depend on nondegeneracy assumptions. This confirms our conjecture that by just selecting a suitable step size at each iteration, the affine scaling algorithm can be made convergent even if it is applied to semidefinite programming, which includes linearly constrained convex programming as a special case.

A possible future research topic is the convergence of the whole generated sequence, which we call strong convergence. Even in the special case where X is diagonal, as far as we know, the strong convergence of the affine scaling algorithm for convex programming has not been resolved. Tseng et al. [32] showed that for a certain first-order interior point method strong convergence can be obtained in the quadratic case, but not for the affine scaling algorithm. It would be of theoretical interest to know whether strong convergence holds if we allow more flexibility in selecting the step size or the initial points.