1 Introduction

The symmetric optimization (SO) problem is a convex optimization problem that minimizes a linear function over the intersection of an affine subspace and a symmetric cone. SO is an important class of convex optimization problems because it includes linear optimization (LO), second-order cone optimization (SOCO), and semidefinite optimization (SDO) as special cases. In particular, many interior-point methods (IPMs) for LO have been successfully extended to SOCO and SDO (see, e.g., [15]), owing to their polynomial complexity and practical efficiency.

Güler [6] was the first to realize that symmetric cones provide a unifying framework for the important types of cones used in optimization, such as the nonnegative orthant, the second-order cone, and the cone of positive semidefinite matrices. Faybusovich [7] made the first attempt to generalize IPMs from LO to SO using Euclidean Jordan algebras (EJAs) and the associated symmetric cones. Consequently, several efficient IPMs for SO have been developed (see, e.g., [8–15]). For an overview of the relevant results, we refer to the recent book on this subject [16] and the references therein.

The primal-dual full Newton-step feasible IPM for LO was first analysed by Roos et al. in [17] and was later extended to an infeasible version by Roos [18]. Subsequently, Gu et al. [19] proposed an improved version of the full Newton-step infeasible IPM for LO, in which the convergence analysis of the algorithm was simplified while the currently best known iteration bound was maintained. Both versions of the method were extended by Gu et al. [8] to SO using EJAs. The search direction used in [8] is based on the Nesterov–Todd (NT) scheme. The obtained iteration bounds coincide with the ones derived for LO, with \(n\) replaced by \(r\), the rank of the EJA; these are the currently best known iteration bounds for SO. Other references related to full-step IPMs, without any attempt at completeness, include [12, 20–24].

In this paper, we present an improved convergence and complexity analysis of the full NT-step feasible IPM for SO discussed in [8]. More specifically, a sharper quadratic convergence result is established, which leads to a wider neighbourhood of the central path for the iterates of the algorithm. The purpose of the paper is mainly theoretical: to show that, despite taking full NT-steps in this wider neighbourhood of the central path, the method still attains the best complexity known for these types of methods. Moreover, these features give a certain numerical appeal to the methods, although they are still small-update methods and cannot quite compare in efficiency with large-update methods, whose theoretical complexity is far worse (the “irony” of IPMs).

The outline of the paper is as follows. In Sect. 2, we recall some known results that are needed in this paper and, more importantly, develop some new results that are used to perform the new complexity analysis of the feasible IPM. In Sect. 3, we present the full NT-step feasible IPM for SO together with its complexity analysis. Finally, some concluding remarks are made in Sect. 4.

2 Preliminaries

In what follows we assume that the reader is familiar with the basic concepts of EJAs and symmetric cones. A comprehensive treatment of EJAs can be found in the monograph [25] and in [7, 8, 11, 12, 26] as it relates to optimization.

Let \((\mathcal {V}, \circ )\) be an \(n\)-dimensional EJA of rank \(r\) equipped with the standard inner product \(\left\langle x,s\right\rangle = \text{ tr }(x \circ s)\) (see (8)), and let \(\mathcal{K}\) be the corresponding symmetric cone. Moreover, we always assume that there exists an identity element \(e\in \mathcal {V}\) such that \(e \circ x = x\) for all \(x\in \mathcal {V}\).

For any element \(x\in \mathcal {V}\), the Lyapunov transformation \(L(x): \mathcal {V}\rightarrow \mathcal {V}\) is given by

$$\begin{aligned} L(x)y:=x\circ y, ~~\forall y\in \mathcal {V}. \end{aligned}$$
(1)

Furthermore, we define the quadratic representation of \(x\) in \(\mathcal {V}\) as follows:

$$\begin{aligned} P(x):=2L(x)^2-L\left( x^2\right) , \end{aligned}$$
(2)

where \(L(x)^2=L(x)L(x)\).

For any EJA \(\mathcal {V}\), the corresponding cone of squares

$$\begin{aligned} \mathcal{K}(\mathcal {V}):=\left\{ x^2: x\in \mathcal {V}\right\} \end{aligned}$$
(3)

is indeed a symmetric cone, i.e. a self-dual, closed, convex, and homogeneous cone [25].

For any \(x\in \mathcal {V}\), let \(k\) be the smallest integer such that the set \(\{e,x,\ldots ,x^k\}\) is linearly dependent. Then \(k\) is called the degree of \(x\), denoted by deg\((x)\). Clearly, the degree of \(x\) is bounded by the dimension of the vector space \(\mathcal {V}\). The rank of \(\mathcal {V}\), denoted by rank\((\mathcal {V})\), is the largest deg\((x)\) over all elements \(x\in \mathcal {V}\). An element \(x\in \mathcal {V}\) is called regular if its degree equals the rank of \(\mathcal {V}\). In the sequel, unless otherwise stated, \(\mathcal {V}\) denotes an EJA with rank\((\mathcal {V})=r\).

For a regular element \(x\in \mathcal {V}\), the set \(\{e, x, x^2, \ldots , x^r\}\) is linearly dependent, and hence there are real numbers \(a_1(x),\ldots , a_r (x)\) such that the minimal polynomial of \(x\) is given by

$$\begin{aligned} f(\lambda ;x)=\lambda ^r - a_1(x)\lambda ^{r-1} +\cdots +(-1)^r a_r (x), \end{aligned}$$
(4)

which is also the characteristic polynomial of the regular element \(x\). The coefficient \(a_1(x)\) is called the trace of \(x\), denoted by \(\mathrm{tr}(x)\), and the coefficient \(a_r(x)\) is called the determinant of \(x\), denoted by \(\mathrm{det}(x)\).

An element \(c\in \mathcal {V}\) is said to be an idempotent if \(c^2=c\). Two idempotents \(c_1\) and \(c_2\) are said to be orthogonal if \(c_1\circ c_2=0\). Moreover, an idempotent is primitive if it is non-zero and cannot be written as the sum of two (necessarily orthogonal) non-zero idempotents. We say that \(\{c_1,\ldots , c_r\}\) is a complete system of orthogonal primitive idempotents, or Jordan frame, if each \(c_i\) is a primitive idempotent, \(c_i \circ c_j =0, i\ne j\), and \(\sum _{i=1}^{r}c_i = e\). The Löwner partial ordering “\(\succeq _{\mathcal{K}}\)” of \(\mathcal {V}\) induced by the cone \(\mathcal{K}\) is defined by \(x\succeq _{\mathcal{K}} s\) if \(x-s\in \mathcal{K}\). The interior of \(\mathcal{K}\) is denoted as \(\mathrm{int}\mathcal{K}\), and we write \(x\succ _{\mathcal{K}} s\) if \(x-s\in \mathrm{int} \mathcal{K}\).

The following theorem describes a spectral decomposition of elements in \(\mathcal {V}\), which plays an important role in the analysis of the IPMs for SO.

Theorem 2.1

(Theorem III.1.2 in [25]) Let \(x\in \mathcal {V}\). Then there exist a Jordan frame \(\{c_1,\ldots , c_r\}\) and real numbers \(\lambda _1(x),\ldots , \lambda _r(x)\) such that

$$\begin{aligned} x=\sum _{i=1}^{r}\lambda _i(x)c_i. \end{aligned}$$
(5)

The numbers \(\lambda _i(x)\) (with their multiplicities) are the eigenvalues of \(x\). Furthermore, the trace and the determinant of \(x\) are given by

$$\begin{aligned} \mathrm{tr}(x)=\sum _{i=1}^r\lambda _i(x)~~\mathrm{and}~\mathrm{det}(x)=\prod _{i=1}^r \lambda _i(x). \end{aligned}$$
(6)
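In the LO special case, \(\mathcal {V}=I\!R^n\) with the componentwise product as Jordan product, the eigenvalues of \(x\) are simply its components and the standard basis vectors form a Jordan frame, so (6) reduces to the usual sum and product. A minimal sanity check of Theorem 2.1 in this algebra, with illustrative values:

```python
import math

# LO special case of an EJA: V = R^n with componentwise Jordan product.
# Eigenvalues of x are its components; the standard basis is a Jordan frame.
x = [3.0, -1.0, 2.0]
eigenvalues = x                  # lambda_i(x) = x_i in this algebra
trace = sum(eigenvalues)         # (6): tr(x) = sum of eigenvalues
det = math.prod(eigenvalues)     # (6): det(x) = product of eigenvalues
print(trace, det)                # 4.0 -6.0
```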

Fix a Jordan frame \(\{c_1, c_2,\ldots , c_r\}\) in an EJA \(\mathcal {V}\). For \(i,j\in \{1,2,\ldots ,r\}\), define the eigenspaces

$$\begin{aligned} \mathcal {V}_{ii}:=\{x\in \mathcal {V}:\,x\circ c_i=x\}=I\!R\,c_i,\qquad \mathcal {V}_{ij}:=\left\{ x\in \mathcal {V}:\,x\circ c_i=\frac{1}{2}x=x\circ c_j\right\} ,\quad i\ne j. \end{aligned}$$

The following theorem provides another important decomposition, the Peirce decomposition, of the space \(\mathcal {V}\).

Theorem 2.2

(Theorem IV.2.1 in [25]) The space \(\mathcal {V}\) is the orthogonal direct sum of the spaces \(\mathcal {V}_{ij}\) \((i\le j)\), i.e.

$$\begin{aligned} \mathcal {V}=\bigoplus _{i\le j}\mathcal {V}_{ij}. \end{aligned}$$

Furthermore, we have \(\mathcal {V}_{ij}\circ \mathcal {V}_{ij}\subset \mathcal {V}_{ii}+\mathcal {V}_{jj},~\mathcal {V}_{ij}\circ \mathcal {V}_{jk}\subset \mathcal {V}_{ik},\,\text{ if }\,\,i\ne k,\) and

$$\begin{aligned} \mathcal {V}_{ij}\circ \mathcal {V}_{kl}=\{0\},\,\text{ if }\,\{i,j\}\cap \{k,l\}=\emptyset . \end{aligned}$$

Thus, the Peirce decomposition of \(x \in \mathcal {V}\) with respect to the Jordan frame \(\{c_1,\dots ,c_r\}\) is given by

$$\begin{aligned} x=\sum _{i=1}^r x_ic_i +\sum _{i<j}x_{ij} \end{aligned}$$
(7)

with \(x_i\in I\!R\), \(i=1,\dots ,r\) and \(x_{ij}\in \mathcal {V}_{ij}\), \(1\le i< j \le r\).

Corollary 2.1

(Lemma 12 in [11]) Let \(x\in \mathcal {V}\) and let its spectral decomposition with respect to the Jordan frame \(\{c_1,\ldots ,c_r\}\) be given by (5). Then the following statements hold.

  1. (i)

    The matrices \(L(x)\) and \(P(x)\) commute and thus share a common system of eigenvectors; in fact, the \(c_i\) are among their common eigenvectors.

  2. (ii)

    The eigenvalues of \(L(x)\) have the form \(\frac{\lambda _i+\lambda _j}{2},~1\le i\le j\le r.\)

  3. (iii)

    The eigenvalues of \(P(x)\) have the form \(\lambda _i\lambda _j,~1\le i\le j\le r.\)

As already indicated, for any \(x,s\in \mathcal {V}\), the trace inner product is defined by

$$\begin{aligned} \langle x,s\rangle :=\mathrm{tr}(x\circ s). \end{aligned}$$
(8)

Thus, \(\mathrm{tr}(x)=\langle x,e\rangle .\) Hence, it is easy to verify that

$$\begin{aligned} \mathrm{tr}(x+s)=\mathrm{tr}(x)+\mathrm{tr}(s) \, \, \mathrm{{and}} \,\, x\preceq _\mathcal{K}s \Rightarrow \mathrm{tr}(x) \le \mathrm{tr}(s). \end{aligned}$$
(9)

The Frobenius norm induced by this trace inner product is given by

$$\begin{aligned} \Vert x\Vert _F:=\sqrt{\langle x,x\rangle }. \end{aligned}$$
(10)

It follows from Theorem 2.1 that

$$\begin{aligned} \Vert x\Vert _F=\sqrt{\mathrm{tr}(x^2)}=\sqrt{\sum _{i=1}^r\lambda _i^2(x)}. \end{aligned}$$
(11)

One can easily verify that

$$\begin{aligned} |\lambda _\mathrm{min}(x)|\le \Vert x\Vert _F~~\mathrm{and}~~|\lambda _\mathrm{max}(x)|\le \Vert x\Vert _F. \end{aligned}$$
(12)

Furthermore, we have

$$\begin{aligned} \left\| x^2\right\| _F\le \Vert x\Vert _F^2. \end{aligned}$$
(13)
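Continuing the LO special case (componentwise Jordan product, eigenvalues equal to components), the relations (11)–(13) can be checked numerically; the vector below is an illustrative choice:

```python
import math

# LO special case: eigenvalues of x are its components.
x = [1.5, -2.0, 0.5]
fro = math.sqrt(sum(t*t for t in x))              # (11): sqrt(sum lambda_i^2)
assert abs(max(x)) <= fro and abs(min(x)) <= fro  # (12): |l_min|,|l_max| <= ||x||_F
x2 = [t*t for t in x]                             # x o x, componentwise
fro_x2 = math.sqrt(sum(t*t for t in x2))
assert fro_x2 <= fro**2                           # (13): ||x^2||_F <= ||x||_F^2
print(fro_x2, fro**2)
```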

In the following two lemmas, we recall several important inequalities used in the sequel.

Lemma 2.1

(Lemma 2.13 in [8]) Let \(x,s\in \mathcal {V}\) and \(\langle x, s\rangle =0\). Then

$$\begin{aligned} -\frac{1}{4}\Vert x + s\Vert _F^2~e \preceq _\mathcal{K}x\circ s\preceq _\mathcal{K}\frac{1}{4}\Vert x+s\Vert _F^2~e. \end{aligned}$$

Lemma 2.2

(Lemma 2.16 in [8]) Let \(x,s\in \mathcal {V}\). Then

$$\begin{aligned} \Vert x \circ s\Vert _F\le \frac{1}{2}\left\| x^2+s^2\right\| _F. \end{aligned}$$

The next lemma provides an important inequality connecting the eigenvalues of \(x \circ s\) with the Frobenius norms of \(x\) and \(s\).

Lemma 2.3

Let \(x, s \in \mathcal {V}\). Then

$$\begin{aligned} \sum _{i=1}^r |\lambda _i (x \circ s)|\le \frac{1}{2}\left( \Vert x\Vert _F^2+\Vert s\Vert _F^2\right) . \end{aligned}$$

Proof

For any \(x, s \in \mathcal {V}\), we have

$$\begin{aligned} x^2+s^2-2 x\circ s=(x-s)^2\in \mathcal{K}~\mathrm{and}~x^2+s^2+2 x\circ s=(x+s)^2\in \mathcal{K}. \end{aligned}$$

This implies that

$$\begin{aligned} -\frac{1}{2} (x^2+s^2)\preceq _\mathcal{K}x\circ s\preceq _\mathcal{K}\frac{1}{2} (x^2+s^2). \end{aligned}$$

From Theorem 8 in [27], we have

$$\begin{aligned} \lambda _i^\downarrow (x\circ s) \le \frac{1}{2} \lambda _i^\downarrow (x^2+s^2)~\mathrm{and}~ -\lambda _i^\downarrow (x\circ s) \le \frac{1}{2} \lambda _i^\downarrow (x^2+s^2) \end{aligned}$$

for all \(i\), where for \(z\in \mathcal {V}\), \(\lambda _i^\downarrow (z)\) \((i= 1, 2, \ldots , r)\) denote the eigenvalues of \(z\) written in the decreasing order. Thus,

$$\begin{aligned} |\lambda _i^\downarrow (x\circ s)| \le \frac{1}{2} \lambda _i^\downarrow (x^2+s^2) \end{aligned}$$

for all \(i\). This implies that

$$\begin{aligned} \sum _{i=1}^r |\lambda _i (x \circ s)| \le \frac{1}{2} \sum _{i=1}^r \lambda _i(x^2+s^2) = \frac{1}{2} \mathrm{tr}(x^2+s^2) = \frac{1}{2}(\mathrm{tr}(x^2)+ \mathrm{tr}(s^2)). \end{aligned}$$

Since \(\mathrm{tr}(x^2)=\Vert x\Vert _F^2\) and \(\mathrm{tr}(s^2)=\Vert s\Vert _F^2\), we have

$$\begin{aligned} \sum _{i=1}^r |\lambda _i (x \circ s)|\le \frac{1}{2}\left( \Vert x\Vert _F^2+\Vert s\Vert _F^2\right) . \end{aligned}$$

This completes the proof. \(\square \)

If \(\langle x,s\rangle =0\), then \(\Vert x+s\Vert _F^2=\Vert x\Vert _F^2+\Vert s\Vert _F^2\). Thus, the following corollary follows immediately from Lemma 2.3.

Corollary 2.2

Let \(x, s \in \mathcal {V}\) and \(\langle x,s\rangle =0\). Then

$$\begin{aligned} \sum _{i=1}^r |\lambda _i (x \circ s)|\le \frac{1}{2}\Vert x+s\Vert _F^2. \end{aligned}$$
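As a quick numerical illustration of Lemma 2.3 and Corollary 2.2, again in the LO special case where \(\lambda _i(x\circ s)=x_i s_i\); the orthogonal pair below is an illustrative choice:

```python
# LO special case: x o s is the componentwise product, with eigenvalues x_i*s_i.
x, s = [1.0, -1.0, 2.0], [2.0, 2.0, 0.0]
assert abs(sum(a*b for a, b in zip(x, s))) < 1e-12   # <x, s> = 0
xs = [a*b for a, b in zip(x, s)]
lhs = sum(abs(t) for t in xs)                        # sum |lambda_i(x o s)|
rhs = 0.5*sum((a + b)**2 for a, b in zip(x, s))      # ||x + s||_F^2 / 2
assert lhs <= rhs                                    # Corollary 2.2
print(lhs, rhs)                                      # 4.0 7.0
```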

Lemma 2.4

(Lemma C.6 in [17]) Let \(\gamma \) be a vector in \(I\!R^p\) such that \(\gamma >-\mathbf{1}\) and \(\mathbf{1}^T\gamma =\sigma \), where \(\mathbf{1}\) denotes the all-one vector in \(I\!R^p\). Then if either \(\gamma \ge 0\) or \(\gamma \le 0\),

$$\begin{aligned} \sum _{i=1}^p\frac{-\gamma _i}{1+\gamma _i}\le \frac{-\sigma }{1+\sigma }. \end{aligned}$$

Equality holds if and only if at most one of the coordinates of \(\gamma \) is non-zero.
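Since Lemma 2.4 is a purely scalar inequality, it is easy to probe numerically; the vectors \(\gamma \) below are illustrative choices with all coordinates of one sign and \(\gamma >-\mathbf{1}\):

```python
# Numerical probe of Lemma 2.4 with illustrative sign-constrained vectors.
def lhs(gamma):
    return sum(-g/(1 + g) for g in gamma)

for gamma in ([0.3, 0.1, 0.2], [-0.3, -0.2]):   # gamma >= 0, resp. gamma <= 0
    sigma = sum(gamma)
    assert lhs(gamma) <= -sigma/(1 + sigma) + 1e-12
# equality when at most one coordinate is non-zero:
assert abs(lhs([0.4, 0.0]) - (-0.4/1.4)) < 1e-12
```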

Lemma 2.5

Let \(u,v\in \mathcal {V}\) and \(\langle u, v\rangle =0\), and suppose \(\Vert u+v\Vert _F=2a\) with \(a<1\). Then

$$\begin{aligned} \left\langle e,(e+u\circ v)^{-1}-e\right\rangle \le \frac{2a^4}{1-a^4}. \end{aligned}$$

Proof

It follows from Lemma 2.1 that

$$\begin{aligned} |\lambda _i(u\circ v)|\le \frac{1}{4}\Vert u+v\Vert _F^2=\frac{1}{4}\cdot 4a^2=a^2<1,~~i=1,\ldots ,r. \end{aligned}$$

This implies that \(-e\prec _\mathcal{K}u\circ v \prec _\mathcal{K}e\). Let \(\beta : =u\circ v\); then \(\mathrm{tr}(\beta )=\langle u, v\rangle =0\) by assumption, i.e. \(\langle e,\beta \rangle = 0\). Next, let

$$\begin{aligned} I_+(\beta ):=\{i: \lambda _i(\beta )>0\}~~\mathrm{and}~I_-(\beta ):=\{i: \lambda _i(\beta )<0\}. \end{aligned}$$

Then

$$\begin{aligned} \sigma :=\sum _{i\in I_+(\beta )}\lambda _i(\beta )=-\sum _{i\in I_-(\beta )}\lambda _i(\beta ). \end{aligned}$$

To apply Lemma 2.4, we split the eigenvalues of \(\beta \) by sign: the vector of positive eigenvalues \((\lambda _i(\beta ))_{i\in I_+(\beta )}\) has coordinate sum \(\sigma \), and the vector of negative eigenvalues \((\lambda _i(\beta ))_{i\in I_-(\beta )}\) has coordinate sum \(-\sigma \). Applying Lemma 2.4 to each of these vectors, we have

$$\begin{aligned} \left\langle e, (e+u\circ v)^{-1}-e\right\rangle&= \left\langle e, (e+\beta )^{-1}-e\right\rangle =\left\langle e, -\beta \circ (e+\beta )^{-1}\right\rangle \nonumber \\&= \sum _{i=1}^r \frac{-\lambda _i(\beta )}{1+\lambda _i(\beta )} =\sum _{i\in I_+(\beta )} \frac{-\lambda _i(\beta )}{1+\lambda _i(\beta )}+\sum _{i\in I_-(\beta )} \frac{-\lambda _i(\beta )}{1+\lambda _i(\beta )}\nonumber \\&\le \frac{-\sigma }{1+\sigma }+\frac{\sigma }{1-\sigma }=\frac{2\sigma ^2}{1-\sigma ^2}. \end{aligned}$$
(14)

The last expression is monotonically increasing in \(\sigma \). Hence we may replace it by an upper bound, which can be obtained from Corollary 2.2 as follows:

$$\begin{aligned} \sigma =\frac{1}{2}\sum _{i=1}^r |\lambda _i(\beta )|=\frac{1}{2}\sum _{i=1}^r |\lambda _i(u \circ v)|\le \frac{1}{4}\Vert u+v\Vert _F^2=a^2. \end{aligned}$$

Substitution of this bound for \(\sigma \) into (14) yields the desired inequality

$$\begin{aligned} \left\langle e,(e+u\circ v)^{-1}-e\right\rangle \le \frac{2a^4}{1-a^4}. \end{aligned}$$

This completes the proof. \(\square \)
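A numerical probe of Lemma 2.5, once more in the LO special case, where \((e+u\circ v)^{-1}\) acts componentwise; the orthogonal pair below is an illustrative choice:

```python
import math

# LO special case: e = (1,...,1), products and inverses act componentwise.
u, v = [0.6, -0.2], [0.1, 0.3]
assert abs(sum(a*b for a, b in zip(u, v))) < 1e-15   # <u, v> = 0
a = 0.5*math.sqrt(sum((p + q)**2 for p, q in zip(u, v)))  # ||u + v||_F = 2a
assert a < 1
uv = [p*q for p, q in zip(u, v)]                     # u o v, componentwise
lhs = sum(1/(1 + t) - 1 for t in uv)                 # <e, (e + u o v)^{-1} - e>
assert lhs <= 2*a**4/(1 - a**4) + 1e-12              # Lemma 2.5
print(lhs, 2*a**4/(1 - a**4))
```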

3 Full Nesterov–Todd Step Feasible Interior-Point Method

In this section, we present a new convergence and complexity analysis for the full NT-step feasible IPM for SO given in [8]. It is shown that the iterates can be taken in the wider quadratic convergence neighbourhood of the central path than previously obtained in [8]. Furthermore, we establish the currently best known iteration bound for full NT-step feasible small-update methods.

3.1 The SO Problems

In this paper, we consider the primal SO problem in the standard form

$$\begin{aligned} \mathrm{(SOP)}\quad \quad \quad \quad \quad \min \quad \{\langle c, x\rangle :~ Ax=b,~~x\in \mathcal{K}\},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \end{aligned}$$

and its dual problem

$$\begin{aligned} \mathrm{(SOD)}\quad \quad \quad \quad \quad \max \quad \{b^\mathrm{T}y:~ A^\mathrm{T}y+s=c,~~s\in \mathcal{K}\},\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \end{aligned}$$

where \(A\) is a matrix representation of a linear operator from \(\mathcal {V}\) to \(I\!R^m\). The vector \(c\) and the rows of \(A\) lie in \(\mathcal {V}\), \(b\in I\!R^m\), and \(A^\mathrm{T}\) is the adjoint of \(A\). Throughout the paper, we assume that the rows of \(A\) are linearly independent. Many researchers have studied SO and achieved numerous important results [8–12].

3.2 A Brief Outline of the Full NT-Step Feasible IPM

Without loss of generality, as established in [28], we can assume that both (SOP) and (SOD) satisfy the interior-point condition (IPC), i.e. there exists \((x^0,y^0,s^0)\) such that

$$\begin{aligned} Ax^0=b,\,x^0\in \mathrm{int}\mathcal{K},\,A^\mathrm{T}y^0+s^0=c,\,s^0\in \mathrm{int}\mathcal{K}. \end{aligned}$$

The Karush–Kuhn–Tucker (KKT) conditions for (SOP) and (SOD) are given by

$$\begin{aligned} Ax=b,\,x\in \mathcal{K},\quad A^\mathrm{T}y+s=c,\,s\in \mathcal{K},\quad x\circ s=0. \end{aligned}$$
(15)

The standard approach of primal-dual IPMs is to replace the third equation in (15), called the complementarity condition for (SOP) and (SOD), by the parameterized equation \( x\circ s=\mu e\) with \(\mu >0\), which yields the following system:

$$\begin{aligned} Ax=b,\,x\in \mathcal{K}, \quad A^\mathrm{T}y+s&= c,\,s\in \mathcal{K}, \quad x\circ s=\mu e. \end{aligned}$$
(16)

The parameterized system (16) has a unique solution \((x(\mu ), y(\mu ), s(\mu ))\) for each \(\mu >0\). The \(x(\mu )\) component is called the \(\mu \)-centre of (SOP), and the pair \((y(\mu ), s(\mu ))\) is called the \(\mu \)-centre of (SOD). The set of \(\mu \)-centres forms a homotopy path, with \(\mu \) running through all positive real numbers, which is called the central path. As \(\mu \rightarrow 0\), the limit of the central path exists, and since the limit points satisfy the complementarity condition \(x\circ s=0\), the limit naturally yields an optimal solution for (SOP) and (SOD) (see, e.g., [7, 11]).

The core idea of primal-dual IPMs is to follow the central path approximately, i.e. within a certain neighbourhood, and to approach the optimal set of SO by gradually reducing \(\mu \) to zero. The standard ‘tool’ for tracing the central path approximately is a Newton method. Given an iterate \((x,y,s)\) ‘close’ to the central path (the measure of the closeness will be defined in the sequel), the new iterate is obtained by taking full NT-step as described below. Applying Newton’s method to the system (16), we have

$$\begin{aligned} A\Delta x=0, \quad A^\mathrm{T}\Delta y+\Delta s=0, \quad x\circ \Delta s+s\circ \Delta x=\mu e-x\circ s. \end{aligned}$$
(17)

The system does not always have a unique solution because \(x\) and \(s\) do not operator-commute in general, i.e. \(L(x)L(s)\ne L(s)L(x)\). To overcome this difficulty, the third equation of the system (16) is replaced by the following equivalent scaled equation (cf. Lemma 28 in [11])

$$\begin{aligned} P(u)x \circ P(u^{-1})s= \mu e, \end{aligned}$$

where \(u\) is a scaling point from the interior of the cone \(\mathcal{K}\).

Applying Newton’s method to this modified system, we have

$$\begin{aligned} A\Delta x&= 0, \nonumber \\ A^\mathrm{T}\Delta y+\Delta s&= 0, \\ P(u)x\circ P(u^{-1})\Delta s+P(u^{-1})s\circ P(u)\Delta x&= \mu e -P(u)x \circ P(u^{-1})s. \nonumber \end{aligned}$$
(18)

There are several appropriate choices for the scaling point \(u\) that lead to obtaining unique search directions from system (18) (see, e.g. [9]). In this paper, we use the classical NT-scaling point to find unique NT-direction. Let \(u:=w^{-\frac{1}{2}}\), where

$$\begin{aligned} w:=P(x)^{\frac{1}{2}}\left( P\left( x^{\frac{1}{2}}\right) s\right) ^{-\frac{1}{2}}\left[ =P(s^{-\frac{1}{2}})\left( P\left( s^{\frac{1}{2}}\right) x\right) ^{\frac{1}{2}}\right] \end{aligned}$$
(19)

is the NT-scaling point of \(x\) and \(s\). This scaling point was first proposed by Nesterov and Todd for self-scaled cones [29, 30] and then adapted by Faybusovich [7] for symmetric cones.

Introducing the variance vector

$$\begin{aligned} v:=\frac{P(w)^{-\frac{1}{2}}x}{\sqrt{\mu }} \left[ =\frac{P(w)^{\frac{1}{2}}s}{\sqrt{\mu }}\right] , \end{aligned}$$
(20)

and the scaled search directions

$$\begin{aligned} d_x:=\frac{P(w)^{-\frac{1}{2}}\Delta x}{\sqrt{\mu }},~ d_s:=\frac{P(w)^{\frac{1}{2}}\Delta s}{\sqrt{\mu }}, \end{aligned}$$
(21)

the system (18) is further simplified to

$$\begin{aligned} \overline{A} d_x=0,\quad \overline{A}^\mathrm{T}\Delta y+d_s=0, \quad d_x + d_s =v^{-1}-v, \end{aligned}$$
(22)

where \(\overline{A}:=\frac{1}{\sqrt{\mu }} AP(w)^{\frac{1}{2}}\). This system has a unique solution \((d_x, \Delta y, d_s)\).

From the first two equations of the system (22), one can easily verify that the scaled search directions \(d_x\) and \(d_s\) are orthogonal with respect to the trace inner product, i.e. \(\langle d_x, d_s\rangle =0\). Hence, we have

$$\begin{aligned} d_x=d_s=0\Leftrightarrow v^{-1}-v=0\Leftrightarrow v=e, \end{aligned}$$

which implies that the iterate \((x,y,s)\) coincides with the corresponding \(\mu \)-centre if and only if \(v=e\). The original search directions can then be obtained from (21):

$$\begin{aligned} \Delta x=\sqrt{\mu } P(w)^{\frac{1}{2}}d_x~\mathrm{and}~\Delta s=\sqrt{\mu } P(w)^{-\frac{1}{2}}d_s. \end{aligned}$$
(23)

If \((x,y, s)\ne (x(\mu ), y(\mu ), s(\mu ))\), then \((\Delta x, \Delta y, \Delta s)\) is non-zero. The new iterate is obtained by taking full NT-steps

$$\begin{aligned} x^+:=x+\Delta x,~y^+:=y+\Delta y,~\mathrm{and}~s^+:=s+\Delta s. \end{aligned}$$
(24)

To measure the distance of an iterate to the corresponding \(\mu \)-centre, a norm-based proximity measure \(\delta (x,s;\mu )\) is introduced

$$\begin{aligned} \delta (v):=\delta (x,s;\mu ):=\frac{1}{2}\left\| v^{-1}-v\right\| _F. \end{aligned}$$
(25)

One can easily verify that

$$\begin{aligned} \delta (v)=0 \Leftrightarrow v=e\Leftrightarrow d_x=d_s=0\Leftrightarrow x\circ s=\mu e, \end{aligned}$$
(26)

which implies that the value of \(\delta (v)\) can indeed be considered as a measure for the distance between the given iterate and the corresponding \(\mu \)-centre.

The \(\tau \)-neighbourhood of the central path can now be defined as

$$\begin{aligned} \mathcal {N}(\tau )\!=\!\left\{ (x,y,s)\in \mathrm{int}\mathcal{K}\!\times \! I\!R^m \times \mathrm{int}\mathcal{K}: Ax=b, A^\mathrm{T}y+s=c,\delta (v) \le \tau \right\} .\qquad \end{aligned}$$
(27)

The brief outline of the feasible algorithm given above can now be summarized as follows. At the start of the algorithm, we choose a strictly feasible point \((x^0,y^0,s^0)\) and \(\mu ^0=\frac{\langle x^0,s^0\rangle }{r}\) such that \(\delta (x^0,s^0;\mu ^0)\le \tau \) with \(0< \tau <1\). Using Newton’s method with NT-scaling, the NT-search directions are calculated, and a new iterate \((x,y,s)\) is constructed by taking a full NT-step. Then, \(\mu \) is reduced by the factor \(1-\theta \) with \(0<\theta <1\). The appropriate choice of the parameters \(\tau \) and \(\theta \) is crucial to ensure that the new iterate stays in the \(\tau \)-neighbourhood (27) of the central path. This process is repeated until \(\mu \) is small enough, say until \(r\mu <\varepsilon \); at this stage, an \(\varepsilon \)-approximate optimal solution of SO has been found.

The generic full NT-step feasible IPM is as follows:

[Algorithm: generic full NT-step feasible IPM for SO]
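The outline above can be sketched in code. The following is a minimal illustration for the LO special case (symmetric cone = nonnegative orthant), where the Jordan product is componentwise and \(P(w)^{\frac{1}{2}}=\mathrm{diag}(w)\); the problem data, starting point, and tolerance are illustrative choices, not taken from the paper:

```python
import math

# Full NT-step feasible IPM, specialized to LO (illustrative sketch).
# Problem: min <c, x> s.t. Ax = b, x >= 0, with c = (1, 2), one constraint.
A = [1.0, 1.0]                              # single equality: x1 + x2 = 2
x, y, s = [1.0, 1.0], 0.0, [1.0, 2.0]       # strictly feasible primal-dual pair
n = 2                                       # rank r (= dimension for LO)
mu = sum(xi*si for xi, si in zip(x, s))/n   # mu^0 = <x^0, s^0>/r
tau, theta, eps = 2**(-0.25), 1/math.sqrt(2*n), 1e-6   # parameters (29)

def delta(x, s, mu):                        # proximity measure (25)
    v = [math.sqrt(xi*si/mu) for xi, si in zip(x, s)]
    return 0.5*math.sqrt(sum((1/vi - vi)**2 for vi in v))

while sum(xi*si for xi, si in zip(x, s)) > eps:
    assert delta(x, s, mu) <= tau           # iterates stay in the neighbourhood
    w = [math.sqrt(xi/si) for xi, si in zip(x, s)]      # NT-scaling point (19)
    v = [math.sqrt(xi*si/mu) for xi, si in zip(x, s)]   # variance vector (20)
    rhs = [1/vi - vi for vi in v]                       # v^{-1} - v in (22)
    a_bar = [A[i]*w[i]/math.sqrt(mu) for i in range(n)]
    # d_s = projection of rhs onto the row space of A_bar; d_x = the rest,
    # so that A_bar d_x = 0 and d_x + d_s = v^{-1} - v, as in (22)
    coef = sum(ai*ri for ai, ri in zip(a_bar, rhs))/sum(ai*ai for ai in a_bar)
    d_s = [coef*ai for ai in a_bar]
    d_x = [rhs[i] - d_s[i] for i in range(n)]
    y -= coef                                           # Delta y from (22)
    x = [x[i] + math.sqrt(mu)*w[i]*d_x[i] for i in range(n)]   # (23)-(24)
    s = [s[i] + math.sqrt(mu)*d_s[i]/w[i] for i in range(n)]
    mu *= 1 - theta                                     # barrier update

print(x, sum(xi*si for xi, si in zip(x, s)))
```

On this toy problem the iterates converge to the optimal vertex \(x=(2,0)\), the duality gap equals \(n\mu \) after every full NT-step (Lemma 3.2 in the LO case), and the proximity assertion never fires, in line with Corollary 3.3.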

3.3 Analysis of the Full NT-Step Feasible IPM

The condition for strict feasibility of the full NT-step is established in the lemma below.

Lemma 3.1

(Lemma 3.3 in [8]) Let \(\delta :=\delta (x,s;\mu )<1\). Then the full NT-step is strictly feasible.

The following lemma shows that, after a full NT-step, the duality gap assumes the same value as at the \(\mu \)-centres, namely \(r\mu \).

Lemma 3.2

(Lemma 3.4 in [8]) After a full NT-step, the duality gap is given by

$$\begin{aligned} \left\langle x^+, s^+\right\rangle = r\mu . \end{aligned}$$

Lemma 3.3

(Proposition 5.6 in [31]) One has

$$\begin{aligned} v^+\sim \left( P(v+d_x)^{\frac{1}{2}}(v+d_s)\right) ^{\frac{1}{2}}. \end{aligned}$$

The symbol \(\sim \) denotes the similarity of elements in an EJA. Recall that two elements \(x\) and \(s\) in \(\mathcal {V}\) are similar, denoted by \(x\sim s\), if they share the same set of eigenvalues, including their multiplicities. Furthermore, \(v^+\) denotes the variance vector after a full NT-step, which, according to (20), is given by

$$\begin{aligned} v^+:=\frac{P(w^+)^{-\frac{1}{2}}x^+}{\sqrt{\mu }} \left[ =\frac{P(w^+)^{\frac{1}{2}}s^+}{\sqrt{\mu }}\right] , \end{aligned}$$
(28)

where \(w^+\) is the NT-scaling point of \(x^+\) and \(s^+\).

The following technical lemma is a slight modification of Lemma 30 in [11].

Lemma 3.4

Let \(x,s \in \mathrm{int}\,\mathcal{K}\). Then

$$\begin{aligned} \Vert x\circ s\Vert _F\ge \Vert P(x)^{\frac{1}{2}}s\Vert _F. \end{aligned}$$

Proof

Corollary 2.1 implies that \(L(x)\) and \(L(x^2)\) commute and the matrix

$$\begin{aligned} L\left( x^2\right) - L(x)^2 =L(x)^2-\left( 2L(x)^2-L\left( x^2\right) \right) =L(x)^2 - P(x) \end{aligned}$$

has the eigenvalues

$$\begin{aligned} \left( \frac{\lambda _i+\lambda _j}{2}\right) ^2-\lambda _i\lambda _j\ge 0,\quad 1\le i\le j\le r. \end{aligned}$$

Thus, \(L\left( x^2\right) \succeq L(x)^2\) and we have

$$\begin{aligned} \Vert x\circ s\Vert _F^2&= \langle x\circ s,x\circ s\rangle =\langle s, L(x)^2s\rangle =\langle s, 2L(x)^2s\rangle -\langle s, L(x)^2s\rangle \\&\ge \langle s, 2L(x)^2s\rangle -\langle s, L(x^2)s\rangle = \langle s, (2L(x)^2-L(x^2))s\rangle \\&= \langle s, P(x)s\rangle =\langle P(x)^{\frac{1}{2}}s,P(x)^{\frac{1}{2}}s\rangle =\Vert P(x)^{\frac{1}{2}}s\Vert _F^2. \end{aligned}$$

This completes the proof. \(\square \)

Corollary 3.1

Let \(x,s \in \mathrm{int}\, \mathcal{K}\). Then

$$\begin{aligned} \Vert (x\circ s)^{-\frac{1}{2}}\Vert _F\le \Vert (P(x)^{\frac{1}{2}}s)^{-\frac{1}{2}}\Vert _F. \end{aligned}$$

The key part of proving quadratic convergence is to investigate the effect of a full NT-step on the proximity measure \(\delta (x,s;\mu )\) with respect to the target point \((x(\mu ),y(\mu ),s(\mu ))\). For this purpose, Gu et al. [8] extended Theorem II.50 in [17] for LO to the SO case. This result is stated in the theorem below.

Theorem 3.1

(Theorem 3.6 in [8]) Let \(\delta :=\delta (x,s;\mu )<1\). Then

$$\begin{aligned} \delta (x^+,s^+;\mu )\le \frac{\delta ^2}{\sqrt{2(1-\delta ^2)}}. \end{aligned}$$

Furthermore, if \(\delta \le \frac{1}{\sqrt{2}}\), then \(\delta (x^+,s^+;\mu )\le \delta ^2\).

In what follows, we derive a sharper quadratic convergence result than the one mentioned above. Our derivation is based on the generalization of Theorem II.52 in [17] for LO. This leads to a wider quadratic convergence neighbourhood of the central path for the algorithm presented in this paper.

Theorem 3.2

Let \(\delta :=\delta (x,s;\mu )<1\). Then, the full NT-step is strictly feasible and

$$\begin{aligned} \delta (x^+,s^+;\mu )\le \frac{\delta ^2}{\sqrt{2(1-\delta ^4)}}. \end{aligned}$$

Proof

Let

$$\begin{aligned} u:=P(v+d_x)^{\frac{1}{2}}(v+d_s)~~\mathrm{and}~\bar{u}:=(v+d_x)\circ (v+d_s). \end{aligned}$$

Then, it follows from Lemma 3.3 that \(v^+\sim u^{\frac{1}{2}}\). Furthermore, we have

$$\begin{aligned} \bar{u}=v^2+v\circ (d_x+d_s)+d_x\circ d_s=v^2+v\circ (v^{-1}-v)+d_x\circ d_s=e+d_x\circ d_s. \end{aligned}$$

From (28) and Lemma 3.2, we have

$$\begin{aligned} \Vert u^{\frac{1}{2}}\Vert _F^2=\Vert v^+\Vert _F^2=\langle v^+,v^+\rangle =\frac{1}{\mu }\langle x^+,s^+\rangle =r. \end{aligned}$$

Corollary 3.1 implies that

$$\begin{aligned} \Vert u^{-\frac{1}{2}}\Vert _F^2\le \Vert \bar{u}^{-\frac{1}{2}}\Vert _F^2. \end{aligned}$$

Thus, we have

$$\begin{aligned} 4\delta (x^+,s^+;\mu )^2&= \Vert (v^+)^{-1}-v^+\Vert _F^2 =\Vert u^{-\frac{1}{2}}-u^{\frac{1}{2}}\Vert _F^2=\Vert u^{-\frac{1}{2}}\Vert _F^2+\Vert u^{\frac{1}{2}}\Vert _F^2 -2r\\&= \Vert u^{-\frac{1}{2}}\Vert _F^2-r\le \Vert \bar{u}^{-\frac{1}{2}}\Vert _F^2-r=\langle e, (e+d_x \circ d_s)^{-1}-e\rangle . \end{aligned}$$

Applying Lemma 2.5 to the last expression above, with \(u=d_x\) and \(v=d_s\), yields the result of the theorem, since \(\Vert d_x+d_s\Vert _F=2\delta \) with \(\delta <1\). This completes the proof. \(\square \)

Corollary 3.2

Let \(\delta :=\delta (x,s;\mu )\le \frac{1}{\root 4 \of {2}}\). Then the full NT-step is strictly feasible and

$$\begin{aligned} \delta (x^+,s^+;\mu )\le \delta ^2. \end{aligned}$$

The corollary above shows the quadratic convergence of the full NT-step to the target \(\mu \)-centre \((x(\mu ),y(\mu ),s(\mu ))\) in the wider neighbourhood determined by \(1/\root 4 \of {2}\) as opposed to \(1/\sqrt{2}\) in [8].
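The improvement of Theorem 3.2 over Theorem 3.1 can also be seen numerically: the new bound is smaller for every \(\delta \in (0,1)\), and it stays below \(\delta ^2\) up to the wider threshold \(1/\root 4 \of {2}\approx 0.8409\) (the sample points below are illustrative):

```python
import math

f_old = lambda d: d*d/math.sqrt(2*(1 - d*d))    # bound of Theorem 3.1 [8]
f_new = lambda d: d*d/math.sqrt(2*(1 - d**4))   # bound of Theorem 3.2

for d in [0.3, 0.6, 0.84]:                      # 0.84 < 1/2**0.25 ~ 0.8409
    assert f_new(d) <= f_old(d)                 # the new bound is sharper
    assert f_new(d) <= d*d                      # quadratic convergence holds
# the old bound already exceeds delta^2 beyond 1/sqrt(2) ~ 0.7071:
assert f_old(0.75) > 0.75**2
```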

In the previous theorem, \(\mu \) was kept fixed while \((x,y,s)\) was updated to \((x^+,y^+,s^+)\). In the following theorem, the effect on the proximity measure is investigated when \((x,y,s)\) is kept fixed and \(\mu \) is updated to \(\mu ^+=(1-\theta )\mu \). The theorem is a generalization of Lemma II.54 in [17].

Theorem 3.3

Let \(\delta :=\delta (x,s;\mu )<1\) and \(\mu ^+=(1-\theta )\mu \) with \(0<\theta <1\). Then

$$\begin{aligned} \delta (x,s;\mu ^+)^2= (1-\theta )\delta ^2+\frac{r\theta ^2}{4(1-\theta )}. \end{aligned}$$

Proof

From (20), we have \(\mu \,\mathrm{tr}(v^2)=\mathrm{tr}(x\circ s)=\langle x,s\rangle \). Since \(\langle x,s\rangle =r\mu \) at the iterates generated by the algorithm (cf. Lemma 3.2), this shows that \(\mathrm{tr}(v^2)=\mathrm{tr}(e)=r\). Therefore, \(\langle v, v^{-1}-v\rangle =\mathrm{tr}(e)-\mathrm{tr}(v^2)=0\), i.e. \(v\) and \(v^{-1}-v\) are orthogonal with respect to the trace inner product. Thus, we have

$$\begin{aligned} 4\delta (x,s;\mu ^+)^2&= \left\| \sqrt{1-\theta }v^{-1}-\frac{v}{\sqrt{1-\theta }}\right\| _F^2 =\left\| \sqrt{1-\theta }(v^{-1}-v)+\frac{\theta v}{\sqrt{1-\theta }}\right\| _F^2\\&= (1-\theta )\Vert (v^{-1}-v)\Vert _F^2+\frac{\theta ^2 \Vert v\Vert _F^2}{1-\theta } =4(1-\theta )\delta ^2+\frac{r\theta ^2}{1-\theta }. \end{aligned}$$

This completes the proof. \(\square \)

Corollary 3.3

Let \(\delta :=\delta (x,s;\mu ) \le \frac{1}{\root 4 \of {2}}\) and \(\theta =\frac{1}{\sqrt{2r}}\) with \(r\ge 2\). Then

$$\begin{aligned} \delta (x^+,s^+;\mu ^+ )\le \frac{1}{\root 4 \of {2}}. \end{aligned}$$

Proof

From Corollary 3.2 and the fact that \(\delta \le \frac{1}{\root 4 \of {2}}\), we have

$$\begin{aligned} \delta (x^+,s^+;\mu )^2\le (\delta ^2 )^2=\delta ^4\le \frac{1}{2}. \end{aligned}$$

Then, after the barrier parameter is updated to \(\mu ^+=(1-\theta )\mu \), with \(\theta =\frac{1}{\sqrt{2r}}\), Theorem 3.3 yields the following upper bound for \(\delta (x^+,s^+;\mu ^+)^2\):

$$\begin{aligned} \delta (x^+,s^+;\mu ^+ )^2\le \frac{1-\theta }{2}+\frac{1}{8(1-\theta )}\le \mathrm{max} \left\{ \frac{5}{8},\frac{1}{2}\right\} =\frac{5}{8}. \end{aligned}$$

The last inequality holds due to the fact that the left-hand side is a convex function of \(\theta \), whose values are \(5/8\) and \(1/2\) at \(\theta = 0\) and \(\theta = 1/2\), respectively. Since \(\theta \in [0,1/2]\), the left-hand side does not exceed \(5/8\). Note that \(5/8<1/\sqrt{2}\). Hence, the desired inequality follows

$$\begin{aligned} \delta \left( x^+,s^+;\mu ^+\right) \le \frac{1}{\root 4 \of {2}}. \end{aligned}$$

This completes the proof. \(\square \)
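The elementary bound used in the last step of the proof is easy to confirm numerically (the grid below is an illustrative discretization of \([0,1/2]\)):

```python
import math

g = lambda t: (1 - t)/2 + 1/(8*(1 - t))   # upper bound on delta(x+,s+;mu+)^2
assert g(0.0) == 5/8 and g(0.5) == 0.5    # endpoint values from the proof
# convex in t, so the maximum over [0, 1/2] is attained at an endpoint:
assert max(g(0.0005*k) for k in range(1001)) <= 5/8
assert 5/8 < 1/math.sqrt(2)               # hence delta <= 1/2**0.25
print(g(0.0), g(0.5))                     # 0.625 0.5
```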

The consequence of the above results is that if we set the values of the threshold parameter and the barrier update parameter to

$$\begin{aligned} \tau =\frac{1}{\root 4 \of {2}}~~\mathrm{and}~\theta =\frac{1}{\sqrt{2r}}, \end{aligned}$$
(29)

the algorithm will generate a sequence of iterates that are not only feasible but also always lie in the \(\tau \)-neighbourhood (27) of the central path. Moreover, the duality gap is reduced according to \(\langle x^+, s^+\rangle =r \mu \). Hence, the algorithm is well defined and globally convergent.

Similar to the proof of Lemma 4.7 in [12], we can easily verify the validity of the following lemma.

Lemma 3.5

Let \(x^0\) and \(s^0\) be strictly feasible, let \(\mu ^0=\frac{\langle x^0, s^0\rangle }{r}\), and let \(\delta (x^0,s^0;\mu ^0)\le \frac{1}{\root 4 \of {2}}\). Moreover, let \((x^k,y^k,s^k)\) be the \(k\)-th iterate of the algorithm. Then the inequality \(\langle x^k, s^k\rangle \le \varepsilon \) is satisfied whenever

$$\begin{aligned} k\ge \frac{1}{\theta }\log {\frac{\langle x^0, s^0\rangle }{\varepsilon }}. \end{aligned}$$

From Lemma 3.5, we have the following theorem which provides an upper bound for the total number of the iterations produced by the algorithm.

Theorem 3.4

Let \(\tau =\frac{1}{\root 4 \of {2}}\) and \(\theta =\frac{1}{\sqrt{2r}}\) with \(r\ge 2\). Then the algorithm requires at most

$$\begin{aligned} \sqrt{2r}\log {\frac{\langle x^0,s^0\rangle }{\varepsilon }} \end{aligned}$$

iterations to obtain an iterate \((x,y,s)\) satisfying \(\langle x,s\rangle \le \varepsilon \), which is an \(\varepsilon \)-approximate optimal solution of (SOP) and (SOD).
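For concreteness, the iteration bound of Theorem 3.4 can be evaluated for sample data; the values of \(r\), the initial gap, and \(\varepsilon \) below are illustrative:

```python
import math

r, gap0, eps = 25, 100.0, 1e-8              # illustrative problem sizes
theta = 1/math.sqrt(2*r)
bound = math.sqrt(2*r)*math.log(gap0/eps)   # sqrt(2r) * log(<x0,s0>/eps)
k = math.ceil(bound)
# k iterations of the geometric factor (1 - theta) reduce the gap below eps:
assert gap0*(1 - theta)**k <= eps
print(k)                                    # 163
```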

Thus, the feasible algorithm is well defined and globally convergent, and it achieves quadratic convergence of the full NT-steps in the wider neighbourhood while still maintaining the currently best iteration bound known for these types of methods, namely

$$\begin{aligned} O\left( \sqrt{r}\log {\frac{r\mu ^0}{\varepsilon }}\right) . \end{aligned}$$

Although the neighbourhood is wider, it is not substantially wider; hence, the algorithm is still a small-update method.

4 Conclusions

In this paper, we have presented an improved convergence and complexity analysis of the full NT-step feasible IPM for SO discussed in [8]. It has been shown that the iterates of this modified version of the algorithm can be taken in a wider quadratic convergence neighbourhood of the central path, characterized by \(1/\root 4 \of {2}\) as opposed to \(1/\sqrt{2}\) in [8], while the currently best iteration bound known for feasible algorithms of this type is still achieved. For the analysis of the algorithm, some new results on EJAs had to be developed, including Lemmas 2.3 and 2.5 as two important ones. Moreover, these results are interesting in their own right.

Some interesting topics for further research include the design and analysis of a full NT-step infeasible IPM with polynomial iteration bound and the generalization of these algorithms to the Cartesian \(P_*(\kappa )\)-linear complementarity problem over symmetric cones [32, 33].