Abstract
In this paper, we develop a primal-dual central trajectory interior-point algorithm for symmetric programming problems and establish its complexity. The main contribution of the paper is that it uniquely equips the central trajectory algorithm with various selections of the displacement step while solving symmetric programming problems. To show the efficiency of the proposed algorithm, these selections for calculating the displacement step are compared in numerical examples for second-order cone programming, which is a special case of symmetric programming.
1 Introduction
Symmetric programming problems [1, 2] are convex optimization problems in which we optimize a linear objective function over the intersection of an affine linear manifold with the Cartesian product of symmetric cones. Linear programming, second-order cone programming [3, 4] and semidefinite programming [5, 6] are well known and important special cases of symmetric programs. Interior-point methods are considered one of the effective methods developed for general symmetric programming problems (see for example [1, 2, 7,8,9]). Complete interior-point methods have also been developed for certain classes of optimization problems over symmetric cones; see, for example, [10,11,12,13] for linear programming, [3, 4, 14,15,16,17,18] for second-order cone programming, and [5, 6, 19,20,21,22] for semidefinite programming.
Interior-point methods consist of two elements: namely, calculating a step or search direction that places one in the direction of a central path, and calculating a displacement step-size in order to enforce some sort of feasibility (more specifically, the positive definiteness) of the subsequent iterate. Throughout this paper, we use the term “displacement step” instead of “displacement step-size” for the sake of simplicity. In fact, the main challenge to be faced in establishing an iteration of an interior-point algorithm is the determination and computation of the displacement step along the search direction. Calculating the displacement step using classical line search methods is undesirable and even generally impossible [20, 23]. Touil et al. [23] developed an interior-point algorithm for semidefinite programming and presented some strategies for determining the displacement step for the developed algorithm. Extensions of interior-point methods for semidefinite programming to general symmetric cones can be found in the literature (see for example [14,15,16,17,18]), but only in terms of calculating the search direction, not of preserving geometric feasibility conditions thereafter. This paper presents some strategies for determining the displacement step for a particular interior-point method on general symmetric cones. So, our focus in this paper is not on the search direction, but on the displacement step. In other words, we are not developing a new step, but a new step size.
We can separate two types of interior-point methods: projective methods [10,11,12, 24, 25] and (feasible or infeasible) central trajectory methods [19, 23, 26, 27]. The interior-point method developed by Touil et al. [23] for semidefinite programming is a feasible central trajectory algorithm. The focus in this paper is on solving symmetric programming by feasible central trajectory methods. More specifically, this work exploits the Jordan algebraic structure of the symmetric cone to derive a feasible primal-dual central trajectory algorithm for symmetric programming and to present some strategies for determining the displacement step for the proposed algorithm by extending the work of Touil et al. [23] for the semidefinite case to the general symmetric case.
In this paper, we associate a perturbed problem with the symmetric programming problem and apply Newton’s method to the corresponding perturbed equations in order to obtain a descent search direction. The main contribution of this paper is that it uniquely proposes four different selections for calculating the displacement step along the search direction while solving symmetric programming. Our work is not a trivial generalization of that in [23], however, because a deep understanding of Euclidean Jordan algebras is essential to establish the analysis of this paper. The main difficulties in developing our analysis are caused by establishing and proving the selections for determining the displacement step in the framework of Euclidean Jordan algebras.
There are widely available software packages that can handle symmetric programs in general, and second-order cone programs in particular. Optimization solvers such as MOSEK [28] and SDPT3 [29] are based on interior-point methods and are well suited for symmetric programming problems. The efficiency of our proposed algorithm is shown by presenting numerical experiments on second-order cone programs and comparing with SDPT3 and MOSEK.
This paper is organized as follows. In Sect. 2, we briefly review the basics of the Euclidean Jordan algebra associated with the symmetric cone and provide some preliminary results. In Sect. 3, we introduce the problem formulation and make some assumptions, then we apply Newton’s method to the perturbed problem. In Sect. 4, we propose four different selections for calculating the appropriate displacement step. Section 5 is devoted to presenting a feasible primal-dual central trajectory algorithm. The complexity analysis of the proposed algorithm is presented in Sect. 6. In Sect. 7, we compare the four selections for computing the displacement step and show the efficiency of the proposed algorithm in some numerical examples for second-order cone programs. Section 8 contains some concluding remarks.
2 Background and preliminary results
In this section, we provide the necessary background for the subsequent sections. First, we review some statistical inequalities needed for deriving some results in the work. Then, we briefly review the definitions of self-duality, homogeneity and symmetric cones. Finally, we review the basics of the theory of Euclidean Jordan algebras.
2.1 Preliminary statistical inequalities
In this part, we review some statistical inequalities needed for the derivation of some of the results in the paper.
Let \(x_1, x_2, \ldots , x_n \in {\mathbb {R}}\) be a sample of size n. Then its mean \(\bar{x}\) and its standard deviation \(\sigma _{\varvec{x}}\) are respectively defined as
\[ \bar{x} := \frac{1}{n} \sum _{k=1}^{n} x_k \qquad \text {and} \qquad \sigma _{\varvec{x}} := \sqrt{\frac{1}{n} \sum _{k=1}^{n} \left( x_k - \bar{x} \right) ^2}. \]
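These two quantities are easy to compute directly; the following sketch (plain Python, our own function names, assuming the \(1/n\) “population” normalization of the standard deviation) is a numerical companion, not part of the paper’s development:

```python
import math

def mean(x):
    """Sample mean of a list of reals."""
    return sum(x) / len(x)

def std(x):
    """Standard deviation with the 1/n (population) normalization."""
    m = mean(x)
    return math.sqrt(sum((xk - m) ** 2 for xk in x) / len(x))

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
x_bar, sigma_x = mean(x), std(x)
# For this sample: x_bar = 5.0 and sigma_x = 2.0
```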
The first statement in the following proposition is due to [30] and the second statement is due to [20, Theorem 5].
Proposition 2.1
Assume that \(\varvec{x} \in {\mathbb {R}}^n\), then we have
In particular, if \(x_k>0\) for all \(k=1, 2, \ldots , n\), then we also have
2.2 Preliminary conical definitions
A cone is said to be closed iff it is closed under the taking of sequential limits, convex iff it is closed under taking convex combinations, pointed iff it does not contain two opposite nonzero vectors (so the origin is an extreme point), solid iff it has a nonempty interior, and regular iff it is a closed, convex, pointed, solid cone.
Definition 2.1
Let \(\mathcal {V}\) be a finite-dimensional Euclidean vector space over \({\mathbb {R}}\) with an inner product “\(\langle \varvec{\cdot }, \varvec{\cdot } \rangle \)”. The dual cone of a regular cone \(\mathcal {K} \subset \mathcal {V}\) is denoted by \(\mathcal {K}^\star \) and is defined as
\[ \mathcal {K}^\star := \left\{ \varvec{s} \in \mathcal {V} : \langle \varvec{s}, \varvec{x} \rangle \ge 0 \; \text {for all} \; \varvec{x} \in \mathcal {K} \right\} . \]
A regular cone \(\mathcal {K}\) is said to be self-dual if it coincides with its dual cone, i.e., \(\mathcal {K} = \mathcal {K}^\star \).
The general linear group of degree n over \({\mathbb {R}}\) is denoted by \(GL(n, {\mathbb {R}})\) and is defined to be the set of all nonsingular matrices of order n with entries from \({\mathbb {R}}\), together with the operation of ordinary matrix multiplication. For a regular cone \(\mathcal {K} \subset \mathcal {V}\), we denote by \(\text {Aut}(\mathcal {K})\) the automorphism group of \(\mathcal {K}\), i.e.,
\[ \text {Aut}(\mathcal {K}) := \left\{ F \in GL(n, {\mathbb {R}}) : F(\mathcal {K}) = \mathcal {K} \right\} . \]
Definition 2.2
Let \(\mathcal {V}\) be a finite-dimensional real Euclidean space. A regular cone \(\mathcal {K} \subset \mathcal {V}\) is said to be homogeneous if for each \(\varvec{u}, \varvec{v} \in \text {int}\,\mathcal {K}\), there exists a nonsingular linear map \(F: \mathcal {V}\longrightarrow \mathcal {V}\) such that
\[ F(\mathcal {K}) = \mathcal {K} \quad \text {and} \quad F(\varvec{u}) = \varvec{v}. \]
In other words, \(\text {Aut}(\mathcal {K})\) acts transitively on the interior of \(\mathcal {K}\).
A regular cone \(\mathcal {K}\) is said to be symmetric if it is both self-dual and homogeneous.
2.3 Preliminaries of Euclidean Jordan algebras
In this subsection, we review some basic definitions and notions from the theory of Jordan algebras. The text of Faraut and Korányi [31] covers the foundations of this theory. Our presentation in this subsection follows that of [1, Section 2]. In order to make our presentation concrete, we give some examples from the algebra of the second-order cone as an interesting paradigm for understanding the theory of Jordan algebras.
We use “,” for adjoining scalars, vectors and matrices in a row, and use “;” for adjoining them in a column. So, for instance, if \(\varvec{u}\) and \(\varvec{v}\) are vectors, we have
\[ (\varvec{u}; \varvec{v}) = \left( \varvec{u}^{\mathsf{T}}, \varvec{v}^{\mathsf{T}} \right) ^{\mathsf{T}}. \]
Let \(\mathcal {J}\) be a finite-dimensional vector space over \({\mathbb {R}}\). A map \(\circ : \mathcal {J} \times \mathcal {J} \longrightarrow \mathcal {J}\) is called bilinear if \((\alpha \varvec{x} + \beta \varvec{y})\circ \varvec{z}=\alpha (\varvec{x} \circ \varvec{z})+\beta (\varvec{y} \circ \varvec{z})\) and \(\varvec{x} \circ (\alpha \varvec{y} + \beta \varvec{z})=\alpha (\varvec{x} \circ \varvec{y})+\beta (\varvec{x} \circ \varvec{z})\) for all \(\varvec{x}, \varvec{y}, \varvec{z} \in \mathcal {J}\) and \(\alpha , \beta \in {\mathbb {R}}\). The vector space \(\mathcal {J}\) over \({\mathbb {R}}\) is called an algebra over \({\mathbb {R}}\) if a bilinear map \(\circ : \mathcal {J} \times \mathcal {J} \longrightarrow \mathcal {J}\) exists.
Let \(\varvec{x}\) be an element in an algebra \(\mathcal {J}\), then we define \(\varvec{x}^{(1)}:=\varvec{x}\) and define \(\varvec{x}^{(n)}\) recursively by \(\varvec{x}^{(n)} :=\varvec{x} \circ \varvec{x}^{(n-1)}\) for \(n \ge 2\).
Definition 2.3
Let \(\mathcal {J}\) be an algebra over \({\mathbb {R}}\) with a bilinear product \(\circ : \mathcal {J} \times \mathcal {J} \longrightarrow \mathcal {J}\). Then \((\mathcal {J}, \circ )\) is called a Jordan algebra if for all \(\varvec{x}, \varvec{y} \in \mathcal {J}\) we have
\[ \varvec{x} \circ \varvec{y} = \varvec{y} \circ \varvec{x} \quad \text {(commutativity)} \qquad \text {and} \qquad \varvec{x} \circ (\varvec{x}^{(2)} \circ \varvec{y}) = \varvec{x}^{(2)} \circ (\varvec{x} \circ \varvec{y}) \quad \text {(Jordan's axiom)}. \]
The product \(\varvec{x} \circ \varvec{y}\) between two elements \(\varvec{x}\) and \(\varvec{y}\) of a Jordan algebra \((\mathcal {J}, \circ )\) is called the Jordan multiplication between \(\varvec{x}\) and \(\varvec{y}\). A Jordan algebra \((\mathcal {J}, \circ )\) has an identity element if there exists a (necessarily unique) element \(\varvec{e}\in \mathcal {J}\) such that \(\varvec{x} \circ \varvec{e} = \varvec{x}\) for all \(\varvec{x} \in \mathcal {J}\). A Jordan algebra \((\mathcal {J}, \circ )\) is not necessarily associative, that is, \(\varvec{x} \circ (\varvec{y} \circ \varvec{z}) =(\varvec{x} \circ \varvec{y}) \circ \varvec{z}\) may not generally hold. However, it is power associative, i.e., \(\varvec{x}^{(p)} \circ \varvec{x}^{(q)}=\varvec{x}^{(p+q)}\) for all integers \(p, q \ge 1\).
Example 2.1
For each vector \(\varvec{x} \in {\mathbb {R}}^n\) whose first entry is indexed with 0, we write \(\widetilde{\varvec{x}}\) for the subvector consisting of entries 1 through \(n-1\) (therefore \(\varvec{x} = (x_0; \widetilde{\varvec{x}}) \in {\mathbb {R}}\times {\mathbb {R}}^{{n}-1}\)). We let \(\mathcal {E}^{n}\) denote the n-dimensional real vector space \({\mathbb {R}}\times {\mathbb {R}}^{{n}-1}\) whose vectors \(\varvec{x}\) are indexed from 0. Consider the multiplication \(\circ : \mathcal {E}^n \times \mathcal {E}^n \longrightarrow \mathcal {E}^n\) defined as
\[ \varvec{x} \circ \varvec{y} := \left( \varvec{x}^{\mathsf{T}} \varvec{y}; \; x_0 \widetilde{\varvec{y}} + y_0 \widetilde{\varvec{x}} \right) \qquad (1) \]
for \(\varvec{x}, \varvec{y} \in \mathcal {E}^n\). It is easy to see that the space \(\mathcal {E}^{n}\) with the multiplication “\(\circ \)” forms a Jordan algebra with the identity vector \( \varvec{e} := \left( 1; \varvec{0} \right) .\)
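The multiplication above is easy to experiment with. The minimal NumPy sketch below (function names are ours) checks the identity vector, commutativity, and the Jordan axiom of Definition 2.3 numerically:

```python
import numpy as np

def jmul(x, y):
    """Jordan multiplication on E^n: x∘y = (xᵀy; x0·ỹ + y0·x̃)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

e = np.array([1.0, 0.0, 0.0])      # identity vector (1; 0)
x = np.array([2.0, 1.0, -0.5])
y = np.array([1.5, 0.3, 2.0])

assert np.allclose(jmul(x, e), x)           # e is the identity
assert np.allclose(jmul(x, y), jmul(y, x))  # commutativity
x2 = jmul(x, x)
# Jordan's axiom: x ∘ (x² ∘ y) = x² ∘ (x ∘ y)
assert np.allclose(jmul(x, jmul(x2, y)), jmul(x2, jmul(x, y)))
```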
Definition 2.4
A Jordan algebra \(\mathcal {J}\) is called Euclidean if there exists a map \( \bullet :(\mathcal {J}, \circ ) \times (\mathcal {J}, \circ )\longrightarrow {\mathbb {R}}\) such that for all \(\varvec{x}, \varvec{y}, \varvec{z} \in \mathcal {J}\) we have
(1) \(\varvec{x} \bullet \varvec{x} > 0\) for all \(\varvec{x} \ne \varvec{0}\) (positive definiteness);
(2) \(\varvec{x}\bullet \varvec{y} = \varvec{y}\bullet \varvec{x}\) (symmetry);
(3) \(\varvec{x}\bullet (\varvec{y} \circ \varvec{z}) =(\varvec{x} \circ \varvec{y})\bullet \varvec{z}\) (associativity).
That is, \(\mathcal {J}\) admits a symmetric, positive definite bilinear form “\(\bullet \)” which is also associative.
The degree of an element \(\varvec{x} \in \mathcal {J}\) (denoted by \(\text {deg}(\varvec{x})\)) is the smallest integer d such that the set \(\left\{ \varvec{e}, \varvec{x}, \varvec{x}^{(2)}, \ldots , \varvec{x}^{(d)} \right\} \) is linearly dependent. The rank of a Jordan algebra \(\mathcal {J}\) (denoted by \(\text {rank}(\mathcal {J})\)) is the largest \(\text {deg}(\varvec{x})\) of any element \(\varvec{x} \in \mathcal {J}\). An element \(\varvec{x} \in \mathcal {J}\) is called regular if \(\deg (\varvec{x})= \text {rank}(\mathcal {J})\). Throughout the rest of this section, we let \(\mathcal {J}\) be a rank-r Euclidean Jordan algebra.
Let \(\varvec{x}\) be an element of degree d in a Euclidean Jordan algebra \(\mathcal {J}\). Since \(\{\varvec{e}, \varvec{x}, \varvec{x}^{(2)}, \ldots , \varvec{x}^{(d)}\}\) is linearly dependent, there are real numbers \(a_1(\varvec{x}), a_2(\varvec{x}), \ldots , a_d(\varvec{x})\), not all zero, such that
\[ \varvec{x}^{(d)} - a_1(\varvec{x})\, \varvec{x}^{(d-1)} + a_2(\varvec{x})\, \varvec{x}^{(d-2)} + \cdots + (-1)^d a_d(\varvec{x})\, \varvec{e} = \varvec{0}. \]
Setting \(q(\lambda ) := \lambda ^d - a_1(\varvec{x}) \lambda ^{d-1} + a_2(\varvec{x}) \lambda ^{d-2} + \cdots + (-1)^d a_d(\varvec{x})\), we have \(q(\varvec{x}) = \varvec{0}\).
We call q the minimal polynomial of \(\varvec{x}\). If \(\varvec{x}\) is a regular element of a Euclidean Jordan algebra, then we define its characteristic polynomial to be equal to its minimal polynomial. Since the set of regular elements is dense in \(\mathcal {J}\), we can extend characteristic polynomials to all \(\varvec{x}\) in \(\mathcal {J}\). So, the minimal polynomial coincides with the characteristic polynomial for regular elements and divides the characteristic polynomial of non-regular elements. Let \(\varvec{x}\) be an element in a rank-r algebra \(\mathcal {J}\), then its eigenvalues are the roots \(\lambda _1, \lambda _2, \ldots , \lambda _r\) of its characteristic polynomial \(p(\lambda ) =\lambda ^r - a_1(\varvec{x}) \lambda ^{r-1} + a_2(\varvec{x}) \lambda ^{r-2} + \cdots + (-1)^r a_r(\varvec{x})\).
Two elements \(\varvec{c}_1, \varvec{c}_2 \in \mathcal {J}\) are said to be orthogonal if \(\varvec{c}_1 \circ \varvec{c}_2 = \varvec{0}\). A set of elements of \(\mathcal {J}\) is orthogonal if all its elements are mutually orthogonal to each other. An element \(\varvec{c} \in \mathcal {J}\) is said to be an idempotent if \(\varvec{c}^{(2)}=\varvec{c}\). An idempotent is primitive if it is non-zero and cannot be written as a sum of two (necessarily orthogonal) non-zero idempotents. A subset \(\{ \varvec{c}_1, \varvec{c}_2, \ldots , \varvec{c}_k\}\) of \(\mathcal {J}\) is called a Jordan frame if it is a complete system (i.e., \(\varvec{c}_1 + \varvec{c}_2 + \cdots + \varvec{c}_k = \varvec{e}\)) of orthogonal primitive idempotents.
Note that Jordan frames always have r primitive idempotents in them, so in Jordan frames \(k=r\).
Theorem 2.1
(Spectral decomposition I, [31, Theorem 6]). Let \(\mathcal {J}\) be a Euclidean Jordan algebra. Then for each \(\varvec{x} \in \mathcal {J}\) there exist unique real numbers \(\lambda _1(\varvec{x}), \lambda _2(\varvec{x}), \ldots , \lambda _k(\varvec{x})\), all distinct, and a unique complete system of orthogonal idempotents \(\varvec{c}_1(\varvec{x}), \varvec{c}_2(\varvec{x}), \ldots , \varvec{c}_k(\varvec{x})\) such that
\[ \varvec{x} = \lambda _1(\varvec{x})\, \varvec{c}_1(\varvec{x}) + \lambda _2(\varvec{x})\, \varvec{c}_2(\varvec{x}) + \cdots + \lambda _k(\varvec{x})\, \varvec{c}_k(\varvec{x}). \]
Theorem 2.2
(Spectral decomposition II, [31, Theorem 7]). Let \(\mathcal {J}\) be a rank-r Euclidean Jordan algebra. Then for each \(\varvec{x} \in \mathcal {J}\) there exist real numbers \(\lambda _1(\varvec{x}), \lambda _2(\varvec{x}), \ldots , \lambda _r(\varvec{x})\) and a Jordan frame \(\varvec{c}_1(\varvec{x}), \varvec{c}_2(\varvec{x}), \ldots , \varvec{c}_r(\varvec{x})\) such that
\[ \varvec{x} = \lambda _1(\varvec{x})\, \varvec{c}_1(\varvec{x}) + \lambda _2(\varvec{x})\, \varvec{c}_2(\varvec{x}) + \cdots + \lambda _r(\varvec{x})\, \varvec{c}_r(\varvec{x}), \qquad (2) \]
and \(\lambda _1, \lambda _2, \ldots , \lambda _r\) are the eigenvalues of \(\varvec{x}\).
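In the algebra \(\mathcal {E}^n\) of Example 2.1 this decomposition is explicit: for \(\widetilde{\varvec{x}} \ne \varvec{0}\) the eigenvalues are \(x_0 \pm \Vert \widetilde{\varvec{x}}\Vert \) and the Jordan frame is \(\frac{1}{2}(1; \pm \widetilde{\varvec{x}}/\Vert \widetilde{\varvec{x}}\Vert )\). A minimal NumPy sketch (function names are ours) verifying the reconstruction and the trace and determinant formulas:

```python
import numpy as np

def spectral(x):
    """Spectral decomposition in E^n (x̃ ≠ 0): eigenvalues x0 ± ||x̃||
    with Jordan frame c± = (1/2)(1; ±x̃/||x̃||)."""
    w = np.linalg.norm(x[1:])
    u = x[1:] / w
    lam = np.array([x[0] + w, x[0] - w])
    c1 = 0.5 * np.concatenate(([1.0], u))
    c2 = 0.5 * np.concatenate(([1.0], -u))
    return lam, (c1, c2)

x = np.array([3.0, 1.0, 2.0, -2.0])   # ||x̃|| = 3
lam, (c1, c2) = spectral(x)

assert np.allclose(lam[0] * c1 + lam[1] * c2, x)        # x = λ1 c1 + λ2 c2
assert np.allclose(c1 + c2, np.array([1.0, 0, 0, 0]))   # complete system: c1 + c2 = e
assert np.isclose(lam.sum(), 2 * x[0])                  # trace(x) = 2 x0
assert np.isclose(lam.prod(), x[0]**2 - x[1:] @ x[1:])  # det(x) = x0² − ||x̃||²
```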
The elements \(\varvec{x}\) and \(\varvec{y}\) of a Jordan algebra are called simultaneously decomposed if they share a Jordan frame, i.e., \(\varvec{x} = \lambda _1(\varvec{x})\, \varvec{c}_1 + \cdots + \lambda _r(\varvec{x})\, \varvec{c}_r\) and \(\varvec{y} = \lambda _1(\varvec{y})\, \varvec{c}_1 + \cdots + \lambda _r(\varvec{y})\, \varvec{c}_r\) for a Jordan frame \(\{\varvec{c}_1, \ldots ,\varvec{c}_r\}\) (hence, we have \(\varvec{c}_i(\varvec{x})=\varvec{c}_i(\varvec{y})\) for each \(i=1, \ldots , r\)). Two elements \(\varvec{x}\) and \(\varvec{y}\) are said to operator commute if for all \(\varvec{z}\), we have that \(\varvec{x} \circ (\varvec{y}\circ \varvec{z}) = \varvec{y} \circ (\varvec{x} \circ \varvec{z})\). Two elements of a Euclidean Jordan algebra are simultaneously decomposed iff they operator commute [1, Theorem 27].
Let \(\varvec{x}\) be an element in a Jordan algebra \(\mathcal {J}\) with the spectral decomposition given in (2). Then \(\text {trace}(\varvec{x}):=\lambda _1(\varvec{x}) +\cdots + \lambda _r(\varvec{x})\) is called the trace of \(\varvec{x}\) in \(\mathcal {J}\), and \(\det (\varvec{x}):=\lambda _1(\varvec{x}) \cdots \lambda _r(\varvec{x})\) is the determinant of \(\varvec{x}\) in \(\mathcal {J}\).
For a Euclidean Jordan algebra \(\mathcal {J}\), we define the Frobenius inner product between two elements \(\varvec{x}, \varvec{y} \in \mathcal {J}\) as
\[ \varvec{x} \bullet \varvec{y} := \text {trace}(\varvec{x} \circ \varvec{y}). \]
Due to the bilinearity of “\(\circ \)”, one can show that the map “\(\bullet \)” satisfies the three conditions in Definition 2.4. (See [1, Section 2]).
Example 2.2
One can easily show that the algebra \(\mathcal {E}^{n}\), with the Jordan multiplication “\(\,\circ \)” defined in (1), forms a Euclidean Jordan algebra under the inner product \(\varvec{x} \bullet \varvec{y} =2 \varvec{x}^\mathsf{T }\varvec{y}=2\left( x_0y_0+\varvec{{\widetilde{x}}}^\mathsf{T }\varvec{{\widetilde{y}}}\right) \).
Now, for \(\varvec{x} \in \mathcal {J}\) having the spectral decomposition given in (2), we can define \(\varvec{x}^2\) as
\[ \varvec{x}^2 := \lambda _1^2(\varvec{x})\, \varvec{c}_1(\varvec{x}) + \lambda _2^2(\varvec{x})\, \varvec{c}_2(\varvec{x}) + \cdots + \lambda _r^2(\varvec{x})\, \varvec{c}_r(\varvec{x}). \]
One can easily see that \(\varvec{x}^2 = \varvec{x} \circ \varvec{x}= \varvec{x}^{(2)}\).
If \(\det (\varvec{x}) \ne 0\) (i.e., all \(\lambda _i(\varvec{x}) \ne 0\)), then we say that \(\varvec{x}\) is nonsingular. In this case, the inverse of \(\varvec{x}\), which is denoted by \(\varvec{x}^{-1}\), is the unique element that satisfies \(\varvec{x}^{-1}\circ \varvec{x} = \varvec{e}\). Therefore,
\[ \varvec{x}^{-1} = \lambda _1^{-1}(\varvec{x})\, \varvec{c}_1(\varvec{x}) + \lambda _2^{-1}(\varvec{x})\, \varvec{c}_2(\varvec{x}) + \cdots + \lambda _r^{-1}(\varvec{x})\, \varvec{c}_r(\varvec{x}). \]
More generally, if \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) is continuous, then it is also possible to extend the above definition to define \(f(\varvec{x})\) as
\[ f(\varvec{x}) := f(\lambda _1(\varvec{x}))\, \varvec{c}_1(\varvec{x}) + f(\lambda _2(\varvec{x}))\, \varvec{c}_2(\varvec{x}) + \cdots + f(\lambda _r(\varvec{x}))\, \varvec{c}_r(\varvec{x}). \]
In particular, the square root of \(\varvec{x}\) is defined as
\[ \varvec{x}^{1/2} := \sqrt{\lambda _1(\varvec{x})}\, \varvec{c}_1(\varvec{x}) + \sqrt{\lambda _2(\varvec{x})}\, \varvec{c}_2(\varvec{x}) + \cdots + \sqrt{\lambda _r(\varvec{x})}\, \varvec{c}_r(\varvec{x}), \]
provided that \(\lambda _i (\varvec{x}) \ge 0, i=1, 2, \ldots , r\).
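Continuing Example 2.1, \(f(\varvec{x})\) can be evaluated through the explicit spectral decomposition of \(\mathcal {E}^n\). The sketch below (plain NumPy, helper names ours) builds \(\varvec{x}^{-1}\) and \(\varvec{x}^{1/2}\) this way and verifies \(\varvec{x} \circ \varvec{x}^{-1} = \varvec{e}\) and \(\varvec{x}^{1/2} \circ \varvec{x}^{1/2} = \varvec{x}\):

```python
import numpy as np

def jmul(x, y):
    """Jordan multiplication on E^n: x∘y = (xᵀy; x0·ỹ + y0·x̃)."""
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def apply_f(x, f):
    """f(x) := f(λ1) c1 + f(λ2) c2 via the spectral decomposition of E^n."""
    w = np.linalg.norm(x[1:])
    l1, l2 = x[0] + w, x[0] - w
    u = x[1:] / w if w > 0 else np.zeros_like(x[1:])
    c1 = 0.5 * np.concatenate(([1.0], u))
    c2 = 0.5 * np.concatenate(([1.0], -u))
    return f(l1) * c1 + f(l2) * c2

e = np.array([1.0, 0.0, 0.0])
x = np.array([2.0, 0.6, -0.8])          # eigenvalues 3 and 1, so x is nonsingular
x_inv = apply_f(x, lambda t: 1.0 / t)
x_half = apply_f(x, np.sqrt)

assert np.allclose(jmul(x, x_inv), e)        # x ∘ x⁻¹ = e
assert np.allclose(jmul(x_half, x_half), x)  # x^{1/2} ∘ x^{1/2} = x
```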
The cone of squares of \(\mathcal {J}\) is defined as \(\mathcal {K}_\mathcal {J}:=\left\{ \varvec{x}^2 : \varvec{x} \in \mathcal {J} \right\} \). The logarithmic barrier function, which is defined on the interior of \(\mathcal {K}_\mathcal {J}\) (denoted by \(\text {int}\,\mathcal {K}_\mathcal {J}\)) as \(\ell (\varvec{x}) :=- \ln \det \left( \varvec{x} \right) \), will play an important role for our subsequent development.
In the following theorem, we give the Jordan algebraic characterization of symmetric cones.
Theorem 2.3
[1, Theorem 2] A regular cone is symmetric iff it is the cone of squares of some Euclidean Jordan algebra.
Example 2.3
The cone of squares of the algebra \((\mathcal {E}^n, \circ )\), with “\(\,\circ \)” defined in (1), is the second-order cone, which is defined as
\[ \mathcal {E}^n_+ := \left\{ \varvec{x} = (x_0; \widetilde{\varvec{x}}) \in \mathcal {E}^n : x_0 \ge \Vert \widetilde{\varvec{x}} \Vert \right\} , \]
where \(\Vert \cdot \Vert \) denotes the Euclidean norm.
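Membership in the second-order cone thus reduces to a single eigenvalue test: the smaller eigenvalue \(x_0 - \Vert \widetilde{\varvec{x}}\Vert \) must be nonnegative (positive for the interior). A two-line check, with our own function name:

```python
import numpy as np

def in_soc(x, interior=False):
    """Second-order cone membership: x0 ≥ ||x̃|| (strict for the interior),
    i.e., the smaller eigenvalue x0 − ||x̃|| is nonnegative (positive)."""
    lam_min = x[0] - np.linalg.norm(x[1:])
    return lam_min > 0 if interior else lam_min >= 0

assert in_soc(np.array([2.0, 0.6, -0.8]), interior=True)  # interior point
assert in_soc(np.array([1.0, 1.0, 0.0]))                  # boundary point
assert not in_soc(np.array([1.0, 2.0, 0.0]))              # outside the cone
```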
The mappings introduced in the following definition play an important role in the development of interior-point methods for conic programming [1].
Definition 2.5
Let \(\varvec{x}\) be an element in a Jordan algebra \(\mathcal {J}\). Then
(a) The linear map \(L(\varvec{x}) : \mathcal {J} \longrightarrow \mathcal {J}\) is defined as \(L(\varvec{x}) \varvec{y} :=\varvec{x} \circ \varvec{y}\) for all \(\varvec{y} \in \mathcal {J}\).
(b) The quadratic representation of \(\varvec{x}\), \(Q_{(\varvec{x})}: \mathcal {J} \longrightarrow \mathcal {J}\), is defined as \(Q_{(\varvec{x})} := 2 L(\varvec{x})^2 - L(\varvec{x}^2)\).
Note that \(Q_{(\varvec{x})}\) is also a linear operator in \(\mathcal {J}\). Note also that \(L(\varvec{x}) \varvec{e} = \varvec{x}, L(\varvec{x}) \varvec{x} = \varvec{x}^2\), \(L(\varvec{e}) = Q_{(\varvec{e})} =I, \; \text {trace}(\varvec{e}) = r\) and \(\det (\varvec{e})=1\) (since all the eigenvalues of \(\varvec{e}\) are equal to one).
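For the algebra \(\mathcal {E}^n\) of Example 2.1, \(L(\varvec{x})\) is the familiar “arrow” matrix, and \(Q_{(\varvec{x})}\) follows from Definition 2.5(b). A small NumPy sketch (our own helper names) verifying the identities just noted:

```python
import numpy as np

def arrow(x):
    """L(x) for E^n is the arrow matrix: arrow(x) @ y computes x∘y."""
    n = len(x)
    L = x[0] * np.eye(n)
    L[0, 1:] = x[1:]
    L[1:, 0] = x[1:]
    return L

def quad_rep(x):
    """Quadratic representation Q_x = 2 L(x)² − L(x²) as a matrix."""
    L = arrow(x)
    x2 = L @ x                 # x² = x ∘ x
    return 2 * L @ L - arrow(x2)

e = np.array([1.0, 0.0, 0.0])
x = np.array([2.0, 1.0, -0.5])

assert np.allclose(arrow(x) @ e, x)                # L(x)e = x
assert np.allclose(arrow(x) @ x, jx2 := arrow(x) @ x)  # L(x)x = x²
assert np.allclose(quad_rep(e), np.eye(3))         # Q_e = I
assert np.allclose(quad_rep(x) @ e, jx2)           # Q_x e = x²
```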
The Frobenius norm of an element \(\varvec{x} \in \mathcal {J}\) is defined as \(\Vert \varvec{x}\Vert _F := \sqrt{\varvec{x} \bullet \varvec{x}}= \sqrt{\sum _{i=1}^{r} \lambda _i^2(\varvec{x})}\). For any \(\varvec{x}, \varvec{y}\in \mathcal {J}\), we have
All the above notions are also used in the block sense as follows. Let \((\mathcal {J}_1, \circ _1, \bullet _1), (\mathcal {J}_2, \circ _2, \bullet _2), \ldots \), \((\mathcal {J}_q, \circ _q, \bullet _q)\), be Euclidean Jordan algebras with identities \(\varvec{e}_1, \varvec{e}_2, \ldots , \varvec{e}_q\) and cone of squares \(\mathcal {K}_{\mathcal {J}_1}\), \(\mathcal {K}_{\mathcal {J}_2}\), \(\cdots , \mathcal {K}_{\mathcal {J}_q}\), respectively. Let also \(\varvec{x} := (\varvec{x}_1; \varvec{x}_2; \ldots ; \varvec{x}_q)\) and \(\varvec{y} :=(\varvec{y}_1; \varvec{y}_2; \ldots ; \varvec{y}_q)\) with \(\varvec{x}_i, \varvec{y}_i \in \mathcal {J}_i\) for \(i=1, 2, \ldots , q\). Then
(a) \(\mathcal {K}_\mathcal {J} :=\mathcal {K}_{\mathcal {J}_1} \times \mathcal {K}_{\mathcal {J}_2} \times \cdots \times \mathcal {K}_{\mathcal {J}_q}\) is the cone of squares of \(\mathcal {J} := \mathcal {J}_1 \times \mathcal {J}_2 \times \cdots \times \mathcal {J}_q\).
(b) \(\varvec{e} := \left( \varvec{e}_1; \varvec{e}_2; \ldots ; \varvec{e}_q\right) \) is the identity vector of \(\mathcal {J}\).
(c) \(\varvec{x} \circ \varvec{y} := (\varvec{x}_1 \circ _1 \varvec{y}_1; \varvec{x}_2 \circ _2 \varvec{y}_2; \ldots ; \varvec{x}_q \circ _q \varvec{y}_q)\).
(d) \(\varvec{x}\bullet \varvec{y} := {\varvec{x}_1}\bullet _1 \varvec{y}_1 + {\varvec{x}_2}\bullet _2 \varvec{y}_2 + \cdots + {\varvec{x}_q}\bullet _q \varvec{y}_q\).
(e) \(\text {L}(\varvec{x}):= \text {L}(\varvec{x}_1) \oplus \text {L}(\varvec{x}_2) \oplus \cdots \oplus \text {L}(\varvec{x}_q)\).
(f) \(Q_{(\varvec{x})}:= Q_{(\varvec{x}_1)} \oplus Q_{(\varvec{x}_2)} \oplus \cdots \oplus Q_{(\varvec{x}_q)}\).
(g) \(f(\varvec{x}) := (f(\varvec{x}_1); f(\varvec{x}_2); \ldots ; f(\varvec{x}_q))\). In particular, \(\varvec{x}^{-1} := \left( {\varvec{x}_1}^{-1}; {\varvec{x}_2}^{-1}; \ldots ; {\varvec{x}_q}^{-1}\right) \).
(h) \(\Vert \varvec{x} \Vert _F^2 := \Vert \varvec{x}_1 \Vert _F^2 + \Vert \varvec{x}_2 \Vert _F^2 + \cdots + \Vert \varvec{x}_q \Vert _F^2\).
(i) The logarithmic barrier is defined on \(\text {int}\,\mathcal {K}_{\mathcal {J}}\) as \(\ell (\varvec{x}) := - \sum _{i=1}^{q} \ln \det \left( \varvec{x}_i \right) \).
(j) \(\varvec{x}\) and \(\varvec{y}\) operator commute iff \(\varvec{x}_i\) and \(\varvec{y}_i\) operator commute for each \(i=1, 2, \ldots , q\).
We write \(\varvec{x} \succeq _{\mathcal {K}_{\mathcal {J}}} \varvec{0}\) (or simply \(\varvec{x} \succeq \varvec{0}\) when \(\mathcal {K}_{\mathcal {J}}\) is known from the context) to mean that \(\varvec{x} \in \mathcal {K}_{\mathcal {J}}\) (i.e., \(\varvec{x}\) is positive semidefinite). We also write \(\varvec{x} \succ \varvec{0}\) to mean that \(\varvec{x} \in \text {int} \mathcal {K}_{\mathcal {J}}\) (i.e., \(\varvec{x}\) is positive definite), and write \(\varvec{x} \succeq \varvec{y}\; (\varvec{x} \succ \varvec{y})\) to mean that \(\varvec{x} - \varvec{y} \succeq \varvec{0}\; (\varvec{x} - \varvec{y} \succ \varvec{0})\). Note that \(\varvec{x} \succeq \varvec{0}\) (\(\varvec{x} \succ \varvec{0}\)) iff \(\lambda _i (\varvec{x}) \ge 0 ~(\lambda _i (\varvec{x}) > 0)\) for \(i=1, 2, \ldots , r\).
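The block rules above can be mirrored directly in code. A short sketch for \(\mathcal {J} = \mathcal {E}^3 \times \mathcal {E}^2\) (helper names ours, assuming the inner product \(\varvec{x} \bullet \varvec{y} = 2\varvec{x}^{\mathsf{T}}\varvec{y}\) of Example 2.2 on each block):

```python
import numpy as np

# A point in the product algebra J = E^3 × E^2, stored block-wise.
x = [np.array([2.0, 0.6, -0.8]), np.array([1.5, 1.0])]

def lam_min(xi):
    """Smallest eigenvalue of a block xi in E^n: x0 − ||x̃||."""
    return xi[0] - np.linalg.norm(xi[1:])

# x ≻ 0 iff every block lies in the interior of its own cone.
is_pd = all(lam_min(xi) > 0 for xi in x)

# Block Frobenius norm; on E^n, ||xi||_F² = 2 xiᵀxi under x • y = 2 xᵀy.
frob = np.sqrt(sum(2.0 * (xi @ xi) for xi in x))

assert is_pd
```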
3 Problem formulation and Newton’s method
In this section, we introduce the primal and dual forms of symmetric programming problem and the perturbed problem. Then we apply Newton’s method to the perturbed problem.
3.1 Problem formulation and assumptions
Let \(q \ge 1\) be an integer. For each \(i=1, 2, \ldots , q\), let \(n, n_i, r\) and \(r_i\) be positive integers such that \(n:=\sum _{i=1}^{q}n_i\) and \(r:=\sum _{i=1}^{q}r_i\). Let \((\mathcal {J}_1, \circ _1, \bullet _1), (\mathcal {J}_2, \circ _2, \bullet _2), \ldots \), \((\mathcal {J}_q, \circ _q, \bullet _q)\) be Euclidean Jordan algebras with dimensions \(n_1, n_2, \ldots , n_q\), ranks \(r_1, r_2, \ldots , r_q\), identities \(\varvec{e}_1, \varvec{e}_2, \ldots , \varvec{e}_q\) and cones of squares \(\mathcal {K}_{\mathcal {J}_1}, \mathcal {K}_{\mathcal {J}_2}, \ldots , \mathcal {K}_{\mathcal {J}_q}\), respectively, and let
\[ \mathcal {J} := \mathcal {J}_1 \times \mathcal {J}_2 \times \cdots \times \mathcal {J}_q \qquad \text {and} \qquad \mathcal {K}_{\mathcal {J}} := \mathcal {K}_{\mathcal {J}_1} \times \mathcal {K}_{\mathcal {J}_2} \times \cdots \times \mathcal {K}_{\mathcal {J}_q}. \]
Let also \(\varvec{y}, \varvec{b} \in {\mathbb {R}}^{m}\), and \(\varvec{x}, \varvec{c}, \varvec{s} \in \mathcal {J}\) and \(A \in {\mathbb {R}}^{m \times n}\) be such that they are conformally partitioned as
\[ \varvec{x} = (\varvec{x}_1; \varvec{x}_2; \ldots ; \varvec{x}_q), \quad \varvec{s} = (\varvec{s}_1; \varvec{s}_2; \ldots ; \varvec{s}_q), \quad \varvec{c} = (\varvec{c}_1; \varvec{c}_2; \ldots ; \varvec{c}_q), \quad A = (A_1, A_2, \ldots , A_q), \]
where \(\varvec{x}_i, \varvec{s}_i, \varvec{c}_i \in \mathcal {J}_i\) and \(A_i \in {\mathbb {R}}^{m \times n_i}\) for \(i=1, 2, \ldots , q\). More specifically, the submatrix \(A_i\) consists of m rows \(\varvec{a}^{(i)}_1, \varvec{a}^{(i)}_2, \ldots , \varvec{a}^{(i)}_m \in \mathcal {J}_i\) and acts as a linear transformation that maps \(\varvec{x}_i\) to the m-dimensional vector whose \(j^\text {th}\) component is \(\varvec{a}^{(i)}_j \bullet _i \varvec{x}_i\) for each \(i=1, 2, \ldots , q\).
The symmetric programming problem and its dual in multi-block structures are defined as [1, Section 3]
\[ (P_i) \quad \min \; \sum _{i=1}^{q} \varvec{c}_i \bullet _i \varvec{x}_i \quad \text {s.t.} \quad \sum _{i=1}^{q} A_i \varvec{x}_i = \varvec{b}, \quad \varvec{x}_i \succeq \varvec{0}, \; i = 1, 2, \ldots , q, \]
\[ (D_i) \quad \max \; \varvec{b}^{\mathsf{T}} \varvec{y} \quad \text {s.t.} \quad A_i^{\mathsf{T}} \varvec{y} + \varvec{s}_i = \varvec{c}_i, \quad \varvec{s}_i \succeq \varvec{0}, \; i = 1, 2, \ldots , q. \]
We can rewrite the pair \((P_i, D_i)\) compactly as
\[ (P) \quad \min \; \varvec{c} \bullet \varvec{x} \quad \text {s.t.} \quad A\varvec{x} = \varvec{b}, \quad \varvec{x} \succeq \varvec{0}, \qquad (D) \quad \max \; \varvec{b}^{\mathsf{T}} \varvec{y} \quad \text {s.t.} \quad A^{\mathsf{T}} \varvec{y} + \varvec{s} = \varvec{c}, \quad \varvec{s} \succeq \varvec{0}. \]
The general scheme of the central trajectory methods is as follows. We associate perturbed problems with the symmetric programming problems (P) and (D), and then we trace a path of centers defined by the perturbed KKT optimality conditions. After that, Newton’s method is applied to the corresponding perturbed equations in order to obtain a descent search direction. As we mentioned earlier, we propose four different selections for calculating the displacement step.
Let \(\mu > 0\) be a barrier parameter and \(\sigma \in (0, 1)\) be the centering parameter. The perturbed primal problem corresponding to the primal problem (P) is
\[ (P_\mu ) \quad \min \; f_\mu (\varvec{x}) := \varvec{c} \bullet \varvec{x} + \mu \, \ell (\varvec{x}) \quad \text {s.t.} \quad A\varvec{x} = \varvec{b}, \quad \varvec{x} \succ \varvec{0}, \]
where \(\ell \) is the logarithmic barrier defined in Sect. 2,
and the perturbed dual problem corresponding to the dual problem (D) is
\[ (D_\mu ) \quad \max \; \varvec{b}^{\mathsf{T}} \varvec{y} - \mu \, \ell (\varvec{s}) \quad \text {s.t.} \quad A^{\mathsf{T}} \varvec{y} + \varvec{s} = \varvec{c}, \quad \varvec{s} \succ \varvec{0}. \]
Now, we define the following feasibility sets:
\[ \mathcal {F}^\circ _P := \left\{ \varvec{x} \succ \varvec{0} : A\varvec{x} = \varvec{b} \right\} , \quad \mathcal {F}^\circ _D := \left\{ (\varvec{y}; \varvec{s}) : A^{\mathsf{T}} \varvec{y} + \varvec{s} = \varvec{c}, \; \varvec{s} \succ \varvec{0} \right\} , \quad \mathcal {F}^\circ := \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D. \]
Next, we make two assumptions about the primal-dual pair (P, D).
Assumption 3.1
The m rows of the matrix A are linearly independent.
Assumption 3.2
The set \(\mathcal {F}^\circ \) is nonempty.
Assumption 3.1 is for convenience. Assumption 3.2 requires that Problem (P\(_\mu \)) and its dual (\(\hbox {D}_\mu \)) have strictly feasible solutions, which guarantees strong duality for the symmetric programming problem. Note that the feasible region for Problems (P\(_\mu \)) and (\(\hbox {D}_\mu \)) is described implicitly by \(\mathcal {F}_0\). Due to the coercivity of the function \(f_\mu \) on the feasible set of \(P_\mu \), Problem (P\(_\mu \)) has an optimal solution.
The following lemma proves the convergence of the optimal solution of Problem \((P_\mu )\) to the optimal solution of Problem (P) when \(\mu \) approaches zero.
Lemma 3.1
Let \(\bar{\varvec{x}}_\mu \) be an optimal primal solution of Problem \((P_\mu )\), then \(\bar{\varvec{x}} = \lim _{\mu \rightarrow 0} \bar{\varvec{x}}_\mu \) is an optimal solution of Problem (P).
Proof
Let \(f_\mu (\varvec{x}) := f(\varvec{x}, \mu )\) and \(f(\varvec{x}) := f(\varvec{x}, 0)\). Due to the coercivity of the function \(f_\mu \) on the feasible set of \(P_\mu \), Problem (P\(_\mu \)) has an optimal solution, say \(\bar{\varvec{x}}_\mu \), such that
Then, for all \(\varvec{x} \in \mathcal {F}_P^\circ \), we have that
Since \(\varvec{x}\) was arbitrary in \(\mathcal {F}^\circ _P\), this implies that \(\min _{\varvec{x} \in \mathcal {F}^\circ _P} f(\varvec{x}) \ge \varvec{c} \bullet \bar{\varvec{x}}_\mu - r \mu = f(\bar{\varvec{x}}_\mu ) - r\mu \). On the other side, we have \(f(\bar{\varvec{x}}_\mu ) \ge \min _{\varvec{x} \in \mathcal {F}^\circ _P} f(\varvec{x})\). As \(\mu \) goes to 0, it immediately follows that \(f(\bar{\varvec{x}}) = \min _{\varvec{x} \in \mathcal {F}^\circ _P} f(\varvec{x})\). Thus, \(\bar{\varvec{x}}\) is an optimal solution of Problem (P). The proof is complete. \(\square \)
3.2 Newton’s method and commutative directions
As we mentioned, the objective function of Problem \((P_\mu )\) is strictly convex; hence the KKT conditions are necessary and sufficient to characterize an optimal solution of Problem \((P_\mu )\). Consequently, the points \(\bar{\varvec{x}}_\mu \) and \((\bar{\varvec{y}}_\mu , \bar{\varvec{s}}_\mu )\) are optimal solutions of \((P_\mu )\) and \((D_\mu )\) respectively iff they satisfy the perturbed nonlinear system
\[ A\varvec{x} = \varvec{b}, \; \varvec{x} \succ \varvec{0}, \qquad A^{\mathsf{T}} \varvec{y} + \varvec{s} = \varvec{c}, \; \varvec{s} \succ \varvec{0}, \qquad \varvec{x} \circ \varvec{s} = \mu \varvec{e}, \qquad (4) \]
where \(\varvec{e} := \left( \varvec{e}_{n_1}; \varvec{e}_{n_2}; \ldots ; \varvec{e}_{n_q}\right) \) is the identity vector of \(\mathcal {J}\).
We call the set of all solutions of system (4), denoted by \((\varvec{x}_\mu , \varvec{y}_\mu , \varvec{s}_\mu )\) with \(\mu >0\), the central trajectory or central path. We say that a point \((\varvec{x}, \varvec{y}, \varvec{s})\) is near to the central trajectory if it belongs to the set \(\mathcal {N}_\theta (\mu )\), which is defined as
Now, we can apply Newton’s method to system (4) and obtain the following linear system
\[ A \, \Delta \varvec{x} = \varvec{0}, \qquad A^{\mathsf{T}} \Delta \varvec{y} + \Delta \varvec{s} = \varvec{0}, \qquad \varvec{x} \circ \Delta \varvec{s} + \varvec{s} \circ \Delta \varvec{x} = \sigma \mu \varvec{e} - \varvec{x} \circ \varvec{s}, \]
where \((\Delta \varvec{x}; \Delta \varvec{y}; \Delta \varvec{s}) \in \mathcal {J} \times {\mathbb {R}}^m \times \mathcal {J}\) is the search direction and \(\mu =\frac{1}{r} \varvec{x} \bullet \varvec{s}\) is the normalized duality gap corresponding to \((\varvec{x}, \varvec{y}, \varvec{s})\).
Note that the strict conic inequalities \(\varvec{x}, \varvec{s} \succ \varvec{0}\) imply that \(d_F(\varvec{x}, \varvec{s}) \le \Vert \varvec{x} \circ \varvec{s} - \mu \varvec{e}\Vert _F\), with equality holding when \(\varvec{x}\) and \(\varvec{s}\) operator commute [1, Lemma 30]. In fact, it is known that many interesting properties become apparent in the analysis of interior-point methods when \(\varvec{x}\) and \(\varvec{s}\) operator commute. The commutative class is denoted by \(\mathcal {C}(\varvec{x}, \varvec{s})\) and is defined as
\[ \mathcal {C}(\varvec{x}, \varvec{s}) := \left\{ \varvec{p} : \varvec{p} \; \text {is nonsingular, and} \; Q_{(\varvec{p})} \varvec{x} \; \text {and} \; Q_{(\varvec{p}^{-1})} \varvec{s} \; \text {operator commute} \right\} . \qquad (6) \]
From [1, Lemma 28], the equality \(\varvec{x} \circ \varvec{s} = \mu \varvec{e}\) holds iff the equality \((Q_{(\varvec{p})}\varvec{x}) \circ (Q_{(\varvec{p}^{-1})}\varvec{s}) = \mu \varvec{e}\) holds, for any nonsingular vector \(\varvec{p}\) in \(\mathcal {J}\). Therefore, for any given nonsingular vector \(\varvec{p} \in (\mathcal {J}, \circ , \bullet )\), the system (4) is equivalent to the system
\[ A\varvec{x} = \varvec{b}, \; \varvec{x} \succ \varvec{0}, \qquad A^{\mathsf{T}} \varvec{y} + \varvec{s} = \varvec{c}, \; \varvec{s} \succ \varvec{0}, \qquad (Q_{(\varvec{p})}\varvec{x}) \circ (Q_{(\varvec{p}^{-1})}\varvec{s}) = \mu \varvec{e}. \qquad (5) \]
Let \(\varvec{v} \in (\mathcal {J}, \circ , \bullet )\). With respect to a nonsingular vector \(\varvec{p} \in (\mathcal {J}, \circ , \bullet )\), we define the scaling vectors \(\overline{\varvec{v}}\) and \(\underline{\varvec{v}}\) and the scaling matrix \(\underline{A}\) as
\[ \overline{\varvec{v}} := Q_{(\varvec{p})} \varvec{v}, \qquad \underline{\varvec{v}} := Q_{(\varvec{p}^{-1})} \varvec{v}, \qquad \underline{A} := A \, Q_{(\varvec{p}^{-1})}. \]
Using this change of variables and the fact that \(Q_{(\varvec{p})}\left( \mathcal {J}\right) = \mathcal {J}\), we conclude that the system (5) is equivalent to the following Newton system
Here, the normalized duality gap is \(\mu = \frac{1}{r} \overline{\varvec{x}} \bullet \underline{\varvec{s}} =\frac{1}{r} \varvec{x} \bullet \varvec{s}\). In fact,
Solving the scaled Newton system (8) yields the search direction \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}}\right) \). Then, we apply the inverse scaling to \(\left( \overline{\Delta \varvec{x}}; \underline{\Delta \varvec{s}}\right) \) to obtain the Newton direction \((\Delta \varvec{x}; \Delta \varvec{s})\). Note that the search direction \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}}\right) \) belongs to the so-called MZ family of directions (due to Monteiro [21] and Zhang [13]). In fact, such a way of scaling was originally proposed for semidefinite programming by Monteiro [21] and Zhang [13], and it was later generalized to general symmetric cone programming by Schmieta and Alizadeh [1].
Clearly, the set \(\mathcal {C}(\varvec{x}, \varvec{s})\) defined in (6) induces a subclass of the MZ family of search directions. Our focus in this paper is on vectors \(\varvec{p} \in \mathcal {C}(\varvec{x}, \varvec{s})\). We discuss the following three choices of \(\varvec{p}\) (see [1, Section 3]): We may choose \(\varvec{p} = \varvec{x}^{1/2}\) to obtain \(\overline{\varvec{x}}= \varvec{e}\), and we may choose \(\varvec{p} = \varvec{s}^{-1/2}\) to obtain \(\underline{\varvec{s}} = \varvec{e}\). These two choices of directions are called the HRVW/KSH/M directions (due to Helmberg et al. [22], Monteiro [21] and Kojima et al. [32]). The third choice of \(\varvec{p}\) is given by
which yields \(Q_{(\varvec{p})}^2 \varvec{x} = \varvec{s}\), and therefore \({\underline{\varvec{s}}} = Q_{(\varvec{p}^{-1})} \varvec{s} = Q_{(\varvec{p})} \varvec{x} = {\overline{\varvec{x}}}\). This choice of directions is called the NT direction (due to Nesterov and Todd [7]).
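For the second-order cone (rank \(r=2\)), all of the algebraic machinery above is computable in closed form. The following Python sketch assumes the classical Nesterov-Todd formula \(\varvec{p} = \left[ Q_{(\varvec{x}^{1/2})}\left( Q_{(\varvec{x}^{1/2})}\varvec{s}\right) ^{-1/2}\right] ^{-1/2}\) (an assumption, since only its consequence is quoted above) and verifies numerically that \(Q_{(\varvec{p})}^2 \varvec{x} = \varvec{s}\) and that the duality gap is invariant under the scaling.

```python
import numpy as np

# Jordan-algebra toolkit for the second-order cone E^n_+ (rank r = 2):
# x o y = (x^T y; x0*ybar + y0*xbar), identity e = (1; 0; ...; 0).
def jmul(x, y):
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def arrow(x):                       # L(x), the matrix of "multiply by x"
    n = len(x)
    M = x[0] * np.eye(n)
    M[0, 1:] = x[1:]
    M[1:, 0] = x[1:]
    return M

def quad(x):                        # quadratic representation Q_x = 2 L(x)^2 - L(x o x)
    return 2 * arrow(x) @ arrow(x) - arrow(jmul(x, x))

def power(x, t):                    # spectral power: eigenvalues x0 -+ ||xbar||
    nrm = np.linalg.norm(x[1:])
    lam1, lam2 = x[0] - nrm, x[0] + nrm
    c = x[1:] / nrm if nrm > 0 else np.zeros(len(x) - 1)
    f1 = np.concatenate(([0.5], -0.5 * c))
    f2 = np.concatenate(([0.5],  0.5 * c))
    return lam1**t * f1 + lam2**t * f2

# Assumed NT scaling point: p = [Q_{x^(1/2)} (Q_{x^(1/2)} s)^(-1/2)]^(-1/2).
def nt_scaling(x, s):
    u = quad(power(x, 0.5))
    w = u @ power(u @ s, -0.5)
    return power(w, -0.5)

x = np.array([2.0, 1.0, 0.0])       # interior point: eigenvalues 1, 3 > 0
s = np.array([3.0, 0.0, 1.0])       # interior point: eigenvalues 2, 4 > 0
p = nt_scaling(x, s)
Qp = quad(p)
assert np.allclose(Qp @ Qp @ x, s)  # Q_p^2 x = s, so Q_p x = Q_{p^-1} s
# The duality gap is scaling-invariant: (Q_p x)^T (Q_{p^-1} s) = x^T s.
assert np.isclose((Qp @ x) @ (np.linalg.inv(Qp) @ s), x @ s)
```

The identity \(Q_{(\varvec{p}^{-1})} = Q_{(\varvec{p})}^{-1}\), valid for any invertible \(\varvec{p}\), is what makes the gap-invariance check a one-liner.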
4 Selections of the displacement step
We start by motivating the need for introducing the so-called displacement step. Note that the positive definiteness of the vectors \(\varvec{x}^+ = \varvec{x} + \Delta \varvec{x}\) and \(\varvec{s}^+ = \varvec{s} + \Delta \varvec{s}\) is not always achieved. In order to circumvent this difficulty, we introduce a parameter \(\alpha >0\), which we call the displacement step-size or simply the displacement step, and redefine \(\varvec{x}^+, \varvec{y}^+\) and \(\varvec{s}^+\) as
Calculating the displacement step using classical line search methods is undesirable and even generally impossible [20, 23]. Strategies for determining the displacement step while preserving positive definiteness exist in the literature for the semidefinite programming case only [23]. To the best of our knowledge, no such strategies are available for general symmetric programming. From this stems the importance of the results of this paper. Calculating the search direction for any Newton method for problems over symmetric cones has been well established in the literature, but there are gaps in terms of maintaining feasibility. In order to fill such gaps, this section proposes four different selections for computing an appropriate displacement step for symmetric optimization. Lemmas 4.1, 4.2, 4.3 and 4.4 below generalize Lemmas 3.3, 3.4, 3.5 and 3.6, respectively, in [23, Section 3]. Each of the following four lemmas gives a selection for calculating the displacement step \(\alpha \).
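To see why the damping is needed, recall that for the second-order cone a vector \(\varvec{v} = (v_0; \bar{\varvec{v}})\) is positive definite iff \(v_0 > \Vert \bar{\varvec{v}}\Vert \). The following toy illustration (with hypothetical data, not taken from the paper) shows a full Newton step leaving the cone while a damped step stays strictly feasible.

```python
import numpy as np

# v = (v0; vbar) is in the interior of the second-order cone iff v0 > ||vbar||.
def is_pd(v):
    return v[0] > np.linalg.norm(v[1:])

x  = np.array([1.0, 0.6, 0.0])     # interior point: 1 > 0.6
dx = np.array([-2.0, 0.0, 0.5])    # a hypothetical Newton direction

assert is_pd(x)
assert not is_pd(x + dx)           # the full step x + dx leaves the cone
alpha = 0.1
assert is_pd(x + alpha * dx)       # a damped step stays strictly feasible
```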
Lemma 4.1
(The first selection lemma). Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\). If \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\), then \(\varvec{x}^+, \varvec{s}^+ \succ \varvec{0}\), where for \(\varvec{v} \in \{\varvec{x}, \varvec{s} \}\) we have
where
and \(\epsilon \) is a small positive real. Here, \(\lambda _i\left( \varvec{v}^{-1/2} \circ \left( \Delta \varvec{v} \circ \varvec{v}^{-1/2}\right) \right) , i=1, 2, \ldots , r\), are the eigenvalues of the vector \(\varvec{v}^{-1/2} \circ \left( \Delta \varvec{v} \circ \varvec{v}^{-1/2}\right) \).
Proof
Since \(\varvec{x} \succ \varvec{0}\), the vectors \(\varvec{x}^{\pm 1/2}\) are well-defined and we have \(\varvec{x}^{\pm 1/2} \succ \varvec{0}\). Note that
Therefore, as \(\varvec{x}^{-1/2} \succ \varvec{0}\), we have that
Hence,
If \(\min _{1\le i\le r} \lambda _i\left( \varvec{x}^{-1/2} \circ \left( \Delta \varvec{x} \circ \varvec{x}^{-1/2}\right) \right) < 0\), then \(\alpha < -1 / \min _{1\le i\le r} \lambda _i\left( \varvec{x}^{-1/2} \circ \right. \left. \left( \Delta \varvec{x} \circ \varvec{x}^{-1/2}\right) \right) \). Using Proposition 2.1, we conclude that
This gives the expression of \(\alpha _{\varvec{x}}\). Applying the same procedure, we obtain \(\alpha _{\varvec{s}}\). Finally, we choose \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\). The proof is complete. \(\square \)
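As a concrete illustration, the first selection can be sketched for the second-order cone case as follows. We assume the natural reading of the proof: \(\alpha _{\varvec{v}} = -1/\lambda _{\min }\) when \(\lambda _{\min }\left( \varvec{v}^{-1/2} \circ \left( \Delta \varvec{v} \circ \varvec{v}^{-1/2}\right) \right) < 0\), and \(\alpha _{\varvec{v}} = 1/\epsilon \) otherwise; the data are illustrative.

```python
import numpy as np

def jmul(x, y):                     # Jordan product on the second-order cone
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def power(x, t):                    # spectral power, eigenvalues x0 -+ ||xbar||
    nrm = np.linalg.norm(x[1:])
    lam1, lam2 = x[0] - nrm, x[0] + nrm
    c = x[1:] / nrm if nrm > 0 else np.zeros(len(x) - 1)
    return lam1**t * np.concatenate(([0.5], -0.5 * c)) \
         + lam2**t * np.concatenate(([0.5],  0.5 * c))

def eigs(v):                        # the two eigenvalues of a rank-2 vector
    nrm = np.linalg.norm(v[1:])
    return v[0] - nrm, v[0] + nrm

# First selection (assumed form): damp by the most negative eigenvalue of
# v^(-1/2) o (dv o v^(-1/2)); unbounded (1/eps) if no eigenvalue is negative.
def alpha_first(v, dv, eps=1e-8):
    xi = power(v, -0.5)
    lam_min = min(eigs(jmul(xi, jmul(dv, xi))))
    return -1.0 / lam_min if lam_min < 0 else 1.0 / eps

x,  s  = np.array([2.0, 1.0, 0.0]), np.array([3.0, 0.0, 1.0])
dx, ds = np.array([-4.0, 0.0, 1.0]), np.array([1.0, 2.0, 0.0])
rho = 0.99
alpha = rho * min(alpha_first(x, dx), alpha_first(s, ds))
# The damped iterates are strictly feasible for this data:
assert min(eigs(x + alpha * dx)) > 0 and min(eigs(s + alpha * ds)) > 0
```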
Lemma 4.2
(The second selection lemma). Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\). If \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\), then \(\varvec{x}^+, \varvec{s}^+ \succ \varvec{0}\), where for \(\varvec{v} \in \{\varvec{x}, \varvec{s} \}\) we have
where
and \(\epsilon \) is a small positive real. Here, \(\lambda _i\left( \varvec{v}^{-1} \circ \Delta \varvec{v} \right) , i=1, 2, \ldots , r\), are the eigenvalues of the vector \(\varvec{v}^{-1} \circ \Delta \varvec{v} \).
Proof
Since \(\varvec{x} \succ \varvec{0}\), the vector \(\varvec{x}\) is invertible and its inverse is positive definite (i.e., \(\varvec{x}^{-1} \succ \varvec{0}\)). Note that
Therefore, as \(\varvec{x}^{-1} \succ \varvec{0}\), we have that
Hence,
If \(\min _{1\le i\le r} \lambda _i\left( \varvec{x}^{-1} \circ \Delta \varvec{x} \right) < 0\), then \(\alpha < -1 / \min _{1\le i\le r} \lambda _i\left( \varvec{x}^{-1} \circ \Delta \varvec{x} \right) \). Using Proposition 2.1, we conclude that
This gives the expression of \(\alpha _{\varvec{x}}\). Applying the same procedure, we obtain \(\alpha _{\varvec{s}}\). Finally, we choose \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\). The proof is complete. \(\square \)
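The second selection admits the same kind of second-order cone sketch, now driven by the eigenvalues of \(\varvec{v}^{-1} \circ \Delta \varvec{v}\), which avoids the square roots needed by the first selection. We again assume \(\alpha _{\varvec{v}} = -1/\lambda _{\min }\left( \varvec{v}^{-1} \circ \Delta \varvec{v}\right) \) when that eigenvalue is negative, and \(1/\epsilon \) otherwise; the data are illustrative.

```python
import numpy as np

def jmul(x, y):                     # Jordan product on the second-order cone
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def jinv(x):                        # spectral inverse, eigenvalues x0 -+ ||xbar||
    nrm = np.linalg.norm(x[1:])
    lam1, lam2 = x[0] - nrm, x[0] + nrm
    c = x[1:] / nrm if nrm > 0 else np.zeros(len(x) - 1)
    return (1 / lam1) * np.concatenate(([0.5], -0.5 * c)) \
         + (1 / lam2) * np.concatenate(([0.5],  0.5 * c))

def eigs(v):
    nrm = np.linalg.norm(v[1:])
    return v[0] - nrm, v[0] + nrm

# Second selection (assumed form): damp by the most negative eigenvalue of
# v^(-1) o dv; only one Jordan product and one inverse are needed.
def alpha_second(v, dv, eps=1e-8):
    lam_min = min(eigs(jmul(jinv(v), dv)))
    return -1.0 / lam_min if lam_min < 0 else 1.0 / eps

x,  s  = np.array([2.0, 1.0, 0.0]), np.array([3.0, 0.0, 1.0])
dx, ds = np.array([-4.0, 0.0, 1.0]), np.array([1.0, 2.0, 0.0])
alpha = 0.99 * min(alpha_second(x, dx), alpha_second(s, ds))
assert min(eigs(x + alpha * dx)) > 0 and min(eigs(s + alpha * ds)) > 0
```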
Lemma 4.3
(The third selection lemma). Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\). If \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\), then \(\varvec{x}^+, \varvec{s}^+ \succ \varvec{0}\), where for \(\varvec{v} \in \{\varvec{x}, \varvec{s} \}\) we have
where
and \(\epsilon \) is a small positive real. Here, \(\lambda _i(\varvec{v}), i=1, 2, \ldots , r\), are the eigenvalues of the vector \(\varvec{v}\), and \(\lambda _i(\Delta \varvec{v}), i=1, 2, \ldots , r\), are the eigenvalues of the vector \(\Delta \varvec{v}\).
Proof
It is known that \(\varvec{x}^+ = \varvec{x} + \alpha \Delta \varvec{x} \succ \varvec{0}\) iff \(\min _{1 \le i \le r} \lambda _i(\varvec{x}^+) >0\). Here, \(\lambda _i(\varvec{x}^+), i=1, 2, \ldots , r\), are the eigenvalues of the vector \(\varvec{x}^+\). It is also known (see for example [33]) that
Then, it is enough to find \(\alpha \) such that
Hence
Using Proposition 2.1, we find
This gives the expression of \(\alpha _{\varvec{x}}\). Applying the same procedure, we obtain \(\alpha _{\varvec{s}}\). Finally, we choose \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\). The proof is complete. \(\square \)
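For the second-order cone, \(\lambda _{\min }\) is superadditive, \(\lambda _{\min }(\varvec{a}+\varvec{b}) \ge \lambda _{\min }(\varvec{a}) + \lambda _{\min }(\varvec{b})\), since \(\Vert \bar{\varvec{a}}+\bar{\varvec{b}}\Vert \le \Vert \bar{\varvec{a}}\Vert + \Vert \bar{\varvec{b}}\Vert \). The sketch below assumes the corresponding bound \(\alpha _{\varvec{v}} = -\lambda _{\min }(\varvec{v})/\lambda _{\min }(\Delta \varvec{v})\) when \(\lambda _{\min }(\Delta \varvec{v}) < 0\), and \(1/\epsilon \) otherwise; the data are illustrative.

```python
import numpy as np

def lam_min(v):                     # smallest eigenvalue: v0 - ||vbar||
    return v[0] - np.linalg.norm(v[1:])

# Third selection (assumed form): by superadditivity of lam_min,
# lam_min(v + a*dv) >= lam_min(v) + a*lam_min(dv) > 0 for a < alpha_v.
def alpha_third(v, dv, eps=1e-8):
    return -lam_min(v) / lam_min(dv) if lam_min(dv) < 0 else 1.0 / eps

x,  s  = np.array([2.0, 1.0, 0.0]), np.array([3.0, 0.0, 1.0])
dx, ds = np.array([-4.0, 0.0, 1.0]), np.array([1.0, 2.0, 0.0])
alpha = 0.99 * min(alpha_third(x, dx), alpha_third(s, ds))
assert lam_min(x + alpha * dx) > 0 and lam_min(s + alpha * ds) > 0
```

This selection needs only the extreme eigenvalues of \(\varvec{v}\) and \(\Delta \varvec{v}\); no Jordan products or inverses are formed.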
Lemma 4.4
(The fourth selection lemma). Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\). If \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\), then \(\varvec{x}^+, \varvec{s}^+ \succ \varvec{0}\), where for \(\varvec{v} \in \{\varvec{x}, \varvec{s} \}\) we have
where
Proof
Note that a vector \(\varvec{w} \in \mathcal {J}\) is positive definite (i.e., \(\varvec{w} \succ \varvec{0}\)) iff \(\text {L}(\varvec{w}_i)\) is positive definite (i.e., \(\varvec{w}_i \succ \varvec{0}\)) for \(i=1, 2, \ldots , q\) [1, Lemma 12]. Then, \(\varvec{x}^+= \varvec{x} + \alpha \Delta \varvec{x} \succ \varvec{0}\) if
Observe that \(\text {L}(\varvec{x}^+_i) = \text {L}(\varvec{x}_i + \alpha \Delta \varvec{x}_i) = \text {L}(\varvec{x}_i) + \alpha \text {L}(\Delta \varvec{x}_i)\) for \(i=1, 2, \ldots , q\). So, the inequality (14) is equivalent to
Because, for \(i=1, 2, \ldots , q\), we have
it is enough to find \(\alpha \) such that
or equivalently
This gives the expression of \(\alpha _{\varvec{x}}\).
Applying the same procedure for the vector \(\varvec{s}^+ = \varvec{s} + \alpha \Delta \varvec{s}\), we obtain \(\alpha _{\varvec{s}}\). Finally, we choose \(\alpha = \rho \min \{\alpha _{\varvec{x}}, \alpha _{\varvec{s}}\}\) with \(0< \rho < 1\). The proof is complete. \(\square \)
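The fourth selection works at the matrix level through \(\text {L}(\varvec{v})\). The sketch below uses the sufficient condition \(\alpha _{\varvec{v}} = \lambda _{\min }(\text {L}(\varvec{v}))/\Vert \text {L}(\Delta \varvec{v})\Vert _2\) (an assumption standing in for the displayed bound), which guarantees \(\lambda _{\min }\left( \text {L}(\varvec{v}) + \alpha \text {L}(\Delta \varvec{v})\right) \ge \lambda _{\min }(\text {L}(\varvec{v})) - \alpha \Vert \text {L}(\Delta \varvec{v})\Vert _2 > 0\) for \(\alpha < \alpha _{\varvec{v}}\); for a Cartesian product of q cones the minimum would be taken blockwise.

```python
import numpy as np

def arrow(v):                       # L(v), the arrow matrix of the second-order cone
    n = len(v)
    M = v[0] * np.eye(n)
    M[0, 1:] = v[1:]
    M[1:, 0] = v[1:]
    return M

# Fourth selection (assumed sufficient form): v + a*dv > 0 whenever the
# symmetric matrix L(v) + a*L(dv) is positive definite.
def alpha_fourth(v, dv, eps=1e-8):
    lam_min = np.linalg.eigvalsh(arrow(v)).min()
    norm2 = np.abs(np.linalg.eigvalsh(arrow(dv))).max()   # spectral norm of L(dv)
    return lam_min / norm2 if norm2 > eps else 1.0 / eps

x,  s  = np.array([2.0, 1.0, 0.0]), np.array([3.0, 0.0, 1.0])
dx, ds = np.array([-4.0, 0.0, 1.0]), np.array([1.0, 2.0, 0.0])
alpha = 0.99 * min(alpha_fourth(x, dx), alpha_fourth(s, ds))
assert np.linalg.eigvalsh(arrow(x + alpha * dx)).min() > 0
assert np.linalg.eigvalsh(arrow(s + alpha * ds)).min() > 0
```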
5 The central trajectory algorithm
The central trajectory algorithm for solving the symmetric programming problem is formally stated in Algorithm 5.1. Algorithm 5.1 selects a sequence of displacement steps \(\{\alpha ^{(k)} \}\) and centrality parameters \(\{\sigma ^{(k)}\}\) according to the following rule: for all \(k\ge 0\), we take \(\sigma ^{(k)} = 1 - \delta /\sqrt{r}\), where \(\delta \in \left[ 0, \sqrt{r}\right) \), as will be seen in the next section.
In the rest of this section, we prove that the complementarity gap and the function \(f_\mu \) decrease for a given displacement step. The proof of this result depends essentially on the following lemma.
Lemma 5.1
Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \text {int}\,\mathcal {K}_\mathcal {J} \times {\mathbb {R}}^m \times \text {int}\,\mathcal {K}_\mathcal {J}\), \(\left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) \) be obtained by applying scaling to \(\left( \varvec{x}; \varvec{y}; \varvec{s}\right) \), and \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}}\right) \) be a solution of the system (8). Then we have
-
(a)
\(\overline{\Delta \varvec{x}}\bullet \underline{\Delta \varvec{s}} = 0\).
-
(b)
\(\overline{\varvec{x}}\bullet \underline{\Delta \varvec{s}} + \overline{\Delta \varvec{x}}\bullet \underline{\varvec{s}} = \text {trace}(\varvec{h})\), where \(\varvec{h} = \sigma \mu \varvec{e} - \overline{\varvec{x}} \circ \underline{\varvec{s}}\) such that \(\sigma \in (0, 1)\) and \(\mu = \frac{1}{r} \overline{\varvec{x}}\bullet \underline{\varvec{s}}\).
-
(c)
\({\overline{\varvec{x}}^+}\bullet \underline{\varvec{s}}^+ =\left( 1-\alpha \left( 1-\frac{\sigma }{2}\right) \right) \; \overline{\varvec{x}}\bullet \underline{\varvec{s}}, \;\forall \alpha \in {\mathbb {R}}\), where \(\overline{\varvec{x}}^+ = \overline{\varvec{x}} + \alpha \overline{\Delta \varvec{x}}\) and \(\underline{\varvec{s}}^+ = \underline{\varvec{s}} + \alpha \underline{\Delta \varvec{s}}\).
-
(d)
\({\varvec{x}^+}\bullet \varvec{s}^+ =\left( 1-\alpha \left( 1-\frac{\sigma }{2}\right) \right) \; \varvec{x}\bullet \varvec{s}, \;\forall \alpha \in {\mathbb {R}}\), where \(\varvec{x}^+ = \varvec{x} + \alpha \Delta \varvec{x}\) and \({\varvec{s}}^+ = \varvec{s} + \alpha \Delta \varvec{s}\).
Proof
By the first two equations of the system (8), we get
This proves item (a).
We prove item (b) by noting that
where we used the last equation of the system (8) to obtain the first equality. To prove item (c), note that
where the third equality follows from items (a) and (b).
Finally, item (d) follows from item (c) and the fact that \(\overline{\varvec{x}}\bullet \underline{\varvec{s}}= \varvec{x}\bullet \varvec{s}\) (see (9)), and similarly that \({\overline{\varvec{x}}^+}\bullet \underline{\varvec{s}}^+= {{\varvec{x}}^+}\bullet {\varvec{s}}^+\). The proof is complete. \(\square \)
The following result generalizes the corresponding one in [23, Lemmas 4.2 and 4.3].
Lemma 5.2
Let \((\varvec{x}; \varvec{y}; \varvec{s})\) and \(\left( \varvec{x}^+; \varvec{y}^+; \varvec{s}^+\right) \) be strictly feasible solutions of the pair of problems \((P_\mu , D_\mu )\) with \(\left( \varvec{x}^+; \varvec{y}^+; \varvec{s}^+\right) = \left( \varvec{x} + \alpha \Delta \varvec{x};\; \varvec{y} + \alpha \Delta \varvec{y};\; \varvec{s} + \alpha \Delta \varvec{s}\right) \), where \(\alpha \) is a displacement step and \(\left( \Delta \varvec{x};\Delta \varvec{y};\Delta \varvec{s}\right) \) is the Newton direction. Then we have
-
(a)
\({\varvec{x}^+}\bullet \varvec{s}^+ < \overline{\varvec{x}}\bullet \underline{\varvec{s}}\).
-
(b)
\(f_\mu (\varvec{x}^+) < f_\mu (\varvec{x})\).
Proof
Note that
where the equality follows from item (d) of Lemma 5.1 and the strict inequality follows from \(\left( 1-\alpha \left( 1-\frac{\sigma }{2}\right) \right) < 1\) (as \(\alpha > 0\) and \(\sigma \in (0, 1)\)). This proves item (a).
To prove item (b), note that
and hence
Since
we have
where the strict inequality follows from the positive definiteness of the Hessian matrix \(\nabla ^2_{\varvec{x} \varvec{x}} f_\mu (\varvec{x})\) (as \(f_\mu \) is strictly convex). Thus, \(f_\mu (\varvec{x}^+) < f_\mu (\varvec{x})\). The proof is complete. \(\square \)
6 Complexity analysis
In this section, we analyze the complexity of the proposed central trajectory algorithm for symmetric programming. More specifically, we prove that the iteration-complexity of Algorithm 5.1 is bounded by
Our proof depends essentially on the following two lemmas.
Lemma 6.1
Let \((\varvec{x}; \varvec{y}; \varvec{s}) \in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\), \(\left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) \) be obtained by applying scaling to \(\left( \varvec{x}; \varvec{y}; \varvec{s}\right) \) with \(\varvec{h} = \sigma \mu \varvec{e} - \overline{\varvec{x}} \circ \underline{\varvec{s}}\), and \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}}\right) \) be a solution of the system (8). For any \(\alpha \in {\mathbb {R}}\), we set
Then
Proof
Given \(\alpha \in {\mathbb {R}}\), using item (c) of Lemma 5.1, we have
Thus, we get
This completes the proof. \(\square \)
Lemma 6.2
Let \((\varvec{x}; \varvec{y}; \varvec{s})\in \mathcal {F}^\circ _P \times \mathcal {F}^\circ _D\), \(\left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) \) be obtained by applying scaling to \(\left( \varvec{x}; \varvec{y}; \varvec{s}\right) \) such that \(\left\| \overline{\varvec{x}} \circ \underline{\varvec{s}} -\mu \varvec{e}\right\| \le \theta \mu \), for some \(\theta \in [0,1)\) and \(\mu >0\). Let also \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}}\right) \) be a solution of the system (8), \(\varvec{h} = \sigma \mu \varvec{e} - \overline{\varvec{x}} \circ \underline{\varvec{s}}\), \(\delta _{\varvec{x}}:=\mu \left\| \overline{\Delta \varvec{x}} \circ \overline{\varvec{x}}^{-1}\right\| _{F}, \delta _{\varvec{s}}:=\left\| \overline{\varvec{x}} \circ \underline{\Delta \varvec{s}} \right\| _{F}\). Then, we have
Proof
By the last equation of system (8) and from the operator commutativity, we have
It immediately follows that
where the second inequality follows from the assumption that \(\left\| \overline{\varvec{x}} \circ \underline{\varvec{s}} - \mu \varvec{e} \right\| \le \theta \mu \), and the third inequality follows from (3) and the fact that
which is zero by item (a) of Lemma 5.1.
The right-hand side inequality in (16) follows by noting that \((\delta _{\varvec{x}} - \delta _{\varvec{s}})^2 \ge 0\), and the left-hand side inequality in (16) follows from the last inequality in (17). The proof is complete. \(\square \)
The following theorem analyzes the behavior of one iteration of Algorithm 5.1. This theorem generalizes [23, Theorem 7.1].
Theorem 6.1
Let \(\theta \in (0, 1)\) and \(\delta \in \left[ 0, \sqrt{r}\right) \) be given such that
Suppose that \(\left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) \in \mathcal {N}_\theta (\mu )\) and let \(\left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}} \right) \) denote the solution of system (8) with \(\varvec{h} = \sigma \mu \varvec{e} - \overline{\varvec{x}} \circ \underline{\varvec{s}}\) and \(\sigma = 1 - \frac{\delta }{\sqrt{r}}\). Then, we have
-
(a)
\({\overline{\varvec{x}}^+}\bullet \underline{\varvec{s}}^+ = \left( 1 - \frac{\delta }{\sqrt{r}} \right) \overline{\varvec{x}}\bullet \underline{\varvec{s}}\).
-
(b)
\(\left( \overline{\varvec{x}}^+;\varvec{y}^+;\underline{\varvec{s}}^+\right) = \left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) + \left( \overline{\Delta \varvec{x}}; \Delta \varvec{y}; \underline{\Delta \varvec{s}} \right) \in \mathcal {N}_\theta (\mu )\).
-
(c)
\(\left( \varvec{x}^+;\varvec{y}^+;\varvec{s}^+\right) = \left( \varvec{x}; \varvec{y}; \varvec{s}\right) + \left( \Delta \varvec{x}; \Delta \varvec{y}; \Delta \varvec{s} \right) \in \mathcal {N}_\theta (\mu )\).
Proof
Item (a) follows directly from item (c) of Lemma 5.1 with \(\alpha = 1\) and \(\sigma = 1 -\frac{\delta }{\sqrt{r}}\). We now prove item (b). Define
and let \(\left( \overline{\varvec{x}}; \varvec{y}; \underline{\varvec{s}}\right) \in \mathcal {N}_\theta (\mu )\). We then have
Since \(\Vert \overline{\varvec{x}} \circ \underline{\varvec{s}} - \mu \varvec{e}\Vert \le \theta \mu \) and \(\varvec{h} = \sigma \mu \varvec{e} - \overline{\varvec{x}} \circ \underline{\varvec{s}}\), using Lemma 6.2 it follows that
Defining \(\varvec{\nu }^+ := \varvec{\nu }(1) = \overline{\varvec{x}}^+ \circ \underline{\varvec{s}}^+ - \mu ^+ \varvec{e}\) and using (15) with \(\alpha = 1\), (21), (20), (18) and (19), we get
Consequently,
By using the right-hand side inequality in (16), and using (20) and (18), we have
where the strict inequality follows from \(\theta \le \frac{1}{2}\) and \(0< 1 - \frac{\delta }{\sqrt{r}} < 1\).
One can easily see that \(\left\| \overline{\Delta \varvec{x}} \circ \overline{\varvec{x}}^{-1} \right\| _F < 1\) implies that \(\varvec{e} + \overline{\Delta \varvec{x}} \circ \overline{\varvec{x}}^{-1} \succ \varvec{0}\), and therefore
Note that, from (22), we have \(\lambda _{\min }\left( \overline{\varvec{x}}^+ \circ \underline{\varvec{s}}^+ \right) \ge (1-\theta ) \mu ^+ >0\), and therefore \(\overline{\varvec{x}}^+ \circ \underline{\varvec{s}}^+ \succ \varvec{0}\). Since \(\overline{\varvec{x}}^+ \succ \varvec{0}\) and \(\overline{\varvec{x}}^+\) and \(\underline{\varvec{s}}^+\) operator commute, we conclude that \(\underline{\varvec{s}}^+ \succ \varvec{0}\). Using the first equation of system (8), we get
By using the second equation of system (8), we get
Thus, in view of (22), we deduce that \(\left( \overline{\varvec{x}}^+; \varvec{y}^+; \underline{\varvec{s}}^+\right) \in \mathcal {N}_\theta (\mu )\). Item (b) is therefore established. Item (c) follows from item (b) and [1, Proposition 29]. The proof is now complete. \(\square \)
Corollary 6.1
Let \(\theta \) and \(\delta \) be as given in Theorem 6.1 and \(\left( \varvec{x}^{(0)}; \varvec{y}^{(0)}; \varvec{s}^{(0)}\right) \in \mathcal {N}_\theta (\mu )\). Then Algorithm 5.1 generates a sequence of points \(\left\{ \left( \varvec{x}^{(k)}; \varvec{y}^{(k)}; \varvec{s}^{(k)}\right) \right\} \subset \mathcal {N}_\theta (\mu )\) such that
Moreover, given a tolerance \(\epsilon >0\), Algorithm 5.1 computes an iterate \(\left( \varvec{x}^{(k)}; \varvec{y}^{(k)}; \varvec{s}^{(k)}\right) \) satisfying \(\varvec{x}^{(k)}\bullet \varvec{s}^{(k)} \le \epsilon \) in at most \(K= \mathcal {O}\left( \sqrt{r} \ln \left[ \epsilon ^{-1}\varvec{x}^{(0)}\bullet \varvec{s}^{(0)} \right] \right) \) iterations.
Proof
Applying item (a) of Theorem 6.1 recursively, for each k we have that
Taking the natural logarithm of both sides, we get
which holds only if
The result is established. \(\square \)
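The iteration bound can be checked numerically: since \(-\ln (1-t) \ge t\) for \(t \in (0,1)\), at most \(\lceil (\sqrt{r}/\delta ) \ln \left( \epsilon ^{-1}\varvec{x}^{(0)}\bullet \varvec{s}^{(0)}\right) \rceil \) contractions by the factor \(1-\delta /\sqrt{r}\) are needed. The parameters below are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative parameters: rank r, centering offset delta, tolerance eps,
# and initial complementarity gap gap0 = x0 . s0.
r, delta, eps, gap0 = 4, 0.25, 1e-6, 10.0
sigma = 1 - delta / np.sqrt(r)      # contraction factor per iteration

k, gap = 0, gap0
while gap > eps:                    # exact number of contractions needed
    gap *= sigma
    k += 1

# The O(sqrt(r) ln(eps^-1 gap0)) bound from Corollary 6.1, made explicit
# via -ln(1 - delta/sqrt(r)) >= delta/sqrt(r).
K = int(np.ceil(np.sqrt(r) / delta * np.log(gap0 / eps)))
assert k <= K
```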
7 Numerical experiments
In this section, we present two numerical examples for second-order cone programs to demonstrate the efficiency of our algorithm. Our numerical experiments were carried out on a PC with an Intel(R) Dual CPU at 2.20 GHz and 2 GB of physical memory, running MATLAB version 7.4.0.287 (R2007a) on a Windows XP Enterprise 32-bit operating system. We denote by
-
\(1\text {st}\) Sel.: the first selection for calculating the displacement step,
-
\(2\text {nd}\) Sel.: the second selection for calculating the displacement step,
-
\(3\text {rd}\) Sel.: the third selection for calculating the displacement step,
-
\(4\text {th}\) Sel.: the fourth selection for calculating the displacement step,
-
(n, m): the size of problems,
-
Iter.: the number of iterations taken to obtain the optimal solution,
-
CPU(s): the time (in seconds) required to obtain the optimal solution.
We point out that the matrices used in our examples have full row rank. In all of our experiments, we used the NT direction for choosing the scaling vector \(\varvec{p}^{(k)} \in \mathcal {C}\left( \varvec{x}^{(k)}, \varvec{s}^{(k)}\right) \) because this direction has the advantage of being primal-dual symmetric. The following example is taken from the literature, see [15, Section 6].
Example 7.1
Let n and m be positive integers such that \(n=2m\). We consider the second-order cone programming problem and its dual
where \(\mathcal {E}^{n}_+\) is the second-order cone defined in Example 2.3, and
Here, \(\mathbf{ones} (\ell , 1)\) is a vector of ones of length \(\ell \). We take \(\varvec{x}^{(0)} = \varvec{e} \in \mathcal {E}^n\) and \(\varvec{y}^{(0)} = \varvec{0} \in {\mathbb {R}}^m\) as our initial strictly feasible points. We also take \(\epsilon = 10^{-6}, \sigma = 0.1\) and \(\rho = 0.99\). The well-known interior-point solvers MOSEK [28] and SDPT3 [29] were used in our experiments for comparison purposes. The numerical results related to this example are displayed in Table 1.
Due to the similarity in results between Examples 7.1 and 7.2, we analyze the numerical results of Table 1 after Example 7.2.
Example 7.2
In this example, the proposed primal-dual central trajectory algorithm was tested on randomly generated second-order cone programming problems. We generate a random matrix \(A\in {\mathbb {R}}^{m \times n}\) with full row rank and random vectors \(\varvec{x}, \varvec{s} \in \mathcal {E}^n_+\) and \(\varvec{y} \in {\mathbb {R}}^m\). We let \(\varvec{b} = A \varvec{x}\) and \(\varvec{c} = A^{\mathsf{T}}\varvec{y} + \varvec{s}\). Let \(\varvec{x}^{(0)} = \varvec{e}\in \mathcal {E}^{n}, \varvec{y}^{(0)} = \varvec{0} \in {\mathbb {R}}^m\) and \(\varvec{s}^{(0)} = \varvec{e}\in \mathcal {E}^{n}\) be the initial points. Since the sets of strictly feasible solutions of (P) and (D) are non-empty, the generated test problems (P) and (D) have optimal solutions and their optimal values are equal. The parameters used in Algorithm 5.1 were as follows: \(\epsilon = 1.35 \times 10^{-6}, \sigma = 0.1\) and \(\rho = 0.99\).
In our experiments, we generated test problems with size \(n(=2m)\) from 100 to 1200 and \(r=1\). The random test problems for each n are generated 5 times, and hence we have 60 random problems in total. As in Example 7.1, we use the MOSEK [28] and SDPT3 [29] solvers for comparison purposes. We present our numerical results in Table 2. Note that the values of “Iter” and “CPU(s)” are the averages of 5 runs for each n.
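The construction of a strictly feasible random instance can be sketched as follows. We take \(A \in {\mathbb {R}}^{m \times n}\) so that \(\varvec{b} = A\varvec{x} \in {\mathbb {R}}^m\) is well-defined; the seed and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10, 5

def random_interior(n):            # x0 > ||xbar|| guarantees x in int E^n_+
    xbar = rng.standard_normal(n - 1)
    return np.concatenate(([np.linalg.norm(xbar) + 1.0], xbar))

A = rng.standard_normal((m, n))    # full row rank with probability 1
x, s = random_interior(n), random_interior(n)
y = rng.standard_normal(m)
b, c = A @ x, A.T @ y + s          # data chosen so feasibility holds by design

assert np.linalg.matrix_rank(A) == m
assert x[0] > np.linalg.norm(x[1:]) and s[0] > np.linalg.norm(s[1:])
assert np.allclose(A @ x, b)                  # x strictly feasible for (P)
assert np.allclose(c - A.T @ y, s)            # (y, s) strictly feasible for (D)
```

By construction both problems have strictly feasible points, so strong duality holds and the generated pair is a valid test instance.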
In view of Tables 1 and 2, we can see that Algorithm 5.1 with the first, second or fourth selection is able to produce an optimal solution in a favorable running time and requires fewer iterations than the MOSEK and SDPT3 solvers; this is probably because these solvers implement infeasible algorithms and use infeasible initial points. We can also see that the fourth selection for calculating the displacement step is remarkably superior to the other three selections in terms of the number of iterations and running time. It is interesting to note that our conclusion for second-order cone programming problems matches the conclusion obtained from the numerical experiments in [23, Section 5] for solving semidefinite programming problems.
8 Conclusions
In this paper, we have presented a primal-dual central trajectory interior-point method for solving the symmetric programming problem. We have proven the convergence of the optimal solution of the corresponding perturbed problem to the optimal solution of the original problem as the barrier parameter goes to zero. Then, we have applied Newton's method to find a new iterate by computing a good descent direction. The high computational cost of classical procedures, such as line search methods, motivated us to avoid using them to calculate the displacement step. Alternatively, in this paper, we have proposed a new approach based on four new selections for calculating the displacement step.
After stating the central trajectory algorithm, we have analyzed its convergence and have proven that the complexity of the short-step variant is bounded by \(\mathcal {O}(\sqrt{r} \ln [\epsilon ^{-1}\varvec{x}^{(0)}\bullet \varvec{s}^{(0)}])\) iterations, where r is the rank of the Cartesian product of the corresponding Euclidean Jordan algebras. Our numerical results for second-order cone programs have demonstrated the efficiency of our approach and have shown the convergence of the four proposed selections to the optimal solution of the problem. In terms of computing time and number of iterations, the numerical results on second-order cone programs have shown that the fourth selection is the best choice for reaching the optimal solution.
Notes
The direct sum of two square matrices A and B is the block diagonal matrix \(A \oplus B := \left[ \begin{array}{cc} A&{}0\\ 0&{}B\\ \end{array} \right] \).
References
Schmieta, S.H., Alizadeh, F.: Extension of primal-dual interior point methods to symmetric cones. Math. Program. Ser. A 96, 409–438 (2003)
Schmieta, S.H., Alizadeh, F.: Associative and Jordan algebras, and polynomial time interior point algorithms for symmetric cones. Math. Oper. Res. 26(3), 543–564 (2001)
Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. Ser. B 95, 3–51 (2003)
Alzalg, B.: Stochastic second-order cone programming: application models. Appl. Math. Model. 36, 5122–5134 (2012)
Todd, M.J.: Semidefinite optimization. ACTA Numer. 10, 515–560 (2001)
Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Rev. 38, 49–95 (1996)
Nesterov, Y.E., Todd, M.J.: Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim. 8(2), 324–364 (1998)
Alzalg, B., Ariyawansa, K.A.: Logarithmic barrier decomposition-based interior point methods for stochastic symmetric programming. J. Math. Anal. Appl. 409, 973–995 (2014)
Alzalg, B., Maggiono, F., Vitali, S.: Homogeneous self-dual methods for symmetric cones under uncertainty. Far East J. Math. Sci. 99(11), 1603–1778 (2016)
Benterki, D., Leulmi, A.: An improving procedure of the interior projective method for linear programming. Appl. Math. Comput. 199, 811–819 (2008)
Dodani, M., Babu, A.: Karmarkar’s projective method for linear programming: a computational appraisal. Comput. Ind. Eng. 16, 189–206 (1989)
Todd, M., Wang, Y.: On combined phase 1-phase 2 projective methods for linear programming. Algorithmica 9, 64–83 (1993)
Zhang, Y.: On extending primal-dual interior-point algorithms from linear programming to semidefinite programming. SIAM J. Optim. 8, 356–386 (1998)
Kuo, Y.-J., Mittelmann, H.D.: Interior point methods for second-order cone programming and OR applications. Comput. Optim. Appl. 28, 255–285 (2004)
Tang, J., He, G., Dong, L., Fang, L.: A new one-step smoothing Newton method for second-order cone programming. Appl. Math. 57, 311–331 (2012)
Alzalg, B.: Homogeneous self-dual algorithms for stochastic second-order cone programming. J. Optim. Theory Appl. 163(1), 148–164 (2014)
Alzalg, B.: Decomposition-based interior point methods for stochastic quadratic second-order cone programming. Appl. Math. Comput. 249, 1–18 (2014)
Alzalg, B.: Volumetric barrier decomposition algorithms for stochastic quadratic second-order cone programming. Appl. Math. Comput. 256, 494–508 (2015)
Kettab, S., Benterki, D.: A relaxed logarithmic barrier method for semidefinite programming. RAIRO Oper. Res. 42, 555–568 (2015)
Crouzeix, J.P., Merikhi, B.: A logarithm barrier method for semidefinite programming. RAIRO Oper. Res. 42, 123–139 (2008)
Monteiro, R.D.: Primal-dual path-following algorithms for semidefinite programming. SIAM J. Optim. 7, 663–678 (1997)
Helmberg, C., Rendl, F., Vanderbei, R.J., Wolkowicz, H.: An interior-point method for semidefinite programming. SIAM J. Optim. 6, 342–361 (1996)
Touil, I., Benterki, D., Yassine, A.: A feasible primal-dual interior point method for linear semidefinite programming. J. Comput. Appl. Math. 312, 216–230 (2017)
Kebbiche, Z., Keraghel, A., Yassine, A.: Extension of a projective interior point method for linearly constrained convex programming. Appl. Math. Comput. 193, 553–559 (2007)
Gahinet, P., Nemirovski, A.: The projective method for solving linear matrix inequalities. Math. Prog. 77, 163–190 (1997)
Behling, R., Gonzaga, C., Haeser, G.: Primal-dual relationship between Levenberg–Marquardt and central trajectories for linearly constrained convex optimization. J. Optim. Theory Appl. 162, 705–717 (2014)
Gould, N., Orban, D., Robinson, D.: Trajectory-following methods for large-scale degenerate convex quadratic programming. Math. Prog. Comput. 5, 113–142 (2013)
MOSEK is an optimization software designed to solve large-scale mathematical optimization problems. http://www.mosek.com/. Accessed 5 Oct 2017
Toh, K.C., Todd, M.J., Tutuncu, R.H.: SDPT3 version 4.0: a MATLAB software for semidefinite-quadratic-linear programming (2009). Available online at: http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
Wolkowicz, H., Styan, G.-P.-H.: Bounds for eigenvalues using traces. Linear Algebra Appl. 29, 471–506 (1980)
Faraut, J., Korányi, A.: Analysis on Symmetric Cones. Oxford University Press, Oxford (1994)
Kojima, M., Shindoh, S., Hara, S.: Interior-point methods for the monotone semidefinite linear complementarity problem in symmetric matrices. SIAM J. Optim. 7(1), 86–125 (1997)
Lütkepohl, H.: Handbook of Matrices. Wiley, Chichester (1996)
Acknowledgements
A part of this work was performed while the author was visiting The Center for Applied and Computational Mathematics at Rochester Institute of Technology, NY, USA. The work of the author was supported in part by the Deanship of Scientific Research at the University of Jordan. The author thanks the two anonymous expert referees for their valuable suggestions. The constructive comments from the referees have greatly enhanced the paper.
Cite this article
Alzalg, B. A primal-dual interior-point method based on various selections of displacement step for symmetric optimization. Comput Optim Appl 72, 363–390 (2019). https://doi.org/10.1007/s10589-018-0045-8
Keywords
- Symmetric programming
- Interior-point methods
- Primal-dual methods
- Central trajectory methods
- Jordan algebras