Abstract
In this paper, we design a class of infeasible interior-point methods for linear optimization based on large neighborhood. The algorithm is inspired by a full-Newton step infeasible algorithm with a linear convergence rate in problem dimension that was recently proposed by the second author. Unfortunately, despite its good numerical behavior, the theoretical convergence rate of our algorithm is worse up to square root of problem dimension.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
We consider the linear optimization (LO) problem in standard format, given by (P), and its dual problem given by (D) (see Sect. 3). We assume that the problems (P) and (D) are feasible. As is well known, this implies that both problems have optimal solutions and the same optimal value. In Sect. 8, we discuss how infeasibility and/or unboundedness of (P) and (D) can be detected.
Interior-point methods (IPMs) for LO are divided into two classes: feasible interior-point methods (FIPMs) and infeasible interior-point methods (IIPMs). FIPMs assume that a primal-dual strictly feasible point is available from which the algorithm can immediately start. In order to get such a solution several initialization methods have been presented, e.g., by Megiddo [1] and Anstreicher [2]. The so-called big M method of Megiddo has not been received well due to the numerical instabilities caused by the use of huge coefficients. A disadvantage of the so-called combined phase-I and phase-II method of Anstreicher is that the feasible region must be nonempty and a lower bound on the optimal objective value must be known.
IIPMs have the advantage that they can start with an arbitrary point and try to achieve feasibility and optimality, simultaneously. It is worth to mention that the best-known iteration bound for IIPMs is due to Mizuno [3]. We refer to, e.g., [4] for a survey of IIPMs.
Recently, the second author proposed an IIPM with full-Newton steps [4]. Each main iteration of the algorithm improves optimality and feasibility simultaneously, by a constant factor until a solution with a given accuracy is obtained. The amount of reduction is too small for practical purposes. The aim of this paper is to design a more aggressive variant of the algorithm which improves optimality and feasibility faster. We emphasize that this is our aim and also what happens in practice. As we will see, however, our algorithm suffers the same irony that occurs for FIPMs, namely that the theoretical convergence rate of large-update methods is worse than for full-Newton methods.
In the analysis of our algorithm, we use so-called kernel function-based barrier function. Any such barrier function is based on a univariate function, named its kernel function.Footnote 1 Such functions can be found in [5] and are closely related to the so-called self-regular functions introduced in [6]. In these references only FIPMs are considered, and it is shown that these functions are much more efficient for the process of re-centering, which is a crucial part in every FIPM, especially when an iterate is far from the central path. Not surprising, it turned out that these functions are also useful in our algorithm, where re-centering is also a crucial ingredient.
The paper is organized as follows. In Sect. 2, we describe the notations which we use throughout the paper. In Sect. 3, we explain the properties of a kernel function-based barrier function. In Sect. 4, as a preparation to our large neighborhood-based IIPM, we briefly recall the use of kernel-based barrier functions in large-update FIPMs, as presented in [5]. It will become clear in this section that the convergence rate highly depends on the underlying kernel function.
In Sect. 5, we describe our algorithm in detail. In our description we use a search direction based on a general kernel function. The algorithm uses two types of damped Newton steps: a so-called feasibility step and some centering steps. The feasibility step serves to reduce the infeasibility, whereas the centering steps keep the infeasibility fixed, but improve the optimality. This procedure is repeated until a solution with enough feasibility and optimality is obtained. Though many parts of our analysis are valid for general kernel function, at some places we restrict ourselves to a specific kernel function. We show that the complexity of our algorithm based on this kernel function is slightly worse than the complexity of the algorithm introduced and discussed by Salahi et al. [7], who used another variant of this function. Note that the best-known iteration bound for the IIPMs that use a large neighborhood of the central path is due to Potra and Stoer [8] for a class of superlinearly convergent IIPMs for sufficient linear complementarity problems (LCP).
2 Notations
We use the following notational conventions. If \(x, s \in \mathbb {R}^n\), then xs denotes the coordinatewise (or Hadamard) product of the vectors x and s. The nonnegative orthant and positive orthant are denoted as \(\mathbb {R}^n_+\) and \(\mathbb {R}^n_{++}\), respectively. If \(z\in \mathbb {R}_+^n\) and \(f: \mathbb {R}_+\rightarrow \mathbb {R}_+\), then \(f\left( z\right) \) denotes the vector in \(\mathbb {R}_+^n\) whose i-th component is \(f\left( z_i\right) \), with \(1\le i \le n\). We write \(f(x)=O(g(x))\), if for any x in the domain of f, \(f(x)\le cg(x)\) for some positive constant c. For \(z\in \mathbb {R}^n\), we denote the \(l_1\)-norm by \(\left\| z\right\| _1\), and the Euclidean norm by \(\left\| z\right\| \).
3 Barrier Functions Based on Kernel Functions
Let the primal LO problem be defined as
and its dual problem as
Here, A is a full row rank matrix in \(\mathbb {R}^{m\times n},\,b,y\in \mathbb {R}^m\), \(c,x,s\in \mathbb {R}^n\); x, y and s are vectors of variables.
It is well known that the triple \(\left( x,y,s\right) \) is optimal for (P) and (D) if and only if
The first two equations require primal and dual feasibility, respectively, and \(xs = 0\) is the so-called complementarity condition. Note that since x and s are nonnegative, \(xs = 0\) holds if and only if \(x^Ts=0\). Therefore, since the feasibility conditions imply \(x^Ts = c^Tx-b^Ty\), the complementarity condition resembles the fact that at optimality the primal and dual objective values coincide.
Interior-point methods (IPMs) replace the complementarity condition by \(xs=\mu {\mathbf{e}},\) where \(\mu >0\). This gives rise to the following system:
where \(xs=\mu {\mathbf{e}}\) is called the centering condition. It has been established (see, e.g., [9, Theorem II.4]) that the system (2) has a solution for some \(\mu >0\) if and only if it has a solution for every \(\mu >0\). Moreover, this holds if and only if (P) and (D) satisfy the interior-point condition (IPC), i.e., there exist a feasible \(x>0\) for (P) and a feasible (y, s) with \(s>0\) for (D). If the matrix A has full row rank, then this solution is unique. It is denoted by \(\left( x(\mu ),y(\mu ),s(\mu )\right) \) and called the \(\mu \)-center of (P) and (D). The \(\mu \)-centers form a virtual path inside the feasibility region leading to an optimal solution of (P) and (D). This path is called the central path of (P) and (D). When driving \(\mu \) to zero, the central path converges to an optimal triple (x, y, s) for (P) and (D).
We proceed by showing that the \(\mu \)-center can be characterized as the minimizer of a suitably chosen primal-dual barrier function. In fact we will define a wide class of such barrier functions, each of which is determined by a kernel function. A kernel function is just a univariate nonnegative function \(\psi (t)\), where \(t>0\), which is strictly convex, minimal at \(t=1\) and such that \(\psi (1)=0\), whereas \(\psi (t)\) goes to infinity both when t goes to zero and when t goes to infinity.
Now let (x, y, s) be a feasible triple with \(x>0\) and \(s>0\). We call such a triple strictly feasible. We define the variance vector for this triple with respect to \(\mu \) as follows:
Observe that \(v={\mathbf{e}}\) holds if and only if (x, y, s) is the \(\mu \)-center of (P) and (D). Given any kernel function \(\psi \) we extend its definition to \(\mathbb {R}^n_{++}\) according to
It is obvious that \(\varPsi (v)\) is nonnegative everywhere, and \(\varPsi ({\mathbf{e}})=0\). Yet we can define a barrier function \(\varPhi (x,s,\mu )\) as follows:
where v is the variance vector of (x, y, s) with respect to \(\mu \). It is now obvious that \(\varPhi (x,s,\mu )\) is well-defined, nonnegative for every strictly feasible triple, and moreover,
This implies that \((x(\mu ),y(\mu ),s(\mu ))\) is the (unique) minimizer of \(\varPhi (x,s,\mu )\).
As in [5], we call the kernel function \(\psi \) eligible iff it satisfies the following technical conditions.
In the sequel it is always assumed that \(\psi \) is an eligible kernel function. Properties of eligible kernel functions will be recalled from [5] without repeating their proofs.
We would like to mention that a kernel function \(\psi \), introduced and discussed in [5], has the following format
The term \(\tfrac{t^2-1}{2}\) is called the growth term which goes to infinity as t goes to infinity. \(\psi _b\) has the property that it goes to infinity as t tends to zero. This term is called the barrier term of \(\psi \).
4 A Class of Large-update FIPMs for LO
In this section we recall from [5] some results for a large-update FIPM for solving (P) and (D) using a kernel function-based barrier function.
Let us start with a definition. We call a triple (x, y, s), with \(x>0,\,s>0,\) an \(\varepsilon \)-solution of (P) and (D) iff the duality gap \(x^Ts\) and the norms of the residual vectors \(r_b:= b - A x, \text{ and } r_c:= c-A^Ty - s\), do not exceed \(\varepsilon \). In other words, defining \(\epsilon (x,y,s):=\max \{x^Ts,\Vert r_b\Vert ,\Vert r_c\Vert \},\) we say that (x, y, s) is an \(\varepsilon \)-solution if \(\epsilon (x,y,s)\le \varepsilon .\)
We assume, without any loss of generality, that the triple
is a primal-dual feasible solution.Footnote 2 We then have \(x^0s^0=\mu ^0{\mathbf{e}}\) for \(\mu ^0=1\). This means that \((x^0,y^0,s^0)\) is the 1-center, and hence \(\varPhi (x^0,s^0,\mu ^0)=0\). We use this triple to initialize our algorithm.
Each main (or outer) iteration of the algorithm starts with a strictly feasible triple (x, y, s) that satisfies \(\varPhi (x,s,\mu )\le \tau \) for some \(\mu \in ]0,1]\), where \(\tau \) is a fixed positive constant. It then constructs a new triple \((x^+,y^+,s^+)\) such that \(\varPhi (x^+,s^+,\mu ^+)\le \tau \) with \(\mu ^+<\mu \). When taking \(\tau \) small enough, we obtain in this way a sequence of strictly feasible triples that belong to small neighborhoods of a sequence of \(\mu \)-centers, for a decreasing sequence of \(\mu \)’s. As a consequence, the sequence of constructed triples (x, y, s) converges to an optimal solution of (P) and (D).
We will assume that \(\mu ^+ = (1-\theta )\mu \), where \(\theta \in ]0,1[\) is a fixed constant, e.g., \(\theta =0.5\) or \(\theta =0.99\). The larger the \(\theta \), the more aggressive the algorithm is. In particular when \(\theta \) is large, each outer iteration will require several so-called inner iterations.
It is easy to compute the number the outer iterations. Using \(\mu ^0=1\) one has the following result.
Lemma 4.1
(cf. [9, Lemma II.17]) The algorithm needs at most
outer iterations to generate a strictly feasible \(\varepsilon \)-solution.
The main task is therefore to get a sharp upper estimate for the number of inner iterations during an outer iteration. We now describe how such an estimate is obtained. We go into some detail, though without repeating proofs, because the results that we recall below are relevant for the IIPM that we discuss in the next section.
As said before, at the start of each outer iteration we have a strictly feasible triple (x, y, s) and \(\mu >0\) such that \(\varPhi (x,s,\mu ) \le \tau \). We first need to estimate the increase in \(\varPhi \) when \(\mu \) is updated to \(\mu ^+ = (1-\theta )\mu \). For this we need the following Lemma.
Lemma 4.2
(cf. [5, Theorem 3.2]) Let \(\rho :[0,\infty [\rightarrow [1,\infty [\) be the inverse function of \(\psi (t)\) for \(t\ge 1\). Then we have for any positive vector v and any \(\beta \ge 1\):
Now let v be the variance vector of (x, y, s) with respect to \(\mu \). Then one easily understands that the variance vector \(v^+\) of (x, y, s) with respect to \(\mu ^+\) is given by \(v^+:=v/\sqrt{1-\theta }\). Hence, using Lemma 4.2 with \(\beta = 1/\sqrt{1-\theta }\) we may write
where the last inequality holds because \(\rho \) is monotonically increasing and \(\varPsi (v):=\varPhi (x,s,\mu ) \le \tau \). Hence the number \(\bar{\tau }\), defined by
is an upper bound for the value of \(\varPsi \) after a \(\mu \)-update. Note that this bound depends not on the triple (x, y, s), but only on the kernel function \(\psi \) and the parameters \(n,\tau \) and \(\theta \).
To simplify the notation we redefine \(\mu \) according to \(\mu :=\mu ^+\). Thus we need to deal with the following question: given a triple (x, y, s) such that \(\varPhi (x,s,\mu ) \le \bar{\tau }\), how many inner iterations are needed to generate a triple (x, y, s) such that \(\varPhi (x,s,\mu ) \le \tau \). To answer this question we have to describe an inner iteration. It has been argued in Section 2.2. of [5] that it is natural to define search directions \((\varDelta x,\varDelta y,\varDelta s)\) by the system
This system has a unique solution. It may be worth pointing out that if \(\psi (t) = \psi _1(t)\), then \(-\mu v\nabla \varPsi (v)=\mu {\mathbf{e}}-xs\), and hence the resulting direction is the primal-dual Newton direction that is used in all primal-dual FIPMs. By doing a line search in this direction with respect to \(\varPsi \) we get new iterates
where \(\alpha \) is the step size. According to [5, Lemma 4.4], we use below the following default step size:
where \(\rho \) is the inverse function of \(-\tfrac{1}{2}\psi ^{\prime }(t)\), and
Fig. 1 shows a formal description of the algorithm.
The closeness of (x, y, s) to the \(\mu \)-center is measured by \(\varPsi (v)\), where v is the variance vector of (x, y, s) with respect to the current value of \(\mu \). The initial triple (x, y, s) is as given by (5) and \(\mu =1\). So we then have \(\varPsi (v)=0 \le \tau \). After a \(\mu \)-update we have \(\varPsi (v) \le \bar{\tau }\). Then a sequence of inner iterations is performed to restore the inequality \(\varPsi (v)\le \tau \). Then \(\mu \) is updated again and so on. This process is repeated until \(n\mu \) falls below the accuracy parameter \(\varepsilon \) after which we have obtained an \(\varepsilon \)-solution.
To estimate the number of inner iterations we proceed as follows. Denoting the decrease in the value of \(\varPsi \) as \(\varDelta \varPsi \), it was shown in [5, Theorem 4.6] that
Since the kernel function \(\psi \) is eligible, the last expression is increasing in \(\delta (v)\) [5, Lemma 4.7]. Besides, by [5, Theorem 4.9], \(\delta (v)\) is bounded from below as follows:
Combining (7) and (8), we arrive at
Following [5], let \(\gamma \) be the smallest number such that
for some positive constant \(\kappa \), whenever \(\varPsi (v)\ge \tau \). In [10], it is established that such constants \(\kappa \) and \(\gamma \) exist for each kernel functions. When denoting the value of the barrier function after the \(\mu \)-update as \(\varPsi _0\) and the value after the k-th inner iteration as \(\varPsi _k\), it follows from (9) and (10) that
with \(\bar{\tau }\) as in (6). At this stage we may point out why the use of kernel functions other than the logarithmic kernel function may be advantageous. The reason is that if \(\psi \) is the logarithmic kernel function, then \(\gamma =1\), whence we obtain \(\varPsi _k \le \varPsi _{k-1} - \kappa \) for each \(k\ge 1\), provided that \(\varPsi _{k-1}\ge \tau \). This resembles the well-known fact that the best lower bound for the decrease in the logarithmic barrier function is a fixed constant, no matter what the value of \(\varPsi (v)\) is. As we will see, smaller values of \(\gamma \) can be obtained for other kernel functions, which leads to larger reductions in the barrier function value, and hence lower iteration numbers.
By [5, Lemma 5.1], (11) implies that the number of inner iterations will not exceed
Multiplying this number by the number of outer iterations, as given by Lemma 4.1, we obtain the following upper bound for the total number of iterations:
Given a kernel function \(\psi \), it is now straightforward to compute the resulting iteration bound from this expression.
Let now \(\psi (t) = \psi _q(t)\), where
This function is introduced and discussed in [11] whereby it is established that \(\gamma = \tfrac{q+1}{2q}\) and \(\kappa = \tfrac{1}{124\,q}\). As a result, the iteration number turns out to be
The expression \(qn^{\tfrac{q+1}{2q}}\) is minimal at \(q = \tfrac{1}{2}\log n\) and then it is equal to \(\tfrac{1}{2} e \log n \sqrt{n}\). Hence we obtain the iteration bound
for the algorithm, which is the best-known iteration bound for large-update FIPMs based on kernel functions. It should be mentioned that the best-known iteration bound for the FIPMs that use a large neighborhood of the central path is due to Potra [12] who designed a superlinearly convergent predictor–corrector algorithm for linear complementarity problems that has an \(O(\sqrt{n}L)\) iteration bound, with L denoting the length of a binary string encoding the problem data.
In this paper, we consider a IIPM based on the use of a kernel function. Although many of the results below hold for any eligible kernel function, we will concentrate of the kernel \(\psi _q\).
5 A Class of Large-Update IIPMs for LO
As usual in the theoretical analysis of IIPMs, we take the initial iterates as follows:
where \({\mathbf{e}}\) denotes the all-one vector in \(\mathbb {R}^n\), and \(\zeta \) is a number such that
for some optimal solutions \(\left( x^*,y^*,s^*\right) \) of (P) and (D). It is worth noting that if the data A, b, c are integral and L denotes the length of a binary string encoding the triple (A, b, c), then it is well known that there exist optimal solutions \(x^*\) of (P) and \((y^*,s^*)\) of (D) that satisfy (14) with \(\zeta =2^L\) [4, Section 4.7].
Following [4], for any \(0<\nu \le 1\) we consider the primal-dual perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), defined as follows:
and
where \( r_b^0\) and \( r_c^0\) denote the initial primal and dual residual vectors, respectively. Note that, if \(\nu =0\), then the perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) coincides with the original pair (P) and (D). Moreover, \((x^0,y^0,s^0)\) is strictly feasible for the perturbed problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) if \(\nu = 1\). In other words, if \(\nu = 1\), then (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC and since \(x^0s^0 = \mu ^0{\mathbf{e}}\), with \(\mu ^0 = \zeta ^2\), the \(\mu ^0\)-center of \((P_1)\) and \((D_1)\) is given by \(\left( x^0,y^0,s^0\right) \). The following Lemma is crucial.
Lemma 5.1
(cf. [13, Theorem 5.13]) The original problems (P) and (D) are feasible if and only if for each \(\nu \) satisfying \(0 < \nu \le 1\) the perturbed problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC.
Hence, for any \(\nu \in ]0,1]\), the problems (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC. This implies that the central path of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) exists. The \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) is uniquely determined by the system
In the sequel, the parameters \(\nu \) and \(\mu \) will always satisfy \(\mu =\nu \zeta ^2\). Therefore, we feel free to denote \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) as \(\left( x(\nu ),y(\nu ),s(\nu )\right) \). The set of these \(\mu \)-centers, as \(\nu \) runs through the interval ]0, 1], is called the homotopy path of (P) and (D) with respect to \(\zeta \). By driving \(\nu \) to zero, the homotopy path converges to an optimal solution of (P) and (D) [14]. Our algorithm starts at the \(\mu \)-center for \(\nu =1\) and then follows this homotopy path to obtain an \(\varepsilon \)-solution of (P) and (D). Note that along the homotopy path the residual vectors are given by \(\nu r_b^0\) and \(\nu r_c^0\), whereas the duality gap by \(\nu n \zeta ^2\). So these quantities converge to zero with the same speed. As a consequence we have
5.1 An Outer Iteration of the Algorithm
Just as in the case of large-update FIPMs we use a primal-dual barrier function to measure closeness of the iterates to the \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)). So, just as in Sect. 4, \(\varPsi (v)\) will denote the barrier function based on the kernel function \(\psi (t)\), as given in (3). Here, v denotes the variance vector of a triple (x, y, s) with respect to \(\mu >0\), and we define \(\varPhi (x,s,\mu )\) as in (4). The algorithm is designed in such a way that at the start of each outer iteration we have \(\varPsi (v) \le \tau \) for some threshold value \(\tau = O(1)\). As \(\varPsi (v) = 0\) at the starting points (13), the condition \(\varPsi (v) \le \tau \) is certainly satisfied at the start of the first outer iteration.
Each outer iteration of the algorithm consists of a feasibility step and some centering steps. At the start of the outer iteration we have a triple (x, y, s) that is strictly feasible for (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), for some \(\nu \in ]0,1]\), and that belongs to the \(\tau \)-neighborhood of the \(\mu \)-center of (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)), where \(\mu = \nu \zeta ^2\). We first perform a feasibility step during which we generate a triple \(\left( x^f,y^f,s^f\right) \) which is strictly feasible for the perturbed problems (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), where \(\nu ^+ = (1-\theta )\nu \) with \(\theta \in ]0,1[\), and moreover, close enough to the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), with \(\mu ^+ = \nu ^+\zeta ^2\), i.e., \(\varPhi \left( x^f,s^f;\mu ^+\right) \le \tau ^f\), for some suitable value of \(\tau ^f\).
After the feasibility step, we perform some centering steps to get a strictly feasible triple \(\left( x^+,y^+,s^+\right) \) of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)) in the \(\tau \)-neighborhood of the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), where \(\mu ^+ = (1-\theta )\mu = \nu ^+\zeta ^2\). During the centering steps the iterates stay feasible for (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). Hence for the analysis of the centering steps we can use the analysis for FIPMs, presented in Sect. 4. From this analysis we derive that the number of centering steps will not exceed
where the parameters \(\gamma \) and \(\kappa \) depend on the kernel function \(\psi \). Hence we are left with the problem of defining a suitable search direction \(\left( \varDelta ^f x, \varDelta ^f y, \varDelta ^f s\right) \) for the feasibility step and to determine \(\theta \) such that after the feasibility step we have \(\varPhi \left( x^f,s^f,\mu ^+\right) \le \tau ^f\) for some suitable value of \(\tau ^f\). The number of outer iterations will be \(\tfrac{1}{\theta }\log \left( \frac{\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})}{\varepsilon }\right) \). Thus, the total number of iterations will not exceed
5.2 Feasibility Step
For the search direction in the feasibility step, we use the triple \(\left( \varDelta ^f x, \varDelta ^f y, \varDelta ^f s\right) \) that is (uniquely) defined by the following system:
Then, defining the new iterates according to
we have, due to (16a),
In the same way one shows that \(c-A^Ty^f - s^f = \nu ^+ r_c^0\). Hence it remains to find \(\theta \) such that \(x^f\) and \(s^f\) are positive and \(\varPhi (x^f,s^f,\mu ^+) \le \tau ^f\). This is the hard part of the analysis of our algorithm, which we leave to the subsection below. We conclude this part with a formal description of the algorithm, in Fig. 2, and a graphical illustration in Fig. 3.
The straight lines in Fig. 3 depict the central paths of the pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) and the pair (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). The \(\tau \)-neighborhoods of the \(\mu \)- and \(\mu ^+\)-centers are shown by the dark gray circles. The light gray region specifies the \(\tau ^f\)-neighborhood of the \(\mu ^+\)-center of (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)). The feasibility step is depicted by the first arrow at the right-hand side. The rest of the arrows stand for the centering steps. The iterates are shown by the circlets.
5.3 Analysis of the Feasibility Step
The feasibility step starts with some strictly feasible triple (x, y, s) for (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) and \(\mu =\nu \zeta ^2\) such that
Our goal is to find \(\theta \) such that after the feasibility step, with step size \(\theta \), the iterates \(\left( x^f,y^f,s^f\right) \) lie in the \(\tau ^f\)-neighborhood of the \(\mu ^+\)-center of the new perturbed pair (\(\hbox {P}_{\nu ^+}\)) and (\(\hbox {D}_{\nu ^+}\)), with \(\tau ^f=O(n)\). This means that \(\left( x^f,y^f,s^f\right) \) are such that
To proceed, we define scaled feasibility directions \(d_x^f\) and \(d_s^f\) as follows:
Using these notations, we may write
This shows that \(x^f\) and \(s^f\) are positive if and only if \(v + \theta d_x^f\) and \(v + \theta d_s^f\) are positive. On the other hand, using (17), we can reformulate (16c) as follows:
Therefore, \(d_s^f=- d_x^f\). As a consequence, \(x^f\) and \(s^f\) are positive if and only if \(v \pm \theta d_x^f>0\). Since \(v>0\), this is equivalent to \(v^2 - \theta ^2( d_x^f)^2>0\). We conclude that \(x^f\) and \(s^f\) are positive if and only if \(0\le \theta <\theta _{\max }\), where
Yet we turn to the requirement that \(\varPsi ( v^f) \le \tau ^f\). Using (18), (19) and \(xs=\mu v^2\), we write
Hence, if \(\theta < \theta _{\max }\) then we may write
Lemma 5.2
Let \(\theta \) be such that \(1/\sqrt{1-\theta }=O(1)\). Then \(\varPsi (\hat{v})= O(n)\) implies \(\varPsi ( v^f)= O(n)\).
Proof
By Lemma 4.2, we have
Let \(\varPsi (\hat{v})= O(n)\). Then \(\varPsi (\hat{v})\le Cn\) for some positive constant C. Hence \(\varPsi (\hat{v})/n\le C\). Recall that \(\rho (s)\ge 1\) for all \(s\ge 0\) and \(\rho (s)\) is monotonically increasing. Furthermore, \(\psi (t)\) is monotonically increasing for \(t\ge 1\). Hence we obtain \(\varPsi ( v^f)\le n\psi \left( {\rho \left( C\right) }/{\sqrt{1-\theta }}\right) \). Since \({1}/{\sqrt{1-\theta }}=O(1)\), the coefficient of n in the above upper bound for \(\varPsi ( v^f)\) does not depend on n. Hence the Lemma follows. \(\square \)
Due to Lemma 5.2 it suffices for our goal to find \(\theta \) such that \(\varPsi ({\hat{v}}) \le {\hat{\tau }}\) where \(\hat{\tau }=O(n)\). In the sequel, we consider \(\varPsi (\hat{v})\) as a function of \(\theta \), denoted as \(f_1(\theta )\). So we have
We proceed by deriving a tight upper bound for \(f_1(\theta )\), thereby using similar arguments as in [5]. Since the kernel function \(\psi (t)\) is eligible, \(\varPsi (v)\) is e-convex (cf. [5, Lemma 2.1]), whence we have
The first and the second derivatives of \(f(\theta )\) are as follows:
Since \(\psi ^{\prime \prime \prime }(t) < 0,\, \forall t > 0\), it follows that \(\psi ^{\prime \prime }(t)\) is monotonically decreasing. From this we deduce that
where \(v_{\min }:=\min (v)\) and \(\theta \) small enough, i.e., such that \(v_{\min }-\theta \Vert d_x^f\Vert >0\). Substitution into (21) gives
By integrating both sides of this inequality with respect to \(\theta \), while using that \(f^{\prime }(0)=0\), as follows from (20), we obtain
Integrating once more, we get
where the last inequality holds because \(\psi \) is convex.Footnote 3
The first derivative with respect to \(v_{\min }\) of the right-hand side expression in this inequality is given by \((\psi ^{\prime \prime }(v_{\min }) - \psi ^{\prime \prime }(v_{\min } - \theta \Vert d_x^f\Vert ))\theta \Vert d_x^f\Vert \). Since \(\psi ^{\prime \prime }\) is (strictly) decreasing, this derivative is negative. Hence it follows that the expression is decreasing in \(v_{\min }\). Therefore, when \(\theta \) and \(\Vert d_x^f\Vert \) are fixed, the less the \(v_{\min }\) is, the larger the expression will be. Below we establish how small \(v_{\min }\) can be when \(\delta (v)\) is given.
For each coordinate \(v_i\) of v we have \(|\psi ^{\prime }(v_{\min })|/2 \le \Vert \nabla \varPsi (v)\Vert /2 = \delta (v)\), which means that
Since the inverse function \(\rho \) of \(-\psi ^{\prime }/2\) is monotonically decreasing, this is equivalent to
Hence the smallest possible value of \(v_{\min }\) is \(\rho (\delta (v))\), and this value is attained in the (exceptional) case where \(v_{\min }\) is the only coordinate of the vector v that differs from 1. So we may assume that \(v_{\min } = \rho (\delta (v))\). This implies \(-\psi ^{\prime }(v_{\min })/2 = \delta (v)\) and hence \(\psi ^{\prime }(v_{\min })\le 0\), whence \(v_{\min }\le 1\).
In the sequel, we denote \(\delta (v)\) simply as \(\delta \). Substitution into (22) gives that
Hence we certainly have \(f(\theta )\le \hat{\tau }\) if
Since \(f(0)=\varPsi (v) \le \tau \), it can be verified that the last inequality holds if
Since \(\rho \) is decreasing, this is equivalent to
Note that if \(\theta \) approaches zero, then the left-hand side expression converges to \(\rho (\delta )\) and the right-hand side expression to zero. The left-hand side is decreasing in \(\theta \), whereas the right-hand side is increasing. The largest possible \(\theta \) makes both sides equal. In order to get a tight approximation for this value, we first need to estimate \(\Vert d_x^f\Vert \). This is done in the next section, where we follow a similar approach as in [15].
5.3.1 Upper Bound for \(\Vert d_{x}^{f}\Vert \)
One may easily check that the system (16), which defines the search directions \(\varDelta ^{f}x\), \(\varDelta ^{f}y\), and \(\varDelta ^{f}s\), can be expressed in terms of the scaled search directions \( d_{x}^{f}\) and \( d_{s}^{f}\) as follows:
where
At this stage we use that the initial iterates are given by (13) and (14), so we have
where \(\zeta > 0\) is such that
for some optimal solutions \(x^{*}\) of (P) and \((y^{*}, s^{*})\) of (D).
Using a similar argument as given in [15, Section 4.3], it can be proven that
In [15, Lemma 4.3], it is shown that
Using \(xs=\mu v^2\) and (23) we may write
Substitution in (31) yields that
Substitution in (30), also using that \(v_{\min }\ge \rho (\delta )\), we obtain
5.3.2 Condition for \(\theta \)
Yet we return to the condition (24) on \(\theta \):
The right-hand side is increasing in \(\Vert d_{x}^{f}\Vert \). Therefore, due to (32), it suffices if
Obviously, this implies that \(\theta n(1+\rho (-\delta )^2)< \rho (\delta )^2\). Therefore, there exists \(\alpha \in ]0,1[\), independent of n, such that
Clearly, using this \(\theta \), the expression (33) can be restated as
Our objective is to find the largest possible \(\alpha \) satisfying this inequality. In order to proceed we need bounds for \(\delta =\delta (v)\) and \(\rho (\delta )\). The next section deals with some technical lemmas which are useful in deriving these bounds.
6 Some Technical Lemmas
Recall that \(\rho \) is defined as the inverse function of \(-\tfrac{1}{2}\psi ^{\prime }(t)\) and \(\rho \) as the inverse function of \(\psi (t)\) for \(t\ge 1\). We also need the inverse function of \(\psi (t)\) for \(t\in ]0,1]\), which we denote as \(\chi \). To get tight estimates for these inverse functions, we need the inverse functions \(\bar{\chi }\) and \(\bar{\rho }\) of \(\psi _b(t)\) and \(-\psi ^{\prime }_b(t)\), respectively. Recall that, as it was shown in [5], one has
Hence \(\psi _b(t)\) is monotonically decreasing and \(\psi ^{\prime }_b(t)\) is monotonically increasing. This implies that \(\psi _b(t)\) and \(-\psi ^{\prime }_b(t)\) have inverse functions and these function are monotonically decreasing. We denote these inverse functions as \(\bar{\chi }\) and \(\bar{\rho }\), respectively.
Lemma 6.1
With \(\bar{\chi }\) denoting the inverse function of \(\psi _b(t)\), one has
Proof
Let \(t\in ]0,1]\). Then one has
Since \(\chi (s)\in ]0,1]\), this implies the inequalities in the lemma. \(\square \)
Lemma 6.2
With \(\bar{\rho }\) denoting the inverse function of \(- \psi ^{\prime }_b(t)\) for \(t >0\), one has
Proof
Since \(\psi ^{\prime }(t) = t+\psi _b^{\prime }(t)\), one has
If \(s\ge 0\), then \(\rho (s)=t\in ]0,1]\), and hence \(\bar{\rho }\left( 2s\right) \ge \rho (s) \ge \bar{\rho }\left( 2s+1\right) \), proving the lemma. \(\square \)
Recall that \(\rho \) is the inverse function of \(\psi (t)\) for \(t\ge 1\). The following two results are less trivial than the preceding two lemmas.
Lemma 6.3
(cf. [5, Lemma 6.2]) For \(s\ge 0\), one has
Lemma 6.4
One has, for any \(v\in \mathbb {R}^n_{++}\),
Proof
The left-hand side inequality in the lemma is due to [5, Theorem 4.9]. The proof of the right-hand side inequality can be obtained by slightly changing the proof of [5, Theorem 4.9] and is therefore omitted. \(\square \)
7 Iteration Bound of the IIPM Based on \(\psi _q\)
Since the kernel function \(\psi _q\) led to the best-known iteration bound for large-update FIPMs based on kernel functions, from now on we restrict ourselves to the case where \(\psi =\psi _q\). Thus, we have
In the current case the inverse functions \(\bar{\chi }\) and \(\bar{\rho }\) are defined as follows:
Since \(\chi \) and \(-\tfrac{1}{2}\psi ^{\prime }\) are decreasing, \(-\tfrac{1}{2}\psi ^{\prime }\chi \) is increasing. Hence, letting \(\varPsi (v)\le \tau \), Lemma 6.4 implies that
By Lemma 6.1 we have \(\chi (\tau ) \ge \bar{\chi }\left( \tau +\tfrac{1}{2}\right) \). Using once more that \(-\tfrac{1}{2}\psi ^{\prime }\) is decreasing we obtain
Since \(-\psi ^{\prime }(t) = t^{-q}-t \le t^{-q}\), and due to (36), it follows that
where the last inequality is due to \((1+ax)^{1/x}\le e^a\) for \(x> 0\) and \(1+ax>0\). Hence, when taking \(\tau \le 1/2\), we have
Since \(\rho \) is decreasing, by applying \(\rho \) to both sides of (38), and using Lemma 6.1 and (36) we obtain
If \(\tau \le 1/2\) this implies
Using that \(\rho \) is decreasing and also Lemma 6.2 and (37) we have
Furthermore, using \(\rho \ge 1/e\) we conclude that (35) certainly will hold if
Now taking \(\alpha = 1/2\) it follows that (35) will be satisfied if
Substitution of \(\alpha = 1/2\) into (34) yields
Due to (40) we have \(\rho (\delta )\ge 1/e\). In order to find an upper bound for \(\rho (\delta )\) we do as follows. For \(s\ge 0\),
Since \(t\ge 1\) we have \(t^{-q}\in ]0,1]\). Hence \(t=\rho (-s)\) implies \( 2s \le \rho (-s) \le 2s+1\). Thus, \(\rho (-\delta ) \le 2\delta +1\). By (39), one has \(\rho (-\delta ) \le 1+qe\). We therefore may conclude that (33) certainly holds if we take
This is the value that will be used in the sequel.
7.1 Complexity Analysis
As we established in (15), the total number of iterations is at most
We assume that \(\hat{\tau }= O(n)\). Due to Lemma 5.2 we then also have \(\tau ^f = O(n)\), provided that \({1}/{\sqrt{1-\theta }}=O(1)\). Due to (42) the latter condition is satisfied. To simplify the presentation we use \(\tau ^f=n\) in the analysis below, but our results can easily be adapted to the case where \(\hat{\tau }=O(n)\). When choosing \(\theta \) maximal such that (42) holds, we have
Hence, substitution of \(\gamma = \tfrac{q+1}{2q}\) and \(\kappa = \tfrac{1}{124\,q}\) yields that the total number of iterations does not exceed
The expression \(q^3n^\frac{1}{2q}\) is minimal if \(q = \log n /6\), and then it is equal to \(e^4 (\log n)^3/512\). This value of q satisfies (41), since \(\log (2e) \le 6\). Hence we obtain the following iteration bound:
It is worth mentioning that if \(\psi _1\) were used, then the iteration bound of our algorithm would be \(O\left( n^2\log \frac{n\zeta ^2}{\varepsilon }\right) \), which is a factor \(\sqrt{n}/(\log n)^3\) worse than (44).
8 Detecting Infeasibility or Unboundedness
As indicated in the introduction, the algorithm will detect infeasibility or/and unboundedness of (P) and (D) if no optimal solutions exist. In that case Lemma 5.1 implies the existence of \(\bar{\nu } > 0\) such that the perturbed pair (\(\hbox {P}_\nu \)) and (\(\hbox {D}_\nu \)) satisfy the IPC if and only if \(\nu \in ]\bar{\nu },1]\). As long as \(\nu ^+ = (1-\theta ) \nu >\bar{\nu }\) the algorithm will run as it should, with \(\theta \) given by (42). However, if \(\varepsilon \) is small enough, at some stage it will happen that \(\nu > \bar{\nu } \ge \nu ^+\). At this stage the new perturbed pair does not satisfy the IPC. This will reveal itself since at that time we necessarily have \(\theta _{\max } < \tilde{\theta }\). If this happens we may conclude that there is no optimal pair \((x^*,s^*)\) satisfying \(\Vert x^*+s^*\Vert _\infty \le \zeta \). We refer to [4] for a more detailed discussion.
9 Computational Results
In this section, we present the numerical results for a practical variant of the algorithm described in Sect. 5. Theoretically, the barrier parameter \(\mu \) is updated by a factor \((1-\theta )\) with \(\theta \) given by (42), and the iterates are kept very close to the \(\mu \)-centers, namely the \(\tau \)-neighborhood of the \(\mu \)-centers, with \(\tau =\tfrac{1}{8}\). In practice, it is not efficient to do in this way and not even necessary either. We present a variant of the algorithm which uses a predictor–corrector step in the feasibility step. Moreover, for the parameter \(\tau \), defined in Sect. 5.1, we allow some larger value than \(\tfrac{1}{8}\), e.g., \(\tau = O(n)\). We set \(\tau = \hat{\tau } = O(n)\) with \(\hat{\tau }\) defined as in Sect. 5.1. As a consequence, the algorithm does not need centering steps. We choose \(\hat{\tau }\) according to the following heuristics: if \(n \le 500\), then \(\hat{\tau } = 100n\), for \(500\le n\le 5000\), we choose \(\hat{\tau } = 10n\) and for \(n\ge 5000\), we set \(\hat{\tau } = 3n\). We compare the performance of the algorithm with the well-known LIPSOL package [16].
9.1 Starting Point and Stopping Criterion
A critical issue when implementing a primal-dual method is to find a suitable starting point. It seems sensible to obtain a starting point which is well centered and as close to a feasible primal-dual point as possible. The one suggested by theory, i.e., given by (13), being nicely centered, it may be quite far from the feasibility region. Moreover, to find a suitable \(\zeta \) is another issue.
In our implementation, we use a starting point \(\left( x^0,y^0,s^0\right) \) which is proposed by Lustig et al. [17] and inspired by the starting point used by Mehrtora [18]. Since we are interested in a point which is in the \(\tau \)-neighborhood of the \(\mu ^0\)-center. If \(\varPhi (x^0; s^0;\mu _0) > \tau \), we increase \(\mu _0\) until \(\varPhi (x0; s0;\mu _0) \le \tau \).
As in LIPSOL, our algorithm terminates if, for \(\varepsilon =10^{-6}\), either
or \(|x^Ts - (x^+)^Ts^+| < \varepsilon \) occurs. The condition (45) measures the total relative errors in the optimality conditions (1), while the later criterion terminates the program if only a tiny improvement is obtained on the optimality. In fact, it prevents the program from stalling. We include this criterion following Lustig [19].
9.2 Feasibility Step Size
As in other efficient numerical experiments, e.g., [16, 17], regardless of the theoretical result, we apply different step sizes along the primal step \(\varDelta x \) and the dual step \((\varDelta y,\varDelta s)\). This implies that the feasibility improves much faster than when identical step sizes are used. Letting (x, y, s) be the current iterates and \((\varDelta x,\varDelta y,\varDelta s)\) the Newton step, we obtain the maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, the primal and the dual spaces as follows:
The goal is to keep the iterates close to the \(\mu \)-center, i.e., in its \(\hat{\tau }\)-neighborhood where \(\hat{\tau }\) is defined in Sect. 5.3. Thus, letting \(\bar{\theta }\) be such that
the primal and the dual step sizes \(\theta _p\) and \(\theta _d\) are defined as follows:
9.3 Solving the Linear System
We apply the backslash command of MATLAB (’\(\backslash \)’) to solve the normal equations
where \(\hat{b}\) is some right-hand side and D, in LO case, has the form
Denoting \(M:=AD^2A^T\) in (46), whenever the matrix M is ill-conditioned, we could obtain some more accurate solution by perturbing M as
where I is the identity matrix with size of M.
9.4 A Rule for Updating \(\mu \)
Motivated by the numerical results, and considering the fact that the Mehrotra’s PC method has become the most efficient in practice and used in most IPM-based software packages, e.g., [16, 20–22], we present the numerical results of the variant of our algorithm which uses Mehrotra’s PC direction at the feasibility step.
At the feasibility step, we apply the system
to obtain the affine-scaling directions \((\varDelta ^a_x,\varDelta ^a_y,\varDelta ^a_s)\). Then, the maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, primal and dual spaces are calculated as described in Sect. 9.2. Then, defining
we let
We use \(\bar{\sigma }=0.3\), as the default value of \(\bar{\sigma }\). If \(\sigma <1\), we calculate the new barrier update parameter \(\mu \) as follows:
Then, if necessary, by increasing \(\mu _{\mathrm{new}}\) by a constant factor, say 1.1, we derive some \(\mu _{\mathrm{new}}\) for which
Note that \(\mu _{\mathrm{new}}\) is acceptable only if \(\mu _{\mathrm{new}} < \mu \).
In the case where \(\mu _{\mathrm{new}} \ge \mu \) or \(\sigma \ge 1\), we do not perform any \(\mu \)-update and proceed with the following corrector step which attempts to center toward the current \(\mu \)-center:
Recall that if \(\sigma \ge 1\), then we have an increase in the duality gap and hence the use of \(\mu _{\mathrm{new}}\) is no longer sensible. If \(\sigma < 1\) and \(\mu _{\mathrm{new}} < \mu \), we use the system (47) with \(\mu =0\) as a corrector step.
The feasibility step \(\left( \varDelta ^f x,\varDelta ^f y,\varDelta ^f s\right) \) is obtained as follows:
Next, we calculate the maximum step sizes \(\theta ^p_{\max }\) and \(\theta ^d_{\max }\) in, respectively, the primal and the dual spaces along the step \(\left( \varDelta ^f x,\varDelta ^f y,\varDelta ^f s\right) \), as described in Sect. 9.2. Assuming that \(\mu \) is the outcome of the predictor step, we obtain a \(\theta \) such that
with \(\hat{\tau }\) chosen as described in Sect. 9.4. By setting \(\theta _p = \theta \theta ^d_{\max }\) and \(\theta _d=\theta \theta ^d_{\max }\), we calculate the new iterates \(\left( x^f, y^f, s^f\right) \) as follows:
9.5 Results
In this section, we present our numerical results. Motivated by the theoretical result which says that the kernel function \(\psi _q\) gives rise to the best-known theoretical iteration bound for large-update IIPMs based on kernel functions, we compare the performance of the algorithm described in the previous subsection based on both the logarithmic barrier function and the \(\psi _q\)-based barrier function. As the theory suggests, we use \(q=\frac{\log n}{6}\) in \(\psi _q\).
Our test was done on a standard PC with Intel\(^\circledR \) Core\(^{\texttt {TM}}\) 2 Duo CPU and 3.25 GB of RAM. The code was implemented by version 7.11.0 (R2010b) of MATLAB\(^\circledR \) on Windows XP Professional operating system. The problems chosen for our test are from the NETLIB set. To simplify the study, we choose the problems which are in (P) format, i.e., there is no nonzero lower bound or finite upper bound on the decision variables.
Before solving a problem using the algorithm, we first solve it for singleton variables. This helps to reduce the size of the problem.
Numerical results are tabulated in Table 1. In second and the fourth columns, we listed the total number of iterations of the algorithm based on, respectively, \(\psi _1\), the kernel function of the logarithmic barrier function, and \(\psi _q\). The third and fifth columns stand for the quantity E(x, y, s). The iteration numbers of the LIPSOL package are given in the sixth column of these tables. Finally, in the seventh column, we listed the quantity E(x, y, s) of the LIPSOL package. In each row, the Italic number denotes the smallest of the iteration numbers of the three algorithms, namely our algorithm based on \(\psi _1\) and \(\psi _q\), and LIPSOL, and the bold number denotes the smallest of the iteration numbers of \(\psi _1\)-based and \(\psi _q\)-based algorithms. As it can be noticed from the last row of the table, the overall performance of the algorithm based on \(\psi _1\) is much better than that the variant based on \(\psi _q\). However, in some of the problems, the \(\psi _q\)-based algorithm outperforms the \(\psi _1\)-based algorithm. This amounts to the problems AGG, BANDM, DEGEN2, DEGEN3, SCSD1, SCSD6, SCSD8 and SHARE2B. The columns related to LIPSOL prove that it is still the best; however, our \(\psi _1\)-based algorithm saved one iteration compared with LIPSOL, to solve AGG2 and AGG3, and two iterations when solving STOCFOR1.
10 Conclusions
In this paper, we analyze infeasible interior-point methods (IIPMs) for LO based on a large neighborhood. Our work is motivated by [4] in which Roos presents a full-Newton IIPM for LO. Since the analysis of our algorithms requires properties of barrier functions based on kernel functions that are used in large-update feasible interior-point methods (FIPMs), we present primal-dual large-update FIPMs for LO based on kernel functions, as well.
In Roos’ algorithm, the iterates move within small neighborhood of the \(\mu \)-centers of the perturbed problem pairs. In many IIPMs, the algorithm reduces the infeasibility and the duality gap at the same rate. His algorithm has the advantages that it uses full-Newton steps and hence no calculation of step size is needed, and moreover its theoretical iteration bound is \(O(n\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon ))\) which coincides with the best-known iteration bound for IIPMs. Nevertheless, it has the deficiency that it is too slow in practice as it reduces the quantity \(\epsilon (x,y,s)\) by a factor \((1-\theta )\) with \(\theta = O(\tfrac{1}{n})\) at an iteration.
We attempt to design a large-update version of Roos’ algorithm which allows some larger amount of reduction of \(\epsilon (x,y,s)\) at an iteration than \(1-\theta \) with \(\theta =O(1/n)\). This requires that the parameter \(\theta \) is larger than O(1 / n), even \(\theta = O(1)\). Unfortunately, the result of Sect. 5 implies that \(\theta \) is \(O(1/(n(\log n)^2))\) which yields \(O(n\sqrt{n}(\log n)^3\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon )\) iteration bound for a variant. Since the theoretical result of the algorithm is disappointing, we rely on the numerical results to establish that our algorithm is a large-update method. A practically efficient version of the algorithm is presented and its numerical result is compared with the well-known LIPSOL package. Fortunately, the numerical results seem promising as our algorithm has iteration numbers close to those of LIPSOL and, in a few cases, outperforms it. This means that IIPMs suffer from the same irony as FIPMs, i.e., regardless of their nice practical performance, the theoretical complexity of large-update methods is worse. Recall that the best-known iteration bound for large-update IIPMs is \(O(n\sqrt{n}\log n\log (\epsilon (\zeta {\mathbf{e}},0,\zeta {\mathbf{e}})/\varepsilon )\) which is due to Salahi et al. [7].
As in LIPSOL [16], different step sizes in the primal and the dual spaces are used in our implementation. This gives rise to a faster achievement in feasibility than when identical step sizes are used. Moreover, inspired by the LIPSOL package, we use a predictor–corrector step in the feasibility step of the algorithm.
Future research may focus on the following areas:
-
As mentioned before, our algorithm has a factor \((\log n)^2\) worse iteration bound than the best-known iteration bound for large-update IIPMs. One may consider how to modify the analysis such that the iteration bound of our algorithm is improved by a factor \((\log n)^2\).
-
As mentioned in Sect. 10, according to the analysis of our algorithm presented in Sect. 5, the barrier-updating parameter \(\theta \) is \(O(1/(n(\log n)^2))\). This yields the loose iteration bound given by (44). This slender value of \(\theta \) is obtained because of some difficulties in the analysis of the algorithm which uses the largest value of \(\theta \), satisfying (24), to assure that \(\varPsi (\hat{v}) = O(n)\). This value of \(\theta \) is much smaller than the best value we may choose. Assuming \(n=60\), the largest value of \(\theta \) satisfying \(\varPsi (\hat{v}) = n\) is 0.849114 while the value of \(\theta \) suggested by theory is 0.141766. A future research may focus on some new analysis of the algorithm which yields some larger value of \(\theta \).
-
Roos’ full-Newton step IIPM was extended to semidefinite optimization (SDO) by Mansouri and Roos [23], to symmetric optimization (SO) by Gu et al. [24] and to LCP by Mansouri et al. [25]. An extension of large-update FIPMs based on kernel functions to SDO was presented by El Ghami [10]. One may consider how our algorithm behaves in theory and practice when it is extended to case of SDO, SO and LCP.
Notes
Throughout this paper, by a kernel function, we refer to the kernel functions introduced and discussed in [5].
We use that if f is convex and differentiable, then
$$\begin{aligned} (b-a)f^{\prime }(a)\le f(b)-f(a) \le (b-a)f^{\prime }(b). \end{aligned}$$
References
Megiddo, N. (ed.): Pathways to the optimal set in linear programming. In: Progress in Mathematical Programming (Pacific Grove, CA, 1987), pp. 131–158. Springer, New York (1989)
Anstreicher, K.M.: A combined phase I-phase II projective algorithm for linear programming. Math. Progr. 43(2, (Ser. A)), 209–223 (1989)
Mizuno, S.: Polynomiality of infeasible-interior-point algorithms for linear programming. Math. Progr. 67(1, Ser. A), 109–119 (1994)
Roos, C.: A full-Newton step \(O(n)\) infeasible interior-point algorithm for linear optimization. SIAM J. Optim. 16(4), 1110–1136 (2006)
Bai, Y., El Ghami, M., Roos, C.: A comparative study of Kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM J. Optim. 15(1), 101–128 (2004)
Peng, J., Roos, C., Terlaky, T.: Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton Series in Applied Mathematics. Princeton University Press, Princeton (2002)
Salahi, M., Peyghami, M.R., Terlaky, T.: New complexity analysis of IIPMs for linear optimization based on a specific self-regular function. Eur. J. Oper. Res. 186(2), 466–485 (2008)
Potra, F.A., Stoer, J.: On a class of superlinearly convergent polynomial-time interior-point methods for sufficient LCP. SIAM J. Optim. 20(3), 1333–1363 (2009)
Roos, C., Terlaky, T., Vial, J.-Ph.: Interior-Point Methods for Linear Optimization. Springer, New York (2006). In: Second edition of Theory and Algorithms for Linear Optimization [Wiley, Chichester, 1997]
El Ghami, M.: New Primal-Dual Interior-Point Methods Based on Kernel Functions. Ph.D. thesis, Delft University of Technology, The Netherlands (2005)
Peng, J., Roos, C., Terlaky, T.: A new and efficient large-update interior-point method for linear optimization. Vychisl. Tekhnol. 6(4), 61–80 (2001)
Potra, F.A.: A superlinearly convergent predictor-corrector method for degenerate LCP in a wide neighborhood of the central path with \(O(\sqrt{n}L)\)-iteration complexity. Math. Progr. 100(2, Ser. A), 317–337 (2004)
Ye, Y.: Interior Point Algorithms. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley, New York (1997)
Asadi, A., Gu, G., Roos, C.: Convergence of the homotopy path for a full-Newton step infeasible interior-point method. Oper. Res. Lett. 38(2), 147–151 (2010)
Gu, G., Mansouri, H., Zangiabadi, M., Bai, Y., Roos, C.: Improved full-Newton step \(O(n)\) infeasible interior-point method for linear optimization. J. Optim. Theory Appl. 145(2), 271–288 (2010)
Zhang, Y.: Solving large-scale linear programs by interior-point methods under the MATLAB environment. Optim. Methods Softw. 10(1), 1–31 (1998)
Lustig, I., Marsten, R.E., Shanno, D.F.: On implementing Mehrotra’s predictor-corrector interior-point method for linear programming. SIAM J. Optim. 2(3), 435–449 (1992)
Mehrotra, S.: On the implementation of a primal-dual interior-point method. SIAM J. Optim. 2(4), 575–601 (1992)
Lustig, I.: Feasibility issues in a primal-dual interior-point method for linear programming. Math. Progr. 49(2, (Ser. A)):145–162 (1990/91)
Vanderbei, R.J.: LOQO: an interior-point code for quadratic programming. Optim. Methods Softw. 11/12(1–4), 451–484 (1999)
Andersen, E.D., Andersen, K. D.: The Mosek interior-point optimizer for linear programming: an implementation of the homogeneous algorithm. In: Frenk, H., Roos, K., Terlaky, T., Zhang, S. (eds.) High Performance Optimization, vol. 33 of Appl. Optim. pp. 197–232. Kluwer Academy Publication, Dordrecht (2000)
Czyzyk, J., Mehrotra, S., Wagner, M., Wright, S.: PCx: an interior-point code for linear programming. Optim. Methods Softw. 11/12(1–4), 397–430 (1999)
Mansouri, H., Roos, C.: A new full-Newton step \(O(n)\) infeasible interior-point algorithm for semidefinite optimization. Numer. Algorithms 52(2), 225–255 (2009)
Gu, G., Zangiabadi, M., Roos, C.: Full Nesterov–Todd step infeasible interior-point method for symmetric optimization. Eur. J. Oper. Res. 3, 473–484 (2011)
Mansouri, H., Zangiabadi, M., Pirhaji, M.: A full-Newton step \(O(n)\) infeasible-interior-point algorithm for linear complementarity problems. Nonlinear Anal. Real World Appl. 12(1), 545–561 (2011)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Asadi, A., Roos, C. Infeasible Interior-Point Methods for Linear Optimization Based on Large Neighborhood. J Optim Theory Appl 170, 562–590 (2016). https://doi.org/10.1007/s10957-015-0826-5
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10957-015-0826-5