1 Introduction

The linear-inequality feasibility problem has been an object of research at least since Fourier (1824). His elimination method was rediscovered by Motzkin (1936), analysed by Kuhn (1956), and can be considered the first effective technique for solving this family of problems.

Many years after Fourier (1824), the relaxation and centering methodologies developed for linear equations in the seminal papers of Kaczmarz (1937) and Cimmino (1938) were extended to linear inequalities by Agmon (1954), Motzkin and Schoenberg (1954), and Merzlyakov (1963). They remain an important area of both theoretical and applied research. The exponential behaviour of relaxation methods, in terms of the data, was established by Todd (1979) and Goffin (1982).

Some classes of projection algorithms are well studied. For intermittent projection algorithms, Bauschke and Borwein (1996) proved that the sequence converges linearly to some solution point at a rate that is independent of the starting point. For cyclic projection algorithms, Gubin et al. (1967), Herman et al. (1978) proved that the sequence converges in norm to some solution point. For block projection algorithms, Censor et al. (1988) proved that the sequence converges linearly to some solution point with a rate independent of the starting point. For weighted projection algorithms, Eremin (1969) proved that the sequence converges linearly to some solution point with a rate independent of the starting point. See the definitions of those classes of projection methods in Bauschke and Borwein (1996).

Censor and Elfving (1982) proposed a least-squares algorithm for the linear-inequality feasibility problem, which has a Cimmino-like algorithm as a special case, and proved convergence for both.

Recently, Censor et al. (2012) showed the effectiveness of projection methods for large scale linear inequality feasibility problems. They considered problems up to tens of thousands of unknowns satisfying up to hundreds of thousands of constraints.

Closely related to projection methods, as shown, for example, by Bauschke and Borwein (1996), subgradient algorithms occupy a special place among all the methods developed, both on their own merits and for their extensions and applications; see Eremin (1969), Polyak (1987) and Shor (1985).

Center methods also play an essential role in feasibility problems. The usual setting is that of general convex sets; nevertheless, subproblems (local models) are frequently given by linear feasibility problems. We now quote some classical papers whose theory is based on geometric concepts. Levin (1965) and Newman (1965) considered the center of gravity of a convex body; Khachiyan (1979), in his remarkable paper, Nemirovsky and Yudin (1983), and Shor (1985) considered the ellipsoid method. The papers by Khachiyan (1979) and Nemirovsky and Yudin (1983) also proved polynomial-time complexity. Tarasov et al. (1988) considered the center of the maximum-volume ellipsoid inscribed in the body. The polynomial-time complexity of approximating the maximal inscribed ellipsoid of a polytope is given in Khachiyan and Todd (1993).

As is widely known, the concept of analytic center leads to more efficient algorithms than those considered above. This line of work begins with Huard and Lieu (1966) and Huard (1967), who considered a generic center of the body that maximizes a distance function. Vaidya (1996) considered the volumetric center, the maximizer of the determinant of the Hessian matrix of the logarithmic barrier function. Goffin et al. (1996) considered a dual column-generation algorithm for solving general convex feasibility problems, in which each iteration computes an approximation of the analytic center of a set given by the intersection of linear inequalities. The authors of Vaidya (1996) and Goffin et al. (1996) obtained the usual convergence and polynomial-time complexity results. The linear feasibility problem is considered by Filipowsky (1995), who proposes “an algorithm that gives an approximate solution of a desired accuracy to a system of linear inequalities specified with approximate data, through the using of knowledge that the actual instance is feasible”. The polynomial-time complexity of that algorithm is a direct consequence of the use of a polynomial-time linear programming algorithm in each main iteration.

Ho and Kashyap (1965) considered the strict feasibility problem \(Ax>0\) through the minimization of a quadratic model associated with the feasibility problem, namely \(\min \Vert Ax-b\Vert _2\), where \(b>0\) is conveniently chosen. The authors prove exponential convergence for their algorithm.

Linear feasibility problems have important applications; see, for example, proton therapy planning in Chen et al. (2010), set theoretic estimation in Combettes (1993), image reconstruction in computerized tomography in Herman (2009), radiation therapy in Herman and Chen (2008), and image reconstruction in Herman et al. (1978).

One of the most important results in the endeavour to obtain a strongly polynomial-time algorithm is in Tardos (1986). In another direction, some papers have presented methods whose complexity depends on the condition number of the constraint matrix; see Vavasis and Ye (1996) and Ye (1986).

Chubanov (2012) recently proposed a strongly polynomial-time algorithm that either finds a solution of a linear system \(Ax = b, 0 \le x \le 1\) or decides that the system has no \(0,1\) solution. In Chubanov (2010), a similar result is obtained for integer solutions. In both cases, the data are integers. See also Basu et al. (2014) for some extensions of Chubanov’s algorithm. Using a different approach, Barasz and Vempala (2010) obtained a strongly polynomial-time algorithm for linear programming over certain classes of polytopes.

Our work contributes to establishing the existence of a strongly polynomial-time algorithm for the restricted case of open sets given by \(Ax>b, x>0\). There is thus no immediate generalization to linear programming. The most important theoretical results underpinning our method are Gordan’s and Farkas’ alternative theorems and Banach’s fixed point theorem. Our approach differs from those of the authors above, particularly in that it does not require integer solutions.

This paper is organized as follows: after this introduction, Sect. 2 presents the problem and a general idea of the method. We basically want to solve the feasibility problem through a certain convex minimization problem whose cost function is a mix of barrier and quadratic terms and depends on some parameters. Sect. 3 describes the method, which is based on the Banach fixed point theorem (see Banach 1922), and Sect. 4 develops its convergence theory and complexity. In Sect. 5 the extension to non-homogeneous problems is achieved. The conclusions are given in Sect. 6.

2 Strict homogeneous feasibility and associated problem

This section presents the strict homogeneous feasibility problem and an associated convex problem, showing the relation between them. The main purpose of this section is to prove that if an approximate null solution is obtained by our method, then the interior of the set \(V\) in (2.1), which defines the problem, is empty. As an immediate consequence, the complexity of detecting a null solution is the same as that proved for the method.

We are concerned with the feasibility problem in the following form:

$$\begin{aligned} \left( P\right) \begin{array}{l} {\hbox {Find}}\; \; x\in V\subset R^n, \hbox {with } V:=\left\{ x\in R^n: x > 0,\;Ax > 0\right\} , \end{array} \end{aligned}$$
(2.1)

where the matrix \(A\) is of size \(m\times n\), with rows \(a_i^T\), \(i=1,\ldots ,m\). Note that any feasibility problem \(Ax> 0, \; x\in R^n\), can be expressed in the above form (for instance, writing \(x=u-v\) with \(u,v>0\) and considering the system \(Au-Av>0\), \((u,v)>0\)).

Our aim is to find an interior point of \(V\) or to show that its interior is empty.

2.1 Related problem

Consider the following convex problem:

$$\begin{aligned}&\min \frac{\rho }{2}\sum _{i=1}^n x_i^2-\mu \sum _{i=1}^n \ln x_i - \sum _{i=1}^m\ln y_i\nonumber \\&\quad \hbox {subject to }\;a_i^Tx=y_i,\;i=1,\ldots ,m,\\&\quad \quad (x, y)>0,\nonumber \end{aligned}$$
(2.2)

where \(\rho \) and \(\mu \) are positive parameters. Notice that (2.2) is feasible if there exists \((x,y)>0\) satisfying the equality constraints. Then we have the following:

Lemma 2.1

Only one of the following statements is true:

  1.

    \( \hbox {int} (V)=\emptyset \), therefore (2.2) is not feasible.

  2.

    (2.2) is feasible.

Proof

Regarding statement 1, if \(\hbox {int}(V)=\emptyset \) then one of the following situations holds: \(x>0 \) implies \(Ax\le 0\), or \(\exists x\le 0\) such that \(Ax>0\). Therefore (2.2) is not feasible. Otherwise, statement 2 holds. \(\square \)

Now, let \(s\in \mathbb {R}^m\) be the Lagrange multiplier vector associated with the equalities in (2.2). Then the KKT equations can be written:

$$\begin{aligned}&\rho x_j - \frac{\mu }{x_j} - (A^Ts)_j =0, \quad j=1,\ldots ,n\nonumber \\&\quad -\frac{1}{y_i}+s_i =0, \quad i=1,\ldots ,m\nonumber \\&\quad a_i^Tx=y_i, \quad i=1,\ldots ,m. \end{aligned}$$
(2.3)

Clearly, as (2.2) is a convex problem with linear constraints, these conditions are necessary and sufficient to determine its solution, if one exists. We state the following lemma.

Lemma 2.2

Let \(\rho \) and \(\mu \) be positive parameters. Suppose (2.2) is feasible. Then there exists a dual variable \(s\in \mathbb {R}_{++}^m\) satisfying

$$\begin{aligned} F_i(s):=\frac{1}{s_i}-\frac{1}{2\rho }\sum _{j=1}^n a_{ij}\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+4\rho \mu }\right] =0,\;i=1,\ldots ,m,\qquad \end{aligned}$$
(2.4)

which is a (dual) solution to the KKT equations (2.3).

Proof

The existence of \(s\in \mathbb {R}^m\) follows from the KKT theorem for convex problems. Now, from the KKT equations, the first group of equalities leads to a quadratic equation in \(x_j\), whose positive root (the only one we are interested in) is

$$\begin{aligned} x_j(s)=\frac{1}{2\rho }\left[ (A^Ts)_j + \sqrt{(A^Ts)_j^2+4\rho \mu }\right] ,\quad j=1,\ldots ,n. \end{aligned}$$
(2.5)

Using that formula, we can express \(a_i^Tx(s),\;i=1,\ldots ,m\), explicitly as a function of \(s\); its positivity is implicitly ensured by the positivity of \(s_i\) (and \(y_i\)). Now, from the second equation in (2.3), \(y_i=1/s_i\), and the primal feasibility equality gives

$$\begin{aligned} \begin{array}{r} y_i=\frac{1}{s_i}=a_i^Tx(s)=\frac{1}{2\rho }\sum _{j=1}^n a_{ij}\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+4\rho \mu }\right] ,\quad i=1,\ldots ,m. \end{array}\nonumber \\ \end{aligned}$$
(2.6)

This is a square system in the variable \(s\), as stated in (2.4). \(\square \)
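For concreteness, the closed forms above translate directly into a few lines of code. The following is a minimal numpy sketch (the function names are ours, not from the paper): it evaluates \(x(s)\) from (2.5) and the residual of the square system (2.4) for a given dual vector \(s>0\).

```python
import numpy as np

def x_of_s(A, s, rho, mu):
    """Closed form (2.5): primal variables recovered from a dual vector s > 0."""
    w = A.T @ s                                     # (A^T s)_j, j = 1, ..., n
    return (w + np.sqrt(w ** 2 + 4.0 * rho * mu)) / (2.0 * rho)

def F_residual(A, s, rho, mu):
    """Residual of the square system (2.4); a zero of this map gives a dual solution of (2.3)."""
    return 1.0 / s - A @ x_of_s(A, s, rho, mu)      # 1/s_i - a_i^T x(s), cf. (2.6)
```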

Remark 2.3

The existence of a solution of problem (2.2) is also constructively ensured through the closed-form expressions of \(x(s)\), given in (2.5), and \(y=Ax(s)\), given in (2.6), valid for any \(s>0\). Uniqueness will be assured by the convergence theorem for Algorithm DVA in Sect. 4.

Next, the relationship between the feasibility and the optimization problems will be stated. As a first assumption on the parameters, for some given small \(\varepsilon \), let

$$\begin{aligned} \mu =\mu _1=\frac{\varepsilon }{4\rho }. \end{aligned}$$
(2.7)

Notice that (2.7) allows (2.2) to be interpreted as a perturbation of the linear version of (2.2).

2.2 Relation between the problems

In this section, we apply two theorems of the alternative, one by Gordan (1873), the other by Farkas (1901). From the foregoing discussion, it can be supposed that, in solving the system of equations (2.4) by some algorithm, a vector \(s^*>0\) is produced such that \(x^*>0\) and \(Ax^*>0\), as given by (2.5) and (2.6), respectively. Therefore, only two cases need to be considered, with \(\varepsilon \) given as above:

  (a)

    \(2\rho x_i^*> 0\), \(2\rho x_i^*\ne O(\varepsilon )\), for some \(i=1,\ldots ,n\), then we are done.

  (b)

    \(2\rho x_i^*> 0\), \(2\rho x_i^*= O(\varepsilon )\), for all \(i=1,\ldots ,n\).

Clearly, the coefficient \(2\rho \) prevents a false null entry of the solution, if \(\rho \) is large.

Theorem 2.4

Assume that \(2\rho x_i^*> 0\) and \(2\rho x_i^*= O(\varepsilon )\) for all \(i=1,\ldots ,n\), with \(\mu \) satisfying (2.7) and \(\rho >0\) given. Then, in the feasibility problem (2.1), within an \(\varepsilon \) approximation, \(\hbox {int}(V)=\emptyset \). Otherwise, in case (a), any solution of (2.4) yields a positive solution to the feasibility problem.

Proof

From (2.5), due to the hypothesis satisfied by \(\mu \), the term \((A^Ts)_j\le O(\varepsilon )\). Therefore, the following cases have to be considered:

  i.

    \((A^Ts)_j=O(\varepsilon ),\;j=1,\ldots ,n\);

  ii.

    \((A^Ts)_j<0\) for at least one \(j\in \{1,\ldots ,n\}\), and \((A^Ts)_j=O(\varepsilon )\) for the remaining entries.

Now, in order to apply alternative theorems, the above conditions are approximated by:

  1. i’.

    \((A^Ts)_j =0,\;j=1,\ldots ,n\)

  2. ii’.

    \(A^Ts=c\le 0\), \(c \ne 0\), for some vector c.

For case i\('\), we apply Gordan’s theorem of the alternative, which in our notation reads:

$$\begin{aligned} \begin{array}{l} {\textit{Exactly}} \,\, {\textit{one}}\,\, {\textit{of}} \,\,{\textit{the}}\,\, {\textit{following}}\,\, {\textit{holds}}.\\ \hbox {Either}\,\, A^Ts=0 \,\,{\textit{has}}\,\, a\,\, {\textit{non-zero}}\,\, {\textit{solution}}\,\, s\ge 0,\\ {\textit{or}}\,\, {\textit{else}}\,\, {\textit{there}}\,\, {\textit{exists}}\,\, x\,\, {\textit{satisfying}}\,\, Ax> 0. \end{array} \end{aligned}$$

Thus, we can conclude that \({\hbox {int}}(V)=\emptyset \).

In case ii\('\), we apply Farkas’ Lemma:

$$\begin{aligned}\begin{array}{l} {\textit{Suppose}}\,\, {\textit{that}}\,\, \{s\in \mathbb {R}^m:\;A^Ts=c\le 0,\; s\ge 0\} \,\,{\textit{has}}\,\, {\textit{some}}\,\, {\textit{solution}}.\\ {\textit{Then}} \,\, \{x\in \mathbb {R}^n:\;{\textit{Ax}}\ge 0,\; c^Tx<0\} \,\,{\textit{has}}\,\, no \,\,{\textit{solution}}. \end{array} \end{aligned}$$

Clearly, one of the following is true:

  1.

    \(\{Ax< 0,\; c^Tx<0\}\) has some solution: then \(x\) is non-negative, \(Ax< 0\), so \(\hbox {int}(V)=\emptyset .\)

  2.

    \(\{Ax\ge 0,\; c^Tx>0\}\) has some solution. Only \(x<0\) satisfies that condition, so \(\hbox {int}(V)=\emptyset .\)

  3.

    \(\{Ax< 0,\; c^Tx>0\}\) has some solution. Only \(x<0\) satisfies those inequalities, meaning that \(\hbox {int}(V)=\emptyset .\) The proof is complete.\(\square \)

Remark 2.5

Given some dual solution \(\hat{s}>0\), we have, by construction, either \(x(\hat{s})>0\) and \(Ax(\hat{s})>0\), or \(x(\hat{s})=O(\epsilon )\) and \(Ax(\hat{s})=O(\epsilon )\); in the latter case \(\hbox {int}\,V\) is empty under an \(\epsilon \) approximation, as specified in the above theorem.
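As an illustration of how the dichotomy of Theorem 2.4 can be applied in practice, the following minimal sketch either returns the candidate interior point \(x(\hat{s})\) or reports that \(\hbox{int}(V)\) is empty within the \(\epsilon \) approximation. The threshold `factor` is our own heuristic proxy for the \(O(\varepsilon )\) test and is not prescribed by the theory.

```python
import numpy as np

def decide_feasibility(A, s, rho, eps, factor=100.0):
    """Dichotomy of Theorem 2.4 for a dual vector s > 0 produced by some solver of (2.4).
    `factor * eps` is a heuristic stand-in for the O(eps) threshold."""
    w = A.T @ s
    x = (w + np.sqrt(w ** 2 + eps)) / (2.0 * rho)   # x(s) of (2.5) with 4*rho*mu = eps, cf. (2.7)
    if np.all(2.0 * rho * x <= factor * eps):       # case (b): all entries are O(eps)
        return None                                 # int(V) is empty, within the eps approximation
    return x                                        # case (a): a strictly positive point; Ax(s) > 0 holds at a solution of (2.4)
```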

3 An iterative Banach procedure

Remark 3.1

From now on, to simplify notation, we omit the index range \(l=1,\ldots ,m\), or take a similar course, except where necessary for clarity.

3.1 Motivation

As a motivation for the algorithm, we introduce a class of methods aimed at ensuring the main hypotheses needed to apply fixed point theory. In the proposed algorithm we restrict \(s\) to the hypercube \([1,2]^m\). Now, denote

$$\begin{aligned} G_i(s)=\sum _{j=1}^na_{ij}\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+\epsilon }\;\;\right] \end{aligned}$$
(3.1)

and define the following abstract procedure:

$$\begin{aligned} s_i^{k+1}=s_i^k+H_i(s_i^k)F_i(s^k)=\Psi _i(s^k), \end{aligned}$$

where \(H_i(s_i)\) is some positive function, and \(F_i(s)\) is given in (2.4). Then the iteration function \(\Psi _i(s)\) reads:

$$\begin{aligned} \Psi _i(s)=s_i-H_i(s_i)\left[ \frac{1}{2\rho }G_i(s)-\frac{1}{s_i}\right] =s_i-\frac{1}{2\rho }T_i(s)+\frac{H_i(s_i)}{s_i}, \end{aligned}$$
(3.2)

where \(T_i(s)=H_i(s_i)G_i(s)\).

As will become evident in the next sections, two points determine the main conditions on \(\Psi _i(s)\). The first is the inclusion property:

$$\begin{aligned} s\in [1,2-\delta ]^m \Rightarrow \Psi _i(s)\in [1,2], \end{aligned}$$

for \(\delta >0\) small (we cannot ensure that this holds on the whole hypercube). In fact, the non-obvious part is to show that \(\Psi _i(s)\le 2\) for \(s\in [1,2-\delta ]^m\). The second point is related to the rate of convergence: we must obtain conditions such that \(\partial \Psi _i(s)/\partial s_i \le \tau <1\) for \(s\in [1,2]^m\).

Let us begin with the latter. From the expression of \(\Psi _i(s)\) above, we must have:

$$\begin{aligned} 1-\frac{1}{2\rho }\frac{\partial T_i(s)}{\partial s_i}+\frac{d}{d s_i}\frac{H_i(s_i)}{s_i}\le \tau . \end{aligned}$$

In terms of \(\rho \), this reads:

$$\begin{aligned} \left[ \tau -1-\frac{d}{d s_i}\frac{H_i(s_i)}{s_i}\right] \rho \ge -\frac{1}{2}\frac{\partial T_i(s)}{\partial s_i}. \end{aligned}$$

Then this inequality is solvable for \(\rho >0\) if

$$\begin{aligned} \tau >1+\frac{d}{d s_i}\frac{H_i(s_i)}{s_i}. \end{aligned}$$
(3.3)

Clearly, we must require that the last derivative be negative. A simple choice is to fix \(H_i(s_i)=s_i Q_i(s_i)\), with \(Q_i(s_i)\) a decreasing function.

Now, we take the condition \(\Psi _i(s)\le 2\) into account for \(s\in [1,2-\delta ]^m\). Using (3.2), it reads:

$$\begin{aligned} s_i-\frac{1}{2\rho }T_i(s)+Q_i(s_i)\le 2, \end{aligned}$$

or, in terms of \(\rho \)

$$\begin{aligned} \left[ 2-s_i-Q_i(s_i)\right] \rho \ge -\frac{1}{2}T_i(s). \end{aligned}$$

Let us consider a particular choice, the exponential function \(Q_i(s_i)= e^{-\alpha s_i}\), \(\alpha >0\). In that case the coefficient of \(\rho \) is \(2-s_i-e^{-\alpha s_i}\). It is easy to see that this is a concave function, so its minimum is attained at one of the extremes of the interval, namely at \(s_i=2-\delta \). Therefore, the \(\rho \) coefficient is positive if \(\delta >e^{-\alpha (2-\delta )}\), giving the bound \(\alpha >(2-\delta )^{-1}\ln \delta ^{-1}\). Returning to (3.3), we have, in this case, \(\tau >1-\alpha e^{-\alpha s_i}\). Thus, we have, approximately, \(\tau >1+(e/2)\delta \ln \delta \), a rather poor rate of convergence. This shows that the method depends strongly on the choice of \(H_i(s_i)\).
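A small numerical check illustrates how poor this particular choice is; the value of \(\delta \) below and the safety margin on \(\alpha \) are our own illustrative picks, not prescribed by the analysis.

```python
import numpy as np

delta = 1e-3
alpha = 1.01 * np.log(1.0 / delta) / (2.0 - delta)    # slightly above the bound alpha > (2-delta)^{-1} ln(1/delta)

s_incl = np.linspace(1.0, 2.0 - delta, 1000)          # range for the inclusion condition
coef = 2.0 - s_incl - np.exp(-alpha * s_incl)         # coefficient of rho; should stay positive

s_rate = np.linspace(1.0, 2.0, 1000)                  # range for the rate condition (3.3)
tau = np.max(1.0 - alpha * np.exp(-alpha * s_rate))   # smallest admissible contraction rate

print(coef.min() > 0, tau)                            # coefficient positive; tau is very close to 1
```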

3.2 Iterative procedure

Define

$$\begin{aligned} \Psi _i(s)= s_i - s_i\ln (\alpha s_i+\beta ) F_i(s), \end{aligned}$$
(3.4)

where \(\alpha \) and \(\beta \) are conveniently chosen and \(F_i(s)\) is given in (2.4).

Dual Variable Algorithm (DVA)

Input:

\(\alpha >0\);

\(\beta >0\);

\(\hbox {tol}\) is the accuracy;

\(1\le s_i^0\le 2, \;i=1,\ldots ,m\) is the initial iterate;

Compute:

\(s_i^{1}=\Psi _i(s^0) \;i=1,\ldots ,m\)

\(t_0:=\Vert s^1-s^0\Vert _\infty \)

begin

for \(k\ge 0\)

\(s_i^{k+1}=\Psi _i(s^k), \;i=1,\ldots ,m\)

\(t_k:=\Vert s^{k+1}-s^k\Vert _\infty \)

if \(t_k\le \hbox {tol}\; t_0\), then stop.

Clearly, the convergence of this algorithm ensures that \(F_i(s^k)\) tends to \(0\) or, equivalently, that we approach a solution of the system (2.4), and therefore of problem (2.2).
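To make the procedure concrete, here is a minimal numpy sketch of Algorithm DVA. It evaluates the iteration map in the expanded form (4.3) below, with \(G_i\) from (3.1), and assumes that \(\alpha \), \(\beta \) and \(\rho \) already satisfy the bounds of Propositions 4.2 and 4.3; the defaults for `tol` and `max_iter` are ours.

```python
import numpy as np

def dva(A, rho, eps, alpha, beta, tol=1e-10, max_iter=100000):
    """Sketch of Algorithm DVA. alpha, beta, rho are assumed to satisfy
    Propositions 4.2-4.3; eps plays the role of 4*rho*mu, cf. (2.7)."""
    m, _ = A.shape

    def G(s):                                       # G_i(s) of (3.1)
        w = A.T @ s
        return A @ (w + np.sqrt(w ** 2 + eps))

    def Psi(s):                                     # iteration map, expanded form (4.3)
        return s + s * np.log(alpha * s + beta) * (G(s) / (2.0 * rho) - 1.0 / s)

    s = np.ones(m)                                  # any starting point in [1, 2]^m is admissible
    s_next = Psi(s)
    t0 = np.max(np.abs(s_next - s))                 # t_0 = ||s^1 - s^0||_inf
    s = s_next
    for _ in range(max_iter):
        s_next = Psi(s)
        t = np.max(np.abs(s_next - s))
        s = s_next
        if t <= tol * t0:                           # stopping rule of DVA
            break
    return s
```

Given the returned \(s\), the candidate interior point is \(x(s)\) from (2.5), to which the dichotomy of Theorem 2.4 is applied.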

4 Convergence theory

This section provides the hypotheses necessary to apply the fixed-point theorem, i.e., to show that both the variable \(s\) and the iteration function take values in the same set, and to obtain some convergence rate \(\tau <1\).

Remark 4.1

Consider the following simple estimation which uses the fact that \(s_i\in [1,2]\). From (3.1),

$$\begin{aligned} \Big |G_i(s)\Big |=\Bigg |\sum _{j=1}^na_{ij}\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+\epsilon }\right] \Bigg |, \end{aligned}$$

which approximately satisfies:

$$\begin{aligned} |G_i(s)|\le 2\sum _{j=1}^n\left| a_{ij}\right| \left| \big (A^Ts\big )_j\right| \le 4 \sum _{j=1}^n\sum _{k=1}^m|a_{ij}||a_{kj}|\le 4\max _{i=1,\ldots ,m}\sum _{j=1}^n\sum _{k=1}^m\left| a_{ij}\right| \left| a_{kj}\right| . \end{aligned}$$

We denote this bound by

$$\begin{aligned} M=4\max _{i=1,\ldots ,m}\sum _{j=1}^n\sum _{k=1}^m|a_{ij}||a_{kj}| \end{aligned}$$
(4.1)

Regarding the partial derivatives, we have, for all \(i,l\):

$$\begin{aligned} \left| \frac{\partial G_i}{\partial s_l}(s)\right|&= \left| a_i^Ta_l+\sum \limits _{j=1}^n a_{ij}a_{lj}\sum \limits _{k=1}^m a_{kj}s_k\left[ \left( \sum \limits _{k=1}^m a_{kj}s_k\right) ^2 +\epsilon \right] ^{-\frac{1}{2}}\right| \nonumber \\&\le \left| a_i^Ta_l\right| +\sum \limits _{j=1}^n \left| a_{ij}a_{lj}\right| \left| \sum \limits _{k=1}^m a_{kj}s_k\right| \left[ \left( \sum \limits _{k=1}^m a_{kj}s_k\right) ^2 +\epsilon \right] ^{-\frac{1}{2}}\nonumber \\&\le \left| a_i^Ta_l\right| +\sum \limits _{j=1}^n \left| a_{ij}a_{lj}\right| \left| \sum \limits _{k=1}^m a_{kj}s_k\right| \left| \sum \limits _{k=1}^m a_{kj}s_k\right| ^{-1}=\left| a_i^Ta_l\right| +\sum \limits _{j=1}^n |a_{ij}a_{lj}|\nonumber \\&\le \Omega :=\max _{i,l=1,\ldots ,m}\left\{ \left| a_i^Ta_l\right| +\sum \limits _{j=1}^n \left| a_{ij}a_{lj}\right| \right\} \end{aligned}$$
(4.2)

Now, returning to \(\Psi _i(s)\), we can write:

$$\begin{aligned} \Psi _i(s)=s_i+s_i\ln (\alpha s_i+\beta )\left[ \frac{1}{2\rho }G_i(s)-\frac{1}{s_i}\right] =s_i+\frac{1}{2\rho }T_i(s)-\ln (\alpha s_i+\beta ), \end{aligned}$$
(4.3)

where

$$\begin{aligned} T_i(s)=s_i\ln (\alpha s_i+\beta )G_i(s). \end{aligned}$$
(4.4)
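The constants \(M\) in (4.1) and \(\Omega \) in (4.2) depend only on the data and are cheap to evaluate; the following is a direct, unoptimized sketch (the function name is ours).

```python
import numpy as np

def data_constants(A):
    """Constants M of (4.1) and Omega of (4.2), computed directly from A."""
    absA = np.abs(A)                                  # entries |a_ij|
    col_sums = absA.sum(axis=0)                       # sum_k |a_kj| for each column j
    M = 4.0 * np.max(absA @ col_sums)                 # (4.1): 4 max_i sum_j sum_k |a_ij||a_kj|
    Omega = np.max(np.abs(A @ A.T) + absA @ absA.T)   # (4.2): max_{i,l} { |a_i^T a_l| + sum_j |a_ij a_lj| }
    return M, Omega
```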

4.1 Assuring inclusion

Our aim is to show that, for \(s\in [1,2]^m\), \(\Psi _i(s)\in [1,2]\). Indeed, we are only able to show an approximate inclusion, that is, \(\Psi _i(s)\in [1,2]\) for \(s\in [1,2-\delta ]^m\), for some small \(\delta \).

Proposition 4.2

Let \(\beta _2<\beta <\beta _1\), \(\alpha <\min \{\alpha _1,\alpha _2\}\), \(\rho \ge \max \{\rho _1,\rho _2\}\), with

$$\begin{aligned}&\beta _1=1-2\alpha \end{aligned}$$
(4.5)
$$\begin{aligned}&\beta _2=e^{-\delta }-(2-\delta )\alpha \end{aligned}$$
(4.6)
$$\begin{aligned}&\alpha _1=\frac{1}{2}\end{aligned}$$
(4.7)
$$\begin{aligned}&\alpha _2=\frac{1-e^{-\delta }}{\delta }\end{aligned}$$
(4.8)
$$\begin{aligned}&\rho _1=M \end{aligned}$$
(4.9)

and

$$\begin{aligned} \rho _2=\frac{ M|\ln (\alpha +\beta )|}{\delta +\ln [(2-\delta )\alpha +\beta ]} \end{aligned}$$
(4.10)

Then \(\ln (\alpha s_i+\beta )<0\) for all \(s_i\in [1,2]\), and \(\Psi _i(s)\in [1,2]\) for all \(s\in [1,2-\delta ]^m\).

Proof

Let us first prove that \(\Psi _i(s)\ge 1\). From (4.3), \(\Psi _i(s)\ge 1\) is equivalent to

$$\begin{aligned} s_i+\frac{1}{2\rho }T_i(s)-\ln (\alpha s_i+\beta )\ge 1, \end{aligned}$$

or, in terms of \(\rho \):

$$\begin{aligned}{}[s_i-1-\ln (\alpha s_i+\beta )]\rho \ge -\frac{1}{2}T_i(s). \end{aligned}$$

Then, in order that the coefficient of \(\rho \) be positive, we consider the worst case \(s_i=1\). The remaining term is positive due to the bounds (4.5) and (4.7) of the hypothesis. So we obtain the following lower bound for \(\rho \):

$$\begin{aligned} \rho \ge -\frac{1}{2}\frac{T_i(s)}{-\ln (\alpha s_i+\beta )}=\frac{1}{2}s_i G_i(s) \end{aligned}$$

Then, with (4.1), we get \(\rho _1\), (4.9).

In the other sense, we have to show that

$$\begin{aligned} s_i+\frac{1}{2\rho }T_i(s)-\ln (\alpha s_i+\beta )\le 2, \;\;s\in [1,2-\delta ]. \end{aligned}$$

Similarly as above, we rewrite this expression in terms of \(\rho \), giving:

$$\begin{aligned} \left[ 2-s_i+\ln (\alpha s_i+\beta )\right] \rho \ge \frac{1}{2}T_i(s). \end{aligned}$$
(4.11)

To analyse the \(\rho \) coefficient, we observe that its derivative is \(-1+\alpha (\alpha s_i+\beta )^{-1} \) and its second derivative is \(-\alpha ^2(\alpha s_i+\beta )^{-2} \), a negative function. Thus, the coefficient is a concave function, and its minimum is attained at one of the extreme points. Therefore we must compare \(1+\ln (\alpha +\beta )\) and \(\delta +\ln [(2-\delta )\alpha +\beta ]\). Clearly, the latter expression is the smaller one for \(\delta \) sufficiently small. It is positive if we impose the bound \(\beta >\beta _2\), (4.6) in the hypothesis. The second upper bound for \(\alpha \) is derived from the compatibility between the lower and upper bounds on \(\beta \). Finally, we can return to (4.11) to get

$$\begin{aligned} \rho \ge \frac{1}{2}\frac{T_i(s)}{\delta +\ln [(2-\delta )\alpha +\beta ]}. \end{aligned}$$

A simple computation leads to \(\rho _2\), (4.10). \(\square \)

4.2 Convergence rate

Our target is the relationship between successive iterations, that is,

$$\begin{aligned} \Big |s_i^{k+1}-s_i^k\Big |\le \tau \Big \Vert s^k-s^{k-1}\Big \Vert _\infty , \end{aligned}$$

for some \(0<\tau <1\). Then, from the Algorithm DVA we have

$$\begin{aligned} \Big |s_i^{k+1}-s_i^k\Big |=\Big |\Psi _i(s^k)-\Psi _i(s^{k-1})\Big |\le \max _{s\in [1,2]^m}\max _{i,l=1,\ldots ,m} \Big |\frac{\partial \Psi _i(s)}{\partial s_l}\Big |\,\Big \Vert s^k-s^{k-1}\Big \Vert _\infty . \end{aligned}$$

Now, we are going to analyse the partial derivatives of \(\Psi _i(s)\). We need the following upper bound which uses (4.1) and (4.2). Straightforward calculations give that:

$$\begin{aligned} \max _{s\in [1,2]^m}\Big |\frac{\partial T_i(s)}{\partial s_i}\Big |\le D:=M\left( \big |\ln (\alpha +\beta )\big |+\frac{2\alpha }{2\alpha +\beta }\right) +2\big |\ln (\alpha +\beta )\big |\,\Omega . \end{aligned}$$
(4.12)

Proposition 4.3

Suppose the hypotheses of Proposition 4.2 hold. Let \(\alpha <1/2\), \(\beta =(\beta _1+\beta _2)/2\), \(\tau >1-\alpha \) (for \(\delta \) sufficiently small), and let \(\rho \) satisfy \(\rho \ge \max \{\rho _3, \rho _4, \rho _5\}\), where

$$\begin{aligned}&\rho _3=\left( \tau -1+\frac{2\alpha }{1+e^{-\delta }+\delta \alpha }\right) ^{-1}\frac{D}{2},\end{aligned}$$
(4.13)
$$\begin{aligned}&\rho _4=\left( 1+\tau -\frac{2\alpha }{-2\alpha +1+e^{-\delta }+\delta \alpha }\right) ^{-1}\frac{D}{2} \end{aligned}$$
(4.14)

and

$$\begin{aligned} \rho _5=\frac{\Omega |2\ln \alpha +1-4\alpha +e^{-\delta }+\delta \alpha |}{2\tau }. \end{aligned}$$
(4.15)

Then

$$\begin{aligned} \Big |\frac{\partial \Psi _i(s)}{\partial s_l}\Big |\le \tau ,\;\; \forall s\in [1,2]^m, \forall i,l=1,\ldots ,m. \end{aligned}$$

Proof

  (I)

    The case \(i=l\).

We first show that \(\partial \Psi _i(s)/\partial s_i\le \tau <1\). Using the formulation (4.3), we have

$$\begin{aligned} \frac{\partial \Psi _i(s)}{\partial s_i}=1+\frac{1}{2\rho }\frac{\partial T_i(s)}{\partial s_i}- \frac{\alpha }{\alpha s_i +\beta }. \end{aligned}$$

We want that

$$\begin{aligned} 1+\frac{1}{2\rho }\frac{\partial T_i(s)}{\partial s_i}- \frac{\alpha }{\alpha s_i +\beta }\le \tau \end{aligned}$$

or, in terms of \(\rho \),

$$\begin{aligned} \left( \tau -1+\frac{\alpha }{\alpha s_i +\beta }\right) \rho \ge \frac{1}{2}\frac{\partial T_i(s)}{\partial s_i}. \end{aligned}$$
(4.16)

We then get

$$\begin{aligned} \tau > 1-\frac{\alpha }{\alpha s_i +\beta }, \end{aligned}$$

and, in the worst case:

$$\begin{aligned} \tau > 1-\frac{\alpha }{2\alpha +\beta }. \end{aligned}$$

Now, we use the bounds on \(\alpha \) and \(\beta \). Fix \(\beta \) as \((\beta _1+\beta _2)/2\), to give, after some calculation:

$$\begin{aligned} \tau >1-\frac{2\alpha }{1+e^{-\delta }+\delta \alpha }. \end{aligned}$$

For \(\delta \) sufficiently small we may therefore use the estimate \(\tau >1-\alpha \). Regarding the bounds for \(\alpha \), we see that \(\lim _{\delta \rightarrow 0}\alpha _2=1\), so we can take \(\alpha <\alpha _1=1/2\).

Returning to (4.16), we obtain

$$\begin{aligned} \rho \ge \frac{1}{2}\left( \tau -1+\frac{2\alpha }{1+e^{-\delta }+\delta \alpha }\right) ^{-1}\frac{\partial T_i(s)}{\partial s_i}. \end{aligned}$$

We then use (4.12) to get \(\rho _3\), given in (4.13) in the hypothesis.

Now we take the \(\tau \) condition into account in the opposite sense, that is,

$$\begin{aligned} 1+\frac{1}{2\rho }\frac{\partial T_i(s)}{\partial s_i}- \frac{\alpha }{\alpha s_i +\beta }\ge -\tau , \end{aligned}$$

which leads to

$$\begin{aligned} \left( 1+\tau -\frac{\alpha }{\alpha s_i +\beta }\right) \rho \ge -\frac{1}{2}\frac{\partial T_i(s)}{\partial s_i}. \end{aligned}$$

Clearly, the coefficient of \(\rho \) is always positive, giving the lower bound \(\rho _4\) (4.14) of the hypothesis.

  (II)

    The case \(i\ne l\).

The condition on \(\tau \) reduces to

$$\begin{aligned} \Big |\frac{\partial \Psi _i(s)}{\partial s_l}\Big |=\frac{1}{2\rho }\Big |\frac{\partial T_i(s)}{\partial s_l}\Big |\le \tau , \end{aligned}$$

which is ensured if we take the lower bound \(\rho _5\), (4.15) of the hypothesis. \(\square \)
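Putting Propositions 4.2 and 4.3 together, one admissible way to set the parameters for a given \(\delta \) is sketched below; the particular values of \(\delta \), \(\alpha \) and the margin in \(\tau \) are illustrative choices of ours, \(M\) and \(\Omega \) can be computed as in the sketch after (4.4), and the bound \(\rho _5\) is coded exactly as printed in (4.15).

```python
import numpy as np

def choose_parameters(M, Omega, delta=1e-2, alpha=0.25):
    """One admissible parameter choice following Propositions 4.2 and 4.3
    (delta, alpha and the margin in tau are illustrative, not prescribed)."""
    beta1 = 1.0 - 2.0 * alpha                              # (4.5)
    beta2 = np.exp(-delta) - (2.0 - delta) * alpha         # (4.6)
    beta = 0.5 * (beta1 + beta2)                           # as fixed in Proposition 4.3
    tau = 1.0 - 0.9 * alpha                                # any tau > 1 - alpha works for small delta
    lab = abs(np.log(alpha + beta))                        # |ln(alpha + beta)|
    rho1 = M                                               # (4.9)
    rho2 = M * lab / (delta + np.log((2.0 - delta) * alpha + beta))           # (4.10)
    D = M * (lab + 2.0 * alpha / (2.0 * alpha + beta)) + 2.0 * lab * Omega    # (4.12)
    rho3 = 0.5 * D / (tau - 1.0 + 2.0 * alpha / (1.0 + np.exp(-delta) + delta * alpha))                 # (4.13)
    rho4 = 0.5 * D / (1.0 + tau - 2.0 * alpha / (-2.0 * alpha + 1.0 + np.exp(-delta) + delta * alpha))  # (4.14)
    rho5 = Omega * abs(2.0 * np.log(alpha) + 1.0 - 4.0 * alpha
                       + np.exp(-delta) + delta * alpha) / (2.0 * tau)        # (4.15), as printed
    rho = max(rho1, rho2, rho3, rho4, rho5)
    return alpha, beta, rho, tau
```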

4.3 Convergence theorem

Next we present the main result of this section, the Banach fixed-point theorem applied to our iterative procedure. The proof is classical and can be found in many functional analysis books (see e.g. Kantorovich and Akilov 1959). It is given here for the sake of completeness. In what follows, we assume the previous results.

Theorem 4.4

Let \(s^0\in [1,2]^m\) be given. Then the sequence \(\{s^k\}\) produced by Algorithm DVA converges, and its limit \(s^*\) is unique. We also have the following estimate:

$$\begin{aligned} \Vert s^*-s^k\Vert _\infty \le \frac{\tau ^{k}}{1-\tau }\Vert s^1-s^0\Vert _\infty \le \frac{ \tau ^{k}}{1-\tau }. \end{aligned}$$
(4.17)

Proof

We know that, for each \(i=1,\ldots ,m\), \(k\ge 1\),

$$\begin{aligned} |s_i^{k+1}-s_i^{k}|\le \tau \Vert s^k-s^{k-1}\Vert _\infty \end{aligned}$$
(4.18)

Then,

$$\begin{aligned} \Vert s^{k+1}-s^k\Vert _\infty \le \tau \Vert s^k-s^{k-1}\Vert _\infty ,\quad k\ge 1, \end{aligned}$$
(4.19)

with \(\tau <1\). We now state that, for all \(k\in \{1,2,\ldots \}\), the following is true:

$$\begin{aligned} \Vert s^{k+1}-s^k\Vert _\infty \le \tau ^{k}\Vert s^1-s^0\Vert _\infty . \end{aligned}$$
(4.20)

With that aim, we will proceed by induction. The statement is true when \(k=1\) because, from (4.19), we have \(\Vert s^{1+1}-s^1\Vert _\infty \le \tau \Vert s^1-s^0\Vert _\infty .\) Suppose the statement is valid for some \(l\in \{1,2,\ldots \}\). Then

$$\begin{aligned} \Vert s^{(l+1)+1}-s^{l+1}\Vert _\infty \le \tau \Vert s^{l+1}-s^l\Vert _\infty \le \tau \; \tau ^{l}\Vert s^1-s^0\Vert _\infty =\tau ^{l+1}\Vert s^1-s^0\Vert _\infty , \end{aligned}$$

given the inductive assumption. Therefore, the above claim is true.

Let \(\varepsilon >0\). Since \(0<\tau <1\), there exists some sufficiently large \(J\in \{1,2,\ldots \}\) such that

$$\begin{aligned} \tau ^{J}<\frac{\varepsilon (1-\tau )}{\Vert s^1-s^0\Vert _\infty }. \end{aligned}$$

Using property (4.20), for any \(j,k\in \{0,1,\ldots \}\) with \(j>k\ge J\),

$$\begin{aligned}&\Vert s^j-s^k\Vert _\infty \le \Vert s^j-s^{j-1}\Vert _\infty +\cdots +\Vert s^{k+1}-s^k\Vert _\infty \\&\quad \le \tau ^{j-1}\Vert s^1-s^0\Vert _\infty +\cdots +\tau ^{k}\Vert s^1-s^0\Vert _\infty \\&\quad =\Vert s^1-s^0\Vert _\infty \tau ^{k}\sum _{l=0}^{j-k-1}\tau ^{l}\\&\quad <\Vert s^1-s^0\Vert _\infty \tau ^{k}\sum _{l=0}^{\infty }\tau ^{l}\\&\quad =\tau ^{k} \frac{\Vert s^1-s^0\Vert _\infty }{1-\tau }<\varepsilon . \end{aligned}$$

Therefore, \(\{s^k\}_{k\ge 0}\) is a Cauchy sequence, its convergence being an immediate consequence of the completeness of the space (a closed box). Hence, we can take \(s^*=\lim _{k\rightarrow \infty }s^k\).

Proceeding now to the uniqueness property, we first show that \(s^*\) is a fixed point of \(\Psi _i(s)\). Note that, for any \(k\in \{0,1,\ldots \}\),

$$\begin{aligned} 0\le \left| s_i^{k+1}-{\Psi _i(s^*)}\right| =\left| {\Psi _i(s^k)}-{\Psi _i(s^*)}\right| \le \tau \Big \Vert s^k-s^* \Big \Vert _\infty . \end{aligned}$$

Then, as the sequence is convergent, the term on the right converges to \(0\), and hence

$$\begin{aligned} \lim _{k\rightarrow \infty }\left| s_i^{k+1} -{\Psi _i(s^*)}\right| =0. \end{aligned}$$

Therefore, as \(k\rightarrow \infty \), we simultaneously have

$$\begin{aligned} s_i^k\rightarrow {\Psi _i(s^*)} \end{aligned}$$

and \(s^k\rightarrow s^*\). As the limits are unique, it must be true that

$$\begin{aligned} s_i^*={\Psi _i(s^*)}. \end{aligned}$$

In order to demonstrate the uniqueness, suppose \(w\) also satisfies \(w_i=\Psi _i(w)\). Then

$$\begin{aligned} 0\le |s_i^*-w_i|=|{\Psi _i(s^*)}-{\Psi _i(w)}| \le \tau \Vert s^*-w\Vert _\infty . \end{aligned}$$

As \(\tau <1\), this is possible only if \(\Vert s^*-w\Vert _\infty =0\), that is, \(w=s^*\), and we have achieved our aim. \(\square \)

4.3.1 Complexity

This section presents a complexity result based on a given tolerance \(10^{-p}\), \(p\ge 1\). Note that the tolerance is applied on the hypercube \([1,2]^m\), and is thus independent of the feasibility problem data; \(\tau \) is also independent of the data. Consequently, the number of iterations of the algorithm is data-independent.

Theorem 4.5

Let the error between the limit \(s^*\) and the iterate \(s^K\) be at most \(10^{-p}\), for \(p\ge 1\). Algorithm DVA then produces such a solution in at most

$$\begin{aligned} \frac{1}{\log \tau }[\log (1-\tau )-p] \end{aligned}$$

iterations and \(O(m^2(n+p))\) arithmetic operations.

Proof

For the first result, given the tolerance \(10^{-p}\) and using (4.17), we must compute a bound on the iteration count \(K\) such that

$$\begin{aligned} \frac{\tau ^{K}}{1-\tau }\le 10^{-p}. \end{aligned}$$
(4.21)

This produces the bound stated in the theorem.

We now proceed to determine the total arithmetic complexity.

Clearly, the main fixed computation is the product \(AA^T\) appearing in \(F_i(s)\); hence the fixed cost is \(O(m^2n)\) operations.

At each iteration, the product \(AA^Ts^k\) costs at most \(O(m^2)\) operations, which, combined with the iteration bound above, leads to the estimate given in the statement. \(\square \)
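A worked instance of the iteration bound (a small sketch; the values of \(\tau \) and \(p\) below are only examples):

```python
import math

def iteration_bound(tau, p):
    """Smallest K with tau**K / (1 - tau) <= 10**(-p), i.e. the bound (4.21)."""
    return math.ceil((math.log10(1.0 - tau) - p) / math.log10(tau))

# Example: tau = 0.775 and p = 8 give a bound of roughly 80 iterations,
# independently of the problem data.
print(iteration_bound(0.775, 8))
```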

5 Strict non-homogeneous feasibility problem

The problem to consider is:

$$\begin{aligned} (\mathcal {F})\hbox { Find } x\in \mathbb {R}^n, Ax+b> 0, x> 0, \end{aligned}$$

where \(A\) is an \(m\times n\) matrix and \(b\in \mathbb {R}^m\).

As usual, we can define the following equivalent strict homogeneous problem:

$$\begin{aligned} (\mathcal {F}_h)\hbox { Find } x\in \mathbb {R}^n, Ax+bx_{n+1}> 0,\; (x, x_{n+1})> 0. \end{aligned}$$

A natural adaptation of the previous related convex problem, considered in Sect. 2, is:

$$\begin{aligned}&\min \frac{\rho }{2}\sum _{i=1}^{n+1} x_i^2-\mu \sum _{i=1}^{n+1} \ln x_i-\sum _{i=1}^{m}\ln y_i\nonumber \\&\quad \hbox {subject to }y_i=a_i^Tx+b_i x_{n+1},\; i=1,\ldots ,m\nonumber \\&\quad \quad (x,x_{n+1})>0,\; y>0. \end{aligned}$$
(5.1)

Then, similarly to the development of Sect. 2.1 and, as before, designating by \(s\in \mathbb {R}^m\) the Lagrange multipliers associated with the above equality constraints, we obtain:

$$\begin{aligned} x_j(s)=\frac{1}{2\rho }\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+4\rho \mu }\right] ,\quad j=1,\ldots ,n, \end{aligned}$$
(5.2)

and

$$\begin{aligned} x_{n+1}(s)=\frac{1}{2\rho }\left[ b^Ts+\sqrt{(b^Ts)^2+4\rho \mu }\right] . \end{aligned}$$
(5.3)

Also, \(1/s_i=y_i\), for \(i=1,\ldots ,m\), which determines the system:

$$\begin{aligned} \begin{array}{l} 0=F_i(s)=\frac{1}{s_i}-a_i^Tx(s)-b_i x_{n+1}(s)\\ \\ \quad =\frac{1}{s_i}-\frac{1}{2\rho }\left\{ \displaystyle \sum \nolimits _{j=1}^n a_{ij}\left[ (A^Ts)_j+\sqrt{(A^Ts)_j^2+4\rho \mu }\right] +b_i \left[ b^Ts+\sqrt{(b^Ts)^2+4\rho \mu }\right] \right\} ,\\ \\ i=1,\ldots ,m. \end{array}\nonumber \\ \end{aligned}$$
(5.4)

Note that the results of Sect. 3 are immediately applicable.
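In code, the reduction from \((\mathcal {F})\) to \((\mathcal {F}_h)\) amounts to appending \(b\) as an extra column; a minimal sketch follows (the recovery step, which divides by the homogenizing variable, is the standard way back to \((\mathcal {F})\) and is only implicit in the text).

```python
import numpy as np

def homogenize(A, b):
    """Augmented matrix of (F_h): rows become (a_i^T, b_i), so that A_h z > 0, z > 0."""
    return np.hstack([A, b.reshape(-1, 1)])

def recover(z):
    """From a point z = (x, x_{n+1}) > 0 of (F_h), return x / x_{n+1}, feasible for (F)."""
    return z[:-1] / z[-1]
```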

6 Conclusions

We describe a new algorithm for the strict homogeneous linear-inequality feasibility problem in the positive orthant. It has some desirable and useful properties:

  • the number of iterations depends only on the given error \(10^{-p}\), for some positive integer \(p\);

  • the overall complexity depends only on \(p\) and the dimensions \(m\) and \(n\) of the problem;

  • it is based only on products of matrices and vectors, and is comparable, in terms of arithmetic error and computational time, with most known algorithms that use matrix inversion;

  • the algorithm is independent of the rank of the matrix;

  • the structure of the method allows parallel procedures to be used.

Our current research is directed towards the general feasibility problem \(Ax=b, x\ge 0\).