1 Introduction and preliminaries

The split feasibility problem has received much attention due to its applications in signal processing and image reconstruction [9], with particular progress in intensity-modulated radiation therapy [6]. For a complete study of algorithms for solving convex feasibility problems, including comments on their applications and an extensive bibliography, see, for example, [1]. Our purpose here is to study the more general case of proximal split minimization problems and to investigate the convergence properties of the associated numerical solutions. To begin with, we are interested in finding a solution \(x^*\in H_1\) of the following problem

$$\begin{aligned} \min _{x\in H_1}\{f(x)+g_\lambda (Ax)\}, \end{aligned}$$
(1.1)

where \(H_1, H_2\) are two real Hilbert spaces, \(f:H_1\rightarrow I\!\!R\cup \{+\infty \},\,g:H_2\rightarrow I\!\!R\cup \{+\infty \}\) two proper convex lower semicontinuous functions and \(A:H_1\rightarrow H_2\) a bounded linear operator, \(g_{\lambda } (y)=\min _{u\in H_2}\{g(u)+\frac{1}{2\lambda }\Vert u-y\Vert ^2\}\) stands for the Moreau–Yosida approximate of the function \(g\) of parameter \(\lambda \).
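For a concrete illustration (a standard example, chosen by us rather than taken from the references): for \(g=\vert \cdot \vert \) on \(I\!\!R\), the proximal mapping is the soft-thresholding operator and the Moreau–Yosida approximate is the Huber function,

$$\begin{aligned} \text{ prox }_{\lambda g}(y)=\text{ sign }(y)\max \{\vert y\vert -\lambda ,0\} \quad \text{ and } \quad g_\lambda (y)= \left\{ \begin{array}{l@{\quad }l} \frac{y^2}{2\lambda } &{} \text{ if } \vert y\vert \le \lambda ,\\ \vert y\vert -\frac{\lambda }{2} &{}\text{ otherwise }, \end{array} \right. \end{aligned}$$

so that \(g_\lambda \) is everywhere differentiable even though \(g\) is not.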

Note that the differentiability of the Yosida approximate \(g_\lambda \) (see, for instance, [19]) ensures the additivity of the subdifferentials, so that we can write

$$\begin{aligned} \partial (f(x)+g_\lambda (Ax))=\partial f(x)+ A^*\nabla g_\lambda (Ax)=\partial f(x)+ A^*\left( \frac{I-\text{ prox }_{\lambda g}}{\lambda }\right) (Ax). \end{aligned}$$

The optimality condition of (1.1) can be then written as

$$\begin{aligned} 0\in \lambda \partial f(x^*)+ A^*(I-\text{ prox }_{\lambda g})(Ax^*), \end{aligned}$$
(1.2)

where \(\text{ prox }_{\lambda g} (x)=argmin_{u\in H_2}\{g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\}\) stands for the proximal mapping of \(g\) and the subdifferential of \(f\) at \(x\) is the set

$$\begin{aligned} \partial f(x):=\{u\in H_1: \ f(y)\ge f(x)+\langle u, y-x\rangle \text{ for } \text{ all } \ y\in H_1\}. \end{aligned}$$

Inclusion (1.2) in turn yields the following equivalent fixed-point formulation

$$\begin{aligned} x^*=\text{ prox }_{\mu \lambda f}\big (x^*- \mu A^*(I-\text{ prox }_{\lambda g})(Ax^*)\big ). \end{aligned}$$
(1.3)

To solve (1.1), relation (1.3) suggests considering the following split proximal algorithm

$$\begin{aligned} x_{k+1}=\text{ prox }_{\mu _k\lambda f}\big (x_k- \mu _k A^*(I-\text{ prox }_{\lambda g})(Ax_k)\big ). \end{aligned}$$
(1.4)

Observe that by taking \(f=\delta _C\) [defined as \(\delta _C(x)=0\) if \(x\in C\) and \(+\infty \) otherwise], \(g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, Problem (1.1) reduces to

$$\begin{aligned} \min _{x\in H_1}\{\delta _C(x)+(\delta _Q)_\lambda (Ax)\}\Leftrightarrow \min _{x\in C}\{\frac{1}{2\lambda }\Vert (I -P_Q)(Ax)\Vert ^2\}, \end{aligned}$$
(1.5)

which, when \(C\cap A^{-1}(Q)\not =\emptyset \), is equivalent to the so-called split feasibility problem, namely

$$\begin{aligned} x\in C \quad \text{ such } \text{ that } Ax\in Q. \end{aligned}$$
(1.6)

This problem was used for solving an inverse problem in radiation therapy treatment planning [6] and has been well studied both theoretically and practically, see for example [1, 5] and the references therein. In this context, (1.4) reduces to the so-called CQ-algorithm introduced by Byrne [4]

$$\begin{aligned} x_{k+1}=P_C\big (x_k- \mu _k A^*(I-P_Q)(Ax_k)\big ); \end{aligned}$$
(1.7)

where the step-size \(\mu _k\) is chosen in \((0,2/\Vert A\Vert ^2)\) and \(P_C, P_Q\) stand for the orthogonal projections on the closed convex sets \(C\) and \(Q\), respectively.
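To make the iteration concrete, here is a minimal sketch of (1.7) in Python, assuming for the illustration that \(C\) and \(Q\) are Euclidean balls (so the projections have closed forms) and taking a random matrix as \(A\); none of these choices comes from [4].

```python
import numpy as np

def project_ball(x, radius):
    """Orthogonal projection onto the closed ball of given radius centered at 0."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

# Toy data (assumptions made for the example).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
P_C = lambda x: project_ball(x, 1.0)   # C = closed unit ball of H1
P_Q = lambda y: project_ball(y, 0.5)   # Q = ball of radius 0.5 in H2

mu = 1.0 / np.linalg.norm(A, 2) ** 2   # fixed step in (0, 2/||A||^2)
x = rng.standard_normal(3)
for _ in range(500):
    x = P_C(x - mu * A.T @ (A @ x - P_Q(A @ x)))   # iteration (1.7)

print(np.linalg.norm(A @ x - P_Q(A @ x)))  # residual ||(I - P_Q)(Ax)||
```

Here the problem is consistent (\(0\in C\) and \(A0\in Q\)), so the residual tends to zero; note that the step requires the spectral norm of \(A\), which is exactly the difficulty discussed next.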

The determination of the step-size in (1.7) (likewise for its Krasnoselskii–Mann version, see for instance [21–23], and for (1.4), see for example [9] and the references therein) depends on the operator norm, whose computation (or even estimation) is not an easy task. To overcome this difficulty, Lopez et al. [10] introduced a new choice of the step-size sequence \((\mu _k)\) and proposed the following algorithm:

Algorithm 3.1

Given an arbitrary initial point \(x_0\in H_1\). Assume that \(x_k\in C\) has been constructed and \(\nabla \tilde{h}(x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}=P_C\big (x_k-\mu _kA^*(I- P_Q)(Ax_k)\big ), \end{aligned}$$
(1.8)

where \(\mu _k:=\rho _k\frac{ \tilde{h}(x_k)}{\Vert \nabla \tilde{h}(x_k)\Vert ^2}\) with \(0<\rho _k<4\) and \(\tilde{h}(x)=\frac{1}{2}\Vert (I-P_Q)Ax \Vert ^2\). If \(\nabla \tilde{h}(x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (1.6) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (1.8).

They proved the weak convergence of the sequence generated by (1.8) provided that (1.6) is consistent and \(\inf _k\rho _k(4-\rho _k)>0\).
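Continuing the toy example above, the self-adaptive rule replaces the fixed step by a quotient computed from the current iterate; a sketch (with the hypothetical choice \(\rho _k\equiv 1\)):

```python
rho = 1.0  # any choice with inf_k rho_k (4 - rho_k) > 0
for _ in range(500):
    r = A @ x - P_Q(A @ x)            # (I - P_Q)(A x_k)
    grad = A.T @ r                    # gradient of h~ at x_k
    g2 = grad @ grad
    if g2 == 0:                       # x_k already solves (1.6): stop
        break
    mu = rho * (0.5 * (r @ r)) / g2   # mu_k = rho_k h~(x_k) / ||grad h~(x_k)||^2
    x = P_C(x - mu * grad)            # iteration (1.8)
```

No knowledge of \(\Vert A\Vert \) enters the step-size rule, which is the whole point of the construction.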

At this stage we would like to emphasize that our interest, in the first part of the present paper, is in solving (1.1) in the case \(\textit{argmin} \ f\cap A^{-1}(\textit{argmin} \ g)\not =\emptyset \), or in other words: in finding a minimizer \(x^*\) of \(f\) such that \(Ax^*\) minimizes \(g\), namely

$$\begin{aligned} x^*\in \textit{argmin} f \quad \text{ such } \text{ that } Ax^*\in \textit{argmin} \ g, \end{aligned}$$
(1.9)

\(f,\,g\) being two proper, lower semicontinuous convex functions, \(\textit{argmin} \ f:=\{\bar{x}\in H_1: f(\bar{x})\le f(x) \ \forall x\in H_1\}\) and \(\textit{argmin} \ g:=\{\bar{y}\in H_2: g(\bar{y})\le g(y) \ \forall y\in H_2\}\). \(\Gamma \) will denote the solution set.

This problem was considered, for instance, in [5] and [14]; however, the iterative methods proposed there require a priori knowledge of the norm (or at least an estimate of the norm) of the bounded linear operator \(A\). To avoid this difficulty, inspired by the idea introduced in [10], we develop in Sect. 2 an algorithm with a step-size selection rule whose implementation does not require any prior information about the operator norm, and we prove the related convergence result. This result is an extension of [10, Theorem 3.5].

In the second part of this paper, we will still assume \(f\) to be convex, but we allow the function \(g\) to be nonconvex. In the case of indicator functions of sets with \(A=I\), such a situation is encountered in the numerical solution of the phase retrieval problem in inverse scattering [11] and is therefore of great practical interest. Here, we consider the more general problem of finding a minimizer \(\bar{x}\) of \(f\) such that \(A\bar{x}\) is a critical point of \(g\), namely

$$\begin{aligned} 0\in \partial f({\bar{x}}) \quad \text{ such } \text{ that } 0\in \partial _P g(A{\bar{x}}), \end{aligned}$$
(1.10)

where \(\partial _P\) stands for the proximal subdifferential of \(g\), which will be defined in Sect. 3.

At this time the nonconvex theory is much less developed than the convex one. A notable exception, in the fixed-point context, is the work by Luke [12], who studies the convergence of a projection/reflection algorithm in a prox-regular setting; there, however, the fixed-point operator is assumed to be locally firmly nonexpansive. In [15] a proximal approach was also developed for finding critical points of uniformly prox-regular functions. Here, in the case of variable regularization parameters \((\lambda _k)\), we will prove in Sect. 3 the convergence of our split proximal algorithm provided that the bounded linear operator is surjective. The latter assumption is always satisfied in inverse problems in which a priori information is available about the representation of the target solution in a frame, see for instance [7] and the references therein.

2 Convex minimization feasibility problem

Now, we are in a position to introduce a new way of selecting the step-sizes. To that end, we set \(\theta (x_k):=\sqrt{\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2}\), where \(h(x)=\frac{1}{2}\Vert (I-\text{ prox }_{\lambda g})Ax \Vert ^2\) and \(l(x)=\frac{1}{2}\Vert (I-\text{ prox }_{\mu _k\lambda f})x \Vert ^2\), and introduce the following split proximal algorithm:

Split proximal algorithm Given an initial point \(x_0\in H_1\). Assume that \(x_k\) has been constructed and \(\theta (x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda \mu _kf}(x_k-\mu _kA^*(I- \text{ prox }_{\lambda g})(Ax_k)), \end{aligned}$$
(2.1)

where the stepsize \(\mu _k:=\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}\) with \(0<\rho _k<4\). If \(\theta (x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (1.9) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (2.1).

Observe that by taking \(f=\delta _C,\,g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, we recover [10, Algorithm 3.1]. Indeed, since \( \text{ prox }_{\lambda \mu _kf}=P_C\), the iterates \(x_k\) belong to \(C\) and thus \(\nabla l(x_k)=0\), so that \(\theta (x_k)\) reduces to \(\Vert \nabla \tilde{h}(x_k)\Vert \).
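A minimal sketch of (2.1) in Python, assuming for the illustration \(f=\frac{1}{2}\Vert \cdot -a\Vert ^2\) and \(g=\frac{1}{2}\Vert \cdot -b\Vert ^2\) with \(b:=Aa\) (so that \(\Gamma =\{a\}\) and both proximal mappings are explicit); the data, the fixed \(\rho _k\equiv 1\) and the use of the previous \(\mu _k\) inside \(\nabla l\) are simplifications made for the sketch, not prescriptions of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
a = rng.standard_normal(4)
b = A @ a                                      # ensures argmin f ∩ A^{-1}(argmin g) = {a}

prox_f = lambda x, t: (x + t * a) / (1.0 + t)  # prox of t*(1/2)||. - a||^2
prox_g = lambda y, t: (y + t * b) / (1.0 + t)  # prox of t*(1/2)||. - b||^2

lam, rho = 1.0, 1.0
x, mu = rng.standard_normal(4), 1.0
for _ in range(2000):
    r = A @ x - prox_g(A @ x, lam)        # (I - prox_{lam g})(A x_k)
    grad_h = A.T @ r                      # nabla h(x_k) = A*(I - prox_{lam g})(A x_k)
    grad_l = x - prox_f(x, mu * lam)      # nabla l(x_k), with the previous mu
    h, l = 0.5 * (r @ r), 0.5 * (grad_l @ grad_l)
    theta2 = grad_h @ grad_h + grad_l @ grad_l
    if theta2 == 0:                       # theta(x_k) = 0: x_k already solves (1.9)
        break
    mu = rho * (h + l) / theta2           # self-adaptive step-size of (2.1)
    x = prox_f(x - mu * grad_h, mu * lam) # iteration (2.1)

print(np.linalg.norm(x - a))              # distance to the solution
```

Theorem 2.2 below constrains \(\rho _k\) in terms of \(h(x_k)\) and \(l(x_k)\); a careful implementation should enforce \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)}-\varepsilon \) at each iteration.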

Recall that the proximal mapping of \(g\) is firmly nonexpansive, namely

$$\begin{aligned} \langle \text{ prox }_{\mu g} y-\text{ prox }_{\mu g} y', y-y'\rangle \ge \Vert \text{ prox }_{\mu g} y-\text{ prox }_{\mu g} y'\Vert ^2 \quad \text{ for } \text{ all } \ y, y'\in H_2, \end{aligned}$$

and the same holds for its complement \(I-\text{ prox }_{\mu g}\), see for example [7].

In order to prove the main result of this section, the idea is to apply the following well-known result on weak convergence in Hilbert spaces (Opial's lemma) with \(S:=\Gamma \). The original proof of this lemma in [16] requires \(S\) to be closed and convex; an argument given in [20] shows that convexity is not needed.

Lemma 2.1

see [16, 20].

Let \({H}\) be a Hilbert space and \((x_k)\) a sequence in \({ H}\) such that there exists a nonempty closed set \(S\subset {H}\) satisfying:

  1. (i)

    For every \(z \in S,\,\lim _k \Vert x_k-z\Vert \) exists.

  2. (ii)

    Any weak cluster point of the sequence \((x_k)\) belongs to \(S\).

Then, there exists \(\bar{x}\in S\) such that \((x_k)\) weakly converges to \(\bar{x}\).

The following theorem contains the main convergence result of this section.

Theorem 2.2

Assume that \(f\) and \(g\) are two proper convex lower-semicontinuous functions and that (1.9) is consistent (i.e., \(\Gamma \not =\emptyset \)). If the parameters satisfy the condition \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)} -\varepsilon \) (for some \(\varepsilon >0\) small enough), then the sequence \((x_k)\) generated by (2.1) weakly converges to a solution of (1.9).

Proof

Let \(z\in \Gamma \) and note that \(\nabla h(x)=A^*(I-\text{ prox }_{\lambda g})Ax\) and \(\nabla l(x)=(I-\text{ prox }_{\mu _k\lambda f})x\). Using the facts that \(\text{ prox }_{\lambda \mu _k f}\) is nonexpansive and that \(z\) solves (1.9) (minimizers of any function are exactly the fixed points of its proximal mapping, so that \((I-\text{ prox }_{\lambda g})Az=0\)), and having in hand

$$\begin{aligned} \langle \nabla h(x_k), x_k-z\rangle = \langle (I-\text{ prox }_{\lambda g})Ax_k, Ax_k-Az\rangle \ge \Vert (I-\text{ prox }_{\lambda g})Ax_k \Vert ^2=2h(x_k), \end{aligned}$$

thanks to the fact that \(I-\text{ prox }_{\lambda g}\) is firmly nonexpansive, we can write

$$\begin{aligned} \Vert x_{k+1}-z\Vert ^2&\le \Vert x_k-z\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k \langle \nabla h(x_k), x_k-z\rangle \\&\le \Vert x_k-z\Vert ^2+ \mu _k^2\Vert \nabla h(x_k)\Vert ^2 -4\mu _kh(x_k)\\&= \Vert x_k-z\Vert ^2+\rho _k^2\frac{( h(x_k)+l(x_k))^2}{(\theta ^2(x_k))^2}\Vert \nabla h(x_k)\Vert ^2- 4\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}h(x_k)\\&\le \Vert x_k-z\Vert ^2+\rho _k^2\frac{( h(x_k)+l(x_k))^2}{\theta ^2(x_k)}- 4\rho _k\frac{( h(x_k)+l(x_k))^2}{\theta ^2(x_k)}\frac{ h(x_k)}{h(x_k)+l(x_k)}\\&= \Vert x_k-z\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}. \end{aligned}$$

Thus

$$\begin{aligned} \Vert x_{k+1}-z\Vert ^2 \le \Vert x_k-z\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}.\qquad \end{aligned}$$
(2.2)

The sequence \((x_k)\) is thus Fejér monotone with respect to \(\Gamma \), which ensures the existence of the limit

$$\begin{aligned} \displaystyle {\lim _{k\rightarrow +\infty }}\Vert x_k-z\Vert ^2<+\infty , \end{aligned}$$
(2.3)

and hence the first condition of Lemma 2.1 is satisfied. The latter in turn implies that \((x_k)\) is bounded.

Now, summing inequality (2.2) over \(k=0, 1,\ldots \), we obtain

$$\begin{aligned} \sum _{k=0}^\infty \rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\le \Vert x_0-z\Vert ^2<+\infty . \end{aligned}$$
(2.4)

On the other hand, the assumption \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)}-\varepsilon \) together with (2.4) ensures that

$$\begin{aligned} \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}<+\infty . \end{aligned}$$
(2.5)

Consequently,

$$\begin{aligned} \lim _{k\rightarrow +\infty }(h(x_k)+l(x_k))=0\Leftrightarrow \lim _{k\rightarrow +\infty } h(x_k)=0 \quad \text{ and } \quad \lim _{k\rightarrow +\infty } l(x_k)=0,\qquad \end{aligned}$$
(2.6)

because \(\theta ^2(x_k):=\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2\) is bounded. This follows from the fact that \(\nabla h\) is Lipschitz continuous with constant \(\Vert A\Vert ^2\), \(\nabla l\) is nonexpansive and \((x_k)\) is bounded. More precisely, for any \(z\) which solves (1.9), we have

$$\begin{aligned} \Vert \nabla h(x_k)\Vert&= \Vert \nabla h(x_k)-\nabla h(z)\Vert \le \Vert A\Vert ^2\Vert x_k-z\Vert \quad \text{ and } \\ \Vert \nabla l(x_k)\Vert&= \Vert \nabla l(x_k)-\nabla l(z)\Vert \le \Vert x_k-z\Vert . \end{aligned}$$

Now, let \(\tilde{x}\) be a weak cluster point of \((x_k)\); then there exists a subsequence \((x_{k_\nu })\) which converges weakly to \(\tilde{x}\). The lower-semicontinuity of \(h\) then implies that

$$\begin{aligned} 0\le h(\tilde{x})\le \liminf _{\nu \rightarrow +\infty } h(x_{k_\nu })=\lim _{k\rightarrow +\infty }h(x_k)=0. \end{aligned}$$

That is \(h(\tilde{x})=\frac{1}{2}\Vert (I-\text{ prox }_{\lambda g})A\tilde{x} \Vert ^2=0\), i.e. \(A\tilde{x}\) is a fixed point of the proximal mapping of \(g\) or equivalently \(0\in \partial g(A\tilde{x})\). In other words \(A\tilde{x}\) is a minimizer of \(g\).

Likewise, the lower-semicontinuity of \(l\) implies that

$$\begin{aligned} 0\le l(\tilde{x})\le \liminf _{\nu \rightarrow +\infty } l(x_{k_\nu })=\lim _{k\rightarrow +\infty }l(x_k)=0. \end{aligned}$$

That is \(l(\tilde{x})=\frac{1}{2}\Vert (I-\text{ prox }_{\mu _k\lambda f})\tilde{x} \Vert ^2=0\), i.e. \(\tilde{x}\) is a fixed point of the proximal mapping of \(f\) or equivalently \(0\in \partial f(\tilde{x})\). In other words \(\tilde{x}\) is also a minimizer of \(f\) and thus a solution of problem (1.9). The second condition of Lemma 2.1 is also verified and consequently the whole sequence \((x_k)\) converges weakly to a solution of problem (1.9). This completes the proof. \(\square \)

Remark 2.3

  1. i)

    When the bounded linear operator \(A\) is the identity operator, (1.9) is nothing other than the problem of finding a common minimizer of \(f\) and \(g\), and (2.1) reduces to the following relaxed split proximal algorithm

    $$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda \mu _kf}\big ((1-\mu _k)x_k+\mu _k \text{ prox }_{\lambda g}(x_k)\big ). \end{aligned}$$
    (2.7)
  2. ii)

    We would like also to emphasize that by taking \(f=\delta _C,\,g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, (2.1) reduces to [10, Algorithm 3.1] and we recover the corresponding convergence result, namely [10, Theorem 3.5].

  3. iii)

    It is worth mentioning that our approach works for split equilibrium and split null point problems considered in [5] and [14], respectively. To that end, just replace the proximal mappings of the convex functions \(f\) and \(g\) by the resolvent operators associated to two monotone equilibrium bifunctions and two maximal monotone operators, respectively.

3 A step towards the nonconvex case

Throughout this section \(g\) is assumed to be prox-regular. The following problem

$$\begin{aligned} 0\in \partial f(\bar{x}) \quad \text{ such } \text{ that } \quad 0\in \partial _P g(A\bar{x}), \end{aligned}$$
(3.1)

is very general in the sense that it includes, as special cases, \(g\) convex and \(g\) a lower-\(\mathcal{C}^2\) function. The latter class is of great importance in optimization, since such a function can be locally expressed as a difference \(\tilde{g}-\frac{r}{2}\Vert \cdot \Vert ^2\), where \(\tilde{g}\) is a finite convex function; it therefore covers a large core of problems of interest in variational analysis and optimization. It should be noticed that examples abound of practitioners needing algorithms for solving nonconvex problems, for instance in crystallography, astronomy and, more recently, inverse scattering, see for example [12].

We start with some elementary facts of prox-regularity. We denote by \(B(x,\varepsilon )\) (respectively \(B[x,\varepsilon ]\)) the open (respectively closed) ball around \(x\) with radius \(\varepsilon \).

Let \(g:H_2\rightarrow I\!\!R\cup \{+\infty \}\) be a function and let \(\bar{x}\in \text{ dom } g\), i.e., \(g(\bar{x})<+\infty \). Poliquin and Rockafellar [17] introduced the concept of the proximal subdifferential and then investigated the limiting proximal subdifferential (see also Bernard and Thibault [2] for the Hilbert setting). More precisely, the proximal subdifferential \(\partial _Pg(\bar{x})\) is defined as follows.

Definition 3.1

A vector \(v\) is in \(\partial _Pg(\bar{x})\) if there exist some \(r>0\) and \(\varepsilon >0\) such that for all \(x\in B(\bar{x},\varepsilon )\),

$$\begin{aligned} \langle v,x-\bar{x}\rangle \le g(x)-g(\bar{x})+\frac{r}{2}\Vert x-\bar{x}\Vert ^2. \end{aligned}$$
(3.2)

When \(g(\bar{x})=+\infty \), one puts \(\partial _Pg(\bar{x})=\emptyset \).
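As a simple one-dimensional illustration (a standard example, not taken from [17]): for \(g(x)=-\vert x\vert \) on \(I\!\!R\), no vector \(v\) can satisfy (3.2) at \(\bar{x}=0\) (testing \(x\rightarrow 0^+\) forces \(v\le -1\), while \(x\rightarrow 0^-\) forces \(v\ge 1\)), whereas \(g\) is smooth away from the origin, so

$$\begin{aligned} \partial _Pg(0)=\emptyset \quad \text{ and } \quad \partial _Pg(\bar{x})=\{-\text{ sign }(\bar{x})\} \ \text{ for } \ \bar{x}\not =0, \end{aligned}$$

in contrast with the convex case, where the subdifferential is nonempty at every point of continuity.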

Before stating the definition of prox-regularity of \(g\) and properties of its proximal mapping, we recall that \(g\) is locally l.s.c. at \(\bar{x}\) if its epigraph is closed relative to a neighborhood of \((\bar{x}, g(\bar{x}))\), that \(g\) is prox-bounded if it is minorized by a quadratic function, and that, for \(\varepsilon >0\), the \(g\)-attentive \(\varepsilon \)-localisation of \(\partial _Pg\) around \((\bar{x}, \bar{v})\) is the mapping \(T_\varepsilon : H_2\rightarrow 2^{H_2}\) defined by

$$\begin{aligned} T_\varepsilon (x)= \left\{ \begin{array}{l@{\quad }l} \{v\in \partial _Pg(x), \Vert v-\bar{v}\Vert <\varepsilon \}&{} \text{ if } \Vert x-\bar{x}\Vert <\varepsilon \text{ and } \vert g(x)-g(\bar{x})\vert <\varepsilon ,\\ \emptyset &{}\text{ otherwise }. \end{array} \right. \end{aligned}$$

Definition 3.2

A function \(g\) is said to be prox-regular at \(\bar{x}\) for \(\bar{v}\in \partial _Pg(\bar{x})\) if there exist some \(r>0\) and \(\varepsilon >0\) such that for all \(x, x'\in B(\bar{x},\varepsilon )\) with \(\vert g(x)-g(x')\vert <\varepsilon \) and all \(v\in B(\bar{v},\varepsilon )\) with \(v\in \partial _P g(x)\) one has

$$\begin{aligned} g(x')\ge g(x)+\langle v,x'- x\rangle -\frac{r}{2}\Vert x'- x\Vert ^2. \end{aligned}$$
(3.3)

If the property holds for all vectors \(\bar{v}\in \partial _P g(\bar{x})\), the function is said to be prox-regular at \(\bar{x}\).
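For instance (a direct computation, not taken from [17]): \(g(x)=-x^2\) on \(I\!\!R\) satisfies \(\partial _Pg(x)=\{-2x\}\) and

$$\begin{aligned} g(x')=-x'^2= g(x)+\langle -2x,x'- x\rangle -\Vert x'- x\Vert ^2\ge g(x)+\langle -2x,x'- x\rangle -\frac{r}{2}\Vert x'- x\Vert ^2 \end{aligned}$$

for every \(x, x'\) as soon as \(r\ge 2\), so \(g\) is prox-regular at every point with \(r=2\) (here globally, with any \(\varepsilon \)).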

Fundamental insights into the properties of a function \(g\) come from the study of its Moreau–Yosida regularization \(g_\lambda \) and the associated proximal mapping \(\text{ prox }_{\lambda g}\) defined for \(\lambda >0\) respectively by

$$\begin{aligned} g_\lambda (x)=\inf _{u\in H_2}\left\{ g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\right\} \quad \text{ and } \quad \text{ prox }_{\lambda g}(x):=\text{ arg }\min _{u\in H_2}\left\{ g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\right\} . \end{aligned}$$

The latter is a fundamental tool in optimization and it was shown that a fixed point iteration on the proximal mapping could be used to develop a simple optimization algorithm, namely, the proximal point algorithm.

Note also (see for example Sect. 1 in [8]) that local minima are zeroes of the proximal subdifferential and that, in the convex case, the proximal subdifferential coincides with the convex subdifferential.

Now, let us state the following proposition, which summarizes some important consequences of prox-regularity, see [17, Theorem 4.4] and [2, Lemma 3.1 and Theorem 3.4].

Proposition 3.1

Suppose that \(g\) is locally lower semicontinuous at \(\bar{x}\) and prox-regular at \(\bar{x}\) for \(\bar{v}=0\) with respect to \(r\) and \(\varepsilon \). Let \(T_\varepsilon \) be the \(g\)-attentive \(\varepsilon \)-localisation of \(\partial _Pg\) around \((\bar{x},\bar{v})\). Then for each \(\lambda \in ]0,1/r[\) there is a neighborhood \(U_\lambda \) of \(\bar{x}\) such that, on \(U_\lambda \)

  1. i)

    the mapping \(\text{ prox }_{\lambda g}\) is single-valued and Lipschitz continuous with constant \(\frac{1}{1-\lambda r}\) and \(\text{ prox }_{\lambda g}(x)=(I+\lambda T_\varepsilon )^{-1}(x)=[\text{ singleton }]\).

  2. ii)

    \(g_\lambda \) is differentiable (more precisely, \(g_\lambda \) is of class \(\mathcal{C}^{1+}\)) with

    $$\begin{aligned} \nabla g_\lambda (x)=\frac{x-\text{ prox }_{\lambda g}(x)}{\lambda }=(\lambda I+T_\varepsilon ^{-1})^{-1}(x). \end{aligned}$$
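Continuing the example \(g(x)=-x^2\) (so that \(r=2\)): for \(\lambda \in ]0,1/2[\) the function \(u\mapsto g(u)+\frac{1}{2\lambda }\vert u-x\vert ^2\) is strongly convex, and a direct computation gives

$$\begin{aligned} \text{ prox }_{\lambda g}(x)=\frac{x}{1-2\lambda },\qquad g_\lambda (x)=-\frac{x^2}{1-2\lambda },\qquad \nabla g_\lambda (x)=\frac{x-\text{ prox }_{\lambda g}(x)}{\lambda }=-\frac{2x}{1-2\lambda }, \end{aligned}$$

so \(\text{ prox }_{\lambda g}\) is indeed single-valued and Lipschitz continuous with constant exactly \(\frac{1}{1-\lambda r}\), and \(g_\lambda \) is differentiable, as the proposition asserts.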

Now, let us prove the following key property of the proximal mapping complement.

Remark 3.2

If the assumptions of Proposition 3.1 hold true, then \(\forall \lambda \in (0,\frac{1}{r})\) and \(\forall x_1, x_2\in U_\lambda \), one has

$$\begin{aligned}&\langle (I- \text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2), x_1-x_2\rangle \\&\quad \ge \Vert (I- \text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2)\Vert ^2 -\frac{\lambda r}{(1-\lambda r)^2}\Vert x_1-x_2\Vert ^2. \end{aligned}$$

Observe that when \(r=0\), which amounts to saying that \(g\) is convex, we recover the fact that the mapping \(I-\text{ prox }_{\lambda g}\) is firmly nonexpansive.

Indeed, let \(x_i\in U_\lambda ,\,i=1, 2\). Then \(v_i= \frac{x_i-\text{ prox }_{\lambda g}(x_i)}{\lambda }\in T_\varepsilon (\text{ prox }_{\lambda g}(x_i))\). Invoking the prox-regularity of \(g\), we have the monotonicity of \(T_\varepsilon +rI\), see for instance [2, Theorem 3.4]. This implies, for the pairs \((x_1,v_1)\) and \((x_2,v_2)\), that

$$\begin{aligned} \langle v_1-v_2, \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\rangle \ge -r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2. \end{aligned}$$

We also have

$$\begin{aligned} \langle v_1-v_2, x_1-x_2\rangle&= \langle v_1-v_2, x_1-\text{ prox }_{\lambda g}(x_1)-(x_2-\text{ prox }_{\lambda g}(x_2))\rangle \\&\quad +\, \langle v_1-v_2, \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\rangle \\&\ge \lambda ^{-1} \Vert x_1-\text{ prox }_{\lambda g}(x_1)-(x_2-\text{ prox }_{\lambda g}(x_2))\Vert ^2\\&\quad -\,r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2, \end{aligned}$$

hence

$$\begin{aligned}&\langle (I-\text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2),x_1-x_2\rangle \\&\quad \ge \Vert (I-\text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2)\Vert ^2 -\lambda r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2. \end{aligned}$$

Combining the last inequality with Lipschitz continuity of the proximal mapping, i.e.,

$$\begin{aligned} \Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert \le \frac{1}{1-\lambda r}\Vert x_1-x_2\Vert , \end{aligned}$$

we obtain the desired result.

We state also a lemma which will be needed in the sequel.

Lemma 3.3

(See Polyak [18, Lemma 2.2.2]).

Let \((a_k), (\beta _k)\) and \((\gamma _k),\,k\in I\!\!N\) be three sequences of nonnegative numbers satisfying \(a_{k+1}\le (1+\beta _k)a_k+\gamma _k.\) If \(\sum _{k=0}^\infty \beta _k<+\infty \) and \(\sum _{k=0}^\infty \gamma _k<+\infty \), then \((a_k)\) is convergent.

Now, the regularization parameter \(\lambda \) is allowed to vary in algorithm (2.1): considering possibly variable parameters \(\lambda _k\in (0,\frac{1}{r}-\varepsilon )\) (for some \(\varepsilon >0\) small enough) and \(\mu _k>0\), our interest is in studying the convergence properties of the following algorithm:

Split proximal algorithm Given an initial point \(x_0\in H_1\). Assume that \(x_k\) has been constructed and \(\theta (x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda _k\mu _kf}(x_k-\mu _kA^*(I- \text{ prox }_{\lambda _k g})(Ax_k)), \end{aligned}$$
(3.4)

where the stepsize \(\mu _k:=\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}\) with \(0<\rho _k<4\). If \(\theta (x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (3.1) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (3.4).
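A sketch of (3.4) under illustrative assumptions: \(f=\frac{1}{2}\Vert \cdot -a\Vert ^2\); the prox-regular choice \(g=-\frac{c}{2}\Vert \cdot -b\Vert ^2\) (a multivariate version of the example after Definition 3.2, with \(r=c\)); \(b:=Aa\), so that \(\bar{x}=a\) solves (3.1); a fat random matrix as the surjective \(A\); and the summable schedule \(\lambda _k=\lambda _0/(k+1)^2\). All of these choices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))            # fat matrix: surjective a.s., so (*) holds
a = rng.standard_normal(5)
b = A @ a                                  # makes x = a a solution of (3.1)
c = 2.0                                    # g(u) = -(c/2)||u - b||^2 is prox-regular, r = c

prox_f = lambda x, t: (x + t * a) / (1.0 + t)          # prox of t*(1/2)||. - a||^2
prox_g = lambda y, t: (y - t * c * b) / (1.0 - t * c)  # localized prox, valid for t < 1/c

rho, mu = 1.0, 1.0
x = a + 0.1 * rng.standard_normal(5)       # ||x_0 - x_bar|| small, as in Theorem 3.4
for k in range(500):
    lam = 0.1 / (k + 1) ** 2               # summable schedule, lam_k < 1/r
    r_k = A @ x - prox_g(A @ x, lam)       # (I - prox_{lam_k g})(A x_k)
    grad_h = A.T @ r_k
    grad_l = x - prox_f(x, mu * lam)
    h, l = 0.5 * (r_k @ r_k), 0.5 * (grad_l @ grad_l)
    theta2 = grad_h @ grad_h + grad_l @ grad_l
    if theta2 == 0:                        # x_k already solves (3.1): stop
        break
    mu = rho * (h + l) / theta2
    x = prox_f(x - mu * grad_h, mu * lam)  # iteration (3.4)

print(np.linalg.norm(x - a))               # distance to the solution
```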

Theorem 3.4

Assume that \(f\) is a proper convex lower-semicontinuous function, \(g\) is locally lower semicontinuous at \(A\bar{x}\), prox-bounded and prox-regular at \(A\bar{x}\) for \(\bar{v}=0\) with \(\bar{x}\) a point which solves (3.1) and \(A\) a bounded linear operator which is surjective with a dense domain. If the parameters satisfy the following conditions \(\sum \nolimits _{k=0}^\infty \lambda _k<+\infty \) and \( \inf _{k}\rho _k(\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k)>0\), and if \(\Vert x_0-\bar{x}\Vert \) is small enough, then the sequence \((x_k)\) generated by (3.4) weakly converges to a solution of (3.1).

Proof

Using the facts that \(\text{ prox }_{\lambda _k\mu _k f}\) is nonexpansive and that \(\bar{x}\) solves (3.1) (critical points of any function are exactly the fixed points of its proximal mapping), and having in mind Remark 3.2, we can write

$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert ^2&\le \Vert x_k-\bar{x}\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k \langle \nabla h(x_k), x_k-\bar{x}\rangle \\&\le \Vert x_k-\bar{x}\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k\Big (2h(x_k)-\frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2\Big ) \\&= \Vert x_k-\bar{x}\Vert ^2+2\mu _k\frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2-4\mu _kh(x_k)+ \mu _k^2\Vert \nabla h(x_k)\Vert ^2\\&\le \Vert x_k-\bar{x}\Vert ^2+2\rho _k\frac{h(x_k)+l(x_k)}{\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2} \frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\Big (\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\Big )\frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\\&\le \left( 1+\lambda _k\rho _k\left( \frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}+\frac{2l(x_k)}{\Vert \nabla l(x_k)\Vert ^2}\right) \frac{ r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\right) \Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\\&= \left( 1+\lambda _k\rho _k\left( 1+\frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}\right) \frac{r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\right) \Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}, \end{aligned}$$

where the last equality uses \(2l(x_k)=\Vert \nabla l(x_k)\Vert ^2\) (recall that \(\nabla l(x)=(I-\text{ prox }_{\mu _k\lambda _k f})x\)).

Recall that

$$\begin{aligned} (\star ) \qquad A \text{ is } \text{ surjective } \text{ with } \text{ a } \text{ dense } \text{ domain } \Leftrightarrow \exists \gamma >0 \text{ such } \text{ that } \Vert A^*x\Vert \ge \gamma \Vert x\Vert , \end{aligned}$$

see for example Brezis [3, Theorem II.19]. This ensures that

$$\begin{aligned} \frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}=\frac{\Vert (I- \text{ prox }_{\lambda _k g})(Ax_k)\Vert ^2}{\Vert A^*(I- \text{ prox }_{\lambda _k g})(Ax_k)\Vert ^2}\le \frac{1}{\gamma ^2}. \end{aligned}$$

Conditions on the parameters \(\lambda _k\) and \(\rho _k\) assure the existence of a positive constant \(M\) such that

$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert ^2 \le (1+ M\lambda _k)\Vert x_k-\bar{x}\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}.\nonumber \\ \end{aligned}$$
(3.5)

In view of inequality (3.5), Lemma 3.3 ensures the existence of \(\displaystyle {\lim _{k\rightarrow +\infty }}\Vert x_k-\bar{x}\Vert ^2\), since \(\sum _{k=0}^\infty \lambda _k<+\infty \).

Now, summing inequality (3.5) over \(k=0, 1,\ldots \) and taking into account the assumption \(\sigma :=\inf _{k}\rho _k(\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k)>0\), we obtain

$$\begin{aligned} \sigma \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}\le \Vert x_0-\bar{x}\Vert ^2+M\sum _{k=0}^\infty \lambda _k\Vert x_k-\bar{x}\Vert ^2, \end{aligned}$$
(3.6)

and by invoking the facts that \((\Vert x_k-\bar{x}\Vert ^2)\) is bounded and \(\sum _{k=0}^\infty \lambda _k<+\infty \), we infer

$$\begin{aligned} \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}<+\infty . \end{aligned}$$
(3.7)

Following the proof of Theorem 2.2, we conclude that the sequence \((x_k)\) weakly converges to a solution of (3.1). \(\square \)

Remark 3.5

In inverse problems, certain physical properties of the target solution are most suitably expressed in terms of the coefficients of its representation with respect to a family of finitely or infinitely many vectors \((e_k)_k\) of a Hilbert space which constitutes a frame (see for instance [7] and the references therein); namely, there exist two constants \(0<\underline{\beta }\le \overline{\beta }<+\infty \) such that

$$\begin{aligned} \forall x\in H \quad \underline{\beta }\Vert x\Vert ^2\le \sum _k\vert \langle x, e_k\rangle \vert ^2 \le \overline{\beta }\Vert x\Vert ^2. \end{aligned}$$

The associated frame operator is the injective bounded operator \(F\) defined on \(H\) by \(F(x):=(\langle x, e_k\rangle )_k\), whose adjoint is the surjective operator defined by \(F^*((\xi _k)_k):=\sum _k\xi _k e_k\).
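As a concrete finite-dimensional illustration (the classical Mercedes–Benz tight frame, not an example taken from [7] or [13]): three unit vectors of \(I\!\!R^2\) at mutual angles of \(120°\) form a tight frame with \(\underline{\beta }=\overline{\beta }=3/2\); numerically, the frame bounds are the extreme eigenvalues of \(F^*F\).

```python
import numpy as np

# Mercedes-Benz frame: three unit vectors at 120 degrees in R^2.
E = np.array([[0.0, 1.0],
              [-np.sqrt(3) / 2, -0.5],
              [np.sqrt(3) / 2, -0.5]])

# Analysis operator F: x -> (<x, e_k>)_k, here the 3x2 matrix E.
frame_bounds = np.linalg.eigvalsh(E.T @ E)   # eigenvalues of F*F
print(frame_bounds)                          # [1.5, 1.5]: a tight frame
```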

Let \(\bar{x} \in H\) be the target solution of the underlying inverse problem in which a priori information (e.g., sparsity, distribution, statistical properties) is available about the coefficients \((\bar{\xi }_k)\) of the decomposition of \(\bar{x}\) in \((e_k)_k\). To recover \(\bar{x}\), a natural formulation is a variational problem in the space \(l^2\) of frame coefficients (where a priori information on \((\bar{\xi }_k)\) can easily be incorporated) involving two functions \(f\) and \(g\circ F^*\), with \(f, g\) two proper lower semicontinuous convex functions (see for example [7] and the references therein). In this context, the assumption on the linear operator \(A:=F^*\) is satisfied. Furthermore, in the particular case of a Riesz basis, relation \((\star )\) is satisfied with \(\gamma =\sqrt{\underline{\beta }}\), see [13].