1 Introduction and preliminaries

The split feasibility problem has received much attention due to its applications in signal processing and image reconstruction [9], with particular progress in intensity-modulated radiation therapy [6]. For a complete study of algorithms for solving convex feasibility problems, including comments on their applications and an extensive bibliography, see, for example, [1]. Our purpose here is to study the more general case of proximal split minimization problems and to investigate the convergence properties of the associated numerical solutions. To begin with, we are interested in finding a solution \(x^*\in H_1\) of the following problem

$$\begin{aligned} \min _{x\in H_1}\{f(x)+g_\lambda (Ax)\}, \end{aligned}$$
(1.1)

where \(H_1, H_2\) are two real Hilbert spaces, \(f:H_1\rightarrow I\!\!R\cup \{+\infty \},\,g:H_2\rightarrow I\!\!R\cup \{+\infty \}\) two proper convex lower semicontinuous functions and \(A:H_1\rightarrow H_2\) a bounded linear operator, \(g_{\lambda } (y)=\min _{u\in H_2}\{g(u)+\frac{1}{2\lambda }\Vert u-y\Vert ^2\}\) stands for the Moreau–Yosida approximate of the function \(g\) of parameter \(\lambda \).
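For a concrete illustration (a standard example, chosen by us rather than taken from the references): for \(g=\vert \cdot \vert \) on \(I\!\!R\), the proximal mapping is the soft-thresholding operator and the Moreau–Yosida approximate is the Huber function,

$$\begin{aligned} \text{ prox }_{\lambda g}(y)=\text{ sign }(y)\max \{\vert y\vert -\lambda ,0\} \quad \text{ and } \quad g_\lambda (y)= \left\{ \begin{array}{l@{\quad }l} \frac{y^2}{2\lambda } &{} \text{ if } \vert y\vert \le \lambda ,\\ \vert y\vert -\frac{\lambda }{2} &{}\text{ otherwise }, \end{array} \right. \end{aligned}$$

so that \(g_\lambda \) is everywhere differentiable even though \(g\) is not.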

Note that the differentiability of the Yosida approximate \(g_\lambda \) (see, for instance, [19]) ensures the additivity of the subdifferentials, so that we can write

$$\begin{aligned} \partial (f(x)+g_\lambda (Ax))=\partial f(x)+ A^*\nabla g_\lambda (Ax)=\partial f(x)+ A^*\left( \frac{I-\text{ prox }_{\lambda g}}{\lambda }\right) (Ax). \end{aligned}$$

The optimality condition of (1.1) can be then written as

$$\begin{aligned} 0\in \lambda \partial f(x^*)+ A^*(I-\text{ prox }_{\lambda g})(Ax^*), \end{aligned}$$
(1.2)

where \(\text{ prox }_{\lambda g} (x)=argmin_{u\in H_2}\{g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\}\) stands for the proximal mapping of \(g\) and the subdifferential of \(f\) at \(x\) is the set

$$\begin{aligned} \partial f(x):=\{u\in H_1: \ f(y)\ge f(x)+\langle u, y-x\rangle \text{ for } \text{ all } \ y\in H_1\}. \end{aligned}$$

Inclusion (1.2) in turn yields the following equivalent fixed-point formulation

$$\begin{aligned} x^*=\text{ prox }_{\mu \lambda f}\big (x^*- \mu A^*(I-\text{ prox }_{\lambda g})(Ax^*)\big ). \end{aligned}$$
(1.3)

To solve (1.1), relation (1.3) suggests considering the following split proximal algorithm

$$\begin{aligned} x_{k+1}=\text{ prox }_{\mu _k\lambda f}\big (x_k- \mu _k A^*(I-\text{ prox }_{\lambda g})(Ax_k)\big ). \end{aligned}$$
(1.4)

Observe that by taking \(f=\delta _C\) [defined as \(\delta _C(x)=0\) if \(x\in C\) and \(+\infty \) otherwise], \(g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, Problem (1.1) reduces to

$$\begin{aligned} \min _{x\in H_1}\{\delta _C(x)+(\delta _Q)_\lambda (Ax)\}\Leftrightarrow \min _{x\in C}\{\frac{1}{2\lambda }\Vert (I -P_Q)(Ax)\Vert ^2\}, \end{aligned}$$
(1.5)

which, when \(C\cap A^{-1}(Q)\not =\emptyset \), is equivalent to the so-called split feasibility problem, namely

$$\begin{aligned} x\in C \quad \text{ such } \text{ that } Ax\in Q. \end{aligned}$$
(1.6)

This problem was used for solving an inverse problem in radiation therapy treatment planning [6] and has been well studied both theoretically and practically, see for example [1, 5] and the references therein. In this context, (1.4) reduces to the so-called CQ-algorithm introduced by Byrne [4]

$$\begin{aligned} x_{k+1}=P_C\big (x_k- \mu _k A^*(I-P_Q)(Ax_k)\big ); \end{aligned}$$
(1.7)

where the step-size \(\mu _k\) is chosen in \((0,2/\Vert A\Vert ^2)\) and \(P_C, P_Q\) stand for the orthogonal projections on the closed convex sets \(C\) and \(Q\), respectively.
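To make the iteration concrete, here is a minimal sketch of (1.7) in Python, assuming for the illustration that \(C\) and \(Q\) are Euclidean balls (so the projections have closed forms) and taking a random matrix as \(A\); none of these choices comes from [4].

```python
import numpy as np

def project_ball(x, radius):
    """Orthogonal projection onto the closed ball of given radius centered at 0."""
    n = np.linalg.norm(x)
    return x if n <= radius else (radius / n) * x

# Toy data (assumptions made for the example).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
P_C = lambda x: project_ball(x, 1.0)   # C = closed unit ball of H1
P_Q = lambda y: project_ball(y, 0.5)   # Q = ball of radius 0.5 in H2

mu = 1.0 / np.linalg.norm(A, 2) ** 2   # fixed step in (0, 2/||A||^2)
x = rng.standard_normal(3)
for _ in range(500):
    x = P_C(x - mu * A.T @ (A @ x - P_Q(A @ x)))   # iteration (1.7)

print(np.linalg.norm(A @ x - P_Q(A @ x)))  # residual ||(I - P_Q)(Ax)||
```

Here the problem is consistent (\(0\in C\) and \(A0\in Q\)), so the residual tends to zero; note that the step requires the spectral norm of \(A\), which is exactly the difficulty discussed next.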

The determination of the step-size in (1.7) (likewise for its Krasnoselskii–Mann version, see for instance [21–23], and for (1.4), see for example [9] and the references therein) depends on the operator norm, whose computation (or even estimation) is not an easy task. To overcome this difficulty, Lopez et al. [10] introduced a new choice of the step-size sequence \((\mu _k)\) and proposed the following algorithm:

Algorithm 3.1

Given an arbitrary initial point \(x_0\in H_1\). Assume that \(x_k\in C\) has been constructed and \(\nabla \tilde{h}(x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}=P_C\big (x_k-\mu _kA^*(I- P_Q)(Ax_k)\big ), \end{aligned}$$
(1.8)

where \(\mu _k:=\rho _k\frac{ \tilde{h}(x_k)}{\Vert \nabla \tilde{h}(x_k)\Vert ^2}\) with \(0<\rho _k<4\) and \(\tilde{h}(x)=\frac{1}{2}\Vert (I-P_Q)Ax \Vert ^2\). If \(\nabla \tilde{h}(x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (1.6) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (1.8).

They proved the weak convergence of the sequence generated by (1.8) provided that (1.6) is consistent and \(\inf _k\rho _k(4-\rho _k)>0\).
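Continuing the toy example above, the self-adaptive rule replaces the fixed step by a quotient computed from the current iterate; a sketch (with the hypothetical choice \(\rho _k\equiv 1\)):

```python
rho = 1.0  # any choice with inf_k rho_k (4 - rho_k) > 0
for _ in range(500):
    r = A @ x - P_Q(A @ x)            # (I - P_Q)(A x_k)
    grad = A.T @ r                    # gradient of h~ at x_k
    g2 = grad @ grad
    if g2 == 0:                       # x_k already solves (1.6): stop
        break
    mu = rho * (0.5 * (r @ r)) / g2   # mu_k = rho_k h~(x_k) / ||grad h~(x_k)||^2
    x = P_C(x - mu * grad)            # iteration (1.8)
```

No knowledge of \(\Vert A\Vert \) enters the step-size rule, which is the whole point of the construction.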

At this stage we would like to emphasize that our interest, in the first part of the present paper, is in solving (1.1) in the case \(\textit{argmin} \ f\cap A^{-1}(\textit{argmin} \ g)\not =\emptyset \), or in other words: in finding a minimizer \(x^*\) of \(f\) such that \(Ax^*\) minimizes \(g\), namely

$$\begin{aligned} x^*\in \textit{argmin} f \quad \text{ such } \text{ that } Ax^*\in \textit{argmin} \ g, \end{aligned}$$
(1.9)

\(f,\,g\) being two proper, lower semicontinuous convex functions, \(\textit{argmin} \ f:=\{\bar{x}\in H_1: f(\bar{x})\le f(x) \ \forall x\in H_1\}\) and \(\textit{argmin} \ g:=\{\bar{y}\in H_2: g(\bar{y})\le g(y) \ \forall y\in H_2\}\). \(\Gamma \) will denote the solution set.

This problem was considered, for instance, in [5] and [14]; however, the iterative methods proposed there require a priori knowledge of the norm (or at least an estimate of the norm) of the bounded linear operator \(A\). To avoid this difficulty, inspired by the idea introduced in [10], we develop in Sect. 2 an algorithm with a step-size selection rule whose implementation does not require any prior information about the operator norm, and we prove the related convergence result. This result is an extension of [10, Theorem 3.5].

In the second part of this paper, we will still assume \(f\) to be convex, but we allow the function \(g\) to be nonconvex. In the case of indicator functions of sets with \(A=I\), such a situation is encountered in the numerical solution of the phase retrieval problem in inverse scattering [11] and is therefore of great practical interest. Here, we consider the more general problem of finding a minimizer \(\bar{x}\) of \(f\) such that \(A\bar{x}\) is a critical point of \(g\), namely

$$\begin{aligned} 0\in \partial f({\bar{x}}) \quad \text{ such } \text{ that } 0\in \partial _P g(A{\bar{x}}), \end{aligned}$$
(1.10)

where \(\partial _P\) stands for the proximal subdifferential of \(g\), which will be defined in Sect. 3.

At this time the nonconvex theory is much less developed than the convex one. A notable exception, in the fixed-point context, is the work by Luke [12], who studies the convergence of a projection/reflection algorithm in a prox-regular setting; there, however, the fixed-point operator is assumed to be locally firmly nonexpansive. In [15] a proximal approach was also developed for finding critical points of uniformly prox-regular functions. Here, in the case of variable regularization parameters \((\lambda _k)\), we will prove in Sect. 3 the convergence of our split proximal algorithm provided that the bounded linear operator is surjective. The latter assumption is always satisfied in inverse problems in which a priori information is available about the representation of the target solution in a frame, see for instance [7] and the references therein.

2 Convex minimization feasibility problem

Now, we are in a position to introduce a new way of selecting the step-sizes. To that end, we set \(\theta (x_k):=\sqrt{\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2}\), where \(h(x)=\frac{1}{2}\Vert (I-\text{ prox }_{\lambda g})Ax \Vert ^2\) and \(l(x)=\frac{1}{2}\Vert (I-\text{ prox }_{\mu _k\lambda f})x \Vert ^2\), and introduce the following split proximal algorithm:

Split proximal algorithm Given an initial point \(x_0\in H_1\). Assume that \(x_k\) has been constructed and \(\theta (x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda \mu _kf}(x_k-\mu _kA^*(I- \text{ prox }_{\lambda g})(Ax_k)), \end{aligned}$$
(2.1)

where the stepsize \(\mu _k:=\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}\) with \(0<\rho _k<4\). If \(\theta (x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (1.9) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (2.1).

Observe that by taking \(f=\delta _C,\,g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, we recover [10, Algorithm 3.1]. Indeed, since \( \text{ prox }_{\lambda \mu _kf}=P_C\), the iterates \(x_k\) belong to \(C\) and thus \(\nabla l(x_k)=0\), so that \(\theta (x_k)\) reduces to \(\Vert \nabla \tilde{h}(x_k)\Vert \).
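A minimal sketch of (2.1) in Python, assuming for the illustration \(f=\frac{1}{2}\Vert \cdot -a\Vert ^2\) and \(g=\frac{1}{2}\Vert \cdot -b\Vert ^2\) with \(b:=Aa\) (so that \(\Gamma =\{a\}\) and both proximal mappings are explicit); the data, the fixed \(\rho _k\equiv 1\) and the use of the previous \(\mu _k\) inside \(\nabla l\) are simplifications made for the sketch, not prescriptions of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
a = rng.standard_normal(4)
b = A @ a                                      # ensures argmin f ∩ A^{-1}(argmin g) = {a}

prox_f = lambda x, t: (x + t * a) / (1.0 + t)  # prox of t*(1/2)||. - a||^2
prox_g = lambda y, t: (y + t * b) / (1.0 + t)  # prox of t*(1/2)||. - b||^2

lam, rho = 1.0, 1.0
x, mu = rng.standard_normal(4), 1.0
for _ in range(2000):
    r = A @ x - prox_g(A @ x, lam)        # (I - prox_{lam g})(A x_k)
    grad_h = A.T @ r                      # nabla h(x_k) = A*(I - prox_{lam g})(A x_k)
    grad_l = x - prox_f(x, mu * lam)      # nabla l(x_k), with the previous mu
    h, l = 0.5 * (r @ r), 0.5 * (grad_l @ grad_l)
    theta2 = grad_h @ grad_h + grad_l @ grad_l
    if theta2 == 0:                       # theta(x_k) = 0: x_k already solves (1.9)
        break
    mu = rho * (h + l) / theta2           # self-adaptive step-size of (2.1)
    x = prox_f(x - mu * grad_h, mu * lam) # iteration (2.1)

print(np.linalg.norm(x - a))              # distance to the solution
```

Theorem 2.2 below constrains \(\rho _k\) in terms of \(h(x_k)\) and \(l(x_k)\); a careful implementation should enforce \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)}-\varepsilon \) at each iteration.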

Recall that the proximal mapping of \(g\) is firmly nonexpansive, namely

$$\begin{aligned} \langle \text{ prox }_{\mu g} y-\text{ prox }_{\mu g} y', y-y'\rangle \ge \Vert \text{ prox }_{\mu g} y-\text{ prox }_{\mu g} y'\Vert ^2 \quad \text{ for } \text{ all } \ y, y'\in H_2, \end{aligned}$$

and the same holds for its complement \(I-\text{ prox }_{\mu g}\), see for example [7].

In order to prove the main result of this section, the idea is to apply the following well-known result on weak convergence in Hilbert spaces (Opial's lemma) with \(S:=\Gamma \). The original proof of this lemma in [16] requires \(S\) to be closed and convex; an argument given in [20] shows that convexity is not needed.

Lemma 2.1

see [16, 20].

Let \({H}\) be a Hilbert space and \((x_k)\) a sequence in \({ H}\) such that there exists a nonempty closed set \(S\subset {H}\) satisfying:

  1. (i)

    For every \(z \in S,\,\lim _k \Vert x_k-z\Vert \) exists.

  2. (ii)

    Any weak cluster point of the sequence \((x_k)\) belongs to \(S\).

Then, there exists \(\bar{x}\in S\) such that \((x_k)\) weakly converges to \(\bar{x}\).

The following theorem contains the main convergence result of this section.

Theorem 2.2

Assume that \(f\) and \(g\) are two proper convex lower-semicontinuous functions and that (1.9) is consistent (i.e., \(\Gamma \not =\emptyset \)). If the parameters satisfy the condition \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)} -\varepsilon \) (for some \(\varepsilon >0\) small enough), then the sequence \((x_k)\) generated by (2.1) weakly converges to a solution of (1.9).

Proof

Let \(z\in \Gamma \) and note that \(\nabla h(x)=A^*(I-\text{ prox }_{\lambda g})Ax\) and \(\nabla l(x)=(I-\text{ prox }_{\mu _k\lambda f})x\). Using the facts that \(\text{ prox }_{\lambda \mu _k f}\) is nonexpansive and that \(z\) solves (1.9) (minimizers of any function are exactly the fixed points of its proximal mapping, so that \((I-\text{ prox }_{\lambda g})Az=0\)), and having in hand

$$\begin{aligned} \langle \nabla h(x_k), x_k-z\rangle = \langle (I-\text{ prox }_{\lambda g})Ax_k, Ax_k-Az\rangle \ge \Vert (I-\text{ prox }_{\lambda g})Ax_k \Vert ^2=2h(x_k), \end{aligned}$$

thanks to the fact that \(I-\text{ prox }_{\lambda g}\) is firmly nonexpansive, we can write

$$\begin{aligned} \Vert x_{k+1}-z\Vert ^2&\le \Vert x_k-z\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k \langle \nabla h(x_k), x_k-z\rangle \\&\le \Vert x_k-z\Vert ^2+ \mu _k^2\Vert \nabla h(x_k)\Vert ^2 -4\mu _kh(x_k)\\&= \Vert x_k-z\Vert ^2+\rho _k^2\frac{( h(x_k)+l(x_k))^2}{(\theta ^2(x_k))^2}\Vert \nabla h(x_k)\Vert ^2- 4\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}h(x_k)\\&\le \Vert x_k-z\Vert ^2+\rho _k^2\frac{( h(x_k)+l(x_k))^2}{\theta ^2(x_k)}- 4\rho _k\frac{( h(x_k)+l(x_k))^2}{\theta ^2(x_k)}\frac{ h(x_k)}{h(x_k)+l(x_k)}\\&= \Vert x_k-z\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}. \end{aligned}$$

Thus

$$\begin{aligned} \Vert x_{k+1}-z\Vert ^2 \le \Vert x_k-z\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}.\qquad \end{aligned}$$
(2.2)

The sequence \((x_k)\) is thus Fejér monotone with respect to \(\Gamma \), which ensures the existence of the limit

$$\begin{aligned} \displaystyle {\lim _{k\rightarrow +\infty }}\Vert x_k-z\Vert ^2<+\infty , \end{aligned}$$
(2.3)

and hence the first condition of Lemma 2.1 is satisfied. The latter in turn implies that \((x_k)\) is bounded.

Now, summing inequality (2.2) over \(k=0, 1,\ldots \), we obtain

$$\begin{aligned} \sum _{k=0}^\infty \rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\le \Vert x_0-z\Vert ^2<+\infty . \end{aligned}$$
(2.4)

On the other hand, the assumption \(\varepsilon \le \rho _k\le \frac{4h(x_k)}{h(x_k)+l(x_k)}-\varepsilon \) together with (2.4) ensures that

$$\begin{aligned} \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}<+\infty . \end{aligned}$$
(2.5)

Consequently,

$$\begin{aligned} \lim _{k\rightarrow +\infty }(h(x_k)+l(x_k))=0\Leftrightarrow \lim _{k\rightarrow +\infty } h(x_k)=0 \quad \text{ and } \quad \lim _{k\rightarrow +\infty } l(x_k)=0,\qquad \end{aligned}$$
(2.6)

because \(\theta ^2(x_k):=\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2\) is bounded. This follows from the fact that \(\nabla h\) is Lipschitz continuous with constant \(\Vert A\Vert ^2\), \(\nabla l\) is nonexpansive and \((x_k)\) is bounded. More precisely, for any \(z\) which solves (1.9), we have

$$\begin{aligned} \Vert \nabla h(x_k)\Vert&= \Vert \nabla h(x_k)-\nabla h(z)\Vert \le \Vert A\Vert ^2\Vert x_k-z\Vert \quad \text{ and } \\ \Vert \nabla l(x_k)\Vert&= \Vert \nabla l(x_k)-\nabla l(z)\Vert \le \Vert x_k-z\Vert . \end{aligned}$$

Now, let \(\tilde{x}\) be a weak cluster point of \((x_k)\); then there exists a subsequence \((x_{k_\nu })\) which converges weakly to \(\tilde{x}\). The lower-semicontinuity of \(h\) then implies that

$$\begin{aligned} 0\le h(\tilde{x})\le \liminf _{\nu \rightarrow +\infty } h(x_{k_\nu })=\lim _{k\rightarrow +\infty }h(x_k)=0. \end{aligned}$$

That is \(h(\tilde{x})=\frac{1}{2}\Vert (I-\text{ prox }_{\lambda g})A\tilde{x} \Vert ^2=0\), i.e. \(A\tilde{x}\) is a fixed point of the proximal mapping of \(g\) or equivalently \(0\in \partial g(A\tilde{x})\). In other words \(A\tilde{x}\) is a minimizer of \(g\).

Likewise, the lower-semicontinuity of \(l\) implies that

$$\begin{aligned} 0\le l(\tilde{x})\le \liminf _{\nu \rightarrow +\infty } l(x_{k_\nu })=\lim _{k\rightarrow +\infty }l(x_k)=0. \end{aligned}$$

That is \(l(\tilde{x})=\frac{1}{2}\Vert (I-\text{ prox }_{\mu _k\lambda f})\tilde{x} \Vert ^2=0\), i.e. \(\tilde{x}\) is a fixed point of the proximal mapping of \(f\) or equivalently \(0\in \partial f(\tilde{x})\). In other words \(\tilde{x}\) is also a minimizer of \(f\) and thus a solution of problem (1.9). The second condition of Lemma 2.1 is also verified and consequently the whole sequence \((x_k)\) converges weakly to a solution of problem (1.9). This completes the proof. \(\square \)

Remark 2.3

  1. i)

    When the bounded linear operator \(A\) is the identity operator, (1.9) is nothing other than the problem of finding a common minimizer of \(f\) and \(g\), and (2.1) reduces to the following relaxed split proximal algorithm

    $$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda \mu _kf}\big ((1-\mu _k)x_k+\mu _k \text{ prox }_{\lambda g}(x_k)\big ). \end{aligned}$$
    (2.7)
  2. ii)

    We would like also to emphasize that by taking \(f=\delta _C,\,g=\delta _Q\) the indicator functions of two nonempty closed convex sets \(C, Q\) of \(H_1\) and \(H_2\) respectively, (2.1) reduces to [10, Algorithm 3.1] and we recover the corresponding convergence result, namely [10, Theorem 3.5].

  3. iii)

    It is worth mentioning that our approach works for split equilibrium and split null point problems considered in [5] and [14], respectively. To that end, just replace the proximal mappings of the convex functions \(f\) and \(g\) by the resolvent operators associated to two monotone equilibrium bifunctions and two maximal monotone operators, respectively.

3 A step towards the nonconvex case

Throughout this section \(g\) is assumed to be prox-regular. The following problem

$$\begin{aligned} 0\in \partial f(\bar{x}) \quad \text{ such } \text{ that } \quad 0\in \partial _P g(A\bar{x}), \end{aligned}$$
(3.1)

is very general in the sense that it includes, as special cases, \(g\) convex and \(g\) a lower-\(\mathcal{C}^2\) function. The latter class is of great importance in optimization, since such a function can be locally expressed as a difference \(\tilde{g}-\frac{r}{2}\Vert \cdot \Vert ^2\), where \(\tilde{g}\) is a finite convex function; it therefore covers a large core of problems of interest in variational analysis and optimization. It should be noticed that examples abound of practitioners needing algorithms for solving nonconvex problems, for instance in crystallography, astronomy and, more recently, inverse scattering, see for example [12].

We start with some elementary facts of prox-regularity. We denote by \(B(x,\varepsilon )\) (respectively \(B[x,\varepsilon ]\)) the open (respectively closed) ball around \(x\) with radius \(\varepsilon \).

Let \(g:H_2\rightarrow I\!\!R\cup \{+\infty \}\) be a function and let \(\bar{x}\in \text{ dom } g\), i.e., \(g(\bar{x})<+\infty \). Poliquin and Rockafellar [17] introduced the concept of the proximal subdifferential and then investigated the limiting proximal subdifferential (see also Bernard and Thibault [2] for the Hilbert setting). More precisely, the proximal subdifferential \(\partial _Pg(\bar{x})\) is defined as follows.

Definition 3.1

A vector \(v\) is in \(\partial _Pg(\bar{x})\) if there exist some \(r>0\) and \(\varepsilon >0\) such that for all \(x\in B(\bar{x},\varepsilon )\),

$$\begin{aligned} \langle v,x-\bar{x}\rangle \le g(x)-g(\bar{x})+\frac{r}{2}\Vert x-\bar{x}\Vert ^2. \end{aligned}$$
(3.2)

When \(g(\bar{x})=+\infty \), one puts \(\partial _Pg(\bar{x})=\emptyset \).
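As a simple one-dimensional illustration (a standard example, not taken from [17]): for \(g(x)=-\vert x\vert \) on \(I\!\!R\), no vector \(v\) can satisfy (3.2) at \(\bar{x}=0\) (testing \(x\rightarrow 0^+\) forces \(v\le -1\), while \(x\rightarrow 0^-\) forces \(v\ge 1\)), whereas \(g\) is smooth away from the origin, so

$$\begin{aligned} \partial _Pg(0)=\emptyset \quad \text{ and } \quad \partial _Pg(\bar{x})=\{-\text{ sign }(\bar{x})\} \ \text{ for } \ \bar{x}\not =0, \end{aligned}$$

in contrast with the convex case, where the subdifferential is nonempty at every point of continuity.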

Before stating the definition of prox-regularity of \(g\) and properties of its proximal mapping, we recall that \(g\) is locally l.s.c. at \(\bar{x}\) if its epigraph is closed relative to a neighborhood of \((\bar{x}, g(\bar{x}))\), that \(g\) is prox-bounded if it is minorized by a quadratic function, and that, for \(\varepsilon >0\), the \(g\)-attentive \(\varepsilon \)-localisation of \(\partial _Pg\) around \((\bar{x}, \bar{v})\) is the mapping \(T_\varepsilon : H_2\rightarrow 2^{H_2}\) defined by

$$\begin{aligned} T_\varepsilon (x)= \left\{ \begin{array}{l@{\quad }l} \{v\in \partial _Pg(x), \Vert v-\bar{v}\Vert <\varepsilon \}&{} \text{ if } \Vert x-\bar{x}\Vert <\varepsilon \text{ and } \vert g(x)-g(\bar{x})\vert <\varepsilon ,\\ \emptyset &{}\text{ otherwise }. \end{array} \right. \end{aligned}$$

Definition 3.2

A function \(g\) is said to be prox-regular at \(\bar{x}\) for \(\bar{v}\in \partial _Pg(\bar{x})\) if there exist some \(r>0\) and \(\varepsilon >0\) such that for all \(x, x'\in B(\bar{x},\varepsilon )\) with \(\vert g(x)-g(x')\vert <\varepsilon \) and all \(v\in B(\bar{v},\varepsilon )\) with \(v\in \partial _P g(x)\) one has

$$\begin{aligned} g(x')\ge g(x)+\langle v,x'- x\rangle -\frac{r}{2}\Vert x'- x\Vert ^2. \end{aligned}$$
(3.3)

If the property holds for all vectors \(\bar{v}\in \partial _P g(\bar{x})\), the function is said to be prox-regular at \(\bar{x}\).
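For instance (a direct computation, not taken from [17]): \(g(x)=-x^2\) on \(I\!\!R\) satisfies \(\partial _Pg(x)=\{-2x\}\) and

$$\begin{aligned} g(x')=-x'^2= g(x)+\langle -2x,x'- x\rangle -\Vert x'- x\Vert ^2\ge g(x)+\langle -2x,x'- x\rangle -\frac{r}{2}\Vert x'- x\Vert ^2 \end{aligned}$$

for every \(x, x'\) as soon as \(r\ge 2\), so \(g\) is prox-regular at every point with \(r=2\) (here globally, with any \(\varepsilon \)).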

Fundamental insights into the properties of a function \(g\) come from the study of its Moreau–Yosida regularization \(g_\lambda \) and the associated proximal mapping \(\text{ prox }_{\lambda g}\) defined for \(\lambda >0\) respectively by

$$\begin{aligned} g_\lambda (x)=\inf _{u\in H_2}\left\{ g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\right\} \quad \text{ and } \quad \text{ prox }_{\lambda g}(x):=\text{ arg }\min _{u\in H_2}\left\{ g(u)+\frac{1}{2\lambda }\Vert u-x\Vert ^2\right\} . \end{aligned}$$

The latter is a fundamental tool in optimization and it was shown that a fixed point iteration on the proximal mapping could be used to develop a simple optimization algorithm, namely, the proximal point algorithm.

Note also (see for example Sect. 1 in [8]) that local minima are zeroes of the proximal subdifferential and that, in the convex case, the proximal subdifferential coincides with the convex subdifferential.

Now, let us state the following proposition, which summarizes some important consequences of prox-regularity, see [17, Theorem 4.4] and [2, Lemma 3.1 and Theorem 3.4].

Proposition 3.1

Suppose that \(g\) is locally lower semicontinuous at \(\bar{x}\) and prox-regular at \(\bar{x}\) for \(\bar{v}=0\) with respect to \(r\) and \(\varepsilon \). Let \(T_\varepsilon \) be the \(g\)-attentive \(\varepsilon \)-localisation of \(\partial _Pg\) around \((\bar{x},\bar{v})\). Then for each \(\lambda \in ]0,1/r[\) there is a neighborhood \(U_\lambda \) of \(\bar{x}\) such that, on \(U_\lambda \)

  1. i)

    the mapping \(\text{ prox }_{\lambda g}\) is single-valued and Lipschitz continuous with constant \(\frac{1}{1-\lambda r}\) and \(\text{ prox }_{\lambda g}(x)=(I+\lambda T_\varepsilon )^{-1}(x)=[\text{ singleton }]\).

  2. ii)

    \(g_\lambda \) is differentiable (more precisely, \(g_\lambda \) is of class \(\mathcal{C}^{1+}\)) with

    $$\begin{aligned} \nabla g_\lambda (x)=\frac{x-\text{ prox }_{\lambda g}(x)}{\lambda }=(\lambda I+T_\varepsilon ^{-1})^{-1}(x). \end{aligned}$$
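Continuing the example \(g(x)=-x^2\) (so that \(r=2\)): for \(\lambda \in ]0,1/2[\) the function \(u\mapsto g(u)+\frac{1}{2\lambda }\vert u-x\vert ^2\) is strongly convex, and a direct computation gives

$$\begin{aligned} \text{ prox }_{\lambda g}(x)=\frac{x}{1-2\lambda },\qquad g_\lambda (x)=-\frac{x^2}{1-2\lambda },\qquad \nabla g_\lambda (x)=\frac{x-\text{ prox }_{\lambda g}(x)}{\lambda }=-\frac{2x}{1-2\lambda }, \end{aligned}$$

so \(\text{ prox }_{\lambda g}\) is indeed single-valued and Lipschitz continuous with constant exactly \(\frac{1}{1-\lambda r}\), and \(g_\lambda \) is differentiable, as the proposition asserts.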

Now, let us prove the following key property of the proximal mapping complement.

Remark 3.2

If the assumptions of Proposition 3.1 hold true, then \(\forall \lambda \in (0,\frac{1}{r})\) and \(\forall x_1, x_2\in U_\lambda \), one has

$$\begin{aligned}&\langle (I- \text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2), x_1-x_2\rangle \\&\quad \ge \Vert (I- \text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2)\Vert ^2 -\frac{\lambda r}{(1-\lambda r)^2}\Vert x_1-x_2\Vert ^2. \end{aligned}$$

Observe that when \(r=0\), which amounts to saying that \(g\) is convex, we recover the fact that the mapping \(I-\text{ prox }_{\lambda g}\) is firmly nonexpansive.

Indeed, let \(x_i\in U_\lambda ,\,i=1, 2\). Then \(v_i= \frac{x_i-\text{ prox }_{\lambda g}(x_i)}{\lambda }\in T_\varepsilon (\text{ prox }_{\lambda g}(x_i))\). Invoking the prox-regularity of \(g\), we have the monotonicity of \(T_\varepsilon +rI\), see for instance [2, Theorem 3.4]. This implies, for the pairs \((x_1,v_1)\) and \((x_2,v_2)\), that

$$\begin{aligned} \langle v_1-v_2, \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\rangle \ge -r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2. \end{aligned}$$

We also have

$$\begin{aligned} \langle v_1-v_2, x_1-x_2\rangle&= \langle v_1-v_2, x_1-\text{ prox }_{\lambda g}(x_1)-(x_2-\text{ prox }_{\lambda g}(x_2))\rangle \\&\quad +\, \langle v_1-v_2, \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\rangle \\&\ge \lambda ^{-1} \Vert x_1-\text{ prox }_{\lambda g}(x_1)-(x_2-\text{ prox }_{\lambda g}(x_2))\Vert ^2\\&\quad -\,r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2, \end{aligned}$$

hence

$$\begin{aligned}&\langle (I-\text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2),x_1-x_2\rangle \\&\quad \ge \Vert (I-\text{ prox }_{\lambda g})(x_1)-(I-\text{ prox }_{\lambda g})(x_2)\Vert ^2 -\lambda r\Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert ^2. \end{aligned}$$

Combining the last inequality with Lipschitz continuity of the proximal mapping, i.e.,

$$\begin{aligned} \Vert \text{ prox }_{\lambda g}(x_1)-\text{ prox }_{\lambda g}(x_2)\Vert \le \frac{1}{1-\lambda r}\Vert x_1-x_2\Vert , \end{aligned}$$

we obtain the desired result.

We state also a lemma which will be needed in the sequel.

Lemma 3.3

(See Polyak [18, Lemma 2.2.2]).

Let \((a_k), (\beta _k)\) and \((\gamma _k),\,k\in I\!\!N\) be three sequences of nonnegative numbers satisfying \(a_{k+1}\le (1+\beta _k)a_k+\gamma _k.\) If \(\sum _{k=0}^\infty \beta _k<+\infty \) and \(\sum _{k=0}^\infty \gamma _k<+\infty \), then \((a_k)\) is convergent.

Now, the regularization parameter \(\lambda \) is allowed to vary in algorithm (2.1): considering possibly variable parameters \(\lambda _k\in (0,\frac{1}{r}-\varepsilon )\) (for some \(\varepsilon >0\) small enough) and \(\mu _k>0\), our interest is in studying the convergence properties of the following algorithm:

Split proximal algorithm Given an initial point \(x_0\in H_1\). Assume that \(x_k\) has been constructed and \(\theta (x_k)\not =0\); then compute \(x_{k+1}\) via the rule

$$\begin{aligned} x_{k+1}= \text{ prox }_{\lambda _k\mu _kf}(x_k-\mu _kA^*(I- \text{ prox }_{\lambda _k g})(Ax_k)), \end{aligned}$$
(3.4)

where the stepsize \(\mu _k:=\rho _k\frac{ h(x_k)+l(x_k)}{\theta ^2(x_k)}\) with \(0<\rho _k<4\). If \(\theta (x_k)=0\), then \(x_{k+1}=x_k\) is a solution of (3.1) and the iterative process stops; otherwise, we set \(k:=k+1\) and go to (3.4).
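A sketch of (3.4) under illustrative assumptions: \(f=\frac{1}{2}\Vert \cdot -a\Vert ^2\); the prox-regular choice \(g=-\frac{c}{2}\Vert \cdot -b\Vert ^2\) (a multivariate version of the example after Definition 3.2, with \(r=c\)); \(b:=Aa\), so that \(\bar{x}=a\) solves (3.1); a fat random matrix as the surjective \(A\); and the summable schedule \(\lambda _k=\lambda _0/(k+1)^2\). All of these choices are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))            # fat matrix: surjective a.s., so (*) holds
a = rng.standard_normal(5)
b = A @ a                                  # makes x = a a solution of (3.1)
c = 2.0                                    # g(u) = -(c/2)||u - b||^2 is prox-regular, r = c

prox_f = lambda x, t: (x + t * a) / (1.0 + t)          # prox of t*(1/2)||. - a||^2
prox_g = lambda y, t: (y - t * c * b) / (1.0 - t * c)  # localized prox, valid for t < 1/c

rho, mu = 1.0, 1.0
x = a + 0.1 * rng.standard_normal(5)       # ||x_0 - x_bar|| small, as in Theorem 3.4
for k in range(500):
    lam = 0.1 / (k + 1) ** 2               # summable schedule, lam_k < 1/r
    r_k = A @ x - prox_g(A @ x, lam)       # (I - prox_{lam_k g})(A x_k)
    grad_h = A.T @ r_k
    grad_l = x - prox_f(x, mu * lam)
    h, l = 0.5 * (r_k @ r_k), 0.5 * (grad_l @ grad_l)
    theta2 = grad_h @ grad_h + grad_l @ grad_l
    if theta2 == 0:                        # x_k already solves (3.1): stop
        break
    mu = rho * (h + l) / theta2
    x = prox_f(x - mu * grad_h, mu * lam)  # iteration (3.4)

print(np.linalg.norm(x - a))               # distance to the solution
```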

Theorem 3.4

Assume that \(f\) is a proper convex lower-semicontinuous function, \(g\) is locally lower semicontinuous at \(A\bar{x}\), prox-bounded and prox-regular at \(A\bar{x}\) for \(\bar{v}=0\) with \(\bar{x}\) a point which solves (3.1) and \(A\) a bounded linear operator which is surjective with a dense domain. If the parameters satisfy the following conditions \(\sum \nolimits _{k=0}^\infty \lambda _k<+\infty \) and \( \inf _{k}\rho _k(\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k)>0\), and if \(\Vert x_0-\bar{x}\Vert \) is small enough, then the sequence \((x_k)\) generated by (3.4) weakly converges to a solution of (3.1).

Proof

Using the facts that \(\text{ prox }_{\lambda _k\mu _k f}\) is nonexpansive and that \(\bar{x}\) solves (3.1) (critical points of any function are exactly the fixed points of its proximal mapping), and having in mind Remark 3.2, we can write

$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert ^2&\le \Vert x_k-\bar{x}\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k \langle \nabla h(x_k), x_k-\bar{x}\rangle \\&\le \Vert x_k-\bar{x}\Vert ^2+\mu _k^2\Vert \nabla h(x_k)\Vert ^2 -2\mu _k\Big (2h(x_k)-\frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2\Big ) \\&= \Vert x_k-\bar{x}\Vert ^2+2\mu _k\frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2-4\mu _kh(x_k)+ \mu _k^2\Vert \nabla h(x_k)\Vert ^2\\&\le \Vert x_k-\bar{x}\Vert ^2+2\rho _k\frac{h(x_k)+l(x_k)}{\Vert \nabla h(x_k)\Vert ^2+\Vert \nabla l(x_k)\Vert ^2} \frac{\lambda _k r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\Big (\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\Big )\frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\\&\le \left( 1+\lambda _k\rho _k\left( \frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}+\frac{2l(x_k)}{\Vert \nabla l(x_k)\Vert ^2}\right) \frac{ r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\right) \Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}\\&= \left( 1+\lambda _k\rho _k\left( 1+\frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}\right) \frac{r\Vert A\Vert ^2}{(1-\lambda _k r)^2}\right) \Vert x_k-\bar{x}\Vert ^2\\&\quad -\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}, \end{aligned}$$

where the last equality uses \(2l(x_k)=\Vert \nabla l(x_k)\Vert ^2\) (recall that \(\nabla l(x)=(I-\text{ prox }_{\mu _k\lambda _k f})x\)).

Recall that

$$\begin{aligned} (\star ) \qquad A \text{ is } \text{ surjective } \text{ with } \text{ a } \text{ dense } \text{ domain } \Leftrightarrow \exists \gamma >0 \text{ such } \text{ that } \Vert A^*x\Vert \ge \gamma \Vert x\Vert , \end{aligned}$$

see for example Brezis [3, Theorem II.19]. This ensures that

$$\begin{aligned} \frac{2h(x_k)}{\Vert \nabla h(x_k)\Vert ^2}=\frac{\Vert (I- \text{ prox }_{\lambda _k g})(Ax_k)\Vert ^2}{\Vert A^*(I- \text{ prox }_{\lambda _k g})(Ax_k)\Vert ^2}\le \frac{1}{\gamma ^2}. \end{aligned}$$

Conditions on the parameters \(\lambda _k\) and \(\rho _k\) assure the existence of a positive constant \(M\) such that

$$\begin{aligned} \Vert x_{k+1}-\bar{x}\Vert ^2 \le (1+ M\lambda _k)\Vert x_k-\bar{x}\Vert ^2-\rho _k\left( \frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k\right) \frac{(h(x_k)+l(x_k))^2}{ \theta ^2(x_k)}.\nonumber \\ \end{aligned}$$
(3.5)

In view of inequality (3.5), Lemma 3.3 ensures the existence of \(\displaystyle {\lim _{k\rightarrow +\infty }}\Vert x_k-\bar{x}\Vert ^2\), since \(\sum _{k=0}^\infty \lambda _k<+\infty \).

Now, summing inequality (3.5) over \(k=0, 1,\ldots \) and taking into account the assumption \(\sigma :=\inf _{k}\rho _k(\frac{4h(x_k)}{h(x_k)+l(x_k)}-\rho _k)>0\), we obtain

$$\begin{aligned} \sigma \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}\le \Vert x_0-\bar{x}\Vert ^2+M\sum _{k=0}^\infty \lambda _k\Vert x_k-\bar{x}\Vert ^2, \end{aligned}$$
(3.6)

and by invoking the facts that \((\Vert x_k-\bar{x}\Vert ^2)\) is bounded and \(\sum _{k=0}^\infty \lambda _k<+\infty \), we infer

$$\begin{aligned} \sum _{k=0}^\infty \frac{(h(x_k)+l(x_k))^2}{\theta ^2(x_k)}<+\infty . \end{aligned}$$
(3.7)

Following the proof of Theorem 2.2, we conclude that the sequence \((x_k)\) weakly converges to a solution of (3.1). \(\square \)

Remark 3.5

In inverse problems, certain physical properties of the target solution are most suitably expressed in terms of the coefficients of its representation with respect to a family of finitely or infinitely many vectors \((e_k)_k\) of a Hilbert space which constitutes a frame (see for instance [7] and the references therein); namely, there exist two constants \(0<\underline{\beta }\le \overline{\beta }<+\infty \) such that

$$\begin{aligned} \forall x\in H \quad \underline{\beta }\Vert x\Vert ^2\le \sum _k\vert \langle x, e_k\rangle \vert ^2 \le \overline{\beta }\Vert x\Vert ^2. \end{aligned}$$

The associated frame operator is the injective bounded operator \(F\) defined on \(H\) by \(F(x):=(\langle x, e_k\rangle )_k\), whose adjoint is the surjective operator defined by \(F^*((\xi _k)_k):=\sum _k\xi _k e_k\).
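As a concrete finite-dimensional illustration (the classical Mercedes–Benz tight frame, not an example taken from [7] or [13]): three unit vectors of \(I\!\!R^2\) at mutual angles of \(120°\) form a tight frame with \(\underline{\beta }=\overline{\beta }=3/2\); numerically, the frame bounds are the extreme eigenvalues of \(F^*F\).

```python
import numpy as np

# Mercedes-Benz frame: three unit vectors at 120 degrees in R^2.
E = np.array([[0.0, 1.0],
              [-np.sqrt(3) / 2, -0.5],
              [np.sqrt(3) / 2, -0.5]])

# Analysis operator F: x -> (<x, e_k>)_k, here the 3x2 matrix E.
frame_bounds = np.linalg.eigvalsh(E.T @ E)   # eigenvalues of F*F
print(frame_bounds)                          # [1.5, 1.5]: a tight frame
```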

Let \(\bar{x} \in H\) be the target solution of the underlying inverse problem in which a priori information (e.g., sparsity, distribution, statistical properties) is available about the coefficients \((\bar{\xi }_k)\) of the decomposition of \(\bar{x}\) in \((e_k)_k\). To recover \(\bar{x}\), a natural formulation is a variational problem in the space \(l^2\) of frame coefficients (where a priori information on \((\bar{\xi }_k)\) can easily be incorporated) involving two functions \(f\) and \(g\circ F^*\), with \(f, g\) two proper lower semicontinuous convex functions (see for example [7] and the references therein). In this context, the assumption on the linear operator \(A:=F^*\) is satisfied. Furthermore, in the particular case of a Riesz basis, relation \((\star )\) is satisfied with \(\gamma =\sqrt{\underline{\beta }}\), see [13].