Abstract
We extend the Malitsky-Tam forward-reflected-backward (FRB) splitting method for inclusion problems of monotone operators to nonconvex minimization problems. By assuming the generalized concave Kurdyka-Łojasiewicz (KL) property of a quadratic regularization of the objective, we show that the FRB method converges globally to a stationary point of the objective and enjoys the finite length property. Convergence rates are also given. The sharpness of our approach is guaranteed by virtue of the exact modulus associated with the generalized concave KL property. Numerical experiments suggest that FRB is competitive compared to the Douglas-Rachford method and the Boţ-Csetnek inertial Tseng’s method.
1 Introduction
Consider the general inclusion problem of maximally monotone operators:
$$\begin{aligned} \text {find } x\in {\mathbb {R}}^n \text { such that } 0\in Ax+Bx, \end{aligned}$$(1)
where \(A:{\mathbb {R}}^n\rightarrow 2^{{\mathbb {R}}^n}\) and \(B:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) are (maximally) monotone operators with B being Lipschitz continuous. Successful splitting methods for the inclusion problem (1) are numerous; see, e.g., [1, Chapters 26, 28], including the forward-backward method with B being cocoercive [2] and Tseng’s method [3]. Recently, Malitsky and Tam [4] proposed a forward-reflected-backward (FRB) splitting method for solving (1). Let \(\lambda >0\) and \(J_{\lambda A}=({\text {Id}}+\lambda A)^{-1}\) be the resolvent of operator \(\lambda A\); see, e.g., [1]. Given initial points \(x_0,x_{-1}\in {\mathbb {R}}^n\), the Malitsky-Tam FRB method with a fixed step-size (see [4, Remark 2.1]) iterates
$$\begin{aligned} x_{k+1}=J_{\lambda A}\big (x_k-2\lambda Bx_k+\lambda Bx_{k-1}\big ),\quad k\in \mathbb {N}, \end{aligned}$$(2)
with only one forward step computation required per iteration compared to Tseng’s method. In [4] Malitsky and Tam showed that their FRB given by (2) converges to a solution of (1).
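The fixed-step iteration (2) is simple enough to sketch numerically. In the sketch below the operators are our own illustrative choices, not from the paper: A is the normal cone of the box \(C=[0,1]^n\), whose resolvent is the projection onto C, and \(B(x)=Mx+q\) with M positive definite, hence monotone and Lipschitz; the step-size is taken below 1/(2L) as in [4].

```python
import numpy as np

# A minimal numerical sketch of the fixed-step FRB iteration (2); the
# operators below are illustrative choices, not taken from the paper.
# A = normal cone of the box C = [0, 1]^n, so J_{lam A} = projection onto C;
# B(x) = M x + q is monotone and Lipschitz since M is positive definite.
rng = np.random.default_rng(0)
n = 5
P = rng.standard_normal((n, n))
M = P @ P.T + np.eye(n)              # positive definite => (strongly) monotone
q = rng.standard_normal(n)
B = lambda x: M @ x + q
J = lambda x: np.clip(x, 0.0, 1.0)   # resolvent of the normal cone of C

L = np.linalg.norm(M, 2)             # Lipschitz constant of B
lam = 0.4 / L                        # fixed step-size lam < 1/(2L), cf. [4]
x_prev = np.zeros(n)
x = np.ones(n)
for _ in range(2000):
    x_prev, x = x, J(x - 2.0 * lam * B(x) + lam * B(x_prev))

# At a solution of (1), x = J(x - B(x)); this natural residual should vanish.
residual = np.linalg.norm(x - J(x - B(x)))
```

Note that each pass of the loop evaluates B at the current point only once, reusing \(B(x_{k-1})\) from the previous pass; this is the "one forward step per iteration" advantage over Tseng's method.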
In this paper, we extend the Malitsky-Tam FRB splitting method to solve the following structured possibly nonconvex minimization problem:
$$\begin{aligned} \min _{x\in {\mathbb {R}}^n}F(x)=f(x)+g(x), \end{aligned}$$(3)
where \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}=(-\infty ,\infty ]\) is proper lower semicontinuous (lsc), and \(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) has a Lipschitz continuous gradient with constant \(L>0\). This class of optimization problems is of significantly broad interest in both theory and practical applications; see, e.g., [5, 6].
Because neither \(\partial f\) nor \(\nabla g\) is necessarily monotone, the techniques developed by Malitsky and Tam do not apply. To accomplish our goal, we employ the generalized concave Kurdyka-Łojasiewicz property (see Definition 2.2) recently proposed in [7] as a key assumption in our analysis, under which the FRB method demonstrates pleasant behavior. We show that the sequence produced by FRB converges globally to a stationary point of (3) and has the finite length property. The sharpest upper bound on the total length of iterates is described by using the exact modulus associated with the generalized concave KL property. Both sequential and function value convergence rate analyses are carried out. Our approach deviates from the convergence mechanism developed in the original FRB paper [4], but follows a pattern similar to that of many works devoted to obtaining nonconvex extensions of splitting algorithms; see, e.g., [6, 8,9,10,11,12,13]. Numerical simulations indicate that FRB is competitive compared to the Douglas-Rachford method with a fixed step-size [6] and the Boţ-Csetnek inertial Tseng’s method [12].
The paper is structured as follows. We introduce notation and basic concepts in Sect. 2. The convergence analysis of FRB is presented in Sect. 3. When the merit function of FRB has KL exponent \(\theta \in [0,1)\), we obtain convergence rates for both the iterates and function values. Numerical experiments are implemented in Sect. 4. We end this paper by providing some directions for future research in Sect. 5.
2 Notation and preliminaries
Throughout this paper, \({\mathbb {R}}^n\) is the standard Euclidean space equipped with inner product \(\langle x,y\rangle =x^Ty\) and the Euclidean norm \(\left\Vert x\right\Vert =\sqrt{\langle x,x\rangle }\) for \(x, y\in {\mathbb {R}}^n\). Let \(\mathbb {N}=\{0,1,2,3,\ldots \}\) and \(\mathbb {N}^*=\{-1\}\cup \mathbb {N}\). The open ball centered at \({\bar{x}}\) with radius r is denoted by \(\mathbb {B}({\bar{x}};r)\). Associated with a subset \(K\subseteq {\mathbb {R}}^n\), its distance function is \({\text {dist}}(\cdot ,K):{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}=(-\infty ,\infty ]\),
$$\begin{aligned} {\text {dist}}(x,K)=\inf \big \{\left\Vert x-y\right\Vert :y\in K\big \}, \end{aligned}$$
where \({\text {dist}}(x,K)\equiv \infty\) if \(K=\emptyset\). The indicator function of K is defined as \(\delta _K(x)=0\) if \(x\in K\) and \(\delta _K(x)=\infty\) otherwise. For a function \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) and \(r_1,r_2\in [-\infty ,\infty ]\), we set \([r_1<f<r_2]=\{x\in {\mathbb {R}}^n:r_1<f(x)<r_2\}\). We say that f is coercive if \(\displaystyle \lim _{\left\Vert x\right\Vert \rightarrow \infty }f(x)=\infty\).
We will use frequently the following subgradients in the nonconvex setting; see, e.g., [14, 15].
Definition 2.1
Let \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) be a proper function. We say that
-
(i)
\(v\in {\mathbb {R}}^n\) is a Fréchet subgradient of f at \({\bar{x}}\in {\text {dom}}f\), denoted by \(v\in \hat{\partial }f({\bar{x}})\), if
$$\begin{aligned} f(x)\ge f({\bar{x}})+\langle v,x-{\bar{x}}\rangle +o(\left\Vert x-{\bar{x}}\right\Vert ). \end{aligned}$$(4) -
(ii)
\(v\in {\mathbb {R}}^n\) is a limiting subgradient of f at \({\bar{x}}\in {\text {dom}}f\), denoted by \(v\in \partial f({\bar{x}})\), if
$$\begin{aligned} v\in \{v\in {\mathbb {R}}^n:\exists x_k\xrightarrow []{f}{\bar{x}},\exists v_k\in \hat{\partial }f(x_k),v_k\rightarrow v\}, \end{aligned}$$(5)where \(x_k\xrightarrow []{f}{\bar{x}}\Leftrightarrow x_k\rightarrow {\bar{x}}\text { and }f(x_k)\rightarrow f({\bar{x}})\). Moreover, we set \({\text {dom}}\partial f=\{x\in {\mathbb {R}}^n:\partial f(x)\ne \emptyset \}\).
-
(iii)
We say that \({\bar{x}}\in {\text {dom}}\partial f\) is a stationary point, if \(0\in \partial f({\bar{x}})\).
Recall that the proximal mapping of a proper and lsc \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) with parameter \(\lambda >0\) is
$$\begin{aligned} {\text {Prox}}_{\lambda f}(x)=\mathop {\mathrm {argmin}}\limits _{y\in {\mathbb {R}}^n}\Big \{f(y)+\frac{1}{2\lambda }\left\Vert y-x\right\Vert ^2\Big \}, \end{aligned}$$
which is the resolvent \(J_{\lambda A}\) with \(A=\partial f\) when f is convex; see, e.g., [1].
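Without convexity, \({\text {Prox}}_{\lambda f}\) is in general set-valued. A standard illustrative example (our own, not from the paper) is \(f=\left\Vert \cdot \right\Vert _0\), the number of nonzero entries, whose proximal mapping is componentwise hard thresholding with threshold \(\sqrt{2\lambda }\) and is set-valued exactly at \(|x_i|=\sqrt{2\lambda }\):

```python
import numpy as np

# Illustrative example (not from the paper): for the nonconvex function
# f = ||.||_0, Prox_{lam f} acts componentwise as hard thresholding; it is
# set-valued at |x_i| = sqrt(2*lam), where both 0 and x_i attain the minimum.
# This implementation returns the 0 branch in that ambiguous case.
def prox_l0(x, lam):
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > np.sqrt(2.0 * lam), x, 0.0)

# Brute-force check on one coordinate: minimize t -> lam*[t != 0] + (t - xi)^2/2,
# the scalar prox objective after multiplying through by lam.
lam, xi = 0.5, 1.3
grid = np.linspace(-3.0, 3.0, 100001)
vals = lam * (grid != 0) + (grid - xi) ** 2 / 2.0
best = grid[np.argmin(vals)]
```

The grid minimizer agrees with `prox_l0` up to the grid resolution; here \(\sqrt{2\lambda }=1\), so the entry 1.3 is kept while any entry of magnitude below 1 is set to zero.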
For \(\eta \in (0,\infty ]\), denote by \(\Phi _\eta\) the class of functions \(\varphi :[0,\eta )\rightarrow {\mathbb {R}}_+\) satisfying the following conditions: (i) \(\varphi (t)\) is right-continuous at \(t=0\) with \(\varphi (0)=0\); (ii) \(\varphi\) is strictly increasing on \([0,\eta )\). The following concept will be the key in our convergence analysis.
Definition 2.2
[7] Let \(f:{\mathbb {R}}^n\rightarrow \overline{\mathbb {R}}\) be proper and lsc. Let \({\bar{x}}\in {\text {dom}}\partial f\) and \(\mu \in {\mathbb {R}}\), and let \(V\subseteq {\text {dom}}\partial f\) be a nonempty subset.
(i) We say that f has the pointwise generalized concave Kurdyka-Łojasiewicz (KL) property at \({\bar{x}}\in {\text {dom}}\partial f\), if there exist a neighborhood \(U\ni {\bar{x}}\), \(\eta \in (0,\infty ]\) and concave \(\varphi \in \Phi _\eta\), such that for all \(x\in U\cap [0<f-f({\bar{x}})<\eta ]\),
$$\begin{aligned} \varphi ^\prime _-\big (f(x)-f({\bar{x}})\big )\cdot {\text {dist}}\big (0,\partial f(x)\big )\ge 1, \end{aligned}$$(6)
where \(\varphi _-^\prime\) denotes the left derivative. Moreover, f is a generalized concave KL function if it has the generalized concave KL property at every \(x\in {\text {dom}}\partial f\).
(ii) Suppose that \(f(x)=\mu\) on V. We say f has the setwise generalized concave Kurdyka-Łojasiewicz property on V, if there exist \(U\supset V\), \(\eta \in (0,\infty ]\) and concave \(\varphi \in \Phi _\eta\) such that for every \(x\in U\cap [0<f-\mu <\eta ]\),
$$\begin{aligned} \varphi ^\prime _-\big (f(x)-\mu \big )\cdot {\text {dist}}\big (0,\partial f(x)\big )\ge 1. \end{aligned}$$(7)
Remark 2.3
The generalized concave KL property reduces to the celebrated concave KL property when concave desingularizing functions \(\varphi \in \Phi _\eta\) are continuously differentiable, which has been employed by a series of papers to establish convergence of proximal-type algorithms in the nonconvex setting; see, e.g., [6, 8,9,10,11,12,13, 16] and the references therein. The generalized concave KL property allows us to describe the optimal (minimal) concave desingularizing function and obtain sharp convergence results; see [7].
Many classes of functions satisfy the generalized concave KL property, for instance semialgebraic functions; see, e.g., [17, Corollary 18] and [8, Section 4.3]. For more characterizations of the concave KL property, we refer readers to [19]; see also the fundamental work of Łojasiewicz [20] and Kurdyka [21].
Definition 2.4
(i) A set \(E\subseteq \mathbb {R}^n\) is called semialgebraic if there exist finitely many polynomials \(g_{ij}, h_{ij}:\mathbb {R}^n\rightarrow \mathbb {R}\) such that \(E=\bigcup _{j=1}^p\bigcap _{i=1}^q\{x\in \mathbb { R}^n:g_{ij}(x)=0\text { and }h_{ij}(x)<0\}.\)
(ii) A function \(f:\mathbb {R}^n\rightarrow \overline{{\mathbb {R}}}\) is called semialgebraic if its graph \({\text {gph}}f=\{(x,y)\in \mathbb {R}^{n+1}:f(x)=y\}\) is semialgebraic.
Fact 2.5
[17, Corollary 18] Let \(f:\mathbb {R}^n\rightarrow \overline{{\mathbb {R}}}\) be a proper and lsc function and let \({\bar{x}}\in {\text {dom}}\partial f\). If f is semialgebraic, then it has the concave KL property at \({\bar{x}}\) with \(\varphi (t)=c\cdot t^{1-\theta }\) for some \(c>0\) and \(\theta \in (0,1)\).
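As a quick sanity check of Fact 2.5, consider the worked example (our own, not from the paper) \(f(x)=x^2\), which is semialgebraic. At \({\bar{x}}=0\) we have \(f(x)-f(0)=x^2\) and \({\text {dist}}(0,\partial f(x))=2|x|\), and the desingularizing function \(\varphi (t)=t^{1/2}\) (that is, \(c=1\) and \(\theta =\frac{1}{2}\)) verifies the KL inequality:

```latex
\varphi'\big(f(x)-f(0)\big)\cdot\operatorname{dist}\big(0,\partial f(x)\big)
  = \frac{1}{2\sqrt{x^{2}}}\cdot 2|x| \;=\; 1 \;\ge\; 1
  \qquad (x\neq 0).
```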
3 Forward-reflected-backward splitting method for nonconvex problems
In this section, we prove convergence results of the FRB splitting method in the nonconvex setting. For convex minimization problems, the iteration scheme (2) specializes to
$$\begin{aligned} x_{k+1}={\text {Prox}}_{\lambda f}\big (y_k-\lambda \nabla g(x_k)\big ), \end{aligned}$$
where \(y_k=x_k+\lambda \big (\nabla g(x_{k-1})-\nabla g(x_k)\big )\) is an intermediate variable introduced for simplicity. Since proximal mapping \({\text {Prox}}_{\lambda f}\) might not be single-valued without convexity of f, we formulate the FRB scheme for solving (3) as follows:
Forward-reflected-backward (FRB) splitting method
-
1.
Initialization: Pick \(x_{-1}, x_0\in {\mathbb {R}}^n\) and real number \(\lambda >0\).
-
2.
For \(k\in \mathbb {N}\), compute
$$\begin{aligned}&y_k=x_k+\lambda \big (\nabla g(x_{k-1})-\nabla g(x_k) \big ), \end{aligned}$$(8)$$\begin{aligned}&x_{k+1}\in {\text {Prox}}_{\lambda f}\big (y_k-\lambda \nabla g(x_k) \big ). \end{aligned}$$(9)
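A minimal sketch of one run of the scheme (8)-(9) follows. The data A, b and the weight mu are illustrative choices of our own, not from the paper: \(f=\mu \left\Vert \cdot \right\Vert _0\) is prox-bounded (with threshold \(\lambda _f=\infty\)) and we select one element of the possibly set-valued \({\text {Prox}}_{\lambda f}\) by hard thresholding, while \(g(x)=\frac{1}{2}\left\Vert Ax-b\right\Vert ^2\) has L-Lipschitz gradient.

```python
import numpy as np

# Sketch of the FRB scheme (8)-(9) on problem (3) with illustrative data:
# f = mu*||.||_0 (one element of Prox_{lam f} chosen by hard thresholding)
# and g(x) = 0.5*||A x - b||^2.
rng = np.random.default_rng(1)
m, n, mu = 20, 10, 0.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

grad_g = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of grad g
lam = 0.9 / (3.0 * L)                   # step-size 0 < lam < 1/(3L)
prox_f = lambda v: np.where(np.abs(v) > np.sqrt(2.0 * lam * mu), v, 0.0)

x_prev = np.zeros(n)
x = np.zeros(n)
for _ in range(3000):
    y = x + lam * (grad_g(x_prev) - grad_g(x))       # step (8)
    x_prev, x = x, prox_f(y - lam * grad_g(x))       # step (9)

step = np.linalg.norm(x - x_prev)                    # gap between iterates
res = np.linalg.norm(x - prox_f(x - lam * grad_g(x)))  # fixed-point residual
```

On this toy instance both the gap between consecutive iterates and the fixed-point residual vanish, consistent with the convergence theory developed below.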
Recall that a proper, lsc function \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) is prox-bounded if there exists \(\lambda >0\) such that \(f+\frac{1}{2\lambda }\left\Vert \cdot \right\Vert ^2\) is bounded below; see, e.g., [15, Exercise 1.24]. The supremum of the set of all such \(\lambda\) is the threshold \(\lambda _f\) of prox-boundedness for f. In the remainder of this paper, we shall assume that
-
\(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) is proper, lsc, and prox-bounded with threshold \(\lambda _f>0\).
-
\(g:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) has a Lipschitz continuous gradient with constant \(L>0\).
For \(0<\lambda <\lambda _f\), [15, Theorem 1.25] shows that \((\forall x\in {\mathbb {R}}^n)\) \({\text {Prox}}_{\lambda f}(x)\) is nonempty and compact, so the above scheme is well-defined. Moreover, it is easy to see that
$$\begin{aligned} (\forall k\in \mathbb {N})\quad x_{k+1}\in \mathop {\mathrm {argmin}}\limits _{y\in {\mathbb {R}}^n}\Big \{f(y)+\langle \nabla g(x_k),y\rangle +\frac{1}{2\lambda }\left\Vert y-y_k\right\Vert ^2\Big \}. \end{aligned}$$(10)
3.1 Basic properties
After formulating the iterative scheme, we now investigate its basic properties. We begin with a merit function for the FRB splitting method, which will play a central role in our analysis. Such functions of various forms appear frequently in the convergence analysis of splitting methods in nonconvex settings; see, e.g., [6, 12, 13, 22].
Definition 3.1
(FRB merit function) Let \(\lambda >0\). Define the FRB merit function \(H:{\mathbb {R}}^n\times {\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) by
$$\begin{aligned} H(x,y)=f(x)+g(x)+\Big (\frac{1}{4\lambda }-\frac{L}{4}\Big )\left\Vert x-y\right\Vert ^2. \end{aligned}$$(11)
We use the Euclidean norm for \({\mathbb {R}}^n\times {\mathbb {R}}^n\), i.e., \(\left\Vert (x,y)\right\Vert =\sqrt{\left\Vert x\right\Vert ^2+\left\Vert y\right\Vert ^2}\).
The FRB merit function has the decreasing property under a suitable step-size assumption.
Lemma 3.2
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\) and \(\inf (f+g)>-\infty\). Let \(M_1=\frac{1}{4\lambda }-\frac{3L}{4}\). Then the following statements hold:
(i) For \(k\in \mathbb {N}\), we have
$$\begin{aligned} H(z_k)+M_1\left\Vert z_k-z_{k-1}\right\Vert ^2\le H(z_{k-1}), \end{aligned}$$(12)
which means that \(H(z_k)\le H(z_{k-1})\). Hence, the sequence \(\big (H(z_k)\big )_{k\in \mathbb {N}^*}\) is convergent.
(ii) \(\displaystyle \sum _{k=0}^\infty \left\Vert z_k-z_{k-1}\right\Vert ^2<\infty\), consequently \(\displaystyle \lim _{k\rightarrow \infty }\left\Vert z_k-z_{k-1}\right\Vert =0\).
Proof
(i) The sequence \((x_k)_{k\in \mathbb {N}^*}\) is well-defined because \(\lambda <\lambda _f\). The FRB iterative scheme and (10) imply that
$$\begin{aligned} f(x_{k+1})+\langle \nabla g(x_k),x_{k+1}\rangle +\frac{1}{2\lambda }\left\Vert x_{k+1}-y_k\right\Vert ^2\le f(x_k)+\langle \nabla g(x_k),x_k\rangle +\frac{1}{2\lambda }\left\Vert x_k-y_k\right\Vert ^2. \end{aligned}$$
Invoking the descent lemma [1, Theorem 18.15] to g, we have
where the first equality follows from the identity \(\left\Vert x_k-y_k\right\Vert ^2=\left\Vert x_{k+1}-y_k\right\Vert ^2-\left\Vert x_{k+1}-x_k\right\Vert ^2-2\langle x_{k+1}-x_k,x_k-y_k\rangle\), and the second inequality is implied by (8) and the Cauchy-Schwarz inequality. Moreover, the third inequality holds by using the inequality \(a^2+b^2\ge 2ab\) and the Lipschitz continuity of \(\nabla g\). Recall the product norm identity \(\left\Vert z_k-z_{k-1}\right\Vert ^2=\left\Vert x_{k+1}-x_k\right\Vert ^2+\left\Vert x_k-x_{k-1}\right\Vert ^2\). It then follows from the inequality above that
Adding \(\left( \frac{1}{4\lambda }-\frac{L}{4}\right) \left\Vert x_{k+1}-x_k\right\Vert ^2\) to both sides of the above inequality and applying again the product norm identity, one gets
which proves (12). Because \(M_1>0\), we have \(H(z_k)\le H(z_{k-1})\). The convergence result then follows from the assumption that \(\inf (f+g)>-\infty\).
(ii) For integer \(l>0\), summing (12) from \(k=0\) to \(k=l-1\) yields
Passing to the limit, one concludes that \(\displaystyle \sum _{k=0}^\infty \left\Vert z_k-z_{k-1}\right\Vert ^2<\infty\). \(\square\)
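The decrease property of Lemma 3.2(i) is easy to observe numerically. The sketch below (toy data of our own choosing) tracks \(H(z_k)\) along the FRB iterates, using the quadratic coupling term \(\big (\frac{1}{4\lambda }-\frac{L}{4}\big )\left\Vert x-y\right\Vert ^2\) that also appears in the proof of Corollary 3.11, and checks that the merit values never increase:

```python
import numpy as np

# Numerical check of Lemma 3.2(i) on a toy instance (illustrative data):
# f = mu*||.||_0, g = 0.5*||A x - b||^2, and the merit values
# H(z_k) = f(x_{k+1}) + g(x_{k+1}) + (1/(4*lam) - L/4)*||x_{k+1} - x_k||^2
# should be nonincreasing along the FRB iterates.
rng = np.random.default_rng(2)
m, n, mu = 15, 8, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

f = lambda x: mu * np.count_nonzero(x)
g = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_g = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)
lam = 0.9 / (3.0 * L)                   # 0 < lam < 1/(3L)
prox_f = lambda v: np.where(np.abs(v) > np.sqrt(2.0 * lam * mu), v, 0.0)

x_prev, x = np.zeros(n), np.zeros(n)
H_vals = []
for _ in range(500):
    y = x + lam * (grad_g(x_prev) - grad_g(x))
    x_prev, x = x, prox_f(y - lam * grad_g(x))
    # merit value at z_k = (x_{k+1}, x_k)
    H_vals.append(f(x) + g(x) + (0.25 / lam - L / 4.0) * np.sum((x - x_prev) ** 2))

monotone = bool(np.all(np.diff(H_vals) <= 1e-10))   # H(z_k) <= H(z_{k-1})
```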
Remark 3.3
(Comparison to a known result) Boţ and Csetnek also utilized a merit function of a similar form to establish the decreasing property for Tseng’s method [12, Lemma 3.2]. The similarity between their approach and ours is not unexpected. Indeed, it is pointed out in [23, Section 2.3] that FRB is a modified Tseng’s method. The major difference between [12, Lemma 3.2] and Lemma 3.2 is that inequality (12) asserts that H decreases along a single sequence \((x_k)\) generated by FRB, while [12, Lemma 3.2] concerns the decreasing property of H with respect to two different sequences produced by Tseng’s method.
Next we estimate a lower bound on the gap between consecutive iterates in terms of subgradients of H. To this end, the following lemma will be helpful.
Lemma 3.4
For every \((x,y)\in {\mathbb {R}}^n\times {\mathbb {R}}^n\), we have
$$\begin{aligned} \partial H(x,y)=\Big (\partial f(x)+\nabla g(x)+\Big (\frac{1}{2\lambda }-\frac{L}{2}\Big )(x-y)\Big )\times \Big \{\Big (\frac{1}{2\lambda }-\frac{L}{2}\Big )(y-x)\Big \}. \end{aligned}$$
Proof
Apply the subdifferential sum and separable sum rules [15, Exercise 8.8, Proposition 10.5].\(\square\)
Lemma 3.5
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Let \(M_2=\sqrt{2}\left( \frac{1}{\lambda }+L+\left| \frac{1}{\lambda }-L\right| \right)\) and define for \(k\in \mathbb {N}\)
$$\begin{aligned} A_{k+1}&=\frac{1}{\lambda }(x_k-x_{k+1})+\nabla g(x_{k+1})+\nabla g(x_{k-1})-2\nabla g(x_k)+\Big (\frac{1}{2\lambda }-\frac{L}{2}\Big )(x_{k+1}-x_k),\\ B_{k+1}&=\Big (\frac{1}{2\lambda }-\frac{L}{2}\Big )(x_k-x_{k+1}). \end{aligned}$$
Then \(\left( A_{k+1},B_{k+1}\right) \in \partial H(z_k)\) and \(\left\Vert \left( A_{k+1},B_{k+1}\right) \right\Vert \le M_2\left\Vert z_k-z_{k-1}\right\Vert\).
Proof
The FRB scheme and (10) imply that \(0\in \partial f(x_{k+1})+\nabla g(x_k)+\frac{1}{\lambda }(x_{k+1}-y_k)\). Equivalently, \(\frac{1}{\lambda }(x_k-x_{k+1})+\nabla g(x_{k-1})-2\nabla g(x_k)\in \partial f(x_{k+1})\), which together with the subdifferential sum rule [15, Exercise 8.8] implies that
Applying Lemma 3.4 with \(x=x_{k+1}\) and \(y=x_k\) yields
It then follows that \(\left( A_{k+1},B_{k+1}\right) \in \partial H(z_k)\) and
where the third inequality follows from the Lipschitz continuity of \(\nabla g\) with modulus L. \(\square\)
We now connect the properties of the FRB merit function with the actual objective of (3). Denote by \(\omega (z_{-1})\) the set of limit points of \((z_k)_{k\in \mathbb {N}^*}\).
Theorem 3.6
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that conditions in Lemma 3.2 are satisfied and \((z_k)_{k\in \mathbb {N}^*}\) is bounded. Suppose that a subsequence \((z_{k_l})_{l\in \mathbb {N}}\) of \((z_k)_{k\in \mathbb {N}^*}\) converges to some \(z^*=(x^*,y^*)\) as \(l\rightarrow \infty\). Then the following statements hold:
(i) \(\displaystyle \lim _{l\rightarrow \infty } H(z_{k_l})=f(x^*)+g(x^*)=F(x^*)\). In fact, \(\displaystyle \lim _{k\rightarrow \infty } H(z_{k})=F(x^*)\).
(ii) We have \(x^*=y^*\) and \(0\in \partial H(x^*,y^*)\), which implies \(0\in \partial F(x^*)=\partial f(x^*)+\nabla g(x^*)\).
(iii) The set \(\omega (z_{-1})\) is nonempty, compact and connected, on which the FRB merit function H is finite and constant. Moreover, we have \(\displaystyle \lim _{k\rightarrow \infty }{\text {dist}}(z_k,\omega (z_{-1}))=0\).
Proof
(i) By the identity (10), for every \(k\in \mathbb {N}^*\)
Combining the identity \(\left\Vert x^*-y_k\right\Vert ^2=\left\Vert x^*-x_{k+1}\right\Vert ^2+\left\Vert x_{k+1}-y_k\right\Vert ^2+2\langle x^*-x_{k+1},x_{k+1}-y_k\rangle\) with the above inequality and using the definition \(y_k=x_k+\lambda \big (\nabla g(x_{k-1})-\nabla g(x_k) \big )\), one gets
Replacing k by \(k_l\), passing to the limit and applying Lemma 3.2(ii), we have
where the last inequality follows from the lower semicontinuity of f; together with the preceding inequality this yields \(\displaystyle \lim _{l\rightarrow \infty }f(x_{k_l+1})=f(x^*)\). Hence we conclude that
where the last equality holds because of Lemma 3.2(ii). Moreover, Lemma 3.2(i) states that the function value sequence \(\big (H(z_k)\big )_{k\in \mathbb {N}^*}\) converges, from which the rest of the statement readily follows.
(ii) By assumption, we have \(x_{k_l+1}\rightarrow x^*\) and \(x_{k_l}\rightarrow y^*\) as \(l\rightarrow \infty\). Therefore \(0\le \left\Vert y^*-x^*\right\Vert \le \left\Vert y^*-x_{k_l}\right\Vert +\left\Vert x_{k_l}-x_{k_l+1}\right\Vert +\left\Vert x_{k_l+1}-x^*\right\Vert \rightarrow 0\) as \(l\rightarrow \infty\), which means that \(x^*=y^*\). Combining Lemma 3.2, Lemma 3.5, statement (i) and the outer semicontinuity of \(z\mapsto \partial H(z)\), one gets
as desired.
(iii) Given that \((z_k)_{k\in \mathbb {N}^*}\) is bounded, it is easy to see that \(\omega (z_{-1})=\bigcap _{l\in \mathbb {N}}\overline{\bigcup _{k\ge l}\{z_k\}}\) is a nonempty compact set. By Lemma 3.2 and a result of Ostrowski [1, Theorem 1.49], the set \(\omega (z_{-1})\) is connected. Pick \({\tilde{z}}\in \omega (z_{-1})\) and suppose that \(z_{k_q}\rightarrow {\tilde{z}}\) as \(q\rightarrow \infty\). Then statement (i) implies that \(H({\tilde{z}})=\displaystyle \lim _{q\rightarrow \infty }H(z_{k_q})=H(z^*)\). Finally, by the definition of \(\omega (z_{-1})\), we have \(\displaystyle \lim _{k\rightarrow \infty }{\text {dist}}(z_k,\omega (z_{-1}))=0\).\(\square\)
Theorem 3.6 requires the sequence \((z_k)_{k\in \mathbb {N}^*}\) to be bounded. The result below provides a sufficient condition for this assumption to hold.
Theorem 3.7
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that conditions in Lemma 3.2 are satisfied. If \(f+g\) is coercive (or level bounded), then the sequence \((x_k)_{k\in \mathbb {N}^*}\) is bounded, and so is \((z_k)_{k\in \mathbb {N}^*}\).
Proof
Lemma 3.2(i) implies that for every \(k\in \mathbb {N}\),
$$\begin{aligned} (f+g)(x_{k+1})\le H(z_k)\le H(z_{-1})<\infty . \end{aligned}$$
Suppose to the contrary that \((x_k)_{k\in \mathbb {N}^*}\) were unbounded. Then the level set \([f+g\le H(z_{-1})]\), which contains the sequence, would be unbounded, contradicting the coercivity or level boundedness of \(f+g\). \(\square\)
3.2 Convergence under the generalized concave KL property
Following basic properties of the FRB method, we now present the main convergence result under the generalized concave KL property. For \(\varepsilon >0\) and nonempty set \(\Omega \subseteq {\mathbb {R}}^n\), we define \(\Omega _\varepsilon =\{x\in {\mathbb {R}}^n:{\text {dist}}(x,\Omega )<\varepsilon \}\). The following lemma will be useful soon.
Lemma 3.8
[7, Lemma 4.4] Let \(f:{\mathbb {R}}^n\rightarrow \overline{{\mathbb {R}}}\) be proper lsc and let \(\mu \in {\mathbb {R}}\). Let \(\Omega \subseteq {\text {dom}}\partial f\) be a nonempty compact set on which \(f(x)=\mu\) for all \(x\in \Omega\). The following statements hold:
-
(i)
Suppose that f satisfies the pointwise generalized concave KL property at each \(x\in \Omega\). Then there exist \(\varepsilon >0,\eta \in (0,\infty ]\) and \(\varphi \in \Phi _\eta\) such that f has the setwise generalized concave KL property on \(\Omega\) with respect to \(U=\Omega _\varepsilon\), \(\eta\) and \(\varphi\).
-
(ii)
Set \(U=\Omega _\varepsilon\) and define \(h:(0,\eta )\rightarrow {\mathbb {R}}_+\) by
$$\begin{aligned} h(s)=\sup \big \{{\text {dist}}^{-1}\big (0,\partial f(x)\big ):x\in U\cap [0<f-\mu <\eta ],s\le f(x)-\mu \big \}. \end{aligned}$$Then the function \(\tilde{\varphi }:[0,\eta )\rightarrow {\mathbb {R}}_+\),
$$\begin{aligned} t\mapsto \int _0^th(s)ds,~\forall t\in (0,\eta ), \end{aligned}$$and \(\tilde{\varphi }(0)=0\), is well-defined and belongs to \(\Phi _\eta\). The function f has the setwise generalized concave KL property on \(\Omega\) with respect to U, \(\eta\) and \(\tilde{\varphi }\). Moreover,
$$\begin{aligned} \tilde{\varphi }=\inf \big \{\varphi \in \Phi _\eta : \varphi \,\text { is a concave desingularizing function of}\, f \,\text { on} \, \Omega \, \text {with respect to}\, U \, \text {and} \, \eta \big \}. \end{aligned}$$We say \(\tilde{\varphi }\) is the exact modulus of the setwise generalized concave KL property of f on \(\Omega\) with respect to U and \(\eta\).
We are now ready for the main results. Our methodology is akin to the concave KL convergence mechanism employed by a vast amount of literature; see, e.g., [6, 8,9,10,11,12,13]. However, we make use of the generalized concave KL property and the associated exact modulus, which guarantees the sharpness of our results; see Remark 3.10.
Theorem 3.9
(Global convergence of FRB) Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that \((z_k)_{k\in \mathbb {N}^*}\) is bounded, \(\inf (f+g)>-\infty\), and \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\). Suppose that the FRB merit function H has the generalized concave KL property on \(\omega (z_{-1})\). Then the following statements hold:
(i) The sequence \((z_k)_{k\in \mathbb {N}^*}\) is Cauchy and has finite length. To be specific, there exist \(M>0\), \(k_0\in \mathbb {N}\), \(\varepsilon >0\) and \(\eta \in (0,\infty ]\) such that for \(i\ge k_0+1\)
$$\begin{aligned} \sum _{k=i}^{\infty }\left\Vert z_{k+1}-z_k\right\Vert \le \left\Vert z_i-z_{i-1}\right\Vert +M\tilde{\varphi }\big (H(z_i)-H(z^*)\big ), \end{aligned}$$(15)
where \(\tilde{\varphi }\in \Phi _\eta\) is the exact modulus associated with the setwise generalized concave KL property of H on \(\omega (z_{-1})\) with respect to \(\varepsilon\) and \(\eta\).
(ii) The sequence \((x_k)_{k\in \mathbb {N}^*}\) has finite length and converges to some \(x^*\) with \(0\in \partial F(x^*)\).
Proof
(i) By the boundedness assumption, assume without loss of generality that \(z_k\rightarrow z^*=(x^*,y^*)\) for some \(z^*\in {\mathbb {R}}^n\times {\mathbb {R}}^n\). Then Theorem 3.6 implies that \(x^*=y^*\) and \(H(z_k)\rightarrow F(x^*)=H(z^*)\) as \(k\rightarrow \infty\). Recall from Lemma 3.2 that we have \(H(z_{k+1})\le H(z_k)\) for \(k\in \mathbb {N}^*\), therefore one needs to consider two cases.
Case 1: Suppose that there exists \(k_0\) such that \(H(z^*)=H(z_{k_0})\). Then Lemma 3.2(i) implies that \(z_{k_0+1}=z_{k_0}\) and \(x_{k_0+1}=x_{k_0}\). The desired result then follows by a simple induction.
Case 2: Assume that \(H(z^*)<H(z_k)\) for every k. By assumption and Lemma 3.8, there exist \(\varepsilon >0\) and \(\eta \in (0,\infty ]\) such that the FRB merit function H has the setwise generalized concave KL property on \(\omega (z_{-1})\) with respect to \(\varepsilon >0\) and \(\eta >0\) and the associated exact modulus \(\tilde{\varphi }\). On the one hand, by the fact that \(H(z_k)\rightarrow H(z^*)\), there exists \(k_1\) such that \(z_k\in [0<H-H(z^*)<\eta ]\) for \(k>k_1\). On the other hand, Theorem 3.6(iii) implies that there exists \(k_2\) such that \({\text {dist}}(z_k,\omega (z_{-1}))<\varepsilon\) for all \(k>k_2\). Put \(k_0=\max (k_1,k_2)\). Then for \(k>k_0\), we have \(z_k\in \omega (z_{-1})_\varepsilon \cap [0<H-H(z^*)<\eta ]\). Hence for \(k>k_0\)
$$\begin{aligned} \tilde{\varphi }^\prime _-\big (H(z_k)-H(z^*)\big )\cdot {\text {dist}}\big (0,\partial H(z_k)\big )\ge 1. \end{aligned}$$(16)
Invoking Lemma 3.5 yields
$$\begin{aligned} \tilde{\varphi }^\prime _-\big (H(z_k)-H(z^*)\big )\ge \frac{1}{M_2\left\Vert z_k-z_{k-1}\right\Vert }. \end{aligned}$$
For simplicity, define for \(k>l\)
$$\begin{aligned} \Delta _{l,k}=\tilde{\varphi }\big (H(z_l)-H(z^*)\big )-\tilde{\varphi }\big (H(z_k)-H(z^*)\big ). \end{aligned}$$
By the concavity of \(\tilde{\varphi }\) and (16), one has
$$\begin{aligned} \Delta _{k,k+1}\ge \tilde{\varphi }^\prime _-\big (H(z_k)-H(z^*)\big )\big (H(z_k)-H(z_{k+1})\big )\ge \frac{H(z_k)-H(z_{k+1})}{M_2\left\Vert z_k-z_{k-1}\right\Vert }. \end{aligned}$$(17)
Applying Lemma 3.2 to (17) implies that
$$\begin{aligned} \Delta _{k,k+1}\ge \frac{M_1\left\Vert z_{k+1}-z_k\right\Vert ^2}{M_2\left\Vert z_k-z_{k-1}\right\Vert }. \end{aligned}$$
Put \(M=\frac{M_2}{M_1}\). Note that \(a^2+b^2\ge 2ab\) for \(a,b\ge 0\). Then the above inequality gives
$$\begin{aligned} 2\left\Vert z_{k+1}-z_k\right\Vert \le \left\Vert z_k-z_{k-1}\right\Vert +M\Delta _{k,k+1}. \end{aligned}$$(18)
Pick \(i\ge k_0+1\). Summing (18) from i to an arbitrary \(j>i\) yields
where the second inequality is implied by the definition of \(\Delta _{k,k+1}\), implying
from which (15) readily follows. Moreover, one gets from the above inequality that for \(i\ge k_0+1\) and \(j>i\),
Recall from Theorem 3.6(i)&(ii) that \(H(z_i)\rightarrow H(z^*)\) and from Lemma 3.2(ii) that \(\left\Vert z_i-z_{i-1}\right\Vert \rightarrow 0\) as \(i\rightarrow \infty\). Passing to the limit, one concludes that \((z_k)_{k\in \mathbb {N}^*}\) is Cauchy.
(ii) The statement follows from the definition of \((z_k)_{k\in \mathbb {N}^*}\) and Theorem 3.6(ii).\(\square\)
Remark 3.10
(i) Note that the distance between consecutive iterates was shown to be only square-summable in the original FRB paper [4, Theorem 2.5]. Therefore the finite length property is new even in the convex setting.
(ii) Unlike the usual concave KL convergence analysis, our approach uses the generalized concave KL property and the associated exact modulus to describe the sharpest upper bound of \(\sum _{k=-1}^\infty \left\Vert z_{k+1}-z_k\right\Vert\). To see this, note that the usual analysis would yield a similar bound as (15) with \(\tilde{\varphi }\) replaced by a concave desingularizing function associated with the concave KL property of H. Lemma 3.8 states that \(\tilde{\varphi }\) is the infimum of all associated concave desingularizing functions. Hence the upper bound (15) is the sharpest; see also [7] for a similar sharp result of the celebrated PALM algorithm.
A key assumption of Theorem 3.9 is the concave KL property of the FRB merit function H on \(\omega (z_{-1})\). The class of semialgebraic functions provides rich examples of functions satisfying such an assumption.
Corollary 3.11
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB. Assume that \((x_k)_{k\in \mathbb {N}^*}\) is bounded, \(\inf (f+g)>-\infty\), and \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\). Suppose further that f and g are both semialgebraic functions. Then \((x_k)_{k\in \mathbb {N}^*}\) converges to some \(x^*\) with \(0\in \partial F(x^*)\) and has the finite length property.
Proof
Recall from [8, Section 4.3] that the class of semialgebraic functions is closed under summation and notice that the quadratic function \((x,y)\mapsto (\frac{1}{4\lambda }-\frac{L}{4})\left\Vert x-y\right\Vert ^2\) is semialgebraic. Then Fact 2.5 implies that the FRB merit function H is concave KL. Applying Theorem 3.9 then completes the proof. \(\square\)
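The finite length property guaranteed by Corollary 3.11 can be observed numerically. The sketch below (toy data of our own choosing) runs FRB on the semialgebraic pair \(f=\mu \left\Vert \cdot \right\Vert _0\), \(g=\frac{1}{2}\left\Vert Ax-b\right\Vert ^2\) and checks that the series of step lengths \(\sum _k\left\Vert x_{k+1}-x_k\right\Vert\) converges, i.e., its tail vanishes:

```python
import numpy as np

# Illustration of Corollary 3.11 on illustrative data: for the semialgebraic
# pair f = mu*||.||_0 and g = 0.5*||A x - b||^2, the FRB iterates have finite
# length, so the tail of the series of step lengths is essentially zero.
rng = np.random.default_rng(3)
m, n, mu = 25, 12, 0.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

grad_g = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)
lam = 0.9 / (3.0 * L)                   # 0 < lam < 1/(3L)
prox_f = lambda v: np.where(np.abs(v) > np.sqrt(2.0 * lam * mu), v, 0.0)

x_prev, x = np.zeros(n), np.zeros(n)
steps = []
for _ in range(4000):
    y = x + lam * (grad_g(x_prev) - grad_g(x))
    x_prev, x = x, prox_f(y - lam * grad_g(x))
    steps.append(np.linalg.norm(x - x_prev))

total_length = sum(steps)   # bounded: the finite length property
tail = sum(steps[2000:])    # tail of the series
```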
Assuming that the FRB merit function admits KL exponent \(\theta \in [0,1)\), we establish the following convergence rate result. Our analysis is standard and follows from the usual KL convergence rate methodology; see, e.g., [24, Theorem 2], [16, Remark 6] and [10, Lemma 4]. We provide a proof here for the sake of completeness.
Theorem 3.12
(Convergence rate of FRB) Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that \((z_k)_{k\in \mathbb {N}^*}\) is bounded, \(\inf (f+g)>-\infty\), and \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\). Let \(M,k_0\) and \(x^*\) be those given in Theorem 3.9. Suppose that the FRB merit function H has KL exponent \(\theta \in [0,1)\) at \((x^*,x^*)\). Then the following statements hold:
(i) If \(\theta =0\), then \((x_k)_{k\in \mathbb {N}^*}\) converges to \(x^*\) in finite steps.
(ii) If \(\theta \in \left( 0,\frac{1}{2}\right]\), then there exist \(Q_1\in (0,1)\) and \(c_1>0\) such that
$$\begin{aligned} \left\Vert x_k-x^*\right\Vert \le c_1Q_1^k \end{aligned}$$
for k sufficiently large.
(iii) If \(\theta \in \left( \frac{1}{2},1\right)\), then there exists \(Q_2>0\) such that
$$\begin{aligned} \left\Vert x_k-x^*\right\Vert \le Q_2k^{-\frac{1-\theta }{2\theta -1}} \end{aligned}$$
for k sufficiently large.
Proof
Let \(z^*=(x^*,x^*)\). Then Theorem 3.9 shows that \(z_k\rightarrow z^*\) as \(k\rightarrow \infty\). Assume without loss of generality that \(H(z^*)=0\) and \(H(z_k)>0\) for all k. Before proving the desired statements, we will first develop several inequalities which will be used later. We have shown in Theorem 3.9 that for \(k>k_0\),
which by Lemma 3.5 and our assumption further implies that
Furthermore, define for \(k\in \mathbb {N}\)
$$\begin{aligned} \sigma _k=\sum _{i=k}^{\infty }\left\Vert z_{i+1}-z_i\right\Vert , \end{aligned}$$
which is well-defined due to Theorem 3.9. Assume again without loss of generality that \(\sigma _{k-1}-\sigma _k=\left\Vert z_k-z_{k-1}\right\Vert <1\) for every k (recall Lemma 3.2). Rearranging (19) gives
By (15), we have for \(k>k_0\)
where the last inequality follows from (20) and \(C=cM((1-\theta )cM_2)^{\frac{1-\theta }{\theta }}\). Clearly \(\left\Vert x_k-x^*\right\Vert \le \left\Vert z_k-z^*\right\Vert \le \sigma _k\). Hence, in order to prove the desired statements, it suffices to estimate \(\sigma _k\).
(i) Let \(\theta =0\) and suppose to the contrary that \((z_k)_{k\in \mathbb {N}^*}\) does not converge in finitely many steps. Then (19) means that for every \(k>k_0\)
Passing to the limit and applying Lemma 3.2, one gets \(1\le 0\), which is absurd.
(ii) If \(\theta \in \left( 0,\frac{1}{2}\right]\), then \(\frac{1-\theta }{\theta }\ge 1\) and \((\sigma _{k-1}-\sigma _k)^{\frac{1-\theta }{\theta }}\le \sigma _{k-1}-\sigma _k\). Hence (21) implies that for \(k>k_0\)
from which the desired result readily follows by setting \(Q_1=\frac{C+1}{C+2}\) and \(c_1=\left( \frac{C+1}{C+2}\right) ^{-k_0}\sigma _{k_0}\).
(iii) If \(\frac{1}{2}<\theta <1\), then \(0<\frac{1-\theta }{\theta }<1\) and \((\sigma _{k-1}-\sigma _k)^{\frac{1-\theta }{\theta }}\ge \sigma _{k-1}-\sigma _k\). It follows from (21) that \(\sigma _k\le \left( 1+C \right) (\sigma _{k-1}-\sigma _k)^{\frac{1-\theta }{\theta }}\). Define \(h(t)=t^{-\frac{\theta }{1-\theta }}\) for \(t>0\). Then
Let \(R>1\) be a fixed real number. Next we consider two cases for \(k>k_0\). If \(h(\sigma _{k})\le Rh(\sigma _{k-1})\) then (22) implies that
Set \(v=\frac{1-2\theta }{1-\theta }<0\). Then the above inequality can be rewritten as
If \(h(\sigma _k)>Rh(\sigma _{k-1})\) then \(\sigma _k<\left( \frac{1}{R}\right) ^{\frac{1-\theta }{\theta }}\sigma _{k-1}=q\sigma _{k-1}\), where \(q=\left( \frac{1}{R}\right) ^{\frac{1-\theta }{\theta } }\in (0,1)\). Hence
Set \(c_2=\min \big \{(q^v-1)\sigma _{k_0}^v,-\frac{v}{R}(1+C)^{-\frac{\theta }{1-\theta }}\big \}>0\). Combining (23) and (24) yields
Summing the above inequality from \(k_0+1\) to any \(k>k_0\), one gets
where the last inequality holds because \(k-k_0\ge \frac{k}{k_0+1}\) for \(k>k_0+1\). Hence
from which the desired statement follows by setting \(Q_2=\left( \frac{c_2}{k_0+1}\right) ^{\frac{1}{v}}\).\(\square\)
The following corollary asserts that convergence rates can be directly deduced from the KL exponent of the objective under certain conditions, which is a consequence of [25, Theorem 3.6], Theorems 3.6 and 3.12.
Corollary 3.13
Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and suppose that all conditions of Theorem 3.9 are satisfied. Let \(x^*\) be as in Theorem 3.9. Assume further that F has KL exponent \(\theta \in [\frac{1}{2},1)\) at \(x^*\). Then the following statements hold:
(i) If \(\theta =\frac{1}{2}\), then there exist \(Q_1\in (0,1)\) and \(c_1>0\) such that
$$\begin{aligned} \left\Vert x_k-x^*\right\Vert \le c_1Q_1^k \end{aligned}$$
for k sufficiently large.
(ii) If \(\theta \in \left( \frac{1}{2},1\right)\), then there exists \(Q_2>0\) such that
$$\begin{aligned} \left\Vert x_k-x^*\right\Vert \le Q_2k^{-\frac{1-\theta }{2\theta -1}} \end{aligned}$$
for k sufficiently large.
Proof
The FRB merit function H has KL exponent \(\theta\) at \((x^*,x^*)\) by our assumption and [25, Theorem 3.6]. Applying Theorem 3.12 completes the proof. \(\square\)
Following similar techniques in [18, 26], we now deduce convergence rates of the merit function value sequence.
Theorem 3.14
(Convergence rate of merit function value) Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that \((z_k)_{k\in \mathbb {N}^*}\) is bounded, \(\inf (f+g)>-\infty\), and \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\). Let \(x^*\) be the limit given in Theorem 3.9(ii). Suppose that the FRB merit function H has KL exponent \(\theta \in [0,1)\) at \(z^*=(x^*,x^*)\). Define \(e_k=H(z_k)-F(x^*)\) for \(k\in \mathbb {N}\). Then \((e_k)_{k\in \mathbb {N}}\) converges to 0 and the following statements hold:
(i) If \(\theta =0\), then \((e_k)_{k\in \mathbb {N}}\) converges in finite steps.
(ii) If \(\theta \in \left( 0,\frac{1}{2}\right]\), then there exist \({\hat{c}}_1>0\) and \({\hat{Q}}_1\in [0,1)\) such that for k sufficiently large,
(iii) If \(\theta \in \left( \frac{1}{2},1 \right)\), then there exists \({\hat{c}}_2>0\) such that for k sufficiently large,
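The estimates in (ii) and (iii) presumably take the forms standard in KL-exponent analysis (cf. [18, Lemma 10]); a hedged reconstruction in terms of the constants \({\hat{c}}_1\), \({\hat{Q}}_1\), \({\hat{c}}_2\):

```latex
\text{(ii)}\quad e_k \le \hat{c}_1\, \hat{Q}_1^{\,k},
\qquad\qquad
\text{(iii)}\quad e_k \le \hat{c}_2\, k^{-\frac{1}{2\theta-1}}.
```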
Proof
We learn from Theorems 3.6 and 3.9 that the sequence \(\left( H(z_k) \right) _{k\in \mathbb {N}}\) is decreasing, \(H(z_k)\rightarrow H(z^*)=F(x^*)\) and \(z_k\rightarrow z^*\) as \(k\rightarrow \infty\). Therefore \((e_k)_{k\in \mathbb {N}}\) converges to 0 and \(e_k\ge 0\) for all k. By the KL exponent assumption, there exist \(c>0\) and \(k_0\) such that for \(k\ge k_0\)
Therefore by using Lemmas 3.2 and 3.5
where \(C=\frac{M_1}{c^2(1-\theta )^2M_2^2}\). Applying [18, Lemma 10] completes the proof. \(\square\)
Corollary 3.15
(Convergence rate of objective function value) Let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and define \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Assume that \((z_k)_{k\in \mathbb {N}^*}\) is bounded, \(\inf (f+g)>-\infty\), and \(0<\lambda <\min \big \{\frac{1}{3L},\lambda _f\big \}\). Let \(x^*\) be the limit given in Theorem 3.9(ii). Suppose that the FRB merit function H has KL exponent \(\theta \in [0,1)\) at \(z^*=(x^*,x^*)\). Then the following statements hold:
(i) If \(\theta =0\), then \(F(x_k)\) converges to \(F(x^*)\) in finite steps.
(ii) If \(\theta \in \left( 0,\frac{1}{2}\right]\), then there exist \({\bar{c}}_1>0\) and \({\bar{Q}}_1\in [0,1)\) such that for k sufficiently large,
(iii) If \(\theta \in \left( \frac{1}{2},1 \right)\), then there exists \({\bar{c}}_2>0\) such that for k sufficiently large,
Proof
Observe that
It suffices to apply Theorems 3.12 and 3.14. \(\square\)
Let \({\text {Proj}}_C\) be the projection mapping associated with a nonempty closed subset \(C\subseteq {\mathbb {R}}^n\).
Example 3.16
Let \(C\subseteq {\mathbb {R}}^n\) be a nonempty, closed and convex set and let \(D\subseteq {\mathbb {R}}^n\) be nonempty and closed. Suppose that \(C\cap D\ne \varnothing\) and either C or D is compact. Assume further that both C and D are semialgebraic. Consider the minimization problem (3) with \(f=\delta _D\) and \(g=\frac{1}{2}{\text {dist}}^2(\cdot ,C)\). Let \(0<\lambda <\frac{1}{3}\), and let \((x_k)_{k\in \mathbb {N}^*}\) be a sequence generated by FRB and \(z_k=(x_{k+1},x_k)\) for \(k\in \mathbb {N}^*\). Then the following statements hold:
(i) There exists \(x^*\in {\mathbb {R}}^n\) such that \(x_k\rightarrow x^*\) and \(0\in \partial F(x^*)\).
(ii) Suppose, in addition, that
Then \(x^*\in C\cap D\). Moreover, there exist \(Q_1\in (0,1)\) and \(c_1>0\) such that
for k sufficiently large.
Proof
(i) By the compactness assumption, the function \(f+g\) is coercive, so Theorem 3.7 implies that \((x_k)_{k\in \mathbb {N}^*}\) is bounded. Since the sets C and D are semialgebraic, so are the functions f and g; see [8, Section 4.3] and [9, Lemma 2.3]. Moreover, \(\lambda _f=\infty\) and g has a Lipschitz continuous gradient with constant \(L=1\); see [15, Exercise 1.24] and [1, Corollary 12.31]. The desired result then follows from Corollary 3.11.
(ii) Taking the fact that \(0\in \partial F(x^*)\) into account and applying the subdifferential sum rule [15, Exercise 8.8], one gets
The constraint qualification (27) then implies that \(x^*={\text {Proj}}_C(x^*)\), and consequently \(x^*\in C\). Moreover, the FRB scheme together with the closedness of D ensures that \(x^*\in D\); hence \(x^*\in C\cap D\). Notice further that the constraint qualification (27) amounts to
Then a direct application of [27, Theorem 5] guarantees that F has KL exponent \(\theta =\frac{1}{2}\) at \(x^*\). The desired result follows immediately from Corollary 3.13.\(\square\)
4 Numerical experiments
In this section, we apply the FRB splitting method to nonconvex feasibility problems. Let \(C=\{x\in {\mathbb {R}}^n:Ax=b\}\) for \(A\in {\mathbb {R}}^{m\times n}\), \(b\in {\mathbb {R}}^m\), \(r=\lceil m/5\rceil\), \(l=10^6\), and \(D=\{x\in {\mathbb {R}}^n: \left\Vert x\right\Vert _0\le r, \left\Vert x\right\Vert _\infty \le l\}\). Similarly to [6, Section 5], we consider the minimization problem (3) with
Clearly C is semialgebraic. As for D, first notice that \({\mathbb {R}}^n\times \{i\}\) for \(i\in {\mathbb {R}}\) and \({\text {gph}}\left\Vert \cdot \right\Vert _{0}\) are semialgebraic; see [16, Example 3]. Then
which means that \(\{x\in {\mathbb {R}}^n:\left\Vert x\right\Vert _0\le r \}\) is a finite union of intersections of semialgebraic sets, hence semialgebraic; see also [28, Formula 27(d)]. On the other hand, \(\left\Vert x\right\Vert _\infty \le l\Leftrightarrow \max _{1\le i\le n}(|x_i|-l)\le 0\), so the box \([-l,l]^n\) is semialgebraic. Altogether, the set D, being an intersection of semialgebraic sets, is semialgebraic. Hence, when specialized to the problem above, FRB converges to a stationary point thanks to Example 3.16.
We find a projection onto D by the formula given below, which is a consequence of [29, Proposition 3.1] and was already observed by Li and Pong [6]. We provide a proof for the sake of completeness.
Proposition 4.1
Let \(z=(z_1,\ldots ,z_n)\in {\mathbb {R}}^n\). For every i, set \({\tilde{z}}_i^*={\text {Proj}}_{[-l,l]}(z_i)\), \(v_i^*=|z_i|^2-|{\tilde{z}}_i^*-z_i|^2\), and let \(I^*\subseteq \{1,\ldots ,n\}\) be the set of indices corresponding to the r largest elements of \(v_i^*,~i=1,\ldots ,n\). Define \(z^*\in {\mathbb {R}}^n\) by \(z^*_i={\tilde{z}}_i^*\) if \(i\in I^*\) and \(z^*_i=0\) otherwise. Then \(z^*\in {\text {Proj}}_D(z)\).
Proof
Apply [29, Proposition 3.1] with \(\phi _i=|\cdot -z_i|^2\) and \(\mathcal {X}_i=[-l,l]\). \(\square\)
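In Python, the recipe of Proposition 4.1 can be sketched as follows (a minimal sketch; the function name and the NumPy-based implementation are ours, not from the paper):

```python
import numpy as np

def proj_D(z, r, l=1e6):
    """A point of Proj_D(z) for D = {x : ||x||_0 <= r, ||x||_inf <= l},
    following Proposition 4.1: clip each coordinate to [-l, l], score it by
    v_i = |z_i|^2 - |clip(z_i) - z_i|^2, and keep the r best-scoring coordinates.
    """
    z = np.asarray(z, dtype=float)
    z_tilde = np.clip(z, -l, l)      # componentwise Proj_[-l, l]
    v = z**2 - (z_tilde - z)**2      # gain from keeping coordinate i
    keep = np.argsort(v)[-r:]        # indices of the r largest scores
    z_star = np.zeros_like(z)
    z_star[keep] = z_tilde[keep]
    return z_star
```

For instance, with \(r=2\) and \(l=2\), the point \((3,-1,0.5)\) projects to \((2,-1,0)\): the first coordinate is clipped to 2, and the two largest scores select the first two coordinates.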
We shall benchmark FRB against the Douglas-Rachford method with fixed step-size (DR) [6] by Li and Pong, the inertial Tseng's method (iTseng) [12] by Boţ and Csetnek, and DR equipped with step-size heuristics (DRh) [6] by Li and Pong. These splitting algorithms for nonconvex optimization problems are known to converge globally to a stationary point of (3) under appropriate assumptions on the concave KL property of merit functions; see [6, Theorems 1–2, Remark 4, Corollary 1] and [12, Theorem 3.1]. The convergence of DR and iTseng in our setting is already proved in [6, Proposition 2] and [12, Corollary 3.1], respectively.
We implement FRB with the following specified scheme for the problem of finding an r-sparse solution of a linear system \(\{x\in {\mathbb {R}}^n:Ax=b\}\)
where the step-size \(\lambda =0.9999\cdot \frac{1}{3}\) (recall Example 3.16). The inertial-type Tseng's method studied in [12, Scheme (6)] is applied with the step-size \(\lambda ^\prime = 0.1316\) given by [12, Lemma 3.3] and a fixed inertial term \(\alpha =\frac{1}{8}\):
As for DR and DRh, we employ the schemes specified by [6, Scheme (7)] and [6, Section 5] with the same step-sizes, respectively. All algorithms are initialized at the origin, and we terminate FRB and iTseng when
We adopt the termination criteria from [6, Section 5] for DR and DRh, with the same tolerance of \(10^{-8}\). Similarly to [6, Section 5], we generate problem data by creating random matrices \(A\in {\mathbb {R}}^{m\times n}\) with entries following the standard Gaussian distribution. We then draw a vector \({\hat{x}}\in {\mathbb {R}}^r\) from the same distribution, project it onto the box \([-10^6,10^6]^r\), and create a sparse vector \({\tilde{x}}\in {\mathbb {R}}^n\) by placing the entries of this projection at r randomly chosen positions. Finally, we set \(b=A{\tilde{x}}\) to guarantee \(C\cap D\ne \varnothing\).
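Putting the pieces together, the experiment can be sketched in Python as follows. This is a hedged reconstruction: the FRB update applies \({\text{Proj}}_D\) as the backward step and \(\nabla g(x)=x-{\text{Proj}}_C(x)\) as the forward operator, the stopping rule \(\Vert x_{k+1}-x_k\Vert +\Vert x_k-x_{k-1}\Vert <10^{-8}\) is our guess at the criterion whose display equation is missing above, and the dimensions, seed, and iteration cap are ours.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is ours, for reproducibility
m, n = 100, 300
r = int(np.ceil(m / 5))
l = 1e6

# Problem data: Gaussian A, b = A @ x_true with x_true r-sparse, as in the text.
A = rng.standard_normal((m, n))
x_hat = np.clip(rng.standard_normal(r), -l, l)
x_true = np.zeros(n)
x_true[rng.choice(n, size=r, replace=False)] = x_hat
b = A @ x_true

# For C = {x : Ax = b}: Proj_C(x) = x - A^+(Ax - b), so
# grad g(x) = x - Proj_C(x) = A^+(Ax - b) for g = (1/2) dist^2(., C).
A_pinv = np.linalg.pinv(A)
def grad_g(x):
    return A_pinv @ (A @ x - b)

def proj_D(z):
    """Projection onto D = {||x||_0 <= r, ||x||_inf <= l} via Proposition 4.1."""
    z_tilde = np.clip(z, -l, l)
    v = z**2 - (z_tilde - z)**2
    keep = np.argsort(v)[-r:]
    out = np.zeros_like(z)
    out[keep] = z_tilde[keep]
    return out

# FRB: x_{k+1} = Proj_D(x_k - 2*lam*grad_g(x_k) + lam*grad_g(x_{k-1})), lam < 1/(3L).
lam = 0.9999 / 3
x_prev = np.zeros(n)
x_cur = np.zeros(n)
for _ in range(5000):
    x_next = proj_D(x_cur - 2 * lam * grad_g(x_cur) + lam * grad_g(x_prev))
    done = np.linalg.norm(x_next - x_cur) + np.linalg.norm(x_cur - x_prev) < 1e-8
    x_prev, x_cur = x_cur, x_next
    if done:
        break
```

By construction, every iterate returned by `proj_D` is r-sparse and lies in the box, so the limit point is feasible for D whether or not the run hits a global minimizer.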
Results of our experiments are presented in Table 1 below. For each problem size (m, n), we randomly generate 50 instances using the strategy described above, and report the ceiling of the average number of iterations (iter), the minimal objective value at termination (\(\text {fval}_{\text {min}}\)), and the number of successes (succ). Here we say an experiment is “successful” if the objective function value at termination is less than \(10^{-12}\), which means that the algorithm actually hits a global minimizer rather than merely a stationary point. We observed the following:
-
Among the algorithms with a fixed step-size (FRB, DR, and iTseng), FRB outperforms the others in terms of both the number of iterations and the number of successes, and it also attains the smallest termination values. Moreover, it is worth noting that our simulation results align with Malitsky and Tam's observation that FRB converges faster than Tseng's method on a specific problem [4, Remark 2.8].
-
DRh has the largest number of successes and the best precision at termination (see \(\text {fval}_{\text {min}}\)), but tends to be slower on “easy” problems (large m).
Therefore, FRB is a competitive method for finding a sparse solution of a linear system, at least among the aforementioned algorithms with a fixed step-size (see Note 2).
5 Conclusion and future work
We established convergence of the Malitsky-Tam FRB splitting algorithm in the nonconvex setting. Under the generalized concave KL property, we showed that FRB converges globally to a stationary point of (3) and enjoys the finite length property, which is new even in the convex setting. The sharpness of our results is demonstrated by virtue of the exact modulus associated with the generalized concave KL property. We also analyzed both function value and sequential convergence rates of FRB when the desingularizing functions associated with the generalized concave KL property have the Łojasiewicz form. Numerical simulations suggest that FRB is competitive compared to the DR and inertial Tseng's methods.
As for future work, it is tempting to analyze convergence rates using the exact modulus, as the usual KL convergence rate analysis assumes desingularizing functions of the Łojasiewicz form, which could be an overestimation. Another direction is to improve FRB through heuristics or other step-size strategies such as the one used in FISTA. Finally, as anonymous referees kindly pointed out, inertial techniques have been widely adopted to improve nonconvex proximal-type algorithms in the KL framework; see, e.g., [12, 30] and the references therein. It would therefore also be worthwhile to explore an inertial-type FRB for nonconvex optimization problems.
Data availability
The data that support the findings of Table 1 are available from the corresponding author upon request.
Notes
In the remainder, we shall omit the adjectives “pointwise” and “setwise” whenever there is no ambiguity.
We also performed simulations with much larger problem sizes (\(n=4000,5000,6000\)), in which case FRB, DR, and iTseng tend to get stuck at stationary points while DRh can still hit global minimizers. We believe that this is due to its heuristics; see also [6, Section 5].
References
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Cham (2017)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simulat. 4, 1168–1200 (2005)
Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control. Optim. 38, 431–446 (2000)
Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without cocoercivity. SIAM J. Optim. 30, 1451–1472 (2020)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2, 183–202 (2009)
Li, G., Pong, T.K.: Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems. Math. Program. 159, 371–401 (2016)
Wang, X., Wang, Z.: The exact modulus of the generalized Kurdyka-Łojasiewicz property. Math. Oper. Res. (2021). arXiv:2008.13257, to appear
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel Methods. Math. Program. 137, 91–129 (2013)
Banert, S., Boţ, R.I.: A general double-proximal gradient algorithm for DC programming. Math. Program. 178, 301–326 (2019)
Bolte, J., Sabach, S., Teboulle, M., Vaisbourd, Y.: First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM J. Optim. 28, 2131–2151 (2018)
Boţ, R.I., Csetnek, E.R.: An inertial Tseng’s type proximal algorithm for nonsmooth and nonconvex optimization problems. J. Optim. Theory Appl. 171, 600–616 (2016)
Li, G., Liu, T., Pong, T.K.: Peaceman-Rachford splitting for a class of nonconvex optimization problems. Comput. Optim. Appl. 68, 407–436 (2017)
Mordukhovich, B.S.: Variational Analysis and Generalized Differentiation I: Basic Theory. Springer-Verlag, Berlin (2006)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer-Verlag, Berlin (1998)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18, 556–572 (2007)
Boţ, R.I., Nguyen, D.-K.: The proximal alternating direction method of multipliers in the nonconvex setting: convergence analysis and rates. Math. Oper. Res. 45, 682–712 (2020)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362, 3319–3363 (2010)
Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles 117, 87–89 (1963)
Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’institut Fourier 48, 769–783 (1998)
Ochs, P., Chen, Y., Brox, T., Pock, T.: iPiano: inertial proximal algorithm for nonconvex optimization. SIAM J. Imag. Sci. 7, 1388–1419 (2014)
Böhm, A., Sedlmayer, M., Csetnek, E.R., Boţ, R.I.: Two steps at a time – taking GAN training in stride with Tseng's method. arXiv:2006.09033 (2020)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Li, G., Pong, T.K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Boţ, R.I., Csetnek, E.R., Nguyen, D.-K.: A proximal minimization algorithm for structured nonconvex and nonsmooth problems. SIAM J. Optim. 29, 1300–1328 (2019)
Chen, C., Pong, T.K., Tan, L., Zeng, L.: A difference-of-convex approach for split feasibility with applications to matrix factorizations and outlier detection. J. Global Optim. 78, 107–136 (2020)
Bauschke, H.H., Luke, D.R., Phan, H.M., Wang, X.: Restricted normal cones and sparsity optimization with affine constraints. Found. Comput. Math. 14, 63–83 (2014)
Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23, 2448–2478 (2013)
Boţ, R.I., Csetnek, E.R., László, S.C.: An inertial forward-backward algorithm for the minimization of the sum of two nonconvex functions. EURO J. Comput. Optimiz. 4, 3–25 (2016)
Acknowledgements
The authors thank the editor and the reviewers for helpful suggestions and constructive feedback which helped us to improve the presentation of the results. XW and ZW were partially supported by NSERC Discovery Grants.
Cite this article
Wang, X., Wang, Z. Malitsky-Tam forward-reflected-backward splitting method for nonconvex minimization problems. Comput Optim Appl 82, 441–463 (2022). https://doi.org/10.1007/s10589-022-00364-0
Keywords
- Generalized concave Kurdyka-Łojasiewicz property
- Proximal mapping
- Malitsky-Tam forward-reflected-backward splitting method
- Merit function
- Global convergence
- Nonconvex optimization