1 Introduction

In recent years, DC (difference-of-convex) optimization problems have arisen in many application areas, such as machine learning [12, 15] and image processing [7, 9, 21]. Owing to their popularity, many numerical algorithms have been proposed for solving these problems [1, 13, 14, 29]. In this paper, we consider a special DC problem of the following form:

$$\begin{aligned} \min \left\{ \Psi (x):=f(x)+g(x) \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(1.1)

where g is a difference-of-convex function:

$$\begin{aligned} g(x)=g_{n}(x)-g_{s}(x), \end{aligned}$$
(1.2)

\(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a smooth convex function with Lipschitz continuous gradient, \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. In addition, we assume that the objective function \(\Psi \) is level-bounded, which implies that \(\inf \Psi (x)>-\infty \) and that the set of global minimizers of (1.1) is nonempty.

Since problem (1.1) belongs to the class of DC problems, the well-known classical DC algorithm (DCA) [26] can be applied to solve it. The main iteration step is:

$$\begin{aligned} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ f(x)+g_{n}(x)-\left\langle \xi ^{k}, x\right\rangle \right\} , \end{aligned}$$
(1.3)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) is a subgradient of \(g_s\) at \(x^{k} \in \mathbb {R}^{n}\). As the scale of the problem grows, however, the original DCA is probably not very efficient, since the subproblem (1.3) has to be solved exactly at each iteration. By using the proximal mapping and a specific DC decomposition [27], large-scale DC optimization problems can be solved via the proximal DC algorithm (PDCA) [11]. It can be suitably applied to problem (1.1), in which case the new iterate is obtained by solving the following proximal subproblem:

$$\begin{aligned} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( x^{k}\right) -\xi ^{k}, x\right\rangle +\frac{L}{2}\left\| x-x^{k}\right\| ^{2}\right\} , \end{aligned}$$
(1.4)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) and \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\). However, this algorithm may take a large number of iterations.
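To make (1.4) concrete: when \(g_{n}(x)=\lambda \Vert x\Vert _{1}\), the subproblem has the closed-form soft-thresholding solution. The following minimal Python sketch of one PDCA iteration assumes this choice of \(g_n\) together with user-supplied oracles `grad_f` and `subgrad_gs` (these names are illustrative, not from the original):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pdca_step(x, grad_f, subgrad_gs, L, lam):
    """One PDCA iteration (1.4) with g_n = lam * ||.||_1 (a sketch).

    grad_f and subgrad_gs are callables returning the gradient of f and
    a subgradient of g_s at a point, respectively (assumed oracles).
    """
    xi = subgrad_gs(x)                  # xi^k in the subdifferential of g_s at x^k
    v = x - (grad_f(x) - xi) / L        # gradient-type step on f - <xi, .>
    return soft_threshold(v, lam / L)   # closed-form solution of (1.4)
```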

Therefore, various attempts have been made to accelerate PDCA. By using extrapolation techniques, Wen et al. [31] proposed a proximal DC algorithm with extrapolation (pDCAe) to solve problem (1.1). The key iteration step is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} y^{k}=x^{k}+\beta _{k}\left( x^{k}-x^{k-1}\right) , \\ x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( y^{k}\right) -\xi ^{k}, x\right\rangle +\frac{L}{2 }\left\| x-y^{k}\right\| ^{2}\right\} , \end{array}\right. \end{aligned}$$
(1.5)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta _{k}\left( x^{k}-x^{k-1}\right) \) is an extrapolation term. The choice of extrapolation parameters \(\{\beta _{k}\}\) in [31] covers many popular settings, including the one used in the fast iterative shrinkage-thresholding algorithm (FISTA) with the fixed restart strategy [24] for solving (1.1) in the case \(g(x)=g_n(x)\), namely

$$\begin{aligned} \beta _k=\frac{t_{k-1}-1}{t_k} \ \text {with} \ t_{k}=\frac{1+\sqrt{1+4t_{k-1}^2}}{2}, \end{aligned}$$
(1.6)

where \(t_{-1}=t_{0}=1\). Recently, based on the difference-of-Moreau-envelopes smoothing technique, Sun et al. [30] developed an inexact gradient descent method for solving DC programs, whose iteration reads:

$$\begin{aligned} \left\{ \begin{array}{l} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x) + \left\langle \nabla f\left( x^{k}\right) , x\right\rangle +\frac{1}{2 \upsilon }\left\| x-z^{k}\right\| ^{2}\right\} , \\ x_{\upsilon g_{s}}\left( z^{k}\right) =\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_s(x)+\frac{1}{2 \upsilon }\left\| x-z^{k}\right\| ^{2}\right\} ,\\ z^{k+1}=z^{k}+\beta \left( x^{k+1}-x_{\upsilon g_{s}}\left( z^{k}\right) \right) , \end{array}\right. \end{aligned}$$
(1.7)

where \(\upsilon \in (0, \frac{1}{L})\), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta \in (0,2)\).
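A minimal sketch of one pass of (1.7), assuming user-supplied oracles `grad_f` for \(\nabla f\) and `prox_gn`, `prox_gs` for the two proximal subproblems (all names illustrative):

```python
def smoothing_step(x, z, grad_f, prox_gn, prox_gs, v, beta):
    """One iteration of the difference-of-Moreau-envelopes scheme (1.7).

    prox_g(w, v) is assumed to return argmin_x { g(x) + ||x - w||^2 / (2*v) }
    for g = g_n and g = g_s, respectively.
    """
    x_new = prox_gn(z - v * grad_f(x), v)        # first subproblem of (1.7)
    z_new = z + beta * (x_new - prox_gs(z, v))   # z-update with relaxation beta
    return x_new, z_new
```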

Other accelerated DC algorithms have also been proposed for constrained DC programming, with the aim of improving both the quality of solutions and the rate of convergence. Pang et al. [25] proposed an enhanced DC algorithm (EDCA) that solves the considered DC problem while speeding up subsequential convergence. Building on EDCA, Lu et al. [22, 23] developed an enhanced proximal DC algorithm (EPDCA) to improve the efficiency of solving constrained DC programs in which \(g_s\) in (1.2) is the supremum of many convex smooth functions, and further proposed a nonmonotone EPDCA by utilizing a nonmonotone line search technique. Moreover, Sun et al. [30] proposed algorithms that combine the difference-of-Moreau-envelopes smoothing technique with the augmented Lagrangian function and obtained strong convergence results.

In this paper, motivated by the success of pDCAe in accelerating the original PDCA for solving structured DC optimization problems, we aim to investigate a more general extrapolation scheme in order to improve the numerical performance of PDCA. Concretely, inspired by the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and by pDCAe in [31], we consider a new extrapolated proximal DC algorithm (EPDCA) with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\), in which a power parameter \(\omega \in (0, 1]\) is introduced into the extrapolation scheme.

We first prove that any accumulation point of the sequence generated by our method is a stationary point under very general conditions. Moreover, based on the Kurdyka–Łojasiewicz property, we establish the global convergence and the convergence rate of the sequence generated by our method, without requiring \(g_s\) to be continuously differentiable. Finally, we conduct numerical experiments to show the efficiency of our method.

The main contributions of this work can be summarized as follows:

  • We propose a novel extrapolation scheme with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\) in EPDCA to solve the DC optimization problem (1.1) for potential acceleration, and we establish the convergence results of our method.

  • By constructing an auxiliary function, we prove the global convergence results of EPDCA without the assumption that \(g_s\) is a locally Lipschitz continuous function, which extends the convergence results of pDCAe developed in [31].

  • Our numerical results show that our method performs efficiently on the \(\ell _{1-2}\) regularized least squares problem [32] and the logarithmic regularized least squares problem [6].

The rest of this paper is organized as follows. In Sect. 2, some basic definitions used throughout the paper are given. Section 3 presents the framework of EPDCA and discusses the selection of the extrapolation parameters \(\{\beta _k\}\). In Sect. 4, we analyze the convergence behavior of our method. In Sect. 5, numerical experiments are conducted to show the efficiency of the proposed method in comparison with existing methods. Section 6 concludes the paper.

2 Preliminaries

In this paper, the Euclidean scalar product of \(\mathbb {R}^{n}\) and its corresponding norm are denoted by \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert \), respectively. For an \(m \times n\) matrix A, its transpose is denoted by \(A^{T}\). If A is a symmetric matrix, we use \(\lambda _{\max }(A)\) and \(\lambda _{\min }(A)\) to denote its largest and smallest eigenvalues, respectively. For a given nonempty closed set \(\mathcal {C} \subseteq \mathbb {R}^{n}\), let \({\text {dist}}(\cdot , \mathcal {C}): \mathbb {R}^{n} \rightarrow \mathbb {R}\) denote the distance function to \(\mathcal {C}\), i.e.

$$\begin{aligned} {\text {dist}}(x, \mathcal {C}):=\inf _{y \in \mathcal {C}}\Vert x-y\Vert , \quad \forall x \in \mathbb {R}^{n}. \end{aligned}$$
(2.1)

For an extended real-valued function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the set

$$\begin{aligned} {\text {dom}} f:=\left\{ x \in \mathbb {R}^{n}: f(x)<\infty \right\} \end{aligned}$$
(2.2)

denotes its effective domain. The function f is called proper if it does not take the value \(-\infty \) and \({\text {dom}} f\) is nonempty.

The set

$$\begin{aligned} \text{ epi }f:=\left\{ (x, t)\in \mathbb {R}^{n}\times \mathbb {R}: f(x) \le t\right\} \end{aligned}$$
(2.3)

denotes the epigraph of f, and f is closed (resp. convex) if \(\text{ epi }f\) is closed (resp. convex). A proper closed function f is said to be level-bounded if the lower level set \(\left\{ x \in \mathbb {R}^{n}: f(x) \le r\right\} \) is bounded for any \(r \in \mathbb {R}\).

Given a proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the Fréchet subdifferential of f at \(x \in {\text {dom}} f\) is given by

$$\begin{aligned} \partial ^{F} f(x):=\left\{ y \in \mathbb {R}^{n}: \liminf _{z \rightarrow x} \frac{f(z)-f(x)-\langle y, z-x\rangle }{\Vert z-x\Vert } \ge 0\right\} , \end{aligned}$$
(2.4)

while for \(x \notin {\text {dom}} f, \partial ^{F} f(x)=\emptyset \). The limiting subdifferential of f at \(x \in {\text {dom}} f\) is defined as

$$\begin{aligned} \partial f(x):=\left\{ y \in \mathbb {R}^{n}: \exists \, x^{k} \rightarrow x \text{ with } f\left( x^{k}\right) \rightarrow f(x) \text{ and } y^{k} \in \partial ^{F} f\left( x^{k}\right) \text{ such } \text{ that } y^{k} \rightarrow y\right\} , \end{aligned}$$
(2.5)

while for \(x \notin {\text {dom}} f\), \(\partial f(x)=\emptyset \). If f is convex, the Fréchet subdifferential and the limiting subdifferential coincide with the convex subdifferential, that is,

$$\begin{aligned} \partial ^{F} f(x)=\partial f(x)=\left\{ y \in \mathbb {R}^{n}: f(z) \ge f(x)+\langle y, z-x\rangle \text{ for } \text{ all } z \in \mathbb {R}^{n}\right\} . \end{aligned}$$
(2.6)
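For example, for the convex function \(f(x)=|x|\) on \(\mathbb {R}\), all three subdifferentials coincide, with \(\partial f(x)=\{{\text {sign}}(x)\}\) for \(x \ne 0\) and \(\partial f(0)=[-1,1]\).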

A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is called \(\sigma \)-strongly convex with \(\sigma \ge 0\) if for any \(x,z \in {\text {dom}} f\) and \(\lambda \in [0,1]\), it holds that

$$\begin{aligned} f(\lambda x+(1-\lambda ) z) \le \lambda f(x)+(1-\lambda ) f(z)-\frac{\sigma }{2} \lambda (1-\lambda )\Vert x-z\Vert ^{2}. \end{aligned}$$
(2.7)

Moreover, if f is a \(\sigma \)-strongly convex function, it is known that for any \(x \in {\text {dom}} \partial f\), \(y \in \partial f(x)\) and \(z \in {\text {dom}} f\), we have

$$\begin{aligned} f(z) \ge f(x)+\langle y, z-x \rangle +\frac{\sigma }{2}\Vert z-x\Vert ^{2} . \end{aligned}$$
(2.8)

For a proper closed convex function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), let \(f^{*}(u)=\sup _{x\in \mathbb {R}^{n}}\{\langle u, x\rangle -f(x)\}\) denote the conjugate function of f. Then \(f^{*}\) is proper, closed and convex, and for any x and y, Young's inequality holds, that is,

$$\begin{aligned} f(x)+f^{*}(y) \ge \langle x, y\rangle , \end{aligned}$$
(2.9)

where equality holds if and only if \(y \in \partial f(x)\). Moreover, for any x and y, one has \(y \in \partial f(x)\) if and only if \(x \in \partial f^{*}(y)\).
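For instance, if \(f(x)=\frac{1}{2}\Vert x\Vert ^{2}\), then \(f^{*}(y)=\frac{1}{2}\Vert y\Vert ^{2}\), and (2.9) reduces to \(\frac{1}{2}\Vert x\Vert ^{2}+\frac{1}{2}\Vert y\Vert ^{2} \ge \langle x, y\rangle \), with equality if and only if \(y=x\).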

Finally, we present some preliminaries on the Kurdyka–Łojasiewicz property [2,3,4]. These concepts play a central role in our theoretical and algorithmic developments. For \(\eta \in (0, \infty ]\), we denote by \(\Xi _{\eta }\) the set of all concave continuous functions \(\phi :[0, \eta ) \rightarrow [0, \infty )\) that are continuously differentiable over \((0, \eta )\) with positive derivatives and satisfy \(\phi (0)=0\).

Definition 2.1

(Kurdyka–Łojasiewicz property) A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is said to satisfy the Kurdyka–Łojasiewicz property at \(\overline{x} \in {\text {dom}} \partial f\) if there exist \(\eta \in (0, \infty ]\), a neighborhood W of \(\overline{x}\), and a function \(\phi \in \Xi _{\eta }\) such that for all x in the intersection

$$\begin{aligned} W\cap \left\{ x \in \mathbb {R}^{n}: f(\overline{x})<f(x)<f(\overline{x})+\eta \right\} \end{aligned}$$

it holds that

$$\begin{aligned} \phi ^{\prime }(f(x)-f(\overline{x})) {\text {dist}}(0, \partial f(x)) \ge 1. \end{aligned}$$

If f satisfies the Kurdyka–Łojasiewicz property at any point of dom \(\partial f\), then f is called a Kurdyka–Łojasiewicz function.
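For example, proper closed semialgebraic functions are known to satisfy the Kurdyka–Łojasiewicz property at every point of \({\text {dom}} \partial f\) with \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\) [2, 4]; this class covers, in particular, piecewise polynomial functions and norms such as \(\Vert x\Vert _{1}\).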

The following uniformized Kurdyka–Łojasiewicz property [5] plays an important role in our sequential convergence analysis.

Lemma 2.1

(Uniformized Kurdyka–Łojasiewicz property) Let \(\Omega \subseteq \mathbb {R}^{n}\) be a compact set and let \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) be a proper closed function. If f is constant on \(\Omega \) and satisfies the Kurdyka–Łojasiewicz property at each point of \(\Omega \), then there exist \(\epsilon , \eta >0\) and \(\phi \in \Xi _{\eta }\) such that

$$\begin{aligned} \phi ^{\prime }(f(x)-f(\overline{x})) {\text {dist}}(0, \partial f(x)) \ge 1 \end{aligned}$$

for any \(\overline{x} \in \Omega \) and any x satisfying \({\text {dist}}(x, \Omega )<\epsilon \) and \(f(\overline{x})<f(x)<f(\overline{x})+\eta \).

3 Extrapolated proximal difference-of-convex algorithm

In this section, we present the framework of the extrapolated proximal difference-of-convex algorithm (EPDCA) and give the choice of the extrapolation parameters \(\{\beta _k\}\), which is more general than the one proposed in (1.6).

Algorithm 1 (EPDCA). Choose \(x^{0}=x^{-1} \in \mathbb {R}^{n}\) and extrapolation parameters \(\{\beta _k\} \subseteq [0,1)\). For \(k=0,1,2,\ldots \), take \(\xi ^{k+1} \in \partial g_{s}\left( x^{k}\right) \) and compute

$$\begin{aligned} \left\{ \begin{array}{l} y^{k}=x^{k}+\beta _{k}\left( x^{k}-x^{k-1}\right) , \\ x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x\right\rangle +\frac{L}{2}\left\| x-y^{k}\right\| ^{2}\right\} . \end{array}\right. \end{aligned}$$
(3.1)
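To make the scheme concrete, the following Python sketch implements the EPDCA iteration under the assumption that the subproblem (3.1) is solved in closed form via a proximal oracle; the oracle names `grad_f`, `prox_gn` and `subgrad_gs` are illustrative, not from the original:

```python
import numpy as np

def epdca(x0, grad_f, prox_gn, subgrad_gs, L, betas, max_iter=5000, tol=1e-5):
    """Sketch of EPDCA, iterating (3.1) with extrapolation parameters `betas`.

    prox_gn(v) is assumed to return argmin_x { g_n(x) + (L/2) * ||x - v||^2 };
    subgrad_gs(x) returns an element of the subdifferential of g_s at x;
    betas is a sequence of length >= max_iter with sup_k betas[k] < 1.
    """
    x_prev = x = np.asarray(x0, dtype=float)     # x^0 = x^{-1}
    for k in range(max_iter):
        y = x + betas[k] * (x - x_prev)          # extrapolation step
        xi = subgrad_gs(x)                       # xi^{k+1} in subdiff of g_s at x^k
        x_prev, x = x, prox_gn(y - (grad_f(y) - xi) / L)   # solves (3.1)
        if np.linalg.norm(x - x_prev) / max(1.0, np.linalg.norm(x)) <= tol:
            break
    return x
```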

The extrapolation parameters \( \{\beta _k\}\) are selected as follows:

$$\begin{aligned} \beta _k=\frac{t_{k-1}-1}{t_k} \ \text {with} \ t_{k-1}=a(k-1)^\omega +b,~~ k\ge 1, \end{aligned}$$
(3.2)

where \(\omega \in (0, 1]\), \(a \in \mathbb {R}^{+}\) and \(b \in \mathbb {R}\); we set \(t_{-1}=t_0=b\). To avoid a zero denominator, we always assume, without further mention, that \(b \in \mathbb {R}\backslash \{-ak^\omega : k \in \mathbb {N}^{+}\}\).

Remark 3.1

  1. (i)

    If we choose \(\omega =1\), \(a=\frac{1}{\alpha -1}\) with \(\alpha >3\), and \(b=1\) in (3.2), the extrapolation parameters \( \{\beta _k\}\) become:

    $$\begin{aligned} \beta _k=\frac{k-1}{k+\alpha -1}. \end{aligned}$$
    (3.3)

    This setting, proposed by Chambolle and Dossal [8], is thus a special case of our proposed extrapolation parameters (3.2).

  2. (ii)

    Taking \(a=\frac{1}{2}\), \(b=1\) and \(\omega =1\) in (3.2), it follows from [17, Proposition 2] that the extrapolation parameters \( \{\beta _k\}\) in (1.6) and in (3.2) converge to the same value at the same rate, which means that the setting (1.6) is asymptotically consistent with a special case of the parameters we propose in (3.2).

  3. (iii)

    The choice of the power \(\omega \) depends on the specific problem. We choose \(\omega =0.98\) in the numerical experiments for solving two classes of DC regularized least squares problems; see Sect. 5.

  4. (iv)

    We can easily deduce from (3.2) that the sequence \(\{\beta _k\}\) increases to 1. Therefore, given a fixed positive integer K, we can choose \(\beta _k \equiv \beta _K\) whenever \(k>K\), which ensures that the supremum of \(\beta _k\) is strictly less than 1; see the sketch after this remark.

More concrete examples of such settings of the extrapolation parameters have been used extensively for Nesterov's accelerated forward–backward algorithm; see [16] for more details.
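As a small illustration, the following sketch (function names hypothetical) generates the parameters (3.2), including the capping strategy of Remark 3.1 (iv), alongside the FISTA parameters (1.6) for comparison; the defaults \(a=0.8\), \(b=5\), \(\omega =0.98\), \(K=500\) are the values used in Sect. 5:

```python
import math

def fista_betas(num):
    """FISTA extrapolation parameters from (1.6), with t_{-1} = t_0 = 1."""
    betas, t_prev, t = [], 1.0, 1.0
    for _ in range(num):
        betas.append((t_prev - 1.0) / t)     # beta_k = (t_{k-1} - 1) / t_k
        t_prev, t = t, (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return betas

def epdca_betas(num, a=0.8, b=5.0, omega=0.98, K=500):
    """Parameters beta_k from (3.2) with t_k = a * k**omega + b and t_{-1} = b,
    frozen at beta_K for k > K (Remark 3.1 (iv)) so that sup_k beta_k < 1."""
    t = lambda k: a * k**omega + b
    betas = []
    for k in range(num):
        j = min(k, K)                        # freeze the index after K iterations
        t_prev = b if j == 0 else t(j - 1)   # t_{j-1}, with t_{-1} = b
        betas.append((t_prev - 1.0) / t(j))
    return betas
```

In line with Remark 3.1 (ii), for \(a=\frac{1}{2}\), \(b=1\), \(\omega =1\) (and K large) the two sequences approach each other as k grows.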

4 Convergence analysis

In this section, we are going to analyze the convergence behaviors of EPDCA. We first define the following auxiliary sequence:

$$\begin{aligned} H(x^k,\xi ^k,x^{k-1}):=f\left( x^{k}\right) +g_n\left( x^{k}\right) -\langle x^k,\xi ^k\rangle +g_s^*\left( \xi ^{k}\right) +\frac{\mu }{2}\Vert x^{k}-x^{k-1}\Vert ^2, \end{aligned}$$

where \(\mu \in [L\overline{\beta }^2,L]\) with \(\overline{\beta }=\sup \limits _{k} \beta _k<1\), \(\xi ^{k} \in \partial g_s(x^{k-1})\) and \(g_s^{*}\) is the conjugate function of \(g_s\). Based on Young's inequality, we obtain that

$$\begin{aligned} H(x^k,\xi ^{k},x^{k-1})\ge f\left( x^{k}\right) +g_n\left( x^{k}\right) -g_s\left( x^{k}\right) +\frac{\mu }{2}\Vert x^{k}- x^{k-1}\Vert ^2\ge \Psi (x^k). \end{aligned}$$
(4.1)

It is worth mentioning that, using this auxiliary sequence structure, we can establish the global convergence of the sequence generated by our method for solving problem (1.1) without requiring \(g_s\) to be continuously differentiable. A similar idea of constructing such an auxiliary sequence can be found in [20].

Lemma 4.1

Let \(\{x^k\}\) be a sequence generated by EPDCA and let \(\mu \in [L\overline{\beta }^2,L]\). Then it holds that

$$\begin{aligned} H(x^{k+1},\xi ^{k+1},x^{k})-H(x^k,\xi ^{k},x^{k-1}) \le \frac{\mu -L}{2}\Vert x^{k+1}-x^{k}\Vert ^2+\frac{L\beta _k^2-\mu }{2}\Vert x^{k}-x^{k-1}\Vert ^2. \end{aligned}$$
(4.2)

Furthermore, the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\) is nonincreasing.

Proof

Since the objective of the subproblem (3.1) is strongly convex with modulus L and \(x^{k+1}\) is its minimizer, comparing the objective values at \(x^{k+1}\) and \(x^{k}\) yields

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k+1}\right) +\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x^{k+1}\right\rangle +\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2} \\&\qquad \le g_{n}\left( x^{k}\right) +\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x^{k}\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}-\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^{2}. \end{aligned} \end{aligned}$$

Rearranging terms, we obtain that

$$\begin{aligned} \begin{aligned} g_{n}\left( x^{k+1}\right) \le \,&g_{n}\left( x^{k}\right) +\left\langle -\nabla f\left( y^{k}\right) +\xi ^{k+1}, x^{k+1}-x^k\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}\\&-\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2}-\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^{2}. \end{aligned} \end{aligned}$$

On the other hand, using the fact that \(\nabla f\) is Lipschitz continuous with a Lipschitz constant of \(L>0\), we have

$$\begin{aligned} f\left( x^{k+1}\right) \le f\left( y^{k}\right) +\left\langle \nabla f\left( y^{k}\right) , x^{k+1}-y^{k}\right\rangle +\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2}. \end{aligned}$$

Using the above two inequalities, we obtain further that

$$\begin{aligned}&f\left( x^{k+1}\right) +g_n\left( x^{k+1}\right) \nonumber \\&\qquad \le f\left( y^{k}\right) +\left\langle \nabla f\left( y^{k}\right) , x^{k}-y^{k}\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle \nonumber \\&\qquad \qquad -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2 \nonumber \\&\qquad \le f\left( x^{k}\right) +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2 \nonumber \\&\qquad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2, \nonumber \\ \end{aligned}$$
(4.3)

where the second inequality follows from the convexity of f, and the last inequality holds due to the relation \(x^k-y^k=x^k-(x^k+\beta _k(x^k-x^{k-1}))=-\beta _k(x^k-x^{k-1})\). From the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\) and the conjugate function \(g_s^{*}\), we have

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) =f\left( x^{k+1}\right) +g_n\left( x^{k+1}\right) -\left\langle x^{k+1},\xi ^{k+1}\right\rangle +g_s^*\left( \xi ^{k+1}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2\nonumber \\&\quad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\qquad -\left\langle x^{k+1},\xi ^{k+1}\right\rangle +g_s^*\left( \xi ^{k+1}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\quad = f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\qquad +\left\langle x^k-x^{k+1},\xi ^{k+1}\right\rangle -g_s\left( x^{k}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\quad = f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2-g_s\left( x^{k}\right) , \end{aligned}$$
(4.4)

where the first inequality follows from (4.3), and the second equality follows from the fact that \(\xi ^{k+1} \in \partial g_s(x^k)\). Applying Young's inequality to (4.4), we obtain that

$$\begin{aligned} \begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) \\&\quad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2\\&\qquad -\left\langle x^k,\xi ^k \right\rangle + g_s^*\left( \xi ^{k}\right) \\&\quad =f\left( x^{k}\right) +g_{n}\left( x^{k}\right) -\left\langle x^k,\xi ^k \right\rangle + g_s^*\left( \xi ^{k}\right) +\frac{\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^{2}\\&\qquad + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2 +\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}\\&\quad = H\left( x^{k},\xi ^{k},x^{k-1}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2+\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}. \end{aligned} \end{aligned}$$

where the second equality follows from the definition of the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\). Therefore,

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^{k},\xi ^{k},x^{k-1}\right) \\&\quad \le \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2+\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}. \end{aligned}$$

Recalling that \(L \bar{\beta }^{2} \le \mu \le L\) and \(\overline{\beta }=\sup \limits _{k} \beta _k<1\), we obtain that for any \(k >0\),

$$\begin{aligned} \frac{\mu }{2}-\frac{L}{2} \le 0, \text{ and } \frac{L}{2} \beta _{k}^{2}-\frac{\mu }{2} \le \frac{L}{2} \bar{\beta }^{2}-\frac{\mu }{2} \le 0. \end{aligned}$$

Consequently, we see immediately that

$$\begin{aligned} H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^{k},\xi ^{k},x^{k-1}\right) \le 0, \end{aligned}$$

which means that the auxiliary sequence \(\left\{ H(x^{k},\xi ^{k},x^{k-1})\right\} \) is nonincreasing. This completes the proof. \(\square \)

Before analyzing the convergence behaviors, on the basis of Fermat’s rule [28, Theorem 10.1], we give the definition of stationary points of the objective function \(\Psi \) as follows.

Definition 4.1

Let \(\Psi \) be given in (1.1). We say that \(\tilde{x}\) is a stationary point of \(\Psi \) if

$$\begin{aligned} 0 \in \nabla f(\tilde{x})+\partial g_n(\tilde{x})-\partial g_s(\tilde{x}). \end{aligned}$$

The set of all stationary points of \(\Psi \) is denoted by \(\mathcal {X}\).

We next give the global subsequential convergence of the sequence \(\{x^k\}\) generated by EPDCA to solve problem (1.1).

Theorem 4.1

(Global subsequential convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:

  1. (i)

    The sequences \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded.

  2. (ii)

    \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \).

  3. (iii)

    Any accumulation point of the sequence \(\left\{ x^{k}\right\} \) is a stationary point of \(\Psi \).

Proof

(i) Since the sequence \(\left\{ H(x^k,\xi ^k,x^{k-1})\right\} \) is nonincreasing from Lemma 4.1, together with the inequality (4.1), we obtain that

$$\begin{aligned} \Psi \left( x^k\right) \le H\left( x^k,\xi ^k,x^{k-1}\right) \le H\left( x^0,\xi ^0,x^{-1}\right) < \infty . \end{aligned}$$

Since \(\Psi \) is level-bounded by assumption, the sequence \(\left\{ x^{k}\right\} \) is bounded. Moreover, due to the boundedness of \(\{x^k\}\), the continuity of \(g_s\), and the fact that \(\xi ^k \in \partial g_s(x^{k-1})\), the sequence \(\{\xi ^k\}\) is also bounded. This proves (i).

(ii) From Lemma 4.1 and according to the fact that \(\mu \le L\), we have

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^k,\xi ^k,x^{k-1}\right) \\&\quad \le \frac{\mu -L}{2}\left\| x^{k+1}-x^{k}\right\| ^2+\frac{L\beta _k^2-\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^2\\&\quad \le \frac{L\beta _k^2-\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^2. \end{aligned}$$

By rearranging the terms above, we obtain that

$$\begin{aligned} \frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^2\le H\left( x^k,\xi ^k,x^{k-1}\right) -H\left( x^{k+1},\xi ^{k+1},x^{k}\right) . \end{aligned}$$
(4.5)

Furthermore, summing up both sides of the above inequality with k ranging from 1 to m, we have

$$\begin{aligned} 0 \le \sum _{k=1}^{m}\left[ \frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^{2}\right]\le & {} \sum _{k=1}^{m}\left( H\left( x^k,\xi ^k,x^{k-1}\right) -H\left( x^{k+1},\xi ^{k+1},x^{k}\right) \right) \nonumber \\= & {} H\left( x^1,\xi ^1,x^{0}\right) -H\left( x^{m+1},\xi ^{m+1},x^{m}\right) . \end{aligned}$$
(4.6)

Letting \(m \rightarrow \infty \) in (4.6), we conclude that

$$\begin{aligned} \sum _{k=1}^{\infty }\frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^{2}<\infty . \end{aligned}$$

Using the fact that \(\mu \ge L\overline{\beta }^2 \ge L\beta _{k}^2 \), we have \( \sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| ^{2}<\infty .\) Therefore, we immediately obtain that \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \). This proves (ii).

(iii) Let \(x^{*}\) be an accumulation point of \(\left\{ x^{k}\right\} \). Then there exists a subsequence \(\left\{ x^{k_{j}}\right\} \) convergent to \(x^{*}\). From the first-order optimality condition of the subproblem (3.1), we obtain that

$$\begin{aligned} 0 \in \partial g_{n}\left( x^{k_{j}+1}\right) + \nabla f\left( y^{k_{j}}\right) -\xi ^{k_{j}+1}+ L\left( x^{k_{j}+1}-y^{k_{j}}\right) , \end{aligned}$$
(4.7)

where \(\xi ^{k_{j}+1} \in \partial g_{s}\left( x^{k_{j}}\right) \). On the other hand, since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) by (ii), we have

$$\begin{aligned} \begin{aligned} x^{k_{j}+1}-y^{k_{j}}&=x^{k_{j}+1}-\left( x^{k_j}+\beta _{k_j}\left( x^{k_j}-x^{k_j-1}\right) \right) \\&= \left( x^{k_{j}+1}-x^{k_j}\right) -\beta _{k_j}\left( x^{k_j}-x^{k_j-1}\right) \rightarrow 0. \end{aligned} \end{aligned}$$
(4.8)

In addition, recall that the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) is bounded from (i), so without loss of generality we can assume that the limit of the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) exists, and it belongs to \(\partial g_{s}\left( x^{*}\right) \) due to the closedness of \(\partial g_{s}\). Noting the closedness of \(\partial g_{n}\) and the continuity of \(\nabla f\), then taking j to the limit in (4.7), we obtain that

$$\begin{aligned} 0 \in \partial g_{n}\left( x^{*}\right) +\nabla f\left( x^{*}\right) -\partial g_{s}\left( x^{*}\right) , \end{aligned}$$
(4.9)

which means that \(x^{*}\) is a stationary point of \(\Psi \). This completes the proof. \(\square \)

Before analyzing the global convergence results of EPDCA, we will give some properties of the auxiliary sequence \(\{H\left( x^{k},\xi ^k, x^{k-1}\right) \}\).

Proposition 4.1

Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:

  1. (i)

    \(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\).

  2. (ii)

    \(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists.

  3. (iii)

    The set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \), denoted by \(\Upsilon \), is a nonempty compact set. Moreover, \(H \equiv \zeta \) on \(\Upsilon \).

Proof

(i) Computing the subdifferential of H at \(\left( x^{k}, \xi ^{k}, x^{k-1}\right) \), we obtain that for \(k\ge 1\),

$$\begin{aligned} \partial H\left( x^{k}, \xi ^{k}, x^{k-1}\right) =\left[ \begin{array}{c} \nabla f\left( x^{k}\right) +\partial g_{n}\left( x^{k}\right) -\xi ^{k}+\mu \left( x^k-x^{k-1}\right) \\ -x^{k}+\partial g_s^{*}\left( \xi ^{k}\right) \\ -\mu \left( x^{k}-x^{k-1}\right) \end{array}\right] . \end{aligned}$$

Using the first-order optimality of the subproblem (3.1), we have

$$\begin{aligned} -\nabla f\left( y^{k-1}\right) +\xi ^k-L\left( x^k-y^{k-1}\right) \in \partial g_n(x^k). \end{aligned}$$

From the above relation and the fact that \(x^{k-1}\in \partial g_s^*(\xi ^k)\), we obtain further that

$$\begin{aligned} \left[ \begin{array}{c} \nabla f\left( x^{k}\right) -\nabla f\left( y^{k-1}\right) -L\left( x^k-y^{k-1}\right) +\mu \left( x^k-x^{k-1}\right) \\ x^{k-1}-x^{k} \\ -\mu \left( x^{k}-x^{k-1}\right) \end{array}\right] \in \partial H\left( x^{k}, \xi ^{k}, x^{k-1}\right) , \end{aligned}$$

for all \(k \ge 1\). Combining this with the relation \(x^k-y^{k-1}=x^k-x^{k-1}-\beta _{k-1}(x^{k-1}-x^{k-2})\) and the Lipschitz continuity of \(\nabla f\), we conclude that there exists \(A > 0\) such that

$$\begin{aligned} \begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) \le A \left( \left\| x^{k}-x^{k-1}\right\| +\left\| x^{k-1}-x^{k-2}\right\| \right) . \end{aligned} \end{aligned}$$
(4.10)

Since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) and \(\left\| x^{k-1}-x^{k-2}\right\| \rightarrow 0\) from Theorem 4.1 (ii), we deduce that \(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\). This proves (i).

(ii) Noting that the inequality \(H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \ge \Psi (x^k) \) holds in (4.1) and due to the level-boundedness of \(\Psi \), we obtain that the sequence \(\{H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \}\) is bounded below. In addition, the sequence \(\{H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \}\) is nonincreasing from Lemma 4.1. Therefore, we conclude that \(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists. This proves (ii).

(iii) Using the facts \(\{x^k\}\) and \(\{\xi ^k\}\) are bounded from Theorem 4.1 (i), we know immediately that the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is a nonempty compact set \(\Upsilon \).

We now show that \(H \equiv \zeta \) on \(\Upsilon \). To this end, take any \((\hat{x}, \hat{\xi }, \hat{\nu }) \in \Upsilon \); then there exists a convergent subsequence \(\left\{ \left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) \right\} \) such that \(\lim _{j \rightarrow \infty }\left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) =(\hat{x}, \hat{\xi }, \hat{\nu })\). From the lower semicontinuity of H and the definition of \(\zeta \), we have

$$\begin{aligned} H(\hat{x},\hat{\xi },\hat{\nu })\le \liminf \limits _{j \rightarrow \infty } H\left( x^{k_j},\xi ^{k_j}, x^{k_j-1}\right) =\zeta . \end{aligned}$$
(4.11)

On the other hand, since \(x^{k_{j}}\) is the minimizer of the subproblem (3.1), we have

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\quad \le g_{n}(\hat{x})+\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, \hat{x}\right\rangle +\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}. \end{aligned} \end{aligned}$$
(4.12)

By rearranging the terms above, we obtain that

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\quad \le g_{n}(\hat{x})+\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}. \end{aligned} \end{aligned}$$
(4.13)

Using the facts that \(y^{k_{j}-1}=x^{k_{j}-1}+\beta _{k_{j}-1}\left( x^{k_{j}-1}-x^{k_{j}-2}\right) \), \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) from Theorem 4.1(ii), and \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), we have the following relations:

$$\begin{aligned}&\begin{aligned} \left\| x^{k_{j}}-y^{k_{j}-1}\right\| \le \left\| x^{k_{j}}-x^{k_{j}-1}\right\| +\overline{\beta }\left\| x^{k_{j}-1}-x^{k_{j}-2}\right\| \rightarrow 0, \end{aligned} \end{aligned}$$
(4.14)
$$\begin{aligned}&\left\| \hat{x}-y^{k_{j}-1}\right\| =\left\| \hat{x}-x^{k_{j}}+x^{k_{j}}-y^{k_{j}-1}\right\| \le \left\| \hat{x}-x^{k_{j}}\right\| +\left\| x^{k_{j}}-y^{k_{j}-1}\right\| \rightarrow 0. \nonumber \\ \end{aligned}$$
(4.15)

In addition, using the continuous differentiability of f, \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), and the boundedness of \(\left\{ \xi ^{k_{j}}\right\} \) and \(\left\{ y^{k_{j}}\right\} \), we have

$$\begin{aligned} \lim _{j \rightarrow \infty }\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle =0. \end{aligned}$$
(4.16)

Therefore, from the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\), we obtain that

$$\begin{aligned} \begin{aligned}&\zeta =\lim _{j \rightarrow \infty } H\left( x^{k_{j}},\xi ^{k_{j}},x^{k_{j}-1}\right) \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) +\frac{\mu }{2}\left\| x^{k_{j}}-x^{k_{j}-1}\right\| ^{2} \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2} \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\qquad +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) \\&\quad \le \limsup _{j\rightarrow \infty } f\left( x^{k_{j}}\right) +\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}+g_{n}(\hat{x})-\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_s^{*}\left( \xi ^{k_{j}}\right) \\&\quad =\limsup _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}(\hat{x})-\left\langle x^{k_{j}}-x^{k_{j}-1}, \xi ^{k_{j}}\right\rangle -g_{s}\left( x^{k_{j}-1}\right) \\&\quad =f(\hat{x})+g_{n}(\hat{x})-g_{s}(\hat{x})=\Psi (\hat{x}) \le H(\hat{x}, \hat{\xi }, \hat{\nu }), \end{aligned} \end{aligned}$$
(4.17)

where the third equality follows from the facts that \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) and (4.14), the fourth equality follows from (4.16), the first inequality holds due to (4.13), the fifth equality follows from (4.15) and the fact that \(\xi ^{k_{j}} \in \partial g_{s}\left( x^{k_{j}-1}\right) \), the sixth equality holds due to \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) from Theorem 4.1 (ii), the boundedness of \(\left\{ \xi ^{k}\right\} \) and the continuity of f and \(g_s\), while the last inequality follows from the relation (4.1).

Therefore, we conclude that \(H(\hat{x},\hat{\xi },\hat{\nu })=\lim \nolimits _{j \rightarrow \infty } H(x^{k_j},\xi ^{k_j},x^{k_{j}-1})\) from (4.11) and (4.17). Due to the arbitrariness of \((\hat{x},\hat{\xi },\hat{\nu })\), we have \(H \equiv \zeta \) on \(\Upsilon \). This completes the proof. \(\square \)

Next, we are going to analyze the global convergence and the convergence rate of the sequence \(\{x^k\}\) generated by EPDCA to solve (1.1).

Theorem 4.2

(Global convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Assume the auxiliary function \(H(x, \xi ,\nu )\) is a Kurdyka–Łojasiewicz function. Then the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \); moreover, \(\sum \nolimits _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \).

Proof

From Theorem 4.1 (iii), it is sufficient to prove that the sequence \(\left\{ x^{k}\right\} \) is convergent. We first consider the case that there exists an integer \(k > 0\) such that \(H\left( x^{k}, \xi ^k,x^{k-1}\right) =\zeta \). Since the sequence \(\left\{ H\left( x^{k},\xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), we obtain that \(H\left( x^{k+\bar{j}},\xi ^{k+\bar{j}} , x^{k+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Therefore, from (4.5) we conclude that \(x^{k+\bar{j}}=x^{k}\) for all \(\bar{j} \ge 0\), which means that the sequence \(\left\{ x^{k}\right\} \) converges finitely.

In the following, we consider the case that for any \(k>0\), \(H\left( x^{k},\xi ^k, x^{k-1}\right) >\zeta \) holds. Recall from Proposition 4.1 (iii) that \(\Upsilon \) is a compact subset and \(H \equiv \zeta \) on \(\Upsilon \), where \(\Upsilon \) is the set of the accumulation points of \(\left\{ \left( x^{k},\xi ^k, x^{k-1}\right) \right\} \). In addition, noting that \(H(x,\xi ,\nu )\) is a Kurdyka–Łojasiewicz function, according to Lemma 2.1, there exists a positive scalar \(\epsilon \) and a continuous concave function \(\phi \in \Xi _{\eta }\) with \(\eta >0\) such that

$$\begin{aligned} \phi ^{\prime }(H(x, \xi ,\nu )-\zeta )\cdot {\text {dist}}((0,0,0), \partial H(x, \xi , \nu )) \ge 1 \end{aligned}$$

for all \((x, \xi , \nu ) \in W\), where

$$\begin{aligned} \begin{aligned} W=&\left\{ (x, \xi , \nu ) \in \mathbb {R}^{n} \times \mathbb {R}^{n} \times \mathbb {R}^{n}: {\text {dist}}((x, \xi , \nu ), \Upsilon )<\epsilon \right\} \\&\cap \left\{ (x, \xi , \nu ) \in \mathbb {R}^{n} \times \mathbb {R}^{n} \times \mathbb {R}^{n}: \zeta<H(x, \xi , \nu )<\zeta +\eta \right\} . \end{aligned} \end{aligned}$$

Since \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded from Theorem 4.1 (i), and \(\Upsilon \) is the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) from Proposition 4.1 (iii), we obtain that

$$\begin{aligned} \lim _{k \rightarrow \infty } {\text {dist}}\left( \left( x^{k},\xi ^k, x^{k-1}\right) , \Upsilon \right) =0. \end{aligned}$$

Hence, there exists \(\bar{k}_{1}>0\) such that \({\text {dist}}\left( \left( x^{k},\xi ^k, x^{k-1}\right) , \Upsilon \right) <\epsilon \) whenever \(k \ge \bar{k}_{1}\). Similarly, since the sequence \(\left\{ H\left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), there exists \(\bar{k}_{2}>0\) such that \(\zeta<H\left( x^{k},\xi ^k, x^{k-1}\right) <\zeta +\eta \) for all \(k \ge \bar{k}_{2}\). Taking \(\bar{k}=\max \left\{ \bar{k}_{1},\bar{k}_{2}\right\} \), the sequence \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) belongs to W whenever \(k \ge \bar{k}\). Hence we have

$$\begin{aligned} \phi ^{\prime }\left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) \ge 1,~~\forall k \ge \bar{k}. \end{aligned}$$
(4.18)

Since \(\phi \) is a concave function, we see further that

$$\begin{aligned} \begin{aligned}&\left[ \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right] \\&\qquad \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \\&\quad \ge \phi ^{\prime }\left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \\&\qquad \cdot \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) \right) \\&\quad \ge H\left( x^{k}, \xi ^k,x^{k-1}\right) -H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) \ge T \left\| x^{k}-x^{k-1}\right\| ^{2}, \end{aligned} \end{aligned}$$

where the second inequality holds due to (4.18) and the fact that \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing, and the third inequality follows from (4.5) with \(T:=\frac{\mu -L\overline{\beta }^2}{2} \le \frac{\mu -L\beta _k^2}{2}\). From the above inequality and (4.10), we obtain that

$$\begin{aligned} \begin{aligned} \left\| x^{k}-x^{k-1}\right\| ^{2} \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&\cdot \left( \left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| \right) . \end{aligned} \end{aligned}$$
(4.19)

Taking the square root on both sides of (4.19) and using the inequality of arithmetic and geometric means, we have

$$\begin{aligned} \begin{aligned} \left\| x^{k}-x^{k-1}\right\| \le&\sqrt{\frac{2 A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) } \\&\times \sqrt{\frac{\left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| }{2}} \\ \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&+\frac{1}{4}\left\| x^{k-1}-x^{k-2}\right\| +\frac{1}{4}\left\| x^{k}-x^{k-1}\right\| , \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \begin{aligned} \frac{1}{2}\left\| x^{k}-x^{k-1}\right\| \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&+\frac{1}{4}\left( \left\| x^{k-1}-x^{k-2}\right\| -\left\| x^{k}-x^{k-1}\right\| \right) . \end{aligned} \end{aligned}$$
(4.20)

Summing the above relation from \(k=\bar{k}\) to \(\infty \), we obtain that

$$\begin{aligned} \sum _{k=\bar{k}}^{\infty }\left\| x^{k}-x^{k-1}\right\| \le \frac{2 A}{T} \phi \left( H\left( x^{\bar{k}}, \xi ^{\bar{k}},x^{\bar{k}-1}\right) -\zeta \right) +\frac{1}{2}\left\| x^{\bar{k}-1}-x^{\bar{k}-2}\right\| <\infty , \end{aligned}$$

which implies that \(\sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \), i.e., the sequence \(\left\{ x^{k}\right\} \) is a Cauchy sequence. Thus, the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \) by Theorem 4.1 (iii). This completes the proof. \(\square \)

Remark 4.1

It is worth mentioning that the idea of Theorem 4.2 is similar to that of [4, Theorem 2.9], but the details are slightly different. To make the proof easier to read and understand, we have written it out in full.

In the following, we consider the convergence rate of the sequence \(\left\{ x^{k}\right\} \) under the assumption that the auxiliary function \(H(x, \xi , \nu )\) is a KL function whose \(\phi \in \Xi _\eta \) takes the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\).

Theorem 4.3

(Rate of convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Suppose that \(\left\{ x^{k}\right\} \) converges to some \(x^\star \in \mathcal {X}\), the auxiliary function \(H(x, \xi ,\nu )\) is a KL function with \(\phi \) in the KL inequality taking the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\) and \(c>0 .\) Then the following statements hold:

  1. (i)

    If \(\theta =0\), then there exists \(k_{0}>0\) so that \(x^{k}\) is constant for \(k>k_{0}\).

  2. (ii)

    If \(\theta \in \left( 0, \frac{1}{2}\right] \), then there exist \(c_{1}>0, k_{1}>0\), and \(\eta \in (0,1)\) such that \(\left\| x^{k}-x^\star \right\| <c_{1} \eta ^{k}\) for \(k>k_{1}\).

  3. (iii)

    If \(\theta \in \left( \frac{1}{2}, 1\right) \), then there exist \(c_{2}>0\) and \(k_{2}>0\) such that \(\left\| x^{k}-x^\star \right\| <c_{2} k^{-\frac{1-\theta }{2 \theta -1}}\) for \(k>k_{2}\).

Proof

(i) If \(\theta =0\), we first prove that there must exist an integer \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Assume to the contrary that \(H\left( x^{k},\xi ^{k}, x^{k-1}\right) >\zeta \) for all \(k>0\). Since \(\lim \limits _{k\rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k},\xi ^{k}, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), from \(\phi (s)=c s\) and the inequality (4.18), we have

$$\begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \ge \frac{1}{c}, \end{aligned}$$

for all sufficiently large k, which contradicts Proposition 4.1 (i). Therefore, there exists \(k_{0}>0\) so that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Since the sequence \(\{H\left( x^{k}, \xi ^k,x^{k-1}\right) \}\) is nonincreasing and convergent to \(\zeta \), it must hold that \(H\left( x^{k_{0}+\bar{j}},\xi ^{k_{0}+\bar{j}}, x^{k_{0}+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Thus, we conclude from (4.5) that \(x^{k}\) is constant for \(k>k_{0}\). This proves (i).

(ii–iii) We next consider the case that \(\theta \in (0,1) .\) If there exists \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \), then we can show that the sequence \(\left\{ x^{k}\right\} \) is finitely convergent as above, and the desired conclusions hold trivially. Therefore, for \(\theta \in (0,1)\), we only need to consider the case when \(H\left( x^{k}, \xi ^k,x^{k-1}\right) >\zeta \) for all \(k>0\).

Define \(R_{k}=H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \) and \(S_{k}=\sum _{j=k}^{\infty }\left\| x^{j}-x^{j-1}\right\| \), where \(S_{k}\) is well-defined due to Theorem 4.2. Using (4.20), for any \(k\ge \bar{k}\), where \(\bar{k}\) is defined as in (4.18), we have

$$\begin{aligned} \begin{aligned} S_{k}=\,&2 \sum _{j=k}^{\infty } \frac{1}{2}\left\| x^{j}-x^{j-1}\right\| \\ \le \,&2 \sum _{j=k}^{\infty }\left[ \frac{A}{T}\left( \phi \left( H\left( x^{j}, \xi ^j,x^{j-1}\right) -\zeta \right) -\phi \left( H\left( x^{j+1}, \xi ^{j+1},x^{j}\right) -\zeta \right) \right) \right. \\&\left. +\,\frac{1}{4}\left( \left\| x^{j-1}-x^{j-2}\right\| -\left\| x^{j}-x^{j-1}\right\| \right) \right] \\ \le \,&\frac{2 A}{T} \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) +\frac{1}{2}\left\| x^{k-1}-x^{k-2}\right\| \\ =&\, \frac{2 A}{T} \phi \left( R_{k}\right) +\frac{1}{2}\left( S_{k-1}-S_{k}\right) . \end{aligned} \end{aligned}$$

Using the above inequality and the fact that the sequence \(\left\{ S_{k}\right\} \) is nonincreasing, we obtain that for all \(k \ge \bar{k}\),

$$\begin{aligned} S_{k} \le \frac{2 A}{T} \phi \left( R_{k}\right) +\frac{1}{2}\left( S_{k-2}-S_{k}\right) . \end{aligned}$$
(4.21)

On the other hand, since \(\lim \limits _{k \rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), from (4.18) with \(\phi (s)=c s^{1-\theta }\), we have

$$\begin{aligned} c(1-\theta )\left( R_{k}\right) ^{-\theta } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \ge 1, \end{aligned}$$
(4.22)

for all sufficiently large k. In addition, from (4.10) and the definition of \(S_{k}\), we obtain that

$$\begin{aligned} \begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right)&\le A\left( \left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| \right) \\&=A\left( S_{k-1}-S_{k+1}\right) \le A\left( S_{k-2}-S_{k}\right) . \end{aligned} \end{aligned}$$
(4.23)

Combining the relations (4.22) and (4.23), we further have

$$\begin{aligned} \left( R_{k}\right) ^{\theta } \le A \cdot c(1-\theta ) \left( S_{k-2}-S_{k}\right) . \end{aligned}$$

Raising both sides of the above inequality to the power \(\frac{1-\theta }{\theta }\) and multiplying by c, we obtain that

$$\begin{aligned} c\left( R_{k}\right) ^{1-\theta } \le c \left( A \cdot c(1-\theta ) \cdot \left( S_{k-2}-S_{k}\right) \right) ^{\frac{1-\theta }{\theta }} . \end{aligned}$$

Combining this with (4.21) and the fact that \(\phi \left( R_{k}\right) =c\left( R_{k}\right) ^{1-\theta }\), we see that for all sufficiently large k,

$$\begin{aligned} S_{k} \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+\frac{1}{2}\left( S_{k-2}-S_{k}\right) \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+S_{k-2}-S_{k}, \nonumber \\ \end{aligned}$$
(4.24)

where \(A_{1}=\frac{2 A}{T} c (A \cdot c(1-\theta ))^{\frac{1-\theta }{\theta }}\).

(ii) When \(\theta \in \left( 0, \frac{1}{2}\right] \), we have \(\frac{1-\theta }{\theta } \ge 1\). Since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) from Theorem 4.1 (ii) and by the definition of \(S_k\), we obtain that \(S_{k-2}-S_{k} \rightarrow 0\); hence \(\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }} \le S_{k-2}-S_{k}\) for all sufficiently large k. Combining this with (4.24), we conclude that there exists \(k_{1}>0\) so that for all \(k \ge k_{1}\), it holds that \(S_{k} \le \left( A_{1}+1\right) \left( S_{k-2}-S_{k}\right) \), which implies that \(S_{k} \le \frac{A_{1}+1}{A_{1}+2} S_{k-2}\). Therefore, we obtain that for any \(k \ge k_{1}\),

$$\begin{aligned} \left\| x^{k}-x^\star \right\| \le \sum _{j=k}^{\infty }\left\| x^{j}-x^{j-1}\right\| =S_{k} \le S_{k_{1}-2}\left( \sqrt{\frac{A_{1}+1}{A_{1}+2}}\right) ^{k-k_{1}+1}. \end{aligned}$$

This proves (ii).

(iii) When \(\theta \in \left( \frac{1}{2}, 1\right) \), we deduce that \(\frac{1-\theta }{\theta }<1\), and hence \(S_{k-2}-S_{k} \le \left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}\) once \(S_{k-2}-S_{k} \le 1\). Combining this with (4.24) and the fact that \(S_{k-2}-S_{k} \rightarrow 0\), we see that there exists \(k_{2}>0\) such that for all \(k \ge k_{2}\),

$$\begin{aligned} \begin{aligned} S_{k} \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }} =\left( A_{1}+1\right) \left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}. \end{aligned} \end{aligned}$$

Raising both sides of the above inequality to the power \(\frac{\theta }{1-\theta }\), we have for any \(k \ge k_{2}\),

$$\begin{aligned} S_{k}^{\frac{\theta }{1-\theta }} \le A_{2}\left( S_{k-2}-S_{k}\right) , \end{aligned}$$

where \(A_{2}=\left( A_{1}+1\right) ^{\frac{\theta }{1-\theta }}\). From [2, Theorem 2], we obtain that there exists \(M>0\) such that \(S_k\le M k^{-\frac{1-\theta }{2 \theta -1}}\) for all sufficiently large k. Since \(\left\| x^{k}-x^\star \right\| \le S_{k}\), the desired conclusion follows. This completes the proof. \(\square \)

5 Numerical experiments

In this section, we perform numerical experiments to illustrate the efficiency of EPDCA for solving the optimization problem (1.1). We compare our algorithm with pDCAe proposed in [31], whose extrapolation parameters \(\{\beta _k\}\) are chosen as in (1.6), and with the general iterative shrinkage and thresholding algorithm (GIST) proposed in [10], on DC regularized least squares problems of the following form:

$$\begin{aligned} \min \left\{ \Psi (x):=\frac{1}{2}\Vert A x-b\Vert ^{2}+g_n(x)-g_s(x) \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.1)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}\), \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. More concrete examples can be found in [31].

In the following we consider two different classes of regularizers. One is the \(\ell _{1-2}\) regularized least squares problem [32]:

$$\begin{aligned} \min \left\{ \Psi _{\ell _{1-2}}(x)=\frac{1}{2}\Vert A x-b\Vert ^{2}+\lambda \Vert x\Vert _{1}-\lambda \Vert x\Vert ~ \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.2)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m},\) and \(\lambda >0\) is the regularization parameter. It is easy to see that (5.2) is in the form of (5.1) with \(g_{n}(x)=\lambda \Vert x\Vert _{1}\) and \(g_{s}(x)=\lambda \Vert x\Vert \). We assume that A does not have zero columns, so that \(\Psi _{\ell _{1-2}}\) is level-bounded [18]. Hence, by Theorem 4.1, the sequence \(\left\{ x^{k}\right\} \) generated by our method to solve problem (5.2) is bounded, and any accumulation point of \(\left\{ x^{k}\right\} \) is a stationary point. In addition, we note from [31, Example 4.1] that if \(\lambda <\frac{1}{2}\Vert A^{T}b\Vert _\infty \), then the sequence \(\left\{ x^{k}\right\} \) generated by our method for solving problem (5.2) is globally convergent.
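For (5.2), the oracles required by the EPDCA sketch of Sect. 3 are explicit. A possible implementation (assuming the convention that the zero vector is taken as the subgradient of \(\lambda \Vert \cdot \Vert \) at the origin; the function name is hypothetical):

```python
import numpy as np

def make_l1_2_oracles(A, b, lam):
    """Oracles for the l_{1-2} problem (5.2): f = 0.5*||Ax - b||^2,
    g_n = lam*||x||_1, g_s = lam*||x||."""
    L = np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of grad f
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_gn = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    def subgrad_gs(x):
        nx = np.linalg.norm(x)
        return lam * x / nx if nx > 0 else np.zeros_like(x)   # 0 lies in the subdiff at 0
    return grad_f, prox_gn, subgrad_gs, L
```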

The other is the logarithmic regularized least squares problem [6]:

$$\begin{aligned} \min \left\{ \Psi _{log}(x)=\frac{1}{2}\Vert A x-b\Vert ^{2}+\sum _{i=1}^{n}\left[ \lambda \log \left( \left| x_{i}\right| +\epsilon \right) -\lambda \log \epsilon \right] ~ \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.3)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}, \epsilon >0\) is a constant and \(\lambda >0\) is the regularization parameter. We observe that (5.3) is in the form of (5.1) with \(g_{n}(x)=\frac{\lambda }{\epsilon }\Vert x \Vert _{1}\) and \(g_{s}(x)=\sum _{i=1}^{n} \lambda \left[ \frac{\left| x_{i}\right| }{\epsilon }-\log \left( \left| x_{i}\right| +\epsilon \right) +\log \epsilon \right] .\) It is easy to show that \(\Psi _{log}\) is level-bounded. Combining this with Theorem 4.2, we see that the sequence \(\left\{ x^{k}\right\} \) generated by EPDCA to solve problem (5.3) is globally convergent.
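The oracles for (5.3) are equally explicit; note that here \(g_s\) is in fact continuously differentiable, with \(\nabla g_s(x)_i=\lambda \,{\text {sign}}(x_i)\left( \frac{1}{\epsilon }-\frac{1}{|x_i|+\epsilon }\right) \). A sketch (function name hypothetical):

```python
import numpy as np

def make_log_oracles(A, b, lam, eps):
    """Oracles for the logarithmic problem (5.3): g_n = (lam/eps)*||x||_1 and
    g_s(x) = sum_i lam*( |x_i|/eps - log(|x_i| + eps) + log(eps) )."""
    L = np.linalg.eigvalsh(A.T @ A).max()
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_gn = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam / (eps * L), 0.0)
    grad_gs = lambda x: lam * np.sign(x) * (1.0 / eps - 1.0 / (np.abs(x) + eps))
    return grad_f, prox_gn, grad_gs, L
```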

In the following, we perform several numerical tests to compare the behavior of our method with that of pDCAe and GIST for solving (5.2) and (5.3). All the numerical experiments have been performed in Matlab 2018b on a 64-bit PC with an Intel(R) Core(TM) i7-4790 CPU (3.60GHz) and 32GB of RAM.

In our numerical experiments, for a given triple (m, n, s), we first randomly generate a matrix \(A \in \mathbb {R}^{m \times n}\) with i.i.d. standard Gaussian entries and normalize each column of A. Then an s-sparse vector \(\hat{y} \in \mathbb {R}^{n}\) is generated uniformly at random, and we set \(y={\text {sign}}(\hat{y})\). Finally, we generate \(b=A y+0.01 \cdot \hat{n}\), where \(\hat{n} \in \mathbb {R}^{m}\) is a random vector with i.i.d. standard Gaussian entries.

For each triple \((m, n,s)=(720i,2560i,80i)\), \(i = 1,2, \ldots , 8\), we generate 50 instances randomly as described above and compare all three methods in terms of their average performance over these instances. We choose the Lipschitz constant L of \(\nabla f\) as \(\lambda _{\max }\left( A^{T} A\right) \), initialize all algorithms at the origin, and terminate them when the number of iterations reaches the maximum of 5000 or when \(\Vert x^{k}-x^{k-1}\Vert / \max \{1,\Vert x^{k}\Vert \}\le 10^{-5}\).
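A sketch of the instance generation and stopping rule just described, reusing the hypothetical helpers from the earlier sketches:

```python
import numpy as np

def random_instance(m, n, s, seed=0):
    """Generate (A, b): column-normalized Gaussian A, s-sparse sign vector y,
    and b = A y + 0.01 * Gaussian noise, as described above."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)                  # normalize each column of A
    y = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)  # uniformly random support
    y[support] = np.sign(rng.standard_normal(s))
    return A, A @ y + 0.01 * rng.standard_normal(m)

# Example: solve one instance of (5.2) with the EPDCA sketch from Sect. 3.
A, b = random_instance(720, 2560, 80)
grad_f, prox_gn, subgrad_gs, L = make_l1_2_oracles(A, b, lam=1e-3)
x = epdca(np.zeros(A.shape[1]), grad_f, prox_gn, subgrad_gs, L,
          betas=epdca_betas(5000), max_iter=5000, tol=1e-5)
```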

Table 1 Solving (5.2) on 50 random instances, \(\lambda = 1 \times 10^{-3}\)
Table 2 Solving (5.2) on 50 random instances, \(\lambda = 5 \times 10^{-4}\)
Table 3 Solving (5.3) on 50 random instances, \(\lambda = 1 \times 10^{-3}\)
Table 4 Solving (5.3) on 50 random instances, \(\lambda = 5 \times 10^{-4}\)

With regard to parameters, for EPDCA, we choose the extrapolation parameters \(\{\beta _{k}\}\) as in (3.2) and set \(a=0.8, b=5, \omega =0.98\). In addition, we adopt the capping strategy of Remark 3.1 (iv) with \(K=500\) in our method to guarantee that \(\overline{\beta }=\sup \limits _{k} \beta _k<1\). For pDCAe, the extrapolation parameters are chosen as the extrapolation coefficients used in [31]. For GIST, the algorithmic framework and the parameter settings can be found in [19, Appendix A].

For the \(\ell _{1-2}\) regularized least squares problem, the numerical results are presented in Tables 1 and 2, which correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. For the logarithmic regularized least squares problem, the numerical results are presented in Tables 3 and 4, which likewise correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. In all the tables, we report the time \(t_{\lambda _{\max }}\) for computing \(L=\lambda _{\max }\left( A^{T} A\right) \), the number of iterations, the CPU time in seconds (which does not include the time for computing L), and the function value at termination, averaged over the 50 random instances.

In Fig. 1, we plot \(\Vert x^k-x^*\Vert \) against the number of iterations to compare the convergence performance of the algorithms for solving the \(\ell _{1-2}\) regularized least squares problem (5.2), where \(x^*\) denotes the approximate solution obtained at termination of the respective method. We choose \(\lambda =1 \times 10^{-3}\) in Fig. 1a and \(\lambda =5 \times 10^{-4}\) in Fig. 1b, corresponding to Tables 1 and 2, respectively. In Fig. 2, we plot the same quantity for the logarithmic regularized least squares problem (5.3), with \(\lambda =1 \times 10^{-3}\) in Fig. 2a and \(\lambda =5 \times 10^{-4}\) in Fig. 2b, corresponding to Tables 3 and 4, respectively. The green, blue and red lines correspond to GIST, pDCAe and EPDCA, respectively.

We conclude from these tables and figures that our method takes fewer iterations and less CPU time than pDCAe and GIST to obtain a good approximation of the minimum value in the two regularized least squares test problems.

Fig. 1 Comparison of GIST, pDCAe and EPDCA with the problem sizes \(m=720\), \(n=2560\) and \(s=80\) for solving the \(\ell _{1-2}\) regularized least squares problem (5.2)

Fig. 2 Comparison of GIST, pDCAe and EPDCA with the problem sizes \(m=720\), \(n=2560\) and \(s=80\) for solving the logarithmic regularized least squares problem (5.3)

6 Conclusion

In this paper, based on the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and on pDCAe [31] for DC programming, we propose a new extrapolated proximal difference-of-convex algorithm (EPDCA), which employs a more general setting of the extrapolation parameters \(\{\beta _k\}\) for solving the DC optimization problem (1.1). We establish global subsequential convergence of the sequence generated by our method. In addition, by assuming the Kurdyka–Łojasiewicz property of the auxiliary function, we establish global convergence of the sequence generated by EPDCA and analyze its convergence rate; these results allow \(g_s\) to be a nonsmooth function. To evaluate the performance of the proposed method, we consider two classes of DC regularized least squares problems, namely the \(\ell _{1-2}\) regularized least squares problem and the logarithmic regularized least squares problem. Numerical results illustrate the efficiency of our method and its superiority over other well-known methods.

In future research, we may exploit the nonmonotone line search technique in [23] to enlarge the step size and thereby speed up the convergence of the proposed algorithm. Furthermore, the difference-of-Moreau-envelopes smoothing technique has been developed for solving DC programs in [30]. These variational methods may inspire further improvements and extensions.