1 Introduction

In recent years, DC (difference-of-convex) optimization problems have arisen in many application areas, such as machine learning [12, 15] and image processing [7, 9, 21]. Owing to their popularity, many numerical algorithms have been proposed for solving these problems [1, 13, 14, 29]. In this paper, we consider a special DC problem of the following form:

$$\begin{aligned} \min \left\{ \Psi (x):=f(x)+g(x) \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(1.1)

where g is a difference-of-convex function:

$$\begin{aligned} g(x)=g_{n}(x)-g_{s}(x), \end{aligned}$$
(1.2)

\(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a smooth convex function with Lipschitz continuous gradient, \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. In addition, we assume that the objective function \(\Psi \) is level-bounded, which implies that \(\inf \Psi (x)>-\infty \) and that the set of global minimizers of (1.1) is nonempty.

Since problem (1.1) belongs to the class of DC problems, the well-known classical DC algorithm (DCA) [26] can be applied to solve it. The main iteration step is:

$$\begin{aligned} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ f(x)+g_{n}(x)-\left\langle \xi ^{k}, x\right\rangle \right\} , \end{aligned}$$
(1.3)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) is a subgradient of \(g_s\) at \(x^{k} \in \mathbb {R}^{n}\). As the scale of the problem grows, however, the original DCA is probably not very efficient, since the subproblem (1.3) has to be solved exactly at each iteration. By using the proximal mapping and a specific DC decomposition [27], large-scale DC optimization problems can be solved via the proximal DC algorithm (PDCA) [11]. It can be suitably applied to problem (1.1), in which case the new iterate is obtained by solving the following proximal subproblem:

$$\begin{aligned} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( x^{k}\right) -\xi ^{k}, x\right\rangle +\frac{L}{2}\left\| x-x^{k}\right\| ^{2}\right\} , \end{aligned}$$
(1.4)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) and \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\). However, this algorithm may take a large number of iterations.
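To make (1.4) concrete: when \(g_{n}(x)=\lambda \Vert x\Vert _{1}\), the subproblem has the closed-form soft-thresholding solution. The following minimal Python sketch of one PDCA iteration assumes this choice of \(g_n\) together with user-supplied oracles `grad_f` and `subgrad_gs` (these names are illustrative, not from the original):

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pdca_step(x, grad_f, subgrad_gs, L, lam):
    """One PDCA iteration (1.4) with g_n = lam * ||.||_1 (a sketch).

    grad_f and subgrad_gs are callables returning the gradient of f and
    a subgradient of g_s at a point, respectively (assumed oracles).
    """
    xi = subgrad_gs(x)                  # xi^k in the subdifferential of g_s at x^k
    v = x - (grad_f(x) - xi) / L        # gradient-type step on f - <xi, .>
    return soft_threshold(v, lam / L)   # closed-form solution of (1.4)
```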

Therefore, various attempts have been made to accelerate PDCA. By using extrapolation techniques, Wen et al. [31] proposed a proximal DC algorithm with extrapolation (pDCAe) to solve problem (1.1). The key iteration step is as follows:

$$\begin{aligned} \left\{ \begin{array}{l} y^{k}=x^{k}+\beta _{k}\left( x^{k}-x^{k-1}\right) , \\ x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( y^{k}\right) -\xi ^{k}, x\right\rangle +\frac{L}{2 }\left\| x-y^{k}\right\| ^{2}\right\} , \end{array}\right. \end{aligned}$$
(1.5)

where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta _{k}\left( x^{k}-x^{k-1}\right) \) is an extrapolation term. The choice of extrapolation parameters \(\{\beta _{k}\}\) in [31] covers many popular settings, including the one used in the fast iterative shrinkage-thresholding algorithm (FISTA) with the fixed restart strategy [24] for solving (1.1) in the case \(g(x)=g_n(x)\), namely

$$\begin{aligned} \beta _k=\frac{t_{k-1}-1}{t_k} \ \text {with} \ t_{k}=\frac{1+\sqrt{1+4t_{k-1}^2}}{2}, \end{aligned}$$
(1.6)

where \(t_{-1}=t_{0}=1\). Recently, based on the difference-of-Moreau-envelopes smoothing technique, Sun et al. [30] developed an inexact gradient descent method for solving DC programs, whose iteration reads:

$$\begin{aligned} \left\{ \begin{array}{l} x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x) + \left\langle \nabla f\left( x^{k}\right) , x\right\rangle +\frac{1}{2 \upsilon }\left\| x-z^{k}\right\| ^{2}\right\} , \\ x_{\upsilon g_{s}}\left( z^{k}\right) =\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_s(x)+\frac{1}{2 \upsilon }\left\| x-z^{k}\right\| ^{2}\right\} ,\\ z^{k+1}=z^{k}+\beta \left( x^{k+1}-x_{\upsilon g_{s}}\left( z^{k}\right) \right) , \end{array}\right. \end{aligned}$$
(1.7)

where \(\upsilon \in (0, \frac{1}{L})\), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta \in (0,2)\).
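A minimal sketch of one pass of (1.7), assuming user-supplied oracles `grad_f` for \(\nabla f\) and `prox_gn`, `prox_gs` for the two proximal subproblems (all names illustrative):

```python
def smoothing_step(x, z, grad_f, prox_gn, prox_gs, v, beta):
    """One iteration of the difference-of-Moreau-envelopes scheme (1.7).

    prox_g(w, v) is assumed to return argmin_x { g(x) + ||x - w||^2 / (2*v) }
    for g = g_n and g = g_s, respectively.
    """
    x_new = prox_gn(z - v * grad_f(x), v)        # first subproblem of (1.7)
    z_new = z + beta * (x_new - prox_gs(z, v))   # z-update with relaxation beta
    return x_new, z_new
```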

Other accelerated DC algorithms have also been proposed for constrained DC programming, with the aim of improving both the quality of solutions and the rate of convergence. Pang et al. [25] proposed an enhanced DC algorithm (EDCA) that solves the considered DC problem while speeding up subsequential convergence. Building on EDCA, Lu et al. [22, 23] developed an enhanced proximal DC algorithm (EPDCA) to improve the efficiency of solving constrained DC programs in which \(g_s\) in (1.2) is the supremum of many convex smooth functions, and further proposed a nonmonotone EPDCA by utilizing a nonmonotone line search technique. Moreover, Sun et al. [30] proposed algorithms that combine the difference-of-Moreau-envelopes smoothing technique with the augmented Lagrangian function and obtained strong convergence results.

In this paper, motivated by the success of pDCAe in accelerating the original PDCA for solving structured DC optimization problems, we aim to investigate a more general extrapolation scheme in order to improve the numerical performance of PDCA. Concretely, inspired by the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and by pDCAe in [31], we consider a new extrapolated proximal DC algorithm (EPDCA) with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\), in which a power parameter \(\omega \in (0, 1]\) is introduced into the extrapolation scheme.

We first prove that any accumulation point of the sequence generated by our method is a stationary point under very general conditions. Moreover, based on the Kurdyka–Łojasiewicz property, we establish the global convergence and the convergence rate of the sequence generated by our method, without requiring \(g_s\) to be continuously differentiable. Finally, we conduct numerical experiments to show the efficiency of our method.

The main contributions of this work can be summarized as follows:

  • We propose a novel extrapolation scheme with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\) in EPDCA to solve the DC optimization problem (1.1) for potential acceleration, and we establish the convergence results of our method.

  • By constructing an auxiliary function, we prove the global convergence results of EPDCA without the assumption that \(g_s\) is a locally Lipschitz continuous function, which extends the convergence results of pDCAe developed in [31].

  • Our numerical results show that our method performs efficiently on the \(\ell _{1-2}\) regularized least squares problem [32] and the logarithmic regularized least squares problem [6].

The rest of this paper is organized as follows. In Sect. 2, some basic definitions used throughout the paper are given. Section 3 presents the framework of EPDCA and discusses the selection of the extrapolation parameters \(\{\beta _k\}\). In Sect. 4, we analyze the convergence behavior of our method. In Sect. 5, numerical experiments are conducted to show the efficiency of the proposed method in comparison with existing methods. Section 6 concludes the paper.

2 Preliminaries

In this paper, the Euclidean scalar product of \(\mathbb {R}^{n}\) and its corresponding norm are denoted by \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert \), respectively. For an \(m \times n\) matrix A, its transpose is denoted by \(A^{T}\). If A is a symmetric matrix, we use \(\lambda _{\max }(A)\) and \(\lambda _{\min }(A)\) to denote its largest and smallest eigenvalues, respectively. For a given nonempty closed set \(\mathcal {C} \subseteq \mathbb {R}^{n}\), let \({\text {dist}}(\cdot , \mathcal {C}): \mathbb {R}^{n} \rightarrow \mathbb {R}\) denote the distance function to \(\mathcal {C}\), i.e.

$$\begin{aligned} {\text {dist}}(x, \mathcal {C}):=\inf _{y \in \mathcal {C}}\Vert x-y\Vert , \quad \forall x \in \mathbb {R}^{n}. \end{aligned}$$
(2.1)

For an extended real-valued function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the set

$$\begin{aligned} {\text {dom}} f:=\left\{ x \in \mathbb {R}^{n}: f(x)<\infty \right\} \end{aligned}$$
(2.2)

denotes its effective domain. The function f is called proper if it does not take the value \(-\infty \) and \({\text {dom}} f\) is nonempty.

The set

$$\begin{aligned} \text{ epi }f:=\left\{ (x, t)\in \mathbb {R}^{n}\times \mathbb {R}: f(x) \le t\right\} \end{aligned}$$
(2.3)

denotes the epigraph of f, and f is closed (resp. convex) if \(\text{ epi }f\) is closed (resp. convex). A proper closed function f is said to be level-bounded if the lower level set \(\left\{ x \in \mathbb {R}^{n}: f(x) \le r\right\} \) is bounded for any \(r \in \mathbb {R}\).

Given a proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the Fréchet subdifferential of f at \(x \in {\text {dom}} f\) is given by

$$\begin{aligned} \partial ^{F} f(x):=\left\{ y \in \mathbb {R}^{n}: \liminf _{z \rightarrow x} \frac{f(z)-f(x)-\langle y, z-x\rangle }{\Vert z-x\Vert } \ge 0\right\} , \end{aligned}$$
(2.4)

while for \(x \notin {\text {dom}} f, \partial ^{F} f(x)=\emptyset \). The limiting subdifferential of f at \(x \in {\text {dom}} f\) is defined as

$$\begin{aligned} \partial f(x):=\left\{ y \in \mathbb {R}^{n}: \exists \, x^{k} \rightarrow x \text{ with } f\left( x^{k}\right) \rightarrow f(x) \text{ and } y^{k} \in \partial ^{F} f\left( x^{k}\right) \text{ such } \text{ that } y^{k} \rightarrow y\right\} , \end{aligned}$$
(2.5)

while for \(x \notin {\text {dom}} f\), \(\partial f(x)=\emptyset \). If f is convex, the Fréchet subdifferential and the limiting subdifferential coincide with the convex subdifferential, that is,

$$\begin{aligned} \partial ^{F} f(x)=\partial f(x)=\left\{ y \in \mathbb {R}^{n}: f(z) \ge f(x)+\langle y, z-x\rangle \text{ for } \text{ all } z \in \mathbb {R}^{n}\right\} . \end{aligned}$$
(2.6)
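For example, for the convex function \(f(x)=|x|\) on \(\mathbb {R}\), all three subdifferentials coincide, with \(\partial f(x)=\{{\text {sign}}(x)\}\) for \(x \ne 0\) and \(\partial f(0)=[-1,1]\).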

A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is called \(\sigma \)-strongly convex with \(\sigma \ge 0\) if for any \(x,z \in {\text {dom}} f\) and \(\lambda \in [0,1]\), it holds that

$$\begin{aligned} f(\lambda x+(1-\lambda ) z) \le \lambda f(x)+(1-\lambda ) f(z)-\frac{\sigma }{2} \lambda (1-\lambda )\Vert x-z\Vert ^{2}. \end{aligned}$$
(2.7)

Moreover, if f is a \(\sigma \)-strongly convex function, it is known that for any \(x \in {\text {dom}} \partial f\), \(y \in \partial f(x)\) and \(z \in {\text {dom}} f\), we have

$$\begin{aligned} f(z) \ge f(x)+\langle y, z-x \rangle +\frac{\sigma }{2}\Vert z-x\Vert ^{2} . \end{aligned}$$
(2.8)

For a proper closed convex function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), let \(f^{*}(u)=\sup _{x\in \mathbb {R}^{n}}\{\langle u, x\rangle -f(x)\}\) denote the conjugate function of f. Then \(f^{*}\) is proper, closed and convex, and for any x and y, Young's inequality holds, that is,

$$\begin{aligned} f(x)+f^{*}(y) \ge \langle x, y\rangle , \end{aligned}$$
(2.9)

where equality holds if and only if \(y \in \partial f(x)\). Moreover, for any x and y, one has \(y \in \partial f(x)\) if and only if \(x \in \partial f^{*}(y)\).
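For instance, if \(f(x)=\frac{1}{2}\Vert x\Vert ^{2}\), then \(f^{*}(y)=\frac{1}{2}\Vert y\Vert ^{2}\), and (2.9) reduces to \(\frac{1}{2}\Vert x\Vert ^{2}+\frac{1}{2}\Vert y\Vert ^{2} \ge \langle x, y\rangle \), with equality if and only if \(y=x\).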

Finally, we present some preliminaries on the Kurdyka–Łojasiewicz property [2,3,4]. These concepts play a central role in our theoretical and algorithmic developments. For \(\eta \in (0, \infty ]\), we denote by \(\Xi _{\eta }\) the set of all concave continuous functions \(\phi :[0, \eta ) \rightarrow [0, \infty )\) that are continuously differentiable over \((0, \eta )\) with positive derivatives and satisfy \(\phi (0)=0\).

Definition 2.1

(Kurdyka–Łojasiewicz property) A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is said to satisfy the Kurdyka–Łojasiewicz property at \(\overline{x} \in {\text {dom}} \partial f\) if there exist \(\eta \in (0, \infty ]\), a neighborhood W of \(\overline{x}\), and a function \(\phi \in \Xi _{\eta }\) such that for all x in the intersection

$$\begin{aligned} W\cap \left\{ x \in \mathbb {R}^{n}: f(\overline{x})<f(x)<f(\overline{x})+\eta \right\} \end{aligned}$$

it holds that

$$\begin{aligned} \phi ^{\prime }(f(x)-f(\overline{x})) {\text {dist}}(0, \partial f(x)) \ge 1. \end{aligned}$$

If f satisfies the Kurdyka–Łojasiewicz property at any point of dom \(\partial f\), then f is called a Kurdyka–Łojasiewicz function.
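For example, proper closed semialgebraic functions are known to satisfy the Kurdyka–Łojasiewicz property at every point of \({\text {dom}} \partial f\) with \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\) [2, 4]; this class covers, in particular, piecewise polynomial functions and norms such as \(\Vert x\Vert _{1}\).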

The following uniformized Kurdyka–Łojasiewicz property [5] plays an important role in our sequential convergence analysis.

Lemma 2.1

(Uniformized Kurdyka–Łojasiewicz property) Let \(\Omega \subseteq \mathbb {R}^{n}\) be a compact set and let \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) be a proper closed function. If f is constant on \(\Omega \) and satisfies the Kurdyka–Łojasiewicz property at each point of \(\Omega \), then there exist \(\epsilon , \eta >0\) and \(\phi \in \Xi _{\eta }\) such that

$$\begin{aligned} \phi ^{\prime }(f(x)-f(\overline{x})) {\text {dist}}(0, \partial f(x)) \ge 1 \end{aligned}$$

for any \(\overline{x} \in \Omega \) and any x satisfying \({\text {dist}}(x, \Omega )<\epsilon \) and \(f(\overline{x})<f(x)<f(\overline{x})+\eta \).

3 Extrapolated proximal difference-of-convex algorithm

In this section, we present the framework of the extrapolated proximal difference-of-convex algorithm (EPDCA) and give the choice of the extrapolation parameters \(\{\beta _k\}\), which is more general than the one proposed in (1.6).

Algorithm 1 (EPDCA). Choose \(x^{0}=x^{-1} \in \mathbb {R}^{n}\) and extrapolation parameters \(\{\beta _k\} \subseteq [0,1)\). For \(k=0,1,2,\ldots \), take \(\xi ^{k+1} \in \partial g_{s}\left( x^{k}\right) \) and compute

$$\begin{aligned} \left\{ \begin{array}{l} y^{k}=x^{k}+\beta _{k}\left( x^{k}-x^{k-1}\right) , \\ x^{k+1}=\underset{x \in \mathbb {R}^{n}}{\arg \min }\left\{ g_n(x)+\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x\right\rangle +\frac{L}{2}\left\| x-y^{k}\right\| ^{2}\right\} . \end{array}\right. \end{aligned}$$
(3.1)
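To make the scheme concrete, the following Python sketch implements the EPDCA iteration under the assumption that the subproblem (3.1) is solved in closed form via a proximal oracle; the oracle names `grad_f`, `prox_gn` and `subgrad_gs` are illustrative, not from the original:

```python
import numpy as np

def epdca(x0, grad_f, prox_gn, subgrad_gs, L, betas, max_iter=5000, tol=1e-5):
    """Sketch of EPDCA, iterating (3.1) with extrapolation parameters `betas`.

    prox_gn(v) is assumed to return argmin_x { g_n(x) + (L/2) * ||x - v||^2 };
    subgrad_gs(x) returns an element of the subdifferential of g_s at x;
    betas is a sequence of length >= max_iter with sup_k betas[k] < 1.
    """
    x_prev = x = np.asarray(x0, dtype=float)     # x^0 = x^{-1}
    for k in range(max_iter):
        y = x + betas[k] * (x - x_prev)          # extrapolation step
        xi = subgrad_gs(x)                       # xi^{k+1} in subdiff of g_s at x^k
        x_prev, x = x, prox_gn(y - (grad_f(y) - xi) / L)   # solves (3.1)
        if np.linalg.norm(x - x_prev) / max(1.0, np.linalg.norm(x)) <= tol:
            break
    return x
```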

The extrapolation parameters \( \{\beta _k\}\) are selected as follows:

$$\begin{aligned} \beta _k=\frac{t_{k-1}-1}{t_k} \ \text {with} \ t_{k-1}=a(k-1)^\omega +b,~~ k\ge 1, \end{aligned}$$
(3.2)

where \(\omega \in (0, 1]\), \(a \in \mathbb {R}^{+}\) and \(b \in \mathbb {R}\); we set \(t_{-1}=t_0=b\). To avoid a zero denominator, we always assume, without further mention, that \(b \in \mathbb {R}\backslash \{-ak^\omega : k \in \mathbb {N}^{+}\}\).

Remark 3.1

  1. (i)

    If we choose \(\omega =1\), \(a=\frac{1}{\alpha -1}\) with \(\alpha >3\), and \(b=1\) in (3.2), the extrapolation parameters \( \{\beta _k\}\) become:

    $$\begin{aligned} \beta _k=\frac{k-1}{k+\alpha -1}. \end{aligned}$$
    (3.3)

    This setting, proposed by Chambolle and Dossal [8], is thus a special case of our proposed extrapolation parameters (3.2).

  2. (ii)

    Taking \(a=\frac{1}{2}\), \(b=1\) and \(\omega =1\) in (3.2), it follows from [17, Proposition 2] that the extrapolation parameters \( \{\beta _k\}\) in (1.6) and in (3.2) converge to the same value at the same rate, which means that the setting (1.6) is asymptotically consistent with a special case of the parameters we propose in (3.2).

  3. (iii)

    The choice of the power \(\omega \) depends on the specific problem. We choose \(\omega =0.98\) in the numerical experiments for solving two classes of DC regularized least squares problems; see Sect. 5.

  4. (iv)

    We can easily deduce from (3.2) that the sequence \(\{\beta _k\}\) increases to 1. Therefore, given a fixed positive integer K, we can choose \(\beta _k \equiv \beta _K\) whenever \(k>K\), which ensures that the supremum of \(\beta _k\) is strictly less than 1; see the sketch after this remark.

More concrete examples of such settings of the extrapolation parameters have been used extensively for Nesterov's accelerated forward–backward algorithm; see [16] for more details.
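As a small illustration, the following sketch (function names hypothetical) generates the parameters (3.2), including the capping strategy of Remark 3.1 (iv), alongside the FISTA parameters (1.6) for comparison; the defaults \(a=0.8\), \(b=5\), \(\omega =0.98\), \(K=500\) are the values used in Sect. 5:

```python
import math

def fista_betas(num):
    """FISTA extrapolation parameters from (1.6), with t_{-1} = t_0 = 1."""
    betas, t_prev, t = [], 1.0, 1.0
    for _ in range(num):
        betas.append((t_prev - 1.0) / t)     # beta_k = (t_{k-1} - 1) / t_k
        t_prev, t = t, (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return betas

def epdca_betas(num, a=0.8, b=5.0, omega=0.98, K=500):
    """Parameters beta_k from (3.2) with t_k = a * k**omega + b and t_{-1} = b,
    frozen at beta_K for k > K (Remark 3.1 (iv)) so that sup_k beta_k < 1."""
    t = lambda k: a * k**omega + b
    betas = []
    for k in range(num):
        j = min(k, K)                        # freeze the index after K iterations
        t_prev = b if j == 0 else t(j - 1)   # t_{j-1}, with t_{-1} = b
        betas.append((t_prev - 1.0) / t(j))
    return betas
```

In line with Remark 3.1 (ii), for \(a=\frac{1}{2}\), \(b=1\), \(\omega =1\) (and K large) the two sequences approach each other as k grows.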

4 Convergence analysis

In this section, we are going to analyze the convergence behaviors of EPDCA. We first define the following auxiliary sequence:

$$\begin{aligned} H(x^k,\xi ^k,x^{k-1}):=f\left( x^{k}\right) +g_n\left( x^{k}\right) -\langle x^k,\xi ^k\rangle +g_s^*\left( \xi ^{k}\right) +\frac{\mu }{2}\Vert x^{k}-x^{k-1}\Vert ^2, \end{aligned}$$

where \(\mu \in [L\overline{\beta }^2,L]\) with \(\overline{\beta }=\sup \limits _{k} \beta _k<1\), \(\xi ^{k} \in \partial g_s(x^{k-1})\) and \(g_s^{*}\) is the conjugate function of \(g_s\). Based on Young's inequality, we obtain that

$$\begin{aligned} H(x^k,\xi ^{k},x^{k-1})\ge f\left( x^{k}\right) +g_n\left( x^{k}\right) -g_s\left( x^{k}\right) +\frac{\mu }{2}\Vert x^{k}- x^{k-1}\Vert ^2\ge \Psi (x^k). \end{aligned}$$
(4.1)

It is worth mentioning that, using this auxiliary sequence structure, we can establish the global convergence of the sequence generated by our method for solving problem (1.1) without requiring \(g_s\) to be continuously differentiable. A similar idea of constructing such an auxiliary sequence can be found in [20].

Lemma 4.1

Let \(\{x^k\}\) be a sequence generated by EPDCA and let \(\mu \in [L\overline{\beta }^2,L]\). Then it holds that

$$\begin{aligned} H(x^{k+1},\xi ^{k+1},x^{k})-H(x^k,\xi ^{k},x^{k-1}) \le \frac{\mu -L}{2}\Vert x^{k+1}-x^{k}\Vert ^2+\frac{L\beta _k^2-\mu }{2}\Vert x^{k}-x^{k-1}\Vert ^2. \end{aligned}$$
(4.2)

Furthermore, the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\) is nonincreasing.

Proof

Since the objective of the subproblem (3.1) is strongly convex with modulus L and \(x^{k+1}\) is its minimizer, comparing the objective values at \(x^{k+1}\) and \(x^{k}\) yields

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k+1}\right) +\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x^{k+1}\right\rangle +\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2} \\&\qquad \le g_{n}\left( x^{k}\right) +\left\langle \nabla f\left( y^{k}\right) -\xi ^{k+1}, x^{k}\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}-\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^{2}. \end{aligned} \end{aligned}$$

Rearranging terms, we obtain that

$$\begin{aligned} \begin{aligned} g_{n}\left( x^{k+1}\right) \le \,&g_{n}\left( x^{k}\right) +\left\langle -\nabla f\left( y^{k}\right) +\xi ^{k+1}, x^{k+1}-x^k\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}\\&-\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2}-\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^{2}. \end{aligned} \end{aligned}$$

On the other hand, using the fact that \(\nabla f\) is Lipschitz continuous with a Lipschitz constant of \(L>0\), we have

$$\begin{aligned} f\left( x^{k+1}\right) \le f\left( y^{k}\right) +\left\langle \nabla f\left( y^{k}\right) , x^{k+1}-y^{k}\right\rangle +\frac{L}{2}\left\| x^{k+1}-y^{k}\right\| ^{2}. \end{aligned}$$

Using the above two inequalities, we obtain further that

$$\begin{aligned}&f\left( x^{k+1}\right) +g_n\left( x^{k+1}\right) \nonumber \\&\qquad \le f\left( y^{k}\right) +\left\langle \nabla f\left( y^{k}\right) , x^{k}-y^{k}\right\rangle +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle \nonumber \\&\qquad \qquad -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2 \nonumber \\&\qquad \le f\left( x^{k}\right) +\frac{L}{2}\left\| x^{k}-y^{k}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2 \nonumber \\&\qquad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\Vert x^{k+1}-x^{k}\Vert ^2, \nonumber \\ \end{aligned}$$
(4.3)

where the second inequality follows from the convexity of f, and the last inequality holds due to the relation \(x^k-y^k=x^k-(x^k+\beta _k(x^k-x^{k-1}))=-\beta _k(x^k-x^{k-1})\). From the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\) and the conjugate function \(g_s^{*}\), we have

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) =f\left( x^{k+1}\right) +g_n\left( x^{k+1}\right) -\left\langle x^{k+1},\xi ^{k+1}\right\rangle +g_s^*\left( \xi ^{k+1}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2\nonumber \\&\quad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\qquad -\left\langle x^{k+1},\xi ^{k+1}\right\rangle +g_s^*\left( \xi ^{k+1}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\quad = f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) +\left\langle \xi ^{k+1}, x^{k+1}-x^{k}\right\rangle -\frac{L}{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\qquad +\left\langle x^k-x^{k+1},\xi ^{k+1}\right\rangle -g_s\left( x^{k}\right) +\frac{\mu }{2}\left\| x^{k+1}-x^{k}\right\| ^2 \nonumber \\&\quad = f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2-g_s\left( x^{k}\right) , \end{aligned}$$
(4.4)

where the first inequality follows from (4.3), and the second equality follows from the fact that \(\xi ^{k+1} \in \partial g_s(x^k)\). Applying Young's inequality to (4.4), we obtain that

$$\begin{aligned} \begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) \\&\quad \le f\left( x^{k}\right) +\frac{L}{2}\beta _k^2\left\| x^{k}-x^{k-1}\right\| ^{2}+g_{n}\left( x^{k}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2\\&\qquad -\left\langle x^k,\xi ^k \right\rangle + g_s^*\left( \xi ^{k}\right) \\&\quad =f\left( x^{k}\right) +g_{n}\left( x^{k}\right) -\left\langle x^k,\xi ^k \right\rangle + g_s^*\left( \xi ^{k}\right) +\frac{\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^{2}\\&\qquad + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2 +\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}\\&\quad = H\left( x^{k},\xi ^{k},x^{k-1}\right) + \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2+\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}. \end{aligned} \end{aligned}$$

where the second equality follows from the definition of the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\). Therefore,

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^{k},\xi ^{k},x^{k-1}\right) \\&\quad \le \left( \frac{\mu }{2}-\frac{L}{2}\right) \left\| x^{k+1}-x^{k}\right\| ^2+\left( \frac{L}{2}\beta _k^2-\frac{\mu }{2}\right) \left\| x^{k}-x^{k-1}\right\| ^{2}. \end{aligned}$$

Recalling that \(L \bar{\beta }^{2} \le \mu \le L\) and \(\overline{\beta }=\sup \limits _{k} \beta _k<1\), we obtain that for any \(k >0\),

$$\begin{aligned} \frac{\mu }{2}-\frac{L}{2} \le 0, \text{ and } \frac{L}{2} \beta _{k}^{2}-\frac{\mu }{2} \le \frac{L}{2} \bar{\beta }^{2}-\frac{\mu }{2} \le 0. \end{aligned}$$

Consequently, we see immediately that

$$\begin{aligned} H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^{k},\xi ^{k},x^{k-1}\right) \le 0, \end{aligned}$$

which means that the auxiliary sequence \(\left\{ H(x^{k},\xi ^{k},x^{k-1})\right\} \) is nonincreasing. This completes the proof. \(\square \)

Before analyzing the convergence behaviors, on the basis of Fermat’s rule [28, Theorem 10.1], we give the definition of stationary points of the objective function \(\Psi \) as follows.

Definition 4.1

Let \(\Psi \) be given in (1.1). We say that \(\tilde{x}\) is a stationary point of \(\Psi \) if

$$\begin{aligned} 0 \in \nabla f(\tilde{x})+\partial g_n(\tilde{x})-\partial g_s(\tilde{x}). \end{aligned}$$

The set of all stationary points of \(\Psi \) is denoted by \(\mathcal {X}\).

We next give the global subsequential convergence of the sequence \(\{x^k\}\) generated by EPDCA to solve problem (1.1).

Theorem 4.1

(Global subsequential convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:

  1. (i)

    The sequences \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded.

  2. (ii)

    \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \).

  3. (iii)

    Any accumulation point of the sequence \(\left\{ x^{k}\right\} \) is a stationary point of \(\Psi \).

Proof

(i) Since the sequence \(\left\{ H(x^k,\xi ^k,x^{k-1})\right\} \) is nonincreasing from Lemma 4.1, together with the inequality (4.1), we obtain that

$$\begin{aligned} \Psi \left( x^k\right) \le H\left( x^k,\xi ^k,x^{k-1}\right) \le H\left( x^0,\xi ^0,x^{-1}\right) < \infty . \end{aligned}$$

Since \(\Psi \) is level-bounded by assumption, the sequence \(\left\{ x^{k}\right\} \) is bounded. Moreover, due to the boundedness of \(\{x^k\}\), the continuity of \(g_s\), and the fact that \(\xi ^k \in \partial g_s(x^{k-1})\), the sequence \(\{\xi ^k\}\) is also bounded. This proves (i).

(ii) From Lemma 4.1 and according to the fact that \(\mu \le L\), we have

$$\begin{aligned}&H\left( x^{k+1},\xi ^{k+1},x^{k}\right) -H\left( x^k,\xi ^k,x^{k-1}\right) \\&\quad \le \frac{\mu -L}{2}\left\| x^{k+1}-x^{k}\right\| ^2+\frac{L\beta _k^2-\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^2\\&\quad \le \frac{L\beta _k^2-\mu }{2}\left\| x^{k}-x^{k-1}\right\| ^2. \end{aligned}$$

By rearranging the terms above, we obtain that

$$\begin{aligned} \frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^2\le H\left( x^k,\xi ^k,x^{k-1}\right) -H\left( x^{k+1},\xi ^{k+1},x^{k}\right) . \end{aligned}$$
(4.5)

Furthermore, summing up both sides of the above inequality with k ranging from 1 to m, we have

$$\begin{aligned} 0 \le \sum _{k=1}^{m}\left[ \frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^{2}\right]\le & {} \sum _{k=1}^{m}\left( H\left( x^k,\xi ^k,x^{k-1}\right) -H\left( x^{k+1},\xi ^{k+1},x^{k}\right) \right) \nonumber \\= & {} H\left( x^1,\xi ^1,x^{0}\right) -H\left( x^{m+1},\xi ^{m+1},x^{m}\right) . \end{aligned}$$
(4.6)

Letting \(m \rightarrow \infty \) in (4.6), we conclude that

$$\begin{aligned} \sum _{k=1}^{\infty }\frac{\mu -L\beta _k^2}{2}\left\| x^{k}-x^{k-1}\right\| ^{2}<\infty . \end{aligned}$$

Using the fact that \(\mu \ge L\overline{\beta }^2 \ge L\beta _{k}^2 \), we have \( \sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| ^{2}<\infty .\) Therefore, we immediately obtain that \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \). This proves (ii).

(iii) Let \(x^{*}\) be an accumulation point of \(\left\{ x^{k}\right\} \). Then there exists a subsequence \(\left\{ x^{k_{j}}\right\} \) convergent to \(x^{*}\). From the first-order optimality condition of the subproblem (3.1), we obtain that

$$\begin{aligned} 0 \in \partial g_{n}\left( x^{k_{j}+1}\right) + \nabla f\left( y^{k_{j}}\right) -\xi ^{k_{j}+1}+ L\left( x^{k_{j}+1}-y^{k_{j}}\right) , \end{aligned}$$
(4.7)

where \(\xi ^{k_{j}+1} \in \partial g_{s}\left( x^{k_{j}}\right) \). On the other hand, since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) by (ii), we have

$$\begin{aligned} \begin{aligned} x^{k_{j}+1}-y^{k_{j}}&=x^{k_{j}+1}-\left( x^{k_j}+\beta _{k_j}\left( x^{k_j}-x^{k_j-1}\right) \right) \\&= \left( x^{k_{j}+1}-x^{k_j}\right) -\beta _{k_j}\left( x^{k_j}-x^{k_j-1}\right) \rightarrow 0. \end{aligned} \end{aligned}$$
(4.8)

In addition, recall that the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) is bounded from (i), so without loss of generality we can assume that the limit of the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) exists, and it belongs to \(\partial g_{s}\left( x^{*}\right) \) due to the closedness of \(\partial g_{s}\). Noting the closedness of \(\partial g_{n}\) and the continuity of \(\nabla f\), then taking j to the limit in (4.7), we obtain that

$$\begin{aligned} 0 \in \partial g_{n}\left( x^{*}\right) +\nabla f\left( x^{*}\right) -\partial g_{s}\left( x^{*}\right) , \end{aligned}$$
(4.9)

which means that \(x^{*}\) is a stationary point of \(\Psi \). This completes the proof. \(\square \)

Before analyzing the global convergence results of EPDCA, we will give some properties of the auxiliary sequence \(\{H\left( x^{k},\xi ^k, x^{k-1}\right) \}\).

Proposition 4.1

Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:

  1. (i)

    \(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\).

  2. (ii)

    \(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists.

  3. (iii)

    The set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \), denoted by \(\Upsilon \), is a nonempty compact set. Moreover, \(H \equiv \zeta \) on \(\Upsilon \).

Proof

(i) Computing the subdifferential of H at \(\left( x^{k}, \xi ^{k}, x^{k-1}\right) \), we obtain that for \(k\ge 1\),

$$\begin{aligned} \partial H\left( x^{k}, \xi ^{k}, x^{k-1}\right) =\left[ \begin{array}{c} \nabla f\left( x^{k}\right) +\partial g_{n}\left( x^{k}\right) -\xi ^{k}+\mu \left( x^k-x^{k-1}\right) \\ -x^{k}+\partial g_s^{*}\left( \xi ^{k}\right) \\ -\mu \left( x^{k}-x^{k-1}\right) \end{array}\right] . \end{aligned}$$

Using the first-order optimality of the subproblem (3.1), we have

$$\begin{aligned} -\nabla f\left( y^{k-1}\right) +\xi ^k-L\left( x^k-y^{k-1}\right) \in \partial g_n(x^k). \end{aligned}$$

From the above relation and the fact that \(x^{k-1}\in \partial g_s^*(\xi ^k)\), we obtain further that

$$\begin{aligned} \left[ \begin{array}{c} \nabla f\left( x^{k}\right) -\nabla f\left( y^{k-1}\right) -L\left( x^k-y^{k-1}\right) +\mu \left( x^k-x^{k-1}\right) \\ x^{k-1}-x^{k} \\ -\mu \left( x^{k}-x^{k-1}\right) \end{array}\right] \in \partial H\left( x^{k}, \xi ^{k}, x^{k-1}\right) , \end{aligned}$$

for all \(k \ge 1\). Combining this with the relation \(x^k-y^{k-1}=x^k-x^{k-1}-\beta _{k-1}(x^{k-1}-x^{k-2})\) and the Lipschitz continuity of \(\nabla f\), we conclude that there exists \(A > 0\) such that

$$\begin{aligned} \begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) \le A \left( \left\| x^{k}-x^{k-1}\right\| +\left\| x^{k-1}-x^{k-2}\right\| \right) . \end{aligned} \end{aligned}$$
(4.10)

Since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) and \(\left\| x^{k-1}-x^{k-2}\right\| \rightarrow 0\) from Theorem 4.1 (ii), we deduce that \(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\). This proves (i).

(ii) Noting that the inequality \(H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \ge \Psi (x^k) \) holds in (4.1) and due to the level-boundedness of \(\Psi \), we obtain that the sequence \(\{H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \}\) is bounded below. In addition, the sequence \(\{H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \}\) is nonincreasing from Lemma 4.1. Therefore, we conclude that \(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists. This proves (ii).

(iii) Using the facts \(\{x^k\}\) and \(\{\xi ^k\}\) are bounded from Theorem 4.1 (i), we know immediately that the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is a nonempty compact set \(\Upsilon \).

We now show that \(H \equiv \zeta \) on \(\Upsilon \). To this end, take any \((\hat{x}, \hat{\xi }, \hat{\nu }) \in \Upsilon \); then there exists a convergent subsequence \(\left\{ \left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) \right\} \) such that \(\lim _{j \rightarrow \infty }\left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) =(\hat{x}, \hat{\xi }, \hat{\nu })\). From the lower semicontinuity of H and the definition of \(\zeta \), we have

$$\begin{aligned} H(\hat{x},\hat{\xi },\hat{\nu })\le \liminf \limits _{j \rightarrow \infty } H\left( x^{k_j},\xi ^{k_j}, x^{k_j-1}\right) =\zeta . \end{aligned}$$
(4.11)

On the other hand, since \(x^{k_{j}}\) is the minimizer of the subproblem (3.1), we have

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\quad \le g_{n}(\hat{x})+\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, \hat{x}\right\rangle +\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}. \end{aligned} \end{aligned}$$
(4.12)

By rearranging the terms above, we obtain that

$$\begin{aligned} \begin{aligned}&g_{n}\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\quad \le g_{n}(\hat{x})+\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}. \end{aligned} \end{aligned}$$
(4.13)

Using the facts that \(y^{k_{j}-1}=x^{k_{j}-1}+\beta _{k_{j}-1}\left( x^{k_{j}-1}-x^{k_{j}-2}\right) \), \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) from Theorem 4.1(ii), and \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), we have the following relations:

$$\begin{aligned}&\begin{aligned} \left\| x^{k_{j}}-y^{k_{j}-1}\right\| \le \left\| x^{k_{j}}-x^{k_{j}-1}\right\| +\overline{\beta }\left\| x^{k_{j}-1}-x^{k_{j}-2}\right\| \rightarrow 0, \end{aligned} \end{aligned}$$
(4.14)
$$\begin{aligned}&\left\| \hat{x}-y^{k_{j}-1}\right\| =\left\| \hat{x}-x^{k_{j}}+x^{k_{j}}-y^{k_{j}-1}\right\| \le \left\| \hat{x}-x^{k_{j}}\right\| +\left\| x^{k_{j}}-y^{k_{j}-1}\right\| \rightarrow 0. \nonumber \\ \end{aligned}$$
(4.15)

In addition, using the continuous differentiability of f, \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), and the boundedness of \(\left\{ \xi ^{k_{j}}\right\} \) and \(\left\{ y^{k_{j}}\right\} \), we have

$$\begin{aligned} \lim _{j \rightarrow \infty }\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle =0. \end{aligned}$$
(4.16)

Therefore, from the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\), we obtain that

$$\begin{aligned} \begin{aligned}&\zeta =\lim _{j \rightarrow \infty } H\left( x^{k_{j}},\xi ^{k_{j}},x^{k_{j}-1}\right) \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) +\frac{\mu }{2}\left\| x^{k_{j}}-x^{k_{j}-1}\right\| ^{2} \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2} \\&\quad =\lim _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +\left\langle \nabla f\left( y^{k_{j}-1}\right) -\xi ^{k_{j}}, x^{k_{j}}-\hat{x}\right\rangle +\frac{L}{2}\left\| x^{k_{j}}-y^{k_{j}-1}\right\| ^{2}\\&\qquad +g_{n}\left( x^{k_{j}}\right) -\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_{s}^{*}\left( \xi ^{k_{j}}\right) \\&\quad \le \limsup _{j\rightarrow \infty } f\left( x^{k_{j}}\right) +\frac{L}{2}\left\| \hat{x}-y^{k_{j}-1}\right\| ^{2}+g_{n}(\hat{x})-\left\langle x^{k_{j}}, \xi ^{k_{j}}\right\rangle +g_s^{*}\left( \xi ^{k_{j}}\right) \\&\quad =\limsup _{j \rightarrow \infty } f\left( x^{k_{j}}\right) +g_{n}(\hat{x})-\left\langle x^{k_{j}}-x^{k_{j}-1}, \xi ^{k_{j}}\right\rangle -g_{s}\left( x^{k_{j}-1}\right) \\&\quad =f(\hat{x})+g_{n}(\hat{x})-g_{s}(\hat{x})=\Psi (\hat{x}) \le H(\hat{x}, \hat{\xi }, \hat{\nu }), \end{aligned} \end{aligned}$$
(4.17)

where the third equality follows from the facts that \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) and (4.14), the fourth equality follows from (4.16), the first inequality holds due to (4.13), the fifth equality follows from (4.15) and the fact that \(\xi ^{k_{j}} \in \partial g_{s}\left( x^{k_{j}-1}\right) \), the sixth equality holds due to \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) from Theorem 4.1 (ii), the boundedness of \(\left\{ \xi ^{k}\right\} \) and the continuity of f and \(g_s\), while the last inequality follows from the relation (4.1).

Therefore, we conclude that \(H(\hat{x},\hat{\xi },\hat{\nu })=\lim \nolimits _{j \rightarrow \infty } H(x^{k_j},\xi ^{k_j},x^{k_{j}-1})\) from (4.11) and (4.17). Due to the arbitrariness of \((\hat{x},\hat{\xi },\hat{\nu })\), we have \(H \equiv \zeta \) on \(\Upsilon \). This completes the proof. \(\square \)

Next, we are going to analyze the global convergence and the convergence rate of the sequence \(\{x^k\}\) generated by EPDCA to solve (1.1).

Theorem 4.2

(Global convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Assume the auxiliary function \(H(x, \xi ,\nu )\) is a Kurdyka–Łojasiewicz function. Then the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \); moreover, \(\sum \nolimits _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \).

Proof

From Theorem 4.1 (iii), it is sufficient to prove that the sequence \(\left\{ x^{k}\right\} \) is convergent. We first consider the case that there exists an integer \(k > 0\) such that \(H\left( x^{k}, \xi ^k,x^{k-1}\right) =\zeta \). Since the sequence \(\left\{ H\left( x^{k},\xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), we obtain that \(H\left( x^{k+\bar{j}},\xi ^{k+\bar{j}} , x^{k+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Therefore, from (4.5) we conclude that \(x^{k+\bar{j}}=x^{k}\) for all \(\bar{j} \ge 0\), which means that the sequence \(\left\{ x^{k}\right\} \) converges finitely.

In the following, we consider the case that for any \(k>0\), \(H\left( x^{k},\xi ^k, x^{k-1}\right) >\zeta \) holds. Recall from Proposition 4.1 (iii) that \(\Upsilon \) is a compact subset and \(H \equiv \zeta \) on \(\Upsilon \), where \(\Upsilon \) is the set of the accumulation points of \(\left\{ \left( x^{k},\xi ^k, x^{k-1}\right) \right\} \). In addition, noting that \(H(x,\xi ,\nu )\) is a Kurdyka–Łojasiewicz function, according to Lemma 2.1, there exists a positive scalar \(\epsilon \) and a continuous concave function \(\phi \in \Xi _{\eta }\) with \(\eta >0\) such that

$$\begin{aligned} \phi ^{\prime }(H(x, \xi ,\nu )-\zeta )\cdot {\text {dist}}((0,0,0), \partial H(x, \xi , \nu )) \ge 1 \end{aligned}$$

for all \((x, \xi , \nu ) \in W\), where

$$\begin{aligned} \begin{aligned} W=&\left\{ (x, \xi , \nu ) \in \mathbb {R}^{n} \times \mathbb {R}^{n} \times \mathbb {R}^{n}: {\text {dist}}((x, \xi , \nu ), \Upsilon )<\epsilon \right\} \\&\cap \left\{ (x, \xi , \nu ) \in \mathbb {R}^{n} \times \mathbb {R}^{n} \times \mathbb {R}^{n}: \zeta<H(x, \xi , \nu )<\zeta +\eta \right\} . \end{aligned} \end{aligned}$$

Since \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded from Theorem 4.1 (i), and \(\Upsilon \) is the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) from Proposition 4.1 (iii), we obtain that

$$\begin{aligned} \lim _{k \rightarrow \infty } {\text {dist}}\left( \left( x^{k},\xi ^k, x^{k-1}\right) , \Upsilon \right) =0. \end{aligned}$$

Hence, there exists \(\bar{k}_{1}>0\) such that \({\text {dist}}\left( \left( x^{k},\xi ^k, x^{k-1}\right) , \Upsilon \right) <\epsilon \) whenever \(k \ge \bar{k}_{1}\). Similarly, since the sequence \(\left\{ H\left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), there exists \(\bar{k}_{2}>0\) such that \(\zeta<H\left( x^{k},\xi ^k, x^{k-1}\right) <\zeta +\eta \) for all \(k \ge \bar{k}_{2}\). Taking \(\bar{k}=\max \left\{ \bar{k}_{1},\bar{k}_{2}\right\} \), the sequence \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) belongs to W whenever \(k \ge \bar{k}\). Hence we have

$$\begin{aligned} \phi ^{\prime }\left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) \ge 1,~~\forall k \ge \bar{k}. \end{aligned}$$
(4.18)

Since \(\phi \) is a concave function, we see further that

$$\begin{aligned} \begin{aligned}&\left[ \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right] \\&\qquad \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \\&\quad \ge \phi ^{\prime }\left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) \cdot {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \\&\qquad \cdot \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) \right) \\&\quad \ge H\left( x^{k}, \xi ^k,x^{k-1}\right) -H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) \ge T \left\| x^{k}-x^{k-1}\right\| ^{2}, \end{aligned} \end{aligned}$$

where the second inequality holds due to (4.18) and the fact that \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing, and the third inequality follows from (4.5) with \(T:=\frac{\mu -L\overline{\beta }^2}{2} \le \frac{\mu -L\beta _k^2}{2}\). From the above inequality and (4.10), we obtain that

$$\begin{aligned} \begin{aligned} \left\| x^{k}-x^{k-1}\right\| ^{2} \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&\cdot \left( \left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| \right) . \end{aligned} \end{aligned}$$
(4.19)

Taking the square root on both sides of (4.19) and using the inequality of arithmetic and geometric means, we have

$$\begin{aligned} \begin{aligned} \left\| x^{k}-x^{k-1}\right\| \le&\sqrt{\frac{2 A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) } \\&\times \sqrt{\frac{\left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| }{2}} \\ \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&+\frac{1}{4}\left\| x^{k-1}-x^{k-2}\right\| +\frac{1}{4}\left\| x^{k}-x^{k-1}\right\| , \end{aligned} \end{aligned}$$

which implies that

$$\begin{aligned} \begin{aligned} \frac{1}{2}\left\| x^{k}-x^{k-1}\right\| \le&\frac{A}{T}\left( \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) -\phi \left( H\left( x^{k+1}, \xi ^{k+1},x^{k}\right) -\zeta \right) \right) \\&+\frac{1}{4}\left( \left\| x^{k-1}-x^{k-2}\right\| -\left\| x^{k}-x^{k-1}\right\| \right) . \end{aligned} \end{aligned}$$
(4.20)

Summing the above relation from \(k=\bar{k}\) to \(\infty \), we obtain that

$$\begin{aligned} \sum _{k=\bar{k}}^{\infty }\left\| x^{k}-x^{k-1}\right\| \le \frac{2 A}{T} \phi \left( H\left( x^{\bar{k}}, \xi ^{\bar{k}},x^{\bar{k}-1}\right) -\zeta \right) +\frac{1}{2}\left\| x^{\bar{k}-1}-x^{\bar{k}-2}\right\| <\infty , \end{aligned}$$

which implies that \(\sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \), i.e., the sequence \(\left\{ x^{k}\right\} \) is a Cauchy sequence. Thus, the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \) by Theorem 4.1 (iii). This completes the proof. \(\square \)

Remark 4.1

It is worth mentioning that the idea of Theorem 4.2 is similar to that of [4, Theorem 2.9], but the details are slightly different. To make the proof easier to read and understand, we have written it out in full.

In the following, we consider the convergence rate of the sequence \(\left\{ x^{k}\right\} \) under the assumption that the auxiliary function \(H(x, \xi , \nu )\) is a KL function whose \(\phi \in \Xi _\eta \) takes the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\).

Theorem 4.3

(Rate of convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Suppose that \(\left\{ x^{k}\right\} \) converges to some \(x^\star \in \mathcal {X}\), the auxiliary function \(H(x, \xi ,\nu )\) is a KL function with \(\phi \) in the KL inequality taking the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\) and \(c>0 .\) Then the following statements hold:

  1. (i)

    If \(\theta =0\), then there exists \(k_{0}>0\) so that \(x^{k}\) is constant for \(k>k_{0}\).

  2. (ii)

    If \(\theta \in \left( 0, \frac{1}{2}\right] \), then there exist \(c_{1}>0, k_{1}>0\), and \(\eta \in (0,1)\) such that \(\left\| x^{k}-x^\star \right\| <c_{1} \eta ^{k}\) for \(k>k_{1}\).

  3. (iii)

    If \(\theta \in \left( \frac{1}{2}, 1\right) \), then there exist \(c_{2}>0\) and \(k_{2}>0\) such that \(\left\| x^{k}-x^\star \right\| <c_{2} k^{-\frac{1-\theta }{2 \theta -1}}\) for \(k>k_{2}\).

Proof

(i) If \(\theta =0\), we first prove that there must exist an integer \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Assume to the contrary that \(H\left( x^{k},\xi ^{k}, x^{k-1}\right) >\zeta \) for all \(k>0\). Since \(\lim \limits _{k\rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k},\xi ^{k}, x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), from \(\phi (s)=c s\) and the inequality (4.18), we have

$$\begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \ge \frac{1}{c}, \end{aligned}$$

for all sufficiently large k, which contradicts Proposition 4.1 (i). Therefore, there exists \(k_{0}>0\) so that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Since the sequence \(\{H\left( x^{k}, \xi ^k,x^{k-1}\right) \}\) is nonincreasing and convergent to \(\zeta \), it must hold that \(H\left( x^{k_{0}+\bar{j}},\xi ^{k_{0}+\bar{j}}, x^{k_{0}+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Thus, we conclude from (4.5) that \(x^{k}\) is constant for \(k>k_{0}\). This proves (i).

(ii–iii) We next consider the case that \(\theta \in (0,1) .\) If there exists \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \), then we can show that the sequence \(\left\{ x^{k}\right\} \) is finitely convergent as above, and the desired conclusions hold trivially. Therefore, for \(\theta \in (0,1)\), we only need to consider the case when \(H\left( x^{k}, \xi ^k,x^{k-1}\right) >\zeta \) for all \(k>0\).

Define \(R_{k}=H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \) and \(S_{k}=\sum _{j=k}^{\infty }\left\| x^{j}-x^{j-1}\right\| \), where \(S_{k}\) is well-defined due to Theorem 4.2. Using (4.20), for any \(k\ge \bar{k}\), where \(\bar{k}\) is defined as in (4.18), we have

$$\begin{aligned} \begin{aligned} S_{k}=\,&2 \sum _{j=k}^{\infty } \frac{1}{2}\left\| x^{j}-x^{j-1}\right\| \\ \le \,&2 \sum _{j=k}^{\infty }\left[ \frac{A}{T}\left( \phi \left( H\left( x^{j}, \xi ^j,x^{j-1}\right) -\zeta \right) -\phi \left( H\left( x^{j+1}, \xi ^{j+1},x^{j}\right) -\zeta \right) \right) \right. \\&\left. +\,\frac{1}{4}\left( \left\| x^{j-1}-x^{j-2}\right\| -\left\| x^{j}-x^{j-1}\right\| \right) \right] \\ \le \,&\frac{2 A}{T} \phi \left( H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \right) +\frac{1}{2}\left\| x^{k-1}-x^{k-2}\right\| \\ =&\, \frac{2 A}{T} \phi \left( R_{k}\right) +\frac{1}{2}\left( S_{k-1}-S_{k}\right) . \end{aligned} \end{aligned}$$

Using the above inequality and the fact that the sequence \(\left\{ S_{k}\right\} \) is nonincreasing, we obtain that for all \(k \ge \bar{k}\),

$$\begin{aligned} S_{k} \le \frac{2 A}{T} \phi \left( R_{k}\right) +\frac{1}{2}\left( S_{k-2}-S_{k}\right) . \end{aligned}$$
(4.21)

On the other hand, since \(\lim \limits _{k \rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), from (4.18) with \(\phi (s)=c s^{1-\theta }\), we have

$$\begin{aligned} c(1-\theta )\left( R_{k}\right) ^{-\theta } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right) \ge 1, \end{aligned}$$
(4.22)

for all sufficiently large k. In addition, from (4.10) and the definition of \(S_{k}\), we obtain that

$$\begin{aligned} \begin{aligned} {\text {dist}}\left( (0,0,0), \partial H\left( x^{k}, \xi ^k,x^{k-1}\right) \right)&\le A\left( \left\| x^{k-1}-x^{k-2}\right\| +\left\| x^{k}-x^{k-1}\right\| \right) \\&=A\left( S_{k-1}-S_{k+1}\right) \le A\left( S_{k-2}-S_{k}\right) . \end{aligned} \end{aligned}$$
(4.23)

Combining the relations (4.22) and (4.23), we further have

$$\begin{aligned} \left( R_{k}\right) ^{\theta } \le A \cdot c(1-\theta ) \left( S_{k-2}-S_{k}\right) . \end{aligned}$$

Raising both sides of the above inequality to the power \(\frac{1-\theta }{\theta }\) and multiplying by c, we obtain that

$$\begin{aligned} c\left( R_{k}\right) ^{1-\theta } \le c \left( A \cdot c(1-\theta ) \cdot \left( S_{k-2}-S_{k}\right) \right) ^{\frac{1-\theta }{\theta }} . \end{aligned}$$

Combining this with (4.21) and the fact that \(\phi \left( R_{k}\right) =c\left( R_{k}\right) ^{1-\theta }\), we see that for all sufficiently large k,

$$\begin{aligned} S_{k} \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+\frac{1}{2}\left( S_{k-2}-S_{k}\right) \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+S_{k-2}-S_{k}, \nonumber \\ \end{aligned}$$
(4.24)

where \(A_{1}=\frac{2 A}{T} c (A \cdot c(1-\theta ))^{\frac{1-\theta }{\theta }}\).

(ii) When \(\theta \in \left( 0, \frac{1}{2}\right] \), we have \(\frac{1-\theta }{\theta } \ge 1\). Since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) from Theorem 4.1 (ii) and by the definition of \(S_k\), we obtain that \(S_{k-2}-S_{k} \rightarrow 0\); hence \(\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }} \le S_{k-2}-S_{k}\) for all sufficiently large k. Combining this with (4.24), we conclude that there exists \(k_{1}>0\) so that for all \(k \ge k_{1}\), it holds that \(S_{k} \le \left( A_{1}+1\right) \left( S_{k-2}-S_{k}\right) \), which implies that \(S_{k} \le \frac{A_{1}+1}{A_{1}+2} S_{k-2}\). Therefore, we obtain that for any \(k \ge k_{1}\),

$$\begin{aligned} \left\| x^{k}-x^\star \right\| \le \sum _{j=k}^{\infty }\left\| x^{j}-x^{j-1}\right\| =S_{k} \le S_{k_{1}-2}\left( \sqrt{\frac{A_{1}+1}{A_{1}+2}}\right) ^{k-k_{1}+1}. \end{aligned}$$

This proves (ii).

(iii) When \(\theta \in \left( \frac{1}{2}, 1\right) \), we deduce that \(\frac{1-\theta }{\theta }<1\), and hence \(S_{k-2}-S_{k} \le \left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}\) once \(S_{k-2}-S_{k} \le 1\). Combining this with (4.24) and the fact that \(S_{k-2}-S_{k} \rightarrow 0\), we see that there exists \(k_{2}>0\) such that for all \(k \ge k_{2}\),

$$\begin{aligned} \begin{aligned} S_{k} \le A_{1}\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}+\left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }} =\left( A_{1}+1\right) \left( S_{k-2}-S_{k}\right) ^{\frac{1-\theta }{\theta }}. \end{aligned} \end{aligned}$$

Raising both sides of the above inequality to the power \(\frac{\theta }{1-\theta }\), we have for any \(k \ge k_{2}\),

$$\begin{aligned} S_{k}^{\frac{\theta }{1-\theta }} \le A_{2}\left( S_{k-2}-S_{k}\right) , \end{aligned}$$

where \(A_{2}=\left( A_{1}+1\right) ^{\frac{\theta }{1-\theta }}\). From [2, Theorem 2], we obtain that there exists \(M>0\) such that \(S_k\le M k^{-\frac{1-\theta }{2 \theta -1}}\) for all sufficiently large k. Since \(\left\| x^{k}-x^\star \right\| \le S_{k}\), the desired conclusion follows. This completes the proof. \(\square \)

5 Numerical experiments

In this section, we perform numerical experiments to illustrate the efficiency of EPDCA for solving the optimization problem (1.1). We compare our algorithm with pDCAe proposed in [31], whose extrapolation parameters \(\{\beta _k\}\) are chosen as in (1.6), and with the general iterative shrinkage and thresholding algorithm (GIST) proposed in [10], on DC regularized least squares problems of the following form:

$$\begin{aligned} \min \left\{ \Psi (x):=\frac{1}{2}\Vert A x-b\Vert ^{2}+g_n(x)-g_s(x) \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.1)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}\), \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. More concrete examples can be found in [31].

In the following we consider two different classes of regularizers. One is the \(\ell _{1-2}\) regularized least squares problem [32]:

$$\begin{aligned} \min \left\{ \Psi _{\ell _{1-2}}(x)=\frac{1}{2}\Vert A x-b\Vert ^{2}+\lambda \Vert x\Vert _{1}-\lambda \Vert x\Vert ~ \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.2)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m},\) and \(\lambda >0\) is the regularization parameter. It is easy to see that (5.2) is in the form of (5.1) with \(g_{n}(x)=\lambda \Vert x\Vert _{1}\) and \(g_{s}(x)=\lambda \Vert x\Vert \). We assume that A does not have zero columns, so that \(\Psi _{\ell _{1-2}}\) is level-bounded [18]. Hence, by Theorem 4.1, the sequence \(\left\{ x^{k}\right\} \) generated by our method to solve problem (5.2) is bounded, and any accumulation point of \(\left\{ x^{k}\right\} \) is a stationary point. In addition, we note from [31, Example 4.1] that if \(\lambda <\frac{1}{2}\Vert A^{T}b\Vert _\infty \), then the sequence \(\left\{ x^{k}\right\} \) generated by our method for solving problem (5.2) is globally convergent.
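For (5.2), the oracles required by the EPDCA sketch of Sect. 3 are explicit. A possible implementation (assuming the convention that the zero vector is taken as the subgradient of \(\lambda \Vert \cdot \Vert \) at the origin; the function name is hypothetical):

```python
import numpy as np

def make_l1_2_oracles(A, b, lam):
    """Oracles for the l_{1-2} problem (5.2): f = 0.5*||Ax - b||^2,
    g_n = lam*||x||_1, g_s = lam*||x||."""
    L = np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of grad f
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_gn = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    def subgrad_gs(x):
        nx = np.linalg.norm(x)
        return lam * x / nx if nx > 0 else np.zeros_like(x)   # 0 lies in the subdiff at 0
    return grad_f, prox_gn, subgrad_gs, L
```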

The other is the logarithmic regularized least squares problem [6]:

$$\begin{aligned} \min \left\{ \Psi _{log}(x)=\frac{1}{2}\Vert A x-b\Vert ^{2}+\sum _{i=1}^{n}\left[ \lambda \log \left( \left| x_{i}\right| +\epsilon \right) -\lambda \log \epsilon \right] ~ \mid x \in \mathbb {R}^{n}\right\} , \end{aligned}$$
(5.3)

where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}, \epsilon >0\) is a constant and \(\lambda >0\) is the regularization parameter. We observe that (5.3) is in the form of (5.1) with \(g_{n}(x)=\frac{\lambda }{\epsilon }\Vert x \Vert _{1}\) and \(g_{s}(x)=\sum _{i=1}^{n} \lambda \left[ \frac{\left| x_{i}\right| }{\epsilon }-\log \left( \left| x_{i}\right| +\epsilon \right) +\log \epsilon \right] .\) It is easy to show that \(\Psi _{log}\) is level-bounded. Combining this with Theorem 4.2, we see that the sequence \(\left\{ x^{k}\right\} \) generated by EPDCA to solve problem (5.3) is globally convergent.
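The oracles for (5.3) are equally explicit; note that here \(g_s\) is in fact continuously differentiable, with \(\nabla g_s(x)_i=\lambda \,{\text {sign}}(x_i)\left( \frac{1}{\epsilon }-\frac{1}{|x_i|+\epsilon }\right) \). A sketch (function name hypothetical):

```python
import numpy as np

def make_log_oracles(A, b, lam, eps):
    """Oracles for the logarithmic problem (5.3): g_n = (lam/eps)*||x||_1 and
    g_s(x) = sum_i lam*( |x_i|/eps - log(|x_i| + eps) + log(eps) )."""
    L = np.linalg.eigvalsh(A.T @ A).max()
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_gn = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam / (eps * L), 0.0)
    grad_gs = lambda x: lam * np.sign(x) * (1.0 / eps - 1.0 / (np.abs(x) + eps))
    return grad_f, prox_gn, grad_gs, L
```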

In the following, we perform several numerical tests to compare the behavior of our method with that of pDCAe and GIST for solving (5.2) and (5.3). All the numerical experiments have been performed in Matlab 2018b on a 64-bit PC with an Intel(R) Core(TM) i7-4790 CPU (3.60GHz) and 32GB of RAM.

In our numerical experiments, for a given triple (m, n, s), we first randomly generate a matrix \(A \in \mathbb {R}^{m \times n}\) with i.i.d. standard Gaussian entries and normalize each column of A. Then an s-sparse vector \(\hat{y} \in \mathbb {R}^{n}\) is generated uniformly at random, and we set \(y={\text {sign}}(\hat{y})\). Finally, we generate \(b=A y+0.01 \cdot \hat{n}\), where \(\hat{n} \in \mathbb {R}^{m}\) is a random vector with i.i.d. standard Gaussian entries.

For each triple \((m, n,s)=(720i,2560i,80i)\), \(i = 1,2, \ldots , 8\), we generate 50 instances randomly as described above and compare all three methods in terms of their average performance over these instances. We choose the Lipschitz constant L of \(\nabla f\) as \(\lambda _{\max }\left( A^{T} A\right) \), initialize all algorithms at the origin, and terminate them when the number of iterations reaches the maximum of 5000 or when \(\Vert x^{k}-x^{k-1}\Vert / \max \{1,\Vert x^{k}\Vert \}\le 10^{-5}\).
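A sketch of the instance generation and stopping rule just described, reusing the hypothetical helpers from the earlier sketches:

```python
import numpy as np

def random_instance(m, n, s, seed=0):
    """Generate (A, b): column-normalized Gaussian A, s-sparse sign vector y,
    and b = A y + 0.01 * Gaussian noise, as described above."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)                  # normalize each column of A
    y = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)  # uniformly random support
    y[support] = np.sign(rng.standard_normal(s))
    return A, A @ y + 0.01 * rng.standard_normal(m)

# Example: solve one instance of (5.2) with the EPDCA sketch from Sect. 3.
A, b = random_instance(720, 2560, 80)
grad_f, prox_gn, subgrad_gs, L = make_l1_2_oracles(A, b, lam=1e-3)
x = epdca(np.zeros(A.shape[1]), grad_f, prox_gn, subgrad_gs, L,
          betas=epdca_betas(5000), max_iter=5000, tol=1e-5)
```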

Table 1 Solving (5.2) on 50 random instances, \(\lambda = 1 \times 10^{-3}\)
Table 2 Solving (5.2) on 50 random instances, \(\lambda = 5 \times 10^{-4}\)
Table 3 Solving (5.3) on 50 random instances, \(\lambda = 1 \times 10^{-3}\)
Table 4 Solving (5.3) on 50 random instances, \(\lambda = 5 \times 10^{-4}\)

With regard to parameters, for EPDCA, we choose the extrapolation parameters \(\{\beta _{k}\}\) as in (3.2) and set \(a=0.8, b=5, \omega =0.98\). In addition, we adopt the capping strategy of Remark 3.1 (iv) with \(K=500\) in our method to guarantee that \(\overline{\beta }=\sup \limits _{k} \beta _k<1\). For pDCAe, the extrapolation parameters are chosen as the extrapolation coefficients used in [31]. For GIST, the algorithmic framework and the parameter settings can be found in [19, Appendix A].

For the \(\ell _{1-2}\) regularized least squares problem, the numerical results are presented in Tables 1 and 2, which correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. For the logarithmic regularized least squares problem, the numerical results are presented in Tables 3 and 4, which likewise correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. In all the tables, we report the time \(t_{\lambda _{\max }}\) for computing \(L=\lambda _{\max }\left( A^{T} A\right) \), the number of iterations, the CPU time in seconds (which does not include the time for computing L), and the function value at termination, averaged over the 50 random instances.

In Fig. 1, we plot \(\Vert x^k-x^*\Vert \) against the number of iterations to compare the convergence performance of the algorithms for solving the \(\ell _{1-2}\) regularized least squares problem (5.2), where \(x^*\) denotes the approximate solution obtained at termination of the respective method. We choose \(\lambda =1 \times 10^{-3}\) in Fig. 1a and \(\lambda =5 \times 10^{-4}\) in Fig. 1b, corresponding to Tables 1 and 2, respectively. In Fig. 2, we plot the same quantity for the logarithmic regularized least squares problem (5.3), with \(\lambda =1 \times 10^{-3}\) in Fig. 2a and \(\lambda =5 \times 10^{-4}\) in Fig. 2b, corresponding to Tables 3 and 4, respectively. The green, blue and red lines correspond to GIST, pDCAe and EPDCA, respectively.

We conclude from these tables and figures that our method takes fewer iterations and less CPU time than pDCAe and GIST to obtain a good approximation of the minimum value in the two regularized least squares test problems.

Fig. 1 Comparison of GIST, pDCAe and EPDCA with the problem sizes \(m=720\), \(n=2560\) and \(s=80\) for solving the \(\ell _{1-2}\) regularized least squares problem (5.2)

Fig. 2 Comparison of GIST, pDCAe and EPDCA with the problem sizes \(m=720\), \(n=2560\) and \(s=80\) for solving the logarithmic regularized least squares problem (5.3)

6 Conclusion

In this paper, based on the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and on pDCAe [31] for DC programming, we propose a new extrapolated proximal difference-of-convex algorithm (EPDCA), which employs a more general setting of the extrapolation parameters \(\{\beta _k\}\) for solving the DC optimization problem (1.1). We establish global subsequential convergence of the sequence generated by our method. In addition, by assuming the Kurdyka–Łojasiewicz property of the auxiliary function, we establish global convergence of the sequence generated by EPDCA and analyze its convergence rate; these results allow \(g_s\) to be a nonsmooth function. To evaluate the performance of the proposed method, we consider two classes of DC regularized least squares problems, namely the \(\ell _{1-2}\) regularized least squares problem and the logarithmic regularized least squares problem. Numerical results illustrate the efficiency of our method and its superiority over other well-known methods.

In future research, we may exploit the nonmonotone line search technique in [23] to enlarge the step size and thereby speed up the convergence of the proposed algorithm. Furthermore, the difference-of-Moreau-envelopes smoothing technique has been developed for solving DC programs in [30]. These variational methods may inspire further improvements and extensions.