Abstract
This paper considers a class of difference-of-convex (DC) optimization problems, whose objective function is the sum of a convex smooth function and a possibly nonsmooth DC function. We first propose a new extrapolated proximal difference-of-convex algorithm, which incorporates a more general setting of the extrapolation parameters \(\{\beta _k\}\). We then prove the subsequential convergence of the proposed method to a stationary point of the DC problem. Based on the Kurdyka–Łojasiewicz inequality, we establish the global convergence of the whole sequence generated by our method and its convergence rate. Finally, numerical experiments on DC regularized least squares problems demonstrate the efficiency of the proposed method.
1 Introduction
In recent years, DC optimization problems have arisen in many application areas, such as machine learning [12, 15] and image processing [7, 9, 21]. Due to their popularity, many numerical algorithms have been proposed for solving these problems [1, 13, 14, 29]. In this paper, we consider a special DC problem of the following form:

$$\begin{aligned} \min _{x \in \mathbb {R}^{n}} \ \Psi (x):=f(x)+g(x), \end{aligned}$$(1.1)
where g is a difference-of-convex function:

$$\begin{aligned} g(x)=g_{n}(x)-g_{s}(x), \end{aligned}$$(1.2)
\(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a smooth convex function with Lipschitz continuous gradient, \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. In addition, we assume the objective function \(\Psi \) is level-bounded, which implies that \(\inf \Psi (x)>-\infty \) and that the set of global minimizers of (1.1) is nonempty.
Since problem (1.1) is a DC problem, the well-known and classical DC algorithm (DCA) [26] can be applied to solve it. The main iteration step is:

$$\begin{aligned} x^{k+1} \in \mathop {\mathrm {arg\,min}}\limits _{x \in \mathbb {R}^{n}}\left\{ f(x)+g_{n}(x)-\langle \xi ^{k}, x\rangle \right\} , \end{aligned}$$(1.3)
where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) is a subgradient of \(g_s\) at \(x^{k} \in \mathbb {R}^{n}\). As the problem scale grows, the original DCA is probably not very efficient. By using the proximal mapping and a specific DC decomposition [27], large-scale DC optimization problems can be solved via the proximal DC algorithm (PDCA) [11]. It can be suitably applied to problem (1.1), for which the new iterate is obtained by solving the following proximal subproblem:

$$\begin{aligned} x^{k+1}=\mathop {\mathrm {arg\,min}}\limits _{x \in \mathbb {R}^{n}}\left\{ \langle \nabla f(x^{k})-\xi ^{k}, x\rangle +\frac{L}{2}\Vert x-x^{k}\Vert ^{2}+g_{n}(x)\right\} , \end{aligned}$$(1.4)
where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \) and \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\). However, this algorithm may take a large number of iterations.
Therefore, various attempts have been made to accelerate PDCA. Using extrapolation techniques, Wen et al. [31] proposed a proximal DC algorithm with extrapolation (pDCAe) to solve problem (1.1). The key iteration step is as follows:

$$\begin{aligned} {\left\{ \begin{array}{ll} y^{k}=x^{k}+\beta _{k}\left( x^{k}-x^{k-1}\right) , \\ x^{k+1}=\mathop {\mathrm {arg\,min}}\limits _{x \in \mathbb {R}^{n}}\left\{ \langle \nabla f(y^{k})-\xi ^{k}, x\rangle +\frac{L}{2}\Vert x-y^{k}\Vert ^{2}+g_{n}(x)\right\} , \end{array}\right. } \end{aligned}$$(1.5)
where \(\xi ^{k} \in \partial g_{s}\left( x^{k}\right) \), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta _{k}\left( x^{k}-x^{k-1}\right) \) is an extrapolation term. The choice of extrapolation parameters \(\{\beta _{k}\}\) in [31] covers many popular choices of extrapolation parameters, including those used in the fast iterative shrinkage-thresholding algorithm (FISTA) with fixed restart strategy [24] for solving (1.1) in the case \(g(x)=g_n(x)\), which is given as follows:

$$\begin{aligned} \beta _{k}=\frac{t_{k-1}-1}{t_{k}} \quad \text {with} \quad t_{k+1}=\frac{1+\sqrt{1+4 t_{k}^{2}}}{2}, \end{aligned}$$(1.6)
where \(t_{-1}=t_{0}=1\). Recently, based on the difference-of-Moreau-envelopes smoothing technique, Sun et al. [30] developed an inexact gradient descent method for solving DC programs, whose iteration is
where \(\upsilon \in (0, \frac{1}{L})\), \(L>0\) is the Lipschitz continuity modulus of \(\nabla f\) and \(\beta \in (0,2)\).
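To make the restart scheme above concrete, the following sketch (ours, not code from the paper) generates the FISTA-with-fixed-restart extrapolation parameters; it assumes the standard recurrence \(t_{k+1}=(1+\sqrt{1+4t_k^2})/2\) with \(\beta _k=(t_{k-1}-1)/t_k\) and \(t_{-1}=t_0=1\), and the function name and the `restart_every` parameter are illustrative choices.

```python
import math

def fista_betas(num_iters, restart_every=None):
    """Extrapolation parameters from the standard FISTA recurrence:
    t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2 and beta_k = (t_{k-1} - 1) / t_k,
    with t_{-1} = t_0 = 1; a fixed restart resets t every `restart_every` steps."""
    t_prev, t = 1.0, 1.0
    betas = []
    for k in range(num_iters):
        if restart_every is not None and k > 0 and k % restart_every == 0:
            t_prev, t = 1.0, 1.0  # fixed restart: reset the momentum sequence
        betas.append((t_prev - 1.0) / t)
        # advance the t-sequence for the next iteration
        t_prev, t = t, (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return betas

betas = fista_betas(500)   # beta_0 = 0; beta_k grows toward (but stays below) 1
```

Without restarts the parameters increase toward 1; the fixed restart periodically drops them back to 0, which is what keeps the scheme within the admissible range of [31].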
Other accelerated DC algorithms have also been proposed for constrained DC programming to improve the quality of solutions and the rate of convergence. Pang et al. [25] proposed an enhanced DC algorithm (EDCA) to solve the considered DC problem while speeding up the subsequential convergence. Further, based on EDCA, Lu et al. [22, 23] developed the enhanced proximal DC algorithm (EPDCA) to improve the efficiency of solving constrained DC programs in which \(g_s\) in (1.2) is the supremum of finitely many convex smooth functions, and also proposed a nonmonotone EPDCA by utilizing the nonmonotone line search technique. Moreover, Sun et al. [30] proposed algorithms combining the difference-of-Moreau-envelopes smoothing technique with the augmented Lagrangian function and obtained strong convergence results.
In this paper, motivated by the success of pDCAe in accelerating the original PDCA for solving structured DC optimization problems, we aim to investigate a more general extrapolation scheme to improve the numerical performance of PDCA. Concretely, inspired by the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and by pDCAe in [31], we consider a new extrapolated proximal DC algorithm (EPDCA) with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\), in which a power parameter \(\omega \in (0, 1]\) is introduced into the extrapolation scheme.
We first prove that any accumulation point of the sequence generated by our method is a stationary point under very general conditions. Moreover, based on the Kurdyka–Łojasiewicz property, we establish the global convergence and the convergence rate of the sequence generated by our method, without requiring \(g_s\) to be continuously differentiable. Finally, we conduct numerical experiments to show the efficiency of our method.
The main contributions of this work can be summarized as follows:
-
We propose a novel extrapolation scheme with a more general setting of the extrapolation parameters \(\{\beta _{k}\}\) in EPDCA to solve the DC optimization problem (1.1) for potential acceleration. We then establish the convergence results of our method.
-
By constructing an auxiliary function, we prove the global convergence of EPDCA without assuming that \(g_s\) is a locally Lipschitz continuous function, which extends the convergence results of pDCAe developed in [31].
-
According to our numerical results, our method performs efficiently on the \(\ell _{1-2}\) regularized least squares problem [32] and the logarithmic regularized least squares problem [6].
The rest of this paper is organized as follows. In Sect. 2, some basic definitions used in the whole paper are given. Section 3 gives the framework of EPDCA and discusses the selection of the extrapolation parameters \(\{\beta _k\}\). In Sect. 4, we analyze the convergence behavior of our method. Some numerical experiments are conducted to show the efficiency of the proposed method in comparison with other existing methods in Sect. 5. Section 6 concludes the paper.
2 Preliminaries
In this paper, the Euclidean scalar product of \(\mathbb {R}^{n}\) and its corresponding norm are respectively denoted by \(\langle \cdot , \cdot \rangle \) and \(\Vert \cdot \Vert .\) For an \(m \times n\) matrix A, let its transpose be denoted by \(A^{T}\). If A is a symmetric matrix, we use \(\lambda _{\max }(A)\) and \(\lambda _{\min }(A)\) to denote the largest and smallest eigenvalues respectively. For a given nonempty closed set \(\mathcal {C} \subseteq \mathbb {R}^{n}\), let \({\text {dist}}(\cdot , \mathcal {C}): \mathbb {R}^{n} \rightarrow \mathbb {R}\) denote the distance function to \(\mathcal {C}\), i.e.,

$$\begin{aligned} {\text {dist}}(x, \mathcal {C})=\inf _{y \in \mathcal {C}}\Vert x-y\Vert . \end{aligned}$$
For an extended real-valued function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the set

$$\begin{aligned} {\text {dom}} f=\left\{ x \in \mathbb {R}^{n}: f(x)<\infty \right\} \end{aligned}$$
denotes its effective domain. f is called proper if it does not take the value \(-\infty \) and dom f is nonempty.
The set

$$\begin{aligned} {\text {epi}} f=\left\{ (x, r) \in \mathbb {R}^{n} \times \mathbb {R}: f(x) \le r\right\} \end{aligned}$$
denotes the epigraph of f, and f is closed (or convex) if epi f is closed (or convex). A proper closed function f is said to be level-bounded if the lower level set \(\left\{ x \in \mathbb {R}^{n}: f(x) \le r\right\} \) is bounded for any \(r \in \mathbb {R}\).
Given a proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), the Fréchet subdifferential of f at \(x \in {\text {dom}} f\) is given by

$$\begin{aligned} \partial ^{F} f(x)=\left\{ \xi \in \mathbb {R}^{n}: \liminf _{z \rightarrow x,\, z \ne x} \frac{f(z)-f(x)-\langle \xi , z-x\rangle }{\Vert z-x\Vert } \ge 0\right\} , \end{aligned}$$
while for \(x \notin {\text {dom}} f\), \(\partial ^{F} f(x)=\emptyset \). The limiting subdifferential of f at \(x \in {\text {dom}} f\) is defined as

$$\begin{aligned} \partial f(x)=\left\{ \xi \in \mathbb {R}^{n}: \exists \, x^{k} \rightarrow x \text { and } \xi ^{k} \rightarrow \xi \text { with } f(x^{k}) \rightarrow f(x) \text { and } \xi ^{k} \in \partial ^{F} f(x^{k})\right\} , \end{aligned}$$
while for \(x \notin {\text {dom}} f\), \(\partial f(x)=\emptyset \). If f is convex, the Fréchet subdifferential and the limiting subdifferential coincide with the convex subdifferential, that is,

$$\begin{aligned} \partial ^{F} f(x)=\partial f(x)=\left\{ \xi \in \mathbb {R}^{n}: f(z) \ge f(x)+\langle \xi , z-x\rangle \text { for all } z \in \mathbb {R}^{n}\right\} . \end{aligned}$$
A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is called \(\sigma \)-strongly convex with \(\sigma \ge 0\) if for any \(x,z \in {\text {dom}} f\) and \(\lambda \in [0,1]\), it holds that

$$\begin{aligned} f(\lambda x+(1-\lambda ) z) \le \lambda f(x)+(1-\lambda ) f(z)-\frac{\sigma }{2} \lambda (1-\lambda )\Vert x-z\Vert ^{2}. \end{aligned}$$
Moreover, if f is a \(\sigma \)-strongly convex function, then it is known that for any \(x \in {\text {dom}} \partial f\), \(y \in \partial f(x)\) and \(z \in {\text {dom}} f\), we have

$$\begin{aligned} f(z) \ge f(x)+\langle y, z-x\rangle +\frac{\sigma }{2}\Vert z-x\Vert ^{2}. \end{aligned}$$
For a proper closed convex function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\), let \(f^{*}(u)=\sup _{x\in \mathbb {R}^{n}}\{\langle u, x\rangle -f(x)\}\) denote the conjugate function of f. Then \(f^{*}\) is proper closed convex and for any x and y, Young's inequality holds, that is,

$$\begin{aligned} f(x)+f^{*}(y) \ge \langle x, y\rangle , \end{aligned}$$
where equality holds if and only if \(y \in \partial f(x)\). Moreover, for any x and y, one has \(y \in \partial f(x)\) if and only if \(x \in \partial f^{*}(y)\).
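As a quick sanity check of Young's inequality, the following sketch (ours, not from the paper) uses \(f(x)=\frac{1}{2}\Vert x\Vert ^2\), which is its own conjugate; the gap \(f(x)+f^{*}(y)-\langle x, y\rangle \) is always nonnegative and vanishes exactly when \(y \in \partial f(x)=\{x\}\).

```python
import numpy as np

def f(x):
    # f(x) = 0.5 * ||x||^2; this function is its own convex conjugate
    return 0.5 * np.dot(x, x)

f_conj = f  # for this particular f, f*(y) = 0.5 * ||y||^2 = f(y)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# Young's inequality: f(x) + f*(y) >= <x, y>, with equality iff y in ∂f(x) = {x}
gap = f(x) + f_conj(y) - np.dot(x, y)                 # equals 0.5 * ||x - y||^2 >= 0
gap_at_subgradient = f(x) + f_conj(x) - np.dot(x, x)  # choosing y = x attains equality
```

For this choice of f, a direct computation shows the gap equals \(\frac{1}{2}\Vert x-y\Vert ^{2}\), which makes both the inequality and its equality case transparent.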
Finally, we present some preliminaries on the Kurdyka–Łojasiewicz property [2,3,4]. These concepts play a central role in our theoretical and algorithmic developments. For \(\eta \in (0, \infty ]\), we denote by \(\Xi _{\eta }\) the set of all concave continuous functions \(\phi :[0, \eta ) \rightarrow [0, \infty )\) that are continuously differentiable over \((0, \eta )\) with positive derivatives and satisfy \(\phi (0)=0\).
Definition 2.1
(Kurdyka–Łojasiewicz property) A proper closed function \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) is said to satisfy the Kurdyka–Łojasiewicz property at \(\overline{x} \in {\text {dom}} \partial f\) if there exist \(\eta \in (0, \infty ]\), a neighborhood W of \(\overline{x}\), and a function \(\phi \in \Xi _{\eta }\) such that for all x in the intersection

$$\begin{aligned} W \cap \left\{ x \in \mathbb {R}^{n}: f(\overline{x})<f(x)<f(\overline{x})+\eta \right\} , \end{aligned}$$
it holds that

$$\begin{aligned} \phi ^{\prime }\left( f(x)-f(\overline{x})\right) {\text {dist}}\left( 0, \partial f(x)\right) \ge 1. \end{aligned}$$
If f satisfies the Kurdyka–Łojasiewicz property at any point of dom \(\partial f\), then f is called a Kurdyka–Łojasiewicz function.
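To make the definition concrete, here is a standard illustrative example (ours, not taken from the paper): the function \(f(x)=\frac{1}{2}\Vert x\Vert ^{2}\) satisfies the Kurdyka–Łojasiewicz property at \(\overline{x}=0\) with \(\phi (s)=\sqrt{2s}\), since \(\phi ^{\prime }(s)=1/\sqrt{2s}\), \(\partial f(x)=\{x\}\) and \({\text {dist}}(0,\partial f(x))=\Vert x\Vert \):

```latex
\[
  \phi'\bigl(f(x)-f(\bar{x})\bigr)\,\operatorname{dist}\bigl(0,\partial f(x)\bigr)
  = \frac{1}{\sqrt{2\cdot\tfrac{1}{2}\|x\|^{2}}}\;\|x\|
  = 1 \ge 1
  \qquad \text{for all } x \neq 0.
\]
```

In the power form \(\phi (s)=c s^{1-\theta }\) used later in the convergence-rate analysis, this corresponds to the KL exponent \(\theta =\frac{1}{2}\).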
The following uniformized Kurdyka–Łojasiewicz property [5] plays an important role in our sequential convergence analysis.
Lemma 2.1
(Uniformized Kurdyka–Łojasiewicz property) Let \(\Omega \subseteq \mathbb {R}^{n}\) be a compact set and let \(f: \mathbb {R}^{n} \rightarrow (-\infty , \infty ]\) be a proper closed function. If f is constant on \(\Omega \) and satisfies the Kurdyka–Łojasiewicz property at each point of \(\Omega \), then there exist \(\epsilon , \eta >0\) and \(\phi \in \Xi _{\eta }\) such that

$$\begin{aligned} \phi ^{\prime }\left( f(x)-f(\overline{x})\right) {\text {dist}}\left( 0, \partial f(x)\right) \ge 1 \end{aligned}$$
for any \(\overline{x} \in \Omega \) and any x satisfying \({\text {dist}}(x, \Omega )<\epsilon \) and \(f(\overline{x})<f(x)<f(\overline{x})+\eta \).
3 Extrapolated proximal difference-of-convex algorithm
In this section, we present the framework of the extrapolated proximal difference-of-convex algorithm (EPDCA) and describe the choice of the extrapolation parameters \( \{\beta _k\}\), which is more general than the one proposed in (1.6).
The extrapolation parameters \( \{\beta _k\}\) are selected as follows:

$$\begin{aligned} \beta _{k}=\frac{t_{k-1}-1}{t_{k}} \quad \text {with} \quad t_{k}=a k^{\omega }+b, \end{aligned}$$(3.2)
where \(\omega \in (0, 1]\), \(a \in \mathbb {R}^{+}\) and \(b \in \mathbb {R}\). Set \(t_{-1}=t_0=b\). To avoid a zero denominator, we always set \(b \in \mathbb {R}\backslash \{-ak^\omega : k \in \mathbb {N}^{+}\}\) without further mention.
Remark 3.1
-
(i)
If we choose \(\omega =1\), \(a=\frac{1}{\alpha -1}\) with \(\alpha >3\), and \(b=1\) in (3.2), the extrapolation parameters \( \{\beta _k\}\) become:
$$\begin{aligned} \beta _k=\frac{k-1}{k+\alpha -1}. \end{aligned}$$(3.3)

This setting was proposed by Chambolle and Dossal [8] and is a special case of our proposed extrapolation parameters in (3.2).
-
(ii)
Choosing \(a=\frac{1}{2}\), \(b=1\) and \(\omega =1\) in (3.2), we obtain from [17, Proposition 2] that the extrapolation parameters \( \{\beta _k\}\) in (1.6) and (3.2) converge to the same value at the same rate. This means that the setting (1.6) of the extrapolation parameters \( \{\beta _k\}\) is asymptotically consistent with a special case of the parameters we propose in (3.2).
-
(iii)
The choice of the power \(\omega \) depends on the specific problem. We choose \(\omega =0.98\) in the numerical experiments for solving two classes of DC regularized least squares problems; see Sect. 5.
-
(iv)
From (3.2), we can easily deduce that the sequence \(\{\beta _k\}\) increases to 1. Therefore, given a fixed positive integer K, we can choose \(\beta _k \equiv \beta _K\) whenever \(k>K\), which ensures that the supremum of \(\beta _k\) is strictly less than 1.
More concrete examples of such settings of extrapolation parameters have been used extensively in Nesterov's accelerated forward–backward algorithm; see [16] for more details.
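As an illustration, the following sketch (ours, not code from the paper) computes extrapolation parameters of the form \(\beta _k=(t_{k-1}-1)/t_k\) with \(t_k=a k^{\omega }+b\), a reading of (3.2) consistent with the special case (3.3) and with \(t_{-1}=t_0=b\); the function name and default values are illustrative.

```python
def epdca_betas(num_iters, a=1.0, b=1.0, omega=0.98):
    """Extrapolation parameters beta_k = (t_{k-1} - 1) / t_k with
    t_k = a * k**omega + b and t_{-1} = t_0 = b (a hypothesized reading
    of (3.2), consistent with the special case (3.3))."""
    def t(k):
        return b if k <= 0 else a * k ** omega + b
    return [(t(k - 1) - 1.0) / t(k) for k in range(1, num_iters + 1)]

# Sanity check against (3.3): omega = 1, a = 1/(alpha - 1), b = 1
alpha = 4.0
special = epdca_betas(10, a=1.0 / (alpha - 1.0), b=1.0, omega=1.0)
# special[k-1] should equal (k - 1) / (k + alpha - 1)
```

With the default \(a=b=1\) and \(\omega =0.98\), the parameters stay in \([0,1)\) and increase toward 1, matching Remark 3.1 (iv).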
4 Convergence analysis
In this section, we analyze the convergence behavior of EPDCA. We first define the following auxiliary sequence:

$$\begin{aligned} H(x^{k},\xi ^{k},x^{k-1})=f(x^{k})+g_{n}(x^{k})+g_{s}^{*}(\xi ^{k})-\langle x^{k}, \xi ^{k}\rangle +\frac{\mu }{2}\Vert x^{k}-x^{k-1}\Vert ^{2}, \end{aligned}$$
where \(\mu \in [L\overline{\beta }^2,L]\) with \(\overline{\beta }=\sup _{k} \beta _k<1\), \(\xi ^{k} \in \partial g_s(x^{k-1})\) and \(g_s^{*}\) is the conjugate function of \(g_s\). Based on Young's inequality, we obtain that

$$\begin{aligned} H(x^{k},\xi ^{k},x^{k-1}) \ge \Psi (x^{k}). \end{aligned}$$(4.1)
It is worth mentioning that, by using this auxiliary sequence, we can establish the global convergence of the sequence generated by our method for solving problem (1.1) without requiring \(g_s\) to be continuously differentiable. A similar idea of constructing such an auxiliary sequence can be found in [20].
Lemma 4.1
Let \(\{x^k\}\) be the sequence generated by EPDCA and let \(\mu \in [L\overline{\beta }^2,L]\). Then it holds that
Furthermore, the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\) is nonincreasing.
Proof
Using the strong convexity of the objective in the subproblem (3.1), we compare its values at \(x^{k+1}\) and \(x^k\) to obtain
Rearranging terms, we obtain that
On the other hand, using the fact that \(\nabla f\) is Lipschitz continuous with a Lipschitz constant of \(L>0\), we have
Using the above two inequalities, we obtain further that
where the second inequality follows from the convexity of f, and the last inequality holds due to the relation \(x^k-y^k=x^k-(x^k+\beta _k(x^k-x^{k-1}))=-\beta _k(x^k-x^{k-1})\). From the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\) and the conjugate function \(g_s^{*}\), we have
where the first inequality follows from (4.3), and the second equality follows from the fact that \(\xi ^{k+1} \in \partial g_s(x^k)\). Combining Young's inequality with (4.4), we obtain that
where the second equality follows from the definition of the sequence \(\{H(x^k,\xi ^{k},x^{k-1})\}\). Therefore,
Recall that \(L \bar{\beta }^{2} \le \mu \le L\) and \(\overline{\beta }=\sup _{k} \beta _k<1\); we obtain that for any \(k >0\),
Consequently, we see immediately that
which means that the auxiliary sequence \(\left\{ H(x^{k},\xi ^{k},x^{k-1})\right\} \) is nonincreasing. This completes the proof. \(\square \)
Before analyzing the convergence behaviors, on the basis of Fermat’s rule [28, Theorem 10.1], we give the definition of stationary points of the objective function \(\Psi \) as follows.
Definition 4.1
Let \(\Psi \) be given in (1.1). We say that \(\tilde{x}\) is a stationary point of \(\Psi \) if

$$\begin{aligned} 0 \in \nabla f(\tilde{x})+\partial g_{n}(\tilde{x})-\partial g_{s}(\tilde{x}). \end{aligned}$$
The set of all stationary points of \(\Psi \) is denoted by \(\mathcal {X}\).
We next establish the global subsequential convergence of the sequence \(\{x^k\}\) generated by EPDCA for solving problem (1.1).
Theorem 4.1
(Global subsequential convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:
-
(i)
The sequences \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded.
-
(ii)
\(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \).
-
(iii)
Any accumulation point of the sequence \(\left\{ x^{k}\right\} \) is a stationary point of \(\Psi \).
Proof
(i) Since the sequence \(\left\{ H(x^k,\xi ^k,x^{k-1})\right\} \) is nonincreasing from Lemma 4.1, together with the inequality (4.1), we obtain that
Since \(\Psi \) is level-bounded by assumption, the sequence \(\left\{ x^{k}\right\} \) is bounded. Moreover, due to the boundedness of \(\{x^k\}\), the continuity of \(g_s\), and the fact that \(\xi ^k \in \partial g_s(x^{k-1})\), the sequence \(\{\xi ^k\}\) is also bounded. This proves (i).
(ii) From Lemma 4.1 and the fact that \(\mu \le L\), we have
By rearranging the terms above, we obtain that
Furthermore, summing up both sides of the above inequality with k ranging from 1 to m, we have
Letting \(m \rightarrow \infty \) in (4.6), we conclude that
Using the fact that \(\mu \ge L\overline{\beta }^2 \ge L\beta _{k}^2 \), we have \( \sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| ^{2}<\infty .\) Therefore, we immediately obtain that \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) as \(k \rightarrow \infty \). This proves (ii).
(iii) Let \(x^{*}\) be an accumulation point of \(\left\{ x^{k}\right\} \). Then there exists a subsequence \(\left\{ x^{k_{j}}\right\} \) convergent to \(x^{*}\). From the first-order optimality condition of the subproblem (3.1), we obtain that
where \(\xi ^{k_{j}+1} \in \partial g_{s}\left( x^{k_{j}}\right) \) . On the other hand, due to the fact that \(\left\| x^{k_j}-x^{k_j-1}\right\| \rightarrow 0\) from (ii), we have
In addition, recall that the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) is bounded from (i), so without loss of generality we can assume that the limit of the sequence \(\left\{ \xi ^{k_{j}+1}\right\} \) exists, and it belongs to \(\partial g_{s}\left( x^{*}\right) \) due to the closedness of \(\partial g_{s}\). Noting the closedness of \(\partial g_{n}\) and the continuity of \(\nabla f\), and taking the limit as \(j \rightarrow \infty \) in (4.7), we obtain that
which means that \(x^{*}\) is a stationary point of \(\Psi \). This completes the proof. \(\square \)
Before analyzing the global convergence of EPDCA, we present some properties of the auxiliary sequence \(\{H\left( x^{k},\xi ^k, x^{k-1}\right) \}\).
Proposition 4.1
Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Then the following statements hold:
-
(i)
\(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\).
-
(ii)
\(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists.
-
(iii)
The set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \), denoted by \(\Upsilon \), is a nonempty compact set. Moreover, \(H \equiv \zeta \) on \(\Upsilon \).
Proof
(i) Taking the subdifferential of the auxiliary function H at \(\left( x^{k}, \xi ^{k}, x^{k-1}\right) \), we obtain that for \(k\ge 1\),
Using the first-order optimality of the subproblem (3.1), we have
From the above relation and the fact that \(x^{k-1}\in \partial g_s^*(\xi ^k)\), we obtain further that
for all \(k \ge 1\). Combining this with the relation \(x^k-y^{k-1}=x^k-x^{k-1}-\beta _{k-1}(x^{k-1}-x^{k-2})\) and the Lipschitz continuity of \(\nabla f\), we conclude that there exists \( A > 0\) such that
Since \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) and \(\left\| x^{k-1}-x^{k-2}\right\| \rightarrow 0\) by Theorem 4.1 (ii), we deduce that \(\lim \limits _{k \rightarrow \infty } {\text {dist}}\left( (0,0,0), \partial H\left( x^{k},\xi ^k, x^{k-1}\right) \right) =0\). This proves (i).
(ii) Noting that the inequality \(H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \ge \Psi (x^k) \) holds by (4.1) and that \(\Psi \) is level-bounded, we obtain that the sequence \(\{H\left( x^{k}, \xi ^{k}, x^{k-1}\right) \}\) is bounded below. In addition, this sequence is nonincreasing by Lemma 4.1. Therefore, we conclude that \(\zeta :=\lim \limits _{k \rightarrow \infty } H\left( x^{k},\xi ^k, x^{k-1}\right) \) exists. This proves (ii).
(iii) Using the facts \(\{x^k\}\) and \(\{\xi ^k\}\) are bounded from Theorem 4.1 (i), we know immediately that the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is a nonempty compact set \(\Upsilon \).
We now show that \(H \equiv \zeta \) on \(\Upsilon \). To this end, take any \((\hat{x}, \hat{\xi }, \hat{\nu }) \in \Upsilon \); then there exists a convergent subsequence \(\left\{ \left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) \right\} \) such that \(\lim _{j \rightarrow \infty }\left( x^{k_{j}}, \xi ^{k_{j}}, x^{k_{j}-1}\right) =(\hat{x}, \hat{\xi }, \hat{\nu })\). From the lower semicontinuity of H and the definition of \(\zeta \), we have
On the other hand, since \(x^{k_{j}}\) is the minimizer of the subproblem (3.1), we have
By rearranging the terms above, we obtain that
Using the facts that \(y^{k_{j}-1}=x^{k_{j}-1}+\beta _{k_{j}-1}\left( x^{k_{j}-1}-x^{k_{j}-2}\right) \), \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) from Theorem 4.1(ii), and \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), we have the following relations:
In addition, using the continuous differentiability of f, \(\lim _{j \rightarrow \infty } x^{k_{j}}=\hat{x}\), and the boundedness of \(\left\{ \xi ^{k_{j}}\right\} \) and \(\left\{ y^{k_{j}}\right\} \), we have
Therefore, from the definition of the auxiliary sequence \(\{H(x^k,\xi ^k,x^{k-1})\}\), we obtain that
where the third equality follows from the facts that \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) and (4.14), the fourth equality follows from (4.16), the first inequality holds due to (4.13), the fifth equality follows from (4.15) and the fact that \(\xi ^{k_{j}} \in \partial g_{s}\left( x^{k_{j}-1}\right) \), the sixth equality holds due to \(\left\| x^{k_{j}}-x^{k_{j}-1}\right\| \rightarrow 0\) from Theorem 4.1 (ii), the boundedness of \(\left\{ \xi ^{k}\right\} \) and the continuity of f and \(g_s\), while the last inequality follows from the relation (4.1).
Therefore, we conclude from (4.11) and (4.17) that \(H(\hat{x},\hat{\xi },\hat{\nu })=\lim \nolimits _{j \rightarrow \infty } H(x^{k_j},\xi ^{k_j},x^{k_{j}-1})=\zeta \). By the arbitrariness of \((\hat{x},\hat{\xi },\hat{\nu })\), we have \(H \equiv \zeta \) on \(\Upsilon \). This completes the proof. \(\square \)
Next, we are going to analyze the global convergence and the convergence rate of the sequence \(\{x^k\}\) generated by EPDCA to solve (1.1).
Theorem 4.2
(Global convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Assume the auxiliary function \(H(x, \xi ,\nu )\) is a Kurdyka–Łojasiewicz function. Then the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \); moreover, \(\sum \nolimits _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \).
Proof
From Theorem 4.1 (iii), it suffices to prove that the sequence \(\left\{ x^{k}\right\} \) is convergent. We first consider the case in which there exists an integer \(k_0 > 0\) such that \(H\left( x^{k_0}, \xi ^{k_0},x^{k_0-1}\right) =\zeta .\) Since the sequence \(\left\{ H\left( x^{k},\xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and converges to \(\zeta \), we obtain that \(H\left( x^{k_0+\bar{j}},\xi ^{k_0+\bar{j}} , x^{k_0+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Therefore, from (4.5) we conclude that \(x^{k_0+\bar{j}}=x^{k_0}\) for all \(\bar{j} \ge 0\), which means that the sequence \(\left\{ x^{k}\right\} \) converges finitely.
In the following, we consider the case where \(H\left( x^{k},\xi ^k, x^{k-1}\right) >\zeta \) holds for any \(k>0\). Recall from Proposition 4.1 (iii) that \(\Upsilon \) is a compact set and \(H \equiv \zeta \) on \(\Upsilon \), where \(\Upsilon \) is the set of accumulation points of \(\left\{ \left( x^{k},\xi ^k, x^{k-1}\right) \right\} \). In addition, noting that \(H(x,\xi ,\nu )\) is a Kurdyka–Łojasiewicz function, according to Lemma 2.1, there exist a positive scalar \(\epsilon \) and a continuous concave function \(\phi \in \Xi _{\eta }\) with \(\eta >0\) such that
for all \((x, \xi , \nu ) \in W\), where
Since \(\left\{ x^{k}\right\} \) and \(\left\{ \xi ^{k}\right\} \) are bounded from Theorem 4.1 (i), and \(\Upsilon \) is the set of accumulation points of \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) from Proposition 4.1 (iii), we obtain that
Hence, there exists \(\bar{k}_{1}>0\) such that \({\text {dist}}\left( \left( x^{k},\xi ^k, x^{k-1}\right) , \Upsilon \right) <\epsilon \) whenever \(k \ge \bar{k}_{1}\). Similarly, since the sequence \(\left\{ H\left( x^{k}, \xi ^k, x^{k-1}\right) \right\} \) is nonincreasing and converges to \(\zeta \), there exists \(\bar{k}_{2}>0\) such that \(\zeta<H\left( x^{k},\xi ^k, x^{k-1}\right) <\zeta +\eta \) for all \(k \ge \bar{k}_{2}\). Taking \(\bar{k}=\max \left\{ \bar{k}_{1},\bar{ k}_{2}\right\} \), the sequence \(\left\{ \left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) belongs to W whenever \({k \ge \bar{k}}\). Hence we have
Since \(\phi \) is a concave function, we see further that
where the second inequality holds due to (4.18) and the fact that \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing, the third inequality follows from (4.5) and \(T=\frac{\mu -L\beta _k^2}{2}\). From the above inequality and (4.10), we obtain that
Taking the square root on both sides of (4.19) and using the inequality of arithmetic and geometric means, we have
which implies that
Summing the above relation from \(k=\bar{k}\) to \(\infty \), we obtain that
which implies that \(\sum _{k=1}^{\infty }\left\| x^{k}-x^{k-1}\right\| <\infty \), i.e., the sequence \(\left\{ x^{k}\right\} \) is a Cauchy sequence. Thus, the sequence \(\left\{ x^{k}\right\} \) converges to a stationary point of \(\Psi \) by Theorem 4.1 (iii). This completes the proof. \(\square \)
Remark 4.1
It is worth mentioning that the idea of the proof of Theorem 4.2 is similar to that of [4, Theorem 2.9], but the details are slightly different. To make the proof easier to read and understand, we have written it out in full.
In the following, we consider the convergence rate of the sequence \(\left\{ x^{k}\right\} \) under the assumption that the auxiliary function \(H(x, \xi , \nu )\) is a KL function whose \(\phi \in \Xi _\eta \) takes the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\).
Theorem 4.3
(Rate of convergence) Let \(\{x^k\}\) be the sequence generated by EPDCA for solving (1.1) and \(\mu \in [L\overline{\beta }^2,L]\). Suppose that \(\left\{ x^{k}\right\} \) converges to some \(x^\star \in \mathcal {X}\), the auxiliary function \(H(x, \xi ,\nu )\) is a KL function with \(\phi \) in the KL inequality taking the form \(\phi (s)=c s^{1-\theta }\) for some \(\theta \in [0,1)\) and \(c>0 .\) Then the following statements hold:
-
(i)
If \(\theta =0\), then there exists \(k_{0}>0\) so that \(x^{k}\) is constant for \(k>k_{0}\).
-
(ii)
If \(\theta \in \left( 0, \frac{1}{2}\right] \), then there exist \(c_{1}>0, k_{1}>0\), and \(\eta \in (0,1)\) such that \(\left\| x^{k}-x^\star \right\| <c_{1} \eta ^{k}\) for \(k>k_{1}\).
-
(iii)
If \(\theta \in \left( \frac{1}{2}, 1\right) \), then there exist \(c_{2}>0\) and \(k_{2}>0\) such that \(\left\| x^{k}-x^\star \right\| <c_{2} k^{-\frac{1-\theta }{2 \theta -1}}\) for \(k>k_{2}\).
Proof
(i) If \(\theta =0\), we prove that there must exist an integer \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Suppose to the contrary that \(H\left( x^{k},\xi ^{k}, x^{k-1}\right) >\zeta \) for all \(k>0\). Since \(\lim \limits _{k\rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k},\xi ^{k}, x^{k-1}\right) \right\} \) is nonincreasing and converges to \(\zeta \), from \(\phi (s)=c s\) and the inequality (4.18), we have
for all sufficiently large k, which contradicts Proposition 4.1 (i). Therefore, there exists \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \). Since the sequence \(\{H\left( x^{k}, \xi ^k,x^{k-1}\right) \}\) is nonincreasing and converges to \(\zeta \), it must hold that \(H\left( x^{k_{0}+\bar{j}},\xi ^{k_{0}+\bar{j}}, x^{k_{0}+\bar{j}-1}\right) =\zeta \) for any \(\bar{j} \ge 0\). Thus, we conclude from (4.5) that \(x^{k}\) is constant for \(k>k_{0}\). This proves (i).
(ii–iii) We next consider the case that \(\theta \in (0,1) .\) If there exists \(k_{0}>0\) such that \(H\left( x^{k_{0}},\xi ^{k_{0}}, x^{k_{0}-1}\right) =\zeta \), then we can show that the sequence \(\left\{ x^{k}\right\} \) is finitely convergent as above, and the desired conclusions hold trivially. Therefore, for \(\theta \in (0,1)\), we only need to consider the case when \(H\left( x^{k}, \xi ^k,x^{k-1}\right) >\zeta \) for all \(k>0\).
Define \(R_{k}=H\left( x^{k}, \xi ^k,x^{k-1}\right) -\zeta \) and \(S_{k}=\sum _{j=k}^{\infty }\left\| x^{j}-x^{j-1}\right\| \), where \(S_{k}\) is well-defined due to Theorem 4.2. Using (4.20), for any \(k\ge \bar{k}\), where \(\bar{k}\) is defined as in (4.18), we have
Using the above inequality and the fact that the sequence \(\left\{ S_{k}\right\} \) is nonincreasing, we obtain that for all \(k \ge \bar{k}\),
On the other hand, since \(\lim \limits _{k \rightarrow \infty } x^{k}=x^\star \) and the sequence \(\left\{ H\left( x^{k}, \xi ^k,x^{k-1}\right) \right\} \) is nonincreasing and convergent to \(\zeta \), from (4.18) with \(\phi (s)=c s^{1-\theta }\), we have
for all sufficiently large k. In addition, from (4.10) and the definition of \(S_{k}\), we obtain that
Combining the relations (4.22) and (4.23), we further have
Raising both sides of the above inequality to the power \(\frac{1-\theta }{\theta }\) and multiplying by c, we obtain that
Combining this with (4.21) and according to the fact that \(\phi \left( R_{k}\right) =c\left( R_{k}\right) ^{1-\theta }\), we see that for all sufficiently large k,
where \(A_{1}=\frac{2 A}{T} c (A \cdot c(1-\theta ))^{\frac{1-\theta }{\theta }}\).
(ii) When \(\theta \in \left( 0, \frac{1}{2}\right] \), the inequality \(\frac{1-\theta }{\theta } \ge 1\) holds. From the fact that \(\left\| x^{k}-x^{k-1}\right\| \rightarrow 0\) (Theorem 4.1 (ii)) and the definition of \(S_k\), we obtain that \(S_{k-2}-S_{k} \rightarrow 0\). Combining this with (4.24), we conclude that there exists \(k_{1}>0\) such that for all \(k \ge k_{1}\), it holds that \(S_{k} \le \left( A_{1}+1\right) \left( S_{k-2}-S_{k}\right) \), which implies that \(S_{k} \le \frac{A_{1}+1}{A_{1}+2} S_{k-2}\). Therefore, we obtain that for any \(k \ge k_{1}\),
This proves (ii).
(iii) When \(\theta \in \left( \frac{1}{2}, 1\right) \), we deduce that \(\frac{1-\theta }{\theta }<1\). From this together with (4.24) and the fact that \(S_{k-2}-S_{k} \rightarrow 0\), we see that there exists \(k_{2}>0\) such that for all \(k \ge k_{2}\),
Raising both sides of the above inequality to the power \(\frac{\theta }{1-\theta }\), we have for any \(k \ge k_{2}\),
where \(A_{2}=\left( A_{1}+1\right) ^{\frac{\theta }{1-\theta }}\). From [2, Theorem 2], we obtain that there exists \(M> 0\) such that for all sufficiently large k, the inequality \(S_k\le M k^{-\frac{1-\theta }{2 \theta -1}}\) holds. This completes the proof. \(\square \)
5 Numerical experiments
In this section, we perform numerical experiments to illustrate the efficiency of EPDCA for solving the optimization problem (1.1). We compare our algorithm with pDCAe proposed in [31], whose extrapolation parameters \( \{\beta _k\}\) are chosen as in (1.6), and with the general iterative shrinkage and thresholding algorithm (GIST) proposed in [10] on the following DC regularized least squares problem:

$$\begin{aligned} \min _{x \in \mathbb {R}^{n}} \ \frac{1}{2}\Vert Ax-b\Vert ^{2}+g_{n}(x)-g_{s}(x), \end{aligned}$$(5.1)
where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}\), \(g_n:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a proper closed convex function and \(g_s:\mathbb {R}^{n}\rightarrow \mathbb {R}\) is a continuous convex function. More concrete examples can be found in [31].
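As a concrete illustration, the extrapolated proximal DC step recalled in the introduction can be sketched as follows. This is a minimal Python sketch (the paper's experiments use Matlab); the function names and the generic `prox_gn`/`subgrad_gs` interface are ours, not the authors':

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal mapping of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_dc_extrapolated(A, b, prox_gn, subgrad_gs, beta, max_iter=5000, tol=1e-5):
    """Sketch of a pDCAe-type iteration for
        min_x 0.5*||Ax - b||^2 + g_n(x) - g_s(x),
    where prox_gn(v, tau) evaluates prox_{tau*g_n}(v), subgrad_gs(x) returns an
    element of the subdifferential of g_s at x, and beta(k) is the
    extrapolation weight beta_k."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f: lambda_max(A^T A)
    x_prev = x = np.zeros(A.shape[1])      # initialize at the origin, as in the experiments
    for k in range(max_iter):
        y = x + beta(k) * (x - x_prev)     # extrapolation step
        xi = subgrad_gs(x)                 # subgradient of g_s at the current iterate
        grad = A.T @ (A @ y - b)           # gradient of the smooth part at y
        x_prev, x = x, prox_gn(y - (grad - xi) / L, 1.0 / L)
        if np.linalg.norm(x - x_prev) / max(1.0, np.linalg.norm(x)) <= tol:
            break
    return x
```

Each iterate solves a proximal subproblem of the form \(x^{k+1}=\mathrm{prox}_{\frac{1}{L}g_n}\big (y^k-\frac{1}{L}(\nabla f(y^k)-\xi ^k)\big )\), with \(y^k\) the extrapolated point.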
In the following we consider two different classes of regularizers. One is the \(\ell _{1-2}\) regularized least squares problem [32]:
where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m},\) and \(\lambda >0\) is the regularization parameter. It is easy to see that (5.2) is in the form of (5.1) with \(g_{n}(x)=\lambda \Vert x\Vert _{1}\) and \(g_{s}(x)=\lambda \Vert x\Vert \). We assume that A does not have zero columns, so that \(\Psi _{\ell _{1-2}}\) is level-bounded [18]. Hence, by Theorem 4.1, the sequence \(\left\{ x^{k}\right\} \) generated by our method for solving problem (5.2) is bounded, and any accumulation point of \(\left\{ x^{k}\right\} \) is a stationary point. In addition, we note from [31, Example 4.1] that if \(\lambda <\frac{1}{2}\Vert A^{T}b\Vert _\infty \), then the sequence \(\left\{ x^{k}\right\} \) generated by our method for solving problem (5.2) is globally convergent.
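The DC pieces of (5.2) are simple to evaluate; a small sketch (helper names ours) of the objective \(\Psi _{\ell _{1-2}}\) and a subgradient of \(g_s(x)=\lambda \Vert x\Vert \):

```python
import numpy as np

def psi_l12(A, b, lam, x):
    """Objective of (5.2): 0.5*||Ax - b||^2 + lam*||x||_1 - lam*||x||_2."""
    return 0.5 * np.sum((A @ x - b) ** 2) + lam * (np.abs(x).sum() - np.linalg.norm(x))

def subgrad_gs_l12(lam, x):
    """A subgradient of g_s(x) = lam*||x||_2: lam*x/||x|| if x != 0, else 0
    (at the origin, any vector of norm at most lam is a valid subgradient)."""
    nrm = np.linalg.norm(x)
    return lam * x / nrm if nrm > 0 else np.zeros_like(x)
```

The global-convergence condition above can be checked directly as `lam < 0.5 * np.abs(A.T @ b).max()`.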
The other is the logarithmic regularized least squares problem [6]:
where \(A \in \mathbb {R}^{m \times n}, b \in \mathbb {R}^{m}, \epsilon >0\) is a constant and \(\lambda >0\) is the regularization parameter. We observe that (5.3) is in the form of (5.1) with \(g_{n}(x)=\frac{\lambda }{\epsilon }\Vert x \Vert _{1}\) and \(g_{s}(x)=\sum _{i=1}^{n} \lambda \left[ \frac{\left| x_{i}\right| }{\epsilon }-\log \left( \left| x_{i}\right| +\epsilon \right) +\log \epsilon \right] .\) It is easy to show that \(\Psi _{log}\) is level-bounded. Combining this with Theorem 4.2, we see that the sequence \(\left\{ x^{k}\right\} \) generated by EPDCA for solving problem (5.3) is globally convergent.
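A corresponding sketch for (5.3), using the DC decomposition just stated (helper name ours); note that \(g_n(x)-g_s(x)=\lambda \sum _i \log (1+|x_i|/\epsilon )\), which recovers the logarithmic penalty:

```python
import numpy as np

def psi_log(A, b, lam, eps, x):
    """Objective of (5.3) evaluated via its DC decomposition:
    g_n(x) = (lam/eps)*||x||_1 and
    g_s(x) = sum_i lam*(|x_i|/eps - log(|x_i| + eps) + log(eps)).
    Their difference equals the penalty lam * sum_i log(1 + |x_i|/eps)."""
    ax = np.abs(x)
    g_n = (lam / eps) * ax.sum()
    g_s = lam * np.sum(ax / eps - np.log(ax + eps) + np.log(eps))
    return 0.5 * np.sum((A @ x - b) ** 2) + g_n - g_s
```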
In the following, we perform several numerical tests to compare the behaviors of our method with pDCAe and GIST for solving (5.2) and (5.3), respectively. All the numerical experiments have been performed in Matlab 2018b on a 64-bit PC with an Intel(R) Core(TM) i7-4790 CPU (3.60GHz) and 32GB of RAM.
In our numerical experiments, for a given (m, n, s), we first randomly generate a matrix \(A \in \mathbb {R}^{m \times n}\) with i.i.d. standard Gaussian entries and normalize each column of A. Then an s-sparse vector \(\hat{y} \in \mathbb {R}^{n}\) is generated with uniformly random support, and we set \(y=\mathrm{sign}(\hat{y})\). Finally, we generate \(b=A y+0.01 \cdot \hat{n}\), where \(\hat{n} \in \mathbb {R}^{m}\) is a random vector with i.i.d. standard Gaussian entries.
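The instance generation just described can be sketched as follows (Python with NumPy rather than the paper's Matlab; the function name is ours):

```python
import numpy as np

def generate_instance(m, n, s, seed=None):
    """Random instance as described in the text: A has i.i.d. standard Gaussian
    entries with normalized columns, y is an s-sparse sign vector with
    uniformly random support, and b = A y + 0.01 * noise."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)                    # normalize each column of A
    y = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)    # uniformly random support
    y[support] = np.sign(rng.standard_normal(s))      # y = sign(y_hat) on the support
    b = A @ y + 0.01 * rng.standard_normal(m)
    return A, y, b
```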
For each triple \((m, n,s)=(720i,2560i,80i)\), \(i = 1,2, \ldots , 8\), we generate 50 random instances as described above and compare the three methods in terms of their average performance over these instances. We choose the Lipschitz constant L of \(\nabla f\) as \(\lambda _{\max }\left( A^{T} A\right) \), initialize all algorithms at the origin, and terminate each algorithm when the number of iterations reaches 5000 or \(\Vert x^{k}-x^{k-1}\Vert / \max \{1,\Vert x^{k}\Vert \}\le 10^{-5}.\)
With regard to parameters, for EPDCA we choose the extrapolation parameters \(\{\beta _{k}\}\) as in (3.2) and set \(a=0.8, b=5, \omega =0.98\). In addition, we adopt the restart strategy with \(K=500\) described in Remark 3.1 (iv) to guarantee \(\overline{\beta }=\sup _{k} \beta _k<1\). For pDCAe, the extrapolation parameters are chosen as the extrapolation coefficients used in [31]. For GIST, the algorithmic framework and the parameter settings can be found in [19, Appendix A].
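Since the formula (3.2) for \(\{\beta _k\}\) is not reproduced in this section, the following sketch uses a standard Nesterov-type sequence with a fixed restart every \(K=500\) iterations purely to illustrate the restart mechanism that keeps \(\sup _k \beta _k<1\); it is a stand-in, not the paper's scheme:

```python
def make_beta(K=500):
    """Nesterov-type extrapolation weights beta_k = (t_{k-1} - 1)/t_k with the
    classical update t_{k+1} = (1 + sqrt(1 + 4 t_k^2))/2, restarted every K
    iterations. Restarting resets the momentum sequence, so sup_k beta_k < 1.
    NOTE: this is a stand-in for the paper's scheme (3.2), which is not
    reproduced here; only the restart mechanism is illustrated."""
    state = {"t_prev": 1.0, "t": 1.0}
    def beta(k):
        if k % K == 0:                    # restart: reset the momentum sequence
            state["t_prev"] = state["t"] = 1.0
        b = (state["t_prev"] - 1.0) / state["t"]
        state["t_prev"] = state["t"]
        state["t"] = (1.0 + (1.0 + 4.0 * state["t"] ** 2) ** 0.5) / 2.0
        return b
    return beta
```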
For the \(\ell _{1-2}\) regularized least squares problem, the numerical results are presented in Tables 1 and 2, which correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. For the logarithmic regularized least squares problem, the numerical results are presented in Tables 3 and 4, which likewise correspond to \(\lambda =1 \times 10^{-3}\) and \(\lambda =5 \times 10^{-4}\), respectively. In all tables, we report the time \(t_{\lambda _{max}}\) for computing \(L=\lambda _{\max }\left( A^{T} A\right) \), the number of iterations, the CPU time in seconds (which does not include the time for computing L), and the function values at termination, averaged over the 50 random instances.
In Fig. 1, we plot \(\Vert x^k-x^*\Vert \) against the number of iterations to compare the convergence behavior of the algorithms for solving the \(\ell _{1-2}\) regularized least squares problem (5.2), where \(x^*\) denotes the approximate solution obtained at termination of the respective method. We choose \(\lambda =1 \times 10^{-3}\) in Fig. 1a and \(\lambda =5 \times 10^{-4}\) in Fig. 1b, corresponding to Tables 1 and 2, respectively. In Fig. 2, we plot the same quantity for the logarithmic regularized least squares problem (5.3), with \(\lambda =1 \times 10^{-3}\) in Fig. 2a and \(\lambda =5 \times 10^{-4}\) in Fig. 2b, corresponding to Tables 3 and 4, respectively. The green, blue and red lines correspond to GIST, pDCAe and EPDCA, respectively.
We conclude from these tables and figures that our method takes fewer iterations and less CPU time than pDCAe and GIST to reach a good approximation of the minimum value on the two test regularized least squares problems.
6 Conclusion
In this paper, based on the generalized Nesterov momentum scheme for the accelerated forward–backward algorithm (AFBA) proposed in [16] and on pDCAe [31] for DC programming, we propose a new extrapolated proximal difference-of-convex algorithm (EPDCA), which allows a more general setting of the extrapolation parameters \(\{\beta _k\}\), for solving the DC optimization problem (1.1). We establish global subsequential convergence of the sequence generated by our method. In addition, by assuming the Kurdyka–Łojasiewicz property of an auxiliary function, we establish global convergence of the sequence generated by EPDCA and analyze its convergence rate; notably, the analysis allows \(g_s\) to be nonsmooth. To evaluate the performance of the proposed method, we consider two classes of DC regularized least squares problems: the \(\ell _{1-2}\) regularized least squares problem and the logarithmic regularized least squares problem. Numerical results illustrate the efficiency of our method and its superiority over other well-known methods.
In future research, we may exploit the nonmonotone line search technique in [23] to enlarge the step size and thereby speed up the convergence of the proposed algorithm. Furthermore, a difference-of-Moreau-envelopes smoothing technique has been developed for DC programming in [30]. These variational methods may inspire further improvements and extensions.
References
Alvarado, A., Scutari, G., Pang, J.S.: A new decomposition method for multiuser DC programming and its applications. IEEE Trans. Signal Process. 62, 2984–2998 (2014)
Attouch, H., Bolte, J.: On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 116, 5–16 (2009)
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 3, 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 137, 91–129 (2013)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146, 459–494 (2014)
Candès, E.J., Wakin, M., Boyd, S.: Enhancing sparsity by reweighted \(\ell _{1}\) minimization. J. Fourier Anal. Appl. 14, 877–905 (2008)
Chambolle, A.: An algorithm for total variation minimization and applications. J. Math. Imaging Vis. 20, 89–97 (2004)
Chambolle, A., Dossal, C.: On the convergence of the iterates of “FISTA’’. J. Optim. Theory Appl. 166, 25 (2015)
Gasso, G., Rakotomamonjy, A., Canu, S.: Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans. Signal Process. 57, 4686–4698 (2009)
Gong, P., Zhang, C., Lu, Z., Huang, J.Z., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: ICML (2013)
Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Preprint, METR 2015-27, Department of Mathematical Informatics, University of Tokyo. http://www.keisu.t.u-tokyo.ac.jp/research/techrep/index.html
Le Thi, H.A., Nguyen, M.C.: DCA based algorithms for feature selection in multi-class support vector machine. Ann. Oper. Res. 249, 273–300 (2017)
Le Thi, H.A., Pham, D.T.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)
Le Thi, H.A., Pham, D.T.: DC programming and DCA: thirty years of developments. Math. Program. 169, 5–68 (2018)
Le Thi, H.A., Pham, D.T., Le, H.M., Vo, X.Y.: DC approximation approaches for sparse optimization. Eur. J. Oper. Res. 244, 26–46 (2015)
Lin, Y., Li, S., Zhang, Y.Z.: Convergence rate analysis of accelerated forward–backward algorithm with generalized Nesterov momentum scheme. arXiv: 2112.05873
Lin, Y., Schmidtlein, C.R., Li, Q., Li, S., Xu, Y.: A Krasnoselskii–Mann algorithm with an improved EM preconditioner for PET image reconstruction. IEEE Trans. Med. Imaging 38, 2114–2126 (2019)
Liu, T., Pong, T.K.: Further properties of the forward–backward envelope with applications to difference-of-convex programming. Comput. Optim. Appl. 67, 489–520 (2017)
Liu, T., Pong, T.K., Takeda, A.: A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems. Math. Program. 176, 339–367 (2018)
Liu, T., Pong, T.K., Takeda, A.: A refined convergence analysis of pDCA\(_e\) with applications to simultaneous sparse recovery and outlier detection. Math. Program. 176, 339–367 (2019)
Lou, Y., Zeng, T., Osher, S., Xin, J.: A weighted difference of anisotropic and isotropic total variation model for image processing. SIAM J. Imaging Sci. 8, 1798–1823 (2015)
Lu, Z., Zhou, Z., Sun, Z.: Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Math. Program. 176, 369–401 (2019)
Lu, Z., Zhou, Z.: Nonmonotone enhanced proximal DC algorithms for a class of structured nonsmooth DC programming. SIAM J. Optim. 29, 2725–2752 (2019)
O’Donoghue, B., Candès, E.J.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015)
Pang, J.-S., Razaviyayn, M., Alvarado, A.: Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42, 95–118 (2017)
Pham, D.T., Le Thi, H.A.: Convex analysis approach to D.C. programming: theory, algorithms and applications. Acta Math. Vietnam. 22, 289–355 (1997)
Pham, D.T., Le Thi, H.A.: A D.C. optimization algorithm for solving the trust-region subproblem. SIAM J. Optim. 8, 476–505 (1998)
Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der Mathematischen Wissenschaften. Springer, Berlin (1998)
Sanjabi, M., Razaviyayn, M., Luo, Z.-Q.: Optimal joint base station assignment and beamforming for heterogeneous networks. IEEE Trans. Signal Process. 62, 1950–1961 (2014)
Sun, K., Sun, X.A.: Algorithms for difference-of-convex (DC) programs based on difference-of-Moreau-envelopes smoothing. arXiv: 2104.01470
Wen, B., Chen, X., Pong, T.K.: A proximal difference-of-convex algorithm with extrapolation. Comput. Optim. Appl. 62, 297–324 (2018)
Yin, P., Lou, Y., He, Q., Xin, J.: Minimization of \(\ell _{1-2}\) for compressed sensing. SIAM J. Sci. Comput. 37, A536–A563 (2015)
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their insightful and helpful comments and suggestions, which improved the quality of the paper. This work was supported in part by NSFC (Grant No. 11801131), the Natural Science Foundation of Hebei Province (Grant No. A2019202229), and a scientific grant of the Hebei Educational Committee (QN2018101).
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest with regard to the publication of this paper.
Cite this article
Gao, L., Wen, B. Convergence rate analysis of an extrapolated proximal difference-of-convex algorithm. J. Appl. Math. Comput. 69, 1403–1429 (2023). https://doi.org/10.1007/s12190-022-01797-w
Keywords
- Difference-of-convex optimization
- Convergence analysis
- Extrapolation parameters
- Kurdyka–Łojasiewicz inequality