1 Introduction

Consider the unconstrained DC minimization problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \text {minimize} &{}\quad f({x}) \\ \text {subject to} &{}\quad x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$
(1)

where \(f(x) = f_1(x) - f_2(x)\), and the functions \(f_1\) and \(f_2\) are convex and, in general, nonsmooth. Here \(\mathbb {R}^n\) is the n-dimensional Euclidean space.

To date, several methods have been developed to locally solve Problem (1) [9, 15, 19, 21, 24, 25, 33, 34, 35]. Some of these methods are extensions of the Difference of Convex Algorithm (DCA) [24, 25, 33, 34], while others can be considered as extensions of the bundle method for convex optimization [9, 15, 19, 21]. In this paper, a different approach is proposed to develop an algorithm for solving Problem (1). The key element of this approach is the concept of the aggregate subgradient. Aggregate subgradients have been used to design nonsmooth optimization algorithms [3, 16, 23]. Although there are different definitions of these subgradients, in all cases they are convex combinations of subgradients from the current bundle.

The subgradient method is widely used in nonsmooth optimization due to its simple structure. Its convergence has been proved only for convex problems [1, 10, 11, 29,30,31,32]. This method uses one subgradient and one function evaluation at each iteration and involves no line search procedure. It is applicable to large-scale problems due to its small storage requirements.

Bundle methods are efficient methods in nonsmooth optimization (see [2, 4, 13, 14, 18, 23, 27, 28]). The problem of finding the search direction in these methods is reduced to a certain quadratic programming subproblem whose size may increase significantly as the number of variables increases. Various approaches to reduce the size of this subproblem, in particular those based on the notion of the aggregate subgradient, have been proposed in [16, 17, 26]. These approaches allow one to design bundle methods that are applicable to large-scale problems.

The method introduced in this paper shares some similarities with both the subgradient and the bundle methods. It is a descent method, uses more than one subgradient at each iteration and involves a line search procedure (similarly to bundle methods). On the other hand, it does not involve any time-consuming quadratic programming subproblem to find search directions (similarly to subgradient methods). The proposed method has a simple structure and is easy to implement. We prove the convergence of this method and demonstrate its performance using some nonsmooth unconstrained DC optimization test problems.

The paper is structured as follows. In Sect. 2, the new aggregate subgradient method is introduced and its convergence is studied. Results of numerical experiments are reported in Sect. 3. Section 4 concludes the paper.

Throughout this paper, we use the following notation and definitions. The inner product in \(\mathbb {R}^n\) is \(\langle u, v\rangle = \sum \limits _{i=1}^n u_i v_i\), and \(\Vert \cdot \Vert \) is the associated norm. \(S_1 = \{x \in \mathbb {R}^n: ~\Vert x\Vert =1\}\) is the unit sphere, and \(B(x;\varepsilon ) = \{y \in \mathbb {R}^n:~\Vert x-y\Vert < \varepsilon \}\) is the open ball of radius \(\varepsilon > 0\) centered at \(x \in \mathbb {R}^n\). The convex hull of a set is denoted by “\(\mathop {\mathrm {conv}}\)”. For a convex function \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), its subdifferential at a point \(x \in \mathbb {R}^n\) is [5]

$$\begin{aligned} \partial f(x) = \left\{ \xi \in \mathbb {R}^n:~f(y) - f(x) \ge \langle \xi , y - x\rangle ~~\forall ~y \in \mathbb {R}^n \right\} \end{aligned}$$

and its \(\varepsilon \)-subdifferential is

$$\begin{aligned} \partial _\varepsilon f(x) = \left\{ \xi \in \mathbb {R}^n:~f(y) - f(x) \ge \langle \xi , y - x\rangle - \varepsilon ~~\forall ~y \in \mathbb {R}^n \right\} . \end{aligned}$$

A point \(x^* \in \mathbb {R}^n\) is called a critical point of Problem (1) if

$$\begin{aligned} \partial f_1(x^*)\cap \partial f_2(x^*) \ne \emptyset . \end{aligned}$$
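As a simple one-dimensional illustration (not an example from the paper), take \(f_1(x) = |x|\) and \(f_2(x) = 2|x|\), so that \(f(x) = -|x|\). Then

$$\begin{aligned} \partial f_1(0) = [-1,1] \subset [-2,2] = \partial f_2(0), \end{aligned}$$

so \(x^* = 0\) is a critical point of Problem (1) even though it is a global maximizer of f. Criticality is thus a necessary but, in general, not a sufficient condition for local optimality.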

2 The aggregate subgradient method

In this section, we introduce an aggregate subgradient (AggSub) method for solving Problem (1). For a given number \(\tau > 0\) consider the set

$$\begin{aligned} Q_{1\tau }(x) = \mathop {\mathrm {conv}}\bigcup _{d \in S_1} \partial f_1(x+\tau d),~x \in \mathbb {R}^n. \end{aligned}$$

Proposition 1

Let \(x \in \mathbb {R}^n\). The set \(Q_{1\tau }(x)\) is convex and compact for any finite \(\tau > 0\).

Proof

The convexity of the set \(Q_{1\tau }(x)\) follows from its definition. Since the subdifferential of a convex function is bounded on any bounded set, the set

$$\begin{aligned} {\bar{Q}}_{1\tau }(x) = \bigcup _{d \in S_1} \partial f_1(x+\tau d) \end{aligned}$$

is bounded, and so is its convex hull \(Q_{1\tau }(x)\).

Now we prove that the set \({\bar{Q}}_{1\tau }(x)\) is closed. Take any sequence \(\{u_k\} \subset {\bar{Q}}_{1\tau }(x)\) such that \(u_k \rightarrow {\bar{u}}\) as \(k \rightarrow \infty \). This means that there exists a sequence \(\{d_k\} \subset S_1\) such that \(u_k \in \partial f_1(x+\tau d_k)\) for all k. Since the set \(S_1\) is compact, all limit points of the sequence \(\{d_k\}\) belong to this set. Without loss of generality, assume that \(d_k \rightarrow {\bar{d}} \in S_1\) as \(k \rightarrow \infty \). Then the upper semicontinuity of the subdifferential mapping implies that \({\bar{u}} \in \partial f_1(x+\tau {\bar{d}})\) and therefore, \({\bar{u}} \in {\bar{Q}}_{1\tau }(x)\), which means that the set \({\bar{Q}}_{1\tau }(x)\) is closed. Thus, the set \({\bar{Q}}_{1\tau }(x)\) is compact. Since the convex hull of a compact set is also compact, it follows that the set \(Q_{1\tau }(x)\) is compact. \(\square \)

Remark 1

Note that the set \(Q_{1\tau }(x)\) was introduced in [6], where it is called the spherical subdifferential.

The proof of the following theorem can be found in [5] (see Theorem 2.33).

Theorem 1

Let \(f: \mathbb {R}^n \rightarrow \mathbb {R}\) be convex with a Lipschitz constant \(L>0\) at \(x \in \mathbb {R}^n\). Then for all \(\varepsilon \ge 0\) we have

$$\begin{aligned} \partial f(y) \subseteq \partial _\varepsilon f(x) ~~\forall ~y \in B\left( x;\frac{\varepsilon }{2L}\right) . \end{aligned}$$
(2)

Proposition 2

For the set \(Q_{1\tau }(x), ~x \in \mathbb {R}^n,~\tau >0\) the following holds:

$$\begin{aligned} Q_{1\tau }(x) \subseteq \partial _\varepsilon f_1(x) \end{aligned}$$

for all \(\varepsilon > 2L\tau \), where \(L>0\) is a Lipschitz constant of the function \(f_1\) at \(x\).

Proof

It is clear that \(x+\tau d \in B\left( x;\frac{\varepsilon }{2L}\right) \) for any \(\varepsilon > 2L\tau \) and \(d \in S_1\). Then the proof follows from Theorem 1 and the convexity of \(\partial _\varepsilon f_1(x)\). \(\square \)

Now take any subgradient \(v \in \partial f_2(x)\) and construct the set

$$\begin{aligned} {Q_{\tau , v}(x)} = \left\{ \xi \in \mathbb {R}^n:~\xi = u - v,~u \in Q_{1\tau }(x) \right\} . \end{aligned}$$

The convexity of functions \(f_1\) and \(f_2\) and the subgradient inequality imply that

$$\begin{aligned} f(x+\tau d) - f(x) \le \tau \langle u - v, d \rangle ~~\forall ~ u \in \partial f_1(x+\tau d) \end{aligned}$$
(3)

and therefore,

$$\begin{aligned} f(x+\tau d) - f(x) \le \tau \max _{\xi \in {Q_{\tau , v}(x)}} \langle \xi , d \rangle . \end{aligned}$$

It follows from Proposition 1 that the set \(Q_{\tau , v}(x)\) is convex and compact for any finite \(\tau > 0\). If \(0_n \notin {Q_{\tau , v}(x)}\) at \(x \in \mathbb {R}^n\), then one can use the set \(Q_{\tau , v}(x)\) to compute descent directions of the function f [8]. However, it is not always easy to compute the whole set \(Q_{\tau , v}(x)\).
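Inequality (3) and the bound above are straightforward to verify numerically. The following minimal sketch does so for the illustrative DC decomposition \(f_1(x) = \Vert x\Vert _1\), \(f_2(x) = \tfrac{1}{2}\Vert x\Vert ^2\); this function and all names in the snippet are ours and are not taken from the paper's test set.

```python
import numpy as np

# Numerical check of inequality (3): for v in subdiff f2(x) and any d in S_1,
#   f(x + tau*d) - f(x) <= tau * <u - v, d>   for all u in subdiff f1(x + tau*d).
# Illustrative DC pair: f1(x) = ||x||_1 (nonsmooth), f2(x) = 0.5*||x||^2.

def f1(x): return float(np.sum(np.abs(x)))
def f2(x): return 0.5 * float(x @ x)
def sub_f1(x): return np.sign(x)    # a subgradient of the l1-norm
def sub_f2(x): return x             # gradient of f2

rng = np.random.default_rng(0)
n, tau = 5, 0.1
x = rng.standard_normal(n)
v = sub_f2(x)                                   # v in subdiff f2(x)
for _ in range(1000):
    d = rng.standard_normal(n)
    d /= np.linalg.norm(d)                      # random direction d in S_1
    u = sub_f1(x + tau * d)                     # u in subdiff f1(x + tau*d)
    lhs = (f1(x + tau * d) - f2(x + tau * d)) - (f1(x) - f2(x))
    rhs = tau * float((u - v) @ d)              # xi = u - v in Q_{tau,v}(x)
    assert lhs <= rhs + 1e-9
print("inequality (3) holds on all sampled directions")
```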

Next, we describe the proposed AggSub method. At each iteration of this method, only two elements of the set \(Q_{\tau , v}(x)\) are used to compute search directions. The method consists of inner and outer iterations: the inner iterations perform null steps, whereas the outer iterations perform serious steps. Aggregate subgradients are computed in the inner iterations.

figure a: the aggregate subgradient (AggSub) method (algorithm listing)
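The formal listing of the AggSub method appears only as figure a and is not reproduced here. For orientation, the following Python sketch reconstructs the overall structure of the method from the description above and from the quantities used in the proofs below (subproblem (4) and conditions (5) and (7)). The initial direction, the serious-step extension and the updates of \(\tau _k\) and \(\delta _k\) in Step 6 are assumptions made for illustration only; they are not the authors' rules.

```python
import numpy as np

def aggsub_sketch(f1, f2, sub_f1, sub_f2, x0, tau0=10.0, delta0=1e-7,
                  eps=1e-5, c1=0.2, c2=0.05, sigma1=0.2, sigma2=0.5,
                  max_outer=1000):
    """Schematic AggSub-style descent scheme (not the authors' implementation).

    Outer iterations: serious steps and updates of tau_k, delta_k.
    Inner iterations: null steps aggregating two elements of Q_{tau,v}(x_k).
    """
    f = lambda z: f1(z) - f2(z)
    x = np.asarray(x0, dtype=float)
    tau, delta = tau0, delta0
    for _ in range(max_outer):
        fx = f(x)
        v = sub_f2(x)                                   # v_k in subdiff f2(x_k)
        nv = np.linalg.norm(v)
        d = -v / nv if nv > 0 else np.ones_like(x) / np.sqrt(x.size)  # initial d in S_1 (assumption)
        xi_bar = sub_f1(x + tau * d) - v                # xi_k^0 = u - v_k in Q_{tau,v}(x_k)
        while True:                                     # inner loop: null steps
            nrm = np.linalg.norm(xi_bar)
            if nrm <= delta:                            # stopping condition (5)
                tau *= sigma1                           # Step 6 updates of tau_k and
                delta *= sigma2                         # delta_k (assumed rules)
                break
            d = -xi_bar / nrm                           # search direction d^{l+1}
            if f(x + tau * d) - fx <= -c1 * tau * nrm:  # sufficient descent (7)
                t = tau                                 # serious step (Steps 7-8):
                while t < 1e6 * tau and f(x + 2.0 * t * d) - fx <= -c2 * 2.0 * t * nrm:
                    t *= 2.0                            # step extension (assumption)
                x = x + t * d
                break
            xi_new = sub_f1(x + tau * d) - v            # new element of Q_{tau,v}(x_k)
            # aggregate by solving the 1-D subproblem (4) in closed form (cf. Remark 2)
            diff = xi_new - xi_bar
            denom = float(diff @ diff)
            lam = 0.0 if denom == 0.0 else float(xi_bar @ xi_bar - xi_new @ xi_bar) / denom
            lam = min(max(lam, 0.0), 1.0)
            xi_bar = lam * xi_new + (1.0 - lam) * xi_bar
        if tau <= eps:                                  # overall stopping test (assumption)
            return x
    return x
```

The sketch is meant only to make the roles of the null and serious steps concrete; as in Theorem 2 below, it implicitly assumes that the level set of f at the starting point is bounded.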

Remark 2

Note that Problem (4) in Step 2 is solved explicitly. If \({\bar{\xi }}^l_k = \xi ^l_k\), then \({\bar{\lambda }}_k^l\) can be any number from [0, 1]. Otherwise, we compute:

$$\begin{aligned} {\hat{\lambda }} = \frac{\Vert {\bar{\xi }}^l_k\Vert ^2 - \langle \xi ^l_k, {\bar{\xi }}_k^l \rangle }{\Vert {\bar{\xi }}^l_k\Vert ^2 + \Vert \xi ^l_k\Vert ^2 - 2 \langle \xi ^l_k, {\bar{\xi }}_k^l \rangle }. \end{aligned}$$

Since \({\bar{\xi }}^l_k \ne \xi ^l_k\), the denominator is nonzero and the number \({\hat{\lambda }}\) is well defined. Then

$$\begin{aligned} {\bar{\lambda }}^l_k= {\left\{ \begin{array}{ll} 0, &{}\quad {\hat{\lambda }} < 0, \\ {\hat{\lambda }}, &{}\quad {\hat{\lambda }} \in [0,1], \\ 1, &{}\quad {\hat{\lambda }} > 1. \end{array}\right. } \end{aligned}$$
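Problem (4) is thus a one-dimensional convex quadratic over [0, 1]. A minimal sketch of this closed-form solution, with a quick sanity check against a grid search, is given below (illustrative Python, not the authors' Fortran 95 code; all names are ours).

```python
import numpy as np

def solve_subproblem_4(xi, xi_bar):
    """Minimize ||lam*xi + (1-lam)*xi_bar||^2 over lam in [0,1] (Problem (4)).

    Returns the optimal lam_bar and the corresponding aggregate subgradient.
    """
    diff = xi - xi_bar
    denom = float(diff @ diff)              # ||xi_bar||^2 + ||xi||^2 - 2<xi, xi_bar>
    if denom == 0.0:                        # xi == xi_bar: any lam in [0,1] is optimal
        return 0.0, xi_bar.copy()
    lam_hat = float(xi_bar @ xi_bar - xi @ xi_bar) / denom
    lam_bar = min(max(lam_hat, 0.0), 1.0)   # clip the unconstrained minimizer to [0,1]
    return lam_bar, lam_bar * xi + (1.0 - lam_bar) * xi_bar

# Quick check against a grid search on a random instance
rng = np.random.default_rng(1)
xi, xi_bar = rng.standard_normal(4), rng.standard_normal(4)
lam_bar, agg = solve_subproblem_4(xi, xi_bar)
grid = np.linspace(0.0, 1.0, 10001)
vals = [np.linalg.norm(l * xi + (1 - l) * xi_bar) ** 2 for l in grid]
assert np.linalg.norm(agg) ** 2 <= min(vals) + 1e-9
```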

In the next proposition, we prove that for each outer iteration \(k \ge 0\) the number of inner iterations (null steps) of the AggSub method is finite.

Proposition 3

Let

$$\begin{aligned} C_k = \max \left\{ \Vert \xi \Vert : \xi \in {Q_{\tau _k, v_k}(x_k)}\right\} < +\infty . \end{aligned}$$
(6)

Assume that \(\delta _k < C_k\). Then the inner loop at the k-th iteration of the AggSub method stops after \(m_k > 0\) iterations, where

$$\begin{aligned} m_k \le \left\lceil \frac{2 \log _2 (\delta _k/C_k)}{\log _2 C_{1k}}\right\rceil , ~~C_{1k} = 1-(1-c_1)^2(4C_k^2)^{-1}\delta _k^2. \end{aligned}$$

Proof

The inner loop in the AggSub method terminates when either the condition (5) or the condition of the sufficient descent

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) \le -c_1 \tau _k \Vert {\bar{\xi }}^{l+1}_k\Vert \end{aligned}$$
(7)

is met. Let \(S_k^l = \{\xi \in \mathbb {R}^n:~\xi = \lambda \xi _k^l + (1-\lambda ) {\bar{\xi }}_k^l,~\lambda \in [0,1]\}\). To prove the proposition, it suffices to estimate an upper bound on the number of iterations \(m_k\) after which the condition (5) is satisfied. If neither the condition (5) nor the condition (7) is satisfied at the l-th inner iteration, then the new subgradient \(\xi _k^{l+1} = u_k^{l+1} - v_k\) computed in Step 5 does not belong to the set \(S_k^l\). Indeed, according to the definition of \({\bar{\xi }}_k^{l+1}\) and the necessary and sufficient condition for a minimum (see Lemma 5.2.6 in [27]) we have

$$\begin{aligned} \left\langle \xi , {\bar{\xi }}_k^{l+1} \right\rangle \ge \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 ~~\forall ~\xi \in S_k^l. \end{aligned}$$
(8)

On the other hand, since the condition (7) does not hold we get

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) > -c_1 \tau _k \Vert {\bar{\xi }}_k^{l+1}\Vert . \end{aligned}$$

In addition, it follows from (3) that

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) \le \tau _k \langle \xi _k^{l+1}, d^{l+1}_k \rangle , \end{aligned}$$

therefore, we have

$$\begin{aligned} -c_1 \tau _k \Vert {\bar{\xi }}_k^{l+1}\Vert < \tau _k \langle \xi _k^{l+1}, d^{l+1}_k \rangle , \end{aligned}$$

which means that

$$\begin{aligned} \left\langle \xi _k^{l+1}, {\bar{\xi }}_k^{l+1} \right\rangle < c_1 \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$
(9)

Then (8), (9) and the condition \(c_1 \in (0,1]\) imply that \(\xi _k^{l+1} \not \in S_k^l\). Hence, the newly computed subgradient \(\xi _k^{l+1}\) does not lie on the segment between \({\bar{\xi }}_k^l\) and \(\xi _k^l\), and by updating the subgradients in this manner we change the set \(S_k^l\) at each inner iteration of the AggSub method. This also means that \({\bar{\xi }}^{l+2}_k \ne {\bar{\xi }}^{l+1}_k\). Indeed, if \({\bar{\xi }}^{l+2}_k = {\bar{\xi }}^{l+1}_k\), then the optimality condition (8) would also hold for all \(\xi \in S_k^{l+1}\), and since \(\xi ^{l+1}_k \in S_k^{l+1}\) we would get

$$\begin{aligned} \left\langle \xi ^{l+1}_k, {\bar{\xi }}_k^{l+1} \right\rangle \ge \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \end{aligned}$$

contradicting (9).

Next, we prove that if neither of the stopping criteria is satisfied, then the new aggregate subgradient \({\bar{\xi }}_{k}^{l+2}\) computed in Step 3 is better than \({\bar{\xi }}_k^{l+1}\) in the sense that \(\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) < \varphi ^l_k({\bar{\lambda }}^l_k)\). Since the function \(\varphi ^{l+1}_k\) is strictly convex, \({\bar{\lambda }}^{l+1}_k\) is its unique minimizer. Furthermore, it follows from

$$\begin{aligned} {\bar{\xi }}_k^{l+2}={{\bar{\lambda }}^{l+1}_k \xi _{k}^{l+1}+\left( 1-{\bar{\lambda }}^{l+1}_k\right) {\bar{\xi }}_k^{l+1}} \end{aligned}$$

and \({\bar{\xi }}^{l+2}_k \ne {\bar{\xi }}^{l+1}_k\) that \({\bar{\lambda }}^{l+1}_k > 0\). Then we have

$$\begin{aligned} {\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) = \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 < \varphi ^{l+1}_k(0) = \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 = \varphi ^l_k({\bar{\lambda }}^l_k).} \end{aligned}$$

In addition, it is clear that

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le \Vert t\xi _k^{l+1}+(1-t){\bar{\xi }}_k^{l+1}\Vert ^2 ~~\forall t \in [0,1]. \end{aligned}$$

This means that for all \(t \in [0,1]\)

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le t^2\Vert \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}\Vert ^2 + 2t\langle \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}, {\bar{\xi }}_k^{l+1}\rangle + \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

It follows from (6) that

$$\begin{aligned} \Vert \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}\Vert \le 2C_k. \end{aligned}$$

This together with the inequality (9) implies that

$$\begin{aligned} \varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) = \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le 4t^2C_k^2 + \left( 1-2t(1-c_1)\right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \end{aligned}$$

for all \(t \in [0,1]\). Let \(t_0 = (1-c_1)\Vert {\bar{\xi }}_k^{l+1}\Vert ^2 (4C_k^2)^{-1}\). It is clear that \(t_0 \in (0,1)\). Therefore, for \(t = t_0\) we get

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le \left( 1-(1-c_1)^2(4C_k^2)^{-1}\Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

At any l-th iteration, \(l>0\), where the stopping condition (5) is not met, we have \(\Vert {\bar{\xi }}_k^{l+1}\Vert > \delta _k\). Then it follows that

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 < \left( 1-(1-c_1)^2(4C_k^2)^{-1} \delta _k^2 \right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

Let \(C_{1k} = 1-(1-c_1)^2(4C_k^2)^{-1}\delta _k^2\). It is obvious that \(C_{1k} \in (0,1)\). Then we have

$$\begin{aligned} {\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) < C_{1k} \varphi ^l_k({\bar{\lambda }}^l_k)}. \end{aligned}$$

Since \({\varphi ^0_k({\bar{\lambda }}^0_k)} = \Vert \xi _k^0\Vert ^2 \le C_k^2\) we get

$$\begin{aligned} {\varphi ^l_k({\bar{\lambda }}^l_k)} < C_{1k}^l C_k^2. \end{aligned}$$

Thus, the inequality (5) is satisfied if

$$\begin{aligned} C_{1k}^{m_k} C_k^2 \le \delta _k^2, \end{aligned}$$

which must happen after at most \(m_k\) iterations, where

$$\begin{aligned} m_k \le \left\lceil \frac{2 \log _2 (\delta _k/C_k)}{\log _2 C_{1k}}\right\rceil . \end{aligned}$$

This completes the proof. \(\square \)
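For illustration, the bound of Proposition 3 can be evaluated directly. The values of \(C_k\) and \(\delta _k\) below are hypothetical, and \(c_1 = 0.2\) is the default value used later in the experiments; the bound is a worst-case estimate only.

```python
import math

def null_step_bound(C_k, delta_k, c1):
    """Worst-case bound on the number m_k of null steps from Proposition 3.

    Requires 0 < delta_k < C_k and c1 in (0, 1).
    """
    C1k = 1.0 - (1.0 - c1) ** 2 * delta_k ** 2 / (4.0 * C_k ** 2)
    return math.ceil(2.0 * math.log2(delta_k / C_k) / math.log2(C1k))

# Hypothetical values, not quantities measured in the paper
print(null_step_bound(C_k=2.0, delta_k=0.1, c1=0.2))
```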

Next, we prove that any limit point of the sequence generated by the AggSub method is a critical point of Problem (1). For a given \(x_0 \in \mathbb {R}^n\) consider the level set

$$\begin{aligned} \mathcal{L}(x_0) = \{x \in \mathbb {R}^n:~f(x) \le f(x_0)\}. \end{aligned}$$

Theorem 2

Assume that the set \(\mathcal{L}(x_0)\) is bounded for any starting point \(x_0 \in \mathbb {R}^n\), and \(\varepsilon =0\) in the AggSub method. Then any limit point of the sequence \(\{x_k\}\) generated by this algorithm is a critical point of Problem (1).

Proof

The AggSub method consists of inner and outer iterations. In all inner iterations, the values of \(\tau _k\) and \(\delta _k\) remain unchanged. Furthermore, these parameters remain unchanged in Steps 7 and 8 of the outer iterations; their values are updated in Step 6 only when the condition (5) is satisfied. According to Proposition 3, the number of steps in the inner iterations is finite for given \(\tau _k\) and \(\delta _k\). As an outcome of the inner loop, either Step 6 or Steps 7 and 8 are executed.

First, we show that, while the values of \(\tau _k\) and \(\delta _k\) remain unchanged, the number of executions of Steps 7 and 8 is finite. Indeed, in this case the line search is carried out in Step 7 and we have

$$\begin{aligned} f(x_{k+1}) - f(x_k) \le -c_2 \tau _k \Vert {\bar{\xi }}^{l+1}_k\Vert ,~\Vert {\bar{\xi }}^{l+1}_k\Vert > \delta _k. \end{aligned}$$

Therefore, we get

$$\begin{aligned} f(x_{k+1}) - f(x_k) \le -c_2 \tau _k \delta _k. \end{aligned}$$
(10)

Since the set \(\mathcal{L}(x_0)\) is compact and the function f is continuous it follows that

$$\begin{aligned} f_* := \inf ~\{f(x):x\in \mathbb {R}^n\} > -\infty . \end{aligned}$$
(11)

If the values of \(\tau _k\) and \(\delta _k\) remained unchanged for infinitely many iterations, then (10) would imply that \(f(x_k) \rightarrow -\infty \) as \(k \rightarrow \infty \), contradicting the condition (11). Thus, \(\tau _k\) and \(\delta _k\) are updated in Step 6 after a finite number of outer iterations; we denote these iterations by \(k_p\), \(p=1,2,\ldots \). At the iteration \(k_p\), the condition (5) is satisfied, meaning that

$$\begin{aligned} \Vert {{\bar{\xi }}}_{k_p}\Vert \le \delta _{k_p}~~\forall ~ p=1,2,\ldots , \end{aligned}$$

where \({{\bar{\xi }}}_{k_p} = {{\bar{\xi }}}_{k_p}^l\) for some \(l >0\). It is obvious that \({{\bar{\xi }}}_{k_p} \in {Q_{\tau _{k_p}, v_{k_p}}(x_{k_p})}\) which, in turn, implies that

$$\begin{aligned} \min _{\xi \in {Q_{\tau _{k_p},v_{k_p}}(x_{k_p})}} \Vert \xi \Vert \le \delta _{k_p}, \end{aligned}$$

or

$$\begin{aligned} \min _{u \in Q_{1\tau _{k_p}}(x_{k_p})} \Vert u-v_{k_p}\Vert \le \delta _{k_p}. \end{aligned}$$
(12)

In addition, it follows from Proposition 2 that

$$\begin{aligned} Q_{1\tau _{k_p}}(x_{k_p}) \subseteq \partial _{{\bar{\varepsilon }}} f_1(x_{k_p}) \end{aligned}$$
(13)

for all \({\bar{\varepsilon }} > 2L\tau _{k_p}\), where \(L>0\) is the Lipschitz constant of the function \(f_1\) over the set \(\mathcal{L}(x_0)\). Take any \(c_0 > 1\) and consider \({\bar{\varepsilon }}_{k_p} = 2c_0 L \tau _{k_p}\). Then the inequality (12) and the inclusion (13) imply that

$$\begin{aligned} \min _{u \in \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p})} \Vert u-v_{k_p}\Vert \le \delta _{k_p}, \end{aligned}$$

which means that

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \cap B(v_{k_p};\delta _{k_p}) \ne \emptyset , \end{aligned}$$

or

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \cap B(\partial f_2(x_{k_p});\delta _{k_p}) \ne \emptyset . \end{aligned}$$
(14)

Here

$$\begin{aligned} B(\partial f_2(x_{k_p});\delta _{k_p}) = \partial f_2(x_{k_p}) + B(0_n; \delta _{k_p}).\end{aligned}$$

Since the set \(\mathcal{L}(x_0)\) is compact and \(x_{k_p} \in \mathcal{L}(x_0)\) for all \(p >0\), the sequence \(\{x_{k_p}\}\) has at least one limit point. For the sake of simplicity, assume that \(x_{k_p} \rightarrow {\bar{x}}\) as \(p \rightarrow \infty \). The upper semicontinuity of the subdifferential mapping and the fact that \(\tau _{k_p}, \delta _{k_p} \downarrow 0\) as \(p \rightarrow \infty \) imply that for any \(\gamma > 0\) there exists \(p_\gamma > 0\) such that

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \subset \partial f_1({\bar{x}}) + B(0_n;\gamma ),~~~\partial f_2(x_{k_p}) \subset \partial f_2({\bar{x}}) + B(0_n;\gamma )~~\text{ and }~~\delta _{k_p} < \gamma \end{aligned}$$

for all \(p > p_\gamma \). Then for all \(p > p_\gamma \) we have

$$\begin{aligned} B(\partial f_2(x_{k_p}); \delta _{k_p}) \subseteq B(\partial f_2(x_{k_p}); \gamma ) \subseteq \partial f_2({\bar{x}}) + B(0_n; 2\gamma ). \end{aligned}$$

Taking into account (14) we get

$$\begin{aligned} \bigg (\partial f_1({\bar{x}}) + B(0_n;\gamma )\bigg ) \bigcap \bigg (\partial f_2({\bar{x}}) + B(0_n;2\gamma ) \bigg ) \ne \emptyset . \end{aligned}$$

Since \(\gamma > 0\) is arbitrary we have

$$\begin{aligned} \partial f_1({\bar{x}}) \cap \partial f_2({\bar{x}}) \ne \emptyset . \end{aligned}$$

This completes the proof. \(\square \)

3 Numerical results

In this section, we compare the proposed AggSub method with three other nonsmooth optimization methods. The collection of test problems used in our experiments consists of 34 nonsmooth unconstrained DC optimization problems. The problems are formulated by modifying various convex problems introduced in [16, 19, 20] and using different numbers of variables: we use 100 (extra small), 200 (small), 500 (medium) and 1000 (large) variables in our tests. More details of these problems can be found in [7]. The number of starting points in these problems varies from 3 to 5, and they are also given in [7]. We present only the results of those problems where all the solvers converge to the same critical point; overall, this leaves us with 150 test examples.

3.1 Solvers and parameters

We use the following solvers in our experiments:

  • DCA is an implementation of the well-known difference of convex algorithm (DCA) [25]. The DCA, in general, finds critical solutions to the DC optimization problems. In the implementation of the DCA, we apply the proximal bundle method MPBNGC [27] to solve the convex subproblem.

  • PBDC [19] is a proximal bundle method for DC optimization. Similar to the DCA, the PBDC is guaranteed to find critical solutions to the DC optimization problems.

  • SolvOpt [22] is an implementation of Shor’s r-algorithm [32]. Unlike the other methods considered in this paper, the SolvOpt does not utilize the DC structure of the problem.

  • AggSub is an implementation of the proposed algorithm. The following parameters were used in our experiments:

$$\begin{aligned} {\left\{ \begin{array}{ll} \sigma _1=0.2,~ \sigma _2=1.0,~ \delta _0=10^{-7}, \\ \varepsilon =10^{-5},~ c_1=0.2,~ c_2=0.05, \end{array}\right. } \quad \text{ and } \quad \tau _0= {\left\{ \begin{array}{ll} 10.0, &{} n < 200, \\ 50.0, &{} n\ge 200. \end{array}\right. } \end{aligned}$$

The Fortran 95 source code of AggSub is available at http://napsu.karmitsa.fi/aggsub/. All experiments are performed on an Intel\(^{\circledR }\) Core™  i5-2400 CPU (3.10 GHz, 3.10 GHz) running under Windows 7. To compile the codes, we use gfortran, the GNU Fortran compiler. All the solvers are used with the default settings of the codes.

We say that a solver finds a solution with respect to a tolerance \(\varepsilon >0\) if

$$\begin{aligned} \frac{f_\mathtt{best}-f_\mathtt{opt}}{1+|f_\mathtt{opt}|} \le \varepsilon , \end{aligned}$$

where \(f_\mathtt{best}\) is the best value of the objective function obtained by the solver and \(f_\mathtt{opt}\) is the best known (or optimal) solution. Otherwise, the solver fails. We set \(\varepsilon =10^{-3}\). In addition to the usual stopping criteria of the solvers, we terminate the experiment if the elapsed CPU time exceeds half an hour per problem.
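In code, this acceptance test reads as follows (a minimal sketch; the function and argument names are ours):

```python
def solved(f_best, f_opt, tol=1e-3):
    """Return True if the solver's best value passes the relative-accuracy test."""
    return (f_best - f_opt) / (1.0 + abs(f_opt)) <= tol

# Example: solved(f_best=1.0005, f_opt=1.0) -> True, since 0.0005/2.0 <= 1e-3
```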

Fig. 1: CPU time

3.2 Results

The results are analyzed using the performance profiles (Figs. 1, 2) introduced in [12]. We compare the performance of the solvers both in terms of the computational time (CPU time) and the number of subgradient evaluations (evaluations for short). For the AggSub, the PBDC and the DCA we use the sum of the numbers of subgradient evaluations of the two DC components, while for the SolvOpt we use the number of subgradient evaluations of the objective function, since this solver does not utilize the DC structure of the problem. The number of function evaluations follows similar trends to the number of subgradient evaluations for all the solvers, and thus, we omit those results.

In the performance profiles, the value of \(\rho _s(\tau )\) at \(\tau =0\) gives the ratio of the test problems for which the solver s is the best, that is, uses the least computational time or the fewest evaluations, while the value of \(\rho _s(\tau )\) at the rightmost abscissa gives the ratio of the test problems that the solver s can solve, that is, the robustness of the solver s. In addition, the higher a particular curve is, the better the corresponding solver performs.

It is clear from Fig. 1 that the AggSub method is the most efficient and accurate for all sizes of the test problems, and it is superior to the other three methods on the larger problems with \(n \ge 200\). Furthermore, the AggSub method becomes more robust in comparison with the other methods as the number of variables increases. In terms of the number of evaluations, the AggSub method requires, as is usual for subgradient methods, rather many of them, as can be seen in Fig. 2a, b. This means that the AggSub method might not be efficient when function and subgradient evaluations are expensive. However, this drawback fades for the larger problems (see Fig. 2c, d). The reason for most of the failures of the SolvOpt is its inaccuracy, while the PBDC usually stops far away from the solution due to the time limit. From these results, we conclude that, overall, the AggSub method is efficient and robust.

Fig. 2: Number of subgradient evaluations

4 Concluding remarks

In this paper, an aggregate subgradient method is developed to solve unconstrained nonsmooth difference of convex (DC) optimization problems. The method combines the strengths of the well-known subgradient and bundle methods. It is a descent method, it is easy to implement, and it does not involve solving any time-consuming quadratic programming subproblems, since only two subgradients are utilized to calculate search directions. In addition, the proposed method is globally convergent to a critical point.

The aggregate subgradient method was tested using large-scale nonsmooth unconstrained DC programming problems and compared with several other nonsmooth DC programming solvers. The results demonstrate that the method is efficient and robust for solving nonsmooth DC optimization problems, although in some cases it may require a large number of function and subgradient evaluations. The proposed method outperforms the other methods when the evaluation of the objective function and its subgradients is not expensive.