1 Introduction

Consider the unconstrained DC minimization problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \text {minimize} &{}\quad f({x}) \\ \text {subject to} &{}\quad x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$
(1)

where \(f(x) = f_1(x) - f_2(x)\), and the functions \(f_1\) and \(f_2\) are convex and, in general, nonsmooth. Here \(\mathbb {R}^n\) is the n-dimensional Euclidean space.

To date, several methods have been developed to locally solve Problem (1) [9, 15, 19, 21, 24, 25, 33, 34, 35]. Some of these methods are extensions of the Difference of Convex Algorithm (DCA) [24, 25, 33, 34], while others can be considered as extensions of the bundle method for convex optimization [9, 15, 19, 21]. In this paper, a different approach is proposed to develop an algorithm for solving Problem (1). The key element of this approach is the concept of the aggregate subgradient. Aggregate subgradients have been used to design nonsmooth optimization algorithms [3, 16, 23]. Although there are different definitions of these subgradients, in all cases they are convex combinations of subgradients from the current bundle.

The subgradient method is widely used in nonsmooth optimization due to its simple structure. Its convergence has been proved only for convex problems [1, 10, 11, 29,30,31,32]. This method uses one subgradient and one function evaluation at each iteration and involves no line search procedure. It is applicable to large-scale problems due to its small storage requirements.

Bundle methods are efficient methods in nonsmooth optimization (see [2, 4, 13, 14, 18, 23, 27, 28]). The problem of finding the search direction in these methods is reduced to a certain quadratic programming subproblem whose size may increase significantly as the number of variables increases. Various approaches to reduce the size of this subproblem, in particular those based on the notion of the aggregate subgradient, have been proposed in [16, 17, 26]. These approaches allow one to design bundle methods that are applicable to large-scale problems.

The method introduced in this paper shares some similarities with both the subgradient and the bundle methods. It is a descent method, uses more than one subgradient at each iteration and involves a line search procedure (similarly to bundle methods). On the other hand, it does not involve any time-consuming quadratic programming subproblem to find search directions (similarly to subgradient methods). The proposed method has a simple structure and is easy to implement. We prove the convergence of this method and demonstrate its performance using some nonsmooth unconstrained DC optimization test problems.

The paper is structured as follows. In Sect. 2, the new aggregate subgradient method is introduced and its convergence is studied. Results of numerical experiments are reported in Sect. 3. Section 4 concludes the paper.

Throughout this paper, we use the following notation and definitions. The inner product in \(\mathbb {R}^n\) is \(\langle u, v\rangle = \sum \limits _{i=1}^n u_i v_i\), and \(\Vert \cdot \Vert \) is the associated norm. \(S_1 = \{x \in \mathbb {R}^n: ~\Vert x\Vert =1\}\) is the unit sphere, and \(B(x;\varepsilon ) = \{y \in \mathbb {R}^n:~\Vert x-y\Vert < \varepsilon \}\) is the open ball of radius \(\varepsilon > 0\) centered at \(x \in \mathbb {R}^n\). The convex hull of a set is denoted by “\(\mathop {\mathrm {conv}}\)”. For a convex function \(f: \mathbb {R}^n \rightarrow \mathbb {R}\), its subdifferential at a point \(x \in \mathbb {R}^n\) is [5]

$$\begin{aligned} \partial f(x) = \left\{ \xi \in \mathbb {R}^n:~f(y) - f(x) \ge \langle \xi , y - x\rangle ~~\forall ~y \in \mathbb {R}^n \right\} \end{aligned}$$

and its \(\varepsilon \)-subdifferential is

$$\begin{aligned} \partial _\varepsilon f(x) = \left\{ \xi \in \mathbb {R}^n:~f(y) - f(x) \ge \langle \xi , y - x\rangle - \varepsilon ~~\forall ~y \in \mathbb {R}^n \right\} . \end{aligned}$$

A point \(x^* \in \mathbb {R}^n\) is called a critical point of Problem (1) if

$$\begin{aligned} \partial f_1(x^*)\cap \partial f_2(x^*) \ne \emptyset . \end{aligned}$$
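As a simple one-dimensional illustration (not an example from the paper), take \(f_1(x) = |x|\) and \(f_2(x) = 2|x|\), so that \(f(x) = -|x|\). Then

$$\begin{aligned} \partial f_1(0) = [-1,1] \subset [-2,2] = \partial f_2(0), \end{aligned}$$

so \(x^* = 0\) is a critical point of Problem (1) even though it is a global maximizer of f. Criticality is thus a necessary but, in general, not a sufficient condition for local optimality.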

2 The aggregate subgradient method

In this section, we introduce an aggregate subgradient (AggSub) method for solving Problem (1). For a given number \(\tau > 0\) consider the set

$$\begin{aligned} Q_{1\tau }(x) = \mathop {\mathrm {conv}}\bigcup _{d \in S_1} \partial f_1(x+\tau d),~x \in \mathbb {R}^n. \end{aligned}$$

Proposition 1

Let \(x \in \mathbb {R}^n\). The set \(Q_{1\tau }(x)\) is convex and compact for any finite \(\tau > 0\).

Proof

The convexity of the set \(Q_{1\tau }(x)\) follows from its definition. Since the subdifferential of a convex function is bounded on any bounded set, the set

$$\begin{aligned} {\bar{Q}}_{1\tau }(x) = \bigcup _{d \in S_1} \partial f_1(x+\tau d) \end{aligned}$$

is bounded, and so is its convex hull \(Q_{1\tau }(x)\).

Now we prove that the set \({\bar{Q}}_{1\tau }(x)\) is closed. Take any sequence \(\{u_k\} \subset {\bar{Q}}_{1\tau }(x)\) such that \(u_k \rightarrow {\bar{u}}\) as \(k \rightarrow \infty \). This means that there exists a sequence \(\{d_k\} \subset S_1\) such that \(u_k \in \partial f_1(x+\tau d_k)\) for all k. Since the set \(S_1\) is compact, all limit points of the sequence \(\{d_k\}\) belong to this set. Without loss of generality, assume that \(d_k \rightarrow {\bar{d}} \in S_1\) as \(k \rightarrow \infty \). Then the upper semicontinuity of the subdifferential mapping implies that \({\bar{u}} \in \partial f_1(x+\tau {\bar{d}})\) and therefore, \({\bar{u}} \in {\bar{Q}}_{1\tau }(x)\), which means that the set \({\bar{Q}}_{1\tau }(x)\) is closed. Thus, the set \({\bar{Q}}_{1\tau }(x)\) is compact. Since the convex hull of a compact set is also compact, it follows that the set \(Q_{1\tau }(x)\) is compact. \(\square \)

Remark 1

Note that the set \(Q_{1\tau }(x)\) was introduced in [6], where it is called the spherical subdifferential.

The proof of the following theorem can be found in [5] (see Theorem 2.33).

Theorem 1

Let \(f: \mathbb {R}^n \rightarrow \mathbb {R}\) be convex with a Lipschitz constant \(L>0\) at \(x \in \mathbb {R}^n\). Then for all \(\varepsilon \ge 0\) we have

$$\begin{aligned} \partial f(y) \subseteq \partial _\varepsilon f(x) ~~\forall ~y \in B\left( x;\frac{\varepsilon }{2L}\right) . \end{aligned}$$
(2)

Proposition 2

For the set \(Q_{1\tau }(x), ~x \in \mathbb {R}^n,~\tau >0\) the following holds:

$$\begin{aligned} Q_{1\tau }(x) \subseteq \partial _\varepsilon f_1(x) \end{aligned}$$

for all \(\varepsilon > 2L\tau \), where \(L>0\) is a Lipschitz constant of the function \(f_1\) at \(x\).

Proof

It is clear that \(x+\tau d \in B\left( x;\frac{\varepsilon }{2L}\right) \) for any \(\varepsilon > 2L\tau \) and \(d \in S_1\). Then the proof follows from Theorem 1 and the convexity of \(\partial _\varepsilon f_1(x)\). \(\square \)

Now take any subgradient \(v \in \partial f_2(x)\) and construct the set

$$\begin{aligned} {Q_{\tau , v}(x)} = \left\{ \xi \in \mathbb {R}^n:~\xi = u - v,~u \in Q_{1\tau }(x) \right\} . \end{aligned}$$

The convexity of functions \(f_1\) and \(f_2\) and the subgradient inequality imply that

$$\begin{aligned} f(x+\tau d) - f(x) \le \tau \langle u - v, d \rangle ~~\forall ~ u \in \partial f_1(x+\tau d) \end{aligned}$$
(3)

and therefore,

$$\begin{aligned} f(x+\tau d) - f(x) \le \tau \max _{\xi \in {Q_{\tau , v}(x)}} \langle \xi , d \rangle . \end{aligned}$$

It follows from Proposition 1 that the set \(Q_{\tau , v}(x)\) is convex and compact for any finite \(\tau > 0\). If \(0_n \notin {Q_{\tau , v}(x)}\) at \(x \in \mathbb {R}^n\), then one can use the set \(Q_{\tau , v}(x)\) to compute descent directions of the function f [8]. However, it is not always easy to compute the whole set \(Q_{\tau , v}(x)\).
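Inequality (3) and the bound above are straightforward to verify numerically. The following minimal sketch does so for the illustrative DC decomposition \(f_1(x) = \Vert x\Vert _1\), \(f_2(x) = \tfrac{1}{2}\Vert x\Vert ^2\); this function and all names in the snippet are ours and are not taken from the paper's test set.

```python
import numpy as np

# Numerical check of inequality (3): for v in subdiff f2(x) and any d in S_1,
#   f(x + tau*d) - f(x) <= tau * <u - v, d>   for all u in subdiff f1(x + tau*d).
# Illustrative DC pair: f1(x) = ||x||_1 (nonsmooth), f2(x) = 0.5*||x||^2.

def f1(x): return float(np.sum(np.abs(x)))
def f2(x): return 0.5 * float(x @ x)
def sub_f1(x): return np.sign(x)    # a subgradient of the l1-norm
def sub_f2(x): return x             # gradient of f2

rng = np.random.default_rng(0)
n, tau = 5, 0.1
x = rng.standard_normal(n)
v = sub_f2(x)                                   # v in subdiff f2(x)
for _ in range(1000):
    d = rng.standard_normal(n)
    d /= np.linalg.norm(d)                      # random direction d in S_1
    u = sub_f1(x + tau * d)                     # u in subdiff f1(x + tau*d)
    lhs = (f1(x + tau * d) - f2(x + tau * d)) - (f1(x) - f2(x))
    rhs = tau * float((u - v) @ d)              # xi = u - v in Q_{tau,v}(x)
    assert lhs <= rhs + 1e-9
print("inequality (3) holds on all sampled directions")
```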

Next, we describe the proposed AggSub method. At each iteration of this method, only two elements of the set \(Q_{\tau , v}(x)\) are used to compute search directions. The method consists of inner and outer iterations: the inner iterations perform null steps, whereas the outer iterations perform serious steps. Aggregate subgradients are computed in the inner iterations.

figure a: the aggregate subgradient (AggSub) method (algorithm listing)
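The formal listing of the AggSub method appears only as figure a and is not reproduced here. For orientation, the following Python sketch reconstructs the overall structure of the method from the description above and from the quantities used in the proofs below (subproblem (4) and conditions (5) and (7)). The initial direction, the serious-step extension and the updates of \(\tau _k\) and \(\delta _k\) in Step 6 are assumptions made for illustration only; they are not the authors' rules.

```python
import numpy as np

def aggsub_sketch(f1, f2, sub_f1, sub_f2, x0, tau0=10.0, delta0=1e-7,
                  eps=1e-5, c1=0.2, c2=0.05, sigma1=0.2, sigma2=0.5,
                  max_outer=1000):
    """Schematic AggSub-style descent scheme (not the authors' implementation).

    Outer iterations: serious steps and updates of tau_k, delta_k.
    Inner iterations: null steps aggregating two elements of Q_{tau,v}(x_k).
    """
    f = lambda z: f1(z) - f2(z)
    x = np.asarray(x0, dtype=float)
    tau, delta = tau0, delta0
    for _ in range(max_outer):
        fx = f(x)
        v = sub_f2(x)                                   # v_k in subdiff f2(x_k)
        nv = np.linalg.norm(v)
        d = -v / nv if nv > 0 else np.ones_like(x) / np.sqrt(x.size)  # initial d in S_1 (assumption)
        xi_bar = sub_f1(x + tau * d) - v                # xi_k^0 = u - v_k in Q_{tau,v}(x_k)
        while True:                                     # inner loop: null steps
            nrm = np.linalg.norm(xi_bar)
            if nrm <= delta:                            # stopping condition (5)
                tau *= sigma1                           # Step 6 updates of tau_k and
                delta *= sigma2                         # delta_k (assumed rules)
                break
            d = -xi_bar / nrm                           # search direction d^{l+1}
            if f(x + tau * d) - fx <= -c1 * tau * nrm:  # sufficient descent (7)
                t = tau                                 # serious step (Steps 7-8):
                while t < 1e6 * tau and f(x + 2.0 * t * d) - fx <= -c2 * 2.0 * t * nrm:
                    t *= 2.0                            # step extension (assumption)
                x = x + t * d
                break
            xi_new = sub_f1(x + tau * d) - v            # new element of Q_{tau,v}(x_k)
            # aggregate by solving the 1-D subproblem (4) in closed form (cf. Remark 2)
            diff = xi_new - xi_bar
            denom = float(diff @ diff)
            lam = 0.0 if denom == 0.0 else float(xi_bar @ xi_bar - xi_new @ xi_bar) / denom
            lam = min(max(lam, 0.0), 1.0)
            xi_bar = lam * xi_new + (1.0 - lam) * xi_bar
        if tau <= eps:                                  # overall stopping test (assumption)
            return x
    return x
```

The sketch is meant only to make the roles of the null and serious steps concrete; as in Theorem 2 below, it implicitly assumes that the level set of f at the starting point is bounded.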

Remark 2

Note that Problem (4) in Step 2 is solved explicitly. If \({\bar{\xi }}^l_k = \xi ^l_k\), then \({\bar{\lambda }}_k^l\) can be any number from [0, 1]. Otherwise, we compute:

$$\begin{aligned} {\hat{\lambda }} = \frac{\Vert {\bar{\xi }}^l_k\Vert ^2 - \langle \xi ^l_k, {\bar{\xi }}_k^l \rangle }{\Vert {\bar{\xi }}^l_k\Vert ^2 + \Vert \xi ^l_k\Vert ^2 - 2 \langle \xi ^l_k, {\bar{\xi }}_k^l \rangle }. \end{aligned}$$

Since \({\bar{\xi }}^l_k \ne \xi ^l_k\), the denominator is nonzero and the number \({\hat{\lambda }}\) is well defined. Then

$$\begin{aligned} {\bar{\lambda }}^l_k= {\left\{ \begin{array}{ll} 0, &{}\quad {\hat{\lambda }} < 0, \\ {\hat{\lambda }}, &{}\quad {\hat{\lambda }} \in [0,1], \\ 1, &{}\quad {\hat{\lambda }} > 1. \end{array}\right. } \end{aligned}$$
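Problem (4) is thus a one-dimensional convex quadratic over [0, 1]. A minimal sketch of this closed-form solution, with a quick sanity check against a grid search, is given below (illustrative Python, not the authors' Fortran 95 code; all names are ours).

```python
import numpy as np

def solve_subproblem_4(xi, xi_bar):
    """Minimize ||lam*xi + (1-lam)*xi_bar||^2 over lam in [0,1] (Problem (4)).

    Returns the optimal lam_bar and the corresponding aggregate subgradient.
    """
    diff = xi - xi_bar
    denom = float(diff @ diff)              # ||xi_bar||^2 + ||xi||^2 - 2<xi, xi_bar>
    if denom == 0.0:                        # xi == xi_bar: any lam in [0,1] is optimal
        return 0.0, xi_bar.copy()
    lam_hat = float(xi_bar @ xi_bar - xi @ xi_bar) / denom
    lam_bar = min(max(lam_hat, 0.0), 1.0)   # clip the unconstrained minimizer to [0,1]
    return lam_bar, lam_bar * xi + (1.0 - lam_bar) * xi_bar

# Quick check against a grid search on a random instance
rng = np.random.default_rng(1)
xi, xi_bar = rng.standard_normal(4), rng.standard_normal(4)
lam_bar, agg = solve_subproblem_4(xi, xi_bar)
grid = np.linspace(0.0, 1.0, 10001)
vals = [np.linalg.norm(l * xi + (1 - l) * xi_bar) ** 2 for l in grid]
assert np.linalg.norm(agg) ** 2 <= min(vals) + 1e-9
```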

In the next proposition, we prove that for each outer iteration \(k \ge 0\) the number of inner iterations (null steps) of the AggSub method is finite.

Proposition 3

Let

$$\begin{aligned} C_k = \max \left\{ \Vert \xi \Vert : \xi \in {Q_{\tau _k, v_k}(x_k)}\right\} < +\infty . \end{aligned}$$
(6)

Assume that \(\delta _k < C_k\). Then the inner loop at the k-th iteration of the AggSub method stops after \(m_k > 0\) iterations, where

$$\begin{aligned} m_k \le \left\lceil \frac{2 \log _2 (\delta _k/C_k)}{\log _2 C_{1k}}\right\rceil , ~~C_{1k} = 1-(1-c_1)^2(4C_k^2)^{-1}\delta _k^2. \end{aligned}$$

Proof

The inner loop in the AggSub method terminates when either the condition (5) or the condition of the sufficient descent

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) \le -c_1 \tau _k \Vert {\bar{\xi }}^{l+1}_k\Vert \end{aligned}$$
(7)

is met. Let \(S_k^l = \{\xi \in \mathbb {R}^n:~\xi = \lambda \xi _k^l + (1-\lambda ) {\bar{\xi }}_k^l,~\lambda \in [0,1]\}\). To prove the proposition, it suffices to estimate an upper bound on the number of iterations \(m_k\) after which the condition (5) is satisfied. If neither the condition (5) nor the condition (7) is satisfied at the l-th inner iteration, then the new subgradient \(\xi _k^{l+1} = u_k^{l+1} - v_k\) computed in Step 5 does not belong to the set \(S_k^l\). Indeed, according to the definition of \({\bar{\xi }}_k^{l+1}\) and the necessary and sufficient condition for a minimum (see Lemma 5.2.6 in [27]) we have

$$\begin{aligned} \left\langle \xi , {\bar{\xi }}_k^{l+1} \right\rangle \ge \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 ~~\forall ~\xi \in S_k^l. \end{aligned}$$
(8)

On the other hand, since the condition (7) does not hold we get

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) > -c_1 \tau _k \Vert {\bar{\xi }}_k^{l+1}\Vert . \end{aligned}$$

In addition, it follows from (3) that

$$\begin{aligned} f(x_k+\tau _k d^{l+1}_k) - f(x_k) \le \tau _k \langle \xi _k^{l+1}, d^{l+1}_k \rangle , \end{aligned}$$

therefore, we have

$$\begin{aligned} -c_1 \tau _k \Vert {\bar{\xi }}_k^{l+1}\Vert < \tau _k \langle \xi _k^{l+1}, d^{l+1}_k \rangle , \end{aligned}$$

which means that

$$\begin{aligned} \left\langle \xi _k^{l+1}, {\bar{\xi }}_k^{l+1} \right\rangle < c_1 \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$
(9)

Then (8), (9) and the condition \(c_1 \in (0,1]\) imply that \(\xi _k^{l+1} \not \in S_k^l\). Hence, the newly computed subgradient \(\xi _k^{l+1}\) does not lie on the segment between \({\bar{\xi }}_k^l\) and \(\xi _k^l\), and by updating the subgradients in this manner we change the set \(S_k^l\) at each inner iteration of the AggSub method. This also means that \({\bar{\xi }}^{l+2}_k \ne {\bar{\xi }}^{l+1}_k\). Indeed, if \({\bar{\xi }}^{l+2}_k = {\bar{\xi }}^{l+1}_k\), then the optimality condition (8) would also hold for all \(\xi \in S_k^{l+1}\), and since \(\xi ^{l+1}_k \in S_k^{l+1}\) we would get

$$\begin{aligned} \left\langle \xi ^{l+1}_k, {\bar{\xi }}_k^{l+1} \right\rangle \ge \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \end{aligned}$$

contradicting (9).

Next, we prove that if neither of the stopping criteria is satisfied, then the new aggregate subgradient \({\bar{\xi }}_{k}^{l+2}\) computed in Step 3 is better than \({\bar{\xi }}_k^{l+1}\) in the sense that \(\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) < \varphi ^l_k({\bar{\lambda }}^l_k)\). Since the function \(\varphi ^{l+1}_k\) is strictly convex, \({\bar{\lambda }}^{l+1}_k\) is its unique minimizer. Furthermore, it follows from

$$\begin{aligned} {\bar{\xi }}_k^{l+2}={{\bar{\lambda }}^{l+1}_k \xi _{k}^{l+1}+\left( 1-{\bar{\lambda }}^{l+1}_k\right) {\bar{\xi }}_k^{l+1}} \end{aligned}$$

and \({\bar{\xi }}^{l+2}_k \ne {\bar{\xi }}^{l+1}_k\) that \({\bar{\lambda }}^{l+1}_k > 0\). Then we have

$$\begin{aligned} {\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) = \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 < \varphi ^{l+1}_k(0) = \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 = \varphi ^l_k({\bar{\lambda }}^l_k).} \end{aligned}$$

In addition, it is clear that

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le \Vert t\xi _k^{l+1}+(1-t){\bar{\xi }}_k^{l+1}\Vert ^2 ~~\forall t \in [0,1]. \end{aligned}$$

This means that for all \(t \in [0,1]\)

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le t^2\Vert \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}\Vert ^2 + 2t\langle \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}, {\bar{\xi }}_k^{l+1}\rangle + \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

It follows from (6) that

$$\begin{aligned} \Vert \xi _k^{l+1}-{\bar{\xi }}_k^{l+1}\Vert \le 2C_k. \end{aligned}$$

This together with the inequality (9) implies that

$$\begin{aligned} \varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) = \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le 4t^2C_k^2 + \left( 1-2t(1-c_1)\right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \end{aligned}$$

for all \(t \in [0,1]\). Let \(t_0 = (1-c_1)\Vert {\bar{\xi }}_k^{l+1}\Vert ^2 (4C_k^2)^{-1}\). It is clear that \(t_0 \in (0,1)\). Therefore, for \(t = t_0\) we get

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 \le \left( 1-(1-c_1)^2(4C_k^2)^{-1}\Vert {\bar{\xi }}_k^{l+1}\Vert ^2 \right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

At any l-th iteration, \(l>0\), where the stopping condition (5) is not met, we have \(\Vert {\bar{\xi }}_k^{l+1}\Vert > \delta _k\). Then it follows that

$$\begin{aligned} \Vert {\bar{\xi }}_k^{l+2}\Vert ^2 < \left( 1-(1-c_1)^2(4C_k^2)^{-1} \delta _k^2 \right) \Vert {\bar{\xi }}_k^{l+1}\Vert ^2. \end{aligned}$$

Let \(C_{1k} = 1-(1-c_1)^2(4C_k^2)^{-1}\delta _k^2\). It is obvious that \(C_{1k} \in (0,1)\). Then we have

$$\begin{aligned} {\varphi ^{l+1}_k({\bar{\lambda }}^{l+1}_k) < C_{1k} \varphi ^l_k({\bar{\lambda }}^l_k)}. \end{aligned}$$

Since \({\varphi ^0_k({\bar{\lambda }}^0_k)} = \Vert \xi _k^0\Vert ^2 \le C_k^2\) we get

$$\begin{aligned} {\varphi ^l_k({\bar{\lambda }}^l_k)} < C_{1k}^l C_k^2. \end{aligned}$$

Thus, the inequality (5) is satisfied if

$$\begin{aligned} C_{1k}^{m_k} C_k^2 \le \delta _k^2, \end{aligned}$$

which must happen after at most \(m_k\) iterations, where

$$\begin{aligned} m_k \le \left\lceil \frac{2 \log _2 (\delta _k/C_k)}{\log _2 C_{1k}}\right\rceil . \end{aligned}$$

This completes the proof. \(\square \)
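For illustration, the bound of Proposition 3 can be evaluated directly. The values of \(C_k\) and \(\delta _k\) below are hypothetical, and \(c_1 = 0.2\) is the default value used later in the experiments; the bound is a worst-case estimate only.

```python
import math

def null_step_bound(C_k, delta_k, c1):
    """Worst-case bound on the number m_k of null steps from Proposition 3.

    Requires 0 < delta_k < C_k and c1 in (0, 1).
    """
    C1k = 1.0 - (1.0 - c1) ** 2 * delta_k ** 2 / (4.0 * C_k ** 2)
    return math.ceil(2.0 * math.log2(delta_k / C_k) / math.log2(C1k))

# Hypothetical values, not quantities measured in the paper
print(null_step_bound(C_k=2.0, delta_k=0.1, c1=0.2))
```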

Next, we prove that any limit point of the sequence generated by the AggSub method is a critical point of Problem (1). For a given \(x_0 \in \mathbb {R}^n\) consider the level set

$$\begin{aligned} \mathcal{L}(x_0) = \{x \in \mathbb {R}^n:~f(x) \le f(x_0)\}. \end{aligned}$$

Theorem 2

Assume that the set \(\mathcal{L}(x_0)\) is bounded for any starting point \(x_0 \in \mathbb {R}^n\), and \(\varepsilon =0\) in the AggSub method. Then any limit point of the sequence \(\{x_k\}\) generated by this algorithm is a critical point of Problem (1).

Proof

The AggSub method consists of inner and outer iterations. In all inner iterations, the values of \(\tau _k\) and \(\delta _k\) remain unchanged. Furthermore, these parameters remain unchanged in Steps 7 and 8 of the outer iterations; their values are updated in Step 6 only when the condition (5) is satisfied. According to Proposition 3, the number of steps in the inner iterations is finite for given \(\tau _k\) and \(\delta _k\). As an outcome of the inner loop, either Step 6 or Steps 7 and 8 are executed.

First, we show that, while the values of \(\tau _k\) and \(\delta _k\) remain unchanged, the number of executions of Steps 7 and 8 is finite. Indeed, in this case the line search is carried out in Step 7 and we have

$$\begin{aligned} f(x_{k+1}) - f(x_k) \le -c_2 \tau _k \Vert {\bar{\xi }}^{l+1}_k\Vert ,~\Vert {\bar{\xi }}^{l+1}_k\Vert > \delta _k. \end{aligned}$$

Therefore, we get

$$\begin{aligned} f(x_{k+1}) - f(x_k) \le -c_2 \tau _k \delta _k. \end{aligned}$$
(10)

Since the set \(\mathcal{L}(x_0)\) is compact and the function f is continuous it follows that

$$\begin{aligned} f_* := \inf ~\{f(x):x\in \mathbb {R}^n\} > -\infty . \end{aligned}$$
(11)

If the values of \(\tau _k\) and \(\delta _k\) remained unchanged for infinitely many iterations, then (10) would imply that \(f(x_k) \rightarrow -\infty \) as \(k \rightarrow \infty \), contradicting the condition (11). Thus, \(\tau _k\) and \(\delta _k\) are updated in Step 6 after a finite number of outer iterations; we denote these iterations by \(k_p\), \(p=1,2,\ldots \). At the iteration \(k_p\), the condition (5) is satisfied, meaning that

$$\begin{aligned} \Vert {{\bar{\xi }}}_{k_p}\Vert \le \delta _{k_p}~~\forall ~ p=1,2,\ldots , \end{aligned}$$

where \({{\bar{\xi }}}_{k_p} = {{\bar{\xi }}}_{k_p}^l\) for some \(l >0\). It is obvious that \({{\bar{\xi }}}_{k_p} \in {Q_{\tau _{k_p}, v_{k_p}}(x_{k_p})}\) which, in turn, implies that

$$\begin{aligned} \min _{\xi \in {Q_{\tau _{k_p},v_{k_p}}(x_{k_p})}} \Vert \xi \Vert \le \delta _{k_p}, \end{aligned}$$

or

$$\begin{aligned} \min _{u \in Q_{1\tau _{k_p}}(x_{k_p})} \Vert u-v_{k_p}\Vert \le \delta _{k_p}. \end{aligned}$$
(12)

In addition, it follows from Proposition 2 that

$$\begin{aligned} Q_{1\tau _{k_p}}(x_{k_p}) \subseteq \partial _{{\bar{\varepsilon }}} f_1(x_{k_p}) \end{aligned}$$
(13)

for all \({\bar{\varepsilon }} > 2L\tau _{k_p}\), where \(L>0\) is the Lipschitz constant of the function \(f_1\) over the set \(\mathcal{L}(x_0)\). Take any \(c_0 > 1\) and consider \({\bar{\varepsilon }}_{k_p} = 2c_0 L \tau _{k_p}\). Then the inequality (12) and the inclusion (13) imply that

$$\begin{aligned} \min _{u \in \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p})} \Vert u-v_{k_p}\Vert \le \delta _{k_p}, \end{aligned}$$

which means that

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \cap B(v_{k_p};\delta _{k_p}) \ne \emptyset , \end{aligned}$$

or

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \cap B(\partial f_2(x_{k_p});\delta _{k_p}) \ne \emptyset . \end{aligned}$$
(14)

Here

$$\begin{aligned} B(\partial f_2(x_{k_p});\delta _{k_p}) = \partial f_2(x_{k_p}) + B(0_n; \delta _{k_p}).\end{aligned}$$

Since the set \(\mathcal{L}(x_0)\) is compact and \(x_{k_p} \in \mathcal{L}(x_0)\) for all \(p >0\), the sequence \(\{x_{k_p}\}\) has at least one limit point. For the sake of simplicity, assume that \(x_{k_p} \rightarrow {\bar{x}}\) as \(p \rightarrow \infty \). The upper semicontinuity of the subdifferential mapping and the fact that \(\tau _{k_p}, \delta _{k_p} \downarrow 0\) as \(p \rightarrow \infty \) imply that for any \(\gamma > 0\) there exists \(p_\gamma > 0\) such that

$$\begin{aligned} \partial _{{\bar{\varepsilon }}_{k_p}} f_1(x_{k_p}) \subset \partial f_1({\bar{x}}) + B(0_n;\gamma ),~~~\partial f_2(x_{k_p}) \subset \partial f_2({\bar{x}}) + B(0_n;\gamma )~~\text{ and }~~\delta _{k_p} < \gamma \end{aligned}$$

for all \(p > p_\gamma \). Then for all \(p > p_\gamma \) we have

$$\begin{aligned} B(\partial f_2(x_{k_p}); \delta _{k_p}) \subseteq B(\partial f_2(x_{k_p}); \gamma ) \subseteq \partial f_2({\bar{x}}) + B(0_n; 2\gamma ). \end{aligned}$$

Taking into account (14) we get

$$\begin{aligned} \bigg (\partial f_1({\bar{x}}) + B(0_n;\gamma )\bigg ) \bigcap \bigg (\partial f_2({\bar{x}}) + B(0_n;2\gamma ) \bigg ) \ne \emptyset . \end{aligned}$$

Since \(\gamma > 0\) is arbitrary we have

$$\begin{aligned} \partial f_1({\bar{x}}) \cap \partial f_2({\bar{x}}) \ne \emptyset . \end{aligned}$$

This completes the proof. \(\square \)

3 Numerical results

In this section, we compare the proposed AggSub method with three other nonsmooth optimization methods. The collection of test problems used in our experiments consists of 34 nonsmooth unconstrained DC optimization problems. The problems are formulated by modifying various convex problems introduced in [16, 19, 20] and using different numbers of variables: we use 100 (extra small), 200 (small), 500 (medium) and 1000 (large) variables in our tests. More details of these problems can be found in [7]. The number of starting points in these problems varies from 3 to 5, and they are also given in [7]. We present only the results of those problems where all the solvers converge to the same critical point; overall, this leaves us with 150 test examples.

3.1 Solvers and parameters

We use the following solvers in our experiments:

  • DCA is an implementation of the well-known difference of convex algorithm (DCA) [25]. The DCA, in general, finds critical solutions to the DC optimization problems. In the implementation of the DCA, we apply the proximal bundle method MPBNGC [27] to solve the convex subproblem.

  • PBDC [19] is a proximal bundle method for DC optimization. Similar to the DCA, the PBDC is guaranteed to find critical solutions to the DC optimization problems.

  • SolvOpt [22] is an implementation of Shor’s r-algorithm [32]. Unlike the other methods considered in this paper, the SolvOpt does not utilize the DC structure of the problem.

  • AggSub is an implementation of the proposed algorithm. The following parameters were used in our experiments:

$$\begin{aligned} {\left\{ \begin{array}{ll} \sigma _1=0.2,~ \sigma _2=1.0,~ \delta _0=10^{-7}, \\ \varepsilon =10^{-5},~ c_1=0.2,~ c_2=0.05, \end{array}\right. } \quad \text{ and } \quad \tau _0= {\left\{ \begin{array}{ll} 10.0, &{} n < 200, \\ 50.0, &{} n\ge 200. \end{array}\right. } \end{aligned}$$

The Fortran 95 source code of AggSub is available at http://napsu.karmitsa.fi/aggsub/. All experiments are performed on an Intel\(^{\circledR }\) Core™  i5-2400 CPU (3.10 GHz, 3.10 GHz) running under Windows 7. To compile the codes, we use gfortran, the GNU Fortran compiler. All the solvers are used with the default settings of the codes.

We say that a solver finds a solution with respect to a tolerance \(\varepsilon >0\) if

$$\begin{aligned} \frac{f_\mathtt{best}-f_\mathtt{opt}}{1+|f_\mathtt{opt}|} \le \varepsilon , \end{aligned}$$

where \(f_\mathtt{best}\) is the best value of the objective function obtained by the solver and \(f_\mathtt{opt}\) is the best known (or optimal) solution. Otherwise, the solver fails. We set \(\varepsilon =10^{-3}\). In addition to the usual stopping criteria of the solvers, we terminate the experiment if the elapsed CPU time exceeds half an hour per problem.
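In code, this acceptance test reads as follows (a minimal sketch; the function and argument names are ours):

```python
def solved(f_best, f_opt, tol=1e-3):
    """Return True if the solver's best value passes the relative-accuracy test."""
    return (f_best - f_opt) / (1.0 + abs(f_opt)) <= tol

# Example: solved(f_best=1.0005, f_opt=1.0) -> True, since 0.0005/2.0 <= 1e-3
```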

Fig. 1: CPU time

3.2 Results

The results are analyzed using the performance profiles (Figs. 1, 2) introduced in [12]. We compare the performance of the solvers both in terms of the computational time (CPU time) and the number of subgradient evaluations (evaluations for short). For the AggSub, the PBDC and the DCA we use the sum of the numbers of subgradient evaluations of the two DC components, while for the SolvOpt we use the number of subgradient evaluations of the objective function, since this solver does not utilize the DC structure of the problem. The number of function evaluations follows similar trends to the number of subgradient evaluations for all the solvers, and thus, we omit those results.

In the performance profiles, the value of \(\rho _s(\tau )\) at \(\tau =0\) gives the ratio of the test problems for which the solver s is the best, that is, uses the least computational time or the fewest evaluations, while the value of \(\rho _s(\tau )\) at the rightmost abscissa gives the ratio of the test problems that the solver s can solve, that is, the robustness of the solver s. In addition, the higher a particular curve is, the better the corresponding solver performs.

It is clear from Fig. 1 that the AggSub method is the most efficient and accurate for all sizes of the test problems, and it is superior to the other three methods on the larger problems with \(n \ge 200\). Furthermore, the AggSub method becomes more robust in comparison with the other methods as the number of variables increases. In terms of the number of evaluations, the AggSub method requires, as is usual for subgradient methods, rather many of them, as can be seen in Fig. 2a, b. This means that the AggSub method might not be efficient when function and subgradient evaluations are expensive. However, this drawback fades for the larger problems (see Fig. 2c, d). The reason for most of the failures of the SolvOpt is its inaccuracy, while the PBDC usually stops far away from the solution due to the time limit. From these results, we conclude that, overall, the AggSub method is efficient and robust.

Fig. 2: Number of subgradient evaluations

4 Concluding remarks

In this paper, an aggregate subgradient method is developed to solve unconstrained nonsmooth difference of convex (DC) optimization problems. The method combines the strengths of the well-known subgradient and bundle methods. It is a descent method, it is easy to implement, and it does not involve solving any time-consuming quadratic programming subproblems, since only two subgradients are utilized to calculate search directions. In addition, the proposed method is globally convergent to a critical point.

The aggregate subgradient method was tested using large-scale nonsmooth unconstrained DC programming problems and compared with several other nonsmooth DC programming solvers. The results demonstrate that the method is efficient and robust for solving nonsmooth DC optimization problems, although in some cases it may require a large number of function and subgradient evaluations. The proposed method outperforms the other methods when the evaluation of the objective function and its subgradients is not expensive.