
1 Introduction

This chapter presents clustering algorithms based on DC optimization approaches. In Chap. 4, the clustering problems are formulated using the DC representation of their objective functions. Using this representation, we describe three different DC optimization algorithms.

For simplicity we use the following unconstrained DC programming problem to represent both the clustering and the auxiliary clustering problems (4.20) and (4.34):

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad & f(\mathbf{x})= f_1(\mathbf{x}) - f_2(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathbb{R}^{n}, \end{cases} \end{aligned} $$
(9.1)

where both \(f_1\) and \(f_2\) are finite valued convex functions on \(\mathbb {R}^n\). As mentioned before, if the squared Euclidean norm is used to define the similarity measure, then the function \(f_1\) is smooth and the function \(f_2\) is, in general, nonsmooth. However, with the other two similarity measures \(d_1\) and \(d_\infty\), both functions are nonsmooth. In this chapter, we only consider the first case and present three different algorithms to solve the clustering problem (9.1).
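To make the decomposition concrete, the following sketch evaluates f, \(f_1\), and \(f_2\) for the clustering problem with the squared Euclidean similarity measure \(d_2\), following the DC representation of the cluster function from Chap. 4 (the code and its names are ours, for illustration only):

```python
import numpy as np

def dc_components(x, A):
    """DC components of the k-clustering function with the squared
    Euclidean measure d2: f1 sums all center-to-point distances (smooth,
    convex), f2 collects the 'all but the closest center' terms
    (nonsmooth, convex), and f = f1 - f2 is the clustering objective.

    x : (k, n) array of cluster centers; A : (m, n) array of data points.
    """
    m = A.shape[0]
    d = ((A[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)  # d2(x_j, a), shape (m, k)
    f1 = d.sum() / m
    f2 = (d.sum(axis=1) - d.min(axis=1)).sum() / m
    f = d.min(axis=1).sum() / m
    assert np.isclose(f, f1 - f2)  # the DC identity f = f1 - f2
    return f, f1, f2
```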

We start with the incremental nonsmooth DC clustering algorithm [36]. This algorithm combines the MSInc-Clust with the algorithm for finding \(\inf \)-stationary points given in Fig. 3.7. The latter algorithm, in turn, applies the NDCM presented in Fig. 3.8.

Then we present the DC diagonal bundle clustering algorithm [170]. Similar to the incremental nonsmooth DC clustering algorithm, the DC diagonal bundle clustering algorithm is a combination of the MSInc-Clust and an NSO method. However, here we apply the DCD-Bundle given in Fig. 3.6 instead of the NDCM.

Finally, we describe the incremental DCA for clustering [20]. The algorithm is a combination of the DCA (see Fig. 3.9) and the MSInc-Clust.

2 Incremental Nonsmooth DC Clustering Algorithm

The incremental nonsmooth DC clustering algorithm (NDC-Clust) is a combination of three different algorithms. The MSInc-Clust is used to solve the clustering problem globally. At each iteration of the MSInc-Clust, the algorithm for finding \(\inf \)-stationary points is applied to solve both the clustering and the auxiliary clustering problems. The latter algorithm, in turn, uses the NDCM to find Clarke stationary points of these problems. The flowchart of the NDC-Clust is given in Fig. 9.1.

Fig. 9.1 Incremental nonsmooth DC clustering algorithm (NDC-Clust)

Next, we present a detailed description of the NDC-Clust. For a given point \(\mathbf {x} \in \mathbb {R}^n\) and a number λ > 0, consider the set

$$\displaystyle \begin{aligned} Q_1(\mathbf{x},\lambda) = \operatorname{\mathrm{conv}} \big\{\nabla f_1(\mathbf{x}+\lambda \mathbf{g}): ~\mathbf{g} \in S_1 \big\}, \end{aligned}$$

where \(S_1\) is the unit sphere. The set \(Q_1(\mathbf{x},\lambda)\) is clearly convex and, since the function \(f_1\) is smooth, it is also compact for any \(\mathbf {x} \in \mathbb {R}^n\) and λ > 0.
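As a concrete illustration, consider the first DC component \(\bar{f}_{k1}\) of the auxiliary cluster function (see Lemma 9.1 below): its gradient is affine, so \(\nabla \bar{f}_{k1}(\mathbf{x}+\lambda \mathbf{g}) = \nabla \bar{f}_{k1}(\mathbf{x}) + 2\lambda \mathbf{g}\) and the set \(Q_1\) has the explicit form

$$\displaystyle \begin{aligned} Q_1(\mathbf{x},\lambda) = \operatorname{\mathrm{conv}} \big\{\nabla \bar{f}_{k1}(\mathbf{x}) + 2\lambda \mathbf{g}: ~\mathbf{g} \in S_1 \big\} = B\big(\nabla \bar{f}_{k1}(\mathbf{x});2\lambda\big), \end{aligned}$$

that is, the closed ball of radius 2λ centered at the gradient \(\nabla \bar{f}_{k1}(\mathbf{x})\).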

Recall that a point \({\mathbf {x}}^* \in \mathbb {R}^n\) is called a (λ, δ)-\(\inf \)-stationary point of the problem (9.1) if and only if

$$\displaystyle \begin{aligned} \partial f_2({\mathbf{x}}^*) \subset Q_1({\mathbf{x}}^*,\lambda) + B({\pmb 0};\delta), \end{aligned}$$

and a (λ, δ)-stationary point if there exists \({\boldsymbol{\xi}}_2 \in \partial f_2({\mathbf{x}}^*)\) such that

$$\displaystyle \begin{aligned} {\boldsymbol{\xi}}_2 \in Q_1({\mathbf{x}}^*,\lambda) + B({\pmb 0};\delta). \end{aligned}$$

If a point \(\mathbf {x} \in \mathbb {R}^n\) is not a (λ, δ)-stationary point, then \(\|{\boldsymbol{\xi}}_2 - \mathbf{z}\| \geq \delta\) for all \({\boldsymbol{\xi}}_2 \in \partial f_2(\mathbf{x})\) and \(\mathbf{z} \in Q_1(\mathbf{x},\lambda)\). Take any \({\boldsymbol{\xi}}_2 \in \partial f_2(\mathbf{x})\) and construct the set

$$\displaystyle \begin{aligned}\widetilde{Q}(\mathbf{x},\lambda,{\boldsymbol{\xi}}_2)=Q_1(\mathbf{x},\lambda)-{\boldsymbol{\xi}}_2, \end{aligned}$$

then we have

$$\displaystyle \begin{aligned} f(\mathbf{x}+\lambda \mathbf{u}) -f(\mathbf{x}) \leq \lambda \max_{\mathbf{z} \in \widetilde{Q}(\mathbf{ x},\lambda,{{\boldsymbol{\xi}}}_2)} {\mathbf{z}}^T \mathbf{u} \quad \mbox{for all} \quad \mathbf{u} \in \mathbb{R}^n. \end{aligned}$$

It is shown in Proposition 3.9 that if the point x is not (λ, δ)-stationary, then the set \(\widetilde {Q}(\mathbf {x},\lambda ,{\boldsymbol {\xi }}_2)\) can be used to find a direction of sufficient decrease of the function f at x. However, the computation of this set is not always possible. Next, we give a step-by-step algorithm which uses a finite number of elements from \(\widetilde {Q}(\mathbf {x},\lambda ,{\boldsymbol {\xi }}_2)\) to compute descent directions, (λ, δ)-stationary points, and eventually Clarke stationary points of the problem (9.1). The flowchart and a more detailed description of this method (NDCM) are given in Sect. 3.6. Here, we use \({\mathbf{x}}_1\) for the starting point; ε > 0 for the stopping tolerance; and \(\varepsilon_L\) and \(\varepsilon_R\) for line search parameters.
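Although the set \(\widetilde {Q}(\mathbf {x},\lambda ,{\boldsymbol {\xi }}_2)\) cannot be computed in full, its role is easy to illustrate with a finitely sampled approximation. The sketch below is ours and only mimics the principle behind the NDCM, which builds its sample adaptively rather than randomly (see Sect. 3.6); all names are hypothetical:

```python
import numpy as np

def min_norm_in_hull(Z, iters=200):
    """Approximate the min-norm element of conv{rows of Z} by the
    Frank-Wolfe method (a simple stand-in for Wolfe's procedure)."""
    w = Z[0].astype(float).copy()
    for _ in range(iters):
        i = int(np.argmin(Z @ w))          # linear minimization oracle
        d = Z[i] - w
        denom = float(d @ d)
        if denom == 0.0:
            break
        t = min(max(-(w @ d) / denom, 0.0), 1.0)  # exact line search on [0, 1]
        w = w + t * d
    return w

def stationarity_test(grad_f1, xi2, x, lam, delta, num_dirs=50, seed=0):
    """Sketch of the (lam, delta)-stationarity test: sample finitely many
    elements of Q1(x, lam) - xi2 and check whether the (approximate)
    min-norm element of their convex hull lies within delta of zero."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(num_dirs, x.size))
    G /= np.linalg.norm(G, axis=1, keepdims=True)   # directions on S1
    Z = np.array([grad_f1(x + lam * g) for g in G]) - xi2
    w = min_norm_in_hull(Z)
    if np.linalg.norm(w) <= delta:
        return None                # x passes the (lam, delta) test
    return -w / np.linalg.norm(w)  # candidate direction of decrease
```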

The convergence results for Algorithm 9.1 are given in Sect. 3.6. The next two propositions recall the most important results in light of the clustering problem.

Proposition 9.1

Algorithm 9.1 finds (λ, δ)-stationary points of the clustering and the auxiliary clustering problems in at most \(h_{\max }\) iterations, where

$$\displaystyle \begin{aligned}h_{\max}=\left\lceil\frac{f({\mathbf{x}}_1)}{\lambda \delta\varepsilon_R}\right\rceil. \end{aligned}$$

Proof

The proof follows from Proposition 3.10 and the fact that \(f^*=\inf \{f(\mathbf {x}),~ \mathbf {x}\in \mathbb {R}^n\}>0\) for both the clustering and the auxiliary clustering problems. □
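For a rough feel of this bound, take the hypothetical values \(f({\mathbf{x}}_1)=10\), λ = 0.1, δ = 0.05, and \(\varepsilon_R = 0.25\); then

$$\displaystyle \begin{aligned}h_{\max}=\left\lceil\frac{10}{0.1 \cdot 0.05 \cdot 0.25}\right\rceil = 8000, \end{aligned}$$

so the guaranteed iteration count grows rapidly as λ, δ, or \(\varepsilon_R\) decreases.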

Proposition 9.2

Assume that ε = 0. Then all limit points of the sequence \(\{{\mathbf{x}}_h\}\) generated by Algorithm 9.1 are Clarke stationary points of the clustering or the auxiliary clustering problem.

An algorithm for finding \(\inf \)-stationary points of the problem (9.1) is presented next (see also Fig. 3.7). Assume that \({\mathbf{x}}^*\) is a Clarke stationary point found by Algorithm 9.1. If the subdifferential \(\partial f_2({\mathbf{x}}^*)\) is a singleton, then according to Proposition 3.7 the point is also an \(\inf \)-stationary point.

Algorithm 9.1 Nonsmooth DC algorithm

If the subdifferential \(\partial f_2({\mathbf{x}}^*)\) is not a singleton, Corollary 3.3 implies that the point \({\mathbf{x}}^*\) is not \(\inf \)-stationary. Then, according to Proposition 3.6, a descent direction from this point can be computed which, in turn, allows us to find a new starting point for Algorithm 9.1.

Algorithm 9.2 Finding inf-stationary points of clustering problems

Note that if the subdifferential \(\partial f_2(\mathbf{x})\) is not a singleton, then two subgradients \({\boldsymbol {\xi }}_2^1,{\boldsymbol {\xi }}_2^2 \in \partial f_2(\mathbf {x})\) such that \({\boldsymbol {\xi }}_2^1 \ne {\boldsymbol {\xi }}_2^2\) can be computed as described in Remarks 4.2 and 4.6. In addition, the following lemmas show that the gradients of the functions \(\bar {f}_{k1}\) and \(f_{k1}\), given in (4.33) and (4.19), respectively, satisfy the Lipschitz condition.

Lemma 9.1

The gradient of the function \(\bar {f}_{k1}\) satisfies the Lipschitz condition on \(\mathbb {R}^n\) with the constant L = 2.

Proof

Recall that the gradient of the function \(\bar {f}_{k1}\) at a point \(\mathbf {y} \in \mathbb {R}^n\) is

$$\displaystyle \begin{aligned} \nabla \bar{f}_{k1}(\mathbf{y}) = \frac{2}{m} \sum_{\mathbf{a} \in A} (\mathbf{y}-\mathbf{a}). \end{aligned}$$

Then for any \({\mathbf {y}}_1, {\mathbf {y}}_2 \in \mathbb {R}^n\) we get

$$\displaystyle \begin{aligned}\nabla \bar{f}_{k1}({\mathbf{y}}_1) - \nabla \bar{f}_{k1}({\mathbf{y}}_2)= 2({\mathbf{y}}_1-{\mathbf{y}}_2). \end{aligned}$$

Therefore,

$$\displaystyle \begin{aligned}\|\nabla \bar{f}_{k1}({\mathbf{y}}_1) - \nabla \bar{f}_{k1}({\mathbf{y}}_2)\| = 2\|{\mathbf{y}}_1-{\mathbf{y}}_2\|,\end{aligned}$$

that is the gradient \(\nabla \bar {f}_{k1}\) satisfies the Lipschitz condition on \(\mathbb {R}^n\) with the constant L = 2. □
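This identity is easy to verify numerically; a minimal sketch with synthetic data (the function name and the data are ours):

```python
import numpy as np

def grad_fbar_k1(y, A):
    """Gradient of the auxiliary first DC component at y (d2 case):
    (2/m) * sum_{a in A} (y - a) = 2 * (y - mean(A))."""
    return 2.0 * (y - A.mean(axis=0))

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))            # synthetic data set
y1, y2 = rng.normal(size=3), rng.normal(size=3)
lhs = np.linalg.norm(grad_fbar_k1(y1, A) - grad_fbar_k1(y2, A))
assert np.isclose(lhs, 2.0 * np.linalg.norm(y1 - y2))   # L = 2 holds exactly
```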

Lemma 9.2

The gradient of the function \(f_{k1}\) satisfies the Lipschitz condition on \(\mathbb {R}^{nk}\) with the constant L = 2.

Proof

The proof is similar to that of Lemma 9.1. □

Considering the clustering problems, we now obtain the following result.

Proposition 9.3

Algorithm 9.2 terminates after a finite number of iterations at an approximate \(\inf \) -stationary point of the (auxiliary) clustering problem.

Proof

The proof follows directly from Proposition 3.8 and Lemmas 9.1 and 9.2. □

Algorithm 9.3 Incremental nonsmooth DC clustering algorithm (NDC-Clust)

Now we are ready to give the NDC-Clust for solving the problem (9.1). The NDC-Clust first uses Algorithm 7.2 to generate a set of promising starting points for the auxiliary clustering problem. Then Algorithm 9.2 is utilized to solve both the clustering and the auxiliary clustering problems. This algorithm, in turn, applies Algorithm 9.1 to find Clarke stationary points of these problems. The NDC-Clust is described in Algorithm 9.3.

Remark 9.1

Algorithm 9.3 can be used to solve clustering problems with the distance functions \(d_1\) and \(d_\infty\) if we apply the partial smoothing to the functions \(f_k\) and \(\bar {f}_k\), described in Sects. 4.7.4 and 4.7.5, respectively (see [23]). More specifically, if we approximate the first component of the (auxiliary) cluster function by applying a smoothing technique, then Algorithm 9.3 becomes applicable to clustering problems with the distance functions \(d_1\) and \(d_\infty\).
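As an illustration of the smoothing idea only (not the exact construction of Sects. 4.7.4 and 4.7.5), one common device is hyperbolic smoothing, which replaces each absolute value in the \(d_1\) distance by a smooth approximation; the name and the smoothing parameter τ below are our choices:

```python
import numpy as np

def d1_hyperbolic(x, a, tau=1e-3):
    """Smooth approximation of d1(x, a) = sum_i |x_i - a_i|: each |t| is
    replaced by sqrt(t**2 + tau**2), which is continuously differentiable
    and converges to |t| as tau -> 0."""
    t = np.asarray(x, dtype=float) - np.asarray(a, dtype=float)
    return np.sqrt(t * t + tau * tau).sum()
```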

3 DC Diagonal Bundle Clustering Algorithm

In this section, we describe the DC diagonal bundle clustering algorithm (DCDB-Clust) for solving the problem (9.1) in large data sets [170]. The algorithm is a combination of three different algorithms. The MSInc-Clust is used to solve the clustering problem globally. At each iteration of this algorithm, a modified version of the algorithm for finding \(\inf \)-stationary points (Algorithm 9.2) is applied to solve both the clustering and the auxiliary clustering problems. The latter algorithm uses the DCD-Bundle to find Clarke stationary points of these problems. The flowchart of the DCDB-Clust is given in Fig. 9.2.

Fig. 9.2 DC diagonal bundle clustering algorithm (DCDB-Clust)

The DCD-Bundle is developed specifically to solve the clustering problems that are formulated as nonsmooth DC optimization problems. The flowchart and more details of this method are given in Sect. 3.5. Here, we give the algorithm in its step-by-step form. We use \({\mathbf{x}}_1\) for the starting point; \(\varepsilon_c > 0\) for the stopping tolerance; \(\varepsilon_L\) and \(\varepsilon_R\) for line search parameters; \(\gamma\) for the distance measure parameter; and \(\hat {m}_c\) for the maximum number of stored correction vectors used to form diagonal updates. We also use \(i_{type}\) to indicate the type of the problem, that is:

  • \(i_{type} = 0\): the auxiliary clustering problem (7.4);

  • \(i_{type} = 1\): the clustering problem (7.2).

Algorithm 9.4 DC diagonal bundle algorithm

Algorithm 9.5 Finding inf-stationary points of clustering problems

The convergence properties of the DCD-Bundle are studied in Sect. 3.5. Here, we recall the most important results for clustering problems. Note that Assumptions 3.5 and 3.6 are trivially satisfied for both the cluster and the auxiliary cluster functions.

Proposition 9.4

Assume \(\varepsilon_c = 0\). If Algorithm 9.4 terminates at the hth iteration, then the point \({\mathbf{x}}_h\) is a Clarke stationary point of the (auxiliary) clustering problem.

Proposition 9.5

Assume \(\varepsilon_c = 0\). Every accumulation point of the sequence \(\{{\mathbf{x}}_h\}\) generated by Algorithm 9.4 is a Clarke stationary point of the (auxiliary) clustering problem.

If the function \(f_2\) in the problem (9.1) is smooth, then the point found by Algorithm 9.4 is also \(\inf \)-stationary. Otherwise, a slight modification of Algorithm 9.2 is applied to find an \(\inf \)-stationary point of the problem. This modification is given in Algorithm 9.5.

If the subdifferential \(\partial f_2(\mathbf{x})\) is not a singleton, then we can compute two different subgradients \({\boldsymbol {\xi }}_2^1,{\boldsymbol {\xi }}_2^2 \in \partial f_2(\mathbf {x})\) in Step 4 of Algorithm 9.5 (see Remarks 4.2 and 4.6). In addition, in Lemmas 9.1 and 9.2, we proved that the gradients of the functions \(\bar {f}_{k1}\) and \(f_{k1}\) (see (4.33) and (4.19)) satisfy the Lipschitz condition. Then we get the following convergence result for clustering problems.

Proposition 9.6

Algorithm 9.5 terminates after a finite number of iterations at an approximate \(\inf \) -stationary point of the (auxiliary) clustering problem.

Proof

The proof follows directly from Proposition 3.8 and Lemmas 9.1 and 9.2. □

Next, we give the step-by-step description of the DCDB-Clust.

Algorithm 9.6 DC diagonal bundle clustering algorithm (DCDB-Clust)

Remark 9.2

Similar to Algorithm 9.3, Algorithm 9.6 can be applied to solve clustering problems with the distance functions \(d_1\) and \(d_\infty\) if we apply the partial smoothing to the cluster function \(f_k\) and the auxiliary cluster function \(\bar {f}_k\).

4 Incremental DCA for Clustering

In this section, we describe an incremental DCA for clustering (IDCA-Clust) to solve the clustering problem (9.1) [20]. The IDCA-Clust is based on the MSInc-Clust and the DCA, where the latter algorithm is utilized at each iteration of the MSInc-Clust to solve the clustering and the auxiliary clustering problems. Figure 9.3 illustrates the flowchart of the IDCA-Clust.

Fig. 9.3 Incremental DCA for clustering (IDCA-Clust)

First, we recall the DCA for solving the unconstrained DC programming problem (9.1) when the first DC component \(f_1\) is continuously differentiable.

Algorithm 9.7 DC algorithm

Next, we explain how this algorithm can be applied to solve the clustering and the auxiliary clustering problems of the form (9.1). We start with the clustering problem. Let \({\mathbf {x}}_h=({\mathbf {x}}_{h,1},\ldots ,{\mathbf {x}}_{h,k}) \in \mathbb {R}^{nk}\) be the vector of cluster centers at iteration h, and let \(A^1, \ldots, A^k\) be the cluster partition of the data set A provided by these centers.

We discussed the subdifferentials of the functions \(f_1\) and \(f_2\) in Sect. 4.4. Here, we recall them when the similarity measure \(d_2\) is used in these functions. In this case, the function \(f_1\) is continuously differentiable and we have

$$\displaystyle \begin{aligned} \nabla f_{k1}(\mathbf{x}) = 2(\mathbf{x}-\tilde{\mathbf{a}}), \quad \mathbf{x} \in \mathbb{R}^{nk}, \end{aligned}$$

where \(\tilde {\mathbf {a}} = (\bar {\mathbf {a}},\ldots ,\bar {\mathbf {a}})\) and \(\bar {\mathbf {a}} = \frac {1}{m} \sum _{i=1}^m {\mathbf {a}}_i.\)

For the subdifferential of the function \(f_2\), recall the function \(\varphi_{\mathbf{a}}(\mathbf{x})\) and the set \(\widetilde {\mathcal {R}}_{\mathbf {a}}(\mathbf {x}),~\mathbf {x} \in \mathbb {R}^{nk}\), defined in (4.22) and (4.23), respectively:

$$\displaystyle \begin{aligned} \varphi_{\mathbf{a}}(\mathbf{x}) = \max_{j=1,\ldots,k} \sum_{s=1, s\neq j}^k ~d_2({\mathbf{x}}_s,\mathbf{a}), \end{aligned}$$

and

$$\displaystyle \begin{aligned} \widetilde{\mathcal{R}}_{\mathbf{a}}(\mathbf{x}) = \Big\{j \in \{1,\ldots,k\}~:~\sum_{s=1, s\neq j}^k ~d_2({\mathbf{x}}_s,\mathbf{a}) = \varphi_{\mathbf{a}}(\mathbf{x}) \Big\}. \end{aligned}$$

Then we have

$$\displaystyle \begin{aligned} \partial \varphi_{\mathbf{a}}(\mathbf{x}) = \operatorname{\mathrm{conv}} &\Big\{2\big({\mathbf{x}}_1-\mathbf{a},\ldots,{\mathbf{x}}_{j-1} - \mathbf{a},{\pmb 0}, {\mathbf{x}}_{j+1} - \mathbf{a},\ldots,{\mathbf{x}}_k - \mathbf{a}\big), \\ &j \in \widetilde{\mathcal{R}}_{\mathbf{a}}(\mathbf{x}) \Big\}, \end{aligned} $$

and

$$\displaystyle \begin{aligned} \partial f_{k2}(\mathbf{x}) = \frac{1}{m} \sum_{\mathbf{a} \in A} \partial \varphi_{\mathbf{a}}(\mathbf{x}). \end{aligned}$$
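A single subgradient of \(f_{k2}\) is cheap to assemble from these formulas. The following sketch (ours) picks, for each data point, one index from the set \(\widetilde{\mathcal{R}}_{\mathbf{a}}(\mathbf{x})\); since \(\sum_{s \neq j} d_2({\mathbf{x}}_s,\mathbf{a})\) is maximized exactly when \(d_2({\mathbf{x}}_j,\mathbf{a})\) is minimized, the nearest center provides such an index:

```python
import numpy as np

def subgradient_fk2(x, A):
    """Assemble one subgradient of f_k2 at x (d2 case): for each data
    point a, choose j as the nearest center (an index of the tie set),
    zero the j-th block, and average 2*(x_s - a) over the other blocks."""
    m, k = A.shape[0], x.shape[0]
    d = ((A[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)   # (m, k)
    xi = np.zeros_like(x, dtype=float)
    for i in range(m):
        j = int(d[i].argmin())      # maximizes the sum over s != j
        for s in range(k):
            if s != j:
                xi[s] += 2.0 * (x[s] - A[i])
    return xi / m
```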

Applying these formulas for the subdifferentials, the subgradient \({\boldsymbol{\xi}}_{2,h} \in \partial f_2({\mathbf{x}}_h)\) in Step 2 of Algorithm 9.7 is

$$\displaystyle \begin{aligned} {\boldsymbol{\xi}}_{2,h} &= \frac{2}{m} \Big(\sum_{\mathbf{a}\in A\setminus A^1} ({\mathbf{x}}_{h,1}-\mathbf{a}),\ldots,\sum_{\mathbf{a} \in A\setminus A^k} ({\mathbf{x}}_{h,k}-\mathbf{a}) \Big)\\ &= \frac{2}{m} \Big((m-|A^1|) {\mathbf{x}}_{h,1} -(m\bar{\mathbf{a}}-|A^1| \bar{\mathbf{a}}_1),\ldots, \\ &\quad \quad \quad (m-|A^k|) {\mathbf{x}}_{h,k}-(m\bar{\mathbf{a}}-|A^k| \bar{\mathbf{a}}_k)\Big), \end{aligned} $$

where \(\bar {\mathbf {a}}_l\) is the center of the cluster \(A^l\), l = 1, …, k, and \(\bar {\mathbf {a}}\) is the center of the whole set A. In addition, the solution \({\mathbf{x}}_{h+1} = ({\mathbf{x}}_{h+1,1},\ldots,{\mathbf{x}}_{h+1,k})\) to the convex subproblem in Step 4 of Algorithm 9.7 is

$$\displaystyle \begin{aligned}{\mathbf{x}}_{h+1,t} = \Big(1-\frac{|A^t|}{m} \Big){\mathbf{x}}_{h,t} + \frac{|A^t|}{m} \bar{\mathbf{a}}_t, \quad t=1,\ldots,k, \end{aligned}$$

and the stopping criterion in Step 3 of this algorithm can be given as

$$\displaystyle \begin{aligned}{\mathbf{x}}_{h,t} = \Big(1-\frac{|A^t|}{m} \Big){\mathbf{x}}_{h,t} + \frac{|A^t|}{m} \bar{\mathbf{a}}_t, \quad t=1,\ldots,k. \end{aligned}$$
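Hence one DCA iteration for the clustering problem reduces to a closed-form update of each center. A minimal sketch under the \(d_2\) measure (the function name and the empty-cluster guard are ours):

```python
import numpy as np

def dca_clustering_step(x, A):
    """One DCA iteration for the k-clustering problem with d2: assign
    each point to its nearest center, then apply the closed-form update
    x_{h+1,t} = (1 - |A^t|/m) x_{h,t} + (|A^t|/m) * abar_t."""
    m = A.shape[0]
    d = ((A[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)   # (m, k)
    labels = d.argmin(axis=1)                # the partition A^1, ..., A^k
    x_new = x.astype(float).copy()
    for t in range(x.shape[0]):
        At = A[labels == t]
        if At.shape[0] > 0:                  # |A^t| = 0 leaves x_t unchanged
            w = At.shape[0] / m
            x_new[t] = (1.0 - w) * x[t] + w * At.mean(axis=0)
    return x_new
```

Iterating this map until \({\mathbf{x}}_{h+1}={\mathbf{x}}_h\) realizes the stopping criterion of Step 3; note that the update is a damped version of the usual k-means centroid step, since each new center is a convex combination of the old center and the cluster centroid.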

In order to apply Algorithm 9.7 for solving the auxiliary clustering problem, recall the sets B i(y), i = 1, 2, 3, defined in (4.30) for p = 2 and \(\mathbf {y} ={\mathbf {x}}_h \in \mathbb {R}^n\):

$$\displaystyle \begin{aligned} B_1({\mathbf{x}}_h)&=\big\{\mathbf{a} \in A: ~r_{l-1}^{\mathbf{a}} < d_2({\mathbf{x}}_h,\mathbf{a})\big\}, \\ B_2({\mathbf{x}}_h)&=\big\{\mathbf{a} \in A: ~r_{l-1}^{\mathbf{a}} = d_2({\mathbf{x}}_h,\mathbf{a})\big\}, \quad \mbox{and} \\ B_3({\mathbf{x}}_h)&=\big\{\mathbf{a} \in A: ~r_{l-1}^{\mathbf{a}} > d_2({\mathbf{x}}_h,\mathbf{a})\big\}. \end{aligned} $$

Then the subgradient \({\boldsymbol{\xi}}_{2,h} \in \partial f_2({\mathbf{x}}_h)\) in Step 2 of Algorithm 9.7 is computed as

$$\displaystyle \begin{aligned}{\boldsymbol{\xi}}_{2,h} = \frac{2}{m} \sum_{\mathbf{a} \in B_1({\mathbf{x}}_h)} ({\mathbf{x}}_h-\mathbf{a}),\quad {\mathbf{x}}_h \in \mathbb{R}^n. \end{aligned}$$

Furthermore, the solution \({\mathbf{x}}_{h+1}\) to the convex subproblem in Step 4 is

$$\displaystyle \begin{aligned}{\mathbf{x}}_{h+1}=\frac{1}{m}\left(|B_1({\mathbf{x}}_h)| {\mathbf{x}}_h + \sum_{\mathbf{a} \in B_2({\mathbf{x}}_h) \cup B_3({\mathbf{x}}_h)} \mathbf{ a}\right). \end{aligned}$$

Finally, the stopping criterion in Step 3 of Algorithm 9.7 can be given by

$$\displaystyle \begin{aligned}\sum_{\mathbf{a} \in B_2({\mathbf{x}}_h) \cup B_3({\mathbf{x}}_h)} ({\mathbf{x}}_h-\mathbf{a}) = 0. \end{aligned}$$

These results demonstrate that there is no need to apply any optimization algorithm to solve the convex subproblem for either the DC clustering or the DC auxiliary clustering problem. In both cases, the solutions can be expressed explicitly.
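The auxiliary clustering problem admits the same closed-form treatment. A minimal sketch (ours; the array r_prev collects the values \(r_{l-1}^{\mathbf{a}}\)):

```python
import numpy as np

def dca_auxiliary_step(y, A, r_prev):
    """One DCA iteration for the auxiliary clustering problem (d2 case):
    split A into B1 (points kept by their old centers) and B2 u B3,
    then apply the closed-form update above."""
    m = A.shape[0]
    d = ((A - y) ** 2).sum(axis=1)           # d2(y, a) for every a in A
    in_B1 = r_prev < d                       # B1(y)
    rest = A[~in_B1]                         # B2(y) union B3(y)
    y_new = (in_B1.sum() * y + rest.sum(axis=0)) / m
    return y_new                             # y_new == y <=> stopping rule holds
```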

Algorithm 9.8 Incremental DCA for clustering (IDCA-Clust)

Proposition 9.7

All accumulation points of the sequence \(\{{\mathbf{x}}_h\}\) generated by Algorithm 9.7 are Clarke stationary points of the problem (9.1) when \(d_2\) is used as the similarity measure.

Proof

Every accumulation point of the sequence generated by Algorithm 9.7 is a critical point of the problem (9.1). Since the function \(f_1\) in the problem (9.1) with the similarity measure \(d_2\) is smooth, the sets of critical points and Clarke stationary points of this problem coincide (see Theorem 2.27 and Fig. 2.9), and the claim follows. □

Now, we are ready to design the IDCA-Clust. This algorithm is based on the MSInc-Clust and the DCA. The IDCA-Clust applies the MSInc-Clust to solve the clustering problem globally, and the DCA is utilized at each iteration of the MSInc-Clust to solve both the clustering and the auxiliary clustering problems. The step-by-step description of the IDCA-Clust is given in Algorithm 9.8.
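For completeness, a schematic driver (our simplification, not Algorithm 9.8 itself) that iterates either closed-form step above to a fixed point; the actual IDCA-Clust wraps such solves in the incremental loop of the MSInc-Clust, seeding each k with starting points generated by Algorithm 7.2:

```python
import numpy as np

def dca_solve(x, A, step, tol=1e-9, max_iter=1000):
    """Iterate a DCA step function (e.g., dca_clustering_step above, or
    dca_auxiliary_step with r_prev bound beforehand) until the
    fixed-point stopping criterion x_{h+1} = x_h is met."""
    for _ in range(max_iter):
        x_new = step(x, A)
        if np.linalg.norm(np.asarray(x_new) - np.asarray(x)) <= tol:
            return x_new
        x = x_new
    return x
```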

Remark 9.3

Similar to Algorithms 9.3 and 9.6, we can apply Algorithm 9.2 in Steps 4 and 5 of Algorithm 9.8. Then we obtain \(\inf \)-stationary points of both the clustering and the auxiliary clustering problems.