Abstract
This chapter is devoted to the clustering algorithms based on DC optimization approaches. The incremental DC clustering algorithm is studied first. Then the DC diagonal bundle clustering algorithm is described. Finally, the incremental DCA for clustering is presented. We provide the detailed descriptions and flowcharts of these algorithms.
Keywords
- DC optimization
- Incremental nonsmooth DC clustering
- DC diagonal bundle clustering
- Incremental DCA for clustering
1 Introduction
This chapter presents the clustering algorithms based on the DC optimization approaches. In Chap. 4, the clustering problems are formulated using the DC representation of their objective functions. Using this representation we describe three different DC optimization algorithms.
For simplicity we use the following unconstrained DC programming problem to represent both the clustering and the auxiliary clustering problems (4.20) and (4.34):
\[
\left\{\begin{array}{ll} \text{minimize} & f(\mathbf{x}) = f_1(\mathbf{x}) - f_2(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathbb{R}^n, \end{array}\right. \tag{9.1}
\]
where both f 1 and f 2 are finite valued convex functions on \(\mathbb {R}^n\). As mentioned before, if the squared Euclidean norm is used to define the similarity measure, then the function f 1 is smooth and the function f 2 is, in general, nonsmooth. However, with the other two similarity measures d 1 and d ∞, both functions are nonsmooth. In this chapter, we only consider the first case and present three different algorithms to solve the clustering problem (9.1).
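To make the decomposition concrete, the following sketch checks numerically that the cluster function with the d 2 similarity measure splits into a difference of two convex functions via the identity \(\min_j u_j = \sum_j u_j - \max_j \sum_{t \ne j} u_t\). The specific forms of f 1 and f 2 below are illustrative reconstructions of the decomposition discussed in Chap. 4, not a substitute for the exact formulas there:

```python
import random

def sq_dist(x, a):
    """Squared Euclidean (d_2) distance between two points."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def cluster_fn(centers, data):
    """Cluster function: mean distance from each point to its nearest center."""
    return sum(min(sq_dist(c, a) for c in centers) for a in data) / len(data)

def f1(centers, data):
    """First (smooth) DC component: mean of the sum of all center distances."""
    return sum(sum(sq_dist(c, a) for c in centers) for a in data) / len(data)

def f2(centers, data):
    """Second (nonsmooth) DC component: mean over points of
    max_j of the sum of distances with the j-th term removed."""
    total = 0.0
    for a in data:
        d = [sq_dist(c, a) for c in centers]
        total += max(sum(d) - dj for dj in d)
    return total / len(data)

random.seed(0)
data = [[random.gauss(0, 1) for _ in range(2)] for _ in range(50)]
centers = [[0.5, 0.5], [-1.0, 0.0], [2.0, -1.0]]
# The identity min_j u_j = sum_j u_j - max_j sum_{t != j} u_t gives f_k = f1 - f2.
gap = cluster_fn(centers, data) - (f1(centers, data) - f2(centers, data))
```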
We start with the incremental nonsmooth DC clustering algorithm [36]. This algorithm combines the MSInc-Clust with the algorithm for finding \(\inf \)-stationary points given in Fig. 3.7. The latter algorithm, in its turn, applies the NDCM presented in Fig. 3.8.
Then we present the DC diagonal bundle clustering algorithm [170]. Similar to the incremental DC clustering algorithm, the DC diagonal bundle clustering algorithm is a combination of the MSInc-Clust and the NSO methods. However, here we apply the DCD-Bundle given in Fig. 3.6 instead of the NDCM.
Finally, we describe the incremental DCA for clustering [20]. The algorithm is a combination of the DCA (see Fig. 3.9) and the MSInc-Clust.
2 Incremental Nonsmooth DC Clustering Algorithm
The incremental nonsmooth DC clustering algorithm (NDC-Clust) is a combination of three different algorithms. The MSInc-Clust is used to solve the clustering problem globally. At each iteration of this algorithm, the algorithm for finding \(\inf \)-stationary points is applied to solve both the clustering and the auxiliary clustering problems. In its turn, the latter algorithm uses the NDCM to find Clarke stationary points of these problems. The flowchart of NDC-Clust is given in Fig. 9.1.
Next, we present a detailed description of the NDC-Clust. For a given point \(\mathbf {x} \in \mathbb {R}^n\) and a number λ > 0, consider the set
where S 1 is the unit sphere. It is obvious that the set Q 1(x, λ) is convex and, since the function f 1 is smooth, it is also compact for any \(\mathbf {x} \in \mathbb {R}^n\) and λ > 0.
Recall that a point \({\mathbf {x}}^* \in \mathbb {R}^n\) is called (λ, δ)-\(\inf \)-stationary of the problem (9.1) if and only if
and (λ, δ)-stationary if there exists ξ 2 ∈ ∂f 2(x ∗) such that
If a point \(\mathbf {x} \in \mathbb {R}^n\) is not a (λ, δ)-stationary point, then ∥ξ 2 −z∥≥ δ for all ξ 2 ∈ ∂f 2(x) and z ∈ Q 1(x, λ). Take any ξ 2 ∈ ∂f 2(x) and construct the set
then we have
It is shown in Proposition 3.9 that if the point x is not (λ, δ)-stationary, then the set \(\widetilde {Q}(\mathbf {x},\lambda ,{\boldsymbol {\xi }}_2)\) can be used to find a direction of sufficient decrease of the function f at x. However, the computation of this set is not always possible. Next, we give a step-by-step algorithm which uses a finite number of elements from \(\widetilde {Q}(\mathbf {x},\lambda ,{\boldsymbol {\xi }}_2)\) to compute descent directions, (λ, δ)-stationary points, and eventually Clarke stationary points of the problem (9.1). The flowchart and a more detailed description of this method (NDCM) are given in Sect. 3.6. Here, we use x 1 for the starting point; ε > 0 for the stopping tolerance; ε L and ε R for line search parameters.
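A heavily simplified sketch of this idea is given below. The test function, the random sampling of unit directions, and the min-norm choice over a finite set (rather than over its convex hull, as the NDCM properly requires) are all illustrative assumptions and not the method of Algorithm 9.1 itself:

```python
import math, random

random.seed(1)

# Illustrative DC function f = f1 - f2 with smooth f1 and nonsmooth f2.
def grad_f1(x, lam, g):
    # f1(x) = ||x||^2, so grad f1(x + lam*g) = 2*(x + lam*g).
    return [2 * (xi + lam * gi) for xi, gi in zip(x, g)]

def xi2(x):
    # f2(x) = |x_1|; one subgradient is sign(x_1)*e_1 (zero at the kink).
    s = (x[0] > 0) - (x[0] < 0)
    return [float(s)] + [0.0] * (len(x) - 1)

def f(x):
    return sum(xi * xi for xi in x) - abs(x[0])

def unit(v):
    nv = math.sqrt(sum(vi * vi for vi in v))
    return [vi / nv for vi in v]

def ndcm_sketch(x, lam=1e-3, delta=1e-4, iters=200):
    for _ in range(iters):
        s = xi2(x)
        # Sample a few unit directions g in S_1 and collect elements
        # v = grad f1(x + lam*g) - xi_2 approximating Q~(x, lam, xi_2).
        vs = [[a - b for a, b in
               zip(grad_f1(x, lam, unit([random.gauss(0, 1) for _ in x])), s)]
              for _ in range(10)]
        v = min(vs, key=lambda w: sum(wi * wi for wi in w))  # crude min-norm choice
        if math.sqrt(sum(vi * vi for vi in v)) <= delta:
            break  # approximately (lam, delta)-stationary
        d = [-di for di in unit(v)]
        # Backtracking line search; stop the whole sketch if no decrease is found.
        t, moved = 1.0, False
        while t > 1e-10:
            y = [xi + t * di for xi, di in zip(x, d)]
            if f(y) < f(x) - 1e-4 * t:
                x, moved = y, True
                break
            t *= 0.5
        if not moved:
            break
    return x

x_final = ndcm_sketch([2.0, 1.5])
```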
The convergence results for Algorithm 9.1 are given in Sect. 3.6. The next two propositions recall the most important results in light of the clustering problem.
Proposition 9.1
Algorithm 9.1 finds (λ, δ)-stationary points of the clustering and the auxiliary clustering problems in at most \(h_{\max }\) iterations where
Proof
The proof follows from Proposition 3.10 and the fact that \(f^*=\inf \{f(\mathbf {x}),~ \mathbf {x}\in \mathbb {R}^n\}>0\) for both the clustering and the auxiliary clustering problems. □
Proposition 9.2
Assume that ε = 0. Then all limit points of the sequence {x h} generated by Algorithm 9.1 are Clarke stationary points of the clustering or the auxiliary clustering problem.
An algorithm for finding \(\inf \)-stationary points of the problem (9.1) is presented next (see also Fig. 3.7). Assume that x ∗ is a Clarke stationary point found by Algorithm 9.1. If the subdifferential ∂f 2(x ∗) is a singleton, then according to Proposition 3.7 the point is also an \(\inf \)-stationary point.
Algorithm 9.1 Nonsmooth DC algorithm
If the subdifferential ∂f 2(x ∗) is not a singleton, Corollary 3.3 implies that the point x ∗ is not \(\inf \)-stationary. Then according to Proposition 3.6 a descent direction from this point can be computed which in turn allows us to find a new starting point for Algorithm 9.1.
Algorithm 9.2 Finding inf-stationary points of clustering problems
Note that if the subdifferential ∂f 2(x) is not a singleton, then two subgradients \({\boldsymbol {\xi }}_2^1,{\boldsymbol {\xi }}_2^2 \in \partial f_2(\mathbf {x})\) with \({\boldsymbol {\xi }}_2^1 \ne {\boldsymbol {\xi }}_2^2\) can be computed as described in Remarks 4.2 and 4.6. In addition, the following lemmas show that the gradients of the functions \(\bar {f}_{k1}\) and f k1, given respectively in (4.33) and (4.19), satisfy the Lipschitz condition.
Lemma 9.1
The gradient of the function \(\bar {f}_{k1}\) satisfies the Lipschitz condition on \(\mathbb {R}^n\) with the constant L = 2.
Proof
Recall that the gradient of the function \(\bar {f}_{k1}\) at a point \(\mathbf {y} \in \mathbb {R}^n\) is
Then for any \({\mathbf {y}}_1, {\mathbf {y}}_2 \in \mathbb {R}^n\) we get
Therefore,
that is, the gradient \(\nabla \bar {f}_{k1}\) satisfies the Lipschitz condition on \(\mathbb {R}^n\) with the constant L = 2. □
Lemma 9.2
The gradient of the function f k1 satisfies the Lipschitz condition on \(\mathbb {R}^{nk}\) with the constant L = 2.
Proof
The proof is similar to that of Lemma 9.1. □
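The constant L = 2 of Lemmas 9.1 and 9.2 can be verified numerically. The gradient form \(\nabla \bar{f}_{k1}(\mathbf{y}) = 2(\mathbf{y} - \bar{\mathbf{a}})\), with \(\bar{\mathbf{a}}\) the mean of the data, is assumed from the proof of Lemma 9.1:

```python
import math, random

random.seed(2)
data = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
m = len(data)
a_bar = [sum(p[j] for p in data) / m for j in range(3)]  # data mean

def grad(y):
    # Gradient form recalled in the proof: grad f_bar_k1(y) = 2*(y - a_bar).
    return [2 * (yj - aj) for yj, aj in zip(y, a_bar)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

y1 = [1.0, -2.0, 0.5]
y2 = [0.3, 4.0, -1.0]
diff = [g1 - g2 for g1, g2 in zip(grad(y1), grad(y2))]
# The ratio ||grad(y1) - grad(y2)|| / ||y1 - y2|| equals exactly 2.
ratio = norm(diff) / norm([a - b for a, b in zip(y1, y2)])
```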
Considering clustering problems, we can now obtain the following result.
Proposition 9.3
Algorithm 9.2 terminates after a finite number of iterations at an approximate \(\inf\)-stationary point of the (auxiliary) clustering problem.
Proof
The proof follows directly from Proposition 3.8 and Lemmas 9.1 and 9.2. □
Algorithm 9.3 Incremental nonsmooth DC clustering algorithm (NDC-Clust)
Now we are ready to give the NDC-Clust for solving the problem (9.1). The NDC-Clust first uses Algorithm 7.2 to generate a set of promising starting points for the auxiliary clustering problem. In addition, Algorithm 9.2 is utilized to solve both the clustering and the auxiliary clustering problems. This algorithm, in its turn, applies Algorithm 9.1 to find Clarke stationary points of the clustering problems. The NDC-Clust is described in Algorithm 9.3.
Remark 9.1
Algorithm 9.3 can be used to solve clustering problems with the distance functions d 1 and d ∞ if we apply the partial smoothing to the functions f k and \(\bar {f_k}\), described in Sects. 4.7.4 and 4.7.5, respectively (see [23]). More specifically, if we approximate the first component of the (auxiliary) cluster function by applying a smoothing technique, then Algorithm 9.3 becomes applicable to clustering problems with the distance functions d 1 and d ∞.
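As an illustration of the smoothing idea, the sketch below uses hyperbolic smoothing \(|t| \approx \sqrt{t^2 + \tau^2}\) of the absolute value; the particular smoothing function is an assumption for illustration and not necessarily the one used in Sects. 4.7.4 and 4.7.5:

```python
import math

def d1(x, a):
    """d_1 distance between two points."""
    return sum(abs(xj - aj) for xj, aj in zip(x, a))

def d1_smooth(x, a, tau):
    """Smooth approximation of d_1 via |t| ~ sqrt(t^2 + tau^2)."""
    return sum(math.sqrt((xj - aj) ** 2 + tau ** 2) for xj, aj in zip(x, a))

x, a = [1.0, -2.0, 0.5], [0.0, 1.0, 0.5]
# Per coordinate, sqrt(t^2 + tau^2) - |t| <= tau, so the total error is
# bounded by n*tau and vanishes as tau -> 0.
errors = [d1_smooth(x, a, tau) - d1(x, a) for tau in (1.0, 0.1, 0.01)]
```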
3 DC Diagonal Bundle Clustering Algorithm
In this section, we describe the DC diagonal bundle clustering algorithm (DCDB-Clust) for solving the problem (9.1) in large data sets [170]. The algorithm is a combination of three different algorithms. The MSInc-Clust is used to solve the clustering problem globally. At each iteration of this algorithm, a modified version of the algorithm for finding \(\inf \)-stationary points (Algorithm 9.2) is applied to solve both the clustering and the auxiliary clustering problems. The latter algorithm uses the DCD-Bundle to find Clarke stationary points of these problems. The flowchart of DCDB-Clust is given in Fig. 9.2.
The DCD-Bundle is developed specifically to solve the clustering problems that are formulated as nonsmooth DC optimization problems. The flowchart and more details of this method are given in Sect. 3.5. Here, we give the algorithm in its step-by-step form. We use x 1 for the starting point; ε c > 0 for the stopping tolerance; ε L and ε R for line search parameters; γ for the distance measure parameter; \(\hat {m}_c\) for the maximum number of stored correction vectors used to form diagonal updates. We also use i type to indicate the type of the problem, that is:

Algorithm 9.4 DC diagonal bundle algorithm
Algorithm 9.5 Finding inf-stationary points of clustering problems
The convergence properties of the DCD-Bundle are studied in Sect. 3.5. Here, we recall the most important results for clustering problems. Note that Assumptions 3.5–3.6 are trivially satisfied for both the cluster and the auxiliary cluster functions.
Proposition 9.4
Assume ε c = 0. If Algorithm 9.4 terminates at the hth iteration, then the point x h is a Clarke stationary point of the (auxiliary) clustering problem.
Proposition 9.5
Assume ε c = 0. Every accumulation point of the sequence {x h} generated by Algorithm 9.4 is a Clarke stationary point of the (auxiliary) clustering problem.
If the function f 2 in the problem (9.1) is smooth, then the point found by Algorithm 9.4 is also \(\inf \)-stationary. Otherwise, a slight modification of Algorithm 9.2 is applied to find an \(\inf \)-stationary point of the problem. This modification is given in Algorithm 9.5.
If the subdifferential ∂f 2(x) is not a singleton, then we can compute two different subgradients \({\boldsymbol {\xi }}_2^1,{\boldsymbol {\xi }}_2^2 \in \partial f_2(\mathbf {x})\) in Step 4 of Algorithm 9.5 (see Remarks 4.2 and 4.6). In addition, in Lemmas 9.1 and 9.2, we proved that the gradients of the functions \(\bar {f}_{k1}\) and f k1 (see (4.33) and (4.19)) satisfy the Lipschitz condition. Then we get the following convergence result for clustering problems.
Proposition 9.6
Algorithm 9.5 terminates after a finite number of iterations at an approximate \(\inf\)-stationary point of the (auxiliary) clustering problem.
Proof
The proof follows directly from Proposition 3.8 and Lemmas 9.1 and 9.2. □
Next, we give the step by step description of the DCDB-Clust.
Algorithm 9.6 DC diagonal bundle clustering algorithm (DCDB-Clust)
Remark 9.2
Similar to Algorithm 9.3, Algorithm 9.6 can be applied to solve clustering problems with the distance functions d 1 and d ∞ if we apply the partial smoothing to the cluster function f k and the auxiliary cluster function \(\bar {f_k}\).
4 Incremental DCA for Clustering
In this section, we describe an incremental DCA for clustering (IDCA-Clust) to solve the clustering problem (9.1) [20]. The IDCA-Clust is based on the MSInc-Clust and the DCA, where the latter algorithm is utilized at each iteration of the MSInc-Clust to solve the clustering and the auxiliary clustering problems. Figure 9.3 illustrates the flowchart of the IDCA-Clust.
First, we recall the DCA for solving the unconstrained DC programming problem (9.1) when the first DC component f 1 is continuously differentiable.
Algorithm 9.7 DC algorithm
Next, we explain how this algorithm can be applied to solve the clustering and the auxiliary clustering problems (9.1). We start with the clustering problem. Let \({\mathbf {x}}_h=({\mathbf {x}}_{h,1},\ldots ,{\mathbf {x}}_{h,k}) \in \mathbb {R}^{nk}\) be a vector of cluster centers at the iteration h and A 1, …, A k be the cluster partition of the data set A provided by these centers.
We discussed the subdifferentials of the functions f 1 and f 2 in Sect. 4.4. Here, we recall them when the similarity measure d 2 is used in these functions. In this case, the function f 1 is continuously differentiable and we have
where \(\tilde {\mathbf {a}} = (\bar {\mathbf {a}},\ldots ,\bar {\mathbf {a}})\) and \(\bar {\mathbf {a}} = \frac {1}{m} \sum _{i=1}^m {\mathbf {a}}_i.\)
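The gradient formula above can be checked against central finite differences. The form of f 1 below, as the averaged sum of the distances from every data point to every center, is an assumption reconstructed from the d 2 decomposition:

```python
import random

random.seed(3)
n, k, m = 2, 3, 30
data = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
a_bar = [sum(p[j] for p in data) / m for j in range(n)]

def f1(x):
    # f1(x) = (1/m) * sum_i sum_j ||x_j - a_i||^2, with x flattened
    # as k consecutive blocks of size n.
    total = 0.0
    for a in data:
        for j in range(k):
            total += sum((x[j * n + t] - a[t]) ** 2 for t in range(n))
    return total / m

x = [random.gauss(0, 1) for _ in range(n * k)]
# Analytic gradient: block j equals 2*(x_j - a_bar), i.e. grad f1(x) = 2*(x - a_tilde).
analytic = [2 * (x[j * n + t] - a_bar[t]) for j in range(k) for t in range(n)]
h = 1e-6
numeric = []
for i in range(n * k):
    xp = x[:]; xp[i] += h
    xm = x[:]; xm[i] -= h
    numeric.append((f1(xp) - f1(xm)) / (2 * h))
```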
For the subdifferential of the function f 2, recall the function φ a(x) and the set \(\widetilde {\mathcal {R}}_{\mathbf { a}}(\mathbf {x}),~\mathbf {x} \in \mathbb {R}^{nk}\), defined in (4.22) and (4.23), respectively:
and
Then we have
and
Applying these formulas for subdifferentials, the subgradient ξ 2,h ∈ ∂f 2(x h) in Step 2 of Algorithm 9.7 is
where \(\bar {\mathbf {a}}_l\) is the center of the cluster A l, l = 1, …, k and \(\bar {\mathbf {a}}\) is the center of the whole set A. In addition, the solution x h+1 = (x h+1,1, …, x h+1,k) to the convex subproblem in Step 4 of Algorithm 9.7 is
and the stopping criterion in Step 3 of this algorithm can be given as
In order to apply Algorithm 9.7 for solving the auxiliary clustering problem, recall the sets B i(y), i = 1, 2, 3, defined in (4.30) for p = 2 and \(\mathbf {y} ={\mathbf {x}}_h \in \mathbb {R}^n\):
Then the subgradient ξ 2,h ∈ ∂f 2(x h) in Step 2 of Algorithm 9.7 is computed as
Furthermore, the solution x h+1 to the convex subproblem in Step 4 is
Finally, the stopping criterion in Step 3 of Algorithm 9.7 can be given by
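The closed-form DCA step for the auxiliary clustering problem can be sketched as follows. The decomposition and the resulting update are illustrative reconstructions: points that stay with their previous centers contribute to the subgradient ξ 2,h, and the update follows from solving \(\nabla \bar{f}_{k1}(\mathbf{y}) = {\boldsymbol{\xi}}_{2,h}\) with \(\nabla \bar{f}_{k1}(\mathbf{y}) = 2(\mathbf{y} - \bar{\mathbf{a}})\):

```python
import random

random.seed(4)
data = [[random.gauss(i % 2 * 4.0, 1.0), random.gauss(0, 1)] for i in range(40)]
m = len(data)
old_centers = [[0.0, 0.0]]  # centers found at the previous incremental step

def sq(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

# r_a: squared distance from each point to its nearest existing center.
r = [min(sq(c, a) for c in old_centers) for a in data]
a_bar = [sum(p[j] for p in data) / m for j in range(2)]

def aux_fn(y):
    """Auxiliary cluster function: mean of min(r_a, ||y - a||^2)."""
    return sum(min(ra, sq(y, a)) for ra, a in zip(r, data)) / m

def dca_step(y):
    # Points that stay with their old centers (||y - a||^2 > r_a) contribute
    # 2*(y - a)/m to a subgradient xi_2 of the second DC component; solving
    # 2*(y_new - a_bar) = xi_2 gives the closed-form update below.
    far = [a for ra, a in zip(r, data) if sq(y, a) > ra]
    xi2_half = [sum(y[j] - a[j] for a in far) / m for j in range(2)]
    return [a_bar[j] + xi2_half[j] for j in range(2)]

y = [3.0, 1.0]
vals = [aux_fn(y)]
for _ in range(20):
    y = dca_step(y)
    vals.append(aux_fn(y))
```

As expected from the DCA theory, the auxiliary objective values are non-increasing along the iterates.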
These results demonstrate that there is no need to apply any optimization algorithm to solve the convex subproblem for both the DC clustering and the DC auxiliary clustering problems. In both cases, the solutions can be expressed explicitly.
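For the clustering problem itself, a sketch of the explicit DCA update is given below. Under the decomposition assumed in the earlier sketches, the update for center l works out to \(\mathbf{x}_{h+1,l} = ((m - m_l)\mathbf{x}_{h,l} + m_l \bar{\mathbf{a}}_l)/m\), a convex combination of the old center and its cluster mean; this derivation is illustrative and should be checked against the chapter's exact formulas:

```python
import random

random.seed(5)
data = [[random.gauss(c * 5.0, 1.0), random.gauss(0, 1)]
        for c in (0, 1) for _ in range(25)]
m, n, k = len(data), 2, 2

def sq(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def cluster_fn(centers):
    return sum(min(sq(c, a) for c in centers) for a in data) / m

def dca_step(centers):
    # Assign each point to its nearest center, then apply the explicit update
    # x_new_l = ((m - m_l) * x_l + m_l * a_bar_l) / m, where m_l and a_bar_l
    # are the size and the mean of cluster l.
    clusters = [[] for _ in range(k)]
    for a in data:
        clusters[min(range(k), key=lambda l: sq(centers[l], a))].append(a)
    new_centers = []
    for c, cl in zip(centers, clusters):
        ml = len(cl)
        if ml == 0:
            new_centers.append(c[:])  # empty cluster: keep the center
            continue
        mean = [sum(p[j] for p in cl) / ml for j in range(n)]
        new_centers.append([((m - ml) * c[j] + ml * mean[j]) / m for j in range(n)])
    return new_centers

centers = [[-1.0, 0.0], [1.0, 0.5]]
vals = [cluster_fn(centers)]
for _ in range(100):
    centers = dca_step(centers)
    vals.append(cluster_fn(centers))
```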
Algorithm 9.8 Incremental DCA for clustering (IDCA-Clust)
Proposition 9.7
All accumulation points of the sequence {x h} generated by Algorithm 9.7 are Clarke stationary points of the problem (9.1) when d 2 is used as a similarity measure.
Proof
The DCA is known to converge to critical points of DC programming problems. Since the function f 1 in the problem (9.1) with the similarity measure d 2 is smooth, the sets of critical points and Clarke stationary points of this problem coincide (see Theorem 2.27 and Fig. 2.9). □
Now, we are ready to design an IDCA-Clust. This algorithm is based on the MSInc-Clust and the DCA. The IDCA-Clust applies the MSInc-Clust for solving the clustering problem globally and the DCA is utilized at each iteration of the MSInc-Clust to solve both the clustering and the auxiliary clustering problems. The step by step description of the IDCA-Clust is given in Algorithm 9.8.
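The incremental structure of Algorithm 9.8 can be sketched as follows. A single farthest data point stands in for the set of auxiliary-problem starting points, and a plain assignment/mean refinement stands in for the DCA steps; these are deliberate simplifications for illustration, not the full procedure:

```python
import random

random.seed(6)
# Three well-separated groups of points.
data = [[random.gauss(cx, 0.5), random.gauss(cy, 0.5)]
        for cx, cy in ((0, 0), (6, 0), (3, 5)) for _ in range(20)]
m, n = len(data), 2

def sq(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

def cluster_fn(centers):
    return sum(min(sq(c, a) for c in centers) for a in data) / m

def refine(centers, iters=50):
    # Stand-in for the DCA refinement: alternate nearest-center assignment
    # and recomputation of each center as its cluster mean.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for a in data:
            clusters[min(range(len(centers)),
                         key=lambda l: sq(centers[l], a))].append(a)
        centers = [[sum(p[j] for p in cl) / len(cl) for j in range(n)] if cl else c
                   for c, cl in zip(centers, clusters)]
    return centers

# k = 1: the single center is the data mean; then add one center at a time.
centers = [[sum(p[j] for p in data) / m for j in range(n)]]
objectives = [cluster_fn(centers)]
for _ in range(2):  # grow to k = 3
    # Starting point for the new center: the point farthest from all current
    # centers, a crude stand-in for solving the auxiliary clustering problem.
    new = max(data, key=lambda a: min(sq(c, a) for c in centers))
    centers = refine(centers + [new[:]])
    objectives.append(cluster_fn(centers))
```

Each added center strictly reduces the clustering objective on this well-separated data, mirroring the incremental behavior of the MSInc-Clust framework.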
Remark 9.3
Similar to Algorithms 9.3 and 9.6, we can apply Algorithm 9.2 in Steps 4 and 5 of Algorithm 9.8. Then we obtain \(\inf \)-stationary points of both the clustering and the auxiliary clustering problems.
References
Bagirov, A.M.: An incremental DC algorithm for the minimum sum-of-squares clustering. Iran. J. Oper. Res. 5(1), 1–14 (2014)
Bagirov, A.M., Taheri, S.: A DC optimization algorithm for clustering problems with L 1-norm. Iran. J. Oper. Res. 8(2), 2–24 (2017)
Bagirov, A.M., Taheri, S., Ugon, J.: Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems. Pattern Recogn. 53, 12–24 (2016)
Karmitsa, N., Bagirov, A.M., Taheri, S.: New diagonal bundle method for clustering problems in large data sets. Eur. J. Oper. Res. 263(2), 367–379 (2017)
© 2020 Springer Nature Switzerland AG

Bagirov, A.M., Karmitsa, N., Taheri, S. (2020). DC Optimization Based Clustering Algorithms. In: Partitional Clustering via Nonsmooth Optimization. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-030-37826-4_9