1 Introduction

As we mentioned in Chap. 4, the clustering problem (4.3) is a nonsmooth global optimization problem and may have many local minimizers. Applying conventional global optimization techniques is not always a good choice since they are time-consuming for such problems, particularly on large data sets. Local methods are fast; however, depending on the choice of starting cluster centers, they may terminate at the closest local minimizer. Therefore, the success of these methods in solving clustering problems depends heavily on the choice of initial centers.

Since the second half of the 1980s, several algorithms have been introduced to choose favorable starting cluster centers for local search clustering algorithms, especially for the k-means algorithm [4, 14, 16, 19, 64, 190, 197]. In some of these algorithms, starting points are generated randomly using certain procedures. The incremental approach, by contrast, allows us to choose good starting points deterministically from different parts of the search space. The paper [106] is among the first to introduce an incremental clustering algorithm.

The existing incremental algorithms in cluster analysis can be divided, without loss of generality, into the following two classes:

  • algorithms where new data points are added at each iteration and cluster centers are refined accordingly. Such algorithms are called single pass incremental clustering algorithms; and

  • algorithms where clusters are built incrementally by adding one cluster center at a time. Such algorithms are called sequential clustering algorithms.

In the single pass incremental algorithms, new data points are presented as a sequence of items and can be examined only in a few passes (usually just one). At each iteration of these algorithms, clusters are updated according to the newly arrived data. These algorithms require limited memory and limited processing time per item (see [130] and the references therein).

In the second type of incremental algorithms, the data set is considered as static and clusters are computed incrementally. Such algorithms compute clusters step by step starting with one cluster for the whole data set and gradually adding one cluster center at each iteration [19, 26, 29, 142, 197]. In this book, we consider this type of incremental clustering algorithms.

The following three optimization problems are solved at each iteration of incremental clustering algorithms [229]:

  • problem of finding a center of one cluster;

  • auxiliary clustering problem, defined in (4.29), to obtain starting points for cluster centers; and

  • clustering problem, given in (4.3), to determine all cluster centers.

In this chapter, we discuss different approaches for solving each of these problems. In Sect. 7.2, we describe how the center of one cluster can be found. The general incremental clustering algorithm is given in Sect. 7.3. This algorithm involves solving the auxiliary clustering problem (4.29).

Since both the cluster and the auxiliary cluster functions are nonconvex, they may have a large number of local minimizers. Therefore, favorable starting points help us to obtain either global or nearly global solutions to clustering problems. We describe the algorithm for finding such starting points for cluster centers in Sect. 7.4. This algorithm generates a set of starting points for the cluster centers that guarantee the decrease of the cluster function at each iteration of the incremental algorithm. Section 7.5 presents the multi-start incremental clustering algorithm, an improvement of the general incremental algorithm that applies the algorithm for finding a set of starting cluster centers.

Finally, the incremental k-medians algorithm, together with a discussion of how to reduce its computational complexity, is given in Sect. 7.6. This algorithm is a modification of the k-medians algorithm, where the latter is used at each iteration of the multi-start incremental algorithm to solve the clustering problem (4.3).

2 Finding a Center of One Cluster

In Chap. 5, the problem of finding a center of a cluster is formulated as an optimization problem. Considering a cluster C, the problem of finding its center \(\mathbf {x} \in \mathbb {R}^n\) can be reformulated as follows:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad & \varphi(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathbb{R}^n, \end{cases} \end{aligned} $$
(7.1)

where

$$\displaystyle \begin{aligned}\varphi(\mathbf{x}) = \frac{1}{|C|} \sum_{\mathbf{c} \in C} d_p(\mathbf{x},\mathbf{c}). \end{aligned}$$

If the similarity measure \(d_2\) is used, then the centroid of the cluster C is the solution to the problem (7.1), and it can be computed explicitly. If the distance function \(d_1\) is applied, then according to Proposition 5.2 the median of the set C is a solution to this problem. This means that there is no need to solve the problem (7.1) numerically when the similarity measures \(d_1\) and \(d_2\) are applied in the clustering problem.

Next, we consider the problem (7.1) when the function \(d_\infty\) is used. Unlike the cases of the functions \(d_1\) and \(d_2\), there is no explicit formula for a solution to this problem with the function \(d_\infty\), and one needs to apply an optimization method to solve it. In this case, we have

$$\displaystyle \begin{aligned} \varphi(\mathbf{x}) = \frac{1}{|C|} \sum_{\mathbf{c} \in C} d_\infty(\mathbf{x},\mathbf{c}), \end{aligned}$$

and the subdifferential of the function φ at \(\mathbf {x} \in \mathbb {R}^n\) is

$$\displaystyle \begin{aligned}\partial \varphi(\mathbf{x}) = \frac{1}{|C|} \sum_{\mathbf{c} \in C} \partial d_\infty(\mathbf{x},\mathbf{c}), \end{aligned}$$

where the subdifferential \(\partial d_\infty(\mathbf{x},\mathbf{c})\) is given in (4.10) and (4.11). Recall that the necessary and sufficient condition for a point x to be a minimizer is 0 ∈ ∂φ(x).

For a moderately large number of points in the set C, the subdifferential ∂φ(x) may have a huge number of extreme points, and therefore the computation of the whole subdifferential is not an easy task. To solve the problem (7.1) in this case, we can apply versions of the bundle method that are finitely convergent for minimizing convex piecewise linear functions [32].

Another option is to use smoothing techniques: the function \(d_\infty\) is approximated by smooth functions, and the problem (7.1) is replaced by a sequence of smooth optimization problems. Then any smooth optimization method can be applied to solve these problems.
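The closed-form cases above and the \(d_\infty\) case can be made concrete with a short Python sketch. This is only an illustration of the smoothing idea, not the bundle method of [32]: the cluster C is assumed to be stored as a NumPy array of shape (|C|, n), and the smoothing parameter mu as well as the use of SciPy's BFGS solver are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def center_d2(C):
    # Centroid: exact minimizer of the mean squared Euclidean distance.
    return C.mean(axis=0)

def center_d1(C):
    # Component-wise median: exact minimizer of the mean L1 distance
    # (Proposition 5.2).
    return np.median(C, axis=0)

def center_dinf(C, mu=50.0):
    # Approximate minimizer of the mean Chebyshev (L_inf) distance.
    # max_j |x_j - c_j| is the maximum of the 2n affine functions
    # +/-(x_j - c_j); a log-sum-exp over these pieces is a smooth upper
    # approximation that tightens as mu grows.
    def phi_smooth(x):
        pieces = np.hstack([mu * (x - C), mu * (C - x)])   # shape (|C|, 2n)
        return np.mean(logsumexp(pieces, axis=1)) / mu
    x0 = C.mean(axis=0)                                    # start from the centroid
    return minimize(phi_smooth, x0, method="BFGS").x

C = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
print(center_d2(C), center_d1(C), center_dinf(C))
```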

3 General Incremental Clustering Algorithm

As we mentioned, the incremental approach provides an efficient way to generate starting cluster centers. In this section, we describe a general scheme of the incremental clustering algorithm (Inc-Clust) using the nonconvex nonsmooth optimization model of the clustering problem. Recall the clustering problem (4.3)

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize} \quad & f_k(\mathbf{x}) \\ \text{subject to} & \mathbf{x}=({\mathbf{x}}_1,\ldots,{\mathbf{x}}_k) \in \mathbb{R}^{nk}, \end{cases} \end{aligned} $$
(7.2)

where the function \(f_k\), given in (4.4), is

$$\displaystyle \begin{aligned} f_k(\mathbf{x})=\frac{1}{m}\sum_{i=1}^m \min_{j=1,\ldots,k} d_p({\mathbf{x}}_j,{\mathbf{a}}_i). \end{aligned} $$
(7.3)

We also recall the auxiliary clustering problem (4.29)

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad & \bar{f}_k(\mathbf{y}) \\ \text{subject to} & \mathbf{y} \in \mathbb{R}^n, \end{cases} \end{aligned} $$
(7.4)

where the function \(\bar {f}_k\), defined in (4.28), is

$$\displaystyle \begin{aligned} \bar{f}_k(\mathbf{y}) = \frac{1}{m} \sum_{i=1}^m \min \Big\{ r^i_{k-1}, d_p(\mathbf{y},{\mathbf{a}}_i) \Big\}, \end{aligned} $$
(7.5)

and \(r^i_{k-1}\), given in (4.27), is the distance between the data point \({\mathbf{a}}_i,~i=1,\ldots,m\), and its cluster center:

$$\displaystyle \begin{aligned} r^i_{k-1} = \min_{j=1, \ldots, k-1} d_p({\mathbf{x}}_j,{\mathbf{a}}_i). \end{aligned} $$
(7.6)
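For concreteness, a minimal NumPy sketch of these two objective functions is given below, assuming the squared Euclidean measure \(d_2\); the data set A is stored as an m × n array, X holds the current centers row by row, and y is a candidate lth center. The names and array layout are assumptions made only for illustration.

```python
import numpy as np

def d2(x, A):
    # Squared Euclidean similarity measure between a point x and each row of A.
    return np.sum((x - A) ** 2, axis=-1)

def cluster_fn(X, A):
    # f_k(x): mean distance of each data point to its closest center, (7.3).
    dist = np.array([d2(x, A) for x in X])                 # shape (k, m)
    return dist.min(axis=0).mean()

def aux_cluster_fn(y, X_prev, A):
    # \bar f_l(y): each point keeps its old distance r^i_{l-1} unless the
    # candidate center y is closer, (7.5)-(7.6).
    r_prev = np.array([d2(x, A) for x in X_prev]).min(axis=0)
    return np.minimum(r_prev, d2(y, A)).mean()
```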

The general scheme of the Inc-Clust for solving the k-partition problem (7.2) is given in Fig. 7.1 and Algorithm 7.1.

Fig. 7.1 Incremental clustering algorithm (Inc-Clust)

Algorithm 7.1 Incremental clustering algorithm (Inc-Clust)

Remark 7.1

In addition to the k-partition problem, Algorithm 7.1 also solves all intermediate l-partition problems, where l = 1, …, k − 1.

Steps 3 and 4 are the most important steps of Algorithm 7.1, where the auxiliary clustering problem (7.4) and the clustering problem (7.2), respectively, are solved. Since these problems are nonconvex, they may have a large number of local minimizers. In the next section, we describe a special procedure to generate favorable starting points for solving these problems. Such an approach allows us to find high-quality solutions to the clustering problem using local search methods.
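Since the individual steps of Algorithm 7.1 are not reproduced here, the following Python skeleton sketches the general scheme described above. The callables find_center, solve_aux, and solve_clustering are placeholders for whatever methods are used to find the center of one cluster, to solve the auxiliary clustering problem (7.4), and to solve the clustering problem (7.2); they are assumptions for illustration, not part of the original algorithm listing.

```python
def inc_clust(A, k, find_center, solve_aux, solve_clustering):
    # The 1-partition problem is solved by the center of the whole data set.
    centers = [find_center(A)]
    for l in range(2, k + 1):
        # Solve the auxiliary problem (7.4) to obtain a starting point y
        # for the lth center.
        y = solve_aux(centers, A)
        # Refine all l centers by solving the l-clustering problem (7.2).
        centers = solve_clustering(centers + [y], A)
    # All intermediate l-partition problems are solved along the way.
    return centers
```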

4 Computation of Set of Starting Cluster Centers

In this section, we first describe an algorithm for finding starting points for solving the auxiliary clustering problem (7.4). We assume that, for some l > 1, the solution \(({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1})\) to the (l − 1)-clustering problem is known. Consider the sets

$$\displaystyle \begin{aligned} & \bar{S}_1 = \big\{\mathbf{y} \in \mathbb{R}^n: ~ r^{\mathbf{a}}_{l-1} \leq d_p(\mathbf{y},\mathbf{a}) \quad \mbox{for all} \quad \mathbf{ a} \in A \big\},\quad \text{and} {} \end{aligned} $$
(7.7)
$$\displaystyle \begin{aligned} & \bar{S}_2 = \big\{\mathbf{y} \in \mathbb{R}^n:~r^{\mathbf{a}}_{l-1} > d_p(\mathbf{y},\mathbf{a})\quad \mbox{for some}\quad \mathbf{a} \in A \big\}. {} \end{aligned} $$
(7.8)

Here, \(r^{\mathbf {a}}_{l-1},~\mathbf {a} \in A\) is defined by (4.27). It is obvious that cluster centers \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_{l-1} \in \bar {S}_1\). The set \(\bar {S}_2\) contains all points \(\mathbf {y} \in \mathbb {R}^n\) which are not cluster centers and attract at least one point from the data set A.

Since the number l − 1 of clusters is less than the number m of data points in the set A, all data points which are not cluster centers belong to the set \(\bar{S}_2\) (because each such point attracts at least itself), and therefore the set \(\bar{S}_2\) is not empty. Obviously,

$$\displaystyle \begin{aligned}\bar{S}_1 \cap \bar{S}_2 = \emptyset \quad \mbox{and} \quad \bar{S}_1 \cup \bar{S}_2 = \mathbb{R}^n. \end{aligned}$$

Figure 7.2 illustrates the sets \(\bar{S}_1\) and \(\bar{S}_2\) where the similarity measure \(d_2\) is applied to find cluster centers. There are three clusters in this figure, and their centers are shown by red circles. The set \(\bar{S}_2\) consists of all points inside the three balls except the cluster centers, and the set \(\bar{S}_1\) contains the three cluster centers and the part of the space outside the balls.

Fig. 7.2 Illustration of sets \(\bar {S}_1\) and \(\bar {S}_2\)

Note that

$$\displaystyle \begin{aligned} \bar{f}_l (\mathbf{y}) &\leq \frac{1}{m}\sum_{\mathbf{a} \in A} r^{\mathbf{a}}_{l-1} \quad \mbox{for all} \quad \mathbf{y} \in \mathbb{R}^n, \quad \mbox{and}\\ \bar{f}_l (\mathbf{y}) &= f_{l-1}({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1}) = \frac{1}{m}\sum_{\mathbf{a} \in A} r^{\mathbf{a}}_{l-1} \quad \mbox{for all} \quad \mathbf{y} \in \bar{S}_1. \end{aligned} $$

This means that the lth auxiliary cluster function \(\bar{f}_l\) is constant on the set \(\bar{S}_1\), and any point from this set is a global maximizer of this function. In general, a local search method started at any of these points terminates there. Therefore, starting points for solving the auxiliary clustering problem (7.4) should not be chosen from the set \(\bar{S}_1\).

We introduce a special procedure which allows one to select starting points from the set \(\bar{S}_2\). Take any \(\mathbf{y} \in \bar{S}_2\) and consider the sets \(B_i(\mathbf{y}),~i=1,2,3\), defined in (4.30). Then the set A can be divided into two subsets \(\bar{B}_{12}(\mathbf{y})\) and \(\bar{B}_3(\mathbf{y})\), where

$$\displaystyle \begin{aligned} \bar{B}_{12}(\mathbf{y})= B_1(\mathbf{y}) \cup B_2(\mathbf{y}) \quad \mbox{and} \quad \bar{B}_3(\mathbf{y}) = B_3(\mathbf{y}). \end{aligned} $$
(7.9)

The set \(\bar {B}_3(\mathbf {y})\) contains all data points a ∈ A which are closer to the point y than to their cluster centers, and the set \(\bar {B}_{12}(\mathbf {y})\) contains all other data points. Since \(\mathbf {y} \in \bar {S}_2\) the set \(\bar {B}_3(\mathbf {y}) \neq \emptyset \). Furthermore,

$$\displaystyle \begin{aligned}\bar{B}_{12}(\mathbf{y}) \cap \bar{B}_3(\mathbf{y}) = \emptyset \quad \mbox{and}\quad A= \bar{B}_{12}(\mathbf{y})\cup \bar{B}_3(\mathbf{ y}).\end{aligned}$$

Figure 7.3 depicts the set \(\bar{B}_3(\mathbf{y})\) for a given point y (black ball). There are two clusters in this data set, and their centers are shown by red circles. The set \(\bar{B}_3(\mathbf{y})\) contains all yellow data points, and the set \(\bar{B}_{12}(\mathbf{y})\) contains the rest of the data set.

Fig. 7.3 Illustration of sets \(\bar {B}_{12}(\mathbf {y})\) and \(\bar {B}_3(\mathbf {y})\)

At a point \(\mathbf{y} \in \mathbb{R}^n\), using the sets \(\bar{B}_{12}(\mathbf{y})\) and \(\bar{B}_3(\mathbf{y})\), the lth auxiliary cluster function \(\bar{f}_l\) can be written as

$$\displaystyle \begin{aligned}\bar{f}_l (\mathbf{y}) = \frac{1}{m} \Big(\sum_{\mathbf{a} \in \bar{B}_{12}(\mathbf{y})} r^{\mathbf{a}}_{l-1} + \sum_{\mathbf{a} \in \bar{B}_3(\mathbf{y})} d_p(\mathbf{y},\mathbf{a}) \Big). \end{aligned}$$

The difference between the values \(\bar{f}_l(\mathbf{y})\) and \(f_{l-1}({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1})\) is

$$\displaystyle \begin{aligned}z_l(\mathbf{y}) = \frac{1}{m} \sum_{\mathbf{a} \in \bar{B}_3(\mathbf{y})} \Big(r^{\mathbf{a}}_{l-1} - d_p(\mathbf{y},\mathbf{a}) \Big), \end{aligned}$$

which can be rewritten as

$$\displaystyle \begin{aligned} z_l(\mathbf{y}) = \frac{1}{m} \sum_{\mathbf{a} \in A} \max \Big\{0, r^{\mathbf{a}}_{l-1} - d_p(\mathbf{y},\mathbf{a}) \Big\}. \end{aligned} $$
(7.10)

The difference \(z_l(\mathbf{y})\) shows the decrease of the value of the lth cluster function \(f_l\) compared with the value \(f_{l-1}({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1})\) if the points \({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1},\mathbf{y}\) are chosen as the cluster centers for the lth clustering problem.

It is reasonable to choose a point \(\mathbf{y} \in \mathbb{R}^n\) that provides the largest decrease \(z_l(\mathbf{y})\) of the cluster function as the starting point for minimizing the auxiliary cluster function. Since it is not easy to choose such a point from the whole space \(\mathbb{R}^n\), we restrict ourselves to the data set A.
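A direct transcription of (7.10) into NumPy, under the same assumptions as the earlier sketches (squared Euclidean measure, A stored as an m × n array), might look as follows.

```python
import numpy as np

def z_l(y, X_prev, A):
    # Decrease (7.10) of the cluster function obtained by adding y to the
    # previous centers X_prev.
    r_prev = np.array([np.sum((x - A) ** 2, axis=1) for x in X_prev]).min(axis=0)
    return np.maximum(0.0, r_prev - np.sum((y - A) ** 2, axis=1)).mean()
```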

If a data point a ∈ A is a cluster center, then this point belongs to the set \(\bar{S}_1\); otherwise, it belongs to the set \(\bar{S}_2\). Therefore, we choose points y from the set \(\bar{A}_0 = A \setminus \bar{S}_1\). Obviously, \(\bar{A}_0 \neq \emptyset.\) Take any \(\mathbf{y}=\mathbf{a} \in \bar{A}_0\), compute \(z_l(\mathbf{a})\), and define the number

$$\displaystyle \begin{aligned} z^1_{\max} = \max_{\mathbf{a} \in \bar{A}_0} z_l(\mathbf{a}). \end{aligned} $$
(7.11)

The number \(z^1_{\max}\) represents the largest decrease of the cluster function that can be provided by a single data point. Let \(\gamma_1 \in [0, 1]\) be a given number. Compute the following subset of \(\bar{A}_0\):

$$\displaystyle \begin{aligned} \bar{A}_1 = \big\{\mathbf{a} \in \bar{A}_0: ~z_l(\mathbf{a}) \geq \gamma_1 z^1_{\max} \big\}. \end{aligned} $$
(7.12)

The set \(\bar{A}_1\) contains all data points that provide a decrease of the cluster function no less than the threshold \(\gamma_1 z^1_{\max}\). This set is obtained from the set \(\bar{A}_0\) by removing data points that do not provide a sufficient decrease of the cluster function. Clearly, \(\bar{A}_1 \neq \emptyset\) for any \(\gamma_1 \in [0, 1]\). If \(\gamma_1 = 0\), then \(\bar{A}_1 = \bar{A}_0\), and if \(\gamma_1 = 1\), then the set \(\bar{A}_1\) contains only the data points providing the largest decrease \(z_{\max}^1\).

For each point \(\mathbf{a} \in \bar{A}_1\), compute the set \(\bar{B}_3(\mathbf{a})\) and its center \(\mathbf{c}(\mathbf{a})\). Replace the point a by \(\mathbf{c}(\mathbf{a})\), since the center \(\mathbf{c}(\mathbf{a})\) is a better representative of the set \(\bar{B}_3(\mathbf{a})\) than the point a. If the similarity measure \(d_p\) is defined using the \(L_2\)-norm, then \(\mathbf{c}(\mathbf{a})\) is the centroid of the set \(\bar{B}_3(\mathbf{a})\). In other cases, \(\mathbf{c}(\mathbf{a})\) is found as a solution to the problem (7.1), where

$$\displaystyle \begin{aligned}\varphi(\mathbf{x}) =\frac{1}{|\bar{B}_3(\mathbf{a})|} \sum_{\mathbf{b} \in \bar{B}_3(\mathbf{a})} d_p(\mathbf{x},\mathbf{b}).\end{aligned}$$

Let

$$\displaystyle \begin{aligned}\bar{A}_2 = \big\{\mathbf{c} \in \mathbb{R}^n :~\mbox{there exists} ~\mathbf{a} \in \bar{A}_1~\mbox{such that}~ \mathbf{c}=\mathbf{c}(\mathbf{ a})\big\} \end{aligned}$$

be the set of such solutions. It is obvious that \(\bar{A}_2 \neq \emptyset\). For each \(\mathbf{c} \in \bar{A}_2\), compute the number \(z_l(\mathbf{c})\) using (7.10) and find the number

$$\displaystyle \begin{aligned} z^2_{\max} = \max_{\mathbf{c} \in \bar{A}_2} z_l(\mathbf{c}). \end{aligned} $$
(7.13)

The number \(z^2_{\max }\) represents the largest value of the decrease

$$\displaystyle \begin{aligned}f_{l-1}({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1}) - f_l({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1},\mathbf{c})\end{aligned}$$

among all centers \(\mathbf {c} \in \bar {A}_2\).

For a given number \(\gamma_2 \in [0, 1]\), define the following subset of \(\bar{A}_2\):

$$\displaystyle \begin{aligned} \bar{A}_3 = \big\{\mathbf{c} \in \bar{A}_2: ~z_l(\mathbf{c}) \geq \gamma_2 z^2_{\max} \big\}. \end{aligned} $$
(7.14)

The set \(\bar{A}_3\) contains all points \(\mathbf{c} \in \bar{A}_2\) that provide a decrease of the cluster function no less than the threshold \(\gamma_2 z^2_{\max}\). This set is obtained from the set \(\bar{A}_2\) by removing centers which do not provide a sufficient decrease of the cluster function. It is clear that \(\bar{A}_3 \neq \emptyset\) for any \(\gamma_2 \in [0, 1]\). If \(\gamma_2 = 0\), then \(\bar{A}_3 = \bar{A}_2\), and if \(\gamma_2 = 1\), then the set \(\bar{A}_3\) contains only the centers c providing the largest decrease of the cluster function \(f_l\).

All points from the set \(\bar {A}_3\) are considered as starting points for solving the auxiliary clustering problem (7.4). Since all data points are used for the computation of the set \(\bar {A}_3\), it contains starting points from different parts of the data set. Such a strategy allows us to find either global or nearly global solutions to the problem (7.2) (as well as to the problem (7.4)) using local search methods.

The auxiliary clustering problem (7.4) is then solved by a local search algorithm starting from each point of \(\bar{A}_3\). The local search generates as many solutions as there are starting points; the set of these solutions is denoted by \(\bar{A}_4\). This set is a non-empty subset of the set of stationary points of the auxiliary cluster function \(\bar{f}_l\).

A local search algorithm starting from different points may arrive at the same stationary point or at stationary points which are close to each other. To identify such stationary points, we define a tolerance ε > 0. If the distance between any two points from the set \(\bar{A}_4\) is less than this tolerance, then we keep the point with the lower value of the function \(\bar{f}_l\) and remove the other point from the set \(\bar{A}_4\).

Algorithm 7.2 Finding set of starting points for the lth cluster center

Next, we define

$$\displaystyle \begin{aligned} \bar{f}_l^{\min} = \min_{\mathbf{y} \in \bar{A}_4} \bar{f}_l(\mathbf{y}). \end{aligned} $$
(7.15)

The number \(\bar{f}_l^{\min}\) is the lowest value of the auxiliary cluster function \(\bar{f}_l\) over the set \(\bar{A}_4\). Let \(\gamma_3 \in [1, \infty)\) be a given number. Introduce the following set:

$$\displaystyle \begin{aligned} \bar{A}_5 = \big\{\mathbf{y} \in \bar{A}_4: \bar{f}_l(\mathbf{y}) \leq \gamma_3 \bar{f}_l^{\min} \big\}. \end{aligned} $$
(7.16)

The set \(\bar{A}_5\) contains all stationary points where the value of the function \(\bar{f}_l\) is no more than the threshold \(\gamma_3 \bar{f}_l^{\min}\). Note that the set \(\bar{A}_5 \neq \emptyset\). If \(\gamma_3 = 1\), then \(\bar{A}_5\) contains the best local minimizers of the function \(\bar{f}_l\) obtained using starting points from the set \(\bar{A}_3\). If \(\gamma_3\) is sufficiently large, then \(\bar{A}_5 = \bar{A}_4\). Points from the set \(\bar{A}_5\) are used as the set of starting points for the lth cluster center to solve the lth clustering problem (7.2).

Summarizing the description above, Algorithm 7.2 gives the procedure for finding starting points to solve the problem (7.2) [228].

Algorithm 7.2 allows us to use more than one starting point in the step of Algorithm 7.1 where the clustering problem (7.2) is solved. Moreover, these points always guarantee the decrease of the cluster function value at each iteration of the incremental algorithm and are distinct from each other in the search space. Such an approach allows us to apply local search methods to obtain a high-quality solution to the global optimization problem (7.2).
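The listing of Algorithm 7.2 is not reproduced here; the Python sketch below follows the construction of the sets \(\bar{A}_0\), \(\bar{A}_1\), \(\bar{A}_2\), \(\bar{A}_3\), \(\bar{A}_4\), and \(\bar{A}_5\) described above. The helper callables z_l, find_center, and local_search_aux, the default threshold values, and the merging tolerance eps are illustrative assumptions; in particular, local_search_aux is assumed to return a stationary point of \(\bar{f}_l\) together with its function value.

```python
import numpy as np

def starting_points(A, X_prev, z_l, find_center, local_search_aux,
                    gamma1=0.9, gamma2=0.9, gamma3=1.1, eps=1e-6):
    # \bar A_0: data points that attract at least one point (here: z_l > 0).
    A0 = [a for a in A if z_l(a, X_prev, A) > 0.0]
    # \bar A_1: points providing a sufficiently large decrease, (7.11)-(7.12).
    z1 = [z_l(a, X_prev, A) for a in A0]
    A1 = [a for a, z in zip(A0, z1) if z >= gamma1 * max(z1)]
    # \bar A_2: replace each point by the center of the set \bar B_3(a).
    A2 = [find_center(a, X_prev, A) for a in A1]
    # \bar A_3: keep centers providing a sufficiently large decrease, (7.13)-(7.14).
    z2 = [z_l(c, X_prev, A) for c in A2]
    A3 = [c for c, z in zip(A2, z2) if z >= gamma2 * max(z2)]
    # \bar A_4: local minimizers of the auxiliary problem (7.4) from each start;
    # nearly coinciding stationary points are merged, keeping the lower value.
    A4 = []
    for y, f in sorted((local_search_aux(c, X_prev, A) for c in A3),
                       key=lambda pair: pair[1]):
        if all(np.linalg.norm(y - u) > eps for u, _ in A4):
            A4.append((y, f))
    # \bar A_5: stationary points within the threshold (7.15)-(7.16).
    f_min = min(f for _, f in A4)
    return [y for y, f in A4 if f <= gamma3 * f_min]
```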

5 Multi-Start Incremental Clustering Algorithm

In this section, we present the multi-start incremental clustering algorithm (MSInc-Clust) for solving the problem (7.2). This algorithm is an improvement of Algorithm 7.1 in which Algorithm 7.2 is applied to compute starting points. Similar to Algorithm 7.1, the MSInc-Clust builds clusters dynamically, adding one cluster center at a time by solving the auxiliary clustering problem (7.4).

The MSInc-Clust applies Algorithm 7.2 to compute a set of starting cluster centers. Using these centers as initial points, the lth clustering problem (7.2) is solved (l = 2, …, k). Then the solution with the smallest value of the cluster function (7.3) is accepted as the solution to the lth clustering problem. The flowchart of the MSInc-Clust is given in Fig. 7.4, and its step-by-step description is presented in Algorithm 7.3.

Fig. 7.4 Multi-start incremental clustering algorithm (MSInc-Clust)

Algorithm 7.3 Multi-start incremental clustering algorithm (MSInc-Clust)

Remark 7.2

Similar to Algorithm 7.1, this algorithm solves all intermediate l-partition problems (l = 1, …, k − 1) in addition to the k-partition problem. However, Algorithm 7.1 can find only stationary points of the clustering problem, while Algorithm 7.3 is able to find either global or nearly global solutions.

Note that the most important steps in Algorithm 7.3 are Step 3, where the auxiliary clustering problem (4.29) is solved to find starting points, and Step 4, where the clustering problem (7.2) is solved for each starting point. To solve these problems, we will introduce different algorithms in this and the following two chapters.
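In code, the difference from the skeleton of Sect. 7.3 is small: the single starting point is replaced by the whole set \(\bar{A}_5\), and the best of the resulting solutions is kept. A hedged sketch, reusing the assumed helper names of the previous sections (get_starts standing for Algorithm 7.2 and f_l for the cluster function (7.3)):

```python
def ms_inc_clust(A, k, find_center, get_starts, solve_clustering, f_l):
    centers = [find_center(A)]
    for l in range(2, k + 1):
        # Solve the lth clustering problem (7.2) from every starting point.
        candidates = [solve_clustering(centers + [y], A)
                      for y in get_starts(A, centers)]
        # Keep the solution with the smallest cluster function value.
        centers = min(candidates, key=lambda X: f_l(X, A))
    return centers
```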

6 Incremental k-Medians Algorithm

In this section, we design the incremental k-medians algorithm (Ikmed) as an application of Algorithm 7.3. The k-medians algorithm (Algorithm 5.4), presented in Chap. 5, is simple and easy to implement. However, this algorithm is sensitive to the choice of starting points and finds only local solutions, which can be significantly different from the global solution in large data sets. The Ikmed overcomes these drawbacks by applying Algorithm 7.2. As is characteristic of k-medians, the distance function \(d_1\) is used to define the similarity measure in the Ikmed. Figure 7.5 illustrates the flowchart of this algorithm.

Fig. 7.5 Incremental k-medians algorithm (Ikmed)

The Ikmed first calculates the center of the whole data set as its median. Then it applies Algorithm 7.2 to compute the set of initial cluster centers by solving the auxiliary clustering problem (7.4). Using these centers, the clustering problem (7.2) is solved. Note that Algorithm 5.4 is utilized to solve both problems (7.2) and (7.4).

The following algorithm describes the Ikmed step by step.

Algorithm 7.4 Incremental k-medians algorithm (Ikmed)

In Step 4 of Algorithm 7.4, one can apply a modified version of the k-medians Algorithm 5.4 to solve the auxiliary clustering problem and to find starting points for the lth cluster center. In this version, the cluster centers \({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1}\) are fixed and the algorithm updates only the lth center. Therefore, we call it the partial k-medians algorithm. The description of this algorithm is given below.

We use the sets \(\bar{S}_2\) and \(\bar{B}_3(\mathbf{y}),~\mathbf{y} \in \mathbb{R}^n\), defined in (7.8) and (7.9), respectively. Note that we employ the distance function \(d_1\) in computing these sets.

Algorithm 7.5 Partial k-medians algorithm

Remark 7.3

The set \(\bar{S}_2\) contains all data points a ∈ A which are not cluster centers, and therefore in Step 1 one can choose the point \({\mathbf{y}}_1\) among such data points. More specifically, we can choose \({\mathbf{y}}_1 \in A \setminus \bar{S}_1\), where the set \(\bar{S}_1\) is given in (7.7). Furthermore, since for any \(\mathbf{y} \in \bar{S}_2\) the set \(\bar{B}_3(\mathbf{y})\) is not empty and the value of the auxiliary cluster function decreases at each iteration h, the problem of finding the center of the sets \(\bar{B}_3(\mathbf{y}_h),~h \geq 1\), in Step 4 is well defined.

Note that the stopping criterion in Step 3 means that the algorithm terminates when no data point changes its cluster.
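The listing of Algorithm 7.5 is likewise not reproduced here; the following sketch only illustrates the partial k-medians iteration under stated assumptions: the data set and centers are NumPy arrays, the L1 distance is used, and the starting point is chosen from \(A \setminus \bar{S}_1\) as suggested in Remark 7.3, so that \(\bar{B}_3\) remains non-empty.

```python
import numpy as np

def partial_k_medians(A, X_prev, y1):
    # r^a_{l-1}: L1 distance of each data point to its nearest fixed center.
    r_prev = np.array([np.abs(A - x).sum(axis=1) for x in X_prev]).min(axis=0)
    y, members = np.asarray(y1, dtype=float), None
    while True:
        # \bar B_3(y): points strictly closer to y than to their old centers.
        new_members = np.abs(A - y).sum(axis=1) < r_prev
        if members is not None and np.array_equal(new_members, members):
            return y                            # no point changed its cluster
        members = new_members
        # Update only the lth center as the median of \bar B_3(y).
        y = np.median(A[members], axis=0)
```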

The most time-consuming steps in Algorithm 7.4 are Steps 3, 4, and 5. To reduce the computational effort required in these steps, we discuss the following three approaches:

  1.

    Reduction of the number of starting cluster centers. As mentioned above, starting points for solving the auxiliary clustering problem (7.4) can be chosen from the set \(A \setminus \bar{S}_1\). At the lth iteration (l ≥ 2) of Algorithm 7.4, we can remove points that are close to the cluster centers \({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1}\). For each cluster \(A^q\), 1 ≤ q ≤ l − 1, compute its average radius

    $$\displaystyle \begin{aligned}r_{av}^q = \frac{1}{|A^q|} \sum_{\mathbf{a} \in A^q} d_1({\mathbf{x}}_q,\mathbf{a}), \end{aligned}$$

    and define the subset \(\hat {A}^q \subseteq A^q\) as

    $$\displaystyle \begin{aligned}\hat{A}^q = \big\{\mathbf{a} \in A^q:~r_{av}^q \leq d_1({\mathbf{x}}_q,\mathbf{a}) \big\}. \end{aligned}$$

    Note that if the cluster \(A^q\) is not empty, then the set \(\hat{A}^q\) is also non-empty. Consider the following subset of the set A:

    $$\displaystyle \begin{aligned} \hat{A} = \bigcup_{q=1}^{l-1} \hat{A}^q. \end{aligned}$$

    Replacing the set \(A\setminus \bar{S}_1\) by the set \(\hat{A} \setminus \bar{S}_1\) allows us to reduce, in some cases significantly, the number of starting cluster centers and to remove those points which do not provide a sufficient decrease of the cluster function.

  2.

    Exclusion of some stationary points of the auxiliary clustering problem (7.4). If any two stationary points from the set \(\bar{A}_4\) are close to each other with respect to some predefined tolerance, then one of them is removed while the other is kept. To do so, we define a tolerance \(\varepsilon = \hat{f}_1/ml,\) where \(\hat{f}_1\) is the optimal value of the cluster function \(f_1\). If \(d_1({\mathbf{y}}_1,{\mathbf{y}}_2) \leq \varepsilon\) for two points \({\mathbf{y}}_1,{\mathbf{y}}_2 \in \bar{A}_4\), then the point with the lower value of the auxiliary cluster function is kept in \(\bar{A}_4\) and the other point is removed.

  3.

    Use of the triangle inequality to reduce the number of distance calculations. Since \(d_1\) is a distance function, it satisfies the triangle inequality. This can be used to reduce the number of distance function calculations of Algorithm 5.4 in solving both the clustering and the auxiliary clustering problems. First, we consider the auxiliary clustering problem (7.4). Assume that \(({\mathbf{x}}_1,\ldots,{\mathbf{x}}_{l-1})\) is the solution to the (l − 1)-partition problem. Recall that the distance between the data point a ∈ A and its cluster center is denoted by

    $$\displaystyle \begin{aligned}r^{\mathbf{a}}_{l-1} = \min_{j=1, \ldots, l-1} d_1({\mathbf{x}}_j,\mathbf{a}). \end{aligned}$$

    Let \(\bar{\mathbf{y}}\) be a current approximation to the solution of the problem (7.4). Compute the distances \(d_1(\bar{\mathbf{y}},{\mathbf{x}}_j),~j=1,\ldots,l-1\). Assume that \(\mathbf{a} \in A^j\) for some j ∈ {1, …, l − 1}. According to the triangle inequality, we have

    $$\displaystyle \begin{aligned} d_1(\bar{\mathbf{y}},{\mathbf{x}}_j) &\leq d_1(\mathbf{a},\bar{\mathbf{y}}) + d_1(\mathbf{a},{\mathbf{x}}_j) = d_1(\mathbf{a},\bar{\mathbf{ y}}) + r^{\mathbf{a}}_{l-1}, \quad \mbox{or}\\ d_1(\mathbf{a},\bar{\mathbf{y}}) &\geq d_1(\bar{\mathbf{y}},{\mathbf{x}}_j) - r^{\mathbf{a}}_{l-1}. \end{aligned} $$

    This means that if \(d_1(\bar {\mathbf {y}},{\mathbf {x}}_j) > 2r^{\mathbf {a}}_{l-1}\), then \(d_1(\mathbf {a},\bar {\mathbf {y}}) > r^{\mathbf {a}}_{l-1}\) and therefore, there is no need to calculate the distance \(d_1(\mathbf {a},\bar {\mathbf {y}})\) as the point a does not belong to the cluster with the center \(\bar {\mathbf {y}}\).

    A similar approach can be used for the clustering problem (7.2). Let \((\bar{\mathbf{x}}_1,\ldots,\bar{\mathbf{x}}_l)\) be a current approximation to the solution of the lth partition problem. Compute the distances \(d_1(\bar{\mathbf{x}}_i,\bar{\mathbf{x}}_j)\) for i, j = 1, …, l. Assume that for a given point a ∈ A, the distances \(d_1(\mathbf{a},\bar{\mathbf{x}}_i),~i=1,\ldots,j\), have been calculated or estimated for some j ∈ {1, …, l − 1}. Let \(\tilde{\mathbf{x}} \in \{\bar{\mathbf{x}}_1, \ldots,\bar{\mathbf{x}}_j\}\) be such that

    $$\displaystyle \begin{aligned}d_1(\mathbf{a},\tilde{\mathbf{x}}) =\min_{i=1,\ldots,j} d_1(\mathbf{a},\bar{\mathbf{x}}_i). \end{aligned}$$

    According to the triangle inequality we have

    $$\displaystyle \begin{aligned} d_1(\tilde{\mathbf{x}},\bar{\mathbf{x}}_{j+1}) &\leq d_1(\mathbf{a},\tilde{\mathbf{x}}) + d_1(\mathbf{a},\bar{\mathbf{x}}_{j+1}), \quad \mbox{or}\\ d_1(\mathbf{a},\bar{\mathbf{x}}_{j+1}) &\geq d_1(\tilde{\mathbf{x}},\bar{\mathbf{x}}_{j+1}) - d_1(\mathbf{a},\tilde{\mathbf{x}}). \end{aligned} $$

    If \(d_1(\tilde{\mathbf{x}},\bar{\mathbf{x}}_{j+1}) > 2d_1(\mathbf{a},\tilde{\mathbf{x}})\), then there is no need to calculate the distance \(d_1(\mathbf{a},\bar{\mathbf{x}}_{j+1})\) as the point a does not belong to the cluster \(A^{j+1}\) with the center \(\bar{\mathbf{x}}_{j+1}\). This last approach allows us to reduce the number of distance function evaluations significantly as the number of clusters increases. A sketch of the first of these pruning tests is given below.
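The following Python sketch illustrates the first of the two pruning tests (the one used when solving the auxiliary clustering problem); the representation of the partition through a labels array and the precomputed distances r_prev are assumptions made for illustration only.

```python
import numpy as np

def aux_distances_with_pruning(A, X_prev, labels, r_prev, y_bar):
    # labels[i] = index j of the cluster containing a_i; r_prev[i] = d_1(x_j, a_i).
    d_centers = np.abs(X_prev - y_bar).sum(axis=1)   # d_1(y_bar, x_j), j = 1,...,l-1
    d_to_y = np.full(len(A), np.inf)
    for i, a in enumerate(A):
        # If d_1(y_bar, x_j) > 2 r^a_{l-1}, the triangle inequality already gives
        # d_1(a, y_bar) > r^a_{l-1}: the point cannot join the cluster of y_bar,
        # so its distance to y_bar is never computed.
        if d_centers[labels[i]] <= 2.0 * r_prev[i]:
            d_to_y[i] = np.abs(a - y_bar).sum()
    return d_to_y          # np.inf marks distances that were pruned
```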