1 Introduction

Data clustering has wide applications in fields such as data mining and pattern recognition. Many clustering algorithms of different types have been developed, and some of them have generated impressive results in applications. Commonly used algorithms include k-means, spectral clustering [13, 16], DBSCAN [7], mean shift [5] and their variants. Recently, some new algorithms have been proposed, including affinity propagation (AP) [3], robust spectral clustering [19] and dominant sets (DSets) [14]. Noticing that many algorithms require the number of clusters to be specified beforehand, [8, 12] have presented methods to solve this problem. Since some algorithms detect only spherical clusters, density-based algorithms, which are free of this restriction, have received much attention [1, 2].

Among density-based clustering algorithms, DBSCAN relies on a density threshold to detect cluster borders, with the threshold represented by the two parameters MinPts and Eps. While DBSCAN has been shown to perform well in many experiments, it may not be easy to determine appropriate parameters. In addition, a fixed set of parameters implies a fixed density threshold, which may not be appropriate for datasets where cluster densities vary significantly. Different from DBSCAN-like algorithms, the density peak (DP) algorithm presented in [15] accomplishes the clustering process on the basis of density relationships. Treating local density peaks as candidates for cluster centers, the DP algorithm characterizes each data point by its local density \(\rho \) and by \(\delta \), the distance to its nearest neighbor of greater density, and observes that cluster centers have both great \(\rho \)'s and great \(\delta \)'s, whereas non-density-peak data have either small \(\rho \)'s or small \(\delta \)'s. The algorithm therefore uses both \(\rho \) and \(\delta \), or \(\gamma =\rho \delta \), to identify cluster centers, and then groups the remaining data into clusters based on the density relationships among neighboring data. Unlike cluster centers, which are surrounded by data of smaller density, non-density-peak data usually have nearest neighbors of greater density in their neighborhood, and therefore small \(\delta \)'s. Consequently, the distance \(\delta \) is effective in isolating cluster centers from non-density-peak data. While cluster centers usually have greater local density than the neighboring non-density-peak data, the density of non-density-peak data is not necessarily small in absolute magnitude. In other words, non-density-peak data may have great or small local density, so the \(\rho \) criterion is not very informative in strengthening the specificity of cluster centers. To solve this problem, we study the cluster center identification process and find that, for a cluster center, being a density peak matters more than having a great absolute density. We then present an enhanced criterion based on the local density of subordinates to replace the original local density \(\rho \). Furthermore, a new density kernel is proposed to overcome the drawbacks of the cutoff and Gaussian kernels. Combining the new criterion and the new density kernel, our algorithm performs well in experiments and compares favorably with some commonly used and recently proposed algorithms.

2 Density Peak Clustering Algorithm

An attractive property of density-based clustering algorithms is that they can detect non-spherical clusters. While DBSCAN relies on a density threshold, the DP algorithm makes use of the density information in a different manner. We use examples to demonstrate how the DP algorithm identifies cluster centers and accomplishes the clustering process. Taking the Aggregation dataset [10] as an example, we first calculate \(\rho \) with the cutoff kernel, which defines the local density as the count of data in the \(d_c\)-radius neighborhood, where \(d_c\) is a cutoff distance to be specified. The distribution of \(\rho \) and \(\delta \) of all the data is shown in Fig. 1(a). Evidently only a few data have both great \(\rho \) and great \(\delta \), and the majority of the data have either small \(\delta \)'s or small \(\rho \)'s. This makes it feasible to determine cluster centers by selecting the data with great \(\rho \)'s and great \(\delta \)'s. Since two thresholds are needed to determine cluster centers from Fig. 1(a), we sort the data by \(\gamma =\rho \delta \) in decreasing order and show the distribution of \(\gamma \) in Fig. 1(b), where a few data have significantly greater \(\gamma \) than the others. With the \(\gamma \) decision graph only one threshold is needed to select the cluster centers.

Fig. 1. Decision graphs and clustering results with the cutoff kernel.
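
To make the computations above concrete, the following Python sketch computes \(\rho \) with the cutoff kernel, \(\delta \), and \(\gamma \), and selects the data of greatest \(\gamma \) as centers. The function and variable names are ours, and the convention of assigning the global density peak the maximum distance as its \(\delta \) follows common DP implementations; this is an illustrative sketch under these assumptions, not the reference implementation of [15].

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def decision_graph(X, d_c, n_centers):
    """Compute rho (cutoff kernel), delta and gamma; pick top-gamma centers."""
    D = squareform(pdist(X))                  # pairwise distance matrix
    rho = (D < d_c).sum(axis=1) - 1.0         # neighbors within d_c, excluding self
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]    # data of greater local density
        if denser.size > 0:
            delta[i] = D[i, denser].min()     # distance to nearest denser point
        else:
            delta[i] = D[i].max()             # convention for the global density peak
    gamma = rho * delta
    centers = np.argsort(-gamma)[:n_centers]  # data with the greatest gamma's
    return D, rho, delta, gamma, centers
```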

With the cluster centers available, the DP algorithm determines the labels of the non-density-peak data based on the density relationships among the data. The clustering of non-density-peak data rests on the assumption that a data point and its nearest neighbor of greater local density belong to the same cluster. Under this assumption, the non-density-peak data can be assigned labels in decreasing order of their local density. This process involves only one scan of the data and can be accomplished efficiently. While there is no theoretical proof that this assumption holds, the method works well in practice, as shown by the clustering results in Fig. 1(c).
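
A possible implementation of this one-scan assignment, assuming the D, rho and centers produced by the previous sketch, is given below. The superior array records, for each point, its nearest neighbor of greater density; breaking density ties by sort order is an assumption of this sketch.

```python
import numpy as np

def assign_labels(D, rho, centers):
    """Propagate labels from the centers in decreasing order of density."""
    n = len(rho)
    order = np.argsort(-rho)                  # indices sorted by decreasing rho
    superior = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]                 # all points denser than i
        superior[i] = denser[np.argmin(D[i, denser])]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                           # one scan, densest first, so the
        if labels[i] < 0 and superior[i] >= 0:  # superior is labeled before i;
            labels[i] = labels[superior[i]]   # the global peak (superior == -1)
    return labels, superior                   # is expected to be a center
```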

While Fig. 1 shows that cluster centers usually have great \(\rho \), great \(\delta \) and great \(\gamma \), it also indicates that it may not be easy to select the cluster centers with the decision graphs alone, because the differences between great and small \(\rho \)'s, \(\delta \)'s and \(\gamma \)'s are not significant in many cases. To avoid the influence of inappropriate thresholds, we specify the number of clusters in this paper and identify the data with the greatest \(\gamma \)'s as cluster centers.

3 Our Approach

Since cluster centers typically have both great \(\rho \)'s and great \(\delta \)'s, \(\gamma =\rho \delta \) is often used to identify cluster centers. As density peaks, cluster centers are surrounded by neighboring data of smaller local density. As a result, they are distant from the nearest data of greater local density and have great \(\delta \)'s. However, the nearest neighbors of cluster centers may have only slightly smaller density than the corresponding cluster centers. In other words, non-density-peak data may also have great density. As Fig. 1(a) shows, the local densities of the data are distributed quite evenly, and a large number of data have great local density. This observation indicates that the local density \(\rho \) is not as informative as \(\delta \) in isolating cluster centers from non-density-peak data. In our opinion, this also explains why the DP clustering results are significantly influenced by the density kernel type and kernel parameters.

To relieve the problems caused by the uninformative \(\rho \) criterion in identifying cluster centers, we study the cluster center identification process further. One intention of the DP algorithm is to use certain measures to strengthen the specificity of cluster centers. Since cluster centers have both great \(\rho \)'s and great \(\delta \)'s, and non-density-peak data have either small \(\rho \)'s or small \(\delta \)'s, the product \(\gamma =\rho \delta \) is used as the cluster center identification criterion. We have observed that \(\delta \) is indeed effective for cluster center identification, and that \(\rho \) is not so informative in comparison. However, if we remove \(\rho \) and use only \(\delta \) to select cluster centers, outliers, which are far from the other data, are likely to be identified as cluster centers. In other words, the local density \(\rho \) is still effective in preventing outliers from being identified as cluster centers. Therefore, instead of removing \(\rho \) completely, we propose to enhance its discriminative ability.

In the DP algorithm, one important feature of cluster centers is that they are surrounded by non-density-peak data of smaller local density. Here we see that what differentiates cluster centers from non-density-peak data is not a great density in absolute magnitude, but the role of a density peak. That is to say, what matters is not that a cluster center has a great density, but that it has a greater density than its neighboring data. Hence we propose to replace \(\rho \) in identifying cluster centers with a criterion that measures the role of a density peak.

In the following we take a data point i as an example and denote the cluster containing i by \(C_i\). If i is the cluster center of \(C_i\), it should be surrounded by neighboring data of smaller density. Intuitively, we can use the number \(N_n\) of neighboring data with smaller density to measure the likelihood of i being the cluster center; a larger \(N_n\) means a higher likelihood of i being the cluster center of \(C_i\). However, the neighboring data with smaller density may contain not only data in \(C_i\), but also data in other clusters. In this case, \(N_n\) cannot measure the likelihood of i being the cluster center accurately, and we need to consider only the data in \(C_i\). However, before the clustering is accomplished, the membership of \(C_i\) is unknown.

We present the following method to make use of only the data in \(C_i\) before the cluster membership is available. The DP algorithm assumes that a data point and its nearest neighbor of greater local density are in the same cluster. If a data point n is the nearest neighbor of greater local density of a data point m, we call n the superior of m and m a subordinate of n, and denote this relationship by \(m \rightarrow n\). Evidently a data point and all its subordinates should be in the same cluster. Since cluster centers are density peaks, they usually have a large number of subordinates. In contrast, the number of subordinates of non-density-peak data may be small or zero. Therefore, for a data point i, we can use the number \(N_s\) of its subordinates to measure the likelihood of i being the cluster center. Furthermore, the local density of the subordinates also plays a role in measuring this likelihood. In summary, we use the sum of the local densities of the subordinates to measure the likelihood of i being the cluster center, and define the enhanced version of \(\rho \) as

$$\begin{aligned} \eta_i = \sum_{j \in S,\, j \ne i} \zeta(i,j)\, \rho_j, \end{aligned}$$
(1)

where S denotes the whole dataset and

$$\begin{aligned} \zeta(x,y) = \begin{cases} 1, & y \rightarrow x,\\ 0, & \text{otherwise}. \end{cases} \end{aligned}$$
(2)

We can then replace \(\rho \) with \(\eta \) and identify cluster centers based on \(\gamma '=\eta \delta \). It is worth mentioning that \(\eta \) is used only in determining the cluster centers. The original \(\rho \) is still used in grouping the non-density-peak data, as it measures the density relationship among neighboring data more accurately.
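
Given the superior array from the earlier assignment sketch, \(\eta \) of Eqs. (1) and (2) reduces to summing the densities of each point's subordinates. The following sketch, with names of our own choosing, illustrates this.

```python
import numpy as np

def enhanced_density(rho, superior):
    """eta[i] = sum of rho[j] over all subordinates j of i (Eqs. (1)-(2))."""
    eta = np.zeros_like(rho, dtype=float)
    for j, sup in enumerate(superior):
        if sup >= 0:              # j -> sup, i.e. j is a subordinate of sup
            eta[sup] += rho[j]
    return eta

# Cluster centers are then the points with the greatest gamma' = eta * delta;
# the original rho is still used when grouping the non-density-peak data.
```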

In addition, we evaluate the local density using the average distance to a limited number of nearest data. The density kernel obtained this way is a compromise between the cutoff and Gaussian kernels. The cutoff kernel uses only the count of data in a neighborhood and discards the distances to these data; this information loss may reduce the precision of the local density. While the Gaussian kernel makes use of the distance information, it takes all data into account, from the nearest neighbors to the farthest ones. If the parameter \(d_c\) is not selected appropriately, this kernel may therefore reflect the data distribution in a large region rather than in a small neighborhood. Between these two extremes, our new kernel uses the distances to a limited number of neighboring data, and is shown to perform well in experiments.
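
This paragraph specifies the new kernel only as being based on the average distance to a limited number of nearest data; the reciprocal form and the neighborhood size K in the sketch below are our assumptions for illustration, not necessarily the exact definition used in the paper.

```python
import numpy as np

def knn_density(D, K):
    """Local density from the mean distance to the K nearest neighbors."""
    # Sorting each row puts self (distance 0) in column 0; skip it.
    knn_dist = np.sort(D, axis=1)[:, 1:K + 1].mean(axis=1)
    return 1.0 / (knn_dist + 1e-12)   # smaller mean distance -> greater density
```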

4 Experiments

In our work we present \(\eta \) as an enhanced version of the local density \(\rho \) with improved discriminative ability, and use a new density kernel to overcome the drawbacks of existing ones. In this section we first validate the effectiveness of the enhanced local density criterion. After that, the whole algorithm is tested and compared with commonly used and recently proposed algorithms.

4.1 Enhanced Local Density

The \(\rho \)-\(\delta \) decision graph in Fig. 1 shows that the data are distributed quite evenly over the range of the local density \(\rho \), indicating that \(\rho \) is not very informative in strengthening the specificity of cluster centers. This motivates us to replace \(\rho \) with \(\eta \) to help isolate cluster centers from non-density-peak data. Here we test whether \(\eta \) serves this purpose. Replacing \(\rho \) with \(\eta \), we show the \(\eta \)-\(\delta \) decision graphs and the corresponding \(\rho \)-\(\delta \) decision graphs on the Aggregation and Flame datasets in Fig. 2. Evidently the majority of the data have small \(\eta \) values, and only a few data have great \(\eta \). The comparison between the \(\rho \)-\(\delta \) and \(\eta \)-\(\delta \) graphs indicates that \(\eta \) is helpful in isolating cluster centers from non-density-peak data, and is more suitable than \(\rho \) as a cluster center identification criterion.

Fig. 2. The \(\eta \)-\(\delta \) decision graphs and the corresponding \(\rho \)-\(\delta \) decision graphs. The left two figures belong to the Aggregation dataset, and the right two correspond to the Flame dataset.

4.2 Comparison

We now compare the proposed algorithm with some commonly used and recently proposed clustering algorithms on eight datasets: Aggregation, Compound [18], Spiral [4], D31 [17], R15 [17], Flame [9], and the Wine and Iris datasets from the UCI machine learning repository. Aside from the well-known k-means and DBSCAN algorithms, normalized cuts (NCuts) [16], AP, DSets, an improved version of DSets presented in [11] and SPRG [19] are also adopted for comparison. Since our work is proposed as an improvement of the DP algorithm, we also compare with two versions of the DP algorithm, one with the cutoff kernel (DP-c) and the other with the Gaussian kernel (DP-G). We report clustering results evaluated with normalized mutual information (NMI). Except for the algorithm in [11], all these algorithms require one or more input parameters. The k-means, SPRG and NCuts algorithms involve the number of clusters, which we set to the ground truth. For DBSCAN, which has the two parameters MinPts and Eps, we set \(MinPts=3\), selected from 2, 3, \(\cdots \), 10, and then determine Eps based on MinPts [6]. The AP algorithm involves the preference value p, and the authors of [3] provide a method to obtain the range \([p_{min},p_{max}]\) of this parameter. We sample this range and select \(p=p_{min}+9.2\xi \), where \(\xi =(p_{max}-p_{min})/10\). In the DSets algorithm, \(s(x,y)=exp(-d(x,y)/\sigma )\) is used to evaluate the data similarity, and we manually select \(\sigma =10\overline{d}\) to obtain the best overall result, with \(\overline{d}\) denoting the mean pairwise distance. For DP-c and DP-G, the parameter \(d_c\) is determined by including 1.1% and 2.0% of the data in the neighborhood, respectively. The clustering results of these algorithms are reported in Table 1.
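
The two numeric parameter rules quoted above can be written down directly. In the sketch below, d is assumed to be the condensed vector of all pairwise distances (e.g. from scipy's pdist), the function names are ours, and taking \(d_c\) as the corresponding percentile of all pairwise distances is one common reading of the "include x% of data in the neighborhood" rule.

```python
import numpy as np

def dp_cutoff(d, percent):
    """d_c such that roughly `percent` percent of pairwise distances fall below it."""
    d = np.sort(d)
    idx = min(len(d) - 1, max(0, int(round(len(d) * percent / 100.0)) - 1))
    return d[idx]

def ap_preference(p_min, p_max):
    """AP preference p = p_min + 9.2 * xi, with xi = (p_max - p_min) / 10."""
    xi = (p_max - p_min) / 10.0
    return p_min + 9.2 * xi
```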

Table 1. Clustering results (NMI) of some algorithms.

We first look at the comparison between the original DP algorithms, DP-c and DP-G, and ours. On the D31 and R15 datasets, both DP-c and DP-G generate very good results, and our algorithm performs as well as these two. On the Compound, Spiral, Flame, Wine and Iris datasets, our algorithm compares favorably with the two algorithms. Only on the Aggregation dataset do the two DP algorithms noticeably outperform ours. These comparisons demonstrate the effectiveness of our improvements to the original DP algorithm.

Among all the compared algorithms, ours is the best- or near-best-performing one on 5 of the 8 datasets and generates the best overall result. In particular, on the Spiral dataset, where k-means, NCuts and AP fail completely, our algorithm generates a perfect result. Even on the Aggregation, Compound and Wine datasets, where it is noticeably outperformed by some algorithms, it is always among the 5 best-performing ones. These observations indicate that our algorithm generalizes well and performs well on various types of datasets.

5 Conclusions

In this paper an enhanced cluster center identification criterion and a new density kernel are presented to improve the DP clustering algorithm. Treating local density peaks as candidates for cluster centers, the DP algorithm represents each data point by its local density and its distance to the nearest data of greater local density, and uses these to identify cluster centers. By studying the cluster center identification process, we find that the local density is not very effective in strengthening the specificity of cluster centers. We introduce the concept of subordinates and present an alternative criterion to the local density based on them. Furthermore, we evaluate the local density with the average distance to neighboring data, in an endeavor to overcome the drawbacks of the cutoff and Gaussian kernels. Experiments show that the new criterion strengthens the specificity of cluster centers, and that our algorithm performs well in comparison with commonly used and recently proposed algorithms.