1 Introduction

Clustering [1,2,3,4] is the process of dividing a set of data objects into multiple groups (clusters), thereby recognizing the different groups underlying the data. Clustering differs from classification: classification requires class labels, whereas clustering does not [5,6,7,8,9,10]. Clustering analysis is now widely used in many areas [11,12,13,14], such as business intelligence, image pattern recognition, web search, biology, and security. Traditional clustering algorithms are mainly categorized as partitioning, hierarchical, density-based, grid-based, and model-based clustering. The k-means algorithm [15] is a classical partition-based clustering method. It is simple in principle, easy to implement, and fast to converge. However, because it relies on iterative refinement, k-means can only reach a local optimum. In addition, the number of clusters k must be given in advance, and the initial cluster centers have a great impact on the clustering results. The DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm [16] is a representative density-based clustering algorithm. Unlike partitioning and hierarchical clustering, DBSCAN defines clusters as regions whose density is higher than that of the remainder of the dataset; it groups sufficiently dense regions into clusters and can find clusters of arbitrary shape in spatial databases. The BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies) algorithm [17] is a hierarchical method that uses limited memory resources to produce high-quality clusterings of large datasets. BIRCH, however, does not work well when the clusters are not spherical, because it uses the concept of radius or diameter to control cluster boundaries. The spectral clustering algorithm [18] is based on the spectral theory of graphs. It transforms the clustering problem into an optimal graph-partitioning problem. Spectral clustering can overcome the shortcomings of some classical clustering algorithms, ensures that the result converges to the global optimal solution, and has good application prospects for data clustering.

On the other hand, clustering analysis is generally classified into two types: hard clustering and soft clustering [19,20,21]. Traditional clustering analysis produces a hard partition; in other words, every object is strictly assigned to a single class with distinct boundaries. Most of the algorithms mentioned above are hard clustering methods. To improve clustering accuracy, fuzzy mathematical methods [22,23,24] have been introduced into clustering analysis. The concept of fuzzy clustering analysis was first proposed by Ruspini [25]. It generally refers to constructing a fuzzy matrix from the properties of the objects themselves and determining the clustering relationship according to a membership degree. By means of fuzzy mathematics, the fuzzy relationship between samples is determined quantitatively, resulting in objective and accurate clustering. The fuzzy C-means (FCM) algorithm [26, 27] is one of the most widely used fuzzy clustering algorithms. However, FCM has some underlying drawbacks. The number of clusters must be determined manually, and the algorithm is sensitive to the initial cluster centers. FCM is also prone to problems such as many clustering iterations, slow convergence, and local optima. Many algorithms have been proposed to improve FCM. Geweniger [28] combined the median c-means algorithm with the fuzzy c-means approach to improve its accuracy. Xue Zhenxia [29] presented a fuzzy rough semi-supervised outlier detection approach that exploits a few labeled samples and fuzzy rough C-means clustering. This method introduces an objective function that minimizes the sum of squared clustering errors, the deviation from the known labeled examples, and the number of outliers, yielding better clustering results for normal points and better accuracy for outlier detection. Zexuan [30] introduced a novel adaptive method to compute the weights of the local spatial term in the objective function; the resulting adaptive fuzzy clustering algorithm suppresses noise and helps resolve classification ambiguity. Fritz Heinrich [31] proposed several fuzzy clustering methods that are able to handle the presence of noise. Lai et al. [32] presented a rough k-means clustering algorithm that minimizes the dissimilarity in order to solve the divergence problem of the original approaches, in which the cluster centers may not converge to their final positions. Wang [33] presented a rough-set [34, 35] based measurement for the membership degree in the fuzzy C-means algorithm, taking advantage of the positive region set and the boundary region set of rough sets. In this paper, the FCM algorithm is combined with the DPC algorithm. As a density-based clustering algorithm, DPC can find the cluster centers while using fewer parameters than other clustering algorithms. Experimental results and analysis demonstrate that the proposed algorithm effectively solves the problem that FCM is sensitive to the initial cluster centers and improves the accuracy of the algorithm.

2 Preliminaries

In this section, we introduce the standard DPC algorithm and the original fuzzy C-means algorithm, together with the basic notions and notation used in the rest of the paper.

2.1 DPC algorithm

In 2014, Alex Rodriguez and Alessandro Laio [36] proposed a clustering algorithm called clustering by fast search and find of density peaks (DPC). The advantage of this algorithm [37, 38] is that it requires few parameters, is insensitive to noise, and can find clusters of arbitrary shape and dimension. The algorithm proceeds in two steps. In the first step, the local density and distance of each sample are calculated according to a user-supplied parameter, and the cluster centers, also called density peaks, are identified; the appropriate cluster centers are then selected from the samples according to a decision graph. In the second step, each remaining sample is assigned to the cluster of its nearest neighbor of higher density.

The algorithm is based on the assumptions that cluster centers are surrounded by neighbors with lower local density and that they lie at a relatively large distance from any point with a higher local density. For each data point i, two quantities are therefore computed: its local density \(\rho_{i}\) and its distance \(\delta_{i}\) to points of higher density. Both quantities depend only on the distances \(dis_{ij}\) between data points. The local density \(\rho_{i}\) and the distance \(\delta_{i}\) are defined in Eqs. (1) and (2), respectively:

$$\rho_{i} = \sum\limits_{j} {\chi (dis_{ij} - d_{c} )}$$
(1)

In the above equation, \(dis_{ij}\) is the distance between samples i and j, \(d_{c}\) is the truncation distance, and \(\chi (x) = \left\{ {\begin{array}{*{20}l} {1,} \hfill & {x < 0} \hfill \\ {0,} \hfill & {x \ge 0} \hfill \\ \end{array} } \right.\).

$$\delta_{i} = \min_{{j:\rho_{j} > \rho_{i} }} (dis_{ij} )$$
(2)

For the point with the highest density, the distance is defined as \(\delta_{i} = \max_{j} (dis_{ij} )\).

In addition, Rodriguez and Laio proposed a method for calculating the local density using a Gaussian kernel function, as shown in Eq. (3):

$$\rho_{i} = \sum\limits_{j \ne i} {e^{{ - \left( {\frac{{dis_{ij} }}{{d_{c} }}} \right)^{2} }} }$$
(3)

Comparing Eqs. (1) and (3), we see that Eq. (1) yields a discrete value while Eq. (3) yields a continuous one. The latter is therefore less likely to produce collisions, i.e., different data points sharing the same local density value. In Eq. (2), \(\delta\) is the distance from a sample to its nearest sample with a higher local density \(\rho\). The DPC algorithm uses the local density \(\rho\) and the distance \(\delta\) to construct the decision graph and chooses samples with large values of both \(\rho\) and \(\delta\) as cluster centers. Each remaining sample j is then assigned to the cluster of its nearest sample with higher local density.
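To make these quantities concrete, the following is a minimal Python sketch (the helper name dpc_rho_delta and all variable names are ours, not from the original paper) that computes \(\rho_{i}\) with the Gaussian kernel of Eq. (3) and \(\delta_{i}\) as in Eq. (2) from a pairwise distance matrix; cluster centers can then be read off a decision graph by picking points with large values of both quantities.

```python
import numpy as np
from scipy.spatial.distance import cdist

def dpc_rho_delta(X, dc):
    """Local density rho (Gaussian kernel, Eq. 3) and distance delta (Eq. 2).

    X  : (n, p) data matrix
    dc : truncation distance d_c chosen by the user
    """
    dist = cdist(X, X)                                  # pairwise distances dis_ij
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0   # subtract the j == i term
    n = X.shape[0]
    delta = np.empty(n)
    order = np.argsort(-rho)                            # indices sorted by decreasing rho
    delta[order[0]] = dist[order[0]].max()              # highest-density point: delta = max_j dis_ij
    for k in range(1, n):
        i = order[k]
        higher = order[:k]                              # points with higher density than i
        delta[i] = dist[i, higher].min()                # distance to the nearest denser point
    return rho, delta
```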

2.2 Fuzzy C-means algorithm

The FCM algorithm is an unsupervised learning method based on the optimization of an objective function; it partitions a given dataset through iterative optimization of this objective [27]. FCM studies the clustering problem by means of fuzzy mathematics: the clustering result is a set of membership degrees describing how strongly each data point belongs to each cluster center.

Let the dataset \(X = \left\{ {X_{1} ,X_{2} , \ldots ,X_{n} } \right\}\) be a collection of n samples, where each sample \(X_{i}\) has p features. We divide the n samples into C fuzzy groups and denote the matrix of cluster centers by \(V = \left\{ {V_{1} ,V_{2} , \ldots ,V_{C} } \right\}\). The objective function of FCM is defined as:

$$J\left( {U,V} \right) = \mathop \sum \limits_{i = 1}^{C} \mathop \sum \limits_{j = 1}^{n} (u_{ij} )^{m} (d_{ij} )^{2}$$
(4)

where \(U = (u_{ij} )\) is a \(C \times n\) membership matrix and \(u_{ij}\) is the membership of sample \(X_{j}\) in cluster \(V_{i}\); \(d_{ij} = \left\| {x_{j} - V_{i} } \right\|\) is the Euclidean distance between sample j and cluster center i; and m (m > 1) is the fuzzy exponent of the algorithm, usually set to 2. Equation (4) must also satisfy the following conditions:

$$\left\{ {\begin{array}{*{20}l} {\mathop \sum \limits_{i = 1}^{C} u_{ij} = 1,} \hfill & {\quad j = 1,2, \ldots ,n;} \hfill \\ {0 \le u_{ij} \le 1,} \hfill & {\quad i = 1,2, \ldots ,C;\quad j = 1,2, \ldots ,n;} \hfill \\ {0 < \mathop \sum \limits_{j = 1}^{n} u_{ij} < n,} \hfill & {\quad i = 1,2, \ldots ,C;} \hfill \\ \end{array} } \right.$$
(5)

Finally, the minimum of the objective function \(J\left( {U,V} \right)\) of FCM is obtained through iterative optimization, in which \(U\) and \(V\) are updated as follows:

$$u_{ij} = \left\{ {\begin{array}{*{20}l} {\frac{1}{{\mathop \sum \nolimits_{k = 1}^{C} \left( {\frac{{d_{ij} }}{{d_{kj} }}} \right)^{{\frac{2}{m - 1}}} }},} \hfill & {\quad d_{kj} \ne 0,\;k = 1, \ldots ,C;} \hfill \\ {1,} \hfill & {\quad d_{ij} = 0} \hfill \\ \end{array} } \right.$$
(6)
$$V_{i} = \frac{{\mathop \sum \nolimits_{j = 1}^{n} (u_{ij} )^{m} x_{j} }}{{\mathop \sum \nolimits_{j = 1}^{n} (u_{ij} )^{m} }}$$
(7)

3 DP-FCM algorithm

3.1 The improvement of DP-FCM algorithm

The FCM algorithm requires the cluster centers and the number of clusters to be set manually and is sensitive to the initial cluster centers. It is prone to problems such as many clustering iterations, slow convergence, local optima, and poor stability. Therefore, we propose a new density-peak-based FCM algorithm (DP-FCM).

In DP-FCM, the first step is to determine the cluster centers based on the two quantities \(\rho_{i}\) and \(\delta_{i}\) of the dataset. We define the density distance index \(\varphi_{i}\) as:

$$\varphi_{i} = \rho_{i} \delta_{i}$$
(8)

We traverse the n sample points and compute the \(\varphi_{i}\) values of all points, sort these density distance values in descending order, and extract the first z points. From these we calculate the average density distance:

$$\varphi_{ave} = \frac{1}{z}\mathop \sum \limits_{i = 1}^{z} \varphi_{i}$$
(9)

The density distance values of the sorted points show a clear downward trend, and points with larger values are more likely to be cluster centers. When \(\varphi_{i} > \varphi_{ave}\), the corresponding point is taken as a candidate cluster center. The cluster centers are then selected according to the decision graph [39].

The second step of DP-FCM is to pass the selected cluster centers to the FCM algorithm as initial centers and obtain the clustering result.

3.2 The steps of DP-FCM algorithm

Figure a The steps of the DP-FCM algorithm
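Since the algorithm box is provided as a figure, the following minimal Python sketch (reusing the hypothetical dpc_rho_delta and fcm helpers sketched in Sect. 2; the parameter names dc and z follow the text, everything else is our own naming) illustrates how the steps of Sect. 3.1 fit together:

```python
import numpy as np

def dp_fcm(X, dc, z, m=2.0):
    """DP-FCM sketch: choose initial centers by the density distance index, then run FCM."""
    rho, delta = dpc_rho_delta(X, dc)        # Eqs. (2)-(3), sketched in Sect. 2.1
    phi = rho * delta                        # density distance index, Eq. (8)
    top = np.argsort(-phi)[:z]               # the z points with the largest phi values
    phi_ave = phi[top].mean()                # average density distance, Eq. (9)
    centers = top[phi[top] > phi_ave]        # candidate centers with phi_i > phi_ave
    V_init = X[centers]
    return fcm(X, V_init, m=m)               # FCM sketched in Sect. 2.2
```

In practice the candidate centers can still be inspected on the decision graph before being handed to FCM, as described in Sect. 3.1.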

Analyzing the above steps, we can see that for a dataset with n samples the time complexity of the algorithm comes mainly from computing the local density \(\rho_{i}\), the distance \(\delta_{i}\), and the FCM step. If Eq. (1) is used to calculate \(\rho_{i}\), we need to find, for each sample i, the samples that are less than \(d_{c}\) away from it; the worst-case time complexity is \(O(n^{2} )\). If Eq. (3) is used, we need to calculate the sum of the distance-weighted contributions of all other samples to each sample i, which also takes \(O(n^{2} )\) time. Computing the distance \(\delta_{i}\) takes \(O\left( {n^{2} } \right)\) time. The time complexity of the FCM step is \(O\left( {npCL} \right)\), where \(n\) is the number of samples, \(p\) is the dimension of the dataset, C is the number of clusters, and \(L\) is the number of iterations. Therefore, the total time complexity of the algorithm is \(O(n^{2} + npCL)\).

4 Experimental results

In order to test the effectiveness of the DP-FCM algorithm, we use several classical artificial datasets and real datasets from the UCI repository. We compare DP-FCM with the FCM algorithm, the k-means algorithm, the DBSCAN algorithm, and an NMF (Non-negative Matrix Factorization)-based model. Among them, k-means and DBSCAN are classical representatives of partition-based and density-based clustering, respectively, and NMF is representative work in clustering. The FCM algorithm depends on its initial cluster centers and has poor stability; the optimized DP-FCM algorithm solves these problems well and improves the effectiveness of the algorithm.

4.1 The datasets and evaluation indexes of experiment

Tables 1 and 2 list the experimental datasets: 6 classical artificial datasets (Table 1) and 10 real datasets from UCI (Table 2).

Table 1 Description of the classical artificial datasets
Table 2 Description of the real datasets

Clustering performance measures are also known as clustering validity indexes. To evaluate performance, we use the metrics Accuracy, ARI (Adjusted Rand Index), and NMI (Normalized Mutual Information) [40]. These metrics are widely used in information retrieval and statistics to evaluate the quality of clustering results. All three metrics range between 0 and 1, and larger values indicate better clustering.

Specifically, Accuracy measures the percentage of data points in the clustering solution that are correctly classified with respect to the pre-defined class labels. It is calculated as:

$$Accuracy = \mathop \sum \limits_{i = 1}^{k} \frac{{\hbox{max} (C_{i} |L_{i} )}}{|X|}$$
(10)

where \(C_{i}\) is the set of instances in the \(i\)th cluster, \(L_{i}\) denotes the class labels of the instances in the \(i\)th cluster, and \(\hbox{max} (C_{i} |L_{i} )\) is the number of instances with the majority label in the \(i\)th cluster (e.g., if label \(l\) appears in the \(i\)th cluster more often than any other label, then \(\hbox{max} (C_{i} |L_{i} )\) is the number of instances in \(C_{i}\) with label \(l\)). \(\left| X \right|\) is the number of elements in the dataset \(X\).
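As an illustration of Eq. (10), the following sketch (assuming integer-coded predicted cluster indices and true class labels; the helper name is ours) counts the majority label within each cluster:

```python
import numpy as np

def clustering_accuracy(pred, labels):
    """Accuracy, Eq. (10): fraction of points carrying the majority label of their cluster."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    correct = 0
    for c in np.unique(pred):
        cluster_labels = labels[pred == c]
        # max(C_i | L_i): count of the most frequent true label inside cluster i
        correct += np.bincount(cluster_labels).max()
    return correct / len(labels)
```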

ARI (Adjusted Rand Index) takes into account the numbers of instance pairs that are placed in the same cluster and in different clusters by the two partitions being compared, and accounts for the fact that the expected value of such a validation measure is not zero when comparing random partitions. It is defined as:

$$ARI = \frac{{n_{11} + n_{00} }}{{n_{11} + n_{10} + n_{01} + n_{00} }}$$
(11)

where \(n_{11}\) is the number of pairs of instances that are in the same cluster in both partitions A and B, \(n_{00}\) is the number of pairs that are in different clusters in both, \(n_{10}\) is the number of pairs that are in the same cluster in A but in different clusters in B, and \(n_{01}\) is the number of pairs that are in different clusters in A but in the same cluster in B.
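For concreteness, the pair counts in Eq. (11) can be accumulated by comparing every pair of instances, as in the following O(n^2) sketch (in practice a library routine such as sklearn.metrics.adjusted_rand_score is typically used for the adjusted variant):

```python
from itertools import combinations

def pair_counting_index(A, B):
    """Pair-counting index of Eq. (11) for two labelings A and B of the same instances."""
    n11 = n00 = n10 = n01 = 0
    for i, j in combinations(range(len(A)), 2):
        same_a, same_b = A[i] == A[j], B[i] == B[j]
        if same_a and same_b:
            n11 += 1          # same cluster in both A and B
        elif not same_a and not same_b:
            n00 += 1          # different clusters in both A and B
        elif same_a:
            n10 += 1          # same cluster in A, different clusters in B
        else:
            n01 += 1          # different clusters in A, same cluster in B
    return (n11 + n00) / (n11 + n10 + n01 + n00)
```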

NMI (Normalized Mutual Information) measures the interdependence between two variables:

$${\text{NMI}} = \frac{I(X;Y)}{{\sqrt {H(X)H(Y)} }}$$
(12)

X and Y are the random variables corresponding to the two labelings being compared. \(I\left( {X;Y} \right)\) is the mutual information of the two variables, and \(H_{X}\) and \(H_{Y}\) are the entropies of X and Y. They are defined as follows:

$$I\left( {X;Y} \right) = \mathop \sum \limits_{x,y} p(x,y)\log \left( {\frac{{p\left( {x,y} \right)}}{p\left( x \right)p\left( y \right)}} \right)$$
(13)
$$H_{X} = \mathop \sum \limits_{i = 1}^{n} p(x_{i} )I(x_{i} ) = \mathop \sum \limits_{i = 1}^{n} p(x_{i} )\log \frac{1}{{p(x_{i} )}} = - \mathop \sum \limits_{i = 1}^{n} p\left( {x_{i} } \right)\log p(x_{i} )$$
(14)
$$H_{Y} = \mathop \sum \limits_{i = 1}^{n} p(y_{i} )I(y_{i} ) = \mathop \sum \limits_{i = 1}^{n} p(y_{i} )\log \frac{1}{{p(y_{i} )}} = - \mathop \sum \limits_{i = 1}^{n} p(y_{i} )\log p(y_{i} )$$
(15)

4.2 Analysis of experimental results on the artificial datasets

Figures 1, 2, 3, 4, 5 and 6 show the clustering results of the DP-FCM algorithm on six artificial datasets: D31, R15, Square3, Long1, Shapes and Twenty, which are described in Table 1. The x- and y-coordinates span the two-dimensional plane. Mapping the results onto this plane shows that the DP-FCM algorithm achieves a clear improvement on the clustering of these balanced datasets.

Fig. 1 Clustering result of D31

Fig. 2 Clustering result of R15

Fig. 3 Clustering result of Square3

Fig. 4 Clustering result of Long1

Fig. 5 Clustering result of Shapes

Fig. 6 Clustering result of Twenty

4.3 Analysis of experimental results on the real datasets in UCI

The experimental results on the real UCI datasets are shown in Tables 3, 4 and 5, which report the Accuracy, ARI (Adjusted Rand Index), and NMI (Normalized Mutual Information) of the clustering results for each algorithm, respectively. The results are given as percentages, and the bolded values in the three tables indicate the best result for each dataset.

Table 3 The accuracy comparison of clustering algorithms
Table 4 The ARI comparison of clustering algorithm
Table 5 The NMI comparison of clustering algorithm

5 Summary and conclusion

The FCM algorithm requires the number of clusters to be determined manually and is sensitive to the initial cluster centers. To solve this problem, we propose an improved FCM algorithm based on density peaks: the DP-FCM algorithm. This algorithm uses the density peaks of the data points to optimize the selection of the initial cluster centers, which reduces the number of iterations, improves the convergence speed, avoids falling into local optima, and effectively remedies the shortcomings of the original algorithm. In the experiments, 16 datasets (6 artificial datasets and 10 real datasets from UCI) were used to analyze the algorithm with respect to three measures: Accuracy, ARI, and NMI. The experimental results show that DP-FCM is superior to the compared FCM, k-means, DBSCAN and NMF algorithms, which demonstrates that DP-FCM is an effective clustering algorithm.