1 Introduction

In this age of internet dominance and rapid technological advancement, systems must be safe and sound so that they can withstand the intruders and spammers in their surroundings; data mining and anonymization have therefore become sought-after topics. The proposed method takes up the clustering and optimization of big datasets by calculating the cluster centers with Fast Density Peak Clustering (FCDP) and then clustering the data using fuzzy logic clustering. For the optimization, we use the Crowding Differential Evolution (CDE) methodology, which is based on evolutionary multimodal optimization. Clustering categorical data in a definite manner has long been a tough and puzzling job because of a distinctive property of categorical features: they have no natural order. One study addressed this case with a two-step process called the Partition-cum-Merge based Fuzzy Genetic Clustering Algorithm (PM-FGCA) for categorical data [16]. To estimate clustering performance, PM-FGCA was compared with several existing approaches, such as the fuzzy k-modes procedure, the k-modes procedure, the non-dominated sorting genetic procedure, and the genetic fuzzy k-modes procedure using fuzzy membership chromosomes. Normalized Mutual Information (NMI), the Adjusted Rand Index (ARI), and the Davies-Bouldin (DB) index were chosen as the three validation indices, covering both internal and external measures. The experimental outcomes show that the offered methodology outperforms the standard approaches on the verified indices. Among the meta-heuristic, nature-inspired optimization procedures, Bee Colony Optimization (BCO) procedures have been widely used to solve the clustering problem. One such work proposed a Modified BCO (MBCO) method for data clustering; in the offered MBCO, the reconciliation features of the bees were exploited and a good level of opportunity was assigned to both reliable and unreliable bees [6].

The hybrid KMCLUST and MKCLUST variants gave outcomes equal to or better than the offered MBCO. To validate the offered procedures, seven standard datasets were used. From the percentage-error computation, it was observed that the offered procedures perform well compared with several existing procedures, and the experimental outcomes indicate that they can be capably used for data clustering. Clustering, which underpins data visualization and distribution analysis, has been widely investigated in recent years. While current clustering algorithms such as DBSCAN can recognize and handle arbitrarily shaped clusters well, it can sometimes be challenging to choose the parameters these techniques require. Clustering by a fast search for density peaks is a promising technique for resolving these issues. The existing methods, moreover, suffered from unequal distribution between regional clusters. To redress this issue, one method introduced a new hierarchical density peak clustering methodology called HCFS, consisting primarily of two levels [37]. HCFS determines each point's range and density in the initial stage. Investigations on a large number of datasets indicate that this operation can efficiently detect irregular clusters and produce improved performance across several kinds of datasets. To improve on earlier efforts and on efficiency, an ensemble clustering method implemented the interesting concept of removing some of the evidence using a correlation matrix, which yields more refined clustering outcomes [35].

Fig. 1 above shows the typical computation of the multivariate kernel density, with the relation modified from the traditional work. A fresh density-based clustering procedure, RNN-DBSCAN, was offered, which uses reverse nearest-neighbor counts as an estimate of observation density. This DBSCAN-like approach accomplishes clustering based on a k-nearest-neighbor graph, traversing through the dense observations. RNN-DBSCAN improves on the best-known density-based clustering methodology in two respects. Clustering is able to identify the underlying data distribution, which is beneficial in data-driven machine learning [3]. Density-based clustering has the attractive property of identifying clusters of arbitrary shape. The density peak technique uses two rules to sense cluster centers and then assigns the remaining data; this methodology is easy to implement and has been shown to be promising in numerous studies, which used dissimilar datasets to exhibit the performance of their methods [31, 10].

Fig. 1

Computation of modified Multivariate Kernel Density [21] based on [19]

A feature-weighted clustering model based on the notion of object-cluster similarity was offered. A combined weighting scheme [11] for numerical and categorical features was suggested, which quantifies the feature-to-cluster contribution by taking both inter-cluster difference and intra-cluster similarity into account [17]. An initialization-oriented technique was also presented, which can efficiently increase the stability and precision of k-means-type clustering techniques on numerical, categorical, and mixed data. Real-world problems often have multiple solutions: for example, optical engineers must alter the recording parameters to obtain many optimal answers in the varied-line-spacing holographic grating design problem [27].

1.1 Objective

  • To reduce the issues of conventional optimization algorithms, the proposed optimization technique is followed.

  • To execute the Fast Clustering Density Peak (FCDP) method and calculate the density values.

  • To optimize the data using the Crowding Differential Evolution (CDE) methodology.

2 Background overview

Multimodal optimization problems (MMOPs) must obtain multiple optima at the same time, so the diversity of the population is a crucial issue that should be treated when designing an evolutionary optimization methodology for them. Using evolutionary multi-objective optimization to ensure proper population diversity, one methodology devised a tri-objective Differential Evolution (DE) approach for solving multimodal optimization problems [34]. When tackling these problems, the work first converts each of them into a Tri-objective Optimization Problem (TOP). The three optimization goals were developed from three measures: the MMOP target function, the distance information assessed by a set of reference points, and the shared fitness obtained by a niching methodology. The first two aims are potentially contradictory, which is exploited to full benefit in the evolutionary multi-objective optimization stage. The third aim, formed by the niching methodology, is not very sensitive to the niching parameters yet greatly enhances population diversity. Computational evidence was provided to prove that the TOP's Pareto-optimal front comprises most of the global optima of the multimodal optimization problem.

A newer niching methodology based on repelling subpopulations, which makes few assumptions, was introduced in another work. The devised methodology was implemented on top of the Covariance Matrix Self-Adaptation Evolution Strategy (CMSA-ES) to facilitate simultaneous convergence to several kinds of minima. The resulting operation, known as Covariance Matrix Self-Adaptation with Repelling Subpopulations (RS-CMSA), was evaluated and compared with a variety of recent niching algorithms on a standard test suite for multimodal optimization [1].

Multimodal benchmark problems are often needed to compare the performance of multimodal optimization methodologies. In one work, fifteen new scalable, multimodal, real-parameter benchmark problems were developed. Of the fifteen, eight were modified basic functions and the remainder were composition functions [20]. These 15 functions include shift and rotation operations to generate linkage between dimensions and to position the optima at appropriate locations. Four simple niching methodologies were taken to address the devised problems, and the investigations and analyses of the proposed problems held up well against these four methodologies from the literature. Data mining is a necessary operation in many evolving computer-related innovations, since it reduces the complexity of datasets through improved abstraction. Beyond this, a whole collection of issues exists in data mining, so many methodologies have come into vogue to address the shortcomings of existing methods [14]. Clustering, association, and classification rule mining are all useful and necessary for mining information, so many researchers have started exploring the intertwining and inter-association between them; such work can be a helping hand for researchers in these domains. One such work deals with the difficulties encountered in clustering both high-dimensional and simple databases.

Untagged matte data is common in many applications. Since matte data has no geometric structure, uncovering patterns and knowledge from it is a significant challenge. For such data, a rough fuzzy clustering algorithm was devised [30]. The proposed algorithm and the comparison algorithms were run on real datasets, and the experimental results showed that the proposed algorithm surpasses the comparison algorithms on many datasets, proving that it is capable of producing a useful clustering for matte datasets. In another work, a Latent Feature Grouping Learning (LFGL) methodology was devised to uncover the feature grouping structures and subspace clusters of high-dimensional data. To better handle clusters of varying density in high-dimensional data, the original attribute weighting of the group-based weighted k-means methodology was amended with a mass-based dissimilarity measure instead of the Euclidean distance measure, and the attribute weights were optimized as a nonnegative matrix factorization problem over the orthogonal matrix of the attribute weight matrix [25].

Another paper developed a data-driven, nonparametric approach to reformulate individual and joint chance constraints with right-hand-side (RHS) uncertainty into algebraic constraints [4]. Given historical data for univariate or multivariate continuous random variables (the uncertain parameters in an optimization model), the joint cumulative distribution function and the inverse cumulative distribution function (the quantile function) are calculated for the univariate and multivariate cases, respectively. This approach depends on constructing a confidence set that contains the unknown true distribution, modeled via Φ-divergences. Spatially contiguous clusters can improve interpretation, since such clusters partition the geographical sub-regions in a meaningful manner. One paper therefore developed an agglomerative hierarchical clustering approach that accounts for the spatial dependency between observations [8]. It relies on a dissimilarity matrix built from a nonparametric kernel estimator of the data's multivariate spatial dependence structure. The proposed approach's capability to provide spatially compact, connected, and meaningful clusters was assessed using real and multivariate synthetic datasets, and it gave satisfactory results compared with other geostatistical clustering methods.

Grouping similar and separating dissimilar data is done by a technique called data clustering, but many clustering algorithms fail when dealing with multi-dimensional scan data. One paper therefore introduced efficient cuckoo-optimization-based methods for data clustering, known as COAC and the Fuzzy Cuckoo Optimization Algorithm (FCOAC) [2]. The algorithm's performance was compared and evaluated against GSA, PSO, K-means, CS, COAC, and black hole algorithms, and the overall results show that it had superior performance compared with other state-of-the-art methods. Multimodal optimization, which pursues multiple optima simultaneously, is attracting increasing attention yet remains challenging. Exploiting the advantage of Ant Colony Optimization (ACO) in preserving high diversity, another paper extended ACO algorithms to deal with multimodal optimization. The overall comparisons demonstrated the effectiveness and efficiency of the devised methodology, especially on difficult problems with high numbers of local optima [32].

Along the same lines, another clustering method, ClusterKDE, based on univariate kernel density estimation, was proposed [13]. It comprises an iterative procedure wherein a new cluster is obtained at each step by minimizing a smooth kernel function. Although the univariate Gaussian kernel was used in the reported applications, any smooth kernel function could be deployed in this scheme. An advantage of the devised methodology is that it does not require the number of clusters a priori. ClusterKDE is also easy to use, stops in a finite number of steps, and converges independently of the initial point. The outcomes show that ClusterKDE was superior and fast when compared with the clusterdata and K-means routines available in Matlab for clustering data.

Like many other studies, one work devised a unique scheme for determining the number of clusters and simultaneously categorizing the data points into clusters by utilizing subspace clustering [12]. Practical data distributed in a high-dimensional space can be separated into a combination of low-dimensional subspaces, which gives desirable results when exploited. Triplet relationships are acquired from the self-representation matrix and then deployed to assign the data points to the desired clusters iteratively. This methodology gives the cluster count automatically and then merges clusters to avoid over-segmenting the intended region. Comprehensive experimental outcomes on both artificial and practical datasets verify the methodology's robustness and effectiveness. Point clustering mainly groups N given points into K clusters, such that similarity among objects in the same group is high whereas similarity among objects in different groups is low. One voting-based formulation maximizes the sum of votes between points in the same cluster, and this technique has advantages for clusters of various sizes and densities [18]. An extension of the basic fuzzy k-means algorithm, called the robust and sparse fuzzy k-means algorithm, integrates a robust function to handle outliers, and a penalty term is introduced to ensure the sparseness of the object-cluster memberships [29].

3 Summarized works of data clustering and optimization

In this section, we summarize some papers available in the literature, as shown in Table 1 below.

Table 1 Literature level summarization of clustering and optimization for data mining

4 The overall flow of the clustering and optimization

The proposed optimization method addresses issues that exist in traditional optimization algorithms. For this purpose, the efficient Fast Clustering Density Peak method is executed to calculate the density values, and the Crowding Differential Evolution technique is followed for data optimization. In this section, the flow of the devised work is elaborated; the aim here is to cluster high-dimensional data using the proposed novel method. Clustering algorithms analyze data by discovering their underlying structure and organizing them into separate categories according to their characteristics, expressed as internal homogeneity and external separation, without prior knowledge.

The FDP algorithm, published in Science in 2014, is a method for determining categories based only on the distances between data points. It assumes that each cluster center is surrounded by data points of lower local density and lies at a large distance from other centers. Evolutionary algorithms for multimodal optimization usually not only locate multiple optima in a single run but also preserve population diversity throughout the run, yielding global optimization ability on multimodal functions. The techniques developed for multimodal optimization are also borrowed as diversity-maintenance techniques for other problems. The novelty of this paper lies in calculating the density using a kernel estimator and then using the optimization method for the clustering operation in a unique way.

5 Fast density peak clustering (FCDP)

5.1 Distance matrix

The distance matrix dij is given by:

$$ \mathrm{Distance},\ {d}_{ij}={\left[{\left|{n}_i-{n}_j\right|}^2\right]}^{\frac{1}{2}} $$
(1)

5.2 Cut-off distance

The cut-off distance dcut is given by:

$$ {d}_{\mathrm{cut}}={\sum}_{i=1}^{D}\mathrm{mean}{\left({d}_{ij}\right)}^{D} $$
(2)

5.3 Density of point

The novelty lies chiefly in this part, wherein we determine the density using the kernel estimator k after finding the center value. The density of point i, ρi, is given by:

$$ {\uprho}_{\mathrm{i}}=\frac{1}{nd}.{\sum}_{\mathrm{j}=1}^{\mathrm{n}}\mathrm{k}\left({d}_{ij}-{\mathrm{d}}_{\mathrm{cut}}\right) $$
(3)

where nd indicates the number of dimensions, which depends on the dataset utilized.
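To make Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch rather than the authors' implementation: the Gaussian kernel k, the reading of Eq. (2) as a plain mean of the pairwise distances, and the data layout X of shape (n, nd) are all assumptions made for illustration.

```python
import numpy as np

def fcdp_density(X):
    """Illustrative sketch of Eqs. (1)-(3): distance matrix, cut-off
    distance, and kernel-based local density for FCDP."""
    n, nd = X.shape
    # Eq. (1): pairwise Euclidean distance matrix d_ij
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Eq. (2): cut-off distance; the paper's exponent D is ambiguous,
    # so the mean pairwise distance is used here as one plausible reading
    d_cut = d.mean()
    # Eq. (3): rho_i = (1/nd) * sum_j k(d_ij - d_cut), with an assumed
    # Gaussian kernel k
    k = lambda u: np.exp(-u ** 2)
    rho = k(d - d_cut).sum(axis=1) / nd
    return d, d_cut, rho
```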

Based on the conditions below, the obtained points are converted to values of 0 and 1, using the two relationships for δi as appropriate.

$$ X\left(x\right)=\left\{\begin{array}{cc}1,& x<0\\ {}0,& \mathrm{otherwise}\end{array}\right. $$
(4)

where δi, the distance of point i, is defined as the distance between point i and its nearest neighbor with higher density:

$$ {\delta}_i=\underset{j\in S,\ {\rho}_j>{\rho}_i}{\min }{d}_{ij} $$
(5)

When point i has the highest density, δi is instead calculated as follows:

$$ {\delta}_i=\underset{j\in S}{\max }{d}_{ij} $$
(6)
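Continuing the same sketch, Eqs. (5) and (6) can be computed as below; this assumes the distance matrix d and densities rho from the previous sketch and is again an illustrative reading, not the authors' code.

```python
import numpy as np

def fcdp_delta(d, rho):
    """Eq. (5): delta_i is the distance to the nearest neighbour of
    higher density; Eq. (6): the highest-density point takes the
    maximum distance instead."""
    n = len(rho)
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]   # j in S with rho_j > rho_i
        if higher.size > 0:
            delta[i] = d[i, higher].min()    # Eq. (5)
        else:
            delta[i] = d[i].max()            # Eq. (6)
    return delta

# Cluster centers are then the points with both large rho and large
# delta (e.g., the top-k values of rho * delta for an assumed k).
```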

6 Fuzzy k-means clustering

FKM splits the data and makes a partition into a defined number of groups, t. The FKM methodology is based on minimizing the objective function ClusFuz:

$$ {Clus}_{Fuz}={\sum}_{i=1}^d{\sum}_{j=1}^k{u}_{ij}^m{d}_{ij}^2 $$
(7)
$$ {u}_{ij}=\frac{1}{\sum_{p=1}^k{\left(\frac{d_{ij}}{d_{ip}}\right)}^{\frac{2}{m-1}}} $$
(8)

In the above equation, m must not equal one, and dij is the same distance already defined in Eq. (1).

The exponent m in Eq. (7) is the fuzzifier parameter; it controls the fuzziness of the clustering.

Based on the above discussion, the operational steps of the FKM algorithm could be summarized as follows:

  1. Step 1:

    Select the number of clusters k and the level of fuzziness m, along with a threshold value e, and initialize the fuzzy partition matrix.

  2. Step 2:

    Estimate the cluster centers with the help of Eqs. 7 and 8.

  3. Step 3:

    Estimate the distances dij for the taken samples. Then determine the values of uij and update the fuzzy partition matrix accordingly.

  4. Step 4:

    Estimate the objective function ClusFuz using Eq. (7). Check convergence: if the difference between the two most recent values of the objective function is lower than the chosen threshold, terminate; otherwise return to Step 2. A minimal sketch of these steps is shown below.
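The following is a minimal NumPy sketch of the four steps above, assuming Euclidean distances and random initialization; the default values for the fuzzifier m, the threshold e (eps), and the iteration cap are illustrative choices, not values fixed by the paper.

```python
import numpy as np

def fuzzy_k_means(X, k, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Sketch of the FKM loop minimizing Clus_Fuz (Eqs. 7-8)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize the fuzzy partition matrix U (rows sum to 1)
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 2: cluster centers as U^m-weighted means of the data
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: distances d_ij, then membership update via Eq. (8)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        d = np.fmax(d, 1e-12)                # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: objective Clus_Fuz (Eq. 7) and convergence test
        obj = ((U ** m) * d ** 2).sum()
        if abs(prev_obj - obj) < eps:
            break
        prev_obj = obj
    return centers, U
```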

7 Evolutionary multimodal optimization

An evolutionary multimodal optimization methodology is intended to determine the numerous (local or global) optima of a multimodal function. Typical problem settings include determining all global optima, determining k global optima, determining the k best optima, or determining all local and global optima.

One such methodology is the Crowding Differential Evolution (CDE) methodology, which is discussed in the next sub-section (Fig. 2).

Fig. 2

Proposed outline for the fuzzy k-means clustering and optimization of data

7.1 Crowding differential evolution

For every offspring in every generation, crowding differential evolution replaces the most similar existing individual. Although this entails additional distance computations, it effectively transforms differential evolution into an algorithm suited to multimodal optimization.

7.2 Algorithm for crowding differential evolution

The algorithm for crowding differential evolution is given below:

figure a

In each generation, for each individual ind, a trial vector is generated by TRIALVECTORGENERATION to produce an offspring. To form the trial vector, three individuals are randomly selected by the algorithm: one individual forms the base vector and the other two form a difference vector. The trial vector is formed by the sum of these two vectors, which is then recombined with the parent to form the offspring Off_sp.

With this trial vector generation, the crossover parameter no longer requires manual tuning and a separate mutation operator is not needed; the typical crossover and mutation operations are replaced. Differential evolution's adaptive ability selects an appropriate step size, and its self-organizing ability moves the population towards the optima. A sketch of the procedure appears below.
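The following is a minimal sketch of crowding DE for a real-valued minimization problem; the DE/rand/1/bin scheme and the parameter values F and CR are standard illustrative choices, not values fixed by the paper.

```python
import numpy as np

def crowding_de(f, low, high, pop_size=50, F=0.5, CR=0.9,
                generations=200, seed=0):
    """Sketch of crowding DE: each offspring replaces the most similar
    population member, but only if it has better (lower) fitness."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    dim = low.size
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Trial vector: base vector r0 plus scaled difference r1 - r2
            choices = [j for j in range(pop_size) if j != i]
            r0, r1, r2 = rng.choice(choices, size=3, replace=False)
            trial = pop[r0] + F * (pop[r1] - pop[r2])
            # Binomial crossover with the parent gives offspring Off_sp
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            off = np.clip(np.where(mask, trial, pop[i]), low, high)
            # Crowding: compete with the most similar existing member
            nearest = np.linalg.norm(pop - off, axis=1).argmin()
            f_off = f(off)
            if f_off < fit[nearest]:
                pop[nearest], fit[nearest] = off, f_off
    return pop, fit
```

On a function with several minima of similar quality, the final population typically retains members near multiple optima rather than collapsing onto a single one, which is the behavior multimodal optimization requires.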

8 Comparative analysis of the proposed method and its performance

The existing methods [37] and [35] were taken for our comparative analysis to show the effectiveness of the proposed work.

8.1 Processed dataset

Artificial and real benchmark datasets, namely Aggregation, D31, Flame, Path-based, R15, Iris, Seeds, Glass, Wine, and Spiral, are used for a comprehensive performance analysis of the proposed method against the existing methods.

The proposed method is examined on various datasets [35, 36], of which half are real and half are synthetic. A detailed description of the datasets is shown in Table 2.

Table 2 Dataset Description

8.2 Performance metrics investigation

Two performance metrics, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), are utilized for the comparative study on the real and artificial datasets.

Hubert's statistic is defined as:

$$ Huber{t}^{\prime }s=\left( Ma-{m}_1{m}_2\right)/\sqrt{m_1{m}_2\left(M-{m}_1\right)\left(M-{m}_2\right)} $$
(9)

where M = a + b + c + d, m1 = a + b, and m2 = a + c.

In Eq. (9), the term a defines the number of pairs in which the two objects belong to the same cluster with respect to both P and P′; b defines the number of pairs in which the two objects belong to the same cluster in P but to different clusters in P′; c defines the number of pairs in which the two objects belong to different clusters in P but to the same cluster in P′; finally, d is the number of pairs in which the two objects belong to different clusters with respect to both P and P′.
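A small sketch of Eq. (9) by direct pair counting is given below; it uses the product-form denominator shown above and is illustrative rather than the authors' implementation.

```python
from itertools import combinations

def hubert_statistic(labels_p, labels_q):
    """Pair-counting Hubert statistic between two partitions P and P'."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_p)), 2):
        same_p = labels_p[i] == labels_p[j]
        same_q = labels_q[i] == labels_q[j]
        if same_p and same_q:
            a += 1        # same cluster in both P and P'
        elif same_p:
            b += 1        # same in P, different in P'
        elif same_q:
            c += 1        # different in P, same in P'
        else:
            d += 1        # different in both
    M, m1, m2 = a + b + c + d, a + b, a + c
    return (M * a - m1 * m2) / (m1 * m2 * (M - m1) * (M - m2)) ** 0.5
```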

Mutual information measures the information shared by a pair of clusterings; the normalized mutual information (NMI), used here as an external validity criterion, is defined as:

$$ NMI\left(P,{P}^{\prime}\right)=\frac{\sum_{i=1}^k{\sum}_{j=1}^{k^{\prime}}\left|{c}_i\cap {c}_j^{\prime}\right|\ \log \left(\frac{N\left|{c}_i\cap {c}_j^{\prime}\right|}{\left|{c}_i\right|\left|{c}_j^{\prime}\right|}\right)}{\sqrt{\left({\sum}_{i=1}^k\left|{c}_i\right|\log \frac{\left|{c}_i\right|}{N}\right)\left({\sum}_{j=1}^{k^{\prime}}\left|{c}_j^{\prime}\right|\log \frac{\left|{c}_j^{\prime}\right|}{N}\right)}} $$
(10)

where |C| denotes the number of objects in C.

The Adjusted Rand Index (ARI) is calculated by:

$$ ARI\left(P,{P}^{\prime}\right)=\frac{\sum_{i=1}^k{\sum}_{j=1}^{k^{\prime }}\binom{n_{ij}}{2}-\left[{\sum}_{i=1}^k\binom{a_i}{2}{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]/\binom{n}{2}}{\frac{1}{2}\left[{\sum}_{i=1}^k\binom{a_i}{2}+{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]-\left[{\sum}_{i=1}^k\binom{a_i}{2}{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]/\binom{n}{2}} $$
(11)

where ai is the number of points in the ith cluster, bj is the number of points in the jth ground-truth class, and nij is the number of points common to cluster i and class j.
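Rather than implementing Eqs. (10) and (11) from scratch, reference implementations of both indices are available in scikit-learn; the label vectors below are purely illustrative.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Ground-truth classes P and predicted clusters P' (illustrative labels)
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print("ARI:", adjusted_rand_score(truth, pred))           # Eq. (11)
print("NMI:", normalized_mutual_info_score(truth, pred))  # Eq. (10)
```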

Table 3 below shows the ARI values of the existing and proposed methods.

Table 3 Performance metrics for ARI of the existing [36] and proposed method

Fig. 3 above shows the performance-level comparison of ARI for the proposed and existing methods. The proposed optimization method yielded superior results to the other existing methods considered: ISB, NCFS, CFS, IS, and HCFS. Here NCFS refers to the density-kernel variant of density peak based clustering, CFS to clustering by fast search and find of density peaks, and HCFS to the density peak based clustering algorithm employing a hierarchical strategy.

Fig. 3

Performance level comparison for ARI

Next, the Normalized Mutual Information (NMI) values are tabulated in Table 4 below.

Table 4 Performance metrics for NMI of the existing [36] and proposed method

Fig. 4 above shows the performance-level comparison of the NMI values for the proposed and existing methods. The proposed optimization method yielded superior results to the other existing methods (ISB, NCFS, CFS, IS, and HCFS) because it clusters efficiently by finding the centers, the cut-off distance, and the density of each point.

Fig. 4

Performance level comparison for NMI

The F1 score values are tabulated in Table 5 below to validate the performance of the proposed clustering-cum-optimization method against the existing methods.

Table 5 Performance metrics for the F1 measure of the existing [10] and proposed method

Fig. 5 above shows the performance comparison based on the F1 scores of the proposed and existing methods with respect to the optimization. The proposed optimization method yielded superior results to the other existing methods considered: conventional K-means, AP, DP-C, and REDPC. Here DP-C denotes the original density peak algorithm with the cutoff kernel, AP denotes affinity propagation, and REDPC denotes record-based density peak clustering.

Fig. 5

Performance level comparison for F1 score

Then, the NMI values are tabulated in Table 6 below to validate the performance of the proposed clustering-cum-optimization method against the existing methods.

Table 6 Performance metrics for NMI of the existing [35] and proposed method

Fig. 6 above indicates the performance-level comparison based on the NMI of the proposed and existing methods with respect to the optimization. The proposed optimization method yielded superior results to the other existing methods: NegMM, WCT, WTQ, CSM, CSPA, and REDPC. The comparison covers the Link-based Cluster Ensemble (LCE) and Strehl's algorithms. The former has three variants: Weighted Connected-Triple (WCT), Weighted Triple-Quality (WTQ), and Combined Similarity Measure (CSM). The latter also has three variants: Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), and Meta-Clustering Algorithm (MCLA).

Fig. 6

Performance level comparison for NMI

Table 7 above shows the performance metrics for Hubert's statistic of the existing and proposed methods. Here too, the proposed method was more effective than the existing methods for the most part; in some cases its performance coincided with theirs, as for the R15, Iris, and Wine datasets.

Table 7 Performance metrics for Hubert’s of the existing [35] and proposed method

Fig. 7 depicts that the proposed method shows the highest accuracy compared with the other existing methods. The second-highest accuracy, 0.9333, was yielded by the existing CVR-LMV method (clustering based on voting representation), and the lowest accuracy was yielded by the PAM method.

Fig. 7

Resulting accuracy comparison [18]

9 Conclusion

In the proposed methodology, we first clustered the data after determining the centers, cut-off distance, and density of each point using FCDP (Fast Density Peak Clustering) with the novel steps described above; fuzzy k-means clustering was then applied to form the clusters. Next, the Crowding Differential Evolution (CDE) methodology was used to optimize the data, a novel use in this paper, since CDE had previously been deployed for scheduling and other related operations. Finally, we measured the performance of the proposed work using metrics such as NMI, ARI, and Hubert's statistic. In all but a few cases, superior results were obtained; the few coinciding outcomes arose in the Hubert's statistic comparison. Future enhancements will include various additional datasets and deep learning techniques to improve the current optimization approach.