1 Introduction

In this age of internet dominance and rapid technological advancement, systems must be safe and sound so that they can withstand the intruders and spammers in their surroundings; data mining and anonymization have therefore become sought-after topics. The proposed method takes up the clustering and optimization of big datasets by calculating the cluster centers with Fast Density Peak Clustering (FCDP) and then clustering the data using fuzzy logic clustering. For the optimization, we use the Crowding Differential Evolution (CDE) methodology, which is based on evolutionary multimodal optimization. Clustering categorical data in a definite manner has long been a tough and puzzling job because of a distinctive property of categorical features: they have no natural order. One study addressed this case with a two-step process called the Partition-cum-Merge based Fuzzy Genetic Clustering Algorithm (PM-FGCA) for categorical data [16]. To estimate clustering performance, PM-FGCA was compared with several existing approaches, such as the fuzzy k-modes procedure, the k-modes procedure, the non-dominated sorting genetic procedure, and the genetic fuzzy k-modes procedure using fuzzy membership chromosomes. Normalized Mutual Information (NMI), the Adjusted Rand Index (ARI), and the Davies-Bouldin (DB) index were chosen as the three validation indices, covering both internal and external measures. The experimental outcomes show that the offered methodology outperforms the standard approaches on the verified indices. Among the meta-heuristic, nature-inspired optimization procedures, Bee Colony Optimization (BCO) procedures have been widely used to solve the clustering problem. One such work proposed a Modified BCO (MBCO) method for data clustering; in the offered MBCO, the reconciliation features of the bees were exploited and a good level of opportunity was assigned to both reliable and unreliable bees [6].

The hybrid KMCLUST and MKCLUST variants gave outcomes equal to or better than the offered MBCO. To validate the offered procedures, seven standard datasets were used. From the percentage-error computation, it was observed that the offered procedures perform well compared with several existing procedures, and the experimental outcomes indicate that they can be capably used for data clustering. Clustering, which underpins data visualization and distribution analysis, has been widely investigated in recent years. While current clustering algorithms such as DBSCAN can recognize and handle arbitrarily shaped clusters well, it can sometimes be challenging to choose the parameters these techniques require. Clustering by a fast search for density peaks is a promising technique for resolving these issues. The existing methods, moreover, suffered from unequal distribution between regional clusters. To redress this issue, one method introduced a new hierarchical density peak clustering methodology called HCFS, consisting primarily of two levels [37]. HCFS determines each point's range and density in the initial stage. Investigations on a large number of datasets indicate that this operation can efficiently detect irregular clusters and produce improved performance across several kinds of datasets. To improve on earlier efforts and on efficiency, an ensemble clustering method implemented the interesting concept of removing some of the evidence using a correlation matrix, which yields more refined clustering outcomes [35].

Fig. 1 above shows the typical computation of the multivariate kernel density, with the relation modified from the traditional work. A fresh density-based clustering procedure, RNN-DBSCAN, was offered, which uses reverse nearest-neighbor counts as an estimate of observation density. This DBSCAN-like approach accomplishes clustering based on a k-nearest-neighbor graph, traversing through the dense observations. RNN-DBSCAN improves on the best-known density-based clustering methodology in two respects. Clustering is able to identify the underlying data distribution, which is beneficial in data-driven machine learning [3]. Density-based clustering has the attractive property of identifying clusters of arbitrary shape. The density peak technique uses two rules to sense cluster centers and then assigns the remaining data; this methodology is easy to implement and has been shown to be promising in numerous studies, which used dissimilar datasets to exhibit the performance of their methods [31, 10].

Fig. 1

Computation of modified Multivariate Kernel Density [21] based on [19]

A feature-weighted clustering model based on the notion of object-cluster similarity was offered. A combined weighting scheme [11] for numerical and categorical features was suggested, which quantifies the feature-to-cluster contribution by taking both inter-cluster difference and intra-cluster similarity into account [17]. An initialization-oriented technique was also presented, which can efficiently increase the stability and precision of k-means-type clustering techniques on numerical, categorical, and mixed data. Real-world problems often have multiple solutions: for example, optical engineers must alter the recording parameters to obtain many optimal answers in the varied-line-spacing holographic grating design problem [27].

1.1 Objective

  • To reduce the issues of conventional optimization algorithms, the proposed optimization technique is followed.

  • To execute the Fast Clustering Density Peak (FCDP) method and calculate the density values.

  • To optimize the data using the Crowding Differential Evolution (CDE) methodology.

2 Background overview

Multimodal optimization problems (MMOPs) must obtain multiple optima at the same time, so the diversity of the population is a crucial issue that should be treated when designing an evolutionary optimization methodology for them. Using evolutionary multi-objective optimization to ensure proper population diversity, one methodology devised a tri-objective Differential Evolution (DE) approach for solving multimodal optimization problems [34]. When tackling these problems, the work first converts each of them into a Tri-objective Optimization Problem (TOP). The three optimization goals were developed from three measures: the MMOP target function, the distance information assessed by a set of reference points, and the shared fitness obtained by a niching methodology. The first two aims are potentially contradictory, which is exploited to full benefit in the evolutionary multi-objective optimization stage. The third aim, formed by the niching methodology, is not very sensitive to the niching parameters yet greatly enhances population diversity. Computational evidence was provided to prove that the TOP's Pareto-optimal front comprises most of the global optima of the multimodal optimization problem.

A newer niching methodology based on repelling subpopulations, which makes few assumptions, was introduced in another work. The devised methodology was implemented on top of the Covariance Matrix Self-Adaptation Evolution Strategy (CMSA-ES) to facilitate simultaneous convergence to several kinds of minima. The resulting operation, known as Covariance Matrix Self-Adaptation with Repelling Subpopulations (RS-CMSA), was evaluated and compared with a variety of recent niching algorithms on a standard test suite for multimodal optimization [1].

Multimodal benchmark problems are often needed to compare the performance of multimodal optimization methodologies. In one work, fifteen new scalable, multimodal, real-parameter benchmark problems were developed. Of the fifteen, eight were modified basic functions and the remainder were composition functions [20]. These 15 functions include shift and rotation operations to generate linkage between dimensions and to position the optima at appropriate locations. Four simple niching methodologies were taken to address the devised problems, and the investigations and analyses of the proposed problems held up well against these four methodologies from the literature. Data mining is a necessary operation in many evolving computer-related innovations, since it reduces the complexity of datasets through improved abstraction. Beyond this, a whole collection of issues exists in data mining, so many methodologies have come into vogue to address the shortcomings of existing methods [14]. Clustering, association, and classification rule mining are all useful and necessary for mining information, so many researchers have started exploring the intertwining and inter-association between them; such work can be a helping hand for researchers in these domains. One such work deals with the difficulties encountered in clustering both high-dimensional and simple databases.

Untagged matte data is common in many applications. Since matte data has no geometric structure, uncovering patterns and knowledge from it is a significant challenge. For such data, a rough fuzzy clustering algorithm was devised [30]. The proposed algorithm and the comparison algorithms were run on real datasets, and the experimental results showed that the proposed algorithm surpasses the comparison algorithms on many datasets, proving that it is capable of producing a useful clustering for matte datasets. In another work, a Latent Feature Grouping Learning (LFGL) methodology was devised to uncover the feature grouping structures and subspace clusters of high-dimensional data. To better handle clusters of varying density in high-dimensional data, the original attribute weighting of the group-based weighted k-means methodology was amended with a mass-based dissimilarity measure instead of the Euclidean distance measure, and the attribute weights were optimized as a nonnegative matrix factorization problem over the orthogonal matrix of the attribute weight matrix [25].

Another paper developed a data-driven, nonparametric approach to reformulate individual and joint chance constraints with right-hand-side (RHS) uncertainty into algebraic constraints [4]. Given historical data for univariate or multivariate continuous random variables (the uncertain parameters in an optimization model), the joint cumulative distribution function and the inverse cumulative distribution function (the quantile function) are calculated for the univariate and multivariate cases, respectively. This approach depends on constructing a confidence set that contains the unknown true distribution, modeled via Φ-divergences. Spatially contiguous clusters can improve interpretation, since such clusters partition the geographical sub-regions in a meaningful manner. One paper therefore developed an agglomerative hierarchical clustering approach that accounts for the spatial dependency between observations [8]. It relies on a dissimilarity matrix built from a nonparametric kernel estimator of the data's multivariate spatial dependence structure. The proposed approach's capability to provide spatially compact, connected, and meaningful clusters was assessed using real and multivariate synthetic datasets, and it gave satisfactory results compared with other geostatistical clustering methods.

Grouping similar and separating dissimilar data is done by a technique called data clustering, but many clustering algorithms fail when dealing with multi-dimensional scan data. One paper therefore introduced efficient cuckoo-optimization-based methods for data clustering, known as COAC and the Fuzzy Cuckoo Optimization Algorithm (FCOAC) [2]. The algorithm's performance was compared and evaluated against GSA, PSO, K-means, CS, COAC, and black hole algorithms, and the overall results show that it had superior performance compared with other state-of-the-art methods. Multimodal optimization, which pursues multiple optima simultaneously, is attracting increasing attention yet remains challenging. Exploiting the advantage of Ant Colony Optimization (ACO) in preserving high diversity, another paper extended ACO algorithms to deal with multimodal optimization. The overall comparisons demonstrated the effectiveness and efficiency of the devised methodology, especially on difficult problems with high numbers of local optima [32].

Along the same lines, another clustering method, ClusterKDE, based on univariate kernel density estimation, was proposed [13]. It comprises an iterative procedure wherein a new cluster is obtained at each step by minimizing a smooth kernel function. Although the univariate Gaussian kernel was used in the reported applications, any smooth kernel function could be deployed in this scheme. An advantage of the devised methodology is that it does not require the number of clusters a priori. ClusterKDE is also easy to use, stops in a finite number of steps, and converges independently of the initial point. The outcomes show that ClusterKDE was superior and fast when compared with the clusterdata and K-means routines available in Matlab for clustering data.

Like many other studies, one work devised a unique scheme for determining the number of clusters and simultaneously categorizing the data points into clusters by utilizing subspace clustering [12]. Practical data distributed in a high-dimensional space can be separated into a combination of low-dimensional subspaces, which gives desirable results when exploited. Triplet relationships are acquired from the self-representation matrix and then deployed to assign the data points to the desired clusters iteratively. This methodology gives the cluster count automatically and then merges clusters to avoid over-segmenting the intended region. Comprehensive experimental outcomes on both artificial and practical datasets verify the methodology's robustness and effectiveness. Point clustering mainly groups N given points into K clusters, such that similarity among objects in the same group is high whereas similarity among objects in different groups is low. One voting-based formulation maximizes the sum of votes between points in the same cluster, and this technique has advantages for clusters of various sizes and densities [18]. An extension of the basic fuzzy k-means algorithm, called the robust and sparse fuzzy k-means algorithm, integrates a robust function to handle outliers, and a penalty term is introduced to ensure the sparseness of the object-cluster memberships [29].

3 Summarized works of data clustering and optimization

In this section, we summarize some papers available in the literature, as shown in Table 1 below.

Table 1 Literature level summarization of clustering and optimization for data mining

4 The overall flow of the clustering and optimization

The proposed optimization method addresses issues that exist in traditional optimization algorithms. For this purpose, the efficient Fast Clustering Density Peak method is executed to calculate the density values, and the Crowding Differential Evolution technique is followed for data optimization. In this section, the flow of the devised work is elaborated; the aim here is to cluster high-dimensional data using the proposed novel method. Clustering algorithms analyze data by discovering their underlying structure and organizing them into separate categories according to their characteristics, expressed as internal homogeneity and external separation, without prior knowledge.

The FDP algorithm, published in Science in 2014, is a method for determining categories based only on the distances between data points. It assumes that each cluster center is surrounded by data points of lower local density and lies at a large distance from other centers. Evolutionary algorithms for multimodal optimization usually not only locate multiple optima in a single run but also preserve population diversity throughout the run, yielding global optimization ability on multimodal functions. The techniques developed for multimodal optimization are also borrowed as diversity-maintenance techniques for other problems. The novelty of this paper lies in calculating the density using a kernel estimator and then using the optimization method for the clustering operation in a unique way.

5 Fast density peak clustering (FCDP)

5.1 Distance matrix

The distance matrix dij is given by:

$$ \mathrm{Distance},\ {d}_{ij}={\left[{\left|{n}_i-{n}_j\right|}^2\right]}^{\frac{1}{2}} $$
(1)

5.2 Cut-off distance

The cut-off distance dcut is given by:

$$ {d}_{\mathrm{cut}}={\sum}_{i=1}^{D}\mathrm{mean}{\left({d}_{ij}\right)}^{D} $$
(2)

5.3 Density of point

The novelty lies chiefly in this part, wherein we determine the density using the kernel estimator k after finding the center value. The density of point i, ρi, is given by:

$$ {\uprho}_{\mathrm{i}}=\frac{1}{nd}.{\sum}_{\mathrm{j}=1}^{\mathrm{n}}\mathrm{k}\left({d}_{ij}-{\mathrm{d}}_{\mathrm{cut}}\right) $$
(3)

where nd indicates the number of dimensions, which depends on the dataset utilized.
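To make Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch rather than the authors' implementation: the Gaussian kernel k, the reading of Eq. (2) as a plain mean of the pairwise distances, and the data layout X of shape (n, nd) are all assumptions made for illustration.

```python
import numpy as np

def fcdp_density(X):
    """Illustrative sketch of Eqs. (1)-(3): distance matrix, cut-off
    distance, and kernel-based local density for FCDP."""
    n, nd = X.shape
    # Eq. (1): pairwise Euclidean distance matrix d_ij
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Eq. (2): cut-off distance; the paper's exponent D is ambiguous,
    # so the mean pairwise distance is used here as one plausible reading
    d_cut = d.mean()
    # Eq. (3): rho_i = (1/nd) * sum_j k(d_ij - d_cut), with an assumed
    # Gaussian kernel k
    k = lambda u: np.exp(-u ** 2)
    rho = k(d - d_cut).sum(axis=1) / nd
    return d, d_cut, rho
```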

Based on the conditions below, the obtained points are converted to values of 0 and 1, using the two relationships for δi as appropriate.

$$ X\left(x\right)=\left\{\begin{array}{cc}1,& x<0\\ {}0,& \mathrm{otherwise}\end{array}\right. $$
(4)

where δi, the distance of point i, is defined as the distance between point i and its nearest neighbor with higher density:

$$ {\delta}_i=\underset{j\in S,\ {\rho}_j>{\rho}_i}{\min }{d}_{ij} $$
(5)

When point i has the highest density, δi is instead calculated as follows:

$$ {\delta}_i=\underset{j\in S}{\max }{d}_{ij} $$
(6)
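Continuing the same sketch, Eqs. (5) and (6) can be computed as below; this assumes the distance matrix d and densities rho from the previous sketch and is again an illustrative reading, not the authors' code.

```python
import numpy as np

def fcdp_delta(d, rho):
    """Eq. (5): delta_i is the distance to the nearest neighbour of
    higher density; Eq. (6): the highest-density point takes the
    maximum distance instead."""
    n = len(rho)
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]   # j in S with rho_j > rho_i
        if higher.size > 0:
            delta[i] = d[i, higher].min()    # Eq. (5)
        else:
            delta[i] = d[i].max()            # Eq. (6)
    return delta

# Cluster centers are then the points with both large rho and large
# delta (e.g., the top-k values of rho * delta for an assumed k).
```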

6 Fuzzy k-means clustering

FKM splits the data and makes a partition into a defined number of groups, t. The FKM methodology is based on minimizing the objective function ClusFuz:

$$ {Clus}_{Fuz}={\sum}_{i=1}^d{\sum}_{j=1}^k{u}_{ij}^m{d}_{ij}^2 $$
(7)
$$ {u}_{ij}=\frac{1}{\sum_{p=1}^k{\left(\frac{d_{ij}}{d_{ip}}\right)}^{\frac{2}{m-1}}} $$
(8)

In the above equation, m must not equal one, and dij is the same distance already defined in Eq. (1).

The exponent m in Eq. (7) is the fuzzifier parameter; it controls the fuzziness of the clustering.

Based on the above discussion, the operational steps of the FKM algorithm could be summarized as follows:

  1. Step 1:

    Select the number of clusters k and the level of fuzziness m, along with a threshold value e, and initialize the fuzzy partition matrix.

  2. Step 2:

    Estimate the cluster centers with the help of Eqs. 7 and 8.

  3. Step 3:

    Estimate the distances dij for the taken samples. Then determine the values of uij and update the fuzzy partition matrix accordingly.

  4. Step 4:

    Estimate the objective function ClusFuz using Eq. (7). Check convergence: if the difference between the two most recent values of the objective function is lower than the chosen threshold, terminate; otherwise return to Step 2. A minimal sketch of these steps is shown below.
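The following is a minimal NumPy sketch of the four steps above, assuming Euclidean distances and random initialization; the default values for the fuzzifier m, the threshold e (eps), and the iteration cap are illustrative choices, not values fixed by the paper.

```python
import numpy as np

def fuzzy_k_means(X, k, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Sketch of the FKM loop minimizing Clus_Fuz (Eqs. 7-8)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Step 1: initialize the fuzzy partition matrix U (rows sum to 1)
    U = rng.random((n, k))
    U /= U.sum(axis=1, keepdims=True)
    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 2: cluster centers as U^m-weighted means of the data
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Step 3: distances d_ij, then membership update via Eq. (8)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        d = np.fmax(d, 1e-12)                # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Step 4: objective Clus_Fuz (Eq. 7) and convergence test
        obj = ((U ** m) * d ** 2).sum()
        if abs(prev_obj - obj) < eps:
            break
        prev_obj = obj
    return centers, U
```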

7 Evolutionary multimodal optimization

An evolutionary multimodal optimization methodology is intended to determine the numerous (local or global) optima of a multimodal function. Typical problem settings include determining all global optima, determining k global optima, determining the k best optima, or determining all local and global optima.

One such methodology is the Crowding Differential Evolution (CDE) methodology, which is discussed in the next sub-section (Fig. 2).

Fig. 2

Proposed outline for the fuzzy k-means clustering and optimization of data

7.1 Crowding differential evolution

For every offspring in every generation, crowding differential evolution replaces the most similar existing individual. Although this entails additional distance computations, it effectively transforms differential evolution into an algorithm suited to multimodal optimization.

7.2 Algorithm for crowding differential evolution

The algorithm for crowding differential evolution is given below:

figure a

In each generation, for each individual ind, a trial vector is generated by TRIALVECTORGENERATION to produce an offspring. To form the trial vector, three individuals are randomly selected by the algorithm: one individual forms the base vector and the other two form a difference vector. The trial vector is formed by the sum of these two vectors, which is then recombined with the parent to form the offspring Off_sp.

With this trial vector generation, the crossover parameter no longer requires manual tuning and a separate mutation operator is not needed; the typical crossover and mutation operations are replaced. Differential evolution's adaptive ability selects an appropriate step size, and its self-organizing ability moves the population towards the optima. A sketch of the procedure appears below.
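The following is a minimal sketch of crowding DE for a real-valued minimization problem; the DE/rand/1/bin scheme and the parameter values F and CR are standard illustrative choices, not values fixed by the paper.

```python
import numpy as np

def crowding_de(f, low, high, pop_size=50, F=0.5, CR=0.9,
                generations=200, seed=0):
    """Sketch of crowding DE: each offspring replaces the most similar
    population member, but only if it has better (lower) fitness."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(low, float), np.asarray(high, float)
    dim = low.size
    pop = rng.uniform(low, high, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Trial vector: base vector r0 plus scaled difference r1 - r2
            choices = [j for j in range(pop_size) if j != i]
            r0, r1, r2 = rng.choice(choices, size=3, replace=False)
            trial = pop[r0] + F * (pop[r1] - pop[r2])
            # Binomial crossover with the parent gives offspring Off_sp
            mask = rng.random(dim) < CR
            mask[rng.integers(dim)] = True
            off = np.clip(np.where(mask, trial, pop[i]), low, high)
            # Crowding: compete with the most similar existing member
            nearest = np.linalg.norm(pop - off, axis=1).argmin()
            f_off = f(off)
            if f_off < fit[nearest]:
                pop[nearest], fit[nearest] = off, f_off
    return pop, fit
```

On a function with several minima of similar quality, the final population typically retains members near multiple optima rather than collapsing onto a single one, which is the behavior multimodal optimization requires.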

8 Comparative analysis of the proposed method and its performance

The existing methods [37] and [35] were taken for our comparative analysis to show the effectiveness of the proposed work.

8.1 Processed dataset

Artificial and real benchmark datasets, namely Aggregation, D31, Flame, Path-based, R15, Iris, Seeds, Glass, Wine, and Spiral, are used for a comprehensive performance analysis of the proposed method against the existing methods.

The proposed method is examined on various datasets [35, 36], of which half are real and half are synthetic. A detailed description of the datasets is shown in Table 2.

Table 2 Dataset Description

8.2 Performance metrics investigation

Two performance metrics, the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI), are utilized for the comparative study on the real and artificial datasets.

Hubert's statistic is defined as:

$$ Huber{t}^{\prime }s=\left( Ma-{m}_1{m}_2\right)/\sqrt{m_1{m}_2\left(M-{m}_1\right)\left(M-{m}_2\right)} $$
(9)

where M = a + b + c + d, m1 = a + b, and m2 = a + c.

In Eq. (9), the term a defines the number of pairs in which the two objects belong to the same cluster with respect to both P and P′; b defines the number of pairs in which the two objects belong to the same cluster in P but to different clusters in P′; c defines the number of pairs in which the two objects belong to different clusters in P but to the same cluster in P′; finally, d is the number of pairs in which the two objects belong to different clusters with respect to both P and P′.
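A small sketch of Eq. (9) by direct pair counting is given below; it uses the product-form denominator shown above and is illustrative rather than the authors' implementation.

```python
from itertools import combinations

def hubert_statistic(labels_p, labels_q):
    """Pair-counting Hubert statistic between two partitions P and P'."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_p)), 2):
        same_p = labels_p[i] == labels_p[j]
        same_q = labels_q[i] == labels_q[j]
        if same_p and same_q:
            a += 1        # same cluster in both P and P'
        elif same_p:
            b += 1        # same in P, different in P'
        elif same_q:
            c += 1        # different in P, same in P'
        else:
            d += 1        # different in both
    M, m1, m2 = a + b + c + d, a + b, a + c
    return (M * a - m1 * m2) / (m1 * m2 * (M - m1) * (M - m2)) ** 0.5
```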

Mutual information measures the information shared by a pair of clusterings; the normalized mutual information (NMI), used here as an external validity criterion, is defined as:

$$ NMI\left(P,{P}^{\prime}\right)=\frac{\sum_{i=1}^k{\sum}_{j=1}^{k^{\prime}}\left|{c}_i\cap {c}_j^{\prime}\right|\ \log \left(\frac{N\left|{c}_i\cap {c}_j^{\prime}\right|}{\left|{c}_i\right|\left|{c}_j^{\prime}\right|}\right)}{\sqrt{\left({\sum}_{i=1}^k\left|{c}_i\right|\log \frac{\left|{c}_i\right|}{N}\right)\left({\sum}_{j=1}^{k^{\prime}}\left|{c}_j^{\prime}\right|\log \frac{\left|{c}_j^{\prime}\right|}{N}\right)}} $$
(10)

where |C| denotes the number of objects in C.

The Adjusted Rand Index (ARI) is calculated by:

$$ ARI\left(P,{P}^{\prime}\right)=\frac{\sum_{i=1}^k{\sum}_{j=1}^{k^{\prime }}\binom{n_{ij}}{2}-\left[{\sum}_{i=1}^k\binom{a_i}{2}{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]/\binom{n}{2}}{\frac{1}{2}\left[{\sum}_{i=1}^k\binom{a_i}{2}+{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]-\left[{\sum}_{i=1}^k\binom{a_i}{2}{\sum}_{j=1}^{k^{\prime }}\binom{b_j}{2}\right]/\binom{n}{2}} $$
(11)

where ai is the number of points in the ith cluster, bj is the number of points in the jth ground-truth class, and nij is the number of points common to cluster i and class j.
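Rather than implementing Eqs. (10) and (11) from scratch, reference implementations of both indices are available in scikit-learn; the label vectors below are purely illustrative.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Ground-truth classes P and predicted clusters P' (illustrative labels)
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
pred = [0, 0, 1, 1, 1, 1, 2, 2, 2]

print("ARI:", adjusted_rand_score(truth, pred))           # Eq. (11)
print("NMI:", normalized_mutual_info_score(truth, pred))  # Eq. (10)
```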

Table 3 below shows the ARI values of the existing and proposed methods.

Table 3 Performance metrics for ARI of the existing [36] and proposed method

Fig. 3 above shows the performance-level comparison of ARI for the proposed and existing methods. The proposed optimization method yielded superior results to the other existing methods considered: ISB, NCFS, CFS, IS, and HCFS. Here NCFS refers to the density-kernel variant of density peak based clustering, CFS to clustering by fast search and find of density peaks, and HCFS to the density peak based clustering algorithm employing a hierarchical strategy.

Fig. 3

Performance level comparison for ARI

Next, the Normalized Mutual Information (NMI) values are tabulated in Table 4 below.

Table 4 Performance metrics for NMI of the existing [36] and proposed method

Fig. 4 above shows the performance-level comparison of the NMI values for the proposed and existing methods. The proposed optimization method yielded superior results to the other existing methods (ISB, NCFS, CFS, IS, and HCFS) because it clusters efficiently by finding the centers, the cut-off distance, and the density of each point.

Fig. 4

Performance level comparison for NMI

The F1 score values are tabulated in Table 5 below to validate the performance of the proposed clustering-cum-optimization method against the existing methods.

Table 5 Performance metrics for the F1 measure of the existing [10] and proposed method

Fig. 5 above shows the performance comparison based on the F1 scores of the proposed and existing methods with respect to the optimization. The proposed optimization method yielded superior results to the other existing methods considered: conventional K-means, AP, DP-C, and REDPC. Here DP-C denotes the original density peak algorithm with the cutoff kernel, AP denotes affinity propagation, and REDPC denotes record-based density peak clustering.

Fig. 5

Performance level comparison for F1 score

Then, the NMI values are tabulated in Table 6 below to validate the performance of the proposed clustering-cum-optimization method against the existing methods.

Table 6 Performance metrics for NMI of the existing [35] and proposed method

Fig. 6 above indicates the performance-level comparison based on the NMI of the proposed and existing methods with respect to the optimization. The proposed optimization method yielded superior results to the other existing methods: NegMM, WCT, WTQ, CSM, CSPA, and REDPC. The comparison covers the Link-based Cluster Ensemble (LCE) and Strehl's algorithms. The former has three variants: Weighted Connected-Triple (WCT), Weighted Triple-Quality (WTQ), and Combined Similarity Measure (CSM). The latter also has three variants: Cluster-based Similarity Partitioning Algorithm (CSPA), HyperGraph Partitioning Algorithm (HGPA), and Meta-Clustering Algorithm (MCLA).

Fig. 6

Performance level comparison for NMI

Table 7 above shows the performance metrics for Hubert's statistic of the existing and proposed methods. Here too, the proposed method was more effective than the existing methods for the most part; in some cases its performance coincided with theirs, as for the R15, Iris, and Wine datasets.

Table 7 Performance metrics for Hubert’s of the existing [35] and proposed method

Fig. 7 depicts that the proposed method shows the highest accuracy compared with the other existing methods. The second-highest accuracy, 0.9333, was yielded by the existing CVR-LMV method (clustering based on voting representation), and the lowest accuracy was yielded by the PAM method.

Fig. 7

Resulting accuracy comparison [18]

9 Conclusion

In the proposed methodology, we first clustered the data after determining the centers, cut-off distance, and density of each point using FCDP (Fast Density Peak Clustering) with the novel steps described above; fuzzy k-means clustering was then applied to form the clusters. Next, the Crowding Differential Evolution (CDE) methodology was used to optimize the data, a novel use in this paper, since CDE had previously been deployed for scheduling and other related operations. Finally, we measured the performance of the proposed work using metrics such as NMI, ARI, and Hubert's statistic. In all but a few cases, superior results were obtained; the few coinciding outcomes arose in the Hubert's statistic comparison. Future enhancements will include various additional datasets and deep learning techniques to improve the current optimization approach.