1 Introduction

Clustering has accompanied the emergence and development of human society. In the process of understanding and mastering objective things, people constantly distinguish different things and recognize the similarities between them. The research of cluster analysis is therefore not only of great theoretical significance, but also of important engineering and humanistic value. As its theory has developed, clustering has been widely used in many fields, such as speech recognition, face recognition, radar target recognition, biological information analysis [1, 2], image segmentation [3,4,5,6], edge detection [7], image compression [8], curve fitting, target detection and tracking, mobile robot positioning, traffic flow video detection [9, 10], and model identification and fuzzy rule establishment [11, 12]. Clustering is one of the earliest methods used in pattern recognition and data mining, and is applied to the study of large databases in various applications. Therefore, clustering algorithms for big data have attracted more and more attention.

In recent years, with the development of computing theory and technology, many clustering methods have been proposed. According to their implementation ideas, clustering algorithms can be divided into hierarchical, partitioned, density-based, grid-based, and model-based clustering algorithms. Goldberger et al. proposed a hierarchical clustering algorithm based on the classical Hungarian method [13]. Typical partitioned clustering methods include the k-means, k-medoids, and fuzzy c-means algorithms [14]. MacQueen proposed the k-means clustering algorithm in 1967 [15], which has become one of the most classic clustering algorithms. Yodern et al. proposed a semi-supervised k-means clustering algorithm in 2017 [16]. Gengzhang et al. proposed a DC k-means algorithm in 2018 [17]. Hiep proposed a differential privacy preserving k-modes algorithm in 2018 [18]. So far, many clustering algorithms have been put forward for different types of clustering, meeting different clustering requirements. However, many existing clustering algorithms need the number of clusters to be specified before the clustering task is performed in order to obtain the optimal partition of the target dataset. Clustering validity indices, which are built on mathematical models, are used to evaluate the effectiveness of the partition results of clustering algorithms. Through this mathematical evaluation of the clustering results, a clustering algorithm can still obtain the best clustering result even when the optimal number of clusters is not given in advance. At present, research on clustering validity can be roughly divided into the study of single clustering validity functions and the study of combined clustering validity evaluation methods. Research on single clustering validity functions focuses on the following two aspects.

  1. (1)

The fuzzy clustering validity function based on membership degree. The partition coefficient \((V_{PC} )\) defined by Bezdek is used to measure the overlap between clusters [19]. Bezdek also proposed the partition entropy \((V_{PE} )\), which measures the fuzziness of a clustering partition [20] and is similar to \(V_{PC}\). Bezdek proved that, for all probabilistic cluster partitions, the structures of \(V_{PC}\) and \(V_{PE}\) are simple and their computation cost is small, but they change monotonically with the number of clusters. The improved partition coefficient \((V_{MPC} )\) revises \(V_{PC}\) to remove its monotone decreasing trend, but its other defects are not improved [21]. In 2004, Chen and Linkens proposed a validity index in subtraction form \((V_{P} )\), an effective function that only considers membership degrees [22]. In 2013, Jiashun proposed a clustering validity function \((V_{CS} )\) that can effectively suppress noisy data [23]. Joopudi used the maximum and minimum membership degrees to measure data overlap and proposed the clustering validity function \((V_{GD} )\) [24].

  2. (2)

The fuzzy clustering validity function based on geometric structure. Xie and Beni proposed a ratio-based clustering validity function \((V_{XB} )\) in 1991 [25], the first clustering validity function that takes the structure of the dataset into account: it is the ratio of the compactness within clusters to the separation between clusters. \(V_{K}\) is a validity index proposed by Kwon; by adding a penalty term to the numerator of the index, it effectively restrains the monotonically decreasing trend of \(V_{XB}\). \(V_{PCAES}\) is a clustering validity index proposed by Wu and Yang in 2005. It describes the compactness and separation of clustering through the fuzzy membership function and the relative value of the center distance in an exponential structure [26]. Chi-Hung Wu proposed the clustering validity function \((V_{WL} )\) in 2015 [27], which considers all clusters and the overall compactness-separation ratio of each cluster. Chi Yun proposed the validity function \((V_{FM} )\) in 2007 [28], which takes the partition entropy and fuzzy partition factor into account and defines the compactness and separation of clustering, but its performance on noisy datasets is poor. Zhu proposed a new clustering validity function \((V_{ZLF} )\) in 2019 [29], which can partition high-dimensional datasets accurately. In 2021, Wang took the definitions of compactness, separation, and overlap as reference, introduced a new concept to enhance the adaptability of the validity function, and thus proposed the clustering validity function \((V_{HY} )\) [30]. Wang also proposed a new clustering validity function \((V_{WG} )\) in 2021, which can find the best cluster number for noisy, overlapping, and high-dimensional datasets [31].

The final clustering results are directly affected by the performance of the clustering validity function. Aiming at the shortcomings of existing fuzzy clustering validity functions, this paper proposes new fuzzy c-means clustering validity functions based on multiple clustering performance evaluation components, combining a subjective weighting method with the standard deviation method. Two combination weighting methods, in exponential and logarithmic form, are proposed, and five FCM clustering performance evaluation components are then permuted and combined in weighted form. The resulting clustering validity functions based on this combination weighting strategy are tested on UCI datasets. The experimental results show that two of the validity functions obtain the correct clustering results on all adopted UCI datasets, overcoming the defects of other clustering validity functions. This offers a new direction for the fuzzy clustering validity problem and expands the theoretical system of constructing clustering validity functions from components.

2 FCM Clustering Algorithm and Combined Clustering Validity Evaluation Method

2.1 FCM Clustering Algorithm

Fuzzy c-means (FCM) is a common soft clustering algorithm and the most representative fuzzy clustering algorithm; it is widely used in pattern recognition and cluster analysis. Let the target dataset \(X = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\}\) be composed of \(n\) samples, where each sample is \(x_{j} = [x_{1j} ,x_{2j} , \ldots ,x_{sj} ]^{T}\) and \(x_{kj}\) is the \(k\)-th attribute value of \(x_{j}\). For a given sample set \(X\), cluster analysis partitions \(X\) into \(c\) clusters by iteratively minimizing the objective function defined in Eq. (1).

$$J_{FCM} (U,V) = \sum\limits_{i = 1}^{c} {\sum\limits_{j = 1}^{n} {(u_{ij} )^{m} } } \left\| {x_{j} - v_{i} } \right\|^{2}$$
(1)

where \(J_{FCM} (U,V)\) represents the square-error clustering criterion, whose minimum is called the stationary point of least square error, and \(V = \left\{ {v_{1} ,v_{2} , \ldots ,v_{c} } \right\}\) represents the set of clustering centers, defined in Eq. (2).

$$v_{i} = \frac{{\sum\nolimits_{j = 1}^{n} {u_{ij}^{m} \cdot x_{j} } }}{{\sum\nolimits_{j = 1}^{n} {u_{ij}^{m} } }}$$
(2)

where \(c\) represents the number of clusters; \(m \in (1,\infty )\) is the fuzzy coefficient controlling the fuzziness of the membership degrees; \(v_{i}\) is the \(i\)-th clustering center; \(\left\| {x_{j} - v_{i} } \right\|\) represents the distance between the object \(x_{j}\) and the cluster center \(v_{i}\), which is usually the Euclidean distance; \(u_{ij} (0 \le u_{ij} \le 1)\) represents the membership degree of the data object \(x_{j}\) to the cluster center \(v_{i}\); and \(u_{ij} \in U\), where \(U\) is the fuzzy partition membership matrix and meets the following conditions.

$$u_{ij} = \left[ {\sum\nolimits_{k = 1}^{c} {\left( {\frac{{\left\| {x_{j} - v_{i} } \right\|}}{{\left\| {x_{j} - v_{k} } \right\|}}} \right)^{2/(m - 1)} } } \right]^{ - 1} ,\quad \sum\nolimits_{i = 1}^{c} {u_{ij} } = 1,\quad 0 \le \sum\nolimits_{j = 1}^{n} {u_{ij} } \le n$$
(3)

where \(1 \le j \le n\) and \(1 \le i \le c\).

The FCM clustering algorithm process is described as follows:

Step 1: Set the clustering parameter \(c\), fuzzy factor \(m\), and convergence threshold \(\varepsilon\).

Step 2: Initialize the clustering center matrix \(V\) and membership matrix \(U\), and obtain \(U_{0}\) and \(V_{0}\).

Step 3: Update the fuzzy partition matrix \(U={({u}_{ij})}_{c\times n}\) according to Eq. (3).

Step 4: Update the clustering center \(V = \left\{ {v_{1} ,v_{2} , \ldots ,v_{c} } \right\}\) according to Eq. (2).

Step 5: Calculate \(e = \left\| {U_{t + 1} - U_{t} } \right\|\). If \(e \le \varepsilon\) (\(\varepsilon\) is a threshold from 0.001 to 0.01), the algorithm stops and the final clustering result is output. Otherwise, let \(U_{t} = U_{t + 1}\) and repeat from Step 3.
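As a concrete illustration, the following is a minimal NumPy sketch of Steps 1-5; the function name, initialization scheme, and default parameters are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Minimal FCM sketch: X is an (n, s) data matrix, c the cluster
    number, m the fuzzifier, eps the convergence threshold (Step 5)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Steps 1-2: random initial membership matrix U (c x n), columns sum to 1
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step 4 / Eq. (2): centers as membership-weighted sample means
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3 / Eq. (3): squared distances ||x_j - v_i||^2, shape (c, n)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)            # guard against zero distance
        inv = d2 ** (-1.0 / (m - 1.0))     # (1/d^2)^(1/(m-1))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Step 5: stop when successive membership matrices are close
        converged = np.linalg.norm(U_new - U) <= eps
        U = U_new
        if converged:
            break
    return U, V
```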

2.2 Combined Clustering Validity Evaluation Method

The clustering validity problem mainly lies in how to select a clustering validity function to determine the optimal number of clusters in a dataset. Clustering validity functions can be roughly divided into external, internal, and relative validity functions. Both internal and relative fuzzy clustering validity functions are now well developed, and their theory is increasingly complete. At present, the fusion of clustering validity functions mostly uses weighted combination. Typical weighted combination clustering validity evaluation methods are listed in Table 1.

Table 1 Weighted combination clustering validity evaluation methods

3 Exponent and Logarithm Component-wise Construction Method of FCM Clustering Validity Function

3.1 Clustering Validity Evaluation Components

Based on the characteristics of the FCM clustering algorithm and typical clustering validity functions, five clustering validity evaluation components (CP) are defined in this paper. These components represent the compactness, similarity, and variability within clusters, and the degree of separation and overlap between clusters, as shown in Table 2.

Table 2 Clustering validity evaluation components

3.2 Exponent and Logarithm Component-Wise Construction Method

In order to explore the combination weighting construction method for clustering validity evaluation, the above five components are normalized and standardized to bring them into the same dimensional range. These components are then permuted and combined in combination-weighted form to construct new exponential and logarithmic clustering validity functions. Their construction rules are shown in Eqs. (4) and (5).

$$\min V = \sum\limits_{i = 1}^{m} {w_{hybrid} e^{{CP_{i} }} }$$
(4)
$$\min S = \sum\limits_{j = 1}^{n} {w_{hybrid} \ln (1 + CP_{j} )}$$
(5)

where \(m\) and \(n\) are the numbers of selected components and \(i\) and \(j\) index the components, all positive integers from 1 to 5.

This paper constructs clustering validity functions in exponential and logarithmic form. The exponential function \(y = e^{x}\) and the logarithmic function \(y = \ln x\) are both monotonically increasing. On the interval \((0, + \infty )\), the value of the exponential function is always greater than 0, while the logarithmic function takes negative values on part of this interval, and the constructed function needs to remain positive. This problem is solved by using the logarithmic form \(y = \ln (1 + x)\). \(V\) and \(S\) are then applied in the FCM clustering algorithm, and the algorithm flow for obtaining the optimal number of clusters is shown in Fig. 1.

Fig. 1 Flowchart of FCM clustering algorithm based on the proposed validity function

The design method draws on the linear superposition idea of previous combined clustering validity methods and the characteristics of exponential and logarithmic functions. However, the selected components are effective under different extreme conditions: some are best when maximal, others when minimal. Therefore, a component whose maximum value is effective enters Eq. (4) as (\(1 - CP_{i}\)). In Eq. (5), each component is handled as (\(1 + CP_{j}\)), giving \(CP_{j} \in \{ (1 + CP_{1} ),(1 + CP_{2} ),(1 + CP_{3} ),(1 + CP_{4} ),(1 + CP_{5} )\}\). In this way, the format of each validity function is unified. In addition, although different components carry different amounts of information, each component corresponds to one weight, independent of its position in the clustering validity function, so \(w_{hybrid}\) based on the five components can be expressed as \(w_{hybrid} = \{ w_{1} ,w_{2} ,w_{3} ,w_{4} ,w_{5} \}\). The clustering validity functions are thus constructed as \(V\) and \(S\). Using a similar structure for both, with the minimum value effective in all cases, makes the simulation and comparison tests below more convenient.
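As an illustration of these construction rules, the sketch below assembles the exponential function of Eq. (4) and the logarithmic function of Eq. (5) from a vector of normalized component values and hybrid weights. The component values of Table 2 are assumed to be computed elsewhere, and all names are illustrative.

```python
import numpy as np

def exponential_validity(cp, w_hybrid, max_effective=None):
    """Eq. (4): V = sum_i w_i * exp(CP_i) over the selected components.

    cp            : normalized component values CP_i in [0, 1]
    w_hybrid      : one hybrid weight per component (Eq. (6))
    max_effective : indices of components that are best when maximal;
                    they enter as (1 - CP_i) so that min V is optimal.
    """
    cp = np.asarray(cp, dtype=float).copy()
    if max_effective is not None:
        cp[list(max_effective)] = 1.0 - cp[list(max_effective)]
    return float(np.sum(np.asarray(w_hybrid) * np.exp(cp)))

def logarithmic_validity(cp, w_hybrid, max_effective=None):
    """Eq. (5): S = sum_j w_j * ln(1 + CP_j); ln(1+x) keeps values positive."""
    cp = np.asarray(cp, dtype=float).copy()
    if max_effective is not None:
        cp[list(max_effective)] = 1.0 - cp[list(max_effective)]
    return float(np.sum(np.asarray(w_hybrid) * np.log1p(cp)))
```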

3.3 Subjective and Objective Weighting Strategy

The selection of weight is very important in the combination evaluation process of clustering validity function. Different weight will bring different influence. The weight types can be mainly divided into the following two types.

  1. (1)

Subjective weighting method. This determines attribute weights according to the subjective judgment of decision makers (experts). Common methods include the Delphi method, analytic hierarchy process (AHP), fuzzy analysis method, linked ratio method, correlation tree method, set-valued iteration method, and eigenvalue method. In the subjective weighting method, experts determine the weight of each attribute according to the actual decision-making problem and their own knowledge and experience. The resulting weights generally accord with reality, so they are highly interpretable; however, the decision and evaluation results carry strong subjectivity and randomness, so their objectivity is weak, which greatly limits their application.

  2. (2)

Objective weighting method. This is mainly based on the degree of connection between indicators, the amount of information provided by each index, and the impact on other indicators. The obtained weights are therefore observable and do not increase the burden on decision makers, and the method has a strong mathematical foundation. Common objective weighting methods include principal component analysis, multi-objective programming, the entropy weight method, the CRITIC method, and the standard deviation method.

In this paper, a hybrid weighting method combining subjective and objective weights is used to eliminate subjective deviation and objective one-sidedness, incorporating both subjective and objective information when determining the weights so that they truly and objectively reflect the actual behavior of the single clustering validity function. The hybrid weighting method is defined as follows:

$$w_{hybrid} = \delta w_{object} + (1 - \delta )w_{subject}$$
(6)

where \(w_{object}\) is the objective weight, \(w_{subject}\) is the subjective weight, \(w_{hybrid}\) is the hybrid weight, and \(\delta\) is the adjustment coefficient (\(\delta \in [0,1]\)).

When \(\delta = 0\), \(w_{hybrid} = w_{subject}\) and the hybrid weight reduces to the pure subjective weight. When \(\delta = 1\), \(w_{hybrid} = w_{object}\) and the hybrid weight reduces to the pure objective weight. When \(\delta = 0.5\), the objective weight \(w_{object}\) and the subjective weight \(w_{subject}\) have the same influence on \(w_{hybrid}\). \(\delta\) may be adjusted according to the importance of the index and the characteristics of the dataset so as to improve the classification accuracy. Unless otherwise stated, the subjective weight is set by the decision maker as \(w_{subject} = 1/n\), where \(n\) is the number of weighted components, and the objective weight \(w_{object}\) is determined by the standard deviation method described below.
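A direct rendering of Eq. (6) under the variable roles stated above (the function name is illustrative):

```python
import numpy as np

def hybrid_weight(w_object, w_subject, delta=0.5):
    """Eq. (6): blend objective and subjective weights, delta in [0, 1].

    delta = 0 keeps only the subjective weight, delta = 1 only the
    objective weight; delta = 0.5 gives both the same influence.
    """
    assert 0.0 <= delta <= 1.0
    return delta * np.asarray(w_object) + (1.0 - delta) * np.asarray(w_subject)
```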

The standard deviation method is used to determine the objective index weights: the standard deviation of each index is calculated first, and the weight is then determined from it. In multi-index comprehensive evaluation, because the dimensions or magnitudes of the indices may differ, the indicators are not directly comparable, so the first step is to process the data to make them comparable. There are many ways to do this, such as normalization, standardization, and other dimensionless methods.

For convenience, let the index set be \(G = \{ G_{1} ,G_{2} , \ldots ,G_{m} \}\), the sample set (or scheme set) be \(A = \{ A_{1} ,A_{2} , \ldots ,A_{n} \}\), and the corresponding sample points be \(X_{ij}\) (\(i = 1,2, \ldots ,n\) and \(j = 1,2, \ldots ,m\)). The weight vector of the evaluation indices is \(W = (w_{1} ,w_{2} , \ldots ,w_{m} )^{T}\), which satisfies \(\sum {w_{j} = 1}\). After dimensionless or standardized treatment, the matrix \(X = (X_{ij} )\) becomes the matrix \(Z = (Z_{ij} )\). The standard deviation determines the weight of each index, on the principle that if the standard deviation of an index is smaller, its variation is smaller, it provides less information, and it plays a smaller role in the comprehensive evaluation, so its weight is smaller; conversely, a larger standard deviation yields a larger weight. The characteristic of the standard deviation weighting method is that the index weight reflects the amount of information, or variation, in the index data.

The specific calculation steps of subjective and objective weighting are described as follows.

Step 1: The original data matrix \(X_{ij} (i = 1,2, \ldots ,n;j = 1,2, \ldots ,m)\) is made dimensionless, generally by the extreme (min-max) method. For positive indices, \(Z_{ij} = (X_{ij} - \min_{i} \{ X_{ij} \} )/(\max_{i} \{ X_{ij} \} - \min_{i} \{ X_{ij} \} )\); for inverse indices, \(Z_{ij} = (\max_{i} \{ X_{ij} \} - X_{ij} )/(\max_{i} \{ X_{ij} \} - \min_{i} \{ X_{ij} \} )\). This yields the matrix \(Z = (Z_{ij} )\).

Step 2: Calculate the mean value of random variables by \(\overline{{Z_{j} }} = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {Z_{ij} }\);

Step 3: Calculate the standard deviation of index \(j\) by \(\sigma_{j} = \sqrt {\sum\nolimits_{i = 1}^{n} {(Z_{ij} - \overline{{Z_{j} }} )^{2} } }\) (the constant \(1/n\) factor is omitted because it cancels in Step 4);

Step 4: Calculate the weight of index \(j\) by \(w_{j} = \sigma_{j} /\sum\nolimits_{j = 1}^{m} {\sigma_{j} }\);

Step 5: Calculate the comprehensive evaluation value according to the multi-index weighted evaluation model \(D_{i} = \sum\nolimits_{j = 1}^{m} {w_{j} } \cdot Z_{ij} \;(i = 1,2, \ldots ,n)\), where \(w_{j}\) is the weight of the \(j\)-th index.
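The five steps can be condensed into the following sketch; positive indices are assumed for simplicity, and all names are illustrative.

```python
import numpy as np

def std_dev_weights(X):
    """Standard deviation weighting (Steps 1-4): min-max normalize each
    index (column), then weight each index by the standard deviation of
    its normalized values, normalized so the weights sum to one."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    Z = (X - X.min(axis=0)) / np.where(span == 0, 1.0, span)  # Step 1
    sigma = Z.std(axis=0)          # Steps 2-3 (1/n factor cancels below)
    return Z, sigma / sigma.sum()  # Step 4

def comprehensive_scores(X):
    """Step 5: weighted comprehensive evaluation value D_i per sample."""
    Z, w = std_dev_weights(X)
    return Z @ w
```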

4 Simulation Experiment and Result Analysis

According to the construction rules in Eqs. (4) and (5), 52 different clustering validity functions can be formed by permuting and combining the five components in exponential and logarithmic form. To facilitate the experiments, these 52 validity functions were divided into six groups (nine in four of the groups and eight in the other two), listed in Tables 3, 4, 5, 6, 7, and 8, which give the names and simplified forms of the 52 validity functions. According to prior knowledge, the fuzzy index can be set in the range \(1.5 \le m \le 2.5\) and the number of clusters in \(2 \le c \le \sqrt n\); this paper chooses \(m = 2\) and \(2 \le c \le 14\). Whether each clustering validity function classifies the data accurately is then judged on different datasets. We select UCI datasets for the simulation experiments: Iris, Seeds, Balance, Hfcr, Glass, and Cooking. The sample sizes, categories, and attributes of the adopted UCI datasets are listed in Table 9. To better observe the changing trend of the six groups of clustering validity functions on the UCI datasets, their values are plotted in a normalized coordinate system, as shown in Figs. 2, 3, 4, 5, 6, and 7, so that the clustering effect of each validity function can be compared more intuitively. Finally, the optimal cluster number obtained by each validity function on each UCI dataset is listed in Tables 10, 11, 12, 13, 14, and 15.
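As a sketch of this experimental protocol, the driver below scans \(c\) from 2 to 14 with \(m = 2\), runs FCM for each candidate, and keeps the cluster number that minimizes a given validity function. It assumes the `fcm` sketch from Sect. 2.1 and a `validity_fn` with the illustrative signature shown.

```python
def optimal_cluster_number(X, validity_fn, c_min=2, c_max=14, m=2.0):
    """Run FCM for each candidate c and return the c minimizing the
    validity function, together with all recorded scores."""
    scores = {}
    for c in range(c_min, c_max + 1):
        U, V = fcm(X, c, m=m)           # FCM sketch from Sect. 2.1
        scores[c] = validity_fn(U, V, X)
    best_c = min(scores, key=scores.get)
    return best_c, scores
```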

Table 3 Exponent clustering validity functions (Group 1)
Table 4 Exponent clustering validity functions (Group 2)
Table 5 Exponent clustering validity functions (Group 3)
Table 6 Logarithmic clustering validity functions (Group 1)
Table 7 Logarithmic clustering validity functions (Group 2)
Table 8 Logarithmic clustering validity functions (Group 3)
Table 9 UCI data sets (component experiments)
Table 10 The best cluster number of the exponential validity functions for UCI datasets (Group 1)
Table 11 The best cluster number of the exponential validity functions for UCI datasets (Group 2)
Table 12 The best cluster number of the exponential validity functions for UCI datasets (Group 3)
Table 13 The best cluster number of the logarithmic validity functions for UCI datasets (Group 1)
Table 14 The best cluster number of the logarithmic validity functions for UCI datasets (Group 2)
Table 15 The best cluster number of the logarithmic validity functions for UCI datasets (Group 3)

5 Simulation Results and Analysis on Exponent Validity Functions (Group 1)

As can be seen from Fig. 2a-f, six clustering validity functions in the first group, \(V_{1}\), \(V_{5}\), \(V_{6}\), \(V_{7}\), \(V_{8}\), and \(V_{9}\), cannot classify any of the UCI datasets, while every adopted UCI dataset can be classified correctly by \(V_{2}\). It can be seen from Fig. 2a-c that the Iris, Seeds, and Balance datasets are correctly divided into three categories by \(V_{3},\) while the other datasets are classified incorrectly by it. Finally, as seen from Fig. 2b and d, \(V_{4}\) obtains the optimal number of clusters only for the Seeds and Hfcr datasets. In summary, in the comparison experiment on the first group of exponential clustering validity functions, three UCI datasets are distinguished correctly by \(V_{3}\); only two datasets are correctly divided by \(V_{4}\); and \(V_{1}\), \(V_{5}\), \(V_{6}\), \(V_{7}\), \(V_{8}\), and \(V_{9}\) are unable to cluster any of the UCI datasets accurately. The validity function \(V_{2}\) has the best clustering performance, correctly dividing all of the UCI datasets.

Fig. 2 Variation trend of normalized exponential clustering validity functions (Group 1)

6 Simulation Results and Analysis on Exponent Validity Functions (Group 2)

From the experimental results in Fig. 3a-d, it can be concluded that none of the second group of clustering validity functions can correctly classify the Glass and Cooking datasets. It can be seen from Fig. 3a and b that the samples in the Iris and Seeds datasets are accurately divided into three categories by \(V_{11}\), \(V_{12}\), and \(V_{14}\). It can be found from Fig. 3b and d that the Seeds and Hfcr datasets are effectively classified by \(V_{13}\). As shown in Fig. 3c, Balance is accurately divided into three categories by \(V_{12}.\) Through the comparative experiments on the second group of exponential validity functions, it can be found that \(V_{10}\), \(V_{15}\), \(V_{16}\), \(V_{17}\), and \(V_{18}\) are unable to distinguish any of the selected UCI datasets; two UCI datasets are distinguished successfully by each of \(V_{11}\), \(V_{13}\), and \(V_{14}\); and the best cluster number is obtained on three UCI datasets by \(V_{12}\).

Fig. 3 Variation trend of normalized exponential clustering validity functions (Group 2)

7 Simulation Results and Analysis on Exponent Validity Functions (Group 3)

From Fig. 4d-f, it can be found that none of the eight validity functions in this group can classify the Hfcr, Glass, and Cooking datasets. As can be seen from Fig. 4c, all of them except \(V_{21}\) classify the Balance dataset into three categories with excellent performance. Finally, according to Fig. 4a and b, the Iris and Seeds datasets are correctly divided into three categories by \(V_{21},\) while the other seven validity functions cannot effectively classify these two datasets. Based on these comparative results for the third group of exponential validity functions, we conclude that \(V_{21}\) can successfully separate two datasets, while the other seven validity functions cannot classify any dataset except Balance. The classification performance of these eight clustering validity functions is relatively poor, so they are excluded from the selection range.

Fig. 4 Variation trend of normalized exponential clustering validity functions (Group 3)

8 Simulation Results and Analysis on Logarithm Validity Functions (Group 1)

According to Fig. 5a-f, all six UCI datasets are classified successfully by \(S_{2}\). As seen from Fig. 5c, Balance is divided into three categories by \(S_{5},\) whose optimal cluster number for Balance is c = 3. However, the remaining logarithmic validity functions in the first group cannot distinguish any of the selected UCI datasets correctly. The experimental results are similar to those of the first exponential group. From these contrast experiments on the first group of logarithmic validity functions, it can be concluded that \(S_{5}\) obtains the correct cluster number only for the Balance dataset, while the clustering performance of \(S_{2}\) is very good: it accurately and effectively classifies all adopted UCI datasets, so it is included in our selection range.

Fig. 5 Variation trend of normalized logarithmic clustering validity functions (Group 1)

9 Simulation Results and Analysis on Logarithm Validity Functions (Group 2)

From Fig. 6a-f, it can be observed that the clustering and classification performance of the nine logarithmic validity functions in this group on the UCI datasets is very poor; none of them can classify any dataset successfully. These nine logarithmic validity functions therefore cannot be used to classify the UCI datasets, and other logarithmic validity functions composed of the components must be selected and compared.

Fig. 6 Variation trend of normalized logarithmic clustering validity functions (Group 2)

10 Simulation Results and Analysis on Logarithm Validity Functions (Group 3)

It can be observed from Fig. 7a-f that the simulation results of this group are similar to those of the second group of logarithmic clustering functions: these eight logarithmic validity functions also fail to distinguish the UCI datasets effectively. As can be seen from Fig. 7e, the Glass dataset is divided into only three categories by \(S_{22}\), which is still not the optimal number of clusters. The other validity functions in this group divide each dataset into only two categories, and none of them classify successfully. Therefore, the performance of the eight logarithmic validity functions in this group is still very poor, and they are not in the selected range.

Fig. 7 Variation trend of normalized logarithmic clustering validity functions (Group 3)

For the validity functions \(V\) and \(S\), the optimal cluster number is the value of \(c\) at which the minimum is attained. As shown in Figs. 2, 3, 4, 5, 6, and 7, the 52 validity functions constructed in this paper cluster the selected datasets into various numbers of categories, which is the quantitative result of the comparison. In practice, given a dataset and the cluster number range \(2 \le c \le \sqrt n\), the two optimal validity functions \(V_{2}\) and \(S_{2}\), selected from the 52 constructed functions, are used to determine the optimal cluster number.

11 Simulation Comparison with Single Clustering Validity Functions and Combined Clustering Validity Methods

11.1 Simulation Comparison with Single Clustering Validity Functions

Through the simulation comparison of the above six groups of clustering validity functions, only the exponential validity function \(V_{2}\) and the logarithmic validity function \(S_{2}\) can successfully divide all six UCI datasets. In order to fully reflect the clustering performance of \(V_{2}\) and \(S_{2}\), this paper selects eight common clustering validity functions (\(V_{MPC}\), \(V_{XB}\), \(V_{PCAES}\), \(V_{WL}\), \(V_{FM}\), \(V_{ZLF}\), \(V_{HY}\), and \(V_{WG}\)) and eight commonly used UCI datasets for comparative experiments with \(V_{2}\) and \(S_{2}\). The function descriptions and optimal cluster numbers of the eight typical clustering validity functions are listed in Table 16, where \(\overline{v} = \sum\nolimits_{i = 1}^{c} {v_{i} } /c\) represents the mean value of the cluster centers, the geometric meaning of \(u_{mj} = \mathop {\min }\nolimits_{1 \le i \le c} \sum\nolimits_{j = 1}^{n} {(u_{ij} )^{2} }\) corresponds to the component \(CP_{4}\) shown in Sect. 3.1, and \(median\left\| {v_{i} - v_{k} } \right\|^{2}\) represents the median squared distance between two cluster centers.

Table 16 Typical clustering validity functions

Table 17 lists the number of samples, attributes, and categories of the UCI datasets selected for this experiment; the Vehicle and Led7 datasets are added in the comparison to improve its completeness. The 10 normalized fuzzy validity functions (\(V_{MPC}\), \(V_{XB}\), \(V_{PCAES}\), \(V_{WL}\), \(V_{FM}\), \(V_{ZLF}\), \(V_{HY}\), \(V_{WG}\), \(V_{2}\), and \(S_{2}\)) are plotted in the same coordinate system, and the simulation results are shown in Fig. 8a-h. Finally, the optimal cluster number obtained by each validity function on each UCI dataset is listed in Table 18.

Table 17 UCI datasets (comparative experiments)
Table 18 The best cluster number of the different validity functions for UCI datasets

As can be seen from Fig. 8a and b, the Iris and Seeds datasets are successfully divided into three categories by four validity functions (\(V_{HY}\), \(V_{WG}\), \(V_{2}\), and \(S_{2}\)), while the other six validity functions cannot obtain the correct cluster number. This shows that, when dealing with the complex structures of the Iris and Seeds datasets, \(V_{2}\) and \(S_{2}\) perform better than the other typical clustering validity functions. As shown in Fig. 8e-h, only \(V_{2}\) and \(S_{2}\) can accurately distinguish the Glass, Cooking, Vehicle, and Led7 datasets. It can be observed from Fig. 8c and d that, apart from \(V_{2}\) and \(S_{2}\), some validity functions distinguish individual UCI datasets: \(V_{WL}\) obtains the correct number of clusters on the Balance dataset, and \(V_{ZLF}\) correctly identifies the optimal cluster number of the Hfcr dataset, which is four. As shown in Fig. 8a-h, only \(V_{2}\) and \(S_{2}\) find the best cluster number for all of the UCI datasets, which indicates that they can find the correct classification number even for datasets with overlapping samples, noisy data, and higher dimensions. The clustering results of \(V_{2}\) and \(S_{2}\) are better than those of the other clustering validity functions.

Fig. 8 Variation trend of normalized clustering validity functions

This paper constructs 52 validity functions. Evaluating each of them has the same computational complexity as evaluating a single validity function, so the total time of the simulation experiments grows linearly with the number of constructed validity functions. The eight selected common clustering validity functions (\(V_{MPC}\), \(V_{XB}\), \(V_{PCAES}\), \(V_{WL}\), \(V_{FM}\), \(V_{ZLF}\), \(V_{HY}\), and \(V_{WG}\)) were designed through subjective experience based on the basic concepts of the FCM clustering algorithm and validity evaluation criteria. The improved partition coefficient \(V_{MPC}\) corrects the monotonic decreasing problem of \(V_{PC}\), but still lacks a direct connection to the geometry of the dataset. \(V_{XB}\) is a clustering validity function based on the structure of the dataset, but its calculation often ignores noisy data. \(V_{PCAES}\) is an exponential-operation function proposed by Wu and Yang. \(V_{WL}\) has the best classification effect among these eight validity functions. \(V_{FM}\) performs poorly on noisy datasets. In contrast, the exponential validity function \(V_{2}\) and the logarithmic validity function \(S_{2}\) used for comparison are the two best clustering validity functions obtained by objective combination followed by performance comparison. These two validity functions proposed in this paper not only avoid the strong subjective randomness of designing validity functions from subjective experience, but also greatly reduce the limitations in real applications.

11.2 Simulation Comparison with Combined Clustering Validity Methods

In order to better highlight the advantages of the weighting method and validity functions proposed in this paper over traditional methods, the four combined clustering validity methods introduced in Sect. 2.2 (DWSVF, FWSVF, WSCVI, and HWCVF) and eight UCI datasets are selected for simulation experiments against \(V_{2}\) and \(S_{2}\). The six clustering validity evaluation methods (DWSVF, FWSVF, WSCVI, HWCVF, \(V_{2}\), and \(S_{2}\)) are plotted in a normalized coordinate system, and the experimental results are shown in Fig. 9a-h. Finally, Table 19 lists the optimal cluster number obtained by each clustering validity evaluation method on each UCI dataset.

Fig. 9 Variation trend of normalized clustering combination evaluation methods

Table 19 The best cluster number of different clustering combination evaluation methods for UCI datasets

As can be seen from Fig. 9a-c, the Iris, Seeds, and Balance datasets are accurately classified into three subsets by DWSVF, WSCVI, \(V_{2}\), and \(S_{2}\). From Fig. 9d-h, it can be found that when processing the Hfcr, Glass, Cooking, Vehicle, and Led7 datasets, only \(V_{2}\) and \(S_{2}\) obtain the best cluster number; none of the other clustering combination evaluation methods can accurately divide these five datasets. Obviously, \(V_{2}\) and \(S_{2}\) are much better than the other classical clustering combination evaluation methods. From the experimental results in Fig. 9a-h, we conclude that, on the above eight commonly used UCI datasets, the classification performance of \(V_{2}\) and \(S_{2}\) is also better.

12 Conclusion

In this paper, a new combination weighting method was defined from a subjective weighting method and the standard deviation weighting method, and a component-based construction method for FCM clustering validity functions built on five clustering performance evaluation components was proposed. Using the UCI datasets, the best clustering validity functions were selected through simulation comparison. Finally, eight commonly used single clustering validity functions and four typical combined clustering validity evaluation methods were simulated and verified on eight UCI datasets. The simulation results show that the validity functions obtained by the proposed construction method perform better on datasets with complex structure, noise, and overlapping data than other single or combined clustering validity functions. The screening and comparison across many experiments show that the proposed construction method is more objective: with a strong scientific theoretical basis, it reduces the one-sidedness of validity functions proposed from subjective intention and deepens the research on clustering validity functions. However, the construction method also has its own limitations, so some of the constructed superior validity functions will be selected for integration, comprehensively using multiple validity functions for evaluation. The component integration of clustering validity functions is therefore left for further study.