1 Introduction

As one of the most fundamental topics in both data mining and machine learning, clustering has been widely used in data analysis [8, 23]. The aim of clustering is to partition a set of given multivariate samples into several meaningful groups such that all members within a group are similar to each other, while samples from different groups exhibit dissimilar characteristics. Research on clustering algorithms has received much attention, and a number of clustering methods have been developed over the past decades. Comprehensive reviews of clustering algorithms can be found in [7, 40].

Roughly speaking, most existing clustering methods can be classified into two categories: hierarchical clustering and partitive clustering [9]. In this paper, we focus on the latter, which is categorized as a prototype-based model, i.e., each cluster can be represented by a prototype, leading to a concise description of the original data set. According to whether there is a crisp boundary between clusters, the various partitive clustering algorithms can be divided into hard clustering and soft clustering. Hard clustering methods are based on the assumption that a cluster is represented by a set with a crisp boundary. One of the most widely used hard clustering methods is the k-means method [21], where each object must be assigned to exactly one cluster. The requirement of a sharp boundary leads to easy analytical results, but may not adequately reflect the fact that a cluster may not have a well-defined boundary. In order to relax this requirement, many soft clustering methods based on k-means were proposed for different applications. Incorporating fuzzy sets into k-means clustering, Bezdek [2] proposed fuzzy k-means (FKM), which assumes that a cluster is represented by a fuzzy set modeling a gradually changing boundary. It is often used to reveal the structure of a data set and to construct information granules. Based on rough set theory [26,27,28], Lingras and West [18] introduced rough k-means (RKM) clustering, which describes each cluster not only by a center, but also by a pair of lower and upper bounds; the lower and upper approximations receive different weights when computing the new centers. Involving membership degrees, Mitra et al. [24] put forward a rough-fuzzy k-means (RFKM) clustering method, which incorporates membership into the RKM framework. There, the lower and upper bounds are determined according to the membership degrees rather than the individual absolute distances between an object and its neighbors.
As a conceptual and algorithmic bridge between rough sets and fuzzy sets, the shadowed set [29] provides an alternative mechanism for handling uncertainty and has been successfully used in clustering analysis, resulting in shadowed k-means (SKM) [25].

Three-way decision, as a new field of study for complex problem solving, was proposed by Yao [43,44,45,46]. It extends the commonly used binary-decision models by adding a deferred decision. The main idea of three-way decision is to divide a universe into three disjoint regions and adopt different strategies for the different regions. The deferred decision is viewed as the third decision-making behavior, taken when the information is not sufficient to determine the state of an object, that is, whether the object should be accepted or rejected. Many soft computing models, such as interval sets, rough sets, fuzzy sets and shadowed sets, have tri-partitioning properties and can be reinvestigated within the framework of three-way decision [46]. It is obvious that there may exist at least three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, Yu et al. [49, 50, 52] introduced three-way clustering based on three-way decision theory. In three-way clustering, a cluster is represented by a pair of nested sets called the core and the support of the cluster, respectively. The core, the difference between the support and the core (i.e., the fringe region), and the complement of the support give rise to a trisection of the space. Such a trisection captures the three types of relationships between a cluster and an object.

As one of the classical hard clustering algorithms, k-means represents a cluster by a single set. For each cluster, the set naturally divides the space into two regions: objects belong to the cluster if they are in the set, otherwise they do not. Hence, only two relationships are considered in the process of k-means. In this paper, we aim at presenting a three-way k-means (TWKM for short) clustering method by incorporating three-way decision into k-means clustering. In the proposed method, we view each cluster as a set and represent it by a core region (Co), a fringe region (Fr) and a trivial region (Tr). Since Tr can be expressed as the complement of the union of Co and Fr, we can represent a three-way cluster by a pair consisting of the set of core objects and the set of fringe objects. Each object can be a member of at most one core region, or a member of at least one fringe region. Figure 1 shows one possible cluster produced by the three-way k-means clustering method. The elements in the fringe region may belong only to this cluster or also to the fringe regions of other clusters.

Fig. 1

A demonstrative cluster of TWKM

In order to clarify the main differences between TWKM and the traditional k-means, we take the objects in Fig. 2 as an example. Figure 3 is the clustering result of the traditional k-means method with \(k=2\). From the result we can see that an object is either in \(C_1\) or not in \(C_1\); the same is true for \(C_2\). There is a crisp boundary for each cluster. The requirement of a sharp boundary leads to easy analytical results, but may not be good enough for characterizing uncertainty. If we apply the proposed TWKM algorithm to the data set in Fig. 2, we obtain the clustering result in Fig. 4. We can see that objects near the cluster center are assigned to the core region and the other objects are assigned to the fringe region, which reveals a better structure than the result in Fig. 3.

Fig. 2

Schematic diagram of a data set

Fig. 3

Clustering results of traditional k-means

Fig. 4

Clustering results of TWKM

The procedure of TWKM consists mainly of two steps. The first step is to obtain the support of each cluster, where the support is the union of the core region and the fringe region of the specified cluster. The second step is to separate the core region from the support set. The rest of this paper is organized as follows. In Sect. 2, we review some basic concepts of three-way decision, rough k-means and three-way clustering. In Sect. 3, we present the process and the algorithm of three-way k-means clustering. Evaluations of the algorithm and experimental results are reported in Sects. 4 and 5, respectively. In Sect. 6, we give some concluding remarks and point out some future research problems.

2 Preliminaries

To facilitate the description of the proposed method, we introduce some basic concepts related to this paper, which include three-way decision, fuzzy k-means and three-way clustering.

2.1 Three-way decision

The concept of three-way decision, first proposed by Yao [45, 46], is an extension of the commonly used binary-decision model through adding a third option, and has been used to interpret the three regions of rough sets. The positive, negative and boundary regions are viewed, respectively, as the regions of acceptance, rejection, and noncommitment in a ternary classification. The positive and negative regions can be used to induce rules of acceptance and rejection. Whenever it is impossible to make an acceptance or a rejection decision, the third, noncommitment decision is made. One usually makes a decision based on available information and evidence. When the evidence is insufficient or too weak, it might be impossible to make either a positive or a negative decision, and one chooses an alternative decision that is neither yes nor no. This third option may also be referred to as a deferment decision that requires further judgment. With an ordered evaluation function, the three regions are formally defined by:

Definition 1

Three-way decision with an ordered set [46]. Suppose \((L, \preceq )\) is a totally ordered set, that is, \(\preceq\) is a total order. For two elements \(\alpha , \beta\) with \(\beta \prec \alpha\), suppose that the set of designated values for acceptance is given by \(L^+=\{t\in L\mid t\succeq \alpha \}\) and the set of designated values for rejection is given by \(L^-=\{b\in L\mid b\preceq \beta \}\). For an evaluation function \(\nu : U\rightarrow L\), the three regions are defined by:

$$\begin{aligned} \mathrm{POS}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \nu (x)\succeq \alpha \},\\ \mathrm{NEG}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \nu (x)\preceq \beta \},\\ \mathrm{BND}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \beta \prec \nu (x)\prec \alpha \}. \end{aligned}$$
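Definition 1 can be sketched in a few lines. In this illustrative sketch, the ordered set \(L\) is taken to be the real numbers with their usual order, and the function and variable names are our own, not from a reference implementation:

```python
# A minimal sketch of Definition 1: trisect a universe U with an evaluation
# function nu and thresholds beta < alpha (here L is the reals).
def three_regions(U, nu, alpha, beta):
    assert beta < alpha
    pos = {x for x in U if nu(x) >= alpha}   # acceptance
    neg = {x for x in U if nu(x) <= beta}    # rejection
    bnd = {x for x in U if beta < nu(x) < alpha}  # noncommitment
    return pos, neg, bnd

# Example: evaluate objects by a membership-like score in [0, 1].
U = ['a', 'b', 'c', 'd']
score = {'a': 0.9, 'b': 0.5, 'c': 0.1, 'd': 0.7}
pos, neg, bnd = three_regions(U, score.get, alpha=0.7, beta=0.3)
# pos = {'a', 'd'}, neg = {'c'}, bnd = {'b'}
```

The three regions are pairwise disjoint and jointly cover the universe, mirroring the trisection described above.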

Three-way decision has been shown to build on solid cognitive foundations and is a class of effective methods commonly used in human problem solving and information processing [47]. The idea of three-way decision is common in real life and widely applied in many fields and disciplines. Many soft computing models for learning uncertain concepts, such as interval sets, rough sets, fuzzy sets and shadowed sets, have tri-partitioning properties and can be reinvestigated within the framework of three-way decision [45]. The theory of three-way decision embraces ideas from these theories and introduces its own notions, concepts, methods and tools. Recently, Yao [47] presented a trisecting-and-acting model, depicted in Fig. 5. The model explains three-way decision in terms of two basic tasks: the first is to divide a universe into three pairwise disjoint regions, and the second is to develop appropriate strategies for the different regions.

Fig. 5

Trisecting-and-acting model

Since three-way decision was proposed by Yao, we have witnessed fast-growing development and application of three-way approaches in many fields and disciplines, such as investment decision [17, 20], information filtering [16], text classification [13, 15], risk decision [13, 17], government decision [19], conflict analysis [10], web-based support systems [42], and so on. Many recent studies further investigated extensions and applications of three-way decision. For example, Zhang et al. [54] presented a three-way decision model based on two types of classification errors and two types of uncertain classifications. Yang et al. [41] proposed a unified model of sequential three-way decision and multilevel incremental processing for complex problem solving by making use of a granular structure. Hao et al. [5] developed a sequential three-way decision model to solve the optimal scale selection problem in a dynamic multi-scale decision table. Zhang et al. [53] established a dynamic three-way decision model based on the updating of attribute values. In addition, Qi et al. [30] introduced three-way decision into formal concept analysis and proposed the notion of three-way concept, whose main idea is to incorporate ternary classification into the design of the extension or intension of a concept. Li et al. [14] proposed an axiomatic approach to describe three-way concepts by means of multi-granularity. The idea of three-way concept analysis has attracted considerable research, and a series of related results have been obtained [6, 12, 32, 48]. In the clustering field, Yu et al. [49, 50, 52] presented a framework of three-way clustering which represents each cluster by a pair of sets called the core region and the fringe region. Afridi et al. [1] presented a three-way clustering approach for handling missing data by using a game-theoretic rough set model.
Wang and Yao [39] proposed a contraction-and-expansion based three-way clustering framework called CE3, inspired by the ideas of erosion and dilation from mathematical morphology. Yu et al. [51] investigated an active three-way clustering method via low-rank matrices. All of the above results enrich the theories and models of three-way decision.

Recently, Singh [33] proposed another three-way method to generate three-way fuzzy concepts and their hierarchical-order visualization in the concept lattice. Singh's method comes from the theory of neutrosophic sets, which uses three functions, a truth-membership function, an indeterminacy-membership function, and a falsity-membership function, to represent a set. Singh's method shares some similarity with Yao's three-way decision and, at the same time, differs from it in several ways. On the one hand, both are based on the tri-partition methodology, which provides flexible ways for human-like problem solving and information processing. On the other hand, Yao's method comes from the decision-theoretic rough set model, which led to the concept of three-way decision; a basic idea of Yao's three-way decision is to divide a universal set into three pairwise disjoint regions and to process the three regions accordingly. Different from Yao's method, Singh's method divides a universe into three regions independently, so the regions may intersect each other. Most recent studies of Singh's method focus on the fuzzy concept lattice [34,35,36] and its applications [37].

2.2 Rough k-means

The traditional k-means algorithm [38] proceeds by partitioning n objects into k non-empty subsets. During each iteration, the centroids or means of the clusters are computed as

$$\begin{aligned} x_i=\frac{\sum _{v\in C_i}v}{|C_i|}, \end{aligned}$$
(1)

where v denotes an object in \(C_i\) and \(|C_i|\) is the number of objects in cluster \(C_i\). The process is repeated until convergence, i.e., until there are no more new assignments of objects to clusters.
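The centroid update of Eq. (1) and the convergence criterion can be sketched as follows. This is a minimal illustration with our own names, not the authors' implementation:

```python
import math

# Minimal k-means sketch around Eq. (1): assign each object to its nearest
# centroid, then recompute each centroid as the mean of its cluster.
def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in points:
            i = min(range(len(centroids)),
                    key=lambda j: math.dist(v, centroids[j]))
            clusters[i].append(v)
        new_centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # no new assignments: converged
            break
        centroids = new_centroids
    return centroids, clusters

centers, clusters = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)],
                           [(0.0, 0.0), (5.0, 5.0)])
# centers = [(0.0, 0.5), (5.0, 5.5)]
```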

In rough set theory [26,27,28], a rough concept is approximated by a pair of exact concepts, called the lower and upper approximations. The lower approximation is the set of objects definitely belonging to the vague concept, whereas the upper approximation is the set of objects possibly belonging to it. Correspondingly, three pairwise disjoint regions are formed, i.e., the positive, boundary, and negative regions. Figure 6 provides a schematic diagram of a rough set X with POS(X), BND(X) and NEG(X), consisting of granules coming from the rectangular grid.

Fig. 6

Three disjoint regions in rough set model

By incorporating rough set theory into traditional k-means, Lingras and West [18] introduced RKM clustering. In RKM, the concept of k-means is extended by viewing each cluster as an interval set or rough set X. It is characterized by the lower and upper approximations \({\underline{R}}X\) and \({\overline{R}}X\), respectively, with the following properties: (i) an object \(x_k\) can be part of at most one lower approximation, (ii) if \(x_k\in {\underline{R}}X\) of cluster X, then simultaneously \(x_k\in {\overline{R}}X\), (iii) if \(x_k\) is not a part of any lower approximation, then it belongs to two or more upper approximations. This permits overlaps between clusters.

In order to compute the centroid of each cluster in RKM, the right side of Eq. (1) is split into two parts. Since the patterns lying in the lower approximation definitely belong to a rough cluster, they are assigned a higher weight, controlled by the parameter \(w_{low}\). The patterns lying in the upper approximation are assigned a relatively lower weight, controlled by the parameter \(w_{up}\), during computation. The centroid of cluster \(C_i\) is determined by

$$\begin{aligned} v_i= \left\{ \begin{array}{cc} w_{low}A_1+w_{up}B_1, &{} \text{ if } \ \ \ {\underline{R}}C_i\ne \phi \wedge {\overline{R}}C_i-{\underline{R}}C_i \ne \phi \\ B_1, &{} \text{ if } \ \ \ \ {\underline{R}}C_i=\phi \wedge {\overline{R}}C_i-{\underline{R}}C_i \ne \phi \\ A_1, &{} \text{ if }\ \ \ \ {\underline{R}}C_i\ne \phi \wedge {\overline{R}}C_i-{\underline{R}}C_i = \phi \end{array} \right. , \end{aligned}$$
(2)

where,

$$\begin{aligned} A_1= & {} \frac{\sum _{x_k\in {\underline{R}}C_i}x_k}{|{\underline{R}}C_i|}, \end{aligned}$$
(3)
$$\begin{aligned} B_1= & {} \frac{\sum _{x_k\in ({\overline{R}}C_i-{\underline{R}}C_i)}x_k}{|{\overline{R}}C_i-{\underline{R}}C_i|}. \end{aligned}$$
(4)

The parameters \(w_{low}\) and \(w_{up}\) correspond to the relative importance of the lower and upper approximations, respectively. Here \(|{\underline{R}}C_i|\) indicates the number of patterns in the lower approximation of cluster \(C_i\), while \(|{\overline{R}}C_i-{\underline{R}}C_i|\) is the number of patterns in the rough boundary lying between the two approximations. In order to determine the lower and upper approximations of each cluster, Lingras et al. [4] utilized the following rules. Let \(d_{pk}=\min _{1\le i\le k}d_{ik}\) and \(T=\{j : d_{jk}-d_{pk}\le threshold \ \text{ and } \ p\ne j\}\).

  1. 1.

    If \(T\ne \phi , x_k \in {\overline{R}}(C_p)\) and \(x_k \in {\overline{R}}(C_j), \forall j\in T\) . Furthermore, \(x_k\) is not part of any lower bound.

  2. 2.

    Otherwise, if \(T=\phi , x_k \in {\underline{R}}(C_p)\). In addition, \(x_k \in {\overline{R}}(C_p)\).

It is observed that the performance of the algorithm depends on the choice of \(w_{low}\), \(w_{up}\) and the threshold. The parameter \(w_{low}\) controls the importance of the objects lying within the lower approximation of a cluster in determining its centroid. Hence an optimal selection of these parameters is an issue of reasonable interest. Here we require \(w_{up}+ w_{low}=1\) and \(0.5\le w_{low}\le 1\).
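As an illustration, the RKM assignment rules above and the centroid update of Eq. (2) can be sketched as follows. The function names, the Euclidean distance, and the tiny example data are our own assumptions, not from a reference implementation:

```python
import math

# Sketch of the RKM assignment rule: an object joins the lower approximation
# of its nearest cluster only when no other centroid is within `threshold`
# of the minimum distance; otherwise it joins several upper approximations.
def rkm_assign(points, centroids, threshold):
    lower = [[] for _ in centroids]   # lower approximations
    upper = [[] for _ in centroids]   # upper approximations
    for v in points:
        d = [math.dist(v, c) for c in centroids]
        p = min(range(len(centroids)), key=d.__getitem__)
        T = [j for j in range(len(centroids))
             if j != p and d[j] - d[p] <= threshold]
        upper[p].append(v)
        if T:                          # ambiguous: upper approximations only
            for j in T:
                upper[j].append(v)
        else:                          # unambiguous: also the lower approximation
            lower[p].append(v)
    return lower, upper

def mean(c):
    return tuple(sum(x) / len(c) for x in zip(*c))

# Sketch of the centroid update of Eq. (2), with weights w_low + w_up = 1.
def rkm_centroid(low, up, w_low=0.7, w_up=0.3):
    fringe = [v for v in up if v not in low]   # rough boundary
    if low and fringe:
        a, b = mean(low), mean(fringe)
        return tuple(w_low * x + w_up * y for x, y in zip(a, b))
    return mean(fringe) if fringe else mean(low)

low, up = rkm_assign([(0, 0), (0, 1), (5, 5), (2.5, 2.5)],
                     [(0, 0), (5, 5)], threshold=0.5)
c0 = rkm_centroid(low[0], up[0])   # weighted mix of lower and boundary means
```

The object (2.5, 2.5) is equidistant from both centroids, so it lands in both upper approximations but in no lower approximation, which is exactly the overlap RKM permits.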

2.3 Three-way clustering

The framework of three-way clustering was first proposed by Yu [50, 52]. We summarize the basic concepts of three-way clustering below.

Assume \({\mathbb {C}}=\{C_1, \ldots , C_k\}\) is a family of clusters of the universe \(V =\{v_1,\ldots , v_n\}\). A hard clustering requires that each \(C_i\) satisfies the following conditions:

$$\begin{aligned} &(\mathrm{i})C_i\ne \phi , \ \ \ i=1, \ldots , k,\\ &(\mathrm{ii})\bigcup _{i=1}^kC_i=V,\\ &(\mathrm{iii})\,C_i\bigcap C_j=\phi , \quad i\ne j. \end{aligned}$$

Property (i) states that each cluster is non-empty. Properties (ii) and (iii) state that every \(v\in V\) belongs to one and only one cluster. In this case, \({\mathbb {C}}\) is a partition of the universe.

It is obvious that there are three types of relationships between an object and a cluster, namely, belong-to definitely, not belong-to definitely, and uncertain. It is therefore more appropriate to use three regions to represent a cluster. Inspired by the ideas of three-way decision, Yu [50, 52] proposed a framework of three-way clustering. In contrast to the general crisp representation of a cluster, three-way clustering represents a three-way cluster \(C_i\) as a pair of sets:

$$\begin{aligned} C_i=(\text{ Co }(C_i), \text{ Fr }(C_i)), \end{aligned}$$

where \(\text{ Co }(C_i)\subset V\) and \(\text{ Fr }(C_i)\subset V\). Let \(\text{ Tr }(C_i)= V- (\text{ Co }(C_i)\cup \text{ Fr }(C_i))\). The three sets, \(\text{ Co }(C_i), \text{ Fr }(C_i)\) and \(\text{ Tr }(C_i)\) naturally form the CoreRegion, FringeRegion, and TrivialRegion, respectively, of a cluster. That is:

$$\begin{aligned} \mathrm{CoreRegion}(C_i)= & {} \text{ Co }(C_i),\\ \mathrm{FringeRegion}(C_i)= & {} \text{ Fr }(C_i),\\ \mathrm{TrivialRegion}(C_i)= & {} V-(\text{ Co }(C_i)\cup \text{ Fr }(C_i)). \end{aligned}$$

These subsets have the following properties.

$$\begin{aligned}&\text{ Tr }(C_i)\cup \text{ Co }(C_i)\cup \text{ Fr }(C_i)= V,\\&\text{ Co }(C_i)\cap \text{ Fr }(C_i)=\phi ,\\&\text{ Co }(C_i)\cap \text{ Tr }(C_i)=\phi ,\\&\text{ Fr }(C_i)\cap \text{ Tr }(C_i)=\phi . \end{aligned}$$

If \(\text{ Fr }(C_i)=\phi\), the representation of \(C_i\) reduces to \(C_i = \text{ Co }(C_i)\). It is a single set and \(\text{ Tr }(C_i) = V-\text{ Co }(C_i)\). This is the representation of a hard cluster. It means that representing a cluster by a single set is a special case of a three-way cluster in which the fringe region is empty.

There are different requirements on \(\text{ Co }(C_i)\) and \(\text{ Fr }(C_i)\). In this paper, we adopt the following properties:

$$\begin{aligned} (\mathrm{I})&\text{ Co }(C_i) \ne \phi , ~~~~i=1, \ldots , k,\\ (\mathrm{II})&\bigcup _{i=1}^k(\text{ Co }(C_i)\cup \text{ Fr }(C_i))=V,\\ (\mathrm{III})&\text{ Co }(C_i)\cap \text{ Co }(C_j)=\phi , ~~~~i\ne j. \end{aligned}$$

Property (I) demands that each core region cannot be empty. Property (II) states that any element of V must be in the core or the fringe region of at least one cluster; it is possible that an element \(v\in V\) belongs to more than one cluster. Property (III) requires that the core regions of the clusters are pairwise disjoint. Based on the above discussion, we have the following family of clusters to represent the result of three-way clustering:

$$\begin{aligned} {\mathbb {C}}=\{(\text{ Co }(C_1), \text{ Fr }(C_1)),(\text{ Co }(C_2), \text{ Fr }(C_2)), \ldots , (\text{ Co }(C_k), \text{ Fr }(C_k))\}. \end{aligned}$$
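The representation above can be made concrete with a small sketch that stores each cluster as a (core, fringe) pair of sets and checks properties (I)–(III) directly; the function name and the example data are illustrative:

```python
# Each three-way cluster is a pair (core, fringe) of Python sets; the trivial
# region is implicit as the complement of their union.
def check_three_way(V, clusters):
    cores = [co for co, fr in clusters]
    # (I) every core region is non-empty
    assert all(cores), "some core region is empty"
    # (II) every object lies in the core or fringe of at least one cluster
    covered = set().union(*(co | fr for co, fr in clusters))
    assert covered == set(V), "some object is trivial in every cluster"
    # (III) core regions are pairwise disjoint
    for i in range(len(cores)):
        for j in range(i + 1, len(cores)):
            assert not (cores[i] & cores[j]), "core regions overlap"
    return True

V = {1, 2, 3, 4, 5}
C = [({1, 2}, {3}), ({4, 5}, {3})]   # object 3 sits in two fringe regions
# check_three_way(V, C) -> True
```

Note that object 3 belongs to the fringe of both clusters, which properties (I)–(III) allow: only the cores must be disjoint.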

3 The proposed TWKM

We begin our discussion by introducing some notation. Suppose \(V =\{v_1,\ldots , v_n\}\) is a set of n objects and \({{\mathbb {C}}}=\{(\mathrm{Co}(C_1), \mathrm{Fr}(C_1)), {(\mathrm Co}(C_2), \mathrm{Fr}(C_2)), \ldots ,\)\({(\mathrm Co}(C_k), \mathrm{Fr}(C_k))\}\) is the three-way clustering result of V. The union of \(\mathrm{Co}(C_i)\) and \(\mathrm{Fr}(C_i)\) is the support set of cluster \(C_i\,(i=1, \ldots , k)\), i.e.,

$$\begin{aligned} \mathrm{support}(C_i)=\mathrm{Co}(C_i)\cup \mathrm{Fr}(C_i), (i=1, \ldots , k). \end{aligned}$$

Three-way clustering uses core region and fringe region to represent a cluster rather than a single set. One of the main tasks in three-way clustering is to construct core region and fringe region. Based on the concept of three-way decision and three-way clustering, we develop three-way k-means (TWKM for short) clustering algorithm in this section.

As is well known, k-means is a classical hard clustering algorithm that represents a cluster by a single set with a crisp boundary. The main idea of k-means is to use centroids to represent clusters; the process can be viewed as a continuous iteration of the centroids obtained by associating each object with its nearest centroid. Only two relationships are considered in the process of k-means: objects belong to a cluster if they are in the set, otherwise they do not. However, assigning uncertain points to a cluster reduces the accuracy of the method. From Fig. 1, we know that there are three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, we integrate the idea of three-way decision into k-means and propose three-way k-means (TWKM). TWKM is a three-way clustering method, that is, each cluster is represented by its core region and fringe region. In TWKM, an overlap clustering is used to obtain the supports of the clusters, and perturbation analysis is applied to separate the core regions from the supports. The difference between the support and the core region is regarded as the fringe region of the specific cluster. We suppose that the clustering results satisfy the following properties.

  • Property 1: An object can be a member of at most one core region.

  • Property 2: An object that belongs to no core region is a member of at least one fringe region.

The procedure of three-way k-means clustering consists mainly of two steps: the first is to obtain the support of each cluster, and the second is to separate the core region from the support. The idea of computing the support comes from RKM. For each object v and k randomly selected centroids \(x_1, \ldots , x_k\), let \(d(v, x_j)\) be the distance between v and the centroid \(x_j\). Suppose \(d(v, x_i)=\min _{1\le j\le k} d(v, x_j)\) and \(T=\{j: {d(v, x_j)}-{d(v, x_i)} \le \varepsilon _1\) and \(i\ne j\}\), where \(\varepsilon _1\) is a given parameter. Then,

  1. 1.

    If \(T \ne \phi\), then \(v\in \mathrm{support}(C_i)\) and \(v\in \mathrm{support}(C_j)\) for all \(j\in T\).

  2. 2.

    If \(T= \phi\), then \(v \in \mathrm{support}(C_i)\).

The modified centroid calculations for above procedure are given by:

$$\begin{aligned} x_i=\frac{\sum _{v\in \mathrm{support}(C_i)}v}{|\mathrm{support}(C_i)|}, \end{aligned}$$
(5)

where \(i=1, \ldots , k\), v ranges over all objects in \(\mathrm{support}(C_i)\), and \(|\mathrm{support}(C_i)|\) is the number of objects in \(\mathrm{support}(C_i)\).

The mean in Eq. (5) first gives a coarse idea of the cluster prototype and then tunes and refines this value using the data in the support. This process is repeated until the modified centroids in the current iteration are identical to those generated in the previous one, namely, until the prototypes are stabilized.
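The first step of TWKM, i.e., the two assignment rules together with the centroid update of Eq. (5), can be sketched as follows. The names, the Euclidean distance, and the example data are our own illustrative choices, not a reference implementation:

```python
import math

# Sketch of the first TWKM step: iterate the support assignment (with
# parameter eps1) and the centroid update of Eq. (5) until the prototypes
# stabilize. An object near two centroids supports both clusters.
def twkm_supports(points, centroids, eps1, max_iter=100):
    supports = [[] for _ in centroids]
    for _ in range(max_iter):
        supports = [[] for _ in centroids]
        for v in points:
            d = [math.dist(v, x) for x in centroids]
            i = min(range(len(centroids)), key=d.__getitem__)
            supports[i].append(v)
            for j in range(len(centroids)):
                if j != i and d[j] - d[i] <= eps1:
                    supports[j].append(v)     # v also supports cluster j
        new = [tuple(sum(x) / len(s) for x in zip(*s)) if s else centroids[i]
               for i, s in enumerate(supports)]
        if new == centroids:                   # prototypes stabilized
            break
        centroids = new
    return centroids, supports

cs, sup = twkm_supports([(0, 0), (0, 1), (5, 5), (5, 6)],
                        [(0.0, 0.0), (5.0, 5.0)], eps1=0.1)
```

With well-separated points and a small \(\varepsilon _1\), no object supports two clusters and the procedure reduces to ordinary k-means; larger \(\varepsilon _1\) produces overlapping supports.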

From the above procedure, we get a family of overlapping clusters, which are the unions of core regions and fringe regions. How to separate the core regions from the supports is another pivotal problem. We solve it by a centroid perturbation analysis that adds weighted copies of elements.

For a given support set \(\mathrm{support}(C_i)\, (i=1, \ldots , k)\), we classify the elements of \(\mathrm{support}(C_i)\) into two types.

$$\begin{aligned} \mathrm{Type\, I}= & {} \{v\in \mathrm{support}(C_i) \mid \exists j=1, \ldots , k, j\ne i, v \in \mathrm{support}(C_j) \},\\ \mathrm{Type\, II}= & {} \{v\in \mathrm{support}(C_i) \mid \forall j=1, \ldots , k, j\ne i, v \notin \mathrm{support}(C_j)\}. \end{aligned}$$

Suppose the centroid of \(\mathrm{support}(C_i)\) is \(x_i\) and \(m_i\) is the number of elements in \(\mathrm{support}(C_i)\). We adopt different strategies for the different types. Objects of Type I are assigned to the fringe region of \(C_i\), because they belong to at least two clusters. For an object v of Type II, we add \(m_i\) copies of v to \(\mathrm{support}(C_i)\) and denote the enlarged cluster by \(\mathrm{support}(C_i^*)\). We then calculate the new centroid \(x_i^*\) of \(\mathrm{support}(C_i^*)\) by Eq. (5) and the difference between \(x_i^*\) and \(x_i\). For a given parameter \(\varepsilon _2\), if \(|x_i^*-x_i|\le \varepsilon _2\), v is assigned to the core region of \(C_i\); otherwise, v is assigned to the fringe region of \(C_i\).
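The second step, separating the core region from a support by centroid perturbation, can be sketched as follows. We interpret \(|x_i^*-x_i|\) as the Euclidean distance between the two centroids; this interpretation, the function name, and the example data are our own assumptions:

```python
import math

# Sketch of the perturbation step: a Type I object (shared with another
# support) goes to the fringe; a Type II object v is duplicated m_i times,
# and if the centroid moves by at most eps2 it goes to the core.
def separate_core(support, other_supports, eps2):
    m = len(support)
    x = tuple(sum(c) / m for c in zip(*support))    # centroid of the support
    core, fringe = [], []
    for v in support:
        if any(v in s for s in other_supports):     # Type I: shared object
            fringe.append(v)
            continue
        # Type II: add m copies of v and recompute the centroid by Eq. (5)
        enlarged = support + [v] * m
        x_star = tuple(sum(c) / len(enlarged) for c in zip(*enlarged))
        if math.dist(x_star, x) <= eps2:
            core.append(v)
        else:
            fringe.append(v)
    return core, fringe

core, fringe = separate_core([(0, 0), (0, 1), (0, 2)], [[(5, 5)]], eps2=0.4)
```

Intuitively, duplicating an object far from the centroid drags the centroid noticeably, so such an object is treated as fringe, while an object near the centroid barely perturbs it and is kept in the core.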

Using the same strategy for each \(\mathrm{support}(C_i)\, (i=1, \ldots , k)\), we obtain the core region and fringe region of each cluster; a three-way cluster is thus naturally formed. The above clustering procedure is referred to as TWKM clustering. Algorithm 1 describes the process of TWKM clustering. In Algorithm 1, Lines 3 to 15 find the support of each cluster by iteratively updating the centroids of the supports, and Lines 16 to 29 separate the core regions from the support sets by centroid perturbation analysis. Figure 7 shows the flowchart of Algorithm 1. The time complexity of Algorithm 1 is \(O(tknm)+O(knm)\) and its space complexity is \(O((k+n)m)\), where t, n and m are the numbers of iterations, objects and attributes, respectively.

figure a
Fig. 7

Flowchart of Algorithm 1

4 Evaluation of algorithm

The evaluation of clustering, also referred to as cluster validity, is a crucial process for assessing the performance of a learning method in identifying relevant groups. A good measure of cluster quality helps in comparing several clustering methods and in analyzing whether one method is superior to another. The following quantitative indices are often used to evaluate the performance of clustering algorithms.

  1. 1.

    Davies–Bouldin index [3, 22] (DB hereafter).

    $$\begin{aligned} DB=\frac{1}{c}\sum _{i=1}^c \max _{j\ne i}\left\{ \frac{S(C_i)+S(C_j)}{d(x_i,x_j)}\right\} \end{aligned}$$
    (6)

    where \(S(C_i)\) and \(d(x_i,x_j)\) are the intra-cluster distance and the inter-cluster separation, respectively. \(S(C_i)\) is defined as follows:

    $$\begin{aligned} S(C_i)=\frac{\sum _{v\in C_i} \parallel v-x_i\parallel }{\mid C_i\mid }. \end{aligned}$$
    (7)

    As a function of the ratio of the within-cluster scatter to the between-cluster separation, a lower value means that the clustering is better.

  2. 2.

    Average Silhouette index [31] (AS hereafter).

    $$\begin{aligned} AS=\frac{1}{n}\sum _{i=1}^n S_i, \end{aligned}$$
    (8)

    where n is the total number of objects in the set and \(S_i\) is the silhouette of object \(v_i\), which is defined as

    $$\begin{aligned} S_i=\frac{b_i-a_i}{\max \{a_i, b_i\}}, \end{aligned}$$
    (9)

    \(a_i\) is the average distance between \(v_i\) and all other objects in its own cluster, and \(b_i\) is the minimum over the other clusters of the average distance between \(v_i\) and the objects in that cluster.

    The silhouette of each object shows which objects lie well within their cluster and which are merely somewhere in between clusters. The average silhouette provides an evaluation of clustering validity. The range of the average silhouette index is \([-1, 1]\); a larger value means a better clustering result.

  3. 3.

    Accuracy (ACC hereafter).

    $$\begin{aligned} ACC =\sum _{c=1}^k\frac{n_c^j}{n}. \end{aligned}$$
    (10)

    where \(n_c^j\) is the number of objects common to the cth cluster and its matched class j, after obtaining a one-to-one match between clusters and classes. The higher the value of ACC, the better the clustering result. The value equals 1 only when the clustering result is identical to the ground truth.
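As an illustration of the first index, the Davies–Bouldin computation of Eqs. (6) and (7) can be sketched as follows; the representation of clusters as point lists and the example data are our own assumptions:

```python
import math

# Sketch of the Davies-Bouldin index of Eq. (6), with the intra-cluster
# scatter S(C_i) of Eq. (7). Clusters are lists of points; centroids are
# their means.
def db_index(clusters, centroids):
    def S(c, x):                      # Eq. (7): mean distance to the centroid
        return sum(math.dist(v, x) for v in c) / len(c)
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (S(clusters[i], centroids[i]) + S(clusters[j], centroids[j]))
            / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k                  # lower: tighter, better-separated clusters

clusters = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
centroids = [(0.0, 0.5), (5.0, 5.5)]
db = db_index(clusters, centroids)
```

For these two tight, well-separated clusters the index is small (about 0.14), reflecting a low scatter-to-separation ratio.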

5 Experimental illustration

To test the performance of the proposed algorithm, six data sets from the UCI Machine Learning repository [4] and three data sets from the USPS ZIP code handwritten digits database [11] are employed in this section. The details of these data sets are shown in Table 1.

Table 1 A description of data sets used in the experiments

In order to assess the quality of three-way k-means clustering, we use all the core regions to form a clustering result and compute the DB index, AS index and ACC by using the core region to represent the corresponding cluster. The total number of objects excludes the objects in the fringe regions when calculating ACC. A better three-way clustering result should have a lower DB value, a higher AS value and a higher ACC value. The performance of three-way k-means clustering is reported on ten data sets. For comparison, the performances of k-means [21], k-medoids [9], FKM [2] and RKM [18] are also presented on each data set, where \(threshold=0.02\), \(w_{up}=0.3\) and \(w_{low}=0.7\) in RKM. The experiments are repeated 100 times for each data set, and the parameters of three-way k-means clustering are \(\varepsilon _1=0.02\) and \(\varepsilon _2=0.0023n\), where n is the total number of objects in each set. The average values and the best values of the three indices over the 100 runs are used to compare the overall performances. The results are presented in Tables 2, 3, 4, 5, 6 and 7, in which the optimal results among the different algorithms are marked in bold. The CPU computing time of the 100 runs on each data set, measured in seconds, is recorded in Table 8 as well.

Table 2 Average performances of DB value
Table 3 Best performances of DB value
Table 4 Average performances of AS value
Table 5 Best performances of AS value
Table 6 Average performances of ACC value
Table 7 Best performances of ACC value
Table 8 The total run time of 100 runs

Tables 2, 3, 4 and 5 show the average and best performances in terms of DB value and AS value for k-means, k-medoids, FKM, RKM and TWKM, respectively. From these tables, we find that TWKM outperforms the other algorithms in terms of both DB value and AS value, for both the best and the average performances, on most of the data sets. The improvement can be attributed to the fact that each cluster is represented by its core region, which helps to increase the degree of separation between clusters and to decrease the degree of scatter within clusters, since the fringe regions have been successfully marked out. Although the DB and AS values obtained by FKM on the WDBC, GLASS and BANK sets are similar or superior to the results of TWKM, the computing time of TWKM is far less than that of FKM.

Tables 6 and 7 list the average and best performances of the ACC value, respectively. It is not difficult to observe that both the average and the best ACC values obtained by TWKM are superior to those obtained by the other methods on WINE, WDBC, OCCUPANCY, MAGIC, USPS-08 and USPS-49. However, other methods exceed TWKM on GLASS, BANK and USPS-49. This is because ACC is computed by using the core region to represent the corresponding cluster, and the total number of objects excludes the objects in the fringe regions, which means that \(n_c^j\) and n both become smaller in Eq. (10).

6 Concluding remarks

In most existing studies, a cluster is represented by a single set, and the set naturally divides the space into two regions: an object belongs to the cluster if it is in the set, otherwise it does not. It is obvious that there may be three types of relationships between an object and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. Based on these relationships, we developed a three-way k-means clustering method in this paper by integrating k-means and three-way decision. In the proposed method, an overlap clustering is used to obtain the supports of the clusters, and perturbation analysis is applied to separate the core regions from the supports. The differences between the supports and the core regions are regarded as the fringe regions of the specific clusters. Therefore, a three-way explanation of each cluster is naturally formed. Experimental results demonstrate that the new algorithm can significantly improve the structure of clustering results compared with the traditional clustering algorithms. The present study is a first step in the research of three-way k-means. The following are challenges for further research.

  1. 1.

    The parameters \(\varepsilon _1\) and \(\varepsilon _2\) have a significant impact on the clustering results. Research on dynamic changes of parameters will be an interesting topic to be addressed.

  2. 2.

    In the proposed three-way k-means algorithm, the number of clusters is given in advance. However, how to determine the number of clusters is another interesting topic to be addressed.

  3. 3.

    Multigranulation is a developing approach which can be used for constructing approximations of a target concept. From the granular computing point of view, our proposed algorithm is based on a single granulation. How to develop three-way k-means based on multigranulation needs to be further discussed.