1 Introduction

As one of the most fundamental topics in both data mining and machine learning, clustering has been widely used in data analysis [8, 23]. The aim of clustering is to partition a set of given multivariate samples into several meaningful groups such that all members within a group are similar to each other, while samples from different groups exhibit dissimilar characteristics. Research on clustering algorithms has received much attention, and a number of clustering methods have been developed over the past decades. Comprehensive reviews of clustering algorithms can be found in [7, 40].

Roughly speaking, most existing clustering methods can be classified into two categories: hierarchical clustering and partitive clustering [9]. In this paper, we focus on the latter, which is categorized as a prototype-based model, i.e., each cluster can be represented by a prototype, leading to a concise description of the original data set. According to whether there is a crisp boundary between clusters, the various partitive clustering algorithms can be divided into hard clustering and soft clustering. Hard clustering methods are based on the assumption that a cluster is represented by a set with a crisp boundary. One of the most widely used hard clustering methods is the k-means method [21], where each object must be assigned to exactly one cluster. The requirement of a sharp boundary leads to easy analytical results, but may not adequately reflect the fact that a cluster may not have a well-defined boundary. In order to relax this requirement, many soft clustering methods based on k-means were proposed for different applications. Incorporating fuzzy sets into k-means clustering, Bezdek [2] proposed fuzzy k-means (FKM), which assumes that a cluster is represented by a fuzzy set modeling a gradually changing boundary. It is often used to reveal the structure of a data set and to construct information granules. Based on rough set theory [26,27,28], Lingras and West [18] introduced rough k-means (RKM) clustering, which describes each cluster not only by a center, but also by a pair of lower and upper bounds; the lower and upper approximations receive different weights when computing the new centers. Involving membership degrees, Mitra et al. [24] put forward a rough-fuzzy k-means (RFKM) clustering method, which incorporates membership into the RKM framework. There, the lower and upper bounds are determined according to the membership degrees rather than the individual absolute distances between an object and its neighbors.
As a conceptual and algorithmic bridge between rough sets and fuzzy sets, the shadowed set [29] provides an alternative mechanism for handling uncertainty and has been successfully used in clustering analysis, resulting in shadowed k-means (SKM) [25].

Three-way decision, as a new field of study for complex problem solving, was proposed by Yao [43,44,45,46]. It extends the commonly used binary-decision models by adding a deferred decision. The main idea of three-way decision is to divide a universe into three disjoint regions and adopt different strategies for the different regions. The deferred decision is viewed as the third decision-making behavior, taken when the information is not sufficient to determine the state of an object, that is, whether the object should be accepted or rejected. Many soft computing models, such as interval sets, rough sets, fuzzy sets and shadowed sets, have tri-partitioning properties and can be reinvestigated within the framework of three-way decision [46]. It is obvious that there may exist at least three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, Yu et al. [49, 50, 52] introduced three-way clustering based on three-way decision theory. In three-way clustering, a cluster is represented by a pair of nested sets called the core and the support of the cluster, respectively. The core, the difference between the support and the core (i.e., the fringe region), and the complement of the support give rise to a trisection of the space. Such a trisection captures the three types of relationships between a cluster and an object.

As one of the classical hard clustering algorithms, k-means represents a cluster by a single set. For each cluster, the set naturally divides the space into two regions: objects belong to the cluster if they are in the set, otherwise they do not. Hence, only two relationships are considered in the process of k-means. In this paper, we aim at presenting a three-way k-means (TWKM for short) clustering method by incorporating three-way decision into k-means clustering. In the proposed method, we view each cluster as a set and represent it by a core region (Co), a fringe region (Fr) and a trivial region (Tr). Since Tr can be expressed as the complement of the union of Co and Fr, we can represent a three-way cluster by a pair consisting of the set of core objects and the set of fringe objects. Each object can be a member of at most one core region, or a member of at least one fringe region. Figure 1 shows one possible cluster produced by the three-way k-means clustering method. The elements in the fringe region may belong only to this cluster or also to the fringe regions of other clusters.

Fig. 1

A demonstrative cluster of TWKM

In order to clarify the main differences between TWKM and the traditional k-means, we take the objects in Fig. 2 as an example. Figure 3 is the clustering result of the traditional k-means method with \(k=2\). From the result we can see that an object is either in \(C_1\) or not in \(C_1\); the same is true for \(C_2\). There is a crisp boundary for each cluster. The requirement of a sharp boundary leads to easy analytical results, but may not be good enough for characterizing uncertainty. If we apply the proposed TWKM algorithm to the data set in Fig. 2, we obtain the clustering result in Fig. 4. We can see that objects near the cluster center are assigned to the core region and the other objects are assigned to the fringe region, which reveals a better structure than the result in Fig. 3.

Fig. 2

Schematic diagram of a data set

Fig. 3

Clustering results of traditional k-means

Fig. 4

Clustering results of TWKM

The procedure of TWKM consists mainly of two steps. The first step is to obtain the support of each cluster, where the support is the union of the core region and the fringe region of the specified cluster. The second step is to separate the core region from the support set. The rest of this paper is organized as follows. In Sect. 2, we review some basic concepts of three-way decision, rough k-means and three-way clustering. In Sect. 3, we present the process and the algorithm of three-way k-means clustering. Evaluations of the algorithm and experimental results are reported in Sects. 4 and 5, respectively. In Sect. 6, we give some concluding remarks and point out some future research problems.

2 Preliminaries

To facilitate the description of the proposed method, we introduce some basic concepts related to this paper, which include three-way decision, fuzzy k-means and three-way clustering.

2.1 Three-way decision

The concept of three-way decision, first proposed by Yao [45, 46], is an extension of the commonly used binary-decision model through adding a third option, and has been used to interpret the three regions of rough sets. The positive, negative and boundary regions are viewed, respectively, as the regions of acceptance, rejection, and noncommitment in a ternary classification. The positive and negative regions can be used to induce rules of acceptance and rejection. Whenever it is impossible to make an acceptance or a rejection decision, the third, noncommitment decision is made. One usually makes a decision based on available information and evidence. When the evidence is insufficient or too weak, it might be impossible to make either a positive or a negative decision, and one chooses an alternative decision that is neither yes nor no. This third option may also be referred to as a deferment decision that requires further judgment. With an ordered evaluation function, the three regions are formally defined by:

Definition 1

Three-way decision with an ordered set [46]. Suppose \((L, \preceq )\) is a totally ordered set, that is, \(\preceq\) is a total order. For two elements \(\alpha , \beta\) with \(\beta \prec \alpha\), suppose that the set of designated values for acceptance is given by \(L^+=\{t\in L\mid t\succeq \alpha \}\) and the set of designated values for rejection is given by \(L^-=\{b\in L\mid b\preceq \beta \}\). For an evaluation function \(\nu : U\rightarrow L\), the three regions are defined by:

$$\begin{aligned} \mathrm{POS}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \nu (x)\succeq \alpha \},\\ \mathrm{NEG}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \nu (x)\preceq \beta \},\\ \mathrm{BND}_{(\alpha , \beta )}(\nu )= & {} \{x\in U \mid \beta \prec \nu (x)\prec \alpha \}. \end{aligned}$$
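Definition 1 can be sketched in a few lines. In this illustrative sketch, the ordered set \(L\) is taken to be the real numbers with their usual order, and the function and variable names are our own, not from a reference implementation:

```python
# A minimal sketch of Definition 1: trisect a universe U with an evaluation
# function nu and thresholds beta < alpha (here L is the reals).
def three_regions(U, nu, alpha, beta):
    assert beta < alpha
    pos = {x for x in U if nu(x) >= alpha}   # acceptance
    neg = {x for x in U if nu(x) <= beta}    # rejection
    bnd = {x for x in U if beta < nu(x) < alpha}  # noncommitment
    return pos, neg, bnd

# Example: evaluate objects by a membership-like score in [0, 1].
U = ['a', 'b', 'c', 'd']
score = {'a': 0.9, 'b': 0.5, 'c': 0.1, 'd': 0.7}
pos, neg, bnd = three_regions(U, score.get, alpha=0.7, beta=0.3)
# pos = {'a', 'd'}, neg = {'c'}, bnd = {'b'}
```

The three regions are pairwise disjoint and jointly cover the universe, mirroring the trisection described above.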

Three-way decision has been shown to build on solid cognitive foundations and is a class of effective methods commonly used in human problem solving and information processing [47]. The idea of three-way decision is common in real life and widely applied in many fields and disciplines. Many soft computing models for learning uncertain concepts, such as interval sets, rough sets, fuzzy sets and shadowed sets, have tri-partitioning properties and can be reinvestigated within the framework of three-way decision [45]. The theory of three-way decision embraces ideas from these theories and introduces its own notions, concepts, methods and tools. Recently, Yao [47] presented a trisecting-and-acting model, depicted in Fig. 5. The model explains three-way decision in terms of two basic tasks: the first is to divide a universe into three pairwise disjoint regions, and the second is to develop appropriate strategies for the different regions.

Fig. 5

Trisecting-and-acting model

Since three-way decision was proposed by Yao, we have witnessed fast-growing development and application of three-way approaches in many fields and disciplines, such as investment decision [17, 20], information filtering [16], text classification [13, 15], risk decision [13, 17], government decision [19], conflict analysis [10], web-based support systems [42], and so on. Many recent studies further investigated extensions and applications of three-way decision. For example, Zhang et al. [54] presented a three-way decision model based on two types of classification errors and two types of uncertain classifications. Yang et al. [41] proposed a unified model of sequential three-way decision and multilevel incremental processing for complex problem solving by making use of a granular structure. Hao et al. [5] developed a sequential three-way decision model to solve the optimal scale selection problem in a dynamic multi-scale decision table. Zhang et al. [53] established a dynamic three-way decision model based on the updating of attribute values. In addition, Qi et al. [30] introduced three-way decision into formal concept analysis and proposed the notion of three-way concept, whose main idea is to incorporate ternary classification into the design of the extension or intension of a concept. Li et al. [14] proposed an axiomatic approach to describe three-way concepts by means of multi-granularity. The idea of three-way concept analysis has attracted considerable research, and a series of related results have been obtained [6, 12, 32, 48]. In the clustering field, Yu et al. [49, 50, 52] presented a framework of three-way clustering which represents each cluster by a pair of sets called the core region and the fringe region. Afridi et al. [1] presented a three-way clustering approach for handling missing data by using a game-theoretic rough set model.
Wang and Yao [39] proposed a contraction-and-expansion based three-way clustering framework called CE3, inspired by the ideas of erosion and dilation from mathematical morphology. Yu et al. [51] investigated an active three-way clustering method via low-rank matrices. All of the above results enrich the theories and models of three-way decision.

Recently, Singh [33] proposed another three-way method to generate three-way fuzzy concepts and their hierarchical-order visualization in the concept lattice. Singh's method comes from the theory of neutrosophic sets, which uses three functions, a truth-membership function, an indeterminacy-membership function, and a falsity-membership function, to represent a set. Singh's method shares some similarity with Yao's three-way decision and, at the same time, differs from it in several ways. On the one hand, both are based on the tri-partition methodology, which provides flexible ways for human-like problem solving and information processing. On the other hand, Yao's method comes from the decision-theoretic rough set model, which led to the concept of three-way decision; a basic idea of Yao's three-way decision is to divide a universal set into three pairwise disjoint regions and to process the three regions accordingly. Different from Yao's method, Singh's method divides a universe into three regions independently, so the regions may intersect each other. Most recent studies of Singh's method focus on the fuzzy concept lattice [34,35,36] and its applications [37].

2.2 Rough k-means

The traditional k-means algorithm [38] proceeds by partitioning n objects into k non-empty subsets. During each iteration, the centroids or means of the clusters are computed as

$$\begin{aligned} x_i=\frac{\sum _{v\in C_i}v}{|C_i|}, \end{aligned}$$
(1)

where v denotes an object in \(C_i\) and \(|C_i|\) is the number of objects in cluster \(C_i\). The process is repeated until convergence, i.e., until there are no more new assignments of objects to clusters.
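The centroid update of Eq. (1) and the convergence criterion can be sketched as follows. This is a minimal illustration with our own names, not the authors' implementation:

```python
import math

# Minimal k-means sketch around Eq. (1): assign each object to its nearest
# centroid, then recompute each centroid as the mean of its cluster.
def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in points:
            i = min(range(len(centroids)),
                    key=lambda j: math.dist(v, centroids[j]))
            clusters[i].append(v)
        new_centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # no new assignments: converged
            break
        centroids = new_centroids
    return centroids, clusters

centers, clusters = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)],
                           [(0.0, 0.0), (5.0, 5.0)])
# centers = [(0.0, 0.5), (5.0, 5.5)]
```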

In rough set theory [26,27,28], a rough concept is approximated by a pair of exact concepts, called the lower and upper approximations. The lower approximation is the set of objects definitely belonging to the vague concept, whereas the upper approximation is the set of objects possibly belonging to it. Correspondingly, three pairwise disjoint regions are formed, i.e., the positive, boundary, and negative regions. Figure 6 provides a schematic diagram of a rough set X with POS(X), BND(X) and NEG(X), consisting of granules coming from the rectangular grid.

Fig. 6

Three disjoint regions in rough set model

By incorporating rough set theory into traditional k-means, Lingras and West [18] introduced RKM clustering. In RKM, the concept of k-means is extended by viewing each cluster as an interval set or rough set X. It is characterized by the lower and upper approximations \({\underline{R}}X\) and \({\overline{R}}X\), respectively, with the following properties: (i) an object \(x_k\) can be part of at most one lower approximation, (ii) if \(x_k\in {\underline{R}}X\) of cluster X, then simultaneously \(x_k\in {\overline{R}}X\), (iii) if \(x_k\) is not a part of any lower approximation, then it belongs to two or more upper approximations. This permits overlaps between clusters.

In order to compute the centroid of each cluster in RKM, the right side of Eq. (1) is split into two parts. Since the patterns lying in the lower approximation definitely belong to a rough cluster, they are assigned a higher weight, controlled by the parameter \(w_{low}\). The patterns lying in the upper approximation are assigned a relatively lower weight, controlled by the parameter \(w_{up}\), during computation. The centroid of cluster \(C_i\) is determined by

$$\begin{aligned} v_i= \left\{ \begin{array}{cc} w_{low}A_1+w_{up}B_1, &{} \text{ if } \ \ \ {\underline{R}}C_i\ne \phi \wedge {\overline{R}}C_i-{\underline{R}}C_i \ne \phi \\ B_1, &{} \text{ if } \ \ \ \ {\underline{R}}C_i=\phi \wedge {\overline{R}}C_i-{\underline{R}}C_i \ne \phi \\ A_1, &{} \text{ if }\ \ \ \ {\underline{R}}C_i\ne \phi \wedge {\overline{R}}C_i-{\underline{R}}C_i = \phi \end{array} \right. , \end{aligned}$$
(2)

where,

$$\begin{aligned} A_1= & {} \frac{\sum _{x_k\in {\underline{R}}C_i}x_k}{|{\underline{R}}C_i|}, \end{aligned}$$
(3)
$$\begin{aligned} B_1= & {} \frac{\sum _{x_k\in ({\overline{R}}C_i-{\underline{R}}C_i)}x_k}{|{\overline{R}}C_i-{\underline{R}}C_i|}. \end{aligned}$$
(4)

The parameters \(w_{low}\) and \(w_{up}\) correspond to the relative importance of the lower and upper approximations, respectively. Here \(|{\underline{R}}C_i|\) indicates the number of patterns in the lower approximation of cluster \(C_i\), while \(|{\overline{R}}C_i-{\underline{R}}C_i|\) is the number of patterns in the rough boundary lying between the two approximations. In order to determine the lower and upper approximations of each cluster, Lingras et al. [4] utilized the following rules. Let \(d_{pk}=\min _{1\le i\le k}d_{ik}\) and \(T=\{j : d_{jk}-d_{pk}\le threshold \ \text{ and } \ p\ne j\}\).

  1. 1.

    If \(T\ne \phi , x_k \in {\overline{R}}(C_p)\) and \(x_k \in {\overline{R}}(C_j), \forall j\in T\) . Furthermore, \(x_k\) is not part of any lower bound.

  2. 2.

    Otherwise, if \(T=\phi , x_k \in {\underline{R}}(C_p)\). In addition, \(x_k \in {\overline{R}}(C_p)\).

It is observed that the performance of the algorithm depends on the choice of \(w_{low}\), \(w_{up}\) and the threshold. The parameter \(w_{low}\) controls the importance of the objects lying within the lower approximation of a cluster in determining its centroid. Hence an optimal selection of these parameters is an issue of reasonable interest. Here we require \(w_{up}+ w_{low}=1\) and \(0.5\le w_{low}\le 1\).
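As an illustration, the RKM assignment rules above and the centroid update of Eq. (2) can be sketched as follows. The function names, the Euclidean distance, and the tiny example data are our own assumptions, not from a reference implementation:

```python
import math

# Sketch of the RKM assignment rule: an object joins the lower approximation
# of its nearest cluster only when no other centroid is within `threshold`
# of the minimum distance; otherwise it joins several upper approximations.
def rkm_assign(points, centroids, threshold):
    lower = [[] for _ in centroids]   # lower approximations
    upper = [[] for _ in centroids]   # upper approximations
    for v in points:
        d = [math.dist(v, c) for c in centroids]
        p = min(range(len(centroids)), key=d.__getitem__)
        T = [j for j in range(len(centroids))
             if j != p and d[j] - d[p] <= threshold]
        upper[p].append(v)
        if T:                          # ambiguous: upper approximations only
            for j in T:
                upper[j].append(v)
        else:                          # unambiguous: also the lower approximation
            lower[p].append(v)
    return lower, upper

def mean(c):
    return tuple(sum(x) / len(c) for x in zip(*c))

# Sketch of the centroid update of Eq. (2), with weights w_low + w_up = 1.
def rkm_centroid(low, up, w_low=0.7, w_up=0.3):
    fringe = [v for v in up if v not in low]   # rough boundary
    if low and fringe:
        a, b = mean(low), mean(fringe)
        return tuple(w_low * x + w_up * y for x, y in zip(a, b))
    return mean(fringe) if fringe else mean(low)

low, up = rkm_assign([(0, 0), (0, 1), (5, 5), (2.5, 2.5)],
                     [(0, 0), (5, 5)], threshold=0.5)
c0 = rkm_centroid(low[0], up[0])   # weighted mix of lower and boundary means
```

The object (2.5, 2.5) is equidistant from both centroids, so it lands in both upper approximations but in no lower approximation, which is exactly the overlap RKM permits.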

2.3 Three-way clustering

The framework of three-way clustering was first proposed by Yu [50, 52]. We summarize the basic concepts of three-way clustering below.

Assume \({\mathbb {C}}=\{C_1, \ldots , C_k\}\) is a family of clusters of the universe \(V =\{v_1,\ldots , v_n\}\). A hard clustering requires that each \(C_i\) satisfies the following conditions:

$$\begin{aligned} &(\mathrm{i})C_i\ne \phi , \ \ \ i=1, \ldots , k,\\ &(\mathrm{ii})\bigcup _{i=1}^kC_i=V,\\ &(\mathrm{iii})\,C_i\bigcap C_j=\phi , \quad i\ne j. \end{aligned}$$

Property (i) states that each cluster is non-empty. Properties (ii) and (iii) state that every \(v\in V\) belongs to one and only one cluster. In this case, \({\mathbb {C}}\) is a partition of the universe.

It is obvious that there are three types of relationships between an object and a cluster, namely, belong-to definitely, not belong-to definitely, and uncertain. It is therefore more appropriate to use three regions to represent a cluster. Inspired by the ideas of three-way decision, Yu [50, 52] proposed a framework of three-way clustering. In contrast to the general crisp representation of a cluster, three-way clustering represents a three-way cluster \(C_i\) as a pair of sets:

$$\begin{aligned} C_i=(\text{ Co }(C_i), \text{ Fr }(C_i)), \end{aligned}$$

where \(\text{ Co }(C_i)\subset V\) and \(\text{ Fr }(C_i)\subset V\). Let \(\text{ Tr }(C_i)= V- (\text{ Co }(C_i)\cup \text{ Fr }(C_i))\). The three sets, \(\text{ Co }(C_i), \text{ Fr }(C_i)\) and \(\text{ Tr }(C_i)\) naturally form the CoreRegion, FringeRegion, and TrivialRegion, respectively, of a cluster. That is:

$$\begin{aligned} \mathrm{CoreRegion}(C_i)= & {} \text{ Co }(C_i),\\ \mathrm{FringeRegion}(C_i)= & {} \text{ Fr }(C_i),\\ \mathrm{TrivialRegion}(C_i)= & {} V-(\text{ Co }(C_i)\cup \text{ Fr }(C_i)). \end{aligned}$$

These subsets have the following properties.

$$\begin{aligned}&\text{ Tr }(C_i)\cup \text{ Co }(C_i)\cup \text{ Fr }(C_i)= V,\\&\text{ Co }(C_i)\cap \text{ Fr }(C_i)=\phi ,\\&\text{ Co }(C_i)\cap \text{ Tr }(C_i)=\phi ,\\&\text{ Fr }(C_i)\cap \text{ Tr }(C_i)=\phi . \end{aligned}$$

If \(\text{ Fr }(C_i)=\phi\), the representation of \(C_i\) reduces to \(C_i = \text{ Co }(C_i)\). It is a single set and \(\text{ Tr }(C_i) = V-\text{ Co }(C_i)\). This is the representation of a hard cluster. It means that representing a cluster by a single set is a special case of a three-way cluster in which the fringe region is empty.

There are different requirements on \(\text{ Co }(C_i)\) and \(\text{ Fr }(C_i)\). In this paper, we adopt the following properties:

$$\begin{aligned} (\mathrm{I})&\text{ Co }(C_i) \ne \phi , ~~~~i=1, \ldots , k,\\ (\mathrm{II})&\bigcup _{i=1}^k(\text{ Co }(C_i)\cup \text{ Fr }(C_i))=V,\\ (\mathrm{III})&\text{ Co }(C_i)\cap \text{ Co }(C_j)=\phi , ~~~~i\ne j. \end{aligned}$$

Property (I) demands that each core region cannot be empty. Property (II) states that any element of V must be in the core or the fringe region of at least one cluster; it is possible that an element \(v\in V\) belongs to more than one cluster. Property (III) requires that the core regions of the clusters are pairwise disjoint. Based on the above discussion, we have the following family of clusters to represent the result of three-way clustering:

$$\begin{aligned} {\mathbb {C}}=\{(\text{ Co }(C_1), \text{ Fr }(C_1)),(\text{ Co }(C_2), \text{ Fr }(C_2)), \ldots , (\text{ Co }(C_k), \text{ Fr }(C_k))\}. \end{aligned}$$
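The representation above can be made concrete with a small sketch that stores each cluster as a (core, fringe) pair of sets and checks properties (I)–(III) directly; the function name and the example data are illustrative:

```python
# Each three-way cluster is a pair (core, fringe) of Python sets; the trivial
# region is implicit as the complement of their union.
def check_three_way(V, clusters):
    cores = [co for co, fr in clusters]
    # (I) every core region is non-empty
    assert all(cores), "some core region is empty"
    # (II) every object lies in the core or fringe of at least one cluster
    covered = set().union(*(co | fr for co, fr in clusters))
    assert covered == set(V), "some object is trivial in every cluster"
    # (III) core regions are pairwise disjoint
    for i in range(len(cores)):
        for j in range(i + 1, len(cores)):
            assert not (cores[i] & cores[j]), "core regions overlap"
    return True

V = {1, 2, 3, 4, 5}
C = [({1, 2}, {3}), ({4, 5}, {3})]   # object 3 sits in two fringe regions
# check_three_way(V, C) -> True
```

Note that object 3 belongs to the fringe of both clusters, which properties (I)–(III) allow: only the cores must be disjoint.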

3 The proposed TWKM

We begin our discussion by introducing some notation. Suppose \(V =\{v_1,\ldots , v_n\}\) is a set of n objects and \({{\mathbb {C}}}=\{(\mathrm{Co}(C_1), \mathrm{Fr}(C_1)), {(\mathrm Co}(C_2), \mathrm{Fr}(C_2)), \ldots ,\)\({(\mathrm Co}(C_k), \mathrm{Fr}(C_k))\}\) is the three-way clustering result of V. The union of \(\mathrm{Co}(C_i)\) and \(\mathrm{Fr}(C_i)\) is the support set of cluster \(C_i\,(i=1, \ldots , k)\), i.e.,

$$\begin{aligned} \mathrm{support}(C_i)=\mathrm{Co}(C_i)\cup \mathrm{Fr}(C_i), (i=1, \ldots , k). \end{aligned}$$

Three-way clustering uses core region and fringe region to represent a cluster rather than a single set. One of the main tasks in three-way clustering is to construct core region and fringe region. Based on the concept of three-way decision and three-way clustering, we develop three-way k-means (TWKM for short) clustering algorithm in this section.

As is well known, k-means is a classical hard clustering algorithm that represents a cluster by a single set with a crisp boundary. The main idea of k-means is to use centroids to represent clusters; the process can be viewed as a continuous iteration of the centroids obtained by associating each object with its nearest centroid. Only two relationships are considered in the process of k-means: objects belong to a cluster if they are in the set, otherwise they do not. However, assigning uncertain points to a cluster reduces the accuracy of the method. From Fig. 1, we know that there are three types of relationships between an element and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. In order to capture these three types of relationships, we integrate the idea of three-way decision into k-means and propose three-way k-means (TWKM). TWKM is a three-way clustering method, that is, each cluster is represented by its core region and fringe region. In TWKM, an overlap clustering is used to obtain the supports of the clusters, and perturbation analysis is applied to separate the core regions from the supports. The difference between the support and the core region is regarded as the fringe region of the specific cluster. We suppose that the clustering results satisfy the following properties.

  • Property 1: An object can be a member of at most one core region.

  • Property 2: An object that belongs to no core region is a member of at least one fringe region.

The procedure of three-way k-means clustering consists mainly of two steps: the first is to obtain the support of each cluster, and the second is to separate the core region from the support. The idea of computing the support comes from RKM. For each object v and k randomly selected centroids \(x_1, \ldots , x_k\), let \(d(v, x_j)\) be the distance between v and the centroid \(x_j\). Suppose \(d(v, x_i)=\min _{1\le j\le k} d(v, x_j)\) and \(T=\{j: {d(v, x_j)}-{d(v, x_i)} \le \varepsilon _1\) and \(i\ne j\}\), where \(\varepsilon _1\) is a given parameter. Then,

  1. 1.

    If \(T \ne \phi\), then \(v\in \mathrm{support}(C_i)\) and \(v\in \mathrm{support}(C_j)\) for all \(j\in T\).

  2. 2.

    If \(T= \phi\), then \(v \in \mathrm{support}(C_i)\).

The modified centroid calculations for above procedure are given by:

$$\begin{aligned} x_i=\frac{\sum _{v\in \mathrm{support}(C_i)}v}{|\mathrm{support}(C_i)|}, \end{aligned}$$
(5)

where \(i=1, \ldots , k\), v ranges over all objects in \(\mathrm{support}(C_i)\), and \(|\mathrm{support}(C_i)|\) is the number of objects in \(\mathrm{support}(C_i)\).

The mean in Eq. (5) first gives a coarse idea of the cluster prototype and then tunes and refines this value using the data in the support. This process is repeated until the modified centroids in the current iteration are identical to those generated in the previous one, namely, until the prototypes are stabilized.
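The first step of TWKM, i.e., the two assignment rules together with the centroid update of Eq. (5), can be sketched as follows. The names, the Euclidean distance, and the example data are our own illustrative choices, not a reference implementation:

```python
import math

# Sketch of the first TWKM step: iterate the support assignment (with
# parameter eps1) and the centroid update of Eq. (5) until the prototypes
# stabilize. An object near two centroids supports both clusters.
def twkm_supports(points, centroids, eps1, max_iter=100):
    supports = [[] for _ in centroids]
    for _ in range(max_iter):
        supports = [[] for _ in centroids]
        for v in points:
            d = [math.dist(v, x) for x in centroids]
            i = min(range(len(centroids)), key=d.__getitem__)
            supports[i].append(v)
            for j in range(len(centroids)):
                if j != i and d[j] - d[i] <= eps1:
                    supports[j].append(v)     # v also supports cluster j
        new = [tuple(sum(x) / len(s) for x in zip(*s)) if s else centroids[i]
               for i, s in enumerate(supports)]
        if new == centroids:                   # prototypes stabilized
            break
        centroids = new
    return centroids, supports

cs, sup = twkm_supports([(0, 0), (0, 1), (5, 5), (5, 6)],
                        [(0.0, 0.0), (5.0, 5.0)], eps1=0.1)
```

With well-separated points and a small \(\varepsilon _1\), no object supports two clusters and the procedure reduces to ordinary k-means; larger \(\varepsilon _1\) produces overlapping supports.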

From the above procedure, we get a family of overlapping clusters, which are the unions of core regions and fringe regions. How to separate the core regions from the supports is another pivotal problem. We solve it by a centroid perturbation analysis that adds weighted copies of elements.

For a given support set \(\mathrm{support}(C_i)\, (i=1, \ldots , k)\), we classify the elements of \(\mathrm{support}(C_i)\) into two types.

$$\begin{aligned} \mathrm{Type\, I}= & {} \{v\in \mathrm{support}(C_i) \mid \exists j=1, \ldots , k, j\ne i, v \in \mathrm{support}(C_j) \},\\ \mathrm{Type\, II}= & {} \{v\in \mathrm{support}(C_i) \mid \forall j=1, \ldots , k, j\ne i, v \notin \mathrm{support}(C_j)\}. \end{aligned}$$

Suppose the centroid of \(\mathrm{support}(C_i)\) is \(x_i\) and \(m_i\) is the number of elements in \(\mathrm{support}(C_i)\). We adopt different strategies for the different types. Objects of Type I are assigned to the fringe region of \(C_i\), because they belong to at least two clusters. For an object v of Type II, we add \(m_i\) copies of v to \(\mathrm{support}(C_i)\) and denote the enlarged cluster by \(\mathrm{support}(C_i^*)\). We then calculate the new centroid \(x_i^*\) of \(\mathrm{support}(C_i^*)\) by Eq. (5) and the difference between \(x_i^*\) and \(x_i\). For a given parameter \(\varepsilon _2\), if \(|x_i^*-x_i|\le \varepsilon _2\), v is assigned to the core region of \(C_i\); otherwise, v is assigned to the fringe region of \(C_i\).
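The second step, separating the core region from a support by centroid perturbation, can be sketched as follows. We interpret \(|x_i^*-x_i|\) as the Euclidean distance between the two centroids; this interpretation, the function name, and the example data are our own assumptions:

```python
import math

# Sketch of the perturbation step: a Type I object (shared with another
# support) goes to the fringe; a Type II object v is duplicated m_i times,
# and if the centroid moves by at most eps2 it goes to the core.
def separate_core(support, other_supports, eps2):
    m = len(support)
    x = tuple(sum(c) / m for c in zip(*support))    # centroid of the support
    core, fringe = [], []
    for v in support:
        if any(v in s for s in other_supports):     # Type I: shared object
            fringe.append(v)
            continue
        # Type II: add m copies of v and recompute the centroid by Eq. (5)
        enlarged = support + [v] * m
        x_star = tuple(sum(c) / len(enlarged) for c in zip(*enlarged))
        if math.dist(x_star, x) <= eps2:
            core.append(v)
        else:
            fringe.append(v)
    return core, fringe

core, fringe = separate_core([(0, 0), (0, 1), (0, 2)], [[(5, 5)]], eps2=0.4)
```

Intuitively, duplicating an object far from the centroid drags the centroid noticeably, so such an object is treated as fringe, while an object near the centroid barely perturbs it and is kept in the core.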

Using the same strategy for each \(\mathrm{support}(C_i)\, (i=1, \ldots , k)\), we obtain the core region and fringe region of each cluster; a three-way cluster is thus naturally formed. The above clustering procedure is referred to as TWKM clustering. Algorithm 1 describes the process of TWKM clustering. In Algorithm 1, Lines 3 to 15 find the support of each cluster by iteratively updating the centroids of the supports, and Lines 16 to 29 separate the core regions from the support sets by centroid perturbation analysis. Figure 7 shows the flowchart of Algorithm 1. The time complexity of Algorithm 1 is \(O(tknm)+O(knm)\) and its space complexity is \(O((k+n)m)\), where t, n and m are the numbers of iterations, objects and attributes, respectively.

figure a
Fig. 7

Flowchart of Algorithm 1

4 Evaluation of algorithm

The evaluation of clustering, also referred to as cluster validity, is a crucial process for assessing the performance of a learning method in identifying relevant groups. A good measure of cluster quality helps in comparing several clustering methods and in analyzing whether one method is superior to another. The following quantitative indices are often used to evaluate the performance of clustering algorithms.

  1. 1.

    Davies–Bouldin index [3, 22] (DB hereafter).

    $$\begin{aligned} DB=\frac{1}{c}\sum _{i=1}^c \max _{j\ne i}\left\{ \frac{S(C_i)+S(C_j)}{d(x_i,x_j)}\right\} \end{aligned}$$
    (6)

    where \(S(C_i)\) and \(d(x_i,x_j)\) are the intra-cluster distance and the inter-cluster separation, respectively. \(S(C_i)\) is defined as follows:

    $$\begin{aligned} S(C_i)=\frac{\sum _{v\in C_i} \parallel v-x_i\parallel }{\mid C_i\mid }. \end{aligned}$$
    (7)

    As a function of the ratio of the within-cluster scatter to the between-cluster separation, a lower value means that the clustering is better.

  2. 2.

    Average Silhouette index [31] (AS hereafter).

    $$\begin{aligned} AS=\frac{1}{n}\sum _{i=1}^n S_i, \end{aligned}$$
    (8)

    where n is the total number of objects in the set and \(S_i\) is the silhouette of object \(v_i\), which is defined as

    $$\begin{aligned} S_i=\frac{b_i-a_i}{\max \{a_i, b_i\}}, \end{aligned}$$
    (9)

    \(a_i\) is the average distance between \(v_i\) and all other objects in its own cluster, and \(b_i\) is the minimum over the other clusters of the average distance between \(v_i\) and the objects in that cluster.

    The silhouette of each object shows which objects lie well within their cluster and which are merely somewhere in between clusters. The average silhouette provides an evaluation of clustering validity. The range of the average silhouette index is \([-1, 1]\); a larger value means a better clustering result.

  3. 3.

    Accuracy (ACC hereafter).

    $$\begin{aligned} ACC =\sum _{c=1}^k\frac{n_c^j}{n}. \end{aligned}$$
    (10)

    where \(n_c^j\) is the number of objects common to the cth cluster and its matched class j, after obtaining a one-to-one match between clusters and classes. The higher the value of ACC, the better the clustering result. The value equals 1 only when the clustering result is identical to the ground truth.
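As an illustration of the first index, the Davies–Bouldin computation of Eqs. (6) and (7) can be sketched as follows; the representation of clusters as point lists and the example data are our own assumptions:

```python
import math

# Sketch of the Davies-Bouldin index of Eq. (6), with the intra-cluster
# scatter S(C_i) of Eq. (7). Clusters are lists of points; centroids are
# their means.
def db_index(clusters, centroids):
    def S(c, x):                      # Eq. (7): mean distance to the centroid
        return sum(math.dist(v, x) for v in c) / len(c)
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (S(clusters[i], centroids[i]) + S(clusters[j], centroids[j]))
            / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k                  # lower: tighter, better-separated clusters

clusters = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
centroids = [(0.0, 0.5), (5.0, 5.5)]
db = db_index(clusters, centroids)
```

For these two tight, well-separated clusters the index is small (about 0.14), reflecting a low scatter-to-separation ratio.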

5 Experimental illustration

To test the performance of the proposed algorithm, six data sets from the UCI Machine Learning repository [4] and three data sets from the USPS ZIP code handwritten digits database [11] are employed in this section. The details of these data sets are shown in Table 1.

Table 1 A description of data sets used in the experiments

In order to assess the quality of three-way k-means clustering, we use all the core regions to form a clustering result and compute the DB index, AS index and ACC by using the core region to represent the corresponding cluster. The total number of objects excludes the objects in the fringe regions when calculating ACC. A better three-way clustering result should have a lower DB value, a higher AS value and a higher ACC value. The performance of three-way k-means clustering is reported on ten data sets. For comparison, the performances of k-means [21], k-medoids [9], FKM [2] and RKM [18] are also presented on each data set, where \(threshold=0.02\), \(w_{up}=0.3\) and \(w_{low}=0.7\) in RKM. The experiments are repeated 100 times for each data set, and the parameters of three-way k-means clustering are \(\varepsilon _1=0.02\) and \(\varepsilon _2=0.0023n\), where n is the total number of objects in each set. The average values and the best values of the three indices over the 100 runs are used to compare the overall performances. The results are presented in Tables 2, 3, 4, 5, 6 and 7, in which the optimal results among the different algorithms are marked in bold. The CPU computing time of the 100 runs on each data set, measured in seconds, is recorded in Table 8 as well.

Table 2 Average performances of DB value
Table 3 Best performances of DB value
Table 4 Average performances of AS value
Table 5 Best performances of AS value
Table 6 Average performances of ACC value
Table 7 Best performances of ACC value
Table 8 The total run time of 100 runs

Tables 2, 3, 4 and 5 show the average and best performances in terms of DB value and AS value for k-means, k-medoids, FKM, RKM and TWKM, respectively. From these tables, we find that TWKM outperforms the other algorithms in terms of both DB value and AS value, for both the best and the average performances, on most of the data sets. The improvement can be attributed to the fact that each cluster is represented by its core region, which helps to increase the degree of separation between clusters and to decrease the degree of scatter within clusters, since the fringe regions have been successfully marked out. Although the DB and AS values obtained by FKM on the WDBC, GLASS and BANK sets are similar or superior to the results of TWKM, the computing time of TWKM is far less than that of FKM.

Tables 6 and 7 list the average and best performances of the ACC value, respectively. It is not difficult to observe that both the average and the best ACC values obtained by TWKM are superior to those obtained by the other methods on WINE, WDBC, OCCUPANCY, MAGIC, USPS-08 and USPS-49. However, other methods exceed TWKM on GLASS, BANK and USPS-49. This is because ACC is computed by using the core region to represent the corresponding cluster, and the total number of objects excludes the objects in the fringe regions, which means that \(n_c^j\) and n both become smaller in Eq. (10).

6 Concluding remarks

In most existing studies, a cluster is represented by a single set, and the set naturally divides the space into two regions: an object belongs to the cluster if it is in the set, otherwise it does not. It is obvious that there may be three types of relationships between an object and a cluster, namely, belong-to fully, belong-to partially (i.e., also not belong-to partially), and not belong-to fully. Based on these relationships, we developed a three-way k-means clustering method in this paper by integrating k-means and three-way decision. In the proposed method, an overlap clustering is used to obtain the supports of the clusters, and perturbation analysis is applied to separate the core regions from the supports. The differences between the supports and the core regions are regarded as the fringe regions of the specific clusters. Therefore, a three-way explanation of each cluster is naturally formed. Experimental results demonstrate that the new algorithm can significantly improve the structure of clustering results compared with the traditional clustering algorithms. The present study is a first step in the research of three-way k-means. The following are challenges for further research.

  1. 1.

    The parameters \(\varepsilon _1\) and \(\varepsilon _2\) have a significant impact on the clustering results. Research on dynamic changes of parameters will be an interesting topic to be addressed.

  2. 2.

    In the proposed three-way k-means algorithm, the number of clusters is given in advance. However, how to determine the number of clusters is another interesting topic to be addressed.

  3. 3.

    Multigranulation is a developing approach which can be used for constructing approximations of a target concept. From the granular computing point of view, our proposed algorithm is based on a single granulation. How to develop three-way k-means based on multigranulation needs to be further discussed.