1 Introduction

In this paper, we explore multi-dimensional/multi-featured datasets with the help of clustering. Some of the well-known clustering procedures are subspace clustering (see [1, 2]), density-based clustering [3], fuzzy clustering [4], and hierarchical clustering [5]. Fuzzy C-means (FCM) [6] is a fuzzy partitional clustering technique whose behavior has been widely studied over many types of real-world multi-dimensional datasets. The variants of FCM obtained under other fuzzy-set frameworks include intuitionistic fuzzy C-means (IFCM) [7], interval type-2 fuzzy C-means [8, 9], rough fuzzy C-means [10, 11], and hesitant fuzzy k-means [12]. The FCM algorithm uses the Euclidean distance measure. Researchers have exploited measures such as the Mahalanobis distance in place of the Euclidean distance to obtain clusters that are not spherical (see [13]). Such distance-based variants of FCM are also found to be less sensitive to noise and outliers (see [14]). Weighted fuzzy C-means clustering algorithms cluster multi-dimensional/multi-featured datasets satisfactorily (see [15,16,17,18]). A weighted FCM assigns an appropriate role to each feature of a multi-featured dataset during clustering. It involves a feature weight learning technique that decides the role of each feature (see [19,20,21,22,23]). A feature weight learning technique based on the gradient descent procedure is proposed in [20]. For efficient clustering of multi-dimensional datasets, an entropy-based weight learning technique is proposed in [21]. Some of the recent studies on feature learning are [24, 25, 17, 26,27,28,29], and [30].

The Atanassov intuitionistic fuzzy set (AIFS) proposed by [31] is an effective tool to model the ambiguity and uncertainty present in real-world phenomena. The ambiguity and uncertainty present in multi-dimensional datasets motivate us to cluster them under the AIFS environment using the hesitancy component of the AIFS. AIFS-based clustering algorithms efficiently solve problems in various fields discussed by [32,33,34,35,36,37,38]. [34] proposes a clustering technique for image representations, whereas medical image segmentation is addressed by an entropy-based technique (see [32]). [33] extends the clustering technique of [32] to cope with noise/outliers by selecting neighborhood pixels from different regions of the magnetic resonance image for its tuning. The natural extension of FCM under the AIFS environment is the IFCM algorithm [7], which is further studied by [38]. It is improved for noisy medical image segmentation by using rough set theory (see [39]).

During the clustering of a multi-dimensional/featured dataset, FCM and IFCM assign equal importance to each dimension/feature. Moreover, the derivation of FCM/IFCM and their variants involves calculus-based optimization. Many of these variants use a weighted Euclidean distance measure. Here, we categorize them on the basis of an equally likely approach as follows: (1) In the IFCM variants (see [32,33,34, 40, 36, 37]), an equally likely approach is used, that is, all the components of the Euclidean distance are assigned the same weight. (2) In some variants (see [41, 39]), unequal weights are allocated to the components of the Euclidean distance, in contrast to the equally likely approach. The variants of FCM employing an equally likely approach measure their distance along a circular path, whereas the IFCM counterparts compute distances along spherical paths. A single weighing exponent is sufficient to control those distance measures that track a circular or spherical/hyper-spherical path. If variants of FCM/IFCM do not employ an equally likely approach, then elliptical/ellipsoidal path-based distances are measured. Monitoring an ellipsoidal path normally requires three independent weighing exponents; here, the same path is monitored using a weighing exponent containing two variables. We show that the improvement of a clustering algorithm depends on the computation of an accurate feature weight distribution. The clustering algorithms discussed in the paper are either uni-parametric, bi-parametric, tri-parametric, or quad-parametric (see Table 2). The major contributions of the paper are as follows:

  • An equally likely approach often fails when dealing with a multi-dimensional/featured dataset because all features of the dataset are not equally relevant. To address this problem, we propose two clustering techniques: (1) a singly weighted data-driven algorithm called uni-weighted intuitionistic fuzzy C-means (uW-IFCM), and (2) a bi-weighted probabilistic intuitionistic fuzzy C-means algorithm (bW-PIFCM) based on two-variable weight triplets (see Sects. 2.2 and 2.3).

  • The initialization of the uW-IFCM and bW-PIFCM algorithms requires a transformation of the real-valued multi-dimensional/featured dataset into an AIFS dataset. For this, we utilize a novel data fuzzification technique based on two parameters \((\alpha ~\text{ and }~\beta )\) (see Sect. 3).

  • In the uW-IFCM algorithm, the weighing exponent \(\eta\) and the fuzzy factor m are used as parameters. We further tune the feature weights using the exponent \(\eta\), which gives a better feature weight distribution for uW-IFCM in comparison to bW-PIFCM. A study of the optimal choices of \(\eta\) and m at which uW-IFCM delivers its best clustering is carried out in Sect. 4. Here, we perform both an experimental analysis and a convergence analysis of the uW-IFCM and bW-PIFCM algorithms.

  • The uW-IFCM detects irrelevant features in a multi-dimensional/featured dataset during its clustering. We show that uW-IFCM acts as a mechanized feature reduction technique (see Sect. 5.1), and we also propose an automatized feature reduction technique, the FRT-equipped uW-IFCM algorithm, for nullifying irrelevant features (see Sect. 5.2).

The remainder of the paper is organized into five sections. In Sect. 2, we provide a mathematical solution for a fuzzy clustering problem. Section 3 discusses the procedural details of the proposed uW-IFCM and bW-PIFCM algorithms. In Sect. 4, an experimental analysis over synthetic and some UCI machine learning datasets is carried out. We provide two uW-IFCM-based feature reduction techniques in Sect. 5. Finally, the conclusion is stated in Sect. 6.

Table 1 Mathematical symbols
Table 2 Description of C-means algorithms

2 Some Important Mathematical Results for C-Means Clustering Algorithms

We have divided this section into three subsections.

2.1 Description of Clustering Problem I

Fig. 1
figure 1

Plotting of a three-cluster 5D Synthetic Dataset I with three normally distributed features \(d_1, d_2\) and \(d_3\) and two noisy features \(d_4\) and \(d_5\)

Intuitionistic fuzzy C-means (IFCM) is a well-known clustering algorithm that uses Euclidean distance measure to cluster the datasets.

Clustering problem I: Let \(\{\hat{x}_1, \hat{x}_2, \dots ,\) \(\hat{x}_P \}\) be a set of multi-valued AIFS data items. Here, the dimension of each data item is D, that is, \(\hat{x}_i = (\tilde{x}_{id})_{d=1}^{D}\), and \(\tilde{x}_{id} = (\mu _{id},\nu _{id},\pi _{id})\). We have to group \(\{\hat{x}_1, \hat{x}_2, \dots ,\) \(\hat{x}_P \}\) into C clusters. Let \(V = \{\hat{v}_1, \hat{v}_2, \dots , \hat{v}_C: \hat{v}_l = (\bar{\mu }_{ld},\bar{\nu }_{ld},\bar{\pi }_{ld})_{d=1}^{D}, 1 \le l \le C\}\) be a set of initial cluster centroids.

IFCM-based solution procedure: The criterion function \(J_m\) is defined as

$$\begin{aligned} J_m(U,V) = \sum _{l=1}^{C}\sum _{i=1}^{P}u_{li}^m\,D_1^2(\hat{x}_i,\hat{v}_l), \quad \text{ where } \end{aligned}$$
(1)
$$\begin{aligned} D_1^2(\hat{x}_i,\hat{v}_l) = \frac{1}{2P}\sum _{d=1}^{D}\bigl \{(\mu _{id}-\bar{\mu }_{ld})^2 + (\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2\bigr \} \end{aligned}$$
(2)

such that

$$\begin{aligned} \sum \limits _{l=1}^{C}u_{il} = 1, \qquad 1 \le i \le P \end{aligned}$$
(3a)
$$\begin{aligned} u_{il} \in [0,1], \qquad 1 \le i \le P, ~1 \le l \le C \end{aligned}$$
(3b)
$$\begin{aligned} \sum \limits _{i=1}^{P}u_{il} > 0, \qquad 1 \le l \le C. \end{aligned}$$
(3c)

The membership value \(u_{il}\) of the ith data item in the lth cluster and the centroid \((\bar{\mu }_{ld},\bar{\nu }_{ld},\bar{\pi }_{ld})_{d=1}^{D}\) are updated using Eqs. (4) and (5).

$$\begin{aligned} u_{il} = \frac{1}{\sum \nolimits _{j=1}^{C}\Biggl (\dfrac{D_1(\hat{x}_i,\hat{v}_l)}{D_1(\hat{x}_i,\hat{v}_j)}\Biggr )^\frac{2}{m-1}} \end{aligned}$$
(4)
$$\begin{aligned} \bar{\mu }_{ld} = \frac{\sum \nolimits _{i=1}^{P} u^{m}_{il}\,\mu _{id}}{\sum \nolimits _{i=1}^{P} u^{m}_{il}}, \quad \bar{\nu }_{ld} = \frac{\sum \nolimits _{i=1}^{P} u^{m}_{il}\,\nu _{id}}{\sum \nolimits _{i=1}^{P} u^{m}_{il}}, \quad \bar{\pi }_{ld} = \frac{\sum \nolimits _{i=1}^{P} u^{m}_{il}\,\pi _{id}}{\sum \nolimits _{i=1}^{P} u^{m}_{il}}. \end{aligned}$$
(5)

In the IFCM algorithm, all the features of a multi-dimensional/featured dataset are equally emphasized during clustering. The IFCM transforms into weighted-IFCM if \(D_1^2(\hat{x}_i,\hat{v}_l)\) is replaced with \(D_w^2(\hat{x}_i,\hat{v}_l)=\sum _{d=1}^{D}w_{d}\bigl ((\mu _{id}-\bar{\mu }_{ld})^2 + (\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2\bigr )\). In weighted-IFCM, the weights \(w_{d},1\le d\le D\), are selected randomly such that \(\sum _{d=1}^{D}w_{d}=1\).
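To make the update rules above concrete, the following is a minimal NumPy sketch of one IFCM iteration on AIFS-coded data, using the distance of Eq. (2), the membership update of Eq. (4), and the centroid update of Eq. (5). The array layout (one \((\mu ,\nu ,\pi )\) triple per feature) and the function name are illustrative choices of ours, not the authors' MATLAB implementation.

```python
import numpy as np

def ifcm_step(X, V, m=2.0):
    """One IFCM iteration on AIFS-coded data.

    X : (P, D, 3) array of data items (mu, nu, pi per feature).
    V : (C, D, 3) array of cluster centroids.
    Returns the updated membership matrix U (P, C) and centroids V (C, D, 3).
    """
    P, D, _ = X.shape
    # Squared distance D_1^2 between every item and every centroid, Eq. (2).
    diff = X[:, None, :, :] - V[None, :, :, :]          # (P, C, D, 3)
    d2 = (diff ** 2).sum(axis=(2, 3)) / (2 * P)         # (P, C)
    d2 = np.fmax(d2, 1e-12)                             # guard against zero distance
    # Membership update, Eq. (4): u_il = 1 / sum_j (d_il / d_ij)^(2/(m-1)).
    ratio = (d2[:, :, None] / d2[:, None, :]) ** (1.0 / (m - 1))
    U = 1.0 / ratio.sum(axis=2)                         # (P, C)
    # Centroid update, Eq. (5): membership-weighted mean of (mu, nu, pi).
    um = U ** m                                         # (P, C)
    V_new = np.einsum('pc,pdk->cdk', um, X) / um.sum(axis=0)[:, None, None]
    return U, V_new
```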

2.2 A Proposal of Novel Probabilistic Intuitionistic Fuzzy C-Means Clustering Algorithm

Let us solve Clustering problem I probabilistically. We propose the bi-weighted probabilistic intuitionistic fuzzy C-means algorithm (bW-PIFCM), a variant of PIFCM that uses two-variable weight triplets. The proposed algorithm computes data-driven feature weights that converge to an optimal feature weight distribution, which means that the algorithm is adaptive in nature. The weights portray the importance of the features of a multi-dimensional/featured dataset. Here, a membership matrix \(M = {(u_{il})_{P \times C}}\) of order \({P \times C}\) is obtained. The mathematical results corresponding to bW-PIFCM are given in the form of four theorems.

Theorem 2.1

Let S be a collection of AIFSs. Let \(I_{1}=[p',p'']\) and \(I_{2}=[q',q'']\) be the weight intervals assigned to the membership and non-membership components of an AIFS belonging to S. Then the weight interval I assigned to the hesitancy component of the AIFS is

$$\begin{aligned} I = [\,1-\max (p',q',p'',q'')+\max (p',q',p'',q'')\,\pi ,\; 1-\min (p',q',p'',q'')+\min (p',q',p'',q'')\,\pi \,] \end{aligned}.$$
(6)

Proof

Let \(A=(\mu ,\nu ,\pi )\) be an AIFS belonging to S. As,

$$\begin{aligned} \min (p',q')(\mu +\nu )\le p'\mu +q'\nu \le \max (p',q')(\mu +\nu ) \end{aligned}$$
(7)
$$\begin{aligned} \min (p'',q'')(\mu +\nu )\le p''\mu +q''\nu \le \max (p'',q'')(\mu +\nu ). \end{aligned}$$
(8)

From (7) and (8), we have,

$$\begin{aligned} \min (p',q',p'',q'')(\mu +\nu )\le p'\mu +q'\nu \le \max (p',q',p'',q'')(\mu +\nu ) \end{aligned}$$
(9)
$$\begin{aligned} \min (p',q',p'',q'')(\mu +\nu )\le p''\mu +q''\nu \le \max (p',q',p'',q'')(\mu +\nu ). \end{aligned}$$
(10)

Since \(\mu +\nu = 1-\pi\), inequalities (9) and (10) state that the weight mass applied to the membership and non-membership components lies between \(\min (p',q',p'',q'')(1-\pi )\) and \(\max (p',q',p'',q'')(1-\pi )\). Assigning the complementary weight to the hesitancy component, we get

$$\begin{aligned} I = [\,1-\max (p',q',p'',q'')+\max (p',q',p'',q'')\,\pi ,\; 1-\min (p',q',p'',q'')+\min (p',q',p'',q'')\,\pi \,]. \end{aligned}$$

\(\square\)

Theorem 2.2

Let an AIFS \(A_1\) belonging to S be assigned the weight intervals \(I_{1}=[p',p'']\), \(I_{2}=[q',q'']\), and I for its membership, non-membership, and hesitancy components, respectively. Here, \(I_{1}\) and \(I_{2}\) are independent of \(A_1\), but the interval I depends on \(A_1\). Let \(d:S\times S\rightarrow \mathbb R\) be a mapping such that

$$\begin{aligned} d(A_1,A_2) = \Bigl [{\frac{1}{2n}}{\sum \limits _{i=1}^{n}}\bigl \{ p_{12} (\mu _{A_1}(x_i) - \mu _{A_2}(x_i))^2 + q_{12} (\nu _{A_1}(x_i) - \nu _{A_2}(x_i))^2 + r_{12} (\pi _{A_1}(x_i) - \pi _{A_2}(x_i))^2\bigr \} \Bigr ]^{1/2} \end{aligned}.$$
(11)

The coefficients \(p_{12}, q_{12}\), and \(r_{12}\) are the means (midpoints) of the intervals \(I_{1}, I_{2}\), and \(I_{3}=I\cap I^{'}\), respectively, where \(I^{'}\) is the weight interval associated with the hesitancy component of \(A_2\). Then the mapping d is a distance measure.

Proof

We show that d satisfies all the properties of distance measure:

1.  It is trivial to see \(0\le d(A_1,A_2)\le 1\), where \(A_1\) and \(A_2\) are any two AIFS of S.

2.     Let \(d(A_1,A_2)=0\)

$$\begin{aligned} \Leftrightarrow \Bigl [{\frac{1}{2n}}{\sum \limits _{i=1}^{n}}\bigl \{ p_{12} (\mu _{A_1}(x_i) - \mu _{A_2}(x_i))^2 + q_{12} (\nu _{A_1}(x_i) - \nu _{A_2}(x_i))^2 + r_{12} (\pi _{A_1}(x_i) - \pi _{A_2}(x_i))^2\bigr \} \Bigr ]^{1/2} = 0. \end{aligned}$$
(12)

Equation (12) implies

$$\begin{aligned} \mu _{A_1}(x_i)=\mu _{A_2}(x_i),~~\nu _{A_1}(x_i)=\nu _{A_2}(x_i),~~\pi _{A_1}(x_i)=\pi _{A_2}(x_i)~~\text{ for } \text{ all }~i, \text{ that } \text{ is, } A_1=A_2. \end{aligned}$$

3.   It can be easily shown that \(d(A_1,A_2)=d(A_2,A_1).\)

4.  To establish this condition, it suffices to prove the triangle inequality (see [42, 43] and [41]). To do so, let \(A_1, A_2, A_3\) be three AIFSs. Now,

$$\begin{aligned} d(A_1,A_3) = \Bigl [{\frac{1}{2n}}{\sum \limits _{i=1}^{n}}\bigl \{ p_{12} (\mu _{A_1}(x_i) - \mu _{A_3}(x_i))^2 + q_{12} (\nu _{A_1}(x_i) - \nu _{A_3}(x_i))^2 + r_{12} (\pi _{A_1}(x_i) - \pi _{A_3}(x_i))^2\bigr \} \Bigr ]^{1/2}. \end{aligned}$$
(13)

Using Minkowski’s inequality, we get

$$\begin{aligned} d(A_1,A_3)&\le \Bigl [{\frac{1}{2n}}{\sum \limits _{i=1}^{n}}\bigl \{ p_{12} (\mu _{A_1}(x_i) - \mu _{A_2}(x_i))^2 + q_{12} (\nu _{A_1}(x_i) - \nu _{A_2}(x_i))^2 + r_{12} (\pi _{A_1}(x_i) - \pi _{A_2}(x_i))^2\bigr \} \Bigr ]^{1/2} \\&\quad +\Bigl [{\frac{1}{2n}}{\sum \limits _{i=1}^{n}}\bigl \{ p_{12} (\mu _{A_2}(x_i) - \mu _{A_3}(x_i))^2 + q_{12} (\nu _{A_2}(x_i) - \nu _{A_3}(x_i))^2 + r_{12} (\pi _{A_2}(x_i) - \pi _{A_3}(x_i))^2\bigr \} \Bigr ]^{1/2} \end{aligned}$$
(14)

Hence, \(d(A_1,A_3)\le d(A_1,A_2)+d(A_2,A_3).\)

Thus, \(d(A_1,A_2)\) is a distance measure. \(\square\)
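As an illustration of Theorem 2.2, the sketch below evaluates the weighted distance of Eq. (11) and the hesitancy weight interval of Eq. (6). It is a minimal NumPy sketch under the assumption that the coefficients are supplied as interval midpoints; the function names are ours, not the authors'.

```python
import numpy as np

def hesitancy_interval(I1, I2, pi):
    """Weight interval I of Eq. (6) for the hesitancy component of an AIFS."""
    vals = [*I1, *I2]
    hi, lo = max(vals), min(vals)
    return (1 - hi + hi * pi, 1 - lo + lo * pi)

def aifs_distance(A1, A2, p12, q12, r12):
    """Weighted distance d(A1, A2) of Eq. (11).

    A1, A2 : (n, 3) arrays whose columns hold the membership, non-membership,
             and hesitancy values of the two AIFSs over x_1, ..., x_n.
    p12, q12, r12 : coefficients taken as the midpoints of the weight
             intervals I_1, I_2, and I_3 = I ∩ I' (Theorem 2.2).
    """
    n = A1.shape[0]
    w = np.array([p12, q12, r12])
    sq = (A1 - A2) ** 2                 # squared component-wise differences
    return np.sqrt((sq * w).sum() / (2 * n))
```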

Theorem 2.3

Let \(\theta _1:\mathbb {M}_{si}\times \mathbb {V}_{sd}\rightarrow \mathbb {R}\) be a mapping such that \(\theta _1(U,V) = J_{m}({U},{V})\), where the cluster matrix \(V\in \mathbb {V}_{sd}\) and the membership matrix \(U\in \mathbb {M}_{si}\). Then \(U^*\) is a strict local minimum if \(U^*\) is calculated from Eq. (23). The coefficients \(p_{12}, q_{12}, r_{12}\) are known (see Theorem 2.2).

Proof

The criterion function and constraint are given as

$$\begin{aligned} J_m(U,V)=\sum _{s=1}^{C} \sum _{i=1}^{P}u_{si}^m \sum _{d=1}^{D} \bigg [p_{12}(\mu _{id}-\bar{\mu }_{sd})^2+ q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2 \bigg ] \end{aligned}$$
(15)

such that

$$\begin{aligned} \sum _{s=1}^{C}u_{si}=1,~~~~~~~~~~\forall ~1 \le i \le P \end{aligned}.$$
(16)

The Lagrangian G(U, V) is constructed based on the criterion function and constraint as follows:

$$\begin{aligned} G(U,V) = \sum _{s=1}^{C}\sum _{i=1}^{P}u_{si}^m \sum _{d=1}^{D}\bigg [p_{12}(\mu _{id}-\bar{\mu }_{sd})^2 +q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigg ] -\sum _{i=1}^{P} \lambda _i\bigg [\sum _{s=1}^{C}u_{si}- 1\bigg ]. \end{aligned}$$
(17)

In Eq. (17), we have used the Lagrange multipliers \(\lambda _i,~1 \le i \le P\). Here, the fuzzy factor \(m \in (0,1) \cup (1,\infty )\) operates over the elements of the membership matrix U. The element \(u_{si} \in [0,1]\) denotes the membership grade of the ith element of the universe of discourse within the sth cluster. Since this is a minimization problem, we differentiate Eq. (17) with respect to the Lagrange multipliers \(\lambda _i\) and set the derivatives equal to zero as follows:

$$\begin{aligned} \frac{\partial G(U,V)}{\partial \lambda _i} = -\bigg [\sum _{s=1}^{C}u_{si}-1\bigg ]=0,~~\forall ~1\le i\le P \end{aligned}.$$
(18)

Similarly, the derivatives of the Lagrangian (17) are set equal to zero with respect to the membership parameter \(u_{si}\), where \(1\le i\le P\), \(1 \le s \le C\):

$$\begin{aligned} \frac{\partial G(U,{V})}{\partial u_{si}} = mu_{si}^{m-1}\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2+q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr )-\lambda _i=0. \end{aligned}$$
(19)

Solving Eq. (19), we get

$$\begin{aligned} u_{si}=\bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\bigg (\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2 + q_{12}(\nu _{id}-\bar{\nu }_{sd})^2+ r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m} \end{aligned}.$$
(20)

To compute the iterative formula for \(u_{si}\), we combine Eqs. (18) and (20) to yield

$$\begin{aligned} \sum _{s=1}^{C}\bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\bigg (\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2 + q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m}=1 \end{aligned}$$
(21)
$$\begin{aligned} \bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\sum _{s=1}^{C}\bigg (\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2 +q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 +r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m}=1. \end{aligned}$$
(22)

Therefore, with the division of Eq. (20) by Eq. (22), the iterative formula for membership value \(u_{si}\) is obtained as follows:

$$\begin{aligned} u_{si}= \frac{\bigg (\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2 +q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m}}{\sum _{j=1}^{C}\bigg (\sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{jd})^2 + q_{12}(\nu _{id}-\bar{\nu }_{jd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{jd})^2\bigr )\bigg )^\frac{1}{1-m}} \end{aligned}$$
(23)

\(\square\)
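A compact NumPy sketch of the membership update of Eq. (23) is given below. The coefficients \(p_{12}, q_{12}, r_{12}\) are assumed to be supplied (e.g., as interval midpoints from Theorem 2.2); the array layout and the function name are our own illustration rather than the authors' implementation.

```python
import numpy as np

def bw_pifcm_membership(X, V, p12, q12, r12, m=2.0):
    """Membership update of Eq. (23).

    X : (P, D, 3) AIFS-coded data items (mu, nu, pi per feature).
    V : (C, D, 3) cluster centroids.
    Returns the membership matrix U of shape (P, C); each row sums to one.
    """
    w = np.array([p12, q12, r12])                        # component weights
    diff2 = (X[:, None, :, :] - V[None, :, :, :]) ** 2   # (P, C, D, 3)
    d2 = (diff2 * w).sum(axis=(2, 3))                    # weighted squared distances
    d2 = np.fmax(d2, 1e-12)                              # guard against zero distance
    t = d2 ** (1.0 / (1.0 - m))
    return t / t.sum(axis=1, keepdims=True)
```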

Theorem 2.4

For a fixed membership matrix U, the minimum of the criterion function with respect to V is attained at the point \(V=V^*\) given by Eqs. (29)–(31).

Proof

We define the Lagrangian G(U, V) as

$$\begin{aligned} G(U,V) = \sum _{s=1}^{C}\sum _{i=1}^{P}u_{si}^m \sum _{d=1}^{D}\bigl (p_{12}(\mu _{id}-\bar{\mu }_{sd})^2+q_{12}(\nu _{id}-\bar{\nu }_{sd})^2 + r_{12}(\pi _{id}-\bar{\pi }_{sd})^2\bigr ) -\sum _{i=1}^{P}\lambda _i\bigg [\sum _{s=1}^{C}u_{si}-1\bigg ]. \end{aligned}$$
(24)

To find the cluster center matrix V, we set the derivatives of the Lagrangian (see Eq. (24)) with respect to the cluster center components \(\bar{\mu }_{sd},\bar{\nu }_{sd},\bar{\pi }_{sd}\), where \(1 \le s \le C, 1 \le d \le D\), equal to zero. It yields

$$\begin{aligned} \frac{\partial G(U,V)}{\partial \bar{\mu }_{sd}} = 0 \end{aligned}$$
(25)
$$\begin{aligned} \frac{\partial G(U,V)}{\partial \bar{\nu }_{sd}} = 0 \end{aligned}$$
(26)
$$\begin{aligned} \frac{\partial G(U,V)}{\partial \bar{\pi }_{sd}} = 0. \end{aligned}$$
(27)

Solving Eq. (25), we get

$$\begin{aligned} \frac{\partial G(U,V)}{\partial \bar{\mu }_{sd}} = \sum _{i=1}^{P}\bigg (u_{si}^m p_{12} [-2(\mu _{id}-\bar{\mu }_{sd})]\bigg ) = 0 \end{aligned}$$
(28)
$$\begin{aligned} \bar{\mu }_{sd} = \frac{\sum _{i=1}^{P}u_{si}^m\mu _{id}}{\sum _{i=1}^{P}u_{si}^m}. \end{aligned}$$
(29)

Similarly, we obtain the following solutions from Eqs. (26) and (27), respectively,

$$\begin{aligned} \bar{\nu }_{sd} = \frac{\sum _{i=1}^{P}u_{si}^m\nu _{id}}{\sum _{i=1}^{P}u_{si}^m} \end{aligned}$$
(30)
$$\begin{aligned} \bar{\pi }_{sd} = \frac{\sum _{i=1}^{P}u_{si}^m\pi _{id}}{\sum _{i=1}^{P}u_{si}^m}. \end{aligned}$$
(31)

Here, Eqs. (29), (30), and (31) determine the point \(V=V^*\). \(\square\)

2.3 A Proposal of Novel Intuitionistic Fuzzy C-Means Clustering Algorithm

This section establishes some mathematical results to propose the uni-weighted intuitionistic fuzzy C-means clustering (uW-IFCM) algorithm. A distance measure \({D}_2^2(\hat{x_i},\hat{v_l})\) that depends on a feature weight matrix \(W = (w_{d})_{1 \times D}\) is used here. We have

$$\begin{aligned} {D}_2^2(\hat{x_i},\hat{v_l}) =\sum _{d=1}^{D}w^{\eta }_{d}\left( (\mu _{id}-\bar{\mu }_{ld})^2+ (\nu _{id}-\bar{\nu }_{ld})^2 +(\pi _{id}-\bar{\pi }_{ld})^2 \right) \end{aligned}.$$
(32)

The set of C cluster centroids is \(V = \{{v}_1, {v}_2, \dots , {v}_C; {v}_l = (\bar{\mu }_{ld},\bar{\nu }_{ld},\bar{\pi }_{ld})_{d=1}^{D}, 1 \le l \le C\}\). The following theorems derive the membership matrix U, the cluster centroids \((\bar{\mu }_{ld},\bar{\nu }_{ld},\bar{\pi }_{ld})\), and the weight matrix W.

Theorem 2.5

Let \(\theta _1:\mathbb {M}_{li}\rightarrow \mathbb {R}\) be a mapping such that \(\theta _1(U) = J({U},{\textbf {V}},{\textbf {W}})\), where the cluster matrix \({\textbf {V}}\in \mathbb {V}_{ld}\) and the weight matrix \({\textbf {W}}\in \mathbb {W}_{ld}\) are fixed. Then \(U^*\) is a strict local minimum if \(U^*\) is calculated from Eq. (41).

Proof

As \({{\textbf {V}}}\) and \({{\textbf {W}}}\) are kept fixed, they are marked in bold. Let us take the cluster center matrix as \({\textbf {V}}=[{\textbf {v}}_1,{\textbf {v}}_2,...,{\textbf {v}}_C]^T\), where \({\textbf {v}}_{l}=(\bar{\mu }_{ld},\bar{\nu }_{ld},\bar{\pi }_{ld})_{d=1}^{D}\). The weight matrix \({\textbf {W}}\) is of order \({1\times D}\) with \(\sum _{d=1}^{D}w_{d}=1\). Here, \(m,\eta \in (0,1) \cup (1,\infty )\). Now, we define the criterion function:

$$\begin{aligned} J_m(U,{{\textbf {V}}},{{\textbf {W}}})=\sum _{l=1}^{C} \sum _{i=1}^{P}u_{li}^m \sum _{d=1}^{D}w_{d}^{\eta } \bigg [(\mu _{id}-\bar{\mu }_{ld})^2 + (\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2 \bigg ] \end{aligned}$$
(33)

The following constraint is used for defining the Lagrangian:

$$\begin{aligned} \sum _{l=1}^{C}u_{li}=1,~~~~~~~~~~\forall ~1 \le i \le P \end{aligned}.$$
(34)

Equations (33) and (34) define the Lagrangian:

$$\begin{aligned} G(U,{\textbf {V}},{\textbf {W}})=\sum _{l=1}^{C}\sum _{i=1}^{P}u_{li}^m \sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{ld})^2+(\nu _{id}-\bar{\nu }_{ld})^2+ (\pi _{id}-\bar{\pi }_{ld})^2\bigr ) -\sum _{i=1}^{P} \lambda _i\bigg [\sum _{l=1}^{C}u_{li}- 1\bigg ]. \end{aligned}$$
(35)

The derivatives of the Lagrangian (35) are set equal to zero with respect to the Lagrange multipliers \(\lambda _i\):

$$\begin{aligned} \frac{\partial G(U,{\textbf {V}},{\textbf {W}})}{\partial \lambda _i} = -\bigg [\sum _{l=1}^{C}u_{li}-1\bigg ]=0,~~\forall ~1\le i\le P \end{aligned}.$$
(36)

Similarly, the derivatives of the Lagrangian (35) are set equal to zero with respect to the membership parameter \(u_{si}\), where \(1\le i\le P\), \(1 \le s \le C\). So,

$$\begin{aligned} \frac{\partial G(U,{\textbf {V}},{\textbf {W}})}{\partial u_{si}} = mu_{si}^{m-1}\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{sd})^2+(\nu _{id}-\bar{\nu }_{sd})^2 + (\pi _{id}-\bar{\pi }_{sd})^2\bigr )-\lambda _i=0 \end{aligned}.$$
(37)

Solving Eq. (37), we get

$$\begin{aligned} u_{si}=\bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\bigg (\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{sd})^2 + (\nu _{id}-\bar{\nu }_{sd})^2+ (\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m} \end{aligned}.$$
(38)

Equations (36) and (38) together yield:

$$\begin{aligned} \sum _{l=1}^{C}\bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\bigg (\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{ld})^2 + (\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2\bigr )\bigg )^\frac{1}{1-m}=1 \end{aligned}$$
(39)
$$\begin{aligned} \bigg (\frac{\lambda _i}{m}\bigg )^\frac{1}{m-1}\sum _{l=1}^{C}\bigg (\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{ld})^2 +(\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2\bigr )\bigg )^\frac{1}{1-m}=1. \end{aligned}$$
(40)

Dividing Eq. (38) by Eq. (40) results in an iterative formula for membership value \(u_{si}\) as follows:

$$\begin{aligned} u_{si}= \frac{\bigg (\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{sd})^2 + (\nu _{id}-\bar{\nu }_{sd})^2 + (\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg )^\frac{1}{1-m}}{\sum _{l=1}^{C}\bigg (\sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{ld})^2 +(\nu _{id}-\bar{\nu }_{ld})^2 + (\pi _{id}-\bar{\pi }_{ld})^2\bigr )\bigg )^\frac{1}{1-m}} \end{aligned}$$
(41)

\(\square\)

Fig. 2
figure 2

Distribution of feature weights using uW-IFCM clustering algorithm over Synthetic Dataset I

Theorem 2.6

Let \(\theta _2:\mathbb {W}_{D}\rightarrow \mathbb {R}\) be a mapping such that \(\theta _2(W) = J({\textbf {U}},{\textbf {V}},W)\), where the membership matrix \({\textbf {U}}\in \mathbb {U}_{li}\) and the cluster matrix \({\textbf {V}}\in \mathbb {V}_{ld}\) remain constant. Then \(W^*\) is a strict local minimum if \(W^*\) is calculated from Eq. (50).

Proof

The membership matrix \({\textbf {U}}\) and the cluster center matrix \({\textbf {V}}\) are kept constant, and we mark them in bold. The Lagrangian \(G({\textbf {U}},{\textbf {V}},W)\) is defined using the weight matrix \(W = (w_{d})_{1 \times D}\) as follows:

$$\begin{aligned} G({\textbf {U}},{\textbf {V}},W)=\sum _{l=1}^{C}\sum _{i=1}^{P}u_{li}^m \sum _{d=1}^{D}w_{d}^\eta \bigl ((\mu _{id}-\bar{\mu }_{ld})^2+(\nu _{id}-\bar{\nu }_{ld})^2+ (\pi _{id}-\bar{\pi }_{ld})^2\bigr ) -\lambda \bigg [\sum _{d=1}^{D}w_{d}-1\bigg ]. \end{aligned}$$
(42)

The derivative of the Lagrangian (see Eq. (42)) with respect to the Lagrange multiplier \(\lambda\) is set equal to zero. Hence,

$$\begin{aligned} \frac{\partial G({\textbf {U}},{\textbf {V}},W)}{\partial \lambda } = -\bigg [\sum _{d=1}^{D} w_{d}-1\bigg ]=0. \end{aligned}$$
(43)

Similarly, the derivative of the Lagrangian (see Eq. (42)) with respect to the weight parameter \(w_{a}\), \(1 \le a \le D\), is set equal to zero. Hence,

$$\begin{aligned} \frac{\partial G({\textbf {U}},{\textbf {V}},W)}{\partial w_{a}}= \sum _{i=1}^{P}u_{si}^m \eta w_{a}^{(\eta -1)}\bigl ((\mu _{ia}-\bar{\mu }_{sa})^2 + (\nu _{ia}-\bar{\nu }_{sa})^2 + (\pi _{ia}-\bar{\pi }_{sa})^2\bigr )-\lambda =0. \end{aligned}$$
(44)

Simplifying Eq. (44) yields the feature weight \(w_{a}\) as follows:

$$\begin{aligned} \sum _{i=1}^{P}u_{si}^m \eta w_{a}^{(\eta -1)}\bigl ((\mu _{ia}-\bar{\mu }_{sa})^2 + (\nu _{ia}-\bar{\nu }_{sa})^2 +(\pi _{ia}-\bar{\pi }_{sa})^2\bigr )-\lambda =0 \end{aligned}$$
(45)
$$\begin{aligned} \sum _{i=1}^{P}u_{si}^m \eta w_{a}^{(\eta -1)}\bigl ((\mu _{ia}-\bar{\mu }_{sa})^2 + (\nu _{ia}-\bar{\nu }_{sa})^2 + (\pi _{ia}-\bar{\pi }_{sa})^2\bigr )=\lambda . \end{aligned}$$
(46)

For \(\eta \ne 1\):

$$\begin{aligned} w_{a}=\bigg (\frac{\lambda }{\eta }\bigg )^\frac{1}{\eta -1}\bigg [\sum _{i=1}^{P}u_{si}^m \bigl ((\mu _{ia}-\bar{\mu }_{sa})^2 + (\nu _{ia}-\bar{\nu }_{sa})^2 + (\pi _{ia}-\bar{\pi }_{sa})^2\bigr )\bigg ]^\frac{1}{1-\eta }. \end{aligned}$$
(47)

Summing Eq. (47) over the features and using the constraint in Eq. (43), we have

$$\begin{aligned} \sum _{d=1}^{D}\bigg (\frac{\lambda }{\eta }\bigg )^\frac{1}{\eta -1}\bigg [\sum _{i=1}^{P}u_{si}^m \bigl ((\mu _{id}-\bar{\mu }_{sd})^2 + (\nu _{id}-\bar{\nu }_{sd})^2 + (\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg ]^\frac{1}{1-\eta }=1 \end{aligned}$$
(48)
$$\begin{aligned} \bigg (\frac{\lambda }{\eta }\bigg )^\frac{1}{\eta -1}\sum _{d=1}^{D}\bigg [\sum _{i=1}^{P}u_{si}^m \bigl ((\mu _{id}-\bar{\mu }_{sd})^2 + (\nu _{id}-\bar{\nu }_{sd})^2 + (\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg ]^\frac{1}{1-\eta }=1. \end{aligned}$$
(49)

Dividing Eq. (47) by Eq. (49) results in the iterative formula for the weight \(w_{a}\) as follows:

$$\begin{aligned} w_{a}= \frac{\bigg [\sum _{i=1}^{P}u_{si}^m \bigl ((\mu _{ia}-\bar{\mu }_{sa})^2 + (\nu _{ia}-\bar{\nu }_{sa})^2+ (\pi _{ia}-\bar{\pi }_{sa})^2\bigr )\bigg ]^\frac{1}{1-\eta }}{\sum _{d=1}^{D}\bigg [\sum _{i=1}^{P}u_{si}^m \bigl ((\mu _{id}-\bar{\mu }_{sd})^2 + (\nu _{id}-\bar{\nu }_{sd})^2 + (\pi _{id}-\bar{\pi }_{sd})^2\bigr )\bigg ]^\frac{1}{1-\eta }} \end{aligned}$$
(50)

\(\square\)
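The following NumPy sketch implements a feature-weight update in the spirit of Eq. (50). Since the weight vector is shared by all clusters, the sketch accumulates the membership-weighted scatter over every cluster; this accumulation, the array layout, and the function name are our assumptions made for illustration.

```python
import numpy as np

def uw_ifcm_weights(X, V, U, m=2.0, eta=2.0):
    """Feature-weight update in the spirit of Eq. (50).

    X : (P, D, 3) AIFS-coded data, V : (C, D, 3) centroids,
    U : (P, C) membership matrix.  Returns weights of shape (D,) summing to one.
    """
    um = U ** m                                                       # (P, C)
    diff2 = ((X[:, None, :, :] - V[None, :, :, :]) ** 2).sum(axis=3)  # (P, C, D)
    scatter = np.einsum('pc,pcd->d', um, diff2)                       # per-feature cost
    scatter = np.fmax(scatter, 1e-12)                                 # avoid 0 ** (negative)
    t = scatter ** (1.0 / (1.0 - eta))
    return t / t.sum()
```

Features with a small accumulated scatter receive larger weights when \(\eta > 1\), which is what lets uW-IFCM suppress noisy features.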

Corollary 2.1

Equations (55), (56), and (57) give the iterative formulas for the cluster center matrix V.

For the proof, we set the derivatives of the Lagrangian (see Eq. (42)) with respect to the cluster center components \(\bar{\mu }_{sa},\bar{\nu }_{sa},\bar{\pi }_{sa}\), where \(1 \le s \le C, 1 \le a \le D\), equal to zero as follows:

$$\begin{aligned} \frac{\partial G(U,V,W)}{\partial \bar{\mu }_{sa}} = 0 \end{aligned}$$
(51)
$$\begin{aligned} \frac{\partial G(U,V,W)}{\partial \bar{\nu }_{sa}} = 0 \end{aligned}$$
(52)
$$\begin{aligned} \frac{\partial G(U,V,W)}{\partial \bar{\pi }_{sa}} = 0. \end{aligned}$$
(53)

From Eq. (51), we have

$$\begin{aligned} \frac{\partial G(U,V,W)}{\partial \bar{\mu }_{sa}} = \sum _{i=1}^{P}\bigg (u_{si}^mw_{a}^\eta [-2(\mu _{ia}-\bar{\mu }_{sa})]\bigg ) = 0 \end{aligned}$$
(54)
$$\begin{aligned} \bar{\mu }_{sa} = \frac{\sum _{i=1}^{P}u_{si}^m\mu _{ia}}{\sum _{i=1}^{P}u_{si}^m}. \end{aligned}$$
(55)

Similarly, from Eqs. (52) and (53), we get

$$\begin{aligned} \bar{\nu }_{sa} = \frac{\sum _{i=1}^{P}u_{si}^m\nu _{ia}}{\sum _{i=1}^{P}u_{si}^m} \end{aligned}$$
(56)
$$\begin{aligned} \bar{\pi }_{sa} = \frac{\sum _{i=1}^{P}u_{si}^m\pi _{ia}}{\sum _{i=1}^{P}u_{si}^m} \end{aligned}$$
(57)

\(\square\)

Fig. 3
figure 3

A study of convergence for a IFCM, b uW-IFCM, c wIFCM, and d bW-PIFCM over Synthetic Dataset I

Fig. 4
figure 4

Proposed Architecture of uW-IFCM and bW-PIFCM algorithms

3 Procedural Details of uW-IFCM and bW-PIFCM

The implementation of the uW-IFCM algorithm involves seven steps. In the first step, an intuitionistic data fuzzification technique transforms the given dataset into an AIFS dataset. Here, the functions \(\mu _{id}\), \(\nu _{id}\), and \(\pi _{id}\) calculate the membership, non-membership, and hesitancy values, respectively. The optimized iterative values of the membership matrix U, the centroid matrix V, the weight matrix W, and the convergence criteria are determined in steps 2 to 7. A brief procedure for implementing uW-IFCM and bW-PIFCM is given in Algorithm III and Algorithm IV.

Step 1. Intuitionistic data fuzzification technique: Let us consider a real-valued multi-dimensional/featured dataset \(T=\{ x^{'}_{1}, x^{'}_{2}, \dots , x^{'}_{P} \}\), where \(x^{'}_{i}=(x_{id})_{d=1}^{D}\) and \(x_{id}\in R\) for all \(1\le d\le D\). The bi-parametric formula (a and b are the parameters) normalizes T as follows:

$$\begin{aligned} N_{id} = a+\frac{x_{id}-x^i_{\min }}{x^i_{\max }-x^i_{\min }}\,(b-a). \end{aligned}$$
(58)

The symbols \(x^i_{\min }\) and \(x^i_{\max }\) denote the minimum and maximum of the set \((x_{id})_{d=1}^{D}\). Here, \(a = 0\) and \(b = 1\).

1. Membership function, \(\mu _{id}\): Equation (58) transforms T into a normalized data matrix N. The norm function of MATLAB computes the Euclidean distance between each pair \((x^{'}_{i},x^{'}_{j})\), and the result is a distance matrix \([dis_{ij}]\). Multiplying N and \([dis_{ij}]\) gives the matrix \([P_{id}]\). The feature weight allocation to \(x_{id}\) is done with a factor \(M_{id}\) computed as follows:

$$\begin{aligned} M_{id} = \frac{1}{P_{id}} \end{aligned}.$$
(59)

We calculate \(\sum _{i=1}^{P}\sum _{d=1}^{D} (x_{id}+M_{id})\), which yields a new matrix \(normalized\text{- }(x_{id}+M_{id})\) \(=\frac{x_{id}+M_{id}}{\sum _{i=1}^{P}\sum _{d=1}^{D} (x_{id}+M_{id})}\). Now, a data-centric membership function is computed as follows:

$$\begin{aligned} \mu _{id} = ({normalized\text{- }}(x_{id} + M_{id}))^\beta ,~~~~~ \beta > 0 \end{aligned}.$$
(60)

The tuning parameter \(\beta \in (0,10)\) tunes the membership values as per the requirement of the dataset T.

2. Non-membership function, \(\nu _{id}\): The generalized intuitionistic fuzzy generator given by [40] derives the non-membership function \(\nu _{id}\) as

$$\begin{aligned} \nu _{id}=(1-\mu _{id}^{\alpha \beta })^\frac{1}{\alpha } \end{aligned}.$$
(61)

The domain of the tuning parameter \(\alpha\) is (0, 1].

3. Hesitancy function, \(\pi _{id}\): The mathematical formulation of the hesitancy function is given as follows:

$$\begin{aligned} \pi _{id}= 1-\mu _{id}-(1-\mu _{id}^{\alpha \beta })^{\frac{1}{\alpha }} \end{aligned}$$
(62)

Equations (60), (61), and (62) together transform \({x}_{id}\) into an AIFS \(\tilde{x}_{id} = (\mu _{id},\nu _{id},\pi _{id})\). This novel data fuzzification technique is implemented with the help of Algorithm I.
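The sketch below condenses Step 1 (Eqs. (58)–(62)) into a single NumPy routine. It is an illustrative reading of the step, not the authors' Algorithm I: the default values of \(\alpha\) and \(\beta\), and the product order chosen so that the matrix dimensions conform, are our assumptions.

```python
import numpy as np

def intuitionistic_fuzzify(T, alpha=0.5, beta=2.0, a=0.0, b=1.0):
    """Transform a real-valued dataset T (P, D) into AIFS triples (P, D, 3)."""
    # Eq. (58): normalize each item over its own features (min/max of row i).
    lo = T.min(axis=1, keepdims=True)
    hi = T.max(axis=1, keepdims=True)
    N = a + (T - lo) / np.fmax(hi - lo, 1e-12) * (b - a)
    # Pairwise Euclidean distances between data items (the "norm" step).
    dis = np.linalg.norm(T[:, None, :] - T[None, :, :], axis=2)   # (P, P)
    P_mat = dis @ N                        # product order chosen so shapes conform
    M = 1.0 / np.fmax(P_mat, 1e-12)        # Eq. (59)
    S = T + M
    S_norm = S / S.sum()                   # normalized-(x_id + M_id)
    mu = S_norm ** beta                    # Eq. (60), beta > 0
    nu = (1.0 - mu ** (alpha * beta)) ** (1.0 / alpha)   # Eq. (61), alpha in (0, 1]
    pi = 1.0 - mu - nu                     # Eq. (62)
    return np.stack([mu, nu, pi], axis=2)
```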

Now, we discuss the outlines of Algorithm II:

Step 2. Initialization of centroid matrix V: To initialize the uW-IFCM and bW-PIFCM algorithms, it is necessary to fix the number of cluster centroids in advance. We randomly select the initial cluster centroids \(\hat{v}_l\), \(1 \le l \le C\), with the help of the rand function of MATLAB.

Step 3. Initialization of weight matrix, W: The feature information provided in the multi-dimensional dataset is used to compute the initial feature weights in uW-IFCM (see Eq. (50)), whereas the bW-PIFCM algorithm uses data-driven weight triplets for its initialization (see Theorem 2.2).

Step 4. Computation of membership matrix, U: The membership matrix U contains the membership value of each data point in every cluster. The higher the membership value, the stronger the association of the data point with the particular cluster. Here, the matrix \(U = [u_{il}]\) is of order \(P\times C\), where P is the number of data points and C is the number of clusters. Eq. (41) and Eq. (23) provide the formulas with which we calculate the membership matrix corresponding to the uW-IFCM and bW-PIFCM algorithms, respectively.

Step 5. Updation of cluster centroid matrix, V: We arbitrarily selected C initial cluster centroids \((\hat{v}_l=(\mu _{ld},\nu _{ld},\pi _{ld})_{d=1}^{D}, 1\le l\le C)\) in step 2. The same weightage has been given to each dimension d of \(\hat{v}_l\) (the lth cluster centroid). The uW-IFCM uses Eqs. (55)–(57) to update V, whereas Eqs. (29)–(31) update the cluster centroids of bW-PIFCM. In some algorithms, we see an explicit role of hesitancy during the updation of the centroid matrix (see [32]).

Step 6. Updation of weight matrix, W: To update W, the updated membership matrix U and the updated cluster centroids V of uW-IFCM are utilized. We observe that after each iteration, the weights of relevant features increase and the weights of irrelevant or noisy features decrease. A data-specific selection of the exponent \(\eta\) and the fuzzy factor m delivers an optimal feature weight distribution. The formula in Eq. (50) is used for the weight matrix updation. In the case of bW-PIFCM, we update the probability weight intervals \(I_1,I_2\), and I with the help of Algorithm III.

Step 7. Convergence criterion: The convergence of the uW-IFCM and bW-PIFCM algorithms is decided with an error margin \(\epsilon\) equal to \(10^{-6}\). Convergence is reached for uW-IFCM when the termination criterion \({\sum \limits _{d=1}^{D}} \frac{{d_2}({W}_d(t),{W}_d(t+1))}{D} < \epsilon\) is met; otherwise, the algorithm repeats from Step 4. The termination criterion used for bW-PIFCM is \({\sum \limits _{l=1}^{C}} \frac{{d_2}({V}_l(t),{V}_l(t+1))}{C} < \epsilon\).
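Putting Steps 2–7 together, the following is a condensed NumPy sketch of the uW-IFCM iteration loop on an AIFS-coded dataset. It reuses the update rules of Eqs. (41), (50), and (55)–(57); the random centroid initialization, the uniform starting weights, and the mean-absolute-change convergence test are illustrative stand-ins for the choices described above, and the function name is ours.

```python
import numpy as np

def uw_ifcm(X, C, m=2.0, eta=2.0, eps=1e-6, max_iter=100, seed=0):
    """Sketch of the uW-IFCM loop (Steps 2-7) on AIFS-coded data X of shape (P, D, 3)."""
    rng = np.random.default_rng(seed)
    P, D, _ = X.shape
    V = X[rng.choice(P, size=C, replace=False)]        # Step 2: random centroids
    W = np.full(D, 1.0 / D)                            # Step 3: equally likely start
    for _ in range(max_iter):
        # Step 4: membership update, Eq. (41).
        diff2 = ((X[:, None, :, :] - V[None, :, :, :]) ** 2).sum(axis=3)  # (P, C, D)
        d2 = np.fmax((diff2 * W ** eta).sum(axis=2), 1e-12)               # (P, C)
        t = d2 ** (1.0 / (1.0 - m))
        U = t / t.sum(axis=1, keepdims=True)
        # Step 5: centroid update, Eqs. (55)-(57).
        um = U ** m
        V = np.einsum('pc,pdk->cdk', um, X) / um.sum(axis=0)[:, None, None]
        # Step 6: weight update, Eq. (50) (scatter accumulated over clusters).
        diff2 = ((X[:, None, :, :] - V[None, :, :, :]) ** 2).sum(axis=3)
        scatter = np.fmax(np.einsum('pc,pcd->d', um, diff2), 1e-12)
        W_new = scatter ** (1.0 / (1.0 - eta))
        W_new /= W_new.sum()
        # Step 7: stop when the mean change in the weight vector is below eps.
        done = np.abs(W_new - W).mean() < eps
        W = W_new
        if done:
            break
    return U, V, W
```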

figure d
figure e
figure f

4 An Experimental Study of Synthetic and UCI Machine Learning Datasets

This section is divided into four subsections as follows:

figure g
figure h

4.1 Study of Synthetic Dataset I Using IFCM and Weighted-IFCM

Description of Synthetic Dataset I: For the experimentation, we use a Gaussian distribution-based five-dimensional Synthetic Dataset I of sample size 300 (see [44]). Here, we denote the mean and the standard deviation by \(\theta\) and \(\sigma\), respectively. The mean value of the jth cluster along the lth dimension (feature) is \(\theta _{jl}\), with standard deviation \(\sigma _{jl}\) and distribution \(\sum _{j=1}^{3}\sum _{l=1}^{5}N(\theta _{jl},\sigma _{jl})\). Synthetic Dataset I consists of three well-separated clusters, and each cluster has 100 data items (see Fig. 1). Table 3 provides the mean values and standard deviations used for the three clusters.

Fig. 5
figure 5

Impact of equally likely approach on clustering Synthetic Dataset I by weighted-IFCM

Table 3 Description of Synthetic Dataset I
Table 4 Clustering results of weighted-IFCM over Synthetic Dataset I
Table 5 Clustering results over Synthetic Dataset I with an equally likely approach-based weighted-IFCM

Claim: The features relevant to clustering are \(d_1, d_2, d_3\) , whereas \(d_4, d_5\) are irrelevant features in the given dataset.

Strategy: This weighted-IFCM-based data analysis is split according to how the feature weight distribution is obtained: (1) Firstly, we arbitrarily use four fixed feature weight distributions \(W_i = \{w_{ik}\}_{k=1}^5\), \(1 \le i \le 4\) (see Table 4). Here, \(w_{ik}\) is assigned to feature \(d_k\), where \(1 \le k \le 5\). (2) Secondly, we use an equally likely approach to calculate the feature weight distributions (see Table 5).

Experimental analysis:

Case 1. The collections \(W_1,W_2\), \(W_3\), and \(W_4\) contain the weights assigned to features \(d_1, d_2, d_3, d_4\), and \(d_5\) (see Table 4). The benchmark indexes validating the clustering performance of weighted-IFCM are the clustering accuracy CA and the rand index \(R_I\) (for their details, see Sect. 4.4). Using a random set of initial cluster centroids and the two parameters m and \(\alpha\) (see Table 4), we initialize the weighted-IFCM algorithm. Here, 30 values of m are selected from the interval [1, 4] and 20 values of \(\alpha\) from the interval (0, 1), which results in 600 pairs of \((m,\alpha )\) for the algorithm. The use of \(W_1\) in weighted-IFCM yields better clustering than \(W_2,W_3,W_4\); in this case, CA and \(R_I\) are obtained as 0.9933 and 0.9911, respectively (see Table 4). In \(W_1\), the emphasis is on features \(d_1,d_2,d_3\), whereas very little importance is assigned to \(d_4\) and \(d_5\). It is observed that a considerable weightage is given to \(d_4\) or \(d_5\) in \(W_2,W_3,W_4\), so weighted-IFCM does not perform well under these distributions (see Table 4). We further explore the role of features \(d_4\) and \(d_5\) in Case 2.

Case 2. The distribution computed using an equally likely approach is \(W_5\) = (1/5,1/5,1/5,1/5,1/5). We then nullify features one by one; the weight distributions obtained by applying this approach over four-feature, three-feature, two-feature, and single-feature sets are \(W_6\) = (1/4,1/4,1/4,1/4,0), \(W_7\) = (1/3,1/3,1/3,0,0), \(W_8\) = (1/2,1/2,0,0,0), and \(W_9\) = (1,0,0,0,0), respectively (see Table 5). The algorithm delivers its worst clustering performance at \(W_5\) (CA = 0.3333 and RI = 0.3311) and its best clustering performance at \(W_7\) (CA = 0.9966 and RI = 0.9911). We have discarded features \(d_4\) and \(d_5\) in \(W_7\). Both cases verify that \(d_4\) and \(d_5\) are irrelevant features. Please refer to Table 4 and Table 5 for all the results obtained using weighted-IFCM.

Through the experiment, we observe a marginal gap between the performance of weighted-IFCM with random allocation of weights and relevancy-based weights.

4.2 Study of Synthetic Dataset I Using bW-PIFCM

The weight distributions used in the weighted-IFCM algorithm are randomly selected. In this algorithm, each feature is allocated a single weight, whereas algorithms like PIFCM (see [41]) employ a weight triplet to cluster the multi-dimensional dataset. Here, we analyze Synthetic Dataset I with the bi-weighted variant of PIFCM called bW-PIFCM. The bW-PIFCM algorithm converges after seven iterations (see Fig. 3d). The bW-PIFCM performs well over the synthetic datasets (see Table 6), which verifies that the algorithm is adaptive in nature. However, none of the features were neglected; therefore, it cannot be used as a feature reduction technique.

Table 6 Clustering of synthetic datasets using bW-PIFCM

4.3 Study of Synthetic Dataset I Using uW-IFCM

The research work of [21] motivates us to propose the uW-IFCM clustering algorithm. The mathematical study of the convergence of uW-IFCM becomes a trivial exercise due to the results established in [21]. So, the focus of the paper is on the experimental study of the uW-IFCM algorithm. Our study on Synthetic Dataset I shows that uW-IFCM and IFCM take five and ten iterations, respectively, to converge (see Fig. 3a, b). The criterion value of uW-IFCM differs from that of wIFCM (see Table 2) by a large margin (see Fig. 3c). In the figure, the number of iterations is kept along the horizontal axis, whereas the criterion function values are placed on the vertical axis.

Table 7 Tuning of weighing exponent \(\eta\) with fuzzy factor m in uW-IFCM over Synthetic Dataset I

Role of the weighing exponent, \(\eta\): The weight \(w_d\) is iteratively calculated in the proposed uW-IFCM algorithm (see Eq. (50) in Theorem 2.6). Then, the algorithm assigns a weight \(w_d^\eta\) to the feature \(\hat{x}_{id}\), where \(1\le d\le D\). Here, the feature obtaining the minimum sum of cluster distances is allocated a larger weight in comparison to the other features. This weight assignment approach involves the weighing exponent \(\eta\), so it distinguishes between relevant and irrelevant features. The weight \(w_d^\eta\) helps in the selection of relevant features. The set of positive real numbers (\(\eta > 0\)) is the domain of the weighing exponent \(\eta\).

Fig. 6
figure 6

Feature weight distribution over \(\eta\) and m for Synthetic Dataset I such that \(m\ge 1.8\) for \(\eta \in (0,1)\) and \(m<1.8\) for \(\eta >1\)

Fig. 7
figure 7

a Depicting five random initial weights for the five features of Synthetic Dataset I. b The five final weights coinciding to the same values when \(\eta =2\). c Five initial weights using the equally likely approach for Synthetic Dataset I. d Graphical representation of variation in the distribution of final weights for Synthetic Dataset I with varying weighing exponent \(\eta\) in uW-IFCM

We pictorially describe and analyze each feature of Synthetic Dataset I independently in Fig. 1. In Sect. 4.1, we have already discussed the relevant features \(d_1, d_2\), \(d_3\) and the irrelevant/noisy features \(d_4\), \(d_5\). Inseparable clusters are obtained in the presence of the noisy features \(d_4\) and \(d_5\) (see Fig. 1). Here, we relate an appropriate selection of \(\eta\) and the fuzzy factor m to good clustering. Let us study the domain of \(\eta\) in two parts, \(\eta \in (0,1)\) and \(\eta >1\). A total of eight values of \(\eta\) are used for the experimentation. It is reasonable to select the domain [1, 4] for m (see [45]). The experimental analysis is done in two parts:

1. A case study showing impact of weighing exponent \(\eta\) on feature weight distribution:

We implement the uW-IFCM algorithm over Synthetic Dataset I for this study. The clustering depends on the parameters \(\eta\) and m (see Fig. 2). Here, uW-IFCM has been initialized under the assumption that all features are equally likely. If \(\eta = 0.1\), then \(w_d^\eta\) allocates a weight distribution to the first three features that is very different from the distribution assigned to the last two features, provided m is suitably chosen. If \(m<1.8\), then \(w_{d}^{0.1}\) assigns high weightages to \(d_4\) and \(d_5\), that is, \(w_{d_4}^{0.1},w_{d_5}^{0.1}>>0.2\). Moreover, \(w_{d}^{0.1}\) assigns the features \(d_1, d_2\), \(d_3\) weightages of value less than 0.2 (see Fig. 2a). Now, if \(m \ge 1.8\), then \(w_d^\eta\) assigns higher weights to the relevant features \(d_1, d_2\), and \(d_3\) than to the noisy features \(d_4\) and \(d_5\). Here, we do not observe many changes in the weight distribution on changing \(\eta\), where \(\eta \in \{0.5,0.7,0.9\}\). From this discussion, we conclude that \(m \ge 1.8\) and \(0<\eta <1\) constitute a workable domain for the uW-IFCM algorithm.

In Table 7, the initial centroids and the five randomly generated weight distributions initializing the algorithm are given. We take \(\eta \in \{0.1, 0.3, 0.5, 0.7, 0.9, 1.5, 2.0, 2.5, 5.0, 8.0, 10\}\), and the best performance of the algorithm is adjudged after experimenting with all the values of \(\eta\). Let \(\eta = 1.1\) and \(m<1.8\). In this domain, \(w_d^\eta\) assigns high weights to \(d_1, d_2\), \(d_3\) in comparison to the noisy features \(d_4\) and \(d_5\); here, a good weight distribution operationalizes uW-IFCM. When we vary \(\eta\) over the set \(\{1.5,2,2.5\}\) while fixing \(m<1.8\), the distributions obtained differ only slightly from the distribution at \(\eta = 1.1\). If \(m \ge 1.8\) and \(\eta >1\), then the noisy features \(d_4\) and \(d_5\) are assigned high weights by virtue of \(w_d^\eta\). If \(\eta >2\), then the second feature \(d_2\) is assigned a very high weightage in comparison to the remaining features. Here, corresponding to each \(\eta\), we select an optimal m, and the algorithm results in efficient clustering using the optimal \((\eta ,m)\) (see Table 8). It is concluded that \(\eta <1\) requires \(m\ge 1.8\), whereas \(\eta >1\) requires \(m<1.8\). The complete performance details of uW-IFCM are provided in Table 7. Here, the two benchmark indexes CA and rand index \(R_I\) verify the performance of the uW-IFCM algorithm. Large deviations between the relevant feature weights and the irrelevant feature weights are observed at \(\eta =2\). So, the algorithm is likely to deliver an efficient performance at \(\eta =2\), and this case is analyzed separately in detail.

Table 8 Final weights of Synthetic Dataset I with CA and \(R_I\) by uW-IFCM clustering algorithm when \(\eta = 2\) with random initial weights
Table 9 Distribution of feature weights of Synthetic Dataset I with random weight initialization using uW-IFCM clustering algorithm with \(\eta = 2\)
Table 10 Comparison of clustering performance between IFCM, weighted-IFCM and proposed uW-IFCM with CA and \(R_I\) when \(\eta = 2\)

2. A case study showing connection between the weighing exponent \(\eta =2\) and optimal feature weight distribution:

Here, the role of \(\eta =2\) in obtaining efficient clustering is discussed. We randomly select five initial cluster centroids (see Table 7) and five initial weight distributions (see the \(w_o\) rows in Table 9) for the experimentation. Each of the \(w_o\) rows initializes the uW-IFCM algorithm one by one; still, the output remains the same because in all five instances the point of convergence does not change (see Fig. 7b). Let us deal with Synthetic Dataset I using the equally likely approach in uW-IFCM. Multiple convergence patterns are observed for the feature weights on varying \(\eta\), where \(\eta \in \{0.3, 1.5, 2.0, 8.0,10\}\) (see Fig. 7d). In Table 10, the performance of the \(\eta = 2\) based uW-IFCM is compared with IFCM and weighted-IFCM. An optimal feature weight distribution appears at \(\eta =2\), and we see very different distributions at \(\eta = 0.5\) and \(\eta = 5\) (see Fig. 6a–d). In the case of \(\eta =5\), a high feature weight is allocated to \(d_1\) in comparison to \(d_2\) and \(d_3\) (see Fig. 6a). In Fig. 6b, the weight allocated to feature \(d_2\) is very high, whereas \(d_1\) and \(d_3\) are assigned very low weights. In other words, the algorithm singles out feature \(d_2\) for the clustering at \(\eta = 5\). An optimal weight distribution is obtained using \(\eta =2\); therefore, \(w_d^2\) delivers the best clustering accuracy and rand index values as well.

Table 11 Clustering results based on bW-PIFCM with fixed initial cluster centroids over IRIS, Thyroid, and Bupa dataset
Table 12 Clustering results based on bW-PIFCM with fixed initial cluster centroids over Zoo, Heart, WDBC, and Ecoli dataset
Table 13 Clustering results based on PIFCM with fixed initial cluster centroids over IRIS, Thyroid, and Bupa dataset
Table 14 Clustering results based on PIFCM with fixed initial cluster centroids over Zoo, Heart, WDBC, and Ecoli dataset

4.4 A Comparative Analysis of the Proposed Algorithms with some C-Means Algorithms Over UCI Machine Learning Datasets

Here, we explore the functioning of the proposed uW-IFCM and bW-PIFCM algorithms on some real-valued UCI machine learning datasets. In addition, the clustering performance of uW-IFCM is compared with the bW-PIFCM, PIFCM, IFCM, and FCM clustering algorithms.

Table 15 Summary of UCI datasets
Table 16 Summary of synthetic datasets used
Table 17 WORST/AVERAGE/THE BEST CA, \(R_I\), PC, SC, and DI with fixed initial feature weights and different initial cluster centroids over UCI datasets
Table 18 WORST/AVERAGE/THE BEST CA, \(R_I\), PC, SC, and DI of bW-PIFCM and PIFCM with fixed initial feature weights and different initial cluster centroids over UCI datasets
Fig. 8
figure 8

Comparison of Rand Index, \(R_I\) over real datasets with uW-IFCM, wIFCM, wFCM, IFCM, and FCM

1. UCI Machine Learning Datasets

The seven machine learning datasets, namely, IRIS, Thyroid, Bupa, Zoo, Heart, WDBC, and Ecoli, are considered for experimentation (see [47]). Out of these seven datasets, five are disease-related datasets concerning the thyroid, liver, heart, breast, and cancer. IRIS is a plant dataset, and Zoo belongs to the category of animal datasets. The description of the datasets is provided in Table 15. MATLAB version 8.1 running on a PC with a 3.40 GHz processor and 16 GB RAM is used for the computational tasks.

Fig. 9
figure 9

Study of clustering of IRIS by uW-IFCM with variation in \(\eta\) and m

2. Some popular benchmark measuring indexes:

Here, five benchmark measuring indexes, namely, clustering accuracy (CA), rand index (\(R_I\)), partition coefficient (PC), partition index (SC), and Dunn index (DI), are used to compare the performances of the algorithms. We define the indexes as follows:

(1) Clustering accuracy (CA) [48]: It is the ratio of the number of correctly classified elements \(p_c\) to the total number of elements p, that is, \(CA=\frac{p_c}{p}\). It is a popular index that works over labeled datasets.

(2) Rand index (\(R_I\)) [44]: The rand index is also used for labeled datasets. Let \(\mathbb {C}\) be the set of clusters provided in the original dataset, and let \(\mathbb {C'}\) be the cluster set resulting after the clustering of the dataset. The element pairs are categorized as follows: (1) SD, (2) DS, (3) SS, (4) DD (here S stands for same cluster and D stands for different cluster). Mathematically, we have

$$\begin{aligned} R_I = \frac{t_1 + t_4}{R} \end{aligned}$$
(63)

where \(t_1\) = number of SS pairs, \(t_2\) = number of DS pairs, \(t_3\) = number of SD pairs, and \(t_4\) = number of DD pairs, with \(R = t_1+t_2+t_3+t_4\).
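For concreteness, a small NumPy sketch of the two labeled-data indexes is given below. The pair counting follows Eq. (63) directly; for CA, the paper only states the ratio of correctly classified elements, so the brute-force matching of predicted clusters to true labels is our assumption.

```python
import numpy as np
from itertools import permutations

def rand_index(labels_true, labels_pred):
    """Rand index of Eq. (63): (SS + DD pairs) / (all pairs R)."""
    y, z = np.asarray(labels_true), np.asarray(labels_pred)
    same_true = y[:, None] == y[None, :]
    same_pred = z[:, None] == z[None, :]
    iu = np.triu_indices(len(y), k=1)          # each unordered pair counted once
    return (same_true == same_pred)[iu].mean()

def clustering_accuracy(labels_true, labels_pred, C):
    """CA = correctly classified / total, maximized over cluster relabelings.

    Assumes integer labels 0..C-1; brute-force matching is fine for small C.
    """
    y, z = np.asarray(labels_true), np.asarray(labels_pred)
    best = 0.0
    for perm in permutations(range(C)):
        mapped = np.array([perm[c] for c in z])
        best = max(best, float((mapped == y).mean()))
    return best
```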

(3) Partition coefficient (PC) [49]: It is the mean of the sum of the squared membership values of the ith element in the jth cluster, \(1\le j\le C\). Mathematically, \(PC=\frac{1}{P}\sum _{i=1}^{P}\sum _{j=1}^{C}u_{ij}^2\), where P is the number of elements in the dataset. It informs about the trade-off between the clusters. If PC approaches unity, then very few elements are shared between clusters. Therefore, a higher value of \(PC\in [\frac{1}{C},1]\) (where C is the number of clusters) indicates better clustering performance.

(4) Partition index (SC) [50]: The partition index is the ratio of cluster compactness to cluster separation. Mathematically,

$$\begin{aligned} SC= \sum _{i=1}^{C}\frac{\sum _{k=1}^{P}u_{ik}\,d^2(x_k,v_i)}{\sum _{k=1}^{P}u_{ik}\sum _{t=1}^{C}d^2(v_i,v_t)} \end{aligned}.$$
(64)

A lower value of SC means better clustering.

(5) Dunn index (DI) [51]: It is a well-established hard cluster validity index. Dunn defines the clusters to be “compact, separate (CS)” relative to the metric iff the following condition is satisfied: for all p, q, r, where \(p \ne q\), any pair of elements \(u, v \in A_r\) are closer together (with respect to the metric) than any pair of elements \(x \in A_p\) and \(y \in A_q\). Let \(A_1, A_2,\dots , A_C\) be a C-partition of A and let U be the partition matrix. The Dunn index is defined as follows:

$$\begin{aligned} DI = \underset{1\le i \le C}{\min }\biggl \{\underset{1\le j \le C,\, j\ne i}{\min }\ \biggl \{ \frac{dis(u_i,u_j)}{\underset{1 \le k \le C}{\max }\,dia(u_k)} \biggr \} \biggr \}, \end{aligned}$$
(65)
$$\begin{aligned} \text{ where }~~ dis(u_i,u_j) = \underset{ x \in u_i,\, y \in u_j}{\min }d(x,y), \end{aligned}$$
(66)
$$\begin{aligned} dia(u_k) = \underset{ x,y\in u_k}{\max }d(x,y) \end{aligned}.$$
(67)

3. Experimental analysis of UCI machine learning datasets: In Table 17, we record three types of values for CA, \(R_I\), PC, SC, and DI, namely, the worst, average, and best. The circular/spherical path-based algorithms uW-IFCM, wIFCM, IFCM, wFCM, and FCM, and the elliptic/ellipsoidal path-based data-driven bW-PIFCM and PIFCM algorithms, are implemented over the seven datasets (see Tables 11–14). In the IFCM and FCM algorithms, the equally likely approach computes the weight distributions. The weight distributions in the fixed-weighted data-driven algorithms wIFCM and wFCM are calculated using Eq. (50). The uW-IFCM algorithm further tunes the weight distribution of wIFCM with the help of its weighing exponent \(\eta\). Here, the values that repeat most frequently during 100 random initializations of the algorithms are termed the average values. The results of Table 17 are used for the following experimental analysis:

  1. (1)

     The best CA obtained by uW-IFCM differs from its worst by a minimal value of 0.033 over the IRIS dataset, whereas the maximal deviation, equal to 0.118, is observed on the Zoo dataset. Hence, we find a CA-deviation interval [0.033, 0.118] for uW-IFCM. The CA-deviation intervals [0.032, 0.212], [0.007, 0.393], [0.033, 0.196], and [0.031, 0.253] are obtained using wIFCM, IFCM, FCM, and wFCM, respectively. Since, in IFCM, the weights are not correlated with the underlying requirements of the problem, IFCM has the largest deviation interval. In uW-IFCM, \(\eta\) acts as a controlling parameter, so it yields highly stable weight distributions (in other words, its deviation interval is the smallest).

  2. (2)

     In a similar fashion, the \(R_I\)-deviation intervals [0.007, 0.124], [0.002, 0.195], [0.008, 0.166], [0.004, 0.322], and [0.005, 0.317] are obtained using uW-IFCM, wIFCM, IFCM, FCM, and wFCM, respectively. Since, in FCM as well, the weights are not correlated with the underlying requirements of the problem, FCM yields the largest deviation interval. The uW-IFCM uses a weight distribution controlled by the parameter \(\eta\), and hence highly stable weights are obtained (in other words, its deviation interval is the smallest). For all the algorithms, we show the patterns of the \(R_I\)-deviations in Fig. 8.

  3. (3)

      A similar analysis can be done for the PC, SC, and DI indexes.

  4. (4)

     Here, every algorithm shows its best and worst performances on IRIS and Bupa datasets, respectively.

  5. (5)

     PIFCM is an adaptive algorithm (see [41]), and its clustering is improved by the data-driven bW-PIFCM. Further, bW-PIFCM yields better clustering than the wIFCM, IFCM, wFCM, and FCM algorithms and is surpassed only by uW-IFCM. Hence, bW-PIFCM is also an adaptive algorithm.

  6. (6)

     The worst, average, and best columns of every benchmark measuring index clearly indicate that uW-IFCM outperforms the bW-PIFCM, PIFCM, wIFCM, IFCM, wFCM, and FCM algorithms (see Table 17).

  7. The wFCM and wIFCM are data-driven, non-adaptive, fixed-weighted algorithms; their fixed weights are derived from Eq. (50). During the implementation of uW-IFCM, the weighing exponent further tunes the weights provided by Eq. (50).

Table 19 Clustering performance of uW-IFCM on IRIS dataset
Fig. 10
figure 10

Weight distribution among the features of the IRIS dataset obtained using uW-IFCM

5 Feature Reduction Technique

We carry out this study in two subsections. In the first subsection, the uW-IFCM algorithm is discussed as a mechanized feature reduction technique. The second subsection proposes an automatized feature reduction technique, called the FRT-equipped uW-IFCM algorithm. Here, we also try to establish a connection between computational cost and feature reduction.

5.1 A discussion of the uW-IFCM clustering algorithm as a feature reduction technique

Here, we analyze the IRIS dataset in depth with the help of the uW-IFCM algorithm. The dataset consists of four features \(\{d_1, d_2, d_3, d_4\}\), namely, sepal length, sepal width, petal length, and petal width, respectively. The 150 instances present in the dataset are divided into three classes, setosa, virginica, and versicolor, and each class contains 50 instances. We use twenty feature weight distributions to initialize the algorithm. The first ten feature weight distributions are calculated using the equally likely approach, whereas the next ten are randomly generated with the help of the \(\texttt {rand}\) function of MATLAB (see Table 19). The initial cluster centroids given in Sect. 4.3 are used here as well. The importance of the weighing exponent \(\eta = 2\) in uW-IFCM is highlighted in Sect. 4.4. When we initialize uW-IFCM with \(\eta = 2\), \(m = 1.7\), and the equally likely approach-based initial weights (see rows 1–10 of Table 19), the algorithm assigns higher weights to the third and fourth features than to the other two features. Further, a good clustering accuracy \(CA=0.9867\), Rand index \(R_I=0.9875\), and \(PC=0.9205\) are obtained, with \(SC=0.00007\) and \(DI=0.05123\). These values confirm that the initial weight distribution converges to an optimal feature weight distribution. Next, we initialize the uW-IFCM algorithm with the initial weight distribution (0.3119, 0.2371, 0.3183, 0.1327), \(\eta = 2\), and \(m = 1.7\); the result is the optimal weight distribution (0.1622, 0.1661, 0.3527, 0.3191) (see Table 19). The experimental findings confirm that uW-IFCM yields an optimal weight distribution over the IRIS dataset at \(\eta =2\) (see Fig. 10a, e, f). The algorithm also assigns higher weights to the third and fourth features than to the first two features when \(\eta =2\) and \(m=1.1\) are used. A graphical comparison of the feature weight distributions obtained over the IRIS dataset is also presented (see Fig. 10). At \(\eta =0.5\) and \(\eta = 1.1\), uW-IFCM gives almost equal weightage to all four features (see Fig. 10b, c). The clustering depends entirely on the third feature whenever \(\eta > 2.5\) is explored (see Fig. 10g–j). Here, convergence is achieved at the ninth iteration. Further, if \(\eta >2\), the clustering performance of uW-IFCM deteriorates, which can be improved by tuning \(\eta\) together with m. The domain of m depends on \(\eta\) (see Sect. 4.4); therefore, we have selected some values from this \(\eta\)-dependent domain of m for the experimentation (see Table 19). The clustering results obtained using different \(\eta >2\) and \(m=1.1\) are recorded in Table 19; we exploit \(m=1.1\) as it suitably tunes \(\eta\). During the clustering of the IRIS dataset, we obtain higher relevancy for the third and fourth features than for the remaining features. The relevancy of the features of the IRIS dataset is given in Table 20.

Table 20 Update of feature weights for the IRIS dataset using uW-IFCM when \(\eta = 2\)
Fig. 11
figure 11

Plots showing the reduced number of iterations for IRIS dataset clustering using uW-IFCM

Summary of this discussion: During the clustering of the IRIS dataset, uW-IFCM allocates more weightage to features \(d_3, d_4\) than to \(d_1, d_2\). However, due to the absence of a proper feature nullification criterion, the less relevant features are only neglected mechanically; that is, uW-IFCM indicates that \(d_1, d_2\) are irrelevant features, but they must be removed by inspection. The algorithm can therefore be used as a feature reduction technique. The uW-IFCM also results in an efficient clustering of the two-featured version of the IRIS dataset. The algorithm converges in nine iterations while clustering the full IRIS dataset, whereas the two-featured version of the IRIS dataset is clustered in five iterations (see Fig. 11). The total computational time taken by uW-IFCM over the IRIS dataset and its two-featured version is 0.1485 and 0.1118 s, respectively.
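For illustration only, the following Python/NumPy sketch shows how the feature weights reported by uW-IFCM could be used to prune low-weight features by hand. The weight vector is the optimal IRIS weight distribution quoted above; the data matrix, the equally likely cutoff 1/D, and the NumPy implementation are our own assumptions for this mechanized (manual) selection, not the authors' code (the paper's experiments were carried out in MATLAB).

```python
import numpy as np

# Optimal feature weight distribution reported for the IRIS dataset
# (sepal length, sepal width, petal length, petal width).
weights = np.array([0.1622, 0.1661, 0.3527, 0.3191])

# Stand-in 150 x 4 feature matrix; replace with the actual IRIS data.
X = np.random.rand(150, 4)

# Mechanized (manual) feature reduction: keep only the features whose
# weight exceeds the equally likely level 1/D, i.e. 0.25 for D = 4
# (an illustrative cutoff assumed here, not a criterion from the paper).
D = weights.size
keep = weights > 1.0 / D
X_reduced = X[:, keep]

print("retained features:", np.nonzero(keep)[0] + 1)  # -> features 3 and 4
print("reduced data shape:", X_reduced.shape)         # -> (150, 2)
```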

Fig. 12
figure 12

Synthetic Dataset IV illustrated in Example 3 viewed in different subspaces

Fig. 13
figure 13

a 2D—Original Synthetic Dataset II, b clustering with uW-IFCM using original features, c clustering with uW-IFCM with final features

Fig. 14
figure 14

a Original Synthetic Dataset III mapped to 3D, b dataset projected to x–y plane, c clustering with uW-IFCM

5.2 Performance of the FRT-equipped uW-IFCM algorithm

Here, a feature reduction technique (FRT) is incorporated into the uW-IFCM algorithm for the automatic elimination of irrelevant features, which keeps the computational cost of the algorithm low. In the FRT-equipped uW-IFCM, a collective threshold is decided for all features: weights equal to or above the threshold value are accepted, and weights below the threshold are rejected. If the dataset contains D features/dimensions, then the threshold value \(1/(\sqrt{D})^{\chi }\), \(\chi >0\), is selected for the removal of irrelevant features; features whose weight values lie below \(1/(\sqrt{D})^{\chi }\) are considered not useful for clustering. The higher the number of dimensions/features, the lower the value of the threshold, and vice versa. If D is large, then \(1/(\sqrt{D})^{\chi }\) becomes small, and in this case, the underlying iterative process of the algorithm converges after discarding the low-relevancy features. A small value of D results in a high threshold value, and hence in this situation the features are eliminated rapidly. This threshold selection works well for every finite D. In this section, we work on two two-dimensional synthetic datasets and one six-dimensional synthetic dataset. Please refer to Table 16 for the synthetic datasets used in the following examples.
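As an illustration of this thresholding rule only (not the authors' code), the sketch below computes the threshold \(1/(\sqrt{D})^{\chi }\) and nullifies the features that fall below it; the data-driven weight update of Eq. (50) is replaced by an arbitrary weight vector, and the renormalization of the surviving weights is a detail we assume.

```python
import numpy as np

def apply_frt(weights: np.ndarray, chi: float = 1.0) -> np.ndarray:
    """Nullify feature weights below the threshold 1/(sqrt(D))**chi
    and renormalize the remaining weights so that they sum to 1."""
    D = weights.size
    threshold = 1.0 / (np.sqrt(D) ** chi)
    kept = np.where(weights >= threshold, weights, 0.0)
    if kept.sum() == 0:          # safeguard: never discard every feature
        return weights
    return kept / kept.sum()

# Example: a hypothetical 6-feature weight distribution (cf. Synthetic Dataset IV).
w = np.array([0.1215, 0.1102, 0.0934, 0.0956, 0.5252, 0.0539])
print(apply_frt(w, chi=2.0))   # threshold = 1/6 ~ 0.167 -> only feature 5 survives
```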

Algorithm V:

FRT-equipped uW-IFCM algorithm

Input:

Dataset (X), number of centroids (C), weighing exponent (\(\eta\)), fuzzy factor (m), intuitionistic fuzzy parameter (\(\alpha\)), tuning parameter for membership function (\(\beta\)), tolerance level (\(\epsilon\))

Output:

Fuzzy partition U, centroids \(\{(\bar{\mu }_{ld}, \bar{\nu }_{ld}, \bar{\pi }_{ld})\}_{d=1}^{D}\), Weight matrix W

Procedure:

1. Data fuzzification using Algorithm I.

2. Initialize the centroids \(\hat{V}\) and the partition U at \(t=0\).

3. Initialize the weight matrix W with entries \(\frac{1}{D}\).

Repeat

4. Update the fuzzy partition \(U=(u_{il})^{t+1}\) using Eq. (41).

5. Update the centroids \(\{(\bar{\mu }_{ld}, \bar{\nu }_{ld}, \bar{\pi }_{ld})\}_{d=1}^{D}\) using Eqs. (55)–(57).

6. Update the weight matrix W using Eq. (50).

7. Optimize the weight matrix with the threshold value \(1/(\sqrt{D})^{\chi }\).

Until

8. \(\sum _{i=1}^{C}\frac{d_2(W_i(t),W_i(t+1))}{D} < \epsilon\) is satisfied.

Return

\(U^{t+1}\), \(\{(\bar{\mu }_{l}^{t+1}, \bar{\nu }_{l}^{t+1}, \bar{\pi }_{l}^{t+1})\}\), and \(W^{t+1}\).
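The stopping test of step 8 can be sketched as follows; here the distance \(d_2\) between successive weight rows is assumed to be Euclidean, which is our reading of the notation rather than a detail spelled out in the algorithm.

```python
import numpy as np

def weights_converged(W_old: np.ndarray, W_new: np.ndarray, eps: float) -> bool:
    """Step 8 of Algorithm V: sum over the C clusters of the (assumed
    Euclidean) distance d_2 between successive weight rows, scaled by D."""
    C, D = W_old.shape
    shift = np.linalg.norm(W_new - W_old, axis=1).sum() / D
    return shift < eps

# Usage: call after each iteration with the previous and updated weight matrices.
W_prev = np.full((3, 4), 0.25)
W_next = np.array([[0.16, 0.17, 0.35, 0.32]] * 3)
print(weights_converged(W_prev, W_next, eps=1e-5))   # False: weights still moving
```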

Example 1

We generate a two-dimensional Synthetic Dataset II of sample size 1000 using the Gaussian mixture \(\sum _{j=1}^{3}q_j\, N(\theta _{j},\sigma _{j})\), where \(\theta _j\) and \(\sigma _j\) denote the mean vector and the covariance matrix of the jth cluster, respectively. Synthetic Dataset II consists of three well-separated clusters with mixing proportions \(q_j = 1/3\), \(\forall \, 1 \le j \le 3\). The means and covariance matrices of the three clusters are \(\theta _{1} = (0~2)^T\), \(\theta _{2} = (4~7)^T\), \(\theta _{3} = (8~12)^T\), \(\sigma _{1} = \sigma _{2} = \begin{pmatrix} 10 & 0.01\\ 0.01 & 0.01 \end{pmatrix}\), and \(\sigma _{3} = \begin{pmatrix} 4 & 0\\ 0 & 1 \end{pmatrix}\).
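Under the reading above (a three-component bivariate Gaussian mixture), Synthetic Dataset II could be reproduced roughly as in the sketch below; the random seed, the per-cluster sample sizes, and the NumPy implementation are our own choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [334, 333, 333]        # mixing proportions q_j = 1/3 of 1000 samples
means = [np.array([0.0, 2.0]),
         np.array([4.0, 7.0]),
         np.array([8.0, 12.0])]
covs = [np.array([[10.0, 0.01], [0.01, 0.01]]),
        np.array([[10.0, 0.01], [0.01, 0.01]]),
        np.array([[4.0, 0.0], [0.0, 1.0]])]

X = np.vstack([rng.multivariate_normal(m, c, n)
               for m, c, n in zip(means, covs, sizes)])
labels = np.repeat([0, 1, 2], sizes)
print(X.shape)                 # (1000, 2): three well-separated clusters
```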

The threshold limit \(1/(\sqrt{D})^{\chi }\) of the FRT-equipped uW-IFCM algorithm nullifies the first feature because its relevancy is much lower than that of the second feature. The uW-IFCM then computes the CA and \(R_I\) values using only the second feature, giving a clustering accuracy \(CA=0.9950\) and a Rand index \(R_I=0.9960\) over Synthetic Dataset II. Hence, with a reduced computational cost, we obtain similar results; the criterion easily eliminates the irrelevant feature from the dataset.

Example 2

We generate a two-dimensional/featured Synthetic Dataset III of sample size 1000 using the Gaussian mixture \(\sum _{j=1}^{4}q_j\, N(\theta _{j},\sigma _{j})\), where \(\theta _j\) and \(\sigma _j\) denote the mean vector and the covariance matrix of the jth cluster, respectively. Synthetic Dataset III consists of four well-separated clusters with mixing proportions \(q_j = 1/4\), \(\forall \, 1 \le j \le 4\). The means and covariance matrices of the four clusters are \(\theta _{1} = (10~5)^T\), \(\theta _{2} = (10~10)^T\), \(\theta _{3} = (10~15)^T\), \(\theta _{4} = (10~20)^T\), and \(\sigma _{j} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\), \(\forall \, 1 \le j \le 4\).

We transform the two-dimensional/featured dataset into a three-dimensional/featured one using the mapping \(\chi (x, y) = (x \cos x,\, y,\, x\sin x)\). A two-dimensional cross-sectional view of the clustering is shown in Fig. 14, and in this case, we obtain a clustering accuracy \(CA=0.5888\) and a Rand index \(R_I=0.4999\). The threshold limit for this dataset is \(1/(\sqrt{3})^{\chi }\). Incorporating the feature reduction criterion \(1/(\sqrt{3})^{\chi }\) into uW-IFCM excludes the first feature during the clustering, and thus improved values \(CA= 0.9954\) and \(R_I=0.9960\) are obtained.
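The 3D embedding used above can be sketched as follows, assuming the 2D points of Synthetic Dataset III are stored row-wise in an array `X2` as in the previous example; the stand-in sample here is only meant to show the shapes involved.

```python
import numpy as np

def lift_to_3d(X2: np.ndarray) -> np.ndarray:
    """Apply the mapping (x, y) -> (x*cos(x), y, x*sin(x)) row-wise."""
    x, y = X2[:, 0], X2[:, 1]
    return np.column_stack((x * np.cos(x), y, x * np.sin(x)))

# Stand-in 2D sample; use the actual Synthetic Dataset III in practice.
X2 = np.random.default_rng(1).normal(loc=[10.0, 12.5], scale=1.0, size=(1000, 2))
X3 = lift_to_3d(X2)
print(X3.shape)   # (1000, 3)
```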

Example 3

Here, we use a two-cluster, six-dimensional dataset named Synthetic Dataset IV. The dataset contains two types of data points, and each data point has six features. For the first 100 data points, the values of the 1st, 3rd, and 5th features are drawn from a Gaussian distribution with mean 3 and variance 0.5, whereas the values of the 2nd, 4th, and 6th features are drawn from the uniform distribution on the interval [0, 10]. In a similar fashion, for the next 100 data points, the values of the 2nd, 4th, and 5th features are drawn from a Gaussian distribution with mean 7 and variance 0.5, and the remaining features, i.e., the 1st, 3rd, and 6th, are assigned uniformly distributed values in [0, 10].
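Following that description, Synthetic Dataset IV might be generated as in the sketch below; the seed, helper name, and library choices are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)
std = np.sqrt(0.5)   # the description gives variance = 0.5

def make_group(gauss_features, mean, n=100, n_features=6):
    """Gaussian values on the listed features, uniform [0, 10] elsewhere."""
    pts = rng.uniform(0.0, 10.0, size=(n, n_features))
    for f in gauss_features:                 # features are 1-indexed in the text
        pts[:, f - 1] = rng.normal(mean, std, size=n)
    return pts

group1 = make_group([1, 3, 5], mean=3.0)     # first 100 data points
group2 = make_group([2, 4, 5], mean=7.0)     # next 100 data points
X = np.vstack((group1, group2))              # 200 x 6 Synthetic Dataset IV
labels = np.repeat([0, 1], 100)
```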

The dataset is thus constructed such that the first cluster is embedded in the subspace of the 1st, 3rd, and 5th dimensions/features, whereas the second cluster is embedded in the subspace of the 2nd, 4th, and 5th dimensions, with the 6th feature uniformly distributed in both groups. In such datasets, subspace clustering gives effective results. A detailed subspace view of the dataset is presented in Fig. 12. The importance of the 5th feature in separating the two clusters is clearly visible in Fig. 12c and h. The uW-IFCM algorithm also verifies that the 5th feature is the most important one, as it computes the weight distribution \(W = [0.1215, 0.1102, 0.0934, 0.0956, {\textbf {0.5252}}, 0.0539]\). The CA and \(R_I\) are recorded as 0.9989 and 0.9954, respectively.

Table 21 Final number of features for synthetic datasets with varying \(\chi\)
Fig. 15
figure 15

Feature weight distribution of the three synthetic datasets using uW-IFCM

Table 22 Running time (in seconds) of uW-IFCM with original and final features of dataset

The weight distributions obtained using the uW-IFCM algorithm over different datasets are compared in Fig. 15. The figure clearly shows the differences in the weights of relevant and irrelevant features. Once the number of features is reduced, the running time also decreases (see Table 22). A detailed analysis of the parameter \(\chi\) is provided in Table 21.

6 Conclusion

In this paper, we have introduced a spherical path-based clustering algorithm called uW-IFCM and an ellipsoidal path-based algorithm known as bW-PIFCM. The uW-IFCM algorithm assigns a single weight to each feature, whereas bW-PIFCM is an adaptive algorithm that allocates a data-driven weight triplet to each feature. The feature weights in uW-IFCM are further tuned with the help of a weighing exponent \(\eta\). Here, the real-valued dataset has been transformed into an AIFS dataset using a novel intuitionistic fuzzification technique. The technique does not extract the membership and non-membership functions from the dataset itself; the absence of such an extraction process introduces uncertainty, for which the two parameters \(\alpha\) and \(\beta\) have been introduced. The uW-IFCM is not an adaptive algorithm, so it is computationally expensive. In order to reduce the cost of the uW-IFCM algorithm, we have introduced the FRT-equipped uW-IFCM algorithm, whose feature reduction technique eliminates irrelevant features from the dataset. The proposed uW-IFCM algorithm has been compared with the introduced bW-PIFCM algorithm and the PIFCM, IFCM, wIFCM, wFCM, and FCM algorithms over UCI machine learning datasets and some synthetic datasets. The feature weight distribution used in uW-IFCM is appropriate, so it handles irrelevant and noisy features more effectively than the other algorithms. Besides eliminating irrelevant features, uW-IFCM shrinks the search space of the fuzzy factor m, and hence the running time of the algorithm is reduced.

In the future, we aim to study real-world datasets containing outliers and noise using uW-IFCM. We will also try to formulate an AIFS extraction technique involving a multi-valued function. A further objective is to suggest an exponent \(\eta\)-based improvement of the proposed bW-PIFCM and to tune its weight triplets with \(\eta\). The FRT-equipped uW-IFCM algorithm deduces a threshold-dependent weight distribution, and thereby the relevant features get separated from the irrelevant ones; in the future, we will try to introduce an FRT-equipped bW-PIFCM algorithm. Here, the paper has established bW-PIFCM as an adaptive algorithm.

This future work is motivated by the limitations of the proposed approach, chiefly the need to implement the algorithm on real-valued datasets in a computationally efficient manner.