1 Introduction

Clustering algorithms are widely used in data mining, pattern recognition, and other related fields (Goicovich et al. 2021; Peng et al. 2022; Fu et al. 2023; Le et al. 2023). Common clustering algorithms include partition clustering, hierarchical clustering, density clustering, etc. (Peng et al. 2022). Notably, since being introduced by MacQueen (1967), the K-Means algorithm has received significant attention for its simplicity and efficiency. Recently, centroid initialization and feature weight optimization of the K-Means algorithm have become a research focus. K-Means exhibits a strong dependence on the initial centroid vector, and the conventional random selection of initial centroids leads to unstable clustering results. To this end, researchers have proposed various improved K-Means algorithms (Arthur and Vassilvitskii 2007; Li and Wu 2012; Mawati et al. 2014; Zhou et al. 2017; Rashidi et al. 2020; Ay et al. 2023). For example, Arthur and Vassilvitskii (2007) developed the K-Means++ algorithm, which determines the initial centroids by a probabilistic method and accelerates convergence. Ay et al. (2023) improved the stability and convergence efficiency of centroid calculation on big-data samples by fixing some of the initial centroids. Random selection of the initial centroids can easily cause two problems: (1) when the distance between the initial centroids is too large, the convergence of the K-Means algorithm slows down; (2) when the distance between the initial centroids is too small, it hampers the calculation of the final centroids and the determination of cluster boundaries during clustering. The random selection method has proved inadequate for multi-feature data. Moreover, in real life, partial-order relationships are common in multi-feature data (Wang et al. 2016), and traditional centroid initialization methods cannot account for them. Therefore, how to determine the initial centroids of K-Means through the partial-order relationships among multi-feature data remains to be studied.

For multi-feature data clustering, existing research mainly focuses on optimizing feature weights and distance metrics to improve the performance of the K-Means algorithm (Huang et al. 2005, 2018; Chen et al. 2012; De Amorim and Mirkin 2012). For instance, Huang et al. (2005) designed, based on a global optimization strategy, a new K-Means algorithm that automatically solves for feature weights. Similarly, De Amorim and Mirkin (2012) employed the Minkowski distance metric to construct an objective optimization function. These approaches rely on weighted feature distance metrics and construct objective optimization equations to improve the clustering performance of the K-Means algorithm. In recent years, more and more studies have begun to improve the K-Means algorithm from the perspective of multi-feature data fusion (Cheng et al. 2009; Pons-Vives et al. 2022). Researchers have employed operators such as the ordered weighted average (OWA) operator (Yager and Filev 1999) to aggregate data and thereby improve the performance of the K-Means algorithm. For example, based on the OWA operator, Pons-Vives et al. (2022) introduced the ordered weighted distance relative (OWDr) metric to enhance the K-Means algorithm's sensitivity to different features and to remedy the Euclidean distance's neglect of differences between features. Cheng et al. (2009) combined stepwise regression feature selection with the OWA operator to reduce multi-dimensional features and enhance the accuracy of K-Means in classification tasks. Despite these advancements, research on weighted fusion at the feature level, particularly during centroid calculation in the clustering process, remains scarce. Weighted calculation can also be applied at the data level: the differing importance of cluster members should be considered during centroid calculation, and noise data (Askari 2021) that would bias the result should be screened out. The OWA operator family and its weight methods can address these issues, so the combination of the OWA operator with K-Means centroid calculation deserves study.

To enhance the adaptability of the OWA operator in data fusion, Yager and Filev (1999) developed the induced ordered weighted average (IOWA) operator. Later, scholars conducted further research on the selection and acquisition of induced factors (Chiclana et al. 2007; Ma and Guo 2011; Yi et al. 2018; Ji et al. 2021). Building upon Yager's studies (1999), Chiclana et al. (2007) developed the Consistency-IOWA operator. Ma and Guo (2011) and Yi et al. (2018) introduced the Density-IOWA and Quantile-IOWA operators, respectively. Ji et al. (2021) developed the Average-IOWA operator for aggregating expert opinions. However, these operators mainly rely on single-factor induction, which can introduce bias and often yields numerous identical sorting outcomes when applied to big data. Multi-feature data such as vectors involve more than one influencing factor, and the relationships between factors should also be considered. Consequently, single-factor-induced sorting falls short of addressing the complexities of multi-feature data. Inspired by database indexing theory (Huang and Chen 2000), this paper proposes a new multi-feature-induced ordered weighted average (MFIOWA) operator to expand the applicable range of the IOWA operator.

For the IOWA operator family, however, feature weighting has always been a central concern, since different weight methods allow IOWA operators to play different roles. For example, Yager (1988) obtained OWA weights through linguistic quantifiers and introduced the Orness and Dispersion measures for weight optimization (Yager 1993; Yager and Filev 1994). O'Hagan (1988) and Xu and Da (2003) used the Orness measure to construct an objective optimization function for solving the weights. In multi-feature data processing, however, scholars screen samples and features using metrics such as density, variance, and the Pearson correlation coefficient (Naik and Kiran 2021), or identify the importance of features by sensitivity analysis (Zhang 2019; Asheghi et al. 2020). How to use such derived information to filter and aggregate multi-feature data poses a challenge to IOWA weight functions. Owing to its excellent characteristics of being continuous, smooth, differentiable, and saturating at a certain value, the Sigmoid function has drawn considerable interest from researchers (Xu et al. 2021; Dombi and Jónás 2022). For example, Dombi and Jónás (2022) proposed a generalization of the Sigmoid function and used it for logistic regression and preference modeling. To enable the proposed MFIOWA operator to address data-fusion tasks with different information-filtering requirements, and to provide K-Means with a more flexible, powerful, and noise-resistant data-fusion tool, a weight method and solution model based on the Sigmoid function need to be studied.

According to the above investigations, the K-Means algorithm and the IOWA operator are potent tools for data mining, but combining the IOWA operator with the K-Means algorithm to improve clustering performance remains challenging. The motivations of this paper are as follows.

  1. Although multi-feature weighted K-Means algorithms have received considerable attention from researchers, no K-Means variant combined with the OWA operator has been developed on the basis of the induced order relationships between features.

  2. The random selection of initial centroids in K-Means causes unstable clustering results, and the partial-order relationships between features need to be explored to improve the stability of K-Means.

  3. The IOWA operator performs only single-factor induction on the OWA operator; multi-factor induction of the OWA operator has not yet been designed.

In this paper, we develop the OWAK-Means clustering algorithm which is based on the proposed MFIOWA operator and centroid initialization method. The major contributions of this research are as follows.

  1. Inspired by the IOWA operator and database indexing theory (Yager and Filev 1999; Huang and Chen 2000), the MFIOWA operator is proposed. A new weight method based on the Sigmoid function is derived for the MFIOWA operator, and an objective optimization model is constructed to solve for the feature weights.

  2. Inspired by the K-Means algorithm (De Amorim and Mirkin 2012; Pons-Vives et al. 2022; Ay et al. 2023), the OWAK-Means algorithm is developed for multi-feature clustering; it integrates a new centroid initialization method and the MFIOWA operator.

  3. The OWAK-Means algorithm is compared with variant K-Means algorithms on UCI real-world datasets. The comparison and analysis results show that OWAK-Means has clear advantages over the other algorithms in clustering performance.

The subsequent sections of this study are structured as follows: Sect. 2 presents an extensive review of related work. Section 3 introduces the MFIOWA operator. Section 4 elucidates the OWAK-Means algorithm. Section 5 describes the comparison and analysis of OWAK-Means with other clustering algorithms. Finally, Sect. 6 summarizes the contributions of our research and discusses potential avenues for future exploration in this domain.

2 Related work

2.1 IOWA operator

Definition 1 (Yager and Filev 1999)

Let \(\left({a}_{1},{a}_{2},\dots ,{a}_{n}\right)\) be a set of real numbers, and \(\left(\left({u}_{1},{a}_{1}\right),\left({u}_{2},{a}_{2}\right),\dots ,\left({u}_{n},{a}_{n}\right)\right)\) be a set of numerical pairs. The \(n\)-dimensional IOWA operator is a mapping \({R}^{n}\to R\) characterized by an \(n\)-dimensional weight vector \(W={\left({w}_{1},{w}_{2},\dots ,{w}_{n}\right)}^{T}\), which satisfies \({w}_{i}\ge 0\) \(\left(i=1,2,\dots ,n\right)\) and \(\sum_{i=1}^{n}{w}_{i}=1\). The IOWA operator is defined by Eq. (1).

$${\text{IOWA}}\left(\left({u}_{1},{a}_{1}\right),\left({u}_{2},{a}_{2}\right),\dots ,\left({u}_{n},{a}_{n}\right)\right)={\sum \limits_{j=1}^{n}}{w}_{j}{c}_{j}.$$
(1)

In Eq. (1), \({c}_{j}\) represents \({a}_{i}\) which is associated with the j-th largest induced value \({u}_{i}\) in the sequence \(({u}_{1},{u}_{2}, \dots ,{u}_{n})\). The sequence can be in descending or ascending order, depending on the sort rule. Notably, the Orness measure of the IOWA operator, as referenced in Yager’s paper (1988, 1993), is the same as that of the OWA operator, as shown in Eq. (2).

$${\text{Orness}}(W)=\frac{1}{n-1} \sum \limits_{i=1}^{n}\left(n-i\right){w}_{i}.$$
(2)

The existing IOWA operator is characterized by a single-factor induction. However, real-world scenarios frequently involve a plurality of induced factors. To address this complexity, the MFIOWA operator is developed in Sect. 3.1.
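To make Eqs. (1) and (2) concrete, the following minimal Python sketch (an illustration only, not the authors' implementation; all function and variable names are chosen for this example) aggregates numerical pairs with the IOWA operator and computes the Orness measure of a weight vector.

```python
import numpy as np

def iowa(pairs, weights, descending=True):
    """IOWA aggregation of (induced value u_i, argument a_i) pairs, Eq. (1).

    The arguments a_i are reordered by their induced values u_i and then
    combined with the position weights w_j.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    # Sort the pairs by the induced value (descending by default).
    order = sorted(pairs, key=lambda p: p[0], reverse=descending)
    c = np.array([a for _, a in order], dtype=float)
    return float(weights @ c)

def orness(weights):
    """Orness measure of a weight vector, Eq. (2)."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    return float(sum((n - i) * w[i - 1] for i in range(1, n + 1)) / (n - 1))

# Example: three (u_i, a_i) pairs aggregated with weights (0.5, 0.3, 0.2).
print(iowa([(0.9, 10.0), (0.1, 2.0), (0.5, 6.0)], [0.5, 0.3, 0.2]))  # 10*0.5 + 6*0.3 + 2*0.2 = 7.2
print(orness([0.5, 0.3, 0.2]))                                       # 0.65
```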

2.2 K-Means clustering algorithm

The K-Means algorithm can be distilled into a specific objective optimization function (De Amorim and Mirkin 2012). Suppose a dataset has N entities, each with M attributes. This dataset is represented by the matrix \(X=\left({x}_{iv}\right)\), where \({x}_{iv}\) denotes the value of the v-th attribute for the i-th entity \(\left(i=1,2,\dots ,N,\; v=1,2,\dots ,M\right)\). The K-Means algorithm divides the dataset into K clusters, and the output is represented by \(C=\{{C}_{1},{C}_{2},\dots ,{C}_{K}\}\), where the centroid of each cluster is represented by \({\mu }_{k}=({c}_{kv})\). The minimization objective function of K-Means is articulated in Eq. (3).

$$D = \sum_{k = 1}^K {\sum_{i = 1}^N {\mathop \sum \limits_{v = 1}^M } } s_{ik} (x_{iv} - c_{kv} )^2$$
(3)

where \({s}_{ik}\) is an indicator function: \({s}_{ik}=1\) if \({x}_{i}\in {C}_{k}\), and \({s}_{ik}=0\) otherwise.
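The objective of Eq. (3) can be evaluated directly; a minimal sketch follows (illustrative names only, not tied to any specific implementation).

```python
import numpy as np

def kmeans_objective(X, centroids, labels):
    """Eq. (3): sum of squared distances of each entity to its cluster centroid.

    X        : (N, M) data matrix
    centroids: (K, M) centroid matrix
    labels   : (N,) cluster index of each entity (the indicator s_ik in compact form)
    """
    diff = X - centroids[labels]       # difference of each entity to its own centroid
    return float(np.sum(diff ** 2))

# Tiny example with two clusters in one dimension.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
centroids = np.array([[0.5], [9.5]])
labels = np.array([0, 0, 1, 1])
print(kmeans_objective(X, centroids, labels))  # 4 * 0.25 = 1.0
```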

To improve the performance of the K-Means algorithm, several challenges must be addressed: (a) selecting initial centroids, (b) optimizing feature weights in the context of multi-feature big data, (c) optimizing centroid calculation based on multi-feature weights, and (d) optimizing the distance metric. Issue (a) is reviewed in detail in Sect. 2.2.1, and issue (c) is explained in Sect. 2.2.2. This paper proposes an initial centroid selection algorithm based on partial-order relations to address issue (a). For challenges (b) and (c), a new MFIOWA operator is employed to filter interference data and noise data during data fusion. Regarding issue (d), this study employs an improved OWDr distance metric to calculate the distance between samples.

2.2.1 K-Means initial centroid selection method

To address the challenge of selecting centroids in K-Means clustering, researchers have made significant contributions (Arthur and Vassilvitskii 2007; Li and Wu 2012; Mawati et al. 2014; Wang et al. 2016; Zhou et al. 2017; Nainggolan et al. 2019; Khan et al. 2019; Rashidi et al. 2020; Ay et al. 2023). Arthur and Vassilvitskii (2007) proposed the K-Means++ algorithm, which utilizes a probabilistic method for computing distances to determine the initial centroids of K-Means. This method starts from a randomly chosen point, subsequently selects the farthest point as the next initial centroid, and iteratively identifies K initial centroids, ensuring substantial distances between clusters. This method greatly improved the accuracy and convergence speed of K-Means, but the random selection of cluster centroids still challenges the accuracy and stability of the K-Means algorithm. There are also many other studies on centroid selection for clustering. For instance, Li and Wu (2012) employed the Max–Min distance algorithm to address the dependency of the K-Means algorithm on initial centroid selection. Mawati et al. (2014) introduced a systematic method for initial centroid determination, which iteratively identifies 0.75 * (n/K) points in closest proximity to each other and obtains each centroid as their arithmetic mean.

In the context of big data, Steinbach et al. (2000) introduced the Bisecting K-Means clustering algorithm to reduce the dependency of the clustering process on the selection of random points. The Bisecting K-Means algorithm employs a dichotomy to prevent K-Means from converging to a local optimum and reduces similarity computations to improve the overall execution speed. At the same time, the K-Means algorithm faces new challenges in big-data environments. Sculley (2010) developed the Mini-Batch K-Means algorithm, which employs mini-batch optimization to reduce clustering time, thereby improving the K-Means algorithm's adaptability to big-data scenarios. Ay et al. (2023) introduced a new hybrid method that combines partly fixed cluster centroids with dynamic centroid determination. This method is effective in high-dimensional feature spaces and on big-data datasets, and it overcomes the problems of initial centroid selection and centroid solution in big-data environments.

These studies have investigated various strategies to optimize the selection of initial centroids to enhance the stability and convergence efficiency of the K-Means algorithm. Nevertheless, in professional domains, the inherent partial-order relationships of data are often overlooked. Therefore, this paper proposes setting initial centroids based on partial-order relationships to address this gap.

2.2.2 K-Means centroid optimization method

For the high-dimensional, multi-feature clustering problem, Eq. (3) suffices when the feature weights are equal. In practical applications, however, different features often possess different levels of importance. Researchers have therefore developed K-Means algorithms with feature weight optimization (de Amorim 2016). For example, Modha and Spangler (2003) introduced the convex K-Means (CK-Means) algorithm based on Eq. (3). CK-Means is designed to minimize intra-cluster dispersion while maximizing inter-cluster dispersion, and the weight relationships among features are linear. Consequently, the CK-Means approach may not be apt for scenarios where feature relationships exhibit non-linear dependencies.

To handle non-linear relationships in feature weights, scholars have primarily focused on exponentiated weights (\({w}^{\beta }\)) (Makarenkov and Legendre 2001; Frigui and Nasraoui 2004) to refine the interaction among features. Makarenkov and Legendre (2001) utilized \({w}^{2}\) to improve the K-Means algorithm, and Frigui and Nasraoui (2004) introduced a regularization method to adjust \({w}^{2}\). Inspired by Makarenkov and Legendre (2001), Chan et al. (2004) and Huang et al. (2005) developed the Weighted K-Means (WK-Means) algorithm. WK-Means automatically calculates the weight of each feature during the clustering process and integrates a feature-weight computation step into each iteration. The objective optimization function of WK-Means is shown in Eq. (4).

$$D = \mathop \sum \limits_{k = 1}^K \mathop \sum \limits_{i = 1}^N \mathop \sum \limits_{v = 1}^M s_{ik} w_v^\beta (x_{iv} - c_{kv} )^2 ,$$
(4)

where \({w}_{v}=\frac{1}{\sum_{u\in V}{({D}_{v}/{D}_{u})}^{1/(\beta -1)}}\), and \(D_v = \sum_{k = 1}^K \sum_{i = 1}^N s_{ik} (x_{iv} - c_{kv} )^2\). In contrast to the approach of Modha and Spangler (2003), WK-Means offers a defense against irrelevant or noisy features and emphasizes non-linear relationships between feature weights. However, WK-Means compromises the direct correlation between the scale of feature values and the feature weights. Inspired by Chan et al. (2004) and Huang et al. (2005), De Amorim and Mirkin (2012) introduced a weighted K-Means based on the Minkowski distance (MWK-Means); its optimization objective function is presented in Eq. (5).

$$D = \mathop \sum \limits_{k = 1}^K \mathop \sum \limits_{i = 1}^N \mathop \sum \limits_{v = 1}^M s_{ik} w_v^\beta (x_{iv} - c_{kv} )^\beta$$
(5)

Equation (5) determines weights using feature scaling factors. The MWK-Means criterion can be rewritten as \(D=\sum_{k=1}^{K}\sum_{i=1}^{N}\sum_{v=1}^{M}{s}_{ik}{|{w}_{v}{x}_{iv}-{w}_{v}{c}_{kv}|}^{\beta }=\sum_{k=1}^{K}\sum_{i=1}^{N}{s}_{ik}\,{d}^{\beta }\left({x}_{i}{\prime},{c}_{k}{\prime}\right)\), where \({x}_{i}{\prime}=({w}_{v}{x}_{iv})\) and \(d\) is the Minkowski distance. Under the Minkowski distance metric, the MWK-Means algorithm thus transforms feature weights into scaling factors within the original K-Means criterion. The MWK-Means algorithm performs well on datasets with additional uniform random noise features, but determining a reasonable parameter \(\beta\) is a challenge. Moreover, in big-data environments, the burden of calculating feature weights increases significantly. Compared with the method proposed by Huang et al. (2005), the method of De Amorim and Mirkin (2012) is more flexible: while keeping the feature weights non-linear, it associates the distance metric with the feature weights. However, the feature weights calculated by Huang's and De Amorim's methods may not highlight representative features well; when the feature dimension increases, the weight differences between features become less pronounced (Hashemzadeh et al. 2019). Therefore, the weighted clustering algorithm needs to be improved.
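As a concrete illustration of the WK-Means weight update of Eq. (4), the following sketch computes the per-feature dispersions \(D_v\) and the resulting weights. It is a minimal example under the assumption that a current partition and centroids are available; all names are chosen for this example and are not the authors' implementation.

```python
import numpy as np

def wkmeans_feature_weights(X, centroids, labels, beta=2.0):
    """Feature weights of Eq. (4): w_v = 1 / sum_u (D_v / D_u)^(1/(beta-1)),
    where D_v is the within-cluster dispersion of feature v."""
    D_v = np.sum((X - centroids[labels]) ** 2, axis=0)    # dispersion per feature
    D_v = np.where(D_v == 0, np.finfo(float).eps, D_v)    # guard against division by zero
    ratio = D_v[:, None] / D_v[None, :]                   # (M, M) matrix of D_v / D_u
    w = 1.0 / np.sum(ratio ** (1.0 / (beta - 1.0)), axis=1)
    return w                                               # weights sum to 1 by construction

# Example: the noisier second feature receives the much smaller weight.
X = np.array([[0.0, 0.0], [0.2, 3.0], [5.0, 0.0], [5.2, 3.0]])
centroids = np.array([[0.1, 1.5], [5.1, 1.5]])
labels = np.array([0, 0, 1, 1])
print(wkmeans_feature_weights(X, centroids, labels, beta=2.0))
```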

Equations (4) and (5) perform global feature optimization, so the computational cost becomes high when the amount of data is large. To address this problem, Hashemzadeh et al. (2019) weighted features within clusters and developed a new objective function: \(D_k = \sum_{i = 1}^N \sum_{v = 1}^M u_{ik}^\alpha w_{kv}^\beta (x_{iv} - c_{kv} )^2\). For high-dimensional sparse big-data clustering, Jing et al. (2007) developed a K-Means algorithm with an entropy weight method and constructed the objective function \(D = \sum_{k = 1}^K \left[ \sum_{i = 1}^N \sum_{v = 1}^M w_{kv} (x_{iv} - c_{kv} )^2 + \gamma \sum_{v = 1}^M w_{kv} {\text{log}}(w_{kv} ) \right]\). Jing's method used locally weighted features within clusters and an entropy penalty term to improve the algorithm's performance. Inspired by Jing et al. (2007), Khan et al. (2019) introduced a weighted entropy penalty term and proposed a new objective optimization function.

The core of the above improvements is to optimize the feature weights based on different weighted distances. Commonly used distance functions include the Euclidean, Manhattan, and Chebyshev distances: the Euclidean distance is simple to calculate, the Manhattan distance is more suitable for datasets with independent features, and the Chebyshev distance is robust. The Minkowski distance (De Amorim and Mirkin 2012) is their generic form and offers considerable flexibility. Scholars have also improved K-Means from the perspective of the distance metric (Pons-Vives et al. 2022; Singh and Singh 2023; Savita and Siwch 2024). For example, Pons-Vives et al. (2022) utilized the OWA operator to define the ordered weighted distance relative (OWDr) to further advance clustering methodologies. The distance formula proposed by Pons-Vives et al. (2022) is \(d_r (a,b) = |a - b|/{\text{max}}(a,b)\). The optimization objective function based on OWDr is shown in Eq. (6),

$$D = \mathop \sum \limits_{k = 1}^K \mathop \sum \limits_{i = 1}^N s_{ik} {\text{OWDr}}(x_{iv} ,c_{kv} ).$$
(6)

The OWDr significantly enhances the distinctiveness of information across varying feature dimensions, which further increases the likelihood and speed of convergence. The OWA operator (Yager and Filev 1999) is an aggregation method that can effectively filter features, and Cheng et al. (2009) adopted a stepwise regression feature selection method employing the OWA operator for dimensionality reduction, which has been shown to improve the accuracy of K-Means in high-complexity, high-dimensional classification tasks.

Reviewing the existing literature, it becomes clear that current weighted-feature K-Means algorithms primarily optimize performance by focusing on the weighted distance of features. The differing importance of data samples, and the presence of interference samples within clusters, are not taken into account, so how to screen and perform weighted aggregation of multi-feature data is of considerable research significance. In contrast, this study proposes a novel approach that emphasizes partial-order relations among different features and the significance of data samples within clusters, aiming to further enhance the performance of the K-Means algorithm.

3 Multi-feature-induced ordered weighted averaging operator

3.1 MFIOWA operator based on partial-order relation

Multi-feature ordering is one of the key techniques of database indexing and is often used to sort multi-feature or high-dimensional complex data (Huang and Chen 2000; Liang et al. 2021). Multi-feature sorting determines the order of data based on the importance and internal rules of multiple features, so as to meet different needs and scenarios and achieve more accurate and comprehensive sorting results.

Data aggregation is critical in production and everyday life, particularly in big-data contexts, and relying on a single feature is often inadequate for real-world sorting tasks. Consider, for instance, five students at a university whose final grades need to be comprehensively ranked using five features: Advanced Mathematics, Python, Computer Networks, Sports, and Total Score. The Sports score is classified into three levels (A, B, C) in descending order. The other features are numerical, ranging from 0 to 100, where higher values denote superior performance, as depicted in Table 1.

Table 1 Final grades of sophomore students in a university

Employing the Total Score for initial sorting identifies the top student (ID 15), but the priority relationship among the students with IDs 11–14 cannot be distinguished. Using the Advanced Mathematics feature for further induced sorting places the student with ID 11 at the lowest rank, but the students numbered 12, 13, and 14 still cannot be distinguished. The remaining students are then sequenced by the three features Python, Computer Networks, and Sports until all students are sorted, and the final ranking result is (15, 13, 14, 12, 11).

As can be seen from Table 1, according to the database retrieval rules, an induced vector containing multiple features is formed by (Total Score, Advanced Mathematics, Python, Computer Networks, and Sports), adhering to the following relation: Total Score \(\succcurlyeq\) Advanced Mathematics \(\succcurlyeq\) Python \(\succcurlyeq\) Computer Networks \(\succcurlyeq\) Sports. In this context, the rank of student grades can be derived based on the induced vector. For instance, the data of student 15 is composed of ((263, 91, 90, 82, C), 263), where (263, 91, 90, 82, C) denotes the induced vector, and 263 represents the score corresponding to the induced vector. The rank result of student 15 is 1.

In this case, the sequence of multi features participating in the ranking is regarded as the induced partial-order relation. Drawing inspiration from database indexing theory and the IOWA operator, we designed the MFIOWA as delineated in Definition 2.

Definition 2

Let \(D=\left\{({U}_{i},{a}_{i})\right\}\) be a two-dimensional dataset where \({U}_{i}=\left({u}_{i1},{u}_{i2},\dots ,{u}_{im}\right)\) is an induced vector corresponding to the real number \({a}_{i}\) and encompassing m features. These features uphold a partial-order relationship: \({e}_{1}\succcurlyeq {e}_{2}\succcurlyeq \dots \succcurlyeq {e}_{m}\). The MFIOWA operator, characterized as a mapping \({R}^{n}\to R\), is defined by an n-dimensional weight vector \(W={\left({w}_{1},{w}_{2},\dots ,{w}_{n}\right)}^{T}\), which satisfies \({w}_{i}\ge 0\) \(\left(i=1,2,\dots ,n\right)\) and \(\sum_{i=1}^{n}{w}_{i}=1\). The MFIOWA operator is shown as Eq. (7).

$${\text{MFIOWA}}\left(\left({U}_{1},{a}_{1}\right),\left({U}_{2},{a}_{2}\right),\dots ,\left({U}_{n},{a}_{n}\right)\right)= \sum \limits_{j=1}^{n}{w}_{j}{d}_{j}$$
(7)

In Eq. (7), \({d}_{j}\) is defined as the j-th largest value of \({a}_{i}\) after the application of a multi-feature partial order on the induced vectors \({U}_{i}\). The behavior of the MFIOWA operator is contingent on the composition of \({U}_{i}\): when \({U}_{i}\) encompasses a single feature, the MFIOWA operator reduces to an IOWA operator, and when \({U}_{i}\) is an empty set, it reduces to an OWA operator. The sorting rule applied to the feature set \(\left({e}_{1},{e}_{2},\dots ,{ e}_{m}\right)\) determines the type of multi-feature partial-order induction. If all features are sorted in ascending order, it is called ascending multi-feature partial-induced order; if all are sorted in descending order, it is called descending multi-feature partial-induced order; and if ascending and descending features are mixed, it is called mixed multi-feature partial-induced order. In this paper, the ascending multi-feature partial-induced order is used. Compared with the IOWA operator, the MFIOWA operator considers the effect of multiple features on data aggregation as well as the partial-order relationships between those features. The MFIOWA operator resembles multi-feature indexing techniques (Liang et al. 2021): it utilizes expert-constructed sorting rules to rank and aggregate target data, with the features' partial-order relationships significantly affecting the sorting outcomes. The multi-feature sorting technique (Chen et al. 2022) offers the flexibility to modify, add, or remove features and allows the definition of various feature combinations and partial-order relationships, so that it can comprehensively and accurately meet different sorting requirements and adapt to changes in decision-makers' perspectives.
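A minimal sketch of Definition 2 follows, interpreting the multi-feature partial-order induction as a lexicographic comparison of the induced vectors (consistent with the student example above). The code is an illustration only; the data and names are invented for this example.

```python
import numpy as np

def mfiowa(induced_vectors, values, weights, descending=True):
    """MFIOWA aggregation, Eq. (7): sort the values a_i by a lexicographic
    (multi-feature partial-order) comparison of their induced vectors U_i,
    then combine the reordered values d_j with the position weights w_j.

    induced_vectors : list of tuples, each U_i ordered as e_1 >= e_2 >= ... >= e_m
    values          : list of real numbers a_i
    weights         : position weights, non-negative and summing to 1
    """
    weights = np.asarray(weights, dtype=float)
    # Lexicographic comparison: ties on the first feature are broken by the
    # second feature, and so on, realising the induced partial order.
    order = sorted(range(len(values)), key=lambda i: induced_vectors[i], reverse=descending)
    d = np.array([values[i] for i in order], dtype=float)
    return float(weights @ d)

# Made-up two-feature induced vectors (Total Score, Advanced Mathematics):
# ties on the total score are resolved by the mathematics score before weighting.
U = [(263, 91), (250, 80), (250, 85), (250, 85), (240, 70)]
a = [263, 250, 250, 250, 240]
print(mfiowa(U, a, [0.4, 0.25, 0.15, 0.1, 0.1]))
```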

The MFIOWA operator merges the idea of multi-feature partial induced order with the OWA operator's ability to reorder and aggregate data (Yager and Filev 1999) according to multiple features and their respective partial-order relationships. The MFIOWA operator exhibits comprehensiveness, stability, and heterogeneity, and its aggregated results reflect the significance of the features and the overall situation of the samples more accurately. At the same time, MFIOWA expands the application range of the IOWA operator by overcoming the limitation of a single induced factor in IOWA. Like the IOWA operator, the MFIOWA operator satisfies the properties of monotonicity, commutativity, boundedness, and idempotency, which are derivable from its operational rules.

Property 1 (Monotonicity)

Let \(\left({U}_{i},{a}_{i}\right)\) and \(\left({U}_{i},{a}_{i}{\prime}\right) \left(i=\mathrm{1,2},\dots ,n\right)\) be two vector pairs, where \({U}_{i}\) is an induced vector, and \({a}_{i}\) and \({a}_{i}{\prime}\) are two real numbers. For any \(i\in [1,n]\), if \({a}_{i}\) and \({a}_{i}{\prime}\) satisfy the condition: \({a}_{i}\ge {a}_{i}{\prime}\), then Eq. (8) is established.

$${\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right)\ge {\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}{\prime}\right)\right)$$
(8)

Proof

Let \({d}_{j}\) and \({d}_{j}{\prime}\) be the j-th largest elements of \({a}_{i}\) and \({a}_{i}{\prime}\) after the sorting induced by the multi-feature partial order \(\left(j=1,2,\dots ,n\right)\). Since the induced vectors \({U}_{i}\) are the same, the two rankings are identical, so \({d}_{j}\) and \({d}_{j}{\prime}\) correspond to the same position and \({d}_{j}\ge {d}_{j}{\prime}\) for every \(j\). Because the weights satisfy \({w}_{j}\ge 0\) and \(\sum_{j=1}^{n}{w}_{j}=1\), it follows that \(\sum_{j=1}^{n}{w}_{j}{d}_{j}\ge \sum_{j=1}^{n}{w}_{j}{d}_{j}{\prime}\). Thus, Eq. (8) holds. □

Property 2 (Commutativity)

Let \(\left({U}_{i},{a}_{i}\right)\) be a vector pair, and \({a}_{i}\) be a real number. If \(\left({U}_{i}{\prime},{a}_{i}{\prime}\right)\) is any arrangement of \(\left({U}_{i},{a}_{i}\right) \left(i=\mathrm{1,2},\dots ,n\right)\), then Eq. (9) is established

$${\rm {MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right)={\rm {MFIOWA}}\left(\left({U}_{i}{\prime},{a}_{i}{\prime}\right)\right).$$
(9)

Proof

Let \({d}_{j}\) and \({d}_{j}{\prime}\) be the j-th largest elements of \({a}_{i}\) and \({a}_{i}{\prime}\) after the sorting induced by the multi-feature partial order \(\left(j=1,2,\dots ,n\right)\). Because \(\left({U}_{i}{\prime},{a}_{i}{\prime}\right)\) is a permutation of \(\left({U}_{i},{a}_{i}\right)\), both share the same ordering result, so \({d}_{j}={d}_{j}{\prime}\) for every \(j\). For any weights \({w}_{j}\) satisfying \({w}_{j}\ge 0\) and \(\sum_{j=1}^{n}{w}_{j}=1\), it follows that \({\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right)={\sum }_{j=1}^{n}{w}_{j}{d}_{j}={\sum }_{j=1}^{n}{w}_{j}{d}_{j}{\prime}={\text{MFIOWA}}\left(\left({U}_{i}{\prime},{a}_{i}{\prime}\right)\right)\). Equation (9) is proved. □

Property 3 (Boundedness)

Let \(\left({U}_{i},{a}_{i}\right)\) \(\left(i=\mathrm{1,2},\dots ,n\right)\) be a vector pair, and \({a}_{i}\) be a real number. For any \(i\in [1,n]\), if \({a}_{{\text{min}}}={\text{min}}\left({a}_{i}\right)\), \({a}_{{\text{max}}}={\text{max}}\left({a}_{i}\right)\), then Eq. (10) is established.

$${a}_{{\text{min}}}\le {\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right)\le {a}_{{\text{max}}}$$
(10)

Proof

Because \({d}_{j}\) is the j-th largest element of \({a}_{i}\) after sorting induced by a multi-feature partial order \(\left(j=\mathrm{1,2},\dots ,n\right)\), and \({d}_{j}\) is a permutation of \({a}_{i}\), it follows that \({a}_{{\text{min}}}\le {d}_{j}\le {a}_{{\text{max}}}\). Therefore, the following equation can be obtained: \(\begin{aligned}&{\sum }_{j=1}^{n}{w}_{j}{a}_{{\text{min}}}\le {\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right) \\ & \quad ={\sum }_{j=1}^{n}{w}_{j}{d}_{j}\le {\sum }_{j=1}^{n}{w}_{j}{a}_{{\text{max}}} \end{aligned}\). Also, since \(\sum_{j=1}^{n}{w}_{j}=1\), \(\sum_{j=1}^{n}{w}_{j}{a}_{{\text{min}}}={a}_{{\text{min}}}\sum_{j=1}^{n}{w}_{j}\), and \(\sum_{j=1}^{n}{w}_{j}{a}_{{\text{max}}}={a}_{{\text{max}}}\sum_{j=1}^{n}{w}_{j}\), it can be demonstrated that Eq. (10) holds. □

Property 4 (Idempotency)

Let \(\left({U}_{i},{a}_{i}\right)\) be a vector pair. If \({a}_{1}=\dots ={a}_{n}=a\), then Eq. (11) is established

$${\text{MFIOWA}}\left(\left({U}_{i},{a}_{i}\right)\right)=a.$$
(11)

Proof

Since \(\sum_{j=1}^{n}{w}_{j}=1\) and \({d}_{j}=a\) for every \(j\), it follows that \(\sum_{j=1}^{n}{w}_{j}{d}_{j}=a\sum_{j=1}^{n}{w}_{j}=a\). Thus, Eq. (11) holds. □

3.2 Derivation of feature weights based on the Sigmoid function

The Sigmoid function is distinguished by its properties of monotonicity, continuity, differentiability, and the monotonicity of its inverse function. In addition, its value range is confined within [0, 1] (Ying et al. 2021; Sun et al. 2021). The standard form of the Sigmoid function, \(1/({e}^{-x}+1)\), is notable for its S-shaped curve, which is symmetrically centered at the point (0, 0.5) (Ying et al. 2021). Inspired by Sun et al. (2021), the Sigmoid function is used to derive the feature weights of the MFIOWA operator by introducing a parameter \(\lambda\), leading to the modified Sigmoid function \({{\text{sigm}}}^{\lambda }\) presented in Eq. (12). The parameter \(\lambda\) modifies the curvature of the Sigmoid function, thereby making the modified function a more adaptable tool for weight allocation in data aggregation.

$${{\text{sigm}}}^{\lambda }\left(r\right)=\frac{1}{{e}^{-\lambda \left(r-0.5\right)}+1},r\in \left[\mathrm{0,1}\right],\lambda \in R$$
(12)

In Eq. (12), while retaining the S-shaped curve of the traditional Sigmoid function, the \({{\text{sigm}}}^{\lambda }\) function introduces a parameter \(\lambda\) to modulate the sharpness and direction of the weight curve. Notably, the \({{\text{sigm}}}^{\lambda }\) function exhibits central symmetry, passing through the point (0.5, 0.5). The parameter \(\lambda\) is instrumental in determining the orientation of the Sigmoid function curve. When \(\lambda >0\), the \({{\text{sigm}}}^{\lambda }\) function takes on an S shape. Conversely, when \(\lambda <0\), the \({{\text{sigm}}}^{\lambda }\) function takes on an inverted S shape. The magnitude of \(\left|\lambda \right|\) is critical in controlling how changes in the input influence the slope of the \({{\text{sigm}}}^{\lambda }\) function curve. A larger \(\left|\lambda \right|\) results in more pronounced slope changes, rendering the curve akin to a step function. Conversely, a smaller \(\left|\lambda \right|\) yields more gradual slope changes, closely resembling a line parallel to the x-axis. In the special case, when \(\lambda =0\), the Sigmoid function simplifies to an equal-weight function curve. Furthermore, when \(\lambda =n\), the Sigmoid function curve becomes a special instance of the \({{\text{sigm}}}^{\lambda }\) function curve. By adjusting \(\lambda\), the \({{\text{sigm}}}^{\lambda }\) function can accommodate S-shaped curves with varying curvatures, thereby enhancing information filtering and processing capabilities. The flexibility of the Sigmoid function allows the weight method to be tailored to different application contexts.

Additionally, the \({{\text{sigm}}}^{\lambda }\) function curve, as defined in Eq. (12), is centrally symmetric. To better fit real-world data, and improve the diversity and flexibility of the MFIOWA weight method, a translation transformation is introduced into the \({{\text{sigm}}}^{\lambda }\) function. This modification leads to the development of the \({{\text{sigm}}}^{\lambda , \theta }\) function, as shown in Eq. (13)

$${{\text{sigm}}}^{\lambda , \theta }\left(r\right)=\frac{1}{{e}^{-\lambda \left(r-0.5-\theta \right)}+1},r\in \left[\mathrm{0,1}\right],\lambda \in R.$$
(13)

Based on Eq. (12), Eq. (13) adds the parameter \(\theta\) to control the filtering proportion of the sample data. The sign of \(\theta\) determines the direction in which the curve center is offset. Assuming \(\lambda\) is positive, the \({{\text{sigm}}}^{\lambda , \theta }\) function is increasing regardless of whether \(\theta\) is greater than or less than 0. In particular, when \(\lambda =n\) and \(\theta =0\), the \({{\text{sigm}}}^{\lambda , \theta }\) function degenerates into the standard Sigmoid function. The magnitude of \(|\theta |\) controls how far the curve deviates from the S-shaped curve. Assuming \(\lambda\) is positive, when \(\theta >0\), the larger \(|\theta |\) is, the more the function curve resembles an exponential curve and the larger its range becomes; when \(\theta <0\), the larger \(|\theta |\) is, the more the function curve tends towards an equidistant line and the smaller its range becomes. The parameter \(\theta\) introduces subjective attitudes into the weights and allows the \({{\text{sigm}}}^{\lambda , \theta }\) function to be adjusted according to the judged size of the important group. Varying \(\theta\) controls the degree and balance of information filtering, so the \({{\text{sigm}}}^{\lambda , \theta }\) function has greater flexibility and applicability.

Compared with Xu's weight method built upon the normal distribution (Xu and Da 2003), the weight method using the \({{\text{sigm}}}^{\lambda , \theta }\) function can provide both linear and non-linear feature weights for S-shaped distribution requirements. The new method makes it easier for the OWA operator family to filter out important information and to emphasize its proportion. Therefore, the \({{\text{sigm}}}^{\lambda , \theta }\) function can provide the weight vector with monotonicity, boundedness, and information-filtering capabilities. Inspired by the ideas of Xu and Da (2003), a weight method for the MFIOWA operator based on the \({{\text{sigm}}}^{\lambda , \theta }\) function is proposed, as shown in Definition 3.

Definition 3

Let \({W}_{{\text{Sigmoid}}}={\left({w}_{1},{w}_{2},\dots ,{w}_{n}\right)}^{T}\) be a weight vector, where \({w}_{i}\ge 0\) and \({\sum }_{1}^{n}{w}_{i}=1\). The weight calculation method of the MFIOWA operator based on the Sigmoid function is given by Eq. (14)

$${w}_{i}=\frac{{{\text{sigm}}}^{\lambda , \theta }\left(\frac{i}{n}\right)}{{\sum }_{i=1}^{n}{{\text{sigm}}}^{\lambda , \theta }\left(\frac{i}{n}\right)}, i=\mathrm{1,2},...,n.$$
(14)

In Eq. (14), \({{\text{sigm}}}^{\lambda , \theta }\) is calculated as shown in Eq. (13). To facilitate control of the function shape, the input is normalized to the form \(i/n\), whose values lie in [0, 1]. If the standard Sigmoid function were used directly with inputs between 0 and 1, the weight curve produced by Eq. (14) would be nearly a straight line. Therefore, additional parameters are introduced into the Sigmoid function to address this problem, as shown in Eq. (13). By adjusting the parameters \(\theta\) and \(\lambda\), Eq. (14) can fit both linear and non-linear weight requirements.
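A minimal sketch of Eqs. (12)–(14) follows (illustrative only; names are chosen for this example): it evaluates \({{\text{sigm}}}^{\lambda ,\theta }\) at the normalized positions \(i/n\) and normalizes the results into a weight vector.

```python
import numpy as np

def sigm(r, lam, theta=0.0):
    """Modified Sigmoid function of Eqs. (12)-(13): 1 / (exp(-lam*(r - 0.5 - theta)) + 1)."""
    return 1.0 / (np.exp(-lam * (r - 0.5 - theta)) + 1.0)

def mfiowa_weights(n, lam, theta=0.0):
    """Normalized MFIOWA weights of Eq. (14): w_i = sigm(i/n) / sum_i sigm(i/n)."""
    r = np.arange(1, n + 1) / n
    s = sigm(r, lam, theta)
    return s / s.sum()

# lam controls the sharpness/direction of the S-curve, theta shifts its centre.
print(mfiowa_weights(5, lam=0.0))    # equal weights when lam = 0
print(mfiowa_weights(5, lam=8.0))    # S-shaped, emphasising later positions
print(mfiowa_weights(5, lam=-8.0))   # inverted S, emphasising earlier positions
```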

When \(\lambda\) and \(\theta\) take different values, the \({{\text{sigm}}}^{\lambda , \theta }\) function, and with it the weight vector, changes dynamically. Therefore, when solving for the \({W}_{{\text{Sigmoid}}}\) vector using Eq. (14), determining the values of \(\lambda\) and \(\theta\) adds computational cost, and in practical applications metrics such as Orness and Dispersion need to be considered. However, for large datasets, the computational cost of Dispersion is higher than that of the Gini index (Breiman et al. 1984). Additionally, the Gini index applies to both discrete and continuous data. Hence, this paper employs Orness and the Gini index to measure the MFIOWA weights. Let \(W={\left({w}_{1},{w}_{2},\dots ,{w}_{n}\right)}^{T}\) be a weight vector; the Gini index of \(W\) is given by Eq. (15).

$${\text{Gini}}(W)={\sum }_{i=1}^{n}{\sum }_{j=1,i \ne j}^{n}{w}_{i}{w}_{j}=1-{\sum }_{i=1}^{n}{w}_{i}^{2}$$
(15)

It is easily seen from Eq. (15) that the Gini index of the weight vector \(W\) ranges from 0 to 1. The closer the Gini index is to 1, the higher the uncertainty within the dataset; the closer it is to 0, the greater the degree of purity. The Gini index is thus a valuable tool for evaluating the weight distribution of the MFIOWA operator and the relative significance of each sample in the population.

For weight optimization of the OWA operator, the objective function proposed by O'Hagan (1988) offers substantial insight. Inspired by O'Hagan, a weight optimization model of the MFIOWA operator is introduced that integrates Orness and the Gini index: the objective function minimizes the Gini index within the bounds set by a pre-determined level of Orness. The weight optimization model is encapsulated in model (M-1).

$$\begin{array}{ll} {\text{Minimize:}} & 1 - \sum_{i = 1}^n w_i^2 \\ {\text{Subject to:}} & \frac{1}{n - 1}\sum_{i = 1}^n \left( {n - i} \right)w_i = \alpha \\ & w_i = \frac{{\text{sigm}}^{\lambda , \theta } \left( \frac{i}{n} \right)}{\sum_{i = 1}^n {\text{sigm}}^{\lambda , \theta } \left( \frac{i}{n} \right)} \\ & \alpha \in \left[ {0,1} \right],\; w_i \in \left[ {0,1} \right],\; i = 1,2, \ldots ,n \\ \end{array}$$

In model (M-1), the objective is to find the optimal weights \({w}_{i}\) that minimize the Gini index for a given Orness measure, denoted \(\alpha\). The Orness constraint ensures that the weight vector conforms to human subjective attitudes and enhances the model's applicability to human-like decision-making scenarios. The weight constraint of model (M-1) incorporates the \({\text{sigm}}^{\lambda ,\theta }\) function through Eqs. (13) and (14). To solve model (M-1) efficiently, this study employs the Sequential Least Squares Quadratic Programming (SLSQP) method (Marques et al. 2021).
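The following sketch shows one plausible way to solve model (M-1) with SciPy's SLSQP routine, treating \(\lambda\) and \(\theta\) as the free variables behind the Sigmoid-based weights. It is an illustration of the model only, not the authors' implementation; the starting point and all names are assumptions of this example.

```python
import numpy as np
from scipy.optimize import minimize

def solve_m1(n, alpha, x0=(1.0, 0.0)):
    """Solve model (M-1) with SLSQP: find (lam, theta) so that the Sigmoid-based
    weights of Eq. (14) minimise the Gini index (Eq. (15)) subject to Orness(W) = alpha."""
    def weights(params):
        lam, theta = params
        r = np.arange(1, n + 1) / n
        s = 1.0 / (np.exp(-lam * (r - 0.5 - theta)) + 1.0)
        return s / s.sum()

    def gini(params):                    # objective: 1 - sum w_i^2
        w = weights(params)
        return 1.0 - np.sum(w ** 2)

    def orness_gap(params):              # equality constraint: Orness(W) - alpha = 0
        w = weights(params)
        i = np.arange(1, n + 1)
        return np.sum((n - i) * w) / (n - 1) - alpha

    res = minimize(gini, x0=np.asarray(x0), method="SLSQP",
                   constraints=[{"type": "eq", "fun": orness_gap}])
    return weights(res.x), res

w, res = solve_m1(n=5, alpha=0.7)
print(np.round(w, 4), res.success)
```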

In model (M-1), the influence of different \(\alpha\) values on the weight vector is shown in Fig. 1. The parameter \(\alpha\) is closely linked to the parameters of the weight method and the weight measures, and it simplifies parameter tuning. The smaller the absolute difference \(|\alpha -0.5|\), the more balanced or neutral the subjective attitude and the closer the curve is to a straight line. Conversely, the larger \(|\alpha -0.5|\), the more pronounced or extreme the subjective attitude and the greater the curvature of the graph. Consequently, by modulating the parameter \(\alpha\), the weight change of the MFIOWA operator can be determined.

Fig. 1 MFIOWA weight values at different \(\alpha\) levels

4 OWAK-means algorithm based on MFIOWA operator

Let x be a sample set comprising n samples characterized by \({m}_{1}\)-dimensional features, and let U be an induced set consisting of the same n samples with \({m}_{2}\)-dimensional features. The features of U can be derived from the features of x (such as density, variance, etc.) or can be a subset of the features of x. The sample set x and the induced set U together form the input set X, whose feature dimension is \(m\) with \(m={m}_{1}+{m}_{2}\). After OWAK-Means clustering, k subsets are obtained, represented as \({C=\{C}_{1},{C}_{2},\dots ,{C}_{k}\}\). The following subsections detail the improvements in centroid initialization, distance metric, and centroid calculation in the OWAK-Means algorithm.

4.1 Initialization centroids of OWAK-means algorithm

In real life, multi-feature data frequently exhibit certain partial-order relationships (Wang et al. 2016). Random selection of the initial centroids of K-Means is detrimental to centroid calculation and membership assignment, while fixed selection is unsuitable for multi-feature datasets and fails to consider the relationships between features. Therefore, given the distribution properties of partially ordered data, experts can evaluate the positions of the \(k\) initial centroids according to the distribution of the observed data and assign weights to the \(k-1\) centroid intervals. The initial centroid selection method of OWAK-Means, which considers the partial-order relationship, is given in Definition 4.

Definition 4

Let \(\left\{{x}_{1}{\prime},{x}_{2}{\prime},\dots ,{x}_{n}{\prime}\right\}\) be an order set, which is obtained by sorting the original set \(\left\{{x}_{1},{x}_{2},\dots ,{x}_{n}\right\}\) and satisfies the partial-order relationship. The expert weight vector is denoted as \({\left({\varepsilon }_{1},{\varepsilon }_{2},\dots ,{\varepsilon }_{k-1}\right)}^{T}\). The computation of the initial centroid vector \(\left({\mu }_{1},{\mu }_{2},\dots ,{\mu }_{k}\right)\) for the OWAK-Means algorithm, which is based on partial-order relationships, is shown in Eq. (16) and Eq. (17).

$${\mu }_{i}={x}_{id(i)}{\prime}$$
(16)
$${\text{id}}(i) = \left\{ {\begin{array}{l} {1,} \qquad \qquad \qquad \qquad \qquad \,\,\,\, {i = 1} \\ {{\text{id}}(i - 1) + \left\lfloor {\varepsilon_{i - 1} (n - 1)} \right\rfloor }, \,\, {2 \le i \le k} \\ \end{array} } \right.$$
(17)

In Eq. (16), \({\mu }_{i}\) is defined as the i-th initial centroid derived from the partial-order set \(\left\{{x}_{1}^{{^{\prime}}},{x}_{2}^{{^{\prime}}},\dots ,{x}_{n}^{{^{\prime}}}\right\}\). The function \({\text{id}}\) is an iterative function used to determine the index position of the i-th initial centroid in the partial-order set, as elaborated in Eq. (17). As can be seen from Eq. (17), the initial value of the \({\text{id}}\) function is 1, and the index position of each ensuing centroid is obtained by adding to the current index position an interval calculated from the expert-assessed weights. This interval is the product of \((n-1)\) and the corresponding expert weight, rounded down. This selection method reflects the experts' estimation of the difference in index positions between consecutive centroids. The weights \({\varepsilon }_{i}\) are evaluated by considering the partial-order relationships and the distribution characteristics of the dataset, and they represent the fraction of the interval between initial centroids relative to the total number of samples in the set. Thus, the initial centroid vector is an ordered vector. When \({\varepsilon }_{1}={\varepsilon }_{2}=\dots ={\varepsilon }_{k-1}\), the indices obtained are equivalent to those extracted by equidistant sampling.

The centroid initialization method proposed in this paper adopts a deterministic approach based on index sorting, which avoids the unstable results caused by random selection of initial centroids. On the other hand, the weights introduced in centroid initialization are constructed based on partial-order density and carry certain subjective preferences. Consequently, in practical implementations, it is recommended to engage individuals with extensive experience and professional expertise to evaluate the partial-order density, so as to improve accuracy and reliability.
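A minimal sketch of the centroid initialization of Definition 4 (Eqs. (16)–(17)) follows, assuming the samples have already been sorted by the partial-order relationship; the function names and toy data are illustrative, not the authors' implementation.

```python
import numpy as np

def initial_centroids(sorted_X, k, expert_weights):
    """Initial centroids of Definition 4 (Eqs. (16)-(17)).

    sorted_X       : (n, m) samples already sorted by the partial-order relationship
    expert_weights : (k-1,) fractions eps_i of the index range between consecutive centroids
    """
    n = len(sorted_X)
    idx = 0                                    # id(1) = 1 in Eq. (17); 0-based here
    indices = [idx]
    for eps in expert_weights:                 # id(i) = id(i-1) + floor(eps_{i-1} * (n-1))
        idx += int(np.floor(eps * (n - 1)))
        indices.append(idx)
    return sorted_X[np.array(indices[:k])]

# With equal expert weights the selected indices reduce to equidistant sampling.
X_sorted = np.arange(20).reshape(-1, 1).astype(float)   # toy, already ordered data
print(initial_centroids(X_sorted, k=4, expert_weights=[1/3, 1/3, 1/3]).ravel())
```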

4.2 OWAK-means clustering algorithm

To improve the performance of the K-Means algorithm, the proposed OWAK-Means algorithm uses the MFIOWA operator with the Sigmoid weight method to find the cluster centroid vectors and integrates them into Eq. (3) to calculate the distance between each point and its centroid vector. OWAK-Means proceeds through iterations until convergence. Let \({C}_{j}\left(j=1,2,\dots ,k\right)\) denote a cluster in the clustering process, \(D=({x}_{1},{x}_{2},\dots ,{x}_{|{C}_{j}|})\) the clustered set, and \(({U}_{1},{U}_{2},\dots ,{U}_{|{C}_{j}|})\) the corresponding induced set. The update method for the centroid vector in the OWAK-Means algorithm, based on the MFIOWA operator, is shown in Eq. (18).

$${\mu }{\prime}={\text{MFIOWA}}\left(\left({U}_{i}{,x}_{i}\right)|\left({U}_{i}{,x}_{i}\right)\in {C}_{j}\right)={W}_{S}^{T}{X}^{\prime}$$
(18)

In Eq. (18), \({X}^{\prime}\) is defined as the array of elements \({x}_{i}\) arranged according to the multi-feature-induced order determined by \({U}_{i}\). The weight vector \({W}_{S}\) is solved by model (M-1). In this context, \({W}_{S}^{T}{X}{\prime}\) is the vector form of the centroid update formula. The new centroid calculation method can reduce the influence of sample noise or uneven distribution by filtering the samples in the cluster with multi-feature induced factors.

In the K-Means algorithm, the distance function significantly affects the clustering performance. Inspired by the OWDr distance metric (Pons-Vives et al. 2022), a squaring distance is introduced in this paper to enhance the distance differences between points. Let \(x\) and \(y\) be two n-dimensional vectors, and \({\left({w}_{1},{w}_{2},\dots ,{w}_{n}\right)}^{T}\) be a weight vector satisfying \({w}_{i}\ge 0\) and \({\sum }_{i=1}^{n}{w}_{i}=1\). The weighted squaring distance metric is shown as Eq. (19).

$${\text{OWD}}\left( {x,y} \right) = \mathop \sum \limits_{j = 1}^n w_j d_{rs} (x_{\left( j \right)} ,y_{\left( j \right)} ),$$
(19)

where \(x\), y are feature vectors with \(x=({x}_{1},{x}_{2},\dots ,{x}_{n})\), \(y=({y}_{1},{y}_{2},\dots ,{y}_{n})\), \(d_{rs} (x_{\left( j \right)} ,y_{\left( j \right)} )\) denotes the j-th distance element in the distance set \(\{ d_{rs} (x_1 ,y_1 ), \ldots ,d_{rs} (x_n ,y_n )\}\). And \({d}_{rs}: R\times R\to [\mathrm{0,1}]\) is the squaring distance given by Eq. (20).

$$d_{rs} (a,b) = \left\{ {\begin{array}{l} {(a - b)^2 /({\text{max}}(a,b))^2 ,} \quad {{\text{otherwise}}} \\ {0,} \qquad \qquad \qquad \qquad \quad \,\,\,\, {a = b = 0} \\ \end{array} } \right..$$
(20)
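A minimal sketch of the OWD metric of Eqs. (19)–(20) follows; it assumes the per-feature relative distances are arranged in descending order before being weighted, and all names are illustrative rather than the authors' implementation.

```python
import numpy as np

def d_rs(a, b):
    """Squaring relative distance of Eq. (20)."""
    if a == b == 0:
        return 0.0
    return (a - b) ** 2 / max(a, b) ** 2

def owd(x, y, weights):
    """Weighted squaring distance OWD of Eq. (19): the per-feature relative
    distances are sorted (descending, by assumption) and combined with the weights."""
    d = sorted((d_rs(a, b) for a, b in zip(x, y)), reverse=True)
    return float(np.dot(weights, d))

# Example: the largest relative differences receive the largest weights here.
print(owd([1.0, 4.0, 9.0], [2.0, 4.0, 10.0], weights=[0.5, 0.3, 0.2]))
```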

Observe that the input vectors \(x\) and y represent the data instances involved in the clustering process. Moreover, like the OWDr distance metric, the OWD distance is sensitive to differences in the scales of the variables. Subsequently, by substituting \({\mu }{\prime}\) and the OWD distance metric into Eq. (3), the optimization objective function of OWAK-Means is obtained, as presented in Eq. (21).

$$D = \sum_{j = 1}^k {\sum_{i = 1}^n {\sum_{v = 1}^{m_1 } } } s_{ij} w_v d_{rs} (x_{iv} , (W_{S j}^T X_j^{\rm{\prime}} )_v) ,$$
(21)

where \({d}_{rs}\) is given by Eq. (20). In Eq. (21), \({\mu }_{jv}^{\prime}=({{W}_{S j}^{T}{X}_{j}^{\prime})}_{v}\) represents the v-th feature value of the j-th centroid, and \({x}_{iv}\) denotes the v-th feature value of the i-th sample in the sample set. The OWAK-Means algorithm, as outlined in Table 2, starts by constructing the input set \(X\) from two subsets: the sample set \(x\), containing \({m}_{1}\) features, and the induced set \(U\), encompassing \({m}_{2}\) features. In this paper, the sample features are selected from the original datasets, and the induced features are obtained by calculation based on multi-level densities.

The multi-level densities are calculated as follows. First, the mean point of the sample set \(x\) is calculated, and the distance between the farthest point and the mean point is taken as the baseline distance (base). Second, given the truncation rate \(\sigma\) and the number of levels \({m}_{2}\), the critical distances for the different density levels are calculated as \((\sigma \cdot {\text{base}} \cdot 1/m_2 ,\dots,\sigma \cdot {\text{base}} \cdot l/m_2 )\), where \(l\in [1,{m}_{2}]\). Last, with each point as the center, a circle with radius equal to the l-th critical distance \(\sigma \cdot {\text{base}} \cdot l/m_2\) is constructed, and the number of sample points inside the circle is taken as the density of that point at the l-th critical distance. The multi-density features, i.e., the numbers of points around each point within the different critical distances, constitute the induced set \(U\). In the experiments in Sect. 5, the values of \(\sigma\) and \({m}_{2}\) are fixed at \(\sigma =0.1\) and \({m}_{2}=5\).

Following this, a multi-feature sorting is applied to \(x\), and the initial centroid vector \({\mu }_{j}\) is computed based on the ordered set and the expert-assigned weights. Each cluster \({C}_{j}\) is initially an empty set. The algorithm then assigns each sample point to its corresponding cluster based on the OWD distance between \({\mu }_{j}\) and \({x}_{i}\). Utilizing the MFIOWA operator, which incorporates both the clustering features and the induced features of the cluster members, a new centroid vector \({\mu }_{j}{\prime}\) is determined. The necessity of updating the centroid vector is evaluated by comparing the OWD distance between \({\mu }_{j}{\prime}\) and \({\mu }_{j}\). The iterative process of updating centroids continues until the objective function reaches its minimum value.
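A minimal sketch of the multi-level density construction described above (illustrative only; it counts each point within its own circles, a detail the text does not specify):

```python
import numpy as np

def multilevel_density_features(x, m2=5, sigma=0.1):
    """Induced set U built from multi-level densities.

    For each point, count how many samples fall within circles whose radii are the
    l-th critical distances sigma * base * l / m2, where base is the distance from
    the sample mean to the farthest point."""
    x = np.asarray(x, dtype=float)
    dist_to_mean = np.linalg.norm(x - x.mean(axis=0), axis=1)
    base = dist_to_mean.max()                                    # baseline distance
    pairwise = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    radii = [sigma * base * l / m2 for l in range(1, m2 + 1)]    # critical distances
    # U[i, l-1] = number of samples within the l-th critical distance of point i
    U = np.stack([(pairwise <= r).sum(axis=1) for r in radii], axis=1)
    return U

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
print(multilevel_density_features(x).shape)   # (100, 5): five density levels per point
```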

Table 2 OWAK-means algorithm

The introduction of MFIOWA increases the number of features that can be used in the clustering task. The sample set is used to calculate distances and determine memberships, while the induced set is used to filter interference and noise data. The multi-feature-induced order rules support the sorting of both continuous and discrete data, and custom sorting rules can be defined for non-numerical data, so that non-continuous features can participate in the iterative updating of cluster centroids. MFIOWA influences the centroid-updating process through feature induction and order-weighted aggregation, so that the determination of cluster centroids carries a certain preference, whose degree is governed by the parameter \(\alpha\) of the Sigmoid weight model (M-1). In the algorithm proposed in this paper, this preference is expressed in the retention of data points with a certain degree of density. Using the weight method of model (M-1), the parameter \(\alpha\) can be manipulated to alter the direction, form, and balance of the weight curve. As delineated in Sect. 3.2, \(\alpha <0.5\) imparts a pessimistic bias: \({W}_{S}^{T}\) filters out information from the lower rankings, so the MFIOWA results exhibit an upward bias relative to the WA operator, leading to a consistent positive bias in each centroid calculation. In contrast, when \(\alpha >0.5\), \({W}_{S}^{T}\) adopts an optimistic bias and emphasizes lower-positioned information, so the MFIOWA results are downwardly biased compared with the WA operator, introducing a negative bias into the centroid calculations. The application of the S-shaped weight method within the MFIOWA operator framework enables sophisticated filtering and screening of data during the iterative update of the centroid vector, in conjunction with the induced features. The parameter \(\alpha\) affords subjective control over the cluster centroids, ensuring that each iteration strategically adjusts the centroid calculations and thus enhances the overall clustering performance and accuracy. The relationship between the parameters and the OWAK-Means algorithm is analyzed in Sect. 5.

Distinct from the conventional K-Means algorithm, the proposed OWAK-Means algorithm requires as input the expert weights \(({\varepsilon }_{1},{\varepsilon }_{2},\dots ,{\varepsilon }_{k-1})\) and a subjective attitude parameter \(\alpha\). The expert weights \(({\varepsilon }_{1},{\varepsilon }_{2},\dots ,{\varepsilon }_{k-1})\) ensure the rationality of the initial centroid positions in OWAK-Means. Given the Orness measure \(\alpha\), the optimal weights of the MFIOWA operator can be determined, thereby guiding the optimization of centroids in OWAK-Means. This additional input expedites centroid convergence during the iterative clustering process, yielding final clustering outcomes that reflect both subjective judgment and objective data patterns.

5 OWAK-means algorithm implementation

5.1 Datasets and evaluation indicators

The experimental study employed eight real-world datasets from the UCI repository: Wine, Iris, Drug, Obesity, Customer Segment, Heart, Balance, and User. The basic characteristics of these datasets are summarized in Table 3, which lists the sample size, the number of feature dimensions, and the number of target categories for classification. Notably, the Customer Segment dataset, with a sample size of 8068, is a large-sample dataset, and the Wine and Heart datasets each have 13 features. The classification scope of these datasets is diverse, ranging from 2-class to 5-class problems.

Table 3 Basic characteristics of the datasets

To assess the performance of the clustering algorithms, three metrics are employed: the adjusted Rand score (ARS) (Khan et al. 2019), normalized mutual information (NMI), and purity (PUR) (Huang et al. 2021). The ARS ranges from −1 to 1, where values approaching 1 signify better clustering performance. The NMI spans from 0 to 1, with values near 1 indicating more precise clustering and values near 0 suggesting results close to random clustering. Lastly, a greater PUR value indicates more effective clustering.
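For reference, the three metrics can be computed as sketched below, using scikit-learn for ARS and NMI and the usual contingency-matrix definition of purity; the toy labels are illustrative only.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity(y_true, y_pred):
    # Fraction of samples assigned to the majority true class of their cluster.
    cm = contingency_matrix(y_true, y_pred)
    return cm.max(axis=0).sum() / cm.sum()

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(adjusted_rand_score(y_true, y_pred))            # ARS in [-1, 1]
print(normalized_mutual_info_score(y_true, y_pred))   # NMI in [0, 1]
print(purity(y_true, y_pred))                         # PUR in (0, 1]
```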

5.2 Analysis of implementation result

The proposed OWAK-Means algorithm is compared with seven clustering algorithms: K-Means (MacQueen 1967), K-Means++ (Arthur and Vassilvitskii 2007), Mini Batch K-Means (Sculley 2010), Bisecting K-Means (Steinbach et al. 2000), WK-Means (Chen et al. 2012), MWK-Means (De Amorim and Mirkin 2012), and K-Means (OWDr) (Pons-Vives et al. 2022). In the implementation phase, OWAK-Means uniquely partitions the multi-feature data of each dataset into a sample set and an induced set, whereas the other clustering algorithms use only the attribute set for clustering. To ensure the reliability of the experimental results and prevent chance outcomes from being taken as final, each algorithm is run 50 times and the average over the runs is reported. To assess the overall performance of these methods, the Technique for Order of Preference by Similarity to the Ideal Solution (TOPSIS), based on the entropy weight method, is used to generate a comprehensive score and ranking from the mean results. Assuming n objects, each characterized by m attributes, the entropy-weighted TOPSIS method proceeds as follows: (1) Standardize the data to obtain a matrix \(Z=\{{z}_{ij}\}\). (2) Compute the information entropy of each indicator, \({e}_{j}=-\frac{\sum_{i=1}^{n}{p}_{ij}\ln\left({p}_{ij}\right)}{\ln\left(n\right)}\), where \({p}_{ij}={z}_{ij}/\sqrt{\sum_{i=1}^{n}{z}_{ij}}\), and obtain the weight of each indicator, \(W_j = (1 - e_j )/\left( {\sum_{j=1}^m (1 - e_j )} \right)\). (3) Calculate the distance to the positive ideal solution, \({D}_{i}^{+}=\sqrt{\sum_{j=1}^{m}{{W}_{j}({Z}_{j}^{+}-{z}_{ij})}^{2}}\), and the distance to the negative ideal solution, \({D}_{i}^{-}=\sqrt{\sum_{j=1}^{m}{{W}_{j}({Z}_{j}^{-}-{z}_{ij})}^{2}}\), where the positive and negative ideal solutions are defined as \(Z^+ = \{ Z_j^+ = {\text{max}}(z_{ij} )\}\) and \(Z^- = \{ Z_j^- = {\text{min}}(z_{ij} )\}\), respectively. (4) Calculate the comprehensive score \({S}_{i}={D}_{i}^{-}/({D}_{i}^{+}+{D}_{i}^{-})\), where \(i\in [1,n]\) and \(j\in [1,m]\).
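A compact sketch of this entropy-weighted TOPSIS scoring is shown below. All indicators are assumed to be benefit-type (larger is better), and the column normalizations used here follow common practice rather than the paper's exact standardization choices.

```python
import numpy as np

def entropy_topsis(X, eps=1e-12):
    """Entropy-weighted TOPSIS score for an (n objects x m indicators) matrix."""
    # (1) Column-wise standardization to obtain Z
    Z = X / np.sqrt((X ** 2).sum(axis=0))
    # (2) Information entropy of each indicator and the derived weights
    P = Z / Z.sum(axis=0)
    n = X.shape[0]
    e = -(P * np.log(P + eps)).sum(axis=0) / np.log(n)
    w = (1 - e) / (1 - e).sum()
    # (3) Weighted distances to the positive / negative ideal solutions
    z_pos, z_neg = Z.max(axis=0), Z.min(axis=0)
    d_pos = np.sqrt(((w * (z_pos - Z)) ** 2).sum(axis=1))
    d_neg = np.sqrt(((w * (z_neg - Z)) ** 2).sum(axis=1))
    # (4) Comprehensive score; higher means closer to the positive ideal
    return d_neg / (d_pos + d_neg)

# Example: three methods scored on two metrics
print(entropy_topsis(np.array([[0.8, 0.7], [0.6, 0.9], [0.5, 0.4]])))
```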

Comprehensive performance comparison results of the different K-Means algorithms are given in Table 4, where italics mark the best method for a given dataset and metric. The results of the Friedman test and the Nemenyi test (Ma et al. 2022) are shown in Table 5 and Fig. 2. A stability analysis is given in Fig. 3, which reports the variance-based uncertainty assessment (Abbaszadeh Shahri et al. 2022) for ARS, NMI, and PUR. An outlier comparison for ARS, NMI, and PUR is given in Fig. 4; the Z-score method is used to detect outliers with a threshold of 2, i.e., if the Z-score of an ARS, NMI, or PUR value exceeds 2, that value is classified as an outlier. A robustness analysis is given in Fig. 5, which shows the performance of each algorithm under noise.
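The Z-score outlier rule used for Fig. 4 can be sketched as follows; how the per-run metric values are grouped before scoring is an assumption of this sketch.

```python
import numpy as np

def count_outliers(metric_values, threshold=2.0):
    # metric_values: e.g. the 50 ARS values of one algorithm on one dataset.
    v = np.asarray(metric_values, dtype=float)
    z = (v - v.mean()) / v.std()
    return int((np.abs(z) > threshold).sum())
```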

Table 4 Performance comparison of variant K-Means clustering algorithms (50 experiments)
Table 5 Summary of the Friedman Statistics \({F}_{{\text{F}}}\) (k = 8, N = 8) and the critical value
Fig. 2
figure 2

Nemenyi test of variant K-Means algorithms

Fig. 3
figure 3

Stability comparison of variant K-Means algorithms on eight datasets

Fig. 4
figure 4

Outliers comparison of variant K-Means algorithms

Fig. 5
figure 5

Robustness comparison of variant K-Means algorithms

Table 4 reveals that the OWAK-Means algorithm exhibits superior overall performance. Detailed observations are as follows. (1) For the ARS metric, OWAK-Means is better than the second-best clustering algorithm on the Wine, Iris, Drug, Customer Segment, Heart, Balance, and User datasets, with improvements of 0.0159 (1.9487%), 0.1258 (16.5589%), 0.0036 (4.9557%), 0.0051 (4.6460%), 0.1535 (100.7730%), 0.0047 (3.3107%), and 0.1732 (69.5556%), respectively. (2) Regarding the NMI metric, OWAK-Means performs best or nearly best on the Wine, Iris, Drug, Obesity, Customer Segment, Heart, Balance, and User datasets, closely approaching the optimal result on the Drug dataset. (3) For the PUR metric, OWAK-Means ranks highest on all datasets and improves over the second-best clustering algorithm by 0.0055 (0.5868%), 0.0629 (7.0155%), 0.0167 (3.2221%), 0.0096 (1.4254%), 0.0151 (3.4579%), 0.0800 (11.4650%), 0.0121 (1.8379%), and 0.0029 (0.4927%), respectively. (4) In terms of the TOPSIS comprehensive evaluation, OWAK-Means consistently ranks first, surpassing the K-Means, K-Means++, Mini Batch K-Means, Bisecting K-Means, WK-Means, MWK-Means, and K-Means (OWDr) algorithms.

To further evaluate whether OWAK-Means has significant advantages over the other algorithms, the Friedman test and the Nemenyi test are used to assess whether the ranks of the methods differ significantly. The Friedman test statistic \({F}_{{\text{F}}}\) and the corresponding critical values for each evaluation criterion are shown in Table 5, where k is the number of compared methods and N is the number of datasets. At a significance level of \(\varphi =0.05\), the null hypothesis that the compared methods perform equally is rejected for all evaluation criteria.
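One common way to obtain an F-distributed Friedman statistic is the Iman–Davenport correction of the Friedman chi-square; the sketch below uses that convention and assumes `scores` is an N × k matrix of one metric's values (the exact statistic reported in Table 5 is the authors' own).

```python
import numpy as np
from scipy.stats import friedmanchisquare, f

def friedman_ff(scores, alpha=0.05):
    """scores: N datasets (rows) x k algorithms (columns) for one metric."""
    N, k = scores.shape
    chi2, _ = friedmanchisquare(*scores.T)             # Friedman chi-square
    ff = (N - 1) * chi2 / (N * (k - 1) - chi2)         # Iman-Davenport F statistic
    crit = f.ppf(1 - alpha, k - 1, (k - 1) * (N - 1))  # critical value at level alpha
    return ff, crit
```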

The Nemenyi test further investigates whether each method performs equally well against the others. The performance of two methods differs significantly if the difference in their average ranks exceeds the critical difference \({\text{CD}}={q}_{\varphi }\sqrt{\frac{k(k+1)}{6N}}\). With \({q}_{\varphi }=3.0310\), the CD is calculated as 3.7122 (k = 8, N = 8) for \(\varphi =0.05\). The CDs are shown in Fig. 2, where the compared methods whose average ranks lie within one CD of that of OWAK-Means are covered by a red line; methods not covered therefore perform significantly worse than OWAK-Means.
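Substituting the reported values confirms the critical difference:

\[
{\text{CD}}={q}_{\varphi }\sqrt{\frac{k(k+1)}{6N}}=3.0310\times \sqrt{\frac{8\times 9}{6\times 8}}=3.0310\times \sqrt{1.5}\approx 3.7122 .
\]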

Taking ARS as an example, the average rank of OWAK-Means is 1.25 and the critical value after adding the CD is 4.281. Given that the average ranks of K-Means, K-Means++, Mini Batch K-Means, Bisecting K-Means, WK-Means, and MWK-Means are greater than 4.281, they are classified as worse methods in this case. However, there is no statistical evidence to assert that OWAK-Means outperforms the remaining compared methods under ARS. The same applies to NMI and PUR.

Figure 3 shows that the OWAK-Means algorithm yields stable results with low uncertainty, achieving the smallest mean standard deviation among K-Means, K-Means++, Mini Batch K-Means, Bisecting K-Means, WK-Means, MWK-Means, and K-Means (OWDr). This observation suggests that OWAK-Means has superior stability and reliability during the clustering process.

Figure 4 indicates that, across the eight datasets, the OWAK-Means algorithm records the lowest number of outlier results over the evaluation metrics. In comparison, the remaining algorithms show a higher prevalence of outliers, especially on the Drug, Obesity, Balance, and User datasets, where OWAK-Means registers fewer outliers than the other seven algorithms. This disparity is particularly pronounced on multi-feature and large-sample datasets such as Obesity, Customer Segment, and Heart. These results show that the OWAK-Means algorithm has better stability and reliability.

As shown in Fig. 5, after randomly adding 5% uniform noise to simulate extreme values and outliers, the OWAK-Means algorithm retains its advantage in the ARS, NMI, and PUR metrics, indicating better robustness under the influence of noise. For the ARS metric, OWAK-Means is significantly better than the second-best algorithm on the Wine, Drug, and User datasets, with improvements of 0.1164 (19.09%), 0.0534 (64.48%), and 0.1036 (32.98%), respectively. For the NMI metric, OWAK-Means outperforms the other algorithms on all eight datasets. For the PUR metric, except on the User and Customer Segment datasets, OWAK-Means ranks first and improves over the second-best algorithm. Noise affects the determination of centroids, causing them to deviate from the true centroids. OWAK-Means filters noise through the MFIOWA operator and the Sigmoid weights according to the density metric, and can therefore adapt well to noise, especially noise at cluster edges. These results show that the OWAK-Means algorithm has better robustness.
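As an illustration, one way such 5% uniform noise could be injected is sketched below; the exact injection procedure is not specified here, so replacing a random subset of samples with uniform draws over each feature's observed range is an assumption of this sketch.

```python
import numpy as np

def add_uniform_noise(X, fraction=0.05, seed=None):
    """Replace a random `fraction` of samples with uniform draws over each
    feature's observed range, to simulate extreme values and outliers."""
    rng = np.random.default_rng(seed)
    X_noisy = np.array(X, dtype=float, copy=True)
    n = X_noisy.shape[0]
    idx = rng.choice(n, size=max(1, int(round(fraction * n))), replace=False)
    lo, hi = X_noisy.min(axis=0), X_noisy.max(axis=0)
    X_noisy[idx] = rng.uniform(lo, hi, size=(idx.size, X_noisy.shape[1]))
    return X_noisy
```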

Through comparative analysis, the proposed OWAK-Means algorithm achieves better comprehensive performance, stability, and robustness, but it also has some limitations. Multiple subjective parameters are involved in OWAK-Means, which increases the decision-making cost for experts. For centroid initialization, the approach based on expert weights and partial-order relationships provides high stability, but it may fall into a local optimum. In MFIOWA, this paper uses multiple density levels as the induced factors to explore density-induced in-cluster sample filtering and centroid calculation; the critical distance and the number of density levels used to compute the multi-density measure have a great impact on MFIOWA, and they are not yet tied to the data distribution or integrated into the algorithm. In the experiments, feature weight optimization was not carried out, and the feature weights of the OWD distance metric were set to equal weights. The OWAK-Means algorithm therefore still has considerable room for improvement, which we will continue to explore in future research.

5.3 Sensitivity analysis of OWAK-Means algorithm

To further explore the impact of the parameter settings of MFIOWA's weighting method on the clustering results, this study conducts a sensitivity analysis on the parameter \(\alpha\). OWAK-Means is run multiple times at each \(\alpha\) level, and the results are averaged to avoid random outcomes, as shown in Fig. 6. In OWAK-Means, the multi-feature-induced order rule of MFIOWA sorts in ascending multi-feature partial order. As mentioned in Sect. 3.2, when the weight values are in ascending order, OWAK-Means places more emphasis on high-density samples; conversely, when the weight values are in descending order, OWAK-Means focuses on low-density samples. As shown in Fig. 6, different datasets have different distributions and densities, and their optimal parameter values also differ. The Wine, Iris, and Customer Segment datasets show only minor variations. The Obesity, Balance, and Heart datasets exhibit clear optimal regions, indicating uneven sample distributions. The result on the User dataset shows a clear monotonic trend, suggesting that high-density samples better reflect the concentration of the data. It can also be observed from Fig. 6 that the trends of all indicators are consistent, with many optimal solutions concentrated in the ranges 0.3–0.5 and 0.5–0.7. This suggests that appropriately filtering intra-cluster members can effectively enhance the accuracy, feasibility, and reliability of the clustering algorithm. However, the relationship between the selection of density-induced features and the characteristics of the dataset is not well reflected, and this is a focus for future research.

Fig. 6
figure 6

Performance comparison of OWAK-Means at different \(\alpha\) levels

6 Conclusion

This paper introduces the OWAK-Means algorithm, which incorporates a new initial centroid optimization method and the newly proposed MFIOWA operator. The initial centroid optimization method is designed based on partial-order relationships of multiple features. The MFIOWA operator, developed from the existing IOWA operator, integrates concepts from database index theory and multi-feature induced order. Furthermore, we construct an objective optimization model based on the Gini index and Orness measure to determine the feature weights for the MFIOWA operator using a modified Sigmoid function. This approach reduces tuning costs while enhancing flexibility and variability.

Comparative analysis with state-of-the-art clustering algorithms reveals that OWAK-Means outperforms them on critical metrics such as ARS, NMI, and PUR across eight real-world datasets from the UCI repository. Comprehensive evaluation using the TOPSIS method consistently ranks OWAK-Means as the top performer. Significance testing with the Friedman and Nemenyi tests, together with stability assessments based on variance and outliers, further affirms OWAK-Means' superiority over other K-Means variants. Additionally, the comparative experiment with added noise shows that OWAK-Means is more robust, owing to the information-filtering ability of the MFIOWA operator and the Sigmoid weights. The proposed OWAK-Means algorithm therefore demonstrates superior performance in key metrics such as ARS, NMI, and PUR, along with robustness and stability.

While experimental validation on eight multi-feature UCI datasets confirms OWAK-Means' high performance, future research will further explore its adaptability across a broader spectrum of datasets and examine the performance of clustering algorithms under different data distribution scenarios. The integration of centroid initialization, feature weight optimization methods, and automatic selection of the number of clusters k based on the IOWA operator will be studied further.