1 Introduction

In the field of data mining, spectral clustering techniques excel at handling complex data, especially at identifying non-convex, separable clusters, and they adapt well to perturbations within the dataset. Ng et al. [1] defined a transformative approach to clustering, reconceptualizing it as a graph partitioning problem that uses the spectral properties of graphs, in particular their eigenvalues and eigenvectors, as the fundamental tool for cluster segmentation. This approach has gained wide acceptance as a foundational framework for the development of advanced spectral clustering techniques [2, 3]. Zelnik-Manor et al. [4] pointed out the difficulty of parameter selection in traditional spectral clustering algorithms, such as choosing the threshold value of the similarity matrix and determining the number of clusters, and proposed an adaptive parameter tuning method that automatically determines the threshold of the similarity matrix by calculating the similarity between each data point and its neighbors. Zhou et al. [5] proposed a constrained spectral clustering algorithm that introduces prior knowledge or constraints to guide the formation of clustering results. Ning et al. [6] decomposed the Laplacian matrix into two correlation matrices and treated similarity changes as correlation vectors attached to the original correlation matrix, thus realizing an incremental spectral clustering algorithm for changing datasets. In contemporary computational paradigms, spectral clustering algorithms have gained significant traction in a variety of domains, including image segmentation [7,8,9], bioinformatics [10, 11], and financial data analysis [12, 13]. These algorithms leverage the eigenvalues of similarity matrices to partition data points into distinct groups, thus facilitating the extraction of nuanced patterns and correlations inherent within large datasets. This approach is particularly effective in scenarios where traditional clustering methods, such as k-means, are limited by the complexity of the data structure or the multidimensionality of the data space. The versatility and efficiency of spectral clustering techniques have made them indispensable in advanced data analysis and machine learning applications.

The exploration of fair computational paradigms has reached an advanced stage, recognizing that algorithmic decisions can disproportionately affect different entities or cohorts. Scholars in computational theory are investigating the inherent bias of eigenvalue-based partitioning methods, especially in spectral clustering, and have introduced a set of quantitative indices to assess algorithmic fairness. These indices aim to measure the extent to which algorithms categorize data points impartially, without bias toward any specific demographic or group. Kleindessner et al. [14] made the connection between privacy and fairness. Subsequent research [15,16,17] predominantly incorporated the principle of group fairness, alternatively termed 'demographic parity' in the survey [18]. This principle posits that algorithmic decisions should not disproportionately favor or disadvantage any sensitive group. Conversely, Kang et al. [19] introduced the paradigm of individual fairness, which advocates the algorithmic imperative that entities with analogous characteristics should receive parallel treatment in computational decision-making processes. This introduces a nuanced approach to machine learning ethics, emphasizing the importance of individual-level fairness in algorithmic decisions.

Current spectral clustering algorithms already mitigate bias, both at the granularity of individual entities and at the collective level of sensitive cohorts. This mitigation is achieved by formulating an augmented objective function that actively seeks to minimize inequity. However, as the left panel of Fig. 1 shows, spectral clustering without scale fairness still has a high probability of producing oversized or undersized clusters. In the left panel, cluster 1 is significantly larger than the other three clusters, while in the right panel, produced by the scale fair spectral clustering algorithm, the four clusters are much closer in scale.

Fig. 1 Spectral clustering appears to have an imbalance in the scale of the result clusters

In spectral clustering, improving the group fairness of an algorithm reduces bias against certain groups. Improving individual fairness ensures that each individual in a cluster receives appropriate attention and consideration. Improving scale fairness avoids over-reliance on data from certain large clusters, resulting in more comprehensive and accurate decisions. This is supported by theories of statistical learning that emphasize the importance of representation in learning algorithms. A fair distribution of cluster scales ensures that no single cluster dominates the representation, which is consistent with the principle of uniformity in probability theory. Scale fairness directly addresses biases that can arise due to the uneven distribution of data points. Biased clustering can lead to misrepresentation of minority groups because smaller clusters may not adequately represent these groups. This is consistent with the principles of social fairness and equity in algorithmic decision making as outlined in ethical AI frameworks [20, 21].

This paper introduces entropy weighted spectral clustering (EWSC) and scale fair spectral clustering (SFSC). These methods aim to solve the problem of oversized or undersized clusters in the clustering results, which is prevalent in existing spectral clustering algorithms. The main contributions of this work are delineated as follows:

  • This paper introduces an innovative concept of scale fairness that is specifically tailored to the domain of computational classification and clustering paradigms. This conceptualization is operationalized by a metric that quantifies fairness by assessing the dispersion of data points across clusters, specifically by computing the standard deviation of the number of elements per cluster.

  • This paper explores the role of entropy in scale fair spectral clustering for the first time. It is shown that the higher the entropy of the input matrix, the higher the scale fairness of the spectral clustering.

  • This paper proposes the EWSC method, which potently improves the scale fairness of spectral clustering by expanding the influence of high entropy data points on the clustering results.

  • This paper proposes the SFSC method, which considers the similarity between data points and improves the scale fairness of clustering by constructing a new Laplacian matrix.

  • This paper conducts experiments on numerous real public datasets. The results show that EWSC and SFSC achieve better fairness performance than standard spectral clustering, while SFSC also has a comparable clustering effect.

This paper is structured as follows. Section 2 focuses on the related work. Section 3 presents a definition of scale fairness and evaluation indicators. Section 4 proposes the SFSC method, including its clustering steps. Section 5 shows the scale fairness performance and clustering effect of EWSC and SFSC under datasets of different sizes. Section 6 notes the direction and objectives of future work. Finally, a brief summary is given in Sect. 7.

2 Related work

In this section, we describe the main symbolic representations used throughout this manuscript, accompanied by a basic exposition of the pertinent concepts. A comprehensive compendium of these symbols, along with their respective explanations, is systematically cataloged in Table 1. Following this foundational overview, we introduce the standard spectral clustering algorithm process. This is followed by an analytical overview of several widely used methods for fair spectral clustering. Finally, we discuss the integration of entropy computation into spectral clustering, highlighting its application and importance in improving algorithmic performance.

Table 1 Table of symbols

2.1 Spectral clustering

Spectral clustering, a method in the field of unsupervised learning, is based on the partitioning of a corpus of data into distinct subgroups. This technique exploits the principles of spectral graph theory, an advanced branch of computational mathematics. The core goal is to ensure maximum homogeneity within each subgroup while preserving heterogeneity across different clusters [1]. This method’s reliance on spectral graph theory sets it apart from other clustering techniques by allowing for a nuanced analysis of the intrinsic structure within the data, resulting in a more refined and contextually appropriate clustering result.

The vanilla data graph \(\mathcal {G}(P,V)\) can be represented using a set of data points \(P = \{x_1, x_2, \ldots , x_n\}\). The first step is to create a similarity matrix. This matrix encapsulates the degree of similarity between individual data points. The construction of this matrix typically uses an algorithm known as K-nearest neighbors (KNN). This algorithm works by quantifying the proximity between data entities, allowing for a systematic evaluation of their similarity. This quantitative assessment is crucial in various applications such as clustering, anomaly detection, and pattern recognition within multidimensional datasets. The similarity matrix S is defined as:

$$\begin{aligned} \text {S}(i, j) = {\left\{ \begin{array}{ll} \exp \left( -\frac{\Vert x_i - x_j\Vert ^2}{2\sigma ^2}\right) & \quad {\text {if }} j \in \text {K-nearest neighbors of } i \\ 0 & \quad \text {otherwise} \end{array}\right. } \end{aligned}$$
(1)

where \({\Vert x_i - x_j\Vert ^2}\) denotes the squared Euclidean distance between samples and \({\sigma }\) is a parameter controlling the rate of similarity decay.
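A minimal NumPy sketch of this construction may help; the neighbourhood size k and bandwidth sigma are illustrative parameters, and the final symmetrisation step is a common convention that Eq. (1) does not spell out.

```python
import numpy as np

def knn_similarity_matrix(X, k=10, sigma=1.0):
    """Build the KNN Gaussian similarity matrix of Eq. (1).

    X     : (n, d) array of data points
    k     : number of nearest neighbours kept per point (illustrative choice)
    sigma : bandwidth controlling the rate of similarity decay
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of point i, excluding i itself
        neighbours = np.argsort(sq_dists[i])[1:k + 1]
        S[i, neighbours] = np.exp(-sq_dists[i, neighbours] / (2.0 * sigma ** 2))
    # symmetrise so the similarity graph is undirected
    return np.maximum(S, S.T)
```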

Then build the degree matrix of \(\mathcal {G}(P,V)\). The degree of a data point \( x_{i} \in P \) is defined as:

$$\begin{aligned} d_i = \sum _{j=1}^{n} s_{ij}. \end{aligned}$$
(2)

Thus, the degree matrix of \(\mathcal {G}(P,V)\) can be calculated \({\textrm{D}}={\textrm{diag}}\left( d_1,d_2,\ldots ,d_n\right) \).

In [22], the objective function of standard spectral clustering under the Ratio-Cut partitioning approach is formally defined as follows.

$$\begin{aligned} {\text {Ratio-Cut}}\left( C_1, C_2, \ldots C_k\right) =\min \left( \sum _{j=1}^k \frac{V\left( C_j, \overline{C_j}\right) }{\sqrt{\left| C_j\right| }}\right) \end{aligned}$$
(3)

Yu et al. [23] proved that minimizing the Ratio-Cut function is equivalent to a matrix trace minimization problem:

$$\begin{aligned} {\text {Ratio-Cut}}\left( C_1, C_2, \ldots , C_k\right)&= \mathop {\textrm{argmin}}\limits _{H} {\text {Tr}}\left( H^{\textrm{T}} L H\right) , \nonumber \\ \text {s.t.} \quad H^{\textrm{T}} H&= I_k. \end{aligned}$$
(4)

where H is a clustering indicator matrix,

$$\begin{aligned} H&= (h_{i\ell }) \in \mathbb {R}^{n \times k}, \nonumber \\ h_{i\ell }&= {\left\{ \begin{array}{ll} 1, & \quad \text {if } p_i \in C_{\ell }, \\ 0, & \quad \text {otherwise}, \end{array}\right. }\nonumber \\&\quad \text {for } i = 1, 2, \ldots , n, \text { and } \ell = 1, 2, \ldots , k. \end{aligned}$$
(5)

The introduction of the Laplacian matrix represents a major step forward. Initially integrated into spectral clustering methods as described in [1], the Laplacian matrix functions as a fundamental tool for analyzing and partitioning data. Its use in spectral clustering algorithms underscores its utility in facilitating the dissection of complex data structures, thereby increasing the robustness and accuracy of clustering results. This integration marks a significant evolution in the computational strategies employed in data segmentation, reflecting a sophisticated approach to dealing with multidimensional datasets. The Laplacian matrix is usually defined as

$$\begin{aligned} L = D - S \end{aligned}$$
(6)

He et al. [24] provided detailed proofs of the nonnegativity and positive semi-definiteness of Laplacian matrices and of their connection with the minimum-cut problem.

Finally, perform an eigenvalue decomposition of the Laplacian matrix to extract the eigenvalues and their associated eigenvectors. Subsequently, execute a clustering operation on the dataset using the derived eigenvectors. This is typically accomplished with a clustering algorithm such as K-means, which partitions the rows of the eigenvector matrix, culminating in the final clustering output. The spectral clustering procedure is summarized in Algorithm 1 as follows.

Algorithm 1 Spectral Clustering
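As a rough companion to Algorithm 1, the following sketch (assuming NumPy and scikit-learn) strings the steps above together; it is an illustration of the standard pipeline rather than the exact implementation used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(S, k):
    """Sketch of the standard pipeline: Laplacian, eigenvectors, K-means.

    S : (n, n) symmetric similarity matrix, e.g. from Eq. (1)
    k : desired number of clusters
    """
    d = S.sum(axis=1)                      # degrees, Eq. (2)
    L = np.diag(d) - S                     # unnormalised Laplacian, Eq. (6)
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    X = eigvecs[:, :k]                     # embedding from the k smallest eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)   # cluster the rows
```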

2.2 Fair spectral clustering

In the field of clustering algorithms, the concept of fairness is defined by several different paradigms: group fairness, individual fairness, counterfactual fairness, degree-related fairness, and application-specific fairness. A comprehensive review, as summarized in the survey [18], reveals a notable trend in the field of spectral clustering methods: current research mainly addresses the facets of group fairness and individual fairness.

2.2.1 Group fair spectral clustering

The principle of Group fairness, alternatively known as statistical parity or independence, provides a framework for fair spectral clustering. Central to this concept is the presence of a particular sensitive attribute within the dataset, such as a binary gender classification. This attribute bifurcates the dataset into two distinct subsets corresponding to the dual modalities of the sensitive attribute. The basic premise of group fairness is the preservation of proportional representation. Specifically, if the original dataset has a sensitive attribute distribution with a 3:7 ratio between the two groups, this proportionality should be reflected in the composition of any subsets or clusters derived from the dataset. Thus, in the context of clustering or data segmentation, the goal is to maintain this 3:7 ratio within each resulting cluster, ensuring that the representation of the sensitive attribute is consistent with its original distribution in the entire dataset (e.g., Fig. 2). This approach to fairness in data processing is critical to mitigating biases that may arise from disproportionate representation of sensitive attributes, thereby promoting equitable treatment of different groups within algorithmic decision-making processes.

Fig. 2 The original dataset was 60% male and 40% female. Group fair spectral clustering requires that each resulting cluster is also 60% male and 40% female

This fair clustering approach is particularly beneficial in fields that require consideration of data representativeness and bias reduction, such as recommendation systems, social network analysis, and market segmentation. In these applications, ensuring proportionate representation of different attribute groups in each cluster can improve the accuracy and fairness of decision making, thereby enhancing user satisfaction and the social responsibility of algorithms.

Group fair spectral clustering was first proposed in [14], which extends the notion of group fairness of [25]. Given a certain sensitive attribute M and its h sensitive attribute groups, we have \(M = M_1 \cup M_2 \cup \cdots \cup M_h\), \(M_i \cap M_j=\varnothing \) for \(i, j \in [h]\), \(i \ne j\). The concept of group fair spectral clustering can be articulated as the computational process of delineating a subdivision within a data graph \(\mathcal {G}\). This process is governed by an optimization paradigm wherein the objective is to minimize a spectral clustering function. Concurrently, this optimization is subject to a critical fairness constraint, which mandates an equitable representation or treatment of predefined groups within the data. This constraint is embedded within the algorithmic structure to ensure that the resultant clusters do not disproportionately favor or disadvantage any particular group, thereby aligning with principles of fairness in algorithmic decision making.

Definition 1

A spectral clustering is group fair if, in each cluster, the objects from each attribute group are present in the same proportion as in the vanilla dataset. That is,

$$\begin{aligned} \frac{|C_i \cap M_s|}{|C_i|} = \frac{|M_s|}{|M|}, \quad i \in [k], \, s \in [h]. \end{aligned}$$
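One simple way to check Definition 1 empirically is to compare, per cluster, the share of each sensitive group with its dataset-level share; the sketch below assumes the cluster labels and group memberships are available as 1-D arrays.

```python
import numpy as np

def group_proportions(labels, groups):
    """For each cluster C_i and sensitive group M_s, report |C_i ∩ M_s| / |C_i|
    alongside the dataset-level share |M_s| / |M| (Definition 1)."""
    labels, groups = np.asarray(labels), np.asarray(groups)
    overall = {g: float(np.mean(groups == g)) for g in np.unique(groups)}
    per_cluster = {
        c: {g: float(np.mean(groups[labels == c] == g)) for g in np.unique(groups)}
        for c in np.unique(labels)
    }
    return overall, per_cluster
```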

Kleindessner et al. [14] addressed the enhancement of group fairness in spectral clustering through a novel algorithmic framework. This framework encapsulates a complex optimization paradigm, where the primary focus lies in harmonizing an objective function while adhering to a predefined set of constraints. The core of this approach is anchored in an iterative process, leveraging the principles of alternating minimization. Empirical analyses attest to the efficacy of this approach, demonstrating a tangible enhancement in group fairness within the context of spectral clustering. The significance of this research lies in its potential to contribute to the evolving discourse on algorithmic fairness, particularly in spectral clustering methodologies.

So far, the majority of group fair spectral clustering studies have followed the above Definition 1. Xia et al. [26] proposed an unnormalized fair spectral clustering (UFSC) method to improve fairness by considering the fairness problem as a constraint problem, thus transforming the fairness spectral clustering into constraint spectral clustering. This study introduces for the first time the Karush–Kuhn–Tucker (KKT) condition to solve the feasible solution set of constrained spectral clustering and integrates multiple fairness constraint matrices into one matrix to solve the fair clustering problem with multiple sensitive attributes.

Wang et al. [27] introduced a new method, s-FairSC, which incorporates nullspace projection and Hotelling’s deflation. This approach only involves sparse matrix–vector products, fully exploiting the sparsity of the fair SC model. The s-FairSC algorithm demonstrates comparable performance to fair spectral clustering [14] in achieving fair clustering but is significantly faster (up to 12 times for moderate model sizes). It is also scalable, with computational costs only marginally increasing compared to spectral clustering without fairness constraints.

2.2.2 Individual fair spectral clustering

In the field of computational ethics, particularly in the context of algorithmic decision making, the paradigm of individual fairness, which posits that analogous entities should be afforded comparable opportunities, is perceived as more ethically tenable from a societal perspective than the construct of group fairness (e.g., Fig. 3). This notion was initially delineated in the seminal paper referred to as [25]. Subsequently, a number of scholarly articles, such as [28], have engaged in a discourse exploring the optimal conceptualization of fairness. These discussions center on the merits of individual versus group fairness, and whether these constructs should be operationalized independently within algorithmic frameworks.

Fig. 3 Individual fair spectral clustering requires that individuals with similar characteristics should be treated similarly in algorithmic decision making

At the same time, there has been a notable trend toward incorporating individual fairness constraints into conventional machine learning algorithms. This trend reflects a growing recognition of the importance of fairness in algorithmic design. However, it is noteworthy that the study of individual fairness in the specialized area of graph mining, particularly in the context of spectral clustering, remains relatively nascent, with only a limited number of studies, such as [19, 29], addressing this aspect. The paucity of research in this area suggests an emergent field of inquiry within the broader intersection of machine learning and ethical computing.

Kang et al. [19] presented the first principled study of individual fairness in graph mining (InFoRM). It introduces the regularized Cosine Similarity Matrix and Jaccard Similarity Matrix for the first time, which effectively reduces bias in clustering. It systematically studies the impact of incorporating individual fairness in graph mining tasks and provides practical solutions to mitigate bias. Extensive experimental evaluations are conducted on real-world datasets to demonstrate the effectiveness and generality of the proposed methods.

Definition 2

Given the result \( Y \) of a spectral clustering and a node similarity measure \( S \), the condition for a spectral clustering model to have individual fairness is:

$$\begin{aligned} \left| Y[i, :] - Y[j, :] \right| \le \delta S[i, j] \end{aligned}$$
(7)

where \( \delta > 0 \) is a tolerance constant, and \( S[i, j] \) represents the similarity between nodes \( i \) and \( j \).

The baseline axiom states that nodes with significant attribute divergence or no direct link are conventionally assigned a similarity index of zero. This axiom underpins Definition 2, which upon closer examination appears to be overly rigid for practical datasets. Such rigidity in the criterion could impose constraints that are impractical to meet in real-world datasets. Consequently, Definition 3 introduces a more flexible approach by recalibrating the evaluation metrics for fair spectral clustering at the individual node level. This recalibration aims to adapt the theoretical model to better match the complexity and nuances of actual datasets, thus improving the applicability and robustness of individual fair spectral clustering.

Definition 3

The relaxed criterion for individual fair spectral clustering is based on the Laplacian matrix \( L_S \) of the similarity matrix \( S \):

$$\begin{aligned} \text {Tr}\big (Y^{\textrm{T}} L_S Y\big ) \le \tau \end{aligned}$$
(8)

where \( \text {Tr} \) denotes the trace of a matrix, \( L_S \) is the Laplacian matrix of \( S \), and \( \tau \) is a threshold value.
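The left-hand side of Definition 3 can be evaluated directly once Y and S are in hand; a minimal sketch, assuming both are NumPy arrays:

```python
import numpy as np

def individual_fairness_score(Y, S):
    """Compute Tr(Y^T L_S Y) from Definition 3.

    Y : (n, k) clustering indicator / embedding matrix
    S : (n, n) node similarity matrix
    """
    L_S = np.diag(S.sum(axis=1)) - S       # Laplacian of the similarity matrix
    return float(np.trace(Y.T @ L_S @ Y))  # compare against the threshold tau
```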

The experimental results show that although the method is effective in improving the individual fairness of spectral clustering, it exhibits poor clustering results under certain datasets.

Gupta et al. [29] introduced an innovative paradigm within the domain of graph-based clustering, emphasizing a unique approach to equitable representation at the individual level. This is operationalized by ensuring that, within a given graph \(\mathcal {G}\), the adjacency relations of each vertex (or node) are evenly distributed across various clusters. This concept posits that if a vertex’s adjacent nodes are equitably partitioned among all clusters, then a state of individual fairness within the clustering mechanism is attained. To actualize this notion of fairness, the authors have recalibrated the spectral clustering framework. This recalibration entails a novel formulation of the Laplacian matrix. The modified Laplacian now incorporates parameters that explicitly account for fairness considerations. This integration allows the algorithm to more effectively balance the distribution of data points among the clusters, striving to ensure equitable representation across them. In a broader context, this algorithm can be perceived as a hybrid model synthesizing two predominant fairness paradigms: group fairness and individual fairness. The fusion of these paradigms in the context of spectral clustering offers a nuanced approach to fairness, one that addresses both collective and individual equity within the data clustering process.

In the area of algorithmic fairness and unsupervised learning, fair spectral clustering methods remain notably underdeveloped in the context of fairness paradigms. Predominantly, these algorithms are designed to address fairness considerations at the granularity of individual entities and subgroup constructs. However, there is a conspicuous absence of mechanisms that holistically incorporate a global fairness perspective, particularly in mitigating discrepancies in scale and distributional equity across different cluster formations. This gap highlights a critical area for further research and development aimed at integrating a more comprehensive fairness framework into spectral clustering algorithms that can effectively balance equity across different cluster scales.

Therefore, this paper presents the first principled study of spectral clustering with scale fairness constraints. We define the scale fairness of spectral clustering as a completely new notion of fairness alongside the two existing ways of defining fairness. This paper effectively improves the scale fairness of spectral clustering by designing two algorithms, EWSC and SFSC.

2.3 Entropy in spectral clustering

The current state of entropy research in machine learning is multifaceted, encompassing various applications and theoretical advancements (e.g., Fig. 4).

Fig. 4 Entropy is widely used in machine learning for neural networks, clustering, and decision trees

The principle of entropy, originally described in the disciplines of thermodynamics and information theory, is gradually being integrated into the methodology of machine learning. The primary goal of this integration is to elucidate and improve the efficiency of complex algorithmic structures. The application of entropy in this context serves as a central tool for quantifying the degree of disorder or uncertainty inherent in the state of a system. This quantification is instrumental in optimizing decision-making and predictive models within machine learning frameworks, thereby facilitating more robust and efficient handling of data-intensive and dynamic computational environments.

In the field of data clustering, entropy is most often combined with the K-means algorithm. A central theme of these studies [30,31,32,33] is improving the clustering performance of the K-means algorithm. By incorporating entropy measures, these studies aim to improve the accuracy, stability, and robustness of clustering results. Entropy is often used as a weighting or optimization criterion in these studies. For example, in financial risk analysis, entropy weights are computed to normalize data across dimensions. In brain tumor detection, Otsu’s entropy measure serves as a fitness function for segmentation. This approach uses the inherent property of entropy to quantify disorder or uncertainty, thereby refining the clustering process.

Fig. 5 Different ways of introducing entropy on spectral clustering will have different optimization effects on spectral clustering algorithms

In recent years, more and more researchers have introduced entropy into spectral clustering algorithms. Entropy has different roles in different algorithms and realizes different goals (e.g., Fig. 5). In their seminal work, Jenssen et al. [34] introduced the Kernel Maximum Entropy (kernel MaxEnt) method, a transformative data processing technique, and its integration into spectral clustering. This technique diverges from conventional kernel PCA, enhancing clustering performance by utilizing the kernel MaxEnt method for intermediate data transformation. Subsequently, Zhao et al. [35] proposed the Eigenvector Selection Based on Entropy Ranking for Spectral Clustering (ESBER), an innovative eigenvector selection paradigm. This method departs from traditional reliance on principal eigenvectors, instead employing entropy as a criterion for eigenvector significance in clustering. ESBER’s methodology represents a substantive shift from prevalent spectral clustering practices, such as the NJW method.

Further, Jia et al. [36] explored the application of knowledge entropy from rough set theory to assess attribute significance in datasets. This approach is critical in spectral clustering, where similarity measures and, consequently, clustering effectiveness are contingent on data attributes. Moreover, Hu et al. [37] provided an in-depth theoretical analysis of entropy’s role in regularization, employing matrix perturbation theory. This paper elucidates how entropy, as a rank score function, influences the regularization process in spectral clustering, diverging from constant-parameter methods and adapting to the data’s inherent complexity. This work not only deepens the understanding of entropy in spectral clustering but also lays foundational concepts for future algorithmic advancements in high-dimensional data clustering.

Finally, Kumar et al. [38] introduced an entropy-based spectral clustering approach for optimizing distributed generation (DG) unit placement in electrical distribution networks. This method, rooted in the concept of entropy as a measure of randomness, segments the network into clusters for efficient voltage stability management in high DG penetration scenarios. This approach marks a significant advancement over traditional centralized voltage control methods, offering a more effective and economically efficient solution.

3 Scale fairness

In this section, we define the scale fair spectral clustering problem. We also formulate the associated notions of fairness.

3.1 Problem definition

This paper addresses the scale fair spectral clustering problem, which involves assigning clusters to uncertain data points in spectral clustering algorithms. During the spectral clustering process, these uncertain data points are identified as the points in the original dataset that exhibit higher entropy. In this paper, the entropy value is defined in Definition 4.

In the context of spectral clustering, uncertain data points are those that are close to the border between clusters in terms of their similarity. These points are located on the boundaries between clusters in the similarity graph.

Definition 4

The entropy H(X) of a discrete random variable X is defined as:

$$\begin{aligned} H(X) = -\sum _{i} P(x_i) \log P(x_i) \end{aligned}$$
(9)

where \( P(x_i) \) is the probability of the random variable taking a specific value \( x_i \).
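A minimal helper for this definition (the natural logarithm and the small eps guard are implementation choices rather than part of the definition):

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H(X) = -sum_i p(x_i) log p(x_i) of a discrete
    distribution (Definition 4); eps guards against log(0)."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

print(entropy([0.5, 0.5]), entropy([0.99, 0.01]))   # ~0.693 vs ~0.056
```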

Theorem 1

Points in spectral clustering where cluster attribution is uncertain are the points in the original data with higher entropy.

Proof

For a data point in spectral clustering, we can consider it as a random variable whose probabilities of belonging to different clusters form its probability distribution. For example, if a data point x is uncertain between two clusters A and B, we can estimate the probabilities P(A|x) and P(B|x) of it belonging to clusters A and B, respectively.

The entropy of this point can be expressed as:

$$\begin{aligned} H(x) = - P(A|x) \log P(A|x) - P(B|x) \log P(B|x) \end{aligned}$$
(10)

For uncertain data points, as they are almost equally likely to belong to either of the clusters (e.g., \( P(A|x) \approx P(B|x) \approx 0.5 \) for a two-cluster case), their entropy values are high. This is because when \( P(A|x) \approx P(B|x) \approx 0.5 \), the term \( - P(A|x) \log P(A|x) - P(B|x) \log P(B|x) \) reaches its maximum.

Therefore, it can be concluded that in spectral clustering, the uncertain data points, i.e., those on the boundaries between different clusters, correspond to points with higher entropy values in the original dataset. This is due to their higher uncertainty in cluster affiliation, which in terms of information theory, manifests as higher entropy. \(\square \)

From Theorem 1, we can define the scale fair spectral clustering problem as a cluster attribution problem for high entropy points.

3.2 Fairness notion

This paper concretizes the concept of scale fairness. We define scale fairness quantitatively by employing the standard deviation (SD) of the number of elements present within each resultant cluster. Furthermore, to ensure a controlled and practical application of this concept, we establish a threshold \(\mathcal {T}\) that serves as an upper limit for scale fairness. This threshold is crucial for maintaining a balance between the fair distribution of data points among clusters and the functional constraints of clustering algorithms.

Definition 5

Let \({C}_1, {C}_2, \ldots , {C}_k\) be the clusters obtained from a spectral clustering algorithm, with each cluster \({C}_i\) having a size \(c_i\). The standard deviation (SD) of the cluster sizes is defined as:

$$\begin{aligned} {\text {SD}} = \sqrt{\frac{1}{k}\sum _{i=1}^{k}(c_i - \bar{c})^2} \end{aligned}$$
(11)

where \(\bar{c}\) is the average size of the clusters, computed as \(\bar{c} = \frac{1}{k}\sum _{i=1}^{k}c_i\).

Definition 6

Given an ideal upper limit \(\mathcal {T}\), the condition for a spectral clustering to have scale fairness is:

$$\begin{aligned} {\text {SD}} \le {\mathcal {T}} \end{aligned}$$
(12)

However, this metric exhibits heightened sensitivity to outlier values. This sensitivity is particularly pronounced in scenarios involving expansive datasets, where the SD tends to inflate. To address this issue and provide a more robust measure in extreme cases of cluster size disparity, the present study adopts the standardized relative range (RR) as the key metric for evaluating the distribution of cluster sizes. This approach aims to mitigate the impact of outliers and provide a more stable evaluation of scale fairness across different dataset sizes.

Definition 7

A smaller relative range (RR) signifies a more homogeneous distribution of cluster scales. For large datasets, this metric more intuitively reflects the max–min disparity in cluster scale. The RR is computed as follows:

$$\begin{aligned} \text {RR} = \frac{\max _i(c_i) - \min _j(c_j)}{\bar{c}} \end{aligned}$$

where \( c_i \) represents the size of the \( i \)-th cluster, and \( \bar{c} \) is the average size of all clusters in the dataset. The terms \( \max _i(c_i) \) and \( \min _j(c_j) \) denote the maximum and minimum cluster sizes, respectively.

Example

Consider a dataset partitioned into three clusters with sizes 50, 200, and 100. Here, \( \max _i(c_i) = 200 \), \( \min _j(c_j) = 50 \), and \( \bar{c} = \frac{50 + 200 + 100}{3} = 116.67 \). The relative range, RR, would then be calculated as:

$$\begin{aligned} \text {RR} = \frac{200 - 50}{116.67} \approx 1.29 \end{aligned}$$

This example illustrates how RR quantifies the variance in cluster scales relative to the average cluster size, providing an intuitive measure of the homogeneity in cluster sizes for large datasets.
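Both metrics are straightforward to compute from a label assignment; a small sketch reproducing the worked example above (assuming integer cluster labels):

```python
import numpy as np

def scale_fairness_metrics(labels):
    """SD (Definition 5) and RR (Definition 7) of the cluster sizes."""
    sizes = np.bincount(np.asarray(labels)).astype(float)
    sizes = sizes[sizes > 0]
    mean = sizes.mean()
    sd = np.sqrt(np.mean((sizes - mean) ** 2))   # population SD, as in Eq. (11)
    rr = (sizes.max() - sizes.min()) / mean      # relative range
    return sd, rr

# the worked example from the text: clusters of size 50, 200 and 100
print(scale_fairness_metrics([0] * 50 + [1] * 200 + [2] * 100))
# -> (approximately 67.49, approximately 1.29)
```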

4 Methodology

4.1 Entropy in scale fairness

This paper explores the relationship between the entropy of the input matrix and the scale fairness performance. The geometric mean of cluster sizes is considered as it aligns closely with our goal of achieving scale fairness in clustering. Unlike the arithmetic mean, which could be disproportionately influenced by larger clusters, the geometric mean ensures that all clusters contribute equally to the final measure. This equality of contribution is crucial in spectral clustering, where the objective is to distribute data points as evenly as possible across clusters. By maximizing the geometric mean of the cluster sizes, we encourage a clustering solution where sizes are uniform, thereby adhering to the principle of scale fairness.

Theorem 2

Let H(C) be the total entropy of the clustering obtained from the input matrix X; a higher H(C) corresponds to clustering results with greater scale fairness.

Proof

Consider a spectral clustering with N data points and k clusters. The average size of clusters is:

$$\begin{aligned} \bar{c} = \frac{1}{k} \sum _{i=1}^{k} c_i \end{aligned}$$

Assuming uniform distribution within each cluster, the probability of a data point belonging to a specific cluster \(C_i\) is \(\frac{1}{c_i}\). The entropy for a cluster \(C_i\) can be calculated as:

$$\begin{aligned} H(C_i) = -\sum _{j=1}^{c_i} \frac{1}{c_i} \log \left( \frac{1}{c_i}\right) \end{aligned}$$

which simplifies to:

$$\begin{aligned} H(C_i) = \log (c_i) \end{aligned}$$

Given the uniform probability distribution within each cluster, the entropy of a cluster \(C_i\) is a measure of its uncertainty or diversity. Under this uniform distribution, each of the \(c_i\) data points in \(C_i\) has an equal likelihood of \(\frac{1}{c_i}\). The entropy, in this context, quantifies the ’surprise’ in identifying a specific data point within the cluster. When \(c_i\) is larger, implying a larger cluster, the uncertainty or entropy increases, as reflected by the logarithmic relationship in \(H(C_i) = \log (c_i)\). This relationship underscores the interpretation of entropy as a measure of uniformity or fairness in the distribution of data points across clusters in spectral clustering.

The total entropy for the clustering is:

$$\begin{aligned} H(C) = \sum _{i=1}^{k} H(C_i) = \sum _{i=1}^{k} \log (c_i) \end{aligned}$$

Given the logarithmic identity:

$$\begin{aligned} \log (a) + \log (b) = \log (ab) \end{aligned}$$

we have:

$$\begin{aligned} H(C) = \log \left( \prod _{i=1}^{k} c_i\right) \end{aligned}$$

Entropy’s property of maximization under uniform distribution implies that a higher entropy value corresponds to a more uniform distribution of data points across clusters. This can be interpreted in the clustering context as more uniform sizes of clusters. Therefore, a higher total entropy H(C) suggests that the clusters are more uniform in size.

Furthermore, if we consider the geometric mean of the cluster sizes, given by:

$$\begin{aligned} G = \left( \prod _{i=1}^{k} c_i\right) ^{\frac{1}{k}} \end{aligned}$$

we find that maximizing H(C) is equivalent to maximizing G. For a fixed total number of data points, the inequality of arithmetic and geometric means (AM–GM inequality) gives \(G \le \bar{c}\), with equality precisely when all cluster sizes are equal, so the maximum is attained by perfectly balanced clusters.

Thus, a higher entropy in the spectral clustering indicates more uniform sizes of the clusters, reflecting greater scale fairness. \(\square \)
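A quick numeric illustration of Theorem 2 may help: for a fixed number of points, the total entropy H(C) = sum of log(c_i) is larger for the balanced partition. The cluster sizes below are hypothetical and chosen purely for illustration.

```python
import numpy as np

def total_entropy(sizes):
    """Total clustering entropy H(C) = sum_i log(c_i) from the proof above."""
    return float(np.sum(np.log(np.asarray(sizes, dtype=float))))

balanced   = [100, 100, 100, 100]   # 400 points split evenly
unbalanced = [310, 30, 30, 30]      # 400 points with one dominant cluster
print(total_entropy(balanced), total_entropy(unbalanced))   # ~18.42 vs ~15.94
```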

4.2 Entropy weighted spectral clustering

With respect to Theorem 2, it is observed that scale fair spectral clustering improves scale fairness by increasing the entropy associated with the input matrix. Nevertheless, this approach seems to compromise the integrity of the original data graph \(\mathcal {G}\). This perturbation results from the direct manipulation of the entropy parameter, which subsequently leads to a loss of predictability in the clustering results. The alteration of the entropy value appears to induce a deviation from the structured representation inherent in \(\mathcal {G}\), thereby posing a challenge in maintaining control over the resulting clustering behavior.

Thus, this paper effectively improves the scale fairness of spectral clustering by expanding the influence of high entropy points on the clustering results and constructing the entropy weight feature matrix \(X'\) as the input matrix of K-means.

4.2.1 Algorithm

EWSC first computes the probability distribution for each data point across all of its feature dimensions. This is typically done by transforming each feature value of the data point into a probability. For example, the probability can be represented by the ratio of a data point's feature value to the sum of that feature's values over all data points. The specific calculation is given in Definition 8.

Definition 8

To avoid numerical issues (such as division by zero or logarithm of zero), a small constant \( \epsilon \) is typically added to the probability values. Thus, the entropy calculation formula becomes:

$$\begin{aligned} h(x_i) = -\sum _{j=1}^{d} \big (p(x_{ij}) + \epsilon \big ) \log \big (p(x_{ij}) + \epsilon \big ) \end{aligned}$$

where \(d\) is the number of feature dimensions and \(p(x_{ij})\) is the probability associated with the \(j\)-th feature of data point \(x_i\).

Repeat the above process for each data point in the dataset to calculate its entropy value.
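A sketch of this per-point entropy, under the assumption that the probabilities of Definition 8 are obtained by normalising each feature column over the data points (one plausible reading of the description above):

```python
import numpy as np

def pointwise_entropy(X, eps=1e-12):
    """Per-point entropy h(x_i) in the spirit of Definition 8.

    Each feature column is normalised over the data points, and the entropy
    of a point is then taken over its feature dimensions; eps avoids log(0).
    """
    X = np.asarray(X, dtype=float)
    P = X / (X.sum(axis=0, keepdims=True) + eps)       # column-wise normalisation
    return -np.sum((P + eps) * np.log(P + eps), axis=1)
```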

Spectral clustering extracts the eigenvectors corresponding to the k smallest eigenvalues of the Laplacian matrix and assembles them into the matrix X. These eigenvectors represent the embedding of the data in the low-dimensional space and are the basis for performing clustering. In this paper, we use \(X' = HX\) as the input matrix of K-means, which effectively enhances the influence of high entropy points in clustering.

Theorem 3

\(X' = HX\) increases the influence of high entropy points on the spectral clustering process relative to X.

Proof

Let \( H = \text {diag}(h_1, h_2, \ldots , h_n) \) be the diagonal matrix of entropy values for the data points, where \( h_i \) is the entropy of the \( i \)-th data point, and let \( X = [x_1, x_2, \ldots , x_k] \in \mathbb {R}^{n \times k} \) be the matrix of selected eigenvectors of the Laplacian matrix, where each \( x_i \) is a column vector representing the \( i \)-th eigenvector.

The entropy weight feature matrix \( X' \) is computed as follows:

$$\begin{aligned} X' = HX = \begin{bmatrix} h_1 & \quad 0 & \quad \cdots & \quad 0 \\ 0 & \quad h_2 & \quad \cdots & \quad 0 \\ \vdots & \quad \vdots & \quad \ddots & \quad \vdots \\ 0 & \quad 0 & \quad \cdots & \quad h_n \end{bmatrix} [x_1, x_2, \ldots , x_k] \end{aligned}$$

Left multiplication by the diagonal matrix acts entry-wise on the rows of X:

$$\begin{aligned} X'_{i\ell } = h_i X_{i\ell }, \quad i = 1, 2, \ldots , n, \; \ell = 1, 2, \ldots , k \end{aligned}$$

This operation scales the \( i \)-th row of the eigenvector matrix by the entropy \( h_i \) of the corresponding data point. High entropy data points (with high \( h_i \)) will have their corresponding rows in the eigenvector matrix scaled up, increasing their influence in the subsequent clustering process (e.g., K-means). \(\square \)

Compared with standard spectral clustering, EWSC uses this new construction to obtain the feature matrix \(X'\), which is the direct input matrix for K-means clustering. As a result, EWSC achieves better scale fairness performance. Algorithm 2 summarizes this process.

Algorithm 2 Entropy Weighted Spectral Clustering
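Putting the pieces together, a minimal sketch of the EWSC pipeline; it reuses the earlier pointwise_entropy and Laplacian sketches, and the parameter choices are illustrative rather than the paper's exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def ewsc(X_data, S, k):
    """Sketch of EWSC: entropy-weighted spectral embedding followed by K-means.

    X_data : (n, d) original feature matrix, used only for the entropy weights
             (see the pointwise_entropy sketch above)
    S      : (n, n) similarity matrix
    k      : number of clusters
    """
    L = np.diag(S.sum(axis=1)) - S              # unnormalised Laplacian
    _, eigvecs = np.linalg.eigh(L)
    X = eigvecs[:, :k]                          # spectral embedding
    h = pointwise_entropy(X_data)               # entropy of each original point
    X_weighted = h[:, None] * X                 # scale row i by h_i (X' = HX)
    return KMeans(n_clusters=k, n_init=10).fit_predict(X_weighted)
```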

4.2.2 Limitations of EWSC

Table 2 shows some of the experimental results for EWSC and spectral clustering (SC); we present the details in Sect. 5. Among them, SCoef is the metric used to evaluate the clustering effect; the closer the SCoef value is to 1, the better the algorithm clusters.

Table 2 Part of the experimental data about EWSC and SC, including the fair performance and clustering effects of both

In Table 2, we can clearly see that EWSC improves significantly in terms of fairness performance but loses too much clustering effect. Therefore, we consider that EWSC may have the following limitations:

  • EWSC adjusts the clustering process by directly multiplying the entropy value of the original data points on the feature vector. This approach can cause the structural information of the original data to be obscured to some extent by the entropy value, especially when the distributional characteristics of the data points vary widely.

  • EWSC is very sensitive to the distribution characteristics of the original data. If there are outliers or a very uneven distribution in the dataset, the directly computed entropy values may adversely affect the clustering results.

  • Directly multiplying the entropy values onto the feature vectors can lead to over-fitting of the data, especially when the entropy values differ significantly, which in turn degrades the clustering effect.

4.3 Scale fair spectral clustering

4.3.1 Algorithm

This paper uses a new method of Laplacian matrix construction that allows SFSC to take into account not only the similarity between samples but also the uncertainty in the distribution of similarities for each data point. The experimental results show that SFSC has excellent scale fairness performance and a clustering effect comparable to SC.

Compared to EWSC, all entropy operations in SFSC are based on the similarity matrix S. This calculation is less affected by how dispersed the original data distribution is, and the resulting entropy already incorporates the relationships between the data points. This paper constructs an entropy diagonal matrix (ED), which is formed as in Definition 9.

Definition 9

The entropy diagonal matrix (ED) is defined as:

$$\begin{aligned} {\text {ED}} = \begin{bmatrix} {\text {ED}}_{11} & \quad 0 & \quad \ldots & \quad 0 \\ 0 & \quad {\text {ED}}_{22} & \quad \ldots & \quad 0 \\ \vdots & \quad \vdots & \quad \ddots & \quad \vdots \\ 0 & \quad 0 & \quad \ldots & \quad {\text {ED}}_{NN} \end{bmatrix} \end{aligned}$$

where \({\text {ED}}_{ii}\) represents the entropy of the \(i\)-th data point from similarity matrix S:

$$\begin{aligned} {\text {ED}}_{ii} = -\sum _{j=1}^{N} p(x_j^{(i)}) \log (p(x_j^{(i)})) \end{aligned}$$

The probability \( p(x_j^{(i)}) \) is the ratio of the similarity between \( x_i \) and \( x_j \) to the sum of similarities between \( x_i \) and all other data points. This is formed as in Definition 10.

Definition 10

The similarity probability \(p(x_j^{(i)})\) is defined as:

$$\begin{aligned} p\big (x_j^{(i)}\big ) = \frac{S_{ij}}{\sum _{t=1}^{N} S_{it}} \end{aligned}$$

where \( \sum _{t=1}^{N} S_{it} \) is the sum of similarities between the \( i \)-th data point and all other data points from similarity matrix S.
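Definitions 9 and 10 translate directly into code; a minimal sketch, assuming a NumPy similarity matrix and adding a small eps guard (an implementation choice, not part of the definitions):

```python
import numpy as np

def entropy_diagonal(S, eps=1e-12):
    """Entropy diagonal matrix ED of Definition 9.

    Each row of S is normalised into a probability distribution as in
    Definition 10, and ED_ii is the entropy of row i; eps avoids log(0).
    """
    S = np.asarray(S, dtype=float)
    P = S / (S.sum(axis=1, keepdims=True) + eps)    # p(x_j^{(i)}) = S_ij / sum_t S_it
    ed = -np.sum(P * np.log(P + eps), axis=1)       # entropy of each row
    return np.diag(ed)
```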

This paper constructs a new Laplacian matrix based on ED, which is formed as in Definition 11. The new Laplacian matrix also retains symmetry and nonnegativity.

Definition 11

The new Laplacian matrix is obtained by subtracting \(S\) from \(ED\):

$$\begin{aligned} L' = {\text {ED}} - S \end{aligned}$$

Theorem 4

The new Laplacian matrix \(L'\) has symmetry and nonnegativity.

Proof

Symmetry:

$$\begin{aligned} (L')^{\textrm{T}}&= ({\text {ED}} - S)^{\textrm{T}} \\&= {\text {ED}}^{\textrm{T}} - S^{\textrm{T}} \\&= {\text {ED}} - S \\&= L' \end{aligned}$$

Thus, \(L'\) is a symmetric matrix.

Nonnegativity: For any vector \(\textbf{x}\), we have:

$$\begin{aligned} \textbf{x}^{\textrm{T}} L' \textbf{x} = \textbf{x}^{\textrm{T}} {\text {ED}} \textbf{x} - \textbf{x}^{\textrm{T}} S \textbf{x} \end{aligned}$$

Since \({\text {ED}}\) is a diagonal matrix with nonnegative elements (entropy values) and \(S\) is the similarity matrix with nonnegative elements, \(L'\) has nonnegativity. \(\square \)

Once the construction of the Laplacian matrix \(L'\) is complete, the remaining steps are the same as for standard spectral clustering. The scale fair spectral clustering procedure is summarized in Algorithm 3 as follows.

Algorithm 3 Scale Fair Spectral Clustering
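A compact sketch of Algorithm 3, reusing the entropy_diagonal helper above; again this is an illustration under the stated assumptions, not the paper's reference implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def sfsc(S, k):
    """Sketch of SFSC: replace the degree matrix by ED (Definition 11),
    then proceed as in standard spectral clustering."""
    L_prime = entropy_diagonal(S) - S              # L' = ED - S
    _, eigvecs = np.linalg.eigh(L_prime)
    X = eigvecs[:, :k]                             # embedding from the k smallest eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```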

4.3.2 Differences between SFSC and EWSC

SFSC computes entropy values based on the similarity matrix, which takes into account the relationships between data points. In many cases, the relationships between data points are more critical than the distribution of individual data points, especially in network or social data. Meanwhile, SFSC performs a more balanced feature adjustment: the new Laplacian matrix construction moderately adjusts the features of the data while preserving the original data structure, which may help to improve the clustering effect.

Although both EWSC and SFSC aim to achieve scale fair spectral clustering, they differ significantly in their approach to data handling. EWSC focuses more on the distribution properties of individual data points, while SFSC emphasizes the relationships between data points. Thus, in practical applications, SFSC may be more appropriate for datasets where the relationships between data points are important, while EWSC may be better suited for datasets where the independent distributional properties of data points are more relevant.

5 Evaluation

In this section, we conduct extensive experiments on 11 real datasets of different sizes. Both EWSC and SFSC are implemented in Python, and the main parameter is the number of clusters K. We list the information of the datasets in detail, and compare the scale fairness performance and clustering effect of EWSC and SFSC with existing spectral clustering algorithms. Finally, we present the experimental results and discussion.

5.1 Datasets

Most of the datasets are from SNAP [39]. In addition, Iris, Wine, and wdbc are all classic public datasets in the field of machine learning. Drug is from [40], Congress is from [41], and Friend-net is from [42]. Information on all datasets is presented in Table 3.

Table 3 Description of the dataset used in this paper

5.2 Fairness performance

5.2.1 EWSC

This paper compares the scale fairness performance of different spectral clustering algorithms on various datasets, where SC is the standard spectral clustering algorithm proposed in [1]. In Fig. 6, we first compare the difference in scale fairness performance between EWSC and SC on two classical datasets and four real datasets (including three large datasets). The experiments set the number of clusters to consecutive numbers starting from 5. SD was chosen as the main evaluation metric for fairness. However, SD becomes large when the datasets are very large, and when there are extreme values in the cluster scales it does not intuitively reflect the situation. Therefore, we also chose RR as a second evaluation metric: a normalized metric that describes the degree of dispersion and focuses on extreme values.

Fig. 6 Scale fairness performance of EWSC and SC under six different datasets with the number of clusters K set to five consecutive numbers

As we can see from Fig. 6, EWSC improves significantly in scale fairness compared with SC. On the vast majority of the datasets, the green bars representing SC are higher than those of EWSC, which means that EWSC has better scale fairness performance (lower SD and RR values). Especially on a large dataset like Wiki-Vote, the SD value of EWSC is less than half that of SC. However, on some datasets such as email-Eu, the improvement in scale fairness of EWSC is not obvious, which we believe is related to EWSC's direct way of constructing the feature matrix.

We then compare EWSC with existing fair spectral clustering algorithms on four more datasets. Figure 7 shows the experimental results comparing the scale fairness performance of EWSC with the existing fair spectral clustering algorithms GFSC and IFSC. Among them, GFSC is the group fair spectral clustering proposed in [14], and IFSC is the individual fair spectral clustering proposed in [19]. For GFSC, the sensitive attribute on Friend-net is set to Gender, and on Drug it is set to Ethnicity.

Fig. 7 Scale fairness performance of EWSC, GFSC, and IFSC under four different datasets with the number of clusters K set to six consecutive numbers

When examining the comparative analysis of EWSC against GFSC and IFSC as depicted in Fig. 7, it becomes apparent that EWSC exhibits superior performance in terms of scale fairness. The architectural design of GFSC and IFSC does not explicitly account for the potential disproportion in cluster scales. This oversight becomes evident in datasets such as Friend-net, where GFSC’s RR value approaches 5, indicating a significant disparity in cluster scales that could potentially influence the final decision. Conversely, in the context of the Drug dataset, EWSC’s RR metric does not fare as well. This can be attributed to the inherent characteristics of EWSC as a clustering algorithm, where the formation of clusters is predominantly influenced by the inherent similarities within the dataset. On the whole, EWSC stands out as a spectral clustering algorithm that aligns more closely with the concept of global fairness, both in its theoretical underpinnings and empirical outcomes.

5.2.2 SFSC

In Fig. 8, we compare the scale fairness performance of SFSC and SC; for this experiment we chose the same six datasets as in Fig. 6. From the experimental results, we can see that SFSC also has outstanding scale fairness performance: in five out of six datasets, the SD and RR of SFSC are much smaller than those of SC. This steady improvement in scale fairness is something that EWSC does not have, because on some datasets in Fig. 6 EWSC only performs better in scale fairness for certain cluster number settings. For the email-Eu dataset, neither SFSC nor EWSC performs well in terms of fairness. We think that this dataset may have characteristics, such as overly pronounced features, that prevent both algorithms from performing well enough on this data.

Fig. 8 Scale fairness performance of SFSC and SC under six different datasets with the number of clusters K set to six consecutive numbers

Both the EWSC and SFSC algorithms proposed in this paper aim to improve the scale fairness of spectral clustering, where SFSC has a more moderate entropy matrix construction method compared to EWSC. Therefore, we organized experiments on four more datasets with the intention of comparing the scale fairness performance of these two algorithms. In Fig. 9, we use a radar chart to show the results of this experiment. Smaller RR and SD values, that is, a smaller enclosed area in the radar chart, indicate better scale fairness performance. From the experimental results, it can be seen that EWSC and SFSC are close in scale fairness performance. Depending on the dataset and the chosen number of clusters K, either EWSC or SFSC may be the most suitable scale fair spectral clustering algorithm.

Fig. 9 Scale fairness performance of EWSC and SFSC under four different datasets with the number of clusters K set to six consecutive numbers

5.3 Clustering effect

In this paper, we use the Silhouette Coefficient (SCoef) as a metric for evaluating the effectiveness of clustering algorithms. SCoef quantitatively evaluates both the cohesion within clusters and the degree of separation between them. This coefficient is bounded within the interval [\(-1\), 1], with values approaching 1 indicating more distinct and well-defined cluster formations, while values closer to \(-1\) indicate less effective clustering performance. In Table 4, we list three evaluation metrics on one classical dataset and three real datasets, including SD and RR, which measure the scale fairness performance, and SCoef, which is used to measure the clustering effect. The number of clusters is set to six consecutive numbers.
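For reference, SCoef is available off the shelf in scikit-learn; the toy two-blob data below is made up purely to illustrate the call.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy illustration of SCoef: two well-separated Gaussian blobs give a value
# close to 1; poor clusterings give values near 0 or below.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X_toy)
print(silhouette_score(X_toy, labels))
```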

In this part of the experiment, we focus on the clustering effect (SCoef) of EWSC and SFSC compared to SC. Overall, SC is undoubtedly the algorithm with the best clustering effect, because the whole clustering process of SC considers only the similarity between the data, while EWSC expands the influence of high entropy points in the feature matrix and SFSC builds its Laplacian from ED.

However, given the improved scale fairness, it is reasonable to sacrifice the clustering effect to some extent, which is a phenomenon present in any fair clustering algorithm. For most datasets, SCoef does not degrade too much relative to the improvement in scale fairness performance, which we consider acceptable [14, 19]. It is also surprising to us that for some datasets (e.g., HepTh), SFSC also shows better clustering results, and we believe that the reason for this may be that SFSC is more robust in dealing with certain noises or outliers, since its goal is not only to optimize the overall clustering performance, but also to take into account the fair distribution of certain uncertain points.

Another part of our focus is on the clustering effect of EWSC compared to SC and SFSC. The experimental results illustrate that the clustering effect of EWSC is still an acceptable value under most of the datasets, considering that EWSC also has a significant scale fairness improvement. However, under some datasets (e.g., the large dataset Wiki-Vote), EWSC loses too much clustering effect, and we consider that at this point EWSC is not suitable as a qualified fair spectral clustering algorithm.

Table 4 Clustering effect and scale fairness performance under four datasets

5.4 Discussion

The experiment has two main parts, scale fairness performance and clustering effect. From the various metrics of the experimental results, both EWSC and SFSC have obvious scale fairness improvement compared to the existing spectral clustering algorithms. These two fair spectral clustering algorithms are able to consider the scale of clustering result clusters globally, and effectively solve the problem of cluster assignment of uncertain points by introducing entropy values. It is worth mentioning that although SFSC does not directly expand the influence of high entropy points in clustering as EWSC does, SFSC still has an uncompromising scale fairness performance.

Weighing fairness performance against clustering effect, SFSC is the better choice of the two in most cases. Because SFSC fully takes into account both data similarity and entropy during the clustering process, which is also reflected in the experimental results, it achieves a more stable and better clustering effect than EWSC while both offer excellent scale fairness.

6 Future work

Addressing the issue of scale fairness is crucial, and it is important to recognize that its implications extend beyond spectral clustering: the issue affects a broad range of classification and clustering methods used in computational algorithms. Recognizing this paves the way for a more comprehensive investigation and helps resolve challenges related to scale fairness more effectively. Future efforts in this area will focus on developing solutions that address scale fairness across a broader range of algorithmic constructs, thereby enhancing the robustness and fairness of computational classification and clustering systems.

7 Conclusion

This paper presents the EWSC and SFSC algorithms, which aim to solve the scale fairness problem of spectral clustering. This is the first study of the notion of scale fairness in the field of spectral clustering; it is defined in terms of the resulting cluster scale problem that potentially exists in all classification and clustering algorithms, and this way of defining fairness is more global than the existing notions of fairness. The scale fairness performance of spectral clustering is effectively improved by introducing entropy into the spectral clustering process, and the experimental results demonstrate the scale fairness improvement of EWSC and SFSC over existing spectral clustering algorithms, while SFSC also has a clustering effect comparable to SC.