1 Introduction

The quantity of textual data generated on the internet has grown exponentially in recent years, driven by technological evolution, and this growth complicates the task of grouping text documents [1]. Text clustering methods are unsupervised algorithms that process a large collection of text documents and group them into a predetermined number of clusters [2], so that each cluster contains similar documents (documents within the same cluster have a higher degree of similarity to each other than to documents in other clusters). Text clustering data sets contain both relevant and non-informative features, and the redundant, useless, and noisy non-informative features can impair the accuracy and computational performance of the clustering technique [3]. Feature selection (FS) techniques address this problem by choosing an optimal subset of relevant attributes from a wide set of features. FS is therefore a primary and important task, an inevitable part of data mining that deals with the curse of dimensionality. Moreover, these methods are designed to enhance the accuracy of the clustering approach and minimize the number of non-informative features in each document. The FS approach supports many areas of text mining [4], including text clustering, classification, text categorization, and information retrieval.

This paper introduces a hybrid PSO algorithm, named HPSO, to enhance clustering accuracy. The developed method reduces the number of non-informative features, improving the efficiency and effectiveness of the clustering technique, based on an asynchronously adaptive inertia weight and an improved constriction factor. In addition, a chaotic map and a mean absolute difference (MAD) fitness function with a feature count penalty are introduced to improve text clustering accuracy. Chaotic maps play a major role in enhancing evolutionary algorithms, helping them escape local optima and accelerating convergence [5]. Concretely, HPSO is applied to each document to generate a new subset of useful text features; these subsets are then combined to form the input for the k-means text clustering approach, a popular unsupervised procedure thanks to its fast convergence [2, 6]. The numerical validation of HPSO is conducted on two popular text data sets, Reuters-21578 [7] and Webkb [8]. The performance of HPSO is examined through comparison with a genetic algorithm [9] and various recent PSO algorithms [1, 10]. The experimental results show that the proposed HPSO outperforms the competing approaches in terms of Precision, Recall, F-measure, and Accuracy.

The rest of this paper is organized as follows. Section 2 reviews the literature on PSO and the genetic algorithm. Section 3 is devoted to the text preprocessing steps, FS using the PSO method, the proposed HPSO approach, and the k-means clustering technique. The numerical experimental results are presented in Sect. 4. Finally, the conclusion is given in Sect. 5.

2 State of the art

Numerous methodologies have been proposed to make feature selection more effective. These approaches can be classified into wrapper, filter, embedded, and hybrid methods [11]. The main difference between them is whether a learning approach is applied and how it is used. Wrapper approaches depend on the learning algorithm used: a search strategy generates subsets of features, and a learning algorithm examines the accuracy of each selected subset [12]. The method improves accuracy by consecutively inserting features into, or eliminating them from, the subset. Since the clustering technique is applied in each evaluation, wrapper methods outperform the other approaches in terms of accuracy [13]. However, the pursuit of accuracy improvement typically suffers from overfitting, and wrapper methods are computationally expensive. Moreover, they are not generic: the selected subset of features is heavily dependent on the clustering algorithm employed to measure quality, so any modification of this algorithm requires re-executing the FS algorithm. Filter approaches, by contrast, rely on statistical tools, such as correlation and consistency measures, to evaluate the set of features and generate a new subset of informative text features, without interacting with a learning technique [1, 14]. An optimal feature subset is derived by eliminating the features that perform poorly. With no reliance on a learning algorithm, these approaches tend to be faster and simpler, but less accurate, than wrapper methods [15]. Hybrid approaches [16] combine filter and wrapper methods to select relevant features. Embedded approaches automatically generate and evaluate new feature subsets as part of the learning technique's variable selection process. As a result, the computational cost and clustering accuracy of embedded methods lie between those of wrappers and filters [17].

Metaheuristic algorithms are extensively employed to enhance feature selection (FS) methods. Notably, various well-known evolutionary algorithms have been used in these studies to overcome the tendency of prior works to get stuck in local optima. These include PSO [18,19,20], GA [21,22,23], ant colony optimization (ACO) [24, 25], and differential evolution (DE) [26], among others.

The PSO approach was initially developed in [27]; it is an optimization method based on swarm intelligence, which mimics the social behavior of species such as flocking birds. To find the optimum solution to a FS problem, PSO places a swarm of particles in the search space, where each particle's movement is determined by its own velocity as well as the movement experiences of the other particles. The authors introduced two variations of the PSO algorithm: (1) the "GBEST" approach, where each particle tracks the best solution found by the whole swarm, and (2) the "LBEST" approach, where each particle follows the best solution found among its neighbors. In 1998, another study proposed a new parameter called the inertia weight to boost the effectiveness of the original PSO [28], along with two models of the PSO algorithm: the first tested different values of a fixed inertia weight, whereas the second used a time-decreasing inertia weight. The experimental results demonstrated that the developed method substantially improves PSO performance. PSO has been successfully used to solve many optimization problems, such as medical care problems [29], image processing [30], cloud computing [31], and wireless networks [32]. PSO has also been used to improve classification accuracy in support vector machines (SVM) by determining the SVM parameters and performing FS [3]. In another study, a multi-swarm PSO algorithm was introduced to reduce the number of features and improve classification performance, applying a multi-swarm strategy to PSO for parameter determination and FS in SVM [33].

Several variants of the PSO algorithm have been proposed to improve its convergence and learning abilities and to overcome drawbacks such as premature convergence and entrapment in local optima [34]. The authors of [35] investigated four PSO variants and introduced two PSO models: (1) the first variant used a fixed inertia weight [35, 36], (2) the second used a functional inertia weight [35, 37], (3) the third used a fixed constriction factor [38], (4) the fourth used a functional constriction factor [38, 39], (5) the first proposed model used a synchronously updated inertia weight and constriction factor, and (6) the final model used an asynchronously updated inertia weight and constriction factor. Experimental results established that the sixth PSO model outperforms the other comparative approaches in terms of classification accuracy across different feature dimensions. Another study proposed integrating opposition-based initialization, a chaotic strategy, a fitness-based dynamic inertia weight, and mutation into binary PSO (BPSO) to enhance its global search capability [2]. The reported experimental results show that this approach outperforms BPSO, chaotic BPSO (CBPSO), simple GA (SGA), and adaptive inertia weight PSO (AIWPSO) in terms of convergence speed and clustering accuracy. On the other hand, the authors of [1] proposed FSPSOTC, an improved inertia weight PSO algorithm with MAD as its fitness function, to solve the text FS problem, improving the performance of the text clustering technique and reducing computing time. The method was compared with a genetic algorithm (GA), the harmony search approach (HS), and k-means clustering without any FS method on six benchmark data sets, and its effectiveness for text clustering was demonstrated through numerical experiments.

The GA and PSO are two algorithms frequently applied to difficult optimization problems [40,41,42]. Crossover and mutation are powerful elements of the GA, handling its exploration and exploitation. In some research, crossover and mutation are incorporated into the PSO algorithm to enhance its search capabilities. The FS problem is solved in [43] by integrating GA with PSO. Bare-bones PSO is applied to the FS problem in [44]. The GA is used for FS in credit card fraud detection in [45]. In [46], a fast genetic algorithm for feature selection based on a qualitative approximation approach is proposed.

In this hybrid method, particles' local leaders are updated using a reinforced memory. Moreover, a uniform combination of crossover and mutation is applied to balance the exploration and exploitation of the algorithm.

Recently, PSO has witnessed numerous advancements. The Surprisingly Popular Algorithm-based Adaptive Euclidean Distance-based Topology Learning Particle Swarm Optimization (SpadePSO) is a recent addition to the PSO family; it incorporates innovative learning exemplars and structural topologies, distinguishing itself from conventional PSO approaches [10]. Its design addresses challenges faced by traditional PSO algorithms by providing enhanced exploration and exploitation capabilities. As we explore the landscape of feature selection for text clustering, this study extends beyond the established PSO variants and includes a comparative analysis with SpadePSO to elucidate the distinctive contributions of the proposed Hybrid PSO (HPSO) and Improved Inertia Weight PSO (IIWPSO). This comparison sheds light on the evolving dynamics of PSO-based optimization, offering a comprehensive perspective on state-of-the-art algorithms.

3 Methodology

The digital format has led to a steady increase in the volume of text documents, making text clustering a crucial approach for organizing them. The main objective of a text clustering algorithm is to group documents based on their inherent features. To achieve this, common preprocessing steps (tokenization, stop-word elimination, stemming, and term weighting) are applied to transform the documents into a suitable format before clustering. In this section, we provide a concise overview of these preprocessing steps. Then, we present the PSO method for the feature selection problem.

Next, we introduce the developed HPSO. The goal is to improve clustering efficiency and effectiveness by reducing the number of irrelevant and redundant text features in each document. Documents are represented using the standard vector space model (VSM), and TF-IDF is the metric used to evaluate the importance of terms in the clustering process. Finally, we describe the k-means clustering technique used to examine the performance of the FS methods.

3.1 Preliminaries

Text documents are transformed into numerical representations during preprocessing [2], which consists of five steps: tokenization, stop-word elimination, stemming, term weighting, and VSM representation. The following subsections provide an overview of each stage.

3.1.1 Tokenization

In the tokenization process, a text document is broken into tokens, each of which represents a single term or a group of related terms. In this work, single terms are used for the vector space model representation.

3.1.2 Stop words elimination

Stop words form a family of frequent words, such as "the," "is," "an," and "by," that appear often in text documents but carry little information for the clustering process. It is therefore vital to remove them, because they inflate the feature space and degrade the performance of the text clustering approach.

3.1.3 Stemming

Stemming reduces a word from its inflectional or derivative forms to its grammatical root form. For instance, the words "information," "informations," and "informative" all share the root "inform." A list of potential stemming techniques can be found in [47]. In this work, stemming is carried out using the Porter stemmer.

3.1.4 Term weighting

Term weighting converts the words in documents into a numerical vector representation. A variety of term weighting schemes exist, but the most widely used is term frequency-inverse document frequency (TF-IDF). TF-IDF assesses how important a word is for separating the contents of the documents [1]: its value rises when a term occurs frequently within a document but only in a few documents overall. The formula used to compute the term weight is defined as follows:

$$\begin{aligned} w_{k,l} = tf_{k,l} \times idf_{l} = tf_{k,l} \times \log \Big (\frac{n}{df_{l}}\Big ). \end{aligned}$$
(1)

where \(tf_{k,l}\) represents the number of times the \(l\textrm{th}\) term appears in the \(k\textrm{th}\) document, n represents the total number of documents in the data set, and \(df_{l}\) represents the number of documents that contain the \(l\textrm{th}\) term.
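As an illustration of Eq. (1), the following minimal Java sketch computes TF-IDF weights for a small tokenized corpus; the class and method names are hypothetical and not part of the paper's implementation.

```java
import java.util.*;

/** Minimal TF-IDF sketch following Eq. (1); names are illustrative only. */
public class TfIdf {

    /** Returns one map per document k with w[k][l] = tf(k,l) * log(n / df(l)). */
    public static List<Map<String, Double>> weight(List<List<String>> docs) {
        int n = docs.size();
        Map<String, Integer> df = new HashMap<>();       // df(l): docs containing term l
        for (List<String> doc : docs)
            for (String term : new HashSet<>(doc))
                df.merge(term, 1, Integer::sum);

        List<Map<String, Double>> weights = new ArrayList<>();
        for (List<String> doc : docs) {
            Map<String, Integer> tf = new HashMap<>();   // tf(k,l): raw count in doc k
            for (String term : doc) tf.merge(term, 1, Integer::sum);
            Map<String, Double> w = new HashMap<>();
            for (Map.Entry<String, Integer> e : tf.entrySet())
                w.put(e.getKey(), e.getValue() * Math.log((double) n / df.get(e.getKey())));
            weights.add(w);
        }
        return weights;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("oil", "price", "rise"),
                List.of("oil", "export", "ship"),
                List.of("grain", "export", "price"));
        System.out.println(weight(docs)); // rarer terms receive larger weights
    }
}
```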

3.1.5 VSM

The VSM is a common model that represents documents as vectors of term weights, where each entry gives the weight a word carries in the clustering process. The following expression illustrates how documents are represented using the VSM in this paper:

$$\begin{aligned} \textrm{VSM} = \left[ \begin{array}{ccccc} \phi _{1,1} & \cdots & \phi _{1,j} & \cdots & \phi _{1,t} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \phi _{i,1} & \cdots & \phi _{i,j} & \cdots & \phi _{i,t} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ \phi _{n,1} & \cdots & \phi _{n,j} & \cdots & \phi _{n,t} \end{array} \right] \end{aligned}$$
(2)

3.1.6 Standard PSO

PSO is a sophisticated optimization approach that falls under the category of population-based meta-heuristic methods. It is modeled on how a flock of birds or a school of fish exploits and explores a problem space in search of food. The PSO population is called a swarm, and each member of the swarm is abstracted as a particle. The position of particle i is represented as an \(\mathcal {N}\)-dimensional vector \(\zeta _i = (\zeta _1, \zeta _2, \ldots , \zeta _\mathcal {N})\), and its velocity by the vector \(\upsilon _i = (\upsilon _1, \ldots , \upsilon _\mathcal {N})\). The position \(\zeta \) and velocity \(\upsilon \) are initialized randomly and adjusted at each iteration using the particle's own best position found so far (\(\mathcal {P}optim_j\)) and the swarm's best position, which represents the current global best and is denoted by (\(\mathcal {G}optim_j\)). The original PSO concept makes use of both the individual best and the current global best. The best position is the one with the lowest cost, denoted by \(\mathcal {F}\). The updating strategy is written as follows:

$$\begin{aligned} \upsilon _{i+1} = \phi \upsilon _i+\gamma _1\xi _1(\mathcal {P}optim_i-\zeta _i) +\gamma _2\xi _2(\mathcal {G}optim_i-\zeta _i) \end{aligned}$$
(3)
$$\begin{aligned} \zeta _{i+1} = \zeta _i+\upsilon _{i+1}. \end{aligned}$$
(4)

where \(\xi _1\) and \(\xi _2\) are two uniformly distributed random numbers in [0, 1]. The inertia weight, denoted by \(\phi \), regulates how much impact the previous velocity has; this value is crucial for balancing the algorithm's capacity for exploration and exploitation. The parameters \(\gamma _1\) and \(\gamma _2\) are the acceleration coefficients that regulate self-awareness and social influence, respectively. Algorithm 1 summarizes the standard PSO procedure.

Algorithm 1 PSO algorithm
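A minimal sketch of the update rules (3) and (4) applied to one particle is given below; the function and parameter names are illustrative, not the authors' code.

```java
import java.util.Arrays;
import java.util.Random;

/** One velocity/position update per Eqs. (3) and (4); a generic continuous sketch. */
public class StandardPso {
    private static final Random RNG = new Random(42);

    static void step(double[] pos, double[] vel, double[] pBest, double[] gBest,
                     double phi, double gamma1, double gamma2) {
        for (int j = 0; j < pos.length; j++) {
            double xi1 = RNG.nextDouble();                 // xi_1 ~ U[0,1]
            double xi2 = RNG.nextDouble();                 // xi_2 ~ U[0,1]
            vel[j] = phi * vel[j]
                   + gamma1 * xi1 * (pBest[j] - pos[j])    // cognitive pull, Eq. (3)
                   + gamma2 * xi2 * (gBest[j] - pos[j]);   // social pull, Eq. (3)
            pos[j] += vel[j];                              // Eq. (4)
        }
    }

    public static void main(String[] args) {
        double[] pos = {0.2, 0.8}, vel = {0.0, 0.0};
        step(pos, vel, new double[]{0.5, 0.5}, new double[]{1.0, 1.0}, 0.7, 2.0, 2.0);
        System.out.println(Arrays.toString(pos));          // moves toward the two bests
    }
}
```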

3.2 Proposed HPSO for the FS problem

This work proposes a new approach for solving the FS problem, named HPSO, which uses an asynchronously adaptive inertia weight and an improved constriction factor, along with a chaotic map and a MAD fitness function with a feature count penalty, to improve text clustering accuracy. To accomplish this goal, HPSO is applied to each document to generate a new subset of useful text features; the subsets are then combined to form the input to the k-means clustering approach.

Before presenting the HPSO methodology, we give the mathematical model of the considered FS problem.

3.2.1 Mathematical model

Given \(\textrm{VSM}\) as the matrix of text feature weights, each document is represented as the vector

$$\begin{aligned} \textrm{VSM}_i = \phi _{i,1}, \phi _{i,2}, \ldots ,\phi _{i,j}, \ldots ,\phi _{i,t-1}, \phi _{i,t}, \end{aligned}$$

where i is the document number and t is the number of all unique terms. The FS algorithm generates a new subset of text features S, represented as a vector

$$\begin{aligned} S_i = s_{i,1},\ldots , s_{i,j}, \ldots , s_{i,t},\ \ s_{i,j} \in \{0,1\},\ \ i = 1, \ldots , n,\ \ j = 1,\ldots , t, \end{aligned}$$

where n represents the total number of documents. If \(s_{i,j} = 1\), feature j in the \(i\textrm{th}\) document has been chosen as an informative feature; if \(s_{i,j} = 0\), feature j in the \(i\textrm{th}\) document is a non-informative text feature. The text FS problem can be formulated as an optimization problem to identify the optimal subset of informative features as follows:

$$\begin{aligned} \begin{array}{ll} \text {Max} & \mathcal {F}_i:=\mathcal {MAD}_i \\ \text {s.t.} & s_{i,j} \in \{0,1\} \\ & \forall \ i = 1,\ldots ,n,\ \ \forall \ j = 1,\ldots ,t \end{array} \end{aligned}$$
(5)

where \(\mathcal {MAD}\) is the mean absolute difference, used as the objective function for the text FS problem [1]. It is computed from the TF-IDF weights, the measure the PSO algorithm uses to assess each solution it produces in each generation. The solutions with the highest \(\mathcal {MAD}\) value found by the PSO in each document are considered the optimal solutions to the FS problem. \(\mathcal {MAD}\) assigns a score (fitness value) to each candidate solution by first computing the mean of the selected feature weights \(\phi _{i,j}\) and then averaging the absolute differences between each selected weight and that mean, as illustrated below:

$$\begin{aligned} \begin{aligned} \mathcal {MAD}_i&= \frac{1}{a_i} \sum _{j=1}^{t} s_{i,j} \left| \phi _j-\bar{\zeta _i} \right| \\ \textrm{where}&\\ \bar{\zeta _i}&= \frac{1}{a_i} \sum _{j=1}^{t} s_{i,j} \phi _j \end{aligned} \end{aligned}$$
(6)

\(\mathcal {MAD}_i\) represents the value of the fitness for the \(i\textrm{th}\) particle. If the \(j\textrm{th}\) term is selected in the \(i\textrm{th}\) solution, \(s_{i,j} =1\); otherwise, \(s_{i,j}= 0\). The \(\phi _j\) stands for the weight of the \(j\textrm{th}\) feature in the current document, \(a_i\) for the total number of selected features in the \(i\textrm{th}\) particle for the current document, t for the total number of terms, and \(\bar{\zeta _i}\) for the average value of the selected weights in the \(\textrm{VSM}\) for the current document.

To solve the FS problem, we apply the PSO approach. The main goal is to develop a new subset of informative text features that best represents the existing document. The process starts with random feature vectors and then improves the population until the stopping criterion is reached. The PSO swarm is made up of particles (solutions), each of which is represented by a binary vector of positions (features) [2]. Each position denotes the state of a single feature in the document. The solution (particle) representation of the PSO algorithm is given by

$$\begin{aligned} X =[ 0, 1, 1, -1, 0, 1, 1, -1, 0, 1 ], \end{aligned}$$
(7)

where each of the t unique terms in the search space of the FS problem can be either chosen or ignored. When position j is 1, the \(j\textrm{th}\) feature is chosen as a valuable text feature; when position j is 0, the \(j\textrm{th}\) feature is not chosen; and when position j is \(-1\), the \(j\textrm{th}\) feature is absent from the actual document.

3.2.2 Penalty fitness evaluation

Denote by d the required size of the feature subset, which is used as a constraint to force the generated subset of text features to satisfy the given size requirement [2]; particles that violate this constraint are penalized. The particle fitness is given by the following equation.

$$\begin{aligned} \mathcal {F}_i = \mathcal {MAD}_i - \delta \times |a_i-d| \end{aligned}$$
(8)

where \(\delta \) is a penalty coefficient, d is the required size value, and \(a_i\) is the total number of features selected in the \(i\textrm{th}\) particle.
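The penalized fitness of Eqs. (6) and (8) can be sketched as follows; the names are illustrative, and the guard against empty subsets is our addition, not part of the paper.

```java
/** MAD fitness with feature-count penalty, following Eqs. (6) and (8). */
public class PenalizedMadFitness {

    /**
     * @param phi    TF-IDF weights of all t terms for the current document
     * @param s      binary selection mask (1 = feature selected), same length as phi
     * @param d      required subset size
     * @param delta  penalty coefficient
     */
    static double fitness(double[] phi, int[] s, int d, double delta) {
        int a = 0;                    // a_i: number of selected features
        double sum = 0.0;
        for (int j = 0; j < phi.length; j++)
            if (s[j] == 1) { a++; sum += phi[j]; }
        if (a == 0) return Double.NEGATIVE_INFINITY; // empty subsets are invalid (our guard)
        double mean = sum / a;        // mean of selected weights in Eq. (6)
        double mad = 0.0;
        for (int j = 0; j < phi.length; j++)
            if (s[j] == 1) mad += Math.abs(phi[j] - mean);
        mad /= a;                     // MAD_i of Eq. (6)
        return mad - delta * Math.abs(a - d); // Eq. (8)
    }

    public static void main(String[] args) {
        double[] phi = {0.9, 0.1, 0.4, 0.0, 0.7};
        int[] s = {1, 0, 1, 0, 1};
        System.out.println(fitness(phi, s, 3, 0.05)); // a = d = 3, so no penalty applies
    }
}
```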

3.2.3 Chaotic map

Chaos refers to the seemingly random, irregular motion that appears in deterministic systems. A chaotic system is a nonlinear dynamic system that is highly sensitive to its initial conditions and parameters, and it possesses the traits of determinism, ergodicity, stochasticity, and regularity. Several researchers have employed chaotic maps to improve PSO's search and global convergence capabilities [48]. In this study, we use a logistic map to generate chaotic sequences [2]. Mathematically, the logistic map is defined as

$$\begin{aligned} ch_{I+1} = 4 \times ch_I \times \left( 1 - ch_I \right) \end{aligned}$$
(9)

Equation (9) generates a chaotic value ch between 0.0 and 1.0 at each iteration I. The initial value \(ch_0\) is generated randomly, avoiding the fixed points 0, 0.25, 0.5, 0.75, and 1. Figure 1 shows the changing curve of the ch value.

Fig. 1 The changing curve of value ch
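A minimal sketch of Eq. (9); the seed value is arbitrary, chosen only to avoid the excluded fixed points.

```java
/** Logistic-map chaotic sequence of Eq. (9). */
public class LogisticMap {
    public static void main(String[] args) {
        double ch = 0.613;                    // ch0: any seed except 0, 0.25, 0.5, 0.75, 1
        for (int i = 1; i <= 10; i++) {
            ch = 4.0 * ch * (1.0 - ch);       // Eq. (9); the value stays in (0, 1)
            System.out.printf("I=%d  ch=%.4f%n", i, ch);
        }
    }
}
```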

3.2.4 Constriction factor

The original PSO has been improved and extensively used in many applications. Nevertheless, no rigorous mathematical justification for the convergence of the PSO algorithm has been established. Clerc [49] explained, using mathematical tools, how a simplified PSO model behaves in its search for an optimal solution, concluding that it may not converge in some situations. Indeed, the PSO system equations (3)-(4) can be viewed as a dynamical system, and constriction factors were developed from the analysis of the system trajectory in order to ensure convergence and prevent premature convergence [50]. Due to the large dimension of the text feature set, classical PSO can get stuck in a local optimum before the global optimum has been found [35]. This work therefore incorporates a constriction factor K into PSO to achieve better convergence. The velocity update and the cosine-based constriction factor are shown in Eq. (10). In the early iterations, the convex part of the cosine yields a large K value, so a particle searches over a vast area to locate the region of the best solution; in the late period, the concave part yields a smaller K value, so the particle converges within a small range to pinpoint the best solution.

$$\begin{aligned} \begin{aligned} \upsilon _{i,j}&= K \big [ \upsilon _{i,j} + \gamma _1 \times \xi _1 \times \left( \mathcal {P}optim_{I}-\zeta _{i,j} \right) + \gamma _2 \times \xi _2 \times \left( \mathcal {G}optim_{I}-\zeta _{i,j} \right) \big ] \\ \text {where} \quad K&= 0.25\cos \left( ( \pi / I_\textrm{max} ) \times I \right) +\frac{5}{8} \end{aligned} \end{aligned}$$
(10)

where I is the current iteration number. Figure 2 shows the changing curve of the K value: it begins as a convex function and eventually becomes a concave function.

Fig. 2 The changing curve of value K
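The schedule of Eq. (10) reduces to a one-line function; a sketch follows, with the printed endpoint values following directly from the formula (K = 0.875 at I = 0 and K = 0.375 at I = Imax).

```java
/** Cosine-based constriction factor of Eq. (10): convex early, concave late. */
public class ConstrictionFactor {
    static double k(int i, int iMax) {
        return 0.25 * Math.cos(Math.PI / iMax * i) + 5.0 / 8.0;
    }

    public static void main(String[] args) {
        int iMax = 50;
        System.out.println(k(0, iMax));      // 0.875: wide early search
        System.out.println(k(iMax, iMax));   // 0.375: narrow late refinement
    }
}
```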

3.2.5 Adaptive inertia weight

As can be seen from Eq. (10), three main components drive the velocity update: the first refers to the particle's previous velocity, the second to the information the particle itself possesses, and the third to the information stored in the swarm. The inertia weight controls the first component, while \(\gamma _1, \xi _1\) and \(\gamma _2, \xi _2\) control the second and third, respectively. These parameters are crucial for enhancing the PSO algorithm's search ability. In this study, the PSO algorithm uses a fitness-based dynamic inertia weight that updates the velocity dynamically, changing the inertia weight according to the particle's current fitness [2]. It assigns lower inertia weights to high-fitness particles, which refine their neighborhood (exploitation), and larger inertia weights to low-fitness particles, which search a large portion of the search space (exploration). The improved velocity equation and the fitness-based dynamic inertia weight are described as follows:

$$\begin{aligned} \begin{aligned} \upsilon _{i,j}&= fw_i \times \upsilon _{i,j} + \gamma _1 \times \xi _1 \times \left( \mathcal {P}optim_{I}-\zeta _{i,j} \right) \\&+ \gamma _2 \times \xi _2 \times \left( \mathcal {G}optim_{I}-\zeta _{i,j} \right) \\ \textrm{where}&\\ fw_i&= 1.1-\frac{0.9 \times fw_i}{fw_\textrm{best}+0.1} \end{aligned} \end{aligned}$$
(11)

\(fw_i\) and \(fw_\textrm{best}\) represent the fitness of the \(i\textrm{th}\) and global best particles, respectively.

3.2.6 Asynchronously adaptive inertia weight and improved constriction factor

The constriction factor affects the convergence of PSO particles, while the inertia weight determines how much of the previous velocity is retained. In this study, inspired by the idea of [35] of adapting the inertia weight and constriction factor asynchronously, we apply the constriction factor and the inertia weight in distinct PSO periods according to their different characteristics. During the initial phase, when a particle exhibits a high fitness value, the inertia weight is set to a smaller value; this allows the particle to retain only a limited portion of its previous velocity and focus on local development. As a particle moves farther away from the optimal solution, the inertia weight is increased, enabling it to maintain a larger portion of its previous velocity and explore the search space globally. Moreover, we introduce the chaotic strategy to enhance the global search capability of PSO: the properties of a chaotic system provide good exploration of the search space and refine the selected feature subspace, preventing an individual from becoming trapped at an undesirable local minimum. Equation (12) shows the new formulas for the inertia weight, constriction factor, and velocity.

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{ll} \upsilon _{i,j} = fw_i \times \upsilon _{i,j} + \gamma _1 \times ch \times \left( \mathcal {P}optim_{I}-\zeta _{i,j} \right) + \gamma _2 \times (1-ch) \times \left( \mathcal {G}optim_{I}-\zeta _{i,j} \right) & \textrm{if} \, I < \frac{I_\textrm{max}}{2} \\ \upsilon _{i,j} = K \big [ 0.7 \times \upsilon _{i,j} + \gamma _1 \times ch \times \left( \mathcal {P}optim_{I}-\zeta _{i,j} \right) + \gamma _2 \times (1-ch) \times \left( \mathcal {G}optim_{I}-\zeta _{i,j} \right) \big ] & \textrm{if} \, I \ge \frac{I_\textrm{max}}{2} \end{array} \right. \\&\textrm{where} \\&fw_i = 1.1-\frac{0.9 \times fw_i}{fw_\textrm{best}+0.1}\\&K = \frac{ \cos \left( ( 2 \pi / I_\textrm{max} ) \times ( I - \frac{I_\textrm{max}}{2} ) \right) +2.428571 }{ 4 } \end{aligned} \end{aligned}$$
(12)

where ch is the value between 0.0 and 1.0 generated by Eq. (9) at each iteration I.
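The schedule of Eq. (12) can be sketched as three small functions; the names are illustrative, and, following Sect. 3.2.5, the inertia weight is computed from the particle's fitness and the global best fitness.

```java
/** The asynchronous schedule of Eq. (12); a sketch, names illustrative. */
public class AsyncSchedule {
    // fw_i of Eq. (12), computed from the particle's and the global best fitness
    static double inertiaWeight(double fitI, double fitBest) {
        return 1.1 - 0.9 * fitI / (fitBest + 0.1);
    }

    // K of Eq. (12); about 0.857 at I = Imax/2 and about 0.357 at I = Imax
    static double constriction(int i, int iMax) {
        return (Math.cos(2 * Math.PI / iMax * (i - iMax / 2.0)) + 2.428571) / 4.0;
    }

    // One velocity component, switching phase at Imax/2 as in Eq. (12)
    static double updateVelocity(double v, double pos, double pBest, double gBest,
                                 double ch, double g1, double g2,
                                 double fw, double k, int i, int iMax) {
        double pull = g1 * ch * (pBest - pos) + g2 * (1 - ch) * (gBest - pos);
        return i < iMax / 2 ? fw * v + pull : k * (0.7 * v + pull);
    }
}
```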

3.2.7 Hybrid particle swarm optimization algorithm

The FS problem is addressed by developing a new hybrid PSO method that identifies the best subset of text features. The following paragraph provides a detailed description of the created process, while Algorithm 2 gives the algorithmic flow.

Algorithm 2 HPSO

The PSO algorithm starts the swarm with a set of randomly generated solutions (particles). Each of them is assessed by the MAD fitness function with the feature count penalty defined in Eqs. (6) and (8). Each particle contains a number of positions (features) that move with their own velocity according to Eq. (12), positioning the PSO algorithm in the search space of the text FS problem. The fitness value of every particle is evaluated in each iteration. In addition, parameters such as the chaotic map value ch, the constriction factor K, and the fitness-based dynamic inertia weight \(fw_i\) are recalculated in each iteration, and the current and best fitness values are saved to guide particle movement in subsequent iterations. Finally, the best solution discovered by the PSO approach is selected, representing a new subset of informative features.
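To make this flow concrete, the sketch below assembles the pieces above into a single HPSO loop for one document. It is a minimal sketch under an explicit assumption: the paper does not state how continuous velocities are mapped back to a binary mask, so here positions are kept continuous and thresholded at 0.5, a common convention; all names and parameter values are illustrative.

```java
import java.util.Arrays;
import java.util.Random;

/** End-to-end HPSO sketch for one document, combining Eqs. (6), (8), (9), (12). */
public class HpsoSketch {
    public static void main(String[] args) {
        Random rng = new Random(7);
        int np = 20, t = 100, iMax = 50, d = 30;         // illustrative settings
        double delta = 0.01, g1 = 2.0, g2 = 2.0;

        double[] phi = rng.doubles(t).toArray();         // stand-in TF-IDF weights
        double[][] pos = new double[np][t], vel = new double[np][t], pb = new double[np][t];
        double[] pbFit = new double[np], gb = new double[t];
        double gbFit = Double.NEGATIVE_INFINITY, ch = 0.613;  // ch0 avoids fixed points
        Arrays.fill(pbFit, Double.NEGATIVE_INFINITY);
        for (double[] p : pos) for (int j = 0; j < t; j++) p[j] = rng.nextDouble();

        for (int it = 0; it < iMax; it++) {
            ch = 4 * ch * (1 - ch);                      // Eq. (9)
            double k = (Math.cos(2 * Math.PI / iMax * (it - iMax / 2.0)) + 2.428571) / 4;
            for (int i = 0; i < np; i++) {               // evaluate, update memories
                double f = fitness(phi, pos[i], d, delta);
                if (f > pbFit[i]) { pbFit[i] = f; pb[i] = pos[i].clone(); }
                if (f > gbFit) { gbFit = f; gb = pos[i].clone(); }
            }
            for (int i = 0; i < np; i++) {               // asynchronous update, Eq. (12)
                double fw = 1.1 - 0.9 * pbFit[i] / (gbFit + 0.1);
                for (int j = 0; j < t; j++) {
                    double pull = g1 * ch * (pb[i][j] - pos[i][j])
                                + g2 * (1 - ch) * (gb[j] - pos[i][j]);
                    vel[i][j] = it < iMax / 2 ? fw * vel[i][j] + pull
                                              : k * (0.7 * vel[i][j] + pull);
                    pos[i][j] += vel[i][j];
                }
            }
        }
        System.out.println("best penalized MAD = " + gbFit);
    }

    /** Penalized MAD of Eqs. (6) and (8); positions >= 0.5 count as selected. */
    static double fitness(double[] phi, double[] x, int d, double delta) {
        int a = 0; double sum = 0;
        for (int j = 0; j < x.length; j++) if (x[j] >= 0.5) { a++; sum += phi[j]; }
        if (a == 0) return Double.NEGATIVE_INFINITY;
        double mean = sum / a, mad = 0;
        for (int j = 0; j < x.length; j++) if (x[j] >= 0.5) mad += Math.abs(phi[j] - mean);
        return mad / a - delta * Math.abs(a - d);
    }
}
```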

3.3 Clustering

After creating a fresh subset of informative text features, the accuracy of the FS techniques is assessed using the k-means clustering approach. The following subsections explain the k-means clustering method.

3.3.1 Mathematical model

Given a large set of text documents \(D = d_1, d_2, \ldots , d_i, \ldots , d_n\), where \(d_i = w_{i1}, w_{i2}, \ldots , w_{ij}, \ldots , w_{it}\) represents document i, \(w_{ij}\) represents the weight of feature j in the \(i\textrm{th}\) document, n is the number of documents in the collection, and t is the total number of terms. The cosine similarity \(\mathcal {C}os(d_i,c_l)\) is evaluated between document i and cluster centroid l, where \(c_l = c_{l1}, c_{l2}, \ldots , c_{lj}, \ldots , c_{lt}\) must be updated in every iteration using Eq. (13). The collection of text documents is clustered into k clusters using the considered objective function: each document is assigned to the cluster whose centroid it most resembles, so that documents within a cluster are more similar to one another than to documents in other clusters [1].

$$\begin{aligned} c_{l} = \frac{\sum _{i=1}^{n}(a_{li})d_{i}}{\sum _{i=1}^{n}a_{li}} \end{aligned}$$
(13)

where \(d_i\) represents the \(i\textrm{th}\) document. \(a_{li}\) is equal to 1 if the document number i is assigned to the \(l\textrm{th}\) cluster, and 0 otherwise. The cosine measure is employed in this study to compute the similarity value between the document vector and the cluster centroid vector. The similarity value is calculated using the following formula:

$$\begin{aligned} \mathcal {C}os(d_i,c_l) = \frac{ \sum _{j=1}^{t} w_{ij} \times c_{lj} }{ \sqrt{\sum _{j=1}^{t} w_{ij}^2} \sqrt{\sum _{j=1}^{t} c_{lj}^2} } \end{aligned}$$
(14)

where \(w_{ij}\) is the weight of feature j in the \(i\textrm{th}\) document, \(c_{lj}\) is the value of the \(j\textrm{th}\) term in cluster centroid l, \(\sum _{j=1}^{t} w_{ij}^2\) is the squared norm of the weight vector of document i, and \(\sum _{j=1}^{t} c_{lj}^2\) is the squared norm of the centroid vector of cluster l.
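The similarity computation of Eq. (14) is straightforward to implement; the sketch below is a minimal version, with a guard for zero vectors (an edge case Eq. (14) leaves undefined) added as our assumption.

```java
/** Cosine similarity of Eq. (14) between a document vector and a centroid. */
public class CosineSimilarity {
    static double cos(double[] doc, double[] centroid) {
        double dot = 0, nd = 0, nc = 0;
        for (int j = 0; j < doc.length; j++) {
            dot += doc[j] * centroid[j];
            nd  += doc[j] * doc[j];       // squared norm of the document vector
            nc  += centroid[j] * centroid[j]; // squared norm of the centroid vector
        }
        return (nd == 0 || nc == 0) ? 0 : dot / (Math.sqrt(nd) * Math.sqrt(nc));
    }

    public static void main(String[] args) {
        System.out.println(cos(new double[]{1, 0, 2}, new double[]{1, 1, 2})); // ~0.913
    }
}
```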

3.3.2 k-means algorithm

The k-means method is an unsupervised algorithm that processes a large amount of text and organizes it into a specified number of clusters, with each cluster containing documents that are similar to one another. k-means solves the clustering problem by iteratively reassigning text documents to clusters according to their similarity to the cluster centroids, computed with Eq. (14). Each reassignment step is followed by a recalculation of the cluster centroids using Eq. (13). The clustering solution is represented by a matrix \(\Lambda \) of size \(n\times k\), where n is the number of documents and k the number of clusters. This procedure is described by Algorithm 3, which seeks the best clustering solution \((n \times k)\).

Algorithm 3 k-means text clustering algorithm
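A compact sketch of this procedure follows, reusing the cos method from the previous sketch; seeding the centroids with randomly sampled documents is our assumption, since the exact initialization of Algorithm 3 is not reproduced here.

```java
import java.util.Random;

/** k-means with cosine similarity, following Eqs. (13) and (14); a sketch. */
public class CosineKMeans {

    static int[] cluster(double[][] docs, int k, int maxIter, long seed) {
        Random rng = new Random(seed);
        int n = docs.length, t = docs[0].length;
        double[][] centroids = new double[k][];
        for (int l = 0; l < k; l++)                  // seed centroids with random docs
            centroids[l] = docs[rng.nextInt(n)].clone();
        int[] assign = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {            // assign to most similar centroid
                int best = 0; double bestSim = -1;
                for (int l = 0; l < k; l++) {
                    double sim = CosineSimilarity.cos(docs[i], centroids[l]); // Eq. (14)
                    if (sim > bestSim) { bestSim = sim; best = l; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;                     // assignments stable: converged
            for (int l = 0; l < k; l++) {            // recompute centroids, Eq. (13)
                double[] c = new double[t]; int count = 0;
                for (int i = 0; i < n; i++) if (assign[i] == l) {
                    count++;
                    for (int j = 0; j < t; j++) c[j] += docs[i][j];
                }
                if (count > 0) {
                    for (int j = 0; j < t; j++) c[j] /= count;
                    centroids[l] = c;
                }
            }
        }
        return assign;
    }
}
```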

3.4 Complexity analysis

In this subsection, we present a comprehensive computational complexity analysis of the proposed HPSO used for feature selection in text data. Evaluating the efficiency and scalability of optimization algorithms is crucial for understanding their performance characteristics. The computational complexity of our HPSO is assessed by considering key operations involved in the optimization process. Specifically, we analyze the complexities associated with objective function evaluations, the generation of chaotic sequences using logistic maps, velocity updates, and the adaptive adjustment of inertia weights and constriction factors. The overall computational complexity is derived by aggregating these individual complexities over the course of HPSO iterations. Let us break down the complexities associated with each key algorithmic component:

3.4.1 Fitness function evaluation complexity

The objective function, representing the MAD fitness, involves the calculation of fitness values for each particle. Considering \(\mathcal {N}p\) particles and \(t\) features, the complexity (\(C_{\text {MAD}}\)) can be expressed as:

$$\begin{aligned} C_{\text {MAD}} = O(\mathcal {N}p \cdot t) \end{aligned}$$

3.4.2 Chaotic sequence generation complexity

Logistic map equation (9) for generating chaotic sequences involves iterative calculations. With \(I_{\text {max}}\) iterations, the complexity (\(C_{\text {chaotic}}\)) is given by:

$$\begin{aligned} C_{\text {chaotic}} = O(I_{\text {max}}) \end{aligned}$$

3.4.3 Velocity updates complexity

Velocity update Equation (10) includes arithmetic operations and trigonometric functions. Considering \(I_{\text {max}}\) iterations and \(\mathcal {N}p \cdot t\) features, the complexity (\(C_{\text {velocity}}\)) is expressed as:

$$\begin{aligned} C_{\text {velocity}} = O(I_{\text {max}} \cdot \mathcal {N}p \cdot t) \end{aligned}$$

3.4.4 Inertia weight and constriction factor adaptation complexity

The adaptation of inertia weight and constriction factor involves conditional statements and arithmetic operations. With \(I_{\text {max}}\) iterations and \(\mathcal {N}p\) particles, the complexity (\(C_{\text {adaptation}}\)) is given by:

$$\begin{aligned} C_{\text {adaptation}} = O(I_{\text {max}} \cdot \mathcal {N}p) \end{aligned}$$

By summing up these complexities, the overall computational complexity (\(C_{\text {total}}\)) of the proposed PSO variant can be expressed as:

$$\begin{aligned} C_{\text {total}} = C_{\text {MAD}} + C_{\text {chaotic}} + C_{\text {velocity}} + C_{\text {adaptation}} \end{aligned}$$
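Since the individual terms are summed, the velocity-update term dominates asymptotically, and the total simplifies to:

$$\begin{aligned} C_{\text {total}} = O(\mathcal {N}p \cdot t) + O(I_{\text {max}}) + O(I_{\text {max}} \cdot \mathcal {N}p \cdot t) + O(I_{\text {max}} \cdot \mathcal {N}p) = O(I_{\text {max}} \cdot \mathcal {N}p \cdot t). \end{aligned}$$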

This analysis provides valuable insights into the algorithm’s resource requirements and scalability, offering a foundation for discussions on optimization efficiency in the context of large-scale text feature sets.

The classical PSO complexity is given by \(O(\mathcal {N}p \cdot t + I_{\text {max}} \cdot \mathcal {N}p \cdot t + I_{\text {max}} \cdot \mathcal {N}p)\). Comparing both complexities, the proposed HPSO introduces an additional complexity term associated with chaotic sequence generation. However, it is important to note that the chaotic sequence generation complexity (\(O(I_{\text {max}})\)) is generally lower than the velocity updates complexity (\(O(I_{\text {max}} \cdot \mathcal {N}p \cdot t)\)) in both algorithms.

In summary, while the proposed HPSO introduces some additional computational load due to chaotic sequence generation, the overall impact on complexity is moderate.

4 Experimental results

We implemented Java software that uses the HPSO algorithm to choose a fresh set of useful text features before applying the k-means text clustering method. This section presents the data sets, parameter settings, evaluation criteria, and results. All tests were performed on a laptop with a Core i7 processor and 16 GB of RAM in a Windows 10 environment (Table 1).

Table 1 Parameters setting used in this paper

4.1 Data sets

The experiments are carried out on two reference text data sets, Reuters-21578 and Webkb. Tables 2, 3, and 4 summarize the data sets used to compare HPSO with the competing methods (GA [9] and well-known PSO variants: PSO with fixed inertia weight, PSO with improved inertia weight, PSO with fixed constriction factor, PSO with improved constriction factor, and PSO with adaptive inertia weight). The Reuters-21578 data set, one of the most popular document collections for text categorization research, includes a variety of historical newswire data from the Reuters news agency; the data were collected and prepared by Reuters, Ltd. and Carnegie Group, Inc. The R8 set has 7674 documents in total and extracts from Reuters-21578 all documents belonging to the eight most common topics: acq, crude, earn, grain, interest, money-fx, ship, and trade. A detailed distribution of these documents is presented in Table 2.

Table 2 Detailed distribution of the R8 data set

Table 3 shows the detailed distribution of the R52 set, which extracts from Reuters-21578 the documents of 52 topics, with a total of 9100 documents. The R8 and R52 data sets are created by applying the following transformations to the original Reuters-21578 (a code sketch of these steps follows the list):

  • SPACE is substituted for the characters TAB, NEWLINE, and RETURN. Only letters are kept (i.e., punctuation, numbers, etc. are turned into SPACES). All letters are lowercased. Multiple SPACES are replaced with a single SPACE. Titles and subjects are simply appended to the documents.

  • Words with fewer than three characters are removed. For instance, remove ’he’ but keep ’him.’

  • The 524 stop words are removed. Many of them have already been eliminated for being shorter than three characters.

  • The Porter stemmer is applied to the remaining words.
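The following minimal Java sketch illustrates the normalization steps listed above, excluding stop-word removal and stemming, which require a stop list and a stemmer; it is an illustration, not the original preprocessing code.

```java
/** Sketch of the R8/R52 text normalization steps listed above. */
public class ReutersNormalizer {
    static String normalize(String raw) {
        String s = raw.replaceAll("[\\t\\n\\r]", " ")   // TAB/NEWLINE/RETURN -> SPACE
                      .replaceAll("[^A-Za-z ]", " ")    // keep letters only
                      .toLowerCase()
                      .replaceAll(" +", " ")            // collapse multiple SPACES
                      .trim();
        StringBuilder out = new StringBuilder();
        for (String w : s.split(" "))
            if (w.length() >= 3) out.append(w).append(' '); // drop words < 3 chars
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Oil prices\trose 3% -- he said."));
        // prints: oil prices rose said
    }
}
```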

Table 3 Detailed distribution of the R52 data set

Webkb contains documents extracted from the web on four popular topics: Project, Course, Faculty, and Students, with a total of 4199 documents, as shown in Table 4. The data set documents are stored as text files, one document per row: each document is represented by a "word" giving the document's class, a TAB character, and a sequence of "words" separated by spaces representing the terms contained in the document. Each document thus consists of its group and its terms.

Table 4 Detailed distribution of the Webkb data set

4.2 Parameter settings

In this paper, seven meta-heuristic algorithms are compared: PSO with fixed inertia weight (FIWPSO) [35], PSO with improved inertia weight (IIWPSO) [35], PSO with fixed constriction factor (FCFPSO) [35], PSO with improved constriction factor (ICFPSO) [35], PSO with adaptive inertia weight (AIWPSO) [2], the genetic algorithm (GA) [9], and the proposed HPSO method. The algorithms under consideration use a number of adjustable parameters. The parameter values for the competing approaches were taken from the relevant papers, as recommended by their authors based on experimental study. The parameter values used in this paper are shown in Table 1.

4.3 Evaluation criteria

Table 5 Algorithms effectiveness (Accuracy, Fmeasure, Precision, and Recall) in six data sets over ten runs based on k-means text clustering algorithm

Comparative evaluations were performed using one internal evaluation measure (the similarity measure) and four external evaluation measures: precision (P), recall (R), F-measure, and average accuracy. These are generally accepted standards for assessing cluster correctness in the context of text clustering. To examine clustering performance, we must count the number of document pairs with the same topic that fall in the same cluster, as well as the number of pairs with different topics that fall in distinct clusters. For every pair of documents, exactly one of the following conditions holds:

  • SS Both documents are grouped together in both our clusters and the corpus.

  • SD Although the two documents are in separate clusters in the corpus, they are in the same cluster in our clusters.

  • DS The documents are placed in distinct clusters in our clustering, yet in the corpus they belong to the same group.

  • DD Both documents were categorized in separate clusters in both the corpus and our clusters.

The average accuracy is given by Eq. (15), where \(\alpha \), \(\beta \), \(\eta \), and \(\rho \) are the numbers of document pairs in the SS, SD, DS, and DD states, respectively.

$$\begin{aligned} \textrm{Average} \, \textrm{Accuracy} = \frac{1}{2} \left( \frac{\alpha }{\alpha +\eta }+\frac{\rho }{\beta +\rho } \right) \end{aligned}$$
(15)

An additional strategy for evaluating clustering is the F-measure, computed with the following formula:

$$\begin{aligned} \begin{aligned}&\text {F-measure} = \frac{2\times P\times R}{P+R} \\ \textrm{where}&\\&P = \frac{\alpha }{\alpha +\beta }\\&R = \frac{\alpha }{\alpha +\eta } \end{aligned} \end{aligned}$$
(16)

Average accuracy, precision, recall, and F-measure can each reach at most 1; this value is attained only if all documents are correctly grouped.
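These measures are easy to compute once the four pair counts are known; the sketch below implements Eqs. (15) and (16), with hypothetical counts in the example.

```java
/** Pairwise clustering metrics of Eqs. (15) and (16) from the SS/SD/DS/DD counts. */
public class PairwiseMetrics {
    static double precision(long ss, long sd) { return (double) ss / (ss + sd); } // alpha/(alpha+beta)
    static double recall(long ss, long ds)    { return (double) ss / (ss + ds); } // alpha/(alpha+eta)
    static double fMeasure(double p, double r) { return 2 * p * r / (p + r); }    // Eq. (16)
    static double avgAccuracy(long ss, long sd, long ds, long dd) {
        return 0.5 * ((double) ss / (ss + ds) + (double) dd / (sd + dd));         // Eq. (15)
    }

    public static void main(String[] args) {
        long ss = 80, sd = 20, ds = 10, dd = 890;     // hypothetical pair counts
        double p = precision(ss, sd), r = recall(ss, ds);
        System.out.printf("P=%.3f R=%.3f F=%.3f Acc=%.3f%n",
                p, r, fMeasure(p, r), avgAccuracy(ss, sd, ds, dd));
    }
}
```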

4.4 Results and discussion

Three experiments were performed to show the efficiency of HPSO and to discuss the performance of each PSO variant. In the first experiment, the inertia weight variants of PSO, named FIWPSO, IIWPSO, and AIWPSO, are compared (Tables 5, 6). The second compares fixed inertia weight (FIWPSO) with the constriction variants of PSO, FCFPSO and ICFPSO, to illustrate the effectiveness of the constriction factor in the particle swarm optimization algorithm. Finally, we compare the HPSO results to those of all PSO variants and the GA. All algorithms are executed independently twenty times on each of the six data sets and are compared in terms of accuracy, precision, recall, F-measure, and convergence rate. The convergence rate assesses how quickly an algorithm approaches the optimal solution over iterations; the comparison of convergence rates is presented in Fig. 3.

Table 6 Algorithm’s mean Accuracy, Fmeasure, Precision, and Recall in six data sets

Based on the k-means text clustering algorithm, Table 7 displays the effectiveness of the algorithms (Accuracy, Fmeasure, Precision, and Recall). In the first experiment, the FS techniques using the AIWPSO and IIWPSO algorithms outperformed the comparable inertia weight variant FIWPSO. According to two of the evaluation metrics (Accuracy and Precision), AIWPSO performed best in three of the six data sets, followed by IIWPSO in two of the six. Measured by Fmeasure and Recall, IIWPSO performed best in three and five of the six data sets, respectively. In the second experiment, the FCFPSO method clearly outperformed the other constriction-based method, ICFPSO, and the conventional fixed inertia weight FIWPSO algorithm on almost all data sets in terms of the clustering evaluation criteria. Finally, in the third experiment, the developed HPSO recorded the best effectiveness and outperformed the other comparative methods (FIWPSO, IIWPSO, AIWPSO, FCFPSO, ICFPSO, and GA), followed by IIWPSO. According to Table 7, HPSO achieved outstanding results in three of the six data sets (WebkbTest, WebkbTrain, and R8Train) based on the Accuracy measure, followed by IIWPSO in two of the six (R52Test and R52Train). Based on Fmeasure, HPSO and IIWPSO achieved comparable results, with HPSO obtaining the best results in three of the six data sets (WebkbTest, WebkbTrain, and R8Train) and IIWPSO in the other three (R8Test, R52Test, and R52Train). On the Precision metric, HPSO outperformed the other algorithms in four of the six data sets (WebkbTest, WebkbTrain, R8Train, and R52Train), followed by IIWPSO and AIWPSO with one data set each. On the Recall measure, IIWPSO performed best in three of the six data sets (R8Test, R52Test, and R52Train), followed by HPSO, FCFPSO, and FIWPSO, each of which achieved the best result in one of the six.

Table 8 displays the algorithms' mean Accuracy, Fmeasure, Precision, and Recall over the six data sets. In terms of performance, the HPSO-based FS approach clearly outperforms the more traditional FS methods: HPSO has one of the best Recall results and the highest means on the Accuracy, Fmeasure, and Precision measures. Meanwhile, IIWPSO and AIWPSO obtain the best results after HPSO on most of the measures. According to Table 8, AIWPSO had the second-best mean on the Accuracy and Precision measures, with IIWPSO third. On the Fmeasure metric, IIWPSO obtained the second-best mean, followed by AIWPSO in third place. Finally, in terms of Recall, IIWPSO outperformed the other algorithms and achieved the highest mean.

The newly developed FS method HPSO generates new subsets of useful text features in order to improve the accuracy of the k-means text clustering algorithm. To summarize, the proposed HPSO-based FS approach achieves the best text clustering performance on the majority of evaluation measures compared to the similar PSO variants and the GA, and it has the best mean across almost all measures, ranking first. The inertia weight measures the degree to which particles maintain their previous velocity, so its adjustment is critical for tuning particle velocity such that particles can escape local optima and reach better solutions via an effective search strategy; Tables 7 and 8 clearly illustrate this analysis. The results show that IIWPSO and AIWPSO achieve better mean results on almost all measures than the competing methods and rank second and third, respectively. FCFPSO ranks fourth, outperforming FIWPSO and ICFPSO. k-means clustering with the genetic-based FS algorithm is the worst, ranking seventh.

Fig. 3 The changing convergence curves of FIWPSO, IIWPSO, AIWPSO, FCFPSO, ICFPSO, GA, and HPSO on six data sets

To demonstrate the convergence characteristics of the competing approaches, we recorded the convergence values of each algorithm over ten runs, each consisting of 50 iterations. The comparative analysis plots the MAD fitness values against the number of iterations. Figure 3 illustrates the convergence curves of FIWPSO, IIWPSO, AIWPSO, FCFPSO, ICFPSO, GA, and HPSO on the WebkbTest, WebkbTrain, R8Test, R52Test, R8Train, and R52Train data sets.

As observed in Fig. 3, HPSO exhibits a slower convergence when contrasted with other methods that rely solely on the MAD fitness function without incorporating a feature count penalty. This characteristic is attributed to the multi-objective nature introduced in the proposed HPSO. In this multi-objective approach, HPSO is designed to simultaneously optimize two critical objectives: the MAD fitness function and a feature count penalty.

The MAD fitness function serves to assess the quality of solutions by evaluating their capacity to represent pertinent text features. Conversely, the feature count penalty introduces a cost for larger feature subsets, aiming to strike a delicate balance between the necessity for informative features and the aspiration for a concise feature set. Consequently, the integration of the feature count penalty adds a layer of complexity to the optimization process, resulting in a slower convergence rate compared to methods exclusively optimizing the MAD fitness function. This deceleration is attributed to the algorithm’s additional consideration of penalizing larger feature subsets. While it may lead to a more gradual convergence, this nuanced approach proves beneficial in enhancing overall clustering accuracy, as the algorithm strategically balances the quality and quantity of the selected features.

Another observation is that IIWPSO converges faster than the competing methods and achieves the best fitness on nearly all six text data sets. AIWPSO has convergence curves similar to FIWPSO's, with a final fitness only slightly worse. ICFPSO reaches a lower fitness than FCFPSO but has the fastest convergence speed after IIWPSO, requiring only 30 iterations to reach its optimum. Among all methods, FCFPSO attains the second-best fitness. The GA has the worst final fitness of all: compared with the PSO variants, its fitness is slightly worse than ICFPSO's, and its convergence speed is the slowest.

To assess the robustness of the proposed HPSO in feature selection clustering against more recent variants of PSO, especially those incorporating learning exemplars and structure topologies, we compare HPSO with SpadePSO [10]. SpadePSO, introduced as one of the latest variants of PSO, is distinguished by its innovative approach that includes advanced learning exemplars and sophisticated structure topologies. These features aim to enhance the algorithm’s adaptability and exploration–exploitation balance. By choosing SpadePSO as a benchmark, we aim to provide a comprehensive evaluation, considering not only the historical PSO variants but also the cutting-edge developments in the field.

In this case, we adopted the source code of SpadePSO [10] for solving the FS problem. Tables 7 and 8 present the performance comparison between SpadePSO, IIWPSO, and HPSO on the six data sets.

Table 7 Algorithm effectiveness (Accuracy, Fmeasure, Precision, and Recall) in six data sets over ten runs based on k-means text clustering algorithm
Table 8 Algorithm’s mean Accuracy, Fmeasure, Precision, and Recall in six data sets

In the comparative evaluation of SpadePSO, HPSO, and IIWPSO across the six data sets, noteworthy patterns emerge from the performance metrics. HPSO consistently demonstrates robust results, outperforming both SpadePSO and IIWPSO in several key aspects. In terms of accuracy, precision, recall, and F-measure, HPSO excels, showcasing its superior ability to generate effective feature subsets for text clustering. IIWPSO, known for its fast convergence, remains competitive, demonstrating strengths in specific data sets. SpadePSO, while exhibiting respectable performance, generally falls behind HPSO, particularly in accuracy and precision. These results underscore the efficacy of HPSO in enhancing clustering outcomes, emphasizing the importance of its unique combination of asynchronously adaptive inertia weight, improved constriction factor, chaotic map, and MAD fitness function with a feature count penalty. The multi-objective nature of HPSO, though contributing to slower convergence, proves instrumental in achieving top-tier effectiveness, providing a valuable trade-off for practitioners seeking optimal clustering solutions.

In summary, the comparison results of the experiments are as follows:

  1. Comparison of Inertia Weight Variants of PSO (FIWPSO, IIWPSO, AIWPSO): AIWPSO and IIWPSO outperformed FIWPSO in most data sets for accuracy, precision, recall, and F-measure.

  2. Comparison of Constriction Factor Variants of PSO (FCFPSO, ICFPSO): FCFPSO consistently outperformed ICFPSO and the conventional FIWPSO in almost all data sets based on the clustering evaluation criteria.

  3. Comparison between HPSO and the most recent variant of PSO (SpadePSO): this comparison reveals intriguing insights into their performance across the data sets and evaluation metrics. In terms of accuracy, F-measure, precision, and recall, HPSO demonstrates a competitive edge over SpadePSO in the majority of data sets.

  4. HPSO vs. All PSO Variants and GA:

    • HPSO demonstrated superior performance when compared to all PSO variants and GA.

    • HPSO achieved outstanding results in three of the six data sets based on accuracy (WebkbTest, WebkbTrain, R8Train), followed by IIWPSO in two data sets (R52Test and R52Train).

    • In terms of F-measure, HPSO and IIWPSO achieved comparable results, with HPSO excelling in three data sets (WebkbTest, WebkbTrain, R8Train), and IIWPSO performing best in three data sets (R8Test, R52Test, R52Train).

    • HPSO outperformed other algorithms in four data sets in terms of precision (WebkbTest, WebkbTrain, R8Train, R52Train).

    • IIWPSO performed best in three data sets in terms of recall (R8Test, R52Test, R52Train).

    • In the evaluation across the six data sets (WebkbTest, WebkbTrain, R8Test, R52Test, R8Train, R52Train), HPSO consistently exhibits superior performance compared to SpadePSO, showcasing its efficacy in feature selection for text clustering across diverse data sets.

It follows that the mean performance across all measures, including accuracy, F-measure, precision, and recall, is consistently better for the HPSO-based FS approach than for the traditional methods. HPSO ranks first on most measures, with IIWPSO and AIWPSO following, while the convergence analysis reveals that HPSO, although converging more slowly due to its multi-objective nature, achieves the best overall effectiveness. IIWPSO stands out with fast convergence and top results on most data sets.

We conclude that the HPSO-based FS approach significantly enhances text clustering performance across multiple evaluation metrics when compared to the various PSO variants and the GA. This improvement is attributed to the exploration and exploitation capabilities provided by the asynchronously adaptive inertia weight, the improved constriction factor, the chaotic map, and the MAD fitness function with a feature count penalty. These strategies improve the algorithm's search capability (exploration and exploitation); hence, HPSO attains better solutions and avoids stagnation of the particles at a local optimum.

5 Conclusions and prospective directions

Text clustering offers an efficient means to automatically group digital documents based on their inherent characteristics. However, the high dimensionality of the feature space poses a significant challenge in text clustering. Various meta-heuristic techniques have been proposed in the literature to address the feature selection problem. In this paper, we analyze several variants of PSO algorithms and introduce a novel approach for feature selection, which we have named HPSO. Our aim is to overcome issues such as premature convergence and particle entrapment in local optima. To achieve this, we integrate different strategies into the PSO to enhance its search capabilities.

We work on four distinct stages of the PSO to improve its search efficiency. This includes the incorporation of an asynchronously adaptive inertia weight, an enhanced constriction factor, the use of chaotic maps, and the application of a MAD fitness function with a feature count penalty. Through numerical experiments, we have assessed the effectiveness of the developed method compared to other competitive methods. The results demonstrate that the proposed HPSO method not only achieves higher clustering precision but also selects a more informative feature set.

While the elementary steps introduced in HPSO incrementally increase the algorithm’s computational complexity, they significantly enhance the PSO’s search capabilities, improving both convergence behavior and accuracy. Statistical analysis validates that the HPSO outperforms competitive methods.

Looking ahead, we plan to address specific issues identified with HPSO in this study. We will explore other PSO variants known for their improved search capabilities and faster convergence to apply them to the feature selection task. We also aim to develop methods for the automatic adjustment of certain parameters, given their influence on PSO performance, while maintaining reasonable computational complexity.