1 Introduction

Feature selection is an essential task that strongly influences the efficacy of image recognition and classification [1]. The main goal of the feature selection process is to reduce the large dimensionality of data that degrades classification performance. Reducing the dimensionality of patterns through feature selection is one of the most challenging tasks in image classification [2]. The feature selection process is applied in several fields, such as signal processing, data mining, bioinformatics, text classification, pattern recognition, image processing, etc. The feature selection stage selects a subset of the available features and discards unwanted ones [3]. Reducing feature dimensionality is necessary to extract better information from a given image set. This process helps to eliminate redundant, highly correlated features and also reduces the overall computational time [4].

Recently, various approaches have been developed to address the problem of removing irrelevant features in image processing [5]. An effective feature selection technique makes the data easier to learn from, reduces the demand for computation, mitigates the curse of dimensionality, and enhances classification performance. From a machine learning perspective, if a system uses unwanted features, it will also rely on that information for unseen data, leading to poor generalization. Eliminating unwanted features should not be confused with other dimensionality reduction approaches such as Principal Component Analysis (PCA) [6], since feature selection retains a subset of the original features rather than constructing new ones. The feature selection process cannot generate additional features, since it only uses the input features and reduces their number. Once candidate features are identified, a criterion must be established to choose the subset of optimal features. If all feature subsets are evaluated directly for the given data, an NP-hard problem arises as the number of features increases. Therefore, a suboptimal strategy must be used to handle high data dimensionality without prohibitive computation [7].

In general, datasets contain large amounts of information with unwanted or irrelevant data, which can reduce the machine learning model's performance. This problem can be overcome by selecting the optimal set of features most suitable for classification [8]. Feature selection is applied in varied areas such as face recognition, gene classification, plant disease classification, tumor classification, etc. Feature selection is regarded as an NP-complete combinatorial optimization problem, which aims to choose an optimal number of features \(n\) from the total number of features \(m\) without losing information. Thus, various optimization algorithms have been introduced to solve feature selection problems, such as the genetic algorithm (GA) [9], particle swarm optimization (PSO) [10,11,12], the binary dragonfly algorithm (BDA) [13,14,15], the whale optimization algorithm (WOA) [16, 17], etc. The problem can also be addressed using filter, hybrid, wrapper, and embedded methods. Filter-based approaches do not require learning models; they use statistical measures to score a subset of features or to find feature correlations [18]. Wrapper approaches use learning models, including different classifiers, to evaluate a subset of features. Compared with wrapper approaches, filter methods are faster; however, wrapper-based techniques attain higher classification accuracy than filter approaches [19, 20]. Some of the wrapper techniques employed for feature selection are Naïve Bayes (NB), decision tree (DT), KNN, linear discriminant analysis (LDA) and so on.

1.1 Motivation

Feature selection effectively identifies the most important features in a dataset and removes redundant or irrelevant data. Its main goals include a better understanding of the data for different ML applications, maximizing prediction performance, and reducing data dimensionality. A dataset containing redundant and noisy features can significantly slow down the learning algorithm and degrade its accuracy. High-dimensional data affects computation time, the learning algorithm, model accuracy, and computational resources (memory). Classical optimization techniques remain impractical for solving feature selection problems; therefore, evolutionary algorithms (EAs) have been developed as an alternative for finding optimal solutions. As a group, EAs are generally inspired by nature and by the biological and social behavior of birds, animals, fish, wolves, bats, fireflies, etc. Many researchers have proposed computational methods that mimic the behavior of these species to find optimal solutions. Reducing the dimensionality of data is therefore crucial to providing flexible, reliable, and highly accurate computational applications [21]. This motivated the authors to develop a hybrid optimization algorithm for feature selection that can shorten the learning time, reduce the number of features, and/or improve the performance of the classification algorithm.

1.2 Contribution

The major contributions of this work are as follows,

  • To solve the problems of high-dimensional data sets, a new hybrid model for optimal feature selection is developed, and the algorithm performance is improved.

  • To analyze multiple optimization algorithms with the high-dimensional dataset in order to choose the first and second best algorithms to form the finest hybrid combination.

  • The proposed work develops an optimized hybrid technique using SMA and BGWO algorithms to make the feature selection process more effective.

  • To efficiently remove the redundant or irrelevant data features, allow accurate search analysis, and speed up the algorithm with increased computational efficacy and high accuracy.

The rest of this paper is structured as follows: Section 2 describes the literature survey related to the proposed work, Section 3 deals with the proposed methodology, Section 4 discusses the simulation results and discussions of the proposed model, and Section 5 mentions the conclusion of the overall work.

2 Related works

Some of the existing works carried out on the feature selection process are discussed as follows,

Sathiyabhama et al. [22] introduced a feature selection technique in breast cancer detection using the grey wolf optimizer (GWO) algorithm. The large dimensionality of the feature set is reduced by optimally selecting the required features according to the GWO concept. This GWO is inspired by the social hierarchy and hunting behaviour of grey wolves. This existing work hybridizes the GWO algorithm with a Rough set (RS) to identify the essential features from the given images. The developed algorithm reduces the irrelevant features from the image and ensures a smooth classification process. The simulation analysis proves that the established algorithm provides better feature selection and helps to attain improved accuracy performance in classification.

Shanthi et al. [23] developed a modified stochastic diffusion search (SDS) approach for optimal feature selection in lung cancer prediction. This approach improves the classification process by obtaining important features that effectively identify different kinds of classes. The SDS algorithm selects the most needed features, and NB and decision tree approaches classify lung cancer. The large size of the features is minimized optimally and gives better classification results. This study reveals that the developed feature selection algorithm is highly suitable for large data quality. The experimental analysis shows that the developed model gives high accuracy in lung cancer prediction.

Asgarnezhad, Razieh et al. [24] established a multi-objective grey wolf (MOGW) algorithm for feature selection in text prediction. In this study, the text is classified using NN classification, while KNN and NB approaches choose the most significant words via the optimization algorithm. Three different datasets, TS2, PMD and TS3, are used for the simulation analysis. The developed algorithm optimally chooses the essential features and improves the NN classifier's performance. This existing work attains an accuracy of 95.75%, and the results are compared with several optimization techniques. This multi-objective algorithm is highly effective for feature selection in text classification.

Ghosh et al. [25] developed an ant colony optimization (ACO) for feature selection. Here, the wrapper filter is hybridized with the ACO algorithm in which the reduced subset of features is selected. Also, this optimization algorithm reduces the computational complexity using a wrapper approach. This feature selection algorithm is performed in a multi-objective way, and the simulation validation is done using two varied datasets. The objects are classified using KNN and MLP classifiers. The experimental analysis shows that the developed approach is computationally inexpensive and produces excellent classification results by eliminating irrelevant feature sets.

El-Kenawy et al. [26] introduced a hybrid feature selection concept using GWO and particle swarm optimization (PSO). Hybridization is performed by balancing exploration and exploitation. The PSO algorithm is used to promote population diversity and enhance production efficiency. The quality is assessed using 17 UCI machine learning datasets, which evaluate the optimization algorithm's consistency and ensure that the developed solution is stable and reliable. The developed hybrid algorithms are meta-heuristics that choose better features and improve system performance by reducing complexity.

Pathak et al. [27] have utilized levy flight based GWO for the feature selection process in image steganalysis application. In order to reduce the irrelevant features from the given inputs, the developed study adopted a new optimization method. This algorithm assisted in choosing the most important features for attaining better steganalysis outcomes. The feature extraction was performed here using AlexNet, Convolutional neural network (CNN) and SPAM. Due to the selection of the most suitable features, the classification accuracy was enhanced, and it exhibited that the feature selection process is necessary for mitigating the large dimensionality issues.

Singh et al. [28] developed a collaborative feature optimization for glaucoma diagnosis using retinal fundus images. The proposed approach was a two-layer approach based on PSO, Binary Cuckoo Search (BCS) and the Bat algorithm. These algorithms were applied as single- and double-layer approaches to combine features, and the combined features were then used in the classification process with high accuracy. In the evaluation phase, the developed model used the ORIGA and REFUGE datasets and achieved 98.95% accuracy. A noted limitation was that the model suffered from overfitting due to the combination of uneven layers.

Munish Khanna et al. [29] developed a methodology for human disease prediction based on machine learning approaches. The constructed model used ant lion-based optimization for feature selection, and four classifiers were used for prediction. Three public datasets and one private dataset were used to evaluate the model, and five measures were used to assess its performance. The method enabled a 50 percent reduction in the original feature set without sacrificing accuracy or performance. Across the heart disease, diabetes, diabetic retinopathy and skin cancer datasets, the reported maximum accuracies include 79.99% for the diabetes dataset, 98.52% for the diabetic retinopathy dataset and 97.18% for the skin cancer dataset.

Singh et al. [30] developed three feature selection strategies based on Bacterial Foraging Optimization (BFOA), Emperor Penguin Optimization (EPO) and a hybrid (hBFEPO) linking BFOA and EPO. Beyond breast cancer classification, the basic feature selection methods were examined for further ML tasks, and a hybrid of the two was used for the first time. The COVID-19 dataset served as the first testing ground for these strategies. After producing good results, the algorithms were tested on the WDBC breast cancer dataset, where the model achieved an accuracy of 98.49% in the evaluation phase.

Singh et al. [31] proposed a metaheuristic algorithm that replicates the emperor penguin's activities. This algorithm was called emperor penguin optimization, and it was combined with bacterial foraging optimization. Here, 36 features were extracted from retinal fundus images. The proposed approach selected features that significantly improved the classification accuracy. Six machine learning classifiers performed classification based on the smaller subsets of features provided by the three optimization techniques. The hybrid optimization technique combined with random forest achieved the highest accuracy of up to 0.95410. However, the model operates only on a small subset of features.

Khan et al. [32] presented a hybrid optimization for efficient feature selection. That work addressed the typical feature selection problem of reducing the number of features while increasing accuracy. Using classification datasets from the UCI machine learning repository, a feature selection technique based on SMA was tested against GWO. Feature selection on the UCI datasets was carried out with bat optimization, slime mould optimization, cuckoo search optimization, particle swarm optimization, whale optimization, and grey wolf optimization. The algorithms were evaluated on a limited number of datasets to demonstrate their feature selection performance.

2.1 Problem statement

Feature selection plays a major role in data mining and machine learning for image processing applications. The large number of features in the presented data includes irrelevant and noisy information, which increases the chance of degrading classification accuracy and performance. Thus, various feature selection techniques have been developed in advanced information technologies. However, many existing approaches face critical challenges in selecting appropriate features because of the rapidly growing search space. Some traditional search techniques also suffer from expensive computation, falling into local optima and the nesting effect. Existing approaches fail to process data of large dimensionality and cannot provide better classification results. In addition, existing optimization algorithms easily fall into local optima, affecting system performance. The performance of a search algorithm is enhanced by properly balancing the exploration and exploitation processes. Thus, the proposed work develops a hybrid optimization algorithm for appropriate feature selection.

3 Proposed methodology

This section describes feature selection issues with the KNN classifier, and a hybrid optimization algorithm is introduced to resolve the issues in the feature selection process. Selecting an optimal feature set from a given dataset is a critical problem in machine learning. In recent decades, various techniques have been developed to solve the issues in the feature selection process. Nowadays, meta-heuristic algorithms are becoming more popular in optimal feature selection. Many of the advanced optimization algorithms provide enhanced performance by solving issues in feature selection. The proposed work hybridizes an SMA with BGWO to resolve the feature selection issues with improved classification accuracy. Initially, the input data are pre-processed, where data cleaning is done to clean the given input data. Then, the feature selection stage is performed based on the hybrid optimization techniques. After completing the feature selection phase, the classification is enabled with the aid of the KNN method. Figure 1 represents the block diagram of the proposed work.

Fig. 1
figure 1

Proposed block diagram

3.1 Pre-processing

Pre-processing is a significant stage for image processing because it improves data quality, reduces redundant data, and removes the noise present in the provided high dimensional input dataset. In the proposed work, data cleaning is performed in the pre-processing stage. Data cleaning identifies and removes inaccurate records from a given dataset. It also finds incorrect, missing, incomplete and irrelevant parts of the data and modifies, replaces or eliminates the noisy data. In this pre-processing stage the data is cleaned, thereby enhancing the quality of the input data. The pre-processed data are fed as input to the feature selection techniques.
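As an illustration only, the following minimal pandas sketch (assuming a hypothetical dataframe `df`) shows the kind of cleaning operations described above; the actual pre-processing pipeline used in this work may differ.

```python
import pandas as pd

def clean_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal data-cleaning pass: drop duplicate and empty records,
    impute missing numeric values, and remove uninformative columns."""
    df = df.drop_duplicates()                      # remove repeated records
    df = df.dropna(how="all")                      # drop rows that are entirely empty
    df = df.fillna(df.median(numeric_only=True))   # impute remaining numeric gaps with the median
    df = df.loc[:, df.nunique() > 1]               # drop constant (uninformative) columns
    return df
```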

3.2 Description of different feature selection methodologies

In order to select the optimal features, the proposed work evaluates eight different optimization techniques and, from these eight, chooses the best algorithms for the feature selection process. The choice of the best algorithm depends on performance metrics such as mean, best, worst, standard deviation and computational time. The algorithms analyzed to identify the better optimizer are Ant Colony Optimization (ACO), the Bat optimization algorithm (BAT), Cuckoo Search Optimization (CSO), the Firefly Optimization Algorithm (FFA), the Whale Optimization Algorithm (WOA), Particle Swarm Optimization (PSO), Grey Wolf Optimization (GWO) and the Slime Mould Algorithm (SMA).

3.2.1 Ant colony optimization (ACO)

The ACO algorithm is a nature-inspired meta-heuristic approach for solving hard optimization problems. ACO can be used to identify the subset of features that maximizes the performance of a classification model. This is especially advantageous when dealing with high-dimensional datasets, where selecting relevant characteristics can improve classification accuracy and reduce computational costs. The proposed work uses the ACO approach for feature selection. Each ant is associated with a feature, and features are selected based on their selection probability. Initially, every feature has a similar probability of selection, represented as follows [25],

$$Q_{p} \left( i \right) = \frac{{\left[ {\phi \left( i \right)} \right]^{\alpha } \left[ {\xi \left( i \right)} \right]^{\beta } }}{{\sum {_{{k \in M_{i}^{p} }} \left[ {\phi \left( k \right)} \right]^{\alpha } \left[ {\xi \left( k \right)} \right]^{\beta } } }}$$
(1)

where \(\xi \left( i \right)\) is the document frequency of the \(i^{th}\) feature, i.e. the number of documents in the training set that contain it, and serves as the heuristic information presented to the ants. The pheromone trail value is denoted as \(\phi \left( i \right)\), and the two parameters \(\alpha\) and \(\beta\) determine the relative influence of the pheromone and heuristic information. Table 1 shows the range of parameters in the ACO approach.

Table 1 ACO parameter values

Based on the global update rule, the pheromone trial is updated, which is mentioned as,

$$\phi \left( i \right) = \lambda \phi \left( i \right) + \sum\limits_{p = 1}^{m} {\Delta \phi_{p} } \left( i \right)$$
(2)

where \(\lambda\) denotes the pheromone evaporation parameter, which decays the pheromone trails, and \(m\) is the number of features. The amount of pheromone deposited by search agent \(p\) on the trail of feature \(i\) is given as \(\Delta \phi_{p} \left( i \right)\) and is expressed as,

$$\Delta\phi_p\left(i\right)=\left\{\begin{array}{ll}2\,U_pF_p&if\,\,feature\,\,i\,\,is\,utilized\,by\,\,\,search\,agent\,p\\U_pF_p&if\,\,feature\,\,i\,\,is\,utilized\,by\,any\,search\,agent\,p\\0&otherwise\end{array}\right.$$
(3)

where \(F_{p}\) is the F-measure value of the \(p^{th}\) search agent's feature subset, and \(U_{p}\) denotes the unit pheromone value. The higher the F-measure of a search agent's chosen subset, the more pheromone is deposited on the features used in that subset, and such features are more likely to be chosen in the next iteration. In this way, the features are selected using the ACO algorithm. However, this algorithm has limited parallelization and is computationally complex.
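A minimal sketch of Eqs. (1)–(3) is given below, assuming NumPy arrays for the pheromone and heuristic values; the double-deposit branch of Eq. (3) is read here as applying to the best-performing search agent, which is an interpretation made for illustration rather than something stated explicitly above.

```python
import numpy as np

def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=1.0):
    """Eq. (1): probability of selecting each feature from its pheromone
    trail phi(i) and heuristic value xi(i), e.g. document frequency."""
    scores = (pheromone ** alpha) * (heuristic ** beta)
    return scores / scores.sum()

def update_pheromone(pheromone, agent_subsets, f_measures, lam=0.8, unit=1.0, best_agent=0):
    """Eqs. (2)-(3): evaporate all trails, then deposit pheromone on the
    features used by each agent in proportion to that agent's F-measure."""
    pheromone = lam * pheromone
    for p, subset in enumerate(agent_subsets):
        bonus = 2.0 if p == best_agent else 1.0   # double deposit for the best agent (assumed reading)
        for i in subset:
            pheromone[i] += bonus * unit * f_measures[p]
    return pheromone
```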

3.2.2 Bat optimization algorithm (BAT)

The BAT approach is a meta-heuristic algorithm inspired by the behavior of bats. BAT helps to identify and remove noisy or irrelevant features, which improves the signal-to-noise ratio in the data and results in more robust models. This algorithm can be customized to optimize feature subsets based on specific goals or constraints, such as maximizing accuracy, minimizing the number of features, or achieving a balance between precision and recall. Bats use echolocation to sense the distance to food and to detect barriers. First, the feature population and the algorithm parameters are initialized, and the position and velocity of the features are updated in each iteration. When optimal features are selected from the present best solution, a new solution for every search agent is created using a local random walk, which is represented as [33],

$$z_{new} = z_{old} + \delta \,X^{t}$$
(4)

where \(\delta\) denotes a random number in the range [-1, 1], and the average attribute vector of the search agents is denoted as \(X^{t} = < X_{i}^{t} >\). A penalty function is introduced for non-linear equality and inequality constraints. Table 2 shows the range of parameters in the BAT approach.

Table 2 BAT parameter values

Based on the penalty function, the constrained issue is converted into an unconstrained issue, which is given as,

$$\rho \left( {z,\,\eta_{i} ,\,l_{j} } \right) = f(z) + \sum\limits_{i = 1}^{N} {\eta_{i} } \varpi_{i}^{2} \left( z \right) + \sum\limits_{j = 1}^{N} {l_{j} } \xi_{j}^{2} \left( z \right)$$
(5)

In the above equation, the constraints are handled through the penalty function. The features are ranked, and the best features are analyzed in every iteration. Depending on the above equations, the features are selected using the BAT approach. This algorithm is known to be sensitive to multiple parameters, and the parameters used may lack an intuitive interpretation.

3.2.3 Cuckoo search optimization (CSO)

The CSO approach is an algorithm used for solving optimization problems. Feature selection using CSO helps reduce dimensionality by identifying a subset of the most informative features. Reducing the number of features simplifies models, shortens training times, and improves the generalization ability of the model. CSO is also a meta-heuristic algorithm based on the brood parasitism of cuckoo species combined with Levy flights and random walks. It models the behavior of cuckoo birds and their strategy of laying eggs in other birds' nests. The CSO process is built on cuckoo breeding behavior, Levy flights and cuckoo search. The breeding behavior presents three kinds of brood parasitism: cooperative breeding, intraspecific brood parasitism, and nest takeover. In the exploration process, the group with the better quality features is chosen for the next step, and at each iteration the position of the optimal features is updated. When generating a new optimal feature set, a Levy flight is applied, expressed as [34],

$$z_{i}^{(t + 1)} = z_{i}^{(t)} + \psi \oplus \,Levy\,(\delta )$$
(6)

where \(\psi\) denotes a step-size scale related to the problem of interest, \(\psi > 0\), and the term \(\oplus\) represents entry-wise multiplication. The parameter values fixed in the CSO approach are given in Table 3.

Table 3 CSO parameter values

The Levy flight performs a random walk whose random step lengths are drawn from a Levy distribution for large steps, given as,

$$Levy \text{ }v={t}^{-\alpha },\;(1<\alpha \le 3)$$
(7)

In this way, each step of the search agents forms a random walk whose step lengths satisfy a power-law distribution, allowing the search agent to explore the optimal feature set in the CSO approach. This approach also has concerns about the sensitivity of its parameters, and its adaptability to dynamic environments is limited.
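The sketch below illustrates Eqs. (6)–(7) under the common assumption that the Levy step is drawn with Mantegna's algorithm and scaled relative to the current best nest; the scale \(\psi\) and the exponent are illustrative values, not parameters taken from Table 3.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(dim, alpha=1.5):
    """Heavy-tailed Levy step (Mantegna's algorithm) used for Eqs. (6)-(7)."""
    sigma = (gamma(1 + alpha) * sin(pi * alpha / 2) /
             (gamma((1 + alpha) / 2) * alpha * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    u = np.random.normal(0.0, sigma, dim)
    v = np.random.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / alpha)

def cuckoo_update(position, best, psi=0.01):
    """Eq. (6): move a nest by a Levy flight scaled relative to the best nest."""
    return position + psi * levy_step(position.size) * (position - best)
```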

3.2.4 Firefly optimization algorithm (FFA)

FFA is a stochastic global optimization approach based on the behavior of fireflies. In general, selecting a subset of features using FFA can result in more interpretable models: understanding the relevance of individual features provides insight into the factors that contribute to the model predictions. FFA can also help identify and eliminate redundant features in a dataset. Numerous firefly species exist, and fireflies generate short, rhythmic flashes of light. The flashing has three functions: attracting mating partners, attracting potential prey, and providing a warning mechanism. Table 4 shows the parameter ranges in FFA.

Table 4 FFA parameters values

The search agent moves its position to search for the optimal features.

$$y_{i} = y_{i} + \alpha_{0} e^{{ - \eta \,r_{ij}^{2} }} (y_{j} - y_{i} ) + \beta (rand - 0.5)$$
(8)

where \(\alpha_{0} e^{{ - \eta \,r_{ij}^{2} }} (y_{j} - y_{i} )\) is the attraction term based on the feature characteristics and \(\beta (rand - 0.5)\) represents the randomization term. \(rand\) is a random number drawn from a uniform distribution in the range 0 to 1, and \(\beta\) is the randomization (noise) coefficient. The distance between the search agent and the features is defined as,

$$r_{ij} = \left\| {y_{i} - y_{j} } \right\|$$
(9)

where \(y_{i}\) denotes the position of the \(i^{th}\) search agent. The optimal solution is attained based on the fitness function; thus, the optimal features are selected using the FFA approach [35]. The performance of this algorithm depends on the initialization of the feature population.

3.2.5 Whale optimization algorithm (WOA)

WOA is a nature-inspired meta-heuristic based algorithm which imitates the behavior of whales. WOA can help select a subset of features that improve the generalization ability of a machine learning model. By focusing on the most relevant features, the model is less likely to overfit the noise in the training data. This approach is motivated by the strategy of bubble-net hunting, and the mathematical model is expressed as three processes: encircling prey, bubble-net attacking mechanism, and prey searching. Table 5 illustrates the parameter values that are fixed in WOA.

Table 5 WOA parameter values

Encircling prey

The search agent can identify the position of features and encircle them. After determining the optimal search agent, the remaining search agents attempt to update their current position towards the optimal search agent [16]. It is defined as,

$$\vec{E} = \left| {\overrightarrow {F} .\overrightarrow {{A^{ * } }} \left( t \right) - \overrightarrow {A} \left( t \right)} \right|$$
(10)
$$\overrightarrow {A} (t + 1) = \overrightarrow {{A^{ * } }} \left( t \right) - \overrightarrow {X} .\overrightarrow {E}$$
(11)

where \(t\) denotes the present iteration, \(\overrightarrow {X}\) and \(\overrightarrow {F}\) are coefficient vectors, \(A^{ * }\) is the position vector of the optimal solution, and \(\overrightarrow {A}\) is the position vector of a search agent. The term \(\left| {\overrightarrow {F} .\overrightarrow {{A^{ * } }} \left( t \right) - \overrightarrow {A} \left( t \right)} \right|\) is an absolute value, where the multiplication is performed element by element.

Exploitation process-Bubble net attacking scheme

The exploitation stage has two processes: shrinking of encircling schemes and spiral updation locations. In this, the position of the search agent is updated as,

$$\overrightarrow {A} \left( {t + 1} \right) = \left\{ \begin{gathered} \overrightarrow {{A^{ * } }} \left( t \right) - \overrightarrow {X} .\,\overrightarrow {E\,} \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,if\,\,\,q < 0.5 \hfill \\ \overrightarrow {E^{\prime}.} e^{cd} .\,\cos \,\left( {2\pi d} \right) + \overrightarrow {{A^{ * } }} \left( t \right)\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,if\,\,\,q \ge 0.5 \hfill \\ \end{gathered} \right.$$
(12)

where, the random number is represented as \(q\) and ranges from 0 to 1. In this, the search agent can randomly explore the features.

Exploration process – Feature search

The search agent can search the features and update their position for each iteration. The process is continued until the optimal set of features is obtained. The mathematical model of the exploration process is given as,

$$\overrightarrow {E} = \left| {\overrightarrow {F} .\overrightarrow {{A_{rand} }} - \overrightarrow {A} } \right|$$
(13)
$$\overrightarrow {A} \left( {t + 1} \right) = \overrightarrow {{A_{rand} }} - \overrightarrow {X} .\overrightarrow {\,E}$$
(14)

where \(\overrightarrow {{A_{rand} }}\) is a randomly selected position vector chosen from the present feature set. Search agents that move beyond the search space are adjusted, and the fitness function is evaluated. Based on this, the features are selected optimally for classification. Apart from its selection performance, the algorithm raises scalability concerns.
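The following sketch combines Eqs. (10)–(14) into one position update, assuming the logarithmic-spiral constant \(c\) is set to 1 and that the switch between encircling and random exploration follows the usual \(|X| < 1\) rule; it is an illustration, not the exact implementation used in this work.

```python
import numpy as np

def woa_update(pos, best, population, t, max_iter):
    """One WOA position update following Eqs. (10)-(14)."""
    a = 2.0 * (1.0 - t / max_iter)                 # decreases linearly from 2 to 0
    X = 2.0 * a * np.random.rand(pos.size) - a     # coefficient vector X
    F = 2.0 * np.random.rand(pos.size)             # coefficient vector F
    q = np.random.rand()
    if q < 0.5:
        if np.all(np.abs(X) < 1.0):                # exploitation: encircle the best solution
            E = np.abs(F * best - pos)
            return best - X * E                    # Eq. (11)
        rand_pos = population[np.random.randint(len(population))]
        E = np.abs(F * rand_pos - pos)             # exploration: follow a random agent
        return rand_pos - X * E                    # Eq. (14)
    d = np.random.uniform(-1.0, 1.0)               # spiral parameter
    E = np.abs(best - pos)
    return E * np.exp(d) * np.cos(2 * np.pi * d) + best   # Eq. (12) with c = 1 (assumed)
```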

3.2.6 Particle swarm optimization (PSO)

PSO is one of the stochastic optimization approaches inspired by the behaviour of swarms [26]. The utilization of PSO to identify the optimal subset of features that optimizes the performance of a classification model is possible. This is especially beneficial when working with high-dimensional datasets, as the selection of pertinent features can enhance classification accuracy and mitigate overfitting. Based on the swarm, the PSO approach performs the searching process. In each iteration, the information about every feature set is integrated to manage the velocity of all dimensions. It is utilized to evaluate the current position of the feature. Table 6 illustrates the parameter ranges in the PSO algorithm.

Table 6 PSO parameter ranges

The position of each feature is updated as,

$$k_{i,\,t + 1}^{l} = \left\{ \begin{gathered} z_{i,\,t + 1}^{l} ,\,\,\,\,\,\,\,\,\,\,\,if\,\,p\,\left( {A_{i,\,t + 1} } \right) < p\,\left( {K_{i,\,t} } \right) \hfill \\ k_{i,\,t}^{l} ,\,\,\,\,\,\,\,\,\,\,\,\,\,\,otherwise \hfill \\ \end{gathered} \right.\,$$
(15)

To update the velocity, the inertia weight was developed, and the current updation formula of velocity is expressed as [26],

$$y_{i,\,t + 1}^{l} = w * y_{i,\,t}^{l} + a_{1} * rand * \left( {k_{i,\,t}^{l} - z_{i,\,t}^{l} } \right) + a_{2} * rand * \left( {k_{h,\,t}^{l} - z_{i,\,t}^{l} } \right)$$
(16)

A variant is introduced in the PSO approach with a factor \(\xi\) by determining the convergence behaviour. It assures convergence and enhances the rate of convergence, and hence, the velocity is updated as,

$$y_{i,\,t + 1}^{l} = \xi \left( {y_{i,\,t}^{l} + \rho_{1} * rand * \left( {k_{i,\,t}^{l} - z_{i,\,t}^{l} } \right) + \rho_{2} * rand * \left( {k_{h,\,t}^{l} - z_{i,\,t}^{l} } \right)} \right)$$
(17)

When the optimal solution is attained, the process stops; otherwise the fitness function is updated and the search continues until a better solution is found. In this process, the optimal features are selected using the PSO approach. By analyzing the above optimization algorithms with performance metrics such as mean, best, worst, standard deviation and computational time, the SMA approach proves superior to the other algorithms. The second-best technique is GWO, the third is BAT, and the fourth is WOA. Thus, the proposed work uses SMA and GWO for feature selection, integrating the SMA approach with GWO to attain an optimal feature set. The reason for analyzing multiple optimization algorithms on the sampled datasets is to improve the robustness, adaptability and effectiveness of the feature selection process across different datasets, since every individual optimization algorithm has its own weaknesses in the selection process. A novel hybrid approach is implemented to address drawbacks such as restricted parallelization, limited scalability, parameter sensitivity, and initialization issues.
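For illustration, a minimal sketch of the PSO step described by Eqs. (15)–(16) is shown below; the inertia weight and acceleration coefficients are placeholder values, and `fitness` is a hypothetical objective to be minimised.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, fitness, w=0.7, a1=2.0, a2=2.0):
    """Eqs. (15)-(16): inertia-weighted velocity update, position move,
    and personal-best replacement when the new position is fitter."""
    r1, r2 = np.random.rand(x.size), np.random.rand(x.size)
    v_new = w * v + a1 * r1 * (pbest - x) + a2 * r2 * (gbest - x)    # Eq. (16)
    x_new = x + v_new
    pbest_new = x_new if fitness(x_new) < fitness(pbest) else pbest  # Eq. (15)
    return x_new, v_new, pbest_new
```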

3.2.7 Slime mould algorithm (SMA)

The slime mould, generally represented by Physarum polycephalum, was initially categorized as a fungus. It is a kind of eukaryote mainly found in humid and cool places, and its main nutrition stage is the Plasmodium. The SMA approach is similar to other swarm-based optimization techniques: each individual expands over the process and is guided towards the global optimum in every iteration. The SMA optimization approach is divided into three phases: initialization, searching and exploiting. Table 7 shows the parameter values in SMA.

Table 7 SMA parameter values

Initialization

Each of the features from the given data is uniformly and randomly initialized in the overall domain and is mentioned as [36],

$$z_{i} = R_{1} (ub - lb) + lb$$
(18)

where \(R_{1}\) denotes a random value drawn from the Gaussian distribution. All settings should be initialized in this stage, such as the total feature size, the maximum number of iterations \(\max Iter\), etc.

Exploration and exploitation

Each feature’s position is updated in every iteration and managed to the global optimum during the exploration and exploitation process. It is represented as,

$$z_{i} (t + 1) = \left\{ \begin{gathered} \,\,\,\,\,\,\,\,\,\,\,\,R_{2} .(ub - lb) + lb\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,R_{2} < x \hfill \\ z_{a} + f_{a} .[Y.\,z_{M} (t) - z_{N} (t)]\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,R_{3} < q \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,f_{b} .\,z_{i} (t)\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,q \le R_{3} \le 1 \hfill \\ \end{gathered} \right.$$
(19)

where \(z_{i} (t)\) and \(z_{i} (t + 1)\) denote the position of the \(i^{th}\) feature in the present iteration \(t\) and the next iteration \(t + 1\). The random value from the Gaussian distribution is denoted as \(R_{2}\); \(x\) is a proportional number that selects some of the features at random to restart the initialization process, with \(x = 0.03\) as the default. Two randomly chosen features in the present iteration are represented as \(z_{M} (t)\) and \(z_{N} (t)\), and \(f_{a}\) and \(f_{b}\) are two random values drawn uniformly from \([ - k,\,k]\) and \([ - l,\,l]\), respectively. Here \(k\) and \(l\) are two variables that depend on the current iteration number and the maximum number of iterations, given as,

$$k = k\tanh \left( {1 - \frac{t}{\max Iter}} \right)$$
(20)
$$l = 1 - \frac{t}{\max Iter}$$
(21)

Here \(q\) is also a proportional number, which limits the option of branches. It corresponds to the global optimal fitness function OF,

$$q=\mathrm{tanh}\left|{P}_{i}-OF\right|$$
(22)

The weight parameter is evaluated as,

$$\omega_{{s_{i} (i)}} = \left\{ \begin{gathered} 1 + R_{4} .\,\log \,\left( {1 + \frac{{B_{f} - P_{i} }}{{B_{f} - W_{f} }}} \right)\,\,\,\,\,\,\,\,\,\,condition \hfill \\ 1 - R_{4} .\,\log \,\left( {1 + \frac{{B_{f} - P_{i} }}{{B_{f} - W_{f} }}} \right)\,\,\,\,\,\,\,\,\,\,others \hfill \\ \end{gathered} \right.$$
(23)

where, \(B_{f}\) represents the best fitness value and \(W_{f}\) mentions the worst fitness values among the total fitness values \(P_{i}\), here \(i = 1,2,....,m\). The fitness values of each feature are sorted by the following Eq. (24).

$$p_{i} = Sort\,(P)$$
(24)

Here, the SMA algorithm has the advantage of showing both exploratory and exploitative behavior in its foraging strategies. An algorithm inspired by these behaviors could balance exploring a wide range of features and exploiting promising subsets for feature selection. This algorithm is adaptable to any noisy environment and also provides a parallel process of feature selection.
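A compact sketch of one SMA iteration built from Eqs. (18)–(24) is given below; the "condition" branch of Eq. (23) is read as the better-ranked half of the population and Eq. (20) is implemented with the usual arctanh form, both of which are assumptions made for illustration.

```python
import numpy as np

def sma_iteration(pop, fit, t, max_iter, lb, ub, x=0.03):
    """One SMA update (minimisation) following Eqs. (18)-(24)."""
    n, dim = pop.shape
    order = np.argsort(fit)                          # Eq. (24): rank agents by fitness
    best_f, worst_f = fit[order[0]], fit[order[-1]]
    z_best = pop[order[0]]
    ranks = np.empty(n, dtype=int); ranks[order] = np.arange(n)
    k = np.arctanh(np.clip(1.0 - (t + 1) / max_iter, 1e-8, 1 - 1e-8))   # Eq. (20), arctanh form assumed
    l = 1.0 - t / max_iter                                              # Eq. (21)
    new_pop = pop.copy()
    for i in range(n):
        ratio = (fit[i] - best_f) / (worst_f - best_f + 1e-12)
        r4 = np.random.rand(dim)
        W = 1 + r4 * np.log1p(ratio) if ranks[i] < n // 2 else 1 - r4 * np.log1p(ratio)  # Eq. (23)
        q = np.tanh(abs(fit[i] - best_f))                               # Eq. (22)
        r2, r3 = np.random.rand(), np.random.rand()
        if r2 < x:                                                      # restart branch of Eq. (19)
            new_pop[i] = np.random.rand(dim) * (ub - lb) + lb           # Eq. (18)
        elif r3 < q:                                                    # approach the best agent
            fa = np.random.uniform(-k, k, dim)
            a, b = pop[np.random.randint(n)], pop[np.random.randint(n)]
            new_pop[i] = z_best + fa * (W * a - b)
        else:                                                           # contract in place
            fb = np.random.uniform(-l, l, dim)
            new_pop[i] = fb * pop[i]
    return new_pop
```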

3.2.8 Grey Wolf Optimization

The hunting behaviour of grey wolves inspires the GWO approach. The hunting techniques of grey wolves are tracking the prey, encircling the prey and attacking the prey. Prey is considered the feature in the proposed work, and the wolves are assumed to be a search agent. Table 8 depicts the parameters in GWO.

Table 8 GWO parameter values

Encircling the features

In the encircling behaviour, the search agents encircle the needed features and are given as [22],

$$\overrightarrow {H} = \left| {\overrightarrow {G} .\,\overrightarrow {Z}_{q} (t) - \overrightarrow {Z} (t)} \right|$$
(25)
$$\overrightarrow {Z} (t + 1) = \overrightarrow {Z}_{q} (t) - \overrightarrow {U} .\overrightarrow {H}$$
(26)

where, \(\overrightarrow {U}\) and \(\overrightarrow {G}\) are considered as the coefficient vectors, \(t\) denotes the number of iterations, \(\overrightarrow {Z}_{q}\) be the vector of feature position, the vector of search agent is signified as \(\overrightarrow {Z}\) and \(\overrightarrow {H}\) mentions the evaluated vector, which is utilized to represent a search agent’s new position. The vectors \(\overrightarrow {U}\) and \(\overrightarrow {G}\) are evaluated as,

$$\overrightarrow {U} = 2\overrightarrow {k} .\,\overrightarrow {{R_{1} }} - \overrightarrow {k}$$
(27)
$$\overrightarrow {G} = 2.\,\overrightarrow {{R_{2} }}$$
(28)

where \(\overrightarrow {k}\) is a vector that decreases linearly from two to zero over the iterations, and \(\overrightarrow {{R_{1} }}\) and \(\overrightarrow {{R_{2} }}\) are random vectors in the range [0, 1]. Based on the position of the features, the search agent changes its position. The position of the search agent is updated as,

$$H_{i} = \frac{{H_{\alpha } + H_{\beta } + H_{\chi } }}{3}$$
(29)

The hunting process of the search agents is guided by the parameters \(\alpha\), \(\beta\) and \(\chi\). \(H_{\alpha }\) denotes the first optimal search agent, \(H_{\beta }\) represents the second optimal search agent and \(H_{\chi }\) denotes the third optimal search agent.

Attacking the features

The search agents complete the search process by attacking the features until they stop moving. The attacking process is modeled by minimizing the value of \(\overrightarrow {k}\) in varied iterations. If the fluctuation rate of \(\overrightarrow {k}\) reduces, then the vector \(\overrightarrow {U}\) also reduces. Based on the position of the parameters \(\alpha\), and \(\beta\) and \(\chi\), the search agent updates its position.

Exploration process

The search agents explore for features based on the positions of the parameters \(\alpha\), \(\beta\) and \(\chi\). They diverge from each other to search for the feature positions and then converge when attacking the features. The GWO approach has a component \(\left( {\overrightarrow {G} } \right)\) that helps the optimization algorithm generate new solutions. This component provides random weights for each feature and helps keep the search agents away from local optima. The weights of the GWO approach are given as follows,

$$\psi = \frac{1}{2}v\tan \,(t)$$
(30)
$$\theta = \frac{2}{\pi }v\cos \frac{1}{3}.v\tan \,(t)$$
(31)
$$w_{1} = \cos \theta ,\,w_{2} = \frac{1}{2}\sin \theta .\cos \psi ,\,w_{3} = 1 - w_{1} - w_{2}$$
(32)

In addition, the GWO algorithm has high interoperability and is inherently parallelizable, making it suitable for parallel computing environments. This property can be advantageous when processing computationally intensive optimization problems.
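The sketch below shows one GWO position update following Eqs. (25)–(29), with \(k\) assumed to decrease linearly from 2 to 0 over the iterations.

```python
import numpy as np

def gwo_update(pos, h_alpha, h_beta, h_chi, t, max_iter):
    """Eqs. (25)-(29): move a search agent towards the mean of the
    candidate positions computed from the alpha, beta and chi leaders."""
    k = 2.0 * (1.0 - t / max_iter)                    # decreases linearly from 2 to 0
    candidates = []
    for leader in (h_alpha, h_beta, h_chi):
        U = 2.0 * k * np.random.rand(pos.size) - k    # Eq. (27)
        G = 2.0 * np.random.rand(pos.size)            # Eq. (28)
        H = np.abs(G * leader - pos)                  # Eq. (25)
        candidates.append(leader - U * H)             # Eq. (26)
    return sum(candidates) / 3.0                      # Eq. (29)
```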

3.3 Feature selection using the proposed hybridized SMA + BGWO model

The feature selection process reduces the large dimensionality of the features from the pre-processed data. This phase helps improve overall system performance; in addition, it shortens the training time of the classifier and improves accuracy. Different algorithms are analyzed to select the optimization algorithm best suited to effective classification. First, the algorithms ACO, BAT, CSO, FFA, WOA, PSO, GWO and SMA are analyzed. Among these, SMA and GWO are selected as the better optimizers for feature selection due to their high interoperability, scalability and inherently parallelizable structure compared to the other optimization algorithms. Therefore, the proposed work utilizes a hybrid form of SMA- and GWO-based optimization for the feature selection process. In order to make the system more effective and achieve better convergence, the GWO algorithm is converted to its binary form, BGWO, before being combined with the SMA algorithm. The fitness function of the hybrid algorithm is represented as follows.

$$Fitness\,function = \delta \,e_{r} (C) + \eta \frac{\left| S \right|}{{\left| N \right|}}$$
(33)

where \(e_{r} (C)\) denotes the classification error rate, the total number of features is represented as \(N\), and the chosen subset of features is denoted as \(S\). \(\delta\) and \(\eta\) are parameters weighting classification accuracy and feature-set minimization, respectively. Depending on this fitness function, the features are selected using the hybrid algorithm. The combination of SMA and BGWO aims to accelerate convergence by integrating effective mechanisms from each algorithm, which can lead to faster convergence towards optimal solutions, especially for high-dimensional and complex optimization problems.
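As a small worked example of Eq. (33), assuming illustrative weights \(\delta = 0.99\) and \(\eta = 0.01\):

```python
def fitness(error_rate, n_selected, n_total, delta=0.99, eta=0.01):
    """Eq. (33): weighted sum of the classification error rate and the
    fraction of features retained; delta and eta are illustrative weights."""
    return delta * error_rate + eta * (n_selected / n_total)

# e.g. a subset with 10% error using 12 of 60 features:
# fitness(0.10, 12, 60) = 0.99*0.10 + 0.01*0.2 = 0.101
```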

3.3.1 Hybridization process

The SMA algorithm is hybridized with BGWO to obtain a better feature selection outcome. In this, the best feature \(z_{a}\) is replaced with the weighted combination of the \(\alpha\), \(\beta\) and \(\chi\) leaders, and the hybridized position update becomes,

$$z_{i} (t + 1) = \left\{ \begin{gathered} \,\,\,\,\,\,\,\,\,\,\,\,R_{2} .(ub - lb) + lb\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,R_{2} < x \hfill \\ w_{1} H_{\alpha } + w_{2} H\beta + w_{3} H\chi + f_{a} .[Y.\,z_{M} (t) - z_{N} (t)]\,\,\,\,\,\,\,\,\,\,R_{3} < q \hfill \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,f_{b} .\,z_{i} (t)\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,q \le R_{3} \le 1 \hfill \\ \end{gathered} \right.$$
(34)

To fine-tune the hybrid algorithm, the BGWO approach is introduced in the proposed work. This BGWO approach is combined with SMA to improve the performance of the optimization process and allows better features to be obtained. The hybridization process can enable parallel exploration of the solution space by the different algorithms, which can lead to more efficient use of computing resources. In the presence of noise in the objective function, hybridization can also provide robustness: different algorithms respond differently to noise, and the hybrid approach can mitigate the impact of noise on the optimization process.
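A minimal sketch of the hybrid update of Eq. (34) is given below, assuming the \(w_{1}, w_{2}, w_{3}\) weights of Eq. (32) and the GWO leader positions are supplied by the caller, and simplifying the weight vector \(Y\) and the per-agent threshold \(q\) for brevity.

```python
import numpy as np

def hybrid_update(pop, leaders, weights, t, max_iter, lb, ub, x=0.03, q=0.5):
    """Eq. (34): SMA-style update in which the best agent z_a is replaced
    by the weighted combination w1*H_alpha + w2*H_beta + w3*H_chi."""
    n, dim = pop.shape
    h_alpha, h_beta, h_chi = leaders
    w1, w2, w3 = weights
    guide = w1 * h_alpha + w2 * h_beta + w3 * h_chi
    k = 1.0 - t / max_iter
    new_pop = pop.copy()
    for i in range(n):
        r2, r3 = np.random.rand(), np.random.rand()
        if r2 < x:                                         # random re-initialisation branch
            new_pop[i] = np.random.rand(dim) * (ub - lb) + lb
        elif r3 < q:                                       # GWO-guided SMA move
            fa = np.random.uniform(-k, k, dim)
            a, b = pop[np.random.randint(n)], pop[np.random.randint(n)]
            Y = np.random.rand(dim) + 0.5                  # simplified weight vector Y (assumption)
            new_pop[i] = guide + fa * (Y * a - b)
        else:                                              # contraction branch
            fb = np.random.uniform(-k, k, dim)
            new_pop[i] = fb * pop[i]
    return new_pop
```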

3.3.2 Binary grey wolf optimization (BGWO- (Sigmoid transfer function (S-shape))

Based on this hybridization process, the features are optimally selected. In fundamental optimization algorithms, individual features move in a continuous search space and can update their positions to any location in that space. For feature selection problems, however, the solutions are restricted to binary values {0, 1}. To resolve this, the continuous space must be converted to the corresponding binary solutions {0, 1}, which motivates the introduction of a version based on a sigmoid (S-shaped) transfer function. The S-shaped update is given as,

$$v^{p} = \frac{1}{{1 + e^{{ - y_{i}^{p} (t)}} }}$$
(35)
$$Z_{i}^{d} = \left\{ \begin{gathered} 1\,\,\,\,\,\,\,\,if\,\,rand < L(z_{i}^{p} (t + 1)) \hfill \\ 0\,\,\,\,\,\,\,otherwise \hfill \\ \end{gathered} \right.$$
(36)

By applying the S-shaped function, the free (continuous) positions of the search agents are converted to their corresponding binary solutions. This conversion is performed in each dimension and represents the probability of changing the elements of the position vector from 0 to 1. Hence, the search agents are mapped into the binary space. This S-shaped function keeps the performance of the hybrid optimization process high. Table 9 gives the pseudocode for the optimized feature selection process.

Table 9 Pseudo code of proposed feature selection process
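For illustration, the S-shaped mapping of Eqs. (35)–(36) can be sketched as follows:

```python
import numpy as np

def binarize(continuous_position):
    """Eqs. (35)-(36): squash each continuous component through a sigmoid
    and sample a binary feature mask (1 = keep the feature)."""
    prob = 1.0 / (1.0 + np.exp(-continuous_position))                     # Eq. (35)
    return (np.random.rand(continuous_position.size) < prob).astype(int)  # Eq. (36)
```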

By analyzing the performance of each feature selection approach, SMA and BGWO are selected as the effective approaches, and these two are hybridized in the proposed study to make feature selection robust. In order to further enhance feature selection performance, the existing GWO is enhanced into BGWO, and hybridization is then applied with SMA. By using BGWO, local minima issues are avoided and the convergence rate is improved compared to other existing feature selection algorithms. Previous hybridizations in existing work have been evaluated with limited datasets; to demonstrate the improved performance of the proposed work, the hybrid algorithm is evaluated on many datasets, and the sigmoid transfer function is additionally used for the optimization. In general, numerical features offer various advantages on high-dimensional datasets. Dimensionality reduction approaches are often better suited to numerical features, and compared to complex categorical attributes, numerical features are typically easier to interpret. Numerical data tends to scale more smoothly as the dataset grows, and large datasets with numeric features are typically easier to process than datasets with categorical features. Numerical features also offer more flexibility for feature construction: in metaheuristic algorithms, existing numerical characteristics can be combined, altered, or made to interact to build new features, capturing more complex relationships in the data.

3.4 KNN based classification

The selected optimal features are passed to the classification stage, in which the different kinds of data and their categories are accurately classified. The proposed work uses the KNN approach for classification. The KNN method is popular because of its high efficiency and simplicity. It is a lazy learner that defers computation from the training phase to the prediction phase, which makes its training stage computationally trivial. Furthermore, it can be used for binary and multiclass problems, and KNN makes no assumptions about the underlying data distribution. This flexibility makes it suitable for a wide range of datasets, including those with non-linear decision boundaries. The KNN approach categorizes new testing data using the known training data. For a given unlabeled time series \(Z\), the KNN rule identifies the K "neighborhood" labeled time series from the training data and allocates \(Z\) to the class that occurs most frequently among the k nearest time series. In KNN, classification is performed by a majority vote of the nearest neighbours. The Euclidean distance between two data samples is evaluated as,

$$Dist\,(A,B) = \sqrt {\sum\limits_{i = 1}^{D} {\left( {A_{i} - B_{i} } \right)^{2} } }$$
(37)

where, \(A_{i}\) and \(B_{i}\) are the attributes of two different samples. The test data are classified using the following two voting methods such as majority voting and distance weighted voting, which are given as,

$$\mathrm{Majority\;voting},\;m^{\prime}=\mathrm{arg}\;{\mathrm{max}}_{f}\;\sum_{({z}_{i},\;{m}_{i})\in Dx}\;\psi (f,\;{m}_{i})$$
(38)
$$\mathrm{Distance}-\mathrm{weighted\;voting},\;m^{\prime}=\mathrm{arg}\;{\mathrm{max}}_{f}\;\sum_{({z}_{i},\;{m}_{i})\in Dx}\;{\omega }_{i}\psi (f,\;{m}_{i})$$
(39)

where, \(\psi (f,\,m_{i} )\) represents the indicator function, the set of the closest neighbour of the test data is signified as \(D_{x}\) and \(\omega_{i}\) mentions the weight in which \(\omega_{i} = {1 \mathord{\left/ {\vphantom {1 {d\left( {z^{\prime},\,z_{i} } \right)}}} \right. \kern-0pt} {d\left( {z^{\prime},\,z_{i} } \right)}}^{2}\). KNN implicitly performs feature selection by considering only the relevant features for determining the similarity between data points. Irrelevant features have less impact on the algorithm’s decision. By using the KNN approach, the classification is performed with reduced error.
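A minimal sketch of the KNN rule of Eqs. (37)–(39) for a single test sample is given below; the classifier actually used in the experiments may be configured differently.

```python
import numpy as np

def knn_predict(X_train, y_train, x_test, k=5, weighted=True):
    """Eqs. (37)-(39): Euclidean distances to the k nearest training
    samples, followed by majority or distance-weighted voting."""
    dists = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))          # Eq. (37)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for idx in nearest:
        w = 1.0 / (dists[idx] ** 2 + 1e-12) if weighted else 1.0    # Eq. (39) weights
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)                                 # arg max over the classes
```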

4 Results and discussions

This section evaluates the performance of the proposed techniques using different high dimensional datasets, and the attained results are compared with several existing approaches to compute the efficacy of the proposed methods. Different feature selection algorithms are implemented using seven varied datasets, and the results are compared with metrics like mean, worst, best, computational time and standard deviation. Based on the results, the best feature selection algorithm is chosen for the further process. Also, different classifiers like SVM, NB and KNN are executed for the classification stage. The best classifier is chosen according to some of the metrics from these classifiers. Then, the hybrid optimization for the feature selection process is performed using the best two algorithms, and the chosen best classifier does the classification.

4.1 Dataset description

Different high dimensional datasets are used for the experimental setup in the proposed work. Initially, seven datasets are utilized to analyze the optimal algorithm for feature selection: Banknote, Breast Cancer, Diabetes, Heart, Liver, Alzheimer’s and Zoo. These datasets are collected from the UCI machine learning repository.

Banknote authentication dataset

In the banknote authentication dataset, the data are derived from images taken of banknotes for authentication purposes. This dataset involves 1372 instances and 5 attributes.

Breast cancer dataset

The total number of instances in the breast cancer dataset is 286, and the available amount of attributes is 9. This dataset has multivariate characteristics and contains the attributes information about age, class, size of the tumor, etc.

Diabetes dataset

The diabetes dataset involves 20 different attributes and the files of diabetes patients with four different fields per record.

Heart disease dataset

The heart disease dataset contains 75 kinds of attributes with 303 instances. This dataset holds the details of heart disease patients, including age, sex, id, etc.

Liver dataset

The liver disorder dataset has 345 instances and contains 7 attributes.

Alzheimer’s

The Alzheimer’s dataset is collected from the Kaggle repository with several MRI-based images. The four categories in the Alzheimer’s dataset are mild demented, non-demented, moderate demented and very mild demented. Moreover, this dataset is manually collected from several websites.

Zoo dataset

The Zoo dataset involves 101 different instances and 17 volumes of attributes. The available information of attributes is milk, airborne, feathers, legs, eggs, etc.

Using such datasets, eight feature selection algorithms and three varied classifiers are executed, and from that, the best techniques are chosen using performance metrics. After that, the selected techniques are executed using a large data dimensionality.

High dimensionality data

The proposed work uses datasets of high dimensionality by categorizing them into large, medium and low. For the large category, two datasets are taken in the proposed experimental setup, namely CICDDoS2019 and CICMalDroid2020. For the medium category, the proposed work uses three datasets: the heart dataset, the wine quality dataset, and the car sales dataset. The low-dimensional datasets used are the banknote, seed and Pima Indians Diabetes datasets.

CICDDoS2019

The CICDDoS2019 contains 12 classes of attacks and presents TFTP attack data with a ratio of 100 to 1. This dataset is collected from the Kaggle DDOS dataset for several network security-based approaches.

CICMalDroid2020

The CICMalDroid2020 dataset has 17,341 samples gathered from Dec 2017 to Dec 2018. This dataset is classified as Banking malware, Riskware, Adware, Benign and SMS malware.

Wine quality dataset

The wine quality dataset has 4898 instances and 12 attributes. It contains white and red Vinho Verde wine samples collected from Portugal.

Car sales dataset

The car sales dataset is acquired from the Kaggle repository. This dataset has several pieces of information regarding varied types of cars.

Seed dataset

The seed dataset is collected from the BSMI laboratory, which holds 20 subjects, 10 females and 10 males. This dataset is highly utilized in emotion classifications.

Pima Indians Diabetes

The Pima Indians Diabetes dataset originates from the National Institute of Diabetes and Digestive and Kidney Diseases. All records in this dataset correspond to female patients at least 21 years of age.

4.2 Performance metrics

Performance metrics such as accuracy, precision, recall, F-measure, time, RMSE and MAE are used to evaluate the proposed classification scheme. The metrics best, worst, mean, standard deviation and computational time are used to compute the efficacy of the optimized feature selection techniques. The efficacy of the proposed work is measured by comparing the attained results with those of various existing techniques.

4.2.1 Metrics for feature selection

The following performance metrics are measured to analyze the effectiveness of feature selection techniques.

Mean fitness

The mean fitness represents the average value of the attained fitness function. When the proposed algorithm is run \(N\) times, the mean fitness is evaluated as,

$$mean_{f} = \frac{1}{N}\sum\nolimits_{p = 1}^{N} {h_{ * }^{p} }$$
(40)

where, \({h}_{*}^{p}\) be the fitness function value.

Best fitness

The best fitness is the lowest value of the fitness function. When the technique is run \(N\) times, the best fitness is computed as,

$$Best_{f} = \min_{p = 1}^{N} \; h_{ * }^{p}$$
(41)

where, \(h_{ * }^{p}\) signifies the best fitness value.

Worst fitness

The worst fitness is the highest value of the fitness function. When the feature selection technique is run \(N\) times, the worst fitness is evaluated as,

$$Worst_{f} = \max_{p = 1}^{N} \; h_{ * }^{p}$$
(42)

where, \(h_{ * }^{p}\) represents the worst fitness value.

Standard deviation

The standard deviation is the metric to measure the stability of each optimization algorithm in the feature selection process. Based on the measure of the fitness function, the standard deviation is evaluated and is given as,

$$Standard\;deviation = \sqrt {\frac{1}{M}\sum\limits_{p = 1}^{M} {\left( {x - \mu } \right)^{2} } }$$
(43)

where, \(M\) be the total features, \(x\) be the value of data, and the mean value is denoted as \(\mu\).

Computational time

Computational time is the metric used to evaluate the average time of the computation process in seconds. When the feature selection approach is run \(N\) times, the average computational time is expressed as,

$$Avg.\,time = \frac{1}{N}\sum\nolimits_{p = 1}^{N} {time^{p} }$$
(44)

where \(time^{p}\) denotes the processing time of the \(p^{th}\) run.

4.2.2 Metrics for classification:

The following metrics used to compute the efficiency of the classification approach are discussed below.

Accuracy

Accuracy is an essential metric for measuring the efficiency and improvement rate of the proposed classifier over state-of-the-art techniques. The accuracy metric plays a significant role in determining how well the classification process categorizes the given data. Accuracy is evaluated as,

$$A_{c} = \frac{Tp + Tn}{{Tp + Tn + Fp + Fn}}$$
(45)

Precision

Precision is important for evaluating the performance of classification methods; it denotes the proportion of samples predicted as positive that are true positives. It is obtained as the number of correctly labelled samples of a certain class divided by all samples assigned to that class. It is computed as,

$$P_{r} = \frac{Tp}{{Tp + Fp}}$$
(46)

Recall

Recall is a performance metric that measures the completeness of the model: it is the ratio of correctly classified samples of a certain class to all samples that actually belong to that class. Recall helps identify the error rate in the classification process. It is evaluated as,

$$R_{e} = \frac{Tp}{{Tp + Fn}}$$
(47)

F-measure

The F-measure, also known as the F1 score, provides a combined view of precision and recall; an F1 score of 1 indicates perfect precision and recall, and the score becomes zero when either precision or recall is zero. The F-measure is the harmonic mean of precision and recall and is formulated as,

$$F - measure = 2 \times \frac{{Precision \times recall}}{{Precision + recall}}$$
(48)

RMSE

The term RMSE refers to the Root Mean Square Error, the square root of the mean squared error between the predicted and actual values. The RMSE is expressed in the same units as the target value being classified. It is evaluated as,

$$RMSE = \sqrt {\frac{1}{m}\sum\limits_{i = 1}^{m} {\left( {\hat{x}_{i} - x_{i} } \right)^{2} } }$$
(49)

where \(\hat{x}_{i}\) denotes the predicted value, \(x_{i}\) denotes the actual value, and \(m\) represents the total amount of data.

MAE

The term MAE stands for Mean Absolute Error, the average absolute error between the predicted and true values. It is measured as,

$$MAE = \frac{{\sum\nolimits_{i = 1}^{m} {\left| {x_{i} - y_{i} } \right|} }}{m}$$
(50)

where \(x_{i}\) denotes the predicted value, \(y_{i}\) represents the true value, and \(m\) signifies the total amount of data.
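As a convenience, the sketch below shows how these classification metrics could be computed with scikit-learn; the function name `classification_metrics` and the example labels are illustrative assumptions and not results from the paper.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, mean_absolute_error, mean_squared_error)

def classification_metrics(y_true, y_pred):
    """Compute the metrics of Eqs. (45)-(50) for binary labels."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),                   # Eq. (45)
        "precision": precision_score(y_true, y_pred),                 # Eq. (46)
        "recall": recall_score(y_true, y_pred),                       # Eq. (47)
        "f_measure": f1_score(y_true, y_pred),                        # Eq. (48)
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),   # Eq. (49)
        "mae": mean_absolute_error(y_true, y_pred),                   # Eq. (50)
    }

# Illustrative labels only.
if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```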

4.3 Performance analysis

The proposed work uses eight optimization techniques and three varied classifiers to analyze the optimal feature selection algorithm. The best techniques are selected for the proposed feature selection and classification process. The results obtained from the feature selection process are described in the following section. Table 10 represents the performance results of the proposed techniques using different datasets.

Table 10 Performance analysis of proposed feature selection techniques

Table 10 shows the performance analysis of several feature selection techniques using metrics such as mean, best, worst, standard deviation and computational time. The analysis is performed using seven different datasets. The results show that the SMA algorithm outperforms the other techniques, and the GWO algorithm is the second best among the optimization techniques. The results attained by the feature selection techniques on each dataset are illustrated in Fig. 2.

Fig. 2
figure 2

Performance results of several feature selection techniques using a varied dataset

The obtained result analysis shows that the algorithms SMA and BGWO are superior to the other techniques. Thus, the proposed work introduces the hybrid optimized feature selection process using SMA and BGWO. This hybrid algorithm is compared with other optimization algorithms and is depicted in Table 11.
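To make the wrapper-style evaluation behind such a hybrid feature selection loop concrete, the following sketch scores a binary feature mask with a KNN classifier; the weighting \(\alpha\), the 5-fold cross validation and the function name `subset_fitness` are assumptions made for illustration and are not quoted from the proposed formulation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def subset_fitness(mask, X, y, alpha=0.99):
    """Wrapper fitness of a binary feature mask (lower is better):
    a weighted sum of the KNN classification error and the ratio of
    selected features. alpha = 0.99 is an assumed weighting."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                      # penalize empty subsets
        return 1.0
    knn = KNeighborsClassifier(n_neighbors=5)
    accuracy = cross_val_score(knn, X[:, selected], y, cv=5).mean()
    error = 1.0 - accuracy
    ratio = selected.size / X.shape[1]
    return alpha * error + (1.0 - alpha) * ratio
```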

Table 11 Performance analysis of proposed hybrid optimization technique with other optimization algorithms

The hybrid technique is compared with optimization algorithms such as WOA, BAT, GWO and SMA. The analysis is performed using seven varied datasets. Each dataset contains several features, and the optimal features are selected using the mentioned algorithms. The comparison results prove that the hybrid approach gives better results than the other techniques. Figure 3 represents the performance comparison of the proposed hybrid algorithm with other optimization techniques.

Fig. 3
figure 3

Performance comparison of the proposed hybrid optimization approach with other optimization techniques

The comparison analysis states that the hybrid technique performs better than the other optimization techniques. The hybrid algorithm selects the optimal set of features in reduced computational time. These hybrid approaches are also easy to implement and can avoid local optima, which enables the algorithm to attain enhanced results over the other techniques. To assess the efficacy of the proposed hybrid algorithm, the obtained results are compared with some previously existing hybrid approaches. Table 12 shows the performance comparison of the proposed hybrid optimization algorithm with other hybrid approaches.

Table 12 Performance comparison of proposed and existing hybrid approaches

The proposed hybrid approach is compared with existing hybrid optimization techniques such as BAT-PSO [37], ALO-GWO [38], GWO-WOA [39] and SMA-FA [40]. These existing hybrid algorithms fail to provide better feature selection results because of their high computational time and slow convergence rate. Moreover, these approaches easily fall into local optima, which affects their performance. These drawbacks are overcome in the proposed hybrid SMA + BGWO approach, which provides improved results in terms of best, worst, standard deviation, mean and computational time. Figure 4 presents the comparative analysis of the proposed hybrid algorithm with the existing techniques in terms of several performance metrics.

Fig. 4
figure 4

Comparison analysis of proposed hybrid with existing hybrid techniques using varied datasets

This result analysis shows that the proposed hybrid approach is more effective for feature selection than the previous hybrid approaches. The proposed SMA + BGWO obtains better results for each performance metric than the other hybrid algorithms. The features selected by these hybrid feature selection approaches are given as the input of three classifiers: SVM, NB and KNN. Based on this analysis, the best classifier is identified using varied performance metrics such as accuracy, precision, recall, F-measure, RMSE, time and standard deviation. Table 13 compares the classification performance of the different classifiers using seven varied datasets.

Table 13 Classification performance comparison (SMA + GWO)

The analysis uses seven datasets with three varied classifiers: SVM, NB and KNN. The KNN approach provides better results for each metric than the SVM and NB methods. The SVM has an overlapping issue; hence, it is difficult for it to deliver better classification results, and it requires a large amount of training time. Moreover, the SVM method cannot support high dimensional data processing. Due to its high computational complexity, the NB method also cannot provide improved classification results. Thus, these approaches attain reduced results compared with KNN. The KNN method is easy to apply and provides a reduced error rate during classification. Also, the KNN approach can handle high data dimensionality and improve the system's performance. Figure 5 shows the classification results using the seven types of datasets.

Fig. 5
figure 5

Result analysis of different classifiers in terms of accuracy, precision, recall, F-measure, time, RMSE and MAE

The performance of the three classifiers is analyzed using seven varied datasets. Compared with the other approaches, the KNN method is effective for each dataset, and it attains improved classification results for every metric. According to this analysis, KNN is chosen as the best classifier. To enhance the hybrid feature selection process, the proposed work integrates an S-shaped transfer curve into the SMA + GWO algorithm, and the resulting method is termed SMA + BGWO. This process can be performed on both normal and high dimensional data. Table 14 represents the performance comparison of the classification approaches.
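As a minimal sketch of how an S-shaped transfer function can turn a continuous position vector into a binary feature mask, the snippet below applies a sigmoid mapping followed by random thresholding; this follows the common binary GWO formulation and is an assumption for illustration rather than the exact curve used here.

```python
import numpy as np

def s_shaped_binarize(position, rng=None):
    """Convert a continuous search-agent position into a binary feature mask
    using an S-shaped (sigmoid) transfer function with random thresholding."""
    rng = np.random.default_rng() if rng is None else rng
    probability = 1.0 / (1.0 + np.exp(-np.asarray(position, dtype=float)))
    return (rng.random(probability.shape) < probability).astype(int)

# Illustrative continuous position vector for 10 candidate features.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    position = rng.normal(size=10)
    print(s_shaped_binarize(position, rng))
```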

Table 14 Comparing the performance of classifiers based on the SMA + BGWO approach using seven datasets

Using SMA + BGWO, the performance is enhanced in each performance metric. The analysis is performed on different high dimensional datasets, and the classification performance improves for SVM, NB and KNN. Compared with the other two classifiers, the KNN method attains improved outcomes. Figure 6 represents the performance comparison of the three classifiers based on the SMA + BGWO approach.

Fig. 6
figure 6

Different classification performance based on the proposed hybrid SMA + BGWO approach using seven datasets

The result analysis proves that the KNN method is more effective than the other two classification techniques. The proposed SMA + BGWO approach is also suitable for high dimensional data. Tables 15, 16 and 17 represent the classification performance of KNN using the large, medium and low datasets.

Table 15 Performance of KNN using a large dataset
Table 16 Performance of KNN using the medium dataset
Table 17 Performance of KNN using the low dataset

The performance analysis of the KNN classification is performed by evaluating metrics such as accuracy, precision, recall, F-measure, MAE and RMSE. For every dataset, the KNN classifier attains enhanced results in each metric. The hybrid SMA + BGWO model with KNN achieves effective results compared to the previous results. Figure 7 represents the performance comparison of the KNN classifier based on the SMA + BGWO approach for each dataset.

Fig. 7
figure 7

Performance comparison of KNN classification with SMA + BGWO approach using high dimensionality datasets

The KNN classifier achieves better results for each metric on the different high dimensionality datasets. The results show that the hybrid optimization approach is highly suitable for feature selection, and the proposed techniques are well suited to feature selection and classification mechanisms. The confusion matrices for the CICDDoS2019 and banknote datasets are depicted in Fig. 8.

Fig. 8
figure 8

Confusion matrix for CICDDoS2019 and Bank note dataset

The above figure shows the confusion matrices evaluated for two datasets: CICDDoS2019 and the banknote dataset. The proposed model predicts the results for CICDDoS2019 more accurately than the other existing models by considering two classes, DrDoS-DNS and Benign; the accuracy of identifying these classes is 99.47%. Similarly, the classes adware, banking, SMS malware, riskware and benign are classified with an accuracy of 99.05%. The confusion matrices for the Seeds and Prima Indians Diabetes datasets are shown in Fig. 9.
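A confusion matrix of this kind can be reproduced, for instance, with scikit-learn; the short sketch below uses made-up labels for the Benign/DrDoS-DNS case purely as an illustration.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Illustrative labels only, for a two-class case such as Benign vs. DrDoS-DNS.
y_true = ["Benign", "DrDoS-DNS", "DrDoS-DNS", "Benign", "DrDoS-DNS", "Benign"]
y_pred = ["Benign", "DrDoS-DNS", "Benign", "Benign", "DrDoS-DNS", "Benign"]

labels = ["Benign", "DrDoS-DNS"]
cm = confusion_matrix(y_true, y_pred, labels=labels)   # rows: true, columns: predicted
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
plt.title("Confusion matrix (illustrative labels)")
plt.show()
```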

Fig. 9
figure 9

Confusion matrix for Seed and Prima Indians diabetes dataset

The above figure shows the confusion matrices evaluated for two datasets: the Seeds dataset and the Prima Indians Diabetes dataset. The proposed model predicts the results for the Seeds dataset more accurately than the other existing models by considering the three classes Kama, Rosa and Canadian; the accuracy of identifying these classes is 99.57%. Similarly, from the Prima Indians Diabetes dataset, the two classes diabetic and non-diabetic are classified with an accuracy of 99.52%. To further evaluate the proposed hybrid model, an ROC analysis is performed, as shown in Fig. 10.

Fig. 10
figure 10

ROC analysis of the proposed hybrid approach

The ROC analysis of the proposed feature selection approach shows that the model leads to higher performance in both classification and prediction. The ROC curves are plotted between the true positive rate and the false positive rate. The existing models provide a true positive rate of 0.70 to 0.90, whereas the proposed model provides a true positive rate of 0.83 to 0.95 for the classification process. The convergence plot for the proposed hybrid approach is shown in Fig. 11.
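For completeness, the sketch below shows one way such an ROC curve could be produced from binary ground-truth labels and classifier scores; the helper name `plot_roc` and the example values are illustrative assumptions only.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

def plot_roc(y_true, y_score):
    """Plot an ROC curve from binary ground-truth labels and classifier scores."""
    fpr, tpr, _ = roc_curve(y_true, y_score)
    plt.plot(fpr, tpr, label=f"ROC (AUC = {auc(fpr, tpr):.3f})")
    plt.plot([0, 1], [0, 1], linestyle="--", label="Chance level")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()

# Illustrative labels and scores only.
if __name__ == "__main__":
    plot_roc([0, 0, 1, 1, 1, 0], [0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
```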

Fig. 11
figure 11

Convergence analysis for the proposed hybrid optimization model

To measure the system performance and identify the most cost-effective function, the proposed method is evaluated through a convergence analysis against existing models. The models included in the comparison are PSO, ACO, FFA, CS, GWO, BAT, WOA and SMA. The fitness values are plotted over 0–50 iterations, and the proposed approach reduces the cost by the 20th iteration. Therefore, the analysis shows that the proposed SMA + BGWO model converges very quickly and performs better than the other existing algorithms.
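A convergence plot of this kind records the best fitness found so far at each iteration for every algorithm; the sketch below, with made-up curves, illustrates how such a plot could be drawn and is not generated from the reported results.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_convergence(curves):
    """Plot best-fitness-so-far (cost) curves, one per algorithm, over iterations."""
    for name, best_per_iteration in curves.items():
        plt.plot(range(1, len(best_per_iteration) + 1), best_per_iteration, label=name)
    plt.xlabel("Iteration")
    plt.ylabel("Best fitness (cost)")
    plt.legend()
    plt.show()

# Illustrative curves for two algorithms over 50 iterations.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    curves = {
        "SMA + BGWO (illustrative)": np.minimum.accumulate(rng.uniform(0.02, 0.2, 50)),
        "GWO (illustrative)": np.minimum.accumulate(rng.uniform(0.05, 0.3, 50)),
    }
    plot_convergence(curves)
```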

5 Conclusion

This paper evaluates several feature selection techniques, and based on the results, the best algorithm is selected for further processing. To solve the problems in the feature selection process, the proposed work combines the SMA and BGWO algorithms. First, the data from the various high dimensional datasets are pre-processed and cleaned. The pre-processed data are subjected to the hybrid optimization technique, in which the optimal features are selected; this smoothens the classification process and increases the classification accuracy. For classification, the KNN approach is utilized, and its performance is computed by measuring various performance metrics. The proposed model attains an accuracy of 0.9902 on the Alzheimer's dataset, 0.9899 on the Banknote dataset, 0.9887 on the Breast Cancer dataset and 0.9783 on the Diabetes dataset. The accuracy obtained on the Heart dataset is 0.9882, on the Liver dataset 0.9793, and on the ZOO dataset 0.9918. On the large dimensional datasets, the proposed hybrid optimization with the KNN method attains an accuracy of 0.9983 on the CICDDoS2019 dataset, 0.9930 on CICMalDroid2020, 0.9925 on the Heart dataset and 0.9929 on the Wine quality dataset. The Car sales dataset attains 0.9930 accuracy, the Banknote dataset attains 0.9930, the Seed dataset attains 0.9959, and the Prima Indians dataset attains 0.9950. The simulation results show that the developed model is well suited to feature selection and classification. However, the proposed hybrid optimization is more computationally expensive than other recent population-based optimization algorithms, which should be addressed in the future by adopting optimization methods with lower computational cost. Additionally, future work is suggested to handle the multi-objective formulation of the feature selection problem and to introduce a multi-objective optimization technique to determine the Pareto-optimal solutions for feature selection. The parameters of the optimization techniques will also be adaptively updated to attain enhanced searching ability. Further, the proposed optimization will be extended with evolutionary operators so that large-scale optimization tasks can be solved.