1 Introduction

Lately, data have been growing exponentially and have become an essential resource for several scientific domains, such as data science, data mining, and the medical sciences [1]. However, the rapid increase in data size causes several problems, such as high dimensionality, irrelevant data, and noise. These problems reduce the accuracy of machine learning analysis and elevate computational costs [2]. Besides, conventional machine learning classifiers cannot by themselves distinguish the relevant features from the irrelevant ones in the data they are given [3]. Consequently, these problems affect the productivity of the data science and data mining domains, which mostly rely on machine learning classifiers. Therefore, feature selection (FS) is tasked with selecting a subset of useful features and eliminating unnecessary ones [4, 5].

Different FS techniques have been proposed to select the most useful features from a dataset [6]. FS aims to accomplish two basic objectives: decreasing the number of selected features and increasing the accuracy of the classifier [7, 8]. FS approaches are categorized into filter, embedded, and wrapper methods, and the main difference among them lies in the procedure used to select the feature subset [9]. In wrapper-based methods, a training method or a classification technique is utilized to assess the selected features. Filter-based methods assess the selected feature subset based on the properties of the given dataset. Compared with wrapper-based methods, filter-based methods are faster, as wrapper-based methods consume considerable time in determining the classifier accuracy values. On the other hand, wrapper-based methods can find a feature subset that fits the classifier better than filter-based methods can. Embedded methods have the benefit of low computational time but are usually specific to a particular learning algorithm [10, 11].

The FS problem can be viewed as the procedure of finding a minimal reduct, which is an NP-hard problem. The standard solution for finding such a minimal reduct is to generate all potential reducts and then select the one with the minimum number of elements [12]. This is an expensive solution that is only practical for simple data. For most selection purposes, only one smallest reduct is wanted; thus, all the computation spent on determining the rest is wasted. Hence, alternative solutions are needed for large data [13]. Moreover, FS methods are widely applied in different intelligent systems, such as data mining [10, 14, 15], text clustering [16, 17], image processing [18,19,20], intrusion detection [21,22,23], data classification [24, 25], human activity recognition [26], sentiment analysis [27, 28], cancer detection [29, 30], disease detection [31, 32], power systems [33, 34], design problems [35, 36], text classification [37, 38], and many other application areas [39,40,41].

In the literature, different optimization approaches have been presented for solving FS problems [8]. Indeed, optimization methods have become essential and useful for the FS community, as they are recognized for their ability to search for solutions. Examples of such methods are the whale optimization algorithm (WOA) [42], artificial fish swarm (AFS) [43], social spider optimization (SSO) [44], moth-flame optimization algorithm (MFO) [8], salp swarm algorithm (SSA) [4], particle swarm optimization (PSO) [45], water wave optimization algorithm (WWO) [46], Harris hawks optimizer (HHO) [47], fruit fly optimization algorithm (FOA) [48], gray wolf optimizer (GWO) [6], grasshopper optimization algorithm (GOA) [49], ant colony optimization (ACO) [50], aquila optimizer (AO) [51], arithmetic optimization algorithm (AOA) [52], and bat algorithm (BA) [53].

When comparing optimization methods such as PSO, GA, GOA, ALO, and GWO, several limitations have been found. PSO does not perform efficiently on large optimization problems and suffers from low population diversity. GA has shortcomings such as the dilemma of the crossover operator, which can shift the candidate solutions too quickly during the optimization process. GOA has shortcomings such as slow movement, poor grasshopper motion, and early convergence. ALO suffers from early convergence and local optima. GWO suffers from stagnation in local regions and early convergence. Recently, Faramarzi et al. [54] presented a new nature-inspired optimization algorithm, called the marine predators algorithm (MPA), for solving various engineering optimization problems. The primary motivation of MPA is the general foraging strategy, namely the Lévy and Brownian motions of animals that naturally prey on others in the ocean, together with the optimal encounter-rate policy governing the natural interaction between predator and prey. MPA thus proceeds according to the biological rules of optimal foraging and encounter-rate strategies between predator and prey in the marine environment. The MPA's performance was evaluated on 29 CEC-BC-2017 benchmark functions and five engineering problems; the evaluation showed that MPA obtained better results in some cases and competitive results in others. These results motivated us to apply MPA to the FS problem, to which it had never been applied since it was introduced. Moreover, MPA has fewer tuning parameters and is straightforward to implement. It has proved its ability to tackle small and large optimization problems and is distinguished by its flexibility and robust stochastic behavior.

Mirjalili [55] proposed a population-based method called the sine–cosine algorithm (SCA) to solve various optimization tasks. It generates multiple initial stochastic solutions and directs them to shift toward the near-optimal solution using a mathematical model based on the sine and cosine functions. Several adaptive and random variables are blended into the SCA to support both search modes: exploitation and exploration. It has an excellent exploitation ability over the given search space.

In this study, an efficient FS method is proposed based on a new improved version of the marine predators algorithm (MPA) using the operators of the sine–cosine algorithm (SCA), called MPASCA. Generally, MPA suffers from some shortcomings, such as premature convergence and stagnation in local optima. The proposed MPASCA therefore improves the conventional MPA's performance by adding components from the SCA algorithm; this procedure is designed to tackle the weaknesses of the conventional MPA. Thus, the cooperation between the two algorithms improves the convergence behavior. Several datasets with different characteristics are employed to evaluate the performance of the MPASCA, and its performance is compared to the conventional MPA, SCA, and other well-known optimization algorithms. The obtained results show that the MPASCA achieved better results than similar optimization methods. Moreover, statistical significance tests are conducted to further evaluate the MPASCA method's effectiveness against the other optimization methods on the same datasets; the results confirm that the MPASCA obtained significantly better results.

The main contributions of this paper are as follows:

  • A new fused version of the marine predators algorithm (MPA) and the sine–cosine algorithm (SCA), called MPASCA, is proposed.

  • MPA is combined with SCA to solve its local optima problem and to enhance the current best solution.

  • A wrapper feature selection method based on MPASCA is proposed.

  • Comprehensive comparisons and statistical tests are performed, covering worst, mean, and best fitness values, classification accuracy, standard deviation, and computational time.

  • The MPASCA method is compared with other optimization algorithms, and the experimental results show that the proposed MPASCA gives superior results.

The rest of this study is organized as follows. A brief literature review is presented in Sect. 2. Section 3 discusses the background of MPA and SCA. Section 4 illustrates the proposed MPASCA method. Section 5 shows the experiments and results. The conclusion and potential future works are presented in Sect. 6.

2 Related works

Feature selection is a necessary pre-processing step in several intelligent systems, such as text clustering, intrusion detection, text categorization, disease prediction, and text summarization [56]. It is also one of the most popular and challenging tasks in the machine learning domain [57]. A well-known approach to FS uses Monte Carlo methods and rough set theory (RST), as in Kruczyk et al. [58]. Also, Bouzayane and Saad [59] used rough set theory to build a periodic multicriteria classification approach. Kifah et al. [60] developed an FS approach using a double treatment iterative improvement algorithm with the RST. Li et al. [61] developed a new FS method based on a Monte Carlo approach for identifying leukemia stem cell expression signatures. Recently, different optimization approaches have been successfully employed to address these problems, as follows.

An improved FS approach, called ISSA, was proposed based on applying the SSA to find the optimal feature subset in a wrapper manner [4]. Two enhancements were incorporated into the conventional SSA to tackle its weaknesses when dealing with FS problems. The first involves applying opposition-based learning at the initialization phase of SSA to increase its population diversity, while the second involves adding a new local exploration search to SSA to enhance its exploitation ability. Eighteen datasets were used to validate the effectiveness of the improved SSA (ISSA), and its performance was compared with other similar optimization algorithms. The results demonstrated that the ISSA achieved better results in convergence curves, accuracy, fitness values, and feature reduction.

Agrawal et al. [62] presented a new FS approach, called QWOA, based on incorporating quantum concepts into the conventional WOA. The introduced method improves the search strategies (exploitation and exploration) of the conventional WOA by applying a quantum-bit representation to the solutions of the WOA population and a quantum rotation as a mutation operator. Furthermore, improved mutation and crossover operators were proposed for the quantum exploration process. The effectiveness of QWOA was tested on fourteen datasets from various domains, and the results were compared with the WOA and several similar optimization algorithms. The obtained results showed that the proposed QWOA achieved superior performance. Moreover, statistical tests confirmed that the QWOA achieved better results than several similar metaheuristics.

An improved version of FOA was presented to tackle the inherent weaknesses of the original FOA [48]. The proposed version adds a chaotic local search mechanism and a Gaussian mutation operator to the main procedure of the FOA, to evade the premature convergence problem and to enhance its exploitative tendencies, respectively. Comprehensive comparisons were conducted using datasets with different characteristics to verify the performance of the proposed methods. The experimental results showed that the best variant of the FOA is the MCFOA, which was then applied to FS; the evaluation results demonstrated that MCFOA achieved the best classification accuracy.

A wrapper-based method was proposed to solve FS problems using an improved SSA [57]. The proposed method, called TVBSSA, merges a time-varying mechanism, a Random Weight Network (RWN), and a binary SSA. In this method, TVBSSA is employed as the optimization search mechanism, while the RWN serves as the classifier. The objective function combines three sub-objectives: reducing the number of selected features, increasing the classification accuracy, and reducing the complexity of the produced RWN models. Twenty datasets were used, and several existing methods were employed to test its effectiveness. The experimental results showed that the proposed method outperformed other similar optimization algorithms.

Furthermore, an FS method was proposed to find a subset of features that maximizes classification accuracy [13]. The proposed method, called BWWO, applies a new binary WWO algorithm along with rough set theory (RST). Two experiments were conducted to validate the effectiveness of BWWO, based on a wrapper method and on a rough-set procedure as part of the objective function. In the first experiment, the performance of the RST-based method was tested on sixteen datasets. In the second, the ability of the kNN-based method to select a feature subset that improves the classification accuracy while reducing the number of selected features was evaluated on seventeen datasets. The obtained results proved the effectiveness of BWWO in obtaining the smallest feature subset that maximizes accuracy. Moreover, the Friedman and Wilcoxon ranking tests showed that the BWWO achieved significantly better results.

An alternative FS method was proposed for finding a subset of features [8]. The proposed method, called OMFODE, improves the search efficiency of the conventional MFO by combining opposition-based learning (OBL) and the differential evolution (DE) algorithm with the MFO. The OBL is employed to produce a better initial population, which enhances the convergence of MFO, while the DE is utilized to enhance the exploitation search of the MFO. Consequently, the OMFODE can avoid getting trapped in local optima, unlike the conventional MFO. A set of experiments was used to evaluate its performance, and the results demonstrated that it performed better than other optimization algorithms.

A new version of SSA, called ISSAFD, was proposed for solving FS based on enhancing the followers of the SSA using the SCA and a Disrupt Operator (DO) [41]. This improvement enhances the exploration stage and helps avoid getting stuck in local regions. Furthermore, the DO is applied to the solutions to enhance the population diversity and to keep the equilibrium between the two search strategies (exploitation and exploration). Two further modifications of the SSA based on the SCA were produced, namely ISSALD and ISSAF. The experimental results were assessed on 20 datasets and showed that ISSAFD achieved excellent performance in terms of specificity, sensitivity, accuracy, and the number of selected features compared to other optimization algorithms.

A new GWO algorithm combined with a two-phase mutation operator was proposed to address the FS problem [6]. A sigmoid function was employed to map the solutions into a binary search space. The two-phase mutation intensifies the exploitation ability of the proposed method: the first mutation operator aims to decrease the number of features while keeping a high accuracy rate, and the second adds informative features that improve the accuracy. Comprehensive comparisons on 35 datasets were conducted against other similar optimization approaches, and statistical analyses were implemented to verify its performance.

Mafarja and Mirjalili [42] presented two hybrid algorithms based on the WOA for solving FS problems. In the first proposal, the WOA is hybridized with simulated annealing (SA) to enhance its exploitation stage. In the second, the SA is used within the WOA to improve the best solution found in each iteration, i.e., its exploitation ability. The performance of the hybrid algorithms was evaluated on 18 datasets and compared to other similar FS methods. The results proved the effectiveness of the hybrid algorithms in increasing the accuracy values, which confirms the ability of the WOA to solve challenging FS problems.

3 Background

In this section, the basics of the marine predators algorithm and sine–cosine algorithm are introduced.

3.1 Marine predators algorithm (MPA)

The MPA is a recently proposed method inspired by the behavior of prey and predators in nature [54]. In MPA, predators and prey are both considered search agents, in which a predator searches for prey while the prey searches for its own food. Similar to all metaheuristic techniques (MHs), MPA starts with a set of random solutions as an initialization; the solutions are then modified according to the main structure of the algorithm.

Therefore, the initial solutions are generated across the search landscape as follows:

$$\begin{aligned} Z=\mathrm{LB}+r_1\times ({\mathrm{UB}}-{\mathrm{LB}}) \end{aligned}$$
(1)

where LB and UB refer to the lower and upper boundaries of the search landscape, and \(r_1\in [0,1]\) is a random number.
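For illustration, the initialization of Eq. (1) can be sketched in a few lines of NumPy; the function name and the use of `numpy.random.default_rng` are our own choices, not part of the original MPA description:

```python
import numpy as np

def init_population(N, dim, lb, ub, seed=None):
    """Random initialization following Eq. (1): Z = LB + r1 * (UB - LB)."""
    rng = np.random.default_rng(seed)
    r1 = rng.random((N, dim))        # r1 in [0, 1], one draw per entry
    return lb + r1 * (ub - lb)

# Example: 15 solutions in a 10-dimensional space bounded by [0, 1]
Z = init_population(N=15, dim=10, lb=0.0, ub=1.0, seed=42)
```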

In the initialization phase, two basic matrices must be defined: the matrix of the fittest predators (the Elite matrix) and the prey matrix. The following formulas represent the Elite and prey matrices.

$$\begin{aligned} {\mathrm{Elite}}=\left[ \begin{array}{cccc} Z_{11}^1 & Z_{12}^1 & \cdots & Z_{1d}^1\\ Z_{21}^1 & Z_{22}^1 & \cdots & Z_{2d}^1\\ \vdots & \vdots & \ddots & \vdots \\ Z_{N1}^1 & Z_{N2}^1 & \cdots & Z_{Nd}^1\\ \end{array}\right] , \quad Z=\left[ \begin{array}{cccc} Z_{11} & Z_{12} & \cdots & Z_{1d}\\ Z_{21} & Z_{22} & \cdots & Z_{2d}\\ \vdots & \vdots & \ddots & \vdots \\ Z_{N1} & Z_{N2} & \cdots & Z_{Nd}\\ \end{array}\right] \end{aligned}$$
(2)

The principal goal of MHs is to reach the optimal solutions by modifying the initial random set of solutions through a sequence of steps defined by the algorithm's structure. MPA follows three phases when updating the solutions, depending on the velocity ratio between predators and prey: the first phase applies when the velocity ratio between prey and predators is high, while unit and low velocity ratios characterize the second and third phases, respectively. The details of each phase are discussed in the following.

3.2 Phase 1: high-velocity ratio

This phase is applied to explore the search landscape (the exploration phase). Its main feature is the high velocity ratio between predators and prey: the prey moves very fast, while the best tactic for the predator is to stay still. This phase occurs during the first third of the total number of generations (i.e., while \(t<\frac{1}{3}t_{\max }\)), where the prey position is updated using the following equations.

$$\begin{aligned} S_i&= R_{\mathrm{B}} \bigotimes ({\mathrm{Elite}}_i-R_{\mathrm{B}}\bigotimes Z_i), \quad i=1,2,\ldots ,n \end{aligned}$$
(3)
$$\begin{aligned} Z_i&= Z_i+P\cdot R\bigotimes S_i \end{aligned}$$
(4)

where \(R\in [0,1]\) is a vector of uniform random numbers, \(P=0.5\) is a constant, and \(R_{\mathrm{B}}\) is a random vector representing Brownian motion. Moreover, \(\bigotimes \) denotes element-wise multiplication.
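A minimal sketch of this phase, assuming NumPy and that Brownian motion is emulated by standard normal draws; the array shapes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def phase1_update(Z, elite, P=0.5):
    """High-velocity phase (Eqs. 3-4): Brownian-driven prey update.

    Z and elite are (N, dim) arrays; R_B is drawn from a standard normal
    distribution to emulate Brownian motion, R is uniform in [0, 1].
    """
    RB = rng.standard_normal(Z.shape)   # Brownian steps
    R = rng.random(Z.shape)             # uniform random vector
    S = RB * (elite - RB * Z)           # Eq. (3), element-wise
    return Z + P * R * S                # Eq. (4)
```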

3.3 Phase 2: unit velocity ratio

The second phase is the transition from the exploration to the exploitation stage, in which predators and prey move with nearly the same velocity while searching for food. This phase is performed in the middle stage of the algorithm, when \(\frac{1}{3}t_{\max }< t< \frac{2}{3}t_{\max }\). In this case, the best tactic for the predator is Brownian motion, while the prey moves with Lévy flight. Accordingly, the population is divided into two halves, applying Eqs. (5) and (6) to emulate the motion of the first half and Eqs. (7) and (8) for the second half, as defined below.

$$\begin{aligned} S_i&= R_L \bigotimes ({\mathrm{Elite}}_i-R_L\bigotimes Z_i), \quad i=1,2,\ldots ,n \end{aligned}$$
(5)
$$\begin{aligned} Z_i&= Z_i+P\cdot R\bigotimes S_i \end{aligned}$$
(6)

where \(R_L\) is a vector of random numbers following a Lévy distribution. Eqs. (5) and (6) are applied to the first half of the population, which handles exploitation, while the second half follows these equations:

$$\begin{aligned} S_i&= R_{\mathrm{B}} \bigotimes (R_{\mathrm{B}} \bigotimes {\mathrm{Elite}}_i- Z_i), \quad i=1,2,\ldots ,n \end{aligned}$$
(7)
$$\begin{aligned} Z_i&= {\mathrm{Elite}}_i+P\cdot CF\bigotimes S_i,\quad CF=\left( 1-\frac{t}{t_{\max }} \right) ^{2\frac{t}{t_{\max }}} \end{aligned}$$
(8)

where CF is an adaptive parameter that controls the step size of the predators' movement.
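The transition phase can be sketched as follows, again in NumPy. Lévy steps are generated here with Mantegna's algorithm, a common approximation for Lévy-stable draws; the exponent beta=1.5 is an assumption rather than a value stated in this section:

```python
import numpy as np
from math import gamma, pi, sin

rng = np.random.default_rng(0)

def levy(shape, beta=1.5):
    """Mantegna's algorithm: a common way to draw Levy-stable steps."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)

def phase2_update(Z, elite, t, t_max, P=0.5):
    """Unit-velocity phase (Eqs. 5-8): Levy prey in the first half,
    Brownian predators in the second half."""
    half = Z.shape[0] // 2
    CF = (1 - t / t_max) ** (2 * t / t_max)   # adaptive step-size control
    Znew = Z.copy()
    # First half: Eqs. (5)-(6), Levy-driven exploitation
    RL = levy(Z[:half].shape)
    R = rng.random(Z[:half].shape)
    Znew[:half] = Z[:half] + P * R * (RL * (elite[:half] - RL * Z[:half]))
    # Second half: Eqs. (7)-(8), Brownian-driven predators
    RB = rng.standard_normal(Z[half:].shape)
    S = RB * (RB * elite[half:] - Z[half:])
    Znew[half:] = elite[half:] + P * CF * S
    return Znew
```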

3.4 Phase 3: low-velocity ratio

This phase is the last stage of the optimization process. During this phase, the predator moves faster than the prey, which is why the predator follows Lévy motion when updating its position. This process is performed in the last third of the iterations (\(t>\frac{2}{3}t_{\max }\)) and is formulated as:

$$\begin{aligned} S_i&= R_L \bigotimes (R_L \bigotimes {\mathrm{Elite}}_i- Z_i), \quad i=1,2,\ldots ,n \end{aligned}$$
(9)
$$\begin{aligned} Z_i&= {\mathrm{Elite}}_i+P\cdot CF\bigotimes S_i,\quad CF=\left( 1-\frac{t}{t_{\max }} \right) ^{2\frac{t}{t_{\max }}} \end{aligned}$$
(10)
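Phase 3 mirrors the second half of phase 2 but with Lévy steps, and the three phases are scheduled by the iteration counter. A sketch, reusing the levy() helper and rng from the previous snippets, might look like this:

```python
def phase3_update(Z, elite, t, t_max, P=0.5):
    """Low-velocity phase (Eqs. 9-10): Levy-driven predator moves."""
    CF = (1 - t / t_max) ** (2 * t / t_max)
    RL = levy(Z.shape)                 # levy() as defined in the phase-2 sketch
    S = RL * (RL * elite - Z)          # Eq. (9)
    return elite + P * CF * S          # Eq. (10)

def mpa_update(Z, elite, t, t_max):
    """Dispatch on the velocity-ratio schedule described above."""
    if t <= t_max / 3:
        return phase1_update(Z, elite)            # exploration
    elif t <= 2 * t_max / 3:
        return phase2_update(Z, elite, t, t_max)  # transition
    return phase3_update(Z, elite, t, t_max)      # exploitation
```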

3.5 Eddy formation and fish aggregating devices' (FADs) effect

The surrounding environment has a significant impact on a creature's behavior. Therefore, in MPA the authors considered environmental effects such as eddy formation and Fish Aggregating Devices (FADs), which can be mathematically modeled as follows:

$$\begin{aligned} Z_i=\left\{ \begin{array}{ll} Z_i+CF [Z_{\min }+R \bigotimes (Z_{\max }-Z_{\min })]\bigotimes U & \quad r_5 < {\mathrm{FAD}} \\ Z_i+[{\mathrm{FAD}}(1-r)+r](Z_{r1}-Z_{r2}) & \quad r_5 > {\mathrm{FAD}}\\ \end{array}\right. \end{aligned}$$
(11)

where \({\mathrm{FAD}}=0.2\) and U is a binary vector, formed by generating random vectors and converting them to binary using a threshold of 0.2. \(r\in [0,1]\) is a random number, and \(r_1\) and \(r_2\) are random indices of prey.
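The FADs perturbation of Eq. (11) could be implemented as below; the way the binary mask U is drawn (uniform values thresholded at 0.2) follows the description in the text, while the handling of the random prey indices is our reading of the equation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fads_effect(Z, CF, zmin, zmax, FAD=0.2):
    """Eddy/FADs perturbation (Eq. 11), applied after the phase updates."""
    N, dim = Z.shape
    r5 = rng.random(N)
    # Branch 1: long jump inside the bounds, masked by the binary vector U
    U = (rng.random((N, dim)) < FAD).astype(float)   # threshold 0.2
    jump = Z + CF * (zmin + rng.random((N, dim)) * (zmax - zmin)) * U
    # Branch 2: a step along the line between two randomly chosen prey
    i1 = rng.integers(0, N, N)
    i2 = rng.integers(0, N, N)
    r = rng.random((N, 1))
    walk = Z + (FAD * (1 - r) + r) * (Z[i1] - Z[i2])
    return np.where((r5 < FAD)[:, None], jump, walk)
```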

3.6 Marine memory

Predators have a strong memory of the positions where they have foraged successfully. This feature is implemented by incorporating memory into MPA: the best solutions of the previous iteration are saved and compared with the current solutions, and after the comparison the better of the two is retained.

3.7 Sine–cosine algorithm

In SCA, Mirjalili [55] utilized the characteristics of the sine and cosine trigonometric functions to attain qualified solutions for global optimization problems. The SCA starts with an initialization phase, in which a set of random solutions is generated. The population Z contains N solutions (one per search agent), and each solution \(Z_i\) has dimension D, the dimension of the studied problem. The initial fitness values \(Fit_i\) are then computed from the initial solutions \(Z_i\). The solutions are sorted to identify the best solution \(Z_{\mathrm{B}}\) and its corresponding value \(Fit_{\mathrm{B}}\), which guide the modification of the solution values across the iterations; the best solution vector and best function value are updated in each iteration to serve as a guide for the other search agents. The solution update rule of the SCA can be represented as follows:

$$\begin{aligned} Z_i=\left\{ \begin{array}{ll} Z_i+\sigma _1\times \sin (\sigma _2 )\times |\sigma _3 Z_{\mathrm{B}}-Z_i | & \quad if \; \sigma _4<0.5 \\ Z_i+\sigma _1\times \cos (\sigma _2 )\times |\sigma _3 Z_{\mathrm{B}}-Z_i | & \quad if\; \sigma _4\ge 0.5 \end{array} \right. \end{aligned}$$
(12)

where \(|.|\) denotes the absolute value and \(\sigma _i, i=1,2,3,4\) are random variables. The \(\sigma _2\) assigns the direction of the next update (toward or away from \(Z_{\mathrm{B}}\)), whereas \(\sigma _3\) determines whether the best solution \(Z_{\mathrm{B}}\) is stochastically emphasized, which occurs when \(\sigma _3 > 1\). In addition, \(\sigma _1\) is used to attain a smooth balance between the diversification and intensification cores of the algorithm so that it converges to the promising region; this parameter is modified at each iteration t based on Eq. (13) [55]:

$$\begin{aligned} \sigma _1=\gamma -t \frac{\gamma }{t_{\max }} \end{aligned}$$
(13)

where \(t_{\max }\) and \(\gamma \) are the total number of iterations and a constant, respectively.
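A sketch of the SCA update of Eqs. (12) and (13) follows; the ranges used for \(\sigma _2\) and \(\sigma _3\) ([0, 2π] and [0, 2]) and the constant \(\gamma =2\) follow the original SCA paper and are assumptions relative to this section:

```python
import numpy as np

rng = np.random.default_rng(0)

def sca_update(Z, Zb, t, t_max, gamma=2.0):
    """SCA position update (Eqs. 12-13)."""
    sigma1 = gamma - t * gamma / t_max          # Eq. (13), decays linearly
    sigma2 = rng.random(Z.shape) * 2 * np.pi    # movement direction
    sigma3 = rng.random(Z.shape) * 2            # emphasis on Z_B when > 1
    sigma4 = rng.random(Z.shape)                # sine/cosine switch
    step = sigma1 * np.abs(sigma3 * Zb - Z)
    return Z + np.where(sigma4 < 0.5,
                        step * np.sin(sigma2),
                        step * np.cos(sigma2))
```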

4 Proposed feature selection method

The structure of the proposed MPASCA is given in Fig. 1. The proposed MPASCA uses SCA operators to enhance the exploitation of MPA, which is reflected in the quality of the obtained solution that represents the optimal subset of relevant features. To reach this objective, the developed MPASCA begins by setting the initial values for a set of N solutions inside the search space, bounded by lower and upper boundaries (here 0 and 1, respectively). The fitness value of each solution is then computed after converting it to a binary solution, which determines which features are relevant and which must be removed. Thereafter, the best solutions are identified, and the other solutions are updated using either the operators of SCA or those of MPA. These steps are repeated until the end of the iterations is reached. The details of each step are discussed below.

4.1 First step: generating solutions

In this step, MPASCA starts by defining the initial value of each parameter and splitting the data into training and testing sets. Eq. (14) is then used to set the initial value of each solution:

$$\begin{aligned} Z_{i,j}={\mathrm{LB}}_j+r_1\times ({\mathrm{UB}}_j-{\mathrm{LB}}_j), \; \, j =1, 2,..., D, \, i=1,2,...,N \end{aligned}$$
(14)

where \({\mathrm{LB}}_j\) and \({\mathrm{UB}}_j\) are the limits of the search domain at the jth dimension, and N is the total number of solutions.

4.2 Second step: updating solutions

This step starts by converting each solution \(Z_i\) to binary form using the following equation.

$$\begin{aligned} BZ_{ij}=\left\{ \begin{array}{ll} 1 & \quad if \ Z_{ij}>0.5\\ 0 & \quad otherwise\\ \end{array} \right. \end{aligned}$$
(15)

The main aim of Eq. (15) is to convert the solution from real-valued to discrete form, which suits the FS problem. Thereafter, the quality of the current solution is evaluated using the kNN classifier on the training set, after removing the features corresponding to zeros in \(BZ_i\). This process (i.e., computing the fitness value \(Fit_i\)) is defined as:

$$\begin{aligned} {\mathrm{Fit}}_i=\lambda \times \eta _i+(1-\lambda )\times \left( \frac{|BZ_i|}{\mathrm{Dim}}\right) \end{aligned}$$
(16)

In Eq. (16), \(\lambda \in [0,1]\) is a uniform random number which is employed to balance between the ratio of selected features (\(\frac{|BZ_i|}{\mathrm{Dim}}\)) and the classification error (\(\eta _i\)). For clarity, assume that the dataset has 5 features, so the dimension of the current solution \(Z_i\) is 5. If \(Z_i=[0.1\, 0.6\, 0.7\, 0.9\, 0.2]\), then applying Eq. (15) gives \(BZ_i=[0\, 1\, 1\, 1 \,0]\). These Boolean values identify the irrelevant features (i.e., the first and fifth), which will be removed, and the relevant features corresponding to ones (i.e., the second, third, and fourth), which will be used to train the classifier.
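As a concrete sketch of Eqs. (15) and (16), the following uses scikit-learn's kNN as the wrapper classifier; the choice of k=5 neighbors is an illustrative assumption, since the section does not fix it, and \(\lambda \) is passed in (the text describes it as a uniform random number in [0, 1]):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(z, X_train, y_train, X_val, y_val, lam):
    """Eqs. (15)-(16): binarize a solution and score it with kNN."""
    bz = (z > 0.5).astype(int)        # Eq. (15)
    if bz.sum() == 0:                 # guard: an empty subset is invalid
        return 1.0, bz
    cols = bz.astype(bool)
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train[:, cols], y_train)
    error = 1.0 - knn.score(X_val[:, cols], y_val)            # eta_i
    return lam * error + (1 - lam) * bz.sum() / bz.size, bz   # Eq. (16)

# Worked example from the text: five features, lambda drawn uniformly
z = np.array([0.1, 0.6, 0.7, 0.9, 0.2])
print((z > 0.5).astype(int))   # -> [0 1 1 1 0]
```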

The next process is to determine the best solutions and update the solutions using the operators of MPA or SCA. This is achieved by computing the probability \(Pr_i\) of the current solution \(Z_i\) in the exploitation stage as:

$$\begin{aligned} Pr_i=\frac{Fit_i}{\sum _{i=1}^NFit_i} \end{aligned}$$
(17)

Then \(Z_i\) is updated using Eq. (18).

$$\begin{aligned} Z_i=\left\{ \begin{array}{ll} operators\, of\, MPA & \quad Pr_i \ge rs \\ operators\, of\, SCA & \quad otherwise \\ \end{array}\right. \end{aligned}$$
(18)

where

$$\begin{aligned} rs=\min (Pr_i)+rand\times (\max (Pr_i)-\min (Pr_i)), \, rand\in [0,1] \end{aligned}$$
(19)

In Eq. (18), the switching threshold rs is updated dynamically based on \(Pr_i\), which provides MPASCA with a suitable tool to avoid the problems of a fixed threshold. This enhances the solutions and removes the time needed to tune the threshold manually.
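The switching rule of Eqs. (17)-(19) reduces, per solution, to the comparison below; this is a sketch of our reading, with fit holding the current fitness values of the population:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_operators(fit):
    """Eqs. (17)-(19): decide, per solution, whether MPA or SCA updates it."""
    pr = fit / fit.sum()                                   # Eq. (17)
    rs = pr.min() + rng.random() * (pr.max() - pr.min())   # Eq. (19)
    return np.where(pr >= rs, "MPA", "SCA")                # Eq. (18)
```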

4.3 Third step: stopping conditions

In this step, the termination conditions are checked; if they are not satisfied, the previous step (i.e., the updating process) is executed again. Otherwise, the testing set is reduced to the features of the best solution that correspond to ones, the other features are removed, and the reduced testing set is then passed to the kNN classifier to predict its targets and compute the performance.

Fig. 1 The structure of the MPASCA FS method

4.4 Computational complexity

In this section, the computational complexity of the proposed method is presented. This calculation shows the time required by the proposed method to solve a given problem, as follows.

The time complexity of the proposed MPASCA is calculated from three main parts: initializing the first solutions, determining the fitness values, and updating the solutions. Let N be the number of solutions; the computational complexity of the initialization is O(N). The computational complexity of the updating process, which involves searching for the best solutions and renewing them, is O(T \(\times \) N) + O(T \(\times \) N \(\times \) Dim), where T is the maximum number of iterations and Dim is the dimension of the given problem; this holds for both of the used search methods, MPA and SCA. Thus, the total computational complexity of the proposed MPASCA is O(N \(\times \) (T \(\times \) Dim \(+\) 1)).

5 Results and discussion

5.1 Data description

To validate the developed MPASCA, 18 UCI datasets [63] are used. The descriptions of these datasets are presented in Table 1, which shows that the datasets are collected from different fields and have different numbers of features and instances.

Table 1 Datasets description

5.2 Performance measures

This section presents six performance measures used to evaluate the performance of the proposed method: the average accuracy, the standard deviation (Std), the average fitness value, the minimum fitness value (Min), the maximum fitness value (Max), and the number of selected features. To calculate the averages of these measures, each algorithm is run 30 independent times.

Table 2 Results of the fitness means of the proposed method and the compared algorithms
Table 3 Results of the STD of the fitness value of the proposed method and the compared algorithms
Table 4 Results of the Min measure of the proposed method and the compared algorithms
Table 5 Results of the Max measure of the proposed method and the compared algorithms
Table 6 Results of the selected features of the proposed method and the compared algorithms
Table 7 Accuracy measure results of the proposed method and the compared algorithms
  • Accuracy (Acc): This measure calculates the ratio of correctly classified samples. It is calculated as in Eq. (20).

    $$\begin{aligned} {\mathrm{Acc}} = \frac{{{\mathrm{TP}} + {\mathrm{TN}}}}{{{\mathrm{TP}} + {\mathrm{FN}} + {\mathrm{FP}} + {\mathrm{TN}}}} \end{aligned}$$
    (20)

    where TN, TP, FN, and FP denote the true negative, the true positive, the false negative, and the false positive, respectively.

  • Fitness value (FV): This measure evaluates the performance of the methods based on the used fitness function as in Eq. (16).

  • Maximum of the fitness value (Max): This measure records the maximum value obtained by the fitness function for each algorithm.

    $$\begin{aligned} {\mathrm{Max}} = \max _{1\le i\le N} Fit_{b}^i \end{aligned}$$
    (21)
  • Minimum of the fitness value (Min): This measure records the minimum value obtained by the fitness function for each algorithm.

    $$\begin{aligned} {\mathrm{Min}} = \min _{1\le i\le N} Fit_{b}^i \end{aligned}$$
    (22)
  • Selected features: This measure records the number of selected features obtained by each algorithm.

  • Standard deviation (Std): This measure evaluates the stability of an algorithm over different runs. It can be calculated as in Eq. (23).

    $$\begin{aligned} {\mathrm{Std}} = \sqrt{\frac{1}{{{N}}}\sum \limits _{i = 1}^{{N}} {{{( Y_{i} - {Y_{\mathrm{avg}}})}^2}}} \end{aligned}$$
    (23)

    where N is the number of runs, \(Y_{i}\) is the fitness value obtained at the ith run, \(Y_{\mathrm{avg}}\) is the mean of all values for the given algorithm, and \(Fit_{b}^i\) indicates the best FV at the ith run.
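For completeness, the measures above can be aggregated from the per-run results as in this short sketch (a hypothetical helper, using the population form of the standard deviation as in Eq. (23)):

```python
import numpy as np

def run_statistics(best_fitness_per_run, accuracy_per_run, n_selected_per_run):
    """Aggregate 30 independent runs into the reported measures."""
    f = np.asarray(best_fitness_per_run, dtype=float)
    return {
        "Acc": float(np.mean(accuracy_per_run)),        # average of Eq. (20)
        "FV":  f.mean(),                                # average fitness
        "Max": f.max(),                                 # Eq. (21)
        "Min": f.min(),                                 # Eq. (22)
        "Std": f.std(),                                 # Eq. (23)
        "Features": float(np.mean(n_selected_per_run)),
    }
```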

5.3 Computational results

In this section, the MPASCA is evaluated using 18 datasets and compared to eight optimization algorithms, namely MPA, Henry gas solubility optimization (HGSO) [64], Harris hawks optimization (HHO) [65], WOA [66], GWO [67], the genetic algorithm (GA) [68], SSA [24], and SCA. The comparison is based on the average fitness function value, STD, Min, Max, and the number of selected features, besides the accuracy of the classification phase.

The parameters of these algorithms are set to the values recommended in their original studies. In addition, the common parameters are set to \(N=15\) and \(t_{\max }=50\). Each method is run 30 independent times for a fair comparison, and each dataset is split into 80% for training, with the rest used as the testing set.

The results are tabulated in Tables 2, 3, 4, 5, 7 and in Figs. 2, 3, 4. According to the average fitness function values, all algorithms tried to minimize the fitness function; in this measure, the MPASCA obtained the lowest fitness values in 33% of all datasets, especially in D8, D9, D10, and D12. The SCA was ranked second, followed by MPA, each obtaining the lowest values in 22% of all datasets. The HHO and HGSO showed broadly similar behavior, whereas the GWO and GA produced the worst results. In addition, the proposed MPASCA outperformed the other methods in 50% of the high-dimensional datasets; it obtained the first rank in both DS19 and DS21, followed by HHO in DS19 and SSA in DS21, and showed competitive results in the remaining datasets. The worst results were produced by the WOA algorithm.

In terms of the STD measure, the MPASCA was the most stable algorithm; it recorded the lowest STD values in four datasets as well as the lowest average STD over all datasets. The second and third ranks were obtained by MPA and SCA, respectively. Although the HHO did not achieve the lowest STD in any dataset, it was ranked fourth based on the average STD over all datasets, followed by HGSO, SSA, and GA. The largest STD values were obtained by WOA, which therefore came in the last rank. The same pattern was also observed in the high-dimensional datasets.

The Min measure is used to check the behavior of an algorithm in reaching the smallest fitness values in each dataset. In this measure, the MPASCA achieved the smallest and best values in 8 out of 22 datasets, followed by SCA and HGSO, which each obtained the best values in 3 datasets, whereas the performance of the MPA and HHO was broadly similar. The GA did not obtain the best value in any dataset.

Based on the Max measure, the worst algorithm was GA, which obtained the worst values in 10 out of 22 datasets, followed by GWO with the worst values in 6 datasets, whereas the WOA obtained the worst fitness values in two datasets. The MPASCA showed good performance in this measure and was ranked first.

In feature selection problems, the goal is to find the smallest number of features while preserving their quality; therefore, the numbers of selected features are considered in this experiment, where the smallest number is the best. As shown in Table 6, the SCA yielded the smallest feature numbers in 8 out of 22 datasets, followed by the MPASCA with the smallest feature numbers in 7 datasets, whereas the WOA, MPA, and HHO showed broadly similar selection ratios.

The features selected by all algorithms are evaluated using the kNN classifier to test their quality, since the smallest feature set is not always the best; in some cases it contains low-quality features. Therefore, these features should be evaluated by a classifier.

In this part, the accuracy is used as the output of this phase. Table 7 shows the accuracy of each algorithm over all datasets. The proposed MPASCA obtained the highest accuracy in 78% of all datasets. The SSA came in the second rank, achieving the best accuracy in 27% of all datasets, followed by HHO with the highest accuracy in 23%. The SCA and HGSO showed broadly similar results and were ranked third and fourth, respectively. The MPA showed performance similar to GA in most datasets, whereas the WOA obtained the last rank and produced the lowest accuracy in all datasets.

In general, the MPASCA showed a good ability to select the most significant features in the selection phase and achieved the highest accuracy in the classification phase.

5.4 Statistical results

In this section, the Friedman test is used to rank the methods used in the previous experiments. Table 8 contains the Friedman results for the fitness value (FV), Min, and Max measures. From the table, we can conclude that the MPASCA algorithm obtained the best rank among all methods in all measures, followed by SCA, MPA, HHO, and HGSO, respectively, whereas GA came in the last rank. In addition, Figs. 5, 6, 7 illustrate the post hoc test; the algorithms appear in the figures in this order: MPASCA, MPA, HHO, HGSO, WOA, GWO, GA, SSA. There are statistically significant differences between the proposed method and 50% of the compared algorithms in all measures.

Fig. 2 Averages of the fitness function values and the STD measure in the experiment

Fig. 3 Averages of the Min and Max measures in the experiment

Fig. 4 Averages of the accuracy measure and the number of selected features in the experiment

5.5 Real application

In this section, we test the applicability of the proposed MPASCA on real metabolomics datasets. Metabolomics studies play a major role in the early diagnosis of diseases; metabolomics can be defined as a branch of science that analyzes the metabolites in the tissues of organisms in order to observe physiological functions and the responses of the organism to different stimuli, such as infection, illness, or drug usage. Accordingly, feature selection is an essential task in bioinformatics: identifying the genomic biomarkers among the thousands or even millions of genes tested can help to define the exact mechanism behind a specific disease or can contribute to the drug design process [69, 70]. The three metabolomics datasets considered in this study are described below:

  1. TBI dataset: In this traumatic brain injury (TBI) dataset, the features refer to the serum metabolic profiles of patients with and without cognitive impairment (CI). In general, there are 31 and 73 TBI patients with and without CI, respectively. The dataset contains 42 metabolites and 104 samples, as given in Table 9.

  2. CHD dataset: The coronary heart disease (CHD) dataset [71, 72] contains 88 samples with 78 metabolites measured in patients and healthy controls using UPLC-HRMS (ultra-performance liquid chromatography–high-resolution mass spectrometry). The samples were collected from patients of the Hospital of Yunnan Province, China; they comprise 21 patients with type 2 diabetes mellitus, 16 patients without CHD, and 51 healthy cases without blood relationships, so there are three classes in the CHD dataset.

  3. ATR dataset: This dataset refers to Acori Tatarinowii Rhizoma (ATR) samples collected from two provinces in China (Anhui and Sichuan): 21 samples from Sichuan and eight from Anhui. The ATR dataset contains 104 features (i.e., volatile compounds) and 29 samples. For more details, see [70].

Table 8 Results of Friedman test
Fig. 5 Post hoc test for the fitness function measure

Fig. 6 Post hoc test for the Min measure

Fig. 7 Post hoc test for the Max measure

Table 9 Description of the used datasets

5.5.1 Results and discussion

Table 10 records the classification accuracy results (both the best accuracy (BestAcc) and the mean accuracy (MeanAcc) are presented) as well as the number of features selected by each algorithm. The MeanAcc was calculated over 30 runs.

In the TBI dataset, the BestAcc shows that the performance of the MPASCA was similar to that of the MPA, HGSO, WOA, GWO, and SSA, as all these algorithms reached 100% classification accuracy. The MeanAcc shows that the MPASCA, SCA, and WOA classified all samples correctly across runs, followed by HGSO. Regarding the feature selection results, the MPASCA was ranked third after HGSO and WOA, respectively, with small differences.

Table 10 Results of the real application experiment

In the CHD dataset, the MPASCA achieved the first rank in the BestAcc and MeanAcc measures, obtaining 0.90 and 0.86, followed by SCA with 0.89 and 0.83, respectively. The MPA was ranked third, whereas the last rank was obtained by the HGSO. In the feature selection measure, the MPASCA came second after the HGSO, but the quality of the features obtained by MPASCA was better than that of HGSO.

In the ATR dataset, all algorithms performed equally in the BestAcc measure, classifying all samples correctly. In the MeanAcc measure, the MPASCA performed similarly to the HHO, HGSO, SCA, GWO, and SSA, while the worst result was recorded by the GA algorithm. In the feature selection measure, the MPASCA and WOA selected the fewest features, followed by the MPA, HGSO, and SCA, respectively.

In general, the MPASCA showed good performance across all datasets and measures and achieved the best classification results, especially in the CHD dataset.

From all the previous results and discussion, it can be seen that the MPASCA as a feature selection method provides better accuracy than the other FS techniques compared. This was achieved by using the operators of SCA during the exploitation phase of MPA, which makes the solutions more competitive in this phase and allows them to find the feasible region that contains the optimal solution. However, MPASCA has some limitations that influence the quality of solutions; for instance, its exploitation ability still needs to be improved, which could be achieved by using a local search method.

6 Conclusion

In this study, a modified version of the marine predators algorithm (MPA) is proposed as a feature selection (FS) method. In general, MPA has established its ability to solve global optimization and engineering problems. However, by analyzing its behavior during the optimization process, it has been found that its exploitation is weaker than its exploration, which can degrade its performance when applied to real-world applications such as FS. Therefore, the operators of the sine–cosine algorithm (SCA) have been combined with MPA in the exploitation phase to enhance its ability to escape local optima. To validate the performance of the proposed MPASCA, a set of experiments was conducted using 18 datasets, and the method was compared to eight FS algorithms. The experimental results proved the superiority of the developed MPASCA over all other algorithms in terms of classification performance. Furthermore, we evaluated the MPASCA on a real-world application using metabolomics datasets, where it also showed better performance than the compared algorithms.

In future works, the developed MPASCA model can be extended to multi-objective feature selection and image segmentation. In addition, it can be applied to IoT as a task-scheduling technique.