
1 Introduction

Rapid developments in information science have resulted in a dramatic increase in dataset dimensionality over the past decade. Effective dimension reduction algorithms are needed to remove redundant or irrelevant information from these datasets, since such features can degrade the performance of learning algorithms [22].

Typically considered a preprocessing mechanism, feature selection is used to decrease the total number of input variables and to find the most relevant subset of the complete feature set. Feature selection reduces the dimensionality of data by removing noisy and irrelevant attributes. This challenge is especially important when real-time classification is needed: by finding an optimal or near-optimal subset of features, the training process can be shortened and classification accuracy can be improved. It is applied to increase the precision of the predictions produced by a machine learning model by reducing complexity and diminishing redundant and irrelevant features in the dataset, which can be crucial in critical applications such as medical diagnostics [10]. Feature subset evaluation and search strategy are the two primary stages of this preprocessing. The search strategy employs techniques for selecting candidate feature subsets, while feature subset evaluation utilizes a classifier to assess the quality of the selected subset. According to the reviewed literature, all feature selection methods are defined as either filter based or wrapper based.
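As a concrete illustration of the wrapper-based evaluation stage, the following minimal Python sketch scores a candidate feature subset (a binary mask) by training a classifier on the selected columns only; the dataset and classifier are illustrative choices, not those used in this study.

```python
# Minimal sketch of wrapper-style feature subset evaluation: a candidate
# subset (boolean mask) is scored by a classifier trained on the selected
# columns. Dataset and classifier are illustrative, not the paper's setup.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

def evaluate_subset(mask):
    """Return mean cross-validated accuracy of KNN on the masked features."""
    if not mask.any():                      # an empty subset is invalid
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(42)
mask = rng.random(X.shape[1]) < 0.5         # a random candidate subset
print(f"{mask.sum()} features -> accuracy {evaluate_subset(mask):.3f}")
```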

Metaheuristic algorithms are considered among the most reliable and efficient optimization techniques and show great results when applied to more challenging problems and higher-dimensional datasets. As a result, these algorithms show great promise and have been applied to many real-world problems that require optimization and performance improvements [3, 4, 25, 32, 34, 36]. Although these algorithms are often nature inspired, this is not necessarily always the case, as shown by the sine cosine algorithm (SCA) [20].

Because of the high accuracy achieved, as well as the reduced computational times when compared to traditional discrete methods, the metaheuristic approach has been employed by researchers in wrapper-based methods for solving the feature selection problem. The application of a Gaussian mutational chaotic fruit fly optimization algorithm [31] has been suggested for tackling feature selection, specifically in classification tasks. An augmented model of the dragonfly algorithm (DA), the hyper-learning binary dragonfly algorithm (HLBDA), has been implemented for feature evaluation and applied to coronavirus disease (COVID-19) datasets [28].

SCA is a population-based algorithm, named after its use of the sine and cosine functions in its formulation, originally intended for solving optimization problems [20]. The algorithm initially creates a collection of randomized solutions and then, employing a mathematical model built from the sine and cosine functions, requires them to fluctuate toward the best solution during the exploitation phase or outwards to encourage exploration.

Some deficiencies were observed in the original SCA while performing practical empirical simulations with standard unconstrained benchmarks. Because of this, we have attempted to improve the basic SCA by hybridizing it with the well-known artificial bee colony (ABC) algorithm. The resulting mSCA is benchmarked using ten datasets from the University of California, Irvine (UCI) repository and Arizona State University, as well as a single coronavirus disease (COVID-19) dataset.

The main contributions of the conducted research can be outlined as follows:

  • Proposal of an mSCA applied to the feature selection problem: elements of the ABC algorithm are integrated into the SCA to improve its exploratory behavior.

  • Testing the mSCA on ten standard benchmark datasets, with low-, medium-, and high-dimensional sets represented.

  • Comparing the mSCA to other advanced feature selection algorithms and demonstrating the improvements made.

  • Applying the proposed mSCA to a COVID-19 case study.

The remainder of this article is organized as follows: Sect. 2 gives a summary of the reviewed literature. Section 3 describes the original SCA and the proposed modification. Section 4 presents the experimental results and discusses the findings. Section 5 summarizes the findings and proposes directions for further work in this field.

2 Literature Review

When large datasets are too difficult to classify, the use of swarm intelligence-based algorithms is suggested. Every large dataset contains insignificant and irrelevant features, which can make data analysis and interpretation difficult. The purpose of a swarm intelligence algorithm here is to reduce dimensionality (feature selection) by keeping only useful features and those containing rich information. As a result of using a dimensionality reduction technique, we obtain a better understanding and interpretation of the data, as well as higher accuracy of the results. There are two main steps in the dimensionality reduction process: extracting features and selecting features. Before explaining those steps further, we give a short overview of swarm intelligence algorithms.

Swarm intelligence algorithms are part of the artificial intelligence (AI) field and belong to the so-called nature-inspired metaheuristics [29]. Many groups of animals form a collective intelligence in which every member acts independently while mutually exchanging information; that information eventually guides the group toward the optimal solution of its problem. Such animal colonies include ants, birds, hawks, fish, and more [16, 21, 29]. Nature-inspired metaheuristics are not guaranteed to find the global optimum inside the search area, but they are efficient at finding good candidate solutions, and they are especially effective inside very large search spaces. The problems they are typically applied to are NP-hard, for which finding the exact optimum takes an unreasonable amount of time [15]. Many diverse problems can be solved with swarm intelligence algorithms, such as wireless sensor network optimization [4, 32], cloud computing [6, 8, 35], optimization of neural networks [2, 5, 12, 24], machine learning, and COVID-19 prediction [33], all the way to solving complicated problems in the field of medicine [7].

Feature extraction is used to prepare raw and unprocessed data [17]. A new dataset is formed by keeping some of the core features, after which new features can be derived. The result is a dataset that is cleaner, contains only features relevant to the specific problem, and has fewer dimensions than the original dataset.

Since the most relevant and important data are obtained after feature extraction, the next step is feature selection. With feature selection, we select attributes previously defined in the original dataset. This step is extremely important, since the right combination of attributes can improve the model's performance and accuracy. A common example of feature selection, alongside feature extraction, is image processing and analysis: a large number of statistical features can be retrieved from an image, but a combination of only a few gives satisfactory results.

A side effect of feature selection is the possible loss of a certain amount of information, but the resulting simplicity of the model and the significant performance improvement make it well worth it. There are three distinct categories of feature selection techniques: wrapper, filter, and embedded [9].

Filter techniques choose the features expected to contain the most information, without taking into consideration any relationships between the features. Wrapper techniques search through feature combinations and choose the subset on which our machine learning model is most accurate. With embedded techniques, the features are chosen while the model is being constructed [9]. These techniques achieve decent performance on relatively small datasets, but their performance declines on larger datasets, where a different method, such as a swarm intelligence algorithm, should be used; such an algorithm provides satisfactory results on large datasets in a reasonable amount of computational time.
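For illustration, the sketch below contrasts the first two categories in scikit-learn: a univariate filter that scores each feature independently, and a wrapper that repeatedly refits a model to rank feature combinations. The dataset, model, and number of retained features are placeholder choices.

```python
# Filter vs. wrapper selection (illustrative only): the filter scores
# features one at a time, the wrapper (RFE) refits a model repeatedly.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: keep the 10 features with the highest ANOVA F-score
filt = SelectKBest(score_func=f_classif, k=10).fit(X, y)

# Wrapper: recursive feature elimination around a linear model
wrap = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)

print("filter picks :", filt.get_support().nonzero()[0])
print("wrapper picks:", wrap.get_support().nonzero()[0])
```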

3 Original and Proposed Modified Sine Cosine Algorithm

The SCA, first introduced by Seyedali Mirjalili [20] and originally designed for solving optimization problems, is a relatively new population-based algorithm. The algorithm stochastically searches for the optimal solution to a given problem. It starts with a randomized set of solutions, then repeatedly evaluates this set against an objective function and follows a ruleset that forms the core of the optimization technique. As such, finding the optimal solution in the first iteration is not guaranteed; however, given enough iterations and a large enough collection of randomized solutions, the probability of finding the global optimum increases.

The process of optimization in the stochastic population-based approach, regardless of the algorithm being applied, can be split into two distinct phases: exploration and exploitation. In the exploration phase, the algorithm quickly, and in a very random manner, combines solutions from a given random set, scanning the search space for the most favorable regions. In the exploitation phase, changes are made gradually and are noticeably less severe than those in the exploration phase.

The original SCA proposes the following equations for position updating in both phases, given as Eq. (1):

$$\begin{aligned} \begin{aligned} X_i^{t+1} = X_i^t + r_1 \times \sin (r_2) \times |r_3P_i^t - X_i^t | \\ X_i^{t+1} = X_i^t + r_1 \times \cos (r_2) \times |r_3P_i^t - X_i^t | \end{aligned} \end{aligned}$$
(1)

where \(X_i^t\) represents the current solution's position in the i-th dimension at the t-th iteration, \(P_i^t\) represents the position of the destination point in the i-th dimension, \(r_1\), \(r_2\), and \(r_3\) are random numbers, and \(|\cdot |\) indicates an absolute value.

In Eq. (2), a combination of these two Eq. (1) can be seen:

$$\begin{aligned} X_i^{t+1} = {\left\{ \begin{array}{ll} X_i^t + r_1 \times \sin (r_2) \times |r_3P_i^t - X_i^t |, &{} r_4 < 0.5\\ X_i^t + r_1 \times \cos (r_2) \times |r_3P_i^t - X_i^t |, &{} r_4 \ge 0.5 \end{array}\right. } \end{aligned}$$
(2)

where \(r_4\) represents a random value in [0,1].

The four major parameters of the SCA are \(r_1\), \(r_2\), \(r_3\), and \(r_4\), as shown in the equations above. Parameter \(r_1\) determines the region of the next position, which lies either in the space between the solution and the destination or outside of it. Parameter \(r_2\) dictates how far the movement toward or away from the destination is. The role of parameter \(r_3\) is to stochastically diminish (\(r_3 < 1\)) or emphasize (\(r_3 > 1\)) the effect of the destination in defining the distance. Lastly, parameter \(r_4\) switches between the sine and cosine components in Eq. (2).
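A minimal NumPy sketch of this update rule follows; the sampling ranges for \(r_2\) and \(r_3\) (\([0, 2\pi ]\) and \([0, 2]\), respectively) follow the original SCA paper, while the vectorized form is an implementation choice.

```python
# One SCA position update per Eq. (2) for the whole population.
import numpy as np

def sca_update(X, P, r1, rng):
    """X: (n_solutions, dim) positions; P: (dim,) destination point."""
    r2 = rng.uniform(0.0, 2.0 * np.pi, size=X.shape)   # movement distance
    r3 = rng.uniform(0.0, 2.0, size=X.shape)           # destination weighting
    r4 = rng.random(X.shape)                           # sine/cosine switch
    step = np.abs(r3 * P - X)                          # |r3 * P_i - X_i|
    return X + r1 * np.where(r4 < 0.5, np.sin(r2), np.cos(r2)) * step
```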

The effects of the sine and cosine functions in Eqs. (1) and (2) are depicted in Fig. 1. As shown in the figure, these two equations dictate the search space between two solutions. The equations can also be extended to higher dimensions; however, Fig. 1 depicts the two-dimensional case.

Fig. 1 Sine and cosine effects on the upcoming position from Eq. (1)

The cyclic pattern of the sine and cosine functions allows a solution to be repositioned around another solution, which guarantees exploitation of the space defined between the two solutions. Altering the range of the sine and cosine functions enables solutions to search outside the space defined by their corresponding destinations, which ensures exploration.

Fig. 2 Sine and cosine with the range in \([-2,2]\) allow a solution to go around (inside the space between them) or beyond (outside the space between them) the destination

When changing the function range, as shown in Fig. 2, the new position of a solution must be updated taking into account the positions of the existing solutions. The updated position, which can fall either inside or outside the space between the solution and the destination, is attained by choosing a random value in the range \([0,2\pi ]\) for \(r_2\) in Eq. (2). This mechanism ensures both exploitation and exploration of the search space.

The algorithm needs to balance exploration and exploitation when searching for promising regions inside a given search space in order to eventually converge to the global optimum. The SCA does this by changing the range of the sine and cosine in Eq. (2) adaptively, according to Eq. (3):

$$\begin{aligned} r_1=a-t{\frac{a}{T}} \end{aligned}$$
(3)

where a represents a constant, T represents the maximum number of allowed iterations, and t represents the current iteration.

Through many iterations of Eq. (2), with \(r_1\) decaying according to Eq. (3), the range of the sine and cosine decreases, as shown in Fig. 3.

Fig. 3 Decreasing range of sine and cosine (\(a=3\))

By taking into consideration both Figs. 2 and 3, it can be deduced that the SCA focuses on exploitation when the given ranges are in \([-1,1]\), and on exploration when the ranges are in (1, 2] and \([-2,-1)\).

Fig. 4 General steps of the original SCA algorithm

Finally, the pseudocode of the SCA is shown in Fig. 4. As depicted, the algorithm begins the optimization process with a randomized set of solutions. Whenever the algorithm encounters a solution that is the best obtained so far, it assigns it as the destination point. The algorithm then updates the other solutions with regard to this best solution. During this process, the iteration counter is increased, and the ranges of the sine and cosine functions are updated after every iteration to emphasize exploitation of the search space. When the counter reaches the maximum allowed number of iterations, the optimization process of the original SCA stops. Other termination conditions can be implemented as well, such as a total number of function evaluations or reaching a desired accuracy of the global optimum.
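This loop can be sketched compactly in Python as follows, reusing the sca_update helper from the earlier snippet; the sphere objective, bounds, and parameter values are placeholders rather than the experimental setup of this paper.

```python
# Compact sketch of the SCA loop in Fig. 4, combining the Eq. (2) update
# (via sca_update, defined above) with the Eq. (3) decay of r1.
import numpy as np

def sca(objective, lb, ub, dim, n=30, T=500, a=2.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n, dim))          # random initial solutions
    fit = np.apply_along_axis(objective, 1, X)
    best, best_fit = X[fit.argmin()].copy(), fit.min()
    for t in range(T):
        r1 = a - t * (a / T)                        # Eq. (3)
        X = np.clip(sca_update(X, best, r1, rng), lb, ub)
        fit = np.apply_along_axis(objective, 1, X)
        if fit.min() < best_fit:                    # new destination point
            best_fit, best = fit.min(), X[fit.argmin()].copy()
    return best, best_fit

best, val = sca(lambda x: float(np.sum(x ** 2)), lb=-10.0, ub=10.0, dim=20)
print(f"best sphere value: {val:.2e}")
```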

3.1 Proposed Modified SCA Approach

Notwithstanding the fact that the basic SCA metaheuristics establish excellent results on standard benchmark instances [20], additional experiments conducted with basic Congress on Evolutionary Computation (CEC) benchmark suites led to the conclusion that the basic SCA can be further improved.

Like many other swarm intelligence approaches, the original SCA may get stuck in non-optimal regions of the search domain in early iterations of its execution. In this early phase, due to a lack of exploration power, if the search process is not "lucky" and does not register the optimal domain of the search space, the algorithm may remain stuck in a sub-optimal domain for many iterations. As a consequence, worse mean values are generated, and the performance of the metaheuristic is seriously degraded.

Without adding complexity to the algorithm, the abovementioned drawback of the original SCA can be overcome by introducing a simple mechanism into the search process: after every iteration during the first 50% of iterations, the 5% worst solutions in the population are replaced with randomly generated individuals within the boundaries of the search space:

$$\begin{aligned} X_{\text {rnd}}^{j} = L^{j} + \phi \cdot (U^{j}-L^{j}), \end{aligned}$$
(4)

where \(X_{\text {rnd}}^{j}\) is the j-th component of the newly generated random solution, \(\phi \) is a value drawn from the uniform distribution, and \(U^{j}\) and \(L^{j}\) are the upper and lower boundaries of the j-th parameter, respectively.

Based on the conducted simulations, it was concluded that the described exploration mechanism should be triggered during approximately the first 50% of iterations. In later iterations, this mechanism is not needed and would only obstruct a fine-tuned search around the promising domain of the search region. The proposed method is named modified SCA (mSCA), and its pseudocode is shown in Algorithm 1.

Algorithm 1 Pseudocode of the proposed mSCA
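A minimal sketch of the replacement mechanism is given below; the 5% fraction and the first-half trigger follow the text, while the exact integration point is an assumption. In the sca() loop sketched earlier, the call would sit immediately after each fitness evaluation.

```python
# Sketch of the mSCA modification only: during the first T/2 iterations,
# the worst 5% of solutions are re-seeded with random points per Eq. (4).
import numpy as np

def replace_worst(X, fit, lb, ub, t, T, rng, frac=0.05):
    """Replace the worst `frac` of the population (minimization assumed)."""
    if t >= T // 2:                        # mechanism disabled in later iterations
        return X
    k = max(1, int(frac * X.shape[0]))     # 5% of the population, at least one
    worst = np.argsort(fit)[-k:]           # indices of the k worst solutions
    phi = rng.random((k, X.shape[1]))      # uniform phi in [0, 1]
    X[worst] = lb + phi * (ub - lb)        # Eq. (4): L + phi * (U - L)
    return X
```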

4 Experiments and Discussion

In the research presented in this manuscript, the proposed mSCA algorithm was tested on ten basic datasets and one additional COVID-19 dataset. The experimental simulations were executed over 20 independent runs, with each run consisting of 70 iterations. The population size was set to 8, and a mixed initializer was utilized that randomly selects 2/3 of the available features. The performance of the suggested improved optimization method has been tested on ten UCI datasets, very popular among researchers as benchmarks, which are listed in Table 1.
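The mixed initialization scheme is not spelled out in the paper; one plausible reading, sketched below under that assumption, has each of the eight binary solutions activate a randomly chosen two-thirds of the features.

```python
# Hypothetical reading of the mixed initializer: each binary solution
# activates a random 2/3 of the available features.
import numpy as np

def init_population(n_solutions=8, n_features=30, ratio=2 / 3, seed=0):
    rng = np.random.default_rng(seed)
    pop = np.zeros((n_solutions, n_features), dtype=bool)
    k = int(round(ratio * n_features))           # features per solution
    for i in range(n_solutions):
        idx = rng.choice(n_features, size=k, replace=False)
        pop[i, idx] = True
    return pop

print(init_population().sum(axis=1))             # ~20 selected features each
```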

The performance of mSCA was evaluated on a computer with a central processing unit (CPU) with a clock frequency of 2.90 GHz and 16 GB of random access memory (RAM), with the method implemented in Python within the Anaconda framework using machine learning libraries including NumPy, SciPy, and scikit-learn. The performance is judged on five evaluation metrics: best fitness value, average fitness value, standard deviation of the fitness value, classification accuracy, and the feature selection ratio, with each method executed and evaluated 20 times. The repetition is performed to better represent the results and to avoid bias caused by the stochastic nature of optimization algorithms. The averaged results are logged and presented after the last iteration of the 20 individual runs.
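For concreteness, the sketch below shows a wrapper fitness of the form commonly used in the feature selection literature, combining classification error with the selected-feature ratio; the weighting \(\alpha \) and the KNN classifier are assumptions rather than details taken from the paper.

```python
# A common wrapper fitness: alpha * error + (1 - alpha) * |S| / |F|.
# The alpha = 0.99 weighting and KNN classifier are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, alpha=0.99):
    """Lower is better; combines error rate with the selected-feature ratio."""
    if not mask.any():                             # an empty subset is invalid
        return 1.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size
```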

The mSCA is tested against ten standard datasets and the COVID-19 dataset, and its performance is then evaluated. The datasets are acquired from the UCI repository [11] and Arizona State University [18]. Table 2 presents the best overall fitness, while Table 3 presents the mean fitness metric. Table 4 reports the standard deviation and Table 5 the percentage of selected features for the referenced ten datasets, while the average classification accuracy is shown in Fig. 5. The best results are marked in bold in each table, except in the case of a tie, where none of the results are marked. Tests of the proposed mSCA have been conducted on datasets of different structures, so as to provide evidence of the algorithm's efficiency and performance in differing dimensions.

Table 1 List of experimental simulation datasets
Table 2 Best fitness metric over ten UCI datasets for the compared approaches
Table 3 Statistical mean fitness metric over ten datasets for the compared approaches
Table 4 Standard deviation results for ten datasets included in the comparative analysis
Table 5 Percentage of selected feature for ten datasets included in comparative analysis

The results obtained in Tables 2, 3, 4 and 5 from the conducted experiments prove the efficiency and efficacy of the proposed mSCA algorithm. Based on the empirical analysis, it can be deduced that the proposed mSCA yields higher-quality results than the algorithms it was tested against. The eight competitor algorithms included in this paper are the binary dragonfly algorithm (BDA) [19], binary artificial bee colony (BABC) [14], binary multiverse optimizer (BMVO) [1], binary particle swarm optimization (BPSO) [30], chaotic crow search algorithm (CCSA) [23], binary coyote optimization algorithm (BCOA) [27], evolution strategy with covariance matrix adaptation (CMAES) [13], and success history-based adaptive differential evolution with linear population size reduction (LSHADE) [26].

Fig. 5 Average classification accuracy over ten datasets included in the comparative analysis

Fig. 6 Accuracy and feature size of the proposed SCA and mSCA on the COVID-19 dataset

Based on the presented results, it can be concluded that the proposed mSCA metaheuristics clearly outperformed the original SCA approach for all observed metrics. In general, when compared to the other approaches included in the simulations, mSCA obtained the best performance. Based on the results from Table 2, the proposed mSCA approach obtained the best results for the best fitness metric on five out of the ten UCI datasets. When the statistical mean fitness metric is observed, from Table 3, it can be concluded that the mSCA obtained the best results on six out of ten UCI datasets. In the case of the standard deviation, Table 4 shows that the mSCA obtained the best results on four datasets and tied for the best result on the Glass dataset. In Table 5, a comparative analysis between the proposed mSCA and the other approaches in terms of selected features (expressed as ratios of the total number of features in the datasets) is presented. From the results, it can be seen that the proposed mSCA on average utilizes a smaller number of features than the other methods, meaning that it managed to substantially reduce the problem dimensions, which makes the training process of a classifier much faster (Figs. 5 and 6).

5 Conclusion

The research presented in this manuscript proposes a novel feature selection method. The implemented mSCA metaheuristics address the drawbacks of the original SCA method observed in the results of the conducted experiments. The proposed mSCA approach was then used to help find the crucial features for the classification process. The presented optimization method was validated on ten benchmark datasets, and the results are presented in comparison with other swarm intelligence metaheuristics. Finally, the mSCA method was applied to the COVID-19 dataset. The experimental results indicate that the mSCA approach outperformed the other methods included in the comparative analysis. Based on the defined research contributions, the novelty of the proposed research can be summarized as follows: a more efficient SCA metaheuristic was devised; the feature selection challenge was addressed with improved classification accuracy and a reduced number of employed features; and classification of the recent and important COVID-19 dataset was performed.

Future research in this area will focus on including additional datasets in the experimental simulations. Future work will also deal with the adaptation of other swarm intelligence metaheuristics, with the goal of further enhancing the classification accuracy.