1 Introduction

Over the last decade, smartphones have become one of the most widely used devices worldwide, with nearly 3.6 billion users in 2020 according to a Statista report [1]. This growth is due to the outstanding functionality and features that smartphones possess [2]. Smartphones can be used for various tasks, such as sending emails, gaming, taking pictures, recording video, searching for information, GPS navigation, and so on. This is made possible by the applications (apps) created and upgraded on a daily basis, particularly on the Android operating system (OS).

Android OS was designed in 2007 as a modified version of the Linux kernel for touchscreen mobile devices. Android held about 70% of the worldwide market share compared to other operating systems in 2021 [3]. Furthermore, the number of apps on Android reached nearly 2.7 million last year [4]. These Android apps span diverse categories, including banking, social media, medical, educational, and entertainment apps [5]. Most of these apps are used for the benefit of their users. However, some of them are built for malicious purposes such as hacking or exploitation. Such malicious apps are known as malware, which can be defined as intrusive software that steals information and damages users' devices [6, 7]. Malware is usually developed by cybercriminals and takes forms such as worms, adware, Trojans, ransomware, and spyware [8].

In view of the rapid rate at which malware apps appear and evolve, it has become hard to prevent and stop most security attacks [9]. These attacks operate in different ways. For example, in 2019 the cybersecurity firm Check Point warned Android users that more than 25 million mobile devices had been exposed to malware known as Agent Smith [10]. This malware disguises itself as apps such as WhatsApp to exploit Android OS vulnerabilities. Another case occurred in 2020, when it was reported that over a billion Android devices were in danger of being hacked because they no longer receive security updates [11]; the devices exposed to ransomware were those released before 2012. Furthermore, according to Kaspersky Lab researchers in 2020, a number of hackers have been using Google Play, the official app store for Android, for many years to disseminate advanced malware [12]. More recently, the Android malware 'FlyTrap' has been used to hijack numerous Facebook accounts [13]. Additionally, the 'Vultur' app was found to be using screen-recording features to steal sensitive information [14].

Various countermeasures have been proposed in the literature to mitigate and prevent such malicious attacks. The authors of [15] identified several kinds of malware detection methods, including static analysis, dynamic analysis, application permission analysis, and anomaly detection. Each has its own technique for detecting malware: static analysis relies on features extracted from the code of non-executed apps [16], while dynamic analysis relies on monitoring and analyzing executed apps in a controlled environment. Application permission analysis examines the access permissions granted by users in order to reveal Android malware. Finally, anomaly detection is usually implemented using Machine Learning (ML), which learns to identify and predict malware apps and has also been applied in other domains such as spam detection [17,18,19,20], fake news [21], image segmentation, and other fields.

ML-based malware detection has acquired more attention from researchers in the past few years due to its superiority over other methods. Because the datasets used in the learning process are updated periodically, ML-based detection keeps pace with evolving malware characteristics and achieves better performance. The work in [22] stated that non-machine-learning approaches for Android malware detection consume more time and are less able to detect malware than ML-based approaches. Also, according to [23], ML-based detection approaches can adapt to new, sophisticated, and unpredictable malware better than other methods. Moreover, of all the existing approaches, ML obtains the highest accuracy in malware detection [24].

Many ML-based Android malware detection approaches have been proposed in the literature. For example, the work in [9] investigated different ML algorithms for detecting Android malware, comparing several classifiers including Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest (RF). Another recent work proposed an Android malware detection approach using ML algorithms [25]; the authors applied three different dataset types, namely time-series, Boolean, and frequency datasets. The work in [26] presented four tree-based ML approaches for Android malware detection, using the DREBIN dataset to investigate the performance of the algorithms; RF achieved the best results compared with the other methods. Numerous other works have also examined malware detection using ML algorithms, such as [27,28,29].

Furthermore, the SVM classification model shows outstanding performance against other ML algorithms for Android malware detection. The authors of [30] presented a combination of Active Learning and SVM to detect Android malware; their approach was evaluated on the DREBIN dataset and showed excellent results in detecting new malicious apps. Additionally, [31] introduced a keyword correlation distance combined with SVM to detect Android malware, and the method obtained efficient results on Android OS. Another approach detected Android malware using SVM grouped with a decision tree (DT) [32]; the results demonstrate the superiority of the approach compared with other detection approaches. A further recent work applied SVM to detect Android malware using Application Program Interface (API) calls as features [33], showing competitive performance compared with other approaches.

As shown in the previously mentioned works, SVM performs very well. However, SVM has room for improvement if its optimal hyperparameters are selected [34, 35]. Therefore, in this study, we propose an SVM combined with Harris Hawks Optimization (HHO) for Android malware detection. The HHO serves two different purposes: first, it automatically identifies the best hyperparameters of the SVM; second, it weights the features to determine the most important ones and thereby improve the detection phase. The proposed approach is evaluated on the CICMalAnal2017 datasets. Five different datasets are generated from the original data, each consisting of different malware types. Furthermore, an additional analysis was performed to describe the relationship and correlation of the malware types with the most important features.

In summary, the contributions of this work are the following:

  • An Android malware detection approach based on an evolutionary Support Vector Machine (SVM) algorithm is presented.

  • The HHO algorithm is employed to achieve two objectives simultaneously: parameter tuning and feature weighting.

  • Five datasets are generated, each containing a different malware type.

  • The most relevant features are identified in order to improve the malware detection phase.

The rest of the paper is organized as follows: Section 2 presents previous works in the literature for Android malware detection. Section 3 introduces the background knowledge of Support Vector Machine (SVM) and Harris Hawks Optimization (HHO). The proposed approach is described in Section 4. Section 5 presents the dataset description and preparation process. Experiments and results are described in Section 6, while the conclusion and future work are addressed in Section 7.

2 Related work

Android has become the most used OS in the world over the past few years [3]. The number of apps in the Android store has reached 2.7 million according to [4]. However, a considerable number of apps are malware that needs to be detected and controlled. Android malware detection aims to distinguish between malicious and benign apps using techniques such as static analysis, dynamic analysis, application permission analysis, and anomaly detection. Machine-learning-based Android malware detection has gained increasing attention in recent years.

For instance, the authors of [36] presented a machine learning approach to detect Android malware, proposing several machine learning methods to classify and identify unknown malicious apps. The work in [37] investigated malware detection using the Significant Permission IDentification (SigPID) system. SigPID is a permission-based technique designed to analyze and contain the growing number of Android malware samples; the system uses machine learning methods to classify malware families. Their SVM-based approach achieved around 90% accuracy, recall, precision, and f-measure with shorter analysis times, and SigPID reached a 93.62% malware detection rate. Another recent work also applied machine learning for Android malware detection [25, 38]. The authors obtained API information by generating control flow graphs of the applications, separated into three dataset types: time-series, frequency, and Boolean. Based on these datasets, three detection models are constructed for API sequences, API calls, and API frequency. Their ensemble approach was evaluated on 10,010 benign and 10,683 malware samples and showed excellent results when compared with other methods [39].

Moreover, [40] proposed a machine learning method for Android malware detection based on features extracted from apps, which are used as inputs for classifier learning. They enhanced the detection phase using an ensemble learning approach (SecENS) and developed a system that integrates their two methods, SecCLS and SecENS, to improve machine-learning-based detection. The work in [41] applied dynamic analysis for malware detection using a machine learning technique, together with a tool that automatically extracts features from Android phones. Their analysis shows that feature extraction works better on-device than in emulators and yields better performance with the machine learning model.

The increase in Android malware encourages researchers to implement various detection systems. The work in [42], for instance, presented a two-part detection approach. First, 123 different permissions were extracted from more than 10,000 applications. Second, several machine learning algorithms were evaluated, namely Decision Tree (J48), Simple Logistic (SL), K-star, Naive Bayes (NB), and Random Forest (RF). The experiments show that SL obtained the best results compared with the other algorithms. The authors of [43] introduced a lightweight machine-learning-based system for Android malware detection. The system uses both dynamic and static features alongside principal component analysis for feature selection to identify the best set of features, and then employs SVM as the classification model. The proposed approach outperforms the other methods [44].

Wang et al. [45] proposed a novel approach for Android malware detection based on information fusion and machine learning methods. The approach applies parallel criteria to the machine learning technique: they first extract eight kinds of features and then employ a parallel machine learning model for Android malware detection. Additionally, they examine probability analysis as well as Dempster-Shafer theory within their approach. Another recent work investigated identifying Android malware and benign apps using feature-weight-based detection within a multi-dimensional, kernel-feature-based framework [8]. The authors analyzed 112 data-structure kernels in Android OS and examined the detection performance against several types of datasets. Furthermore, they stated that memory- and signal-related features obtained the best detection accuracy compared to schedule-related features.

Standard machine learning has proved efficient, as shown in the previous studies; however, detection can be improved further using metaheuristic algorithms. Therefore, more recent studies have investigated Android malware detection combined with metaheuristic algorithms. For example, [46] proposed a hybrid approach of a support vector machine combined with evolutionary algorithms for Android malware detection. This hybrid approach utilizes a genetic algorithm (GA) and particle swarm optimization (PSO) to enhance the detection phase of the SVM, and it outperforms the standard machine learning classifiers. Another recent work combined metaheuristics and machine learning for Android ransomware detection [47]. The Kernel Extreme Learning Machine (KELM) was combined with the Salp Swarm Algorithm (SSA) to tune the KELM hyperparameters and select the best subset of features; the proposed method achieved better results than other methods on several measures. Furthermore, the work in [48] presented a malware detection technique based on an evolutionary algorithm and operational codes (OpCodes). Their work consists of several steps: disassembling the executable files, producing an OpCode graph, and employing the evolutionary algorithm to identify similar graphs. The malware type of each instance is then detected by applying graph similarity using the evolutionary algorithm.

Furthermore, one evolutionary algorithm that has gained attention recently is Harris Hawks Optimization (HHO). HHO has been applied to a wide range of applications in the literature, including feature selection [49], student performance prediction, fault detection, the Internet of Things, image segmentation, manufacturing problems, and so on.

Therefore, in this study we utilize HHO for the problem of Android malware detection. The proposed work differs from the previously mentioned methods in the following points:

  • The recent metaheuristic algorithm Harris Hawks Optimization (HHO) is applied to improve the SVM detection performance.

  • The proposed HHO-SVM approach tackles the problem of optimizing the SVM hyperparameters and identifying the feature weights of the datasets.

  • Five sampled datasets are generated to study a separate scenario for every attack type.

  • Each attack type in the five datasets is analyzed in relation to the features; in other words, the most important features of each dataset (i.e., each malware type) are identified.

3 Preliminaries

3.1 Support vector machine (SVM)

Support Vector Machine (SVM) is a machine learning algorithm designed to solve classification and regression problems [50]. It is known as one of the most reliable classifiers for solving problems in different domains [51]. SVM searches for the optimal linear separation criterion, known as the hyperplane, which maximizes the distance (margin) between the closest data points of the training instances belonging to each class. The data points located at a distance equal to the margin from the hyperplane are called support vectors. Without further regularization, overfitting to the training set is likely to occur, which can lead to misclassification of new instances. To address this, a penalty parameter known as the cost, denoted by C, is used to improve the classification accuracy for new data points [52]. Figure 1 shows the aforementioned description of SVM.

To handle data points that are not linearly separable, different kernel functions can be used with SVM. The most popular and robust kernel function is the Radial Basis Function (RBF), which relies on the gamma \(\gamma\) parameter to control the influence of support vectors over each other. The reader can refer to [53] for more details about SVM.
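To make the roles of C and \(\gamma\) concrete, the following minimal sketch trains an RBF-kernel SVM, \(K(x, x') = \exp (-\gamma \Vert x - x'\Vert ^2)\), on synthetic data. The sketch assumes scikit-learn and purely illustrative data and parameter values; our own experiments are implemented in MATLAB, so this is not the paper's implementation.

```python
# Minimal RBF-SVM sketch (assumes scikit-learn); data and values are illustrative.
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy binary data standing in for malware/benign feature vectors.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# C is the soft-margin penalty; gamma controls the reach of each support vector
# in the RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2).
clf = SVC(kernel="rbf", C=1.0, gamma=0.1)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```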

Fig. 1 Support vectors with optimal hyperplane

3.2 Harris Hawks optimization (HHO)

Harris Hawks Optimizer (HHO) is a nature-inspired population-based optimization algorithm designed by [54]. HHO is inspired by the behavior of Harris' hawks when they cooperate to chase their prey using an intelligent strategy called the surprise pounce, through which the hawks pounce on their prey from different directions to surprise it, as shown in Fig. 2.

As a nature-inspired optimization algorithm, HHO is composed of two main phases, exploration and exploitation, together with a transition between the different exploitative behaviors. The candidate solutions are represented by the hawks observing and waiting in the desert to detect prey, and the best solution at each step is the selected prey.

In the exploration phase, the Harris' hawks start their hunting process by selecting random locations and waiting to detect prey. This is carried out using two strategies: the first depends on the positions of other hawks participating in the hunt, and the second depends on randomly chosen tall trees within the hunting range. Equation 1 describes both strategies, where each perching strategy is given an equal chance via q: the first strategy is selected if q is greater than or equal to 0.5, and the second otherwise. \(X(t + 1)\) is the vector of hawks' positions in the following iteration, \(X_{rabbit}(t)\) is the position of the prey in the current iteration t, \(X_{rand}(t)\) is a randomly selected hawk from the current iteration, and X(t) is the vector of hawks' positions in the current iteration. \(r_{1}\), \(r_{2}\), \(r_{3}\), \(r_{4}\), and q are random numbers in the interval (0,1) that are updated in each iteration, and LB and UB are the lower and upper bounds of the variables, respectively.

$$\begin{aligned} \small X(t+1)={\left\{ \begin{array}{ll} X_{rand}(t)-r_{1}|X_{rand}(t)-2r_{2}X(t)| &{} q\ge 0.5\\ (X_{rabbit}(t)-X_{m}(t))-r_{3}(LB+r_{4}(UB-LB)) &{} q<0.5 \end{array}\right. } \end{aligned}$$
(1)

\(X_{m}(t)\) is the average position of the hawks in the current population, which is calculated according to Eq. 2, where \(X_{i}(t)\) is the position of hawk i in the current iteration and N is the total number of hawks.

$$\begin{aligned} X_{m}(t)=\frac{1}{N}\sum _{i=1}^{N}X_{i}(t) \end{aligned}$$
(2)
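For illustration, the exploration update of Eqs. 1 and 2 can be sketched as follows; the population size, bounds, and random-number generator are illustrative choices and not part of the original formulation.

```python
# Sketch of the HHO exploration update (Eqs. 1-2); variable names follow the text.
import numpy as np

rng = np.random.default_rng(0)

def explore(X, X_rabbit, LB, UB):
    """One exploration step for a population X of shape (N, dim)."""
    N, dim = X.shape
    X_m = X.mean(axis=0)                      # Eq. 2: average hawk position
    X_new = np.empty_like(X)
    for i in range(N):
        q, r1, r2, r3, r4 = rng.random(5)
        if q >= 0.5:                          # perch based on another random hawk
            X_rand = X[rng.integers(N)]
            X_new[i] = X_rand - r1 * np.abs(X_rand - 2 * r2 * X[i])
        else:                                 # perch on a random tall tree
            X_new[i] = (X_rabbit - X_m) - r3 * (LB + r4 * (UB - LB))
    return np.clip(X_new, LB, UB)
```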

In the exploitation phase, the Harris' hawks attack their prey by performing the surprise pounce. However, as the prey attempts several times to escape, the hawks change their chasing strategies according to the escaping behavior of the prey. Hence, the hawks follow four different chasing strategies: soft besiege, soft besiege with progressive rapid dives, hard besiege, and hard besiege with progressive rapid dives.

The selection among the four strategies depends on the energy E of the prey, as the prey loses energy while escaping the hunt; this can be seen as switching between different exploitative behaviors. The energy of the prey is modeled by Eq. 3, where \(E_0\) is the initial energy of the prey, t is the current iteration, and T is the maximum number of iterations.

$$\begin{aligned} E = 2E_{0}\left( 1-\frac{t}{T}\right) \end{aligned}$$
(3)

When \(|E| \ge 0.5\), the soft besiege strategy occurs if the prey's escape chance r is \(\ge 0.5\); if \(r < 0.5\), the soft besiege with progressive rapid dives strategy is used instead. Equations 4 and 5 illustrate both strategies, respectively, where \(\Delta X(t)\) is the difference between the position vector of the rabbit and the current position in iteration t, Y is the rule used to evaluate the hawks' next move in a soft besiege, and Z applies the zigzag deceptive motion mimicked by the Levy Flight (LF) move, used only if the Y rule fails. The reader can find the definitions of Y, Z, and LF in the original paper [54].

$$\begin{aligned} X(t+1)=\varDelta X(t)-E|JX_{rabbit}(t)-X(t)| \end{aligned}$$
(4)
$$\begin{aligned} X(t+1)={\left\{ \begin{array}{ll} Y &{} if\,F(Y)<F(X(t))\\ Z &{} if\,F(Z)<F(X(t)) \end{array}\right. } \end{aligned}$$
(5)

When \(|E| < 0.5\), the hard besiege strategy is followed provided that r is greater than or equal to 0.5; otherwise, the hard besiege with progressive rapid dives strategy is carried out. Equation 6 shows how the current positions are updated for hard besiege, while hard besiege with progressive rapid dives uses the same Eq. 5, with the difference that Y considers the average position of the hawks instead.

$$\begin{aligned} X(t+1)=X_{rabbit}(t)-E|\varDelta X(t)| \end{aligned}$$
(6)
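The energy decay of Eq. 3 and the two basic besiege moves of Eqs. 4 and 6 can be sketched as below; the progressive rapid-dive variants (Eq. 5) depend on the Y, Z, and Levy-flight rules defined in the original HHO paper and are therefore omitted from this sketch.

```python
# Sketch of the energy update (Eq. 3) and soft/hard besiege moves (Eqs. 4 and 6).
import numpy as np

def prey_energy(E0, t, T):
    return 2.0 * E0 * (1.0 - t / T)                  # Eq. 3

def soft_besiege(X_i, X_rabbit, E, J):
    delta = X_rabbit - X_i                           # Delta X(t)
    return delta - E * np.abs(J * X_rabbit - X_i)    # Eq. 4

def hard_besiege(X_i, X_rabbit, E):
    delta = X_rabbit - X_i
    return X_rabbit - E * np.abs(delta)              # Eq. 6
```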
Fig. 2 HHO different phases [54]

4 Proposed approach

4.1 Design issues

This section describes in detail the approach followed to utilize the HHO algorithm for optimizing the SVM parameters as well as weighting the features.

Solution representation As described earlier, the solution is represented by the hawks that are waiting and observing the prey. Therefore, from now on, we use the term solution instead of hawk to avoid confusion. The representation of the solution is determined by two factors: the parameters of the SVM and the features (attributes) of the input dataset.

For the first part of the representation, we consider the search spaces of the SVM parameters C and \(\gamma\), both of which have boundaries different from the original solution boundaries of 0 and 1. Therefore, the solution values need to be scaled into values readable by the SVM. The C parameter accepts values between 0 and 32, and \(\gamma\) values lie within the interval [0, 35000]. For scaling, we apply the min-max normalization shown in Eq. 7, where B is the final scaled value, A is the value to be scaled, \(min_{A}\) and \(max_{A}\) are the lower and upper bounds of the old interval, and \(min_{B}\) and \(max_{B}\) are the lower and upper bounds of the new interval.

$$\begin{aligned} B=\frac{A-min_{A}}{max_{A}-min_{A}}(max_{B}-min_{B})+min_{B} \end{aligned}$$
(7)
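As a concrete instance of Eq. 7, the following sketch maps the first two cells of a solution (both in [0, 1]) onto the C and \(\gamma\) intervals stated above; the cell values themselves are illustrative.

```python
def min_max_scale(a, min_a, max_a, min_b, max_b):
    """Eq. 7: map a value from [min_a, max_a] onto [min_b, max_b]."""
    return (a - min_a) / (max_a - min_a) * (max_b - min_b) + min_b

# Example: the first two cells of a candidate solution, both in [0, 1].
c_cell, gamma_cell = 0.25, 0.4
C     = min_max_scale(c_cell,     0.0, 1.0, 0.0, 32.0)      # C in [0, 32]
gamma = min_max_scale(gamma_cell, 0.0, 1.0, 0.0, 35000.0)   # gamma in [0, 35000]
print(C, gamma)   # 8.0 14000.0
```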

The second part of the solution representation is directly determined by the number of attributes in the dataset. Each value in this part of the solution vector corresponds to an attribute in the dataset; consequently, each cell value produced by the HHO algorithm is multiplied by the value of the corresponding attribute for all training instances, as shown in Fig. 3. Hence, combining both parts yields a solution whose length equals two (for C and \(\gamma\)) plus the number of attributes in the dataset.

Fitness function Assessing the quality of a solution is known as fitness assessment, and the function used for this task is called the fitness function. In our case, we use the classification accuracy produced by the SVM as the fitness function, and the HHO is set to maximize this value. The classification accuracy is calculated using Eq. 8, where TP and TN are the correctly classified positive and negative instances, and FP and FN are the incorrectly classified positive and negative instances.
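A minimal sketch of this fitness evaluation is given below, assuming scikit-learn and the min_max_scale helper from the previous sketch; the tiny lower bounds on C and \(\gamma\) are a practical tweak to avoid zero-valued parameters and are not part of the original formulation.

```python
# Sketch of the fitness evaluation: the first two cells become C and gamma, the
# remaining cells weight the features, and the SVM accuracy is the fitness value.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def fitness(solution, X_train, y_train, X_val, y_val):
    # 1e-6 lower bounds keep C and gamma strictly positive for the SVM.
    C     = min_max_scale(solution[0], 0.0, 1.0, 1e-6, 32.0)
    gamma = min_max_scale(solution[1], 0.0, 1.0, 1e-6, 35000.0)
    weights = np.asarray(solution[2:])          # one weight per feature
    clf = SVC(kernel="rbf", C=C, gamma=gamma)
    clf.fit(X_train * weights, y_train)         # element-wise feature weighting
    preds = clf.predict(X_val * weights)
    return accuracy_score(y_val, preds)         # Eq. 8
```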

System architecture To start the process, we split the dataset into training and testing subsets using k-fold cross-validation: one fold is allocated to the testing set and the remaining \(k-1\) folds to the training set. This step is repeated k times, with a different fold serving as the testing set in each iteration. This procedure guarantees the maximum possible diversity of the training/testing sets, as well as the maximum possible number of separate runs.

At the beginning of each fold, the HHO initializes a random solution based on the training set. The solution is composed of the values assigned to the SVM parameters and to the features of the training dataset: the first two values are assigned to the SVM parameters after scaling, and the remaining part is assigned to the features. The value in each cell of the second part of the solution is multiplied by each value of the matching feature.

Fig. 3 Representation of the weighting mechanism

Next, the SVM is trained using the scaled values of the first two cells of the solution vector, which are assigned to the C and \(\gamma\) parameters, as well as the new values of the training set resulting from the multiplications by the corresponding cells. The classification accuracy obtained with these values is returned as the fitness outcome to the HHO algorithm.

As mentioned, all previous operations occur within a single training fold and are repeated in that fold according to the number of iterations set in the HHO algorithm. When the maximum number of iterations is reached, the HHO returns the best solution found, i.e., the one with the highest classification accuracy, and this becomes the outcome of that fold. Finally, we compute the average accuracy over the testing sets of all folds.
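The per-fold flow described above can be summarized by the following sketch; hho_optimize is a hypothetical placeholder for an HHO implementation (only its interface is assumed here), and fitness refers to the sketch given earlier.

```python
# High-level sketch of the per-fold HHO-SVM flow; hho_optimize is a placeholder.
import numpy as np
from sklearn.model_selection import KFold

def run_hho_svm(X, y, hho_optimize, n_folds=10, n_iterations=20):
    fold_acc = []
    for train_idx, test_idx in KFold(n_splits=n_folds, shuffle=True).split(X):
        X_tr, y_tr = X[train_idx], y[train_idx]
        X_te, y_te = X[test_idx], y[test_idx]
        # HHO searches for the solution (C, gamma, feature weights) that
        # maximizes the training fitness over n_iterations.
        best_solution = hho_optimize(
            lambda s: fitness(s, X_tr, y_tr, X_tr, y_tr),
            dim=2 + X.shape[1], iterations=n_iterations)
        # The fold's best solution is then scored on the held-out test fold.
        fold_acc.append(fitness(best_solution, X_tr, y_tr, X_te, y_te))
    return np.mean(fold_acc)
```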

4.2 Evaluation

The results of all algorithms are compared and evaluated in order to examine their performance on the datasets. The evaluation is performed using the confusion matrix illustrated in Fig. 4. True Positive (TP) denotes the number of actual positive instances that are correctly predicted, while False Negative (FN) denotes the number of actual positive instances that are incorrectly predicted. False Positive (FP) is the number of negative instances that are incorrectly predicted as the positive class, whilst True Negative (TN) is the number of negative instances that are correctly predicted.

Fig. 4 Confusion matrix

Four evaluation measures are utilized to examine the models' performance: accuracy, precision, recall, and f-measure. These measures are calculated as shown in the following equations:

$$\begin{aligned} Accuracy = \frac{TP + TN}{ TP + TN + FP + FN} \end{aligned}$$
(8)
$$\begin{aligned} Precision = \frac{TP}{TP + FP } \end{aligned}$$
(9)
$$\begin{aligned} Recall (Sensitivity) = \frac{TP}{TP + FN } \end{aligned}$$
(10)
$$\begin{aligned} F-Measure = \frac{2*Precision*Recall}{Precision + Recall} \end{aligned}$$
(11)
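For reference, the four measures can be computed directly from the confusion-matrix counts as in the following sketch; the counts used in the example call are illustrative.

```python
# Eqs. 8-11 computed from confusion-matrix counts.
def evaluation_measures(TP, TN, FP, FN):
    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

print(evaluation_measures(TP=90, TN=80, FP=10, FN=20))
```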

All the aforementioned processes described in Sect. 4 are depicted in Fig. 5.

Fig. 5 Proposed approach process

5 Dataset description, characteristics and preparation

The dataset used in this work (CICAndMal2017) consists of 10,854 samples collected by the Canadian Institute for Cybersecurity [55]. The samples comprise 4,354 malware and 6,500 benign applications gathered from different sources. The malware samples were obtained from several sources, namely the Contagio security blog, VirusTotal, and previously published works in the literature [56], while the benign samples were collected from the Google Play store during the years 2015-2017.

The dataset is categorized into five different groups, including, Benign, Adware, Ransomware, Scareware and SMSmalware. The details of each group can be seen in Table 1.

Table 1 Details of the original data

Adware is a malicious application that sends user information to a specific remote server in order to forcefully show personalized (interest-based) advertisements to that user. This can be done by hijacking smartphone speakers or by tracking users' search history and application usage [57]. Ransomware is a kind of malware that demands a payment from users. It has two general classes, crypto and lock-screen: the crypto class encrypts and scrambles the mobile device's information and contents, while the lock-screen class blocks the smartphone screen, covering it completely with a picture and making the device unusable. Both classes are lifted only if the user pays the demanded ransom. Moreover, Scareware operates by scaring users with phishing websites or applications that threaten to steal their information [58]. Scareware tricks users by pretending to be a security application, for instance by showing a fake list of viruses on the user's device, leading them to install the malware. Finally, SMSmalware controls and manages messages in order to send unwanted messages; in other words, it is used by attackers to send messages from users' mobile phones to deceive their trusted contacts [59].

Before processing, the dataset comprises multiple files, each belonging to one of the malware groups (Adware, Ransomware, Scareware, and SMSmalware). A command-line script was therefore used to merge all files together. Further, some preprocessing steps were employed to prepare the dataset for the classification models, including cleaning noisy data and handling missing values; these issues were addressed using normalization and majority-vote imputation, respectively.
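A rough sketch of this merge-and-clean step is shown below, assuming pandas and per-family CSV exports; the directory layout, the Label column name, and the mode-based imputation used as a stand-in for majority voting are hypothetical.

```python
# Sketch of merging per-family flow files and cleaning them; paths and the
# "Label" column are hypothetical placeholders.
import glob
import pandas as pd

frames = [pd.read_csv(path) for path in glob.glob("cicandmal2017/*.csv")]
data = pd.concat(frames, ignore_index=True)

# Drop rows whose label is missing, fill remaining gaps with the column mode
# (a simple stand-in for majority-vote imputation), and min-max normalize
# the numeric features.
data = data.dropna(subset=["Label"])
data = data.fillna(data.mode().iloc[0])
num_cols = data.select_dtypes("number").columns
data[num_cols] = (data[num_cols] - data[num_cols].min()) / (
    data[num_cols].max() - data[num_cols].min())
```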

In this paper, five datasets are prepared and sampled from the original data. In our sampling technique, each of the first four datasets contains two classes, Benign and one malware type, while the last dataset contains all malware types except Adware, which is excluded due to its diverse characteristics. All datasets preserve the distribution of the original data. Additional experiments with other feature selection approaches yielded the same results. The details of the sampled datasets are shown in Table 2.

Table 2 Details of the five sampled datasets

6 Experiment and results

In this section, several experimental phases are carried out on all datasets, including an examination of base classifier models, an investigation of SVM with metaheuristic algorithms on benchmark datasets, and a comparison between the proposed approach and other metaheuristic algorithms on our datasets.

Additionally, a feature importance analysis is applied in order to identify the highest-weighted features for detecting Android malware in each dataset.

To summarize, the four experimental and analysis phases are:

  • Base classifiers models: an examination of our sampled datasets on traditional well-known classification models.

  • Benchmarks performance of SVM with metaheuristic algorithms: to investigate the performance of HHO-SVM against other algorithms with distinguished benchmarks.

  • HHO-SVM against other metaheuristic algorithms: an examination of our sampled datasets on the proposed HHO-SVM compared with other algorithms.

  • Feature Importance analysis: an analysis of the most important features to detect malware as well as the relationship between each class type and the features.

6.1 Experiments setup

All experiments were conducted on a workstation with a Xeon E5-2609 CPU and 64 GB of RAM. All metaheuristic algorithms were implemented in MATLAB R2016a, while the base classifiers were run using the Weka tool. The settings and parameters of the metaheuristic algorithms are listed in Table 3.

Table 3 Parameter settings

Furthermore, 10 independent runs were conducted for all approaches, and the average and standard deviation of the runs were reported; the number of iterations for the metaheuristic algorithms was set to 20. The performance measures used in this work are accuracy, precision, recall, and f-measure. For splitting the data into training and testing sets, we used 10-fold cross-validation, which guarantees maximum shuffling of the training and testing sets. Finally, all approaches were examined on the 5 sampled datasets, and the metaheuristics were additionally investigated on 10 well-known benchmarks.

6.2 Performance of the sampled data (CICMalAnal2017) on the base classifiers models

In the first phase, the performance of the base classifier models is investigated on the 5 newly sampled datasets. These classifiers are commonly used in the literature: Naive Bayes (NB), k-Nearest Neighbors (k-NN) with different k values, and Random Forest (RF). This phase was implemented in order to analyze and examine the sampled datasets before running them through our proposed approach.

Table 4 illustrates the results for the \(Data_1\) dataset. The highest accuracy was obtained by 5-NN with 85.57%, while the second highest was achieved by the NB classifier with 84.86%. In terms of precision, NB achieved the highest result with 0.876, followed by RF with 0.8745. For the recall measure, the best result was achieved by 5-NN with 0.97, while the second best was obtained by NB. Regarding the f-measure, 5-NN has the best result, followed by NB, RF, 3-NN, and 1-NN, respectively.

Table 4 Base classifiers results for \(Data_1\) dataset

Based on the results for \(Data_2\) in Table 5, NB outperforms all other methods with 86.7% accuracy, followed by 5-NN, RF, 3-NN, and 1-NN, respectively. For precision, 1-NN provides the best result, while 3-NN obtained the second best. In terms of recall and f-measure, the best classifier was NB with 0.99 and 0.92, respectively.

Table 5 Base classifiers results for \(Data_2\) dataset

According to the accuracy results for \(Data_3\) in Table 6, 5-NN achieved the best result with 92.6%, followed by 3-NN, RF, 1-NN, and NB, respectively. For the precision measure, NB outperforms the other classifiers, while the highest recall and f-measure were achieved by 5-NN with 0.99 and 0.96, respectively.

Table 6 Base classifiers results for \(Data_3\) dataset

5-NN achieved the highest accuracy for \(Data_4\), as shown in Table 7, followed by 3-NN, RF, NB, and 1-NN, respectively. For the precision measure, 3-NN exceeds all other classifiers, while 5-NN placed second with 0.9313. In terms of recall and f-measure, 5-NN obtained the highest results with 0.99 and 0.96, respectively.

Table 7 Base classifiers results for \(Data_4\) dataset

As per the accuracy results in Table 8, 5-NN outperforms all other methods, followed by RF, 3-NN, NB, and 1-NN, respectively. The NB classifier provides the best precision compared to the other classifiers with 0.878. For recall and f-measure, 5-NN again achieved the highest results, while the second best was achieved by RF.

Table 8 Base classifiers results for \(Data_5\) dataset

In summary, the investigation of the datasets shows overall stability of the results, so the proposed approach can be examined after this analysis. It is worth mentioning that the best algorithm was 5-NN, which ranked first in accuracy 4 times, while NB ranked first once. As for the other measures, NB placed first 3 times in precision, and 5-NN placed first 4 times in both recall and f-measure.

6.3 Performance of HHO-SVM on general benchmarks

Before examining our sampled datasets with the proposed approach, a general performance investigation is carried out on a number of benchmark datasets. This is done to verify the performance of the proposed approach compared with the other algorithms. Table 9 gives a brief description of the 10 benchmarks used in this subsection.

Table 9 List of benchmark datasets

Four metaheuristic algorithms are applied in this phase: Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Salp Swarm Algorithm (SSA), and Harris Hawks Optimizer (HHO).

According to the average accuracy shown in Table 10, HHO-SVM outperforms the other algorithms on 6 out of the 10 datasets, namely Breast Cancer, Wine, Sonar, Spectft, Ionosphere, Glass, and Iris. Further, on the Parkinsons dataset the best result was achieved jointly by HHO-SVM and GA-SVM with 94.8421%. For the two remaining datasets, Heart and Vowel, the highest results were obtained by SSA-SVM and PSO-SVM, respectively.

The previous results clearly show the superiority of HHO-SVM compared with the other approaches on the 10 benchmarks. However, for a more thorough examination, all four approaches are also compared on our sampled datasets.

Table 10 A comparison of average accuracy for GA-SVM, PSO-SVM, SSA-SVM and HHO-SVM over all benchmarks

6.4 Results of HHO-SVM compared with other metaheuristic algorithms on the sampled data (CICMalAnal2017)

In this subsection, HHO-SVM is applied to the sampled datasets and compared with the other metaheuristic algorithms, namely GA-SVM, PSO-SVM, and SSA-SVM, as in the previous subsection. Moreover, unlike the previous phase, the examination takes place over several measures: accuracy, recall, precision, and f-measure.

Table 11 illustrates the results of the four algorithms across all measures for the \(Data_1\) dataset. In terms of accuracy, HHO-SVM achieved the best result, followed by GA-SVM, PSO-SVM, and SSA-SVM, respectively. Regarding recall, HHO-SVM obtained the highest result, with GA-SVM placing second. Further, HHO-SVM also achieved the best precision and f-measure with 99.95% and 93.20%, respectively, while the second best in both measures was achieved by PSO-SVM.

Table 11 HHO-SVM and other metaheuristic algorithms results for \(Data_1\) dataset

As shown in Table 12, HHO-SVM exceeds all other approaches in terms of accuracy for \(Data_2\), followed by PSO-SVM, SSA-SVM, and GA-SVM. For recall, the best result was obtained by HHO-SVM, while PSO-SVM had the second highest. In terms of precision, the best results were achieved jointly by HHO-SVM and SSA-SVM. PSO-SVM had the best f-measure with 93.14%.

Table 12 HHO-SVM and other metaheuristic algorithms results for \(Data_2\) dataset

Table 13 presents the comparison of all algorithms for the \(Data_3\) dataset. For the accuracy, recall, and precision measures, all algorithms achieved the same results; this happens due to the sensitivity of the dataset and the distribution of the classes. For the f-measure, PSO-SVM attained the best result with 96.4355%.

Table 13 HHO-SVM and other metaheuristic algorithms results for \(Data_3\) dataset

Table 14 reflects the comparison results for the \(Data_4\) dataset. HHO-SVM reached the highest accuracy with 92.85%, followed by PSO-SVM, SSA-SVM, and GA-SVM, respectively. Regarding recall and f-measure, HHO-SVM obtained the best results with 92.85% and 96.27%, respectively. For precision, HHO-SVM and PSO-SVM jointly achieved the highest result.

Table 14 HHO-SVM and other metaheuristic algorithms results for \(Data_4\) dataset

As per the results for the \(Data_5\) dataset in Table 15, HHO-SVM again exceeds the other approaches in terms of accuracy, with GA-SVM, PSO-SVM, and SSA-SVM attaining the next ranks, respectively. However, in terms of recall, precision, and f-measure, GA-SVM attained the best results with 87.711%, 99.96%, and 93.429%, respectively, while HHO-SVM obtained the second best result for these three measures.

Table 15 HHO-SVM and other metaheuristic algorithms results for \(Data_5\) dataset

Overall, the SVM-based results show an improvement compared with the base classifier phase. All metaheuristic algorithms obtained good results; however, HHO-SVM outperforms all other approaches on most of the measures, which again demonstrates the superiority of the proposed approach (HHO-SVM).

Figure 6 presents box-plot charts for all datasets in terms of accuracy. The box-plots are drawn from the accuracy values of the 10 runs of each algorithm.

Fig. 6 Box-plot charts for HHO-SVM and other algorithms based on the sampled datasets

6.5 Feature importance analysis

In this subsection, a feature importance analysis is presented in order to investigate the feature weights for each dataset. This analysis helps identify the relationship between the features and the class types in each dataset, and explains which features matter most for detecting the malware in each scenario.
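A simple way to obtain such a ranking from the best HHO-SVM solution is sketched below; best_solution and feature_names are assumed to come from the pipeline described in Sect. 4 and are not defined here.

```python
# Sketch of ranking the feature weights of the best HHO-SVM solution.
import numpy as np

def top_weighted_features(best_solution, feature_names, k=10):
    weights = np.asarray(best_solution[2:])      # skip the C and gamma cells
    order = np.argsort(weights)[::-1][:k]        # highest weights first
    return [(feature_names[i], float(weights[i])) for i in order]
```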

According to Fig. 7, the feature weights differ across the datasets. In Fig. 7a, for example, the best feature was min_seg_size_forward, while the second best was IdleMax. In \(Data_2\), the highest weighted feature was Init Win bytes forward, as shown in Fig. 7b, with Bwd IAT Total second. As can be seen in Fig. 7c, the first and second features were FIN Flag Count and Fwd IAT Total with weights of 0.610 and 0.550, respectively. Figure 7d illustrates the feature weights for \(Data_4\), where FIN Flag Count ranked first and Init Win bytes forward second. Finally, Fig. 7e shows the weights of the \(Data_5\) features: Fwd IAT Total was the highest weighted feature, while Init Win bytes forward was the second highest.

Fig. 7 HHO-SVM feature weighting for all datasets, including \(Data_1\), \(Data_2\), \(Data_3\), \(Data_4\) and \(Data_5\)

Overall, Fig. 7 reveals each dataset's important features and the differences between them. Each dataset shows a unique ordering of the features due to the characteristics of its class types. \(Data_1\), for example, with classes Benign and Adware, leans toward the IAT features, which measure the inter-arrival time (total time between two actions) of the flow in the forward and backward directions. Further, the \(Data_2\) classes (Benign, Ransomware) relate more to features such as the total bytes transmitted in the initial window in the forward direction and the duration of transmitting two packets in both directions. For \(Data_3\), with classes Benign and Scareware, the most relevant features indicate termination of data transmission alongside the time duration and flow of transmission between two packets. On the other hand, \(Data_4\) (Benign, SMSmalware) correlates with features such as data termination, initial window transmission, and minimum segment size in the forward direction. Finally, \(Data_5\) has the largest number of classes (Benign, Ransomware, Scareware, and SMSmalware) and shows stronger ties to features such as the transfer time duration of two packets in the forward direction, the minimum segment size in the forward direction, the transmission duration in the backward direction, and the maximum time a flow remains idle before becoming active again.

7 Conclusion and future work

Android OS has dominated the worldwide market share over the past few years, and the number of users and applications increases every year as a result. Hackers and attackers exploit this success to spread various types of malware. Such issues can be addressed by measures such as machine-learning-based Android malware detection. Consequently, in this work we proposed a hybrid Support Vector Machine (SVM) and Harris Hawks Optimization (HHO) approach to detect such malware. The HHO is responsible for two tasks in this approach, optimizing the SVM hyperparameters and weighting the features, while the SVM evaluates each candidate combination and the best model is selected for the testing phase on the CICMalAnal2017 sampled datasets. Furthermore, a detailed analysis of the relationship between the features and the malware attacks was presented. The proposed approach outperforms the other approaches on most datasets and measures. The approach suffers from two main limitations, namely time consumption and computational complexity. The time consumption can be mitigated by choosing an appropriate application and dataset, while the computational complexity is harder to overcome because the two objectives of this work, parameter optimization and feature weighting, each require a different and unique structural representation. In future work, we intend to investigate more sub-attack types as well as other machine learning methods and metaheuristic algorithms. There are more than twenty sub-attack types that could be reviewed to improve the detection phase, and other classification methods could be employed that might offer different results and analyses.