1 Introduction

Android malware is increasing rapidly. According to the McAfee mobile threat report issued in 2020, the number of malicious apps targeting the Android operating system is growing [3]. Such malware can hide itself after installation, mimic the icons of legitimate applications, and use advanced evasion techniques that download the malicious code only after some time. Conventional signature-based detection mechanisms [1] cannot detect such obfuscated, evasive malware. Hence, many malware detection mechanisms are adopting machine learning to detect unknown malware. The advantage of machine learning is that it can automatically learn and predict malware behaviour from raw data [34]. For Android, many works show how machine learning can be used effectively in static, dynamic and hybrid malware detection mechanisms [42].

In static Android malware detection, the Android application is examined without executing it [51]. The features used for static analysis include permissions, intents, static API calls and opcodes. The advantage of static analysis is its high code coverage. Its drawback is that it cannot detect obfuscated malware. Dynamic analysis, on the other hand, runs the application in an emulator and captures its malicious behaviour by examining run-time features such as system calls and dynamic API calls [13, 25, 26, 45]. The disadvantage of dynamic detection is its lower code coverage. Moreover, malware developers can evade dynamic analysis by examining specific APIs whose return values reveal that the application is running in a virtual machine. For example, if an application is executed in a virtual machine, the TelephonyManager.getDeviceId() API returns zero [47]. Petsas et al. [38] proposed an attack against virtual machines that examines dynamic sensor information and VM-related intricacies of the Android emulator to evade detection. Wenrui et al. [24] proposed a mechanism to evade Android emulator runtime analysis using an evasive component that identifies whether events come from a real user or from an automated tool. To solve the problems of static and dynamic malware detection, several mechanisms based on hybrid analysis have been proposed. Hybrid analysis uses a combination of static and dynamic features for detection. Its drawback is that it consumes more resources [12] than static or dynamic malware detection alone.

To detect malware, the static, dynamic or hybrid features of the application are fed into unsupervised or supervised machine learning classifiers. In supervised malware detection, the machine learning model is trained with thousands of benign and malware samples. However, adversaries can evade even the most powerful machine learning models used for malware detection by crafting malware that exploits the vulnerabilities of these models. Such malware is called adversarial malware. Adversarial malware poses a serious threat and is an emerging area of research [36].

Some works show how static detection mechanisms using permissions and API calls can be easily evaded by adversarial attacks. In [5], the authors evade the Drebin [11] detection method using feature perturbation techniques. However, how an adversary evades opcode based Android malware detection has not yet been explored and is an interesting area of study. This is because opcodes contain valuable information for detecting malware that employs code repackaging. The recent CoronaVirus application is one such ransomware [14] that uses code repackaging. The effectiveness of opcodes lies in the fact that, despite obfuscation, malware of the same Android family shares the same code parts and hence can be identified by examining the opcode patterns of the application [7, 45].

Likewise, in dynamic malware detection, system calls are very effective features for detecting obfuscated malware [16, 33]. This is because system calls capture the interaction of the application with the operating system and hence reveal the actual behaviour of the application even if it adopts dynamic loading, encryption or other evasion techniques. Vinod et al. [49] showed a label flipping attack against Android system call based malware detection in which they poison the training data with adversarial samples. However, their attack injects individual system calls rather than sequences of system calls, which is not effective.

In this paper, we show how a static Android malware detection mechanism using opcodes and a dynamic Android malware detection mechanism using system calls can be evaded by an adversary. We employ frequency based evasion in both cases, whereby the attacker injects the most frequent benign code sequences into the malware to subvert detection. Our attack is realistic and shows that injecting a few sequences of benign code can evade robust machine learning based malware detection mechanisms. Moreover, our attack is resilient against feature selection approaches [23], since we inject opcode sequences that replicate benign application behaviour.

The contributions of this paper are the following:

  1. We explore how an adversary injects benign opcode sequences to evade the static Android malware detection mechanism. For this, we evade the mechanism employed in [16], which is a malware detection mechanism using opcode n-grams. This is the first work that shows adversarial attacks in the form of benign opcode injection.

  2. We explore how the adversary injects benign system call sequences to evade the dynamic Android malware detection mechanism. To show this, we evade the mechanism employed in [44], which is a malware detection mechanism using graph signal processing. We show that Android system call based malware detection using the powerful graph signal processing technique can be evaded by injecting a few benign system call sequences.

2 Related Works

The malware detection mechanisms employed by popular security companies like Kaspersky [2] and Norton [4] use machine learning to detect polymorphic and obfuscated malware. However, their detection capabilities can be degraded by an adversary that employs feature perturbations to evade detection. According to the recent threat report issued by Kaspersky [9], adversarial attacks against machine learning based malware detection can cause misidentified Trojans to infect millions of devices. Many previous works discuss adversarial attacks and defenses in Android malware detection. This section discusses the attacks and defenses that have been implemented against Android malware detection classifiers.

There are two types of adversarial attacks: data poisoning attacks and evasion attacks [32]. Data poisoning attacks are launched by contaminating the training instances of the classifier. Evasion attacks, on the other hand, finely perturb the features of the application to evade detection. An evasion attack can be either a problem space attack or a feature space attack [39].

Chen et al. [19] proposed a data poisoning attack on Android malware detection in which the adversary contaminates the training data with malicious samples. However, their attack requires gradient information about the classifier. Abaid et al. [5] proposed an evasion attack that evades the Drebin detection mechanism. They constructed attackers with different capabilities and showed that adversarial evasion is a feasible threat. Their attack reduced the detection accuracy of the classifier from 100% to 0%. However, their attack removes some features from the Android application, which may cause the application to lose its malware functionality.

In problem space attacks, the attacker transforms the malicious application into a new variant that is valid and realistic. Pierazzi et al. [39] proposed a problem space attack on Android malware in which the adversary employs opaque predicates, carefully constructing obfuscated conditions or program code that always return False but evade static detection. Their attack can be detected with the help of the feature selection techniques mentioned in [31]. Yang et al. [53] proposed a problem space attack in which they craft adversarial malware samples. Their technique alters the semantics of the application, and the generated malware may lose its functionality. Rosenberg et al. [40] proposed an attack against API call based malware detection. They added artifacts to the application, which can be detected using dynamic analysis.

The feature space attack, on the other hand, makes fine-grained perturbations to various static and dynamic features of the application for evasion. Grosse et al. [27] proposed a feature space attack using Jacobian matrix perturbation to evade Drebin Android malware detection. However, the generated malware can be detected through undeclared classes and unused permissions. Li et al. [35] crafted an attack against the Drebin Android malware detection mechanism. Their attack employed multiple generative methods to craft malware without ruining the malware functionality. They also proposed an ensemble technique to defend against adversarial attacks.

Demontis et al. [23] proposed a secure learning technique to detect adversarial attacks in Android malware detection. However, their technique cannot detect malware that replicates the behaviour of benign applications. Chen et al. [18] proposed an ensemble based defense against adversarial attacks in Android malware detection. They used a feature selection approach to detect adversarial malware. The disadvantage of all these defensive mechanisms is that they can only detect adversarial attacks that perturb syntactic features like permissions [19, 23, 28, 53]. Moreover, syntactic features like Android permissions are easy to perturb, unlike semantic features. Since many malware detection mechanisms rely extensively on the information in the dex files [20], an attack that manipulates the features of the dex file is critical.

In this work, we perturb the features in the classes.dex file by injecting Dalvik opcodes that occur in benign Android applications. Chen et al. [21] proposed a similar attack that manipulates the API control flow graph to evade detection. However, their attack inserts Nop API calls that can be detected using white list filtering [21]. Moreover, their attack requires sophisticated adversarial feature perturbation techniques to achieve a high evasion rate. In this paper, we investigate an evasion attack in the form of a feature space attack in which the attacker subverts the malware detection mechanism by injecting the features of benign applications. Our attack is resilient against the feature selection approaches mentioned in [23]. In Android, launching an evasion attack is easier for malware developers than a data poisoning attack, since evasive malware can be downloaded through an update attack [5]. This motivated us to explore evasion attacks in static and dynamic Android malware detection mechanisms. We believe that our work will help security researchers develop suitable defensive mechanisms against adversarial attacks on Android malware detection classifiers.

3 Machine Learning for Android Malware Detection

Many works show the effectiveness of machine learning for malware detection. This section discusses the various machine learning models that have been used for malware detection. In [54], the authors proposed a classifier fusion approach that combines several machine learning classifiers for detecting malware. They combined classifiers such as J48, Random Tree-100, Voted Perceptron, REPTree and Random Tree-9. Among the various machine learning models used for malware detection, Support Vector Machines (SVM) are found to be extremely useful for detecting unknown malware. Justin et al. [41] proposed a malware detection mechanism using SVM to detect Android malware with control flow graphs (CFG). Shifu et al. [30] proposed Hindroid, a mechanism using standard multi-kernel learning with SVM to detect Android malware. Canfora et al. [17] proposed a malware detection mechanism using sequences of system calls with SVM to build a fingerprint of Android malware applications. Wen et al. [52] proposed a malware detection approach based on big data analytics and SVM that extracts various static and dynamic features of Android malware. Besides SVM, decision trees and random forests also gave excellent results in detecting malware. Peiravian et al. [37] proposed a malware detection method that uses permissions and API calls with decision trees to detect Android malware. Alam et al. [6] proposed a malware detection mechanism using Random Forest. The features used were battery consumption, CPU usage, memory related features, permissions etc. Moutaz et al. [8] proposed a malware detection mechanism using API calls and permissions. Their detection mechanism gave a 94.3% F-measure with the random forest classifier. Among all machine learning models, deep neural networks have gained popularity owing to their ability to detect malware without manual feature engineering [48]. Venkatraman et al. [46] proposed a malware detection mechanism based on a deep neural network that gave 96% accuracy. However, all these machine learning based malware detection mechanisms can be evaded by an adversary that crafts intelligent malware using adversarial machine learning.

4 Method of Attack

In this work, we evade the state of the art malware detection mechanisms [44] and [16], which use system calls and opcode n-grams as features, respectively. To select the features of benign applications that are injected into the malware application, we use TF-IDF feature selection. We chose TF-IDF since both of these malware detection mechanisms [16, 44] use the frequency counts of the opcode n-grams and system calls to construct the features for malware detection.
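The selection step can be illustrated with a minimal sketch. The exact TF-IDF variant is not specified in this section, so the code below assumes term frequency multiplied by a smoothed inverse document frequency, averaged over the benign applications. The function name `rank_by_tfidf` and the toy data are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def rank_by_tfidf(docs, top_k):
    """Rank tokens by their mean TF-IDF weight over `docs`.

    docs: list of token lists, one list of n-gram strings per application.
    """
    n_docs = len(docs)
    # Document frequency: number of applications that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = Counter()
    for doc in docs:
        if not doc:
            continue  # an empty application contributes nothing
        counts = Counter(doc)
        for tok, c in counts.items():
            tf = c / len(doc)
            # Smoothed IDF; this particular formula is an assumption.
            idf = math.log((1 + n_docs) / (1 + df[tok])) + 1.0
            scores[tok] += tf * idf / n_docs
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


# Toy benign corpus of opcode 2-grams (illustrative, not taken from the dataset).
benign = [
    ["invoke-virtual move-result", "move-result return", "invoke-virtual move-result"],
    ["invoke-virtual move-result", "const-string invoke-static"],
    ["invoke-virtual move-result", "move-result return"],
]
top = rank_by_tfidf(benign, 2)
```

The same routine applies unchanged to system call names, since it only assumes a list of tokens per application.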

5 Evading Opcode Based Android Malware Detection Mechanism

To evade the opcode based malware detection mechanism, an adversary may employ a code injection attack that injects benign Dalvik code parts into the malware application, or may insert junk code. Since a good feature selection approach can easily detect junk code insertion, we launch attacks in the form of benign opcode injection to test whether the classifier is still able to detect malware. This section discusses how an adversary can evade the opcode n-gram based Android malware detection mechanism presented in [16]. In [16], the opcode n-grams obtained from benign and malware applications are given to Support Vector Machines (SVM) and Random Forest for malware detection. An accuracy of 95.67% was obtained using opcode 5-grams with the SVM classifier, while Random Forest achieved an accuracy of 96.88% using opcode 2-grams.

5.1 Preprocessing

We replicated the experimental setup described in [16] to explore how an adversary evades the detection mechanism. For this, we took 5560 malware applications from the Drebin dataset [11] and collected 5560 benign applications. Table 1 shows the malware families used in the experiments, as in [16]. We took all the benign application categories mentioned in the original work. The benign applications were downloaded from the Google Play Store and uploaded to VirusTotal to check for malicious behaviour. Using apktool [50], we first extracted the .dex files from the apk. Then, using the smali tool [29], we extracted the smali files from the .dex files. These smali files contain the opcodes of an apk and can be used to construct opcode n-grams.
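As an illustration of the last step, the following sketch builds opcode n-grams from the text of a smali file. It is a simplification under stated assumptions: the first token of every non-directive line inside a method body is treated as an opcode, and annotation or payload blocks inside methods are not handled. The helper names and the sample method are hypothetical.

```python
from collections import Counter


def smali_opcodes(smali_text):
    """Return the opcode mnemonics found inside method bodies, in order."""
    opcodes = []
    in_method = False
    for raw in smali_text.splitlines():
        line = raw.strip()
        if line.startswith(".method"):
            in_method = True
        elif line.startswith(".end method"):
            in_method = False
        elif in_method and line and line[0] not in ".:#":
            # Skip directives (.), labels (:) and comments (#);
            # the first token of any other line is taken as the opcode.
            opcodes.append(line.split()[0])
    return opcodes


def ngram_counts(opcodes, n):
    """Count contiguous opcode n-grams, each represented as a tuple."""
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


# Hypothetical smali method used only for illustration.
SAMPLE = """
.method public onCreate(Landroid/os/Bundle;)V
    .locals 1
    invoke-super {p0, p1}, Landroid/app/Activity;->onCreate(Landroid/os/Bundle;)V
    const-string v0, "hello"
    invoke-static {v0}, Lcom/example/Util;->log(Ljava/lang/String;)V
    return-void
.end method
"""

ops = smali_opcodes(SAMPLE)
bigrams = ngram_counts(ops, 2)
```

Counting the n-grams of every application in this way yields the frequency features on which the classifier is trained.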

Table 1. Android malware families

5.2 Training the Classifier

We trained the classifiers as described in [16]. We used opcode n-grams with n = 2 and n = 5, since they gave the highest accuracy compared to n = 1, 3 and 4. We selected the top 2000 opcode n-grams that distinguish benign from malware applications using the technique described in the original paper. We trained two classifiers, Support Vector Machine (SVM) and Random Forest (RF), as in the original work. We obtained an accuracy of 96.3% with 1000 opcode 2-grams and an accuracy of 95.3% with 2000 opcode 5-grams. These values are approximately equal to those of the original work, as shown in Table 2.

Table 2. Performance of [16] before the attack
Table 3. Performance of [16] After the Attack

5.3 Testing the Classifier

To evaluate the performance of the classifier, we used the following metrics: True Positive Rate (TPR), False Positive Rate (FPR), True Negative Rate (TNR), False Negative Rate (FNR), accuracy, precision, recall and F-measure. True Positives (TP) is the number of malware applications that are correctly classified as malware. True Negatives (TN) is the number of goodware applications that are correctly classified as goodware. False Positives (FP) is the number of benign applications that are incorrectly classified as malware. False Negatives (FN) is the number of malware applications that are incorrectly classified as goodware. The accuracy, precision, recall and F-measure are computed as follows:

$$\begin{aligned} \text{Accuracy} = \frac{TP+TN}{TP+FN+TN+FP} \end{aligned}$$
(1)
$$\begin{aligned} \text{Precision} = \frac{TP}{TP+FP} \end{aligned}$$
(2)
$$\begin{aligned} \text{Recall} = \frac{TP}{TP+FN} \end{aligned}$$
(3)
$$\begin{aligned} \text{F-measure} = \frac{2\times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}} \end{aligned}$$
(4)
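
Equations (1) to (4) can be checked with a short sketch. The confusion matrix counts below are hypothetical and are not results from the experiments. FNR and FPR are added using their standard definitions, FN/(FN+TP) and FP/(FP+TN).

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (1)-(4) plus FNR and FPR from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fnr": fn / (fn + tp),  # share of malware classified as goodware
        "fpr": fp / (fp + tn),  # share of goodware classified as malware
    }


# Hypothetical counts for a test set of 100 malware and 100 benign samples.
m = detection_metrics(tp=90, tn=80, fp=20, fn=10)
```

For an evasion attack, the FNR is the most relevant of these quantities, since every evasive malware sample counts as a false negative.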

5.4 Dalvik Opcode Injection

The aim of the Dalvik opcode injection attack is to evade the classifier by injecting benign opcode n-grams. We use an opcode injection attack rather than an opcode elimination attack, since the latter may destroy the malware functionality. The attack is carried out by using the JADX tool to find the Java files corresponding to the smali files and then injecting the opcode sequences into the corresponding malicious class files.

We assume that the attacker has complete knowledge of the classifier and the features. To evade the detection mechanism, the attacker injects benign opcode n-grams. We computed the most frequent benign opcode n-grams using the TF-IDF method mentioned before and injected them to evade the detection mechanism employed in [16]. Table 4 shows the top five opcode 5-grams obtained using the TF-IDF method. In addition to the benign opcode n-grams obtained using TF-IDF, we also injected opcode n-grams for displaying text messages inside the malicious application to mimic legitimate application behaviour. Figure 1 shows the Java code for displaying a text message and its corresponding Dalvik code. Here we aim to explore how injecting junk or random text messages can evade the opcode n-gram based detection mechanism.

We conducted the experiments on the Random Forest classifier with opcode 2-grams, since it gave the maximum accuracy in the original work. To test the performance of the classifier, we took 500 benign applications and 500 opcode injected malware applications. We took 50 samples from each of the Android malware families listed in Table 1. The performance of the classifier when we inject l benign opcode n-grams is shown in Table 3. As l increases, the FNR also increases, which shows that the injected malware can evade the detection mechanism employed in [16]. Figure 2 shows how the detection accuracy of the classifier decreases as the value of l increases.
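The effect of the injection on the feature vector can be sketched in feature space. This is an illustrative view only: in the attack itself the opcodes are injected into the application code, and the number of copies per n-gram (`copies`) is a hypothetical parameter that is not reported here. The counts are made up.

```python
from collections import Counter


def inject_benign_ngrams(malware_counts, benign_ranked, l, copies=1):
    """Return new counts with the top-l benign n-grams added `copies` times each.

    Existing counts are never reduced, mirroring an injection-only attack.
    """
    perturbed = Counter(malware_counts)
    for gram in benign_ranked[:l]:
        perturbed[gram] += copies
    return perturbed


def relative_freq(counts):
    """Normalise raw counts to relative frequencies."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


# Hypothetical opcode 2-gram counts of one malware sample.
malware = Counter({("const-string", "invoke-static"): 6, ("sput-object", "goto"): 4})
# Hypothetical ranking of benign 2-grams, most characteristic first.
benign_ranked = [
    ("invoke-virtual", "move-result"),
    ("const-string", "invoke-static"),
    ("iget-object", "return-object"),
]

perturbed = inject_benign_ngrams(malware, benign_ranked, l=2, copies=5)
before = relative_freq(malware)
after = relative_freq(perturbed)
```

In this toy case the relative frequency of the 2-gram that occurs only in the malware drops from 0.4 to 0.2, although none of the original code is removed. A frequency based classifier therefore sees a profile that moves towards the benign class as l grows.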

Table 4. Top five opcode 5-grams obtained from the TF-IDF method.

Fig. 1. Injecting benign opcodes for displaying text messages

Fig. 2. Accuracy values of [16] after injecting l opcode \(n\)-grams

6 Evading System Call Graph Based Android Malware Detection Mechanism

System call based Android malware detection mechanisms are found to be extremely powerful in detecting malware that evades static detection mechanisms [33]. Most system call based Android malware detection mechanisms use the frequency of occurrence of system calls to detect malware [17, 25]. This is because certain system calls like read() and write() are invoked more frequently by malware than by goodware. This technique can be evaded by a system call injection attack in which the malware injects some rare or benign system calls at runtime [15]. In this section, we show how an adversary can evade the system call based Android malware detection mechanism in [44]. The mechanism in [44] employs graph signal processing to detect Android malware. In this mechanism, the frequencies of occurrence of the system calls are taken as the signals, and a graph shift operation is applied to the signals to obtain the processed graph signals. These graph signals are then fed into machine learning classifiers to decide whether the application is malicious.
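A minimal sketch of this kind of feature is given below. It assumes that the nodes of the digraph are the distinct system calls, that an edge links each call to the call that immediately follows it in the trace, and that the graph shift is multiplication of the frequency signal by the weighted adjacency matrix. These are standard graph signal processing choices and may differ from the exact construction in [44].

```python
from collections import Counter


def graph_shift_features(trace):
    """Return (calls, signal, shifted) for a system call trace.

    signal[i]  = frequency of calls[i] in the trace
    shifted[i] = sum_j A[i][j] * signal[j], with A[i][j] the number of
                 transitions from calls[j] to calls[i] (assumed shift operator).
    """
    calls = sorted(set(trace))
    index = {c: i for i, c in enumerate(calls)}
    n = len(calls)
    freq = Counter(trace)
    signal = [freq[c] for c in calls]
    adj = [[0] * n for _ in range(n)]
    for src, dst in zip(trace, trace[1:]):
        adj[index[dst]][index[src]] += 1  # directed edge src -> dst
    shifted = [sum(adj[i][j] * signal[j] for j in range(n)) for i in range(n)]
    return calls, signal, shifted


# Toy trace, illustrative only.
calls, signal, shifted = graph_shift_features(["open", "read", "read", "write", "close"])
```

Because both the signal and the edge weights are derived from call counts, adding benign system call sequences to a trace changes the signal as well as the shifted signal. This dependence on counts is what a frequency based injection relies on.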

6.1 Preprocessing and Signal Extraction

We took 2500 malware and goodware applications, as in the original work [44], to replicate the experimental setup. The malware samples were taken from Drebin [11], AMD [10], and Contagio minidump [22]. We used malware families 1, 2, 5, 6, 9 and 10 listed in Table 1 together with the malware families in Table 5, which covers all the malware families used in [44]. The benign applications were downloaded from the Google Play Store and checked with VirusTotal for malicious behaviour. We also eliminated semantically similar Android malware. The Android applications were run in an emulator while a thousand pseudorandom events, such as key press and touch events, were injected to achieve high code coverage. We collected system calls using the strace utility [43], eliminated irrelevant system calls as described in [44], and selected only the relevant system calls to replicate the features used for classification. After selecting the relevant system calls, we constructed the system call digraph and extracted the graph signals.
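The trace collection step can be illustrated with a small parser for strace output. The sketch assumes the usual text format in which each line starts with an optional process id followed by the system call name and an opening parenthesis. The set of relevant calls passed to the function is a hypothetical placeholder, since the filtering rules of [44] are not reproduced here.

```python
import re

# Optional "[pid N] " or "N " prefix, then the system call name and "(".
_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def parse_strace(lines, relevant=None):
    """Extract system call names from strace output lines, in order.

    Signal lines, exit notices and "<... resumed>" continuations are skipped.
    If `relevant` is given, only calls in that set are kept.
    """
    names = []
    for line in lines:
        m = _CALL.match(line.strip())
        if m and (relevant is None or m.group(1) in relevant):
            names.append(m.group(1))
    return names


# Made-up strace lines for illustration.
sample = [
    '[pid  2310] openat(AT_FDCWD, "/data/app.db", O_RDONLY) = 5',
    '2310 read(5, "abc", 3) = 3',
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    "<... futex resumed> ) = 0",
    'write(1, "ok", 2) = 2',
    "+++ exited with 0 +++",
]

all_calls = parse_strace(sample)
kept = parse_strace(sample, relevant={"read", "write"})
```

The resulting list of names is the trace from which the system call digraph and the frequency signal are built.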

6.2 Training the Classifier

We trained the classifier as described in the original paper. We used the Random Forest classifier, since it gave the maximum accuracy. We took 80% of the samples for training and the remaining 20% for testing, as in the original work [44].

6.3 Testing the Classifier

The accuracy, precision, recall and F-measure were computed as in Section 5.3. Table 6 shows the performance metrics of the classifier.

6.4 System Call Injection

The system call graph signal based detection mechanism uses the frequency of occurrence of the system calls to construct the graph signal. This mechanism can be evaded by injecting benign system call code that mimics legitimate application behaviour. We model a perfect knowledge attack in which the attacker has complete knowledge of the features and the classification model. The attacker can gather malware and benign system calls from public repositories and examine the most frequent system calls occurring in malware and benign applications. We inject sequences of system calls rather than individual system calls, since the application may not work properly if individual calls are injected. To inject a system call sequence, we first computed the most frequently occurring benign system calls from goodware applications using the TF-IDF method. We found that certain system calls like unlink(), mkdir() and chmod() are frequently invoked by benign applications. Our attack is similar to the attack described in [15]. We carefully selected the benign applications with the highest counts of the most frequent benign system calls and then injected their system call sequences to evade detection. We took 10 malware samples from each malware family and built a test set of 270 system call injected malware samples and 270 benign samples. Table 7 shows how the detection accuracy is reduced.
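The sequence based injection can be sketched at the trace level. The code below picks the most frequent contiguous system call sequences of length `k` from benign traces and appends them to a malware trace. The sequence length, the number of repetitions and the choice to append at the end of the trace are hypothetical simplifications, and the traces are made up.

```python
from collections import Counter


def most_common_sequences(benign_traces, k, top):
    """Return the `top` most frequent contiguous length-k sequences."""
    counts = Counter()
    for trace in benign_traces:
        counts.update(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))
    # Sort by descending count, breaking ties by the sequence itself.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [seq for seq, _ in ranked[:top]]


def inject_sequences(malware_trace, sequences, repeat=1):
    """Append each benign sequence `repeat` times; the original trace is kept intact."""
    injected = list(malware_trace)
    for seq in sequences:
        injected.extend(list(seq) * repeat)
    return injected


# Toy traces for illustration.
benign_traces = [
    ["mkdir", "chmod", "unlink", "read"],
    ["open", "mkdir", "chmod", "unlink"],
    ["mkdir", "chmod", "write"],
]
malware_trace = ["read", "write", "read", "write"]

best = most_common_sequences(benign_traces, k=2, top=1)
evasive = inject_sequences(malware_trace, best, repeat=2)
```

Injecting whole sequences keeps the calls in an order that already occurs in benign applications, which is the reason given above for preferring sequences over individual calls.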

Table 5. Android malware families
Table 6. Performance of [44] before the attack
Table 7. Performance of [44] after the attack

7 Conclusion and Future Work

In this paper, we showed how an adversary can evade the static and dynamic Android malware detection mechanisms employed by some state of the art machine learning models. We showed that adversaries can induce misclassification by injecting only a few features. In future, we plan to model a limited knowledge attack and a black-box attack to evade the system call and opcode based malware detection mechanisms, in order to explore how the adversary evades the detection model with little or no knowledge of the classifier. We also plan to develop suitable mechanisms to detect adversarial malware.