1 Introduction

Cyber attacks are on the rise and have emerged as a disruptive force against cybersecurity, driven by the proliferation of malware and the increasing sophistication of its development. According to the PurpleSec trend report, cybercrime launched through malware surged by 600% during the pandemic lockdowns and the accompanying digital convergence.Footnote 1 Cyber threats were identified as the top human-caused risk, with an estimated financial loss of around $11.4 million per minute, as documented in the CrowdStrike Global Threat Report (2021).Footnote 2 Furthermore, malware attacks targeting various industries, including education, healthcare, technology, and government, have increased, as reported in the list of global industry sectors most targeted by malware incidents in 2020.Footnote 3

Malware, a collective term for malicious programs, consists of executable code that can compromise an organization's security. Commonly, all potentially unwanted software, such as viruses, trojans, worms, adware, ransomware, spyware, keyloggers, and rootkits, is referred to as malware. Malware is generally classified according to its functionality, mode of propagation, and impact on the system. Securing interconnected devices is paramount for modern organizations, as cyber threats continue to evolve and pose significant risks, and understanding the various types of malware and their infection mechanisms is crucial for protecting systems and data effectively. Malware analysis involves the techniques and tools for understanding malware behavior in order to build defense strategies and prepare systems to withstand future attacks. Anti-malware vendors generally employ signature-based and heuristic-based approaches to detect and remove malicious files before they affect the system. The former matches strings extracted from suspicious samples against a repository of known signatures; it cannot detect new malware strains, and maintaining a massive signature database is practically infeasible. Heuristic-based approaches are generally categorized into static, dynamic, and hybrid analysis. Static analysis examines the code obtained by decompiling the allegedly malicious software; it fails on encrypted and obfuscated code, which defeats reverse engineering. Conversely, dynamic approaches unpack and execute the suspected binary in a virtual machine or sandbox; however, such methods have limited code coverage and are computationally expensive.

Machine learning (ML) and deep learning (DL) algorithms have gained popularity for malicious software detection using attributes extracted through both static and dynamic approaches [1,2,3]. Malware detection using ML techniques became popular due to (1) the availability of labeled malware feeds and (2) the reduction in hardware costs, and ML-based solutions have been widely accepted by researchers in both industry and academia. However, these methods require substantial time and resources to extract relevant features through feature engineering. Researchers have therefore explored image visualization techniques for recognizing visual similarities within malware families and for malware detection and classification [4]. This approach departs from traditional methods that rely on intricate feature engineering and instead leverages the inherent visual patterns present in malware samples. The shift offers several advantages [4], including (a) reduced reliance on domain-specific knowledge, (b) enhanced robustness against obfuscation, and (c) improved generalizability.

In this approach, malware binaries are visualized as grayscale or color images [5], enabling the application of deep learning techniques such as Convolutional Neural Networks (CNNs), autoencoders, and Long Short-Term Memory (LSTM) networks for malware detection and classification. While these deep learning methods have demonstrated promising results, they exhibit a critical vulnerability to adversarial examples [6, 7].

This paper aims to classify malicious executable variants into their respective families by employing visualization and deep learning algorithms. The proposed approach applies the Local Binary Pattern (LBP) operator to extract distinctive textural features from malware samples, transforming them into visual representations. These LBP-generated images are fed into pre-trained Convolutional Neural Networks (CNNs) such as ResNet50 and VGG16, along with customized classifiers, to classify malware samples effectively. Furthermore, to assess the efficacy of the proposed approach, we repeated the experiments using images without LBP conversion. Using LBP, the proposed model classifies samples into their corresponding families with F1 scores in the ranges of 0.90\(-\)0.995, 0.989\(-\)0.998, and 0.993\(-\)0.999 on the BIG2015, MalImg, and Malhub datasets, respectively. Moreover, we supplied the model with adversarial samples generated using additive noise and observed reductions of 4.1% for MalImg, 11.9% for BIG2015, and up to 98.5% for Malhub in their respective maximum F1 scores. In general, the major contributions of this research work are as follows:

  1. Developing a novel deep learning technique that integrates malware visualization and family categorization, incorporating textural features extracted with the Local Binary Pattern (LBP) operator on grayscale images.

  2. Providing a comprehensive analysis of different image dimensions on two well-known benchmark datasets, MalImg and the Microsoft Malware Classification Challenge (BIG2015), and on the custom Malhub dataset.

  3. Conducting a rigorous analysis by executing a variety of conventional machine learning and cutting-edge deep learning designs on three different datasets.

  4. Analyzing the performance through a comparative analysis of visualization-based methods for malware classification. The results show that our proposed approach can classify samples with the highest F1 scores of 0.995 (64x64), 0.998 (128x128), and 0.999 (128x128) for the BIG2015, MalImg, and Malhub datasets, respectively, which is superior to other methods in the literature.

  5. Demonstrating white-box attacks by generating tainted examples. The experiments show a marginal drop in performance for classifiers trained on MalImg (1.6\(-\)4.1%) using the ResNet50+CNN+Dense Layer model. With the fusion model, MalImg images were resilient to adversarial samples except for Pepper noise (1%). In contrast, ResNet50+CNN+DL trained on BIG2015 samples was resilient to evasion attacks, while the fused ResNet50||CNN||DL experienced a drop of at most 11.9%. For the Malhub dataset, the drop is in the range of 0.7\(-\)98.5%.

The remainder of the paper is organized as follows: Sect. 2 reviews the research status in malware classification. Section 3 presents a detailed explanation of the proposed method. Section 4 expounds on the evaluation measures and details the results. Finally, Sect. 5 summarizes our findings and provides directions for future work.

2 Related works

Malware detection approaches can be categorized into three main types: static analysis-based, behavior analysis-based, and visualization-based. This section offers insights into existing malware detection techniques by reviewing related works within the above-mentioned categories.

2.1 Static analysis

Static malware analysis is a technique used to examine the code and properties of a suspected malware sample without executing it, eliminating the risk of infecting the analysis environment. Binary analysis tools such as IDAPro, PEStudio, etc., are used in static analysis techniques to extract static features like strings, file hashes, API calls, opcode sequences, etc., from executable binaries.

In [8], the authors utilized a list of DLLs, functions, function calls within DLLs, encoded strings, and byte sequences. Using the multinomial Naïve Bayes algorithm, they attained 97.11% accuracy on a dataset consisting of 3265 malware samples and 1001 benign executables. However, this approach requires feature engineering tools and domain expertise. Moreover, the scarcity of benign examples compared to malware files introduces a divergence from realistic scenarios.

In [9], Kolter et al. generated n-grams from executables and tested the accuracy of different machine learning algorithms and their combined versions. They classified malware into multiple families and obtained the best accuracy with boosted J48. However, their approach did not take the overhead time into account.

In [10], Santos et al. proposed a semi-supervised learning method for malware detection, since obtaining labeled datasets is strenuous. They used byte n-gram distributions with Learning with Local and Global Consistency (LLGC) and achieved 86% detection accuracy. Although this accuracy is lower than that of other models, they could estimate the number of labeled samples required for learning while maintaining reasonable classification results. However, they considered a very small number of samples and disregarded the overhead time.

In [11], the authors used variable-length instruction sequences to categorize benign and malicious classes using machine learning, obtaining 96% accuracy with Random Forests and Decision Trees. A machine learning method based on n-opcode sequences was presented in [12]; the authors employed a Support Vector Machine (SVM) classifier, achieving 98% accuracy. However, the proposed approach necessitates feature engineering techniques.

2.2 Dynamic analysis

In dynamic analysis-based techniques, researchers extract behavioral features by executing the samples within a controlled sandbox environment. In [13], the authors proposed a method that applies artificial intelligence techniques to behavior-based malware analysis and classifies malware into worms and trojans. CWSandbox and Anubis generated behavior profiles for the collected samples. Their manual, analysis-driven approach lacked the scalability and feasibility required for practical implementation.

The authors in [14] proposed an incremental method using clustering and classification to create behavioral profiles that capture changes across malware executions, reporting an accuracy of 88% with SVM. Their approach is limited to a single execution path and can be evaded either by detecting the sandbox environment or by mimicking different behaviors. In [15], the authors introduced an alternative approach that uses DNA sequence alignment algorithms to identify common API call sequence patterns. However, their approach relies on whitelist and blacklist filtering and therefore requires a regularly updated list of trusted benign and malicious programs. Additionally, the hooking process in the study only traces user-level APIs, so API call sequences cannot be logged if the malware uses kernel-level APIs. Anderson et al. [16] suggested a novel approach to detect malware by applying Markov chain graphs to instruction traces collected during execution. They employed machine learning algorithms to categorize the data based on global and local similarity, achieving an accuracy of 96.41%. However, this approach incurs additional computational overhead.

Malware detection and classification traditionally involve static analysis, requiring the disassembly of malware samples into .asm files to extract opcodes and operands; the subsequent generation of n-grams or function call graphs for analysis is computationally expensive and time-consuming. In contrast, dynamic analysis executes samples in a virtualized environment, extracting system calls, memory artifacts, and malware traces, but it is also resource-intensive, time-sensitive, and demands human interaction. Image-based malware detection therefore offers an efficient alternative: reading raw data to generate byte plots and utilizing CNN models for feature extraction significantly reduces the time and resources required for analysis, providing a more streamlined and practical approach to malware detection and classification.

2.3 Visualization based techniques

Visualization-based malware analysis proves resilient against obfuscation techniques, focusing on the visual representation of malware files. This approach uses resources efficiently, requiring fewer computational assets than traditional static and dynamic methods. Its non-execution nature distinguishes it from dynamic analysis, minimizing the risk of triggering malicious behavior and ensuring a safer analytical environment. Leveraging Convolutional Neural Networks (CNNs) allows effective feature extraction from image representations, empowering the model to discern intricate patterns within the data, and the visual, interpretable nature of image-based representation facilitates intuitive pattern identification by analysts. However, despite its strengths, image-based analysis encounters challenges. It may lack a comprehensive understanding of malware functionality, prioritizing pattern and anomaly detection over detailed behavioral insights. The quality of the generated images significantly impacts the effectiveness of the analysis, posing a challenge for researchers, and vulnerability to adversarial attacks raises concerns about the reliability of the analysis. Addressing these challenges is crucial to align the analysis method with specific malware analysis goals and sample characteristics.

The authors in [17] proposed the initial work in malware visualization, utilizing self-organizing maps to visualize virus binaries. Nataraj et al. [18] visualized malware binaries from 25 families as grayscale images and extracted GIST features, reporting an accuracy of 97.18% with a K-Nearest Neighbour classifier. Evaluation on adversarial examples was not performed, and such modified samples can exploit the global image-based features to evade detection. Researchers have also applied deep learning techniques [19] to grayscale images of benign and malware binaries and demonstrated performance equal to that of machine learning models trained with GIST descriptors [20]. However, the experiment involved only a few malware samples and overlooked the overhead time. The authors in [21] integrated dynamic analysis with image processing techniques to identify unpacked and packed malware samples and concluded that the approach is not as scalable as textural analysis.

In [19], the authors utilized grayscale image representations to visualize malware and benign binaries. They trained the model using deep learning techniques and achieved a test accuracy of 95.66% on a dataset comprising 10,000 benign and 200 malware files. However, their system lacked detailed insight into the design and attributes of the malware and, notably, failed to consider the execution overhead.

Detection of IoT malware using one-channel grayscale images was proposed in [22], achieving a classification accuracy of 94% for DDoS malware using a CNN. The researchers in [23] and [24] applied deep Convolutional Neural Networks to the MalImg dataset, obtaining 94.5% and 98.48% accuracy, respectively. However, they designed notably shallow network structures, and their samples were restricted to only two malware families. Another CNN-based approach, proposed in [25], obtained 97.02% accuracy. Researchers have also investigated hybrid models, such as CNN with bi-directional Gated Recurrent Units, as discussed in [26], and CNN-LSTM, as explored in [27], for classifying malware. However, the authors of these papers neglected to consider overhead time.

Recently, transfer learning [28, 29] has been adopted to classify malware samples into their corresponding families. Deep CNNs trained on natural images are reused, and the extracted features are utilized to identify characteristic attributes of malware represented as images. In [30], transfer learning with the InceptionV1 architecture was applied to malware detection using grayscale images from the MalImg and Microsoft Malware Classification datasets: the multi-class classification achieved 99.25% accuracy and 0.03% false positives on the MalImg dataset. A dataset with 16,518 benign and 10,639 malware files was used, and 99.67% accuracy was obtained for binary classification. However, concerns were raised about the exclusion of small malware files and testing only against known classes, suggesting improvements in experimental design to handle more realistic data. The authors in [31] investigated the effectiveness of DenseNet in image classification using the visual similarity of malware families. They employed DEAM (Depthwise Efficient Attention Module) and DenseNet for malware detection and family classification. Their results showed 98.5% accuracy for the MalImg dataset, 97.3% for the BIG 2015 dataset, and 99.3% accuracy for a custom dataset composed of both datasets.

Another approach presents the FDL-CADIS model, a fusion of deep learning-based models [32]. This approach converts malware into two-dimensional images and applies them to MobileNetV2, with Black Widow Optimization used for tuning the hyperparameters. Furthermore, an ensemble of voting-based classifiers incorporating Gated Recurrent Unit and Long Short-Term Memory techniques achieves accuracies of 98.73% on the MalImg dataset and 98.83% on the BIG2015 dataset. However, the exclusive emphasis on malware detection and the absence of information about the utilized features hinder the assessment of their significance. The authors in [33] introduced a malware classification method based on deep learning that relies on a one-dimensional representation of the raw binary. However, none of these state-of-the-art models evaluated the impact of adversarial attacks.

Machine learning's dominance in malware detection has made it a prime target for attackers. By manipulating data distributions during training or testing, attackers can trick machine learning classifiers into misclassifying malicious software; this manipulation is called an adversarial attack. Among recent works, Demontis et al. [34] presented a case study of attack vectors against Android malware classifiers. Grosse et al. [35] demonstrated techniques to evade deep neural networks; they utilized the DREBIN dataset of Android malware and reported a misclassification rate of 69%. Similarly, Chen et al. crafted adversarial examples by poisoning syntactic features in [36] and reported that the attack successfully defeated three popular Android malware classifiers: DREBIN, DroidAPIMiner, and MaMaDroid.

Unlike the methods presented above, our work integrates pre-trained deep neural networks designed for object detection with Convolutional Neural Networks, which are known for extracting features automatically. We further supplemented this approach with dense layers to detect new visual images of malware. Additionally, to ensure the scalability of the proposed malware detector, we created a custom dataset containing real-world malware samples from 20 distinct malware families, in addition to utilizing benchmark datasets. Furthermore, we thoroughly examined how adversarial attacks, implemented through the introduction of various types of additive noise, impact the performance of the proposed model, assessing whether even these fundamental noise types could result in misclassification of the samples.

3 Proposed method

In this section, we briefly discuss the architecture of the proposed solution (refer Fig. 1).

Fig. 1 Proposed architecture for malware family classification: (a) grayscale image generation, (b) feature extraction with and without LBP, (c) malware classification, (d) adversarial attack on the model

3.1 Dataset preparation

The dataset includes malware executables collected from the MalImg dataset [18] and BIG 2015 (Microsoft Malware Classification Challenge) [37]. Moreover, we created our own dataset, which we refer to as the Malhub dataset throughout this article. The MalImg dataset contains 9,339 malware executables from 25 different families; the grayscale image depicting each executable was produced by transforming individual byte values into pixels. We utilized 10,868 labeled malware executables belonging to 9 distinct families from the BIG 2015 dataset. Each file contains raw data, encompassing the hexadecimal representation of a binary file and associated metadata comprising strings, function calls, and opcodes. Finally, the Malhub dataset consists of 26,452 samples of 20 families collected from the VirusShareFootnote 4 repository. Tables 1, 2, and 3 list the datasets along with details such as family names and the number of samples. To ensure the effectiveness and generalizability of our model, we divided each dataset into three subsets: a training set comprising 70% of the malware samples, a validation set containing 20% of the samples, and a test set comprising the remaining 10%. This approach enabled us to train the model on a substantial amount of data, evaluate its performance on a separate collection of samples, and assess its generalizability to unseen malware images.
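A minimal sketch of such a 70/20/10 split is given below; the stratification by family label, the random seed, and the function name are illustrative assumptions rather than the exact procedure used in the experiments.

```python
# Hypothetical illustration of the 70/20/10 train/validation/test split
# described above; image_paths and labels are assumed to be prepared elsewhere.
from sklearn.model_selection import train_test_split

def split_dataset(image_paths, labels, seed=42):
    # First carve out the 10% test set, stratified by family label.
    train_val_paths, test_paths, train_val_y, test_y = train_test_split(
        image_paths, labels, test_size=0.10, stratify=labels, random_state=seed)
    # From the remaining 90%, take 2/9 (about 20% of the full set) for validation.
    train_paths, val_paths, train_y, val_y = train_test_split(
        train_val_paths, train_val_y, test_size=2 / 9,
        stratify=train_val_y, random_state=seed)
    return (train_paths, train_y), (val_paths, val_y), (test_paths, test_y)
```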

Table 1 The MalImg dataset comprises images exclusively in PNG format belonging to 25 malware families
Table 2 The Microsoft Classification Challenge (BIG 2015) consists of images in PNG format, all with a resolution of 128x128 pixels categorized into 9 families
Table 3 The Malhub dataset consists of images in PNG format, all with a resolution of 256x256 pixels, comprising real-time malware from 20 families

3.2 Grayscale image generation

Malicious programs exist in the form of Portable Executables (PE), which have extensions such as .bin, .dll, and .exe and consist of code, data, and resource sections. In recent years, researchers have framed the malware classification task as an image categorization problem [18]. Obfuscation methods can undermine binary feature extraction, whereas grayscale images consisting of textural patterns can reveal these features through inspection. We therefore represent the raw malware binary as an image and extract patterns from the images.

Fig. 2 Grayscale images of binary executables: the bounding boxes highlight the similar patterns exhibited by variants of the same family, while samples from different families exhibit dissimilar patterns

The proposed method splits the raw file into 8-bit strings and maps each of them to a decimal value between 0 and 255. Thus, we organize each executable into a one-dimensional (1D) vector of decimal numbers. Using the decimal values in the 1D array, we construct a two-dimensional (2D) image with predetermined width and height. The process of grayscale image generation is described in Algorithm 1. Figure 2 shows the grayscale image representation of samples from different families. We observed that variants of the same family exhibit homogeneous textural features. As malware versions inherit code from the original sample (i.e., code reuse) [19, 22], texture analysis can reveal hidden patterns that improve the performance of the model during the inference phase. Primarily, attackers modify a small chunk of the original code to generate obfuscated variants, which nevertheless inherit the code of the base malware. Hence, code reuse is evident in Fig. 2: the variants of the Swizzor_gen family are visually similar, whereas samples of the Agent_FYI family show different byte patterns compared to Swizzor_gen (refer to Fig. 2).

Algorithm 1 Grayscale image generation
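As a concrete illustration of this byte-to-pixel conversion, a minimal Python sketch is given below; the fixed row width and the zero-padding of the final row are assumptions made for the sketch, and the resulting image would subsequently be resized to 64x64 or 128x128 as described later.

```python
import numpy as np
from PIL import Image

def binary_to_grayscale(path, width=256):
    """Read a raw executable and render its bytes as a 2-D grayscale image.

    Each byte (8 bits) already maps to an intensity in 0-255; the 1-D byte
    vector is reshaped row by row into an image of the given width.
    """
    data = np.fromfile(path, dtype=np.uint8)           # 1-D vector of byte values
    height = int(np.ceil(len(data) / width))
    padded = np.zeros(height * width, dtype=np.uint8)  # pad the last row with zeros
    padded[:len(data)] = data
    return Image.fromarray(padded.reshape(height, width), mode="L")

# Example usage (hypothetical file name):
# img = binary_to_grayscale("sample.exe", width=256)
# img.resize((128, 128)).save("sample.png")
```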

3.3 Feature extraction

Traditional malware detection methods rely on extracting distinctive features from malicious executables to identify their family affiliations. Our approach utilizes Local Binary Patterns (LBP) to capture fine-grained details from grayscale images representing specific malware families. To evaluate the effectiveness of LBP-transformed images in malware classification, we conduct a comparative analysis against normal images (i.e., without LBP transformation).

Fig. 3 Generation of an LBP image

Algorithm 2 LBP image generation

3.3.1 LBP image generation

In 1994, T. Ojala introduced the concept of local texture features [38]. LBP is a robust technique for extracting texture features from grayscale images and exhibits resilience to object rotation. LBP generates a sequence of binary bits for a given image by applying a thresholding operation: in a 2D pixel matrix, the center pixel is used as the threshold against its surrounding pixels in a circular neighborhood. A neighbor value less than the center pixel is encoded as zero (0), while any other value is encoded as one (1). The resulting binary sequence is read in a clockwise direction (as illustrated in Fig. 3). Finally, the sequence of bits is converted into a decimal value, which represents the LBP code for the center pixel. The grayscale textural characteristic, T, can be considered the joint distribution of gray shades of p neighborhood pixels, represented as:

$$\begin{aligned} T = \tau ~(g_{c},g_{0},g_{1} ... , g_{p-1}) \end{aligned}$$
(1)

where p is the number of pixels in a circularly symmetric neighborhood (i.e., p = 0, 1, 2, ..., 7 for a 3 × 3 block) on a ring of radius r \((r>0)\), \(g_{p}\) is the \(p^{th}\) neighborhood pixel, and \(g_{c}\) is the center pixel. To determine T, we subtract the gray level of the central pixel from the gray levels of the surrounding pixels.

$$\begin{aligned} T= & {} \tau (g_{0}-g_{c},g_{1}-g_{c}...,g_{p-1}-g_{c}) \end{aligned}$$
(2)
$$\begin{aligned} LBP_{p,r}= & {} \sum _{p=1}^{\#neighborhood} S(g_{p}-g_{c}){2^{p-1}} \end{aligned}$$
(3)
$$\begin{aligned} S(x)= & {} \left\{ \begin{matrix} 1 &{} if \ x \ge 0 \\ 0 &{} otherwise \end{matrix}\right. \end{aligned}$$
(4)

S(x) is the binary threshold function. Figure 3 depicts the LBP operation with eight neighboring pixels. Here, the center pixel value, 125, is taken as the threshold; neighbor pixels are assigned 1 if their value is greater than or equal to 125 and 0 otherwise. The LBP value is calculated as \(1*2^{0} + 0*2^{1} + 0*2^{2} + 1*2^{3}+1*2^{4}+0*2^{5}+1*2^{6}+0*2^{7} = 89\).

We create LBP images in both 64x64 and 128x128 sizes. For each pixel in the target image, we calculate the LBP value using equations 3 and 4. Algorithm 2 outlines the entire process of LBP image generation.
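For illustration, a compact sketch of the per-pixel computation in Eqs. 3 and 4 is shown below. The clockwise neighbor ordering starting from the top-left pixel is an assumption made for this sketch; in practice, an equivalent library routine such as skimage.feature.local_binary_pattern can also be used.

```python
import numpy as np

# Offsets of the 8 neighbours of a pixel, read clockwise from the top-left
# (the ordering is an illustrative assumption; any fixed ordering yields a
# valid LBP code as long as it is applied consistently).
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray):
    """Compute the 3x3 LBP code of every interior pixel of a grayscale image."""
    gray = gray.astype(np.int16)
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            center = gray[i, j]
            code = 0
            for p, (di, dj) in enumerate(_OFFSETS):
                # S(x) = 1 if neighbour >= center else 0, weighted by 2^p (Eqs. 3-4)
                if gray[i + di, j + dj] >= center:
                    code |= (1 << p)
            out[i, j] = code
    return out

# Equivalent library call (radius 1, 8 neighbours):
# from skimage.feature import local_binary_pattern
# out = local_binary_pattern(gray, P=8, R=1, method="default")
```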

3.4 Classification model

In our strategy for visualizing malware, we leverage a deep neural network architecture that incorporates a pre-trained base model, as shown in Fig. 4. The LBP and non-LBP images produced in the feature extraction phase are input to the pre-trained model [39].

Fig. 4 Deep learning model architecture for malware family classification (proposed model): (a) ResNet50 followed by three convolutional layers with ReLU and a pooling layer, two dense layers, and a final output layer with a Softmax activation function; (b) three convolutional blocks incorporating ReLU activation and pooling layers, concatenated with ResNet50, followed by two dense layers and a final output layer with a Softmax activation function

3.4.1 ResNet50

In visualization-based classification, researchers have widely utilized Convolutional Neural Networks (CNN) [40] because CNNs can automatically extract distinctive local features from visual representations. The convolutional layers serve as filters in detection, identifying particular patterns within images. Within a CNN, the initial layers extract basic features, while deeper layers progressively capture more complex features. Towards the end of the CNN architecture, fully connected layers combine all the specific attributes from the preceding layers to produce the final output.

Simply increasing the depth of a neural network does not keep improving performance: accuracy begins to saturate and then degrades, a problem associated with vanishing gradients. To address this challenge, we adopted ResNet50, a deep convolutional neural network that incorporates shortcut (skip) connections between blocks of convolutional layers, with the ReLU activation function applied in the subsequent layer. The ResNet50 architecture uses a bottleneck residual block composed of three convolutional layers (1 × 1, 3 × 3, and 1 × 1 convolutions), allowing the model to learn an identity function. The 1 × 1 layers shrink and then expand the dimensions, leaving the 3 × 3 layer as a bottleneck with a reduced number of input/output parameters. Convolutional layers with a stride of 2 perform downsampling, and batch normalization is applied before ReLU. The design directives for the residual blocks are as follows: (1) allocate an equal number of filters to produce output feature maps of comparable sizes, and (2) double the number of filters when the size of the feature map is halved. The network has 50 weighted layers with a global average pooling layer and a final Softmax layer (Fig. 5).
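For illustration, a Keras sketch of one such bottleneck block (1 × 1, 3 × 3, 1 × 1 convolutions with a skip connection and batch normalization before ReLU) might look as follows; the filter counts and the projection shortcut are assumptions made for the sketch, not the exact block configuration used in the pre-trained network.

```python
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=1):
    """Illustrative ResNet bottleneck block: 1x1 -> 3x3 -> 1x1 with a skip path."""
    shortcut = x
    y = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)   # shrink channels
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)                   # 3x3 bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(4 * filters, 1, padding="same")(y)               # expand channels
    y = layers.BatchNormalization()(y)
    # Projection shortcut when shapes differ (downsampling or channel change).
    if stride != 1 or shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride, padding="same")(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])                                    # skip connection
    return layers.Activation("relu")(y)
```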

Fig. 5 The ResNet50 architecture incorporates residual blocks, each composed of two convolutional layers followed by a batch normalization layer and ReLU activation. Through the residual connections, the block input is forwarded directly and added to the block output before the final ReLU activation

ResNet50 is used in a transfer learning [41] setting to improve neural network performance by transferring knowledge learned on a similar task. The initial layers discover minute details of images, while the deeper layers extract generic byte patterns representative of the family. ResNet50 effectively enables training on substantially smaller target datasets without encountering overfitting issues.

We evaluated the performance of ResNet50 by feeding its output to (a) a convolutional neural network with a pooling layer and two dense layers, (b) a fused model, as represented in Fig. 4, in which three convolutional blocks with pooling are concatenated with the ResNet50 pre-trained model and followed by two dense layers and a final output layer, and (c) various ML algorithms such as SVM, RF, and XGBoost. Each of the above cases was assessed using images of dimensions 64x64 and 128x128, respectively. We present the set of optimal hyperparameters selected during the learning process in Table 4.
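A hedged Keras sketch of variant (b), the fused ResNet50||CNN||Dense model, is given below. The filter counts, dense-layer widths, and dropout value are placeholders (the selected hyperparameters are those in Table 4), and grayscale images are assumed to be replicated to three channels to match ResNet50's expected input.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_fusion_model(input_shape=(128, 128, 3), num_classes=25):
    """Illustrative fusion of a frozen ResNet50 branch with a small CNN branch."""
    inputs = layers.Input(shape=input_shape)

    # Branch 1: pre-trained ResNet50 used as a fixed feature extractor.
    resnet = ResNet50(weights="imagenet", include_top=False, input_shape=input_shape)
    resnet.trainable = False
    b1 = layers.GlobalAveragePooling2D()(resnet(inputs))

    # Branch 2: three convolutional blocks (ReLU + max pooling), as in Fig. 4b.
    b2 = inputs
    for filters in (32, 64, 128):                  # placeholder filter counts
        b2 = layers.Conv2D(filters, 3, activation="relu", padding="same")(b2)
        b2 = layers.MaxPooling2D()(b2)
    b2 = layers.GlobalAveragePooling2D()(b2)

    # Concatenate both branches, then two dense layers and a Softmax output.
    x = layers.Concatenate()([b1, b2])
    x = layers.Dense(256, activation="relu")(x)    # placeholder widths
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The sequential variant (a) follows the same pattern but stacks the convolutional blocks and dense layers directly on top of the ResNet50 feature maps instead of concatenating two branches.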

Table 4 Dropout and activation for the layers

3.4.2 VGG16

Leveraging the pre-trained VGG16 deep CNN and its 16-layer architecture, we conducted an investigation using the network’s weights and the architecture proposed in Fig. 4 in conjunction with various machine learning algorithms. Figure 6 demonstrates how the VGG16 network is employed for malware classification.

Fig. 6 VGG16 feature extraction illustration: (1) the input to VGG16; (2) the convolutional blocks along with pooling; (3) the prediction layer, which flattens the output of the final convolutional block and is followed by a Softmax layer

Understanding grayscale images generated from malware PE files poses a challenge, but CNNs, known for their effectiveness in image-based classification, offer a robust solution. Notably, established CNN architectures like ResNet50 and VGG16 efficiently extract crucial features, saving time and resources. The selection of these architectures is motivated by their ability to leverage pre-trained weights and biases from the ImageNet database, simplifying deployment without retraining on specific datasets. Despite differences between natural and malware byte-plot images, transferring VGG16 parameters enhances malware image recognition effectiveness through transfer learning. The preference for VGG16 in this study stems from its ability to handle large-scale datasets through deeper network layers and smaller filters, contributing to enhanced performance.
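A minimal sketch of this pre-trained-extractor pipeline, with VGG16 features fed to one of the conventional classifiers discussed above (an SVM here), is shown below; the input size, the pooling choice, and the SVM hyperparameters are illustrative assumptions, and image loading and label handling are assumed to happen elsewhere.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import SVC

# VGG16 without its classification head, used purely as a feature extractor.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg",
                  input_shape=(128, 128, 3))

def extract_features(images):
    """images: float array of shape (N, 128, 128, 3); grayscale replicated to 3 channels."""
    return extractor.predict(preprocess_input(images), verbose=0)

# Hypothetical usage: X_train/X_test are image arrays, y_train/y_test family labels.
# clf = SVC(kernel="rbf", C=10, decision_function_shape="ovr")
# clf.fit(extract_features(X_train), y_train)
# print(clf.score(extract_features(X_test), y_test))
```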

Similarly, ResNet-50, recognized for its excellence in the ImageNet classification challenge, transfers knowledge to malware images with high accuracy. ResNet-50 facilitates training networks with thousands of layers, which was previously deemed impractical; it achieves this by using residual blocks and skip connections, ensuring the preservation of information from earlier layers. This architectural choice is motivated by the desire to address challenges such as the vanishing gradient problem and to enable the training of extremely deep neural networks; thus, the network learns more accurate representations of the input data, leading to improved predictions. The researchers in [39] addressed the vanishing gradient problem by using extra connections between non-contiguous convolutional layers.

3.5 Attacks on machine learning algorithms

Researchers have reported that malware with fingerprinting abilities can evade detection [42]. Deep neural network algorithms are vulnerable to different types of attacks, which, based on the adversary's goal, can be broadly categorized as poisoning attacks or evasion attacks [43]. The former permits an attacker to inject or recast spurious samples into the training set; these fictitious examples may cause the trained classifier to predict incorrect labels, increasing misclassification. Such attacks are common and evolve if an adversary gains access to the training data. In the latter case, the opponent is not permitted to alter the classifier or its parameters but provides false examples to avoid detection. Attacks are created by adding to the input a minimal perturbation that is indistinguishable to a human, forcing the model to make incorrect decisions. The commonly used attack model for creating adversarial examples is the additive model [44], in which an attacker uses a linear operator that adds a perturbation to the input to deceive the classifier.

The knowledge gained about the target system (i.e., training data, feature set, learning algorithms, parameters, etc.) categorizes attacks into three types: (1) white-box attacks, (2) black-box attacks, and (3) gray-box attacks. In the white-box setting, the adversary is assumed to have full access to the model's algorithm and parameters. Conversely, in the black-box setting, the adversary lacks knowledge of the target model's parameters and architecture and resorts to querying the model to mount an attack. Unlike the other approaches, gray-box attacks only obtain access to the model during the training phase, generating adversarial samples using a generative model [45]. Evaluating the resilience of classification models to adversarial attacks motivated our use of a white-box attack approach in this study. To demonstrate the model's robustness against adversarial attacks, we crafted tainted binaries by introducing various types of noise: Gaussian, Speckle, Salt and Pepper, Local Variance, and Poisson noise.

Consider a classification model with decision function \(f: M \rightarrow Y\) that assigns each sample in the set \(M=\{m_{1},m_{2},...,m_{n}\}\) to a label in the set \(Y = \{y_{1},y_{2},...,y_{n}\}\), where \(y_{i}\) denotes the true class of a malware sample. We train the model on a dataset \(D=\{m_{i},y_{i}\}_{i=1}^{n}\), where \(m_{i}\) and \(y_{i}\) represent the image of the malicious sample and the corresponding true label, respectively. During training, the parameter \(\theta\) that minimizes the loss \(\mathcal {L}\) is determined, where

$$\begin{aligned} \mathcal {L}_{min}(g(m_{i},\theta ),y_{i}) \end{aligned}$$
(5)

\(g(m_{i},\theta )\) is the prediction of the input \(m_{i}\).

A set \(\tilde{D}= \{ \tilde{m_{i}},y_{i}\}_{i=1}^{n_{s}}\) of adversarial examples is generated using Eq. 6 for \(n_{s}\) malware images drawn from the test set.

$$\begin{aligned} \tilde{m_{i}}= m_{i} + \delta \end{aligned}$$
(6)

where \(\tilde{m_{i}}\) is the tainted version of \(m_{i}\) and \(\delta\) is the noise (perturbation) added to \(m_{i}\). The intrinsic restriction lies in the minimum allowable perturbation, denoted \(d(m_{i},\tilde{m_{i}})\), between \(m_{i}\) and \(\tilde{m_{i}}\), that causes misclassification while preserving semantic integrity (i.e., if \(m\) is malicious, then \(\tilde{m}\) should also be malicious). In an untargeted attack (refer to Eq. 7), our goal is to induce the model to misclassify the sample into an incorrect class. We then feed the adversarial samples back into the model to estimate the target class.

$$\begin{aligned} g(m_{i}+\delta ,\theta )=\tilde{y_{i}},~~~ \tilde{y_{i}}\ne y_{i} \end{aligned}$$
(7)

where \(\tilde{y_{i}}\) is the prediction for the tainted sample \(\tilde{m_{i}}\). We used the random_noise() function from the “skimage" Python libraryFootnote 5 to generate adversarial malware images. The arguments of the function are: (i) image: the n-dimensional input image, (ii) mode: the type of noise, (iii) seed: the random seed, and (iv) clip: whether to keep the output within the proper range for image data. In the following paragraphs, we present the different types of noise used in this experiment; a usage sketch of the perturbation step is given after these descriptions.

  (a) Gaussian noise: This statistical noise exhibits a probability density function equivalent to the normal distribution. The probability density function of a Gaussian distribution is bell-shaped, represented as:

    $$\begin{aligned} n(g)=\sqrt{\frac{1}{2\pi \sigma ^{2}}}e^{-\frac{(g-\mu )^{2}}{2\sigma ^{2}}} \end{aligned}$$
    (8)

    where g is the gray scale value, \(\sigma\) is standard deviation and \(\mu\) represents the mean respectively.

  (b) Speckle noise: It is a type of granular noise that occurs naturally in images and lowers their quality. Speckle noise is introduced by multiplying random pixel values with distinct pixels in an image. Its probability density function is as follows:

    $$\begin{aligned} f(g)=\frac{g^{\alpha -1}e^{\frac{-g}{a}}}{(\alpha -1)!a^{\alpha }} \end{aligned}$$
    (9)

    where g is the pixel’s gray level and \(a^{\alpha }\) is the variance.

  (c) Salt and Pepper noise: It is applied only to grayscale images, where “salt" represents white pixels and “pepper" denotes black pixels. This involves scattering random bright pixels (value 255) and random dark pixels (value 0) throughout the image. This model earns the nickname “data drop noise" because it statistically drops some of the original values.

    $$\begin{aligned} I(x,y)= {\left\{ \begin{array}{ll} 0 &{} \text {Pepper noise}\\ 255 &{} \text {Salt noise}\\ \end{array}\right. } \end{aligned}$$
    (10)

    where \((x,y)\) are a pixel's coordinates.

  (d) Local Variance noise: It is a type of Gaussian-distributed additive noise (zero-mean Gaussian white noise) in which the local variance is defined by the pixel intensity values at each point of the image.

  (e) Poisson noise: Poisson or shot noise appears in images due to the statistical nature of X-rays, visible light, etc., and follows the Poisson distribution given in Eq. 11.

    $$\begin{aligned} p(k)=\frac{\lambda ^{k}e^{-\lambda }}{k!} \end{aligned}$$
    (11)

    where \(\lambda\) is the expected value (mean) and k is the number of occurrences.
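A usage sketch of the perturbation step with skimage's random_noise is given below; the noise modes correspond to the types listed above, while the variance map used for the local-variance mode, the fraction of perturbed test images, and the evaluation loop are illustrative assumptions.

```python
import numpy as np
from skimage.util import random_noise

# skimage noise modes corresponding to the types described above.
NOISE_MODES = ["gaussian", "speckle", "s&p", "localvar", "poisson"]

def perturb(image, mode, seed=0):
    """Return an adversarial variant of a grayscale malware image (Eq. 6)."""
    img = image.astype(np.float64) / 255.0             # random_noise expects [0, 1]
    if mode == "localvar":
        # Local-variance Gaussian noise needs a per-pixel variance map (assumed here).
        local_vars = np.clip(img * 0.01 + 1e-4, 1e-4, None)
        noisy = random_noise(img, mode=mode, local_vars=local_vars)
    else:
        noisy = random_noise(img, mode=mode, seed=seed, clip=True)
    return (noisy * 255).astype(np.uint8)

# Hypothetical evaluation loop: perturb test images and re-run the classifier,
# counting label flips as successful untargeted evasions (Eq. 7).
# for mode in NOISE_MODES:
#     x_adv = np.stack([perturb(x, mode) for x in x_test])
#     y_pred = model.predict(x_adv).argmax(axis=1)
#     print(mode, (y_pred != y_test).mean())
```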

4 Experiments and results

This section discusses our dataset, assessment criteria, and outcomes. To substantiate the efficacy of the proposed deep learning model, we performed a comprehensive evaluation as listed below:

  1. The performance of various machine learning and deep learning models on the benchmark and self-made (i.e., Malhub) datasets

  2. Evaluation of the proposed model on obfuscated samples

  3. Evaluation of the proposed model on evasion attacks

  4. A comparative analysis of the proposed system with state-of-the-art approaches

4.1 Dataset description

In this paper, we assess the performance of the proposed model using two publicly available benchmark datasets: the MalImg dataset [18], which includes 9,342 malware samples spanning 25 different families, and BIG2015 [37], containing 10,868 malicious samples categorized into nine families. Furthermore, we evaluate the performance of the proposed model on the Malhub dataset, which consists of 26,452 samples across 20 families. In each experiment, we allocated 70% of the samples to the training set, 20% to the validation set, and reserved the remaining 10% for the testing phase. The implementation was written in Python, employing Keras 2.4.3 with TensorFlow 2.4.1 as the backend. As the pre-trained models require a fixed-size input image, we resized each file to different dimensions, specifically 64 x 64 and 128 x 128 pixels. Malware binaries have variable lengths, and transforming large binaries into small images may discard essential information; conversely, expanding small binaries into large images may introduce unnecessary padding, potentially affecting detection accuracy. We used RandomizedSearchCVFootnote 6 and KerasTunerFootnote 7 as hyperparameter tuners for the ML classifiers and the deep learning models, respectively (refer to Table 5). The XGBoost parameters include \(n\_estimator\), representing the number of boosting rounds or trees, and \(random\_state\), which initializes the random number generator with a specific seed value. Among the SVM parameters, C regulates the balance between minimizing training and testing errors, while \(cache\_size\) sets the kernel cache size to enhance training speed. The \(decision\_function\_shape\) determines the strategy for multi-class classification, with ovr indicating the use of separate binary classifiers for each class against the rest. Other SVM parameters, such as degree, gamma, kernel, shrinking, and tol, control various aspects of the model's behavior and convergence. RF parameters include criterion, \(min\_samples\_leaf\), \(min\_samples\_split\), \(n\_estimators\), and \(random\_state\), which influence decision tree construction and randomness.
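As an illustration of the ML-side tuning, a hedged RandomizedSearchCV sketch over some of the SVM and RF parameters named above is shown below; the value grids, the number of iterations, and the macro-F1 scoring choice are placeholders, not the search spaces actually used.

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Placeholder search spaces covering the parameters discussed in the text.
svm_space = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.001],
             "kernel": ["rbf", "poly"], "decision_function_shape": ["ovr"]}
rf_space = {"n_estimators": [100, 200, 500], "criterion": ["gini", "entropy"],
            "min_samples_leaf": [1, 2, 4], "min_samples_split": [2, 5, 10]}

def tune(estimator, space, X, y, n_iter=20, seed=42):
    """Randomized hyperparameter search with 5-fold cross-validation."""
    search = RandomizedSearchCV(estimator, space, n_iter=n_iter, cv=5,
                                scoring="f1_macro", random_state=seed, n_jobs=-1)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_

# Hypothetical usage on extracted image features:
# best_svm, svm_params = tune(SVC(cache_size=500), svm_space, X_train, y_train)
# best_rf, rf_params = tune(RandomForestClassifier(random_state=42), rf_space,
#                           X_train, y_train)
```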

Table 5 Hyper parameters for the model tuned using KerasTuner

We evaluated the model’s performance by employing metrics like F1 Score (F1), Accuracy (A), Precision (P), and Recall (R). The aforementioned metrics are derived using the following:

  • True Positive (TP): the model correctly identifies an instance as belonging to the class it genuinely belongs to.

  • False Positive (FP): the true class is negative, and the predicted class is positive.

  • False Negative (FN): the true class is positive, and the predicted class is negative.

  • True Negative (TN): an instance correctly classified as not belonging to a particular class; the model correctly identifies an instance as not belonging to a class it does not genuinely belong to.

Accuracy: It is the fraction of samples predicted correctly.

$$\begin{aligned} Accuracy (A) = \frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
(12)

Precision: It is the fraction of predicted positive events that are actually positive.

$$\begin{aligned} Precision (P) = \frac{TP}{TP+FP} \end{aligned}$$
(13)

Recall: It is the fraction of positive events that are predicted correctly.

$$\begin{aligned} Recall (R) = \frac{TP}{TP+FN} \end{aligned}$$
(14)

F1 score: It is the harmonic mean of precision and recall.

$$\begin{aligned} F1 score (F1) = 2*\frac{P*R}{P+R} \end{aligned}$$
(15)
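These per-class counts and metrics can be computed directly from the predictions; a minimal sketch using scikit-learn is given below, where macro averaging over the malware families is an assumption made for illustration.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    """Accuracy, precision, recall, and F1 (Eqs. 12-15), macro-averaged over families."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```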

4.2 The performance of classification models on benchmark datasets

Table 6 List of the experiments

To verify the effectiveness of the proposed deep learning model in classifying malicious samples into different families, we experimented with different combinations of classifiers. Table 6 presents the list of experiments conducted for evaluation; we report the F1 score for all models due to the unbalanced nature of the datasets. First, we conducted experiments to evaluate the capability of different conventional ML algorithms in classifying malware into the associated classes. Figures 7 and 8 present the performance of machine learning classifiers using LBP images (detailed results are given in Appendix A, specifically in Table 15 and Table 16). We ran the experiments a total of five times and averaged the results. As seen in Fig. 7, with VGG16 and the MalImg dataset we obtain an F1 score in the range of 0.942\(-\)0.97, and an identical trend is obtained for the BIG2015 dataset. All classifiers achieved an F1 score above 0.99 for files in the Malhub dataset. In contrast, combining ResNet50 with the machine learning classifiers on MalImg resulted in F1 scores in the range of 0.822\(-\)0.95, as can be seen in Fig. 8; for BIG2015, the F1 score is between 0.892\(-\)0.925, and for Malhub it is in the range of 0.861\(-\)0.997.

Fig. 7 Performance of ML classifiers and a dense (2-layer) head on LBP images with VGG16 as the pre-trained network (size 128 x 128)

Fig. 8 Performance of ML classifiers and a dense (2-layer) head on LBP images with ResNet50 as the pre-trained network (size 128 x 128)

Fig. 9 Evaluation of non-LBP images (size 128 x 128) using ML classifiers and a dense (2-layer) head with pre-trained VGG16

Fig. 10 Performance of ML classifiers and a dense (2-layer) head with pre-trained ResNet50 on non-LBP images (size 128 x 128)

In addition to LBP images, we also experimented with non-LBP images (original images). From Fig. 9, VGG16+SVM provides an F1 score of 0.985 (MalImg), 0.93 (BIG 2015), and 0.999 (Malhub), respectively. Further, with the ResNet50 model the F1 score is between 0.75 and 0.998 (refer to Fig. 10). These experiments suggest that the appropriate choice of classification algorithm during the inference phase can remarkably influence the results. Table 17 and Table 18, respectively, report the performance measures in terms of accuracy. Table 7 presents the configurations of the ML classifiers used in the experiments.

Table 7 Hyper-parameters for ML classifiers

4.3 Performance evaluation of the proposed models

We developed various classification models to evaluate whether particular variations of the model demonstrate better performance than those discussed in Sect. 4.2. Each model was trained and tested independently on 64x64 and 128x128 malware images. In particular, we created combinations such as (a) ResNet50+CNN+Dense layers, (b) VGG16+CNN+Dense layers, and (c) fusions of models (ResNet50||CNN||Dense layers and VGG16||CNN||Dense layers). The results obtained with ResNet50+CNN+Dense layers for LBP and non-LBP images are reported in Table 8 and Table 9, respectively. Table 10 presents the results obtained for VGG16+CNN+Dense layers using LBP and non-LBP images. We obtained a lower F1 score with the VGG16 model trained on the Malhub dataset; similar trends are also reported by the authors in [29].

Table 8 Performance of ResNet50+CNN+DL on benchmark datasets with LBP images
Fig. 11 BIG2015 dataset: confusion matrix for 64x64 images obtained with the ResNet50+CNN+DL model. Two Lollipop samples were misclassified as Gatak, one Tracur sample each was misclassified as Obfuscator.ACY and Vundo, and one Kelihos_Ver1 sample was misclassified as Obfuscator.ACY

Table 9 F1 score of proposed model (ResNet50+CNN+DL) with Non-LBP images
Table 10 The F1 score of VGG16+CNN+DL model on different datasets

Figure 11 shows that six of the nine families achieved perfect classification with an F1 score of 1.0 on the BIG2015 dataset. The model misclassified two samples of Lollipop as Gatak and two samples of Tracur as Obfuscator.ACY and Vundo (one each); it also incorrectly classified one sample of Kelihos_Ver1 as Obfuscator.ACY. Furthermore, 21 of the 25 malware families in the MalImg dataset achieved accurate classification with an F1 score of 1.0. Among the 25 families, four (C2LOP.gen!g, C2LOP.P, Swizzor.gen!, Swizzor.gen!E) exhibit identical behavior, as reported in [18]; therefore, these families were merged into a single class, and the model was retrained using 128x128 sized images. Examining the confusion matrix in Fig. 13, it is evident that the model successfully classified 21 out of 22 families, with only one misclassification, i.e., Wintrim.BX wrongly identified as Allaple.A. From Table 9, the F1 score with non-LBP images is 0.90 for the MalImg dataset (128x128) and 0.957 for the BIG2015 dataset (128x128), whereas with LBP images it is 0.998 for MalImg (22 classes) and 0.995 for BIG2015. The Malhub dataset demonstrated improved performance with both LBP and non-LBP images, achieving F1 scores of 0.996 and above. LBP images exhibited superior performance at a dimension of 64x64, while non-LBP images performed better at 128x128. Thus, it is evident that LBP images capture textural patterns specific to a malware family, resulting in an improved classifier outcome.

Table 11 F1 score of proposed model(ResNet50||CNN||DL) on LBP images (fusion model)

To further enhance the classification performance, we devised an alternative model that integrates the outputs of ResNet50 and the CNN, utilizing dense layers to distinguish malware images across the three datasets. The findings are summarized in Table 11. Consistent with prior experiments, the analysis was conducted using varying image dimensions. Confusion matrices are depicted in Figs. 14, 15, and 16. Both models, model1 (ResNet50+CNN+DL) and model2 (ResNet50||CNN||DL), demonstrated comparable F1 scores for the MalImg and Malhub datasets across image dimensions 64x64 and 128x128, suggesting that both models can effectively distinguish malware images regardless of size. In contrast, for BIG2015, model1 (ResNet50+CNN+DL) exhibited the highest F1 score of 0.995, while model2 (ResNet50||CNN||DL) obtained an F1 score of 0.949. Seven out of nine malware families in the BIG2015 dataset exhibited a detection rate exceeding 90%, with the exception of Simda (shown in Fig. 14). This disparity is likely attributable to the dataset's imbalanced nature, as Simda contains only 42 samples, accounting for a mere 0.386% of the entire dataset. Nonetheless, we believe that further improvement can be achieved by leveraging generative models such as Generative Adversarial Networks (GANs) to augment and balance the classes with fewer samples in the training set. Figure 15 illustrates that 17 out of 22 malware families achieved a 100% detection rate; for the remaining five classes, the detection rates were still substantial, ranging from 93.7% to 99.8%. Moreover, we can observe from Fig. 16 that 13 out of 20 families showed a 100% detection rate, and for the remaining seven families, detection rates are between 99.5% and 99.7%. This demonstrates the efficacy of our proposed model in distinguishing malware families based on their visual features. ROC curves for the best-performing models are shown in Appendix A (Figs. 18, 19, and 20).

4.4 Evaluation of proposed model on obfuscated samples

Software obfuscation encompasses altering the arrangement of a program's code while preserving its intended behavior. This technique employs various tactics, including unconditional jumps, control flow interlacing, conditional jumps, transparent branching, merging local integers, introducing randomized dead code, reassigning variables, generating fictitious intermediate-level code, encoding strings, and suppressing constants [46]. These obfuscation methods hinder the comprehension and reverse engineering of the program's functionality. According to statistics, approximately 80% of malware executables employ obfuscation techniques to escape detection [47]. Here, we concentrated on identifying obfuscated malware from the benchmark datasets. Notably, the Obfuscator.ACY and Obfuscator.AD families of the BIG2015 and MalImg datasets harbor such obfuscated executables (refer to Fig. 11 and Fig. 13).

Fig. 12 Sample images of obfuscated samples: (a) non-LBP images of the Obfuscator.ACY family of the BIG2015 dataset, (b) corresponding LBP images of the Obfuscator.ACY class of BIG2015, (c) sample non-LBP images of Obfuscator.AD from the MalImg dataset, (d) LBP images of Obfuscator.AD samples from MalImg

Next, we discuss the efficiency of the ResNet50+CNN+DL model in classifying obfuscated malware. For this, we use the obfuscator families from both datasets, which offer a variety of obfuscated malware employing different sophisticated code obfuscations. Figure 11 distinctly demonstrates that variants of the Obfuscator.ACY family within the BIG2015 dataset achieve precise classification, with a detection rate of 100%; 98.7% of files in the same class were also detected by the ResNet50||CNN||DL model (refer to Fig. 14). Similarly, Fig. 13 and Fig. 15 demonstrate that the Obfuscator.AD family in the MalImg dataset also achieves a detection rate of 100%. The classification models proved adept at identifying obfuscated executables, as they could precisely identify image patches corresponding to obfuscated regions (refer to Fig. 12). Additionally, within the MalImg dataset, families such as Yuner.A, VB.AT, Malex.gen!j, Autorun.k, and Rbot.gen employ UPX packing. As depicted in Fig. 13, the model exhibits a 100% detection rate in classifying these families.

Fig. 13 MalImg dataset (22 classes): confusion matrix for 128 x 128 sized images after merging similar classes

Fig. 14 BIG2015 dataset (9 classes): confusion matrix for 128 x 128 sized images using the fusion model (ResNet50||CNN||DL)

Fig. 15 MalImg dataset: confusion matrix for 128 x 128 sized images using the fusion model (ResNet50||CNN||DL)

Fig. 16 Malhub dataset (20 families): confusion matrix for 128 x 128 sized images using the fusion model (ResNet50||CNN||DL)

Fig. 17 MalImg dataset with adversarial attack on the ResNet50+CNN+DL model: confusion matrix of 128 x 128 sized images with Poisson random noise

4.5 Performance evaluation of proposed model on evasion attack

We also performed evasion attacks to measure the robustness of the classification models. We conducted the experiments by creating adversarial examples from test samples through additive noise. The results obtained after the attack on model1 (ResNet50+CNN+DL) are shown in Table 12; here, 10% of the malware images in the test set were perturbed. As seen in Table 12, creating adversarial examples using Poisson noise results in a maximum drop of 1.68% in F1 score. Further, from Fig. 17, it can be seen that 19 out of 22 families of samples are well classified with an F1 score of 1.0. One sample of Obfuscator.AD is misclassified as LoydaAA1, and one sample of Allaple.A is misclassified as Malexgen!A. Furthermore, all variants of Adlair.C are wrongly labeled as AgentFYI. Overall, the F1 score is reduced to 0.957, a 4.19% decrease, when applying the Local Variance method, which is lower than the performance obtained with the ResNet50+CNN+Dense Layers model. Interestingly, evasion attacks launched with adversarial samples developed using the different additive noise methods failed to lower the detection rate of the classification model (ResNet50+CNN+DL) on the BIG2015 dataset. Furthermore, we conducted an identical experiment on model2 (ResNet50||CNN||DL) (Table 13). We observed a decrease in F1 scores for BIG2015, ranging from 0.874 to 0.878; for MalImg, F1 scores varied between 0.992 and 0.993; and for Malhub, F1 scores ranged from 0.014 to 0.992. This finding suggests that attackers can manipulate malware code by adding specific bytes without altering its operational capabilities, ultimately evading detection by the machine learning classifier.

A few ways to deal with adversarial attacks are: (a) defense in depth: maintaining multiple layers of protection (signature and non-signature-based approaches, machine learning-based techniques, or automatic sandboxing); (b) data diversification and augmentation: using GANs to create synthetic malware samples and augment the training set; (c) explainable AI: developing techniques that explain the decisions made by the algorithms, which can uncover patterns in the model that an adversary might exploit to deceive detection; (d) robustness-based training: retraining the classifier by modifying the training set, optimizing the training algorithm to learn diverse data points, and using regularization techniques; and (e) open science: open-source sharing of datasets, tools, and research findings to foster collaboration and accelerate the pace of innovation.

Table 12 Evaluation results of the adversarial attack on proposed ResNet50+CNN+DL model
Table 13 Performance of ResNet50||CNN||DL model with adversarial examples

4.6 Comparative analysis of the proposed system with the state-of-the-art approaches

Most malware specimens adopt obfuscation techniques to evade detection, as mentioned in the previous sections, and the texture-based approach can withstand such techniques, leading to enhanced accuracy. Homogeneous malware versions exhibit visual resemblance, whereas distinct visual traits characterize different families.

Table 14 Comparative analysis with state-of-the-art approaches (BIG2015 and MaIlmg datasets)

Visualization of malware binaries as images preserves variations in code structure introduced by malware authors. In this study, we mainly focused on comparing our solution with other popular texture-based malware categorization schemes reported in [18, 23, 48, 51]. Each method applied a machine learning or deep learning algorithm, extracting features from malware images for classification. The results are tabulated in Table 14. Our proposed model successfully classifies samples with an F1 score of 0.998 and an accuracy of 0.998 on the MalImg dataset. Besides, we attained the highest F1 score of 0.995 with an accuracy of 0.995 for the BIG2015 dataset. Additionally, we obtained an F1 score in the range of 0.993\(-\)0.999 on the Malhub dataset using LBP images. Furthermore, we performed adversarial attacks on the generated models to measure their robustness.

4.7 Discussion and limitations

Our outcomes suggest that the proposed malware family classification scheme is superior to current approaches with respect to the following factors: (1) for both benchmark datasets, the majority of the malware families show a 100% F1 score; (2) the deeper layers of ResNet50 capture class-specific features; and (3) texture-based images of binaries preserve the generic pattern of a family. In both datasets, we can observe some misclassifications; this may be because such samples share some generic features that overlap with other classes, making the files visually similar. For example, in Fig. 11, Kelihos_Ver1 is misclassified as Obfuscator.ACY because both are Trojans and share common patterns; thus, the classifier is unable to discriminate the samples and classifies them wrongly.

Various pre-trained deep neural models were used to categorize the malware images. It is evident from the results that the generic attributes obtained by augmenting pre-trained models with CNN+DL precisely detected the samples. The experiments conducted by the authors in [52] demonstrate the superior performance of ResNet50 over VGG16 in classifying the ImageNet dataset; we observed similar trends while classifying malicious binaries. Deeper networks such as ResNet50 can produce semantically richer features, making them more generalizable to other classification problems than shallower models such as VGG16.

Moreover, we investigated the significance of image dimensions in family classification by experimenting with different image sizes (64x64, 128x128). We noted that models trained on 64x64 images outperformed the others on the BIG2015 dataset, achieving an F1 score of 0.995, while 128x128 images performed well for the MalImg dataset with an F1 score of 0.979 (LBP images, 25 classes). Therefore, we can infer that images with small dimensions cannot support fine distinctions, as they do not retain enough relevant information about the family; the only exception is BIG2015 (executables with stripped headers). In contrast, a high image dimension mainly escalates the computation time without contributing much to the overall performance. Furthermore, we conducted extensive experiments to test the robustness of our approach against a couple of adversarial attacks. We noticed some misclassifications (see Fig. 17), but the model could accurately identify the families of MalImg and BIG2015; however, it could not identify the malware files of the Malhub dataset. An adversary may also try to relocate parts of the code to modify the malware image and deceive detection. However, such modifications might not significantly degrade the performance of our model, as convolutional networks are skillful enough to learn features invariant to relocation; this would be useful in identifying malicious files obfuscated using the control flow obfuscation technique. The source code and the hashes of the malware files are shared on the GitHub page.Footnote 8

5 Conclusions and future scope

Conventional antivirus solutions heavily rely on machine learning (ML) techniques to protect digital information from malware attacks, and these methods have demonstrated efficacy in identifying new malware strains. However, creating robust features for ML algorithms is demanding in terms of time and expertise. Deep learning architectures have emerged as promising tools for malware detection, offering remarkable performance in classifying malware samples. The proposed deep learning model achieved exceptional results, with consistently high F1 scores of 0.99 across 64x64 and 128x128 sized malware images. This success was observed on two publicly available datasets (MalImg and BIG 2015) and a self-created dataset, highlighting the model's versatility and robustness. Additionally, we validated the effectiveness of extracting features from images of malware executables to classify them into their respective families and of employing transfer learning to group new malware samples; this approach significantly reduces the computational cost and resource requirements when dealing with new datasets. However, we also observed that the classification models exhibit vulnerability to tainted examples crafted by adversaries, except for adversarial examples generated using Gaussian noise.

Despite achieving competitive results compared to state-of-the-art approaches, we believe there is still significant room for improvement. We intend to conduct more extensive experiments using various executable files in the future. We also plan to examine the performance of the proposed visualization approach in the context of concept drift, a phenomenon where changes in data patterns can affect the model's accuracy. Additionally, we aim to develop techniques to enhance the detection rates of malware scanners trained on image-based features. Further, we will explore methods for effectively fusing deep learning models based on the attributes they extract. Towards this goal, we plan to augment our proposal by implementing model interpretation techniques, such as class activation maps, which provide insights into how the model makes decisions. Finally, we acknowledge that the 2D image representation and the way Convolutional Neural Networks (CNNs) operate might lead to information loss between bytes appearing at the right edge of one row and those at the beginning of the next. To address this, we intend to investigate the behavior of CNNs on 1D representations, which could potentially capture more contextual information.