DNNdroid: Android Malware Detection Framework Based on Federated Learning and Edge Computing

Mahindru, Arvind; Arora, Himani

doi:10.1007/978-3-031-23095-0_7

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1760))

Included in the following conference series:

International Conference on Advancements in Smart Computing and Information Security

447 Accesses
4 Citations

Abstract

The fact that apps are available for free via Android’s official store has helped the platform become more popular. The functionality of Android apps is reliant on permissions. Due to these permissions, cybercriminals developed malware-infected apps for smartphone users. The main fault lies in the permission model of Android. To address this issue, a framework entitled “DNNdroid” is proposed that work on the principle of federated learning. Information related to newly installed apps is stored on the user’s device only and this information is not revealed to the developer. In the meantime, input from all the users is collected simultaneously to train the model with a federated learning process, so that a better classification model is developed. The main challenge in this is that a user is not able to identify whether an app is malware-infected or not. The experiment result reveals that the cloud server has an F1 score of 97.8% having a recall rate of client than 0.95 false positive rates using 1,00,000 unique Android apps with 500 plus users and 50 rounds of the federation. Further, an experiment is performed by using frameworks available in the literature and different anti-virus scanners.

Access provided by Autonomous University of Puebla. Download conference paper PDF

A Study on Android Malware Classification by Using Federated Learning

A Dynamic Weighted Federated Learning for Android Malware Classification

Systematic Review on Various Techniques of Android Malware Detection

Keywords

1 Introduction

According to the global data report^{Footnote 1}, Android has captured 71.54% market share. The main reason for its popularity is the availability of free apps in its official play store. Cybercriminals are taking advantage of this and developing malware-infected apps on a daily basis for smartphone users. In the literature [5, 16,17,18], researchers and academicians proposed different malware detection frameworks that work on machine learning techniques and achieved success too. Often, developing an accurate malware detection model with classification machine learning algorithms is dependent upon the extensive collection of datasets. But it has a limitation, it affects the privacy of smartphone users [23]. To address these issues, there is a need for decentralized entry information which also respects the user’s privacy and will not expose to third parties too.

In the literature [14, 19], academicians and researchers proposed three different machine learning solutions for malware detection i.e., client-based, cloud-based, and hybrid (combination of cloud-based and client-based) techniques. In the client-based approach, collected data is protected due to the machine learning model being developed locally on the host computer but the process is time-consuming and has the highest value of false positives. In the cloud-based approach, this process is reversed of it. It is developed by using a large set of features and reveals which app is installed by the users too. Last, in the hybrid model, Android apps that are malware-infected are sent to the cloud for further analysis. This type of solution has a high number of false positives and reveals users’ private data to the cloud.

In identifying malware from Android devices, machine learning algorithms are without a doubt incredibly effective. But, to develop effective malware detection model, a large amount of information is required. In the literature [25], it was observed that large amount of features are available at the central place for training and testing the model. Addition to it, it has seen that developed model memorize and disclose information related to the dataset. To address this issue, we consider the key question that is to be answer in this study, i.e. How can we create a decentralized, privacy-preserving classifier for Android malware?

In this study, we proposed DNNdroid - a model that is based on classification techniques and uses the principle of federated learning and respecting the user’s privacy. The proposed model collects the features from the user’s smartphone without prior knowledge that an app was installed from its official play store or any other promised repositories. The proposed framework reduces the dependency of users on cloud-based technique and also benefit them in term of privacy.

In the literature [1, 2, 6, 12, 21], state-of-the-art federated learning techniques were discussed by researchers and academicians. In which, smartphone users test their data locally by using a supervised machine learning algorithm and the resultant performance is updated to the cloud for the betterment of the model. Our proposed model enhances the existing work by incorporating the principle of deep learning at the time of training the model. Further, we evaluate our proposed model by using 1,00,000 unique Android apps out of which 75,000 are benign and 25,000 are malware-infected apps with 500 plus users and 108 rounds of the federation. Additionally, we contrasted our approach with pre-existing frameworks found in the literature and several anti-virus scanners sold today.

The novel and unique contributions of this study are as under:

To the best of our knowledge, this is the first research paper, which trained with the help of dynamic features of Android apps and prevent the users privacy too.
In this study, we also demonstrate the effectiveness of our proposed model against malware-infected apps.

The rest of the paper is organized as follows. In Sect. 2, we discuss the related work done in the field of Android malware detection using federated learning. Section 3, described the collection of datasets from different promised repositories. The machine learning technique implemented in our proposed framework is discussed in Sect. 4. Section 5 describes the architecture of our proposed framework. Evaluating the proposed framework is discussed in Sect. 6. In Sect. 7, we compare our proposed framework with the existing framework and the distinct anti-virus available in the market. The experimental finding is discussed in Sect. 7. Section 8 discusses the conclusion and the future scope of this study.

2 Related Work

Hsu et al. [9] proposed a malware detection for Android named as privacy-preserving federated learning (PPFL). The proposed model is trained by using SVM as a base classifier. They developed their model by trained them using static analysis. Experiment result reveals that proposed model achieved higher detection rate as compared to decentralized models. Empirical results also reveals that if number of clients increases the accuracy is also increases. Gálvez et al. [6] proposed a malware detection model named as LiM that work on the principles of semi-supervised machine learning technique and federated learning. Experiment was performed on 50,000 Android apps having 200 users and 50 rounds of federation. Taheri et al. [26] proposed malware detection model entitled FEd-IIoT for detecting malware in IIoT. The results of the experiment corroborate the high accuracy rates of our attack and defence algorithms and demonstrate how the A3GAN defensive strategy protects the robustness of data privacy for Android mobile users and is around 8% more accurate than current state-of-the-art solutions.

3 Datasets

In this study, we collect Android application packages (.apk) from Google play store^{Footnote 2}, AppChina^{Footnote 3}, Android^{Footnote 4} and Mumayi^{Footnote 5}. Malware-infected apps were collected from AndroMalShare^{Footnote 6} and Malgenomeproject [29]. Table 1 represent the collected Android apps.

Table 1. Collected Android apps.

Full size table

Feature Dataset. Extraction of features are done as per the study [15]. 1844 distinct features are extracted from collected Android apps. Features play an important role to train the classification model. In the literature [11, 20], different feature selection techniques were proposed by researchers and academics. In this work, we implement chi-square test to select significant features that helps to train the model.

4 Machine Learning Technique

Deep Neural Network (DNN) is implemented to train the model in cloud-based architecture i.e., base learner in our study. In the literature, authors proposed two distinct methods to develop model using DNN i.e., Deep Belief Networks (DBN) and Convolutional neural networks (CNN). In the current study, we decide to build our deep learning model using DBN architecture. The deep learning method’s architecture is shown in Fig. 1. It consists of two stages: supervised back propagation in the first and unsupervised pre-training in the second. Restricted Boltzmann Machines (RBM) and a deep neural network are used to train the model in the initial stages of model construction. The model is built using an iterative procedure in the training phase using unlabeled Android apps. Pre-trained DBN is adjusted using labelled Android apps in a supervised way during the back-propagation step. An Android app is used in both stages of the training process for a model created using the deep learning technique.

5 Proposed Framework Architecture

Federated learning is implemented based on a decentralized approach to training the model. Clients implement the process locally and the outcome is shared with the service provider. The main success of federated learning is dependent upon the labeled dataset which can be used to train the model. But it has one limitation, smartphone users do not know what to label malicious or benign. To overcome this issue, in our study we implement supervised learning at the cloud-based structure i.e., labeled dataset, and unsupervised learning at the client-based structure i.e., unlabelled dataset.

In the proposed framework, the federation of learning has happened in the cloud database and the client estimates the unlabeled dataset at the time of testing. In addition, this cloud server collects all the data from clients and aggregates them, and presents the weight. Figure 2 demonstrates the architecture of the proposed work. The following steps are taken to train and evaluate the model.

1.
Server Side: First of all, labeled data is given to the server to train it as a base classifier and send unlabeled data to assess the weight from it.
2.
Client Side: Client receive baseline classifier and base learner.
3.
Weight Gain: Client gained the estimated weights from installed apps.
4.
Calculate Estimate Weight: Client calculates the average weight.
5.
Predict at the Client Side: Client classifies the installed apps.
6.
Complete the process: Client computes the aggregate weight and uploads them to the cloud for further processing.
7.
Aggregate at the cloud: Cloud collects all the weight and averages them.
8.
Median client-cloud weight: At last the median weight of both client and cloud are computed and used for further processing.

6 Evaluation of Proposed Framework

To evaluate the proposed framework, we set up a server with 500 users iterating over 100 different federation rounds. The client model is run parallelly on Intel Core i7 machine having 16 GB RAM. In this study, we consider two different parameters to evaluate our proposed model i.e., Accuracy and F-measure. Table 2 shows the confusion matrix for determining if an app is malware-infected or benign.

Table 2. Confusion matrix consider in this study. (.apk)

Full size table

Following terminology are used in this study for evaluate the proposed framework.

Recall: Recall measures the number of precise class predictions generated from all of the positive examples in the dataset.
$$\begin{aligned} Recall =\frac{a}{a + c},\end{aligned}$$
(1)
where $a= N_{Malware\rightarrow Malware},$

$b= N_{Benign\rightarrow Malware},$

$c= N_{Malware\rightarrow Benign}$
Precision: Precision is the percentage of predicted members of a positive class that really belong to that class.
$$\begin{aligned} Precision =\frac{a}{a + b}.\end{aligned}$$
(2)

Accuracy: Accuracy is computed as mentioned in [14]:

$$\begin{aligned} Accuracy=\frac{a+d}{N_{classes}},\end{aligned}$$

(3)

where ${N_{classes} = a+b+c+d}$,

$d= N_{Benign\rightarrow Benign}$

F-measure: F-measure is computed as mentioned in [14]:

$$\begin{aligned} F-measure=&\frac{2*Recall*Precision}{Recall+Precision}\nonumber \\&=\frac{2*a}{2*a+b+c} \end{aligned}$$

(4)

Table 3 shows the calculated value of accuracy and F-measure using above mentioned equations features selected by using chi-square analysis. From empirical study, it can be observed that by using 300 unique features we gain the optimal value in terms of detection rate. Figure 3, demonstrate the computed value of the F-measure and False positive rate having 40 round of federation.

Table 3. Calculated Accuracy and F-measure by using first 50, 100, 150, 200 and 300 features having 50 rounds of federation learning.

Full size table

Based on Table 3 and Fig. 3, we are having the following observations:

It can be inferred that how an optimal number of features are required to train the model.
It can reveal that increasing the value of the federated learning model it is a directly paid impact on the detection rate of malware-infected apps.
By increasing the value of the federated model, it can also be paid to impact the value of the false positive rate.

7 Comparison of Proposed Framework

In this study, we select two different methods to validate our proposed work which are described below:

7.1 Comparison on the Basis of Framework Available in the Literature

To examine whether our proposed framework is equivalent to a previously developed framework or not, we compare our proposed framework with ten distinct previously developed frameworks available in the literature. To perform this, we consider Drebin dataset [3] in our study. Table 4 shows the result of our empirical analysis.

Table 4. Comparing the proposed framework to existing frameworks or methods.

Full size table

7.2 Comparison of the Proposed Framework with Different Anti-virus Scanners

In this study, we evaluate the available free antivirus scanners on the market with our suggested framework for detecting malware. In this study, the Drebin dataset is used to provide empirical results. Table 5 compares various antivirus scanners using the structure we’ve suggested.

Table 5. Comparative analysis using various antivirus scanners.

Full size table

8 Experimental Findings

Based on the experimental outcomes, the following are the experimental findings of this research article.

Tables 4 and 5 provide evidence that the suggested framework is capable of identifying malware in real-world apps.
Based on empirical findings, it can be concluded that the suggested methodology can identify malware-infected apps more quickly than other anti-virus scanners on the market.

9 Conclusion

Based on empirical studies, it can be concluded that our suggested framework has an accuracy of 98.7% and can identify malware-infected apps with 500 unique attributes and 40 different federation rounds. Additionally, experimental results show that our suggested framework is more accurate than other anti-virus scanners and proposed frameworks in the literature. Further, the work will be extended by implementing distinct feature selection approaches and soft computing techniques.

Notes

References

Alyamani, H.J.: Cyber security for federated learning environment using AI technique. Expert Syst. e13080 (2022)
Google Scholar
Arisdakessian, S., Wahab, O.A., Mourad, A., Otrok, H., Guizani, M.: A survey on IoT intrusion detection: federated learning, game theory, social psychology and explainable AI as future directions. IEEE Internet Things J. (2022). https://doi.org/10.1109/JIOT.2022.3203249
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014)
Google Scholar
Enck, W., et al.: TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans. Comput. Syst. (TOCS) 32(2), 1–29 (2014)
Article Google Scholar
Faruki, P., Ganmoor, V., Laxmi, V., Gaur, M.S., Bharmal, A.: AndroSimilar: robust statistical feature signature for android malware detection. In: Proceedings of the 6th International Conference on Security of Information and Networks, pp. 152–159 (2013)
Google Scholar
Gálvez, R., Moonsamy, V., Diaz, C.: Less is more: a privacy-respecting android malware classifier using federated learning. arXiv preprint arXiv:2007.08319 (2020)
Han, W., Xue, J., Wang, Y., Liu, Z., Kong, Z.: MalInsight: a systematic profiling based malware detection framework. J. Netw. Comput. Appl. 125, 236–250 (2019)
Article Google Scholar
Hou, S., Ye, Y., Song, Y., Abdulhayoglu, M.: HinDroid: an intelligent android malware detection system based on structured heterogeneous information network. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1507–1515 (2017)
Google Scholar
Hsu, R.H., et al.: A privacy-preserving federated learning system for android malware detection based on edge computing. In: 2020 15th Asia Joint Conference on Information Security (AsiaJCIS), pp. 128–136. IEEE (2020)
Google Scholar
Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: MalDozer: automatic framework for android malware detection using deep learning. Digital Invest. 24, S48–S59 (2018)
Article Google Scholar
Kumar, L., Misra, S., Rath, S.K.: An empirical analysis of the effectiveness of software metrics and fault prediction model for identifying faulty classes. Comput. Stand. Interfaces 53, 1–32 (2017)
Article Google Scholar
Lin, K.Y., Huang, W.R.: Using federated learning on malware classification. In: 2020 22nd International Conference on Advanced Communication Technology (ICACT), pp. 585–589. IEEE (2020)
Google Scholar
Lindorfer, M., Neugschwandtner, M., Weichselbaum, L., Fratantonio, Y., Veen, V.V.D., Platzer, C.: ANDRUBIS-1,000,000 apps later: a view on current android malware behaviors. In: 2014 Third International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), pp. 3–17. IEEE (2014)
Google Scholar
Mahindru, A., Sangal, A.: MLDroid-framework for android malware detection using machine learning techniques. Neural Comput. Appl. 33, 5183–5240 (2020)
Article Google Scholar
Mahindru, A., Sangal, A.: PARUDroid: validation of android malware detection dataset. J. Cybersecurity Inf. Manage. 3(02), 42–52 (2020)
Article Google Scholar
Mahindru, A., Sangal, A.L.: PerbDroid: effective malware detection model developed using machine learning classification techniques. In: Singh, J., Bilgaiyan, S., Mishra, B.S.P., Dehuri, S. (eds.) A Journey Towards Bio-inspired Techniques in Software Engineering. ISRL, vol. 185, pp. 103–139. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-40928-9_7
Chapter Google Scholar
Mahindru, A., Sangal, A.L.: SOMDROID: android malware detection by artificial neural network trained using unsupervised learning. Evol. Intell. 15(1), 407–437 (2020)
Article Google Scholar
Mahindru, A., Sangal, A.: HybriDroid: an empirical analysis on effective malware detection model developed using ensemble methods. J. Supercomput. 77(8), 8209–8251 (2021)
Article Google Scholar
Mariconti, E., Onwuzurike, L., Andriotis, P., Cristofaro, E.D., Ross, G., Stringhini, G.: MamaDroid: detecting android malware by building Markov chains of behavioral models. arXiv preprint arXiv:1612.04433 (2016)
Martín, A., Menéndez, H.D., Camacho, D.: MOCDroid: multi-objective evolutionary classifier for android malware detection. Soft Comput. 21(24), 7405–7415 (2017)
Article Google Scholar
Pei, X., Deng, X., Tian, S., Zhang, L., Xue, K.: A knowledge transfer-based semi-supervised federated learning for IoT malware detection. IEEE Trans. Dependable Secure Comput. 21, 7405–7415 (2022)
Google Scholar
Portokalidis, G., Homburg, P., Anagnostakis, K., Bos, H.: Paranoid android: versatile protection for smartphones. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 347–356 (2010)
Google Scholar
Saracino, A., Sgandurra, D., Dini, G., Martinelli, F.: MADAM: effective and efficient behavior-based android malware detection and prevention. IEEE Trans. Dependable Secure Comput. 15(1), 83–97 (2016)
Article Google Scholar
Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., Weiss, Y.: “Andromaly’’: a behavioral malware detection framework for android devices. J. Intell. Inf. Syst. 38(1), 161–190 (2012)
Article Google Scholar
Song, C., Ristenpart, T., Shmatikov, V.: Machine learning models that remember too much. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 587–601 (2017)
Google Scholar
Taheri, R., Shojafar, M., Alazab, M., Tafazolli, R.: FED-IIoT: a robust federated malware detection architecture in industrial IoT. IEEE Trans. Ind. Inf. 17(12), 8442–8452 (2020)
Article Google Scholar
Wang, W., Zhao, M., Wang, J.: Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient Intell. Humanized Comput. 10(8), 3035–3043 (2019)
Article Google Scholar
Xu, R., Saidi, H., Anderson, R.: Aurasium: practical policy enforcement for android applications. In: Presented as part of the 21st USENIX Security Symposium USENIX Security 2012, pp. 539–552 (2012)
Google Scholar
Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: 2012 IEEE Symposium on Security and Privacy, pp. 95–109. IEEE (2012)
Google Scholar
Zhu, H.J., Jiang, T.H., Ma, B., You, Z.H., Shi, W.L., Cheng, L.: HEMD: a highly efficient random forest-based malware detection framework for android. Neural Comput. Appl. 30(11), 3353–3361 (2018)
Article Google Scholar
Zhu, H.J., You, Z.H., Zhu, Z.X., Shi, W.L., Chen, X., Cheng, L.: DroidDet: effective and robust detection of android malware using static analysis along with rotation forest model. Neurocomputing 272, 638–646 (2018)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Applications, DAV University, Jalandhar, India
Arvind Mahindru
Department of Mathematics, Guru Nanak Dev University, Amritsar, India
Himani Arora

Authors

Arvind Mahindru
View author publications
You can also search for this author in PubMed Google Scholar
Himani Arora
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arvind Mahindru .

Editor information

Editors and Affiliations

Marwadi University, Rajkot, India
Sridaran Rajagopal
AVPTI, Rajkot, India
Parvez Faruki
Marwadi University, Rajkot, India
Kalpesh Popat

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mahindru, A., Arora, H. (2022). DNNdroid: Android Malware Detection Framework Based on Federated Learning and Edge Computing. In: Rajagopal, S., Faruki, P., Popat, K. (eds) Advancements in Smart Computing and Information Security. ASCIS 2022. Communications in Computer and Information Science, vol 1760. Springer, Cham. https://doi.org/10.1007/978-3-031-23095-0_7

Download citation

DOI: https://doi.org/10.1007/978-3-031-23095-0_7
Published: 11 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23094-3
Online ISBN: 978-3-031-23095-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

DNNdroid: Android Malware Detection Framework Based on Federated Learning and Edge Computing

Abstract

Similar content being viewed by others

A Study on Android Malware Classification by Using Federated Learning

A Dynamic Weighted Federated Learning for Android Malware Classification

Systematic Review on Various Techniques of Android Malware Detection

Keywords

1 Introduction

2 Related Work

3 Datasets

4 Machine Learning Technique

5 Proposed Framework Architecture

6 Evaluation of Proposed Framework

7 Comparison of Proposed Framework

7.1 Comparison on the Basis of Framework Available in the Literature

7.2 Comparison of the Proposed Framework with Different Anti-virus Scanners

8 Experimental Findings

9 Conclusion

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

DNNdroid: Android Malware Detection Framework Based on Federated Learning and Edge Computing

Abstract

Similar content being viewed by others

A Study on Android Malware Classification by Using Federated Learning

A Dynamic Weighted Federated Learning for Android Malware Classification

Systematic Review on Various Techniques of Android Malware Detection

Keywords

1 Introduction

2 Related Work

3 Datasets

4 Machine Learning Technique

5 Proposed Framework Architecture

6 Evaluation of Proposed Framework

7 Comparison of Proposed Framework

7.1 Comparison on the Basis of Framework Available in the Literature

7.2 Comparison of the Proposed Framework with Different Anti-virus Scanners

8 Experimental Findings

9 Conclusion

Notes

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation