
1 Introduction

According to a global data report (Footnote 1), Android holds a 71.54% market share. The main reason for its popularity is the availability of free apps in its official Play Store. Cybercriminals take advantage of this and develop malware-infected apps for smartphone users on a daily basis. In the literature [5, 16,17,18], researchers and academicians have proposed various malware detection frameworks based on machine learning techniques, with considerable success. Developing an accurate malware detection model with classification algorithms usually depends on collecting extensive datasets. However, such collection compromises the privacy of smartphone users [23]. To address this issue, a decentralized approach is needed that respects user privacy and does not expose user data to third parties.

In the literature [14, 19], academicians and researchers have proposed three kinds of machine learning solutions for malware detection: client-based, cloud-based, and hybrid (a combination of cloud-based and client-based) techniques. In the client-based approach, the collected data is protected because the machine learning model is developed locally on the host computer, but the process is time-consuming and yields the highest number of false positives. The cloud-based approach reverses this trade-off: the model is developed using a large set of features, but it also reveals which apps each user has installed. Finally, in the hybrid approach, malware-infected apps are sent to the cloud for further analysis; this approach also produces a high number of false positives and reveals users' private data to the cloud.

Machine learning algorithms are undoubtedly very effective at identifying malware on Android devices. However, developing an effective malware detection model requires a large amount of information. In [25], it was observed that a large number of features are gathered in a central place to train and test the model. In addition, it has been shown that the developed model can memorize and disclose information about its training dataset. To address this issue, this study answers the following key question: How can we create a decentralized, privacy-preserving classifier for Android malware?

In this study, we propose DNNdroid, a classification model that applies the principle of federated learning while respecting user privacy. The proposed model collects features from the user's smartphone without prior knowledge of whether an app was installed from the official Play Store or from another repository. The proposed framework reduces users' dependence on cloud-based techniques and also benefits them in terms of privacy.

In the literature [1, 2, 6, 12, 21], researchers and academicians have discussed state-of-the-art federated learning techniques in which smartphone users evaluate their data locally with a supervised machine learning algorithm and upload the results to the cloud to improve the model. Our proposed model extends this work by incorporating deep learning into model training. We evaluate the proposed model on 100,000 unique Android apps, of which 75,000 are benign and 25,000 are malware-infected, with more than 500 users and 108 rounds of federation. Additionally, we compare our approach with existing frameworks from the literature and with several anti-virus scanners currently available on the market.

The novel contributions of this study are as follows:

  • To the best of our knowledge, this is the first research paper to train a model on dynamic features of Android apps while also preserving users' privacy.

  • We demonstrate the effectiveness of the proposed model in detecting malware-infected apps.

The rest of the paper is organized as follows. Section 2 discusses related work on Android malware detection using federated learning. Section 3 describes the collection of datasets from different app repositories. Section 4 discusses the machine learning technique implemented in the proposed framework. Section 5 describes the architecture of the proposed framework, and Sect. 6 evaluates it. In Sect. 7, we compare the proposed framework with existing frameworks and with different anti-virus scanners available on the market. Section 8 presents the experimental findings, and Sect. 9 concludes the study and outlines its future scope.

2 Related Work

Hsu et al. [9] proposed an Android malware detection system named privacy-preserving federated learning (PPFL). Their model uses a support vector machine (SVM) as the base classifier and is trained on features obtained through static analysis. Their experimental results show that the proposed model achieves a higher detection rate than decentralized models, and that accuracy increases as the number of clients increases. Gálvez et al. [6] proposed a malware detection model named LiM that combines semi-supervised machine learning with federated learning. Their experiments were performed on 50,000 Android apps with 200 users and 50 rounds of federation. Taheri et al. [26] proposed a malware detection model named Fed-IIoT for detecting malware in the Industrial Internet of Things (IIoT). Their experimental results corroborate the high accuracy of their attack and defence algorithms and show that the A3GAN defensive strategy preserves the robustness of data privacy for Android mobile users while being around 8% more accurate than current state-of-the-art solutions.

3 Datasets

In this study, we collect Android application packages (.apk) from the Google Play Store (Footnote 2), AppChina (Footnote 3), Android (Footnote 4), and Mumayi (Footnote 5). Malware-infected apps were collected from AndroMalShare (Footnote 6) and the Malgenome project [29]. Table 1 summarizes the collected Android apps.

Table 1. Collected Android apps.

Feature Dataset. Features are extracted following the approach of [15]; in total, 1,844 distinct features are extracted from the collected Android apps. Features play an important role in training the classification model. In the literature [11, 20], researchers and academics have proposed different feature selection techniques. In this work, we apply the chi-square test to select the significant features used to train the model.
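As a minimal sketch of chi-square feature selection, assuming each extracted feature is binary (1 if present in an app, 0 otherwise) and each label is 1 for malware and 0 for benign, the Python code below scores every feature with the chi-square statistic of its 2×2 contingency table against the label and keeps the top k. The function names are hypothetical; the paper does not describe its implementation.

```python
from typing import List, Sequence


def chi_square_score(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-square statistic of the 2x2 contingency table of a binary feature
    (1 = present in the app) against a binary label (1 = malware)."""
    n = len(labels)
    # observed[f][y] = number of apps with feature value f and label y
    observed = [[0, 0], [0, 0]]
    for f, y in zip(feature, labels):
        observed[f][y] += 1
    row = [sum(observed[0]), sum(observed[1])]
    col = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            expected = row[f] * col[y] / n
            if expected > 0:  # skip empty rows/columns (e.g. constant features)
                score += (observed[f][y] - expected) ** 2 / expected
    return score


def select_top_k(matrix: List[List[int]], labels: List[int], k: int) -> List[int]:
    """Return the indices of the k features with the highest chi-square scores."""
    n_features = len(matrix[0])
    scores = [chi_square_score([row[j] for row in matrix], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

In the setting of Table 3, k would be 50, 100, 150, 200, or 300. Note that scikit-learn's `SelectKBest` with `chi2` scores features somewhat differently (it treats feature values as frequencies and only counts occurrences), so its ranking may not match this contingency-table version exactly.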

Fig. 1. Architecture of deep neural network.

4 Machine Learning Technique

A deep neural network (DNN) serves as the base learner for training the model in the cloud-based part of our architecture. In the literature, two distinct methods have been proposed for building models with DNNs: Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN). In the current study, we build our deep learning model using the DBN architecture, which is shown in Fig. 1. Training consists of two stages: unsupervised pre-training in the first and supervised back-propagation in the second. In the pre-training stage, the model is built from Restricted Boltzmann Machines (RBMs) and a deep neural network through an iterative procedure using unlabeled Android apps. In the back-propagation stage, the pre-trained DBN is fine-tuned in a supervised way using labelled Android apps. Android apps are therefore used in both stages of training.
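The unsupervised pre-training stage can be sketched in Python as follows. This is a minimal, illustrative sketch under assumptions the paper does not state: binary (Bernoulli) RBMs, one step of contrastive divergence (CD-1) per sample, greedy layer-wise stacking, and arbitrary layer sizes. The class and function names are hypothetical. The supervised stage would add an output layer on top of the stacked RBMs and fine-tune all weights with back-propagation on labelled apps; it is omitted for brevity.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # W[i][j] connects visible unit i to hidden unit j
        self.W = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v: Vector) -> Vector:
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h: Vector) -> Vector:
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0: Vector, lr: float = 0.1) -> None:
        """One CD-1 step on a single unlabeled sample (no labels are used)."""
        h0 = self.hidden_probs(v0)
        # Sample binary hidden states, then reconstruct the visible layer.
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_dbn(data: List[Vector], layer_sizes: List[int], epochs: int = 5) -> List[RBM]:
    """Greedy layer-wise pre-training: each RBM is trained on the hidden
    activations produced by the previous RBM."""
    rbms: List[RBM] = []
    layer_input = data
    n_in = len(data[0])
    for k, n_hidden in enumerate(layer_sizes):
        rbm = RBM(n_in, n_hidden, seed=k)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v)
        rbms.append(rbm)
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
        n_in = n_hidden
    return rbms
```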

5 Proposed Framework Architecture

Federated learning is a decentralized approach to training a model: clients carry out the process locally and share only the outcome with the service provider. Its success depends on the availability of a labeled dataset to train the model. However, smartphone users cannot tell whether an app should be labeled malicious or benign. To overcome this issue, our study applies supervised learning on the cloud side, where a labeled dataset is available, and unsupervised learning on the client side, where the data are unlabeled.

Fig. 2. Proposed framework, DNNdroid.

In the proposed framework, federation takes place in the cloud, while each client evaluates its unlabeled data at test time. The cloud server collects the weights from all clients, aggregates them, and presents the aggregated weights. Figure 2 shows the architecture of the proposed framework. The following steps are taken to train and evaluate the model.

  1. Server Side: The server first trains a base classifier on labeled data and uses unlabeled data to estimate weights from it.

  2. Client Side: Each client receives the baseline classifier and the base learner.

  3. Weight Gain: Each client estimates weights from its installed apps.

  4. Calculate Estimated Weight: The client calculates the average weight.

  5. Predict at the Client Side: The client classifies its installed apps.

  6. Complete the Process: The client computes the aggregate weight and uploads it to the cloud for further processing.

  7. Aggregate at the Cloud: The cloud collects all client weights and averages them.

  8. Median Client-Cloud Weight: Finally, the median of the client and cloud weights is computed and used for further processing.
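The weight-handling steps above (steps 3, 4, 6, 7, and 8) can be sketched in Python under our own reading, which the paper does not spell out: each client averages the weight vectors estimated from its installed apps, the cloud takes a plain unweighted mean of the uploaded vectors, and the "median client-cloud weight" of step 8 is taken as an element-wise median over all uploaded client vectors plus the cloud average. All function names are hypothetical, and model training itself is not shown.

```python
from statistics import mean, median
from typing import Callable, List, Sequence

Weights = List[float]


def elementwise(fn: Callable[[Sequence[float]], float], vectors: List[Weights]) -> Weights:
    """Apply a reducer (mean or median) to each coordinate across vectors."""
    return [fn(coords) for coords in zip(*vectors)]


def client_update(per_app_weights: List[Weights]) -> Weights:
    """Steps 3, 4 and 6: average the weights estimated from a client's apps."""
    return elementwise(mean, per_app_weights)


def cloud_aggregate(client_weights: List[Weights]) -> Weights:
    """Step 7: unweighted mean of the weights uploaded by all clients."""
    return elementwise(mean, client_weights)


def median_client_cloud(client_weights: List[Weights], cloud_weights: Weights) -> Weights:
    """Step 8 (assumed interpretation): element-wise median over the uploaded
    client weights together with the cloud's averaged weights."""
    return elementwise(median, client_weights + [cloud_weights])


def federation_round(clients: List[List[Weights]]) -> Weights:
    """One round: each client summarizes its apps, the cloud aggregates."""
    uploaded = [client_update(apps) for apps in clients]
    cloud = cloud_aggregate(uploaded)
    return median_client_cloud(uploaded, cloud)
```

Using a median in the last step makes the result less sensitive to a single client uploading extreme weights than a plain mean would be.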

6 Evaluation of Proposed Framework

To evaluate the proposed framework, we set up a server with 500 users iterating over 100 federation rounds. The client models run in parallel on an Intel Core i7 machine with 16 GB of RAM. We use two performance measures to evaluate the proposed model: accuracy and F-measure. Table 2 shows the confusion matrix used to determine whether an app is malware-infected or benign.

Table 2. Confusion matrix considered in this study (.apk).
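As an illustrative sketch, the four cells of the confusion matrix can be counted from actual and predicted labels as below, using the letters a, b, c, and d defined with the evaluation equations in this section. The label strings and the function name are assumptions made for the example.

```python
from typing import Dict, Sequence

MALWARE, BENIGN = "malware", "benign"  # hypothetical label strings


def confusion_counts(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, int]:
    """Count confusion-matrix cells:
    a = malware->malware, b = benign->malware,
    c = malware->benign,  d = benign->benign."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    cell = {
        (MALWARE, MALWARE): "a",
        (BENIGN, MALWARE): "b",
        (MALWARE, BENIGN): "c",
        (BENIGN, BENIGN): "d",
    }
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for truth, guess in zip(actual, predicted):
        counts[cell[(truth, guess)]] += 1
    return counts
```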

The following terms are used to evaluate the proposed framework.

  • Recall: Recall is the proportion of actual positive (malware-infected) examples in the dataset that are correctly predicted as positive.

    $$\begin{aligned} Recall =\frac{a}{a + c},\end{aligned}$$
    (1)

    where \(a= N_{Malware\rightarrow Malware},\)

    \(b= N_{Benign\rightarrow Malware},\)

    \(c= N_{Malware\rightarrow Benign}\)

  • Precision: Precision is the proportion of apps predicted as positive (malware-infected) that actually belong to the positive class.

    $$\begin{aligned} Precision =\frac{a}{a + b}.\end{aligned}$$
    (2)

Accuracy: Accuracy is computed as mentioned in [14]:

$$\begin{aligned} Accuracy=\frac{a+d}{N_{classes}},\end{aligned}$$
(3)

where \({N_{classes} = a+b+c+d}\),

\(d= N_{Benign\rightarrow Benign}\)

F-measure: F-measure is computed as mentioned in [14]:

$$\begin{aligned} F\text{-}measure &=\frac{2 \cdot Recall \cdot Precision}{Recall+Precision} \\ &=\frac{2a}{2a+b+c} \end{aligned}$$
(4)
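Equations (1) to (4) can be checked with a small helper such as the one below, a sketch in which the class name is hypothetical. The false positive rate plotted in Fig. 3 is not defined in the text; the standard definition b/(b + d), the share of benign apps flagged as malware, is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    a: int  # malware predicted as malware
    b: int  # benign predicted as malware
    c: int  # malware predicted as benign
    d: int  # benign predicted as benign

    def recall(self) -> float:
        return self.a / (self.a + self.c)  # Eq. (1)

    def precision(self) -> float:
        return self.a / (self.a + self.b)  # Eq. (2)

    def accuracy(self) -> float:
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)  # Eq. (3)

    def f_measure(self) -> float:
        return 2 * self.a / (2 * self.a + self.b + self.c)  # Eq. (4), closed form

    def false_positive_rate(self) -> float:
        # Assumed standard definition; not given in the text.
        return self.b / (self.b + self.d)
```

For example, with a = 8, b = 2, c = 4, and d = 6, recall is 8/12, precision is 0.8, accuracy is 0.7, and the F-measure is 16/22, which equals the harmonic mean of recall and precision, as the second line of Eq. (4) states.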

Table 3 shows the accuracy and F-measure, calculated with the above equations, for features selected by chi-square analysis. The empirical results show that the optimal detection rate is obtained with 300 unique features. Figure 3 shows the computed F-measure and false positive rate over 40 rounds of federation.

Table 3. Accuracy and F-measure calculated using the first 50, 100, 150, 200, and 300 features with 50 rounds of federated learning.

Fig. 3. F-measure and false positive rate across federation rounds.

Based on Table 3 and Fig. 3, we make the following observations:

  • An optimal number of features is required to train the model.

  • Increasing the number of federation rounds directly affects the detection rate of malware-infected apps.

  • Increasing the number of federation rounds also affects the false positive rate.

7 Comparison of Proposed Framework

In this study, we use two different methods to validate the proposed work, described below:

7.1 Comparison with Frameworks Available in the Literature

To examine whether the proposed framework is comparable to previously developed frameworks, we compare it with ten distinct frameworks from the literature, using the Drebin dataset [3]. Table 4 shows the results of this empirical analysis.

Table 4. Comparing the proposed framework to existing frameworks or methods.

7.2 Comparison of the Proposed Framework with Different Anti-virus Scanners

We also compare the proposed framework with free anti-virus scanners available on the market for detecting malware, again using the Drebin dataset to provide empirical results. Table 5 compares these anti-virus scanners with the proposed framework.

Table 5. Comparative analysis using various antivirus scanners.

8 Experimental Findings

Based on the experimental outcomes, we draw the following findings:

  • Tables 4 and 5 provide evidence that the suggested framework is capable of identifying malware in real-world apps.

  • Based on empirical findings, it can be concluded that the suggested methodology can identify malware-infected apps more quickly than other anti-virus scanners on the market.

9 Conclusion

Based on the empirical studies, it can be concluded that our framework achieves an accuracy of 98.7% and can identify malware-infected apps using 500 unique attributes and 40 federation rounds. Additionally, the experimental results show that our framework is more accurate than the anti-virus scanners and the frameworks from the literature against which it was compared. In future work, we will extend this study by implementing other feature selection approaches and soft computing techniques.