Abstract
The working of android apps depends upon permissions. With the exponential growth of android apps in the last decade, the security of smartphones has become a crucial factor. In the literature, academicians and researchers have proposed malware detection frameworks based on simple neural networks and regression analysis. In this research article, three hybrid artificial intelligence techniques are proposed. The proposed approaches combine the functional link artificial neural network (FLANN) with the clonal selection algorithm (CSA), particle swarm optimization (PSO) and the genetic algorithm (GA), i.e., FLANN-CSA (FCSA), FLANN-PSO (FPSO and MFPSO) and FLANN-genetic (FGA and AFGA). The proposed machine learning techniques are applied to five million distinct android apps. In addition, this research article examines feature selection techniques, namely rough set analysis (RSA) and principal component analysis (PCA), when they are applied to malware detection. Empirical results reveal that feature reduction approaches are extremely effective in detecting malware when employing FLANN-genetic.
1 Introduction
Android gained popularity in 2011 owing to its open-source nature and the number of free apps in its official play store. According to the statistics, more than 2.87 million free apps are present in the Google Play Store. The working of android apps depends upon permissions. At the time of installation, android apps request certain permissions that are required for their proper functioning. On a daily basis, cyber-criminals take advantage of these permissions and develop malware-infected apps that target smartphone users. According to a survey by the Kaspersky Security Network, millions of malware-infected apps are still submitted to the Google Play Store and third-party app stores.
According to a report published by Gartner, the smartphone market is expected to grow by 11% in the upcoming year. During the pandemic, people came to depend upon apps for their jobs. At the time of installation and at run-time, android apps demand certain permissions. Google defines these permissions as “normal” or “dangerous.” Normal permissions have no impact on the user’s privacy. In contrast, dangerous permissions have a great effect on the user’s privacy. The fault lies in the underlying permission model of android apps.
In the literature [12, 14–24], a number of authors have proposed android malware detection frameworks using supervised and unsupervised machine learning techniques. The main limitation of their work is that the researchers and academicians used limited datasets. In order to achieve a better detection rate, in this research article we propose a framework based on hybrid artificial intelligence techniques that combine the functional link artificial neural network (FLANN) with the clonal selection algorithm (CSA), particle swarm optimization (PSO) and the genetic algorithm (GA), i.e., FLANN-CSA (FCSA), FLANN-PSO (FPSO and MFPSO) and FLANN-genetic (FGA and AFGA). This study also examines the effectiveness of feature selection techniques, i.e., principal component analysis (PCA) and rough set analysis (RSA), which are used to reduce the complexity of the proposed model by minimizing the number of inputs.
The generic steps followed in this research paper to identify malware-infected apps are shown in Fig. 1. In the first step, we collect Android Application Package (.apk) files from different repositories. In the second step, we extract dynamic features and form the features dataset. In the third step, feature selection techniques are applied to select the relevant features. In the last step, we validate the developed models using two performance parameters, i.e., accuracy and F-measure.
The unique and novel contributions of this study are as follows:
- To build an efficient and effective malware detection model, more than five million android apps are utilized in this study.
- Dynamic analysis was performed on the collected android apps, and 1844 unique features were extracted.
- In this chapter, five different hybrid functional link artificial neural networks are proposed.
The rest of the chapter is organized as follows. In Sect. 2, related work is discussed. The collection of .apk files and the formulation of the feature dataset are discussed in Sect. 3. The implemented feature selection techniques are discussed in Sect. 4. Section 5 discusses the proposed hybrid machine learning algorithms. The experimental setup for the proposed framework is discussed in Sect. 6. The outcome of the experiment is discussed in Sect. 7. Finally, the chapter is concluded in Sect. 8.
2 Related Work
Hou et al. [7] proposed a malware detection framework named “Droiddelver” based on Application Programming Interface (API) calls extracted from smali files. The proposed model was built using 5000 different android apps and a deep belief network as the machine learning technique. The empirical outcome reveals that the proposed model was able to detect 96.66% of malware-infected apps. Hou et al. [6] proposed a malware detection model named “Deep4MalDroid,” developed on the basis of a dynamic analysis approach called component traversal, which follows the code routines of a particular android app. Based on the extracted features, they constructed weighted directed graphs and then applied deep learning as the machine learning algorithm. An experiment was performed using 3000 android apps, and the model detected 91.4% of malware-infected apps.
Mahindru and Singh [25] proposed a dynamic analysis-based approach built using 123 features. An experiment was performed using 11,000 distinct android apps and five different machine learning algorithms. The malware detection model developed using simple logistic achieved a higher detection rate than the others. Hou et al. [8] developed a framework entitled “HinDroid” based on the relationships between API calls, which yield higher-level semantics that require more effort from attackers. An experiment was performed using two different datasets; i.e., one contains 1834 distinct android apps, and the second contains 30,000 distinct android apps. The proposed malware detection framework was able to identify 99.01% of malware-infected apps. Martín et al. [26] developed a model named “MOCDroid” that is based on a genetic algorithm. An experiment was performed using 17,135 android apps and achieved an accuracy of 94.60%. Tong and Yan [30] proposed a hybrid approach that works on a combination of static and dynamic features. The experiment was carried out using 2000 different android apps, with API calls considered as features. The proposed malware detection model achieved a detection rate of 90.19%.
Karbab et al. [10] developed a malware detection model named “MalDozer” that is based on deep learning techniques. The developed model uses API call sequences to recognize the behavior of benign and malware apps. The developed framework was tested on 38 K benign apps and 33 K malware-infected apps and achieved an F1-score of 96–99%. Cai et al. [4] proposed a dynamic malware detection approach that uses calls and inter-component communication as features. An experiment was performed using 34,343 android apps, and the proposed framework achieved an accuracy of 97%. Kim et al. [11] developed a malware detection model on the basis of multimodal deep learning. Features were extracted from the manifest file, dex file and shared libraries to develop the model. The developed model was tested with 41,260 android apps and achieved an accuracy of 98%. Yerima et al. [34] proposed a detection model entitled “DroidFusion” that is based on feature selection techniques and combines multiple machine learning algorithms. The proposed malware detection model was tested with 55,018 distinct smartphone apps and achieved a detection rate of 97%. Shen et al. [28] developed a malware detection model that works on the principle of information flow analysis. The developed model uses the structure of information flows to learn behavior patterns, which helps in distinguishing between benign and malware apps. An experiment was performed using 8598 android apps and achieved an accuracy of 82%.
Arora et al. [1] developed a malware detection framework that works on graphs constructed from permissions extracted from distinct android apps. An experiment was performed using 5993 android apps and achieved a detection rate of 95.44%. Xiao et al. [32] developed a model using deep learning principles. The proposed model is built using system call sequences and long short-term memory as the machine learning technique. An experiment was performed using 7103 android apps and achieved an accuracy of 96.6%. Mahindru and Sangal [14] developed a malware detection framework entitled “DeepDroid” using significant features selected by feature selection approaches and deep learning as the machine learning technique. The experimental outcome reveals that the framework built using principal component analysis (PCA) as the feature selection technique achieved a higher detection rate than the other techniques. Kumar et al. [12] built a detection framework using three different data sampling approaches, three different feature selection approaches and seven distinct classifiers. The outcome reveals that the framework developed using the upscale sampling technique and ELM with a polynomial kernel achieved a higher detection rate than the others.
Mahindru and Sangal [16] developed a malware detection framework on the basis of semi-supervised machine learning techniques. The proposed framework is developed using four different feature subset selection approaches and LLGC as the machine learning algorithm. The empirical result reveals that the framework built using rough set analysis as the feature selection approach achieved a detection rate of 97.8%. Mahindru and Sangal [17] developed a malware detection model entitled “GADroid” that is built using a genetic algorithm as the feature selection approach. The selected features are then used to build the model, with deep learning as the machine learning technique. The experiment was performed on 560,142 distinct android apps, and the developed model achieved an accuracy of 98.6%.
Mahindru and Sangal [19] developed a model named “PARUDroid.” The proposed model is able to detect 98.8% of malware-infected apps. Table 1 describes the frameworks developed in the literature. Previous malware detection models were built with limited datasets and achieved high accuracy on those limited datasets. On the basis of the related work, the following questions are answered in this research article:
RQ1. Which malware detection model is more effective in detecting malware from real-world apps?
This question helps in identifying the malware detection model that is more effective in detecting malware from real-world apps. To answer this question, distinct malware detection models are developed in this study and compared using two different performance parameters, i.e., F-measure and accuracy.
RQ2. Is the proposed malware detection framework able to identify malware from android devices?
To examine this question, the proposed framework is compared with existing malware detection models presented in the literature.
RQ3. Do features selected using feature selection approaches have any impact on malware detection models?
To answer this question, the model developed using all extracted features is compared with the models developed using feature selection techniques.
3 Collection of .apk Files and Formulation of Features Dataset
Five million distinct android apps were collected for use in this research article. Benign .apk files were collected from slideme, mumayi, hiapk, appchina, Google’s play store, Android, gfan and pandaapp. Malware-infected apps were collected from the Android Malware Genome project [35]; 1929 botnet samples were collected from [9] and from AndroMalShare, along with their package names. Table 2 lists the distinct categories of android apps and the number of apps in each. Dynamic analysis was performed using the principle mentioned in [19]. After that, we divided the extracted features into the categories to which they belong. The formulation of the feature dataset is given in Table 3.
4 Feature Selection Techniques
Relevant features play an important role in the effectiveness and efficiency of malware detection models. In this research article, two different feature selection approaches are considered for selecting relevant features, i.e., principal component analysis (PCA) and rough set theory.
4.1 Principal Component Analysis (PCA)
PCA is used as the feature selection technique to project the data into a lower-dimensional space. Figure 2 demonstrates the steps that are followed while selecting features using PCA.
4.2 Rough Set Theory
Rough set theory is used to eliminate irrelevant features by means of approximation, attribute reduction and information methods. The steps that are followed in rough set theory are shown in Fig. 3.
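Since the steps of Fig. 3 are not reproduced here, the following is only a minimal sketch of one common rough-set procedure: computing the dependency degree (the fraction of samples in the positive region) and greedily growing a reduct until it matches the dependency of the full attribute set. The function names, the toy feature matrix and the greedy search strategy are assumptions for illustration and are not taken from the chapter.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for idx, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(idx)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def greedy_reduct(rows, labels):
    """Add the attribute that raises dependency most until the full level is hit."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen

# Hypothetical binary features (1 = behaviour observed) for four apps.
rows = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
labels = ["malware", "malware", "benign", "benign"]
reduct = greedy_reduct(rows, labels)
```

In this toy table the first feature alone separates the two classes, so the greedy search stops after selecting it; the other two features are treated as irrelevant.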
5 Proposed Hybrid Machine Learning Techniques
In this section, we discuss the machine learning algorithms that are developed using the genetic algorithm, clonal selection and particle swarm optimization for detecting malware from android apps.
5.1 Functional Link Artificial Neural Network (FLANN)
In this research article, FLANN is implemented to detect malware from android apps. FLANN is built on a single-layer artificial neural network (ANN) architecture that is capable of making complex decisions. The computational cost of a conventional ANN is very high, whereas that of FLANN is much lower because it has no hidden layers. Figure 4 demonstrates the basic architecture of FLANN.
Output is computed by using following equations:
where z and \(\hat{z}\) are the estimated and actual values, \(a_i\) is the function block and W is the weight vector that is defined by using
The revised weight is updated as:
where \(e_i\) is the error value and \(\alpha \) is the learning rate that is determined as:
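As an illustration of the single-layer idea, the sketch below shows a FLANN-style model in which each input is expanded into a functional block and the weights are updated from the error and a learning rate. The trigonometric expansion, the tanh output, the delta-rule update and the ±1 label encoding are assumptions commonly used for FLANN, not the chapter's exact equations.

```python
import math

def expand(x):
    """Functional expansion of one feature vector (assumed trigonometric basis)."""
    block = [1.0]  # bias term
    for v in x:
        block.extend([v, math.sin(math.pi * v), math.cos(math.pi * v)])
    return block

def predict(weights, x):
    """Estimated output: tanh of the weighted sum over the expanded block."""
    return math.tanh(sum(w * a for w, a in zip(weights, expand(x))))

def train(samples, targets, alpha=0.1, epochs=200):
    """Delta-rule training: w <- w + alpha * e * (1 - z_hat^2) * a."""
    weights = [0.0] * len(expand(samples[0]))
    for _ in range(epochs):
        for x, z in zip(samples, targets):
            a = expand(x)
            z_hat = math.tanh(sum(w * ai for w, ai in zip(weights, a)))
            e = z - z_hat                    # error value
            grad = e * (1.0 - z_hat ** 2)    # derivative of tanh
            weights = [w + alpha * grad * ai for w, ai in zip(weights, a)]
    return weights

# Hypothetical one-feature toy data: -1 = benign, +1 = malware.
samples = [[0.1], [0.2], [0.8], [0.9]]
targets = [-1.0, -1.0, 1.0, 1.0]
weights = train(samples, targets)
```

Because there is no hidden layer, each update touches only one weight per expanded term, which is where the lower computational cost comes from.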
5.2 FLANN-Genetic (FGA) Technique
This technique is very effective during learning, where it is mostly used for updating the weights. A functional link neural network of the form ‘\(a-x\)’ is considered for estimation; i.e., the network contains l input neurons and x output neurons.
Weights are calculated using the following equation:
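A minimal sketch of how a genetic algorithm can search for FLANN weights is given below. The tournament selection, single-point crossover, Gaussian mutation, elitism, the parameter values and the helper names are all illustrative assumptions; the chapter's own weight equation is not reproduced here.

```python
import math
import random

def flann_output(weights, x):
    """Assumed FLANN output: tanh over a bias plus a trigonometric expansion."""
    block = [1.0]
    for v in x:
        block += [v, math.sin(math.pi * v), math.cos(math.pi * v)]
    return math.tanh(sum(w * a for w, a in zip(weights, block)))

def mse(weights, samples, targets):
    """Mean squared error, used as the quantity the GA minimises."""
    return sum((z - flann_output(weights, x)) ** 2
               for x, z in zip(samples, targets)) / len(samples)

def evolve(samples, targets, n_weights, pop_size=20, generations=40,
           p_c=0.8, p_m=0.1, seed=0):
    rng = random.Random(seed)
    cost = lambda w: mse(w, samples, targets)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=cost)
        history.append(cost(pop[0]))
        nxt = [pop[0][:]]                              # elitism: keep the best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=cost)     # tournament selection
            p2 = min(rng.sample(pop, 3), key=cost)
            child = p1[:]
            if rng.random() < p_c:                     # single-point crossover
                cut = rng.randrange(1, n_weights)
                child = p1[:cut] + p2[cut:]
            child = [w + rng.gauss(0, 0.3) if rng.random() < p_m else w
                     for w in child]                   # per-gene mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost), history

samples = [[0.1], [0.2], [0.8], [0.9]]     # hypothetical toy data
targets = [-1.0, -1.0, 1.0, 1.0]
best, history = evolve(samples, targets, n_weights=4)
```

Keeping the best individual in every generation means the recorded error never increases, which makes the search easy to monitor.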
5.3 Adaptive FLANN-Genetic (AFGA) Technique
This approach adapts two parameters as the search advances, i.e., the probability of mutation \((P_m)\) and the probability of crossover \((P_c)\). The updated values of \((P_m)_{k+1}\) and \((P_c)_{k+1}\) are calculated using the following equations:
5.4 FLANN Particle Swarm Optimization (FPSO) Technique
It is based on particle swarm optimization and the functional link neural network. PSO is used to update the weights during the learning phase. Figure 5 represents the execution of PSO. The formula to calculate the fitness value is:
where X is the position of particles and V is the velocity.
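To make the roles of the position X and the velocity V concrete, the sketch below implements the standard textbook PSO update, V ← ωV + c₁r₁(pbest − X) + c₂r₂(gbest − X) followed by X ← X + V. The inertia and acceleration constants, the function names and the sphere objective are assumptions; in FPSO the objective would be the FLANN training error and each position a weight vector.

```python
import random

def pso(objective, dim, n_particles=15, iterations=50,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise objective with standard global-best PSO."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                      # personal best positions
    pbest_val = [objective(x) for x in X]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # global best position
    history = [gbest_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (inertia * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            val = objective(X[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = X[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = X[i][:], val
        history.append(gbest_val)
    return gbest, history

def sphere(w):
    """Stand-in objective; replace with the model's training error."""
    return sum(v * v for v in w)

best, history = pso(sphere, dim=3)
```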
5.5 FLANN-Clonal Selection Algorithm (FCSA) Approach
FCSA is a hybrid approach using clonal selection algorithm and functional link neural network [13].
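The chapter does not spell out the clonal selection steps, so the following is only a generic sketch of a clonal selection loop that could tune a weight vector: the best antibodies are cloned, clones of better-ranked antibodies are mutated less, and one random newcomer is injected per generation. All names, parameter values and the stand-in objective are assumptions.

```python
import random

def clonal_selection(objective, dim, pop_size=10, n_select=4, clone_factor=3,
                     generations=30, seed=0):
    """Minimise objective with a simple clonal-selection scheme."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=objective)
        history.append(objective(pop[0]))
        clones = []
        for rank, antibody in enumerate(pop[:n_select]):
            step = 0.1 * (rank + 1)        # better rank -> smaller mutation
            for _ in range(clone_factor):
                clones.append([w + rng.gauss(0, step) for w in antibody])
        # Keep the best of parents and clones, then add one random newcomer.
        merged = sorted(pop + clones, key=objective)[:pop_size - 1]
        merged.append([rng.uniform(-1, 1) for _ in range(dim)])
        pop = merged
    return min(pop, key=objective), history

def sphere(w):
    """Stand-in objective; in FCSA this would be the FLANN training error."""
    return sum(v * v for v in w)

best, history = clonal_selection(sphere, dim=3)
```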
5.6 Modified-FLANN Particle Swarm Optimization (MFPSO) Technique
The main difference between the PSO and MFPSO approaches is that in MFPSO a mutation stage is included just after the completion of the first stage. The following equation is used to calculate the updated mutation value.
where \(P_m\) is the first state of mutation and n is the generation number.
6 Experimental Setup
In this section of the chapter, we discuss the experimental setup used to determine whether the developed malware detection model is effective. Six different hybrid functional link artificial neural network machine learning algorithms are implemented in this chapter. The proposed framework is shown in Fig. 6. In the first phase, feature selection techniques, i.e., PCA and rough set theory, are applied to select significant features. In the second phase, the min-max approach is applied to normalize the features. Distinct malware detection models are developed using six different machine learning techniques. After that, a confusion matrix is built using the technique mentioned in [23, 24]. By comparing the malware detection models, the most suitable model is selected and compared with the existing frameworks mentioned in the literature. If the detection rate of the proposed framework is higher than that of the existing frameworks, the proposed framework is considered useful; otherwise, it is not.
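The normalization and evaluation steps can be sketched as follows, using the standard definitions of min-max scaling, accuracy and F-measure. The function names, the "malware"/"benign" label strings and the toy predictions are illustrative assumptions.

```python
def min_max(column):
    """Scale one feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column carries no information
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def confusion(actual, predicted, positive="malware"):
    """Return (tp, fp, tn, fn) with `positive` as the malware class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

actual = ["malware", "malware", "malware", "benign", "benign", "benign"]
predicted = ["malware", "malware", "benign", "benign", "benign", "malware"]
counts = confusion(actual, predicted)
```

For these six toy apps, two malware samples and two benign samples are classified correctly, so both accuracy and F-measure come out at two thirds.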
7 Outcomes
This section presents the outcomes obtained by applying the feature selection approaches and machine learning algorithms.
7.1 Feature Selection Approaches
With PCA, the components whose eigenvalue is greater than 1 are retained as relevant features, while features selected using rough set analysis are based on a heuristic search. The features selected using PCA and rough set analysis are shown in Fig. 7.
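The eigenvalue-greater-than-1 rule can be sketched as below. The example uses the correlation matrix of the features and a small Jacobi rotation routine so that it runs with the standard library alone; both choices, the helper names and the toy columns are assumptions. In practice a linear algebra library routine such as `numpy.linalg.eigh` would normally replace the hand-written eigenvalue helper.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long, non-constant columns."""
    n = len(columns[0])
    unit = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in unit] for ci in unit]

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):            # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):            # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def components_to_keep(columns):
    """Eigenvalues of the components retained by the greater-than-1 rule."""
    return [ev for ev in jacobi_eigenvalues(correlation_matrix(columns)) if ev > 1.0]

# Hypothetical data: the second feature duplicates the first up to scale.
columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [2, 1, 4, 3, 5]]
eigenvalues = jacobi_eigenvalues(correlation_matrix(columns))
kept = components_to_keep(columns)
```

With three strongly correlated toy features, a single component carries most of the variance and is the only one whose eigenvalue exceeds 1.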
7.2 Machine Learning Approaches
Tables 4 and 5 report the measured values of accuracy and F-measure using PCA and rough set analysis, computed with the equations mentioned in the literature [18, 19]. From the tables, it may be inferred that:
- The highest detection rate is shown in bold.
- Models developed using feature selection techniques achieved a higher detection rate than those developed using the full extracted feature set.
- The model developed using FLANN-genetic achieved a higher detection rate than FLANN-PSO and FLANN-CSA.
In order to examine whether the developed malware detection models are effective, box-plot diagrams of the individual models are constructed. Figures 8 and 9 show the box-plot diagrams for accuracy and F-measure using the feature selection approaches. From the figures, it can be concluded that:
- Based on Figs. 8 and 9, the model developed using RSA as the feature selection technique achieves a higher detection rate.
- On the basis of Fig. 9, the model developed using FLANN-genetic has a higher median value and few outliers. The model built using RSA achieved a higher detection rate than the one built using PCA.
7.3 Comparison with Existing Developed Frameworks
In order to find out whether the developed malware detection model is effective in detecting malware, it is compared in this chapter with existing frameworks from the literature. To perform this experiment, a freely available dataset, i.e., Drebin [2], is considered. Table 6 presents the comparison with existing approaches or frameworks from the literature.
7.3.1 Experimental Findings
In this chapter, a framework is developed using android apps and a hybrid artificial neural network. Based on the outcome, this study is able to answer the questions discussed in Sect. 2.
RQ1: In the present study, six different machine learning techniques are implemented to develop malware detection models. Based on Tables 4 and 5, it can be inferred that the model built using FLANN-genetic is more effective in detecting malware-infected apps on android.
RQ2: Yes, the proposed detection model is effective in identifying malware-infected apps when compared to existing frameworks from the literature.
RQ3: From Tables 4 and 5, it can be concluded that feature selection techniques have a significant role in building the malware detection model. Models developed using feature selection techniques are very effective when compared to the model developed using all extracted features.
8 Conclusion
This chapter developed malware detection models using distinct android apps. In addition, it is observed that the feature selection approach also plays a significant role in selecting the relevant features from all extracted features. Moreover, the model developed using the hybrid approach is more capable of detecting malware than previously developed frameworks.
References
Arora, A., Peddoju, S.K., Conti, M.: Permpair: android malware detection using permission pairs. IEEE Trans. Inf. Forensics Secur. 15, 1968–1982 (2019)
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of android malware in your pocket. NDSS 14, 23–26 (2014)
Burguera, I., Zurutuza, U., Nadjm-Tehrani, S.: Crowdroid: behavior-based malware detection system for android. In: Proceedings of the 1st ACM Workshop on Security and Privacy in Smartphones and Mobile Devices, pp. 15–26 (2011)
Cai, H., Meng, N., Ryder, B., Yao, D.: Droidcat: effective android malware detection and categorization via app-level profiling. IEEE Trans. Inf. Forensics Secur. 14(6), 1455–1470 (2018)
Enck, W., Gilbert, P., Han, S., Tendulkar, V., Chun, B.G., Cox, L.P., Jung, J., McDaniel, P., Sheth, A.N.: Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Trans. Comput. Syst. (TOCS) 32(2), 1–29 (2014)
Hou, S., Saas, A., Chen, L., Ye, Y.: Deep4maldroid: a deep learning framework for android malware detection based on linux kernel system call graphs. In: 2016 IEEE/WIC/ACM International Conference on Web Intelligence Workshops (WIW), pp. 104–111. IEEE (2016)
Hou, S., Saas, A., Ye, Y., Chen, L.: Droiddelver: an android malware detection system using deep belief network based on api call blocks. In: International Conference on Web-Age Information Management, pp. 54–66. Springer (2016)
Hou, S., Ye, Y., Song, Y., Abdulhayoglu, M.: Hindroid: an intelligent android malware detection system based on structured heterogeneous information network. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1507–1515 (2017)
Kadir, A.F.A., Stakhanova, N., Ghorbani, A.A.: Android botnets: What URLs are telling us. In: International Conference on Network and System Security, pp. 78–91. Springer (2015)
Karbab, E.B., Debbabi, M., Derhab, A., Mouheb, D.: Maldozer: automatic framework for android malware detection using deep learning. Digit. Invest. 24, S48–S59 (2018)
Kim, T., Kang, B., Rho, M., Sezer, S., Im, E.G.: A multimodal deep learning method for android malware detection using various features. IEEE Trans. Inf. Forensics Secur. 14(3), 773–788 (2018)
Kumar, L., Hota, C., Mahindru, A., Neti, L.B.M.: Android malware prediction using extreme learning machine with different kernel functions. In: Proceedings of the Asian Internet Engineering Conference, pp. 33–40 (2019)
Kumar, L., Rath, S.K.: Hybrid functional link artificial neural network approach for predicting maintainability of object-oriented software. J. Syst. Softw. 121, 170–190 (2016)
Mahindru, A., Sangal, A.: Deepdroid: feature selection approach to detect android malware using deep learning. In: 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS), pp. 16–19. IEEE (2019)
Mahindru, A., Sangal, A.: Dldroid: feature selection based malware detection framework for android apps developed during covid-19. Int. J. Emerg. Technol. 11(3), 516–525 (2020)
Mahindru, A., Sangal, A.: Feature-based semi-supervised learning to detect malware from android. In: Automated Software Engineering: A Deep Learning-Based Approach, pp. 93–118. Springer (2020)
Mahindru, A., Sangal, A.: Gadroid: a framework for malware detection from android by using genetic algorithm as feature selection approach. Int. J. Adv. Sci. Technol. 29(5), 5532–5543 (2020)
Mahindru, A., Sangal, A.: Mldroid-framework for android malware detection using machine learning techniques. Neural Comput. Appl., 1–58 (2020)
Mahindru, A., Sangal, A.: Parudroid: validation of android malware detection dataset. J. Cybersecur. Inform. Manag. 3(2), 42–52 (2020)
Mahindru, A., Sangal, A.: Perbdroid: effective malware detection model developed using machine learning classification techniques. In: A Journey Towards Bio-Inspired Techniques in Software Engineering, pp. 103–139. Springer (2020)
Mahindru, A., Sangal, A.: Semidroid: a behavioral malware detector based on unsupervised machine learning techniques using feature selection approaches. Int. J. Mach. Learn. Cybernet., 1–43 (2020)
Mahindru, A., Sangal, A.: Somdroid: android malware detection by artificial neural network trained using unsupervised learning. Evol. Intell., 1–31 (2020)
Mahindru, A., Sangal, A.: Fsdroid:-a feature selection technique to detect malware from android using machine learning techniques. Multimedia Tools Appl., 1–53 (2021)
Mahindru, A., Sangal, A.: Hybridroid: an empirical analysis on effective malware detection model developed using ensemble methods. J. Supercomput., 1–43 (2021)
Mahindru, A., Singh, P.: Dynamic permissions based android malware detection using machine learning techniques. In: Proceedings of the 10th Innovations in Software Engineering Conference, pp. 202–210 (2017)
Martín, A., Menéndez, H.D., Camacho, D.: Mocdroid: multi-objective evolutionary classifier for android malware detection. Soft Comput. 21(24), 7405–7415 (2017)
Portokalidis, G., Homburg, P., Anagnostakis, K., Bos, H.: Paranoid android: versatile protection for smartphones. In: Proceedings of the 26th Annual Computer Security Applications Conference, pp. 347–356 (2010)
Shen, F., Del Vecchio, J., Mohaisen, A., Ko, S.Y., Ziarek, L.: Android malware detection using complex-flows. IEEE Trans. Mob. Comput. 18(6), 1231–1245 (2018)
Tam, K., Khan, S.J., Fattori, A., Cavallaro, L.: Copperdroid: Automatic reconstruction of android malware behaviors. In: NDSS (2015)
Tong, F., Yan, Z.: A hybrid approach of mobile malware detection in android. J. Parallel Distrib. Comput. 103, 22–31 (2017)
Wang, W., Zhao, M., Wang, J.: Effective android malware detection with a hybrid model based on deep autoencoder and convolutional neural network. J. Ambient Intell. Hum. Comput. 10(8), 3035–3043 (2019)
Xiao, X., Zhang, S., Mercaldo, F., Hu, G., Sangaiah, A.K.: Android malware detection based on system call sequences and LSTM. Multimedia Tools Appl. 78(4), 3979–3999 (2019)
Xu, R., Saïdi, H., Anderson, R.: Aurasium: Practical policy enforcement for android applications. In: Presented as Part of the 21st USENIX Security Symposium (USENIX Security 12), pp. 539–552 (2012)
Yerima, S.Y., Sezer, S.: Droidfusion: a novel multilevel classifier fusion approach for android malware detection. IEEE Trans. Cybernet. 49(2), 453–466 (2018)
Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: 2012 IEEE Symposium on Security and Privacy, pp. 95–109. IEEE (2012)
Zhu, H.J., You, Z.H., Zhu, Z.X., Shi, W.L., Chen, X., Cheng, L.: Droiddet: effective and robust detection of android malware using static analysis along with rotation forest model. Neurocomputing 272, 638–646 (2018)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Mahindru, A. (2023). ANNDroid: A Framework for Android Malware Detection Using Feature Selection Techniques and Machine Learning Algorithms. In: Singh, J., Das, D., Kumar, L., Krishna, A. (eds) Mobile Application Development: Practice and Experience. Studies in Systems, Decision and Control, vol 452. Springer, Singapore. https://doi.org/10.1007/978-981-19-6893-8_5
DOI: https://doi.org/10.1007/978-981-19-6893-8_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6892-1
Online ISBN: 978-981-19-6893-8
eBook Packages: Engineering, Engineering (R0)