Abstract
Malware detection is a major challenge in software security today. Existing work on malware detection relies on static analysis features such as function length frequency, printable string information, byte sequences, and API calls. Other work applies dynamic analysis, using features such as function call arguments, return values, and dynamic API call sequences. In this work, we applied a reverse engineering process to extract static and behavioral features from malware, based on the assumption that the behavior of a malware sample can be revealed by executing it and observing its effects on the operating environment. We ran each executable in an isolated environment and captured all of its activity, including registry activity, file system activity, network activity, API calls made, and DLLs accessed. Using the features extracted by reverse engineering together with static analysis features, we prepared two datasets and applied data mining algorithms to generate classification rules. Essential features are identified by applying Weka’s J48 decision tree classifier to 1103 software samples (582 malware and 521 benign) collected from the Internet. The performance of all classifiers is evaluated by 5-fold cross-validation with 80-20 training/test splits. Experimental results show that the Naïve Bayes classifier performs better on the smaller dataset with 15 reverse-engineered features, while J48 performs better on the dataset derived from the API call data, which has 141 features. In addition, we applied BLEM2, a rough set based tool, to generate rules and evaluate the identification of reverse-engineered features, in contrast to decision trees. Preliminary results indicate that BLEM2 rules may provide interesting insights for identifying essential features.
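The abstract describes evaluating classifiers such as Naïve Bayes by 5-fold cross-validation over features captured from sandboxed runs. The following is a minimal, from-scratch sketch of that evaluation loop, assuming each sample is a vector of 0/1 flags recording whether a given registry, file system, network, or DLL behavior was observed. The feature names, the toy data, and the helpers `train_nb`, `predict_nb`, and `k_fold_accuracy` are hypothetical illustrations. The classifier is a Bernoulli Naïve Bayes with add-one smoothing, not Weka's NaiveBayes and not the authors' pipeline.

```python
import math
import random

# Hypothetical binary behavioral features (1 = observed during sandboxed execution).
FEATURES = ["writes_run_key", "creates_exe_in_system32", "opens_network_socket", "loads_wininet_dll"]


def train_nb(rows, labels):
    """Fit a Bernoulli Naive Bayes model with Laplace (add-one) smoothing."""
    classes = sorted(set(labels))
    n_feat = len(rows[0])
    priors, cond = {}, {}
    for c in classes:
        members = [r for r, y in zip(rows, labels) if y == c]
        priors[c] = len(members) / len(rows)
        # P(feature_j = 1 | class c); smoothing keeps every estimate strictly inside (0, 1).
        cond[c] = [(sum(r[j] for r in members) + 1) / (len(members) + 2) for j in range(n_feat)]
    return priors, cond


def predict_nb(model, row):
    """Return the class with the highest log-posterior for one binary feature vector."""
    priors, cond = model
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for x, p in zip(row, cond[c]):
            score += math.log(p if x else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def k_fold_accuracy(rows, labels, k=5, seed=0):
    """Shuffle once, split into k folds, and hold each fold out once as the test set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = train_nb([rows[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_nb(model, rows[i]) == labels[i] for i in test)
    return correct / len(rows)


# Toy, made-up samples: "malware" rows show each behavior with probability 0.8,
# "benign" rows with probability 0.2. These are NOT the paper's 1103 samples.
rng = random.Random(42)
rows, labels = [], []
for i in range(50):
    is_mal = i % 2 == 0
    p = 0.8 if is_mal else 0.2
    rows.append([int(rng.random() < p) for _ in FEATURES])
    labels.append("malware" if is_mal else "benign")

print(f"5-fold accuracy on toy data: {k_fold_accuracy(rows, labels):.2f}")
```

With `k=5`, every sample is held out exactly once and each test fold holds about 20% of the data, which corresponds to the 80-20 splits mentioned above. The accuracy printed for the toy data is illustrative only and says nothing about the paper's reported results.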
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ravula, R.R., Liszka, K.J., Chan, CC. (2013). Learning Attack Features from Static and Dynamic Analysis of Malware. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_7
DOI: https://doi.org/10.1007/978-3-642-37186-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)