Abstract
Metamorphic viruses are equipped with a morphing engine that transforms the structure of the code in each subsequent generation while retaining the malicious behavior. Commercial anti-virus software based on the signature approach is therefore unable to identify unknown or zero-day malware. Each metamorphic malware variant has its own unique pattern, since its internal structure changes from generation to generation; detecting these viruses is thus a challenge for computer security researchers. The degree of metamorphism in the dataset is estimated by aligning the locations of common opcodes using the Smith–Waterman sequence alignment method. The alignment suggests that a generic pattern representing the malware or benign class cannot be extracted, demonstrating the failure of the signature-based approach. The proposed statistical, non-signature-based detector creates two different meta feature spaces, each comprising 25 attributes. Three categories of opcode features are extracted from each sample: (a) branch opcodes, (b) unigrams and (c) bigrams. Insignificant features are first eliminated using the Naïve Bayes approach; the resulting feature space is further reduced using two feature reduction techniques: (1) Discriminant Feature Variance-based Approach (DFVA) and (2) Markov Blanket. Learning models are created using the prominent attributes obtained from each dimensionality reduction method. The models that provide the highest accuracy at minimum feature length are retained, and unseen instances are classified using these optimal models. Two meta feature spaces are then generated by ensembling the prominent branch, unigram and bigram opcodes obtained from DFVA and Markov Blanket. Both feature reduction techniques were found to be equally efficient in detecting the metamorphic malware samples.
The proposed system detected Metamorphic Worm and Next Generation Virus Construction Kit viruses with 100% accuracy, achieving a precision, recall and F1-score of 1.0. The results demonstrate the efficiency of the proposed metamorphic malware detector, and we therefore recommend this approach to assist commercial AV scanners.
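As a rough illustration of two steps named in the abstract, the sketch below counts opcode unigrams and bigrams and computes a Smith–Waterman local alignment score between two opcode sequences. It is a minimal sketch, not the authors' implementation: the function names, the sample opcode lists, and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`, linear gap penalty) are all assumptions chosen for illustration, since the abstract does not state them.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count contiguous opcode n-grams (n=1 for unigrams, n=2 for bigrams)."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between opcode sequences a and b.

    The scoring values are illustrative assumptions, not taken from the paper.
    Only the previous DP row is kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical opcode traces of two generations of one sample;
# the second has a junk "nop" inserted by a morphing engine.
gen1 = ["push", "mov", "sub"]
gen2 = ["push", "nop", "mov", "sub"]

unigrams = opcode_ngrams(gen2, 1)
bigrams = opcode_ngrams(gen2, 2)
score = smith_waterman_score(gen1, gen2)
print(unigrams, bigrams, score)
```

A low alignment score between generations relative to sequence length would indicate a high degree of metamorphism, which is the situation in which a fixed signature fails and frequency-based features such as the n-gram counts become the more useful representation.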
Cite this article
Raphel, J., Vinod, P. Heterogeneous Opcode Space for Metamorphic Malware Detection. Arab J Sci Eng 42, 537–558 (2017). https://doi.org/10.1007/s13369-016-2264-6
DOI: https://doi.org/10.1007/s13369-016-2264-6