Abstract
A malware phylogeny model is an estimation of the derivation relationships between a set of malware samples. Systems that construct phylogeny models are expected to be useful for malware analysts. While several such systems have been proposed, little is known about the consistency of their results on different data sets or about their generalizability across different types of malware evolution. This paper explores these issues using two artificial malware history generators: systems that simulate malware evolution according to different evolution models. A quantitative study was conducted using two phylogeny model construction systems and multiple samples of artificial evolution. High variability was found in the quality of their results on different data sets, and the systems were shown to be sensitive to the characteristics of evolution in the data sets. The results call into question the adequacy of evaluations typical in the field, raise pragmatic concerns about tool choice for malware analysts, and underscore the important role that model-based simulation is expected to play in evaluating and selecting suitable malware phylogeny construction systems.
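The evaluation idea summarized above (simulate a history whose true derivation relationships are known, run a construction system on the resulting samples, and score the output against the known history) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: samples are modelled as sets of abstract integer "features", the generator `generate_history` picks parents uniformly at random, the reconstruction `reconstruct` is a naive nearest-earlier-sample heuristic that is given the creation order, and `edge_accuracy` is a simple parent-edge match rate rather than any metric used in the study.

```python
import random

def generate_history(n_samples, n_mutations=3, seed=0):
    """Simulate a toy derivation history (hypothetical evolution model).

    Returns (samples, true_parent), where samples[i] is a set of abstract
    features and true_parent[i] is the index the sample was derived from.
    """
    rng = random.Random(seed)
    samples = [set(range(20))]          # root sample with 20 features
    true_parent = {0: None}
    next_feature = 20
    for child in range(1, n_samples):
        parent = rng.randrange(child)   # derive from any earlier sample
        features = set(samples[parent])
        for _ in range(n_mutations):
            # Each mutation either drops an existing feature or adds a new one.
            if features and rng.random() < 0.5:
                features.remove(rng.choice(sorted(features)))
            else:
                features.add(next_feature)
                next_feature += 1
        samples.append(features)
        true_parent[child] = parent
    return samples, true_parent

def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def reconstruct(samples):
    """Naive stand-in for a phylogeny construction system.

    Assumes the creation order is known and guesses that each sample
    derives from the most similar earlier sample.
    """
    guess = {0: None}
    for child in range(1, len(samples)):
        guess[child] = max(
            range(child),
            key=lambda p: jaccard(samples[child], samples[p]),
        )
    return guess

def edge_accuracy(true_parent, guess):
    """Fraction of true parent edges that the guess recovers."""
    children = [c for c in true_parent if true_parent[c] is not None]
    if not children:
        return 1.0
    return sum(guess[c] == true_parent[c] for c in children) / len(children)

if __name__ == "__main__":
    # Score the same reconstruction method on several simulated histories.
    for seed in range(5):
        samples, truth = generate_history(30, seed=seed)
        print(seed, round(edge_accuracy(truth, reconstruct(samples)), 2))
```

Running the loop over several seeds, or varying `n_mutations`, gives a rough feel for how the score of a single method can change from one simulated data set to another, which is the kind of sensitivity the study examines with far more realistic generators and systems.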
Additional information
M. Hayes is presently at Case Western Reserve University.
Cite this article
Hayes, M., Walenstein, A. & Lakhotia, A. Evaluation of malware phylogeny modelling systems using automated variant generation. J Comput Virol 5, 335–343 (2009). https://doi.org/10.1007/s11416-008-0100-6
DOI: https://doi.org/10.1007/s11416-008-0100-6