
1 Introduction

Analyzing and characterizing program behavior are important and challenging problems. In general, a new version of a program is created by adding functionality to the original, vanilla version, while the core behavior of the program remains the same as in that vanilla version. Analyzing this core behavior has extensive applications in program classification scenarios, such as malware analysis and signature generation. In malware, changes are made precisely to hide the program from anti-virus signature databases. It is therefore important to develop approaches that can characterize the core behavior of programs that have evolved from an original version.

Although program behavior can be learned by executing the program, or by emulating it in a sandbox and collecting traces, this is an expensive approach and there is no guarantee that the entire program behavior is captured within a given sample execution. We focus on static analysis-based techniques, as they are lightweight and fast compared with run-time approaches such as sandboxing and dynamic program tracing. In the past, system calls have been used for modeling program behavior [15] and for detecting malware. Other approaches, such as [6], used the frequency distribution of critical API calls to detect the presence of malware behavior in a program. However, these approaches have high false positive rates, as frequency distributions can be common across programs that are dissimilar in behavior.

To extract behavioral aspects, API call sequences have been shown to be quite useful [730]. For example, a particular calling sequence is typically constant across all programs evolved from the same base program. However, extracting such a sequence can be cumbersome, as a forward analysis on the control-flow graph can yield an exponential number of such sequences. Furthermore, even if the program analyst extracts all such sequences, it is non-trivial to identify which of them characterize the core program behavior.

To address this problem, we use the notion of critical API calls, the most frequently occurring API calls in the original version of the software. We focus only on the sequences generated by the critical API calls, and thereby reduce the computational effort for the program analyst. Given this insight, our approach uses a backward tracing algorithm that, for each critical API call, extracts the sequence of instructions which have directly or indirectly modified the call's arguments. For a given program, we take two versions from the same class and generate all the backward traces. Next, we use a statistical profiling technique that compares the lengths of the sequences and stores the result of the comparison. This serves as the signature for the class of programs, and a statistical threshold is used to check whether any other program exhibits similar behavior. We have tested our technique on two Windows executables, notepad.exe and mstsc.exe, by introducing obfuscations and adding additional call sequences.

In this paper, we present a static analysis approach which extracts the behaviors of a program as ordered sequences of critical API calls. Our contributions are as follows:

  • Capturing Program Behavior using Backward Traces. For each critical API call in the program binary, a backward trace is the sequence of all other critical API calls that affect the arguments of the concerned API call, resulting in an accurate model of program behavior. We describe an algorithm that performs this tracing.

  • Program Similarity Testing. We have experimented by obfuscating two normal Windows application programs, adding redundant functionality, and determining their similarity to the original programs. Our preliminary experiments show that a vanilla version has to be obfuscated by introducing at least 20 % new functionality to evade detection. These results show that our approach can effectively classify programs based on their behavior.

The paper is organized as follows. In Sect. 2, we review related work on program behavior analysis and malware detection. In Sect. 3, we describe our backward tracing approach for behavioral tracing. In Sect. 4, we describe a prototype implementation of our approach, present experimental results, and evaluate its effectiveness. Finally, in Sect. 5, we conclude and discuss future work.

2 Related Work

Much of the related work in this domain comes from the area of malware detection; we cover some important results. Cohen [13] and Chess and White [14] use sandboxing to detect viruses and proved that, in general, the problem of virus detection is undecidable. Our work is closely related to that of Bergeron et al. [1], who used slicing on the assembly code of a program to extract code fragments that are critical from a security standpoint. In [8], the authors use static analysis to generate a critical API graph, which they compare against a security policy to detect the presence of malicious code in a program. In [10], Christodorescu and Jha use static analysis to detect malicious code in executables. Their implementation, called SAFE, handles simple obfuscations used by malware writers to evade detection, such as the insertion of NOPs between instructions. Bergeron et al. [8] consider critical API calls and security policies to test for the presence of malicious code; their approach does not work for obfuscated malicious executables. Bilar [15] uses statistical structures, such as opcode frequency distributions and graph structure fingerprints, to detect malicious programs.

In [16], Christodorescu et al. exploit dynamically captured semantic heuristics to detect obfuscated malware. DOME [17] uses static analysis to identify system call locations and run-time monitoring to check that all system calls are made from a location identified during static analysis. Min Sun et al. [3] use dynamic monitoring to detect worms and other exploits. Sai et al. [6] use the frequency distribution of API calls in a program to differentiate between malware and benign programs.

A more closely related work is [18], where the authors attempt to capture behavioral similarities across variants of the same malware family. They use execution traces of a malware sample to create a malspec (malicious specification) for the sample, which is then able to detect other variants of that sample. It is a dynamic analysis technique and uses subgraph matching to compare two malspecs, which increases detection time considerably. Similar approaches have recently been proposed in [5, 7, 12]. In [11], the authors use data flow graphs to quantify the behavior of malware and thereby detect it effectively. However, their approach is specific to malware detection and cannot be applied to program similarity estimation.

3 Our Approach

In this section, we first outline our approach to behavioral tracing. We call the behavioral trace of a program its signature. Next, we describe in detail the program behavior model used for signature generation and the statistical comparison technique. Our model generates a signature for the vanilla (earliest) version of the program and calculates the similarity of this signature with future versions of the program.

The behavior of a program can be specified by the API calls that the program and its versions use. For instance, a virus searching for executable files will typically make use of API calls such as FindFirstFileA, FindNextFileA, and FindClose in KERNEL32.DLL; the behavior of searching for files is captured by the use of these API calls. Rather than considering all API calls, we consider only critical API calls [8, 18]. The critical API calls are the calls essential for the core functionality of the program and will be required across all future versions. We extract the sequences of critical API calls from the program and represent them compactly as signatures. Next, we give details of each step of our approach.
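As an illustration only (the selection rule beyond "most frequently occurring", the call_sites input, and the top_k parameter below are assumptions made for this sketch, not part of our implementation), selecting critical API calls by frequency could look as follows:

```python
from collections import Counter

# Hypothetical input: API call sites recovered from the disassembly,
# e.g. [("FindFirstFileA", 0x401000), ("FindNextFileA", 0x401040), ...]
def critical_api_calls(call_sites, top_k=10):
    """Return the top_k most frequently occurring API names as the critical set."""
    counts = Counter(name for name, _addr in call_sites)
    return [name for name, _freq in counts.most_common(top_k)]

calls = [("FindFirstFileA", 0x401000), ("FindNextFileA", 0x401040),
         ("FindNextFileA", 0x401080), ("FindClose", 0x4010c0)]
print(critical_api_calls(calls, top_k=2))  # ['FindNextFileA', 'FindFirstFileA']
```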

3.1 Disassembling the Program and Converting into IR

We use the IDA Pro disassembler to disassemble the program binary. Next, we convert the disassembled instructions into an intermediate representation (IR) in the form of C-like statements. If these steps were carried out on a parallel processor, multiple programs could be processed in parallel and made ready for analysis, so the overhead of these steps is negligible. In addition, these steps can be performed offline on a cloud cluster or a high-end server.
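The exact form of the C-like IR is not prescribed here, so the following is only a hypothetical sketch of lifting a couple of x86 instructions into simple dest/op/args statements (IRStatement and lift are invented names, not our actual lifter):

```python
from dataclasses import dataclass

@dataclass
class IRStatement:
    """A C-like intermediate statement of the form dest = op(args)."""
    dest: str
    op: str
    args: tuple

def lift(disasm_line):
    """Toy lifter for two x86 forms; a real lifter covers the full instruction set."""
    mnemonic, operands = disasm_line.split(maxsplit=1)
    ops = [o.strip() for o in operands.split(",")]
    if mnemonic == "mov":   # mov eax, ebx  ->  eax = ebx
        return IRStatement(ops[0], "assign", (ops[1],))
    if mnemonic == "add":   # add eax, 4    ->  eax = eax + 4
        return IRStatement(ops[0], "+", (ops[0], ops[1]))
    raise NotImplementedError(mnemonic)

print(lift("mov eax, ebx"))  # IRStatement(dest='eax', op='assign', args=('ebx',))
```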

3.2 Extracting the Behavior

Each program behavior can be specified by an ordered sequence of API calls, so the signature of a program is based on its sequences of critical API calls. Let the set \(P=(a_1,a_2,\ldots ,a_n)\) be a profile created from a program by extracting its critical API calls, where \(a_i\) represents the ith critical API call and n is the total number of critical API calls.

To get the sequence of statements for each critical API call, we compute the backward trace of the program with respect to the set P. A trace is a subset of the program that contains only those statements which evaluate the arguments passed to the tracing criterion; in our case, the tracing criterion is the set of critical API calls P. We need not generate a trace for every critical API call in P: if the trace generated for criterion \(a_i\) contains some API call \(a_j \in P\), then we need not trace \(a_j\), since the trace for \(a_i\) already contains all the statements that the trace for \(a_j\) would contain. Therefore, the number of traces, say m, is at most n. The algorithm for generating the behavior traces is shown in Algorithm 1.

(Algorithm 1: Generating the backward behavior traces)
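As a minimal sketch of the idea behind Algorithm 1 (assuming a simplified, linear list of IR statements in which each statement records the variables it defines and uses; this representation is an assumption for illustration, not our IR), a backward trace can be computed as follows:

```python
def backward_trace(statements, call_index):
    """Collect the statements that directly or indirectly define the arguments
    of the critical API call at position call_index.

    statements: list of dicts with keys "defs" (variables written),
                "uses" (variables read) and "api" (API name or None).
    """
    wanted = set(statements[call_index]["uses"])   # start from the call's arguments
    trace = [call_index]
    for i in range(call_index - 1, -1, -1):        # walk backwards over the program
        stmt = statements[i]
        if stmt["defs"] & wanted:                  # defines something we still need
            trace.append(i)
            wanted |= stmt["uses"]                 # its inputs become needed as well
    return list(reversed(trace))

# Tiny example: "buf" flows into the CreateFileA argument through an assignment.
stmts = [
    {"defs": {"buf"},  "uses": set(),    "api": None},           # 0: buf = ...
    {"defs": {"tmp"},  "uses": set(),    "api": None},           # 1: unrelated
    {"defs": {"path"}, "uses": {"buf"},  "api": None},           # 2: path = buf
    {"defs": set(),    "uses": {"path"}, "api": "CreateFileA"},  # 3: CreateFileA(path)
]
print(backward_trace(stmts, 3))  # [0, 2, 3]
```

A trace whose statements already contain another critical call \(a_j\) subsumes the trace for \(a_j\), which is why the number of traces m stays at or below n.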

3.3 Computing the Longest Common Subsequence (LCS)

In this section, we compute the signature of a program by computing the LCS over the backward traces generated in the previous section. The signature is generated by computing the LCS of corresponding traces from two versions of the program. For example, to create a signature for the MyDoom family, we consider MyDoom.a and MyDoom.aa as training samples. The algorithm for generating the signature is shown in Algorithm 2. While we have used malware samples here, the approach applies, without loss of generality, to any program with several versions.

(Algorithm 2: Signature generation using the LCS of backward traces)
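As a companion sketch to Algorithm 2 (the traces shown are made up for illustration and this is not our actual implementation), a standard dynamic-programming LCS over two API-call traces looks as follows:

```python
def lcs(seq_a, seq_b):
    """Longest common subsequence of two API-call traces (dynamic programming)."""
    m, n = len(seq_a), len(seq_b)
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + [seq_a[i - 1]]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], key=len)
    return dp[m][n]

# Example: the trace of one critical API call from two versions of a program.
t1 = ["CreateFileA", "WriteFile", "CloseHandle"]
t2 = ["CreateFileA", "GetLastError", "WriteFile", "CloseHandle"]
print(lcs(t1, t2))  # ['CreateFileA', 'WriteFile', 'CloseHandle']
```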

3.4 Classification Strategy

We use a statistical profiling technique to differentiate between programs. We measure the difference between the proportions of the critical API calls in a signature and those of a test program using the chi-square test [19], a statistical significance test that measures the difference between proportions in two independent samples. The signature \(SIG_i\) for a program \(M_i\) specifies the LCS of critical API calls that a version of the program belonging to \(M_i\) is expected to have. To test the relationship of a given test file to \(M_i\), its API calls are extracted and compared to those in the signature. The chi-square statistic is then computed as:

$$\begin{aligned} \chi _i^2 = \frac{( O_i - E_i )^2}{E_i}, \qquad 1 \le i \le n. \end{aligned}$$

Here, \(O_i\) is the observed frequency (LCS length) of the ith critical API call in the test file and \(E_i\) is its expected frequency, i.e., its frequency in the signature of the program. Each \(\chi _i^2\) is compared against a threshold value \(\epsilon \) from a standard chi-square distribution table with one degree of freedom. The degrees of freedom correspond to the number of parameters that can vary in the statistical model. A significance level of 0.05 was selected, which means that 95 % of the time we expect \(\chi _i^2\) to be less than or equal to \(\epsilon \). For one degree of freedom and significance level 0.05, \(\epsilon = 3.84\). Let \(U = \lbrace API_i \ | \ \chi _i^2 \le \epsilon \rbrace \). We define a degree of membership \(\lambda \) as

$$\begin{aligned} \lambda = \frac{|U|}{n}. \end{aligned}$$

The degree of membership \(\lambda \) is a measure of the similarity of a test file to the given program. The statistical profiling algorithm is shown in Algorithm 3.

(Algorithm 3: Statistical profiling and classification)
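To make the classification step concrete, the sketch below combines the chi-square test and the degree-of-membership computation from this section; the dictionary-based signature and test-profile representation, and the example values, are assumptions made only for illustration:

```python
EPSILON = 3.84  # chi-square threshold: 1 degree of freedom, significance level 0.05

def degree_of_membership(signature, test_profile):
    """Return lambda = |U| / n, the fraction of critical API calls whose
    chi-square statistic stays within the threshold.

    signature   : {api_name: expected LCS length E_i} taken from SIG_i
    test_profile: {api_name: observed LCS length O_i} taken from the test file
    """
    n = len(signature)
    matched = 0
    for api, expected in signature.items():
        observed = test_profile.get(api, 0)
        chi_sq = (observed - expected) ** 2 / expected
        if chi_sq <= EPSILON:
            matched += 1
    return matched / n

sig = {"CreateFileA": 5, "WriteFile": 4, "RegSetValueExA": 3}
test = {"CreateFileA": 5, "WriteFile": 10, "RegSetValueExA": 2}
print(round(degree_of_membership(sig, test), 2))  # 0.67: two of three calls match
```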

4 Experimental Analysis

To test the resilience of our approach against program obfuscations, we tested it against the following known obfuscation techniques:

  • Code Transposition: Code is shuffled and unconditional jumps are inserted to restore the original order of execution of the program.

  • Instruction Substitution: Many instruction sequences can be rewritten with similar semantics. A number of such transformations were identified and applied.

  • Register Reassignment: The usage of registers is interchanged; for example, EAX is used as EBX and vice versa.

  • Dead Code Insertion: Code is added to the program without modifying its behavior. To challenge our technique, we added code consisting of critical API calls that results in no change of behavior.

For each program sample, a random number of program transformations was applied using the obfuscator, and a set of 1000 obfuscated binaries was generated per sample program. The signature was generated using only the sample program binary from which the 1000 obfuscated binaries were derived. All experiments were conducted on an Intel Celeron 1.6 GHz machine with 1 GB RAM.
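As an illustrative sketch only (not the actual obfuscator used in these experiments; DEAD_SNIPPETS and the instruction strings are invented for the example), a toy dead-code-insertion pass over a list of assembly-like instructions might look as follows:

```python
import random

# Behavior-preserving filler fragments (assembly-like strings, invented for the example).
DEAD_SNIPPETS = [
    ["push eax", "pop eax"],   # save and immediately restore a register
    ["xchg eax, eax"],         # the canonical x86 NOP encoding
    ["lea ebx, [ebx]"],        # loads a register with its own value
]

def insert_dead_code(instructions, probability=0.2, seed=None):
    """Insert dead-code filler after instructions with the given probability."""
    rng = random.Random(seed)
    out = []
    for ins in instructions:
        out.append(ins)
        if rng.random() < probability:
            out.extend(rng.choice(DEAD_SNIPPETS))
    return out

prog = ["mov eax, [ebp+8]", "call CreateFileA", "mov [ebp-4], eax"]
print(insert_dead_code(prog, probability=0.5, seed=1))
```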

4.1 Synthetic Program Similarity Test

The experiment was carried out on 245 samples of malicious programs using the synthetic obfuscations described earlier. The observations made during the experiment were:

  • Our approach was resilient to control-flow transpositions. They had no effect on the critical API call sequences generated using backward tracing.

  • Our approach was resilient to instruction substitution. One of the steps in our approach converts the disassembly into a C-like intermediate representation, in which such substitutions are recognized and replaced by a canonical code sequence that is kept the same throughout the program.

  • Our approach was resilient to register reassignment and dead code insertion. Neither of these techniques changes the data dependencies, so our backward tracing results were unchanged.

4.2 Real-Life Program Similarity Test

In this experiment, semantically obfuscated binaries of a sample program were tested against the signature of the original program from which they were generated. The experiment was carried out on two programs, notepad.exe and mstsc.exe. The API call sequences of each benign program were generated using backward tracing, and some dummy noise API calls were then incrementally introduced into these sequences. The following observations were made.

Observation 1. Beyond some point, adding noise to the API call sequences caused the program to no longer be detected as a variant of the original program. For the tested programs, notepad.exe and mstsc.exe, after about a 20 % increase in API call sequence length, our approach could not identify them as similar to the original sequences. One reason for this is the threshold chosen for similarity detection.

Observation 2. Given two programs, our tool was able to provide an estimated percentage of similarity between them based on the similarity of their API call sequences. This metric can be used to cluster similar programs, especially when analyzing a large number of programs.

5 Conclusion and Future Work

In this paper, we presented a novel static analysis approach to characterizing program behavior using behavior tracing. We observed that the calling-sequence relationships among API calls cannot be removed without changing the behavior of the program; thus, calling sequences of API calls capture program behavior to a large extent. To evaluate our approach accurately, we developed an obfuscator that applies random syntactic transformations to generate an obfuscated binary from an original binary.

Our approach generates a signature that is sufficient to characterize any number of future versions of a program. It is able to classify programs on the basis of behavioral similarity and also to provide an approximate measure of the quality of obfuscations. We provided results that show the efficacy of our approach in classifying similar programs. As future work, we intend to use our approach to cluster programs on the basis of behavior.