1 Introduction

Buffer overflow attacks rely on exploiting illegal memory accesses by referencing a buffer outside its legal array bounds. These attacks are mostly caused by bugs in software written in low-level memory unsafe languages, like C or C++ [37]. Such memory errors present an old security issue that persists in spite of advanced exploit mitigation mechanisms and can lead to silent data corruption, security vulnerabilities, and program crashes. In spite of solutions proposed through techniques at the programmer/source-level [16, 24], compiler-level [2, 4, 7, 9, 14, 27, 29], and binary-level [33, 36, 38], the problem of memory safety persists especially in embedded, low-level, performance critical, and legacy software systems.

Techniques to detect memory errors require the ability to determine accurate buffer bounds along with the data type referenced (called the owner in this work) by each memory access (read/write) and pointer assignment/move instruction. This information is largely available to the source-code and compiler-level techniques, and enables more precise memory error detection at run-time. Unfortunately, techniques at this level require access to the source code and may not be applicable to legacy software where source code may not be available. Such techniques also involve reprogramming and/or re-compiling the code. The single binary executable generated/deployed using these techniques cannot be easily adapted to different risk averseness and performance overhead tolerances of end-users. Such approaches also leaves the task of memory safety solely in the hands of the software developer (rather than the end-user).

Binary-level techniques can overcome these challenges of source-level approaches. However, much of the program syntax and semantic information needed by techniques to resolve memory errors may be lost during the compilation process, especially when the generated binary is stripped of debug symbols. To overcome this limitation for binary-level techniques, researchers have developed advanced reverse engineering (RE) frameworks with sophisticated disassemblers, decompilers, and binary type and symbol inference algorithms that attempt to reconstruct information lost during the source to binary translation process.

In this work we study how much of the array bounds and instruction owner information is preserved by the compilation process (for binaries generated with debug information and those stripped of debug symbols) and can be retrieved by traditional disassemblers provided with contemporary RE tools. We also conduct the first detailed study on how accurately can this information that is needed to detect/prevent memory errors be reconstructed by the advanced decompilers and type inference algorithms provided with modern RE frameworks for stripped binaries. Our work explores the capabilities of two state-of-the-art RE tools, specifically NSA’s Ghidra [28] and Hex-Ray’s IDA Pro [1], and assesses the accuracy of the information they derive from program binaries.

Imprecision in array bounds detection and instruction owner information obtained by static RE tools can affect the ability to detect and prevent buffer overflows at run-time. In this work, we design and build a new binary-level run-time tool to evaluate, for the first time, the effectiveness of the program information gathered by the RE frameworks (in different configurations) to detect and prevent memory errors. The tool uses the obtained static analysis information to keep track of owners as pointers are assigned, and check relevant buffer reads/writes to assess the ability to ensure fine-grained memory safety at run-time.

Thus, we claim two major contributions in this work.

  1. 1.

    We conduct the first detailed study to determine the ability of static RE tools, specifically Ghidra and IDA Pro, to derive precise array bounds and instruction owner information from binary programs, which is required to detect and prevent memory errors.

  2. 2.

    We design, build and employ a new decoupled binary-level execution-time tool with the goal to assess the efficacy of the statically derived program information to provide memory safety for binary programs.

2 Related Works

In this section we compare our work with studies that evaluate the capabilities and precision of reverse engineering frameworks to reconstruct program information lost during the translation process. We also discuss past research in binary-level techniques to detect and prevent memory errors.

Several prior research works have evaluated the accuracy of binary code disassemblers and decompilers. Meng and Miller identify challenging code constructs that make it hard for RE tools to accurately disassemble binary code and construct a correct control flow graph (CFG) [25]. Andriesse et al. compare 9 popular disassemblers and find that complex code constructs are rare in real-world programs [3]. Inaccuracy in function start/boundary detection by current RE tools was reported by some works [3, 5]. Pang et al. analyze 9 open-source disassemblers to compare the algorithms and heuristics used for instruction recovery, symbolization, function detection and CFG construction and assess their precision [31]. They find that different tools use distinct algorithms and heuristics that complement each other, but also introduce coverage-correctness trade-offs. Another study explores the usability and effectiveness of decompilers to recover C output from binary code [21]. They find that while modern decompilers are getting increasingly powerful and accurate, issues such as type recovery and optimization still impede decompilers from generating accurate and presentable outputs. None of these works assess the efficacy of array bounds and instruction owner detection in RE tools.

A plethora of research has been conducted on type inference from program binaries [6, 13, 17, 19, 20, 23, 30, 35, 39]. Most of these research efforts are focused on prediction of basic or preliminary type information. Although some of these approaches claim to be able to detect higher order structures or aggregate types likes arrays, none of the approaches we know assess the accuracy of array bound detection, or evaluate the precision of instruction owner detection for binaries.

Past researchers have developed many techniques to detect and prevent memory errors. Many past approaches rely on the source-code with access to rich semantic program information [2, 8, 10, 27, 29, 32, 34].

Binary-level tools to locate fine-grained buffer overflows in memory at run-time have also been developed. The BinArmor technique [36] to detect memory errors relies on a tool called Howard [35] that uses past program execution traces to extract data structures and their memory bounds. BinArmor uses information from Howard to statically instrument the binary with checks to detect unsafe memory accesses during later program executions. Another technique develops a memory layout recovery algorithm to locate memory access vulnerabilities in the program after execution of the failed run [38]. This approach requires traces from a set of correct program executions to recover fine-grained memory layouts of variables. The recovered memory layouts from the passed program executions are then used to determine if the failed run exceeded any valid variable boundaries.

Both these past techniques employ a dynamic approach that relies on traces from multiple correct prior program executions to determine or predict relevant properties about the program, including buffer bounds. All dynamic analysis techniques require representative program inputs and are incomplete by design since they cannot guaranty complete code coverage and can only protect code and buffers that were seen by the analyzed program execution traces. Instead, our work is the first to explore the potential, capabilities and trade offs of using a static analysis and static type inference based approach to resolve this problem. Similar to BinArmor, but unlike the approach by Wang et al., our technique is designed to detect memory errors before they are triggered during program execution. Most importantly, none of these tools are available for use by researchers in the open-source domain and none have attempted to employ these tools to assess the extent and impact of inaccuracies in array bounds and instruction owner detection to locate and prevent buffer overflows at run-time.

3 Benchmarks and Frameworks

In this section we describe the experimental setup, benchmarks, and tools and frameworks used for this study.

Fig. 1.
figure 1

Schematic of experimental framework setup

3.1 Experimental Framework

A schematic of the overall framework is illustrated in Fig. 1. A C/C++ program is compiled with the standard gcc compiler with the “-g” flag to generate a binary with embedded debug symbol information. This binary is used by our ➀ Debug configuration. Later, the Linux command is used to generate another binary executable that is stripped of all symbol information. This binary is used by our ➁ Stripped and ➂ Decomp. configurations.

Our experiments employ two stages: (a) static analysis to assess the ability of our RE tools to derive precise array bounds and instruction owner information from binary programs, followed by (b) dynamic binary instrumentation (DBI) to assess the efficacy of the statically derived program information to provide memory safety for binary programs. We employ Ghidra version 9.1.2 (with Ghidra decompiler) [28] and IDA Pro version 7.5 (with Hex-Rays decompiler) [1] to conduct static analysis in the first stage. We use the PIN (version pin-3.15) [22] dynamic instrumentation engine in the second stage. All experiments are performed on a cluster of x86-64 Intel Xeon processors with the Fedora 28 OS.

The static RE tools we employ work independently of the program input(s). They include a disassembler to convert machine code to assembly code. They also provide a decompiler that employs sophisticated type inference and code reconstruction algorithms to raise the low-level assembly code into a higher-level language representation (commonly, C). Our ➀ Debug and ➁ Stripped configurations only use the disassembler. The ➂ Decomp. configuration also uses the decompiler, which enables this configuration to recover higher-order program structures like arrays and pointers along with their associated sizes and assists with instruction owner detection from the stripped binary. Each configuration in the static analysis phase outputs a distinct interface file with the array bounds and instruction owner information that it can recover from the binary.

The stripped binary program and the statically generated interface file are provided to Pin. Pin adds instrumentation based on previously determined instruction owner mapping and array bounds information, tracks dynamically allocated buffers and relevant register and memory values, and inserts security checks to detect buffer overflows at run-time.

3.2 Benchmarks

In this work we use benchmarks from three different benchmark suites, SARD-89 [18, 26], SARD-88 [26, 40], and SPEC cpu2006 [15]. The SARD-89 suite contains 291 small programs that implement a taxonomy of diverse C buffer overflows (1164 total programs). Each test case has three versions with memory accesses that overflow just outside, moderately outside, and far outside the buffer, respectively. The fourth version for each test case is a patched version without any buffer overflow. 18 of the 291 test subjects in SARD-89 benchmark suite contain overflows that leverage library functions to succeed. Although not a fundamental limitation of our technique or tools, we currently do not analyze library functions, and so leave out these programs. Additionally, 152 test subjects in SARD-89 overflow the buffer with an index that is a constant integer, for example buf[2048]. We discuss these cases in more detail in Sect. 4.2. We use the remaining 121 test programs for all experiments in this work, unless mentioned otherwise.

The SARD-88 benchmark suite contains 14 “real-world” programs from various internet applications (BIND, Sendmail, WU-FTP) with known buffer overflows. Two versions are provided for each test case, one with and the other without a buffer overflow (28 programs in total). We statically link library functions like strcpy, strcmp, that can overflow in some of the SARD-88 programs. We also employ all the SPEC cpu2006 integer benchmarks to study the scalability and efficacy of the static tools on large programs. All benchmarks are compiled using GCC version 9.3.1; optimized benchmarks use -O3.

4 Static Reverse Engineering

Techniques to detect and prevent memory errors need precise information regarding buffer data types, their base address and size/bound, and the data type referenced (owner) by each memory access (read/write) and pointer assignment/move instruction. Much of this information is lost during the compilation process. RE frameworks employ complex algorithms and heuristics to reconstruct lost program information from binaries. We explored the abilities of several RE tools to identify and reconstruct program information that is required to detect and prevent memory-related attacks in binaries, including Angr, Radare/r2, Debin, Ghidra, and IDA Pro. We found that only Ghidra and IDA Pro provide the capability and API for this task. In this section we present our results and observations. To our knowledge, this is the first work that evaluates and reports the efficacy of RE tools to extract or reconstruct the buffer/pointer bound and instruction owner information required to detect/prevent memory errors.

4.1 Setup and Implementation Details

In this section we describe the algorithms and extensions we develop to explore the capabilities of Ghidra and IDA Pro. Our scripts extract information relating to the statically known object bounds (local/global variables) and instruction-owner mappings. We use the term owner for program variables of type array or pointer that constitute the memory operand for the memory access instructions (of the kind MOV for the x86-64). Additionally, we have also extended the tools with block-level data-flow algorithms to track the instructions that propagate the pointer variables from memory to registers before they are used.

Figure 2 illustrates the information we gather from our RE tools. The figure shows the source code, the compiler generated binary code and corresponding IDA Pro output for a simple C program. This program has a single integer buffer, ‘b’, an integer pointer, ‘ptr’, and an integer scalar ‘n’. The variable ‘ptr’ is the “owner” of the assembly instructions at offsets ‘8’, ‘20’ and ‘27’. ‘ptr’ is mapped to the corresponding addresses. The pointer access on line #6 overflows the array ‘b’ - corresponding to assembly instruction at offset ‘27’. Comparably, the direct array access on line #7 overflows the array ‘b’ - corresponding to assembly instruction at offset ‘32’. Memory safety algorithms need to check such accesses to determine the invalid access at run-time.

Fig. 2.
figure 2

Example showing an invalid array access: (a) C source code (b) Assembly output (c) Output text file (interface file) after static analysis by IDA Pro

We found that the owners of direct variable access instructions (that employ \(\{rbp, rsp, rip\}\) based relative addressing, like the instructions at address ‘8’, ‘20’ and ‘32’ in Fig. 2) are determined automatically by the reverse engineering frameworks we study. However, the owners of pointer dereference instructions (for example, the instruction at address ‘27’ in Fig. 2) are not detected automatically by our advanced tools. To analyze such memory accesses, we implement a simple data-flow algorithm that keeps track of the variables and owners as they move between the memory stack and registers.

Figure 2(c) shows the output of our RE scripts after analysing the binary generated using the example program shown in Fig. 2(a). This output file contains function related metadata such as owner-instruction address mapping I (listed under addresses), function variable metadata \(f_v\) - local variables along with their position (offset) on the stack relative to the stack pointer, their size and type (listed under locals), function boundary \((f_s \cup f_e)\), and additional metadata \(f_m\) such as number of functions, stack size, base pointer relative addressing information, etc. This file also contains global variable metadata \(G_v\) - Variables defined in the data or bss sections and associated with their static address; the rest of the metadata is similar to local variables (listed under .global). This output of the static analysis \(G_v\cup \sum f_i\{(f_s \cup f_e), f_v, f_m, I\}\) is fed to the Pin tool.

4.2 Efficacy of Reverse Engineering Tools

In this section we study the efficacy of existing reverse engineering tools to determine buffer bound and instruction owner information for programs compiled by standard compilers with and without debug symbols and compiler optimizations.

Failures Even with Debug Symbol Information. Building a binary with debug symbols retains useful information from the source program regarding the function stack and the global data/bss section layout, variable types, and buffer bounds. However, the owner information is not captured by the debug symbols and may become hard to infer from the static binary. An example of this challenging scenario is encountered for many SARD-89 benchmarks that overflow the buffer with an index that is a constant integer. An example of this case is illustrated in Fig. 3. The left-hand side of the figure shows the source code and the right-hand side shows the corresponding assembly code. This program declares two arrays, ‘b1[5]’ and ‘b2[10]’. The write to ‘b2[15]’ corresponds to assembly instruction at location ‘40112e’ and the read from ‘b1[3]’ corresponds to the assembly instruction at location ‘401135’. In the assembly code these buffer accesses that are indexed by a constant use a displacement relative to the stack frame pointer, rbp, rather than the base array pointer. Thus, although these two instructions reference different buffers (and one, b2[15] is an overflow), if these accesses are within the stack frame, then it is hard for the RE tools to infer or predict from the assembly code if they refer to the array ‘b1’ or ‘b2’ or neither. In such cases, we found that the reverse engineering tools cannot determine the correct instruction owner even in the presence of debug symbols.

Such failures caused due to buffer accesses by a constant numeral may be an intrinsic limitation of binary-level techniques. Fortunately, arrays dereferenced by a constant numeral may be a less critical hazard or attack vector in security threat models, as many real-world buffer-overflow and stack-smashing attacks are triggered by a malicious external input specifically devised to overflow the buffer bound. Lack of high-level program information also prevents our RE tools from associating the correct owner with instructions accessing individual members of a structure. We found that there are 152 test cases in the SARD-89 benchmark suite that our RE tools fail to analyze due to these intrinsic reasons. We leave out these programs from the remaining experiments in this paper.

Fig. 3.
figure 3

Ambiguous array access: (a) C source code (b) Assembly output

Accuracy of Type and Owner Detection for Arrays and Pointers. Figures 4 and 6 (in Appendix A for optimized benchmarks) display the efficacy of array and pointer type detection for programs in the SARD-89, SARD-88, and SPEC suites. Each figure shows three configurations for each of our static RE tools, ➀ Debug, ➁ Stripped and ➂ Decomp. We leverage the pyelftools [12] module to design and build a new tool to extract variable information directly from the “dwarf” [11] section of binariesFootnote 1. The data from this tool is used as a baseline to compare the results obtained in the other RE-based configurations.

Fig. 4.
figure 4

Accuracy of array, pointers, and owner detection for SARD-89, SARD-88, SPEC-cpu2006 benchmarks, generated without compiler optimizations

Figures in the first row (4(a), 4(e), 4(i)) display array bound detection accuracy for corresponding benchmarks. #TP Arrays show the (True Positive, TP) arrays detected at correct offsets regardless of their size/bound, while #FP Arrays show the (False Positive, FP) arrays that are detected at incorrect offsets compared to our baseline. Figures in the second row (4(b), 4(f), 4(j)) display the accuracy of pointer detection. The first set of bars in each of these figures show the number of TP and FP pointers as detected directly by the reverse engineering tools. The set of bars labeled “with Pred.” use a simple pointer prediction algorithm we employed that marks every variable with “undefined type” (or undetermined type) and with a size of 8 bytes as a pointer. Figures in the third row (4(c), 4(g), 4(k)) display the accuracy of mapping the move and memory dereference instructions to array/pointer owners. The known owners are associated with instructions by our analysis algorithm. Static instructions mapped to owners that are scalar variables are ignored. Instructions are assigned unknown owners when relevant owners cannot be predicted. The stripped and decompiler results in these figures are compared against those when debug symbols are available with each respective RE tool. Finally, figures in the last row (4(d), 4(h), 4(l)) plot the accuracy of array bounds detection for the #TP Arrays. The Y-axis in these figures indicates the error magnitude in array bound detection.

Both Ghidra and IDA Pro show very poor efficacy with optimized programs for most of our experiments. One reason is that these RE tools do not consider register allocated variables which are prevalent in optimized benchmarks. This observation suggests a critical area for future work. Given this state with optimized binaries, we focus on unoptimized benchmarks for the remainder of this section. Results for optimized binaries are presented in Fig. 6 in Appendix A.

We make the following interesting observations from the data presented in Fig. 4: (1) Even advanced RE tools, like Ghidra and IDA Pro, can fail to appropriately leverage program symbol information, as seen most prominently by the poor efficacy of the debug-ghidra configuration to accurately detect static program pointers. (2) The debug-ghidra configuration misrepresents pointer types as int_64 or undefined_64 in many cases. (3) With no symbol information available in stripped binaries, disassemblers in our RE tools are unable to detect most/all arrays and pointers. Surprisingly, our simple pointer prediction algorithm is able to correctly detect most true pointers but also produces many false positives. We will explore developing more sophisticated pointer detection algorithms in the future to improve this simple prediction model. (4) Decompiler algorithms in IDA Pro and Ghidra do a commendable job, especially in detecting arrays and array bounds in stripped binaries. Interesting, even small programs seem to be able to provide sufficient context information to enable effective array type detection with these algorithms. (5) We find that the decompilers in Ghidra and IDA Pro are more accurate in inferring arrays and array bounds than inferring pointers. However, decompiler-based type inference algorithms often split arrays or combine them with adjacent arrays/variables resulting in many false positives and at times large inaccuracies in bound detection. (6) We can also make some more specific observations, like in Fig. 4(c) for SARD-89 benchmarks, many instructions associated with a scalar in the stripped-IDA case (which are ignored and are not plotted in the figure) are not assigned any owner (unknown owner) in the stripped-Ghidra case. We will see the implication of this difference in the next section. While instruction owner detection appears to works well for unoptimized benchmarks, it is largely unsuccessful for optimized programs. These observations reveal both the current capabilities of the static RE tools, Ghidra and IDA Pro, and open areas for research to more accurately derive program information that is necessary to detect memory errors at run-time.

5 Run-Time Framework to Detect Memory Errors

In this section we describe the implementation details of our run-time framework that employs the static program information gathered from the RE tools to detect spatial memory errors. We also assess the efficacy of the complete framework to effectively use the program information extracted by the static RE tools to detect memory errors in programs during execution. This approach does not require access to source code or hardware support.

figure b

5.1 Dynamic Tracking and Instrumentation Using Pin

Pin [22] employs information supplied by the our static RE tools in the interface file to detect memory safety violations at run-time. We build scripts, called Pintools, that use the Pin API to insert dynamic checks in the executed code. Algorithm 1 explains our dynamic buffer overflow detection algorithm. Our Pintool will add instrumentation code at run-time for pointer/array memory move/dereference instructions that are mapped with corresponding instruction owners from the interface file. Run-time instrumentation is added for the static instruction categories mentioned below. We employ the example program in Fig. 2 to explain the run-time algorithm and illustrate the instrumentation categories.Footnote 2

  • I. Function Start: The InitializeStack() function in Algorithm 1 will add instrumentation code at each function prologue to mark the locations of local variables w.r.t. the actual value of the stack pointer in memory. Function start (\(f_s\)) address obtained from the interface file (Fig. 2(c) - line #3) determines the instrumentation point. The dynamic array/pointer variable locations and available bounds get stored in a global metadata structure in this phase. Arguments passed to the program are also detected in this phase by adding a special check for function ‘main’.

  • II. Function End/Return: The UnInitializeStack() function in Algorithm 1 will fetch the function end (\(f_e\)) address from the interface file (Fig. 2(c) - line #4) to add instrumentation at every function end. This type of instrumentation is required to roll-back the allocated stack and remove corresponding meta-data when the function returns.

  • III. Pointer Move/Propagate: This type of instrumentation is used to transfer and assign the address/bound of the buffer to any associated pointer. The pointer can then be used to indirectly access the buffer. Similarly, bounds can also be transferred between two pointers. Instructions at offsets ‘4’–‘8’ (from Fig. 2(b)) give an example instruction pattern that represents pointer propagation.

figure c

Here, the lea (load effective address) instruction computes the address of buffer ‘b’ into a register (rax), and then assigns it to the pointer ‘ptr’ (at offset ‘(rbp-0x8)’ on the stack). Thus, the static analysis tools mark the owner of instruction at offset ‘8’ as pointer ‘ptr’ (line #8 in Fig. 2(c)).

At run-time, the BoundPropogationCheck() function in our Pintool will add instrumentation code (to the store instruction at offset ‘8’ in Fig. 2(b)) to check the contents of the rax register to determine the location of object ‘b’ in memory. Note that the address and bounds of ‘b’ get stored in a global map structure during stack initialization at function start. It will then transfer these bounds to the pointer ‘ptr’.

  • IV. Pointer Dereference: The following instruction triplet (instructions at offset ‘20’–‘27’ from Fig. 2(b)) shows an example pattern for pointer dereference.

figure d

The buffer ‘b’ is accessed through pointer ‘ptr’. Here, the PtrBoundCheck() function from Algorithm 1 will add instrumentation code (just before the store instruction at offset ‘27’) to check whether the access is within the associated bounds, as follows:

figure e
  • V. Array/Object Bound Check: Similar to PtrBoundCheck(), the ObjBoundCheck() function adds code to verify that a direct array access is within the associated bounds. An example pattern (instruction at offset ‘32’ in Fig. 2(b)) is:

figure f
  • VI. Memory Accesses with Unknown Instruction Owner: In some cases our static RE tools are unable to determine the instruction owners for the memory access instructions in the binary. In such cases, the UnknownBoundCheck() function will add instruction code to check that the access is within the bounds of the current function stack.

Apart from the above instrumentation categories, we instrument dynamic memory allocation functions like malloc, calloc, etc. We use Pin’s routine instrumentation support to instrument these dynamic allocation functions. Our implementation also supports pointer metadata propagation through function calls, i.e. it propagates the pointer bounds information whenever pointers are passed between different functions.

5.2 Buffer Overflow Detection Accuracy

The efficacy of this framework to accurately detect memory errors is influenced by two factors: (a) the ability of the employed static RE tools in the first stage to correctly discover the necessary program information, and (b) the ability of the dynamic Pin-based run-time framework to correctly detect the program patterns that constitute valid instrumentation points. The run-time framework should also maintain and correctly propagate the desired program state at the relevant instrumentation points.Footnote 3

We check the effectiveness of our prototype framework to detect memory overflows using two benchmark suites – SARD-89 and SARD-88. Table 1 presents the efficacy of the framework with the SARD-89 benchmarks. Each SARD-89 benchmark consists of four programs, one that is categorized as benign (no overflow), and three categorized as Malicious with a memory reference that overflows some buffer with a Minimum, Medium, or Large amount, respectively.

Tables 1(a) and 1(b) show the efficacy results for the 121 SARD benchmarks that overflow for an instruction with a non-constant array access, with static analysis conducted by IDA Pro and Ghidra, respectively. For each configuration and benchmark, the column labeled Basic lists the number of programs that behave correctly or as expected (no-overflow or overflow detected at correct location) with our mechanism that does not add any instrumentation for instructions associated with unknown owners. The columns labeled Ext. display the results with the small extension to our run-time algorithm to add instrumentation for instructions with unknown owners to detect an overflow if the access is outside the bounds of the current stack.

Thus, we can see that, (a) All Benign cases are correctly handled. (b) All cases with the Debug configuration are correctly detected. (c) Most Malicious cases with the Stripped configuration cannot be detected due to missing information from the static analysis phase. The run-time Pin extension enables the detection of overflows outside the stack bounds for binaries analysed by Ghidra (that contain instructions with unknown owners). This extension does not help binaries analyzed by IDA Pro as it assigns some owner (a scalar in many cases) to all such relevant instructions. (d) Interestingly, advanced type and bounds detection conducted by the static tools enables the Decomp. configuration to correctly detect a large majority of overflows for the Malicious programs.

Table 1. SARD-89 run-time results for three experimental configurations: ➀ Debug, ➁ Stripped ➂ Decomp. (Stripped + decompiler)

Table 2 presents the efficacy results for the 14 SARD-88 benchmark programs with the IDA Pro RE tool used in the first stage.Footnote 4 For each SARD-88 benchmark, the program with the odd number is malicious and contains a buffer overflow and the program with the even number is benign. All results displayed here include the Pin extension to detect memory access beyond the current function stack.

We find that while most cases with the Debug configuration are detected correctly, there are a few notable failures. Most of these failures are due to incorrect static bound detection for global read/write buffers. We did not encounter this case in SARD-89 benchmarks; most overflows there were in local buffers.

Programs analyzed by Ghidra encounter additional failures, even in the Debug case, because, unlike IDA Pro, Ghidra does not detect global strings that are usually defined in the binary’s read-only (.rodata) section. For instance, benchmarks II, IX, XI and XIV fail when analyzed by Ghidra due to this issue. We also observed that global read-only strings with lengths less than 4 bytes are not detected by IDA Pro; for Ghidra this length is 5 bytes. This issue is a basic limitation for reverse engineering tools, as reducing this lower bound can lead to type detection conflicts with other types that may appear to be strings.Footnote 5.

As expected, malicious programs in the Stripped configuration fail due to incorrect static analysis. However, in contrast to our observation that the benign cases with the Stripped configuration in SARD-89 are successful (no overflow detected), we find that most benign-Stripped cases in SARD-88 fail (false positive overflow is detected). This difference in behavior is because our RE tools make no owner association (or unknown owner with Ghidra) for the SARD-89 programs in this configuration; so, no check is added for programs analyzed by IDA Pro, and the only check added is to detect out-of-stack overflows for binaries analyzed by Ghidra. In contrast our RE tools associate an owner (global variables in many cases) with incorrect bounds (1 in many cases) for many SARD-88 programs in this configuration; hence, they encounter a false positive overflow.

Again, we notice that advanced array bound and type inference enables several programs to be correctly handled in the Decomp. configuration. Of the 23 programs that are correctly detected in the Debug case, 15 are also correctly handled in the Decomp. configuration.

Table 2. SARD-88 test Results (IDA Pro) for our three experimental configurations: ➀ Debug, ➁ Stripped, and ➂ Decomp. (Stripped + Decompiler)

5.3 Performance Overhead

Figure  7 uses different metrics to estimate the performance overhead of the run-time framework.Footnote 6 Apart from the slowdown introduced by the Pin framework itself, the instrumentation added by our run-time algorithm is the primary source of performance overhead. Figures 5(a) and 5(b) plot the total number of instrumentation points encountered by all the SARD-89 and SARD-88 programs at run-time, respectively. The figures also highlight some interesting observations, including, (a) the number of Stack sets is less than the number of Stack unsets due to many programs exiting abruptly after an overflow is detected, (b) while SARD-89 programs are dominated by array dereferences, the SARD-88 programs encounter many more pointer dereferences, (c) the Ghidra-stripped configuration assigns an unknown owner to several instruction in SARD-89, which enables the detection of large buffer overflows that exceed the current stack bounds.

We also compare the execution time of the benchmark in three settings, (a) native run, (b) using a minimal pintool that does not add any instrumentation, and (c) the pintool implementing our run-time algorithm (plots are available in Figs. 7(a) and 7(b) in Appendix C). Each program is run for 15 times and the average execution time is plotted. Most programs in the SARD-89 and SARD-88 suites run quickly, with an average execution time of 0.99 ms and 1.17 ms for the native run, respectively. The startup overhead of the minimal Pin framework increases the average run-time to 213.71 ms for SARD-89 and 417.99 ms for SARD-88 programs, respectively. Finally, our run-time framework increases the overhead to 227.85 ms for SARD-89 programs and 450.62 ms for the SARD-88 programs.

Fig. 5.
figure 5

Dynamic instrumentation points (subfigures (a) and (b))

6 Conclusions and Future Work

Our goal in this work is to analyze and evaluate the ability of current state-of-art static reverse engineering tools, especially Ghidra and IDA, to accurately determine the required program information from binary programs to enable the effective detection of memory errors during program execution. We find that both Ghidra and IDA include advanced algorithms for array bound and instruction owner identification as part of their decompiler framework. However, more advanced techniques and algorithms are needed to further improve their capabilities and precision, especially for optimized binaries. We built a Pin-based run-time tool that can use the information from the static RE tools to detect buffer overflows during execution. We found that while our run-time tool can detect a large fraction of memory errors in our benchmarks, the accuracy of the tool is directly proportional to the limitations in the available static program information.

The practicality of this approach to detect memory errors is limited by the accuracy and completeness of the static tools and the efficiency of the run-time framework. In the future we will explore the potential of different approaches, including other dedicated type inference mechanisms, new static algorithms, and combining static and dynamic analysis, to improve array bound and owner detection, especially for optimized binaries. We will also study techniques to improve the efficiency of our prototype run-time framework, including using a static binary rewriting system. Finally, we will experiment with a larger benchmark set to more comprehensively study the properties of this approach.