1 Introduction

Pointer analysis is a key enabling technology underlying many program analysis, software engineering, and compiler optimization tasks. Given a pointer variable p, pointer analysis statically determines the set of all heap objects that p may point to. The result of such an analysis can be used to resolve important program analysis questions, such as whether two pointers can be aliases or whether a heap location may be referenced in a given piece of code.

While existing pointer analysis algorithms differ along many dimensions, a key feature that determines the precision of an algorithm is context-sensitivity. In particular, a context-sensitive analysis respects the call/return semantics of procedure calls and does not yield spurious points-to facts that arise from interprocedurally unrealizable paths. Furthermore, a context-sensitive analysis distinguishes heap objects that are allocated at the same program location, but due to different invocations of the same method. While more precise than context-insensitive ones, context-sensitive algorithms are much harder to scale to real programs, and many existing techniques use approximations of full context-sensitivity. For instance, object-sensitive analyses [14, 15, 23] only distinguish callsites where the receiver objects are different, and k-CFA analyses [12, 22] differentiate contexts by tracking callstrings up to some fixed length k.

Context-sensitivity can be achieved by performing either a top-down or a bottom-up interprocedural analysis. Top-down analyses start at the root methods of a program and analyze callers before callees. In contrast, bottom-up analyses start at leaf methods of the callgraph and analyze callees before callers. Since top-down algorithms analyze every method in a known calling context, they are simpler to design and implement, but they need to re-analyze the same method multiple times under different contexts. In contrast, bottom-up analyses generate a polymorphic method summary that may be used in any calling context to obtain context-sensitive results. While generating a polymorphic points-to summary is trickier than determining points-to information at a particular call site, bottom-up analyses do not need to re-analyze the same method several times and therefore have the potential to scale better. In addition, the results of a bottom-up pointer analysis are reusable: for instance, we can analyze a library just once and reuse its summary for many different clients.

Fig. 1. Code example to illustrate our approach

In this paper, we present a bottom-up context- and field-sensitive pointer analysis algorithm for Java. A key novel feature of our approach is the constraint-based treatment of virtual method calls. Similar to many other approaches, our method starts with an imprecise callgraph and refines the callgraph as points-to facts are discovered. However, we construct the callgraph in a purely bottom-up fashion by predicating points-to facts on the possible dynamic types of the receiver object. As method summaries are propagated up the call chain, these dynamic types are resolved, thereby allowing the refutation of infeasible call targets and spurious points-to facts in a context-sensitive manner.

Another salient feature of our approach is that it can generate polymorphic method summaries without performing expensive case splits on possible aliasing patterns at call sites. In particular, a key challenge in bottom-up pointer analysis is how to generate method summaries that soundly capture the aggregate effect of a call to method m under any possible aliasing relation at m’s call sites. Most previous techniques deal with this difficulty either by performing case splits on all possible aliasing patterns [3, 6] (which can cause exponential blow-up) or by using unification-based methods [10, 13, 27, 28] (which are imprecise compared to subset-based methods). A main advantage of our technique is that it is as precise as subset-based methods despite modeling the unknown state of the heap in a simple and uniform way. In particular, since our technique does not perform strong updates to heap locations, it can soundly account for the callee’s side effects by performing a fixed-point computation during summary instantiation.

Fig. 2. Method summaries computed by our algorithm

We have implemented our algorithm in a tool called Scuba, and we compare its scalability and precision with top-down pointer analysis algorithms implemented in Chord [16]. Our experimental results on programs from the DaCapo and Ashes benchmark suites indicate that Scuba achieves precision that is better than or comparable to k-CFA and k-object-sensitive algorithms at a fraction of the cost.

To summarize, this paper makes the following contributions:

  • We present a bottom-up, subset-based, and context-sensitive pointer analysis for Java. A key novelty of our approach is the handling of virtual method calls using constraint-based techniques.

  • We describe a new method for summarizing and instantiating points-to facts. Unlike previous techniques, our approach does not case-split on aliasing patterns and guarantees soundness by performing a fixed-point computation during summary instantiation.

  • We describe an implementation of our algorithm and compare it with k-CFA and k-obj algorithms on the DaCapo and Ashes benchmarks.

2 Example

This section illustrates our approach on an example that showcases virtual method calls and the need for context-sensitivity. Consider the code shown in Fig. 1, which defines classes X, Y, and A. Here, Y is a subclass of X and overrides X’s bar method. Class A has two instance variables x and y of type X. For concreteness, suppose we want to know whether x.f and y.g can be aliases at the end of a1 and a2.
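
Since the original figure is not included in this text, the following Java sketch reconstructs a program consistent with the description in this section; the field declarations, parameter names, and the body of a2 are assumptions and may differ from the paper's actual Fig. 1.

```java
class Z {}

class X {
  Z f, g;                                  // fields queried for aliasing (assumed here)
  void bar(Z z) { this.f = z; }            // X::bar links field f to its first parameter
}

class Y extends X {
  @Override
  void bar(Z z) { this.g = z; }            // Y::bar overrides bar and writes field g
}

class A {
  X x, y;
  void foo(Z a) {
    x.bar(a);                              // may dispatch to X::bar or Y::bar
    y.bar(a);
  }
  void a1() {
    x = new Y();                           // alloc(Y)@a1:1
    y = x;                                 // x and y share the same object
    Z z = new Z();                         // alloc(Z)@a1:3
    foo(z);
  }
  void a2() { /* analogous to a1 with its own allocations (assumed) */ }
}
```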

Our algorithm starts by analyzing X::bar and Y::bar, which are leaf procedures in the callgraph. The summaries for X::bar and Y::bar are shown in Fig. 2(a) and (b). We depict both method summaries as well as local points-to facts in the form of a graph, where nodes correspond to abstract heap objects and directed edges denote may-point-to relations. Our method summaries only include points-to edges that may be added due to an invocation of the summarized method. In particular, method summaries do not include points-to relations that already exist on method entry, and since our analysis does not apply strong updates, no points-to edges can be removed as a result of analyzing a method. Hence, the summary for a method m can be thought of as a bag of (symbolic) points-to edges that are introduced due to an invocation of m.

Fig. 3. Dashed lines indicate points-to edges added after the foo call.

Now, consider the summary for X::bar shown in Fig. 2(a). We use \(\mathrm{arg}_0\) to denote the this pointer, and \(\mathrm{arg}_1\), \(\mathrm{arg}_2\) to denote the first and second formal parameters. For parameter i, we use \(\mathrm{arg}_i.\epsilon \) to denote the heap object pointed to by \(\mathrm{arg}_i\) on method entry. Hence, according to the summary in Fig. 2(a), X::bar adds a points-to edge via field f from the object pointed to by the this pointer to the object pointed to by X::bar’s first parameter. Note that, at a call site of X::bar, parameter z may have multiple points-to targets; hence, the single edge in X::bar’s summary may introduce multiple points-to edges at a call site. The summary for Y::bar, shown in Fig. 2(b), is very similar.

Now consider method A::foo, whose summary is shown in Fig. 2(c). Since x has type X, the call x.bar(a) could either invoke X::bar or Y::bar. Hence, to analyze the method call, we instantiate the summaries of both X::bar and Y::bar, but the points-to edges induced by instantiating method T::bar are qualified by a constraint that stipulates that the dynamic type of x is T.

As an example, consider the potential target X::bar of the call x.bar(a). Here, the only points-to edge in the summary for X::bar is from the f field of \(\mathrm{arg}_0.\epsilon \) to \(\mathrm{arg}_1.\epsilon \). Since \(\mathrm{arg}_0\) of X::bar corresponds to this.x at the call site, \(\mathrm{arg}_0.\epsilon \) instantiates to \(\mathrm{arg}_0.\epsilon .x\), which denotes the memory location pointed to by this.x. On the other hand, \(\mathrm{arg}_1\) of X::bar corresponds to foo’s own first parameter at the call site; hence \(\mathrm{arg}_1.\epsilon \) translates to a location with the same name, i.e., \(\mathrm{arg}_1.\epsilon \). Therefore, as shown in Fig. 2(c), instantiating the summary of X::bar for this call site induces an edge from \(\mathrm{arg}_0.\epsilon .x\) to \(\mathrm{arg}_1.\epsilon \), but this edge is qualified by the constraint \(\mathrm{type}(\mathrm{arg}_0.\epsilon .x) = \mathrm{X}\).

Continuing with the method invocation y.bar(a) in the second line of A::foo, we again instantiate the summaries of X::bar and Y::bar, since the static type of y is X. However, this time, the location named \(\mathrm{arg}_0.\epsilon \) used in the summaries of X::bar and Y::bar corresponds to the location pointed to by this.y, which is denoted by \(\mathrm{arg}_0.\epsilon .y\) in method foo. Hence, as shown in Fig. 2(c), the summary for foo includes points-to edges from the f and g fields of \(\mathrm{arg}_0.\epsilon .y\) to \(\mathrm{arg}_1.\epsilon \), again qualified by the appropriate type constraints. Observe that our approach is context-sensitive: the two different invocations of bar induce different points-to relations. Also, even though this.x and this.y may alias at a call site of foo, we represent their points-to targets on method entry using two separate locations called \(\mathrm{arg}_0.\epsilon .x\) and \(\mathrm{arg}_0.\epsilon .y\). As we explain shortly, this approach is sound as long as we do not perform strong updates.

Now consider method a1 defined in class A. The solid black edges in Fig. 3(a) denote points-to facts that hold right before the call to foo. In particular, the location named \(\mathrm{arg}_0.\epsilon \) denotes the object pointed to by the this pointer, and both the x and y fields of \(\mathrm{arg}_0.\epsilon \) point to a location called \(\mathrm{alloc(Y)@a1:1}\), which corresponds to the memory allocated at the first line of method a1. Throughout the paper, we use the notation \(\mathrm{alloc(T)@Ctx}\) to denote heap objects of type T that are allocated in context \(\mathrm{Ctx}\).

We now turn to the method invocation foo(z) in a1. Here, \(\mathrm{arg}_1.\epsilon \) in foo’s summary corresponds to the location pointed to by z in a1, which is \(\mathrm{alloc(Z)@a1:3}\). On the other hand, the locations named \(\mathrm{arg}_0.\epsilon .x\) and \(\mathrm{arg}_0.\epsilon .y\) in foo represent the locations pointed to by this.x and this.y in a1, respectively. Following the chain of points-to edges in Fig. 3(a), we see that \(\mathrm{arg}_0.\epsilon .x\) and \(\mathrm{arg}_0.\epsilon .y\) both correspond to the location \(\mathrm{alloc(Y)@a1:1}\). Since this allocation is tagged with type Y, the constraints \(\mathrm{type}(\mathrm{arg}_0.\epsilon .x) = \mathrm{X}\) and \(\mathrm{type}(\mathrm{arg}_0.\epsilon .y) = \mathrm{X}\) evaluate to false, while \(\mathrm{type}(\mathrm{arg}_0.\epsilon .x) = \mathrm{Y}\) and \(\mathrm{type}(\mathrm{arg}_0.\epsilon .y) = \mathrm{Y}\) evaluate to true. Hence, instantiating foo’s summary induces a single points-to edge from the g field of \(\mathrm{alloc(Y)@a1:1}\) to \(\mathrm{alloc(Z)@a1:3}\), shown as the dotted edge in Fig. 3(a). A similar chain of reasoning yields the points-to facts shown in Fig. 3(b) for method a2.

As we can see from Fig. 3, the analysis determines that x.f and y.g are not aliases in either a1 or a2. Observe that an analysis that is either context-insensitive or based on an imprecise callgraph would conclude otherwise. Also, even though there are two different calls to foo and four different calls to bar, observe that our algorithm analyzes each method only once.

3 Conceptual Foundations

Before describing our analysis in detail, we first describe a conceptual framework that lays the foundations of our algorithm. We describe the main ideas using a may-points-to graph, which we refer to as an abstract heap:

Definition 1

(Abstract heap). An abstract heap H is a graph \((N, E)\), where N is a set of nodes corresponding to abstract memory locations, and E is a set of directed edges between nodes labeled with field names or \(\epsilon \). An edge \((o_1, o_2, f)\) indicates that the f field of \(o_1\) may point to \(o_2\).

Here, an abstract memory location represents either the stack location of a variable or a set of heap objects. The edge label \(\epsilon \) is used to model points-to relations from stack locations to heap objects. The root nodes of an abstract heap denote locations of variables, and we write \(root(H)\) for the set of root nodes of H. Given two abstract heaps \(H_1\) and \(H_2\), \(H_1 \cup H_2\) denotes the abstract heap containing the nodes and edges of both \(H_1\) and \(H_2\). Given a heap H and a set of edges E, we write \(H \backslash E\) for the heap that contains all nodes and edges of H except the edges in E.
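
As a concrete illustration of Definition 1 and the operations above, the sketch below encodes an abstract heap as a set of labeled edge triples. This representation (String node names, with "eps" standing for \(\epsilon \)) is our own illustration, not the encoding used by the implementation.

```java
import java.util.*;

// One possible encoding of Definition 1: nodes are abstract locations and
// each edge (src, dst, field) says "the field of src may point to dst";
// the label "eps" models edges from variables to heap objects.
final class AbstractHeap {
  record Edge(String src, String dst, String field) {}

  final Set<String> nodes = new HashSet<>();
  final Set<Edge> edges = new HashSet<>();
  final Set<String> roots = new HashSet<>();     // variable (stack) locations

  void addEdge(String src, String dst, String field) {
    nodes.add(src);
    nodes.add(dst);
    edges.add(new Edge(src, dst, field));        // may-point-to: edges only accumulate
  }

  // H1 ∪ H2: union of nodes, edges, and roots
  AbstractHeap union(AbstractHeap other) {
    AbstractHeap h = new AbstractHeap();
    h.nodes.addAll(nodes);  h.nodes.addAll(other.nodes);
    h.edges.addAll(edges);  h.edges.addAll(other.edges);
    h.roots.addAll(roots);  h.roots.addAll(other.roots);
    return h;
  }

  // H \ E: remove the given edges but keep all nodes
  AbstractHeap minus(Set<Edge> remove) {
    AbstractHeap h = union(new AbstractHeap());  // copy
    h.edges.removeAll(remove);
    return h;
  }
}
```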

3.1 Normalization of Abstract Heaps

Given an abstract heap H, we define a normalization operation \(\mathbb {N}(H)\), which yields a normalized heap \(H^*\) and a mapping \(\zeta \) from nodes in H to nodes in \(H^*\).

Definition 2

(Normal form). Given heap \(H = (N, E)\), \(\mathbb {N}(H)\) yields normalized heap \(H^* = (N^*, E^*)\) and mapping \(\zeta : N \rightarrow 2^{N^*}\) such that:

  1. If \(x \in root (H)\), then \(x \in N^*\) and \(\zeta (x) = \{ x \}\).

  2. If \((o, o', f) \in E\) and \(o^* \in \zeta (o)\), then \(o^*.f \in N^*\), \(o^*.f \in \zeta (o')\), and \((o^*, o^*.f, f) \in E^*\).

We use the notation \(\mathbb {N}(H) = H^*\) to indicate that \(H^*\) is in normal form, and we write \(Map (H, H^*) = \zeta \) to indicate that \(\zeta \) maps nodes of H to nodes in \(H^*\).

Fig. 4. An abstract heap and its normal form

Example 1

Consider the abstract heap H shown in Fig. 4(a) and its normal form \(H^*\) in Fig. 4(b). Here, \(Map (H, H^*)\) yields the following mapping \(\zeta \):

$$\begin{aligned} \small \begin{array}{lll} \zeta (x) = \{ x\} & \zeta (y) = \{ y \} & \zeta (A) = \{ x.\epsilon \} \\ \zeta (B) = \{ x.\epsilon , y.\epsilon \} & \zeta (C) = \{ y.\epsilon \} & \zeta (D) = \{ x.\epsilon .f \} \\ \zeta (E) = \{ x.\epsilon .f, y.\epsilon .f \} & & \zeta (F) = \{ x.\epsilon .f, y.\epsilon .f \} \end{array} \end{aligned}$$

We use heap normal forms to model the heap on entry to a method. In particular, \(H^*\) corresponds to a “generic” heap that represents the unknown points-to target of object o’s f field by the location o.f. While the mapping \(\zeta \) from abstract heap H to its normal form \(H^*\) differs for each call site, the normalized heap for a method m is the same irrespective of calling context. Observe that no two abstract memory locations alias each other in a normalized heap, and every location has exactly one points-to target for a given field.

Given an abstract heap \(H = (N, E)\) and its normal form \(H^* = (N^*, E^*)\) such that \(Map (H, H^*) = \zeta \), we define \(\zeta ^{-1}\) to be a mapping from \(N^*\) to \(2^N\) such that \(n \in \zeta ^{-1}(n^*)\) iff \(n^* \in \zeta (n)\). In general, there is a many-to-many relationship between the nodes of an abstract heap and its corresponding normal form.

Example 2

Consider the heap from Example 1. We have:

$$\begin{aligned} \small \begin{array}{lll} \zeta ^{-1}(x) = \{ x\} & \zeta ^{-1}(y) = \{ y \} & \zeta ^{-1}(x.\epsilon ) = \{ A, B \} \\ \zeta ^{-1}(y.\epsilon ) = \{ B, C \} & \zeta ^{-1}(x.\epsilon .f) = \{ D, E, F\} & \zeta ^{-1}(y.\epsilon .f) = \{ E, F \} \\ \end{array} \end{aligned}$$

The mapping \(\zeta ^{-1}\) is important in summary-based analysis because it allows us to instantiate a method summary to a particular abstract heap at a call site. We use the notation \(\overline{\zeta ^{-1}}\) to denote the extension of \(\zeta ^{-1}\) that maps any element that is not in the domain of \(\zeta ^{-1}\) to itself. Given heap \(H^* = (N^*, E^*)\), we also write \(\zeta ^{-1}(H^*)\) to denote a heap \(H = (N, E)\) where \((o_1, o_2, f) \in E\) iff there exists an edge \((o^*_1, o^*_2, f) \in E^*\) such that \(o_1 \in \zeta ^{-1}(o^*_1)\) and \(o_2 \in \zeta ^{-1}(o^*_2)\).
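
Read operationally, Definition 2 is a worklist closure, and \(\zeta ^{-1}\) is its inversion. The sketch below (our own illustration over String node names) computes both; a depth bound is needed because the normal form of a cyclic heap has infinitely many nodes, a problem the full analysis instead avoids by collapsing recursive fields (Sect. 4.2).

```java
import java.util.*;

// zeta: maps each node of H to its names in the normal form H* (Definition 2).
// Nodes are Strings; normal-form names are dotted paths such as "x.eps.f".
final class Normalizer {
  record Edge(String src, String dst, String field) {}

  static Map<String, Set<String>> zeta(Set<String> roots, Set<Edge> edges, int maxDepth) {
    Map<String, Set<String>> zeta = new HashMap<>();
    for (String r : roots)
      zeta.computeIfAbsent(r, k -> new HashSet<>()).add(r);     // clause 1
    boolean changed = true;
    while (changed) {                                           // close under clause 2
      changed = false;
      for (Edge e : edges)
        for (String srcStar : List.copyOf(zeta.getOrDefault(e.src(), Set.of()))) {
          String dstStar = srcStar + "." + e.field();           // the node o*.f
          if (dstStar.split("\\.").length > maxDepth) continue; // cut off cyclic heaps
          changed |= zeta.computeIfAbsent(e.dst(), k -> new HashSet<>()).add(dstStar);
        }
    }
    return zeta;
  }

  // zeta^{-1}: n is in zeta^{-1}(n*) iff n* is in zeta(n)
  static Map<String, Set<String>> invert(Map<String, Set<String>> zeta) {
    Map<String, Set<String>> inv = new HashMap<>();
    zeta.forEach((n, stars) -> stars.forEach(
        s -> inv.computeIfAbsent(s, k -> new HashSet<>()).add(n)));
    return inv;
  }
}
```

Running zeta on the heap of Example 1 (roots x and y, with the edges of Fig. 4(a)) reproduces the mapping \(\zeta \) shown above, and invert reproduces \(\zeta ^{-1}\) of Example 2.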

Definition 3

(Default Edge). We say an edge \((n, n', f)\) is a default edge of an abstract heap if \(n' = n.f\).

Given a heap H, we write \(default (H)\) to denote the set of default edges in H.

Fig. 5. Basic structure of summary-based analysis

3.2 Summary-Based Pointer Analysis

We now explain the basic idea underlying our summary-based analysis, assuming a family of pointer analyses that are sound and weakly-updating. Given code snippet S, an abstract heap H, and pointer analysis A, we write \(H' = Analyze (H, S, A)\) to indicate that, if statement S is executed in an environment that satisfies abstract heap H, then analyzing code S using pointer analysis A yields a heap \(H'\) which conservatively models the concrete heap after S.

Fig. 6. Core language used for our formalization

The basic structure of our summary-based pointer analysis is shown in Fig. 5. The algorithm SumAnalyze takes as input an abstract heap H, a code snippet S, and a weakly-updating pointer analysis A, and works as follows. Line (1) constructs the normalized heap \(H^*\) representing the unknown state of the heap before executing S, and line (2) analyzes S without making any assumptions about points-to facts that hold before S. Line (3) generates a polymorphic points-to summary \(\varDelta \) which characterizes side effects of code S. Lines (4)–(11) instantiate the summary \(\varDelta \) by performing a fixed-point computation. Finally, \(H'\) at line (12) models the state of the heap after S when S is executed in an environment satisfying H.

Before discussing details, let us first understand in what way this algorithm is “summary-based”. Since \(H^*\) can be constructed in a context-independent manner, we can analyze S in isolation and compute its side effects without knowing the points-to facts that hold before S. Hence, lines (1)–(3) in Fig. 5 correspond to summary generation. On the other hand, lines (4)–(11) perform summary instantiation by computing the context-specific mapping \(\zeta _0\) and by adding all points-to edges that represent S’s side effects.

The most involved part of the above algorithm is the fixed-point computation at lines (4)–(11). Intuitively, the algorithm maps each edge in the summary to a set of edges at the call site using the mapping \(\overline{\zeta ^{-1}}\) and adds these edges to the initial abstract heap H. However, as new edges are added to H, the mapping \(\zeta \) must be recomputed, since locations used in the summary may map to additional locations after the summary has been applied. Hence, \(H'\) is recomputed until the mapping \(\zeta \) from \(H'\) to \(H^*\) stabilizes. It is easy to see that SumAnalyze is sound because (i) the underlying pointer analysis does not apply strong updates to memory locations, and (ii) the summary is applied until a fixed point is reached. In particular, observe that \(H'\) overapproximates H and \(\{ H' \} S \{H'\}\) is a valid Hoare triple [8].
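
Since Fig. 5 is not reproduced here, the following sketch reconstructs SumAnalyze from the line-by-line description above, reusing the Normalizer sketch from Sect. 3.1. The signatures, the `defaults` parameter (the default edges of \(H^*\)), and `analyzeS` (standing for Analyze applied to S) are our own assumptions, not the paper's exact pseudocode.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Reconstruction of the SumAnalyze loop of Fig. 5 over edge-set heaps.
final class SumAnalyzeSketch {
  static Set<Normalizer.Edge> sumAnalyze(Set<String> roots,
                                         Set<Normalizer.Edge> H,
                                         Set<Normalizer.Edge> defaults,       // H*, line (1)
                                         UnaryOperator<Set<Normalizer.Edge>> analyzeS,
                                         int depthBound) {
    Set<Normalizer.Edge> delta = new HashSet<>(analyzeS.apply(defaults));     // line (2)
    delta.removeAll(defaults);                   // line (3): summary = new edges only

    Set<Normalizer.Edge> Hp = new HashSet<>(H);  // lines (4)-(11): instantiate delta
    Map<String, Set<String>> prev = null;
    while (true) {
      Map<String, Set<String>> zetaInv =
          Normalizer.invert(Normalizer.zeta(roots, Hp, depthBound));
      if (zetaInv.equals(prev)) break;           // the mapping zeta has stabilized
      prev = zetaInv;
      for (Normalizer.Edge e : delta)            // one summary edge -> many caller edges
        for (String s : zetaInv.getOrDefault(e.src(), Set.of(e.src())))
          for (String d : zetaInv.getOrDefault(e.dst(), Set.of(e.dst())))
            Hp.add(new Normalizer.Edge(s, d, e.field()));   // weak update: add only
    }
    return Hp;                                   // line (12): heap after S under H
  }
}
```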

The reader may wonder why it is necessary to re-apply the summary until the mapping \(\zeta \) reaches a fixed point in Fig. 5. This is necessary because our summary \(\varDelta \) encodes all possible side effects of code snippet S, but not the order in which they happen. Hence, while the fixed point computation is required for soundness, an immediate corollary is that the procedure SumAnalyze can be less precise than Analyze if the underlying pointer analysis A is flow-sensitive.

4 Formalization of Algorithm

While the previous section describes the core ideas of the analysis, it omits many important details. In this section, we describe our full algorithm using the core object-oriented language of Fig. 6. Here, a program consists of one or more class declarations C, each of which defines a class \(T_1\) with an optional superclass \(T_2\). Instructions include assignments, loads, stores, heap allocations (marked with a unique program point \(\rho \)), non-deterministic conditionals, sequences, static method calls \(m^\rho @T(...)\) (also marked with program point \(\rho \)), and virtual method calls \(v_0. m^\rho (\ldots )\). We assume that the first argument of a method is always the this pointer, and that if a class T inherits a method m from its superclass \(T'\), then T also contains a definition of m with the same implementation.

4.1 Abstract Domains

Figure 7 shows the abstract domains that we need for describing our algorithm. Since our analysis is bottom-up, we differentiate between two kinds of heap objects o: Access paths of the form \(a_i.\eta \) represent (caller-allocated) unknown heap objects reachable through the i’th argument, whereas objects named \(\mathrm{alloc}(T)@\mathbf {\rho }\) represent heap objects of type T that are allocated either in the currently analyzed method or in a transitive callee. Specifically, \(a_i.f_1...f_n\) denotes the unknown locations reachable on method entry through a series of field accesses \(f_1...f_n\) from the i’th argument. We use the notation \(a_i.(f_1...f_n)^\star \) to denote all unknown locations reachable from \(a_i\) through any combination of field selectors \(f_1, ..., f_n\). For instance, \(a_0.(f.g)^\star \) represents the infinite set of access paths \(a_0.f\), \(a_0.g\), \(a_0.f.f\), \(a_0.g.f\) and so on. As we will see in Sect. 4.3, access paths in our analysis correspond to node labels of the normalized heap from Sect. 3.

Fig. 7. Abstract domains used in our analysis

Abstract memory locations \(\pi \) are either heap objects o or stack locations. In particular, \(a_i\) denotes the stack location of the i’th argument, and \(v_i@\mathbf {\rho }\) denotes the location of local variable \(v_i\) under context \(\mathbf {\rho }\). We represent calling contexts using a sequence of program points \(\rho _1,\ldots , \rho _n\), where each \(\rho _i\) corresponds to some call or allocation site. For instance, a memory location named \(\mathrm{alloc}(T)@\rho _1\rho _2\) corresponds to a heap object allocated at program point \(\rho _2\) of some method m which is invoked at call site \(\rho _1\). Similarly, \(v@\rho _1\) denotes the local variable v of some method m when m is invoked at callsite \(\rho _1\). Since our analysis builds contexts in a bottom-up way, local variables declared in the currently analyzed method m do not have any context information; hence, we abbreviate the locations of locals in the current method as \(v_i\).

Definition 4

(Argument-Derived Location). We say location \(\pi \) is derived from an argument, written \(arg (\pi )\), if it is either (i) \(a_i\) representing the location of the i’th argument or (ii) a heap object represented with an access path \(a_i.\eta \).

An abstract heap \(\varGamma \) maps each field f of location \(\pi \) to a points-to set. A (guarded) points-to set \(\theta \) is a set of pairs \((o, \phi )\) where o is a heap object and \(\phi \) is a constraint. As discussed in Sect. 2, we use constraints to predicate points-to facts on dynamic types of receivers. Constraints \(\phi \) belong to the theory of equality with uninterpreted functions, defined according to the following grammar:

$$\begin{aligned} \small \begin{array}{lll} \mathrm{Function} \ f &:=& \mathrm{pts} \ | \ \mathrm{alloc} \ | \ \varsigma _i \\ \mathrm{Term} \ t &:=& c \ | \ v \ | \ f(\mathbf {t}) \\ \mathrm{Formula} \ \phi &:=& \top \ | \ \bot \ | \ \mathrm{type}(t) = T \ | \ \phi _1 \wedge \phi _2 \ | \ \phi _1 \vee \phi _2 \end{array} \end{aligned}$$

Here, terms include constants c, variables v, and function applications \(f(\mathbf {t})\), where f is one of the binary functions \(\mathrm{pts}\) and \(\mathrm{alloc}\) or an n-ary function drawn from \(\varsigma _1, \ldots , \varsigma _k\). Formulas are composed of \(\top \) (true), \(\bot \) (false), and conjunctions and disjunctions of equality constraints of the form \(\mathrm{type}(t) = T\), where T is a type constant. In addition to the usual function and equality axioms, the \(\mathrm{alloc}\) and \(\mathrm{type}\) functions obey the additional axiom \(\mathrm{type}(\mathrm{alloc}(T, \mathbf {\rho })) = T\) for every type T and context \(\mathbf {\rho }\), which states that the type of an allocation of type T is T.

Since we will convert heap objects to terms, we define an operation called lift(\(\pi \)), abbreviated \(\overline{\pi }\), as follows:

$$\begin{aligned} \small \overline{a_i} = a_i \ \ \ \overline{\mathrm{alloc}(T)@\mathbf {\rho }} = \mathrm{alloc}(T, \mathbf {\rho }) \ \ \ \overline{\pi .f} = \mathrm{pts}(\overline{\pi }, f) \ \ \ \overline{\pi .(\mathbf {f})^\star } = \varsigma _i(\overline{\pi }, \mathbf {f}) \ (\varsigma _i \ \mathrm{fresh}) \end{aligned}$$

We also assume an operation \(\mathrm{lift}^{-1}(t)\), the inverse of \(\mathrm{lift}(\pi )\): given a term t, \(\mathrm{lift}^{-1}(t)\) yields the abstract memory location represented by that term.

Example 3

The constraint \(\mathrm{type}(\overline{a.\epsilon .f}) = A \wedge \mathrm{type}(\overline{a.\epsilon .f}) = B\) is unsatisfiable since the dynamic type of \(a.\epsilon .f\) cannot simultaneously be A and B. But the constraint \(\mathrm{type}(\overline{a.\epsilon .g^\star }) = A \wedge \mathrm{type}(\overline{a.\epsilon .g^\star }) = B\), which translates to \(\mathrm{type}(\varsigma _1(\mathrm{pts}(a, \epsilon ), g)) = A \wedge \mathrm{type}(\varsigma _2(\mathrm{pts}(a, \epsilon ), g)) = B\), is satisfiable because two distinct occurrences of \(a.\epsilon .g^\star \) may correspond to different objects (e.g., \(a.\epsilon .g\) and \(a.\epsilon .g.g\)).
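Scuba checks such constraints with the Z3 SMT solver (Sect. 5). As a rough illustration only (the paper does not show its actual encoding), Example 3's two checks can be phrased with Z3's Java bindings as follows; the sort and symbol names here are made up for the example.

```java
import com.microsoft.z3.*;

public class TypeConstraintDemo {
  public static void main(String[] args) {
    Context ctx = new Context();
    Sort loc = ctx.mkUninterpretedSort("Loc");
    Sort ty  = ctx.mkUninterpretedSort("Type");
    FuncDecl typeOf = ctx.mkFuncDecl("type", loc, ty);
    Expr a = ctx.mkConst("A", ty), b = ctx.mkConst("B", ty);

    // type(a.eps.f) = A  /\  type(a.eps.f) = B : one fixed location, two types
    Expr aEpsF = ctx.mkConst("a.eps.f", loc);
    Solver s1 = ctx.mkSolver();
    s1.add(ctx.mkDistinct(a, b));                      // A and B are distinct types
    s1.add(ctx.mkEq(typeOf.apply(aEpsF), a));
    s1.add(ctx.mkEq(typeOf.apply(aEpsF), b));
    System.out.println(s1.check());                    // UNSATISFIABLE

    // With a starred path, each occurrence becomes a fresh term (sigma_1, sigma_2)
    Expr o1 = ctx.mkConst("sigma_1", loc), o2 = ctx.mkConst("sigma_2", loc);
    Solver s2 = ctx.mkSolver();
    s2.add(ctx.mkDistinct(a, b));
    s2.add(ctx.mkEq(typeOf.apply(o1), a));
    s2.add(ctx.mkEq(typeOf.apply(o2), b));
    System.out.println(s2.check());                    // SATISFIABLE
  }
}
```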

We now define a function has_type\((\theta , T)\) which generates a constraint that evaluates to true if some element in points-to set \(\theta \) can have dynamic type T:

Definition 5

(has_type\((\theta , T)\)). Given a points-to set \(\theta \), the function has_type\((\theta , T)\) yields the following constraint:

$$\begin{aligned} \small \bigvee _{ (\pi _i, \phi _i) \in \theta } ((\mathrm{type}(\overline{\pi _i}) = T) \wedge \phi _i) \end{aligned}$$

Now, going back to Fig. 7, an environment \(\varUpsilon \) maps each method M in class T to its corresponding summary, which is an abstract heap \(\varGamma \) summarizing M’s side effects. Applying the summary at a call site allows us to determine points-to relations of the caller without having to reanalyze the callee. In addition, our method summaries include points-to information for locals in the summarized method. In particular, this design choice allows us to determine points-to sets for all program variables without employing a separate top-down pass.

4.2 Operations on Abstract Domains

In this section, we describe some operations on abstract domains that simplify the description of our algorithm. Since our algorithm constructs the initial heap on method entry in a demand-driven way, we first define default targets for argument-derived locations:

Definition 6

(Default Target). Given an argument-derived location \(\pi \) and a field f, the default target of the f field of \(\pi \), written \(\mathrm{def}(\pi , f)\), is given as follows:

$$\begin{aligned} \small \mathrm{def}(\pi , f) = \left\{ \begin{array}{lll} \pi & \ \mathrm{if}\ \pi = \pi '.(\mathbf {f})^\star \ \mathrm{and} \ f \in \mathbf {f} & (1) \\ \pi '.(f.\mathbf {g})^\star & \ \mathrm{if}\ \pi = \pi '.f.\mathbf {g} & (2) \\ \pi .f & \ \mathrm{otherwise} & (3) \end{array} \right. \end{aligned}$$

In other words, if field f is not part of a recursive field cycle (case 3), then the default target for field f of an argument-derived location \(\pi \) is \(\pi .f\), just like in the normal-form heaps from Sect. 3. However, if f is part of a recursive field cycle, then our analysis collapses this cycle into a single abstract memory location (cases 1–2). For example, \(\mathrm{def}(a.\mathrm{next}, \mathrm{next})= a.\mathrm{next}^\star \) (case 2), and \(\mathrm{def}(a.\mathrm{next}^\star , \mathrm{next})= a.\mathrm{next}^\star \) (case 1). The summarization of recursive field cycles into access paths of the form \(a.\mathbf {f}^\star \) is needed to ensure termination of the fixed-point computation performed by our algorithm.
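
To make Definition 6 concrete, the sketch below implements def\((\pi , f)\) over a simplified access-path representation with at most one trailing cycle set; both the representation and the coarsening in the second branch are our own simplifications, not the tool's.

```java
import java.util.*;

// Simplified access paths: base argument (e.g. "a0"), a list of plain field
// selectors, and an optional trailing cycle set (f1...fn)*.
final class AccessPath {
  final String base;
  final List<String> fields;
  final Set<String> cycle;          // empty set = no trailing (..)*

  AccessPath(String base, List<String> fields, Set<String> cycle) {
    this.base = base;
    this.fields = List.copyOf(fields);
    this.cycle = Set.copyOf(cycle);
  }

  // def(pi, f) following the three cases of Definition 6
  AccessPath def(String f) {
    if (cycle.contains(f))
      return this;                                        // case (1): stay in the cycle
    if (!cycle.isEmpty()) {                               // pi.(..)*.f not expressible
      Set<String> c = new HashSet<>(cycle);               // here: fold f into the cycle,
      c.add(f);                                           // a coarser but sound choice
      return new AccessPath(base, fields, c);
    }
    int i = fields.indexOf(f);
    if (i >= 0) {                                         // case (2): f recurs, collapse
      Set<String> c = new HashSet<>(fields.subList(i, fields.size()));
      c.add(f);
      return new AccessPath(base, fields.subList(0, i), c);
    }
    List<String> ext = new ArrayList<>(fields);           // case (3): plain pi.f
    ext.add(f);
    return new AccessPath(base, ext, cycle);
  }

  @Override public String toString() {
    StringBuilder sb = new StringBuilder(base);
    for (String g : fields) sb.append('.').append(g);
    if (!cycle.isEmpty()) sb.append(".(").append(String.join(".", cycle)).append(")*");
    return sb.toString();
  }
}
```

For instance, starting from the path a0.next, calling def("next") yields a0.(next)*, and calling def("next") again leaves it unchanged, mirroring the two examples above.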

Next, we define a field lookup operation on abstract heaps \(\varGamma \):

Definition 7

(Field Look-up). Given heap \(\varGamma \), field f, and location \(\pi \), the field lookup operation \(\varGamma [\pi , f]\) retrieves the points-to target for \(\pi \)’s f field:

$$\begin{aligned} \small \varGamma [\pi , f] = \left\{ \begin{array}{ll} \varGamma (\pi , f) \cup \{ (\mathrm{def}(\pi , f), \top )\} & \mathrm{if} \ arg (\pi ) \\ \varGamma (\pi , f) & \mathrm{otherwise} \end{array} \right. \end{aligned}$$

Since our algorithm does not explicitly add default edges to the abstract heap, \(\varGamma [\pi , f]\) always yields \(\mathrm{def}(\pi , f)\) as part of the points-to set of \(\pi .f\) if \(\pi \) is an argument-derived location. Now, since our analysis performs weak updates, we need to merge points-to sets using the following join operator:

Definition 8

(Join \(\sqcup \) of points-to sets \(\theta _1, \theta _2\) )

$$\begin{aligned} \small (\theta _1 \sqcup \theta _2)(o) = \left\{ \begin{array}{l} \theta _1(o) \vee \theta _2(o) \ \ \mathrm{if} \ o \in dom(\theta _1) \cap dom(\theta _2) \\ \theta _1(o) \ \ \mathrm{if} \ o \in dom(\theta _1) \ \mathrm{and} \ o \not \in dom(\theta _2) \\ \theta _2(o) \ \ \mathrm{if} \ o \in dom(\theta _2) \ \mathrm{and} \ o \not \in dom(\theta _1) \end{array} \right. \end{aligned}$$

Observe that if an object o is in both points-to sets \(\theta _1\) and \(\theta _2\), we take the disjunction of the constraints associated with o. We also extend this join operator to abstract heaps in the expected way: for a location \(\pi \) and field f, \((\varGamma _1 \sqcup \varGamma _2)(\pi , f)\) yields \(\varGamma _1(\pi , f) \sqcup \varGamma _2(\pi , f)\). In our analysis, we sometimes need to predicate points-to information on constraints. For this purpose, the operation \(\theta \downarrow \phi \) conjoins \(\phi \) with every constraint in \(\theta \):

Definition 9

(Projection of \(\theta \) on \(\phi \) ). \( \theta \downarrow \phi = \{ (\pi _i, \phi _i \wedge \phi ) \ | \ (\pi _i, \phi _i) \in \theta \} \)

Finally, we extend the field lookup operation on points-to sets as follows:

Definition 10

(Field lookup for pts-to set). \( \small \varGamma [\theta , f] = \bigsqcup _{(\pi _i, \phi _i) \in \theta } \varGamma [\pi _i, f] \downarrow \phi _i \)

That is, \(\varGamma [\theta , f]\) includes the points-to target of every element in \(\theta \) under the appropriate constraints.
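
Operationally, Definitions 8–10 are map merges over guarded points-to sets. The sketch below represents a points-to set as a map from object names to guards and keeps the guards abstract; all names and types are illustrative rather than the implementation's.

```java
import java.util.*;
import java.util.function.BiFunction;

// Guarded points-to sets as maps from object names to guards (Defs. 8-10).
final class GuardedPts {
  interface Phi {}                                  // abstract constraint
  record Or(Phi l, Phi r) implements Phi {}
  record And(Phi l, Phi r) implements Phi {}

  // Definition 8: join; guards of shared objects are disjoined
  static Map<String, Phi> join(Map<String, Phi> t1, Map<String, Phi> t2) {
    Map<String, Phi> out = new HashMap<>(t1);
    t2.forEach((o, phi) -> out.merge(o, phi, Or::new));
    return out;
  }

  // Definition 9: projection; phi is conjoined onto every guard
  static Map<String, Phi> project(Map<String, Phi> theta, Phi phi) {
    Map<String, Phi> out = new HashMap<>();
    theta.forEach((o, g) -> out.put(o, new And(g, phi)));
    return out;
  }

  // Definition 10: field lookup lifted to a guarded set; lookupLoc stands for
  // the per-location lookup Gamma[pi, f] of Definition 7.
  static Map<String, Phi> lookup(Map<String, Phi> theta, String f,
                                 BiFunction<String, String, Map<String, Phi>> lookupLoc) {
    Map<String, Phi> out = new HashMap<>();
    for (Map.Entry<String, Phi> e : theta.entrySet())
      out = join(out, project(lookupLoc.apply(e.getKey(), f), e.getValue()));
    return out;
  }
}
```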

4.3 Intraprocedural Analysis

Figure 8 describes the intraprocedural analysis using judgements of the form \( \varUpsilon , \varGamma \vdash I: \varGamma ' \), which indicate that, if statement I is executed in an environment that satisfies summary environment \(\varUpsilon \) and abstract heap \(\varGamma \), we obtain a new heap \(\varGamma '\). Since the analysis is (partially) flow-sensitive, we distinguish between the heaps \(\varGamma \) and \(\varGamma '\) before and after executing I.

Fig. 8. Rules for intraprocedural analysis

Rule (1) in Fig. 8 describes the analysis of assignments. Although our analysis only performs weak updates to heap objects, it does apply strong updates to variables. Hence, rule (1) updates the points-to set for \((v_1, \epsilon )\) to be \(\varGamma [v_2, \epsilon ]\), where the lookup operation is defined in Sect. 4.2. Rule (2) for memory allocations \(v = \mathrm{new}^\rho \ T\) introduces a new abstract location named \(\mathrm{alloc}(T)@\rho \) and updates v’s points-to set to this singleton.

Rule (3) concerns loads of the form \(v_1 = v_2.f\). Here, we first look up the points-to set \(\theta \) of \(v_2\) and then use \(\varGamma [\theta , f]\) to retrieve the targets of memory locations in \(\theta \). Finally, since our analysis applies strong updates to variables, we override \(v_1\)’s existing targets and change its points-to set to \(\varGamma [\theta , f]\).

Rule (4) analyzes stores \(v_1.f = v_2\). First, we look up the points-to sets \(\theta _1\) and \(\theta _2\) of \(v_1\) and \(v_2\). Now, the store operation will update every location \(o_i\) such that \((o_i, \phi _i) \in \theta _1\). However, since we apply only weak updates to heap objects, we preserve the existing points-to targets \(\varGamma (o_i, f)\) for each \(o_i\). Furthermore, since \(v_1\) points to \(o_i\) under constraint \(\phi _i\), \(o_i\) points to elements in \(\theta _2\) only when \(\phi _i\) holds. Hence, the new points-to set for \(o_i\) is given by \((\theta _2 \downarrow \phi _i) \sqcup \varGamma (o_i, f)\), where the \(\downarrow \) operation is given by Definition 9. Since rules (5) and (6) for if statements and sequencing are fairly standard, we do not describe them in detail.
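
Rules (1)–(4) can be read as the transfer functions sketched below, over a heap that maps a location and field to a guarded points-to set. For readability the guards are plain booleans here rather than the symbolic constraints of Sect. 4.1, and all names are illustrative.

```java
import java.util.*;

// Transfer functions for rules (1)-(4) of Fig. 8. Gamma maps "loc|field" keys
// to points-to sets (object -> guard); guards are booleans for brevity.
final class Transfer {
  final Map<String, Map<String, Boolean>> gamma = new HashMap<>();

  Map<String, Boolean> get(String loc, String f) {
    return gamma.getOrDefault(loc + "|" + f, Map.of());
  }

  // Rule (1): v1 = v2 -- strong update on the variable v1
  void assign(String v1, String v2) {
    gamma.put(v1 + "|eps", new HashMap<>(get(v2, "eps")));
  }

  // Rule (2): v = new^rho T -- v points to the singleton alloc(T)@rho
  void alloc(String v, String T, String rho) {
    gamma.put(v + "|eps", new HashMap<>(Map.of("alloc(" + T + ")@" + rho, true)));
  }

  // Rule (3): v1 = v2.f -- dereference v2, then strongly update v1
  void load(String v1, String v2, String f) {
    Map<String, Boolean> targets = new HashMap<>();
    for (String o : get(v2, "eps").keySet())
      targets.putAll(get(o, f));
    gamma.put(v1 + "|eps", targets);
  }

  // Rule (4): v1.f = v2 -- weak update: existing targets of o.f are preserved
  void store(String v1, String f, String v2) {
    for (String o : get(v1, "eps").keySet())
      gamma.computeIfAbsent(o + "|" + f, k -> new HashMap<>())
           .putAll(get(v2, "eps"));
  }
}
```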

Fig. 9. Rules for instantiating memory locations

4.4 Interprocedural Analysis

We now describe the instantiation of summaries at call sites. Since a key part of summary instantiation is constructing the mapping from locations in the summary to those at the call site, we first start with the rules in Fig. 9 which describe the instantiation of memory locations. Informally, the rules of Fig. 9 construct the mapping \(\zeta ^{-1}\) from Sect. 3. More formally, they produce judgements of the form \( \mathcal {M}, \varGamma , \rho \vdash inst\_loc (\pi ): \theta \) where \(\mathcal {M}\) maps formals to actuals, and \(\varGamma \) and \(\rho \) are the abstract heap and program point associated with a call site respectively. The meaning of the judgement is that, under \(\mathcal {M}, \varGamma , \rho \), location \(\pi \) used in the summary maps to (guarded) location set \(\theta \).

The first rule of Fig. 9 maps the formal parameter \(a_i\) to the actual \(\mathcal {M}(a_i)\). The second rule instantiates argument-derived locations of the form \(\pi .f\). For this purpose, we first instantiate the prefix \(\pi \) to location set \(\theta \) and then retrieve the points-to targets of the f field of the locations in \(\theta \). The third rule instantiates access paths of the form \(\pi .(f_1 \ldots f_n)^\star \). As in the previous rule, we first instantiate the prefix \(\pi \), which yields \(\theta _0\). Now, recall that the access path \(\pi .(f_1 \ldots f_n)^\star \) describes the infinite set of access paths given by the regular expression \(\pi .(f_1 + \ldots + f_n)^*\). Hence, to instantiate \(\pi .(f_1 \ldots f_n)^\star \), we need to compute all locations that are reachable from \(\theta _0\) using any combination of the field selectors \(f_1, \ldots , f_n\). The resulting set \(\bigsqcup _{i\ge 0} \theta _i\) is the reflexive transitive closure of \(\theta _0\) with respect to fields \(f_1, \ldots , f_n\).

The last two rules in Fig. 9 describe the instantiation of allocations and local variables. Both rules use a helper \(\mathrm{new\_ctx}\) method defined as follows:

$$\begin{aligned} \small \mathrm{new\_ctx}(\rho , \mathbf {\rho }) = \left\{ \begin{array}{ll} \rho , \mathbf {\rho } & \mathrm{if} \ |\mathbf {\rho }| \le k \\ \mathbf {\rho } & \mathrm{otherwise} \end{array} \right. \end{aligned}$$

In other words, new_ctx prepends call site \(\rho \) to context \(\mathbf {\rho }\) if the length of \(\mathbf {\rho }\) does not exceed a pre-determined threshold k. Hence, our analysis uses a k-CFA-style context-sensitive heap abstraction where the value of k is configurable.
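
As a one-method illustration of new_ctx (with contexts represented as lists of program points; this representation is our own):

```java
import java.util.*;

final class Ctx {
  // new_ctx(rho, ctx): prepend the call site while |ctx| <= k, else keep ctx
  static List<String> newCtx(String rho, List<String> ctx, int k) {
    if (ctx.size() > k) return ctx;
    List<String> out = new ArrayList<>(ctx.size() + 1);
    out.add(rho);
    out.addAll(ctx);
    return out;
  }
}
```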

Fig. 10. Rules for instantiating constraints

We now turn to the instantiation of constraints, summarized in Fig. 10. To translate a constraint \(\mathrm{type}(t) = T\), we map t to its corresponding location set \(\theta \) using inst_loc. The function has_type\((\theta , T)\) then yields the condition under which some element in \(\theta \) has dynamic type T (recall Definition 5).

Fig. 11. Rules for instantiating summaries

Using these ingredients, Fig. 11 shows how to instantiate an abstract heap \(\varDelta \). Given location \(\pi _i\) and field \(f_j\) from the callee heap, inst_partial_heap instantiates all points-to edges from \(\pi _i\) labeled with \(f_j\) and yields instantiated partial heap \(\varDelta _{ij}\). The instantiation of \(\varDelta \) is obtained by taking the join over all \(\varDelta _{ij}\)’s.

Finally, Fig. 12 describes the analysis of method calls. First, consider a static call to m with corresponding summary \(\varDelta \) (rule (1)). To analyze it, we construct the formal-to-actual mapping \(\mathcal {M}\) and perform a least fixed-point computation that instantiates the summarized heap \(\varDelta \) until we obtain an overapproximation of the set \(\varDelta '\) of m’s side effects. The abstract heap after the method call is obtained by taking the union of the existing heap \(\varGamma \) and the new “edges” \(\varDelta '\).

The second rule of Fig. 12 describes the analysis of virtual calls. Here, we first overapproximate the call’s targets and then use the previous rule for analyzing static method calls to obtain heap \(\varGamma _i\) assuming the called method is \(T_i::m\). Now, since the target of the virtual call is \(T_i::m\) under the assumption that \(v_0\) has dynamic type \(T_i\), we generate the constraint \(\phi _i = \mathrm{has\_type}(\varGamma (v_0), T_i)\). Then, the final abstract heap after the call is obtained as \( \sqcup _i (\varGamma _i \downarrow \phi _i) \).
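
A compact sketch of this second rule is shown below, with guards rendered as strings purely for illustration (the actual analysis manipulates the symbolic constraints of Sect. 4.1 and discharges them with a solver):

```java
import java.util.*;

// Sketch of Fig. 12, rule (2): instantiate each candidate target Ti::m via the
// static-call rule, then guard its side effects with has_type(Gamma(v0), Ti).
final class VirtualCallSketch {
  record GuardedEdge(String src, String field, String dst, String guard) {}

  interface Target {
    String typeName();                       // the candidate class Ti
    Set<GuardedEdge> instantiate();          // rule (1) applied to Ti::m's summary
  }

  static Set<GuardedEdge> analyzeVirtualCall(String receiver, List<Target> targets) {
    Set<GuardedEdge> result = new HashSet<>();
    for (Target t : targets) {
      String phi = "has_type(pts(" + receiver + "), " + t.typeName() + ")";
      for (GuardedEdge e : t.instantiate())  // Gamma_i projected on phi_i
        result.add(new GuardedEdge(e.src(), e.field(), e.dst(),
                                   "(" + e.guard() + ") && " + phi));
    }
    return result;                           // join over all candidate targets
  }
}
```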

Fig. 12. Analysis of method calls

5 Implementation and Extensions

We implemented the proposed algorithm in a tool called Scuba (http://www.cs.utexas.edu/~yufeng/scuba.html) which is built on top of Chord [16]. Scuba performs analysis on the Quad representation of Joeq [25] and obtains an initial callgraph by running the context-insensitive pointer analysis implemented in Chord. It also uses the Z3 SMT solver [5] for checking satisfiability of constraints.

Our implementation performs several optimizations over the core algorithm described here. One optimization is memoizing instantiation results. For example, consider an access path \(a_0.\epsilon .f.g.h\) used in m’s summary. Since this access path may be instantiated many times when analyzing a call site of m, our analysis maintains a cache per callsite that records instantiation results. A second optimization concerns constraint generation for virtual method calls. Consider a call v.m(...) where the static type of v is \(T_0\). Further, suppose \(T_0\) has a large number of subclasses \(T_1, \ldots , T_n\) all of which inherit \(T_0\)’s m method except \(T_n\). Assuming v points to heap object o, we need to introduce constraints of the form \(\bigvee _{0 \le i < n} \mathrm{type}(o) = T_i\). Since such constraints can be very large, our implementation allows subtyping constraints and translates them to linear inequalities by assigning integer identifiers to types in reverse topological order.
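
One standard way to realize this translation, shown below as our own illustration (the tool's exact scheme is not given in the text), is an interval encoding: numbering classes in post-order (a reverse topological order of the single-inheritance hierarchy) makes each class's subtypes occupy a contiguous id range, so a large disjunction of type equalities collapses into two linear inequalities.

```java
import java.util.*;

// Interval encoding of subtyping: ids are assigned in post-order, so the
// subclasses of T occupy the contiguous range [lo(T), id(T)]. The disjunction
// "type(o) = T0 \/ ... \/ type(o) = T_{n-1}" then becomes two inequalities.
final class TypeIntervals {
  private final Map<String, List<String>> subclasses = new HashMap<>();
  private final Map<String, Integer> id = new HashMap<>(), lo = new HashMap<>();
  private int next = 0;

  void addSubclass(String sup, String sub) {
    subclasses.computeIfAbsent(sup, k -> new ArrayList<>()).add(sub);
  }

  void number(String t) {                      // call once on the hierarchy root
    int first = next;
    for (String c : subclasses.getOrDefault(t, List.of()))
      number(c);
    id.put(t, next++);
    lo.put(t, first);
  }

  boolean isSubtype(String s, String t) {      // lo(t) <= id(s) <= id(t)
    return lo.get(t) <= id.get(s) && id.get(s) <= id.get(t);
  }
}
```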

6 Evaluation

We evaluated Scuba on ten large Java applications from the DaCapo and Ashes benchmark suites [1, 2]. These applications range between 92,615 and 227,507 statements in the Quad IR and contain between 4634 and 9653 reachable methods. To evaluate our algorithm, we compared Scuba against the k-CFA and k-object-sensitive algorithms implemented in Chord [16]. All analyses are Andersen-style flow-insensitive pointer analyses that allow customizing the value of k. Chord also allows customizing the context-sensitivity associated with heap objects using a value h. For example, a 2-obj-1-h analysis uses the abstract allocation site of the receiver as a context up to depth 2, and it differentiates heap allocations with contexts up to depth 1.

Table 1. Analysis time in seconds. Runs exceeding the time limit of 3600 s are labeled T/O.

Before describing the results, we first explain how our algorithm relates to k-CFA and k-obj-sensitive analyses. Similar to k-CFA, Scuba uses call sites rather than receiver objects as contexts. However, unlike k-CFA, we do not impose a fixed value of k, since our algorithm instantiates method summaries separately for each call site. On the other hand, we can customize the context-sensitivity associated with heap objects by varying the parameter k used in the \(\mathrm{new\_ctx}\) function from Sect. 4.4. Hence, for a given value of k, Scuba is roughly comparable to an \(\infty \)-CFA-k-h analysis. In what follows, we write Scuba-k to refer to configurations of Scuba with different values of this parameter.

Table 2. May-alias results. Higher numbers indicate better precision.

Table 1 compares the running times of Scuba-k (for \(2\le k\le 4\)) against the context-insensitive (CIPA), 2-CFA, and 2-obj-sensitive analyses, using an h value of 1. While we also tried comparing Scuba against the k-CFA and k-obj-sensitive analyses for \(k = 3\) and \(k=4\), these analyses did not complete within an hour for most of the benchmarks; hence, we do not include these results in Table 1. We also note that the running times shown in Table 1 include the analysis time for libraries (e.g., JDK, Swing, Sun Security) as well as the application code (i.e., we did not use manually provided stub methods for analyzing libraries). As the results in Table 1 show, Scuba-k is significantly faster than the k-CFA and k-obj analyses.

To compare the precision of Scuba against the k-CFA and k-obj analyses, we used two typical pointer analysis clients, namely may-alias and downcast analyses. Table 2 compares the precision of the different analysis configurations in the context of the may-alias client. The column labeled “Alias pairs” shows the number of variable pairs that are queried by the may-alias client. To generate these pairs, we first ran a context-insensitive pointer analysis to identify potential may-alias pairs in the application code. From these, we further filtered out those pairs that are “obviously” aliases (e.g., due to a direct assignment). Columns 3–11 in Table 2 show the number of pairs proven not to be aliases by each analysis configuration. Hence, a higher number indicates better precision. Observe that every configuration of Scuba-k yields better precision on average than the 2-CFA and 2-obj analyses.

Table 3. Downcast results. Bigger numbers indicate higher precision.

Table 3 shows the precision of each analysis in the context of the downcast client. Here, the second column, labeled “# downcasts”, shows the total number of downcasts in the application, and the subsequent columns show the number of downcasts that can be proven safe. The numbers shown in Table 3 only include downcasts performed in the application code rather than in external libraries. According to the results shown in Table 3, Scuba-k has better precision on average than both 2-CFA and 2-obj.

7 Related Work

Top-Down Pointer Analysis. Most existing context-sensitive pointer analysis algorithms are top-down [7, 12, 14, 24, 26]. Generally speaking, top-down context-sensitivity comes in two flavors: call-site sensitivity [22] (k-CFA) and k-object-sensitivity [14]. Specifically, CFA-based algorithms use method call sites as the context, while object-sensitive approaches use the receiver’s abstract allocation site. The recent work described in [9] has proposed selectively combining k-CFA and k-object sensitivity to achieve better precision. Several papers have used BDD-based methods for top-down context-sensitive pointer analysis [11, 26, 31]. The use of BDDs exposes commonalities between different contexts and allows the technique to scale better. The Chord framework [16] used in our experimental evaluation also uses BDDs to exploit equivalences between calling contexts.

Bottom-Up Pointer Analysis. While not as widely studied as top-down algorithms, several papers propose bottom-up pointer analyses. However, many of these approaches are unification-based [10, 13, 18]. The algorithm proposed in [28] is also bottom-up but uses a combined equality- and subset-based approach. By contrast, our algorithm is purely subset-based and therefore more precise.

The work described in [4] presents a subset-based, partially bottom-up pointer analysis for C. Unlike our approach, it incorporates both top-down and bottom-up phases, where the top-down phase is used for precise handling of function pointers. Also unlike our approach, it tracks alias pairs as opposed to an explicit heap model and targets C rather than Java. Another subset-based pointer analysis for C that combines top-down and bottom-up phases is based on the observation that context-insensitivity does not result in a loss of precision if function side effects are accounted for [17]. In contrast to our approach, that technique handles SCCs in a context-insensitive way and employs a top-down phase that removes callee side effects. The algorithms described in [3, 6] perform bottom-up pointer analysis for C++ programs. Unlike the method presented in this paper, they perform case splits on possible aliasing patterns. Since the approach of [6] performs strong updates, it is more precise but less scalable than our technique.

Another related work is [27], which describes a compositional pointer and escape analysis for Java. While this analysis is flow-sensitive and applies strong updates, it assumes that parameters do not alias and generates a summary that is valid under this assumption. If this assumption is violated at a call site, the analysis corrects the summary through a complex mechanism that involves merging memory locations. Another difference is that [27] does not precisely handle virtual method calls.

Summarization. Many papers describe general frameworks for interprocedural analysis [19–21]. The techniques described in [29, 30] compute polymorphic summaries for dataflow problems, but both rely on global points-to sets.

8 Conclusion

We described a new bottom-up, summary-based pointer analysis for Java. Our experimental evaluation demonstrates that the algorithm runs significantly faster than top-down pointer analyses while achieving comparable precision. We believe that Scuba scales better because the cost of instantiating a method summary is small compared to the cost of re-analyzing the method.