1 Introduction

Static analyses based on abstractions of sets of states (or, for short, state analyses) compute an over-approximation of the states that a program may reach, so as to answer questions related, e.g., to safety (absence of errors or structural invariant violations). By contrast, one may also design static analyses that discover relations between program initial states and output states. In this paper, we refer to such static analyses as transformation analyses. A transformation relation between the initial state and the output state of a given execution can answer questions related to the functional correctness of a program (i.e., does it compute a correct result whenever it terminates without crashing). Another application of such a transformation relation is to let the analysis reuse the result obtained for a given code fragment (e.g., a procedure) multiple times, provided the analysis can compose transformation relations. The great advantage of this approach is to reduce the analysis cost by avoiding recalculating the effect of, e.g., a procedure in multiple calling contexts. This is known as the relational approach to interprocedural analysis [35].

However, a major difficulty is to find an accurate and lightweight representation of the input-output transformation relation of each procedure. A first solution is to resort to tables of abstract pre- and post-conditions that are all expressed in a given state abstract domain [2, 10, 15, 22]. However, this generally makes composition imprecise unless very large tables can be computed. A second solution is to build upon a relational abstract domain, namely, an abstract domain that can capture relations across distinct program variables. The transformation between states is then expressed using “primed” and “non-primed” variables where the former describe the input state and the latter the output state [24, 26, 27]. As an example, we consider a procedure that computes the absolute value of input \( x \) and stores it into \( y \) (for the sake of simplicity we assume mathematical integers):

  • Using the intervals abstract domain [9], we can provide the table \( [ ((x \in [-5, -1], y \in ]{-}\infty , +\infty [) \mapsto (x \in [-5, -1], y \in [1,5])) ; ((x \in [-1, 10], y \in ]{-}\infty , +\infty [) \mapsto (x \in [-1, 10], y \in [0,10])) ] \) (this table is obviously not unique);

  • Using the relational abstract domain of polyhedra [11], we can construct the transformation relation \( (x' = x \mathrel {\wedge }y' \ge x \mathrel {\wedge }y' \ge -x) \).

We note that, while the expressiveness of the two is not comparable, the latter option is more adapted to compositional reasoning. For instance, given pre-condition \( -10 \le x \le -5 \), the analysis based on a table either returns a very imprecise answer or requires enriching the table whereas the analysis with a relational domain can immediately derive \( x' = x \in [-10, -5] \) (\( x \) has not changed) and \( y' \ge 5 \).
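This compositional step can be replayed mechanically. The sketch below is a toy Python model (not the analysis implementation); it applies the relation \( x' = x \wedge y' \ge x \wedge y' \ge -x \) to an interval pre-condition on \( x \):

```python
# Apply the transformation relation of the absolute-value procedure
# (x' = x, y' >= x, y' >= -x) to an interval pre-condition on x.
def apply_abs_relation(x_lo, x_hi):
    xp = (x_lo, x_hi)            # x' = x: the interval is preserved
    # y' >= x and y' >= -x give y' >= max(x, -x); the best interval lower
    # bound is the minimum of max(x, -x) over x in [x_lo, x_hi].
    if x_hi <= 0:
        y_lo = -x_hi
    elif x_lo >= 0:
        y_lo = x_lo
    else:
        y_lo = 0                 # the interval straddles 0
    return xp, y_lo              # the relation gives no upper bound on y'

xp, y_lo = apply_abs_relation(-10, -5)
# x' stays in [-10, -5] and y' >= 5, as derived in the text
```

Note that, as in the text, the relation yields only a lower bound on \( y' \); the interval table, by contrast, can record an upper bound but does not compose.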

Such reasoning becomes more complex when considering non-numerical facts, such as memory shape properties. Many works rely on the tabulation approach, using a conventional shape state abstraction [2, 15, 22]. In general, the tabulation approach restricts the ability to precisely relate procedure input and output states and may require large tables of pairs of formulas for a good level of precision. The approach based on a relational domain with primed and non-primed variables has been implemented by [18, 19] in the TVLA shape analysis framework [33]. However, it is more difficult to extend shape analyses that are based on separation logic [28], since a separation logic formula describes a region of a given heap and thus does not naturally constrain fragments of two different states. To solve this issue, a first approach is to modify the semantics of separation logic connectors to pairs of states [34]. A more radical solution is to construct novel logical connectors over state transformation relations that are inspired by separation logic [17]. These transformations can describe the effect of a program and express facts such as “memory region \( A \) is left untouched whereas a single element is inserted into the list stored inside memory region \( B \) and the rest of that list is not modified”. The analysis of [17] is designed as a forward abstract interpretation which produces abstract transformation relations. Therefore, it can describe transformations precisely using separation logic predicates and without accumulating tables of input and output abstract states.

However, this analysis still lacks several important components to actually make interprocedural analysis modular. In particular, it lacks a composition algorithm over abstract transformation relations. Modular interprocedural analysis also needs to synchronize two distinct processes that respectively aim at computing procedure summaries and at instantiating them at a call-site. In this paper, we propose a top-down analysis based on shape summaries and make the following contributions:

  • in Sect. 2, we demonstrate the use of abstract shape transformations;

  • in Sect. 3 and Sect. 4, we formalize transformation summaries based on separation logic (intraprocedural analysis is presented in Sect. 5);

  • in Sect. 6, we build a composition algorithm over transformation summaries;

  • in Sect. 7, we formalize a modular interprocedural analysis;

  • in Sect. 8, we report on implementation and evaluation.

2 Overview

In this section, we study a restricted example to highlight some issues in interprocedural analysis. We consider a recursive implementation of C linked lists, with a couple of procedures shown in Fig. 1. The function \(\texttt {append}\) takes two lists as arguments and assumes the first one non-empty; it traverses the first list and mutates the pointer field of its last element. The function \(\texttt {double\_append}\) takes three lists as arguments (the first one is assumed non-empty) and concatenates all three by calling \(\texttt {append}\). The topic of our discussion is only the invariants underlying this code and their discovery, not the efficiency of the code itself.

Fig. 1. Simple and double list append procedures.
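The C code of Fig. 1 is not reproduced in this extraction. As a stand-in, the following Python transcription models the behavior described above; the field names and, in particular, which lists \(\texttt {double\_append}\) passes to each call of \(\texttt {append}\) are assumptions:

```python
class Node:
    """Singly-linked list cell; field 'n' plays the role of the paper's
    next field (names are assumptions, the original C is not shown here)."""
    def __init__(self, val, nxt=None):
        self.val, self.n = val, nxt

def append(l0, l1):
    """Assumes l0 non-empty: traverse l0 to its last cell and mutate its
    next pointer so that l1 is attached at the end."""
    c = l0
    while c.n is not None:
        c = c.n
    c.n = l1

def double_append(k0, k1, k2):
    """Assumes k0 non-empty: concatenate the three lists via two calls to
    append (which lists are passed to each call is our assumption)."""
    append(k0, k1)
    append(k0, k2)

def to_list(l):
    out = []
    while l is not None:
        out.append(l.val)
        l = l.n
    return out
```

The mutation of the last element's next pointer in `append` is exactly the single cell update that the transformation summaries of Sect. 2 isolate.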

State Abstraction and Analysis. We consider an abstraction based on separation logic [28], as shown, e.g., in [6, 13]. To describe sets of states, we assume a predicate \( \mathbf {lseg}( \alpha _0, \alpha _1 ) \) that represents heap regions that consist of a list segment starting at some address represented by the symbolic variable \( \alpha _0 \) and such that the last element points to some address represented by \( \alpha _1 \). Such a segment may be empty. For short, we write \( \mathbf {list}( \alpha ) \) for the instance \( \mathbf {lseg}( \alpha , \mathrm {\mathbf {0x0}}) \), which denotes a complete singly-linked list (as the last element contains a null next link). A single list element is written \( \alpha _0 \cdot \underline{\mathtt {n}}{\mathop {\mapsto }\limits ^{}}\alpha _1 \), where \( \underline{\mathtt {n}}\) denotes the next field offset (we elide other fields). More generally, \( \alpha _0 {\mathop {\mapsto }\limits ^{}}\alpha _1 \) denotes a single memory cell of address \( \alpha _0 \) and content \( \alpha _1 \). Thus \( \mathtt { \& }\texttt {x} {\mathop {\mapsto }\limits ^{}}\alpha \) expresses that variable \( \texttt {x} \) stores a value \( \alpha \) (which may be either a base value or the address of a memory cell). Predicates \( \mathbf {lseg}\) and \( \mathbf {list}\) are summary predicates as they describe unbounded memory regions; their denotation is naturally defined by induction. As usual, separating conjunction conjoins predicates over disjoint heap regions.

Assuming the abstract precondition , existing state shape analyses [6, 13] can derive the post-condition by a standard forward abstract interpretation [9] of the body of \(\texttt {append}\). The analysis proceeds by abstract interpretation of basic statements, unfolds summaries when analyzing reads or writes into the regions that they represent, and folds basic predicates into summaries at loop heads. The convergence of the loop analysis takes a few iterations and involves widening, which is often a rather costly operation.

The analysis of \(\texttt {double\_append}\) by inlining follows similar steps. One important note, though, is that it analyzes the body of \(\texttt {append}\) twice, namely once per calling context, and thus also the loop inside \(\texttt {append}\). In turn, if \(\texttt {double\_append}\) is called in several contexts in a larger program, or inside a loop, its body will also be analyzed multiple times, which further increases the overall analysis cost.

Transformation Analysis. Unlike state analyses, transformation analyses compute abstractions of the input-output relation of a program fragment. As an example, given the above abstract pre-condition, the analysis of [17] infers relations as shown in Fig. 2(a).

Fig. 2. A shape transformation procedure summary and composition.

This graphical view states that the procedure \(\texttt {append}\) keeps both summary predicates unchanged and only modifies the middle list element so as to append the two lists. This transformation is formally represented using three basic kinds of relational predicates, which respectively state that one region is preserved, that one region is “transformed” into another one, and that two transformations are conjoined at the transformation level. To stress the difference, we write for the latter, instead of just . Although it uses transformation predicates, the analysis follows abstract interpretation steps similar to those of [6, 13].

Towards Modular Analysis: Composition of Transformation Abstractions. The first advantage of transformation predicates such as Fig. 2(a) is that they can be applied to state predicates in the abstract, as a function would be in the concrete. Indeed, if we apply this abstract transformation to the abstract pre-condition given above, we can match each component of the abstract transformation of Fig. 2(a), and derive its post-condition. In this example, the body of each summary is left unchanged, whereas the last element of the first list is updated. The result is shown in Fig. 2(b) and it can be derived without reanalyzing the body of \(\texttt {append}\).

While these steps produce a state analysis of the body of \(\texttt {double\_append}\), we may want to summarize the effect of this function too, as it may be called elsewhere in a larger program. To achieve this, we need not only to apply an abstract transformation to an abstract state, but also to compose two abstract transformations together. Intuitively, this composition should proceed in a similar way to the application sketched above. In the case of \(\texttt {double\_append}\), the analysis requires case splits depending on whether \( \mathtt {k1} \) is empty or not. For brevity, we show only the case where this list is non-empty in Fig. 2(c). In general, the composition of two transformations requires matching the post-condition of the first with the pre-condition of the second, and refining them using some kind of intersection, as shown in Sect. 6. Another important issue is the summary computation process. Bottom-up analyses strive for summaries that are general whatever the calling context. However, a procedure may behave differently when applied to other structures (e.g., a binary tree or a doubly linked list), thus the top-down approach, which provides information about the calling contexts before they are analyzed, seems more natural. However, this means that the analysis should infer summaries and apply them simultaneously, and that the discovery of new calling contexts may call for more general summaries. We describe this in Sect. 7.

3 Abstraction of Sets of States and State Transformations

In the following sections, we restrict ourselves to a small imperative language that is sufficient to study procedure transformation summaries, although our implementation described in Sect. 8 supports a larger fragment of C. We also only consider basic singly linked lists in the formalization although the implementation supports a large range of list or tree-like inductive data-structures.

Concrete States, Programs, and Semantics. We write \( \mathbb {X}\) for the set of program variables and \( \mathbb {V}\) for the set of values, which include memory addresses. The address of a variable \( \texttt {x} \) is assumed fixed and noted \( \mathtt { \& }\texttt {x} \). Structure fields are viewed both as names and as offsets; they include the null offset (basic pointer) and \( \underline{\mathtt {n}}\), the “next” field of a list element. We let a memory state \( \sigma \in \mathbb {M}\) be a partial function from variable addresses and heap addresses to values. We write \( \mathbf {dom}( \sigma ) \) for the domain of \( \sigma \), that is, the set of elements for which it is defined. Additionally, if \( \sigma _0, \sigma _1 \) are such that \( \mathbf {dom}( \sigma _0 ) \mathrel {\cap }\mathbf {dom}( \sigma _1 ) = \emptyset \), we let be the memory state obtained by appending \( \sigma _1 \) to \( \sigma _0 \) (its domain is \( \mathbf {dom}( \sigma _0 ) \mathrel {\cup }\mathbf {dom}( \sigma _1 ) \)). If \( a\) is an address and \( v\) a value, we write \( [a{\mathop {\mapsto }\limits ^{}}v] \) for the memory state \( \sigma \) such that \( \mathbf {dom}( \sigma ) = \{ a\} \) and \( \sigma ( a) = v\). A command \( \mathtt {C}\) is either an assignment, a local variable declaration, or a loop (we omit tests and memory allocation as our procedure summary analysis handles them in a very standard way). A procedure \( \mathtt {P}\) is defined by a list of arguments and a body (we let function returns be encoded via parameter passing by reference). A program is made of a series of procedures including a \( \texttt {main} \). All variables are local. The syntax of programs is defined by the grammar below:

We let the semantics of a command \( \mathtt {C}\) be a function \( \llbracket \mathtt {C} \rrbracket _{\mathcal {T}}: \mathcal {P}(\mathbb {M}) \rightarrow \mathcal {P}(\mathbb {M}) \) that maps a set of input states into a set of output states. We do not detail the definition of this semantics as it is standard. In the following, we set up abstractions that respectively describe sets of states and relations over states.
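To make the collecting semantics concrete, the following toy Python model represents a memory state \( \sigma \) as a finite set of (address, value) bindings and lifts a single assignment to sets of states (an illustration, not the paper's formalization; the command and addresses are hypothetical):

```python
# Toy model: a memory state is a frozenset of (address, value) bindings,
# and the semantics of a command maps sets of states to sets of states.
def sem_assign_const(addr, value):
    """Semantics of the command 'addr := value' lifted to P(M) -> P(M),
    performing a strong update of the single cell at addr."""
    def run(states):
        out = set()
        for sigma in states:
            m = dict(sigma)          # the partial map for this state
            m[addr] = value          # update the single cell at addr
            out.add(frozenset(m.items()))
        return out
    return run

# Two input states that differ on &x; the command only writes &y.
pre = {frozenset({("&x", 1), ("&y", 0)}),
       frozenset({("&x", 2), ("&y", 0)})}
post = sem_assign_const("&y", 7)(pre)
# in every output state &y is 7, while the binding of &x is preserved
```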

Fig. 3. Syntax and concretization of the abstract states.

Abstract States and Transformations. The syntax of abstract heaps \( {\mathrm {h}}^{\sharp }\in \mathbb {H}\) is shown in Fig. 3(a). We let \( \mathbb {A}= \{ \alpha , \alpha ', \ldots \} \) denote a set of symbolic variables that abstract heap addresses and values. A symbolic name \( \mathrm {n}\in \mathbb {N}\) is either a variable address \( \mathtt { \& }\texttt {x} \) or a symbolic value \( \alpha \). Numerical constraints \( {\mathrm {c}}^{\sharp }\) describe predicates over symbolic names. An abstract heap (or state) \( {\mathrm {h}}^{\sharp }\in \mathbb {H}\) is the (possibly empty) separating conjunction of region predicates that abstract separate regions [28], which consist either of an empty region \( \mathbf{emp}\), or of a basic memory block (described by a points-to predicate \( \mathrm {n}\cdot \underline{\mathtt {f}}{\mathop {\mapsto }\limits ^{}}\mathrm {n}' \)), or of inductive summaries, and may include some numerical constraints (that do not represent any memory region and only constrain symbolic names). We note for separating conjunction over states. The abstract states defined in Fig. 3(a) are of a level of expressiveness comparable to that of the abstractions used in common shape analysis tools such as [2, 6, 13, 14] to verify properties such as the absence of memory errors or the preservation of structural invariants.

The concretization of abstract states is shown in Fig. 3(b). It uses valuations to tie abstract names to the values they denote. A valuation is a function \( \nu : \mathbb {N}\rightarrow \mathbb {V}\). We assume that the concretization \( \gamma _{\mathbb {C}}( {\mathrm {c}}^{\sharp }) \) of a numeric constraint \( {\mathrm {c}}^{\sharp }\) returns the set of valuations that meet this constraint. Abstract heaps are concretized into sets of pairs made of a heap and of the valuation that realizes this heap. The concretization of summary predicates \( \mathbf {list}\) and \( \mathbf {lseg}\) is defined recursively, by unfolding. Indeed, we let \( \gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_0 ) = \bigcup \{ \gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_1 ) \mid {\mathrm {h}}^{\sharp }_0 \rightarrow _{\mathfrak {U}}{\mathrm {h}}^{\sharp }_1 \} \), where \( \rightarrow _{\mathfrak {U}}\) is defined by (cases for \( \mathbf {list}\) are similar):
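The unfolding relation \( \rightarrow _{\mathfrak {U}}\) for \( \mathbf {lseg}\) can be sketched as term rewriting. The encoding below (tuples of predicate terms, with hypothetical constructor names) produces the empty-segment case and the non-empty case, which introduces a fresh symbolic variable:

```python
# Predicate terms (hypothetical encoding): ("lseg", a0, a1) and
# ("list", a0) are summaries, ("pt", a0, "n", a1) is a0.n |-> a1,
# and ("eq", a0, a1) is the numeric constraint a0 = a1.
fresh = iter(f"a{i}" for i in range(2, 1000))   # fresh symbolic names

def unfold_lseg(a0, a1):
    """The two cases of lseg(a0, a1): the empty segment (emp with a0 = a1),
    or a head cell a0.n |-> a' separately conjoined with lseg(a', a1)."""
    ap = next(fresh)
    empty = (("eq", a0, a1),)
    nonempty = (("pt", a0, "n", ap), ("lseg", ap, a1))
    return [empty, nonempty]

cases = unfold_lseg("a0", "a1")
# cases[0] is the empty segment; cases[1] is a0.n |-> a2 * lseg(a2, a1)
```

Concretization then takes the union over both cases, which is exactly the definition of \( \gamma _{\mathbb {H}}\) above.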

Example 1

(Abstract state). The abstract state in Fig. 2(b) is written as follows:

Assuming both \( \mathbf {lseg}( \alpha _0, \alpha _1 ) \) and \( \mathbf {list}( \alpha _2 ) \) unfold to structures of length one, it concretizes in the same way as:

Abstract Transformations. Abstract transformations are defined on top of abstract states and rely on specific logical connectors. Their syntax is defined in Fig. 4(a). A heap transformation is either the identity \( \mathtt {Id}( {\mathrm {h}}^{\sharp }) \), which denotes physical equality over pairs of states that are both described by \( {\mathrm {h}}^{\sharp }\), a state transformation \( [{\mathrm {h}}^{\sharp }_{\mathrm {i}} \dasharrow {\mathrm {h}}^{\sharp }_{\mathrm {o}}] \) which captures input/output pairs of states respectively defined by \( {\mathrm {h}}^{\sharp }_{\mathrm {i}} \) and by \( {\mathrm {h}}^{\sharp }_{\mathrm {o}} \), or a separating conjunction of transformations (we write to stress the distinction with the state separating conjunction ).

Fig. 4. Syntax and concretization of abstract transformations.

The concretization of transformations is shown in Fig. 4(b). It is built upon the previously defined \( \gamma _{\mathbb {H}}\) and also uses valuations. The most interesting case is that of : this connector imposes disjointness not only of the sub-heaps in both the pre- and post-state, but also across them. In this paper, we study only a basic form of the transformation predicate, although it may be strengthened with additional constraints [16], e.g., to assert that the footprint has not changed or that only specific fields may have been modified. We leave out such constraints as their goal is orthogonal to the focus of this paper. Finally, the analysis uses finite disjunctions of transformations.

Example 2

(Abstract transformation). The transformation informally described in Fig. 2(a) is captured by the abstract transformation below:

In the following, we write \( {\mathrm {h}}^{\sharp }_0 \rightarrow _{\mathfrak {U}}{\mathrm {h}}^{\sharp }_1 \) when \( {\mathrm {h}}^{\sharp }_0 \) may be rewritten into \( {\mathrm {h}}^{\sharp }_1 \) by applying \( \rightarrow _{\mathfrak {U}}\) to any of its sub-terms. We use this notation for both heaps and transformations. Last, we let \( \rightarrow _{\mathfrak {U}[\alpha ]} \) denote unfolding of a \( \mathbf {list}( \alpha ) \) or \( \mathbf {lseg}( \alpha , \ldots ) \) predicate.

4 Procedure Summarization

The semantics of a procedure boils down to a relation between its input states and its output states, thus our first attempt at summaries over-approximates the input-output relation of the procedure. To express this, we introduce the following notation. If \( f: \mathcal {P}(A) \rightarrow \mathcal {P}(B) \) is a function and \( R \subseteq A \times B \) is a relation, we note \( f \Subset R \) if and only if \( \forall X \subseteq A, \; X \times f( X ) \subseteq R \).
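On a finite set \( A \), the relation \( \Subset \) can be checked exhaustively. The toy checker below (illustrative code, not from the paper) makes explicit that \( X \times f( X ) \) pairs every input of \( X \) with every output of \( f( X ) \), so an exact input-output relation may fail the test while a coarser rectangular one passes:

```python
from itertools import chain, combinations

def subsets(A):
    xs = list(A)
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

def refines(f, A, R):
    """f ⋐ R  iff  for every X ⊆ A, X × f(X) ⊆ R (A assumed finite)."""
    return all((x, y) in R
               for X in subsets(A)
               for x in X
               for y in f(set(X)))

A = {0, 1, 2}
f = lambda X: {x + 1 for x in X}                    # a simple transfer function
R_coarse = {(x, y) for x in A for y in (1, 2, 3)}   # rectangular relation
R_exact = {(x, x + 1) for x in A}                   # graph of the successor
print(refines(f, A, R_coarse))  # True
print(refines(f, A, R_exact))   # False: (0, 2) ∈ X × f(X) for X = {0, 1}
```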

Definition 1

(Global transformation summary). A sound global transformation summary (or, for short, global summary) of procedure \( \mathbf {proc}\; \texttt {f} ( \ldots )\{ \mathtt {C}\} \) is an abstract transformation \( {\mathrm {t}}^{\sharp }\) that over-approximates \( \llbracket \mathtt {C} \rrbracket _{\mathcal {T}} \) in the sense that:

$$ \llbracket \mathtt {C} \rrbracket _{\mathcal {T}} \Subset \{ (\sigma _{\mathrm {i}}, \sigma _{\mathrm {o}}) \mid \exists \nu , \; (\sigma _{\mathrm {i}}, \sigma _{\mathrm {o}}, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }) \} $$

For example, function \(\texttt {append}\) (Fig. 1) can be described using a global procedure summary (noted \( {\mathrm {t}}^{\sharp }\) in Example 2). While this notion of summary may account for the effect of a procedure, it is not adequate to describe intermediate analysis results. As an example, a procedure \( \texttt {f} \) is likely to be called in multiple contexts. In that case, when the analysis reaches a first context, it computes a summary \( {\mathrm {t}}^{\sharp }\) that accounts for the effect of the procedure in that context, for a given set of procedure input states. When it reaches a second context, it should be able to decide whether \( {\mathrm {t}}^{\sharp }\) also covers the states that reach the procedure in that second context. Observe that the pre-state of \( {\mathrm {t}}^{\sharp }\) does not suffice since \( {\mathrm {t}}^{\sharp }\) may have been computed for some very specific context. Moreover, the left projection of \( {\mathrm {t}}^{\sharp }\) may not account for some call states encountered so far when these lead to non-termination or to an error in the body of \( \texttt {f} \). To overcome this issue, an over-approximation of the procedure input states observed so far should be adjoined to the global summary:

Definition 2

(Context transformation summary). A sound context transformation summary (or, for short, context summary) of procedure \( \mathbf {proc}\; \texttt {f} ( \ldots )\{ \mathtt {C}\} \) is a pair \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) such that the following holds:

$$ (\lambda (M \subseteq \{ \sigma \mid \exists \nu , (\sigma , \nu ) \in \gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_{\texttt {f}} ) \}) \cdot \llbracket \mathtt {C} \rrbracket _{\mathcal {T}}( M )) \Subset \{ (\sigma _{\mathrm {i}}, \sigma _{\mathrm {o}}) \mid \exists \nu , (\sigma _{\mathrm {i}}, \sigma _{\mathrm {o}}, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_{\texttt {f}} ) \} $$

Intuitively, Definition 2 asserts that \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) captures all the functions such that their restriction to states in \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) can be over-approximated by relation \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \). Although we do not follow this approach here, the \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) component may be used in order to augment summaries with context sensitivity. We note that \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) accounts for all states found to enter the body of \( \texttt {f} \) so far, even though they may lead to no output state in \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \) (e.g., if the evaluation of the body of \( \texttt {f} \) from them does not terminate or fails due to an error, as shown in Example 4).

Example 3

(Context summary). We let and assume that \( {\mathrm {t}}^{\sharp }\) is defined as in Example 2. Then, \( ({\mathrm {h}}^{\sharp }, {\mathrm {t}}^{\sharp }) \) defines a valid context summary for \(\texttt {append}\) (Fig. 1).

Example 4

(Role of the pre-condition approximation in context summaries). We consider the function below and assume it is always called in a state where \(\texttt {l0}\) is a valid pointer and \(\texttt {l1}\) points to a well-formed, but possibly empty, singly-linked list:

(the code of this function is not reproduced here)

Obviously, this function will crash when the list is empty, i.e., when \(\texttt {l0}\) is the null pointer. However, the pair \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) below defines a valid transformation summary for this procedure:

We observe that the \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) component describes not only states for which the procedure returns but also states for which it crashes because the input list is empty (i.e., the corresponding pointer is null). The latter are not part of the concretization of the transformation component \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \).

The above example shows the importance of the first component of the transformation summary: it records that a set of states has been considered by the analysis as a pre-condition for a program fragment, even when the program fragment produces no post-condition for these states, in which case they can be omitted from the transformation part.

5 Intraprocedural Analysis

The analysis performs a forward abstract interpretation [9] over abstract transformations (rather than on abstract states). More precisely, the abstract semantics \( {\llbracket \mathtt {C} \rrbracket }^{\sharp }_{\mathbb {T}} \) of a command \( \mathtt {C}\) inputs a transformation describing the entire computation so far, before executing \( \mathtt {C}\), and outputs a new transformation that reflects the effect of \( \mathtt {C}\) on top of that computation. Intuitively, the input of \( {\llbracket \mathtt {C} \rrbracket }^{\sharp }_{\mathbb {T}} \) may be viewed as the dual of a continuation. Formally, the analysis is designed so as to meet the following soundness statement, for any transformation \( {\mathrm {t}}^{\sharp }\):

$$\begin{aligned} \forall (\sigma _0, \sigma _1, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }), \sigma _2 \in \mathbb {M}, \; (\sigma _1, \sigma _2) \in \llbracket \mathtt {C} \rrbracket _{\mathcal {T}} \Longrightarrow (\sigma _0, \sigma _2, \nu ) \in \gamma _{\mathbb {T}}( {\llbracket \mathtt {C} \rrbracket }^{\sharp }_{\mathbb {T}}( {\mathrm {t}}^{\sharp }) ) \end{aligned}$$
(1)

The analysis of assignments and loops follows [17]. It may require finite disjunctions of transformations, although we do not formalize this aspect since it is orthogonal to the goal of this paper. We recall the main aspects of these algorithms in this section and refer the reader to [17] for a full description.

Post-conditions for Assignment. We consider an assignment command \( \texttt {x} \mathbin {\texttt {->}}\underline{\mathtt {n}}= \texttt {y} \) (the analysis of other kinds of commands is similar), and a pre-transformation \( {\mathrm {t}}^{\sharp }\), and we discuss the computation of \( {\llbracket \texttt {x} \mathbin {\texttt {->}}\underline{\mathtt {n}}= \texttt {y} \rrbracket }^{\sharp }_{\mathbb {T}}( {\mathrm {t}}^{\sharp }) \). To do this, the analysis should first localize both \( \texttt {x} \mathbin {\texttt {->}}\underline{\mathtt {n}}\) and \( \texttt {y} \) in the post-state of \( {\mathrm {t}}^{\sharp }\), by rewriting \( {\mathrm {t}}^{\sharp }\) into an expression of the form or , and searching for \( \alpha _0 \) in \( {\mathrm {t}}^{\sharp }_0 \). Two main cases may arise:

  • if \( {\mathrm {t}}^{\sharp }_0 \) contains a term of the form \( \mathtt {Id}( \alpha _0 \cdot \underline{\mathtt {n}}{\mathop {\mapsto }\limits ^{}}\alpha _2 ) \) or \( [(\ldots ) \dasharrow (\alpha _0 \cdot \underline{\mathtt {n}}{\mathop {\mapsto }\limits ^{}}\alpha _2)] \), the post-transformation is derived by a mutation over the pointer cell, which produces a term of the form \( [(\ldots ) \dasharrow (\alpha _0 \cdot \underline{\mathtt {n}}{\mathop {\mapsto }\limits ^{}}\alpha _1)] \);

  • if \( {\mathrm {t}}^{\sharp }_0 \) contains a term \( \mathtt {Id}( {\mathrm {h}}^{\sharp }_0 ) \) or \( [(\ldots ) \dasharrow {\mathrm {h}}^{\sharp }_0] \) where \( {\mathrm {h}}^{\sharp }_0 \) is either \( \mathbf {list}( \alpha _0 ) \) or \( \mathbf {lseg}( \alpha _0, \ldots ) \), the summary should be unfolded so that the modified cell can be resolved as in the previous case; this step relies on relation \( \rightarrow _{\mathfrak {U}}\) (Sect. 3).

It is also possible that the localization of \( \texttt {x} \mathbin {\texttt {->}}\underline{\mathtt {n}}\) fails, which typically indicates that the program being analyzed may dereference an invalid pointer. Besides assignment, the analysis also supports other operations; for instance, we write \( {\mathbf {newV}_{\mathbb {T}}}^{\sharp }[\texttt {x} _0, \ldots , \texttt {x} _n] \) (resp., \( {\mathbf {delV}_{\mathbb {T}}}^{\sharp }[\texttt {x} _0, \ldots , \texttt {x} _n] \)) for the operation that adds (resp., removes) variables \( \texttt {x} _0, \ldots , \texttt {x} _n \) to (resp., from) the output state of the current transformation. These operations over-approximate concrete operations noted \( \mathbf {newV}\) and \( \mathbf {delV}\).
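The first localization case above can be sketched over a term representation of transformations. Both the representation and the names below are assumptions, not the paper's code; the sketch only covers the case where the written cell is already exposed as a points-to predicate (the unfolding case would first rewrite the summary):

```python
# A transformation is a list of terms: ("Id", h) for Id(h), or
# ("To", h_in, h_out) for [h_in --> h_out]; a heap h is a tuple of
# points-to facts ("pt", source, field, destination).
def assign_field(t, a0, fld, a1):
    """Transfer function for 'x->fld = y' once x evaluates to address a0
    and y to value a1: localize the cell a0.fld in the post-state of each
    term and mutate it, turning a preserved region Id(h) into [h --> h']."""
    def mutate(h):
        return tuple(("pt", a0, fld, a1) if p[:3] == ("pt", a0, fld) else p
                     for p in h)
    out = []
    for term in t:
        post = term[1] if term[0] == "Id" else term[2]
        if any(p[:3] == ("pt", a0, fld) for p in post):
            pre = term[1]     # for Id(h), the pre-state is h itself
            out.append(("To", pre, mutate(post)))
        else:
            out.append(term)  # the cell is not in this region: unchanged
    return out

t = [("Id", (("pt", "a0", "n", "a2"),)),      # Id(a0.n |-> a2)
     ("Id", (("pt", "a3", "n", "a4"),))]      # Id(a3.n |-> a4)
t2 = assign_field(t, "a0", "n", "a1")         # analyze  x->n = y
```

Note how locality pays off: only the term containing the written cell changes, the sibling region keeps its \( \mathtt {Id}\) predicate.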

Weakening. The analysis of loop statements proceeds by abstract iteration over the loop body with widening. Intuitively, the widening \( {\mathrm {t}}^{\sharp }_0 \mathrel {\triangledown _{\mathbb {T}}}{\mathrm {t}}^{\sharp }_1 \) of transformations \( {\mathrm {t}}^{\sharp }_0, {\mathrm {t}}^{\sharp }_1 \) returns a new transformation \( {\mathrm {t}}^{\sharp }\) such that \( \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_i ) \subseteq \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }) \) for all \( i \in \{ 0, 1 \} \). At the state level, widening replaces basic blocks with summaries (effectively inverting \( \rightarrow _{\mathfrak {U}}\)). At the transformation level, widening commutes with \( \mathtt {Id}\) and whenever their arguments can be widened as above, and weakens them into \( [ \dasharrow ] \) transformations otherwise. Furthermore, this widening introduces summary predicates so as to ensure termination of all widening sequences [17]. Similarly, \( {\mathrm {t}}^{\sharp }_0 \sqsubseteq _{\mathbb {T}}{\mathrm {t}}^{\sharp }_1 \) conservatively decides inclusion (if \( {\mathrm {t}}^{\sharp }_0 \sqsubseteq _{\mathbb {T}}{\mathrm {t}}^{\sharp }_1 \) holds, then \( \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_0 ) \subseteq \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_1 ) \)).
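The transformation-level rule (widening commutes with \( \mathtt {Id}\) on matching terms and otherwise weakens to a \( [ \dasharrow ] \) predicate) can be sketched as follows. The term representation and the stand-in state widening are assumptions; the real analysis widens heaps by folding into summary predicates:

```python
def widen_terms(t0, t1, widen_state):
    """Widening over transformations given as aligned lists of terms
    ("Id", h) or ("To", h_in, h_out): matching Id terms are widened at the
    state level; any other pair is weakened into a [pre --> post] term."""
    out = []
    for u, v in zip(t0, t1):
        if u[0] == "Id" and v[0] == "Id":
            out.append(("Id", widen_state(u[1], v[1])))
        else:
            pre_u, post_u = (u[1], u[1]) if u[0] == "Id" else (u[1], u[2])
            pre_v, post_v = (v[1], v[1]) if v[0] == "Id" else (v[1], v[2])
            out.append(("To", widen_state(pre_u, pre_v),
                              widen_state(post_u, post_v)))
    return out

# Stand-in for the state-level widening: keep equal heaps, otherwise fold
# the pair into a (hypothetical) summary predicate.
widen_state = lambda h0, h1: h0 if h0 == h1 else ("summary", h0, h1)
```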

Example 5

(Analysis of the loop of \(\texttt {append}\) ). In this example, we consider the loop at line 4 in the \(\texttt {append}\) function (Fig. 1) and only present the part of the memory reachable from \(\texttt {c}\). The analysis of the loop starts with the transformation . The analysis of the assignment inside the loop body forces the unfolding of \( \mathbf {list}\), and produces . The widening of these two transformations produces , which also turns out to be the loop invariant.

6 Abstract Composition

In this section, we set up the abstract operations that are required to rely on transformations for modular analysis. In Sect. 2, we mentioned application and composition. We remark that the application of a transformation \( {\mathrm {t}}^{\sharp }\) to an abstract state \( {\mathrm {h}}^{\sharp }\) boils down to the abstract composition of \( {\mathrm {t}}^{\sharp }\) with \( \mathtt {Id}( {\mathrm {h}}^{\sharp }) \), since the latter represents exactly the set of pairs \( (\sigma , \sigma ) \) where \( \sigma \) is described by \( {\mathrm {h}}^{\sharp }\). Moreover, we observed in Sect. 2 that composition requires to reason over intersection of abstract states. Thus, we only define intersection and composition in this section.

Abstract Intersection. In this paragraph, we set up an abstract operator \( {\mathbf {inter}}^{\sharp }\), which inputs two abstract states and returns a disjunctive abstract state that over-approximates their intersection. The computation over abstract heaps is guided by a set of rewriting rules that are shown in Fig. 5. The predicate \( {\mathrm {h}}^{\sharp }_0 \mathrel {\sqcap }{\mathrm {h}}^{\sharp }_1 \leadsto _{\sqcap }H \) means that the computation of the intersection of \( {\mathrm {h}}^{\sharp }_0 \) and \( {\mathrm {h}}^{\sharp }_1 \) may produce the disjunction of abstract heaps \( H \) (there may exist several solutions for a given pair of arguments). We remark that the definition of \( \gamma _{\mathbb {H}}\) lets symbolic variables be existentially quantified, thus they may be renamed without changing the concretization, following usual \( \alpha \)-equivalence. Therefore, the rules of Fig. 5 assume that both arguments follow a consistent naming, although the implementation should perform \( \alpha \)-equivalence whenever needed and maintain a correspondence of symbolic variables [5].

Fig. 5.

Abstract intersection rewriting rules.

Rule states that intersection can be computed locally. Rule \( \sqcap _{=}\) expresses that intersection behaves like identity when both of its arguments are the same basic term. Rules \( \sqcap [l,s]\) and \( \sqcap [s,s]\) implement structural reasoning over summaries. Finally, rule \( \sqcap _{u}\) unfolds one argument so as to consider all subsequent cases. The result may differ depending on the order in which the rules are applied, or even on the way each rule is applied. As an example, may produce different results depending on the way both arguments are split into \( {\mathrm {h}}^{\sharp }_{i, 0} \) and \( {\mathrm {h}}^{\sharp }_{i, 1} \), which may affect precision. Therefore, our implementation follows a carefully designed application strategy that attempts to maximize the use of \( \sqcap _{=}\). We omit the intersection of numerical predicates (handled by the intersection operator of the numerical domain). Given two abstract heaps \( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1 \), the computation of \( {\mathbf {inter}}^{\sharp }( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1 ) \) proceeds by proof search following the rules of Fig. 5 up to commutativity (standard rule, not shown). If this system fails to infer a solution, returning either argument provides a sound result.
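To give the flavor of the fallback behavior described above, the following sketch implements a sound intersection on a hypothetical points-to domain (a map from variables to sets of possible targets, far coarser than the paper's abstract heaps with summaries): a local rule intersects the bindings pointwise, and when no rule applies one argument is returned, which is sound since the result only needs to over-approximate the concrete intersection.

```python
def inter(h0, h1):
    # No rule applies when the two heaps do not have the same support:
    # returning either argument is sound, since we only need to
    # over-approximate gamma(h0) ∩ gamma(h1).
    if set(h0) != set(h1):
        return h0
    # Local rule: intersect each binding independently.
    return {x: h0[x] & h1[x] for x in h0}

h0 = {"x": {"a", "b"}, "y": {"c"}}
h1 = {"x": {"b"}, "y": {"c", "d"}}
print(inter(h0, h1) == {"x": {"b"}, "y": {"c"}})  # → True
```

As in the paper, the fallback loses precision but never soundness; a real implementation would search for a rule application before giving up.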

Definition 3

(Abstract intersection algorithm). The operator \( {\mathbf {inter}}^{\sharp }\) is a partial function that inputs two abstract heaps \( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1 \) and returns a disjunction of abstract heaps \( H\) such that \( {\mathrm {h}}^{\sharp }_0 \mathrel {\sqcap }{\mathrm {h}}^{\sharp }_1 \leadsto _{\sqcap }H\) following Fig. 5.

Fig. 6.

Abstract composition rewriting rules (rules and , which are right versions of rules and , can be systematically derived by symmetry, and are omitted for the sake of brevity).

Theorem 1

(Soundness of abstract intersection). Abstract intersection is sound in the sense that, for all \( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1 \), \( \gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_0 ) \mathrel {\cap }\gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_1 ) \subseteq \gamma _{\mathbb {H}}( {\mathbf {inter}}^{\sharp }( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1 ) ) \).

Example 6

(Abstract intersection). Let us consider:

  • and

  • .

Then, . Note that the computation involves the structural rule \( \sqcap [s,s]\) and the unfolding rule to derive this result, where both the segment between \( \texttt {x} \) and \( \texttt {y} \) and the list pointed to by \( \texttt {y} \) are non-empty. This result is exact (no precision is lost) and is strictly more precise than either argument.

Composition of Abstract Transformations. We now study the composition of abstract transformations. Again, the computation is based on a rewriting system that gradually turns two input transformations into an abstraction of their composition. The rules are provided in Fig. 6. The predicate means that the effect of applying transformation \( {\mathrm {t}}^{\sharp }_0 \) and then transformation \( {\mathrm {t}}^{\sharp }_1 \) can be described by the union of the transformations in \( T \). Rule enables local reasoning over composition, at the transformation level. Rules and respectively compose matching identity transformations and matching modifying transformations. Similarly, composes an identity followed by a modifying transformation with a consistent support (this rule, like the following ones, has a corresponding right version that we omit for the sake of brevity). Rule implements a weakening based on the inclusion \( \gamma _{\mathbb {T}}( \mathtt {Id}( {\mathrm {t}}^{\sharp }) ) \subseteq \gamma _{\mathbb {T}}( [{\mathrm {t}}^{\sharp } \dasharrow {\mathrm {t}}^{\sharp }] ) \) (the inclusion is proved in [17]). Similarly, rule . Finally, rule unfolds a summary to enable composition. The abstract composition operation performs a proof search. Just as for intersection, the composition operator may produce different results depending on the application order; our implementation relies on a strategy designed to improve precision by maximizing the use of .
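The matching cases can be pictured on a toy per-cell encoding, where `("id", v)` denotes a cell left unchanged with value `v` and `("mod", a, b)` a cell rewritten from `a` to `b` (a hypothetical encoding of ours, not the paper's graph-based transformations; the rule names of Fig. 6 did not survive extraction, so the comments only describe the intended cases).

```python
def compose_cell(c0, c1):
    """Compose the effect of c0 followed by c1 on a single cell."""
    if c0[0] == "id" and c1[0] == "id" and c0[1] == c1[1]:
        return c0                          # Id ; Id = Id
    if c0[0] == "id" and c1[0] == "mod" and c0[1] == c1[1]:
        return c1                          # Id ; [a ⇝ b] = [a ⇝ b]
    if c0[0] == "mod" and c1[0] == "id" and c0[2] == c1[1]:
        return c0                          # right version: [a ⇝ b] ; Id
    if c0[0] == "mod" and c1[0] == "mod" and c0[2] == c1[1]:
        return ("mod", c0[1], c1[2])       # [a ⇝ b] ; [b ⇝ c] = [a ⇝ c]
    return None                            # the proof search fails here

print(compose_cell(("mod", "a", "b"), ("mod", "b", "c")))  # → ('mod', 'a', 'c')
```

The `None` case corresponds to the situation where weakening or unfolding rules must be applied first to make the two sides match.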

Definition 4

(Abstract composition algorithm). The operator \( {\mathbf {comp}}^{\sharp }\) is a partial function that inputs two abstract transformations \( {\mathrm {t}}^{\sharp }_0, {\mathrm {t}}^{\sharp }_1 \) and returns a set of abstract transformations \( T\) such that following Fig. 6.

Theorem 2

(Soundness of abstract composition). Let \( {\mathrm {t}}^{\sharp }_0, {\mathrm {t}}^{\sharp }_1 \) be two transformations, \( {\mathrm {h}}^{\sharp }_0, {\mathrm {h}}^{\sharp }_1, {\mathrm {h}}^{\sharp }_2 \) be abstract heaps and \( \nu \) be a valuation. We assume that \( {\mathbf {comp}}^{\sharp }( {\mathrm {t}}^{\sharp }_0, {\mathrm {t}}^{\sharp }_1 ) \) evaluates to the set of transformations \( T\). Then:

$$ (\sigma _0, \sigma _1, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_0 ) \mathrel {\wedge }(\sigma _1, \sigma _2, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_1 ) \Longrightarrow \exists {\mathrm {t}}^{\sharp }\in T, \; (\sigma _0, \sigma _2, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }) $$

Example 7

(Abstract composition and analysis compositionality). In this example, we study the classical case of an in-place list reverse code snippet:


The effects of and of can be described by the abstract transformations \( {\mathrm {t}}^{\sharp }_0 \) and \( {\mathrm {t}}^{\sharp }_1 \):

The composition of these two transformations needs to apply the weakening rules to match terms that are under the \( \mathtt {Id}\) constructors. The result is the following transformation :

This description is actually a very precise account of the effect of the sequence of these two assignment commands. This example shows that composition may be used not only for interprocedural analysis (as we show in the next section), but also to supersede some operations of the intraprocedural analysis of Sect. 5.
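For reference, a standard in-place list reverse is sketched below in Python (the exact statements of the analyzed snippet are not reproduced in this version of the text; this generic loop only serves to recall what the pointer assignments achieve).

```python
class Node:
    def __init__(self, v, nxt=None):
        self.v, self.next = v, nxt

def reverse(l):
    r = None
    while l is not None:
        nxt = l.next     # save the tail before overwriting the pointer
        l.next = r       # redirect the pointer to the reversed prefix
        r, l = l, nxt    # advance
    return r

n = reverse(Node(1, Node(2, Node(3))))
print(n.v, n.next.v, n.next.next.v)  # → 3 2 1
```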

Example 8

(Abstract application). As observed at the beginning of the section, composition may also be used as a means to analyze the application of a transformation to an abstract state. We consider the composition of the transformation \( {\mathrm {t}}^{\sharp }\) corresponding to function \(\texttt {append}\) (shown in Example 2 and Fig. 1) and the abstract pre-state . Then, the composition returns:

7 Interprocedural Analysis Based on Function Summaries

In this section, we study two mutually dependent aspects: the application of summaries at call sites and their inference by static analysis in a top-down manner. We first focus on non-recursive calls and discuss recursive calls at the end of the section. The analysis maintains a context summary \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) for each procedure \( \texttt {f} \). Initially, this context summary is set to \( (\bot , \bot ) \). When the analysis reaches a call to procedure \( \texttt {f} \), it should attempt to use the existing summary (Sect. 7.1). When the existing summary does not account for all the states that may reach the current context, a new summary needs to be computed first (Sect. 7.2).

7.1 Analysis of a Call Site Using an Existing Summary

We assume a procedure \( \mathbf {proc}\; f( \texttt {p} _0, \ldots , \texttt {p} _k ) \{ \mathtt {C}\} \) and a sound context summary \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \). We consider the analysis \( {\llbracket \texttt {f} ( \texttt {x} _0, \ldots , \texttt {x} _k ) \rrbracket }^{\sharp }_{\mathbb {T}} \) of a call to this procedure, with transformation \( {\mathrm {t}}^{\sharp }_\mathrm{pre} \) as a pre-transformation. To analyze the call, the analysis should (1) process parameter passing, (2) detect which part of \( {\mathrm {t}}^{\sharp }_\mathrm{pre} \) may be modified by \( \texttt {f} \), (3) check whether the context summary covers this context, and (4) apply the summary, if (3) succeeds (the case where it fails is studied in Sect. 7.2).

Parameter Passing. Parameter passing boils down to creating the variables \( \texttt {p} _0, \ldots , \texttt {p} _k \) using transfer function \( {\mathbf {newV}_{\mathbb {T}}}^{\sharp }\) and then to analyzing assignment statements \( \texttt {p} _0 = \texttt {x} _0, \ldots , \texttt {p} _k = \texttt {x} _k \). These operations can all be done using the transfer functions defined in Sect. 5:

$$ {\mathrm {t}}^{\sharp }_\mathrm{pars} = ( {\llbracket \texttt {p} _k = \texttt {x} _k \rrbracket }^{\sharp }_{\mathbb {T}} \circ \ldots \circ {\llbracket \texttt {p} _0 = \texttt {x} _0 \rrbracket }^{\sharp }_{\mathbb {T}} \circ {\mathbf {newV}_{\mathbb {T}}}^{\sharp }[\texttt {p} _0, \ldots , \texttt {p} _k]) ( {\mathrm {t}}^{\sharp }_\mathrm{pre} ) $$
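On concrete environments, the same sequence of steps reads as follows (a toy sketch over plain dictionaries rather than abstract transformations; `pass_params` is our name for the combined operation).

```python
def pass_params(params, args, env):
    # newV: create the formal parameters (uninitialized).
    env = {**env, **{p: None for p in params}}
    # Then analyze p_0 = x_0, ..., p_k = x_k, in order.
    for p, x in zip(params, args):
        env = {**env, p: env[x]}
    return env

print(pass_params(["p0", "p1"], ["x0", "x1"], {"x0": 1, "x1": 2}))
# → {'x0': 1, 'x1': 2, 'p0': 1, 'p1': 2}
```

The abstract version composes the corresponding transfer functions in the same order, starting from \( {\mathrm {t}}^{\sharp }_\mathrm{pre} \).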

Procedure Footprint. To identify the fragment of the abstract heap that \( \texttt {f} \) can view and may modify, the analysis should first extract from \( {\mathrm {t}}^{\sharp }_\mathrm{pars} \) an abstraction of the set of states that enter the body of \( \texttt {f} \). This is the goal of function \( \mathcal {O}: \mathbb {T}\rightarrow \mathbb {H}\):

Intuitively, \( \mathcal {O}\) projects the “output” part of a transformation. Thus \( \mathcal {O}( {\mathrm {t}}^{\sharp }_\mathrm{pars} ) \) over-approximates the set of states that enter \( \mathtt {C}\). However, only the fragment of \( \mathcal {O}( {\mathrm {t}}^{\sharp }_\mathrm{pars} ) \) that is reachable from the parameters of \( \texttt {f} \) is relevant. Given an abstract heap \( {\mathrm {h}}^{\sharp }\), we can compute the set of symbolic names \( R[{\mathrm {h}}^{\sharp }] \) that are relevant based on the following rules:


The slice \( \mathcal {R}[ \texttt {p} _0, \ldots , \texttt {p} _k ]( {\mathrm {h}}^{\sharp }) \) of \( {\mathrm {h}}^{\sharp }\) with respect to \( \texttt {p} _0, \ldots , \texttt {p} _k \) retains only the terms of \( {\mathrm {h}}^{\sharp }\) whose names all belong to \( R[{\mathrm {h}}^{\sharp }] \), defined as the least solution of the above rules. Similarly, we let \( \mathcal {I}[ \texttt {p} _0, \ldots , \texttt {p} _k ]( {\mathrm {h}}^{\sharp }) \) be the abstract heap made of the remaining terms. Therefore, we have the equality .
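The footprint computation can be sketched on a toy heap encoding (a dict mapping each symbolic name to the names it points to; the helpers `reachable` and `split` are ours): \( R[{\mathrm {h}}^{\sharp }] \) is the least set of names reachable from the parameters, and the heap splits into the slice the callee may view and the untouched remainder.

```python
def reachable(heap, roots):
    """Least fixpoint of the reachability rules: names reachable from roots."""
    seen, todo = set(), list(roots)
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(heap.get(n, ()))   # follow points-to edges
    return seen

def split(heap, params):
    """R-slice (terms on reachable names) and I-remainder (the rest)."""
    r = reachable(heap, params)
    slice_ = {n: s for n, s in heap.items() if n in r}
    rest = {n: s for n, s in heap.items() if n not in r}
    return slice_, rest

heap = {"p0": ["a"], "a": ["b"], "b": [], "c": ["d"], "d": []}
print(split(heap, ["p0"]))
# → ({'p0': ['a'], 'a': ['b'], 'b': []}, {'c': ['d'], 'd': []})
```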

Context Summary Coverage Test. Based on the above, the set of states that reach the body of \( \texttt {f} \) under the calling context defined by \( {\mathrm {t}}^{\sharp }_\mathrm{pre} \) can be over-approximated by \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} = \mathcal {R}[ \texttt {p} _0, \ldots , \texttt {p} _k ]( \mathcal {O}( {\mathrm {t}}^{\sharp }_\mathrm{pars} ) ) \). We let \( {\mathrm {h}}^{\sharp }_\mathrm{rem} = \mathcal {I}[ \texttt {p} _0, \ldots , \texttt {p} _k ]( \mathcal {O}( {\mathrm {t}}^{\sharp }_\mathrm{pars} ) ) \) be the remainder part. The context summary \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) covers \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \) if and only if \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \sqsubseteq _{\mathbb {H}}{\mathrm {h}}^{\sharp }_{\texttt {f}} \) holds where \( \sqsubseteq _{\mathbb {H}}\) is a sound abstract state inclusion test as defined in, e.g., [6, 13]. When this condition does not hold, the context summary should be re-computed with a more general pre-condition (this point is discussed in Sect. 7.2).

Summary Application. Given the above notation, the effect of the procedure (described by \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \)) should be applied to \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \) whereas \( {\mathrm {h}}^{\sharp }_\mathrm{rem} \) should be preserved. To do this, the following transformation should be composed with the abstract transformation (note that the identity part is applied to the part of the pre-transformation that is not relevant to the execution of the body of \( \texttt {f} \)). Thus, the transformation that accounts for the computation from the program entry point until the return point of \( \texttt {f} \) is:

Example 9

(Context summary coverage and application). In this example, we assume the context summary defined in Example 3 for procedure \(\texttt {append}\):

  • ;

Moreover, we consider the call with the abstract transformation below as a pre-condition (note that variable \(\texttt {c}\) is not accessed by \(\texttt {append}\)):

After parameter passing, computation of the heap fragment \( \texttt {f} \) may view, and projection of the output, we obtain the abstract state , which is obviously included in \( {\mathrm {h}}^{\sharp }\). The composition with the summary of the procedure produces the abstract transformation below:

The whole algorithm is shown in Fig. 7. It implements the steps described above and in Sect. 7.2. The case considered in this subsection (when \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \sqsubseteq _{\mathbb {H}}{\mathrm {h}}^{\sharp }_{\texttt {f}} \) holds) corresponds to the case where the \( \mathbf{if} \) branch at lines 5–6 is not taken.

Fig. 7.

Interprocedural analysis: algorithm for the analysis of a procedure call.

7.2 Inference of a New Context Summary

We now discuss the case where the previously existing context summary of \( \texttt {f} \) does not cover the executions corresponding to \( {\mathrm {t}}^{\sharp }\). As mentioned above, this corresponds to the case where the abstract inclusion \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \sqsubseteq _{\mathbb {H}}{\mathrm {h}}^{\sharp }_{\texttt {f}} \) does not hold.

Summary Computation. The computation of a new context summary should take into account a context that is general enough to encompass both \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) and \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \):

  • the new abstract context is \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \mathrel {\triangledown _{\mathbb {H}}}{\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \) using abstract state widening \( \mathrel {\triangledown _{\mathbb {H}}}\) [6];

  • the new summary related to this abstract context is derived by analysis of the body of \( \texttt {f} \), thus by updating \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \) with \( {\llbracket \mathtt {C} \rrbracket }^{\sharp }_{\mathbb {T}}( \mathtt {Id}( {\mathrm {h}}^{\sharp }_{\texttt {f}} \mathrel {\triangledown _{\mathbb {H}}}{\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} ) ) \).

Then, the context summary for \( \texttt {f} \) is updated with this new context summary.

Application. Once a new summary has been computed, by definition, it satisfies the inclusion \( {\mathrm {h}}^{\sharp }_\mathrm{in,\texttt {f}} \sqsubseteq _{\mathbb {H}}{\mathrm {h}}^{\sharp }_{\texttt {f}} \), thus it can be applied so as to compute an abstract transformation after the call to \( \texttt {f} \) as shown in Sect. 7.1.

The overall procedure call analysis algorithm is displayed in Fig. 7. The case examined in this subsection corresponds to the case where the \( \mathbf{if} \) branch at lines 5–6 is taken. We observe that it necessarily occurs whenever a procedure is analyzed for the first time, as context summaries are initially set to \( (\bot , \bot ) \). Moreover, we note that the application of the summary after its update is done as in Sect. 7.1.
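The control structure of Fig. 7 can be sketched as follows, under the strong simplifying assumption that abstract states are finite sets, so that inclusion is plain subset and widening degenerates to union (`analyze_body` and `apply_summary` are hypothetical callbacks standing for the body analysis and abstract composition, respectively).

```python
def analyze_call(f, h_in, summaries, analyze_body, apply_summary):
    """Summary-based analysis of a call to f with entry abstraction h_in."""
    ctx, t = summaries.get(f, (set(), None))   # initially (bot, bot)
    if not h_in <= ctx:                        # coverage test fails
        ctx = ctx | h_in                       # widening (here: plain union)
        t = analyze_body(f, ctx)               # reanalyze the body of f
        summaries[f] = (ctx, t)                # update the context summary
    return apply_summary(t, h_in)              # compose with the summary

summaries = {}
body = lambda f, ctx: {(s, s + 1) for s in ctx}        # toy body: increment
apply_s = lambda t, h: {o for (i, o) in t if i in h}   # toy application
print(analyze_call("f", {1, 2}, summaries, body, apply_s))  # → {2, 3}
print(analyze_call("f", {2}, summaries, body, apply_s))     # reuse → {3}
```

The second call is covered by the stored context, so the body is not reanalyzed, which is the source of the scalability gain measured in Sect. 8.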

The following result formalizes the soundness of Fig. 7, under the assumption that there is no recursive call.

Theorem 3

(Soundness of the analysis of a procedure call using a context summary). We consider the call to \( \texttt {f} \) with the abstract transformation \( {\mathrm {t}}^{\sharp }_\mathrm{pre} \) as input, and the post-condition \( {\mathrm {t}}^{\sharp }_\mathrm{post} \) returned by the algorithm of Fig. 7. We denote by \( ({\mathrm {h}}^{\sharp }_{\texttt {f}}, {\mathrm {t}}^{\sharp }_{\texttt {f}}) \) the context summary for \( \texttt {f} \) after the analysis of the call. We let \( (\sigma _0, \sigma _1, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_\mathrm{pre} ) \), \( \sigma '_1 \in \llbracket \texttt {p} _k = \texttt {x} _k \rrbracket _{\mathcal {T}} \circ \ldots \circ \llbracket \texttt {p} _0 = \texttt {x} _0 \rrbracket _{\mathcal {T}} \circ \mathbf {newV}[\texttt {p} _0, \ldots , \texttt {p} _k]( \{ \sigma _1 \} ) \), \( \sigma '_2 \in \llbracket \mathtt {C} \rrbracket _{\mathcal {T}}( \{ \sigma '_1 \} ) \), and \( \sigma _2 \in \mathbf {delV}[\texttt {p} _0, \ldots , \texttt {p} _k]( \{ \sigma '_2 \} ) \) (i.e., \( \sigma _2 \in \llbracket \texttt {f} ( \texttt {x} _0, \ldots , \texttt {x} _k ) \rrbracket _{\mathcal {T}}( \{ \sigma _1 \} ) \)). Then, the following property holds:

$$ (\sigma _0, \sigma _2, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_\mathrm{post} ) \mathrel {\wedge }(\sigma '_1, \nu ) \in \gamma _{\mathbb {H}}( {\mathrm {h}}^{\sharp }_{\texttt {f}} ) \mathrel {\wedge }(\sigma '_1, \sigma '_2, \nu ) \in \gamma _{\mathbb {T}}( {\mathrm {t}}^{\sharp }_{\texttt {f}} ) $$

This result means that not only does the transformation \( {\mathrm {t}}^{\sharp }_\mathrm{post} \) over-approximate the state after the call, but the new context summary also accounts for this call. This entails that context summaries account for all the calls that are encountered in the analysis of a complete interprocedural program. Moreover, Theorem 3 entails Eq. 1 for procedure calls.

Example 10

(Context summary computation). In this example, we consider the function \(\texttt {append}\) again, but assume that the analysis starts with the \( (\bot , \bot ) \) context summary for it. We study the code where \( \texttt {a}, \texttt {b}, \) and \( \texttt {c} \) are initially lists of length 1. Then:

  • the first call results in the update of the summary with a context summary \( ({\mathrm {h}}^{\sharp }, {\mathrm {t}}^{\sharp }) \) where \( {\mathrm {h}}^{\sharp }\) assumes that the first argument is a single list element, i.e., is of the form ;

  • when the second call is encountered this first summary does not cover the second context (at this point, the argument \( \texttt {a} \) has length 2), thus a new context summary needs to be calculated; this new summary \( ({\mathrm {h}}^{\sharp }, {\mathrm {t}}^{\sharp }) \) is such that \( {\mathrm {h}}^{\sharp }\) only assumes that the first argument may be a list of any length (as derived by widening), i.e., it is of the form .

This last summary may still not be as general as the summary shown in Example 9, and may thus be generalized even more at subsequent call points.

Analysis of Recursive Calls. So far, we have focused on the case where there are no recursive calls. Actually, the presence of recursive calls changes relatively little in the principle of our analysis (although the formalization would be significantly heavier, so we leave it out). Indeed, it only requires extending the algorithm of Fig. 7 with a fixpoint computation over context summaries, so as to determine an over-approximation of both components of the procedure context summary.

  • when a recursive call is encountered and the calling context is not accounted for by the current context summary (Fig. 7, the condition at line 4 evaluates to false), the \( {\mathrm {h}}^{\sharp }_{\texttt {f}} \) component should be widened and the current \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \) should be used directly;

  • at the end of the analysis of the procedure body, the \( {\mathrm {t}}^{\sharp }_{\texttt {f}} \) component should be widened with the previously known transformation, and the analysis of the procedure body iterated until this transformation stabilizes.

Convergence is ensured by widening both on abstract states and transformations.
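The fixpoint over the transformation component can be sketched in the same toy setting (widening again degenerates to union here; `body` is a hypothetical abstraction of the procedure body that consults the current summary `t` at the recursive call site).

```python
def analyze_recursive(ctx, analyze_body_once):
    """Iterate the body analysis until the transformation stabilizes."""
    t = set()
    while True:
        t_new = t | analyze_body_once(ctx, t)   # widen with previous result
        if t_new == t:
            return t                            # transformation stabilized
        t = t_new

def body(ctx, t):
    """Toy recursive function: f(0) = 0, f(n) = f(n - 1)."""
    out = set()
    for n in ctx:
        if n == 0:
            out.add((0, 0))
        else:
            # the recursive call is resolved using the current summary t
            out |= {(n, r) for (m, r) in t if m == n - 1}
    return out

print(sorted(analyze_recursive({0, 1, 2}, body)))  # → [(0, 0), (1, 0), (2, 0)]
```

With genuinely infinite domains, termination relies on the widening operators on states and transformations, as stated above.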

8 Experimental Evaluation

In this section, we report on the evaluation of the interprocedural analysis based on function summaries, with respect to the following questions:

  1. Is it able to infer precise invariants?

  2. Does it scale better than a classical call-string-based analysis?

  3. How effective are context summaries, i.e., do they often have to be recomputed?

We have implemented the interprocedural analysis based on context summaries for a large fragment of the C language. Our tool is a plugin for Frama-C [20]. It supports the conventional control flow structures of C and can be parameterized by the inductive definition of the structure to consider, as in [5]. However, it leaves out non-recursive structures and data types that are not immediately related to the analysis principle (strings, arrays, numeric types). Furthermore, we have also implemented, as another extension of Frama-C, an analysis relying on the call-string approach, i.e., one that inlines procedures at call sites.

Experiments and Raw Data. We performed two experiments. First, we ran the two analyses on a fragment of the GNU Emacs 25.3 editor. This fragment comprises 22 functions of the Fx_show_tip feature and is implemented in C. It manipulates descriptions of Lisp objects, including lists built as Cons pairs. The analyzed code corresponds to about 3 kLOC. Analyses were run on an Intel Core i7 laptop at 2.3 GHz with 16 GB of RAM. The raw data are provided in Appendix A; the following paragraphs discuss the main points related to the above questions. Second, we analyzed a set of basic recursive functions on trees and lists to validate the recursive call analysis.

Precision. For each procedure, we compared the results of the two analyses on its body. More precisely, we checked whether the transformation computed for the whole procedure body from its entry abstract state (the first component of context summaries) is at least as precise as the post-condition produced by the state analysis. For the 15 functions whose bodies contain nested calls, the result is indeed at least as precise. The remaining 7 functions do not contain any call, thus they are not relevant to validating the absence of precision loss at call sites.

Scalability. The total measured analysis time was 14.20 s for the transformation-based analysis against 877.12 s for the state analysis, a substantial overall speedup. Second, Fig. 8 shows the average analysis time of the body of each function (these values are average analysis times, not the total time spent analyzing each function, as the effectiveness of summary reuse is the topic of the next paragraph). We observe that for some functions the speedup is low, or the transformation-based analysis is even slower. These functions mostly correspond to low analysis times. Upon inspection, they all occupy a low position in the call graph: they call few functions, so the overhead of the transformation analysis is not compensated by summary applications that avoid reanalyzing callees. Conversely, functions at the top of the call graph (such as the entry point) show a very high gain.

Fig. 8.

Per-function comparison of analysis runtime (times in seconds)

Effectiveness. Finally, we assessed the effectiveness of summary reuse, which depends not only on the shape of the call graph (functions that are called at a single site offer no opportunity for gain) but also on the behavior of the functions and of the analysis (depending on the contexts that are encountered, some procedures may need to be reanalyzed multiple times, others not at all). We observed that only one procedure needed to be reanalyzed (Fcons was reanalyzed 3 times). All other summaries were computed only once (i.e., the branch at lines 5–6 in Fig. 7 is taken only once). For functions called at a single point (11 of the total), summaries could not be reused, but for 8 functions, summaries were reused multiple times (3 to 44 times). By contrast, the state analysis had to reanalyze most functions several times: in particular, 11 functions were reanalyzed 15 times or more (up to 296 times). Therefore, the summary-based analysis provides a significant gain even for small functions.

Recursive Calls. We ran the analysis on a series of recursive implementations of classical functions on lists and binary trees, including size, allocation, deallocation, insertion, search and deep copy, and also list concatenation and filter. For all these functions, the expected invariants could be inferred in under 5 ms (Appendix A).

9 Related Works and Conclusion

Since Sharir and Pnueli’s seminal paper [35], many works have studied approaches to interprocedural analysis. The relational approach strives for modularity, which has been a concern in static analysis even beyond interprocedural code [8, 10]. A major advantage of modular approaches is to avoid the reanalysis of program fragments that are called repeatedly. However, the approach is generally difficult to apply, as it requires an accurate and efficient representation for summaries. While relational numerical abstract domains [11] naturally provide a way to capture numerical input/output relations [26, 27], this is harder to achieve when considering memory abstractions. The TVLA framework [33] supports such relations using the classical “primed variables” technique [18, 19], where input and output variables are distinguished using prime symbols. Some pointer analyses such as [12] rely on specific classes of procedure summaries to enable modular analysis. However, separation logic [28] does not naturally support this, since formulas describe regions of a given heap. The classical solution involves the tabulation of pairs of separation logic formulas [2, 15, 22], but this approach does not allow relating the descriptions of heap regions in a fine-grained manner. To circumvent this, we use the transformations introduced in [17], which are built on connectors inspired by separation logic. The advantage is twofold: it enables not only a more concise representation of transformations (since tables of pairs may need to grow large to precisely characterize the effect of procedures) but also a more local description of the relation between procedure input and output states. Our graph representation of abstract states and transformations opens the way to a resolution of the frame problem [29, 30] using the intersection operation. The results of our top-down, context summary-based analysis confirm that this approach brings a gain in analysis scalability once the upfront cost of summaries is amortized.

Many pointer analyses and weak forms of shape analyses have introduced specific techniques to construct and compute procedure summaries [21, 23, 25]. These works typically rely on some notion of graph that describes both knowledge about memory entities and procedure calls, so that the interprocedural analysis reduces to graph algorithms. Moreover, context sensitivity information may be embedded into these graphs. Our approach differs in that it relies on a specific algebra of summaries, although we may also augment our summaries with context information. Another difference is that the summary predicates our abstract domain is based on allow a very high level of precision, and that the abstract procedure call analysis algorithm (with intersection and composition) aims at preserving this precision. We believe that two interesting avenues for future work would be to combine various levels of context sensitivity, and less expressive summaries, with our analysis framework.

A very wide range of techniques has been developed to better cope with interprocedural analysis, many of which could be combined with context summaries. First, we do not consider the tabulation of procedure summaries [10]; however, we could introduce this technique together with some amount of context sensitivity [1]. Indeed, while relations reduce the need for tables of abstract pre- and post-conditions, combining context summaries with finer context abstraction may result in increased precision [7]. Bi-abduction [3] has been proposed as a technique to infer relevant abstract pre-conditions of procedures. In [3], this process was implemented on a state abstract domain, but the core principle of the technique is orthogonal to that choice, thus bi-abduction could also be applied to abstract transformations. Moreover, while our analysis proceeds top-down, it would be interesting to consider a combination with a bottom-up inference of summaries for some procedures [4]. Last, many works have considered the abstraction of the stack frame and of the relations between the stack frame structure and the heap structures manipulated by procedures [31, 32]. The notion of cut-points has been proposed in [29, 30] to describe structures tightly intertwined with the stack. An advantage of our technique is that the use of an abstraction based on transformations, which can express that a region of the heap is preserved, reduces the need to reason about cut-points.