1 Introduction

Linear Relation Analysis (LRA [17])—or polyhedral abstract interpretation—is a classical method for discovering invariant linear inequalities among the numerical variables of a program. This method is still one of the most powerful numerical program analysis techniques, because of the expressivity of the discovered properties. However, it is not applicable to large monolithic programs, because of its prohibitive complexity, in terms of number of involved variables—in spite of recent progress in polyhedra algorithmics [22, 37, 49]. An obvious solution consists in using it in a modular way: the analysis of reasonably small procedures can provide, once and for all, a summary as an input-output relation; this summary can be reused in the analysis of programs calling the procedure. The relational nature of LRA is, of course, beneficial in this process.

On the other hand, numerous works on interprocedural analysis often concluded that such a “bottom-up” approach—where a procedure is analyzed before its callers—generally results in very imprecise summaries, because the procedure is considered independently of its calling context. One can object that this imprecision may also be due to the poor expressivity of the domains used, in particular those commonly used in compilers (e.g., data-flow analysis [32]).

So interprocedural analysis can provide a solution to the prohibitive cost of LRA, while LRA, in turn, provides the expressive power needed for more accurate procedure summaries.

This idea of using LRA to synthesize input-output relations is quite straightforward and not new. In particular, it is systematically applied in the tool PIPS [3, 23], which considers each basic statement as an elementary relation, and synthesizes the input-output relation of a full program by composing these relations bottom-up. In this paper, we specialize the approach to the synthesis of procedure summaries. An easy way of building a relational summary of a procedure consists in duplicating the parameters to record their initial values, then performing a standard LRA of the body, which provides the summary as the least upper bound (convex hull) of the results at the return points of the procedure.

However, it appears that conjunctions of linear constraints, i.e., convex polyhedral relations, are too restrictive. Obviously, procedures may exhibit very different and irregular behaviors according to the values of conditions appearing in tests. For instance,

  • in many cases, whether an outermost loop is entered at least once or not is very relevant for the global behavior of the procedure;

  • when a procedure has several return points, they are likely to correspond to quite different behaviors;

  • for a simple recursive procedure, the base case(s) should be distinguished from those which involve recursive calls.

So it is natural to look for summaries that are disjunctions of polyhedral relations. However, algorithms for manipulating polyhedra do not extend easily to general disjunctions of polyhedra. A solution consists in using trace partitioning [9, 28, 38, 47]. This solution is used in [24, 26], where the partitioning is directed by formulas on Boolean variables. Here, we will propose such a partitioning directed by well-chosen preconditions on input parameters.

Contributions: While being mainly interested in LRA, we consider a more general framework. We provide a general formalization of relational abstract interpretation, which we did not find elsewhere. Since its use for computing procedure summaries often yields overly coarse results, we propose an approach to build disjunctive summaries, based on precondition partitioning. The choice of partitions is a heuristic process. We propose a method based on successive partition refinements, guided, on one hand, by the reachability of control points, and, on the other hand, by the partitioning of the summaries of called procedures. The method has been implemented in a prototype analyzer. Our experiments give encouraging results.

The paper is organized as follows. To situate our work, we first survey the abundant literature on interprocedural program analysis (Sect. 2). Since our approach can be applied in a more general context than LRA, we will develop each aspect in a stratified fashion: first, we consider the very general framework, then we present a specialization to LRA, before an application on a running example. Section 3 is concerned with concrete relational semantics of programs, and introduces the notations in the general framework and for numerical programs, together with our running example. Sections 4 and 5 deal with relational abstract interpretation and its use for building procedure summaries relative to a precondition. In view of the results on our example, in Sect. 6, we propose to compute disjunctive summaries directed by a partition of preconditions. In Sect. 7, we present a way of partitioning preconditions by successive refinements. The application of our method to recursive procedures is illustrated in Sect. 8. Section 9 briefly presents our prototype implementation, and some experiments are described in Sect. 10. Section 11 gives the conclusion and sketches some future work.

2 Related Work

Interprocedural analysis originated in side-effects analysis, from works of Spillman [51], Allen [1, 2] and Barth [6].

Interprocedural analyses can be distinguished according to the order in which procedures are traversed. In top-down analyses, procedures are analyzed following their invocation order [2], from callers to callees, while in bottom-up analyses, procedures are analyzed according to the inverse invocation order, from the callees up to the callers, by computing procedure summaries. Hybrid analyses [53] combine top-down and bottom-up analyses. We are interested in bottom-up approaches since each procedure is analyzed only once, regardless of the calling contexts, in possibly much smaller variable environments, thereby allowing a modular analysis with potential scalability improvements for numerical analyses such as LRA.

Sharir and Pnueli [48] introduced the functional approach and the call strings approach for distributive data flow frameworks. The functional approach computes procedure summaries, either from the bottom-up composition of individual propagation functions or by propagating data flow properties in a top-down fashion and by tabulating properties obtained at the exit node of a procedure with the associated property at entry. In the call strings approach, data flow properties are tagged by a finite string which encodes the procedure calls encountered during propagation. Call strings are managed as stacks and updated during propagation through a procedure call or return.

Reps et al. [46] proposed an algorithm belonging to the family of functional approaches to solve data flow problems with finite semilattices and distributive propagation functions in polynomial time, by recasting these data flow problems into graph reachability problems. Jeannet et al. [30, 50] proposed a method reminiscent of the call strings approach, for the relational numerical analysis of programs with recursive procedures and pointers to the stack. It is a top-down approach based on an abstraction of the stack. An implementation is available in the Interproc tool [25]. Abstract states are partitioned according to Boolean conditions, but not according to possible input abstract states of a procedure. Yorsh et al. [52] proposed a bottom-up approach for finite distributive data flow properties and described how precise summaries for this class of properties can be constructed by composition of summaries of individual statements.

A relational abstraction of sets of functions for shape analysis is proposed in [27], considering functions of signature \( D_1 \rightarrow D_2 \), provided that abstractions \( A_1 \) of \( \mathcal {P}(D_1) \) and \( A_2 \) of \( \mathcal {P}(D_2) \) exist, and that \( A_1 \) is of finite cardinality. This abstraction is relational since it is able to express relations between images of abstract elements mapped by a set of functions, but the abstraction \( A_1 \) is required to be of finite cardinality, thus excluding numerical abstract domains such as convex polyhedra.

Gulwani et al. [20] proposed a backward analysis to compute procedure summaries as constraints that must be satisfied to guarantee that some generic assertion holds at the end of a procedure. A generic assertion is an assertion with context variables which can be instantiated by symbols of a given theory. Procedure summaries are obtained by computing weakest preconditions of generic assertions. These generic assertions must be given prior to the analysis, thus forbidding the automatic discovery of procedure properties.

Cousot and Cousot [15, 16] describe the symbolic relational separate analysis for abstract interpretation, which uses relational domains, relational semantics and symbolic names to represent initial values of variables modified by a procedure. When instantiated with the convex polyhedra abstract domain, this approach computes procedure summaries which are input-output relations represented by a single convex polyhedron, with no ability to capture disjunctive behaviors in procedures. Recursive procedures are supported, as presented earlier in [13, 14, 21].

Müller-Olm et al. [40, 42] proposed an interprocedural bottom-up analysis to discover all Herbrand equalities between program variables in polynomial time. This approach was extended to two-variable linear equalities [18] and to affine relations [41]. It considers only abstracted programs with affine assignments, ignoring conditions on branches and dealing conservatively with other assignments. We propose a more general approach, which is also able to capture some disjunctive behaviors.

In the PIPS tool [3, 23], statements are abstracted by affine transformers [35, 36] which are input-output relations represented by convex polyhedra. The summary of a whole procedure is obtained from the composition of statement transformers, in a bottom-up fashion. Recursive procedures are not supported and each procedure summary is a single affine input-output relation, preventing the expression of disjunctive behaviors.

Popeea et al. [43, 44, 45] presented an analysis both to prove user-supplied safety properties and to find bugs, by deriving conditions leading either to success or to failure in each procedure. Disjunctive numerical properties are handled by a complete decision procedure for linear arithmetic provided by the Omega Test [31]. Our approach is able to discover automatically some disjunctive behaviors of procedures without requiring user-provided assertions.

Kranz et al. [33] proposed a modular analysis of executables based on Heyting completion [19]. Unfortunately, in the convex polyhedra abstract domain, the pseudo-complement \( a \Rightarrow b = \sqcup \{ d \mid a \sqsubseteq d \sqsubseteq b \} \) of a relative to b is not available in general.

3 Concrete Relational Semantics

3.1 General Framework

In our general framework, a program or a procedure is just a transition system. We introduce below a few definitions and notations.

States and Relations: Let S be a set of states. Let \(2^S\) be the powerset of S. Let \({\mathcal R}= 2^{S\times S}\) be the set of binary relations on S.

  • We define the projection functions \({ {src}}, { {tgt}}: {\mathcal R}\mapsto 2^S\) such that: \(\forall r \in {\mathcal R}\),

    $${ {src}}(r) = \{ s_0 \in S \,\mid \, \exists s_1 \in S, (s_0,s_1) \in r\},\; { {tgt}}(r) = \{ s_1 \in S \,\mid \, \exists s_0 \in S, (s_0,s_1) \in r\} $$
  • If \(U \subseteq S\), we define \({ {Id}}_U\) the relation \(\{(s,s)\,\mid \, s\in U\}\).

  • If \(r_1, r_2 \in {\mathcal R}\), we denote by \(r_1\circ r_2\) their composition:

    $$r_1\circ r_2 = \{(s,s')\,\mid \,\exists s'', (s,s'')\in r_1 \text{ and } (s'',s') \in r_2\}$$
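These definitions can be illustrated directly on finite relations represented as sets of pairs. The following Python sketch (the function names are ours, purely for illustration) mirrors \({ {src}}\), \({ {tgt}}\), \({ {Id}}_U\) and composition:

```python
# Toy model of the definitions above: a relation is a set of pairs of states.
# Names (src, tgt, identity, compose) are illustrative, not from any tool.

def src(r):
    """src(r) = {s0 | exists s1, (s0, s1) in r}"""
    return {s0 for (s0, s1) in r}

def tgt(r):
    """tgt(r) = {s1 | exists s0, (s0, s1) in r}"""
    return {s1 for (s0, s1) in r}

def identity(u):
    """Id_U = {(s, s) | s in U}"""
    return {(s, s) for s in u}

def compose(r1, r2):
    """r1 o r2 = {(s, s') | exists s'', (s, s'') in r1 and (s'', s') in r2}"""
    return {(s, s2) for (s, s1) in r1 for (t, s2) in r2 if s1 == t}
```

For instance, with \(r=\{(1,2),(2,3)\}\), `compose(r, r)` yields \(\{(1,3)\}\).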

Forward, Backward Relational Semantic Equations: Let \(\rho \in {\mathcal R}\) be a transition relation on S. We are interested in computing an upper approximation of its reflexive-transitive closure \(\rho ^*\), which can be defined as a least fixpoint:

$$\begin{array}{ll} \rho ^* & = \mu r.{ {Id}}_S \cup (r\circ \rho ) \quad \text{(forward equation)} \\ & = \mu r.{ {Id}}_S \cup (\rho \circ r) \quad \text{(backward equation)} \end{array} $$
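On a finite state set, the forward equation can be solved by plain Kleene iteration. A minimal Python sketch (illustrative only, not the paper's analyzer):

```python
def transitive_closure(rho, states):
    """Least fixpoint of r -> Id_S U (r o rho), i.e. the reflexive-transitive
    closure rho*, computed by Kleene iteration (terminates when S is finite)."""
    r = {(s, s) for s in states}                  # start from Id_S
    while True:
        # r o rho
        step = {(s, s2) for (s, s1) in r for (t, s2) in rho if s1 == t}
        new = r | step                            # Id_S U (r o rho)
        if new == r:                              # fixpoint reached
            return r
        r = new
```

For \(\rho =\{(1,2),(2,3)\}\) on states \(\{1,2,3\}\), the result adds the pair \((1,3)\) to \({ {Id}}_S\cup \rho \).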

Trace Partitioning: We use the classical “trace partitioning” technique [38, 47]. Assume that the set S is finitely partitioned: \(S = S_1 \oplus S_2 \oplus \ldots \oplus S_n\). This partitioning can reflect the control points in a program or a control-flow graph, but it can also be more “semantic”, and express state properties, like preconditions. If \(r\in {\mathcal R}\), for each \(i,j \in \{1,\ldots ,n\}\), we define \(r(S_i,S_j)= r\cap (S_i\times S_j)\).

With these notations, the relations \(\rho ^*(S_i,S_j)\) can be defined by the following system of fixpoint equations (henceforth, we consider only forward computation, backward computation is symmetrical):

$$ \begin{array}{ll} \forall j\ne i, & \displaystyle \rho ^*(S_i,S_j) = \bigcup _{k=1}^n \rho ^*(S_i,S_k)\circ \rho (S_k,S_j) \\ & \displaystyle \rho ^*(S_i,S_i) = { {Id}}_{S_i}\cup \bigcup _{k=1}^n \rho ^*(S_i,S_k)\circ \rho (S_k,S_i) \end{array} $$

Concrete Relational Summaries: Let p be a procedure, \(S, \rho , {\mathcal I}, {\mathcal E}\), respectively, its set of states, its transition relation, its sets of initial states (global precondition) and exit states. We assume that S is partitioned, and that \({\mathcal I}, {\mathcal E}\) belong to the partition. The concrete relational summary of p is \(\sigma _p= \rho ^*({\mathcal I},{\mathcal E})\). So, for the forward computation of the summary, we are concerned with the computation of \(\rho ^*({\mathcal I},S_j),\,j=1...n\), according to the equations

$$\rho ^*({\mathcal I},S_j) = \left( \bigcup _{k=1}^n \rho ^*({\mathcal I},S_k) \circ \rho (S_k,S_j)\right) \cup \left\{ \begin{array}{cl}{ {Id}}_{{\mathcal I}} & \text{if } S_j={\mathcal I}\\ \emptyset & \text{otherwise}\end{array}\right\} $$
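As a toy illustration of these equations, the following Python sketch (all names are ours) computes the vector of relations \(\rho ^*({\mathcal I},S_j)\) by chaotic iteration over a finite partitioned transition relation:

```python
def partitioned_summary(rho, partition, init_index):
    """Chaotic iteration on the forward equations: R[j] is an increasing
    approximation of rho*(I, S_j). `partition` is a list of disjoint state
    sets, with partition[init_index] playing the role of I."""
    n = len(partition)
    # block[k][j] = rho(S_k, S_j) = rho intersected with S_k x S_j
    block = [[{(s, t) for (s, t) in rho
               if s in partition[k] and t in partition[j]}
              for j in range(n)] for k in range(n)]
    R = [set() for _ in range(n)]
    R[init_index] = {(s, s) for s in partition[init_index]}   # Id_I
    changed = True
    while changed:
        changed = False
        for j in range(n):
            new = set(R[j])
            for k in range(n):
                # R[k] o rho(S_k, S_j)
                new |= {(s, t2) for (s, t1) in R[k]
                        for (u, t2) in block[k][j] if t1 == u}
            if new != R[j]:
                R[j], changed = new, True
    return R
```

With control points as singleton classes and \(\rho =\{(\text{'a'},\text{'b'}),(\text{'b'},\text{'c'})\}\), the last component is \(\{(\text{'a'},\text{'c'})\}\): the relation from the entry class to the exit class.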

Concrete Semantics of Procedure Calls: Let S be the set of states of a procedure p, and T the set of states of a program calling p. For a given call to p, let us write \(\pi : 2^{S\times S} \mapsto 2^{T\times T}\) for the mapping representing the parameter-passing mechanism (generally, a renaming of formal parameters into actual ones). Then, if \(T_i\) (resp. \(T_j\)) represents the set of states just before (resp. just after) the call, the elementary relation corresponding to the call is \(\rho (T_i,T_j) = \pi (\sigma _p)\).

3.2 Numerical Programs and Procedures

Procedures: For simplicity, and without loss of generality, we make the following assumptions:

  • All procedure parameters are assumed to be passed by reference. However, we are not concerned with pointer manipulation, and we rely on existing analyses to detect aliasing problems.

  • Global variables are dealt with as additional parameters.

  • For clarity, we will consider that all variables are parameters, since local variables raise no difficulty but would complicate the presentation.

In LRA, only numerical variables—taking their values in a numerical domain \({\mathcal N}\) (\(=\mathbb {Z}\) or \(\mathbb {Q}\))—are considered. A state of a numerical procedure with n variables is thus a pair \(({c},V)\), where \({c}\in C\) is a control point (a line, a statement, a block in a control-flow graph, ...), and \(V=(v_1,...,v_n)\in {\mathcal N}^n\) is a vector of numerical values. Control points provide a natural partitioning of such a set of states: \(S_{c}= \{({c},V)\;|\; V\in {\mathcal N}^n\}\). The set of initial states \({\mathcal I}\) of a procedure with entry point \({c}_{\mathcal I}\) is such an \(S_{{c}_{\mathcal I}}\), possibly restricted by a precondition \(A_{\mathcal I}\subseteq {\mathcal N}^n\) on parameter values: \({\mathcal I}=\{({c}_{\mathcal I},V)\;|\;V\in A_{\mathcal I}\}\).

From State to Relational Collecting Semantics: Given such a partition \(\{S_{c}\;|\; {c}\in C\}\), the usual collecting semantics defines the set \(A_{c}\) of reachable variable valuations in each \(S_{c}\), such that \(A_{c}= \{V \;|\; ({c},V) \text{ is } \text{ a } \text{ reachable } \text{ state } \text{ from } {\mathcal I}\}\), as the least solution of a system of fixpoint equations:

$$ A_{c}= F_{c}\left( \{A_{{c}'}\;|\;{c}'\in C\}\right) \;\cup \;\left\{ \begin{array}{cl} A_{\mathcal I}& \text{if } {c}={c}_{\mathcal I}\\ \emptyset & \text{otherwise}\end{array}\right. $$

where the semantic function \(F_{c}\) expresses how the states in \(S_{c}\) depend on the states at other control points. This state semantics can be straightforwardly extended to a relational semantics as follows: for each variable \(v_i\), a new variable \(v^0_i\) is introduced to record the initial value of \(v_i\). The new set of states is thus \(C\times {\mathcal N}^{2n}\), and the new set of initial states is

$${\mathcal I}= \{({c}_{\mathcal I},(v^0_1, \ldots v^0_n, v_1,\ldots ,v_n))\;|\; (v^0_1,\ldots ,v^0_n) \in A_{\mathcal I}\,\wedge \, v_i=v^0_i, i=1...n\}$$

The relational semantics is equivalent to the state semantics of the same procedure, initialized with the assignments \(v^0_i = v_i\) for each \(i=1...n\).

Concrete Relational Summary: Let \({\mathcal E}\subset C\) be the set of exit points of the procedure. Then, \(\bigcup _{{c}\in {\mathcal E}} A_{c}\) is the concrete summary of the procedure. In the presence of local variables, they should be eliminated from this expression by existential quantification.
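The duplication of variables can be mimicked concretely: the sketch below (purely illustrative, names are ours) runs a procedure on sampled inputs and pairs each exit state with the recorded initial values, producing points \((v^0_1,\ldots ,v^0_n,v_1,\ldots ,v_n)\) of the concrete relational summary:

```python
def relational_run(proc, inputs):
    """For each sampled input vector v, record the duplicated initial values
    v0_i = v_i, run `proc`, and pair (v0, final state). The resulting set is
    a sample of the concrete input-output relation of the procedure."""
    summary = set()
    for v in inputs:
        v0 = tuple(v)                    # duplicate: record initial values
        summary.add((v0, tuple(proc(*v))))
    return summary
```

For a procedure swapping its two parameters, the sampled summary on input \((1,2)\) is \(\{((1,2),(2,1))\}\).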

3.3 A Very Simple Example

Our example program is the classical Euclidean division, shown below with its relational semantic equations:

(Figure: the Euclidean division procedure, together with its relational semantic equations.)
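For readers without the figure, here is a plausible Python reconstruction of the procedure (the control-point numbering in the comments follows the abstract equations \(P_1,\ldots ,P_6\) of Sect. 4.3; the code itself is our reconstruction, not the original figure):

```python
def euclid(a, b):
    # point 1: global precondition a >= 0, b >= 1
    assert a >= 0 and b >= 1
    q = 0
    r = a               # point 2: after the initializations
    while r >= b:       # point 3: loop head; point 4: when the test holds
        r = r - b
        q = q + 1       # point 5: end of the loop body
    return q, r         # point 6: a = b*q + r and 0 <= r <= b - 1
```

For instance, `euclid(7, 2)` returns `(3, 1)`, satisfying \(a = bq+r\) with \(0\le r\le b-1\).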

The least solution for \(A_6\), the unique exit point, is the concrete summary of the procedure: \(a=a^0\wedge b=b^0\wedge a = bq+r \wedge q\ge 0 \wedge b-1\ge r\ge 0\). Notice that it contains a nonlinear relation, so it cannot be obtained precisely by LRA. For simplicity, we assumed that all parameters are duplicated. Of course, in practice, pure input parameters (“value” parameters, whose value is not changed in the procedure) as well as pure output parameters (“result” parameters, whose initial value is not used in the procedure) need not be duplicated.

4 Relational Abstract Interpretation

4.1 General Framework

Relational Abstract Domains: A relational abstract domain is a complete lattice \({({\mathcal R}^\sharp , \sqsubseteq , \bot ,\top , \sqcap , \sqcup )}\) related to \({\mathcal R}\) by a Galois connection, i.e., a pair of increasing functions \(\alpha _{\mathcal R}: {\mathcal R}\mapsto {\mathcal R}^\sharp \) (abstraction) and \(\gamma _{\mathcal R}: {\mathcal R}^\sharp \mapsto {\mathcal R}\) (concretization), such that \(\forall r \in {\mathcal R}, r^\sharp \in {\mathcal R}^\sharp ,\,\alpha _{\mathcal R}(r) \sqsubseteq r^\sharp \,\Leftrightarrow \, r\subseteq \gamma _{\mathcal R}(r^\sharp )\).

If \(U \subseteq S\), we denote by \({ {Id}}^\sharp _U\) the abstract relation \(\alpha _{\mathcal R}({ {Id}}_U)\). If \(r^\sharp _1, r^\sharp _2 \in {\mathcal R}^\sharp \), we define their composition \(r^\sharp _1\circ r^\sharp _2\) as \(\alpha _{\mathcal R}(\gamma _{\mathcal R}(r^\sharp _1)\circ \gamma _{\mathcal R}(r^\sharp _2))\). A relational abstract domain induces two abstract domains, \(S_\rightarrow ^\sharp \) and \(S_\leftarrow ^\sharp \), on \(2^S\):

$$\forall U \subseteq S, \alpha _{S_\rightarrow }(U) = \alpha _{\mathcal R}(U\times S),\quad \alpha _{S_\leftarrow }(U) = \alpha _{\mathcal R}(S\times U)$$

Notice that both \(S_\rightarrow ^\sharp \) and \(S_\leftarrow ^\sharp \) are included in \({\mathcal R}^\sharp \). We can define the abstract projections \({ {src}}^\sharp : {\mathcal R}^\sharp \mapsto S_\rightarrow ^\sharp \) and \({ {tgt}}^\sharp : {\mathcal R}^\sharp \mapsto S_\leftarrow ^\sharp \) by:

$$ { {src}}^\sharp (r^\sharp ) = \alpha _{S_\rightarrow }({ {src}}(\gamma _{\mathcal R}(r^\sharp ))),\quad { {tgt}}^\sharp (r^\sharp ) = \alpha _{S_\leftarrow }({ {tgt}}(\gamma _{\mathcal R}(r^\sharp ))) $$
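As a toy instance of such a Galois connection, consider sets of integers abstracted by intervals. The sketch below (an illustration with assumed names, not the polyhedral domain) exhibits \(\alpha \), \(\gamma \) and the adjunction \(\alpha (U) \sqsubseteq a \Leftrightarrow U \subseteq \gamma (a)\) on a bounded universe:

```python
# Toy Galois connection: sets of integers <-> intervals (lo, hi).
# BOT (= None) is the bottom element. All names are ours.

BOT = None

def alpha(u):
    """Abstraction: the smallest interval containing the set u."""
    return BOT if not u else (min(u), max(u))

def gamma(a, universe):
    """Concretization: the set of universe elements inside interval a."""
    return set() if a is BOT else {x for x in universe if a[0] <= x <= a[1]}

def leq(a1, a2):
    """Abstract order: interval inclusion, with BOT below everything."""
    if a1 is BOT:
        return True
    if a2 is BOT:
        return False
    return a2[0] <= a1[0] and a1[1] <= a2[1]
```

One can check, for any set \(U\) and interval \(a\), that `leq(alpha(U), a)` coincides with `U <= gamma(a, universe)`.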

Relational Abstract Analysis: Let \(\rho \) be a transition relation, and \(\rho ^\sharp \) an upper bound of its abstraction. We assume the availability of both a widening and a narrowing operation \(\nabla ,\varDelta : {\mathcal R}^\sharp \times {\mathcal R}^\sharp \mapsto {\mathcal R}^\sharp \). Classically [12], an upper approximation of \(\rho ^{*}\) can be obtained by computing the limit \(r^{\sharp \nabla }\) of an increasing approximation sequence:

$$r^\sharp _0 = \bot ,\; r^\sharp _{n+1} = r^\sharp _n \nabla (r^\sharp _n\circ \rho ^\sharp )$$

then the limit \(r^{\sharp \nabla \varDelta }\) of a decreasing sequence:

$$r'^\sharp _0 = r^{\sharp \nabla }, r'^\sharp _{n+1} = r'^\sharp _n \varDelta (r'^\sharp _n\circ \rho ^\sharp )$$

The result \(r^{\sharp \nabla \varDelta }\) is an abstract approximation of \(\rho ^*\), i.e., \(\rho ^* \subseteq \gamma (r^{\sharp \nabla \varDelta })\).
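The increasing sequence can be demonstrated in the interval domain, whose standard widening keeps stable bounds and pushes unstable ones to infinity. A minimal sketch (illustrative names, intervals standing in for a general abstract domain):

```python
import math

def widen(a, b):
    """Standard interval widening: keep stable bounds, drop unstable ones
    to -inf / +inf. None plays the role of bottom."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)

def lfp_with_widening(f, bottom=None):
    """Iterate x <- x widen f(x) from bottom until stabilization,
    i.e. the sequence r_0 = bot, r_{n+1} = r_n widen f(r_n)."""
    x = bottom
    while True:
        y = widen(x, f(x))
        if y == x:
            return x
        x = y
```

With `f` modeling an abstract loop “i = 0; i = i + 1” (i.e. `f(x) = (0, 0)` from bottom, then `(min(x[0], 0), x[1] + 1)`), the iteration stabilizes at \((0, +\infty )\) after one widening step.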

Abstract Partition: For \(\leftrightarrow \in \{\leftarrow , \rightarrow \}\), we define an abstract partition of \(S_\leftrightarrow ^\sharp \) as a finite set \(\{S^\sharp _1, ..., S^\sharp _n\}\subseteq S_\leftrightarrow ^\sharp \), such that \(\{S_i=\gamma _{S_\leftrightarrow }(S^\sharp _i)\,|\, i=1...n\}\) is a partition of S. More generally, if \(U\subseteq S\), an abstract partition of \(U_\leftrightarrow ^\sharp =\alpha _{S_\leftrightarrow }(U)\) is a finite set \(\{U^\sharp _1, ..., U^\sharp _n\}\subseteq U_\leftrightarrow ^\sharp \), such that \(\{U_i=\gamma _{S_\leftrightarrow }(U^\sharp _i)\,|\, i=1...n\}\) is a partition of U.

Partitioned Relational Abstract Analysis: Let \(\{S_i=\gamma (S^\sharp _i)\,|\, i=1...n\}\) be a partition of S, let \(\rho \) be a transition relation, \(\rho (S_i,S_j)\) be defined as before for \(i,j=1...n\), and \(\rho ^\sharp (S^\sharp _i,S^\sharp _j)\) be (an upper bound of) the abstraction of \(\rho (S_i,S_j)\). An upper approximation of the vector \(\{\rho ^{\sharp *}(S^\sharp _i,S^\sharp _j) \;|\; i,j=1...n\}\) can be obtained as the limit of (vectorial) increasing-decreasing sequences corresponding to the system of fixpoint equations:

$$\begin{array}{ll} \forall i=1...n, & \displaystyle \forall j\ne i,\; \rho ^{\sharp *}(S^\sharp _i,S^\sharp _j) = \bigsqcup _{k=1}^n \rho ^{\sharp *}(S^\sharp _i,S^\sharp _k) \circ \rho ^\sharp (S^\sharp _k,S^\sharp _j)\\ & \displaystyle \rho ^{\sharp *}(S^\sharp _i,S^\sharp _i) = { {Id}}^\sharp _{S^\sharp _i} \sqcup \bigsqcup _{k=1}^n \rho ^{\sharp *}(S^\sharp _i,S^\sharp _k) \circ \rho ^\sharp (S^\sharp _k,S^\sharp _i) \end{array}$$

Abstract Summary and Abstract Effect of a Procedure Call: Let p be a procedure, \({\mathcal I},{\mathcal E}\) its set of initial and exit states. The abstract summary of p is \(\sigma ^\sharp _p= \rho ^{\sharp *}({\mathcal I}^\sharp ,{\mathcal E}^\sharp )\). The abstract effect of a call to p, with parameter passing \(\pi \), situated between \(T^\sharp _i\) and \(T^\sharp _j\) is \(\rho ^\sharp (T^\sharp _i,T^\sharp _j) = \pi (\sigma ^\sharp _p)\).

4.2 Building Summaries Using LRA

LRA makes use of the lattice of convex polyhedra [5, 17]. It abstracts a set of numerical vectors by its convex hull (i.e., its least convex superset). Notice that the convex hull of an infinite set of vectors is not necessarily a polyhedron, but the finiteness of the analysis—thanks to the use of a widening operation—ensures that all the computed approximations are polyhedra, i.e., sets of solutions of a finite system of affine inequalities.

Intersection (\(P_1\sqcap P_2\)), convex hull (\(P_1\sqcup P_2\)), projection (\(\exists X.P\)), effect of variable assignment (\(P[x\leftarrow \text{exp}]\)), widening (\(P_1\nabla P_2\)), test for inclusion (\(P_1\sqsubseteq P_2\)) and emptiness (\(P=\emptyset \)) are available. Instead of using a narrowing operator to ensure the finiteness of the decreasing sequence, a limited number of iterations of the abstract function is generally applied.

Polyhedra can be used for representing input-output relations, as an abstraction of the relational semantics described in Sect. 3.2. We write \(P(X^0,X)\) for a polyhedron involving initial values \(X^0\) and current values X. Notice that the source and the target of the relation r expressed by \(P(X^0,X)\) can be obtained by polyhedron projections:

$$ { {src}}^\sharp (r) = \exists X. P(X^0,X),\; { {tgt}}^\sharp (r) = \exists X^0. P(X^0,X)$$

4.3 Example

Let us apply LRA to our Euclidean division example. The abstract equations are as follows:

$$\begin{aligned}&P_1 = (a^0\ge 0,\; b^0 \ge 1,\; a=a^0,\; b=b^0,\; q=q^0,\; r=r^0)\\&\begin{array}{ll|rl} P_2 = & P_1[q\leftarrow 0][r\leftarrow a] \quad &\quad P_4 = & P_3 \sqcap (r\ge b) \\ P_3 = & P_2 \sqcup P_5 & P_5 = & P_4[r\leftarrow r-b][q\leftarrow q+1]\\ P_6 = & P_3\sqcap (r\le b-1) && \end{array} \end{aligned}$$

\(P_6\) corresponds to the unique exit point of the procedure, so it is the summary. The standard analysis—where the widening is applied on \(P_3\) during the increasing sequence, and the decreasing sequence is limited to 2 steps—provides:

$$\begin{aligned} P_6 = (a=a^0,\; b=b^0,\; r\ge 0,\; q \ge 0,\; b \ge r+1) \end{aligned}$$

It is a rather weak summary, especially since the precondition \(a^0\ge 0\) has been lost. This suggests that preconditions should be considered more carefully.

5 Preconditions

For closed programs, the initial state is generally not relevant, since, normally, the variables are explicitly assigned an initial value before being used. When considering procedures, the initial state is implicitly defined by the initial values of parameters. Therefore, it is essential to take it into account. In particular, the correct behavior of a procedure often depends on (user-defined) preconditions on parameter values. We will call global precondition the abstraction of the set of legal initial states of a procedure: \({\mathcal I}^\sharp _p = \alpha _{S_\rightarrow }({\mathcal I}_p)\). Notice that we already took into account the global precondition \(a\ge 0, b\ge 1\) in our example. Such a global precondition may be given by the user, deduced from another analysis of the calling contexts, or simply \(\top \).

Moreover, preconditions can be used to differentiate cases of input values (calling contexts) that should be considered separately. These preconditions will be obtained by refining the global precondition. This is the way we intend to build disjunctive summaries.

Widening under a Precondition: In relational analysis, a precondition provides an obvious invariant: a procedure may not change its initial state, so any concrete relation \(\rho ^*({\mathcal I}_p,S_i)\) has its source within \({\mathcal I}_p\). However, this is no longer guaranteed in the abstract analysis: because of the use of widening, it may happen that the result \(r^{\sharp \nabla \varDelta }\) does not satisfy this invariant, i.e., \(\gamma _{\mathcal R}(r^{\sharp \nabla \varDelta })\) is not included in \({\mathcal I}_p\times S\). This is what happened in our example (Sect. 4.3). As a consequence, it is sound and beneficial to make use of a “limited widening” when computing \(r^{\sharp \nabla }\): we define this more precise widening by \(r\nabla _{\!{\mathcal I}^\sharp _p}r' = (r\nabla r') \sqcap {\mathcal I}^\sharp _p\).

Example: Coming back to our example in Sect. 4.3, the widening is performed on \(P_3\). Instead of applying the widening classically, i.e., computing \(P_3=P_3\nabla (P_2\sqcup P_5)\), we limit it with the precondition, i.e., compute \(P_3=(P_3\nabla (P_2\sqcup P_5))\sqcap (a^0\ge 0, b^0\ge 1)\). The summary we obtain

$$\begin{aligned} P_6 = (a=a^0,\; b=b^0,\; r\ge 0,\; q \ge 0,\; b \ge r+1,\; a \ge q+r) \end{aligned}$$

recovers more than just the precondition. Instead of gaining just \(a\ge 0\), we get the stronger \(a\ge q+r\).
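In the toy interval domain, the effect of the limited widening is easy to reproduce. The sketch below (our illustration, with intervals standing in for polyhedra) shows an unstable bound being recovered from the precondition:

```python
import math

def widen(a, b):
    """Standard interval widening: unstable bounds go to +/- infinity."""
    if a is None:
        return b
    if b is None:
        return a
    return (a[0] if a[0] <= b[0] else -math.inf,
            a[1] if a[1] >= b[1] else math.inf)

def meet(a, b):
    """Interval intersection (the abstract meet), None being bottom."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def limited_widen(a, b, pre):
    """Widening limited by a precondition: (a widen b) meet pre."""
    return meet(widen(a, b), pre)
```

Here `widen((0, 0), (0, 1))` loses the upper bound entirely, while `limited_widen((0, 0), (0, 1), (0, 100))` keeps the bound \(100\) supplied by the precondition.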

6 Disjunctive Summaries

Up to now, we described the classical analysis by abstract interpretation, with an emphasis on relational analysis, use of trace partitioning, and taking care of preconditions. In this section, we propose to refine the partitioning by distinguishing the calling contexts of a procedure, defined as preconditions.

Abstract domains are generally not closed under disjunction (in some sense, it is the essence of abstraction). In order to build more precise procedure summaries, it is natural to consider disjunctions of abstract relations. However, some restrictions must be applied to be able to compute on such disjunctions. Moreover, in order to be able to exploit such a disjunctive procedure summary when using it on a procedure call, the values of the actual parameters should determine which disjunct must apply. Thus, different disjuncts should have disjoint sources.

6.1 Disjunctive Refinements of an Abstract Relation

If p is a procedure with global precondition \({\mathcal I}\), a disjunctive refinement of the abstract relation \(\rho ^{\sharp *}({\mathcal I}^\sharp ,S^\sharp _i)\) will be a finite set \(r^{\sharp *}_1, ..., r^{\sharp *}_m\) of abstract relations, such that

$$\begin{array}{l} (1)\quad \forall k=1...m, r^{\sharp *}_k\sqsubseteq \rho ^{\sharp *}({\mathcal I}^\sharp ,S^\sharp _i)\\ (2)\quad \forall k_1,k_2 = 1...m, k_1\ne k_2\Rightarrow \gamma ({ {src}}^\sharp (r^{\sharp *}_{k_1})) \cap \gamma ({ {src}}^\sharp (r^{\sharp *}_{k_2}))=\emptyset \\ (3)\quad \displaystyle \bigcup _{k=1}^m \gamma ({ {src}}^\sharp (r^{\sharp *}_k)) = \gamma ({\mathcal I}^\sharp ) \end{array}$$

In other words, \(\{{ {src}}^\sharp (r^{\sharp *}_k)\}_{k=1...m}\) forms an abstract partition of \({\mathcal I}^\sharp \). Notice that, with this definition, the disjunctive summary of a procedure can also be seen as a conjunction of implications:

$$ \bigvee _{k=1}^m r_k^{\sharp *} \;\Longleftrightarrow \; \bigwedge _{k=1}^m \left( { {src}}^\sharp (r^{\sharp *}_k)\Rightarrow r_k^{\sharp *}\right) $$

emphasizing the fact that the partitioning is directed by properties of input parameters. Conversely, given an abstract partition \(\{{\mathcal I}_k^\sharp \}_{k=1...m}\) of \({\mathcal I}^\sharp \), one can compute a disjunctive refinement of the abstract relation \(\rho ^{\sharp *}({\mathcal I}^\sharp ,S^\sharp _i)\) simply by computing \(r^{\sharp *}_k=\rho ^{\sharp *}({\mathcal I}_k^\sharp ,S^\sharp _i)\) for each \(k=1...m\).
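The concrete counterpart of this construction is easy to state on finite relations. The sketch below (names are ours) splits a summary according to a partition of the initial states, so that distinct disjuncts have disjoint sources by construction:

```python
def disjunctive_refinement(rho_star, init_cells, exit_states):
    """Split the concrete summary rho*(I, E) along a partition {I_k} of the
    initial states: each disjunct keeps only the pairs whose source lies in
    one cell I_k, so the disjuncts have pairwise disjoint sources."""
    return [{(s, t) for (s, t) in rho_star if s in cell and t in exit_states}
            for cell in init_cells]
```

For instance, with summary \(\{(0,\text{'a'}),(1,\text{'b'})\}\) and the partition \(\{\{0\},\{1\}\}\) of the initial states, the refinement yields the two disjuncts \(\{(0,\text{'a'})\}\) and \(\{(1,\text{'b'})\}\).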

6.2 Disjunctive Abstract Summary and Abstract Effect of a Call

Given a disjunctive refinement \(\{r^{\sharp *}_k\}_{k=1...m}\) of an abstract relation, the corresponding abstract summary of a procedure is a set of disjuncts:

$$\begin{aligned} \{\sigma ^\sharp _k = r^{\sharp *}_k({\mathcal I}^\sharp _p,{\mathcal E}^\sharp _p)\}_{k=1...m} \end{aligned}$$

Given such a disjunctive summary, the abstract effect of a call to p, with parameter passing \(\pi \), situated between \(T^\sharp _i\) and \(T^\sharp _j\) is

$$\begin{aligned} \rho ^\sharp (T^\sharp _i,T^\sharp _j) = \bigsqcup _{k=1}^m \pi (\sigma ^\sharp _k) \end{aligned}$$

6.3 Application to LRA

Disjunctive Polyhedral Summaries: Let p be a procedure, X be its vector of variables, and \({\mathcal I}^\sharp \) be its polyhedral global precondition. A disjunctive polyhedral summary of p is a disjunction of input-output relations expressed by a set of polyhedra \(\{R_1,...,R_m\}\), and such that, if we define \({\mathcal I}^\sharp _k= { {src}}^\sharp (R_k) = \exists X.R_k\;(k=1...m)\), the set \(\{{\mathcal I}^\sharp _k\}_{k=1...m}\) forms an abstract partition of \({\mathcal I}^\sharp \).

Polyhedron Transformer of a Procedure Call: With the same notations concerning the procedure p and its disjunctive polyhedral summary, assume that \(X=(x_1,\ldots ,x_n)\) is the list of formal parameters. Let q be a caller to p, \(A=(a_1,\ldots ,a_n)\) be the actual parameters of a call to p situated between control points \({c}\) and \({c}'\) in q. Let \(Q_{c}\) be the polyhedron associated with \({c}\) in q. Then the polyhedron associated with the return point \({c}'\) is

$$Q_{{c}'} = \bigsqcup _{k=1}^m \left( \exists A^1. Q_{c}[A/A^1] \sqcap R_k[X^0/A^1][X/A]\right) $$

where

  • \(Q_{c}[A/A^1]\) is the result of renaming, in \(Q_{c}\), each variable \(a_i\) as \(a^1_i\)

  • \(R_k[X^0/A^1][X/A]\) is the result of renaming, in \(R_k\), each variable \(x^0_i\) as \(a^1_i\), and each variable \(x_i\) as \(a_i\) (this term is what we wrote \(\pi (\sigma ^\sharp _k)\) in the general framework of Sect. 6.2).

In other words, the auxiliary variables \(A^1=(a^1_1,\ldots ,a^1_n)\) represent the values of the actual parameters before the call, so they are substituted for A in the calling context \(Q_{c}\) and for \(X^0\) in the summary; the values A of the actual parameters after the call are substituted for X in the summary.
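To make the two renaming steps concrete, here is a minimal Python sketch (our own illustration, not the tool's code; the names `rename`, `Q_c`, `R_k` are hypothetical). A polyhedron is represented as a list of constraints \(\sum _v c_v v \le b\); the final conjunction is just constraint concatenation, and the projection \(\exists A^1\) would be delegated to a real polyhedra library such as Apron.

```python
# A "polyhedron" is a list of constraints, each a (coeffs, bound) pair
# meaning sum(coeffs[v] * v) <= bound over named variables.

def rename(poly, mapping):
    """Rename variables in every constraint of `poly` according to `mapping`."""
    return [({mapping.get(v, v): c for v, c in coeffs.items()}, b)
            for coeffs, b in poly]

# Summary disjunct R_k over the formal x (final value) and x0 (initial value):
# x0 - x <= 0 and x - x0 <= 0, i.e. x == x0 (an identity summary).
R_k = [({"x0": 1, "x": -1}, 0), ({"x": 1, "x0": -1}, 0)]

# Calling context Q_c over the actual parameter a: a <= 5.
Q_c = [({"a": 1}, 5)]

# Step 1: Q_c[A/A1] -- the pre-call value of a becomes the auxiliary a1.
Q_c_renamed = rename(Q_c, {"a": "a1"})
# Step 2: R_k[X0/A1][X/A] -- initial formals become a1, final formals become a.
R_k_renamed = rename(R_k, {"x0": "a1", "x": "a"})

# Conjunction: concatenate the constraint lists. Eliminating a1 (the
# projection \exists A1) requires a real polyhedra library, omitted here.
Q_c_prime = Q_c_renamed + R_k_renamed
```

The resulting constraint set relates the post-call value of `a` to its pre-call value `a1`, exactly as in the formula for \(Q_{{c}'}\) above.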

7 Partition Refinement

7.1 General Framework

Given an abstract partition of the global precondition of a procedure, we know how to compute and use a disjunctive summary based on this partition. In this section, we propose a heuristic method to choose the abstract partition.

Complementable Abstract Values: An abstract value \(r^\sharp \) is said to be complementable if there exists an abstract value \(\,\overline{r^\sharp }\,\) (its complement) such that \(r^\sharp \sqcap \,\overline{r^\sharp }\,=\bot \) and \(\gamma (r^\sharp ) \cup \gamma (\,\overline{r^\sharp }\,) = {\mathcal R}\). For instance, complementable convex polyhedra are half-spaces, i.e., polyhedra defined by a single inequality.

Refinement According to Local Reachability: Let \(\{r^{\sharp \nabla \varDelta }({\mathcal I}^\sharp ,S^\sharp _i)\}_{i=1...n}\) be the result of a classic analysis from a precondition \({\mathcal I}^\sharp \). For a given \(i\in \{1...n\}\), \({\mathcal I}^\sharp _i={ {src}}^\sharp (r^{\sharp \nabla \varDelta }({\mathcal I}^\sharp ,S^\sharp _i))\) is a necessary condition for \(S^\sharp _i\) to be reachable. As a consequence, if \(s^\sharp \) is a complementable abstract value such that

  • \({\mathcal I}^\sharp _i\sqsubseteq s^\sharp \)

  • \({\mathcal I}'^\sharp = {\mathcal I}^\sharp \sqcap s^\sharp \ne \bot \) and \({\mathcal I}''^\sharp = {\mathcal I}^\sharp \sqcap \,\overline{s^\sharp }\,\ne \bot \)

then \(({\mathcal I}'^\sharp ,{\mathcal I}''^\sharp )\) is a good candidate for refining the precondition \({\mathcal I}^\sharp \). As a matter of fact, \({\mathcal I}''^\sharp \) is a sufficient precondition for \(S^\sharp _i\) to be unreachable.

Refinement According to the Summary of a Called Procedure: The effect of a call to a procedure with a partitioned summary \(\{\sigma ^\sharp _k\}_{k=1...m}\) (as defined in Sect. 6.2) involves a least upper bound \(\bigsqcup _{k=1}^m \pi (\sigma ^\sharp _k)\), which is likely to lose precision. So it is interesting to refine the partition in the caller in order to split this least upper bound. Let \({\mathcal J}^\sharp _k = \pi ({ {src}}^\sharp (\sigma ^\sharp _k))\) be the condition on actual parameters for \(\sigma ^\sharp _k\) to be applicable. Then, in the caller, \({\mathcal I}^\sharp _k = { {src}}^\sharp (r^{\sharp \nabla \varDelta }({\mathcal I}^\sharp ,{\mathcal J}^\sharp _k))\) is a necessary precondition for \({\mathcal J}^\sharp _k\) to be satisfiable. As a consequence, if \(s^\sharp \) is a complementable abstract value such that

  • \({\mathcal I}^\sharp _k\sqsubseteq s^\sharp \)

  • \({\mathcal I}'^\sharp = {\mathcal I}^\sharp \sqcap s^\sharp \ne \bot \) and \({\mathcal I}''^\sharp = {\mathcal I}^\sharp \sqcap \,\overline{s^\sharp }\,\ne \bot \)

then \(({\mathcal I}'^\sharp ,{\mathcal I}''^\sharp )\) is a good candidate for refining the precondition \({\mathcal I}^\sharp \). As a matter of fact, \({\mathcal I}''^\sharp \) is a sufficient precondition for \({\mathcal J}^\sharp _k\) to be unsatisfiable at the call.

Iterative Refinements: Our proposal is to build the summary of a procedure as the result of a sequence of analyses, working on more and more refined partitions. Let \({\mathcal P}^{(\ell )} = \{{\mathcal I}^{\sharp (\ell )}_k\}_{k=1...m_\ell }\) be the partition of abstract preconditions considered at the \(\ell \)-th analysis. Starting with \({\mathcal P}^{(0)} = \{{\mathcal I}^\sharp \}\) (the singleton made of the global precondition of the procedure), for each \(\ell \), we compute from \({\mathcal P}^{(\ell )}\) the corresponding disjunctive abstract relation \(\{r^{\sharp (\ell )}_k\}_{k=1...m_\ell }\), which is used to refine \({\mathcal P}^{(\ell )}\) into \({\mathcal P}^{(\ell +1)}\), using one of the refinement techniques presented above. This process is not guaranteed to terminate, but may be stopped at any step. In practice, the size of the partition will be limited by a constant parameter of the analysis.
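As an illustration only, this refinement loop can be instantiated in a much simpler abstract domain. The following Python sketch (our own, with hypothetical names; integer intervals in place of polyhedra) analyzes a procedure computing y = |x|: the first analysis exposes the branch condition x ≥ 0 as a split candidate, and a single refinement step yields an exact disjunctive summary.

```python
# Toy instance of iterative partition refinement, with integer intervals in
# place of polyhedra. The analyzed procedure is: if x >= 0: y = x else: y = -x

def analyze(lo, hi):
    """Interval summary of y from precondition x in [lo, hi]: the interval
    join (hull) of the two branch results."""
    outs = []
    if hi >= 0:                       # then-branch (x >= 0) reachable: y = x
        outs.append((max(lo, 0), hi))
    if lo <= -1:                      # else-branch (x <= -1) reachable: y = -x
        outs.append((-min(hi, -1), -lo))
    return (min(a for a, _ in outs), max(b for _, b in outs))

def refined_summary(lo, hi):
    """One refinement step: if both branches are reachable, split the
    precondition on the (complementable) condition x >= 0 and re-analyze
    each cell, yielding a list of (precondition, output interval) disjuncts."""
    if lo <= -1 and hi >= 0:
        return [((lo, -1), analyze(lo, -1)), ((0, hi), analyze(0, hi))]
    return [((lo, hi), analyze(lo, hi))]
```

With precondition x ∈ [-10, 5], the monolithic summary only says y ∈ [0, 10], while the refined partition gives y ∈ [1, 10] when x ∈ [-10, -1] and the exact y ∈ [0, 5] when x ∈ [0, 5].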

Ensuring the Monotonicity of the Refinement: Refining a precondition is intended to provide a more precise summary. However, this is not guaranteed because of the non-monotonicity of the widening operator. So at step \(\ell \), when precondition \({\mathcal I}^{\sharp (\ell )}_k\) has been split into a pair \(({\mathcal I}^{\sharp (\ell +1)}_{k'}, {\mathcal I}^{\sharp (\ell +1)}_{k''})\) of new preconditions, the analyses performed at step \(\ell +1\) from these new preconditions should use a widening limited by \(r^{\sharp (\ell )}_k\). The monotonicity of the refinement is especially important when dealing with recursive procedures, and avoids the difficulties tackled by [4].

7.2 Application to LRA

Complementable Polyhedra: As said before, complementable polyhedra are those defined by a single inequality. So any polyhedron is the intersection of a finite number of complementable polyhedra. The complement of “\(aX\le b\)” is obtained either with the converse strict inequality “\(aX>b\)” (strict inequalities are handled in the PPL [5, 8] and in Apron [29]), or, in case of integer variables, by the inequality “\(aX\ge b+1\)”.
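For integer variables, this complement is easy to compute. A small Python sketch (ours, with a hypothetical constraint representation) of the rule "the complement of \(aX\le b\) is \(aX\ge b+1\)", renormalized to a \(\le \) constraint:

```python
# The complement of sum(coeffs[v]*v) <= bound over integer-valued variables
# is sum(coeffs[v]*v) >= bound + 1, i.e. sum(-coeffs[v]*v) <= -(bound + 1).

def complement_int(coeffs, bound):
    """Integer complement of a half-space given as (coeffs, bound)."""
    return ({v: -c for v, c in coeffs.items()}, -(bound + 1))

def satisfies(point, constraint):
    coeffs, bound = constraint
    return sum(c * point[v] for v, c in coeffs.items()) <= bound

# Example: x <= 4 over integers; its complement is -x <= -5, i.e. x >= 5.
chi = ({"x": 1}, 4)
chi_bar = complement_int(*chi)

# Every integer point satisfies exactly one of chi, chi_bar.
assert all(satisfies({"x": x}, chi) != satisfies({"x": x}, chi_bar)
           for x in range(-10, 10))
```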

Precondition Refinement: From a precondition \({\mathcal I}^\sharp \), a standard analysis by LRA provides, at each control point \({c}\) of the program, a polyhedron \(P_{c}({\mathcal I}^\sharp )\). From these solutions, we can try to refine the precondition:

  • For each control point \({c}\), let \(Q_{c}=\exists X.P_{c}({\mathcal I}^\sharp )\) be the projection of \(P_{c}({\mathcal I}^\sharp )\) on initial variables. Then, if \(Q_{c}\ne {\mathcal I}^\sharp \), any constraint \({\chi }\) of \(Q_{c}\) not satisfied by \({\mathcal I}^\sharp \) can be used to separate \({\mathcal I}^\sharp \) into \({\mathcal I}^\sharp _1= {\mathcal I}^\sharp \cap {\chi }\) and \({\mathcal I}^\sharp _2= {\mathcal I}^\sharp \cap \,\overline{{\chi }}\,\), since the point \({c}\) is unreachable by any execution starting in \({\mathcal I}^\sharp _2\). Obviously, this should be tried on control points following a test, and especially those corresponding to loop conditions.

  • For each control point \({c}\) corresponding to a call to a procedure, say p(A), let \(\{R_1,...,R_m\}\) be the polyhedral summary of p, and for each \(k=1...m\), let \({\mathcal J}^\sharp _k(p) = { {src}}^\sharp (R_k)[X^0/A]\) (i.e., \({\mathcal J}^\sharp _k(p)\) is the precondition of \(R_k\), expressed on actual parameters). Let \(Q_{{c},k}=\exists X.P_{c}({\mathcal I}^\sharp )\sqcap {\mathcal J}^\sharp _k(p)\) be the projection of \(P_{c}({\mathcal I}^\sharp )\sqcap {\mathcal J}^\sharp _k(p)\) on the initial variables of the caller. Then, as before, if \(Q_{{c},k} \ne {\mathcal I}^\sharp \), any constraint \({\chi }\) of \(Q_{{c},k}\) not satisfied by \({\mathcal I}^\sharp \) can be used to separate \({\mathcal I}^\sharp \) into \({\mathcal I}^\sharp _1= {\mathcal I}^\sharp \cap {\chi }\) and \({\mathcal I}^\sharp _2= {\mathcal I}^\sharp \cap \,\overline{{\chi }}\,\); this is interesting since starting the caller in \({\mathcal I}^\sharp _2\) makes the precondition \({\mathcal J}^\sharp _k(p)\) empty.

Notice that, in both cases, the choice of the constraint \({\chi }\) is arbitrary, and that several such constraints can be used in turn. So the fact that the refinement is done according to one single constraint is not a limitation.

7.3 Example

The analysis of the example in Sect. 4.3, from the precondition \({\mathcal I}^{\sharp (0)}= (a^0\ge 0 , b^0\ge 1)\), as done in Sect. 5, provides, on the branches of the loop condition \((r\ge b)\), the solutions:

$$\begin{array}{ll} P_4({\mathcal I}^{\sharp (0)}) = &{}(a^0=a , b^0=b, r\ge b, q\ge 0, b \ge 1 , a \ge q+r)\\ P_6({\mathcal I}^{\sharp (0)}) = &{}(a^0=a , b^0=b, r\ge 0, q\ge 0, b \ge r+1 , a \ge q+r) \end{array} $$

The projections of these solutions on the initial values are:

$$\begin{array}{l} { {src}}^\sharp (P_4({\mathcal I}^{\sharp (0)})) = (a^0\ge b^0 \ge 1) \quad { {src}}^\sharp (P_6({\mathcal I}^{\sharp (0)})) = (a^0\ge 0 , b^0 \ge 1) \end{array} $$

\({ {src}}^\sharp (P_6({\mathcal I}^{\sharp (0)})) = {\mathcal I}^{\sharp (0)}\), so it does not induce any refinement. However, \({ {src}}^\sharp (P_4({\mathcal I}^{\sharp (0)})) \ne {\mathcal I}^{\sharp (0)}\), since \({\mathcal I}^{\sharp (0)}\) does not imply \(a^0\ge b^0\). We can refine \({\mathcal I}^{\sharp (0)}\) into

$${\mathcal I}^{\sharp (1)}_1 = (a^0\ge b^0 \ge 1) \text{ and } {\mathcal I}^{\sharp (1)}_2 = (b^0-1 \ge a^0\ge 0)$$

i.e., separate the cases where the loop is entered at least once or not. New analyses from these refined preconditions provide:

$$\begin{array}{ll} P_4({\mathcal I}^{\sharp (1)}_1) = &{}(a^0=a , b^0=b, r\ge b, q\ge 0, b \ge 1 , a \ge q+r)\\ P_6({\mathcal I}^{\sharp (1)}_1) = &{}(a^0=a , b^0=b, r\ge 0, q\ge 0, q+r\ge 1, b \ge r+1 , \\ &{}\quad a+1 \ge b+q, a\ge b)\\ P_4({\mathcal I}^{\sharp (1)}_2) = &{}\bot \\ P_6({\mathcal I}^{\sharp (1)}_2) = &{}(a^0=a , b^0=b, b-1 \ge a\ge 0, q=0, r=a) \end{array} $$

The projections of these solutions on the initial values are:

$$\begin{array}{ll} { {src}}^\sharp (P_4({\mathcal I}^{\sharp (1)}_1)) = &{}(a^0 \ge b^0 \ge 1) = {\mathcal I}^{\sharp (1)}_1\\ { {src}}^\sharp (P_6({\mathcal I}^{\sharp (1)}_1)) = &{}(a^0 \ge b^0 \ge 1) = {\mathcal I}^{\sharp (1)}_1\\ { {src}}^\sharp (P_4({\mathcal I}^{\sharp (1)}_2)) = &{}\bot \\ { {src}}^\sharp (P_6({\mathcal I}^{\sharp (1)}_2)) = &{}(b^0-1 \ge a^0 \ge 0) = {\mathcal I}^{\sharp (1)}_2 \end{array} $$

so, according to our criteria, the preconditions cannot be further refined, and we get the summary

$$\begin{array}{ll} \displaystyle R_1 = &{}\left( a^0=a , b^0=b, a^0\ge b^0 \ge 1, \right. \\ \displaystyle &{} \left. \quad r\ge 0, q\ge 0, q+r\ge 1, b \ge r+1 , a+1 \ge b+q\right) \\ \displaystyle R_2 = &{}\left( a^0=a , b^0=b, b^0-1 \ge a^0\ge 0, q=0, r=a \right) \end{array} $$

directed by input conditions \(R^0_1 = (a^0\ge b^0 \ge 1)\) and \(R^0_2 = (b^0-1 \ge a^0\ge 0)\).
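As a sanity check (ours, not part of the analysis), this disjunctive summary can be tested against concrete executions of the procedure of Sect. 4.3, transcribed here in Python as Euclidean division:

```python
# Brute-force check that the disjunctive summary {R1, R2} holds on concrete
# runs of the Euclidean-division procedure: q = a div b, r = a mod b.

def divide(a, b):
    q, r = 0, a
    while r >= b:
        r, q = r - b, q + 1
    return q, r

for a in range(0, 30):
    for b in range(1, 30):
        q, r = divide(a, b)
        if a >= b:       # input condition of R1: a0 >= b0 >= 1
            assert r >= 0 and q >= 0 and q + r >= 1 \
                   and b >= r + 1 and a + 1 >= b + q
        else:            # input condition of R2: b0 - 1 >= a0 >= 0
            assert q == 0 and r == a
```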

7.4 A Last Improvement: Postponing Loop Feedback

The previous example shows a weakness of the analysis: the summary has been partitioned according to whether the loop is entered at least once (\(R_1\)) or not (\(R_2\)). However, in the former case, since the loop body is executed at least once, we should obtain \(q\ge 1\), a fact which is missed by the analysis. We could recover this fact by systematically unrolling once each loop that gives rise to such a partitioning. We propose another, cheaper solution. The problem comes from the least upper bound computed at loop entry (\(P_3=P_2\,\sqcup \,P_5\) in the abstract equations of Sect. 4.3), before the test on the loop condition (\(P_6=P_3\sqcap (r\le b-1)\)). The solution consists in permuting the least upper bound and the test, computing instead \(P_6 = (P_2 \sqcap (r\le b-1)) \sqcup (P_5 \sqcap (r\le b-1))\).

Back to the Example: Computing \(R_1 = P_6({\mathcal I}^{\sharp (1)}_1)\) with this new equation, since \(P_2 \sqcap (r\le b-1) = \bot \), we get

$$\begin{array}{ll} \displaystyle R_1 =&\left( a^0=a , b^0=b, a^0\ge b^0 \ge 1,r\ge 0, q\ge 1, b \ge r+1 , a+1 \ge b+q+r\right) \end{array} $$

Once again, we recover more precision than expected, since, in addition to finding \(q\ge 1\), \(a+1 \ge b+q\) is strengthened into \(a+1 \ge b+q+r\).

8 Recursive Procedures

The relational abstract interpretation of recursive procedures was proposed a long time ago [13, 15, 21]. It involves the use of widening, since the summary of a recursive procedure depends on itself. Moreover, a group of mutually recursive procedures must be analyzed jointly, with widening applied on a cutset of their call graph. In this section, we only show a simple example of how our technique can be applied to build a disjunctive summary of a recursive procedure. It will also illustrate the refinement according to the summary of a called procedure.

[Figure b: code of McCarthy’s 91 function]

Example: McCarthy’s 91 Function. The procedure opposite is the well-known “91 function” defined by John McCarthy. For simplicity, we do not duplicate parameters, since x is a value parameter and y is a result parameter. The polyhedral summary of the procedure can be defined by the following equations:

$$\begin{array}{l} R(x,y) = P_2 \sqcup P_6 \\ P_2 = (x \ge 101, y = x-10)\\ P_6 = (x\le 100) \sqcap (\exists t. R(x+11,t) \sqcap R(t,y)) \end{array}$$

A first, standard analysis, without partitioning, reaches the following fixpoint after one widening step:

$$\begin{array}{l} P_2 = (x \ge 101, y = x-10),\quad P_6 = (x\le 100, y+9\ge x, y\ge 91)\\ R^{(0)} = (x\le y+10, y \ge 91)\end{array}$$

Since \({ {src}}^\sharp (P_2)=(x\ge 101)\) splits the global precondition \({\mathcal I}^\sharp =\top \), we refine the precondition into \({\mathcal I}_1^{\sharp (1)} = (x\ge 101)\) and \({\mathcal I}_2^{\sharp (1)} = (x\le 100)\). From this (obvious) partition, the results are not much better:

$$\begin{array}{lll} P_2({\mathcal I}_1^{\sharp (1)}) = (x \ge 101, y = x-10),&{} &{}P_2({\mathcal I}_2^{\sharp (1)}) = \bot \\ P_6({\mathcal I}_1^{\sharp (1)}) = \bot ,&{} &{}P_6({\mathcal I}_2^{\sharp (1)}) = (x\le 100, y\ge 91)\\ R^{(1)}({\mathcal I}_1^{\sharp (1)}) = (x \ge 101, y = x-10),&{}&{}R^{(1)}({\mathcal I}_2^{\sharp (1)})= (x\le 100, y\ge 91) \end{array}$$

But now, the partitioned precondition involves a refinement of \({\mathcal I}_2^{\sharp (1)}\) at the first recursive call, according to the condition \(x+11 \ge 101\). We get \({\mathcal I}_1^{\sharp (2)}=(90\le x\le 100)\) and \({\mathcal I}_2^{\sharp (2)}=(x\le 89)\). The final result is

$$\begin{array}{l} R^{(1)}({\mathcal I}_1^{\sharp (1)}) = (x \ge 101, y = x-10)\\ R^{(2)}({\mathcal I}_1^{\sharp (2)}) = (90\le x\le 100, y=91)\\ R^{(2)}({\mathcal I}_2^{\sharp (2)}) = (x\le 89, y=91) \end{array}$$

which is the most precise summary.
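This summary can be checked against a direct Python transcription of the procedure (our own test, not part of the analysis): the three disjuncts are exact for every input.

```python
# McCarthy's 91 function, transcribed directly from the recursive procedure.

def mc91(x):
    if x >= 101:
        return x - 10
    t = mc91(x + 11)   # first recursive call
    return mc91(t)     # second recursive call

# Check the summary disjuncts:
#   x >= 101        -> y = x - 10
#   90 <= x <= 100  -> y = 91
#   x <= 89         -> y = 91
for x in range(-50, 200):
    if x >= 101:
        assert mc91(x) == x - 10
    else:
        assert mc91(x) == 91
```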

9 Implementation

This approach has been implemented in a prototype static analyzer. Organized as a collection of tools, the analyzer computes numerical invariants on programs written in a significant subset of C. A front-end tool based on Clang [34] and LibTooling translates the abstract syntax tree of a C program into an intermediate representation. The analyzer tool then computes numerical invariants on the intermediate representation. Abstract domains, such as convex polyhedra, are provided by the Apron [29] library. The analyzer can either consider an inlined version of the program, or construct and use procedure summaries as described in the paper, with some restrictions: recursive procedures are not yet taken into account, and postponing the loop feedback is not performed as described in Sect. 7.4, but makes use of “loop inversion”.

Procedures are analyzed only once, regardless of the number of call sites, bottom-up in reverse invocation order with respect to the dependencies induced by the program call graph.

In order to limit the number of additional variables, the tool does not duplicate all procedure parameters: it applies a simple dataflow analysis before summary construction to identify parameters which are either pure input or pure output parameters, and thus do not need to be duplicated.

Refinement according to local reachability is performed by the analyzer only at direct successors of test nodes, particularly at loop entry and loop exit. Candidate nodes for refinement are examined during each refinement step using a breadth-first traversal of the program graph. For practical reasons, in order to guarantee the termination of the refinement process and to limit procedure summaries to a reasonable size, an upper bound \( \theta \) on the refinement depth for a given procedure is set to \( \theta =2 \). Since each refinement step splits at most every disjunct in two, this limits procedure summaries to a maximum size of 4.

10 Experiments

Up to now, to illustrate our approach, we presented only tiny examples, for which the precondition partitioning is obvious. However, in the presence of more complex control structures—nested and/or successive loops—and when preconditions result from more involved invariants, the usefulness of our method for discovering relevant preconditions is more convincing. Several more complex ad hoc examples can be found in the repository github.com/programexamples/programexamples.

More thorough experiments are necessary to validate our approach, and in particular to answer the following questions:

  • Since we analyze a procedure several times to construct its summary, summary construction is likely to be time-consuming. It is therefore interesting to measure its cost with respect to the time hopefully saved by using the summary.

  • Precondition partitioning is a heuristic process, so it is important to evaluate the precision lost or gained by using a disjunctive summary instead of analyzing again the procedure for each calling context.

So our experiments consist in comparing our bottom-up approach with an analysis of inlined programs, both with respect to the analysis time and to the precision of results. Several difficulties must be addressed first:

  • Most public benchmarks are not usable, since they contain very few numerical programs with procedures. For instance, in the SV-COMP benchmark, most numerical examples are inlined; the ALICe benchmark also contains only monolithic programs. For our assessment, we used the benchmark of the Mälardalen WCET research group, which contains various small and middle-sized programs, such as sorts, matrix computations, fft, etc. Moreover, some programs of this benchmark were extended with auxiliary variables counting the number of executions of each block, to help the evaluation of the execution time [10]; these extensions—whose names are prefixed with “cnt_” below—are interesting for us, since they contain more numeric variables.

  • The comparison of polyhedral results is not straightforward:

    • On the one hand, we must decide which polyhedra to compare. The correspondence of control points between the inlined program and the structured one is not easy to preserve. In our experiments, we only compared the results at the end of the main program. Of course, for the comparison to be meaningful, the results on the inlined program must first be projected on the variables of the main program.

    • On the other hand, while a qualitative comparison of two polyhedra is easy—by checking their inclusion in both directions—a quantitative comparison is more difficult: it could be precisely achieved by comparing their volumes—algorithms are available for that [7, 11]—but this is only possible for bounded polyhedra. In our assessment, besides a qualitative comparison, we only compared the numbers of constraints.
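The qualitative comparison amounts to two inclusion tests in the abstract order. A Python sketch (ours; intervals stand in for polyhedra, with `leq` playing the role of polyhedron inclusion as provided by a library such as Apron):

```python
# Qualitative comparison of two abstract values by mutual inclusion.

def leq(p, q):
    """Interval inclusion: p = (lo, hi) contained in q."""
    return q[0] <= p[0] and p[1] <= q[1]

def compare(p, q):
    """Classify p w.r.t. q from the two inclusion tests."""
    below, above = leq(p, q), leq(q, p)
    if below and above:
        return "="
    if below:
        return "strictly included"   # p is the more precise value
    if above:
        return "strictly includes"
    return "incomparable"
```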

All our experiments are done using the convex polyhedra abstract domain. Widening is never delayed and decreasing sequences are limited to 7 terms. The analysis times are those obtained on an Intel Xeon E5-2630 v3 2.40 GHz machine with 32 GB of RAM and 20 MB of L3 cache, running Linux.

Table 1 compares our method with a standard LRA on inlined programs, in terms of analysis time, qualitative precision and number of constraints of results found at the exit points of the main procedures. The “# procs” column gives the number of procedures in each program and the “max. # calls” column gives the maximum number of call sites per procedure in a program. We define:

  • the time (in seconds) for analyzing the inlined program, and the time for the interprocedural analysis

  • the polyhedron resulting from the inlined analysis, and the one resulting from the interprocedural analysis

  • the number of constraints of each of these two polyhedra.

The qualitative comparison is shown by the “cmp. res.” column, which indicates whether the result of our method is better (\(\sqsupset \)), worse (\(\sqsubset \)), equal (\(=\)) or incomparable (\(<>\)) w.r.t. the result of the inlined analysis. The \( S\) column gives for each program the speedup of our method, defined as the ratio of the inlined analysis time to the interprocedural analysis time. Our method is significantly faster than standard LRA using inlining for 13 out of 19 programs (\(\approx 68\%\) of programs), with an average speedup of 2.9. The loss of precision is very moderate, since only 1 out of 19 programs, namely minver, has a less precise convex polyhedron at the exit node of the main procedure.

Table 1. Experimental results.

Interestingly, our method also leads to precision improvements for some programs, such as janne_complex, my_sin and cnt_minver, due to the use of disjunction, enabling a more accurate analysis of procedure behaviors. Moreover, these precision improvements are not necessarily obtained at the expense of analysis time: the janne_complex program has a more precise convex polyhedron at the exit of the main procedure, with a \( 60\% \) increase in the number of constraints, and also has the highest speedup, with \( S= 15.34 \).

Table 2 reports the computation times of the summary of each procedure in each program. The \( \tau _c \) column gives the fraction of the analysis time using our method spent during the computation of each procedure summary, defined as \( \tau _c = \text {Procedure summary comp. time}/ \text {Program analysis time using rel. summ.} \).

Table 2. Summaries computation times.

The summary construction times for small utility procedures, such as the my_fabs, my_sin, my_cos and my_log procedures in the fft1 and cnt_fft1 programs, are very small (lower than 4 ms) and often individually negligible with respect to the analysis time of the entire program (with \( \tau _c \) often lower than \( 1\% \)). This suggests that our method could be particularly beneficial, in terms of analysis performance, for programs built on top of a collection of utility procedures or a library of such procedures, each procedure summary being computed only once and possibly used in many call contexts.

[Figure c]

Our last experiment concerns the speedup of our interprocedural analysis with respect to the number of calls. Notice that the Mälardalen benchmark is not very favorable in this respect, since most procedures are called only once. Our analysis of the cnt_ns program has a moderate speedup of 1.25. In order to observe the evolution of the speedup with the number of calls, we increased the number of calls to the foo procedure in the main procedure of the cnt_ns program. The graphic opposite shows the evolution of the analysis times of these successive versions, comparing our analysis with standard LRA with inlining.

[Figure d: analysis times as a function of the number of calls to foo]

The analysis of the cnt_ns program using our disjunctive relational summaries becomes significantly faster than standard LRA with inlining when there are more than 2 calls to the foo procedure in the main procedure.

11 Conclusion and Future Work

In this paper, we proposed a method for interprocedural analysis as a solution to the cost of using expressive relational abstract domains in program analysis. An analysis using a relational domain can be straightforwardly transformed into a relational analysis, computing an input-output relation. Such relations can be used as procedure summaries, computed once and for all, and used bottom-up to compute the effect of procedure calls. Applying this idea with linear relation analysis, we concluded that the obtained polyhedral summaries are not precise enough, and deserve to be refined disjunctively. The main ideas of the paper are as follows. First, we used precondition partitioning as a basis of disjunctive summaries. Then, we proposed a heuristic method for refining a summary according to reachability of control points or calling contexts of called procedures. We also identified some technical improvements, like widening limited by preconditions and previously computed relations, and more precise computation of results at loop exit points. Our experiments show that using summaries built in this way can significantly reduce the analysis time, especially for procedures used several times. On the other hand, the precision of the results is not dramatically damaged, and can even be improved, due to disjunctive analysis.

Future work should be devoted to applying the method with other relational domains. In particular, octagons [39] would be interesting since they permit a better quantitative comparison of results: apart from infinite bounds, two octagons on the same variables can be precisely compared by comparing their constant vectors. Another, longer-term, perspective is to use disjunctive relational summaries for procedures acting on remanent memories, like methods in object-oriented programming or reaction functions in reactive programming. Our precondition partitioning could result in partitioning memory states, and allow disjunctive memory invariants to be constructed modularly.