
1 Introduction

Automated techniques for modular reasoning about interprocedural recursive programs have a rich history, spanning interprocedural dataflow analysis [55, 57], abstract interpretation [18], and software model checking [6]. These techniques exploit the inherent modularity in a program by deriving a summary for each procedure. Procedure summaries can be viewed as specifications or interface contracts, where internal implementation details have been abstracted away. In addition to aiding code understanding and maintenance, they can be combined to verify the full program. A modular verification approach that infers and composes procedure summaries may scale better than a monolithic one that considers all procedure implementations at once.

A popular modern approach is to encode interprocedural program verification problems as Constrained Horn Clauses (CHCs) [32], in which uninterpreted predicates serve as placeholders for procedure summaries. A CHC solver then finds interpretations for these predicates; these interpretations correspond to procedure summaries.

CHC solvers [13, 27, 32, 39, 42, 49, 60] query backend SMT (Satisfiability Modulo Theories) solvers [8] to find interpretations that make all CHC rules valid. In addition to classic fixpoint computations, CHC solvers use model checking techniques, e.g., counterexample guided abstraction refinement (CEGAR) [17], interpolation [46], property-directed reachability (PDR) [12, 23], and guess-and-check procedures [25]. They can thus find procedure summaries adequate for verification but not necessarily least or greatest fixpoints. CHC-based verifiers have been successfully applied to a range of benchmark programs, but significant challenges remain in handling mutual recursion and in scalability.

We aim to address these challenges by leveraging program structure during solving and learning relevant facts. Typical CHC-based verifiers may not maintain a program’s structure when encoding it into CHCs. In contrast, our method uses the program call graph, which can be preserved easily in a CHC encoding, to guide proof search.

To improve scalability, we ensure that the SMT queries in our method are always bounded in size even when more of the program is explored. We wish both to maintain scalability and to avoid learning over-specialized facts. We achieve this by leveraging the call graph of the program, i.e., analyzing a procedure in the context of a bounded number of levels in the call graph. Furthermore, such a notion of a bounded environment enables us to refer to bounded call paths in the program and learn special lemmas, called EC (Environment-Call) Lemmas, to capture relationships among summaries of different procedures on such paths. These lemmas are beneficial in handling mutual recursion.

Other techniques also trade off scalability and relevance by considering a bounded number of levels in a call graph, e.g., in bounded context-sensitivity or k-sensitive pointer/alias analysis [51], stratified inlining [44], and depth cutoff [40] in program verification. However, other than Spacer [42], which is restricted to \(k\,=\,1\) bounded environments, existing CHC solvers do not use bounded environments to limit the size of SMT queries.

Summary of Contributions. This paper’s contributions are as follows:

  • We propose a new CHC-solving method for generating procedure summaries for recursive interprocedural programs (Sect. 6).

  • We propose to handle mutual recursion by explicitly learning EC Lemmas to capture relationships among different procedures on a call path (Sect. 5).

  • We propose to use bounded environments (with bound \(k \ge 1\)) (Sect. 4) to compute individual procedure summaries. The SMT queries formulated in our method are always bounded in size, thereby improving scalability.

  • We have implemented our method in a tool called Clover and report on its evaluation on several benchmark programs, along with a detailed comparison against existing tools (Sect. 7).

To the best of our knowledge, EC Lemmas and bounded environments, the main features of our algorithm, are novel for summary generation in modular verifiers.

2 Motivating Example

We illustrate the main steps of our modular algorithm on the example program shown in Fig. 1a. To keep our focus on intuition, we describe our algorithm in terms of the program (CHC encodings are described later).

In Fig. 1a, e and o are defined mutually recursively and return \( true \) iff their argument is respectively even or odd. Procedure f returns the (always-even) result of calling h on g’s result, where g returns an arbitrary odd number and h adds one to its input. The safety specification is that \(\mathtt {e(f() - 1)}\) never holds. We aim to infer over-approximate procedure summaries so that the assertion’s truth follows from replacing procedure calls in main with these summaries.
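A minimal Python sketch consistent with this description (the actual code in Fig. 1a may differ, e.g., in how the arbitrary odd number is produced; here havoc is modeled with a random choice):

```python
import random

def e(x):                  # true iff x is even (mutually recursive with o)
    return True if x == 0 else o(x - 1)

def o(x):                  # true iff x is odd (mutually recursive with e)
    return False if x == 0 else e(x - 1)

def g():                   # returns an arbitrary odd number
    return 2 * random.randint(0, 100) + 1

def h(x):                  # adds one to its input
    return x + 1

def f():                   # h(g()) is always even
    return h(g())

def main():
    assert not e(f() - 1)  # safety specification: e(f() - 1) never holds
```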

We maintain context-insensitive over- and under-approximate summaries for all procedures, each of which captures both pre- and post-conditions of its procedure. All over- (resp. under-) approximate summaries are initially \(\top \) (resp. \(\perp \)). At each step, we choose a target procedure p and its bounded environment, then update p’s summaries based on the results of SMT queries on its over- or under-approximate body. We also allow the bounded environment to be over- or under-approximated, leading to four kinds of SMT queries. These queries let us over- and under-approximate any procedure that is called before or after the target, unlike Spacer [42] or Smash [31], which use two kinds of SMT queries.

Fig. 1. Example: (a) source code, (b) call graph, and (c) final derivation tree.

Table 1. Relevant steps to verify program in Fig. 1a.

Table 1 lists non-trivial verification steps that update various procedure summaries. (Steps that do not update any summary are not listed.) The first column lists the call path that is visited in each step, in which the last call is the current target procedure whose summary is updated, and the call path is used to generate its bounded environment. The “Environment” (resp. “Target”) column shows whether the bounded environment (resp. target) is over- or under-approximated. The “Deductions” column lists deductions resulting from SMT queries in that step. Note that formulas in this column (and in the remainder of this section) are implicitly universally quantified over all variables and involve uninterpreted predicates (e.g., \(\mathtt {h(x,y)}\) in row 2). Except in row 9, all these formulas are implications that represent procedure summaries. Row 9 shows an implication between two such formulas – this is an instance of an EC lemma (described later).

2.1 Using the Program Call Graph

Our algorithm chooses environment-target pairs based on the call graph of the program, shown in Fig. 1b. It maintains explored paths through the call graph in a data structure called a derivation tree, initially consisting of only one node that represents entry procedure main. Figure 1c shows the tree just before the algorithm converges. The subset A of nodes available to be explored is also maintained, and it is this subset that guides exploration in our algorithm.

To improve scalability, we derive bounded environments from call paths and use them in SMT queries at each step. These bounded environments include bodies of the ancestors of the target procedure, but only up to level k above the target in the call graph. Ancestors at levels \(l > k\) above the target are soundly abstracted away so that these environments capture at least the behaviors of the program before and after the target procedure that may lead to a counterexample. Approximations of these environments and of the bodies of target procedures help us learn new facts about the targets. In this example, we use \(k=2\). When we target the last call to e along path \(\mathtt {main} \rightarrow \mathtt {e} \rightarrow \mathtt {o} \rightarrow \mathtt {e}\), main’s body will be abstracted.

2.2 Summary Updates Using SMT Queries

We now consider the four SMT queries on a chosen environment-target pair at each step. Suppose our algorithm has already considered the path to o on row 3 (Table 1) and now chooses node \(\mathtt {g}\in A\) in path \(\mathtt {main} \rightarrow \mathtt {f} \rightarrow \mathtt {g}\). Here, the bounded environment includes calls to h (called by f) and e (called by main), so we use their over-approximate summaries (both currently \(\top \)). We under-approximate the environment using summaries for h and e learned in rows 2 and 1, respectively. Over- and under-approximations of g are just its body with local variables rewritten away (i.e., \(2 * \mathtt {havoc}() + 1\)), since it has no callees.

In checks that over-approximate the procedure body, we try to learn an interpolant that proves the absence of a subset of counterexamples along this path in the program. Since the target procedure body is over-approximated, any interpolant found that separates its encoding from the counterexample captured by the environment will be an over-approximate summary for the target procedure, expressing a fact about all behaviors of the procedure. Such over-approximate summaries allow us to prove safety in a modular way. In checks that under-approximate the procedure body, we try to find (part of) a bug. Since the target procedure body is under-approximated, the interpolant is instead an under-approximate summary, describing behaviors the procedure may exhibit. Under-approximate summaries allow us to construct counterexamples in the case where the program is unsafe. Approximating the environment and target procedure body allows us to keep queries small.

Both the over-over and under-under checks fail here, so no updates are made. A weaker version of the under-under check is the over-under check, in which the environment is now over-approximated. Because it is weaker, it may result in learning under-approximate summaries that may not be necessary, since the over-approximated environment may contain spurious counterexamples. When our algorithm performs this check, it finds a path that goes through the over-approximated environment and the under-approximation of g’s body and thus augments g’s under-approximation (row 4).

A corresponding weaker version of the over-over check is the under-over check, in which the environment is under-approximated. Because the under-approximated environment may not capture all counterexamples, the learned interpolant by itself could be too weak to prove safety. Our algorithm refines g’s over-approximation with the interpolant learned in this query (row 5).

Note that these two weaker checks are crucial in our algorithm. Consider a different main function that contains only assert(f() mod 2 = 0). To prove safety, we would need to consider paths \(\mathtt {main} \rightarrow \mathtt {f} \rightarrow \mathtt {h}\) and \(\mathtt {main} \rightarrow \mathtt {f} \rightarrow \mathtt {g}\), but for these paths, both “stronger” checks fail. Paths through the derivation tree must be paths through the call graph, so we would not consider the bodies of h and g simultaneously; the “weaker” checks allow us to learn summary updates.

2.3 Explicit Induction and EC lemmas

To demonstrate the need for and use of induction and EC lemmas for handling mutual recursion, we now consider row 9 in Table 1, where we perform an over-over check for the final call to e in the call path. The current derivation tree has the same structure as the final derivation tree, shown in Fig. 1c.

No Induction. At this stage, our over-approximation for f precisely describes all possible behaviors of f (rows 7, 8), but no interpolant can be learned because the over-approximation \(\top \) of o in the body of procedure e is too coarse. Without using induction, we cannot make any assumptions about this call to o, and are stuck with this coarse over-approximation. Even if we inlined the body of o, we would similarly still have an overly-coarse over-approximation for e.

Induction with EC Lemmas. We can instead try to use induction on the body of e. Its over-approximated environment includes counterexample paths that we would like to prove spurious. Let formula \(\phi (x, y)\) denote property \(\mathtt {e}(x, y) \Rightarrow (y \Leftrightarrow x~\mathrm {mod}~2 = 0)\). The consequent in this implication is generated by examining the environment for e, i.e., the environment implies the negation of the consequent. Problems arise when trying to prove this property by induction because there is no opportunity to apply the inductive hypothesis about e. When the else branch is taken, facts about o are needed to finish the proof of \(\phi (x, y)\) for \(x > 0\).

If we were to inline o and assume the inductive hypothesis that \(\phi (x,y)\) holds for the inner call to e, an inductive proof would succeed without using EC lemmas. However, such an inlining approach can lead to poor scalability and precludes inference of summaries (e.g., for o) that could be useful in other call paths.

EC Lemmas. Our algorithm discovers additional lemmas in the form of implications over certain procedure summaries (Sect. 5). Let formula \(\theta (m, n)\) denote property \(\mathtt {o}(m, n) \Rightarrow (n \Leftrightarrow m~\mathrm {mod}~2 \ne 0)\). (Again, the consequent in this implication is generated by examining the environment for o.) Let \(\psi (x, y, m, n)\) denote \(\theta (m, n) \Rightarrow \phi (x, y)\), i.e., \(\psi \) is similar to property \(\phi \), but with an additional assumption \(\theta \) about o.

Validity of \(\psi \) is proved by case analysis: \(\psi (1,true, m, n)\) is trivially true, and the proof of \(\psi (x,y,m,n)\) for \(x > 0\) works because of the assumption \(\theta \). Thus, the formula \(\psi (x,y,m,n)\) is learned as an EC lemma (see row 9).

Now, we reconsider the call to \(\mathtt {o}\) along call path \(\mathtt {main} \rightarrow \mathtt {e} \rightarrow \mathtt {o}\). The discovered EC lemma allows us to prove formula \(\theta \) valid by induction. This new over-approximate fact for o is combined with the EC lemma allowing the algorithm to learn \(\mathtt {e}(x, y) \Rightarrow (y \Leftrightarrow x~\mathrm {mod}~2 = 0)\). This step corresponds to row 10.

3 Preliminaries

In this section, we define our notion of a program, introduce CHC notions and encodings, and define contexts and derivation trees.

Programs. A program P is a set of procedures with entry point \( main \). Each procedure p has vectors of input and output variables \({ in_p }\) and \({ out_p }\) and a body \( body_p \), which may contain calls to other procedures or recursive calls to p. When p is clear from context, we omit it in the variables’ subscripts, e.g., \(p( in , out )\). We encode a program as a system of CHCs \(\mathcal {C}\).

Definition 1

A CHC C is an implicitly universally-quantified implication formula in first-order logic with theories of the form \( body \Rightarrow head \). Let \(\mathcal {R}\) be a set of uninterpreted predicates. The formula \( head \) may take either the form \(R(\vec {y})\) for \(R\in \mathcal {R}\) or \(\perp \). Implications in which \( head = \perp \) are called queries. The formula \( body \) may take the form \(\phi (\vec {x})\) or \(\phi (\vec {x}) \wedge R_0(\vec {x}_0) \wedge \ldots \wedge R_n(\vec {x}_n)\), where each \(R_i\) is an uninterpreted predicate, and \(\phi (\vec {x})\) is a fully interpreted formula over \(\vec {x}\) (i.e., it contains only theory predicates), which may contain all variables in each \(\vec {x}_i\) and (if the head is of the form \(R(\vec {y})\)) all variables in \(\vec {y}\).

Fig. 2. Unfoldings (and intermediate steps) of e in the body of main from Fig. 1a. Program snippets are shown on the left and CHC encodings on the right.

A system of CHCs for a program can be generated by introducing an uninterpreted predicate per procedure and encoding the semantics of each procedure using these and theory predicates. Each application \(R(\vec {x})\) in the body of a CHC corresponds to a procedure call to a procedure p, where \(\vec {x}\) corresponds to \(( in _p, out _p)\). By analogy, we refer to each such R as a callee of the predicate in the head of the CHC. For each \(C \in \mathcal {C}\) with uninterpreted predicate applications \(\{R_0(\vec {x}_0), \ldots , R_n({\vec {x}_n})\}\) in its body, we let \( callee _C\) be a one-to-one mapping from \(0, \ldots , n\) to these applications.

This mapping allows us to distinguish between different applications of the same predicate within the same CHC body, which we can understand as distinguishing between different callsites of the same callee within a procedure. We abuse notation and denote the corresponding predicate for a procedure \(p \in P\) in encoding \(\mathcal {C}\) as p. We assume that in any application \(p(\vec {y})\) in the head of a CHC in \(\mathcal {C}\), \(\vec {y}\) is the same vector of variables \( in _p\), \( out _p\). We let \(C. body \) and \(C. head \) denote the body and head of CHC C respectively. We let \( loc _C\) denote \( fv (C. body ) \setminus fv (C. head )\), where for a formula F, \( fv (F)\) denotes the free variables in F. We assume that all \(C,C' \in \mathcal {C}\) are such that \( loc _C \cap loc _{C'} = \emptyset \) and let \( loc _p = \bigcup \{ loc _C~|~C. head = p(\vec {y})\}\). Note that disjunction \(\bigvee _i \{ body_i ~|~ body _i \Rightarrow p(\vec {y} ) \in \mathcal {C}\}\) gives the semantics of \( body _p\). We abuse notation to use \( body _p\) to refer to this disjunction.

Corresponding CHC encodings are shown in Fig. 2 for demonstration. We assume the use of an encoding that preserves the call graph structure of the program in CHCs; i.e., there will be a CHC with head containing p with an application of q in its body iff p calls q.
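For concreteness, a structure-preserving encoding of e, o, and the assertion in main from Fig. 1a could take the following shape (a sketch only; the exact clauses, including those for f, g, and h, are as in Fig. 2 rather than this rendering):

$$\begin{aligned} x = 0 \wedge y&\Rightarrow \mathtt {e}(x, y) \\ x > 0 \wedge \mathtt {o}(x - 1, y)&\Rightarrow \mathtt {e}(x, y) \\ x = 0 \wedge \lnot y&\Rightarrow \mathtt {o}(x, y) \\ x > 0 \wedge \mathtt {e}(x - 1, y)&\Rightarrow \mathtt {o}(x, y) \\ \mathtt {f}(r) \wedge \mathtt {e}(r - 1, y) \wedge y&\Rightarrow \perp \end{aligned}$$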

Definition 2 (Solution)

A solution for a system of CHCs \(\mathcal {C}\) is an interpretation M for uninterpreted predicates \(\mathcal {R}\) that makes all CHCs in \(\mathcal {C}\) valid.

A CHC encoding is such that if it has a solution, the original program is safe. To remember facts learned during our algorithm, we maintain two sets of first-order interpretations of \(\mathcal {R}\) called O and U, functioning as mappings from procedures to their over- and under-approximate summaries, respectively.

Definition 3 (Procedure Summaries)

The over- (O) and under-approximate (U) summaries are such that all non-query CHCs \( body \Rightarrow head \in \mathcal {C}\) are valid under O and that the reverse implications \( head \Rightarrow body \) are valid under U.

From Definition 3, it is clear that for all p, \(O[p] = \top \) and \(U[p] = \perp \) are valid summaries. We use these as initial summaries in the algorithm presented in Sect. 6. Note that O is a solution for the system of CHCs \(\mathcal {C}\) when it also makes the query CHCs valid; in that case, verification succeeds. When U is such that a query CHC is not valid, verification fails and a counterexample exists.

Definition 4 (Approximation)

Given a formula \(\varPi \) and an interpretation \(M \in \{O, U\}\), the approximation \(\widehat{\varPi }_M\) is obtained by replacing each uninterpreted predicate application \(R(\vec {x})\) in \(\varPi \) with \(M[R](\vec {x})\), leaving interpreted subformulas unchanged.
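As an illustration, assuming, say, \(O[\mathtt {h}]\) is \(y = x + 1\) and \(O[\mathtt {g}]\) is \(\top \), approximating \(\varPi = \mathtt {g}(a) \wedge \mathtt {h}(a, b)\) with O yields \(\widehat{\varPi }_O = \top \wedge (b = a + 1)\), whereas approximating it with the initial U (all \(\perp \)) yields \(\perp \wedge \perp \).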

In addition to approximations, we can manipulate CHCs using renaming and unfolding.

Definition 5 (Renaming)

For a formula F containing variables \(\vec {x}\), \(F[\vec {x} \mapsto \vec {y}]\) denotes the simultaneous renaming of variables \(\vec {x}\) to \(\vec {y}\) in F.

Definition 6 (Unfolding)

Let \(\mathcal {C}\) be a system of CHCs. Let \(C \in \mathcal {C}\) be a CHC \(R_0(\vec {x}_0) \wedge \ldots \wedge R_n(\vec {x}_n) \wedge \phi (\vec {x}) \Rightarrow R(\vec {y})\) where \( callee _C(i) = R_i(\vec {x}_i)\) for each \(i \in \{0, \ldots , n\}\). There is an unfolding of \( callee _C(k)\) per CHC in \(\mathcal {C}\) whose head is an application of predicate \(R_k\). For such a CHC \( body \Rightarrow R_k(\vec {y}_k) \in \mathcal {C}\), the unfolding of \(R_k(\vec {x}_k)\) in C is given by the following:

$$\bigwedge _{i \in \{0, \ldots , n\}, i \ne k} R_i(\vec {x}_i) \wedge body [\vec {y}_k \mapsto \vec {x}_k] \wedge \phi (\vec {x}) \Rightarrow R(\vec {y})$$

An unfolding is essentially a one-level inlining of one CHC in another. Figure 2 illustrates what an unfolding of CHCs corresponds to on our motivating example, where e is unfolded in main.
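As a sketch (using the clause shapes assumed earlier rather than the exact clauses of Fig. 2), unfolding \(\mathtt {e}(r - 1, y)\) in the query \(\mathtt {f}(r) \wedge \mathtt {e}(r - 1, y) \wedge y \Rightarrow \perp \) with the recursive clause \(x > 0 \wedge \mathtt {o}(x - 1, y) \Rightarrow \mathtt {e}(x, y)\) yields

$$\mathtt {f}(r) \wedge r - 1 > 0 \wedge \mathtt {o}(r - 2, y) \wedge y \Rightarrow \perp $$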

Definition 7 (Environment)

For a CHC C of the form \(\bigwedge _{i \in \{1 .. n\}} R_i(\vec {x_i}) \wedge \phi (\vec {x_1}, \ldots , \vec {x_n}) \Rightarrow \perp \), the environment for \(R_k(\vec {x_k})\) is given by the following:

$$\bigwedge _{i \in \{1 .. n\}, i \ne k} R_i(\vec {x_i}) \wedge \phi (\vec {x_1}, \ldots , \vec {x_n})$$

By analogy with programs, the environment for \(R_k(\vec {x_k})\) intuitively captures the procedure calls in C before and after the procedure call for \(R_k(\vec {x_k})\). Note that if C is simply an encoding of a single procedure body, then the environment will only capture the immediate callees of that procedure. On the other hand, if C is, for example, an unfolding of the CHC representing \(\texttt {main}\), then the environment may contain any calls before and after the call corresponding to \(R_k(\vec {x_k})\) in a full but potentially spurious counterexample run of the program so long as they have corresponding predicate applications in the unfolding C.
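Continuing the sketch above, in a query of the form \(\mathtt {f}(r) \wedge \mathtt {e}(r - 1, y) \wedge y \Rightarrow \perp \), the environment for \(\mathtt {e}(r - 1, y)\) is \(\mathtt {f}(r) \wedge y\): it captures the call to f made before the call to e together with the constraint imposed by the assertion.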

Definition 8 (Derivation Tree)

A derivation tree \(D = \langle N, E \rangle \) for a system of CHCs \(\mathcal {C}\) is a tree with nodes N and edges E, where each \(n \in N\) is labeled with an uninterpreted predicate \(p = proc (n)\), a context query CHC \( ctx (n)\), and an index \(i = idx (n)\) such that \( callee _{ ctx (n)}(i)\) is an application of p.

Our algorithm uses the derivation tree to capture the already-explored unfoldings starting from the encoding of \( main \) and to further guide exploration. Each node \(n \in N\) represents a verification subtask, where the body of \( ctx (n)\) represents a set of (potentially spurious) counterexamples. The goal of each subtask is to find a solution for the system of CHCs consisting of all non-query CHCs in \(\mathcal {C}\) with the query CHC \( ctx (n)\) and refine the over-approximation \(O[ proc (n)]\) to reflect the learned facts, or, if this cannot be done, to expand \( proc(n) \)’s under-approximation \(U[ proc (n)]\) to demonstrate (part of) a real counterexample.

A program’s initial derivation tree consists of only one node labeled with procedure \( main \) and a query CHC from the system \(\mathcal {C}\). We maintain the invariant that if s is the parent of t, then \( ctx (t)\) must be constructible by unfolding a predicate in \( ctx (s)\). Furthermore, we require that the unfolded predicate is one of the predicates that was added in the previous unfolding step to get \( ctx (s)\). This notion of a derivation tree is similar to other CHC-based work [49, 60], but our invariant restricts the way in which we can expand the tree (i.e., the way in which we can unfold from \( main \)) – every derivation tree path corresponds to a call graph path. We let e(n) refer to the environment for \( callee _{ ctx (n)}( idx (n))\) in \( ctx (n)\).

For a derivation tree path d (of length |d|) whose final node is n, the full context \( ctx(n) \) can be derived by unfolding all of \( proc (n)\)’s ancestors in the root node’s context CHC along the corresponding call graph path for the original program. We also denote this full context as \( unfold (d, |d|)\). For \(k < |d|\), \( unfold (d, k)\) corresponds to unfolding the bodies of the last \(k-1\) procedure calls in d into the body of \( proc (n)\)’s \(k^{th}\) ancestor. Note that \( unfold (d, k)\) only unfolds ancestors on the call path; any other of the ancestors remain represented as uninterpreted predicates. For \(k \ge |d|\), \( unfold(d, k) = unfold (d, |d|)\). (See also Definition 9.)
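For instance, for the Sect. 2 path \(d = \mathtt {main} \rightarrow \mathtt {e} \rightarrow \mathtt {o} \rightarrow \mathtt {e}\) and \(k = 2\), \( unfold (d, 2)\) unfolds the body of o into the body of the target’s second ancestor (the first call to e); main lies more than two levels above the target and is not unfolded, which is why, as noted in Sect. 2, its body is abstracted away (Sect. 4 describes how).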

4 Bounded Contexts and Environments

Here we define bounded contexts and environments. Our algorithm uses these bounded versions in all SMT queries described later.

Definition 9 (Bounded context)

For a given bound k, and a path \(d = n_0 \rightarrow \ldots \rightarrow n_{m - 1} \rightarrow n_m\) in a derivation tree, a k-bounded context for \(n_m\) is a formula \( bctx (n_m)\) over variables , defined as follows:

Here, we also have the following:

  • \( interface (d, k)\) is a formula over the inputs and outputs of the procedure for node \(n_{m-k}\), for \(k < m\) (or \(\top \), if \(k \ge m\)).

  • \( summ (d, k)\) is a formula over the inputs and outputs of the other callees of the k-bounded ancestors of \( proc (n_m)\).

Note that \( unfold(d, k) \) ignores any restrictions due to ancestors that are more than k levels above \(proc(n_m)\). Such restrictions are expressed in \( interface (d, k)\), which represents the interface between the k-bounded context and the rest of the context above it. In practice, we compute \( interface (d, k)\) as \(\mathrm {QE}(\exists fv (e(n_m)) \setminus bvs . \widehat{e(n_m)}_{O,\ell })\), where \(\mathrm {QE}\) denotes quantifier elimination. We approximate quantifier elimination using the standard model-based projection technique [10]. We can always use \( interface (d, k) = \top \), which treats ancestor procedures above bound k as havocs; we found this choice ineffective in practice.

In what follows, we refer to \( unfold (d, k). body \wedge interface (d, k)\) as \(B(d, k)\) or simply as B when d and k are clear. Again, we require that each \( bctx (n_m)\) (and thus each \(B(d, k)\)) can be computed from its parent \(n_{m - 1}\)’s bounded context via a single unfolding. Given our choice of \( interface (d, k)\), using such a method to compute a child node’s bounded context lets us avoid (approximate) quantifier elimination on large formulas since only one procedure body’s variables need to be eliminated when starting from the parent’s bounded context.

The \( summ (d, k)\) formula can be either \(\top \) or a conjunction that adds approximation constraints based on summaries for the other callee procedures. We use \( bctx.body = B\) when \( summ (d, k) = \top \), or \( bctx.body = \widehat{B}_{M}\) or \( bctx.body = \widehat{b}_{M}\) for \(M \in \{O, U\}\), where b is the environment for \( callee _{ ctx (n_m)}( idx (n_m))\) (when \( summ (d, k)\) is the conjunction from approximating with M).

Example 1

The figure shows a bounded context for predicate p with bound 2 for the derivation tree path shown with solid edges. Ancestor predicates \(q_1\), \(q_2\) are unfolded in \( unfold (d,2)\), and \( summ (d, 2)\) approximates callees \(r_0\), \(r_1\), \(r_2\).


For scalability, our algorithm (Sect. 6) considers verification subtasks with the bounded context of a given procedure. Our algorithm’s queries use bounded environments, which can be computed from bounded contexts.

Definition 10 (Bounded environment)

For a node n, its bounded environment \( benv (n)\) is the environment for the predicate \( callee _{ ctx (n)}( idx (n))\) in \( ctx (n)\).

We define a bounded parent relationship between nodes \(s, t \in N\), where \(s \rightarrow t\) is not necessarily in E, but where \( ctx (s)\) has \( proc (t)\) as a callee.

Definition 11 (Bounded parent)

A node s is a bounded parent of node t in derivation tree D, denoted \(s \in Bparent (t, D)\), iff there is some index i such that \( callee _{ ctx (s)}(i)\) is an application of \( proc (t)\) and \(b_t \Leftrightarrow next (b_s, proc (t), i)\), where \(b_t\) and \(b_s\) are bodies of the bounded contexts of s and t.

Note that the parent of a node n is always a bounded parent for n, and that n may have several bounded parents because the approximation of different full environments may lead to the same bounded environment. We use bounded parents in our algorithm (Sect. 6) to avoid considering redundant verification subtasks.

5 EC Lemmas

We also learn a set L of EC lemmas, which are implications capturing assumptions under which a procedure has a particular over-approximation.

Definition 12 (Environment-Call (EC) Lemmas)

Let \( proc (n) = p\) for some node n in a derivation tree. An EC lemma for p, where n has ancestors with procedures \(\{ q_i \}\) along a derivation tree path, is of the following form:

$$\forall \bigcup _i fv (S_i)~\cup ~ in ~\cup ~ out .~ \bigwedge _i S_i \Rightarrow (p( in , out ) \Rightarrow prop )$$

Here, \( prop \) is a formula with \( fv(prop) \subseteq in ~\cup ~ out \), each \(S_i\) is of the form \(q_i( in _i, out _i) \Rightarrow prop _i\), where \(q_i\) is some ancestor’s uninterpreted predicate, and \( prop _i\) is a formula with \( fv(prop_i) \subseteq in _i \cup out _i\).

Intuitively, an EC lemma allows us to learn that \( prop \) is an over-approximation of procedure p under the assumptions \(\{S_i\}\) about its ancestors with procedures \(\{q_i\}\). Each \(S_i\) itself is an assumption that \( prop _i\) over-approximates \(q_i\). These ancestors are in target p’s environment, so we call these formulas Environment-Call (EC) Lemmas. In practice, we learn EC lemmas involving ancestors whose procedures are callees of p to help set up induction for mutual recursion.
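For instance, the EC lemma \(\psi \) from Sect. 2 matches this form (up to the exact consequents) with \(p = \mathtt {e}\), a single ancestor predicate \(q_1 = \mathtt {o}\), \(S_1\) of the form \(\mathtt {o}(m, n) \Rightarrow prop _1\) where \( prop _1\) relates n to the parity of m, and \( prop = (y \Leftrightarrow x~\mathrm {mod}~2 = 0)\).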

6 Modular Verification Algorithm

We now describe our modular verification algorithm. We first outline the top-level procedure (Sect. 6.1) based on iteratively processing nodes in the derivation tree. Then we describe how each node is processed using SMT queries (Sect. 6.2), the order in which SMT queries are performed (Sect. 6.3), and how induction is performed and how EC lemmas are learned and used (Sect. 6.4). We present the correctness and the progress property of our algorithm and discuss limitations (Sect. 6.5). (Additional heuristics are described in Appendix C [52].)

6.1 Algorithm Outline

Our algorithm constructs a derivation tree based on the call graph of the program, which is used to guide the selection of CHCs to explore. We achieve scalability by considering only bounded environments in all our SMT queries. We present these queries as part of proof rules that capture the major steps of our algorithm. The use of induction and EC lemmas enables handling of mutually recursive programs. The state during verification is captured by proof (sub)goals.

Definition 13 (Proof (sub)Goal)

For a system of CHCs \(\mathcal {C}\), a derivation tree \(D = \langle N, E \rangle \), a subset \(A \subseteq N\) of available nodes, over- and under-approximate summary maps O and U, a set of EC lemmas L, and \( Res \in \{\top , \perp \}\), a proof (sub)goal is denoted \(D, A, O, U, L, \mathcal {C} \vdash Res \).

Main Loop. Algorithm 1 shows the top-level procedure for our method. The Verify procedure constructs an initial proof goal containing an initial derivation tree, initial summary maps, and an empty set of EC lemmas. Initially all nodes in the derivation tree are available, i.e., they are in A. It then iteratively chooses an available node and tries to update its summaries (using routine ProcessNode), thereby updating the current goal. The loop terminates when no more nodes are available or when the current summaries are sufficient to prove/disprove safety. \(\textsc {Result}\) returns \( safe \) if the summaries are sufficient for proving safety, \( unsafe \) if they are sufficient for disproving safety, or \( unknown \) otherwise.

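A Python-style sketch of the main loop just described (ProcessNode and Result are named in the text; every other name is an assumption, and the actual Algorithm 1 may differ in its details):

```python
def verify(chcs, k):
    # Initial proof goal: a derivation tree with a single node for main,
    # trivial summaries, no EC lemmas, and every node available.
    D, A = initial_derivation_tree(chcs)
    O = {p: TOP for p in predicates(chcs)}       # over-approximate summaries
    U = {p: BOTTOM for p in predicates(chcs)}    # under-approximate summaries
    L = set()                                    # EC lemmas
    while A and result(chcs, O, U) == UNKNOWN:
        n = choose(A)            # heuristic choice; no node in A may be starved
        process_node(n, D, A, O, U, L, chcs, k)  # may update summaries, D, and A
    return result(chcs, O, U)    # safe, unsafe, or unknown
```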

Choice of Procedures and Environments. ProcessNode can be viewed as making queries on an environment-procedure pair. If the algorithm chooses node n, then the pair consists of \( benv (n)\) and the procedure corresponding to \( proc (n)\). Note that the call graph guides both the choice of the target, since all paths in D correspond to call graph paths, and the bounded environment, which is computed by unfolding the k-bounded ancestors of the target. Importantly, the chosen node must be in A; this choice can be heuristic as long as no node in A is starved.

Summary Inference. Our algorithm learns new summaries for target predicates by applying four proof rules. For ease of exposition, we first describe these proof rules without induction (next subsection), followed by rules for induction and EC lemmas. While these proof rules resemble those in Smash [31], our queries involve k-bounded environments with \(k \ge 1\) and our summaries are first-order theory formulas; in Smash, queries use bounded environments with \(k=1\) and summaries are pre-/post-condition pairs over predicate abstractions. Additional proof rules specify the removal and addition of nodes in D and A. Appendix B [52] provides the complete set of rules, omitted here due to space constraints.

6.2 Proof Rules Without Induction

The algorithm updates the current \( Goal \) whenever a proof rule can be applied. Note that we are building a proof tree from the bottom up, so an application of a rule here involves matching the conclusion to the current \( Goal \). We abbreviate some common premises with names as shown in Fig. 3. For a node \(n \in A\), let p be its procedure and b be its bounded environment. Also let \( body \) be the renaming of the body of p. A distinguishing feature of our algorithm is that the proof rules use only bounded environments.

Fig. 3. Abbreviated premises, where \( fresh (\vec {x})\) returns a vector \(\vec {x'}\) of fresh variables.

Fig. 4. Proof rules without induction.

The SAFE and UNSAFE rules (Fig. 4) allow us to conclude safety of the original program P or to find a counterexample, using over- or under-approximate summaries, respectively. In the latter case, the under-approximate summaries demonstrate that there is no solution for the set of CHCs \(\mathcal {C}\). If either rule is applicable to the proof goal, we have found sufficient summaries.

The OVER-OVER (OO) rule (Fig. 4) can be used to update a predicate p’s over-approximate summary. If the conjunction of the over-approximations of \( body _p\) and of the bounded environment is unsatisfiable, then we can find an interpolant \(\mathbb {I}\) and use it to refine the map O for p.
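As an illustration, a sketch of how an OO attempt might be implemented; body_of, benv, substitute_summaries, conj, is_unsat, and interpolant are hypothetical stand-ins for formula manipulation and SMT-solver calls, not a specific solver API:

```python
def OO(n):
    p = proc(n)
    body_over = substitute_summaries(body_of(p), O)  # over-approximate body of p
    env_over = substitute_summaries(benv(n), O)      # over-approx. bounded environment
    if is_unsat(conj(body_over, env_over)):
        # Interpolant I: implied by body_over and inconsistent with env_over,
        # so I is a valid refinement of p's over-approximate summary.
        I = interpolant(body_over, env_over)
        O[p] = conj(O[p], I)
        return True
    return False
```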

The UNDER-OVER (UO) rule (Appendix B [52]) is similar, except it uses an under-approximation of the environment.

Example 2

Recall the example in Fig. 1a. Row 5 in Table 1 shows the over-approximate summary \(y \text { mod } 2 \ne 0\) for procedure g obtained as a result of UO.

The UNDER-UNDER (UU) rule (Fig. 4) can be used to update predicate p’s under-approximation.


Let \(\pi \) be the body of a CHC whose head is \(p(\vec {y})\), where variables \(\vec {y}\) have been renamed to the variables that p is applied to in \( callee_{bctx(n)}(i) \) and the rest of the variables have been renamed to fresh ones. If the conjunction of the under-approximations of \(\pi \) and b is satisfiable, then we can update p’s under-approximate summary U with \(\exists loc _C . \pi \).

If the environment were unbounded, this check being satisfiable would indicate a concrete counterexample: the context would be an unfolding of a query CHC and UNSAFE would hold. Since our environment is bounded, however, the context may not be an unfolding of the query CHC and may be missing some constraints, so we can only conclude that there might be a counterexample that involves unfolding this application of p. We want to remember the part that goes through p so that we do not need to unfold it in the full context; we thus add \(\exists loc _p . \pi \) to U[p]. The OVER-UNDER (OU) rule (Appendix B [52]) is the same as UU but over-approximates the bounded environment.

Example 3

Recall the example in Fig. 1a. Row 7 in Table 1 shows the under-approximate summary \(y \text { mod } 2 = 0\) for procedure f obtained as a result of OU.

6.3 Ordering and Conditions for SMT Queries

The way in which proof rules are applied to process a node is shown in Algorithm 2. In the pseudocode, OO, UO, OU, and UU refer to attempts to apply the corresponding rules (e.g., OO\((n, Goal )\) tries to apply the OO rule with \(n \in A\) as the AVAIL premise). Rules that update under-approximations (UU, OU) are applied per CHC with head \( proc(n) \), whereas rules that update over-approximations (OO, UO) are applied to the disjunction of all such CHCs’ bodies. They return \( true \) upon successful application (and update \( Goal \)), or \( false \) otherwise.

If we have neither found any counterexamples through the bounded environment (i.e., all UU attempts failed), nor eliminated the bounded verification subtask (i.e., the OO attempt failed), then we try to derive new facts by adding new available nodes for the callees of \( proc (n)\). Procedure AddNodes adds these nodes while avoiding adding redundant nodes to D (more details in Appendix D [52]). If any summary updates were made for \( proc (n)\), then the procedure Processed (line 8) will add the bounded parents of n to A, so that new information can be propagated to the parents’ summaries. It then removes n from A.
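A Python-style sketch of this node-processing order (AddNodes and Processed are named above; the remaining names, and the exact interleaving of rule attempts, are assumptions about Algorithm 2; each rule attempt updates the goal as a side effect):

```python
def process_node(n, D, A, O, U, L, chcs, k):
    uu = ou = False
    for C in chcs_with_head(chcs, proc(n)):   # under-approx. rules: once per CHC
        uu = UU(n, C) or uu
        ou = OU(n, C) or ou
    oo = OO(n)                                # over-approx. rules: applied to the
    uo = UO(n)                                # disjunction of all such CHC bodies
    if not uu and not oo:
        # No counterexample through the bounded environment and the bounded
        # subtask was not discharged: make callees of proc(n) available.
        add_nodes(n, D, A)
    if uu or ou or oo or uo:                  # some summary of proc(n) was updated
        processed(n, D, A)                    # re-adds bounded parents of n to A
    A.discard(n)                              # n has now been processed
```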

6.4 Proof Rules for Induction

For programs with unbounded recursion, the OO and UO rules (Fig. 4) are insufficient for proving safety; we therefore extend the rules with induction where the goal is to show that the paths in the approximated bounded environment are spurious. For ease of exposition, we first discuss an extension of OO that does not use EC lemmas and then discuss one that does. (Corresponding extensions for rule UO are similar and can be found in Appendix B [52].)

Without EC lemmas. The rule OVER-OVER-IND (OOI) in Fig. 5 is a replacement for OO that uses induction to find new over-approximate facts. The first five premises are the same as in rule OO. As before, we aim to learn a refinement \(\mathbb {I}\) for the over-approximate summary of p, where \(\mathbb {I} \Rightarrow \lnot \widehat{b}_O\).

The base case is that \(\mathbb {I}\) over-approximates p for all CHCs that do not have any applications of p in their body, i.e., for all \( body \Rightarrow p(\vec {y}) \in \mathcal {C}\) where p does not occur in \( body \), \( body \Rightarrow \mathbb {I}\). For the inductive step, we consider such CHCs where \( body \) contains calls to p. The inductive hypothesis, which is captured by formula \( hyp \), is that \(\mathbb {I}\) over-approximates all recursive calls to p inside these bodies. We check both the base case and the inductive step at once with the implication \(\widehat{ body }_O \wedge hyp \Rightarrow \mathbb {I}\). If the induction succeeds, then we strengthen O[p] with \(\mathbb {I}\).
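One way to render the inductive hypothesis described here for a CHC C whose body contains calls to p (the precise formulation appears in Fig. 5) is

$$ hyp = \bigwedge \{\, \mathbb {I}[( in , out ) \mapsto \vec {x}_i] ~|~ callee _C(i) \text { is an application } p(\vec {x}_i) \,\},$$

i.e., the candidate \(\mathbb {I}\) instantiated at every recursive application of p in \(C. body \).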

With EC lemmas. The OVER-OVER-IND-LEMMAS (OOIL) rule proves weaker properties than OOI by doing induction under certain \( assumptions \). These properties are EC lemmas.

Fig. 5. Proof rules for induction.

OOIL makes assumptions for the current node n and performs induction using these assumptions and known EC lemmas. In particular, \( assumps (n, D)\) is a set of assumptions \(\{a_i~|~1 \le i \le j\}\) for some \(j \ge 0\). When \(j = 0\), the set of assumptions is empty, and OOIL has the same effect as applying OOI. Each assumption \(a_i\) is of the following form:

$$\begin{aligned} q_i( in _{q_i}, out _{q_i}) \Rightarrow \forall vars (b_i) \setminus ( in _{q_i} \cup out _{q_i}) .~ \lnot b_i , \end{aligned}$$

where \(q_i\) is the predicate for an ancestor of n and \(q_i\) is called by target p in some CHC. The ancestor node’s bounded environment is \(b_i\). Intuitively, each assumption is that the ancestor’s bounded verification subtask has been discharged.

The \( Inst \) function takes a set of formulas, conjoins them, and replaces each application of an uninterpreted predicate with its interpretation in O. When applied to a set of assumptions S, it has an additional step that precedes the others: it first adds a conjunct \(a_i[ in _{q_i} \mapsto x, out _{q_i} \mapsto y]\) for each predicate application \(q_i(x,y)\) in \( body \) to each element \(a_i \in S\). This corresponds to applying the assumption in the induction hypothesis. If induction succeeds, we learn the EC lemma that \(\mathbb {I}\) over-approximates \(p( in , out )\) under the assumptions S.

Example 4

In Sect. 2, when we chose procedure e and proved an EC lemma, we used \(j = 1\) to make an assumption about its caller o.

Appendix B [52] contains additional rules that allow lemmas to be simplified. There may be multiple attempts at applying the OOIL proof rule with different j values. For scalability, we require that j not exceed the bound k used for bounded contexts, limiting the number of these attempts.

6.5 Correctness and Progress

The correctness and progress claims for Algorithm 1 are stated below.

Theorem 1 (Correctness)

Algorithm 1 returns \( safe \) (resp. \( unsafe \)) only if the program with entry point main never violates the assertion (resp. a path in main violates the assertion).

Proof

(Sketch) The CHC encoding is such that there is a solution to the system of CHCs \(\mathcal {C}\) iff the program does not violate the assertion. As a result, if the over-approximate summaries O constitute a solution and proof rule SAFE can be applied, the program does not violate the assertion. The under-approximate summaries U in every proof subgoal are guaranteed to be such that for any \(p \in \mathcal {C}\), U[p] implies any over-approximation O[p]. If UNSAFE can be applied, then the under-approximate summaries U imply that there is no possible solution O. The summaries in U can be used to reconstruct a counterexample path through the original program in this case.

Theorem 2 (Progress)

Processing a node in the derivation tree leads to at least one new (non-redundant) query.

Proof

(Sketch) Initially, no nodes in A have been processed, and after a node is processed, it is removed from the derivation tree. The only way that a node can be processed and not have a new query made about it is if an already-processed node is re-added to A and this node does not have a new query that can be made about it. The AddNodes and MakeUnavailable procedures are the only ones that add nodes to A. The AddNodes procedure, by definition, will only add a node to A if there is a new query that can be made about it. MakeUnavailable only adds bounded parents of nodes whose summaries were updated. For any such bounded parent, at least one approximation of its procedure’s body must be different than it was the last time the bounded parent was processed, since one of its callee’s summaries was updated.

Limitations. If the underlying solver is unable to find appropriate interpolants, the algorithm may generate new queries indefinitely. (The underlying problem is undecidable, so this is not unusual for modular verifiers.) Note, however, that because environments are bounded, each query’s size is restricted.

7 Evaluation and Results

We implemented our algorithm in a tool called Clover on top of CHC solver FreqHorn [25] and SMT solver Z3 [50]. We evaluated Clover and compared it with existing CHC-based tools on three sets of benchmarks (described later) that comprise standard collections and some new examples that include mutual recursion.

We aimed to answer the following questions in our evaluation:

  • Is Clover able to solve standard benchmarks?

  • Is Clover more effective than other tools at handling mutual recursion?

  • To what extent do EC lemmas help Clover solve benchmarks?

  • How does the bound k for environments affect Clover’s performance?

We compared Clover against tools entered in the annual CHC-solver competition (CHC-Comp) in 2019: Spacer [42], based on PDR [12]; Eldarica [39], based on CEGAR [17]; HoIce [13], based on ICE [29]; PCSat [56]; and Ultimate Unihorn [22], based on trace abstraction [35].

For all experiments, we used a timeout of 10 min (as used in CHC-Comp). We ran Clover on a MacBook Pro, with a 2.7 GHz Intel Core i5 processor and 8 GB RAM, but the other tools were run using StarExec [59]. Clover was not run on StarExec due to difficulties with setting up the tool with StarExec.

7.1 Description of Benchmarks

To evaluate Clover, we gathered three sets of varied benchmarks. The first set’s benchmarks range from about 10 to 7200 lines, and the latter two sets have smaller but nontrivial code (around 100 lines). The latter two sets were manually encoded into CHCs, and we plan to contribute them to CHC-Comp. Additional details follow.

CHC-Comp. We selected 101 benchmarks from CHC-Comp [14] that were contributed by HoIce and PCSat, since their encodings preserve procedure calls and feature nonlinear CHCs (which can represent procedures with multiple callees per control-flow path)Footnote 5.

Real-World. Two families of benchmarks are based on real-world code whose correctness has security implications. The Montgomery benchmarks involve properties about the sum of Montgomery representations [41] of concrete numbers. The s2n benchmarks are based on Amazon Web Services’ s2n library [3] and involve arrays of unbounded length (not handled by the tool PCSat).

Mutual Recursion. This set of benchmarks containing mutual recursion was created because few CHC-Comp benchmarks exhibit mutual recursion, likely due to lack of tool support. Even-Odd benchmarks involve various properties of e and o (defined as in Sect. 2) and extensions that handle negative inputs. Another benchmark family is based on the Hofstadter Figure-Figure sequence [38]. Mod n benchmarks consider mutually-recursive encodings of \(\lambda x . x~\mathrm {mod}~n = 0\) for \(n = 3,4,5\). These serve as proxies for recursive descent parsers, which may have deep instances of mutual recursion. We could not directly conduct experiments on such parsers, since existing front-ends [21, 33] cannot handle them. Combination benchmarks result from combining Montgomery and Even-Odd benchmarks.
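As an illustration of the shape such a benchmark might take (a sketch only; the actual CHC-encoded benchmarks may differ), a mutually recursive rendering of \(\lambda x . x~\mathrm {mod}~3 = 0\):

```python
def m0(x):                 # x mod 3 == 0
    return True if x == 0 else m2(x - 1)

def m1(x):                 # x mod 3 == 1
    return False if x == 0 else m0(x - 1)

def m2(x):                 # x mod 3 == 2
    return False if x == 0 else m1(x - 1)
```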

Table 2. Number of benchmarks solved by Clover and competing tools.
Fig. 6. Timing results for the Real World (left) and Mutual Recursion (right) benchmarks. Points below the diagonal line are those for which Clover outperforms the corresponding tool. Points on the right edge indicate timeouts of the other tool.

7.2 Results and Discussion

Table 2 gives a summary of results. It reports the number of benchmarks solved for each benchmark set by Clover with bound parameter k being 2, 9, and 10 (the best-performing bounds for the three benchmark sets) and by the other tools. It also reports results for Clover with \(k=10\) but without EC lemmas. Figure 6 shows the timing results for the other tools against Clover on the Real-World and Mutual Recursion benchmarks.

Efficacy on Standard Benchmarks. As can be seen in Table 2, Clover performs comparably with other tools on the CHC-Comp benchmarks, and significantly outperforms them on the other two sets of benchmarks. We expect that we can further improve the performance of Clover with additional optimizations and heuristics, such as those that improve the quality of interpolants.

Efficacy on Mutual Recursion Benchmarks. Table 2 and Fig. 6 demonstrate that Clover is more effective and often more efficient at solving Mutual Recursion benchmarks than the other tools. Few tools are able to handle the Even-Odd benchmarks, which Clover (with EC lemmas) can solve at any bound value greater than 2. Other tools are unable to solve even half of the Mutual Recursion benchmarks, reinforcing that Clover is a useful addition to existing tools that enables handling of mutual recursion as a first class concern.

Usefulness of EC lemmas. Running Clover with and without EC lemmas using bound \(k = 10\) revealed their usefulness for many of the benchmarks. In particular, the columns for bound 10 with and without EC lemmas in Table 2 show that EC lemmas are needed to allow Clover to solve several CHC-Comp benchmarks and all the Mutual Recursion benchmarks except the Hofstadter ones. These results indicate that Clover’s ability to outperform other tools on these benchmarks relies on EC lemmas.

Fig. 7. Left: Percentage of benchmarks Clover solves with different bounds on different benchmark categories; Center, Right: Timing results on a representative benchmark from CHC-Comp and Mutual Recursion, respectively.

Comparison of Different Bounds. Figure 7 (left) shows the number of benchmarks successfully solved by Clover in each set as the bound value is varied. Running Clover with too small a bound impedes its ability to prove the property or find a counterexample, since the environment is unable to capture sufficient information. On the other hand, running Clover with too large a bound affects the runtime negatively. This effect can be observed in Fig. 7 center and right, which show how the runtime varies with the bound for a representative benchmark from the CHC-Comp and Mutual Recursion sets, respectively. Note that at a bound \(k < 2\), Clover does not solve the given CHC-Comp benchmark, and at \(k < 5\), Clover does not solve the given Mutual Recursion benchmark. These results confirm the expected trade-off between scalability and environment relevance. The appropriate trade-off – i.e., the best bound parameter to use – depends on the type of program and property. As seen in Fig. 7 (left), the bound values that lead to the most benchmarks being solved differ per benchmark set. Rather than having a fixed bound, or no bound at all, the ability to choose the bound parameter in Clover allows the best trade-off for a particular set of programs. If the best bound is not known a priori, bound parameters of increasing size can be determined empirically on representative programs.

We also report data on how the number of and solving time for each type of SMT query vary with the bound k, averaged over benchmarks in each set. Figure 8 shows statistics on the average number of queries of each type (top) and on the average time taken to solve them (bottom). These data are from all runs for which Clover is successful and gives an answer of safe or unsafe.

Fig. 8. Average statistics (top four plots: number, bottom four: solve times) of SMT queries made by Clover as the bound changes (for successful runs).

We can use these data along with the data in Fig. 7 to (roughly) compare an approach restricted to \(k=1\) with an approach that allows \(k>1\) in bounded environments. Note that Clover differs significantly in other respects from tools like Spacer and Smash that enforce \(k=1\) in environments, making it difficult to perform controlled experiments to compare this aspect alone.

Note from Fig. 8 that for the CHC-Comp and Mutual Recursion sets of benchmarks, the number of SMT queries of all types is lower at \(k > 1\) in comparison to \(k = 1\). This result indicates that benchmarks that can be solved with \(k > 1\) require on average fewer updates to procedure summaries than are needed on average for benchmarks that can be solved with \(k = 1\), confirming the benefit of improved relevance when going beyond a restricted environment with \(k=1\). The data for the Real-World set do not follow this trend because a higher bound (\(k=10\)) is needed to solve the examples (as can be seen in Fig. 7).

From Fig. 8, it is clear that the OU and UU queries are cheaper than OO and UO queries, which is expected since the latter require over-approximating the target’s body. Unsurprisingly, OO queries are the most expensive. Average times of non-OO queries for \(k>1\) are lower than (or about the same as) average times for \(k=1\) for the CHC-Comp and Mutual Recursion sets but continue to increase with k in the Real-World set because solving the Montgomery benchmarks relies on propagating under-approximations from increasingly large call graph depths.

8 Related Work

There is a large body of existing work that is related in terms of CHC solving, program analysis, and specification inference.

8.1 CHC-solving for Program Verification

Program verification problems, including modular verification, can be encoded into systems of CHCs [21, 32, 33, 49]. There are many existing CHC-solver based tools [13, 15, 26, 32, 39, 42, 48, 60, 62] that can solve such systems. Clover has many algorithmic differences from these efforts.

Most existing tools do not place any bounds on the environments (if they are used at all). This includes approaches that unfold a relation at each step [48, 60] and CEGAR-based approaches [32, 39] where counterexamples can be viewed as environments. These tools face scalability issues as environments grow; Duality makes larger interpolation queries as more relations are unfolded [49], and Eldarica makes larger tree/disjunctive interpolation queries for counterexamples that involve more procedures [39].

Spacer [42], which is based on PDR [12, 23], considers bounded environments but only allows a bound of one (\(k=1\)). The difference between Duality and a PDR-like approach has been referred to as the variable elimination trade-off [48], where eliminating too many variables can lead to over-specialization of learned facts (PDR) and eliminating no variables can lead to larger subgoals (Duality). Our parameterizable bounded environments enable a trade-off between the two. Another significant difference between Spacer and Clover is that the former uses PDR-style bounded assertion maps to perform induction, whereas we use induction explicitly and derive EC lemmas. Duality may also implicitly use assumptions, and some other tools [13, 60] learn lemmas with implications, but none of them learn lemmas in the form of EC lemmas.

HoIce [13], FreqHorn [26], and LinearArbitrary [62] are based on guessing summaries and do not have any notion of environments similar to ours. All of these approaches have trade-offs between scalability of the search space and expressivity of guessed summaries.

8.2 Program Analysis and Verification

Techniques such as abstract interpretation [18, 19, 24] and interprocedural dataflow analysis [55, 57] can infer procedure summaries and perform modular verification.

These approaches often use fixed abstractions and path-insensitive reasoning, which may result in over-approximations that are too coarse for verification.

The software model checker Bebop [6] in SLAM [7] extended interprocedural dataflow analysis with path sensitivity. Related model checkers include a direct precursor to Duality [47] and other adaptations of PDR to software [16, 37]. Of these, GPDR [37] is similar to Spacer, but lacks modular reasoning and under-approximations. Specification inference (including Houdini-style learning [28]) has also been used to enable modular verification of relational programs [43, 53].

Another tool, Smash [31], is closely related to our work. It uses over- and under-approximate procedure summaries and alternates between them. However, it does not have any notion of a parameterizable bounded environment. The environment for a procedure call is expressed as a pair of a precondition and a postcondition, where the former is an under-approximation of the program execution preceding the call, and the latter is an over-approximation of the program execution following the call. These environments are thus bounded environments with a fixed bound of 1. More importantly, procedure summaries in Smash are composed of predicate abstractions. In contrast, our summaries are richer formulas in first-order logic theories. Unlike Smash and other related tools [30, 31, 34], we do not rely on predicate abstraction.

8.3 Specification Inference

Existing work on specification inference is also relevant. Many past efforts [2, 4, 9, 54, 58, 61] focused on learning coarse interface specifications or rules specifying the correct usage of library API calls, rather than learning logical approximations of procedures. Other specification inference techniques learn procedure summaries for library API procedures by using abstract interpretation [19, 36] or learn information-flow properties about the procedures [45, 53]. Other related work [1] infers maximal specifications for procedures with unavailable bodies, and other techniques assume an angelic environment setting [11, 20] – specifications inferred by these techniques may not be valid over-approximations. Another technique [5] also uses interpolation to infer over-approximate summaries but is not applicable to recursive programs.

9 Conclusions

We have presented a modular algorithm for generating procedure summaries and verifying safety of interprocedural programs that addresses the challenges of handling mutual recursion and keeping SMT queries scalable. The novel features of our algorithm are the use of bounded environments to limit the size of SMT queries, and a mechanism for performing induction under assumptions that uses these bounded environments to learn EC lemmas capturing relationships between summaries of procedures on call paths in the program.

We have implemented our algorithm in a CHC-based tool called Clover. An evaluation demonstrates that Clover is competitive with state-of-the-art tools on benchmarks from CHC-Comp and based on real-world examples, and is especially effective at solving benchmarks containing mutual recursion. Our algorithm can also be combined with existing invariant-generation techniques to successfully solve benchmarks with unbounded arrays.