Keywords

1 Introduction

One of the most common approaches for performing static analysis of software, used for both simple data-flow analysis and more complex analysis based on abstract interpretation, is to setup a set of equations over some partially ordered set. The solutions of this equation system form the result of the analysis.

These equation systems are generally solved by iterative methods, based on some variant of the Knaster-Tarski theorem. This is immediate when the partial order on the values of the unknowns has a small finite height, but becomes difficult when the height is large or, worse, the partial order does not satisfy the ascending chain condition. In this case, some way of accelerating iterations is needed, such as a widening/narrowing [13] or warrowing [8].

figure a
figure b

The ScalaFix library strives to be a general solver for these kind of equation system, in the spirit of modularization of static analyzers presented in [19]. It implements several iterative algorithms for solving equations (both with a finite or infinite number of unknowns) and it has a convenient interface which is designed for the Scala programming language. Scala combines functional and object-oriented programming in a single high-level language which runs on the Java Virtual Machine (JVM). The library may also be used, with some difficulties, from other languages which run on the JVM, such as Java itself. A better interface for other languages is planned for a later version. The source code of ScalaFix is available on https://github.com/jandom-devel/ScalaFix (in this paper we present the release 0.10), while the compiled code is on the Sonatype OSSRH (OSS Repository Hosting) https://oss.sonatype.org/ with group it.unich.scalafix and artifact scalafix.

In this paper we present the structure of the ScalaFix library and we show some examples of its use. The main application target for such a library is to be a backend for a static analyzer (it is currently in use by the Jandom static analyzer [2]).

In all the code fragments appearing in this paper we assume the following import statements:

figure c

In many examples we will also use the PPL (Parma Polyhedra Library) [10] trough the JPPL bindings. These are simpler and more natural to use than the default Java bindings provided by the PPL. The source code for JPPL is available on https://github.com/jandom-devel/JPPL, while the compiled code is on the Sonatype repository https://s01.oss.sonatype.org/ with group it.unich.jppl and artifact jppl.

All the examples in this paper are available with full code in the GitHub repository https://github.com/jandom-devel/ScalaFixExamples.

2 Equation Systems

The main concept of ScalaFix is the equation system. It comes in two flavors: either with a finite number of unknowns or with a possibly infinite number of unknowns. The main difference between the two flavors is that, in the first case, we are generally interested in solving the system for all the unknowns, while in the second case we are only interested in solving for a single unknown.

Each equation system is characterized by a type U for the unknowns and a type V for the values assumed by the unknowns. As assignment is a function from unknowns to values. The different solvers of ScalaFix take an assignment as the input, perform several iterative steps, and produce a new assignment as the solution of the equation system. The body of an equation system is the cornerstone of all the iterative algorithms: it takes an initial assignment and returns a new assignment obtained by computing all the right hand sides of the equation system. In the Scala language, we have:

figure d

where means that the assignment type is covariant in and contravariant in .

It is important not to be misled by the type of . Given the variables and , we have that is a function, hence no real computation starts until this is applied to a specific unknown , as in .

2.1 Infinite Equation Systems

For example, consider the following equations defining the Fibonacci sequence:

$$\begin{aligned} x_0 = x_1 = 1 x_{i+2} = x_i + x_{i+1} \end{aligned}$$

This may be encoded in ScalaFix as follows:

figure n

Note that, although many type declarations might be avoided thanks to the Scala type inference, we have decided here to be more verbose since we think it is helpful to the reader.

The body must be packed into an before being handed over to a solver:

figure p

Since the number of unknowns is infinite, we use the for computing a solution. The worklist solver needs three parameters: an equation system, an initial assignment and the set of the unknowns for which we want to get a partial solution. For example, if we want to known the sixth Fibonacci number we use:

figure r

where is the assignment which maps every unknown to 1 and is the singleton set \(\{6\}\).

The output is an assignment which maps 6 to the 6th Fibonacci number. Morevoer, since in order to determine \(x_6\) we also need to solve for the unknowns from \(x_0\) to \(x_5\), the resulting assignment also contains the values for these unknowns:

figure u

Another solver for infinite equation systems implemented in ScalaFix is the , where the unknown to be updated is chosen, among those in the worklist, according to priorities which are dynamically generated.

In the general case there is no guarantee of convergence: this should be assured from the specific theory used to analyze the equation system under consideration.

2.2 Finite Equation Systems

The solvers in the package work for equation systems with a possibly infinite number of unknowns. If the number of unknowns is finite, it is possible to use a instead, which allows to use different solvers specifically tailored for this case.

A finite equation system is characterized, besides its body, also by:

  • the set of all the unknowns;

  • the influence relation between the unknowns;

  • a subset of all the unknowns, called the input unknowns.

While the set of all the unknowns is an obvious information, the other two parameters deserve an explanation. The influence relation determines the dependencies between unknowns. If the unknown x is used in the right hand side of the equation defining y, then we say that x influences y. When x is recomputed and its value changes, all the unknowns influenced by x should be recomputed as well.

Note that not all the solvers actually need the influence relation. For example, it is not used by , which performs a parallel update of all the unknowns, and by , which repeatedly updates all the unknowns one at a time. On the contrary, it is used by those solvers which avoid to recompute an unknown when it is not strictly necessary, such as . In the case of infinite equation systems, the influence relation is computed dynamically during the evaluation of the body when needed, while for finite equation systems we require the influence relation to be provided statically.

The set of input unknowns is used to compute a depth-first ordering of the unknowns in the equation system. This allows to determine the set of unknowns where widening should be applied to ensure convergence and the default hierarchical ordering [11] for the .

We may turn the Fibonacci example into a finite equation system by restricting the number of unknowns, as follows:

figure ac

Then we may solve the equation system, for example with:

figure ad

Here we do not need to specify the set of wanted unknowns, since we assume we are interested in solving the entire equation system and find the fixpoint for all the unknowns.

Fig. 1.
figure 1

The example program loop. The symbols \(\vee \), \(\wedge \) and \(+\) are respectively the lub, the glb and the sum on the domain of intervals.

2.3 A Use Case for Static Analysis

We show a use case involving the Parma Polyhedra Library trough the Java binding provided by JPPL. Consider the example program loop and its corresponding equation system over the interval domain in Fig. 1. We first define the body of the equation system using the interval abstract domain from PPL as follows:

figure af

We can now construct a finite equation system as follows:

figure ag

where correspond to the program points 0, 1, 2, 3 in Fig. 1 and defines the dependency relations between the equations. For instance, means that any change in the value of \(x_0\) requires recomputation of \(x_1\). We can now solve the equation system with:

figure ak

whose solution is:

figure al

where means that in the program point 2 the value of i is in the interval [0, 10].

2.4 Infinite Equation Systems and Static Analysis

While finite equation systems are well-suited for intra-procedural analysis, infinite equation systems may be used for inter-procedural analysis, by including an abstraction of the call-stack as part of the unknowns.

Consider, for example, the following code:

figure an

A possible approach for the analysis of this program consists in defining an equation system whose unknowns are pairs (pc) where p is a program point and c is an abstraction of the call-stack, such as (but not limited to) an abstract representation of the values of the formal parameters.

Assuming to work with the interval domain, this is an excerpt of the equation system which describes the function:

figure ap

When the function is called in the context c, the value of the variables at program point 1 is obtained by enlarging the input, provided in c, with a new unconstrained dimension representing the variable b. The equation for the return statement, i.e., the unknown (3, c), is a no-op: in the general case, we might decide to remove those dimensions corresponding to the local variables not returned by the function.

The following are the equations for the main program:

figure ar

The interesting point is the equation for the program point (5, c). Here we:

  1. 1.

    determine the abstract calling context by projecting the abstract value of the program point 4 on the actual parameters of the function call (variable i in this case);

  2. 2.

    take the abstract return context of the function , when invoked with the previously computed calling context;

  3. 3.

    combine the information at program point 4 and the return context to get the final best possible approximation of the value of variables at program point 5.

A theoretical discussion on the last step, as well as on alternative abstractions of the call-stack, is available in [21].

The reason why infinite equation systems are useful for inter-procedural analysis is that we cannot know in advance the contexts we will use for evaluating the function. In this example ScalaFix uses the intervals [0, 0] and [1, 1]. Note that, in more complex cases, some kind of widening operator should be applied on calling contexts to avoid generating an infinite number of them.

3 Widening, Narrowing and Warrowing

ScalaFix supports the use of widenings, narrowings [14] and warrowings [8]. These operators are commonly used to combine the values of the last two iterations into a new value, in order to accelerate or ensure the convergence. In this paper, and in the ScalaFix jargon, they are generally called combos. Combos are implemented at the level of an equation system, and therefore work with every fixpoint solver. Mathematically, a combo over a set V is a binary function \(\mathbin {\square }: V \times V \rightarrow V\). In ScalaFix we have that:

figure au

Applying a combo \(\mathbin {\square }\) to an unknown \(x_i\) means replacing the equation \(x_i = e\) with \(x_i = x_i \mathbin {\square }e\). Typically, a combo is applied to a selection of unknowns, generally the loop heads in the graph generated by the unknowns and their influence relation. Potentially, we might want to use different combos for different unknowns. Therefore, when using combos in ScalaFix, we need to specify a , i.e., a partial function which maps each unknown to the combo we want to use for it (if any). Continuing the example in Sect. 2.3, we may define a combo using the standard widening for intervals [12]:

figure aw

where means that we apply the widening to the unknown 1 only. We now equip the equation system with the widening:

figure ay

The equation system can be solved as before:

figure az

ScalaFix also implements general techniques enhancing widenings and narrowings such as delayed widening.

3.1 Automatic Determination of Combo Points

Instead of manually specifying the set of unknowns where combos should be applied, we may let ScalaFix determine this set automatically. Each finite equation system induces a dependency graph whose nodes are the unknowns and such that there is an edge (xy) iff x influences y. We may build a depth-first ordering of this graph using

figure ba

whose result for the example program loop is:

figure bb

Here the parenthesis denotes loop head nodes, i.e., nodes which are the target of retreating edges. In order to ensure convergence, it is enough to apply widenings to these nodes. This may be done with the method used above, using the graph ordering as a parameter:

figure bd

Then, everything proceeds as in the previous example.

4 Equation Systems Based on Hyper-Graphs

In the equation system shown above, the right-hand side of equations are black boxes. This is generally fine, but in some cases exposing some structure allows optimizations which are not possible otherwise. This is especially true for unknowns such as \(x_1\) in Fig. 1 which correspond to join nodes of a flow chart.

ScalaFix allows to define a body for an equation system in a way that makes manifest the individual contributions of the edges of the flow chart. Consider again the equations in Fig. 1. For the sake of clarity, in Fig. 2 we depict the control-flow graph of the program. Note that the edge i=0 has no source: this is fine since ScalaFix supports hyper-graphs, where each edge may have many (possibly none) sources and a single target. Hyper-graphs are needed for inter-procedural analysis [19]. Edges enter and loop correspond to the two edges entering the join node in Fig. 1, i.e., to the contributions \(x_0\) and \(x_3\) in the equation \(x_1 = x_0 \vee x_3\).

Fig. 2.
figure 2

The graph corresponding to the equation system in Fig. 1.

We need to associate to each edge an action, i.e., a function that takes an assignment and returns the contribution of that edge to the new value of the target unknown.

figure be

For our example equation system we have:

figure bf

The actions for the edges should be packed together with fields describing the structure of the graph into a :

figure bh

The body is automatically reconstructed in ScalaFix by combining all the contributions from the incoming edges with the specified operation , which in our example is simply the upper bound operator of the abstract domain. Finally, the body is used to build a graph-based equation system:

figure bj

Since a is a subclass of , we may use exactly as the equation systems in the previous sections.

Note that the way we provide to the structure of the graph is not particularly elegant: there is a lot of redundancy among the parameters sources, target, ingoing and outgoing. However, ScalaFix has been principally designed to be used as a backend for a static analyzer. In this context, it is likely that the analyzer has already built the control-flow graph internally. Since the four parameters above are just functions from edges (or nodes) to set of nodes (or edges), it is easy for a static analyzer to build a very thin layer providing these parameters.

The ScalaFix library also provide a different API for building graphs (the class) which is easier to use for simple experiments but is not described in this paper.

Fig. 3.
figure 3

The example program nested.

Fig. 4.
figure 4

The graph corresponding to the equation system in Fig. 3.

4.1 Localized Widening

The definition of an equation system based on hyper-graphs allows us to use localized widening [7]. Consider the program in Fig. 3 and the corresponding system of equations. Let be the description of the graph in Fig. 3, as depicted in Fig. 4. We can build, as in the previous section, a graph equation system as follows:

figure br

and define the widening:

figure bs

Now using we can recover the depth-first ordering of the set of unknowns:

figure bu

which is , where and are the loop head nodes. We can apply localized widening to these nodes as follows:

figure by

where the last line computes the solution for the ascending chain. We can now start a descending phase using the narrowing defined in [12]:

figure bz

In the solution for the program point we have that , which cannot be computed without the localized widening.

5 A High-Level Interface

The interface shown above, where the user builds an equation system, decides where to apply widening/narrowing and calls the solver with appropriate parameters, is rather low-level. For example, if one wants to solve an equation system using the classical approach based on an ascending chain with widening followed by a descending chain with narrowing, this procedure must be repeated for both phases, as done in the previous section.

Albeit this allows an extreme flexibility, if we just want to solve an equation system following a standard approach, ScalaFix provides a high-level API which simplifies this task. It is enough to call the generic with a bunch of parameters which specify how we want to solve the equation system. For example, the analysis shown in Sect. 4.1 may be implemented more easily as follows:

figure cd

The possible choices for the above parameters are:

  • : one of the following fixpoint solvers:

    • : updates all the unknowns in parallel;

    • : updates one unknown at a time, following a fixed ordering;

    • : updates one unknown at a time, taken from a queue containing only the unknowns which might produce a different result w.r.t. the previous iteration;

    • : it is similar to the , but the order in which unknowns are extracted from the queue depends on an ordering of the unknowns;

    • : updates the unknowns following a hierarchical ordering (see [11]).

    For the and , the ordering is based on the depth first traversal of the equation system.

  • : does not use combos; puts combos at each unknown; places combos only at loop heads (which are automatically computed).

  • : or , for standard or localized widening respectively.

  • : uses widening operators with no descending phase; uses the standard two phases widening/narrowing approach; strictly intertwines ascending and descending steps in a single warrowing operator;

  • : either or for disabling or enabling the restarting policy which replaces part of the current assignment with the initial assignment, in order to improve precision [8] (only useful for the ).

The high level API also needs some extra information on the analysis domain. This may be provided to ScalaFix in the form of a given instance (the Scala equivalent of a type class) of the type . This instance implicitly provides the partial ordering relation and the upper bound operator for a given type. This is a fragment of the instance for :

figure df

6 Performance

In this section we present some benchmarks showing the performance of the ScalaFix library. Obviously, different equation solvers will have different performances, but comparing different methods for solving equation systems is not in the scope of this paper. What we want to show is the overhead which is caused by using the ScalaFix library instead of an ad-hoc equation solver.

6.1 A Simple Benchmark Using the PPL

Consider the equation system E given by the following equations on \(\mathcal P(\mathbb Z)\):

$$\begin{aligned} {\begin{matrix} x_0 &{}= \left( x_{N-1} \cap \{ v \mid v \le l \}\right) \cup \{ 0 \}\\ x_{i+1} &{}= \{ v+1 \mid v \in x_i \} \end{matrix}} \end{aligned}$$
(1)

We solve E with \(N=100\) and \(l=2{,}000\) using the following methods:

  1. 1.

    an ad-hoc implementation of the round-robin solver, using arrays as the data structure for assignments (array);

  2. 2.

    an ad-hoc implementation of the round-robin solver, using hash tables as the data structure for assignments (hash);

  3. 3.

    the round-robin solver of ScalaFix (scalafix).

For each method, we used both the and domains of the PPL, with or without widening at each unknown. In ScalaFix widenings are added to E using the .withCombos method, while in the custom solvers they are inlined inside the solvers.

Benchmarking programs running on the JVM is not an easy task, since a lot of factors may impact the execution speed, such as just in time compilation and garbage collection. We have used the JMH (Java Microbenchmark Harness) to perform the benchmarks, using 5 forks, each fork composed of 5 iterations for warming up the JVM and 5 iterations for collecting the results. On top of this, we have tried to reduce the effect of automatic CPU performance scaling by disabling Turbo Boost and setting a fixed clock for the CPU, low enough not to overheat the processor. In particular, the results have been obtained on a Intel Core i2500K clocked at 1.6 GHz.

The results are shown in Table 1, and are expressed in operations per second (i.e., the number of times the equation system is solved per second) with a 99% confidence interval.

Table 1. Benchmarks results (operations/s) with 99% confidence intervals.

The benchmarks show that the difference between the three solvers is negligible, since the cost of executing the and operations is much larger than the overhead of the fixpoint solvers.

6.2 Reaching Definitions

The second benchmark contains different implementations of an equation system for reaching definition analysis of a three-address code program from [1, p. 626], whose code is in Fig. 5. As before, we have executed the benchmark comparing the ScalaFix solver to an ad-hoc implementation of the round-robin solver, using arrays and hash tables. The results show that even in this case the difference between the solvers is negligible.

The experiments suggest that the overhead of using ScalaFix is very limited, almost zero.

Fig. 5.
figure 5

Reaching definition benchmark.

7 Related Work

Most available static analyzers, both for industrial or academic applications, implement their custom procedure for solving equation systems. We believe that the use of ScalaFix could help developers in experimenting with different and state-of-the-art solvers. Also, they could contribute, by implementing new techniques that would be immediately reusable by the community. Moreover, the developers would benefit from all the experiments and development efforts behind the library. Actually, one of the major difficulty in the development of ScalaFix has been to choose the correct abstractions to put widening, narrowing, warrowing, localized techniques, equation systems, assignments, solvers, etc...inside a common API.

To the best of our knowledge, ScalaFix is the only general purpose library for solving equation systems for static analysis which is currently available.

We are aware of only another proposal in the past with the library Fixpoint [20]. This library is unmaintained for more than nine years now and the subversion repository for the source code is not accessible. In general, while Fixpoint and ScalaFix share the same general goal, there are many differences:

  • Fixpoint was written in OCaml, while ScalaFix is written in Scala for the Java Virtual Machine.

  • The structure of Fixpoint was more monolithic than that of ScalaFix: the Fixpoint.manager type encapsulates almost all the information needed to solve an equation system, from the position of widenings to the action of the hyper-edges. In ScalaFix we give different responsibilities to different classes.

  • Fixpoint had additional modules implementing some techniques for solving fixpoint equations, namely, guided static analysis [17] and widening with threshold [23]. Implementation of these techniques is a planned improvements for ScalaFix.

  • ScalaFix implements many state-of-the-art techniques recently proposed, such as localized widening, warrowing and restarting.

  • ScalaFix implements general solvers for infinite equation systems, suitable for the analysis of inter-procedural programs.

Since the source code of Fixpoint is no more available, neither a more detailed comparison nor a performance evaluation has been possible.

Another library for solving fixpoint equations, with a different purpose, is Killdall (https://compcert.org/doc-1.6/html/Kildall.html), written for the Coq proof assistant, and part of the CompCert project [22]. Killdall implements the same algorithm as the for finite equation systems in ScalaFix, using the depth-first ordering of the equation system for deciding priorities. However, Killdall does not implement any of the additional features of ScalaFix such as combos (Kildall does not have any support for widening or narrowing), infinite equation systems or alternative solvers. But here the goal is to provide a mechanized verification of program analyses, which can be used to equip the CompCert C compiler, being a challenge to implement and reason upon data structures in a purely functional setting such as Coq.

Finally, FPSolve [15] is a library for solving systems of polynomial equations over a semi-ring. While in particular cases it is possible to recast data-flow equations as equations over a semi-ring, this does not hold in general. Therefore the applicability of FPSolve as a general procedure for solving data-flow equations is limited.

8 Conclusion

We have shown some features of the ScalaFix library. There are other features of ScalaFix which are not presented here, such as:

  • support for observing the behaviour of the solvers with the listener class which can be used for debugging and computing metrics, and also for fine-tuning the analysis domain using statistical approaches (see for instance [3, 6]);

  • support for restarting: a policy which, under certain conditions, replaces part of the current assignment with the initial assignment, in order to improve precision [8];

  • implementation of other equation solvers from the literature, such as solvers based on hierarchical ordering and priority worklists.

ScalaFix is the only general purpose library implementing advanced techniques such as localized widening and restarting. In the near future, we plan to enhance ScalaFix along several directions:

  • develop a thin interface layer to make ScalaFix easier to use by other JVM based languages;

  • implement more techniques such as guided abstract interpretation [17], lookahead widening [16] or the improved handling of descending chains in [18];

  • implement equation systems with side-effects [9] and for different paradigms [4, 5].

We have shown in Sect. 6 that the overhead of using ScalaFix instead of re-implementing an ad-hoc solver is negligible. A big effort has been provided to design the ScalaFix API to be as flexible as possible for the need of very different analyzers, and in the choice of the data structures both for equation systems and graphs to allow the implementation of many speed-up features, depending on the kind of equation systems used.