Keywords

1 Introduction

Probabilistic programming obviates the need to manually provide inference methods and enables rapid prototyping [13]. Automated formal verification of probabilistic programs, however, is still in its infancy. Our tool Amber provides a step towards solving this problem when it comes to automating the termination analysis of probabilistic programs, which is an active research topic [1, 6, 7, 9,10,11,12, 14, 16]. Probabilistic programs are almost-surely terminating (AST) if they terminate with probability 1 on all inputs. They are positively AST (PAST) if their expected runtime is finite [5]. We describe Amber, a fully automated software artifact to prove/disprove (P)AST. Proving (P)AST is a notoriously difficult problem; in fact it is harder than proving traditional program termination [15]. Amber supports the analysis of a class of polynomial probabilistic programs. Programs in the supported class consist of single loops whose body is a sequence of random assignments with acyclic variable dependencies. Moreover, Amber’s programming model supports programs parametrized by symbolic constants and drawing from common probability distributions. To automate termination analysis, Amber automates relaxations of various existing martingale-based proof rules ensuring (non-)(P)AST [8] and combines symbolic computation with asymptotic bounding functions. Amber certifies (non-)(P)AST without relying on user-provided templates/bounds over termination conditions. Our experiments demonstrate Amber outperforming the state-of-the-art in the automated termination analysis of probabilistic programs (Sect. 3).

Fig. 1.
figure 1

The Amber input syntax. C[V] denotes the set of polynomials in V (program variables) with coefficients from C (constants). The power operator is ‘**’.

Related Work. The tools MGen [6] and LexRSM [1] use linear programming techniques to certify PAST and AST, respectively. The recent tools Absynth [20], KoAT2 [18] and ecoimp [2] can establish upper bounds on expected costs, therefore also on expected runtimes, and thus certify PAST. While powerful on respective AST/PAST domains, we note that none of the aforementioned tools support both proving and disproving (P)AST. Amber is the first tool able to prove and disprove (P)AST. Our recent work introduces relaxations of existing proof rules for probabilistic (non-)termination together with automation techniques based on asymptotic bounding functions [19]. We utilize these proof rule relaxations in Amber and extend the technique of asymptotic bounding functions to programs drawing from common probability distributions and including symbolic constants.

Contributions. This tool demonstration paper describes what Amber can do and how it can be used for certifying (non-)(P)AST.

  • We present Amber, a fully automatic open-source software artifactFootnote 1 for certifying probabilistic (non-)termination (Sect. 2).

  • We exhaustively compare Amber to related tools and report on our experimental findings (Sect. 3).

  • We provide a benchmark suite of 50 probabilistic programs as a publicly available repository of probabilistic program examples (Sect. 3).

2 Usage and Components

Programming Model. Amber supports analyzing the probabilistic termination behavior of a class of probabilistic programs involving polynomial arithmetic and drawing from common probability distributions, parameterized by symbolic constants which represent arbitrary real numbers. All symbolic constants are assumed to be positive. Negative constants can be modeled with the explicit use of “-”. The grammar in Fig. 1 defines the input programs to Amber. Inputs consist of an initialization part and a while-loop, whose guard is a polynomial inequality over program variables. The initialization part is a sequence of assignments either assigning (symbolic) constants or values drawn from probability distributions. Within the loop body, program variables are updated with either (i) a value drawn from a distribution or (ii) one of multiple polynomials over program variables with some probability. Additional to the structure imposed by the grammar in Fig. 1, input programs are required to satisfy the following structural constraint: each variable updated in the loop body depends at most linearly on itself and at most polynomially on variables preceding. On a high-level, this constraint enables the use of algebraic recurrence techniques for probabilistic termination analysis [19]. Despite the syntactical restrictions, most existing benchmarks on automated probabilistic termination analysis [19] and dynamic Bayesian networks [3] can be encoded in our programming language. Figure 2 shows three example programs for which Amber is able to automatically infer the respective termination behavior.

Fig. 2.
figure 2

Two programs supported by Amber, with symbolic constants \(c,x0,e\in \mathbb {R}^+\); Program 2a is PAST, program 2b is AST but not PAST and program 2c is not AST.

Implementation and Usage. Amber is implemented in python3 and relies on the lark-parserFootnote 2 package to parse its input programs. Further, Amber uses the diofantFootnote 3 package as its computer-algebra system. To compute closed-form expressions for statistical moments of monomials over program variables only depending on the loop counter, Amber uses the tool Mora [4]. However, for efficient integration within Amber, we reimplemented and adapted the Mora functionalities exploited by Amber (Mora v2), in particular by employing dynamic programming to avoid redundant computations. Altogether, Amber consists of \({\sim }2000\) lines of code. Figure 3 shows Amber’s output when run on the program from Fig. 2a. Amber can be used through a Docker container [17] or installed locally. Detailed installation and usage instructions are available at https://github.com/probing-lab/amber.

Run with Docker. Amber can be used through a Docker container [17] by running: $ docker run -ti marcelmoosbrugger/amber

Amber can be run on our 2d_bounded_random_walk benchmark with:

$ ./amber benchmarks/past/2d_bounded_random_walk

Fig. 3.
figure 3

The output of Amber when run on the program from Fig. 2a.

Local Installation. First, clone the repository by running the following command in your terminal: $ git clone git@github.com:probing-lab/amber.git

Change directories to Amber’s root folder and make sure python3.8 and the package manager pip are installed on your system. All required python packages can be installed by running $ pip install -r requirements.txt

Create an input program (see Sect. 2) and save it in the benchmarks folder for example with the file name my-benchmark. Amber can now be run with respect to the input program benchmarks/my-benchmark with the following command: $ python ./amber.py --benchmarks benchmarks/my-benchmark.

Fig. 4.
figure 4

Main components of Amber and interactions between them.

Components. Figure 4 illustrates Amber’s main components. Amber uses four existing probabilistic termination proof rules [6, 9, 12, 16] and their relaxations [19]. Additionally, Amber extends the algorithms for these relaxations to further support drawing from common probability distributions and symbolic constants (cf. Fig. 1). After parsing the input program, Amber initializes the four proof rule relaxations and determines their applicability [19]. Amber then executes applicable proof rules consecutively and reports the analysis result containing potential witnesses for (non-)(P)AST. The proof rule algorithms require the computation of asymptotic bounding functions which is implemented in the Bound Store component.

3 Evaluation

Experimental Setup. Amber and our benchmarks, are publicly available at https://github.com/probing-lab/amber. The output of Amber is an answer (“Yes”, “No” or “Maybe”) to PAST and AST, together with a potential witness. We took all 39 benchmarks from [19] and extended them by 11 new programs to test Amber’s capability to handle symbolic constants and drawing from probability distributions. The 11 new benchmarks are constructed from the 39 original programs, by adding noise drawn from common probability distributions and replacing concrete constants with symbolic ones. As such, we conduct experiments using a total of 50 challenging benchmarks, involving polynomial arithmetic, probability distributions and symbolic constants. Further, we compare Amber not only against Absynth and MGen (as in [19]), but also evaluate Amber in comparison to the recent tools LexRSM [1], KoAT2 [18] and ecoimp [2]. Note that MGen can only certify PAST and LexRSM only AST. Moreover, the tools Absynth, KoAT2 and ecoimp mainly aim to find upper bounds on expected costs. Tables 1, 2 and 3 summarize our experimental results, with benchmarks separated into PAST (Table 1), AST but not PAST (Table 2), and not AST (Table 3). Benchmarks marked with * are part of our 11 new examples. In every table, ( ) marks a tool (not) being able to certify the respective termination property. Moreover, symbolizes that a benchmark is out-of-scope for a tool, for instance, due to not supporting some distributions or polynomial arithmetic. All benchmarks have been run on a machine with a 2.6 GHz Intel i7 (Gen 10) processor and 32 GB of RAM and finished within a timeout of 50 s, where most experiments terminated within a few seconds.

Table 1. 27 programs which are PAST.
Table 2. 14 programs which are AST and not necessarily PAST.
Table 3. 9 programs which are not AST.

Experimental Analysis. Amber successfully certifies 23 out of the 27 PAST benchmarks (Table 1). Although Absynth, KoAT2 and ecosimp can find expected cost upper bounds for large programs [2, 18, 20], they struggle on small programs whose termination is not known a priori. For instance, they struggle when a benchmark probabilistically “chooses” between two polynomials working against each other (one moving the program state away from a termination criterion and one towards it). Our experiments show that Amber handles such cases successfully. MGen supports the continuous uniform distribution and KoAT2 the geometric distribution whose support is infinite. With these two exceptions, Amber is the only tool supporting continuous distributions and distributions with infinite support. To the best of our knowledge, Amber is the first tool certifying PAST supporting both discrete and continuous distributions as well as distributions with finite and infinite support. Amber successfully certifies 12 benchmarks to be AST which are not PAST (Table 2). Whereas the LexRSM tool can certify non-PAST programs to be AST, such programs need to contain subprograms which are PAST [1]. The well-known example of symmetric_1D_random_walk, contained in our benchmarks, does not have a PAST subprogram. Therefore, the LexRSM tool cannot establish AST for it. In contrast, Amber using the Supermartingale Rule can handle these programs. To the best of our knowledge, Amber is the first tool capable of certifying non-AST for polynomial probabilistic programs involving drawing from distributions and symbolic constants. Amber is also the first tool automating (non-)AST and (non-)PAST analysis in a unifying manner.

Experimental Summary. Tables 1, 2 and 3 demonstrate that (i) Amber outperforms the state-of-the-art in certifying (P)AST, and (ii) Amber determines (non-)(P)AST for programs with various distributions and symbolic constants.

4 Conclusion

We described Amber, an open-source tool for analyzing the termination behavior for polynomial probabilistic programs, in a fully automatic way. Amber computes asymptotic bounding functions and martingale expressions and is the first tool to prove and disprove (P)AST in a unifying manner. Amber can analyze continuous, discrete, finitely- and infinitely supported distributions in polynomial probabilistic programs parameterized by symbolic constants. Our experimental comparisons give practical evidence that Amber can (dis)prove (P)AST for a substantially larger class of programs than state-of-the-art tools.