1 Introduction

Formal verification of randomized algorithms remains a challenging problem. In recent years, a number of specialized program logics [8, 10, 11, 37, 42] and automated techniques [6, 19, 20] have been developed to analyze these programs. In addition, a number of randomized algorithms have been verified directly in interactive theorem provers [26, 27, 52] without using intermediary program logics. Besides establishing correctness results, much of this work has focused on verifying the expected or average cost of randomized algorithms. Although expectation bounds are an important first step in cost analysis, there are other stronger properties that often hold. For many randomized algorithms, we can establish tail bounds which bound the probability that the algorithm takes more than a given amount of time.

For example, it is well known that randomized QuickSort performs \(O(n\log n)\) comparisons on average when sorting a list of length n, and this fact has been verified in theorem provers before [27, 52]. However, not only does it do \(O(n\log n)\) comparisons on average, but the probability that it does more than \(O(n\log n)\) comparisons is vanishingly small for sufficiently large lists. To be precise, let \(W_n\) be the number of comparisons when sorting a list of length n. Then, for any positive k there exists \(c_k\) such that \( \Pr \left[ W_n > c_k n \log n\right] < \frac{1}{n^k}\). When we say that such \(c_k\) exist, we mean so in a constructive and practical sense: we can actually determine them and they are not absurdly large, so that one can derive interesting concrete bounds. For instance, when n is 10 million, the probability that \(W_n\) is greater than \(8 n \log _2 n\) is less than \(10^{-9}\). These kinds of tail bounds hold for many other classical randomized algorithms and are often stronger than asymptotic expectation bounds.

Despite this, there is a good reason for the prior emphasis on expectation bounds rather than tail bounds in the field of formal methods: tail bounds on running time are usually quite difficult to derive. Common approaches for deriving these bounds involve the use of methods from analytic combinatorics [31] or the theory of concentration of measure [25]. Although these techniques are very effective, to be able to use them in a theorem prover one would first need to be able to mechanize the extensive body of results that they depend upon.

The Need for “Cookbook” Methods. Let us contrast the difficulty mentioned above with the (relative) ease of analyzing deterministic algorithms. For deterministic divide-and-conquer algorithms, the cost is often given by recurrences of the form

$$\begin{aligned} W(x) = a(x) + \sum _{i=1}^nW(h_i(x)) \end{aligned}$$
(1)

where the “toll” function \(a(x)\) represents the cost to process an input and divide it into subproblems of size \(h_1(x), \dots , h_n(x)\), which are then solved recursively. Every undergraduate algorithms course covers “cookbook” techniques such as the Master Theorem [13, 23] that can be used to straightforwardly derive asymptotic bounds on these kinds of recurrences. Moreover, these results can also be used to easily analyze recurrences for other types of resource use, such as the maximum stack depth or the span of parallel divide-and-conquer algorithms [15]. Recurrences for these kinds of resources have the form:

$$\begin{aligned} S(x) = b(x) + \max _{i=1}^nS(h_i(x)) \end{aligned}$$
(2)

We will call recurrences of the form in Eq. 1 “work recurrences” and those of the form in Eq. 2 “span recurrences”. Although Eq. 2 does not fit the format of the Master Theorem directly, when S is monotone the recurrence simplifies to \(S(x) = b(x) + S(\max _{i=1}^n(h_i(x)))\) and so can be analyzed using the Master Theorem.
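For instance, consider summing an array in parallel by recursively summing its two halves and adding the results. The work and span recurrences are

$$ W(n) = O(1) + 2\,W(n/2) = O(n) \qquad \qquad S(n) = O(1) + \max \left( S(n/2), S(n/2)\right) = O(1) + S(n/2) = O(\log n) $$

where the span bound follows from the Master Theorem once the recurrence is simplified using monotonicity as just described.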

What is nice about these methods is that they give a process for carrying out the analysis: find the toll function, bound the size of recursive problems, and then use the theorem. Even if the first two steps might require some ingenuity, the method at least suggests an approach to decomposing the problem.

Besides being easy to use, results like the Master Theorem do not have many mathematical prerequisites. This makes them ideal for use in interactive theorem provers. Indeed, Eberl [28] has recently mechanized the more advanced Akra and Bazzi [2] recurrence theorem in Isabelle and has used it to derive asymptotic bounds for a number of recurrence relations.

For randomized divide-and-conquer algorithms, the same recurrence relations arise, except the \(h_i(x)\) are random variables because the algorithms use randomness to divide the input into subproblems. Because of the similarity between deterministic and probabilistic recurrences, textbook authors sometimes give the following heuristic argument before presenting a formal analysis [23, pp. 175–177]: In an algorithm like QuickSort, the sizes of the sublists generated by the partitioning step can be extremely unbalanced in the worst case, but this happens very rarely. In fact, each sublist is unlikely to be much more than \(\frac{3}{4}\) the length of the original list. And, for a deterministic recurrence like \(W(n) = n~+~W(\frac{3}{4} n)~+~W(\frac{1}{4} n)\), the Master Theorem says the result will be \(O(n\log n)\). Thus, intuitively, we should expect the average running time of QuickSort to be something like \(O(n\log n)\).
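More precisely, since the two subproblem sizes differ, one appeals to the Akra–Bazzi generalization mentioned above [2]: the defining exponent \(p\) satisfies \(\left( \frac{3}{4}\right) ^p + \left( \frac{1}{4}\right) ^p = 1\), so \(p = 1\), and therefore

$$ W(n) = \Theta \left( n\left( 1 + \int _1^n \frac{u}{u^{2}}\, du\right) \right) = \Theta \left( n\left( 1 + \ln n\right) \right) = \Theta (n \log n) $$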

This raises a natural question: Is there a variant of the Master Theorem that can be used to justify this kind of heuristic argument? Moreover, because Eq. 2 does not simplify to a version of Eq. 1 in the randomized setting, we ideally want something that can be used to analyze recurrences of both forms.

For the case where there is only a single recursive call (so that \(n=1\) above), Karp [38] developed such a result. At a high-level, using Karp’s theorem involves two steps. First, bound the average size of the recursive subproblem by finding a function \(m\) such that \({{\mathrm{E}}}\left[ h_1(x)\right] \le m(x)\). Next, find a solution \(u\) to the deterministic recurrence relation

$$ u(x) \ge a(x) + u(m(x)) $$

Then the theorem says that for all positive integers w,

$$ \Pr \left[ W(x) > u(x) + w a(x)\right] \le \left( \frac{m(x)}{x}\right) ^w $$

There are a few side conditions on the functions \(m\) and \(u\) which are usually easy to check. Although this method generally does not give the tightest possible bounds, the bounds it yields are often quite strong. Recently, Karp’s technique has been extended [51] to the case \(n > 1\) for both span and work recurrences.
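To illustrate the recipe in the unary case, suppose the toll is \(a(x) = x\) and each recursive call is made on a subproblem whose expected size is at most half of the original, so that we may take \(m(x) = x/2\) (note that \(m(x)/x = 1/2\) is non-decreasing). Then \(u(x) = 2x\) solves the deterministic recurrence, since \(a(x) + u(m(x)) = x + 2 \cdot \frac{x}{2} = 2x\), and Karp’s theorem yields

$$ \Pr \left[ W(x) > 2x + wx\right] \le \left( \frac{1}{2}\right) ^w $$

for every positive integer w.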

Our Contribution. In this paper, we present a mechanization of Karp’s theorem and these extensions in Coq, and use it to develop verified tail bounds for (1) the number of comparisons in sequential QuickSort, (2) the span arising from comparisons in parallel QuickSort, (3) the height of a randomly generated binary search tree, and (4) the number of rounds needed in a distributed randomized leader election protocol. By using the Coq-Interval library [41] we are able to instantiate our bounds in Coq to establish numerical results such as the \(10^{-9}\) probability bound for QuickSort quoted above. To our knowledge, this is the first time these kinds of bounds have been mechanized.

We start by outlining the mechanization of probability theory that our work is based on (Sect. 2). We then describe Karp’s theorem and its extensions in more detail (Sect. 3). To demonstrate how Karp’s result is used, we describe our verification of the examples mentioned above, with a focus on the sequential QuickSort analysis (Sect. 4). Of course, formalization often requires changing parts of a paper proof, and our experience with Karp’s theorem was no different. We discuss the issues we encountered and what we had to change in Sect. 5. Finally, we compare our approach to related work (Sect. 6) and conclude by discussing possible extensions and improvements to our development (Sect. 7).

The Coq development described in this paper is available at https://github.com/jtassarotti/coq-probrec.

2 Probability Preliminaries

2.1 Discrete Probability

We first need a set of basic results and definitions about probabilities and expectations to be able to even state Karp’s theorem. We had to decide whether to use a measure-theoretic formulation or restrict ourselves to discrete distributions. Although the Isabelle standard library has an extensive formalization of measure theoretic probability, we are not aware of a similarly complete set of results in Coq (we discuss existing libraries later in Sect. 6). Moreover, the applications we had in mind only involved discrete distributions, so we did not need the extra generality of the measure-theoretic approach. To keep things simple, we decided to develop a small library for discrete probability theory. Defining probability and expectation for discrete distributions still involves infinite series over countable sets, which can raise some subtle issues involving convergence. We use the Coquelicot real analysis library [16] to deal with infinite series.

The definition of probability distributions is given in Fig. 1. We represent them as a record type parameterized by a countable type. We use the ssreflect [32] library’s definition of countable types (countType), which consists of a type \(\mathtt{A}\) equipped with a surjection from \(\mathtt{nat}\) to \(\mathtt{A}\).

The distribution record consists of three fields: (1) a probability mass function \(\mathtt{pmf : A -> R}\) that assigns a probability to each element of \(\mathtt{A}\), (2) a proof that \(\mathtt{pmf\ a}\) is non-negative for all \(\mathtt{a}\), and (3) a proof that the countable series that sums \(\mathtt{pmf\ a}\) over all \(\mathtt{a}\) converges and is equal to 1.

Random variables on a distribution (rvar) are functions from the underlying countable space \(\mathtt{A}\) to some other type \(\mathtt{B}\). The expected value of a real-valued random variable is defined in the usual way as the series \(\sum _{r \in \textsf {img}(X)} \Pr \left[ X = r\right] \cdot r\). Because the underlying distribution is discrete, the image of the random variable is a countable set, so we can define such a series.

Of course, expectations of discrete random variables do not always exist, because the above series may not converge absolutely. Because of this, even with the restriction to discrete probability, dealing with infinite series and issues of convergence can often be tedious. However, in practice many randomized algorithms involve only finite distributions. For random variables defined on such distributions, the expectation always exists, because the series is actually just a finite sum. For our mechanization of Karp’s theorem, we restrict to these finite distributions.

Fig. 1. Basic definitions for discrete probability distributions and random variables.
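To make the discussion concrete, the record has roughly the following shape. This is a sketch rather than a verbatim excerpt of the development: the field names and the \(\mathtt{countable\_sum}\) helper (which re-indexes a function on \(\mathtt{A}\) along the enumeration underlying the countable type) are illustrative.

From mathcomp Require Import all_ssreflect.
Require Import Reals Coquelicot.Coquelicot.

(* Re-index f : A -> R along the enumeration of A; indices that do not
   decode to an element of A contribute 0.  This turns f into an
   ordinary nat-indexed sequence that can be summed as a series. *)
Definition countable_sum {A : countType} (f : A -> R) (n : nat) : R :=
  oapp f 0%R (@unpickle A n).

Record distrib (A : countType) := mkDistrib {
  pmf :> A -> R;                                 (* probability mass function *)
  pmf_nonneg : forall a, (0 <= pmf a)%R;         (* probabilities are non-negative *)
  pmf_sum1 : is_series (countable_sum pmf) 1%R   (* ... and they sum to 1 *)
}.

(* A B-valued random variable is a function out of the sample space A,
   packaged together with the distribution on A. *)
Record rvar (A : countType) (B : Type) := mkRvar {
  rvar_dist : distrib A;
  rvar_fun :> A -> B
}.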

2.2 Monadic Encoding

We represent sequential and parallel randomized algorithms in Coq using a monadic encoding. Variants of this kind of representation have been used in many prior formalizations and domain specific languages [3, 9, 46, 48].

The type \(\mathtt{ldist\ A}\) represents probabilistic computations that result in values of type \(\mathtt{A}\). Such computations are represented as a finite list of values of type \(\mathtt{A}\) paired with the probabilities that these values occur. The bind operation \(\mathtt{dist\_bind\ l\ f}\) represents the process of performing the computation represented by \(\mathtt{l}\) to obtain a random element of type A (i.e., “sampling” from the distribution represented by \(\mathtt{l}\)), and then passing this to \(\mathtt{f}\). The return operation (\(\mathtt{dist\_ret}\)) applied to \(\mathtt{a}\) corresponds to the probabilistic computation that simply returns \(\mathtt{a}\) with probability 1. We use Coq’s notation mechanism to represent binding \(\mathtt{m}\) in \(\mathtt{e}\) by writing \(\mathtt{x <- m; e}\), and write \(\mathtt{mret\ a}\) for returning \(\mathtt{a}\).
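The following is a minimal sketch of such an encoding using the Coq standard library’s list functions; the actual \(\mathtt{ldist}\) type in our development additionally carries proofs that the weights are positive and sum to 1, which are omitted here, and the concrete names below are illustrative.

Require Import List Reals.
Import ListNotations.
Open Scope R_scope.

(* A probabilistic computation producing an A: a finite list of
   outcomes paired with the probability of each outcome. *)
Definition ldist (A : Type) : Type := list (R * A).

(* Return a with probability 1. *)
Definition dist_ret {A : Type} (a : A) : ldist A := [(1, a)].

(* Sample an a from l, run f on it, and weight each outcome of (f a)
   by the probability of having drawn a. *)
Definition dist_bind {A B : Type} (l : ldist A) (f : A -> ldist B) : ldist B :=
  flat_map (fun pa => map (fun qb => (fst pa * fst qb, snd qb)) (f (snd pa))) l.

(* The "x <- m; e" notation from the paper, written here with a double
   semicolon to avoid clashing with other notations. *)
Notation "x <- m ;; e" := (dist_bind m (fun x => e))
  (at level 61, right associativity).

(* Example: a fair coin, and the distribution of the sum of two flips. *)
Definition coin : ldist nat := [(1/2, 0%nat); (1/2, 1%nat)].
Definition two_flips : ldist nat :=
  x <- coin ;; y <- coin ;; dist_ret (x + y)%nat.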

3 Karp’s Theorem

Now that we have a formalization of the basic concepts of probability theory and a way to describe randomized algorithms in Coq, we can give a more careful explanation of Karp’s theorem and its extensions.

3.1 Unary Recurrences

The setting for Karp’s theorem is more general than the informal account we gave in the introduction. Specifically, he assumes that there is a set \(I\) of algorithm inputs, a function \(size: I\rightarrow \mathbb {R}^{\ge 0}\) such that \(size(z)\) is the “size” of input \(z\), and a family of random variables \(h(z)\) which correspond to the new problem that is passed to the recursive call of the algorithm. The random variable \(W(z)\), which represents the cost of the algorithm when run on input \(z\), is assumed to obey the following unary recurrence:

$$\begin{aligned} W(z) = a(size(z)) + W(h(z)) \end{aligned}$$
(3)

Although the intent of this recurrence is clear, it requires some care to interpret: on the right hand side, \(h(z)\) is a random variable, but it is given as an argument to \(W\), which technically has I as a domain, not I-valued random variables. Instead, we should read this not as the composition \(W\circ h\) applied to \(z\), but rather as a specification for the process which first generates a random problem according to \(h(z)\) and then passes it to \(W\). In other words, this part of the recurrence is really describing a monadic process of the form:

$$ z' \leftarrow h(z); W(z') $$

Already, Eq. 3 addresses a detail that is often glossed over in informal treatments of randomized algorithms. In informal accounts, one often speaks about a random variable \(W(n)\), which is meant to correspond to the number of steps taken by an algorithm when processing an instance of size n. The issue is that usually, the exact distribution depends not just on the size of the problem but also the particular instance, so it is somewhat sloppy to regard \(W(n)\) as a random variable (admittedly, we did so in Sect. 1). For instance, when randomized QuickSort is run on a list containing duplicate elements, a good implementation will generally perform fewer total comparisons. Even if one tries to avoid this issue by, say, restricting only to lists that do not contain duplicates, one would still need to prove that the distribution depends on the size of the list alone. This is mostly harmless in informal treatments, but it is a detail that would otherwise have to be dealt with in a theorem prover.

We assume there is some constant \(d\) that is the “cut-off” point for the recurrence: when the input’s size drops below d no further recursive calls are made. The function \(a: \mathbb {R} \rightarrow \mathbb {R}^{\ge 0}\) is required to be continuous and increasing on \((d, \infty )\), but equal to 0 on the interval [0, d]. In addition, it is required that \(0 \le size(h(z)) \le size(z)\), i.e., the size of the subproblem is not bigger than the original.

Then, assume there exists some continuous function \(m: \mathbb {R} \rightarrow \mathbb {R}\) such that for all z, \({{\mathrm{E}}}\left[ size(h(z))\right] \le m(size(z))\) and \(0 \le m(size(z)) \le size(z)\). Moreover, the function \(m(x)/x\) must be non-decreasing. Karp then argues that if there exists a solution to the deterministic recurrence relation \(\tau (x) = a(x) + \tau (m(x))\), there must be a continuous minimal solution \(u : \mathbb {R} \rightarrow \mathbb {R}\). He assumes such a solution exists and derives the following tail bound for \(W\) in terms of u:

Theorem 1

([38]). For all \(z\) and integer w such that \(size(z) > d\),

$$ \Pr \left[ W(z) > u(size(z)) + w \cdot a(size(z))\right] \le \left( \frac{m(size(z))}{size(z)}\right) ^w $$

Because \(u\) is the minimal solution to the deterministic recurrence, we can replace \(u\) with any other solution \(t\) in the above bound: if \(W(z)\) is greater than the version with \(t\), then by minimality of \(u\), it must be bigger than the version with \(u\). This means we do not need to find a closed form for the minimal solution \(u\), because any solution will give us a bound.

It is important to note that \(m\), \(a\) and \(u\) are all functions from \(\mathbb {R}\) to \(\mathbb {R}\). This means that we do not have to deal with subtle rounding issues that sometimes come up when attempting to formalize solutions to recurrences for algorithms. Eberl [28], in his formalization of the Akra-Bazzi theorem, has pointed out how important this can be. The trade-off is that establishing that the recurrence holds everywhere on the domain \(\mathbb {R}\) can be harder, especially at the boundaries where the recurrence terminates.

3.2 Extension to Binary Work and Span Recurrences

Although Theorem 1 makes it easier to get strong tail bounds, it cannot be used in many cases because it only applies to programs with a single recursive call.

Tassarotti [51] describes an extension to cover the general case of work and span recurrences with \(n > 1\) recursive calls. In our mechanization, we only handle the case where there are two recursive calls (so that \(n = 2\)) because this is sufficient for many examples. In this setting, we now have two random variables \(h_1\) and \(h_2\) giving the recursive subproblems. These variables are generally not independent: for QuickSort, \(h_1\) would be the lower partition of the list and \(h_2\) would be the upper partition. However, it is assumed that there is some function \(g_1: \mathbb {R} \rightarrow \mathbb {R}\) such that for all \(z \in I\) and \((z_1, z_2)\) in the support of \((h_1(z), h_2(z))\):

$$ g_1\left( size(z_1)\right) + g_1\left( size(z_2)\right) \le g_1\left( size(z)\right) $$

Informally, we can think of this function \(g_1\) as a kind of ranking function, and the above inequality is saying that the combined rank of the two subproblems is no bigger than that of the original problem. The function \(m\) is now required to bound the expected value of the maximum size of the two subproblems:

$$ {{\mathrm{E}}}\left[ \max \left( size(h_1(z)), size(h_2(z))\right) \right] \le m(size(z)) $$

For bounding span recurrences of the form:

$$\begin{aligned} S(z) \le a(size(z)) + \max (S(h_1(z)), S(h_2(z))) \end{aligned}$$
(4)

we assume once more that \(u\) is a solution to the recurrence \(u(x) \ge a(x) + u(m(x))\). Then we have:

Theorem 2

For all \(z\) and integer w such that \(size(z) > d\) and \(g_1(size(z)) > 1\),

$$ \Pr \left[ S(z) > u(size(z)) + w \cdot a(size(z))\right] \le g_1(size(z)) \cdot \left( \frac{m(size(z))}{size(z)}\right) ^w $$

The difference between the bound above and the one in Theorem 1 is the additional factor \(g_1(size(z))\). Generally speaking, \(g_1(size(z))\) will be bounded by a polynomial, so that in comparison to \(\left( \frac{m(size(z))}{size(z)}\right) ^w\), which decreases exponentially with respect to w, the effect is negligible.

The bound for binary work recurrences is slightly different. Given the recurrence:

$$\begin{aligned} W(z) \le a(size(z)) + W(h_1(z)) + W(h_2(z)) \end{aligned}$$
(5)

we need a second “ranking” function \(g_2\) with the same property that \(g_2\left( size(z_1)\right) + g_2\left( size(z_2)\right) \le g_2\left( size(z)\right) \) for all \(z_1\) and \(z_2\) in the support of the joint distribution \((h_1(z), h_2(z))\) when \(size(z) > d\). In the proof by Tassarotti [51], this second ranking function is used to transform the work recurrence into a span recurrence which is then bounded by Theorem 2, and this bound is converted back to a bound on the original recurrence. From the perspective of the user of the theorem, we now need \(u\) to solve the deterministic recurrence \(u(x) \ge \frac{a(x)}{g_2(x)} + u(m(x))\), and we obtain the following bound:

Theorem 3

For all \(z\) and integer w such that \(size(z) > d\) and \(g_1(size(z)) > 1\),

$$ \Pr \left[ W(z) > g_2(size(z)) \cdot u(size(z)) + w \cdot a(size(z))\right] \le g_1(size(z)) \cdot \left( \frac{m(size(z))}{size(z)}\right) ^w $$

Observe that on the left side of the bound, we re-scale \(u\) by a factor of \(g_2(size(z))\) because it was the solution to a recurrence in which we normalized everything by \(g_2\).

The above results let us fairly easily obtain tail bounds for a wide variety of probabilistic recurrences arising in the analysis of randomized divide-and-conquer algorithms. In the next section, we demonstrate their use by verifying a series of examples. After showing how they are used, we return to the discussion of the results themselves in Sect. 5, where we describe issues we encountered when trying to translate the paper proofs into Coq.

4 Examples

We now apply the results developed in the previous sections to several examples.

4.1 Sequential QuickSort

Our first example is bounding the number of comparisons performed by a sequential implementation of randomized QuickSort. To count the number of comparisons that the monadic implementation of the algorithm performs, we combine the probabilistic monad from Sect. 2.2 with a version of the writer monad that increments a counter every time a comparison is done. This cost monad is defined by:

[Listing a: Coq definition of the cost monad]

A computation of type \(\mathtt{cost\ A}\) is just a pair of a nat, representing the count of the number of comparisons, and an underlying value of type A. The bind operation sums costs in the obvious way. We can then define a version of comparison in this monad:

[Listing b: Coq definition of the compare operation]

where \(\mathtt{ltngtP}\) is a function from the \(\mathtt{ssreflect}\) library that returns whether \(\mathtt{x < y}\), \(\mathtt{x = y}\), or \(\mathtt{x > y}\).
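A sketch of these two definitions is given below. The caveats: the concrete names are illustrative, and the standard library’s \(\mathtt{comparison}\) type and \(\mathtt{Nat.compare}\) stand in here for ssreflect’s \(\mathtt{ltngtP}\).

Require Import Arith.

(* The cost monad: a computation of type (cost A) is a pair of a
   comparison count and a result of type A. *)
Definition cost (A : Type) : Type := (nat * A)%type.

Definition cost_ret {A : Type} (a : A) : cost A := (0, a).

(* Bind sums the costs of the two stages. *)
Definition cost_bind {A B : Type} (m : cost A) (f : A -> cost B) : cost B :=
  let (c1, a) := m in
  let (c2, b) := f a in
  (c1 + c2, b).

(* A three-way comparison that charges one unit of cost; the actual
   development uses ssreflect's ltngtP here, but Nat.compare
   illustrates the same idea. *)
Definition compare (x y : nat) : cost comparison :=
  (1, Nat.compare x y).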

The code for QuickSort is given in Fig. 2. This is the standard randomized functional version of QuickSort: For empty and singleton lists, \(\mathtt{qs}\) simply returns the input. Otherwise, it selects an element uniformly at random from the list using \(\mathtt{draw\_pivot}\). It then uses \(\mathtt{partition}\) to split the list into three parts: elements smaller than the pivot, elements equal to the pivot, and elements larger than the pivot. Elements smaller and larger than the pivot are recursively sorted and then the results are joined together. \(\mathtt{partition}\) uses the \(\mathtt{compare}\) operator defined above, which implicitly counts the comparisons it performs.

Fig. 2. Simplified version of code for sequential QuickSort. In ssreflect, we write \(\mathtt{[::]}\) for the empty list and \(\mathtt{[::\,a]}\) for a list containing the single element \(\mathtt{a}\). Because randomized QuickSort is not structurally recursive, the actual definition in our development defines it by well-founded recursion on the size of the input.

What is the probabilistic recurrence for this algorithm? In each round of the recursion, the algorithm performs n comparisons to partition a list of length n. So, taking the size function to be the length of the list, we have the toll function \(a(x) = x\). There are two recursive calls, and we have to sum the comparisons performed by each to get the total, so we need to use Theorem 3.

The \(h_1\) and \(h_2\) functions giving the recursive subproblems correspond to the \(\mathtt{lower}\) and \(\mathtt{upper}\) sublists returned by \(\mathtt{partition}\). We now need to bound the expected value of the maximum of the sizes of these two lists. We first show:

$$ {{\mathrm{E}}}\left[ \max \left( size(h_1(l)), size(h_2(l))\right) \right] \le \frac{1}{size(l)}\sum _{i=0}^{size(l) - 1} \max (i, size(l) - i -1) $$

To get some intuition for this inequality, imagine the input list l was already sorted. In this situation, if the pivot we draw is in position i, then the sublist of elements less than the pivot contains only elements to the left of position i in l, and the sublist of elements larger than the pivot contains only elements to the right of position i in l. The size of each sublist is therefore at most i and \(size(l) - i - 1\), respectively, which corresponds to the ith term in the sum above. The factor of \(\frac{1}{size(l)}\) is the probability of selecting each pivot index, because they are all equally likely. Of course, the input list is not actually sorted, but when we select pivot position i, we can consider where its position would be in the final sorted list, and the result is just a re-ordering of the terms in the sum.

Next we show by induction on n that:

$$ \sum _{i=0}^{n - 1} \max (i, n - i -1) = \binom{n}{2} + \left\lfloor \frac{n}{2}\right\rfloor \cdot \left\lceil \frac{n}{2}\right\rceil \le \frac{3 n^2}{4} $$
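As a quick sanity check of this identity, for \(n = 4\) the left-hand side is \(3 + 2 + 2 + 3 = 10\), and indeed \(\binom{4}{2} + 2 \cdot 2 = 6 + 4 = 10 \le 12 = \frac{3 \cdot 4^2}{4}\).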

We combine the two inequalities to conclude:

$$ {{\mathrm{E}}}\left[ \max \left( size(h_1(l)), size(h_2(l))\right) \right] \le \frac{3}{4} \cdot size(l) $$

The above bound is for the case when the list has at least 2 elements; otherwise the recursion is over so that the sublists have length 0. Hence we can define \(m\) to be \(m(x) = 0\) for \(x < 4/3\) and \(m(x) = \frac{3x}{4}\) otherwise. We use 4/3 as the cut-off point rather than 2 because it makes the recurrence easier to solve.

To use Theorem 3, we need to come up with two “ranking” functions \(g_1\) and \(g_2\) such that \(g_i(size(h_1(z))) + g_i(size(h_2(z))) \le g_i(size(z))\) for each i. Ideally, we want \(g_1\) to be as small as possible, because it scales the final bound we derive, whereas for \(g_2\) we want to pick something that makes it easy to solve the recurrence \(t(x) \ge a(x)/g_2(x) + t(m(x))\). Like the derivation of the bound \(m\), these parts of the proof are not automatic and require some experimentation. We define the following choices for the parameters of Theorem 3:

$$\begin{aligned} g_1(x) = x \qquad g_2(x) = {\left\{ \begin{array}{ll} \frac{1}{2} & x \le 1 \\ \frac{x}{x-1} & 1 < x < 2 \\ x & x \ge 2 \end{array}\right. } \qquad t(x) = {\left\{ \begin{array}{ll} 1 & x \le 1 \\ \log _{\frac{4}{3}} x + 1 & x > 1 \end{array}\right. } \end{aligned}$$

We can check \(g_1\) and \(g_2\) satisfy the necessary conditions, and that \(t\) is a solution to the resulting deterministic recurrence relation.
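For instance, for \(x \ge 2\) we have \(a(x)/g_2(x) = x/x = 1\) and \(m(x) = \frac{3x}{4} > 1\), so

$$ \frac{a(x)}{g_2(x)} + t(m(x)) = 1 + \left( \log _{\frac{4}{3}}\left( \frac{3x}{4}\right) + 1\right) = 1 + \left( \log _{\frac{4}{3}} x - 1\right) + 1 = t(x) $$

and the cases with \(1 < x < 2\) (where the inequality \(t(x) \ge a(x)/g_2(x) + t(m(x))\) is strict) are checked similarly.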

Writing T(x) for the total number of comparisons performed on input x, Theorem 3 now gives us:

$$ \Pr \left[ T(x) > size(x) \cdot \log _{4/3}(size(x)) + 1 + w \cdot size(x)\right] \le size(x) \cdot \left( \frac{3}{4}\right) ^w $$

for x such that \(size(x) > 1\). More concisely, if we set \(n = size(x)\), then this becomes:

$$ \Pr \left[ T(x) > n \log _{4/3} n + 1 + w n\right] \le n \cdot \left( \frac{3}{4}\right) ^w $$

In Coq, this is rendered as:

[Listing c: Coq statement of this tail bound for sequential QuickSort]

where \(k = \frac{1}{\ln {4/3}}\), \(\mathtt{rsize}\) returns the length of a list as a real number, and \(\mathtt{INR : nat -> R}\) coerces its input into a real number.

To understand the significance of these bounds, consider the case when \(w = \lfloor c \cdot \log _{4/3}n \rfloor \) for some constant c. Then, using the above we get:

$$\begin{aligned} \Pr \left[ T(x)> (c + 1)n \log _{4/3} n + 1\right]&\le \Pr \left[ T(x) > n \log _{4/3} n + 1 + w n\right] \\&\le n \cdot \left( \frac{3}{4}\right) ^{w} \le n \cdot \left( \frac{3}{4}\right) ^{c\log _{4/3}n - 1} \\&= \frac{4}{3} \cdot \frac{1}{n^{c-1}} \end{aligned}$$

so that when \(c > 2\), the probability goes very quickly to 0 for lists of even moderate size.

We can now use the Coq-Interval library, which provides tactics for establishing numerical inequalities, to compute the value of this bound for particular choices of n. In particular, we can establish the claim from the introduction: when sorting a list with 10 million elements, the probability that QuickSort performs more than \(8n\log _2 n\) comparisons is less than \(10^{-9}\).

[Listing d: Coq statement of the concrete \(10^{-9}\) bound, proved using the Coq-Interval tactics]
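The underlying numerical fact is roughly the following (the exact instantiation of w in the development may differ). For \(n = 10^7\) we have \(\log _{4/3} n \approx 56.03\) and \(8 \log _2 n \approx 186.03\), so taking \(w = 129\) gives \(n \log _{4/3} n + 1 + wn \le 8 n \log _2 n\), and therefore

$$ \Pr \left[ T(x) > 8 n \log _2 n\right] \le \Pr \left[ T(x) > n \log _{4/3} n + 1 + 129\, n\right] \le n \cdot \left( \frac{3}{4}\right) ^{129} \approx 7.7 \times 10^{-10} < 10^{-9} $$

It is this kind of purely numerical goal that the Coq-Interval tactics discharge automatically.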

4.2 Other Examples

We have mechanized the analysis of three other examples using Karp’s theorem. A discussion of these examples is given in the appendix of the full version of this paper available as supplementary material. Here we give a brief description of the examples:

  1. Parallel QuickSort: using Theorem 2 we show that the longest chain of sequential dependencies from comparisons in a parallel version of QuickSort is \(O(\log (n))\) with high probability.

  2. Binary search tree: we analyze the height of a binary search tree which is generated by inserting a set of elements under a random permutation. We show the height is \(O(\log (n))\) with high probability using Theorem 2.

  3. Randomized leader election: we consider a protocol for distributed leader election that has been analyzed by several authors [30, 47]. The protocol consists of stages called “rounds”. At the beginning of a round, each active node generates a random bit. If the bit is 1, the node remains “active” and sends a message to all the other nodes; otherwise, if the bit is 0 it becomes inactive and stops trying to become the leader. If every active node generates a 0 within a round, no messages are sent and instead of becoming inactive, those nodes try again in the next round. When there is only one active node remaining, it is deemed the leader. We use Theorem 1 to show that with high probability at most \(O(\log {n})\) rounds are needed (a sketch of how the theorem is instantiated follows this list).
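To sketch how Theorem 1 is instantiated for the last example (the parameters below are one workable choice, not necessarily the exact ones used in our development): take the size of an input to be the number of nodes that are still active, with toll \(a(x) = 1\) per round. If \(x \ge 2\) nodes are active, the number that remain active after a round is binomially distributed, except that all x nodes stay active when every bit is 0, which happens with probability \(2^{-x}\). Hence the expected number of active nodes after a round is

$$ \frac{x}{2} + x \cdot 2^{-x} \le \frac{3x}{4} \qquad \text {for } x \ge 2, $$

so we may take \(m(x) = \frac{3x}{4}\), and a logarithmic solution to the resulting deterministic recurrence then gives the \(O(\log n)\) bound on the number of rounds with high probability.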

5 Changes Needed for Mechanization

Anyone who has mechanized something based on a paper proof has probably encountered issues that make it harder than just “translating” the steps of the proof into the formal system. Even when the paper proof is correct, there are inevitably parts of the argument that are more difficult to mechanize than they appear on paper, and this can require changing the strategy of the proof.

Our experience mechanizing Karp’s theorem and its extensions was no different. In this section we describe obstacles that arose in our attempt to mechanize the proof.

5.1 Overview of Proof

To put the following discussion in context, we need to give a sketch of the paper proof. Recall that Theorem 1 says that if we have a probabilistic recurrence \(W\) with a corresponding deterministic recurrence solved by \(u\), then for all \(z\) and integer w,

$$ \Pr \left[ W(z) > u(size(z)) + w \cdot a(size(z))\right] \le \left( \frac{m(size(z))}{size(z)}\right) ^w $$

The first thing one would naturally try to prove this is to proceed by induction on the size of \(z\). However, immediately one realizes that the induction hypothesis needs to be strengthened: the bound above is only shown at each integer w, so there are “gaps” in between where we do not have an appropriately tight intermediate bound. To address this, Karp defines a function \(D_r\) which “interpolates” the bound \(\left( \frac{m(size(z))}{size(z)}\right) ^w\) to fill in these gaps. This function \(D_r\) is somewhat complicated, and is defined in a piecewise manner as follows:

  1. If \(r \le 0\) and \(x > 0\), then \(D_r(x) = 1\).

  2. If \(r > 0\):

    (a) If \(x \le d\) then \(D_r(x) = 0\).

    (b) If \(x > d\) and \(u(x) \ge r\) then \(D_r(x) = 1\).

    (c) If \(x > d\) and \(u(x) < r\) then

      $$ D_r(x) = \left( \frac{m(x)}{x}\right) ^{\left\lceil \frac{r - u(x)}{a(x)}\right\rceil } \frac{x}{u^{-1}\left( r - a(x) \left\lceil \frac{r - u(x)}{a(x)}\right\rceil \right) } $$

This definition is intricate, especially the last case. However, if we set \(r = u(size(z)) + w \cdot a(size(z))\), then \(D_r(size(z))\) simplifies to \(\left( \frac{m(size(z))}{size(z)}\right) ^{w}\), confirming the intuition that this is some kind of interpolation.
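To see why, suppose \(w \ge 1\) and set \(r = u(x) + w \cdot a(x)\) for some \(x > d\), so that \(u(x) < r\) and case (c) applies. Then

$$ \left\lceil \frac{r - u(x)}{a(x)}\right\rceil = w, \qquad r - a(x) \cdot w = u(x), \qquad \text {and so} \qquad D_r(x) = \left( \frac{m(x)}{x}\right) ^{w} \cdot \frac{x}{u^{-1}(u(x))} = \left( \frac{m(x)}{x}\right) ^{w} $$

For \(w = 0\) we have \(r = u(x)\), so case (b) applies and \(D_r(x) = 1 = \left( \frac{m(x)}{x}\right) ^0\).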

Next, define \(K_r(z) = \Pr \left[ W(z) > r\right] \). Then, the result follows by showing that

$$ K_r(z) \le D_r(size(z)) $$

The probabilistic recurrence relation for \(W\) implies that:

$$\begin{aligned} K_r(z) \le {{\mathrm{E}}}\left[ K_{r-a(size(z))}(h(z))\right] \end{aligned}$$
(9)

when \(size(z) > d\). Karp’s idea is to recursively define a sequence of functions \(K^{i}_r\) for \(i \in \mathbb {N}\) which approximate \(K_r\). These are defined by:

$$\begin{aligned} K^{0}_r(z)&= {\left\{ \begin{array}{ll} 1 & \text {if } r < u(d) \\ 0 & \text {otherwise} \end{array}\right. } \\ K^{i+1}_r(z)&= {{\mathrm{E}}}\left[ K^{i}_{r-a(size(z))}(h(z))\right] \end{aligned}$$

Note the similarity between the recursive case and the property in (9). For all i, \(K^{i}_{r}(z) \le 1\), so \(\sup _{i}K^{i}_{r}(z)\) exists. Karp then asserts that \(K_r(z) \le \sup _{i} K^{i}_r(z)\), so it suffices to show that for all i, \(K^{i}_r(z) \le D_r(size(z))\).

The proof is by induction on i. The base case is straightforward. For the inductive case, the definition of \(K^{i+1}_r\) and the induction hypothesis give us:

$$\begin{aligned} K^{i+1}_r(z)&= {{\mathrm{E}}}\left[ K^{i}_{r-a(size(z))}(h(z))\right] \\&\le {{\mathrm{E}}}\left[ D_{r-a(size(z))}(size(h(z)))\right] \end{aligned}$$

So we just need to show that this final expected value is \(\le D_r(size(z))\). The key is the following simple lemma, which lets us bound the expected value of suitable functions of random variables:

Lemma 1

([38, Lemma 3.1]). Let X be a random variable with values in the range [0, x]. Suppose \(f : \mathbb {R} \rightarrow \mathbb {R}\) is a non-negative function such that \(f(0) = 0\), there exists some constant c such that \(f(y) = 1\) for all \(y \ge c\), and f(y) / y is non-decreasing on the interval (0, c]. Then:

$$ {{\mathrm{E}}}\left[ f(X)\right] \le \frac{{{\mathrm{E}}}\left[ X\right] f(\min (x, c))}{\min (x,c)} $$

Applying this with \(X = size(h(z))\), \(f = D_{r-a(size(z))}\), and suitable choice of c gives us the desired result. Of course, we need to check that this choice of f satisfies the conditions of the lemma. In particular, showing that f(y) / y is non-decreasing is somewhat involved, and it is here that the various continuity assumptions on parameters like \(a\) are used.

Once the inductive proof is finished, we set \(r = u(size(z)) + w \cdot a(size(z))\), to get the form of the bound in the statement of the theorem.

5.2 Changes

Termination Assumption. The first problem we had was that we were unable to prove that \(K_r(z) \le \sup _{i} K^{i}_r(z)\). In the original paper proof, this inequality is simply stated without further justification. Young [53] has suggested that in fact one may need stronger assumptions on \(W\) or \(h\) to be able to conclude this and suggests two alternatives. Either \(W\) can be assumed to be a minimal solution to the probabilistic recurrence, or one can assume that the recurrence terminates with probability 1, that is, \(\Pr \left[ size(h^n(z)) > d\right] \rightarrow 0\) as \(n \rightarrow \infty \), where \(h^n\) denotes the n-fold iteration of h. In the end, we chose to make the latter assumption, because it is easy to show for most examples.

Existence of a Minimal Solution. Karp argues that if there is a solution to the deterministic recurrence relation, there must be a minimal solution \(u\). The results in the theorem are then stated in terms of \(u\). It seemed to us more efficient to simply state the results in terms of any continuous and invertible solution \(t\) to the recurrence relation. In this way, we avoid the need to prove the existence, continuity, and invertibility of the minimal solution. In fact, rather than assuming \(t\) is invertible on its full domain, we merely assume that there exists a function \(t'\) which is an inverse to \(t\) on the subdomain \((d, \infty )\), that is: \(t'(t(x)) = x\) for \(x > d\) and \(t(t'(x)) = x\) for \(x > t(d)\). The definition of \(D\) is then changed to replace occurrences of \(u\) with \(t\).

Division by Zero. The original piecewise definition of \(D\) above involves division by \(u^{-1}\left( r - a(x) \left\lceil \frac{r - u(x)}{a(x)}\right\rceil \right) \). However, it is not clear that this is always non-zero on the domain considered, and this is not explicitly discussed in the paper proof. Since we replace the \(u^{-1}\) function with a user supplied function \(t'\), we found it easier to simply require an explicit assumption that \(t'\) is non-zero everywhere.

Unneeded Assumptions. In the original paper proof, the toll function \(a\) is assumed to be everywhere continuous and strictly increasing on \([d, \infty )\). This rules out recurrences like \(W(z) = 1 + W(h(z))\) which show up in examples such as the leader election protocol. For that reason, there is actually an additional result in Karp [38] for the particular case where \(a(x) = 0\) for \(x \le d\) and 1 otherwise.

However, after finishing the mechanization of Theorem 1, we suspected that the assumptions on a could be weakened, avoiding the need for the additional lemma. We changed the assumptions to only require that a was monotone and continuous on the interval \((d, \infty )\). In turn, we require the function t which solves the deterministic recurrence to be strictly increasing on the interval \((d, \infty )\). Our prior proof script worked mostly unchanged: most of the changes actually ended up deleting helper lemmas we had needed under the original assumptions. This is not because our proof scripts were highly automated or robust, but because the original proof really was not exploiting these stronger assumptions. Checking this carefully with respect to the original paper proof would have been rather tedious, but was straightforward with a mechanized version.

Extending to the Binary Case. In a technical report, Karpinski and Zimmermann [39] claimed to extend Karp’s result to work and span recurrences with multiple recursive calls, so we initially tried to verify their result. The argument is fundamentally like Karp’s original proof, so many steps were described briefly because they were intended to be similar to the corresponding parts of the proof of Theorem 1. However, we were unable to prove that their analogue of the \(D_r\) function satisfied the assumptions of Lemma 1, and so we were stuck at the corresponding step of the induction argument. It was at this point that we mechanized the results from Tassarotti [51] instead.

6 Related Work

6.1 Verification of Randomized Algorithms and Mechanized Probability Theory

Audebaud and Paulin-Mohring [3] developed a different monadic encoding of randomized algorithms in Coq, one that can represent programs that do not necessarily terminate. It would be interesting to try to generalize our version of Karp’s theorem and apply it to programs expressed using this monad.

Barthe et al. [9] develop a probabilistic variant of Benton’s relational Hoare logic [14] called pRHL to do relational reasoning about pairs of randomized programs. Extensions to and applications of pRHL for reasoning about probabilistic programs have been developed in a series of papers [7, 10, 11], and this kind of relational reasoning has been implemented in the EasyCrypt tool [5]. There are many other formal logics for reasoning about probabilistic programs (e.g., [8, 40, 44, 49]). Kaminski et al. [37] presented a weakest-precondition logic that can be used to establish expected running time. As an example, they proved a bound on the expected number of comparisons used by QuickSort. The soundness of their logic was later mechanized by Hölzl [34] in Isabelle.

Van der Weegen and McKinna [52] mechanized a proof of the average number of comparisons performed by QuickSort in Coq, and used monad transformers to elegantly separate reasoning about correctness and cost while still being able to extract efficient code. Eberl [27] has recently mechanized a similar result, as well as bounds on the expected depth and height of binary search trees [26]. Haslbeck et al. [33] have verified expected height bounds for treaps, which requires measure theoretic probability because of the way that treap algorithms sample from continuous distributions. See the overview by Eberl et al. [29] for a description of the mechanizations from [26, 27, 33]. Eberl [28] also mechanized the Akra-Bazzi theorem, a generalization of the Master Theorem for reasoning about deterministic divide and conquer recurrences.

More generally, substantial developments of probability theory have been carried out in several theorem provers, including large amounts of measure theory [35, 36], the Central Limit Theorem [4], Lévy and Hoeffding’s inequalities [24], and information theory [1], to name just some of these results.

6.2 Techniques for Bounds on Randomized Algorithms

There are a vast number of tools and results that have been developed for analyzing properties of randomized algorithms; see [25, 31, 43, 45] for expository accounts of both simple and more advanced techniques. Different “cookbook” methods like Karp’s also exist: Bazzi and Mitter [12] developed a variant of the Akra-Bazzi master theorem for deriving asymptotic expectation bounds for work recurrences. Roura [50] presented a master theorem that also applies to recurrences like that of the expected work for QuickSort.

Chaudhuri and Dubhashi [22] extended the results of Karp [38] for unary probabilistic recurrence relations by weakening some of the assumptions of Theorem 1. Their proof used only “standard” techniques from probability theory like Markov’s inequality and Chernoff bounds, so they argued that it is easier to understand. Of course, this approach may be less beneficial for mechanization if we do not have a pre-existing library of results.

7 Conclusion

We have described our mechanization of theorems by Karp [38] and Tassarotti [51] that make it easier to obtain tail bounds for various probabilistic recurrence relations arising in the study of randomized algorithms. To demonstrate the use of these results, we have explained our verification of four example applications. Moreover, we have shown that these results can be used to obtain concrete numerical bounds, fully checked in Coq, for input sizes of practical significance. To our knowledge, this is the first mechanization of these kinds of tail bounds in a theorem prover.

In future work, it would be interesting to try to automate the inference of the \(a\), \(g_1\), and \(g_2\) functions used when applying Karp’s theorem. The resulting deterministic recurrence could also probably be solved automatically, since more complex recurrences have been analyzed automatically in related work (e.g., [21]). If these analyses are done as part of external tools, it would be useful to be able to produce proof certificates that could be checked using the Coq development we describe here, as in some other resource analysis tools [17, 18].

It should also be possible to extend the applicability of our mechanization by handling arbitrary probability distributions instead of finite ones. Moreover, it may be possible to use tools like the probabilistic relational Hoare logic of Barthe et al. [9] to prove suitable refinements between imperative randomized algorithms and the functional versions we have analyzed here. This would allow one to derive corresponding tail bounds on the imperative versions.