1 Introduction

Correctness proofs of algorithms are one of the main motivations for computer-based theorem proving. This survey focuses on the verification (which for us always means machine-checked) of textbook algorithms. Their often tricky nature means that for the most part they are verified with interactive theorem provers (ITPs).

We explicitly cover running time analyses of algorithms, but for reasons of space only those analyses employing ITPs. The rich area of automatic resource analysis (e.g. the work by Jan Hoffmann et al. [103, 104]) is out of scope.

The following theorem provers appear in our survey and are listed here in alphabetic order: ACL2 [111], Agda [31], Coq [25], HOL4 [181], Isabelle/HOL [150, 151], KeY [7], KIV [63], Minlog [21], Mizar [17], Nqthm [34], PVS [157], Why3 [75] (which is primarily automatic). We always indicate which ITP was used in a particular verification, except for Isabelle/HOL (which we abbreviate to Isabelle from now on), which remains implicit. Some references to Isabelle formalizations lead into the Archive of Formal Proofs (AFP) [1], an online library of Isabelle proofs.

There are a number of algorithm verification frameworks built on top of individual theorem provers. We describe some of them in the next section. The rest of the article follows the structure and contents of the classic text by Cormen et al. [49] (hereafter abbreviated by CLRS) fairly closely while covering some related material, too. Material present in CLRS but absent from this survey usually means that we are not aware of a formalization in a theorem prover.

Because most theorem provers are built on a logic with some kind of functional programming language as a sublanguage, many verifications we cite pertain to a functional version of the algorithm in question. Therefore we only mention it explicitly when a proof deals with an imperative algorithm.

It must be emphasized that this survey is biased by our perspective and covers recent work by the authors in more depth. Moreover it is inevitably incomplete. We encourage our readers to notify us of missing related work.

2 Programming and Verification Frameworks

In this section we will describe how the various systems that appear in our survey support program verification. The theorem provers ACL2, Agda, Coq, HOL, Isabelle, Minlog, Nqthm and PVS are based on logics that subsume a functional programming language. By default that is the language algorithms must be formulated in. ACL2 and Nqthm are special in two regards: their logic contains the programming language Lisp, whereas the other theorem provers typically rely on some sort of compiler into functional languages like SML, OCaml, Haskell, Scala and even JavaScript; moreover, ACL2 also supports imperative features [35] directly.

KeY and KIV are primarily program verification systems but with inbuilt provers. They support modular verification and stepwise refinement. KeY focuses on the verification of Java programs, KIV on refinement and the automatic generation of executable programs in multiple target languages.

Why3 also falls into the program verification category. It has its own programming and specification language WhyML, which is mostly functional but with mutable record fields and arrays. Verification conditions are discharged by Why3 with the help of various automated and interactive theorem provers. WhyML can be translated into OCaml but can also be used as an intermediate language for the verification of C, Java, or Ada programs.

Mizar is the odd one out: it does not have any built-in notion of algorithm and its proofs about algorithms are at an abstract mathematical level.

There are various approaches for algorithm verification. Two important categories are the following:

Explicit Programming Language (Deep Embedding). One can define a programming language – functional or imperative – with a convenient set of constructs, give it a formal semantics, and then express an algorithm as a program in this language. Additionally, a cost model can be integrated into the semantics to enable formal reasoning about running time or other resource use. The actual analysis is then typically done with some sort of program logic (e.g. a Hoare-style calculus). When embedded in a theorem prover, this approach is often referred to as a deep embedding.
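To illustrate the idea independently of any of the systems above, the following Haskell sketch deeply embeds a minimal imperative language and instruments its big-step semantics with a step count; all names are illustrative only.

```haskell
import qualified Data.Map as M

-- A tiny deeply embedded command language (illustration only; not the
-- syntax of any of the frameworks discussed in this survey).
type Var = String
data Expr = Lit Int | V Var | Add Expr Expr
data Com  = Skip | Assign Var Expr | Seq Com Com
          | If Expr Com Com | While Expr Com

type State = M.Map Var Int

eval :: State -> Expr -> Int
eval _ (Lit n)   = n
eval s (V x)     = M.findWithDefault 0 x s
eval s (Add a b) = eval s a + eval s b

-- Big-step semantics instrumented with a simple cost model:
-- one unit per assignment and per test. In an ITP this would typically
-- be an inductively defined relation, since While need not terminate.
exec :: State -> Com -> (State, Int)
exec s Skip         = (s, 0)
exec s (Assign x e) = (M.insert x (eval s e) s, 1)
exec s (Seq c1 c2)  = let (s1, t1) = exec s  c1
                          (s2, t2) = exec s1 c2
                      in (s2, t1 + t2)
exec s (If b c1 c2) = let (s', t) = exec s (if eval s b /= 0 then c1 else c2)
                      in (s', t + 1)
exec s (While b c)
  | eval s b /= 0   = let (s1, t1) = exec s c
                          (s2, t2) = exec s1 (While b c)
                      in (s2, t1 + t2 + 1)
  | otherwise       = (s, 1)
```

A program logic (e.g. a Hoare-style calculus with time credits) would then be proved sound with respect to such a semantics.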

Directly in the Logic (No Embedding). As was mentioned before, many ITPs offer functionality to define algorithms directly in the logic of the system – usually functionally. This approach is more flexible since algorithms can use the full expressiveness of the system’s logic and not only some fixed restricted set of language constructs. One possible drawback of this approach is that it can be difficult or even impossible to reason about notions such as running time explicitly. A possible workaround is to define an explicit cost function for the algorithm, but since there is no formal connection between that function and the algorithm, one must check by inspection that the cost function really does correspond to the incurred cost. Another disadvantage is that, as was said earlier, most logics do not have built-in support for imperative algorithms.
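The following Haskell sketch illustrates this workaround for insertion sort: the algorithm is an ordinary function, and a separate cost function mirrors its recursion structure. The names tInsert and tIsort are ours, and the sketch exhibits exactly the drawback just mentioned: nothing formally ties the cost functions to the algorithm.

```haskell
-- Insertion sort as a plain function (standing in for a definition in
-- the logic of an ITP).
insert :: Ord a => a -> [a] -> [a]
insert x []       = [x]
insert x (y : ys) = if x <= y then x : y : ys else y : insert x ys

isort :: Ord a => [a] -> [a]
isort = foldr insert []

-- Separately defined cost functions counting comparisons; that they
-- correspond to the "real" cost must be checked by inspection.
tInsert :: Ord a => a -> [a] -> Int
tInsert _ []       = 0
tInsert x (y : ys) = 1 + (if x <= y then 0 else tInsert x ys)

tIsort :: Ord a => [a] -> Int
tIsort []       = 0
tIsort (x : xs) = tIsort xs + tInsert x (isort xs)
```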

Hybrids between these two approaches also exist (such as shallow embeddings). And, of course, the different approaches can be combined to reap the advantages of all of them; e.g. one can show a correspondence between the running time of a deeply-embedded algorithm and a cost function specified as a recurrence directly in the logic, so that results obtained about the latter have a formal connection to the former.

Imperative Verification Frameworks. As examples of such combined approaches, both Coq and Isabelle provide frameworks for the verification of imperative algorithms that are used to verify textbook algorithms.

The CFML tool [41] allows extracting a characteristic formula – capturing the program behaviour, including effects and running time – from a Caml program and importing it into Coq as an axiom. The CFML library provides tactics and infrastructure for Separation Logic with time credits [42] that allow one to verify both functional correctness and running time complexity.

Sakaguchi [176] presented a library in Coq that features a state monad and extraction to OCaml with mutable arrays.

Imperative-HOL, by Bulwahn et al. [38], is a monadic framework with references and arrays that allows code generation to ML and Haskell (with references and mutable arrays). Lammich [118] presents a simplified fragment of LLVM, shallowly embedded into Isabelle, with a code generator to LLVM text. Both Imperative-HOL and Isabelle-LLVM come with a Separation Logic framework and powerful proof tools that allow reasoning about imperative programs. Zhan and Haslbeck [201] extend Imperative-HOL with time and employ Separation Logic with time credits.

Isabelle Refinement Framework. There are several techniques for verifying algorithms and data structures. Many verification frameworks start in a “bottom-up” manner from a concrete implementation and directly generate verification conditions which are then discharged automatically or interactively. Another technique is to model an algorithm purely functionally, prove its correctness abstractly and then prove that it is refined by an imperative implementation. For example, Lammich [122] and Zhan [200] employ this technique for the verification of algorithms in Imperative-HOL.

A third approach is best described as “top-down”: an abstract program capturing the algorithmic idea is proved correct, refined stepwise to a more concrete algorithm and finally an executable algorithm is synthesized. The Isabelle Refinement Framework [127] constitutes the top layer of a tool chain in Isabelle that follows that approach. It allows specifying the result of programs in a nondeterminism monad and provides a calculus for stepwise refinement. A special case is data refinement, where abstract datatypes (e.g. sets) are replaced by concrete datatypes (e.g. lists). Many algorithms have been formalized in this framework and we will mention some of them in the main part of this paper. Lammich provides two backends to synthesize an executable program and produce a refinement proof from an abstract algorithm: The Autoref tool [114] yields a purely functional implementation by refining abstract datatypes with data structures from the Isabelle Collections Framework [121]. The Sepref tool [118, 119] synthesizes efficient imperative implementations in Imperative-HOL and LLVM. Haslbeck and Lammich [95] extended the Isabelle Refinement Framework and Sepref to reason abstractly about the running time of programs and synthesize programs that preserve upper bounds through stepwise refinement down to Imperative-HOL with time [201].
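As a toy illustration of the data refinement step (not of the Isabelle Refinement Framework itself), the following Haskell sketch implements an abstract operation on sets by a concrete operation on duplicate-free lists; the abstraction function and all names are ours.

```haskell
import qualified Data.Set as S

-- Abstract level: finite sets.  Concrete level: duplicate-free lists.
type AbsSet  a = S.Set a
type ConcSet a = [a]          -- invariant: no duplicate elements

-- Abstraction function relating the two levels.
alpha :: Ord a => ConcSet a -> AbsSet a
alpha = S.fromList

insAbs :: Ord a => a -> AbsSet a -> AbsSet a
insAbs = S.insert

insConc :: Eq a => a -> ConcSet a -> ConcSet a
insConc x xs = if x `elem` xs then xs else x : xs

-- Refinement obligation (provable by induction on xs):
--   alpha (insConc x xs) == insAbs x (alpha xs)
```

In the frameworks above, such refinement steps are carried out within a dedicated calculus, and nondeterministic abstract programs can be refined by deterministic concrete ones.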

2.1 Approaches for Randomized Algorithms

There are various approaches for reasoning about randomized algorithms in a formal way. Analogously to the non-randomized setting described in Sect. 2, there again exists an entire spectrum of different approaches:

  • fully explicit/deeply-embedded approaches

  • “no embedding” approaches that model randomized algorithms directly in the logic as functions returning a probability distribution

  • shallow embeddings, e.g. with shallow deterministic operations but explicit random choice and explicit “while” loops. Examples are the approaches by Petcher and Morrisett  [165] in Coq and by Kaminski et al.   [110] on paper (which was formalized by Hölzl  [105]).

  • combined approaches that start with a program in a deeply-embedded probabilistic programming language and then relate it to a distribution specified directly in the logic, cf. e.g. Tassarotti and Harper  [188].

Next, we will explore the different approaches that exist in an ITP setting to represent probability distributions. This is crucial in the “no embedding” approach, but even in the other cases it is useful to be able to give a formal semantics to the embedded programming language, prove soundness of a program logic, etc. The first work known to us on formalizing randomized algorithms is by Hurd [109], who represented randomized algorithms as deterministic algorithms taking an infinite sequence of random bits as an additional input. However, it seems that later work preferred another approach, which we will sketch in the following.

Generally, the idea is to have a type constructor M of probability distributions, i.e. \(M(\alpha )\) is the type of probability distributions over elements of type \(\alpha \). This type constructor, together with two monadic operations \(\textit{return} : \alpha \rightarrow M(\alpha )\) and \(\textit{bind} : M(\alpha ) \rightarrow (\alpha \rightarrow M(\beta )) \rightarrow M(\beta )\), forms the Giry monad  [84], which in our opinion has emerged as the most powerful and natural representation for randomized programs in an ITP setting.

The exact definition and scope of \(M(\alpha )\) varies. The following approaches are found in popular ITPs:

  • For their formalization of quicksort in Coq, Van der Weegen and McKinna [193] represented distributions as trees whose leaves are deterministic results and whose nodes are uniformly random choices. While well-suited for their use case, this encoding is not ideal for more general distributions.

  • Isabelle mostly uses probability mass functions (PMFs), i.e. functions \(\alpha \rightarrow [0,1]\) that assign a probability to each possible result (which only works for distributions with countable support). The same approach is also used by Tassarotti and Harper  [187] in Coq.

  • As an earlier variation of this, Audebaud and Paulin-Mohring  [14] used the CPS-inspired encoding \((\alpha \rightarrow [0,1])\rightarrow [0,1]\) of PMFs in Coq.

  • Isabelle contains a very general measure theory library  [106] in which distributions are functions \(\text {Set}(\alpha )\rightarrow [0,1]\) that assign a probability to every measurable set of results. This is the most expressive representation and allows for continuous distributions (such as Gaussians) as well. It can, however, be tedious to use due to the measurability side conditions that arise. PMFs are therefore preferred in applications in Isabelle whenever possible.
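To make the monadic interface concrete, the following Haskell sketch implements the PMF-based representation from the list above for distributions with finite support; the names are illustrative and do not correspond to the API of any of the cited libraries.

```haskell
import qualified Data.Map as M

-- Probability mass functions with finite support.
newtype PMF a = PMF { probabilities :: M.Map a Rational }

pmfReturn :: a -> PMF a                      -- Dirac distribution
pmfReturn x = PMF (M.singleton x 1)

pmfBind :: Ord b => PMF a -> (a -> PMF b) -> PMF b
pmfBind (PMF m) f =
  PMF (M.fromListWith (+)
        [ (y, p * q) | (x, p) <- M.toList m
                     , (y, q) <- M.toList (probabilities (f x)) ])

-- Uniform choice over a non-empty list, e.g. a fair die: uniform [1..6]
uniform :: Ord a => [a] -> PMF a
uniform xs = PMF (M.fromListWith (+) [ (x, 1 / fromIntegral (length xs)) | x <- xs ])

prob :: PMF a -> (a -> Bool) -> Rational     -- probability of an event
prob (PMF m) ev = sum [ p | (x, p) <- M.toList m, ev x ]

expectation :: PMF a -> (a -> Rational) -> Rational
expectation (PMF m) f = sum [ p * f x | (x, p) <- M.toList m ]

-- Example: the distribution of the sum of two fair dice.
twoDice :: PMF Int
twoDice = uniform [1..6] `pmfBind` \a ->
          uniform [1..6] `pmfBind` \b ->
          pmfReturn (a + b)
```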

The main advantage of having probability distributions in the logic as first-class citizens is again expressiveness and flexibility. It is then even possible to prove that two algorithms with completely different structure have not just the same expected running time, but exactly the same distribution. For imperative randomized algorithms or fully formal cost analysis, one must however still combine this with an embedding, as done by Tassarotti and Harper  [188].

One notable system that falls somewhat outside this classification is Ellora by Barthe et al.   [19]. This is a program logic that is embedded into the EasyCrypt theorem prover  [18], which is not a general-purpose ITP but still general enough to allow analysis of complex randomized algorithms.

This concludes the summary of the verification frameworks we consider. The rest of the paper is dedicated to our survey of verified textbook foundations and algorithms. We roughly follow the structure and contents of CLRS.

3 Mathematical Foundations

3.1 Basic Asymptotic Concepts

Landau symbols (“Big-O”, “Big-Theta”, etc.) are common both in mathematics and in the analysis of algorithms. The basic idea behind e.g. a statement such as \(f(n) \in O(g(n))\) is that f(n) is bounded by some multiple of g(n), but texts differ as to whether “bounded” means \(f(n) \le c\,g(n)\) or \(f(n)\le c\,|g(n)|\) or even \(|f(n)|\le c\,|g(n)|\). Usually (but not always), the inequality also need only hold “for sufficiently large n”. In the context of algorithms, f and g are usually functions from the naturals into the non-negative reals, so these differences rarely matter. In mathematics, on the other hand, the domain of f and g might be the real or complex numbers, and the neighbourhood in which the inequality must hold is often not \(n\rightarrow \infty \), but e.g. \(n\rightarrow 0\) or a more complicated region. Finding a uniform formal definition that is sufficiently general for all contexts is therefore challenging.
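For concreteness, the variant typically used in algorithm analysis (with the neighbourhood \(n\rightarrow \infty \) and absolute values on both sides) can be spelled out as

$$\begin{aligned} f(n) \in O(g(n)) \;\Longleftrightarrow \; \exists c>0.\ \exists n_0.\ \forall n\ge n_0.\ |f(n)| \le c\,|g(n)|\ . \end{aligned}$$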

To make matters worse, informal arguments involving Landau symbols often involve a considerable amount of hand-waving or omission of obvious steps: consider, for instance, the fact that

$$\begin{aligned} \exp \bigg (\frac{x+1}{\sqrt{x}+1}\bigg ) x^a (\log x)^b \in O\big (e^x\big )\ . \end{aligned}$$
(1)

This is intuitively obvious, since the first factor on the left-hand side is “roughly” equal to \(e^{\sqrt{x}}\). This is exponentially smaller than the \(e^x\) on the right-hand side, and that gap eclipses the other, polynomial–logarithmic factors. Doing such arguments directly in a formally rigorous way is very tedious, and simplifying this process is a challenging engineering problem.

Another complication is the fact that pen-and-paper arguments, in a slight abuse of notation, often use O(g(n)) as if it were one single function. The intended meaning of this is “there exists some function in O(g(n)) for which this is true”. For example, one writes \(e^{\sqrt{n+1}} = e^{\sqrt{n} + O(1/\sqrt{n})}\), meaning “there exists some function \(g(n) \in O(1/\sqrt{n})\) such that \(e^{\sqrt{n+1}} = e^{\sqrt{n} + g(n)}\).” This notation is very difficult to integrate into proof assistants directly.

Few ITPs have support for Landau-style asymptotics at all. In the following, we list the formalizations that we are aware of:

  • Avigad et al.   [15] defined “Big-O” for their formalization of the Prime Number Theorem, including the notation \(f\ {=}o\ g\ {+}o\ O(h)\) for \(f(x) = g(x) + O(h(x))\) that emulates the abovementioned abuse of notation at least for some simple cases. However, their definition of O is somewhat restrictive and no longer used for new developments.

  • Eberl defined the five Landau symbols from CLRS and the notion of asymptotic equivalence (“\(\sim \)”). These are intended for general-purpose use. The neighbourhood in which the inequality must hold is \(n\rightarrow \infty \) by default, but can be modified using filters  [107], which allow for a great degree of flexibility. This development is now part of the Isabelle distribution. A brief discussion of it can be found in his article on the Akra–Bazzi theorem  [59].

  • Guéneau et al. [87] (Coq) define a “Big-O”-like domination relation for running time analysis, also using filters for extra flexibility.

  • Affeldt et al.   [6] (Coq) define general-purpose Landau symbols. Through several tricks, they fully support the abovementioned “abuse of notation”.

It seems that the filter-based definition of Landau symbols has emerged as the canonical one in an ITP context. For algorithm analysis, the filter in question is usually simply \(n\rightarrow \infty \) so that this extra flexibility is not needed, but there are two notable exceptions:

  • Multivariate “Big-O” notation is useful e.g. if an algorithm’s running time depends on several parameters (e.g. the naïve multiplication of two numbers m and n, which takes O(mn) time). This can be achieved with product filters.

  • Suppose we have some algorithm that takes \(O(\log n)\) time on sorted lists \(\textit{xs}\) for large enough n, where \(n = |\textit{xs}|\) is the length of the list. This can be expressed naturally as \(\text {time}(\textit{xs}) \in O_F(\log |\textit{xs}|)\) w.r.t. a suitable filter F. For instance, in Isabelle, this filter can be written as

    $$\begin{aligned} \text {length}\ \mathbf {going\text {-}to}\ \text {at\_top}\ \mathbf {within}\ \{\textit{xs}.\ \text {sorted}\ \textit{xs}\}\ . \end{aligned}$$

In addition to that, Coq and Isabelle provide mechanisms to facilitate asymptotic reasoning:

  • Affeldt et al. [6] provide a proof method called “near” in Coq that imitates the informal pen-and-paper reasoning style where, in asymptotic arguments, one can assume properties as long as one can later justify that they hold eventually. This can lead to a more natural flow of the argument.

  • Motivated by the proof obligations arising from applications of the Master theorem, Eberl  [59] implemented various simplification procedures in Isabelle that rewrite Landau-symbol statements into a simpler form. Concretely, there are procedures to

    • cancel common factors, e.g. rewriting \(f(x) g(x)\in O(f(x) h(x))\) to \(g(x)\in O(h(x))\),

    • cancel dominated terms, e.g. rewriting \(f(x) + g(x) \in O(h(x))\) to \(f(x)\in O(h(x))\) when \(g(x)\in o(f(x))\) and

    • simplify asymptotic statements involving iterated logarithms, e.g. rewriting \(x^a (\log x)^b \in O(x^c (\log \log x)^d)\) to equations/inequalities of a, b, c, d.

  • Lastly, Eberl  [60] provides an Isabelle proof method to prove limits and Landau-symbol statements for a large class of real-valued functions. For instance, it can solve the asymptotic problem (1) mentioned earlier fully automatically.

3.2 The Master Theorem

CLRS present the Master theorem for divide-and-conquer recurrences and use it in the running time analysis of several divide-and-conquer algorithms. They also briefly mention another result known as the Akra–Bazzi theorem and cite the streamlined version due to Leighton  [130]. This result generalizes the Master theorem in several ways:

  • The different sub-problems being solved by recursive calls are not required to have the same size.

  • The recursive terms are not required to be exactly \(f(\lfloor n/b\rfloor )\) or \(f(\lceil n/b\rceil )\) but can deviate from n/b by an arbitrary sub-linear amount.

  • While the “balanced” case of the original Master theorem requires \(f(n) \in \varTheta (n^{\log _b a} (\log n)^k)\), the Akra–Bazzi theorem also works for a much larger class of functions.
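To fix notation (a rough sketch only, in CLRS-style notation rather than that of any particular formalization): the Master theorem concerns recurrences of the first shape below, while Leighton’s version of the Akra–Bazzi theorem handles the more general second shape, in which the \(h_i\) absorb floors, ceilings and other small perturbations of the sub-problem sizes:

$$\begin{aligned} T(n) = a\,T(n/b) + f(n) \qquad \text {and}\qquad T(x) = g(x) + \sum _{i=1}^{k} a_i\, T(b_i x + h_i(x))\ . \end{aligned}$$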

The only formalized result related to this that we are aware of is Eberl’s formalization of Leighton’s version of the Akra–Bazzi theorem  [59]. CLRS state that the Akra–Bazzi Theorem “can be somewhat difficult to use” – probably due to its rather technical side conditions and the presence of an integral in the result. However, Eberl’s formalization provides several corollaries that combine the first and second advantage listed above while retaining the ease of application of the original Master theorem.

Eberl gives some applications to textbook recurrences (mergesort, Karatsuba multiplication, Strassen multiplication, median-of-medians selection). Zhan and Haslbeck  [201] also integrated Eberl’s work into their work on verifying the asymptotic time complexity of imperative algorithms (namely imperative versions of mergesort, Karatsuba and median-of-medians). Rau and Nipkow  [173] used Eberl’s Master theorem to prove the \(O(n \log n)\) running time of a closest-pair-of-points algorithm.

4 Sorting and Order Statistics

4.1 Sorting

Verification of textbook sorting algorithms was a popular pastime in the early theorem proving days (e.g. [32]) but is now mostly of historical interest. To show that the field has progressed, we highlight three verifications of industrial code.

The sorting algorithm TimSort (combining mergesort and insertion sort) is the default implementation for generic arrays and collections in the Java standard library. De Gouw et al. [86] first discovered a bug that can lead to an ArrayIndexOutOfBoundsException and suggested corrections. Then De Gouw et al. [85] verified termination and exception freedom (but not full functional correctness) of the actual corrected code using KeY.

Beckert et al. [20], again with KeY, verified functional correctness of the other implementation of sorting in the Java standard library, a dual pivot quicksort algorithm.

Lammich [120] verified a high-level assembly-language (LLVM) implementation of two sorting algorithms: introsort [140] (a combination of quicksort, heapsort and insertion sort) from the GNU C++ library (libstdc++) and pdqsort, an extension of introsort from the Boost C++ libraries. The verified implementations perform on par with the originals.

Additionally, we mention a classic meta result that is also presented in CLRS: Eberl [56] formally proved the \(\varOmega (n \log n)\) lower bound for the running time of comparison-based sorting algorithms in Isabelle.

4.2 Selection in Worst-Case Linear Time

Eberl [57] formalized a functional version of the deterministic linear-time selection algorithm from CLRS including a worst-case analysis for the sizes of the lists in the recursive calls. Zhan and Haslbeck [201] refined this to an imperative algorithm, including a proof that it indeed runs in linear time using Eberl’s formalization of the Akra–Bazzi theorem (unlike the elementary proof in CLRS). However, the imperative algorithm they formalized differs from the one in CLRS in some details. Most notably, the one in CLRS is in-place, whereas the one by Zhan and Haslbeck is not. Formalizing the in-place algorithm would require either a stronger separation logic framework or manual reasoning to prove that the recursive calls indeed work on distinct sub-arrays.

5 Data Structures

5.1 Elementary Data Structures

We focus again on two noteworthy verifications of actual code. Polikarpova et al. [167, 168] verify EiffelBase2, a container library (with emphasis on linked lists, arrays and hashing) that was initially designed to replace EiffelBase, the standard container library of the Eiffel programming language [136]. A distinguishing feature is the high degree of automation of their Eiffel verifier called AutoProof [78]. The verification uncovered three bugs. Hiep et al. [100] (KeY) verified the implementation of a linked list in the Java Collection framework and found an integer overflow bug on 64-bit architectures.

5.2 Hash Tables

The abstract datatypes sets and maps can be efficiently implemented by hash tables. The Isabelle Collections Framework [121] provides a pure implementation of hash tables that can be realized by Haskell arrays during code generation. Lammich [116, 122] also verified an imperative version with rehashing in Imperative-HOL. Filliâtre and Clochard [72] (Why3) verified hash tables with linear probing. Pottier [170] verified hash tables in CFML with a focus on iterators. Polikarpova et al. (see above) also verified hash tables. These references only verify functional correctness, not running times.

5.3 Binary Search Trees

Unbalanced binary search trees have been verified many times. Surprisingly, proving functional correctness, including preservation of the BST invariant, almost always requires a considerable amount of human input (in the form of proof steps or annotations). Of course this is even more the case for balanced search trees, even ignoring the balance proofs. Most verifications are based on some variant of the following definition of BSTs: the element in each node must lie between the elements of the left subtree and the elements of the right subtree. Nipkow [146] instead specifies BSTs as trees whose inorder list of elements is sorted. With this specification, functional correctness proofs (but not preservation of balancedness) are fully automatic for AVL, red-black, splay and many other search trees.
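The following Haskell sketch contrasts the two specifications (assuming distinct keys and strict ordering); it is purely illustrative and not taken from any of the cited formalizations.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)

inorder :: Tree a -> [a]
inorder Leaf         = []
inorder (Node l x r) = inorder l ++ [x] ++ inorder r

-- Classical invariant: everything in the left subtree is smaller than the
-- root, everything in the right subtree is larger (recursively).
bst :: Ord a => Tree a -> Bool
bst Leaf         = True
bst (Node l x r) = bst l && bst r
                 && all (< x) (inorder l) && all (> x) (inorder r)

-- Alternative specification in the style of Nipkow [146]:
-- a tree is a BST iff its inorder traversal is sorted.
sorted :: Ord a => [a] -> Bool
sorted (x : y : zs) = x < y && sorted (y : zs)
sorted _            = True

bst' :: Ord a => Tree a -> Bool
bst' = sorted . inorder
```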

5.4 AVL and Red-Black Trees

Just like sorting algorithms, search trees are popular case studies in verification because they can often be implemented concisely in a purely functional way. We merely cite some typical verifications of AVL [47, 74, 152, 172] and red-black trees [12, 41, 52, 74] in various theorem provers.

We will now consider a number of search trees not in CLRS.

5.5 Weight-Balanced Trees

Weight-balanced trees were invented by Nievergelt and Reingold [143, 144] (who called them “trees of bounded balance”). They are balanced by size rather than height, where the size |t| of a tree t is defined as the number of nodes in t. A tree is said to be \(\alpha \)-balanced, \(0 \le \alpha \le 1/2\), if for every non-empty subtree t with children l and r, \(\alpha \le \frac{|l|+1}{|t|+1} \le 1 - \alpha \). Equivalently we can require \( \frac{\min (|l|,|r|)+1}{|t|+1} \ge \alpha \). Insertion and deletion may need to rebalance the tree by single and double rotations depending on certain numeric conditions. Blum and Mehlhorn [28] discovered a mistake in the numeric conditions for deletion, corrected it and gave a very detailed proof. Adams [5] used weight-balanced trees in an actual implementation (in ML) but defined balancedness somewhat differently from the original definition. Haskell’s standard implementation of sets, Data.Set, is based on Adams’s implementation. In 2010 it was noticed that deletion can break \(\alpha \)-balancedness. Hirai and Yamamoto [102], unaware of the work by Blum and Mehlhorn, verified their own version of weight-balanced trees in Coq, which includes determining the valid ranges of certain numeric parameters. Nipkow and Dirix [149] provided a verified framework for checking validity of specific values for these numeric parameters.
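As a small illustration (not taken from any of the cited developments), the balance condition can be checked as follows; real implementations store the subtree size in each node rather than recomputing it.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)

size :: Tree a -> Int
size Leaf         = 0
size (Node l _ r) = size l + size r + 1

-- alphaBalanced 0.25 t checks 1/4-balancedness, using the equivalent
-- condition (min(|l|,|r|) + 1) / (|t| + 1) >= alpha at every node.
alphaBalanced :: Rational -> Tree a -> Bool
alphaBalanced _     Leaf           = True
alphaBalanced alpha t@(Node l _ r) =
     fromIntegral (min (size l) (size r) + 1) / fromIntegral (size t + 1) >= alpha
  && alphaBalanced alpha l && alphaBalanced alpha r
```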

5.6 Scapegoat Trees

These trees are due to Andersson [11], who called them general balanced trees, and Galperin and Rivest [81], who called them scapegoat trees. The central idea: do not rebalance after every operation, but rebuild a subtree only when the whole tree gets “too unbalanced”, i.e. when the height is no longer logarithmic in the size, with a fixed multiplicative constant. Because rebuilding is expensive (in the worst case it can involve the whole tree), the worst-case complexity of insertion and deletion is linear. But because earlier calls did not need to rebalance, the amortized complexity is logarithmic. The analysis by Andersson was verified by Nipkow [147].
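The following Haskell sketch conveys the idea; for brevity it rebuilds the entire tree rather than only a subtree rooted at a “scapegoat” node, and the rebuild threshold is schematic.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)

size, height :: Tree a -> Int
size Leaf           = 0
size (Node l _ r)   = size l + size r + 1
height Leaf         = 0
height (Node l _ r) = 1 + max (height l) (height r)

insertUnbalanced :: Ord a => a -> Tree a -> Tree a
insertUnbalanced x Leaf = Node Leaf x Leaf
insertUnbalanced x t@(Node l y r)
  | x < y     = Node (insertUnbalanced x l) y r
  | x > y     = Node l y (insertUnbalanced x r)
  | otherwise = t

inorder :: Tree a -> [a]
inorder Leaf         = []
inorder (Node l x r) = inorder l ++ [x] ++ inorder r

-- Rebuild a perfectly balanced tree from the sorted list of elements.
fromSorted :: [a] -> Tree a
fromSorted [] = Leaf
fromSorted xs = Node (fromSorted ls) m (fromSorted rs)
  where (ls, m : rs) = splitAt (length xs `div` 2) xs

-- Insert without rebalancing; rebuild only once the height exceeds
-- c * log2 (size + 1) for a fixed constant c > 1.  The rebuild is linear
-- in the worst case, but amortized over the cheap insertions before it.
insert :: Ord a => Double -> a -> Tree a -> Tree a
insert c x t
  | height t' > ceiling (c * logBase 2 (fromIntegral (size t' + 1))) =
      fromSorted (inorder t')
  | otherwise = t'
  where t' = insertUnbalanced x t
```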

5.7 Finger Trees

Finger trees were originally defined by reversing certain pointers in a search tree to accelerate operations in the vicinity of specific positions in the tree [88]. A functional version is due to Hinze and Paterson [101]. It can be used to implement a wide range of efficient data structures, e.g. sequences with access to both ends in amortized constant time and concatenation and splitting in logarithmic time, random-access sequences, search trees, priority queues and more. Functional correctness was verified by Sozeau [182] (Coq) and by Nordhoff et al. [155]. The amortized complexity of the deque operations was analysed by Danielsson [50] (Agda).

5.8 Splay Trees

Splay trees [179] are self-adjusting binary search trees where items that have been searched for are rotated to the root of the tree to adjust to dynamically changing access frequencies. Nipkow [145, 146] verified functional correctness and amortized logarithmic complexity.

5.9 Braun Trees

Braun trees are binary trees where for each node the size of the left child is the same or one larger than the size of the right child [108, 174]. They lend themselves to the implementation of extensible arrays and priority queues in a purely functional manner [162]. They were verified by Nipkow and Sewell [153] in great depth. McCarthy et al. [133] demonstrate their Coq library for running time analysis by proving the logarithmic running time of insertion into Braun trees.
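To illustrate the flexible-array use of Braun trees, here is a standard functional encoding (a sketch, not the code of any cited formalization): the left subtree stores the odd positions and the right subtree the even positions, so both lookup and adding an element at the front take time proportional to the depth, which the Braun invariant keeps logarithmic.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- 0-based lookup in the array represented by a Braun tree.
lookupBraun :: Tree a -> Int -> a
lookupBraun (Node _ x _) 0 = x
lookupBraun (Node l _ r) i
  | odd i                  = lookupBraun l (i `div` 2)
  | otherwise              = lookupBraun r (i `div` 2 - 1)
lookupBraun Leaf _         = error "index out of bounds"

-- Add a new element at position 0: the old root moves into the (smaller)
-- right subtree and the two subtrees swap roles, which preserves the
-- Braun invariant |left| = |right| or |left| = |right| + 1.
consBraun :: a -> Tree a -> Tree a
consBraun x Leaf         = Node Leaf x Leaf
consBraun x (Node l y r) = Node (consBraun y r) x l
```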

6 Advanced Design and Analysis Techniques

6.1 Dynamic Programming

It is usually easy to write down and prove correct the recursive form of a dynamic programming problem, but it takes work to convert it into an efficient implementation by memoizing intermediate results. Wimmer et al. [196] automated this process: a recursive function is transformed into a monadic one that memoizes results, and a theorem stating the equivalence of the two functions is proved automatically. The results are stored in a so-called state monad. Two state monads were verified: a purely functional state monad based on search trees and the state monad of Imperative-HOL using arrays. The imperative monad yields implementations that have the same asymptotic complexity as the standard array-based algorithms. Wimmer et al. verify two further optimizations: bottom-up order of computation and an LRU cache for reduced memory consumption. As applications of their framework, they proved the following algorithms correct (in their recursive form) and translated them into their efficient array-based variants: Bellman-Ford, CYK (context-free parsing), minimum edit distance and optimal binary search trees. Wimmer [195] also treated Viterbi’s algorithm in this manner.
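The following Haskell sketch shows, for the minimum edit distance, roughly what such a memoizing transformation produces when the memo table is a purely functional map threaded through the computation; in the cited work the monadic version and its equivalence theorem are generated automatically, and all names below are ours.

```haskell
import qualified Data.Map as M

-- Minimum edit distance with a memo table indexed by (i, j), where i and j
-- are the lengths of the prefixes of xs and ys under consideration.
editDist :: String -> String -> Int
editDist xs ys = fst (go (length xs) (length ys) M.empty)
  where
    go :: Int -> Int -> M.Map (Int, Int) Int -> (Int, M.Map (Int, Int) Int)
    go i j memo =
      case M.lookup (i, j) memo of
        Just v  -> (v, memo)                       -- reuse a stored result
        Nothing -> let (v, memo') = compute i j memo
                   in (v, M.insert (i, j) v memo')

    compute 0 j memo = (j, memo)
    compute i 0 memo = (i, memo)
    compute i j memo =
      let (a, m1) = go (i - 1) (j - 1) memo        -- keep or substitute
          (b, m2) = go (i - 1) j       m1          -- delete
          (c, m3) = go i       (j - 1) m2          -- insert
          cost    = if xs !! (i - 1) == ys !! (j - 1)
                    then a else 1 + minimum [a, b, c]
      in (cost, m3)
```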

Nipkow and Somogyi [154] verified the straightforward recursive cubic algorithm for optimal binary search trees and Knuth’s quadratic improvement [113] (but using Yao’s simpler proof [199]) and applied memoization.

6.2 Greedy Algorithms

One example of a greedy algorithm given in CLRS is Huffman’s algorithm. It was verified by Théry [189] (Coq) and Blanchette [27]. For problems that exhibit a matroid structure, greedy algorithms yield optimal solutions. Keinholz [112] formalizes matroid theory. Haslbeck et al. [95, 96] verify the soundness and running time of an algorithm for finding a minimum-weight basis on a weighted matroid and use it to verify Kruskal’s algorithm for minimum spanning trees.

7 Advanced Data Structures

7.1 B-Trees

We are aware of two verifications of imperative formulations of B-trees. Malecha et al. [131] used Ynot [141], an axiomatic extension to Coq that provides facilities for writing and reasoning about imperative, pointer-based code. The verification by Ernst et al. [64] is unusual in that it combines interactive proof in KIV with the automatic shape analysis tool TVLA [175].

7.2 Priority Queues

We start with some priority queue implementations not in CLRS. Priority queues based on Braun trees (see Sect. 5.9) were verified by Filliâtre [70] (Why3) and Nipkow and Sewell [153]. Two self-adjusting priority queues are the skew heap [180] and the pairing heap [77]. Nipkow and Brinkop [145, 148] verified their functional correctness (also verified in Why3 [69, 164]) and amortized logarithmic running times. Binomial heaps (covered in depth in the first edition of CLRS) were verified by Meis et al. [135] (together with skew binomial heaps [36]), Filliâtre [71] (Why3) and Appel [13] (Coq).

The above heaps are purely functional and do not provide a decrease-key operation. Lammich and Nipkow [124] designed and verified a simple, efficient and purely functional combination of a search tree and a priority queue, a “priority search tree”. The salient feature of priority search trees is that they offer an operation for updating (not just decreasing) the priority associated with some key; its efficiency is the same as that of the update operation.

Lammich [117] verified an imperative array-based implementation of priority queues with decrease-key.

7.3 Union-Find

The union-find data structure for disjoint sets is a frequent case study: it was formalized in Coq [48, 176, 192] and Isabelle [122]. Charguéraud, Pottier and Guéneau [42, 43, 87] were the first to verify the amortized time complexity \(O(\alpha (n))\) in Coq using CFML. Their proof follows Alstrup et al. [10].

8 Graph Algorithms

8.1 Elementary Graph Algorithms

Graph-searching algorithms are so basic that we only mention a few notable ones. BFS for finding shortest paths in unweighted graphs was verified by participants of a verification competition [76] (in particular in KIV). Lammich and Sefidgar [125] verified BFS for the Edmonds–Karp algorithm. Lammich and Neumann [123] as well as Pottier [169] (Coq) verified DFS and used it for algorithms of different complexity, ranging from a simple cyclicity checker to strongly connected components algorithms. Wimmer and Lammich [198] verified an enhanced version of DFS with subsumption. Bobot [29] verified an algorithm for topological sorting by DFS in Why3.

Strongly Connected Components. There are several algorithms for finding strongly connected components (SCCs) in a directed graph. Tarjan’s algorithm [186] was verified by Lammich and Neumann [123, 142]. Chen et al. verified Tarjan’s algorithm in Why3, Coq and Isabelle and compared the three formalizations [45, 46]. Lammich [115] verified Gabow’s algorithm [79] (which was used in a verified model checker [65]), and Pottier [169] (Coq) verified the SCC algorithm featured in CLRS, which is attributed to Kosaraju.

8.2 Minimum Spanning Trees

Prim’s algorithm was first verified by Abrial et al. [4] in B [3] and on a more abstract level by Lee and Rudnicki [129] (Mizar). Guttmann [89, 90] verified a formulation in relation algebra, while Nipkow and Lammich [124] verified a purely functional version.

Kruskal’s algorithm was verified by Guttmann [91] using relation algebras. Functional correctness [97] and time complexity [95] of an imperative implementation of Kruskal’s algorithm were verified by Haslbeck, Lammich and Biendarra.

8.3 Shortest Paths

The Bellman-Ford algorithm was verified as an instance of dynamic programming (see Sect. 6.1).

Dijkstra’s Algorithm. Dijkstra’s algorithm has been verified several times. The first verifications were conducted by Chen, Lee and Rudnicki [44, 129] (Mizar) and by Moore and Zhang [138] (ACL2). While these formalizations prove the idea of the algorithm, they do not provide efficient implementations. Charguéraud [41] verifies an OCaml version of Dijkstra’s algorithm parameterized over a priority queue data structure (without a verified implementation). A notable point is that his algorithm does not require a decrease-key operation.

Nordhoff and Lammich [156] use their verified finger trees (see Sect. 5.7) that support decrease-key to obtain a verified functional algorithm. Lammich later refined the functional algorithm down to an imperative implementation using arrays to implement the heap [117]. Zhan [200] also verified the imperative version using his auto2 tool. Finally, Lammich and Nipkow [124] used their red-black tree based priority queues that also support decrease-key to obtain a simple functional implementation.

The Floyd–Warshall Algorithm. Early verifications by Paulin-Mohring [161] (Coq), Berger et al. [22] (Minlog), and Berghofer [23] relied on program extraction from a constructive proof and only targeted the Warshall algorithm for computing the transitive closure of a relation. Filliâtre and Clochard [73] (Why3) verified an imperative implementation of the Warshall algorithm. Wimmer [194] verified the functional correctness of the Floyd–Warshall algorithm for the APSP problem including detection of negative cycles. The main complication is to prove that destructive updates can be used soundly. This and the detection of negative cycles are left as an exercise to the reader in CLRS. The resulting functional implementation (with destructive updates) was later refined to an imperative implementation by Wimmer and Lammich [197].

8.4 Maximum Network Flow

The first verification of the Ford–Fulkerson method, at an abstract level, was by Lee [128] (Mizar). Lammich and Sefidgar [125] verified the Ford–Fulkerson method and refined it down to an imperative implementation of the Edmonds–Karp algorithm. They proved that the latter requires \(O(|V| \cdot |E|)\) iterations. On randomly generated networks, their code is competitive with a Java implementation by Sedgewick and Wayne [177]. In further work [126] they verified the generic push–relabel method of Goldberg and Tarjan and refined it to both the relabel-to-front and the FIFO push–relabel algorithm. They also performed a running time analysis and benchmarked their algorithms against C and C++ implementations.

Haslbeck and Lammich [95] provided a proper running time analysis of the Edmonds–Karp algorithm and proved the complexity \(O(|V|\cdot |E|\cdot (|V| + |E|))\).

8.5 Matching

Edmonds’ famous blossom algorithm [62] for finding maximum matchings in general graphs was verified by Abdulaziz [2].

Hamid and Castelberry [92] (Coq) verified the Gale–Shapley algorithm [80] for finding stable marriages.

9 Selected Topics

9.1 Matrix Operations

Palomo-Lozano et al. formalized Strassen’s algorithm for matrix multiplication in ACL2  [158], but only for square matrices whose size is a power of two. Dénès et al. formalized a slightly more efficient variant of it known as Winograd’s algorithm in Coq  [51] for arbitrary matrices. Garillot et al. formalized the LUP decomposition algorithm from CLRS in Coq  [83].

9.2 Linear Programming

The simplex algorithm was formalized by Allamigeon and Katz [9] (Coq) and by Spasić and Marić [132, 183]. The latter formalization was repurposed by Bottesch et al. [30] into an incremental algorithm that can emit unsatisfiable cores. Parsert and Kaliszyk [159] extended this to a full solver for linear programming.

9.3 Polynomials and FFT

The recursive Fast Fourier Transform was formalized in various systems. We are aware of the formalizations in ACL2 by Gamboa  [82], in Coq by Capretta  [39], in HOL4 by Akbarpour and Tahar  [8] and in Isabelle by Ballarin  [16].

9.4 Number-Theoretic Algorithms

Most of the basic number theory shown in CLRS (GCDs, modular arithmetic, Chinese remainder theorem) is available in the standard libraries of various systems and we will therefore not give individual references for this and focus entirely on the algorithms.

Hurd formalized the Miller–Rabin test [109] in HOL4. Stüwe and Eberl [185] formalized Miller–Rabin and some other related tests (Fermat and Solovay–Strassen) in Isabelle. In all cases, what was shown is that a prime is always correctly classified as prime and that a composite is correctly classified with probability at least \(\frac{1}{2}\). The running time analysis is not particularly interesting for these algorithms, and although they are randomized algorithms, the randomness is of a very simple nature and thus not very interesting either.

Beyond the primality-testing algorithms in CLRS, Chan [40] gave a HOL4 formalization of the correctness and polynomial running time of the AKS algorithm, which was the first deterministic primality test proven to run in polynomial time.

9.5 String Matching

The Knuth–Morris–Pratt algorithm was verified by Filliâtre in Coq and Why3 [67, 68]. Hellauer and Lammich [99] verified a functional version of this algorithm and refined it to Imperative-HOL. Lammich [118] synthesized verified LLVM code. The Boyer–Moore string searching algorithm [33] was covered in the first edition of CLRS. Boyer and Moore [34] (Nqthm) and Moore and Martinez [137] (ACL2) verified different variants of this algorithm; Toibazarov  [190] verified the preprocessing phase of the variant considered by Moore and Martinez. Besta and Stomp [26] (PVS) verified the preprocessing phase of the original algorithm.

9.6 Computational Geometry

Convex hull algorithms have been popular verification targets: Pichardie and Bertot [166] (Coq) verified an incremental and a package wrapping algorithm, Meikle and Fleuriot [134] verified Graham’s scan and Brun et al. [37] (Coq) verified an incremental algorithm based on hypermaps. Dufourd and Bertot [24, 53] (Coq) verified triangulation algorithms based on hypermaps.

Rau and Nipkow [173] verified the divide-and-conquer closest pair of points algorithm and obtained a competitive implementation.

9.7 Approximation and Online Algorithms

Stucke [184] (Coq and Isabelle) verified an approximation algorithm for vertex colouring in relation algebra. Eßmann et al. [66] verified three classic algorithms and one lesser-known one for vertex cover, independent set, load balancing and bin packing. Haslbeck and Nipkow [98] formalized online algorithms and verified several deterministic and randomized algorithms for the list update problem.

9.8 Randomized Algorithms

In addition to the randomized algorithms from CLRS, we will also list some from the classic textbook Randomized Algorithms by Motwani and Raghavan  [139]. All work was done using PMFs unless stated otherwise (refer to Sect. 2.1 for a discussion of the various approaches).

The first work on a non-trivial randomized algorithm in an ITP was probably Hurd’s  [109] previously-mentioned formalization of the Miller–Rabin primality test in HOL (using an infinite stream of random bits to encode the randomness). The primality tests formalized by Stüwe and Eberl  [185] are technically also randomized algorithms, but the probabilistic content is very small.

The expected running time of the coupon collector problem was treated by Kaminski et al.   [110] using their Hoare-style calculus for the pGCL language (on paper). Hölzl  [105] formalized their approach in Isabelle.

Barthe et al. analyzed several probabilistic textbook problems using a program logic called Ellora  [19], which is embedded into the EasyCrypt system  [18]:

  • expected running time of the coupon collector problem

  • tail bounds on the running time of Boolean hypercube routing

  • probability of incorrectly classifying two different polynomials as equal in probabilistic polynomial equality checking

The correctness of CLRS’s Randomize-In-Place, also known as the Fisher–Yates shuffle, was verified by Eberl [54].

The correctness and expected running time of randomized quicksort was formalized by Van der Weegen and McKinna [193] (Coq) using their “decision tree” approach mentioned earlier and by Eberl [58, 61]. Both additionally treated the case of average-case deterministic quicksort: Van der Weegen and McKinna proved that its expected running time is \(O(n \log n)\), whereas Eberl additionally proved that it has exactly the same distribution as randomized quicksort.

Eberl  [55, 61] proved that random binary search trees (BSTs into which elements are inserted in random order) have logarithmic expected height and internal path length. He also proved that the distribution of the internal path length is precisely the same as that of the running time of randomized quicksort.

Haslbeck et al.   [61, 94] formalized randomized treaps  [178] and proved that their distribution is precisely equal to that of random BSTs, regardless of which order the elements are inserted. The analysis is particularly noteworthy because it involves continuous distributions of trees, which require a non-trivial amount of measure theory.

Haslbeck and Eberl  [93] also defined skip lists  [171] and formally analyzed two of the most algorithmically interesting questions about them, namely the expected height and the expected path length to an element.

Tassarotti and Harper  [187] developed a general cookbook-style method for proving tail bounds on probabilistic divide-and-conquer algorithms in Coq. They applied this method to the running time of randomized quicksort and the height of random BSTs. Later  [188] they used a hybrid approach that combines a program logic for a deeply embedded imperative language with high-level reasoning in Coq to analyze skip lists (restricted to two levels for simplicity).