
1 Introduction

In the summer of 2017, the Newton Institute at Cambridge held a programme entitled Big Proof (BPR) “directed at the challenges of bringing proof technology into mainstream mathematical practice”. It was held in recognition of the formalisations that had already been done (which were indeed big). The programme webpage (Footnote 1) specifically lists the proofs of the Kepler conjecture [19], the odd order theorem [17] and the four colour theorem [16]. That summer also saw the start of my ERC project, ALEXANDRIA. Big Proof represented an acknowledgement that the formalisation of mathematics could no longer be ignored, but also an assertion that big problems remained to be solved. These included “novel pragmatic foundations”, large-scale “formal mathematical libraries” and “inference engines”, and the “curation” of formalised mathematical knowledge.

ALEXANDRIA was conceived in part to try to identify those big problems. By hiring professional mathematicians and asking them to formalise advanced mathematics, we would get a direct idea of the obstacles they faced. We would also try to refine our tools, extend our libraries and investigate other technologies. We would have only five years (extended to six due to COVID-19).

The need for formalisation had been stressed by Vladimir Voevodsky, a Fields medallist, who pointedly asked “And who would ensure that I did not forget something and did not make a mistake, if even the mistakes in much more simple arguments take years to uncover?” [38]. He advocated a new sort of formalism, homotopy type theory, which was the subject of much excitement. However, the most impressive formalisations by that time had been done in Coq (four colour theorem, odd order theorem), HOL Light (Kepler conjecture and much else) or Isabelle/HOL (part of the Kepler proof, and more). Lean, a newcomer, was attracting a user community. Perhaps our project would shed light on the respective values of the available formalisms: calculus of constructions (Coq, Lean), higher-order logic or homotopy type theory. Voevodsky would never find out, due to his untimely death in September 2017.

Since then, research into the formalisation of mathematics has forged ahead. Kevin Buzzard, a number theorist at Imperial College London, followed some of the Big Proof talks online. This resulted in his adoption of Lean for his Xena Project, with the aim of attracting students to formalisation (Footnote 2). Xena has had a huge impact, but here I’d like to focus on the work done within ALEXANDRIA.

2 A Brief Prehistory of the Formalisation of Mathematics

Mathematics is a work of the imagination, and the struggle between intuition and rigour has gone on since classical times. Euclid’s great contribution to Greek geometry was the unification of many separate schools through his system of axioms and postulates. Newton and Leibniz revolutionised mathematics, but their introduction of infinitesimals was problematic. During the 19th century, the “arithmetisation of analysis” carried out by Cauchy and Weierstrass replaced infinitesimals by rigorous \(\epsilon \)–\(\delta \) arguments. (We would not get a consistent theory of infinitesimals until the 1960s, under the banner of non-standard analysis.) Dedekind and Cantor promulgated a radical new understanding of sets and functions, which turned out to be inconsistent until Zermelo came up with his axioms. It is notable that Zermelo set theory (which includes the axiom of choice but lacks Fraenkel’s replacement axiom) is approximately equal in logical strength to higher-order logic.

Only axiomatic mathematics can be formalised. The first attempt was by Frege, whose work (contrary to common belief) was not significantly impacted by Russell’s paradox [1]. Russell and Whitehead in their Principia Mathematica [40] wrote out the proofs of thousands of mathematical propositions in a detailed axiomatic form. The work of Bourbaki can also be seen as a kind of formalised mathematics. The philosopher Hao Wang wrote on the topic and also coded the first automatic theorem prover [39] for first-order logic, based on what we would now recognise as a tableau calculus.

This takes us to NG de Bruijn, who in 1968 created AUTOMATH [5], and to his student’s formalisation [24] of Landau’s Foundations of Analysis in 1977. From there we come to the birth of Mizar [18], in which a truly impressive amount of mathematics was formalised in a remarkably readable notation. More recent history—analysis in HOL Light, the four colour theorem in Coq, etc.—is presumably familiar to readers. But it is appropriate to close this section with a prescient remark by de Bruijn back in 1968:

As to the question what part of mathematics can be written in AUTOMATH, it should first be remarked that we do not possess a workable definition of the word “mathematics”. Quite often a mathematician jumps from his mathematical language into a kind of metalanguage, obtains results there, and uses these results in his original context. It seems to be very hard to create a single language in which such things can be done without any restriction [4, p. 3].

And so we have two great scientific questions:

  • What sort of mathematics can be formalised?

  • What sort of proofs can be formalised?

We would investigate these questions—mostly in the context of Isabelle/HOL—by formalising as much mathematics as we could, covering as many different topics as possible. I expected to run into obstacles here and there, which would have to be recorded if they could not be overcome.

3 ALEXANDRIA: Warmup Formalisation Exercises

The ERC proposal called for hiring research mathematicians, who would bring their knowledge of mathematics as it was practised, along with their inexperience of Isabelle/HOL. Their role would be to formalise increasingly advanced mathematical material with the twin objectives of developing formalisation methodologies and identifying deficiencies that might be remedied by extending Isabelle/HOL somehow. The project started in September 2017. We hired Anthony Bordg and Angeliki Koutsoukou-Argyraki. A third postdoc was required to undertake any necessary Isabelle engineering, and Wenda Li was hired.

One of the tasks for the first year was simply to reorganise and consolidate the Isabelle/HOL analysis library, which had mostly been translated from HOL Light. But we were also supposed to conduct pilot studies. The team set to work enthusiastically, and already in the first year they created a number of impressive developments:

  • Irrational rapidly convergent series, formalising a 2002 proof by J. Hančl [20]

  • Projective geometry, including Hessenberg’s theorem and Desargues’s theorem

  • The theory of quantum computing (which identified a significant error in one of the main early papers)

  • Quaternions, octonions and several other small exercises

  • Effectively counting real and complex roots of polynomials, and the Budan-Fourier theorem [30, 31]

  • The first formal proof that every field contains an algebraically closed extension [37]

Koutsoukou-Argyraki wrote up her reactions to Isabelle/HOL from the perspective of a mathematician in her paper “Formalising Mathematics – in Praxis” [25].

4 Advanced Formalisations

As noted above, Kevin Buzzard had taken an interest in formalisation through participation in Big Proof, and by 2019 had marshalled large numbers of enthusiastic students to formalise mathematics using Lean. He had also made trenchant criticisms of even the most impressive prior achievements: most of the work concerned simple objects such as finite groups, or was just 19th-century mathematics. Nobody seemed to be working with sophisticated objects. He expressed astonishment that Grothendieck schemes—fundamental objects in algebraic geometry and number theory—had not been formalised in any tool. His criticisms helped focus our attention on the need to tackle difficult, recent and deep mathematics. Team members proposed their own tasks, but we also contributed to one another’s tasks, sometimes with the help of interns or students. We completed three notable projects during this middle period:

  • Irrationality and transcendence criteria for infinite series [27], extending the Hančl work mentioned above with material from two more papers: Erdős–Straus [13] and Hančl–Rucki [21].

  • Ordinal partition theory [9]: infinite forms of Ramsey’s theorem, but for order types rather than cardinals. We formalised papers by Erdős–Milner [14] and Larson [29], and, as a preliminary, the Nash-Williams partition theorem [36]. These were deep results in the context of Zermelo–Fraenkel set theory, involving highly intricate inductive constructions. One of the papers contained so many errors as to necessitate publishing a second paper [15] with a substantially different proof. This material was difficult even for Erdős!

  • Grothendieck Schemes [3]. Buzzard had formalised schemes in Lean [6] (three times), and even claimed that Isabelle was not up to the job due to its simple type system. We took up the challenge and found it straightforward, following a new approach based on locales to manage the deep hierarchies of definitions.

We were aiming for a special issue devoted to formalisation in the journal Experimental Mathematics, and were delighted to see these projects take up three of the six papers ultimately accepted.

5 Seriously Deep Formalisation Projects

Inspired by the success of the previous projects—conducted under the difficult circumstances of COVID-19 lockdown—team members continued to propose theorems to formalise, and we continued to collaborate in small groups. By now we had the confidence to take on almost anything. There are too many projects to describe in full, so let’s look at some of the highlights.

5.1 Szemerédi’s Regularity Lemma and Roth’s Theorem on Arithmetic Progressions

Szemerédi’s regularity lemma is a fundamental result in extremal graph theory. It concerns a property called the edge density of two given sets of vertices \(X, Y\subseteq V(G)\), and a further property of \((X, Y)\) being an \(\epsilon \)-regular pair for any given \(\epsilon >0\). The lemma itself states that for a given \(\epsilon >0\) there exists some M such that every graph has an \(\epsilon \)-regular partition of its vertex set into at most M parts. Intuitively, \((X, Y)\) is an \(\epsilon \)-regular pair if the density of edges between various subsets \(A\subseteq X\) and \(B\subseteq Y\) is more or less the same for all possible A and B; an \(\epsilon \)-regular partition enjoys that property for all but an insignificant number of pairs \((X, Y)\) of vertex sets taken from the partition. Intuitively then, the vertices of any graph can be partitioned into at most M parts such that the edges between the various parts are uniform in this sense.
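
To make these notions precise, here is one standard phrasing (the formalised version may differ in minor details, such as strict versus non-strict inequalities). Writing \(e(X,Y)\) for the number of edges with one endpoint in X and the other in Y, the edge density is

$$\begin{aligned} d(X,Y) = \frac{e(X,Y)}{|X|\,|Y|}, \end{aligned}$$

and \((X, Y)\) is an \(\epsilon \)-regular pair if \(|d(A,B)-d(X,Y)|\le \epsilon \) for all \(A\subseteq X\) and \(B\subseteq Y\) with \(|A|\ge \epsilon |X|\) and \(|B|\ge \epsilon |Y|\).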

We used Szemerédi’s regularity lemma to prove Roth’s theorem on arithmetic progressions, which states that every “sufficiently dense” set of natural numbers contains three elements of the form k, \(k+d\), \(k+2d\) with \(d>0\).
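
In precise terms (this is the classical statement; the formalised version may be phrased somewhat differently): if \(A\subseteq \mathbb {N}\) has positive upper density, that is,

$$\begin{aligned} \limsup _{N\rightarrow \infty } \frac{|A\cap \{1,\ldots ,N\}|}{N} > 0, \end{aligned}$$

then A contains an arithmetic progression k, \(k+d\), \(k+2d\) with \(d>0\).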

We used a variety of source materials and discovered a good many significant infelicities in the definitions and proofs. These included confusion between \(\subset \) and \(\subseteq \) (which are often synonymous in combinatorics) and between a number of variants of the lemma statement. One minor claim was flatly incorrect. To make matters worse, the significance of these issues only became clear in the application of the regularity lemma to Roth’s theorem. Much time was wasted, and yet the entire formalisation project [10] took under six months (Footnote 3). By a remarkable coincidence, a group based in the mathematics department at Cambridge formalised a slightly different version of Szemerédi’s regularity lemma, using Lean, around the same time [8].

5.2 Additive Combinatorics

Let A and B be finite subsets of a given abelian group \((G,{+})\), and define their sumset as

$$\begin{aligned} A+B = \{a+b:a\in A, b\in B\}. \end{aligned}$$

Write nA for the n-fold iterated sumset \(A+\cdots +A\). Additive combinatorics concerns itself with such matters as the relationship between the cardinality of \(A+B\) and other properties of A and B. Angeliki proposed this field as the natural successor to the formalisation of Szemerédi’s regularity lemma because it’s fairly recent (many results are less than 50 years old) and significant (providing a route to Szemerédi’s theorem, a much stronger version of the Roth result mentioned above).

Here’s an overview of the results formalised, all within the 7-month period from April to November 2022:

  • The Plünnecke–Ruzsa inequality: yields an upper bound on the cardinality of the iterated difference set \(mB-nB\) (stated precisely after this list)

  • Khovanskii’s theorem: for any finite \(A\subseteq G\), the cardinality of nA is given exactly by a polynomial in n for all sufficiently large n.

  • The Balog-Szemerédi-Gowers theorem is a deep result bearing on Szemerédi’s theorem. The formalisation combines additive combinatorics with extremal graph theory and probability [26].

  • Kneser’s theorem and the Cauchy-Davenport theorem yield lower bounds for the size of \(A+B\).
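
To give the flavour of these results, here is the usual statement of the Plünnecke–Ruzsa inequality (the formalised statement may differ in minor details): if A and B are finite nonempty subsets of an abelian group and \(|A+B|\le K|A|\), then

$$\begin{aligned} |mB - nB| \le K^{m+n}|A| \end{aligned}$$

for all natural numbers m and n.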

These are highly significant results by leading mathematicians. They can all be found in Isabelle’s Archive of Formal Proofs (AFP) (Footnote 4).

5.3 Other Formalisation Projects

Team members chose a range of large and small projects, each with its own specific objectives:

  • Combinatorial structures. This is the PhD project of Chelsea Edmonds, who has used Isabelle’s locale system to formalise dozens of varieties of block designs, hypergraphs, graphs and the relationships among them [11]. Results proved include Fisher’s inequality [12].

  • Number theory. We have formalised several chapters of Modular Functions and Dirichlet Series in Number Theory, a graduate textbook by Tom M. Apostol.

  • Wetzel’s problem is a fascinating small example, due to Erdős, where the answer to a question concerning complex analysis depends on the truth or falsity of the continuum hypothesis. The formal proof illustrates analysis and axiomatic set theory smoothly combined into a single argument [33].

  • Turán’s graph theorem: among all graphs on a given number of vertices containing no clique of a given size, the Turán graph has the maximum number of edges. This was a Master’s student project.

This is a partial list, especially as regards contributions from interns, students and other visitors.

5.4 On Legibility of Formal Proofs

A proof is an argument, based on logical reasoning from agreed assumptions, that convinces mathematicians that a claim is true. How then do we understand a computer proof? To follow the analogy strictly, a computer proof convinces computers that a claim is true. But computers, even in this age of clever chatbots, are not sentient. We need to convince mathematicians.

Of the early efforts at the formalisation of mathematics, only Mizar aimed for legibility. Even pre-computer formal proofs such as Principia Mathematica are unreadable. Isabelle’s proof language (Isar) follows the Mizar tradition, as in the following example:

[Figure: an Isabelle/Isar structured proof that the derivative of a certain summation is a certain other summation]

Only a little training is required to make some sense of this. The lemma claims that the derivative of a certain summation equals a certain other summation. The proof refers to the variables ?f and ?g, which are defined by the pattern provided in the lemma statement: ?f denotes the original summation, and we prove that ?g is its derivative. Within that proof we can see summations being manipulated through changes of variable. Since we can see these details of the reasoning, we have reason to believe that the proof is indeed correct: we do not simply have to trust the computer.

Not all Isabelle proofs can be written in a structured style. Page-long formulas often arise when trying to verify program code, and sometimes just from expanding mathematical definitions. Then we must use the traditional tactic style: long sequences of proof commands. However, most mathematical proofs that humans can write go into the structured style with ease. We have aimed for maximum legibility in all our work.
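
As a minimal illustration of the structured style, here is a toy proof of my own (not taken from the project libraries), using standard monotonicity lemmas from Isabelle/HOL’s arithmetic library:

lemma square_mono:
  fixes m n :: nat
  assumes "m ≤ n"
  shows "m * m ≤ n * n"
proof -
  (* multiply both sides of the assumption by m on the left *)
  have "m * m ≤ m * n"
    using assms by (simp add: mult_le_mono2)
  (* multiply both sides of the assumption by n on the right *)
  also have "... ≤ n * n"
    using assms by (simp add: mult_le_mono1)
  finally show ?thesis .
qed

Each intermediate claim is visible and separately justified, which is what makes such proofs legible to a reader who does not run the proof assistant.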

6 Library Search and Machine Learning Experiments

The focus of this paper is achievements in the formalisation of mathematics, but the ALEXANDRIA proposal also called for investigating supporting technologies. The name of the project refers to the library of Alexandria, and Isabelle’s AFP already has nearly 4 million lines of proof text and well over 700 separate entries. How can we take advantage of all this material when developing new proofs?

In May 2019, the team acquired a new postdoc: Yiannos Stathopoulos. He came with the perfect background to tackle these objectives. After much labour, he and Angeliki produced the SErAPIS search engine (Footnote 5), which searches both the pre-installed Isabelle libraries and the AFP, offering a great many search strategies based on anything from simple keywords to abstract mathematical concepts [35]. It is not easy to determine the relevance or significance of a formal text to an abstract concept, but a variety of query types can be combined to explore the libraries.

Also mentioned in the proposal was the aim of Intelligent User Support. I had imagined that common patterns of proofs could be identified in the existing libraries and offered up to users, but with no idea how. To generate structured proofs automatically would require the ability to generate intermediate mathematical assertions. Six years of dramatic advances in machine learning have transformed our prospects. Language models can generate plausible texts given a corpus of existing texts. And as the texts we want would be inserted into Isabelle proofs, we can immediately check their correctness.

An enormous amount of work is underway, particularly by a student in our group, Albert Qiaochu Jiang, working alongside Wenda Li and others. It is now clear that language models can generate formal Isabelle proof skeletons [32] and can also be useful for identifying relevant lemmas [22]. We can even envisage automatic formalisation [23, 41]: translating informal proofs into formal languages, by machine. Autoformalisation is easier with a legible proof language like ours, because the formal proof can have the same overall structure as the given natural language proof; a project currently underway is to develop the Isabelle Parallel Corpus, pairing natural language and Isabelle texts (Footnote 6). The next few years should see solid gains through machine learning.

7 Evaluation

At the start of this paper, I listed two scientific questions: what sort of mathematics, and what sort of proofs, can be formalised? And the answer so far is, everything we attempted, and we attempted a great variety of mathematical topics: number theory, combinatorics, analysis, set theory. The main difficulties have been errors and omissions in proofs. A vignette illustrates this point. Chelsea was formalising a probabilistic argument where the authors wrote “these probabilities are clearly independent, and therefore the joint probability is obtained by multiplying them.” The problem is that this multiplication law is the mathematical definition of independent probabilities, which the authors had somehow confused with the real-world concept of unconnected random events. Frequently we have found proofs that are almost right: they need a bit of adjustment, but getting everything to fit takes effort.
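
To spell out the point of the vignette: events A and B are, by definition, independent precisely when

$$\begin{aligned} P(A\cap B) = P(A)\,P(B), \end{aligned}$$

so asserting independence and then deducing the product rule is circular unless the independence is established from the construction of the probability space itself.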

Effort remains the main obstacle to the use of verification tools by mathematicians. Obvious claims are often tiresome to prove, which is both discouraging and a waste of an expert’s time. But we might already advocate an approach of formalising the definitions and the proofs, stating the obvious claims without proofs (using the keyword sorry). Even for this idea to be feasible, much more library material is needed, covering at least all the definitions a mathematician might expect to have available.
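
In Isabelle such a deferred claim might look like the following (a toy illustration; this particular fact happens to be a library lemma already, but it stands in for the kind of “obvious” statement one might leave unproved):

lemma card_union_bound: "card (A ∪ B) ≤ card A + card B"
  sorry  (* accepted without proof so that the main development can proceed *)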

Another key scientific question is the role of dependent types. People in the type theory world seem to share the conviction that dependent types are necessary to formalise nontrivial mathematics. But in reality it seems to be Lean users who repeatedly fall foul of intensional equality: that \(i=j\) does not guarantee that T(i) is the same type as T(j). Falling foul of this can be fatal: the first definition of schemes had to be discarded for this reason. Intensional equality is adopted by almost all dependent type theories, including Coq and Agda: without it, type checking becomes undecidable. But with it, type dependence does not respect equality.

The main limitation of simple type theory is that axiomatic type classes are less powerful than they otherwise would be. Isabelle/HOL has type classes for groups, rings, topological spaces among much else, but they are not useful for defining the theories of groups, rings or topological spaces. Rather they allow us, for example, to define the quaternions, prove a dozen or so laws and immediately inherit entire libraries of algebraic and topological properties. Abstract groups, rings, etc., need to be declared with an explicit carrier set (logically, the same thing as a predicate) rather than using the corresponding type class. It’s a small price to pay for a working equality relation.
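
To make the contrast concrete, here is a simplified sketch of my own (not the actual library declarations): a type class takes a whole type as its carrier, whereas a locale can carry an explicit carrier set.

(* type-class style: the carrier is the entire type 'a *)
class my_semigroup = times +
  assumes mult_assoc: "(x * y) * z = x * (y * z)"

(* carrier-set style: the structure is a set G with an operation f *)
locale semigroup_on_set =
  fixes G :: "'a set" and f :: "'a ⇒ 'a ⇒ 'a"
  assumes closed: "x ∈ G ⟹ y ∈ G ⟹ f x y ∈ G"
      and assoc: "x ∈ G ⟹ y ∈ G ⟹ z ∈ G ⟹ f (f x y) z = f x (f y z)"

The first form gives concise statements and automatic inheritance of library facts for a fixed type; the second can express substructures such as subgroups and quotients, at the cost of carrying G around explicitly.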

Having said this, one must acknowledge the enormous progress made by the Lean community over roughly the same period, 2017–now. Lean users, inspired by Buzzard, have taken on hugely ambitious tasks. The most striking is probably the Liquid Tensor Experiment [7]: brand-new mathematics, by a Fields medallist (Peter Scholze) who was concerned about its correctness, formalised over about a year and a half. This one accomplishment, more than anything else, demonstrates that formalisation can already offer real value to professional mathematicians.

We have from time to time looked at type issues directly. De Vilhena [37] describes an interesting technique for defining the n-ary direct product of a finite list of groups, iterating the binary direct product; his trick to avoid type issues involves creating an isomorphism to a suitable type. However, one could avoid type issues here (and handle the infinite case) by defining the direct product of a family in its own right rather than piggybacking on the binary product. Anthony Bordg has done a lot of work on the right way to express mathematics without dependent types [2, 3]. Ongoing work, still unpublished, is exploring the potential of the types-to-sets framework [28] to allow a smooth transition between type-based and carrier-set based formalisations.

One can also compare formalisms in terms of their logical strength. Higher-order logic is somewhat weaker than Zermelo set theory, which is much weaker than ZFC, which in turn is much weaker than Tarski-Grothendieck set theory:

$$\begin{aligned} \textrm{HOL} < \textrm{Z} \ll \textrm{ZF} \ll \textrm{TG} \end{aligned}$$

The Calculus of Inductive Constructions, which is the formalism of Lean and Coq, is roughly equivalent to TG. The advantage of a weaker formalism is better automation. The power of ZF set theory, when it is required, can be obtained simply by loading the corresponding library from the AFP [33]. It’s highly likely that a similar library could be created for Tarski-Grothendieck. And yet, remarkably, everything we have tried to formalise, unless it refers explicitly to ZF, sits comfortably within HOL alone. Since HOL is essentially the formalism of Principia Mathematica [40], we can conclude that Whitehead and Russell were right all along.

The AFP entries contributed by the project authors are too many to list, but they can be consulted via the on-line author indices.

8 Conclusions

We set out to tackle serious mathematics with a combination of hope and trepidation. We were able to formalise everything we set out to formalise and were never forced to discard a development part way through. As Angeliki has pointed out, “we have formalised results by two Fields medalists (Roth and Gowers), an Abel prize winner (Szemerédi) and of course Erdős too!”

We’ve also seen impressive advances in search and language models to assist users in proof development. Although the effort required to formalise mathematical articles remains high, we can confidently predict that formalisation will be playing a significant role in mathematical research in the next few years.