
1 Introduction

Without exaggeration it can be said that Higman’s Lemma [15] is one of the most frequently proved theorems in Mathematical Logic and Theoretical Computer Science. The fascination of this theorem is due to the fact that it has various formulations and is of interest in different areas such as Proof Theory, Constructive Mathematics, Reverse Mathematics, and Term Rewriting, as we will briefly discuss further below.

Nash-Williams [19] gave a very concise classical proof using the so-called minimal bad sequence argument. In the following we briefly recall well-quasiorderings and sketch Nash-Williams’ proof.

Definition

A binary relation \(\preceq \) on a set A is a well-quasiorder (wqo) if (i) it is transitive and (ii) every infinite sequence in A is “good”, i.e., \(\forall _{(a_i)_{i<\omega }} \exists _{i,j}(i<j \wedge a_i \preceq a_j)\).

Let \(A^*\) denote the set of finite sequences (“words”) with elements in A. We call a word \([a_1,\dots ,a_n]\) embeddable (\(\preceq ^*\)) in \([b_1,\dots ,b_{m}]\) if there exists a strictly increasing map \(f:\{1,\dots ,n\}\rightarrow \{1,\dots ,m\}\) such that \(a_i \preceq b_{f(i)}\) for all \(i\in \{1,\dots ,n\}\).

Now Higman’s Lemma says

If \((A,\preceq )\) is a well-quasiorder, then so is \((A^*,\preceq ^*)\).

Nash-Williams’ proof proceeds as follows. That a bad sequence of words, i.e., a sequence that is not good, is impossible is basically a consequence of two facts: (a) for each bad sequence there exists a bad sequence which is smaller in a lexicographical sense, and (b) if there exists a bad sequence, then there also exists a minimal bad sequence with respect to this lexicographical order. We give the proof in more detail:

(1)

    In order to show “wqo \((A,\preceq )\) implies wqo \((A^*,\preceq ^*)\)” assume for contradiction that there is a bad sequence of words in \(A^*\).

(2)

    Among all infinite bad sequences of words we choose (using classical dependent choice) a minimal bad sequence, i.e., a sequence \((w_i)_{i<\omega }\), such that, for all n, \(w_0,\dots ,w_n\) starts an infinite bad sequence, but \(w_0,\dots ,w_{n-1},v\), where v is a proper initial segment of \(w_n\), does not.

(3)

Since \(w_i\ne [\!\;]\) for all i, we may write \(w_i = {a_i{*}v_i}\). By Ramsey’s theorem and the fact that our alphabet A is a well-quasiorder, there exists an infinite subsequence \(a_{\kappa _0} \preceq a_{\kappa _1} \preceq ~\cdots \) of the sequence \((a_i)_{i<\omega }\). This also determines a corresponding sequence \(w_0,\dots , w_{\kappa _0 - 1}, v_{\kappa _0}, v_{\kappa _1},\dots \).

(4)

    The sequence \(w_0,\dots , w_{\kappa _0 - 1}, v_{\kappa _0}, v_{\kappa _1},\dots \) must be bad (otherwise also \((w_i)_{i<\omega }\) would be good), but this contradicts the minimality in (2).

The computational content of Nash-Williams’ proof was first investigated by Murthy [17], by applying Friedman’s A-translation in the interactive theorem prover NuPRL to the classical proof. Murthy represented functions as relations to eliminate choice, and used second-order classical logic. However, due to the size of the translated proof and program, the resulting program could only be run on trivial input. In [29], the second author formalized Nash-Williams’ proof in the proof assistant Minlog, by applying a refined version of the A-translation and, in contrast to Murthy, not eliminating the axiom of classical dependent choice, but rather adding computational content to it using bar recursion. This resulted in a considerably smaller extracted program, but still with infeasible run-times due to the eager evaluation strategy of Minlog’s term language. Reasonable results have only been obtained recently, thanks to a Minlog extension which translates extracted terms to Haskell. Other formalizations of the classical proof include Herbelin’s formalization of Murthy’s A-translated proof in Coq [14] and Sternagel’s formalization of Nash-Williams’ proof in Isabelle [30], which also provides a proof of Kruskal’s theorem. However, [30] does not include the extraction of a program. Recently, Powell [21] applied Gödel’s Dialectica Interpretation to this proof. The interpretation yields a program, but no formalization has been provided so far.

In this paper, we aim at a constructive proof (without choice) which has the same underlying construction as Nash-Williams’ proof but allows us to directly read off the program. For a \(\{0,1\}\)-alphabet such a proof was given by Coquand and Fridlender [6, 7]. Here we provide a proof and a formalization for full Higman’s Lemma, and also discuss how this proof is related to other constructive proofs. The paper is organized as follows: We give a constructive reformulation of Nash-Williams’ proof in Sect. 2 and comment on its formalization in Sect. 3. In Sect. 4 we spell out the computational content of some of the proofs. Each time it comes in the form of a term (in an extension \(T^{+}\) of Gödel’s T) machine extracted from a formalization of the respective proof. We give an overview of existing formalizations of the Coquand/Fridlender proof at the end of Sect. 2 and add a comparison with other constructive proofs in Sect. 5.

2 A Constructive Reformulation of Nash-Williams’ Proof

The objective of this section is to present a constructive proof of Higman’s Lemma that uses the same combinatorial idea as Nash-Williams’ classical proof and generalizes the proof by Coquand and Fridlender. Such a proof (without formalization) has been given in [28]. However, if one is interested in the computational content, one has to reformulate this proof and change it at various places to make the computational content visible (see also the remark at the end of this section). We use an inductive characterization of a binary relation satisfying condition (ii) in the definition of a well-quasiorder; such relations have been called “almost full” in [33]. Our characterization is via a “bar” predicate which comes in two variants, one for the alphabet and one for words; see below for a definition. Thus, the statement we are going to prove is

$$\begin{aligned} \mathrm {BarA}_{\preceq }[\!\;]\rightarrow \mathrm {BarW}_{\preceq }[\!\;]. \end{aligned}$$

Throughout the whole paper we assume \(\preceq \) to be a binary relation on a set A which is decidable in the sense that it is given by a binary total function into the booleans; transitivity will not be needed. It suffices to let A be the set of natural numbers. Most of our notions will depend on the \(\preceq \)-relation. However, we usually suppress this dependence, since \(\preceq \) will be kept fixed most of the time.

Notation

We use

$$\begin{aligned} \begin{array}{ll} a,b, \dots &{}\text {for letters, i.e., elements of}\ A,\\ as , bs ,\dots &{}\text {for finite sequences of letters, i.e., elements of}\ A^*,\\ v,w, \dots &{}\text {for words, i.e., elements of}\ A^*,\\ vs , ws ,\dots &{}\text {for finite sequences of words, i.e., elements of}\ A^{**}. \end{array} \end{aligned}$$

Definition

(Higman embedding, inductive) The embedding relation \(\preceq ^*\) on \(A^*\) is defined inductively by the following axioms (written as rules):

$$\begin{aligned} \frac{\quad }{[\!\;]\preceq ^*[\!\;]} \qquad \frac{v \,\preceq ^*\,w}{v\,\preceq ^*\,{a{*}w}} \qquad \frac{a\preceq b \quad v\,\preceq ^*\,w}{{a{*}v} \,\preceq ^*\,{b{*}w}} \end{aligned}$$

where \({{*}}\) denotes the cons operation on lists.
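To make the definition concrete, here is a minimal Haskell sketch of the embedding relation as a boolean test; the names (emb, Letter, Word') are ours and not part of the Minlog formalization discussed later.

```haskell
type Letter = Int      -- it suffices to let the alphabet A be the natural numbers
type Word'  = [Letter] -- primed to avoid a clash with the Prelude's Word

-- emb leq v w decides v =<* w by the three rules above: either drop the
-- head of w (second rule) or match the heads (third rule).
emb :: (Letter -> Letter -> Bool) -> Word' -> Word' -> Bool
emb _   []    []    = True                           -- [] =<* []
emb leq v     (b:w) = emb leq v w
                      || case v of
                           (a:v') -> leq a b && emb leq v' w
                           []     -> False
emb _   (_:_) []    = False                          -- no letters left to match
```

For instance, emb (<=) [3,5] [6,9] evaluates to True, while emb (<=) [6,9] [3,5] evaluates to False.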

Definition

\(\mathrm {GoodA}_{}\, as \) expresses that a finite sequence \( as \) of letters is good; note that finite sequences grow to the left, i.e., a finite sequence is good if there are two elements such that the one to the left is greater than or equal (w.r.t. \(\preceq \)) to the one on the right. A sequence is called bad if it is not good. Furthermore, we use

$$\begin{aligned}\begin{array}{ll} \mathrm {Ge}_{\exists }(a, a s ) &{}:=\exists _{i<| a s |} a \succeq ( a s )_i,\\ \mathrm {Ge}_{\forall }(a, a s ) &{}:=\forall _{i<| a s |} a \succeq ( a s )_i,\\ \mathrm {Ge}_{\exists \forall }(a, w s ) &{}:=\exists _{i<| w s |} \forall _{j<|( w s )_i|} a \succeq ( w s )_{i,j}.\end{array} \end{aligned}$$

A finite sequence \( as =[a_{n-1},\dots ,a_0]\) is decreasing if \(a_j \succeq a_i\) whenever \(j \ge i\). Further, \(\mathrm {BSeq}_{}\, as \) determines the “first” bad subsequence occurring in \( as \):

$$\begin{aligned}\begin{array}{ll} \mathrm {BSeq}_{}\, [\!\;]&{}:=[\!\;], \\ \mathrm {BSeq}_{} ({a{*} a s }) &{}:={\left\{ \begin{array}{ll} {a{*}\mathrm {BSeq}_{} \, a s } &{}\text {if}\ \lnot \mathrm {Ge}_{\exists } (a, a s ),\\ \mathrm {BSeq}_{} \, a s &{}\text {otherwise}. \end{array}\right. }\end{array} \end{aligned}$$
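In Haskell, reusing Letter from the sketch above (again with our own names), these tests and BSeq read as follows; note that \(a \succeq b\) is tested as leq b a.

```haskell
-- Ge_exists: a is >= some letter in as; Ge_forall: a is >= every letter.
geEx, geAll :: (Letter -> Letter -> Bool) -> Letter -> [Letter] -> Bool
geEx  leq a = any (\b -> leq b a)
geAll leq a = all (\b -> leq b a)

-- BSeq keeps a letter only if it does not dominate any letter to its right,
-- yielding the "first" bad subsequence (lists grow to the left).
bseq :: (Letter -> Letter -> Bool) -> [Letter] -> [Letter]
bseq _   []     = []
bseq leq (a:as)
  | geEx leq a as = bseq leq as
  | otherwise     = a : bseq leq as
```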

Definition

We inductively define a set \(\mathrm {BarA}_{} \subseteq A^*\) by the following rules:

$$\begin{aligned} \frac{\mathrm {GoodA}_{}\, a s }{\mathrm {BarA}_{}\, a s } \qquad \qquad \frac{\forall _a\,\mathrm {BarA}_{}\, {{a{*} a s }}}{\mathrm {BarA}_{}\, a s }. \end{aligned}$$

\(\mathrm {BarW}_{}\, ws \) is defined similarly, using the corresponding predicate \(\mathrm {GoodW}_{}\, ws \). However, since \(\mathrm {GoodW}_{}\) is a predicate on finite sequences of words, it refers to the embedding relation \(\preceq ^*\) on \(A^*\) rather than to \(\preceq \) directly.

As, in the end, we are interested in a program that for any infinite sequence of words yields witnesses that this sequence is good, we also prove the following.

Proposition

(BarWToGoodInit) \(\mathrm {BarW}_{}{[\!\;]}\) implies that every infinite sequence of words has a good initial segment.

Proof

Let f be a variable of type nat=>list nat. We show, more generally,

$$\begin{aligned} \forall _{ w s ,f,n}(\mathrm {BarW}_{}\ w s \rightarrow \mathrm {Rev}(\bar{f}n)= w s \rightarrow \exists _m \mathrm {GoodW}_{} (\mathrm {Rev}(\bar{f}m))) \end{aligned}$$

by induction on \(\mathrm {BarW}_{}\). The proposition then follows with \( w s =[\!\;]\).

1.

\(\mathrm {GoodW}_{}\ ws \). Assume that there are an infinite sequence f and a number n such that \(\mathrm {Rev}(\bar{f}n)= ws \) (i.e., \([f(n-1),\dots , f\,0] = ws \)). Since \( ws \) is good, we can take m to be n.

2.

    Using the induction hypothesis

    $$\begin{aligned} \forall _{w,f,n}(\mathrm {Rev}(\bar{f}n) = {w{*} w s }\rightarrow \exists _m \mathrm {GoodW}_{} (\mathrm {Rev}(\bar{f}m))) \end{aligned}$$

with \(f\,n\), f and \(n+1\), we only have to prove \(\mathrm {Rev}(\bar{f}(n+1)) = {fn{*} ws }\), which follows from \(\mathrm {Rev}(\bar{f}n)= ws \).   \(\square \)

Note that the reverse direction expresses a form of bar induction. However, for the proof below the present direction suffices.

In the following we first want to highlight the idea behind the constructive proof. This is best done by showing how the steps (1)–(4) in the proof of Nash-Williams given in the introduction are dealt with in the inductive proof.

(1)

    Prove inductively “\(\mathrm {BarA}_{}\, [\!\;]\rightarrow \mathrm {BarW}_{} [\!\;]\)”.

(2)

    The minimality argument will be replaced by structural induction on words.

(3)

Given a sequence \( ws =[w_n,\dots ,w_0]\) s.t. \(w_i={a_i{*}v_i}\), we are interested in all decreasing subsequences \([a_{\kappa _l}, \dots , a_{\kappa _0}]\) of maximal length and their corresponding sequences \( v_{\kappa _l}, \dots , v_{\kappa _0}, w_{\kappa _0 - 1}, \dots , w_0\). The sequences \([a_{\kappa _l},\dots ,a_{\kappa _0}]\) form a forest. In the proof these sequences will be computed by the procedure \(\mathrm {Forest}_{}\), which takes \( ws \) as input and yields a forest labeled by pairs in \(A^{**}\times A^*\). In the produced forest the right-hand component of each node forms such a descending subsequence \([a_{\kappa _i},\dots , a_{\kappa _0}]\), and the corresponding left-hand component consists of the sequence \([v_{\kappa _i},\dots , v_{\kappa _0},w_{\kappa _0 - 1},\dots , w_0]\). If we extend the sequence \( ws \) to the left by a word \({a{*}v} \), then either new nodes are inserted into the existing forest, possibly at several places, or a new singleton tree with root node \({\langle {{v{*} ws }},{[a]}\rangle }\) is added. Now the informal idea of the inductive proof is: if in \(\mathrm {Forest}_{}\, ws \) new nodes cannot be inserted infinitely often (without ending up with a good left-hand component in a node) and if also new trees cannot be added infinitely often, then \( ws \) cannot be extended badly infinitely often. Formally, this will be captured by the statement:

    $$\begin{aligned} \forall _{ w s }(\mathrm {BarW}_{} {(\mathrm {BSeq}_{}(\mathrm {Heads}\, w s ))} \rightarrow \mathrm {BarF}_{}{(\mathrm {Forest}_{}\ w s )} \rightarrow \mathrm {BarW}_{}\ w s ). \end{aligned}$$
(4)

The first part of item (4) corresponds to the lemma GoodWProjForestToGoodW below.

Definition

For a finite sequence \( w s \) of words let \(\mathrm {Heads}\, w s \) denote the finite sequence consisting of the starting letters of the non-empty words. We call a finite sequence \( w s \) of words admissible (\(\mathrm {Adm}\, w s \)) if each word in \( w s \) is non-empty.

Notation

We use t for elements in \(T(A^{**}\times A^*)\), i.e., trees labeled by pairs in \(A^{**}\times A^*\), and \( t s , s s \) for elements in \((T(A^{**}\times A^*))^*\), i.e., forests. The tree with root \({\langle { w s },{ a s }\rangle }\) and list of subtrees \( t s \) is written \({\langle { w s },{ a s }\rangle } t s \). We use the destructors \(\mathrm {Left}\, \) and \(\mathrm {Right}\, \) for pairs and the destructors \(\mathrm {Root}\, \) and \(\mathrm {Subtrees}\, \) for trees. For better readability we set:

$$\begin{aligned}\begin{array}{ll} \mathrm {Newtree}\,{\langle { w s },{ a s }\rangle } &{}:= {\langle { w s },{ a s }\rangle } [\!\;], \\ \mathrm {Roots}\, [t_{n-1},\dots ,t_0] &{}:= [\mathrm {Root}\, t_{n-1},\dots ,\mathrm {Root}\, t_0],\\ \mathrm {Lefts}\, [{\langle { v s _{n-1}},{ a s _{n-1}}\rangle },\dots ,{\langle { v s _0},{ a s _0}\rangle }]&{}:= [ v s _{n-1},\dots , v s _0],\\ \mathrm {Rights}\, [{\langle { v s _{n-1}},{ a s _{n-1}}\rangle },\dots ,{\langle { v s _0},{ a s _0}\rangle }]&{}:= [ a s _{n-1},\dots , a s _0]. \end{array} \end{aligned}$$
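A Haskell rendering of these trees and destructors might look as follows (a sketch mirroring Minlog's lntree algebra; all names are ours):

```haskell
-- Finitely branching trees with labels of type a.
data LNTree a = Node a [LNTree a] deriving Show

type Label  = ([Word'], [Letter])   -- left-hand (A**) and right-hand (A*) component
type Tree   = LNTree Label
type Forest = [Tree]

newtree :: Label -> Tree
newtree p = Node p []               -- Newtree <ws,as> := <ws,as>[]

root :: Tree -> Label
root (Node p _) = p

subtrees :: Tree -> [Tree]
subtrees (Node _ ts) = ts

roots :: Forest -> [Label]
roots = map root

rights :: [Label] -> [[Letter]]
rights = map snd
```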

Definition

Let \( w s \in A^{**}\) be a sequence of words. Then \(\mathrm {Forest}_{}\, w s \in (T(A^{**}\times A^*))^*\) is recursively defined by

$$\begin{aligned}&\mathrm {Forest}_{}\, [\!\;]:=[\!\;], \\&\mathrm {Forest}_{}\, ({[\!\;]{*} ws }):=\mathrm {Forest}_{}\, ws , \\&\mathrm {Forest}_{}\, (({a{*}v}){*} ws ):={} \\&{\left\{ \begin{array}{ll} \mathrm {InsertF}_{}(\mathrm {Forest}_{}\, ws ,v,a) &{}\text {if}\ \mathrm {Ge}_{\exists }(a, \mathrm {BSeq}_{}(\mathrm {Heads}\, ws )), \\ {\mathrm {Newtree}\,{\langle {{v{*} ws }},{[a]}\rangle }{*}(\mathrm {Forest}_{}\, ws )} &{}\text {otherwise} \end{array}\right. } \end{aligned}$$

where

$$\begin{aligned} \mathrm {InsertF}_{}({ ts }, v, a) :=\mathrm {map} \left( \lambda _t\left[ \begin{array}{ll} \mathrm {InsertT}_{}( t, v, a) &{}\text {if}\ \mathrm {Ge}_{\forall }(a,\mathrm {Right}\, (\mathrm {Root}\, t)),\\ t &{}\text {otherwise} \end{array} \right] \right) \ ts \end{aligned}$$

and

$$\begin{aligned}&\mathrm {InsertT}_{}( {{\langle { vs },{ as '}\rangle } ts }, v, a) :={} \\&{\left\{ \begin{array}{ll} {\langle { vs },{ as '}\rangle }\,\mathrm {InsertF}_{}( ts , v, a) &{}\text {if}\ \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ts )), \\ {\langle { vs },{ as '}\rangle }({\mathrm {Newtree}\,{\langle {{v{*} vs }},{{a{*} as '}}\rangle }{*} ts }) &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
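Continuing the Haskell sketch from above (with \(\mathrm {Ge}_{\exists \forall }\) as geExAll and \(\mathrm {Heads}\) as headsOf, names again ours), the three definitions translate directly:

```haskell
-- Ge_exists-forall: a is >= every letter of some list in the given list of lists.
geExAll :: (Letter -> Letter -> Bool) -> Letter -> [[Letter]] -> Bool
geExAll leq a = any (geAll leq a)

-- Heads: the starting letters of the non-empty words.
headsOf :: [Word'] -> [Letter]
headsOf ws = [ a | (a:_) <- ws ]

forest :: (Letter -> Letter -> Bool) -> [Word'] -> Forest
forest _   []           = []
forest leq ([]    : ws) = forest leq ws
forest leq ((a:v) : ws)
  | geEx leq a (bseq leq (headsOf ws)) = insertF leq (forest leq ws) v a
  | otherwise = newtree (v : ws, [a]) : forest leq ws

insertF :: (Letter -> Letter -> Bool) -> Forest -> Word' -> Letter -> Forest
insertF leq ts v a =
  map (\t -> if geAll leq a (snd (root t)) then insertT leq t v a else t) ts

insertT :: (Letter -> Letter -> Bool) -> Tree -> Word' -> Letter -> Tree
insertT leq (Node (vs, as') ts) v a
  | geExAll leq a (rights (roots ts)) = Node (vs, as') (insertF leq ts v a)
  | otherwise = Node (vs, as') (newtree (v : vs, a : as') : ts)
```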

Example

Take as (almost full) relation the natural numbers with \(\le \). For better readability we use underlining rather than parentheses to indicate a list. We will only use one-digit numbers, hence every digit stands for a natural number. Then \(\mathrm {Forest}_{}\ [\underline{28}, \underline{421}, \underline{69}, \underline{35}]\) is

[forest diagram]

and \(\mathrm {Forest}_{}\ [\underline{52}, \underline{28}, \underline{421}, \underline{69}, \underline{35}]\) is

[forest diagram]

If we “project” each node to its right-hand side we obtain

[projected forest diagram]

The leaves of, e.g., the final tree are the maximal decreasing subsequences of the heads \(\underline{52463}\) of \([\underline{52}, \underline{28}, \underline{421}, \underline{69}, \underline{35}]\). Recall that the left-hand side of each leaf consists of the sequence \([v_{\kappa _i},\dots ,v_{\kappa _0},w_{\kappa _0-1},\dots ,w_0]\), and its right-hand side is the maximal descending subsequence \([a_{\kappa _i},\dots ,a_{\kappa _0}]\) of \([a_n,\dots ,a_0]\). In the example the leaf \(([\underline{2}, \underline{8}, \underline{421}, \underline{69}, \underline{35}],\underline{52})\) has exactly this form: \(\underline{52}\) is a maximal descending subsequence of \(\underline{52463}\), and we have \([\underline{52}, \underline{28}, \underline{421}, \underline{69}, \underline{35}] = [(5*\underline{2}), (2*\underline{8}), \underline{421}, \underline{69}, \underline{35}]\).
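With the sketches above, the example can be replayed directly; projF plays the role of the projection to right-hand sides (again our own names):

```haskell
-- Project each node of the forest to the right-hand side of its label.
projF :: Forest -> [LNTree [Letter]]
projF = map go where go (Node (_, as) ts) = Node as (map go ts)

exampleForest :: Forest
exampleForest = forest (<=) [[5,2], [2,8], [4,2,1], [6,9], [3,5]]

-- projF exampleForest carries the maximal decreasing subsequences of the
-- heads 5,2,4,6,3 at its leaves, e.g. [5,2] (lists growing to the left).
```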

Definition

Let \(t\in T(A^{**}\times A^*)\). Then t is a tree with a good leaf (\(\mathrm {GLT}_{}\, t\)) if there is a leaf with a good left side. We inductively define the predicate \(\mathrm {BarF}_{}\subseteq (T(A^{**}\times A^*))^* \) by the rules

$$\begin{aligned} \frac{\mathrm {GLT}_{} {( t s )_i}}{\mathrm {BarF}_{} t s } \quad \frac{\forall _{a,v}(\mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, t s )) \rightarrow \mathrm {BarF}_{}(\mathrm {InsertF}_{}( t s , v, a )))}{\mathrm {BarF}_{} t s } \end{aligned}$$

Lemma

(GoodWProjForestToGoodW, BSeqHeadsEqRhtsRootsForest)    

(a)

    \(\forall _{ w s , i}(i<\mathrm {Lh}\, w s \rightarrow \mathrm {GLT}_{}(\mathrm {Forest}_{}\ w s )_i \rightarrow \mathrm {GoodW}_{}\ w s )\).

(b)

    \(\forall _{ w s }(\mathrm {Adm}\, w s \rightarrow \mathrm {BSeq}_{}(\mathrm {Heads}\, w s )= \mathrm {Heads}(\mathrm {Rights}(\mathrm {Roots}\, (\mathrm {Forest}_{}\ w s ))))\).

Proof

Both parts follow from the construction of \(\mathrm {Forest}_{}\); the proof of (a) is rather laborious and involves a number of auxiliary notions. However, since we are mainly interested in computational content and this lemma has none, we do not give details.   \(\square \)

Lemma

(BarFNil, BarFAppd)    

(a)

    \(\mathrm {BarF}_{}[\!\;]\).

(b)

    \(\forall _{t, t s }( \mathrm {BarF}_{}{[t]} \rightarrow \mathrm {BarF}_{}{ t s } \rightarrow \mathrm {BarF}_{}{{t{*} t s }})\).

Proof

(a) \(\mathrm {BarF}_{}[\!\;]\) follows from the second rule of the definition of \(\mathrm {BarF}_{}{}\), using ex-falso-quodlibet.

(b) This assertion holds since \(\mathrm {InsertF}_{}\) is defined by a map operation. In more detail, using \(\#\) for the concatenation of two lists, we prove

$$\begin{aligned} \forall _{ t s }( \mathrm {BarF}_{}{ t s } \rightarrow \forall _{ s s }( \mathrm {BarF}_{}{ s s } \rightarrow \mathrm {BarF}_{}{ t s \# s s })) \end{aligned}$$

by induction on \(\mathrm {BarF}_{} { t s }\). The base case is straightforward as \(\mathrm {GLT}_{} ( t s )_i\) implies \(\mathrm {GLT}_{}{( t s \# s s )_i}\). In the step case we have

$$\begin{aligned} \mathrm {ih}_{1}:&\forall _{v,a}( \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, t s )) \rightarrow {} \\&\forall _{ s s }(\mathrm {BarF}_{} { s s } \rightarrow \mathrm {BarF}_{} (\mathrm {InsertF}_{}( t s , v, a)\# s s ))) \end{aligned}$$

and need to prove \(\forall _{ ss }( \mathrm {BarF}_{}{ ss } \rightarrow \mathrm {BarF}_{}{ ts \# ss })\). Fix a forest \( ss \) and use induction on \(\mathrm {BarF}_{} { ss }\). The base case again is easy since \(\mathrm {GLT}_{} ( ss )_i\) implies that there is j such that \(\mathrm {GLT}_{}{( ts \# ss )_j}\). In the step case we have

$$\begin{aligned} \mathrm {ih}_{2}:\forall _{v,a}( \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ss )) \rightarrow \mathrm {BarF}_{}( ts \,\#\, \mathrm {InsertF}_{}( ss , v, a))) \end{aligned}$$

as well as its “strengthening”

$$\begin{aligned} \mathrm {ih}_{2a} :\forall _{v,a}( \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, s s )) \rightarrow \mathrm {BarF}_{}(\mathrm {InsertF}_{}( s s , v, a))). \end{aligned}$$

To show \(\mathrm {BarF}_{} { ts \# ss }\), assume v, a with \(\mathrm {Ge}_{\exists \forall }( a, \mathrm {Rights}(\mathrm {Roots}\,( ts \# ss )))\) and show \(\mathrm {BarF}_{} (\mathrm {InsertF}_{}( ts \# ss , v, a))\).

Case 1. \(\lnot \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ts ))\), i.e., new nodes are only added to \( ss \); \( ts \) remains unchanged. Then \(\mathrm {BarF}_{}( ts \,\#\, \mathrm {InsertF}_{}( ss , v, a))\) follows by \(\mathrm {ih}_{2}\).

Case 2. \(\mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ts ))\). First assume that new nodes are added to both \( ts \) and \( ss \), i.e., \(\mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ss ))\). In this case we apply \(\mathrm {ih}_{1}\) with v, a and \(\mathrm {InsertF}_{}( ss , v, a)\). We still need to show \(\mathrm {BarF}_{} (\mathrm {InsertF}_{}( ss , v, a))\), which holds because of \(\mathrm {ih}_{2a}\).

Now assume \(\lnot \mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, s s ))\), i.e., new nodes are only added to \( t s \). In this case we apply \(\mathrm {ih}_{1}\) with va and \( s s \) where we use \(\mathrm {ih}_{2a}\) and the definition of \(\mathrm {BarF}_{}{}\) to obtain \(\mathrm {BarF}_{}{ s s }\).   \(\square \)

The next lemma tells us that a forest consisting of only one tree, in which we continue to insert new nodes by \(\mathrm {InsertF}_{}\) operations, eventually becomes good.

Lemma

(BarFNew) Assume \(\mathrm {BarA}_{}\, [\!\;]\). Then

$$\begin{aligned} \forall _{ w s _0}( \mathrm {BarW}_{}\ w s _0 \rightarrow \forall _{ a s _0} \mathrm {BarF}_{}{[\mathrm {Newtree}\,{\langle { w s _0},{ a s _0}\rangle }]}). \end{aligned}$$

Proof

Ind\(_1(\mathrm {BarW}_{})\). 1.1. \(\mathrm {GoodW}_{}\ w s _0\). Then \(\mathrm {GLT}_{} (\mathrm {Newtree}\,{\langle { w s _0},{ a s _0}\rangle })\), i.e., \(\mathrm {BarF}_{} [\mathrm {Newtree}\,{\langle { w s _0},{ a s _0}\rangle }]\). 1.2. Assume

$$\begin{aligned} \mathrm {ih}_{1}:\forall _{w, as } \mathrm {BarF}_{} {[\mathrm {Newtree}\,{\langle {w{*} ws _0},{ as }\rangle }]}. \end{aligned}$$

Let \( as _0\in A^*\). Instead of proving \(\mathrm {BarF}_{}{[\mathrm {Newtree}\,{\langle { ws _0},{ as _0}\rangle }]}\) directly we show more generally that this assertion holds for all t with \(\mathrm {Root}\, t={\langle { ws _0},{ as _0}\rangle }\) and (a) \(\mathrm {Subtrees}\, t\) in \(\mathrm {BarF}_{}{}\), and (b) \(\mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}(\mathrm {Subtrees}\, t)))}\) in \(\mathrm {BarA}_{}\). We do this by main induction on (b) and side induction on (a), i.e., we prove

$$\begin{aligned} \forall _{ a s }&(\mathrm {BarA}_{}\, a s \rightarrow \lnot \mathrm {GoodA}_{}\, a s \rightarrow {} \\ \forall _{ t s }&(\mathrm {BarF}_{} t s \rightarrow a s =\mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))} \rightarrow \mathrm {BarF}_{} {[{\langle { w s _0},{ a s _0}\rangle } t s ]})). \end{aligned}$$

Ind\(_2(\mathrm {BarA}_{})\). 2.1. \(\mathrm {GoodA}_{}\, a s \). Then the conclusion follows immediately by ex-falso-quodlibet with the premise \(\lnot \mathrm {GoodA}_{} a s \). 2.2. \(\mathrm {BarA}_{}\, a s \) is obtained by the second rule. We assume \( a s \) and

$$\begin{aligned} \mathrm {ih}_{2}:\forall _{a, t s }(\mathrm {BarF}_{} t s \,\rightarrow {a{*} a s }= \mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))} \rightarrow \mathrm {BarF}_{} {[{\langle { w s _0},{ a s _0}\rangle } t s ]}), \end{aligned}$$

and have to show

$$\begin{aligned} \forall _{ t s }( \mathrm {BarF}_{} t s \rightarrow a s =\mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))} \rightarrow \mathrm {BarF}_{} {[{\langle { w s _0},{ a s _0}\rangle } t s ]}). \end{aligned}$$

Ind\(_3(\mathrm {BarF}_{})\). 3.1. Assume \(\mathrm {GLT}_{}( ts )_i\) for some i. By the first clause of \(\mathrm {BarF}_{}\), for any t such that \(\mathrm {Subtrees}\, t = ts \), \(\mathrm {GLT}_{} ( ts )_i\) implies \(\mathrm {BarF}_{} [t]\). 3.2. Fix \( ts \) with \( as = \mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, ts ))}\) and assume the induction hypothesis

$$\begin{aligned} \mathrm {ih}_{3}:\forall _{v, a}&( \mathrm {Ge}_{\exists }( a, {\mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))}}) \rightarrow {} \\&a s = \mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}(\mathrm {InsertF}_{}( t s , v, a)) ))}\rightarrow {} \\&\mathrm {BarF}_{}{[{\langle { w s _0},{ a s _0}\rangle } {\mathrm {InsertF}_{}( t s , v, a)}]}) \end{aligned}$$

together with its strengthening

$$\begin{aligned} \mathrm {ih}_{3a}: \forall _{v, a}(\mathrm {Ge}_{\exists }( a, {\mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))}}) \rightarrow \mathrm {BarF}_{} (\mathrm {InsertF}_{}( t s , v, a))). \end{aligned}$$

To show \(\mathrm {BarF}_{} [{\langle { w s _0},{ a s _0}\rangle } t s ]\) we use the second clause, i.e., prove

$$\begin{aligned} \forall _{v, a}( \mathrm {Ge}_{\exists }( a, {\mathrm {Head}{[ a s _0]}}) \rightarrow \mathrm {BarF}_{} (\mathrm {InsertF}_{}( {[{\langle { w s _0},{ a s _0}\rangle } t s ]}, v, a))). \end{aligned}$$

We fix v and a with \(\mathrm {Ge}_{\exists }(a, a s _0)\) and prove the statement by case distinction on how a relates to \( a s \), i.e., whether nodes in the existing subtrees \( t s \) need to be inserted, or whether a new subtree has to be added.

Case 1. \(\mathrm {Ge}_{\exists }(a, a s )\). In this case we have

$$\begin{aligned} a s = \mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}\, t s ))} = \mathrm {Heads}{(\mathrm {Rights}(\mathrm {Roots}(\mathrm {InsertF}_{}( t s , v, a))))} \end{aligned}$$

and by applying \(\mathrm {ih}_{3}\) we obtain \(\mathrm {BarF}_{}{[{\langle { w s _0},{ a s _0}\rangle } {\mathrm {InsertF}_{}( t s , v, a)}]}\).

Case 2. \(\lnot \mathrm {Ge}_{\exists }(a, a s )\). In this case we need to show

$$\begin{aligned} \mathrm {BarF}_{} {[{\langle { ws _0},{ as _0}\rangle } {\mathrm {Newtree}\,{\langle {{v{*} ws _0}},{{a{*} as _0}}\rangle }{*} ts }]} \end{aligned}$$

which can be obtained by applying \(\mathrm {ih}_{2}\) to a and \({\mathrm {Newtree}\,{\langle {{v{*} ws _0}},{{a{*} as _0}}\rangle }{*} ts }\) provided we can show

$$\begin{aligned} \mathrm {BarF}_{} {({\mathrm {Newtree}\,{\langle {{v{*} ws _0}},{{a{*} as _0}}\rangle }{*} ts })}. \end{aligned}$$

This follows from \(\mathrm {BarF}_{}\, ts \) and \(\mathrm {BarF}_{}{[\mathrm {Newtree}\,{\langle {{v{*} ws _0}},{{a{*} as _0}}\rangle }]}\) via BarFAppd. The former holds by \(\mathrm {ih}_{3a}\), the latter follows by \(\mathrm {ih}_{1}\).

This completes the proof of the general assertion. Since \(\mathrm {BarA}_{}\,[\!\;]\) holds by assumption and \(\mathrm {BarF}_{}[\!\;]\) by BarFNil, we may put \( as =[\!\;]\) and \( ts =[\!\;]\) in the assertion and end up with \( \mathrm {BarW}_{}\ ws _0 \rightarrow \mathrm {BarF}_{}{[\mathrm {Newtree}\,{\langle { ws _0},{ as _0}\rangle }]}\).   \(\square \)

Theorem

(Higman) \(\mathrm {BarA}_{}\, [\!\;]\rightarrow \mathrm {BarW}_{} [\!\;]\).

Proof

Assume \(\mathrm {BarA}_{}\,[\!\;]\). We show more generally

$$\begin{aligned} \forall _{ a s }&(\mathrm {BarA}_{}\, a s \rightarrow \\ \forall _{ t s }&(\mathrm {BarF}_{} t s \rightarrow \\ \forall _{ w s }&(\mathrm {Adm}\, w s \rightarrow \mathrm {BSeq}_{}(\mathrm {Heads}\, w s )= a s \rightarrow \mathrm {Forest}_{}\ w s = t s \rightarrow \mathrm {BarW}_{}\ w s ))). \end{aligned}$$

Ind\(_1(\mathrm {BarA}_{})\). 1.1. \(\mathrm {GoodA}_{}\ a s \). Then, the result follows by ex-falso-quodlibet since for any \( w s \), \(\mathrm {BSeq}_{}(\mathrm {Heads}\, w s )\) is bad.

1.2. Let \( a s \in A^* \) and assume

$$\begin{aligned} \mathrm {ih}_{1} :\forall _{a, t s }&(\mathrm {BarF}_{} t s \rightarrow {} \\ \forall _{ w s }&(\mathrm {Adm}\, w s \rightarrow \mathrm {BSeq}_{}(\mathrm {Heads}\, w s ) = {a{*} a s }\rightarrow \mathrm {Forest}_{}\ w s {=} t s \rightarrow \mathrm {BarW}_{}\ w s )). \end{aligned}$$

Ind\(_2(\mathrm {BarF}_{})\). 2.1. \(\mathrm {GLT}_{} {( t s )_i}\). Then, by GoodWProjForestToGoodW, for any \( w s \) such that \(\mathrm {Forest}_{}\ w s = t s \) we obtain \(\mathrm {GoodW}_{}\ w s \) and hence \(\mathrm {BarW}_{}\ w s \). 2.2. Fix \( t s \) and assume

$$\begin{aligned} \mathrm {ih}_{2} :\forall _{v,a}&(\mathrm {Ge}_{\exists }( a, \mathrm {Heads}(\mathrm {Rights}(\mathrm {Roots}\, t s ))) \rightarrow \\ \forall _{ w s }&(\mathrm {Adm}\, w s \rightarrow \mathrm {BSeq}_{}(\mathrm {Heads}\, w s ) = a s \rightarrow \mathrm {Forest}_{}\ w s {=} \mathrm {InsertF}_{}( t s , v, a) \rightarrow {} \\&\mathrm {BarW}_{}\ w s )) \end{aligned}$$

as well as the strengthening of the induction hypothesis

$$\begin{aligned} \mathrm {ih}_{2a} :\forall _{v,a}(\mathrm {Ge}_{\exists }( a, \mathrm {Heads}(\mathrm {Rights}(\mathrm {Roots}\, t s ))) \rightarrow \mathrm {BarF}_{}(\mathrm {InsertF}_{}( t s , v, a))). \end{aligned}$$

Assume that we have \( w s \) such that \(\mathrm {BSeq}_{}(\mathrm {Heads}\, w s )= a s \) and \(\mathrm {Forest}_{}\ w s = t s \). In order to prove \(\mathrm {BarW}_{}\ w s \), we fix a word w and show \(\mathrm {BarW}_{}{{w{*} w s }}\) by induction on the structure of w:

Ind\(_3 (w)\). 3.1. \(\mathrm {BarW}_{}{{[\!\;]{*} w s }}\) holds since the empty word is embeddable in any word. 3.2. Assume that we have a word of form \({a{*}w}\). We show \(\mathrm {BarW}_{}{{({a{*}w}){*} w s }}\) by case analysis on whether or not \(\mathrm {Ge}_{\exists }( a, a s )\).

Case 1. \(\mathrm {Ge}_{\exists }( a, a s )\).

In this case, we have

$$\begin{aligned}\begin{array}{ll} \mathrm {BSeq}_{}(\mathrm {Heads}({({a{*}w}){*} w s })) &{}= a s ,\\ \mathrm {Forest}_{}{({({a{*}w}){*} w s })}&{}=\mathrm {InsertF}_{}( t s , w, a). \end{array} \end{aligned}$$

By BSeqHeadsEqRhtsRootsForest and the definition of \(\mathrm {Forest}_{}{({a{*}w}){*} w s }\), we know that at least one node has been inserted into \(\mathrm {Forest}_{}\ w s \). In this situation, we may apply \(\mathrm {ih}_{2}\) (to \(\mathrm {InsertF}_{}( t s , w, a)\) and \({({a{*}w}){*} w s }\)) and conclude \(\mathrm {BarW}_{}{{({a{*}w}){*} w s }}\).

Case 2. \(\lnot \mathrm {Ge}_{\exists }( a, a s )\). Then we have

$$\begin{aligned}\begin{array}{ll} \mathrm {BSeq}_{}(\mathrm {Heads}({({a{*}w}){*} ws })) &{}= {a{*} as },\\ \mathrm {Forest}_{}({({a{*}w}){*} ws }) &{}= {\mathrm {Newtree}\,{\langle {{w{*} ws }},{[a]}\rangle }{*} ts }.\end{array} \end{aligned}$$

By \(\mathrm {ih}_{2a}\) and \(\mathrm {ih}_{3}\), we have \(\mathrm {BarF}_{}\, ts \) and \(\mathrm {BarW}_{} ({w{*} ws })\). Hence, by BarFNew applied to \({w{*} ws }\) and \([a]\), we obtain \(\mathrm {BarF}_{}{[\mathrm {Newtree}\,{\langle {{w{*} ws }},{[a]}\rangle }]}\). By BarFAppd we may conclude

$$\begin{aligned} \mathrm {BarF}_{} {[{\mathrm {Newtree}\,{\langle {{w{*} w s }},{[a]}\rangle }{*} t s }]}. \end{aligned}$$

Now we are able to apply \(\mathrm {ih}_{1}\) (to \(a,\,{\mathrm {Newtree}\,{\langle {{w{*} w s }},{[a]}\rangle }{*} t s }\) and \({({a{*}w}){*} w s }\)) and end up with \(\mathrm {BarW}_{}{{({a{*}w}){*} w s }}\). This completes the proof of the general assertion.

Now, by putting \( as =[\!\;]\), \( ts =[\!\;]\) and \( ws =[\!\;]\), and using the fact that \(\mathrm {BarF}_{}[\!\;]\) always holds (by BarFNil), we obtain \(\mathrm {BarA}_{}\, [\!\;]\rightarrow \mathrm {BarW}_{} [\!\;]\).   \(\square \)

Remark

In order to make the computational content behind the inductive proof visible, it is essential to use a “positive” formulation of a well-quasiorder, that is, a definition using two rules, as was pointed out, e.g., in [10]. A proof of \(\mathrm {BarW}_{}\ ws \) then carries the information whether \(\mathrm {BarW}_{}\ ws \) was obtained by the first rule or by the second. In the first case the result can be read off; in the second we continue by looking at a proof of \(\mathrm {BarW}_{}({w{*} ws })\). If we used a definition consisting of only one rule, i.e., an \({\mathrm {acc}_{\preceq }}\)-notion as in [28], \(\mathrm {BarW}_{}\ ws \) would correspond to

$$\begin{aligned} \lnot \exists _i(i<\mathrm {Lh}\ ws \wedge ( ws )_i\ \text {embeds into}\ w) \rightarrow \mathrm {BarW}_{}{({w{*} ws })} \end{aligned}$$

where the test whether or not the premise holds results in a brute-force search; it is not given by the proof itself.

In the next section we discuss a formalization of this proof. For the special case of a \(\{0,1\}\)-alphabet there are formalizations in Agda (Fridlender), Minlog (Seisenberger), Isabelle (Berghofer, [3]) and Coq (Berghofer). The formalization of the general case is much more elaborate. Such a formalization has been given in Coq by Delobel (footnote 1). However, its computational content has not been extracted and investigated. It would suffer from the point made in the previous remark, and from its usage of the Set/Prop distinction (see footnote 2) in Coq. Here we want to demonstrate how to get hold of the computational content of a (non-trivial) proof by means of an extracted term, and that this term clearly represents the computationally relevant aspects of the underlying proof.

3 Formalization

Why should we formalize the rather clear proof given in the previous section? There is of course the obvious reason that we want to be sure that it is correct. However, in addition we might want to get hold of its computational content. We will present this content in the form of an “extracted term”, in (an extension \(T^{+}\) of) Gödel’s T. This term can be applied to another term representing an infinite sequence of words, and then evaluated (i.e., normalized). The normal form is a numeral determining a good finite initial segment of the input sequence.

When formalizing we of course need a theory (or formal system) where this is done. Now what features of such a theory are essential for our task? First of all, we have to be clear about (i) what “computational content” is, and (ii) where it arises. We use the Kleene-Kreisel concept of modified realizability for the former. In fact, we will have a formula expressing “the term t is a realizer for the formula A” inside our formal system. For the latter, we take it that computational content only arises from inductive predicates; prime examples are the \(\mathrm {Bar}\) predicates introduced in the previous section. But then a particular aspect becomes prominent: we need “non-computational” (n.c.) universal quantification [1], written \(\forall ^{\mathrm {nc}}\), to correctly express the type of a computational problem of the form

$$\begin{aligned} \forall ^{\mathrm {nc}}_{ a s }(\mathrm {BarA}_{}\, a s \rightarrow A). \end{aligned}$$

Its intended computational content is a function f mapping a witness that \( as \) is in \(\mathrm {BarA}_{}\) into a realizer of A. It is important that f does not get \( as \) as an argument (footnote 2).

On the more technical side, we use \(\mathrm {TCF}\) [26], a form of \(\mathrm {HA}^{\omega }\) extended by inductively defined predicates and n.c. logical connectives. \(\mathrm {TCF}\) has the (Scott-Ershov) partial continuous functions as its intended model.

It is also mandatory to use a proof assistant to help with the task of formalization. We use Minlog (footnote 3) [2], which is designed to support these features.

Space does not permit us to present the full formalization (footnote 4) of the constructive proof of Higman’s Lemma given above. We restrict ourselves to comments on some essential aspects.

Most important are of course the basic definitions of the data structures (free algebras) and predicates involved. Their formal definitions are very close to the informal ones above and do not need to be spelled out. However, already at this level computational content crops up: an inductive predicate may or may not have computational content. Examples of the former are the \(\mathrm {Bar}\) predicates, and of the latter the \(\mathrm {GoodA}_{}\) predicate. It is convenient to define the \(\mathrm {GoodA}_{}\) predicate inductively, but, since it is decidable, we can also view it as a (primitive) recursive boolean-valued function.

The first point in the proof above where we have to be careful with n.c. quantification is the inductive definition of \(\mathrm {BarA}_{}\), with clauses

$$\begin{aligned}&\mathtt {InitBarA} :\forall ^{\mathrm {nc}}_{\preceq , as }( \mathrm {GoodA}_{\preceq }\, as \rightarrow \mathrm {BarA}_{\preceq }\, as ),\\&\mathtt {GenBarA} :\forall ^{\mathrm {nc}}_{\preceq , as }( \forall _a\, \mathrm {BarA}_{\preceq } ({a{*} as }) \rightarrow \mathrm {BarA}_{\preceq }\, as ). \end{aligned}$$

The (free) algebra of witnesses for this inductive predicate is called \(\mathtt {treeA}\). In the clause \(\mathtt {GenBarA}\) the generation tree of \(\mathrm {BarA}_{\preceq } as\) should have infinitely many predecessors indexed by a, hence we need \(\forall _a\). However, the outside quantifier is \(\forall ^{\mathrm {nc}}_{\preceq , a s }\), since we do not want to let the argument \( a s \) be involved in the computational content of \(\mathrm {BarA}_{\preceq } a s \). Hence \(\mathtt {treeA}\) has constructors

$$\begin{aligned}&\mathtt {CInitBarA} :\mathtt {treeA},\\&\mathtt {CGenBarA} :\quad (\mathtt {nat}\Rightarrow \mathtt {treeA})\Rightarrow \mathtt {treeA}. \end{aligned}$$
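In Haskell such a witness algebra would correspond to the following data types (a hypothetical transcription, reusing Letter and Word' from the sketches in Sect. 2; treeW is included for comparison with Sect. 4):

```haskell
-- Witnesses for BarA: CGenBarA has one predecessor for every letter a.
data TreeA = CInitBarA
           | CGenBarA (Letter -> TreeA)

-- Witnesses for BarW branch over words instead of letters; in Sect. 4 the
-- constructors are called Leaf and Branch.
data TreeW = Leaf
           | Branch (Word' -> TreeW)
```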

A similar (but slightly more involved) comment applies to the inductive definition of \(\mathrm {BarF}_{}\). For readability we omit the dependency on \(\preceq \) here. The clauses are

$$\begin{aligned} \mathtt {InitBarF} :\forall ^{\mathrm {nc}}_{ t s ,i}(i<\mathrm {Lh}\, t s \rightarrow \mathrm {GLT}_{}\, ( t s )_i \rightarrow \mathrm {BarF}_{} t s ), \end{aligned}$$

and \(\mathtt {GenBarF} :\)

$$\begin{aligned}&\forall ^{\mathrm {nc}}_{ ts }(\forall _{ tas ,a,v}( tas =\mathrm {ProjF}\, ts \rightarrow \mathrm {Ge}_{\exists \forall }(a,\mathrm {Roots}\, tas ) \rightarrow \mathrm {BarF}_{}(\mathrm {InsertF}_{}( ts , v, a))) \\&\qquad \rightarrow \mathrm {BarF}_{}\, ts ). \end{aligned}$$

We need the concept of the “A-projection” of a tree t, where each right-hand side of a label in t is projected out. Here only the A-projection of \( ts \) (but not \( ts \) itself) is used computationally. More precisely, the predecessors of \(\mathrm {BarF}_{}\, ts \) are all \(\mathrm {InsertF}_{}( ts , v, a)\) for v, a with \(\mathrm {Ge}_{\exists \forall }(a, \mathrm {Rights}(\mathrm {Roots}\, ts ))\). To decide the latter, we need (computationally) \(\mathrm {Rights}(\mathrm {Roots}\, ts )\), i.e., the A-projection of \( ts \).

The (free) algebra of witnesses for the inductive predicate \(\mathrm {BarF}_{}\) is called \(\mathtt {treeF}\); its constructors are

$$\begin{aligned}&\mathtt {CInitBarF} :\mathtt {treeF},\\&\mathtt {CGenBarF} :\quad (\mathtt {list}\ \mathtt {lntree}\ \mathtt {nat}\Rightarrow \mathtt {nat}\Rightarrow \mathtt {list}\ \mathtt {nat}\Rightarrow \mathtt {treeF}) \Rightarrow \mathtt {treeF}. \end{aligned}$$

4 Extraction

We now spell out the computational content of some of the proofs above. Each time it comes in the form of a term (in \(T^{+}\)) machine extracted from a formalization of the respective proof.

When reading the extracted terms please note that lambda abstraction is displayed via square brackets; so [n]n+m means \(\lambda _n n+m\). Our notation \({\langle { w s },{ a s }\rangle } t s \) for the tree with root \({\langle { w s },{ a s }\rangle }\) and list of subtrees \( t s \) is displayed as (ws pair as)%ts. Also types are implicit in variable names; for example, n, a both range over natural numbers. One can also use the display string for a type as a variable name of this type; for example, treeW is a name for a variable of type treeW.

4.1 BarWToGoodInit

We use typed variable names

[figure a: typed variable names]

The term extracted from the proof of the proposition BarWToGoodInit is

[figure b: extracted term for BarWToGoodInit]

It takes some effort to understand such an extracted term. The recursion operator on treeW with value type alpha has type

[figure c: type of the recursion operator on treeW]

Let Leaf: treeW and Branch: (list nat=>treeW)=>treeW be the constructors of treeW. Then \(\Phi :=\) (Rec treeW=>alpha) is given by the recursion equations

$$\begin{aligned}\begin{array}{ll} \Phi (\mathtt {Leaf})&{}:=G,\\ \Phi (\mathtt {Branch}(g))&{} :=H(g, \lambda _v \Phi (g(v))).\end{array} \end{aligned}$$

Here the value type alpha is (nat=>list nat)=>nat=>nat, and

$$\begin{aligned}\begin{array}{ll} G &{}:=\lambda _{f,a} a,\\ H(\mathtt {gw},\mathtt {hwfa}) &{}:=\lambda _{f,a} \mathtt {hwfa}(f(a), f, a+1).\end{array} \end{aligned}$$
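As a sketch (assuming the TreeW type from the transcription in Sect. 3), this recursion becomes a short Haskell program:

```haskell
-- phi t f n walks the BarW witness t along the input sequence f, having
-- already consumed an initial segment of length n.
phi :: TreeW -> (Int -> Word') -> Int -> Int
phi Leaf       _ n = n                        -- G: the current segment is good
phi (Branch g) f n = phi (g (f n)) f (n + 1)  -- H: feed the next word f n

-- For a witness t of BarW [], phi t f 0 is the length of a good initial
-- segment of f.
```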

4.2 BarFNil, BarFAppd

For BarFNil we have the simple extracted term

[figure d: extracted term for BarFNil]

For BarFAppd we use the variable names

[figure e: variable names]

Then the extracted term is

[figure f: extracted term for BarFAppd]

The recursion operator on treeF with value type alpha has type

[figure g: type of the recursion operator on treeF]

LargerARExAl wqo a ws means \(\exists _{i<| ws |} \forall _{j<|( ws )_i|} a \succeq ( ws )_{i,j}\). treeF has constructors CInitBarF: treeF and CGenBarF: (list lntree nat=>nat=>list nat=>treeF)=>treeF. Then \(\Phi :=\) (Rec treeF=>alpha) is given by the recursion equations

$$\begin{aligned}\begin{array}{ll} \Phi (\mathtt {CInitBarF}) &{}:=G,\\ \Phi (\mathtt {CGenBarF}(g)) &{}:=H(g, \lambda _v \Phi (g(v))).\end{array} \end{aligned}$$

The value type of the first recursion is treeF=>nat=>treeF, and

$$\begin{aligned}\begin{array}{ll} G &{}:=\lambda _{\mathtt {treeF},a} \mathtt {CInitBarF},\\ H(g,\mathtt {htat}) &{}:=\lambda _{\mathtt {treeF0}} K(g, \mathtt {htat}, \mathtt {treeF0})\end{array} \end{aligned}$$

with \(K(g, \mathtt {htat}, \mathtt {treeF0})\) given by

[figure h: definition of K(g, htat, treeF0)]

The inner recursion is on treeF again, with value type nat=>treeF, and

$$\begin{aligned}\begin{array}{ll} G_1 &{}:=\lambda _a \mathtt {CInitBarF},\\ H_1(\mathtt {g0}, \mathtt {hat}) &{}:=\lambda _a \mathtt {CGenBarF} \dots \end{array} \end{aligned}$$

4.3 BarFNew

With the variable names

[figure i: variable names]

we extract

[figure j: extracted term for BarFNew]

This time we have three nested recursions: an outer one on treeW with value type list nat=>treeF, then on treeA with value type treeF=>treeF, and innermost on treeF with value type treeF. This corresponds to the three elimination axioms used in the proof. Notice that the computational content cBarFAppd of the theorem BarFAppd appears as a constant inside the term.

4.4 Higman

[figure k: extracted term for theorem Higman]

4.5 Experiments

To run the extracted terms we need to “animate” the theorems involved. This means that the constant denoting their computational content (e.g., cBarFAppd for the theorem BarFAppd) unfolds into the term extracted from the proof of the theorem. Then for an arbitrary infinite sequence extending e.g. the example in Sect. 2 we obtain the expected good initial segment.

In more detail, we first have to animate the computationally relevant propositions in the proof of Higman’s Lemma given above. Then we need to prove and animate lemmas relating to the particular relation NatLe:

$$\begin{aligned}\begin{array}{ll} \mathtt {BarNatLeAppdOne}:&{} \forall _{i,m, a s }(i+\mathrm {Lh}( a s )=m+1 \rightarrow \mathrm {BarA}_{\le }( a s \# [m])),\\ \mathtt {BarANilNatLe}:&{}\mathrm {BarA}_{\le }[],\\ \mathtt {HigmanNatLe}:&{}\mathrm {BarW}_{\le }[].\end{array} \end{aligned}$$

Using these we can prove the final proposition

$$\begin{aligned} \mathtt {GoodWInitNatLe}:\forall _f \exists _n \mathrm {GoodW}_{\le }(\mathrm {Rev}(\bar{f}n)). \end{aligned}$$

Let neterm be the result of normalizing the term extracted from this proof. Next we provide an infinite sequence (extending the example in Sect. 2), e.g. in the form of a program constant:

[figure l: program constant defining the infinite sequence]

Finally we run our normalized extracted term by evaluating

[figure m: evaluation expression]

(Here nt means “normalize term” and pt means “parse term”). The result is 4, the length of a good initial segment of our infinite sequence.
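For comparison, the goodness test itself can be sketched directly in Haskell, reusing emb from the sketch in Sect. 2; this brute-force search is exactly what the extracted program avoids, but it is handy for cross-checking small runs such as the one above.

```haskell
import Data.List (tails)

-- A finite sequence of words (growing to the left) is good if some earlier
-- word (further right) embeds into a later one (further left).
goodW :: (Letter -> Letter -> Bool) -> [Word'] -> Bool
goodW leq ws = or [ emb leq w' w | (w : rest) <- tails ws, w' <- rest ]

-- The least n such that Rev([f 0 .. f (n-1)]) is good.
goodInit :: (Letter -> Letter -> Bool) -> (Int -> Word') -> Int
goodInit leq f =
  head [ n | n <- [0 ..], goodW leq (reverse (map f [0 .. n - 1])) ]
```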

5 Related Work: Other Proofs of Higman’s Lemma

As mentioned at the beginning of Sect. 2, our constructive proof of Higman’s Lemma does not need transitivity; it works for arbitrary almost full relations. However, in the following discussion we disregard this fine point and assume that the underlying relation is a well-quasiorder. This will make it easier to compare different proofs in the literature.

There are quite a number of constructive proofs of Higman’s Lemma, thus the natural question arises: are they all different? The number of proofs is due to the fact that researchers from different areas, algebra, proof theory, constructive mathematics, term rewriting, to name a few, became interested in Higman’s Lemma. In addition, there are various formulations of a well-quasiorder which involve different proof principles. These are, for instance, proofs using ordinal notation systems and transfinite induction, as in [24, 25], or inductively defined predicates and structural induction, as in [9, 18, 23]. Below we argue that these proofs are the same from a computational point of view.

The proof-theoretic strength of Higman’s Lemma is that of Peano Arithmetic, i.e., \(\epsilon _0\), as was shown in [12] using the constructive proof of [25]. Speaking in terms of Reverse Mathematics, Higman’s Lemma can be proven in the theory ACA\(_\mathsf{0}\). In term rewriting theory, Higman’s Lemma and its generalization to trees, Kruskal’s Theorem, are used to prove termination of string rewriting systems and term rewriting systems, respectively. The orders whose termination is covered by these two theorems are called simplification orders. They form an important class since the criterion of being a simplification order can be checked syntactically. A constructive proof, e.g., as given in [4], moreover yields a bound for the longest possible bad sequence. In the case of Higman’s Lemma the reduction length, expressed in terms of the Hardy hierarchy \(H_{\alpha }\) and assuming a finite alphabet A, is as follows. If we have a bad sequence \((t_i)_{i<n}\) fulfilling the condition \(|t_i| \le |t_0| + k\times i\), where k is a constant and |t| denotes the size of t, then the length n of the sequence is bounded by \(\Phi (|t_0|)\), where \(\Phi \) is an elementary function in \(H_{\omega ^{\omega ^{|A|}}}\) [4, 31]. This bound is optimal since there are term rewriting systems which “reach” these bounds [32].

5.1 Equivalent Formulations of a Well-Quasiorder

We define the maximal ordertype of a well-quasiorder \((A,\preceq )\) as the supremum of the ordertypes of all extensions of \((A,\preceq )\) to a linear order. Equivalently, in a more constructive manner, the maximal ordertype can be defined as the height of the tree \(\mathrm {Bad}_\preceq \) of all bad sequences with elements in A. A reification of a quasiorder \((A,\preceq )\) into a wellordering \((\sigma ,<)\) is a map

$$\begin{aligned} r :\mathrm {Bad}_\preceq \rightarrow \sigma , \end{aligned}$$

such that for all \({a{*} as }\in \mathrm {Bad}_\preceq \) we have \(r({a{*} as }) < r( as )\). On the set \(\mathrm {Bad}_\preceq \) of bad sequences in A we define a relation \(\ll _A\) by \( as '\ll _A as \) iff \( as '= {a{*} as }\) for some \(a\in A\). The accessible part of the relation \(\ll _A\subseteq \mathrm {Bad}_\preceq \times \mathrm {Bad}_\preceq \) is inductively given by the rule

$$\begin{aligned} \frac{\forall _{ a s '}( a s '\ll _A a s \rightarrow \mathrm {acc}_{\ll _A}{ a s '})}{\mathrm {acc}_{\ll _A}{ a s }} \end{aligned}$$

It is obvious that the following are equivalent for a quasiorder \((A,\preceq )\):

(i)

    \((A,\preceq )\) is a well-quasiorder (i.e., \(\mathrm {Wqo}(A,\preceq )\)).

(ii)

    \((A,\preceq )\) has a maximal ordertype.

(iii)

    There is a reification of \((A,\preceq )\) into a wellorder.

(iv)

    \((\mathrm {Bad}_\preceq ,\ll _A)\) is wellfounded, i.e., \(\mathrm {acc}_{\ll _A}[\!\;]\).

(v)

    \(\mathrm {BarA}_{\preceq }{[]}\).

5.2 A Generic Proof of Higman’s Lemma

In the following we sketch a generic proof of

$$\begin{aligned} \mathrm {Wqo}(A,\preceq ) \rightarrow \mathrm {Wqo}(A^*,\preceq ^*) \end{aligned}$$

which differs from the proof presented in the earlier sections. We start by choosing a characterization of a well-quasiorder, either using ordinal notations ((ii) or (iii)) or inductive definitions ((iv) or (v)). (Note that in the latter case we need to generalize the statement; for instance, in (iv) we prove more generally \(\mathrm {acc}_{\ll _A} as \rightarrow \mathrm {acc}_{\ll _{(A_{ as })^*}}{[]}\) and use this proof with \( as =[]\), and in the proof below instead of \(A_{[a]}\) we use everywhere \(A_{{{a{*} as }}}\), etc.) Here we define \(A_{ as }\) as the set of all elements a that extend \( as \) badly, i.e., such that \(\forall _{i}\, ( as )_i \not \preceq a\). Similarly, we define \(A_{[a]}\) to be the set of all elements b such that \(a\not \preceq _A b\). Assume that for our choice of characterization we are able to prove (with the obvious extension of \(\preceq \) to \(\cup \) and \(\times \)):

$$\begin{aligned} (a)&\forall a\ \mathrm {Wqo}(A_{[a]},\preceq ) \rightarrow \mathrm {Wqo}(A,\preceq ),\\ (b)&A\subseteq B \rightarrow \mathrm {Wqo}(A,\preceq ) \rightarrow \mathrm {Wqo}(B,\preceq ),\\ (c)&\mathrm {Wqo}(A) \wedge \mathrm {Wqo}(B) \rightarrow \mathrm {Wqo}(A\cup B),\\ (d)&\mathrm {Wqo}(A) \wedge \mathrm {Wqo}(B) \rightarrow \mathrm {Wqo}(A\times B). \end{aligned}$$

Assume \(\mathrm {Wqo}(\preceq )\). We proceed to prove Higman’s Lemma by using either structural induction or transfinite induction, depending on our choice. From the induction hypothesis we get

$$\begin{aligned}&\mathrm {Wqo}(A_{[a]}) \rightarrow \mathrm {Wqo}({A_{[a]}}^*). \end{aligned}$$
(1)

By (a) it suffices to prove \(\forall _v \mathrm {Wqo}(A^*_{[v]})\). Let \(v=[a_1,\ldots ,a_n]\). The main combinatorial idea is now contained in the following statement

$$\begin{aligned} (A^*)_{[[a_1,\ldots ,a_n]]} \subseteq \bigcup \{ (A_{[a_1]})^* \, \times A \times (A_{[a_2]})^* \, \times \dots \times A \times (A_{[a_l]})^* \mid l\le n \} \end{aligned}$$
(2)

which holds by a simple combinatorial argument. Using (b) we are done once we have shown

$$\begin{aligned} \mathrm {Wqo}(\bigcup \{ (A_{[a_1]})^* \, \times A \times (A_{[a_2]})^* \, \times \dots \times A \times (A_{[a_l]})^* \mid l\le n \}). \end{aligned}$$

But this follows immediately from (c), (d) and (1).

Remark

Instantiated versions of this proof, using characterizations (ii), (iii), (iv) or (v) of a well-quasiorder, can be found in the following articles. (ii) is used by de Jongh and Parikh [8] and Schmidt [24]. (iii) is used in the proof by Schütte and Simpson [25] (and Hasegawa [13]), and is the characterization which is most promising in terms of generalizations beyond Kruskal’s Theorem. (iv) has been used by Fridlender [9], using an \(\mathrm {acc}\) notation; his proof is a reformulation of the proof by Richman and Stolzenberg [23]. To a less formal extent this characterization is also used in [18], which likewise employs structural induction and a similar construction describing the space into which a sequence can be extended badly. As for characterization (v), the proof in [9] can easily be reformulated using (v). Fridlender [10] gives a variant where he does not need the decidability of \(\preceq _A\). His proof is a type-theoretic version of an intuitionistic proof by Veldman, later published in [34].

Finally, the proof of [18] forms the basis of the formalization and proof of Higman’s Lemma in ACL2 given in [16]. Their work, however, starts with a program solving the problem and then proves its properties, rather than extracting the program from the proof.

Remark

Higman’s Lemma extends naturally to Kruskal’s Theorem, the corresponding statement for trees. Constructive proofs of Kruskal’s Theorem have been given by Schmidt [24] using characterization (ii), by Rathjen and Weiermann [22] and Hasegawa [13] using (iii), and in [27] using (iv). Finally, also Goubault-Larrecq’s proof [11] which generalizes the proof in [18] falls under this category.

It remains to compare how the computational content behind this generic proof of Higman’s Lemma is related to the constructive proof given in this paper. Although we have not yet formalized the proof above, it is quite obvious that the construction, in particular Eq. (2), differs from the construction in our proof and therefore would result in a different algorithm.

6 Conclusion and Further Work

We presented and formalized a constructive proof of Higman’s Lemma that contains the same combinatorial idea as Nash-Williams’ indirect proof, and extracted and discussed its inherent program in detail. We also argued that a number of constructive proofs of Higman’s Lemma are based on a combinatorial idea different from ours. It is still open to make that claim formal, i.e., to formalize the proof presented in the previous section and compare the resulting program with our extracted program. Similarly, there are a number of formalizations of Nash-Williams’ classical proof, as mentioned in the introduction. It would be worthwhile to confirm that they, in principle, lead to the same algorithm, which also corresponds to the algorithm in our extracted program.

Equally interesting is the question of which of the discussed proofs is most suitable for applications such as termination of string and term rewriting systems; see, e.g., [11, 30, 35] for recent discussions of applications to termination proofs. A particularly promising application has been given in [20]. It will be worth checking how our alternative proof of Higman’s Lemma and its extracted program can be utilized with regard to these applications or further generalizations.