1 Introduction

Syntax with bindings is an essential ingredient in the formal specification and implementation of logics and programming languages. However, correctly and formally specifying, assigning semantics to, and reasoning about bindings is notoriously difficult and error-prone. This fact is widely recognized in the formal verification community and is reflected in manifestos and benchmarks such as the influential POPLmark challenge [107] and its follow-ups, e.g., [1, 36].

In the past decade, in a framework developed intermittently starting with the second author’s PhD [89] and moving into the first author’s ongoing PhD, a series of results in logic and \(\lambda \)-calculus have been formalized in Isabelle/HOL [71, 73]. These include classic results (e.g., FOL completeness and soundness of Skolemization [22, 24, 26], strong normalization [92]), as well as novel results in the meta-theory of Isabelle’s Sledgehammer tool [16, 22]. A recent paper [45] gives a concrete flavor of how our framework can be deployed and used for two particular syntaxes, those of \(\lambda \)-calculus and of \(\lambda \)-calculus with emphasized values, where we prove the Church-Rosser and standardization theorems.

In this paper, we present the Isabelle/HOL formalization of the framework itself, which is available from the Archive of Formal Proofs [46]. While concrete system syntaxes differ in their details, there are some fundamental phenomena concerning bindings that follow the same generic principles. It is these fundamental phenomena that our framework aims to capture, by mechanizing a form of universal algebra for bindings. The framework has evolved over the years through feedback from concrete application challenges: Each time a tedious, seemingly routine construction was encountered, a question arose as to whether this could be performed once and for all in a syntax-agnostic fashion.

The paper is structured as follows. We start with an example-driven overview of our design decisions (Sect. 2). Then we present the general theory: terms as alpha-equivalence classes of “quasiterms,” standard operators on terms and their basic properties (Sect. 3), custom induction (Sect. 5) and recursion schemes (Sect. 4), including support for the semantic interpretation of syntax, and the sorting of terms according to a signature (Sect. 6). Finally, we briefly survey the various applications of the framework (Sect. 7), pointing out the usage of its various features. Within the large body of formalizations in the area (Sect. 8), distinguishing features of our work are the general setting (many-sorted signature, possibly infinitely branching syntax), a rich theory of the standard operators, and operator-aware recursion.

This paper is a substantially extended version of our conference paper presented at ITP 2017 [47]. The newly added material mostly expands the presentation of our main novel contributions: a rich theory of the operators and operator-aware recursion principles.

  • Operator properties We give a quasi-exhaustive presentation of the properties we have proved about the syntactic operators and their interaction (in Sect. 3.4).

  • Recursion

    • We present two more recursion schemes, which factor in the swapping operator instead of, or in addition to, the substitution operator (in Sect. 4.1).

    • We discuss primitive recursion, which is an extension of the iteration schemes (in a newly added Sect. 4.2).

    • We introduce the skeleton operator, defined as an instance of one of our iteration schemes (in a newly added Sect. 4.3).

    • We show the end-product scheme of many-sorted recursion (in a newly added Sect. 6.4).

  • Applications We survey the applications of our theory (in a newly added Sect. 7).

  • Relevant literature The related work Sect. 8 has been significantly expanded.

2 Design Decisions

In this section, we use some examples to motivate our design choices for the theory. We also introduce conventions and notations that will be relevant throughout the paper.

The paradigmatic example of syntax with bindings is that of the \(\lambda \)-calculus [10]. We assume an infinite supply of variables, \(x \in \mathbf{var}\). The \(\lambda \)-terms, \(X,Y \in \mathbf{term}_\lambda \), are defined by the following BNF grammar

$$\begin{aligned} X \;{::=}\; {\textsf {Var}}\;x \mid {\textsf {App}}\;X\;Y \mid {\textsf {Lm}}\;x\;X \end{aligned}$$

Thus, a \(\lambda \)-term is either a variable, or an application, or a \(\lambda \)-abstraction. This grammar specification, while sufficient for first-order abstract syntax, is incomplete when it comes to syntax with bindings—we also need to indicate which operators introduce bindings and in which of their arguments. Here, \({\textsf {Lm}}\) is the only binding operator: When applied to the variable x and the term X, it binds x in X. Once the binders are known, the usual convention is to identify terms modulo alpha-equivalence, i.e., to treat as equal terms that only differ in the names of bound variables, such as, e.g., \({\textsf {Lm}}\;x\;({\textsf {Var}}\;x)\) and \({\textsf {Lm}}\;y\;({\textsf {Var}}\;y)\). The end results of our theory will involve terms modulo alpha. We will call the raw terms “quasiterms,” reserving the word “term” for alpha-equivalence classes.
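As a concrete illustration, alpha-equivalence on raw \(\lambda \)-terms can be decided by translating bound variables into de Bruijn indices and comparing the results. The following is a minimal Python sketch, entirely separate from the Isabelle formalization; the tuple encoding and function names are our own:

```python
# Raw lambda-terms as nested tuples (an illustrative encoding):
# ("Var", x) | ("App", t, u) | ("Lm", x, t), with variables as strings.

def db(t, env=()):
    """Replace bound variables by indices; env lists binders, innermost first."""
    if t[0] == "Var":
        x = t[1]
        return ("B", env.index(x)) if x in env else ("F", x)
    if t[0] == "App":
        return ("App", db(t[1], env), db(t[2], env))
    return ("Lm", db(t[2], (t[1],) + env))   # push the binder for the body

def alpha(t, u):
    """Terms are alpha-equivalent iff their de Bruijn translations coincide."""
    return db(t) == db(u)

# Lm x (Var x) and Lm y (Var y) differ only in the bound name:
assert alpha(("Lm", "x", ("Var", "x")), ("Lm", "y", ("Var", "y")))
# but free variables may not be renamed:
assert not alpha(("Lm", "x", ("Var", "y")), ("Lm", "y", ("Var", "y")))
```

This indirection through de Bruijn indices is only one of several standard ways to decide alpha; the paper's definition instead uses swapping and fresh variables (Sect. 3.2).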

2.1 Standalone Abstractions

To make the binding structure manifest, we will “quarantine” the bindings and their associated intricacies into the notion of abstraction, which is a pairing of a variable and a term, again modulo alpha. For example, for the \(\lambda \)-calculus we will have

$$\begin{aligned} \begin{array}{cc} X \;{::=}\; {\textsf {Var}}\;x \mid {\textsf {App}}\;X\;Y \mid {\textsf {Lm}}\;A \quad&\quad A \;{::=}\; {\textsf {Abs}}\;x\;X \end{array} \end{aligned}$$

where X are terms and A abstractions. Within \({\textsf {Abs}}\;x\;X\), we assume that x is bound in X. The \(\lambda \)-abstractions \({\textsf {Lm}}\;x\;X\) of the original syntax are now written \({\textsf {Lm}}\;({\textsf {Abs}}\;x\;X)\).

2.2 Freshness, Substitution and Swapping

The three most fundamental and most standard operators on \(\lambda \)-terms are:

  • the freshness predicate, \({\textsf {fresh}}: \mathbf{var}\rightarrow \mathbf{term}_\lambda \rightarrow \mathbf{bool}\), where \({\textsf {fresh}}\;x\;X\) states that x is fresh for (i.e., does not occur free in) X; for example, it holds that \({\textsf {fresh}}\;x\;({\textsf {Lm}}\;({\textsf {Abs}}\;x\;({\textsf {Var}}\;x)))\) and \({\textsf {fresh}}\;x\;({\textsf {Var}}\;y)\) (when \(x\not =y\)), but not that \({\textsf {fresh}}\;x\;({\textsf {Var}}\;x)\)

  • the substitution operator, \(\_[\_/\_]: \mathbf{term}_\lambda \rightarrow \mathbf{term}_\lambda \rightarrow \mathbf{var}\rightarrow \mathbf{term}_\lambda \), where \(Y\,[X/x]\) denotes the (capture-free) substitution of term X for (all free occurrences of) variable x in term Y; e.g., if Y is \({\textsf {Lm}}\;({\textsf {Abs}}\;x\;({\textsf {App}}\;({\textsf {Var}}\;x)\;({\textsf {Var}}\;y)))\) and \(x \not \in \{y,z\}\), then:

    • \(Y\,[({\textsf {Var}}\;z)/y] = {\textsf {Lm}}\;({\textsf {Abs}}\;x\;({\textsf {App}}\;({\textsf {Var}}\;x)\;({\textsf {Var}}\;z)))\)

    • \(Y\,[({\textsf {Var}}\;z)/x] = Y\) (since bound occurrences like those of x in Y are not affected)

  • the swapping operator \(\_[\_ \wedge \_]: \mathbf{term}_\lambda \rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{term}_\lambda \), where \(Y[x\wedge y]\) denotes the term Y in which every occurrence (free or bound alike) of the variable x has been replaced by an occurrence of y and vice versa; for example, if Y is \({\textsf {Lm}}\;({\textsf {Abs}}\;x\;({\textsf {App}}\;({\textsf {Var}}\;x)\;({\textsf {Var}}\;y)))\) and \(z \not \in \{x,y\}\), then:

    • \(Y\,[x\wedge y] = Y\,[y\wedge x] = {\textsf {Lm}}\;({\textsf {Abs}}\;y\;({\textsf {App}}\;({\textsf {Var}}\;y)\;({\textsf {Var}}\;x)))\)

    • \(Y\,[x\wedge z] = Y\,[z\wedge x] = {\textsf {Lm}}\;({\textsf {Abs}}\;z\;({\textsf {App}}\;({\textsf {Var}}\;z)\;({\textsf {Var}}\;y)))\)

    • \(Y\,[y\wedge z] = Y\,[z\wedge y] = {\textsf {Lm}}\;({\textsf {Abs}}\;x\;({\textsf {App}}\;({\textsf {Var}}\;x)\;({\textsf {Var}}\;z)))\)

And there are corresponding operators for abstractions—e.g., \({\textsf {freshAbs}}\;x\;({\textsf {Abs}}\;x\;({\textsf {Var}}\;x))\) holds. Freshness, substitution and swapping are pervasive in the meta-theory of \(\lambda \)-calculus, as well as in most logical systems and formal semantics of programming languages. The basic properties of these operators lie at the core of important meta-theoretic results in these fields—our formalized theory aims at the exhaustive coverage of these basic properties.
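The three operators can be made concrete with a Python sketch, again independent of the Isabelle formalization; the tuple encoding and the fresh-name scheme ("v0", "v1", ...) are illustrative assumptions. Here \({\textsf {Lm}}\) binds its variable directly, as in the original grammar, and the worked examples above are reproduced as assertions:

```python
# Raw lambda-terms: ("Var", x) | ("App", t, u) | ("Lm", x, t).
import itertools

def fresh(x, t):
    """x does not occur free in t."""
    if t[0] == "Var":
        return t[1] != x
    if t[0] == "App":
        return fresh(x, t[1]) and fresh(x, t[2])
    return t[1] == x or fresh(x, t[2])       # occurrences under a binder for x are bound

def swap(t, x, y):
    """t[x∧y]: exchange all occurrences of x and y, free and bound alike."""
    ren = lambda v: y if v == x else (x if v == y else v)
    if t[0] == "Var":
        return ("Var", ren(t[1]))
    if t[0] == "App":
        return ("App", swap(t[1], x, y), swap(t[2], x, y))
    return ("Lm", ren(t[1]), swap(t[2], x, y))

def subst(t, s, x):
    """t[s/x]: capture-avoiding substitution of s for free occurrences of x."""
    if t[0] == "Var":
        return s if t[1] == x else t
    if t[0] == "App":
        return ("App", subst(t[1], s, x), subst(t[2], s, x))
    if t[1] == x:                            # x is bound here: nothing to substitute
        return t
    if fresh(t[1], s):
        return ("Lm", t[1], subst(t[2], s, x))
    # the binder would capture a free variable of s: rename it first
    z = next(v for v in ("v%d" % i for i in itertools.count())
             if fresh(v, t[2]) and fresh(v, s) and v != x)
    return ("Lm", z, subst(swap(t[2], t[1], z), s, x))

# Y = Lm x (App x y), as in the examples above:
Y = ("Lm", "x", ("App", ("Var", "x"), ("Var", "y")))
assert subst(Y, ("Var", "z"), "y") == ("Lm", "x", ("App", ("Var", "x"), ("Var", "z")))
assert subst(Y, ("Var", "z"), "x") == Y      # bound occurrences are unaffected
assert swap(Y, "x", "y") == ("Lm", "y", ("App", ("Var", "y"), ("Var", "x")))
```

Note how swapping is structurally simpler than substitution: it recurses uniformly through binders, with no freshness side conditions—a point the paper returns to when choosing swapping over substitution in the definition of alpha.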

2.3 Advantages and Obligations from Working with Terms Modulo Alpha

In our theory, we start by defining quasiterms and quasiabstractions and their alpha-equivalence. Then, after proving all the syntactic constructors and standard operators to be compatible with alpha, we quotient by alpha, obtaining what we call terms and abstractions, and define the versions of these operators on the quotiented items. For example, let \(\mathbf{qterm}_\lambda \) and \(\mathbf{qabs}_\lambda \) be the types of quasiterms and quasiabstractions in \(\lambda \)-calculus. Here, the quasiabstraction constructor, \({\textsf {qAbs}}: \mathbf{var}\rightarrow \mathbf{qterm}_\lambda \rightarrow \mathbf{qabs}_\lambda \), is a free constructor, of the kind produced by standard datatype specifications [14, 20]. The types \(\mathbf{term}_\lambda \) and \(\mathbf{abs}_\lambda \) are \(\mathbf{qterm}_\lambda \) and \(\mathbf{qabs}_\lambda \) quotiented by alpha. We prove compatibility of \({\textsf {qAbs}}\) with alpha and then define \({\textsf {Abs}}: \mathbf{var}\rightarrow \mathbf{term}_\lambda \rightarrow \mathbf{abs}_\lambda \) by lifting \({\textsf {qAbs}}\) to quotients.

The decisive advantages of working with quasiterms and quasiabstractions modulo alpha, i.e., with terms and abstractions, are that (1) substitution behaves well (e.g., is compositional) and (2) Barendregt’s variable convention [10] (of assuming, w.l.o.g., the bound variables fresh for the parameters) can be invoked in proofs.

However, this choice brings the obligation to prove that all concepts on terms are compatible with alpha. Without employing suitable abstractions, this can become quite difficult even in the most “banal” contexts. Due to nonfreeness, primitive recursion on terms requires a proof that the definition is well formed, i.e., that the overlapping cases lead to the same result. As for Barendregt’s convention, its rigorous usage in proofs needs a principle that goes beyond the usual structural induction for free datatypes.

A framework that deals gracefully with these obligations can make an important difference in applications—enabling the formalizer to quickly leave behind low-level “bootstrapping” issues and move to the interesting core of the results. To address these obligations, we formalize state-of-the-art techniques from the literature [84, 91, 112].

2.4 Many-Sortedness

While \(\lambda \)-calculus has only one syntactic category of terms (to which we added that of abstractions for convenience), this is often not the case. FOL has two: terms and formulas. The Edinburgh Logical Framework (LF) [50] has three: object families, type families and kinds. More complex calculi can have many syntactic categories.

Our framework will capture these phenomena. We will call the syntactic categories sorts. We will distinguish syntactic categories for terms (the sorts) from those for variables (the varsorts). Indeed, e.g., in FOL we do not have variables ranging over formulas, in the \(\pi \)-calculus [70] we have channel names but no process variables, etc.

Sortedness is important, but formally quite heavy. In our formalization, we postpone dealing with it for as long as possible. We introduce an intermediate notion of good term, for which we are able to build the bulk of the theory—only as the very last step do we introduce many-sorted signatures and transit from “good” to “sorted.” Note that good terms are only an intermediate concept, not visible in the final product of our theory.

2.5 Possibly Infinite Branching

Nominal logic’s [83, 112] notion of finite support has become central in state-of-the-art techniques for reasoning about bindings. Occasionally, however, important developments step outside finite support. For example, (a simplified) CCS [69] has the following syntactic categories of data expressions \(E \in \mathbf{exp}\) and processes \(P \in \mathbf{proc}\):

$$\begin{aligned} E \;{::=}\; {\textsf {Var}}\;x \mid 0 \mid E + E \qquad P \;{::=}\; {\textsf {Inp}}\;c\;x\;P \mid {\textsf {Out}}\;c\;E\;P \mid \sum \nolimits _{i\in I}P_i \end{aligned}$$

Above, \({\textsf {Inp}}\;c\;x\;P\), usually written \(c(x).\,P\), is an input prefix c(x) followed by a continuation process P, with c being a channel and x a variable which is bound in P. Dually, \({\textsf {Out}}\;c\;E\;P\), usually written \(c\,\overline{E}.\,P\), is an output-prefixed process with E an expression. The exotic constructor here is the sum \(\sum \), which models nondeterministic choice from a collection \((P_i)_{i \in I}\) of alternatives indexed by a set I. It is important that I is allowed to be infinite, for modeling different decisions based on different received inputs. But then process terms may use infinitely many variables, i.e., may not be finitely supported. Similar issues arise in infinitary FOL [61] and Hennessy-Milner logic [52]. In our theory, we cover such infinitely branching syntaxes.
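The role of infinite branching can be illustrated with a small Python sketch (a hypothetical encoding, not part of the formalization): the sum's alternatives are represented as a function from an index set to processes, so the index set need not be finite, and laziness means the family is never built in full:

```python
# Illustrative process constructors for the simplified CCS fragment.

def Sum(family):                 # family: index -> process; I may be infinite
    return ("Sum", family)

def Inp(c, x, p):                # c(x).p, binding x in p
    return ("Inp", c, x, p)

def Var(x):
    return ("Var", x)

# One alternative per natural number, each binding its own variable x_i,
# so the process as a whole mentions infinitely many variables:
p = Sum(lambda i: Inp("c", "x%d" % i, Var("x%d" % i)))

assert p[0] == "Sum"
assert p[1](7) == ("Inp", "c", "x7", ("Var", "x7"))
```

Such a process is not finitely supported, which is exactly the situation that steps outside nominal logic's finite-support discipline.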

3 General Terms with Bindings

We start the presentation of our formalized theory, in its journey from quasiterms (3.1) to terms via alpha-equivalence (3.2). The journey is fueled by the availability of fresh variables, ensured by cardinality assumptions on constructor branching and variables (3.3). It culminates with a systematic study of the standard term operators (3.4).

3.1 Quasiterms

The types \(\mathbf{qterm}\) and \(\mathbf{qabs}\), of quasiterms and quasiabstractions, are defined as mutually recursive datatypes polymorphic in the following type variables: \(\mathbf{index}\) and \(\mathbf{bindex}\), of indexes for free and bound arguments, \(\mathbf{varsort}\), of varsorts, i.e., sorts of variables, and \(\mathbf{opsym}\), of (constructor) operation symbols. For readability, below we omit the occurrences of these type variables as parameters to \(\mathbf{qterm}\) and \(\mathbf{qabs}\):

$$\begin{aligned} \begin{array}{rrcl} \textsf {datatype } &{} \mathbf{qterm}&{}\;=\;&{} {\textsf {q}}{\textsf {Var}}\;\mathbf{varsort}\;\mathbf{var}\mid \\ &{}&{}&{} {\textsf {qOp}}\;\mathbf{opsym}\;((\mathbf{index},\mathbf{qterm})\,\mathbf{input})\;((\mathbf{bindex},\mathbf{qabs})\,\mathbf{input}) \\ \textsf {and } &{} \mathbf{qabs}&{}\;=\;&{} {\textsf {qAbs}}\;\mathbf{varsort}\;\mathbf{var}\;\mathbf{qterm}\end{array} \end{aligned}$$

Thus, any quasiabstraction has the form \({\textsf {qAbs}}\;{\textit{xs}}\;x\;X\), putting together the variable x of varsort \({\textit{xs}}\) with the quasiterm X, indicating the binding of x in X. On the other hand, a quasiterm is either an injection \({\textsf {q}}{\textsf {Var}}\;{\textit{xs}}\;x\), of a variable x of varsort \({\textit{xs}}\), or has the form \({\textsf {qOp}}\;\delta \;{\textit{inp}}\;{\textit{binp}}\), i.e., consists of an operation symbol applied to some inputs that can be either free, \({\textit{inp}}\), or bound, \({\textit{binp}}\). Note that the same variable can appear in a quasiterm with two different sorts. This is not a problem, since we always treat variables in a sorted manner: we inject them into quasiterms with explicit sort, \({\textsf {q}}{\textsf {Var}}\;{\textit{xs}}\;x\), and we bind them with explicit sort, \({\textsf {qAbs}}\;{\textit{xs}}\;x\;t\). This way, given \({\textit{xs}}\not = {\textit{ys}}\), we treat “x at sort \({\textit{xs}}\)” and “x at sort \({\textit{ys}}\)” as unrelated entities.

We use \((\alpha ,\beta )\,\mathbf{input}\) as a type synonym for \(\alpha \rightarrow \beta \;\mathbf{option}\), the type of partial functions from \(\alpha \) to \(\beta \); such a function returns either \({\textsf {None}}\) (representing “undefined”) or \({\textsf {Some}}\;b\) for \(b: \beta \). This type models inputs to the quasiterm constructors of varying number of arguments. An operation symbol \(\delta : \mathbf{opsym}\) can be applied, via \({\textsf {qOp}}\), to: (1) a varying number of free inputs, i.e., families of quasiterms modeled as members of \((\mathbf{index},\mathbf{qterm})\,\mathbf{input}\) and (2) a varying number of bound inputs, i.e., families of quasiabstractions modeled as members of \((\mathbf{bindex},\mathbf{qabs})\,\mathbf{input}\). For example, taking \(\mathbf{index}\) to be \(\mathbf{nat}\) we capture n-ary operations for any n (passing to \({\textsf {qOp}}\;\delta \) inputs defined only on \(\{0,\ldots ,n-1\}\)), as well as countably-infinitary operations (passing to \({\textsf {qOp}}\;\delta \) inputs defined on the whole of \(\mathbf{nat}\)).
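A Python analogue makes the idea tangible (illustrative only: dicts play the role of partial functions, an absent key playing the role of \({\textsf {None}}\), and the constructor helpers are our own names, not the formalization's definitions):

```python
# Quasiterm constructors over dict-based inputs.

def qVar(xs, x):            return ("qVar", xs, x)
def qOp(delta, inp, binp):  return ("qOp", delta, inp, binp)
def qAbs(xs, x, t):         return ("qAbs", xs, x, t)

# A binary application: free input defined exactly on {0, 1}, no bound input.
app = qOp("App", {0: qVar("vs", "x"), 1: qVar("vs", "y")}, {})

# Lambda-abstraction: no free input, one bound input (a quasiabstraction).
lam = qOp("Lm", {}, {0: qAbs("vs", "x", qVar("vs", "x"))})

# The same operation symbol accommodates any arity, via the input's domain:
def tup(ts):
    return qOp("Tuple", dict(enumerate(ts)), {})

assert tup([qVar("vs", "x")])[2].keys() == {0}
assert len(tup([qVar("vs", c) for c in "abc"])[2]) == 3
```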

Note that, so far, we consider sorts of variables but not sorts of terms. The latter will come much later, in Sect. 6, when we introduce signatures. Then, we will gain control (1) of which varsorts should be embedded in which term sorts and (2) of which operation symbols are allowed to be applied to which sorts of terms. But, until then, we will develop the interesting part of the theory of bindings without sorting the terms.

On quasiterms, we define freshness, \({\textsf {qFresh}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{qterm}\rightarrow \mathbf{bool}\), substitution, \(\_[\_ / \_]_{\_}: \mathbf{qterm}\rightarrow \mathbf{qterm}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{qterm}\), parallel substitution, \(\_[\_]: \mathbf{qterm}\rightarrow (\mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{qterm}\;\mathbf{option}) \rightarrow \mathbf{qterm}\), swapping, \(\_[\_ \wedge \_]_{\_}: \mathbf{qterm}\rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{qterm}\), and alpha-equivalence, \({\textsf {alpha}}: \mathbf{qterm}\rightarrow \mathbf{qterm}\rightarrow \mathbf{bool}\)—and corresponding operators on quasiabstractions: \({\textsf {qFreshAbs}}\), \({\textsf {alphaAbs}}\), etc.

The definitions proceed as expected, picking suitable fresh variables in the cases of substitution and alpha. For parallel substitution, given a (partial) variable-to-quasiterm assignment \(\rho : \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{qterm}\;\mathbf{option}\), the quasiterm \(X[\rho ]\) is obtained by substituting, for each free variable x of sort \({\textit{xs}}\) in X for which \(\rho \) is defined, the quasiterm Y where \(\rho \;{\textit{xs}}\;x = {\textsf {Some}}\;Y\). We only show the formal definition of alpha.

Fig. 1 Alpha-equivalence

3.2 Alpha-Equivalence

We define the predicates \({\textsf {alpha}}\) (on quasiterms) and \({\textsf {alphaAbs}}\) (on quasiabstractions) mutually recursively, as shown in Fig. 1. For variable quasiterms, we require equality on both the variables and their sorts. For \({\textsf {qOp}}\) quasiterms, we recurse through the components, \({\textit{inp}}\) and \({\textit{binp}}\). Given any predicate \(P: \beta ^2 \rightarrow \mathbf{bool}\), we write \({\uparrow }\,P\) for its lifting to \((\alpha ,\beta )\,\mathbf{input}^2 \rightarrow \mathbf{bool}\), defined as

$$\begin{aligned} \begin{array}{l} {\uparrow }\,P\;{\textit{inp}}\;{\textit{inp}}' \Longleftrightarrow \\ \forall i.\; \text{ case } ({\textit{inp}}\;i,{\textit{inp}}'\;i) \text{ of } ({\textsf {None}},{\textsf {None}}) \Rightarrow {\textsf {True}}\mid ({\textsf {Some}}\;b,\,{\textsf {Some}}\;b') \Rightarrow P\;b\;b' \mid \_ \Rightarrow {\textsf {False}}\end{array} \end{aligned}$$

Thus, \({\uparrow }\,P\) relates two inputs just in case they have the same domain and their results are componentwise related.
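The lifting translates almost verbatim to a Python sketch (dict-based inputs as before; illustrative code, not the formalization):

```python
# Lift a binary predicate on items to a binary predicate on inputs,
# where inputs are dicts and an absent key plays the role of None.

def lift(P):
    def lifted(inp, inp2):
        # same domain, and componentwise P-related results
        return inp.keys() == inp2.keys() and all(P(inp[i], inp2[i]) for i in inp)
    return lifted

eq = lift(lambda a, b: a == b)
assert eq({0: "x", 1: "y"}, {0: "x", 1: "y"})
assert not eq({0: "x"}, {0: "x", 1: "y"})    # domains differ
```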

Convention 1

Throughout this paper, without further notice we write \({\uparrow }\) for the natural lifting of the various operators from terms and abstractions to free or bound inputs.

In Fig. 1’s clause for quasiabstractions, we require that the bound variables are of the same sort and there exists some fresh y such that \({\textsf {alpha}}\) holds for the terms where y is swapped with the bound variable. Following nominal logic, we prefer to use swapping instead of substitution in alpha-equivalence, since this leads to simpler proofs [84].

3.3 Good Quasiterms and Regularity of Variables

In general, \({\textsf {alpha}}\) will not be an equivalence; namely, it will fail to be transitive: Due to the arbitrarily wide branching of the constructors, we may not always have fresh variables y available in an attempt to prove transitivity by induction. To remedy this, we restrict ourselves to “good” quasiterms, whose constructors do not branch beyond the cardinality of \(\mathbf{var}\). Goodness is defined as the mutually recursive predicates \({\textsf {qGood}}\) and \({\textsf {qGoodAbs}}\):

$$\begin{aligned} \begin{array}{rcl} {\textsf {qGood}}\;({\textsf {q}}{\textsf {Var}}\;{\textit{xs}}\;x) &{}\Longleftrightarrow &{} {\textsf {True}}\\ {\textsf {qGood}}\;({\textsf {qOp}}\;\delta \;{\textit{inp}}\;{\textit{binp}}) &{}\;\Longleftrightarrow \;&{} {\uparrow }\,{\textsf {qGood}}\;{\textit{inp}}\;\wedge \; {\uparrow }\,{\textsf {qGoodAbs}}\;{\textit{binp}}\;\wedge \; \\ &{}&{} |{\textsf {dom}}\;{\textit{inp}}|< |\mathbf{var}| \;\wedge \; |{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}| \\ {\textsf {qGoodAbs}}\;({\textsf {qAbs}}\;{\textit{xs}}\;x\;X) &{}\Longleftrightarrow &{} {\textsf {qGood}}\;X \end{array} \end{aligned}$$

where, given a partial function f, we write \({\textsf {dom}}\;f\) for its domain.

Thus, for good items, we hope to always have a supply of fresh variables. Namely, we hope to prove \( {\textsf {qGood}}\;X \Longrightarrow \forall {\textit{xs}}.\;\exists x.\;{\textsf {qFresh}}\;{\textit{xs}}\;x\;X \). But goodness is not enough. We also need a special property for the type \(\mathbf{var}\) of variables. In the case of finitary syntax, it suffices to take \(\mathbf{var}\) to be countably infinite, since a finitely branching term will contain fewer than \(|\mathbf{var}|\) variables (here, meaning a finite number of them)—this can be proved by induction on terms, using the fact that a finite union of finite sets is finite.

So let us attempt to prove the same in our general case. In the inductive \({\textsf {qOp}}\) case, we know from goodness that the branching is smaller than \(|\mathbf{var}|\), so to conclude we would need the following: A union of sets smaller than \(|\mathbf{var}|\), indexed by a set smaller than \(|\mathbf{var}|\), stays smaller than \(|\mathbf{var}|\). It turns out that this is a well-studied property of cardinals, called regularity—with \(|\mathbf{nat}|\) being the smallest regular cardinal. (In addition, we know that for every cardinal k there exists a regular cardinal larger than k.) Thus, the desirable generalization of countability is regularity (which is available from Isabelle’s cardinal library [23]). Henceforth, we will assume:
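Spelled out, the closure property in question is a standard characterization of regularity for infinite cardinals (our paraphrase; the Isabelle library's exact formulation may differ):

```latex
\kappa \text{ is regular} \;\Longleftrightarrow\;
\text{for all families } (A_i)_{i \in I}:\quad
|I| < \kappa \,\wedge\, \bigl(\forall i \in I.\; |A_i| < \kappa\bigr)
\;\Longrightarrow\; \Bigl|\bigcup_{i \in I} A_i\Bigr| < \kappa
```

For \(\kappa = |\mathbf{nat}|\) this reads: a finite union of finite sets is finite, recovering the familiar countable case.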

Assumption 2

\(|\mathbf{var}|\) is a regular cardinal.

We will thus have not just one, but \(|\mathbf{var}|\)-many fresh variables:

Prop 3

\({\textsf {qGood}}\;X \Longrightarrow \forall {\textit{xs}}.\;|\{x.\;{\textsf {qFresh}}\;{\textit{xs}}\;x\;X\}| = |\mathbf{var}|\)

Now we can prove, for good items, the properties of alpha familiar from the \(\lambda \)-calculus, including it being an equivalence and an alternative formulation of the abstraction case, where “there exists a fresh y” is replaced with “for all fresh y.” While the “exists” variant is useful when proving that two terms are alpha-equivalent, the “forall” variant gives stronger inversion and induction rules for proving implications from \({\textsf {alpha}}\). (Such fruitful “exists-fresh/forall-fresh,” or “some-any,” dichotomies have been previously discussed in the context of bindings, e.g., in [8, 68, 76].)

Prop 4

The following hold:

(1) \({\textsf {alpha}}\) and \({\textsf {alphaAbs}}\) are equivalences on good quasiterms and quasiabstractions

(2) The predicates defined by replacing, in Fig. 1’s definition, the abstraction case with

coincide with \({\textsf {alpha}}\) and \({\textsf {alphaAbs}}\).

3.4 Terms and Their Properties

We define \(\mathbf{term}\) and \(\mathbf{abs}\) as collections of \({\textsf {alpha}}\)- and \({\textsf {alphaAbs}}\)-equivalence classes of \(\mathbf{qterm}\) and \(\mathbf{qabs}\). Since \({\textsf {qGood}}\) and \({\textsf {qGoodAbs}}\) are compatible with \({\textsf {alpha}}\) and \({\textsf {alphaAbs}}\), we lift them to corresponding predicates on terms and abstractions, \({\textsf {good}}\) and \({\textsf {goodAbs}}\).

We also prove that all constructors and operators are alpha-compatible, which allows lifting them to terms. Figure 2 shows the types of all these term constructors and operators.

Fig. 2 Constructors and operators on terms and abstractions

To establish an abstraction barrier that sets terms free from their quasiterm origin, we prove that the syntactic constructors mostly behave like free constructors, in that \({\textsf {Var}}\), \({\textsf {Op}}\) and \({\textsf {Abs}}\) are exhaustive and \({\textsf {Var}}\) and \({\textsf {Op}}\) are injective and nonoverlapping. True to the quarantine principle expressed in Sect. 2.1, the only departure from freeness occurs for \({\textsf {Abs}}\). Its equality behavior is regulated by the “exists fresh” and “forall fresh” properties inferred from the definition of \({\textsf {alphaAbs}}\) and Prop. 4(2), respectively:

Prop 5

Assume \({\textsf {good}}\;X\) and \({\textsf {good}}\;X'\). Then the following are equivalent:

(1) \({\textsf {Abs}}\;{\textit{xs}}\;x\;X = {\textsf {Abs}}\;{\textit{xs}}'\;x'\;X'\)

(2) \({\textit{xs}}= {\textit{xs}}' \,\wedge \,(\exists y \notin \{x,x'\}.\;{\textsf {fresh}}\;{\textit{xs}}\;y\;X \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;y\;X' \,\wedge \, X [ y \wedge x]_{xs} = X' [ y \wedge x']_{xs})\)

(3) \({\textit{xs}}= {\textit{xs}}' \,\wedge \,(\forall y \notin \{x,x'\}.\;{\textsf {fresh}}\;{\textit{xs}}\;y\;X \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;y\;X' \Longrightarrow X [ y \wedge x]_{xs} = X' [ y \wedge x']_{xs})\)

Useful rules for abstraction equality also hold with substitution:

Prop 6

Assume \({\textsf {good}}\;X\) and \({\textsf {good}}\;X'\). Then the following hold:

(1) \(y \notin \{x,x'\} \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;y\;X \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;y\;X' \,\wedge \, X\,[({\textsf {Var}}\;{\textit{xs}}\;y)\,/\,x]_{xs} = X'\,[({\textsf {Var}}\;{\textit{xs}}\;y)\,/\,x']_{xs} \Longrightarrow {\textsf {Abs}}\;{\textit{xs}}\;x\;X = {\textsf {Abs}}\;{\textit{xs}}\;x'\;X'\)

(2) \({\textsf {fresh}}\;{\textit{xs}}\;y\;X \;\Longrightarrow \; {\textsf {Abs}}\;{\textit{xs}}\;x\;X = {\textsf {Abs}}\;{\textit{xs}}\;y\;(X\,[({\textsf {Var}}\;{\textit{xs}}\;y)\,/\,x]_{xs})\)

To completely seal the abstraction barrier, for all the standard operators we prove simplification rules regarding their interaction with the constructors, which makes the former behave as if they had been defined in terms of the latter.

The following facts resemble an inductive definition of freshness (as a predicate):

Prop 7

Assume \({\textsf {good}}\;X\), \({\uparrow }\,{\textsf {good}}\;{\textit{inp}}\), \({\uparrow }\,{\textsf {good}}\;{\textit{binp}}\), \(|{\textsf {dom}}\;{\textit{inp}}| < |\mathbf{var}|\) and \(|{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}|\). The following hold:

(1) \(({\textit{ys}},y) \not = ({\textit{xs}},x) \,\Longrightarrow \, {\textsf {fresh}}\;{\textit{ys}}\;y\;({\textsf {Var}}\;{\textit{xs}}\;x)\)

(2) \({\uparrow }\,({\textsf {fresh}}\;{\textit{ys}}\;y)\;{\textit{inp}}\,\wedge \, {\uparrow }\,({\textsf {freshAbs}}\;{\textit{ys}}\;y)\;{\textit{binp}}\;\Longrightarrow \; {\textsf {fresh}}\;{\textit{ys}}\;y\;({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\)

(3) \(({\textit{ys}},y) = ({\textit{xs}},x)\,\vee \, {\textsf {fresh}}\;{\textit{ys}}\;y\;X \;\Longrightarrow \; {\textsf {freshAbs}}\;{\textit{ys}}\;y\;({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\)

Here and elsewhere, when dealing with \({\textsf {Op}}\), we make cardinality assumptions on the domains of the inputs to make sure the terms \({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}}\) are good.

We can further improve on Prop. 7, obtaining “iff” facts that resemble a primitively recursive definition of freshness (as a function):

Prop 8

Prop. 7 stays true if the implications are replaced by equivalences (\(\Longleftrightarrow \)).

For the swapping and substitution operators, we prove the following simplification rules, with a similar primitive recursion flavor.

Prop 9

Assume \({\textsf {good}}\;X\), \({\uparrow }\,{\textsf {good}}\;{\textit{inp}}\), \({\uparrow }\,{\textsf {good}}\;{\textit{binp}}\), \(|{\textsf {dom}}\;{\textit{inp}}| < |\mathbf{var}|\) and \(|{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}|\). The following hold:

(1) \(({\textsf {Var}}\;{\textit{xs}}\;x)\,[y\wedge z]_{ys} \;=\; {\textsf {Var}}\;{\textit{xs}}\,(x\,[y\wedge z]_{xs,ys})\)

(2) \(({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\,[y\wedge z]_{ys} = {\textsf {Op}}\;\delta \;({\uparrow }\,(\_[y\wedge z]_{ys})\,{\textit{inp}})\;({\uparrow }\,(\_[y\wedge z]_{ys})\,{\textit{binp}})\)

(3) \(({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\,[y\wedge z]_{ys} \,=\, {\textsf {Abs}}\;{\textit{xs}}\;(x\,[y\wedge z]_{xs,ys})\;(X\,[y\wedge z]_{ys}) \)

where \(x\,[y\wedge z]_{xs,ys}\) is the (sorted) swapping on variables, defined as

$$\begin{aligned} \text{ if } ({\textit{xs}},x) = ({\textit{ys}},y)\hbox { then }z\hbox { else if }({\textit{xs}},x) = ({\textit{ys}},z) \hbox { then }y\hbox { else }x \end{aligned}$$
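The sorted swapping on variables is simple enough to transcribe directly; the following Python rendering (illustrative names only) emphasizes that swapping exchanges the two variables—x goes to z when it equals y at the right sort, to y when it equals z, and is untouched otherwise:

```python
# Sorted swapping on variables: x[y∧z]_{xs,ys}, with xs the sort of x
# and ys the sort at which y and z are swapped.

def vswap(xs, x, ys, y, z):
    if (xs, x) == (ys, y):
        return z
    if (xs, x) == (ys, z):
        return y
    return x

assert vswap("vs", "y", "vs", "y", "z") == "z"
assert vswap("vs", "z", "vs", "y", "z") == "y"
assert vswap("ws", "y", "vs", "y", "z") == "y"   # different sort: unaffected
```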

Prop 10

Assume \({\textsf {good}}\;X\), \({\textsf {good}}\;Y\), \({\uparrow }\,{\textsf {good}}\;{\textit{inp}}\), \({\uparrow }\,{\textsf {good}}\;{\textit{binp}}\), \(|{\textsf {dom}}\;{\textit{inp}}| < |\mathbf{var}|\) and \(|{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}|\). The following hold:

(1) \(({\textsf {Var}}\;{\textit{xs}}\;x)\,[Y/y]_{ys} \;=\; (\text{ if } ({\textit{xs}},x) = ({\textit{ys}},y)\hbox { then }Y\hbox { else }{\textsf {Var}}\;{\textit{xs}}\;x)\)

(2) \(({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\,[Y/y]_{ys} = {\textsf {Op}}\;\delta \;({\uparrow }\,(\_[Y/y]_{ys})\,{\textit{inp}})\;({\uparrow }\,(\_[Y/y]_{ys})\,{\textit{binp}})\)

(3) \(({\textit{xs}},x) \not = ({\textit{ys}},y) \;\wedge \; {\textsf {fresh}}\;{\textit{xs}}\;x\;Y \;\Longrightarrow \; ({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\,[Y/y]_{ys} \,=\, {\textsf {Abs}}\;{\textit{xs}}\;x\;(X\,[Y/y]_{ys}) \)

Since unary substitution is a particular case of parallel substitution, the previous lemma is a corollary of the following:

Prop 11

Assume \({\textsf {good}}\;X\), \({\uparrow }\,{\textsf {good}}\;{\textit{inp}}\), \({\uparrow }\,{\textsf {good}}\;{\textit{binp}}\), \({\uparrow }\,{\textsf {good}}\;\rho \), \(|{\textsf {dom}}\;{\textit{inp}}| < |\mathbf{var}|\), \(|{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}|\) and \(|{\textsf {dom}}\;\rho | < |\mathbf{var}|\). The following hold:

(1) \(({\textsf {Var}}\;{\textit{xs}}\;x)\,[\rho ] \;=\; (\text{ if } \rho \;{\textit{xs}}\;x = {\textsf {Some}}\ Y \text{ then } Y \text{ else } {\textsf {Var}}\;{\textit{xs}}\;x)\)

(2) \(({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\,[\rho ] = {\textsf {Op}}\;\delta \;({\uparrow }\,(\_[\rho ])\,{\textit{inp}})\;({\uparrow }\,(\_[\rho ])\,{\textit{binp}})\)

(3) \( {\uparrow }({\textsf {fresh}}\;{\textit{xs}}\;x)\;\rho \;\Longrightarrow \; ({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\,[\rho ] \,=\, {\textsf {Abs}}\;{\textit{xs}}\;x\;(X\,[\rho ]) \)

Above, the notation \({\uparrow }({\textsf {fresh}}\;{\textit{xs}}\;x)\;\rho \) follows the spirit of Convention 1, in that it lifts the freshness predicate from terms to environments for parallel substitution, i.e., partial variable-to-term assignments \(\rho : \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{term}\;\mathbf{option}\). However, it must be noted that this is a non-standard lifting process, referring not only to the freshness on \(\rho \)’s image terms, but also to distinctness on \(\rho \)’s domain variables. Namely, \({\uparrow }({\textsf {fresh}}\;{\textit{xs}}\;x)\;\rho \) is defined as

$$\begin{aligned} \forall \;{\textit{ys}},y,Y.\; \rho \;{\textit{ys}}\;y = {\textsf {Some}}\;Y \Longrightarrow ({\textit{xs}},x) \not = ({\textit{ys}},y) \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;x\;Y \end{aligned}$$

Thus, \(({\textit{xs}},x)\) must be fresh for the graph of (the uncurried version of) the partial function \(\rho \).
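For intuition, this non-standard lifting can be sketched in Python on a toy single-sorted \(\lambda \)-calculus, with environments as dictionaries from variable names to terms (the names `fresh` and `lifted_fresh` are ours, for illustration only):

```python
# Toy untyped lambda-terms over a single varsort:
# ("var", x) | ("app", t1, t2) | ("lam", x, body)

def fresh(x, t):
    """x does not occur free in t."""
    tag = t[0]
    if tag == "var":
        return t[1] != x
    if tag == "app":
        return fresh(x, t[1]) and fresh(x, t[2])
    return t[1] == x or fresh(x, t[2])  # "lam": binding x makes x fresh

def lifted_fresh(x, rho):
    """The lifted side condition of Prop 11(3): x differs from every
    domain variable of rho and is fresh for every image term."""
    return all(x != y and fresh(x, Y) for y, Y in rho.items())

rho = {"y": ("app", ("var", "u"), ("var", "v"))}
assert lifted_fresh("x", rho)      # x: neither in dom(rho) nor free in images
assert not lifted_fresh("y", rho)  # y: a domain variable of rho
assert not lifted_fresh("u", rho)  # u: free in an image term
```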

Note that, when it comes to the interaction of freshness and substitution with \({\textsf {Abs}}\), the simplification rules require freshness of the bound variable. Thus, \({\textsf {freshAbs}}\;{\textit{ys}}\;y\;({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\) is reducible to \({\textsf {fresh}}\;{\textit{ys}}\;y\;X\) only if \(({\textit{xs}},x)\) is distinct from \(({\textit{ys}},y)\). Moreover, \(({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\,[Y/y]_{ys}\) is expressible in terms of \(X\,[Y/y]_{ys}\) only if \(({\textit{xs}},x)\) is distinct from \(({\textit{ys}},y)\) and fresh for Y. And similarly for parallel substitution. By contrast, swapping does not suffer from this restriction, which makes it significantly more manageable in proofs.

In addition to the simplification rules, we prove a comprehensive collection of lemmas describing the interaction between any pair of operators, including the interaction of each operator with itself (the latter being typically a form of compositionality property). Below we only list these properties for terms, omitting the corresponding ones for abstractions.

Prop 12

(Properties of swapping) Assume \({\textsf {good}}\;X\). The following hold:

(1) Swapping the same variable is identity:

$$\begin{aligned} X\;[x \wedge x]_{xs} = X \end{aligned}$$

(2) Swapping is compositional:

$$\begin{aligned} (X\; [x_1 \wedge x_2]_{xs})\;[y_1\wedge y_2]_{ys} {=} (X\;[y_1\wedge y_2]_{ys})\;[(x_1\;[y_1\wedge y_2]_{xs,ys}) \wedge (x_2\;[y_1\wedge y_2]_{xs,ys})]_{xs} \end{aligned}$$

(3) Swapping commutes if the variables are disjoint or the varsorts are different:

$$\begin{aligned}&{\textit{xs}}\ne {\textit{ys}}\ \vee \ \{x_1,x_2\} \cap \{y_1,y_2\} = \emptyset \\&\qquad \qquad \qquad \implies \ (X\;[x_1 \wedge x_2]_{xs})\;[y_1 \wedge y_2]_{ys} = (X\;[y_1 \wedge y_2]_{ys})\;[x_1 \wedge x_2]_{xs} \end{aligned}$$

(4) Swapping is involutive:

$$\begin{aligned} (X\;[x \wedge y]_{xs})\;[x \wedge y]_{xs} = X \end{aligned}$$

(5) Swapping is symmetric:

$$\begin{aligned} X\;[x \wedge y]_{xs}=X\;[y \wedge x]_{xs} \end{aligned}$$

Prop 13

(Swapping versus freshness) Assume \({\textsf {good}}\;X\). The following hold:

(1) Swapping preserves freshness:
</gr-replace>

$$\begin{aligned} {\textit{xs}}\ne {\textit{ys}}\ \vee \ x\notin \{y_1,y_2\}\ \implies \ {\textsf {fresh}}\ {\textit{xs}}\ (x\;[y_1 \wedge y_2]_{xs,ys})\ (X\;[y_1 \wedge y_2]_{ys}) = {\textsf {fresh}}\ {\textit{xs}}\ x\ X \end{aligned}$$

(2) Swapping fresh variables is identity:

$$\begin{aligned} {\textsf {fresh}}\ {\textit{xs}}\ x_1\ X\ \wedge \ {\textsf {fresh}}\ {\textit{xs}}\ x_2\ X \ \implies \ X\;[x_1 \wedge x_2]_{xs} = X \end{aligned}$$

(3) Swapping fresh variables composes:

$$\begin{aligned} {\textsf {fresh}}\ {\textit{xs}}\ y\ X\ \wedge \ {\textsf {fresh}}\ {\textit{xs}}\ z\ X \ \implies \ (X\;[y \wedge x]_{xs})\;[z \wedge y]_{xs} = X\;[z \wedge x]_{xs} \end{aligned}$$
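Several of these properties, e.g., Prop 12(4)–(5) and Prop 13(2), are easy to check on a toy single-sorted \(\lambda \)-calculus sketch in Python (our own representation and names, for illustration only):

```python
# Toy lambda-terms: ("var", x) | ("app", t1, t2) | ("lam", x, body)

def swap(t, x1, x2):
    """Swap the names x1 and x2 everywhere, including binding positions."""
    sw = lambda z: x2 if z == x1 else (x1 if z == x2 else z)
    tag = t[0]
    if tag == "var":
        return ("var", sw(t[1]))
    if tag == "app":
        return ("app", swap(t[1], x1, x2), swap(t[2], x1, x2))
    return ("lam", sw(t[1]), swap(t[2], x1, x2))

def fresh(x, t):
    """x does not occur free in t."""
    tag = t[0]
    if tag == "var":
        return t[1] != x
    if tag == "app":
        return fresh(x, t[1]) and fresh(x, t[2])
    return t[1] == x or fresh(x, t[2])

X = ("lam", "x", ("app", ("var", "x"), ("var", "y")))
assert swap(swap(X, "x", "y"), "x", "y") == X   # involutive, Prop 12(4)
assert swap(X, "x", "y") == swap(X, "y", "x")   # symmetric, Prop 12(5)
assert fresh("a", X) and fresh("b", X)
assert swap(X, "a", "b") == X                   # fresh swap is identity, Prop 13(2)
```

Unlike substitution, `swap` needs no side conditions: it acts uniformly on free and bound occurrences, which is exactly why swapping is so manageable in proofs.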

The following lemmas describe the basic properties of substitution. The results for unary substitution follow routinely from those for parallel substitution. However, to support concrete formalizations it is useful to have both versions: the majority of formalizations will only need unary substitution, and they should not be burdened with the much heavier properties of parallel substitution.

Prop 14

(Swapping versus substitution) Assume \({\textsf {good}}\;X\), \({\textsf {good}}\;Y\) and \({\uparrow }\,{\textsf {good}}\;\rho \). The following hold:

(1) \( (X\;[\rho ])\;[z_1 \wedge z_2]_{zs} = (X\;[z_1\wedge z_2]_{zs})\;[\,{\uparrow }(\_\,[z_1\wedge z_2]_{zs})\;\rho \,] \)

(2) \(Y\,[X/x]_{xs}\,[z_1 \wedge z_2]_{zs} \;=\; (Y\,[z_1 \wedge z_2]_{zs})\,[(X[z_1 \wedge z_2]_{zs})\,/\,(x[z_1 \wedge z_2]_{xs,zs}) ]_{xs}\)

Note that, at point (1) above, \({\uparrow }(\_\,[z_1\wedge z_2]_{zs})\;\rho \) is the lifting of the \((z_1,z_2,{\textit{zs}})\)-swapping operator (which swaps \(z_1\) with \(z_2\) on varsort \({\textit{zs}}\)) to parallel-substitution environments \(\rho : \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{term}\;\mathbf{option}\). Point (2) follows easily from point (1).

Prop 15

(Parallel substitution versus freshness) Assume \({\textsf {good}}\;X\) and \({\uparrow }\,{\textsf {good}}\;\rho \). The following holds:

$$\begin{aligned} \begin{array}{l} {\textsf {fresh}}\ {\textit{xs}}\ x\ (X\;[\rho ])\ \Longleftrightarrow \ (\ \forall \ {\textit{ys}}\ y.\ {\textsf {fresh}}\;{\textit{ys}}\;y\;X\ \vee \ \\ ((\rho \;{\textit{ys}}\;y = {\textsf {None}}\wedge ({\textit{ys}},y) \not = ({\textit{xs}},x) )\ \vee \ (\exists \ Y.\ \rho \;{\textit{ys}}\;y = {\textsf {Some}}\;Y \,\wedge \, {\textsf {fresh}}\;{\textit{xs}}\;x\; Y))\ ) \end{array} \end{aligned}$$

In the unary case we obtain three separate properties:

Prop 16

(Substitution versus freshness) Assume \({\textsf {good}}\;X\) and \({\textsf {good}}\;Y\). The following hold:

(1) Freshness for a unary substitution decomposes into freshness for its participants:

$$\begin{aligned} {\textsf {fresh}}\;{\textit{zs}}\;z\;(X[Y / y]_{ys}) \Longleftrightarrow (({\textit{zs}},z)=({\textit{ys}},y) \vee {\textsf {fresh}}\;{\textit{zs}}\;z\;X) \,\wedge \, ({\textsf {fresh}}\;{\textit{ys}}\;y\;X \vee {\textsf {fresh}}\;{\textit{zs}}\;z\;Y) \end{aligned}$$

(2) Substitution preserves freshness:

$$\begin{aligned} {\textsf {fresh}}\;{\textit{zs}}\;z\;X \wedge {\textsf {fresh}}\;{\textit{zs}}\;z\;Y\ \implies \ {\textsf {fresh}}\;{\textit{zs}}\;z\;(X\;[Y / y]_{ys}) \end{aligned}$$

(3) The substituted variable is fresh for the substitution:

$$\begin{aligned} {\textsf {fresh}}\;{\textit{ys}}\;y\;Y\ \implies \ {\textsf {fresh}}\;{\textit{ys}}\;y\;(X\;[Y / y]_{ys}) \end{aligned}$$
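Prop 16 can be checked on a toy single-sorted \(\lambda \)-calculus sketch in Python, using a naive (capture-unaware) substitution that agrees with the real one whenever the binders avoid the substituted term, mirroring the freshness side-conditions of Prop 10 (our representation and names):

```python
# Toy lambda-terms: ("var", x) | ("app", t1, t2) | ("lam", x, body)

def fresh(x, t):
    """x does not occur free in t."""
    tag = t[0]
    if tag == "var":
        return t[1] != x
    if tag == "app":
        return fresh(x, t[1]) and fresh(x, t[2])
    return t[1] == x or fresh(x, t[2])

def subst(t, Y, y):
    """t[Y/y]; assumes binders in t are distinct from y and fresh for Y."""
    tag = t[0]
    if tag == "var":
        return Y if t[1] == y else t
    if tag == "app":
        return ("app", subst(t[1], Y, y), subst(t[2], Y, y))
    return t if t[1] == y else ("lam", t[1], subst(t[2], Y, y))

X = ("app", ("var", "x"), ("lam", "u", ("var", "y")))
Y = ("app", ("var", "z"), ("var", "z"))
XY = subst(X, Y, "y")
# Prop 16(1): freshness for X[Y/y] decomposes into freshness for X and Y
for v in ["x", "y", "z", "u", "w"]:
    lhs = fresh(v, XY)
    rhs = (v == "y" or fresh(v, X)) and (fresh("y", X) or fresh(v, Y))
    assert lhs == rhs
# Prop 16(3): the substituted variable is fresh for the substitution
assert fresh("y", subst(X, ("var", "w"), "y"))
```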

Prop 17

(Properties of substitution) Assume \({\textsf {good}}\;X\), \({\textsf {good}}\;Y\), \({\uparrow }\,{\textsf {good}}\;\rho \) and \({\uparrow }\,{\textsf {good}}\;\rho '\). The following hold:

(1) Parallel substitution in environment \(\rho \) only depends on \(\rho \)’s action on the free (non-fresh) variables:

$$\begin{aligned} (\forall \ {\textit{ys}}\ y.\ \lnot \;{\textsf {fresh}}\;{\textit{ys}}\;y\;X \implies \rho \;{\textit{ys}}\;y = \rho '\;{\textit{ys}}\;y)\ \implies \ X\;[\rho ] = X\;[\rho '] \end{aligned}$$

(2) Parallel substitution is the identity if the free variables of the environment \(\rho \) are disjoint from those of the target term X:

$$\begin{aligned} (\forall \ {\textit{zs}}\ z.\ {\uparrow }({\textsf {fresh}}\;{\textit{zs}}\;z)\;\rho \ \vee \ {\textsf {fresh}}\;{\textit{zs}}\;z\;X)\ \implies \ X\;[\rho ] = X \end{aligned}$$

(3) Unary substitution is the identity if the substituted variable is fresh for the target term (corollary of point (2)):

$$\begin{aligned} {\textsf {fresh}}\;{\textit{ys}}\;y\;X\ \implies \ (X\;[Y / y]_{ys}) = X \end{aligned}$$

As for compositionality of substitution, we give several versions of the lemma, all of which are consequences of the first, most general one.

Prop 18

(Substitution compositionality) Assume \({\textsf {good}}\;X\), \({\textsf {good}}\;Y\), \({\uparrow }\,{\textsf {good}}\;\rho \) and \({\uparrow }\,{\textsf {good}}\;\rho '\). The following hold:

(1) Parallel substitution is compositional

$$\begin{aligned} X\,[\rho ]\,[\rho '] \;=\; X\,[\rho \bullet \rho '] \end{aligned}$$

where \(\rho \bullet \rho '\) is the monadic composition of \(\rho \) and \(\rho '\), defined as

$$\begin{aligned} (\rho \bullet \rho ')\,{\textit{xs}}\;x \,=\; \text{ case } \rho \;{\textit{xs}}\;x\hbox { of }{\textsf {None}}\Rightarrow \rho '\;{\textit{xs}}\;x \mid {\textsf {Some}}\;X \Rightarrow X[\rho '] \end{aligned}$$

(2) Parallel substitution distributes over unary substitution

$$\begin{aligned} (X\,[Y / y]_{ys})\,[\rho ] \,=\, X\,[\rho \,[y \leftarrow Y[\rho ]]_{ys}] \end{aligned}$$

where \(\rho [y \leftarrow Y[\rho ]]_{ys}\) is the assignment \(\rho \) updated with value \({\textsf {Some}}\,(Y[\rho ])\) for \(({\textit{ys}},y)\)

(3) Unary substitution composes with parallel substitution (via monadic composition)

$$\begin{aligned} (X\,[\rho ])\,[Y / y]_{ys} \,=\, X\,[\rho \bullet [Y / y]_{ys}] \end{aligned}$$

where we overload the notation \([Y / y]_{ys}\) to also denote the environment that maps \(({\textit{ys}},y)\) to \({\textsf {Some}}\;Y\) and everything else to \({\textsf {None}}\)

(4) Substitution of the same variable (and of the same varsort) distributes over itself

$$\begin{aligned} X\ [Y_1 / y]_{ys}\, [Y_2 / y]_{ys} \;=\; X\,[(Y_1\,[Y_2 / y]_{ys}) / y]_{ys} \end{aligned}$$

(5) Substitution of different variables distributes over itself, assuming freshness

$$\begin{aligned} ({\textit{ys}}\ne {\textit{zs}}\vee y \ne z) \wedge {\textsf {fresh}}\;{\textit{ys}}\;y\;Z \;\Longrightarrow \; X\,[Y / y]_{ys}\, [Z / z]_{zs} \,=\, (X\,[Z / z]_{zs})\ [(Y\,[Z / z]_{zs}) / y]_{ys} \end{aligned}$$
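Compositionality via monadic composition, Prop 18(1), can be sketched in Python on a toy single-sorted \(\lambda \)-calculus, with environments as dictionaries (an absent key plays the role of \({\textsf {None}}\); our representation and names, used here only on capture-free instances):

```python
# Toy lambda-terms: ("var", x) | ("app", t1, t2) | ("lam", x, body)

def psubst(t, rho):
    """t[rho]; assumes binders in t avoid rho's domain and image variables."""
    tag = t[0]
    if tag == "var":
        return rho.get(t[1], t)
    if tag == "app":
        return ("app", psubst(t[1], rho), psubst(t[2], rho))
    body_rho = {k: v for k, v in rho.items() if k != t[1]}
    return ("lam", t[1], psubst(t[2], body_rho))

def comp(rho, rho2):
    """Monadic composition rho . rho2 of Prop 18(1)."""
    out = dict(rho2)              # where rho is None, fall through to rho2
    for x, X in rho.items():
        out[x] = psubst(X, rho2)  # where rho is Some X, substitute in X
    return out

X = ("lam", "u", ("app", ("var", "x"), ("var", "y")))
rho = {"x": ("var", "y")}
rho2 = {"y": ("var", "z")}
# X[rho][rho'] = X[rho . rho']
assert psubst(psubst(X, rho), rho2) == psubst(X, comp(rho, rho2))
```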

In summary, we have formalized quite exhaustively the general-purpose properties of the syntactic constructors and the standard operators. Some of these properties are subtle: during the formalization of concrete results for particular syntaxes, even formulating them correctly, let alone proving them, is likely to require substantial time. That effort would be wasted, since the properties are independent of the particular syntax.

4 Operator-Sensitive Recursion

In this section we present several definition principles for functions having terms and abstractions as their domains. The principles we formalize are generalizations to an arbitrary syntax of results that have been previously described for the particular syntax of \(\lambda \)-calculus [75, 91]. The main characteristic of the principles will be that the functions they introduce have defining clauses not only for the constructors (as customary in recursive definitions on free datatypes), but also for the freshness, substitution and/or swapping operators.

We start with the simpler-structured iteration principles (4.1) followed by their extension to primitive recursion (4.2). We also show two examples of using our principles. The first defines the skeleton of a term (a generalization of the notion of depth) using freshness-swapping-based iteration (4.3). The second employs freshness-substitution-based iteration to produce a whole class of instances: the interpretation of syntax in semantic domains (4.4). Several iteration/recursion examples for particular syntaxes can be found in [91, §4], [22, §6, §7], [45, §3.2, §3.3, §4] and [89, §3.3, §3.4].

4.1 Iteration

A freshness-substitution (FSb) model consists of two collections of elements endowed with term- and abstraction-like operators satisfying some characteristic properties of terms. More precisely, it consists of:

  • two types, \(\mathbf{T}\) and \(\mathbf{A}\)

  • operations corresponding to the constructors:

    • \({\textsf {VAR}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{T}\)

    • \({\textsf {OP}}: \mathbf{opsym}\rightarrow (\mathbf{index},\mathbf{T})\;\mathbf{input}\rightarrow (\mathbf{bindex},\mathbf{A})\;\mathbf{input}\rightarrow \mathbf{T}\)

    • \({\textsf {ABS}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{T}\rightarrow \mathbf{A}\)

  • operations corresponding to freshness and substitution:

    • \({\textsf {FRESH}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{T}\rightarrow \mathbf{bool}\)

    • \({\textsf {FRESHABS}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{A}\rightarrow \mathbf{bool}\)

    • \( {\textsf {SUBST}}: \mathbf{T}\rightarrow \mathbf{T}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{T}\)

    • \( {\textsf {SUBSTABS}}: \mathbf{A}\rightarrow \mathbf{T}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{A}\)

and it is required to satisfy:

  • the freshness clauses F1–F4 shown in Fig. 3 (analogous to the implicational simplification rules for freshness in Prop. 7)

  • substitution clauses Sb1–Sb4 shown in Fig. 4 (analogous to the simplification rules for substitution in Prop. 10)

  • the substitution-renaming clause SbRn also shown in Fig. 4 (analogous to the substitution-based abstraction equality rule in Prop. 6(2))

Fig. 3. The freshness clauses

Fig. 4. The substitution and substitution-renaming clauses

Theorem 19

The good terms and abstractions form the initial FSb model. Namely, for any FSb model as above, there exist the functions \({\textit{f}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{fAbs}}: \mathbf{abs}\rightarrow \mathbf{A}\) that commute, on good terms, with the constructors and with substitution and preserve freshness:

$$\begin{aligned} \begin{array}{l} {\textit{f}}\,({\textsf {Var}}\;{\textit{xs}}\;x) = {\textsf {V}}{\textsf {A}}{\textsf {R}}\;{\textit{xs}}\;x \\ {\textit{f}}\,({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}}) = {\textsf {OP}}\;\delta \;({\uparrow }\,{\textit{f}}\;{\textit{inp}})\;({\uparrow }\,{\textit{fAbs}}\;{\textit{binp}}) \\ {\textit{fAbs}}\,({\textsf {Abs}}\;{\textit{xs}}\;x\;X) = {\textsf {ABS}}\;{\textit{xs}}\;x\;(f\;X) \\ {\textit{f}}\,(X\,[Y/y]_{ys}) = {\textsf {SUBST}}\ ({\textit{f}}\;X)\ ({\textit{f}}\;Y)\ y\ {\textit{ys}}\\ {\textit{fAbs}}\,(A\,[Y/y]_{ys}) = {\textsf {SUBSTABS}}\ ({\textit{fAbs}}\;A)\ ({\textit{f}}\;Y)\ y\ {\textit{ys}}\\ {\textsf {fresh}}\;{\textit{xs}}\;x\;X \,\Longrightarrow \, {\textsf {FRESH}}\;{\textit{xs}}\;x\;({\textit{f}}\;X) \\ {\textsf {freshAbs}}\;{\textit{xs}}\;x\;A \,\Longrightarrow \, {\textsf {FRESHABS}}\;{\textit{xs}}\;x\;({\textit{fAbs}}\;A) \end{array} \end{aligned}$$

In addition, the two functions are uniquely determined on good terms and abstractions, in that, for all other functions \({\textit{g}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{gAbs}}:\mathbf{abs}\rightarrow \mathbf{A}\) satisfying the same commutation and preservation properties, it holds that \({\textit{f}}\) and \({\textit{g}}\) are equal on good terms and \({\textit{fAbs}}\) and \({\textit{gAbs}}\) are equal on good abstractions.

Like any initiality property, this theorem represents an iteration principle. To comprehend the connection between initiality and iteration, let us first look at the simpler case of lists over a type \(\mathbf{G}\), with constructors \({\textsf {Nil}}: \mathbf{G}\;\mathbf{list}\) and \({\textsf {Cons}}: \mathbf{G}\rightarrow \mathbf{G}\;\mathbf{list}\rightarrow \mathbf{G}\;\mathbf{list}\). To define, by iteration, a function from lists, say, \({\textsf {length}}: \mathbf{G}\;\mathbf{list}\rightarrow \mathbf{nat}\), we need to indicate what \({\textsf {Nil}}\) is mapped to, here \({\textsf {length}}\;{\textsf {Nil}}= 0\), and, recursively, what \({\textsf {Cons}}\) is mapped to, here \({\textsf {length}}\;({\textsf {Cons}}\;a\;{\textit{as}}) = 1 + {\textsf {length}}\;{\textit{as}}\). We can rephrase this by saying: If we define “list-like” operators on the target domain—here, taking \({\textsf {NIL}}: \mathbf{nat}\) to be 0 and \({\textsf {CONS}}: \mathbf{G}\rightarrow \mathbf{nat}\rightarrow \mathbf{nat}\) to be \(\lambda g,n.\;1+n\)—then iteration offers us a function \({\textsf {length}}\) that commutes with the constructors: \({\textsf {length}}\;{\textsf {Nil}}= {\textsf {NIL}}= 0\) and \({\textsf {length}}\,({\textsf {Cons}}\;a\;{\textit{as}}) = {\textsf {CONS}}\;a\;({\textsf {length}}\;{\textit{as}}) = 1 + {\textsf {length}}\;{\textit{as}}\). For terms, we have a similar situation, except that (1) substitution and freshness are considered in addition to the constructors and (2) as the price for the lack of freeness, some conditions need to be verified to deem the operations “term-like.”
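The list analogy can be made concrete: a generic fold turns any choice of "list-like" operators on a target domain into the unique function commuting with the constructors (a Python sketch; the names are ours):

```python
def fold(NIL, CONS, xs):
    """The unique f with f [] = NIL and f (x:xs) = CONS x (f xs)."""
    out = NIL
    for x in reversed(xs):
        out = CONS(x, out)
    return out

# Interpreting NIL as 0 and CONS as (lambda g, n: 1 + n) yields length:
length = lambda xs: fold(0, lambda g, n: 1 + n, xs)
assert length([]) == 0
assert length(["a", "b", "c"]) == 3

# A different "list-like" model on the same target type yields a sum:
assert fold(0, lambda g, n: g + n, [1, 2, 3, 4]) == 10
```

For terms, Theorem 19 plays the role of `fold`, with the FSb clauses as the conditions certifying that the chosen operators are indeed "term-like."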

The main feature of our iteration theorem is the ability to define functions in a manner that is compatible with alpha-equivalence. A byproduct of the theorem is that the defined functions also interact well with freshness and substitution, in that they map these concepts to corresponding concepts on the target domains.

Michael Norrish has developed a similar principle that employs swapping instead of substitution [75]. We have also formalized this in our framework—in a slightly restricted form, namely without factoring in fixed variables and parameters.

A freshness-swapping (FSw) model is a structure similar to a freshness-substitution model, just that instead of the substitution-like operators, \({\textsf {SUBST}}\) and \({\textsf {SUBSTABS}}\), it features swapping-like operators:

  • \({\textsf {SWAP}}: \mathbf{T}\rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{T}\)

  • \({\textsf {SWAPABS}}: \mathbf{A}\rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{A}\)

assumed to satisfy the clauses Sw1–Sw3 corresponding to those for simplifying term swapping and, instead of the substitution-renaming property, a swapping-based congruence rule for abstractions SwCong—all shown in Fig. 5.

Fig. 5. The swapping and swapping-based congruence clauses

Then a swapping-aware version of the iteration theorem holds:

Theorem 20

The good terms and abstractions form the initial FSw model. Namely, there exists a pair of functions \({\textit{f}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{fAbs}}: \mathbf{abs}\rightarrow \mathbf{A}\) that commute, on good terms, with the constructors and preserve freshness—similarly to how it is described in Theorem 19, the only difference being that they are not guaranteed to commute with substitution, but with swapping, namely:

$$\begin{aligned} \begin{array}{l} {\textit{f}}\,(X\,[z_1\wedge z_2]_{zs}) = {\textsf {SWAP}}\ ({\textit{f}}\;X)\;z_1\;z_2\;{\textit{zs}}\\ {\textit{fAbs}}\,(A\,[z_1\wedge z_2]_{zs}) = {\textsf {SWAPABS}}\ ({\textit{fAbs}}\;A)\;z_1\;z_2\;{\textit{zs}}\end{array} \end{aligned}$$

In addition, the two functions are uniquely determined on good terms and abstractions (just like in Theorem 19).

Finally, we combine both notions, obtaining freshness-substitution-swapping (FSbSw) models. These are required to have both substitution-like and swapping-like operators and to satisfy the union of the FSb and FSw clauses, except for the swapping congruence clause SwCong—namely, clauses F1–F4, Sb1–Sb4, Sw1–Sw3 and SbRn. (Interestingly, SwCong was not needed for proving the iteration theorem; the proof needs either SbRn or SwCong, i.e., only one of the two.)

Theorem 21

The good terms and abstractions form the initial FSbSw model. Namely, there exists a pair of functions \({\textit{f}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{fAbs}}: \mathbf{abs}\rightarrow \mathbf{A}\) that commute, on good terms, with the constructors, substitution, swapping and preserve freshness, as described in Theorems 19 and 20. In addition, the two functions are uniquely determined on good terms and abstractions.

In summary, we have three variants of models (and iteration principles), corresponding to three combinations of the fundamental term operations:

  • freshness-substitution (FSb) models

  • freshness-swapping (FSw) models

  • freshness-substitution-swapping (FSbSw) models

Having formalized all these variants, the user can decide on the desired “contract:” With more operators factored in, there are more proof obligations that need to be discharged for the definition to succeed, but then the defined functions satisfy more desirable properties.

4.2 Primitive Recursion

Iteration is a simplified form of primitive recursion. The difference between the two is illustrated by the following simple example: The predecessor function \({\textsf {pred}}: \mathbf{nat}\rightarrow \mathbf{nat}\) is defined by \({\textsf {pred}}\;0 = 0\) and \({\textsf {pred}}\,({\textsf {Suc}}\;n) = n\). This does not fit an iteration scheme, where only the value of the function on smaller arguments, and not the arguments themselves, can be used. In the example, iteration would allow \({\textsf {pred}}\;({\textsf {Suc}}\;n)\) to invoke recursively \({\textsf {pred}}\;n\), but not n. Of course, we can simulate recursion by iteration if we are allowed an auxiliary output: defining \({\textsf {pred}}': \mathbf{nat}\rightarrow \mathbf{nat}\times \mathbf{nat}\) by iteration, \({\textsf {pred}}'\;0 = (0,0)\) and \({\textsf {pred}}'\,({\textsf {Suc}}\;n) = \) case \({\textsf {pred}}'\;n\) of \((n_1,n_2) \Rightarrow ({\textsf {Suc}}\;n_1,n_1)\), and then taking \({\textsf {pred}}\;n\) to be the second component of \({\textsf {pred}}'\;n\).
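The encoding can be sketched in Python, with iteration on naturals packaged as a generic `iterate` combinator (our names):

```python
def iterate(Z, S, n):
    """The unique f with f 0 = Z and f (Suc n) = S (f n): pure iteration,
    where S sees only the recursive result, never the argument n itself."""
    out = Z
    for _ in range(n):
        out = S(out)
    return out

# Primitive recursion simulated via an auxiliary output: pred' n = (n, pred n)
def pred_aux(n):
    return iterate((0, 0), lambda p: (p[0] + 1, p[0]), n)

def pred(n):
    return pred_aux(n)[1]  # second component of pred'

assert pred(0) == 0
assert pred(5) == 4
```

The auxiliary first component carries a copy of the argument through the iteration, which is exactly the recursion-to-iteration encoding described above.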

In our framework, primitive recursion can also be reduced to iteration—see [89, §1.4.2] for a description of this phenomenon for the general case of initial models in Horn theories. Initially, we had only formalized the iteration theorems. However, we soon realized that several applications (for the particular syntaxes of \(\lambda \)-calculus and many-sorted first-order logic) required the full power of primitive recursion, and it was very tedious to perform the recursion-to-iteration encoding over and over again, with each new definition. We therefore decided to formalize this reduction for an arbitrary syntax, obtaining primitive recursion theorems in all three variants, that is, factoring in substitution, swapping or both. We only show here the primitive recursion variant of Theorem 19, where we highlight the additions compared to iteration. (The other two primitive recursion theorems are obtained similarly from Theorems 20 and 21.)

A FSb recursion model has the same components as an FSb model, except that:

  • \({\textsf {OP}}\) takes term and abstraction inputs in addition to inputs from the model, i.e., has type \(\mathbf{opsym}\rightarrow (\mathbf{index},\mathbf{term})\;\mathbf{input}\rightarrow (\mathbf{bindex},\mathbf{abs})\;\mathbf{input}\rightarrow (\mathbf{index},\mathbf{T})\;\mathbf{input}\rightarrow (\mathbf{bindex},\mathbf{A})\;\mathbf{input}\rightarrow \mathbf{T}\)

  • \({\textsf {ABS}}\) takes an additional term argument, i.e., has type \(\mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{term}\rightarrow \mathbf{T}\rightarrow \mathbf{A}\)

  • The freshness and substitution operators take additional term and/or abstraction arguments; e.g., the types for the term versions of these are \({\textsf {FRESH}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{term}\rightarrow \mathbf{T}\rightarrow \mathbf{bool}\) and \({\textsf {SUBST}}: \mathbf{term}\rightarrow \mathbf{T}\rightarrow \mathbf{term}\rightarrow \mathbf{T}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{T}\)

  • The clauses F1–F4, Sb1–Sb4 and SbRn are updated to factor in the additional structure, e.g., Sb4 becomes:

    $$\begin{aligned} \begin{array}{l} ({\textit{xs}},x) \not = (zs,z) \;\wedge \; {\textsf {fresh}}\;{\textit{xs}}\;x\;Z' \;\wedge \; {\textsf {FRESH}}\;{\textit{xs}}\;x\;Z'\;Z \;\Longrightarrow \; \\ {\textsf {SUBSTABS}}\ ({\textsf {Abs}}\;{\textit{xs}}\;x\;X')\ ({\textsf {ABS}}\;{\textit{xs}}\;x\;X'\;X)\ \ Z'\ Z\ z\ {\textit{zs}}\ = \\ {\textsf {ABS}}\;{\textit{xs}}\;x\;(X'[Z'/z]_{zs})\;({\textsf {SUBST}}\ X'\ X\ Z'\ Z\ z\ {\textit{zs}}) \end{array} \end{aligned}$$

    where X and Z are (as before) elements of \(\mathbf{T}\), whereas \(X'\) and \(Z'\) are terms.

Theorem 22

For any FSb recursion model as above, there exist the functions \({\textit{f}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{fAbs}}: \mathbf{abs}\rightarrow \mathbf{A}\) that commute, on good terms, with the constructors and with substitution and preserve freshness, in the same manner as in Theorem 19, mutatis mutandis. For example:

  • \({\textit{f}}\,({\textsf {Var}}\;{\textit{xs}}\;x) = {\textsf {VAR}}\;{\textit{xs}}\;x\)

4.3 Iteration Example: The Skeleton of a Term

Since terms are possibly infinitely branching, they have no notion of finite depth. While we could generalize the depth to return a transfinite ordinal, we opt for a simpler solution: Instead of depth, we use a slightly more informative entity, the “skeleton,” which models a term’s bare-bones structure.

We define the (free) datatypes of trees and “abstraction trees” branching over the free and bound indexes we use for terms. Unlike terms and abstractions, these store no operation symbols or variables, but only placeholders indicating their presence.

$$\begin{aligned} \begin{array}{rrcl} \textsf {datatype } &{} \mathbf{tree}&{}\;=\;&{} {\textsf {t}}{\textsf {Var}}\mid \\ &{}&{}&{} {\textsf {tOp}}\;((\mathbf{index},\mathbf{tree})\,\mathbf{input})\;((\mathbf{bindex},\mathbf{atree})\,\mathbf{input}) \\ \textsf {and } &{} \mathbf{atree}&{}\;=\;&{} {\textsf {tAbs}}\;\mathbf{tree}\end{array} \end{aligned}$$

Our aim is to introduce the skeleton of a term (or of an abstraction), as the tree obtained from it by retaining only branching information and forgetting about the occurrences of operation symbols and variables and their sorts. Namely, we wish to define \({\textsf {skel}}: \mathbf{term}\rightarrow \mathbf{tree}\) and \({\textsf {skelAbs}}: \mathbf{abs}\rightarrow \mathbf{atree}\) by the following mutually recursive clauses:

$$\begin{aligned} \begin{array}{rcl} {\textsf {skel}}\;({\textsf {Var}}\;{\textit{xs}}\;x) &{}\;=\;&{} {\textsf {t}}{\textsf {Var}}\\ {\textsf {skel}}\;({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}}) &{}\;=\;&{} {\textsf {tOp}}\;({\uparrow }{\textsf {skel}}\;{\textit{inp}})\;({\uparrow }{\textsf {skelAbs}}\;{\textit{binp}}) \\ {\textsf {skelAbs}}\;({\textsf {Abs}}\;{\textit{xs}}\;x\;X) &{}\;=\;&{} {\textsf {tAbs}}\;({\textsf {skel}}\;X) \end{array} \end{aligned}$$

To this end, we wish to make use of one of our iteration/recursion principles to guarantee that the above represents a valid definition, in that there exist the functions \({\textsf {skel}}\) and \({\textsf {skelAbs}}\) satisfying the above equations. So we look into “completing” the above definition by indicating how these presumptive functions are supposed to behave with respect to the standard operators. In other words, we try to define tree versions of the standard term operators

  • \({\textsf {FRESH}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{tree}\rightarrow \mathbf{bool}\)

    \({\textsf {FRESHABS}}: \mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{atree}\rightarrow \mathbf{bool}\)

  • \({\textsf {SWAP}}: \mathbf{tree}\rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{tree}\)

    \({\textsf {SWAPABS}}: \mathbf{atree}\rightarrow \mathbf{var}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{atree}\)

    and/or

  • \({\textsf {SUBST}}: \mathbf{tree}\rightarrow \mathbf{tree}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{tree}\)

    \({\textsf {SUBSTABS}}: \mathbf{atree}\rightarrow \mathbf{tree}\rightarrow \mathbf{var}\rightarrow \mathbf{varsort}\rightarrow \mathbf{atree}\)

while keeping in mind that \({\textsf {skel}}\) and \({\textsf {skelAbs}}\) must commute with these. Since trees have no actual variables in them, the only sensible choices are the trivial ones:

  • \({\textsf {FRESH}}\;{\textit{xs}}\;x\;X = {\textsf {True}}\), \({\textsf {FRESHABS}}\;{\textit{xs}}\;x\;A = {\textsf {True}}\)

  • \({\textsf {SWAP}}\;X\;x_1\;x_2\;{\textit{xs}}= X\), \({\textsf {SWAPABS}}\;A\;x_1\;x_2\;{\textit{xs}}= A\)

  • \({\textsf {SUBST}}\;Y\;X\;x\;{\textit{xs}}= Y\), \({\textsf {SUBSTABS}}\;A\;X\;x\;{\textit{xs}}= A\)

Thus, commutation with the operators will mean the following (where we omit the properties for the abstraction versions of the operators, which are similar):

$$\begin{aligned} \begin{array}{rcl} {\textsf {fresh}}\;{\textit{xs}}\;x\;X &{} \Longrightarrow &{} {\textsf {True}}\\ {\textsf {skel}}\;(X\,[x_1 \wedge x_2]_{xs}) &{}\;=\;&{} {\textsf {skel}}\;X \\ {\textsf {skel}}\;(Y\,[X / x]_{xs}) &{}\;=\;&{} {\textsf {skel}}\;Y \end{array} \end{aligned}$$

Of these, the intended freshness and swapping properties are clearly suitable: The former is vacuously true, and the latter states that the skeleton does not change after swapping two variables. However, the substitution property cannot work, since it states the wrong/undesired property that the skeleton of a term Y does not change after substituting a term X for one of its variables x—which contradicts our intuition that the skeleton may in fact grow (specifically, at the \({\textsf {t}}{\textsf {Var}}\) leaves that correspond to free occurrences of \({\textsf {Var}}\;{\textit{xs}}\;x\) in Y).
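The failure of the substitution clause can be observed concretely on a toy single-sorted \(\lambda \)-calculus sketch in Python, with skeletons as variable-free trees (our representation and names):

```python
# Toy lambda-terms: ("var", x) | ("app", t1, t2) | ("lam", x, body)

def skel(t):
    """Keep only branching structure: placeholders instead of names."""
    tag = t[0]
    if tag == "var":
        return ("tVar",)
    if tag == "app":
        return ("tOp", skel(t[1]), skel(t[2]))
    return ("tAbs", skel(t[2]))

def swap(t, x1, x2):
    """Swap the names x1 and x2 everywhere, including binding positions."""
    sw = lambda z: x2 if z == x1 else (x1 if z == x2 else z)
    tag = t[0]
    if tag == "var":
        return ("var", sw(t[1]))
    if tag == "app":
        return ("app", swap(t[1], x1, x2), swap(t[2], x1, x2))
    return ("lam", sw(t[1]), swap(t[2], x1, x2))

def subst(t, Y, y):
    """Naive t[Y/y]; adequate here, where no binders are involved."""
    tag = t[0]
    if tag == "var":
        return Y if t[1] == y else t
    if tag == "app":
        return ("app", subst(t[1], Y, y), subst(t[2], Y, y))
    return t if t[1] == y else ("lam", t[1], subst(t[2], Y, y))

X = ("app", ("var", "x"), ("var", "y"))
assert skel(swap(X, "x", "z")) == skel(X)  # swapping preserves the skeleton
Y = ("app", ("var", "a"), ("var", "b"))
assert skel(subst(X, Y, "x")) != skel(X)   # substitution can grow it
```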

In summary, for making the skeleton definition work we must focus on freshness and swapping rather than substitution. We thus employ the iteration Theorem 20, where the required FSw model is defined taking \({\textsf {FRESH}}\), \({\textsf {FRESHABS}}\), \({\textsf {SWAP}}\) and \({\textsf {SWAPABS}}\) as above and taking \({\textsf {VAR}}\), \({\textsf {OP}}\), \({\textsf {ABS}}\) to be given by \({\textsf {t}}{\textsf {Var}}\), \({\textsf {tOp}}\) and \({\textsf {tAbs}}\), respectively. (Namely, \({\textsf {VAR}}\;{\textit{xs}}\;x = {\textsf {t}}{\textsf {Var}}\), \({\textsf {OP}}\;\delta = {\textsf {tOp}}\) and \({\textsf {ABS}}\;{\textit{xs}}\;x = {\textsf {tAbs}}\).) That this indeed forms an FSw model, i.e., satisfies the desired clauses, is immediate to check: F1–F4 hold trivially since \({\textsf {FRESH}}\) and \({\textsf {FRESHABS}}\) are vacuously true, while Sw1–Sw3 and SwCong hold trivially since \({\textsf {SWAP}}\) and \({\textsf {SWAPABS}}\) are the identity functions.

Thus, Theorem 20 gives us the functions \({\textsf {skel}}\) and \({\textsf {skelAbs}}\) that are uniquely characterized by the following properties (where we omit the freshness preservation property, which in this case is just a tautology):

$$\begin{aligned} \begin{array}{rcl} {\textsf {skel}}\;({\textsf {Var}}\;{\textit{xs}}\;x) &{}\;=\;&{} {\textsf {t}}{\textsf {Var}}\\ {\textsf {skel}}\;({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}}) &{}\;=\;&{} {\textsf {tOp}}\;({\uparrow }{\textsf {skel}}\;{\textit{inp}})\;({\uparrow }{\textsf {skelAbs}}\;{\textit{binp}}) \\ {\textsf {skelAbs}}\;({\textsf {Abs}}\;{\textit{xs}}\;x\;X) &{}\;=\;&{} {\textsf {tAbs}}\;({\textsf {skel}}\;X) \\ {\textsf {skel}}\;(X\,[x_1 \wedge x_2]_{xs}) &{}\;=\;&{} {\textsf {skel}}\;X \\ {\textsf {skelAbs}}\;(A\,[x_1 \wedge x_2]_{xs}) &{}\;=\;&{} {\textsf {skelAbs}}\;A \end{array} \end{aligned}$$

Besides offering a simple instance of freshness-swapping-based iteration, the skeleton operator provides a generalization of depth that turned out to be sufficient for proving important properties requiring renaming variables in terms—notably the fresh induction principle we discuss in Sect. 5.

4.4 Iteration Example: Interpretation of Syntax in Semantic Domains

Perhaps the most useful application of our iteration principles is the seamless interpretation of syntax in semantic domains, in a manner that is guaranteed to be compatible with alpha, substitution and freshness. This construction shows up commonly in the literature, for different notions of semantic domain. However, the construction is essentially the same, and can be expressed for an arbitrary syntax—for which reason we have formalized it in our framework.

A semantic domain consists of two collections of elements endowed with interpretations of the \({\textsf {Op}}\) and \({\textsf {Abs}}\) constructors, the latter in a higher-order fashion—interpreting variable binding as (meta-level) functional binding. Namely, it consists of:

  • two types, \(\mathbf{Dt}\) and \(\mathbf{Da}\)

  • a function \({\textsf {op}}: \mathbf{opsym}\rightarrow (\mathbf{index},\mathbf{Dt})\;\mathbf{input}\rightarrow (\mathbf{bindex},\mathbf{Da})\;\mathbf{input}\rightarrow \mathbf{Dt}\)

  • a function \({\textsf {abs}}: \mathbf{varsort}\rightarrow (\mathbf{Dt}\rightarrow \mathbf{Dt}) \rightarrow \mathbf{Da}\)

Theorem 23

The terms and abstractions are interpretable in any semantic domain. Namely, if \(\mathbf{val}\) is the type of valuations of variables in the domain, \(\mathbf{varsort}\rightarrow \mathbf{var}\rightarrow \mathbf{Dt}\), there exist the functions \({\textsf {sem}}: \mathbf{term}\rightarrow \mathbf{val}\rightarrow \mathbf{Dt}\) and \({\textsf {semAbs}}: \mathbf{abs}\rightarrow \mathbf{val}\rightarrow \mathbf{Da}\) such that:

  • \({\textsf {sem}}\,({\textsf {Var}}\;{\textit{xs}}\;x)\,\rho = \rho \;{\textit{xs}}\;x\)

  • \({\textsf {sem}}\,({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\,\rho = {\textsf {op}}\;\delta \; ({\uparrow }\,({\textsf {sem}}\;\_\;\,\rho )\,{\textit{inp}})\;({\uparrow }\,({\textsf {semAbs}}\;\_\;\,\rho )\,{\textit{binp}})\)

  • \({\textsf {semAbs}}\,({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\,\rho = {\textsf {abs}}\;{\textit{xs}}\,(\lambda d.\;{\textsf {sem}}\;X\;(\rho [({\textit{xs}},x) \leftarrow d]))\),

    where \(\rho [({\textit{xs}},x) \leftarrow d]\) is the function \(\rho \) updated at \(({\textit{xs}},x)\) with d—which sends \(({\textit{xs}},x)\) to d and any other \(({\textit{ys}},y)\) to \(\rho \;{\textit{ys}}\;y\).

In addition, the interpretation functions map syntactic substitution and freshness to semantic versions of the concepts:

  • \({\textsf {sem}}\,(X[Y/y]_{ys})\,\rho = {\textsf {sem}}\;X\;(\rho [({\textit{ys}},y) \leftarrow {\textsf {sem}}\;Y\;\rho ])\)

  • \({\textsf {fresh}}\;{\textit{xs}}\;x\;X \;\Longrightarrow \;(\forall \rho ,\rho '.\;\rho =_{(xs,x)}\rho ' \,\Longrightarrow \, {\textsf {sem}}\;X\;\rho = {\textsf {sem}}\;X\;\rho ')\),

    where “\(=_{(xs,x)}\)” means “equal everywhere except perhaps on \(({\textit{xs}},x)\)”—namely \(\rho =_{(xs,x)}\rho '\) holds iff \(\rho \;{\textit{ys}}\;y = \rho '\;{\textit{ys}}\;y\) for all \(({\textit{ys}},y)\not =({\textit{xs}},x)\).

Theorem 23 is the foundation for many particular semantic interpretations, including that of \(\lambda \)-terms in Henkin models and that of FOL terms and formulas in FOL models. It guarantees compatibility with alpha and proves, as bonuses, a freshness and a substitution property. The freshness property is the familiar notion that the interpretation only depends on the free variables, and the substitution property generalizes what is usually called the substitution lemma, stating that interpreting a substituted term is the same as interpreting the original term in a “substituted” environment. Both properties are essential lemmas in most developments that involve semantics.
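To make the construction tangible, here is a minimal Python sketch of such an interpretation for the untyped lambda-calculus, where, as in Theorem 23, variable binding is interpreted as meta-level functional binding (the encoding and names are ours):

```python
# Toy semantic interpretation: Dt is the type of Python values, Lam is
# interpreted higher-order as a Python function, and sem threads a
# valuation rho mapping variable names to domain elements.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class App: fun: object; arg: object
@dataclass(frozen=True)
class Lam: var: str; body: object

def update(rho, x, d):
    """rho[x <- d]: the valuation rho updated at x with d."""
    return lambda y: d if y == x else rho(y)

def sem(t, rho):
    if isinstance(t, Var):
        return rho(t.name)
    if isinstance(t, App):
        return sem(t.fun, rho)(sem(t.arg, rho))
    if isinstance(t, Lam):
        # syntactic binding becomes meta-level (Python) functional binding
        return lambda d: sem(t.body, update(rho, t.var, d))

empty = lambda y: None
identity = sem(Lam('x', Var('x')), empty)
assert identity(42) == 42
```

The freshness property of Theorem 23 is visible here: sem(t, rho) consults rho only at the free variables of t, so valuations agreeing on those variables yield the same denotation.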

This theorem follows by an instantiation of the iteration Theorem 19: taking \(\mathbf{T}\) and \(\mathbf{A}\) to be \(\mathbf{val}\rightarrow \mathbf{Dt}\) and \(\mathbf{val}\rightarrow \mathbf{Da}\) and taking the term/abstraction-like operations as prescribed by the desired clauses for \({\textsf {sem}}\) and \({\textsf {semAbs}}\) (in the following we omit the abstraction versions of the freshness and substitution operators):

  • \({\textsf {VAR}}\;{\textit{xs}}\;x\ =\ \lambda \rho .\;\rho \;{\textit{xs}}\;x\)

  • \({\textsf {OP}}\;\delta \;{\textit{inp}}\;{\textit{binp}}\ =\ \lambda \rho .\;{\textsf {op}}\;\delta \;({\uparrow }(\_\;\,\rho )\;{\textit{inp}})\; ({\uparrow }(\_\;\,\rho )\;{\textit{binp}})\) where \((\_\;\,\rho )\) denotes the “application to \(\rho \)” operator \(\lambda \;u.\;u\;\rho \)

  • \({\textsf {ABS}}\;{\textit{xs}}\;x\;X\ =\ \lambda \rho .\;{\textsf {abs}}\;{\textit{xs}}\;(\lambda d.\;X\;(\rho [({\textit{xs}},x) \leftarrow d]))\)

  • \({\textsf {FRESH}}\;{\textit{xs}}\;x\;X \ =\ (\forall \rho ,\rho '.\;\rho =_{(xs,x)}\rho ' \,\Longrightarrow \, X\;\rho = X\;\rho ')\)

  • \({\textsf {SUBST}}\;X\;Y\;y\;{\textit{ys}}=\ \lambda \rho .\;X\;(\rho [({\textit{ys}},y) \leftarrow Y\;\rho ])\)

Note that the above definitions are completely determined by the intended properties listed in Theorem 23 (which we set out to prove). For example, \({\textsf {FRESH}}\) was defined so that the freshness property listed in Theorem 23 becomes the freshness commutation property \({\textsf {fresh}}\;{\textit{xs}}\;x\;X \Longrightarrow {\textsf {FRESH}}\;{\textit{xs}}\;x\;({\textsf {sem}}\;X)\). Thus, according to our freshness-substitution-based iteration Theorem 19, what we are left to check in order to prove Theorem 23 is that the above structure is an FSb model, i.e., satisfies the clauses F1-F4, Sb1-Sb4 and SbRn. This amounts to checking the following:

  1. F1:

    \(({\textit{xs}},x)\ne ({\textit{ys}},y) \,\wedge \, \rho =_{(ys,y)}\rho ' \,\Longrightarrow \, \rho \;{\textit{xs}}\;x=\rho '\;{\textit{xs}}\;x\)

  2. F2:

    A trivial implication, of the form “A implies A”

  3. F3:

    \(\rho =_{(ys,y)}\rho ' \,\Longrightarrow \, \rho [({\textit{ys}},y)\leftarrow d] =\rho '[({\textit{ys}},y)\leftarrow d]\)

  4. F4:

    \(\rho =_{(ys,y)}\rho ' \,\Longrightarrow \, \rho [({\textit{xs}},x)\leftarrow d] =_{(ys,y)}\rho '[({\textit{xs}},x)\leftarrow d]\)

  5. Sb1:

    \(\rho [({\textit{xs}},x)\leftarrow d]\ {\textit{xs}}\ x=d\ \)

  6. Sb2:

    \(({\textit{xs}},x)\ne ({\textit{ys}},y) \,\Longrightarrow \, \rho [({\textit{ys}},y)\leftarrow d]\ {\textit{xs}}\ x = \rho \ {\textit{xs}}\ x\)

  7. Sb3:

    Some trivial equalities, of the form “A \(=\) A”

  8. Sb4:

    \(\rho =_{(xs,x)}\rho [({\textit{xs}},x)\leftarrow d]\ \) and

    \(({\textit{xs}},x)\not =({\textit{zs}},z)\ \)\(\Longrightarrow \)\(\ \rho [({\textit{zs}},z)\leftarrow d'][({\textit{xs}},x)\leftarrow d]=\rho [({\textit{xs}},x)\leftarrow d]\,[({\textit{zs}},z)\leftarrow d']\)

  9. SbRn:

    \(\rho [({\textit{xs}},y)\leftarrow d]\,[({\textit{xs}},x)\leftarrow d]\ =_{(xs,y)}\ \rho [({\textit{xs}},x)\leftarrow d]\)

All the above are straightforward properties of function update. Indeed, Isabelle/HOL’s auto method was able to prove all of them.
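For concreteness, these properties of function update can be replayed on a small executable model, with valuations as finite maps (a sketch in our own encoding, with dicts keyed by (varsort, var) pairs):

```python
# Concrete check of two of the clauses above (Sb4 and SbRn) for
# valuations represented as Python dicts.

def upd(rho, key, d):
    """rho[key <- d] as a fresh dict."""
    out = dict(rho)
    out[key] = d
    return out

def eq_except(rho, rho2, key):
    """rho =_key rho2: equal everywhere except perhaps at key."""
    keys = set(rho) | set(rho2)
    return all(rho.get(k) == rho2.get(k) for k in keys if k != key)

rho = {('xs', 'x'): 1, ('zs', 'z'): 2}

# Sb4, first part: rho =_{(xs,x)} rho[(xs,x) <- d]
assert eq_except(rho, upd(rho, ('xs', 'x'), 99), ('xs', 'x'))

# Sb4, second part: updates at distinct keys commute
lhs = upd(upd(rho, ('zs', 'z'), 7), ('xs', 'x'), 8)
rhs = upd(upd(rho, ('xs', 'x'), 8), ('zs', 'z'), 7)
assert lhs == rhs

# SbRn: rho[(xs,y) <- d][(xs,x) <- d] =_{(xs,y)} rho[(xs,x) <- d]
d = 5
assert eq_except(upd(upd(rho, ('xs', 'y'), d), ('xs', 'x'), d),
                 upd(rho, ('xs', 'x'), d), ('xs', 'y'))
```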

5 Induction Principle

We formalize a scheme for “fresh” induction in the style of nominal logic, which realizes the Barendregt convention. We introduce and motivate this scheme by an example. To prove Prop. 16(a), we use (mutual) structural induction over terms and abstractions, proving the statement together with the corresponding statement for abstractions,

$$\begin{aligned} \begin{array}{l} {\textsf {freshAbs}}\;{\textit{zs}}\;z\;(A[Y / y]_{ys}) \;\,\Longleftrightarrow \\ (({\textit{zs}},z) = ({\textit{ys}},y) \,\vee \, {\textsf {freshAbs}}\;{\textit{zs}}\;z\;A) \;\wedge \; ({\textsf {freshAbs}}\;{\textit{ys}}\;y\;A \,\vee \, {\textsf {fresh}}\;{\textit{zs}}\;z\;Y) \end{array} \end{aligned}$$

The proof’s only interesting case is the \({\textsf {Abs}}\) case, say, for abstractions of the form \({\textsf {Abs}}\;{\textit{xs}}\;x\;X\). However, if we were able to assume freshness of \(({\textit{xs}},x)\) for all the statement’s parameters, namely Y, \(({\textit{ys}},y)\) and \(({\textit{zs}},z)\), this case would also become “uninteresting,” following automatically from the induction hypothesis by mere simplification, as shown below (with the freshness assumptions highlighted):

The practice of assuming freshness, known in the literature as the Barendregt convention, is a hallmark of informal reasoning about bindings. Thanks to insights from nominal logic [84, 110, 112], we also know how to apply this morally correct convention fully rigorously. To capture it in our formalization, we model parameters \(p: \mathbf{param}\) as anything that allows for a notion of freshness or, alternatively, provides a set of (free) variables for each varsort, \({\textsf {vars}}{\textsf {Of}}: \mathbf{param}\rightarrow \mathbf{varsort}\rightarrow \mathbf{var}\;\mathbf{set}\). With this, a “fresh induction” principle can be formulated, provided each parameter has fewer than \(|\mathbf{var}|\) variables (in particular, if it has only finitely many).

Theorem 24

Let \(\varphi : \mathbf{term}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\) and \({\varphi {\textit{Abs}}}: \mathbf{abs}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\). Assume:

(1) \(\forall {\textit{xs}},p.\;|{\textsf {vars}}{\textsf {Of}}\;p\;{\textit{xs}}| < |\mathbf{var}|\)

(2) \(\forall {\textit{xs}},x,p.\;\varphi \;({\textsf {Var}}\;{\textit{xs}}\;x)\;p\)

(3) \(\forall \delta ,{\textit{inp}},{\textit{binp}},p.\; |{\textsf {dom}}\;{\textit{inp}}|< |\mathbf{var}| \,\wedge \, |{\textsf {dom}}\;{\textit{binp}}| < |\mathbf{var}| \,\wedge \, {\uparrow }\,(\lambda X.\;{\textsf {good}}\;X \,\wedge \, (\forall q.\,\varphi \;X\;q))\;{\textit{inp}}\,\wedge \, {\uparrow }\,(\lambda A.\;{\textsf {goodAbs}}\;A \,\wedge \, (\forall q.\,{\varphi {\textit{Abs}}}\;A\;q))\,{\textit{binp}}\Longrightarrow \varphi \,({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\;p\)

(4) \(\forall {\textit{xs}},x,X,p.\;{\textsf {good}}\;X \,\wedge \, (\forall q.\;\varphi \;X\;q) \,\wedge \, x \not \in {\textsf {vars}}{\textsf {Of}}\;p\;{\textit{xs}}\,\Longrightarrow \, {\varphi {\textit{Abs}}}\,({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\;p\)

Then \(\forall X,p.\;{\textsf {good}}\;X \Longrightarrow \varphi \;X\;p\) and \(\forall A,p.\;{\textsf {goodAbs}}\;A \Longrightarrow {\varphi {\textit{Abs}}}\;A\;p\).

Highlighted is the essential difference from the usual structural induction: The bound variable x can be assumed fresh for the parameter p (on its varsort, \({\textit{xs}}\)). Note also that, in the \({\textsf {Op}}\) case, we lift to inputs the predicate quantified universally over all parameters.

Back to Prop. 16(a), this follows automatically by fresh induction (plus the shown simplifications), after recognizing as parameters the variables \(({\textit{ys}},y)\) and \(({\textit{zs}},z)\) and the term Y—formally, taking \(\mathbf{param}= (\mathbf{varsort}\times \mathbf{var})^2 \times \mathbf{term}\) and \({\textsf {vars}}{\textsf {Of}}\;(({\textit{ys}},y),({\textit{zs}},z),Y)\;{\textit{xs}}= \{y \mid {\textit{xs}}= {\textit{ys}}\} \,\cup \, \{z \mid {\textit{xs}}= {\textit{zs}}\} \,\cup \, \{x \mid \lnot \,{\textsf {fresh}}\;{\textit{xs}}\;x\;Y\}\).

Fresh induction is based on the possibility of renaming bound variables in abstractions without loss of generality. To prove this principle, we employed standard induction over the skeleton of terms, using the crucial fact that the skeleton is invariant under swapping.

6 Sorting the Terms

So far, we have a framework where the operations take as free and bound inputs partial families of terms and abstractions. All theorems refer to good (i.e., sufficiently low-branching) terms and abstractions. However, we promised a theory applicable to terms over many-sorted binding signatures. Thanks to the choice of a flexible notion of input, it is not difficult to cast our results into such a many-sorted setting. Given a suitable notion of signature (6.1), we classify terms according to sorts (6.2) and prove that well-sorted terms are good (6.3); this gives us sorted versions of all theorems, including recursion (6.4, 6.6), and a systematic way to instantiate the framework to concrete syntaxes (6.5).

6.1 Binding Signatures

A (binding) signature is a tuple \((\mathbf{index},\mathbf{bindex},\mathbf{varsort},\mathbf{sort},\mathbf{opsym},{\textsf {asSort}},{\textsf {stOf}},{\textsf {arOf}},\;{\textsf {barOf}})\), where \(\mathbf{index}\), \(\mathbf{bindex}\), \(\mathbf{varsort}\) and \(\mathbf{opsym}\) are types (with the previously discussed intuitions) and \(\mathbf{sort}\) is a new type, of sorts for terms. Moreover:

  • \({\textsf {asSort}}: \mathbf{varsort}\rightarrow \mathbf{sort}\) is an injective map, embedding varsorts into sorts

  • \({\textsf {stOf}}: \mathbf{opsym}\rightarrow \mathbf{sort}\), read “the (result) sort of”

  • \({\textsf {arOf}}: \mathbf{opsym}\rightarrow (\mathbf{index},\mathbf{sort})\,\mathbf{input}\), read “the (free) arity of”

  • \({\textsf {barOf}}: \mathbf{opsym}\rightarrow (\mathbf{bindex},\mathbf{varsort}\times \mathbf{sort})\,\mathbf{input}\), read “the bound arity of”

Thus, a signature prescribes which varsorts correspond to which sorts (as discussed in Sect. 2.4) and, for each operation symbol, which are the sorts of its free inputs (the arity), of its bound (abstraction) inputs (the bound arity), and of its result.

When we give examples for our concrete syntaxes described in Sect. 2, we will write \((i_1\mapsto a_1,\ldots ,i_n\mapsto a_n)\) for the partial function that sends each \(i_k\) to \({\textsf {Some}}\;a_k\) and everything else to \({\textsf {None}}\). In particular, () denotes the totally undefined function.

For the \(\lambda \)-calculus syntax, we take \(\mathbf{index}= \mathbf{bindex}= \mathbf{nat}\), \(\mathbf{varsort}= \mathbf{sort}= \{{{\textsf {lterm}}}\}\) (a singleton datatype), \(\mathbf{opsym}= \{{{\textsf {app}}},{{\textsf {lam}}}\}\), \({\textsf {asSort}}\) to be the identity and \({\textsf {stOf}}\) to be the unique function to \(\{{{\textsf {lterm}}}\}\). Since \({{\textsf {app}}}\) has two free inputs and no bound input, we use the first two elements of \(\mathbf{nat}\) as free arity and nothing for the bound arity: \({\textsf {arOf}}\;{{\textsf {app}}}=(0\mapsto {{\textsf {lterm}}},\,1\mapsto {{\textsf {lterm}}})\), \({\textsf {barOf}}\;{{\textsf {app}}}= ()\). By contrast, since \({{\textsf {lam}}}\) has no free input and one bound input, we use nothing for the free arity, and the first element of \(\mathbf{nat}\) for the bound arity: \({\textsf {arOf}}\;{{\textsf {lam}}}= ()\), \({\textsf {barOf}}\;{{\textsf {lam}}}= (0\mapsto ({{\textsf {lterm}}},{{\textsf {lterm}}}))\).
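The λ-calculus signature just described can be written down as concrete data; a minimal Python sketch (our encoding: dicts stand in for partial functions, with the empty dict playing the role of ()):

```python
# The lambda-calculus binding signature as plain data.
LTERM = 'lterm'  # the single sort, which is also the single varsort

signature = {
    'asSort': {LTERM: LTERM},                # injective varsort -> sort
    'stOf':   {'app': LTERM, 'lam': LTERM},  # result sorts
    'arOf':   {'app': {0: LTERM, 1: LTERM},  # free arities
               'lam': {}},
    'barOf':  {'app': {},                    # bound arities
               'lam': {0: (LTERM, LTERM)}},
}

# app has two free inputs and no bound input;
# lam has no free input and one bound input:
assert signature['arOf']['app'] == {0: LTERM, 1: LTERM}
assert signature['barOf']['lam'] == {0: (LTERM, LTERM)}
```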

For the CCS example in Sect. 2.5, we fix a type \(\mathbf{chan}\) of channels. We choose a cardinal upper bound \(\kappa \) for the branching of sum \(\left( \sum \right) \), and choose a type \(\mathbf{index}\) of cardinality \(\kappa \). For \(\mathbf{bindex}\), we do not need anything special, so we take it to be \(\mathbf{nat}\). We have two sorts, of expressions and processes, so we take \(\mathbf{sort}= \{{\textsf {exp}},{\textsf {proc}}\}\). Since we have expression variables but no process variables, we take \(\mathbf{varsort}= \{{\textsf {varexp}}\}\) and \({\textsf {asSort}}\) to send \({\textsf {varexp}}\) to \({\textsf {exp}}\). We define \(\mathbf{opsym}\) as the following datatype: \(\mathbf{opsym}\; =\; {\textsf {Zero}}\mid {\textsf {Plus}}\mid {\textsf {Inp}}\;\mathbf{chan}\mid {\textsf {Out}}\;\mathbf{chan}\mid {\textstyle \sum }\,(\mathbf{index}\;\mathbf{set})\). The free and bound arities and sorts of the operation symbols are as expected. For example, \({\textsf {Inp}}\;c\) acts similarly to \(\lambda \)-abstraction, but binds, in \({\textsf {proc}}\) terms, variables of a different sort, \({\textsf {varexp}}\): \({\textsf {arOf}}\,({\textsf {Inp}}\;c) = ()\), \({\textsf {barOf}}\,({\textsf {Inp}}\;c) = (0\mapsto ({\textsf {varexp}},{\textsf {proc}}))\). For \(\sum I\) with \(I:\mathbf{index}\;\mathbf{set}\), the arity is only defined for elements of I, namely \({\textsf {arOf}}\;\left( \sum I\right) = ((i\in I)\mapsto {\textsf {proc}})\).

6.2 Well-Sorted Terms Over a Signature

Based on the information from a signature, we can distinguish our terms of interest, namely those that are well-sorted in the sense that:

  • all variables are embedded into terms of sorts compatible with their varsorts

  • all operation symbols are applied according to their free and bound arities

This is modeled by well-sortedness predicates \({\textsf {wls}}: \mathbf{sort}\rightarrow \mathbf{term}\rightarrow \mathbf{bool}\) and \({\textsf {wlsAbs}}: \mathbf{varsort}\times \mathbf{sort}\rightarrow \mathbf{abs}\rightarrow \mathbf{bool}\), where \({\textsf {wls}}\;s\;X\) states that X is a well-sorted term of sort s and \({\textsf {wlsAbs}}\;({\textit{xs}},s)\;A\) states that A is a well-sorted abstraction binding an \({\textit{xs}}\)-variable in an s-term. They are defined mutually inductively by the following clauses:

$$\begin{aligned} \begin{array}{rcl} &{}&{}{\textsf {wls}}\;({\textsf {asSort}}\;{\textit{xs}})\;({\textsf {Var}}\;{\textit{xs}}\;x) \\ {\uparrow }\,{\textsf {wls}}\;({\textsf {arOf}}\;\delta )\;{\textit{inp}}\,\wedge \, {\uparrow }\,{\textsf {wlsAbs}}\;({\textsf {barOf}}\;\delta )\;{\textit{binp}}&{}\;\Longrightarrow \;&{} {\textsf {wls}}\;({\textsf {stOf}}\;\delta )\;({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}}) \\ {\textsf {isInBar}}\,({\textit{xs}},s) \,\wedge \, {\textsf {wls}}\;s\;X &{}\;\Longrightarrow \;&{} {\textsf {wlsAbs}}\;({\textit{xs}},s)\;({\textsf {Abs}}\;{\textit{xs}}\;x\;X) \end{array} \end{aligned}$$

where \({\textsf {isInBar}}\,({\textit{xs}},s)\) states that the pair \(({\textit{xs}},s)\) is in the bound arity of at least one operation symbol \(\delta \), i.e., \({\textsf {barOf}}\;\delta \;i = ({\textit{xs}},s)\) for some i—this rules out unneeded abstractions.

Let us illustrate sorting for our running examples. In the \(\lambda \)-calculus syntax, let \(X = {\textsf {Var}}\;{{\textsf {lterm}}}\;x\), \(A = {\textsf {Abs}}\;{{\textsf {lterm}}}\;x\;X\), and \(Y = {\textsf {Op}}\;{{\textsf {lam}}}\;()\;(0 \mapsto A)\). These correspond to what, in the unsorted BNF notation from Sect. 2.1, we would write \({\textsf {Var}}\;x\), \({\textsf {Abs}}\;x\;X\) and \({\textsf {Lm}}\;({\textsf {Abs}}\;x\;X)\). In our sorting system, X and Y are both well-sorted terms at sort \({{\textsf {lterm}}}\) (written \({\textsf {wls}}\;{{\textsf {lterm}}}\;X\) and \({\textsf {wls}}\;{{\textsf {lterm}}}\;Y\)) and A is a well-sorted abstraction at sort \(({{\textsf {lterm}}},{{\textsf {lterm}}})\) (written \({\textsf {wlsAbs}}\,({{\textsf {lterm}}},{{\textsf {lterm}}})\,A\)).

For CCS, we have that \(E = {\textsf {Op}}\;{\textsf {Zero}}\,()\,()\) and \(F = {\textsf {Op}}\;{\textsf {Plus}}\;(0\mapsto E,\;1\mapsto E)\,()\) are well-sorted terms of sort \({\textsf {exp}}\). Moreover, \(P = {\textsf {Op}}\,(\sum \emptyset )\,()\,()\) and \(Q = {\textsf {Op}}\,({\textsf {Out}}\;c)\,(0\mapsto F,1 \mapsto P)\,()\) are well-sorted terms of sort \({\textsf {proc}}\). (Note that P is a sum over the empty set of choices, i.e., the null process, whereas Q represents a process that outputs the value of \(0+0\) on channel c and then stops.) If, e.g., we swap the arguments of \({\textsf {Out}}\;c\) in Q, we obtain \({\textsf {Op}}\,({\textsf {Out}}\;c)\,(0 \mapsto P,1\mapsto F)\,()\), which is not well-sorted: In the inductive clause for \({\textsf {wls}}\), the input \((0 \mapsto P,1\mapsto F)\) fails to match the arity of \({\textsf {Out}}\;c\), \((0 \mapsto {\textsf {exp}},1\mapsto {\textsf {proc}})\).
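This sorting discipline is easy to prototype. Below is a small executable sketch of \({\textsf {wls}}\) for a fragment of the CCS signature (our encoding; it checks exactly the domain-matching condition that the ill-sorted swapped term above violates):

```python
# A fragment of the CCS signature: Zero, Plus, Out c, and the empty sum.
AS_SORT = {'varexp': 'exp'}
ST_OF  = {'Zero': 'exp', 'Plus': 'exp', 'OutC': 'proc', 'SumEmpty': 'proc'}
AR_OF  = {'Zero': {}, 'Plus': {0: 'exp', 1: 'exp'},
          'OutC': {0: 'exp', 1: 'proc'}, 'SumEmpty': {}}
BAR_OF = {'Zero': {}, 'Plus': {}, 'OutC': {}, 'SumEmpty': {}}

def wls(s, t):
    """Well-sortedness of term t at sort s (abstraction inputs, handled by
    a wlsAbs analogue, are omitted: this fragment has no binding symbols)."""
    tag = t[0]
    if tag == 'Var':
        _, xs, x = t
        return AS_SORT.get(xs) == s
    if tag == 'Op':
        _, delta, inp, binp = t
        return (ST_OF[delta] == s
                and inp.keys() == AR_OF[delta].keys()
                and all(wls(AR_OF[delta][i], X) for i, X in inp.items())
                and binp.keys() == BAR_OF[delta].keys())
    return False

E = ('Op', 'Zero', {}, {})                   # 0, an expression
F = ('Op', 'Plus', {0: E, 1: E}, {})         # 0 + 0
P = ('Op', 'SumEmpty', {}, {})               # the null process, sum over {}
Q = ('Op', 'OutC', {0: F, 1: P}, {})         # output 0+0 on c, then stop
assert wls('proc', Q)
assert not wls('proc', ('Op', 'OutC', {0: P, 1: F}, {}))  # swapped: ill-sorted
```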

6.3 From Good to Well-Sorted

Recall that goodness means “does not branch beyond \(|\mathbf{var}|\).” On the other hand, well-sortedness requires that, for each applied operation symbol \(\delta \), the inputs have the same domains as the arities of \(\delta \), i.e., branch exactly as much as them. Thus, it suffices to assume that the arity domains are smaller than \(|\mathbf{var}|\). We make the stronger assumption that the types of sorts and indexes (the latter subsuming the arity domains) are all smaller than \(|\mathbf{var}|\):

Assumption 25

\(|\mathbf{sort}|< |\mathbf{var}| \;\wedge \; |\mathbf{index}|< |\mathbf{var}| \;\wedge \; |\mathbf{bindex}| < |\mathbf{var}|\)

Now we can prove:

Prop 26

\(({\textsf {wls}}\;s\;X \Longrightarrow {\textsf {good}}\;X) \;\wedge \; ({\textsf {wlsAbs}}\;({\textit{xs}},s)\;A \Longrightarrow {\textsf {goodAbs}}\;A)\)

In particular, when we work with well-sorted terms we can take advantage of all the properties we proved about good terms without having to carry with us the cardinality constraint on the size of the inputs. This is because the constraint is implied by the assumption about the cardinality of the operation arities. On the other hand, for well-sorted terms we must of course carry sort and/or arity information.

In addition, we prove that all the standard operators preserve well-sortedness. For example, we prove that if we substitute, in the well-sorted term X of sort s, for the variable y of varsort \({\textit{ys}}\), the well-sorted term Y of sort corresponding to \({\textit{ys}}\), then we obtain a well-sorted term of sort s: \({\textsf {wls}}\;s\;X \;\wedge \; {\textsf {wls}}\;({\textsf {asSort}}\;{\textit{ys}})\;Y \;\Longrightarrow \; {\textsf {wls}}\;s\;(X\,[Y/y]_{ys})\).

Using the preservation properties and Prop. 26, we transfer the entire theory of Sects. 3.4, 4 and 5 from good terms to well-sorted terms. For example, Prop. 18(d) becomes:

As a slightly more involved example, here is the well-sorted version of the fresh induction Theorem 24, where the predicates \(\varphi \) and \({\varphi {\textit{Abs}}}\) take sorting information as an additional parameter:

Theorem 27

Let \(\varphi : \mathbf{sort}\rightarrow \mathbf{term}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\) and \({\varphi {\textit{Abs}}}: \mathbf{varsort}\times \mathbf{sort}\rightarrow \mathbf{abs}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\). Assume:

(1) \(\forall {\textit{xs}},p.\;|{\textsf {vars}}{\textsf {Of}}\;p\;{\textit{xs}}| < |\mathbf{var}|\)

(2) \(\forall {\textit{xs}},x,p.\;\varphi \,({\textsf {asSort}}\;{\textit{xs}})\,({\textsf {Var}}\;{\textit{xs}}\;x)\;p\)

(3) \(\forall \delta ,{\textit{inp}},{\textit{binp}},p.\;\)

   \({\uparrow }\,(\lambda s,X.\;{\textsf {wls}}\;s\;X \,\wedge \, (\forall q.\,\varphi \;s\;X\;q))\,({\textsf {arOf}}\;\delta )\,{\textit{inp}}\,\;\wedge \, \)

   \({\uparrow }\,(\lambda ({\textit{xs}},s),A.\;{\textsf {wlsAbs}}\;({\textit{xs}},s)\;A \,\wedge \, (\forall q.\,{\varphi {\textit{Abs}}}\;({\textit{xs}},s)\;A\;q))\,({\textsf {barOf}}\;\delta )\,{\textit{binp}}\;\Longrightarrow \)

   \(\varphi \,({\textsf {stOf}}\;\delta )\,({\textsf {Op}}\;\delta \;{\textit{inp}}\;{\textit{binp}})\;p\)

(4) \(\forall {\textit{xs}},x,s,X,p.\;{\textsf {isInBar}}\;({\textit{xs}},s) \,\wedge \, {\textsf {wls}}\;s\;X \,\wedge \, (\forall q.\;\varphi \;s\;X\;q) \,\wedge \, x \not \in {\textsf {vars}}{\textsf {Of}}\;p\;{\textit{xs}}\,\Longrightarrow \, {\varphi {\textit{Abs}}}\,({\textit{xs}},s)\,({\textsf {Abs}}\;{\textit{xs}}\;x\;X)\;p\)

Then \(\forall s,X,p.\;{\textsf {wls}}\;s\;X \Longrightarrow \varphi \;s\;X\;p\) and \(\forall s,{\textit{xs}},A,p.\;{\textsf {wlsAbs}}\;({\textit{xs}},s)\;A \Longrightarrow {\varphi {\textit{Abs}}}\;({\textit{xs}},s)\;A\;p\).

The transfer is mostly straightforward for all facts, including the induction theorem. For stating the sorted versions of the recursion and semantic interpretation theorems, there is some additional bureaucracy, since we also need sorting predicates on the target domain; we dedicate the next subsection to this.

There is an important remaining question: Are our two Assumptions (2 and 25) satisfiable? That is, can we find, for any types \(\mathbf{sort}\), \(\mathbf{index}\) and \(\mathbf{bindex}\), a type \(\mathbf{var}\) larger than these such that \(|\mathbf{var}|\) is regular? Fortunately, the theory of cardinals again provides us with a positive answer: Let \(\mathbf{G}= \mathbf{nat}+ \mathbf{sort}+ \mathbf{index}+ \mathbf{bindex}\). Since any successor of an infinite cardinal is regular, we can take \(\mathbf{var}\) to have the same cardinality as the successor of \(|\mathbf{G}|\), by defining \(\mathbf{var}\) as a suitable subtype of \(\mathbf{G}\;\mathbf{set}\). In the case of all operation symbols being finitary, i.e., with their arities having finite domains, we do not need the above fancy construction, but can simply take \(\mathbf{var}\) to be a copy of \(\mathbf{nat}\).

6.4 Many-Sorted Recursion

As mentioned in the previous subsection, adapting the theorems from good items to well-sorted items is a routine process. For recursion, the process is more bureaucratic, since it involves the sorting of the target domain as well. We only show here the case of FSb primitive recursion. The others are similar.

A sorted FSb recursion model is an extension of the concept of FSb recursion model with the following data:

  • the sorting predicates \({\textsf {wls}}^{\mathbf{T}} : \mathbf{sort}\rightarrow \mathbf{T}\rightarrow \mathbf{bool}\) and \({\textsf {wlsAbs}}^{\mathbf{T}} : \mathbf{varsort}\times \mathbf{sort}\rightarrow \mathbf{A}\rightarrow \mathbf{bool}\)

  • the assumption that all operators preserve sorting, e.g.,

    • \({\textsf {wls}}\,s\;X' \;\wedge \; {\textsf {wls}}^{\mathbf{T}}\,s\;X \;\wedge \; {\textsf {wls}}\;({\textsf {asSort}}\;{\textit{ys}})\;Y' \;\wedge \; {\textsf {wls}}^{\mathbf{T}}\;({\textsf {asSort}}\;{\textit{ys}})\;Y \;\Longrightarrow \; {\textsf {wls}}^{\mathbf{T}}\,s\ ({\textsf {SUBST}}\;X'\;X\;Y'\; Y\;y\;{\textit{ys}})\)

The recursion Theorem 22 is now extended to take sorting into account:

Theorem 28

For any sorted FSb recursion model, there exist the functions \({\textit{f}}:\mathbf{term}\rightarrow \mathbf{T}\) and \({\textit{fAbs}}: \mathbf{abs}\rightarrow \mathbf{A}\) that satisfy the same properties as in Theorem 22 and additionally preserve sorting:

  • \({\textsf {wls}}\;s\;X \;\Longrightarrow \; {\textsf {wls}}^{\mathbf{T}}\,s\;({\textit{f}}\;X)\)

  • \({\textsf {wlsAbs}}\;({\textit{xs}},s)\;A \;\Longrightarrow \; {\textsf {wlsAbs}}^{\mathbf{T}}\,({\textit{xs}},s)\;({\textit{fAbs}}\;A)\)

Similarly, we obtain a sorted version of the semantic interpretation theorem. We define a sorted semantic domain to have the same components as a semantic domain from Sect. 4.4, plus sorting predicates \({\textsf {wls}}^{\mathbf{Dt}}\) and \({\textsf {wls}}^{\mathbf{Da}}\). Again, it is assumed that the semantic operators preserve sorting. Then Theorem 23 is adapted to sorted domains, additionally ensuring the sort preservation of \({\textsf {sem}}\) and \({\textsf {semAbs}}\).

6.5 Instantiating the Framework

In Sect. 6.1 we have seen how our general binding signature can be straightforwardly instantiated to particular syntaxes, such as those of \(\lambda \)-calculus and CCS. However, such raw instances are not convenient for the end user, because they are embedded too deeply: The constructors have to be applied indirectly, via the generic operator \({\textsf {Op}}\), to which the desired constructor symbol must be passed; sortedness information has to be carried around; etc. In other words, one incurs the usual inconvenience arising from instantiating universal algebra to fixed-signature algebras, such as groups or rings.

To address this, we have designed a systematic method for transferring a signature instance to the expected more shallowly embedded version of the instance. This involves creating native Isabelle/HOL types of terms for each sort of the signature and transferring all the term constructors and operators and all facts about them to these native types. The process is conceptually straightforward, but is quite tedious, and currently must be done by hand since we have not yet automated it. (But [99, §5] presents the successful automation of a similar kind of transfer.)

Fig. 6
figure 6

Constructors and operators on \(\lambda \)-terms and \(\lambda \)-abstractions

For the \(\lambda \)-calculus instance, this results in the countable type \(\mathbf{var}\) and the types \(\mathbf{Lterm}\) and \(\mathbf{Labs}\), of \(\lambda \)-terms and \(\lambda \)-abstractions, together with the constructors and operators shown in Fig. 6. In comparison to the types of the general-theory operators in Fig. 2, we see that in Fig. 6 the varsort information has disappeared and the generic constructor-managing operator \({\textsf {Op}}\) has been replaced by the concrete constructors \({\textsf {App}}\) and \({\textsf {Lm}}\). The latter are defined by applying \({\textsf {Op}}\) to the corresponding constructor symbols and to inputs matching their arities: \({\textsf {App}}\;X\;Y\) from \({\textsf {Op}}\;{{\textsf {app}}}\;(0 \mapsto X, 1 \mapsto Y)\;()\) and \({\textsf {Lm}}\;A\) from \({\textsf {Op}}\;{{\textsf {lam}}}\;()\;(0 \mapsto A)\). Because the \(\lambda \)-calculus syntax is single-sorted, the instance has only one freshness, one (parallel) substitution and one swapping operator for terms; for many-sorted syntaxes, we would have one such operator for each varsort–sort combination.
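The definitional step just described, deriving the concrete constructors from the generic \({\textsf {Op}}\), can be sketched as follows (a toy encoding of ours; the actual transfer produces native Isabelle/HOL types):

```python
# Shallow-embedding wrappers: App and Lm as thin definitions over Op.
def Op(delta, inp, binp):
    """Generic constructor: symbol, free input, bound input."""
    return ('Op', delta, inp, binp)

def Abs(x, X):
    return ('Abs', x, X)

def App(X, Y):
    # Op applied to the 'app' symbol and an input matching its free arity
    return Op('app', {0: X, 1: Y}, {})

def Lm(A):
    # 'lam' has no free input and one bound input
    return Op('lam', {}, {0: A})

x = ('Var', 'x')
assert Lm(Abs('x', x)) == ('Op', 'lam', {}, {0: ('Abs', 'x', x)})
```

The end user then works with App and Lm directly, never mentioning Op or constructor symbols.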

The theorems of the deep instance of the general theory are (isomorphically) transferred to the shallow embedding’s types. For example, here is what Theorem 27 becomes for the \(\lambda \)-calculus syntax, where \({\textsf {vars}}{\textsf {Of}}: \mathbf{param}\rightarrow \mathbf{var}\;\mathbf{set}\):

Theorem 29

Let \(\varphi : \mathbf{Lterm}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\) and \({\varphi {\textit{Abs}}}: \mathbf{Labs}\rightarrow \mathbf{param}\rightarrow \mathbf{bool}\). Assume:

(1) \(\forall p.\;\textsf {finite}\; ({\textsf {vars}}{\textsf {Of}}\;p)\)

(2) \(\forall x,p.\;\varphi \;({\textsf {Var}}\;x)\;p\)

(3) \(_{\textsf {App}}\)\(\forall X,Y,p.\; (\forall q.\,\varphi \;X\;q) \,\wedge \, (\forall q.\,\varphi \;Y\;q) \,\Longrightarrow \, \varphi \;({\textsf {App}}\;X\;Y)\;p\)

(3) \(_{\textsf {Lm}}\)\(\forall A,p.\; (\forall q.\,{\varphi {\textit{Abs}}}\;A\;q) \,\Longrightarrow \, \varphi \;({\textsf {Lm}}\;A)\;p\)

(4) \(\forall x,X,p.\; (\forall q.\,\varphi \;X\;q) \,\wedge \, x \not \in {\textsf {vars}}{\textsf {Of}}\;p \,\Longrightarrow \, {\varphi {\textit{Abs}}}\,({\textsf {Abs}}\;x\;X)\;p\)

Then \(\forall X,p.\;\varphi \;X\;p\) and \(\forall A,p.\;{\varphi {\textit{Abs}}}\;A\;p\).

In comparison to the general Theorem 27, again the sorts and varsorts have disappeared and the \({\textsf {Op}}\)-clause (3) has been replaced by concrete-constructor clauses: one for \({\textsf {App}}\) and one for \({\textsf {Lm}}\). In the recent draft [45] we give more details about two \(\lambda \)-calculus instances of our framework: the above one-sorted syntax, as well as a two-sorted syntax that distinguishes values from arbitrary terms.

6.6 End Product

All in all, our formalization provides a theory of syntax with bindings over an arbitrary many-sorted signature. The signature is formalized as an Isabelle locale [60] that fixes the types \(\mathbf{var}\), \(\mathbf{sort}\), \(\mathbf{varsort}\), \(\mathbf{index}\), \(\mathbf{bindex}\) and \(\mathbf{opsym}\) and the constants \({\textsf {asSort}}\), \({\textsf {arOf}}\) and \({\textsf {barOf}}\) and assumes the injectivity of \({\textsf {asSort}}\) and the \(\mathbf{var}\) properties (Assumptions 2 and 25). All end-product theorems are placed in this locale.

The whole formalization consists of 18700 lines of code (LOC). Of these, 3600 LOC are dedicated to quasiterms, their standard operators and alpha-equivalence. 4000 LOC are dedicated to the definition of terms and the lifting of results from quasiterms. Of the latter, the properties of substitution were the most extensive—2500 LOC out of the whole 4000—since substitution, unlike freshness and swapping, requires heavy variable renaming, which complicates the proofs.

The induction scheme presented in Sect. 5 is not the only scheme we formalized (though it is the most useful). We also proved a variety of lower-level induction schemes based on the skeleton of the terms and schemes that are easier to instantiate—e.g., by pre-instantiating Theorem 24 with commonly used parameters such as variables, terms and environments. Induction and iteration/recursion principles constitute 5000 LOC altogether.

The remaining 6100 LOC of the formalization are dedicated to transiting from good terms to sorted terms. Of these, 2500 LOC are taken by the sheer statement of our many end-product theorems. Another fairly large part, 1500 LOC, is dedicated to transferring all the variants of iteration and recursion (those from Sects. 4.1 and 4.2) and the interpretation Theorem 23, which require conceptually straightforward but technically tedious moves back and forth between sorted terms and sorted elements of the target domain.

7 Applications of the Framework

So far, we have instantiated our theory to the syntaxes of the call-by-name and call-by-value variants of the \(\lambda \)-calculus (the latter differing from the former by a separate syntactic category for values) and to that of many-sorted FOL. For these syntaxes, we performed several formal developments that take advantage of the theory’s infrastructure. These developments, spanning several years, have used different versions of the general theory, which itself has evolved taking inspiration from them.

The first development took place in 2010, when we formalized a proof of strong normalization for System F [92]. The two employed syntaxes, of System F’s Curry-style terms and types, are two copies of the \(\lambda \)-calculus syntax. The logical relation technique required in the proof made essential use of parallel substitution—and in fact was the incentive for us to go beyond unary substitution in the general theory. To streamline the development, on top of the first-order syntax we introduced HOAS-like definition and reasoning techniques, which were based on the general-purpose first-order ones shown in Sects. 4 and 5. In particular, the strong HOAS recursion principle stated in Prop. 4 from [92] was inferred using our semantic interpretation Theorem 23.

In subsequent work, we formalized several results about \(\lambda \)-calculus: the standardization and Church-Rosser theorems and the CPS translations between call-by-name and call-by-value calculi [85], an adequate HOAS representation of the calculus into itself, a sound interpretation via De Bruijn encodings [32], and the isomorphisms between different definitions of \(\lambda \)-terms: ours, the Nominal one [112], the locally named one [87] and the Hybrid one [35]. These results are centered around some translation/representation functions: CPS, HOAS, Church-Rosser complete development [105], De Bruijn, etc. — these functions and their desirable properties (e.g., preservation of substitution, crucial in HOAS [50]) were obtained as instances of our recursion theorems (Sect. 4).

Finally, in the context of certifying Sledgehammer’s HOL to FOL encodings [17], we formalized fundamental results in many-sorted FOL [22], including the completeness, compactness, Skolemization and downward Löwenheim-Skolem theorems. Besides the ubiquitous employment of the properties of freshness and substitution listed in Sect. 3.4, we used again the semantic interpretation Theorem 23 for the FOL semantics and the recursion Theorem 22 for quickly bootstrapping the (technically quite tricky) Skolemization function.

8 Related Work and Future Work

There is a large body of literature on formal approaches to syntax with bindings, many of them supported by proof assistants or logical frameworks. See [107, §2], [35, §6] and [89, §2.10,§3.7] for overviews. Our work follows a nameful paradigm for representing binders (8.1) and takes a universe-like approach to capturing syntax specified by an arbitrary binding signature (8.2); it is restricted to single-variable binders but caters for infinitely branching terms (8.3). We formalize a theory covering not only the syntactic constructors, but also other standard generic operators (8.4), and featuring nominal-logic-style induction and operator-sensitive recursion (8.5).

8.1 Binding Representation Paradigm

There are three main paradigms of reasoning about bindings: (1) the (first-order) nameful paradigm, (2) the nameless paradigm and (3) higher-order abstract syntax. To illustrate the differences between them, in what follows we slightly depart from this paper’s notation and let \(\mathbf{term}\) denote the type of \(\lambda \)-terms—i.e., terms for the particular syntax of \(\lambda \)-calculus. Moreover, we will write \({\textsf {Lm}}\) for the binding constructor of this syntax, and think of it as operating directly on terms (bypassing any explicit notion of abstraction).

In the nameful paradigm, binding variables are passed as arguments to the binding operator, so that \({\textsf {Lm}}\) has type \(\mathbf{var}\rightarrow \mathbf{term}\rightarrow \mathbf{term}\); and terms are usually equated modulo alpha-equivalence. The nameful paradigm is followed by most of the informal, pen-and-paper developments of logic, \(\lambda \)-calculi and programming languages, and is systematically pursued by Barendregt in his \(\lambda \)-calculus monograph [10]—where he introduced his famous variable convention. The best known rigorous account of this paradigm is offered by Gabbay and Pitts’s nominal logic. Originally developed within a non-standard axiomatization of set theory [41, 42], nominal logic was subsequently cast in a standard foundation [83, 84], and also significantly developed in a proof assistant context—most extensively by Urban and collaborators [108,109,110,111,112]. Our work belongs to the nameful paradigm, giving a formal expression to many ideas from nominal logic—but, as discussed below, departing from nominal logic by exploring notions such as infinitely-branching terms and substitution-aware recursion.
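To make the nameful representation concrete outside our Isabelle setting, the following is a minimal illustrative sketch in Python (the names `Var`, `App`, `Lm` and `alpha_eq` are our own hypothetical choices, not part of the formalization): the binding constructor carries its variable explicitly, and alpha-equivalence is checked by pairing up the binders encountered on each side.

```python
from dataclasses import dataclass

# Nameful lambda-terms: Lm carries its binding variable explicitly,
# so Lm has "type" var -> term -> term, as in the nameful paradigm.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lm:
    var: str
    body: object

def alpha_eq(s, t, env=()):
    """Alpha-equivalence: compare terms modulo renaming of bound variables.
    `env` records the pairs of binders crossed so far on each side."""
    if isinstance(s, Var) and isinstance(t, Var):
        for (x, y) in reversed(env):          # innermost binder wins
            if s.name == x or t.name == y:
                return s.name == x and t.name == y
        return s.name == t.name               # both variables are free
    if isinstance(s, App) and isinstance(t, App):
        return alpha_eq(s.fun, t.fun, env) and alpha_eq(s.arg, t.arg, env)
    if isinstance(s, Lm) and isinstance(t, Lm):
        return alpha_eq(s.body, t.body, env + ((s.var, t.var),))
    return False
```

For instance, `\x. x` and `\y. y` are identified, whereas `\x. x` and `\y. x` are not; in the formal development this identification is achieved by quotienting quasiterms, not by an executable check.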

In the nameless paradigm originating with De Bruijn [27], the bindings are indicated through nameless pointers to positions in a term. The \({\textsf {Lm}}\) constructor has type \(\mathbf{term}\rightarrow \mathbf{term}\) or a scope-safe variation of this, such as \((\mathbf{var}\;\mathbf{option})\,\mathbf{term}\rightarrow \mathbf{var}\;\mathbf{term}\) where \(\mathbf{var}\) is a parameter on which \(\mathbf{term}\) depends (a type variable) and \(\mathbf{option}\) is used to mark the binding positions, or using dependent types, \(\mathbf{term}_{n+1} \rightarrow \mathbf{term}_n\), to a similar effect. Thus, given a term t, \({\textsf {Lm}}\;t\) introduces in t a binding without specifying a variable to be bound, but relying on some pre-identified binding positions in t. Major exponents of the scope-safe nameless paradigm are representations based on presheaves [40, 55] and nested datatypes [5, 15]—interestingly, with seminal papers published in the same year (1999), and two at the same conference (LICS) as the seminal paper on nominal logic [41]. The presheaf approach has been generalized and refined in many subsequent works, e.g., [2,3,4, 39, 44, 54, 59].
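For contrast, here is an equally hypothetical Python sketch of the classic nameless representation: `Lm` stores no variable, occurrences are De Bruijn indices, and the price of name-freeness is paid in index arithmetic (`shift`) when moving a term under a binder.

```python
from dataclasses import dataclass

# Classic De Bruijn (nameless) terms: Lm binds without naming a variable,
# so Lm has "type" term -> term; occurrences point to binders via indices.
@dataclass(frozen=True)
class Ix:        # variable occurrence: 0 refers to the nearest enclosing Lm
    i: int

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lm:        # no variable name stored
    body: object

def shift(t, d, cutoff=0):
    """Shift free indices >= cutoff by d (needed when a term moves under a binder)."""
    if isinstance(t, Ix):
        return Ix(t.i + d) if t.i >= cutoff else t
    if isinstance(t, App):
        return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))
    return Lm(shift(t.body, d, cutoff + 1))

def subst(t, j, s):
    """Substitute s for index j in t; no alpha-renaming is ever needed."""
    if isinstance(t, Ix):
        return s if t.i == j else t
    if isinstance(t, App):
        return App(subst(t.fun, j, s), subst(t.arg, j, s))
    return Lm(subst(t.body, j + 1, shift(s, 1)))
```

Here terms form a free datatype with no quotienting, which is precisely what makes the nameless recursion principles directly available (as discussed in Sect. 8.5).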

A paradigm quite significantly different from the other two is higher-order abstract syntax (HOAS) [29, 33, 35, 37, 50, 77, 79, 81]. Based on ideas going back as far as Church [30], Huet and Lang [56] and Martin-Löf [74, Chapter 3], HOAS truly gained traction with the works of Harper et al. [50], Pfenning and Elliott [80] and Paulson [77] in the late eighties. HOAS essentially embeds the binders of the represented system (referred to as the object system) shallowly into the meta-logic’s binder. This shallow embedding leads to higher-order arguments to the binding constructors. There are two main versions of HOAS. The original version, sometimes called strong HOAS, endows \({\textsf {Lm}}\) with the type \((\mathbf{term}\rightarrow \mathbf{term}) \rightarrow \mathbf{term}\)—where the negative occurrence of \(\mathbf{term}\) makes it impossible to regard \({\textsf {Lm}}\) as an inductive datatype constructor, as supported by general-purpose proof assistants; consequently, strong HOAS is best pursued in HOAS-dedicated logical frameworks, such as Abella [9], Beluga [82], Delphin [96] and Twelf [81]. On the other hand, the weak version of HOAS [33], endowing \({\textsf {Lm}}\) with the type \((\mathbf{var}\rightarrow \mathbf{term}) \rightarrow \mathbf{term}\), is readily compatible with standard logical foundations and has been formalized in general-purpose proof assistants such as Coq [29, 33] and Isabelle [49]. HOAS often allows for lighter formalizations, thanks to borrowing binding mechanisms and sometimes structural properties from the meta-level. Unary substitution is also built into the representation: term-for-variable substitution in strong HOAS and variable-for-variable substitution in weak HOAS. Formalizations in this paradigm are often accompanied by pen-and-paper proofs of the representations’ adequacy (which involve informal reasoning about substitution) [50, 78].
In weak HOAS, the adequate representation relies on the exclusion of the so-called exotic terms, i.e., terms obtained by applying \({\textsf {Lm}}\) to non-uniform functions that do not correspond to abstractions—exclusion that can be achieved by either defining a custom predicate [33, 49] or keeping the type of variables generic and relying on parametricity [29].
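A hypothetical Python sketch conveys the weak HOAS idea and the exotic-term problem (again, all names here are our own illustrative choices): `Lm` takes a host-language function from variables to terms, and a non-uniform such function yields a "term" that corresponds to no syntactic abstraction.

```python
from dataclasses import dataclass
from typing import Callable

# Weak HOAS: Lm has "type" (var -> term) -> term, borrowing the
# meta-level (here: Python) binder; variables are injected by Var.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass
class Lm:
    body: Callable  # a function from variable names to terms

# A genuine abstraction: the body uses its argument uniformly.
identity = Lm(lambda v: Var(v))          # represents \x. x

# An "exotic" term: the body inspects its argument, so it does not
# correspond to any syntactic abstraction.
exotic = Lm(lambda v: Var("a") if v == "b" else Var(v))

def to_nameful(t, fresh=0):
    """Read back a weak-HOAS term into a nameful string, by feeding
    each binder a fresh variable name (only faithful for uniform bodies)."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, App):
        return f"({to_nameful(t.fun, fresh)} {to_nameful(t.arg, fresh)})"
    x = f"x{fresh}"
    return f"(\\{x}. {to_nameful(t.body(x), fresh + 1)})"
```

Note that the readback cannot even detect exoticness by probing a single fresh variable, which is why the exclusion of exotic terms requires a dedicated well-formedness predicate or a parametricity argument, as discussed above.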

Some approaches in the literature combine two paradigms. For example, the locally nameless approach [8, 28, 86] employs a nameless representation of bindings, but stores a distinct type of variables that can occur free; this enables some essentially nameful techniques for dealing with free variables (similar to those of nominal logic). Other examples are the Hybrid system [35] and the “HOAS on top of FOAS” approach [92], which develop HOAS reasoning techniques over locally nameless and nameful representation substrata.

Moreover, it should be emphasized that the three paradigms do not differ fundamentally in the employed datatype of terms. Regardless of whether terms are defined by quotienting quasiterms to alpha-equivalence, defined as a free datatype using nameless binders, or encoded using a meta-level binder, they yield essentially the same concept. The isomorphisms between different nameful and nameless representations are well-known, and have also been established formally [6, §3], [76], [89, §2.9.6]. For HOAS developments, the equivalence with an (informal) nameless definition is usually expressed through an adequacy theorem.

The essential differences between the paradigms lie in their respective definitional and reasoning styles, which are customized and optimized for the different types of the binding constructors. Thus (as also illustrated in this paper), defining functions recursively amounts to endowing the target type T with operators matching the arities of the term constructors. For example, in the case of \({\textsf {Lm}}\), the types of these operators are (variations of) \(\mathbf{var}\rightarrow T \rightarrow T\), \(T \rightarrow T\), \(\prod _{n}\,(T_{n+1} \rightarrow T_n)\) and \((\mathbf{var}\rightarrow T) \rightarrow T\) for the nameful, (classic) nameless, well-scoped nameless and weak HOAS representations, respectively—these entail different styles of recursive definitions (whose correctness may or may not require discharging proof obligations) and inductive proofs (which may or may not require auxiliary lemmas). A deep analysis of these tradeoffs is beyond the scope of this paper, but Sect. 8.5 offers some high-level points of comparison.
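The nameful case of this recursion pattern can be sketched as a generic fold (a hypothetical Python illustration, with names of our choosing): defining a function into a target type `T` amounts to supplying one operator per constructor, with the `Lm` operator of type var → T → T. On raw terms this is plain structural recursion; the paper's point is that on alpha-quotiented terms such a definition is correct only when the supplied operators satisfy suitable conditions (such as our FSb/FSw model conditions).

```python
from dataclasses import dataclass

# Nameful term datatype plus a generic fold: the target type T is endowed
# with operators matching the constructor arities (for Lm: var -> T -> T).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lm:
    var: str
    body: object

def fold(t, var_op, app_op, lm_op):
    """Structural recursion: var_op : var -> T, app_op : T -> T -> T,
    lm_op : var -> T -> T."""
    if isinstance(t, Var):
        return var_op(t.name)
    if isinstance(t, App):
        return app_op(fold(t.fun, var_op, app_op, lm_op),
                      fold(t.arg, var_op, app_op, lm_op))
    return lm_op(t.var, fold(t.body, var_op, app_op, lm_op))

# Example instance: counting binders, with lm_op ignoring the bound variable
# (a degenerate case for which alpha-compatibility is immediate).
def binders(t):
    return fold(t, lambda x: 0, lambda a, b: a + b, lambda x, n: n + 1)
```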

8.2 Approach to Handling Arbitrary Signatures

There are many formal developments in proof assistants that focus on particular syntaxes, e.g., that of a chosen logic or programming language. In this subsection and the next one we only discuss developments that, like ours, cover an entire class of syntaxes.

An expressive type theory such as Agda’s or Coq’s easily caters for generic developments using universes, where types can be referred to and manipulated via codes. Most of the universe constructions for syntax with bindings follow the (locally) nameless paradigm, e.g., [2, 62, 64, 65]. However, universes for the other two paradigms have also been explored: [62] also features an alternative (parametric) weak HOAS representation, and [31] features a nameful (nominal) representation.

An alternative to generic development via universes is the code generator approach, which is preferred in proof assistants based on weaker logics, such as HOL. Code generators, also known as definitional packages in the HOL literature, produce the datatypes and associated theorems dynamically (using tactics), for each user-specified syntax. This is the approach taken for Nominal Isabelle, but also for several tools for Coq, such as LNgen [6] (supporting a locally nameless representation), DBGen [88] (supporting the classic nameless representation of De Bruijn), and Autosubst [98, 102] and Needle&Knot [63] (supporting scope-safe nameless representations).

Our work targets a weak logic, but is close to the universe approach—except that instead of type codes we use deeply embedded sorts, and develop our whole theory, including universal-algebra-style recursion principles, in a deep embedding.

From a theoretical perspective, the universe approach has a wider appeal, as it models “statically” the meta-theory in its entire generality. However, a code generator is often more practical, since most proof assistant users only care about the particular instance syntax used in their development—and code generators can deliver fine-tuned solutions here. By contrast, in our case, as we discuss in Sect. 6.5, simply instantiating the signature with a particular syntax is not entirely satisfactory, for which reason we need to engage in a process of transferring theorems to a more shallow representation. In the future, we plan to have this transfer fully automated, obtaining the best of both worlds, namely a universal algebra theory that caters for a statically certified code generator.

Note that much of our instantiation overhead comes from the lack of expressiveness of HOL. Indeed, had the logic allowed us to speak of families of types indexed by sorts, the “raw” instances would have already been quite shallow. In addition, dependent type families would have made our generic development conceptually more direct, avoiding the application of operation symbols outside their arities (which currently requires correction via sortedness predicates); however, we believe this would not have represented a decisive boost in automation, since the proof obligations arising from dependent types would have had a similar complexity to those incurred by the sortedness predicates.

8.3 Generality of the Represented Syntax

Our constructors are restricted to binding at most one variable in each input—a limitation that makes our framework far from ideal for representing complex binders such as the let patterns in part 2B of the POPLmark challenge. By contrast, the specification language Ott [101], Isabelle’s Nominal2 definitional package [111], Coq’s Needle&Knot tool and our own recent “bindings as functors” representation [19] were specifically designed to address such complex, possibly recursive binders. Incidentally, Nominal2 separates abstractions from terms, like we do (but their abstractions are significantly more expressive).

On the other hand, our formalization seems to be the first to cover infinitely branching terms, and our founding of alpha-equivalence on the regularity of \(|\mathbf{var}|\) is also a theoretical novelty—constituting a less exotic alternative to Gabbay’s work on infinitely supported objects in nonstandard set theory [43]. This flexibility is needed when formalizing systems such as infinitary logics [61] and \(\lambda \)-calculi [58], as well as infinite-choice process algebra (for which infinitary structures have previously been employed to give semantics [67]).

8.4 Support for Generic Operators

The main goal of our work was to develop as much of the theory of syntax as possible for an arbitrary syntax. In particular, we wanted to define and study important binding-datatype-generic operators, in the spirit of polytypic programming [53] and universal-algebraic theories of syntax [40, 95, 104].

This development, performed in Isabelle, joins a series of generic formalizations and tools in Coq that offer support, i.e., definitions and theorems, for unary substitution (e.g., LNgen, GMeta, DBGen and Needle&Knot) and parallel substitution (Autosubst). In addition, Needle&Knot and GMeta go further and handle contexts and their associated term well-scoping predicates and variable lookup operators.

Our work seems to be the first to formalize generic support for the interpretation of terms in semantic domains—which in the meantime has also been developed in Agda within the well-scoped nameless paradigm, using a universe [2]. While widely applicable, our semantic interpretation principle incurs the usual limitations of (and can benefit from the usual workarounds for) HOL. For example, if we want to assign semantics to the simply-typed \(\lambda \)-calculus, we would first need to have all the simple types embedded in a larger type (which is possible by either restricting the full function space or employing a gentle extension of HOL [51]). Then we could invoke our principle to obtain an untyped interpretation, which in a second stage could be proved type-correct. This is clearly a more bureaucratic solution than what is offered by the state of the art in Agda [2].
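As a toy analogue of semantic interpretation (not the generic, signature-parametric Theorem 23, but a hypothetical Python sketch for the untyped \(\lambda \)-calculus), the interpretation of terms in a semantic domain is parameterized by an environment mapping variables to denotations, with binders interpreted as functions over the domain:

```python
from dataclasses import dataclass

# Nameful lambda-terms and their interpretation in a "semantic domain"
# (here: plain Python values and functions), relative to an environment.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lm:
    var: str
    body: object

def interp(t, env):
    """Interpret term t under env, a map from variable names to denotations."""
    if isinstance(t, Var):
        return env[t.name]
    if isinstance(t, App):
        # Application is interpreted as function application in the domain.
        return interp(t.fun, env)(interp(t.arg, env))
    # A binder denotes a function, obtained by extending the environment.
    return lambda d: interp(t.body, {**env, t.var: d})
```

In the untyped setting sketched here the domain is a single type; the two-stage approach described above (untyped interpretation first, type-correctness second) is what the HOL limitations would impose for the simply-typed case.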

8.5 Definition and Reasoning Principles

Within the nameful paradigm, a major strand in the literature is connected to the quest, pioneered by Gordon and Melham [48], for understanding terms with bindings modulo alpha as an abstract datatype. The state of the art is currently represented by the expressive induction and recursion principles of nominal logic [7, 31, 84, 108, 109] and by the essential variation of nominal recursion due to Norrish [75].

In this paper we formalized the nominal logic structural induction principle from [108], which was implemented in Nominal Isabelle (for a different notion of binding signature). By contrast, we did not formalize the nominal logic recursion principle. Instead, we chose to stay more faithful to a classical abstract datatype desideratum, generalizing to an arbitrary syntax our own schema for substitution-aware recursion [91] and a parameter-free version of Norrish’s schema for swapping-aware recursion [75]—both of which can be read as stating that terms with bindings are Horn-abstract datatypes, i.e., are initial models of certain Horn theories [91, §3,§8]. The difference between our recursion principles and the nominal logic ones rests in the identification of terms as forming initial objects in different categories: that of models of a Horn theory, versus that of some enriched nominal sets. Our recursion principles are better suited to handle situations where the finite support assumption no longer holds, as in one of our main applications: interpretation in semantic domains. On the other hand, unlike our recursion principles, the nominal principle (as well as Norrish’s) observes Barendregt’s convention in the recursion clauses, allowing the presence of additional parameters for which the binding variables are fresh. Finally, the nominal recursion principle is currently better supported in Isabelle by the automation delivered through the Nominal definitional package.

A different line of attack to recursion is taken within the nameless paradigm. The nameless recursion principles are more readily available thanks to the freeness of the term datatype (no quotienting involved). On the other hand, the nameful recursion principles stay closer to informal practice, but this happens at the price of the user (or ideally, the proof assistant) having to prove some conditions ensuring that the definition is correct—in our case, these are the FSb/FSw/FSbSw (recursion) model conditions. We refer to [13] for a detailed analysis of the relative merits of the nameful and nameless paradigms (in two particular variants). More insight into possibilities for automating or reducing the proof obligations resulting from quotienting could be offered by general-purpose datatype quotienting mechanisms such as Isabelle’s nonfree datatypes [99] or the higher inductive types of homotopy type theory implemented in Coq [11] and Lean [34].

In the nameless presheaf approaches, the recursion principles come from exhibiting the terms as initial models in specific presheaf toposes. To apply these principles, a substitution/renaming structure needs to be provided on the intended target domain, and some conditions need to be verified, with the reward that the defined function is guaranteed to preserve the additional structure—which leads to a situation somewhat similar to the nameful case, although admittedly the conditions to be verified are quite different. Recent formalization-supported investigations by Allais et al. [2, 3] and Kaiser et al. [59] have revealed lighter variations of the presheaf-based recursors (incurring fewer proof obligations for the user).

A quite different take on binding-aware recursion is pursued in HOAS-dedicated frameworks [38, 100], whose relationship with the other two paradigms is difficult to grasp due to the use of a substantially different logical foundation. A study of this relationship would be interesting future work. On the other hand, the weak HOAS induction and recursion principles are easier to compare since they are developed within standard foundations. Like the De Bruijn ones, they employ a free datatype of terms—namely, considering \({\textsf {Lm}}\) as an (infinitary) injective constructor of type \((\mathbf{var}\rightarrow \mathbf{term}) \rightarrow \mathbf{term}\). In fact, by virtue of working in a suitable presheaf topos where context extension is isomorphic to the function space from the presheaf of variables, some nameless presheaf representations yield a weak-HOAS-like recursor [40, 55] (but again, in a topos different from \(\mathcal {S}et\)). As another point of convergence, the recursor developed by Gordon and Melham [48] for alpha-quotiented terms (within the nameful paradigm) is essentially the weak HOAS one [33].

8.6 Ongoing and Future Work

Our theory currently addresses mostly structural aspects of terms. A next step would be to cover behavioral aspects, such as binding-aware proof rules associated to SOS formats, perhaps building on existing Isabelle formalizations of process algebras and programming languages (e.g., [12, 66, 72, 90, 93, 94]).

A major test for our theory will be its application to a realistic programming language, featuring several syntactic categories with many constructors and a large operational semantics. We have no reason to doubt that our theory would scale for such a development, but we cannot know for certain until we try. One aspect that could prove helpful when dealing with a large syntax may be refraining from switching to a shallow representation (as described in Sect. 6.5), but staying within a deep representation, i.e., having a single type of terms classified by a sortedness predicate. For example, it may be more efficient from a proof-engineering perspective (requiring less proof script) to have a single generic substitution operator, as in the deep representation, rather than a combinatorial explosion of such operators for each combination of variable type and term type. (See [97, §7] for the discussion of a similar phenomenon for a medium-sized syntax.)

Another line of work (which we pursue together with other colleagues) is the development of support for bindings that gets full traction from Isabelle’s existing infrastructure—namely its definitional package for inductive and coinductive datatypes [20] based on bounded natural functors (BNFs), which follows a compositional design [106] and provides flexible ways to nest types [21] and mix recursion with corecursion [18, 25]. Theoretical progress with our ongoing effort on this front is reported in the recent paper [19], where BNFs are refined into what we call map-restricted BNFs (MRBNFs), a notion of functor that accommodates bindings as first-class citizens. In addition to inheriting the modularity of the BNF framework, this work covers complex bindings à la POPLmark and beyond. Moreover, it further stretches our regular-cardinal technique to allow non-well-founded, coinductive terms (such as Böhm trees [10]) in addition to infinitely branching terms. This development is only in a prototype phase, with a formalization having been developed for a fixed signature (modeled as a fixed MRBNF). It does not follow the current paper’s universe-like approach, but will be implemented as a definitional package.