Keywords

2010 Subject Classification

Solomon Feferman engaged with model theory in three different ways. In the 1950s he did fundamental work on the first-order theories of generalised sums and products, for which the name ‘Feferman-Vaught theorem’ will ensure that he never goes forgotten. Between the 1960s and the 1980s he followed a trajectory that began with applications of interpolation theorems in model theory, and led to work on set-theoretic generalisations of model theory. In more recent years he reshaped himself as a historian of ideas, recalling and explaining aspects of the development of mathematical logic through the twentieth century; model theory takes a significant place in this work, largely thanks to Feferman’s connections with Tarski and the editing of Gödel’s collected works.

These three themes seem rather distinct, though they probably had various interesting links in Feferman’s own mind. So we will treat them separately.

My thanks to Jamshid Derakhshan, Janos Makowsky and Jeffery Zucker for various kinds of information and advice, and (posthumously, alas) to Solomon Feferman himself for a number of preprints of his work that he sent me over the years.

1 Logic When Feferman Entered the Field

In his valuable book with Anita Burdman Feferman, on the life and works of Alfred Tarski, Solomon Feferman recalls ([11] p. 171) that he went to work at Berkeley in 1948 as ‘an eager new graduate student in mathematics’. He attended Tarski’s lectures in metamathematics at Berkeley in 1949/50 ([26] footnote p. 80). By 1953 he was working on his doctoral thesis under Tarski’s supervision, and simultaneously acting as Tarski’s course assistant ([28] p. 388).

In the early 1950s the boundaries of logic were in a state of flux. Was the kind of thing that Tarski was doing at that date a branch of philosophy or of mathematics, or in the intersection of the two? Tarski himself had distinguished metamathematics from mathematics back in the 1930s. In 1950 he spoke of working ‘on the borderline of algebra and metamathematics’ [55], and he continued to put some of his work on the metamathematical side of the mathematics/metamathematics divide right up to 1978, as witness the title of [7]. By the time I began work in model theory in 1965, I think most model theorists regarded themselves as straight mathematicians—as I did—and the notion of metamathematics never entered their heads.

One memory I have is that in 1966 my supervisor John Crossley arranged for me to meet Georg Kreisel with a view to getting some advice on possible research topics. Kreisel’s suggestions struck me as aimed more at foundations than model theory, and I didn’t pursue his advice. This story is relevant to Feferman, because it appears that Kreisel had greater success drawing Feferman towards similar problems a year or so earlier; a background paper of Kreisel had the title ‘Set theoretic problems suggested by the notion of potential totality’ ([42], written in 1959 and cited in [32]).

Not only was the boundary between mathematics and philosophy unclear; the divisions within purely mathematical logic were unclear too. The now-standard division into Set Theory, Recursion Theory, Proof Theory and Model Theory seems to go back to the early 1960s. It reached the AMS Subject Classification in 1973. Around 1971 I remember Robin Gandy complaining that Jack Silver had been invited to give some talks in model theory and had spoken on set theory (or maybe it was the other way round); Silver’s topic had been the use of forcing to prove the consistency of some propositions about model-theoretic two-cardinal theorems. Soon afterwards Saharon Shelah was inventing proper forcing to prove results in his model-theoretic classification theory. The problem that Tarski first proposed to Feferman was about decidability, but Feferman responded to it by proving results in model theory. A few years later Feferman was proving results in model theory by means of proof theory.

There will always be crossovers, but the boundaries between model theory and the other main divisions of mathematical logic have been reasonably robust for the last thirty years or so. The fact that they were not robust when Feferman came into the subject shows up in Feferman’s work in two ways. The first is that nearly all of his so-called model-theoretic work contains at least a taste of some other logical discipline. Today this work of his seems to us strikingly interdisciplinary, though it may not have seemed so at the time. The second is that Feferman, being a reflective kind of person, drew on his experience to write a number of very interesting papers about the interactions between different parts of logic in the formative years of model theory.

In his paper for the Proceedings of the Tarski Symposium 1974 ([17] footnote p. 205), Feferman tells us that the paper which he presented to the Symposium itself ‘was entitled Model theory and foundations. It dealt with three areas of mutually fruitful interaction which I had found between model theory and work in proof theory and constructivity’. One of these involved the application of many-sorted interpolation theorems, and we review it in Sect. 3 below. The second was about model-theoretic functors; we discuss it briefly at the end of Sect. 2 below. The third was on’ordinals and functionals in proof theory’, as reported in [15]. Looking at that paper [15], I don’t see anything in it that would be regarded as model theory today. But this fact in itself points to questions about the boundaries of model theory, which we come back to in Sect. 4.

2 The Feferman-Vaught Theorem

The Feferman-Vaught theorem is as much a technique as a theorem. A typical case of the theorem is that when the structure A is a cartesian product, say \(A = \prod _{i\in {I}} B_i\) with \(I \ne \emptyset \), we can reduce the question whether a first-order sentence \(\phi \) is true in A to a question whether a certain set-theoretic formula \(\Phi _{\phi }\) is satisfied in the power set \(\mathcal {P}(I)\) of the index set I by a certain finite sequence of sets \((\parallel \theta _0\parallel ,\ldots ,\parallel \theta _{n-1}\parallel )\), where \(\parallel \theta \parallel \) is the set \(\{i \in I : B_i \models \theta \}\). The reduction is effective in the sense that there is an algorithm which computes \(\Phi _{\phi }\) and the sentences \(\theta _0, \ldots ,\theta _{n-1}\) uniformly from \(\phi \). The technique that gives this special case applies to a wide range of other constructions.

Some results provoke the question ‘How on earth did anybody think of doing that?’. The Feferman-Vaught theorem surely provokes that question; but in this case the history is well recorded, so we can answer the question. The answer runs as follows.

Starting in 1927, Tarski ran a seminar at Warsaw University, in which he and his students developed the method of ‘elimination of quantifiers’, which had been created a few years earlier by Leopold Löwenheim and Thoralf Skolem and pursued also by Cooper H. Langford. As practised by Skolem and Langford, the method was to take a structure A and show that for each first-order sentence \(\phi \) in the language of A, we can effectively find a first-order sentence \(\phi '\) which is true in A if and only if \(\phi \) is true in A, and which is a boolean combination of sentences of some simple form; we can call these simple sentences the ‘elimination set’. In some cases the elimination set would contain only quantifier-free sentences, so that passing from \(\phi \) to \(\phi '\) eliminates quantifiers completely. But in general these sentences might contain quantifiers, though they would typically be all universal or all existential. (Cf. [11] pp. 72–75.)

Early successes of this programme were Presburger’s quantifier elimination for the structure of integers with addition, and Tarski’s own quantifier elimination for the structure of the real numbers with addition and multiplication. Tarski also devoted some of his early efforts to extending Langford’s work on dense and discrete linear orderings. In 1938 Tarski and Andrzej Mostowski (a loyal student of Tarski, though officially his doctoral thesis supervisor was Kuratowski) began working on quantifier elimination for ordinals, in languages involving one or more of <, ordinal addition and ordinal multiplication. Mostowski’s notes were lost during the war (they were in the notebook that he famously had to abandon in favour of a loaf of bread, [41] pp. 6f), and he never reconstructed them, though some of the main results were published in an abstract in 1949 [51]. Tarski later published a reconstruction of this work in a joint paper with his student John Doner [7]. One of the results was that the simply ordered structure of the ordinals has a decidable first-order theory. The obvious next result to try to prove was that the ordered structure of the ordinals with ordinal addition has decidable first-order theory, and this was one of the two tasks that Tarski set Feferman for his doctoral thesis ([31] p. 37).

Feferman approached the problem from the point of view of elimination of quantifiers, as Tarski no doubt expected him to do. Tarski may have given him some information from the joint work with Mostowski. But also a very pertinent paper of Mostowski had just appeared [50]. In this paper Mostowski proved quantifier elimination results for structures A that allow a decomposition as a cartesian power of another structure, say \(A = B^I\) for some index set I. His elimination set was not in the first-order language of A; instead it also allowed one to refer to \(\mathcal {P}(I)\) and sets \(\parallel \theta _i \parallel \) as above. He showed that his results adapt also to ‘weak powers’ which are substructures of cardinal powers whose domains consist of all the elements that differ from e in at most finitely many factors, where e is a designated element of B (for example the identity element if B is a group, or 0 if B is an ordinal). From Mostowski’s own account it seems that he thought he was not so much striking out in a new direction as finding a common generalisation of several results already proved by quantifier elimination. One application that he mentions is of particular interest to us here. He considers the ordinal \(\alpha \) and the weak product of \(\alpha \) copies of \(\omega \), with sums taken pointwise. The resulting structure is isomorphic to \(\omega ^{\alpha }\) with natural addition; by Mostowski’s results this structure has a decidable first-order theory. This is less interesting than \(\omega ^{\alpha }\) with ordinal addition, but it suggests ways of approaching ordinal addition. In fact by late 1953 Feferman was able to follow this route and show that for every positive integer n, \(\omega ^n\) with ordinal addition has a decidable first-order theory. His main adjustment to Mostowski’s scheme was that he allowed the elimination formulas to refer to other features of the construction of A from B, for example a linear ordering of the index set I.

In 1956 Robert Vaught (also a student of Tarski) noticed that Feferman’s methods generalised to the case where A is a product of a family \((B_i : i \in I)\) of distinct structures; there was no need to assume that the factors were all isomorphic. Almost at once this gave a common generalisation both of Feferman’s earlier work, and of Vaught’s own work on sentences true in cartesian products. They wrote up the results as their joint paper ‘The first order properties of products of algebraic systems’ [33].

To explain how the technique works, we begin with the case of cartesian products, so that A has the form \(\prod _{i \in I} B_i\) with \(I \ne \emptyset \). We write L for the first-order language of A and the \(B_i\). If \(\bar{a}\) is a tuple of elements of A and \(i \in I\), then we write \(\pi _i\bar{a}\) for the tuple of projections to \(B_i\) of the terms of \(\bar{a}\). If \(\theta (\bar{x})\) is a formula of L, we write \(\parallel \theta (\bar{a}) \parallel \) for the set

$$\begin{aligned} \{ i \in I : B_i \models \theta (\pi _i\bar{a}) \}. \end{aligned}$$
(1)

Feferman and Vaught prove a theorem stating that for each formula \(\phi (\bar{x})\) of L there are a boolean formula \(\Phi _{\phi }\) and formulas \(\theta _0(\bar{x}), \ldots , \theta _{k-1}(\bar{x})\) of L such that for any cartesian product \(A = \prod _{i\in {I}}B_i\) of structures for the language L, and every tuple \(\bar{a}\) of elements of A,

$$\begin{aligned}&A \models \phi (\bar{a}) \ \text {if and only if the sequence}\ (\parallel \theta _0(\bar{a})\parallel ,\ldots ,\parallel \theta _{k-1}(\bar{a})\parallel ) \nonumber \\&\text {satisfies}\ \Phi _{\phi }\ \text {in the power set algebra} \ \mathcal {P}(I) \end{aligned}$$
(2)

Moreover \(\Phi _{\phi }\) and the formulas \(\theta _i\) can be found effectively from \(\phi \), and the formulas \(\theta _i\) can be chosen so that independently of A and I, the sets \(\parallel \theta _0(\bar{a})\parallel ,\ldots ,\parallel \theta _{k-1}(\bar{a})\parallel \) form a partition of I (though some partition sets may sometimes be empty).

We state one corollary about decidability. Suppose the factor structures \(B_i\) are required to be models of a decidable theory T (i.e. the set of consequences of T is recursive). Given any sentence \(\phi \) of L, we can determine as follows whether A can be chosen to be a model of \(\phi \). Find the formula \(\Phi _{\phi }\) and the sentences \(\theta _j\) (\(j < k\)) as in (2). Skolem’s quantifier elimination reduces the condition on the righthand side of (2) to a boolean combination of conditions of the form

$$\begin{aligned} \mathcal {P}(I) \models \text { For at least}\ p\,i' \text {s}, i \in \bigcap _{j\in Y} \parallel \theta _j \parallel \setminus \bigcup _{j \notin Y} \parallel \theta _j \parallel , \end{aligned}$$
(3)

where p is a positive integer and \(Y \subseteq k\). Since the sets \(\parallel \theta _j \parallel \) form a partition of I and I is not empty, (3) is false unless Y is a singleton \(\{j_0\}\), and in this case (3) will be true if and only if there are at least p factors \(B_i\) that are models of \(\theta _{j_0}\). Such factors can be found if and only if \(T \cup \{\theta _{j_0}\}\) is consistent, which can be checked since T is decidable. The rest is book-keeping.

The Feferman-Vaught theorem, in the form stated above, is proved by induction on the complexity of \(\phi \). The most interesting case is where \(\phi (\bar{x})\) is \(\exists y \phi '(\bar{x},y)\), and for this reason Feferman and Vaught say that the proof is by the ‘method of eliminating quantifiers’ ([33] p. 66). I am not entirely convinced that this is the best description. It seems to me more illuminating to compare with Jaakko Hintikka’s ‘distributive normal forms’ [39]. Let me digress briefly on these.

For simplicity suppose the language L has finitely many relation symbols and no function symbols. We will find, for all \(n, r < \omega \), a finite set \(\Theta _{n,r}\) of formulas \(\theta (x_0, \ldots ,x_{n-1})\) of L, such that

$$\begin{aligned}&\text { For each fixed } n \ \text { and }\ r, \text { if }\ \Theta _{n,r}\ \text { consists of the distinct formulas } \nonumber \\&\theta _0, \ldots , \theta _{k-1}\ \text { then } \vdash \forall \bar{x}\ \text { (Exactly one of } \ \theta _0(\bar{x}), \ldots , \theta _{k-1}(\bar{x}) \ \text {holds)}. \end{aligned}$$
(4)

We will assume that the set \(\Theta _{n,r}\) is given a fixed ordering, so that we can regard it as a sequence. (Feferman and Vaught [33] p. 64 call a sequence satisfying the condition in (4) a ‘partitioning sequence’.)

When \(r = 0\) we choose \(\Theta _{n,0}\) so that the formulas describe the complete quantifier-free types of n-tuples.

For \(r > 0\) we proceed by induction on r. Suppose \(\Theta _{n+1,r}\) has been defined and consists of the formulas \(\theta _0(x_0,\ldots ,x_n), \ldots , \theta _{k-1}(x_0,\ldots ,x_n)\). List as \((s(j') : j' < k' = 2^k)\) all the subsets of k. Then we define \(\Theta _{n,r+1}\) to consist of all the formulas \(\psi _{j'}(x_0,\ldots ,x_{n-1})\) \((j' < k')\) where \(\psi _{j'}\) is

$$\begin{aligned} \bigwedge _{j\in s(j')} \exists x_n \theta _j(x_0,\ldots ,x_n) \wedge \forall x_n \bigvee _{j\in s(j')} \theta _j(x_0,\ldots ,x_n). \end{aligned}$$
(5)

We will need the following two consequences of this definition. First,

$$\begin{aligned} \vdash \ \exists x_n \theta _j(x_0,\ldots ,x_n) \leftrightarrow \bigvee \{\psi _{j'}(\bar{x}) : j \in s(j')\} \end{aligned}$$
(6)

for each \(j < k\). And second,

$$\begin{aligned}&\text { We can effectively find, for each formula }\ \phi (x_0,\ldots ,x_{n-1})\ \text {of}\ L, \text {an} \nonumber \\&r_0 < \omega ,\ \text { and for each }\ r \geqslant r_0\ \text { a subset } \Theta \ \text { of } \Theta _{n,r}, \text { such that } \phi \ \text { is } \nonumber \\&\text { logically equivalent to } \bigvee \Theta . \end{aligned}$$
(7)

See [39] for further information. The families of formulas \(\Theta _{n,r}\), or close relatives of them, appear also in the theory of Ehrenfeucht-Fraïssé games, and in Scott sentences.

We return to the Feferman-Vaught theorem, as stated above for cartesian products. It will be helpful to add a clause to the theorem, namely that the formulas \(\theta _0, \ldots , \theta _{k-1}\) are the formulas of \(\Theta _{n,r}\), where n is the length of the tuple \(\bar{a}\) and r is found from \(\phi \). (If \(\phi \) is written so as to remove any nesting of function symbols, then r can be taken to be the quantifier rank of \(\phi \).) This guarantees that the \(\theta _i\) can be found effectively from \(\phi \), and that the sets \(\parallel \theta _i(\bar{a})\parallel \) always form a partition of I. So the proof of the theorem reduces to finding an appropriate formula \(\Phi _{\phi }\) for (1). We concentrate on the most significant case, which is where \(\phi (\bar{x})\) is \(\exists x_n\phi '(\bar{x},x_n)\).

By induction hypothesis there is a formula \(\Phi _{\phi '}\) such that for any \(A = \prod _{i\in {I}} B_i\) and any \(\bar{a}, b\) in A we have

$$\begin{aligned} \text {A}\ \models \phi '(\bar{a},b)\ \text { if and only if } \mathcal {P}(I) \models \Phi _{\phi '}(\parallel \theta _0(\bar{a},b)\parallel ,\ldots ,\parallel \theta _{k-1}(\bar{a},b)\parallel ) \end{aligned}$$
(8)

where \(\theta _0, \ldots ,\theta _{k-1}\) lists the formulas of \(\Theta _{n+1,r-1}\). If \(\psi _0(\bar{x}), \ldots ,\psi _{k'-1}(\bar{x})\) are the formulas of \(\Theta _{n,k}\), we want to find \(\Phi _{\phi }\) so that for all A and \(\bar{a}\) as above,

$$\begin{aligned} \text {A} \models \exists y \phi '(\bar{a},y) \text { if and only if } \mathcal {P}(I) \models \Phi _{\phi }(\parallel \psi _0(\bar{a})\parallel ,\ldots ,\parallel \psi _{k'-1}(\bar{a})\parallel ). \end{aligned}$$
(9)

Now we know that if the lefthand side of (9) holds then

$$\begin{aligned}&\text { There is a partition } I = I_0 \cup \ldots \cup I_{k-1}\ \text {such that} \mathcal {P}(I) \models \nonumber \\&\Phi _{\phi '}(I_0,\ldots ,I_{k-1}) \text { and for each } j < k, I_j \subseteq \parallel \exists y \theta _j(\bar{a},y) \parallel . \end{aligned}$$
(10)

Namely, choose b so that \(A \models \phi '(\bar{a},b)\) and put \(I_j = \parallel \theta _j (\bar{a},b)\parallel \), recalling that the \(\theta _j\) are a partitioning sequence. Conversely if (10) holds, then we can use the definition of cartesian product to choose elements from the \(B_i\) to form an element b of A with \(I_j = \parallel \theta _j(\bar{a},b) \parallel \) for each \(j < k\), and then it follows by (8) that \(A \models \exists y \phi '(\bar{a},y)\). So what we need is to find a formula \(\Phi _{\phi }\) so that the righthand side of (9) says the same as (10). This reduces to translating ‘\(I_j \subseteq \parallel \exists y \theta _j(\bar{a},y)\parallel \)’ into a condition on \(I_j\) and the sets \(\parallel \psi _{j'}(\bar{a})\parallel \), for each \(j < k\). But by (6),

$$\begin{aligned} \parallel \exists y \theta _j(\bar{a},y)\parallel = \bigcup \{\parallel \psi _{j'} (\bar{a})\parallel : j \in s(j')\}, \end{aligned}$$
(11)

and from this the definition of \(\Phi _{\phi }\) is straightforward.

Cartesian products are only a particular case for the Feferman-Vaught theorem; but they already contain the worst headaches. Feferman and Vaught show how to extend the technique to other kinds of product or sum by adding extra relations to the power set \(\mathcal {P}(I)\); for example a linear ordering on I can be coded up as a relation between singleton subsets of I. Then the required structure can be taken to be a definable substructure of the cartesian product. The extra relations are incorporated into the formulas by taking the formulas in \(\Theta _{n,0}\) to be formed not from quantifier-free formulas, but from formulas—added to the language L if necessary—that define sets of the form

$$\begin{aligned} \{\bar{a} \text { in } A: \mathcal {P}(I) \models \Phi (\parallel \eta _0(\bar{a})\parallel , \ldots , \parallel \eta _{k-1}(\bar{a})\parallel ) \} \end{aligned}$$
(12)

for any \(\eta _j\) in L and any suitable \(\Phi \). In this way the added complexity is fed in at the bottom level of the induction; the induction steps proceed exactly as before, since they are still done in the cartesian product. This approach has great generality, but for particular applications of the theorem there are often easier routes.

The Feferman-Vaught theorem, distributive normal forms, the Fraïssé back-and-forth method and Ehrenfeucht games have close relationships, and it is not at all surprising that they can be used to prove the same or similar results. In fact Feferman soon became aware that Roland Fraïssé [35] had a back-and-forth proof of results overlapping his own on elementary equivalence of ordinals as ordered sets, and that Andrzej Ehrenfeucht [8] found a back-and-forth proof of the decidability conjecture that Tarski had given Feferman for his thesis. Feferman reported on these results to the Summer Institute at Cornell in 1957 [12]. His report is clear, but it doesn’t go into the details of how back-and-forth equivalences imply degrees of elementary equivalence. This is the part of the back-and-forth theory where the Hintikka distributive normal forms appear. He may not have appreciated at once the common underlying themes of Feferman-Vaught and back-and-forth methods.

Nor did he know Hintikka’s ideas. In fact Hintikka had no contacts with the Berkeley group until later. He had visited Williams College in Massachusetts in 1948–9, and he later recalled unsuccessfully trying to explain distributive normal forms to Quine during that visit ([38] p. 11). But in any case Hintikka had only the theory, and nothing like the concrete applications that the Berkeley group could claim. (Fraïssé should count as a member of the Berkeley group. In 1995 I asked Fraïssé about his contacts with Tarski. My recollection is that he told me he attended Tarski’s Berkeley seminar for most of a year sometime in the early 1950s. The brief reference in Feferman and Feferman [11] p. 315 is consistent with this.)

It seems that after publishing his joint paper with Vaught, Feferman did no further work in the area. But the Feferman-Vaught theorem took on a life of its own. Model theorists used it to prove results about elementary equivalence and elementary embeddings, and computer scientists used it to prove a wide range of algorithmic results. Both groups extended the scope of the theorem. The generalisation from first-order logic to monadic second-order logic (which has quantifiers ranging over the class of sets of individuals) proved particularly useful. Yuri Gurevich [37] p. 479 comments ‘... monadic [second-order] logic definitely appears to be the proper framework for examining generalized products’.

One strikingly fruitful area of application in computer science was to the complexity of model-checking of properties of various structures, for example graphs and transition systems. The survey by Janos Makowsky [47] both illustrates and extends the applications to graph properties definable in monadic second-order logic. Ingo Felscher [34] reviews the use of the Feferman-Vaught technique in connection with composition theorems for transition systems. These two papers make clear both the wide range of applications, and the curious status of the Feferman-Vaught theorem on the borderline between theory of models and theory of algorithms. On the model-theoretic side one should note an incisive application of the Feferman-Vaught theorem to the model theory of adeles in number theory, by Jamshid Derakhshan and Angus Macintyre [6].

The work that Feferman did on back-and-forth methods led him to one further development. In [16] he defined a class of functors F between classes of structures, and showed for example that if F is such a functor defined on a class \(\mathbf {K}\) of structures, and AB in \(\mathbf {K}\) are equivalent in the infinitary language \(L_{\infty ,\kappa }\) for a suitable cardinal \(\kappa \), then F(A) and F(B) are also equivalent in \(L_{\infty ,\kappa }\). The functor F also preserves elementary embedding in the sense of \(L_{\infty ,\kappa }\). Feferman notes ([16] footnote p. 83) that he first found these results for the special case of ordinal functions. So proof theory hovers in the background. In fact this paper is one of several where Feferman raises the question whether there is a good notion of ‘natural’ well-ordering for use in proof theory.

Paul Eklof [9] describes some developments from Feferman’s [16]. Not least is Eklof’s own theorem that for any two universal domains \(K_1, K_2\) (in the sense of Weil) and any functor F preserving direct limits, if \(K_1\) and \(K_2\) have the same characteristic then \(F(K_1)\) is \(L_{\infty ,\omega }\)-equivalent to \(F(K_2)\). This is a strong form of Lefschetz’s Principle ([9] p. 430).

3 Applications of Interpolation Theorems

In 1953 Evert W. Beth [2] published a remarkable theorem about definability. Suppose T is a first-order theory and R a relation symbol that occurs in sentences of T. We say that R is implicitly defined by T if whenever A and B are models of T that have the same domain and agree in the interpretations of all nonlogical symbols except perhaps R, the structures A and B do in fact give R the same interpretation. This is a model-theoretic condition. We say that R is explicitly defined by T if there is a formula \(\phi (\bar{x})\) in the first-order language of T, not containing any occurrences of R, such that

$$\begin{aligned} T \ \vdash \ \forall \bar{x} \ (\phi (\bar{x}) \leftrightarrow R\bar{x}). \end{aligned}$$
(13)

This is a logical condition on R and T. But it is not specifically a syntactic or a model-theoretic one, because the consequence relation \(\vdash \) can be defined equally well by a proof calculus or model-theoretically; the completeness theorem says we get the same consequence relation either way.

Beth proved that these two conditions on R and T, namely implicit definability and explicit definability, are equivalent. It’s trivial that if R is explicitly defined by T then R is implicitly defined by T too. The interest lies in the other direction. By proving that in this first-order context every implicitly definable relation is explicitly definable, Beth turned an old heuristic idea of Padoa [53] into a theorem.

To prove his theorem, Beth had to extract a formula \(\phi \) somehow from the information that R is implicitly defined by T. Seeing no other way, he converted implicit definability into a statement that a certain proof exists, and then used Gentzen’s analysis of cut-free proofs to carve \(\phi \) out of the proof. More precisely, he reasoned: let the theory U be the same as T except that in U every occurrence of R is replaced by an occurrence of the new relation symbol S. The completeness theorem tells us that if R is implicitly defined by T, then there is a proof of

$$\begin{aligned} \forall \bar{x} \ (R\bar{x} \leftrightarrow S\bar{x}) \end{aligned}$$
(14)

from \(T \cup U\).

Beth’s argument for extracting \(\phi \) from this proof was not very clean. William Craig, who reviewed Beth’s paper for the Journal of Symbolic Logic in 1956 [3], thought about it and saw a much neater way to express the relevant proof-theoretic information. This became Craig’s Interpolation Theorem, which he published in 1957 [4]. Feferman [27] recalls that he first heard Craig’s theorem at the July 1957 Cornell Summer Institute, where Craig gave a talk on it. Craig showed, by using his own variant of Gentzen’s proof theory, that if \(\psi \) and \(\chi \) are first-order sentences such that

$$\begin{aligned} \psi \ \vdash \ \chi \end{aligned}$$
(15)

and some relation symbol occurs in both \(\psi \) and \(\chi \), then there is a first-order sentence \(\theta \) (the ‘interpolant’), which contains no relation symbols except those occurring both in \(\psi \) and in \(\chi \), such that

$$\begin{aligned} \psi \ \vdash \ \theta \ \ \ \text { and } \ \ \ \theta \ \vdash \chi . \end{aligned}$$
(16)

(This is Theorem 5 of [4].)

Beth’s result is an easy consequence, as follows (Craig [5]). Use compactness to shrink T down to a single sentence \(\tau \), and U to a corresponding sentence \(\sigma \), and add new constants \(\bar{c}\) to the language. Then implicit definability gives that

$$\begin{aligned} \tau \wedge \sigma \ \vdash \ R\bar{c} \rightarrow S\bar{c}. \end{aligned}$$
(17)

or after rearrangement

$$\begin{aligned} \tau \wedge R\bar{c} \ \vdash \ \sigma \rightarrow S\bar{c} \end{aligned}$$
(18)

where R occurs only on the left of \(\vdash \) and S only on the right. By Craig’s theorem there is an interpolant \(\theta (\bar{c})\) such that

$$\begin{aligned} \tau \wedge R\bar{c} \ \vdash \ \theta (\bar{c}) \ \ \ \text { and } \ \ \ \theta (\bar{c}) \ \vdash \sigma \rightarrow S\bar{c}. \end{aligned}$$
(19)

From this, elementary manipulations give

$$\begin{aligned} \tau \ \vdash \ \forall \bar{x} (R\bar{x} \rightarrow \theta (\bar{x})) \ \ \ \text { and } \ \ \ \sigma \ \vdash \ \forall \bar{x} (\theta (\bar{x}) \rightarrow S\bar{x}). \end{aligned}$$
(20)

Changing S back to R, and noting that R and S occur nowhere in \(\theta \), we reach the explicit definition

$$\begin{aligned} \tau \ \vdash \ \forall \bar{x} (R\bar{x} \leftrightarrow \theta (\bar{x})), \end{aligned}$$
(21)

which proves Beth’s theorem.

Note that this derivation ignores Craig’s condition that some relation symbol occurs in both \(\psi \) and \(\chi \). One can justify this by arguing that the condition can always be met by ensuring that the language includes at least one of \(=\) and \(\perp \), and then making trivial adjustments to \(\psi \) and/or \(\chi \). Craig’s theorem and its various descendents all tend to have this feature, that a correct statement involves some nuisance conditions that make only a marginal difference to applications. The discussion below will ignore these conditions.

Soon after Craig published his theorem, Roger Lyndon [44] showed, using cut elimination, that we can put more conditions on the interpolant. In particular we can require that every relation symbol occurring positively in \(\theta \) occurs positively in both \(\psi \) and \(\chi \), and likewise with ‘negatively’ for ‘positively’. He deduced from this [45] that every formula preserved in surjective homomorphisms between models of a first-order theory T is equivalent modulo T to a positive formula. Lyndon’s result suggested a programme: find other conditions that can be put on the interpolant formula, and use them to give similar characterisations of other model-theoretic notions. It was above all Feferman who carried this programme through, though as he remarks in [27], he began work on this only some ten years after he heard Craig’s talk in 1957.

In his course on proof theory at the 1967 European JSL meeting in Leeds [13], Feferman showed in detail how to adapt cut elimination so as to impose further conditions on the interpolant. His most striking innovation was that the language is many-sorted. Thus for example besides obeying Craig’s or Lyndon’s condition on relation symbols, the interpolant contains no variables of a given sort unless such variables occur in both \(\psi \) and \(\chi \); and for every sort s the interpolant contains no universal (resp. existential) quantifiers of sort s unless \(\psi \) (resp. \(\chi \)) contains a universal (resp. existential) quantifier of sort s. Caution: for this result to be correct, we need to assume that the only truth functions in the language are \(\wedge \), \(\vee \) and \(\lnot \), and that \(\lnot \) never occurs in the formulas involved except immediately in front of atomic formulas. (Or equivalently, an occurrence of a quantifier is ‘universal’ if either the occurrence is positive and the quantifier is \(\forall \), or the occurrence is negative and the quantifier is \(\exists \); and analogously with ‘existential’.) In the title of his [17] Feferman speaks of ‘many-sorted interpolation theorems’ in the plural. The reason is that his interpolation theorem of [13], like the Feferman-Vaught theorem, is as much a method as a theorem. The method can be adapted to prove various interpolation theorems for various applications.

Speaking loosely, the kinds of application of interpolation theorems that Feferman harvested generally involve two features: a property that is in some generalised sense implicitly definable is shown to be explicitly definable by a certain type of formula, and some relationship between structures is involved. The two features are connected by the fact that the implicit definition is given in terms of the relationship between structures.

Take for example the Łoś-Tarski theorem [43], which is easily derived by Feferman’s method, though it was originally found in another way. We have a first-order theory T and a property P of structures. For this theorem the relevant ‘implicit definability’ of P says that if A and B are both models of T and B is an extension of A, and B has property P, then A also has property P; and that the property P is expressible by a first-order sentence \(\phi \). The ‘explicit definability’ says that there is a prenex first-order sentence \(\theta \) with only universal quantifiers, such that P is expressible by \(\theta \) in all models of T.

The significance of the many sorts in Feferman’s interpolation is that it allows him to talk about two or more structures by assigning different sorts to the structures. Anatoliĭ Mal’tsev [48] had explored a similar use of many-sorted logic.

Applying this idea to the Łoś-Tarski theorem, suppose we take two sorts, 1 and 2, and assign expressions to the sorts by using 1 and 2 as superscripts. Then T can be written in sort 1, say as \(T^1\), and it has an exact copy \(T^2\) in sort 2. Wherever \(T^1\) has a relation symbol \(R^1\), \(T^2\) has a corresponding relation symbol \(R^2\). Feferman allows equality \(=\) to run across all sorts. So a pair of structures A, B that are models of T, with B an extension of A, can be written as a model of

$$\begin{aligned} T^1, T^2, \text {Ext} \end{aligned}$$
(22)

where Ext says

$$\begin{aligned} \forall x^1 \exists x^2 (x^1 = x^2) \wedge \forall \bar{x}^1 (R^1 \bar{x}^1\leftrightarrow R^2\bar{x}^1) \wedge \ldots \end{aligned}$$
(23)

and the missing part at ‘\(\ldots \)’ repeats for all relation symbols S etc. the statement made for R. The statement that \(\phi \) is implicitly defined in the relevant sense can be written

$$\begin{aligned} T^1, T^2, \text {Ext}, \phi ^2 \vdash \phi ^1. \end{aligned}$$
(24)

Similar adjustments to those we made for Beth’s theorem transform this entailment into

$$\begin{aligned} \tau ^1 \wedge \text {Ext} \wedge \lnot \phi ^1 \ \vdash \ \ \tau ^2 \rightarrow \lnot \phi ^2. \end{aligned}$$
(25)

We find an interpolant \(\theta \), and without loss of generality we can take it to be prenex. Since no variables or relation symbols of sort 1 occur on the righthand side of (25), \(\theta \) lies in sort 2. But then no universal quantifiers can occur in \(\theta \), since in that case a universal quantifier of sort 2 would have to occur on the lefthand side of (25), and inspection of Ext shows that none do.

So \(\theta \) is a prenex existential sentence in sort 2, and we have

$$\begin{aligned} T^2 \ \vdash \ \theta \rightarrow \lnot \phi ^2. \end{aligned}$$
(26)

We also have

$$\begin{aligned} T^1, \text {Ext} \ \vdash \lnot \phi ^1 \rightarrow \theta \end{aligned}$$
(27)

which holds in particular when the two sorts are identified. Under this identification Ext becomes logically true, and there remains that

$$\begin{aligned} T \ \vdash \ \phi \leftrightarrow \lnot \theta \end{aligned}$$
(28)

where \(\lnot \theta \) is logically equivalent to a prenex sentence with only universal quantifiers. Using constants as in the proof of Beth’s theorem allows us to generalise the result from sentences \(\phi \) to formulas \(\phi (\bar{x})\).

By varying the sentence Ext, together with other suitable adjustments, we can capture various other relations between structures. One of Feferman’s first applications of this approach was a result parallel to the Łoś-Tarski theorem, but for end-extensions between transitive models of set theory. The natural conjecture is that in this case the formula \(\phi \) should be equivalent, modulo set theory, to a \(\Pi _1\) formula, and in [14] Feferman showed that this conjecture is true. The question arose at least partly from Feferman’s discussions with Kreisel about related results in higher order logic ([32], as mentioned in Sect. 1 above).

Another result of a similar ilk appears in [23].

Let me mention two more applications. They have very different characters: the first of them feeds directly into nontrivial questions of algebra, while the second is about logical properties of quantifiers.

The first application [18] can be thought of as a generalisation of Beth’s theorem to situations where we add not just a relation symbol but also new elements. We consider structures B that are models of a theory T in a language L, such that B has a relativised reduct A (a substructure in a language \(L^-\) where some relation or function symbols of L may be dropped), and A is picked out by a relation symbol R in the language L. The ‘implicit definability’ condition on T says that if B and \(B'\) are two models of T, with corresponding relativised reducts A and \(A'\), then every isomorphism from A to \(A'\) extends to an isomorphism from B to \(B'\). The ‘explicit definability’ condition on T (which Feferman calls the ‘uniform reduction’ property) says that if \(\phi (\bar{x})\) is a formula of L, then there is a formula \(\theta (\bar{x})\) of \(L^-\) such that if B is any model of L with corresponding relativised reduct A, and \(\bar{a}\) is a tuple of elements of A, then

$$\begin{aligned} B \models \phi (\bar{a}) \ \ \Leftrightarrow \ \ A \models \theta (\bar{a}). \end{aligned}$$
(29)

The theorem says that when the languages L, \(L^-\) are first-order and T satisfies the implicit definability condition, then T has the uniform reduction property. (In fact the theorem in this form is due to Haim Gaifman [36] p. 31, who indicates that Feferman’s interpolation results for many-sorted logic can be used to prove it. In [18] Feferman cites Gaifman [36] but gives the theorem in a broader and more abstract form.)

A simple example is where \(L^-\) is the language of rings, L is \(L^-\) with an added 1-ary relation symbol R, and T says of any model B that B is a field which is the field of fractions of the subring A picked out by R. The theorem tells us that in any model B of T, every first-order property of a tuple \(\bar{a}\) of elements of A in B can be translated uniformly into a first-order property of \(\bar{a}\) in A. This is not hard to check directly, using the fact that every element of B can be written as a ratio of elements of A. But the theorem applies also in cases where the elements of B are not explicitly definable in terms of elements of A.

In fact it’s a natural question whether the Gaifman-Feferman result can be extended to say that if T satisfies the implicit definability condition, then in every model B of T with relativised reduct A, the elements of B can be uniformly defined in terms of elements of A. But group cohomology prevents any such theorem. If it was true, then the automorphism group of A would lift homomorphically to a group of automorphisms of B. The paper [10] presents many counterexamples in finite abelian groups.

In the second application [30], Feferman encodes second-order logic by letting the second sort range over relations on the domain of the first sort. Then a generalised quantifier is encoded as a relation which has one or more arguments of the second sort. For example existential quantification is the relation \(\exists (p)\) which is satisfied by a 1-ary relation p of the first sort if and only if p is nonempty. In this setting a Beth-type theorem says that if a quantifier is implicitly definable by a theory (which can be a first-order theory but with its relations expressed as constants of the second sort), then the quantifier is explicitly definable by the theory. If one spells out the details, this ingeniously tightens up a proof sketched by Jeffery Zucker in [58], stating that if a generalised quantifier is completely determined by its quantifier rules (which have to be first-order expressible in the appropriate sense), then it is first-order definable.

By the early 1960s it was known that there are several ways of proving Craig’s interpolation theorem without going through proof theory. Abraham Robinson found a proof by building an isomorphism of structures through unions of chains. Similar insights led to proofs using saturated structures, or structures with some variant of saturation. Another cluster of proofs use techniques for building models from atomic sentences upwards; these techniques are closely related to cut-free proofs. They include the consistency properties of Smullyan [46] and the similar construction of models by Hintikka sets. Starting in the 1960s, a number of people adapted these various techniques to give model-theoretic proofs of any and all interpolation theorems (e.g. Stern [54], Otto [52]). Also some results that were first proved by many-sorted interpolation were given proofs not using any kind of interpolation (e.g. Marker [49]). But Feferman’s interpolation theorem itself is still impressive for its uniform treatment of a wide range of cases. His use of sorts for encoding is a valuable tool; and he was the first person to make many of the applications, in particular those that combine model theory with ideas from other areas of logic—such as definability of truth, or end-extensions of models of set theory.

As we saw earlier, in the 1950 and 1960s there was no general understanding that model theory had any intrinsic connection with first-order languages. When a result was proved for first-order logic, people would soon ask whether it holds for other languages too. Feferman’s interpolation theorem was an important contributor to this work, because it lifts smoothly to any countable admissible language, and hence to \(L_{\omega _1\omega }\) which is the union of the countable admissible languages. Feferman’s student Jon Barwise wrote his doctoral thesis on the proof theory and model theory of admissible languages.

In 1985 Barwise and Feferman together edited the massive volume Model-Theoretic Logics [1], which brought together a vast amount of work on model-theoretic properties of various logics. In his Preface Feferman notes that he and Barwise had ‘turned to other interests in the latter part of the 1970s’ (p. vii), and the initiative for the book came not from them but from the editors of ‘Perspectives in Mathematical Logic’. In fact Feferman wrote none of the chapters. Notoriously the book has no index, but if it had one there would certainly be many entries under ‘Feferman, S.’. Thus Jouko Väänänen in his chapter on ‘Set-theoretic definability of logics’ [57] remarks (p. 599) that an important tool throughout his chapter is Feferman’s notion of adequacy to truth. This notion is a formalisation of the notion of implicit definability of truth within a logic, and it comes from a paper [19] where Feferman applies the interpolation theorem machinery to relate implicit and explicit definability of truth in a logic.

We should add too that Feferman contributed to this area of research not only through specific results that he proved, but also through helping to establish it as a substantive part of logic. For example Janos Makowsky kindly tells me that a preprint of Feferman’s paper [17] was in circulation in 1971 and led him to the topic of his PhD thesis. He names Daniele Mundici, Jouko Väänänen, Jonathan Stavi and Saharon Shelah as other logicians whose work around that time took inspiration from this preprint.

A paper that touches on model-theoretic logics, but is rather orthogonal to Feferman’s other work in model theory, is his study [20] of some results of Nigel Cutland on effective model theory. Unlike most writers in this vein, Feferman looks not for an effective analogue of model theory, but for a common abstract framework that includes both the classical set-theoretic version and effective analogues. This leads us neatly into our final section.

4 The Concept of Model Theory

In his paper for the Proceedings of the 1974 Tarski Conference [17] p. 206, Feferman refers to

(30)

It’s clear that one reason why Feferman mentions Tarski’s aims in this direction is that they resonate with Feferman’s own thinking. Feferman writes about Tarski’s conceptual analyses in several places (e.g. [24, 26]). In one unpublished paper he notes several conceptual problems that drove much of his own work [21]—though none of his listed problems are directly model-theoretic.

That said, I am not entirely clear how Feferman in 1974 saw Tarski’s conceptual work as related to the ‘model-theoretic languages’ that he refers to. Did he think that identifying the class of ‘model-theoretic languages’ was itself a task parallel to defining truth and satisfaction? Or did he merely mean that Tarski by giving a precise definition of satisfaction, rather than leaving it to intuition, had made it easier to develop model theory in an abstract setting?

A third possibility is that Feferman regarded it as a conceptual advance, similar in some sense to Tarski’s conceptual advances, to think of assessing logics by formal properties that they have. This is after all what Lindström contributed, with his proof that first-order logic can be picked out from other logics by some very general and abstract properties that it has (for example compactness and downward Löwenheim-Skolem). This possibility is worth mentioning because Feferman himself quite often used this kind of assessment of logics.

One central example of this is Feferman’s interest in the question of what terms (i.e. meaningful expressions) in a language are ‘logical’. Two of his published papers [22, 29] discuss criteria for terms to be counted as ‘logical’, for example Tarski’s criterion in terms of permutations of the universe. He also has a paper [25] in which he says his ‘main purpose’ is to examine in what sense Hintikka’s Independence-Friendly Logic ‘deserves to be called a logic’ (p. 454). In all these papers, Feferman discusses and assesses criteria for a term to be regarded as ‘logical’. In fact his work on Zucker’s paper, discussed in the previous section, was intended as an application of just such a criterion. The criterion was that a ‘logical’ quantifier should be determined uniquely, up to logical equivalence in an obvious sense, by its proof rules. Feferman’s result, following Zucker, shows that any such quantifier is explicitly definable in first-order logic, and this supports Feferman’s own intuition that ‘logical’ terms should be limited to first-order ones.

Here I am reporting what Feferman says, though I think I am not the only logician who finds it hard to see what question is being answered. Expressions like ‘logic’ and ‘logical’ get their meaning from their usage among people called logicians. This usage is likely to vary from year to year as the subject develops. When the community of logicians fragments, as it inevitably does, the notion of ‘logic’ comes to vary between the fragments. So Feferman’s appeal to ‘the traditional conception of logic’ ([29] p. 17) could be an appeal to a set of concepts that nobody still uses. (Compare ‘the traditional conception of probability’, bearing in mind that probability used to be treated as a branch of logic.) But let me hold my sceptical tongue.

In [27] Feferman recalls the correspondence between Beth, Tarski and himself in 1953, about Beth’s definability theorem and its proof. He says:

$$\begin{aligned}&\text {A draft of Beth's proof was sent to Tarski in May 1953, and Tarski}\nonumber \\&\text {discussed it with me at that time (I was in Berkeley then, working}\nonumber \\&\text { with him as a student). From Tarski's point of view, since the}\nonumber \\&\text {statement of Beth's definability theorem is model-theoretic, there}\nonumber \\&\text {ought to be a model-theoretic proof, and there was correspondence}\nonumber \\&\text { (via me) with Beth about how that might be, accomplished} \nonumber \\&\text {(cf. Van Ulsen 2000, pp. 136ff).} \end{aligned}$$
(31)

This interests me very much. To my eye the correspondence reported by Van Ulsen [56] p. 138 can more easily be read in the opposite sense. On 22 June 1953 Feferman wrote to Beth

$$\begin{aligned}&\text {Your solution of the problem is really a solution of a problem}\nonumber \\&\text { in proof theory and only incidentally an application of }\nonumber \\&\text {[the completeness theorem]. Indeed, it seems to me that}\nonumber \\&\text {your main result has its proper phrasing as follows: If}\ A, B \nonumber \\&\text {are formulas symmetric in }\ a, b\ \text { (which are consistent), and if }\nonumber \\&\forall x_1 \ldots \forall x_k(a(x_1,\ldots ,x_k) \leftrightarrow b(x_1,\ldots ,x_k))\ \text {is derivable from}\; A \wedge B\nonumber \\&\text {by elementary logic, then [and here follow statements about}\nonumber \\&\text {derivability in first-order logic]. From this theorem it is, of}\nonumber \\&\text {course, a quick step to the solution of your problem via the [completeness}\nonumber \\&\text { theorem]. I believe it is worthwhile putting the problem} \nonumber \\&\text { in this form, since then the difference between your problem}\nonumber \\&\text {and the problem of exhibiting models for independence of axioms}\nonumber \\&\text { is quite sharply pointed up.} \end{aligned}$$
(32)

In short, Feferman in 1953 argued that Beth’s theorem was really a proof-theoretic theorem and ought to be stated and proved in a form which makes this clear.

Feferman’s statement in 1953 seems to me very much in line with what we know Tarski was saying about Padoa’s method in the 1920 and 1930s. Namely, the method should be made rigorous by removing the model-theoretic content and rewriting it as a statement about deducibility in the appropriate logic. (See for example [40].) In the 1930s the appropriate logic was higher-order; for Beth’s theorem it is first-order. Interestingly Van Ulsen ([56] p. 137) does include an item that closely fits Feferman’s later account in [27]:

$$\begin{aligned}&\ldots \ \text {in particular, Tarski has emphasized the desirability of establishing}\nonumber \\&\text { the Interpolation Theorem by methods independent of}\nonumber \\&\text { the theory of proof.} \end{aligned}$$
(33)

But this is a quotation from Lyndon [44] in 1959!

The point to note is that Van Ulsen’s quotation shows Feferman in 1953 describing Beth’s theorem as a solution of a problem in proof theory, in line with Tarski’s earlier perceptions. Today I think everybody regards Beth’s theorem as relating an overtly model-theoretic notion (implicit definability) to a more basic logical notion (explicit definability). How far is this a different perception of the theorem itself? How far is it a reflection of how the boundaries between the branches of logic were moving in the mid 1950s?

So here we have a question not about what should count as logic, but about what did actually count as proof theory, or as semantics, or as model theory, in the early to mid 1950s. It was a time of paradigm change, and Solomon Feferman was right at the heart of the action, with a mental ear well tuned to uses and adjustments of concepts. Everything that he has told us about the understanding of those notions in Berkeley at that time is gold dust for present and future historians of logic.