1 Introduction

This is the first paper to present the formal darwinism project for mathematicians. It presents all the results of Grafen (2002, 2006b) completely anew, but also combines into a single model all their separate features, correcting some errors and making explicit all needed assumptions. While the earlier papers have relevant discussion of the biological motivation and background, from a mathematical point of view, the current paper should be regarded as the starting point of the formal darwinism project.

The first explicit links of this kind between gene dynamics and optimization were made by Grafen (2002), for a population evolving in discrete time with non-overlapping generations, considering the effects of uncertainty (e.g. climate). The starting point for this work is the Price equation, which records how the frequency of a given allele changes from the parent generation to the offspring generation. Grafen (2006b) considers the more complicated case of a population with a class structure, e.g. age, ploidy, etc., and considers the problem of rigorously defining Fisher’s notion of reproductive value (Fisher 1930), still in discrete time, and without the effects of uncertainty. As part of this Grafen derives versions of the Price equation for a class-structured population. Given that this equation is the starting point for proving links to optimization in Grafen (2002), it seems natural to investigate what corresponding optimization results we can derive in the case of a class-structured population.

This paper provides analogous results in the class-structured case to the fundamental links proved in Grafen (2002). To avoid the apparently substantial and more sophisticated problems inherent in a Markov process including uncertainty (by which a general expression for reproductive value and hence our maximand would be defined) we have to make some assumptions so that there is no population-level effect of uncertainty, i.e. that the total offspring distribution over classes is independent of the state of nature. A current line of investigation in the formal darwinism project is to allow uncertainty to interact more substantially with the population over time. Neither do we allow any social interactions between individuals; the full extension of the project in the context of inclusive fitness, begun in Grafen (2006a), is the next challenge.

As a preliminary to proving the links to optimization, we re-derive the Price equation for a class-structured population, following the original derivation in Grafen (2006b), but making some changes for clarity and correctness. Our general results are illustrated by an examination of the case of a finite age-structured population, which has traditionally been analyzed by means of so-called Leslie matrices.

The final contribution we make is to provide (in discrete time) an explicit statement, and a complete and rigorous derivation, of Fisher’s celebrated Fundamental Theorem of Natural Selection (Fisher 1930), placing the theorem within our wider optimization programme, which seems to formalize mathematically its original and natural conceptual context.

2 Biological motivation

It has been a frequent and recurrent theme in biology since Darwin that natural selection has a tendency to result in the maximization of something. Herbert Spencer’s phrase ‘survival of the fittest’ led to the term ‘fitness’ being used for whatever that relevant quantity might be. Many significant advances in biology, as well as a number of recurrent debates, hinge on what exactly fitness is, who or what should be regarded as doing the maximizing, and how the maximization can be formalized. These ideas can be found and pursued in textbooks such as Davies et al. (2012).

Many empirical biologists today base research projects and paradigms on the idea that natural selection leads to individual organisms acting as if maximizing their ‘inclusive fitness’, subject to the physical, physiological and informational constraints on the development and behaviour of an individual. An intermediate level of biological theory, represented by the theory of evolutionarily stable strategies (Maynard Smith 1982) and inclusive fitness theory (originated by Hamilton 1964), simply assumes that individuals act as if maximizing their Darwinian fitness (roughly, lifetime number of offspring) or their inclusive fitness (a more sophisticated concept that recognizes social behaviour), and studies topics of interest on that basis. However, the most fundamental level of biological theory, mathematical population genetics, has long been resistant to the idea that any useful maximization principle can be derived from the known processes of gene frequency change, which are modelled using difference or differential equations.

From a mathematical point of view, the obvious candidate principles are those familiar to students of dynamical systems, such as Lyapunov functions and gradient functions, and the general conclusion of the literature is that only under rather special and not very interesting conditions are population genetic systems of a kind that will admit these kinds of functions (Ewens 2004). The main premiss of the current paper is that the empirical biologist’s individual-based idea of fitness-maximization can be done justice only in a more sophisticated setting, in which the equations of motion are taken as fundamental in representing gene frequency change, an optimization programme is constructed from those equations of motion in which the implicit decision-taker is the individual organism, and then links are proved between the equilibrium concepts of the equations of motion on the one hand, and of the optimization programme on the other. The maximand of the optimization programme plays the role of fitness, and a major interest lies in the nature of that maximand, and in how tightly the maximand is defined by the structures imposed on it. The stronger the links between the equilibrium concepts, the more constrained is the nature of the maximand, and so the more precisely the concept of fitness is defined.

In the present work, the population of individuals is assumed to be divided into classes, such as sexes and/or size and/or age. There are discrete overlapping generations, so that in a formal sense an individual may have a special kind of asexual offspring that is itself surviving to the next period, as well as contribute gametes to new individuals. The population may be finite or infinite, as may the set of classes. An individual takes an action in each period which affects the offspring it leaves in the next period. Those offspring are also affected by the individual’s class and by the ‘state of nature’, which is drawn from a set of possible outcomes of ‘actions by nature’, with a given probability distribution. Thus the offspring distribution across classes produced by an individual depends on its own action, its class, and uncertainty, but we impose the restriction that it does not depend on the actions of others, so social behaviour is excluded. A further restriction is that the uncertainty is assumed not to affect the class demographics, that is, the projection operator from classes to classes has no stochastic component. The action taken by an individual depends on its phenotype, which we assume is determined by its genotype, and also by informational cues that make it possible for an individual to condition its choice of action on aspects of the uncertainty.

The aim is to prove a fitness-maximization principle at a very high level of generality. So far as gene frequencies are concerned, we employ the covariance selection mathematics of Price (1970), though not the generalization in Price (1972a), and very weak assumptions, which allow us not to say anything about mating systems, linkage, or some other potential complications. On the optimization side, we consider the equilibrium concepts of an optimization programme in which an individual organism is the decision-taker. The programme represents a sophisticated individual, who has a prior distribution over all the relevant uncertainties, and who updates this distribution in the light of information received. For an individual to solve this programme implies that the prior distribution is correct, and that the updating is optimal Bayesian. The links are of two kinds. Three results make an assumption related to how individuals fare in the optimization programme, and draw conclusions about gene frequency change. A fourth result makes an assumption about gene frequency change, and draws a conclusion that individuals each solve the optimization programme.

The first explicit mathematical fitness-maximization principle in biology was the Fundamental Theorem of Natural Selection of Fisher (1930), and the current work can be viewed as an extension and generalization of this theorem. Fisher’s theorem has been notoriously hard to understand, and his arguments are famously opaque. The early rejection (endorsed and reviewed by Ewens 1979) by mathematical population geneticists of the theorem must now be read in the context of the exposition of the theorem and proofs by Price (1972b). The current view in that discipline, as represented by Ewens (2004), is that the fundamental theorem is true, and Fisher’s proofs are valid give or take some minor typographical and other errors (see Lessard 1997 for a careful modern derivation), but that the meaning of the theorem remains obscure. Recent papers focus on the meaning (Okasha 2008; Ewens 2011) but come to no firm conclusion. The present paper contributes to the debate on fitness-maximization by generalizing the Fundamental Theorem of Natural Selection to include arbitrary classes rather than just age classes, and to allow for arbitrary uncertainty at an individual level; and by making fully explicit the nature of the optimization in the conceptual scheme. Our presentation should indicate that interest in the theorem is not merely historical: whereas much of the literature on the theorem seems wholly motivated by solving the enigma of what one man meant by one theorem, and the result is left to stand alone and justify itself, we, contrastingly, demonstrate that it sits very naturally alongside related results about the optimizing behaviour of natural selection.

The high level of generality sought in this paper and in the formal darwinism project more generally is important for two reasons. First, the fewer the assumptions, the more widely the framework will apply as a meta-model: that is, the assumptions of other models will not contradict those made here. Our results then offer an optimization interpretation of the results of those previous models. Application as a meta-model also explains one significance of dealing with infinite populations, which are admittedly hard to find in nature. The assumptions we do make are so weak as to be what population geneticists call ‘dynamically insufficient’ (Lewontin 1974), that is, they use a lot of information about the parental generation and calculate only one piece of information about the offspring generation, so one cannot ‘crank the handle’ and repeat the process. One virtue is that the framework can apply as a meta-model to models with different detailed assumptions about mating systems and genetic architecture. Second, biologists today read Darwin (1859) and agree with his conclusions for the reasons he gave. If Darwin’s arguments are generally valid, then there must be a formal framework in which they can be expressed. Darwin did not concern himself with continuous versus discrete time, ploidy levels, or whether generations were overlapping or not, and his arguments apply regardless: so should ours. This encompassing of Darwin’s argument within a formal framework will reduce the scope for misunderstanding and misinterpretation. The broad aim of the paper and the project is to justify a fitness-maximization principle for understanding the outcome of natural selection, which will involve defining fitness, in as wide a setting as possible. Indeed, Fisher’s theorem is best viewed as showing that the change in the mean of a quantity equals the variance of that same quantity, and it is natural to regard that quantity as what increases under natural selection wherever possible. Significantly, that quantity, whose exact nature is too technical to explain in this section, is intimately related to the maximand in the optimization programme of our Theorems 2–5.

3 Notation and concepts

We set out to define an extremely general model of a biological population that has an arbitrary class-structure, with genotypes in an arbitrary set, and phenotypes in an arbitrary set. Each class may have its own ploidy (number of haploid sets in the genome). There will be arbitrary environmental uncertainty that, together with its phenotype, affects the number of offspring each individual has. The way phenotype affects offspring number is important. Each individual possesses partial knowledge of the uncertainty (it observes a ‘cue’), and the phenotype dictates how the value of that cue in turn determines an action. The action, together with the whole uncertainty and an individual’s class, determines its number of offspring. No restrictions are placed on how genotype determines phenotype. Thus, the model is extremely general: its chief restrictions are that it is in discrete time, the environmental uncertainty does not affect the class-to-class projection at a population level, and an individual’s offspring number is not affected by phenotypes other than its own. Formally, this model is a simultaneous generalization of two previous papers (Grafen 2002, 2006b), using a single consistent mathematical argument that replaces less sophisticated, and in places erroneous, arguments.

We first outline the basic structure of how we shall describe a class-structured population reproducing under the effects of environmental uncertainty. Our aim is generality, and so we impose only the minimum mathematical structure required, and we make only the weakest assumptions necessary to ensure the discussion makes sense. This has the consequence that some technical effort has to be made to attain results which are unremarkable in more familiar settings, for example being able to exchange the order of integration. Since our aim is mathematical rigour, it is appropriate to demonstrate that these results can be formally justified, and that our conclusions are indeed valid in this generality.

Let \((I,\mathcal{I },\mu _I)\) and \((\varOmega ,\mathcal{O },\nu )\) be probability spaces, representing the population and states of nature respectively. We remark that no assumption is made on the cardinality of \(I\), or \(\varOmega \). For notational ease, we use subscripts for functions from \(I\) and superscripts for functions from \(\varOmega \). Since context shall not always make things clear, we shall throughout the paper fully notate spaces of measurable and integrable functions, including the relevant domain, \(\sigma \)-algebra, measure, and co-domain, e.g. \(L^1(I,\mathcal{I }, \mu _I; \mathbb{R })\) is the space of functions from \(I\) to \(\mathbb{R }\), integrable with respect to the measure \(\mu _I\) on the \(\sigma \)-algebra \(\mathcal{I }\).

Let \(X\) be a compact Hausdorff space equipped with the usual Borel \(\sigma \)-algebra \(\mathcal{X }\), and let \(\chi :I \rightarrow X\) be measurable. \(X\) is the space of classes, and thus \(\chi \) is the map allocating each individual to a class. We let \(\sigma {(\chi )}\) denote the \(\sigma \)-algebra on \(I\) generated by \(\chi \), i.e. generated by the set of pre-images \(\{\chi ^{-1}(Y) : Y \in \mathcal{X }\}\).

We let \(d :I \rightarrow \mathbb{N }\) denote the ploidy of the individuals, and we assume that \(d \in L^1(I, \sigma {(\chi )}, \mu _I; \mathbb{N })\), so in particular \(d\) is measurable with respect to \(\sigma {(\chi )}\). A ploidy-weighted probability measure \({\tilde{\mu }}_I\) on \((I, \mathcal{I })\) is then defined by

$$\begin{aligned} {\tilde{\mu }}_I (J)=\left( \int _I d_i \,\mu _I (di) \right) ^{\!\!-1} \int _J d_i\, \mu _I (di). \end{aligned}$$

Evidently a set \(J \in \mathcal{I }\) is \(\mu _I\)-null if and only if it is \({\tilde{\mu }}_I\)-null. Expectations and covariances over \(I\) shall always be taken with respect to this ploidy-weighted measure \({\tilde{\mu }}_I\). Expectations with respect to \(i \in I\) or \(\omega \in \varOmega \) shall be denoted \(\mathbb{E }_I\) and \(\mathbb{E }^{\varOmega }\) respectively, and \(\mathbb{C }_I\) shall denote the corresponding covariance over \(I\).
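As an illustration only, the ploidy-weighted measure can be computed directly when the population is finite. The following minimal sketch assumes a hypothetical population of three haploids and one diploid with uniform \(\mu _I\); the function names `ploidy_weighted`, `expectation` and `covariance` are ours and do not appear in the earlier papers.

```python
from fractions import Fraction

def ploidy_weighted(mu, d):
    """Ploidy-weighted probability masses on a finite population.

    mu: point masses mu_I({i}), summing to 1 (hypothetical toy data)
    d:  positive integer ploidies d_i
    """
    total = sum(d_i * m_i for d_i, m_i in zip(d, mu))  # integral of d over I
    return [d_i * m_i / total for d_i, m_i in zip(d, mu)]

def expectation(f, weights):
    """E_I[f] with respect to the given (ploidy-weighted) masses."""
    return sum(f_i * w_i for f_i, w_i in zip(f, weights))

def covariance(f, g, weights):
    """C_I[f, g] = E_I[f g] - E_I[f] E_I[g] under the same masses."""
    fg = [f_i * g_i for f_i, g_i in zip(f, g)]
    return expectation(fg, weights) - expectation(f, weights) * expectation(g, weights)

# Toy population: three haploid individuals and one diploid, uniform mu_I.
mu = [Fraction(1, 4)] * 4
d = [1, 1, 1, 2]
mu_tilde = ploidy_weighted(mu, d)
```

In this toy case the diploid individual carries twice the weight of each haploid, which reflects the fact that averages under \({\tilde{\mu }}_I\) are in effect averages over haploid sets rather than over individuals.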

We assume individuals to produce offspring as measures over classes, i.e. as elements of \(\mathcal{M }(X)\), the space of signed finite measures on \((X, \mathcal{X })\), equipped with the usual norm and Borel \(\sigma \)-algebra. This is a Banach space, and thus we may in principle integrate functions taking values in this space, using the Bochner integral. We let \(\mathcal{M }_{+}(X)\) denote the subset of positive measures, in which the offspring distributions will of course lie. (We do not demand that these offspring distributions are probability measures.)

The technical work of Sect. 4 below is devoted to deriving a Price equation in this context, in the style of Grafen (2006b), but with the added generalization to environmental uncertainty. The generality of the set-up described here means that the manipulations required of population, class, and uncertainty are rather delicate, and the argument must be pursued with great care. The argument given here provides both a generalization of the corresponding Price equation of Grafen (2006b), and a more sensitive handling of the difficult mathematical structures used.

We now begin to articulate the decision structure for individuals, so that our model is of an individual in possession of some (possibly none and possibly complete) information about environmental uncertainty. In order to move towards regarding individuals as facing the same decision, we also define a space for how the environmental uncertainty can affect an individual. This allows us to have a function, which is the same function for all individuals, to map from action and uncertainty (and class) to number of offspring. This reduction from the whole population to a single implicit decision taker is a key theme of the development. The set of possible phenotypes available to each individual is also defined as a central part of the decision structure for an individual.

Following Grafen (2002), we further suppose we have measurable spaces \(R\), \(U\), and \(A\), where we shall not notate the associated \(\sigma \)-algebras. \(R\) denotes the observable local environment on which individual behaviour can be conditioned. \(U\) denotes the set of chance events from the point of view of the individual that are determined by the state of nature \(\omega \in \varOmega \); these may represent events experienced by the whole population as well as events affecting individuals separately (we thus combine in one notation what was notated separately in Grafen 2002). \(A\) denotes the space of actions which may be taken and which in turn (partly) determine the offspring produced.

A measurable function is understood to be a function between measurable spaces such that pre-images of elements of the target space \(\sigma \)-algebra are elements of the domain \(\sigma \)-algebra. All subspaces, product spaces, and function spaces shall be equipped with the usual \(\sigma \)-algebras generated under these operations; in particular function spaces are equipped with the smallest \(\sigma \)-algebra which makes each evaluation map measurable (see Kechris 1995, §10.B). We shall let \(Q \subseteq A^R\) consist of the measurable functions \(q :R \rightarrow A\) and \(\mathcal{Q }\) denote the induced \(\sigma \)-algebra on \(Q\).

For each individual \(i \in I\) let \(S_i\) be a subset of the class of measurable functions from \(R\) to \(A\) (‘strategies’). Thus each individual has a set of possible ways to react to any given local environment. We suppose the following functions are measurable:

  • \(r :I \times \varOmega \rightarrow R\), representing the information about the local environment available to individual \(i\) in state of nature \(\omega \);

  • \(u :I \times \varOmega \rightarrow U\), uncertainty, in principle affecting both individuals separately and the population as a whole;

  • \(a :I \times R \rightarrow A\), the phenotype or strategy, specifying the action taken by individual \(i\) in local environment \(r\); moreover we assume that \(a_i \in S_i\) for all \(i \in I\), i.e. that the realized phenotype is indeed an admissible phenotype; and

  • \(w :A \times X \times U \rightarrow \mathcal{M }_{+}(X)\), the offspring distribution, depending on action, class, and chance events.

A purely technical assumption we shall require in the proofs is that for all \({\tilde{I}} \in \mathcal{I }\) of full \({\tilde{\mu }}_I\)-measure and all \(E \in \mathcal{X } \times \mathcal{Q }\) (the product \(\sigma \)-algebra on \(X \times Q\)),

$$\begin{aligned} \left\{ x \in X : \left( \{ x \} \times \{ a_i : i \in {\tilde{I}} \cap \chi ^{-1}(\{x\})\}\right) \cap E \ne \emptyset \right\} \in \mathcal{X }, \end{aligned} \tag{1}$$

i.e. this set is a measurable subset of \(X\). Thus the set of those classes which contain some individual playing one of a given measurable set of strategies is itself measurable. This is somewhat reminiscent of the assumptions used in measurable selection theorems, e.g. that of Kuratowski–Ryll-Nardzewski (see Wagner 1977), and its role will be in precisely this kind of context. We note that it is satisfied trivially if the set of classes \(X\) is finite or countable.

We now condense our notation a little: we define \({{\tilde{w}}}\in \prod _{(i, \omega ) \in I \times \varOmega } (\mathcal{M }_{+}(X)^{S_i})\) by

$$\begin{aligned} {{\tilde{w}}}_i^{\omega } (q) = d_i^{-1}w(q(r_i^{\omega }),\chi _i, u_i^{\omega }), \end{aligned}$$

for \(q \in S_i\). So \({{\tilde{w}}}_i^{\omega }(S_i) \subseteq \mathcal{M }_{+}(X)\) is the set of all possible offspring distributions per haploid set of individual \(i\) in state of nature \(\omega \), when considering all admissible strategies for that individual. Note that the partial maps \(\omega \mapsto {{\tilde{w}}}_i^{\omega }(q)\) for fixed individual \(i \in I\) and strategy \(q \in S_i\) and \(i \mapsto {{\tilde{w}}}_i^{\omega }(a_i)\) for fixed \(\omega \in \varOmega \) are measurable (see Rudin 1966, Theorem 7.5). The need to take averages in the offspring generation, which is described (only) in terms of measures, means we need some theory of vector integration. We use that of the Bochner integral, which is the most powerful, and bears the strongest resemblance to the more familiar Lebesgue integral (see Diestel and Uhl 1977). We make some important assumptions about the offspring distribution, the first three of which are purely technical and simply guarantee that we may pursue our argument in great generality:

  • that the function \((i, \omega ) \mapsto {\tilde{w}}_i^{\omega }(a_i)\) is strongly measurable;

  • that the function \(\omega \mapsto {\tilde{w}}_i^{\omega }(q)\) is Bochner integrable for all \(i \in I\) and \(q \in S_i\);

  • that the function \(i \mapsto {\tilde{w}}_i^{\omega }(a_i)\) is Bochner integrable for all \(\omega \in \varOmega \);

  • that the total offspring distribution \(W^{\omega } {:}= \mathbb{E }_I[ {\tilde{w}}_i^{\omega }(a_i)]\) does not in fact depend on \(\omega \); we therefore denote it simply by \(W\); and

  • that \(\mu _X {:}= \chi _{\sharp }{\tilde{\mu }}_I \ll W\). This is the assertion that classes may not be abandoned except by \({\tilde{\mu }}_I\)-null sets of individuals: a positive distribution of the parental generation on some set of classes \(Y\) implies a positive distribution on \(Y\) of the offspring.
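The last two assumptions can be made concrete in a finite setting. The sketch below is a toy illustration under our own assumptions: two haploid individuals, two classes, two states of nature, and hypothetical choices of the functions \(r\), \(u\) and \(w\) (here `r`, `u`, `w`), none of which are taken from the text. It assembles \({\tilde{w}}_i^{\omega }(q)\) as in the definition above, then checks whether the total offspring distribution depends on the state of nature and whether \(\mu _X \ll W\).

```python
from fractions import Fraction

STATES = ["wet", "dry"]                       # a finite Omega (assumption)
N_CLASSES = 2                                 # a finite X = {0, 1} (assumption)
mu_tilde = [Fraction(1, 2), Fraction(1, 2)]   # ploidy-weighted masses
chi = [0, 1]                                  # class of each individual
d = [1, 1]                                    # ploidies

def r(i, omega):
    """Cue: in this toy model every individual observes the state itself."""
    return omega

def u(i, omega):
    """Individual-level chance: exactly one individual is lucky in each state."""
    lucky = {"wet": 0, "dry": 1}[omega]
    return "lucky" if i == lucky else "unlucky"

def w(action, x, luck):
    """Offspring measure over classes, as a list of masses (ignores class x here)."""
    if action == "safe":
        return [1, 1]
    return [2, 1] if luck == "lucky" else [0, 1]

def w_tilde(i, omega, q):
    """Offspring distribution per haploid set: d_i^{-1} w(q(r), chi_i, u)."""
    return [Fraction(m, d[i]) for m in w(q(r(i, omega)), chi[i], u(i, omega))]

def total_offspring(strategies, omega):
    """W^omega = E_I[w_tilde_i^omega(a_i)] as a list of masses over classes."""
    return [sum(mu_tilde[i] * w_tilde(i, omega, strategies[i])[x]
                for i in range(len(chi)))
            for x in range(N_CLASSES)]

def state_independent(strategies):
    totals = [total_offspring(strategies, omega) for omega in STATES]
    return all(t == totals[0] for t in totals)

def dominated(mu, W):
    """mu << W on a finite space: every W-null class is mu-null."""
    return all(mu[x] == 0 for x in range(N_CLASSES) if W[x] == 0)

bold = lambda cue: "bold"   # constant strategies, ignoring the cue
safe = lambda cue: "safe"
mu_X = [sum(m for m, c in zip(mu_tilde, chi) if c == x) for x in range(N_CLASSES)]
```

With both individuals playing `bold`, individual luck varies with the state but the total is the same in both states, so the fourth assumption holds; with the profile `[safe, bold]` the total depends on the state and the assumption fails. The assumption is thus a joint restriction on strategies, uncertainty and the offspring function, not on any one of them alone.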

We record here two simple facts about Bochner integration which we shall need. The first is essentially a version of Fubini’s theorem, stating that integrating a function with respect to an average measure is the same as averaging its integrals with respect to the individual measures.

Lemma 1

Let \((Y, \mathcal{Y }, \mu _Y)\) be a measure space, and let \((Z, \mathcal{Z })\) be a measurable space. Let \(m :Y \rightarrow \mathcal{M }_{+}(Z)\) be Bochner integrable, \({\bar{m}} {:}= \int _Y m (y)\, \mu _Y(dy) \in \mathcal{M }_{+}(Z)\), and \(f :Z \rightarrow [0, \infty ]\) be measurable.

Then the function

$$\begin{aligned} Y \ni y \mapsto \int _Z f(z) \, m(y)(dz) \end{aligned}$$

is measurable, and

$$\begin{aligned} \int _Z f(z) \, {\bar{m}}(dz)=\int _Y \left( \,\int _Z f(z) \, m(y)(dz) \right) \,\mu _Y (dy). \end{aligned} \tag{2}$$

Remark 1

We do not assume that the function \(f\) is integrable; thus (2) also applies in the case that the integrals are infinite. The result will not extend in this generality to arbitrary real-valued (i.e. possibly negative-valued) functions, since in this case the argument could involve the indeterminate expression \(\infty -\infty \).

Proof

The measurability result can be seen by returning to the definition of the integral, and using the measurability of the map taking measures to their evaluation on some fixed set, and the algebra of measurable functions.

Equation (2) can be seen by routine approximation of the integral by simple functions. \(\square \)

Lemma 2

Let \((Y,\mathcal{Y }, \mu _Y)\) be a finite measure space, \((Z, \mathcal{Z })\) be a measurable space, \(m :Y \rightarrow \mathcal{M }(Z)\) be Bochner integrable, and \(E \in \mathcal{Z }\).

Then \(y \mapsto m(y)(E)\) is integrable and

$$\begin{aligned} \left( \int _{Y} m(y)\, \mu _Y(dy) \right) (E) = \int _{Y} m(y)(E) \, \mu _Y (dy). \end{aligned}$$

Proof

This is a trivial consequence of the fact that \(m \mapsto m(E)\) is a bounded linear operator from \(\mathcal{M }(Z)\) to \(\mathbb{R }\), and the result of Hille (Diestel and Uhl 1977, Chapter 2, Theorem 2.6) that Bochner integration commutes with closed operators. \(\square \)
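When \(Y\) and \(Z\) are finite, both lemmas reduce to exchanging the order of two finite sums, which can be checked by hand or by machine. The following sketch uses arbitrary toy numbers of our own choosing and is meant only to make the two identities concrete, not to stand in for the proofs.

```python
from fractions import Fraction

mu_Y = [Fraction(1, 3), Fraction(2, 3)]   # masses on Y = {0, 1} (toy data)

def integrate(f, measure):
    """Integral of f over a finite Z against a measure given by point masses."""
    return sum(f_z * m_z for f_z, m_z in zip(f, measure))

def average_measure(m, mu_Y):
    """The measure obtained by integrating y -> m[y] against mu_Y."""
    return [sum(mu_Y[y] * m[y][z] for y in range(len(mu_Y)))
            for z in range(len(m[0]))]

def evaluate(measure, E):
    """Value of a measure on a subset E of Z, given as a set of indices."""
    return sum(measure[z] for z in E)

# Lemma 1: positive measures m(y) on Z = {0, 1, 2} and a non-negative f.
m = [[1, 0, 2], [3, 1, 0]]
f = [5, 0, 7]
lemma1_lhs = integrate(f, average_measure(m, mu_Y))
lemma1_rhs = sum(mu_Y[y] * integrate(f, m[y]) for y in range(len(mu_Y)))

# Lemma 2: signed measures are allowed; evaluate on the set E = {0, 2}.
signed = [[1, -2, 0], [-1, 1, 4]]
E = {0, 2}
lemma2_lhs = evaluate(average_measure(signed, mu_Y), E)
lemma2_rhs = sum(mu_Y[y] * evaluate(signed[y], E) for y in range(len(mu_Y)))
```

Both pairs agree exactly in this example. The substance of the lemmas lies in the general case, where measurability and the possibility of infinite integrals have to be handled.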

Finally, it is unusual in population genetic models to leave the connection unspecified between genotype and phenotype, but it is one of the notable features of the covariance selection mathematics of Price (1970). We can obtain a sufficient purchase on that connection by using two population genetic concepts. A \(p\)-score is an arbitrary weighted sum of allele frequencies, and is thus a linear functional on the set of genotypes: by proving results for an arbitrary \(p\)-score, we manage to say something significant about selection of genotypes in general. If a phenotype is a real number, such as height, we can find the \(p\)-score that best predicts the phenotype across the whole population. Sometimes it will be useful to discuss those predicted phenotypes, which are known in biology as additive genetic values (of that given trait).

Thus we formally introduce the concepts of additive genetic value and \(p\)-scores. Suppose each individual \(i \in I\) has at most \(N \ge 1\) loci, and at most \(n \ge 1\) rival alleles for each locus. Then the space \(\mathbb{R }^{n \times N}\) of real-valued \(n \times N\) matrices \(G\) can be regarded as containing all the matrices of genotypes of individuals: the entry \(g_{k, l}\) of the matrix \(G\) representing the genotype of individual \(i\) is the number of copies of allele \(k\) at locus \(l\) carried by \(i\). Let \(g :I \rightarrow \mathbb{R }^{n\times N}\) be the map assigning each individual its genotype. A \(p\)-score is a function \(p :I \rightarrow \mathbb{R }\) representing an additive genetic trait (Grafen 2000), i.e. a linear combination of allele frequencies. Hence the following definition.

Definition 1

(\(p\)-score and additive genetic trait) Let \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\). Then \(p\) is a \(p\)-score, i.e. represents an additive genetic trait, if there exists a linear map \(\xi :\mathbb{R }^{n \times N} \rightarrow \mathbb{R }\) such that \(p_i = \xi (d_i^{-1}g_i)\). Let \({\fancyscript{P}}\) denote the subspace of \(L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\) comprising these \(p\)-scores; \({\fancyscript{P}}\) is then a finite-dimensional subspace of \(L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\), of dimension \(K \le N \times n\), say.

Using the Gram–Schmidt process, we can construct a basis for \({\fancyscript{P}}\) consisting of functions \(\{ p_l \}_{l = 1}^K\) such that

$$\begin{aligned} \int _I (p_l)_i (p_{l^{\prime }})_i \ {\tilde{\mu }}_I (di) = {\left\{ \begin{array}{ll} 1 &{} l = l^{\prime } \\ 0 &{} \mathrm{{otherwise}}. \end{array}\right. } \end{aligned}$$

To define the additive genetic value of an arbitrary integrable function of individuals, we simply project onto the set of \(p\)-scores, \({\fancyscript{P}}\). We choose our coefficients so that the kernel of this projection lies in the pre-annihilator of the subspace \({\fancyscript{P}}\), with the usual identification of dual spaces; in biological terms, the average over all individuals carrying any given allele of the additive genetic value of a trait is equal to the same average of the trait itself.

Definition 2

(Additive genetic value) Let \(f \in L^1(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\). Then the additive genetic value of \(f\), \(\mathrm{{agv}}(f) \in {\fancyscript{P}}\), is given by

$$\begin{aligned} \mathrm{{agv}}(f) = \sum _{l = 1}^K \left( \int _I (f_i) (p_l)_i \, {\tilde{\mu }}_I (di) \right) p_l. \end{aligned}$$

This is then a \(p\)-score, and represents that component of \(f\) which is heritable. We have thus defined a bounded linear operator \(\mathrm{{agv}} :L^1(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R }) \rightarrow {\fancyscript{P}}\). Evidently then \(f = \mathrm{{agv}}(f) + ( f - \mathrm{{agv}}(f))\), where by choice of the \(p_l\), we see that for any \(p \in {\fancyscript{P}}\),

$$\begin{aligned} \int _I (f_i - \mathrm{{agv}}(f)_i)p_i \, {\tilde{\mu }}_I (di) = 0. \end{aligned} \tag{3}$$

This equation also assures us that \(\mathrm{{agv}}\) does not depend on the choice of basis for \({\fancyscript{P}}\), for suppose two bases give two definitions, \(\mathrm{{agv}}_1\) and \(\mathrm{{agv}}_2\), say, both of which satisfy Eq. (3). Then for all \(f \in L^1(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\), since \(\mathrm{{agv}}_1 (f), \mathrm{{agv}}_2 (f) \in {\fancyscript{P}}\), applying (3) gives that

$$\begin{aligned}&\int _I (\mathrm{{agv}}_1 (f)_i - \mathrm{{agv}}_2 (f)_i)^2 \, {\tilde{\mu }}_I (di) \\&\qquad \!=\! \int _I (\mathrm{{agv}}_1 (f)_i)^2 \, {\tilde{\mu }}_I (di) \!-\! 2 \int _I \mathrm{{agv}}_1 (f)_i \mathrm{{agv}}_2 (f)_i \, {\tilde{\mu }}_I (di) \!+\! \int _I ( \mathrm{{agv}}_2 (f)_i)^2 \, {\tilde{\mu }}_I (di) \\&\qquad = \int _I f_i \mathrm{{agv}}_1 (f)_i \, {\tilde{\mu }}_I (di) - \int _I f_i \mathrm{{agv}}_1 (f)_i \, {\tilde{\mu }}_I (di) \\&\quad \qquad - \int _I f_i \mathrm{{agv}}_2 (f)_i \, {\tilde{\mu }}_I (di) + \int _I f_i \mathrm{{agv}}_2 (f)_i \,{\tilde{\mu }}_I (di) \\&\qquad = 0, \end{aligned}$$

and hence that \(\mathrm{{agv}}_1(f) = \mathrm{{agv}}_2 (f)\) as elements of \({\fancyscript{P}}\).
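For a finite population the construction of \(\mathrm{{agv}}\) is an ordinary weighted least-squares projection, and both the orthogonality relation (3) and the independence of the choice of basis can be checked exactly. The sketch below is an illustration under our own assumptions: a hypothetical population with one locus and two alleles, strictly positive weights, and Gram–Schmidt orthogonalization without normalization, dividing by squared norms instead, which yields the same projection as an orthonormal basis would.

```python
from fractions import Fraction

def inner(f, g, weights):
    """Weighted inner product: integral of f * g against the masses."""
    return sum(f_i * g_i * w_i for f_i, g_i, w_i in zip(f, g, weights))

def orthogonal_basis(spanning, weights):
    """Gram-Schmidt on a spanning list; dependent vectors are dropped.

    Assumes every weight is strictly positive, so a nonzero vector has
    nonzero norm.
    """
    basis = []
    for v in spanning:
        v = list(v)
        for b in basis:
            coef = inner(v, b, weights) / inner(b, b, weights)
            v = [v_i - coef * b_i for v_i, b_i in zip(v, b)]
        if any(v_i != 0 for v_i in v):
            basis.append(v)
    return basis

def agv(f, spanning, weights):
    """Projection of f onto the span of the given p-scores."""
    result = [Fraction(0)] * len(f)
    for b in orthogonal_basis(spanning, weights):
        coef = inner(f, b, weights) / inner(b, b, weights)
        result = [r_i + coef * b_i for r_i, b_i in zip(result, b)]
    return result

# Toy population: diploids AA, Aa, aa and one haploid a, uniform mu_I.
d = [2, 2, 2, 1]
weights = [Fraction(d_i, sum(d)) for d_i in d]            # ploidy-weighted
freq_A = [Fraction(2, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
freq_a = [Fraction(0), Fraction(1, 2), Fraction(2, 2), Fraction(1, 1)]
trait = [3, 1, 0, 2]                                      # arbitrary trait f

value = agv(trait, [freq_A, freq_a], weights)
residual = [t - v for t, v in zip(trait, value)]
value_other_basis = agv(trait, [freq_a, freq_A], weights)
```

Here `freq_A` and `freq_a` are the coordinate functions \(i \mapsto d_i^{-1}(g_i)_{k,1}\), which span the \(p\)-scores in this example. Reversing their order changes the orthogonal basis produced but not the projection, in line with the uniqueness argument above.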

4 Reproductive value and the Price equation

The Price equation represents gene frequency change in our argument, but in its original form does not admit uncertainty or class-structure. This section derives a suitable Price equation, simultaneously generalizing the uncertainty of Grafen (2002) and the class-structure of Grafen (2006b). The Markov theory allows us to weight over the classes to obtain a single average change in \(p\)-score from one time period to the next. The central property needed is that two sets of weights are the same: those used to average across classes to obtain an average change in mean \(p\)-score on the left hand side; and those used to obtain a single measure of reproductive success for each individual, averaging across the classes of its offspring, to include in the right hand side. The original idea of using the leading eigenvector of a class-to-class transition matrix goes back to Taylor (1990, 1996), and may be detected in embryo form in the famous sex ratio argument of Fisher (1930).

4.1 Links to Markov theory

In this section we apply our assumption that, while uncertainty may affect the fitnesses of individuals, it does not affect class-to-class projection at a population level. We do this by defining a Markov process for arbitrary \(\omega \), and then assuming that there exists an associated invariant measure that does not depend on the choice of \(\omega \). Further work aims to remove this assumption. The invariant measure represents the ‘reproductive value’ of a subset of classes: that is, if we take a sufficiently distant generation and choose an allele at random, what is the probability that its ancestor today is present in an individual that belongs to a class in that subset?

Given all our data, we can follow the methods of Grafen (2006b) for each state of nature \(\omega \in \varOmega \). To find an invariant measure and the appropriate weightings for classes, we must first understand how the class distribution of the population changes, given the pattern of offspring production described above.

Lemma 3

Let \(\omega \in \varOmega \), and let \(f :I \rightarrow \mathbb{R }\) be such that \(i \mapsto f_i{\tilde{w}}_i^{\omega }(a_i)\) as a map from \(I\) to \(\mathcal{M }(X)\) is Bochner integrable.

Then \(\mathbb{E }_I[f_i{\tilde{w}}_i^{\omega }(a_i)] \ll W\).

Proof

This is trivial using Lemma 2 since \(i \mapsto {\tilde{w}}_i^{\omega }(a_i)\) maps into positive measures. \(\square \)

Therefore for such a measure \(\mathbb{E }_I[f_i {\tilde{w}}_i^{\omega }(a_i)]\), the Radon–Nikodym derivative with respect to \(W\), \(\frac{d}{dW}\mathbb{E }_I[f_i {\tilde{w}}_i^{\omega }(a_i)]\), is defined. Applying this remark to characteristic functions of sets of individuals sharing classes, we can define a (discrete-time) Markov process with state space \(X\) by defining the probability transition function \(P^{\omega } :X \times \mathcal{X } \rightarrow [0, 1]\) by

$$\begin{aligned} P^{\omega }(x, A) = \left( \frac{d}{dW} \left( \,\, \int _{\chi ^{-1}(A)} {\tilde{w}}_i^{\omega }(a_i) \, {\tilde{\mu }}_I (di) \right) \right) \!(x). \end{aligned}$$

\(P^{\omega }(x, A)\) is then well-defined \(W\)-almost everywhere, and represents the proportionate contribution by parents belonging to a class in \(A\) to offspring of class \(x\).
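
As an illustration only, suppose that \(X\) is finite and that \(W(\{x\}) > 0\) for every \(x \in X\) (assumptions made for this remark, not in the general theory). The Radon–Nikodym derivative is then a ratio of point masses, and the transition function reduces to a matrix indexed by offspring class \(x\) and parental class \(y\):

$$\begin{aligned} P^{\omega }(x, \{y\}) = \frac{1}{W(\{x\})} \int _{\chi ^{-1}(\{y\})} {\tilde{w}}_i^{\omega }(a_i)(\{x\}) \, {\tilde{\mu }}_I (di), \qquad \sum _{y \in X} P^{\omega }(x, \{y\}) = \frac{W(\{x\})}{W(\{x\})} = 1. \end{aligned}$$

The row sums equal one because \(W = \mathbb{E }_I[{\tilde{w}}_i^{\omega }(a_i)]\), so each row is a probability distribution over the classes of the parents of class-\(x\) offspring.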

Following Rosenblatt (1971), we can use \(P^{\omega }( \cdot , \cdot )\) to define a linear operator \(T^{\omega } :L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R }) \rightarrow L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })\) by defining

$$\begin{aligned} (T^{\omega }f)(x) = \int _X f(y)\, P^{\omega }(x, dy). \end{aligned}$$

This is the average of \(f\) over all those parents contributing offspring to class \(x\), and therefore represents how values on classes may be traced back through a generation. \(T^{\omega }\) is clearly well-defined since \(P^{\omega }(x, \cdot ) \ll \mu _X\), and for \(W\)-almost every \(x \in X\), we have that

$$\begin{aligned} | (T^{\omega }f)(x) | \le \Vert f\Vert _{L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })}. \end{aligned}$$

Since \(\mu _X \ll W\), we see that this exceptional set is \(\mu _X\)-null, i.e. \(T^{\omega }f \in L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })\) indeed. We shall find the following alternative expression for \(T^{\omega }f\) useful when deriving the Price equation.

Lemma 4

Let \(f \in L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })\).

Then

$$\begin{aligned} (T^{\omega }f)(x) = \frac{d}{dW} \left( \int _I (f \circ \chi )_i {\tilde{w}}_i^{\omega }(a_i) \, {\tilde{\mu }}_I(di)\right) \!(x) \end{aligned}$$
(4)

for \(W\)-almost every \(x \in X\).

Proof

This follows by routine approximation of the integral by simple functions, using the vectorial version of the dominated convergence theorem (Diestel and Uhl 1977, Chapter 2, Theorem 2.3). \(\square \)
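
In the special case where \(X\) is finite and \(W(\{x\}) > 0\) for every \(x \in X\) (an illustrative assumption only), no approximation is needed: \(T^{\omega }\) acts as a matrix on vectors indexed by class, and (4) can be checked by splitting \(I\) according to parental class. A sketch of the calculation:

$$\begin{aligned} (T^{\omega }f)(x)&= \sum _{y \in X} f(y) \, P^{\omega }(x, \{y\}) \\&= \frac{1}{W(\{x\})} \sum _{y \in X} f(y) \int _{\chi ^{-1}(\{y\})} {\tilde{w}}_i^{\omega }(a_i)(\{x\}) \, {\tilde{\mu }}_I(di) \\&= \frac{1}{W(\{x\})} \int _I (f \circ \chi )_i \, {\tilde{w}}_i^{\omega }(a_i)(\{x\}) \, {\tilde{\mu }}_I(di). \end{aligned}$$

The last line is the finite form of the right-hand side of (4), since on a finite space the Radon–Nikodym derivative with respect to \(W\) at \(x\) is the mass at \(\{x\}\) divided by \(W(\{x\})\).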

Associated with any such process is the notion of an invariant measure, i.e. a measure \(\tau ^{\omega } \in \mathcal{M }_{+}(X)\) such that

$$\begin{aligned} \int _X P^{\omega }(x, A) \, \tau ^{\omega } (dx)=\tau ^{\omega }(A) \end{aligned}$$

for all \(A \in \mathcal{X }\), or, equivalently,

$$\begin{aligned} \int _X f(x)\, \tau ^{\omega }(dx) = \int _X (T^{\omega }f)(x) \, \tau ^{\omega } (dx) \end{aligned}$$

for all \(f \in L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })\). Such a weighting then appropriately balances the reproductive outputs of classes so that average values across classes are preserved from parental to offspring generation. We remark that since \(P^{\omega }(x, A)\) is only well-defined for \(W\)-almost every \(x \in X\), we can a priori only integrate \(P^{\omega }(x,A)\) with respect to a measure \(\tau ^{\omega }\) if \(\tau ^{\omega } \ll W\). Thus we assume that any discussion of invariant measures is restricted to such absolutely continuous measures, so that the derivative \(\frac{d\tau ^{\omega }}{dW}\) is well-defined.
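
For a concrete picture, consider a hypothetical two-class example with \(\omega \) fixed and suppressed (the numbers are illustrative assumptions, not taken from the model). Let \(X = \{J, A\}\), with juveniles produced only by adults and the adult class recruited equally from surviving juveniles and surviving adults, so that

$$\begin{aligned} P(J, \{J\}) = 0, \quad P(J, \{A\}) = 1, \quad P(A, \{J\}) = \tfrac{1}{2}, \quad P(A, \{A\}) = \tfrac{1}{2}. \end{aligned}$$

Invariance reads \(\tau (\{y\}) = \sum _{x \in X} \tau (\{x\}) P(x, \{y\})\), and the probability measure with \(\tau (\{J\}) = \tfrac{1}{3}\), \(\tau (\{A\}) = \tfrac{2}{3}\) satisfies it:

$$\begin{aligned} \tau (\{J\}) \cdot 0 + \tau (\{A\}) \cdot \tfrac{1}{2}&= \tfrac{1}{3} = \tau (\{J\}), \\ \tau (\{J\}) \cdot 1 + \tau (\{A\}) \cdot \tfrac{1}{2}&= \tfrac{1}{3} + \tfrac{1}{3} = \tfrac{2}{3} = \tau (\{A\}). \end{aligned}$$

In matrix terms, \(\tau \) is a left eigenvector of the transition matrix with eigenvalue one, in line with the leading-eigenvector description recalled at the start of this section.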

We assume that an invariant measure exists for each \(\omega \in \varOmega \); we refer to Grafen (2006b) for a discussion of the assumptions required to guarantee this. We further assume that there exists an invariant measure \(\tau ^{\omega }\) which is in fact independent of \(\omega \), and we therefore write simply \(\tau \) for this invariant measure.

This weighting of classes based on reproductive output is the key ingredient in our definition of fitness of individuals. Given the invariant measure \(\tau \), we can now define the ‘fitness operator’ \(F^{\omega } \in \prod _{i \in I} [0, \infty ]^{S_i}\) by, for \(i \in I\), setting

$$\begin{aligned} S_i \ni q \mapsto F_i^{\omega }(q) {:}= \int _X \frac{d\tau }{dW} (x)\, {\tilde{w}}_i^{\omega }(q)(dx). \end{aligned}$$

For each individual \(i \in I\), this is a function of strategy, and is an appropriately weighted average of the individual’s offspring when playing each given strategy.
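
In the finite illustrative setting (\(X\) finite with \(W(\{x\}) > 0\) for every \(x \in X\), assumed only for this remark), the integral is a weighted sum over offspring classes:

$$\begin{aligned} F_i^{\omega }(q) = \sum _{x \in X} \frac{\tau (\{x\})}{W(\{x\})} \, {\tilde{w}}_i^{\omega }(q)(\{x\}). \end{aligned}$$

For instance, take the hypothetical values \(X = \{J, A\}\), \(\tau = (\tfrac{1}{3}, \tfrac{2}{3})\) and \(W = (1, \tfrac{1}{2})\), listing the \(J\)-component first. The weights \(\frac{d\tau }{dW}\) are then \((\tfrac{1}{3}, \tfrac{4}{3})\), and a strategy \(q\) whose offspring measure is \((2, 1)\) has fitness \(2 \cdot \tfrac{1}{3} + 1 \cdot \tfrac{4}{3} = 2\). Each offspring is thus counted in proportion to the reproductive value per unit of offspring in its class.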

For this and the following subsection we consider a fixed element \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\). We emphasize that at this stage this is an arbitrary element of the space, and not in general a \(p\)-score, and assigns (bounded) numbers to individuals in a manner not necessarily determined by genotype. We define the class-average value of \(p\) by defining the \(\mathcal{X }\)-measurable function \(\pi :X \rightarrow \mathbb{R }\) as the Radon–Nikodym derivative with respect to \(\mu _{X}\) of the measure on \(\mathcal{X }\) given by \(Y \mapsto \int _{\chi ^{-1}(Y)} p_i \, {\tilde{\mu }}_I (di)\). We see that \(\pi \) satisfies

$$\begin{aligned} (\pi \circ \chi )= \mathbb{E }[p | \chi ], \end{aligned}$$

as elements of \(L^1(I, \mathcal{I },{\tilde{\mu }}_I; \mathbb{R })\). Note that properties of conditional expectations imply that \(\pi \in L^{\infty }(X, \mathcal{X }, \mu _X; \mathbb{R })\), since \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\).

Fix \(\omega \in \varOmega \). Since \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\), the correspondingly weighted offspring function \(p {\tilde{w}}^{\omega }(a)\) is Bochner integrable and \(\mathbb{E }_I[p_i{\tilde{w}}_i^{\omega }(a_i)] \ll W\) by Lemma 3. We can define the class average of this weighted offspring function, \({\hat{\pi }}^{\omega } \in L^1(X, \mathcal{X }, W; \mathbb{R })\), by

$$\begin{aligned} {\hat{\pi }}^{\omega } = \frac{d}{dW}(\mathbb{E }_I[p_i {\tilde{w}}_i^{\omega }(a_i)]). \end{aligned}$$
(5)

We expand later on the precise interpretation of this value and under what further assumptions it becomes a quantity of greater relevance, e.g. when it becomes a useful estimate for the class average value of \(p\) in the following generation. In the next lemma we make the observation that this offspring class average is, like the parental class average, a bounded function.

Lemma 5

With \({\hat{\pi }}^{\omega }\) as defined above, in fact \({\hat{\pi }}^{\omega } \in L^{\infty }(X, \mathcal{X }, W; \mathbb{R })\).

Proof

Let \(c > 0\) and use Lemma 2 to see that

$$\begin{aligned} c W(({\hat{\pi }}^{\omega })^{-1}((c, \infty )))&\le \int _{({\hat{\pi }}^{\omega })^{-1}((c, \infty ))} {\hat{\pi }}^{\omega } (x) \, W(dx) \\&= \mathbb{E }_I[p_i{\tilde{w}}_i^{\omega }(a_i)] (({\hat{\pi }}^{\omega })^{-1}((c,\infty ))) \\&= \int _I p_i {\tilde{w}}_i^{\omega }(a_i) (({\hat{\pi }}^{\omega })^{-1}((c, \infty ))) \, {\tilde{\mu }}_I(di)\\&\le \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} \int _I {\tilde{w}}_i^{\omega }(a_i) (({\hat{\pi }}^{\omega })^{-1}((c, \infty ))) \, {\tilde{\mu }}_I(di) \\&= \Vert p \Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} \left( \int _I {\tilde{w}}_i^{\omega }(a_i) \, {\tilde{\mu }}_I(di) \right) (({\hat{\pi }}^{\omega })^{-1}((c, \infty ))) \\&= \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} W(({\hat{\pi }}^{\omega })^{-1}((c, \infty ))). \end{aligned}$$

Hence if \(W(({\hat{\pi }}^{\omega })^{-1}((c, \infty ))) \ne 0\), we see that \(c \le \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })}\). Similarly we can show that

$$\begin{aligned} -c W(({\hat{\pi }}^{\omega })^{-1}((-\infty , -c))) \ge -\Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} W(({\hat{\pi }}^{\omega })^{-1}((-\infty , -c))), \end{aligned}$$

and hence again that if \(W(({\hat{\pi }}^{\omega })^{-1}((-\infty , -c))) \ne 0\) then \(c \le \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })}\). Combining these results and taking the contrapositive, we see that if \(c > \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })}\), then \(W((|{\hat{\pi }}^{\omega }|)^{-1}((c, \infty ))) = 0\). In other words, we infer that \(\Vert {\hat{\pi }}^{\omega }\Vert _{L^{\infty }(X, \mathcal{X }, W; \mathbb{R })} \le \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })}\). \(\square \)
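
When \(X\) is finite with \(W(\{x\}) > 0\) for every \(x \in X\) (an illustrative assumption only), the bound is transparent, since \({\hat{\pi }}^{\omega }\) is then a weighted average of \(p\):

$$\begin{aligned} {\hat{\pi }}^{\omega }(x) = \int _I p_i \, \frac{{\tilde{w}}_i^{\omega }(a_i)(\{x\})}{W(\{x\})} \, {\tilde{\mu }}_I(di), \qquad \int _I \frac{{\tilde{w}}_i^{\omega }(a_i)(\{x\})}{W(\{x\})} \, {\tilde{\mu }}_I(di) = 1. \end{aligned}$$

The weights are non-negative and integrate to one, so \(|{\hat{\pi }}^{\omega }(x)| \le \Vert p\Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })}\) for every \(x \in X\).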

4.2 The Price equation

It turns out that the properties of the invariant measure discussed in the previous subsection are those needed to construct a suitable Price equation. The arguments of this subsection apply for a fixed \(\omega \in \varOmega \). We therefore suppress the superfluous \(\omega \) superscript from the notation.

The definition of \({\hat{\pi }}\) in (5), linearity of Radon–Nikodym differentiation, and Lemma 4 imply that

$$\begin{aligned} {\hat{\pi }} = \frac{d}{dW} (\mathbb{E }_I[p_i {\tilde{w}}_i (a_i)] ) = T\pi + \frac{d}{dW} \left( \mathbb{E }_I \left[ ( p_i - \mathbb{E }[p | \chi ]_i) {\tilde{w}}_i (a_i) \right] \right) \!. \end{aligned}$$

Using the above equation, the invariance of the class-weighting \(\tau \), and the change of variables formula for Radon–Nikodym derivatives (Halmos 1950, §32, Theorem B), we see that

$$\begin{aligned} \int _X ({\hat{\pi }}(x) - \pi (x)) \, \tau (dx)&= \int _X \left( \frac{d}{dW} \left( \mathbb{E }_I \left[ ( p_i - \mathbb{E }[p | \chi ]_i) {\tilde{w}}_i (a_i) \right] \right) \right) \, \tau (dx)\nonumber \\&= \int _X \frac{d \tau }{dW} (x) \, \mathbb{E }_I [ ( p_i - \mathbb{E }[p|\chi ]_i) {\tilde{w}}_i (a_i)](dx). \end{aligned}$$
(6)

Note that since \(W\) and \(\tau \) are positive measures, \(\frac{d \tau }{d W} :X \rightarrow [0, \infty )\). Integration with respect to signed measures is understood as in Rudin (1966, §6.18), that is, as an integral with respect to the ‘polar decomposition’ of the measure; it is thus defined in terms of a usual integral with respect to a positive measure.

Our plan is to interchange the order of integration in (6) so that the expression becomes an average over individuals of an integrand involving a weighted average over classes of offspring measures, and thus looks rather more like an integral involving the fitness of individuals as defined above. This manipulation is not immediate, but is permitted by the following technical lemma, which is a similar statement to the quasi-Fubini result of Lemma 1, but now involves in general non-positive weights on the measures \( {\tilde{w}}_i (a_i)\), corresponding to our desire to allow \(p\)-scores to be defined via arbitrary allelic weightings. This makes the argument a little more delicate, so we include some details of the proof.

Lemma 6

Let \(f \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\), \(g \in L^1 (X, \mathcal{X }, W; [0, \infty ))\), and all other notation be as above.

Then

$$\begin{aligned} \int _X g(x) \, \left( \int _I f_i {\tilde{w}}_i (a_i)\, {\tilde{\mu }}_I(di) \right) (dx)=\int _I f_i\left( \, \int _X g(x)\, {\tilde{w}}_i (a_i) (dx) \right) \, {\tilde{\mu }}_I (di).\nonumber \\ \end{aligned}$$
(7)

Proof

Define the weighted offspring measure \(W_f \in \mathcal{M }(X)\) by \(W_f = \mathbb{E }_I[f_i {\tilde{w}}_i (a_i)]\).

The result is easily seen to be true, via Lemma 2, for measurable simple functions. Let \(s_n :X \rightarrow [0, \infty )\) be a sequence of measurable simple functions such that \(s_n(x) \uparrow g (x)\). We easily see by the monotone convergence theorem that for \(i \in I\),

$$\begin{aligned} \int _X g (x) \, ( f_i {\tilde{w}}_i (a_i))(dx) = f_i \int _X g(x) \, {\tilde{w}}_i (a_i)(dx). \end{aligned}$$

Now, for \(i \in I\), we have that

$$\begin{aligned} \left| f_i \int _X s_n (x)\, {\tilde{w}}_i (a_i)(dx) \right| \le \Vert f \Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} \int _X g (x) \, {\tilde{w}}_i (a_i) (dx), \end{aligned}$$

where by Lemma 1,

$$\begin{aligned} \int _I \left( \int _X g (x) \, {\tilde{w}}_i (a_i)(dx) \right) \, {\tilde{\mu }}_I(di) = \int _X g (x)\, W(dx) < \infty \end{aligned}$$

since \(g \in L^1(X, \mathcal{X }, W; [0, \infty ))\). So \(i \mapsto \int _X g (x)\, {\tilde{w}}_i (a_i)(dx)\) is \({\tilde{\mu }}_I\)-integrable, and thus by the dominated convergence theorem,

$$\begin{aligned} \lim _{n \rightarrow \infty } \int _I f_i \left( \int _X s_n (x)\, {\tilde{w}}_i (a_i)(dx) \right) \, {\tilde{\mu }}_I(di) \!=\! \int _I f_i \left( \int _X g(x) \, {\tilde{w}}_i (a_i)(dx) \right) \, {\tilde{\mu }}_I(di).\nonumber \\ \end{aligned}$$
(8)

This argument shows that the limits behave as required for the right-hand side of our required expression.

We recall from the definition of integration with respect to a signed measure that there exists some \(\mathcal{X }\)-measurable function \(\phi :X \rightarrow \{ \pm 1\}\) such that \(dW_f = \phi \, d |W_f|\), thus

$$\begin{aligned} \int _X g(x) \, W_f (dx) = \int _X g(x) \phi (x) \, |W_f|(dx) \end{aligned}$$

and

$$\begin{aligned} \int _X s_n (x) \, W_f (dx) = \int _X s_n (x) \phi (x) \, |W_f|(dx) \end{aligned}$$

for all \(n \ge 1\). Now we observe that for arbitrary \(Y \in \mathcal{X }\), use of Lemma 2 shows that

$$\begin{aligned} |W_f(Y)|\le \Vert f \Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} W(Y), \end{aligned}$$

and hence, since this bound passes to the total variation \(|W_f|\),

$$\begin{aligned} \int _X g(x) \, |W_f|(dx) \le \Vert f \Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} \int _X g (x) \, W(dx) < \infty , \end{aligned}$$

since \(g \ge 0\). So \(g \in L^1(X, \mathcal{X }, |W_f|; [0, \infty ))\), and thus \(g \phi \in L^1(X, \mathcal{X }, |W_f|; \mathbb{R })\). Since \(|\phi (x)| = 1\) for every \(x \in X\), the choice of \(s_n\) implies that

$$\begin{aligned} |s_n (x) \phi (x) | \le | g(x) \phi (x)|, \end{aligned}$$

so we can use the dominated convergence theorem to see that

$$\begin{aligned} \lim _{ n \rightarrow \infty } \int _X s_n (x) \phi (x) \, |W_f|(dx) = \int _X g (x) \phi (x) \, |W_f|(dx). \end{aligned}$$

Hence by definition

$$\begin{aligned} \lim _{n \rightarrow \infty } \int _X s_n (x) \, W_f (dx) = \int _X g(x) \, W_f (dx). \end{aligned}$$

So, using the result on measurable simple functions and (8), we see that

$$\begin{aligned} \int _Xg (x) \, W_f (dx)&= \lim _{n \rightarrow \infty } \int _X s_n(x) \, W_f (dx)\\&= \lim _{n \rightarrow \infty } \int _I \left( \, \int _X s_n (x) \,f_i {\tilde{w}}_i (a_i)(dx) \right) \, {\tilde{\mu }}_I(di)\\&= \int _I f_i \left( \, \int _X g (x) \, {\tilde{w}}_i(a_i)(dx) \right) \, {\tilde{\mu }}_I (di), \end{aligned}$$

as required. \(\square \)

Applying this result with \(f_i = p_i - \mathbb{E }[p | \chi ]_i\) and \(g(x) = \frac{d \tau }{dW}(x)\), we recall (6) and the definition of \(F_i(a_i)\) and see now that

$$\begin{aligned} \int _X ({\hat{\pi }} (x)- \pi (x)) \, \tau (dx)&= \int _I (p_i - \mathbb{E }[p|\chi ]_i) \left( \, \int _X \frac{d \tau }{dW}(x) \, {\tilde{w}}_i (a_i) (dx) \right) \, {\tilde{\mu }}_I (di) \nonumber \\&= \mathbb{E }_I [ (p_i - \mathbb{E }[p| \chi ]_i) F_i (a_i)]. \end{aligned}$$
(9)
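
As a sanity check of (9), consider the following hypothetical finite example; all numbers are illustrative assumptions. Let \(I = \{1, 2, 3, 4\}\) with \({\tilde{\mu }}_I\) uniform of total mass one, let \(X = \{J, A\}\) with \(\chi _1 = \chi _2 = J\) and \(\chi _3 = \chi _4 = A\), and write each offspring measure as the pair \(({\tilde{w}}_i(a_i)(\{J\}), {\tilde{w}}_i(a_i)(\{A\}))\):

$$\begin{aligned} {\tilde{w}}_1(a_1) = (0, 1), \quad {\tilde{w}}_2(a_2) = (0, 0), \quad {\tilde{w}}_3(a_3) = (2, 1), \quad {\tilde{w}}_4(a_4) = (2, 0), \quad p = (1, 0, 1, 0). \end{aligned}$$

Then \(W = (1, \tfrac{1}{2})\), the transition function has rows \(P(J, \cdot ) = (0, 1)\) and \(P(A, \cdot ) = (\tfrac{1}{2}, \tfrac{1}{2})\), and the invariant probability measure is \(\tau = (\tfrac{1}{3}, \tfrac{2}{3})\). Hence \(\frac{d\tau }{dW} = (\tfrac{1}{3}, \tfrac{4}{3})\) and the fitnesses are \(F_i(a_i) = (\tfrac{4}{3}, 0, 2, \tfrac{2}{3})\) for \(i = 1, \ldots , 4\). The class averages are \(\pi = (\tfrac{1}{2}, \tfrac{1}{2})\) and \({\hat{\pi }} = (\tfrac{1}{2}, 1)\), and the two sides of (9) agree:

$$\begin{aligned} \int _X ({\hat{\pi }}(x) - \pi (x)) \, \tau (dx)&= \left( \tfrac{1}{2} - \tfrac{1}{2} \right) \tfrac{1}{3} + \left( 1 - \tfrac{1}{2} \right) \tfrac{2}{3} = \tfrac{1}{3}, \\ \mathbb{E }_I [ (p_i - \mathbb{E }[p|\chi ]_i) F_i (a_i)]&= \tfrac{1}{4} \left( \tfrac{1}{2} \cdot \tfrac{4}{3} - \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 2 - \tfrac{1}{2} \cdot \tfrac{2}{3} \right) = \tfrac{1}{3}. \end{aligned}$$

Rescaling every \({\tilde{w}}_i(a_i)\) by a common positive constant rescales \(W\) by the same constant and leaves \(P\), \(\tau \), \({\hat{\pi }}\) and \(F_i(a_i)\) unchanged, so the check does not depend on how the offspring measures are normalized.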

We now wish to reintroduce and average over the states of nature \(\omega \in \varOmega \) which have hitherto been fixed.

Fixing both an individual \(i \in I\) and an admissible strategy \(q \in S_i\), Lemma 1 implies that \(\omega \mapsto F_i^{\omega }(q)\) is measurable. So we may define the expected fitness function \(F \in \prod _{i \in I} [0, \infty ]^{S_i}\) by

$$\begin{aligned} F_i (q) = \mathbb{E }^{\varOmega } [F_i^{\omega }(q)], \end{aligned}$$

or, equivalently, by (2),

$$\begin{aligned} F_i (q) = \int _X \frac{d \tau }{dW} (x) \, \mathbb{E }^{\varOmega }[ {\tilde{w}}_i^{\omega }(q)] (dx). \end{aligned}$$
(10)

This is the expected fitness of an individual \(i \in I\) playing strategy \(q \in S_i\), and the latter expression shows it to be given by an appropriately-weighted average of the expected contributions to offspring classes, when playing strategy \(q\).
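
To illustrate the averaging, suppose (hypothetically) that \(\varOmega = \{\omega _1, \omega _2\}\) and that the probability measure \(\nu \) on \(\varOmega \) gives each state probability \(\tfrac{1}{2}\). The definition then reads

$$\begin{aligned} F_i(q) = \tfrac{1}{2} F_i^{\omega _1}(q) + \tfrac{1}{2} F_i^{\omega _2}(q), \end{aligned}$$

so that, for example, state-wise fitnesses \(F_i^{\omega _1}(q) = 2\) and \(F_i^{\omega _2}(q) = \tfrac{1}{2}\) give \(F_i(q) = \tfrac{5}{4}\). The average is arithmetic: the geometric mean of the same two values would be \(1\).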

In this general context we must check that the fitness of the individuals with their realized phenotypes is a well-defined finite number. As above, the map \((i, \omega ) \mapsto F_i^{\omega }(a_i)\) is measurable, and hence by the classical Fubini theorem (Rudin 1966, Theorem 7.8), \(i \mapsto F_i (a_i)\) is measurable. Fubini’s theorem for Bochner integrals (see Dunford and Schwartz 1958, §III.11.9 Theorem 9) implies that

$$\begin{aligned} \mathbb{E }_I[\mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega }(a_i)]] = \mathbb{E }^{\varOmega }[\mathbb{E }_I[{\tilde{w}}_i^{\omega }(a_i)]]. \end{aligned}$$

Hence we note, using (10) and (2) once more, that

$$\begin{aligned} 0 \le \int _I F_i (a_i) \, {\tilde{\mu }}_I (di)&= \int _I \left( \,\int _X \frac{d \tau }{dW} (x)\, \mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega }(a_i)](dx) \right) \, {\tilde{\mu }}_I(di)\\&= \int _X \frac{d \tau }{dW} (x)\, \mathbb{E }_I[\mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega }(a_i)]](dx)\\&= \int _X \frac{d \tau }{dW} (x)\, \mathbb{E }^{\varOmega }[\mathbb{E }_I[{\tilde{w}}_i^{\omega }(a_i)]](dx)\\&= \int _X \frac{d \tau }{dW} (x)\, \mathbb{E }^{\varOmega }[W] (dx)\\&= \int _X \frac{d \tau }{dW}(x)\, W(dx) \\&= \tau (X) \\&< \infty . \end{aligned}$$

Thus \(i \mapsto F_i (a_i)\) is a function in \(L^1(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\), and hence \(0 \le F_i(a_i) < \infty \) for \({\tilde{\mu }}_I\)-almost every \(i \in I\).

The final step is to define the expected value of our class-average value \({\hat{\pi }}^{\omega }\) in the following generation. Radon–Nikodym differentiation is an isometry from \(\{ \mu \in \mathcal{M }(X) : \mu \ll W\}\) to \(L^1(X, \mathcal{X }, W; \mathbb{R })\), so the function \(\omega \mapsto {\hat{\pi }}^{\omega }\) is strongly measurable. Moreover, for each \(\omega \in \varOmega \), we see using standard facts about Bochner integrals (see Diestel and Uhl 1977, Chapter 2, Theorem 2.4) that

$$\begin{aligned} \left\| \frac{d}{dW} \mathbb{E }_I[ p_i {\tilde{w}}_i^{\omega }(a_i) ] \right\| _{L^1(X, \mathcal{X }, W; \mathbb{R })}&\le \Vert \mathbb{E }_I [ p_i {\tilde{w}}_i^{\omega }(a_i) ] \Vert _{\mathcal{M }(X)} \\&\le \mathbb{E }_I \left[ \Vert p_i {\tilde{w}}_i^{\omega }(a_i) \Vert _{\mathcal{M }(X)} \right] \\&\le \Vert p \Vert _{L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })} W(X), \end{aligned}$$

and therefore that \(\omega \mapsto {\hat{\pi }}^{\omega }\) is Bochner integrable. We may therefore define the expected value \({\hat{\pi }} \in L^1(X, \mathcal{X }, W; \mathbb{R })\) by

$$\begin{aligned} {\hat{\pi }} = \mathbb{E }^{\varOmega }[{\hat{\pi }}^{\omega }]. \end{aligned}$$

Using once more that Radon–Nikodym differentiation is an isometry, we use the result of Hille (Diestel and Uhl 1977, Chapter 2, Theorem 2.6) to see that we can commute it with the expectation and conclude that

$$\begin{aligned} {\hat{\pi }} = \mathbb{E }^{\varOmega } \left[ \frac{d}{dW} \mathbb{E }_I[p_i {\tilde{w}}_i^{\omega }(a_i)]\right] = \frac{d}{dW} \mathbb{E }^{\varOmega }[ \mathbb{E }_I[p_i {\tilde{w}}_i^{\omega }(a_i)]]. \end{aligned}$$
(11)

Having established these expected values of our variables, we use expression (9), linearity of expectation and Fubini’s theorem to see that the expected difference in the class-average values calculated in the two generations can be related to the \(p\)-score with which we began and expected fitness:

$$\begin{aligned} \int _X ({\hat{\pi }} (x) - \pi (x)) \, \tau (dx)&= \mathbb{E }^{\varOmega } \left[ \int _X ({\hat{\pi }}^{\omega } (x)- \pi (x) )\, \tau (dx) \right] \nonumber \\&= \mathbb{E }_I \left[ ( p_i - \mathbb{E }[ p |\chi ]_i ) \mathbb{E }^{\varOmega } [ F_i^{\omega }(a_i) ] \right] \nonumber \\&= \mathbb{E }_I \left[ (p_i - \mathbb{E }[ p |\chi ]_i)F_i(a_i)\right] \end{aligned}$$
(12)
$$\begin{aligned}&= \mathbb{E }_I \left[ p_i ( F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i) \right] , \end{aligned}$$
(13)

where the conditional expectation of \(F(a)\) is understood here and elsewhere (including references to the corresponding covariance) to refer to the function \(i \mapsto F_i(a_i)\).

We remark that at this point, with the definitions given and properties assumed, these equations are not necessarily susceptible to intelligible interpretation, despite our suggestive notation, since they do not in general represent the change from one generation to the next of a comparable quantity. When, however, the function \(p :I \rightarrow \mathbb{R }\) is a \(p\)-score, i.e. represents an additive genetic trait and is thus an allele frequency or linear combination of allele frequencies, then as discussed above \(p\) is the composition of a map on an underlying space of genotypes and a map specifying each individual’s genotype. Thus in principle corresponding values may be computed in the offspring generation. Assuming perfect transmission, i.e. no mutation, fair meiosis, and no gametic selection (Grafen 2000), precisely by definition of being an additive genetic trait, the expected mean value by class of this trait in the offspring generation is given by the expectation of the average over the gametic contributions (Falconer 1981). However, (11) implies that for all \(Y \in \mathcal{X }\),

$$\begin{aligned} \int _Y {\hat{\pi }} (x) \, W(dx) = \mathbb{E }^{\varOmega }[\mathbb{E }_I[p_i {\tilde{w}}_i^{\omega }(a_i)]](Y). \end{aligned}$$

The right-hand side is precisely the expected average of the gametic contributions to the set of classes \(Y\). Since \({\hat{\pi }} \in L^1(X, \mathcal{X }, W; \mathbb{R })\) is the unique element satisfying this equation for all \(Y \in \mathcal{X }\), we see that \({\hat{\pi }}\) therefore represents the expected mean value by class of the additive genetic trait in the offspring generation, \(W\)-almost everywhere. Since \(\tau \ll W\), any discrepancies on \(W\)-null sets are lost when weighted by reproductive value \(\tau \). Thus the equations we derived above record the change in the mean value by class of this additive genetic trait from the parental to the offspring generation, weighted by reproductive value. We record these remarks in the following theorem.

Theorem 1

(The Price equation) Supposing perfect transmission, the expected change in the mean value of an additive genetic trait, weighted by reproductive value, is given by

$$\begin{aligned} \int _X ({\hat{\pi }} (x) - \pi (x) )\, \tau (dx)&= \mathbb{E }_I [ (p_i - \mathbb{E }[ p | \chi ]_i) F_i (a_i) ] \nonumber \\&= \mathbb C _I [ p_i - \mathbb{E }[p | \chi ]_i, F_i (a_i) ] \nonumber \\&= \mathbb{E }_I [ \mathbb C _I [(p, F(a)) | \chi ]_i] \nonumber \\&= \mathbb{E }_I [ p_i (F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i)] \end{aligned}$$
(14)
$$\begin{aligned}&= \mathbb{E }_I[ p_i ( \mathrm{{agv}}(F(a) - \mathbb{E }[F(a) | \chi ])_i)], \end{aligned}$$
(15)

where \(p \in {\fancyscript{P}}\) represents the parental values of the trait.

Proof

The interpretation of the left-hand side of (15) is discussed above. The equality of the first four expressions on the right-hand side follows by standard properties of conditional expectations (see Billingsley 1995, Theorem 34.3). Moving from (14) to (15) is possible in this situation because by assumption \(p\) is a \(p\)-score, and therefore we can apply Eq. (3). \(\square \)
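
For the reader's convenience, the conditional-expectation steps can be sketched as follows. We assume here that \({\tilde{\mu }}_I\) is a probability measure, that \(\mathbb C _I\) denotes covariance with respect to it, and that the conditional covariance is \(\mathbb C _I[(p, F(a)) | \chi ] = \mathbb{E }[pF(a) | \chi ] - \mathbb{E }[p | \chi ]\, \mathbb{E }[F(a) | \chi ]\). Since \(\mathbb{E }[p | \chi ]\) and \(\mathbb{E }[F(a) | \chi ]\) are \(\sigma (\chi )\)-measurable and \(p\) is bounded,

$$\begin{aligned} \mathbb{E }_I[\mathbb{E }[p|\chi ]_i \, F_i(a_i)]&= \mathbb{E }_I[\mathbb{E }[p|\chi ]_i \, \mathbb{E }[F(a)|\chi ]_i] = \mathbb{E }_I[p_i \, \mathbb{E }[F(a)|\chi ]_i], \\ \mathbb{E }_I[\mathbb C _I[(p, F(a))|\chi ]_i]&= \mathbb{E }_I[p_i F_i(a_i)] - \mathbb{E }_I[\mathbb{E }[p|\chi ]_i \, \mathbb{E }[F(a)|\chi ]_i], \\ \mathbb C _I[p_i - \mathbb{E }[p|\chi ]_i, F_i(a_i)]&= \mathbb{E }_I[(p_i - \mathbb{E }[p|\chi ]_i)F_i(a_i)] - \mathbb{E }_I[p_i - \mathbb{E }[p|\chi ]_i]\, \mathbb{E }_I[F_i(a_i)]. \end{aligned}$$

The final term in the third line vanishes because \(\mathbb{E }_I[p_i - \mathbb{E }[p|\chi ]_i] = 0\). Subtracting either form of the first line from \(\mathbb{E }_I[p_i F_i(a_i)]\) then shows that all four expressions coincide.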

Equation (15) is the final version of the Price equation in the situation of a class-structured population with uncertainty, when this uncertainty does not affect the class distribution of offspring at the population level.

The Price equation thus represents for us how a special class-weighted mean of any arbitrary weighted sum of allele frequencies changes from this generation to the next. The class-weights do not depend on the allele-weights, and individual expected fitnesses on the right-hand side depend on the class-weights but not on the allele-weights. Thus we obtain some purchase on changes in our arbitrary space of genotypes by knowing how every arbitrary weighted sum of allele frequencies changes. The genetic side can be linked to phenotypes at an individual level because the genetic changes depend only on the expected individual fitnesses. These abstract connections make the most of the phenotype-genotype separation inherent in the original covariance selection mathematics of Price (1970) and, remarkably, allow links to be made to fitness-maximization ideas without any further specification of how genotype determines phenotype.

4.3 Comparison with Grafen (2006b)

Before pursuing the implications of the Price equation for fitness-maximization, we conclude this section by observing that our approach is indeed consistent with that of Grafen (2006b). To see this we reduce our setup to the cases examined there. Thus we suppose there is no effect of uncertainty, so \(\varOmega = \{ \omega \}\), and the offspring distributions have no explicit dependence on class. Rather, the situation Grafen considers is where the offspring distribution depends only on the individual. We may capture this situation by considering the measure \(w(i)\) in the notation of Grafen (2006b) to be given in our notation as \(w(a_i)\). Let \(Y \in \mathcal{X }\). We calculate the class reproductive value of \(Y\), using our definitions and concepts, by regarding our fitness \(F_i (a_i)\) as a Fisherian per-capita reproductive value per ploidy at the level of the individual. So our calculation is, using Lemma 1, change of variables in Radon–Nikodym derivatives, and the invariance of \(\tau \):

$$\begin{aligned} \int _{\chi ^{-1}(Y)} F_i (a_i)\, {\tilde{\mu }}_I (di)&= \int _{\chi ^{-1}(Y)}\left( \, \int _X \frac{d \tau }{dW} (x) \,w(a_i) (dx) \right) \, {\tilde{\mu }}_I (di) \\&= \int _X \frac{d \tau }{dW} (x) \, \mathbb{E }_I[ (1\!\!1_{\chi ^{-1}(Y)})_i w(a_i) ] (dx) \\&= \int _X \frac{d}{dW} (\mathbb{E }_I[ (1\!\!1_{\chi ^{-1}(Y)})_i w(a_i) ])(x) \, \tau (dx) \\&= \int _X \frac{d}{dW} (\mathbb{E }_I[ (1\!\!1_{Y} \circ \chi )_iw(a_i)]) (x)\, \tau (dx) \\&= \int _X T1\!\!1_{Y} (x) \, \tau (dx) \\&= \int _X 1\!\!1_{Y} (x) \, \tau (dx) \\&= \tau (Y). \end{aligned}$$

This is precisely what Grafen considers to be class reproductive value. We can, then, for example, quickly recover Fisher’s sex ratio argument (after Grafen 2006b, §8.1). In this situation we have a space of two classes, \(X = \{ M, F\}\) say, representing the sexes. The assumption of equal male and female contribution to offspring is precisely the assumption that \(\mathbb{E }_I [( {1\!\!1}_{\chi ^{-1}(M)} )_iw(a_i)] = \mathbb{E }_I [ ({1\!\!1}_{\chi ^{-1}(F)})_i w(a_i)] \) as measures on \(X\), so, arguing as above,

$$\begin{aligned} \int _{\chi ^{-1}(M)} F_i (a_i) \, {\tilde{\mu }}_I (di)&= \int _X \frac{d \tau }{dW} (x) \, \mathbb{E }_I [ (1\!\!1_{\chi ^{-1}(M)})_i w(a_i)] (dx) \\&= \int _X \frac{d \tau }{dW} (x) \, \mathbb{E }_I [ (1\!\!1_{\chi ^{-1}(F)})_iw(a_i)](dx) \\&= \int _{\chi ^{-1}(F)} F_i (a_i) \, {\tilde{\mu }}_I (di). \end{aligned}$$

Thus already our work seems to support the interpretation of Fisher’s notion of reproductive value as an evolutionary maximand. We shall uncover further connections to Fisher’s work as we formalize the fitness-maximization consequences of the Price equation in the following sections.
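
The conclusion of the sex ratio argument can be spelled out as a short sketch, assuming that both sexes have positive measure under \({\tilde{\mu }}_I\). Since the two indicator functions sum to one, the two measures in the equal-contribution assumption sum to \(W\), so each equals \(\tfrac{1}{2} W\) and \(P(x, \{M\}) = P(x, \{F\}) = \tfrac{1}{2}\) for \(W\)-almost every \(x \in X\). Invariance of \(\tau \) then gives

$$\begin{aligned} \tau (\{M\}) = \int _X P(x, \{M\}) \, \tau (dx) = \tfrac{1}{2} \tau (X) = \tau (\{F\}). \end{aligned}$$

Combining this with the identity \(\int _{\chi ^{-1}(Y)} F_i(a_i) \, {\tilde{\mu }}_I(di) = \tau (Y)\) obtained above, the mean fitness within each sex is

$$\begin{aligned} \frac{1}{{\tilde{\mu }}_I(\chi ^{-1}(M))} \int _{\chi ^{-1}(M)} F_i(a_i) \, {\tilde{\mu }}_I(di) = \frac{\tau (X)}{2 \, {\tilde{\mu }}_I(\chi ^{-1}(M))}, \qquad \frac{1}{{\tilde{\mu }}_I(\chi ^{-1}(F))} \int _{\chi ^{-1}(F)} F_i(a_i) \, {\tilde{\mu }}_I(di) = \frac{\tau (X)}{2 \, {\tilde{\mu }}_I(\chi ^{-1}(F))}. \end{aligned}$$

Each sex carries half of the total reproductive value regardless of its size, so individuals of the rarer sex have the higher mean fitness.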

5 Optimization

The work of this section is to construct an optimization programme at the same level of generality as the Price equation of the previous section. The instrument will be phenotype, the constraint set will be the set of possible phenotypes, and, most significantly from a biological point of view, the maximand will be expected fitness. This definition of what fitness-maximization means stands in contrast to the way population geneticists have in the past attempted to represent the biologist’s sense of fitness-maximization, namely in terms of natural structures on dynamical systems such as Lyapunov functions and gradient functions (Ewens 2004). The contrast is immediately clear: both of those mathematical concepts are functions from the space of gene-frequencies to the real line, rather than from the set of possible phenotypes; it may also be noted that working with those concepts requires dynamic sufficiency, which the current framework lacks. Grafen (2002) and Grafen (2006a) provided a parallel optimization programme including uncertainty, while Grafen (2006b) was unable to provide one with class structure: this section provides both simultaneously. Thus even providing an optimization programme is a technical advance, and it has biological significance in defining what fitness-maximization means.

It is worth remembering that a strategy, shortly to be defined, is a mapping from environmental cues to actions, and thus the individual is regarded as making decisions in the face of partial knowledge about uncertainty. Solving the programme will therefore imply acting as if in possession of a correct prior distribution over the whole uncertainty, and as if performing appropriate Bayesian updating of that prior in the light of the partial information received. The maximand is a probability-weighted arithmetic mean over the uncertainty, and so this fitness-maximization does not exhibit the risk aversion or bet-hedging that appears in many biological discussions of uncertainty (for reasons best explained by Frank and Slatkin 1990): that difference arises because in this paper fitness is defined as relative to the population mean in each given state of nature. (The recent series of papers by Frank 2011a, b, 2012a, b, c, 2013a, b presents a modern discussion of bet-hedging and many other relevant topics.) The advantage of the relative definition is precisely that we can operate at the current very high level of generality. The bet-hedging models are all, in comparison, very special cases, in that they assume some definite genetic architecture, and they do not focus on individual fitness maximization.

In moving from the population genetic model to the optimization programme, there are two steps of note. The whole population of the genetic model must be reduced to the single implicit decision-taker of the optimization. In order to do this, an assumption must be made to ensure that individuals within the same class are, in some suitable sense, equivalent. Nothing assumed up to now prevents the population being in two separate halves with quite different selection pressures. First noticed by Grafen (2002), this kind of assumption is an essential part of any general argument linking population genetics to fitness maximization. Its precise form could be of interest in applications, in determining whether fitness-maximization ideas can be applied or not. Perhaps even more importantly, requiring mathematical rigour and proofs allowed us to find and articulate this once-latent assumption, and also allows us to be confident that there are no further assumptions waiting to be uncovered.

The definitions and results of the rest of the paper follow those of Grafen (2002), but are extended to include both uncertainty and the division of the population into classes simultaneously.

Definition 3

(Pairwise exchangeability) We say the assumption of pairwise exchangeability holds if

  • for all measurable functions \(v :A \times X \times U \rightarrow \mathbb{R }\), the function

    $$\begin{aligned} (i, q, c) \mapsto \nu \left( \left\{ \omega \in \varOmega : v \left( q (r_i^{\omega }),\chi _i, u_i^{\omega }\right) \le c\right\} \right) \!, \end{aligned}$$

    mapping from \(I \times Q \times \mathbb{R }\) to \([0,1]\), is measurable with respect to the product \(\sigma \)-algebra \(\sigma ( \chi ) \times \mathcal{Q } \times \mathcal{B }\), where \(\mathcal{B }\) denotes the usual Borel \(\sigma \)-algebra on \(\mathbb{R }\); and

  • for all \(x\in X\), \(S_j = S_k\) whenever \(j, k \in \chi ^{-1}(\{x\})\).

The significance lies in the stipulation of the \(\sigma \)-algebra \(\sigma (\chi )\) on \(I\) with respect to which measurability is asserted. Technical reasons arising from the subtleties of integration and measurability on product spaces demand that the assumption is stated in this somewhat elaborate form, but the content of the assertion lies in the reference to \(\sigma (\chi )\), rather than to \(\mathcal{I }\). Roughly, the assertion is that any pair of individuals of the same class have a symmetric distribution of chance events and share the same collection of admissible strategies. This is thus the natural generalization of the corresponding assumption in Grafen (2002, §4).

This can also be read as a comment on how the class structure is defined, in stipulating that classes are not so broad as to allow within them individuals which face on average wholly different situations under the effects of environmental uncertainty. The question of how appropriate and workable the class structure is arises again later (see Definition 7), where a tension in the opposite direction is revealed, towards classes not being too small. We therefore find it more informative to retain pairwise exchangeability as an explicit assumption rather than incorporate it more implicitly into the fundamental properties of the class structure, and we thereby admit the possibility of a structured population in which the assumption fails. For concreteness, suppose we have a population in which males and females have different sets of possible actions, and in which juveniles differ in survivorship from adults in each sex. Then it is possible that the assumption holds when we model the population with a full age and sex class structure. However, if we modelled with only age classes or with only sex classes, the assumption would fail. We further remark that, as indicated in the statements, some of our results hold without this assumption. As an aside, the issues of correctness of class structure, and whether such questions can be precisely formulated, are intriguing, particularly in relation to the interaction of class and genotype, and warrant further attention.
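As a purely illustrative finite sketch of this example, the following Python fragment checks a crude proxy for pairwise exchangeability: that all individuals placed in the same class share the same action set and the same survivorship. The action names, the survivorship values and the function `classes_are_exchangeable` are hypothetical choices made for illustration; the formal assumption of Definition 3 is a measurability condition, which this finite check does not capture in full.

```python
from itertools import product

SEXES = ("female", "male")
AGES = ("juvenile", "adult")

# Hypothetical action sets: they differ between the sexes only.
ACTIONS = {
    "female": frozenset({"forage", "brood"}),
    "male": frozenset({"forage", "display"}),
}
# Hypothetical survivorship: it differs between age groups only.
SURVIVAL = {"juvenile": 0.4, "adult": 0.8}

# Two individuals of each (sex, age) combination.
population = list(product(SEXES, AGES)) * 2


def profile(individual):
    """What an individual can do and the chance events it faces."""
    sex, age = individual
    return (ACTIONS[sex], SURVIVAL[age])


def classes_are_exchangeable(population, class_of):
    """True if all individuals mapped to the same class share one profile."""
    seen = {}
    for individual in population:
        x = class_of(individual)
        if seen.setdefault(x, profile(individual)) != profile(individual):
            return False
    return True


full_structure = classes_are_exchangeable(population, lambda ind: ind)
age_only = classes_are_exchangeable(population, lambda ind: ind[1])
sex_only = classes_are_exchangeable(population, lambda ind: ind[0])
print(full_structure, age_only, sex_only)
```

Under these assumed values the full age-and-sex structure passes the check, whereas the age-only structure fails because of the differing action sets and the sex-only structure fails because of the differing survivorship, in line with the discussion above.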

The important consequence of pairwise exchangeability is captured in the following lemma, in which we see that under this assumption, within classes, expected fitness is a function only of strategies, not of individuals. Again, technical reasons demand that the result is stated more subtly, but the essence is the blindness to individual differences within classes, in expectation, of fitness. This allows us to pass from a class of individuals each playing their own strategy to a single implicit decision maker in each class.

Lemma 7

Suppose we have pairwise exchangeability.

Then the map

$$\begin{aligned} \bigcup _{i \in I} (\{i\}\times S_i) \ni (i, q) \mapsto F_i (q), \end{aligned}$$

assigning an expected fitness to an individual playing a strategy admissible for that individual, is measurable with respect to the induced product \(\sigma \)-algebra \(\sigma (\chi ) \times \mathcal{Q }\). As in the definition of pairwise exchangeability, the significance of the statement lies in the assertion of measurability with respect to the \(\sigma \)-algebra \(\sigma (\chi )\).

Proof

By (10) and the definition of the integral, it suffices to show that for each set of classes \(Y \in \mathcal{X }\) the map

$$\begin{aligned} (i, q) \mapsto \mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega } (q)](Y) \end{aligned}$$

is measurable with respect to \(\sigma ( \chi ) \times \mathcal{Q }\). So fix a set \(Y \in \mathcal{X }\). Using Lemma 2 and, for example, Kingman and Taylor (1966, Theorem 11.4), we have for each individual \(i \in I\) and strategy \(q \in S_i\) that

$$\begin{aligned} \mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega } (q)](Y)\!=\! \mathbb{E }^{\varOmega }[{\tilde{w}}_i^{\omega }(q)(Y)] \!=\! \int _{[0, \infty )} \left( 1 - \nu (\{ \omega \in \varOmega : {\tilde{w}}_i^{\omega }(q) (Y) \!\le \! c \}) \right) \, {\fancyscript{L}}^1 (dc),\quad \end{aligned}$$
(16)

where the final integral is with respect to the usual one-dimensional Lebesgue measure \({\fancyscript{L}}^1\) on \(\mathbb{R }\), and we have used that \({\tilde{w}}_i^{\omega }(q)(Y) \ge 0\). That this final expression is measurable as a function of \((i, q)\) with respect to \(\sigma (\chi ) \times \mathcal{Q }\) follows by pairwise exchangeability and Fubini’s theorem. \(\square \)

Definition 4

(Optimization programme) Let

$$\begin{aligned} {\mathcal{S }} \subseteq \{ s :I \times R \rightarrow A : s \mathrm{{\,is\,measurable\,and\,}} s_i \in S_i \mathrm{{\,for\,all\,}} i \in I\} \end{aligned}$$

contain the realized phenotype allocation \(a\) and satisfy the following substitution condition: if \(s, t \in {\mathcal{S }}\) and \(J \in \mathcal{I }\), then the function \(s_{t, J} \in \prod _{i \in I} S_i\) defined by

$$\begin{aligned} (s_{t, J})_i (r) = {\left\{ \begin{array}{ll} s_i (r) &{} i \notin J \\ t_i (r) &{} i \in J \end{array}\right. } \end{aligned}$$

also lies in \({\mathcal{S }}\). Thus substituting a different admissible specification of phenotypes on a certain subset of individuals is also admissible. Let \(k \in I\). We consider the optimization programme of maximizing \(F_k (s_k)\) over all \(s \in {\mathcal{S }}\), where \(s :k \mapsto s_k \in S_k\). We say \({\bar{s}} \in {\mathcal{S }}\) is a solution for \(k\) in relation to \({\mathcal{S }}\) if \(F_k({\bar{s}}_k) \ge F_k(s_k)\) for all \(s \in {\mathcal{S }}\).

Remark 2

The substitution condition on the choice set \({\mathcal{S }}\) precisely stipulates that the strategies available to any one individual do not depend on the strategies played by any other set of individuals, and is implied by our underlying biological assumption of lack of social interaction.

We now move on to two key concepts relating to the Price equation. They are carefully constructed to be weak enough to be usable despite our lack of a model connecting phenotype to genotype, but strong enough to allow meaningful links to be made to optimization. Scope for selection almost means that extant gene frequencies do change in expectation; in fact, it says that there is a possible constructible \(p\)-score that would change in expectation, where we allow ourselves to construct a ‘possible’ \(p\)-score by assigning an arbitrary real number to each individual (in a measurable way, of course).

Definition 5

(Scope for selection) We say there is no scope for selection whenever the expected change in any class average value is zero, i.e. whenever

$$\begin{aligned} \int _X( {\hat{\pi }} (x) - \pi (x)) \, \tau (dx) = 0 \end{aligned}$$

for all \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\) (recall that \(\pi \) and \({\hat{\pi }}^{\omega }\) depend on \(p\)). Thus there is scope for selection when there are differences in fitness that could cause an allele frequency to change. We emphasize that this definition discusses behaviour for arbitrary essentially bounded functions \(p\), not just \(p\)-scores.

This is an analogue of the first part of the ESS definition of Maynard Smith and Price (1973), and also of a first-order condition in simple optimization. Note that the condition is that no mean \(p\)-score and so no gene frequency changes, and that nothing is said about genotype frequencies. This is an inevitable consequence of our abstract setting, and has consequences for the interpretation of the links to be proved later on.

The second condition considers a counter-factual case in which some of the phenotypes in the population are replaced with a new phenotype, and asks whether, supposing the individuals with altered phenotype each had one copy of a new allele, that allele would spread in expectation. There is potential for selection if there is a possible phenotype for which the answer is yes. Thus scope for selection is about standing variation in phenotypes, and potential for selection is about whether a new phenotype would spread if caused by a rare dominant mutation.

Definition 6

(Potential for positive selection) Consider an alternative set of strategies \(s \in {\mathcal{S }}\) and a subset of individuals \(J \in \mathcal{I }\). We define a hypothetical rival set of admissible strategies \({\tilde{a}}_{s, J}\in {\mathcal{S }}\) by

$$\begin{aligned} ({\tilde{a}}_{s, J})_i (r)= {\left\{ \begin{array}{ll} a_i (r) &{} i \notin J \\ s_i(r) &{} i \in J. \end{array}\right. } \end{aligned}$$

Then \({\tilde{a}}_{s, J}\in {\mathcal{S }}\) represents individuals in \(J\) swapping to strategies given by \(s\), and indeed is itself admissible and lies in \({\mathcal{S }}\) precisely by our substitution assumption on \({\mathcal{S }}\).

We say there is no potential for positive selection in relation to \({\mathcal{S }}\) if for all \(s\in {\mathcal{S }}\) and all \(J \in \mathcal{I }\), we have

$$\begin{aligned} \mathbb{E }_I[(d_i^{-1}(1\!\!1_{J})_i - \mathbb{E }[ (d^{-1}1\!\!1_{J}) | \chi ]_i) F_i((\tilde{a}_{s, J})_i)] \le 0. \end{aligned}$$

Note that the condition is satisfied trivially, with equality holding, if \({\tilde{\mu }}_I(J) = 0\). We make the stipulation that the substitution of strategy on the set \(J\) affects only the evaluation of the function \(F\) on points \(i \in J\), despite the fact that \(F\) depends by definition on \(W\), which is in general a different measure under this substitution.

The purpose of this definition is to enable us to discuss the possibility of a mutant allele invading the population on some set of individuals \(J\), and thereby altering the strategy \(a_i\) of these individuals to \(s_i\). The defining inequality comes of course from the relation (12). The function \(i \mapsto d^{-1}_i ({1\!\!1}_{J})_i\) is the \(p\)-score obtained by allocating weight \(1\) to the mutant allele and \(0\) to all others: it therefore represents the presence of one copy of the mutant allele in precisely those individuals \(i \in J\). Thus there is potential for selection when a rare mutant, producing with dominance some admissible phenotype, would initially increase in density in the population. The restriction of rarity arises because of the implicit assumption that no individual possesses more than one copy of the mutant gene.
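To make the defining expression concrete, the following sketch evaluates it on a finite toy population. We assume, consistently with the passage from (17) to (18) in the proof of Theorem 2, that \(\mathbb{E }_I\) is the expectation under the ploidy-weighted measure obtained by normalizing \(d_i \, \mu _I(di)\), and that conditional expectation given \(\chi \) is the corresponding weighted class average. The population, the fitness values and all function names are hypothetical, and the expected fitnesses are supplied directly rather than derived from a model.

```python
from fractions import Fraction


def tilde_mu(mu, d):
    """Assumed weighting: mu_i * d_i, normalized to total mass 1."""
    total = sum(m * k for m, k in zip(mu, d))
    return [Fraction(m * k, total) for m, k in zip(mu, d)]


def cond_exp(f, chi, w):
    """E[f | chi]: the w-weighted average of f over each individual's class."""
    n = len(f)
    out = []
    for i in range(n):
        members = [j for j in range(n) if chi[j] == chi[i]]
        mass = sum(w[j] for j in members)
        out.append(sum(f[j] * w[j] for j in members) / mass)
    return out


def selection_potential(J, F_a, F_s, chi, d, mu):
    """E_I[(d^{-1} 1_J - E[d^{-1} 1_J | chi]) F((a~_{s,J})_i)] on a finite population."""
    n = len(chi)
    w = tilde_mu(mu, d)
    # p-score of the mutant allele: one copy in exactly the individuals of J.
    score = [Fraction(int(i in J), d[i]) for i in range(n)]
    mean_score = cond_exp(score, chi, w)
    # Expected fitness under the rival allocation: s on J, a elsewhere.
    rival = [F_s[i] if i in J else F_a[i] for i in range(n)]
    return sum((score[i] - mean_score[i]) * rival[i] * w[i] for i in range(n))


# Hypothetical population: a haploid class 0 and a diploid class 1.
chi = [0, 0, 0, 1, 1, 1]
d = [1, 1, 1, 2, 2, 2]
mu = [1] * 6
F_a = [2, 2, 2, 3, 3, 3]        # expected fitness of the realized strategies
F_better = [3, 2, 2, 3, 3, 3]   # alternative that raises the fitness of individual 0
F_worse = [1, 2, 2, 3, 3, 3]    # alternative that lowers the fitness of individual 0

print(selection_potential({0}, F_a, F_better, chi, d, mu))
print(selection_potential({0}, F_a, F_worse, chi, d, mu))
```

In this toy case the expression is positive when the mutant phenotype raises the expected fitness of its single carrier and negative when it lowers it, so only the first alternative would witness potential for positive selection.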

6 Links between gene dynamics and optimization

Having established the situation, we prove analogous versions of the four links to be found in Grafen (2002). These comprise three implications for gene dynamics based on optimization assumptions, and one implication for optimization based on assumptions about gene dynamics. As the instrument and constraint set of the optimization are more or less copied over from the population genetic assumptions, the focus of interest is the maximand. In particular, it is important to show that the maximand has many of the properties a biologist would wish ‘fitness’ or ‘expected fitness’ to have. From the point of view of its construction, it is an average over environmental uncertainty of a class-weighted sum of offspring numbers, which is biologically reasonable. From the point of view of links to genetics, an unattainable dream would be to show that exactly when each individual maximizes its fitness, the population genetic system itself is in equilibrium. Our level of abstraction prevents us from obtaining such a definite result, but it is in any event untrue, for example in simple cases such as over-dominance in a diploid population (Allison 1954). We therefore seek to prove the strongest possible results in that direction, with the twin aims of showing that there are close ties between fitness-maximization and gene-frequency change, and that our definition of fitness is essentially unique, though this latter point will not be pursued formally in the current paper. The interpretation of the theorems will be discussed in the following section.

Theorem 2

Suppose we have pairwise exchangeability, and suppose for some set of individuals \({\tilde{I}} \in \mathcal{I }\) of full \({\tilde{\mu }}_I\)-measure that \(a\) is a solution for \(i\) in relation to \({\mathcal{S }}\) for every \(i \in {\tilde{I}}\).

Then there is no scope for selection and no potential for positive selection in relation to \({\mathcal{S }}\).

Proof

The trick is to exploit the assumption of pairwise exchangeability to infer from the assumption of maximization that the expected fitness of the realized phenotypes is equal to its class average. By Lemma 7 and the Doob–Dynkin Lemma (see Rao 2004, §3.1 Theorem 8), there is a measurable function \(H :X \times Q \rightarrow [0, \infty ]\) such that

$$\begin{aligned} F_i(q) = H(\chi _i, q) \end{aligned}$$

for all \(i \in I\) and \(q \in S_i\).

Fix \(x \in X\) and consider \(j, k \in \chi ^{-1}(\{x\}) \cap {\tilde{I}}\). Note that by pairwise exchangeability \(a_j \in S_k\) and \(a_k \in S_j\), so it makes sense to evaluate \(F_k(a_j)\) and \(F_j (a_k)\). Then

$$\begin{aligned} F_j(a_j) = H( \chi _j, a_j) = H(\chi _k, a_j) = F_k (a_j) \le F_k (a_k), \end{aligned}$$

since \(a\) is a solution for \(k\) in relation to \({\mathcal{S }}\). Swapping the roles of \(j\) and \(k\) we get the reverse inequality, thus \(F_k (a_k) = F_j (a_j)\). Hence for any Borel set \(B \subseteq \mathbb{R }\),

$$\begin{aligned}&\{ i \in {\tilde{I}} : F_i(a_i) \in B \} \\&\quad = \{ i \in {\tilde{I}} : (\chi _i, a_i) \in H^{-1}(B) \} \\&\quad = \{ i \in {\tilde{I}} :(\{ \chi _i \} \times \{ a_j : j \in \tilde{I} \cap \chi ^{-1} (\{\chi _i\})\} )\cap H^{-1}(B) \ne \emptyset \} \\&\quad = {\tilde{I}} \cap \chi ^{-1} \Big ( \{ x \in X : ( \{x\} \times \{ a_j : j \in \tilde{I} \cap \chi ^{- 1}( \{x\})\} ) \cap H^{-1} (B) \ne \emptyset \} \Big ) \end{aligned}$$

Since \(H\) is measurable, this final line is the restriction to \({\tilde{I}}\) of the \(\chi \)-pre-image of a measurable subset of \(X\), by our assumption (1), which therefore lies in \(\sigma (\chi )\) by definition. Hence \(i \mapsto F_i(a_i)\) is measurable with respect to \(\sigma (\chi )\) as a map from \({\tilde{I}}\). By definition, \(i \mapsto \mathbb{E }[ F(a) | \chi ]_i\) is measurable with respect to \(\sigma (\chi )\). Furthermore, by definition of conditional expectation, and since \(I \backslash {\tilde{I}}\) is \({\tilde{\mu }}_I\)-null, we see for any \(Y \in \mathcal{X }\) that

$$\begin{aligned} \int _{{\tilde{I}} \cap \chi ^{-1}(Y) } F_i(a_i) \, {\tilde{\mu }}_I (di)&= \int _{\chi ^{-1}(Y)} F_i (a_i) \, {\tilde{\mu }}_I (di) \\&= \int _{\chi ^{-1}(Y)} \mathbb{E }[ F(a) | \chi ]_i \, {\tilde{\mu }}_I (di) \\&= \int _{\tilde{I} \cap \chi ^{-1}(Y)} \mathbb{E }[ F (a) | \chi ]_i \, {\tilde{\mu }}_I (di). \end{aligned}$$

Since \(Y\) is arbitrary, this implies (see for example Halmos 1950, §25 Theorem E) that \(F_i(a_i) = \mathbb{E }[F(a) | \chi ]_i\) for \({\tilde{\mu }}_I\)-almost every \(i \in \tilde{I}\), and hence for \({\tilde{\mu }}_I\)-almost every \(i \in I\). Thus the expression (13) is zero for any \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\).

For the second assertion of the theorem, fix \(s \in {\mathcal{S }}\) and \(J \in \mathcal{I }\). Note that the first argument gives us in particular that

$$\begin{aligned} \mathbb{E }_I [ (d_i^{-1}(1\!\!1_{J})_i - \mathbb{E }[d^{-1} 1\!\!1_{J} | \chi ]_i) F_i(a_i) ] = \mathbb{E }_I[ d_i^{-1}(1\!\!1_{J})_i ( F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i)] = 0, \end{aligned}$$

using properties of conditional expectation, and recalling that ploidy \(d\) is measurable with respect to \(\sigma {(\chi )}\). So again using properties of conditional expectation, we see that, since \(F_i((\tilde{a}_{s, J})_i) = F_i (a_i)\) for \(i \notin J\),

$$\begin{aligned}&\mathbb{E }_I \left[ (d_i^{-1}(1\!\!1_{J})_i - \mathbb{E }[d^{-1}1\!\!1_{J} | \chi ]_i )F_i((\tilde{a}_{s, J})_i)\right] \nonumber \\&\quad = \mathbb{E }_I \left[ d_i^{-1}((1\!\!1_{J})_i - \mathbb{E }[1\!\!1_{J} | \chi ]_i) (F_i((\tilde{a}_{s, J})_i) - F_i(a_i)) \right] \end{aligned}$$
(17)
$$\begin{aligned}&\quad = \left( \int _I d_i \ \mu _I (di) \right) ^{\! -1}\!\!\int _J d_i^{-1}((1\!\!1_{J})_i - \mathbb{E }[1\!\!1_{J} | \chi ]_i)\left( F_i((\tilde{a}_{s, J})_i) -F_i(a_i)\right) d_i\, \mu _I (di) \nonumber \\&\quad = \left( \int _I d_i \ \mu _I (di) \right) ^{\! -1}\!\!\int _J ( 1 - \mathbb{E }[1\!\!1_{J} | \chi ]_i )\left( F_i((\tilde{a}_{s, J})_i) - F_i(a_i)\right) \, \mu _I (di) \nonumber \\&\quad \le 0, \end{aligned}$$
(18)

where the final inequality follows since \(\mathbb{E }[1\!\!1_{J} | \chi ]_i \le 1\) \(\mu _I\)-almost everywhere and \( F_i((\tilde{a}_{s, J})_i) \le F_i (a_i)\) for \(i \in \tilde{I}\) because \(a\) is a solution for \(i\) in \({\mathcal{S }}\) by assumption, and \(\mu _I ( J \backslash {\tilde{I}}) ={\tilde{\mu }}_I ( J \backslash {\tilde{I}})= 0\). \(\square \)

We must now consider on what subsets of individuals we can consider a hypothetical mutant allele invading, when discussing potential for positive selection.

Definition 7

(Invadable sets) Let

$$\begin{aligned} \overline{\sigma ( \chi )} = \{ J \in \mathcal{I } : J = K \triangle N \mathrm{{\,for\,some\,}} K \in \sigma ( \chi ) \mathrm{{\,and\,some\,null\,set\,}} N \in \mathcal{I }\}, \end{aligned}$$

where \(K \triangle N = (K \backslash N) \cup (N \backslash K)\) is the usual set-theoretic symmetric difference. We shall say that \(J \in \mathcal{I }\) of positive \({\tilde{\mu }}_I\)-measure is invadable if there exists \(K \in \mathcal{I } \backslash \overline{\sigma (\chi )}\) of positive \({\tilde{\mu }}_I\)-measure such that \({\tilde{\mu }}_I ( K \backslash J) = 0\). In particular any set \(J \in \mathcal{I } \backslash \overline{ \sigma (\chi )}\) of positive \({\tilde{\mu }}_I\)-measure is invadable. We shall further say that our phenotype specification \(a :I \times R \rightarrow A\) is invadable if the set

$$\begin{aligned} \{ i \in I : a \mathrm{{\,is\,}} not \mathrm{{\,a\,solution\,for\,}} i \mathrm{{\,in\,relation\,to\,}} {\mathcal{S }}\} \end{aligned}$$

is either \({\tilde{\mu }}_I\)-null or contains an invadable subset.

Invadable sets and their existence are determined by the nature of the class map \(\chi \), as the remarks below indicate. Class is intuitively intended to be a way of dividing a large population of individuals into smaller groups: naively one would expect each class to be populated by many individuals, in which case invadable sets are in abundance. In the generality pursued here, where no restriction is placed on the cardinality of either the population or the set of classes, such informal remarks have little meaning, but the principle underlying this discussion is that the condition of a set being invadable should not, under a natural and useful class allocation, be a restrictive one.

The technical point of this definition is that an invadable set contains a subset \(K\) for which the defining condition of potential for positive selection is not trivially zero. Restricting attention to invadable sets ensures that the presence or absence of potential for positive selection is indeed determined by the evaluation of the maximand.

Remark 3

The crucial consequence of \(K\) not lying in \(\overline{\sigma (\chi )}\) is that then it is not the case that \(\mathbb{E }[1\!\!1_{K} | \chi ] = 1\) \({\tilde{\mu }}_I\)-almost everywhere on \(K\).

Remark 4

Removing a \({\tilde{\mu }}_I\)-null set from an invadable set leaves an invadable set, and supersets of invadable sets are invadable.

Remark 5

We note the following special cases:

  5.1 \(J \in \mathcal{I }\) is invadable if

    $$\begin{aligned} 0 < \mu _I (J) < \inf \left\{ \mu _I (\chi ^{-1} (\{x \})) : x \in X \ \mathrm{{such\,that}}\ \mu _I ( \chi ^{-1}( \{ x \}) ) > 0 \right\} \!. \end{aligned}$$

  5.2 If \(\chi ^{-1}(\{x\})\) is an atom of \(\mu _I\) for all \(x \in X\), then no set is invadable.

  5.3 If \(I\) is finite, and singletons have positive \(\mu _I\)-measure, then invadable sets exist if and only if \(\chi \) is not injective.
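The finite special case can be explored by brute force. The sketch below assumes a finite population in which every singleton has positive measure, so that the only null set is empty and \(\overline{\sigma (\chi )}\) reduces to the collection of unions of whole classes; a set \(J\) is then invadable exactly when it contains a non-empty subset that is not such a union. The function names are hypothetical, and the enumeration is exponential in the population size, so it is only meant for very small examples.

```python
from itertools import combinations


def fibres(chi):
    """The classes chi^{-1}({x}), as sets of individual indices."""
    out = {}
    for i, x in enumerate(chi):
        out.setdefault(x, set()).add(i)
    return list(out.values())


def in_sigma_chi(K, chi):
    """True if K is a union of whole classes, i.e. K lies in sigma(chi)."""
    K = set(K)
    return all(f <= K or not (f & K) for f in fibres(chi))


def nonempty_subsets(J):
    """All non-empty subsets of J."""
    items = sorted(J)
    for r in range(1, len(items) + 1):
        for c in combinations(items, r):
            yield set(c)


def is_invadable(J, chi):
    """J is invadable if some non-empty K inside J is not a union of classes."""
    return any(not in_sigma_chi(K, chi) for K in nonempty_subsets(J))


def invadable_sets_exist(chi):
    """Search all non-empty subsets of the population for an invadable one."""
    return any(is_invadable(J, chi) for J in nonempty_subsets(range(len(chi))))


print(invadable_sets_exist([0, 1, 2]))  # injective class map
print(invadable_sets_exist([0, 0, 1]))  # two individuals share a class
```

With an injective class map every set is a union of classes, so the search finds nothing; as soon as two individuals share a class, any single one of them forms an invadable set, in agreement with the third special case.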

Theorem 3

Suppose that there exists a set of individuals \({\tilde{I}} \in \mathcal{I }\) of full \({\tilde{\mu }}_I\)-measure such that \(F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i = 0\) for each \(i \in {\tilde{I}}\), but that the set

$$\begin{aligned} J = \{ i \in I : a\ \mathrm{{is}}\ not\ \mathrm{{a\,solution\,for\,}} i \mathrm{{\,in\,relation\,to\,}} {\mathcal{S }} \} \end{aligned}$$

is of positive \({\tilde{\mu }}_I\)-measure.

Then there is no scope for selection. However, assuming pairwise exchangeability and that \(a\) is invadable, there is potential for positive selection in relation to \({\mathcal{S }}\).

Proof

For the first assertion, we first observe that the corresponding assertion of Theorem 2 in fact required only that \(F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i = 0\) \({\tilde{\mu }}_I\)-almost everywhere. In that situation this was implied by optimality of \(a\) for \({\tilde{\mu }}_I\)-almost every individual \(i\), but here it is given explicitly by assumption. So the same proof applies.

For the second assertion, by the assumption of invadability, we can choose sets \(K, {\tilde{K}} \in \mathcal{I }\) of positive \({\tilde{\mu }}_I\)-measure with \({\tilde{K}} \subseteq K\) such that \((1\!\!1_{K})_i - \mathbb{E }[1\!\!1_{K} | \chi ]_i > 0\) for each \(i \in \tilde{K}\), and \({\tilde{\mu }}_I ( K \backslash J) = 0\).

We know that, since \(a \in \mathcal{S }\) is not a solution in relation to \(\mathcal{S }\) for each \(i \in K \cap J \), we can choose \(s \in \mathcal{S }\) and for each \(i \in K \cap J\) a number \(\epsilon _i > 0\) such that \(F_i(a_i) + \epsilon _i < F_i(s_i)\). Since \(i \mapsto F_i (s_i)\) is measurable, the map \(i \mapsto \epsilon _i\) can be assumed to be measurable. By the substitution assumption on \(\mathcal{S }\), we have \({\tilde{a}}_{s,K} \in \mathcal{S }\). Thus for \(i \in K \cap J\), we have by choice of \(s\) that

$$\begin{aligned} F_i (({\tilde{a}}_{s, K})_i) > F_i (a_i) + \epsilon _i. \end{aligned}$$
(19)

Since there is no scope for selection, we again have the identity (17), and so we can argue, using (19) and the fact that \({\tilde{\mu }}_I ( K \backslash J) = 0\), as follows:

$$\begin{aligned}&\mathbb{E }_I \left[ (d_i^{-1}(1\!\!1_{K})_i - \mathbb{E }[d^{-1}1\!\!1_{K} | \chi ]_i)F_i((\tilde{a}_{s, K})_i)\right] \nonumber \\&\quad = \left( \int _I d_i \ \mu _I (di) \right) ^{-1} \int _{K} ((1\!\!1_{K})_i - \mathbb{E }[1\!\!1_{K} | \chi ]_i)\left( F_i((\tilde{a}_{s, K})_i) - F_i(a_i)\right) \, \mu _I(di) \\&\quad \ge \left( \int _I d_i \ \mu _I (di) \right) ^{-1} \int _{K \cap J} ((1\!\!1_{K})_i - \mathbb{E }[1\!\!1_{K} | \chi ]_i) \epsilon _i \, \mu _I(di) \\&\quad \ge \left( \int _I d_i \ \mu _I (di) \right) ^{-1} \int _{{\tilde{K}} \cap J} ((1\!\!1_{K})_i - \mathbb{E }[1\!\!1_{K} | \chi ]_i) \epsilon _i \, \mu _I(di) \\&\quad > 0, \end{aligned}$$

since the choice of \({\tilde{K}}\) implies that the integrand is strictly positive at each point of \({\tilde{K}} \cap J\), which is a set of positive \(\mu _I\)-measure. \(\square \)
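The role of invadability in this argument can be seen on a finite toy population. The sketch below evaluates the second line of the final display, in which the fitness gain on \(K\) is weighted by \((1\!\!1_{K})_i - \mathbb{E }[1\!\!1_{K} | \chi ]_i\). We assume that the conditional expectation is the ploidy-weighted share of each class lying in \(K\); the list `gain` stands for \(F_i(s_i) - F_i(a_i)\), and the population and all names are hypothetical.

```python
from fractions import Fraction


def class_share(K, chi, d, mu):
    """E[1_K | chi]_i: the ploidy-weighted share of i's class that lies in K."""
    n = len(chi)
    out = []
    for i in range(n):
        members = [j for j in range(n) if chi[j] == chi[i]]
        mass = sum(d[j] * mu[j] for j in members)
        in_K = sum(d[j] * mu[j] for j in members if j in K)
        out.append(Fraction(in_K, mass))
    return out


def reduced_potential(K, gain, chi, d, mu):
    """(sum_i d_i mu_i)^{-1} * sum_{i in K} (1 - E[1_K | chi]_i) * gain_i * mu_i."""
    share = class_share(K, chi, d, mu)
    total = sum(k * m for k, m in zip(d, mu))
    return sum((1 - share[i]) * gain[i] * mu[i] for i in K) / total


# Hypothetical population: every individual of class 0 could gain one unit of fitness.
chi = [0, 0, 0, 1, 1, 1]
d = [1, 1, 1, 2, 2, 2]
mu = [1] * 6
gain = [1, 1, 1, 0, 0, 0]

print(reduced_potential({0, 1, 2}, gain, chi, d, mu))  # K is a whole class
print(reduced_potential({0}, gain, chi, d, mu))        # K is a proper part of a class
```

When \(K\) is the whole of class 0 the weight vanishes on \(K\) and the expression is zero, even though every member of \(K\) could do better; when \(K\) is a proper part of the class the weight is strictly positive and so is the expression. This is the situation described in Remark 3.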

Theorem 4

Suppose there exists a set of individuals \(J \in \mathcal{I }\) of positive \({\tilde{\mu }}_I\)-measure such that \(F_i (a_i) - \mathbb{E }[ F(a) | \chi ]_i \ne 0\) for each \(i \in J\).

Then there is scope for selection.

Proof

Define \(p :I \rightarrow \mathbb{R }\) by \(p_i = F_i(a_i) - \mathbb{E }[F (a) | \chi ]_i\). Then we do not know a priori that \(p\) defines an essentially bounded function on \(I\). However, since it is only important for this proof that the function is non-zero on a set of positive measure, we can truncate the function if necessary and assume without loss of generality that \(p \in L^{\infty }(I, \mathcal{I }, {\tilde{\mu }}_I; \mathbb{R })\). Then (13) implies, for this definition of \(p\), that

$$\begin{aligned} \int _X( {\hat{\pi }} (x) - \pi (x) )\, \tau (dx)&= \mathbb{E }_I [(F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i) (F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i)] \\&\ge \int _J (F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i)^2 \, {\tilde{\mu }}_I(di) \\&> 0. \end{aligned}$$

\(\square \)
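The choice of \(p\) in this proof can be illustrated numerically. The sketch below assumes, as the use of (13) above suggests, that the expected change in the class-weighted mean of \(p\) equals \(\mathbb{E }_I[p_i (F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i)]\), with \(\mathbb{E }_I\) taken under the ploidy-weighted normalized measure. The finite population, the fitness values and the function names are hypothetical.

```python
from fractions import Fraction


def tilde_mu(mu, d):
    """Assumed weighting: mu_i * d_i, normalized to total mass 1."""
    total = sum(m * k for m, k in zip(mu, d))
    return [Fraction(m * k, total) for m, k in zip(mu, d)]


def cond_exp(f, chi, w):
    """E[f | chi]: the w-weighted average of f over each individual's class."""
    n = len(f)
    out = []
    for i in range(n):
        members = [j for j in range(n) if chi[j] == chi[i]]
        mass = sum(w[j] for j in members)
        out.append(sum(f[j] * w[j] for j in members) / mass)
    return out


def deviation(F, chi, d, mu):
    """F_i(a_i) - E[F(a) | chi]_i for each individual i."""
    w = tilde_mu(mu, d)
    return [f - c for f, c in zip(F, cond_exp(F, chi, w))]


def expected_change(p, F, chi, d, mu):
    """E_I[p_i * (F_i(a_i) - E[F(a) | chi]_i)], the assumed form of (13)."""
    w = tilde_mu(mu, d)
    dev = deviation(F, chi, d, mu)
    return sum(p[i] * dev[i] * w[i] for i in range(len(p)))


chi = [0, 0, 1, 1]
d = [1, 1, 2, 2]
mu = [1] * 4

F_varied = [1, 3, 2, 2]   # fitness differs within class 0
F_flat = [2, 2, 5, 5]     # fitness is constant within each class

p = deviation(F_varied, chi, d, mu)  # the choice of p made in the proof
print(expected_change(p, F_varied, chi, d, mu))
print(expected_change([7, -3, 1, 0], F_flat, chi, d, mu))
```

With fitness varying inside a class, the deviation itself is a choice of \(p\) whose expected change is a sum of squares and hence strictly positive; with fitness constant inside every class the expression is zero for any \(p\), which is the first assertion of Theorem 2.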

Theorem 5

Suppose we have pairwise exchangeability and that \(a\) is invadable, and suppose there is no scope for selection and no potential for positive selection in relation to \(\mathcal{S }\).

Then there exists a set of individuals \({\tilde{I}} \in \mathcal{I }\) of full \({\tilde{\mu }}_I\)-measure such that \(a\) is a solution for \(i\) in relation to \(\mathcal{S }\) for each \(i \in {\tilde{I}}\).

Proof

By the contrapositive to Theorem 4, since there is no scope for selection we know that \(F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i = 0\) for \({\tilde{\mu }}_I\)-almost every \(i \in I\).

Given this, the contrapositive to the second assertion of Theorem 3 gives the required result. \(\square \)

A glance at the proof of Theorem 4 prompts us to record the following important consequence of our work.

Theorem 6

(Fisher’s Fundamental Theorem of Natural Selection (Discrete Time)) Let \(p \in {\fancyscript{P}}\) be defined by

$$\begin{aligned} p_i = \mathrm{{agv}}( F(a) - \mathbb{E }[F(a) | \chi ])_i. \end{aligned}$$

Then

$$\begin{aligned}&\int _X ({\hat{\pi }} (x) - \pi (x)) \, \tau (dx)\\&\quad = \int _I \left( \mathrm{{agv}} ( F(a) - \mathbb{E }[F(a) | \chi ])_i - \mathbb{E }_I[ \mathrm{{agv}}( F(a) - \mathbb{E }[F(a) | \chi ])]\right) ^{\! 2} \, {\tilde{\mu }}_I (di), \end{aligned}$$

recalling that

$$\begin{aligned} \pi (\chi _i) = \mathbb{E }[ (\mathrm{{agv}}(F(a) - \mathbb{E }[F(a) | \chi ] ) | \chi ]_i \end{aligned}$$

and

$$\begin{aligned} {\hat{\pi }}(x) = \mathbb{E }^{\varOmega }\left[ \frac{d}{dW}\mathbb{E }_I[(\mathrm{{agv}}(F(a) - \mathbb{E }[F(a) | \chi ])_i {\tilde{w}}_i^{\omega }(a_i)]\right] \end{aligned}$$

represent the (expected) mean values by class of \(\mathrm{{agv}}(F(a) - \mathbb{E }[F(a) | \chi ])\) in the parental and the offspring generations respectively.

That is: the expected change in the mean of the additive genetic value of the deviation of the expected fitness from the class mean, weighted by reproductive value, is equal to the (unweighted) variance of the additive genetic value of the deviation of the expected fitness from the class mean.

Proof

We first recall the definition of \(p\)-scores, and observe that if we choose a locus \(1 \le L \le N\), this corresponds to choosing the column \((g_{k,L})_{k = 1}^n\) of the genotype matrix \(\,G=(g_{k,l}) \in \mathbb{R }^{n \times N}\). The total number of alleles at this locus in any individual must equal the ploidy of that individual, thus when we consider the linear map \(\xi _L :\mathbb{R }^{n \times N} \rightarrow \mathbb{R }\) defined by summing the entries of the column of the matrix \(G = (g_{k,l}) \in \mathbb{R }^{n \times N}\), i.e.

$$\begin{aligned} \xi _L (G) = \sum _{k = 1}^n g_{k, L}, \end{aligned}$$

we see that for all \(i \in I\),

$$\begin{aligned} \xi _L ( g_i / d_i) = 1. \end{aligned}$$

By definition, since \(\xi _L :\mathbb{R }^{n \times N} \rightarrow \mathbb{R }\) is linear, this—the constant function—is a \(p\)-score. Hence we can apply (3) to see that

$$\begin{aligned} \mathbb{E }_I[ \mathrm{{agv}}( F(a) - \mathbb{E }[ F(a) | \chi ])_i]&= \mathbb{E }_I[ 1 \cdot (\mathrm{{agv}}( F(a) - \mathbb{E }[ F(a) | \chi ])_i )] \\&= \mathbb{E }_I[F_i(a_i) - \mathbb{E }[F(a) | \chi ]_i ] \\&= 0. \end{aligned}$$

With the definition of \(p \in {\fancyscript{P}}\) made in the statement, we now apply the Price equation (15) to obtain

$$\begin{aligned}&\int _X ( {\hat{\pi }} (x) - \pi (x)) \, \tau (dx)\\&\quad = \int _I (\mathrm{{agv}}( F(a) - \mathbb{E }[F(a) | \chi ])_i)^2 \, {\tilde{\mu }}_I(di) \\&\quad = \int _I \left( \mathrm{{agv}}( F(a) - \mathbb{E }[F(a) | \chi ])_i - \mathbb{E }_I[ \mathrm{{agv}}(F(a) - \mathbb{E }[F(a) | \chi ])_i] \right) ^{2} \, {\tilde{\mu }}_I (di), \end{aligned}$$

as required. \(\square \)

7 Interpretation, significance, and context of our results

Theorems 2–5 are the four links between gene frequency change and optimization, parallel to the four links proved in previous work (Grafen 2002, 2006a). Theorem 2 states that if all individuals in the population solve the optimization programme, then the expected change in every gene frequency equals zero, and no phenotype, if caused by a rare dominant mutant, would cause that mutant to spread. The strengths of this result are that it applies to all gene frequencies, whether they affect any given trait or not, and whether they affect fitness or not; that it therefore applies to all weighted sums of allele frequencies and so to the additive genetic value of every quantitative trait; that fitness is a property of the individual and in particular is the same whichever gene frequency is being considered; that it applies to a class-structured population, with evolutionarily appropriate class-weights that are used both to aggregate the mean \(p\)-score across classes when considering the change in mean \(p\)-score, and to provide class-weights for offspring in the evaluation of the fitness of an individual; and that the class-weights are the same no matter which class the parent belongs to. Notable features of the result are that it shows that the expected change in gene frequencies equals zero, but in any given state of nature the gene frequencies may indeed change; that it says expected gene frequencies do not change, but this does not imply that expected genotype frequencies do not change; and that once gene or genotype frequencies have changed from one generation to the next, there is no guarantee, or even reason to expect, that the class-weights will remain the same, so that the way of evaluating fitness is quite likely to change from one period to the next.

Theorem 3 is a diminished form of the previous theorem. It supposes that each individual attains the same value of the maximand within its class, but that not all individuals solve the optimization programme; in that case, while no gene frequency changes in expectation, there is a phenotype which, if produced by a rare dominant mutant, would spread in expectation. Theorem 4 supposes individuals attain different values of the maximand, and merely asserts that the expected change in each gene frequency equals its covariance with fitness. This is a restatement of the Price equation, and the purpose of presenting it in this form is that the previous two theorems would hold if the maximand were replaced with a monotonic increasing function of the maximand, whereas this theorem holds only for the expected fitness itself (up to addition of a constant). So including this theorem ensures that the links presented do constrain as strongly as possible the nature of the maximand of an optimization programme that can take part in the set of links. The final link is Theorem 5, which reverses the direction of inference, and is in one way the most important. It states that if there is no expected change in gene frequency, and if there is no phenotype which, if produced by a rare dominant mutant, would cause that mutant to spread, then each individual solves the optimization programme: it is the only theorem to proceed from gene frequencies to optimization, and is the central result that contributes to establishing the reasonableness of the adaptationist viewpoint, namely that we should expect organisms to be optimally adapted. Of course, there are many questions, which cannot be considered here, about the extent to which this result does justify adaptationism, but the very general nature of the whole framework, and the particularities of how ‘adaptiveness’ and in particular fitness have to be defined, represent major advances in our understanding of adaptationism and its connection to genetics.

Theorem 6 is a generalization of the Fundamental Theorem of Natural Selection of Fisher (1930), except that we remain in discrete time and do not pass to a continuous time limit as Fisher did. The current status of the fundamental theorem in the literature is discussed by Okasha (2008) and Ewens (2011), and the best previous technical treatment is due to Lessard (1997). These modern authors have discussed what the theorem states, and whether it is true, but the consensus has been that no-one knows whether it has biological significance. Its inclusion here is intended to settle that question in the affirmative. The formal darwinism project is the first attempt to construct a formal justification of individual fitness maximization ideas since the fundamental theorem, and this paper shows that a wide range of abstract and perhaps arcane concepts have to be introduced to do the job properly. It turns out that the fundamental theorem is readily expressed and proved in terms of those concepts, in a much more explicit way than Fisher was able to prove it, and in a more general way than more recent derivations.

Integrating our work with the important literature on the fundamental theorem (for example Price 1972b; Ewens 1989; Frank and Slatkin 1992; Edwards 1994; Lessard 1997; Okasha 2008; Ewens 2011) must remain a task for the future. Here we indicate three significant contributions of the current work to understanding the theorem. First, Fisher has been criticized because his verbal statement of the theorem (‘the rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time’) differs from his technical statement. However, a statement that the mean change in \(X\) equals the variance in \(X\) is obviously closely tied to the idea of \(X\) being maximized. Our version is of such a form, and it is not surprising that Fisher would wish the most accessible form of his theorem to display what he clearly believed to be its more important implication. Further, the exact sense in which the theorem provides a maximand, in light of the technical qualifications of the theorem, is precisely the subject of Theorems 2–5. Second, the reason for the discrepancy between verbal and technical statements is that so much in Fisher’s argument has not been made explicit. We have made the whole argument explicit, and in view of the length and complexity of the argument, it is not surprising that Fisher did not do so, perhaps from a combination of inability (tracking classes and reproductive values and gene frequencies is notationally cumbersome and hard work; further, most of our mathematical references are from later than 1930) and unwillingness (the book was for a lay audience, and it may be doubted how much of our current argument would have been appreciated even by contemporary scientific readers, of Fisher 1941, for example). Third, some obstacles to understanding are now explained and removed. 
For example, Price (1972b)’s first technical move is to assert that the left hand side of the theorem is not in fact the change in fitness, but only that part of the change in fitness due to changes in gene frequencies. The meaning of this qualification has played a major role in discussions of the meaning and significance of the theorem (Okasha 2008; Ewens 2011): our version shows that this qualification is made because, while fitness itself does not admit a result of the form ‘change in mean of \(X\) equals variance in \(X\)’, the additive genetic value of fitness does. This settles in a very precise technical way what the qualification means, and provides a wholly understandable reason for Fisher to make it.

We regard the mathematical framework of this paper as a set of ideas about which Fisher had strong and correct intuitions, but which he managed to articulate only in small part, and about which he drew a most significant conclusion: that population genetics could exhibit the design-making nature of Darwinian natural selection, demonstrate that it was constantly at work in a very general setting, and make precise what quantity was the appropriate measure of goodness of design. Fisher rightly regarded the fundamental theorem of natural selection as providing the fundamental link between Darwin’s argument that natural selection brought about adaptation, on the one hand, and population genetics, on the other.

On the dust-jacket of the 1999 variorum edition of Fisher’s book, W.D. Hamilton writes “In some ways some of us have overtaken Fisher; in many, however, this brilliant, daring man is still far in front”. In showing exactly how the fundamental theorem relates to fitness-maximization, and that the full argument is even today at the boundaries of mathematical biology, we have taken a significant step towards “catching up with Fisher”.

8 An example with an age-structured population

We present an example to show our various theorems at work, and choose the classic case of an age-structured population for which Fisher first defined ‘reproductive value’ and proved his fundamental theorem. The results of the current paper would have allowed us to have a sexual population and to study sex ratio simultaneously with survival–fertility tradeoff, thus uniting Fisher’s original uses of reproductive value. However, for simplicity, and for the historical interest of exhibiting implications of the original form of the fundamental theorem, we have reserved that more advanced example for the future. In particular, we note that versions of the theorem that omit age structure, and so have no need to involve reproductive value, miss out a fundamental feature of the fundamental theorem.

We suppose \(I\) is a finite population, and our classes are \(K+1\) age classes for some \(K \in \mathbb{N }\), comprising ages \(0\) to \(K\). We assume the population to be of constant ploidy one, and to be asexual. We shall consider the set of local environments to be the set of possible amounts of some resource available to each individual without competition: thus \(R = [0, \infty )\), and being in environment \(r\) is interpreted as having \(r\) resources available. These resources are to be invested entirely in reproduction and/or survival. We shall assume each individual \(i \in I\) requires \(b_i\) resources to produce each offspring, where \(b_i \in (0, \infty )\). We shall identify the phenotype of an individual with the choice the individual makes of how best to spend her resources, which, since we demand that resources are exhausted between reproduction and survival, is the choice of how to distribute the resources between offspring production and attempted survival. Chance events shall represent how many of the produced offspring of an individual will survive to the next census point, when they will be of age \(0\), and whether an individual herself will survive to the next age class. Thus our phenotype space \(A\) is given by \([0, \infty ) \times [0,\infty )\), the first coordinate representing the number of offspring the individual attempts to produce, each costing \(b_i\) resources, and the second the amount of resources the individual invests in survival.

We thus have a function \(r :I \times \varOmega \rightarrow [0, \infty )\) determining how much resource individual \(i \in I\) finds available in the state of nature \(\omega \in \varOmega \). The set of admissible phenotypes for an individual \(i\) is then the set of functions \(q :[0, \infty ) \rightarrow [0, \infty ) \times [0, \infty )\) which must satisfy, writing \(q= ( q_1, q_2)\), the relations

$$\begin{aligned}&\displaystyle b_i q_1 (r) + q_2 (r) = r, \ \mathrm{{and}} \end{aligned}$$
(20)
$$\begin{aligned}&\displaystyle q_2(r) =0 \quad \mathrm{{for\,all}}\ r \in [0, \infty )\ \mathrm{{if}}\ \chi _i = K. \end{aligned}$$
(21)

The second condition states that no individual attempts to survive beyond age \(K\).
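
The two admissibility conditions can be made concrete with a small sketch. It assumes a hypothetical one-parameter family of strategies that spend a fixed share of the available resources on offspring; the names `proportional_strategy`, `is_admissible_on` and `offspring_share` are ours. Admissibility is a statement about all \(r \in [0, \infty )\), so the check below, which inspects finitely many resource levels, is only a spot check.

```python
from fractions import Fraction


def proportional_strategy(b_i, age, K, offspring_share):
    """Build q(r) = (q_1(r), q_2(r)) spending a fixed share of r on offspring.

    Individuals of the oldest age K spend everything on offspring, so that
    condition (21) holds. This family of strategies is illustrative only.
    """
    share = Fraction(1) if age == K else Fraction(offspring_share)

    def q(r):
        q1 = share * Fraction(r) / b_i   # number of offspring attempted
        q2 = Fraction(r) - b_i * q1      # remainder is invested in survival
        return q1, q2

    return q


def is_admissible_on(q, b_i, age, K, resource_levels):
    """Spot-check conditions (20) and (21) at the given resource levels."""
    for r in resource_levels:
        q1, q2 = q(r)
        if q1 < 0 or q2 < 0:
            return False
        if b_i * q1 + q2 != r:           # budget constraint (20)
            return False
        if age == K and q2 != 0:         # no survival attempt at age K (21)
            return False
    return True


levels = [0, 1, 8]
q = proportional_strategy(Fraction(2), age=1, K=3, offspring_share=Fraction(1, 4))
ok = is_admissible_on(q, Fraction(2), age=1, K=3, resource_levels=levels)

# A strategy that invests everything in survival is inadmissible at age K.
hoarder = lambda r: (Fraction(0), Fraction(r))
bad = is_admissible_on(hoarder, Fraction(2), age=3, K=3, resource_levels=levels)
```

Exact rational arithmetic is used so that the budget constraint can be tested with equality rather than with a floating-point tolerance.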

The chance events of the state of nature \(\omega \) influencing the individual \(i\) are then represented by \(u :I \times \varOmega \rightarrow ((\mathbb N \cup \{0\})^{[0, \infty )} \times \{0, 1\}^{[0, \infty )})\), the first term telling us, for each \(x \in [0, \infty )\), how many surviving offspring result when the individual attempts to produce \(x\) offspring, and the second term telling us, for each \(y \in [0, \infty )\), whether the parent individual survives (\(1\)) or dies (\(0\)) given that she devotes \(y\) resources to survival. It is reasonable to assume that the function \((u_i^{\omega })_2 :[0, \infty ) \rightarrow \{0, 1\}\) is monotone increasing: the more effort an individual puts into survival, the more likely she is to survive; and that \((u_i^{\omega })_2(0) = 0\): individuals do not survive if they do not try to, so in particular no individual survives beyond age \(K\).

Offspring are then produced as follows:

$$\begin{aligned} {\tilde{w}}_i^{\omega }(a_i) = (u_i^{\omega })_1((a_i)_1 (r_i^{\omega })) \delta _0 + (u_i^{\omega })_2((a_i)_2(r_i^{\omega })) \delta _{\chi _i + 1}. \end{aligned}$$
(22)

Define \(\alpha _i^{\omega } {:}= (u_i^{\omega })_1((a_i)_1 (r_i^{\omega })) \), the number of surviving offspring, and \(\beta _i^{\omega } {:}= (u_i^{\omega })_2((a_i)_2(r_i^{\omega }))\), which determines survival of the individual.

The offspring distribution over ages may then be captured by considering the conditional expectations \(\alpha ^{\omega }(k) {:}= \mathbb{E }[ \alpha _i^{\omega } | \chi _i = k]\) and \(\beta ^{\omega }(k) {:}= \mathbb{E } [ \beta _i^{\omega } | \chi _i = k]\). Supposing the parental age distribution to be given by the vector \(\mathbf{v} = (v_0, \ldots , v_K)^T\), the offspring distribution, in state of nature \(\omega \), is given by the vector \(\mathbf{w} = (w_0, \ldots , w_K)^T\) where

$$\begin{aligned} w_0&= \sum _{k=0}^{K}\alpha ^{\omega }(k) v_k,\ \mathrm{{and}}\end{aligned}$$
(23)
$$\begin{aligned} w_k&= \beta ^{\omega }(k-1) v_{k-1}\quad \mathrm{{for}}\ 1 \le k \le K. \end{aligned}$$
(24)

We assume, as in the general argument, that these coefficients are independent of \(\omega \). In particular, then, \(\beta ^{\omega }(k)\) is independent of \(\omega \) for each \(k \ge 0\), and thus may be written as \(\beta (k)\). Furthermore, by linearity, the situation can be captured by considering the coefficients \(\alpha (k) {:}= \mathbb{E }^{\varOmega }[\alpha ^{\omega }(k)]\) and writing

$$\begin{aligned} L = \begin{pmatrix} \alpha (0) &{} \alpha (1) &{} \cdots &{} \cdots &{} \alpha (K) \\ \beta (0) &{} 0 &{} \cdots &{} \cdots &{} 0 \\ 0 &{} \beta (1) &{} \cdots &{} \cdots &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} &{} \vdots \\ 0 &{} \ldots &{} 0 &{} \beta (K-1) &{} 0 \end{pmatrix} \end{aligned}$$
(25)

and noting that

$$\begin{aligned} L \mathbf{v} = \mathbf{w}. \end{aligned}$$
(26)
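
As a concrete illustration of (25) and (26), the following sketch builds a small Leslie matrix and projects a parental age distribution one step forward. The fecundities, survival probabilities and parental numbers are invented for illustration, and the helper names `leslie_matrix` and `project` are ours.

```python
from fractions import Fraction


def leslie_matrix(alpha, beta):
    """Build the (K+1) x (K+1) matrix of (25).

    alpha: expected fecundities alpha(0), ..., alpha(K)
    beta:  survival probabilities beta(0), ..., beta(K-1)
    """
    n = len(alpha)
    assert len(beta) == n - 1
    L = [[Fraction(0)] * n for _ in range(n)]
    L[0] = [Fraction(a) for a in alpha]   # first row: fecundities
    for k, b in enumerate(beta):
        L[k + 1][k] = Fraction(b)         # subdiagonal: survival
    return L


def project(L, v):
    """Return w = L v as in (26)."""
    return [sum(l_jk * v_k for l_jk, v_k in zip(row, v)) for row in L]


# Toy example with K = 2 (three age classes).
L = leslie_matrix(alpha=[0, 2, 1], beta=[Fraction(1, 2), Fraction(1, 2)])
w = project(L, [100, 40, 20])
```

Here the newborn class receives \(2 \cdot 40 + 1 \cdot 20 = 100\) individuals, and each older class receives half of the class below it, so that \(\mathbf{w} = (100, 50, 20)^T\).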

\(L\) is then the so-called Leslie matrix associated with demographic processes (Leslie 1945, 1948; Lewis 1942). The left eigenvector of \(L\) associated with its dominant eigenvalue then gives us the per-capita reproductive value in the sense of Fisher, an observation consistent with the more general assertions in Grafen (2006b) that per-capita reproductive value is found as an eigenvector of the adjoint of the forward process transition operator. We therefore denote such a vector \({\varvec{\tau }} = (\tau _0, \ldots , \tau _K)^T\). Our definition of the fitness of an individual playing strategy \(q :[0, \infty ) \rightarrow [0, \infty ) \times [0, \infty )\) then amounts to

$$\begin{aligned} F_i (q) = \mathbb{E }^{\varOmega }[ (u_i^{\omega })_1(q_1(r_i^{\omega })) \tau _0 + (u_i^{\omega })_2 (q_2 (r_i^{\omega })) \tau _{\chi _i + 1}] , \end{aligned}$$

in particular

$$\begin{aligned} F_i (a_i) = \mathbb{E }^{\varOmega }[\alpha _i^{\omega } \tau _0 + \beta _i^{\omega } \tau _{\chi _i + 1}]. \end{aligned}$$

We define \(\alpha _i {:}= \mathbb{E }^{\varOmega }[ \alpha _i^{\omega }]\) and \(\beta _i {:}= \mathbb{E }^{\varOmega }[\beta _i^{\omega }]\). Thus

$$\begin{aligned} F_i (a_i) = \alpha _i \tau _0 + \beta _i \tau _{\chi _ i + 1}. \end{aligned}$$
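
To see these quantities at work, the following sketch approximates the reproductive values \({\varvec{\tau }}\) as a left eigenvector of a toy Leslie matrix and then evaluates the fitness formula above. It is a minimal sketch under stated assumptions: the matrix entries are invented, power iteration is our choice of numerical method and relies on the matrix being primitive, and the normalisation \(\tau _0 = 1\) is a convention adopted here for convenience.

```python
def reproductive_values(L, iterations=500):
    """Approximate the dominant left eigenvector of L by power iteration.

    Returns tau normalised so that tau[0] == 1. Assumes L is primitive,
    so that the iteration converges (an assumption of this sketch).
    """
    n = len(L)
    tau = [1.0] * n
    for _ in range(iterations):
        # Row vector times matrix: (tau L)_k = sum_j tau_j * L[j][k].
        new = [sum(tau[j] * L[j][k] for j in range(n)) for k in range(n)]
        total = sum(new)
        tau = [x / total for x in new]
    return [x / tau[0] for x in tau]


def fitness(alpha_i, beta_i, age, tau):
    """F_i(a_i) = alpha_i * tau_0 + beta_i * tau_{age + 1}."""
    # At the oldest age K we have beta_i = 0, so no value tau_{K+1} is needed.
    tau_next = tau[age + 1] if age + 1 < len(tau) else 0.0
    return alpha_i * tau[0] + beta_i * tau_next


# Toy Leslie matrix with K = 2, chosen so that the dominant eigenvalue is 1.
L = [
    [0.0, 1.0, 2.0],  # alpha(0), alpha(1), alpha(2)
    [0.5, 0.0, 0.0],  # beta(0)
    [0.0, 0.5, 0.0],  # beta(1)
]
tau = reproductive_values(L)
F = fitness(alpha_i=1.0, beta_i=0.5, age=1, tau=tau)
```

For this matrix the iteration converges to \({\varvec{\tau }} = (1, 2, 2)^T\), and an individual of age \(1\) with \(\alpha _i = 1\) and \(\beta _i = 1/2\) has \(F_i(a_i) = 1 \cdot 1 + \tfrac{1}{2} \cdot 2 = 2\).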

The situation is then analyzed by considering when this number is at a maximum. Individuals of age \(K\) cannot choose to survive, thus there is no choice of phenotype available to them; so we concentrate on individuals in age classes \(k\) for \(0 \le k < K\). Fix such an individual \(i \in I\). Then

$$\begin{aligned} F_i^{\omega }(q)&= (u_i^{\omega })_1(q_1 (r_i^{\omega })) \tau _0 + (u_i^{\omega })_2(q_2 (r_i^{\omega }) )\tau _{\chi _i +1} \\&= (u_i^{\omega })_1(q_1(r_i^{\omega })) \tau _0 + (u_i^{\omega })_2(r_i^{\omega } - b_iq_1(r_i^{\omega }) )\tau _{\chi _i +1}. \end{aligned}$$

For an admissible strategy \(q\), define \(A_i^{\omega }(q) {:}= (u_i^{\omega })_1(q_1 (r_i^{\omega }))\) and \(B_i^{\omega }(q) {:}= (u_i^{\omega })_2(q_2 (r_i^{\omega }) )\). Taking expectations over states of nature we see that

$$\begin{aligned} F_i(q)&= \mathbb{E }^{\varOmega }\left[ (u_i^{\omega })_1(q_1 (r_i^{\omega })) \tau _0 + (u_i^{\omega })_2(r_i^{\omega } - b_iq_1(r_i^{\omega }) )\tau _{\chi _i +1}\right] \\&= \int _{\{\omega \in \varOmega : B_i^\omega (q) = 0\}} A_i^{\omega }(q) \tau _0 \, \nu ( d \omega )+ \int _{\{ \omega \in \varOmega : B_i^{\omega } (q) = 1\}} (A_i^{\omega }(q) \tau _0 + \tau _{\chi _i + 1} )\, \nu (d \omega )\\&= \tau _0 \int _{\varOmega } A_i^{\omega }(q) \, \nu (d \omega ) + \tau _{\chi _i + 1} \nu ( \{ \omega \in \varOmega : B_i^{\omega } (q) = 1\}). \end{aligned}$$

The first summand is the expected number of offspring weighted by their reproductive value; the second is the probability of survival, weighted by reproductive value of an individual in the next age class.
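
The decomposition can be verified directly when there are only finitely many states of nature. In the sketch below each state is a hypothetical triple of probability, number of surviving offspring \(A\), and survival indicator \(B\); the function name `expected_fitness` and all numerical values are illustrative assumptions.

```python
from fractions import Fraction


def expected_fitness(states, tau0, tau_next):
    """Compute F_i(q) in two ways over a finite set of states of nature.

    states: list of (probability, A, B) with B in {0, 1}.
    Returns (direct expectation, decomposed form).
    """
    # Direct expectation of A * tau_0 + B * tau_{chi_i + 1}.
    direct = sum(p * (A * tau0 + B * tau_next) for p, A, B in states)
    # Decomposition: tau_0 * E[A] + tau_{chi_i + 1} * P(B = 1).
    expected_offspring = sum(p * A for p, A, B in states)
    survival_probability = sum(p for p, A, B in states if B == 1)
    decomposed = tau0 * expected_offspring + tau_next * survival_probability
    return direct, decomposed


states = [
    (Fraction(1, 2), 2, 1),  # two offspring, parent survives
    (Fraction(1, 4), 1, 0),  # one offspring, parent dies
    (Fraction(1, 4), 0, 1),  # no offspring, parent survives
]
direct, decomposed = expected_fitness(states, tau0=1, tau_next=2)
```

Both computations give \(11/4\): the expected number of offspring is \(5/4\), weighted by \(\tau _0 = 1\), and the survival probability is \(3/4\), weighted by \(\tau _{\chi _i + 1} = 2\).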

Let a strategy \(q\) be fixed, and consider another strategy \({\tilde{q}} = ({\tilde{q}}_1, {\tilde{q}}_2)\). Recall that the choice to be made is the value of \(q_1(r_i^{\omega }) \in [0, b_i^{-1}r_i^{\omega }]\). Suppose \({\tilde{q}}_1 < q_1\), i.e. playing the strategy \({\tilde{q}}\) means having fewer offspring. Since \((u_i^{\omega })_2\) is monotone increasing, and \(B_i^{\omega } (q) = (u_i^{\omega })_2 ( r_i^{\omega } - b_iq_1(r_i^{\omega }))\), we then have that \(\nu (\{ B_i^{\omega }(q) = 1\}) \le \nu ( \{ B_i^{\omega }({\tilde{q}}) = 1\})\). So \(F_i({\tilde{q}}) \ge F_i( q)\) if and only if

$$\begin{aligned} \tau _0 \int _{\varOmega } A_i^{\omega }({\tilde{q}}) \, \nu (d \omega ) + \nu ( \{ B_i^{\omega }({\tilde{q}})&= 1\}) \tau _{\chi _i + 1} \ge \tau _0 \int _{\varOmega } A_i^{\omega }(q) \, \nu (d \omega ) \\&+ \nu ( \{ B_i^{\omega }(q) = 1\}) \tau _{\chi _i + 1}, \end{aligned}$$

equivalently

$$\begin{aligned} (\nu ( \{ B_i^{\omega }(\tilde{q}) = 1\}) - \nu ( \{ B_i^{\omega }(q) = 1\}))\tau _{\chi _i + 1} \ge \tau _0 \int _{\varOmega }( A_i^{\omega }(q) - A_i^{\omega }(\tilde{q})) \, \nu (d \omega ). \end{aligned}$$

That is, for \(\tilde{q}\) to be at least as good a strategy as \(q\), the increase in the chance of survival, weighted by the reproductive value of a surviving individual, must be at least as large as the reduction in the expected number of surviving offspring, weighted by the reproductive value of offspring. Of course, if having more offspring leads to a lower expected number of surviving offspring (e.g. if then limited resources are shared between too many infants), then this is trivial.
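
The comparison can be worked through in a toy model. The sketch below assumes finitely many states of nature and deliberately simple chance events: a capped, rounded-down offspring count standing in for \((u_i^{\omega })_1\), and a survival threshold standing in for \((u_i^{\omega })_2\), which is monotone increasing and vanishes at zero whenever the threshold is positive. These functional forms, the names `outcomes` and `compare`, and all numbers are our assumptions and do not come from the general theory.

```python
from fractions import Fraction
from math import floor


def outcomes(q1, b_i, states):
    """Return (E[A], P(B = 1)) for the offspring rule q1 over finite states.

    states: list of (probability, resources r, offspring cap, survival threshold).
    """
    expected_A = Fraction(0)
    survival = Fraction(0)
    for prob, r, cap, threshold in states:
        attempted = q1(r)
        invested = r - b_i * attempted      # q_2(r) from the budget constraint
        A = min(floor(attempted), cap)      # toy stand-in for (u_i)_1
        B = 1 if invested >= threshold else 0   # toy stand-in for (u_i)_2
        expected_A += prob * A
        survival += prob * B
    return expected_A, survival


def compare(q1_tilde, q1, b_i, states, tau0, tau_next):
    """Return both sides of the inequality and whether q_tilde is at least as good."""
    EA_tilde, S_tilde = outcomes(q1_tilde, b_i, states)
    EA, S = outcomes(q1, b_i, states)
    lhs = (S_tilde - S) * tau_next          # gain in survival, weighted
    rhs = tau0 * (EA - EA_tilde)            # loss in offspring, weighted
    return lhs, rhs, lhs >= rhs


half = Fraction(1, 2)
states = [
    (half, Fraction(4), 4, Fraction(2)),    # survival needs 2 units of resource
    (half, Fraction(4), 4, Fraction(3)),    # survival needs 3 units of resource
]
q1 = lambda r: Fraction(3, 4) * r           # attempt 3 offspring when r = 4
q1_tilde = lambda r: Fraction(1, 2) * r     # attempt 2 offspring when r = 4
lhs, rhs, tilde_at_least_as_good = compare(
    q1_tilde, q1, Fraction(1), states, tau0=1, tau_next=3
)
```

With these numbers the strategy \(\tilde{q}\) gives up one expected offspring, worth \(\tau _0 = 1\), in exchange for a survival probability of \(1/2\), worth \(\tfrac{1}{2} \cdot 3 = \tfrac{3}{2}\), so the inequality holds and \(F_i(\tilde{q}) = 7/2 \ge F_i(q) = 3\).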

Assuming pairwise exchangeability, this then rigorously justifies our intuition for this example. Theorems 2 and 5 together assert that the population is at an evolutionary equilibrium if and only if every individual maximizes the value \(F_i(a_i)\). This number is the reproductive value-weighted sum of the expected number of offspring and the survival probability of the individual. The above analysis of the maximand then shows that evolutionary equilibrium is attained precisely when each individual reacts to each environment in such a way that the expected contribution to the following generation, in terms of reproductive value, is at a maximum.

This is a simple but fitting example with which to end. Fisher proved his fundamental theorem in this setting, and concluded that in general we should expect individuals to maximize their fitness. Here we explicitly articulate that optimization argument in an example, and note that the fundamental theorem shows how survival and reproduction trade off in the calculation of fitness, through the use of reproductive value. Notice that the individual is regarded as making only the decision relevant to its current age as opposed to all life-decisions simultaneously, and that the fecundity–survival trade-off operates through varying the chance of survival to the next age period. The value of surviving is obtained through knowing the reproductive value of an individual in the next age class. Thus reproductive value in an age-structured population is the expected future reproductive value of an individual of a given age. If we had extended the set of classes to include condition, incorporating body weight and health, at each age, then we could have modelled more complex tradeoffs in which producing more offspring reduced one’s condition, and in which reproductive value would presumably be an increasing function of condition within each age class.

Our general results extend Fisher’s theorem by permitting an arbitrary class-structure, explicitly incorporating uncertainty, allowing each class to have its own ploidy and, in particular, by fully articulating the meaning of fitness-maximization. Future generalizations may further permit social behaviour, time to be continuous as well as discrete, and random variation in class-to-class projection at the population level along with the demographic stochasticity that implies.