1 Introduction

In a series of papers motivated by models of structured metapopulations (Levins 1969; Hanski and Gilpin 1991) and parasitic disease transmission (Kretzschmar 1993), the authors have extended Kurtz’s (1970, 1971) theory to provide laws of large numbers and central limit theorems for Markov population processes with countably many types of individual, together with estimates of the approximation errors: see Barbour and Luczak (2008, 2012a, b). These theorems provide a good description of the overall behaviour of such processes, when the population size is large. However, as observed by Léonard (1990), many ecological models, when seen from the perspective of the individuals themselves, can be interpreted as interacting particle systems. It is then of interest to be able to describe the behaviour of (small groups of) individuals within the large system. Under very stringent assumptions on the transition rates, in particular requiring that they be uniformly bounded, Léonard proves a ‘propagation of chaos’ theorem, showing that individuals evolve almost independently of one another, as Markov processes whose transition rates are determined by the bulk behaviour of the system.

In this paper, we establish an analogous result for systems with countably many types, under much less restrictive conditions. We formulate a model that is general enough to encompass many host-parasite systems and structured metapopulation models. The main tool used in showing the asymptotic independence of individuals in such processes is a coupling of the process describing the evolution of individuals in the original system with one in which they evolve independently. The coupling is constructed by matching the transition rates in the two processes, and the argument is described in Sect. 2.

In order to show that the coupling is close, we rely on the quantitative law of large numbers proved in Barbour and Luczak (2012a). The conditions needed for the law of large numbers have already been shown to be satisfied for a number of examples from the literature, including the models of Arrigoni (2003), Barbour and Kafetzaki (1993), Kretzschmar (1993) and Luchsinger (2001a, b). However, some work is required to find explicit conditions based on the parameters of our general model under which the law holds; this is accomplished in Sect. 3. The paper concludes with examples taken from Metz and Gyllenberg (2001) and from Kretzschmar (1993).

2 Main results

We begin by formulating our models in a way which explicitly reflects their origins in metapopulation and parasitic disease modelling. The basic description is in terms of the numbers of patches of each of a countable number of types. The type of a patch is determined by the numbers of animals of each of \(d\) different varieties present in the patch, indexed by \({\mathbf {i}}= (i_1,\ldots ,i_d) \in \mathbb {Z}_+^d\). For instance, a patch may represent a host, and its type the numbers of parasites of various different species that it harbours. However, an animal’s variety may also indicate its developmental stage, or its infection status, so that its variety may change over its lifetime. We also define \(d\) further types, to account for animals of the different varieties that are in transit between patches. Thus the possible patch types are indexed by \(\mathcal{{Z}}:= \mathcal{{Z}}_1 \cup \mathcal{{Z}}_2\), where \(\mathcal{{Z}}_1 = \mathbb {Z}_+^d\) and \(\mathcal{{Z}}_2 =\{1,\ldots ,d\}\). In these terms, the state space is expressed as \(\mathcal{{X}}:= \{X \in \mathbb {Z}_+^{\mathcal{{Z}}},\,\sum _{z\in \mathcal{{Z}}}X_z < \infty \}\). The interpretation is that \(X_{\mathbf {i}}\) records the number of patches of type \({\mathbf {i}}\), \({\mathbf {i}}\in \mathcal{{Z}}_1\), whereas \(X_{l}\), \(1\le l\le d\), denotes the number of migrating animals of variety \(l\). The restriction \(\sum _{z\in \mathcal{{Z}}}X_z < \infty \) in the definition of \(\mathcal{{X}}\) constrains total numbers of patches and animals to be finite. Our model for the evolution of the metapopulation consists of a family \(X^N := (X^N(t),\,t\ge 0)\) of pure jump Markov processes on \(\mathcal{{X}}\), indexed by \(N \in {\mathbb {N}}\), with \(N\) to be thought of as a typical number of patches in the process \(X^N\). 
Writing \(e(z)\) for the \(z\)-coordinate vector in \(\mathbb {R}_+^{\mathcal{{Z}}}\), \(z\in \mathcal{{Z}}\), and \(e_{l}\) for the \(l\)-th coordinate vector in \(\mathbb {Z}^d\), the transition rates for \(X^N\) are assumed to be given by

$$\begin{aligned} \begin{array}{llllll} \mathrm{I}:&{} \ X &{} \rightarrow &{}X + e({\mathbf { j}}) - e({\mathbf {i}}) &{}\quad \text{ at } \text{ rate }\quad X_{\mathbf {i}}\{{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x)\}, &{}\quad {\mathbf {i}},{\mathbf { j}}\in \mathcal{{Z}}_1; \\ \mathrm{II}:&{} \ X &{} \rightarrow &{}X + e({\mathbf {i}}) &{}\quad \text{ at } \text{ rate }\quad N\beta _{\mathbf {i}}(x), &{}\quad {\mathbf {i}}\in \mathcal{{Z}}_1; \\ \mathrm{III}:&{} \ X &{} \rightarrow &{}X - e({\mathbf {i}}) &{}\quad \text{ at } \text{ rate }\quad X_{\mathbf {i}}\{{\bar{\delta }}_{{\mathbf {i}}} + \delta _{{\mathbf {i}}}(x)\}, &{}\quad {\mathbf {i}}\in \mathcal{{Z}}_1; \\ \mathrm{IV}:&{} \ X &{} \rightarrow &{}X + e(l) + e({\mathbf {i}}- e_l) - e({\mathbf {i}}) &{}\quad \text{ at } \text{ rate }\quad X_{\mathbf {i}}\{{\bar{\gamma }}_{{\mathbf {i}}l} + \gamma _{{\mathbf {i}}l}(x)\}, &{}\quad {\mathbf {i}}\in \mathcal{{Z}}_1,\,1\le l\le d; \\ \mathrm{IV'}:&{} \ X &{} \rightarrow &{}X + e(l) &{}\quad \text{ at } \text{ rate }\quad {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}X_{\mathbf { j}}\{{\bar{\gamma }}'_{{\mathbf { j}}l} + \gamma '_{{\mathbf { j}}l}(x)\}, &{}\quad 1\le l\le d; \\ \mathrm{V}:&{} \ X &{} \rightarrow &{}X + e({\mathbf {i}}+e_l) - e({\mathbf {i}}) - e(l) &{}\quad \text{ at } \text{ rate }\quad X_{l} x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x), &{}\quad {\mathbf {i}}\in \mathcal{{Z}}_1,\,1 \le l \le d; \\ \mathrm{VI}:&{} \ X &{} \rightarrow &{}X - e(l) &{}\quad \text{ at } \text{ rate }\quad X_{l}\{\bar{\zeta }_l + \zeta _l(x)\}, &{}\quad 1\le l\le d, \end{array} \end{aligned}$$

where \(x := N^{-1}X \in \{x' \!\in \! \mathbb {R}_+^{\mathcal{{Z}}},\,\Vert x'\Vert _1 < \infty \} =: \mathcal{{X}}'\), and \(\Vert x\Vert _1 := \sum _{z\in \mathcal{{Z}}}x_z\).

The transitions I correspond to changes in the type of a patch, because of births, deaths and changes of status involving animals within the patch, or as a result of infection or catastrophe, or of immigration from outside the metapopulation, and we set \({\bar{\lambda }}_{{\mathbf {i}}{\mathbf {i}}} = \lambda _{{\mathbf {i}}{\mathbf {i}}}(\cdot ) = 0\), \({\mathbf {i}}\in \mathcal{{Z}}_1\). Then II and III correspond to the creation and destruction of patches, IV and V concern the migration of animals of the different varieties between patches, and VI the deaths of animals during migration. The transitions IV\('\) allow for the possibility of an individual being born as a migrant, as is allowed in our first example, in Sect. 4. More complicated transitions of this kind could have been incorporated, but the biological motivation for doing so does not seem compelling. The parameters \({\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}}\), \({\bar{\delta }}_{\mathbf {i}}\), \({\bar{\gamma }}_{{\mathbf {i}}l}\), \({\bar{\gamma }}'_{{\mathbf {i}}l}\) and \(\bar{\zeta }_l\) represent fixed rates of transition per patch. To ensure that the overall rate of jumps is finite at any \(x \in N^{-1}\mathcal{{X}}\), it is necessary to have \({\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} < \infty \) for all \({\mathbf {i}}\in \mathcal{{Z}}_1\). The corresponding quantities without the bars, together with \(\sigma _{l{\mathbf {i}}}(\cdot )\) and \(\beta _{\mathbf {i}}(\cdot )\), represent state dependent components of the transition rates. For each \(x \in \mathcal{{X}}'\), it is then also necessary to have

$$\begin{aligned} {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\lambda _{{\mathbf {i}}{\mathbf { j}}}(x) \ <\ \infty , \quad {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\beta _{\mathbf { j}}(x) \ <\ \infty \quad \text{ and }\quad {\sum _{{\mathbf {i}}\in \mathcal{{Z}}_1}}x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x) \ <\ \infty ; \end{aligned}$$
(2.1)

further assumptions are added in Sect. 3. In transition IV, we require \({\bar{\gamma }}_{{\mathbf {i}}l} = \gamma _{{\mathbf {i}}l}(x) = 0\) whenever \(i_l = 0\), to avoid ever having \(i_l < 0\), which would be biologically meaningless.
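
The transitions I–VI can be made concrete with a small simulation. The following is a minimal sketch for \(d = 1\), assuming constant per capita rates, creation of empty patches only, and no transitions IV\('\); the function name `gillespie_step` and the keys of `par` are hypothetical choices made for illustration and are not part of the model above.

```python
import random


def gillespie_step(patches, migrants, N, par, rng):
    """One jump of a d = 1 version of transitions I-VI (IV' switched off).

    patches:  dict mapping patch type i (animals in the patch) to the count X_i.
    migrants: number of animals currently in transit.
    par:      dict of hypothetical constant rates with keys
              birth, death, destroy, mig, settle, create, mig_death.
    Returns (waiting_time, new_patches, new_migrants), or None when the
    total jump rate is zero.
    """
    events = []  # pairs (rate, (old_type, new_type, change_in_migrants))
    for i, count in patches.items():
        events += [
            (count * par["birth"] * i, (i, i + 1, 0)),   # I: birth in a patch
            (count * par["death"] * i, (i, i - 1, 0)),   # I: death in a patch
            (count * par["destroy"], (i, None, 0)),      # III: patch destroyed
            (count * par["mig"] * i, (i, i - 1, 1)),     # IV: animal departs
            # V: a migrant settles, at rate X_l * x_i * sigma with x_i = X_i / N
            (migrants * (count / N) * par["settle"], (i, i + 1, -1)),
        ]
    # II: assumption made here: only empty patches (type 0) are created
    events.append((N * par["create"], (None, 0, 0)))
    # VI: death of a migrant
    events.append((migrants * par["mig_death"], (None, None, -1)))

    events = [(rate, move) for rate, move in events if rate > 0.0]
    if not events:
        return None
    weights = [rate for rate, _ in events]
    wait = rng.expovariate(sum(weights))
    _, (old, new, d_mig) = rng.choices(events, weights=weights)[0]

    new_patches = dict(patches)
    if old is not None:
        new_patches[old] -= 1
        if new_patches[old] == 0:
            del new_patches[old]
    if new is not None:
        new_patches[new] = new_patches.get(new, 0) + 1
    return wait, new_patches, migrants + d_mig
```

With only the migration rates switched on, each step moves one animal between a patch and the migrant pool, so the total numbers of animals and of patches stay constant, which gives a quick sanity check on the bookkeeping.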

Let \(T > 0\) be a constant; we study the evolution of the metapopulation over the interval \([0,T]\). Under further assumptions on the transition rates I–VI and on the initial condition \(x^N(0)\), it can be shown that, with high probability, \(x^N(t)\) is uniformly close to the solution \(x\) of a deterministic integral equation, which is the analogue of the usual deterministic drift differential equations found in finite dimensional problems. In Sect. 3, we illustrate how to use the results of Barbour and Luczak (2012a) to justify this. For the rest of this section, we assume that

$$\begin{aligned} {\mathbb {P}}\left[ \sup _{0\le t\le T}\Vert x^N(t) - x(t)\Vert _\mu > \varepsilon _N\right] \ \le \ P_T(N,\varepsilon _N), \end{aligned}$$
(2.2)

for some (small) \(\varepsilon _N\) and \(P_T(N,\varepsilon _N)\), and for some norm \(\Vert \cdot \Vert _\mu \), and show how (2.2) can be used to establish the joint behaviour of groups of individuals in the process \(X^N\).

We begin by investigating the behaviour over time of the type of a single patch \({\mathcal {P}}\). The transitions I, IV and V each contain elements corresponding to the rate of change of type of a patch that is currently of type \({\mathbf {i}}\), with the rates depending on the current state of the whole system, and the death rate of such a patch is given in III. Thus we can single out the transition rates for the patch \({\mathcal {P}}\), with its evolution only being Markovian if the current state \(x\) of the whole system is adjoined. For any \({\mathbf {i}},{\mathbf { j}}\in \mathcal{{Z}}_1\) and \(1\le l\le d\), these take the form

$$\begin{aligned} \begin{array}{lllll} {\mathbf {i}}&{} \rightarrow &{}{\mathbf { j}}&{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x), &{}\quad \Vert {\mathbf { j}}-{\mathbf {i}}\Vert _1 \ge 2; \\ {\mathbf {i}}&{} \rightarrow &{}{\mathbf { j}}&{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x) + {\bar{\gamma }}_{{\mathbf {i}}l} + \gamma _{{\mathbf {i}}l}(x); &{}\quad {\mathbf { j}}= {\mathbf {i}}- e_l \\ {\mathbf {i}}&{} \rightarrow &{}{\mathbf { j}}&{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x) + x_{l}\sigma _{l{\mathbf {i}}}(x); &{}\quad {\mathbf { j}}= {\mathbf {i}}+ e_l \\ {\mathbf {i}}&{} \rightarrow &{}\Delta &{}\quad \text{ at } \text{ rate }\quad {\bar{\delta }}_{\mathbf {i}}+ \delta _{\mathbf {i}}(x), \end{array} \end{aligned}$$
(2.3)

with \(\Delta \) a state to represent that the patch has been destroyed. We let \(Y_N\) denote the process describing the time evolution of the type assigned to \({\mathcal {P}}\), with \(Y_N(t)\) taking values in \(\mathcal{{Z}}_1 \cup \Delta \); the \(N\)-dependence reflects that its transition rates are as described in (2.3), but with \(x^N(t)\) in place of \(x\) for the rates at time \(t\).
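
As an illustration of (2.3), the rates for a single patch can be tabulated explicitly in a simple special case. The sketch below takes \(d = 1\) and assumes constant per capita birth, death and migration rates, a constant settlement rate and a constant destruction rate; the function `patch_rates` and its parameter names are hypothetical.

```python
def patch_rates(i, x_mig, phi_birth, phi_death, phi_mig, sigma, delta):
    """Jump rates as in (2.3) for one patch holding i animals, with d = 1.

    x_mig is the current density x_l of migrants; the remaining arguments
    are hypothetical constant rates chosen for illustration.
    Returns a dict mapping the new patch type (or "destroyed") to its rate.
    """
    rates = {}
    # i -> i + 1: a birth in the patch, or the arrival of a migrant.
    rates[i + 1] = phi_birth * i + x_mig * sigma
    if i > 0:
        # i -> i - 1: a death in the patch, or the departure of a migrant.
        rates[i - 1] = (phi_death + phi_mig) * i
    # i -> Delta: destruction of the patch.
    rates["destroyed"] = delta
    return rates
```

Calling the function with the migrant density taken from \(x^N(t)\) gives the rates of \(Y_N\) at time \(t\) in this special case; the only state dependent term is the arrival rate of migrants.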

Analogously, we could define a process representing the life history of an animal \({\mathcal {A}}\) in the metapopulation. The migration transitions IV, V and VI are easy to interpret, and the destruction of a patch in III implies the death of any animals in that patch. The transitions I are more complicated. Considering an animal of variety \(l\), its death is typically recorded in a transition in which \(j_l \le i_l-1\) (several animals of the same variety may die as a result of the same event), but a change of developmental stage, for instance, may also result in \(j_l = i_l - 1\). Then, for unicellular animals, division is recorded most simply as \(j_l = i_l + 1\), though it may be useful to interpret the same event as the death of the original animal at the same time as the birth of two offspring. Furthermore, transitions in which \(i_l\) does not change may represent births of animals that are directly associated with the particular animal of variety \(l\) being considered, as when an adult gives birth to juveniles that are represented as a distinct variety; such events are naturally to be recorded in a life history. This suggests defining a life history process \(Z_N := \{(Z_{N0}(t),\ldots ,Z_{Nd}(t)),\, t \ge 0\}\) for an animal \({\mathcal {A}}\), whose state space is

$$\begin{aligned} ((\mathcal{{Z}}_1 \times \{1,2,\ldots ,d\}) \cup \{1,2,\ldots ,d\} \cup \Delta ) \times \mathbb {Z}_+^d. \end{aligned}$$

A value \(Z_{N0}(t) \in \mathcal{{Z}}_1 \times \{1,2,\ldots ,d\}\) denotes the type of patch in which \({\mathcal {A}}\) is living and its current variety. Then \(Z_{N0}(t) = l\) if \({\mathcal {A}}\) is of variety \(l\) and in migration, and, if \(Z_{N0}(t) = \Delta \), the animal \({\mathcal {A}}\) has died before time \(t\). The values \(Z_{Nl}(t)\), \(1\le l\le d\), record the numbers of children of the different varieties to which \({\mathcal {A}}\) has given birth up to time \(t\). For \({\mathbf {i}}\in \mathcal{{Z}}_1\), \(l,l'\in \{1,2,\ldots ,d\}\) and \(m,s\in \mathbb {Z}_+^d\), the transition rates can be represented in the form

$$\begin{aligned} \begin{array}{llll} &{}(({\mathbf {i}},l),m) \rightarrow (({\mathbf {i}}+s,l),m+s) &{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}^{(1)}_{{\mathbf {i}}ls} + \lambda ^{(1)}_{{\mathbf {i}}ls}(x) ; \\ &{}(({\mathbf {i}},l),m) \rightarrow (({\mathbf { j}},l),m) &{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}^{(2)}_{{\mathbf {i}}{\mathbf { j}}} + \lambda ^{(2)}_{{\mathbf {i}}{\mathbf { j}}}(x) ; \\ &{}(({\mathbf {i}},l),m) \rightarrow (({\mathbf {i}}-e_l + e_{l'},l'),m) &{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}^{(3)}_{{\mathbf {i}}ll'} + \lambda ^{(3)}_{{\mathbf {i}}ll'}(x) ; \\ &{}(({\mathbf {i}},l),m) \rightarrow (({\mathbf {i}},l),m+e_{l'}) &{}\quad \text{ at } \text{ rate }\quad {\bar{\lambda }}^{(4)}_{{\mathbf {i}}ll'} + \lambda ^{(4)}_{{\mathbf {i}}ll'}(x) ; \\ &{}(({\mathbf {i}},l),m) \rightarrow (\Delta ,m) &{}\quad \text{ at } \text{ rate }\quad {\bar{\delta }}'_{{\mathbf {i}}l} + \delta '_{{\mathbf {i}}l}(x) ; \\ &{}(({\mathbf {i}},l),m) \rightarrow (l,m) &{}\quad \text{ at } \text{ rate }\quad i_l^{-1}\{{\bar{\gamma }}_{{\mathbf {i}}l} + \gamma _{{\mathbf {i}}l}(x)\}; \\ &{}(l,m) \rightarrow (({\mathbf {i}}+e_l,l),m) &{}\quad \text{ at } \text{ rate }\quad x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x); \\ &{}(l,m) \rightarrow (\Delta ,m) &{}\quad \text{ at } \text{ rate }\quad \bar{\zeta }_l + \zeta _l(x). \end{array} \end{aligned}$$
(2.4)

Here, the quantities \({\bar{\lambda }}^{(1)}_{{\mathbf {i}}ls}\) and \(\lambda ^{(1)}_{{\mathbf {i}}ls}(x)\) represent the rates at which, in a type \({\mathbf {i}}\) patch, an animal of variety \(l\) produces offspring in the composition \(s\), and they would form a part of the rates \({\bar{\lambda }}_{{\mathbf {i}},{\mathbf {i}}+s}\) and \(\lambda _{{\mathbf {i}},{\mathbf {i}}+s}(x)\); they are assumed not to depend on \(m\). Similar considerations apply to the quantities \({\bar{\lambda }}^{(2)}_{{\mathbf {i}}{\mathbf { j}}}\) and \(\lambda ^{(2)}_{{\mathbf {i}}{\mathbf { j}}}(x)\), which relate to events changing the composition of the patch containing \({\mathcal {A}}\) that do not result in offspring for \({\mathcal {A}}\) or a change in its variety, including migration of other animals from the patch or the arrival of migrants. Thus, for instance, one might have \({\bar{\lambda }}_{{\mathbf {i}},{\mathbf {i}}+e_l} = \varphi _{1l} i_l\), \({\bar{\lambda }}_{{\mathbf {i}},{\mathbf {i}}-e_l} = \varphi _{2l} i_l\), \({\bar{\gamma }}_{{\mathbf {i}}l} = i_l\varphi _{3l}\) and \(\sigma _{l{\mathbf {i}}}(x) = \sigma _{l{\mathbf {i}}}\), \(1\le l\le d\), corresponding to constant per capita birth, death, migration and immigration rates \(\varphi _{1l}\), \(\varphi _{2l}\), \(\varphi _{3l}\) and \(\sigma _{l{\mathbf {i}}}\) of individuals of variety \(l\). 
These would imply \({\bar{\lambda }}^{(1)}_{{\mathbf {i}}le_l} = \varphi _{1l}\), \({\bar{\lambda }}^{(2)}_{{\mathbf {i}},{\mathbf {i}}+e_l} = (i_l-1)\varphi _{1l}\), \(\lambda ^{(2)}_{{\mathbf {i}},{\mathbf {i}}+e_l}(x) = x_l\sigma _{l{\mathbf {i}}}\), and \({\bar{\lambda }}^{(2)}_{{\mathbf {i}},{\mathbf {i}}-e_l} = (i_l-1)(\varphi _{2l} + \varphi _{3l})\) for transitions only involving \(l\)-animals, and, for \(l' \ne l\), \({\bar{\lambda }}^{(2)}_{{\mathbf {i}},{\mathbf {i}}+e_{l'}} = i_{l'}\varphi _{1l'}\), \(\lambda ^{(2)}_{{\mathbf {i}},{\mathbf {i}}+e_{l'}}(x) = x_{l'}\sigma _{l'{\mathbf {i}}}\), and \({\bar{\lambda }}^{(2)}_{{\mathbf {i}},{\mathbf {i}}-e_{l'}} = i_{l'}(\varphi _{2l'} + \varphi _{3l'})\). The transition rates \({\bar{\lambda }}^{(3)}_{{\mathbf {i}}ll'}\) and \(\lambda ^{(3)}_{{\mathbf {i}}ll'}(x)\) relate to events that change \({\mathcal {A}}\)’s variety from \(l\) to \(l'\); it is tacitly assumed that no other changes take place when this happens, but more general possibilities could have been allowed. The rates \({\bar{\lambda }}^{(4)}_{{\mathbf {i}}ll'}\) and \(\lambda ^{(4)}_{{\mathbf {i}}ll'}(x)\) relate to births of migrants as offspring of an \(l\)-animal. The rates \({\bar{\delta }}'_{{\mathbf {i}}l} \ge {\bar{\delta }}_{\mathbf {i}}\) and \(\delta '_{{\mathbf {i}}l}(x) \ge \delta _{\mathbf {i}}(x)\) include a contribution from the mortality rate of an animal of variety \(l\) in a patch of type \({\mathbf {i}}\), in addition to the rate of destruction of the patch itself. As for the single patch dynamics, the rates for the process \(Z_N\) at time \(t\) are obtained by replacing \(x\) with \(x^N(t)\) in the expressions (2.4).

These constructions immediately suggest approximating the processes \(Y_N\) and \(Z_N\) by random processes \(Y\) and \(Z\), in which the transition rates at time \(t\) are obtained by replacing \(x\) by \(x(t)\) in (2.3) and (2.4). Consider first the processes \(Y_N\) and \(Y\). Suppose, for some \(\delta > 0\), that the functions \(\lambda _{{\mathbf {i}}{\mathbf { j}}}\), \(\gamma _{{\mathbf {i}}l}\), \(\sigma _{l{\mathbf {i}}}\) and \(\delta _{\mathbf {i}}\) are all of uniformly bounded Lipschitz \(\mu \)-norm, for \(x\) in a set \(B_{T,\delta } := \{x \in \mathcal{{X}}':\,\inf _{0\le t\le T}\Vert x-x(t)\Vert _\mu \le \delta \}\) of points close to the deterministic trajectory \((x(t),\,0\le t\le T)\). Then, in view of (2.3), the jump rates of \(Y_N\) and \(Y\) at any time \(t\in [0,T]\) differ only by a small amount, on the event that \(\sup _{0\le t\le T}\Vert x^N(t) - x(t)\Vert _\mu \le \varepsilon _N\), provided that \(N\) is large enough that \(\varepsilon _N \le \delta \). Indeed, defining \(f^* := \sup _{x \in B_{T,\delta }}|f(x)|\) for any \(f:\,\mathcal{{X}}\rightarrow \mathbb {R}\), and setting

$$\begin{aligned} |Df|(x)\ :=\ \limsup _{\varepsilon \rightarrow 0}\sup _{0 < \Vert y-x\Vert _\mu < \varepsilon }\{|f(y) - f(x)| / \Vert y-x\Vert _\mu \}, \end{aligned}$$

it follows that, if \(\Vert x-x(t)\Vert _\mu \le \varepsilon < \delta \) and \(0\le t\le T\), then the sum of the differences of the transition rates out of \(x\) and \(x(t)\) is bounded by

$$\begin{aligned}&\sup _{{\mathbf {i}}\in \mathcal{{Z}}_1} \left\{ \sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}|\lambda _{{\mathbf {i}}{\mathbf { j}}}(x) - \lambda _{{\mathbf {i}}{\mathbf { j}}}(x(t))| + \sum _{l=1}^d|\gamma _{{\mathbf {i}}l}(x) - \gamma _{{\mathbf {i}}l}(x(t))| \right. \\&\qquad \left. + \sum _{l=1}^d|x_l\sigma _{l{\mathbf {i}}}(x) - x_l(t)\sigma _{l{\mathbf {i}}}(x(t))| + |\delta _{\mathbf {i}}(x) - \delta _{\mathbf {i}}(x(t))|\right\} \ \le \ \varepsilon D_Y(T,\delta ), \end{aligned}$$

where, writing \(\hat{\sigma }_{l{\mathbf {i}}}(x) := x_l\sigma _{l{\mathbf {i}}}(x)\), we define

$$\begin{aligned} D_Y(T,\delta ) {:=}\sup _{{\mathbf {i}}\in \mathcal{{Z}}_1} \left\{ \sum _{{\mathbf { j}}\in \mathcal{{Z}}_1} |D\lambda _{{\mathbf {i}}{\mathbf { j}}}|^* + \sum _{l=1}^d\{|D\gamma _{{\mathbf {i}}l}|^* + |D\hat{\sigma }_{l{\mathbf {i}}}|^*\} + |D\delta _{\mathbf {i}}|^* \right\} . \end{aligned}$$

Thus, until the time at which first \(\Vert x^N(t) - x(t)\Vert _\mu > \varepsilon _N\), the aggregate difference between the jump rates of the processes \(Y_N\) and \(Y\) is bounded by \(\varepsilon _N D_Y(T,\delta )\), if also \(t \le T\). This immediately leads to the following theorem.

Theorem 2.1

Suppose that (2.2) holds, and that \(D_Y(T,\delta ) < \infty \) for some \(\delta > 0\). Then, if \(Y_N(0) = Y(0)\) and \(\varepsilon _N \le \delta \), the processes \(Y_N\) and \(Y\) can be constructed on the same probability space in such a way that

$$\begin{aligned} {\mathbb {P}}[Y_N(t) = Y(t)\quad \text{ for } \text{ all }\ 0\le t\le T]\ \ge \ 1 - \{ T \varepsilon _N D_Y(T,\delta ) + P_T(N,\varepsilon _N)\}. \end{aligned}$$

Proof

Let \(Y_1\) and \(Y_2\) be time-inhomogeneous Markov processes on a countable state space \(\mathcal{{Y}}\), with transition rates \(q_1(t,y,y')\) and \(q_2(t,y,y')\) respectively. Starting with \(Y_1(0) = Y_2(0) = y_0\), the processes can be coupled by representing them as the marginals of a joint process \(((Y_1(t),Y_2(t)),\,t\ge 0)\), whose transition rates at points on the diagonal are given by

$$\begin{aligned} q(t,(y,y),(y',y'))&:= \min \{q_1(t,y,y'),q_2(t,y,y')\};\\ q(t,(y,y),(y,y'))&:= \{q_2(t,y,y') - q_1(t,y,y')\}_+;\\ q(t,(y,y),(y',y))&:= \{q_1(t,y,y') - q_2(t,y,y')\}_+, \end{aligned}$$

and with the components evolving independently when off the diagonal. Let \(\tau := \inf \{t\ge 0:\, Y_1(t) \ne Y_2(t)\}\), and let \(E_t^\eta \) denote the event \(\{Q(s,Y_1(s)) \le \eta \text{ for } \text{ all } 0\le s\le t\}\), where

$$\begin{aligned} Q(t,y) {:=}\sum _{y'\in \mathcal{{Y}}} |q_2(t,y,y') - q_1(t,y,y')|. \end{aligned}$$

Then the one-jump process \((I[\{\tau \le t\} \cap E_t^\eta ],\,t\ge 0)\) has compensator

$$\begin{aligned} A_t {:=}\int _0^{t\wedge \tau } Q(s,Y_1(s)) I[E_s^\eta ]\,ds \ \le \ \eta t. \end{aligned}$$

This implies that, for any \(T > 0\),

$$\begin{aligned} {\mathbb {P}}[\{\tau \le T\} \cap E_T^\eta ] \ =\ {\mathbb {E}}\{I[\{\tau \le T\} \cap E_T^\eta ]\} \ =\ {\mathbb {E}}A_T \ \le \ \eta T, \end{aligned}$$

from which it follows that \({\mathbb {P}}[\tau \le T] \le \eta T + {\mathbb {P}}[(E_T^\eta )^c]\). Thus this construction realizes \(Y_1\) and \(Y_2\) on the same probability space, in such a way that the two remain identical up to time \(T\) with probability at least \(1 - (\eta T + {\mathbb {P}}[(E_T^\eta )^c])\).

Now, taking \(Y_N\) for \(Y_1\) and \(Y\) for \(Y_2\), and setting \(\eta = \varepsilon _N D_Y(T,\delta )\), the theorem follows from (2.2). \(\square \)
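
The coupling used in the proof can be written out directly for the jump rates at a single time and a single diagonal state. The following is a minimal sketch, assuming that the rates out of the common state \(y\) are supplied as dictionaries keyed by the target state \(y'\); the function names `coupled_rates` and `decoupling_rate` are hypothetical.

```python
def coupled_rates(q1, q2):
    """Joint jump rates of the coupled pair from a diagonal state (y, y).

    q1, q2: dicts mapping a target state y' to the jump rate y -> y'
    of each marginal process at the current time.
    Returns three dicts keyed by y': the rate at which both components
    jump to y', at which only the second does, and at which only the
    first does.
    """
    both, only_second, only_first = {}, {}, {}
    for target in set(q1) | set(q2):
        a, b = q1.get(target, 0.0), q2.get(target, 0.0)
        both[target] = min(a, b)
        only_second[target] = max(b - a, 0.0)
        only_first[target] = max(a - b, 0.0)
    return both, only_second, only_first


def decoupling_rate(q1, q2):
    """Total rate of leaving the diagonal, equal to the sum of |q2 - q1|."""
    _, only_second, only_first = coupled_rates(q1, q2)
    return sum(only_second.values()) + sum(only_first.values())
```

Adding the joint rate to the corresponding one-sided rate recovers each marginal rate, and the total rate of leaving the diagonal is the quantity \(Q(t,y)\) that is integrated in the compensator \(A_t\).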

Since all the transitions in (2.3) involve a single patch, the theorem generalizes easily to any group of \(K\) patches. The transition rates for the process \((Y_N^{[1]},Y_N^{[2]},\ldots ,Y_N^{[K]})\) at time \(t\) from a state \(({\mathbf {i}}^{(1)},\ldots ,{\mathbf {i}}{^{(K)}})\) to one in which \({\mathbf {i}}{^{(k)}}\) is replaced by \({\mathbf {i}}{^{(k')}}\), with \({\mathbf {i}}{^{(k')}}\) either of the form \({\mathbf {i}}{^{(k)}}+ {\mathbf { j}}\), \({\mathbf { j}}\in \mathbb {Z}^d\), or \(\Delta \), are given by the formulae in (2.3) with \({\mathbf {i}}{^{(k)}}\) for \({\mathbf {i}}\), and with \(x^N(t)\) for \(x\). The rates for a vector of independent processes \(Y^{[k]}\), \(1\le k\le K\), each distributed as \(Y\), with \(Y^{[k]}(0) = {\mathbf {i}}{^{(k)}}\), are the corresponding rates with \(x(t)\) for \(x\). This leads to the following corollary.

Corollary 2.2

Under the conditions of Theorem 2.1,

$$\begin{aligned} {\mathbb {P}}[(Y_N^{[1]}(t),\ldots ,Y_N^{[K]}(t))&= (Y^{[1]}(t),\ldots ,Y^{[K]}(t))\quad \text{ for } \text{ all }\ 0\le t\le T]\\&\ge \ 1 - \{ K T \varepsilon _N D_Y(T,\delta ) + P_T(N,\varepsilon _N)\}. \end{aligned}$$

Thus the joint distribution of \(K_N\) patches is asymptotically close to that of \(K_N\) independently evolving patches over any fixed interval \([0,T]\), as \(N\rightarrow \infty \), if \(K_N \varepsilon _N \rightarrow 0\), \(P_T(N,\varepsilon _N)\rightarrow 0\) and \(D_Y(T,\delta ) < \infty \) for some \(\delta > 0\).
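
The factor \(K\) in Corollary 2.2 arises because each jump changes the type of only one of the \(K\) patches. As a sketch, write \(Q^{[k]}(t,{\mathbf {i}})\) for the sum over target states of the absolute differences between the rates in (2.3) evaluated at \(x^N(t)\) and at \(x(t)\), for a patch of type \({\mathbf {i}}\); this notation is introduced here only for illustration. Then, on the event \(\sup _{0\le t\le T}\Vert x^N(t) - x(t)\Vert _\mu \le \varepsilon _N\), the total discrepancy between the jump rates of the two \(K\)-vector processes out of a common state \(({\mathbf {i}}^{(1)},\ldots ,{\mathbf {i}}^{(K)})\) satisfies

$$\begin{aligned} \sum _{k=1}^K Q^{[k]}(t,{\mathbf {i}}^{(k)}) \ \le \ K \varepsilon _N D_Y(T,\delta ), \qquad 0\le t\le T, \end{aligned}$$

so that the coupling argument in the proof of Theorem 2.1 applies with \(\eta = K\varepsilon _N D_Y(T,\delta )\).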

For the life history process of an animal, the argument for a single individual is very similar. We consider the differences in the transition rates (2.4) with arguments \(x^N(t)\) and \(x(t)\); defining

$$\begin{aligned} D_Z(T,\delta )&:= \max _{1\le l\le d}\left( \sup _{{\mathbf {i}}\in \mathcal{{Z}}_1} \left\{ \sum _{s\in \mathbb {Z}_+^d} |D\lambda ^{(1)}_{{\mathbf {i}}ls}|^* + {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}|D\lambda ^{(2)}_{{\mathbf {i}}{\mathbf { j}}}|^* + \sum _{l'=1}^d |D\lambda ^{(3)}_{{\mathbf {i}}ll'}|^* \right. \right. \\&\quad + \left. \left. \sum _{l'=1}^d |D\lambda ^{(4)}_{{\mathbf {i}}ll'}|^* + |D\delta '_{{\mathbf {i}}l}|^* + |D\gamma _{{\mathbf {i}}l}|^* + |D\hat{\sigma }'_{l{\mathbf {i}}}|^* \right\} + |D\zeta _l|^* \right) , \end{aligned}$$

where \(\hat{\sigma }'_{l{\mathbf {i}}}(x) := x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x)\), this gives the following result.

Theorem 2.3

Suppose that (2.2) holds, and that \(D_Z(T,\delta ) < \infty \) for some \(\delta > 0\). Then, if \(\varepsilon _N \le \delta \) and \(Z_N(0) = Z(0)\), the processes \(Z_N\) and \(Z\) can be constructed on the same probability space in such a way that

$$\begin{aligned} {\mathbb {P}}[Z_N(t) = Z(t)\quad \text{ for } \text{ all }\ 0\le t\le T]\ \ge \ 1 - \{T \varepsilon _N D_Z(T,\delta ) + P_T(N,\varepsilon _N)\}. \end{aligned}$$

For the joint distribution of a group of \(K\) animals, asymptotic independence is not quite as straightforward, since all but the fourth and the last transitions in (2.4) simultaneously change the state of any other animal in the same patch. Hence it is necessary to begin with all animals in different patches, and the simple coupling breaks down once two of them are to be found in the same patch. This can only occur when a migrant enters a patch that already contains another of the \(K\) animals. For a given animal of variety \(l\), an upper bound for the maximum rate at which it can enter such a patch is \(N^{-1}(K-1) \sup _{\mathbf {i}}|\sigma _{l{\mathbf {i}}}|^*\), because the \((K-1)\) other animals of the group can be in at most \(K-1\) distinct patches, and \(\sigma _{l{\mathbf {i}}}(x) \le |\sigma _{l{\mathbf {i}}}|^*\); and there are \(K\) animals that could migrate into such a patch. Hence, while \(x^N(t)\) remains in \(B_{T,\delta }\), the probability that some two of the \(K\) animals are in the same patch at some time during the interval \([0,T]\) is bounded by \(T K^2 N^{-1} \sigma ^+\), where \(\sigma ^+ := \sup _{{\mathbf {i}}\in \mathcal{{Z}}_1} \max _{1\le l\le d} |\sigma _{l{\mathbf {i}}}|^*\). This leads to the following corollary.

Corollary 2.4

Suppose that (2.2) holds, and that \(D_Z(T,\delta ) < \infty \) for some \(\delta > 0\). Then, if \(\varepsilon _N \le \delta \) and the \(K\) individuals are initially all in distinct patches, we have

$$\begin{aligned} {\mathbb {P}}[(Z_N^{[1]}(t),\ldots ,Z_N^{[K]}(t))&= (Z^{[1]}(t),\ldots ,Z^{[K]}(t))\quad \text{ for } \text{ all }\ 0\le t\le T]\\&\ge \ 1 - \{ K T \varepsilon _N D_Z(T,\delta ) + T K^2 N^{-1} \sigma ^+ + P_T(N,\varepsilon _N)\} , \end{aligned}$$

where the \(Z^{[k]}\), \(1\le k\le K\), are independent copies of \(Z\) with \(Z^{[k]}(0) = Z_N^{[k]}(0)\).

Thus, if (2.2) holds and \(D_Z(T,\delta ) < \infty \) for some \(\delta > 0\), any group of \(K_N\) animals that are initially in different patches behaves asymptotically as a group of independent individuals, under the same asymptotic scenario as before, if also \(N^{-1}K_N^2 \rightarrow 0\) as \(N \rightarrow \infty \).
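
The term \(T K^2 N^{-1}\sigma ^+\) in Corollary 2.4 can be checked by a short calculation. As a sketch, let \(\tau _c\) (a symbol introduced here only for illustration) denote the first time at which two of the \(K\) animals occupy the same patch. While \(x^N(t)\) remains in \(B_{T,\delta }\), each of the \(K\) animals enters a patch occupied by one of the others at rate at most \(N^{-1}(K-1)\sigma ^+\), so that, arguing with the compensator as in the proof of Theorem 2.1,

$$\begin{aligned} {\mathbb {P}}\left[ \{\tau _c \le T\} \cap \left\{ \sup _{0\le t\le T}\Vert x^N(t) - x(t)\Vert _\mu \le \varepsilon _N\right\} \right] \ \le \ T\, K\, N^{-1}(K-1)\sigma ^+ \ \le \ T K^2 N^{-1}\sigma ^+ . \end{aligned}$$

This bound tends to zero for a growing group of \(K_N\) animals exactly when \(N^{-1}K_N^2 \rightarrow 0\).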

The model in Arrigoni (2003) does not conform to our general prescription, because migration is assumed to take place instantaneously, rather than by way of an intermediate migration state. However, the state dependent elements of its transition rates are locally uniformly Lipschitz, and (2.2) holds, so that analogous theorems hold for this model as well. We do not include instantaneous migration in our general formulation, partly because it seems unrealistic, but mainly because, for the methods in Barbour and Luczak (2012a) to be applied, only rather restrictive choices can be allowed for the migration transitions. For instance, in the Arrigoni model, it is important that the migration rate \({\bar{\gamma }}_{i}\) out of patches with \(i\) individuals is given by \({\bar{\gamma }}_{i} = \gamma i\); variants in which \(i^{-1}{\bar{\gamma }}_i\) increases with \(i\) would not lead to a locally Lipschitz drift \(F\) in (3.14) below.

3 Establishing the law of large numbers

We now need to prove that (2.2) holds. For this, we need to find conditions on the transition rates in I–VI that allow us to apply the results of Barbour and Luczak (2012a) to the process \(X^N\). First, we need to make some small modifications to the setting in the previous section. We start by augmenting the type space \(\mathcal{{Z}}\) to \({\widetilde{\mathcal{{Z}}}}\), by substituting \({\widetilde{\mathcal{{Z}}}}_2 := \{1,2,\ldots ,d\} \times \{0,1\}\) for \(\mathcal{{Z}}_2\), where the type \((l,1)\) replaces the previous type \(l\) in representing an individual of variety \(l\) in migration, and type \((l,0)\) is to be thought of as an unused place available for a migrant of variety \(l\). Then, in transitions IV and IV\('\), \(e(l)\) is replaced by \(e(l,1) - e(l,0)\) and, in transitions V and VI, \(-e(l)\) is replaced by \(e(l,0) - e(l,1)\) and \(X_l\) by \(X_{l1}\). The number \(X_{l0}\) of patches of type \(e(l,0)\) can be deduced from the number \(X_{l1}\) of \(e(l,1)\) patches, since the sum \(X_{l1} + X_{l0}\) remains constant in all transitions, and is therefore always the same as its initial value. However, to prevent the number of type \((l,0)\) patches becoming negative, the process \(X^N\) has to be stopped at the time \(\tau _{0,N} := \inf \{t\ge 0:\, \min _{1\le l\le d} X_{l0}^N(t) = 0\}\). So that this has little effect on the process, \(X^N(0)\) is chosen with \(X^N_{l0}(0) \ge Nh_l\), \(1\le l\le d\), with the \(h_l\) so large that, for fixed \(T\), the event \(\{\tau _{0,N} \le T\}\) has asymptotically small probability as \(N\rightarrow \infty \). The reason for introducing the empty migration patches will emerge shortly.

3.1 A priori bounds

We now introduce a measure \(\nu \) of the size of a patch, defining \(\nu (l,0) = \nu (l,1) := 1\) for \(1\le l\le d\), and \(\nu ({\mathbf {i}}) := \Vert {\mathbf {i}}\Vert _1+1\), one more than the number of individuals in a type \({\mathbf {i}}\) patch. More flexible choices for \(\nu \) are allowed in Barbour and Luczak (2012a), but this suffices here. It is then necessary to make assumptions ensuring that, for enough values of \(r\in \mathbb {Z}_+\), the empirical moments \(S_r(x^N(t)):= {\sum _{z \in {\widetilde{\mathcal{{Z}}}}}}\nu (z)^rx^N_z(t)\) remain bounded with high probability as \(N\) increases, if they are initially bounded. Let \(J\) denote a finite linear combination of coordinate vectors in \({\widetilde{\mathcal{{Z}}}}\). Let \({\mathcal {J}}\) denote the jumps \(J\) that appear in the transitions I–VI, with the above modification replacing \(e(l)\) by \(e(l,1) - e(l,0)\), and let the associated transition rates be denoted by \(N\alpha _J(x)\). Note that we can suppose that \(x \in \mathcal{{X}}'\), if the \(l\) coordinates in \(\mathcal{{Z}}\) are identified with the \((l,1)\) coordinates in \({\widetilde{\mathcal{{Z}}}}\), since the values \(x_{(l,0)}\) do not appear in the expressions for the transition rates I–VI. For \(J := \sum _{k=1}^K a_k e({\mathbf { j}}{^{(k)}}) \in {\mathcal {J}}\), write

$$\begin{aligned} \nu _r^+(J) {:=}\sum _{k=1}^K a_k \{\nu ({\mathbf { j}}{^{(k)}})\}^r, \end{aligned}$$
(3.1)

and, for \(r\in \mathbb {Z}_+\), define

$$\begin{aligned} U_r(x) {:=}\sum _{J\in {\mathcal {J}}}\alpha _J(x)\nu _r^+(J); \quad V_r(x) {:=}\sum _{J\in {\mathcal {J}}}\alpha _J(x)\{\nu _r^+(J)\}^2. \end{aligned}$$
(3.2)
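To make the bookkeeping in (3.1) and (3.2) concrete, the following Python sketch evaluates \(\nu _r^+(J)\), \(S_r(x)\), \(U_r\) and \(V_r\) for a single variety. The encoding is an assumption made only for illustration: an integer `i` stands for a patch with \(i\) occupants, the tuples `("mig", 1)` and `("mig", 0)` stand for the types \((l,1)\) and \((l,0)\), a jump \(J\) is a dictionary from types to coefficients \(a_k\), and the rates passed to `U_V` are hypothetical values of \(\alpha _J(x)\).

```python
def nu(z):
    """Size measure nu: i + 1 for a patch with i occupants, 1 for a migration slot."""
    return z + 1 if isinstance(z, int) else 1


def nu_plus(J, r):
    """nu_r^+(J) = sum_k a_k * nu(j^(k))**r for a jump J = {type: coefficient}."""
    return sum(a * nu(z) ** r for z, a in J.items())


def S(x, r):
    """Empirical moment S_r(x) = sum_z nu(z)**r * x_z for x = {type: density}."""
    return sum(nu(z) ** r * xz for z, xz in x.items())


def U_V(jumps, r):
    """Return (U_r, V_r) from a list of (J, alpha_J(x)) pairs, rates already evaluated."""
    U = sum(alpha * nu_plus(J, r) for J, alpha in jumps)
    V = sum(alpha * nu_plus(J, r) ** 2 for J, alpha in jumps)
    return U, V


# A birth inside a patch with 2 occupants: J = e(3) - e(2).
birth = {3: 1, 2: -1}
# An individual leaving a 2-patch to migrate, with and without the empty slot (l,0).
migrate_with_slot = {1: 1, 2: -1, ("mig", 1): 1, ("mig", 0): -1}
migrate_without_slot = {1: 1, 2: -1, ("mig", 1): 1}

print(nu_plus(birth, 1))                 # change in total size caused by a birth
print(nu_plus(migrate_with_slot, 0))     # net change in the number of occupied types
print(nu_plus(migrate_without_slot, 0))  # same jump if the slot type is omitted
```

With the slot types included, a migration jump of this kind has \(\nu _0^+(J) = 0\) and so contributes nothing to \(U_0\); without them it would contribute its full rate.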

Then, in order to be able to apply the theorems of Barbour and Luczak (2012a), we assume that, for some \(r^{(1)}\ge 1\) and for all \(0 \le r \le r^{(1)}\),

$$\begin{aligned} \sum _{J\in {\mathcal {J}}}\alpha _J(N^{-1}X)|\nu _r^+(J)| \ <\ \infty \quad \text{ for } \text{ each }\ X \in \mathcal{{X}}, \end{aligned}$$
(3.3)

and that, for suitable constants \(k_{rl}\) and all \(x\in \mathcal{{X}}'\),

$$\begin{aligned} \begin{aligned} U_0(x)&\ \le \ k_{01}S_0(x) + k_{04}; \\ U_1(x)&\ \le \ k_{11}S_1(x) + k_{14}; \\ U_r(x)&\ \le \ \{k_{r1} + k_{r2}S_0(x)\}S_r(x) + k_{r4}, \quad 2\le r\le r^{(1)}, \end{aligned} \end{aligned}$$
(3.4)

and, for some \(r^{(2)}\ge 1\),

$$\begin{aligned} \begin{aligned} V_0(x)&\ \le \ k_{03}S_1(x) + k_{05}; \\ V_r(x)&\ \le \ k_{r3}S_{p(r)}(x) + k_{r5},\quad 1\le r \le r^{(2)}, \end{aligned} \end{aligned}$$
(3.5)

are satisfied, where \(1 \le p(r) \le r^{(1)}\) for \(1\le r\le r^{(2)}\).

In our setting, satisfying the condition (3.3) is straightforward except for the transitions of the form II, since, for \(X\in \mathcal{{X}}\), only finitely many of the \(X_{\mathbf {i}}\) are non-zero; and transitions of the form II are also the only ones that make positive contributions to \(U_0(x)\). One plausible assumption, covering these and later conditions, is to require that

$$\begin{aligned} \beta _{\mathbf { j}}(x) \le c'_{\mathbf { j}}(\Vert x\Vert _1 + 1), \quad \text{ where }\quad {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}c'_{\mathbf { j}}\{\nu ({\mathbf { j}})\}^r < \infty \quad \text{ for } \text{ each }\ r\in \mathbb {Z}_+. \end{aligned}$$
(3.6)

Here, and in what follows, \(c\) and \(c'\) are used to denote generic constants. If the types \((l,0)\) had not been introduced, there would also be positive contributions of \({\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}X_{\mathbf { j}}{\bar{\gamma }}_{{\mathbf { j}}l}\) to \(U_0(x)\) from transitions IV, and the most natural assumption for the value of \({\bar{\gamma }}_{{\mathbf { j}}l}\) is \(\gamma _l j_l\), for some constant \(\gamma _l\), corresponding to a constant per capita migration rate for \(l\)-individuals. Thus \({\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}X_{\mathbf { j}}{\bar{\gamma }}_{{\mathbf { j}}l}\) would be bounded by a multiple of \(S_1(x)\), rather than by a multiple of \(S_0(x)\), and so would not have come within the scope of Barbour and Luczak (2012a). For the remaining conditions concerning \(U_r(x)\), \(r\ge 1\), it is enough to assume that, for \({\mathbf {i}}\in \mathcal{{Z}}_1\) and for all \(x\in \mathcal{{X}}'\),

$$\begin{aligned}&\displaystyle {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}({\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x))\{\nu ({\mathbf { j}}) - \nu ({\mathbf {i}})\}_+ \le c\nu ({\mathbf {i}}); \end{aligned}$$
(3.7)
$$\begin{aligned}&\displaystyle {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}({\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x))(\{\nu ({\mathbf { j}})\}^r - \{\nu ({\mathbf {i}})\}^r)_+ \le c\{\nu ({\mathbf {i}})\}^r(\Vert x\Vert _1+1), \end{aligned}$$
(3.8)

and that, for \(1\le l\le d\) and for all \(x\in \mathcal{{X}}'\),

$$\begin{aligned} \sigma _{l{\mathbf {i}}}(x) \ \le \ c;\quad {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\sigma _{l{\mathbf { j}}}(x) \ \le \ c. \end{aligned}$$
(3.9)

For the conditions concerning \(V_r(x)\), \(r\ge 0\), with \(p(r) = 2r+1\) as in Barbour and Luczak (2012a, b), we assume further that, for \({\mathbf {i}}\in \mathcal{{Z}}_1\) and \(1\le l\le d\) and for all \(x\in \mathcal{{X}}'\),

$$\begin{aligned} {\bar{\delta }}_{\mathbf {i}}+ \delta _{\mathbf {i}}(x) \ \le \ c\nu ({\mathbf {i}}),\quad {\bar{\gamma }}_{{\mathbf {i}}l} + \gamma _{{\mathbf {i}}l}(x) \ \le \ c\nu ({\mathbf {i}}) \quad \text{ and }\quad {\bar{\gamma }}'_{{\mathbf {i}}l} + \gamma '_{{\mathbf {i}}l}(x) \ \le \ c\nu ({\mathbf {i}}),\qquad \end{aligned}$$
(3.10)

and that

$$\begin{aligned} {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}({\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x))(\{\nu ({\mathbf { j}})\}^r - \{\nu ({\mathbf {i}})\}^r)^2 \ \le \ c\{\nu ({\mathbf {i}})\}^{2r+1}. \end{aligned}$$
(3.11)

3.2 The deterministic equation

The process \(x^N := N^{-1}X^N\) has infinitesimal drift \(F_0(x)\), \(x\in \mathcal{{X}}'\), whose components are formally given by

$$\begin{aligned} F_{0;{\mathbf {i}}}(x)&:= {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\{{\bar{\lambda }}_{{\mathbf { j}}{\mathbf {i}}} + \lambda _{{\mathbf { j}}{\mathbf {i}}}(x)\} - x_{\mathbf {i}}{\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\{{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} + \lambda _{{\mathbf {i}}{\mathbf { j}}}(x)\} - x_{\mathbf {i}}\{{\bar{\delta }}_{{\mathbf {i}}} + \delta _{{\mathbf {i}}}(x)\} \nonumber \\&\quad +\, \beta _{\mathbf {i}}(x) + \sum _{l=1}^dx_{{\mathbf {i}}+e_l}\{{\bar{\gamma }}_{{\mathbf {i}}+e_l,l} + \gamma _{{\mathbf {i}}+e_l,l}(x)\} - x_{\mathbf {i}}\sum _{l=1}^d\{{\bar{\gamma }}_{{\mathbf {i}}l} + \gamma _{{\mathbf {i}}l}(x)\} \nonumber \\&\quad +\, \sum _{l=1}^dx_{l1}\{x_{{\mathbf {i}}-e_l}\sigma _{l,{\mathbf {i}}-e_l}(x) - x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x)\}, \end{aligned}$$
(3.12)

for \({\mathbf {i}}\in \mathcal{{Z}}_1\), and, for \(1\le l\le d\),

$$\begin{aligned} F_{0;l1}(x)&:= {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\{{\bar{\gamma }}_{{\mathbf { j}}l} + \gamma _{{\mathbf { j}}l}(x) + {\bar{\gamma }}'_{{\mathbf { j}}l} + \gamma '_{{\mathbf { j}}l}(x)\} \nonumber \\&\quad - x_{l1} {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\sigma _{l{\mathbf { j}}}(x) - x_{l1}\{\bar{\zeta }_l + \zeta _l(x)\}; \end{aligned}$$
(3.13)

these expressions only make sense if the \({\mathbf { j}}\)-sums are all finite. The drift in the \((l,0)\) coordinate is given by \(-F_{0;l1}(x)\), but we do not use it explicitly. Thus, for \(x \in \mathcal{{X}}'\) such that \(F(x)\) exists, we can write

$$\begin{aligned} F_0(x) {:=}Ax + F(x), \end{aligned}$$
(3.14)

to be interpreted as an element of \(\mathbb {R}^{\mathcal{{Z}}}\), where

$$\begin{aligned} \begin{aligned} A_{{\mathbf {i}}{\mathbf { j}}} \,&:=\, {\bar{\lambda }}_{{\mathbf { j}}{\mathbf {i}}} + \sum _{l=1}^d\mathbf{1}_{\{{\mathbf { j}}={\mathbf {i}}+e_l\}}{\bar{\gamma }}_{{\mathbf { j}}l},\quad {\mathbf {i}}\ne {\mathbf { j}}\in \mathcal{{Z}}_1;\\ A_{{\mathbf {i}}{\mathbf {i}}} \,&:=\, - {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}} - {\bar{\delta }}_{\mathbf {i}}- \sum _{l=1}^d{\bar{\gamma }}_{{\mathbf {i}}l},\quad {\mathbf {i}}\in \mathcal{{Z}}_1;\\ A_{{\mathbf {i}}l} \,&:=\, 0, \ A_{l{\mathbf {i}}} \,:=\, {\bar{\gamma }}_{{\mathbf {i}}l} + {\bar{\gamma }}'_{{\mathbf {i}}l}, \ A_{ll} \,:=\, -\bar{\zeta }_l, \ A_{ll'} \,:=\, 0, \quad {\mathbf {i}}\in \mathcal{{Z}}_1,\,1\le l,l'\le d, \end{aligned} \end{aligned}$$
(3.15)

with \(l\) in the indices of \(A\) as shorthand for \((l,1)\); and where

$$\begin{aligned} F_{{\mathbf {i}}}(x)&:= {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\lambda _{{\mathbf { j}}{\mathbf {i}}}(x) - x_{\mathbf {i}}{\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\lambda _{{\mathbf {i}}{\mathbf { j}}}(x) + \beta _{\mathbf {i}}(x) - x_{\mathbf {i}}\delta _{{\mathbf {i}}}(x) + \sum _{l=1}^dx_{{\mathbf {i}}+e_l} \gamma _{{\mathbf {i}}+e_l,l}(x) \nonumber \\&\quad - x_{\mathbf {i}}\sum _{l=1}^d\gamma _{{\mathbf {i}}l}(x) + \sum _{l=1}^dx_{l1}\{x_{{\mathbf {i}}-e_l}\sigma _{l,{\mathbf {i}}-e_l}(x) - x_{\mathbf {i}}\sigma _{l{\mathbf {i}}}(x)\}, \end{aligned}$$
(3.16)

for \({\mathbf {i}}\in \mathcal{{Z}}_1\), and, for \(1\le l\le d\),

$$\begin{aligned} F_{l1}(x)&:= {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\{\gamma _{{\mathbf { j}}l}(x) + \gamma '_{{\mathbf { j}}l}(x)\} - x_{l1} {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}x_{\mathbf { j}}\sigma _{l{\mathbf { j}}}(x) - x_{l1} \zeta _l(x). \end{aligned}$$
(3.17)

The reason for splitting the drift as above is to treat models in which the transition rates are not bounded as \(\nu ({\mathbf {i}})\) increases—migration, birth and death rates proportional to the numbers of individuals in a patch are very natural—enabling the theory of perturbed linear operators to be applied.

We first assume that there is a real \(\mu \in [1,\infty )^\mathcal{{Z}}\) such that, for some \(w\ge 0\),

$$\begin{aligned} A^T \mu \ \le \ w\mu , \end{aligned}$$
(3.18)

and use it to define the \(\mu \)-norm

$$\begin{aligned} \Vert x\Vert _\mu {:=}\sum _{z\in \mathcal{{Z}}}\mu (z)|x_z| \quad \text{ on }\quad \mathcal{{X}}'_\mu := \{x \in \mathbb {R}^\mathcal{{Z}}:\, \Vert x\Vert _\mu < \infty \}, \end{aligned}$$
(3.19)

with \(x_l\) identified with \(x_{l1}\) as before. Note that, if (3.18) is assumed, we must have \(\sum _{z\in \mathcal{{Z}}}{\bar{\lambda }}_{{\mathbf {i}}z}\mu (z) < \infty \) for each \({\mathbf {i}}\). Then, as in Barbour and Luczak (2012a, Theorem 3.1), there exists a \(\mu \)-strongly continuous semigroup \(\{R(t),\,t\ge 0\}\) with elementwise derivative \(R'(0)=A\). Furthermore, if \(F:\mathcal{{X}}'_\mu \rightarrow \mathcal{{X}}'_\mu \) is locally \(\mu \)-Lipschitz and \(\Vert x(0)\Vert _\mu < \infty \), the integral equation

$$\begin{aligned} x(t) \ =\ R(t)x(0) + \int _0^t R(t-u)F(x(u))\,du \end{aligned}$$
(3.20)

has a unique, \(\mu \)-continuous solution on \([0,T]\) for any \(0 < T < t_{\max }\), for some \(t_{\max }\le \infty \). This \(x\) is the deterministic curve that approximates \(x^N(t)\) when \(x^N(0)\) is \(\mu \)-close enough to \(x(0)\).

From now on, we take \(\mu ({\mathbf { j}}) := \Vert {\mathbf { j}}\Vert _1 + 1\) for \({\mathbf { j}}\in \mathcal{{Z}}_1\) and \(\mu (l) := 1\) for \(1\le l\le d\). Inequality (3.18) is then satisfied if

$$\begin{aligned} {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}{\bar{\lambda }}_{{\mathbf {i}}{\mathbf { j}}}(\mu ({\mathbf { j}}) - \mu ({\mathbf {i}})) + \sum _{l=1}^d({\bar{\gamma }}_{{\mathbf {i}}-e_l,l} - {\bar{\gamma }}_{{\mathbf {i}}l})\mu ({\mathbf {i}}-e_l) + \sum _{l=1}^d{\bar{\gamma }}'_{{\mathbf {i}}l} \ \le \ w\mu ({\mathbf {i}}) \end{aligned}$$
(3.21)

for all \({\mathbf {i}}\in \mathcal{{Z}}_1\). In order then to deduce that \(F:\mathcal{{X}}'_\mu \rightarrow \mathcal{{X}}'_\mu \) is locally \(\mu \)-Lipschitz, sufficient conditions are that, for \(1\le l\le d\) and \({\mathbf {i}}\in \mathcal{{Z}}_1\), and for any \(R > 0\),

$$\begin{aligned} \begin{aligned}&\sigma _{l{\mathbf {i}}}(x), \delta _{\mathbf {i}}(x), \gamma _{{\mathbf {i}}l}(x), \gamma '_{{\mathbf {i}}l}(x), \zeta _l(x)\ \text{ and }\ {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\lambda _{{\mathbf {i}}{\mathbf { j}}}(x)\ \text{ are } \text{ uniformly } \text{ bounded, } \text{ and }\\&\delta _{\mathbf {i}}(x), \gamma _{{\mathbf {i}}l}(x), \gamma '_{{\mathbf {i}}l}(x), \sigma _{l{\mathbf {i}}}(x)\ \text{ and }\ \zeta _l(x)\ \text{ are } \mu \text{-uniformly } \text{ Lipschitz, } \text{ in }\ x \in B_R; \\&{\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}|\lambda _{{\mathbf {i}}{\mathbf { j}}}(x) - \lambda _{{\mathbf {i}}{\mathbf { j}}}(y)| \ \le \ c\Vert x-y\Vert _\mu , \quad {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}|\beta _{\mathbf { j}}(x)-\beta _{\mathbf { j}}(y)|\mu ({\mathbf { j}}) \ \le \ c\Vert x-y\Vert _\mu , \\&{\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}|\lambda _{{\mathbf {i}}{\mathbf { j}}}(x) - \lambda _{{\mathbf {i}}{\mathbf { j}}}(y)|\mu ({\mathbf { j}}) \ \le \ c\mu ({\mathbf {i}})\Vert x-y\Vert _\mu \ \ \text{ and }\ \ {\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\lambda _{{\mathbf {i}}{\mathbf { j}}}(x)\mu ({\mathbf { j}}) \ \le \ c\mu ({\mathbf {i}}),\\&\quad \text{ uniformly } \text{ in }\ x,y \in B_R, \end{aligned}\nonumber \\ \end{aligned}$$
(3.22)

for suitable constants \(c = c_R\), where \(B_R\) is the ball of radius \(R\) in \(\mathcal{{X}}'_\mu \).

3.3 The law of large numbers approximation

In order to apply the results of Barbour and Luczak (2012a), we still need to check that their Assumption 4.2 is satisfied. Part (1) is satisfied with \(r_\mu =1\), because \(\mu (z) = \nu (z)\) for all \(z\in {\widetilde{\mathcal{{Z}}}}\). For Part (2), we define \(\zeta ({\mathbf {i}}) := (\Vert {\mathbf {i}}\Vert _1+1)^{2d+5}\) for \({\mathbf {i}}\in \mathcal{{Z}}_1\) and \(\zeta (l,1) := \zeta (l,0) = 1\) for \(1\le l\le d\), and observe that then, using conditions (3.7) and (3.10), the sum

$$\begin{aligned} Z {:=}{\sum _{{\mathbf { j}}\in \mathcal{{Z}}_1}}\frac{\mu ({\mathbf { j}})(|A_{{\mathbf { j}}{\mathbf { j}}}|+1)}{\sqrt{\zeta ({\mathbf { j}})}} \ =\ O\left( \sum _{j\ge 1} j^{(d-1) + 2 - (d+5/2)}\right) < \infty . \end{aligned}$$
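The exponent count behind this bound can be checked numerically. The sketch below is an illustration under a stated assumption: it replaces the diagonal factor in each summand by the bound \(\nu ({\mathbf { j}})\) (taking the generic constant equal to 1), and it groups the types \({\mathbf { j}}\in \mathbb {Z}_+^d\) by \(n = \Vert {\mathbf { j}}\Vert _1\), of which there are \(\binom{n+d-1}{d-1}\). The function name `partial_Z` is hypothetical.

```python
from math import comb


def partial_Z(d, n_max):
    """Partial sum over ||j||_1 <= n_max of mu(j) * nu(j) / sqrt(zeta(j)).

    Assumes mu(j) = nu(j) = n + 1 and zeta(j) = (n + 1)**(2d + 5) with n = ||j||_1,
    and uses nu(j) as a stand-in for the bounded diagonal factor.
    """
    total = 0.0
    for n in range(n_max + 1):
        count = comb(n + d - 1, d - 1)  # number of j in Z_+^d with ||j||_1 = n
        total += count * (n + 1) ** 2 / (n + 1) ** (d + 2.5)
    return total


for n_max in (10**2, 10**3, 10**4):
    print(n_max, partial_Z(2, n_max))
```

The summands decay like \(n^{-3/2}\), so the printed partial sums should stabilise as `n_max` grows.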

This implies that Barbour and Luczak (2012a, Assumption 4.2(2)) is satisfied, provided that \(\zeta \) satisfies Barbour and Luczak (2012a, Assumption (2.25)). Defining \(f(J) := \sum _{k=1}^K |a_k| \zeta ({\mathbf { j}}{^{(k)}})\) when \(J := \sum _{k=1}^K a_k e({\mathbf { j}}{^{(k)}})\), this in turn requires that

$$\begin{aligned} \sum _{J\in {\mathcal {J}}}\alpha _J(x) f(J) \ \le \ \{k_1 S_r(x) + k_2\},\quad x\in \mathcal{{X}}', \end{aligned}$$
(3.23)

for some constants \(k_1\) and \(k_2\) and for some \(r \le r^{(2)}\). This too follows from conditions (3.7)–(3.10), if \(r = 2d+6\). Hence it is necessary to have \(r^{(2)}\ge 2d+6\) in (3.5) and thus, since \(p(r) = 2r+1\), to have \(r^{(1)}\ge p(2d+6) = 4d+13\) in (3.4).

Suppose now that the assumptions (3.6)–(3.11) of Sect. 3.1, and (3.18), (3.21) and (3.22) of Sect. 3.2, are all satisfied. Then it follows from Barbour and Luczak (2012a, Theorem 4.7) that, for a sequence of initial conditions satisfying

$$\begin{aligned} x^N(0) \in \mathcal{{X}}', \ N\ge 1;\quad S_{2d+6}(x^N(0)) \le C_*\ \text{ for } \text{ some }\ C_* < \infty , \end{aligned}$$
(3.24)

and

$$\begin{aligned} \Vert x^N(0) - x(0)\Vert _\mu \ =\ O(N^{-1/2}\sqrt{\log N})\quad \text{ for } \text{ some }\ x(0) \in \mathcal{{X}}'_\mu , \end{aligned}$$
(3.25)

the deterministic approximation (2.2) holds for any \(T\), with

$$\begin{aligned} \varepsilon _N \ =\ k_T N^{-1/2}\sqrt{\log N}\quad \text{ and }\quad P_T(N,\varepsilon _N) \ =\ k_T' N^{-1}\log N, \end{aligned}$$

for suitably chosen constants \(k_T\) and \(k_T'\). Note that Eq. (3.20) remains the same, whatever the values \(h_l\), \(1\le l\le d\), chosen as lower bounds for \(x^N_{l0}(0)\). Hence, in view of this approximation, it follows that the event \(\{\tau _{0,N} \le T\}\) has probability at most \(P_T(N,\varepsilon _N)\) if the \(h_l\) are chosen to satisfy \(h_l \ge \sup _{0\le t\le T}x_{l1}(t) + \delta \) for each \(l\), for some \(\delta >0\), whenever \(N\) is so large that \(\varepsilon _N < \delta \). Thus, under the above conditions on the rates for the transitions I–VI, the results of Sect. 2 all hold, with the above values of \(\varepsilon _N\) and \(P_T(N,\varepsilon _N)\). In particular, groups of patches or of animals of sizes \(K_N = O(N^\alpha )\), for any \(\alpha < 1/2\), behave asymptotically independently.

Remark 3.1

The assumptions concerning the transition rates are rather general, and cover many biologically useful models. They can be extended somewhat, as far as the permissible variation with \(x\) is concerned, by noting that the inequality (3.5), for \(r\ge 1\), could be replaced by

$$\begin{aligned} V_r(x) \le k_{r3}S_{p(r)}(x)(1+S_0(x)) + k_{r5}; \end{aligned}$$

this would require only minor modification to the proof of Barbour and Luczak (2012a, Theorem 2.4). For our purposes, the bounds in (3.10) and (3.11) could then be relaxed by multiplying their right hand sides by a factor \((\Vert x\Vert _1 + 1)\). However, it is not obvious that the inequality in (3.7) can be relaxed in this way, and this restricts the freedom for \(\lambda _{{\mathbf {i}}{\mathbf { j}}}(x)\) to vary with \(x\).

4 Examples

4.1 Example 1: The finite patch size models of Metz and Gyllenberg (2001)

The first model, with \(N\) patches and just one variety of animal, has transitions of the form I–VI, with index set \(\mathbb {Z}_+ \cup \{D\}\), where \(D\) is used here as index for the migrants (Metz and Gyllenberg use \(D\) to denote our \(x_D\)). In their notation, in a patch with \(i\) occupants, the birth rate is \({\bar{\lambda }}_{i,i+1} := i\lambda _i(1-d_i)\), the death rate \({\bar{\lambda }}_{i,i-1} := i\mu _i\), the catastrophe rate \({\bar{\lambda }}_{i,0} := \gamma _i\) and the birth rate of (juvenile) migrants \({\bar{\gamma }}'_{iD} := i\lambda _id_i\); here, \(0\le d_i\le 1\) for all \(i\). The arrival rate of a migrant into an \(i\)-patch is \(\sigma _{Di}(x) := \alpha s_i\), where \(0 \le s_i \le 1\) for all \(i\), and the death rate of a migrant is \(\zeta _D := \mu _D\). All other transition rates are zero; in particular, there is none of the explicit dependence on \(x\) that would be allowed in our formulation, for functions such as \(\lambda _{ij}(x)\).

We take \(\nu (i) = \mu (i) = i+1\), \(i\in \mathbb {Z}_+\), and \(\nu (D) = \mu (D) = 1\). Then assumption (3.6) is trivially satisfied, and (3.7) and (3.8) require \(\lambda _i\) to be bounded (so, as is reasonable, the per capita birth rate of an animal is to be bounded), in which case (3.10) is also satisfied. For (3.9), we require \(s_i\) to be bounded, which is satisfied since \(s_i\) are assumed to be probabilities. Condition (3.11) also involves \(\gamma _i\) and \(\mu _i\), and is satisfied if, in addition, \(\mu _i\) and \(i^{-1}\gamma _i\) are bounded in \(i\ge 1\). The conditions (3.22) are trivially satisfied, and (3.21) is satisfied for

$$\begin{aligned} w {:=}\sup _{i\ge 1}\{\lambda _i(1-d_i) - \mu _i - i^{-1}\gamma _i + ((i-1)\lambda _{i-1}d_{i-1} - i\lambda _id_i)\}, \end{aligned}$$

finite if also \(u_i := (i-1)\lambda _{i-1}d_{i-1} - i\lambda _id_i\) is bounded above in \(i\ge 1\). The quantity \(u_i\) is the amount by which the total migration from a patch declines, when the number of individuals in the patch increases from \(i-1\) to \(i\), and for this to be bounded is again an entirely reasonable hypothesis. Finally, the quantities \(D_Y(T,\delta )\) and \(D_Z(T,\delta )\) are bounded, since the \(s_i\) are bounded. Hence, assuming that

$$\begin{aligned} \lambda _i, \mu _i, i^{-1}\gamma _i, \ \text{ and }~u_i\ \text{ are } \text{ bounded }, \end{aligned}$$
(4.1)

our theorems apply to the initial model of Metz and Gyllenberg (2001), for initial conditions \(x^N(0)\) satisfying (3.24) and (3.25). As it happens, the authors restricted their model by imposing a maximal number of animals per patch ‘to make life easy’, so that (4.1) is trivially satisfied in their context; but such a restriction is unnatural, and we have shown that it can be replaced by (4.1). Metz and Gyllenberg use the deterministic approximation \(x := \{x(t),\,t\ge 0\}\) as the basis for their analysis, and this is justified over any fixed finite time interval \([0,T]\) by the discussion in Sect. 3, provided that \(N\) is large enough.
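The deterministic approximation for this first model can be explored numerically. The following Python sketch specialises the drift (3.12) and (3.13) to the rates listed above and integrates it by explicit Euler steps. Several choices are assumptions of the sketch and not part of the source: the state space is truncated at a largest patch size (which then has no local births and accepts no migrants, in the spirit of the maximal patch size imposed by Metz and Gyllenberg), the parameter values are hypothetical, and Euler's method is used only for simplicity.

```python
def drift(x, xD, lam, mu, gam, d, s, alpha, muD):
    """Right-hand side of the deterministic equations for patch sizes 0..len(x)-1.

    Truncation (an assumption of this sketch): the largest patch size has no
    local births and accepts no migrants; it still produces migrants.
    """
    n = len(x)
    dx = [0.0] * n
    dxD = -muD * xD
    for i in range(n):
        top = (i == n - 1)
        birth = 0.0 if top else i * lam[i] * (1 - d[i])   # i -> i+1 by local birth
        arrive = 0.0 if top else xD * alpha * s[i]        # i -> i+1 by migrant arrival
        death = i * mu[i]                                 # i -> i-1
        cat = gam[i] if i >= 1 else 0.0                   # i -> 0 by catastrophe
        dx[i] -= x[i] * (birth + arrive + death + cat)
        if not top:
            dx[i + 1] += x[i] * (birth + arrive)
        if i >= 1:
            dx[i - 1] += x[i] * death
            dx[0] += x[i] * cat
        dxD += x[i] * i * lam[i] * d[i] - x[i] * arrive   # migrant births minus arrivals
    return dx, dxD


def euler(x, xD, params, T, dt):
    """Explicit Euler integration of the truncated system up to time T."""
    x = list(x)
    for _ in range(int(round(T / dt))):
        dx, dxD = drift(x, xD, *params)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        xD += dt * dxD
    return x, xD


# Hypothetical parameters: sizes 0..10, constant per capita rates.
n = 11
params = ([1.0] * n, [0.5] * n, [0.1] * n, [0.2] * n, [1.0] * n, 1.0, 0.3)
x0 = [1.0] + [0.0] * (n - 1)          # all patches initially empty
xT, xDT = euler(x0, 0.1, params, T=5.0, dt=0.001)
print(sum(xT), xDT)                   # sum(xT) remains 1 up to rounding
```

Since this model has no creation or destruction of patches, the total patch density \(\sum _i x_i\) is conserved by the drift, which gives a simple sanity check on any implementation.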

The results of Sect. 2 now show, in addition, that small groups of individuals behave almost independently of each other, according to time inhomogeneous Markov jump processes whose transition rates are determined by \(x\). For a chosen patch \({\mathcal {P}}\), the Markov process has transition rates at time \(t\) given by

$$\begin{aligned} \begin{array}{lllll} i&{} \rightarrow &{}i+1 &{}\quad \text{ at } \text{ rate }\quad i\lambda _i(1-d_i) + x_D(t)\alpha s_i, &{}\quad i \ge 0; \\ i&{} \rightarrow &{}i - 1 &{}\quad \text{ at } \text{ rate }\quad i\mu _i, &{}\quad i \ge 2; \\ i&{} \rightarrow &{}0 &{}\quad \text{ at } \text{ rate }\quad \gamma _i + \mu _1 \mathbf{1}_{\{1\}}(i), &{}\quad i \ge 1. \end{array} \end{aligned}$$
(4.2)
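A single patch evolving according to (4.2) can be simulated directly. The sketch below is a minimal Gillespie-type simulation under a simplifying assumption that is not in the source: \(x_D(t)\) is held constant, as it would be if \(x\) were at an equilibrium, so that the chain is time homogeneous. A time-varying \(x_D\) would need a method for inhomogeneous rates, such as thinning. The function names and parameter values are hypothetical.

```python
import random


def patch_rates(i, xD, lam, mu, gam, d, s, alpha):
    """Transition rates out of state i for the chain (4.2), as {target: rate}."""
    rates = {i + 1: i * lam(i) * (1 - d(i)) + xD * alpha * s(i)}
    if i >= 2:
        rates[i - 1] = i * mu(i)
    if i >= 1:
        # For i = 1 the death of the only occupant also leads to state 0.
        rates[0] = gam(i) + (mu(1) if i == 1 else 0.0)
    return rates


def simulate_patch(i0, T, xD, rng, **par):
    """Gillespie simulation on [0, T] with x_D held constant (an assumption)."""
    t, i, path = 0.0, i0, [(0.0, i0)]
    while True:
        rates = patch_rates(i, xD, **par)
        total = sum(rates.values())
        if total <= 0.0:
            return path
        t += rng.expovariate(total)
        if t > T:
            return path
        u, acc = rng.random() * total, 0.0
        for target, rate in rates.items():
            acc += rate
            if u < acc:
                break
        i = target
        path.append((t, i))


# Hypothetical constant per capita parameters.
par = dict(lam=lambda i: 1.0, mu=lambda i: 0.5, gam=lambda i: 0.1,
           d=lambda i: 0.2, s=lambda i: 1.0, alpha=1.0)
path = simulate_patch(0, 10.0, 0.1, random.Random(1), **par)
print(path[:5])
```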

Any particular animal \({\mathcal {A}}\) is born either as a migrant, or in a patch. Once in a patch, it never migrates again. Its Markov process has transition rates at time \(t\) given by

$$\begin{aligned} \begin{array}{lllll} (i,m) &{} \rightarrow &{}(i+1,m+1) &{}\quad \text{ at } \text{ rate }\quad \lambda _i(1-d_i) ; &{}\quad i\ge 1 \\ (i,m) &{} \rightarrow &{}(i+1,m) &{}\quad \text{ at } \text{ rate }\quad (i-1)\lambda _i(1-d_i) + x_D(t)\alpha s_i ; &{}\quad i\ge 1 \\ (i,m) &{} \rightarrow &{}(i-1,m) &{}\quad \text{ at } \text{ rate }\quad (i-1)\mu _i ; &{}\quad i\ge 2 \\ (i,m) &{} \rightarrow &{}(i,m+1) &{}\quad \text{ at } \text{ rate }\quad \lambda _i d_i ; &{}\quad i\ge 1 \\ (i,m) &{} \rightarrow &{}(\Delta ,m) &{}\quad \text{ at } \text{ rate }\quad \mu _i + \gamma _i ; &{}\quad i\ge 1 \\ (D,0)&{} \rightarrow &{}(i,0) &{}\quad \text{ at } \text{ rate }\quad \alpha x_{i-1}(t)s_{i-1}; &{}\quad i\ge 1 \\ (D,0)&{} \rightarrow &{}(\Delta ,0) &{}\quad \text{ at } \text{ rate }\quad \mu _D. \end{array} \end{aligned}$$
(4.3)

In either case, the process depends on \(x(t)\) only through the arrival rates of migrants into patches.

The second model of Metz and Gyllenberg (2001) has animals of two different varieties, that interact through living in common patches, in that their per capita birth and death rates \(\lambda \) and \(\mu \) and their migration parameters \(d\) and \(s\) vary with the entire composition \((i_1,i_2)\) of the populations of the two varieties in a patch. Under assumptions analogous to (4.1), the deterministic process \(\{x(t),\,t\ge 0\}\) with index set \(\mathbb {Z}_+^2 \cup \{D_1,D_2\}\) again acts as a good approximation to the random process \(x^N\), and small groups of individuals and patches behave asymptotically almost independently. Sufficient conditions for this are bounded per capita birth, death, catastrophe and migrant arrival rates, together with \(u_{i_1,i_2}\) being bounded in \(i_1,i_2\ge 0\), where

$$\begin{aligned} u_{i,j} {:=}(i-1)\lambda _{i-1,j}d_{i-1,j} - i\lambda _{ij}d_{ij} + (j-1)\lambda _{i,j-1}^*d_{i,j-1}^* - j\lambda _{ij}^*d_{ij}^*; \end{aligned}$$

here, the starred quantities are those for the second variety, and the unstarred those for the first.

However, Metz and Gyllenberg are interested in using the approximation when just a small number of animals of the second variety have been introduced into a resident metapopulation consisting only of the first variety. Under such circumstances, the development of the introduced variety has an essentially random component—it may die out by chance, even if at a theoretical advantage—making it more reasonable to treat it as a small group of individuals, of a different variety, evolving at random among a resident population. The following discussion represents a theoretical justification for the analysis in Metz and Gyllenberg (2001, Section 2(d)).

We begin by choosing \(x^N(0) = {\tilde{x}}^N(0) + N^{-1}K_Ne_{D_2}\), where \({\tilde{x}}^N(0)\) is an initial composition consisting only of individuals of the first variety, and \(\Vert {\tilde{x}}^N(0) - {\tilde{x}}(0)\Vert _\mu = O(N^{-1/2}\sqrt{\log N})\) for some fixed \({\tilde{x}}(0) \in \mathcal{{X}}'_\mu \), which thus also consists only of \(1\)-individuals. Then, in the transition rates for any Markov process approximating individual dynamics, the argument \(x(t)\) can be taken to be \({\tilde{x}}(t)\), where \({\tilde{x}}\) denotes the solution of (3.20) starting at \({\tilde{x}}(0)\), provided that \(K_N = O(N^{\beta })\) for any \(\beta < 1/2\), because then \(\Vert x^N(0) - {\tilde{x}}(0)\Vert _\mu = O(N^{-1/2}\sqrt{\log N})\) also. But since \({\tilde{x}}(0)\) consists only of \(1\)-individuals, so does \({\tilde{x}}(t)\) for all \(t > 0\), and \({\tilde{x}}(t)\) is the solution to the deterministic equation for the initial model of Metz and Gyllenberg (2001), with the parameters of the resident population.

Since a \(2\)-juvenile, once arrived in a patch, never leaves it, the development of the introduced species is best described in terms of the evolution of the patches that \(2\)-juveniles reach. Each such patch can be treated as an ‘individual’, and the \(2\)-migrants that leave it as its offspring, up to the time at which the patch contains no more \(2\)-individuals. This patch process, of a ‘\(p\)-individual’, can thus be interpreted as a life history process \(W\), beginning with the juvenile \(2\)-migrant, whose offspring are the \(2\)-migrants that leave its chosen patch. The entire process begins with a group of \(K_N\) juvenile \(2\)-migrants, and the \(2\)-migrant offspring of the resulting \(p\)-individuals in turn initiate new \(W\)-processes, so that the entire process, if the bound deduced from Corollary 2.4 is small, can be approximated by a Crump–Mode–Jagers (CMJ) branching process (Crump and Mode 1968a, b; Jagers 1968; see also Jagers (1975, Chapter 6)).

Let \(W(t) = ((i,j),m)\) indicate that, at time \(t\), the patch contains \(i\) \(1\)-individuals and \(j\) \(2\)-individuals, and that \(m\) \(2\)-migrants have left the patch up to time \(t\); if \((i,j)\) is replaced by \(\Delta \), this indicates that the initial juvenile and all of its offspring that did not migrate, if there were any, have died, and \(D_2\) is used when the state consists of the single juvenile \(2\)-migrant, before it reaches a patch. The transition rates of \(W\) at time \(t\) can then be expressed as

$$\begin{aligned} \begin{array}{lllll} ((i,j),m) &{} \rightarrow &{}((i,j+1),m) &{}\quad \text{ at } \text{ rate }\quad j\lambda _{ij}^*(1-d_{ij}^*) ; &{}\quad i\ge 0,\,j\ge 1 \\ ((i,j),m) &{} \rightarrow &{}((i+1,j),m)&{}\quad \text{ at } \text{ rate }\quad i\lambda _{ij}(1-d_{ij}) + {\tilde{x}}_D(t)\alpha s_{ij} ; &{}\quad i\ge 0,\, j\ge 1 \\ ((i,j),m) &{} \rightarrow &{}((i-1,j),m) &{}\quad \text{ at } \text{ rate }\quad i\mu _{ij} ; &{}\quad i\ge 1,\,j\ge 1 \\ ((i,j),m) &{} \rightarrow &{}((i,j),m+1) &{}\quad \text{ at } \text{ rate }\quad j\lambda _{ij}^* d_{ij}^* ; &{}\quad i \ge 0,\,j\ge 1 \\ ((i,j),m) &{} \rightarrow &{}((i,j-1),m) &{}\quad \text{ at } \text{ rate }\quad j\mu _{ij}^* ; &{}\quad i\ge 0,\,j\ge 2 \\ ((i,j),m) &{} \rightarrow &{}(\Delta ,m) &{}\quad \text{ at } \text{ rate }\quad \mu _{i1}^*\mathbf{1}_{\{1\}}(j) + \gamma _{ij} ; &{}\quad i\ge 0,\, j\ge 1 \\ (D_2,0)&{} \rightarrow &{}((i,1),0) &{}\quad \text{ at } \text{ rate }\quad \alpha {\tilde{x}}_{i}(t)s_{i0}^*; &{}\quad i\ge 0 \\ (D_2,0)&{} \rightarrow &{}(\Delta ,0) &{}\quad \text{ at } \text{ rate }\quad \mu _D^*. \end{array}\nonumber \\ \end{aligned}$$
(4.4)

In particular, if the resident population started at an equilibrium of the deterministic equations, so that \({\tilde{x}}(t) = {\tilde{x}}(0)\) for all \(t\), then these transition rates are time homogeneous. Note also that, since the per capita birth rate of the second variety is uniformly bounded over all patch compositions, comparison with a linear pure birth process shows that the expectation of the square of the number of \(2\)-individuals that were ever alive during \([0,T]\) is bounded by \(c_T K_N^2\), for a suitable \(c_T < \infty \). Hence the probability that any \(2\)-migrant, whenever it was born, arrives during \([0,T]\) in a patch which has already been visited by individuals of the second variety is of order \(O(N^{-1}K_N^2)\), and this is asymptotically small if \(K_N = O(N^{\beta })\) for any \(\beta < 1/2\).

Thus, in view of Corollary 2.4, the evolution of the introduced species over any finite time interval \([0,T]\), measured in terms of the number of juvenile migrants, is the same as that of a CMJ-branching process, except on an event of probability of order \(O(N^{-1+2\beta })\). The individual life history consists of a period of migration, followed either by death (with probability \(\mu _D^*/S\), where \(S := \mu _D^* + \sum _{i\ge 0}\alpha {\tilde{x}}_i(0) s_{i0}^*\)) or arrival in a patch (of type \((i,0)\) with probability \(\alpha {\tilde{x}}_i(0) s_{i0}^*/S\)), after which its subsequent life history follows that of the Markov process with rates (4.4), started in the state \(((i,1),0)\). In particular, each transition of this process in which the third component increases corresponds to the birth of a new juvenile migrant. If \(P(i,j,t)\) denotes the probability \({\mathbb {P}}[(W_1(t),W_2(t)) = (i,j) \,|\,W(0) = (D_2,0)]\), then the mean intensity of the offspring process is \(m(t) := \sum _{i\ge 0}\sum _{j\ge 1}P(i,j,t) j\lambda _{ij}^* d_{ij}^*\), and the mean number of offspring is \({\bar{m}}:= \int _0^\infty m(t)\,dt \le \infty \).

The approximation using a branching process gives a lot of insight into the development of the introduced species. In particular, if the equation \(\int _0^\infty e^{-\rho t} m(t)\,dt = 1\) has a solution \(\rho > 0\) (which has to be the case if \(1 < {\bar{m}}< \infty \)), then the introduced species, if it becomes established, grows exponentially with rate \(\rho \), and the probability that it becomes established from an initial population of \(K\) juvenile migrants is \(1 - q^K\), where \(q\) is the extinction probability of the Galton–Watson process, starting with a single individual, whose offspring distribution is the distribution of the total number of offspring in the CMJ-process. If \({\bar{m}}\le 1\), the introduced species dies out with probability one. However, the current theorems only guarantee this approximation to be valid over a fixed time interval \([0,T]\), and then for \(N\) sufficiently large. In Barbour et al. (2013), the development of an introduced species, including the branching approximation, is considered over much longer time intervals, but in the context of finite dimensional Markov population processes. It would be interesting to establish analogous results in the current context.
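The two quantities in this paragraph, the growth rate \(\rho \) and the establishment probability \(1 - q^K\), are straightforward to compute once \(m(t)\) and the offspring distribution are known. The Python sketch below illustrates the computation with stand-ins that are purely hypothetical: an intensity \(m(t) = 2e^{-t}\) and a geometric offspring law with mean 2. In the model itself both would have to be derived from the rates (4.4). The equation for \(\rho \) is solved by bisection with a trapezoidal quadrature, and \(q\) is obtained by iterating the offspring generating function from 0.

```python
from math import exp


def laplace(m, rho, T=40.0, h=0.002):
    """Trapezoidal approximation of the integral of exp(-rho*t) * m(t) over [0, T]."""
    n = int(round(T / h))
    total = 0.5 * (m(0.0) + exp(-rho * T) * m(T))
    for k in range(1, n):
        t = k * h
        total += exp(-rho * t) * m(t)
    return total * h


def malthusian(m, lo=0.0, hi=10.0, iters=40):
    """Bisection for rho with laplace(m, rho) = 1; assumes the root lies in (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if laplace(m, mid) > 1.0:
            lo = mid      # transform still too large: increase rho
        else:
            hi = mid
    return 0.5 * (lo + hi)


def extinction_probability(g, iters=200):
    """Smallest fixed point of the offspring generating function g, iterating from 0."""
    q = 0.0
    for _ in range(iters):
        q = g(q)
    return q


# Hypothetical offspring intensity m(t) = 2 exp(-t), so that the mean number is 2.
rho = malthusian(lambda t: 2.0 * exp(-t))
# Hypothetical geometric offspring law on {0, 1, 2, ...} with mean 2.
p = 2.0 / 3.0
q = extinction_probability(lambda z: (1 - p) / (1 - p * z))
K = 5
print(rho, q, 1 - q ** K)
```

For these stand-ins the answers are available in closed form (\(\rho = 1\) and \(q = 1/2\)), which makes them a convenient check on the numerical routine before it is applied to an intensity computed from (4.4).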

Metz and Gyllenberg (2001) made the (intuitively obvious) conjecture that, if the introduced species has exactly the same parameters as the original, and is introduced in equilibrium, then \({\bar{m}}= 1\). This is equivalent to saying that, in equilibrium, each migrant generates a process that results in an average of exactly one new migrant. They were, however, unable to give a proof of this. If the random process for finite \(N\) were ergodic, it would be natural to use arguments based on long term time averages as the basis of a proof. Since the finite \(N\) process is eventually absorbed in the zero population extinction state, such arguments cannot be used. Nevertheless, we sketch a proof of the conjecture, under assumptions that include those of Metz and Gyllenberg, in the “Appendix”.

4.2 Example 2: Kretzschmar’s (1993) model

In Kretzschmar’s (1993) model of parasitic infection, \(N\) denotes the initial number of hosts, these playing the role of patches. The index \(i \in \mathbb {Z}_+\) denotes the number of parasites living in the host. The model has transitions of the form I–VI, with \({\bar{\lambda }}_{i,i-1} := i\mu \), \(\lambda _{i,i+1}(x) := \lambda \varphi (x)\), \(\beta _0(x) := \beta \sum _{i\ge 0}x_i \theta ^i\) and \({\bar{\delta }}_i := \kappa + i\alpha \), all other transition rates being zero; here, \(0 \le \theta \le 1\), and \(\varphi (x) := \sum _{j\ge 1}jx_j/(c + \Vert x\Vert _1)\) for some \(c > 0\). It is shown in Barbour and Luczak (2012a, Example 5.1) that, if the initial conditions satisfy (3.24) and (3.25), then the law of large numbers approximation (2.2) holds with \(\varepsilon _N = k_T N^{-1/2}\sqrt{\log N}\) and \(P_T(N,\varepsilon _N) = k_T' N^{-1}\log N\), for suitably chosen constants \(k_T\), \(k_T'\), where, as usual, \(\mu (i) = i+1\). It is also easy to check that \(D_Y(T,\delta ) < \infty \) for all \(T\) and \(\delta \). The patch process \(Y\) on \(\mathbb {Z}_+\cup \{\Delta \}\) has transition rates at time \(t\) given by

$$\begin{aligned} \begin{array}{lllll} i&{} \rightarrow &{}i+1 &{}\quad \text{ at } \text{ rate }\quad \lambda \varphi (x(t)), &{}\quad i\ge 0; \\ i&{} \rightarrow &{}i - 1 &{}\quad \text{ at } \text{ rate }\quad i\mu , &{}\quad i \ge 1; \\ i&{} \rightarrow &{}\Delta &{}\quad \text{ at } \text{ rate }\quad \kappa + i\alpha , &{}\quad i \ge 0. \end{array} \end{aligned}$$
(4.5)

One way of looking at this process is as a superposition of Poisson processes. Each parasite on arrival decides independently either to die or to kill the host, with probabilities \(\mu /(\mu +\alpha )\) and \(\alpha /(\mu +\alpha )\) respectively. The time of this event is exponentially distributed with mean \(1/(\mu +\alpha )\). Independently, the host is killed after an exponentially distributed time with mean \(1/\kappa \). Because of the independence of marked Poisson streams, given that the host is alive at time \(T\), the number of parasites living in it has a Poisson distribution with mean

$$\begin{aligned} \int _0^T \lambda \varphi (x(t)) e^{-(\mu +\alpha )(T-t)}\,dt. \end{aligned}$$

Thus a cohort consisting of \(K_N\) hosts of given age \(T\) would exhibit an approximately Poisson distribution of parasites per host, if \(K_N = O(N^{\gamma })\) for some \(\gamma < 1/2\). Hence, within age classes, Kretzschmar’s model does not generate over-dispersed distributions of parasites per host, though mixing over age classes in a sample may be expected to do so. Even then, if \(\alpha \) and \(\kappa \) are much smaller than \(\mu \), and \(x\) is in equilibrium, the departure from Poisson may not be very noticeable, unless there are many young hosts (with ages comparable to \(1/\mu \)) in the sample.
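The Poisson mean for a host of age \(T\) is easy to evaluate numerically. The following Python sketch computes the integral by the trapezoidal rule for a user-supplied function \(t \mapsto \varphi (x(t))\), and compares it with the closed form \(\lambda \bar{\varphi }(1 - e^{-(\mu +\alpha )T})/(\mu +\alpha )\) that results when \(\varphi (x(t))\) is constant, equal to \(\bar{\varphi }\), as at an equilibrium. The function names and the parameter values are hypothetical.

```python
from math import exp


def parasite_mean(T, lam, mu, alpha, phi, steps=10000):
    """Trapezoidal value of the integral of lam*phi(t)*exp(-(mu+alpha)*(T-t)) over [0, T]."""
    h = T / steps
    f = lambda t: lam * phi(t) * exp(-(mu + alpha) * (T - t))
    total = 0.5 * (f(0.0) + f(T))
    for k in range(1, steps):
        total += f(k * h)
    return total * h


def parasite_mean_equilibrium(T, lam, mu, alpha, phi_bar):
    """Closed form when phi(x(t)) is constant, equal to phi_bar."""
    rate = mu + alpha
    return lam * phi_bar * (1 - exp(-rate * T)) / rate


# Hypothetical parameter values; phi is held constant as at an equilibrium x.
numeric = parasite_mean(5.0, lam=2.0, mu=1.0, alpha=0.1, phi=lambda t: 0.5)
exact = parasite_mean_equilibrium(5.0, lam=2.0, mu=1.0, alpha=0.1, phi_bar=0.5)
print(numeric, exact)
```

In the constant case the mean increases with \(T\) towards \(\lambda \bar{\varphi }/(\mu +\alpha )\), so only hosts whose age is comparable to \(1/(\mu +\alpha )\) have a noticeably smaller mean than older ones.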