
1 Preliminaries

1.1 Introduction

Mathematical models arising in the natural sciences often involve equations that describe how the phenomena under investigation evolve in time. Such evolution equations can arise in a number of different forms; for example, the assumption that time is a discrete variable could lead to difference equations, whereas continuous-time models are often expressed in terms of differential equations.

The construction and application of a mathematical model usually proceeds in the following manner.

  • We make assumptions on the various factors that influence the evolution of the time-dependent process that we are interested in.

  • We obtain a ‘model’ by expressing these assumptions in terms of mathematics.

  • We use mathematical techniques to analyse our model. If the model takes the form of an equation, then ideally we would like to obtain an explicit formula for its solution (unfortunately, this is impossible in the majority of cases).

  • Finally, we examine the outcome of our mathematical analysis and translate this back into the real world situation to find out how closely the predictions from our model agree with actual observations.

In the case of kinetic models, where the interest is in describing, in mathematical terms, the evolution of some population of objects, the modelling process usually results in a so-called Kinetic (or Master) Equation. A nice account of the typical steps involved in deriving such an equation is given in Sect. 1.1 of the contribution to this volume by Jacek Banasiak [3].

Note that a mathematical model will usually be only an approximation to what is actually happening in reality. Highly detailed models, incorporating many different factors, inevitably mean very complicated mathematical equations which are difficult to analyse, whereas crude models, which are easy to analyse, are most likely to provide poor predictions of actual behaviour. In practice, a compromise has to be reached; a small number of key factors are identified and used to produce a model which is not excessively complicated.

When faced with a specific mathematical problem that has emerged from the modelling process, an important part of the mathematical analysis is to establish that the problem has been correctly formulated. The usual requirements for this to be the case are the following.

  1. Existence of Solutions. We require at least one solution to exist.

  2. Uniqueness of Solutions. There must be no more than one solution.

  3. Continuous Dependence on the Problem Data. The solution should depend continuously on any input data, such as initial or boundary conditions.

Problems that meet these requirements are said to be well-posed. Note that implicit in the above statements is that we know exactly what is meant by a solution to the problem. Often there will be physical, as well as mathematical, constraints that have to be satisfied. For example, we may only be interested in solutions which take the form of non-negative, differentiable functions. Also, in some cases, it may be possible to define a solution in different ways, and this could lead to a well-posed problem if we work with one type of solution but an ill-posed problem if we adopt a different definition of a solution. When multiple solutions exist, we may be prepared to accept this provided there is a satisfactory explanation for the failure of uniqueness.

In these notes, we shall present some techniques that have proved to be effective in establishing the well-posedness of problems involving evolution equations. We shall illustrate how these techniques can be applied to standard problems that arise in population dynamics, beginning with the simple case of initial-value problems (IVPs) for scalar ordinary differential equations (ODEs) (the Malthus and Verhulst models of single-species population growth), and then going on to IVPs for finite systems of ODEs (e.g. models of interacting species and epidemics). We conclude by discussing and analysing models of coagulation–fragmentation processes that are expressed in terms of an infinite system of differential equations. To enable these problems to be treated in a unified manner, the techniques used will be developed from a dynamical systems point of view and concepts and results from the related theory of semigroups of operators will be introduced at appropriate stages.

1.2 Dynamical Systems

From a mathematical viewpoint, a dynamical system consists of the following two parts:

  • a state vector that describes the state of the system at a given time,

  • a function that maps the state at one instant of time to the state at a later time.

The following definition expresses this more precisely; see [15, p.160].

Definition 1

Let X represent the state space (i.e. the space of all state vectors) and let J be a subset of \(\mathbb{R}\,\) (which we assume contains 0). A function ϕ: J × X → X that has the two properties

  (i) \(\phi (0,\stackrel{\circ }{u}) =\, \stackrel{\circ }{u}\)

  (ii) \(\phi (s,\phi (t,\stackrel{\circ }{u})) =\phi (t + s,\stackrel{\circ }{u})\,,\mbox{ for }t,s,t + s \in J,\)  (the semigroup property)

is called a dynamical system on X.

Remarks

  1. Throughout, we assume that X is a Banach space (i.e. a complete normed vector space); see Sect. 1.3.2 for details.

  2. We can regard \(\phi (t,\stackrel{\circ }{u})\) as the state at time t of the system that initially was at state \(\stackrel{\circ }{u}\). The semigroup property then has the following interpretation: let the system evolve from its initial state \(\stackrel{\circ }{u}\) to state \(\phi (t,\stackrel{\circ }{u})\) at time t, and then allow it to evolve from this state for a further time s. The system will then arrive at precisely the state \(\phi (t + s,\stackrel{\circ }{u})\) that it would have reached through a single-stage evolution of duration t + s from state \(\stackrel{\circ }{u}\).

  3. In these notes, we shall consider only the case when J is an interval in \(\mathbb{R}\), usually \(J =\mathbb{R} ^{+} = [0,\infty )\). The dynamical system is then called a continuous-time (semi- or forward) dynamical system. We shall abbreviate this to CDS.

In operator form, we can write

$$\displaystyle{\phi (t,\stackrel{\circ }{u}) = S(t)\!\stackrel{\circ }{u},}$$

where S(t) is an operator mapping the state space X into X. Note that S(0) = I (the identity operator on X) and the semigroup property (in the case when \(J =\mathbb{R} _{+}\)) becomes

$$\displaystyle{S(t)S(s) = S(t + s),\ \forall t,s \geq 0.}$$

The family of operators \(\mathbf{S} =\{ S(t)\}_{t\geq 0}\) is said to be a semigroup of operators on X (algebraically, S is a semigroup under the associative binary operation of composition of operators).

Example 1

As a simple illustration of how a CDS arises from a differential equation, consider the initial value problem

$$\displaystyle{ u^{{\prime}}(t) = \mathit{lu}(t)\,,\ \quad u(0) =\,\, \stackrel{\circ }{u}\,, }$$
(1)

where l is a real constant. Routine methods show that a solution to (1) is \(u(t) = e^{\mathit{tl}}\!\stackrel{\circ }{u}\). This establishes that there exists at least one solution to (1). To prove that there is no other differentiable solution (i.e. to establish the uniqueness of the solution that we have produced), we argue as follows. Suppose that another solution v exists and let t > 0 be arbitrarily fixed. Then, for 0 < s ≤ t, we have

$$\displaystyle\begin{array}{rcl} \frac{d} {\mathit{ds}}(e^{(t-s)l}v(s))& =& -\mathit{le}^{(t-s)l}v(s) + e^{(t-s)l}v^{{\prime}}(s) {}\\ & =& -\mathit{le}^{(t-s)l}v(s) + e^{(t-s)l}\mathit{lv}(s) = 0. {}\\ \end{array}$$

It follows from this that \(e^{(t-s)l}v(s)\) is a constant function of s on [0, t]. On choosing s = 0 and s = t, we obtain

$$\displaystyle{e^{\mathit{tl}}v(0) = e^{(t-t)l}v(t) = v(t).}$$

Since this argument works for any t > 0 and we already know that \(v(0) = u(0) = \stackrel{\circ }{u}\), we deduce that \(v(t) = u(t) = e^{\mathit{tl}}\!\stackrel{\circ }{u}\) for all t ≥ 0. Now let \(\phi:\mathbb{R} \times \mathbb{R}\rightarrow \mathbb{R}\) be defined by

$$\displaystyle{\phi (t,\stackrel{\circ }{u}) = e^{\mathit{tl}}\!\stackrel{\circ }{u}\,,\quad t,\stackrel{\circ }{u}\, \in \mathbb{R}\,,}$$

that is, \(\phi (t,\stackrel{\circ }{u})\) denotes the value at time t of the solution of the IVP (1). Clearly

  (i) \(\phi (0,\stackrel{\circ }{u}) = \stackrel{\circ }{u}\)

  (ii) \(\phi (s,\phi (t,\stackrel{\circ }{u})) =\phi (s,e^{\mathit{tl}}\!\stackrel{\circ }{u}) = e^{sl}e^{\mathit{tl}}\!\stackrel{\circ }{u} = e^{(t+s)l}\!\stackrel{\circ }{u} =\phi (t + s,\stackrel{\circ }{u})\)

and so \(\phi:\mathbb{R} \times \mathbb{R}\rightarrow \mathbb{R}\) is a CDS (by Definition 1).
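As a quick numerical sanity check of properties (i) and (ii) (purely illustrative; the values of l and the initial state below are arbitrary choices, not taken from the text), one can evaluate \(\phi (t,\stackrel{\circ }{u}) = e^{\mathit{tl}}\!\stackrel{\circ }{u}\) directly:

```python
import math

l, u0 = -0.5, 2.0                        # arbitrary illustrative values
phi = lambda t, u: math.exp(t * l) * u   # phi(t, u) = e^{tl} u

t, s = 1.3, 0.7
print(phi(0.0, u0) == u0)                                 # property (i): phi(0, u0) = u0
print(math.isclose(phi(s, phi(t, u0)), phi(t + s, u0)))   # property (ii): the semigroup property
```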

Example 2

To make things a bit more interesting, we shall add a time-dependent forcing term to the IVP (1) and consider the non-homogeneous problem

$$\displaystyle{ u^{{\prime}}(t) = \mathit{lu}(t) + g(t),\ t > 0,\quad u(0) =\,\, \stackrel{\circ }{u}, }$$
(2)

where g is some known, and suitably restricted, function of t. To find a solution of (2), we use the following trick to reduce the problem to one that is more straightforward. Suppose that the solution u can be written as \(u(t) = e^{\mathit{tl}}v(t)\). On substituting into the non-homogeneous ODE, we obtain

$$\displaystyle{\mathit{le}^{\mathit{tl}}v(t) + e^{\mathit{tl}}v^{{\prime}}(t) = \mathit{le}^{\mathit{tl}}v(t) + g(t).}$$

It follows that v satisfies the ODE \(v^{{\prime}}(t) = e^{-\mathit{tl}}g(t)\) and therefore, from basic calculus,

$$\displaystyle{v(t) - v(0) =\int _{ 0}^{t}v^{{\prime}}(s)\,\mathit{ds} =\int _{ 0}^{t}e^{-\mathit{sl}}g(s)\,\mathit{ds}.}$$

Rearranging terms, and using the fact that \(v(0) =\,\, \stackrel{\circ }{u}\), produces

$$\displaystyle{v(t) =\,\, \stackrel{\circ }{u} +\int _{ 0}^{t}e^{-\mathit{sl}}g(s)\,\mathit{ds},}$$

and therefore a solution of the IVP (2) is given by

$$\displaystyle{ u(t) = e^{\mathit{tl}}\!\stackrel{\circ }{u} +\int _{ 0}^{t}e^{(t-s)l}g(s)\,\mathit{ds}. }$$
(3)

Formula (3) is sometimes referred to as Duhamel’s (or the variation of constants) formula. As we have actually found a solution, we have resolved the question of existence of solutions to (2). But what about uniqueness? Do solutions to (2) exist other than that given by the Duhamel formula? The following argument shows that (3) is the only solution. Suppose that another solution, say w, of (2) exists and consider \(z = u - w\), where u is the solution given by (3). Then z must satisfy the IVP \(z^{{\prime}}(t) = lz(t),\ z(0) = 0\), and therefore, by the previous example, is given by \(z(t) = e^{\mathit{tl}}0 = 0\) for all t ≥ 0. Consequently, w(t) = u(t) for all t ≥ 0. In this case, if we define

$$\displaystyle{\phi (t,\stackrel{\circ }{u}):= e^{\mathit{tl}}\!\stackrel{\circ }{u} +\int _{ 0}^{t}e^{(t-s)l}g(s)\,\mathit{ds},}$$

then we do not obtain a CDS as the semigroup property is not satisfied. The reason for this is that the right-hand side of (2) depends explicitly on t through the function g; i.e. the equation is non-autonomous. In the previous example, where the solution of the IVP led to a CDS, the equation is autonomous since the right-hand side depends on t only through the solution u.
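Although the non-homogeneous problem does not yield a CDS, the Duhamel formula (3) itself is easy to check numerically. The sketch below is an illustration only: the choices l = −1, g(t) = sin t and \(\stackrel{\circ }{u} = 2\) are assumed, not taken from the text. It evaluates (3) by quadrature and compares the result with a direct numerical integration of (2).

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

l, u0 = -1.0, 2.0        # assumed illustrative values
g = np.sin               # assumed forcing term g(t) = sin t

def duhamel(t):
    """Evaluate formula (3): u(t) = e^{tl} u0 + int_0^t e^{(t-s)l} g(s) ds."""
    integral, _ = quad(lambda s: np.exp((t - s) * l) * g(s), 0.0, t)
    return np.exp(t * l) * u0 + integral

# Independent check: integrate u'(t) = l u(t) + g(t), u(0) = u0, numerically.
sol = solve_ivp(lambda t, u: l * u + g(t), (0.0, 5.0), [u0],
                t_eval=[5.0], rtol=1e-10, atol=1e-12)
print(duhamel(5.0), sol.y[0, -1])   # the two values agree to high accuracy
```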

In the sequel, we shall consider only autonomous differential equations. When existence and uniqueness of solutions can be established for IVPs associated with an equation of this type, then we end up with a CDS ϕ: J × X → X which we can go on to investigate further. Typical questions that we would like to answer are the following.

  1. Given an initial value \(\stackrel{\circ }{u}\), can we determine the asymptotic (long-term) behaviour of \(\phi (t,\stackrel{\circ }{u})\) as t → ∞?

  2. Can we identify particular initial values which give rise to the same asymptotic behaviour?

  3. Can we say anything about the stability of the system? For example, if \(\stackrel{\circ }{u}\) is “close to” \(\stackrel{\circ }{v}\) in X, what can be said about the distance between \(\phi (t,\stackrel{\circ }{u})\) and \(\phi (t,\stackrel{\circ }{v})\) for future values of t?

In many situations, a dynamical system may also depend on a parameter (or several parameters), that is, the system takes the form \(\phi _{\mu }: J \times X \rightarrow X\) where \(\mu \in \mathbb{R}\) represents the parameter. In such cases, the following questions would also be of interest.

  4. Can we determine what happens to the behaviour of the dynamical system as the parameter varies?

  5. Can we identify the values of the parameter at which changes in the behaviour of the system occur (bifurcation values)?

In some special cases, it is possible to find an explicit formula for the dynamical system. For example, \(\phi (t,\stackrel{\circ }{u}) = e^{\mathit{tl}}\!\stackrel{\circ }{u}\) (where l can be regarded as a parameter). The formula can then be used to answer questions 1–5 above. Unfortunately, in most cases no such formula can be found and analysing the dynamical system becomes more complicated.

1.3 Some Basic Concepts from Functional Analysis

The definition we gave of a dynamical system in Sect. 1.2 involved a state space X. Recall that, from a mathematical point of view, a dynamical system is a function ϕ of time t and the state variable \(\stackrel{\circ }{u}\,\, \in X\). In the context of evolution equations, \(\stackrel{\circ }{u}\) represents the initial state of the system (physical, biological, economic, etc.) that is being investigated. We now examine the algebraic and analytical structure of the state spaces that will be used in these notes. For a more detailed account, see any standard book on Functional Analysis such as [16].

1.3.1 Vector Spaces

A complex vector space (or complex linear space) is a non-empty set X of elements f, g … (often called vectors) together with two algebraic operations, namely vector addition and multiplication of vectors by scalars (complex numbers). Vector addition associates with each ordered pair (f, g) ∈ X × X a uniquely defined vector f + g ∈ X (the sum of f and g) such that

$$\displaystyle{f + g = g + f\mbox{ and }f + (g + h) = (f + g) + h\qquad \forall f,g,h\ \in X.}$$

Moreover there exists a zero element \(O_{X}\) and, for each f ∈ X, there exists − f ∈ X such that

$$\displaystyle{f + O_{X} = f\mbox{ and }f + (-f) = O_{X}.}$$

Multiplication by scalars associates with each f ∈ X and scalar \(\alpha \in \mathbb{C}\) a uniquely defined vector α f ∈ X such that for all f, g ∈ X and scalars α, β we have

$$\displaystyle{\alpha (\beta f) = (\alpha \beta )f,\quad 1f = f,\quad \alpha (f + g) =\alpha f +\alpha g,\quad (\alpha +\beta )f =\alpha f +\beta f.}$$

Note that a real vector space, in which the scalars are restricted to be real numbers, is defined analogously.

A linear combination of \(\{f_{1},f_{2},\ldots,f_{m}\} \subset X\) is an expression of the form

$$\displaystyle{\alpha _{1}f_{1} +\alpha _{2}f_{2} +\ldots +\alpha _{m}f_{m} =\sum \limits _{ j=1}^{m}\alpha _{ j}f_{j}}$$

where the coefficients \(\alpha _{1},\alpha _{2},\ldots,\alpha _{m}\) are any scalars. For any non-empty subset M ⊂ X, the set of all linear combinations of elements in M is called the span of M, written span (M) (or sp (M)).

The vectors \(f_{1},f_{2},\ldots,f_{m}\) are said to be linearly independent if

$$\displaystyle{\alpha _{1}f_{1} +\alpha _{2}f_{2} +\ldots +\alpha _{m}f_{m} = O_{X}\ \ \ \Leftrightarrow \alpha _{1} =\alpha _{2} =\ldots =\alpha _{m} = 0;}$$

otherwise the vectors are linearly dependent. An arbitrary subset M of X is linearly independent if every non-empty finite subset of M is linearly independent; M is linearly dependent if it is not linearly independent.

A vector space X is said to be finite-dimensional if there is a positive integer n such that X contains a linearly independent set of n vectors whereas any set of n + 1 or more vectors of X is linearly dependent; in this case X is said to have dimension n and we write dim X = n. By definition, if \(X =\{ O_{X}\}\), then dim X = 0. If dim X = n, then any linearly independent set of n vectors from X forms a basis for X. If \(e_{1},e_{2},\ldots,e_{n}\) is a basis for X then each f ∈ X has a unique representation as a linear combination of the basis vectors; i.e.

$$\displaystyle{f =\alpha _{1}e_{1} +\alpha _{2}e_{2} +\ldots +\alpha _{n}e_{n}\,,}$$

with the scalars \(\alpha _{1},\alpha _{2},\ldots \alpha _{n}\) uniquely determined by f.

1.3.2 Normed Vector Spaces and Banach Spaces

A norm on a vector space X is a mapping from X into \(\mathbb{R}\) satisfying the conditions

  • \(\Vert f\Vert \geq 0\) for all \(f \in X\) and \(\Vert f\Vert = 0 \Leftrightarrow f = O_{X}\);

  • \(\Vert \alpha f\Vert = \vert \alpha \vert \,\Vert f\Vert\) for all scalars α and f ∈ X;

  • \(\Vert f + g\Vert \leq \Vert f\Vert +\Vert g\Vert\) for all f, g ∈ X  (the Triangle Inequality).

A vector space X, equipped with a norm \(\Vert \cdot \Vert\), is called a normed vector space, denoted by \((X,\Vert \cdot \Vert )\) (or simply by X when it is clear which norm is being used). Note that a norm can be regarded as a generalisation to a vector space of the familiar idea of the modulus of a number. Moreover, just as \(\vert \alpha -\beta \vert\) gives the distance between two numbers α and β, we can use \(\Vert f - g\Vert\) to measure the distance between two elements f, g in \((X,\Vert \cdot \Vert )\). This then enables us to discuss convergence of sequences of elements and continuity of functions in a normed vector space setting.

We say that a sequence \((f_{n})_{n=1}^{\infty }\) in a normed vector space X (with norm \(\Vert \cdot \Vert\)) is convergent in X if there exists f ∈ X (the limit of the sequence) such that

$$\displaystyle{\lim _{n\rightarrow \infty }\Vert f_{n} - f\Vert = 0.}$$

In this case we write \(f_{n} \rightarrow f\) as n → ∞. Note that a convergent sequence \((f_{n})_{n=1}^{\infty }\) in X has a uniquely defined limit.

A sequence \((f_{n})_{n=1}^{\infty }\) in a normed vector space X is a Cauchy sequence if for every ε > 0, there exists \(N \in \mathbb{N}\) such that

$$\displaystyle{\Vert f_{m} - f_{n}\Vert <\epsilon \quad \mbox{ for all }m,n \geq N.}$$

The normed vector space X is said to be complete if every Cauchy sequence in X is convergent, and we refer to a complete normed vector space as a Banach space. Note that every finite-dimensional normed vector space is complete and hence a Banach space.

Example 3

Let

$$\displaystyle{\mathbb{C}^{n}:=\{ f = (f_{ 1},\ldots,f_{n}):\ f_{i} \in \mathbb{C}\mbox{ for }i = 1,\ldots n\}.}$$

We say that two vectors \(f = (f_{1},\ldots,f_{n})\) and \(g = (g_{1},\ldots,g_{n})\) are equal in \(\mathbb{C}^{n}\) if

$$\displaystyle{f_{1} = g_{1},\ldots,f_{n} = g_{n}.}$$

Also, if we define

$$\displaystyle\begin{array}{rcl} & & f + g:= (f_{1} + g_{1},\ldots,f_{n} + g_{n}),\quad \ f,g \in \mathbb{C}^{n}, {}\\ & & \alpha f:= (\alpha f_{1},\ldots,\alpha f_{n}),\quad \alpha \in \mathbb{C},\ f \in \mathbb{C}^{n}, {}\\ \end{array}$$

and

$$\displaystyle{\Vert f\Vert:= \sqrt{\vert f_{1 } \vert ^{2 } + \cdots + \vert f_{n } \vert ^{2}},\quad f = (f_{1},\ldots,f_{n}) \in \mathbb{C}^{n},}$$

then \((\mathbb{C}^{n},\Vert \cdot \Vert )\) is a normed vector space with dimension n. Consequently \((\mathbb{C}^{n},\Vert \cdot \Vert )\) is a Banach space. The Banach space \((\mathbb{R}^{n},\Vert \cdot \Vert )\) consisting of all ordered n-tuples of real numbers is defined in an analogous manner.

Example 4

For fixed μ ≥ 0, we define a vector space of scalar-valued sequences \((f_{i})_{i=1}^{\infty }\) by

$$\displaystyle{\ell_{\mu }^{1}:=\{ f = (f_{ i})_{i=1}^{\infty }:\sum _{ i=1}^{\infty }i^{\mu }\vert f_{ i}\vert < \infty \}.}$$

Equality, addition and multiplication by a scalar are defined pointwise in much the same way as in \(\mathbb{C}^{n}\) (e.g. \((f_{i})_{i=1}^{\infty } + (g_{i})_{i=1}^{\infty } = (f_{i} + g_{i})_{i=1}^{\infty }\)) and if we define a norm on \(\ell_{\mu }^{1}\) by

$$\displaystyle{\Vert f\Vert _{1,\mu } =\sum _{ i=1}^{\infty }i^{\mu }\vert f_{ i}\vert,}$$

then \((\ell_{\mu }^{1},\Vert \cdot \Vert _{1,\mu })\) can be shown to be an infinite-dimensional Banach space.
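As a concrete illustration (with an assumed sequence that does not appear in the text), the truncated sums below approximate \(\Vert f\Vert _{1,\mu }\) for \(f_{i} = i^{-3}\) and μ = 1; since \(\sum _{i}i \cdot i^{-3} =\sum _{i}i^{-2} < \infty\), this f belongs to \(\ell_{1}^{1}\).

```python
import numpy as np

mu = 1.0
i = np.arange(1, 200001)
f = 1.0 / i**3                             # assumed sequence f_i = i^{-3}
partial_norm = np.sum(i**mu * np.abs(f))   # partial sum of ||f||_{1,mu} = sum_i i^mu |f_i|
print(partial_norm)                        # approaches pi^2/6 ~ 1.6449, so f lies in l^1_mu for mu = 1
```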

1.3.3 Operators on Normed Vector Spaces

We now introduce some concepts related to functions that are defined on a normed vector space \(X\). Functions of this type are often referred to as operators (or transformations) and we shall denote these by capital letters, such as L, S and T. We shall concentrate only on cases where the operator, say T, maps each vector f ∈ D(T) ⊆ X onto another (uniquely defined) vector T(f) ∈ X. Note that T(f) is often abbreviated to Tf and D(T) is the domain of T.

The simplest type of operator on a normed space X is an operator L that satisfies the algebraic condition

$$\displaystyle{ L(\alpha _{1}f_{1} +\alpha _{2}f_{2}) =\alpha _{1}L(f_{1}) +\alpha _{2}L(f_{2}),\quad \forall \,f_{1},f_{2} \in X\mbox{ and scalars }\alpha _{1},\alpha _{2}. }$$
(4)

Any operator L that satisfies (4) is said to be a linear operator on X. The set of all linear operators mapping X into X will be denoted by L(X) and, defining \(L_{1} + L_{2}\) and α L in L(X) by \((L_{1} + L_{2})(f):= L_{1}(f) + L_{2}(f)\) and (α L)(f): = α L(f), where \(L_{1},L_{2},L \in L(X)\), f ∈ X and α is a scalar, L(X) is a vector space.

An operator T: X → X (T not necessarily linear) is said to be continuous at a given f ∈ X if and only if

$$\displaystyle{f_{n} \rightarrow f\ \mbox{ in}\ X\ \Rightarrow \ T(f_{n}) \rightarrow T(f)\ \mbox{ in}\ X.}$$

We say that T is continuous on X if it is continuous at each f ∈ X.

Another important concept is that of a bounded operator. We say that the operator T: X → X is bounded on the normed vector space X if

$$\displaystyle{ \Vert T(f)\Vert \leq M\,\Vert f\Vert \ \mbox{ for all }f \in X, }$$
(5)

where M is a positive constant that is independent of f; i.e. the same constant M works for all f ∈ X. In the case of a linear operator L: X → X, continuity and boundedness are equivalent as it can be proved that

$$\displaystyle{\mbox{ the linear operator }L: X \rightarrow X\mbox{ is continuous on }X\ \Leftrightarrow \ L\ \mbox{ is bounded on }X.}$$

We shall denote the collection of bounded linear operators on X by B(X). It is straightforward to verify that B(X) is a subspace of L(X). Moreover, if X is a finite-dimensional normed vector space, then all operators in L(X) are bounded (so that, as sets, L(X) = B(X)).

It follows from (5) that, if L is bounded, then

$$\displaystyle{\sup \left \{\Vert L(f)\Vert: f \in X\mbox{ and }\Vert f\Vert \leq 1\right \}}$$

exists as a finite non-negative number. This supremum is used to define the norm of a bounded linear operator in the vector space B(X); i.e.

$$\displaystyle{\Vert L\Vert:=\sup \left \{\Vert L(f)\Vert: f \in X\mbox{ and }\Vert f\Vert \leq 1\right \}.}$$

Equipped with this norm, B(X) is a normed vector space in its own right, and is a Banach space whenever X is a Banach space. Specific examples of bounded and unbounded linear operators can be found in Sect. 1.1 of the contribution to this volume by Adam Bobrowski [8].
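For a matrix acting on \(\mathbb{R}^{2}\) equipped with the Euclidean norm, the operator norm defined above coincides with the largest singular value of the matrix. The minimal sketch below, for an assumed 2 × 2 example, estimates the supremum directly by sampling unit vectors and compares it with the value computed by numpy.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # assumed example of a bounded linear operator on R^2

# Crude estimate of ||A|| = sup{ ||A f|| : ||f|| <= 1 } by sampling unit vectors f.
thetas = np.linspace(0.0, 2.0 * np.pi, 10001)
unit_vectors = np.stack([np.cos(thetas), np.sin(thetas)])   # shape (2, 10001)
estimate = np.max(np.linalg.norm(A @ unit_vectors, axis=0))

print(estimate, np.linalg.norm(A, 2))    # both approximate the largest singular value of A
```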

1.3.4 Calculus of Vector-Valued Functions

The basic operations of differentiation and integration of scalar-valued functions can be extended to the case of functions which take values in a normed vector space \((X,\Vert \cdot \Vert )\). A function of this type is said to be vector-valued because each value taken by the function is an element in a vector space. In the sequel, we shall encounter functions of the form u: J → X where \(J \subset \mathbb{R}\) is an interval. Thus, u(t) ∈ X for all t ∈ J. Such a function u is said to be strongly continuous at c ∈ J if, for each \(\varepsilon > 0\), a positive δ can be found such that

$$\displaystyle{\Vert u(t) - u(c)\Vert <\varepsilon \, \mbox{ whenever }\,t \in J\mbox{ and }\vert t - c\vert <\delta.}$$

If u is strongly continuous at each point in J, then u is said to be strongly continuous on J. Similarly, u is said to be strongly differentiable at c ∈ J if there exists an element \(u^{{\prime}}(c) \in X\) such that

$$\displaystyle{ \lim _{h\rightarrow 0}\frac{u(c + h) - u(c)} {h} = u^{{\prime}}(c), }$$
(6)

where the limit is with respect to the norm defined on X; i.e. given \(\varepsilon > 0\), there exists δ > 0 such that

$$\displaystyle{ \left \Vert \frac{u(c + h) - u(c)} {h} - u^{{\prime}}(c)\right \Vert <\varepsilon \, \mbox{ whenever }\,c + h \in J\,\mbox{ and }\,0 < \vert h\vert <\delta. }$$
(7)

If u is strongly differentiable at each point in J then we say that u is strongly differentiable on J.

As regards integration of a vector-valued function u: J → X, it is a straightforward task to extend the familiar definition of the Riemann integral of a scalar-valued function. For example, if J = [a, b], then, for each partition P n of J of the form

$$\displaystyle{a = t_{0} < t_{1} < t_{2} <\ldots < t_{n} = b,}$$

there is a corresponding Riemann sum

$$\displaystyle{S(u;P_{n}):=\sum _{ k=1}^{n}u(\xi _{ k})(t_{k} - t_{k-1}),}$$

in which \(\xi _{k}\) is arbitrarily chosen in the sub-interval \([t_{k-1},t_{k}]\). We then define

$$\displaystyle{\int _{a}^{b}u(t)\,\mathit{dt}:=\lim _{\Vert P_{n}\Vert \rightarrow 0}S(u,P_{n}),}$$

whenever this limit exists in X (and is independent of the sequence \((P_{n})\) of partitions and choice of \(\xi _{k}\)). Here

$$\displaystyle{\Vert P_{n}\Vert:=\max _{1\leq k\leq n}(t_{k} - t_{k-1}).}$$

We refer to this integral as the strong (Riemann) integral of u over the interval [a, b]. The strong Riemann integral has similar properties to its scalar version. For example, suppose that u: [a, b] → X is strongly continuous on [a, b]. Then it can be shown that, for each t ∈ [a, b],

$$\displaystyle{\int _{a}^{t}u(s)\,\mathit{ds}\mbox{ exists },\quad \left \Vert \int _{ a}^{t}u(s)\,\mathit{ds}\right \Vert \leq \int _{ a}^{t}\Vert u(s)\Vert \,\mathit{ds},\quad \frac{d} {\mathit{dt}}\left (\int _{a}^{t}u(s)\,\mathit{ds}\right ) = u(t);}$$

see [7, Section 1.6] and also [4, Subsection 2.1.5].
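A minimal numerical sketch of the strong Riemann integral, for an assumed integrand that does not appear in the text: the Riemann sums of the \(\mathbb{R}^{2}\)-valued function u(t) = (cos t, sin t) over [0, π∕2] approach the componentwise integral (1, 1).

```python
import numpy as np

# Riemann sums S(u; P_n) of the R^2-valued u(t) = (cos t, sin t) over [a, b] = [0, pi/2].
a, b, n = 0.0, np.pi / 2.0, 1000
t = np.linspace(a, b, n + 1)                  # the partition points t_0 < t_1 < ... < t_n
xi = 0.5 * (t[:-1] + t[1:])                   # sample points xi_k chosen in [t_{k-1}, t_k]
dt = np.diff(t)
u = lambda s: np.array([np.cos(s), np.sin(s)])

S = sum(u(x) * d for x, d in zip(xi, dt))     # sum_k u(xi_k)(t_k - t_{k-1})
print(S)                                      # close to the exact value (1, 1)
```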

1.3.5 The Contraction Mapping Principle

As discussed earlier, when carrying out a rigorous investigation into problems arising from mathematical models, the first step is usually to show that solutions actually exist. Moreover, such solutions should be uniquely determined by the problem data. Theoretical results which establish these properties are often referred to as Existence-Uniqueness Results. To end this section, we present one of the most important results of this type. We shall also supply a proof as this provides concrete motivation for working with Banach spaces.

Theorem 1 (Banach Contraction Mapping Principle )

Let \((X,\Vert \cdot \Vert )\) be a Banach space and let T: X → X be an operator with the property that

$$\displaystyle{\Vert Tf - Tg\Vert \leq \alpha \,\Vert f - g\Vert \quad \forall \,f,g \in X,}$$

for some constant α < 1 (such an operator T is said to be a (strict) contraction). Then the equation

$$\displaystyle{Tf = f}$$

has exactly one solution (called a fixed point of T) in X. Moreover, if we denote this unique solution by \(\overline{f}\) and use T iteratively to generate a sequence of vectors \((f_{1},\,Tf_{1},\,T^{2}f_{1},\,T^{3}f_{1},\ldots )\) , where \(f_{1}\) is any given vector in X, then

$$\displaystyle{T^{n}f_{ 1} \rightarrow \overline{f}\mbox{ as }n \rightarrow \infty.}$$

Proof

Let the sequence \((f_{n})_{n=1}^{\infty }\) be defined as in the statement of the theorem. Then, for n ≥ 2,

$$\displaystyle{\Vert f_{n+1} - f_{n}\Vert =\Vert Tf_{n} - Tf_{n-1}\Vert \leq \alpha \Vert f_{n} - f_{n-1}\Vert \leq \cdots \leq \alpha ^{n-1}\Vert f_{ 2} - f_{1}\Vert.}$$

Note that the above inequality trivially holds for n = 1 as well. Hence, for any m > n ≥ 1, we have

$$\displaystyle\begin{array}{rcl} \Vert f_{m} - f_{n}\Vert & \leq & \Vert f_{m} - f_{m-1}\Vert +\Vert f_{m-1} - f_{m-2}\Vert + \cdots +\Vert f_{n+1} - f_{n}\Vert {}\\ & \leq & (\alpha ^{m-2} +\alpha ^{m-3} + \cdots +\alpha ^{n-1})\Vert f_{ 2} - f_{1}\Vert {}\\ & \leq & \alpha ^{n-1}(1 +\alpha +\alpha ^{2} + \cdots \,)\Vert f_{ 2} - f_{1}\Vert = \frac{\alpha ^{n-1}} {1-\alpha }\,\Vert f_{2} - f_{1}\Vert. {}\\ \end{array}$$

Since

$$\displaystyle{\frac{\alpha ^{n-1}} {1-\alpha }\,\Vert f_{2} - f_{1}\Vert \rightarrow 0\mbox{ as }n \rightarrow \infty,}$$

it follows that \((f_{n})_{n=1}^{\infty }\) is a Cauchy, and hence convergent, sequence in the Banach space \((X,\Vert \cdot \Vert )\). Let f ∈ X be the limit of this convergent sequence. Then, by continuity of the operator T, we obtain

$$\displaystyle{f_{n+1} = Tf_{n} \Rightarrow \lim _{n\rightarrow \infty }f_{n+1} =\lim _{n\rightarrow \infty }Tf_{n} = T\left (\lim _{n\rightarrow \infty }f_{n}\right ) \Rightarrow f = Tf,}$$

and so f is a fixed point of T. To show that no other fixed point exists, suppose that both f and g are fixed points, with \(f\neq g\). Then

$$\displaystyle{\Vert f - g\Vert =\Vert Tf - Tg\Vert \leq \alpha \Vert f - g\Vert.}$$

Dividing each side by \(\Vert f - g\Vert \,(\neq 0)\) leads to 1 ≤ α, which is a contradiction. □ 
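As an elementary illustration of the theorem (with X = ℝ under the absolute-value norm and an assumed affine map, not an example from the text), the iterates converge geometrically with ratio α:

```python
# A minimal illustration of Theorem 1 on the Banach space X = R (absolute value norm):
# T f = 0.5*f + 1 satisfies |Tf - Tg| = 0.5|f - g|, so alpha = 0.5 and the fixed point is f = 2.
f = 37.0                          # arbitrary starting vector f_1
for n in range(1, 11):
    f = 0.5 * f + 1.0             # f_{n+1} = T f_n
    print(n, f, abs(f - 2.0))     # the error is multiplied by alpha = 0.5 at each step
```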

2 Finite-Dimensional State Space

In this section we give a brief account of some aspects of the theory associated with autonomous finite-dimensional systems of ODEs and will explain how continuous-time dynamical systems defined on the finite-dimensional state-space \(\mathbb{R}^{n}\) arise naturally from such systems. This will pave the way for the discussion on infinite-dimensional dynamical systems that will follow in the next section. Note that the intention with these lectures is not to provide an exhaustive treatment of systems of ODEs. Instead, we concentrate only on those results which will be needed to analyse some selected problems arising in population dynamics. We begin by examining the most straightforward case where we have a linear system of constant-coefficient ODEs. We will then move on to systems involving nonlinear equations and describe how, through the process of linearisation, useful information on the long-time behaviour of solutions near an equilibrium solution can be obtained from a related linear, constant-coefficient system. Obviously, before we can talk about the long-time behaviour of solutions, we should make sure that solutions do, in fact, exist. Hence, we shall highlight some conditions which, thanks to the Contraction Mapping Principle, guarantee the existence and uniqueness of solutions to systems of ODEs.

2.1 Linear Constant-Coefficient Systems of ODEs

2.1.1 Matrix Exponentials

Consider the following IVP involving a linear system of n constant-coefficient ODEs:

$$\displaystyle\begin{array}{rcl} u_{1}^{{\prime}}(t)& =& l_{ 11}u_{1}(t) + l_{12}u_{2}(t) + \cdots + l_{1n}u_{n}(t),\ u_{1}(0) =\,\, \stackrel{\circ }{u}_{1}, {}\\ u_{2}^{{\prime}}(t)& =& l_{ 21}u_{1}(t) + l_{22}u_{2}(t) + \cdots + l_{2n}u_{n}(t),\ u_{2}(0) =\,\, \stackrel{\circ }{u}_{2}, {}\\ & \vdots & {}\\ u_{n}^{{\prime}}(t)& =& l_{ n1}u_{1}(t) + l_{n2}u_{2}(t) + \cdots + l_{nn}u_{n}(t),\ u_{n}(0) =\,\, \stackrel{\circ }{u}_{n}, {}\\ \end{array}$$

where \(l_{11},l_{12},\ldots,l_{nn}\) and \(\stackrel{\circ }{u}_{1},\ldots,\stackrel{\circ }{u}_{n}\) are real constants. The problem is to find n differentiable functions \(u_{1},u_{2},\ldots,u_{n}\) of the variable t that satisfy the n equations in the system. Obviously, before seeking solutions, we have to know that solutions actually exist, and it is here that considerable progress can be made if we adopt the strategy of working with matrix exponentials that was pioneered by the Italian mathematician Giuseppe Peano in 1887 (see [13, pp. 503–504]).

The first step is to express the IVP system in the matrix–vector form

$$\displaystyle{ u^{{\prime}}(t) = \mathit{Lu}(t),\ u(0) =\,\, \stackrel{\circ }{u}, }$$
(8)

where L is the n × n constant real matrix

$$\displaystyle{L = \left [\begin{array}{cccc} l_{11} & l_{12} & \cdots & l_{1n} \\ l_{21} & l_{22} & \cdots & l_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots &l_{nn} \end{array} \right ],}$$

and \(u(t) = (u_{1}(t),\ldots,u_{n}(t))\) is interpreted as a column vector. A solution of (8) will be a vector-valued function in the sense that u(t) lies in the n-dimensional Banach space \(\mathbb{R}^{n}\) for each t. This means that our state space X is \(\mathbb{R}^{n}\).

Note that \(u^{{\prime}}(t) = (u_{1}^{{\prime}}(t),\ldots,u_{n}^{{\prime}}(t))\) with integrals of the vector-valued function u being interpreted similarly; e.g.

$$\displaystyle{\int _{0}^{t}u(s)\,\mathit{ds} = (\int _{ 0}^{t}u_{ 1}(s)\,\mathit{ds},\ldots,\int _{0}^{t}u_{ n}(s)\,\mathit{ds}).}$$

Expressed as (8), the linear system of ODEs bears a striking resemblance to the scalar equation \(u^{{\prime}}(t) = \mathit{lu}(t)\), and so it is tempting to write down a solution in the form

$$\displaystyle{ u(t) = e^{\,\mathit{tL}}\!\stackrel{\circ }{u}. }$$
(9)

It turns out that the unique solution of (8) can indeed be written as (9), but this obviously leads to the following questions.

  Q1. What does \(e^{\,\mathit{tL}}\) mean when L is an n × n constant matrix?

  Q2. How do we verify that (9) is a solution of (8)?

  Q3. How do we prove that (9) is the only differentiable solution of (8) that satisfies the initial condition \(u(0) =\, \stackrel{\circ }{u}\, \in \mathbb{R}^{n}\)?

  Q4. For a given n × n constant matrix L, can we actually express \(e^{\,\mathit{tL}}\) in terms of standard scalar-valued functions of t?

To answer Q1, we consider the power series definition of the scalar exponential \(e^{\,l}\), i.e.

$$\displaystyle{ e^{\,l} = 1 + l + \frac{l^{2}} {2!} + \frac{l^{3}} {3!} + \cdots \,. }$$
(10)

This infinite series converges to the number \(e^{\,l} =\exp (l)\) for each fixed \(l \in \mathbb{R}\). Motivated by this, Peano defined the exponential of an n × n constant matrix L by a formula, which, in modern notation, takes the form

$$\displaystyle{ e^{\,L} =\exp \, (L) = I + L + \frac{L^{2}} {2!} + \frac{L^{3}} {3!} + \cdots \,. }$$
(11)

Here I is the n × n identity matrix, \(L^{2}\) represents the matrix product LL, \(L^{3}\) is the product \(LLL = L^{2}L = LL^{2}\) and so on. Note that the operation \(f\mapsto Lf\), where f is a column vector in \(\mathbb{R}^{n}\), defines a bounded linear transformation that maps \(\mathbb{R}^{n}\) into \(\mathbb{R}^{n}\). If we use L to represent both the matrix and the bounded linear operator that it defines, then it can be shown that the infinite series of n × n matrices (or, equivalently, bounded linear operators in \(B(\mathbb{R}^{n})\)) will always converge (with respect to the norm on \(B(\mathbb{R}^{n})\)) to a uniquely defined n × n matrix (which, as before, can be interpreted as an operator in \(B(\mathbb{R}^{n})\)). Moreover

$$\displaystyle{\Vert e^{\,L}\Vert \leq e^{\,\Vert L\Vert },}$$

where \(\Vert L\Vert:=\sup \{\Vert Lf\Vert: f \in \mathbb{R}^{n}\mbox{ and }\Vert f\Vert \leq 1\}\) for any \(L \in B(\mathbb{R}^{n})\); see [15, pp. 82–84] and [13, p. 6]. It follows from (11) that, for any n × n constant matrix L and any scalar t,

$$\displaystyle{ e^{\,\mathit{tL}} =\exp \, (\mathit{tL}) = I + \mathit{tL} + \frac{t^{2}L^{2}} {2!} + \frac{t^{3}L^{3}} {3!} + \cdots \,, }$$
(12)

and

$$\displaystyle{\Vert e^{\,\mathit{tL}}\Vert \leq e^{\,\vert t\vert \,\Vert L\Vert }.}$$

The time-dependent matrix exponential defined by (12) has similar properties to its one-dimensional “little brother”. For example, if L is any n × n constant matrix, then

  (P1) \(e^{\,0L} = I\);

  (P2) \(e^{\,\mathit{sL}}e^{\,\mathit{tL}} = e^{\,(s+t)L}\) for all \(s,t \in \mathbb{R};\)

  (P3) \(\frac{d} {\mathit{dt}}\left (e^{\,\mathit{tL}}f\right ) = L\,e^{\,\mathit{tL}}f\) for any given vector \(f \in \mathbb{R}^{n}\).

The derivative in (P3) is interpreted as a strong derivative with respect to the norm on \(\mathbb{R}^{n}\), so that

$$\displaystyle{\left \Vert \frac{e^{\,(t+h)L}f - e^{\,\mathit{tL}}f} {h} - L\,e^{\,\mathit{tL}}f\right \Vert \rightarrow 0\mbox{ as }h \rightarrow 0.}$$

Note that an \(\mathbb{R}^{n}\)-valued function u is strongly differentiable at \(c \in \mathbb{R}\) if and only if each of its scalar-valued components \(u_{k},\ k = 1,2,\ldots,n,\) is differentiable at c; the strong and pointwise (or component-wise) derivatives are then identical. In other words, the notions of strong derivative and pointwise derivative coincide in this n-dimensional case. It should also be remarked that a stronger version of (P3) can be established. Since

$$\displaystyle{\left \Vert \frac{1} {h}\left (e^{\,\mathit{hL}} - I\right ) - L\right \Vert \leq \sum _{ k=2}^{\infty }\frac{\vert h\vert ^{k-1}\,\Vert L\Vert ^{k}} {k!} = \frac{e^{\,\vert h\vert \,\Vert L\Vert } - 1} {\vert h\vert } -\Vert L\Vert \rightarrow 0\mbox{ as }h \rightarrow 0,}$$

and

$$\displaystyle{e^{\,(t+h)L} - e^{\,\mathit{tL}} = \left (e^{\,\mathit{hL}} - I\right )e^{\,\mathit{tL}},}$$

it follows that the operator-valued function \(t\mapsto e^{\,\mathit{tL}}\) is strongly differentiable in \(B(\mathbb{R}^{n})\).
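The properties listed above are easy to check numerically. The sketch below is illustrative only: the 2 × 2 matrix and initial state are assumed, and scipy.linalg.expm is used to evaluate the matrix exponential. It checks the solution formula (9) against a direct numerical integration of (8) and verifies the semigroup property (P2).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

L = np.array([[0.0, 1.0], [-2.0, -3.0]])    # assumed 2x2 example matrix
u0 = np.array([1.0, 0.0])                   # assumed initial state

t = 1.5
u_exp = expm(t * L) @ u0                    # u(t) = e^{tL} u0, as in (9)

# Independent check by numerically integrating u'(t) = L u(t), u(0) = u0.
sol = solve_ivp(lambda s, u: L @ u, (0.0, t), u0, rtol=1e-10, atol=1e-12)
print(u_exp, sol.y[:, -1])                  # the two results agree to high accuracy

# Semigroup property (P2): e^{sL} e^{tL} = e^{(s+t)L}.
s = 0.7
print(np.allclose(expm(s * L) @ expm(t * L), expm((s + t) * L)))   # True
```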

2.1.2 Existence and Uniqueness of Solutions

We can now answer Q2 and Q3. On setting \(u(t) = e^{\,\mathit{tL}}\!\!\stackrel{\circ }{u}\), it follows immediately from properties (P1) and (P3) that

$$\displaystyle{u(0) = I\!\stackrel{\circ }{u}\ =\ \stackrel{\circ }{u}}$$

and

$$\displaystyle{u^{{\prime}}(t) = \mathit{Lu}(t).}$$

Therefore \(u(t) = e^{\,\mathit{tL}}\!\stackrel{\circ }{u}\) is a solution of the IVP

$$\displaystyle{u^{{\prime}}(t) = \mathit{Lu}(t),\quad u(0) =\,\, \stackrel{\circ }{u}.}$$

To show that this IVP has no other differentiable solutions, we argue in exactly the same way as for the scalar case. Suppose that another solution v exists; i.e. \(v^{{\prime}}(t) = Lv(t)\) and \(v(0) =\,\, \stackrel{\circ }{u}\), and let t > 0 be arbitrarily fixed. Then, for 0 < s ≤ t, we have

$$\displaystyle\begin{array}{rcl} \frac{d} {\mathit{ds}}(e^{(t-s)L}v(s))& =& -Le^{(t-s)L}v(s) + e^{(t-s)L}v^{{\prime}}(s) {}\\ & =& -Le^{(t-s)L}v(s) + e^{(t-s)L}Lv(s) = \mathbf{0}, {}\\ \end{array}$$

where 0 is the zero vector in \(\mathbb{R}^{n}\). It follows from this that \(e^{(t-s)L}v(s)\) is a constant vector for all s ∈ [0, t]. On choosing s = 0 and s = t, we obtain

$$\displaystyle{e^{\,\mathit{tL}}\stackrel{\circ }{u} = e^{\,\mathit{tL}}v(0) = e^{\,(t-t)L}\,v(t) = e^{\,0L\,}v(t) = v(t).}$$

Since this argument works for any t > 0 and we already know that \(v(0) = u(0) =\,\, \stackrel{\circ }{u}\), we deduce that \(v(t) = u(t) = e^{\,\mathit{tL}}\!\stackrel{\circ }{u}\) for all t ≥ 0.

Note that this solution can be used to define the n-dimensional CDS \(\phi: [0,\infty ) \times \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\), where \(\phi (t,\stackrel{\circ }{u}):= e^{\,\mathit{tL}}\!\stackrel{\circ }{u}\). The associated semigroup of operators \(\{S(t)\}_{t\geq 0}\), \(S(t):= e^{\,\mathit{tL}}\), is referred to as the semigroup generated by the matrix L, and, for each \(\stackrel{\circ }{u}\), the set \(\{S(t)\!\stackrel{\circ }{u}: t \geq 0\} \subset \mathbb{R}^{n}\) is called the (positive semi-) orbit of \(\stackrel{\circ }{u}\). Geometrically, we can regard the orbit as a continuous (with respect to t) “curve” (or path or trajectory), emanating from \(\stackrel{\circ }{u}\), that lies in the state-space \(\mathbb{R}^{n}\) for all t ≥ 0. The continuity property follows from the fact that

$$\displaystyle{\Vert e^{\,hL} - I\Vert \leq e^{\,\vert h\vert \,\Vert L\Vert } - 1 \rightarrow 0\mbox{ as }h \rightarrow 0.}$$

A constant solution, \(u(t) \equiv \overline{u}\) for all t, where \(\overline{u} = (\overline{u}_{1},\ldots,\overline{u}_{n}) \in \mathbb{R}^{n}\) is called an equilibrium solution or steady state solution. The orbit of such a solution is the single element (or point) \(\overline{u} \in \mathbb{R}^{n}\); \(\overline{u}\) is called an equilibrium point (or rest point, stationary point or critical point). If \(\overline{u}\) is an equilibrium point, then \(L\overline{u} = \mathbf{0}\). We shall only consider the case when the matrix L is non-singular and therefore the only equilibrium point of the system \(u^{{\prime}}(t) = \mathit{Lu}(t)\) is \(\overline{u} = \mathbf{0}.\) When each eigenvalue of L has a negative real part, the equilibrium point 0 is globally attractive (or globally asymptotically stable) since \(\Vert e^{\,\mathit{tL}}f\Vert \rightarrow 0\) as t → ∞ for all \(f \in \mathbb{R}^{n}\); see [13, p.12].

In principle, \(e^{\,\mathit{tL}}\) can be computed by using the fact that, if P is a non-singular matrix and \(L = P\varLambda P^{-1}\), then \(e^{\,\mathit{tL}} = Pe^{\,t\varLambda }P^{-1}\). For example, if L has n distinct real eigenvalues \(\lambda _{1},\ldots,\lambda _{n}\), then the corresponding eigenvectors can be used as the columns of a matrix P such that \(L = P\varLambda P^{-1}\), where \(\varLambda = \mbox{ diag}\{\lambda _{1},\ldots,\lambda _{n}\}\), in which case

$$\displaystyle{e^{\,\mathit{tL}} = P\,e^{\,t\varLambda }P^{-1}\mbox{ with }e^{\,t\varLambda } = \mbox{ diag}\{e^{\,\lambda _{1}t},\ldots,e^{\,\lambda _{n}t}\}.}$$

More generally, it can be shown that the components \(u_{j}(t),\ j = 1,\ldots,n,\) of any given solution u(t) can be written as a linear combination of the functions

$$\displaystyle{t^{k}e^{\,\mu \,t}\cos (\nu t),\qquad t^{\ell}e^{\,\mu \,t}\sin (\nu t),}$$

where μ + i ν runs through all the eigenvalues of L, and k, \(\ell\) are suitably restricted non-negative integers; see [15, p.135].
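A short numerical sketch of the diagonalisation recipe \(e^{\,\mathit{tL}} = Pe^{\,t\varLambda }P^{-1}\), for an assumed matrix with distinct real eigenvalues:

```python
import numpy as np
from scipy.linalg import expm

L = np.array([[1.0, 2.0], [0.0, -3.0]])    # assumed matrix with distinct real eigenvalues 1 and -3

lam, P = np.linalg.eig(L)                  # eigenvalues and a matrix whose columns are eigenvectors
t = 0.4
e_tL = P @ np.diag(np.exp(t * lam)) @ np.linalg.inv(P)   # P e^{t Lambda} P^{-1}

print(np.allclose(e_tL, expm(t * L)))      # True: both constructions give e^{tL}
```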

One final remark in this subsection is that it should be clear that the restriction t ≥ 0 is unnecessary in all of the above, and that we could just as easily have defined a group \(\{e^{\,\mathit{tL}}\}_{t\in \mathbb{R}}\). We have focussed only on the semigroup case since this is usually the best that we can hope to obtain when we look at the more complicated setting of semigroups generated by operators defined in an infinite-dimensional state space.

2.2 Nonlinear Autonomous Systems of ODEs

We have seen that IVPs involving constant-coefficient linear systems of ODEs have unique, globally defined solutions that can be expressed in terms of matrix exponentials. For more general systems of ODEs, life becomes a bit more complicated and it is usually difficult to obtain exact solutions. However, useful qualitative results can sometimes be obtained. We shall consider the IVP

$$\displaystyle{ u^{{\prime}}(t) = F(u(t)),\quad u(0) =\,\, \stackrel{\circ }{u}, }$$
(13)

where \(u(t) = (u_{1}(t),\ldots,u_{n}(t))\), \(\stackrel{\circ }{u} = (\stackrel{\circ }{u}_{1},\ldots,\stackrel{\circ }{u}_{n})\) and \(F:\mathbb{R} ^{n} \supseteq W \rightarrow \mathbb{R}^{n}\) is a vector-valued function \(F = (F_{1},\ldots,F_{n})\) defined on an open subset W of \(\mathbb{R}^{n}\). A solution of (13) is a differentiable function u: J → W defined on some interval \(J \subset \mathbb{R}\), with 0 ∈ J, such that

$$\displaystyle{u^{{\prime}}(t) = F(u(t))\ \forall \,t \in J,\mbox{ and }\ u(0) =\,\, \stackrel{\circ }{u}.}$$

2.2.1 Existence and Uniqueness of Solutions

The following theorem provides sufficient conditions for the existence of a unique solution to (13) on some interval \(J = (-a,a)\). We shall denote such a solution by \(\phi (\cdot,\stackrel{\circ }{u})\), i.e. at time t ∈ (−a, a), the solution is \(u(t) =\phi (t,\stackrel{\circ }{u})\). We shall also express \(\phi (t,\stackrel{\circ }{u})\) as \(S(t)\!\stackrel{\circ }{u}\).

Theorem 2

Let F be continuously differentiable on W.

  (i) (Local Existence and Uniqueness) For each \(\stackrel{\circ }{u}\, \in W\), there exists a unique solution \(\phi (\cdot,\stackrel{\circ }{u})\) of the IVP (13) defined on some interval (−a,a) where a > 0.

  (ii) (Continuous Dependence on Initial Conditions) Let the unique solution \(\phi (\cdot,\stackrel{\circ }{u})\) be defined on some closed interval [0,b]. Then there exists a neighbourhood U of \(\stackrel{\circ }{u}\) and a positive constant K such that if \(\stackrel{\circ }{v}\, \in U\), then the corresponding IVP \(v^{{\prime}} = F(v),\ v(0) =\,\, \stackrel{\circ }{v}\), has a unique solution also defined on [0,b] and

    $$\displaystyle{\Vert \phi (t,\stackrel{\circ }{u}) -\phi (t,\stackrel{\circ }{v})\Vert =\Vert S(t)\!\stackrel{\circ }{u} - S(t)\!\stackrel{\circ }{v}\Vert \leq e^{Kt}\Vert \stackrel{\circ }{u} -\stackrel{\circ }{v}\Vert \quad \forall \,t \in [0,b].}$$

  (iii) (Maximal Interval of Existence) For each \(\stackrel{\circ }{u}\, \in W\), there exists a maximal open interval \(J_{\mathit{max}} = (\alpha,\beta )\) containing 0 (with α and β depending on \(\stackrel{\circ }{u}\)) on which the unique solution \(\phi (t,\stackrel{\circ }{u})\) is defined. If β < ∞, then, given any compact subset K of W, there is some t ∈ (α,β) such that \(u(t)\notin K.\)

Remarks

  (a) Proofs of these results can be found in [15, Chapter 8].

  (b) The vector function F is said to be differentiable at g ∈ W if there exists a linear operator \(F_{g} \in B(\mathbb{R}^{n})\) such that

    $$\displaystyle{F(g + h) = F(g) + F_{g}(h) + E(g,h),\ h \in \mathbb{R}^{n},}$$

    where

    $$\displaystyle{\lim _{\Vert h\Vert \rightarrow 0}\frac{\Vert E(g,h)\Vert } {\Vert h\Vert } = 0.}$$

    It can be shown that \(F_{g}\) can be represented by the n × n Jacobian matrix

    $$\displaystyle{DF = \left [\begin{array}{cccc} \partial _{1}F_{1} & \partial _{2}F_{1} & \cdots & \partial _{n}F_{1} \\ \partial _{1}F_{2} & \partial _{2}F_{2} & \cdots & \partial _{n}F_{2}\\ \vdots & \vdots & \ddots & \vdots \\ \partial _{1}F_{n}&\partial _{2}F_{n}&\cdots &\partial _{n}F_{n}\end{array} \right ]}$$

    evaluated at g. The function F is continuously differentiable on W if all the partial derivatives \(\partial _{j}F_{i}\) exist and are continuous on W.

  (c) The fact that F is continuously differentiable on W means that F satisfies a local Lipschitz condition on W; i.e. for each \(\stackrel{\circ }{u}\, \in W\) there is a closed ball

    $$\displaystyle{\overline{B}_{r}(\stackrel{\circ }{u}):=\{ f \in \mathbb{R}^{n}:\Vert f -\stackrel{\circ }{u}\Vert \leq r\} \subset W}$$

    and a constant k, which may depend on \(\stackrel{\circ }{u}\) and r, such that

    $$\displaystyle{\Vert F(f) - F(g)\Vert \leq k\,\Vert f - g\Vert \qquad \forall f,g \in \overline{B}_{r}(\stackrel{\circ }{u}).}$$
  (d) The proof of Theorem 2(i) involves the Banach Contraction Mapping Principle. The first step is to note that the IVP (13) is equivalent to the fixed point problem \(u = Tu,\) where T is the operator defined by

    $$\displaystyle{(Tu)(t) =\,\, \stackrel{\circ }{u} +\int _{ 0}^{t}F(u(s))\mathit{ds},}$$

    i.e. u is a solution of (13) if and only if u satisfies the integral equation

    $$\displaystyle{u(t) =\,\, \stackrel{\circ }{u} +\int _{ 0}^{t}F(u(s))\mathit{ds}.}$$

    The local Lipschitz continuity of F can then be used to establish that T is a contraction on a suitably defined Banach space of functions; this yields existence and uniqueness. It is also possible to produce a sequence of iterates \((u_{n})\) convergent to the unique solution \(\phi (\cdot,\stackrel{\circ }{u})\) by using the Picard successive approximation scheme (a numerical sketch of this scheme is given after these remarks). We simply take \(u_{1}(t) \equiv \,\,\stackrel{\circ }{u}\) and then set

    $$\displaystyle{u_{n}(t) =\,\, \stackrel{\circ }{u} +\int _{ 0}^{t}\,F(u_{ n-1}(s))\mathit{ds},\qquad n = 2,3,\ldots }$$
  (e) The proof of Theorem 2(ii) relies on Gronwall’s inequality which states that if \(\psi: [0,b] \rightarrow \mathbb{R}\) is continuous, non-negative and satisfies

    $$\displaystyle{\psi (t) \leq C + K\int _{0}^{t}\psi (s)\,\mathit{ds}\ \ \forall \,t \in [0,b],}$$

    for constants C ≥ 0, K ≥ 0, then

    $$\displaystyle{\psi (t) \leq Ce^{Kt}\ \ \forall \,t \in [0,b].}$$
  (f) It can be shown that the operators S(t) have the following semigroup property:

    $$\displaystyle{S(t)S(s\!)\stackrel{\circ }{u} = S(t + s\!)\stackrel{\circ }{u},}$$

    where this identity is valid whenever one side exists (in which case, the other side will also exist).
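The Picard scheme mentioned in Remark (d) can be illustrated numerically. The sketch below is an illustration only: the scalar problem u′ = u, u(0) = 1, with exact solution \(e^{t}\), is an assumed example, and each iterate is obtained from the previous one by numerical integration.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Picard iterates for the scalar IVP u'(t) = u(t), u(0) = 1 (so F(u) = u, exact solution e^t);
# this is an assumed illustrative problem, not one taken from the text.
t = np.linspace(0.0, 1.0, 2001)
u = np.ones_like(t)                                       # u_1(t) = u(0) = 1
for n in range(2, 12):
    u = 1.0 + cumulative_trapezoid(u, t, initial=0.0)     # u_n(t) = u(0) + int_0^t F(u_{n-1}(s)) ds

print(np.max(np.abs(u - np.exp(t))))                      # small: the iterates approach e^t on [0, 1]
```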

2.2.2 Equilibrium Points

When analysing the nonlinear autonomous system of ODEs

$$\displaystyle{ u^{{\prime}} = F(u), }$$
(14)

the starting point is usually to look for equilibrium points (corresponding to constant, or steady-state solutions). In this case \(\bar{u}\) is an equilibrium point if

$$\displaystyle{F_{1}(\bar{u}) = 0,\ \ldots,F_{n}(\bar{u}) = 0,}$$

and the local stability properties of the equilibrium \(\bar{u}\) are usually determined by the eigenvalues of the Jacobian matrix \((DF)(\bar{u})\). The equilibrium \(\bar{u}\) is hyperbolic if \((DF)(\bar{u})\) has no eigenvalues with zero real part.

An equilibrium \(\bar{u}\) is said to be stable if nearby solutions remain nearby for all future time. More precisely, \(\bar{u}\) is stable if, for any given neighbourhood \(U\) of \(\bar{u}\), there is a neighbourhood \(U_{1}\) of \(\bar{u}\) in U such that

$$\displaystyle{\stackrel{\circ }{u}\, \in U_{1} \Rightarrow \phi (t,\stackrel{\circ }{u})\mbox{ exists for all }t \geq 0\mbox{ and }\phi (t,\stackrel{\circ }{u}) \in U\mbox{ for all }t \geq 0.}$$

If, in addition,

$$\displaystyle{\stackrel{\circ }{u}\, \in U_{1} \Rightarrow \phi (t,\stackrel{\circ }{u}) \rightarrow \bar{ u}\mbox{ as }t \rightarrow \infty,}$$

then \(\bar{u}\) is (locally) asymptotically stable. Any equilibrium which is not stable is said to be unstable. When \(\bar{u}\) is hyperbolic then it is either asymptotically stable (when all eigenvalues of \((DF)(\bar{u})\) have negative real parts) or unstable (when \((DF)(\bar{u})\) has at least one eigenvalue with positive real part).

The basic idea behind the proof of these stability results is that of linearisation. Suppose that \(\bar{u}\) is an equilibrium point and that \(\stackrel{\circ }{u}\) is sufficiently close to \(\bar{u}\). On setting \(v(t) =\phi (t,\stackrel{\circ }{u}) -\bar{ u}\), we obtain

$$\displaystyle\begin{array}{rcl} v^{{\prime}}(t)& =& F(\bar{u} + v(t)) \approx F(\bar{u}) + (DF)(\bar{u})\,v(t) {}\\ \mbox{ i.e.}\qquad v^{{\prime}}(t)& \approx & (DF)(\bar{u})\,v(t). {}\\ \end{array}$$

Thus, in the immediate vicinity of \(\bar{u}\), the nonlinear ODE \(u^{{\prime}} = F(u)\) can be approximated by the linear equation

$$\displaystyle{v^{{\prime}} = Lv,\quad \mbox{ where }L = (DF)(\bar{u}).}$$

In effect, this means that in order to understand the stability of a hyperbolic equilibrium point \(\bar{u}\) of \(u^{{\prime}} = F(u)\), we need only consider the linearised equation \(v^{{\prime}} = (DF)(\bar{u})v\).
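The linearisation recipe is straightforward to carry out numerically. The sketch below uses an assumed two-dimensional system (a damped pendulum, \(u_{1}^{{\prime}} = u_{2}\), \(u_{2}^{{\prime}} = -\sin u_{1} - 0.5u_{2}\), which is not an example from the text): it approximates the Jacobian \((DF)(\bar{u})\) at the equilibrium \(\bar{u} = (0,0)\) by finite differences and inspects its eigenvalues.

```python
import numpy as np

def jacobian(F, u, h=1e-6):
    """Approximate the Jacobian (DF)(u) by central differences."""
    n = len(u)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (F(u + e) - F(u - e)) / (2.0 * h)
    return J

# Assumed illustrative system (a damped pendulum): u1' = u2, u2' = -sin(u1) - 0.5*u2,
# with equilibrium u_bar = (0, 0).
F = lambda u: np.array([u[1], -np.sin(u[0]) - 0.5 * u[1]])
u_bar = np.array([0.0, 0.0])

eigenvalues = np.linalg.eigvals(jacobian(F, u_bar))
print(eigenvalues)   # both real parts are negative, so (0, 0) is asymptotically stable
```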

2.2.3 Graphical Approach in One and Two Dimensions

In the scalar case,

$$\displaystyle{u^{{\prime}}(t) = F(u(t)),\ u(0) = \stackrel{\circ }{u},}$$

we can represent the asymptotic behaviour of solutions using a phase portrait. Geometrically, the state space \(\mathbb{R}^{1}\) can be identified with the real line which, in this context, is called the phase line, and so the value u(t) of a solution u at time t defines a point on the phase line. As t varies, the solution u(t) traces out a trajectory, emanating from the initial point \(\stackrel{\circ }{u}\), that lies completely on the phase line. If we regard u(t) as the position of a particle on the phase line at time t, then the direction of motion of the particle is governed by the sign of F(u(t)). If F(u(t)) > 0 the motion at time t is to the right; if F(u(t)) < 0, then motion is to the left.

In two dimensions, we use the phase plane. Here, we interpret the components \(u_{1}(t)\) and \(u_{2}(t)\) of any solution u(t) as the coordinates of a curve defined parametrically (in terms of t) in the \(u_{1}u_{2}\) phase plane. Each solution curve plotted on the phase plane is a trajectory. A trajectory can also be regarded as the projection of a solution curve which “lives” in the three-dimensional space \(\mathbb{R}^{3}\) (with coordinates \(u_{1}\), \(u_{2}\) and t) onto the two-dimensional \(u_{1}u_{2}\) plane. Phase plane trajectories have the following important properties.

  1. Each trajectory corresponds to infinitely many solutions.

  2. Through each point of the \(u_{1}u_{2}\) phase plane there passes a unique trajectory and therefore trajectories cannot intersect.

  3. On the phase plane, an equilibrium point \(\bar{u} = (\bar{u}_{1},\bar{u}_{2})\) is the trajectory of the constant solution

     $$\displaystyle{u_{1}(t) =\bar{ u}_{1},\quad u_{2}(t) =\bar{ u}_{2},\quad t \in \mathbb{R}.}$$

  4. The trajectory of a non-constant periodic solution is a closed curve called a cycle.

The key to establishing these properties is to use the uniqueness of solutions to IVPs. For example, suppose that the point \(\stackrel{\circ }{u}\) lies, not only on the trajectory \(C(\stackrel{\circ }{u})\), but also on the trajectory \(C(\stackrel{\circ }{v})\) corresponding to the solution \(\phi (\cdot,\stackrel{\circ }{v})\). Then, \(\stackrel{\circ }{u} =\phi (t_{0},\stackrel{\circ }{v})\) for some \(t_{0}\) and therefore the function \(\psi (t) =\phi (t - t_{0},\stackrel{\circ }{v})\) is a solution of the system that satisfies the initial condition \(\psi (0) =\,\, \stackrel{\circ }{u}\). By uniqueness of solutions, \(\psi (t) =\phi (t,\stackrel{\circ }{u})\). Therefore, the trajectories corresponding to ψ and \(\phi (\cdot,\stackrel{\circ }{u})\) (and hence \(\phi (\cdot,\stackrel{\circ }{v})\) and \(\phi (\cdot,\stackrel{\circ }{u})\)) are identical.

2.3 Dynamical Systems and Population Models

Suppose we are interested in the long-term behaviour of the population of a particular species (or the populations of several inter-related species). By a “population” we mean an assembly of individual organisms which can be regarded as being alike. What is required is a mathematical model that contains certain observed or experimentally determined parameters such as the number of predators, severity of climate, availability of food etc. This model may take the form of a differential equation or a difference equation, depending upon whether the population is assumed to change continuously or discretely. We shall restrict our attention to the case of continuous time. We can attempt to use the model to answer questions such as:

  1. Does the population → 0 as t → ∞ (extinction)?

  2. Does the population become arbitrarily large as t → ∞ (eventual overcrowding)?

  3. Does the population fluctuate periodically or even randomly?

Example 5

Single Species Population Dynamics (see [14, Section 2.1]). When all individuals in the population behave in the same manner, then the net effect of this behaviour on the total population is given by the product of the population size with the per capita effect (i.e. the effect due to the behaviour of a typical individual in the population). For example, if we consider the case of the production of new individuals, then the rate of change of the population size N(t) at time t in a continuous-time model can be expressed as

$$\displaystyle{ \frac{dN} {dt} = N \times \mbox{ per capita reproduction rate}. }$$
(15)

This can be written as

$$\displaystyle{ \frac{1} {N} \frac{dN} {dt} = \mbox{ per capita reproduction rate}}$$

or, equivalently,

$$\displaystyle{ \frac{d} {\mathit{dt}}\ln (N) = \mbox{ per capita reproduction rate.}}$$
  (i) The Malthus Model. In this extremely simple model, the per capita reproduction rate is assumed to be a constant, say β, in which case Eq. (15) becomes

    $$\displaystyle{\frac{dN} {dt} =\beta N,}$$

    and so \(N(t) = e^{\beta t}\!\!\stackrel{\circ }{N}\), where \(\stackrel{\circ }{N}\,\, = N(0)\). This type of population growth is often referred to as Malthusian growth. The Malthus model can easily be adapted to include the effect of deaths in the population. If we also assume that the mortality rate is proportional to the population size, then we obtain

    $$\displaystyle{\frac{dN} {dt} =\beta N -\delta N = rN,}$$

    where −δ N represents the decline in population size due to deaths, and the parameter \(r =\beta -\delta\) is the net per capita “growth” rate. The solution now is given by

    $$\displaystyle{ N(t) = e^{rt}\!\stackrel{\circ }{N}, }$$
    (16)

    where \(N(0) =\, \stackrel{\circ }{N}\) is the initial size of the population. It follows that:

    $$\displaystyle{\begin{array}{lll} &r > 0 \Rightarrow N(t) \rightarrow \infty \qquad \qquad &\,\mathrm{as}\ t \rightarrow \infty \ \quad (\mathrm{overcrowding}) \\ &r < 0 \Rightarrow N(t) \rightarrow 0 &\,\mathrm{as}\ t \rightarrow \infty \ \quad \mathrm{(extinction)} \\ &r = 0 \Rightarrow N(t) =\, \stackrel{\circ }{N}&\forall t \geq 0.\end{array} }$$

    Clearly, the solution (16) leads to an unrealistic prediction of what will happen to the size of the population in the long term and so we must include other (nonlinear) effects to improve the model.

  2. (ii)

    The Verhulst Model. A slightly more realistic model is given by

    $$\displaystyle{\frac{dN} {dt} = G(N)N,\quad t > 0;\,\quad N(0) =\,\, \stackrel{\circ }{N},}$$

    with a variable net growth rate G depending on the population size N. In some cases we would expect G to reflect the fact that there is likely to be some intra-specific competition for a limited supply of resources. This would require a growth rate G(N) that leads to a model predicting slow population growth when N is small, more rapid growth at intermediate population sizes, and a levelling off as N approaches a saturation value, say K. If N ever exceeds K, then G(N) should be such that N decreases towards K.

    For example, the equation of limited growth is

    $$\displaystyle{ \frac{dN} {dt} = r\left (1 -\frac{N} {K}\right )N,\,\quad N(0) =\,\, \stackrel{\circ }{N}, }$$
    (17)

    where K and r are positive constants. To obtain this equation, we have set \(G(N) = r(1 - N/K)\). Note that K is the population size at which G is zero and therefore \(dN/dt = 0\) when N = K. Equation (17) is called the (continuous time) logistic growth equation or Verhulst equation, the constant K is called the carrying capacity of the environment, and r is the unrestricted growth rate. The method of separation of variables can be used to show that the solution of (17) is

    $$\displaystyle{ N(t) = \frac{K} {1 - (1 - K/\stackrel{\circ }{N})\exp (-rt)}\,, }$$
    (18)

    and therefore N(t) → K as t → ∞. A short numerical check of this limiting behaviour is sketched below.
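As a quick numerical illustration (not part of the original analysis), the following sketch integrates the logistic equation (17) for some arbitrarily chosen parameter values and compares the result with the closed-form solution (18); both approach the carrying capacity K.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (arbitrary choices for this sketch)
r, K, N0 = 0.5, 100.0, 5.0

def logistic(t, N):
    # Right-hand side of the logistic equation (17): dN/dt = r(1 - N/K)N
    return r * (1.0 - N / K) * N

t_eval = np.linspace(0.0, 30.0, 301)
sol = solve_ivp(logistic, (0.0, 30.0), [N0], t_eval=t_eval, rtol=1e-8)

# Closed-form solution (18)
N_exact = K / (1.0 - (1.0 - K / N0) * np.exp(-r * t_eval))

print("max |numerical - exact|:", np.max(np.abs(sol.y[0] - N_exact)))
print("N(30) =", sol.y[0][-1], " (carrying capacity K =", K, ")")
```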

Example 6

Models of Two Interacting Species (see [14, Section 2.2]). We now consider how interactions between pairs of species affect the population dynamics of both species. The type of interactions that can occur can be classified as follows:

  • Competition: each species has an inhibitory effect on the other;

  • Mutualism (symbiosis): each species benefits from the presence of the other;

  • Predation: one species benefits and the other is inhibited by interactions between them.

In any given habitat, such as a lake, an island or a Petri dish, it is likely that a number of different species will live together. A common strategy is to identify two species as being the most important to each other, and then to ignore the effect on them of all the other species in the habitat.

In the case when the two species are in competition for the same resources, any increase in the numbers of one species will have an adverse effect on the growth rate of the other. The competitive Lotka–Volterra system of equations used to model this situation is given by

$$\displaystyle{ u_{1}^{{\prime}} = u_{ 1}(r_{1} - l_{11}u_{1} - l_{12}u_{2}),\quad u_{2}^{{\prime}} = u_{ 2}(r_{2} - l_{21}u_{1} - l_{22}u_{2}), }$$
(19)

where

  • \(u_{1}(t),u_{2}(t)\) are the sizes of the two species at time t;

  • r 1, r 2 are the intrinsic growth rates of the respective species;

  • \(l_{11},l_{22}\) represent the strength of the intraspecific competition within each species, with \(r_{1}/l_{11}\) and \(r_{2}/l_{22}\) the carrying capacities of the respective species;

  • \(l_{12},l_{21}\) represent the strength of the interspecific competition (i.e. competition between the species).

Each of the constants \(r_{1},r_{2},l_{11},l_{12},l_{21},l_{22}\) is positive.

It follows from the existence-uniqueness theorem that, for each initial state \(\stackrel{\circ }{u}\), there exists a unique solution \(u(t) = S(t)\!\stackrel{\circ }{u}\) defined on some interval [0, t max ), where t max  < ∞ only if \(\Vert u(t)\Vert \rightarrow \infty\) as t → t max −. Moreover, since the non-negative u 1 and u 2 axes are composed of complete trajectories, any trajectory that starts in the positive first quadrant must remain there; i.e. solutions that start at positive values stay positive (recall from phase plane analysis that distinct trajectories in the phase plane cannot intersect).

Let L be the matrix

$$\displaystyle{L = \left [\begin{array}{cc} l_{11} & l_{12} \\ l_{21} & l_{22} \end{array} \right ],}$$

and assume that | L | ≠ 0. The system of Eq. (19) has four equilibria, namely

$$\displaystyle{U_{1} = (0,0),\ U_{2} = (r_{1}/l_{11},0),\ U_{3} = (0,r_{2}/l_{22})\mbox{ and }U_{4} = (u_{1}^{{\ast}},u_{ 2}^{{\ast}}),}$$

where

$$\displaystyle{\left [\begin{array}{c} u_{1}^{{\ast}} \\ u_{2}^{{\ast}} \end{array} \right ] = L^{-1}\left [\begin{array}{c} r_{1} \\ r_{2}\end{array} \right ] = \frac{1} {\vert L\vert }\left [\begin{array}{c} r_{1}l_{22} - r_{2}l_{12} \\ r_{2}l_{11} - r_{1}l_{21} \end{array} \right ].}$$

Note that

$$\displaystyle\begin{array}{rcl} & & \vert L\vert > 0\mbox{ when }\frac{l_{12}} {l_{22}} < \frac{l_{11}} {l_{21}} {}\\ & & \vert L\vert < 0\mbox{ when }\frac{l_{12}} {l_{22}} > \frac{l_{11}} {l_{21}}\,. {}\\ \end{array}$$

From this, we can deduce that there are two scenarios that result in \(u_{1}^{{\ast}} > 0\) and \(u_{2}^{{\ast}} > 0\), namely

$$\displaystyle\begin{array}{rcl} & & \mbox{ Case I }:\ \frac{l_{12}} {l_{22}} < \frac{r_{1}} {r_{2}} < \frac{l_{11}} {l_{21}} {}\\ & & \mbox{ Case II }:\ \frac{l_{11}} {l_{21}} < \frac{r_{1}} {r_{2}} < \frac{l_{12}} {l_{22}}. {}\\ \end{array}$$

The Jacobian matrix at (x, y) is given by

$$\displaystyle{(DF)(x,y) = \left [\begin{array}{cc} r_{1} - 2l_{11\,}x - l_{12}\,y& - l_{12}\,x \\ - l_{21\,}y &r_{2} - l_{21\,}x - 2l_{22}\,y \end{array} \right ].}$$

For the equilibrium U 1, we have

$$\displaystyle{(\mathit{DF})(0,0) = \left [\begin{array}{cc} r_{1} & 0 \\ 0 &r_{2} \end{array} \right ],}$$

and it follows immediately that U 1 is unstable in each of Case I and Case II.

Consider now the other three equilibria when Case I applies. To determine the stability properties of these, we note first that the characteristic equation of a real 2 × 2 matrix, say A, can be written in the form

$$\displaystyle{\lambda ^{2} -\mathit{trace}(A)\,\lambda + \vert A\vert = 0.}$$

It follows that a non-singular matrix A will have two eigenvalues with negative real parts when | A |  > 0 and \(trace(A) < 0\), and will have exactly one positive eigenvalue when | A |  < 0. At U 2 we have

$$\displaystyle{(\mathit{DF})(r_{1}/l_{11},0) = \left [\begin{array}{cc} - r_{1} & - l_{12}\,r_{1}/l_{11} \\ 0 &r_{2} - l_{21}\,r_{1}/l_{11} \end{array} \right ].}$$

As the determinant of this Jacobian matrix is

$$\displaystyle{-r_{1}\left (r_{2} -\frac{l_{21}\,r_{1}} {l_{11}} \right ) < 0,}$$

the equilibrium U 2 is unstable. Similarly, U 3 is unstable. Now consider U 4. In this case,

$$\displaystyle{(\mathit{DF})(u_{1}^{{\ast}},u_{ 2}^{{\ast}}) = \left [\begin{array}{cc} r_{1} - 2l_{11}u_{1}^{{\ast}}- l_{12}u_{2}^{{\ast}}& - l_{12}u_{1}^{{\ast}} \\ - l_{21}u_{2}^{{\ast}} &r_{2} - l_{21}u_{1}^{{\ast}}- 2l_{22}u_{2}^{{\ast}} \end{array} \right ] = \left [\begin{array}{ccc} - l_{11}u_{1}^{{\ast}}&&- l_{12}u_{1}^{{\ast}} \\ - l_{21}u_{2}^{{\ast}}&&- l_{22}u_{2}^{{\ast}} \end{array} \right ].}$$

Consequently, the characteristic equation takes the form

$$\displaystyle{\lambda ^{2} +\lambda (l_{ 11}u_{1}^{{\ast}} + l_{ 22}u_{2}^{{\ast}}) + u_{ 1}^{{\ast}}u_{ 2}^{{\ast}}\,\vert L\vert = 0,}$$

and therefore U 4 is locally asymptotically stable (since the trace of the Jacobian matrix is negative and the determinant is positive). In fact, it can be shown that all trajectories in the positive first quadrant converge to U 4 as t → ∞; see [14, p. 32]. Thus, in Case I, the competing species may coexist in the long term. Note that the condition \(l_{11}l_{22} > l_{12}l_{21}\), which holds here, can be interpreted as stating that the overall intraspecific competition is stronger than the overall interspecific competition.

In Case II, a similar analysis shows that U 2 and U 3 are both asymptotically stable, with U 4 unstable (in fact U 4 is a saddle point). It follows that, in the long term, one of the species will die out. The species that survives is determined by the initial conditions. Since U 4 is a saddle point, there exist stable and unstable orbits emanating from U 4; see [24, p. 21]. These orbits are referred to as separatrices. As discussed in [14, p. 31], if the initial point on a trajectory lies above the stable separatrix, then the trajectory converges to U 3 (i.e. species u 1 dies out). If \(\stackrel{\circ }{u}\) lies below this separatrix, then the trajectory converges to U 2 (i.e. species u 2 dies out).

For an analysis of the case when U 4 does not lie in the first quadrant of the phase plane, see [14, Section 2.3]. Note also that the equations used to model two species which are interacting in a co-operative manner are also given by (19), but now we have \(l_{12} < 0,l_{21} < 0,l_{11} > 0\) and l 22 > 0.
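The stability analysis above can be checked numerically. The following sketch (with parameter values chosen purely for illustration so that Case I holds) computes the four equilibria of (19) and the eigenvalues of the Jacobian matrix at each of them; only the coexistence state U 4 should have both eigenvalues with negative real parts.

```python
import numpy as np

# Parameter values chosen (for illustration only) so that Case I holds:
# l12/l22 < r1/r2 < l11/l21, i.e. intraspecific competition dominates.
r1, r2 = 1.0, 1.0
l11, l12, l21, l22 = 1.0, 0.5, 0.5, 1.0

Lmat = np.array([[l11, l12], [l21, l22]])
u_star = np.linalg.solve(Lmat, np.array([r1, r2]))     # coexistence equilibrium U4

def jacobian(x, y):
    # (DF)(x, y) for the system (19)
    return np.array([[r1 - 2*l11*x - l12*y, -l12*x],
                     [-l21*y, r2 - l21*x - 2*l22*y]])

equilibria = {"U1": (0.0, 0.0),
              "U2": (r1/l11, 0.0),
              "U3": (0.0, r2/l22),
              "U4": tuple(u_star)}

for name, (x, y) in equilibria.items():
    eigs = np.linalg.eigvals(jacobian(x, y))
    print(name, (round(x, 3), round(y, 3)), "eigenvalues:", np.round(eigs, 3))
# Expected output (Case I): U1, U2 and U3 each have an eigenvalue with positive
# real part, while both eigenvalues at U4 have negative real parts.
```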

Example 7

The SIR Models of Infectious Diseases (see [14, Chapter 3], [9, Chapter 3] and [12, Chapter 6]). In simple epidemic models, it is often assumed that the total population size remains constant. At any fixed time, each individual within this population will be in one (and only one) of the following classes.

  • Class S: this consists of individuals who are susceptible to being infected (i.e. can catch the disease).

  • Class I: this consists of infected individuals (i.e. individuals who have the disease and can transmit it to susceptibles).

  • Class R: this consists of individuals who have recovered from the disease and are now immune.

The class R is sometimes regarded as the Removed Class as it can also include those individuals who have died of the disease or are isolated until recovery. The SIR model was pioneered in a paper “Contribution to the Mathematical Theory of Epidemics” published in 1927 by two scientists, William Kermack and Anderson McKendrick, working in Edinburgh. In searching for a mechanism that would explain when and why an epidemic terminates, they concluded that: “In general a threshold density of population is found to exist, which depends upon the infectivity, recovery and death rates peculiar to the epidemic. No epidemic can occur if the population density is below this threshold value.”

If we let S(t), I(t) and R(t) denote the sizes of each class, then the following system of differential equations can be used to describe how these sizes change with time:

$$\displaystyle\begin{array}{rcl} \frac{dS} {dt} = -\beta \mathit{SI}& &{}\end{array}$$
(20)
$$\displaystyle\begin{array}{rcl} \frac{dI} {dt} =\beta \mathit{SI} -\gamma I& &{}\end{array}$$
(21)
$$\displaystyle\begin{array}{rcl} \frac{\mathit{dR}} {\mathit{dt}} =\gamma I.& &{}\end{array}$$
(22)

Here we are making the following assumptions.

  • The gain in the infective class is proportional to the number of infectives and the number of susceptibles; i.e. is given by β S I, where β is a positive constant. The susceptibles are lost at the same rate.

  • The rate of removal of infectives to the recovered class is proportional to the number of infectives; i.e. is given by γ I, where γ is a positive constant.

We refer to γ as the recovery rate and β as the transmission (or infection) rate.

Note that, when analysing this system of equations, we are only interested in non-negative solutions for S(t), I(t) and R(t). Moreover, the constant population size is built into the system (20)–(22) since adding the equations gives

$$\displaystyle{\frac{dS} {dt} + \frac{dI} {dt} + \frac{\mathit{dR}} {\mathit{dt}} = 0,}$$

showing that, for each t,

$$\displaystyle{S(t) + I(t) + R(t) = N,}$$

where N is the fixed total population size. The model is now completed by imposing initial conditions of the form

$$\displaystyle{S(0) =\,\, \stackrel{\circ }{S}\,\, \approx N,\quad I(0) =\,\, \stackrel{\circ }{I}\,\, = N -\,\,\stackrel{\circ }{S}\,\, > 0,\quad R(0) = 0.}$$

Given particular values of \(\beta,\,\gamma,\,\stackrel{\circ }{S}\) and \(\stackrel{\circ }{I}\), we can use the model to predict whether the infection will spread or not, and if it does spread, in what manner it will grow with time. One observation that can be made more or less immediately is that the infectious class will grow in size if \(\mathit{dI}/\mathit{dt} > 0\). Since we are assuming that there are infectious individuals in the population at time t = 0, Eq. (21) shows that I(t) will increase from its initial value provided \(\stackrel{\circ }{S}\,\, >\gamma /\beta\). The parameter \(R_{0} =\beta \!\stackrel{\circ }{S}/\gamma\) is called the Basic Reproductive Ratio and is defined as the average number of secondary cases produced by an average infectious individual in a totally susceptible population; the infection therefore spreads initially precisely when R 0 > 1.

We shall determine the long term behaviour of solutions by arguing as follows.

  • Since \(S(t) + I(t) + R(t) = N\) for all t, the system is really only a 2-D system and so we shall concentrate on the equations governing the evolution of S and I. For this 2-D system, we have an infinite number of equilibria, namely \((\overline{S},0)\), where \(\overline{S}\) can be any number in the interval [0, N]. Note that these equilibria are not isolated (i.e. every open ball centred at one of these equilibria contains other equilibria). This means that the customary local linearisation at an isolated equilibrium cannot be used to determine the stability of the equilibria of this 2-D system.

  • The non-negative S axis consists entirely of equilibrium points and the non-negative I axis is composed of two complete trajectories, namely the equilibrium (0, 0) and the positive I axis. This means that solutions that start off with \(\,\stackrel{\circ }{S}\,\, > 0\) and \(\,\stackrel{\circ }{I}\,\, > 0\) remain positive.

  • Since S(t) > 0 and I(t) > 0, it follows (from the equation for S) that S(t) is strictly decreasing. Hence \(S(t) <\,\, \stackrel{\circ }{S}\) for any t > 0 for which S(t) exists. Note that it is impossible for I(t) to blow up in finite time since

    $$\displaystyle\begin{array}{rcl} I^{\,{\prime}}(t)& \leq & (\beta \!\!\stackrel{\circ }{S}\,-\gamma )I(t) {}\\ & \Rightarrow &\int _{0}^{t}\frac{I^{\,{\prime}}(s)} {I(s)} \,\mathit{ds} \leq \int _{0}^{t}(\beta \!\!\stackrel{\circ }{S}\,-\gamma )\,\mathit{ds} = (\beta \!\stackrel{\circ }{S}\,-\gamma )t {}\\ & \Rightarrow &\ln (I(t)) \leq \ln (\,\stackrel{\circ }{I}\,) + (\beta \!\!\stackrel{\circ }{S}\,-\gamma )t {}\\ & \Rightarrow & 0 < I(t) \leq \exp ((\beta \!\!\stackrel{\circ }{S}\,-\gamma )t)\,\stackrel{\circ }{I}. {}\\ \end{array}$$

    Therefore both S(t) and I(t) exist globally in time. Moreover, if \(\stackrel{\circ }{S}\, <\gamma /\beta\), then I(t) → 0 as t → ∞.

  • For the epidemic to spread initially, we require \(\stackrel{\circ }{S}\,\, >\gamma /\beta\), since we will then have I′(0) > 0. However, in this case there will exist some finite time, say t ∗, such that S(t ∗) < γ/β. To see this, simply observe that if we assume that S(t) ≥ γ/β for all t then we obtain \(I(t) \geq \,\,\stackrel{\circ }{I}\) and \(S^{{\prime}}(t) \leq -\gamma \stackrel{\circ }{I}\) for all t. From this it follows that

    $$\displaystyle{S(t) \leq -\gamma \!\stackrel{\circ }{I}\!t + \stackrel{\circ }{S} \rightarrow -\infty \mbox{ as }t \rightarrow \infty,}$$

    which clearly is a contradiction. Arguing as before (but now with \(\stackrel{\circ }{S}\) replaced by S(t ∗)) shows that once again I(t) → 0 as t → ∞, despite I(t) initially increasing.

  • Since S(t) is a strictly decreasing function that is bounded below (by zero), S(t) must converge to some limit S ∞  ≥ 0 as t → ∞. We now establish that S ∞  > 0, showing that although the epidemic ultimately dies out, this is not caused by the number of available susceptibles decreasing to zero. Here we make use of the equation for R. We have

    $$\displaystyle{ \frac{dS} {dR} = \frac{\mathit{dS}/\mathit{dt}} {\mathit{dR}/\mathit{dt}} = -\frac{\beta } {\gamma }S \Rightarrow S =\exp (-\beta R/\gamma )\!\stackrel{\circ }{S}.}$$

    Since R ≤ N, we deduce that S is always greater than the positive constant \(\exp (-\beta N/\gamma )\!\stackrel{\circ }{S}\) and therefore S ∞  > 0.

  • Finally the trajectories in the SI phase plane can be obtained from the ODE

    $$\displaystyle{\frac{dI} {dS} = -1 + \frac{\gamma } {\beta S}.}$$

    This has solution given by

    $$\displaystyle{I = N - S + (\gamma /\beta )\ln (S/\!\stackrel{\circ }{S});}$$

    here we have used the fact that \(\stackrel{\circ }{S} + \stackrel{\circ }{I}\,\, = N\). Consequently, on taking limits (t → ∞) on each side, and rearranging, we obtain

    $$\displaystyle{S_{\infty } = N + (\gamma /\beta )\ln (S_{\infty }/\!\stackrel{\circ }{S}).}$$

    For each given \(\stackrel{\circ }{S}\), this equation has exactly one solution S ∞ in the interval \((0,\stackrel{\circ }{S})\).

To summarise, we have shown that each solution (S(t), I(t)) will converge to an equilibrium (S ∞ , 0), with S ∞  > 0, which is determined by the initial value of S. From this, it follows that \((S(t),I(t),R(t)) \rightarrow (S_{\infty },0,N - S_{\infty })\) as t → ∞. The value of N − S ∞ shows the extent to which the infection has affected the population.
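The behaviour derived above is easy to reproduce numerically. The sketch below (with illustrative values of β, γ and the initial data, chosen here only as an example) integrates the system (20)–(22) and also solves the final-size equation for S ∞ on the interval \((0,\stackrel{\circ }{S})\); the computed S(t) levels off at this value while I(t) → 0.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

# Illustrative values only (assumptions made for this sketch)
beta, gamma, N = 0.002, 0.4, 1000.0
S0, I0 = 995.0, 5.0            # S0 > gamma/beta = 200, so the epidemic grows initially

def sir(t, y):
    S, I, R = y
    return [-beta * S * I, beta * S * I - gamma * I, gamma * I]

sol = solve_ivp(sir, (0.0, 100.0), [S0, I0, 0.0], rtol=1e-8)
S_end, I_end, R_end = sol.y[:, -1]

# Final-size relation S_inf = N + (gamma/beta) ln(S_inf / S0), solved on (0, S0)
f = lambda S: S - N - (gamma / beta) * np.log(S / S0)
S_inf = brentq(f, 1e-8, S0 - 1e-8)

print("S(100), I(100), R(100) ~", round(S_end, 2), round(I_end, 6), round(R_end, 2))
print("predicted S_inf ~", round(S_inf, 2), ";  N - S_inf ~", round(N - S_inf, 2))
```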

3 Infinite-Dimensional State Space

We now move into the realm of infinite-dimensional dynamical systems. Therefore, in the following discussion, we shall assume that the state space X is an infinite-dimensional Banach space with norm \(\Vert \cdot \Vert\). The aim now is to express evolution equations in operator form as ordinary differential equations which are posed in X. We shall consider only problems of the type

$$\displaystyle{ u^{{\prime}}(t) = L\,(u(t)) + N\,(u(t)),\ t > 0,\quad u(0) =\,\, \stackrel{\circ }{u}, }$$
(23)

where \(L: X \supseteq D(L) \rightarrow X\) and N: X → X are, respectively, linear and nonlinear operators, with D(L) a linear subspace of X. In (23), the derivative is interpreted as a strong derivative, defined via (6) and (7), and a solution u: [0, ∞) → X is sought. The operator L + N that appears in (23) governs the time-evolution of the infinite-dimensional state vector u(⋅ ), and the initial-value problem (23) is usually called a (semi-linear) abstract Cauchy problem (ACP).

To provide some motivation for looking at infinite-dimensional dynamical systems, we shall investigate a particular mathematical model of a system of particles that can coagulate to form larger particles, or fragment into smaller particles. Coagulation and fragmentation (C–F) processes of this type can be found in many important areas of science and engineering. Examples range from astrophysics, blood clotting, colloidal chemistry and polymer science to molecular beam epitaxy and mathematical ecology. An efficient way of modelling the dynamical behaviour of these processes is to use a rate equation which describes the evolution of the distribution of the interacting particles with respect to their size or mass; see [10, 23] and also Section 1 of the contribution to this volume by Philippe Laurençot [18].

Suppose that we regard the system under consideration as one consisting of a large number of clusters (often referred to as mers) that can coagulate to form larger clusters or fragment into a number of smaller clusters. Under the assumption that each cluster of size n (n-mer) is composed of n identical fundamental units (monomers), the mass of each cluster is simply an integer multiple of the mass of a monomer. By appropriate scaling, each monomer can be assumed to have unit mass. This leads to a so-called discrete model of coagulation–fragmentation, with discrete indicating that cluster mass is a discrete variable which, in view of the above, can be assumed to take positive integer values.

In many theoretical investigations into discrete coagulation–fragmentation models, both coagulation and fragmentation have been assumed to be binary processes. Thus a j-mer can bind with an n-mer to form a (j + n)-mer or can break up into only two mers of smaller sizes; see the review article [10] by Collet for further details. However, a model of multiple fragmentation processes in which the break-up of an n-mer can lead to more than two mers has also been developed by Ziff; for example, see [25]. Consequently, we shall consider the more general model of binary coagulation combined with multiple fragmentation in the work we present here. In this case, the kinetic equation describing the time-evolution of the clusters is given by

$$\displaystyle\begin{array}{rcl} u_{n}^{{\prime}}(t)& =& -a_{ n}u_{n}(t) +\sum _{ j=n+1}^{\infty }a_{ j}b_{n,j}u_{j}(t) \\ & & +\frac{1} {2}\sum _{j=1}^{n-1}k_{ n-j,j}u_{n-j}(t)u_{j}(t) -\sum _{j=1}^{\infty }k_{ n,j}u_{n}(t)u_{j}(t)\,,{}\end{array}$$
(24)
$$\displaystyle\begin{array}{rcl} u_{n}(0)& =& \,\stackrel{\circ }{u}_{n}\,,\qquad n = 1,2,3,\ldots \,,{}\end{array}$$
(25)

where u n (t) is the concentration of n-mers at time t (where t is assumed to be a continuous variable), a n is the net rate of break-up of an n-mer, b n, j gives the average number of n-mers produced upon the break-up of a j-mer, and \(k_{n,j} = k_{j,n}\) represents the coagulation rate of an n-mer with a j-mer. Note that the total mass in the system at time t is given by

$$\displaystyle{M(t) =\sum _{ n=1}^{\infty }\mathit{nu}_{ n}(t)\,,}$$

and for mass to be conserved we require

$$\displaystyle{ \sum _{n=1}^{j-1}\mathit{nb}_{ n,j} = j\,,\quad j = 2,3,\ldots. }$$
(26)

On using this condition together with (24), a formal calculation establishes that M′(t) = 0.
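For completeness, this formal calculation can be sketched as follows (interchanging the order of summation freely and setting i = n − j in the coagulation gain term, without attending to questions of convergence):

$$\displaystyle\begin{array}{rcl} M^{{\prime}}(t)& =& \sum _{n=1}^{\infty }nu_{n}^{{\prime}}(t) = -\sum _{n=1}^{\infty }na_{n}u_{n}(t) +\sum _{j=2}^{\infty }\left (\sum _{n=1}^{j-1}nb_{n,j}\right )a_{j}u_{j}(t) {}\\ & & +\frac{1} {2}\sum _{i=1}^{\infty }\sum _{j=1}^{\infty }(i + j)k_{i,j}u_{i}(t)u_{j}(t) -\sum _{n=1}^{\infty }\sum _{j=1}^{\infty }nk_{n,j}u_{n}(t)u_{j}(t). {}\\ \end{array}$$

By (26), the second term equals \(\sum _{j=1}^{\infty }ja_{j}u_{j}(t)\) (recall that a 1 = 0, so monomers do not fragment) and therefore cancels the first, while the symmetry \(k_{i,j} = k_{j,i}\) shows that the two double sums also cancel, giving M′(t) = 0.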

When the fragmentation process is binary, the C-F equation is usually expressed in the form

$$\displaystyle\begin{array}{rcl} u_{n}^{{\prime}}(t)& =& -\frac{1} {2}u_{n}(t)\sum _{j=1}^{n-1}F_{ j,n-j} +\sum _{ j=n+1}^{\infty }F_{ n,j-n}u_{j}(t) \\ & & +\frac{1} {2}\sum _{j=1}^{n-1}k_{ n-j,j}u_{n-j}(t)u_{j}(t) -\sum _{j=1}^{\infty }k_{ n,j}u_{n}(t)u_{j}(t)\,,{}\end{array}$$
(27)

where \(F_{n,j} = F_{j,n}\) represents the rate at which an (n + j)-mer breaks up into an n-mer and a j-mer. In this case,

$$\displaystyle{2a_{n} =\sum _{ j=1}^{n-1}F_{ j,n-j},\quad b_{n,j}\,a_{j} = F_{n,j-n}}$$

and so

$$\displaystyle{b_{n,j} = \frac{F_{n,j-n}} {a_{j}} = \frac{2F_{n,j-n}} {\sum _{r=1}^{j-1}F_{r,j-r}} \Rightarrow \sum _{n=1}^{j-1}b_{ n,j} = 2;}$$

i.e. the number of clusters produced in any fragmentation event is always two.

Equation (27) is the binary model that has been studied in [1] and [11], where existence and uniqueness results are presented for various rate coefficients. The underlying strategy common to each of these is to consider finite-dimensional truncations of (27). Standard methods from the theory of ordinary differential equations then lead to the existence of a sequence of solutions to these truncated equations. It is then shown, via Helly’s theorem, that a subsequence exists that converges to a function u that satisfies an integral version of (27). A solution obtained in this way is called an admissible solution. A similar approach has been used by Laurençot in [17] to prove the existence of appropriately defined global mass-conserving solutions of the more general Eq. (24), and also, in [18], of the continuous-size coagulation equation, which takes the form of an integro-differential equation.

In contrast to the truncation approach used in the aforementioned papers, here we shall show how results from the theory of semigroups of operators can be used to establish the existence and uniqueness of solutions to (24). For simplicity, we shall assume that k n, j  = k for all n, j where k is a non-negative constant. Note, however, that a semigroup approach can also deal with more general coagulation kernels. In particular, results related to the concept of an analytic semigroup play an important role. We shall not discuss analytic semigroups in these notes, but the interested reader should consult the contribution to this volume by Banasiak [3] where the continuous size C-F equation is investigated via analytic semigroups.

To see how an IVP for the discrete C–F equation can be expressed as an ACP, we define u(t) to be the sequence \((u_{1}(t),u_{2}(t),\ldots,u_{j}(t),\ldots )\). Then u is a sequence-valued function of t, and it therefore makes sense to seek a function u, defined on [0, ∞), that takes values in an infinite-dimensional state space consisting of sequences. The state space that is most often used, due to its physical relevance, is the Banach space \(\ell_{1}^{1}\) discussed in Example 4. The \(\ell_{1}^{1}\)-norm of a non-negative element \(f \in \ell_{1}^{1}\) (i.e. \(f = (f_{1},f_{2},\ldots )\) with f j  ≥ 0 for all j ), given by \(\sum _{j=1}^{\infty }jf_{j}\), represents the total mass of the system. Similarly, the \(\ell_{0}^{1}\)-norm of such an f gives the total number of particles in the system. Note that \(\ell_{1}^{1}\) is continuously imbedded in \(\ell_{0}^{1}\) since

$$\displaystyle{\Vert f\Vert _{0,1} \leq \Vert f\Vert _{1,1}\quad \forall \,f \in \ell_{1}^{1}.}$$

The function u will be required to satisfy an ACP of the form

$$\displaystyle{u^{{\prime}}(t) = L\,(u(t)) + N\,(u(t)),\ u(0) =\,\, \stackrel{\circ }{u},}$$

where L and N are appropriately defined operator versions of the respective mappings

$$\displaystyle\begin{array}{rcl} & & f_{n} \rightarrow -a_{n}f_{n} +\sum _{ j=n+1}^{\infty }a_{ j}\,b_{n,j}f_{j}\mbox{ and } {}\\ & & f_{n} \rightarrow \frac{k} {2}\sum _{j=1}^{n-1}f_{ n-j}f_{j} - k\sum _{j=1}^{\infty }f_{ n}f_{j},\quad (n = 1,2,3,\ldots ). {}\\ \end{array}$$

We begin our investigation into (23) by considering the case when only the linear operator L appears on the right-hand side of the equation; i.e. (23) takes the form

$$\displaystyle{ u^{{\prime}}(t) = L\,(u(t)),\ t > 0,\quad u(0) =\,\, \stackrel{\circ }{u}. }$$
(28)

In the context of our C–F model, this will represent a situation when no coagulation is occurring; i.e. the coagulation rate constant k is zero.

A function u: [0, ∞) → X is said to be a strong solution to (28) if

  1. (i)

    u is strongly continuous on [0, ∞);

  2. (ii)

    the strong derivative u′ exists and is strongly continuous on (0, ∞);

  3. (iii)

    u(t) ∈ D(L) for each t > 0;

  4. (iv)

    the equations in (28) are satisfied.

3.1 Linear Infinite-Dimensional Evolution Equations

3.1.1 Bounded Infinitesimal Generators

Although an infinite-dimensional setting may seem a bit daunting, it turns out that, for a bounded linear operator L, the methods discussed earlier in finite dimensions continue to work. Indeed, when L is bounded and linear on X, then the unique strong solution of the linear infinite-dimensional ACP (28) is given by

$$\displaystyle{ u(t) = e^{\,\mathit{tL}}\!\stackrel{\circ }{u}, }$$
(29)

where the operator exponential is defined by

$$\displaystyle{ e^{\,\mathit{tL}} = I + \mathit{tL} + \frac{t^{2}L^{2}} {2!} + \frac{t^{3}L^{3}} {3!} + \cdots \,, }$$
(30)

with I denoting the identity operator on X. This infinite series of bounded, linear operators on X always converges in B(X) to a bounded, linear operator on X. Moreover,

$$\displaystyle{ e^{\,0L} = I;\ e^{\,sL}e^{\,\mathit{tL}} = e^{(s+t)L}\ \mbox{ for all }s,t \in \mathbb{R};\ e^{\,\mathit{tL}}\!\stackrel{\circ }{u} \rightarrow \stackrel{\circ }{u}\mbox{ in }X\mbox{ as }t \rightarrow 0; }$$
(31)

see [19, Theorem 2.10]. It can easily be verified that the function \(\phi (t,\stackrel{\circ }{u}) = e^{\,\mathit{tL}}\!\!\stackrel{\circ }{u}\) defines a continuous, infinite-dimensional dynamical system on X.

The first person to generalise the use of matrix exponentials for finite-dimensional systems of ODEs to operator exponentials in infinite-dimensional spaces appears to have been Maria Gramegna, a student of Peano, in 1910; see [13]. Peano had considered some special types of infinite systems of ODEs in 1894, but it was Gramegna who demonstrated that operator exponentials could be applied more generally, not only to infinite systems of ODEs, but also to integro-differential equations.

Example 8

We examine the simple case of an IVP for a fragmentation equation in which a j  = a for all j ≥ 2, where a is a positive constant. We shall show that the corresponding linear fragmentation operator L is bounded on \(\ell_{1}^{1}\). If we recall that a 1 = 0, and also that the mass-conservation condition (26) holds, then we obtain, for each \(f \in \ell_{1}^{1}\),

$$\displaystyle\begin{array}{rcl} \Vert Lf\Vert _{1,1}& =& \sum _{n=1}^{\infty }n\left \vert -a_{ n}f_{n} +\sum _{ j=n+1}^{\infty }a_{ j}\,b_{n,j}f_{j}\,\right \vert {}\\ &\leq & \sum _{n=1}^{\infty }\mathit{na}_{ n}\vert f_{n}\vert +\sum _{ n=1}^{\infty }\sum _{ j=n+1}^{\infty }na_{ j}\,b_{n,j}\,\vert f_{j}\vert {}\\ & =& \sum _{n=1}^{\infty }\mathit{na}_{ n}\,\vert f_{n}\vert +\sum _{ j=2}^{\infty }\left (\sum _{ n=1}^{j-1}\mathit{nb}_{ n,j}\right )a_{j}\,\vert f_{j}\vert {}\\ & =& \sum _{n=1}^{\infty }\mathit{na}_{ n}\,\vert f_{n}\vert +\sum _{ j=1}^{\infty }ja_{ j}\,\vert f_{j}\vert {}\\ & \leq & 2a\sum _{n=1}^{\infty }n\vert f_{ n}\vert = 2a\Vert f\Vert _{1,1}. {}\\ \end{array}$$

It follows that \(L \in B(\ell_{1}^{1})\) and so the ACP

$$\displaystyle{u^{{\prime}}(t) = L\,(u(t)),\ u(0) =\,\, \stackrel{\circ }{u},}$$

has a strong, globally-defined, solution given by

$$\displaystyle{u(t) = e^{\,\mathit{tL}}\!\stackrel{\circ }{u}.}$$

As we shall demonstrate later when we consider the fragmentation equation with less restrictive conditions imposed on the rate coefficients a n , this strong solution is non-negative whenever \(\stackrel{\circ }{u}\) is non-negative, and \(\Vert u(t)\Vert _{1,1} =\Vert \stackrel{\circ }{u}\Vert _{1,1}\) for all t > 0, showing that mass is conserved.
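A finite-dimensional sketch of this example is given below (purely for illustration; the daughter distribution b n, j  = 2∕(j − 1), which satisfies (26), and the parameter values are assumptions made here). If the initial data involve no clusters larger than some n max, then truncating at n max is exact, because fragmentation never produces larger clusters, and the solution (29) can be computed with a matrix exponential; the output shows that the total mass is conserved and that all components remain non-negative.

```python
import numpy as np
from scipy.linalg import expm

n_max, a = 50, 1.0                      # truncation level and the constant rate a_j = a (j >= 2)
a_n = np.full(n_max, a)
a_n[0] = 0.0                            # a_1 = 0: monomers do not fragment

# Matrix version of L with daughter distribution b_{n,j} = 2/(j-1) (satisfies (26))
L = np.diag(-a_n)
for j in range(2, n_max + 1):           # parent size j
    for n in range(1, j):               # daughter size n < j
        L[n - 1, j - 1] += a_n[j - 1] * 2.0 / (j - 1)

u0 = np.zeros(n_max)
u0[n_max - 1] = 1.0                     # initial data: n_max-mers only
sizes = np.arange(1, n_max + 1)

for t in (0.0, 1.0, 5.0, 20.0):
    u = expm(t * L) @ u0                # u(t) = e^{tL} u0, cf. (29)-(30)
    print(f"t = {t:5.1f}   mass = {sizes @ u:.6f}   min component = {u.min():.2e}")
# The mass stays equal to n_max and every component remains non-negative.
```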

3.1.2 Unbounded Infinitesimal Generators: The Hille–Yosida Theorem

In many applications that involve the analysis of a linear evolution equation posed in an infinite-dimensional setting, the restriction that L is bounded and defined on all of the state space X is too severe for the approach via operator exponentials described above to be used directly. In most cases, L is unlikely to be bounded and is usually only defined on elements in X which have specific properties. Is it possible that a family of exponential operators \(\{e^{\mathit{tL}}\}_{t\geq 0}\) can be generated from an unbounded linear operator L and yield a unique solution to the IVP (28) via (29)? The answer to this is yes. In 1948, Einar Hille and Kôsaku Yosida, simultaneously and independently, proved a theorem (the Hille–Yosida theorem) that forms the cornerstone of the Theory of Strongly Continuous Semigroups of Operators. Since then, there has been a great deal of research activity in the theory and application of semigroups of operators. Amongst many other important developments, the Hille–Yosida theorem was extended in 1952 to a result that completely characterises the operators L that generate strongly continuous semigroups on a Banach space X. What this means is that, when a natural interpretation of “solution” is adopted, a unique solution to (28) exists if and only if the operator satisfies the conditions of this more general version of the Hille–Yosida theorem. Moreover, the solution is still given by (29), although, for unbounded linear operators L, a different exponential formula has to be used to define e tL. One such formula is

$$\displaystyle{ e^{\,\mathit{tL}}f:=\lim _{ n\rightarrow \infty }\left [\frac{n} {t} R(n/t,L)\right ]^{n}f =\lim _{ n\rightarrow \infty }\left (I - \frac{t} {n}L\right )^{-n}f, }$$
(32)

where R(λ, L) denotes the inverse of λ IL. Compare this with the scalar sequential formula for e tl,

$$\displaystyle{e^{\,tl} =\lim _{ n\rightarrow \infty }(1 + tl/n)^{n}.}$$
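A small numerical sketch (illustrative values only) compares these limits: the scalar sequential formula, its resolvent-type analogue \((1 - tl/n)^{-n}\) corresponding to (32), and, for a bounded (matrix) generator, the operator version of (32) against the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

t, l = 1.5, -0.7                        # arbitrary scalar example
for n in (10, 100, 1000):
    seq = (1 + t * l / n) ** n          # classical sequential formula
    res = (1 - t * l / n) ** (-n)       # scalar analogue of (32)
    print(n, round(seq, 6), round(res, 6), "  target e^{tl} =", round(np.exp(t * l), 6))

# The resolvent-type formula (32) for a bounded (matrix) generator
A = np.array([[-1.0, 2.0], [0.0, -3.0]])
n = 2000
approx = np.linalg.matrix_power(np.linalg.inv(np.eye(2) - (t / n) * A), n)
print("matrix error:", np.linalg.norm(approx - expm(t * A)))
```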

There are many excellent books devoted to the theory of strongly continuous semigroups; for example [6, 19, 21] and [13]. Important details can also be found in the lecture notes by Banasiak [3, Section 2.5] and a nice gentle introduction to the theory is given by Bobrowski [8, Section 1]. As in [8], the account of semigroups that is presented here is not intended to be comprehensive; instead we merely summarise several key results from this very elegant, and applicable, theory. We begin with the following fundamental definition.

Definition 2

Let {S(t)} t ≥ 0 be a family of bounded linear operators on a complex Banach space X. Then {S(t)} t ≥ 0 is said to be a strongly continuous semigroup (or C 0- semigroup) in B(X) if the following conditions are satisfied.

  1. S1.

    S(0) = I, where I is the identity operator on X.

  2. S2.

    \(S(t)S(s) = S(t + s)\) for all t, s ≥ 0.

  3. S3.

    S(t)f → f in X as t → 0+ for all f ∈ X.

Associated with each strongly continuous semigroup {S(t)} t ≥ 0 is a unique linear operator L defined by

$$\displaystyle{ Lf:=\lim _{h\rightarrow 0^{+}} \frac{S(h)f - f} {h},\quad D(L):= \left \{f \in X:\lim _{h\rightarrow 0^{+}} \frac{S(h)f - f} {h} \mbox{ exists in }X\right \}. }$$
(33)

The operator L is called the infinitesimal generator of the semigroup {S(t)} t ≥ 0. For example, the infinitesimal generator of the semigroup given by S(t) = e tL, where L ∈ B(X), is the operator L.
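As a finite-dimensional illustration of (33) (with an arbitrarily chosen 2 × 2 matrix standing in for a bounded generator), the difference quotient \((S(h)f - f)/h\) for S(t) = e tA approaches Af as h → 0 + :

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -1.0]])     # arbitrary matrix playing the role of L
f = np.array([1.0, 0.5])

for h in (1e-1, 1e-3, 1e-5):
    quotient = (expm(h * A) @ f - f) / h     # (S(h)f - f)/h with S(h) = e^{hA}, cf. (33)
    print(f"h = {h:.0e}   quotient = {np.round(quotient, 6)}   "
          f"error = {np.linalg.norm(quotient - A @ f):.2e}")
```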

Before stating some important properties of strongly continuous semigroups and their generators, we require some terminology.

Definition 3

Let L: X ⊇ D(L) → X be a linear operator.

  1. (i)

    The resolvent set, ρ(L), of L is the set of complex numbers

    $$\displaystyle{\rho (L):=\{\lambda \in \mathbb{C}: R(\lambda,L):= (\lambda I - L)^{-1} \in B(X)\};}$$

    R(λ, L) is called the resolvent operator of L (at λ).

  2. (ii)

    L is a closed operator (or L is closed) if whenever \((f_{n})_{n=1}^{\infty }\subset D(L)\) is such that f n  → f and Lf n  → g in X as n → ∞, then f ∈ D(L) and Lf = g.

  3. (iii)

    An operator L 1: X ⊃ D(L 1) → X is an extension of L, written L ⊂ L 1, if D(L) ⊂ D(L 1) and Lf = L 1 f for all f ∈ D(L). The operator L is closable if it has a closed extension, in which case the closure \(\overline{L}\) of L is defined to be the smallest closed extension of L.

  4. (iv)

    L is said to be densely defined if \(\overline{D(L)} = X\), i.e. if the closure of the set D(L) (with respect to the norm in X) is X. This means that, for each f ∈ X, there exists a sequence \((f_{n})_{n=1}^{\infty }\subset D(L)\) such that \(\Vert f - f_{n}\Vert \rightarrow 0\) as n → ∞.

Theorem 3 (Some Semigroup Results)

Let \(\{S(t)\}_{t\geq 0} \subset B(X)\) be a strongly continuous semigroup with infinitesimal generator L. Then

  1. (i)

    S(t)f → S(t 0 )f in X as t → t 0 for any t 0 > 0 and f ∈ X;

  2. (ii)

    S(t)f → f in X as t → 0 + ;

  3. (iii)

    there are real constants M ≥ 1 and ω such that

    $$\displaystyle{ \Vert S(t)\Vert \leq Me^{\omega t}\ \mbox{ for all }t \geq 0; }$$
    (34)
  4. (iv)

    \(f \in D(L) \Rightarrow S(t)f \in D(L)\) for all t > 0 and

    $$\displaystyle{ \frac{d} {\mathit{dt}}S(t)f = LS(t)f = S(t)Lf\ \ \mbox{ for all }t > 0\mbox{ and }f \in D(L); }$$
    (35)
  5. (v)

    the infinitesimal generator L is closed and densely defined.

We shall write \(L \in \mathcal{G}(M,\omega;X)\) when L is the infinitesimal generator of a strongly continuous semigroup of operators satisfying (34) on a Banach space X. When the operator \(L \in \mathcal{G}(1,0;X)\), L is said to generate a strongly continuous semigroup of contractions on X.

Theorem 4

(Hille–Yosida) :

The operator L is the infinitesimal generator of a strongly continuous semigroup of contractions on X if and only if

  1. (i)

    L is a closed, linear and densely-defined operator in X;

  2. (ii)

    λ ∈ρ(L) for all λ > 0;

  3. (iii)

    \(\Vert R(\lambda,L)\Vert \leq 1/\lambda\) for all λ > 0.

(Hille–Yosida–Phillips–Miyadera–Feller) :

\(L \in \mathcal{G}(M,\omega;X)\) if and only if

  1. (i)

    L is a closed, linear and densely-defined operator in X;

  2. (ii)

    λ ∈ρ(L) for all λ > ω;

  3. (iii)

    \(\Vert (R(\lambda,L))^{n}\Vert \leq M/(\lambda -\omega )^{n}\) for all \(\lambda >\omega,\ n = 1,2,\ldots\) .

Proofs of these extremely important results can be found in [19, Chapter 3].

We can now state the following existence/uniqueness theorem for the linear ACP

$$\displaystyle{ u^{\,{\prime}}(t) = L\,(u(t)),\ t > 0;\ \ u(0) =\,\, \stackrel{\circ }{u}\, \in D(L). }$$
(36)

Theorem 5

Let L be the infinitesimal generator of a strongly continuous semigroup \(\{S(t)\}_{t\geq 0} \subset B(X)\) . Then  (36) has one and only one strong solution u: [0,∞) → X and this is given by \(u(t) = S(t)\!\stackrel{\circ }{u}.\)

The operator S(t) can be interpreted as the exponential e tL if we define the latter by (32); see [19, Chapter 6] for a proof.

3.1.3 The Kato–Voigt Perturbation Theorem

Although the Hille–Yosida theorem and the generalisation due to Phillips et al. are extremely elegant results, in practice it is often difficult to check that the resolvent conditions are satisfied for a given linear operator L. One way to get round this is to make use of perturbation theorems for infinitesimal generators; see the book by Banasiak and Arlotti [4]. The basic idea is to treat, if possible, the linear operator governing the dynamics of the system as the sum of two linear operators, say A + B, where A is an operator which can easily be shown to generate a strongly continuous semigroup {S A (t)} t ≥ 0 on a Banach space X, and B is regarded as a perturbation of A. The question then is to identify sufficient conditions on B which will guarantee that A + B (or some extension of A + B) also generates a strongly continuous semigroup on X. A number of perturbation results of this type have been established. We shall focus on just one of these, namely the Kato–Voigt Perturbation theorem, but only for the specific case when the state space is the Banach space \(\ell_{\mu }^{1}\) of Example 4. An account of the general version of this important perturbation result is given in [3, Section 2.6].

As mentioned earlier, non-negative elements in \(\ell_{\mu }^{1}\) are taken to be sequences \(f = (f_{1},f_{2},\ldots )\) with f j  ≥ 0 for all j, in which case we write f ≥ 0. An operator \(T:\ell_{ \mu }^{1} \supseteq D(T) \rightarrow \ell_{\mu }^{1}\) is said to be non-negative if Tf ≥ 0 for all non-negative f ∈ D(T).

Theorem 6 (See [2, Theorem 2.1] and [4, Corollary 5.17])

Let the operators \(A:\ell_{ \mu }^{1} \supseteq D(A) \rightarrow \ell_{\mu }^{1}\) and \(B:\ell_{ \mu }^{1} \supseteq D(B) \rightarrow \ell_{\mu }^{1}\) have the following properties.

  1. (i)

    A is the infinitesimal generator of a semigroup of contractions \(\{S_{A}(t)\}_{t\geq 0}\) on \(\ell_{\mu }^{1}\) , with S A (t) ≥ 0 for all t ≥ 0.

  2. (ii)

    B is non-negative and \(D(B) \supseteq D(A)\) .

  3. (iii)

    For each non-negative f in D(A),

    $$\displaystyle{\sum _{j=1}^{\infty }j^{\,\mu }(Af + Bf)_{ j} \leq 0.}$$

Then there exists a strongly continuous semigroup of contractions, {S(t)} t≥0 , on ℓ μ 1 satisfying the Duhamel equation

$$\displaystyle{S(t)f = S_{A}(t)f +\int _{ 0}^{t}S(t - s)\mathit{BS}_{ A}(s)f\,\mathit{ds},\ f \in D(A).}$$

Each S(t) is non-negative and the infinitesimal generator of the semigroup is an extension L of A + B.

Example 9

We now show that a straightforward application of this perturbation theorem establishes the existence and uniqueness of solutions to the fragmentation equation for a wide class of fragmentation rate coefficients. Once again we work in the state space \(\ell_{1}^{1}\), and we take A and B to be the operators

$$\displaystyle\begin{array}{rcl} & & (Af)_{n}:= -a_{n}f_{n},\ n \in \mathbb{N},\quad D(A) =\{ f \in \ell_{1}^{1}: Af \in \ell_{ 1}^{1}\}, {}\\ & & (Bf)_{n}:=\sum _{ j=n+1}^{\infty }b_{ n,j}\,a_{j}f_{j},\ n \in \mathbb{N},\quad D(B) = D(A). {}\\ \end{array}$$

Then

  1. 1.

    By arguing as in [8, Example 6], it is not difficult to prove that the operator A is the infinitesimal generator of a strongly continuous semigroup of contractions \(\{S_{A}(t)\}_{t\geq 0}\) on \(\ell_{1}^{1}\) given by

    $$\displaystyle{(S_{A}(t)f)_{n}:= e^{-a_{n}t}f_{ n},\ n \in \mathbb{N}.}$$

    It is clear that S A (t) ≥ 0 for each t.

  2. 2.

    The calculations used in Example 8 can be repeated to show that

    $$\displaystyle\begin{array}{rcl} \Vert Bf\Vert & & \leq \sum _{n=1}^{\infty }\sum _{ j=n+1}^{\infty }\mathit{na}_{ j}\,b_{n,j}\vert f_{j}\vert {}\\ && =\sum _{ j=2}^{\infty }\left (\sum _{ n=1}^{j-1}\mathit{nb}_{ n,j}\right )a_{j}\vert f_{j}\vert {}\\ && =\sum _{ j=2}^{\infty }j\,a_{ j}\vert f_{j}\vert =\Vert \mathit{Af }\Vert,\ \forall \,f \in D(A). {}\\ \end{array}$$

    Consequently, B is well defined on D(A) and Bf  ≥ 0 for all f ∈ D(A) with f ≥ 0.

  3. 3.

    A similar argument shows that

    $$\displaystyle{\sum _{n=1}^{\infty }n(\mathit{Af } + \mathit{Bf })_{ n} = 0\ \forall \,f \in D(A)\mbox{ with }f \geq 0.}$$

Consequently, by the Kato–Voigt Perturbation Theorem, there exists a strongly continuous semigroup of contractions {S(t)} t ≥ 0 generated by an extension L of the operator (A + B, D(A)), with S(t)f ≥ 0 for all non-negative \(f \in \ell_{1}^{1}\).

In this example it is possible to show that L is the closure of (A + B, D(A)) and also that

$$\displaystyle{\sum _{n=1}^{\infty }n(Lf)_{ n} = 0\ \forall \,f \in D(L)\mbox{ with }f \geq 0;}$$

see [20]. Consequently, the ACP

$$\displaystyle{u^{{\prime}}(t) = L(u(t)),\ u(0) =\,\, \stackrel{\circ }{u}\, \in D(L),\ \stackrel{\circ }{u}\, \geq 0,}$$

with \(L = \overline{A + B}\), has a unique strongly differentiable solution u: [0, ∞) → D(L) given by \(u(t) = S(t)\!\stackrel{\circ }{u}\).

Other results that can be established for this discrete-size fragmentation equation are:

  • If the sequence (a n ) is monotonic increasing, then S(t): D(A) → D(A) for all t ≥ 0 and therefore \(u(t) = S(t)\!\stackrel{\circ }{u}\) is the unique strong solution of the ACP

    $$\displaystyle{u^{{\prime}}(t) = A\,(u(t)) + B\,(u(t)),\ u(0) =\,\, \stackrel{\circ }{u}\, \in D(A),\ \stackrel{\circ }{u} \geq 0.}$$
  • Suppose that a n  > 0 for all n ≥ 2. Then

    $$\displaystyle{(A + B)\overline{u} = (0,0,\ldots )\mbox{ in }\ell_{1}^{1} \Leftrightarrow \overline{u} = ce_{ 1},}$$

    where c is a constant and e 1 = (1, 0, 0, …). Moreover, it can be shown that \(S(t)\!\stackrel{\circ }{u} \rightarrow M(\stackrel{\circ }{u})e_{1}\) in \(\ell_{1}^{1}\) as t → ∞, where \(M(\stackrel{\circ }{u}) =\sum _{ n=1}^{\infty }n\!\stackrel{\circ }{u}_{n}\). This situation is similar to that observed with the SIR model in that we have infinitely many equilibria, and the equilibrium that any given solution converges to is uniquely determined by the initial data.

See [5] and [20] for further details.
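The long-term behaviour described in the second bullet point can be observed numerically on a truncated system; the sketch below uses the illustrative choices \(a_{n} = n - 1\) and b n, j  = 2∕(j − 1) (assumptions made only for this example, with (26) satisfied) and shows \(S(t)\!\stackrel{\circ }{u}\) approaching \(M(\stackrel{\circ }{u})e_{1}\) while the total mass remains constant.

```python
import numpy as np
from scipy.linalg import expm

n_max = 40
a_n = np.arange(n_max, dtype=float)          # a_n = n - 1: a_1 = 0 and a_n > 0 for n >= 2

L = np.diag(-a_n)
for j in range(2, n_max + 1):
    for n in range(1, j):
        L[n - 1, j - 1] += a_n[j - 1] * 2.0 / (j - 1)    # b_{n,j} = 2/(j-1) again

u0 = np.zeros(n_max)
u0[9], u0[29] = 1.0, 0.5                     # some 10-mers and 30-mers initially
sizes = np.arange(1, n_max + 1)
print("M(u0) =", sizes @ u0)                 # total initial mass (here 10*1 + 30*0.5 = 25)

for t in (0.0, 1.0, 5.0, 50.0):
    u = expm(t * L) @ u0
    print(f"t = {t:5.1f}   u_1(t) = {u[0]:.4f}   mass = {sizes @ u:.4f}")
# As t increases, u approaches M(u0) e_1: all of the mass ends up in monomers.
```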

3.2 Semi-linear Infinite-Dimensional Evolution Equations

To conclude, we return to the semi-linear ACP (23). We shall assume that the linear operator L is the infinitesimal generator of a strongly continuous semigroup {S(t)} t ≥ 0 on X. A strong solution on [0, t 0) of this ACP is a function u: [0, t 0) → X such that

  1. (i)

    u is strongly continuous on [0, t 0);

  2. (ii)

    u has a continuous strong derivative on (0, t 0);

  3. (iii)

    u(t) ∈ D(L) for 0 ≤ t < t 0;

  4. (iv)

    u(t) satisfies (23) for 0 ≤ t < t 0.

Suppose that u is a strong solution. Then, under suitable assumptions on N, u will also satisfy the Duhamel equation

$$\displaystyle{ u(t) = S(t)\!\stackrel{\circ }{u} +\int _{ 0}^{t}S(t - s)N(u(s))\,\mathit{ds},\quad 0 \leq t < t_{ 0}. }$$
(37)

This leads to the following definition of a weaker type of solution to the ACP.

Definition 4

A mild solution on [0, t 0) of (23) is a function u: [0, t 0) → X such that

  1. (i)

    u is strongly continuous on [0, t 0);

  2. (ii)

    u satisfies (37) on [0, t 0).

The definitions given earlier for a function on the finite-dimensional space \(\mathbb{R}^{n}\) to be Fréchet differentiable, or to satisfy a local Lipschitz condition, extend to operators on infinite-dimensional spaces. In particular, the nonlinear operator N: X → X satisfies a local Lipschitz condition on X if, for each \(\stackrel{\circ }{u}\, \in X\), there exists a closed ball \(\overline{B}_{r}(\stackrel{\circ }{u}):=\{ f \in X:\Vert f -\stackrel{\circ }{u}\Vert \leq r\}\) such that

$$\displaystyle{\Vert N(f) - N(g)\Vert \leq k\Vert f - g\Vert,\quad \forall f,g \in \overline{B}_{r}(\stackrel{\circ }{u}).}$$

Also, N is Fréchet differentiable at f ∈ X if an operator N f  ∈ B(X) exists such that

$$\displaystyle{N(f + h) = N(f) + N_{f}(h) + E(f,h),\ h \in X,}$$

where

$$\displaystyle{\lim _{\Vert h\Vert \rightarrow 0}\frac{\Vert E(f,h)\Vert } {\Vert h\Vert } = 0.}$$

The operator N f is the Fréchet derivative of N at f.

Theorem 7

Let \(L \in \mathcal{G}(M,\omega;X)\) and let N satisfy a local Lipschitz condition on X. Then there exists a unique mild solution of the ACP on some interval [0,t max ). Moreover, if t max < ∞, then

$$\displaystyle{\Vert u(t)\Vert \rightarrow \infty \mbox{ as }t \rightarrow t_{\mathit{max}}^{-}.}$$

Theorem 8

Let \(L \in \mathcal{G}(M,\omega;X)\) and let N be continuously Fréchet differentiable on X. Then the mild solution of the semi-linear ACP, with \(\stackrel{\circ }{u}\, \in D(L)\) , is a strong solution.

For proofs of these results, see [21, Chapter 6] and [7, Chapter 3].

Example 10

We now describe how these results have been applied to the discrete C-F equation in [20, 22] and [5]. Having already established that \(L = \overline{A + B}\,\) is the infinitesimal generator of a strongly continuous positive semigroup of contractions on the space \(\ell_{1}^{1}\), we express the full C-F equation as the semi-linear ACP

$$\displaystyle{u^{\,{\prime}}(t) = L\,(u(t)) + N\,(u(t)),\ t > 0,\quad u(0) =\,\, \stackrel{\circ }{u}\, \in D(L),}$$

where

$$\displaystyle{(Nf)_{n}:= \frac{k} {2}\sum _{j=1}^{n-1}f_{ n-j}f_{j}\, -\, k\sum _{j=1}^{\infty }f_{ n}f_{j},\ f \in \ell_{1}^{1}.}$$

We shall show below that \(N(f) \in \ell_{1}^{1}\) for all \(f \in \ell_{1}^{1}\). For this it is convenient to introduce the following bilinear operator

$$\displaystyle{\tilde{N}(f,g):=\tilde{ N}_{1}(f,g) -\,\tilde{ N}_{2}(f,g),}$$

where, for \(f,g \in \ell_{1}^{1}\),

$$\displaystyle{[\tilde{N}_{1}(f,g)]_{n}:= \frac{k} {2}\sum _{j=1}^{n-1}f_{ n-j}g_{j},\quad [\tilde{N}_{2}(f,g)]_{n}:= k\sum _{j=1}^{\infty }f_{ n}g_{j},\ \ n \in \mathbb{N}.}$$

Note that \(N(f) =\tilde{ N}(f,f)\). Also, it is straightforward to verify that \(\tilde{N}(\cdot,\cdot )\) is linear in both left-hand and right-hand arguments. Consequently,

$$\displaystyle{ \tilde{N}(f + h,f + h) =\tilde{ N}(f,f) +\tilde{ N}(f,h) +\tilde{ N}(h,f) +\tilde{ N}(h,h). }$$
(38)

Now,

$$\displaystyle\begin{array}{rcl} \Vert \tilde{N}_{1}(f,g)\Vert & \leq & \frac{k} {2}\sum _{n=1}^{\infty }\sum _{ j=1}^{n-1}n\vert f_{ n-j}\vert \,\vert g_{j}\vert {}\\ & =& \frac{k} {2}\sum _{j=1}^{\infty }\,\sum _{ n=j+1}^{\infty }n\vert f_{ n-j}\vert \,\vert g_{j}\vert {}\\ & =& \frac{k} {2}\sum _{j=1}^{\infty }\,\sum _{ i=1}^{\infty }(i + j)\vert f_{ i}\vert \,\vert g_{j}\vert {}\\ &\leq & 2\,\frac{k} {2}\Vert f\Vert \,\Vert g\Vert = k\Vert f\Vert \,\Vert g\Vert. {}\\ \end{array}$$

Similarly,

$$\displaystyle{\Vert \tilde{N}_{2}(f,g)\Vert \leq k\Vert f\Vert \,\Vert g\Vert.}$$

Hence

$$\displaystyle{\Vert \tilde{N}(f,g)\Vert \leq 2k\Vert f\Vert \,\Vert g\Vert \ \mbox{ and }\Vert N(f)\Vert \leq 2k\Vert f\Vert ^{2}.}$$

In the case when f ≥ 0, we can also deduce that

$$\displaystyle\begin{array}{rcl} \sum _{n=1}^{\infty }n[N(f)]_{n}& =& \frac{k} {2}\left (\sum _{i=1}^{\infty }if_{ i}\right )\left (\sum _{j=1}^{\infty }f_{ j}\right ) + \frac{k} {2}\left (\sum _{i=1}^{\infty }f_{ i}\right )\left (\sum _{j=1}^{\infty }jf_{ j}\right ) {}\\ & & -k\left (\sum _{i=1}^{\infty }if_{ i}\right )\left (\sum _{j=1}^{\infty }f_{ j}\right ) = 0; {}\\ \end{array}$$

i.e. formally, the coagulation terms make no net contribution to the total mass.

The bilinearity of \(\tilde{N}(\cdot,\cdot )\) leads immediately to the Fréchet differentiability of N. From (38), we obtain

$$\displaystyle{N(f + h) = N(f) + N_{f}(h) + N(h),}$$

where

$$\displaystyle{ N_{f}(h):=\tilde{ N}(f,h) +\tilde{ N}(h,f). }$$
(39)

For fixed \(f \in \ell_{1}^{1}\), N f is a linear operator on \(\ell_{1}^{1}\) and also

$$\displaystyle{ \Vert N_{f}(h)\Vert \leq 4k\Vert f\Vert \,\Vert h\Vert \,,\ \forall h \in \ell_{1}^{1} }$$
(40)

showing that N f  ∈ B(X). Moreover,

$$\displaystyle{\frac{\Vert N(h)\Vert } {\Vert h\Vert } \leq 2k\Vert h\Vert \rightarrow 0\mbox{ as }\Vert h\Vert \rightarrow 0.}$$

Hence, N is Fréchet differentiable at each \(f \in \ell_{1}^{1}\), with Fréchet derivative given by (39). Moreover, inequality (40) can be used to establish that N f is continuous in f. (Note that this also means that N is locally Lipschitz continuous.) We can now apply Theorems 7 and 8 to conclude that the semi-linear ACP has a unique, locally defined (in time) strong solution.
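Although the theory above concerns the full infinite system, a truncated version of the C-F equation can be integrated numerically as an informal illustration (all rate values below, including the daughter distribution b n, j  = 2∕(j − 1), are assumptions made for this sketch; note that truncation at n max is no longer exact here, since coagulation can create clusters larger than n max, so the total mass is only approximately conserved):

```python
import numpy as np
from scipy.integrate import solve_ivp

n_max, k, a = 60, 1.0, 1.0                   # truncation level, coagulation and fragmentation rates
a_n = np.full(n_max, a)
a_n[0] = 0.0                                 # a_1 = 0

def rhs(t, u):
    du = -a_n * u                            # fragmentation loss
    for j in range(2, n_max + 1):            # fragmentation gain, b_{n,j} = 2/(j-1)
        du[:j - 1] += a_n[j - 1] * (2.0 / (j - 1)) * u[j - 1]
    conv = np.convolve(u, u)                 # conv[m] = sum of u_i u_j over i + j = m + 2
    du[1:] += 0.5 * k * conv[:n_max - 1]     # coagulation gain for sizes n >= 2
    du -= k * u * u.sum()                    # coagulation loss
    return du

u0 = np.zeros(n_max)
u0[0] = 1.0                                  # start from monomers only, total mass 1
sizes = np.arange(1, n_max + 1)

sol = solve_ivp(rhs, (0.0, 10.0), u0, rtol=1e-8, atol=1e-10)
u_end = sol.y[:, -1]
print("min component:", u_end.min(), "   mass:", sizes @ u_end)
# The components remain non-negative (up to solver tolerance); the mass stays close
# to 1, the small deficit being due to clusters pushed above the truncation level.
```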

To complete our analysis, we must show that the solution u(t) is non-negative for all t for which it is defined. We would also like to establish that the solution is defined for all t ≥ 0. It turns out that the latter can be deduced directly from the former since

$$\displaystyle{ \frac{d} {\mathit{dt}}\Vert u(t)\Vert = \frac{d} {\mathit{dt}}\sum _{n=1}^{\infty }nu_{n}(t) =\sum _{ n=1}^{\infty }n(L\,(u(t)))_{n} +\sum _{ n=1}^{\infty }n(N\,(u(t)))_{n} = 0,}$$

showing that \(\Vert u(t)\Vert\) cannot blow up in finite time. The proof that the solution remains non-negative is the most involved part of the argument and so only some outline details will be supplied here (see [20] for further information). In essence, we use the following trick. The ACP is rewritten as

$$\displaystyle{u^{\,{\prime}}(t) = \left (L\,(u(t)) -\alpha u(t)\right ) + \left (\alpha u(t) + N\,(u(t))\right ),}$$

where the constant α is chosen so that \((N +\alpha I)(u(t)):=\alpha u(t) + N(u(t)) \geq 0\) for all t in some interval [0, t 0]. The operator \(L -\alpha I\) is the infinitesimal generator of the positive semigroup \(\{e^{-\alpha t}S(t)\}_{t\geq 0}\) (where {S(t)} t ≥ 0 is the positive semigroup generated by L). The solution u of this modified equation satisfies the integral equation

$$\displaystyle{u(t) = e^{-\alpha t}S(t)\!\stackrel{\circ }{u} +\int _{ 0}^{t}e^{-\alpha (t-s)}S(t - s)(N+\alpha )u(s)\,\mathit{ds} =: T_{\alpha }(u(t)),\ t \in [0,t_{ 0}].}$$

The value t 0 is selected so that the operator T α on the right-hand side of the above equation is a contraction on a suitable Banach space of \(\ell_{1}^{1}\)-valued functions and so we can obtain the solution u (the fixed point of this contraction) by means of successive iterations of T α on the initial state \(\stackrel{\circ }{u} \geq 0\). Since T α is positivity preserving, it follows that u(t) ≥ 0 for all t ∈ [0, t 0]. We then repeat this argument, but now with u(t 0) as the initial state, and continue in this manner.