1 Introduction

Differential, difference and integral equations, often combined, have been used to describe real world phenomena, from the evolution of the Universe to the fundamental particles’ interactions, for well over two centuries. Since in most cases of interest the equations cannot be solved in explicit form, mathematicians have developed various theoretical tools for analyzing qualitative properties of solutions to validate approximate and computational methods used to provide numerical and graphical results, required by the end users in the applied sciences. In this paper we shall confine ourselves to time-dependent models and their functional analytic treatment; that is, to the description of the process by a family of operators parametrized by time. These operators represent subsequent states of the process. Such a family of operators is called a (semi) dynamical system or, in the linear setting, a semigroup of operators.

The problem with such an approach (and, for that matter, with any other) is that the process of making the object of interest manageable by particular mathematical techniques, called mathematical modelling, produces a model that is significantly different from the original object. It is often forgotten that the heat equation is not the same as heat transfer, or that the Navier–Stokes system is not the flow of water. Both are approximate descriptions of the respective processes and we cannot be a priori certain that a given solution of the equation has any physical realization. To address this problem, we extend Hadamard's classical definition of the well-posedness of a problem by adding one more requirement. Thus, let us ask ourselves,

What Do We Want from a Mathematical Model?

  1. The existence of solutions. This is the requirement that we haven't used any mutually exclusive postulates while building the model.

  2. The uniqueness of solutions. This reflects the requirement that we have full information about the process and there is causality in the process.

  3. Continuous dependence on the input data. Since our information is imperfect, we want small errors in our data to yield only small deviations of the output.

  4. Honesty. We want solutions of equations of the model to faithfully reproduce the principles used to build the model.

To better explain the last point, we observe that many processes involve only nonnegative quantities such as density, energy, absolute temperature, pressure. Thus the corresponding dynamical system should give only nonnegative solutions for physically correct inputs. Such dynamical systems will be called positive. Also, many equations express conservation laws such as the conservation of mass or energy. It is thus natural to expect that the solutions of such equations should satisfy the same laws. However, as we shall see, many mathematical models, even linear, fail to have this property (that is, they are dishonest). The main aim of this survey is to describe how the interplay of the positivity of models and classical functional analysis leads to a comprehensive theory of honesty for linear infinite dimensional dynamical systems (semigroups) in population dynamics. While the theory we present is fairly general, we shall illustrate it on examples from the class of birth-and-death problems described by infinite systems of ordinary differential equations. A detailed analysis of such problems in a more probabilistic context can be found in [12]. Similar problems in the theory of fragmentation–coagulation equations are discussed in [8].

2 Population Balance Equations

By a population we understand a collection of objects interacting with each other and with the environment, structured by a set of attributes that can change in the interactions. For instance, moving and colliding particles of a gas are characterized by their position and velocity that can change upon collisions, [13], reacting polymers are characterized by their length, [8], animals in a population can be characterized by their age and geographical location, [5, 27], cells can be characterized by their maturity and the number of copies of a particular gene, [24, 30].

Population balance equations characterize the population using only the number (density) of objects with given attributes and are mathematical expressions of conservation laws. In fact, in any field of science the modelled processes must obey laws of physics and, in particular, the conservation laws. More precisely, if Q is a quantity of interest (e.g., the number of animals, the total amount of a pollutant, the amount of heat energy, the number of infected individuals) with the attributes in a fixed domain Ω, then, over any fixed time interval, in Ω

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mbox{the change of }Q& =&\displaystyle \mbox{the inflow of }Q - \mbox{the outflow of }Q\\ & &\displaystyle +\;\mbox{the creation of }Q- \mbox{the destruction of }Q.{} \end{array} \end{aligned} $$
(1)

As mentioned before, we characterize populations using the density of the objects with respect to the attributes. The density, say u(x), is either the number (often normalized) of elements with an attribute x (if the number of possible attributes is finite or countable), or gives the number of elements with attributes in a set A according to the formula

$$\displaystyle \begin{aligned} \int\limits_{ A} u(x)\mathrm{d} \mu, {} \end{aligned} $$
(2)

if x is a continuous variable (and the space of all attributes Ω can be equipped with some measure structure). Balancing, for a given set of attributes A,

  • the loss of individuals from A due to the change of their attributes caused by internal or external interactions (that could include death),

  • the gain in A due to the changes of individuals’ attributes from outside A to the ones from A (that could include birth),

results in the so-called Master Equation

$$\displaystyle \begin{aligned} \partial_t u(x,t) = (\mathcal{K} u)(x,t):= (\mathcal{A}u)(x,t) + (\mathcal{B}u)(x,t), {} \end{aligned} $$
(3)

where \(\mathcal {A}\) is referred to as the loss operator and \(\mathcal {B}\) as the gain operator. The name comes from the theory of Markov processes, where it describes the evolution of the probability of the system being in a particular state, and the conservation law refers to the fact that the total probability, that is, the probability that the system is in one of the possible states, must be 1 at any given time.

The density u, by its physical meaning, should be non-negative. There are, however, population models whose solutions eventually become negative. This is then interpreted as the crash of the population, see e.g. [3].

3 Finite-Dimensional Population Equation

The simplest equations of this type occur when, at any given time, a system is in one of finitely many states and the switching between the states is determined by a matrix of migration rates. In the context of population dynamics, we consider a population divided into N classes, described by a vector u(t) = (u 1(t), u 2(t), …, u N(t)), where u i is the number of individuals with attribute i, i = 1, 2, …, N, at time t. The attribute may refer to a geographical location, financial status, number of genes of a particular type, etc. Over a short period of time Δt, an individual can move from subpopulation i to subpopulation j with (approximate) probability p ji Δt but cannot die, emerge or leave the system, hence the total population is constant.

Thus, (3) for the subpopulation with the attribute i takes the form

$$\displaystyle \begin{aligned} u^{\prime}_i(t) = - u_i(t)\sum_{{j=1},{j\neq i}}^Np_{ji}+ \sum_{{j=1},{j\neq i}}^Np_{ij} u_j(t),\;\; 1\leq i \leq N. {} \end{aligned} $$
(4)

The equation expresses the principle of conservation of the total population. The left hand side gives the rate of change of the number of individuals with attribute i and the right hand side gives the explanation of this change: the positive terms give the total rate of the immigration to i from all other classes and the negative terms give the total emigration rate from i to all other classes. Thus, denoting \(\mathcal {K}= -\mathrm{diag}\left (\sum \limits _{{j=1},{j\neq i}}^Np_{ji}\right )_{1\leq i\leq N}+ (p_{ij})_{1\leq i,j\leq N, i\neq j} =: \mathcal {A} + \mathcal {B},\) we write

$$\displaystyle \begin{aligned} {u}' = \mathcal{A} u +\mathcal{B} u = \mathcal{K}{u}. {} \end{aligned} $$
(5)

The unique solution, for an initial distribution u(0), is given by

$$\displaystyle \begin{aligned} u(t) = e^{t\mathcal{K}}u(0), \quad t\geq 0, \end{aligned}$$

so that the first three requirements of well-posedness are satisfied by the standard theory of systems of linear ordinary differential equations. As far as honesty is concerned, first we recall that the model is to describe a population, so we should have u i(t) ≥ 0 provided u i(0) ≥ 0 for all 1 ≤ i ≤ N, that is, u(t) ≥ 0 provided u(0) ≥ 0. Indeed, using e.g. [17, Proposition VI.1.2] or [7, Proposition 2.1.4], we see that this is true as \(\mathcal {K}\) is positive off-diagonal.

By the construction of the model, the total size of the population at time t, given by

$$\displaystyle \begin{aligned} u(t)= u_1(t)+\ldots+u_N(t),{} \end{aligned} $$
(6)

should be constant in time. In fact, adding the equations in (4) we see that the system correctly reflects the conservation principle

$$\displaystyle \begin{aligned} u'(t) =0. \end{aligned}$$

If we solve (4) and evaluate (6), then we also obtain u(t) = u(0), t ≥ 0, confirming the above. Consider, for instance, the system

$$\displaystyle \begin{aligned} u^{\prime}_1(t) = -u_1(t)+u_2(t), \qquad u^{\prime}_2(t) = u_1(t)-u_2(t). \end{aligned}$$

Clearly,

$$\displaystyle \begin{aligned} u^{\prime}(t) &= u_1^{\prime}(t)+u_2^{\prime}(t) = (-u_1(t) +u_2(t))+ (u_1(t) -u_2(t))\\ & = (-u_1(t) +u_1(t))+ (u_2(t) -u_2(t)) = 0, {} \end{aligned} $$
(7)

hence the system is conservative. Also, the solution

$$\displaystyle \begin{aligned} u_1(t) = \frac{u_1(0)+u_2(0)}{2}+\frac{u_1(0)-u_2(0)}{2}\,e^{-2t}, \qquad u_2(t) = \frac{u_1(0)+u_2(0)}{2}-\frac{u_1(0)-u_2(0)}{2}\,e^{-2t}, \end{aligned}$$

satisfies

$$\displaystyle \begin{aligned} u_1(t)+u_2(t) = u_1(0)+u_2(0), \quad t\geq 0, \end{aligned}$$

so it is conservative.
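The conservation property of this two-state example can also be checked numerically. The following is a sketch (the matrix and the initial distribution are arbitrary illustrative choices; SciPy's matrix exponential plays the role of the solution operator):

```python
# Two-state system u1' = -u1 + u2, u2' = u1 - u2: the solution operator
# exp(tK) should preserve positivity and the total population u1 + u2.
import numpy as np
from scipy.linalg import expm

K = np.array([[-1.0, 1.0],
              [1.0, -1.0]])    # columns sum to zero: formally conservative

u0 = np.array([0.7, 0.3])      # an arbitrary nonnegative initial distribution
for t in (0.1, 1.0, 10.0):
    u = expm(t * K) @ u0
    assert np.all(u >= 0.0)                    # positivity
    assert abs(u.sum() - u0.sum()) < 1e-12     # conservation of the total mass
```

Both assertions pass for any nonnegative u0, in agreement with (7).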

While the above calculations, such as (7), seem trivial, one should keep in mind that they tacitly depend on a number of properties such as the ability to differentiate a sum term by term, or the commutativity and associativity of summation. Such properties are often taken for granted but they are not as obvious in more complex situations, as we shall see below.

4 Making a Step to Infinity

As we already noted, the dynamics of (5) is fully determined by the matrix of rates (in the probabilistic context we say the matrix of intensities, see e.g. [12, Chapter 2]). In this section we shall provide several examples showing that this is no longer true in an infinite dimensional context.

Let us assume now that the number of possible attributes is arbitrary and, to simplify the exposition, assume that an attribute can change only to the neighbouring one, that is, the attribute i can change to either i − 1, or to i + 1. This results in the so-called birth-and-death system of equations

$$\displaystyle \begin{aligned} u^{\prime}_n = -(b_n+d_n)u_n + d_{n+1}u_{n+1} + b_{n-1}u_{n-1}, \quad n\geq 1, {} \end{aligned} $$
(8)

where u = (u n)n≥1 and (d n)n≥1 (with d 1 = 0) and (b n)n≥1 are sequences of nonnegative and, in general, unbounded coefficients. To make further calculations more compact, we let b 0 = 0 and u 0 = 0 whenever necessary. Extending the finite-dimensional ideas, we are interested in controlling the total population and thus we will take the space

$$\displaystyle \begin{aligned} X=l_1 =\left\{u;\; \sum\limits_{n=1}^\infty |u_n| <\infty\right\} \end{aligned}$$

as the state space. We observe that the right hand side of (8) has the same structure as (5), that is, as in (7),

$$\displaystyle \begin{aligned} \left(\sum_{n=1}^\infty u_n\right)^{\prime} &= \sum_{n=1}^\infty u^{\prime}_n = \sum_{n=1}^\infty (-(d_n+b_n)u_n +d_{n+1}u_{n+1}+b_{n-1}u_{n-1})\\ &=(-b_1 u_1+ d_2u_2) + (-(d_{2}+b_{2})u_2 +b_{1}u_{1}+ d_{3}u_{3})+\ldots \\ &=(-b_1 u_1+ b_1 u_1) +(d_2u_2-d_{2}u_2) + \ldots = 0.{} \end{aligned} $$
(9)

Here, however, we should be more cautious, since to obtain the above result we differentiate term by term an infinite series and rearrange the order of summation in an infinite sum. This is allowed only under specific conditions on u which, a priori, is not known. To illustrate this point, we discuss two simpler special cases of (8) describing, respectively, only death and only birth process. An analysis of the full equation, based on the fundamental paper [23], can be found in [4, Chapter 7], see also Sect. 7.3 of the present paper, while detailed results for (8) with the coefficients considered below can be found in [12, Sections 2.4.10–16 & 3.4.2–6].

4.1 A Death Equation: Multiple Solutions

Consider, for t ≥ 0,

$$\displaystyle \begin{aligned} u^{\prime}_1 = 3^2u_2, \qquad u^{\prime}_n = -3^nu_n + 3^{n+1}u_{n+1}, \quad n\geq 2. {} \end{aligned} $$
(10)

If we are interested only in coordinate-wise solvability of (10), we can use the integrating factors to re-write it as

$$\displaystyle \begin{aligned} u_1(t) &= u_1(0) + 3^2\int\limits_{0}^{t} u_2(s)\mathrm{d} s,\\ u_n(t) &= e^{-3^nt}u_n(0) + 3^{n+1}\int\limits_{0}^{t} e^{-3^n(t-s)}u_{n+1}(s)\mathrm{d} s, \quad n\geq 2. {} \end{aligned} $$
(11)

To find a solution, following [29] we observe that if u n(0) = 0 for n > N, then the solution u to (11) is given by the solution u N to the finite dimensional system

$$\displaystyle \begin{aligned} (u^N_1)^{\prime} &= 3^2u^N_2,\\ (u^N_n)^{\prime} &= -3^nu^N_n + 3^{n+1}u^N_{n+1}, \quad 2\leq n\leq N-1,\\ (u^N_N)^{\prime} &= -3^Nu^N_N, {} \end{aligned} $$
(12)

where we agree to identify elements of \(\mathbb R^N\) with their extensions by 0 in l 1. It is easy to see that for any N ≥ 1, u N is nonnegative and \(\|u^N(t)\|_{l_1} = \|u^N(0)\|_{l_1}\) for t ≥ 0 (note that (12) is again conservative). Further, if we consider u N+1, then

from which \(u^{N+1}_N(t) \geq u^N_N(t)\) on account of u N+1 ≥ 0. Thus, going down with n from N to 1, we obtain u N(t) ≤ u N+1(t) for t ≥ 0 and the sequence (∥u N(t)∥)N≥1 converges, being nondecreasing and bounded. Using the properties of the l 1 norm, for M ≥ N we have

$$\displaystyle \begin{aligned} \|u^{M}(t) -u^{N}(t)\|{}_{l_1} = |\|u^{M}(t)\|{}_{l_1} -\|u^{N}(t)\|{}_{l_1}| \end{aligned}$$

and thus (u N(t))N≥1 is a Cauchy sequence in l 1 for any t. Hence,

$$\displaystyle \begin{aligned} \lim\limits_{N\to \infty} u^{N}(t) = u(t) {} \end{aligned} $$
(13)

in l 1 for some coordinate-wise measurable tu(t) ≥ 0. Now, let us fix n ≥ 2 and take N > n. Then, since

the Lebesgue dominated convergence theorem allows us to pass to the limit

which shows that u is a continuous coordinate-wise solution to (11), hence it is differentiable coordinate-wise and thus solves (10). Summarizing, for any nonnegative initial datum in l 1 there is \(\mathbb R_+ \ni t\mapsto u(t) \in l_1\) that satisfies u(t) ≥ 0, and such that for each n ≥ 1, \(u_n\in C^{1}(\mathbb R_+)\) and (10) is satisfied.
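The monotone truncation argument above can be illustrated numerically. The sketch below uses the rates d n = 3 n of this section together with our reading of the truncated system; the time t and the initial datum (2 −n)n≥1 are arbitrary choices:

```python
# Solve N-state truncations of the death system and observe the monotone
# convergence (13): solutions increase coordinate-wise and stabilise in N.
import numpy as np
from scipy.linalg import expm

def death_matrix(N):
    """Truncation: u_1' = 3^2 u_2, u_n' = -3^n u_n + 3^{n+1} u_{n+1}, u_{N+1} = 0."""
    K = np.zeros((N, N))
    for n in range(2, N + 1):
        K[n - 1, n - 1] = -3.0 ** n       # loss terms (no loss from state 1: d_1 = 0)
    for n in range(1, N):
        K[n - 1, n] = 3.0 ** (n + 1)      # gain from the state above
    return K

t = 0.05
u0 = lambda N: 1.0 / 2.0 ** np.arange(1, N + 1)     # truncations of (2^{-n})_n
sols = {N: expm(t * death_matrix(N)) @ u0(N) for N in (4, 6, 8, 10)}
for A, B in ((4, 6), (6, 8), (8, 10)):
    assert np.all(sols[B][:A] >= sols[A] - 1e-12)   # u^N <= u^{N+1} coordinate-wise
# successive approximations differ at most by the extra initial mass 2^-9 + 2^-10
assert abs(sols[10][0] - sols[8][0]) < 4e-3
```

The coordinate-wise monotonicity in N is exactly the comparison used in the text to obtain the limit (13).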

Remark 4.1

In fact, u has stronger properties that follow from the general theory presented in Sect. 6 but for our present goal the above are sufficient.

On the other hand, for λ > 0 consider the system

$$\displaystyle \begin{aligned} \lambda v^\lambda_1 &= 3^2 v^\lambda_2,\\ \lambda v^\lambda_n &= -3^n v^\lambda_n + 3^{n+1}v^\lambda_{n+1}, \quad n\geq 2. {} \end{aligned} $$
(14)

It is easy to see that, for a given \(v^\lambda _1\),

$$\displaystyle \begin{aligned} v^\lambda_n = \frac{\lambda v^\lambda_1}{3^n}\prod\limits_{i=2}^{n-1}\left(1+\frac{\lambda}{3^i}\right), \quad n\geq 2, \end{aligned}$$

with the convention that \(\prod \limits _{i=2}^{1} :=1\). We see that if \(v^\lambda _1> 0,\) then \(v^\lambda _n> 0\) for all n ≥ 1 and, since \(\sum \limits _{i=1}^\infty \frac {1}{3^i}<\infty \), the partial products increase to a finite limit,

$$\displaystyle \begin{aligned} \lim\limits_{n\to \infty} \prod\limits_{i=2}^{n-1}\left(1+\frac{\lambda}{3^i}\right) = :P>0. \end{aligned}$$

Thus

$$\displaystyle \begin{aligned} \sum\limits_{n=1}^\infty v^\lambda_n = v^\lambda_1\left(1 + \lambda \sum\limits_{n=2}^\infty\frac{1}{3^n}\prod\limits_{i=2}^{n-1}\left(1+\frac{\lambda}{3^i}\right)\right) \leq v^\lambda_1\left(1 +\frac{\lambda P}{6}\right) \end{aligned}$$

and v λ ∈ l 1 for any λ > 0. But then it is clear that

$$\displaystyle \begin{aligned} v(t): = e^{\lambda t}v^\lambda {} \end{aligned} $$
(15)

is a coordinate-wise (and also l 1) differentiable solution to (10) with v(0) = v λ and finite total size (the l 1 norm) for any t. Similarly to the above, we also have

$$\displaystyle \begin{aligned} \sum\limits_{n=1}^\infty v^\lambda_n \geq v^\lambda_1\left(1+ \lambda \sum\limits_{n=2}^\infty\frac{1}{3^n}\right)= v^\lambda_1\left(1 +\frac{\lambda }{6}\right),\end{aligned}$$

thus

$$\displaystyle \begin{aligned} \|v(t)\|{}_{l_1}\geq e^{\lambda t}v^\lambda_1\left(1 +\frac{\lambda }{6}\right)\end{aligned}$$

and hence v(t) cannot be the bounded solution constructed in the first part of the section for the same initial condition v λ.
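The existence of this second, exponentially growing solution rests on the summability of v λ, which can be checked directly from the recursion (14); in the sketch below λ = 1 and v λ 1 = 1 are arbitrary choices:

```python
# Build v^lambda from the recursion (14) and check that it is summable,
# so that e^{lambda t} v^lambda is an l_1 solution with growing norm.
import numpy as np

lam, N = 1.0, 60
v = np.zeros(N)                      # v[k] stores the coordinate v_{k+1}
v[0] = 1.0
v[1] = lam * v[0] / 3.0 ** 2         # lambda v_1 = 3^2 v_2
for n in range(2, N):                # lambda v_n = -3^n v_n + 3^{n+1} v_{n+1}
    v[n] = (lam + 3.0 ** n) * v[n - 1] / 3.0 ** (n + 1)

# the recursion holds and the tail decays geometrically (like 3^{-n})
assert abs(lam * v[0] - 3.0 ** 2 * v[1]) < 1e-12
assert all(abs((lam + 3.0 ** n) * v[n - 1] - 3.0 ** (n + 1) * v[n]) < 1e-10
           for n in range(2, N))
assert v[-1] < 1e-25 and 1.0 < v.sum() < 1.5
```

The computed total mass of v λ lies between the two analytic bounds above, and multiplying by e^{λt} then gives a solution whose l 1 norm grows exponentially.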

4.2 A Birth Process: Breach of the Conservation Law

Example 4.2

Consider, for t ≥ 0,

$$\displaystyle \begin{aligned} u^{\prime}_1 &= -3u_1,\\ u^{\prime}_n &= -3^nu_n + 3^{n-1}u_{n-1}, \quad n\geq 2. {} \end{aligned} $$
(16)

In this case it is easy to construct recursively a unique nonnegative (for nonnegative initial data) coordinate-wise solution,

$$\displaystyle \begin{aligned} u_1(t) = e^{-3t}u_1(0), \qquad u_n(t) = e^{-3^nt}u_n(0) + 3^{n-1}\int\limits_{0}^{t} e^{-3^n(t-s)}u_{n-1}(s)\mathrm{d} s, \quad n\geq 2. \end{aligned}$$
Again, column sums of the coefficient matrix in (16) are zero, so the expectation is that the solution satisfies

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty u_n(t) = \sum_{n=1}^\infty u_n(0), \quad t\geq 0. {} \end{aligned} $$
(17)

We estimate the solution to (16) for the initial condition u(0) = (1, 0, 0, …). We obviously have

$$\displaystyle \begin{aligned} u_1(t) &= e^{-3t},\\ u_2(t) &= 3e^{-3^2t}\int\limits_{0}^{t} e^{3^2s-3s}ds = \frac{3}{3^2-3} \left(e^{-3t} - e^{-3^2 t}\right)\leq \frac{3}{3^2-3} e^{-3t} \end{aligned} $$

and, by induction,

$$\displaystyle \begin{aligned} u_n(t) \leq e^{-3t}\prod_{i=2}^n\frac{3^{i-1}}{3^i-3} = \frac{e^{-3t}}{3^{n-1}}\prod_{i=2}^n\left(1+\frac{1}{3^{i-1}-1}\right). {}\end{aligned} $$
(18)

We see that

$$\displaystyle \begin{aligned}\lim\limits_{n\to \infty} \prod\limits_{i=2}^n\left(1+\frac{1}{3^{i-1}-1}\right) =:P<\infty\end{aligned}$$

in a monotonic way and thus

$$\displaystyle \begin{aligned} \|u(t)\|{}_{l_1}\leq e^{-3t}\left(1 + P\sum\limits_{n=2}^\infty \frac{1}{3^{n-1}}\right)= e^{-3t}\left(1 + \frac{P}{2}\right). \end{aligned}$$

This shows that, on the one hand, u(t) ∈ l 1 for every t ≥ 0 but, on the other, (17) is not satisfied for large t.
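The loss of mass can be observed numerically on truncations of the system; the truncation size N = 15 and the time t = 1 below are arbitrary choices:

```python
# Truncations of the birth system with rates 3^n: starting from (1, 0, 0, ...),
# most of the mass disappears, although every column of the coefficient matrix
# of the infinite system sums to zero.
import numpy as np
from scipy.linalg import expm

def birth_matrix(N):
    """Truncation of u_1' = -3 u_1, u_n' = -3^n u_n + 3^{n-1} u_{n-1}."""
    K = np.zeros((N, N))
    for n in range(1, N + 1):
        K[n - 1, n - 1] = -3.0 ** n       # loss at rate 3^n
        if n < N:
            K[n, n - 1] = 3.0 ** n        # birth: state n feeds state n + 1
    return K

u0 = np.zeros(15)
u0[0] = 1.0
u = expm(1.0 * birth_matrix(15)) @ u0
assert np.all(u >= -1e-12)
assert u.sum() < 0.5          # far below the initial total mass 1
```

Enlarging N does not cure the loss: by the estimate above the total mass stays below e^{-3t}(1 + P/2) no matter how large the truncation is.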

Example 4.3

As another example we consider, for t ≥ 0,

$$\displaystyle \begin{aligned} u^{\prime}_1 &= -u_1,\\ u^{\prime}_n &= -nu_n + (n-1)u_{n-1}, \quad n\geq 2, {} \end{aligned} $$
(19)

where, again, u(0) = (1, 0, 0, …). By direct calculation, we find \(u_1(t) = e^{-t}\), \(u_2(t) = e^{-t}(1-e^{-t})\), \(u_3(t) = e^{-t}-2e^{-2t}+e^{-3t} = e^{-t}(1-e^{-t})^2\), and thus we make the inductive assumption

$$\displaystyle \begin{aligned} u_n(t) = e^{-t}(1-e^{-t})^{n-1}. {} \end{aligned} $$
(20)

Then

$$\displaystyle \begin{aligned} u_{n+1}(t) &= n e^{-(n+1)t}\int\limits_{0}^{t} e^{ns}(1-e^{-s})^{n-1}ds = n e^{-(n+1)t}\int\limits_{0}^{t} e^{s}(e^{s}-1)^{n-1}ds\\&= n e^{-(n+1)t}\int\limits_{0}^{e^t-1} z^{n-1}dz = e^{-(n+1)t} (e^t-1)^n = e^{-t}(1-e^{-t})^n \end{aligned} $$

and hence formula (20) has been proved. Thus we see that

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty u_n = e^{-t}\sum_{n=1}^\infty (1-e^{-t})^{n-1} = 1 {} \end{aligned} $$
(21)

and the solution is norm conserving. At the same time, estimating as in (18), we obtain

$$\displaystyle \begin{aligned} u_n(t) \leq e^{-t}, \quad n\geq 1, \end{aligned}$$

and we see that the solution converges coordinate-wise to 0 (even uniformly in n). Compared with (21), this example once again emphasizes the fact that in infinite dimensional systems the coordinate-wise description of the evolution does not provide the full picture of the dynamics.
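Since (19) is lower triangular (coordinate n is fed only by coordinate n − 1), the first N coordinates of the solution coincide with the solution of the N-state truncation, so the closed form (20) can be verified numerically; N and t below are arbitrary choices:

```python
# Compare the matrix-exponential solution of the N-state truncation of (19)
# with the closed form u_n(t) = e^{-t} (1 - e^{-t})^{n-1}.
import numpy as np
from scipy.linalg import expm

N, t = 12, 0.7
K = np.diag(-np.arange(1.0, N + 1)) + np.diag(np.arange(1.0, N), k=-1)
u0 = np.zeros(N)
u0[0] = 1.0
u = expm(t * K) @ u0

n = np.arange(1, N + 1)
closed = np.exp(-t) * (1.0 - np.exp(-t)) ** (n - 1)
assert np.allclose(u, closed, atol=1e-10)
# the partial sum of (21) equals 1 - (1 - e^{-t})^N, so the full series is 1
assert abs(closed.sum() - (1.0 - (1.0 - np.exp(-t)) ** N)) < 1e-12
```

The truncated total mass approaches 1 geometrically as N grows, in agreement with (21), while each individual coordinate is bounded by e^{-t}.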

5 Between Model and Its Analysis

The presented examples indicate that in the infinite dimensional scenario it is not sufficient to consider a model verbatim, as derived in applied sciences, since such a simplistic approach often yields pathological outputs and renders the model ill-posed. Instead, we should carefully re-interpret the model in an adequate mathematical setting, keeping constantly in mind that by doing so we could lose some important features of the original formulation. Thus the developed theory should be related to the original problem and be able to explain those pathologies, even if it does not cover them.

Thus, in what follows we describe such a mathematical formalization, following the exposition in [8, Section 4.1]. We should, however, remember that mathematical modelling is not mathematics (one cannot prove that a model is correct) and thus, until the model is fully formalized in a mathematical setting, the modelling consists of judicious steps rather than proofs.

As we have seen above, equations derived in applied sciences typically are formulated point-wise, that is, all operations, such as differentiation and integration, are understood in the classical calculus sense and the equation should be satisfied for all values of the independent variables:

$$\displaystyle \begin{aligned} \partial_t u(x,t) = (\mathcal{K}u)(x,t), \quad x\in\Omega,\; t>0, {} \end{aligned} $$
(22)

where \(\mathcal {K}\) is a differential, integral, or functional expression operating on functions defined on some set Ω (with obvious modifications if Ω is denumerable). We have presented several examples, and there are plenty of others, see [8, Section 4.1], showing that with such an approach, (22) may become ill-posed even if in the modelling process we took into account all information characterizing the process. This can be seen by noticing that the same Eq. (22) behaves well for some ranges of coefficients, e.g. (19), and displays pathological features for others, such as (16). Thus, to analyze (22) we have to reformulate it in a mathematically rigorous way.

Our choice is to describe the evolution of a system by a family of operators \({({\mathcal {G}}(t))_{t \geq 0}}\) parameterised by time, called the dynamical system or, in the linear setting, a semigroup. The operators act in some state space X, mapping an initial state of the system to all subsequent states in the evolution, that is, solutions are represented as

$$\displaystyle \begin{aligned} u(t) = \mathcal{G}(t)u(0), \quad t\geq 0. {} \end{aligned} $$
(23)

Here we note that we shall identify functions of two variables (t, x)↦u(t, x) with functions t → u(t) taking values in the space X of functions of the variable x; in the context of the applications discussed here, this is possible by [8, Section 3.1.2] and should not lead to any misunderstanding.

To be able to talk about an evolution of the system, that is, about a motion in X, the latter must be equipped with a notion of distance and, if we are interested in linear models, the distance should be consistent with the algebraic structure of X. Thus, our state space should be a Banach space, which we choose partly for its relevance to the problem and partly for its mathematical convenience. This choice is not unique; it is a mathematical intervention into the model.

In our examples we were interested in the total size of the population and thus we have chosen the space l 1 as the state space; in general L 1 spaces are used in similar models if x is a continuous variable. If, however, we were interested in controlling the maximal concentration of particles, a better choice would be some space with the supremum norm. On the other hand, in investigations of long time behaviour of population models with growth the most useful space is the L 1 space weighted with the eigenvector of the adjoint problem, see e.g. [15], while l 1 and L 1 spaces in which the higher order moments are finite allowed for proving much stronger results for fragmentation–coagulation problems, see [8, Chapters 5 & 8].

The choice of the space is, of course, not sufficient — all our ‘pathological’ examples live in the original state space.

Once we select the state space X, the right-hand side of (22) can be interpreted as an operator K : D(K) → X defined on some subset D(K) of X such that \(x\to [\mathcal {K}u](x)\in X\) for u ∈ D(K).

With this, (22) can be written as the Cauchy problem for an ordinary differential equation in X: find a function t↦u(t) ∈ D(K) that is differentiable in X for t > 0 and satisfies

$$\displaystyle \begin{aligned} u^{\prime}(t) = Ku(t), \quad t>0, \qquad u(0) = u_0. {} \end{aligned} $$
(24)

Unfortunately, the domain D(K) is also not uniquely defined by the model. Discussing this problem and, in general, operator realizations of \(\mathcal {K}\), we focus on the Master equation (3).

For (8), the matrix \(\mathcal {A}\) is diagonal, defined as \(\mathcal {A}u = -((b_n+d_n)u_n)_{n\geq 1}\), while \(\mathcal {B} u = (b_{n-1}u_{n-1} + d_{n+1}u_{n+1})_{n\geq 1}\) (remember the convention b 0 = u 0 = 0), both defined for u belonging to the space l 0 of all sequences.

Possibly the operator which is the closest to the formal expression \(\mathcal {A}+\mathcal {B}\) is the maximal operator \(K_{\max }\) which is \(\mathcal {A}+\mathcal {B}\) defined on

$$\displaystyle \begin{aligned} D(K_{\max}) = \{{u} \in l_1;\; {\mathcal{A}}{u}+{\mathcal{B}}{u} \in l_1\}. \end{aligned}$$

Here, it is possible that neither \(\mathcal {A}u\) nor \(\mathcal {B}u\) belongs to l 1.

Another natural choice is to consider \(\mathcal {A}+\mathcal {B}\) on a domain which ensures that both \(\mathcal {A}u\) and \(\mathcal {B}u\) are in l 1. Thus, we define A as \(\mathcal {A}|{ }_{D(A)}\) on

$$\displaystyle \begin{aligned} D(A)=\left\{u\in l_1;\; \sum_{n=1}^\infty (b_n+d_n)|u_n|<\infty\right\}. {} \end{aligned} $$
(25)

Then for 0 ≤ u ∈ D(A), similarly to (9),

$$\displaystyle \begin{aligned} \|\mathcal{B}u\|{}_{l_1}= \sum_{n=1}^\infty (d_{n+1}u_{n+1}+b_{n-1}u_{n-1}) &= \sum_{n=1}^\infty (d_{n}u_{n}+b_{n}u_{n}) =\|Au\|{}_{l_1}, {} \end{aligned} $$
(26)

where this time the rearrangement of the summation is justified by the absolute summability of the right-hand-side. Thus, extending by linearity, we see that \(B=\mathcal {B}|{ }_{D(A)}\) is well-defined and we introduce

$$\displaystyle \begin{aligned} K_{\min} = (\mathcal{A}+\mathcal{B})|{}_{D(A)} =A+B, \end{aligned}$$

where both terms on the right act in l 1.
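The identity (26) is easy to test numerically for a nonnegative, finitely supported u; in the sketch below the rates are randomly generated and purely illustrative:

```python
# For u >= 0 with finite support, ||Au||_{l_1} = ||Bu||_{l_1}: the gain terms
# are a rearrangement of the loss terms (with the conventions d_1 = 0, b_0 = 0).
import numpy as np

rng = np.random.default_rng(0)
N = 30
b = rng.uniform(0.0, 5.0, N)          # hypothetical birth rates b_1, ..., b_N
d = rng.uniform(0.0, 5.0, N)          # hypothetical death rates d_1, ..., d_N
d[0] = 0.0                            # the convention d_1 = 0

u = np.zeros(N)
u[:10] = rng.uniform(0.0, 1.0, 10)    # finite support, well inside the array

Au = -(b + d) * u
Bu = np.zeros(N)
Bu[:-1] += d[1:] * u[1:]              # gain d_{n+1} u_{n+1}
Bu[1:] += b[:-1] * u[:-1]             # gain b_{n-1} u_{n-1}
assert abs(np.abs(Au).sum() - np.abs(Bu).sum()) < 1e-10
```

The equality holds exactly because, for finitely supported u, the two sums in (26) contain the same terms in a different order.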

Example 5.4

Let us have a preliminary look at what the domain of the generator has to do with the well-posedness of (24). If a solution u to (24) satisfies \(0\leq u \in D(K_{\min }) = D(A)\), then, by the l 1 differentiability required in the definition and (25),

$$\displaystyle \begin{aligned} \partial_t\|u(t)\|{}_{l_1} &= \sum_{n=1}^\infty (Au+Bu)_n(t)=\sum_{n=1}^\infty (Au)_n(t)+ \sum_{n=1}^\infty(Bu)_n(t) =0{} \end{aligned} $$
(27)

and (9) holds, that is, the solution is conservative and the model is honest. Note that the computation on the right-hand side remains valid even if u is not positive, but then the left-hand side cannot be interpreted as the derivative of the norm.

The conservativeness may be extended to the case when the positive solutions stay in \(D(\overline {K_{\min }}),\) where the closure \(\overline {K_{\min }}\) is defined by \( \overline {K_{\min }}u = \lim \limits _{k\to \infty }(Au_k+Bu_k) \) for \(D(K_{\min }) \ni u_k\to u\), whenever both limits exist. Then it is easy to see that if \(u(t) \in D(\overline {K_{\min }})\) for any t ≥ 0, then

$$\displaystyle \begin{aligned} \sum_{n=1}^{\infty}(\overline{A+B}u)_n(t)=0{} \end{aligned} $$
(28)

and (9) also holds.

But can we be sure that an initial condition in \(D(\overline {K_{\min }})\) yields \(u(t) \in D(\overline {K_{\min }})\) for all t > 0? Or, at least, can we find a D(K) such that any u emanating from D(K) stays in D(K) for all t > 0 and, for (24) to make sense, is differentiable in X? This leads us to the concept of the generator which, as we shall see below, is forced upon us by semigroup theory. To explain this, we have to formalize the above discussion.

Definition 5.5

A family (G(t))t≥0 of bounded linear operators on a Banach space X is called a C 0-semigroup, or a strongly continuous semigroup, if

  • (i) G(0) = I;

  • (ii) G(t + s) = G(t)G(s) for all t, s ≥ 0;

  • (iii) \( \lim _{t\to 0^+}G(t)u = u\) for any u ∈ X.

A linear operator K is called the (infinitesimal) generator of (G(t))t≥0 if

$$\displaystyle \begin{aligned} Ku = \lim\limits_{h\to 0^+}\frac{G(h)u-u}{h}, {} \end{aligned} $$
(29)

with D(K) defined as the set of all u ∈ X for which this limit exists.

By (iii), we see that for u ∈ D(K) and t ≥ 0, the right hand side derivative of tG(t)u satisfies

$$\displaystyle \begin{aligned} \partial_t G(t)u &= \lim\limits_{h\to 0^+}\frac{G(t+h)u-G(t)u}{h} = G(t)\lim\limits_{h\to 0^+}\frac{G(h)u-u}{h} =G(t)Ku \\ &= \lim\limits_{h\to 0^+}\frac{G(h)G(t)u-G(t)u}{h} = KG(t)u. {} \end{aligned} $$
(30)

With a similar calculation for the left hand derivative and t > 0, see e.g. the proof of [28, Theorem 1.2.4], we see that G(t)u ∈ D(K) for any t > 0, tG(t)u is differentiable in X and satisfies (24). We observe that if u ∈ X ∖ D(K), then in general tG(t)u is only continuous and thus it does not solve (24). It solves, however, the integrated version of (24) and thus it is called an integral, or mild, solution.
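In finite dimensions every matrix generates a C 0-semigroup via the matrix exponential, so Definition 5.5 and formula (29) can be tested directly; the 2 × 2 matrix below is an arbitrary illustrative choice:

```python
# G(t) = exp(tK): check the semigroup laws (i)-(ii) and approximate the
# generator by the difference quotient in (29).
import numpy as np
from scipy.linalg import expm

K = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
G = lambda t: expm(t * K)

assert np.allclose(G(0.0), np.eye(2))            # (i)  G(0) = I
assert np.allclose(G(1.2), G(0.3) @ G(0.9))      # (ii) G(t+s) = G(t)G(s)

u = np.array([2.0, -1.0])
h = 1e-6
assert np.allclose((G(h) @ u - u) / h, K @ u, atol=1e-4)   # (29)
```

In this finite-dimensional setting D(K) is all of X; the subtlety discussed in the text appears only when K is unbounded.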

Finding the generator K is usually a challenge; some methods for doing this will be presented later in the paper. Here we mention that for a large class of problems (see Remark 6.14) associated with (3), the generator K always satisfies K min ⊂ K ⊂ K max, see e.g. [4, Theorem 6.20] (though this is not always the case, as we shall see in Theorem 7.18). The place of K on the scale between K min and K max determines the well-posedness of the problem (22). It turns out that all of the following situations are possible:

  1. \(K_{\min } = K = K_{\max }\),

  2. \(K_{\min } \varsubsetneq K =\overline {K_{\min }} = K_{\max }\),

  3. \(K_{\min } = K \varsubsetneq K_{\max }\),

  4. \(K_{\min } \varsubsetneq K =\overline {K_{\min }} \varsubsetneq K_{\max }\),

  5. \(\overline {K_{\min }}\varsubsetneq K \varsubsetneq K_{\max }\).

Each of these cases has its own specific interpretation in the model. If \(K \varsubsetneq K_{\max },\) we don’t have uniqueness, that is, there are \(C^1([0,\infty), X)\) solutions to (22) emanating from zero and therefore not described by the semigroup, such as the one constructed in Sect. 4.1:

there is more to life than meets the semigroup.

If \(\overline {K_{\text{min}}}\varsubsetneq K\), then despite the fact that the transition operator is formally conservative, the solutions are not: the modelled quantity leaks out from the system and the mechanism of this leakage is not present in the model.

In the remaining part of the paper we shall present a theory explaining these phenomena.

6 Substochastic Semigroup Theory

6.1 Generation Theorems

According to (30), if we have a semigroup (G(t))t≥0, we can uniquely identify its generator and the equation the semigroup solves. In practice, however, we are faced with an expression \(\mathcal {K}\) in (22) and we have to identify its realization in some Banach space that generates a semigroup. Though there is no general way for doing this, at least, given X and a realization K of \(\mathcal {K}\), the Hille–Yosida theorem allows us to determine whether it is a generator or not.

Let \(R(\lambda, K) := (\lambda I-K)^{-1}\) denote the resolvent of K, defined for λ ∈ ρ(K), the resolvent set of K.

Theorem 6.6 ([28, Theorem 3.1])

K is the generator of a semigroup (G K(t))t≥0 if and only if K is closed and densely defined and there exist M > 0, ω ∈ ℝ such that (ω, ∞) ⊂ ρ(K) and for all n ≥ 1, λ > ω,

$$\displaystyle \begin{aligned} \|R(\lambda, K)^{n}\|\leq \frac{M}{(\lambda-\omega)^{n}}. {} \end{aligned} $$
(31)

Despite its theoretical importance, practical applications of the Hille–Yosida theorem are very limited due to the fact that it requires solving infinitely many equations of increasing complexity and estimating their solutions. Fortunately, for the class of contractive semigroups that is important in applications the conditions can be somewhat simplified. In fact, the proof of the full Hille–Yosida theorem reduces to the contractive case, albeit not in a very constructive manner.

Let us recall that the duality set of u ∈ X is defined as

$$\displaystyle \begin{aligned} \mathcal{J}(u) =\{u^*\in X^*;\;\langle u^*,u \rangle = \|u\|{}^2 = \|u^*\|{}^2\}, {} \end{aligned} $$
(32)

where \(X^*\) is the dual of X. Then we say that an operator (K, D(K)) is dissipative if for every u ∈ D(K) there is \(u^* \in \mathcal {J}(u)\) such that

$$\displaystyle \begin{aligned} \Re \langle u^*,Ku \rangle\, \leq 0. {} \end{aligned} $$
(33)

Theorem 6.7 ([16, Theorem II.3.15])

For a densely defined dissipative operator (K, D(K)) on a Banach space X, the following statements are equivalent.

  • (a) The closure\(\overline {K}\)generates a semigroup of contractions.

  • (b)\(\overline {\mathrm {Ran}(\lambda I-K)} = X\)for some (and hence all) λ > 0.

If either condition is satisfied, then K satisfies (33) for any\(u^*\in \mathcal {J}(u)\).

Thus, once we know that (33) is satisfied, then instead of finding the continuous inverses to \((\lambda I-K)^n\) for all sufficiently large λ and n ≥ 1, it suffices to find a solution u to

$$\displaystyle \begin{aligned} \lambda u - Ku = f {} \end{aligned} $$
(34)

for any f from a dense subset of X and some λ > 0.
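For the two-state matrix of Sect. 3 the Hille–Yosida bound (31) holds with M = 1 and ω = 0, which we can check directly; in the sketch below the l 1 operator norm is computed as the maximal absolute column sum:

```python
# Verify ||R(lambda, K)^n|| <= 1/lambda^n for the conservative two-state matrix.
import numpy as np

K = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
l1_norm = lambda A: np.abs(A).sum(axis=0).max()   # induced l_1 operator norm

for lam in (0.5, 1.0, 4.0):
    R = np.linalg.inv(lam * np.eye(2) - K)        # the resolvent R(lambda, K)
    for n in (1, 2, 3):
        assert l1_norm(np.linalg.matrix_power(R, n)) <= 1.0 / lam ** n + 1e-12
```

Here the bound is attained with equality, reflecting the fact that the generated semigroup is a (conservative) contraction semigroup on l 1.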

6.2 How Do Multiple Solutions Fit into the Theory of Semigroups?

While semigroup theory ensures the uniqueness of solutions, this applies only to a particular realization of (24) in which the right hand side is the generator K of the semigroup. However, as we know, K is one of many versions of \(\mathcal {K}\), which explains the existence of nul-solutions, that is, solutions emanating from zero, [22, Section 27.3], and hence of multiple solutions such as (15). Indeed, let (K, D(K)) be the generator of a C 0-semigroup (G(t))t≥0 on a Banach space X. To simplify notation we assume that (G(t))t≥0 is a semigroup of contractions, hence \(\{\lambda :\;\Re \lambda >0\} \subset \rho (K)\). Let us further assume that there exists an extension \(\mathcal {K}\) of K defined on the domain \(D(\mathcal {K})\). We have the following basic result.

Lemma 6.8 ([2])

If \(\mathcal {K}\) is closed, then for any λ with \(\Re \lambda >0\),

$$\displaystyle \begin{aligned} D(\mathcal{K}) = D(K)\oplus \mathrm{Ker}(\lambda I-\mathcal{K}). {} \end{aligned} $$
(35)

If we equip \(D(\mathcal {K})\) with the graph norm, then D(K) is a closed subspace of \(D(\mathcal {K})\).

Furthermore, if \(\mathrm {Ker}(\lambda I-\mathcal {K})\) is finite-dimensional for some λ with \(\Re \lambda >0\) , then \(\mathcal {K}\) is closed.

To explain the meaning of the lemma, consider the Cauchy problem

$$\displaystyle \begin{aligned} \partial_t u = \mathcal{K}u, \quad u(0)=u_0. {} \end{aligned} $$
(36)

Then u λ(t) = e λt v λ, where \(v^\lambda \in \mathrm {Ker}(\lambda I-\mathcal {K})\), is a C 1([0, ∞), X) solution to (36) with u 0 = v λ. However, as v λ ∉ D(K), t → G(t)v λ in general is not differentiable, and thus neither is t → v(t) = G(t)v λ − u λ(t). Hence, since v(0) = 0, v is only a mild nul-solution of (36). A classical solution u to (36) can be obtained, [22, Theorem 23.7.2], [4, Theorem 3.48], by taking \(y(\lambda )\in \mathrm {Ker}(\lambda I-\mathcal {K})\), multiplied by a suitable scalar function of λ to ensure its appropriate integrability along the vertical line from γ − i∞ to γ + i∞ for any γ > 0, and taking its inverse Laplace transform,

$$\displaystyle \begin{aligned} u(t) =\frac{1}{2\pi i} \int\limits_{\gamma -i\infty}^{\gamma +i\infty}e^{\lambda t}y(\lambda)d\lambda, \quad \gamma >0. {} \end{aligned} $$
(37)

6.3 Perturbation Theory for Positive Semigroups

Even if verifying the assumptions of Theorem 6.7 is easier than those of Theorem 6.6, solving (34) can be a formidable task. Thus, in practice, we try other methods, among which perturbation techniques play an important role. In this approach we try to write (24) as

(38)

where A is an ‘easy’ operator for which the generation result is easy to prove, and find conditions on B such that A + B (or its extension) is also a generator. It is easy to see that if A generates a semigroup and B is bounded, then A + B is also the generator of a semigroup, but obviously this class of perturbations is too restrictive for most applications.

So far our discussion has not involved positivity aspects, apart from the observation that the solutions to the equations discussed in Sect. 4 should be coordinate-wise nonnegative if such is the initial condition. It turns out that, whenever a semigroup has this property, employing positivity can simplify many results.

Let X be a Banach lattice with partial order denoted by ≥. For any Y ⊂ X, Y + := {u ∈ Y ; u ≥ 0}. It is easy to see that l 1 with the coordinate-wise order is a Banach lattice. An operator O : X → X is called positive if u ≥ 0 implies Ou ≥ 0. A semigroup (G(t))t≥0 on X is called positive if G(t)u ≥ 0 for all t ≥ 0 and u ≥ 0. The Laplace transform representation of the resolvent on the one hand and the Hille formula, [28, Theorems 3.1 & 8.3], on the other, show that (G(t))t≥0 is positive if and only if the resolvent of its generator is positive for all large λ.
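As a finite-dimensional illustration (a sketch added here, not part of the original text; the matrix is an arbitrary example), a matrix with nonnegative off-diagonal entries plays the role of the generator of a positive semigroup: both e tA and the resolvent (λI − A)−1, for large λ, are entrywise nonnegative.

```python
import numpy as np

def expm(M, terms=80):
    """Matrix exponential via a plain Taylor series (adequate for small matrices)."""
    S, T = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        T = T @ M / k
        S = S + T
    return S

# A matrix with nonnegative off-diagonal entries: a finite-dimensional
# analogue of the generator of a positive semigroup.
A = np.array([[-2.0, 1.0],
              [0.5, -1.0]])

lam = 10.0                                  # any lambda above the spectral bound
R = np.linalg.inv(lam * np.eye(2) - A)      # resolvent R(lambda, A)
G = expm(A)                                 # semigroup at t = 1

assert (R >= 0).all() and (G >= 0).all()    # positive resolvent, positive semigroup
```

This mirrors, in the simplest possible setting, the equivalence between positivity of the semigroup and positivity of the resolvent stated above.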

In what follows we shall develop a theory suitable for examples discussed in Sect. 4, that is, we consider problems of the form (38) in a Banach lattice X, where (A, D(A)) is the generator of a substochastic (positive and contractive) semigroup (G A(t))t≥0 and (B, D(A)) is a positive operator, though some results pertain to a more general situation.

As follows from Theorems 6.6 and 6.7, the first step in proving a generation result for a contractive semigroup is finding solutions to the resolvent equation

$$\displaystyle \begin{aligned} (\lambda I - (A+B))u = f, \quad f \in X, \lambda>0. {} \end{aligned} $$
(39)

Knowing that R(λ, A) exists for λ > 0, (39) can be formally re-written as

$$\displaystyle \begin{aligned} u - R(\lambda, A)B u = R(\lambda, A)f {} \end{aligned} $$
(40)

and we can recover u provided the Neumann series

$$\displaystyle \begin{aligned} R(\lambda)f:=\sum_{{n}={0}}^{{\infty}}(R(\lambda, A)B)^nR(\lambda, A)f = \sum_{{n}={0}}^{{\infty}}R(\lambda, A)(BR(\lambda, A))^nf {} \end{aligned} $$
(41)

is convergent. There are three possible cases.

  1. 1.

    Easy case. If the spectral radius of BR(λ, A) satisfies

    $$\displaystyle \begin{aligned} \rho(BR(\lambda, A))<1,{} \end{aligned} $$
    (42)

    for some λ > s(A), where s(A) is the spectral bound of A (and, since R(λ, A) ≥ 0, for all larger ones), then

    $$\displaystyle \begin{aligned} u = R(\lambda, A)\sum\limits_{{n}={0}}^{{\infty}}(BR(\lambda, A))^nf \in D(A) \end{aligned}$$

    is the solution to (39). Moreover, in Banach lattices, (42) is also necessary for the positive invertibility of (39), see [31] (it can be seen that Banach lattices and positive operators on them satisfy the more general assumption in op. cit., see e.g. [4, Section 2.2.3]). Thus A + B is a good candidate for the generator.

  2. 2.

    Slightly less easy case. If

    $$\displaystyle \begin{aligned} \lim\limits_{n\to \infty}(BR(\lambda,A))^nf = 0 {} \end{aligned} $$
    (43)

    for any f ∈ X, then \(R(\lambda ) = R(\lambda , \overline {A+B})\), see [4, Proposition 4.7]. In other words,

    $$\displaystyle \begin{aligned} (\lambda I - \overline{A+B})u = f, \quad f \in X, \lambda>0. {} \end{aligned} $$
    (44)

    Hence, (38) cannot be solved as it is, but we can hope to solve a modification of it with A + B replaced by \(\overline {A+B}\). As we have seen in Example 5.4, some essential features of the dynamics are preserved in such a case.

  3. 3.

    Neither of them. While, in general, the series in (41) may fail to converge or, even if it converges, R(λ) may fail to be the resolvent of a densely defined operator, see e.g. [11], we are interested in cases when R(λ) = R(λ, K) for some K ⊃ A + B, see e.g. Theorem 6.11.
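The Neumann series (41) can be explored numerically in finite dimensions. Below is a minimal Python sketch (an added illustration, not part of the original; the truncation size N is an assumption) for a truncation of the death problem of the next example, in which BR(λ, A) is nilpotent, so the series terminates and reproduces (λI − (A + B))−1 exactly:

```python
import numpy as np

N, lam = 10, 1.0
d = np.array([0.0] + [3.0 ** n for n in range(2, N + 1)])  # d_1 = 0, d_n = 3^n
A = -np.diag(d)                      # the 'easy' diagonal part
B = np.diag(d[1:], k=1)              # (Bu)_n = d_{n+1} u_{n+1}

RA = np.diag(1.0 / (lam + d))        # R(lambda, A) for a diagonal A
# Neumann series (41): sum_k (R(lambda,A) B)^k R(lambda,A).
term, R = RA.copy(), RA.copy()
for _ in range(N):                   # B R(lambda,A) is nilpotent in the
    term = RA @ B @ term             # truncation, so the series terminates
    R += term

R_direct = np.linalg.inv(lam * np.eye(N) - (A + B))
assert np.allclose(R, R_direct) and (R >= 0).all()
```

In the infinite-dimensional problem the series need not terminate, which is precisely where the three cases above part ways.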

Example 6.9

For the death problem (10), we have

$$\displaystyle \begin{aligned} R(\lambda, A)u = \left(\frac{u_1}{\lambda}, \frac{u_2}{\lambda+3^2}, \ldots, \frac{u_n}{\lambda+3^n},\ldots\right) \end{aligned}$$

and \((Bu)_n = 3^{n+1} u_{n+1}\), n ≥ 1. Thus

$$\displaystyle \begin{aligned} ((BR(\lambda, A))^k u)_n = u_{k+n}\prod\limits_{j=n+1}^{n+k}\frac{3^j}{\lambda+3^j} \end{aligned}$$

and, for u ∈ l 1,

$$\displaystyle \begin{aligned} \|(BR(\lambda, A))^k u\|{}_{l_1} &\leq \sum_{n=1}^\infty |u_{k+n}|\prod_{j=n+1}^{n+k}\frac{3^j}{\lambda+3^j} = \sum_{r=k+1}^\infty |u_{r}|\prod_{j=r-k+1}^{r}\frac{3^j}{\lambda+3^j}\\&\leq \sum_{r=k+1}^\infty |u_{r}| \end{aligned} $$

so that

$$\displaystyle \begin{aligned} \lim\limits_{k\to \infty} (BR(\lambda, A))^k u =0. \end{aligned}$$

Thus, \(\overline {A+B}\) is a plausible candidate for the generator of a (conservative) death semigroup. On the other hand, for each fixed k let us consider the sequence u r = (δ jr)j≥1. Then

$$\displaystyle \begin{aligned} \|(BR(\lambda, A))^k \|&\geq \sup_{r\geq 1}\|(BR(\lambda, A))^k u^r\|_{l_1} = \lim\limits_{r\to \infty } \prod_{j=r-k+1}^{r}\frac{3^j}{\lambda+3^j} = 1, \end{aligned} $$

as the product consists of only k terms. Hence ρ(BR(λ, A)) = 1. On the other hand, if A + B were the generator of a positive semigroup, then λI − (A + B) would be positively invertible for large λ. Then, however, by the comment under (42), we would have ρ(BR(λ, A)) < 1. Thus A + B cannot be the generator.
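The two computations above can be reproduced numerically. The following Python sketch (an added illustration, not part of the original; the test vector u n = 2−n and the cut-offs are assumptions) evaluates the products ∏ 3 j∕(λ + 3 j) directly: for fixed k the product over j = r − k + 1, …, r tends to 1 as r →∞, while for a fixed summable u the norms ∥(BR(λ, A))k u∥ tend to 0 as k →∞.

```python
import numpy as np

lam = 1.0

def w(j):
    return 3.0 ** j / (lam + 3.0 ** j)   # the factors 3^j / (lambda + 3^j)

# Fixed k = 5: the k-term products over j = r-k+1, ..., r approach 1 as r grows,
# which is why ||(B R(lambda,A))^k|| = 1 for every k.
prods = [np.prod([w(j) for j in range(r - 4, r + 1)]) for r in range(5, 40)]

# Fixed u with u_n = 2^{-n} (a summable sequence): ||(B R(lambda,A))^k u|| -> 0.
def norm_k(k, nmax=60):
    return sum(2.0 ** -(k + n) * np.prod([w(j) for j in range(n + 1, n + k + 1)])
               for n in range(1, nmax))

norms = [norm_k(k) for k in (1, 5, 10, 20)]
assert prods[-1] > 0.999 and norms[-1] < 1e-5
```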

Example 6.10

Consider now the birth problem (16). Here

$$\displaystyle \begin{aligned} (R(\lambda, A)u)_n = \frac{u_n}{\lambda+3^n}, \quad n\geq 1, \end{aligned}$$

and \(Bu = (0, 3u_1, 3^2 u_2, \ldots)\), thus

$$\displaystyle \begin{aligned} ((BR(\lambda, A))^ku)_n = \left\{\begin{array}{lcl} 0&\mathrm{for}& 1\leq n\leq k,\\ u_j\prod\limits_{r =j}^{j+k-1}\frac{3^r}{\lambda + 3^r}&\mathrm{for}& n = j+k.\end{array}\right. \end{aligned}$$

Hence, for u ≥ 0,

$$\displaystyle \begin{aligned} \sum\limits_{n=1}^{\infty} ((BR(\lambda, A))^ku)_n = \sum\limits_{j=1}^\infty u_j\prod\limits_{r =j}^{j+k-1}\frac{3^r}{\lambda + 3^r}. \end{aligned}$$

Now, if we take u = (1, 0, …), then for any k

$$\displaystyle \begin{aligned} \|(BR(\lambda, A))^ku\|{}_{l_1} = \prod\limits_{r =1}^{k}\frac{3^r}{\lambda + 3^r}\geq \prod\limits_{r =1}^{\infty}\frac{3^r}{\lambda + 3^r} >0, \end{aligned}$$

where the product is positive on account of the summability of \((3^{-r})_{r\geq 1}\). This is consistent with the result of Example 4.2 and (28), which ascertained that the relevant semigroup cannot be generated by \(\overline {A+B}\).
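A quick numerical check (an added sketch, not part of the original; the cut-offs are assumptions) confirms that the products stabilise at a positive value, so ∥(BR(λ, A))k u∥ stays bounded away from zero and condition (43) fails for the birth problem:

```python
import numpy as np

lam = 1.0

def w(r):
    return 3.0 ** r / (lam + 3.0 ** r)

# For u = (1, 0, ...), ||(B R(lambda,A))^k u|| is exactly the k-term product.
partial = [np.prod([w(r) for r in range(1, k + 1)]) for k in (1, 5, 20, 60)]

# The products decrease but converge to a strictly positive limit, because
# sum 3^{-r} < infinity; hence (B R(lambda,A))^k u does not tend to 0.
assert partial[-1] <= partial[-2] and partial[-1] > 0.5
```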

While in the first example we have the resolvent and a candidate for the generator, in the second case we have neither, and thus we need a tool for handling situations in which ρ(BR(λ, A)) = 1, since these clearly occur in important applications and result in interesting dynamics.

Such a tool has been developed by employing the order structure of the underlying state space and has its origins in the fundamental paper [23], devoted to the Kolmogorov system of equations (of which birth-and-death problems are a special case) in l 1. The main ideas, however, can be applied to a much broader class of problems.

Let us recall that a Banach lattice is called a Kantorovich–Banach space (a KB-space) if every norm bounded and nondecreasing sequence is norm convergent. We have already used this property in Sect. 4.1. All reflexive spaces, as well as L 1 spaces, are KB-spaces.

Theorem 6.11 ([4, Theorem 5.2])

Let X be a KB-space. If

  • (A1) ρ(BR(λ, A)) ≤ 1 for some λ > 0,

  • (A2) \(\langle u^*,(A+B)u \rangle \leq 0\) for any \(u \in D(A)_+\), \(u^*\in \mathcal {J}(u)_+\),

then there is an extension (K, D(K)) of (A + B, D(A)) generating a positive C 0 -semigroup of contractions, say, (G K(t))t≥0 . The generator K satisfies, for λ > 0,

$$\displaystyle \begin{aligned} R(\lambda, K)f = \sum_{{k}={0}}^{{\infty}}R(\lambda,A)(BR(\lambda,A))^kf. {} \end{aligned} $$
(45)

Main Ideas of the Proof

As we mentioned above, the order structure plays a crucial role in the proof. For 0 ≤ r < 1 we define K r = A + rB, D(K r) = D(A). By (A1), ρ(rBR(λ, A)) ≤ r < 1 and hence

$$\displaystyle \begin{aligned} R(\lambda, K_r)= R(\lambda,A)\sum_{{n}={0}}^{{\infty}}r^n\left(BR(\lambda,A)\right)^n, {} \end{aligned} $$
(46)

where the series converges absolutely and each term is positive. Let \(u^*\in \mathcal {J}(u)_+\). Then, by (A2), for u ∈ D(A)+ and r < 1,

$$\displaystyle \begin{aligned} \langle u^*,K_r u \rangle\, =\, \langle u^*,(A+B)u \rangle+(r-1)\!\langle u^*,Bu \rangle\, \leq 0; \end{aligned}$$

that is, K r are dissipative, and thus

$$\displaystyle \begin{aligned} \|(\lambda I - K_r)u\| \geq \,\langle u^*, (\lambda I-K_r)u \rangle\, =\, \lambda\!\langle u^*,u \rangle -\langle u^*,K_ru \rangle\, \geq \lambda \|u\|, \end{aligned}$$

for all u ∈ D(A)+. We can rewrite the above inequality as

$$\displaystyle \begin{aligned} \|R(\lambda, K_r)y\| \leq \lambda^{-1}\|y\| {} \end{aligned} $$
(47)

for all y ∈ X + and, because R(λ, K r) are positive, (47) can be extended to the whole space X. As in Theorem 6.7, all these properties can be extended to λ > 0. Since we are in a KB-space, for each f ∈ X + there is R(λ)f ∈ X + such that

$$\displaystyle \begin{aligned}\lim\limits_{r\to 1^-}R(\lambda, K_r)f = R(\lambda)f \end{aligned}$$

in X. It follows that R(λ) ≥ 0 is the resolvent of a densely defined operator K that is an extension of A + B, and then the generation is a consequence of the Trotter–Kato theorem, e.g. [28, Theorem 3.4.4]. □
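The monotone approximation R(λ, K r)f ↑ R(λ)f used in the proof can be observed on a truncated birth problem. The following Python sketch (an added illustration; the truncation size N and the rates b n = 3 n are assumptions taken from the earlier example) verifies that the resolvents are positive and increase with r:

```python
import numpy as np

N, lam = 8, 1.0
b = np.array([3.0 ** n for n in range(1, N + 1)])   # birth rates b_n = 3^n
A = -np.diag(b)                                      # diagonal part
B = np.diag(b[:-1], k=-1)                            # (Bu)_n = b_{n-1} u_{n-1}

f = np.zeros(N); f[0] = 1.0
prev = np.zeros(N)
for r in (0.0, 0.5, 0.9, 0.99, 1.0):
    # resolvent of K_r = A + rB applied to f
    Rr = np.linalg.solve(lam * np.eye(N) - (A + r * B), f)
    assert (Rr >= prev - 1e-12).all()   # R(lambda, K_r)f increases with r
    prev = Rr
```

In the truncation the limit r → 1− is trivially R(λ, A + B)f; in the infinite-dimensional KB-space setting it is exactly the monotone limit defining R(λ)f in the proof.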

Remark 6.12

The main drawback of Theorem 6.11 is that it does not provide any constructive information about K. We can have \(K=A+B, K=\overline {A+B},\) or K could be an extension of \(\overline {A+B}\).

Theorem 6.11 is close to [28, Theorem 3.3.4] (see the reformulation in [4, Theorem 4.12] and [6, Theorem 3.6]) which also requires A + rB to be dissipative for r ∈ [0, 1] and allows for ∥BR(λ, A)∥ = 1. It requires, however, B to be densely defined but, in return, contrary to Theorem 6.11, provides the characterization \(K=\overline {A+B}\).

If X is reflexive and B is closable, then B is densely defined, see [28, Lemma 1.10.5], and thus in such a case [4, Theorem 4.12] is stronger than Theorem 6.11. Though there are examples, even in reflexive spaces, where the former is not applicable but the latter works, see [6, Example 3.8], the real power of Theorem 6.11 is revealed in L 1 spaces.

So, let us assume that ( Ω, μ) is a measure space with a σ-finite measure μ and X = L 1( Ω, dμ).

Corollary 6.13

If for all u ∈ D(A)+

$$\displaystyle \begin{aligned} \int\limits_{\Omega}(Au + Bu)\,d\mu \leq 0, {} \end{aligned} $$
(48)

then the assumptions of Theorem 6.11 are satisfied.

The reason for this simplification is the additivity of the norm on the positive cone of X. Indeed, since R(λ, A) is a surjection from X onto D(A), for x ∈ X + we have u = R(λ, A)x ∈ D(A)+. Integrating

$$\displaystyle \begin{aligned} (A+B)u = -x + BR(\lambda,A)x +\lambda R(\lambda,A) x \end{aligned}$$

we get

$$\displaystyle \begin{aligned} - \int\limits_{\Omega}^{}x \,d\mu + \int\limits_{\Omega}^{}BR(\lambda,A) x \,d\mu + \lambda\int\limits_{\Omega}^{}R(\lambda,A) x \,d\mu \leq 0; {} \end{aligned} $$
(49)

that is,

$$\displaystyle \begin{aligned} \lambda\|R(\lambda,A) x\| + \|BR(\lambda,A) x\|- \|x\|\leq 0, \qquad x\in X_+, {} \end{aligned} $$
(50)

from which ∥BR(λ, A)∥≤ 1 and hence ρ(BR(λ, A)) ≤ 1; that is, assumption (A1) is satisfied.

Remark 6.14

Though, as mentioned before, we do not have any explicit characterization of K, [4, Theorem 6.20] ensures that K obtained in Corollary 6.13 satisfies \(K\subset K_{\max }\).

Example 6.15

The strength of Corollary 6.13 lies in the fact that (48) is checked on the domain of the minimal operator.

In particular, for (8), (48) takes the form (9),

$$\displaystyle \begin{aligned} \sum_{n=1}^\infty (-(d_n+b_n)u_n +d_{n+1}u_{n+1}+b_{n-1}u_{n-1}) = 0, {} \end{aligned} $$
(51)

where the rearrangement of the terms in the summation is justified as in (27), and the existence of the solution semigroup follows immediately. Again, by Remark 6.14, the coordinates of the obtained (matrix) semigroup satisfy (8) coordinate-wise.

Another important characterization of (G(t))t≥0 is that it is a minimal semigroup in the following sense.

Proposition 6.16

Let D be a core of A. If \((\bar {G}(t))_{t\geq 0}\) is another positive semigroup generated by an extension of (A + B, D), then \(\bar G(t)\geq G(t)\), t ≥ 0.

This result is used e.g. to clarify the relation between the solution to (10) obtained in Sect. 4.1 and the one constructed using Corollary 6.13—they coincide, see [4, Proposition 7.10].

7 Honesty of the Semigroup and Characterization of Its Generator

The presentation in this section is based on [26], [32, Sections 2.2 & 2.3], [33] and [8, Sections 4.10.2 & 4.10.3].

7.1 The Three Functionals

The way forward comes from the realization that (48) in fact means

$$\displaystyle \begin{aligned} \int\limits_{\Omega}^{}(Au + Bu )\,d\mu =-c(u) \leq 0, \quad u \in D(A)_+,{} \end{aligned} $$
(52)

where c is a nonnegative linear functional on D(A)+ (defined in fact on D(A)). On the other hand, we can define

$$\displaystyle \begin{aligned} \int\limits_{\Omega}Ku \,d\mu &= -\hat c(u), \quad u \in D(K). \end{aligned} $$

Since, by Corollary 6.13, (G(t))t≥0 is contractive, that is, K is dissipative, we have \(\hat c\geq 0\).

The functional c extends to D(K) by monotone limits and by continuity in the graph norm of D(K), [8, Theorem 4.10.6],

$$\displaystyle \begin{aligned} \bar c(R(\lambda, K)f):= \sum_{{n}={0}}^{{\infty}}c\left(R(\lambda, A)(BR(\lambda, A))^nf\right) {} \end{aligned} $$
(53)

and there is \(\beta _\lambda \in X^*_+\) such that

$$\displaystyle \begin{aligned} \hat c(R(\lambda, K)f) - \bar c\left(R(\lambda,K)f\right) = \langle\beta_\lambda, f\rangle. {} \end{aligned} $$
(54)

The functional \(\bar c\) is independent of λ, see [26]. The crucial characterization theorem, combining results of [8, 26, 32] reads as follows.

Theorem 7.17

The following are equivalent:

  1. 1.

    \(K=\overline {A+B}\);

  2. 2.

    β λ ≡ 0 on X for some/all λ > 0;

  3. 3.

    \(\bar c = \hat c\)on D(K);

  4. 4.

    Ker(λI − (A + B)) = {0} for some/all λ > 0.

This result has several important ramifications. First, let us provide a result related to Remark 6.14, given in [32, Theorem 2.3.4].

Theorem 7.18

Let the assumptions of Corollary 6.13 be satisfied. Then

  1. (a)

    If\(K=\overline {A+B}\) , then (G K(t))t≥0is the unique substochastic semigroup whose generator K is an extension of A + B.

  2. (b)

    If\(K\nsupseteq \overline {A+B}\) , then there are infinitely many substochastic semigroups generated by extensions of A + B.

The construction in case (b) is based on the observation that the functional \(0\leq C:= \hat c-\bar c\), defined on D(K), vanishes on \(D(\overline {A+B})\) but is nonzero for \(u\in D(K)\setminus D(\overline {A+B})\). For any fixed f 0 ∈ X + ∖{0} with ∥f 0∥≤ 1, the operator

$$\displaystyle \begin{aligned} \tilde{K} u = Ku + C(u)f_0, \quad u \in D(K), \end{aligned}$$

is the generator of a substochastic semigroup. We observe that both generators have the same domain D(K). However, since \(K\subset K_{\max }\), we see that \(\tilde {K}\nsubseteq K_{\max }\), and thus \(\tilde {K}\) cannot be constructed using Corollary 6.13.

Let us now specify the concept of honesty for the current situation.

Definition 7.19

Let I ⊆ [0, ∞) be an interval and let f ∈ X +.

  1. (a)

    We say that the trajectory {G(t)f}t≥0 is honest on I if, for all s, t ∈ I with s ≤ t, it satisfies

    $$\displaystyle \begin{aligned} \|G(t)f\| = \|G(s)f\| - \int\limits_{s}^{t}\bar c(G(\tau)f)\,d\tau. {} \end{aligned} $$
    (55)
  2. (b)

    The trajectory is called honest if it is honest on [0, ∞).

  3. (c)

    The semigroup (G(t))t≥0 is honest if all its trajectories are honest.

The dishonesty, that is, the amount of mass lost, can be measured by the defect function, which can be defined for any f ∈ X + and s ≤ t by

$$\displaystyle \begin{aligned} \mathcal{d}_f(s,t) = \|G(t)f\| - \|G(s)f\| + \int\limits_{s}^{t}\bar c(G(\tau)f)\,d\tau \leq 0. \end{aligned}$$

Rephrasing Theorem 7.17 in terms of honesty, we obtain

Theorem 7.20

(G(t))t≥0is honest if and only if\(K=\overline {A+B}\).

7.2 Structure of Honesty

Defining a dishonest semigroup as a semigroup that is not honest, we see that for a semigroup to be dishonest it suffices that a single trajectory be dishonest on an arbitrarily short time interval. Let us then have a closer look at the structure of honest and dishonest trajectories. We denote

$$\displaystyle \begin{aligned} H_I:=\{f\in X_+\ :\; \{G(t)f\}_{t\geq 0}\;\text{is honest on } I\} \end{aligned}$$

with H := H [0,∞). It turns out that H I has a nice order structure.

Lemma 7.21

If f ∈ H I and g ∈ X + satisfies g ≤ f, then g ∈ H I.

Proof

Indeed, by the linearity of G(t) and of \(\bar c\), and by the additivity of the norm on the positive cone, for 0 ≤ g ≤ f and s ≤ t we have \(\mathcal {d}_f(s,t) = \mathcal {d}_g(s,t) + \mathcal {d}_{f-g}(s,t)\), where both terms on the right-hand side are nonpositive, as the defect of any nonnegative initial datum is nonpositive. Since f ∈ H I gives \(\mathcal {d}_f(s,t) = 0\) for s, t ∈ I, s ≤ t, we conclude that \(\mathcal {d}_g(s,t) = 0\) for all s, t ∈ I, s ≤ t, and g ∈ H I. □

With some more work we arrive at the full order theoretic characterization of the honesty set.

Theorem 7.22 ([26], [8, Proposition 4.10.22])

For any interval I ⊂ [0, ∞), the set

$$\displaystyle \begin{aligned} \mathcal{H}_I:=\mathit{\text{Span}} H_I = H_I-H_I \end{aligned}$$

is a projection band in X. If I = [a, ∞) for some a ≥ 0, then \(\mathcal {H}_I\) is invariant under (G(t))t≥0.

Then the characterisation of projection bands, [10, Proposition 10.15], yields

Corollary 7.23

There is a measurable set Ω 1 ⊂ Ω such that\(\mathcal {H}_I = L_1(\Omega _1)\).

Another immediate consequence of Theorem 7.22 is

Corollary 7.24

Let (G(t))t≥0 be an irreducible semigroup and a ≥ 0. If H [a,∞) ≠ {0}, then H [a,∞) = X +.

In other words, for irreducible semigroups, either all trajectories are dishonest, or all are honest on [0, ∞). Otherwise, we have

$$\displaystyle \begin{aligned} X = \mathcal{H}\oplus\mathcal{H}^d = L_1(\Omega_1)\oplus L_1(\Omega_2){} \end{aligned} $$
(56)

for some measurable set Ω2; that is, if f ∈ X + satisfies f > 0 on a set whose intersection with Ω 2 has positive measure, then {G(t)f}t≥0 is dishonest on some interval I. In general, we can say very little about the behaviour of trajectories originating from such initial conditions:

  1. (1)

    once a trajectory becomes dishonest, it cannot recover (but it can continue as an honest trajectory if it becomes supported in Ω1),

  2. (2)

    if (G(t))t≥0 is dishonest, then there is an initial condition g ∈ X + such that the trajectory is immediately dishonest.

Proposition 7.25 ([26])

Assume that (G(t))t≥0 is not honest. Then any trajectory {G(t)f}t≥0, where f ∈ X + is such that f > 0 a.e. on Ω 2, see (56), is immediately dishonest.

Corollary 7.26

Let (G(t))t≥0be a dishonest irreducible semigroup. Then all trajectories are dishonest and, moreover, the trajectories originating from positive a.e. initial conditions are immediately dishonest.

More detailed information about the trajectories can be obtained for specific applications using, in particular, probabilistic methods, see [12] for Markov chains, or [20, 21] for fragmentation equations.

7.3 A Hunt for Honest Semigroups

While the above results provide a nice theoretical framework, they do not give a working tool for determining whether a semigroup is honest. There are several approaches to this problem, of which we present one based on the concept of operator extensions. First we note the following consequence of Theorem 7.17.

Theorem 7.27

The semigroup (G K(t))t≥0 is honest if and only if for any u ∈ D(K)+ we have

$$\displaystyle \begin{aligned} \int_{\Omega}Ku \,d\mu \geq -\bar c(u). {} \end{aligned} $$
(57)

The statement follows from the fact that, by (54), \(\hat c\geq \bar c\), and thus (57) implies \(\hat c =\bar c\), giving the honesty by Theorem 7.17, part 3.

Condition (57) may seem useless as a tool for determining the honesty of (G(t))t≥0 since it requires the knowledge of K itself. We note, however, that if we can prove (57) for an extension of K (such as \(K_{\max }\)), then it will hold for K.

Corollary 7.28

If there exists an extension \(\mathcal {K}\) of K and an extension \(\tilde {c}\) of \(\bar c\) from D(K) to \(D(\mathcal {K})\) such that

$$\displaystyle \begin{aligned} \int_{\Omega}\mathcal{K}u \,d\mu \geq - \tilde{c}(u), {}\end{aligned} $$
(58)

for all\(u \in D(\mathcal {K})_+\) , then\(K = \overline {A +B}\).

To illustrate these results, we consider the birth-and-death equation (8), where we assumed b n ≥ 0 for n ≥ 1 and d n ≥ 0 for n ≥ 2. By Example 6.15, there is a unique minimal substochastic semigroup (G(t))t≥0 solving (8) and, by (51), we have \(\bar c\equiv 0.\) Furthermore, since \(K \subset K_{\max }\), for u ∈ D(K)+, telescoping the partial sums of (51) yields

$$\displaystyle \begin{aligned} \int_{\Omega}Ku \,d\mu = \lim\limits_{n\to \infty}\left(d_{n+1}u_{n+1} - b_n u_n\right). {} \end{aligned} $$
(59)

First we look again at the death and birth semigroups.

Example 7.29

It is immediately seen that the death semigroup is always honest as then b n = 0, n ≥ 1, and hence

$$\displaystyle \begin{aligned} \int_{\Omega}Ku \,d\mu \geq 0. \end{aligned}$$

On the other hand, for the birth semigroup

$$\displaystyle \begin{aligned} \int_{\Omega}Ku \,d\mu=- \lim\limits_{n\to \infty} b_nu_n. \end{aligned}$$

The right-hand side is negative if, for instance, \(u= (u_n)_{n\geq 1} = (b_n^{-1})_{n\geq 1}\), and this, by Theorem 7.27, would suffice for showing dishonesty if we could prove that u ∈ D(K). For this we use Lemma 6.8. Considering, for λ > 0,

$$\displaystyle \begin{aligned} \lambda u_1&= -b_1 u_1 ,\\ \lambda u_n &= -b_{n}u_n +b_{n-1}u_{n-1}, \quad n\geq 2, {} \end{aligned} $$
(60)

we see that \(\mathrm {Ker}(\lambda I - K_{\max }) = \{0\}\) and hence \(D(K) = D(K_{\max })\). Now, \(u \in D(K_{\max })\) is characterized by

$$\displaystyle \begin{aligned} \sum\limits_{n=2}^\infty |u_n| < \infty, \quad \sum\limits_{n=2}^\infty |b_{n}u_n -b_{n-1}u_{n-1}|<\infty. \end{aligned}$$

Since for \(u=(b_n^{-1})_{n\geq 1}\) the second sum is identically 0, the birth semigroup is honest if and only if \((b_n^{-1})_{n\geq 1}\notin l_1\), confirming the calculations of Examples 4.2 and 4.3.
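The dichotomy \(\sum b_n^{-1}<\infty \) versus \(\sum b_n^{-1}=\infty \) can also be seen numerically. In the following Python sketch (an added illustration, not part of the original; the truncation sizes and the rates are assumptions), we compute the Laplace-transformed mass defect 1 − λ∥R(λ)e 1∥ of N-state truncations of the birth problem: for b n = 3 n it stabilises at a positive value as N grows (dishonesty), while for b n = n it decays to zero (honesty).

```python
import numpy as np

def mass_defect(b, lam=1.0):
    """1 - lambda * ||R(lambda)e_1||_1 for the N-state truncated birth problem;
    a positive limit as N grows signals a genuine loss of mass (dishonesty)."""
    N = len(b)
    A = -np.diag(b) + np.diag(b[:-1], k=-1)     # truncated birth generator
    u = np.linalg.solve(lam * np.eye(N) - A, np.eye(N)[0])
    return 1.0 - lam * u.sum()

# Explosive rates b_n = 3^n (sum 1/b_n < infinity): the defect stabilises
# at a positive value as N grows.
d1 = [mass_defect(np.array([3.0 ** n for n in range(1, N + 1)])) for N in (5, 10, 20)]
# Slow rates b_n = n (sum 1/b_n = infinity): the defect tends to 0.
d2 = [mass_defect(np.array([float(n) for n in range(1, N + 1)])) for N in (5, 20, 80)]

assert d1[-1] > 0.6 and abs(d1[-1] - d1[-2]) < 1e-3
assert d2[0] > d2[-1] and d2[-1] < 0.05
```

In this truncation the defect equals \(\prod _{r=1}^{N} b_r/(\lambda + b_r)\), which converges to a positive limit exactly when \((b_n^{-1})_{n\geq 1}\in l_1\), in agreement with the criterion just obtained.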

A similar, but obviously more involved, argument can be used to prove the following fundamental results for the full birth-and-death equation, that is, (8) with b n > 0 for n ≥ 1 and d n > 0 for n ≥ 2. They go back, in a slightly weaker form, to [29] and have been proved in [4, Section 7.4] by semigroup tools.

Theorem 7.30

\(K = {} \overline {A+B}\) if and only if

$$\displaystyle \begin{aligned} \sum_{{n}={0}}^{{\infty}}\frac{1}{b_n}\left( \sum_{{i}={0}}^{{\infty}}\prod_{j=1}^{i}\frac{d_{n+j}}{b_{n+j}}\right)=+\infty {} \end{aligned} $$
(61)

(where we put \(\prod _{j=1}^{0}=1\) ).

Theorem 7.31

\(K \neq { K_{\max }}\) if and only if

$$\displaystyle \begin{aligned} \sum_{{n}={1}}^{{\infty}}\frac{1}{d_n}\prod_{j=1}^{n-1}\frac{b_j}{d_j} \left(\sum_{{i}={0}}^{{n-1}}\prod_{r=1}^{i}\frac{d_r}{b_r}\right) <+\infty {}, \end{aligned} $$
(62)

where, as before,\(\prod _{j=1}^{0} =1\).

By playing with the coefficients, we can construct all cases listed in Sect. 5, see [4, Section 7.4]. In particular, \(\overline {K_{\min }}\nsubseteq K \nsubseteq K_{\max }\) (Case 5) occurs as a combination of (10) and (16), that is, for (8) with b n = 2 ⋅ 3n and d n = 3n. The full example has been thoroughly analysed in [12, Sections 2.4.10–16 & 3.4.2–6].
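For these particular coefficients the ratios d n∕b n = 1∕2 make both series computable in closed form: starting the sums from n = 1 (the choice of the starting index does not affect convergence), (61) reduces to \(\sum 3^{-n} = 1/2\) and (62) to \(\sum ((2/3)^n - 3^{-n}) = 3/2\). A quick numerical sanity check in Python (an added illustration, not part of the original; the truncation depths are assumptions):

```python
def b(n):
    return 2.0 * 3.0 ** n      # birth rates of the Case 5 example

def d(n):
    return 3.0 ** n            # death rates

def term61(n, imax=200):
    """n-th term of (61): (1/b_n) * sum_{i>=0} prod_{j=1}^{i} d_{n+j}/b_{n+j}."""
    s, p = 0.0, 1.0
    for i in range(1, imax):
        s += p
        p *= d(n + i) / b(n + i)
    return s / b(n)

def term62(n):
    """n-th term of (62): (1/d_n) prod_{j=1}^{n-1} b_j/d_j * sum_{i=0}^{n-1} prod_{r=1}^{i} d_r/b_r."""
    prod_bd = 1.0
    for j in range(1, n):
        prod_bd *= b(j) / d(j)
    s, p = 0.0, 1.0
    for i in range(n):
        s += p
        p *= d(i + 1) / b(i + 1)
    return prod_bd * s / d(n)

S61 = sum(term61(n) for n in range(1, 60))   # finite -> (61) fails -> K != closure(A+B)
S62 = sum(term62(n) for n in range(1, 60))   # finite -> (62) holds -> K != K_max
assert abs(S61 - 0.5) < 1e-6 and abs(S62 - 1.5) < 1e-6
```

Both partial sums stabilise at finite values, so by Theorems 7.30 and 7.31 the generator K is neither \(\overline {A+B}\) nor \(K_{\max }\), consistent with Case 5.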

Next we use the results of Sect. 7.2 to provide a more precise description of the birth-and-death dynamics.

Proposition 7.32

If the minimal birth-and-death semigroup (G(t))t≥0 is dishonest, then all trajectories are dishonest and the trajectories emanating from strictly positive initial conditions are immediately dishonest. Moreover, for such initial conditions, t → ∥G(t)f∥ is strictly decreasing.

Proof

First we observe that the semigroup (G(t))t≥0 is irreducible. Indeed, by e.g. [10, Example 14.11], it is sufficient to show that (R(λ, K)f)n > 0 for some λ > 0 and all n ≥ 1, whenever 0 ≠ f ≥ 0. We use the representation (45), observing, similarly to Example 6.9, that the entries of each term \(R(\lambda , A)(BR(\lambda , A))^k\) are sums of products of the coefficients b j and d j divided by products of the factors λ + b j + d j (where, recall, d 1 = b 0 = 0), and hence are nonnegative. Let u r = (δ jr)j≥1 for some r ≥ 1. Then, since

$$\displaystyle \begin{aligned} R(\lambda, K)\geq \sum\limits_{k=0}^NR(\lambda, A)(BR(\lambda, A))^k \end{aligned}$$

for any N, (R(λ, K)u r)n > 0 for all \(n = \max \{1, r-N\}, \ldots, r+N\). Since N is arbitrary and any 0 ≠ f ≥ 0 satisfies f ≥ αu r for some r ≥ 1 and α > 0, the statement is proved.

Then the first two statements of the proposition follow from Corollaries 7.24 and 7.26. To prove the last one, we observe that since G(t) ≥ G A(t), where (G A(t))t≥0 is the semigroup generated by the diagonal operator A, for such f we have G(t)f > 0 for any t ≥ 0, and thus the claim follows again from Corollary 7.26. □

8 Conclusion

In this paper we have discussed a functional analytic explanation of seemingly pathological properties of some dynamical systems, such as the breach of conservation laws and the non-uniqueness of solutions, using a class of Kolmogorov equations as an example. In the text we mentioned that similar phenomena occur, and can be explained in the same framework, in fragmentation processes, see [4, 8]. These examples certainly do not exhaust the list of cases in which such behaviour occurs. The existence of multiple solutions to the Cauchy problem has been observed in parabolic problems in non-smooth domains, [1]. The breach of conservation laws in Markov processes has been studied at least since the work of Feller [18, 19], and recently it has been given a thorough overview and update in [12]. It is not restricted, however, to Markov chains but occurs in transport equations, see [9], as well as in diffusion problems, see [25]. The latter paper is mostly concerned with the behaviour of the semigroup in the space of continuous functions where, by duality, the semigroup is conservative if the constant function 1 is invariant under its action. It is worthwhile to note that this property has been extensively studied in the context of quantum dynamical semigroups, [14].