1 Introduction

Markov chains are popular probabilistic models for describing the behaviour of dynamical systems under uncertainty. The crucial simplifying assumption in these models is that the probabilities describing the system’s future behaviour are conditionally independent of its past behaviour, given that we know the current state of the system; this is the canonical Markov property.

It is this Markov assumption that makes the parametrisation of a Markov chain relatively straightforward—indeed, as we will discuss in Sect. 2, the uncertain dynamic behaviour is then completely characterised by a transition matrix T, whose elements \(T(x_n,x_{n+1})= \mathrm {P}(X_{n+1}=x_{n+1}\vert X_n=x_n)\) describe the probabilities that the system will transition from any state \(x_n\) at time n, to any state \(x_{n+1}\) at time \(n+1\). Note that T itself is independent of the time n; this is the additional assumption of time homogeneity that is often imposed implicitly in this context. An important advantage of these assumptions is that the resulting matrix T can be used to solve various important inference problems, using one of the many available efficient algorithms.

In many cases however, the numerical value of the transition matrix T may not be known exactly; that is, there may be additional (higher-order) uncertainty about the model itself. Moreover, it can be argued that simplifying assumptions like the Markov property and time homogeneity are often unrealistic in practice. It is of interest, then, to compute inferences in a manner that is robust; both to violations of such simplifying assumptions, and to variations in the numerical values of the transition probabilities.

The theory of imprecise probabilities allows us to describe such additional uncertainties by using, essentially, sets of traditional (“precise”) models. In particular, such a set is comprised of all the models that we deem “plausible”; for instance, we may include all Markov chains whose characterising transition matrix T is included in some given set \(\mathscr {T}\) of transition matrices. In this way we can also include non-homogeneous Markov chains, by simply requiring that their (now time-dependent) transition matrices remain in \(\mathscr {T}\). Moreover, we can even include non-Markovian models in such a set. This leads to the notion of an imprecise Markov chain. The robust inferences that we are after, are then the tightest possible lower and upper bounds on the inferences computed for each of the included precise models.

In this work, we present an efficient algorithm for solving a large class of these inferences within imprecise Markov chains. Broadly speaking, this class consists of inferences that depend on the uncertain state of the system at a finite number of time instances, and which can be decomposed in a particular recursive form. As we will discuss, it contains as special cases the (joint) probabilities of sequences of states; the hitting probabilities and expected hitting times of subsets of the possible states; and time averages of functions of the state of the system.

Interestingly, existing algorithms for some of these inferences turn out to correspond to special cases of our algorithm, giving our algorithm a unifying character. Time averages, for example, were already considered in [9], and some of the results in [8]—a theoretical study of lower and upper expected hitting times and probabilities—can be interpreted as special cases of the algorithm presented here. Readers who are familiar with recursive algorithms for credal networks under epistemic irrelevance [1, 2, 4] might also recognise some of what we do; in fact, many of the ideas behind our algorithm have previously been discussed in this more general context [2, Chapter 7].

In order to adhere to the page limit, all proofs have been relegated to the appendix of an online extended version [13].

2 Preliminaries

We denote the natural numbers, without 0, by \(\mathbb {N}\), and let \(\mathbb {N}_{0}:= \mathbb {N}\cup \{0\}\). The set of positive real numbers is denoted by \(\mathbb {R}_{> 0}\) and the set of non-negative real numbers by \(\mathbb {R}_{\ge 0}\). Throughout, we let \(\mathbb {I}_{A}\) denote the indicator of any subset \(A\subseteq \mathscr {Y}\) of a set \(\mathscr {Y}\); so, for any \(y\in \mathscr {Y}\), \(\mathbb {I}_{A}(y):= 1\) if \(y\in A\) and \(\mathbb {I}_{A}(y):= 0\) otherwise.

Before we can introduce the notion of an imprecise Markov chain, we first need to discuss general (non-Markovian) stochastic processes. These are arguably most commonly formalised using a measure-theoretic approach; however, the majority of our results do not require this level of generality, and so we will keep the ensuing introduction largely intuitive and informal.

Let us start by considering the realisations of a stochastic process. At each point in time \(n\in \mathbb {N}\), such a process is in a certain state \(x_n\), which is an element of a finite non-empty state space \(\mathscr {X}\). A realisation of the process is called a path, and is an infinite sequence \(\omega = x_1 x_2 x_3 \cdots \) where, at each discrete time point \(n\in \mathbb {N}\), \(\omega _n:= x_n\in \mathscr {X}\) is the state of the process at time n on the path \(\omega \). So, we can interpret any path as a map \(\omega :\mathbb {N}\rightarrow \mathscr {X}\), allowing us to collect all paths in the set \(\varOmega := \mathscr {X}{}^{\mathbb {N}}\). Moreover, for any \(\omega \in \varOmega \) and any \(m,n\in \mathbb {N}\) with \(m\le n\), we use the notation \(\omega _{m:n}\) to denote the finite sequence of states \(\omega _m\cdots \omega _n\in \mathscr {X}^{n-m+1}\).

A stochastic process is now an infinite sequence \(X_1X_2X_3\cdots \) of uncertain states where, for all \(n\in \mathbb {N}\), the uncertain state at time n is a function of the form \(X_n:\varOmega \rightarrow \mathscr {X}:\omega \mapsto \omega _n\). Similarly, we can consider finite sequences of such states where, for all \(m,n\in \mathbb {N}\) with \(m\le n\), \(X_{m:n}:\varOmega \rightarrow \mathscr {X}^{n-m+1}:\omega \mapsto \omega _{m:n}\). These states are uncertain in the sense that we do not know which realisation \(\omega \in \varOmega \) will obtain in reality; rather, we assume that we have assessments of the probabilities \(\mathrm {P}(X_{n+1} = x_{n+1}\vert X_{1:n} = x_{1:n})\), for any \(n \in \mathbb {N}\) and any \(x_{1:n} \in \mathscr {X}^{n}\). Probabilities of this form tell us something about which state the process might be in at time \(n+1\), given that we know that at time points 1 through n, it followed the sequence \(x_{1:n}\). Moreover, we can consider probabilities of the form \(\mathrm {P}(X_1=x_1)\) for any \(x_1\in \mathscr {X}\); this tells us something about the state that the process might start in. It is well known that, taken together, these probabilities suffice to construct a global probability model for the entire process \(X_1X_2X_3\cdots \), despite each assessment only being about a finite subsequence of the states; see e.g. the discussion surrounding [7, Theorem 5.16] for further details on formalising this in a proper measure-theoretic setting. We simply use \(\mathrm {P}\) to denote this global model.

Once we have such a global model \(\mathrm {P}\), we can talk about the inferences in which we are interested. In general, these are typically encoded by functions \(f:\varOmega \rightarrow \mathbb {R}\) of the unknown realisation \(\omega \in \varOmega \), and we collect all functions of this form in the set \(\mathscr {L}(\varOmega )\). Computing such an inference consists in evaluating the (conditional) expected value \(\mathrm {E}{}_\mathrm {P}(f \vert \,C)\) of f with respect to the model \(\mathrm {P}\), where \(C\subseteq \varOmega \) is an event of the form \(X_{m:n} = x_{m:n}\) with \(m,n \in \mathbb {N}{}\) such that \(m \le n\). In particular, if \(\mathrm {P}\) is a global model in the measure-theoretic sense, then under some regularity conditions like the measurability of f, we would be interested in computing the quantity \(\mathrm {E}{}_\mathrm {P}(f\vert \,C) := \int _{\varOmega }f(\omega )\,\mathrm {d}\mathrm {P}(\omega \vert C)\). For notational convenience, we will also use \(X_{1:0} := \varOmega {}\) as a trivial conditioning event, allowing us to regard unconditional expectations as a special case of conditional ones.

A special type of inference that will play an important role in the remainder of this work is that for which the function f only depends on a finite subsequence of the path \(\omega \), thereby vastly simplifying the computation of its expectation. In particular, if an inference only depends on the states at time points m through n, say, then it can always be represented by a function \(f:\mathscr {X}^{n-m+1}\rightarrow \mathbb {R}\) evaluated in the uncertain states \(X_{m:n}\); specifically, the inference is represented by the composition \(f\circ X_{m:n}\), which we will denote by \(f(X_{m:n})\). In the sequel, we will call a composite function of this form finitary. Moreover, for any \(n\in \mathbb {N}\), we denote by \(\mathscr {L}(\mathscr {X}^n)\) the set of all functions of the form \(f:\mathscr {X}^n\rightarrow \mathbb {R}\), and we write \(\mathscr {L}_{\mathrm { fin}}{}(\varOmega {})\subset \mathscr {L}(\varOmega )\) for the set of all finitary functions. For a finitary function \(f(X_{1:n})\), the computation of its expected value reduces to evaluating the finite sum

$$\begin{aligned} \mathrm {E}{}_\mathrm {P}(f(X_{1:n})\vert \,C) = \sum _{x_{1:n}\in \mathscr {X}^{n}}f(x_{1:n})\mathrm {P}(X_{1:n}=x_{1:n}\vert \,C)\,. \end{aligned}$$

Let us now move on from the discussion about general uncertain processes, to the special case of Markov chains. An uncertain process \(\mathrm {P}\) is said to satisfy the Markov property if, for all \(n\in \mathbb {N}\) and all \(x_{1:n+1}\in \mathscr {X}^{n+1}\), the aforementioned probability assessments simplify in the sense that

$$\begin{aligned} \mathrm {P}(X_{n+1}=x_{n+1}\vert X_{1:n}=x_{1:n}) = \mathrm {P}(X_{n+1}=x_{n+1}\vert X_n=x_n)\,. \end{aligned}$$

A process that satisfies this Markov property is called a Markov chain. Thus, for a Markov chain, the probability that it will visit state \(x_{n+1}\) at time \(n+1\) is independent of the states \(X_{1:n-1}\), given that we know the state \(X_n\) at time n. If the process is moreover homogeneous, meaning that \(\mathrm {P}(X_{n+1}=y\vert X_n=x)=\mathrm {P}(X_{2}=y\vert X_1=x)\) for all \(x,y\in \mathscr {X}\) and all \(n\in \mathbb {N}\), then the parameterisation of the process becomes exceedingly simple. Indeed, up to the initial distribution \(\mathrm {P}(X_1)\)—a probability mass function on \(\mathscr {X}\)—the process’ behaviour is then fully characterised by a single \(|\mathscr {X}|\times |\mathscr {X}|\) matrix T that is called the transition matrix. It is row-stochastic (meaning that, for all \(x\in \mathscr {X}\), the x-th row \(T(x,\cdot )\) of T is a probability mass function on \(\mathscr {X}\)) and its entries satisfy \(T(x,y) = \mathrm {P}(X_{n+1}=y\vert X_n=x)\) for all \(x,y\in \mathscr {X}\) and \(n\in \mathbb {N}\). The usefulness of this representation comes from the fact that we can interpret T as a linear operator on the vector space \(\mathscr {L}{(\mathscr {X})}\simeq \mathbb {R}^{|\mathscr {X}|}\), due to the assumption that \(\mathscr {X}\) is finite. For \(f\in \mathscr {L}{(\mathscr {X})}\), this allows us to write the conditional expectation of \(f(X_{n+1})\) given \(X_n\) as a matrix-vector product: for any \(x\in \mathscr {X}\), \(\mathrm {E}{}_\mathrm {P}(f(X_{n+1})\vert X_n=x)\) equals

$$\begin{aligned} \sum _{y\in \mathscr {X}} f(y)\mathrm {P}(X_{n+1}=y\vert X_n=x) = \sum _{y\in \mathscr {X}} f(y)T(x,y) = \bigl [Tf\bigr ](x). \end{aligned}$$
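
For concreteness, here is a minimal numerical sketch of this matrix-vector view (the two-state chain and the function f are invented for illustration):

```python
import numpy as np

# A toy transition matrix on a two-state space; each row is a
# probability mass function, so each row sums to one.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# A function f on the state space, represented as a vector.
f = np.array([1.0, -1.0])

# [Tf](x) = sum_y T(x, y) f(y) is the conditional expectation of
# f(X_{n+1}) given X_n = x, computed for all x at once.
print(T @ f)  # [ 0.8 -0.4]
```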

3 Imprecise Markov Chains

Let us now move on to the discussion about imprecise Markov chains. Here, we additionally include uncertainty about the model specification, such as uncertainty about the numerical values of the probabilities \(\mathrm {P}(X_{n+1}\vert X_{1:n})\), and about the validity of structural assessments like the Markov property.

We will start this discussion by considering the parameterisation of such an imprecise Markov chain. We first consider the (imprecise) initial model \(\mathcal {M}\); this is simply a non-empty set of probability mass functions on \(\mathscr {X}\) that we will interpret as containing those probability mass functions that we deem to plausibly describe the uncertain initial state of the process. Next, instead of being described by a single transition matrix T, an imprecise Markov chain’s dynamic behaviour is characterised by an entire set \(\mathscr {T}\) of transition matrices. So, each element \(T\in \mathscr {T}\) is a row-stochastic \(|\mathscr {X}|\times |\mathscr {X}|\) matrix. In the sequel, we will take \(\mathscr {T}\) to be fixed, and assume that it is non-empty and has separately specified rows. This last property is instrumental in ensuring that computations can be performed efficiently, and is therefore often adopted in the literature; see e.g. [6] for further discussion. For our present purposes, it suffices to know that it means that \(\mathscr {T}\) can be completely characterised by providing, for each \(x\in \mathscr {X}\), a non-empty set \(\mathscr {T}_x\) of probability mass functions on \(\mathscr {X}\); \(\mathscr {T}\) is then the set of all row-stochastic \(|\mathscr {X}|\times |\mathscr {X}|\) matrices T such that, for all \(x\in \mathscr {X}\), the x-th row \(T(x,\cdot )\) is an element of \(\mathscr {T}_x\).

Given the sets \(\mathcal {M}\) and \(\mathscr {T}\), the corresponding imprecise Markov chain is defined as the largest set \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\) of stochastic processes that are in a specific sense compatible with both \(\mathcal {M}\) and \(\mathscr {T}\). In particular, a model \(\mathrm {P}\) is said to be compatible with \(\mathcal {M}\) if \(\mathrm {P}(X_1)\in \mathcal {M}\), and it is said to be compatible with \(\mathscr {T}\) if, for all \(n\in \mathbb {N}\) and all \(x_{1:n}\in \mathscr {X}^{n}\), there is some \(T\in \mathscr {T}\) such that

$$\begin{aligned} \mathrm {P}(X_{n+1}=x_{n+1}\vert X_{1:n}=x_{1:n}) = T(x_n,x_{n+1}) \text { for all } x_{n+1} \in \mathscr {X}{}. \end{aligned}$$

Notably, therefore, \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\) contains all the (precise) homogeneous Markov chains whose characterising transition matrix T is included in \(\mathscr {T}\), and whose initial distribution \(\mathrm {P}(X_1)\) is included in \(\mathcal {M}\). However, in general, \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\) clearly also contains models that do not satisfy the Markov property, as well as Markov chains that are not homogeneous.

For such an imprecise Markov chain, we are interested in computing inferences that are in a specific sense robust with respect to variations in the set \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\). Specifically, for any function of interest \(f:\varOmega \rightarrow \mathbb {R}\), we consider its (conditional) lower and upper expectations, which are respectively defined by

$$\begin{aligned} \underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f \vert \,C) := \inf _{\mathrm {P} \in \mathscr {P}_{\mathcal {M},\mathscr {T}{}}} \mathrm {E}{}_\mathrm {P}(f \vert \,C) \quad \text {and}\quad \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f \vert \,C) := \sup _{\mathrm {P} \in \mathscr {P}_{\mathcal {M},\mathscr {T}{}}} \mathrm {E}{}_\mathrm {P}(f \vert \,C)\,. \end{aligned}$$

In words, we are interested in computing the tightest possible bounds on the inferences computed for each \(\mathrm {P} \in \mathscr {P}_{\mathcal {M},\mathscr {T}{}}\). These lower and upper expectations are related through conjugacy, meaning that \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f \vert \,C) = -\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(-f \vert \,C)\), so it suffices to consider only the upper expectations in the remaining discussion; any results for lower expectations follow analogously through this relation.

From a computational point of view, it is also useful to consider the dual representation of the set \(\mathscr {T}\), given by the upper transition operator \(\overline{T}\) with respect to this set [5, 6]. This is a (non-linear) operator that maps \(\mathscr {L}(\mathscr {X})\) into \(\mathscr {L}(\mathscr {X})\); it is defined for any \(f\in \mathscr {L}(\mathscr {X})\) and any \(x\in \mathscr {X}\) as

$$\begin{aligned} \bigl [\overline{T}f\bigr ](x) := \sup _{T(x,\cdot )\in \mathscr {T}_x} \sum _{y\in \mathscr {X}}T(x,y)f(y) . \end{aligned}$$

So, in order to evaluate \(\bigl [\overline{T}f\bigr ](x)\), one must solve an optimisation problem over the set \(\mathscr {T}_x\) containing the x-th rows of the elements of \(\mathscr {T}\). In many practical cases, the set \(\mathscr {T}_x\) is closed and convex, and evaluating \(\bigl [\overline{T}f\bigr ](x)\) is therefore relatively straightforward: for instance, if \(\mathscr {T}_x\) is described by a finite number of (in)equality constraints, then this problem reduces to a simple linear programming task, which can be solved by standard techniques. We will also make use of the conjugate lower transition operator \(\underline{T}\), defined by \([\underline{T}\,f](x):= -[\overline{T}(-f)](x)\) for all \(x\in \mathscr {X}\) and all \(f\in \mathscr {L}(\mathscr {X})\). Results about upper transition operators translate to results about lower transition operators through this relation; we will focus on the former in the following discussion.
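
To make this concrete, the following minimal sketch evaluates \(\bigl [\overline{T}f\bigr ](x)\) by linear programming, for hypothetical row sets \(\mathscr {T}_x\) given by interval bounds on the transition probabilities; all numerical values are invented:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical row-wise bounds: row x of T must satisfy
# lo[x] <= T(x, .) <= hi[x] elementwise, and sum to one.
lo = np.array([[0.80, 0.05],
               [0.20, 0.60]])
hi = np.array([[0.95, 0.20],
               [0.40, 0.80]])

def upper_T(f):
    """Evaluate [Tbar f](x) for every state x by maximising
    sum_y T(x, y) f(y) over the row set T_x: a linear programme."""
    n = len(f)
    out = np.empty(n)
    for x in range(n):
        # linprog minimises, so we minimise -f to maximise f.
        res = linprog(-f, A_eq=np.ones((1, n)), b_eq=[1.0],
                      bounds=list(zip(lo[x], hi[x])), method="highs")
        out[x] = -res.fun
    return out

f = np.array([1.0, -1.0])
print(upper_T(f))  # [ 0.9 -0.2]
```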

Now, the operator \(\overline{T}\) can be used for computing upper expectations in much the same way as transition matrices are used for computing expectations with respect to precise Markov chains: for any \(n \in \mathbb {N}{}\), any finitary function \(f(X_{n+1})\) and any \(x_{1:n}\in \mathscr {X}^{n}\) it holds that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{n+1}) \vert X_{1:n}=x_{1:n}) = \bigl [\overline{T}f\bigr ](x_n)\,. \end{aligned}$$
(1)

Observe that the right-hand side in this expression does not depend on the history \(x_{1:n-1}\); this can be interpreted as saying that the model satisfies an imprecise Markov property, which explains why we call our model an “imprecise Markov chain”. Moreover, a slightly more general property holds that will be useful later on:

Proposition 1

Consider the imprecise Markov chain \(\mathscr {P}_{\mathcal {M},\mathscr {T}}\). For any \(m,n \in \mathbb {N}\) such that \(m \le n\), any function \(f \in \mathscr {L}{}(\mathscr {X}{}^{n - m +1})\) and any \(x_{1:m-1}\in \mathscr {X}^{m-1}\) and \(y\in \mathscr {X}\), we have that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\big (f(X_{m:n}) \big \vert X_{1:m-1} = x_{1:m-1}, X_m=y \big ) = \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\big (f(X_{1:n-m+1}) \big \vert X_{1} = y \big ). \end{aligned}$$

Finally, we remark that, for any \(m,n \in \mathbb {N}\) such that \(m \le n\), a conditional upper expectation \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\bigl (f \big \vert X_{m:n}\bigr )\) is itself a (finitary) function depending on the states \(X_{m:n}\). Using this observation, we can now introduce the law of iterated upper expectations, which will form the basis of the algorithms developed in the following sections:

Theorem 1

Consider the imprecise Markov chain \(\mathscr {P}_{\mathcal {M},\mathscr {T}}\). For all \(m \in \mathbb {N}_{0}\), all \(k \in \mathbb {N}\) and all \(f \in \mathscr {L}_{\mathrm { fin}}{}(\varOmega {})\), we have that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\bigl (f \big \vert X_{1:m} \bigr ) = \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\Big (\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\bigl (f \big \vert X_{1:m+k} \bigr ) \Big \vert X_{1:m} \Big ). \end{aligned}$$

4 A Recursive Inference Algorithm

In principle, for any function \(f\in \mathscr {L}{}(\mathscr {X}{}^n)\) with \(n \in \mathbb {N}{}\), the upper expectation of \(f(X_{1:n})\) can be obtained by maximising \(\mathrm {E}{}_\mathrm {P}(f(X_{1:n}))\) over the set \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\) of all precise models \(\mathrm {P}\) that are compatible with \(\mathcal {M}\) and \(\mathscr {T}{}\). Since this will almost always be infeasible if n is large, we usually apply the law of iterated upper expectations in combination with the Markov property in order to divide the optimisation problem into multiple smaller ones. Indeed, because of Theorem 1, we have that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})) = \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\Big (\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_{1:n-1}) \Big ). \end{aligned}$$

Using Eq. (1), one can easily show that \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_{1:n-1})\) can be computed by evaluating \([\overline{T}{} f(x_{1:n-1} \cdot )](x_{n-1})\) for all \(x_{1:n-1} \in \mathscr {X}{}^{n-1}\). Here, \(f(x_{1:n-1} \cdot )\) is the function in \(\mathscr {L}{}(\mathscr {X}{})\) that takes the value \(f(x_{1:n})\) on \(x_n \in \mathscr {X}{}\). This accounts for \({\vert \mathscr {X}{} \vert }^{n-1}\) optimisation problems to be solved. With the acquired function \(f'(X_{1:n-1}) := \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_{1:n-1})\), we can then compute the upper expectation \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f'(X_{1:n-1})\vert X_{1:n-2})\) in a similar way, by solving \({\vert \mathscr {X}{} \vert }^{n-2}\) optimisation problems. Continuing in this way, we end up with a function that only depends on \(X_1\) and for which the expectation needs to be maximised over the initial models in \(\mathcal {M}\). Hence, in total, \(\sum _{i = 0}^{n-1} {\vert \mathscr {X}{} \vert }^{i}\) optimisation problems need to be solved in order to obtain \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n}))\). Although these optimisation problems are relatively simple and therefore feasible to solve individually, their total number is exponential in n, making the computation of \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n}))\) intractable when n is large.
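
To illustrate why this naive scheme does not scale, the following sketch (with invented, finite row sets that are separately specified, so that each row-wise optimisation is just a maximum over candidate rows) performs the backward elimination explicitly, enumerating all histories at each step:

```python
import itertools
import numpy as np

# Finite, separately specified row sets: rows[x] is the list of
# candidate probability mass functions T(x, .). (Invented numbers.)
rows = [[np.array([0.9, 0.1]), np.array([0.7, 0.3])],
        [np.array([0.3, 0.7]), np.array([0.5, 0.5])]]
states = range(len(rows))

def upper_T_at(x, f):
    # [Tbar f](x): maximise sum_y T(x, y) f(y) over the rows in T_x.
    return max(r @ f for r in rows[x])

def naive_upper(f, n):
    """Backward elimination of X_n, ..., X_2: maps each x1 to the
    upper expectation of f(X_{1:n}) given X_1 = x1, solving
    |X|^(n-1) + ... + |X| optimisation problems along the way."""
    g = f  # maps length-k histories (tuples) to reals; initially k = n
    for k in range(n, 1, -1):
        g = {hist: upper_T_at(hist[-1],
                              np.array([g[hist + (y,)] for y in states]))
             for hist in itertools.product(states, repeat=k - 1)}
    return g

# Example: f(x_{1:3}) = 1 if the process stays in state 0, else 0.
f = {h: float(all(x == 0 for x in h))
     for h in itertools.product(states, repeat=3)}
print(naive_upper(f, 3))  # {(0,): 0.81, (1,): 0.0}
```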

In many cases, however, \(f(X_{1:n})\) can be recursively decomposed in a specific way allowing for a much more efficient computational scheme to be employed; see Theorem 2 further on. Before we present this scheme in full generality, let us first provide some intuition about its basic working principle.

So assume we are interested in \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n}))\), which, according to Theorem 1, can be obtained by maximising \(\mathrm {E}{}_{\mathrm {P}}(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_1))\) over \(\mathrm {P}(X_1)\in \mathcal {M}\). The problem then reduces to the question of how to compute \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_1)\) efficiently. Suppose now that \(f(X_{1:n})\) takes the following form:

$$\begin{aligned} f(X_{1:n})=g(X_1)+h(X_1)\tau (X_{2:n}), \end{aligned}$$
(2)

for some \(g, h \in \mathscr {L}{}(\mathscr {X}{})\) and some \(\tau \in \mathscr {L}{}(\mathscr {X}{}^{n-1})\). Then, because \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\) is a supremum over linear expectations, we find that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_1) = g(X_1)+h(X_1) \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau (X_{2:n})\vert X_1), \end{aligned}$$

where, for the sake of simplicity, we assumed that h does not take negative values. Then, by appropriately combining Proposition 1 with Theorem 1, one can express \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau (X_{2:n})\vert X_1)\) in terms of \(\overline{\varUpsilon }:\mathscr {X}\rightarrow \mathbb {R}\), defined by

$$\begin{aligned} \overline{\varUpsilon }(x):=\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau (X_{1:n-1})\vert X_1=x)\text { for all }x \in \mathscr {X}{}. \end{aligned}$$

In particular, we find that

$$\begin{aligned} \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau (X_{2:n})\vert X_1) = \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\big (\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau (X_{2:n})\vert X_{1:2}) \vert X_1\big )&= \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\big (\overline{\varUpsilon }(X_2)\vert X_1\big ) \\&= [\overline{T}{} \, \overline{\varUpsilon }](X_1), \end{aligned}$$

where the equalities follow from Theorem 1, Proposition 1 and Eq. (1), respectively. So \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_{1:n})\vert X_1)\) can be obtained from \(\overline{\varUpsilon }\) by solving a single optimisation problem, followed by a pointwise multiplication and summation.

Now, by repeating the structural assessment (2) in a recursive way, we can generate a whole class of functions whose upper expectations can be computed using the principle illustrated above. We start with a function \(\tau _1(X_1)\), with \(\tau _1\in \mathscr {L}{}(\mathscr {X}{})\), that only depends on the initial state. The upper expectation \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _1(X_1) \vert X_1)\) is then trivially equal to \(\tau _1(X_1)\). Next, consider \(\tau _2(X_{1:2})=g_1(X_1)+h_1(X_1)\tau _1(X_{2})\) for some \(g_1, h_1\) in \(\mathscr {L}{}(\mathscr {X}{})\). The upper expectation \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _2(X_{1:2}) \vert X_1)\) is then given by \(g_1(X_1)+h_1(X_1) [\overline{T}{}\,\overline{\varUpsilon }_1](X_{1})\), where we let \(\overline{\varUpsilon }_1(x) := \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _1(X_1) \vert X_1=x)=\tau _1(x)\) for all \(x\in \mathscr {X}\) and where we (again) neglect the subtlety that \(h_1\) can take negative values. Continuing in this way, step by step considering new functions constructed by multiplication and summation with functions that depend on an additional time instance, and no longer ignoring the fact that the functions involved can take negative values, we end up with the following result.

Theorem 2

Consider any imprecise Markov chain \(\mathscr {P}_{\mathcal {M},\mathscr {T}{}}\) and two sequences of functions \(\{g_n\}_{n \in \mathbb {N}_{0}{}}\) and \(\{h_n\}_{n \in \mathbb {N}{}}\) in \(\mathscr {L}{}(\mathscr {X}{})\). Define \(\tau _1(x_1) := g_0(x_1)\) for all \(x_1 \in \mathscr {X}\), and for all \(n\in \mathbb {N}\), let

$$\begin{aligned} \tau _{n+1}(x_{1:n+1}) := h_n(x_1) \tau _{n}(x_{2:n+1})+ g_n(x_1)&\text { for all } x_{1:n+1} \in \mathscr {X}{}^{n+1}. \end{aligned}$$

If we write \(\{\overline{\varUpsilon }_n\}_{n \in \mathbb {N}{}}\) and \(\{\underline{\varUpsilon }_n\}_{n \in \mathbb {N}{}}\) to denote the sequences of functions in \(\mathscr {L}{}(\mathscr {X}{})\) that are respectively defined by \(\overline{\varUpsilon }_n(x) := \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}) \vert X_1 = x)\) and \(\underline{\varUpsilon }_n(x) := \underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}) \vert X_1 = x)\) for all \(x \in \mathscr {X}{}\) and all \(n \in \mathbb {N}{}\), then \(\{\overline{\varUpsilon }_n\}_{n \in \mathbb {N}{}}\) and \(\{\underline{\varUpsilon }_n\}_{n \in \mathbb {N}{}}\) satisfy the following recursive expressions:

$$\begin{aligned} {\left\{ \begin{array}{ll} \overline{\varUpsilon }_1 = \underline{\varUpsilon }_1 = g_0;\\ \overline{\varUpsilon }_{n+1} = h_n \mathbb {I}_{h_n \ge 0}[\overline{T}{} \, \overline{\varUpsilon }_{n}] + h_n \mathbb {I}_{h_n< 0}[\underline{T}{} \, \underline{\varUpsilon }_{n}] + g_n \text { for all } n \in \mathbb {N}{};\\ \underline{\varUpsilon }_{n+1} = h_n \mathbb {I}_{h_n \ge 0}[\underline{T}{} \, \underline{\varUpsilon }_{n}] + h_n \mathbb {I}_{h_n < 0}[\overline{T}{} \, \overline{\varUpsilon }_{n}] + g_n \text { for all } n \in \mathbb {N}{}. \end{array}\right. } \end{aligned}$$

Here, we used \(\mathbb {I}_{h_n \ge 0} \in \mathscr {L}{}(\mathscr {X}{})\) to denote the indicator of \(\{x \in \mathscr {X}{} :h_n(x) \ge 0\}\), and similarly for \(\mathbb {I}_{h_n < 0} \in \mathscr {L}{}(\mathscr {X}{})\). Note that, because we now need to evaluate both \(\overline{T}\) and \(\underline{T}\) at every iteration, we will in general need to solve \(2(n-1)\vert \mathscr {X}\vert \) optimisation problems to obtain \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}) \vert X_1)\) and \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n})\vert X_1)\) for any given \(n \in \mathbb {N}{}\). In order to obtain the unconditional inferences \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}))\) and \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}))\), it then suffices to respectively maximise and minimise the expectations of \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}) \vert X_1)\) and \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n})\vert X_1)\) over all initial models in \(\mathcal {M}\).
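
To make the recursion concrete, here is a minimal sketch of Theorem 2 for the same kind of toy model as before; the row sets and numerical values are invented for illustration:

```python
import numpy as np

# Finite, separately specified row sets, as in the naive sketch.
rows = [[np.array([0.9, 0.1]), np.array([0.7, 0.3])],
        [np.array([0.3, 0.7]), np.array([0.5, 0.5])]]

def upper_T(f):
    # [Tbar f](x) for all x: a maximum over each row set.
    return np.array([max(r @ f for r in rx) for rx in rows])

def lower_T(f):
    # [Tlow f](x) for all x: a minimum over each row set.
    return np.array([min(r @ f for r in rx) for rx in rows])

def iterate(g, h):
    """Recursion of Theorem 2: given g_0, ..., g_{n-1} and
    h_1, ..., h_{n-1} (vectors over the state space), return the pair
    (upper, lower) of functions x -> upper/lower expectation of
    tau_n(X_{1:n}) given X_1 = x, using 2(n-1) applications of
    upper_T and lower_T, i.e. 2(n-1)|X| row-wise optimisations."""
    up = lo = np.asarray(g[0], dtype=float)   # Upsilon_1 = g_0
    for gn, hn in zip(g[1:], h):
        pos = hn >= 0                         # sign pattern of h_n
        Tu, Tl = upper_T(up), lower_T(lo)
        up, lo = (hn * np.where(pos, Tu, Tl) + gn,
                  hn * np.where(pos, Tl, Tu) + gn)
    return up, lo

# Sanity check: tau_3 = I_{0}(X_1) I_{0}(X_2) I_{0}(X_3), the function
# from the naive sketch above; now only 4 row-wise optima are needed.
I0, zero = np.array([1.0, 0.0]), np.zeros(2)
print(iterate([I0, zero, zero], [I0, I0])[0])  # [0.81 0.  ]
```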

5 Special Cases

To illustrate the practical relevance of our method, we now discuss a number of important inferences that fall within its scope. As already mentioned in the introduction, in some of these cases our method simplifies to a computational scheme that was developed earlier in a more specific context. The strength of our present contribution therefore lies in its unifying character and in the level of generality to which it extends.

Functions that depend on a single time instant. As a first, very simple inference we can consider the upper and lower expectation of a function \(f(X_n)\), for some \(f \in \mathscr {L}{}(\mathscr {X}{})\) and \(n \in \mathbb {N}{}\), conditional on the initial state. The expressions for these inferences are given by \(\overline{T}{}^{n-1} f\) and \(\underline{T}{}^{n-1} f\), respectively [5]. For instance, for any \(x\in \mathscr {X}\), \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f(X_5) \vert X_1 = x)=[\underline{T}{}^{4}f](x)\). These expressions can also easily be obtained from Theorem 2, by setting \(g_0 := f\) and, for all \(k \in \{1,\cdots ,n-1\}\), \(g_k := 0\) and \(h_k := 1\).
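
In terms of the recursion of Theorem 2, this special case can be computed as follows (a minimal sketch, reusing the iterate and upper_T functions and the toy row sets from the sketch in Sect. 4):

```python
# Upper expectation of f(X_5) given X_1, i.e. Tbar^4 f, obtained from
# Theorem 2 with g_0 = f, g_k = 0 and h_k = 1.
f, zero, ones = np.array([1.0, -1.0]), np.zeros(2), np.ones(2)
up, lo = iterate([f] + [zero] * 4, [ones] * 4)
assert np.allclose(up, upper_T(upper_T(upper_T(upper_T(f)))))
```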

Sums of functions. One can also use our method to compute upper and lower expectations of sums \(\sum _{k=1}^n f_k(X_k)\) of functions \(f_k \in \mathscr {L}{}(\mathscr {X}{})\). Then we would have to set \(g_0 := f_n\) and, for all \(k \in \{1,\cdots ,n-1\}\), \(g_{k} := f_{n-k}\) and \(h_{k} := 1\). Although we allow the functions \(f_k\) to depend on k, it is worth noting that, if we set them all equal to the same function f, our method can also be employed to compute the upper and lower expectation of the time average of f over the first n time instances. The constant factor 1/n in this time average does not pose a problem here, because upper and lower expectations are homogeneous with respect to non-negative scaling.
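
Concretely, reusing the iterate sketch from Sect. 4, bounds on the expected time average can be obtained as follows:

```python
# Upper/lower expected time average of f over the first n time
# instances, given X_1: Theorem 2 with all g_k = f and all h_k = 1,
# rescaled by 1/n afterwards (toy model from Sect. 4).
f, ones = np.array([1.0, -1.0]), np.ones(2)
n = 10
up, lo = iterate([f] * n, [ones] * (n - 1))
print(up / n, lo / n)
```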

Product of functions. Another interesting class of inferences are those that can be represented by a product \(\prod _{k=1}^n f_k(X_k)\) of functions \(f_k \in \mathscr {L}{}(\mathscr {X}{})\). To compute upper and lower expectations of such functions, it suffices to set \(g_0 := f_n\) and, for all \(k \in \{1,\cdots ,n-1\}\), \(g_{k} := 0\) and \(h_{k} := f_{n-k}\). A typical example of an inference that can be described in this way is the probability that the state will be in a set \(A \subseteq \mathscr {X}{}\) during a certain time interval. For instance, the upper expectation of the function \(\mathbb {I}_{A}(X_1)\mathbb {I}_{A}(X_2)\) gives us a tight upper bound on the probability that the state will be in A during the first two time instances.
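
Using the iterate sketch from Sect. 4 once more, this last example can be computed as:

```python
# Upper/lower probability that the state stays in A = {0} during the
# first two time instances, given X_1: the product I_A(X_1) I_A(X_2).
IA = np.array([1.0, 0.0])
up, lo = iterate([IA, np.zeros(2)], [IA])
print(up, lo)  # upper: [0.9 0. ], lower: [0.7 0. ]
```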

Hitting probabilities. The hitting probability of some set \(A \subseteq \mathscr {X}{}\) over a finite time interval n is the probability that the state \(X_k\) will be in A at some point within the first n time instances. The upper and lower bounds on such a hitting probability are equal to the upper and lower expectation of the function \(f(X_{1:n}) := \mathbb {I}_{A'_n} \in \mathscr {L}(\varOmega {})\), where \(A'_n := \{\omega \in \varOmega {} :(\exists k \le n) \, \omega _k \in A \}\). Note that \(f(X_{1:n})\) can be decomposed in the following way:

$$\begin{aligned} f(X_{1:n}) = \mathbb {I}_{A}(X_1) + \mathbb {I}_{A}(X_2) \mathbb {I}_{A^c}(X_1) + \cdots + \mathbb {I}_{A}(X_n) \prod _{k = 1}^{n-1} \mathbb {I}_{A^c}(X_k) \end{aligned}$$

Hence, these inferences can be obtained using Theorem 2 if we let \(g_0 := \mathbb {I}_{A}\) and, for all \(k \in \{1,\cdots ,n-1\}\), \(g_k := \mathbb {I}_{A}\) and \(h_k := \mathbb {I}_{A^c}\). Additionally, one could also be interested in the probability that the state \(X_k\) will ever be in A. Upper and lower bounds on this probability are given by the upper and lower expectation of the function \(f := \mathbb {I}_{A'} \in \mathscr {L}(\varOmega {})\) where \(A' := \{\omega \in \varOmega {} :(\exists k \in \mathbb {N}{}) \, \omega _k \in A \}\). Since the function f is non-finitary, we are unable to apply our method in a direct way. However, it is shown in [8, Proposition 16] that, if the set \(\mathscr {T}{}\) is convex and closed, the upper and lower bounds on the hitting probability over a finite time interval converge to the upper and lower bounds on the hitting probability over an infinite time interval, therefore allowing us to approximate these inferences by choosing n sufficiently large.
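
As a minimal sketch (again reusing iterate and the toy row sets from Sect. 4), the bounds on the finite-horizon hitting probabilities can be computed as follows:

```python
# Upper/lower hitting probabilities of A = {1} within the first n
# time instances, given X_1: g_0 = g_k = I_A and h_k = I_{A^c}.
IA, IAc = np.array([0.0, 1.0]), np.array([1.0, 0.0])
n = 25
up, lo = iterate([IA] * n, [IAc] * (n - 1))
print(up, lo)  # both approach 1 from state 0 as n grows
```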

Hitting times. The hitting time of some set \(A \subseteq \mathscr {X}{}\) is the number of time steps \(\tau \) until the state is in A for the first time; so \(\tau (\omega ) := \inf \{k \in \mathbb {N}_{0}{} :\omega _{k+1} \in A \}\) for all \(\omega \in \varOmega {}\), which in particular equals 0 if the process starts in A. Once more, the function \(\tau \) is non-finitary, necessitating an indirect approach to the computation of its upper and lower expectation. This can be done in a similar way as for hitting probabilities, now considering the finitary functions \(\tau _n(X_{1:n})\), where \(\tau _n \in \mathscr {L}{}(\mathscr {X}{}^n)\) is defined by \(\tau _n(x_{1:n}) := \inf \{k \in \mathbb {N}_{0}{} :x_{k+1} \in A\}\) if \(x_k \in A\) for some \(k \in \{1,\cdots ,n\}\), and \(\tau _n(x_{1:n}) := n\) otherwise, for all \(n \in \mathbb {N}{}\) and all \(x_{1:n} \in \mathscr {X}{}^{n}\). These functions correspond to choosing \(g_0 := \mathbb {I}_{A^c}\) and, for all \(k \in \{1,\cdots ,n-1\}\), \(g_k := \mathbb {I}_{A^c}\) and \(h_k := \mathbb {I}_{A^c}\). If the set \(\mathscr {T}{}\) is convex and closed, the upper and lower expectations of these functions for large n will then approximate those of the non-finitary hitting time [8, Proposition 10].
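
The corresponding sketch for expected hitting times follows the same pattern; note that the finite toy row sets from Sect. 4 merely illustrate the mechanics, whereas the convergence guarantee of [8] is stated for closed and convex \(\mathscr {T}\):

```python
# Upper/lower expected hitting times of A = {1}, given X_1,
# approximated by the finitary tau_n for large n:
# g_0 = g_k = h_k = I_{A^c}.
IAc = np.array([1.0, 0.0])
n = 200
up, lo = iterate([IAc] * n, [IAc] * (n - 1))
print(up, lo)  # from state 0: roughly 10 (upper) and 10/3 (lower)
```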

6 Discussion

The main contribution of this paper is a single, unified method to efficiently compute a wide variety of inferences for imprecise Markov chains; see Theorem 2. The set of functions describing these inferences is however restricted to the finitary type, and therefore a general approach for inferences characterised by non-finitary functions is still lacking. In some cases, however, as we already mentioned in our discussion of hitting probabilities and hitting times, this issue can be addressed by relying on a continuity argument.

Indeed, consider any function \(f = \lim _{n \rightarrow +\infty } \tau _n(X_{1:n})\) that is the pointwise limit of a sequence \(\{\tau _n(X_{1:n})\}_{n \in \mathbb {N}{}}\) of finitary functions, defined recursively as in Theorem 2. If \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\) is continuous with respect to \(\{\tau _n(X_{1:n})\}_{n \in \mathbb {N}{}}\), meaning that \(\lim _{n \rightarrow +\infty } \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n})) = \overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}( f)\), then the inference \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f)\) can be approximated by \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}))\) for sufficiently large n. Since we can recursively compute \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(\tau _n(X_{1:n}))\) for any \(n \in \mathbb {N}{}\) using the methods discussed at the end of Sect. 4, this yields an efficient way of approximating \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f)\). A completely analogous argument can be used for the lower expectation \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}(f)\). This raises the question of whether the upper and lower expectations \(\overline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\) and \(\underline{\mathrm {E}}{}_{\mathcal {M},\mathscr {T}{}}\) satisfy the appropriate continuity properties for this approach to work.

Unfortunately, results about the continuity properties of these operators are rather scarce—especially compared to their precise counterparts—and depend on the formalism that is adopted. In this paper, for didactical reasons, we have considered one formalism: we have introduced imprecise Markov chains as sets of “precise” models that are in a specific sense compatible with the given set \(\mathscr {T}\). It is however important to realise that there is also an entirely different formalisation of imprecise Markov chains, based instead on the game-theoretic probability framework popularised by Shafer and Vovk; we refer to [10, 11] for details. It is well known that the inferences produced under these two frameworks agree for finitary functions [3, 9], so the method described by Theorem 2 is also applicable when working in a game-theoretic framework. For non-finitary functions, however, the continuity properties of the game-theoretic upper and lower expectations are not necessarily the same as those of the measure-theoretic operators that we considered here; at present, they are also better understood [10, 11, 12], making the game-theoretic operators more suitable if we plan to employ the continuity argument above.