
1 Introduction

Markov Decision Processes (MDPs) are a widely used mathematical abstraction that permits studying properties of real-world systems in a rigorous way. The actual system is represented by a model subsuming the states the system can be in and the transitions representing how the system evolves from one state to another; the actual properties are encoded as logical formulas that are then verified against the model.

MDPs are suitable for modelling two core aspects of the behavior of real-world systems: nondeterminism and probability. Nondeterministic behavior can be introduced to model behavior of the system that is only partially known (such as receiving an asynchronous message, of which it is known that it can be received in the current state but no information is available to quantify its likelihood) or to leave implementation details open. Probabilistic behavior occurs whenever the successor state of the system is not uniquely determined by the current state and the performed action, but depends on a random choice; such a choice can be due to the design of the system, as required by the implementation of a distributed consensus protocol with faulty processes [3, 14], or to physical properties that need to be taken into account, like transmission errors.

Finding the exact probability values for the transitions is sometimes a difficult task: while probabilities introduced by design can be well known, probabilities modelling physical properties are usually estimated by observing the actual system. This means that the resulting MDP is a more or less appropriate abstraction of the real system, depending on how close the estimated probability values are to the actual values; as a consequence, the actual properties of the real system are more or less reflected by the satisfaction of the formulas by the model.

Interval Markov Decision Processes (IMDPs) extend classical MDPs by including uncertainty over the transition probabilities. Instead of a single value for the probability of reaching a specific successor by taking a transition, IMDPs allow ranges of possible probability values given as closed intervals of the reals. Thereby, IMDPs provide a powerful modelling tool for probabilistic systems with additional variation or uncertainty concerning the knowledge of exact transition probabilities. They are especially useful to represent realistic stochastic systems that, for instance, evolve in unknown environments with bounded behavior or do not preserve the Markov property.

Since their introduction (under the name of bounded-parameter MDPs) [16], IMDPs have been receiving a lot of attention in the formal verification community. They are particularly viewed as the appropriate abstraction model for uncertain systems with large state spaces, including continuous dynamical systems, for the purpose of analysis, verification, and control synthesis. Several model checking and control synthesis techniques have been developed [37, 38, 43] causing a boost in the applications of IMDPs, ranging from verification of continuous stochastic systems (e.g., [30]) to robust strategy synthesis for robotic systems (e.g., [32,33,34, 43]).

Bisimulation minimisation is a well-known technique that has been successfully used to reduce the size of a system while preserving the properties it satisfies [5, 8, 9, 23, 27]; this helps the task of the property solver, since it has to work on a smaller system. Compositional minimisation permits minimising the individual components of the system before combining them, thus making the task of the minimiser easier and extending its applicability to larger systems. In this paper, we show that this approach is also suitable for IMDPs. The contributions of the paper are as follows.

  • We define alternating probabilistic bisimulations to compress the IMDP model size with respect to the controller synthesis semantics while preserving probabilistic CTL property satisfaction. We show that the compressed models can be computed in polynomial time.

  • From the perspective of compositional reasoning, we show that alternating probabilistic bisimulations for IMDPs are congruences with respect to two facets of parallelism, namely synchronous product and interleaving.

  • We show promising results on a variety of case studies, obtained by prototypical implementations of all algorithms.

Related work. Related work can be grouped into three categories: uncertain Markov model formalisms, bisimulation minimization, and compositional minimization.

Firstly, from the modelling viewpoint, various probabilistic modelling formalisms with uncertain transitions have been studied in the literature. Interval Markov Chains (IMCs) [25, 28] and abstract Markov chains [13] extend standard discrete-time Markov Chains (MCs) with interval uncertainties; they do not feature nondeterministic choice among transitions. Uncertain MDPs [38] allow more general sets of distributions to be associated with each transition, not only those described by intervals. They are usually restricted to rectangular uncertainty sets, requiring that the uncertainty be linear and independent for any two transitions of any two states. Parametric MDPs [17], to the contrary, allow such dependencies, as every probability is described as a rational function of a finite set of global parameters. IMDPs extend IMCs by the inclusion of nondeterminism and are a subset of uncertain MDPs and parametric MDPs.

Secondly, as regards bisimulation minimization for uncertain or parametric probabilistic models, the works [18, 20, 21] explored the computational complexity and approximability of deciding probabilistic bisimulation for IMDPs with respect to the cooperative resolution of nondeterminism. In this work, we show that IMDPs can be minimized efficiently with respect to the competitive resolution of nondeterminism.

Lastly, from the viewpoint of compositional minimization, IMCs [25] and abstract Probabilistic Automata (PA) [10, 11] serve as specification theories for MCs and PAs, featuring a satisfaction relation and various refinement relations. In [22], the authors discuss the key ingredients needed to build up the operations of parallel composition for composing IMDP components at run-time. Our paper follows this spirit for alternating probabilistic bisimulation on IMDPs.

Structure of the paper. We start with the necessary preliminaries in Sect. 2. In Sect. 3, we give the definition of alternating probabilistic bisimulation for interval MDPs and discuss its properties. A polynomial-time algorithm to decide alternating probabilistic bisimulation for IMDPs and compositional reasoning are discussed in Sects. 4 and 5, respectively. In Sect. 6, we demonstrate our approach on some case studies and present promising experimental results. Finally, in Sect. 7, we conclude the paper.

2 Mathematical Preliminaries

For a set X, denote by \(\mathrm {Disc}(X)\) the set of discrete probability distributions over X: a discrete probability distribution \(\rho \) is a function \(\rho :X \rightarrow \mathbb {R}_{\ge 0}\) such that \(\sum _{x \in X} \rho (x) = 1\); for \(X' \subseteq X\), we write \(\rho (X')\) for \(\sum _{x \in X'} \rho (x)\). Given \(\rho \in \mathrm {Disc}(X)\), we denote by \(\mathrm {Supp}(\rho )\) the set \(\{\, x \in X \mid \rho (x) >0 \,\}\) and by \(\delta _{x}\), where \(x \in X\), the Dirac distribution such that \(\delta _{x}(y) = 1\) for \(y = x\), 0 otherwise. For a probability distribution \(\rho \), we also write \(\rho = \{\, (x, p_{x}) \mid x \in X \,\}\) where \(p_{x}\) is the probability of x.

The lifting \(\mathrel {\mathcal {L}(\mathcal {\mathcal {R}})}\) [31] of a relation \(\mathcal {\mathcal {R}}\subseteq X \times Y\) is defined as follows: for \(\rho _{X} \in \mathrm {Disc}(X)\) and \(\rho _{Y} \in \mathrm {Disc}(Y)\), \(\rho _{X} \mathrel {\mathcal {L}(\mathcal {\mathcal {R}})}\rho _{Y}\) holds if there exists a weighting function \(w :X \times Y \rightarrow [0,1]\) such that (1) \(w(x,y) > 0\) implies \(x \mathrel {\mathcal {\mathcal {R}}}y\), (2) \(\sum _{y \in Y} w(x,y) = \rho _{X}(x)\), and (3) \(\sum _{x \in X} w(x,y) = \rho _{Y}(y)\). When \(\mathcal {\mathcal {R}}\) is an equivalence relation on X, \(\rho _{1} \mathrel {\mathcal {L}(\mathcal {\mathcal {R}})}\rho _{2}\) holds if for each \(\mathcal {C}\in X/\mathcal {\mathcal {R}}\), \(\rho _{1}(\mathcal {C}) = \rho _{2}(\mathcal {C})\) where \(X/\mathcal {\mathcal {R}}= \{\, [x]_{\mathcal {\mathcal {R}}} \mid x \in X \,\}\) and \([x]_{\mathcal {\mathcal {R}}} = \{\, y \in X \mid y \mathrel {\mathcal {\mathcal {R}}}x \,\}\).
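To make the lifting concrete, the equivalence-relation case can be checked by simply comparing class masses. The following is a minimal sketch; the dict-based encoding of distributions and the function name `lift_equiv` are ours, not from the text:

```python
from fractions import Fraction

def lift_equiv(rho1, rho2, classes):
    """Check rho1 L(R) rho2 for an equivalence relation R given as a
    partition `classes` of the state space: the lifted relation holds
    iff both distributions assign the same mass to every class."""
    return all(
        sum(rho1.get(x, Fraction(0)) for x in C)
        == sum(rho2.get(x, Fraction(0)) for x in C)
        for C in classes
    )

# Two distributions that differ inside a class but agree on class masses.
rho1 = {"s1": Fraction(1, 2), "s2": Fraction(1, 2)}
rho2 = {"s1": Fraction(1, 4), "s2": Fraction(3, 4)}
partition = [{"s1", "s2"}]               # s1 R s2
print(lift_equiv(rho1, rho2, partition))  # True
```

Refining the partition so that s1 and s2 fall into different classes makes the lifting fail, since the two distributions disagree on the class of s1.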

For a vector \(\varvec{x} \in \mathbb {R}^{n}\) we denote by \(\varvec{x}_{i}\), its i-th component, and we call \(\varvec{x}\) a weight vector if \(\varvec{x} \in \mathbb {R}_{\ge 0}^{n}\) and \(\sum _{i=1}^{n} \varvec{x}_{i} = 1\). Given two vectors \(\varvec{x}, \varvec{y} \in \mathbb {R}^{n}\), their Euclidean inner product \(\varvec{x} \cdot \varvec{y}\) is defined as \(\varvec{x} \cdot \varvec{y} = \varvec{x}^{T} \varvec{y} = \sum _{i=1}^{n} \varvec{x}_{i} \cdot \varvec{y}_{i}\). We write \(\varvec{x} \le \varvec{y}\) if \(\varvec{x}_{i} \le \varvec{y}_{i}\) for each \(1 \le i \le n\) and we denote by \(\varvec{1} \in \mathbb {R}^{n}\) the vector such that \(\varvec{1}_{i} = 1\) for each \(1 \le i \le n\). For a set of vectors \(S = \{\varvec{s}^{1}, \cdots , \varvec{s}^{m}\} \subseteq \mathbb {R}^{n}\), we say that \(\varvec{s}\) is a convex combination of elements of S, if \(\varvec{s} = \sum _{i=1}^{m} \varvec{w}_{i} \cdot \varvec{s}^{i}\) for some weight vector \(\varvec{w} \in \mathbb {R}_{\ge 0}^{m}\). For a given set \(P \subseteq \mathbb {R}^{n}\), we denote by \({{\mathrm{conv}}}{P}\) the convex hull of P and by \(\mathtt {{Ext}}({P})\) the set of extreme points of P. If P is a polytope in \(\mathbb {R}^{n}\) then for each \(1 \le i \le n\), the projection \({{\mathrm{proj}}}_{\varvec{\mathbf {e}^{\mathbf {i}}}}{P}\) on the i-th dimension of P is defined as \({{\mathrm{proj}}}_{\varvec{\mathbf {e}^{\mathbf {i}}}}{P} = [\min _{i} P,\max _{i} P]\) where \(\mathbf {e}^{\mathbf {i}} \in \mathbb {R}^{n}\) is such that \(\mathbf {e}^{\mathbf {i}}_{i} = 1\) and \(\mathbf {e}^{\mathbf {i}}_{j} = 0\) for each \(j \ne i\), \(\min _{i} P = \min \{\, \varvec{x}_{i} \mid \varvec{x} \in P \,\}\), and \(\max _{i} P = \max \{\, \varvec{x}_{i} \mid \varvec{x} \in P \,\}\).
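For the box-with-simplex polytopes \(\{\, \varvec{x} \mid \varvec{l} \le \varvec{x} \le \varvec{u}, \varvec{1} \cdot \varvec{x} = 1 \,\}\) used later in the paper, the projection on the i-th dimension admits a simple closed form. The sketch below is our own observation (assuming the polytope is nonempty), not taken from the text:

```python
from fractions import Fraction as F

def project(l, u, i):
    """Projection of P = {x : l <= x <= u, sum(x) = 1} onto coordinate i.
    The simplex constraint tightens the box bounds: x_i can be as small
    as 1 minus the largest mass the other coordinates can carry, and as
    large as 1 minus the smallest mass they must carry."""
    rest_lo = sum(l[j] for j in range(len(l)) if j != i)
    rest_up = sum(u[j] for j in range(len(u)) if j != i)
    return (max(l[i], 1 - rest_up), min(u[i], 1 - rest_lo))

# A two-dimensional instance: bounds [1/3, 2/3] and [1/10, 1].
l = [F(1, 3), F(1, 10)]
u = [F(2, 3), F(1)]
print(project(l, u, 0))  # the box bound [1/3, 2/3] is already tight here
```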

2.1 Interval Markov Decision Processes

We now define Interval Markov Decision Processes (IMDPs) as an extension of MDPs that allows for the inclusion of transition probability uncertainties as intervals. IMDPs belong to the family of uncertain MDPs and allow us to describe a set of MDPs with identical (graph) structures that differ in the distributions associated with transitions.

Definition 1

(IMDPs). An Interval Markov Decision Process (IMDP) \(\mathcal {M}\) is a tuple \((S, \bar{s}, \mathcal {A}, \texttt {AP},L, I )\), where \(S\) is a finite set of states, \(\bar{s}\in S\) is the initial state, \(\mathcal {A}\) is a finite set of actions, \(\texttt {AP}\) is a finite set of atomic propositions, \(L:S\rightarrow 2^{\texttt {AP}}\) is a labelling function, and \( I :S\times \mathcal {A}\times S\rightarrow \mathbb {I}\) is a total interval transition probability function, with \(\mathbb {I}\) denoting the set of closed subintervals of \([0,1]\).

Given \(s \in S\) and \(a \in \mathcal {A}\), we call \(\mathfrak {h}^{a}_{s} \in \mathrm {Disc}(S)\) a feasible distribution reachable from s by a, denoted by \(s \xrightarrow {a} \mathfrak {h}^{a}_{s}\), if, for each state \(s' \in S\), we have \(\mathfrak {h}^{a}_{s}(s') \in I (s,a,s')\). We denote the set of feasible distributions for state s and action a by \(\mathcal {H}^{a}_{s}\), i.e., \(\mathcal {H}^{a}_{s} = \{\, \mathfrak {h}^{a}_{s} \in \mathrm {Disc}(S) \mid s \xrightarrow {a} \mathfrak {h}^{a}_{s} \,\}\), and we denote the set of available actions at state \(s \in S\) by \(\mathcal {A}(s)\), i.e., \(\mathcal {A}(s) = \{\, a \in \mathcal {A} \mid \mathcal {H}^{a}_{s} \ne \emptyset \,\}\). We assume that \(\mathcal {A}(s) \ne \emptyset \) for all \(s \in S\).

We define the size of \(\mathcal {M}\), written \(\left| \mathcal {M}\right| \), as the number of non-zero entries of \( I \), i.e., \(\left| \mathcal {M}\right| = |\{\, (s,a,s') \in S\times \mathcal {A}\times S\mid \sup I (s,a,s') > 0 \,\}|\).

A path \(\xi \) in \(\mathcal {M}\) is a finite or infinite sequence of states \(\xi = s_{0} s_{1} \cdots \) such that for each \(i \ge 0\) there exist \(a_{i} \in \mathcal {A}(s_{i})\) and \(\mathfrak {h}^{a_{i}}_{s_{i}} \in \mathcal {H}^{a_{i}}_{s_{i}}\) such that \(\mathfrak {h}^{a_{i}}_{s_{i}}(s_{i+1}) > 0\). The i-th state along the path \(\xi \) is denoted by \(\xi [i]\) and, if the path is finite, we denote by \( last (\xi )\) its last state. The sets of all finite and infinite paths in \(\mathcal {M}\) are denoted by \( Paths ^{*}\) and \( Paths \), respectively.

The nondeterministic choices between available actions and feasible distributions present in an IMDP are resolved by strategies and natures, respectively.

Definition 2

(Strategy and Nature in IMDPs). Given an IMDP \(\mathcal {M}\), a strategy is a function \(\sigma : Paths ^{*}\rightarrow \mathrm {Disc}(\mathcal {A})\) such that for each path \(\xi \in Paths ^{*}\), \(\sigma (\xi ) \in \mathrm {Disc}(\mathcal {A}( last (\xi )))\). A nature is a function \(\pi : Paths ^{*}\times \mathcal {A}\rightarrow \mathrm {Disc}(S)\) such that for each path \(\xi \in Paths ^{*}\) and action \(a \in \mathcal {A}(s)\), \(\pi (\xi , a) \in \mathcal {H}^{a}_{s}\) where \(s = last (\xi )\).

The sets of all strategies and all natures are denoted by \(\varSigma \) and \(\varPi \), respectively.

Given a finite path \(\xi \) of an IMDP, a strategy \(\sigma \), and a nature \(\pi \), the system evolution proceeds as follows: let \(s = last (\xi )\). First, an action \(a \in \mathcal {A}(s)\) is chosen probabilistically by \(\sigma \). Then, \(\pi \) resolves the uncertainties and chooses one feasible distribution \(\mathfrak {h}^{a}_{s} \in \mathcal {H}^{a}_{s}\). Finally, the next state \(s'\) is chosen according to the distribution \(\mathfrak {h}^{a}_{s}\), and the path \(\xi \) is extended by \(s'\).
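This evolution can be sketched as a simulation step. The encoding of strategies and natures as Python functions, and the names `step`, `sigma`, and `pi`, are our own illustration, not from the text:

```python
import random
from fractions import Fraction as F

def step(path, strategy, nature, rng):
    """One evolution step of an IMDP under a strategy and a nature:
    the strategy gives a distribution over actions at last(path), the
    nature resolves the interval uncertainty to a concrete feasible
    successor distribution, and the next state is sampled from it."""
    rho = strategy(path)                      # element of Disc(A(last(path)))
    actions, wts = zip(*rho.items())
    a = rng.choices(actions, weights=[float(w) for w in wts])[0]
    h = nature(path, a)                       # a feasible distribution in H_s^a
    succs, probs = zip(*h.items())
    s_next = rng.choices(succs, weights=[float(p) for p in probs])[0]
    return path + [s_next]

# Hypothetical strategy and nature for a single decision at state s.
sigma = lambda path: {"a": F(1)}
pi = lambda path, a: {"t": F(1, 2), "u": F(1, 2)}
rng = random.Random(0)
print(step(["s"], sigma, pi, rng))  # e.g. ['s', 't'] or ['s', 'u']
```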

A strategy \(\sigma \) and a nature \(\pi \) induce a probability measure over paths as follows. The basic measurable events are the cylinder sets of finite paths, where the cylinder set of a finite path \(\xi \) is the set \( Cyl _{\xi } = \{\, \xi ' \in Paths \mid \xi \text { is a prefix of }\xi ' \,\}\). The probability \(\Pr _{\sigma ,\pi }\) of a state \(s'\) is defined to be \(\Pr _{\sigma ,\pi }[ Cyl _{s'}] = \delta _{\bar{s}}(s')\) and the probability of traversing a finite path \(\xi s'\) is defined to be \(\Pr _{\sigma ,\pi }[ Cyl _{\xi s'}] = \Pr _{\sigma ,\pi }[ Cyl _{\xi }] \cdot \sum _{a \in \mathcal {A}} \sigma (\xi )(a) \cdot \pi (\xi , a)(s')\).

Standard measure-theoretical arguments ensure that \(\Pr _{\sigma ,\pi }\) extends uniquely to the \(\sigma \)-field generated by cylinder sets.

Fig. 1. An example of an IMDP.

As an example of IMDPs, consider the one depicted in Fig. 1. The set of states is \(S= \{s,t,u\}\) with s being the initial one; the set of actions is \(\mathcal {A}= \{a, b\}\) while the set of atomic propositions assigned to each state by the labelling function \(L\) is represented by the letters in curly brackets near each state. Finally, the transition probability intervals are \( I (s,a,t) = [\frac{1}{3},\frac{2}{3}]\), \( I (s,a,u) = [\frac{1}{10},1]\), \( I (s,b,t) = [\frac{2}{5},\frac{3}{5}]\), \( I (s,b,u) = [\frac{1}{4},\frac{2}{3}]\), \( I (t,a,t) = I (u,b,u) = [1,1]\), and \( I (t,b,t) = I (u,a,u) = [0,0]\).
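The interval transition function of this example can be encoded directly. The sketch below (the encoding and the helper name `feasible` are ours) checks membership in the feasible-distribution polytopes \(\mathcal {H}^{a}_{s}\):

```python
from fractions import Fraction as F

# The interval transition function of the IMDP in Fig. 1;
# triples not listed have the interval [0, 0].
I = {
    ("s", "a", "t"): (F(1, 3), F(2, 3)),
    ("s", "a", "u"): (F(1, 10), F(1)),
    ("s", "b", "t"): (F(2, 5), F(3, 5)),
    ("s", "b", "u"): (F(1, 4), F(2, 3)),
    ("t", "a", "t"): (F(1), F(1)),
    ("u", "b", "u"): (F(1), F(1)),
}
states = ["s", "t", "u"]

def feasible(s, a, h):
    """h is in H_s^a iff it is a distribution and each h(s') lies
    inside the interval I(s, a, s')."""
    ok_mass = sum(h.get(sp, F(0)) for sp in states) == 1
    ok_bounds = all(
        I.get((s, a, sp), (F(0), F(0)))[0]
        <= h.get(sp, F(0))
        <= I.get((s, a, sp), (F(0), F(0)))[1]
        for sp in states
    )
    return ok_mass and ok_bounds

print(feasible("s", "a", {"t": F(1, 2), "u": F(1, 2)}))  # True
print(feasible("s", "b", {"t": F(1, 5), "u": F(4, 5)}))  # False
```

The first distribution is feasible since 1/2 lies in both [1/3, 2/3] and [1/10, 1]; the second is not, since 1/5 falls below the lower bound 2/5 of \( I (s,b,t)\).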

2.2 Probabilistic Computation Tree Logic (PCTL)

There are various ways to describe properties of IMDPs. Here, we focus on probabilistic CTL (PCTL) [19]. The syntax of PCTL state formulas \(\varphi \) and PCTL path formulas \(\psi \) is given by:

$$\begin{aligned} \varphi&:= a \mid \lnot \varphi \mid \varphi _{1}\wedge \varphi _{2} \mid \mathsf {P}_{\bowtie p}(\psi ) \\ \psi&:= \mathsf {X}\varphi \mid \varphi _{1}\mathbin {\mathsf {U}}\varphi _{2} \mid \varphi _{1}\mathbin {\mathsf {U}}^{\le k}\varphi _{2} \end{aligned}$$

where \(a \in \texttt {AP}\), \(p \in [0,1]\) is a rational constant, \(\mathord {\bowtie } \in \{\le , <, \ge , > \}\), and \(k \in \mathbb {N}\).

The semantics of a PCTL formula with respect to IMDPs is very similar to the classical PCTL semantics for MDPs: they coincide on all formulas except for \(\mathsf {P}_{\bowtie p}(\psi )\), where they may differ depending on how the nondeterminism is resolved. Formally, for the formulas they agree on, given a state s and a state formula \(\varphi \), the satisfaction relation \(s \models \varphi \) is defined as follows: \(s \models a\) if \(a \in L(s)\); \(s \models \lnot \varphi \) if not \(s \models \varphi \); and \(s \models \varphi _{1} \wedge \varphi _{2}\) if both \(s \models \varphi _{1}\) and \(s \models \varphi _{2}\).

Given an infinite path \(\xi = s_{1} s_{2} \cdots \) and a path formula \(\psi \), the satisfaction relation \(\xi \models \psi \) is defined as follows: \(\xi \models \mathsf {X}\varphi \) if \(s_{2} \models \varphi \); \(\xi \models \varphi _{1}\mathbin {\mathsf {U}}\varphi _{2}\) if there exists \(i \ge 1\) such that \(s_{i} \models \varphi _{2}\) and \(s_{j} \models \varphi _{1}\) for each \(1 \le j < i\); and \(\xi \models \varphi _{1}\mathbin {\mathsf {U}}^{\le k}\varphi _{2}\) if there exists \(i \le k\) such that \(s_{i} \models \varphi _{2}\) and \(s_{j} \models \varphi _{1}\) for each \(1 \le j < i\).

Regarding the state formula \(\mathsf {P}_{\bowtie p}(\psi )\), its semantics depends on the way the nondeterminism is resolved for the probabilistic operator \(\mathsf {P}_{\bowtie p}(\psi )\). When quantifying both types of nondeterminism universally, the corresponding satisfaction relation \(s \models \mathsf {P}_{\bowtie p}(\psi )\) is defined as follows:

$$\begin{aligned} s \models \mathsf {P}_{\bowtie p}(\psi ) \quad \text {if}\quad \forall \sigma \in \varSigma \ \forall \pi \in \varPi :\Pr \nolimits _{\sigma ,\pi }[ Paths _{\psi }] \bowtie p \qquad (\forall ) \end{aligned}$$

where \( Paths _{\psi } = \{\, \xi \in Paths \mid \xi \models \psi \,\}\) denotes the set of infinite paths satisfying \(\psi \). It is easy to show that the set \( Paths _{\psi }\) is measurable for any path formula \(\psi \), hence its probability can be computed and compared with p. When the IMDP is actually an MDP, i.e., all intervals are single values, then the satisfaction relation \(s \models \mathsf {P}_{\bowtie p}(\psi )\) in Equation (\(\forall \)) coincides with the corresponding definition for MDPs (cf. [2, Sect. 10.6.2]). We explain later how the semantics differs for a different resolution of nondeterminism for strategy and nature.

3 Alternating Probabilistic Bisimulation for IMDPs

This section revisits the main results on probabilistic bisimulation for IMDPs, as developed in [20], that are required here. In the setting of this paper, we consider alternating probabilistic bisimulation, which stems from the competitive resolution of nondeterminism in IMDPs. In the competitive semantics, the strategy and the nature play a game against each other; therefore, they are resolved competitively. This semantics is very natural in the context of controller synthesis for systems with uncertain probabilities or in the context of parameter synthesis for parallel systems.

In this paper, in order to resolve the stochastic nondeterminism, we focus on the dynamic approach [24, 42], i.e., resolving it independently at each computation step, as it is easier to handle algorithmically and can be seen as a relaxation of the static approach, which is often intractable [4, 7, 12, 16].

To this end, we consider the controller synthesis semantics to resolve the two sources of IMDP nondeterminism and discuss the resulting alternating probabilistic bisimulation. Note that there is another variant of alternating probabilistic bisimulation based on the parameter synthesis semantics [20]. However, the alternating bisimulation relations resulting from these two semantics coincide [20, Theorem 4].

In the controller synthesis semantics, we search for a strategy \(\sigma \) such that for any nature \(\pi \), a fixed property \(\varphi \) is satisfied. This corresponds to the satisfaction relation \(\models _{{(\exists \sigma \forall )}}\) in PCTL, obtained from \(\models \) by replacing the rule (\(\forall \)) with

$$\begin{aligned} s \models _{{(\exists \sigma \forall )}} \mathsf {P}_{\bowtie p}(\psi ) \quad \text {if}\quad \exists \sigma \in \varSigma \ \forall \pi \in \varPi :\Pr \nolimits _{\sigma ,\pi }[ Paths _{\psi }] \bowtie p \qquad (\exists \sigma \forall ) \end{aligned}$$

As regards bisimulation, the competitive setting is not common. We define a bisimulation similar to the alternating bisimulation of [1] applied to non-stochastic two-player games. For a decision \(\rho \in \mathrm {Disc}(\mathcal {A})\) of \(\sigma \) taken at a state s, let \(s \xrightarrow {\rho } \mu \) denote that \(\mu \) is a possible successor distribution, i.e., there are decisions \(\mu _{a}\) of \(\pi \) for each \(a \in \mathrm {Supp}(\rho )\) such that \(\mu = \sum _{a \in \mathcal {A}} \rho (a) \cdot \mu _{a}\).

Definition 3

Given an IMDP \(\mathcal {M}\), let \(\mathcal {\mathcal {R}}\subseteq S\times S\) be an equivalence relation. We say that \(\mathcal {\mathcal {R}}\) is an alternating probabilistic \({(\exists \sigma \forall )}\) -bisimulation if for any \((s,t) \in \mathcal {\mathcal {R}}\) we have that \(L(s) = L(t)\) and for each \(\rho _{s} \in \mathrm {Disc}(\mathcal {A}(s))\) there exists \(\rho _{t} \in \mathrm {Disc}(\mathcal {A}(t))\) such that for each \(s \xrightarrow {\rho _{s}} \mu _{s}\) there exists \(t \xrightarrow {\rho _{t}} \mu _{t}\) such that \(\mu _{s} \mathrel {\mathcal {L}(\mathcal {\mathcal {R}})}\mu _{t}\). We write \(s \sim _{{(\exists \sigma \forall )}} t\) whenever \((s,t) \in \mathcal {\mathcal {R}}\) for some alternating probabilistic \({(\exists \sigma \forall )}\)-bisimulation \(\mathcal {\mathcal {R}}\).

The exact alternation of quantifiers might be counter-intuitive at first sight. Note that it exactly corresponds to the situation in non-stochastic games [1]. The defined bisimulation preserves the PCTL logic with respect to the \(\models _{{(\exists \sigma \forall )}}\) semantics.

Theorem 4

For states \(s \sim _{{(\exists \sigma \forall )}} t\) and any PCTL formula \(\varphi \), we have \(s \models _{{(\exists \sigma \forall )}} \varphi \) if and only if \(t \models _{{(\exists \sigma \forall )}} \varphi \).

As a concluding remark, it is worthwhile to note that Definition 3 can be seen as a conservative extension of probabilistic bisimulation for (state-labelled) MDPs. To see this, assume that the uncertainty set of every transition is a singleton. Since there is then only one choice for the nature, the role of the nature can be safely removed from the definitions.

4 A PTIME Decision Algorithm for Bisimulation Minimization

Computation of the alternating probabilistic bisimulation \(\sim _{{(\exists \sigma \forall )}}\) for IMDPs follows the standard partition refinement approach [6, 15, 26, 35]. The core part, however, is finding out whether two states “violate the definition of bisimulation”. This verification routine amounts to checking that s and t have the same set of strictly minimal polytopes, detailed as follows.

For \(s \in S\) and \(a \in \mathcal {A}(s)\), recall that \(\mathcal {H}^{a}_{s}\) denotes the polytope of feasible successor distributions over states with respect to taking the action a in the state s. By \(\mathcal {P}^{s,a}_{\mathrel {\mathcal {\mathcal {R}}}}\), we denote the polytope of feasible successor distributions over equivalence classes of \(\mathrel {\mathcal {\mathcal {R}}}\) with respect to taking the action a in the state s. Given an interval \([l,u]\), let \(\inf [l,u] = l\) and \(\sup [l,u] = u\). For \(\mu \in \mathrm {Disc}(S/\mathcal {\mathcal {R}})\) we set \(\mu \in \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) if, for each \(\mathcal {C}\in S/\mathcal {\mathcal {R}}\), we have \(\mu (\mathcal {C}) \in I (s,a,\mathcal {C})\) where

$$ I (s,a,\mathcal {C}) = \Big [ \min \big \{1, \sum _{s' \in \mathcal {C}} \inf I (s, a, s') \big \}, \min \big \{1, \sum _{s' \in \mathcal {C}} \sup I (s, a, s') \big \} \Big ]\text {.} $$

It is not difficult to see that each \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) can be represented as an \(\mathcal {H}\)-polytope. To simplify our presentation, we shall fix an order over the equivalence classes in \(S/\mathcal {\mathcal {R}}\). By doing so, any distribution \(\rho \in \mathrm {Disc}(S/\mathcal {\mathcal {R}})\) can be seen as a vector \(\varvec{v} \in \mathbb {R}_{\ge 0}^{n}\) such that \(\varvec{v}_{i} = \rho (\mathcal {C}_{i})\) for each \(1 \le i \le n\), where \(n = |S/\mathcal {\mathcal {R}}|\) and \(\mathcal {C}_{i}\) is the i-th equivalence class in the order. For the above discussion, \(\rho \in \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) if and only if \(\rho (\mathcal {C}_{i}) \in [\varvec{l}^{s,a}_{i},\varvec{u}^{s,a}_{i}]\) for any \(1 \le i \le n\) and \(\rho \in \mathrm {Disc}(S/\mathcal {\mathcal {R}})\), where \(\varvec{l}^{s,a}\) and \(\varvec{u}^{s,a}\) are vectors such that \(\varvec{l}^{s,a}_{i} = \min \{1, \sum _{s' \in \mathcal {C}_{i}}{\inf I (s, a, s')}\}\) and \(\varvec{u}^{s,a}_{i} = \min \{1, \sum _{s' \in \mathcal {C}_{i}}{\sup I (s, a, s')}\}\) for each \(1 \le i \le n\). Therefore, \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) corresponds to an \(\mathcal {H}\)-polytope defined by \(\{\, \varvec{x}^{s,a} \in \mathbb {R}^{n} \mid \varvec{l}^{s,a} \le \varvec{x}^{s,a} \le \varvec{u}^{s,a}, \varvec{1} \cdot \varvec{x}^{s,a} = 1 \,\}\).
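The construction of the bound vectors \(\varvec{l}^{s,a}\) and \(\varvec{u}^{s,a}\) can be sketched as follows; the encoding of intervals as pairs and the helper name `lifted_bounds` are our own:

```python
from fractions import Fraction as F

def lifted_bounds(I_sa, classes):
    """Build the vectors l^{s,a} and u^{s,a} defining the H-polytope
    P^{s,a}_R: for each equivalence class C (in the fixed order), sum
    the state-level lower and upper bounds over C, capping both at 1."""
    l = [min(F(1), sum(I_sa.get(sp, (F(0), F(0)))[0] for sp in C))
         for C in classes]
    u = [min(F(1), sum(I_sa.get(sp, (F(0), F(0)))[1] for sp in C))
         for C in classes]
    return l, u

# Hypothetical intervals for one (s, a) pair over states x, y, z.
I_sa = {"x": (F(1, 4), F(1, 2)), "y": (F(1, 4), F(3, 4)), "z": (F(0), F(1, 2))}
classes = [["x", "y"], ["z"]]   # fixed order over S/R
l, u = lifted_bounds(I_sa, classes)
print(l)  # lower bounds per class: 1/2 and 0
print(u)  # upper bounds per class: 1 (5/4 capped at 1) and 1/2
```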

Definition 5

(Strictly minimal polytopes). Given an IMDP \(\mathcal {M}\), a state s, an equivalence relation \(\mathcal {\mathcal {R}}\subseteq S \times S\), and the set \(\{\, \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}} \mid a \in \mathcal {A}(s) \,\}\) where, for each \(a \in \mathcal {A}(s)\) and given \(\varvec{l}^{s,a}, \varvec{u}^{s,a} \in \mathbb {R}^{n}\), \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) is the convex polytope \( \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}} = \{\, \varvec{x}^{s,a} \in \mathbb {R}^{n} \mid \varvec{l}^{s,a} \le \varvec{x}^{s,a} \le \varvec{u}^{s,a}, \varvec{1} \cdot \varvec{x}^{s,a} = 1 \,\} \), the polytope \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) is called strictly minimal if there is no \(\rho \in \mathrm {Disc}(\mathcal {A}(s) \setminus \{a\})\) such that \(\mathcal {P}^{s,\rho }_{\mathcal {\mathcal {R}}} \subseteq \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\), where \(\mathcal {P}^{s,\rho }_{\mathcal {\mathcal {R}}}\) is defined as \( \mathcal {P}^{s,\rho }_{\mathcal {\mathcal {R}}} = \{\, \varvec{x}^{s,\rho }\in \mathbb {R}^{n} \mid \varvec{x}^{s,\rho } = \sum _{b \in \mathcal {A}(s) \setminus \{a\}} \rho (b)\cdot \varvec{x}^{s,b} \wedge \varvec{x}^{s,b} \in \mathcal {P}^{s,b}_{\mathcal {\mathcal {R}}} \,\} \).

Checking violation for a given pair of states amounts to checking whether the states have the same set of strictly minimal polytopes. Formally:

Lemma 6

(cf. [20]). Given an IMDP \(\mathcal {M}\) and \(s,t \in S\), we have \(s \sim _{{(\exists \sigma \forall )}} t\) if and only if \(L(s) = L(t)\) and \(\{\, \mathcal {P}^{s,a}_{\sim _{{(\exists \sigma \forall )}}} \mid a \in \mathcal {A}\text { and }\mathcal {P}^{s,a}_{\sim _{{(\exists \sigma \forall )}}} \text {is strictly minimal} \,\} = \{\, \mathcal {P}^{t,a}_{\sim _{{(\exists \sigma \forall )}}} \mid a \in \mathcal {A}\text { and }\mathcal {P}^{t,a}_{\sim _{{(\exists \sigma \forall )}}} \text { is strictly minimal} \,\}\).

The expensive step in computing the coarsest alternating probabilistic bisimulation \(\sim _{{(\exists \sigma \forall )}}\), as described in [20], is checking the strict minimality of a polytope \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) for \(a\in \mathcal {A}(s)\). This decision problem was previously solved in EXPTIME via a reduction to a system of linear (in)equalities. In this paper, we give a polynomial-time routine to verify the strict minimality of a polytope, which in turn enables a polynomial-time decision algorithm for \(\sim _{{(\exists \sigma \forall )}}\). To this aim, we use the following equivalent form of Farkas' Lemma [39].

Lemma 7

Let \(A \in \mathbb {R}^{m \times n}\), \(\varvec{b} \in \mathbb {R}^{m}\) and \(\varvec{c} \in \mathbb {R}^{n}\). Then, \(A \varvec{x} \le \varvec{b}\) implies \(\varvec{c} \cdot \varvec{x} \le d\) if and only if there exists \(\varvec{y} \in \mathbb {R}_{\ge 0}^{m}\) such that \(A^{T} \varvec{y} = \varvec{c}\) and \(\varvec{b} \cdot \varvec{y} \le d\).
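As a tiny sanity check of the lemma's statement, the following sketch verifies a hand-picked certificate \(\varvec{y}\) on a concrete instance; the instance is ours, chosen only for illustration:

```python
from fractions import Fraction as F

# Instance: A x <= b (i.e. x1 <= 1 and x2 <= 1) implies c . x <= d
# (i.e. x1 + x2 <= 2), witnessed by y >= 0 with A^T y = c, b . y <= d.
A = [[F(1), F(0)], [F(0), F(1)]]
b = [F(1), F(1)]
c = [F(1), F(1)]
d = F(2)
y = [F(1), F(1)]   # the Farkas certificate

# Certificate conditions: A^T y = c and b . y <= d.
At_y = [sum(A[i][j] * y[i] for i in range(2)) for j in range(2)]
assert At_y == c
assert sum(bi * yi for bi, yi in zip(b, y)) <= d

# Sanity-check the implication on a few points satisfying A x <= b.
for x in ([F(1), F(1)], [F(0), F(1, 2)], [F(-3), F(1)]):
    if all(sum(A[i][j] * x[j] for j in range(2)) <= b[i] for i in range(2)):
        assert sum(ci * xi for ci, xi in zip(c, x)) <= d
print("certificate verified")
```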

This variant of Farkas’ Lemma leads us to establish the main result of the paper. Formally,

Theorem 8

Given an IMDP \(\mathcal {M}\), a state \(s \in S\), an equivalence relation \(\mathcal {\mathcal {R}}\subseteq S\times S\) and a set \(\{\, \mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}} \mid a \in \mathcal {A}(s) \,\}\) defined as in Definition 5, checking whether for each \(a \in \mathcal {A}(s)\), the polytope \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) is strictly minimal, is in P.

Proof

Let \(\mathcal {A}(s) = \{a_{0}, a_{1}, \cdots , a_{m}\}\), \(n = \left| S/\mathcal {\mathcal {R}}\right| \), and \(P_{i} = \mathcal {P}^{s,a_{i}}_{\mathcal {\mathcal {R}}}\) for \(0 \le i \le m\). We describe the verification routine to check the strict minimality of \(P_{0}\); the same routine applies to the other polytopes. We consider the converse of the strict minimality problem, which asks to decide whether there exist \(\lambda _{1}, \lambda _{2}, \cdots , \lambda _{m} \in \mathbb {R}_{\ge 0}\) such that \(\sum _{i=1}^{m} \lambda _{i} = 1\) and \(\sum _{i=1}^{m} \lambda _{i} P_{i} \subseteq P_{0}\). We show that the latter problem can be cast as an LP via Farkas' Lemma 7. To this aim, we alternatively reformulate the converse problem as “do there exist \(\lambda _{1}, \lambda _{2}, \cdots , \lambda _{m} \in \mathbb {R}_{\ge 0}\) with \(\sum _{i=1}^{m} \lambda _{i} = 1\), such that \(\varvec{x}^{i} \in P_{i}\) for each \(1 \le i \le m\) implies \(\sum _{i=1}^{m} \lambda _{i} \varvec{x}^{i} \in P_{0}\)?”.

For every fixed \(\lambda _{1}, \lambda _{2}, \cdots , \lambda _{m} \in \mathbb {R}_{\ge 0}\) with \(\sum _{i=1}^{m} \lambda _{i} = 1\), the implication “\((\forall 1 \le i \le m : \varvec{x}^{i} \in P_{i}) \implies \sum _{i=1}^{m} \lambda _{i} \varvec{x}^{i} \in P_{0}\)” can be written as the conjunction of 2n conditions:

$$\begin{aligned} \bigwedge _{i=1}^{m} \varvec{l}^{i} \le \varvec{x}^{i} \le \varvec{u}^{i} \wedge \bigwedge _{i=1}^{m} \varvec{1} \cdot \varvec{x}^{i} = 1 \implies \sum _{i=1}^{m} \lambda _{i} \varvec{x}^{i}_{k} \ge \varvec{l}^{0}_{k} \end{aligned}$$
(1)
$$\begin{aligned} \bigwedge _{i=1}^{m} \varvec{l}^{i} \le \varvec{x}^{i} \le \varvec{u}^{i} \wedge \bigwedge _{i=1}^{m} \varvec{1} \cdot \varvec{x}^{i} = 1 \implies \sum _{i=1}^{m} \lambda _{i} \varvec{x}^{i}_{k} \le \varvec{u}^{0}_{k} \end{aligned}$$
(2)

for all \(1 \le k \le n\). (Note that the condition \(\varvec{1} \cdot \sum _{i=1}^{m} \lambda _{i} \varvec{x}^{i} = 1\) is trivially satisfied if \(\varvec{1} \cdot \varvec{x}^{i} = 1\) for all \(1 \le i \le m\).) Each of the conditions (1) and (2), by Farkas’ Lemma, is equivalent to the feasibility of a system of inequalities; for instance, for a given k, (1) is true if and only if there exist vectors \(\varvec{\mu }^{k,i}, \varvec{\nu }^{k,i} \in \mathbb {R}_{\ge 0}^{n}\) and scalars \(\theta ^{k,i}, \eta ^{k,i} \in \mathbb {R}_{\ge 0}\) for each \(1 \le i \le m\) satisfying:

$$\begin{aligned} \varvec{\mu }^{k,i} - \varvec{\nu }^{k,i} + \theta ^{k,i} \varvec{1} - \eta ^{k,i} \varvec{1}&= -\lambda _{i} \mathbf {e}^{\mathbf {k}}&\forall 1 \le i \le m\end{aligned}$$
(3)
$$\begin{aligned} \sum _{i=1}^{m} \left( \varvec{u}^{i} \cdot \varvec{\mu }^{k,i} - \varvec{l}^{i} \cdot \varvec{\nu }^{k,i} + \theta ^{k,i} - \eta ^{k,i}\right)&\le -\varvec{l}^{0}_{k} \end{aligned}$$
(4)

Similarly, for a given k, (2) is true if and only if there exist vectors \(\varvec{\widehat{\mu }}^{k,i}, \varvec{\widehat{\nu }}^{k,i} \in \mathbb {R}_{\ge 0}^{n}\) and scalars \(\widehat{\theta }^{k,i}, \widehat{\eta }^{k,i} \in \mathbb {R}_{\ge 0}\) for each \(1 \le i \le m\) satisfying:

$$\begin{aligned} \varvec{\widehat{\mu }}^{k,i} - \varvec{\widehat{\nu }}^{k,i} + \widehat{\theta }^{k,i} \varvec{1} - \widehat{\eta }^{k,i} \varvec{1}&= \lambda _{i} \mathbf {e}^{\mathbf {k}}&\forall 1 \le i \le m\end{aligned}$$
(5)
$$\begin{aligned} \sum _{i=1}^{m} (\varvec{u}^{i} \cdot \varvec{\widehat{\mu }}^{k,i} - \varvec{l}^{i} \cdot \varvec{\widehat{\nu }}^{k,i} + \widehat{\theta }^{k,i} - \widehat{\eta }^{k,i})&\le \varvec{u}^{0}_{k} \end{aligned}$$
(6)

Thus, the converse problem we are aiming to solve reduces to checking the existence of vectors \(\varvec{\mu }^{k,i}, \varvec{\nu }^{k,i}, \varvec{\widehat{\mu }}^{k,i}, \varvec{\widehat{\nu }}^{k,i} \in \mathbb {R}_{\ge 0}^{n}\) and scalars \(\lambda _{i}, \theta ^{k,i}, \eta ^{k,i}, \widehat{\theta }^{k,i}, \widehat{\eta }^{k,i} \in \mathbb {R}_{\ge 0}\) for each \(1 \le i \le m\) satisfying (3)–(6) and \(\sum _{i=1}^{m} \lambda _{i} = 1\). That amounts to solving an LP problem, which is known to be in P.    \(\square \)

Fig. 2. Alternating probabilistic bisimulation algorithm for interval MDPs.

As stated earlier, in order to compute \(\sim _{{(\exists \sigma \forall )}}\) we follow the standard partition refinement approach formalized by the procedure in Fig. 2. Namely, we start with \(\mathcal {\mathcal {R}}\) being the complete relation and iteratively remove from \(\mathcal {\mathcal {R}}\) pairs of states that violate the definition of bisimulation with respect to \(\mathcal {\mathcal {R}}\). Clearly, the core part of the algorithm is to check whether two states “violate the definition of bisimulation”. The violation of bisimilarity of s and t with respect to \(\mathcal {\mathcal {R}}\) is checked by verifying whether states s and t have the same set of strictly minimal polytopes. As a result of Theorem 8, this verification routine runs in polynomial time. As regards the computational complexity of the algorithm, let \(|S|=n\) and \(|\mathcal {A}| = m\). The violation check in Fig. 2 is invoked at most \(n^{3}\) times; each invocation is linear in m and in the complexity of checking strict minimality of \(\mathcal {P}^{s,a}_{\mathcal {\mathcal {R}}}\) and \(\mathcal {P}^{t,a}_{\mathcal {\mathcal {R}}}\), which is in \(\mathcal {O}(\left| \mathcal {M}\right| ^{\mathcal {O}(1)})\). Putting all these together, we get the following result.

Theorem 9

Given an IMDP \(\mathcal {M}\), computing \(\sim _{{(\exists \sigma \forall )}}\) can be done in time \(\mathcal {O}(\left| \mathcal {M}\right| ^{\mathcal {O}(1)})\), i.e., in polynomial time.

5 Compositional Reasoning

In order to study compositional minimization, that is, to split a complex IMDP into the parallel composition of several simpler IMDPs and then to use bisimulation as a means to reduce the size of each of these IMDPs before performing the model checking for a given PCTL formula \(\varphi \), we have to extend the notion of bisimulation from one IMDP to a pair of IMDPs. We do this by following the usual construction (see, e.g., [6, 40]): given two IMDPs \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\), we say that they are alternating probabilistic \({(\exists \sigma \forall )}\)-bisimilar, denoted by \(\mathcal {M}_{1} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{2}\), if there exists an alternating probabilistic \({(\exists \sigma \forall )}\)-bisimulation on the disjoint union of \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\) such that \(\bar{s}_{1} \sim _{{(\exists \sigma \forall )}} \bar{s}_{2}\). We can now establish the first property needed for compositional minimization, that is, transitivity of \(\sim _{{(\exists \sigma \forall )}}\):

Theorem 10

Given three IMDPs \(\mathcal {M}_{1}\), \(\mathcal {M}_{2}\), and \(\mathcal {M}_{3}\), whenever \(\mathcal {M}_{1} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{2}\) and \(\mathcal {M}_{2} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{3}\), then \(\mathcal {M}_{1} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{3}\).

For the second property needed by compositional minimization, that is, that \(\sim _{{(\exists \sigma \forall )}}\) is preserved by the parallel composition operator, we first have to introduce such an operator; to this end, we consider a slight adaptation of the synchronous product of \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\) introduced in [22]. This synchronous product makes use of a subclass of Segala's (simple) probabilistic automata [40, 41], called action agnostic probabilistic automata [22], in which every automaton has the same singleton set of actions \(\{f\}\), that is, all transitions are labelled by the same external action \(f\): an (action agnostic) probabilistic automaton (PA) is a tuple \(\mathcal {P}= (S, \bar{s}, \texttt {AP}, L, D )\), where \(S\) is a set of states, \(\bar{s}\in S\) is the start state, \(\texttt {AP}\) is a finite set of atomic propositions, \(L:S\rightarrow 2^{\texttt {AP}}\) is a labelling function, and \( D \subseteq S\times \mathrm {Disc}(S)\) is a probabilistic transition relation.

Definition 11

Given two IMDPs \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\), we define the synchronous product of \(\mathcal {M}_{1}\) and \(\mathcal {M}_{2}\) as \(\mathcal {M}_{1} \otimes \mathcal {M}_{2}:= \mathtt {F}(\mathtt {UF}(\mathcal {M}_{1}) \otimes \mathtt {UF}(\mathcal {M}_{2}))\) where

  • the unfolding mapping \(\mathtt {UF}:\textit{IMDP}\rightarrow \textit{PA}\) is a function that maps a given IMDP \(\mathcal {M}= (S, \bar{s}, \mathcal {A}, \texttt {AP},L, I )\) to the PA \(\mathcal {P}= (S, \bar{s}, \texttt {AP}, L, D )\) where \( D = \{\, (s, \mu ) \mid s \in S, \exists a \in \mathcal {A}(s): \mu \in \mathtt {{Ext}}({\mathcal {H}^{a}_{s}}) \wedge \mathcal {H}^{a}_{s}\;\text {is a strictly minimal polytope} \,\}\);

  • the folding mapping \(\mathtt {F}:\textit{PA}\rightarrow \textit{IMDP}\) transforms a PA \(\mathcal {P}= (S, \bar{s}, \texttt {AP}, L, D )\) into the IMDP \(\mathcal {M}= (S, \bar{s}, \{f\}, \texttt {AP}, L, I )\) where, for each \(s, t \in S\), \( I (s, f, t) = {{\mathrm{proj}}}_{\varvec{\mathbf {e}^{\mathbf {t}}}}{{{\mathrm{conv}}}{\{\, \mu \mid (s, \mu ) \in D \,\}}}\);

  • the synchronous product of two PAs \(\mathcal {P}_{1}\) and \(\mathcal {P}_{2}\), denoted by \(\mathcal {P}_{1} \otimes \mathcal {P}_{2}\), is the probabilistic automaton \(\mathcal {P}= (S, \bar{s}, \texttt {AP}, L, D )\) where \(S= S_{1} \times S_{2}\), \(\bar{s}= (\bar{s}_{1}, \bar{s}_{2})\), \(\texttt {AP}= \texttt {AP}_{1} \cup \texttt {AP}_{2}\), for each \((s_{1},s_{2}) \in S\), \(L(s_{1},s_{2}) = L_{1}(s_{1}) \cup L_{2}(s_{2})\), and \( D = \{\, ((s_{1},s_{2}), \mu _{1} \times \mu _{2}) \mid (s_{1}, \mu _{1}) \in D _{1} \text { and } (s_{2}, \mu _{2}) \in D _{2} \,\}\), where \(\mu _{1} \times \mu _{2}\) is defined for each \((t_{1},t_{2}) \in S_{1} \times S_{2}\) as \((\mu _{1} \times \mu _{2})(t_{1},t_{2}) = \mu _{1}(t_{1}) \cdot \mu _{2}(t_{2})\).
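The synchronous product of two PAs from the last item of Definition 11 can be sketched in code as follows. This is an illustrative encoding, not the paper's tooling: a PA is a plain tuple `(S, s0, AP, L, D)` and a distribution is a dict mapping states to probabilities.

```python
def product_dist(mu1, mu2):
    """Product distribution: (mu1 x mu2)(t1, t2) = mu1(t1) * mu2(t2)."""
    return {(t1, t2): p1 * p2
            for t1, p1 in mu1.items()
            for t2, p2 in mu2.items()}

def sync_product(pa1, pa2):
    """Synchronous product pa1 (x) pa2 of two action agnostic PAs,
    following Definition 11: product states, union of atomic
    propositions and labels, product distributions on transitions."""
    S1, s01, AP1, L1, D1 = pa1
    S2, s02, AP2, L2, D2 = pa2
    S = {(s1, s2) for s1 in S1 for s2 in S2}
    L = {(s1, s2): L1[s1] | L2[s2] for (s1, s2) in S}
    D = [((s1, s2), product_dist(m1, m2))
         for (s1, m1) in D1 for (s2, m2) in D2]
    return S, (s01, s02), AP1 | AP2, L, D
```

Note that `product_dist` preserves the total mass: if both arguments sum to 1, so does the result.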

As stated earlier, Definition 11 is slightly different from its counterpart in [22]. As a matter of fact, due to the competitive semantics for resolving the nondeterminism, only actions whose uncertainty set is a strictly minimal polytope play a role in deciding the alternating bisimulation relation \(\sim _{{(\exists \sigma \forall )}}\). In particular, for compositional reasoning, keeping state actions whose uncertainty set is not strictly minimal induces spurious behaviors and therefore affects the soundness of the parallel operator definition. In order to avoid such redundancies, we can either preprocess the IMDPs before composing them, removing state actions whose uncertainty set is not strictly minimal, or restrict the unfolding mapping \(\mathtt {UF}\) so that all extreme transitions in the resulting probabilistic automaton correspond to extreme points of strictly minimal polytopes in the original IMDP. For the sake of simplicity, we choose the latter.

Theorem 12

Given three IMDPs \(\mathcal {M}_{1}\), \(\mathcal {M}_{2}\), and \(\mathcal {M}_{3}\), if \(\mathcal {M}_{1} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{2}\), then \(\mathcal {M}_{1} \otimes \mathcal {M}_{3} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{2} \otimes \mathcal {M}_{3}\).

We have so far considered parallel composition via the synchronous product, which, through the folding mapping, collapses all labels into a single transition label. Here we consider the other extreme of parallel composition: interleaving only.

Definition 13

Given two IMDPs \(\mathcal {M}_{l}\) and \(\mathcal {M}_{r}\), we define the interleaved composition of \(\mathcal {M}_{l}\) and \(\mathcal {M}_{r}\) as the IMDP \(\mathcal {M}= (S, \bar{s}, \mathcal {A}, \texttt {AP}, L, I )\) where

  • \(S= S_{l} \times S_{r}\);

  • \(\bar{s}= (\bar{s}_{l},\bar{s}_{r})\);

  • \(\mathcal {A}= (\mathcal {A}_{l} \times \{l\}) \cup (\mathcal {A}_{r} \times \{r\})\);

  • \(\texttt {AP}= \texttt {AP}_{l} \cup \texttt {AP}_{r}\);

  • for each \((s_{l},s_{r}) \in S\), \(L(s_{l},s_{r}) = L_{l}(s_{l}) \cup L_{r}(s_{r})\); and

Theorem 14

Given three IMDPs \(\mathcal {M}_{1}\), \(\mathcal {M}_{2}\), and \(\mathcal {M}_{3}\), if \(\mathcal {M}_{1} \sim _{{(\exists \sigma \forall )}} \mathcal {M}_{2}\), then

6 Case Studies

We implemented the proposed bisimulation minimization algorithm in a prototypical tool and applied it to several case studies. The bisimulation algorithm is tested on several PRISM [29] benchmarks extended to also support intervals in the transitions. For the evaluation, we have used a machine with a 3.6 GHz Intel i7-4790 and 16 GB of RAM, of which 12 GB were assigned to the tool; the timeout has been set to 30 minutes. Our tool reads a model specification in the PRISM input language and constructs an explicit-state representation of the state space. Afterwards, it computes the quotient using the algorithm in Fig. 2.

Table 1. Experimental evaluation of the bisimulation computation
Fig. 3. Effectiveness of bisimulation minimization on model reduction

Fig. 4. State and transition reduction ratio by bisimulation minimization

Table 1 shows the performance of our prototype on a number of case studies taken from the PRISM website [36], where we have replaced some of the probabilistic choices with intervals. Despite using an explicit representation for the model, the prototype is able to manage case studies in the order of millions of states and transitions (columns “Model”, “\(|S|\)”, and “\(| I |\)”). The time in seconds required to compute the bisimulation relation and the size of the corresponding quotient IMDP are shown in columns “\(t_{\sim }\)”, “\(|S_{\mathord {\sim }}|\)”, and “\(| I _{\mathord {\sim }}|\)”. In order to improve the performance of the tool, we have implemented optimizations, such as caching equivalent LP problems, which improve its runtime. In this way we avoided solving several LP problems in each tool run, preventing the potentially costly solution of LP problems from becoming a bottleneck. However, the more refinements are needed, the more time is required to complete the minimization, since several new LP problems need to be solved. The plots in Fig. 3 show graphically the number of states and transitions for the Consensus and Crowds experiments, where for the latter we have considered more instances than the ones reported in Table 1. As we can see, the bisimulation minimization is able to reduce the size of the IMDP considerably, by several orders of magnitude. Additionally, this reduction correlates positively with the number of model parameters, as depicted in Fig. 4.
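The LP-caching optimization mentioned above can be sketched as a simple memoization layer. This is only an illustration of the idea, not the tool's actual implementation: `solve_lp` is a hypothetical stand-in for the real LP solver, and we assume two LP instances are "equivalent" when their constraint rows coincide up to reordering.

```python
def canonical(constraints):
    """Order-independent, hashable key for a system of constraint rows
    (each row given as a sequence of coefficients)."""
    return tuple(sorted(tuple(row) for row in constraints))

def make_cached_solver(solve_lp):
    """Wrap an LP solver so that structurally equivalent instances
    are solved only once and subsequent hits reuse the cached answer."""
    cache = {}
    def solve(constraints):
        key = canonical(constraints)
        if key not in cache:
            cache[key] = solve_lp(constraints)  # costly call, done once
        return cache[key]
    solve.cache = cache  # exposed for inspection
    return solve
```

In the bisimulation loop, many pairs of states give rise to identical feasibility checks, which is what makes such a cache pay off.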

7 Concluding Remarks

In this paper, we have analyzed interval Markov decision processes under controller synthesis semantics in a dynamic setting. In particular, we provided an efficient compositional bisimulation minimization approach for IMDPs with respect to the competitive semantics, encompassing both the controller and parameter synthesis semantics. In this regard, we proved that alternating probabilistic bisimulation for IMDPs with respect to the competitive semantics can be decided in polynomial time. From the perspective of compositional reasoning, we showed that alternating probabilistic bisimulations for IMDPs are congruences with respect to synchronous product and interleaving. Finally, we presented results obtained with a prototype tool on several case studies to show the effectiveness of the developed algorithm.

The core part of this algorithm relies on verifying strictly minimal polytopes in polynomial time, which depends on the special structure of the uncertainty polytopes. For future work, we aim to explore the possibility of preserving this computational efficiency for MDPs with richer formalisms for uncertainties such as likelihood or ellipsoidal uncertainties.