1 Introduction

Markov automata [24, 26] extend labeled transition systems with probabilistic branching and exponentially distributed delays. They are a compositional variant of continuous-time Markov decision processes (CTMDPs), in a similar vein as Segala’s probabilistic automata extend classical MDPs. Transitions of a Markov automaton (MA) lead from states to probability distributions over states, and are either labeled with actions (allowing for interaction) or real numbers (rates of exponential distributions). MAs are used in reliability engineering [8, 51], hardware design [19], data-flow computation [38], dependability [9] and performance evaluation [25], as they provide a natural semantic framework for modeling formalisms such as AADL, dynamic fault trees, stochastic Petri nets, stochastic activity networks, SADF, etc. The verification of MAs has so far focused on single objectives such as reachability, timed reachability, expected costs, and long-run averages [2, 14, 16, 30, 31, 34, 35]. These analyses cannot treat objectives that mutually influence each other. The aim of this paper is to analyze multiple objectives on MAs at once, facilitating trade-off analysis by approximating Pareto curves.

Consider the stochastic job scheduling problem of [12]: perform n jobs with exponential service times on k identical processors allowing pre-emptive scheduling. Once a job finishes, all k processors can be assigned any of the m remaining jobs. When \(n{-}m\) jobs are finished, this yields \(\left( {\begin{array}{c}m\\ k\end{array}}\right) \) non-deterministic choices. The largest-expected-service-time-first policy is optimal for minimizing the expected time to complete all jobs [12]. It is unclear how to schedule when imposing extra constraints, e.g., requiring a high probability to finish a batch of c jobs within a tight deadline (to accelerate their post-processing), or having a low average waiting time. These multiple objectives involve non-trivial trade-offs. Our algorithms analyze such trade-offs. Figure 1, e.g., shows the obtained result for 12 jobs and 3 processors. It approximates the set of points \((p_{1}, p_{2})\) for schedules achieving that (1) the expected time to complete all jobs is at most \(p_{1}\) and (2) the probability to finish half of the jobs within an hour is at least \(p_{2}\).
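As a purely illustrative sanity check of this combinatorial blow-up (our own snippet, not part of the benchmark; the numbers \(n=12\) and \(k=3\) match Fig. 1), the number of non-deterministic choices per decision epoch can be listed as follows, assuming at least \(k\) jobs remain:

```python
from math import comb

n, k = 12, 3  # 12 jobs on 3 identical processors, matching Fig. 1

# With m >= k unfinished jobs, a scheduler picks which k of them to run next,
# which gives C(m, k) nondeterministic choices (the processors are identical).
for m in range(n, k - 1, -1):
    print(f"{m:2d} jobs left -> {comb(m, k):3d} choices")
```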

Fig. 1 Approx. Pareto curve for stochastic job scheduling

This paper extends an earlier paper [44] and presents techniques to verify MAs with multiple objectives. The contribution is threefold. First and foremost, we generalize the analysis of (classical, discrete-time) MDPs with multiple objectives to MAs. Second, we remove some restrictions present in the classical analysis of MDPs with multiple objectives. Third, we provide an efficient implementation. We detail the contributions below.

Regarding the analysis of MAs, we consider multiple (un)timed reachability and expected reward objectives as well as their combinations. In short, we reduce all these problems to instances of multi-objective verification problems on classical MDPs. For multi-objective queries involving (combinations of) untimed reachability and expected reward objectives, corresponding algorithms on the underlying MDP can be used. In this case, the MDP is simply obtained by ignoring the timing information, see Fig. 2b. The crux is in relating MA schedulers—which can exploit state sojourn times to optimize their decisions—to MDP schedulers. For multiple timed reachability objectives, digitization [30, 35] is employed to obtain an MDP, see Fig. 2c. The key is to mimic sojourn times by self-loops with appropriate probabilities. This provides a sound, arbitrarily close approximation of the timed behavior and also allows combining timed reachability objectives with other types of objectives. The main contribution is to show that digitization is sound for all possible MA schedulers. This requires a new proof strategy as the existing ones are tailored to single objectives.

We extend approaches for classical multi-objective MDPs to support the multi-objective analysis of digitization MDPs. The extension embeds single-objective MA techniques from [35] into the multi-objective framework of [28]. Furthermore, we allow the simultaneous analysis of minimizing and maximizing expected reward objectives, lifting a restriction imposed in [28] by applying additional preprocessing steps that eliminate problematic end components.

Experiments on instances of four MA benchmarks show encouraging results. Multiple untimed reachability and expected reward objectives can be efficiently treated for models with millions of states. As for single objectives [30], timed reachability is more expensive. Our implementation is competitive with PRISM for multi-objective MDPs [28, 39] and with IMCA [30], which implements [35], for single-objective MAs.

Compared with the version in [44], we (1) provide all technical ideas regarding the analysis of Markov automata and give formal proofs that were previously omitted. Furthermore, (2) the description of the existing multi-objective MDP model checking approach makes the paper self-contained, and allows us to give (3) the necessary extensions that allow, e.g., mixing upper and lower bounds on the different objectives. While such extensions may be considered minor, they are practically relevant and often intricate.

Related work Multi-objective decision making for MDPs with discounting and long-run objectives has been well investigated; for a recent survey, see [47]. Etessami et al. [27] consider verifying finite MDPs with multiple \(\omega \)-regular objectives. Other multiple objectives include expected rewards under worst-case reachability [13, 29], reward-bounded reachability [33], quantiles and conditional probabilities [5, 33], mean pay-offs and stability [11], long-run objectives [7, 10], total average discounted rewards under PCTL [49], and stochastic shortest path objectives [46]. This has been extended to MDPs with unknown cost function [37], MDPs under a restricted class of policies [23], infinite-state MDPs [20] arising from two-player timed games in a stochastic environment, and stochastic two-player games [18]. To the best of our knowledge, this is the first work on multi-objective MDPs extended with random timing.

Fig. 2 MA \(\mathcal {M}\) with underlying MDP \({\mathcal {M}_\mathcal {D}}\) and digitization \({\mathcal {M}_{\delta }}\)

2 Preliminaries

Notations The set of real numbers is denoted by \(\mathbb {R}\), and we write \(\mathbb {R}_{> 0}= \{x \in \mathbb {R}\mid x > 0\}\) and \(\mathbb {R}_{\ge 0}= \mathbb {R}_{> 0}\cup \{0\}\). For a finite set \(S\), \(\textit{Dist}(S)\) denotes the set of probability distributions over \(S\). \(\mu \in \textit{Dist}(S)\) is Dirac if \(\mu (s) = 1\) for some \(s\in S\).

2.1 Models

Markov automata generalize both Markov decision processes (MDPs) and continuous-time Markov chains (CTMCs). They are extended with rewards (or, equivalently, costs) to allow modeling, e.g., energy consumption.

Definition 1

(Markov automaton) A Markov automaton (MA) is a tuple \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) where \(S\) is a finite set of states with initial state \({s_{0}}\in S\), \( Act \) is a finite set of actions with \(\bot \in Act \) and \( Act \cap \mathbb {R}_{\ge 0}= \emptyset \),

  • \({\rightarrow } \subseteq S\times ( Act \cup \mathbb {R}_{> 0}) \times \textit{Dist}(S)\) is a set of transitions such that for all \(s\in S\) there are no two transitions \((s, \lambda , \mu ), (s, \lambda ', \mu ') \in {\rightarrow }\) with \(\lambda , \lambda ' \in \mathbb {R}_{> 0}\), and

  • \(\rho _{1}, \dots , \rho _{\ell }\) with \(\ell \ge 0\) are reward functions \(\rho _{i} :S\cup (S\times Act ) \rightarrow \mathbb {R}_{\ge 0}\).

In the remainder of the article, let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) denote an MA. Below, we introduce some notation and some standard assumptions on MAs. A transition \((s, \gamma , \mu ) \in {\rightarrow }\), denoted by \(s\xrightarrow {\gamma } \mu \), is called probabilistic if \(\gamma \in Act \) and Markovian if \(\gamma \in \mathbb {R}_{> 0}\). In the latter case, \(\gamma \) is the rate of an exponential distribution, modeling a time-delayed transition. Probabilistic transitions fire instantaneously. The successor state is determined by \(\mu \), i.e., we move to \(s'\) with probability \(\mu (s')\). A reward function \(\rho _{i}\) defines state rewards and action rewards. When sojourning in a state \(s\) for \(t\) time units, the state reward \(\rho _{i}(s) \cdot t\) is obtained. Upon taking a transition \(s\xrightarrow {\gamma } \mu \), we collect action reward \(\rho _{i}(s, \gamma )\) (if \(\gamma \in Act \)) or \(\rho _{i}(s, \bot )\) (if \(\gamma \in \mathbb {R}_{> 0}\)).

Example 1

Figure 2a shows an MA \(\mathcal {M}\) with seven states and no rewards. We draw direct edges for Dirac distributions, and Markovian transitions are illustrated by dashed arrows. For example: State \(s_0\) has an outgoing Markovian transition with rate 1 to a distribution \(\mu \), where \(\mu (s_3) = 1\). From state \(s_4\) there are two probabilistic transitions labelled with two different actions \(\gamma \) and \(\eta \). For \(s_4 \xrightarrow {\eta } \mu \), \(\mu \) is given by \(\mu (s_5) = 0.7\) and \(\mu (s_2) = 0.3\).

Probabilistic (Markovian) states PS (MS) have an outgoing probabilistic (Markovian) transition, respectively:

$$\begin{aligned}&\text {PS}=\{ s\in S\mid s\xrightarrow {\alpha } \mu , \alpha \in Act \}, \\&\text {MS}=\{ s\in S\mid s\xrightarrow {\lambda } \mu , \lambda \in \mathbb {R}_{> 0}\}. \end{aligned}$$

We exclude terminal states \(s\notin \text {PS}\cup \text {MS}\) by adding a (Markovian) self-loop to all these states. As standard for MAs [24, 26], we impose the maximal progress assumption, i.e., probabilistic transitions take precedence over Markovian ones as the probability to fire a Markovian transition instantly is zero. Thus, we assume that states are either probabilistic or Markovian: \(S= \text {PS}\uplus \text {MS}\). For Markovian states, we define the exit rate \({\text {E}(s)}\) of \(s\in \text {MS}\) by \(s\xrightarrow {{\text {E}(s)}} \mu \).

Example 2

Reconsider \(\mathcal {M}\) in Fig. 2a. The states \(s_0\), \(s_1\), \(s_2\), \(s_5\) and \(s_6\) are Markovian. All other states are probabilistic. The exit rate of \(s_0\) is 1, the exit rate of \(s_5\) is 5.

We assume action-deterministic MAs: \(|\{ \mu \in \textit{Dist}(S) \mid s\xrightarrow {\alpha } \mu \}| \le 1\) holds for all \(s\in S\) and \(\alpha \in Act \). For probabilistic states, this can be achieved by renaming. For Markovian states, this follows from the definition of Markov automata. The transition probabilities of \(\mathcal {M}\) are given by the function \(\mathbf{P} :S\times Act \times S\rightarrow [0,1]\) satisfying

$$\begin{aligned} \mathbf{P} (s, \alpha , s') = {\left\{ \begin{array}{ll} \mu (s') &{} \text {if either } s\xrightarrow {\alpha } \mu \text { or } \big (\alpha = \bot \text { and } s\xrightarrow {{\text {E}(s)}} \mu \big ) \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

The value \(\mathbf{P} (s, \alpha , s')\) corresponds to the probability to move from \(s\) with action \(\alpha \) to \(s'\). The enabled actions at state \(s\) are given by \( Act (s) = \{ \alpha \in Act \mid \exists s' \in S:\mathbf{P} (s, \alpha , s') > 0 \}\).
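For illustration only, the following Python sketch encodes a small, entirely hypothetical MA (only loosely inspired by Fig. 2a) as plain dictionaries and derives the transition probabilities \(\mathbf{P} \) and the enabled actions \( Act (s)\) as defined above; the identifiers are ad hoc and not part of any tool.

```python
BOT = "bot"  # stands for the distinguished action ⊥

# A hypothetical three-state MA: s0 is Markovian with exit rate 1.0, s1 is
# probabilistic with actions "alpha" and "beta", s2 is Markovian and absorbing.
markovian = {      # s -> (exit rate E(s), successor distribution mu)
    "s0": (1.0, {"s1": 1.0}),
    "s2": (1.0, {"s2": 1.0}),
}
probabilistic = {  # s -> {action -> successor distribution mu}
    "s1": {"alpha": {"s2": 1.0}, "beta": {"s0": 0.5, "s2": 0.5}},
}

def transition_prob(s, a, s_next):
    """P(s, a, s'): probability to move from s to s' when taking action a."""
    if s in probabilistic and a in probabilistic[s]:
        return probabilistic[s][a].get(s_next, 0.0)
    if s in markovian and a == BOT:  # Markovian states only offer ⊥ (maximal progress)
        _, mu = markovian[s]
        return mu.get(s_next, 0.0)
    return 0.0

def enabled_actions(s):
    """Act(s): all actions a with P(s, a, s') > 0 for some s'."""
    return set(probabilistic[s]) if s in probabilistic else {BOT}

print(enabled_actions("s1"))                # e.g. {'alpha', 'beta'}
print(transition_prob("s1", "beta", "s2"))  # 0.5
print(transition_prob("s0", BOT, "s1"))     # 1.0
```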

Example 3

Reconsider \(\mathcal {M}\) in Fig. 2a. The enabled actions in \(s_4\) are \(\gamma \) and \(\eta \). The probability \(\mathbf{P} (s_4, \eta , s_2) = 0.3\), while \(\mathbf{P} (s_5, \bot , s_4) = 0.4\).

Remark 1

In Sect. 4.3 we make one additional (but standard) assumption regarding the presence of Zeno-loops. We defer the discussion to that section.

Markov automata extend Markov decision processes by adding random timing. For concise notation, we define MDPs separately below.

Definition 2

(Markov decision process [43]) A Markov decision process (MDP) is a tuple \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\), with \(S, {s_{0}}, Act , \ell \) as in Def. 1, \(\rho _{1}, \dots , \rho _{\ell }\) are action reward functions \(\rho _{i} :S\times Act \rightarrow \mathbb {R}_{\ge 0}\), and \(\mathbf{P} :S\times Act \times S\rightarrow [0,1]\) are the transition probabilities satisfying \( \sum _{s' \in S}\mathbf{P} (s, \alpha , s') \in \{0,1\}\) for all \(s\in S\) and \(\alpha \in Act \).

MDPs are MAs without Markovian states, i.e., \(\text {MS}= \emptyset \). Thus, MDPs exhibit probabilistic branching and non-determinism, but no random timing.

The reward \(\rho _{}(s, \alpha )\) is collected when taking action \(\alpha \) at state \(s\). We do not consider state rewards for MDPs. The underlying MDP of an MA abstracts away from its timing:

Definition 3

(Underlying MDP) For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) with transition probabilities \(\mathbf{P} \) the underlying MDP of \(\mathcal {M}\) is the MDP \({\mathcal {M}_\mathcal {D}}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}))\), where for each \(i \in \{1, \dots , \ell \}\):

$$\begin{aligned} \rho _{i}^{\mathcal {D}}(s, \alpha ) = {\left\{ \begin{array}{ll} \rho _{i}(s, \alpha ) &{} \text {if } s\in \text {PS}, \\ \rho _{i}(s, \bot ) + \frac{1}{{\text {E}(s)}} \cdot \rho _{i}(s) &{} \text {if } s\in \text {MS}\text { and } \alpha = \bot , \\ 0 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

The reward functions \(\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}\) incorporate the action and state rewards of \(\mathcal {M}\) where the state rewards are multiplied with the expected sojourn times \(\frac{1}{{\text {E}(s)}}\) of states \(s\in \text {MS}\).
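Operationally, the reward transformation of Definition 3 folds the state reward of each Markovian state into the action reward of \(\bot \), weighted by the expected sojourn time. A minimal sketch of our own, continuing the hypothetical dictionary layout used in the earlier illustration:

```python
BOT = "bot"  # the distinguished action ⊥

def underlying_mdp_rewards(state_reward, action_reward, markovian, probabilistic):
    """rho^D(s, a) as in Def. 3 for a single reward function rho.

    state_reward:  s -> rho(s)            action_reward: (s, a) -> rho(s, a)
    markovian:     s -> (E(s), mu)        probabilistic: s -> {a: mu}
    """
    rho_d = {}
    for s, actions in probabilistic.items():      # s in PS: keep the action reward
        for a in actions:
            rho_d[(s, a)] = action_reward.get((s, a), 0.0)
    for s, (exit_rate, _) in markovian.items():   # s in MS: fold in the state reward
        rho_d[(s, BOT)] = (action_reward.get((s, BOT), 0.0)
                           + state_reward.get(s, 0.0) / exit_rate)
    return rho_d

# Hypothetical example: state reward 1 on Markovian state s0 with E(s0) = 2
# contributes an expected 1 * 1/2 per visit of s0 in the underlying MDP.
print(underlying_mdp_rewards({"s0": 1.0}, {},
                             {"s0": (2.0, {"s1": 1.0})},
                             {"s1": {"alpha": {"s0": 1.0}}}))
# {('s1', 'alpha'): 0.0, ('s0', 'bot'): 0.5}
```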

Example 4

Figure 2 shows an MA \(\mathcal {M}\) with its underlying MDP \({\mathcal {M}_\mathcal {D}}\).

Paths and schedulers Paths represent runs of \(\mathcal {M}\) starting in the initial state. Let \(t(\kappa _{})= 0\) and \(\alpha (\kappa _{})= \kappa _{}\), if \(\kappa _{}\in Act \), and \(t(\kappa _{})= \kappa _{}\) and \(\alpha (\kappa _{})= \bot \), if \(\kappa _{}\in \mathbb {R}_{\ge 0}\).

Definition 4

(Infinite path) An infinite path of MA \(\mathcal {M}\) with transition probabilities \(\mathbf{P} \) is an infinite sequence \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) of states \(s_0, s_1, \dots \in S\) and stamps \(\kappa _{0}, \kappa _{1}, \dots \in Act \cup \mathbb {R}_{\ge 0}\) such that (1) \(\sum _{i=0}^{\infty } t(\kappa _{i}) = \infty \), and for any \(i \ge 0\) it holds that (2) \(\mathbf{P} (s_i, \alpha (\kappa _{i}), s_{i+1}) > 0\), (3) \(s_i \in \text {PS}\) implies \(\kappa _{i} \in Act \), and (4) \(s_i \in \text {MS}\) implies \(\kappa _{i} \in \mathbb {R}_{\ge 0}\).

An infix \(s_i \xrightarrow {\kappa _{i}} s_{i+1}\) of a path \(\pi \) represents that we stay at \(s_i\) for \(t(\kappa _{i})\) time units and then perform action \(\alpha (\kappa _{i})\) and move to state \(s_{i+1}\). Condition (1) excludes Zeno paths, condition (2) ensures positive transition probabilities, and conditions (3) and (4) assert that stamps \(\kappa _{i}\) match the transition type at \(s_i\).

A finite path is a finite prefix \(\pi ' = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\) of an infinite path. The length of \(\pi '\) is \(|\pi '| = n\), its last state is \( last (\pi ') = s_n\), and the time duration is \( T (\pi ') = \sum _{0 \le i < |\pi '|} t(\kappa _{i})\). We denote the sets of finite and infinite paths of \(\mathcal {M}\) by \({ FPaths ^{\mathcal {M}}}\) and \({ IPaths ^{\mathcal {M}}}\), respectively. The superscript \(\mathcal {M}\) is omitted if the model is clear from the context. For a finite or infinite path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) the prefix of \(\pi \) of length n is denoted by \( pref (\pi , n)\). The ith state visited by \(\pi \) is given by \({\pi [i]} = s_i\). The time-abstraction \(\text {ta}(\pi )\) of \(\pi \) removes all sojourn times and is a path of the underlying MDP \({\mathcal {M}_\mathcal {D}}\): \( \text {ta}(\pi ) = s_0 \xrightarrow {\alpha (\kappa _{0})} s_1 \xrightarrow {\alpha (\kappa _{1})} \dots \) . Paths of \({\mathcal {M}_\mathcal {D}}\) are also referred to as the time-abstract paths of \(\mathcal {M}\).
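The path operations introduced above are easy to mirror in code. In the following sketch (our own ad hoc encoding, not part of the formal development), a finite path is a plain list alternating states and stamps; the time-abstraction simply drops the sojourn times.

```python
BOT = "bot"

def is_action(kappa):
    """A stamp is either an action label (a string here) or a sojourn time (a number)."""
    return isinstance(kappa, str)

def act(kappa):   # alpha(kappa) as defined above
    return kappa if is_action(kappa) else BOT

def t(kappa):     # t(kappa) as defined above
    return 0.0 if is_action(kappa) else float(kappa)

def duration(path):
    """T(pi'): total time elapsed along a finite path [s0, k0, s1, k1, ..., sn]."""
    return sum(t(kappa) for kappa in path[1::2])

def time_abstraction(path):
    """ta(pi): keep the states, replace every stamp by its action."""
    return [x if i % 2 == 0 else act(x) for i, x in enumerate(path)]

# Hypothetical finite path: sojourn 0.8 time units in s0, then take action "alpha" in s3.
pi = ["s0", 0.8, "s3", "alpha", "s6"]
print(duration(pi))          # 0.8
print(time_abstraction(pi))  # ['s0', 'bot', 's3', 'alpha', 's6']
```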

Definition 5

(Generic scheduler) A generic scheduler for \(\mathcal {M}\) is a measurable function \( \sigma :{ FPaths ^{}}\times Act \rightarrow [0,1] \) such that \(\sigma (\pi ,\cdot ) \in \textit{Dist}( Act ( last (\pi )))\) for each \(\pi \in { FPaths ^{}}\).

A scheduler \(\sigma \) for \(\mathcal {M}\) resolves the non-determinism of \(\mathcal {M}\): \(\sigma (\pi ,\alpha )\) is the probability to take transition \( last (\pi ) \xrightarrow {\alpha } \mu \) after observing the run \(\pi \). The set of such schedulers is denoted by \(\text {GM}^{\mathcal {M}}\) (\(\text {GM}^{}\) if \(\mathcal {M}\) is clear from the context). \(\sigma \in \text {GM}^{}\) is deterministic if the distribution \(\sigma (\pi ,\cdot )\) is Dirac for any \(\pi \). Time-abstract schedulers behave independently of the time-stamps of the given path, i.e., \(\sigma (\pi ,\alpha ) = \sigma (\pi ',\alpha )\) for all actions \(\alpha \) and paths \(\pi , \pi '\) with \(\text {ta}(\pi ) = \text {ta}(\pi ')\). We write \(\text {TA}^{\mathcal {M}}\) to denote the set of time-abstract schedulers of \(\mathcal {M}\). GM is the most general scheduler class for MAs, and \(\text {TA}^{}\) is the most general for MDPs.
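Viewed operationally, a scheduler is just a (measurable) function from finite paths and actions to probabilities. The following toy schedulers (our own illustration, using the list-based path encoding from the previous sketch; \(\texttt {bot}\) stands for \(\bot \)) contrast a deterministic time-abstract scheduler with a timed one that inspects the elapsed time, foreshadowing the proof of Theorem 1.

```python
import math

BOT = "bot"  # the only enabled action at Markovian states

def sigma_time_abstract(path, a):
    """Deterministic and time-abstract: always pick "alpha" at probabilistic states."""
    return 1.0 if a in ("alpha", BOT) else 0.0   # BOT and "alpha" are never both enabled

def sigma_timed(path, a):
    """Timed: pick "alpha" iff the time elapsed along the path is at most ln(2)."""
    elapsed = sum(k for k in path[1::2] if not isinstance(k, str))
    choice = "alpha" if elapsed <= math.log(2) else "beta"
    return 1.0 if a in (choice, BOT) else 0.0

# Hypothetical histories: sojourning 0.5 vs. 1.0 time units before the choice.
print(sigma_timed(["s0", 0.5, "s1"], "alpha"),
      sigma_timed(["s0", 1.0, "s1"], "alpha"))   # 1.0 0.0
```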

2.2 Objectives

An objective \({\mathbb {O}_{i}}\) is a representation of a quantitative property like the probability to reach an error state, or the expected energy consumption. To express Boolean properties (e.g., the probability to reach an error state is below \(p_{i}\)), \({\mathbb {O}_{i}}\) is combined with a threshold \(\vartriangleright _{i} p_{i}\) where \(\vartriangleright _{i}\, \in \{<, \le , >, \ge \}\) is a threshold relation and \(p_{i} \in \mathbb {R}\) is a threshold value. Let \(\mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}\) denote that the MA \(\mathcal {M}\) under scheduler \(\sigma \in \text {GM}^{}\) satisfies the property \({\mathbb {O}_{i}} \vartriangleright _{i} p_{i}\). We consider reachability objectives and expected reward objectives. In the remainder of this section, we formalize these standard notions.

2.2.1 Probability measure

Given a scheduler \(\sigma \in \text {GM}^{}\), a probability measure \(\text {Pr}^{\mathcal {M}}_{\sigma }\) on measurable sets of infinite paths is defined, which generalizes both the standard probability measure on MDPs and on CTMCs. We briefly sketch the definition of \(\text {Pr}^{\mathcal {M}}_{\sigma }\). More information can be found in, e.g., [35, 40].

We first consider the probability measure \(\text {Pr}^{ Steps }_{\sigma , \pi }\) for transition steps. We let \(\mathfrak {B}(\mathbb {R}_{\ge 0})\) denote the Borel \(\varvec{\sigma }\)-algebra on \(\mathbb {R}_{\ge 0}\) and define the \(\varvec{\sigma }\)-algebra for transition steps \(\mathfrak {F}^ Steps = \varvec{\sigma }(\mathfrak {B}(\mathbb {R}_{\ge 0}) \times 2^ Act \times 2^S)\) using the Cartesian product of \(\varvec{\sigma }\)-algebras [1]. For a finite path \(\pi \in { FPaths ^{}}\) with \(s= last (\pi )\) and a measurable set of transition steps \(T \in \mathfrak {F}^ Steps \), the probability of \(T \subseteq \mathbb {R}_{\ge 0}\times Act \times S\) is

$$\begin{aligned} \text {Pr}^{ Steps }_{\sigma , \pi }(T) = {\left\{ \begin{array}{ll} \displaystyle \sum _{\alpha \in Act } \sum _{s' \in S} [(0, \alpha , s') \in T] \cdot \sigma (\pi ,\alpha ) \cdot \mathbf{P} (s, \alpha , s') &{} \text {if } s\in \text {PS}, \\ \displaystyle \sum _{s' \in S} \int _{0}^{\infty } [(t, \bot , s') \in T] \cdot {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \cdot \mathbf{P} (s, \bot , s') \,\text {d}t&{} \text {if } s\in \text {MS}, \end{array}\right. } \end{aligned}$$

where \([\cdot ]\) denotes the Iverson bracket.

Next, we define the probability measure \(\text {Pr}^{n}_{\sigma }\) for \(n \in \mathbb {N}\) transition steps over the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^{n}\), where \(\mathfrak {F}^{0} = 2^S\) and \(\mathfrak {F}^{n} = \varvec{\sigma }(\mathfrak {F}^{n-1} \times \mathfrak {F}^ Steps )\) for \(n > 0\). For \(S' \subseteq S\) we set \(\text {Pr}^{0}_{\sigma }(S') = 1\) if \({s_{0}}\in S'\) and \(\text {Pr}^{0}_{\sigma }(S') = 0\) otherwise. Moreover, given \(n > 0\), \(\varPi \in \mathfrak {F}^{n-1}\) and \(T \in \mathfrak {F}^ Steps \) we set \(\varPi \circ T= \{ (\pi ,\,(t, \alpha , s')) \mid \pi \in \varPi , (t, \alpha , s') \in T \}\) and

$$\begin{aligned} \text {Pr}^{n}_{\sigma }(\varPi \circ T) = \int _{\pi \in \varPi \cap { FPaths ^{}}} \text {Pr}^{ Steps }_{\sigma , \pi }(T) \,\text {d}\text {Pr}^{n-1}_{\sigma }(\{\pi \})\ . \end{aligned}$$

Following standard constructions (e.g., [1, 40]), we obtain \(\text {Pr}^{\mathcal {M}}_{\sigma }\) by lifting \(\text {Pr}^{n}_{\sigma }\) first to measurable sets of finite paths given by the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^* = \bigcup _{n=0}^\infty \mathfrak {F}^n\) and finally to measurable sets of infinite paths given by the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^\omega \). The latter is the smallest \(\varvec{\sigma }\)-algebra containing the cylinder set \( Cyl (\varPi )\) of all measurable \(\varPi \in \mathfrak {F}^*\), where

$$\begin{aligned} Cyl (\varPi ) = \{ \pi \xrightarrow {\kappa _{n}} s_{n+1} \xrightarrow {\kappa _{n+1}} \dots \in { IPaths ^{\mathcal {M}}} \mid \pi \in \varPi \}. \end{aligned}$$

Let \(\varPi \in \mathfrak {F}^*\) and \(\pi \in IPaths\). For simplicity, we may write \(\text {Pr}^{\mathcal {M}}_{\sigma }(\varPi )\) instead of \(\text {Pr}^{\mathcal {M}}_{\sigma }( Cyl (\varPi ))\) and \(\text {Pr}^{\mathcal {M}}_{\sigma }(\pi )\) instead of \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \})\). Note that \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \}) = 0 \) if \(\pi \) contains one or more Markovian transitions.

For some \(\varLambda \in \mathfrak {F}^\omega \) with \(\text {Pr}^{\mathcal {M}}_{\sigma }(\varLambda ) > 0\) we consider the conditional probability measure \(\text {Pr}^{\mathcal {M}}_{\sigma }( \cdot \mid \varLambda ) :\mathfrak {F}^\omega \rightarrow [0,1]\), where for \(\varPi \in \mathfrak {F}^\omega \):

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }( \varPi \mid \varLambda ) = \frac{\text {Pr}^{\mathcal {M}}_{\sigma }( \varPi \cap \varLambda )}{\text {Pr}^{\mathcal {M}}_{\sigma }(\varLambda )}. \end{aligned}$$

2.2.2 Reachability objectives

\(I\subseteq \mathbb {R}\) is a time interval if it is of the form \(I= [a, b]\) or \(I= [a,\infty )\), where \(0 \le a< b\). The set of paths reaching a set of goal states \(G\subseteq S\) in time \(I\) is defined as

$$\begin{aligned} \lozenge ^{I} G=&\{ \pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{}} \mid \exists n \ge 0 :{\pi [n]} \in G\text { and}\\&I\cap [t, t+t(\kappa _{n})] \ne \emptyset \text { for } t= T ( pref (\pi , n)) \}. \end{aligned}$$

The set \(\lozenge ^{I} G\) is measurable as it can be expressed via countable unions of measurable cylinder sets. We write \(\lozenge ^{}G\) instead of \(\lozenge ^{[0,\infty )} G\).

We formulate reachability objectives as follows:

Definition 6

(Reachability objective) A reachability objective has the form \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\) for time interval \(I\) and goal states \(G\). The objective is timed if \(I\ne [0,\infty )\) and untimed otherwise. For MA \(\mathcal {M}\) and scheduler \(\sigma \in \text {GM}^{}\), let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})} \vartriangleright _{i} p_{i}\quad \text {iff}\quad \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \vartriangleright _{i}p_{i}. \end{aligned}$$

2.2.3 Expected reward objectives

Expected rewards define the expected amount of reward collected (w.r.t. some \(\rho _{j}\)) until a goal state in \(G\subseteq S\) is reached, generalizing the notion on CTMCs and MDPs. More precisely, we fix a reward function \(\rho _{}\) of the MA \(\mathcal {M}\). The reward of a finite path \(\pi ' = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\in { FPaths ^{}}\) is given by

$$\begin{aligned} rew _{\rho _{}}(\pi ') = \sum _{i=0}^{n-1} \Big ( \rho _{}(s_i) \cdot t(\kappa _{i}) + \rho _{}\big (s_i, \alpha (\kappa _{i})\big ) \Big ). \end{aligned}$$

Intuitively, \( rew _{\rho _{}}(\pi ')\) is the sum over the rewards obtained in every step \(s_{i} \xrightarrow {\kappa _{i}} s_{i+1}\) in the path \(\pi '\). The reward obtained in step \(i\) is composed of the state reward of \(s_i\) multiplied with the sojourn time \(t(\kappa _{i})\) plus the action reward given by \(s_i\) and \(\alpha (\kappa _{i})\). State rewards assigned to probabilistic states do not affect the reward of a path as the sojourn time in such states is zero.
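For illustration, the following sketch (ours, reusing the ad hoc list-based path encoding from the earlier sketches; all identifiers are hypothetical) evaluates \( rew _{\rho _{}}(\pi ')\) for a finite path exactly as described above.

```python
BOT = "bot"

def path_reward(path, state_reward, action_reward):
    """rew_rho(pi') for a finite path [s0, k0, s1, ..., k_{n-1}, sn]."""
    total = 0.0
    for i in range(0, len(path) - 1, 2):
        s, kappa = path[i], path[i + 1]
        sojourn = 0.0 if isinstance(kappa, str) else float(kappa)   # t(kappa)
        a = kappa if isinstance(kappa, str) else BOT                # alpha(kappa)
        total += state_reward.get(s, 0.0) * sojourn + action_reward.get((s, a), 0.0)
    return total

# Hypothetical: state reward 1 on s0, no action rewards; sojourn 0.8 in s0, then action "alpha".
print(path_reward(["s0", 0.8, "s3", "alpha", "s6"], {"s0": 1.0}, {}))  # 0.8
```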

For an infinite path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{}}\), the reward of \(\pi \) up to a set of goal states \(G\subseteq S\) is given by

$$\begin{aligned} rew _{\rho _{}}(\pi , G_{}) = {\left\{ \begin{array}{ll} rew _{\rho _{}}( pref (\pi , n)) &{} \text {if } n = \min \{i \ge 0 \mid {\pi [i]} \in G\} \text { exists,} \\ \displaystyle \lim _{n\rightarrow \infty } rew _{\rho _{}}( pref (\pi , n)) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Intuitively, we stop collecting reward as soon as \(\pi \) reaches a state in \(G\). If no state in \(G\) is reached, reward is accumulated along the infinite path, which potentially yields an infinite reward. The function \(\pi \mapsto rew _{\rho _{}}(\pi , G_{})\) is measurable. Its expected value with respect to scheduler \(\sigma \in \text {GM}^{}\) is called the expected reward \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\), i.e.,

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}) = \int _{\pi \in { IPaths ^{}}} rew _{\rho _{}}(\pi , G_{}) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ). \end{aligned}$$

With this definition, we formulate expected reward objectives as follows:

Definition 7

(Expected reward objective) An expected reward objective has the form \({\mathbb {E}({\# j_{}, G_{}})}\) where \(j\) is the index of reward function \(\rho _{j}\) and \(G\subseteq S\). For MA \(\mathcal {M}\) and scheduler \(\sigma \in \text {GM}^{}\), let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {E}({\# j_{}, G_{}})} \vartriangleright _{i} p_{i}\quad \text {iff}\quad \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) \vartriangleright _{i} p_{i}. \end{aligned}$$

Expected time objectives \({\mathbb {E}({ T , G_{}})}\) are expected reward objectives that consider the reward function \(\rho _{ T }\) with \(\rho _{ T }(s) = 1\) if \(s\in \text {MS}\) and all other rewards are zero.
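The reward function \(\rho _{ T }\) is trivial to construct explicitly; a one-line sketch over the dictionary representation used in the earlier illustrations (hypothetical identifiers):

```python
def expected_time_reward(markovian_states, all_states):
    """rho_T: state reward 1 on Markovian states, 0 elsewhere (and no action rewards)."""
    return {s: (1.0 if s in markovian_states else 0.0) for s in all_states}

print(expected_time_reward({"s0", "s2"}, ["s0", "s1", "s2"]))
# {'s0': 1.0, 's1': 0.0, 's2': 1.0}
```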

3 Multi-objective model checking

Standard model checking considers objectives individually. This approach is not feasible when we are interested in multiple objectives that should be fulfilled by the same scheduler, e.g., a scheduler that maximizes the expected profit might violate certain safety constraints. Multi-objective model checking aims to analyze multiple objectives at once and reveals potential trade-offs.

Definition 8

(Satisfaction of multiple objectives) Let \(\mathcal {M}\) be an MA and \(\sigma \in \text {GM}^{}\). For objectives \({\mathbb {O}_{}} = ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\in \{<, \le , >, \ge \}^d\) and threshold values \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\iff \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \text { for all } 1 \le i \le d. \end{aligned}$$

Furthermore, let \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff \exists \sigma \in \text {GM}^{}\) such that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\).

If \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\), the point \(\mathbf{p }\in \mathbb {R}^d\) is achievable in \(\mathcal {M}\) with scheduler \(\sigma \). The set of achievable points of \(\mathcal {M}\) w.r.t. \({\mathbb {O}_{}}\) and \(\vartriangleright _{}\) is \( \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}. \) We straightforwardly also apply these definitions to MDPs. This is compatible with the notions on multi-objective MDPs as given in [27, 28].

Example 5

Figure 3b depicts the set of achievable points of the MA \(\mathcal {M}\) from Fig. 3a w.r.t. relations \({\vartriangleright _{}}=(\ge ,\ge )\) and objectives \(({\mathbb {P}({\lozenge ^{}\{s_2\}})},{\mathbb {P}({\lozenge ^{}\{s_4\}})})\). Likewise, Fig. 3c depicts the achievable points for \(\mathcal {M}\), \({\vartriangleright _{}}\), and \(({\mathbb {P}({\lozenge ^{}\{s_2\}})},{\mathbb {P}({\lozenge ^{[0,2]}{\{s_4\}}})})\). Using the set of achievable points, we can answer Pareto, numerical, and achievability queries as considered in [28], e.g., the Pareto front lies on the border of the set. We detail this computation in Sect. 5.

Schedulers For single-objective model checking on MAs, it suffices to consider deterministic schedulers [41]. For untimed reachability and expected rewards even time-abstract deterministic schedulers suffice [41]. Multi-objective model checking on MDPs requires history-dependent, randomized schedulers [27]. The following theorem and its proof show that schedulers on MA may also employ timing information to make optimal choices, even if only untimed objectives are considered. However, we show in Sect. 4 that for untimed objectives, such schedulers can always be converted to time-abstract schedulers (potentially considering the history and randomization). Put differently, timing information can be employed to achieve untimed objectives but it is not necessary to do so.

Fig. 3 Markov automaton and achievable points

Theorem 1

For some MA \(\mathcal {M}\) with \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), no deterministic time-abstract scheduler \(\sigma \) satisfies \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\).

Proof

Consider the MA \(\mathcal {M}\) in Fig. 3a with relations \({\vartriangleright _{}} = (\ge , \ge )\), objectives \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}\{s_2\}})}, {\mathbb {P}({\lozenge ^{}\{s_4\}})})\), and point \(\mathbf{p }=(0.5,0.5)\). We have \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\): A simple graph argument yields that both properties are only satisfied if action \(\alpha \) is taken with probability exactly one half. As the probability mass to stay in \(s_0\) for at most \(\ln (2)\) time units is exactly 0.5, a timed scheduler \(\sigma \) with \(\sigma (s_0 \xrightarrow {t} s_1,\alpha )=1\) if \(t\le \ln (2)\) and 0 otherwise does satisfy both objectives.
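To see why \(\ln (2)\) is the right threshold, note the following small calculation (it presupposes that the exit rate of \(s_0\) in Fig. 3a is \(1\), which is exactly what the stated probability 0.5 requires): the probability that the sojourn time in \(s_0\) is at most \(\ln (2)\), and hence that \(\sigma \) picks \(\alpha \), is

$$\begin{aligned} 1 - e^{-{\text {E}(s_0)} \cdot \ln (2)} = 1 - 2^{-{\text {E}(s_0)}} = 0.5 \quad \text {for } {\text {E}(s_0)} = 1. \end{aligned}$$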

There are only two deterministic time-abstract schedulers for \(\mathcal {M}\):

$$\begin{aligned} \sigma _\alpha :\text { always choose } \alpha \qquad \text {and} \qquad \sigma _\beta :\text { always choose } \beta \end{aligned}$$

and it holds that \(\mathcal {M}, \sigma _\alpha \not \models {\mathbb {P}({\lozenge ^{}\{s_4\}})} \ge 0.5\) and \(\mathcal {M},\sigma _\beta \not \models {\mathbb {P}({\lozenge ^{}\{s_2\}})} \ge 0.5\). \(\square \)

The geometric shape of the achievable points As for MDPs [27], the set of achievable points for any combination of the aforementioned objectives is convex.

Proposition 1

The set \(\{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) is convex.

Proof

Let \(\mathcal {M}\) be an MA and let \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) be objectives with relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\) and points \(\mathbf{p }_1, \mathbf{p }_2 \in \mathbb {R}^d\) such that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_1)\) and \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_2)\) hold. To show that the set of achievable points is convex, we need to show that for \(w \in [0,1]\) the point \(\mathbf{p }= w \cdot \mathbf{p }_1 + (1-w) \cdot \mathbf{p }_2\) is achievable as well, i.e., \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\). For \(i \in \{1,2\}\), let \(\sigma _i \in \text {GM}^{}\) be a scheduler satisfying \(\mathcal {M}, \sigma _i \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_i\). The point \(\mathbf{p }\) is achievable with the scheduler \(\sigma \) that intuitively makes an initial one-off random choice:

  • with probability w mimic \(\sigma _1\) and

  • with probability \(1-w\) mimic \(\sigma _2\).

Specifying such a scheduler as a function \(\sigma ^w :{ FPaths ^{}}\times Act \rightarrow [0,1]\) as in Def. 5 is technically involved because \(\sigma ^w\) cannot memorize the outcome of the initial one-off random choice. For a path \(\pi = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\in { FPaths ^{}}\) and \(\alpha \in Act \), the value \(\sigma ^w(\pi , \alpha )\) needs to depend on the previously chosen actions \(\alpha (\kappa _{j})\) in \(\pi \) and the probability that these choices adhere to \(\sigma _1\) and \(\sigma _2\), respectively. Let \(w_1 = w\) and \(w_2 = 1-w\). We set

$$\begin{aligned} \sigma ^w(\pi , \alpha ) = \frac{\sum _{i=1}^2 \left( w_i \cdot \sigma _i(\pi , \alpha ) \cdot \prod _{j=0}^{n-1} \sigma _i( pref (\pi , j), \alpha (\kappa _{j})) \right) }{\sum _{i=1}^2 \left( w_i \cdot \prod _{j=0}^{n-1} \sigma _i( pref (\pi , j), \alpha (\kappa _{j})) \right) } \end{aligned}$$

In App. A we show that for any measurable \(\varPi \subseteq { IPaths ^{}}\) we have \(\text {Pr}^{\mathcal {M}}_{\sigma ^w}(\varPi ) = w_1 \cdot \text {Pr}^{\mathcal {M}}_{\sigma _1}(\varPi ) + w_2 \cdot \text {Pr}^{\mathcal {M}}_{\sigma _2}(\varPi )\). Lifting this observation to expected rewards is straightforward. It follows that \(\sigma ^w\) achieves the point \(\mathbf{p }= w_1 \cdot \mathbf{p }_1 + w_2 \cdot \mathbf{p }_2\). \(\square \)
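The construction of \(\sigma ^w\) is mechanical once schedulers are viewed as functions. The following sketch (ours, with the ad hoc list-based path encoding from Sect. 2.1; it is not part of the formal development) implements the formula above and illustrates that, once the observed history reveals which scheduler has been mimicked, \(\sigma ^w\) keeps mimicking it.

```python
BOT = "bot"

def prefix(path, j):
    """pref(pi, j): the first j steps of a flat path [s0, k0, s1, ..., sn]."""
    return path[:2 * j + 1]

def act(kappa):
    """alpha(kappa): the stamp itself if it is an action label, bot for sojourn times."""
    return kappa if isinstance(kappa, str) else BOT

def make_sigma_w(sigma1, sigma2, w):
    """The convex combination sigma^w of two schedulers, following the formula above."""
    weights = ((w, sigma1), (1.0 - w, sigma2))

    def sigma_w(path, a):
        n = len(path) // 2           # number of steps of the observed path
        num = den = 0.0
        for w_i, sigma_i in weights:
            # probability that the actions observed along `path` adhere to sigma_i
            adherence = 1.0
            for j in range(n):
                adherence *= sigma_i(prefix(path, j), act(path[2 * j + 1]))
            num += w_i * adherence * sigma_i(path, a)
            den += w_i * adherence
        return num / den if den > 0.0 else 0.0

    return sigma_w

def always(a0):
    """Deterministic scheduler picking a0 at probabilistic states (and bot elsewhere)."""
    return lambda path, a: 1.0 if a in (a0, BOT) else 0.0

sigma_w = make_sigma_w(always("alpha"), always("beta"), 0.5)
print(sigma_w(["s1"], "alpha"))                 # 0.5: the initial one-off coin flip
print(sigma_w(["s1", "alpha", "s1"], "alpha"))  # 1.0: the history reveals sigma_1 was mimicked
```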

For MDPs, the set of achievable points is a convex polytope where the vertices can be realized by deterministic schedulers that use memory bounded by \(2^d\), where \(d\) is the number of objectives. As there are finitely many such schedulers, the polytope has a finite \(\mathcal {V}\)-representation [27], i.e., it can be represented by a finite number of vertices. This result does not carry over to MAs.

Theorem 2

For some MA \(\mathcal {M}\) and objectives \({\mathbb {O}_{}}\), the set \(\{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) does not have a finite \(\mathcal {V}\)-representation.

Proof

We show that the claim holds for the MA \(\mathcal {M}\) in Fig. 3a with objectives \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}\{s_2\}})}, {\mathbb {P}({\lozenge ^{[0,2]}{\{s_4\}}})})\) and relations \({\vartriangleright _{}} = (\ge , \ge )\). The insight here is that for any sojourn time \(t \le 2\) in \(s_0\), the timing information is relevant for optimal schedulers: The shorter the sojourn time in \(s_0\), the higher the probability to reach \(s_4\) within the time bound. For the sake of contradiction, assume that the set \(A= \{\mathbf{p }\in \mathbb {R}^2 \mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) has a finite \(\mathcal {V}\)-representation. Then, there must be two distinct vertices \(\mathbf{p }_1,\mathbf{p }_2\) of \(A\) such that \(\{w \cdot \mathbf{p }_1 + (1-w) \cdot \mathbf{p }_2 \mid w \in [0,1]\}\) is a face of \(A\). In particular, this means that \(\mathbf{p }= 0.5 \cdot \mathbf{p }_1 + 0.5 \cdot \mathbf{p }_2\) is achievable but \(\mathbf{p }_\varepsilon = \mathbf{p }+ (0,\varepsilon )\) is not achievable for all \(\varepsilon >0\). We show that there is in fact an \(\varepsilon \) for which \(\mathbf{p }_\varepsilon \) is achievable, contradicting our assumption that \(A\) has a finite \(\mathcal {V}\)-representation.

For \(i \in \{1,2\}\), let \(\sigma _i \in \text {GM}^{}\) be a scheduler satisfying \(\mathcal {M}, \sigma _i \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_i\). We must have \(\sigma _1 \ne \sigma _2\), as the two schedulers achieve different vertices of \(A\). The point \(\mathbf{p }\) is achievable with the randomized scheduler \(\sigma \) that mimics \(\sigma _1\) with probability 0.5 and mimics \(\sigma _2\) otherwise. Consider \(t= -\ln (\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_2\}))\) and the deterministic scheduler \(\sigma '\) given by

$$\begin{aligned} \sigma '(s_0 \xrightarrow {t_0} s_1,\alpha ) = {\left\{ \begin{array}{ll} 1 &{} \text {if } t_0 > t\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

\(\sigma '\) satisfies \(\text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{}\{s_2\}) = e^{-t} = \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_2\})\). Moreover, we have

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{[0,t]} \{s_3\}) = \text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{}\{s_3\})= \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_3\}) > \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,t]} \{s_3\}), \end{aligned}$$

where the last inequality is due to \(\sigma \ne \sigma '\). While the probability to reach \(s_3\) is equal under both schedulers, \(s_3\) is reached earlier when \(\sigma '\) is considered. This increases the probability to reach \(s_4\) in time, i.e., \(\text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{[0,2]} \{s_4\}) > \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,2]} \{s_4\})\). It follows that \(\mathcal {M}, \sigma ' \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_\varepsilon \) for some \(\varepsilon >0\). \(\square \)

Since the set of achievable points does not in general have a finite \(\mathcal {V}\)-representation, i.e., it cannot always be represented by a finite number of vertices, any method extending the approach of [28]—which computes these vertices—can only approximate the set of achievable points.


4 Analysis of Markov automata with multiple objectives

The state-of-the-art in single-objective model checking of MA is to reduce the MA to an MDP, cf. [30, 31, 35], for which efficient algorithms exist. We aim to lift this approach to multi-objective model checking. Assume MA \(\mathcal {M}\) and objectives \({\mathbb {O}_{}}\) with threshold relations \(\vartriangleright _{}\). We discuss how the set of achievable points of \(\mathcal {M}\) relates to the set of achievable points of an MDP. The key challenge is to deal with timing information—even for untimed objectives—and to consider schedulers beyond those optimizing single objectives. We obtain:

  • For untimed reachability and expected reward objectives, the achievable points of \(\mathcal {M}\) equal those of its underlying MDP, cf. Theorems 3 and 4.

  • For timed reachability objectives, the set of achievable points of a digitized MDP \({\mathcal {M}_{\delta }}\) provides a sound approximation of the achievable points of \(\mathcal {M}\), cf. Theorem 5. Corollary 1 gives the precision of the approximation.

4.1 Untimed reachability objectives

Although timing information is essential for deterministic schedulers, cf. Theorem 1, timing information does not strengthen randomized schedulers:

Theorem 3

For MA \(\mathcal {M}\) and untimed reachability objectives \({\mathbb {O}_{}}\) it holds that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }). \)

The main idea for proving Theorem 3 is to construct for scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) a time-abstract scheduler \({\text {ta}(\sigma )}\in \text {TA}^{{\mathcal {M}_\mathcal {D}}}\) such that they both induce the same untimed reachability probabilities. To this end, we discuss the connection between probabilities of paths of MA \(\mathcal {M}\) and paths of MDP \({\mathcal {M}_\mathcal {D}}\).

Definition 9

(Induced paths of a time-abstract path) The set of induced paths on MA \(\mathcal {M}\) of a path \(\hat{\pi }\) of \({\mathcal {M}_\mathcal {D}}\) is given by

$$\begin{aligned} \langle {\hat{\pi }} \rangle = \text {ta}^{-1}(\hat{\pi }) = \{\pi \in { FPaths ^{\mathcal {M}}} \cup { IPaths ^{\mathcal {M}}} \mid \text {ta}(\pi ) = \hat{\pi }\}. \end{aligned}$$

The set \(\langle {\hat{\pi }} \rangle \) contains all paths of \(\mathcal {M}\) where replacing sojourn times by \(\bot \) yields \(\hat{\pi }\).

For \(\sigma \in \text {GM}^{}\), the probability distribution \(\sigma (\pi ,\cdot ) \in \textit{Dist}( Act )\) might depend on the sojourn times of the path \(\pi \). The time-abstract scheduler \({\text {ta}(\sigma )}\) weights the distribution \(\sigma (\pi ,\cdot )\) with the probability masses of the paths \(\pi \in \langle {\hat{\pi }} \rangle \).

Definition 10

(Time-abstraction of a scheduler) The time-abstraction of \(\sigma \in \text {GM}^{\mathcal {M}}\) is defined as \({\text {ta}(\sigma )}\in \text {TA}^{{\mathcal {M}_\mathcal {D}}}\) such that for any \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\)

$$\begin{aligned} {\text {ta}(\sigma )}(\hat{\pi },\alpha ) = {\left\{ \begin{array}{ll} \displaystyle \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma } ( \pi \mid \langle {\hat{\pi }} \rangle ) &{} \text {if } \text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) > 0\\ \displaystyle 1\,/\, | Act ( last (\hat{\pi }))| &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Intuitively, the term \(\text {Pr}^{\mathcal {M}}_{\sigma } ( \pi \mid \langle {\hat{\pi }} \rangle )\) represents the probability for a path in \(\langle {\hat{\pi }} \rangle \) to have sojourn times as given by \(\pi \). The value \({\text {ta}(\sigma )}(\hat{\pi },\alpha )\) coincides with the probability that \(\sigma \) picks action \(\alpha \), given that the time-abstract path \(\hat{\pi }\) was observed. If \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) = 0\), the value for \({\text {ta}(\sigma )}(\hat{\pi },\alpha )\) is arbitrary. For simplicity, we picked a uniform choice.

Example 6

Consider the MA \(\mathcal {M}\) in Fig. 2a and the scheduler \(\sigma \) choosing \(\alpha \) at state \(s_3\) iff the sojourn time at \(s_0\) is at most one. Then \({\text {ta}(\sigma )}(s_0 \xrightarrow {\bot } s_3,\alpha ) = 1-e^{-{\text {E}(s_0)}}\), the probability that \(s_0\) is left within one time unit. For \(\bar{\pi }= s_0 \xrightarrow {\bot } s_3 \xrightarrow {\alpha } s_6\) we have

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_6\}) = \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\bar{\pi }} \rangle ) = 1-e^{-{\text {E}(s_0)}} = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\bar{\pi }) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}\{s_6\}). \end{aligned}$$

In the example, the considered scheduler and its time-abstraction induce the same untimed reachability probabilities. We generalize this observation.

Lemma 1

For any \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) we have \( \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }). \)

Proof

The proof is by induction over the length of the considered path \(|\hat{\pi }| = n\). Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) and \({\mathcal {M}_\mathcal {D}}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}))\). If \(n=0\), then \(\{\hat{\pi }\} = \langle {\hat{\pi }} \rangle = \{{s_{0}}\}\). Hence, \(\text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) = 1 = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi })\). In the induction step, we assume that the lemma holds for a fixed path \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) with length \(|\hat{\pi }|=n\) and \( last (\hat{\pi }) = s\). Consider the path \(\hat{\pi }\xrightarrow {\alpha _{}} s' \in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\). If \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) = 0\), then \(\text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\alpha _{}} s') = 0\) because \( Cyl (\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle ) \subseteq Cyl (\langle {\hat{\pi }} \rangle )\) and \( Cyl (\{\hat{\pi }\xrightarrow {\alpha _{}} s'\}) \subseteq Cyl (\{\hat{\pi }\})\).

Now assume \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) > 0\).

Case \(s\in \text {PS}\): It follows that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle )&= \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \cdot \mathbf{P} (s, \alpha , s') \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \} \cap \langle {\hat{\pi }} \rangle ) \\&= \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\big [\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid \langle {\hat{\pi }} \rangle ) \cdot \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle )\big ] \\&= \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \cdot \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid \langle {\hat{\pi }} \rangle ) \\&= \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \cdot \mathbf{P} (s, \alpha , s') \cdot {\text {ta}(\sigma )}(\hat{\pi },\alpha ) \\&\overset{{\textit{IH}}}{=}\text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) \cdot \mathbf{P} (s, \alpha , s') \cdot {\text {ta}(\sigma )}(\hat{\pi },\alpha ) \\&= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\alpha _{}} s'). \end{aligned}$$

Case \(s\in \text {MS}\): As \(s \in \text {MS}\) we have \(\alpha = \bot \) and it follows

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\bot } s'} \rangle )&= \int _{\pi \in \langle {\hat{\pi }} \rangle } \int _{0}^{\infty } {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \cdot \mathbf{P} (s, \bot , s') \,\text {d}t\,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \bot , s')\cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \int _{0}^{\infty } {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \,\text {d}t\,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \bot , s')\cdot \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \\&\overset{{\textit{IH}}}{=}\mathbf{P} (s, \bot , s')\cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) \\&= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\bot } s'). \end{aligned}$$

\(\square \)

The result is lifted to untimed reachability probabilities.

Proposition 2

For any \(G\subseteq S\) it holds that \( \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{})= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}). \)

Proof

Let \(\varPi \) be the set of finite time-abstract paths of \({\mathcal {M}_\mathcal {D}}\) that end at the first visit of a state in \(G\), i.e.,

$$\begin{aligned} \varPi = \{s_{0} \xrightarrow {\alpha _{0}} \dots \xrightarrow {\alpha _{n-1}} s_{n}\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}} \mid s_n \in G\text { and } \forall i < n :s_i \notin G\}. \end{aligned}$$

Every path \(\pi \in \lozenge ^{}G\subseteq { IPaths ^{\mathcal {M}}}\) has a unique prefix \(\pi '\) with \(\text {ta}(\pi ') \in \varPi \). We have

$$\begin{aligned} \lozenge ^{}G= \biguplus _{\hat{\pi }\in \varPi } Cyl (\langle {\hat{\pi }} \rangle ) \qquad \text {and, on } {\mathcal {M}_\mathcal {D}}, \qquad \lozenge ^{}G= \biguplus _{\hat{\pi }\in \varPi } Cyl (\{\hat{\pi }\}). \end{aligned}$$

The claim follows with Lemma 1 since

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{})= \sum _{\hat{\pi }\in \varPi } \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \overset{\text {Lem.}\,1}{=}\sum _{\hat{\pi }\in \varPi } \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}). \end{aligned}$$

\(\square \)

As the definition of \({\text {ta}(\sigma )}\) is independent of the considered set of goal states \(G\subseteq S\), Proposition 2 can be lifted to multiple untimed reachability objectives.

Proof

(Theorem 3) Let \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}G_{1}})}, \dots , {\mathbb {P}({\lozenge ^{}G_{d}})})\) be the considered list of objectives with threshold relations \({\vartriangleright _{}}= (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). The following equivalences hold for any \(\sigma \in \text {GM}^{\mathcal {M}}\) and \(\mathbf{p }\in \mathbb {R}^d\).

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ \,&\Longleftrightarrow \ \,\forall i :\mathcal {M}, \sigma \models {\mathbb {P}({\lozenge ^{}G_{i}})} \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \,\forall i :\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{i}) \vartriangleright _{i} p_{i} \\&\overset{{\text {Prop.}\,2}}{\Longleftrightarrow }\ \, \forall i :\text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{i}) \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \,\forall i :{\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {P}({\lozenge ^{}G_{i}})} \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \, {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ . \end{aligned}$$

Assume that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds, i.e., there is a \(\sigma \in \text {GM}^{\mathcal {M}}\) such that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\). It follows that \({\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) which means that \( achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds as well. For the other direction assume \( achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), i.e., \({\mathcal {M}_\mathcal {D}}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) for some time-abstract scheduler \(\sigma \in \text {TA}^{}\). We have \({\text {ta}(\sigma )}=\sigma \). It follows that \({\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\). Applying the equivalences above yields \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) and thus \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\). \(\square \)

4.2 Expected reward objectives

The results for expected reward objectives are similar to untimed reachability objectives: An analysis of the underlying MDP suffices. We show the following extension of Theorem 3 to expected reward objectives.

Theorem 4

For MA \(\mathcal {M}\) and untimed reachability and expected reward objectives \({\mathbb {O}_{}}\): \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }). \)

To prove this, we show that a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) and its time-abstraction \({\text {ta}(\sigma )}\in \text {TA}^{}\) induce the same expected rewards on \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\), respectively. Theorem 4 then follows analogously to Theorem 3.

Proposition 3

Let \(\rho _{}\) be some reward function of \(\mathcal {M}\) and let \(\rho _{}^{\mathcal {D}}\) be its counterpart for \({\mathcal {M}_\mathcal {D}}\). For \(G\subseteq S\) we have \( \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \)

Notice that \(\rho _{}^{\mathcal {D}}\) encodes the expected reward of \(\mathcal {M}\) obtained in a state \(s\) by assuming the sojourn time to be the expected sojourn time \(\frac{1}{{\text {E}(s)}}\). Although the claim is similar to Proposition 2, its proof cannot be adapted straightforwardly. In particular, the analogue of Lemma 1 does not hold: The expected reward collected along a time-abstract path \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) does not coincide in general for \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\).

Example 7

Let \(\mathcal {M}\) be the MA with underlying MDP \({\mathcal {M}_\mathcal {D}}\) as shown in Fig. 2. Let \(\rho _{}(s_0) = 1\) and let all other rewards be zero. Reconsider the scheduler \(\sigma \) from Example 6. Let \(\hat{\pi }_\alpha = s_0 \xrightarrow {\bot } s_3 \xrightarrow {\alpha } s_6\). The probability \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{ s_0 \xrightarrow {t} s_3 \xrightarrow {\alpha } s_6 \in \langle {\hat{\pi }_\alpha } \rangle \mid t> 1 \})\) is zero since \(\sigma \) chooses \(\beta \) on such paths. For the remaining paths in \(\langle {\hat{\pi }_\alpha } \rangle \), action \(\alpha \) is chosen with probability one. The expected reward in \(\mathcal {M}\) along \(\hat{\pi }_\alpha \) is:

$$\begin{aligned} \int _{\pi \in \langle {\hat{\pi }_\alpha } \rangle } rew _{\rho _{}}(\pi ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) = \int _{0}^{1} t\cdot {\text {E}(s_0)}\cdot e^{-{\text {E}(s_0)} t} \,\text {d}t= \int _{0}^{1} t\cdot e^{-t} \,\text {d}t= 1 - 2e^{-1}. \end{aligned}$$

The expected reward in \({\mathcal {M}_\mathcal {D}}\) along \(\hat{\pi }_\alpha \) differs as

$$\begin{aligned} rew _{\rho _{}^{\mathcal {D}}}(\hat{\pi }_\alpha ) \cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }_\alpha ) = \rho _{}^{\mathcal {D}}(s_0, \bot ) \cdot \big (1 - e^{-1}\big ) = 1 - e^{-1}. \end{aligned}$$

The intuition is as follows: If path \(s_0 \xrightarrow {t} s_3 \xrightarrow {\alpha } s_6\) of \(\mathcal {M}\) under \(\sigma \) occurs, we have \(t\le 1\) since \(\sigma \) chose \(\alpha \). Hence, the reward collected from paths in \(\langle {\hat{\pi }_\alpha } \rangle \) is at most \(1 \cdot \rho _{}(s_0) = 1\). There is thus a dependency between the choice of the scheduler at \(s_3\) and the collected reward at \(s_0\). This dependency is absent in \({\mathcal {M}_\mathcal {D}}\) as the reward at a state is independent of the subsequent performed actions.

Let \(\hat{\pi }_\beta = s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4\). The expected reward along \(\hat{\pi }_\beta \) is \(2e^{-1}\) for \(\mathcal {M}\) and \(e^{-1}\) for \({\mathcal {M}_\mathcal {D}}\). As the rewards for \(\hat{\pi }_\alpha \) and \(\hat{\pi }_\beta \) sum up to one in both \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\), the expected reward along all paths of length two coincides for \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\).

This observation can be generalized to arbitrary MA and paths of arbitrary length. We first formalize the step-bounded expected reward.

Let \(n \ge 0\) and \(G\subseteq S\). The set of time-abstract paths that end after n steps or at the first visit of a state in \(G\) is denoted by

$$\begin{aligned} {\varPi _{G}^{n}}=&\{s_{0} \xrightarrow {\alpha _{0}} \dots \xrightarrow {\alpha _{m-1}} s_m \in { FPaths ^{{\mathcal {M}_\mathcal {D}}}} \mid \ (m=n \text { or } s_m \in G) \text { and } \\&s_i \notin G\text { for all } 0\le i < m\}. \end{aligned}$$

For \(\mathcal {M}\) under \(\sigma \in \text {GM}^{\mathcal {M}}\) and \({\mathcal {M}_\mathcal {D}}\) under \({\text {ta}(\sigma )}\in \text {TA}^{}\), we define the expected reward collected along the paths of \({\varPi _{G}^{n}}\) as

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \sum _{\hat{\pi }\in {\varPi _{G}^{n}}} \int _{\pi \in \langle {\hat{\pi }} \rangle } rew _{\rho _{}}(\pi ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \quad \text {and}\quad \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \sum _{\hat{\pi }\in {\varPi _{G}^{n}}} rew _{\rho _{}^{\mathcal {D}}}(\hat{\pi }) \cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }), \end{aligned}$$

respectively. Intuitively, \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\) corresponds to \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\) assuming that no more reward is collected after the n-th transition. It follows that the value \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\) approaches \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\) for large n. Similarly, \(\mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}})\) approaches \(\mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{})\) for large n. This observation is formalized by the following lemma.

Lemma 2

For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) with \(G\subseteq S\), \(\sigma \in \text {GM}^{}\), and reward function \(\rho _{}\) it holds that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}). \end{aligned}$$

Furthermore, any reward function \(\rho _{}^{\mathcal {D}}\) for \({\mathcal {M}_\mathcal {D}}\) satisfies

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \end{aligned}$$

Proof

(Sketch) Essentially, for each \(n \ge 0\), consider the function \(f_n :{ IPaths ^{\mathcal {M}}} \rightarrow \mathbb {R}_{\ge 0}\) given by

$$\begin{aligned} f_n(\pi ) = {\left\{ \begin{array}{ll} rew _{\rho _{}}( pref (\pi , m)) &{} \text {if } m = \min \{i \le n \mid {\pi [i]} \in G\} \text { exists,} \\ rew _{\rho _{}}( pref (\pi , n)) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

for every path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{\mathcal {M}}}\). Intuitively, \(f_n(\pi )\) is the reward collected on \(\pi \) within the first \(n\) steps and only up to the first visit of \(G\). This allows us to express the expected reward collected along the paths of \({\varPi _{G}^{n}}\) as the expected value of \(f_n\). Noting that the sequence of functions \(f_0, f_1, \dots \) is pointwise non-decreasing and converges to \( rew _{\rho _{}}(\cdot , G_{})\), we may apply the monotone convergence theorem [1] and conclude the proof. A full proof may be found in the appendix. \(\square \)
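The convergence stated in Lemma 2 can also be observed numerically on the MDP side. The following sketch (ours; for brevity it is restricted to memoryless randomized schedulers, whereas \({\text {ta}(\sigma )}\) is in general history-dependent, and all identifiers are hypothetical) computes the \(n\)-step bounded expected reward by a simple backward iteration and shows the values approaching the unbounded expected reward.

```python
def bounded_expected_reward(P, reward, scheduler, goal, s0, n):
    """n-step expected reward until reaching `goal` under a memoryless scheduler.

    P:         (s, a) -> {s': probability}
    reward:    (s, a) -> reward collected when taking a in s
    scheduler: s -> {a: probability}   (memoryless; a simplifying assumption)
    """
    values = {s: 0.0 for s in {s for (s, _) in P}}   # 0-step values are zero
    for _ in range(n):
        new_values = {}
        for s in values:
            if s in goal:
                new_values[s] = 0.0                  # no reward is collected beyond G
                continue
            v = 0.0
            for a, pa in scheduler[s].items():
                v += pa * (reward.get((s, a), 0.0)
                           + sum(p * values[s2] for s2, p in P[(s, a)].items()))
            new_values[s] = v
        values = new_values
    return values[s0]

# Hypothetical 2-state MDP: in s0, action "a" collects reward 1 and reaches the goal g
# with probability 0.5, otherwise we stay in s0.
P = {("s0", "a"): {"g": 0.5, "s0": 0.5}, ("g", "a"): {"g": 1.0}}
rew = {("s0", "a"): 1.0}
sched = {"s0": {"a": 1.0}, "g": {"a": 1.0}}
for n in (1, 2, 5, 20):
    print(n, bounded_expected_reward(P, rew, sched, {"g"}, "s0", n))
# the values 1.0, 1.5, 1.9375, ... increase towards the unbounded expected reward 1/0.5 = 2
```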

The next step is to show that the expected reward collected along the paths of \({\varPi _{G}^{n}}\) coincides for \(\mathcal {M}\) under \(\sigma \) and \({\mathcal {M}_\mathcal {D}}\) under \({\text {ta}(\sigma )}\).

Lemma 3

Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) be an MA and let \(\sigma \in \text {GM}^{}\). Further, let \(\rho _{}\) be a reward function of \(\mathcal {M}\) and let \(\rho _{}^{\mathcal {D}}\) be its counterpart for \({\mathcal {M}_\mathcal {D}}\). For all \(G\subseteq S\) and \(n\ge 0\) it holds that

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) . \end{aligned}$$

Proof

The proof is by induction over the path length n. To simplify the notation, we often omit the reward functions \(\rho _{}\) and \(\rho _{}^{\mathcal {D}}\) and write, e.g., \(\mathrm {ER}^{\mathcal {M}}_{\sigma }( {\varPi _{G}^{n}})\) instead of \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\).

If \(n = 0\), then \({\varPi _{G}^{n}}= \{ {s_{0}}\}\). The claim holds as no reward is collected along the path of length zero: both sides of the equation are zero.

In the induction step, we assume that the lemma holds for some fixed \(n \ge 0\). For paths \(\pi \) with \(|\pi | \le n\) we set \( pref (\pi , n) = \pi \). We split the term \(\mathrm {ER}^{\mathcal {M}}_{\sigma }( {\varPi _{G}^{n+1}})\) into the reward obtained by paths that reach \(G\) within n steps, referred to as term (1), and the reward obtained by paths of length \(n+1\), referred to as term (2). In a second step, term (2) is further split into the reward collected within the first n steps and the reward obtained in the \((n+1)\)-th step.

Detailed reformulations for the terms (1) and (2) are treated separately and discussed in “Appendix B.2”. \(\square \)

We now first show Proposition 3 and then Theorem 4.

Proof

(Proposition 3) The proposition is a direct consequence of Lemma 2 and Lemma 3 as

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})&= \lim _{n\rightarrow \infty } \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\\&= \lim _{n\rightarrow \infty } \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \end{aligned}$$

\(\square \)

Proof

(Theorem 4) Let \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) be the considered list of untimed reachability and expected reward objectives with threshold relations \({\vartriangleright _{}}= (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). The following equivalences hold for any \(\sigma \in \text {GM}^{\mathcal {M}}\) and \(\mathbf{p }\in \mathbb {R}^d\).

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }&\Longleftrightarrow \forall i :\mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \\&\overset{*}{\Longleftrightarrow } \forall i :{\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ , \end{aligned}$$

where for the equivalence marked with \(*\) we consider two cases: If \({\mathbb {O}_{i}}\) is of the form \({\mathbb {P}({\lozenge ^{}G_{}})}\), Proposition 2 yields

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}&\Longleftrightarrow \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{}) \vartriangleright _{i} p_{i}\\&\Longleftrightarrow \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}) \vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \ . \end{aligned}$$

Otherwise, \({\mathbb {O}_{i}}\) is of the form \({\mathbb {E}({\# j_{}, G_{}})}\) and with Proposition 3 it follows that

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}&\Longleftrightarrow \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) \vartriangleright _{i} p_{i}\\&\Longleftrightarrow \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{j}^{\mathcal {D}}, G_{})\vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \ . \end{aligned}$$

The remaining steps of the proof are completely analogous to the proof of Theorem 3 conducted on page 15. \(\square \)

Thus, queries on MA with mixtures of untimed reachability and expected reward objectives can be analyzed on the underlying MDP \({\mathcal {M}_\mathcal {D}}\).

4.3 Timed reachability objectives

Timed reachability objectives cannot be analyzed on \({\mathcal {M}_\mathcal {D}}\) as it abstracts away from sojourn times. We lift the digitization approach for single-objective timed reachability [30, 35] to multiple objectives. Instead of abstracting timing information, it is digitized. In this section, we additionally make the standard assumption (e.g., [14, 30, 35]) that MAs are non-Zeno: MAs with Zeno behavior, where infinitely many actions can be taken within finite time with non-zero probability, are unrealistic and considered a modeling error. MAs that exhibit Zeno behavior can easily be detected using the notion of end components (that we discuss in the next section).

4.3.1 Digitized reachability

The digitization \({\mathcal {M}_{\delta }}\) of \(\mathcal {M}\) w.r.t. some digitization constant \(\delta \in \mathbb {R}_{> 0}\) is an MDP which digitizes the time [30, 35]. The main difference between \({\mathcal {M}_\mathcal {D}}\) and \({\mathcal {M}_{\delta }}\) is that the latter also introduces self-loops which describe the probability to stay in a Markovian state for \(\delta \) time units. More precisely, the outgoing transitions of states \(s\in \text {MS}\) in \({\mathcal {M}_{\delta }}\) represent that either (1) a Markovian transition in \(\mathcal {M}\) was taken within \(\delta \) time units, or (2) no transition is taken within \(\delta \) time units—which is captured by taking the self-loop in \({\mathcal {M}_{\delta }}\). Counting the taken self-loops at \(s\in \text {MS}\) allows to approximate the sojourn time in \(s\).

Definition 11

(Digitization of an MA) For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1},\dots , \rho _{\ell }))\) with transition probabilities \(\mathbf{P} \) and digitization constant \(\delta \in \mathbb {R}_{> 0}\), the digitization of \(\mathcal {M}\) w.r.t. \(\delta \) is the MDP \({\mathcal {M}_{\delta }}= (S, Act , \mathbf{P} _{\delta }, {s_{0}}, (\rho _{1}^\delta , \dots , \rho _{\ell }^\delta ))\), where

$$\begin{aligned} \mathbf{P} _{\delta }(s, \alpha , s') = {\left\{ \begin{array}{ll} \mathbf{P} (s, \bot , s') \cdot (1-e^{-{\text {E}(s)} \delta }) &{} \text {if } s\in \text {MS}, \alpha = \bot , s\ne s'\\ \mathbf{P} (s, \bot , s') \cdot (1-e^{-{\text {E}(s)} \delta }) + e^{-{\text {E}(s)} \delta } &{} \text {if } s\in \text {MS}, \alpha = \bot , s= s'\\ \mathbf{P} (s, \alpha , s') &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

and for each \(i \in \{1, \dots , \ell \}\):

$$\begin{aligned} \rho _{i}^\delta (s, \alpha ) = {\left\{ \begin{array}{ll} \rho _{i}(s, \alpha ) &{} \text {if } s\in \text {PS}\\ \big (\rho _{i}(s, \bot ) + \frac{1}{{\text {E}(s)}} \cdot \rho _{i}(s)\big ) \cdot \big (1-e^{-{\text {E}(s)} \delta }\big ) &{} \text {if } s\in \text {MS}\text { and } \alpha = \bot \\ 0 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
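To make the construction concrete, the following Python sketch builds the transition probabilities and reward functions of \({\mathcal {M}_{\delta }}\) from those of \(\mathcal {M}\). The data layout (dictionaries keyed by states and actions, the label 'bot' for \(\bot \)) is our own choice and not part of the formal development.

```python
import math

def digitize_ma(P, E, rewards, markov_states, delta):
    """Digitization of an MA w.r.t. delta (cf. Definition 11).

    P:             dict (s, a, s') -> transition probability; a == 'bot' at Markovian states
    E:             dict s -> exit rate E(s) of a Markovian state
    rewards:       list of pairs (rho_act, rho_state) with rho_act: dict (s, a) -> reward
                   and rho_state: dict s -> state reward rate
    markov_states: set MS of Markovian states
    delta:         digitization constant > 0
    """
    P_delta = {}
    for (s, a, t), p in P.items():
        if s in markov_states and a == 'bot':
            P_delta[(s, a, t)] = p * (1 - math.exp(-E[s] * delta))   # a jump occurs within delta
        else:
            P_delta[(s, a, t)] = p                                   # probabilistic states unchanged
    for s in markov_states:                                          # self-loop: no jump within delta
        key = (s, 'bot', s)
        P_delta[key] = P_delta.get(key, 0.0) + math.exp(-E[s] * delta)

    state_action_pairs = {(s, a) for (s, a, _) in P}
    rewards_delta = []
    for rho_act, rho_state in rewards:
        rho_d = {}
        for (s, a) in state_action_pairs:
            if s in markov_states and a == 'bot':
                rho_d[(s, a)] = (rho_act.get((s, a), 0.0) + rho_state.get(s, 0.0) / E[s]) \
                                * (1 - math.exp(-E[s] * delta))
            elif s not in markov_states:
                rho_d[(s, a)] = rho_act.get((s, a), 0.0)
            else:
                rho_d[(s, a)] = 0.0
        rewards_delta.append(rho_d)
    return P_delta, rewards_delta
```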

Example 8

Figure 2 on page 4 shows an MA \(\mathcal {M}\) with its underlying MDP \({\mathcal {M}_\mathcal {D}}\) and a digitization \({\mathcal {M}_{\delta }}\) for unspecified \(\delta \in \mathbb {R}_{> 0}\).

A time interval \(I\subseteq \mathbb {R}_{\ge 0}\) of the form \([a,\infty )\) or \([a, b]\) with \( \text {di}_a{:}{=} \frac{a}{\delta } \in \mathbb {N}\) and \(\text {di}_b{:}{=} \frac{b}{\delta } \in \mathbb {N}\) is called well-formed. For the remainder, we only consider well-formed intervals. If the interval boundaries a and b are rationals with \(a= \frac{a_1}{a_2}, b= \frac{b_1}{b_2}\) for integers \(a_1, a_2, b_1, b_2 \in \mathbb {Z}\), well-formedness can always be ensured by setting the digitization constant \(\delta \) to \(\frac{1}{k}\) for some common multiple k of \(a_2\) and \(b_2\).

An interval \(I\) for time bounds is transformed into a set of digitization step bounds \({\text {di}(I)}\subseteq \mathbb {N}\), given by \( {\text {di}(I)}= \{ {m_{}}\in \mathbb {N}\mid {m_{}}\cdot \delta \in I\}\).

We first relate paths in \(\mathcal {M}\) to paths in its digitization.

Definition 12

(Digitization of a path) The digitization \({\text {di}(\pi )}\) of path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) in \(\mathcal {M}\) is the path in \({\mathcal {M}_{\delta }}\) given by

$$\begin{aligned} {\text {di}(\pi )}= \big ( s_0 \xrightarrow {\alpha (\kappa _{0})} \big )^{{m_{0}}} s_0 \xrightarrow {\alpha (\kappa _{0})} \big ( s_1 \xrightarrow {\alpha (\kappa _{1})} \big )^{{m_{1}}} s_1 \xrightarrow {\alpha (\kappa _{1})} \dots \ \end{aligned}$$

where \({m_{i}} = \max \{ {m_{}} \in \mathbb {N}\mid {m_{}} \cdot \delta \le t(\kappa _{i}) \}\) for each \(i\ge 0\).

Example 9

For the path \(\pi = s_0 \xrightarrow {1.1} s_3 \xrightarrow {\beta } s_4 \xrightarrow {\eta } s_5 \xrightarrow {0.3} s_4 \) of the MA \(\mathcal {M}\) in Fig. 2a and \(\delta = 0.4\), we get \( {\text {di}(\pi )}= s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4 \xrightarrow {\eta } s_5 \xrightarrow {\bot } s_4 \).
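The digitization of a path only requires rounding each sojourn time down to a multiple of \(\delta \). A small sketch (with our own path encoding: a list of (source state, label) steps followed by the final state; targets of the inserted self-loops are implicit) reproduces Example 9:

```python
def digitize_path(path, delta):
    """Digitization of a path (cf. Definition 12). Each Markovian transition with sojourn
    time t contributes m = floor(t / delta) self-loop steps plus the actual step; each
    probabilistic transition is kept as is. Returns the digital steps and the final state."""
    *steps, last = path
    digital = []
    for state, label in steps:
        if isinstance(label, (int, float)):   # Markovian transition, label is the sojourn time
            m = int(label // delta)           # m_i = max{m | m * delta <= t}
            digital += [(state, 'bot')] * (m + 1)
        else:                                 # probabilistic transition, label is the action
            digital.append((state, label))
    return digital, last

# Example 9: pi = s0 --1.1--> s3 --beta--> s4 --eta--> s5 --0.3--> s4 with delta = 0.4
pi = [('s0', 1.1), ('s3', 'beta'), ('s4', 'eta'), ('s5', 0.3), 's4']
print(digitize_path(pi, 0.4))
# three 'bot' steps at s0, then beta, eta, and one 'bot' step at s5 -- as in Example 9
```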

The \({m_{i}}\) in the definition above represent a digitization of the sojourn times \(t(\kappa _{i})\) such that \({m_{i}} \delta \le t(\kappa _{i}) < ({m_{i}}{+}1) \delta \). These digitized times are incorporated into the digitization of a path by taking the self-loop at state \(s_i \in \text {MS}\) \({m_{i}}\) times. We also refer to the paths of \({\mathcal {M}_{\delta }}\) as digital paths (of \(\mathcal {M}\)). The number \({|\bar{\pi }|_{\text {ds}}}\) of digitization steps of a digital path \(\bar{\pi }\) is the number of transitions emerging from Markovian states, i.e., \( {|\bar{\pi }|_{\text {ds}}} = |\{ i < |\bar{\pi }| \mid {\bar{\pi }[i]} \in \text {MS}\} |\). One digitization step represents the elapse of at most \(\delta \) time units—either by staying at some \(s\in \text {MS}\) for \(\delta \) time or by leaving \(s\) within \(\delta \) time. The number \({|{\text {di}(\pi )}|_{\text {ds}}}\) multiplied by \(\delta \) yields an estimate for the duration \( T (\pi )\). A digital path \(\bar{\pi }\) can be interpreted as a representation of the set of paths of \(\mathcal {M}\) whose digitization is \(\bar{\pi }\).

Definition 13

(Induced paths of a digital path) The set of induced paths of a (finite or infinite) digital path \(\bar{\pi }\) of \({\mathcal {M}_{\delta }}\) is

$$\begin{aligned}{}[{\bar{\pi }}] = \text {di}^{-1}(\bar{\pi }) = \{ \pi \in { FPaths ^{\mathcal {M}}} \cup { IPaths ^{\mathcal {M}}} \mid {\text {di}(\pi )}= \bar{\pi }\}. \end{aligned}$$

For sets of digital paths \(\varPi \) we define the induced paths \([{\varPi }] = \bigcup _{\bar{\pi }\in \varPi } [{\bar{\pi }}]\). To relate timed reachability probabilities for \(\mathcal {M}\) under a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) with digitization step-bounded reachability probabilities for \({\mathcal {M}_{\delta }}\), we need to relate \(\sigma \) to a scheduler for \({\mathcal {M}_{\delta }}\).

Definition 14

(Digitization of a scheduler) The digitization of \(\sigma \in \text {GM}^{\mathcal {M}}\) is given by \({\text {di}(\sigma )}\in \text {TA}^{{\mathcal {M}_{\delta }}}\) such that for any \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) with \( last (\bar{\pi }) \in \text {PS}\)

$$\begin{aligned} {\text {di}(\sigma )}(\bar{\pi },\alpha ) = {\left\{ \begin{array}{ll} \displaystyle \ \int _{\pi \in [{\bar{\pi }}]} \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid [{\bar{\pi }}]) &{} \text {if } \text {Pr}^{\mathcal {M}}_{\sigma } ([{\bar{\pi }}]) > 0\\ \displaystyle 1\,/\, | Act ( last (\bar{\pi }))| &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

The digitization \({\text {di}(\sigma )}\) is similar to the time-abstraction \({\text {ta}(\sigma )}\) as both schedulers get a path with restricted timing information as input and mimic the choice of \(\sigma \). However, while \({\text {ta}(\sigma )}\) receives no information regarding sojourn times, \({\text {di}(\sigma )}\) receives the digital estimate. Intuitively, \({\text {di}(\sigma )}(\bar{\pi },\alpha )\) considers \(\sigma (\pi ,\alpha )\) for each \(\pi \in [{\bar{\pi }}]\), weighted with the probability that the sojourn times of a path in \([{\bar{\pi }}]\) are as given by \(\pi \). The restriction \( last (\bar{\pi }) \in \text {PS}\) asserts that \(\bar{\pi }\) does not end with a self-loop on a Markovian state, implying \([{\bar{\pi }}] \ne \emptyset \).

Example 10

Consider the MA \(\mathcal {M}\) in Fig. 2a and \(\delta =0.4\). Again, \(\sigma \in \text {GM}^{\mathcal {M}}\) chooses \(\alpha \) at state \(s_3\) iff the sojourn time at \(s_0\) is at most one. Now consider the digital paths \(\bar{\pi }_m = ( s_0 \xrightarrow {\bot } )^{m} s_0 \xrightarrow {\bot } s_3\). For \(\pi \in [{\bar{\pi }_1}] = \{s_0 \xrightarrow {t} s_3 \mid 0.4 \le t< 0.8 \}\) we have \(\sigma (\pi ,\alpha ) = 1\). It follows that \({\text {di}(\sigma )}(\bar{\pi }_1,\alpha ) = 1\). For \(\pi \in [{\bar{\pi }_2}] = \{s_0 \xrightarrow {t} s_3 \mid 0.8 \le t< 1.2 \}\) it is unclear whether \(\sigma \) chooses \(\alpha \) or \(\beta \). Hence, \({\text {di}(\sigma )}\) randomly guesses:

$$\begin{aligned} {\text {di}(\sigma )}(\bar{\pi }_2,\alpha ) = \int _{\pi \in [{\bar{\pi }_2}]} \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid [{\bar{\pi }_2}]) = \frac{ \int _{0.8}^{1.0} {\text {E}(s_0)} e^{-{\text {E}(s_0)} t} \,\text {d}t}{\int _{0.8}^{1.2} {\text {E}(s_0)} e^{-{\text {E}(s_0)} t} \,\text {d}t} \approx 0.55\ . \end{aligned}$$
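The value \(\approx 0.55\) is just a ratio of two exponential integrals. Assuming an exit rate \({\text {E}(s_0)} = 1\) (an assumption about Fig. 2a that is consistent with the reported value), it can be checked numerically:

```python
import math

E = 1.0                                          # assumed exit rate E(s0)
num = math.exp(-E * 0.8) - math.exp(-E * 1.0)    # integral of E*exp(-E*t) over [0.8, 1.0]
den = math.exp(-E * 0.8) - math.exp(-E * 1.2)    # integral of E*exp(-E*t) over [0.8, 1.2]
print(num / den)                                 # ~0.5497: probability that sigma chose alpha
```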

On \({\mathcal {M}_{\delta }}\) we consider \(\text {ds}\)-bounded reachability instead of timed reachability.

Definition 15

(\(\text {ds}\)-bounded reachability) The set of infinite digital paths that reach \(G\subseteq S\) within the interval \(J\subseteq \mathbb {N}\) of consecutive natural numbers is

$$\begin{aligned} \lozenge ^{J}_{\text {ds}} G= \{\bar{\pi }\in { IPaths ^{{\mathcal {M}_{\delta }}}} \mid \ \exists n \ge 0 :{\bar{\pi }[n]} \in G\text { and } {| pref (\bar{\pi }, n)|_{\text {ds}}} \in J\}. \end{aligned}$$

The timed reachability probabilities for \(\mathcal {M}\) are estimated by \(\text {ds}\)-bounded reachability probabilities for \({\mathcal {M}_{\delta }}\). The induced \(\text {ds}\)-bounded reachability probability for \(\mathcal {M}\) (under \(\sigma \)) coincides with the \(\text {ds}\)-bounded reachability probability on \({\mathcal {M}_{\delta }}\) (under \({\text {di}(\sigma )}\)).

Proposition 4

Let \(\mathcal {M}\) be an MA with \(G\subseteq S\), \(\sigma \in \text {GM}^{}\), and digitization \({\mathcal {M}_{\delta }}\). Further, let \(J\subseteq \mathbb {N}\) be a set of consecutive natural numbers. It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{J}_{\text {ds}} G}]) = \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{J}_{\text {ds}} G). \end{aligned}$$

Before we prove this proposition, we need to aggregate paths that differ only slightly in their timing. Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) be an MA and let \({\mathcal {M}_{\delta }}\) be the digitization of \(\mathcal {M}\) with respect to some \(\delta \in \mathbb {R}_{> 0}\). We consider the infinite paths of \(\mathcal {M}\) that are represented by a finite digital path.

Definition 16

(Induced cylinder of a digital path) Given a digital path \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) of MA \(\mathcal {M}\), the induced cylinder of \(\bar{\pi }\) is given by

$$\begin{aligned}{}[{\bar{\pi }}]_{\textit{cyl}} = \{\pi \in { IPaths ^{\mathcal {M}}} \mid \bar{\pi }\text { is a prefix of } {\text {di}(\pi )}\}. \end{aligned}$$

Recall the definition of the cylinder of a set of finite paths (cf. Sect. 2.2.1). If \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) does not end with a self-loop at a Markovian state, then \([{\bar{\pi }}]_{\textit{cyl}} = Cyl ([{\bar{\pi }}])\) holds.

Example 11

Let \(\mathcal {M}\) and \({\mathcal {M}_{\delta }}\) be as in Fig. 2. We consider the path \(\bar{\pi }_1 = s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4 \) and digitization constant \(\delta = 0.4\). The set \([{\bar{\pi }_1}]_{\textit{cyl}}\) contains each infinite path whose digitization has the prefix \(\bar{\pi }_1\), i.e.,

$$\begin{aligned}{}[{\bar{\pi }_1}]_{\textit{cyl}} = \{ s_0 \xrightarrow {t} s_3 \xrightarrow {\beta } s_4 \xrightarrow {\kappa _{}} \dots \in { IPaths ^{\mathcal {M}}} \mid 0.8 \le t< 1.2\}. \end{aligned}$$

We observe that these are exactly the paths that have a prefix in \([{\bar{\pi }_1}]\). Put differently, we have \([{\bar{\pi }_1}]_{\textit{cyl}} = Cyl ([{\bar{\pi }_1}])\).

Next, consider the digital path \(\bar{\pi }_2 = s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0\). Note that there is no path \(\pi \in { FPaths ^{\mathcal {M}}}\) with \({\text {di}(\pi )}= \bar{\pi }_2\), implying \([{\bar{\pi }_2}] = \emptyset \). Intuitively, \(\bar{\pi }_2\) depicts a sojourn time at \( last (\bar{\pi }_2)\) but finite paths of MAs do not depict sojourn times at their last state. On the other hand, the induced cylinder of \(\bar{\pi }_2\) contains all paths that sojourn at least \(2 \delta \) time units at \(s_0\), i.e.,

$$\begin{aligned}{}[{\bar{\pi }_2}]_{\textit{cyl}} = \{ s_0 \xrightarrow {t} s_1 \xrightarrow {\kappa _{}} \dots \in { IPaths ^{\mathcal {M}}} \mid t\ge 0.8 \}. \end{aligned}$$

The schedulers \(\sigma \) and \({\text {di}(\sigma )}\) induce the same probabilities for a given digital path. This is formalized by the following lemma. Note that a similar statement for \({\text {ta}(\sigma )}\) and time-abstract paths was shown in Lemma 1.

Lemma 4

Let \(\mathcal {M}\) be an MA with scheduler \(\sigma \in \text {GM}^{}\), digitization \({\mathcal {M}_{\delta }}\), and digital path \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\). It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }( [{\bar{\pi }}]_{\textit{cyl}} ) = \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\bar{\pi }). \end{aligned}$$

The proof is included in App. C.1. To show Proposition 4, we now apply Lemma 4. The idea of the proof is similar to the proof of Proposition 2 conducted on page 14. We include a formal proof in App. C.2.

4.3.2 Timed reachability via digitization

As shown above, induced \(\text {ds}\)-bounded reachability probabilities on MAs can be computed on their digitization. Next, we relate \(\text {ds}\)-bounded and timed reachability on MAs, i.e., we quantify the maximum difference between time-bounded and \(\text {ds}\)-bounded reachability probabilities.

The notation \({|\bar{\pi }|_{\text {ds}}}\) for paths \(\bar{\pi }\) of \({\mathcal {M}_{\delta }}\) is also applied to paths of \(\mathcal {M}\), where \({|\pi |_{\text {ds}}} = {|{\text {di}(\pi )}|_{\text {ds}}}\) for any \(\pi \in { FPaths ^{\mathcal {M}}}\). Intuitively, one digitization step represents the elapse of at most \(\delta \) time units. Consequently, the duration of a path with \(k\in \mathbb {N}\) digitization steps is at most \(k\delta \).

Lemma 5

For a path \(\pi \in { FPaths ^{\mathcal {M}}}\) and digitization constant \(\delta \) it holds that

$$\begin{aligned} T (\pi ) \le {|\pi |_{\text {ds}}} \cdot \delta \ . \end{aligned}$$

Proof

Let \(\pi = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\) and let \({m_{i}} = \max \{ {m_{}} \in \mathbb {N}\mid {m_{}} \delta \le t(\kappa _{i}) \}\) for each \(i \in \{0, \dots , n-1\}\) (as in Definition 12). The number \({|\pi |_{\text {ds}}}\) is given by \(\sum _{ 0 \le i < n,\, s_i \in \text {MS}} ({m_{i}} + 1)\). With \(t(\kappa _{i}) \le ({m_{i}} + 1) \delta \) it follows that

$$\begin{aligned} T (\pi ) = \sum _{\begin{array}{c} 0 \le i< n\\ s_i \in \text {MS} \end{array}} t(\kappa _{i}) \le \sum _{\begin{array}{c} 0 \le i < n\\ s_i \in \text {MS} \end{array}} ( {m_{i}} + 1) \delta = {|\pi |_{\text {ds}}} \cdot \delta \ . \end{aligned}$$

\(\square \)

For a path \(\pi \) and \(t\in \mathbb {R}_{\ge 0}\), the prefix of \(\pi \) up to time point \(t\) is given by \( pref _{ T }(\pi , t) = pref (\pi , \max \{ n \mid T ( pref (\pi , n)) \le t\})\). The digitization yields an inaccurate estimate of the actual elapsed time: the inaccuracy is captured by the probability that more than \(k\in \mathbb {N}\) digitization steps are performed within \(k\delta \) time units.

Definition 17

(Digitization step bounded paths) Assume an MA \(\mathcal {M}\) and a digitization constant \(\delta \in \mathbb {R}_{> 0}\). For some \(t\in \mathbb {R}_{\ge 0}\), \(k\in \mathbb {N}\), and \({\vartriangleright _{}} \in \{<,\le , >, \ge \}\) the set of paths whose prefix up to time point \(t\) has \(\vartriangleright _{}k\) digitization steps is defined as

$$\begin{aligned} \#^{}[{t}]^{{\vartriangleright _{}}{ k}} = \{\pi \in { IPaths ^{\mathcal {M}}} \mid {| pref _{ T }(\pi , t)|_{\text {ds}}} \vartriangleright _{}k\}. \end{aligned}$$
Fig. 4: MA \(\mathcal {M}\) and illustration of paths of \(\mathcal {M}\) (cf. Example 12)

Example 12

Let \(\mathcal {M}\) be the MA given in Fig. 4a. We consider the set \(\#^{}[{5 \delta }]^{{\le }{ 5}}\). The digitization constant \(\delta \) remains unspecified in this example. Fig. 4b illustrates paths \(\pi _1\), \(\pi _2\), and \(\pi _3\) of \(\mathcal {M}\). We depict sojourn times by arrow length. For instance, the path \(\pi _1\) corresponds to \(s_0 \xrightarrow {2.5 \delta } s_0 \xrightarrow {1.8 \delta } s_1 \xrightarrow {1.7 \delta } \dots \in { IPaths ^{\mathcal {M}}}\). Digitization steps that are “earned” by sojourning at some state for a multiple of \(\delta \) time units are indicated by black dots. Transitions of \(\pi _i\) (where \(i \in \{1,2,3\}\)) that do not belong to \( pref _{ T }(\pi _i, 5 \delta )\) are depicted in gray. We obtain

$$\begin{aligned} {| pref _{ T }(\pi _1, 5\delta )|_{\text {ds}}} = 5 \quad&\implies \quad \pi _1 \in \#^{}[{5 \delta }]^{{\le }{ 5}} \\ {| pref _{ T }(\pi _2, 5\delta )|_{\text {ds}}} = 4 \quad&\implies \quad \pi _2 \in \#^{}[{5 \delta }]^{{\le }{ 5}} \\ {| pref _{ T }(\pi _3, 5\delta )|_{\text {ds}}} = 7 \quad&\implies \quad \pi _3 \notin \#^{}[{5 \delta }]^{{\le }{ 5}}\ . \end{aligned}$$

Note that only the digitization steps of the prefix up to time point \(5 \delta \) are considered. For example, the step of \(\pi _2\) at time point \(4.5 \delta \) is not considered since the corresponding transition is not part of \( pref _{ T }(\pi _2, 5\delta )\). However, we have \({| pref _{ T }(\pi _2, 5.5 \delta )|_{\text {ds}}} = 6\), implying \(\pi _2 \notin \#^{}[{5.5 \delta }]^{{\le }{5}}\).
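Counting the digitization steps of a time-restricted prefix only requires the sojourn times. The sketch below (our own helper; sojourn times are given in multiples of \(\delta \), i.e., \(\delta = 1\)) recomputes \({| pref _{ T }(\pi _1, 5\delta )|_{\text {ds}}} = 5\) for the path \(\pi _1\) from Example 12, whose shown transitions all emerge from Markovian states:

```python
def ds_steps_up_to(sojourns, t):
    """Digitization steps of pref_T(pi, t) for a path through Markovian states only,
    given as a list of sojourn times in multiples of delta (so delta = 1 here). Each
    transition taken after sojourn s contributes floor(s) + 1 steps (cf. Definition 12)."""
    elapsed, steps = 0.0, 0
    for s in sojourns:
        if elapsed + s > t:          # this transition is not part of pref_T(pi, t)
            break
        elapsed += s
        steps += int(s) + 1          # m_i + 1 with m_i = floor(s / delta)
    return steps

print(ds_steps_up_to([2.5, 1.8, 1.7], 5))   # 5, as for pi_1 in Example 12
```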

All considered paths reach \(G= \{s_1 \}\) within \(5 \delta \) time units, i.e., within the time interval \(I = [0, 5\delta ]\), but \(\pi _3 \in \#^{}[{5 \delta }]^{{>}{5}}\) requires more than 5 digitization steps. This yields

$$\begin{aligned} \pi _1, \pi _2 \in \lozenge ^{I} G\cap [{\lozenge ^{{\text {di}(I)}}_{\text {ds}}G}] \quad \text { and } \quad \pi _3 \in \lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}}G}]. \end{aligned}$$
Fig. 5: Illustration of the sets \(\lozenge ^{I} G\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\)

The sets \(\lozenge ^{I} G\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) are illustrated in Fig. 5. We have

$$\begin{aligned} \text {Pr}^{}_{\sigma }(\lozenge ^{I} G) = \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \text {Pr}^{}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) - \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G). \end{aligned}$$

One then shows

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \varepsilon ^{\uparrow }_{}(I) \text {\quad and \quad } \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) \le \varepsilon ^{\downarrow }_{}(I). \end{aligned}$$

for adequate error bounds \( \varepsilon ^{\uparrow }_{}(I),\varepsilon ^{\downarrow }_{}(I)\). In particular, the following lemma gives an upper bound for the probability \(\text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{>}{k}})\), i.e., the probability that more than \(k\in \mathbb {N}\) digitization steps have been performed within \(k\delta \) time units. Then, this probability can be related to the probability of paths in \(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G\), respectively.

Lemma 6

Let \(\mathcal {M}\) be an MA with \(\sigma \in \text {GM}^{}\) and maximum rate \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\). Further, let \(\delta \in \mathbb {R}_{> 0}\) and \(k\in \mathbb {N}\). It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{>}{k}}) \le 1- (1+ \lambda \delta )^{k} \cdot e^{- \lambda \delta k} \end{aligned}$$

For the proof of Lemma 6 we employ the following auxiliary lemma.

Lemma 7

Let \(\mathcal {M}\) be an MA with \(\sigma \in \text {GM}^{}\) and maximum rate \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\). For each \(\delta \in \mathbb {R}_{> 0}\), \(k\in \mathbb {N}\), and \(t\in \mathbb {R}_{\ge 0}\) it holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta + t}]^{{\le }{ k}}) \ge \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{\le }{ k}}) \cdot e^{-\lambda t}\ . \end{aligned}$$

Proofs for both lemmas are given in App. C.3.

We have now obtained all necessary insights to bound \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})\) from above and below, using the digitized steps and the following error bounds. Let \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\) be the maximum exit rate of \(\mathcal {M}\). For \(a\ne 0\) define

$$\begin{aligned} \varepsilon ^{\downarrow }_{}([a, b])&= \varepsilon ^{\downarrow }_{}([a, \infty )) = 1 - (1+ \lambda \delta )^{\text {di}_a} \cdot e^{- \lambda a}\ , \quad \varepsilon ^{\downarrow }_{}([0,b]) = \varepsilon ^{\downarrow }_{}([0,\infty )) = 0 , \\ \varepsilon ^{\uparrow }_{}([a,b])&=\underbrace{ 1 - (1+ \lambda \delta )^{\text {di}_b} \cdot e^{- \lambda b}}_{=\varepsilon ^{\uparrow }_{}([0,b])} + \underbrace{1 - e^{-\lambda \delta }}_{=\varepsilon ^{\uparrow }_{}([a, \infty ))}\ , \text { and }\ \varepsilon ^{\uparrow }_{}([0,\infty )) = 0 . \end{aligned}$$

\(\varepsilon ^{\downarrow }_{}(I)\) and \(\varepsilon ^{\uparrow }_{}(I)\) approach 0 for small digitization constants \(\delta \in \mathbb {R}_{> 0}\).
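The error bounds are straightforward to evaluate. The following sketch (our own code) computes \(\varepsilon ^{\downarrow }_{}(I)\) and \(\varepsilon ^{\uparrow }_{}(I)\) for a well-formed interval and illustrates that both bounds vanish as \(\delta \) decreases:

```python
import math

def eps_bounds(a, b, lam, delta):
    """Error bounds for a well-formed interval I = [a, b] (use b = math.inf for [a, inf)),
    maximum exit rate lam, and digitization constant delta."""
    eps_down = 0.0 if a == 0 else 1 - (1 + lam * delta) ** round(a / delta) * math.exp(-lam * a)
    if b == math.inf:
        eps_up = 0.0 if a == 0 else 1 - math.exp(-lam * delta)
    else:
        eps_up = 1 - (1 + lam * delta) ** round(b / delta) * math.exp(-lam * b)
        if a != 0:
            eps_up += 1 - math.exp(-lam * delta)
    return eps_down, eps_up

for delta in [0.1, 0.01, 0.001]:
    print(delta, eps_bounds(1.0, 2.0, lam=3.0, delta=delta))
# both bounds shrink towards 0 as delta decreases
```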

Proposition 5

For MA \(\mathcal {M}\), scheduler \(\sigma \in \text {GM}^{}\), goal states \(G\subseteq S\), digitization constant \(\delta \in \mathbb {R}_{> 0}\) and time interval \(I\)

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \in \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \Big [{-}\varepsilon ^{\downarrow }_{}(I),\, \varepsilon ^{\uparrow }_{}(I)\Big ]. \end{aligned}$$

Proof

We already discussed that

$$\begin{aligned} \text {Pr}^{}_{\sigma }(\lozenge ^{I} G) = \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \text {Pr}^{}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) - \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G). \end{aligned}$$

The main part of the proof is to show that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) \le \varepsilon ^{\downarrow }_{}(I) \text { and } \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \varepsilon ^{\uparrow }_{}(I). \end{aligned}$$
(3)

Then, the proposition follows directly. We show Equation 3 for the different forms of the time interval \(I\). Here, we consider intervals of the form \(I= [0,b]\). Recall that by assumption \(b= \text {di}_b\cdot \delta \). For the other forms, we refer to App. C.4.

We have \({\text {di}(I)}= \{0,1, \dots , \text {di}_b\}\).

  • We show that \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] \subseteq \lozenge ^{I} G\) which implies

    $$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) = \text {Pr}^{\mathcal {M}}_{\sigma }(\emptyset ) = 0 = \varepsilon ^{\downarrow }_{}(I). \end{aligned}$$

    Let \(\pi \in [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) and let \(\pi '\) be the smallest prefix of \(\pi \) with \( last (\pi ') \in G\). It follows that \(\text {di}(\pi ')\) is also the smallest prefix of \({\text {di}(\pi )}\) with \( last (\text {di}(\pi ')) \in G\). Since \({\text {di}(\pi )}\in \lozenge ^{{\text {di}(I)}}_{\text {ds}} G\), it follows that \({|\pi '|_{\text {ds}}} = {|\text {di}(\pi ')|_{\text {ds}}} \le \text {di}_b\). From Lemma 5 we obtain

    $$\begin{aligned} T (\pi ') \le {|\pi '|_{\text {ds}}} \cdot \delta = {|\text {di}(\pi ')|_{\text {ds}}} \cdot \delta \le \text {di}_b\delta = b\ . \end{aligned}$$

    Hence, the prefix \(\pi '\) reaches \(G\) within \(b\) time units, implying \(\pi \in \lozenge ^{I} G\).

  • Next, we show \(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] \subseteq \#^{}[{b}]^{{>}{\text {di}_b}}\). With Lemma 6 we obtain

    $$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{b}]^{{>}{\text {di}_b}}) \le 1 - (1+ \lambda \delta )^{\text {di}_b} \cdot e^{- \lambda b} = \varepsilon ^{\uparrow }_{}(I) \end{aligned}$$

    Consider a path \(\pi \in \lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\). Note that \(\pi \) reaches \(G\) within \(b\) time units but with more than \(\text {di}_b\) digitization steps. Hence, the prefix of \(\pi \) up to time point \(b\) certainly has more than \(\text {di}_b\) digitization steps, i.e., \(\pi \) satisfies \({| pref _{ T }(\pi , b)|_{\text {ds}}} > \text {di}_b\) which means \(\pi \in \#^{}[{b}]^{{>}{\text {di}_b}}\).

\(\square \)

From Prop. 4 and Prop. 5, we immediately have Cor. 1, which ensures that the value \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})\) can be approximated with arbitrary precision by computing \(\text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G)\) for a sufficiently small \(\delta \).

Corollary 1

For MA \(\mathcal {M}\), scheduler \(\sigma \in \text {GM}^{}\), goal states \(G\subseteq S\), digitization constant \(\delta \in \mathbb {R}_{> 0}\) and time interval \(I\)

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \in \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) + \Big [{-}\varepsilon ^{\downarrow }_{}(I),\, \varepsilon ^{\uparrow }_{}(I)\Big ] . \end{aligned}$$

Corollary 1 generalizes existing results from single-objective timed reachability analysis: For MA \(\mathcal {M}\), goal states \(G\), time bound \(b\in \mathbb {R}_{> 0}\), and digitization constant \(\delta \in \mathbb {R}_{> 0}\) with \(\frac{b}{\delta } = \text {di}_b\in \mathbb {N}\), [30, Theorem 5.3] states that

$$\begin{aligned} \sup _{\sigma \in \text {GM}^{\mathcal {M}}} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,b]} G_{}) \in \sup _{\sigma \in \text {TA}^{{\mathcal {M}_{\delta }}}} \text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma }(\lozenge ^{\{0, \dots , \text {di}_b\}}_{\text {ds}} G) + \Big [{-}\varepsilon ^{\downarrow }_{}([0, b]),\, \varepsilon ^{\uparrow }_{}([0, b])\Big ]. \end{aligned}$$

Corollary 1 generalizes this result by explicitly referring to the schedulers \(\sigma \in \text {GM}^{\mathcal {M}}\) and \({\text {di}(\sigma )}\in \text {TA}^{{\mathcal {M}_{\delta }}}\) under which the claim holds. This extension is necessary as a multi-objective analysis cannot be restricted to schedulers that only optimize a single objective. More details are given in App. D.

Next, we lift Cor. 1 to multiple objectives \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\). We define the satisfaction of a timed reachability objective \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\) for the digitization \({\mathcal {M}_{\delta }}\) as \( {\mathcal {M}_{\delta }}, \sigma \models {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\vartriangleright _{i} p_{i} \text { iff } \text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma }(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) \vartriangleright _{i}p_{i} \). This allows us to consider notations like \( achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), where \({\mathbb {O}_{}}\) contains one or more timed reachability objectives. For a point \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) we consider the hyperrectangle

$$\begin{aligned} \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) = \big [ p_{1} - \varepsilon ^{\downarrow }_{1},\, p_{1} + \varepsilon ^{\uparrow }_{1} \big ] \times \dots \times \big [ p_{d} - \varepsilon ^{\downarrow }_{d},\, p_{d} + \varepsilon ^{\uparrow }_{d} \big ] \subseteq \mathbb {R}^d, \end{aligned}$$

where \(\varepsilon ^{\uparrow }_{i} = \varepsilon ^{\uparrow }_{}(I_{i})\) if \({\mathbb {O}_{i}}\) is a timed reachability objective with time interval \(I_{i}\) and \(\varepsilon ^{\uparrow }_{i} = 0\) otherwise; \(\varepsilon ^{\downarrow }_{i}\) is defined similarly. The next example shows how the set of achievable points of \(\mathcal {M}\) can be approximated using achievable points of \({\mathcal {M}_{\delta }}\).

Example 13

Let \({\mathbb {O}_{}} = ( {\mathbb {P}({\lozenge ^{I_{1}}{G_{1}}})}, {\mathbb {P}({\lozenge ^{I_{2}}{G_{2}}})})\) be two timed reachability objectives for an MA \(\mathcal {M}\) with digitization \({\mathcal {M}_{\delta }}\) such that \(\varepsilon ^{\downarrow }_{1} = 0.13\), \(\varepsilon ^{\uparrow }_{1} = 0.22\), \(\varepsilon ^{\downarrow }_{2} = 0.07\), and \(\varepsilon ^{\uparrow }_{2} = 0.15\). The blue rectangle in Fig. 6a illustrates the set \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) for the point \(\mathbf{p }= (0.4, 0.3)\). Assume \( achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds for threshold relations \({\vartriangleright _{}} = \{\ge , \ge \}\), i.e., \(\mathbf{p }\) is achievable for the digitization \({\mathcal {M}_{\delta }}\). From Cor. 1, we infer that \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) contains at least one point \(\mathbf{p }'\) that is achievable for \(\mathcal {M}\). Hence, the bottom left corner point of the rectangle is achievable for \(\mathcal {M}\). This holds for any rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{q })\) with \(\mathbf{q }\in A\), where \(A\) is the set of achievable points of \({\mathcal {M}_{\delta }}\) denoted by the gray area in Fig. 6b. It follows that any point in \(A^{-}\) (depicted by the green area) is achievable for \(\mathcal {M}\). On the other hand, an achievable point of \(\mathcal {M}\) has to be contained in a set \(\varepsilon ({\mathbb {O}_{}}, \mathbf{q })\) for at least one \(\mathbf{q }\in A\). The red area depicts the points \(\mathbb {R}^d{\setminus } A^{+}\) for which this is not the case, i.e., points that are not achievable for \(\mathcal {M}\). The digitization constant \(\delta \) controls the accuracy of the resulting approximation. Figure 6c depicts a possible result when a smaller digitization constant \(\tilde{\delta }< \delta \) is considered.
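For the numbers of Example 13, the corner-point argument amounts to a simple shift (our own arithmetic): if \(\mathbf{p }= (0.4, 0.3)\) is achievable for \({\mathcal {M}_{\delta }}\), then the bottom left corner of \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) is achievable for \(\mathcal {M}\).

```python
eps_down = (0.13, 0.07)                       # epsilon-down for the two objectives of Example 13
p = (0.4, 0.3)                                # point achievable for the digitization
corner = tuple(pi - e for pi, e in zip(p, eps_down))
print(corner)                                 # ~(0.27, 0.23): guaranteed achievable for the MA
```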

Fig. 6: Approximation of achievable points

The observations from the example above are formalized in the following theorem. The theorem also covers unbounded reachability objectives by considering the time interval \(I= [0,\infty )\). For expected reward objectives of the form \({\mathbb {E}({\# j_{}, G_{}})}\) it can be shown that \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) = \mathrm {ER}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\rho _{j}^\delta , G_{})\). This claim is similar to Proposition 3 and can be shown analogously. This enables multi-objective model checking of MAs with timed reachability objectives.

Theorem 5

Let \(\mathcal {M}\) be an MA with digitization \({\mathcal {M}_{\delta }}\). Furthermore, let \({\mathbb {O}_{}}\) be (un)timed reachability or expected reward objectives with threshold relations \({\vartriangleright _{}}\) and \(|{\mathbb {O}_{}}| = d\). It holds that \(A^{-}\subseteq \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \} \subseteq A^{+}\) with:

$$\begin{aligned} A^{-}&= \{\mathbf{p }' \in \mathbb {R}^d\mid \forall \mathbf{p }\in \mathbb {R}^d:\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) \text { implies } achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \} \text { and} \\ A^{+}&= \{\mathbf{p }' \in \mathbb {R}^d\mid \exists \mathbf{p }\in \mathbb {R}^d:\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) \text { and } achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\} . \end{aligned}$$

Proof

For simplicity, we assume that only the threshold relation \(\ge \) is considered, i.e., \({\vartriangleright _{}} = (\ge , \dots , \ge )\). Furthermore, we restrict ourselves to (un)timed reachability objectives. The remaining cases are treated analogously.

First assume a point \(\mathbf{p }' = (p_{1}', \dots , p_{d}') \in A^{-}\). Consider the point \(\mathbf{p }= (p_{1}, \dots , p_{d})\) satisfying \(p_{i}' = p_{i} - \varepsilon ^{\downarrow }_{i}\) for each index i. It follows that \(\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) and thus \({\mathcal {M}_{\delta }}, \bar{\sigma } \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) for some scheduler \(\bar{\sigma } \in \text {TA}^{{\mathcal {M}_{\delta }}}\). Consider the scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) given by \(\sigma (\pi ,\alpha ) = \bar{\sigma }({\text {di}(\pi )},\alpha )\) for each path \(\pi \in { FPaths ^{\mathcal {M}}}\) and action \(\alpha \in Act \). Notice that \(\bar{\sigma } = {\text {di}(\sigma )}\). For an index i let \({\mathbb {O}_{i}}\) be the objective \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\). It follows that

$$\begin{aligned} {\mathcal {M}_{\delta }}, \bar{\sigma } \models {\mathbb {O}_{i}} \ge p_{i} \iff {\mathcal {M}_{\delta }}, {\text {di}(\sigma )}\models {\mathbb {O}_{i}} \ge p_{i} \iff \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G)\ge p_{i}\ , \end{aligned}$$

With Corollary 1 it follows that

$$\begin{aligned} p_{i}' = p_{i} - \varepsilon ^{\downarrow }_{i} \le \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) - \varepsilon ^{\downarrow }_{i} \overset{Cor.\,1}{\le } \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}). \end{aligned}$$

As this observation holds for all objectives in \({\mathbb {O}_{}}\), it follows that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }'\), implying \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }')\).

The proof of the second inclusion is similar. Assume that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }'\) holds for a point \(\mathbf{p }' = (p_{1}', \dots , p_{d}') \in \mathbb {R}^d\) and a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\). For some index i, consider \({\mathbb {O}_{i}} = {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\). It follows that \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \ge p_{i}'\). With Corollary 1 we obtain

$$\begin{aligned} p_{i}' - \varepsilon ^{\uparrow }_{i} \le \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})- \varepsilon ^{\uparrow }_{i} \overset{Cor.\,1}{\le } \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) . \end{aligned}$$

Applying this for all objectives in \({\mathbb {O}_{}}\) yields \({\mathcal {M}_{\delta }}, {\text {di}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\), where the point \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) satisfies \( p_{i} = p_{i}' - \varepsilon ^{\uparrow }_{i}\) or, equivalently, \( p_{i}' = p_{i} + \varepsilon ^{\uparrow }_{i}\) for each index i. Note that \(\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) which implies \(\mathbf{p }' \in A^{+}\). \(\square \)

5 Computation of Pareto curves

We have seen that the analysis of multiple objectives on an MA \(\mathcal {M}\) can be reduced to a multi-objective analysis on an MDP, namely on the underlying MDP \({\mathcal {M}_\mathcal {D}}\) or on a digitization \({\mathcal {M}_{\delta }}\) of \(\mathcal {M}\). To analyze the MDPs, we adapt the approach of [28]. In this section, we briefly recap that approach and report on the necessary changes. The approach repeatedly checks weighted combinations of the objectives (by means of value iteration [43]—a standard technique in single-objective MDP model checking) to refine an approximation of the set of achievable points. The following example demonstrates this idea for two objectives.

Fig. 7: Illustration of the Pareto curve approximation algorithm (cf. Example 14)

Example 14

Consider an MDP \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\) with two objectives \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, {\mathbb {O}_{2}})\) and relations \({\vartriangleright _{}} = \{\ge , \ge \}\). The approach of [28] iteratively considers weight vectors \(\mathbf{w }= ({w_{1}}, {w_{2}})\in (\mathbb {R}_{\ge 0})^2 \) with \(\mathbf{w }\ne (0,0)\) that assign a weight \({w_{i}} \ge 0\) to each objective \({\mathbb {O}_{i}}\). Optimizing a combination of the weighted objectives (see Def. 18) yields a point \(\mathbf{q }\in \mathbb {R}^2\) such that

  • \(\mathbf{q }\) is achievable and

  • all achievable points of \(\mathcal {D}\) are contained in the half-space \({H}=\{ \mathbf{p }\in \mathbb {R}^2 \mid \mathbf{p }\cdot \mathbf{w }\le \mathbf{q }\cdot \mathbf{w }\}\).

Since \(\mathbf{q }\) is achievable, any point in the set \({ down (\{\mathbf{q }\})} = \{\mathbf{p }\in \mathbb {R}^2 \mid \mathbf{p }\le \mathbf{q }\}\) depicted by the green area of Fig. 7a is achievable as well. On the contrary, there is no achievable point in \(\mathbb {R}^2 {\setminus } {H}\), illustrated by the red area. For the points in the white area, it is still unknown whether they are achievable or not. The set of achievable points of \(\mathcal {D}\) is explored by combining the results for multiple weight-vectors as indicated in Fig. 7b.
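The classification of candidate points described in Example 14 can be phrased directly in terms of the computed pairs of optimal points and weight vectors. The following sketch (our own code, for maximizing objectives with relation \(\ge \)) is one way to do this:

```python
def classify(point, results):
    """Classify a candidate point based on the weighted queries performed so far.
    results: list of pairs (q, w), where q is the optimal point computed for weight vector w."""
    # achievable: point lies below some computed optimal point q (i.e., in down({q}))
    if any(all(pi <= qi for pi, qi in zip(point, q)) for q, _ in results):
        return 'achievable'
    # not achievable: point violates some half-space H = {p | p*w <= q*w}
    if any(sum(pi * wi for pi, wi in zip(point, w)) > sum(qi * wi for qi, wi in zip(q, w))
           for q, w in results):
        return 'not achievable'
    return 'unknown'

print(classify((0.5, 0.2), [((0.6, 0.3), (1.0, 1.0))]))   # 'achievable'
```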

The sketched approach theoretically converges to an exact representation of the set of achievable points, but the number of required calls to value iteration can be exponential in the size of the MDP and the number of objectives [28]. However, experiments in [28] and in Sect. 6 of this work indicate that, in practice, only a small number of weight vectors need to be considered in order to obtain “good” approximations.

We extend [28] towards

  • the simultaneous analysis of minimizing and maximizing expected reward objectives, and

  • the analysis of \(\text {ds}\)-bounded reachability objectives.

These extensions only concern the computation of optimal points. The remaining aspects of the approach are as in [28]. We first restrict our attention to maximizing expected total reward objectives, i.e., expected reward objectives of the form \({\mathbb {E}({\# j_{}, G_{}})}\) with \(G= \emptyset \).

Definition 18

(Optimal Scheduler, Optimal Point) Let \(\mathcal {D}\) be an MDP and let \({\mathbb {O}_{}}= ({\mathbb {E}({\# j_{1}, \emptyset })}, \dots , {\mathbb {E}({\# j_{d}, \emptyset })})\) be expected total reward objectives with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\) and \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\). A scheduler \(\sigma \in \text {TA}^{}\) is called optimal for a weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\in \mathbb {R}_{\ge 0}^d{\setminus } \{\mathbf{0 }\}\) iff

$$\begin{aligned} \sigma \in {\mathop {\hbox {arg max}}\limits _{\sigma ' \in \text {TA}^{}}} \Big (\sum _{i=1}^{d} {w_{i}} \cdot \mathrm {ER}^{\mathcal {D}}_{\sigma '}(\rho _{j_i}, \emptyset ) \Big ). \end{aligned}$$

A point \(\mathbf{p }=(p_{1}, \dots , p_{d})\in \mathbb {R}^d\) is optimal for \(\mathbf{w }\) iff \(p_{i} = \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset )\) for all i and some scheduler \(\sigma \) that is optimal for \(\mathbf{w }\).

The computation of optimal points for a weighted sum of expected total reward objectives with thresholds \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\) is detailed in [28]. The idea is to use value iteration [43] to compute an optimal scheduler \(\sigma _\text {opt}\) for the maximal expected reward \(\max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\) with the weighted reward function \(\rho _{\mathbf{w }}\) given by \( \rho _{\mathbf{w }}(s, \alpha ) = \sum _{i=1}^{d} {w_{i}} \cdot \rho _{j_i}(s, \alpha ) \) for each \(s\in S\) and \(\alpha \in Act \). Evaluating the individual objectives with respect to \(\sigma _\text {opt}\) yields the entries of an optimal point \(\mathbf{p }=(p_{1}, \dots , p_{d})\), i.e., \(p_{i} = \mathrm {ER}^{\mathcal {D}}_{\sigma _\text {opt}}(\rho _{j_i}, \emptyset )\) for all i. In general, we might have \(p_{i} = \infty \) and thus \(\mathbf{p }\notin \mathbb {R}^d\). In this case, the sketched approach is not applicable. We therefore impose some assumptions.

Assumptions on reward finiteness. For maximizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\) we assume \(\max _{\sigma \in \text {TA}^{}}\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) < \infty \), following the suggestions of [28, 29]. Note that if a scheduler \(\sigma _\infty \) with \(\mathrm {ER}^{\mathcal {D}}_{\sigma _\infty }(\rho _{j_i}, \emptyset ) = \infty \) exists, we can also construct schedulers \(\sigma \) with \(\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) = \infty \) that mimic \(\sigma _\infty \) with an arbitrarily small (but non-zero) probability and otherwise focus on the remaining objectives. We further assume that there is at least one scheduler \(\sigma \) inducing \(\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) < \infty \) for all minimizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{{\le }, {<} \}\). If this is not the case, there is no achievable point \(\mathbf{p }\in \mathbb {R}^d\) at all. These assumptions can be checked algorithmically via a graph analysis.
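Under these assumptions, the computation of an optimal point can be sketched as follows (our own simplified code, not the implementation of [28]): value iteration on the weighted reward function yields a memoryless scheduler maximizing the weighted total reward, which is then evaluated once per objective.

```python
def optimal_point(states, actions, P, rewards, w, init, iters=1000):
    """Approximate an optimal scheduler and optimal point for weight vector w
    (cf. Definition 18), assuming non-negative rewards and finite total rewards.
    P:       dict (s, a) -> list of (s', prob)
    rewards: dict (s, a) -> tuple with one reward per objective"""
    weighted = {sa: sum(wi * ri for wi, ri in zip(w, rs)) for sa, rs in rewards.items()}
    V = {s: 0.0 for s in states}
    for _ in range(iters):                              # value iteration for the weighted rewards
        V = {s: max(weighted[(s, a)] + sum(p * V[t] for t, p in P[(s, a)])
                    for a in actions[s]) for s in states}
    sched = {s: max(actions[s], key=lambda a: weighted[(s, a)]
                    + sum(p * V[t] for t, p in P[(s, a)])) for s in states}
    point = []
    for i in range(len(w)):                             # evaluate each objective under sched
        Vi = {s: 0.0 for s in states}
        for _ in range(iters):
            Vi = {s: rewards[(s, sched[s])][i]
                     + sum(p * Vi[t] for t, p in P[(s, sched[s])]) for s in states}
        point.append(Vi[init])
    return sched, point

# A tiny example MDP: choosing 'b' at s0 maximizes the weighted reward for w = (1, 1).
states = ['s0', 's1', 's2']
actions = {'s0': ['a', 'b'], 's1': ['a'], 's2': ['a']}
P = {('s0', 'a'): [('s1', 1.0)], ('s0', 'b'): [('s2', 1.0)],
     ('s1', 'a'): [('s2', 1.0)], ('s2', 'a'): [('s2', 1.0)]}
rewards = {('s0', 'a'): (1.0, 0.0), ('s0', 'b'): (0.0, 2.0),
           ('s1', 'a'): (0.0, 0.0), ('s2', 'a'): (0.0, 0.0)}
print(optimal_point(states, actions, P, rewards, w=(1.0, 1.0), init='s0'))
# -> ({'s0': 'b', 's1': 'a', 's2': 'a'}, [0.0, 2.0])
```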

Remark 2

(Reachability and (general) expected reward objectives) A transformation of untimed reachability objectives \({\mathbb {P}({\lozenge ^{}G_{}})}\) to expected total reward objectives is given in [28, 29]. Roughly, the state space of the MDP is unfolded, yielding two copies of each state. Transitions leading to a state in \(G\) are redirected to the second copy, allowing us to store whether a state in \(G\) has already been visited. A reward of 1 is collected whenever a goal state is visited for the first time, i.e., when we move from the first copy to the second one. A similar unfolding technique can be applied to transform expected reward objectives \({\mathbb {E}({\# j_{}, G_{}})}\) with \(G\ne \emptyset \) to total reward objectives. This approach increases the number of considered states by a factor of up to \(2^d\), where \(d\) is the number of objectives.
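A minimal sketch of the unfolding for a single untimed reachability objective (our own code; expected reward objectives with \(G\ne \emptyset \) are handled analogously): states are paired with a flag recording whether \(G\) has been visited, and, since we use state-action rewards, the reward of 1 for moving into the second copy is encoded by its expected value.

```python
def unfold_reachability(P, goal):
    """Unfold an MDP so that P(<> G) becomes an expected total reward objective
    (cf. Remark 2). Assumes the initial state is not in G; start the unfolded MDP
    in (s0, False).
    P:    dict (s, a) -> list of (s', prob)
    goal: set of goal states G"""
    P_unf, reward = {}, {}
    for (s, a), succs in P.items():
        for seen in (False, True):
            new_succs, r = [], 0.0
            for t, p in succs:
                new_succs.append(((t, seen or t in goal), p))
                if not seen and t in goal:
                    r += p              # probability of entering G for the first time here
            P_unf[((s, seen), a)] = new_succs
            reward[((s, seen), a)] = r
    return P_unf, reward
```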

5.1 Treatment of minimizing objectives

We now consider expected total reward objectives with arbitrary threshold relations. Let \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\) be an MDP and let \({\mathbb {O}_{}}= ({\mathbb {E}({\# j_{1}, \emptyset })}, \dots , {\mathbb {E}({\# j_{d}, \emptyset })})\) be expected total reward objectives with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). Without loss of generality, let each objective consider a different reward function, i.e., the indices \(j_1, \dots , j_d\) are pairwise distinct. We further simplify notation by assuming \(j_i = i\) for \(i \in \{1, \dots , d\}\), i.e., the i-th objective considers reward function \(\rho _{i}\). We proceed in three steps:

  1. Convert all minimizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{\le , <\}\) to maximizing objectives, potentially introducing negative rewards.

  2. Compute an optimal scheduler for the maximal expected reward w.r.t. a weighted reward function that considers positive and negative rewards.

  3. Lift further reward finiteness assumptions imposed in Step 2.

5.1.1 From minimizing to maximizing objectives

We convert minimizing objectives \({\mathbb {E}({\# i, \emptyset })}\) with \({\vartriangleright _{i}} \in \{\le , <\}\) to maximizing objectives by negating the considered rewards, and thereby deviate from our definition of MDPs from Sect. 2 by allowing negative rewards.

More precisely, we consider the reward functions \(\overline{\rho }_{1}, \dots , \overline{\rho }_{d}:S\times Act \rightarrow \mathbb {R}\) and relations \({\mathbin {\overline{\vartriangleright }_{}}} = ({\mathbin {\overline{\vartriangleright }_{1}}}, \dots , {\mathbin {\overline{\vartriangleright }_{d}}})\), where for \(i \in \{1, \dots , d\}\), \(s\in S\), and \(\alpha \in Act \):

$$\begin{aligned} \overline{\rho }_{i}(s, \alpha ) = {\left\{ \begin{array}{ll} -\rho _{i}(s, \alpha ) &{} \text {if } {\vartriangleright _{i}} \in \{\le ,<\}\\ \rho _{i}(s, \alpha ) &{} \text {otherwise}\\ \end{array}\right. } \qquad \text {and} \qquad {\mathbin {\overline{\vartriangleright }_{i}}} = {\left\{ \begin{array}{ll} \ge &{} \text {if } {\vartriangleright _{i}} = {\le }\\ > &{} \text {if } {\vartriangleright _{i}} = {<}\\ \vartriangleright _{i} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Let \(\overline{\mathcal {D}} = (S, Act , \mathbf{P} , {s_{0}}, (\overline{\rho }_{1}, \dots , \overline{\rho }_{d}))\). For each scheduler \(\sigma \) for \(\mathcal {D}\) (and \(\overline{\mathcal {D}}\)), \(i \in \{1, \dots , d\}\), and \(p_{i} \in \mathbb {R}\) we have

$$\begin{aligned} \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{i}, \emptyset ) \le p_{i} \iff -\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{i}, \emptyset ) \ge -p_{i} \iff \mathrm {ER}^{\mathcal {D}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \ge -p_{i} \ . \end{aligned}$$

Lifting to multiple objectives yields for all \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\):

$$\begin{aligned} achieve ^{\mathcal {D}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\mathbin {\overline{\vartriangleright }_{}}\overline{\mathbf{p }}), \end{aligned}$$

where \(\overline{\mathbf{p }} = (\overline{p}_1, \dots , \overline{p}_d)\) with \(\overline{p}_i = -p_i\) if \({\vartriangleright _{i}} \in \{\le , <\}\) and otherwise \(\overline{p}_i = p_i\).

We can thus compute (or approximate) the set of achievable points for \(\overline{\mathcal {D}}\) and \(\mathbin {\overline{\vartriangleright }_{}}\) instead of \(\mathcal {D}\) and \(\vartriangleright _{}\). More concretely, we employ the approach of [28] to approximate the set \( \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\mathbin {\overline{\vartriangleright }_{}}\mathbf{p }) \} \) where only maximizing objectives are considered. The result for \(\mathcal {D}\) and \(\vartriangleright _{}\) can then be obtained by a simple transformation, essentially multiplying entries of minimizing objectives by \(-1\).
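The transformation itself is a one-liner per objective; a minimal sketch (our own code) negates the reward functions of minimizing objectives and flips their threshold relations, and the same sign flip maps points back and forth:

```python
def flip_minimizing(rewards, relations):
    """Turn minimizing expected total reward objectives into maximizing ones (Sect. 5.1.1).
    rewards:   list of reward functions, each a dict (s, a) -> float
    relations: list of threshold relations from {'<=', '<', '>=', '>'}"""
    flipped_rewards, flipped_relations = [], []
    for rho, rel in zip(rewards, relations):
        if rel in ('<=', '<'):
            flipped_rewards.append({sa: -r for sa, r in rho.items()})   # negate the rewards
            flipped_relations.append('>=' if rel == '<=' else '>')      # flip the relation
        else:
            flipped_rewards.append(dict(rho))
            flipped_relations.append(rel)
    return flipped_rewards, flipped_relations

def flip_point(p, relations):
    """Map a point between the original and the transformed problem by negating the
    entries of minimizing objectives; the mapping is its own inverse."""
    return tuple(-pi if rel in ('<=', '<') else pi for pi, rel in zip(p, relations))
```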

5.1.2 Mixtures of positive and negative rewards

As mentioned above, the approach of [28] requires repeatedly computing an optimal scheduler for weighted reward functions. Considering the MDP \(\overline{\mathcal {D}} = (S, Act , \mathbf{P} , {s_{0}}, (\overline{\rho }_{1}, \dots , \overline{\rho }_{d}))\) from above results in some technical complications due to the presence of positive and negative rewards.

For simplicity, we further strengthen our assumptions on reward finiteness by assuming that all induced expected rewards are finite, i.e., \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \notin \{-\infty , \infty \}\) for all \(\sigma \in \text {TA}^{}\) and \(i \in \{1,\dots ,d\}\). The next section discusses how this assumption can be lifted again.

For a weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\in \mathbb {R}_{\ge 0}^d{\setminus } \{\mathbf{0 }\}\) we consider the weighted reward function \(\rho _{\mathbf{w }}\) given by \(\rho _{\mathbf{w }}(s, \alpha ) = \sum _{i=1}^{d} {w_{i}} \cdot \overline{\rho }_{i}(s,\alpha )\) for \(s\in S\) and \(\alpha \in Act \) and let \(\mathcal {D}^\mathbf{w }= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{\mathbf{w }}))\) denote the MDP which arises from \(\overline{\mathcal {D}}\) by replacing the reward functions \((\overline{\rho }_{1}, \dots , \overline{\rho }_{d})\) by \(\rho _{\mathbf{w }}\). Our goal in this section is to compute an optimal scheduler for \(\mathbf{w }\) (cf. Def. 18), i.e., a scheduler inducing the maximal expected total reward \(\max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\). Since \(\rho _{\mathbf{w }}\) considers both positive and negative rewards, conventional value iteration as considered in [28] yields incorrect results.

Example 15

Consider the MDP \(\mathcal {D}^\mathbf{w }\) with the weighted reward function \(\rho _{\mathbf{w }}\) depicted in Fig. 8a. Action rewards are depicted next to the action label, e.g., \(\rho _{\mathbf{w }}(s_0, \alpha ) = 2\). The maximal expected total reward is obtained for a scheduler \(\sigma _\text {opt}\) that always chooses action \(\alpha \), yielding \(\mathrm {ER}^{}_{\sigma _\text {opt}}(\rho _{\mathbf{w }}, \emptyset ) = 2-1=1\).

On the other hand, value iteration as suggested in [28, Alg. 2] yields a value of 2 and a suboptimal scheduler \(\sigma \) with \(\sigma (s_0) = \beta \). Roughly speaking, this is because value iteration computes the expected reward accumulated within n steps for an increasing value of n. However, for the example MDP, this step-bounded expected reward does not converge to the unbounded expected total reward: For a given step-bound n, an optimal scheduler can avoid collecting the reward \(\rho _{\mathbf{w }}(s_1,\alpha )=-1\) by taking the \(\beta \)-self-loop at \(s_0\) \(n-1\) times and then taking the \(\alpha \)-transition to \(s_1\) only in the very last step.
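The following sketch demonstrates the effect on \(\mathcal {D}^\mathbf{w }\) (our own code; the transition structure of Fig. 8a is reconstructed from Examples 15 and 16): plain step-bounded value iteration converges to 2, whereas the expected total reward of the scheduler that always chooses \(\alpha \) is 1.

```python
# MDP of Fig. 8a, reconstructed from Examples 15 and 16 (deterministic transitions):
# each entry maps (state, action) to (successor, weighted reward rho_w)
P = {('s0', 'alpha'): ('s1', 2.0), ('s0', 'beta'): ('s0', 0.0),
     ('s1', 'alpha'): ('s2', -1.0), ('s2', 'alpha'): ('s2', 0.0)}
actions = {'s0': ['alpha', 'beta'], 's1': ['alpha'], 's2': ['alpha']}

V = {s: 0.0 for s in actions}
for _ in range(100):                       # plain value iteration (n-step optimal rewards)
    V = {s: max(P[(s, a)][1] + V[P[(s, a)][0]] for a in actions[s]) for s in actions}
print(V['s0'])                             # 2.0: the limit of the n-step optima

# Expected total reward of the scheduler that always chooses alpha (the actual optimum):
print(2.0 + (-1.0) + 0.0)                  # 1.0 -- plain value iteration overestimates
```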

Fig. 8: MDP where value iteration yields wrong results and the corresponding 0-EC quotient (cf. Examples 15 to 17)

In the example, the problem is introduced by the action \(\beta \) at state \(s_0\). This action allows a scheduler to stay at \(s_0\) arbitrarily long without collecting any reward. We refer to such model components as 0-ECs.

Definition 19

(End Component, 0-EC) A non-empty set of state-action pairs \(\mathcal {E}\subseteq S\times Act \) is an end component (EC) of \(\mathcal {D}^\mathbf{w }\) if

  1. \(\forall (s, \alpha ) \in \mathcal {E}:\mathbf{P} (s, \alpha , s') > 0\) implies \(s' \in { states (\mathcal {E})}= \{ \tilde{s} \mid \exists (\tilde{s}, \tilde{\alpha }) \in \mathcal {E}\}\) and

  2. the graph \(\big ({ states (\mathcal {E})},\, \big \{ (s, s') \mid \exists (s, \alpha )\in \mathcal {E} :\mathbf{P} (s, \alpha , s') > 0 \big \} \big )\) is strongly connected.

An EC \(\mathcal {E}\) is a 0-EC w.r.t. reward function \(\rho _{\mathbf{w }}\) if \(\rho _{\mathbf{w }}(s,\alpha ) = 0\) for all \((s,\alpha ) \in \mathcal {E}\). A (0-)EC \(\mathcal {E}\) is maximal, if there is no other (0-)EC \(\mathcal {E}'\) with \(\mathcal {E}\subsetneq \mathcal {E}'\).

Example 16

The MDP \(\mathcal {D}^\mathbf{w }\) in Fig. 8a has two maximal 0-ECs w.r.t. the depicted reward function \(\rho _{\mathbf{w }}\): \(\mathcal {E}_1 = \{(s_0, \beta )\}\) and \(\mathcal {E}_2 = \{(s_2,\alpha )\}\).

The set of all maximal ECs of an MDP can be computed efficiently [17]. The set of maximal 0-ECs can be obtained by computing the maximal ECs of a modified MDP in which all transitions (state-action pairs) incurring non-zero reward are erased, as sketched below. However, in this particular case we know that all ECs (reachable from \({s_{0}}\)) actually are 0-ECs: if an EC were not a 0-EC, we could construct a scheduler inducing an infinite expected reward for at least one objective, which would violate our assumption that all induced expected rewards are finite.
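A sketch of that reduction, where `maximal_end_components` stands for some standard maximal-EC routine in the spirit of [17] and is assumed rather than implemented here:

```python
# Reduce the computation of maximal 0-ECs to a maximal-EC computation on a
# modified MDP in which state-action pairs with non-zero reward are erased.
# `maximal_end_components` is an assumed helper implementing, e.g., [17].

def maximal_zero_ecs(P, R, maximal_end_components):
    # keep only state-action pairs with zero reward ...
    restricted = {
        s: {a: succ for a, succ in acts.items() if R[(s, a)] == 0.0}
        for s, acts in P.items()
    }
    # ... and drop states that lose all of their actions
    restricted = {s: acts for s, acts in restricted.items() if acts}
    return maximal_end_components(restricted)
```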

For an EC \(\mathcal {E}\) we define \({ states (\mathcal {E})}:= \{ s\in S\mid \exists \alpha \in Act (s): (s, \alpha ) \in \mathcal {E}\}\) and \({ exits (\mathcal {E})}{:}{=} \{(s,\alpha ) \in S\times Act \mid s\in { states (\mathcal {E})}, \alpha \in Act (s), \text { and } (s,\alpha ) \notin \mathcal {E}\}\).

Intuitively, once a maximal 0-EC \(\mathcal {E}\) is reached, a scheduler can choose to stay within the states of \(\mathcal {E}\) for arbitrarily many steps by only picking actions \(\alpha \) at states \(s\in { states (\mathcal {E})}\) with \((s,\alpha ) \in \mathcal {E}\). Since ECs are strongly connected, it is possible to reach any state \(s\in { states (\mathcal {E})}\) almost surely. At some point, the scheduler may decide to leave the EC by choosing an exiting state-action pair \((s, \alpha ) \in { exits (\mathcal {E})}\). Although this decision could be delayed for arbitrarily many steps, such a delay has no effect on the induced expected total reward since no reward is accumulated in 0-ECs.

Our approach is to replace each maximal 0-EC \(\mathcal {E}\) of \(\mathcal {D}^\mathbf{w }\) by a state \(s_?^\mathcal {E}\) in which a scheduler immediately has to choose either an exiting state-action pair \((s, \alpha ) \in { exits (\mathcal {E})}\) or an action \(\alpha _!\), indicating that the EC should never be left. This procedure coincides with the computation of EC quotients in, e.g., [3, 4, 21, 32].

Definition 20

(0-EC Quotient) The 0-EC quotient of \(\mathcal {D}^\mathbf{w }\) w.r.t. reward function \(\rho _{\mathbf{w }}\) is the MDP \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}} = (S', Act ', \mathbf{P} ', {s_{0}}', (\rho _{\mathbf{w }}'))\) obtained by applying the following steps on \(\mathcal {D}^\mathbf{w }\) for every maximal 0-EC \(\mathcal {E}\):

  1.

    Add two new states \(s_?^\mathcal {E}\) and \(s_!^\mathcal {E}\) and remove all states from \({ states (\mathcal {E})}\).

  2.

    Add a new action \(\alpha _s\) for each \((s,\alpha ) \in { exits (\mathcal {E})}\).

  3.

    Redirect the target of transitions that enter the states in \({ states (\mathcal {E})}\) to \(s_?^\mathcal {E}\), i.e., \(\mathbf{P} '(s,\alpha , s_?^\mathcal {E}) = \sum _{s' \in { states (\mathcal {E})}} \mathbf{P} (s,\alpha ,s')\) for every \((s,\alpha ) \in (S\times Act ) {\setminus } \mathcal {E}\).

  4.

    Redirect the origin of transitions that exit the states in \({ states (\mathcal {E})}\) to \(s_?^\mathcal {E}\), i.e., \(\mathbf{P} '(s_?^\mathcal {E}, \alpha _s, s') = \mathbf{P} (s,\alpha ,s')\) for every \((s,\alpha ) \in { exits (\mathcal {E})}\).

  5.

Add a new action \(\alpha _!\) and set \(\mathbf{P} '(s,\alpha _!,s_!^\mathcal {E}) = 1\) for \(s \in \{s_?^\mathcal {E}, s_!^\mathcal {E}\}\).

  6.

    Restrict the reward function \(\rho _{\mathbf{w }}\) to the remaining state-action pairs and set \(\rho _{\mathbf{w }}'(s_?^\mathcal {E}, \alpha _!) = \rho _{\mathbf{w }}'(s_!^\mathcal {E}, \alpha _!) = 0\) and \(\rho _{\mathbf{w }}'(s_?^\mathcal {E}, \alpha _s) = \rho _{\mathbf{w }}(s, \alpha )\) for all \((s,\alpha ) \in { exits (\mathcal {E})}\).
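The six steps of Def. 20 translate almost directly into code. The sketch below quotients a single maximal 0-EC \(\mathcal {E}\) out of an MDP in the dictionary encoding used in the earlier sketches; iterating it over all maximal 0-ECs yields \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\). State and action names are illustrative.

```python
# Quotient a single maximal 0-EC E out of an MDP (cf. Def. 20).
# P[state][action] = list of (probability, successor); R[(state, action)] = reward.

def quotient_zero_ec(P, R, E, name):
    states_E = {s for s, _ in E}
    exits = [(s, a) for s in states_E for a in P[s] if (s, a) not in E]
    s_q, s_bang = f"q_{name}", f"bang_{name}"          # s_? and s_! (step 1)
    newP = {s: dict(acts) for s, acts in P.items() if s not in states_E}
    newR = {(s, a): r for (s, a), r in R.items() if s not in states_E}
    # Step 3: redirect transitions entering states(E) to s_?.
    for s, acts in newP.items():
        for a, succ in acts.items():
            redirected = sum(p for p, t in succ if t in states_E)
            kept = [(p, t) for p, t in succ if t not in states_E]
            if redirected > 0:
                kept.append((redirected, s_q))
            acts[a] = kept
    # Steps 2, 4, 6: one fresh action per exiting pair, same distribution and reward.
    newP[s_q] = {}
    for s, a in exits:
        newP[s_q][f"{a}_{s}"] = [(p, s_q if t in states_E else t)
                                 for p, t in P[s][a]]
        newR[(s_q, f"{a}_{s}")] = R[(s, a)]
    # Step 5: the "stay forever" action alpha_!.
    newP[s_q]["alpha_!"] = [(1.0, s_bang)]
    newP[s_bang] = {"alpha_!": [(1.0, s_bang)]}
    newR[(s_q, "alpha_!")] = newR[(s_bang, "alpha_!")] = 0.0
    return newP, newR
```

If the initial state lies in \({ states (\mathcal {E})}\), it additionally has to be remapped to \(s_?^\mathcal {E}\); this bookkeeping is omitted in the sketch.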

Example 17

Fig. 8b depicts the 0-EC quotient of the MDP in Fig. 8a w.r.t. the depicted reward function \(\rho _{\mathbf{w }}\).

Considering the 0-EC quotient \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}} = (S', Act ', \mathbf{P} ', {s_{0}}', (\rho _{\mathbf{w }}'))\) instead of \(\mathcal {D}^\mathbf{w }\) preserves maximal expected total rewards, i.e.,

$$\begin{aligned} \max _{\sigma \in \text {TA}^{\mathcal {D}^\mathbf{w }}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset ) = \max _{\sigma ' \in \text {TA}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}_{\sigma '}(\rho _{\mathbf{w }}', \emptyset ). \end{aligned}$$

Moreover, we can transform any scheduler \(\sigma '\) for \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) to a scheduler \(\sigma \) for \(\mathcal {D}^\mathbf{w }\) with the same expected reward, i.e., \( \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset ) = \mathrm {ER}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}_{\sigma '}(\rho _{\mathbf{w }}', \emptyset ). \) In particular, if \(\sigma '\) chooses an action \(\alpha _s\) at a state \(s_?^\mathcal {E}\), \(\sigma \) can mimic this by choosing \(\alpha \) at state \(s\in { states (\mathcal {E})}\) and enforcing that s is reached almost surely from any other state \(s' \in { states (\mathcal {E})}{\setminus } \{s\}\) of the 0-EC \(\mathcal {E}\). Similarly, if \(\sigma '\) chooses \(\alpha _!\) at \(s_?^\mathcal {E}\), \(\sigma \) can mimic this by only picking actions \(\alpha \) at states \(s\in { states (\mathcal {E})}\) with \((s,\alpha ) \in \mathcal {E}\). We refer to [4, 21] for more details on the correctness of this construction.

Since there are no more ECs in \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) (other than the 0-ECs of the form \(\{(s_!^\mathcal {E}, \alpha _!)\}\), which cannot be left), it can be shown that the value iteration algorithm on \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) approaches the correct expected total rewards [4, 6].
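For the running example, the same value iteration as before now converges to the correct value on the quotient. The structure below is our reading of Fig. 8b and is given only for illustration.

```python
# Value iteration on the 0-EC quotient of the MDP from Fig. 8a (cf. Fig. 8b).
P = {
    "q_E1":    {"alpha_s0": [(1.0, "s1")], "alpha_!": [(1.0, "bang_E1")]},
    "s1":      {"alpha": [(1.0, "q_E2")]},
    "q_E2":    {"alpha_!": [(1.0, "bang_E2")]},
    "bang_E1": {"alpha_!": [(1.0, "bang_E1")]},
    "bang_E2": {"alpha_!": [(1.0, "bang_E2")]},
}
R = {("q_E1", "alpha_s0"): 2.0, ("q_E1", "alpha_!"): 0.0, ("s1", "alpha"): -1.0,
     ("q_E2", "alpha_!"): 0.0, ("bang_E1", "alpha_!"): 0.0, ("bang_E2", "alpha_!"): 0.0}

V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: max(R[(s, a)] + sum(p * V[t] for p, t in succ)
                for a, succ in P[s].items())
         for s in P}
print(V["q_E1"])  # 1.0, matching the optimum from Example 15
```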

5.1.3 Negative infinite rewards

We now lift our strengthened reward finiteness assumptions from the previous section (but still impose the assumptions mentioned on page 29). More precisely, we now allow that \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma _\infty }(\overline{\rho }_{i}, \emptyset )= -\infty \) for some \(i \in \{1, \dots , d\}\) with \({\vartriangleright _{i}} \in \{{\le }, {<} \}\) and some scheduler \(\sigma _\infty \). Observe that such a scheduler does not achieve any point \(\mathbf{p }\in \mathbb {R}^d\) and can therefore be excluded from the analysis. Hence, for the computation of \(\{ \mathbf{p }\in \mathbb {R}^d\mid achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) it suffices to restrict the analysis to schedulers in \(\{ \sigma \in \text {TA}^{}\mid \forall i \in \{1, \dots , d\} :\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \ne -\infty \}\). Our assumptions on reward finiteness imply that this set is not empty.

If the considered weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\) satisfies \(w_i > 0\) for all \(i \in \{1, \dots , d\}\), our approach from the previous section can be applied without further modifications: If \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma _\infty }(\overline{\rho }_{i}, \emptyset )= -\infty \) for some \(i \in \{1, \dots , d\}\) and some scheduler \(\sigma _\infty \), we also have \(\mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma _\infty }(\rho _{\mathbf{w }}, \emptyset )= -\infty < \max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\).

However, if \(w_i = 0\) for some \(i \in \{1, \dots , d\}\), there might be a 0-EC w.r.t. \(\rho _{\mathbf{w }}\) that is not a 0-EC w.r.t. \(\overline{\rho }_{i}\). This is because the rewards of \(\overline{\rho }_{i}\) are not considered in the weighted reward function \(\rho _{\mathbf{w }}\). As a consequence, a scheduler for \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) as defined above might choose action \(\alpha _!\) at a state \(s_?^\mathcal {E}\) for an EC \(\mathcal {E}\) that is not a 0-EC w.r.t. \(\overline{\rho }_{i}\). Transforming such a scheduler back to \(\overline{\mathcal {D}}\) can then induce an expected total reward of \(-\infty \) w.r.t. \(\overline{\rho }_{i}\). We exclude such schedulers during the computation by tweaking the elimination of 0-ECs in Def. 20: the action \(\alpha _!\) is only inserted (Step 5) if there is a scheduler for \(\overline{\mathcal {D}}\) that stays in the given EC forever and yields a finite expected reward for all individual objectives.

5.2 \(\text {ds}\)-bounded reachability

We generalize reachability objectives from Def. 6 to \(\text {ds}\)-bounded reachability objectives of the form \({\mathbb {P}({\lozenge ^{J}_{\text {ds}} G})}\) which concern \(\text {ds}\)-bounded reachability probabilities (cf. Def. 15). As explained in Sect. 4.3, multi-objective queries for MA considering timed reachability objectives can be approximated by checking \(\text {ds}\)-bounded reachability objectives on a digitization of the MA. This section extends the approach of [28] to process such objectives.

Assume the digitization \({\mathcal {M}_{\delta }}= (S, Act , \mathbf{P} _{\delta }, {s_{0}}, (\rho _{1}^\delta , \dots , \rho _{\ell }^\delta ))\) of an MA \(\mathcal {M}\) w.r.t. \(\delta > 0\) and \(\text {ds}\)-bounded reachability objectives \({\mathbb {P}({\lozenge ^{\le k_{1}}_{\text {ds}} G_{1}})}, \dots , {\mathbb {P}({\lozenge ^{\le k_{d}}_{\text {ds}} G_{d}})}\). Other digitization step bounds (e.g. lower step bounds) are treated similarly. We further denote by \(\text {MS}\subseteq S\) the set of states of \({\mathcal {M}_{\delta }}\) that represent the Markovian states of the initially considered MA. We present a transformation from \(\text {ds}\)-bounded reachability objectives to untimed reachability objectives. The idea is to store the number of visited Markovian states in the state space. Upon reaching a state in \(G_i\), this information allows us to distinguish the cases where the step bound \(k_i\) has been exceeded or not. The procedure is based on the idea of [35, Algorithm 1] for single-objective MA.

For the largest occurring step bound \(k_{\text {max}}= \max _i k_i\), let \(\mathcal {D}\) be the MDP consisting of \(k_{\text {max}}+ 2\) copies of \({\mathcal {M}_{\delta }}\) such that transitions to states \(s'\in \text {MS}\) are redirected to the next copy. Formally, \(\mathcal {D}= (S\times \{0, \dots , k_{\text {max}}+ 1 \}, Act , \mathbf{P} ,({s_{0}}, k_\text {init}), (\rho _{1}, \dots , \rho _{\ell }))\) where

  • \(k_\text {init} = 1\) if \({s_{0}}\in \text {MS}\), and \(k_\text {init} = 0\) otherwise,

  • \( \mathbf{P} ((s,k),\,\alpha ,\,(s',k')) = \mathbf{P} _{\delta }(s,\alpha ,s')\) if \(s' \in \text {MS}\) and \(k' = \min (k{+} 1, k_{\text {max}}{+}1)\),

  • \( \mathbf{P} ((s,k),\,\alpha ,\,(s',k)) = \mathbf{P} _{\delta }(s,\alpha ,s')\) if \(s' \notin \text {MS}\),

  • \(\mathbf{P} ((s,k),\,\alpha ,\,(s',k')) = 0\) in all other cases, and

  • every reward function \(\rho _{i}\) satisfies \(\rho _{i}((s,k),\,\alpha ) = \rho _{i}^\delta (s,\alpha )\).

The second component of a state \((s,k)\) of \(\mathcal {D}\) reflects the number of Markovian states visited so far. Hence, eventually reaching \(G_i\) in \({\mathcal {M}_{\delta }}\) with at most \(k_i\) digitization steps is equivalent to reaching \(G'_i = \{ (s,k) \mid s \in G_i \text { and } k \le k_i \}\) in \(\mathcal {D}\). In particular, there is a mapping from schedulers \(\sigma _\delta \) for \({\mathcal {M}_{\delta }}\) to schedulers \(\sigma \) for \(\mathcal {D}\) such that \(\text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma _\delta }(\lozenge ^{\le k_i}_{\text {ds}} G_i) = \text {Pr}^{\mathcal {D}}_{\sigma }(\lozenge ^{}G'_i)\).
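A sketch of this unfolding; `P_delta` is the transition function of the digitization and `MS` the set of (representatives of) Markovian states, both given in the dictionary encoding of the earlier sketches (an illustrative choice, not Storm's representation).

```python
# Unfold a digitization M_delta into the MDP D by tracking the number of
# Markovian states visited so far, capped at k_max + 1.

def unfold(P_delta, MS, s0, k_max):
    cap = k_max + 1
    bump = lambda k: min(k + 1, cap)
    init = (s0, 1 if s0 in MS else 0)          # k_init as defined above
    P, frontier = {}, [init]
    while frontier:                            # build only the reachable part
        s, k = frontier.pop()
        if (s, k) in P:
            continue
        P[(s, k)] = {}
        for a, succ in P_delta[s].items():
            new_succ = []
            for prob, t in succ:
                kt = bump(k) if t in MS else k # Markovian successors advance the counter
                new_succ.append((prob, (t, kt)))
                frontier.append((t, kt))
            P[(s, k)][a] = new_succ
    return P, init
```

Only the reachable fragment is built, which already mitigates part of the blow-up discussed next.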

Fig. 9 Transformation for \(\text {ds}\)-bounded reachability objectives (cf. Example 18)

Example 18

We illustrate the construction for the digitization \({\mathcal {M}_{\delta }}\) depicted in Fig. 9a and the objective \({\mathbb {P}({\lozenge ^{\le 2}_{\text {ds}} \{s_2\}})}\). States that correspond to Markovian states of the original MA are depicted with rectangles.

Figure 9b shows the reachable states of the MDP \(\mathcal {D}\) as given by the construction above.

With the construction above, a digitization \({\mathcal {M}_{\delta }}\) with objectives \({\mathbb {O}_{}}\) can be transformed to an MDP \(\mathcal {D}\) and untimed reachability or expected reward objectives \({\mathbb {O}_{}}'\) such that

$$\begin{aligned} achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}) = achieve ^{\mathcal {D}}({\mathbb {O}_{}}'). \end{aligned}$$

The latter can be computed using the approach of [28]. In practice, however, this is not feasible since \(k_{\text {max}}\) can be large, leading to huge model sizes. To avoid this blow-up, the different copies of \({\mathcal {M}_{\delta }}\) are analyzed individually, as sketched below. For \(k\in \{0, \dots , k_{\text {max}}{+} 1\}\) let \(S_k\) denote the set of states \((s, k)\) of \(\mathcal {D}\) with second entry \(k\). A transition emerging from a state in \(S_k\) can only point to a state in \(S_k\cup S_{k{+}1}\). Consequently, for computing optimal values for the states in \(S_k\), only the results for the states in \(S_{k{+}1}\) are relevant. We thus analyze the sub-models of \(\mathcal {D}\) induced by the state sets \(S_{k_{\text {max}}{+} 1}, S_{k_{\text {max}}}, \dots , S_1, S_0\) sequentially in this order. Details on the sequential approach can be found in [35] for single-objective MA and in [33] for multi-objective MDPs.
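A sketch of the sequential scheme for a single \(\text {ds}\)-bounded reachability objective \({\mathbb {P}({\lozenge ^{\le k}_{\text {ds}} G})}\) on the unfolded MDP from the previous sketch. The fixed number of inner sweeps is a simplification that suffices under the usual assumption that no cycles consisting solely of probabilistic states occur within a layer; the actual implementation follows [33, 35].

```python
# Layer-wise computation of the maximal ds-bounded reachability probability on
# the unfolded MDP: P[(s, k)][a] = list of (prob, (s', k')); layers differ in k.

def layered_reachability(P, G, k_bound, k_max):
    val = {}                                    # values of already solved layers
    for k in range(k_max + 1, -1, -1):          # k_max+1, k_max, ..., 0
        layer = [sk for sk in P if sk[1] == k]
        cur = {sk: 0.0 for sk in layer}
        # len(layer)+1 sweeps suffice if the within-layer graph is acyclic
        # (no cycles of probabilistic states); this is a sketch, not a fixpoint check.
        for _ in range(len(layer) + 1):
            for s, _k in layer:
                if s in G and k <= k_bound:     # goal reached within the bound
                    cur[(s, k)] = 1.0
                    continue
                cur[(s, k)] = max(
                    sum(p * cur.get(t, val.get(t, 0.0)) for p, t in succ)
                    for a, succ in P[(s, k)].items()
                )
        val.update(cur)                          # layer k is now solved
    return val

# Illustrative use with the unfolding of Fig. 9a (hypothetical names):
# values = layered_reachability(P, G={"s2"}, k_bound=2, k_max=2)
# values[init] is the maximal ds-bounded reachability probability.
```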

Fig. 10 Illustration of digitization errors

Dealing with digitization errors Recall that the analysis of the digitization \({\mathcal {M}_{\delta }}\) only approximates the timed reachability probabilities of the MA. More precisely, given a weight vector \(\mathbf{w }\), the approach outlined above computes an optimal point \(\mathbf{p }\) for \(\mathbf{w }\) on \({\mathcal {M}_{\delta }}\) and the digitized query. Applying our results from Sect. 4.3 (in particular Theorem 5), we can lift the point \(\mathbf{p }\) to the original time-bounded reachability query by considering the hyperrectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\). Fig. 10a illustrates this, where \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) is depicted by the blue rectangle. The point \(\mathbf{q }\) at the bottom left of the rectangle is known to be achievable, implying that all points in the green area are achievable, whereas the points in the red area are known to be unachievable. This introduces a gap between the achievable and unachievable areas. We measure the size \(\gamma \) of this gap as the smallest distance between the achievable and the unachievable points, i.e., the distance between the point \(\mathbf{q }\) and the red area in Fig. 10a. The value of \(\gamma \) depends on the size of the rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) and thus on the choice of the digitization constant \(\delta \). However, \(\gamma \) also depends on the considered weight vector \(\mathbf{w }\), which motivates a dynamic selection of \(\delta \), depending on the currently considered weight vector.

Example 19

Assume that an optimal point \(\mathbf{p }\) has been computed for the weight vector \(\mathbf{w }^{(1)} = (0,1)\), resulting in the (un-)achievable points and gap \(\gamma \) depicted in Fig. 10b. Note that \(\gamma \) is not affected by the comparably large approximation error for the first objective (x-axis). We continue the example in Fig. 10c. To achieve the same gap \(\gamma \) also for the weight vector \(\mathbf{w }^{(2)}=(1,0)\), a smaller digitization constant has been chosen so that the rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p }^{(2)})\) becomes smaller. We assume that, by coincidence, the analysis for \(\mathbf{w }^{(2)}=(1,0)\) produces the same point \(\mathbf{q }\) as in the previous step from Fig. 10b. This is possible, for example, when the nondeterminism of the model is spurious. Observe that the largest distance between a point in the unknown (white) area and the green area is \(\sqrt{\gamma ^2 + \gamma ^2} = \sqrt{2} \cdot \gamma \). Put differently, the obtained approximation in Fig. 10c is a \((\sqrt{2} \cdot \gamma )\)-approximation of the set of achievable points for the considered MA.

In general, we can show the following: if all gaps \(\gamma \) are kept below \(\eta / \sqrt{d}\), where \(\eta > 0\) is the desired precision and \(d\) the number of objectives, the approach of [28] converges to an \(\eta \)-approximation of the set of achievable points.

To obtain an \(\eta \)-approximation, our implementation (discussed in Sect. 6) therefore uses the following heuristic for selecting the digitization constants: given a weight vector \(\mathbf{w }\), we choose the largest possible digitization constant \(\delta \) for which the resulting gap \(\gamma \) w.r.t. \(\mathbf{w }\) is at most \((\eta / \sqrt{d}) \cdot 0.9\). Here, the factor 0.9 ensures that we do not have to explore the full set of achievable points of \({\mathcal {M}_{\delta }}\), which otherwise could result in analyzing an unnecessarily large number of weight vectors.
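A sketch of this selection, where `gap_for(delta, w)` stands for the (assumed, not implemented) evaluation of the gap \(\gamma \) via the error bounds of Theorem 5; the geometric search is a simplification of picking the largest feasible \(\delta \).

```python
import math

def choose_delta(gap_for, w, eta, d, delta_start=0.1, factor=0.5):
    """Pick a large digitization constant delta whose gap w.r.t. weight vector w
    stays below (eta / sqrt(d)) * 0.9; gap_for(delta, w) is an assumed callback."""
    threshold = (eta / math.sqrt(d)) * 0.9
    delta = delta_start
    # Assumes the gap vanishes as delta -> 0, so the loop terminates.
    while gap_for(delta, w) > threshold:
        delta *= factor
    return delta
```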

6 Experimental evaluation

Implementation We implemented multi-objective model checking of MAs in Storm  [22, 36]. The input model is given in the PRISM language and translated into a sparse representation. For an MA \(\mathcal {M}\), the implementation performs a multi-objective analysis on the underlying MDP \({\mathcal {M}_\mathcal {D}}\) or a digitization \({\mathcal {M}_{\delta }}\) and infers (an approximation of) the achievable points of \(\mathcal {M}\) by exploiting the results from Sect. 4. For computing the achievable points of \({\mathcal {M}_\mathcal {D}}\) and \({\mathcal {M}_{\delta }}\), we apply the approach of [28] with the extensions explained in Sect. 5.

All material to replicate the experiments is available at [45]. The implementation is part of \(\texttt {Storm} \) since release 1.6.3, available at http://stormchecker.org.

Setup Our implementation uses a single core of an Intel Xeon Platinum 8160 CPU, with memory limited to 20 GB of RAM. The timeout (TO) is two hours. For a model, a set of objectives, and a precision \(\eta \in \mathbb {R}_{> 0}\), we measure the time to compute an \(\eta \)-approximation of the set of achievable points. This set-up coincides with Pareto queries as considered in [28]. The digitization constant \(\delta \) is chosen heuristically as discussed in Sect. 5.2. We set the precision for value iteration to \(\varepsilon = 10^{-6}\). As in [28], we use the conventional, unsound variant of value iteration. The use of improved algorithms providing sound precision guarantees [32] is left for future work.

Results for MAs We consider four case studies: (i) a job scheduler [12], see Sect. 1; (ii) a polling system [48, 50] containing a server processing jobs that arrive at two stations; (iii) a video streaming client buffering received packages and deciding when to start playback; and (iv) a randomized mutual exclusion algorithm [50], a variant of [42] with a process-dependent random delay in the critical section. Details on the benchmarks and the objectives are given in App. E.1.

Table 1 Experimental results for multi-objective MAs

Tab. 1 lists the results. For each instance we give the defining constants, the number of states of the MA, and the precision \(\eta \) of the computed approximation. A multi-objective query is given by the triple \((l, m, n)\), indicating \(l\) untimed, \(m\) expected reward, and \(n\) timed objectives. For each MA and query we report the total run-time of our implementation (time) and the number of vertices of the obtained under-approximation (pts).

Queries analyzed on the underlying MDP are solved efficiently on large models with up to millions of states. For timed objectives the run-times increase drastically due to the costly analysis of digitized reachability objectives on the digitization, cf. [30]. Queries with up to four objectives can be dealt with within the time limit. Furthermore, for an approximation that is one order of magnitude more precise, the number of vertices of the result increases by approximately a factor of three. In addition, a smaller digitization constant then has to be considered, which often leads to timeouts in experiments with timed objectives.

Comparison with PRISM  [39] and IMCA  [30]. We compared the performance of our implementation with both PRISM and IMCA.

For the comparison with PRISM (which does not support MAs), we considered the multi-objective MDP benchmarks from [28, 29]. We conducted our experiments on PRISM with both variants of the value-iteration-based implementation (standard and Gauss-Seidel) and chose the faster variant for each benchmark instance. For all experiments the approximation precision \(\eta = 0.001\) was used.

For the comparison with IMCA (which does not support multi-objective queries) we used the benchmarks from Tab. 1, restricted to a single objective. The experiments on IMCA were conducted with and without value iteration, and we chose the faster variant for each benchmark instance. For timed reachability objectives, the precision \(\eta = 0.001\) was used in all experiments.

Verification times are summarized in Fig. 11: for points above the diagonal, our implementation is faster. Both implementations are based on [28]. We observe that our implementation is competitive with these tools. Further details are given in App. E.2 and App. E.3.

Fig. 11 Verification times (in seconds) of our implementation and other tools

7 Conclusion

We considered multi-objective verification of Markov automata, including in particular timed reachability objectives. The next steps are to apply our algorithms to the manifold applications of MAs, such as generalized stochastic Petri nets, in order to enrich the analysis possibilities for such nets, and to investigate whether recent advances in single-objective timed reachability analysis, in particular the methods of [14, 15], can be lifted to the multi-objective case.