1 Introduction

Markov automata [24, 26] extend labeled transition systems with probabilistic branching and exponentially distributed delays. They are a compositional variant of continuous-time Markov decision processes (CTMDPs), in a similar vein as Segala’s probabilistic automata extend classical MDPs. Transitions of a Markov automaton (MA) lead from states to probability distributions over states, and are either labeled with actions (allowing for interaction) or real numbers (rates of exponential distributions). MAs are used in reliability engineering [8, 51], hardware design [19], data-flow computation [38], dependability [9] and performance evaluation [25], as they provide a natural semantic framework for modeling formalisms such as AADL, dynamic fault trees, stochastic Petri nets, stochastic activity networks, SADF, etc. The verification of MAs has so far focused on single objectives such as reachability, timed reachability, expected costs, and long-run averages [2, 14, 16, 30, 31, 34, 35]. These analyses cannot treat objectives that mutually influence each other. The aim of this paper is to analyze multiple objectives on MAs at once, facilitating trade-off analysis by approximating Pareto curves.

Consider the stochastic job scheduling problem of [12]: perform n jobs with exponential service times on k identical processors allowing pre-emptive scheduling. Once a job finishes, all k processors can be assigned any of the m remaining jobs. When \(n{-}m\) jobs are finished, this yields \(\left( {\begin{array}{c}m\\ k\end{array}}\right) \) non-deterministic choices. The largest-expected-service-time-first policy is optimal for minimizing the expected time to complete all jobs [12]. It is unclear how to schedule when imposing extra constraints, e.g., requiring a high probability to finish a batch of c jobs within a tight deadline (to accelerate their post-processing), or having a low average waiting time. These multiple objectives involve non-trivial trade-offs. Our algorithms analyze such trade-offs. Figure 1, e.g., shows the obtained result for 12 jobs and 3 processors. It approximates the set of points \((p_{1}, p_{2})\) for schedules achieving that (1) the expected time to complete all jobs is at most \(p_{1}\) and (2) the probability to finish half of the jobs within an hour is at least \(p_{2}\).
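As a purely illustrative sanity check of this combinatorial blow-up (our own snippet, not part of the benchmark; the numbers \(n=12\) and \(k=3\) match Fig. 1), the number of non-deterministic choices per decision epoch can be listed as follows, assuming at least \(k\) jobs remain:

```python
from math import comb

n, k = 12, 3  # 12 jobs on 3 identical processors, matching Fig. 1

# With m >= k unfinished jobs, a scheduler picks which k of them to run next,
# which gives C(m, k) nondeterministic choices (the processors are identical).
for m in range(n, k - 1, -1):
    print(f"{m:2d} jobs left -> {comb(m, k):3d} choices")
```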

Fig. 1 Approx. Pareto curve for stochastic job scheduling

This paper extends an earlier paper [44] and presents techniques to verify MAs with multiple objectives. The contribution is threefold. First and foremost, we generalize the analysis of (classical, discrete-time) MDPs with multiple objectives to MAs. Second, we remove some restrictions present in the classical analysis of MDPs with multiple objectives. Third, we provide an efficient implementation. We detail the contributions below.

Regarding the analysis of MAs, we consider multiple (un)timed reachability and expected reward objectives as well as their combinations. In short, we reduce all these problems to instances of multi-objective verification problems on classical MDPs. For multi-objective queries involving (combinations of) untimed reachability and expected reward objectives, corresponding algorithms on the underlying MDP can be used. In this case, the MDP is simply obtained by ignoring the timing information, see Fig. 2b. The crux is in relating MA schedulers—which can exploit state sojourn times to optimize their decisions—to MDP schedulers. For multiple timed reachability objectives, digitization [30, 35] is employed to obtain an MDP, see Fig. 2c. The key is to mimic sojourn times by self-loops with appropriate probabilities. This provides a sound, arbitrarily close approximation of the timed behavior and also allows combining timed reachability objectives with other types of objectives. The main contribution is to show that digitization is sound for all possible MA schedulers. This requires a new proof strategy as the existing ones are tailored to single objectives.

We extend approaches for classical multi-objective MDPs to support the multi-objective analysis of digitization MDPs. The extension embeds single-objective MA techniques from [35] into the multi-objective framework of [28]. Furthermore, we allow the simultaneous analysis of minimizing and maximizing expected reward objectives, lifting a restriction imposed in [28] by applying additional preprocessing steps that eliminate problematic end components.

Experiments on instances of four MA benchmarks show encouraging results. Multiple untimed reachability and expected reward objectives can be efficiently treated for models with millions of states. As for single objectives [30], timed reachability is more expensive. Our implementation is competitive with PRISM for multi-objective MDPs [28, 39] and with IMCA [30], which implements [35], for single-objective MAs.

Compared with the version in [44], we (1) provide all technical ideas regarding the analysis of Markov automata and give formal proofs that were previously omitted. Furthermore, (2) the description of the existing multi-objective MDP model checking approach makes the paper self-contained, and allows us to give (3) the necessary extensions that allow, e.g., mixing upper and lower bounds on the different objectives. While such extensions may be considered minor, they are practically relevant and often intricate.

Related work Multi-objective decision making for MDPs with discounting and long-run objectives has been well investigated; for a recent survey, see [47]. Etessami et al. [27] consider verifying finite MDPs with multiple \(\omega \)-regular objectives. Other multiple objectives include expected rewards under worst-case reachability [13, 29], reward-bounded reachability [33], quantiles and conditional probabilities [5, 33], mean pay-offs and stability [11], long-run objectives [7, 10], total average discounted rewards under PCTL [49], and stochastic shortest path objectives [46]. This has been extended to MDPs with unknown cost function [37], MDPs under a restricted class of policies [23], infinite-state MDPs [20] arising from two-player timed games in a stochastic environment, and stochastic two-player games [18]. To the best of our knowledge, this is the first work on multi-objective MDPs extended with random timing.

Fig. 2 MA \(\mathcal {M}\) with underlying MDP \({\mathcal {M}_\mathcal {D}}\) and digitization \({\mathcal {M}_{\delta }}\)

2 Preliminaries

Notations The set of real numbers is denoted by \(\mathbb {R}\), and we write \(\mathbb {R}_{> 0}= \{x \in \mathbb {R}\mid x > 0\}\) and \(\mathbb {R}_{\ge 0}= \mathbb {R}_{> 0}\cup \{0\}\). For a finite set \(S\), \(\textit{Dist}(S)\) denotes the set of probability distributions over \(S\). \(\mu \in \textit{Dist}(S)\) is Dirac if \(\mu (s) = 1\) for some \(s\in S\).

2.1 Models

Markov automata generalize both Markov decision processes (MDPs) and continuous-time Markov chains (CTMCs). They are extended with rewards (or, equivalently, costs) to allow modeling, e.g., energy consumption.

Definition 1

(Markov automaton) A Markov automaton (MA) is a tuple \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) where \(S\) is a finite set of states with initial state \({s_{0}}\in S\), \( Act \) is a finite set of actions with \(\bot \in Act \) and \( Act \cap \mathbb {R}_{\ge 0}= \emptyset \),

  • \({\rightarrow } \subseteq S\times ( Act \cup \mathbb {R}_{> 0}) \times \textit{Dist}(S)\) is a set of transitions such that for all \(s\in S\) there are no two transitions \((s, \lambda , \mu ), (s, \lambda ', \mu ') \in {\rightarrow }\) with \(\lambda , \lambda ' \in \mathbb {R}_{> 0}\), and

  • \(\rho _{1}, \dots , \rho _{\ell }\) with \(\ell \ge 0\) are reward functions \(\rho _{i} :S\cup (S\times Act ) \rightarrow \mathbb {R}_{\ge 0}\).

In the remainder of the article, let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) denote an MA. Below, we introduce some notation and some standard assumptions on MAs. A transition \((s, \gamma , \mu ) \in {\rightarrow }\), denoted by \(s\xrightarrow {\gamma } \mu \), is called probabilistic if \(\gamma \in Act \) and Markovian if \(\gamma \in \mathbb {R}_{> 0}\). In the latter case, \(\gamma \) is the rate of an exponential distribution, modeling a time-delayed transition. Probabilistic transitions fire instantaneously. The successor state is determined by \(\mu \), i.e., we move to \(s'\) with probability \(\mu (s')\). A reward function \(\rho _{i}\) defines state rewards and action rewards. When sojourning in a state \(s\) for \(t\) time units, the state reward \(\rho _{i}(s) \cdot t\) is obtained. Upon taking a transition \(s\xrightarrow {\gamma } \mu \), we collect action reward \(\rho _{i}(s, \gamma )\) (if \(\gamma \in Act \)) or \(\rho _{i}(s, \bot )\) (if \(\gamma \in \mathbb {R}_{> 0}\)).

Example 1

Figure 2a shows an MA \(\mathcal {M}\) with seven states and no rewards. We draw direct edges for Dirac distributions, and Markovian transitions are illustrated by dashed arrows. For example: State \(s_0\) has an outgoing Markovian transition with rate 1 to a distribution \(\mu \), where \(\mu (s_3) = 1\). From state \(s_4\) there are two probabilistic transitions labelled with two different actions \(\gamma \) and \(\eta \). For \(s_4 \xrightarrow {\eta } \mu \), \(\mu \) is given by \(\mu (s_5) = 0.7\) and \(\mu (s_2) = 0.3\).

Probabilistic (Markovian) states PS (MS) have an outgoing probabilistic (Markovian) transition, respectively:

$$\begin{aligned}&\text {PS}=\{ s\in S\mid s\xrightarrow {\alpha } \mu , \alpha \in Act \}, \\&\text {MS}=\{ s\in S\mid s\xrightarrow {\lambda } \mu , \lambda \in \mathbb {R}_{> 0}\}. \end{aligned}$$

We exclude terminal states \(s\notin \text {PS}\cup \text {MS}\) by adding a (Markovian) self-loop to all these states. As standard for MAs [24, 26], we impose the maximal progress assumption, i.e., probabilistic transitions take precedence over Markovian ones as the probability to fire a Markovian transition instantly is zero. Thus, we assume that states are either probabilistic or Markovian: \(S= \text {PS}\uplus \text {MS}\). For Markovian states, we define the exit rate \({\text {E}(s)}\) of \(s\in \text {MS}\) by \(s\xrightarrow {{\text {E}(s)}} \mu \).

Example 2

Reconsider \(\mathcal {M}\) in Fig. 2a. The states \(s_0\), \(s_1\), \(s_2\), \(s_5\) and \(s_6\) are Markovian. All other states are probabilistic. The exit rate of \(s_0\) is 1, the exit rate of \(s_5\) is 5.

We assume action-deterministic MAs: \(|\{ \mu \in \textit{Dist}(S) \mid s\xrightarrow {\alpha } \mu \}| \le 1\) holds for all \(s\in S\) and \(\alpha \in Act \). For probabilistic states, this can be achieved by renaming. For Markovian states, this follows from the definition of Markov automata. The transition probabilities of \(\mathcal {M}\) are given by the function \(\mathbf{P} :S\times Act \times S\rightarrow [0,1]\) satisfying

$$\begin{aligned} \mathbf{P} (s, \alpha , s') = {\left\{ \begin{array}{ll} \mu (s') &{} \text {if either } s\xrightarrow {\alpha } \mu \text { or } \big (\alpha = \bot \text { and } s\xrightarrow {{\text {E}(s)}} \mu \big ) \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

The value \(\mathbf{P} (s, \alpha , s')\) corresponds to the probability to move from \(s\) with action \(\alpha \) to \(s'\). The enabled actions at state \(s\) are given by \( Act (s) = \{ \alpha \in Act \mid \exists s' \in S:\mathbf{P} (s, \alpha , s') > 0 \}\).
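For illustration only, the following Python sketch encodes a small, entirely hypothetical MA (only loosely inspired by Fig. 2a) as plain dictionaries and derives the transition probabilities \(\mathbf{P} \) and the enabled actions \( Act (s)\) as defined above; the identifiers are ad hoc and not part of any tool.

```python
BOT = "bot"  # stands for the distinguished action ⊥

# A hypothetical three-state MA: s0 is Markovian with exit rate 1.0, s1 is
# probabilistic with actions "alpha" and "beta", s2 is Markovian and absorbing.
markovian = {      # s -> (exit rate E(s), successor distribution mu)
    "s0": (1.0, {"s1": 1.0}),
    "s2": (1.0, {"s2": 1.0}),
}
probabilistic = {  # s -> {action -> successor distribution mu}
    "s1": {"alpha": {"s2": 1.0}, "beta": {"s0": 0.5, "s2": 0.5}},
}

def transition_prob(s, a, s_next):
    """P(s, a, s'): probability to move from s to s' when taking action a."""
    if s in probabilistic and a in probabilistic[s]:
        return probabilistic[s][a].get(s_next, 0.0)
    if s in markovian and a == BOT:  # Markovian states only offer ⊥ (maximal progress)
        _, mu = markovian[s]
        return mu.get(s_next, 0.0)
    return 0.0

def enabled_actions(s):
    """Act(s): all actions a with P(s, a, s') > 0 for some s'."""
    return set(probabilistic[s]) if s in probabilistic else {BOT}

print(enabled_actions("s1"))                # e.g. {'alpha', 'beta'}
print(transition_prob("s1", "beta", "s2"))  # 0.5
print(transition_prob("s0", BOT, "s1"))     # 1.0
```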

Example 3

Reconsider \(\mathcal {M}\) in Fig. 2a. The enabled actions in \(s_4\) are \(\gamma \) and \(\eta \). The probability \(\mathbf{P} (s_4, \eta , s_2) = 0.3\), while \(\mathbf{P} (s_5, \bot , s_4) = 0.4\).

Remark 1

In Sect. 4.3 we make one additional (but standard) assumption regarding the presence of Zeno-loops. We defer the discussion to that section.

Markov automata extend Markov decision processes by adding random timing. For concise notation, we define MDPs separately below.

Definition 2

(Markov decision process [43]) A Markov decision process (MDP) is a tuple \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\), with \(S, {s_{0}}, Act , \ell \) as in Def. 1, \(\rho _{1}, \dots , \rho _{\ell }\) are action reward functions \(\rho _{i} :S\times Act \rightarrow \mathbb {R}_{\ge 0}\), and \(\mathbf{P} :S\times Act \times S\rightarrow [0,1]\) are the transition probabilities satisfying \( \sum _{s' \in S}\mathbf{P} (s, \alpha , s') \in \{0,1\}\) for all \(s\in S\) and \(\alpha \in Act \).

MDPs are MAs without Markovian states, i.e., \(\text {MS}= \emptyset \). Thus, MDPs exhibit probabilistic branching and non-determinism, but no random timing.

The reward \(\rho _{}(s, \alpha )\) is collected when taking action \(\alpha \) at state \(s\). We do not consider state rewards for MDPs. The underlying MDP of an MA abstracts away from its timing:

Definition 3

(Underlying MDP) For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) with transition probabilities \(\mathbf{P} \) the underlying MDP of \(\mathcal {M}\) is the MDP \({\mathcal {M}_\mathcal {D}}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}))\), where for each \(i \in \{1, \dots , \ell \}\):

$$\begin{aligned} \rho _{i}^{\mathcal {D}}(s, \alpha ) = {\left\{ \begin{array}{ll} \rho _{i}(s, \alpha ) &{} \text {if } s\in \text {PS}, \\ \rho _{i}(s, \bot ) + \frac{1}{{\text {E}(s)}} \cdot \rho _{i}(s) &{} \text {if } s\in \text {MS}\text { and } \alpha = \bot , \\ 0 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

The reward functions \(\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}\) incorporate the action and state rewards of \(\mathcal {M}\) where the state rewards are multiplied with the expected sojourn times \(\frac{1}{{\text {E}(s)}}\) of states \(s\in \text {MS}\).
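Operationally, the reward transformation of Definition 3 folds the state reward of each Markovian state into the action reward of \(\bot \), weighted by the expected sojourn time. A minimal sketch of our own, continuing the hypothetical dictionary layout used in the earlier illustration:

```python
BOT = "bot"  # the distinguished action ⊥

def underlying_mdp_rewards(state_reward, action_reward, markovian, probabilistic):
    """rho^D(s, a) as in Def. 3 for a single reward function rho.

    state_reward:  s -> rho(s)            action_reward: (s, a) -> rho(s, a)
    markovian:     s -> (E(s), mu)        probabilistic: s -> {a: mu}
    """
    rho_d = {}
    for s, actions in probabilistic.items():      # s in PS: keep the action reward
        for a in actions:
            rho_d[(s, a)] = action_reward.get((s, a), 0.0)
    for s, (exit_rate, _) in markovian.items():   # s in MS: fold in the state reward
        rho_d[(s, BOT)] = (action_reward.get((s, BOT), 0.0)
                           + state_reward.get(s, 0.0) / exit_rate)
    return rho_d

# Hypothetical example: state reward 1 on Markovian state s0 with E(s0) = 2
# contributes an expected 1 * 1/2 per visit of s0 in the underlying MDP.
print(underlying_mdp_rewards({"s0": 1.0}, {},
                             {"s0": (2.0, {"s1": 1.0})},
                             {"s1": {"alpha": {"s0": 1.0}}}))
# {('s1', 'alpha'): 0.0, ('s0', 'bot'): 0.5}
```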

Example 4

Figure 2 shows an MA \(\mathcal {M}\) with its underlying MDP \({\mathcal {M}_\mathcal {D}}\).

Paths and schedulers Paths represent runs of \(\mathcal {M}\) starting in the initial state. Let \(t(\kappa _{})= 0\) and \(\alpha (\kappa _{})= \kappa _{}\), if \(\kappa _{}\in Act \), and \(t(\kappa _{})= \kappa _{}\) and \(\alpha (\kappa _{})= \bot \), if \(\kappa _{}\in \mathbb {R}_{\ge 0}\).

Definition 4

(Infinite path) An infinite path of MA \(\mathcal {M}\) with transition probabilities \(\mathbf{P} \) is an infinite sequence \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) of states \(s_0, s_1, \dots \in S\) and stamps \(\kappa _{0}, \kappa _{1}, \dots \in Act \cup \mathbb {R}_{\ge 0}\) such that (1) \(\sum _{i=0}^{\infty } t(\kappa _{i}) = \infty \), and for any \(i \ge 0\) it holds that (2) \(\mathbf{P} (s_i, \alpha (\kappa _{i}), s_{i+1}) > 0\), (3) \(s_i \in \text {PS}\) implies \(\kappa _{i} \in Act \), and (4) \(s_i \in \text {MS}\) implies \(\kappa _{i} \in \mathbb {R}_{\ge 0}\).

An infix \(s_i \xrightarrow {\kappa _{i}} s_{i+1}\) of a path \(\pi \) represents that we stay at \(s_i\) for \(t(\kappa _{i})\) time units and then perform action \(\alpha (\kappa _{i})\) and move to state \(s_{i+1}\). Condition (1) excludes Zeno paths, condition (2) ensures positive transition probabilities, and conditions (3) and (4) assert that stamps \(\kappa _{i}\) match the transition type at \(s_i\).

A finite path is a finite prefix \(\pi ' = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\) of an infinite path. The length of \(\pi '\) is \(|\pi '| = n\), its last state is \( last (\pi ') = s_n\), and the time duration is \( T (\pi ') = \sum _{0 \le i < |\pi '|} t(\kappa _{i})\). We denote the sets of finite and infinite paths of \(\mathcal {M}\) by \({ FPaths ^{\mathcal {M}}}\) and \({ IPaths ^{\mathcal {M}}}\), respectively. The superscript \(\mathcal {M}\) is omitted if the model is clear from the context. For a finite or infinite path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) the prefix of \(\pi \) of length n is denoted by \( pref (\pi , n)\). The ith state visited by \(\pi \) is given by \({\pi [i]} = s_i\). The time-abstraction \(\text {ta}(\pi )\) of \(\pi \) removes all sojourn times and is a path of the underlying MDP \({\mathcal {M}_\mathcal {D}}\): \( \text {ta}(\pi ) = s_0 \xrightarrow {\alpha (\kappa _{0})} s_1 \xrightarrow {\alpha (\kappa _{1})} \dots \) . Paths of \({\mathcal {M}_\mathcal {D}}\) are also referred to as the time-abstract paths of \(\mathcal {M}\).
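The path operations introduced above are easy to mirror in code. In the following sketch (our own ad hoc encoding, not part of the formal development), a finite path is a plain list alternating states and stamps; the time-abstraction simply drops the sojourn times.

```python
BOT = "bot"

def is_action(kappa):
    """A stamp is either an action label (a string here) or a sojourn time (a number)."""
    return isinstance(kappa, str)

def act(kappa):   # alpha(kappa) as defined above
    return kappa if is_action(kappa) else BOT

def t(kappa):     # t(kappa) as defined above
    return 0.0 if is_action(kappa) else float(kappa)

def duration(path):
    """T(pi'): total time elapsed along a finite path [s0, k0, s1, k1, ..., sn]."""
    return sum(t(kappa) for kappa in path[1::2])

def time_abstraction(path):
    """ta(pi): keep the states, replace every stamp by its action."""
    return [x if i % 2 == 0 else act(x) for i, x in enumerate(path)]

# Hypothetical finite path: sojourn 0.8 time units in s0, then take action "alpha" in s3.
pi = ["s0", 0.8, "s3", "alpha", "s6"]
print(duration(pi))          # 0.8
print(time_abstraction(pi))  # ['s0', 'bot', 's3', 'alpha', 's6']
```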

Definition 5

(Generic scheduler) A generic scheduler for \(\mathcal {M}\) is a measurable function \( \sigma :{ FPaths ^{}}\times Act \rightarrow [0,1] \) such that \(\sigma (\pi ,\cdot ) \in \textit{Dist}( Act ( last (\pi )))\) for each \(\pi \in { FPaths ^{}}\).

A scheduler \(\sigma \) for \(\mathcal {M}\) resolves the non-determinism of \(\mathcal {M}\): \(\sigma (\pi ,\alpha )\) is the probability to take transition \( last (\pi ) \xrightarrow {\alpha } \mu \) after observing the run \(\pi \). The set of such schedulers is denoted by \(\text {GM}^{\mathcal {M}}\) (\(\text {GM}^{}\) if \(\mathcal {M}\) is clear from the context). \(\sigma \in \text {GM}^{}\) is deterministic if the distribution \(\sigma (\pi ,\cdot )\) is Dirac for any \(\pi \). Time-abstract schedulers behave independently of the time-stamps of the given path, i.e., \(\sigma (\pi ,\alpha ) = \sigma (\pi ',\alpha )\) for all actions \(\alpha \) and paths \(\pi , \pi '\) with \(\text {ta}(\pi ) = \text {ta}(\pi ')\). We write \(\text {TA}^{\mathcal {M}}\) to denote the set of time-abstract schedulers of \(\mathcal {M}\). GM is the most general scheduler class for MAs, and \(\text {TA}^{}\) is the most general for MDPs.
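Viewed operationally, a scheduler is just a (measurable) function from finite paths and actions to probabilities. The following toy schedulers (our own illustration, using the list-based path encoding from the previous sketch; \(\texttt {bot}\) stands for \(\bot \)) contrast a deterministic time-abstract scheduler with a timed one that inspects the elapsed time, foreshadowing the proof of Theorem 1.

```python
import math

BOT = "bot"  # the only enabled action at Markovian states

def sigma_time_abstract(path, a):
    """Deterministic and time-abstract: always pick "alpha" at probabilistic states."""
    return 1.0 if a in ("alpha", BOT) else 0.0   # BOT and "alpha" are never both enabled

def sigma_timed(path, a):
    """Timed: pick "alpha" iff the time elapsed along the path is at most ln(2)."""
    elapsed = sum(k for k in path[1::2] if not isinstance(k, str))
    choice = "alpha" if elapsed <= math.log(2) else "beta"
    return 1.0 if a in (choice, BOT) else 0.0

# Hypothetical histories: sojourning 0.5 vs. 1.0 time units before the choice.
print(sigma_timed(["s0", 0.5, "s1"], "alpha"),
      sigma_timed(["s0", 1.0, "s1"], "alpha"))   # 1.0 0.0
```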

2.2 Objectives

An objective \({\mathbb {O}_{i}}\) is a representation of a quantitative property like the probability to reach an error state, or the expected energy consumption. To express Boolean properties (e.g., the probability to reach an error state is below \(p_{i}\)), \({\mathbb {O}_{i}}\) is combined with a threshold \(\vartriangleright _{i} p_{i}\) where \(\vartriangleright _{i}\, \in \{<, \le , >, \ge \}\) is a threshold relation and \(p_{i} \in \mathbb {R}\) is a threshold value. Let \(\mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}\) denote that the MA \(\mathcal {M}\) under scheduler \(\sigma \in \text {GM}^{}\) satisfies the property \({\mathbb {O}_{i}} \vartriangleright _{i} p_{i}\). We consider reachability objectives and expected reward objectives. In the remainder of this section, we formalize these standard notions.

2.2.1 Probability measure

Given a scheduler \(\sigma \in \text {GM}^{}\), a probability measure \(\text {Pr}^{\mathcal {M}}_{\sigma }\) on measurable sets of infinite paths is defined, which generalizes both the standard probability measure on MDPs and on CTMCs. We briefly sketch the definition of \(\text {Pr}^{\mathcal {M}}_{\sigma }\). More information can be found in, e.g., [35, 40].

We first consider the probability measure \(\text {Pr}^{ Steps }_{\sigma , \pi }\) for transition steps. We let \(\mathfrak {B}(\mathbb {R}_{\ge 0})\) denote the Borel \(\varvec{\sigma }\)-algebra on \(\mathbb {R}_{\ge 0}\) and define the \(\varvec{\sigma }\)-algebra for transition steps \(\mathfrak {F}^ Steps = \varvec{\sigma }(\mathfrak {B}(\mathbb {R}_{\ge 0}) \times 2^ Act \times 2^S)\) using the Cartesian product of \(\varvec{\sigma }\)-algebras [1]. For a finite path \(\pi \in { FPaths ^{}}\) with \(s= last (\pi )\) and a measurable set of transition steps \(T \in \mathfrak {F}^ Steps \), the probability of \(T \subseteq \mathbb {R}_{\ge 0}\times Act \times S\) is

$$\begin{aligned} \text {Pr}^{ Steps }_{\sigma , \pi }(T) = {\left\{ \begin{array}{ll} \displaystyle \sum _{\alpha \in Act } \sum _{s' \in S} [(0, \alpha , s') \in T] \cdot \sigma (\pi ,\alpha ) \cdot \mathbf{P} (s, \alpha , s') &{} \text {if } s\in \text {PS}, \\ \displaystyle \sum _{s' \in S} \int _{0}^{\infty } [(t, \bot , s') \in T] \cdot {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \cdot \mathbf{P} (s, \bot , s') \,\text {d}t&{} \text {if } s\in \text {MS}, \end{array}\right. } \end{aligned}$$

where \([\cdot ]\) denotes the Iverson bracket.

Next, we define the probability measure \(\text {Pr}^{n}_{\sigma }\) for \(n \in \mathbb {N}\) transition steps over the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^{n}\), where \(\mathfrak {F}^{0} = 2^S\) and \(\mathfrak {F}^{n} = \varvec{\sigma }(\mathfrak {F}^{n-1} \times \mathfrak {F}^ Steps )\) for \(n > 0\). For \(S' \subseteq S\) we set \(\text {Pr}^{0}_{\sigma }(S') = 1\) if \({s_{0}}\in S'\) and \(\text {Pr}^{0}_{\sigma }(S') = 0\) otherwise. Moreover, given \(n > 0\), \(\varPi \in \mathfrak {F}^{n-1}\) and \(T \in \mathfrak {F}^ Steps \) we set \(\varPi \circ T= \{ (\pi ,\,(t, \alpha , s')) \mid \pi \in \varPi , (t, \alpha , s') \in T \}\) and

$$\begin{aligned} \text {Pr}^{n}_{\sigma }(\varPi \circ T) = \int _{\pi \in \varPi \cap { FPaths ^{}}} \text {Pr}^{ Steps }_{\sigma , \pi }(T) \,\text {d}\text {Pr}^{n-1}_{\sigma }(\{\pi \})\ . \end{aligned}$$

Following standard constructions (e.g., [1, 40]), we obtain \(\text {Pr}^{\mathcal {M}}_{\sigma }\) by lifting \(\text {Pr}^{n}_{\sigma }\) first to measurable sets of finite paths given by the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^* = \bigcup _{n=0}^\infty \mathfrak {F}^n\) and finally to measurable sets of infinite paths given by the \(\varvec{\sigma }\)-algebra \(\mathfrak {F}^\omega \). The latter is the smallest \(\varvec{\sigma }\)-algebra containing the cylinder set \( Cyl (\varPi )\) of all measurable \(\varPi \in \mathfrak {F}^*\), where

$$\begin{aligned} Cyl (\varPi ) = \{ \pi \xrightarrow {\kappa _{n}} s_{n+1} \xrightarrow {\kappa _{n+1}} \dots \in { IPaths ^{\mathcal {M}}} \mid \pi \in \varPi \}. \end{aligned}$$

Let \(\varPi \in \mathfrak {F}^*\) and \(\pi \in IPaths\). For simplicity, we may write \(\text {Pr}^{\mathcal {M}}_{\sigma }(\varPi )\) instead of \(\text {Pr}^{\mathcal {M}}_{\sigma }( Cyl (\varPi ))\) and \(\text {Pr}^{\mathcal {M}}_{\sigma }(\pi )\) instead of \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \})\). Note that \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \}) = 0 \) if \(\pi \) contains one or more Markovian transitions.

For some \(\varLambda \in \mathfrak {F}^\omega \) with \(\text {Pr}^{\mathcal {M}}_{\sigma }(\varLambda ) > 0\) we consider the conditional probability measure \(\text {Pr}^{\mathcal {M}}_{\sigma }( \cdot \mid \varLambda ) :\mathfrak {F}^\omega \rightarrow [0,1]\), where for \(\varPi \in \mathfrak {F}^\omega \):

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }( \varPi \mid \varLambda ) = \frac{\text {Pr}^{\mathcal {M}}_{\sigma }( \varPi \cap \varLambda )}{\text {Pr}^{\mathcal {M}}_{\sigma }(\varLambda )}. \end{aligned}$$

2.2.2 Reachability objectives

\(I\subseteq \mathbb {R}\) is a time interval if it is of the form \(I= [a, b]\) or \(I= [a,\infty )\), where \(0 \le a< b\). The set of paths reaching a set of goal states \(G\subseteq S\) in time \(I\) is defined as

$$\begin{aligned} \lozenge ^{I} G=&\{ \pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{}} \mid \exists n \ge 0 :{\pi [n]} \in G\text { and}\\&I\cap [t, t+t(\kappa _{n})] \ne \emptyset \text { for } t= T ( pref (\pi , n)) \}. \end{aligned}$$

The set \(\lozenge ^{I} G\) is measurable as it can be expressed via countable unions of measurable cylinder sets. We write \(\lozenge ^{}G\) instead of \(\lozenge ^{[0,\infty )} G\).

We formulate reachability objectives as follows:

Definition 6

(Reachability objective) A reachability objective has the form \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\) for time interval \(I\) and goal states \(G\). The objective is timed if \(I\ne [0,\infty )\) and untimed otherwise. For MA \(\mathcal {M}\) and scheduler \(\sigma \in \text {GM}^{}\), let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})} \vartriangleright _{i} p_{i}\quad \text {iff}\quad \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \vartriangleright _{i}p_{i}. \end{aligned}$$

2.2.3 Expected reward objectives

Expected rewards define the expected amount of reward collected (w.r.t. some \(\rho _{j}\)) until a goal state in \(G\subseteq S\) is reached, generalizing the notion on CTMCs and MDPs. More precisely, we fix a reward function \(\rho _{}\) of the MA \(\mathcal {M}\). The reward of a finite path \(\pi ' = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\in { FPaths ^{}}\) is given by

$$\begin{aligned} rew _{\rho _{}}(\pi ') = \sum _{i=0}^{n-1} \Big ( \rho _{}(s_i) \cdot t(\kappa _{i}) + \rho _{}\big (s_i, \alpha (\kappa _{i})\big ) \Big ). \end{aligned}$$

Intuitively, \( rew _{\rho _{}}(\pi ')\) is the sum over the rewards obtained in every step \(s_{i} \xrightarrow {\kappa _{i}} s_{i+1}\) in the path \(\pi '\). The reward obtained in step \(i\) is composed of the state reward of \(s_i\) multiplied with the sojourn time \(t(\kappa _{i})\) plus the action reward given by \(s_i\) and \(\alpha (\kappa _{i})\). State rewards assigned to probabilistic states do not affect the reward of a path as the sojourn time in such states is zero.
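For illustration, the following sketch (ours, reusing the ad hoc list-based path encoding from the earlier sketches; all identifiers are hypothetical) evaluates \( rew _{\rho _{}}(\pi ')\) for a finite path exactly as described above.

```python
BOT = "bot"

def path_reward(path, state_reward, action_reward):
    """rew_rho(pi') for a finite path [s0, k0, s1, ..., k_{n-1}, sn]."""
    total = 0.0
    for i in range(0, len(path) - 1, 2):
        s, kappa = path[i], path[i + 1]
        sojourn = 0.0 if isinstance(kappa, str) else float(kappa)   # t(kappa)
        a = kappa if isinstance(kappa, str) else BOT                # alpha(kappa)
        total += state_reward.get(s, 0.0) * sojourn + action_reward.get((s, a), 0.0)
    return total

# Hypothetical: state reward 1 on s0, no action rewards; sojourn 0.8 in s0, then action "alpha".
print(path_reward(["s0", 0.8, "s3", "alpha", "s6"], {"s0": 1.0}, {}))  # 0.8
```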

For an infinite path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{}}\), the reward of \(\pi \) up to a set of goal states \(G\subseteq S\) is given by

$$\begin{aligned} rew _{\rho _{}}(\pi , G_{}) = {\left\{ \begin{array}{ll} rew _{\rho _{}}( pref (\pi , n)) &{} \text {if } n = \min \{i \ge 0 \mid {\pi [i]} \in G\} \text { exists,} \\ \displaystyle \lim _{n\rightarrow \infty } rew _{\rho _{}}( pref (\pi , n)) &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Intuitively, we stop collecting reward as soon as \(\pi \) reaches a state in \(G\). If no state in \(G\) is reached, reward is accumulated along the infinite path, which potentially yields an infinite reward. The function \(\pi \mapsto rew _{\rho _{}}(\pi , G_{})\) is measurable. Its expected value with respect to scheduler \(\sigma \in \text {GM}^{}\) is called the expected reward \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\), i.e.,

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}) = \int _{\pi \in { IPaths ^{}}} rew _{\rho _{}}(\pi , G_{}) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ). \end{aligned}$$

With this definition, we formulate expected reward objectives as follows:

Definition 7

(Expected reward objective) An expected reward objective has the form \({\mathbb {E}({\# j_{}, G_{}})}\) where \(j\) is the index of reward function \(\rho _{j}\) and \(G\subseteq S\). For MA \(\mathcal {M}\) and scheduler \(\sigma \in \text {GM}^{}\), let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {E}({\# j_{}, G_{}})} \vartriangleright _{i} p_{i}\quad \text {iff}\quad \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) \vartriangleright _{i} p_{i}. \end{aligned}$$

Expected time objectives \({\mathbb {E}({ T , G_{}})}\) are expected reward objectives that consider the reward function \(\rho _{ T }\) with \(\rho _{ T }(s) = 1\) if \(s\in \text {MS}\) and all other rewards are zero.
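The reward function \(\rho _{ T }\) is trivial to construct explicitly; a one-line sketch over the dictionary representation used in the earlier illustrations (hypothetical identifiers):

```python
def expected_time_reward(markovian_states, all_states):
    """rho_T: state reward 1 on Markovian states, 0 elsewhere (and no action rewards)."""
    return {s: (1.0 if s in markovian_states else 0.0) for s in all_states}

print(expected_time_reward({"s0", "s2"}, ["s0", "s1", "s2"]))
# {'s0': 1.0, 's1': 0.0, 's2': 1.0}
```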

3 Multi-objective model checking

Standard model checking considers objectives individually. This approach is not feasible when we are interested in multiple objectives that should be fulfilled by the same scheduler, e.g., a scheduler that maximizes the expected profit might violate certain safety constraints. Multi-objective model checking aims to analyze multiple objectives at once and reveals potential trade-offs.

Definition 8

(Satisfaction of multiple objectives) Let \(\mathcal {M}\) be an MA and \(\sigma \in \text {GM}^{}\). For objectives \({\mathbb {O}_{}} = ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\in \{<, \le , >, \ge \}^d\) and threshold values \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) let

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\iff \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \text { for all } 1 \le i \le d. \end{aligned}$$

Furthermore, let \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff \exists \sigma \in \text {GM}^{}\) such that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\).

If \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\), the point \(\mathbf{p }\in \mathbb {R}^d\) is achievable in \(\mathcal {M}\) with scheduler \(\sigma \). The set of achievable points of \(\mathcal {M}\) w.r.t. \({\mathbb {O}_{}}\) and \(\vartriangleright _{}\) is \( \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}. \) We straightforwardly also apply these definitions to MDPs. This is compatible with the notions on multi-objective MDPs as given in [27, 28].

Example 5

Figure 3b depicts the set of achievable points of the MA \(\mathcal {M}\) from Fig. 3a w.r.t. relations \({\vartriangleright _{}}=(\ge ,\ge )\) and objectives \(({\mathbb {P}({\lozenge ^{}\{s_2\}})},{\mathbb {P}({\lozenge ^{}\{s_4\}})})\). Likewise, Fig. 3c depicts the achievable points for \(\mathcal {M}\), \({\vartriangleright _{}}\), and \(({\mathbb {P}({\lozenge ^{}\{s_2\}})},{\mathbb {P}({\lozenge ^{[0,2]}{\{s_4\}}})})\). Using the set of achievable points, we can answer Pareto, numerical, and achievability queries as considered in [28], e.g., the Pareto front lies on the border of the set. We detail this computation in Sect. 5.

Schedulers For single-objective model checking on MAs, it suffices to consider deterministic schedulers [41]. For untimed reachability and expected rewards even time-abstract deterministic schedulers suffice [41]. Multi-objective model checking on MDPs requires history-dependent, randomized schedulers [27]. The following theorem and its proof show that schedulers on MA may also employ timing information to make optimal choices, even if only untimed objectives are considered. However, we show in Sect. 4 that for untimed objectives, such schedulers can always be converted to time-abstract schedulers (potentially considering the history and randomization). Put differently, timing information can be employed to achieve untimed objectives but it is not necessary to do so.

Fig. 3 Markov automaton and achievable points

Theorem 1

For some MA \(\mathcal {M}\) with \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), no deterministic time-abstract scheduler \(\sigma \) satisfies \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\).

Proof

Consider the MA \(\mathcal {M}\) in Fig. 3a with relations \({\vartriangleright _{}} = (\ge , \ge )\), objectives \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}\{s_2\}})}, {\mathbb {P}({\lozenge ^{}\{s_4\}})})\), and point \(\mathbf{p }=(0.5,0.5)\). We have \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\): A simple graph argument yields that both properties are only satisfied if action \(\alpha \) is taken with probability exactly one half. As the probability mass to stay in \(s_0\) for at most \(\ln (2)\) time units is exactly 0.5, a timed scheduler \(\sigma \) with \(\sigma (s_0 \xrightarrow {t} s_1,\alpha )=1\) if \(t\le \ln (2)\) and 0 otherwise does satisfy both objectives.
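To see why \(\ln (2)\) is the right threshold, note the following small calculation (it presupposes that the exit rate of \(s_0\) in Fig. 3a is \(1\), which is exactly what the stated probability 0.5 requires): the probability that the sojourn time in \(s_0\) is at most \(\ln (2)\), and hence that \(\sigma \) picks \(\alpha \), is

$$\begin{aligned} 1 - e^{-{\text {E}(s_0)} \cdot \ln (2)} = 1 - 2^{-{\text {E}(s_0)}} = 0.5 \quad \text {for } {\text {E}(s_0)} = 1. \end{aligned}$$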

There are only two deterministic time-abstract schedulers for \(\mathcal {M}\):

$$\begin{aligned} \sigma _\alpha :\text { always choose } \alpha \qquad \text {and} \qquad \sigma _\beta :\text { always choose } \beta \end{aligned}$$

and it holds that \(\mathcal {M}, \sigma _\alpha \not \models {\mathbb {P}({\lozenge ^{}\{s_4\}})} \ge 0.5\) and \(\mathcal {M},\sigma _\beta \not \models {\mathbb {P}({\lozenge ^{}\{s_2\}})} \ge 0.5\). \(\square \)

The geometric shape of the achievable points As for MDPs [27], the set of achievable points for any combination of the aforementioned objectives is convex.

Proposition 1

The set \(\{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) is convex.

Proof

Let \(\mathcal {M}\) be an MA and let \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) be objectives with relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\) and points \(\mathbf{p }_1, \mathbf{p }_2 \in \mathbb {R}^d\) such that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_1)\) and \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_2)\) hold. To show that the set of achievable points is convex, we need to show that for \(w \in [0,1]\) the point \(\mathbf{p }= w \cdot \mathbf{p }_1 + (1-w) \cdot \mathbf{p }_2\) is achievable as well, i.e., \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\). For \(i \in \{1,2\}\), let \(\sigma _i \in \text {GM}^{}\) be a scheduler satisfying \(\mathcal {M}, \sigma _i \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_i\). The point \(\mathbf{p }\) is achievable with the scheduler \(\sigma \) that intuitively makes an initial one-off random choice:

  • with probability w mimic \(\sigma _1\) and

  • with probability \(1-w\) mimic \(\sigma _2\).

Specifying such a scheduler as a function \(\sigma ^w :{ FPaths ^{}}\times Act \rightarrow [0,1]\) as in Def. 5 is technically involved because \(\sigma ^w\) cannot memorize the outcome of the initial one-off random choice. For a path \(\pi = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\in { FPaths ^{}}\) and \(\alpha \in Act \), the value \(\sigma ^w(\pi , \alpha )\) needs to depend on the previously chosen actions \(\alpha (\kappa _{j})\) in \(\pi \) and the probability that these choices adhere to \(\sigma _1\) and \(\sigma _2\), respectively. Let \(w_1 = w\) and \(w_2 = 1-w\). We set

$$\begin{aligned} \sigma ^w(\pi , \alpha ) = \frac{\sum _{i=1}^2 \left( w_i \cdot \sigma _i(\pi , \alpha ) \cdot \prod _{j=0}^{n-1} \sigma _i( pref (\pi , j), \alpha (\kappa _{j})) \right) }{\sum _{i=1}^2 \left( w_i \cdot \prod _{j=0}^{n-1} \sigma _i( pref (\pi , j), \alpha (\kappa _{j})) \right) } \end{aligned}$$

In App. A we show that for any measurable \(\varPi \subseteq { IPaths ^{}}\) we have \(\text {Pr}^{\mathcal {M}}_{\sigma ^w}(\varPi ) = w_1 \cdot \text {Pr}^{\mathcal {M}}_{\sigma _1}(\varPi ) + w_2 \cdot \text {Pr}^{\mathcal {M}}_{\sigma _2}(\varPi )\). Lifting this observation to expected rewards is straightforward. It follows that \(\sigma ^w\) achieves the point \(\mathbf{p }= w_1 \cdot \mathbf{p }_1 + w_2 \cdot \mathbf{p }_2\). \(\square \)
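The construction of \(\sigma ^w\) is mechanical once schedulers are viewed as functions. The following sketch (ours, with the ad hoc list-based path encoding from Sect. 2.1; it is not part of the formal development) implements the formula above and illustrates that, once the observed history reveals which scheduler has been mimicked, \(\sigma ^w\) keeps mimicking it.

```python
BOT = "bot"

def prefix(path, j):
    """pref(pi, j): the first j steps of a flat path [s0, k0, s1, ..., sn]."""
    return path[:2 * j + 1]

def act(kappa):
    """alpha(kappa): the stamp itself if it is an action label, bot for sojourn times."""
    return kappa if isinstance(kappa, str) else BOT

def make_sigma_w(sigma1, sigma2, w):
    """The convex combination sigma^w of two schedulers, following the formula above."""
    weights = ((w, sigma1), (1.0 - w, sigma2))

    def sigma_w(path, a):
        n = len(path) // 2           # number of steps of the observed path
        num = den = 0.0
        for w_i, sigma_i in weights:
            # probability that the actions observed along `path` adhere to sigma_i
            adherence = 1.0
            for j in range(n):
                adherence *= sigma_i(prefix(path, j), act(path[2 * j + 1]))
            num += w_i * adherence * sigma_i(path, a)
            den += w_i * adherence
        return num / den if den > 0.0 else 0.0

    return sigma_w

def always(a0):
    """Deterministic scheduler picking a0 at probabilistic states (and bot elsewhere)."""
    return lambda path, a: 1.0 if a in (a0, BOT) else 0.0

sigma_w = make_sigma_w(always("alpha"), always("beta"), 0.5)
print(sigma_w(["s1"], "alpha"))                 # 0.5: the initial one-off coin flip
print(sigma_w(["s1", "alpha", "s1"], "alpha"))  # 1.0: the history reveals sigma_1 was mimicked
```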

For MDPs, the set of achievable points is a convex polytope where the vertices can be realized by deterministic schedulers that use memory bounded by \(2^d\), where \(d\) is the number of objectives. As there are finitely many such schedulers, the polytope has a finite \(\mathcal {V}\)-representation [27], i.e., it can be represented by a finite number of vertices. This result does not carry over to MAs.

Theorem 2

For some MA \(\mathcal {M}\) and objectives \({\mathbb {O}_{}}\), the set \(\{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) does not have a finite \(\mathcal {V}\)-representation.

Proof

We show that the claim holds for the MA \(\mathcal {M}\) in Fig. 3a with objectives \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}\{s_2\}})}, {\mathbb {P}({\lozenge ^{[0,2]}{\{s_4\}}})})\) and relations \({\vartriangleright _{}} = (\ge , \ge )\). The insight here is that for any sojourn time \(t \le 2\) in \(s_0\), the timing information is relevant for optimal schedulers: The shorter the sojourn time in \(s_0\), the higher the probability to reach \(s_4\) within the time bound. For the sake of contradiction, assume that the set \(A= \{\mathbf{p }\in \mathbb {R}^2 \mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) has a finite \(\mathcal {V}\)-representation. Then, there must be two distinct vertices \(\mathbf{p }_1,\mathbf{p }_2\) of \(A\) such that \(\{w \cdot \mathbf{p }_1 + (1-w) \cdot \mathbf{p }_2 \mid w \in [0,1]\}\) is a face of \(A\). In particular, this means that \(\mathbf{p }= 0.5 \cdot \mathbf{p }_1 + 0.5 \cdot \mathbf{p }_2\) is achievable but \(\mathbf{p }_\varepsilon = \mathbf{p }+ (0,\varepsilon )\) is not achievable for all \(\varepsilon >0\). We show that there is in fact an \(\varepsilon \) for which \(\mathbf{p }_\varepsilon \) is achievable, contradicting our assumption that \(A\) has a finite \(\mathcal {V}\)-representation.

For \(i \in \{1,2\}\), let \(\sigma _i \in \text {GM}^{}\) be a scheduler satisfying \(\mathcal {M}, \sigma _i \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_i\). We must have \(\sigma _1 \ne \sigma _2\), as the two schedulers achieve different vertices of \(A\). The point \(\mathbf{p }\) is achievable with the randomized scheduler \(\sigma \) that mimics \(\sigma _1\) with probability 0.5 and mimics \(\sigma _2\) otherwise. Consider \(t= -\ln (\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_2\}))\) and the deterministic scheduler \(\sigma '\) given by

$$\begin{aligned} \sigma '(s_0 \xrightarrow {t_0} s_1,\alpha ) = {\left\{ \begin{array}{ll} 1 &{} \text {if } t_0 > t\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

\(\sigma '\) satisfies \(\text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{}\{s_2\}) = e^{-t} = \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_2\})\). Moreover, we have

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{[0,t]} \{s_3\}) = \text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{}\{s_3\})= \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_3\}) > \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,t]} \{s_3\}), \end{aligned}$$

where the last inequality is due to \(\sigma \ne \sigma '\). While the probability to reach \(s_3\) is equal under both schedulers, \(s_3\) is reached earlier when \(\sigma '\) is considered. This increases the probability to reach \(s_4\) in time, i.e., \(\text {Pr}^{\mathcal {M}}_{\sigma '}(\lozenge ^{[0,2]} \{s_4\}) > \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,2]} \{s_4\})\). It follows that \(\mathcal {M}, \sigma ' \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }_\varepsilon \) for some \(\varepsilon >0\). \(\square \)

Since the set of achievable points does not in general have a finite \(\mathcal {V}\)-representation, i.e., it cannot always be represented by a finite number of vertices, any method extending the approach of [28]—which computes these vertices—can only approximate the set of achievable points.


4 Analysis of Markov automata with multiple objectives

The state-of-the-art in single-objective model checking of MA is to reduce the MA to an MDP, cf. [30, 31, 35], for which efficient algorithms exist. We aim to lift this approach to multi-objective model checking. Assume MA \(\mathcal {M}\) and objectives \({\mathbb {O}_{}}\) with threshold relations \(\vartriangleright _{}\). We discuss how the set of achievable points of \(\mathcal {M}\) relates to the set of achievable points of an MDP. The key challenge is to deal with timing information—even for untimed objectives—and to consider schedulers beyond those optimizing single objectives. We obtain:

  • For untimed reachability and expected reward objectives, the achievable points of \(\mathcal {M}\) equal those of its underlying MDP, cf. Theorems 3 and 4.

  • For timed reachability objectives, the set of achievable points of a digitized MDP \({\mathcal {M}_{\delta }}\) provides a sound approximation of the achievable points of \(\mathcal {M}\), cf. Theorem 5. Corollary 1 gives the precision of the approximation.

4.1 Untimed reachability objectives

Although timing information is essential for deterministic schedulers, cf. Theorem 1, timing information does not strengthen randomized schedulers:

Theorem 3

For MA \(\mathcal {M}\) and untimed reachability objectives \({\mathbb {O}_{}}\) it holds that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }). \)

The main idea for proving Theorem 3 is to construct for scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) a time-abstract scheduler \({\text {ta}(\sigma )}\in \text {TA}^{{\mathcal {M}_\mathcal {D}}}\) such that they both induce the same untimed reachability probabilities. To this end, we discuss the connection between probabilities of paths of MA \(\mathcal {M}\) and paths of MDP \({\mathcal {M}_\mathcal {D}}\).

Definition 9

(Induced paths of a time-abstract path) The set of induced paths on MA \(\mathcal {M}\) of a path \(\hat{\pi }\) of \({\mathcal {M}_\mathcal {D}}\) is given by

$$\begin{aligned} \langle {\hat{\pi }} \rangle = \text {ta}^{-1}(\hat{\pi }) = \{\pi \in { FPaths ^{\mathcal {M}}} \cup { IPaths ^{\mathcal {M}}} \mid \text {ta}(\pi ) = \hat{\pi }\}. \end{aligned}$$

The set \(\langle {\hat{\pi }} \rangle \) contains all paths of \(\mathcal {M}\) where replacing sojourn times by \(\bot \) yields \(\hat{\pi }\).

For \(\sigma \in \text {GM}^{}\), the probability distribution \(\sigma (\pi ,\cdot ) \in \textit{Dist}( Act )\) might depend on the sojourn times of the path \(\pi \). The time-abstract scheduler \({\text {ta}(\sigma )}\) weights the distribution \(\sigma (\pi ,\cdot )\) with the probability masses of the paths \(\pi \in \langle {\hat{\pi }} \rangle \).

Definition 10

(Time-abstraction of a scheduler) The time-abstraction of \(\sigma \in \text {GM}^{\mathcal {M}}\) is defined as \({\text {ta}(\sigma )}\in \text {TA}^{{\mathcal {M}_\mathcal {D}}}\) such that for any \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\)

$$\begin{aligned} {\text {ta}(\sigma )}(\hat{\pi },\alpha ) = {\left\{ \begin{array}{ll} \displaystyle \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma } ( \pi \mid \langle {\hat{\pi }} \rangle ) &{} \text {if } \text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) > 0\\ \displaystyle 1\,/\, | Act ( last (\hat{\pi }))| &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Intuitively, the term \(\text {Pr}^{\mathcal {M}}_{\sigma } ( \pi \mid \langle {\hat{\pi }} \rangle )\) represents the probability for a path in \(\langle {\hat{\pi }} \rangle \) to have sojourn times as given by \(\pi \). The value \({\text {ta}(\sigma )}(\hat{\pi },\alpha )\) coincides with the probability that \(\sigma \) picks action \(\alpha \), given that the time-abstract path \(\hat{\pi }\) was observed. If \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) = 0\), the value for \({\text {ta}(\sigma )}(\hat{\pi },\alpha )\) is arbitrary. For simplicity, we picked a uniform choice.

Example 6

Consider the MA \(\mathcal {M}\) in Fig. 2a and the scheduler \(\sigma \) choosing \(\alpha \) at state \(s_3\) iff the sojourn time at \(s_0\) is at most one. Then \({\text {ta}(\sigma )}(s_0 \xrightarrow {\bot } s_3,\alpha ) = 1-e^{-{\text {E}(s_0)}}\), the probability that \(s_0\) is left within one time unit. For \(\bar{\pi }= s_0 \xrightarrow {\bot } s_3 \xrightarrow {\alpha } s_6\) we have

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}\{s_6\}) = \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\bar{\pi }} \rangle ) = 1-e^{-{\text {E}(s_0)}} = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\bar{\pi }) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}\{s_6\}). \end{aligned}$$

In the example, the considered scheduler and its time-abstraction induce the same untimed reachability probabilities. We generalize this observation.

Lemma 1

For any \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) we have \( \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }). \)

Proof

The proof is by induction over the length of the considered path \(|\hat{\pi }| = n\). Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) and \({\mathcal {M}_\mathcal {D}}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}^{\mathcal {D}}, \dots , \rho _{\ell }^{\mathcal {D}}))\). If \(n=0\), then \(\{\hat{\pi }\} = \langle {\hat{\pi }} \rangle = \{{s_{0}}\}\). Hence, \(\text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) = 1 = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi })\). In the induction step, we assume that the lemma holds for a fixed path \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) with length \(|\hat{\pi }|=n\) and \( last (\hat{\pi }) = s\). Consider the path \(\hat{\pi }\xrightarrow {\alpha _{}} s' \in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\). If \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) = 0\), then \(\text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle ) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\alpha _{}} s') = 0\) because \( Cyl (\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle ) \subseteq Cyl (\langle {\hat{\pi }} \rangle )\) and \( Cyl (\{\hat{\pi }\xrightarrow {\alpha _{}} s'\}) \subseteq Cyl (\{\hat{\pi }\})\).

Now assume \(\text {Pr}^{\mathcal {M}}_{\sigma } (\langle {\hat{\pi }} \rangle ) > 0\).

Case \(s\in \text {PS}\): It follows that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\alpha _{}} s'} \rangle )&= \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \cdot \mathbf{P} (s, \alpha , s') \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\{\pi \} \cap \langle {\hat{\pi }} \rangle ) \\&= \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\big [\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid \langle {\hat{\pi }} \rangle ) \cdot \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle )\big ] \\&= \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \cdot \mathbf{P} (s, \alpha , s') \cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid \langle {\hat{\pi }} \rangle ) \\&= \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \cdot \mathbf{P} (s, \alpha , s') \cdot {\text {ta}(\sigma )}(\hat{\pi },\alpha ) \\&\overset{{\textit{IH}}}{=}\text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) \cdot \mathbf{P} (s, \alpha , s') \cdot {\text {ta}(\sigma )}(\hat{\pi },\alpha ) \\&= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\alpha _{}} s'). \end{aligned}$$

Case \(s\in \text {MS}\): As \(s \in \text {MS}\) we have \(\alpha = \bot \) and it follows

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }\xrightarrow {\bot } s'} \rangle )&= \int _{\pi \in \langle {\hat{\pi }} \rangle } \int _{0}^{\infty } {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \cdot \mathbf{P} (s, \bot , s') \,\text {d}t\,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \bot , s')\cdot \int _{\pi \in \langle {\hat{\pi }} \rangle } \int _{0}^{\infty } {\text {E}(s)}\cdot e^{-{\text {E}(s)} t} \,\text {d}t\,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \\&= \mathbf{P} (s, \bot , s')\cdot \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \\&\overset{{\textit{IH}}}{=}\mathbf{P} (s, \bot , s')\cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) \\&= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }\xrightarrow {\bot } s'). \end{aligned}$$

\(\square \)

The result is lifted to untimed reachability probabilities.

Proposition 2

For any \(G\subseteq S\) it holds that \( \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{})= \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}). \)

Proof

Let \(\varPi \) be the set of finite time-abstract paths of \({\mathcal {M}_\mathcal {D}}\) that end at the first visit of a state in \(G\), i.e.,

$$\begin{aligned} \varPi = \{s_{0} \xrightarrow {\alpha _{0}} \dots \xrightarrow {\alpha _{n-1}} s_{n}\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}} \mid s_n \in G\text { and } \forall i < n :s_i \notin G\}. \end{aligned}$$

Every path \(\pi \in \lozenge ^{}G\subseteq { IPaths ^{\mathcal {M}}}\) has a unique prefix \(\pi '\) with \(\text {ta}(\pi ') \in \varPi \). We have

$$\begin{aligned} \lozenge ^{}G= \biguplus _{\hat{\pi }\in \varPi } Cyl (\langle {\hat{\pi }} \rangle ) \qquad \text {and, on } {\mathcal {M}_\mathcal {D}}, \qquad \lozenge ^{}G= \biguplus _{\hat{\pi }\in \varPi } Cyl (\{\hat{\pi }\}). \end{aligned}$$

The claim follows with Lemma 1 since

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{})= \sum _{\hat{\pi }\in \varPi } \text {Pr}^{\mathcal {M}}_{\sigma }(\langle {\hat{\pi }} \rangle ) \overset{\text {Lem.}\,1}{=}\sum _{\hat{\pi }\in \varPi } \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }) = \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}). \end{aligned}$$

\(\square \)

As the definition of \({\text {ta}(\sigma )}\) is independent of the considered set of goal states \(G\subseteq S\), Proposition 2 can be lifted to multiple untimed reachability objectives.

Proof

(Theorem 3) Let \({\mathbb {O}_{}}= ({\mathbb {P}({\lozenge ^{}G_{1}})}, \dots , {\mathbb {P}({\lozenge ^{}G_{d}})})\) be the considered list of objectives with threshold relations \({\vartriangleright _{}}= (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). The following equivalences hold for any \(\sigma \in \text {GM}^{\mathcal {M}}\) and \(\mathbf{p }\in \mathbb {R}^d\).

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ \,&\Longleftrightarrow \ \,\forall i :\mathcal {M}, \sigma \models {\mathbb {P}({\lozenge ^{}G_{i}})} \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \,\forall i :\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{i}) \vartriangleright _{i} p_{i} \\&\overset{{\text {Prop.}\,2}}{\Longleftrightarrow }\ \, \forall i :\text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{i}) \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \,\forall i :{\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {P}({\lozenge ^{}G_{i}})} \vartriangleright _{i} p_{i} \\&\Longleftrightarrow \ \, {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ . \end{aligned}$$

Assume that \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds, i.e., there is a \(\sigma \in \text {GM}^{\mathcal {M}}\) such that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\). It follows that \({\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) which means that \( achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds as well. For the other direction assume \( achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), i.e., \({\mathcal {M}_\mathcal {D}}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) for some time-abstract scheduler \(\sigma \in \text {TA}^{}\). We have \({\text {ta}(\sigma )}=\sigma \). It follows that \({\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\). Applying the equivalences above yields \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) and thus \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\). \(\square \)

4.2 Expected reward objectives

The results for expected reward objectives are similar to untimed reachability objectives: An analysis of the underlying MDP suffices. We show the following extension of Theorem 3 to expected reward objectives.

Theorem 4

For MA \(\mathcal {M}\) and untimed reachability and expected reward objectives \({\mathbb {O}_{}}\): \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{{\mathcal {M}_\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }). \)

To prove this, we show that a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) and its time-abstraction \({\text {ta}(\sigma )}\in \text {TA}^{}\) induce the same expected rewards on \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\), respectively. Theorem 4 then follows analogously to Theorem 3.

Proposition 3

Let \(\rho _{}\) be some reward function of \(\mathcal {M}\) and let \(\rho _{}^{\mathcal {D}}\) be its counterpart for \({\mathcal {M}_\mathcal {D}}\). For \(G\subseteq S\) we have \( \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \)

Notice that \(\rho _{}^{\mathcal {D}}\) encodes the expected reward of \(\mathcal {M}\) obtained in a state \(s\) by assuming the sojourn time to be the expected sojourn time \(\frac{1}{{\text {E}(s)}}\). Although the claim is similar to Proposition 2, its proof cannot be adapted straightforwardly. In particular, the analogue of Lemma 1 does not hold: The expected reward collected along a time-abstract path \(\hat{\pi }\in { FPaths ^{{\mathcal {M}_\mathcal {D}}}}\) does not coincide in general for \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\).

Example 7

Let \(\mathcal {M}\) be the MA with underlying MDP \({\mathcal {M}_\mathcal {D}}\) as shown in Fig. 2. Let \(\rho _{}(s_0) = 1\) and let all other rewards be zero. Reconsider the scheduler \(\sigma \) from Example 6. Let \(\hat{\pi }_\alpha = s_0 \xrightarrow {\bot } s_3 \xrightarrow {\alpha } s_6\). The probability \(\text {Pr}^{\mathcal {M}}_{\sigma }(\{ s_0 \xrightarrow {t} s_3 \xrightarrow {\alpha } s_6 \in \langle {\hat{\pi }_\alpha } \rangle \mid t> 1 \})\) is zero since \(\sigma \) chooses \(\beta \) on such paths. For the remaining paths in \(\langle {\hat{\pi }_\alpha } \rangle \), action \(\alpha \) is chosen with probability one. The expected reward in \(\mathcal {M}\) along \(\hat{\pi }_\alpha \) is:

$$\begin{aligned} \int _{\pi \in \langle {\hat{\pi }_\alpha } \rangle } rew _{\rho _{}}(\pi ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) = \int _{0}^{1} t\cdot {\text {E}(s_0)}\cdot e^{-{\text {E}(s_0)} t} \,\text {d}t= \int _{0}^{1} t\cdot e^{-t} \,\text {d}t= 1 - 2e^{-1}. \end{aligned}$$

The expected reward in \({\mathcal {M}_\mathcal {D}}\) along \(\hat{\pi }_\alpha \) differs as

$$\begin{aligned} rew _{\rho _{}^{\mathcal {D}}}(\hat{\pi }_\alpha ) \cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }_\alpha ) = \rho _{}^{\mathcal {D}}(s_0, \bot ) \cdot \big (1 - e^{-1}\big ) = 1 - e^{-1}. \end{aligned}$$

The intuition is as follows: If path \(s_0 \xrightarrow {t} s_3 \xrightarrow {\alpha } s_6\) of \(\mathcal {M}\) under \(\sigma \) occurs, we have \(t\le 1\) since \(\sigma \) chose \(\alpha \). Hence, the reward collected from paths in \(\langle {\hat{\pi }_\alpha } \rangle \) is at most \(1 \cdot \rho _{}(s_0) = 1\). There is thus a dependency between the choice of the scheduler at \(s_3\) and the collected reward at \(s_0\). This dependency is absent in \({\mathcal {M}_\mathcal {D}}\) as the reward at a state is independent of the subsequent performed actions.

Let \(\hat{\pi }_\beta = s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4\). The expected reward along \(\hat{\pi }_\beta \) is \(2e^{-1}\) for \(\mathcal {M}\) and \(e^{-1}\) for \({\mathcal {M}_\mathcal {D}}\). As the rewards for \(\hat{\pi }_\alpha \) and \(\hat{\pi }_\beta \) sum up to one in both \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\), the expected reward along all paths of length two coincides for \(\mathcal {M}\) and \({\mathcal {M}_\mathcal {D}}\).

This observation can be generalized to arbitrary MA and paths of arbitrary length. We first formalize the step-bounded expected reward.

Let \(n \ge 0\) and \(G\subseteq S\). The set of time-abstract paths that end after n steps or at the first visit of a state in \(G\) is denoted by

$$\begin{aligned} {\varPi _{G}^{n}}=&\{s_{0} \xrightarrow {\alpha _{0}} \dots \xrightarrow {\alpha _{m-1}} s_m \in { FPaths ^{{\mathcal {M}_\mathcal {D}}}} \mid \ (m=n \text { or } s_m \in G) \text { and } \\&s_i \notin G\text { for all } 0\le i < m\}. \end{aligned}$$

For \(\mathcal {M}\) under \(\sigma \in \text {GM}^{\mathcal {M}}\) and \({\mathcal {M}_\mathcal {D}}\) under \({\text {ta}(\sigma )}\in \text {TA}^{}\), we define the expected reward collected along the paths of \({\varPi _{G}^{n}}\) as

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \sum _{\hat{\pi }\in {\varPi _{G}^{n}}} \int _{\pi \in \langle {\hat{\pi }} \rangle } rew _{\rho _{}}(\pi ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi ) \quad \text {and}\quad \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \sum _{\hat{\pi }\in {\varPi _{G}^{n}}} rew _{\rho _{}^{\mathcal {D}}}(\hat{\pi }) \cdot \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\hat{\pi }), \end{aligned}$$

respectively. Intuitively, \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\) corresponds to \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\) assuming that no more reward is collected after the n-th transition. It follows that the value \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\) approaches \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})\) for large n. Similarly, \(\mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}})\) approaches \(\mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{})\) for large n. This observation is formalized by the following lemma.

Lemma 2

For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) with \(G\subseteq S\), \(\sigma \in \text {GM}^{}\), and reward function \(\rho _{}\) it holds that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{}). \end{aligned}$$

Furthermore, any reward function \(\rho _{}^{\mathcal {D}}\) for \({\mathcal {M}_\mathcal {D}}\) satisfies

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \end{aligned}$$

Proof

(Sketch) Essentially, for each \(n \ge 0\), consider the function \(f_n :{ IPaths ^{\mathcal {M}}} \rightarrow \mathbb {R}_{\ge 0}\) given by

$$\begin{aligned} f_n(\pi ) = {\left\{ \begin{array}{ll} rew _{\rho _{}}( pref (\pi , m)) &{} \text {if } m = \min \{i \le n \mid {\pi [i]} \in G\} \text { exists,} \\ rew _{\rho _{}}( pref (\pi , n)) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

for every path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \in { IPaths ^{\mathcal {M}}}\). Intuitively, \(f_n(\pi )\) is the reward collected on \(\pi \) within the first \(n\) steps and only up to the first visit of \(G\). This allows us to express the expected reward collected along the paths of \({\varPi _{G}^{n}}\) as the expected value of \(f_n\). Noting that the sequence of functions \(f_0, f_1, \dots \) is pointwise non-decreasing and converges to \( rew _{\rho _{}}(\cdot , G_{})\), we may apply the monotone convergence theorem [1] and conclude the proof. A full proof may be found in the appendix. \(\square \)
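The convergence stated in Lemma 2 can also be observed numerically on the MDP side. The following sketch (ours; for brevity it is restricted to memoryless randomized schedulers, whereas \({\text {ta}(\sigma )}\) is in general history-dependent, and all identifiers are hypothetical) computes the \(n\)-step bounded expected reward by a simple backward iteration and shows the values approaching the unbounded expected reward.

```python
def bounded_expected_reward(P, reward, scheduler, goal, s0, n):
    """n-step expected reward until reaching `goal` under a memoryless scheduler.

    P:         (s, a) -> {s': probability}
    reward:    (s, a) -> reward collected when taking a in s
    scheduler: s -> {a: probability}   (memoryless; a simplifying assumption)
    """
    values = {s: 0.0 for s in {s for (s, _) in P}}   # 0-step values are zero
    for _ in range(n):
        new_values = {}
        for s in values:
            if s in goal:
                new_values[s] = 0.0                  # no reward is collected beyond G
                continue
            v = 0.0
            for a, pa in scheduler[s].items():
                v += pa * (reward.get((s, a), 0.0)
                           + sum(p * values[s2] for s2, p in P[(s, a)].items()))
            new_values[s] = v
        values = new_values
    return values[s0]

# Hypothetical 2-state MDP: in s0, action "a" collects reward 1 and reaches the goal g
# with probability 0.5, otherwise we stay in s0.
P = {("s0", "a"): {"g": 0.5, "s0": 0.5}, ("g", "a"): {"g": 1.0}}
rew = {("s0", "a"): 1.0}
sched = {"s0": {"a": 1.0}, "g": {"a": 1.0}}
for n in (1, 2, 5, 20):
    print(n, bounded_expected_reward(P, rew, sched, {"g"}, "s0", n))
# the values 1.0, 1.5, 1.9375, ... increase towards the unbounded expected reward 1/0.5 = 2
```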

The next step is to show that the expected reward collected along the paths of \({\varPi _{G}^{n}}\) coincides for \(\mathcal {M}\) under \(\sigma \) and \({\mathcal {M}_\mathcal {D}}\) under \({\text {ta}(\sigma )}\).

Lemma 3

Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) be an MA and let \(\sigma \in \text {GM}^{}\). Further, let \(\rho _{}\) be a reward function of \(\mathcal {M}\) and let \(\rho _{}^{\mathcal {D}}\) be its counterpart for \({\mathcal {M}_\mathcal {D}}\). For all \(G\subseteq S\) and \(n\ge 0\) it holds that

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) . \end{aligned}$$

Proof

The proof is by induction over the path length n. To simplify the notation, we often omit the reward functions \(\rho _{}\) and \(\rho _{}^{\mathcal {D}}\) and write, e.g., \(\mathrm {ER}^{\mathcal {M}}_{\sigma }( {\varPi _{G}^{n}})\) instead of \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\).

If \(n = 0\), then \({\varPi _{G}^{n}}= \{ {s_{0}}\}\). The claim holds as no reward is collected along the path of length zero: both sides of the equation are zero.

In the induction step, we assume that the lemma holds for some fixed \(n \ge 0\). For paths \(\pi \) with \(|\pi | \le n\) we set \( pref (\pi , n) = \pi \). We split the term \(\mathrm {ER}^{\mathcal {M}}_{\sigma }( {\varPi _{G}^{n+1}})\) into the reward obtained by paths that reach \(G\) within n steps, referred to as term (1), and the reward obtained by paths of length \(n+1\), referred to as term (2). In a second step, term (2) is further split into the reward collected within the first n steps and the reward obtained in the \((n+1)\)-th step.

Detailed reformulations for the terms (1) and (2) are treated separately and discussed in “Appendix B.2”. \(\square \)

We now first show Proposition 3 and then Theorem 4.

Proof

(Proposition 3) The proposition is a direct consequence of Lemma 2 and Lemma 3 as

$$\begin{aligned} \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, G_{})&= \lim _{n\rightarrow \infty } \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{}, {\varPi _{G}^{n}})\\&= \lim _{n\rightarrow \infty } \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, {\varPi _{G}^{n}}) = \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{}^{\mathcal {D}}, G_{}). \end{aligned}$$

\(\square \)

Proof

(Theorem 4) Let \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\) be the considered list of untimed reachability and expected reward objectives with threshold relations \({\vartriangleright _{}}= (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). The following equivalences hold for any \(\sigma \in \text {GM}^{\mathcal {M}}\) and \(\mathbf{p }\in \mathbb {R}^d\).

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }&\Longleftrightarrow \forall i :\mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \\&\overset{*}{\Longleftrightarrow } \forall i :{\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\ , \end{aligned}$$

where for the equivalence marked with \(*\) we consider two cases: If \({\mathbb {O}_{i}}\) is of the form \({\mathbb {P}({\lozenge ^{}G_{}})}\), Proposition 2 yields

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}&\Longleftrightarrow \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{}G_{}) \vartriangleright _{i} p_{i}\\&\Longleftrightarrow \text {Pr}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\lozenge ^{}G_{}) \vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \ . \end{aligned}$$

Otherwise, \({\mathbb {O}_{i}}\) is of the form \({\mathbb {E}({\# j_{}, G_{}})}\) and with Proposition 3 it follows that

$$\begin{aligned} \mathcal {M}, \sigma \models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i}&\Longleftrightarrow \mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) \vartriangleright _{i} p_{i}\\&\Longleftrightarrow \mathrm {ER}^{{\mathcal {M}_\mathcal {D}}}_{{\text {ta}(\sigma )}}(\rho _{j}^{\mathcal {D}}, G_{})\vartriangleright _{i} p_{i} \Longleftrightarrow {\mathcal {M}_\mathcal {D}}, {\text {ta}(\sigma )}\models {\mathbb {O}_{i}} \vartriangleright _{i} p_{i} \ . \end{aligned}$$

The remaining steps of the proof are completely analogous to the proof of Theorem 3 conducted on page 15. \(\square \)

Thus, queries on MA with mixtures of untimed reachability and expected reward objectives can be analyzed on the underlying MDP \({\mathcal {M}_\mathcal {D}}\).

4.3 Timed reachability objectives

Timed reachability objectives cannot be analyzed on \({\mathcal {M}_\mathcal {D}}\) as it abstracts away from sojourn times. We lift the digitization approach for single-objective timed reachability [30, 35] to multiple objectives. Instead of abstracting timing information, it is digitized. In this section, we additionally make the standard assumption (e.g., [14, 30, 35]) that MAs are non-Zeno: MAs with Zeno behavior, where infinitely many actions can be taken within finite time with non-zero probability, are unrealistic and considered a modeling error. MAs that exhibit Zeno behavior can easily be detected using the notion of end components (that we discuss in the next section).

4.3.1 Digitized reachability

The digitization \({\mathcal {M}_{\delta }}\) of \(\mathcal {M}\) w.r.t. some digitization constant \(\delta \in \mathbb {R}_{> 0}\) is an MDP which digitizes the time [30, 35]. The main difference between \({\mathcal {M}_\mathcal {D}}\) and \({\mathcal {M}_{\delta }}\) is that the latter also introduces self-loops which describe the probability to stay in a Markovian state for \(\delta \) time units. More precisely, the outgoing transitions of states \(s\in \text {MS}\) in \({\mathcal {M}_{\delta }}\) represent that either (1) a Markovian transition in \(\mathcal {M}\) was taken within \(\delta \) time units, or (2) no transition is taken within \(\delta \) time units—which is captured by taking the self-loop in \({\mathcal {M}_{\delta }}\). Counting the taken self-loops at \(s\in \text {MS}\) allows to approximate the sojourn time in \(s\).

Definition 11

(Digitization of an MA) For MA \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1},\dots , \rho _{\ell }))\) with transition probabilities \(\mathbf{P} \) and digitization constant \(\delta \in \mathbb {R}_{> 0}\), the digitization of \(\mathcal {M}\) w.r.t. \(\delta \) is the MDP \({\mathcal {M}_{\delta }}= (S, Act , \mathbf{P} _{\delta }, {s_{0}}, (\rho _{1}^\delta , \dots , \rho _{\ell }^\delta ))\), where

$$\begin{aligned} \mathbf{P} _{\delta }(s, \alpha , s') = {\left\{ \begin{array}{ll} \mathbf{P} (s, \bot , s') \cdot (1-e^{-{\text {E}(s)} \delta }) &{} \text {if } s\in \text {MS}, \alpha = \bot , s\ne s'\\ \mathbf{P} (s, \bot , s') \cdot (1-e^{-{\text {E}(s)} \delta }) + e^{-{\text {E}(s)} \delta } &{} \text {if } s\in \text {MS}, \alpha = \bot , s= s'\\ \mathbf{P} (s, \alpha , s') &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

and for each \(i \in \{1, \dots , \ell \}\):

$$\begin{aligned} \rho _{i}^\delta (s, \alpha ) = {\left\{ \begin{array}{ll} \rho _{i}(s, \alpha ) &{} \text {if } s\in \text {PS}\\ \big (\rho _{i}(s, \bot ) + \frac{1}{{\text {E}(s)}} \cdot \rho _{i}(s)\big ) \cdot \big (1-e^{-{\text {E}(s)} \delta }\big ) &{} \text {if } s\in \text {MS}\text { and } \alpha = \bot \\ 0 &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
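To make the construction concrete, the following Python sketch builds the transition probabilities and reward functions of \({\mathcal {M}_{\delta }}\) from those of \(\mathcal {M}\). The data layout (dictionaries keyed by states and actions, the label 'bot' for \(\bot \)) is our own choice and not part of the formal development.

```python
import math

def digitize_ma(P, E, rewards, markov_states, delta):
    """Digitization of an MA w.r.t. delta (cf. Definition 11).

    P:             dict (s, a, s') -> transition probability; a == 'bot' at Markovian states
    E:             dict s -> exit rate E(s) of a Markovian state
    rewards:       list of pairs (rho_act, rho_state) with rho_act: dict (s, a) -> reward
                   and rho_state: dict s -> state reward rate
    markov_states: set MS of Markovian states
    delta:         digitization constant > 0
    """
    P_delta = {}
    for (s, a, t), p in P.items():
        if s in markov_states and a == 'bot':
            P_delta[(s, a, t)] = p * (1 - math.exp(-E[s] * delta))   # a jump occurs within delta
        else:
            P_delta[(s, a, t)] = p                                   # probabilistic states unchanged
    for s in markov_states:                                          # self-loop: no jump within delta
        key = (s, 'bot', s)
        P_delta[key] = P_delta.get(key, 0.0) + math.exp(-E[s] * delta)

    state_action_pairs = {(s, a) for (s, a, _) in P}
    rewards_delta = []
    for rho_act, rho_state in rewards:
        rho_d = {}
        for (s, a) in state_action_pairs:
            if s in markov_states and a == 'bot':
                rho_d[(s, a)] = (rho_act.get((s, a), 0.0) + rho_state.get(s, 0.0) / E[s]) \
                                * (1 - math.exp(-E[s] * delta))
            elif s not in markov_states:
                rho_d[(s, a)] = rho_act.get((s, a), 0.0)
            else:
                rho_d[(s, a)] = 0.0
        rewards_delta.append(rho_d)
    return P_delta, rewards_delta
```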

Example 8

Figure 2 on page 4 shows an MA \(\mathcal {M}\) with its underlying MDP \({\mathcal {M}_\mathcal {D}}\) and a digitization \({\mathcal {M}_{\delta }}\) for unspecified \(\delta \in \mathbb {R}_{> 0}\).

A time interval \(I\subseteq \mathbb {R}_{\ge 0}\) of the form \([a,\infty )\) or \([a, b]\) with \( \text {di}_a{:}{=} \frac{a}{\delta } \in \mathbb {N}\) and \(\text {di}_b{:}{=} \frac{b}{\delta } \in \mathbb {N}\) is called well-formed. For the remainder, we only consider well-formed intervals. If the interval boundaries a and b are rationals with \(a= \frac{a_1}{a_2}, b= \frac{b_1}{b_2}\) for integers \(a_1, a_2, b_1, b_2 \in \mathbb {Z}\), well-formedness can always be ensured by setting the digitization constant \(\delta \) to \(\frac{1}{k}\) for some common multiple k of \(a_2\) and \(b_2\).

An interval \(I\) for time bounds is transformed into a set of digitization step bounds \({\text {di}(I)}\subseteq \mathbb {N}\), given by \( {\text {di}(I)}= \{ {m_{}}\in \mathbb {N}\mid {m_{}}\cdot \delta \in I\}\).

We first relate paths in \(\mathcal {M}\) to paths in its digitization.

Definition 12

(Digitization of a path) The digitization \({\text {di}(\pi )}\) of path \(\pi = s_{0} \xrightarrow {\kappa _{0}} s_{1} \xrightarrow {\kappa _{1}} \dots \) in \(\mathcal {M}\) is the path in \({\mathcal {M}_{\delta }}\) given by

$$\begin{aligned} {\text {di}(\pi )}= \big ( s_0 \xrightarrow {\alpha (\kappa _{0})} \big )^{{m_{0}}} s_0 \xrightarrow {\alpha (\kappa _{0})} \big ( s_1 \xrightarrow {\alpha (\kappa _{1})} \big )^{{m_{1}}} s_1 \xrightarrow {\alpha (\kappa _{1})} \dots \ \end{aligned}$$

where \({m_{i}} = \max \{ {m_{}} \in \mathbb {N}\mid {m_{}} \cdot \delta \le t(\kappa _{i}) \}\) for each \(i\ge 0\).

Example 9

For the path \(\pi = s_0 \xrightarrow {1.1} s_3 \xrightarrow {\beta } s_4 \xrightarrow {\eta } s_5 \xrightarrow {0.3} s_4 \) of the MA \(\mathcal {M}\) in Fig. 2a and \(\delta = 0.4\), we get \( {\text {di}(\pi )}= s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4 \xrightarrow {\eta } s_5 \xrightarrow {\bot } s_4 \).
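The digitization of a path only requires rounding each sojourn time down to a multiple of \(\delta \). A small sketch (with our own path encoding: a list of (source state, label) steps followed by the final state; targets of the inserted self-loops are implicit) reproduces Example 9:

```python
def digitize_path(path, delta):
    """Digitization of a path (cf. Definition 12). Each Markovian transition with sojourn
    time t contributes m = floor(t / delta) self-loop steps plus the actual step; each
    probabilistic transition is kept as is. Returns the digital steps and the final state."""
    *steps, last = path
    digital = []
    for state, label in steps:
        if isinstance(label, (int, float)):   # Markovian transition, label is the sojourn time
            m = int(label // delta)           # m_i = max{m | m * delta <= t}
            digital += [(state, 'bot')] * (m + 1)
        else:                                 # probabilistic transition, label is the action
            digital.append((state, label))
    return digital, last

# Example 9: pi = s0 --1.1--> s3 --beta--> s4 --eta--> s5 --0.3--> s4 with delta = 0.4
pi = [('s0', 1.1), ('s3', 'beta'), ('s4', 'eta'), ('s5', 0.3), 's4']
print(digitize_path(pi, 0.4))
# three 'bot' steps at s0, then beta, eta, and one 'bot' step at s5 -- as in Example 9
```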

The \({m_{i}}\) in the definition above represent a digitization of the sojourn times \(t(\kappa _{i})\) such that \({m_{i}} \delta \le t(\kappa _{i}) < ({m_{i}}{+}1) \delta \). These digitized times are incorporated into the digitization of a path by taking the self-loop at state \(s_i \in \text {MS}\) \({m_{i}}\) times. We also refer to the paths of \({\mathcal {M}_{\delta }}\) as digital paths (of \(\mathcal {M}\)). The number \({|\bar{\pi }|_{\text {ds}}}\) of digitization steps of a digital path \(\bar{\pi }\) is the number of transitions emerging from Markovian states, i.e., \( {|\bar{\pi }|_{\text {ds}}} = |\{ i < |\bar{\pi }| \mid {\bar{\pi }[i]} \in \text {MS}\} |\). One digitization step represents the elapse of at most \(\delta \) time units—either by staying at some \(s\in \text {MS}\) for \(\delta \) time or by leaving \(s\) within \(\delta \) time. The number \({|{\text {di}(\pi )}|_{\text {ds}}}\) multiplied by \(\delta \) yields an estimate for the duration \( T (\pi )\). A digital path \(\bar{\pi }\) can be interpreted as a representation of the set of paths of \(\mathcal {M}\) whose digitization is \(\bar{\pi }\).

Definition 13

(Induced paths of a digital path) The set of induced paths of a (finite or infinite) digital path \(\bar{\pi }\) of \({\mathcal {M}_{\delta }}\) is

$$\begin{aligned}{}[{\bar{\pi }}] = \text {di}^{-1}(\bar{\pi }) = \{ \pi \in { FPaths ^{\mathcal {M}}} \cup { IPaths ^{\mathcal {M}}} \mid {\text {di}(\pi )}= \bar{\pi }\}. \end{aligned}$$

For sets of digital paths \(\varPi \) we define the induced paths \([{\varPi }] = \bigcup _{\bar{\pi }\in \varPi } [{\bar{\pi }}]\). To relate timed reachability probabilities for \(\mathcal {M}\) under a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) with digitization step-bounded reachability probabilities for \({\mathcal {M}_{\delta }}\), we need to relate \(\sigma \) to a scheduler for \({\mathcal {M}_{\delta }}\).

Definition 14

(Digitization of a scheduler) The digitization of \(\sigma \in \text {GM}^{\mathcal {M}}\) is given by \({\text {di}(\sigma )}\in \text {TA}^{{\mathcal {M}_{\delta }}}\) such that for any \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) with \( last (\bar{\pi }) \in \text {PS}\)

$$\begin{aligned} {\text {di}(\sigma )}(\bar{\pi },\alpha ) = {\left\{ \begin{array}{ll} \displaystyle \ \int _{\pi \in [{\bar{\pi }}]} \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid [{\bar{\pi }}]) &{} \text {if } \text {Pr}^{\mathcal {M}}_{\sigma } ([{\bar{\pi }}]) > 0\\ \displaystyle 1\,/\, | Act ( last (\bar{\pi }))| &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

The digitization \({\text {di}(\sigma )}\) is similar to the time-abstraction \({\text {ta}(\sigma )}\) as both schedulers get a path with restricted timing information as input and mimic the choice of \(\sigma \). However, while \({\text {ta}(\sigma )}\) receives no information regarding sojourn times, \({\text {di}(\sigma )}\) receives the digital estimate. Intuitively, \({\text {di}(\sigma )}(\bar{\pi },\alpha )\) considers \(\sigma (\pi ,\alpha )\) for each \(\pi \in [{\bar{\pi }}]\), weighted with the probability that the sojourn times of a path in \([{\bar{\pi }}]\) are as given by \(\pi \). The restriction \( last (\bar{\pi }) \in \text {PS}\) asserts that \(\bar{\pi }\) does not end with a self-loop on a Markovian state, implying \([{\bar{\pi }}] \ne \emptyset \).

Example 10

Consider the MA \(\mathcal {M}\) in Fig. 2a and \(\delta =0.4\). Again, \(\sigma \in \text {GM}^{\mathcal {M}}\) chooses \(\alpha \) at state \(s_3\) iff the sojourn time at \(s_0\) is at most one. Now consider the digital paths \(\bar{\pi }_m = ( s_0 \xrightarrow {\bot } )^{m} s_0 \xrightarrow {\bot } s_3\). For \(\pi \in [{\bar{\pi }_1}] = \{s_0 \xrightarrow {t} s_3 \mid 0.4 \le t< 0.8 \}\) we have \(\sigma (\pi ,\alpha ) = 1\). It follows that \({\text {di}(\sigma )}(\bar{\pi }_1,\alpha ) = 1\). For \(\pi \in [{\bar{\pi }_2}] = \{s_0 \xrightarrow {t} s_3 \mid 0.8 \le t< 1.2 \}\) it is unclear whether \(\sigma \) chooses \(\alpha \) or \(\beta \). Hence, \({\text {di}(\sigma )}\) randomly guesses:

$$\begin{aligned} {\text {di}(\sigma )}(\bar{\pi }_2,\alpha ) = \int _{\pi \in [{\bar{\pi }_2}]} \sigma (\pi ,\alpha ) \,\text {d}\text {Pr}^{\mathcal {M}}_{\sigma }(\pi \mid [{\bar{\pi }_2}]) = \frac{ \int _{0.8}^{1.0} {\text {E}(s_0)} e^{-{\text {E}(s_0)} t} \,\text {d}t}{\int _{0.8}^{1.2} {\text {E}(s_0)} e^{-{\text {E}(s_0)} t} \,\text {d}t} \approx 0.55\ . \end{aligned}$$
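The value \(\approx 0.55\) is just a ratio of two exponential integrals. Assuming an exit rate \({\text {E}(s_0)} = 1\) (an assumption about Fig. 2a that is consistent with the reported value), it can be checked numerically:

```python
import math

E = 1.0                                          # assumed exit rate E(s0)
num = math.exp(-E * 0.8) - math.exp(-E * 1.0)    # integral of E*exp(-E*t) over [0.8, 1.0]
den = math.exp(-E * 0.8) - math.exp(-E * 1.2)    # integral of E*exp(-E*t) over [0.8, 1.2]
print(num / den)                                 # ~0.5497: probability that sigma chose alpha
```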

On \({\mathcal {M}_{\delta }}\) we consider \(\text {ds}\)-bounded reachability instead of timed reachability.

Definition 15

(\(\text {ds}\)-bounded reachability) The set of infinite digital paths that reach \(G\subseteq S\) within the interval \(J\subseteq \mathbb {N}\) of consecutive natural numbers is

$$\begin{aligned} \lozenge ^{J}_{\text {ds}} G= \{\bar{\pi }\in { IPaths ^{{\mathcal {M}_{\delta }}}} \mid \ \exists n \ge 0 :{\bar{\pi }[n]} \in G\text { and } {| pref (\bar{\pi }, n)|_{\text {ds}}} \in J\}. \end{aligned}$$

The timed reachability probabilities for \(\mathcal {M}\) are estimated by \(\text {ds}\)-bounded reachability probabilities for \({\mathcal {M}_{\delta }}\). The induced \(\text {ds}\)-bounded reachability probability for \(\mathcal {M}\) (under \(\sigma \)) coincides with the \(\text {ds}\)-bounded reachability probability on \({\mathcal {M}_{\delta }}\) (under \({\text {di}(\sigma )}\)).

Proposition 4

Let \(\mathcal {M}\) be an MA with \(G\subseteq S\), \(\sigma \in \text {GM}^{}\), and digitization \({\mathcal {M}_{\delta }}\). Further, let \(J\subseteq \mathbb {N}\) be a set of consecutive natural numbers. It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{J}_{\text {ds}} G}]) = \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{J}_{\text {ds}} G). \end{aligned}$$

Before we prove this proposition, we need to aggregate paths that differ only slightly in their timing. Let \(\mathcal {M}= (S, Act , \rightarrow , {s_{0}}, (\rho _{1}, \!\dots \!, \rho _{\ell }) )\) be an MA and let \({\mathcal {M}_{\delta }}\) be the digitization of \(\mathcal {M}\) with respect to some \(\delta \in \mathbb {R}_{> 0}\). We consider the infinite paths of \(\mathcal {M}\) that are represented by a finite digital path.

Definition 16

(Induced cylinder of a digital path) Given a digital path \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) of MA \(\mathcal {M}\), the induced cylinder of \(\bar{\pi }\) is given by

$$\begin{aligned}{}[{\bar{\pi }}]_{\textit{cyl}} = \{\pi \in { IPaths ^{\mathcal {M}}} \mid \bar{\pi }\text { is a prefix of } {\text {di}(\pi )}\}. \end{aligned}$$

Recall the definition of the cylinder of a set of finite paths (cf. Sect. 2.2.1). If \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\) does not end with a self-loop at a Markovian state, then \([{\bar{\pi }}]_{\textit{cyl}} = Cyl ([{\bar{\pi }}])\) holds.

Example 11

Let \(\mathcal {M}\) and \({\mathcal {M}_{\delta }}\) be as in Fig. 2. We consider the path \(\bar{\pi }_1 = s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_3 \xrightarrow {\beta } s_4 \) and digitization constant \(\delta = 0.4\). The set \([{\bar{\pi }_1}]_{\textit{cyl}}\) contains each infinite path whose digitization has the prefix \(\bar{\pi }_1\), i.e.,

$$\begin{aligned}{}[{\bar{\pi }_1}]_{\textit{cyl}} = \{ s_0 \xrightarrow {t} s_3 \xrightarrow {\beta } s_4 \xrightarrow {\kappa _{}} \dots \in { IPaths ^{\mathcal {M}}} \mid 0.8 \le t< 1.2\}. \end{aligned}$$

We observe that these are exactly the paths that have a prefix in \([{\bar{\pi }_1}]\). Put differently, we have \([{\bar{\pi }_1}]_{\textit{cyl}} = Cyl ([{\bar{\pi }_1}])\).

Next, consider the digital path \(\bar{\pi }_2 = s_0 \xrightarrow {\bot } s_0 \xrightarrow {\bot } s_0\). Note that there is no path \(\pi \in { FPaths ^{\mathcal {M}}}\) with \({\text {di}(\pi )}= \bar{\pi }_2\), implying \([{\bar{\pi }_2}] = \emptyset \). Intuitively, \(\bar{\pi }_2\) depicts a sojourn time at \( last (\bar{\pi }_2)\) but finite paths of MAs do not depict sojourn times at their last state. On the other hand, the induced cylinder of \(\bar{\pi }_2\) contains all paths that sojourn at least \(2 \delta \) time units at \(s_0\), i.e.,

$$\begin{aligned}{}[{\bar{\pi }_2}]_{\textit{cyl}} = \{ s_0 \xrightarrow {t} s_1 \xrightarrow {\kappa _{}} \dots \in { IPaths ^{\mathcal {M}}} \mid t\ge 0.8 \}. \end{aligned}$$

The schedulers \(\sigma \) and \({\text {di}(\sigma )}\) induce the same probabilities for a given digital path. This is formalized by the following lemma. Note that a similar statement for \({\text {ta}(\sigma )}\) and time-abstract paths was shown in Lemma 1.

Lemma 4

Let \(\mathcal {M}\) be an MA with scheduler \(\sigma \in \text {GM}^{}\), digitization \({\mathcal {M}_{\delta }}\), and digital path \(\bar{\pi }\in { FPaths ^{{\mathcal {M}_{\delta }}}}\). It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }( [{\bar{\pi }}]_{\textit{cyl}} ) = \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\bar{\pi }). \end{aligned}$$

The proof is included in App. C.1. To show Proposition 4, we now apply Lemma 4. The idea of the proof is similar to the proof of Proposition 2 conducted on page 14. We include a formal proof in App. C.2.

4.3.2 Timed reachability via digitization

As shown above, induced \(\text {ds}\)-bounded reachability probabilities on MAs can be computed on their digitization. Next, we relate \(\text {ds}\)-bounded and timed reachability on MAs, i.e., we quantify the maximum difference between time-bounded and \(\text {ds}\)-bounded reachability probabilities.

The notation \({|\bar{\pi }|_{\text {ds}}}\) for paths \(\bar{\pi }\) of \({\mathcal {M}_{\delta }}\) is also applied to paths of \(\mathcal {M}\), where \({|\pi |_{\text {ds}}} = {|{\text {di}(\pi )}|_{\text {ds}}}\) for any \(\pi \in { FPaths ^{\mathcal {M}}}\). Intuitively, one digitization step represents the elapse of at most \(\delta \) time units. Consequently, the duration of a path with \(k\in \mathbb {N}\) digitization steps is at most \(k\delta \).

Lemma 5

For a path \(\pi \in { FPaths ^{\mathcal {M}}}\) and digitization constant \(\delta \) it holds that

$$\begin{aligned} T (\pi ) \le {|\pi |_{\text {ds}}} \cdot \delta \ . \end{aligned}$$

Proof

Let \(\pi = s_{0} \xrightarrow {\kappa _{0}} \dots {\xrightarrow {\kappa _{n-1}} s_{n}}\) and let \({m_{i}} = \max \{ {m_{}} \in \mathbb {N}\mid {m_{}} \delta \le t(\kappa _{i}) \}\) for each \(i \in \{0, \dots , n-1\}\) (as in Definition 12). The number \({|\pi |_{\text {ds}}}\) is given by \(\sum _{ 0 \le i < n,\, s_i \in \text {MS}} ({m_{i}} + 1)\). With \(t(\kappa _{i}) \le ({m_{i}} + 1) \delta \) it follows that

$$\begin{aligned} T (\pi ) = \sum _{\begin{array}{c} 0 \le i< n\\ s_i \in \text {MS} \end{array}} t(\kappa _{i}) \le \sum _{\begin{array}{c} 0 \le i < n\\ s_i \in \text {MS} \end{array}} ( {m_{i}} + 1) \delta = {|\pi |_{\text {ds}}} \cdot \delta \ . \end{aligned}$$

\(\square \)

For a path \(\pi \) and \(t\in \mathbb {R}_{\ge 0}\), the prefix of \(\pi \) up to time point \(t\) is given by \( pref _{ T }(\pi , t) = pref (\pi , \max \{ n \mid T ( pref (\pi , n)) \le t\})\). The digitization yields an inaccurate estimate of the actual elapsed time: the inaccuracy is captured by the probability that more than \(k\in \mathbb {N}\) digitization steps are performed within \(k\delta \) time units.

Definition 17

(Digitization step bounded paths) Assume an MA \(\mathcal {M}\) and a digitization constant \(\delta \in \mathbb {R}_{> 0}\). For some \(t\in \mathbb {R}_{\ge 0}\), \(k\in \mathbb {N}\), and \({\vartriangleright _{}} \in \{<,\le , >, \ge \}\) the set of paths whose prefix up to time point \(t\) has \(\vartriangleright _{}k\) digitization steps is defined as

$$\begin{aligned} \#^{}[{t}]^{{\vartriangleright _{}}{ k}} = \{\pi \in { IPaths ^{\mathcal {M}}} \mid {| pref _{ T }(\pi , t)|_{\text {ds}}} \vartriangleright _{}k\}. \end{aligned}$$
Fig. 4: MA \(\mathcal {M}\) and illustration of paths of \(\mathcal {M}\) (cf. Example 12)

Example 12

Let \(\mathcal {M}\) be the MA given in Fig. 4a. We consider the set \(\#^{}[{5 \delta }]^{{\le }{ 5}}\). The digitization constant \(\delta \) remains unspecified in this example. Fig. 4b illustrates paths \(\pi _1\), \(\pi _2\), and \(\pi _3\) of \(\mathcal {M}\). We depict sojourn times by arrow length. For instance, the path \(\pi _1\) corresponds to \(s_0 \xrightarrow {2.5 \delta } s_0 \xrightarrow {1.8 \delta } s_1 \xrightarrow {1.7 \delta } \dots \in { IPaths ^{\mathcal {M}}}\). Digitization steps that are “earned” by sojourning at some state for a multiple of \(\delta \) time units are indicated by black dots. Transitions of \(\pi _i\) (where \(i \in \{1,2,3\}\)) that do not belong to \( pref _{ T }(\pi _i, 5 \delta )\) are depicted in gray. We obtain

$$\begin{aligned} {| pref _{ T }(\pi _1, 5\delta )|_{\text {ds}}} = 5 \quad&\implies \quad \pi _1 \in \#^{}[{5 \delta }]^{{\le }{ 5}} \\ {| pref _{ T }(\pi _2, 5\delta )|_{\text {ds}}} = 4 \quad&\implies \quad \pi _2 \in \#^{}[{5 \delta }]^{{\le }{ 5}} \\ {| pref _{ T }(\pi _3, 5\delta )|_{\text {ds}}} = 7 \quad&\implies \quad \pi _3 \notin \#^{}[{5 \delta }]^{{\le }{ 5}}\ . \end{aligned}$$

Note that only the digitization steps of the prefix up to time point \(5 \delta \) are considered. For example, the step of \(\pi _2\) at time point \(4.5 \delta \) is not considered since the corresponding transition is not part of \( pref _{ T }(\pi _2, 5\delta )\). However, we have \({| pref _{ T }(\pi _2, 5.5 \delta )|_{\text {ds}}} = 6\), implying \(\pi _2 \notin \#^{}[{5.5 \delta }]^{{\le }{5}}\).
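Counting the digitization steps of a time-restricted prefix only requires the sojourn times. The sketch below (our own helper; sojourn times are given in multiples of \(\delta \), i.e., \(\delta = 1\)) recomputes \({| pref _{ T }(\pi _1, 5\delta )|_{\text {ds}}} = 5\) for the path \(\pi _1\) from Example 12, whose shown transitions all emerge from Markovian states:

```python
def ds_steps_up_to(sojourns, t):
    """Digitization steps of pref_T(pi, t) for a path through Markovian states only,
    given as a list of sojourn times in multiples of delta (so delta = 1 here). Each
    transition taken after sojourn s contributes floor(s) + 1 steps (cf. Definition 12)."""
    elapsed, steps = 0.0, 0
    for s in sojourns:
        if elapsed + s > t:          # this transition is not part of pref_T(pi, t)
            break
        elapsed += s
        steps += int(s) + 1          # m_i + 1 with m_i = floor(s / delta)
    return steps

print(ds_steps_up_to([2.5, 1.8, 1.7], 5))   # 5, as for pi_1 in Example 12
```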

All considered paths reach \(G= \{s_1 \}\) within \(5 \delta \) time units, i.e., within the time interval \(I = [0, 5\delta ]\), but \(\pi _3 \in \#^{}[{5 \delta }]^{{>}{5}}\) requires more than 5 digitization steps. This yields

$$\begin{aligned} \pi _1, \pi _2 \in \lozenge ^{I} G\cap [{\lozenge ^{{\text {di}(I)}}_{\text {ds}}G}] \quad \text { and } \quad \pi _3 \in \lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}}G}]. \end{aligned}$$
Fig. 5: Illustration of the sets \(\lozenge ^{I} G\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\)

The sets \(\lozenge ^{I} G\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) are illustrated in Fig. 5. We have

$$\begin{aligned} \text {Pr}^{}_{\sigma }(\lozenge ^{I} G) = \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \text {Pr}^{}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) - \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G). \end{aligned}$$

One then shows

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \varepsilon ^{\uparrow }_{}(I) \text {\quad and \quad } \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) \le \varepsilon ^{\downarrow }_{}(I). \end{aligned}$$

for adequate error bounds \( \varepsilon ^{\uparrow }_{}(I),\varepsilon ^{\downarrow }_{}(I)\). In particular, the following lemma gives an upper bound for the probability \(\text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{>}{k}})\), i.e., the probability that more than \(k\in \mathbb {N}\) digitization steps have been performed within \(k\delta \) time units. Then, this probability can be related to the probability of paths in \(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) and \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G\), respectively.

Lemma 6

Let \(\mathcal {M}\) be an MA with \(\sigma \in \text {GM}^{}\) and maximum rate \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\). Further, let \(\delta \in \mathbb {R}_{> 0}\) and \(k\in \mathbb {N}\). It holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{>}{k}}) \le 1- (1+ \lambda \delta )^{k} \cdot e^{- \lambda \delta k} \end{aligned}$$

For the proof of Lemma 6 we employ the following auxiliary lemma.

Lemma 7

Let \(\mathcal {M}\) be an MA with \(\sigma \in \text {GM}^{}\) and maximum rate \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\). For each \(\delta \in \mathbb {R}_{> 0}\), \(k\in \mathbb {N}\), and \(t\in \mathbb {R}_{\ge 0}\) it holds that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta + t}]^{{\le }{ k}}) \ge \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{k\delta }]^{{\le }{ k}}) \cdot e^{-\lambda t}\ . \end{aligned}$$

Proofs for both lemmas are given in App. C.3.

We have now obtained all necessary insights to bound \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})\) from above and below, using the digitized steps and the following error bounds. Let \(\lambda = \max \{{\text {E}(s)} \mid s\in \text {MS}\}\) be the maximum exit rate of \(\mathcal {M}\). For \(a\ne 0\) define

$$\begin{aligned} \varepsilon ^{\downarrow }_{}([a, b])&= \varepsilon ^{\downarrow }_{}([a, \infty )) = 1 - (1+ \lambda \delta )^{\text {di}_a} \cdot e^{- \lambda a}\ , \quad \varepsilon ^{\downarrow }_{}([0,b]) = \varepsilon ^{\downarrow }_{}([0,\infty )) = 0 , \\ \varepsilon ^{\uparrow }_{}([a,b])&=\underbrace{ 1 - (1+ \lambda \delta )^{\text {di}_b} \cdot e^{- \lambda b}}_{=\varepsilon ^{\uparrow }_{}([0,b])} + \underbrace{1 - e^{-\lambda \delta }}_{=\varepsilon ^{\uparrow }_{}([a, \infty ))}\ , \text { and }\ \varepsilon ^{\uparrow }_{}([0,\infty )) = 0 . \end{aligned}$$

\(\varepsilon ^{\downarrow }_{}(I)\) and \(\varepsilon ^{\uparrow }_{}(I)\) approach 0 for small digitization constants \(\delta \in \mathbb {R}_{> 0}\).
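The error bounds are straightforward to evaluate. The following sketch (our own code) computes \(\varepsilon ^{\downarrow }_{}(I)\) and \(\varepsilon ^{\uparrow }_{}(I)\) for a well-formed interval and illustrates that both bounds vanish as \(\delta \) decreases:

```python
import math

def eps_bounds(a, b, lam, delta):
    """Error bounds for a well-formed interval I = [a, b] (use b = math.inf for [a, inf)),
    maximum exit rate lam, and digitization constant delta."""
    eps_down = 0.0 if a == 0 else 1 - (1 + lam * delta) ** round(a / delta) * math.exp(-lam * a)
    if b == math.inf:
        eps_up = 0.0 if a == 0 else 1 - math.exp(-lam * delta)
    else:
        eps_up = 1 - (1 + lam * delta) ** round(b / delta) * math.exp(-lam * b)
        if a != 0:
            eps_up += 1 - math.exp(-lam * delta)
    return eps_down, eps_up

for delta in [0.1, 0.01, 0.001]:
    print(delta, eps_bounds(1.0, 2.0, lam=3.0, delta=delta))
# both bounds shrink towards 0 as delta decreases
```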

Proposition 5

For MA \(\mathcal {M}\), scheduler \(\sigma \in \text {GM}^{}\), goal states \(G\subseteq S\), digitization constant \(\delta \in \mathbb {R}_{> 0}\) and time interval \(I\)

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \in \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \Big [{-}\varepsilon ^{\downarrow }_{}(I),\, \varepsilon ^{\uparrow }_{}(I)\Big ]. \end{aligned}$$

Proof

We already discussed that

$$\begin{aligned} \text {Pr}^{}_{\sigma }(\lozenge ^{I} G) = \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) + \text {Pr}^{}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) - \text {Pr}^{}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G). \end{aligned}$$

The main part of the proof is to show that

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) \le \varepsilon ^{\downarrow }_{}(I) \text { and } \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \varepsilon ^{\uparrow }_{}(I). \end{aligned}$$
(3)

Then, the proposition follows directly. We show Equation 3 for the different forms of the time interval \(I\). Here, we consider intervals of the form \(I= [0,b]\). Recall that by assumption \(b= \text {di}_b\cdot \delta \). For the other forms, we refer to App. C.4.

We have \({\text {di}(I)}= \{0,1, \dots , \text {di}_b\}\).

  • We show that \([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] \subseteq \lozenge ^{I} G\) which implies

    $$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }([{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] {\setminus } \lozenge ^{I} G) = \text {Pr}^{\mathcal {M}}_{\sigma }(\emptyset ) = 0 = \varepsilon ^{\downarrow }_{}(I). \end{aligned}$$

    Let \(\pi \in [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\) and let \(\pi '\) be the smallest prefix of \(\pi \) with \( last (\pi ') \in G\). It follows that \(\text {di}(\pi ')\) is also the smallest prefix of \({\text {di}(\pi )}\) with \( last (\text {di}(\pi ')) \in G\). Since \({\text {di}(\pi )}\in \lozenge ^{{\text {di}(I)}}_{\text {ds}} G\), it follows that \({|\pi '|_{\text {ds}}} = {|\text {di}(\pi ')|_{\text {ds}}} \le \text {di}_b\). From Lemma 5 we obtain

    $$\begin{aligned} T (\pi ') \le {|\pi '|_{\text {ds}}} \cdot \delta = {|\text {di}(\pi ')|_{\text {ds}}} \cdot \delta \le \text {di}_b\delta = b\ . \end{aligned}$$

    Hence, the prefix \(\pi '\) reaches \(G\) within \(b\) time units, implying \(\pi \in \lozenge ^{I} G\).

  • Next, we show \(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}] \subseteq \#^{}[{b}]^{{>}{\text {di}_b}}\). With Lemma 6 we obtain

    $$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]) \le \text {Pr}^{\mathcal {M}}_{\sigma }(\#^{}[{b}]^{{>}{\text {di}_b}}) \le 1 - (1+ \lambda \delta )^{\text {di}_b} \cdot e^{- \lambda b} = \varepsilon ^{\uparrow }_{}(I) \end{aligned}$$

    Consider a path \(\pi \in \lozenge ^{I} G{\setminus } [{\lozenge ^{{\text {di}(I)}}_{\text {ds}} G}]\). Note that \(\pi \) reaches \(G\) within \(b\) time units but with more than \(\text {di}_b\) digitization steps. Hence, the prefix of \(\pi \) up to time point \(b\) certainly has more than \(\text {di}_b\) digitization steps, i.e., \(\pi \) satisfies \({| pref _{ T }(\pi , b)|_{\text {ds}}} > \text {di}_b\) which means \(\pi \in \#^{}[{b}]^{{>}{\text {di}_b}}\).

\(\square \)

From Prop. 4 and Prop. 5, we immediately have Cor. 1, which ensures that the value \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})\) can be approximated with arbitrary precision by computing \(\text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G)\) for a sufficiently small \(\delta \).

Corollary 1

For MA \(\mathcal {M}\), scheduler \(\sigma \in \text {GM}^{}\), goal states \(G\subseteq S\), digitization constant \(\delta \in \mathbb {R}_{> 0}\) and time interval \(I\)

$$\begin{aligned} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \in \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) + \Big [{-}\varepsilon ^{\downarrow }_{}(I),\, \varepsilon ^{\uparrow }_{}(I)\Big ] . \end{aligned}$$

Corollary 1 generalizes existing results from single-objective timed reachability analysis: For MA \(\mathcal {M}\), goal states \(G\), time bound \(b\in \mathbb {R}_{> 0}\), and digitization constant \(\delta \in \mathbb {R}_{> 0}\) with \(\frac{b}{\delta } = \text {di}_b\in \mathbb {N}\), [30, Theorem 5.3] states that

$$\begin{aligned} \sup _{\sigma \in \text {GM}^{\mathcal {M}}} \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{[0,b]} G_{}) \in \sup _{\sigma \in \text {TA}^{{\mathcal {M}_{\delta }}}} \text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma }(\lozenge ^{\{0, \dots , \text {di}_b\}}_{\text {ds}} G) + \Big [{-}\varepsilon ^{\downarrow }_{}([0, b]),\, \varepsilon ^{\uparrow }_{}([0, b])\Big ]. \end{aligned}$$

Corollary 1 generalizes this result by explicitly referring to the schedulers \(\sigma \in \text {GM}^{\mathcal {M}}\) and \({\text {di}(\sigma )}\in \text {TA}^{{\mathcal {M}_{\delta }}}\) under which the claim holds. This extension is necessary as a multi-objective analysis cannot be restricted to schedulers that only optimize a single objective. More details are given in App. D.

Next, we lift Cor. 1 to multiple objectives \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, \dots , {\mathbb {O}_{d}})\). We define the satisfaction of a timed reachability objective \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\) for the digitization \({\mathcal {M}_{\delta }}\) as \( {\mathcal {M}_{\delta }}, \sigma \models {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\vartriangleright _{i} p_{i} \text { iff } \text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma }(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) \vartriangleright _{i}p_{i} \). This allows us to consider notations like \( achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\), where \({\mathbb {O}_{}}\) contains one or more timed reachability objectives. For a point \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) we consider the hyperrectangle

$$\begin{aligned} \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) = \big [ p_{1} - \varepsilon ^{\downarrow }_{1},\, p_{1} + \varepsilon ^{\uparrow }_{1} \big ] \times \dots \times \big [ p_{d} - \varepsilon ^{\downarrow }_{d},\, p_{d} + \varepsilon ^{\uparrow }_{d} \big ] \subseteq \mathbb {R}^d, \end{aligned}$$

where \(\varepsilon ^{\uparrow }_{i} = \varepsilon ^{\uparrow }_{}(I_{i})\) if \({\mathbb {O}_{i}}\) is a timed reachability objective with time interval \(I_{i}\) and \(\varepsilon ^{\uparrow }_{i} = 0\) otherwise; \(\varepsilon ^{\downarrow }_{i}\) is defined similarly. The next example shows how the set of achievable points of \(\mathcal {M}\) can be approximated using achievable points of \({\mathcal {M}_{\delta }}\).

Example 13

Let \({\mathbb {O}_{}} = ( {\mathbb {P}({\lozenge ^{I_{1}}{G_{1}}})}, {\mathbb {P}({\lozenge ^{I_{2}}{G_{2}}})})\) be two timed reachability objectives for an MA \(\mathcal {M}\) with digitization \({\mathcal {M}_{\delta }}\) such that \(\varepsilon ^{\downarrow }_{1} = 0.13\), \(\varepsilon ^{\uparrow }_{1} = 0.22\), \(\varepsilon ^{\downarrow }_{2} = 0.07\), and \(\varepsilon ^{\uparrow }_{2} = 0.15\). The blue rectangle in Fig. 6a illustrates the set \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) for the point \(\mathbf{p }= (0.4, 0.3)\). Assume \( achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\) holds for threshold relations \({\vartriangleright _{}} = \{\ge , \ge \}\), i.e., \(\mathbf{p }\) is achievable for the digitization \({\mathcal {M}_{\delta }}\). From Cor. 1, we infer that \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) contains at least one point \(\mathbf{p }'\) that is achievable for \(\mathcal {M}\). Hence, the bottom left corner point of the rectangle is achievable for \(\mathcal {M}\). This holds for any rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{q })\) with \(\mathbf{q }\in A\), where \(A\) is the set of achievable points of \({\mathcal {M}_{\delta }}\) denoted by the gray area in Fig. 6b. It follows that any point in \(A^{-}\) (depicted by the green area) is achievable for \(\mathcal {M}\). On the other hand, an achievable point of \(\mathcal {M}\) has to be contained in a set \(\varepsilon ({\mathbb {O}_{}}, \mathbf{q })\) for at least one \(\mathbf{q }\in A\). The red area depicts the points \(\mathbb {R}^d{\setminus } A^{+}\) for which this is not the case, i.e., points that are not achievable for \(\mathcal {M}\). The digitization constant \(\delta \) controls the accuracy of the resulting approximation. Figure 6c depicts a possible result when a smaller digitization constant \(\tilde{\delta }< \delta \) is considered.
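For the numbers of Example 13, the corner-point argument amounts to a simple shift (our own arithmetic): if \(\mathbf{p }= (0.4, 0.3)\) is achievable for \({\mathcal {M}_{\delta }}\), then the bottom left corner of \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) is achievable for \(\mathcal {M}\).

```python
eps_down = (0.13, 0.07)                       # epsilon-down for the two objectives of Example 13
p = (0.4, 0.3)                                # point achievable for the digitization
corner = tuple(pi - e for pi, e in zip(p, eps_down))
print(corner)                                 # ~(0.27, 0.23): guaranteed achievable for the MA
```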

Fig. 6: Approximation of achievable points

The observations from the example above are formalized in the following theorem. The theorem also covers unbounded reachability objectives by considering the time interval \(I= [0,\infty )\). For expected reward objectives of the form \({\mathbb {E}({\# j_{}, G_{}})}\) it can be shown that \(\mathrm {ER}^{\mathcal {M}}_{\sigma }(\rho _{j}, G_{}) = \mathrm {ER}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\rho _{j}^\delta , G_{})\). This claim is similar to Proposition 3 and can be shown analogously. This enables multi-objective model checking of MAs with timed reachability objectives.

Theorem 5

Let \(\mathcal {M}\) be an MA with digitization \({\mathcal {M}_{\delta }}\). Furthermore, let \({\mathbb {O}_{}}\) be (un)timed reachability or expected reward objectives with threshold relations \({\vartriangleright _{}}\) and \(|{\mathbb {O}_{}}| = d\). It holds that \(A^{-}\subseteq \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \} \subseteq A^{+}\) with:

$$\begin{aligned} A^{-}&= \{\mathbf{p }' \in \mathbb {R}^d\mid \forall \mathbf{p }\in \mathbb {R}^d:\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) \text { implies } achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \} \text { and} \\ A^{+}&= \{\mathbf{p }' \in \mathbb {R}^d\mid \exists \mathbf{p }\in \mathbb {R}^d:\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p }) \text { and } achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p })\} . \end{aligned}$$

Proof

For simplicity, we assume that only the threshold relation \(\ge \) is considered, i.e., \({\vartriangleright _{}} = (\ge , \dots , \ge )\). Furthermore, we restrict ourselves to (un)timed reachability objectives. The remaining cases are treated analogously.

First assume a point \(\mathbf{p }' = (p_{1}', \dots , p_{d}') \in A^{-}\). Consider the point \(\mathbf{p }= (p_{1}, \dots , p_{d})\) satisfying \(p_{i}' = p_{i} - \varepsilon ^{\downarrow }_{i}\) for each index i. It follows that \(\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) and thus \({\mathcal {M}_{\delta }}, \bar{\sigma } \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\) for some scheduler \(\bar{\sigma } \in \text {TA}^{{\mathcal {M}_{\delta }}}\). Consider the scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\) given by \(\sigma (\pi ,\alpha ) = \bar{\sigma }({\text {di}(\pi )},\alpha )\) for each path \(\pi \in { FPaths ^{\mathcal {M}}}\) and action \(\alpha \in Act \). Notice that \(\bar{\sigma } = {\text {di}(\sigma )}\). For an index i let \({\mathbb {O}_{i}}\) be the objective \({\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\). It follows that

$$\begin{aligned} {\mathcal {M}_{\delta }}, \bar{\sigma } \models {\mathbb {O}_{i}} \ge p_{i} \iff {\mathcal {M}_{\delta }}, {\text {di}(\sigma )}\models {\mathbb {O}_{i}} \ge p_{i} \iff \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G)\ge p_{i}\ , \end{aligned}$$

With Corollary 1 it follows that

$$\begin{aligned} p_{i}' = p_{i} - \varepsilon ^{\downarrow }_{i} \le \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) - \varepsilon ^{\downarrow }_{i} \overset{Cor.\,1}{\le } \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}). \end{aligned}$$

As this observation holds for all objectives in \({\mathbb {O}_{}}\), it follows that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }'\), implying \( achieve ^{\mathcal {M}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }')\).

The proof of the second inclusion is similar. Assume that \(\mathcal {M}, \sigma \models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }'\) holds for a point \(\mathbf{p }' = (p_{1}', \dots , p_{d}') \in \mathbb {R}^d\) and a scheduler \(\sigma \in \text {GM}^{\mathcal {M}}\). For some index i, consider \({\mathbb {O}_{i}} = {\mathbb {P}({\lozenge ^{I_{}}{G_{}}})}\). It follows that \(\text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{}) \ge p_{i}'\). With Corollary 1 we obtain

$$\begin{aligned} p_{i}' - \varepsilon ^{\uparrow }_{i} \le \text {Pr}^{\mathcal {M}}_{\sigma }(\lozenge ^{I_{}} G_{})- \varepsilon ^{\uparrow }_{i} \overset{Cor.\,1}{\le } \text {Pr}^{{\mathcal {M}_{\delta }}}_{{\text {di}(\sigma )}}(\lozenge ^{{\text {di}(I)}}_{\text {ds}} G) . \end{aligned}$$

Applying this for all objectives in \({\mathbb {O}_{}}\) yields \({\mathcal {M}_{\delta }}, {\text {di}(\sigma )}\models {\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }\), where the point \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\) satisfies \( p_{i} = p_{i}' - \varepsilon ^{\uparrow }_{i}\) or, equivalently, \( p_{i}' = p_{i} + \varepsilon ^{\uparrow }_{i}\) for each index i. Note that \(\mathbf{p }' \in \varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) which implies \(\mathbf{p }' \in A^{+}\). \(\square \)

5 Computation of Pareto curves

We have seen that the analysis of multiple objectives on an MA \(\mathcal {M}\) can be reduced to a multi-objective analysis on an MDP, namely on the underlying MDP \({\mathcal {M}_\mathcal {D}}\) or on a digitization \({\mathcal {M}_{\delta }}\) of \(\mathcal {M}\). To analyze the MDPs, we adapt the approach of [28]. In this section, we briefly recap that approach and report on the necessary changes. The approach repeatedly checks weighted combinations of the objectives (by means of value iteration [43]—a standard technique in single-objective MDP model checking) to refine an approximation of the set of achievable points. The following example demonstrates this idea for two objectives.

Fig. 7: Illustration of the Pareto curve approximation algorithm (cf. Example 14)

Example 14

Consider an MDP \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\) with two objectives \({\mathbb {O}_{}}= ({\mathbb {O}_{1}}, {\mathbb {O}_{2}})\) and relations \({\vartriangleright _{}} = \{\ge , \ge \}\). The approach of [28] iteratively considers weight vectors \(\mathbf{w }= ({w_{1}}, {w_{2}})\in (\mathbb {R}_{\ge 0})^2 \) with \(\mathbf{w }\ne (0,0)\) that assign a weight \({w_{i}} \ge 0\) to each objective \({\mathbb {O}_{i}}\). Optimizing a combination of the weighted objectives (see Def. 18) yields a point \(\mathbf{q }\in \mathbb {R}^2\) such that

  • \(\mathbf{q }\) is achievable and

  • all achievable points of \(\mathcal {D}\) are contained in the half-space \({H}=\{ \mathbf{p }\in \mathbb {R}^2 \mid \mathbf{p }\cdot \mathbf{w }\le \mathbf{q }\cdot \mathbf{w }\}\).

Since \(\mathbf{q }\) is achievable, any point in the set \({ down (\{\mathbf{q }\})} = \{\mathbf{p }\in \mathbb {R}^2 \mid \mathbf{p }\le \mathbf{q }\}\) depicted by the green area of Fig. 7a is achievable as well. On the contrary, there is no achievable point in \(\mathbb {R}^2 {\setminus } {H}\), illustrated by the red area. For the points in the white area, it is still unknown whether they are achievable or not. The set of achievable points of \(\mathcal {D}\) is explored by combining the results for multiple weight-vectors as indicated in Fig. 7b.
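The classification of candidate points described in Example 14 can be phrased directly in terms of the computed pairs of optimal points and weight vectors. The following sketch (our own code, for maximizing objectives with relation \(\ge \)) is one way to do this:

```python
def classify(point, results):
    """Classify a candidate point based on the weighted queries performed so far.
    results: list of pairs (q, w), where q is the optimal point computed for weight vector w."""
    # achievable: point lies below some computed optimal point q (i.e., in down({q}))
    if any(all(pi <= qi for pi, qi in zip(point, q)) for q, _ in results):
        return 'achievable'
    # not achievable: point violates some half-space H = {p | p*w <= q*w}
    if any(sum(pi * wi for pi, wi in zip(point, w)) > sum(qi * wi for qi, wi in zip(q, w))
           for q, w in results):
        return 'not achievable'
    return 'unknown'

print(classify((0.5, 0.2), [((0.6, 0.3), (1.0, 1.0))]))   # 'achievable'
```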

The sketched approach theoretically converges to an exact representation of the set of achievable points, but the number of required calls to value iteration can be exponential in the size of the MDP and the number of objectives [28]. However, experiments in [28] and in Sect. 6 of this work indicate that, in practice, only a small number of weight vectors need to be considered in order to obtain “good” approximations.

We extend [28] towards

  • the simultaneous analysis of minimizing and maximizing expected reward objectives, and

  • the analysis of \(\text {ds}\)-bounded reachability objectives.

These extensions only concern the computation of optimal points. The remaining aspects of the approach are as in [28]. We first restrict our attention to maximizing expected total reward objectives, i.e., expected reward objectives of the form \({\mathbb {E}({\# j_{}, G_{}})}\) with \(G= \emptyset \).

Definition 18

(Optimal Scheduler, Optimal Point) Let \(\mathcal {D}\) be an MDP and let \({\mathbb {O}_{}}= ({\mathbb {E}({\# j_{1}, \emptyset })}, \dots , {\mathbb {E}({\# j_{d}, \emptyset })})\) be expected total reward objectives with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\) and \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\). A scheduler \(\sigma \in \text {TA}^{}\) is called optimal for a weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\in \mathbb {R}_{\ge 0}^d{\setminus } \{\mathbf{0 }\}\) iff

$$\begin{aligned} \sigma \in {\mathop {\hbox {arg max}}\limits _{\sigma ' \in \text {TA}^{}}} \Big (\sum _{i=1}^{d} {w_{i}} \cdot \mathrm {ER}^{\mathcal {D}}_{\sigma '}(\rho _{j_i}, \emptyset ) \Big ). \end{aligned}$$

A point \(\mathbf{p }=(p_{1}, \dots , p_{d})\in \mathbb {R}^d\) is optimal for \(\mathbf{w }\) iff \(p_{i} = \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset )\) for all i and some scheduler \(\sigma \) that is optimal for \(\mathbf{w }\).

The computation of optimal points for a weighted sum of expected total reward objectives with thresholds \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\) is detailed in [28]. The idea is to use value iteration [43] to compute an optimal scheduler \(\sigma _\text {opt}\) for the maximal expected reward \(\max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\) with the weighted reward function \(\rho _{\mathbf{w }}\) given by \( \rho _{\mathbf{w }}(s, \alpha ) = \sum _{i=1}^{d} {w_{i}} \cdot \rho _{j_i}(s, \alpha ) \) for each \(s\in S\) and \(\alpha \in Act \). Evaluating the individual objectives with respect to \(\sigma _\text {opt}\) yields the entries of an optimal point \(\mathbf{p }=(p_{1}, \dots , p_{d})\), i.e., \(p_{i} = \mathrm {ER}^{\mathcal {D}}_{\sigma _\text {opt}}(\rho _{j_i}, \emptyset )\) for all i. In general, we might have \(p_{i} = \infty \) and thus \(\mathbf{p }\notin \mathbb {R}^d\). In this case, the sketched approach is not applicable. We therefore impose some assumptions.

Assumptions on reward finiteness. For maximizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{{\ge }, {>} \}\) we assume \(\max _{\sigma \in \text {TA}^{}}\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) < \infty \), following the suggestions of [28, 29]. Note that if a scheduler \(\sigma _\infty \) with \(\mathrm {ER}^{\mathcal {D}}_{\sigma _\infty }(\rho _{j_i}, \emptyset ) = \infty \) exists, we can also construct schedulers \(\sigma \) with \(\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) = \infty \) that mimic \(\sigma _\infty \) with an arbitrarily small (but non-zero) probability and otherwise focus on the remaining objectives. We further assume that there is at least one scheduler \(\sigma \) inducing \(\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{j_i}, \emptyset ) < \infty \) for all minimizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{{\le }, {<} \}\). If this is not the case, there is no achievable point \(\mathbf{p }\in \mathbb {R}^d\) at all. These assumptions can be checked algorithmically via a graph analysis.
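Under these assumptions, the computation of an optimal point can be sketched as follows (our own simplified code, not the implementation of [28]): value iteration on the weighted reward function yields a memoryless scheduler maximizing the weighted total reward, which is then evaluated once per objective.

```python
def optimal_point(states, actions, P, rewards, w, init, iters=1000):
    """Approximate an optimal scheduler and optimal point for weight vector w
    (cf. Definition 18), assuming non-negative rewards and finite total rewards.
    P:       dict (s, a) -> list of (s', prob)
    rewards: dict (s, a) -> tuple with one reward per objective"""
    weighted = {sa: sum(wi * ri for wi, ri in zip(w, rs)) for sa, rs in rewards.items()}
    V = {s: 0.0 for s in states}
    for _ in range(iters):                              # value iteration for the weighted rewards
        V = {s: max(weighted[(s, a)] + sum(p * V[t] for t, p in P[(s, a)])
                    for a in actions[s]) for s in states}
    sched = {s: max(actions[s], key=lambda a: weighted[(s, a)]
                    + sum(p * V[t] for t, p in P[(s, a)])) for s in states}
    point = []
    for i in range(len(w)):                             # evaluate each objective under sched
        Vi = {s: 0.0 for s in states}
        for _ in range(iters):
            Vi = {s: rewards[(s, sched[s])][i]
                     + sum(p * Vi[t] for t, p in P[(s, sched[s])]) for s in states}
        point.append(Vi[init])
    return sched, point

# A tiny example MDP: choosing 'b' at s0 maximizes the weighted reward for w = (1, 1).
states = ['s0', 's1', 's2']
actions = {'s0': ['a', 'b'], 's1': ['a'], 's2': ['a']}
P = {('s0', 'a'): [('s1', 1.0)], ('s0', 'b'): [('s2', 1.0)],
     ('s1', 'a'): [('s2', 1.0)], ('s2', 'a'): [('s2', 1.0)]}
rewards = {('s0', 'a'): (1.0, 0.0), ('s0', 'b'): (0.0, 2.0),
           ('s1', 'a'): (0.0, 0.0), ('s2', 'a'): (0.0, 0.0)}
print(optimal_point(states, actions, P, rewards, w=(1.0, 1.0), init='s0'))
# -> ({'s0': 'b', 's1': 'a', 's2': 'a'}, [0.0, 2.0])
```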

Remark 2

(Reachability and (general) expected reward objectives) A transformation of untimed reachability objectives \({\mathbb {P}({\lozenge ^{}G_{}})}\) to expected total reward objectives is given in [28, 29]. Roughly, the state space of the MDP is unfolded, yielding two copies of each state. Transitions leading to a state in \(G\) are redirected to the second copy, allowing us to store whether a state in \(G\) has already been visited. A reward of 1 is collected whenever a goal state is visited for the first time, i.e., when we move from the first copy to the second one. A similar unfolding technique can be applied to transform expected reward objectives \({\mathbb {E}({\# j_{}, G_{}})}\) with \(G\ne \emptyset \) to total reward objectives. This approach increases the number of considered states by a factor of up to \(2^d\), where \(d\) is the number of objectives.
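A minimal sketch of the unfolding for a single untimed reachability objective (our own code; expected reward objectives with \(G\ne \emptyset \) are handled analogously): states are paired with a flag recording whether \(G\) has been visited, and, since we use state-action rewards, the reward of 1 for moving into the second copy is encoded by its expected value.

```python
def unfold_reachability(P, goal):
    """Unfold an MDP so that P(<> G) becomes an expected total reward objective
    (cf. Remark 2). Assumes the initial state is not in G; start the unfolded MDP
    in (s0, False).
    P:    dict (s, a) -> list of (s', prob)
    goal: set of goal states G"""
    P_unf, reward = {}, {}
    for (s, a), succs in P.items():
        for seen in (False, True):
            new_succs, r = [], 0.0
            for t, p in succs:
                new_succs.append(((t, seen or t in goal), p))
                if not seen and t in goal:
                    r += p              # probability of entering G for the first time here
            P_unf[((s, seen), a)] = new_succs
            reward[((s, seen), a)] = r
    return P_unf, reward
```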

5.1 Treatment of minimizing objectives

We now consider expected total reward objectives with arbitrary threshold relations. Let \(\mathcal {D}= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{1}, \dots , \rho _{\ell }))\) be an MDP and let \({\mathbb {O}_{}}= ({\mathbb {E}({\# j_{1}, \emptyset })}, \dots , {\mathbb {E}({\# j_{d}, \emptyset })})\) be expected total reward objectives with threshold relations \({\vartriangleright _{}} = (\vartriangleright _{1}, \dots , \vartriangleright _{d})\). Without loss of generality, let each objective consider a different reward function, i.e., the indices \(j_1, \dots , j_d\) are pairwise distinct. We further simplify notation by assuming \(j_i = i\) for \(i \in \{1, \dots , d\}\), i.e., the i-th objective considers reward function \(\rho _{i}\). We proceed in three steps:

  1. Convert all minimizing objectives \({\mathbb {E}({\# j_{i}, \emptyset })}\) with \({\vartriangleright _{i}} \in \{\le , <\}\) to maximizing objectives, potentially introducing negative rewards.

  2. Compute an optimal scheduler for the maximal expected reward w.r.t. a weighted reward function that considers positive and negative rewards.

  3. Lift further reward finiteness assumptions imposed in Step 2.

5.1.1 From minimizing to maximizing objectives

We convert minimizing objectives \({\mathbb {E}({\# i, \emptyset })}\) with \({\vartriangleright _{i}} \in \{\le , <\}\) to maximizing objectives by negating the considered rewards, and thereby deviate from our definition of MDPs from Sect. 2 by allowing negative rewards.

More precisely, we consider the reward functions \(\overline{\rho }_{1}, \dots , \overline{\rho }_{d}:S\times Act \rightarrow \mathbb {R}\) and relations \({\mathbin {\overline{\vartriangleright }_{}}} = ({\mathbin {\overline{\vartriangleright }_{1}}}, \dots , {\mathbin {\overline{\vartriangleright }_{d}}})\), where for \(i \in \{1, \dots , d\}\), \(s\in S\), and \(\alpha \in Act \):

$$\begin{aligned} \overline{\rho }_{i}(s, \alpha ) = {\left\{ \begin{array}{ll} -\rho _{i}(s, \alpha ) &{} \text {if } {\vartriangleright _{i}} \in \{\le ,<\}\\ \rho _{i}(s, \alpha ) &{} \text {otherwise}\\ \end{array}\right. } \qquad \text {and} \qquad {\mathbin {\overline{\vartriangleright }_{i}}} = {\left\{ \begin{array}{ll} \ge &{} \text {if } {\vartriangleright _{i}} = {\le }\\ > &{} \text {if } {\vartriangleright _{i}} = {<}\\ \vartriangleright _{i} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Let \(\overline{\mathcal {D}} = (S, Act , \mathbf{P} , {s_{0}}, (\overline{\rho }_{1}, \dots , \overline{\rho }_{d}))\). For each scheduler \(\sigma \) for \(\mathcal {D}\) (and \(\overline{\mathcal {D}}\)), \(i \in \{1, \dots , d\}\), and \(p_{i} \in \mathbb {R}\) we have

$$\begin{aligned} \mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{i}, \emptyset ) \le p_{i} \iff -\mathrm {ER}^{\mathcal {D}}_{\sigma }(\rho _{i}, \emptyset ) \ge -p_{i} \iff \mathrm {ER}^{\mathcal {D}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \ge -p_{i} \ . \end{aligned}$$

Lifting to multiple objectives yields for all \(\mathbf{p }= (p_{1}, \dots , p_{d})\in \mathbb {R}^d\):

$$\begin{aligned} achieve ^{\mathcal {D}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \iff achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\mathbin {\overline{\vartriangleright }_{}}\overline{\mathbf{p }}), \end{aligned}$$

where \(\overline{\mathbf{p }} = (\overline{p}_1, \dots , \overline{p}_d)\) with \(\overline{p}_i = -p_i\) if \({\vartriangleright _{i}} \in \{\le , <\}\) and otherwise \(\overline{p}_i = p_i\).

We can thus compute (or approximate) the set of achievable points for \(\overline{\mathcal {D}}\) and \(\mathbin {\overline{\vartriangleright }_{}}\) instead of \(\mathcal {D}\) and \(\vartriangleright _{}\). More concretely, we employ the approach of [28] to approximate the set \( \{\mathbf{p }\in \mathbb {R}^d\mid achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\mathbin {\overline{\vartriangleright }_{}}\mathbf{p }) \} \) where only maximizing objectives are considered. The result for \(\mathcal {D}\) and \(\vartriangleright _{}\) can then be obtained by a simple transformation, essentially multiplying entries of minimizing objectives by \(-1\).
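The transformation itself is a one-liner per objective; a minimal sketch (our own code) negates the reward functions of minimizing objectives and flips their threshold relations, and the same sign flip maps points back and forth:

```python
def flip_minimizing(rewards, relations):
    """Turn minimizing expected total reward objectives into maximizing ones (Sect. 5.1.1).
    rewards:   list of reward functions, each a dict (s, a) -> float
    relations: list of threshold relations from {'<=', '<', '>=', '>'}"""
    flipped_rewards, flipped_relations = [], []
    for rho, rel in zip(rewards, relations):
        if rel in ('<=', '<'):
            flipped_rewards.append({sa: -r for sa, r in rho.items()})   # negate the rewards
            flipped_relations.append('>=' if rel == '<=' else '>')      # flip the relation
        else:
            flipped_rewards.append(dict(rho))
            flipped_relations.append(rel)
    return flipped_rewards, flipped_relations

def flip_point(p, relations):
    """Map a point between the original and the transformed problem by negating the
    entries of minimizing objectives; the mapping is its own inverse."""
    return tuple(-pi if rel in ('<=', '<') else pi for pi, rel in zip(p, relations))
```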

5.1.2 Mixtures of positive and negative rewards

As mentioned above, the approach of [28] requires repeatedly computing an optimal scheduler for weighted reward functions. Considering the MDP \(\overline{\mathcal {D}} = (S, Act , \mathbf{P} , {s_{0}}, (\overline{\rho }_{1}, \dots , \overline{\rho }_{d}))\) from above results in some technical complications due to the presence of positive and negative rewards.

For simplicity, we further strengthen our assumptions on reward finiteness by assuming that all induced expected rewards are finite, i.e., \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \notin \{-\infty , \infty \}\) for all \(\sigma \in \text {TA}^{}\) and \(i \in \{1,\dots ,d\}\). The next section discusses how this assumption can be lifted again.

For a weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\in \mathbb {R}_{\ge 0}^d{\setminus } \{\mathbf{0 }\}\) we consider the weighted reward function \(\rho _{\mathbf{w }}\) given by \(\rho _{\mathbf{w }}(s, \alpha ) = \sum _{i=1}^{d} {w_{i}} \cdot \overline{\rho }_{i}(s,\alpha )\) for \(s\in S\) and \(\alpha \in Act \) and let \(\mathcal {D}^\mathbf{w }= (S, Act , \mathbf{P} , {s_{0}}, (\rho _{\mathbf{w }}))\) denote the MDP which arises from \(\overline{\mathcal {D}}\) by replacing the reward functions \((\overline{\rho }_{1}, \dots , \overline{\rho }_{d})\) by \(\rho _{\mathbf{w }}\). Our goal in this section is to compute an optimal scheduler for \(\mathbf{w }\) (cf. Def. 18), i.e., a scheduler inducing the maximal expected total reward \(\max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\). Since \(\rho _{\mathbf{w }}\) considers both positive and negative rewards, conventional value iteration as considered in [28] yields incorrect results.

Example 15

Consider the MDP \(\mathcal {D}^\mathbf{w }\) with the weighted reward function \(\rho _{\mathbf{w }}\) depicted in Fig. 8a. Action rewards are depicted next to the action label, e.g., \(\rho _{\mathbf{w }}(s_0, \alpha ) = 2\). The maximal expected total reward is obtained for a scheduler \(\sigma _\text {opt}\) that always chooses action \(\alpha \), yielding \(\mathrm {ER}^{}_{\sigma _\text {opt}}(\rho _{\mathbf{w }}, \emptyset ) = 2-1=1\).

On the other hand, value iteration as suggested in [28, Alg. 2] yields a value of 2 and a suboptimal scheduler \(\sigma \) with \(\sigma (s_0) = \beta \). Roughly speaking, this is because value iteration computes the expected reward accumulated within n steps for an increasing value of n. However, for the example MDP, this step-bounded expected reward does not converge to the unbounded expected total reward: For a given step-bound n, an optimal scheduler can avoid collecting the reward \(\rho _{\mathbf{w }}(s_1,\alpha )=-1\) by taking the \(\beta \)-self-loop at \(s_0\) \(n-1\) times and then taking the \(\alpha \)-transition to \(s_1\) only in the very last step.
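The following sketch demonstrates the effect on \(\mathcal {D}^\mathbf{w }\) (our own code; the transition structure of Fig. 8a is reconstructed from Examples 15 and 16): plain step-bounded value iteration converges to 2, whereas the expected total reward of the scheduler that always chooses \(\alpha \) is 1.

```python
# MDP of Fig. 8a, reconstructed from Examples 15 and 16 (deterministic transitions):
# each entry maps (state, action) to (successor, weighted reward rho_w)
P = {('s0', 'alpha'): ('s1', 2.0), ('s0', 'beta'): ('s0', 0.0),
     ('s1', 'alpha'): ('s2', -1.0), ('s2', 'alpha'): ('s2', 0.0)}
actions = {'s0': ['alpha', 'beta'], 's1': ['alpha'], 's2': ['alpha']}

V = {s: 0.0 for s in actions}
for _ in range(100):                       # plain value iteration (n-step optimal rewards)
    V = {s: max(P[(s, a)][1] + V[P[(s, a)][0]] for a in actions[s]) for s in actions}
print(V['s0'])                             # 2.0: the limit of the n-step optima

# Expected total reward of the scheduler that always chooses alpha (the actual optimum):
print(2.0 + (-1.0) + 0.0)                  # 1.0 -- plain value iteration overestimates
```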

Fig. 8: MDP where value iteration yields wrong results and the corresponding 0-EC quotient (cf. Examples 15 to 17)

In the example, the problem is introduced by the action \(\beta \) at state \(s_0\). This action allows a scheduler to stay at \(s_0\) arbitrarily long without collecting any reward. We refer to such model components as 0-ECs.

Definition 19

(End Component, 0-EC) A non-empty set of state-action pairs \(\mathcal {E}\subseteq S\times Act \) is an end component (EC) of \(\mathcal {D}^\mathbf{w }\) if

  1. \(\forall (s, \alpha ) \in \mathcal {E}:\mathbf{P} (s, \alpha , s') > 0\) implies \(s' \in { states (\mathcal {E})}= \{ \tilde{s} \mid \exists (\tilde{s}, \tilde{\alpha }) \in \mathcal {E}\}\) and

  2. the graph \(\big ({ states (\mathcal {E})},\, \big \{ (s, s') \mid \exists (s, \alpha )\in \mathcal {E} :\mathbf{P} (s, \alpha , s') > 0 \big \} \big )\) is strongly connected.

An EC \(\mathcal {E}\) is a 0-EC w.r.t. reward function \(\rho _{\mathbf{w }}\) if \(\rho _{\mathbf{w }}(s,\alpha ) = 0\) for all \((s,\alpha ) \in \mathcal {E}\). A (0-)EC \(\mathcal {E}\) is maximal, if there is no other (0-)EC \(\mathcal {E}'\) with \(\mathcal {E}\subsetneq \mathcal {E}'\).

Example 16

The MDP \(\mathcal {D}^\mathbf{w }\) in Fig. 8a has two maximal 0-ECs w.r.t. the depicted reward function \(\rho _{\mathbf{w }}\): \(\mathcal {E}_1 = \{(s_0, \beta )\}\) and \(\mathcal {E}_2 = \{(s_2,\alpha )\}\).

The set of all maximal ECs of an MDP can be computed efficiently [17]. The set of maximal 0-ECs can be obtained by computing the maximal ECs of a modified MDP in which all transitions (state-action pairs) incurring non-zero reward are erased, as sketched below. However, in this particular case we know that all ECs (reachable from \({s_{0}}\)) actually are 0-ECs: if an EC were not a 0-EC, we could construct a scheduler inducing an infinite expected reward for at least one objective, which would violate our assumption that all induced expected rewards are finite.
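A sketch of that reduction, where `maximal_end_components` stands for some standard maximal-EC routine in the spirit of [17] and is assumed rather than implemented here:

```python
# Reduce the computation of maximal 0-ECs to a maximal-EC computation on a
# modified MDP in which state-action pairs with non-zero reward are erased.
# `maximal_end_components` is an assumed helper implementing, e.g., [17].

def maximal_zero_ecs(P, R, maximal_end_components):
    # keep only state-action pairs with zero reward ...
    restricted = {
        s: {a: succ for a, succ in acts.items() if R[(s, a)] == 0.0}
        for s, acts in P.items()
    }
    # ... and drop states that lose all of their actions
    restricted = {s: acts for s, acts in restricted.items() if acts}
    return maximal_end_components(restricted)
```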

For an EC \(\mathcal {E}\) we define \({ states (\mathcal {E})}:= \{ s\in S\mid \exists \alpha \in Act (s): (s, \alpha ) \in \mathcal {E}\}\) and \({ exits (\mathcal {E})}{:}{=} \{(s,\alpha ) \in S\times Act \mid s\in { states (\mathcal {E})}, \alpha \in Act (s), \text { and } (s,\alpha ) \notin \mathcal {E}\}\).

Intuitively, once a maximal 0-EC \(\mathcal {E}\) is reached, a scheduler can choose to stay within the states of \(\mathcal {E}\) for arbitrarily many steps by only picking actions \(\alpha \) at states \(s\in { states (\mathcal {E})}\) with \((s,\alpha ) \in \mathcal {E}\). Since ECs are strongly connected, it is possible to reach any state \(s\in { states (\mathcal {E})}\) almost surely. At some point, the scheduler may decide to leave the EC by choosing an exiting state-action pair \((s, \alpha ) \in { exits (\mathcal {E})}\). Although this decision could be delayed for arbitrarily many steps, such a delay has no effect on the induced expected total reward since no reward is accumulated in 0-ECs.

Our approach is to replace each maximal 0-EC \(\mathcal {E}\) of \(\mathcal {D}^\mathbf{w }\) by a state \(s_?^\mathcal {E}\) in which a scheduler immediately has to choose either an exiting state-action pair \((s, \alpha ) \in { exits (\mathcal {E})}\) or an action \(\alpha _!\), indicating that the EC should never be left. This procedure coincides with the computation of EC quotients in, e.g., [3, 4, 21, 32].

Definition 20

(0-EC Quotient) The 0-EC quotient of \(\mathcal {D}^\mathbf{w }\) w.r.t. reward function \(\rho _{\mathbf{w }}\) is the MDP \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}} = (S', Act ', \mathbf{P} ', {s_{0}}', (\rho _{\mathbf{w }}'))\) obtained by applying the following steps on \(\mathcal {D}^\mathbf{w }\) for every maximal 0-EC \(\mathcal {E}\):

  1.

    Add two new states \(s_?^\mathcal {E}\) and \(s_!^\mathcal {E}\) and remove all states from \({ states (\mathcal {E})}\).

  2.

    Add a new action \(\alpha _s\) for each \((s,\alpha ) \in { exits (\mathcal {E})}\).

  3.

    Redirect the target of transitions that enter the states in \({ states (\mathcal {E})}\) to \(s_?^\mathcal {E}\), i.e., \(\mathbf{P} '(s,\alpha , s_?^\mathcal {E}) = \sum _{s' \in { states (\mathcal {E})}} \mathbf{P} (s,\alpha ,s')\) for every \((s,\alpha ) \in (S\times Act ) {\setminus } \mathcal {E}\).

  4.

    Redirect the origin of transitions that exit the states in \({ states (\mathcal {E})}\) to \(s_?^\mathcal {E}\), i.e., \(\mathbf{P} '(s_?^\mathcal {E}, \alpha _s, s') = \mathbf{P} (s,\alpha ,s')\) for every \((s,\alpha ) \in { exits (\mathcal {E})}\).

  5.

Add a new action \(\alpha _!\) and set \(\mathbf{P} '(s,\alpha _!,s_!^\mathcal {E}) = 1\) for \(s \in \{s_?^\mathcal {E}, s_!^\mathcal {E}\}\).

  6.

    Restrict the reward function \(\rho _{\mathbf{w }}\) to the remaining state-action pairs and set \(\rho _{\mathbf{w }}'(s_?^\mathcal {E}, \alpha _!) = \rho _{\mathbf{w }}'(s_!^\mathcal {E}, \alpha _!) = 0\) and \(\rho _{\mathbf{w }}'(s_?^\mathcal {E}, \alpha _s) = \rho _{\mathbf{w }}(s, \alpha )\) for all \((s,\alpha ) \in { exits (\mathcal {E})}\).
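The six steps of Def. 20 translate almost directly into code. The sketch below quotients a single maximal 0-EC \(\mathcal {E}\) out of an MDP in the dictionary encoding used in the earlier sketches; iterating it over all maximal 0-ECs yields \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\). State and action names are illustrative.

```python
# Quotient a single maximal 0-EC E out of an MDP (cf. Def. 20).
# P[state][action] = list of (probability, successor); R[(state, action)] = reward.

def quotient_zero_ec(P, R, E, name):
    states_E = {s for s, _ in E}
    exits = [(s, a) for s in states_E for a in P[s] if (s, a) not in E]
    s_q, s_bang = f"q_{name}", f"bang_{name}"          # s_? and s_! (step 1)
    newP = {s: dict(acts) for s, acts in P.items() if s not in states_E}
    newR = {(s, a): r for (s, a), r in R.items() if s not in states_E}
    # Step 3: redirect transitions entering states(E) to s_?.
    for s, acts in newP.items():
        for a, succ in acts.items():
            redirected = sum(p for p, t in succ if t in states_E)
            kept = [(p, t) for p, t in succ if t not in states_E]
            if redirected > 0:
                kept.append((redirected, s_q))
            acts[a] = kept
    # Steps 2, 4, 6: one fresh action per exiting pair, same distribution and reward.
    newP[s_q] = {}
    for s, a in exits:
        newP[s_q][f"{a}_{s}"] = [(p, s_q if t in states_E else t)
                                 for p, t in P[s][a]]
        newR[(s_q, f"{a}_{s}")] = R[(s, a)]
    # Step 5: the "stay forever" action alpha_!.
    newP[s_q]["alpha_!"] = [(1.0, s_bang)]
    newP[s_bang] = {"alpha_!": [(1.0, s_bang)]}
    newR[(s_q, "alpha_!")] = newR[(s_bang, "alpha_!")] = 0.0
    return newP, newR
```

If the initial state lies in \({ states (\mathcal {E})}\), it additionally has to be remapped to \(s_?^\mathcal {E}\); this bookkeeping is omitted in the sketch.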

Example 17

Fig. 8b depicts the 0-EC quotient of the MDP in Fig. 8a w.r.t. the depicted reward function \(\rho _{\mathbf{w }}\).

Considering the 0-EC quotient \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}} = (S', Act ', \mathbf{P} ', {s_{0}}', (\rho _{\mathbf{w }}'))\) instead of \(\mathcal {D}^\mathbf{w }\) preserves maximal expected total rewards, i.e.,

$$\begin{aligned} \max _{\sigma \in \text {TA}^{\mathcal {D}^\mathbf{w }}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset ) = \max _{\sigma ' \in \text {TA}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}_{\sigma '}(\rho _{\mathbf{w }}', \emptyset ). \end{aligned}$$

Moreover, we can transform any scheduler \(\sigma '\) for \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) to a scheduler \(\sigma \) for \(\mathcal {D}^\mathbf{w }\) with the same expected reward, i.e., \( \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset ) = \mathrm {ER}^{\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}}_{\sigma '}(\rho _{\mathbf{w }}', \emptyset ). \) In particular, if \(\sigma '\) chooses an action \(\alpha _s\) at a state \(s_?^\mathcal {E}\), \(\sigma \) can mimic this by choosing \(\alpha \) at state \(s\in { states (\mathcal {E})}\) and enforcing that s is reached almost surely from any other state \(s' \in { states (\mathcal {E})}{\setminus } \{s\}\) of the 0-EC \(\mathcal {E}\). Similarly, if \(\sigma '\) chooses \(\alpha _!\) at \(s_?^\mathcal {E}\), \(\sigma \) can mimic this by only picking actions \(\alpha \) at states \(s\in { states (\mathcal {E})}\) with \((s,\alpha ) \in \mathcal {E}\). We refer to [4, 21] for more details on the correctness of this construction.

Since there are no more ECs in \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) (other than the 0-ECs of the form \(\{(s_!^\mathcal {E}, \alpha _!)\}\), which cannot be left), it can be shown that the value iteration algorithm on \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) approaches the correct expected total rewards [4, 6].
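For the running example, the same value iteration as before now converges to the correct value on the quotient. The structure below is our reading of Fig. 8b and is given only for illustration.

```python
# Value iteration on the 0-EC quotient of the MDP from Fig. 8a (cf. Fig. 8b).
P = {
    "q_E1":    {"alpha_s0": [(1.0, "s1")], "alpha_!": [(1.0, "bang_E1")]},
    "s1":      {"alpha": [(1.0, "q_E2")]},
    "q_E2":    {"alpha_!": [(1.0, "bang_E2")]},
    "bang_E1": {"alpha_!": [(1.0, "bang_E1")]},
    "bang_E2": {"alpha_!": [(1.0, "bang_E2")]},
}
R = {("q_E1", "alpha_s0"): 2.0, ("q_E1", "alpha_!"): 0.0, ("s1", "alpha"): -1.0,
     ("q_E2", "alpha_!"): 0.0, ("bang_E1", "alpha_!"): 0.0, ("bang_E2", "alpha_!"): 0.0}

V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: max(R[(s, a)] + sum(p * V[t] for p, t in succ)
                for a, succ in P[s].items())
         for s in P}
print(V["q_E1"])  # 1.0, matching the optimum from Example 15
```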

5.1.3 Negative infinite rewards

We now lift our strengthened reward finiteness assumptions from the previous section (but still impose the assumptions mentioned on page 29). More precisely, we now allow that \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma _\infty }(\overline{\rho }_{i}, \emptyset )= -\infty \) for some \(i \in \{1, \dots , d\}\) with \({\vartriangleright _{i}} \in \{{\le }, {<} \}\) and some scheduler \(\sigma _\infty \). Observe that such a scheduler does not achieve any point \(\mathbf{p }\in \mathbb {R}^d\) and can therefore be excluded from the analysis. Hence, for the computation of \(\{ \mathbf{p }\in \mathbb {R}^d\mid achieve ^{\overline{\mathcal {D}}}({\mathbb {O}_{}}\vartriangleright _{}\mathbf{p }) \}\) it suffices to restrict the analysis to schedulers in \(\{ \sigma \in \text {TA}^{}\mid \forall i \in \{1, \dots , d\} :\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma }(\overline{\rho }_{i}, \emptyset ) \ne -\infty \}\). Our assumptions on reward finiteness imply that this set is not empty.

If the considered weight vector \(\mathbf{w }= ({w_{1}}, \dots , {w_{d}})\) satisfies \(w_i > 0\) for all \(i \in \{1, \dots , d\}\), our approach from the previous section can be applied without further modifications: If \(\mathrm {ER}^{\overline{\mathcal {D}}}_{\sigma _\infty }(\overline{\rho }_{i}, \emptyset )= -\infty \) for some \(i \in \{1, \dots , d\}\) and some scheduler \(\sigma _\infty \), we also have \(\mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma _\infty }(\rho _{\mathbf{w }}, \emptyset )= -\infty < \max _{\sigma \in \text {TA}^{}} \mathrm {ER}^{\mathcal {D}^\mathbf{w }}_{\sigma }(\rho _{\mathbf{w }}, \emptyset )\).

However, if \(w_i = 0\) for some \(i \in \{1, \dots , d\}\), there might be a 0-EC w.r.t. \(\rho _{\mathbf{w }}\) that is not a 0-EC w.r.t. \(\overline{\rho }_{i}\). This is because the rewards of \(\overline{\rho }_{i}\) are not considered in the weighted reward function \(\rho _{\mathbf{w }}\). As a consequence, a scheduler for \(\mathcal {D}^\mathbf{w }_{{\setminus } \text {0-EC}}\) as defined above might choose action \(\alpha _!\) at a state \(s_?^\mathcal {E}\) for an EC \(\mathcal {E}\) that is not a 0-EC w.r.t. \(\overline{\rho }_{i}\). Transforming such a scheduler back to \(\overline{\mathcal {D}}\) can then induce an expected total reward of \(-\infty \) w.r.t. \(\overline{\rho }_{i}\). We exclude such schedulers during the computation by tweaking the elimination of 0-ECs in Def. 20: the action \(\alpha _!\) is only inserted (Step 5) if there is a scheduler for \(\overline{\mathcal {D}}\) that stays in the given EC forever and yields a finite expected reward for all individual objectives.

5.2 \(\text {ds}\)-bounded reachability

We generalize reachability objectives from Def. 6 to \(\text {ds}\)-bounded reachability objectives of the form \({\mathbb {P}({\lozenge ^{J}_{\text {ds}} G})}\) which concern \(\text {ds}\)-bounded reachability probabilities (cf. Def. 15). As explained in Sect. 4.3, multi-objective queries for MA considering timed reachability objectives can be approximated by checking \(\text {ds}\)-bounded reachability objectives on a digitization of the MA. This section extends the approach of [28] to process such objectives.

Assume the digitization \({\mathcal {M}_{\delta }}= (S, Act , \mathbf{P} _{\delta }, {s_{0}}, (\rho _{1}^\delta , \dots , \rho _{\ell }^\delta ))\) of an MA \(\mathcal {M}\) w.r.t. \(\delta > 0\) and \(\text {ds}\)-bounded reachability objectives \({\mathbb {P}({\lozenge ^{\le k_{1}}_{\text {ds}} G_{1}})}, \dots , {\mathbb {P}({\lozenge ^{\le k_{d}}_{\text {ds}} G_{d}})}\). Other digitization step bounds (e.g. lower step bounds) are treated similarly. We further denote by \(\text {MS}\subseteq S\) the set of states of \({\mathcal {M}_{\delta }}\) that represent the Markovian states of the initially considered MA. We present a transformation from \(\text {ds}\)-bounded reachability objectives to untimed reachability objectives. The idea is to store the number of visited Markovian states in the state space. Upon reaching a state in \(G_i\), this information allows us to distinguish the cases where the step bound \(k_i\) has been exceeded or not. The procedure is based on the idea of [35, Algorithm 1] for single-objective MA.

For the largest occurring step bound \(k_{\text {max}}= \max _i k_i\), let \(\mathcal {D}\) be the MDP consisting of \(k_{\text {max}}+ 2\) copies of \({\mathcal {M}_{\delta }}\) such that transitions to states \(s'\in \text {MS}\) are redirected to the next copy. Formally, \(\mathcal {D}= (S\times \{0, \dots , k_{\text {max}}+ 1 \}, Act , \mathbf{P} ,({s_{0}}, k_\text {init}), (\rho _{1}, \dots , \rho _{\ell }))\) where

  • \(k_\text {init} = 1\) if \({s_{0}}\in \text {MS}\), and \(k_\text {init} = 0\) otherwise,

  • \( \mathbf{P} ((s,k),\,\alpha ,\,(s',k')) = \mathbf{P} _{\delta }(s,\alpha ,s')\) if \(s' \in \text {MS}\) and \(k' = \min (k{+} 1, k_{\text {max}}{+}1)\),

  • \( \mathbf{P} ((s,k),\,\alpha ,\,(s',k)) = \mathbf{P} _{\delta }(s,\alpha ,s')\) if \(s' \notin \text {MS}\),

  • \(\mathbf{P} ((s,k),\,\alpha ,\,(s',k')) = 0\) in all other cases, and

  • every reward function \(\rho _{i}\) satisfies \(\rho _{i}((s,k),\,\alpha ) = \rho _{i}^\delta (s,\alpha )\).

The second component of a state \((s,k)\) of \(\mathcal {D}\) reflects the number of Markovian states visited so far. Hence, eventually reaching \(G_i\) in \({\mathcal {M}_{\delta }}\) with at most \(k_i\) digitization steps is equivalent to reaching \(G'_i = \{ (s,k) \mid s \in G_i \text { and } k \le k_i \}\) in \(\mathcal {D}\). In particular, there is a mapping from schedulers \(\sigma _\delta \) for \({\mathcal {M}_{\delta }}\) to schedulers \(\sigma \) for \(\mathcal {D}\) such that \(\text {Pr}^{{\mathcal {M}_{\delta }}}_{\sigma _\delta }(\lozenge ^{\le k_i}_{\text {ds}} G_i) = \text {Pr}^{\mathcal {D}}_{\sigma }(\lozenge ^{}G'_i)\).
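A sketch of this unfolding; `P_delta` is the transition function of the digitization and `MS` the set of (representatives of) Markovian states, both given in the dictionary encoding of the earlier sketches (an illustrative choice, not Storm's representation).

```python
# Unfold a digitization M_delta into the MDP D by tracking the number of
# Markovian states visited so far, capped at k_max + 1.

def unfold(P_delta, MS, s0, k_max):
    cap = k_max + 1
    bump = lambda k: min(k + 1, cap)
    init = (s0, 1 if s0 in MS else 0)          # k_init as defined above
    P, frontier = {}, [init]
    while frontier:                            # build only the reachable part
        s, k = frontier.pop()
        if (s, k) in P:
            continue
        P[(s, k)] = {}
        for a, succ in P_delta[s].items():
            new_succ = []
            for prob, t in succ:
                kt = bump(k) if t in MS else k # Markovian successors advance the counter
                new_succ.append((prob, (t, kt)))
                frontier.append((t, kt))
            P[(s, k)][a] = new_succ
    return P, init
```

Only the reachable fragment is built, which already mitigates part of the blow-up discussed next.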

Fig. 9 Transformation for \(\text {ds}\)-bounded reachability objectives (cf. Example 18)

Example 18

We illustrate the construction for the digitization \({\mathcal {M}_{\delta }}\) depicted in Fig. 9a and the objective \({\mathbb {P}({\lozenge ^{\le 2}_{\text {ds}} \{s_2\}})}\). States that correspond to Markovian states of the original MA are depicted with rectangles.

Figure 9b shows the reachable states of the MDP \(\mathcal {D}\) as given by the construction above.

With the construction above, a digitization \({\mathcal {M}_{\delta }}\) with objectives \({\mathbb {O}_{}}\) can be transformed to an MDP \(\mathcal {D}\) and untimed reachability or expected reward objectives \({\mathbb {O}_{}}'\) such that

$$\begin{aligned} achieve ^{{\mathcal {M}_{\delta }}}({\mathbb {O}_{}}) = achieve ^{\mathcal {D}}({\mathbb {O}_{}}'). \end{aligned}$$

The latter can be computed using the approach of [28]. In practice, however, this is not feasible since \(k_{\text {max}}\) can be large, leading to huge model sizes. To avoid this blow-up, the different copies of \({\mathcal {M}_{\delta }}\) are analyzed individually, as sketched below. For \(k\in \{0, \dots , k_{\text {max}}{+} 1\}\) let \(S_k\) denote the set of states \((s, k)\) of \(\mathcal {D}\) with second entry \(k\). A transition emerging from a state in \(S_k\) can only point to a state in \(S_k\cup S_{k{+}1}\). Consequently, for computing optimal values for the states in \(S_k\), only the results for the states in \(S_{k{+}1}\) are relevant. We thus analyze the sub-models of \(\mathcal {D}\) induced by the state sets \(S_{k_{\text {max}}{+} 1}, S_{k_{\text {max}}}, \dots , S_1, S_0\) sequentially in this order. Details on the sequential approach can be found in [35] for single-objective MA and in [33] for multi-objective MDPs.
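A sketch of the sequential scheme for a single \(\text {ds}\)-bounded reachability objective \({\mathbb {P}({\lozenge ^{\le k}_{\text {ds}} G})}\) on the unfolded MDP from the previous sketch. The fixed number of inner sweeps is a simplification that suffices under the usual assumption that no cycles consisting solely of probabilistic states occur within a layer; the actual implementation follows [33, 35].

```python
# Layer-wise computation of the maximal ds-bounded reachability probability on
# the unfolded MDP: P[(s, k)][a] = list of (prob, (s', k')); layers differ in k.

def layered_reachability(P, G, k_bound, k_max):
    val = {}                                    # values of already solved layers
    for k in range(k_max + 1, -1, -1):          # k_max+1, k_max, ..., 0
        layer = [sk for sk in P if sk[1] == k]
        cur = {sk: 0.0 for sk in layer}
        # len(layer)+1 sweeps suffice if the within-layer graph is acyclic
        # (no cycles of probabilistic states); this is a sketch, not a fixpoint check.
        for _ in range(len(layer) + 1):
            for s, _k in layer:
                if s in G and k <= k_bound:     # goal reached within the bound
                    cur[(s, k)] = 1.0
                    continue
                cur[(s, k)] = max(
                    sum(p * cur.get(t, val.get(t, 0.0)) for p, t in succ)
                    for a, succ in P[(s, k)].items()
                )
        val.update(cur)                          # layer k is now solved
    return val

# Illustrative use with the unfolding of Fig. 9a (hypothetical names):
# values = layered_reachability(P, G={"s2"}, k_bound=2, k_max=2)
# values[init] is the maximal ds-bounded reachability probability.
```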

Fig. 10 Illustration of digitization errors

Dealing with digitization errors Recall that the analysis of the digitization \({\mathcal {M}_{\delta }}\) only approximates the timed reachability probabilities of the MA. More precisely, given a weight vector \(\mathbf{w }\), the approach outlined above computes an optimal point \(\mathbf{p }\) for \(\mathbf{w }\) on \({\mathcal {M}_{\delta }}\) and the digitized query. Applying our results from Sect. 4.3 (in particular Theorem 5), we can lift the point \(\mathbf{p }\) to the original time-bounded reachability query by considering the hyperrectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\). Fig. 10a illustrates this, where \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) is depicted by the blue rectangle. The point \(\mathbf{q }\) at the bottom left of the rectangle is known to be achievable, implying that all points in the green area are achievable, whereas the points in the red area are known to be unachievable. This introduces a gap between the achievable and unachievable areas. We measure the size \(\gamma \) of this gap as the smallest distance between the achievable and the unachievable points, i.e., the distance between the point \(\mathbf{q }\) and the red area in Fig. 10a. The value of \(\gamma \) depends on the size of the rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p })\) and thus on the choice of the digitization constant \(\delta \). However, \(\gamma \) also depends on the considered weight vector \(\mathbf{w }\), which motivates a dynamic selection of \(\delta \), depending on the currently considered weight vector.

Example 19

Assume that an optimal point \(\mathbf{p }\) has been computed for the weight vector \(\mathbf{w }^{(1)} = (0,1)\), resulting in the (un-)achievable points and gap \(\gamma \) depicted in Fig. 10b. Note that \(\gamma \) is not affected by the comparably large approximation error for the first objective (x-axis). We continue the example in Fig. 10c. To achieve the same gap \(\gamma \) also for the weight vector \(\mathbf{w }^{(2)}=(1,0)\), a smaller digitization constant has been chosen so that the rectangle \(\varepsilon ({\mathbb {O}_{}}, \mathbf{p }^{(2)})\) becomes smaller. We assume that, by coincidence, the analysis for \(\mathbf{w }^{(2)}=(1,0)\) produces the same point \(\mathbf{q }\) as in the previous step from Fig. 10b. This is possible, for example, when the nondeterminism of the model is spurious. Observe that the largest distance between a point in the unknown (white) area and the green area is \(\sqrt{\gamma ^2 + \gamma ^2} = \sqrt{2} \cdot \gamma \). Put differently, the obtained approximation in Fig. 10c is a \((\sqrt{2} \cdot \gamma )\)-approximation of the set of achievable points for the considered MA.

In general, we can show the following: if all gaps \(\gamma \) are kept below \(\eta / \sqrt{d}\), where \(\eta > 0\) is the desired precision and \(d\) the number of objectives, the approach of [28] converges to an \(\eta \)-approximation of the set of achievable points.

To obtain an \(\eta \)-approximation, our implementation (discussed in Sect. 6) therefore uses the following heuristic for selecting the digitization constants: given a weight vector \(\mathbf{w }\), we choose the largest possible digitization constant \(\delta \) for which the resulting gap \(\gamma \) w.r.t. \(\mathbf{w }\) is at most \((\eta / \sqrt{d}) \cdot 0.9\). Here, the factor 0.9 ensures that we do not have to explore the full set of achievable points of \({\mathcal {M}_{\delta }}\), which otherwise could result in analyzing an unnecessarily large number of weight vectors.
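A sketch of this selection, where `gap_for(delta, w)` stands for the (assumed, not implemented) evaluation of the gap \(\gamma \) via the error bounds of Theorem 5; the geometric search is a simplification of picking the largest feasible \(\delta \).

```python
import math

def choose_delta(gap_for, w, eta, d, delta_start=0.1, factor=0.5):
    """Pick a large digitization constant delta whose gap w.r.t. weight vector w
    stays below (eta / sqrt(d)) * 0.9; gap_for(delta, w) is an assumed callback."""
    threshold = (eta / math.sqrt(d)) * 0.9
    delta = delta_start
    # Assumes the gap vanishes as delta -> 0, so the loop terminates.
    while gap_for(delta, w) > threshold:
        delta *= factor
    return delta
```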

6 Experimental evaluation

Implementation We implemented multi-objective model checking of MAs in Storm  [22, 36]. The input model is given in the PRISM language and translated into a sparse representation. For an MA \(\mathcal {M}\), the implementation performs a multi-objective analysis on the underlying MDP \({\mathcal {M}_\mathcal {D}}\) or a digitization \({\mathcal {M}_{\delta }}\) and infers (an approximation of) the achievable points of \(\mathcal {M}\) by exploiting the results from Sect. 4. For computing the achievable points of \({\mathcal {M}_\mathcal {D}}\) and \({\mathcal {M}_{\delta }}\), we apply the approach of [28] with the extensions explained in Sect. 5.

All material to replicate the experiments is available at [45]. The implementation is part of \(\texttt {Storm} \) since release 1.6.3, available at http://stormchecker.org.

Setup Our implementation uses a single core of an Intel Xeon Platinum 8160 CPU, with memory limited to 20 GB of RAM. The timeout (TO) is two hours. For a model, a set of objectives, and a precision \(\eta \in \mathbb {R}_{> 0}\), we measure the time to compute an \(\eta \)-approximation of the set of achievable points. This set-up coincides with Pareto queries as considered in [28]. The digitization constant \(\delta \) is chosen heuristically as discussed in Sect. 5.2. We set the precision for value iteration to \(\varepsilon = 10^{-6}\). As in [28], we use the conventional, unsound variant of value iteration. The use of improved algorithms providing sound precision guarantees [32] is left for future work.

Results for MAs We consider four case studies: (i) a job scheduler [12], see Sect. 1; (ii) a polling system [48, 50] containing a server processing jobs that arrive at two stations; (iii) a video streaming client buffering received packages and deciding when to start playback; and (iv) a randomized mutual exclusion algorithm [50], a variant of [42] with a process-dependent random delay in the critical section. Details on the benchmarks and the objectives are given in App. E.1.

Table 1 Experimental results for multi-objective MAs

Tab. 1 lists the results. For each instance we give the defining constants, the number of states of the MA, and the precision \(\eta \) of the computed approximation. A multi-objective query is given by the triple \((l, m, n)\), indicating \(l\) untimed, \(m\) expected reward, and \(n\) timed objectives. For each MA and query we report the total run-time of our implementation (time) and the number of vertices of the obtained under-approximation (pts).

Queries analyzed on the underlying MDP are solved efficiently on large models with up to millions of states. For timed objectives the run-times increase drastically due to the costly analysis of digitized reachability objectives on the digitization, cf. [30]. Queries with up to four objectives can be dealt with within the time limit. Furthermore, for an approximation that is one order of magnitude more precise, the number of vertices of the result increases by approximately a factor of three. In addition, a smaller digitization constant then has to be considered, which often leads to timeouts in experiments with timed objectives.

Comparison with PRISM  [39] and IMCA  [30]. We compared the performance of our implementation with both PRISM and IMCA.

For the comparison with PRISM (which does not support MAs), we considered the multi-objective MDP benchmarks from [28, 29]. We conducted our experiments on PRISM with both variants of the value-iteration-based implementation (standard and Gauss-Seidel) and chose the faster variant for each benchmark instance. For all experiments the approximation precision \(\eta = 0.001\) was used.

For the comparison with IMCA (which does not support multi-objective queries) we used the benchmarks from Tab. 1, restricted to a single objective. The experiments on IMCA were conducted with and without value iteration, and we chose the faster variant for each benchmark instance. For timed reachability objectives, the precision \(\eta = 0.001\) was used in all experiments.

Verification times are summarized in Fig. 11: for points above the diagonal, our implementation is faster. Both implementations are based on [28]. We observe that our implementation is competitive with these tools. Further details are given in App. E.2 and App. E.3.

Fig. 11 Verification times (in seconds) of our implementation and other tools

7 Conclusion

We considered multi-objective verification of Markov automata, including in particular timed reachability objectives. The next steps are to apply our algorithms to the manifold applications of MAs, such as generalized stochastic Petri nets, in order to enrich the analysis possibilities for such nets, and to investigate whether recent advances in single-objective timed reachability analysis, in particular the methods of [14, 15], can be lifted to the multi-objective case.