1 Introduction

Two-player games on graphs provide the mathematical foundation to study many important problems in computer science. Game-theoretic formulations have especially proved useful for synthesis [22, 37, 39], verification [3], refinement [33], and compatibility checking [23] of reactive systems, as well as in analysis of emptiness of automata [41].

Games played on graphs are repeated games that proceed for an infinite number of rounds. The state space of the graph is partitioned into player 1 states and player 2 states (player 2 is the adversary of player 1). The game starts at an initial state, and if the current state is a player 1 (resp. player 2) state, then player 1 (resp. player 2) chooses an outgoing edge. This choice is made according to a strategy of the player: given the sequence of visited states, a pure (resp. randomized) strategy chooses an outgoing edge (resp. a probability distribution over outgoing edges). This process of choosing edges is repeated forever and gives rise to an outcome of the game, called a play, which consists of the infinite sequence of states that are visited. When randomized strategies are used, there is in general not a unique outcome but a set of possible outcomes, as the choice of edges is stochastic rather than deterministic.

Traditionally, games on graphs have been studied with Boolean objectives such as reachability, liveness, \(\omega \)-regular conditions formalized as the canonical parity objectives, strong fairness objectives, etc. [28, 29, 31, 32, 41, 44]. While games with quantitative objectives have been studied in the game theory literature [27, 35, 45], their application in synthesis and other problems in verification is quite recent. The two classical quantitative objectives that are most relevant in verification and synthesis are the mean-payoff and energy objectives. In games on graphs with quantitative objectives, the game graph is equipped with a weight function that assigns integer-valued weights to every edge. For mean-payoff objectives, the goal of player 1 is to ensure that the long-run average of the weights is above a threshold. For energy objectives, the goal of player 1 is to ensure that the sum of the weights stays above 0 at all times. In applications of verification and synthesis, the quantitative objectives that typically arise are (i) multi-dimensional quantitative objectives (i.e., conjunction of several quantitative objectives), e.g., to express properties like the average response time between a request and a grant is below a given threshold \(\nu _1\), and the average number of unnecessary grants is below threshold \(\nu _2\); and (ii) conjunction of quantitative objectives with a Boolean objective, such as a mean-payoff parity objective that can express properties like the average response time is below a threshold along with satisfying a liveness property. In summary, the quantitative objectives can express properties related to resource requirements, performance, and robustness; multiple objectives can express several, potentially dependent or conflicting, requirements; and the Boolean objective specifies functional properties such as liveness or fairness. The game-theoretic framework of multi-dimensional quantitative games and games with conjunction of quantitative and Boolean objectives has recently been shown to have many applications in verification and synthesis, such as synthesizing systems with quality guarantees [6], synthesizing robust systems [7], performance-aware synthesis of concurrent data structures [13], analyzing permissivity in games and synthesis [11], simulation between quantitative automata [18], generalizing Boolean simulation to quantitative simulation distance [14], etc. Moreover, multi-dimensional energy games are equivalent to a decidable class of games on vector addition systems with states (VASS). This model is equivalent to games over multi-counter systems and Petri nets [12].

In the literature, there are many recent works on the theoretical analysis of multi-dimensional quantitative games, such as mean-payoff parity games [11, 20], energy-parity games [16], multi-dimensional energy games [19], and multi-dimensional mean-payoff games [19, 43]. Most of these works focus on establishing the computational complexity of the problem of deciding if player 1 has a winning strategy. From the perspective of synthesis and other related problems in verification, the most important problem is to obtain a witness finite-memory winning strategy (if one exists). The winning strategy in the game corresponds to the desired controller for (or implementation of) the system in synthesis, and for implementability a finite-memory strategy is essential. In this work we consider the problem of finite-memory strategy synthesis in multi-dimensional quantitative games in conjunction with parity objectives, and the problem of existence of memory-efficient randomized strategies for such games. These are some of the core and foundational problems in the emerging theory of quantitative verification and synthesis.

Our contributions In this work, we give an extended presentation of the results of Chatterjee et al. [21], the first study of multi-dimensional energy and mean-payoff objectives in conjunction with parity objectives. The conjunction of parity objectives with multi-dimensional quantitative objectives had not been considered before [21]. Since we consider the synthesis of finite-memory strategies, it follows from the results of Chatterjee et al. [19] that the two problems (multi-dimensional energy with parity and multi-dimensional mean-payoff with parity) are equivalent. Our main results for finite-memory strategy synthesis for multi-dimensional energy parity games are as follows. (i) Optimal memory bounds. We first show that memory of exponential size is sufficient in multi-dimensional energy parity games. Our result is a significant improvement over the bound that can be obtained naively from results known in the literature, which is triple exponential, even in the case of multi-dimensional energy games without parity. Second, we show a matching lower bound by presenting a family of game graphs where exponential memory is necessary in multi-dimensional energy games (without parity), even when all the transition weights belong to \(\{ -1,0,+1 \}\). Thus we establish optimal memory bounds for the finite-memory strategy synthesis problem. (ii) Symbolic and incremental algorithm. We present a symbolic algorithm (in the sense of Doyen and Raskin [25], i.e., using a compact antichain representation of sets by their minimal elements) to compute a finite-memory winning strategy, if one exists, for multi-dimensional energy parity games. Our algorithm is parameterized by the range of energy levels to consider during its execution. So, we can use it in an incremental approach: first, we search for finite-memory winning strategies with a small range, and increment the range only when necessary. We also establish a bound on the maximal range to consider which ensures completeness of the incremental approach. In the worst case the algorithm requires exponential time. Since exponential size memory is required (and the decision problem is coNP-complete [19]), the worst-case exponential bound can be considered optimal. Moreover, as our algorithm is symbolic and incremental, it is expected to be efficient in most relevant problems in practice. (iii) Randomized strategies. We also consider when (pure) finite-memory strategies can be traded for conceptually much simpler randomized strategies. We show that for energy objectives randomization is not helpful (as energy objectives are similar in spirit to safety objectives), even with only one player, nor is it helpful for two-player multi-dimensional mean-payoff objectives. However, randomized memoryless strategies suffice for one-player multi-dimensional mean-payoff parity games. For the important special case of mean-payoff parity objectives (the conjunction of a single mean-payoff objective and a parity objective), we show that in games, finite-memory strategies can be traded for randomized memoryless strategies.

Related works This paper extends the results presented in its preceding conference version [21] and gives a full presentation of the technical details. Games with a single mean-payoff objective have been studied in [27, 45], and games with a single energy objective in [15]; their equivalence was established in [10]. One-dimensional mean-payoff parity games have been studied in [20]: an exponential algorithm was given to decide if there exists a winning strategy (which in general was shown to require infinite memory), and an improved algorithm was presented in [11]. One-dimensional energy parity games have been studied in [16]: it was shown that deciding the existence of a winning strategy is in NP \(\cap \) coNP, and an exponential algorithm was given. It was also shown in [16] that, for one-dimensional energy parity objectives, finite-memory strategies with exponential memory are sufficient, and that the decision problem for mean-payoff parity objectives can be reduced to energy parity objectives. Games on VASS with several different winning objectives have been studied in [12], and from the results of Brázdil et al. [12] it follows that in multi-dimensional energy games, winning strategies with finite memory are sufficient (and a triple exponential bound on memory can be derived from the results). The complexity of multi-dimensional energy and mean-payoff games was studied in [19, 43]. It was shown in [19] that in general, winning strategies in multi-dimensional mean-payoff games require infinite memory, whereas for multi-dimensional energy games, finite-memory strategies are sufficient. Moreover, for finite-memory strategies, multi-dimensional mean-payoff and energy games coincide, and the optimal computational complexity of deciding the existence of a winning strategy was established as coNP-complete [19, 43]. Multi-dimensional mean-payoff games with infinite-memory strategies were studied in [43], and optimal computational complexity results were established. Various decision problems over multi-dimensional energy games were studied in [30].

2 Preliminaries

We consider two-player game structures and denote the two players by \(\mathcal{P _{1}} \) and \(\mathcal{P _{2}} \).

Multi-weighted two-player game structures A multi-weighted two-player game structure is a tuple \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) where (i) \(S_{1}\) and \(S_{2}\) resp. denote the finite sets of states belonging to \(\mathcal{P _{1}} \) and \(\mathcal{P _{2}} \), with \(S_{1}\cap S_{2}= \emptyset \); (ii) \(s_{init}\in S= S_{1}\cup S_{2}\) is the initial state; (iii) \(E\subseteq S\times S\) is the set of edges s.t. for all \(s \in S\), there exists \(s^{\prime } \in S\) s.t. \((s, s^{\prime }) \in E\); (iv) \(k \in \mathbb{N }\) is the dimension of the weight vectors; and (v) \(w:E\rightarrow \mathbb{Z }^{k}\) is the multi-weight labeling function. The game structure \(G\) is one-player if \(S_{2}= \emptyset \). A play in \(G\) is an infinite sequence of states \(\pi = s_{0}s_{1}s_{2}\ldots {}\) s.t. \(s_{0} = s_{init}\) and for all \(i \ge 0\), we have \((s_{i}, s_{i+1}) \in E\). The prefix up to the \(n\)-th state of play \(\pi = s_{0}s_{1}\ldots {}s_{n}\ldots {}\) is the finite sequence \(\pi (n) = s_{0}s_{1}\ldots {}s_{n}\). Let \(\mathsf{First }(\pi (n))\) and \(\mathsf{Last }(\pi (n))\) resp. denote \(s_{0}\) and \(s_{n}\), the first and last states of \(\pi (n)\). A prefix \(\pi (n)\) belongs to \(\mathcal{P }_{i}, i \in \lbrace 1, 2\rbrace \), if \(\mathsf{Last }(\pi (n)) \in S_{i}\). The set of plays of \(G\) is denoted by \(\mathsf{Plays }(G)\) and the corresponding set of prefixes is denoted by \(\mathsf{Prefs }(G)\). The set of prefixes that belong to \(\mathcal{P }_{i}\) is denoted by \(\mathsf{Prefs }_{i}(G)\). The energy level vector of a sequence of states \(\rho = s_{0}s_{1}\ldots {}s_{n}\) s.t. for all \(0 \le i < n\), we have \((s_{i}, s_{i+1}) \in E\), is \(\mathsf{EL }(\rho ) = \sum _{i = 0}^{n - 1} w(s_{i}, s_{i+1})\), and the mean-payoff vector of a play \(\pi = s_{0}s_{1}\ldots {}\) is \(\mathsf{MP }(\pi ) = \liminf _{n \rightarrow \infty } \frac{1}{n} \mathsf{EL }(\pi (n))\).
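
To make these definitions concrete, here is a minimal sketch in Python of a multi-weighted game structure together with the energy level \(\mathsf{EL }\) and the finite-prefix average underlying \(\mathsf{MP }\); the names (GameStructure, energy_level, mean_payoff_prefix) and the toy game are illustrative only and are not part of the formal development.

```python
# Illustrative sketch of a multi-weighted two-player game structure.
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]

class GameStructure:
    def __init__(self, s1, s2, s_init, edges: Dict[Tuple[str, str], Vector], k: int):
        self.s1, self.s2, self.s_init, self.edges, self.k = set(s1), set(s2), s_init, edges, k
        assert not self.s1 & self.s2, "S1 and S2 must be disjoint"
        assert all(len(w) == k for w in edges.values())

def energy_level(g: GameStructure, rho: List[str]) -> Vector:
    """EL(rho): componentwise sum of the weights along the finite path rho."""
    lvl = [0] * g.k
    for s, t in zip(rho, rho[1:]):
        for j, wj in enumerate(g.edges[(s, t)]):
            lvl[j] += wj
    return tuple(lvl)

def mean_payoff_prefix(g: GameStructure, rho: List[str]) -> Tuple[float, ...]:
    """EL(rho)/n: the average whose liminf defines MP on an infinite play."""
    n = len(rho) - 1
    return tuple(x / n for x in energy_level(g, rho))

# Usage on a 2-dimensional toy game: s0 (P1) -> s1 (P2) -> s0.
g = GameStructure({"s0"}, {"s1"}, "s0",
                  {("s0", "s1"): (1, -2), ("s1", "s0"): (-1, 3)}, k=2)
print(energy_level(g, ["s0", "s1", "s0"]))        # (0, 1)
print(mean_payoff_prefix(g, ["s0", "s1", "s0"]))  # (0.0, 0.5)
```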

Parity A game structure \(G\) is extended with a priority function \({p:S\rightarrow \mathbb{N }}\) to the structure \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \). Given a play \(\pi = s_{0}s_{1}s_{2}\ldots {}\), we define \(\mathsf{Inf }(\pi ) = \left\{ s \in S \;\vert \; \forall \, m \ge 0, \exists \, n > m \text { s.t. } s_{n} = s\right\} \), the set of states that appear infinitely often along \(\pi \). The parity of a play \(\pi \) is defined as \(\mathsf{Par }(\pi ) = \min \left\{ p(s) \;\vert \; s \in \mathsf{Inf }(\pi ) \right\} \). In the following definitions, we denote any game by \(G_{p}\) with no loss of generality.

Strategies Given a finite set \(A\), a probability distribution on \(A\) is a function \(p :A \rightarrow [0, 1]\) s.t. \(\sum _{a\in A} p(a) = 1\). We denote the set of probability distributions on \(A\) by \(\mathcal D (A)\). A pure strategy for \(\mathcal{P }_{i}, i \in \lbrace 1, 2\rbrace \), in \(G_{p}\) is a function \(\lambda _{i} :{\mathsf{Prefs }_{i}(G_{p})} \rightarrow S\) s.t. for all \(\rho \in {\mathsf{Prefs }_{i}(G_{p})} \), we have \((\mathsf{Last }(\rho ), \lambda _{i}(\rho )) \in E\). A (behavioral) randomized strategy is a function \(\lambda _{i} :{\mathsf{Prefs }_{i}(G_{p})} \rightarrow \mathcal D (S)\) s.t. for all \(\rho \in {\mathsf{Prefs }_{i}(G_{p})} \), we have \(\left\{ (\mathsf{Last }(\rho ), s) \;\vert \; s \in S, \lambda _{i}(\rho )(s) > 0\right\} \subseteq E\). A pure strategy \(\lambda _{i}\) for \(\mathcal{P }_{i}\) has finite memory if it can be encoded by a deterministic Moore machine \((M,m_0,\alpha _u,\alpha _n)\) where \(M\) is a finite set of states (the memory of the strategy), \(m_0 \in M\) is the initial memory state, \(\alpha _u :M \times S \rightarrow M\) is an update function, and \(\alpha _n :M \times S_{i} \rightarrow S\) is the next-action function. If the game is in \(s \in S_{i}\) and \(m \in M\) is the current memory value, then the strategy chooses \(s^{\prime } = \alpha _n(m,s)\) as the next state of the game. When the game leaves a state \(s \in S\), the memory is updated to \(\alpha _u(m,s)\). Formally, \(\left\langle M, m_0, \alpha _u, \alpha _n\right\rangle \) defines the strategy \(\lambda _{i}\) s.t. \(\lambda _{i}(\rho \cdot s) = \alpha _n(\hat{\alpha }_u(m_0, \rho ), s)\) for all \(\rho \in S^*\) and \(s \in S_{i}\), where \(\hat{\alpha }_u\) extends \(\alpha _u\) to sequences of states as expected. A pure strategy is memoryless if \(\vert M\vert = 1\), i.e., it does not depend on history but only on the current state of the game. Similar definitions hold for finite-memory randomized strategies, s.t. the next-action function \(\alpha _n\) is randomized, while the update function \(\alpha _u\) remains deterministic. We resp. denote by \(\varLambda _{i}, \varLambda ^{PF}_{i}, \varLambda ^{PM}_{i}, \varLambda ^{RM}_{i}\) the sets of general (i.e., possibly randomized and infinite-memory), pure finite-memory, pure memoryless and randomized memoryless strategies for player \(\mathcal{P }_{i}\).
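
The following small sketch illustrates how a Moore machine \((M, m_0, \alpha _u, \alpha _n)\) induces a pure finite-memory strategy; the MooreStrategy class and the concrete alternating strategy at the end are invented purely for illustration.

```python
# Illustrative sketch of a pure finite-memory strategy encoded as a Moore machine.
from typing import Callable, List

class MooreStrategy:
    def __init__(self, m0, alpha_u: Callable, alpha_n: Callable):
        self.m0, self.alpha_u, self.alpha_n = m0, alpha_u, alpha_n

    def choose(self, prefix: List[str]) -> str:
        """lambda_i(rho . s) = alpha_n(hat_alpha_u(m0, rho), s) for a prefix ending in S_i."""
        m = self.m0
        for state in prefix[:-1]:           # hat_alpha_u applied to rho
            m = self.alpha_u(m, state)
        return self.alpha_n(m, prefix[-1])  # next-action on the last state

# Example: memory {0, 1} flips on every visited state; the next action alternates.
strategy = MooreStrategy(
    m0=0,
    alpha_u=lambda m, s: 1 - m,
    alpha_n=lambda m, s: "t" if m == 0 else "u",
)
print(strategy.choose(["s"]))                 # 't': no update yet, memory = 0
print(strategy.choose(["s", "t", "u", "s"]))  # 'u': memory flipped three times, now 1
```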

Given a prefix \(\rho \in {\mathsf{Prefs }_{i}(G_{p})} \) belonging to player \(\mathcal{P }_{i}\), and a strategy \(\lambda _{i} \in \varLambda _{i}\) of this player, we define the support of the probability distribution defined by \(\lambda _{i}\) as \(\mathsf{Supp }_{\lambda _{i}}(\rho ) = \left\{ s \in S\;\vert \; \lambda _{i}(\rho )(s) > 0\right\} \), with \(\lambda _{i}(\rho )(s) = 1\) if \(\lambda _{i}\) is pure and \(\lambda _{i}(\rho ) = s\). A play \(\pi \) is said to be consistent with a strategy \(\lambda _{i}\) of \(\mathcal{P }_{i}\) if for all \(n \ge 0\) s.t. \(\mathsf{Last }(\pi (n)) \in S_{i}\), we have \(\mathsf{Last }(\pi (n+1)) \in \mathsf{Supp }_{\lambda _{i}}(\pi (n))\). Given two strategies, \(\lambda _{1}\) for \(\mathcal{P _{1}} \) and \(\lambda _{2}\) for \(\mathcal{P _{2}} \), we define \({\mathsf{Outcome }_{G_{p}}} (\lambda _{1}, \lambda _{2}) = \left\{ \pi \in {\mathsf{Plays }(G_{p})} \;\vert \; \pi \text { is consistent with } \lambda _{1} \text { and } \lambda _{2}\right\} \), the set of possible outcomes of the game. Note that if both strategies \(\lambda _{1}\) and \(\lambda _{2}\) are pure, we obtain a unique play \(\pi = s_{0}s_{1}s_{2}\ldots {}\) s.t. for all \(j \ge 0, i \in \lbrace 1, 2\rbrace \), if \(s_{j} \in S_{i}\), then we have \(s_{j+1} = \lambda _{i}(\pi (j))\).

Given the initial state \(s_{init}\) and strategies for both players \(\lambda _{1} \in \varLambda _{1}, \lambda _{2} \in \varLambda _{2}\), we obtain a Markov chain. Thus, every event \(\mathcal A \subseteq {\mathsf{Plays }(G_{p})} \), a measurable set of plays, has a uniquely defined probability [42] (Carathéodory’s extension theorem induces a unique probability measure on the Borel \(\sigma \)-algebra over \({\mathsf{Plays }(G_{p})} \)). We denote by \(\mathbb P _{s_{init}}^{\lambda _{1}, \lambda _{2}}(\mathcal A )\) the probability that a play belongs to \(\mathcal A \) when the game starts in \(s_{init}\) and is played consistently with \(\lambda _{1}\) and \(\lambda _{2}\). Let \(f : {\mathsf{Plays }(G_{p})} \rightarrow \mathbb{R }\) be a measurable function; we denote by \(\mathbb E _{s_{init}}^{\lambda _{1}, \lambda _{2}}(f)\) the expected value of \(f\) over a play when the game starts in \(s_{init}\) and is played consistently with \(\lambda _{1}\) and \(\lambda _{2}\). We use the same notions for prefixes by naturally extending them to their infinite counterparts.

Objectives An objective for \(\mathcal{P _{1}} \) in \(G_{p}\) is a set of plays \(\phi \subseteq {\mathsf{Plays }(G_{p})} \). We consider several kinds of objectives:

  • Multi Energy objectives. Given an initial natural energy vector \(v_{0} \in \mathbb{N }^{k}\), the objective \({\mathsf{PosEnergy }_{G_{p}}(v_{0})} = \left\{ \pi \in {\mathsf{Plays }(G_{p})} \;\vert \; \forall \, n \ge 0 : v_{0} + \mathsf{EL }(\pi (n)) \in \mathbb{N }^{k} \right\} \) requires that the energy level in all dimensions, starting from the initial credit \(v_{0}\), stays non-negative at all times.

  • Multi Mean-payoff objectives. Given a rational threshold vector \(v \in \mathbb{Q }^{k}\), the objective \({\mathsf{MeanPayoff }_{G_{p}}(v)} = \left\{ \pi \in {\mathsf{Plays }(G_{p})} \;\vert \; \mathsf{MP }(\pi ) \ge v\right\} \) requires that, in every dimension \(j\), the mean-payoff is at least \(v(j)\).

  • Parity objectives. Objective \({\mathsf{Parity }_{G_{p}}} = \left\{ \pi \in {\mathsf{Plays }(G_{p})} \;\vert \; \mathsf{Par }(\pi ) \text { mod } 2 = 0\right\} \) requires that the minimum priority visited infinitely often be even. When the set of priorities is restricted to \(\lbrace 0, 1\rbrace \), we have a Büchi objective. Note that every multi-weighted game structure \(G\) without parity can trivially be extended to \(G_{p}\) with \(p: S\rightarrow \left\{ 0\right\} \).

  • Combined objectives. Parity objectives can naturally be combined with multi mean-payoff and multi energy objectives, resp. yielding \({\mathsf{MeanPayoff }_{G_{p}}(v)} \cap {\mathsf{Parity }_{G_{p}}} \) and \({\mathsf{PosEnergy }_{G_{p}}(v_{0})} \cap {\mathsf{Parity }_{G_{p}}} \).

Sure, satisfaction and expectation semantics A strategy \(\lambda _{1}\) for \(\mathcal{P _{1}} \) is surely winning for an objective \(\phi \) in \(G_{p}\) if for all plays \(\pi \in {\mathsf{Plays }(G_{p})} \) that are consistent with \(\lambda _{1}\), we have \(\pi \in \phi \). When at least one of the players plays a randomized strategy, the notion of sure winning in general is too restrictive and inadequate, as the set of consistent plays that do not belong to \(\phi \) may have zero probability measure. Therefore, it is useful to use satisfaction or expectation criteria. Let \(\lambda _{1} \in \varLambda _{1}\) be the strategy of \(\mathcal{P _{1}} \).

  • Given a threshold \(\alpha \in \left[ 0, 1\right] \) and a measurable objective \(\phi \subseteq {\mathsf{Plays }(G_{p})} , \alpha \)-satisfaction asks that for all \(\lambda _{2} \in \varLambda _{2}\), we have \(\mathbb P _{s_{init}}^{\lambda _{1}, \lambda _{2}}(\phi ) \ge \alpha \). If \(\lambda _{1}\) satisfies \(\phi \) with probability \(\alpha = 1\), we say that \(\lambda _{1}\) is almost-surely winning for \(\phi \) in \(G_{p}\).

  • Given a threshold \(\beta \in \mathbb{Q }^{k}\) and a measurable function \(f: {\mathsf{Plays }(G_{p})} \rightarrow \mathbb{Q }^{k}\), \(\beta \)-expectation asks that for all \(\lambda _{2} \in \varLambda _{2}\), we have \(\mathbb E _{s_{init}}^{\lambda _{1}, \lambda _{2}}(f) \ge \beta \).

Note that energy objectives are naturally more inclined towards the satisfaction semantics, as they model safety properties.

Strategy synthesis problem For multi energy parity games, the problem is to synthesize a finite initial credit \(v_{0} \in \mathbb{N }^{k}\) and a pure finite-memory strategy \(\lambda ^{pf}_{1} \in \varLambda ^{PF}_{1}\) that is surely winning for \(\mathcal{P _{1}} \) in \(G_{p}\) for the objective \({\mathsf{PosEnergy }_{G_{p}}(v_{0})} \cap {\mathsf{Parity }_{G_{p}}} \), if one exists. So, the initial credit is not fixed, but is part of the strategy to synthesize. For multi mean-payoff games, given a threshold \(v \in \mathbb{Q }^{k}\), the problem is to synthesize a pure finite-memory strategy \(\lambda ^{pf}_{1} \in \varLambda ^{PF}_{1}\) that is surely winning for \(\mathcal{P _{1}} \) in \(G_{p}\) for the objective \({\mathsf{MeanPayoff }_{G_{p}}(v)} \cap {\mathsf{Parity }_{G_{p}}} \), if one exists. Note that multi energy and multi mean-payoff games are equivalent for finite-memory strategies, while in general, infinite memory may be necessary for the latter [19].

Trading finite memory for randomness We study when finite memory can be traded for randomization. The question is: given a strategy \(\lambda _{1}^{pf} \in \varLambda ^{PF}_{1}\) that is surely winning for some objective \(\phi \), does there exist a strategy \(\lambda _{1}^{rm} \in \varLambda ^{RM}_{1}\) that is almost-surely winning for the same objective \(\phi \)? For mean-payoff objectives, one can also ask for a weaker equivalence, that is: can randomized memoryless strategies achieve the same expectation as pure finite-memory ones?

3 Optimal memory bounds

In this section, we establish optimal memory bounds for pure finite-memory winning strategies in multi-dimensional energy parity games (MEPGs). Also, as a corollary, we obtain results for pure finite-memory winning strategies in multi-dimensional mean-payoff parity games (MMPPGs). We show that single exponential memory is both sufficient and necessary for winning strategies. Additionally, we show how the parity condition of a MEPG can be removed by adding extra energy dimensions.

Multi energy parity games A sample game is depicted in Fig. 1. The key point in the upper bound proof on memory is to understand that, for \(\mathcal{P _{1}} \) to win a multi energy parity game, he must be able to force cycles whose energy level is non-negative in all dimensions and whose minimal priority is even. As stated in the next lemma, finite-memory strategies are sufficient for multi energy parity games, for both players.

Fig. 1 Two-dimensional energy parity game and even-parity self-covering tree representing an arbitrary finite-memory winning strategy. Circle states belong to \(\mathcal{P _{1}} \), square states to \(\mathcal{P _{2}} \)

Lemma 1

(Extension of [19, Lemmas 2 and 3]) If \(\mathcal{P _{1}} \) wins a multi energy parity game, then he has a pure finite-memory winning strategy. If \(\mathcal{P _{2}} \) wins a multi energy parity game, then he has a pure memoryless winning strategy.

Proof

The first part of the result follows from a standard well-quasi-ordering argument (a straightforward extension of [19, Lemma 2]). The second part follows by the classical edge induction argument: Lemma 3 of [19] and Lemma 3 of [16] show the result using edge induction for multi energy and energy parity games, respectively. Repeating the arguments of Lemma 3 of [16], and replacing the part on single energy objectives by the argument of Lemma 3 of [19] for multi energy objectives, we obtain the desired result.\(\square \)

By Lemma 1, we know that w.l.o.g. both players can be restricted to play pure finite-memory strategies. The property on the cycles can then be formalized as follows.

Lemma 2

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a multi energy parity game. Let \(\lambda _{1}^{pf} \in \varLambda ^{PF}_{1}\) be a winning strategy of \(\mathcal{P _{1}} \) for initial credit \(v_{0} \in \mathbb{N }^{k}\). Then, for all \(\lambda _{2}^{pm} \in \varLambda ^{PM}_{2}\), the outcome is a regular play \(\pi = \rho \cdot (\eta _{\infty })^{\omega }\), with \(\rho \in \mathsf{Prefs }(G), \eta _{\infty } \in S^{+}\), s.t. \(\mathsf{EL }(\eta _{\infty }) \ge 0\) and \(\mathsf{Par }(\pi ) = \min \left\{ p(s) \;\vert \; s \in \eta _{\infty } \right\} \) is even.

Proof

Recall that both players play pure finite-memory strategies. Therefore, only a finite number of distinct decisions are made and the outcome is a regular play \(\pi = \rho \cdot (\eta _{\infty })^{\omega }\). Note that \(\mathsf{EL }(\rho )\) does not have to be non-negative, as the initial credit \(v_{0}\) may compensate for it. Similarly, priorities of states visited in \(\rho \) have no impact on winning as they are only visited a finite number of times. First, suppose \(\mathsf{EL }(\eta _{\infty }) < 0\) on some dimension \(1 \le j \le k\). Then, after \(m > 0\) cycles, for some \(n > 0\), the energy level will be \(\mathsf{EL }(\pi (n)) = \mathsf{EL }(\rho \cdot (\eta _{\infty })^{m}) = \mathsf{EL }(\rho ) + m \cdot \mathsf{EL }(\eta _{\infty })\). Since \(v_{0}\) is finite and \(m \rightarrow \infty \), there exist some \(m, n > 0\) s.t. \(v_{0} + \mathsf{EL }(\pi (n)) < 0\) on dimension \(j\), and \(\lambda _{1}\) is not winning. Second, suppose \(\min \left\{ p(s) \;\vert \; s \in \eta _{\infty } \right\} \) is odd. Since the set of states visited infinitely often is exactly the set of states in \(\eta _{\infty }\), this implies that \(\mathsf{Par }(\pi )\) is odd, and thus \(\lambda _{1}\) is not winning.\(\square \)
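
As an illustration of Lemma 2, the sketch below checks the two conditions on the cycle \(\eta _{\infty }\) of a regular play \(\rho \cdot (\eta _{\infty })^{\omega }\): non-negative energy effect in every dimension and even minimal priority (the prefix \(\rho \) is irrelevant, as argued above). The helper names and the toy data are hypothetical.

```python
# Illustrative check of the two conditions of Lemma 2 on the cycle of a lasso play.
from typing import Dict, List, Tuple

def is_winning_lasso(eta: List[str],
                     weight: Dict[Tuple[str, str], Tuple[int, ...]],
                     priority: Dict[str, int]) -> bool:
    """EL(eta) >= 0 componentwise and the minimal priority occurring in eta is even."""
    cycle = eta + [eta[0]]                  # close the cycle back to its first state
    k = len(next(iter(weight.values())))
    effect = [0] * k
    for s, t in zip(cycle, cycle[1:]):
        for j, wj in enumerate(weight[(s, t)]):
            effect[j] += wj
    nonneg_cycle = all(x >= 0 for x in effect)
    even_parity = min(priority[s] for s in eta) % 2 == 0
    return nonneg_cycle and even_parity

# Toy check: cycle s1 -> s2 -> s1 with energy effect (0, 1) and minimal priority 2 (even).
w = {("s1", "s2"): (2, -1), ("s2", "s1"): (-2, 2)}
p = {"s1": 2, "s2": 3}
print(is_winning_lasso(["s1", "s2"], w, p))  # True
```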

A self-covering path in a game, straightforwardly extending the notion introduced by Rackoff [38] for Vector Addition Systems (VAS), is a sequence of states \(s_{0}s_{1}s_{2}\ldots {}s_{m}\) s.t. there exist two positions \(i\) and \(j\) that verify \(0 \le i < j \le m, s_{i} = s_{j}\) and \(\mathsf{EL }(s_{0}\ldots {}s_{i}) \le \mathsf{EL }(s_{0}\ldots {}s_{i}\ldots {}s_{j})\). In other words, such a path describes a finite prefix followed by a cycle which has a non-negative effect on the energy level. Ensuring such cycles is crucial to win the energy objective. With the notion of regular play of Lemma 2, we generalize the notion of self-covering path to include the parity condition. We show here that, if such a path exists, then the lengths of its cycle and the prefix needed to reach it can be bounded. Bounds on the strategy follow. In [38], Rackoff showed how to bound the length of self-covering paths in VAS. This work was extended to Vector Addition Systems with States (VASS) by Rosier and Yen [40]. Recently, Brázdil et al. [12] introduced reachability games on VASS and the notion of self-covering trees. Their Zero-safety problem with \(\omega \) initial marking is equivalent to multi energy games with weights in \(\lbrace -1, 0, 1\rbrace \), and without the parity condition. They showed that if winning strategies exist for \(\mathcal{P _{1}} \), then some of them can be represented as self-covering trees of bounded depth. Trees have to be considered instead of paths, as in a game setting all the possible choices of the adversary (\(\mathcal{P _{2}} \)) must be considered. Here, we extend the notion of self-covering trees to even-parity self-covering trees, in order to handle parity objectives.
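
The following sketch makes the definition of a self-covering path (without the parity condition) concrete: it tests the existence of positions \(i < j\) with \(s_{i} = s_{j}\) and \(\mathsf{EL }(s_{0}\ldots {}s_{i}) \le \mathsf{EL }(s_{0}\ldots {}s_{j})\) componentwise. The function names are assumptions made for illustration.

```python
# Illustrative test for the self-covering path condition.
from typing import Dict, List, Tuple

def prefix_levels(path: List[str],
                  weight: Dict[Tuple[str, str], Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Energy level EL(s_0...s_i) for every prefix of the path."""
    k = len(next(iter(weight.values())))
    levels, current = [(0,) * k], [0] * k
    for s, t in zip(path, path[1:]):
        current = [c + w for c, w in zip(current, weight[(s, t)])]
        levels.append(tuple(current))
    return levels

def is_self_covering(path: List[str],
                     weight: Dict[Tuple[str, str], Tuple[int, ...]]) -> bool:
    levels = prefix_levels(path, weight)
    for i in range(len(path)):
        for j in range(i + 1, len(path)):
            if path[i] == path[j] and all(a <= b for a, b in zip(levels[i], levels[j])):
                return True   # the cycle s_i ... s_j has a non-negative energy effect
    return False

w = {("a", "b"): (1, -1), ("b", "a"): (0, 1), ("a", "c"): (-5, 0)}
print(is_self_covering(["a", "b", "a"], w))  # True: EL = (1, 0) >= (0, 0) when 'a' repeats
print(is_self_covering(["a", "c"], w))       # False: no state repeats
```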

Even-parity self-covering tree An even-parity self-covering tree (epSCT) for \(s \in S\) is a finite tree \(T= \left( Q, R\right) \), where \(Q\) is the set of nodes and \(R\subseteq Q\times Q\) is the set of edges, together with a labeling function \(\varTheta :Q\rightarrow S \times \mathbb{Z }^{k}\), s.t.

  • The root of \(T\) is labeled \(\langle s, (0, \ldots {}, 0)\rangle \).

  • If \(\varsigma \in Q\) is not a leaf, then let \(\varTheta (\varsigma ) = \langle t, u \rangle , t \in S, u \in \mathbb{Z }^{k}\), s.t.

    • if \(t \in S_{1}\), then \(\varsigma \) has a unique child \(\vartheta \) s.t. \(\varTheta (\vartheta ) = \langle t^{\prime }, u^{\prime }\rangle , (t, t^{\prime }) \in E\) and \(u^{\prime } = u + w(t, t^{\prime })\);

    • if \(t \in S_{2}\), then there is a bijection between children of \(\varsigma \) and edges of the game leaving \(t\), s.t. for each successor \(t^{\prime } \in S\) of \(t\) in the game, there is one child \(\vartheta \) of \(\varsigma \) s.t. \(\varTheta (\vartheta ) = \langle t^{\prime }, u^{\prime }\rangle , u^{\prime } = u + w(t, t^{\prime })\).

  • If \(\varsigma \) is a leaf, then let \(\varTheta (\varsigma ) = \langle t, u \rangle \) s.t. there is some ancestor \(\vartheta \) of \(\varsigma \) in \(T\) s.t. \(\varTheta (\vartheta ) = \langle t, u^{\prime } \rangle \), with \(u^{\prime } \le u\), and the downward path from \(\vartheta \) to \(\varsigma \), denoted by \(\vartheta \rightsquigarrow \varsigma \), has minimal priority even. We say that \(\vartheta \) is an even-descendance energy ancestor of \(\varsigma \).

Intuitively, each path from root to leaf is a self-covering path of even parity in the game graph, so that plays unfolding according to such a tree correspond to winning plays of Lemma 2. Thus, the epSCT fixes how \(\mathcal{P _{1}} \) should react to actions of \(\mathcal{P _{2}} \) in order to win the MEPG (Fig. 1). Note that, as the tree is finite, one can take, in each dimension, the absolute value of the most negative number that appears in a node label to obtain an initial credit for which there is a winning strategy (i.e., the one described by the tree). In particular, let \(W\) denote the maximal absolute weight appearing on an edge in \(G_{p}\). Then, for an epSCT \(T\) of depth \(l\), it is straightforward to see that the maximal initial credit required is at most \(l\cdot W\), as the maximal decrease at each level of the tree is bounded by \(W\). We suppose \(W> 0\) as otherwise any strategy of \(\mathcal{P _{1}} \) is winning for the energy objective, for any initial credit vector \(v_{0} \in \mathbb{N }^{k}\).
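
This observation on initial credits amounts to a small computation over the node labels of an epSCT: take, in each dimension, the absolute value of the most negative label entry. The sketch below (with an invented Node class and a toy tree) illustrates it; the resulting credit is always at most \(l\cdot W\).

```python
# Illustrative computation of a sufficient initial credit from epSCT labels.
from typing import List, Tuple

class Node:
    def __init__(self, state: str, u: Tuple[int, ...], children: List["Node"] = ()):
        self.state, self.u, self.children = state, u, list(children)

def sufficient_credit(root: Node, k: int) -> Tuple[int, ...]:
    """Componentwise max(0, -u(j)) over all node labels u of the tree."""
    credit = [0] * k
    stack = [root]
    while stack:
        node = stack.pop()
        for j in range(k):
            credit[j] = max(credit[j], -node.u[j])
        stack.extend(node.children)
    return tuple(credit)

# Tiny 1-branch tree: root <s,(0,0)> -> <t,(-2,1)> -> <s,(0,0)> (leaf covering the root).
leaf = Node("s", (0, 0))
tree = Node("s", (0, 0), [Node("t", (-2, 1), [leaf])])
print(sufficient_credit(tree, k=2))  # (2, 0): enough to keep both dimensions >= 0
```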

Let us explicitly state how \(\mathcal{P _{1}} \) can deploy a strategy \(\lambda _{1}^{T} \in \varLambda ^{PF}_{1}\) based on an epSCT \(T= \left( Q, R\right) \). We refer to such a strategy as an epSCT strategy. It consists of following a path in the tree \(T\), moving a pebble from node to node and playing in the game according to the edges taken by this pebble. Each time a node \(\varsigma \) s.t. \(\varTheta (\varsigma ) = \langle t, u \rangle \) is encountered, we do the following.

  • If \(\varsigma \) is a leaf, the pebble directly goes up to its oldest even-descendance energy ancestor \(\vartheta \). By oldest we mean the first one encountered when going down in the tree from the root. Note that this choice is arbitrary and made only to ease the following proof formulations; any even-descendance energy ancestor would do.

  • Otherwise, if \(\varsigma \) is not a leaf,

    • if \(t \in S_{2}\) and \(\mathcal{P _{2}} \) plays state \(t^{\prime } \in S\), the pebble is moved along the edge going to the only child \(\vartheta \) of \(\varsigma \) s.t. \(\varTheta (\vartheta ) = \langle t^{\prime }, u^{\prime } \rangle , u^{\prime } = u + w(t, t^{\prime })\);

    • if \(t \in S_{1}\), the pebble moves to \(\vartheta \), with \(\varTheta (\vartheta ) = \langle t^{\prime }, u^{\prime } \rangle \), the only child of \(\varsigma \), and the strategy of \(\mathcal{P _{1}} \) is to choose the state \(t^{\prime }\) in the game.

If such an epSCT \(T\) of depth \(l\) exists for a game \(G_{p}\), then \(\mathcal{P _{1}} \) can play the strategy \(\lambda _{1}^{T} \in \varLambda ^{PF}_{1}\) to win the game with initial credit bounded by \(l\cdot W\).
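
A possible implementation of this pebble-based strategy is sketched below; the TNode and EpSCTStrategy classes, and the convention that each leaf stores a pointer to its oldest even-descendance energy ancestor, are modeling assumptions made for illustration.

```python
# Illustrative sketch of playing an epSCT strategy by moving a pebble in the tree.
from typing import Dict, Optional

class TNode:
    def __init__(self, state: str, owner: int):
        self.state, self.owner = state, owner           # owner is 1 or 2
        self.children: Dict[str, "TNode"] = {}          # successor state -> child node
        self.energy_ancestor: Optional["TNode"] = None  # set for leaves only

class EpSCTStrategy:
    def __init__(self, root: TNode):
        self.pebble = root

    def _resolve_leaf(self) -> None:
        while not self.pebble.children:                 # at a leaf: jump back up
            self.pebble = self.pebble.energy_ancestor

    def next_move(self) -> str:
        """Called when the play is in a P1 state: follow the unique child."""
        self._resolve_leaf()
        [(successor, child)] = self.pebble.children.items()  # P1 nodes have one child
        self.pebble = child
        return successor

    def observe(self, successor: str) -> None:
        """Called when P2 moved to `successor`: follow the matching child."""
        self._resolve_leaf()
        self.pebble = self.pebble.children[successor]
```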

Bounding the depth of epSCTs Consider a multi energy game without parity. Then, the priority condition on downward paths from ancestor to leaf is not needed and self-covering trees (i.e., epSCTs without the condition on priorities) suffice to describe winning strategies. One can bound the size of SCTs using results on the size of solutions of linear Diophantine equations (i.e., with integer variables) [9]. In particular, recent work on reachability games over VASS with weights \(\left\{ -1, 0, 1\right\} \), Lemma 7 of [12], states that if \(\mathcal{P _{1}} \) has a winning strategy on a VASS, then he can exhibit one that can be described as an SCT whose depth is at most \(l = 2^{(d-1) \cdot \vert S\vert } \cdot (\vert S\vert + 1)^{c \cdot k^{2}}\), where \(c\) is a constant independent of the considered VASS and \(d\) its branching degree (i.e., the highest number of outgoing edges on any state). Naive use of this bound for multi energy games with arbitrary integer weights would induce a triple exponential bound for memory. Indeed, recall that \(W\) denotes the maximal absolute weight that appears in a game \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \). A straightforward translation of a game with arbitrary weights into an equivalent game that uses only weights in \(\{-1,0,1\}\) induces a blow-up by \(W\) in the size of the state space, and thus a blow-up by \(W\) in the exponent of the depth bound, which becomes doubly exponential in the encoding of \(W\), as we have

$$\begin{aligned} l = 2^{(d-1) \cdot W\cdot \vert S\vert } \cdot (W\cdot \vert S\vert + 1)^{c \cdot k^{2}} = 2^{(d-1) \cdot 2^{V}\cdot \vert S\vert } \cdot (W\cdot \vert S\vert + 1)^{c \cdot k^{2}}, \end{aligned}$$

where \(V\) denotes the number of bits used by the encoding of \(W\). Moreover, the width of the tree increases as \(d^{l}\), i.e., exponentially with the depth. So a straight application of previous results provides an overall tree of triple-exponential size. In this paper we improve this bound and prove a single exponential upper bound, even for multi energy parity games. We proceed in two steps, first studying the depth of the epSCT, and then showing how to compress the tree into a directed acyclic graph (DAG) of single exponential size.

Lemma 3

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a multi energy parity game s.t. \(W\) is the maximal absolute weight appearing on an edge and \(d\) the branching degree of \(G_{p}\). Suppose there exists a finite-memory winning strategy for \(\mathcal{P _{1}} \). Then there is an even-parity self-covering tree for \(s_{init}\) of depth at most \(l= 2^{(d-1) \cdot \vert S\vert } \cdot \left( W\cdot \vert S\vert + 1\right) ^{c \cdot k^{2}}\), where \(c\) is a constant independent of \(G_{p}\).

Lemma 3 eliminates the exponential blow-up in depth induced by a naive coding of arbitrary weights into \(\{-1,0,1\}\) weights, and implies an overall doubly exponential upper bound. Our proof is a generalization of [12, Lemma 7], using a more refined analysis to handle both parity and arbitrary integer weights. The idea is the following. First, consider the one-player case. The epSCT is reduced to a path. By Lemma 2, it is composed of a finite prefix, followed by an infinitely repeated sequence of positive energy level and even minimal priority. The point is to bound the length of such a sequence by eliminating cycles that are not needed for energy or parity. Second, to extend the result to two-player games, we use an induction on the number of choices available for \(\mathcal{P _{2}} \) in a given state. Intuitively, we show that if \(\mathcal{P _{1}} \) can win with an epSCT \(T_{A}\) when \(\mathcal{P _{2}} \) plays edges from a set \(A\) in a state \(s\), and if he can also win with an epSCT \(T_{B}\) when \(\mathcal{P _{2}} \) plays edges from a set \(B\), then he can win when \(\mathcal{P _{2}} \) chooses edges from both \(A\) and \(B\), with an epSCT whose depth is bounded by the sum of depths of \(T_{A}\) and \(T_{B}\).

Proof

The proof is made in two steps. First, we consider the one-player case, where \(S_{2} = \emptyset \). Second, we use an induction scheme over the choice degree of \(\mathcal{P _{2}} \) to extend our results to the two-player case.

We start with \(S_{2} = \emptyset \), the one-player game. By Lemma 2, a winning play is of the form \(\pi = \rho \cdot \eta _{\infty }^{\omega }\) s.t. \(\mathsf{EL }(\eta _{\infty }) \ge 0\) and \(\mathsf{Par }(\pi ) = \min \left\{ p(s) \;\vert \; s \in \eta _{\infty } \right\} \) is even. Notice that such a play corresponds to the epSCT defined above, as it reduces to an even-parity self-covering path \(\langle s_{init}, (0, \ldots , 0) \rangle \rightsquigarrow \langle s, u\rangle \rightsquigarrow \langle s, u^{\prime }\rangle \) with \(u^{\prime } \ge u\). Therefore its existence is guaranteed and it remains to bound its length. Given such a path, the idea is to eliminate unnecessary cycles, in order to reduce its length while maintaining the needed properties (i.e., non-negative energy and even minimal priority). First, notice that cycles in the sub-path \(\langle s_{init}, (0, \ldots , 0) \rangle \rightsquigarrow \langle s, u\rangle \) can be trivially erased as they are only visited a finite number of times and thus (a) the initial credit can compensate for the loss of their potential positive energy effect, and (b) they do not contribute to the parity. Now consider the sub-path \(\langle s, u\rangle \rightsquigarrow \langle s, u^{\prime }\rangle \). Since it induces a winning play, its minimal priority is even. Let \(p_{m}\) be this priority. We may suppose w.l.o.g. that \(p(s) = p_{m}\), otherwise it suffices to shift this sub-path to \(\langle s^{\prime }, v\rangle \rightsquigarrow \langle s^{\prime }, v^{\prime }\rangle \) for some state \(s^{\prime }\) s.t. \(p(s^{\prime }) = p_{m}\) and \(v^{\prime } \ge v\), and add the sub-path \(\langle s, u\rangle \rightsquigarrow \langle s^{\prime }, v\rangle \) to the finite prefix. Now we may eliminate each cycle of \(\langle s, u\rangle \rightsquigarrow \langle s, u^{\prime }\rangle \) safely with regard to the parity objective, as such cycles only contain states with greater or equal priority. Thus, we only need to take care of the energy, and fall under the scope of [12, Lemma 15] for the special case of weights in \(\left\{ -1, 0, 1\right\} \), where an upper bound \(h\left( \vert S\vert , k\right) = \left( \vert S\vert + 1\right) ^{c \cdot k^{2}}\) on the length of such a path is shown.

We claim that for a one-player game \(G\), with weights in \(\left\{ -W, -W+1, \ldots {}, W-1, W\right\} \), an upper bound \(h\left( W, \vert S\vert , k\right) = \left( W\cdot \vert S\vert + 1 \right) ^{c \cdot k^{2}}\) is obtained. Indeed, one can translate \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) into an equivalent game \(G^{\prime }_{p^{\prime }} = \left( S_{1}^{\prime }, S_{2}, s_{init}, E^{\prime }, k, w^{\prime }, p^{\prime }\right) \) s.t. each edge of \(G_{p}\) is split into at most \(W\) edges in \(G^{\prime }_{p^{\prime }}\), with at most \((W-1)\) dummy states in between, so that each edge of \(G^{\prime }_{p^{\prime }}\) only uses weights in \(\left\{ -1, 0, 1\right\} \). Let \(S_{d}\) denote the set of these added dummy states. We define this translation \(\mathsf Tr :G_{p}\mapsto G^{\prime }_{p^{\prime }}\) with \(\mathsf Tr (S_{1}) = S_{1}\cup S_{d}, \mathsf Tr (S_{2}) = S_{2}, \mathsf Tr (s_{init}) = s_{init}, \mathsf Tr (E) = \bigcup _{(s, t) \in E} \mathsf Tr ((s,t)), \mathsf Tr (k) = k, \mathsf Tr (w) = w^{\prime } :E^{\prime } \rightarrow \left\{ -1, 0, 1\right\} ^{k}, \mathsf Tr (p) = p^{\prime } :S^{\prime } \rightarrow \mathbb{N }\) s.t. for all \((s, t) \in E\) s.t. \(m = \max \left\{ \vert w(s, t) (j)\vert \;\vert \; 1 \le j \le k\right\} - 1\), we have that \(\mathsf Tr \left( (s, t)\right) = \left\{ (s, s_{d}^{1}), (s_{d}^{1}, s_{d}^{2}), \ldots {}, (s_{d}^{m-1}, s_{d}^{m}), (s_{d}^{m}, t)\right\} \) s.t.

$$\begin{aligned} \Big (\forall \; j > 0,\; s_{d}^{j} \in S_{d} \;\wedge \; p^{\prime }(s_{d}^{j}) = p(s)\Big ) \;\wedge \; \sum _{(q, r) \in \mathsf Tr \left( (s, t)\right) } w^{\prime } (q, r) = w(s, t). \end{aligned}$$

To be formally correct, we have to add that for all \(s_{d} \in S_{d}\), we have \(\mathsf{degree }_{in}(s_{d}) = \mathsf{degree }_{out}(s_{d}) = 1\), and for all \(s \not \in S_{d}\), we have \(p^{\prime }(s) = p(s)\). This translation does not affect the outcome of the game, as each edge in \(G_{p}\) has a unique corresponding path in \(G^{\prime }_{p^{\prime }}\) that preserves the weights and the visited priorities, and that offers no added choice to \(\mathcal{P _{1}} \). Since \(G_{p}\) possesses \(\vert E\vert \le \vert S\vert ^{2}\) edges, and for each edge of \(G_{p}\), we add at most \((W-1)\) dummy states in \(G^{\prime }_{p^{\prime }}\), we have \(\vert S^{\prime }\vert \le \vert S\vert + \vert S\vert ^{2}\cdot (W-1) \le \vert S\vert ^{2}\cdot W\). Therefore, by applying [12, Lemma 15] on \(G^{\prime }_{p^{\prime }}\), we obtain the following upper bound:

$$\begin{aligned} h\left( W, \vert S\vert , k\right) = h\left( \vert S^{\prime }\vert , k\right) = \left( \vert S\vert ^{2}\cdot W+ 1\right) ^{c \cdot k^{2}} = \left( W\cdot \vert S\vert + 1\right) ^{c^{\prime } \cdot k^{2}} \end{aligned}$$

for some constant \(c^{\prime }\) that is independent of \(G_{p}\).
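
For concreteness, the following sketch implements the core of the translation \(\mathsf{Tr}\) for a single edge: the edge is split into \(\max _{j} \vert w(s,t)(j)\vert \) unit-weight edges through fresh dummy states carrying the priority of the source state. The function name and the dummy-state naming scheme are illustrative, and the ownership of dummy states (they belong to \(\mathcal{P _{1}} \)) is not modeled.

```python
# Illustrative sketch: splitting one weighted edge into a chain of unit-weight edges.
from typing import Dict, Tuple

def split_edge(s: str, t: str, w: Tuple[int, ...],
               p_source: int) -> Tuple[Dict[Tuple[str, str], Tuple[int, ...]], Dict[str, int]]:
    steps = max(max(abs(x) for x in w), 1)           # number of unit-weight edges
    chain = [s] + [f"{s}->{t}#dummy{i}" for i in range(steps - 1)] + [t]
    remaining = list(w)
    new_edges: Dict[Tuple[str, str], Tuple[int, ...]] = {}
    new_priorities = {name: p_source for name in chain[1:-1]}  # dummies keep p(s)
    for a, b in zip(chain, chain[1:]):
        step = tuple((x > 0) - (x < 0) for x in remaining)     # sign vector in {-1,0,1}
        new_edges[(a, b)] = step
        remaining = [x - y for x, y in zip(remaining, step)]
    return new_edges, new_priorities

edges, dummies = split_edge("s", "t", (3, -2), p_source=1)
print(edges)    # three unit-weight edges whose componentwise sum is (3, -2)
print(dummies)  # the two dummy states inherit priority 1
```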

Now, consider \(S_{2} \ne \emptyset \). (I) We extend [12, Lemma 16] for parity. This will help us to establish an induction scheme over the choice degree of \(\mathcal{P _{2}} \). Suppose \(s \in S_{2}\) has more than one outgoing edge. Let \(\tau = (s,t) \in E\) be one of them and \(R \subset E\) denote the nonempty set of other outgoing edges. Let \(G_{p}^{\tau }\) (resp. \(G_{p}^{R}\)) be the game induced when removing \(R\) (resp. \(\tau \)) from \(G_{p}\). Suppose that (a) \(s\) is winning for \(\mathcal{P _{1}} \) in \(G_{p}^{R}\) for initial credit \(v_{R} \in \mathbb{N }^{k}\), and (b) there exists some state \(s^{\prime } \in S\) s.t. \(s^{\prime }\) is winning for \(\mathcal{P _{1}} \) in \(G_{p}^{\tau }\) for initial credit \(v_{\tau } \in \mathbb{N }^{k}\). We claim that \(s^{\prime }\) is winning in \(G_{p}\) for initial credit \(v_{0} = v_{\tau } + v_{R}\). Indeed, let \(\lambda _{1}^{\tau }\) and \(\lambda _{1}^{R}\) resp. denote winning strategies for \(\mathcal{P _{1}} \) in \(G_{p}^{\tau }\) and \(G_{p}^{R}\). Let \(\mathcal{P _{1}} \) use the following strategy. Player \(\mathcal{P _{1}} \) plays \(\lambda _{1}^{\tau }\) as long as \(\mathcal{P _{2}} \) does not play any edge of \(R\). If such an edge is played, then \(\mathcal{P _{1}} \) switches to strategy \(\lambda _{1}^{R}\) and plays it until edge \(\tau \) is played again by \(\mathcal{P _{2}} \), in which case \(\mathcal{P _{1}} \) switches back to \(\lambda _{1}^{\tau }\), and so on. In this way, the outcome of the game is guaranteed to be a play \(\pi = s^{\prime } \ldots {} s \ldots {} s \ldots {} s \ldots {} \) resulting from a merge between a play consistent with \(\lambda _{1}^{\tau }\) over \(G_{p}^{\tau }\) (whose energy level is bounded from below by \(-v_{\tau }\) at all times), and a play consistent with \(\lambda _{1}^{R}\) over \(G_{p}^{R}\) (whose energy level is bounded from below by \(-v_{R}\) at all times). Therefore, the combined overall energy level of any prefix \(\rho \) of this play is bounded from below by \((-v_{\tau } - v_{R})\), as non-negative cycles in \(G_{p}^{\tau }\) and \(G_{p}^{R}\) remain non-negative in \(G_{p}\). Furthermore, the parity condition is preserved in \(G_{p}\). Indeed, suppose it is not. Then there exists a state visited infinitely often in the outcome s.t. its priority is minimal and odd. However, as the outcome results from merging plays resp. consistent with \(\lambda _{1}^{\tau }\) and \(\lambda _{1}^{R}\), this implies that one of those strategies yields an odd minimal priority, which contradicts the fact that they are winning. This proves the claim.

(II) We apply the induction scheme of [12, Lemma 18] on \(r = \vert \lbrace (s,t) \in E\;\vert \; s \in S_{2}\rbrace \vert - \vert S_{2}\vert \le (d-1) \cdot \vert S\vert \), the choice degree of \(\mathcal{P _{2}} \). Notice that our translation \(\mathsf Tr :G_{p}\mapsto G^{\prime }_{p^{\prime }}\) leaves this choice degree unchanged. The claim is that for a winning state \(s^{\prime }\), there is an epSCT of depth bounded by \(2^{r}\cdot h(W, \vert S\vert , k)\). We have proved this claim for the base case \(r = 0\), which is similar to the case \(S_{2} = \emptyset \). So assume it holds for \(r\); it remains to prove that it is preserved for \(r + 1\). Let \(s \in S_{2}\) be s.t. \(\mathcal{P _{2}} \) has at least two outgoing edges. As before, we define \(G_{p}^{\tau }\) and \(G_{p}^{R}\). Clearly, the choice degree of \(\mathcal{P _{2}} \) is at most \(r\) in both games. Let \(s^{\prime }\) be a winning state in \(G_{p}\). As \(\mathcal{P _{2}} \) has fewer choices in both \(G_{p}^{\tau }\) and \(G_{p}^{R}\), clearly \(s^{\prime }\) is still winning in those games. If an epSCT for \(s^{\prime }\) in either of them (such epSCTs are guaranteed to exist and have depth bounded by \(2^{r} \cdot h(W, \vert S\vert , k)\) by the induction hypothesis) does not contain the state \(s\), then the claim is verified. Now suppose we have two epSCTs for games \(G_{p}^{\tau }\) and \(G_{p}^{R}\) s.t. they both contain state \(s\). Notice that \(s\) is winning in those two games and, as such, is the root of two respective epSCTs of depth less than \(2^{r} \cdot h(W, \vert S\vert , k)\). Applying (I) on states \(s^{\prime }\) and \(s\), we get an epSCT for \(s^{\prime }\) in \(G_{p}\) of depth at most \(2 \cdot 2^{r} \cdot h(W, \vert S\vert , k) = 2^{r+1} \cdot h(W, \vert S\vert , k)\), which concludes the proof.\(\square \)

From multi energy parity games to multi energy games Let \(G_{p}\) be a MEPG and assume that \(\mathcal{P _{1}} \) has a winning strategy in that game. By Lemma 3, there exists an epSCT whose depth is bounded by \(l\). As a direct consequence of that bounded depth, we have that \(\mathcal{P _{1}} \), by playing the strategy prescribed by the epSCT, enforces an objective stronger than the parity objective: this strategy ensures that no more than \(l\) states of odd priority are ever visited before a smaller even priority is seen (which is a safety objective). Then, the parity condition can be transformed into additional energy dimensions.

While our transformation shares ideas with the classical transformation of parity objectives into safety objectives, first proposed in [5] (see also [26, Lemma 6.4]), it is technically different because energy levels cannot be reset (as would be required by those classical constructions). The reduction is as follows. For each odd priority, we add one dimension. The energy level in this dimension is decreased by \(1\) each time this odd priority is visited, and it is increased by \(l\) each time a smaller even priority is visited. If \(\mathcal{P _{1}} \) is able to maintain the energy level non-negative in all dimensions (for a given initial energy level), then he is clearly winning the original parity objective; on the other hand, an epSCT strategy that wins the original objective also wins the new game.

Lemma 4

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a multi energy parity game with priorities in \(\{ 0,1,\dots ,2 \cdot m\}\), s.t. \(W\) is the maximal absolute weight appearing on an edge. Then we can construct a multi energy game \(G\) with the same set of states, \((k+ m)\) dimensions and a maximal absolute weight bounded by \(l\), as defined by Lemma 3, s.t. \(\mathcal{P _{1}} \) has a winning strategy in \(G\) iff he has one in \(G_{p}\).

Proof

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a MEPG with priorities in \(\{ 0,1,\dots ,2 \cdot m\}\). Let \(G= \left( S_{1}, S_{2}, s_{init}, E, (k+ m), w^{\prime }\right) \) be the MEG obtained from the following transformation: \(\forall \; (s, t) \in E, \forall \; 1 \le j \le k, w^{\prime }((s,t))(j) = w((s,t))(j)\), and (a) if \(p(t)\) is even, \(\forall \; k< j \le k+ \frac{p(t)}{2}, w^{\prime }((s,t))(j) = 0\) and \( \forall \; k+ \frac{p(t)}{2} < j \le k+ m, w^{\prime }((s,t))(j) = l\), or (b) if \(p(t)\) is odd, \(\forall \; k< j \le k+ m, j \ne k+ \frac{p(t)+1}{2}, w^{\prime }((s,t))(j) = 0\) and \(w^{\prime }((s,t))(k+ \frac{p(t)+1}{2}) = -1\). We have to prove both ways of the equivalence.

First, suppose \(\lambda _{1} \in \varLambda ^{PF}_{1}\) is a winning strategy for \(\mathcal{P _{1}} \) in the MEPG \(G_{p}\). By Lemma 3, there is an epSCT of depth at most \(l\) for \(s_{init}\). Thus, we know that in every repeated sequence of \(l\) states, the minimal visited priority will be even. Therefore, for all additional dimensions, ranging from \(k+ 1\) to \(k+ m\), the effect of a sequence of \(l\) states will be bounded from below by \(-1 \cdot (l-1) + l\), which is positive. Thus strategy \(\lambda _{1}\) is also winning in \(G\) (with initial credit bounded by \(l\) on additional dimensions).

Second, suppose \(\lambda _{1} \in \varLambda ^{PF}_{1}\) is a winning strategy for \(\mathcal{P _{1}} \) in the MEG \(G\), as defined above. Since \(\lambda _{1}\) is winning, it yields an SCT (i.e., an epSCT without the parity condition) of bounded depth s.t. \(\mathcal{P _{1}} \) is able to enforce non-negative energy cycles. By definition of the weights over \(G\), this cannot be the case if the minimal priority visited infinitely often is odd. Thus this strategy is winning for parity on \(G_{p}\), and stays winning for energy over dimensions \(1\) to \(k\) as these weights are unchanged.\(\square \)
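
A minimal sketch of the weight transformation used in the proof above is given below; the function name and the indexing convention made explicit here (dimension \(k+j\) tracks the odd priority \(2j-1\)) are illustrative.

```python
# Illustrative sketch of extending an edge weight with the m parity dimensions.
from typing import Tuple

def extend_weight(w_edge: Tuple[int, ...], target_priority: int,
                  m: int, l: int) -> Tuple[int, ...]:
    """New weight of an edge (s, t): original k weights plus m parity dimensions."""
    extra = [0] * m
    if target_priority % 2 == 1:                    # odd priority 2j-1 -> dimension j
        extra[(target_priority + 1) // 2 - 1] = -1
    else:                                           # even priority: refill larger odd ones
        for j in range(target_priority // 2 + 1, m + 1):
            extra[j - 1] = l
    return w_edge + tuple(extra)

# Priorities in {0,...,4}, so m = 2 extra dimensions; original k = 1 dimension.
print(extend_weight((5,), target_priority=3, m=2, l=10))  # (5, 0, -1)
print(extend_weight((5,), target_priority=0, m=2, l=10))  # (5, 10, 10)
print(extend_weight((5,), target_priority=2, m=2, l=10))  # (5, 0, 10)
```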

Bounding the width Thanks to Lemma 4, we continue with multi energy games without parity. In order to bound the overall size of memory for winning strategies, we consider the width of self-covering trees. The following lemma states that SCTs, whose width is at most doubly exponential by application of Lemma 3, can be compressed into directed acyclic graphs (DAGs) of single exponential width. Thus we eliminate the second exponential blow-up and give an overall single exponential bound for memory of winning strategies.

Lemma 5

Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a multi energy game s.t. \(W\) is the maximal absolute weight appearing on an edge and \(d\) the branching degree of \(G\). Suppose there exists a finite-memory winning strategy for \(\mathcal{P _{1}} \). Then, there exists \(\lambda ^{D}_{1} \in \varLambda ^{PF}_{1}\) a winning strategy for \(\mathcal{P _{1}} \) described by a DAG \(D\) of depth at most \(l= 2^{(d-1) \cdot \vert S\vert } \cdot \left( W\cdot \vert S\vert + 1\right) ^{c \cdot k^{2}}\) and width at most \(L= \vert S\vert \cdot (2\cdot l\cdot W+ 1)^{k}\), where \(c\) is a constant independent of \(G\). Thus the overall memory needed to win this game is bounded by the single exponential \(l\cdot L\).

The sketch of this proof is the following. By Lemma 3, we know that there exists a tree \(T\), and thus a DAG, that satisfies the bound on depth. We construct a finite sequence of DAGs, whose first element is \(T\), so that (1) each DAG describes a winning strategy for the same initial credit, (2) each DAG has the same depth, and (3) the last DAG of the sequence has its width bounded by \(\vert S\vert \cdot (2\cdot l\cdot W+ 1)^{k}\). This sequence \(D_{0} = T, D_{1}, D_{2}, \ldots {}, D_{n}\) is built by merging nodes on the same level of the initial tree depending on their labels, level by level. The key idea of this procedure is that what actually matters for \(\mathcal{P _{1}} \) is only the current energy level, which is encoded in node labels in the self-covering tree \(T\). Therefore, we merge nodes with identical states and energy levels: since \(\mathcal{P _{1}} \) can essentially play the same strategy in both nodes, we only keep one of their subtrees.

It is possible to further reduce the practical size of the resulting compressed DAG by merging nodes according to a “greater or equal” relation over energy levels rather than simple equality (Fig. 2). This improvement is part of the algorithm that follows, and it has a significant impact on the practical width of DAGs, which can then be bounded by the number of incomparable labeling vectors instead of the number of non-equivalent ones.

Fig. 2 Merge between comparable nodes

The remainder of this subsection is dedicated to the proof of Lemma 5. We need to introduce some notation and two intermediate lemmas. The reader may, if desired, proceed directly to the next subsection and Lemma 8 for results on lower memory bounds.

We first introduce some notations. Let \(T= \left( Q, R\right) \) be a self-covering tree (i.e., epSCT without the parity condition). We define the partial order \(\preceq \) on \(Q\) s.t. for all \(\varsigma _{1}, \varsigma _{2} \in Q\) s.t. \(\varTheta (\varsigma _{1}) = \langle t_{1}, u_{1}\rangle \) and \(\varTheta (\varsigma _{2}) = \langle t_{2}, u_{2}\rangle \), we have \(\varsigma _{1} \preceq \varsigma _{2}\) iff \(t_{1} = t_{2}\) and \(u_{1} \le u_{2}\). We denote the equivalence by \(\simeq \) s.t. \(\varsigma _{1} \simeq \varsigma _{2}\) iff \(\varsigma _{1} \preceq \varsigma _{2}\) and \(\varsigma _{2} \preceq \varsigma _{1}\). For all \(\varsigma \in Q\), let \(\mathsf{Anc }\) and \(\mathsf{EnAnc }\) resp. denote the set of ancestors and energy ancestors of \(\varsigma \) in \(T\): \( \mathsf{Anc }(\varsigma ) = \left\{ \vartheta \in Q\setminus \lbrace \varsigma \rbrace \;\vert \; \vartheta \vDash \exists \diamondsuit \varsigma \right\} \), where we use the classical CTL notation to denote that there exists a path from \(\vartheta \) to \(\varsigma \) in \(T\), and \(\mathsf{EnAnc }(\varsigma ) = \left\{ \vartheta \in \mathsf{Anc }(\varsigma ) \;\vert \; \vartheta \preceq \varsigma \right\} \).

We build a sequence of DAGs \((D_{i})_{0 \le i \le n}\equiv D_{0} = T, D_{1}, D_{2}, \ldots {}, D_{n}\) s.t. for all \(0 < i \le n, D_{i}\) is obtained from \(D_{i-1}\) by merging two equivalent nodes of the same minimal level (i.e., closest to the root) of \(D_{i-1}\). The sequence stops when we obtain a DAG \(D_{n} = (Q_{n}, R_{n})\) s.t. for every level \(j\) of \(D_{n}\), there do not exist two distinct equivalent nodes on level \(j\). This construction induces merges by increasing depth, starting with level one. Moreover, if a DAG \(D_{i}\) of the sequence is the result of merges up to level \(j\), then it has the tree property (i.e., every node has a unique father) for levels greater than \(j\). As the depth and the branching degree of \(T\) are finite, the defined sequence of DAGs is finite (and actually bounded).

Let us give a formal definition of the merge operation. Consider such a DAG \(D_{i} = (Q_{i}, R_{i})\). Let \(j\) be the minimal level of \(D_{i}\) that contains two equivalent nodes. Let \(\varsigma _{1}, \varsigma _{2} \in Q_{i}(j)\) (i.e., nodes of level \(j\)) be two nodes s.t. \(\varsigma _{1} \ne \varsigma _{2}\) and \(\varsigma _{1} \simeq \varsigma _{2}\). We suppose w.l.o.g. an arbitrary order on nodes of the same level so that \(\varsigma _{1}, \varsigma _{2}\) are the two leftmost nodes that satisfy this condition. We define \(D_{i+1}= (Q_{i+1}, R_{i+1}) = \mathsf{merge }(D_{i})\) as the result of the following transformation:

  • \(Q_{i+1} = Q_{i} \!{\setminus }\! \left( \lbrace \varsigma _{2}\rbrace \cup \left\{ \varsigma _{d} \in Q_{i} \,\vert \, \varsigma _{2} \in \mathsf{Anc }(\varsigma _{d}) \right\} \right) \),

  • \(R_{i+1} = \left( R_{i} \,\cap \, (Q_{i+1} \times Q_{i+1})\right) \cup \left\{ (\vartheta , \varsigma _{1}) \;\vert \; (\vartheta , \varsigma _{2}) \in R_{i}\right\} \).

Thus, we eliminate the subtree starting in \(\varsigma _{2}\) and replace all edges that point to \(\varsigma _{2}\) by edges pointing to \(\varsigma _{1}\). This follows the idea that the same strategy can be played in \(\varsigma _{2}\) as in \(\varsigma _{1}\) since the present state and the energy level are the same.
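
The sketch below illustrates one application of this merge operation on a leveled DAG stored with explicit adjacency lists: at the first level containing two nodes with equal \(\langle \)state, energy level\(\rangle \) labels, the subtree of the second node is dropped and its incoming edges are redirected to the first. The data layout and helper names are assumptions made for illustration, not taken from the paper.

```python
# Illustrative sketch of one merge step on a leveled DAG.
from typing import Dict, List, Tuple

def merge_level(levels: List[List[str]],
                label: Dict[str, Tuple[str, Tuple[int, ...]]],
                edges: Dict[str, List[str]]) -> bool:
    """Perform one merge if possible; return True iff a merge happened."""
    for level in levels:                              # levels ordered from the root
        seen: Dict[Tuple[str, Tuple[int, ...]], str] = {}
        for node in list(level):
            if label[node] in seen:                   # equivalent node found
                keep = seen[label[node]]
                for parent, children in edges.items():  # redirect incoming edges
                    edges[parent] = [keep if c == node else c for c in children]
                _delete_subtree(node, levels, edges)
                return True
            seen[label[node]] = node
    return False

def _delete_subtree(node: str, levels: List[List[str]], edges: Dict[str, List[str]]) -> None:
    for child in edges.pop(node, []):
        _delete_subtree(child, levels, edges)
    for level in levels:
        if node in level:
            level.remove(node)
```

Iterating merge_level until it returns False yields the final DAG \(D_{n}\) of the sequence used in the proof of Lemma 5.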

Let \(D_{i} = (Q_{i}, R_{i})\) be a DAG of the sequence \((D_{i})_{0 \le i \le n}\). Given \(\varsigma \in Q_{i}, \vartheta \in \mathsf{Anc }(\varsigma )\), we denote by \(\vartheta \rightsquigarrow \varsigma \) an arbitrary downward path from \(\vartheta \) to \(\varsigma \) in \(D_{i}\). Given a leaf \(\varsigma \in Q_{i}\), we denote its oldest energy ancestor by \(\mathsf{oea }(\varsigma )\). Recall that a strategy is described by such a DAG according to moves of a pebble. Given a leaf \(\varsigma \in Q_{i}\) and one of its energy ancestors \(\vartheta \in \mathsf{EnAnc }(\varsigma )\), we represent the pebble going up from \(\varsigma \) to \(\vartheta \) by \(\varsigma \circlearrowleft \vartheta \). Given \(\alpha , \beta \in (Q_{i})^{*}, \alpha \circlearrowleft \beta \) naturally extends this notation s.t. we have \(\mathsf{Last }(\alpha ) \circlearrowleft \mathsf{First }(\beta )\). We consider energy levels of paths in the tree by referring to their counterparts in the game. Note that given \(\vartheta , \varsigma \in Q_{i}, \varTheta (\vartheta ) = \langle t, u\rangle , \varTheta (\varsigma ) = \langle t^{\prime }, u^{\prime }\rangle \), we have \(\mathsf{EL }(\vartheta \rightsquigarrow \varsigma ) = u^{\prime } - u\). We start with two useful lemmas.

Lemma 6

Let \(D_{i} = (Q_{i}, R_{i})\) be a DAG of \((D_{i})_{0 \le i \le n}\). For all nodes \(\varsigma _{1}, \varsigma _{2} \in Q_{i}\) s.t. \(\varsigma _{1} \simeq \varsigma _{2}\), we have that \(\forall \, \vartheta \in \mathsf{Anc }(\varsigma _{1}) \cap \mathsf{Anc }(\varsigma _{2}), \mathsf{EL }(\vartheta \rightsquigarrow \varsigma _{1}) = \mathsf{EL }(\vartheta \rightsquigarrow \varsigma _{2})\).

Proof

The proof is straightforward: since \(\varsigma _{1} \simeq \varsigma _{2}\), we have \(\varTheta (\varsigma _{1}) = \varTheta (\varsigma _{2}) = \langle t, u\rangle \), and for any common ancestor \(\vartheta \) with \(\varTheta (\vartheta ) = \langle t^{\prime }, u^{\prime }\rangle \), both energy levels are equal to \(u - u^{\prime }\).\(\square \)

Lemma 7

Let \(D_{i} = (Q_{i}, R_{i})\) be a DAG of \((D_{i})_{0 \le i \le n}\). Let \(\varsigma , \vartheta , \nu , \xi \in Q_{i}\) be four nodes s.t. \(\varsigma \) and \(\xi \) are leaves, \(\nu \) is the deepest common ancestor of \(\varsigma \) and \(\xi \), and \(\vartheta \) is an ancestor of \(\nu \). Let the oldest energy ancestor of \(\xi \) be an ancestor of \(\varsigma \), i.e., \(\mathsf{oea }(\xi ) \in \mathsf{Anc }(\varsigma )\). We have that \(\mathsf{EL }(\vartheta \rightsquigarrow \varsigma ) \le \mathsf{EL }(\vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \mathsf{oea }(\xi ) \rightsquigarrow \varsigma )\).

This lemma states that we can extract pebble cycles, which have positive energy levels, from a given path, in order to obtain some canonical path whose energy level is lower than or equal to that of the original path (Fig. 3).

Fig. 3 Cycles have positive energy levels

Proof

Let \(\chi = \mathsf{oea }(\xi )\) and \(\rho = \vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \chi \rightsquigarrow \varsigma \). Since \(\chi \in \mathsf{Anc }(\varsigma ) \cap \mathsf{Anc }(\xi )\), we have \(\chi \in \mathsf{Anc }(\nu ) \cup \lbrace \nu \rbrace \). Therefore, and applying Lemma 6, four cases are possible: \(\chi \in \mathsf{Anc }(\vartheta ), \chi = \vartheta , \chi \in \mathsf{Anc }(\nu ) {\setminus } \left( \mathsf{Anc }(\vartheta ) \cup \lbrace \vartheta \rbrace \right) \), and \(\chi = \nu \). Consider the first case, \(\chi \in \mathsf{Anc }(\vartheta )\). Then \(\rho = \vartheta \rightsquigarrow \nu \rightsquigarrow \xi \circlearrowleft \chi \rightsquigarrow \vartheta \rightsquigarrow \nu \rightsquigarrow \varsigma \). We have \(\mathsf{EL }(\rho ) = \mathsf{EL }(\vartheta \rightsquigarrow \nu ) + \mathsf{EL }(\nu \rightsquigarrow \xi ) + \mathsf{EL }(\chi \rightsquigarrow \vartheta ) + \mathsf{EL }(\vartheta \rightsquigarrow \nu ) + \mathsf{EL }(\nu \rightsquigarrow \varsigma ) = \mathsf{EL }(\chi \rightsquigarrow \vartheta \rightsquigarrow \nu \rightsquigarrow \xi ) + \mathsf{EL }(\vartheta \rightsquigarrow \varsigma )\). By definition of \(\chi = \mathsf{oea }(\xi )\), the first term is positive. Thus, \(\mathsf{EL }(\rho ) \ge \mathsf{EL }(\vartheta \rightsquigarrow \varsigma )\). Arguments are similar for the other cases.\(\square \)

We proceed with the proof of Lemma 5.

Proof

(Lemma 5) Let \((D_{i})_{0 \le i \le n}\) be the sequence of DAGs defined above. We claim that (i) each DAG describes a winning strategy for the same initial credit, (ii) each DAG has the same depth \(l\), and (iii) the last DAG of the sequence has its width bounded by \(\vert S\vert \cdot (2\cdot l\cdot W+ 1)^{k}\).

(i) First, recall that \(\mathcal{P _{1}} \) can play a strategy \(\lambda _{1}^{T} \in \varLambda ^{PF}_{1}\) based on the edges taken by a pebble on \(T\). Notice that moving the pebble as we previously defined is possible because nodes belonging to \(\mathcal{P _{1}} \) have only one child, and nodes of \(\mathcal{P _{2}} \) have children covering all of its choices once, and only once. Fortunately, the \(\mathsf{merge }\) operation maintains this property. Therefore, it is straightforward to see that \(\mathcal{P _{1}} \) can also play a strategy \(\lambda _{1}^{D_{i}} \in \varLambda ^{PF}_{1}\) for a DAG \(D_{i}\) resulting from some merges on \(T\). However, while this would be a valid strategy for \(\mathcal{P _{1}} \), we have to prove that it is still a winning one, for the same initial credit \(v_{0}\) as \(\lambda _{1}^{T}\). Precisely, we claim that \(\forall \, i \ge 0\), we have that \(\lambda _{1}^{D_{i}}\) is winning for \(v_{0}\).

We prove this claim by induction on \(i\). The base case is trivial as \(D_{0} = T\): the strategy \(\lambda _{1}^{T}\) is winning for \(v_{0}\) by definition. Our induction hypothesis is that the claim holds for \(D_{i-1}\), and we prove it for \(D_{i}\) by contradiction. Let \(\varsigma _{1}, \varsigma _{2} \in Q_{i-1}(j)\) be the merged nodes, for some level \(j\) of \(D_{i-1}\). Suppose \(\lambda _{1}^{D_{i}}\) is not winning for \(v_{0}\). Then there exists a finite path \(\zeta \) of the pebble in \(D_{i}\), corresponding to a strategy \(\lambda _{2}^{D_{i}} \in \varLambda ^{PF}_{2}\) of \(\mathcal{P _{2}} \), that achieves a negative value on at least one dimension \(m\), \(1 \le m \le k\), i.e., \(\left( v_{0}+ \mathsf{EL }(\zeta )\right) (m) < 0\). We aim to find a similar path \(\eta \) in \(D_{i-1}\) s.t. \(\mathsf{EL }(\eta ) \le \mathsf{EL }(\zeta )\), thus yielding a contradiction, as it would witness that \(\lambda _{1}^{D_{i-1}}\) is not winning for \(v_{0}\).

We denote by \(\varsigma _\mathsf{m }\) the father of \(\varsigma _{2}\) in \(D_{i-1}\). The only edge added by the \(\mathsf{merge }\) operation is \((\varsigma _\mathsf{m }, \varsigma _{1})\). Obviously, if \(\zeta \) does not involve this edge, then we can take \(\eta = \zeta \) and immediately obtain a contradiction. Thus, we can decompose the witness path

$$\begin{aligned} \zeta = \alpha (1)\,\varsigma _\mathsf{m }\varsigma _{1}\,\beta (1)\circlearrowleft \alpha (2)\, \varsigma _\mathsf{m }\varsigma _{1}\,\beta (2)\circlearrowleft \ldots {}\circlearrowleft \alpha (q)\, \varsigma _\mathsf{m }\varsigma _{1}\,\xi , \end{aligned}$$

for some \(q \ge 1\) s.t. for all \(1 \le p \le q\), we have that \(\alpha (p), \beta (p), \xi \in \left( Q_{i} \cup \lbrace \circlearrowleft \rbrace \right) ^{*}\) are valid paths of the pebble in \(D_{i}\) (and \(D_{i-1}\)); they do not involve edge \((\varsigma _\mathsf{m }, \varsigma _{1})\), i.e., \(\lbrace \varsigma _\mathsf{m }\varsigma _{1}\rbrace \not \subseteq \alpha (p), \beta (p), \xi \); and \(\beta (p) \cap \left( \mathsf{Anc }_{D_{i}}(\varsigma _\mathsf{m }) {\setminus } \mathsf{Anc }_{D_{i-1}}(\varsigma _{1})\right) = \emptyset , \mathsf{Last }(\beta (p))\) is a leaf and \(\mathsf{oea }(\mathsf{Last }(\beta (p))) \in \mathsf{Anc }_{D_{i}}(\varsigma _\mathsf{m })\).

Intuitively, \(\zeta \) is split into several parts with respect to \(q\), the number of times it takes the added edge \((\varsigma _\mathsf{m }, \varsigma _{1})\). Each time, this transition is preceded by some path \(\alpha \). It is then followed by some path \(\beta \) where all visited ancestors of \(\varsigma _\mathsf{m }\) were already ancestors of \(\varsigma _{1}\) in \(D_{i-1}\) (thus, \(\beta \) paths can be kept in \(\eta \)). Finally, after the \(q\)-th transition \(\varsigma _\mathsf{m }\varsigma _{1}\) is taken, the path \(\zeta \) ends with a finite sub-path \(\xi \).

We define the witness path \(\eta \) in \(D_{i-1}\) as \(\eta = \kappa (1)\beta (1)\circlearrowleft \kappa (2)\beta (2)\circlearrowleft \ldots {}\circlearrowleft \kappa (q)\xi ,\) with the following transformation of sub-paths \(\alpha (p)\,\varsigma _\mathsf{m }\varsigma _{1}\):

  • \(\kappa (1) = r\rightsquigarrow _{D_{i-1}} \varsigma _{1}\),

  • \(\forall \, 2 \le p \le q, \kappa (p) = \mathsf{oea }(\mathsf{Last }(\beta (p-1))) \rightsquigarrow _{D_{i-1}} \varsigma _{1}\),

where \(\rightsquigarrow _{D_{i-1}}\) denotes a valid path in \(D_{i-1}\). Note that given preceding definitions, this indeed constitutes a valid path in \(D_{i-1}\). We have to prove that \(\mathsf{EL }(\eta ) \le \mathsf{EL }(\zeta )\). We have

$$\begin{aligned} \mathsf{EL }(\eta ) = \sum _{1 \le p \le q} \mathsf{EL }(\kappa (p)) + \sum _{1 \le p \le q-1} \mathsf{EL }(\beta (p)) + \mathsf{EL }(\xi ), \end{aligned}$$

and

$$\begin{aligned} \mathsf{EL }(\zeta ) = \sum _{1 \le p \le q} \mathsf{EL }(\alpha (p)\,\varsigma _\mathsf{m }\varsigma _{1}) + \sum _{1 \le p \le q-1} \mathsf{EL }(\beta (p)) + \mathsf{EL }(\xi ). \end{aligned}$$

Thus, it remains to show that

$$\begin{aligned} \sum _{1 \le p \le q} \mathsf{EL }(\kappa (p)) \le \sum _{1 \le p \le q} \mathsf{EL }(\alpha (p)\,\varsigma _\mathsf{m }\varsigma _{1}). \end{aligned}$$

In particular, we claim that for all \(1 \le p \le q\), we have \(\mathsf{EL }(\kappa (p)) \le \mathsf{EL }(\alpha (p)\,\varsigma _\mathsf{m }\varsigma _{1})\). Indeed, notice that \(\kappa (p)\) and \(\alpha (p)\) share their starting and ending nodes and that \(\alpha (p)\) contains a finite number of pebble cycles. Let \(\vartheta \) denote the common starting node of both \(\kappa (p)\) and \(\alpha (p)\). Applying Lemma 7 on \(\alpha (p)\), we can eliminate cycles one at a time, without ever increasing the energy level, and obtain a path \(\vartheta \rightsquigarrow _{D_{i}} \varsigma _\mathsf{m }\varsigma _{1}\) s.t. \(\mathsf{EL }(\vartheta \rightsquigarrow _{D_{i}} \varsigma _\mathsf{m }\varsigma _{1}) \le \mathsf{EL }(\alpha (p))\). Since \(\varsigma _{1} \simeq \varsigma _{2}\), we have by Lemma 6 that \(\mathsf{EL }(\vartheta \rightsquigarrow _{D_{i}} \varsigma _\mathsf{m }\varsigma _{1}) = \mathsf{EL }(\vartheta \rightsquigarrow _{D_{i-1}} \varsigma _\mathsf{m }\varsigma _{2}) = \mathsf{EL }(\vartheta \rightsquigarrow _{D_{i-1}} \varsigma _{1})\), implying the claim.

Consequently, we obtain \(\mathsf{EL }(\eta ) \le \mathsf{EL }(\zeta )\), which witnesses that \(\lambda _{1}^{D_{i-1}}\) is not winning for \(v_{0}\). This contradicts our induction hypothesis and concludes our proof that for all \(0 \le i \le n, \lambda _{1}^{D_{i}}\) is winning for \(v_{0}\).

(ii) Second, the \(\mathsf{merge }\) operation only prunes some parts of the tree \(T\), without ever adding any new state, and the added edges only connect existing successive levels. Therefore, each \(D_{i}\) has the same depth \(l\).

(iii) Third, the last DAG of the sequence, \(D_{n}\), is s.t. for every level \(j\), for all \(\varsigma _{1}, \varsigma _{2} \in Q_{n}(j)\), we have \((\varsigma _{1} \ne \varsigma _{2}) \Rightarrow (\varsigma _{1} \not \simeq \varsigma _{2})\). Therefore the width of this DAG is bounded by the number of possible non-equivalent nodes. Recall that two nodes are equivalent if they have the same labels, i.e., they represent the same state of the game and are marked with exactly the same energy level vector. Since the maximal change in energy level on an edge is \(W\), and the depth of the DAG is bounded by \(l = 2^{(d-1) \cdot \vert S\vert } \cdot \left( W\cdot \vert S\vert + 1\right) ^{c \cdot k^{2}}\) thanks to Lemma 3, the possible energy level vectors for each state lie in \(\lbrace -l\cdot W, -l\cdot W+ 1, \ldots {}, l\cdot W- 1, l\cdot W\rbrace ^{k}\). Consequently, the width of \(D_{n}\) is bounded by

$$\begin{aligned} \vert S\vert \cdot (2\cdot l\cdot W+ 1)^{k} \le \vert S\vert \cdot \left( 2^{d \cdot \vert S\vert } \cdot \left( W\cdot \vert S\vert +1\right) ^{c \cdot k^{2}}\cdot W+ 1\right) ^{k}, \end{aligned}$$

which is still single exponential.\(\square \)

Lower bound In the next lemma, we show that the upper bound is tight in the sense that there exist families of games which require exponential memory (in the number of dimensions), even for the simpler case of multi energy objectives without parity and weights in \(\lbrace -1, 0, 1\rbrace \) (Fig. 4). Note that for one-dimension energy parity, it was shown in [16] that exponential memory (in the encoding of weights) may be necessary.

Fig. 4 Family of games requiring exponential memory

Lemma 8

There exists a family of multi energy games \((G(K ))_{K \ge 1} = \left( S_{1}, S_{2}, s_{init}, E, k= 2\cdot K , w: E\rightarrow \lbrace -1, 0, 1\rbrace \right) \) s.t. for any initial credit, \(\mathcal{P _{1}} \) needs exponential memory to win.

The idea is the following: in the example of Fig. 4, if \(\mathcal{P _{1}} \) does not remember the exact choices of \(\mathcal{P _{2}} \) (which requires an exponential size Moore machine), there will exist some sequence of choices of \(\mathcal{P _{2}} \) s.t. \(\mathcal{P _{1}} \) cannot counteract a decrease in energy. Thus, by playing this sequence long enough, \(\mathcal{P _{2}} \) can force \(\mathcal{P _{1}} \) to lose, whatever his initial credit is.

Proof

We define a family of games \((G(K ))_{K \ge 1}\) which is an assembly of \(k= 2\cdot K \) gadgets, the first \(K \) belonging to \(\mathcal{P _{2}} \), and the remaining \(K \) belonging to \(\mathcal{P _{1}} \) (Fig. 4). Precisely, we have \(\vert S_{1}\vert = \vert S_{2}\vert = 3\cdot K , \vert S\vert = \vert E\vert = 6\cdot K = 3\cdot k\) (linear in \(k\)), \(k= 2\cdot K \), and \(w\) defined as:

$$\begin{aligned} \forall \, 1 \le i \le K ,\, w((\circ , s_{i}))&= w((\circ , t_{i})) = (0, \ldots {}, 0),\\ w((s_{i}, s_{i, L}))&= - w((s_{i}, s_{i, R})) = w((t_{i}, t_{i, L})) = - w((t_{i}, t_{i, R})),\\ \forall \, 1 \le j \le k,\, w((s_{i}, s_{i, L}))(j)&= {\left\{ \begin{array}{ll}1 \quad \text { if } j = 2\cdot i - 1\\ -1 \quad \text { if } j = 2\cdot i\\ 0 \quad \text { otherwise}\end{array}\right. }, \end{aligned}$$

where \(\circ \) denotes any valid predecessor state.

There exists a winning strategy \(\lambda ^{exp}_{1}\) for \(\mathcal{P _{1}} \), for initial credit \(v^{exp}_{0} = (1, \ldots {}, 1)\). Indeed, for any strategy of \(\mathcal{P _{2}} \), for any state \(t_{i}\) belonging to \(\mathcal{P _{1}} \), it suffices to play the opposite of the choice made by \(\mathcal{P _{2}} \) on its last visit to \(s_{i}\) in order to maintain at all times an energy vector which is positive on all dimensions. This strategy thus requires remembering the last choice of \(\mathcal{P _{2}} \) in every gadget, which means \(\mathcal{P _{1}} \) needs \(K \) bits to encode these decisions. Thus, this winning strategy is described by a Moore machine containing \(2^{K } = 2^{\frac{k}{2}}\) states, which is exponential in the number of dimensions \(k\).
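As an illustration, here is a minimal Python sketch of the exponential-memory strategy just described, using the state naming of Fig. 4; the dictionary-based encoding of the \(K \) memory bits is ours and purely illustrative.

```python
def make_strategy(K):
    """Winning strategy of P1 in G(K): in t_i, play the opposite of P2's
    last choice in s_i.  The memory is one bit per gadget, i.e. 2**K
    memory states overall (hypothetical encoding)."""
    last_choice = {i: 'L' for i in range(1, K + 1)}   # arbitrary initial memory

    def observe(i, side):
        # called when P2 moves from s_i to s_{i,side}, with side in {'L', 'R'}
        last_choice[i] = side

    def move(i):
        # called in t_i: return the side of the successor t_{i,side} to play
        return 'R' if last_choice[i] == 'L' else 'L'

    return observe, move
```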

We claim that, for any initial credit \(v_{0}\), there exists no winning strategy \(\lambda _{1}\) that can be described with fewer than \(2^{K }\) states, and prove it by contradiction. Suppose \(\mathcal{P _{1}} \) plays according to such a strategy \(\lambda _{1}\). Then there exists some \(1 \le x \le K \) s.t. \(\lambda _{1}(s_{1} \ldots {} s_{x} s_{x, L} \ldots {} t_{x}) = \lambda _{1}(s_{1} \ldots {} s_{x} s_{x, R} \ldots {} t_{x})\), i.e., \(\mathcal{P _{1}} \) chooses the same action in \(t_{x}\) against both choices of the adversary. Suppose that \(\mathcal{P _{1}} \) chooses to play \(t_{x, L}\) in both cases, that is \(\lambda _{1}(s_{1} \ldots {} s_{x} s_{x, L} \ldots {} t_{x}) = \lambda _{1}(s_{1} \ldots {} s_{x} s_{x, R} \ldots {} t_{x}) = t_{x, L}\). By playing \(s_{x, L}, \mathcal{P _{2}} \) can force a decrease of the energy vector by \(2\) on dimension \(2\cdot x\) at every visit of gadget \(x\). Similarly, if the strategy of \(\mathcal{P _{1}} \) is to play \(t_{x, R}, \mathcal{P _{2}} \) wins by choosing to play \(s_{x, R}\), as dimension \(2\cdot x - 1\) decreases by \(2\) at every visit. Therefore, whatever the finite initial vector of \(\mathcal{P _{1}} , \mathcal{P _{2}} \) can enforce a negative dimension by playing long enough. This contradicts the fact that \(\lambda _{1}\) is winning and concludes our proof that exponential memory is necessary for this simple family of games \((G(K ))_{K \ge 1}\).\(\square \)

We summarize our results in Theorem 1.

Theorem 1

(Optimal memory bounds) The following assertions hold: (1) In multi energy parity games, if there exists a winning strategy, then there exists a finite-memory winning strategy. (2) In multi energy parity and multi mean-payoff games, if there exists a finite-memory winning strategy, then there exists a winning strategy with at most exponential memory. (3) There exists a family of multi energy games (without parity) with weights in \(\lbrace -1, 0 ,1\rbrace \) where all winning strategies require at least exponential memory.

Proof

Thanks to [19, Theorem 3], we have the equivalence between finite-memory winning in multi energy games and in multi mean-payoff games. The rest follows from a straightforward application of Lemmas 1, 4, 5, and 8.\(\square \)

4 Symbolic synthesis algorithm

We now present a symbolic, incremental and optimal algorithm to synthesize a finite-memory winning strategy in a MEG. This algorithm outputs a (set of) winning initial credit(s) and a derived finite-memory winning strategy (if one exists), which is at most exponential in the worst case. Its running time is at most exponential. So our symbolic algorithm can be considered (worst-case) optimal in light of the results of the previous section.

This algorithm computes the greatest fixed point of a monotone operator that defines the sets of winning initial (vectors of) credits for each state of the game. As those sets are upward-closed, they are symbolically represented by their minimal elements. To ensure convergence, the algorithm considers only credits that are below some threshold, denoted \(\mathbb{C }\). This is without giving up completeness because, as we show below, for a game \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \), it is sufficient to take the value \(2\cdot l \cdot W\) for \(\mathbb{C }\), where \(l\) is the bound on the depth of the epSCT obtained in Lemma 3 and \(W\) is the largest absolute value of weights used in the game. We also show how to extract a finite-state Moore machine representing a corresponding winning strategy (states of the Moore machine encode the memory of the strategy) from this set of minimal winning initial credits, and how to obtain an incremental algorithm by increasing values for the threshold \(\mathbb{C }\), starting from small values.

A controllable predecessor operator Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a MEG, \(\mathbb{C }\in \mathbb{N }\) be a constant, and \(U(\mathbb{C })\) be the set \((S_{1}\cup S_{2}) \times \lbrace 0, 1,\ldots {},\mathbb{C }\rbrace ^{k}\). Let \(\mathcal{U}(\mathbb{C })=2^{U(\mathbb{C })}\), i.e., the powerset of \(U(\mathbb{C })\), and the operator \(\mathsf{Cpre}_{\mathbb{C }} :\mathcal{U}(\mathbb{C }) \rightarrow \mathcal{U}(\mathbb{C })\) be defined as follows:

$$\begin{aligned} \mathcal E (V)&= \{ (s_1,e_1) \in U(\mathbb{C }) \mid s_1 \in S_{1}\wedge \exists (s_1,s) \in E, \exists (s,e_2) \in V : e_2 \le e_1 + w(s_1,s) \},\\ \mathcal A (V)&= \{ (s_2,e_2) \in U(\mathbb{C }) \mid s_2 \in S_{2}\wedge \forall (s_2,s) \in E, \exists (s,e_1) \in V : e_1 \le e_2 + w(s_2,s) \}, \end{aligned}$$
$$\begin{aligned} \mathsf{Cpre}_{\mathbb{C }}(V) = \mathcal E (V) \;\cup \; \mathcal A (V). \end{aligned}$$
(1)

Intuitively, \(\mathsf{Cpre}_{\mathbb{C }}(V)\) returns the set of energy levels from which \(\mathcal{P _{1}} \) can force an energy level in \(V\) in one step. The operator \(\mathsf{Cpre}_{\mathbb{C }}\) is \(\subseteq \)-monotone over the complete lattice \(\mathcal{U}(\mathbb{C })\), and so there exists a greatest fixed point for \(\mathsf{Cpre}_{\mathbb{C }}\) in the lattice \(\mathcal{U}(\mathbb{C })\), denoted by \(\mathsf{Cpre}_{\mathbb{C }}^{*}\). As usual, the greatest fixed point of the operator \(\mathsf{Cpre}_{\mathbb{C }}\) can be computed by successive approximations as the last element of the following finite \(\subseteq \)-descending chain. We define the algorithm \(\mathsf{CpreFP}\) that computes this greatest fixed point:

$$\begin{aligned} U_0=U(\mathbb{C }),\;U_1=\mathsf{Cpre}_{\mathbb{C }}(U_0),\;\ldots ,\;U_n=\mathsf{Cpre}_{\mathbb{C }}(U_{n-1})=U_{n-1}. \end{aligned}$$
(2)

The set \(U_i\) contains all the energy levels that are sufficient to maintain the energy positive in all dimensions for \(i\) steps. Note that the length of this chain can be bounded by \(| U(\mathbb{C }) |\) and the time needed to compute each element of the chain can be bounded by a polynomial in \(| U(\mathbb{C }) |\). As a consequence, we obtain the following lemma.
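The following Python sketch implements the chain of Eq. (2) with the sets \(U_{i}\) kept explicitly (no antichain compression yet); the dictionary-based game encoding (`E`, `w`) and all identifiers are assumptions of the sketch, not notation from the paper.

```python
from itertools import product

def cpre_fixpoint(S1, S2, E, w, k, C):
    """Greatest fixed point of Cpre_C, computed as in Eq. (2).
    E maps a state to its list of successors, w maps an edge (s, s') to its
    k-dimensional weight vector; states are any hashable values."""
    states = set(S1) | set(S2)
    vectors = list(product(range(C + 1), repeat=k))
    U = {(s, e) for s in states for e in vectors}     # U_0 = U(C)

    def ok(e_from, s_from, s_to, V):
        # is there (s_to, e2) in V with e2 <= e_from + w(s_from, s_to)?
        return any(t == s_to and all(e2[d] <= e_from[d] + w[(s_from, s_to)][d]
                                     for d in range(k))
                   for (t, e2) in V)

    while True:
        newU = set()
        for (s, e) in U:
            if s in S1:
                if any(ok(e, s, t, U) for t in E[s]):   # P1 picks one edge
                    newU.add((s, e))
            else:
                if all(ok(e, s, t, U) for t in E[s]):   # P2 may pick any edge
                    newU.add((s, e))
        if newU == U:
            return U                                    # fixed point reached
        U = newU
```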

Lemma 9

Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a multi energy game and \(\mathbb{C }\in \mathbb{N }\) be a constant. Then \(\mathsf{Cpre}_{\mathbb{C }}^{*}\) can be computed in time bounded by a polynomial in \(| U(\mathbb{C }) |\), i.e., an exponential in the size of \(G\).

Symbolic representation To define a symbolic representation of the sets manipulated by the \(\mathsf{Cpre}_{\mathbb{C }}\) operator, we exploit the following partial order: let \((s,e), (s^{\prime },e^{\prime }) \in U(\mathbb{C })\), we define

$$\begin{aligned} (s,e) \preceq (s^{\prime },e^{\prime })\quad \text{ iff } s = s^{\prime } \text{ and } e \le e^{\prime }. \end{aligned}$$
(3)

A set \(V \in \mathcal{U}(\mathbb{C })\) is closed if for all \((s,e), (s^{\prime },e^{\prime }) \in U(\mathbb{C })\), if \((s,e) \in V\) and \((s,e) \preceq (s^{\prime },e^{\prime })\), then \((s^{\prime },e^{\prime }) \in V\). By definition of \(\mathsf{Cpre}_{\mathbb{C }}\), we get the following property.

Lemma 10

All sets \(U_i\) in Eq. (2) are closed for \(\preceq \).

Therefore, all sets \(U_{i}\) in the descending chain of Eq. (2) can be symbolically represented by their minimal elements \(\mathsf{Min}_{\preceq }(U_{i})\), which form an antichain for \(\preceq \). Even if the largest antichain can be exponential in \(G\), this representation is, in practice, often much more efficient, even for small values of the parameters. For example, with \(\mathbb{C }= 4\) and \(k= 4\), we have that the cardinality of a set can be as large as \(\vert U_{i}\vert \le 625\) whereas the size of the largest antichain is bounded by \(\vert \mathsf{Min}_{\preceq }(U_{i})\vert \le 35\). Antichains have proved to be very efficient: see for example [2, 24, 25]. Therefore, our algorithm is expected to have good performance in practice.
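A possible (hypothetical) implementation of this symbolic representation keeps only the \(\preceq \)-minimal elements and answers membership queries by domination:

```python
def minimal_elements(V):
    """Antichain of <=-minimal elements of a set V of pairs (state, vector):
    (s, e) is dropped if some (s, f) in V with f <= e componentwise, f != e."""
    return [(s, e) for (s, e) in V
            if not any(t == s and f != e and all(x <= y for x, y in zip(f, e))
                       for (t, f) in V)]

def member(antichain, s, e):
    """Membership in the upward-closed set represented by `antichain`."""
    return any(t == s and all(x <= y for x, y in zip(f, e))
               for (t, f) in antichain)
```

In the fixed-point sketch above, each \(U_{i}\) could then be stored as `minimal_elements(U_i)` and the domination test `member` used inside the `ok` check.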

Correctness and completeness The following two lemmas relate the greatest fixed point \(\mathsf{Cpre}_{\mathbb{C }}^{*}\) and the existence of winning strategies for \(\mathcal{P _{1}} \) in \(G\). We start with the correctness of the symbolic algorithm.

Lemma 11

(Correctness) Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a multi energy game, let \(\mathbb{C }\in \mathbb{N }\) be a constant. If there exists \((c_{1}, \ldots {}, c_{k}) \in \mathbb{N }^{k}\) s.t. \((s_{init}, (c_{1}, \ldots {}, c_{k})) \in \mathsf{Cpre}_{\mathbb{C }}^{*}\), then \(\mathcal{P _{1}} \) has a winning strategy in \(G\) for initial credit \((c_{1}, \ldots {}, c_{k})\) and the memory needed by \(\mathcal{P _{1}} \) can be bounded by \(| \mathsf{Min}_{\preceq }(\mathsf{Cpre}_{\mathbb{C }}^{*})|\) (the size of the antichain of minimal elements in the fixed point).

Given the set of winning initial credits output by \(\mathsf{CpreFP}\), it is straightforward to derive a corresponding winning strategy of at most exponential size. Indeed, for a winning initial credit \(\overline{c} \in \mathbb{N }^{k}\), we build a Moore machine whose (i) states are the minimal elements of the fixed point (an antichain at most exponential in \(G\)), (ii) initial state is any element \((t, u)\) among them s.t. \(t=s_{init}\) and \(u \le \overline{c}\), (iii) next-action function prescribes an action that ensures remaining in the fixed point, and (iv) update function maintains an accurate energy level in the memory.
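The construction (i)–(iv) can be sketched as follows in Python; it reuses the hypothetical dictionaries of the earlier sketches, assumes the game alternates between the two players, and ignores the upper bound \(\mathbb{C }\) for brevity, so it is an illustration rather than the paper's exact construction.

```python
def build_moore_machine(fixpoint_mins, E, w, s_init, credit):
    """Sketch of the Moore machine of Lemma 11.  `fixpoint_mins` is the
    antichain of minimal elements of the fixed point (for both players)."""
    def covers(t, u):
        # minimal element (t, u') of the fixed point with u' <= u, if any
        for (tp, up) in fixpoint_mins:
            if tp == t and all(a <= b for a, b in zip(up, u)):
                return (tp, up)
        return None

    q0 = covers(s_init, credit)                 # (ii) initial memory state

    def act(q):                                 # (iii) next-action function
        t, u = q                                # t is a P1 state
        for t2 in E[t]:
            u2 = tuple(a + b for a, b in zip(u, w[(t, t2)]))
            if covers(t2, u2):                  # successor stays in the fixed point
                return (t, t2)

    def update(q, p2_edge):                     # (iv) memory update after P2's answer
        (t, t2), (_, t3) = act(q), p2_edge      # P2 moved along (t2, t3)
        u3 = tuple(a + b + c
                   for a, b, c in zip(q[1], w[(t, t2)], w[(t2, t3)]))
        return covers(t3, u3)                   # jump back to a minimal element

    return q0, act, update
```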

Proof

We denote by \(\overline{c}\) the \(k\)-dimensional credit vector \((c_{1}, \ldots {}, c_{k})\). W.l.o.g. we assume that states of \(G\) alternate between positions of \(\mathcal{P _{1}} \) and positions of \(\mathcal{P _{2}} \) (otherwise, we split needed edges by introducing dummy states). From \(\mathsf{Cpre}_{\mathbb{C }}^{*}\), we construct a Moore machine \(M=(Q^M,q_0^M,\varDelta ^M, \mathsf{Act}^M)\) which respects the following definitions:

  • \(Q^M=\mathsf{Min}_{\preceq } \{ (t,u) \in S_1 \times \lbrace 0 \ldots {} \mathbb{C }\rbrace ^{k} \mid (t,u) \in (\mathsf{Cpre}_{\mathbb{C }}^{*}) \}\). The set of states of the machine is the antichain of \(\preceq \)-minimal elements that belongs to \(\mathcal{P _{1}} \) in the fixed point. Note that the length of this antichain is bounded by an exponential in the size of the game.

  • \(q_0^M\) is any element \((t,u)\) in \(Q^M\) s.t. \(t=s_{init}\) and \(u \le \overline{c}\). Note that such an element is guaranteed to exist as \((s_{init}, \overline{c}) \in \mathsf{Cpre}_{\mathbb{C }}^{*}\).

  • For all \((t,u) \in Q^M\), we define \(\mathsf{Act}^M((t,u))\) by choosing any element \((t,t^{\prime }) \in E\) s.t. there exists \((t^{\prime },u^{\prime }) \in \mathsf{Cpre}_{\mathbb{C }}^*\) with \(u^{\prime }=u+w(t,t^{\prime })\). Such an element is guaranteed to exist by definition of \(\mathsf{Cpre}_{\mathbb{C }}\) and the fact that \((t,u) \in \mathsf{Cpre}_{\mathbb{C }}^{*}\).

  • \(\varDelta ^M :Q^M \times ((S_{2}\times S) \cap E) \mapsto Q^M\) is any partial function that respects the following constraint: if \(\mathsf{Act}^M((t,u))=(t,t^{\prime })\) then \(\varDelta ^M((t,u), (t^{\prime },t^{\prime \prime }))\) is defined for any \((t^{\prime },t^{\prime \prime }) \in E\) and can be chosen to be equal to any \((t^{\prime \prime },u^{\prime \prime })\) s.t. \(u^{\prime \prime } \le u+w(t,t^{\prime })+w(t^{\prime },t^{\prime \prime })\), and such an \(u^{\prime \prime }\) is guaranteed to exist by definition of \(\mathsf{Cpre}_{\mathbb{C }}\) and because \(\mathsf{Cpre}_{\mathbb{C }}^{*}\) is a fixed point.

Now, let us prove that for any initial prefix \(s_0s_1 \dots s_{2n}\) of even length in \(G\), which is compatible with \(M\), we have that \(\overline{c} + \mathsf{EL }(s_0s_1 \dots s_{2n-1}) \ge 0\) and \(\overline{c} + \mathsf{EL }(s_0s_1 \dots s_{2n}) \ge 0\). To establish this property, we first prove the following property by induction on \(n\): \(\overline{c} + \mathsf{EL }(s_0s_1 \dots s_{2n}) \ge u\) where \(u\) is the energy level of the label of the state reached after reading the prefix \(s_0s_1 \dots s_{2n}\) with the Moore machine \(M\). The base case \(n=0\) is trivial. Induction: assume that the property is true for \(n-1\), and let us establish it for \(n\). By induction hypothesis, we have that \(\overline{c} + \mathsf{EL }(s_0s_1 \dots s_{2(n-1)}) \ge u\) where \(u\) is the energy level of the label of the state \(q\) that is reached after reading \(s_0s_1 \dots s_{2(n-1)}\) with the Moore machine. Now, assume that \(\mathsf{Act}^M(q)=(t,t^{\prime })\). So, \(s_{2(n-1)}=t\) and the choice of \(\mathcal{P _{1}} \) is to play \((t,t^{\prime })\). So, \(s_{2(n-1)+1}=t^{\prime }\). Now for all possible choices \((t^{\prime },t^{\prime \prime })\) of \(\mathcal{P _{2}} \), we know by definition of \(M\) that the energy level \(u^{\prime \prime }\) that labels the state \(\varDelta ^M(q,(t^{\prime },t^{\prime \prime }))\) is \(u^{\prime \prime } \le u+w(t,t^{\prime })+w(t^{\prime },t^{\prime \prime })\), which establishes our property. Since every energy level labeling a state of \(M\) belongs to \(\lbrace 0, \ldots {}, \mathbb{C }\rbrace ^{k}\) and is therefore non-negative, this property yields \(\overline{c} + \mathsf{EL }(s_0s_1 \dots s_{2n}) \ge 0\); the bound for prefixes of odd length follows similarly since, by definition of \(\mathsf{Act}^M\), the intermediate level \(u+w(t,t^{\prime })\) also belongs to \(\lbrace 0, \ldots {}, \mathbb{C }\rbrace ^{k}\). Therefore, the strategy of \(\mathcal{P _{1}} \) based on \(M\) is s.t. the energy always stays positive for initial credit \(\overline{c}\), which concludes the proof.\(\square \)

Completeness of the symbolic algorithm is guaranteed when a sufficiently large threshold \(\mathbb{C }\) is used as established in the following lemma.

Lemma 12

(Completeness) Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a multi energy game in which all absolute values of weights are bounded by \(W\). If \(\mathcal{P _{1}} \) has a winning strategy in \(G\) and \(T=(Q,R)\) is a self-covering tree for \(G\) of depth \(l\), then \((s_{init}, (\mathbb{C }, \ldots {}, \mathbb{C })) \in \mathsf{Cpre}_{\mathbb{C }}^{*}\) for \(\mathbb{C }= 2\cdot l \cdot W\).

Remark 1

This algorithm is complete in the sense that if a winning strategy exists for \(\mathcal{P _{1}} \), it outputs at least a winning initial credit (and the derived strategy) for \(\mathbb{C }= 2\cdot l \cdot W\). However, this is different from the fixed initial credit problem, which consists in deciding if a particular given credit vector is winning and is known to be EXPSPACE-hard by equivalence with deciding the existence of an infinite run in a Petri net given an initial marking [12, 30]. In general, there may exist winning credits incomparable to those captured by algorithm \(\mathsf{CpreFP}\). More precisely, given a constant \(\mathbb{C }\in \mathbb{N }\), the algorithm fully captures all the winning initial credits smaller than \((\mathbb{C }, \ldots {}, \mathbb{C })\). Indeed, the fixed point computation considers the whole range of initial credits up to the given constant exhaustively, and only removes credits if they do not suffice to win. By Lemma 12, it is moreover guaranteed that if an arbitrary winning initial credit exists, then there exists one in the range defined by the constant \(\mathbb{C }= 2\cdot l \cdot W\). Nevertheless, since our algorithm works in exponential time while the problem of finding all the winning initial credits is EXPSPACE-hard, there may be some incomparable credits outside that range that are not captured by the algorithm (comparable credits are captured since we work with upward-closed sets). Indeed, if our algorithm were able to compute exhaustively all winning credits in exponential time, this would imply that EXPTIME equals EXPSPACE. Notice that defining a class of games for which the algorithm \(\mathsf{CpreFP}\) proves to be incomplete (in the sense that incomparable winning credits exist outside the region captured by constant \(\mathbb{C }= 2\cdot l \cdot W\)) is an interesting open problem.

Proof

To establish this property, we first prove that from the set of labels of \(T\), we can construct a set \(f\) which is increasing for the operator \(\mathsf{Cpre}_{\mathbb{C }}\), i.e., \(\mathsf{Cpre}_{\mathbb{C }}(f) \supseteq f\), and s.t. \((s_{init}, (\mathbb{C }, \ldots {}, \mathbb{C })) \in f\). We define \(f\) from \(T=(Q,R)\) as follows. Let \(C \in \mathbb{N }\) be the smallest non-negative integer s.t. for all \(q \in Q\), with \(\varTheta (q)=(t,u)\), for all dimensions \(i, 1 \le i \le k\), we have that \(u(i)+C \ge 0\). \(C\) is bounded from above by \(l \cdot W\) because on every path from the root to a leaf in \(T\), every dimension is at most decreased \(l\) times by an amount bounded by \(W\), and at the root all the dimensions are equal to \(0\). For any \(q \in Q\), we denote by \(\varTheta (q)+C\) the label of \(q\) where the energy level has been increased by \(C\) in all the dimensions, i.e., if \(\varTheta (q)=(t,u)\) then \(\varTheta (q)+C=(t,u+ (C,\ldots ,C))\). Note that for all nodes in \(Q\), the label is at most \(l \cdot W\) and thus the shifted label remains under \(\mathbb{C }= 2\cdot l\cdot W\). Now, we define the set \(f\) as follows:

$$\begin{aligned} f=\{ (t,u) \in U(\mathbb{C }) \mid \exists \, q \in Q,\, \quad \varTheta (q)+C \preceq (t,u) \}. \end{aligned}$$
(4)

So, \(f\) is defined as the \(\preceq \)-closure of the set of labels in \(T\) shifted by \(C\) in all the dimensions.

First, note that \((s_{init}, (\mathbb{C },\ldots ,\mathbb{C })) \in f\) as the label of the root in \(T\) is \((s_{init}, (0,\ldots ,0))\). Second, let us show that \(\mathsf{Cpre}_{\mathbb{C }}(f) \supseteq f\). Take any \((t,u) \in f\) and let us show that \((t,u) \in \mathsf{Cpre}_{\mathbb{C }}(f)\). We decompose the proof in two cases. (A) \(t \in S_{1}\). By definition of \(f\), there exists \(q \in Q\) s.t. \(\varTheta (q)+C \preceq (t,u)\). W.l.o.g. we can assume that \(q\) is not a leaf as otherwise there exists an ancestor \(q^{\prime }\) of \(q\) s.t. \(\varTheta (q^{\prime }) \preceq \varTheta (q)\) (recall the set is described by its minimal elements). By definition of \(T\), there exists \((t,t^{\prime }) \in E\) and \(q^{\prime } \in Q\) s.t. \((q,q^{\prime }) \in R\) and \(\varTheta (q^{\prime }) = \varTheta (q) + w(t, t^{\prime })\). Let \((t^{\prime }, v) = \varTheta (q^{\prime }) + C\). By definition of \(f\), we have \((t^{\prime },v) \in f\). By Eq. (1), it follows that \((t,u) \in \mathsf{Cpre}_{\mathbb{C }}(f)\). (B) \(t \in S_{2}\). By definition of \(f\), there exists \(q \in Q\) s.t. \(\varTheta (q)+C \preceq (t,u)\). Again, w.l.o.g. we can assume that \(q\) is not a leaf as otherwise there exists an ancestor \(q^{\prime }\) of \(q\) s.t. \(\varTheta (q^{\prime }) \preceq \varTheta (q)\). By definition of \(T\), for all \((t,t^{\prime }) \in E\), there is \(q^{\prime } \in Q\) s.t. \((q,q^{\prime }) \in R\) and \(\varTheta (q^{\prime }) = \varTheta (q) + w(t, t^{\prime })\). Let \((t^{\prime }, v) = \varTheta (q^{\prime }) + C\). By definition of \(f\), we have \((t^{\prime },v) \in f\). By Eq. (1), it follows that \((t,u) \in \mathsf{Cpre}_{\mathbb{C }}(f)\).

Now, let us show that \(f \subseteq \mathsf{Cpre}_{\mathbb{C }}^{*}\). This is a direct consequence of the monotonicity of \(\mathsf{Cpre}_{\mathbb{C }}\): it is well known that for any monotone function on a complete lattice, its greatest fixed point is equal to the least upper bound of all post-fixed points (points \(e\) s.t. \(e \subseteq \mathsf{Cpre}_{\mathbb{C }}(e)\)), i.e., \(\mathsf{Cpre}_{\mathbb{C }}^{*} = \bigcup \{ e \mid e \subseteq \mathsf{Cpre}_{\mathbb{C }}(e) \} \supseteq f\). As \((s_{init}, (\mathbb{C }, \ldots {}, \mathbb{C })) \in f\), that concludes the proof.\(\square \)
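For illustration, the post-fixed point \(f\) used in this proof can be computed from the tree labels as in the following sketch, which reuses the hypothetical `minimal_elements` helper from the antichain sketch above; the identifiers are ours.

```python
def post_fixed_point_from_tree(labels):
    """Sketch of the set f of Lemma 12: shift every label (t, u) of the
    self-covering tree by the smallest constant C making all energy levels
    non-negative, and represent the upward closure of the shifted labels
    by its minimal elements."""
    C = max(0, max(-min(u) for (_, u) in labels))          # smallest uniform shift
    shifted = {(t, tuple(x + C for x in u)) for (t, u) in labels}
    return C, minimal_elements(shifted)
```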

Remark 2

Note that the exponential bound on memory, obtained in Lemma 5, can also be derived from the Moore machine construction of Lemma 11, as this method is complete according to Lemma 12. Still, the DAG construction of Lemma 5 is interesting in its own right, and introduces the concept of node merging, which underlies the correctness of the symbolic algorithm while remaining transparent in its use.

Incrementality While the threshold \(2 \cdot l \cdot W\) is sufficient, it may be the case that \(\mathcal{P _{1}} \) can win the game even if its energy level is bounded above by some smaller value. So, in practice, we can use Lemma 11 to justify an incremental algorithm that starts with small values for the parameter \(\mathbb{C }\) and stops as soon as a winning strategy is found, or when the value of \(\mathbb{C }\) reaches the threshold \(2\cdot l \cdot W\) and no winning strategy has been found.
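A possible incremental wrapper around the fixed-point computation, reusing the hypothetical `cpre_fixpoint` sketch above, could look as follows; the doubling schedule for \(\mathbb{C }\) is our choice, not prescribed by the paper.

```python
def incremental_synthesis(S1, S2, E, w, k, W, l, s_init):
    """Try small thresholds first; stop as soon as a winning initial credit
    is found, or once C reaches the sufficient bound 2*l*W (Lemma 12)."""
    C = 1
    while True:
        fixpoint = cpre_fixpoint(S1, S2, E, w, k, C)
        winning = [(s, e) for (s, e) in fixpoint if s == s_init]
        if winning:
            return C, winning            # some winning initial credit found
        if C >= 2 * l * W:
            return None                  # no winning strategy exists
        C = min(2 * C, 2 * l * W)        # increase the threshold and retry
```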

Application of the symbolic algorithm to MEPGs and MMPGs Using the reduction of Lemma 4 that allows us to remove the parity condition, and the equivalence between multi energy games and multi mean-payoff games for finite-memory strategies (given by Chatterjee et al. [19, Theorem 3]), along with Lemma 9 (complexity), Lemma 11 (correctness) and Lemma 12 (completeness), we obtain the following result.

Theorem 2

(Symbolic and incremental synthesis algorithm) Let \(G_{p}\) be a multi energy (resp. multi mean-payoff) parity game. Algorithm \(\mathsf{CpreFP}\) is a symbolic and incremental algorithm that synthesizes a winning strategy in \(G_{p}\) of at most exponential size memory, if a winning (resp. finite-memory winning) strategy exists. In the worst-case, the algorithm \(\mathsf{CpreFP}\) takes exponential time.

Proof

The correctness and completeness for algorithm \(\mathsf{CpreFP}\) on multi energy games are resp. given by Lemmas 11 and 12. Extension to mean-payoff games (under finite memory) is given by Chatterjee et al. [19, Theorem 3], whereas the parity condition can be encoded as energy thanks to Lemma 4. Exponential worst-case complexity of the algorithm \(\mathsf{CpreFP}\) is induced by Lemma 9.\(\square \)

Integration in synthesis tools Following the conference version of this paper [21], our results on strategy synthesis have been used in the Acacia+ synthesis tool. This tool originally handled the synthesis of controllers for specifications expressed in LTL (Linear Temporal Logic, a classical formalism for formal specifications [36]) using antichain-based algorithms and has recently been extended to the synthesis from LTL specifications with mean-payoff objectives [8]. The addition of multi-mean-payoff objectives to LTL specifications provides a convenient way to enforce that synthesized controllers also satisfy some reasonable behavior from a quantitative standpoint, such as minimizing the number of unsolicited grants in a client-server architecture with prioritized clients. Numerous practical applications may benefit from this multi-dimension framework.

The authors present an approach in which the corresponding synthesis problem ultimately reduces to strategy synthesis on a multi-energy game [8, Theorem 26]. Their implementation uses fixed point computations similar to Eq. (2) and has proved efficient (considering the complexity of the problem) in practice. It uses antichains to provide a compact representation of upper-closed sets and implements the incremental approach proposed before (regarding the constant \(\mathbb{C }\)). In practical benchmarks, winning strategies can generally be found for rather small values of \(\mathbb{C }\). Hence, the incremental approach overcomes the need to compute up to the exponential theoretical bound \(\mathbb{C }= 2 \cdot l \cdot W\) in many cases. Sample benchmarks and experiments can be found in [8], and the tool can be used online [1].

5 Trading finite memory for randomness

In this section, we answer the fundamental question regarding the trade-off of memory for randomness in strategies: we study for which kinds of games \(\mathcal{P _{1}} \) can replace a pure finite-memory winning strategy by an equally powerful, yet conceptually simpler, randomized memoryless one and discuss how memory is encoded into probability distributions. Note that we do not consider wider strategy classes (e.g., randomized finite-memory), nor do we allow randomization for \(\mathcal{P _{2}} \) (which in most cases is dispensable anyway). Indeed, we aim at a better understanding of the underlying mechanics of memory and randomization, in order to provide alternative strategy representations of practical use, not at the exploration of more complex games with wider strategy classes (Lemma 21 gives a glimpse of the latter).

We present an overview of our results in Table 1 and summarize them in Theorem 3. Note that we do not consider the opposite implication, i.e., whether there always exists a way of encoding a randomized memoryless strategy into an equivalent finite-memory one. In general, this is not the case, even for classes of games where we can trade memory for randomness, as can easily be witnessed on the one-player multi mean-payoff game depicted in Fig. 5. Indeed, expectation \((1, 1)\) is achievable with a simple uniform distribution, while it is not achievable with any pure strategy, however large (even infinite) its memory.

Table 1 When pure finite memory for \(\mathcal{P _{1}} \) can be traded for randomized memorylessness
Fig. 5 Randomization can replace memory, but not the opposite

We break down these results into three subsections: energy games, multi mean-payoff (parity) games, and single mean-payoff parity games. We start with energy games.

5.1 Randomization and energy games

Randomization is not helpful for energy objectives, even in one-player games. The proof argument is obtained from the intuition that energy objectives are similar in spirit to safety objectives.

Lemma 13

Randomization is not helpful for almost-sure winning in one-player and two-player energy, multi energy, energy parity and multi energy parity games: if there exists a finite-memory randomized winning strategy, then there exists a pure winning strategy with the same memory requirements.

Proof

Let \(G_{p}\) be a game fitted with an energy objective. Consider an almost-sure winning strategy \(\lambda _{1}\). If there existed a single path \(\pi \) consistent with \(\lambda _{1}\) that violated the energy objective, then there would exist a finite prefix witness \(\rho \) violating the energy objective; since a consistent finite prefix has positive probability, this would contradict the fact that \(\lambda _{1}\) is almost-sure winning. It follows that no such path exists. In other words, \(\lambda _{1}\) is a sure winning strategy. Since randomization does not help for sure winning, it follows that randomization is not helpful for one-player and two-player energy, multi energy, energy parity and multi energy parity games.\(\square \)

5.2 Randomization and multi mean-payoff (parity) games

Randomized memoryless strategies can replace pure finite-memory ones in the one-player multi mean-payoff parity case, but not in the two-player one, even without parity. We first note a useful link between satisfaction and expectation semantics for the mean-payoff objective.

Lemma 14

Let \({G= \left( S_{1}, S_{2}, s_{init}, E, k, w\right) } \) be a game structure with mean-payoff objective \(\phi = \mathsf{MeanPayoff }_{G}(v)\) for some threshold vector \(v \in \mathbb{Q }^{k}\). Let \(\lambda _{1} \in \varLambda _{1}\) be a strategy of \(\mathcal{P _{1}} \). If \(\lambda _{1}\) is almost-sure winning for \(\phi \) (i.e., winning for \(1\)-satisfaction), then \(\lambda _{1}\) is also winning for \(v\)-expectation for the mean-payoff function \(\mathsf{MP }\). The opposite does not hold.

Proof

We first discuss the claimed implication. Suppose \(1\)-satisfaction is verified. Then, for every strategy \(\lambda _{2} \in \varLambda _{2}\) of \(\mathcal{P _{2}} \), the set of consistent plays of value \(\ge v\) has measure \(1\), while the one of value \(< v\) has measure \(0\), by definition. Therefore, the expectation \(\mathbb E _{s_{init}}^{\lambda _{1}, \lambda _{2}}(\mathsf{MP })\) is at least \(v\) and \(v\)-expectation is verified.

To show that the opposite does not hold, consider the simple one-player game depicted on Fig. 5. Let \(\lambda _{1}\) be a simple coin flipping on \(s_{1}\), i.e., \(\lambda _{1}(s_{1})(s_{2}) = 1/2, \lambda _{1}(s_{1})(s_{3}) = 1/2, \lambda _{1}(s_{2})(s_{2}) = 1\) and \(\lambda _{1}(s_{3})(s_{3}) = 1\). The expectation of this strategy is \(v = (1,1)\). Nevertheless, the probability of achieving mean-payoff of at least \(v\) is \(0 < 1\), which shows that it does not verify \(1\)-satisfaction for \(\mathsf{MeanPayoff }_{G}(v)\).\(\square \)

The fundamental difference between energy and mean-payoff is that energy requires a property to be satisfied at all times (in that sense, it is similar to safety), while mean-payoff is a limit property. As a consequence, what matters here is the long-run frequencies of weights, not their order of appearance, as opposed to the energy case.

Lemma 15

Pure finite-memory winning strategies can be traded for equally powerful randomized memoryless ones in one-player multi mean-payoff parity games, for both the satisfaction and the expectation semantics. For two-player games, randomized memoryless strategies are not as powerful, even when restricted to the expectation semantics, with no parity condition and only 2 dimensions.

For the one-player case, we extract the frequencies of visit for edges of the graph from the regular outcome that arises from the finite-memory strategy of \(\mathcal{P _{1}} \). We build a randomized strategy with probability distributions on edges that yield the exact same frequencies in the long-run. Therefore, if the original pure finite-memory of \(\mathcal{P _{1}} \) is surely winning, the randomized one is almost-surely winning. For the two-player case, this approach cannot be used as frequencies are not well defined, since the strategy of \(\mathcal{P _{2}} \) is unknown. Consider a game which needs perfect balance between frequencies of appearance of two sets of edges in a play to be winning (Fig. 6). To almost-surely achieve mean-payoff vector \((0, 0), \mathcal{P _{1}} \) must ensure that the long-term balance between edges \((s_{4}, s_{5})\) and \((s_{4}, s_{6})\) is the same as the one between edges \((s_{1}, s_{3})\) and \((s_{1}, s_{2})\). This is achievable with memory as it suffices to react immediately to compensate the choice of \(\mathcal{P _{2}} \). However, given a randomized memoryless strategy of \(\mathcal{P _{1}} , \mathcal{P _{2}} \) always has a strategy to enforce that the long-term frequency is unbalanced, and thus the game cannot be won almost-surely by \(\mathcal{P _{1}} \) with such a strategy. Achieving expected mean-payoff \((0, 0)\) is also excluded.

Fig. 6 Memory is needed to enforce perfect long-term balance

Proof

We begin with the one-player case. Let \(G_{p}\) be a multi mean-payoff parity game. Let \(\lambda _{1}^{pf} \in \varLambda ^{PF}_{1}\) be the pure finite-memory strategy of the player. Since it is pure and finite, its outcome is a regular word \(\pi = \rho _{1} \cdot \rho _{2}^{\omega }\), with \(\rho _{1} \in S^{*}, \rho _{2} \in S^{+}\). Let \(\phi = {\mathsf{MeanPayoff }_{G_{p}}(v)} \cap {\mathsf{Parity }_{G_{p}}} \) be the multi mean-payoff parity objective for some threshold vector \(v \in \mathbb{Q }^{k}\). Suppose this strategy verifies \(\alpha \)-satisfaction for \(\phi \) and \(\beta \)-expectation for the \(\mathsf{MP }\) function, for some \(\alpha , \beta \). We claim that there exists a randomized memoryless strategy \(\lambda _{1}^{rm} \in \varLambda ^{RM}_{1}\) that is also \(\alpha \)-satisfying for \(\phi \) and that satisfies \(\beta \)-expectation for the \(\mathsf{MP }\) function; and we show how to build it.

We denote concatenation by the \(\cdot \) symbol. Given a finite word \(\rho \in S^{*}\), two states \(s, s^{\prime } \in S\), we resp. denote by \(\mathsf{occ }(s, \rho )\) and \(\mathsf{occ }((s, s^{\prime }), \rho )\) the number of occurrences of the state \(s\) and the transition \((s, s^{\prime })\) in the word \(\rho \). We add the subscript \(\circ \) when we count the first state of the word as the successor of the last one (i.e., the word is a cycle in the game graph). That is, \(\mathsf{occ }_{\circ }(*, \rho ) = \mathsf{occ }(*, \rho \cdot \mathsf{First }(\rho ))\).

Let us consider the mean-payoff of the outcome of strategy \(\lambda _{1}^{pf}\). Recall that for a play \(\pi \in \mathsf{Plays }(G), \pi = s^{1}, s^{2}, s^{3}\ldots {}\), we have \(\mathsf{MP }(\pi ) = \liminf _{n \rightarrow \infty } \frac{1}{n} \sum _{1 \le i < n} w(s^{i}, s^{i+1})\). Since the play induced by \(\lambda _{1}^{pf}\) is regular, the limit is well defined and we may express the mean-payoff in terms of frequencies, that is

$$\begin{aligned} \mathsf{MP }(\pi ) = \sum _{(s, s^{\prime }) \in E} w(s, s^{\prime }) \cdot \mathsf{freq }_{\infty }(s, s^{\prime }), \end{aligned}$$

where \(\mathsf{freq }_{\infty }\) denotes the long-term frequency of a transition defined as

$$\begin{aligned} \forall \, (s, s^{\prime }) \in E,\quad \mathsf{freq }_{\infty }((s, s^{\prime })) = \dfrac{\mathsf{occ }_{\circ }((s, s^{\prime }), \rho _{2})}{\vert \rho _{2}\vert }. \end{aligned}$$

We define the randomized memoryless strategy \(\lambda _{1}^{rm}\) as follows: \(\forall \, s, s^{\prime } \in S,\, (s, s^{\prime }) \in E,\, X = \left\{ (s, t) \,\vert \, t \in S, (s, t) \in \left( \rho _{1} \cdot \mathsf{First }(\rho _{2})\right) \right\} \),

$$\begin{aligned} \lambda _{1}^{rm}(s)(s^{\prime }) = {\left\{ \begin{array}{ll}\dfrac{1}{\vert X \vert }\quad \text { if } s \in \rho _{1} \,\wedge \, s \not \in \rho _{2},\\ \dfrac{\mathsf{occ }_{\circ }((s, s^{\prime }), \rho _{2})}{\mathsf{occ }(s, \rho _{2})}\quad \text { if } s \in \rho _{2},\\ 0\quad \text { otherwise}. \end{array}\right. } \end{aligned}$$

Intuitively, we fix a uniform distribution over transitions of the finite prefix \(\rho _{1}\) as we only need to ensure reaching the bottom strongly connected component (BSCC) defined by \(\rho _{2}\) with probability \(1\), and the relative frequencies in \(\rho _{1}\) do not matter (because these weights and priorities are negligible in the long run). On the contrary, we use the exact frequencies for transitions of \(\rho _{2}\) as they prevail long-term wise. Note that \(\lambda _{1}^{rm}\) is a correctly defined randomized memoryless strategy.
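The construction of \(\lambda _{1}^{rm}\) from the lasso \(\rho _{1} \cdot \rho _{2}^{\omega }\) can be sketched as follows; states are assumed to be plain hashable values and all identifiers are ours.

```python
from collections import Counter

def randomized_from_lasso(rho1, rho2):
    """Build lambda_1^rm from the regular outcome rho1 . rho2^omega:
    exact cyclic frequencies on the lasso rho2, uniform distribution over
    the successors appearing in the prefix for the remaining states."""
    strategy = {}
    cyc = rho2 + [rho2[0]]                       # close the cycle for occ_o
    edge_occ = Counter(zip(cyc, cyc[1:]))        # occ_o((s, s'), rho2)
    state_occ = Counter(rho2)                    # occ(s, rho2)
    for s in set(rho2):                          # states of the BSCC
        strategy[s] = {t: edge_occ[(s, t)] / state_occ[s]
                       for (u, t) in edge_occ if u == s}
    prefix = rho1 + [rho2[0]]                    # rho1 . First(rho2)
    for s in set(rho1) - set(rho2):              # transient prefix states
        succs = {t for (u, t) in zip(prefix, prefix[1:]) if u == s}
        strategy[s] = {t: 1 / len(succs) for t in succs}
    return strategy
```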

Obviously, \(\lambda _{1}^{rm}\) yields a Markov chain over states of \((\rho _{1} \cup \rho _{2})\) s.t. states of \((\rho _{1} \setminus \rho _{2})\) are transient and states of \(\rho _{2}\) constitute a BSCC that is reached with probability one. Thus, the mean-payoff induced by \(\lambda _{1}^{rm}\) is totally dependent on this BSCC mean-payoff value. As a consequence, proving that transition frequencies in the BSCC are exactly the same as frequencies \(\mathsf{freq }_{\infty }\) defined by \(\lambda _{1}^{pf}\) will imply the claim on mean-payoff. Moreover, parity will remain satisfied as the sets of infinitely often visited states will be the same for both the pure and the randomized strategy. Let \(T = \lbrace t_{1}, t_{2}, \ldots {}, t_{m}\rbrace \) be the set of states that appear in \(\rho _{2}\). This BSCC is an ergodic Markov chain \(\mathcal M _{e}= (T, P)\) with the following matrix of transition probabilities:

$$\begin{aligned} P = \left( P(t_{i}, t_{j})\right) _{1 \le i, j \le m}, \quad \text{ where } P(t_{i}, t_{j}) = \dfrac{\mathsf{occ }_{\circ }\left( (t_{i}, t_{j}), \rho _{2}\right) }{\mathsf{occ }\left( t_{i}, \rho _{2}\right) }. \end{aligned}$$

Classical analysis of ergodic Markov chains grants the existence of a unique probability vector \(\nu \) s.t. \(\nu P = \nu \), i.e.

$$\begin{aligned} \forall \, 1 \le i \le m,\; \nu _{i} = \sum _{1 \le j \le m} \dfrac{\mathsf{occ }_{\circ }\left( (t_{j}, t_{i}), \rho _{2}\right) }{\mathsf{occ }\left( t_{j}, \rho _{2}\right) } \cdot \nu _{j}. \end{aligned}$$

This vector \(\nu \) represents the occurrence frequency of each state in an infinite run over the Markov chain. It is easy to see that the unique probability vector \(\nu \) that satisfies \(\nu P = \nu \) is

$$\begin{aligned} \nu = \left( \dfrac{\mathsf{occ }(t_{1}, \rho _{2})}{\vert \rho _{2}\vert }, \ldots , \dfrac{\mathsf{occ }(t_{m}, \rho _{2})}{\vert \rho _{2}\vert }\right) . \end{aligned}$$

Moreover, given a transition of the Markov chain, its frequency is simply the product of the frequency of its starting state by the probability of the transition when the chain is in this state: for all \(t, t^{\prime } \in T\), we have \(\mathsf{freq }_{\infty }^\mathcal{M _{e}}((t, t^{\prime })) = \nu (t) \cdot P(t, t^{\prime })\). By definition of \(\nu \) and \(P\), that is

$$\begin{aligned} \mathsf{freq }_{\infty }^\mathcal{M _{e}}((t, t^{\prime })) = \dfrac{\mathsf{occ }_{\circ }((t, t^{\prime }), \rho _{2})}{\vert \rho _{2}\vert } = \mathsf{freq }_{\infty }((t, t^{\prime })), \end{aligned}$$

thus proving that the randomized strategy \(\lambda _{1}^{rm}\) yields the same mean-payoff and parity as the pure finite-memory one \(\lambda _{1}^{pf}\).
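As a small sanity check (not part of the proof), one can verify numerically that the occurrence frequencies of \(\rho _{2}\) indeed form the stationary distribution of the induced chain:

```python
import numpy as np
from collections import Counter

def check_stationary(rho2):
    """Check that nu(t) = occ(t, rho2)/|rho2| satisfies nu P = nu for the
    ergodic chain induced on the states of rho2."""
    states = sorted(set(rho2))
    idx = {t: i for i, t in enumerate(states)}
    cyc = rho2 + [rho2[0]]
    edge_occ = Counter(zip(cyc, cyc[1:]))
    state_occ = Counter(rho2)
    P = np.zeros((len(states), len(states)))
    for (s, t), n in edge_occ.items():
        P[idx[s], idx[t]] = n / state_occ[s]
    nu = np.array([state_occ[t] / len(rho2) for t in states])
    assert np.allclose(nu @ P, nu)
    return nu, P

# e.g. check_stationary(list("aabab")) for a hypothetical lasso a a b a b
```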

Now it remains to show that this does not carry over to two-player games. Indeed, we show that randomized memoryless strategies cannot replace pure finite-memory ones for the expectation semantics, even without parity. By Lemma 14, this implies that it cannot be verified for \(1\)-satisfaction semantics either. Consider the game depicted on Fig. 6. Player \(\mathcal{P _{1}} \) has a pure finite-memory strategy \(\lambda _{1}^{pf}\) that ensures \(\mathsf{MP }(\pi ) \ge (0, 0)\), for all strategy \(\lambda _{2}\) of \(\mathcal{P _{2}} \). This strategy is simply to take the opposite choice of \(\mathcal{P _{2}} \): \(\lambda _{1}^{pf}(*s_{2}s_{4}) = s_{6}\) and \(\lambda _{1}^{pf}(*s_{3}s_{4}) = s_{5}\). Now suppose \(\mathcal{P _{1}} \) uses a randomized memoryless strategy \(\lambda _{1}^{rm}\) s.t. \(\lambda _{1}^{rm}(s_{4})(s_{5}) = p\) and \(\lambda _{1}^{rm}(s_{4})(s_{6}) = 1-p\), for some \(p \in \left[ 0, 1\right] \). We claim that whatever the value of \(p\), there exists a counter-strategy \(\lambda _{2}\) for \(\mathcal{P _{2}} \) s.t. \(\mathbb E _{s_{1}}^{\lambda _{1}^{rm}, \lambda _{2}}(\mathsf{MP }) \ngeq (0, 0)\). Suppose \(p \ge 1/2\) and let \(\lambda _{2}(s_{1}) = s_{2}\). Then, we have

$$\begin{aligned} \mathbb E _{s_{1}}^{\lambda _{1}^{rm}, \lambda _{2}}(\mathsf{MP }) = \dfrac{(1, -1) + \left[ p \cdot (1, -1) + (1-p) \cdot (-1, 1)\right] }{4} = \dfrac{1}{2}(p, -p) \ngeq (0, 0). \end{aligned}$$

Now suppose \(p < 1/2\) and let \(\lambda _{2}(s_{1}) = s_{3}\). Then, we have

$$\begin{aligned} \mathbb E _{s_{1}}^{\lambda _{1}^{rm}, \lambda _{2}}(\mathsf{MP }) = \dfrac{(-1, 1) + \left[ p \cdot (1, -1) + (1-p) \cdot (-1, 1)\right] }{4} = \dfrac{1}{2}(p - 1, 1 - p) \ngeq (0, 0). \end{aligned}$$

This shows that memory is needed to achieve the \((0, 0)\)-expectation objective and concludes our proof.\(\square \)

5.3 Randomization and single mean-payoff parity games

Randomized memoryless strategies can replace pure finite-memory ones for single mean-payoff parity games. The proof outline is as follows. We do it in two steps. First, we show that it is the case for the simpler case of MP Büchi games (Lemma 18). Suppose \(\mathcal{P _{1}} \) has a pure finite-memory winning strategy for such a game. We use the existence of particular pure memoryless strategies on winning states: the classical attractor for Büchi states, and a strategy that ensures that cycles of the outcome have positive energy (whose existence follows from Chatterjee and Doyen [16]). We build an almost-surely winning randomized memoryless strategy for \(\mathcal{P _{1}} \) by mixing those strategies in the probability distributions, with sufficient probability over the strategy that is good for energy. We illustrate this construction on the simple game \(G_{p}\) depicted in Fig. 7. Let \(\lambda _{1}^{pf} \in \varLambda ^{PF}_{1}\) be a strategy of \(\mathcal{P _{1}} \) s.t. \(\mathcal{P _{1}} \) plays \((s_{1}, s_{1})\) \(8\) times, then plays \((s_{1}, s_{2})\) once, and so on. This strategy is surely winning for the objective \(\phi = \mathsf{MeanPayoff }_{G_{p}}(3/5)\). Obviously, \(\mathcal{P _{1}} \) has a pure memoryless strategy that ensures winning for the Büchi objective: playing \((s_{1}, s_{2})\). On the other hand, he also has a pure memoryless strategy that ensures cycles of positive energy: playing \((s_{1}, s_{1})\). Let \(\lambda _{1}^{rm} \in \varLambda ^{RM}_{1}\) be the strategy defined as follows: play \((s_{1}, s_{2})\) with probability \(\gamma \) and \((s_{1}, s_{1})\) with the remaining probability. This strategy is almost-surely winning for \(\phi \) for sufficiently small values of \(\gamma \) (e.g., \(\gamma = 1/9\)). Second, we extend this result to MP parity games using an induction on the number of priorities and the size of games (Lemma 20). We consider subgames that reduce to the MP Büchi and MP coBüchi cases. For MP coBüchi games, pure memoryless strategies are known to suffice [20].

Fig. 7 Mixing strategies that are resp. good for Büchi and good for energy

Büchi case A particular, simpler case of the parity objective is the Büchi objective. It corresponds to parity with priorities \(\lbrace 0, 1\rbrace \). We denote a Büchi game by \(G= (S_{1}, S_{2}, s_{init}, E, w, F)\), with \(F\) the set of Büchi states, s.t. a play is winning if it visits states of the set \(F\) infinitely often. We first state results on these Büchi objectives, as they are conceptually simpler to understand. Proof arguments for parity are more involved and make use of results on Büchi objectives. We sometimes denote the Büchi objective for the set \(F\) by \(\square \diamondsuit F\) (where \(\square \) stands for globally and \(\diamondsuit \) for finally), using the classical LTL formulation [36].

We first introduce the useful notion of \(\varepsilon \)-optimality. Given a game \(G_{p}\) with a one-dimension mean-payoff objective, we define its value as

$$\begin{aligned} \underline{\mathsf{val }} = \sup _{\lambda _{1} \in \varLambda _{1}} \inf _{\lambda _{2} \in \varLambda _{2}} \lbrace v \,\vert \, {\mathsf{Outcome }_{G_{p}}} (\lambda _{1}, \lambda _{2}) \subseteq {\mathsf{MeanPayoff }_{G_{p}}(v)} \rbrace . \end{aligned}$$

A strategy is said to be optimal for the mean-payoff objective if it achieves this value. Such a strategy need not exist in general, even in one-player games [11, 17, 20] (Fig. 8: \(\mathcal{P _{1}} \) has to delay its visits to \(s_{1}\) for longer and longer intervals in order to tend towards value 1). However, it is known that for all \(\varepsilon > 0\), \(\varepsilon \)-optimal strategies (i.e., strategies that achieve value \((\underline{\mathsf{val }} - \varepsilon )\)) always exist in one-dimension mean-payoff games, as a consequence of Martin’s theorem on Borel determinacy [34].

Fig. 8 Mean-payoff Büchi requires infinite memory for optimality

Here, we show that finite-memory strategies can be traded for randomized memoryless ones in mean-payoff Büchi games. Precisely, we prove that \(\varepsilon \)-optimality for mean-payoff Büchi games can as well be achieved by randomized memoryless strategies. We first need to state two useful lemmas granting the existence of pure memoryless strategies that are resp. good-for-energy and good-for-Büchi, in all states that are winning for the mean-payoff Büchi objective. These strategies will help us build the needed \(\varepsilon \)-optimal strategies.

Lemma 16

(Extension of [16, Lemma 4]) Let \(G= (S_{1}, S_{2}, s_{init}, E, w, F)\), with \(F\) the set of Büchi states. Let \(\mathsf{Win }\subseteq S\) be the set of winning states for the mean-payoff Büchi objective with threshold \(0\). For all \(s \in \mathsf{Win }, \mathcal{P _{1}} \) has a uniform (i.e., independent of the starting state) memoryless good-for-energy strategy \(\lambda _{1}^{{gfe}}\) whose outcome never leaves the set \(\mathsf{Win }\), s.t. any cycle \(c\) of this outcome has energy \(\mathsf{EL }(c) \ge 0\).

Lemma 17

(Classical attractor) Let \(G= (S_{1}, S_{2}, s_{init}, E, w, F)\), with \(F\) the set of Büchi states. Let \(\mathsf{Win }\subseteq S\) be the set of winning states for the mean-payoff Büchi objective with threshold \(0\). For all \(s \in \mathsf{Win }, \mathcal{P _{1}} \) has a uniform (i.e. independent of the starting state) memoryless good-for-Büchi strategy \(\lambda _{1}^{\diamondsuit F}\), an attractor strategy for \(F\), whose outcome never leaves the set \(\mathsf{Win }\), s.t. it ensures reaching \(F\) in at most \(\vert S\vert \) steps.

The randomized memoryless strategy of \(\mathcal{P _{1}} \) will thus consist of mixing these two strategies, with a very low probability on the good-for-Büchi strategy. Indeed, the Büchi objective will be satisfied whatever this probability is, provided it is strictly positive. On the other hand, by giving more weight to the good-for-energy strategy, \(\mathcal{P _{1}} \) can obtain a mean-payoff that is arbitrarily close to the optimum.

Lemma 18

In mean-payoff Büchi games, \(\varepsilon \)-optimality can be achieved surely by pure finite-memory strategies and almost-surely by randomized memoryless strategies.

Proof

Let \(G= (S_{1}, S_{2}, s_{init}, E, w, F)\), with \(F\) the set of Büchi states. We consider the mean-payoff objective with threshold \(0\) (w.l.o.g.). Let \(\mathsf{Win }\subseteq S\) be the set of winning states for the mean-payoff Büchi objective. By Lemmas 16 and 17, for all \(s \in \mathsf{Win }, \mathcal{P _{1}} \) has two uniform memoryless strategies \(\lambda _{1}^{{gfe}}\) and \(\lambda _{1}^{\diamondsuit F}\), whose outcomes never leave the set \(\mathsf{Win }\), s.t. \(\lambda _{1}^{{gfe}}\) ensures that any cycle \(c\) of its outcome has energy \(\mathsf{EL }(c) \ge 0\), and \(\lambda _{1}^{\diamondsuit F}\), an attractor strategy for \(F\), ensures reaching \(F\) in at most \(\vert S\vert \) steps.

We first build \(\varepsilon \)-optimal pure finite-memory strategies based on these two pure memoryless strategies. Let \(\varepsilon > 0\). As usual, \(W\) denotes the largest absolute weight on any edge. Let us define \(\lambda _{1}^{pf}\) s.t. (a) it plays \(\lambda _{1}^{{gfe}}\) for \(\frac{2 \cdot W\cdot \vert S\vert }{\varepsilon } - \vert S\vert \) steps, then (b) it plays \(\lambda _{1}^{\diamondsuit F}\) for \(\vert S\vert \) steps, then again (a), and so on. This ensures that \(F\) is visited infinitely often, as \(\lambda _{1}^{\diamondsuit F}\) is played infinitely many times for \(\vert S\vert \) steps in a row. Furthermore, the total cost of phases (a) + (b) is bounded from below by \(-2\cdot W\cdot \vert S\vert \), and thus the mean-payoff of the outcome is at least \(-\varepsilon \), against any strategy of the adversary.
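As an illustration (not part of the original proof), the following minimal sketch implements the alternating finite-memory controller just described. It assumes the two memoryless strategies are given as dictionaries mapping each \(\mathcal{P _{1}} \) state to its chosen successor, and, for simplicity, counts every invocation as one step of the play; all names are illustrative.

```python
def alternating_strategy(gfe, attr, n_states, W, eps):
    """Pure finite-memory strategy of the proof sketch: play the good-for-energy
    strategy for 2*W*|S|/eps - |S| steps (phase a), then the attractor strategy
    for |S| steps (phase b), and repeat.  `gfe` and `attr` are dicts mapping each
    P1 state to the successor chosen by the corresponding memoryless strategy;
    the only memory is a step counter modulo the period."""
    len_a = int(2 * W * n_states / eps) - n_states   # length of phase (a)
    period = len_a + n_states                        # phase (a) + phase (b)
    step = 0

    def strategy(state):
        nonlocal step
        choice = gfe[state] if step % period < len_a else attr[state]
        step += 1
        return choice

    return strategy
```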

Second, we show that based on the same pure memoryless strategies, it is possible to obtain almost-surely \(\varepsilon \)-optimal randomized memoryless strategies, i.e.,

$$\begin{aligned}&\forall \, \varepsilon > 0,\; \exists \, \lambda _{1}^{rm} \in \varLambda ^{RM}_{1},\; \forall \, \lambda _{2} \in \varLambda _{2},\\&\mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}}_{s_{init}} \left( \pi \vDash \square \diamondsuit F\right) = 1 \;\wedge \; \mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}}_{s_{init}} \left( \mathsf{MP }(\pi ) \ge - \varepsilon \right) = 1. \end{aligned}$$

Note that pure memoryless strategies suffice for \(\mathcal{P _{2}} \) as he essentially has to win against the Büchi or the mean-payoff criterion [11]. Therefore, given \(\varepsilon > 0\), we need to build some strategy \(\lambda _{1}^{rm} \in \varLambda ^{RM}_{1}\) s.t.

$$\begin{aligned} \forall \, \lambda _{2}^{pm} \in \varLambda ^{PM}_{2},\;\mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}^{pm}}_{s_{init}} \left( \pi \vDash \square \diamondsuit F\right) = 1 \;\wedge \; \mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}^{pm}}_{s_{init}} \left( \mathsf{MP }(\pi ) \ge - \varepsilon \right) = 1. \end{aligned}$$

We build such a strategy as follows:

$$\begin{aligned} \forall s \in S,\, \lambda _{1}^{rm}(s) = {\left\{ \begin{array}{ll}\lambda _{1}^{{gfe}}(s)\quad \text { with probability } 1 - \gamma ,\\ \lambda _{1}^{\diamondsuit F}(s)\quad \text { with probability } \gamma ,\end{array}\right. } \end{aligned}$$

for some well-chosen \(\gamma \in \left] 0, 1\right[ \).
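A direct sketch of this mixed strategy, again treating the two memoryless strategies as successor maps (an illustrative encoding, not the paper's formalism):

```python
import random

def mixed_strategy(gfe, attr, gamma):
    """Randomized memoryless strategy: in every P1 state, follow the attractor
    strategy with probability gamma and the good-for-energy strategy otherwise."""
    def strategy(state):
        return attr[state] if random.random() < gamma else gfe[state]
    return strategy
```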

It is straightforward to see that the Büchi objective is almost-surely satisfied for all values of \(\gamma > 0\): at all times, the probability of playing according to \(\lambda _{1}^{\diamondsuit F}\) for \(\vert S\vert \) steps in a row, and thus ensuring a visit of \(F\), is \(\gamma ^{\vert S\vert }\), which is strictly positive; hence, with probability one, such blocks occur infinitely often along the play and \(F\) is visited infinitely often.

It remains to show that it is always possible to choose a constant \(\gamma \) s.t. the \(\mathsf{MeanPayoff }_{G_{p}}(-\varepsilon )\) objective is almost-surely satisfied. Consider such a strategy \(\lambda _{1}^{rm} \in \varLambda ^{RM}_{1}\) and some fixed strategy \(\lambda _{2}^{pm} \in \varLambda ^{PM}_{2}\) of \(\mathcal{P _{2}} \): the game reduces to a Markov chain \(\mathcal M _{c}= \left( S, \delta , w\right) \), where \(\delta :E\rightarrow \left[ 0, 1\right] \) is the transition probability function resulting from fixing those strategies. Suppose \(\lambda _{2}^{pm}\) is winning for \(\mathcal{P _{2}} \). Then \(\mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}^{pm}}_{s_{init}} \left( \mathsf{MP }(\pi ) < - \varepsilon \right) > 0\) (since the Büchi objective is almost-surely satisfied). The mean-payoff depends on the limit behavior: the probability measure of plays that do not enter a bottom strongly connected component (BSCC) is zero [4], whereas in a BSCC of expected mean-payoff \(v\), mean-payoff \(v\) is obtained with probability one. This implies that there exists some BSCC \(\mathcal C \) in \(\mathcal M _{c}\) s.t. \(\mathbb P _\mathcal{M _{c}} \left( \diamondsuit \mathcal C \right) > 0\) and \(\mathbb E _\mathcal{C } \left( \mathsf{MP }(\pi )\right) < -\varepsilon \). We claim that it is possible to choose \(\gamma \) s.t. all BSCCs, in all Markov chains induced by pure memoryless strategies of \(\mathcal{P _{2}} \), have expected mean-payoff greater than or equal to \(-\varepsilon \), thus proving that strategy \(\lambda _{1}^{rm}\) is almost-surely \(\varepsilon \)-optimal with regard to the mean-payoff. Intuitively, the smaller the constant \(\gamma \) is chosen, the closer the expected mean-payoff induced by \(\lambda _{1}^{rm}\) is to the one induced by \(\lambda _{1}^{{gfe}}\), which is at least zero. Since the number of pure memoryless strategies of \(\mathcal{P _{2}} \) is finite, and so is the number of BSCCs induced by \(\lambda _{1}^{rm}\) (regardless of the exact value of \(\gamma \in \left] 0, 1\right[\), we obtain the same BSCCs in terms of states and edges), one can compute a suitable \(\gamma \) for each of them, and then take the minimum to ensure that the needed property is satisfied in all possible cases.

Therefore, let us fix some strategy \(\lambda _{2}^{pm}\) of \(\mathcal{P _{2}} \), and some BSCC \(\mathcal C \) of the Markov chain induced when \(\lambda _{2}^{pm}\) is played against strategy \(\lambda _{1}^{rm}\) of \(\mathcal{P _{1}} \). To conclude the proof, it remains to show that there exists \(\gamma ^{*} \in \left] 0, 1\right[\) s.t. for all \(\gamma \le \gamma ^{*}\), we have \(\mathbb E _\mathcal{C }(\mathsf{MP }(\pi )) \ge -\varepsilon \). In \(\mathcal C \), every \(\mathcal{P _{1}} \) state bears two outgoing edges, one from \(\lambda _{1}^{{gfe}}\) and one from \(\lambda _{1}^{\diamondsuit F}\) (we suppose w.l.o.g. that they are distinct), with respective probabilities \(1-\gamma \) and \(\gamma \). Consider the stochastic process \(\mathcal M _{e}\) depicting the alternation between sequences of edges from \(\lambda _{1}^{{gfe}}\) and \(\lambda _{1}^{\diamondsuit F}\) (Fig. 9).

Fig. 9
figure 9

Stochastic process depicting alternation between sequences of edges from \(\lambda _{1}^{{gfe}}\) and \(\lambda _{1}^{\diamondsuit F}\)

By definition of \(\lambda _{1}^{{gfe}}\), a sequence of \(k\) \({gfe}\) edges has its energy bounded below by \(-W\cdot \vert S\vert \) (i.e., the bound does not depend on \(k\)). Indeed, recall that all cycles have nonnegative energy. Thus, the energy level of such a sequence is a sum of nonnegative terms (cycles), plus a sum of at most \(\vert S\vert \) terms bounded from below by \(-W\), as having more than \(\vert S\vert \) edges produces cycles. Moreover, each \(\diamondsuit F\) edge has energy bounded below by \(-W\). Thus the overall mean-payoff of a play that consists of repeated sequences of \(k\) \({gfe}\) edges followed by one \(\diamondsuit F\) edge is at least \(\frac{-W\cdot (\vert S\vert + 1)}{k+1}\). By putting more probability on lengthy sequences of \({gfe}\) edges, we are thus able to obtain an overall expected mean-payoff that is closer to zero, and in particular, greater than or equal to \(-\varepsilon \). Indeed, we decompose the overall expected mean-payoff according to the length of the \({gfe}\) sequence before a \(\diamondsuit F\) edge is seen. Let \(\mathsf{seq }_{a}^{b}\) denote a sequence of \(b\) edges of type \(a\). We have:

$$\begin{aligned} \mathbb E _\mathcal{C }(\mathsf{MP }(\pi ))&= \sum _{k = 0}^{\infty } \mathbb P (\mathsf{seq }_{{gfe}}^{k}\mathsf{seq }_{\diamondsuit F}^{1}) \cdot \mathbb E (\mathsf{MP }\,\vert \, \mathsf{seq }_{{gfe}}^{k}\mathsf{seq }_{\diamondsuit F}^{1}),\\&= \sum _{k = 0}^{\infty } (1 - \gamma )^{k}\gamma \cdot \mathbb E (\mathsf{MP }\,\vert \, \mathsf{seq }_{{gfe}}^{k}\mathsf{seq }_{\diamondsuit F}^{1}). \end{aligned}$$

Now we divide this sum into two parts, according to some value \(k^{*}\) s.t. for all \(k \ge k^{*}\), we have \(\mathbb E (\mathsf{MP }\,\vert \, \mathsf{seq }_{{gfe}}^{k}\mathsf{seq }_{\diamondsuit F}^{1}) \ge -\frac{W\cdot (\vert S\vert + 1)}{k + 1} \ge -\frac{W\cdot (\vert S\vert + 1)}{k^{*} + 1} = -\eta > -\varepsilon \). It suffices to take \(k^{*} = \frac{W\cdot (\vert S\vert + 1)}{\varepsilon }\) to achieve this. Notice also that the mean-payoff of a play is trivially bounded below by \(-W\), where \(W\) is the largest absolute weight on any edge. We obtain:

$$\begin{aligned} \mathbb E _\mathcal{C }(\mathsf{MP }(\pi ))&\ge \sum _{k = 0}^{k^{*} - 1} (1 - \gamma )^{k}\gamma \cdot (-W) + \sum _{\,k = k^{*}}^{\infty } (1 - \gamma )^{k}\gamma \cdot (-\eta ),\\&\ge k^{*}\gamma \cdot (-W) + (1 - k^{*}\gamma )\cdot (-\eta ). \end{aligned}$$

Thus, one can achieve \(\mathbb E _\mathcal{C }(\mathsf{MP }(\pi )) \ge -\varepsilon \) by choosing any \(\gamma \le \gamma ^{*} = \frac{-\varepsilon + \eta }{k^{*}(\eta - W)}\). Notice that such a \(\gamma ^{*}\) indeed lies in \(\left] 0, 1\right[\) for sufficiently small values of \(\varepsilon \), independently of the values of \(\vert S\vert \) and \(W\). As we are interested in \(\varepsilon \) arbitrarily close to zero, this concludes our proof.\(\square \)
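The constants of this proof are easy to check numerically. The following sketch recomputes \(k^{*}\), \(\eta \) and \(\gamma ^{*}\) for illustrative parameters (the chosen \(W\), \(\vert S\vert \) and \(\varepsilon \) are arbitrary, not taken from any particular game) and verifies that both the exact geometric-sum bound and the linearized bound used above stay at or above \(-\varepsilon \).

```python
from fractions import Fraction
from math import ceil

def check_gamma_bound(W, n_states, eps):
    """Recompute k*, eta and gamma* as in the proof, then evaluate the two
    lower bounds on E_C(MP) at gamma = gamma*."""
    k_star = ceil(Fraction(W * (n_states + 1)) / eps)   # k* = W(|S|+1)/eps
    eta = Fraction(W * (n_states + 1), k_star + 1)      # eta = W(|S|+1)/(k*+1) < eps
    gamma_star = (eps - eta) / (k_star * (W - eta))     # = (-eps+eta)/(k*(eta-W))

    g = gamma_star
    # Exact value of the two geometric sums:
    #   sum_{k<k*} (1-g)^k g (-W) + sum_{k>=k*} (1-g)^k g (-eta)
    exact = -(1 - (1 - g) ** k_star) * W - (1 - g) ** k_star * eta
    # Linearized bound used in the text: k* g (-W) + (1 - k* g)(-eta)
    linear = k_star * g * (-W) + (1 - k_star * g) * (-eta)
    return k_star, eta, gamma_star, exact, linear

k, eta, g, exact, linear = check_gamma_bound(W=2, n_states=3, eps=Fraction(1, 2))
assert 0 < g < 1 and eta < Fraction(1, 2)
assert exact >= linear >= -Fraction(1, 2)   # both bounds are >= -eps
print(k, eta, g)                            # 16 8/17 1/832
```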

Parity case Given these results for mean-payoff Büchi games, we now consider the more general case of mean-payoff parity games. We start by introducing the useful notion of subgames.

Subgame. Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a game and \(A \subseteq S\) be a subset of states in \(G_{p}\). If \(E\) is such that for all \(s \in A\), there exists \(s^{\prime } \in A\) with \((s,s^{\prime }) \in E\), then we define the subgame \(G_{p}\downarrow A\) as \((S_{1} \cap A, S_{2} \cap A, E\cap (A \times A), w^{\prime }, p^{\prime })\) where \(w^{\prime }, p^{\prime }\) are the functions \(w, p\) restricted to the subdomain \(A\). Note that for subgames, we do not consider an initial state.

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) and \(U \subseteq S\). We define \(\mathsf{Attr}_1(U)\) as the set obtained as the limit of the following increasing sequence: \(U_0=U\), and \(U_i=U_{i-1} \cup \{ s \in S_{1} \mid \exists \, s^{\prime } \in U_{i-1},\, (s, s^{\prime }) \in E\} \cup \{ s \in S_{2} \mid \forall \, s^{\prime } \text{ s.t. } (s, s^{\prime }) \in E,\, s^{\prime } \in U_{i-1} \}\) for \(i \ge 1\). As this sequence of sets is increasing, there exists \(i \le \vert S\vert \) such that \(U_j=U_i\) for all \(j \ge i\). \(\mathsf{Attr}_1(U)\) contains all the states in \(G\) from which \(\mathcal{P _{1}} \) can force a visit to \(U\), and it is well known that \(\mathcal{P _{1}} \) has a pure memoryless strategy to force such a visit from those states. Also, \(\mathcal{P _{1}} \) has no strategy to force the play out of the set \(S\setminus \mathsf{Attr}_1(U)\). Attractors can be defined symmetrically for \(\mathcal{P _{2}} \) and are denoted \(\mathsf{Attr}_2(\cdot )\). As a direct consequence, we have the following proposition.
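As an aside, the attractor fixpoint described above is straightforward to compute. The sketch below assumes a game given as two sets of states and a set of edges (an illustrative encoding); \(\mathsf{Attr}_2\) is obtained by swapping the roles of the two state sets.

```python
def attractor(p1_states, p2_states, edges, target):
    """Compute Attr_1(target) as the limit of the increasing sequence U_0, U_1, ...
    A P1 state is added as soon as SOME successor is already in the set; a P2
    state is added only when ALL of its successors are."""
    succ = {}
    for s, t in edges:
        succ.setdefault(s, set()).add(t)

    attr = set(target)
    changed = True
    while changed:
        changed = False
        for s in p1_states | p2_states:
            if s in attr:
                continue
            successors = succ.get(s, set())
            if s in p1_states and successors & attr:
                attr.add(s)
                changed = True
            elif s in p2_states and successors and successors <= attr:
                attr.add(s)
                changed = True
    return attr

# Tiny example: P1 state 'a' can force reaching {'t'}; P2 state 'b' cannot avoid it.
print(attractor({'a'}, {'b'}, {('a', 't'), ('a', 'b'), ('b', 't')}, {'t'}))  # all three states
```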

Proposition 1

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a game, let \(U \subseteq S\) and \(\mathsf{Attr}_1(U)\) be such that \(B = S{\setminus } \mathsf{Attr}_1(U)\) is non-empty, then \(G_{p}\downarrow B\) is a subgame.

The following lemma states that optimal pure memoryless strategies exist for \(\mathcal{P _{1}} \) in games with mean-payoff coBüchi objectives (i.e., parity with priorities \(\lbrace 1, 2\rbrace \)). For mean-payoff Büchi objectives, we showed in Lemma 18 that, for all \(\varepsilon > 0, \varepsilon \)-optimal randomized memoryless strategies exist.

Lemma 19

[20, Theorem 5] Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) be a game with priorities \(\lbrace 1, 2\rbrace \), and \(\textsc {Win}^p_{\ge 0}\) be the set of nodes in \(G_{p}\) from which \(\mathcal{P _{1}} \) wins the mean-payoff coBüchi objective for threshold \(0\) (w.l.o.g.). Then from all states in \(\textsc {Win}^p_{\ge 0}, \mathcal{P _{1}} \) has a pure memoryless winning strategy for the coBüchi mean-payoff objective for threshold \(0\).

We now establish that \(\varepsilon \)-optimal randomized memoryless strategies also exist for mean-payoff parity games, and thus, can replace pure finite-memory ones.

Lemma 20

Let \({G_{p} = \left( S_{1}, S_{2}, s_{init}, E, k, w, p\right) } \) and \(\textsc {Win}^p_{\ge 0}\) be the set of nodes in \(G_{p}\) from which \(\mathcal{P _{1}} \) wins the mean-payoff parity objective for threshold \(0\). Then for all \(\varepsilon >0\), there exists \(\lambda _1^{rm} \in \varLambda ^{RM}_{1}\), s.t. for all \(s \in \textsc {Win}^p_{\ge 0}\) and for all \(\lambda _2 \in \varLambda _2\), we have that:

$$\begin{aligned} \mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}}_{s} \left( \mathsf{MP }(\pi ) \ge - \varepsilon \right) = 1 \;\wedge \; \mathbb P ^{\lambda _{1}^{rm}, \lambda _{2}}_{s} \left( \mathsf{Par }(\pi ) \text { mod } 2 = 0 \right) = 1. \end{aligned}$$

Proof

The proof is by induction on the lexicographic order \(\preceq \) on games, defined as follows: \(G_{p}^1 \preceq G_{p}^2\) if \(G_{p}^1\) has fewer priorities than \(G_{p}^2\), or \(G_{p}^1\) has the same number of priorities as \(G_{p}^2\) but fewer states. Clearly, this lexicographic order is well-founded.

There are two base cases: one for the number of states, and one for the number of priorities. First, if the game is such that \(\vert S\vert = 1\), then obviously, if \(\mathcal{P _{1}} \) can win, he can do so with a pure memoryless strategy, which satisfies the claim. Second, consider games with two priorities. W.l.o.g., we can assume that all priorities are either in \(\{0,1\}\) or in \(\{1,2\}\). These cases respectively correspond to mean-payoff Büchi and mean-payoff coBüchi games. The result for mean-payoff Büchi games has been established in Lemma 18, while the result for mean-payoff coBüchi games is a direct consequence of Lemma 19, as pure memoryless strategies are a special case of randomized memoryless strategies.

Let us now consider the inductive case. Suppose we have a mean-payoff parity game \(G_{p}\) with \(m\) priorities and \(\vert S\vert \) states. W.l.o.g., we can assume that the lowest priority in \(G_{p}\) is either \(0\) or \(1\); otherwise we subtract an even number from all priorities so that this is the case. Let \(U_0=\{ s \in \textsc {Win}^p_{\ge 0} \mid p(s)=0 \}\) and \(U_1=\{ s \in \textsc {Win}^p_{\ge 0} \mid p(s)=1 \}\). We consider the two possible situations, corresponding to \(U_0\) being empty or not.

  1.

    \(U_0\) empty. In that case \(U_1\) is not empty. Let us consider \(A_2=\mathsf{Attr}_2(U_1)\), the attractor of \(\mathcal{P _{2}} \) for \(U_1\). It must be the case that \(\textsc {Win}^p_{\ge 0} {\setminus } A_2\) is non-empty, otherwise this would contradict the fact that \(\mathcal{P _{1}} \) wins the parity objective from the states in \(\textsc {Win}^p_{\ge 0}\). Indeed, if it were not the case, then \(\mathcal{P _{2}} \) would be able to force an infinite number of visits to \(U_1\) from all states in \(\textsc {Win}^p_{\ge 0}\), and the parity would be odd as \(U_{0}\) is empty, contradicting the definition of \(\textsc {Win}^p_{\ge 0}\). (i) Let \(B= \textsc {Win}^p_{\ge 0} {\setminus } A_2\). First note that, as \(B\) is non-empty, by Proposition 1, \(G_{p}\downarrow B\) is a subgame. Also, note that from all states in \(B\), \(\mathcal{P _{1}} \) must have a winning strategy that does not require visiting states outside \(B\), i.e., states in \(A_2\), for otherwise this would contradict the fact that \(\mathcal{P _{1}} \) wins the parity objective in \(\textsc {Win}^p_{\ge 0}\). So all states in the subgame \(G_{p}\downarrow B\) are winning for \(\mathcal{P _{1}} \). The game \(G_{p}\downarrow B\) does not contain states with priority \(0\), hence \((G_{p}\downarrow B) \preceq G_{p}\) as it has one priority fewer, and we can apply our induction hypothesis to conclude that \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all states in \(B\). (ii) Now, let us concentrate on the states in \(A_2\). Let \(A_1=\mathsf{Attr}_1(B)\). From states in \(A_1\), \(\mathcal{P _{1}} \) has a pure memoryless strategy to reach states in \(B\), from where \(\mathcal{P _{1}} \) can play as in \(G_{p}\downarrow B\), and we are done. Let \(C=A_2 \setminus A_1\). If \(C\) is empty, we are done. Otherwise, by Proposition 1, \(G_{p}\downarrow C\) is a subgame (\(\mathcal{P _{2}} \) can force the play to stay within \(C\)). We conclude that all states in this game must be winning for \(\mathcal{P _{1}} \). This game has the same minimal priority as the original game (i.e., priority \(1\)) but it has at least one state fewer, so we can apply our induction hypothesis to conclude that \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all states in \(C\). Therefore, by (i) and (ii), \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all states in \(\textsc {Win}^p_{\ge 0}\), which proves the claim in that case.

  2.

    \(U_0\) is not empty. Let us consider \(A_1=\mathsf{Attr}_1(U_0)\). (iii) First, consider the case where \(A_1=\textsc {Win}^p_{\ge 0}\). In this case, \(\mathcal{P _{1}} \) can force a visit to states in \(U_0\) from any state in \(\textsc {Win}^p_{\ge 0}\). So, we conclude that \(\mathcal{P _{1}} \) wins in \(G_{p}\) the mean-payoff Büchi game with threshold \(0\), and by Lemma 18, we conclude that \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all these states for almost-surely winning the parity game with mean-payoff threshold \(0\), so we are done. (iv) Second, consider the case where \(B=\textsc {Win}^p_{\ge 0} \setminus A_1\) is non-empty. Then, by Proposition 1, \(G_{p}\downarrow B\) is a subgame. So \(\mathcal{P _{2}} \) can force the play to stay within \(B\) in the original game, and we conclude that all states in the game \(G_{p}\downarrow B\) are winning for \(\mathcal{P _{1}} \). As \(G_{p}\downarrow B\) does not contain states of priority \(0\), and thus has at least one priority fewer, we can apply the induction hypothesis to conclude that \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all states in \(B\). Therefore, by (iii) and (iv), \(\mathcal{P _{1}} \) has a randomized memoryless strategy from all states in \(\textsc {Win}^p_{\ge 0}\), which also proves this case.

As we have proved the claim in both possible cases, this concludes the proof.\(\square \)

5.4 Summary for randomization

We sum up results for these different classes of games in Theorem 3 (cf. Table 1).

Theorem 3

(Trading finite memory for randomness) The following assertions hold: (1) Randomized strategies are exactly as powerful as pure strategies for energy objectives. Randomized memoryless strategies are not as powerful as pure finite-memory strategies for almost-sure winning in one-player and two-player energy, multi energy, energy parity and multi energy parity games. (2) Randomized memoryless strategies are not as powerful as pure finite-memory strategies for almost-sure winning in two-player multi mean-payoff games. (3) In one-player multi mean-payoff parity games, and two-player single mean-payoff parity games, if there exists a pure finite-memory sure winning strategy, then there exists a randomized memoryless almost-sure winning strategy.

Proof

(1) For energy games, the results follow from Lemma 13. (2) For two-player multi mean-payoff games, they follow from Lemma 15. (3) For one-player multi mean-payoff games, they follow from Lemma 15. For two-player single mean-payoff parity games, they are a direct consequence of Lemma 20.\(\square \)

We close this section by observing that there are even more powerful classes of strategies. Their study, as well as their practical interest, remains open.

Lemma 21

Randomized finite-memory strategies are strictly more powerful than both randomized memoryless and pure finite-memory strategies for multi mean-payoff games with expectation semantics, even in the one-player case.

The intuition is essentially that memory makes it possible to achieve an exact payoff by sticking to a given side, while randomization makes it possible to combine the payoffs of pure strategies and achieve any linear combination in between.

Proof

Consider the game \(G\) depicted in Fig. 10. Whatever the pure finite-memory strategy of \(\mathcal{P _{1}} \), the only achievable mean-payoff values are \((1,-1)\) (if \((s_{0}, s_{1})\) is never taken) and \((-1, 1)\) (if \((s_{0}, s_{1})\) is taken). This is also true for randomized memoryless strategies: either the probability of \((s_{0}, s_{1})\) is null and the mean-payoff has value \((1,-1)\), or this probability is strictly positive, and the mean-payoff has value \((-1,1)\) as the probability mass eventually reaches \(s_{1}\). On the contrary, the value \((0, 0)\) is achievable by a randomized finite-memory strategy. Indeed, consider the strategy that tosses a fair coin at its first visit of \(s_{0}\) to decide whether it will always play \((s_{0}, s_{0})\), or play \((s_{0}, s_{1})\) once and then always \((s_{1}, s_{1})\). This strategy only needs one bit of memory and one bit to encode probabilities, and still, it is strictly more powerful than any amount of pure memory or any arbitrarily high precision for probabilities without memory.\(\square \)

Fig. 10
figure 10

Randomized finite memory is strictly more powerful than randomized memorylessness and pure finite memory
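The expectation argument can be seen at a glance. The sketch below only uses the limit mean-payoff vectors stated in the proof, \((1,-1)\) for staying in \(s_{0}\) forever and \((-1,1)\) for moving to \(s_{1}\); the fair coin of the randomized finite-memory strategy mixes them into \((0,0)\).

```python
from fractions import Fraction

# Limit mean-payoff vectors of the two pure behaviours in Fig. 10, as stated in the proof.
MP_STAY = (Fraction(1), Fraction(-1))    # never take (s0, s1)
MP_MOVE = (Fraction(-1), Fraction(1))    # take (s0, s1) and stay in s1

# Randomized finite-memory strategy: one fair coin toss at the first visit of s0.
p = Fraction(1, 2)
expected = tuple(p * a + (1 - p) * b for a, b in zip(MP_STAY, MP_MOVE))
print(expected)   # (0, 0): out of reach of pure finite-memory and randomized memoryless strategies
```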

6 Conclusion

In this work, we considered the finite-memory strategy synthesis problem for games with multiple quantitative (energy and mean-payoff) objectives along with a parity objective. We established tight (matching upper and lower) exponential bounds on the memory requirements for such strategies (Theorem 1), significantly improving the previous triple exponential bound for multi energy games (without parity) that could be derived from results in literature for games on VASS. We presented an optimal symbolic and incremental strategy synthesis algorithm (Theorem 2). As discussed in Sect. 4, the presented algorithm has been used as part of the synthesis tool Acacia+ for specifications combining LTL properties and multi-dimensional quantitative objectives [8] and has proved efficient in practice. Finally, we also presented a precise characterization of the trade-off of memory for randomness in strategies (Theorem 3).