1 Introduction

Quantitative games

Game-theoretic formulations are a standard tool for the synthesis of provably-correct controllers for reactive systems [24]. We consider two-player (system vs. environment) turn-based games played on finite graphs. Vertices of the graph are called states and partitioned into states of player 1 and states of player 2. The game is played by moving a pebble from state to state, along edges in the graph, and starting from a given initial state. Whenever the pebble is on a state belonging to player i, player i decides where to move the pebble next, according to his strategy. The infinite path followed by the pebble is called a play: it represents one possible behavior of the system. A winning objective encodes acceptable behaviors of the system and can be seen as a set of winning plays. The goal of player 1 is to ensure that the outcome of the game will be a winning play, whatever the strategy played by his adversary.

To reason about resource constraints and the performance of strategies, quantitative games have been considered in the literature. See for example [3, 11, 33], or [34] for an overview. Those games are played on weighted graphs, where edges are fitted with integer weights modeling rewards or costs. The performance of a play is evaluated via a payoff function that maps it to the numerical domain. The objective of player 1 is then to ensure a sufficient payoff with regard to a given threshold value. Seminal classes of quantitative games include mean-payoff (\({ MP}\)), total-payoff (\({ TP}\)) and energy games (\({ EG}\)). In \({ MP}\) games [17, 26, 38], player 1 has to optimize his long-run average gain per edge taken whereas, in \({ TP}\) games [21, 22], player 1 has to optimize his long-run sum of weights. Energy games [6, 11, 25] model safety-like properties: the goal is to ensure that the running sum of weights never drops below zero and/or that it never exceeds a given upper bound \(U \in \mathbb {N}\). All three classes share common properties. First, \({ MP}\) games, \({ TP}\) games, and \({ EG}\) games with only a lower bound (\({ EG}_L\)) are memoryless determined (given an initial state, either player 1 has a strategy to win, or player 2 has one, and in both cases no memory is required to win). Second, deciding the winner for those games is in \(\mathsf{NP}\cap \mathsf{coNP}\) and no polynomial algorithm is known despite many efforts (e.g., [9, 16]). Energy games with both lower and upper bounds (\({ EG}_{{ LU}}\)) are more complex: they are EXPTIME-complete and winning requires memory in general [6].

While those classes are well-known, it is sometimes necessary to go beyond them to accurately model practical applications. For example, multi-dimensional games and conjunctions with a parity objective model trade-offs between different quantitative aspects [12, 15, 37]. Similarly, window objectives address the need for strategies ensuring good quantitative behaviors within reasonable time frames [16].

Average-energy games

We study the average-energy (\( AE \)) payoff function: in \( AE \) games, the goal of player 1 is to optimize the long-run average accumulated energy over a play. We introduce this objective to formalize the specification desired in a practical application [10], which we detail in the following as a motivating example. Interestingly, it turns out that this payoff first appeared long ago [36], but it was not subject to a systematic study until very recently: see related work for more discussion.

In addition to being meaningful w.r.t. practical applications, \( AE \) games also have theoretical interest. In [14], Chatterjee and Prabhu define the average debit-sum level objective, which can be seen as a variation of the average-energy where the accumulated energy is taken to be zero in any point where it is actually positive (hence, it focuses on the average debt). They use the corresponding games to compute the values of quantitative timed simulation functions. In particular, they provide a pseudo-polynomial-time algorithm to solve those games, but the complexity of deciding the winner as well as the memory requirements are open. Here, we solve those questions for the very similar average-energy objective.

Motivating example

Our example is a simplified version of the industrial application studied by Cassez et al. in [10]. Consider a machine that consumes oil, stored in a connected accumulator. We want to synthesize an appropriate controller for the oil pump that fills the accumulator; under the effect of pressure, the accumulator releases oil into the machine at a (time-varying) rate determined by the desired production. In order to ensure safety, the oil level in the accumulator should be maintained at all times between a minimal and a maximal level. This part of the specification can be encoded as an energy objective with both lower and upper bounds (\({ EG}_{{ LU}}\)). At the same time, the more oil (thus pressure) in the accumulator, the faster the whole apparatus wears out. Hence, an ideal controller should minimize the average level of oil in the long run. This requirement can be formalized through the average-energy payoff (\( AE \)). Overall, the specification is thus to minimize the average-energy under the strong energy constraints: we denote the corresponding objective by \({ AE}_{{ LU}}\).

Table 1 Complexity of deciding the winner and memory requirements for quantitative games: \( MP \) stands for mean-payoff, \( TP \) for total-payoff, \({ EG}_L\) (resp. \({ EG}_{{ LU}}\)) for lower-bounded (resp. lower- and upper-bounded) energy, \( AE \) for average-energy, \({ AE}_{L}\) (resp. \({ AE}_{{ LU}}\)) for average-energy under a lower bound (resp. and upper bound \(U \in \mathbb {N}\)) on the energy, c. for complete, e. for easy, and h. for hard

Contributions

Our main results are summarized in Table 1.

  (A)

    We establish that the average-energy objective can be seen as a refinement of total-payoff, in the same sense that total-payoff is seen as a refinement of mean-payoff [21]: it allows one to distinguish between strategies yielding identical mean-payoff and total-payoff.

  (B)

    We show that deciding the winner in two-player \( AE \) games is in \(\mathsf{NP}\cap \mathsf{coNP}\) whereas it is in P for one-player games. In both cases, memoryless strategies suffice (Theorem 8). Those complexity bounds match the state-of-the-art for \( MP \) and \( TP \) games [9, 21, 26, 38]. Furthermore, we prove that \( AE \) games are at least as hard as mean-payoff games (Theorem 10). Therefore, the \(\mathsf{NP}\cap \mathsf{coNP}\)-membership can be considered optimal w.r.t. our knowledge of \( MP \) games. Technically, the crux of our approach is as follows. First, we show that memoryless strategies suffice in one-player \( AE \) games (Theorem 6): this requires proving important properties of the \( AE \) payoff, as classical sufficient criteria for memoryless determinacy present in the literature fail to apply directly. Second, we establish a polynomial-time algorithm for the one-player case: it exploits the structure of winning strategies and mixes graph techniques with local linear program solving (Theorem 7). Finally, we lift memoryless determinacy to the two-player case using results by Gimbert and Zielonka [23] and obtain the \(\mathsf{NP}\cap \mathsf{coNP}\)-membership as a corollary (Theorem 9).

  (C)

    We establish an EXPTIME algorithm to solve two-player \( AE \) games with lower- and upper-bounded energy (\({ AE}_{{ LU}}\)) with an arbitrary upper bound \(U \in \mathbb {N}\) (Theorem 13). It relies on a reduction of the \({ AE}_{{ LU}}\) game to a pseudo-polynomially larger \( AE \) game where the energy constraints are encoded in the graph structure. Applying straightforwardly the \( AE \) algorithm on this game would only give us \(\mathsf{NEXPTIME}\cap \mathsf{coNEXPTIME}\)-membership, hence we avoid this blowup by further reducing the problem to a particular \( MP \) game and applying a pseudo-polynomial algorithm, with some care to ensure that overall the algorithm only requires pseudo-polynomial time in the original \({ AE}_{{ LU}}\) game. Since the simpler \({ EG}_{{ LU}}\) games (i.e., \({ AE}_{{ LU}}\) with a trivial \( AE \) constraint) are already EXPTIME-hard [6], the \(\mathsf{EXPTIME}\)-membership result is optimal. We also prove that pseudo-polynomial memory is both sufficient and in general necessary to win in \({ AE}_{{ LU}}\) games, for both players (Theorem 14). We show that one-player \({ AE}_{{ LU}}\) games are \(\mathsf{PSPACE}\)-complete via the on-the-fly construction of a witness path based on the aforementioned reduction, answering a question left open in [7]. For polynomial (in the size of the game graph) values of the upper bound U—or if it is given in unary—the complexity of the two-player (resp. one-player) \({ AE}_{{ LU}}\) problem collapses to \(\mathsf{NP}\cap \mathsf{coNP}\) (resp. P) with the same approach, and polynomial memory suffices for both players.

  (D)

    We provide partial answers for the \({ AE}_{L}\) objective—\( AE \) under a lower bound constraint on energy but no upper bound. We show PSPACE-membership for the one-player case (Theorem 17), by reducing the problem to an \({ AE}_{{ LU}}\) game with a sufficiently large upper bound. That is, we prove that if the player can win for the \({ AE}_{L}\) objective, then he can do so without ever increasing its energy above a well-chosen bound. We also prove the \({ AE}_{L}\) problem to be at least NP-hard in one-player games (Theorem 17) and EXPTIME-hard in two-player games (Lemma 20) via reductions from the subset-sum problem and countdown games respectively. Finally, we show that memory is required for both players in two-player \({ AE}_{L}\) games (Lemma 21), and that pseudo-polynomial memory is both sufficient and necessary in the one-player case (Theorem 18). The decidability status of two-player \({ AE}_{L}\) games remains open as we only provide a correct but incomplete incremental algorithm (Lemma 19). We conjecture that the two-player \({ AE}_{L}\) problem is decidable and sketch a potential approach to solve it. We highlight the key remaining questions and discuss some connections with related models that are known to be difficult.

Observe that in many applications, the energy must be stored in a finite-capacity device for which an upper bound is provided. Hence, the model of choice in this case is \({ AE}_{{ LU}}\).

Related work

This paper extends previous work presented at a conference [7]: it gives a full presentation of the technical details, along with additional results and improved complexity bounds.

The average-energy payoff—Eq. (1)—appeared in a paper by Thuijsman and Vrieze in the late eighties [36], under the name total-reward. This definition is different from the classical total-payoff—see Sect. 2—commonly studied in the formal methods community (see for example [21, 22]), which has nonetheless been referred to in many papers as either total-payoff or total-reward interchangeably. We will see that the two definitions are indeed different and exhibit different behaviors.

Maybe due to this confusion, the payoff of Eq. (1)—which we call average-energy to avoid misunderstandings—was not studied extensively until recently. Nothing was known about memoryless determinacy or the complexity of deciding the winner. Independently of our work, Boros et al. recently studied the same payoff (under the name total-payoff). In [5], they study Markov decision processes and stochastic games with the payoff of Eq. (1) and solve both questions. Their results overlap with ours for \( AE \) games (Table 1). Let us first mention that our results were obtained independently. Second, and most importantly, our approach and techniques are different, and we believe our take on the problem yields some interest for our community. Indeed, the algorithm of Boros et al. entirely relies on linear programming in the one-player case, and resorts to approximation by discounted games in the two-player one. Our techniques are arguably more constructive and based on inherent properties of the payoff. In that sense, our approach is closer to what is usually deemed important in our field. For example, we provide an extensive comparison with classical payoffs. We base our proof of memoryless determinacy on an operational understanding of the \( AE \), which is crucial in order to formalize proper specifications. Our technique then benefits from seminal works [23] to bypass the reduction to discounted games and obtain a direct proof, thanks to our more constructive approach. Lastly, while [5] considers the \({ AE}\) problem in the stochastic context, we focus on the deterministic one but consider multi-criteria extensions by adding bounds on the energy (\({ AE}_{{ LU}}\) and \({ AE}_{L}\) games). Those extensions are completely new, exhibit theoretical interest and are adequate for practical applications in constrained energy systems, as witnessed by the case study of [10].

Recent work by Brázdil et al. [8] considers the optimization of a payoff under an energy constraint. They study mean-payoff in consumption systems, i.e., simplified one-player energy games where all edges consume energy but some states allow the energy to be atomically reloaded up to the allowed capacity.

2 Preliminaries

Graph games

We consider turn-based games played on graphs between two players denoted by \(\mathcal {P}_{1} \) and \(\mathcal {P}_{2} \). A game is a tuple \(\textit{G}= (S_1, S_2, \textit{E}, \textit{w})\) where (i) \(S_1\) and \(S_2\) are disjoint finite sets of states belonging to \(\mathcal {P}_{1} \) and \(\mathcal {P}_{2} \), with \(S = S_1 \uplus S_2\), (ii) \(\textit{E}\subseteq S \times S\) is a finite set of edges such that for all \(s \in S\), there exists \(s' \in S\) such that \((s, s') \in \textit{E}\) (i.e., no deadlock), and (iii) \(\textit{w}:\textit{E}\rightarrow \mathbb {Z}\) is an integer weight function. Given edge \((s_{1}, s_{2}) \in \textit{E}\), we write \(\textit{w}(s_{1}, s_{2})\) as a shortcut for \(\textit{w}((s_{1}, s_{2}))\). We denote by \(W\) the largest absolute weight assigned by function \(\textit{w}\). A game is called 1-player if \(S_{1} = \emptyset \) or \(S_{2} = \emptyset \).
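
For illustration only, such a game can be encoded as a small Python structure; the class name Game and the dictionary-based weight function below are our own choices, not notation from the paper.

from dataclasses import dataclass

@dataclass
class Game:
    S1: set     # states of player 1
    S2: set     # states of player 2
    w: dict     # integer weight function on edges: (s, s') -> w(s, s')

    @property
    def E(self):
        # the edge set is the domain of the weight function
        return set(self.w)

    @property
    def W(self):
        # largest absolute weight assigned by w
        return max(abs(c) for c in self.w.values())

# A deadlock-free one-player game (S2 empty) with two states and weights +2 and -2.
G = Game(S1={"s", "t"}, S2=set(), w={("s", "t"): 2, ("t", "s"): -2})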

A play from an initial state \(s_{\mathsf{init}}\in S\) is an infinite sequence \(\pi = s_0 s_1 \ldots s_n \ldots \) such that \(s_0 = s_{\mathsf{init}}\) and for all \(i \ge 0\) we have \((s_i,s_{i+1}) \in \textit{E}\). The (finite) prefix of \(\pi \) up to position n is the sequence \(\pi (n) = s_0 s_1 \ldots s_n\); its first (resp. last) element \(s_0\) (resp. \(s_n\)) is denoted \( first (\pi (n))\) (resp. \( last (\pi (n))\)). The set of all plays in \(\textit{G}\) is denoted by \( Plays (\textit{G})\) and the set of all prefixes is denoted by \( Prefs (\textit{G})\). We say that a prefix \(\rho \in Prefs (\textit{G})\) belongs to \(\mathcal {P}_{i}, i \in \{1,2\}\), if \( last (\rho ) \in S_i\). The set of prefixes that belong to \(\mathcal {P}_{i} \) is denoted by \( Prefs _i (\textit{G})\). The classical concatenation between prefixes (resp. between a prefix and a play) is denoted by the \(\cdot \) operator. The length of a non-empty prefix \(\rho = s_{0}\ldots {}s_{n}\) is defined as its number of edges and denoted by \(\vert \rho \vert = n\).

Payoffs of plays Given a play \(\pi = s_0 s_1 \ldots s_n \ldots \) we define

  • its energy level at position n as

    $$\begin{aligned} EL (\pi (n)) = \sum _{i = 0}^{n-1} w(s_i,s_{i+1}); \end{aligned}$$
  • its mean-payoff as

    $$\begin{aligned} \overline{MP} (\pi ) = \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{i = 0}^{n-1} w(s_i,s_{i+1}) = \limsup _{n \rightarrow \infty } \frac{1}{n} EL (\pi (n)); \end{aligned}$$
  • its total-payoff as

    $$\begin{aligned} \overline{TP} (\pi ) = \limsup _{n \rightarrow \infty } \sum _{i = 0}^{n-1} w(s_i,s_{i+1}) = \limsup _{n \rightarrow \infty } EL (\pi (n)); \end{aligned}$$
  • and its average-energy as

    $$\begin{aligned} \overline{AE} (\pi ) = \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{i = 1}^{n} \left( \sum _{j = 0}^{i-1} w(s_j,s_{j+1}) \right) = \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{i = 1}^{n} EL (\pi (i)). \end{aligned}$$
    (1)

We will sometimes consider those measures defined with \(\liminf \) instead of \(\limsup \), in which case we write \( \underline{MP} , \underline{TP} \) and \( \underline{AE} \) respectively. Finally, we also consider those measures over prefixes: we naturally define them by dropping the \(\limsup _{n \rightarrow \infty }\) operator and taking \(n = \vert \rho \vert \) for a prefix \(\rho \in Prefs (\textit{G})\). In this case, we simply write \( MP (\rho ), TP (\rho )\) and \( AE (\rho )\) to denote the fact that we consider finite sequences.
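
For concreteness, the following small Python sketch (our own illustration, not part of the paper) computes these prefix measures from a finite sequence of edge weights; on a long prefix of a periodic play it approximates the limit values defined above.

def EL(weights):
    # energy level of a prefix: sum of its edge weights
    return sum(weights)

def MP(weights):
    # mean-payoff of a prefix: energy level divided by the prefix length
    return EL(weights) / len(weights)

def TP(weights):
    # total-payoff of a prefix coincides with its energy level
    return EL(weights)

def AE(weights):
    # average-energy of a prefix: average of the successive energy levels
    levels, total = [], 0
    for w in weights:
        total += w
        levels.append(total)
    return sum(levels) / len(levels)

# On the periodic weight sequence (2, -2)^omega the energy levels are (2, 0)^omega:
# long prefixes have mean-payoff close to 0 and average-energy close to 1, while the
# total-payoff of prefixes keeps oscillating between 0 and 2.
print(MP([2, -2] * 1000), AE([2, -2] * 1000))   # 0.0  1.0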

Strategies A strategy for \(\mathcal {P}_{i}, i \in \{1,2\}\), is a function \(\sigma _i: Prefs _i(\textit{G}) \rightarrow S\) such that for all \(\rho \in Prefs _i(\textit{G})\) we have \(( last (\rho ), \sigma _i(\rho )) \in \textit{E}\). A strategy \(\sigma _{i}\) for \(\mathcal {P}_{i} \) is finite-memory if it can be encoded by a deterministic Moore machine \((M,m_0,\alpha _u,\alpha _n)\) where M is a finite set of states (the memory of the strategy), \(m_0 \in M\) is the initial memory state, \(\alpha _u :M \times S \rightarrow M\) is an update function, and \(\alpha _n :M \times S_{i} \rightarrow S\) is the next-action function. If the game is in \(s \in S_{i}\) and \(m \in M\) is the current memory value, then the strategy chooses \(s' = \alpha _n(m,s)\) as the next state of the game. When the game leaves a state \(s \in S\), the memory is updated to \(\alpha _u(m,s)\). Formally, \((M, m_0, \alpha _u, \alpha _n) \) defines the strategy \(\sigma _{i}\) such that \(\sigma _{i}(\rho \cdot s) = \alpha _n(\hat{\alpha }_u(m_0, \rho ), s)\) for all \(\rho \in S^*\) and \(s \in S_{i}\), where \(\hat{\alpha }_u\) extends \(\alpha _u\) to sequences of states as expected. A strategy is memoryless if \(\vert M\vert = 1\), i.e., it does not depend on the history but only on the current state of the game. We denote by \(\Sigma _{i}(\textit{G})\) the set of strategies for player \(\mathcal {P}_{i} \). We drop \(\textit{G}\) when the context is clear.
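
As an illustration (our own sketch, with hypothetical class and method names), a finite-memory strategy can be implemented directly from the Moore machine \((M, m_0, \alpha _u, \alpha _n)\) described above, with the memoryless case as the single-memory-state instance.

class MooreStrategy:
    """Finite-memory strategy encoded by a Moore machine (M, m0, alpha_u, alpha_n)."""
    def __init__(self, m0, alpha_u, alpha_n):
        self.m = m0               # current memory value (initially m0)
        self.alpha_u = alpha_u    # update function: (memory, state) -> memory
        self.alpha_n = alpha_n    # next-action function: (memory, state) -> successor

    def choose(self, s):
        # called when the pebble is on a state s owned by this player
        return self.alpha_n(self.m, s)

    def observe(self, s):
        # called whenever the game leaves state s (owned by either player)
        self.m = self.alpha_u(self.m, s)

# A memoryless strategy uses a single memory value and ignores the history.
next_move = {"s": "t", "t": "s"}
memoryless = MooreStrategy(0, lambda m, s: 0, lambda m, s: next_move[s])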

A play \(\pi = s_0 s_1 \ldots \) is consistent with a strategy \(\sigma _i\) of \(\mathcal {P}_{i} \) if, for all \(n \ge 0\) where \( last (\pi (n)) \in S_i\), we have \(\sigma _i(\pi (n)) = s_{n+1}\). Given an initial state \(s_{\mathsf{init}}\in S\) and strategies \(\sigma _{1}\) and \(\sigma _{2}\) for the two players, we denote by \( Outcome (s_{\mathsf{init}}, \sigma _1,\sigma _2 )\) the unique play that starts in \(s_{\mathsf{init}}\) and is consistent with both \(\sigma _1\) and \(\sigma _2\). When fixing the strategy of only \(\mathcal {P}_{i} \), we denote the set of consistent outcomes by \( Outcomes (s_{\mathsf{init}}, \sigma _i)\).

Objectives An objective in \(\textit{G}\) is a set \(\mathcal {W} \subseteq Plays (\textit{G})\) that is declared winning for \(\mathcal {P}_{1} \). Given a game \(\textit{G}\), an initial state \(s_{\mathsf{init}}\), and an objective \(\mathcal {W}\), a strategy \(\sigma _1 \in \Sigma _{1}\) is winning for \(\mathcal {P}_{1} \) if, for every strategy \(\sigma _2 \in \Sigma _{2}\), we have that \( Outcome (s_{\mathsf{init}}, \sigma _1,\sigma _2) \in \mathcal {W}\). Symmetrically, a strategy \(\sigma _2 \in \Sigma _{2}\) is winning for \(\mathcal {P}_{2} \) if, for every strategy \(\sigma _1 \in \Sigma _{1}\), we have that \( Outcome (s_{\mathsf{init}}, \sigma _1,\sigma _2) \not \in \mathcal {W}\). That is, we consider zero-sum games.

We consider the following objectives and combinations of those objectives.

  • Given an initial energy level \(c_{\mathsf{init}}\in {\mathbb {N}}\), the lower-bounded energy (\({ EG}_L\)) objective \( Energy _ L (c_{\mathsf{init}}) = \{ \pi \in Plays (G)\) \(\mid \forall \, n \ge 0,\ c_{\mathsf{init}}+ EL (\pi (n)) \ge 0 \}\) requires non-negative energy at all times.

  • Given an upper bound \(U \in {\mathbb {N}}\) and an initial energy level \(c_{\mathsf{init}}\in {\mathbb {N}}\), the lower- and upper-bounded energy (\({ EG}_{{ LU}}\)) objective \( Energy _ LU (U, c_{\mathsf{init}}) = \{ \pi \in Plays (\textit{G}) \mid \forall \, n \ge 0,\ c_{\mathsf{init}}+ EL (\pi (n)) \in [0,U] \}\) requires that the energy always remains non-negative and below the upper bound U along a play.

  • Given a threshold \(t \in \mathbb {Q}\), the mean-payoff (\({ MP}\)) objective \( MeanPayoff (t) = \{ \pi \in Plays (\textit{G}) \mid \overline{MP} (\pi ) \le t \}\) requires that the mean-payoff is at most t.

  • Given a threshold \(t \in \mathbb {Z}\), the total-payoff (\({ TP}\)) objective \( TotalPayoff (t) = \{ \pi \in Plays (\textit{G}) \mid \overline{TP} (\pi ) \le t \}\) requires that the total-payoff is at most t.

  • Given a threshold \(t \in \mathbb {Q}\), the average-energy (\({ AE}\)) objective \( AvgEnergy (t) = \{ \pi \in Plays (\textit{G}) \mid \overline{AE} (\pi ) \le t \}\) requires that the average-energy is at most t.

For the \({ MP}, { TP}\) and \({ AE}\) objectives, note that \(\mathcal {P}_{1} \) aims to minimize the payoff value while \(\mathcal {P}_{2} \) tries to maximize it. The reversed convention is also often used in the literature but both are equivalent. For our motivating example, seeing \(\mathcal {P}_{1} \) as a minimizer is more natural. Note that we define the objectives using the \(\limsup \) variants of \({ MP}, { TP}\) and \({ AE}\), but similar results are obtained for the \(\liminf \) variants.
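
As a small illustration (our own sketch, with a hypothetical function name), the bounded-energy condition of \( Energy _ LU (U, c_{\mathsf{init}})\) can be checked directly on a finite prefix of a play, given as a sequence of edge weights:

def satisfies_energy_LU(weights, c_init, U):
    # EG_LU on a prefix: c_init + EL(pi(n)) must stay within [0, U] for every n >= 0
    level = c_init
    if not (0 <= level <= U):
        return False
    for w in weights:
        level += w
        if not (0 <= level <= U):
            return False
    return True

# With c_init = 1 and U = 5, the weights (2, 2, -2) keep the energy within bounds
# (levels 3, 5, 3), while continuing with two more +2 steps exceeds the upper bound.
print(satisfies_energy_LU([2, 2, -2], 1, 5), satisfies_energy_LU([2, 2, -2, 2, 2], 1, 5))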

Decision problem Given a game \(\textit{G}\), an initial state \(s_{\mathsf{init}}\in S\), and an objective \(\mathcal {W} \subseteq Plays (\textit{G})\) as defined above, the associated decision problem is to decide if \(\mathcal {P}_{1} \) has a winning strategy for this objective.

We recall classical results in Table 1. Memoryless strategies suffice for both players for \({ EG}_L\) [6, 11], \({ MP}\) [17] and \({ TP}\) [19, 22] objectives. Since all associated problems can be solved in polynomial time for 1-player games, it follows that the 2-player decision problem is in \(\mathsf{NP}\cap \mathsf{coNP}\) for those three objectives [6, 21, 38]. For the \({ EG}_{{ LU}}\) objective, memory is in general needed and the associated decision problem is EXPTIME-complete [6] (PSPACE-complete for one-player games [18]).

Game values Given a game with an objective \(\mathcal {W} \in \{ MeanPayoff , TotalPayoff , AvgEnergy \}\) and an initial state \(s_{\mathsf{init}}\), we refer to the value from \(s_{\mathsf{init}}\) as \(v = \inf \{t \in \mathbb {Q} \mid \exists \, \sigma _{1} \in \Sigma _{1},\, Outcomes (s_{\mathsf{init}}, \sigma _1) \subseteq \mathcal {W}(t)\}\). For both \({ MP}\) and \({ TP}\) objectives, it is known that the value can be achieved by an optimal memoryless strategy; for the \({ AE}\) objective it follows from our results (Theorem 8).

3 Average-energy

In this section, we consider the problem of ensuring a sufficiently low average-energy.

Problem 1

(\({ AE}\)) Given a game \(\textit{G}\), an initial state \(s_{\mathsf{init}}\), and a threshold \(t \in \mathbb {Q}\), decide if \(\mathcal {P}_{1} \) has a winning strategy \(\sigma _1 \in \Sigma _{1}\) for the objective \( AvgEnergy (t)\).

We first compare the \({ AE}\) objective with traditional quantitative objectives and study how they can be connected (Sect. 3.1). We then establish that in \({ AE}\) games, memoryless strategies are always sufficient to play optimally, for both players. Interestingly, this result cannot be obtained by a straightforward application of the well-known sufficient criteria for memoryless determinacy existing in the literature. We thus introduce some technical lemmas that highlight the inherent features of the \( AE \) payoff function (Sect. 3.2) and allow us to prove the result for one-player \({ AE}\) games (Sect. 3.3). We then prove that one-player \({ AE}\) games can be solved in polynomial time via an algorithm combining graph analysis techniques with linear programming. Finally, we consider the two-player case (Sect. 3.4). Applying a result by Gimbert and Zielonka [23], combined with our results on the one-player case, we derive memoryless determinacy of two-player \({ AE}\) games. Combined with the polynomial-time algorithm of Sect. 3.3, this yields \(\mathsf{NP}\cap \mathsf{coNP}\)-membership of the \({ AE}\) problem. We conclude by proving that \({ AE}\) games are at least as hard as \({ MP}\) games, hence indicating that the \(\mathsf{NP}\cap \mathsf{coNP}\) upper bound is essentially optimal with regard to our current knowledge of \({ MP}\) games (whose membership in \(\mathsf{P}\) is a long-standing open problem [9, 16, 26, 38]).

3.1 Relation with classical objectives

Several links between \({ EG}_L, { MP}\) and \({ TP}\) objectives can be established. Intuitively, \(\mathcal {P}_{1} \) can only ensure a lower bound on energy if he can prevent \(\mathcal {P}_{2} \) from enforcing strictly-negative cycles (otherwise the initial energy is eventually exhausted). This is the case if and only if \(\mathcal {P}_{1} \) can ensure a non-negative mean-payoff in \(\textit{G}\) (here, he wants to maximize the \( MP \)), and if this is the case, \(\mathcal {P}_{1} \) can prevent the running sum of weights from ever going too far below zero along a play, hence granting a lower bound on total-payoff. We introduce the sign-reversed game \(\textit{G}'\) in the next lemma, which is consistent with our view of \(\mathcal {P}_{1} \) as a minimizer with regard to payoffs (as discussed in Sect. 2).

Lemma 1

Let \(\textit{G}= (S_1, S_2, \textit{E}, \textit{w})\) be a game and \(s_{\mathsf{init}}\in S\) be the initial state. The following assertions are equivalent.

  A.

    There exists \(c_{\mathsf{init}}\in \mathbb {N}\) such that \(\mathcal {P}_{1} \) has a (memoryless) winning strategy for objective \( Energy _ L (c_{\mathsf{init}})\).

  B.

    Player \(\mathcal {P}_{1} \) has a (memoryless) winning strategy for objective \( MeanPayoff (0)\) in the game \(\textit{G}'\) defined by reversing the sign of the weight function, i.e., for all \((s_{1}, s_{2}) \in \textit{E}, \textit{w}'(s_{1}, s_{2}) = -\textit{w}(s_{1}, s_{2})\).

  C.

    Player \(\mathcal {P}_{1} \) has a (memoryless) winning strategy for objective \( TotalPayoff (t)\), with \(t = 2\cdot (\vert S\vert - 1)\cdot W\), in the game \(\textit{G}'\) defined by reversing the sign of the weight function.

  D.

    There exists \(t \in \mathbb {Z}\) such that \(\mathcal {P}_{1} \) has a (memoryless) winning strategy for objective \( TotalPayoff (t)\), in the game \(\textit{G}'\) defined by reversing the sign of the weight function.

Proof

Proof of \(A \Leftrightarrow B\) is given in [6, Proposition 12]. Proof of \(B \Leftrightarrow C \Leftrightarrow D\) is in [16, Lemma 1].\(\square \)

The \({ TP}\) objective is sometimes seen as a refinement of \({ MP}\) for the case where \(\mathcal {P}_{1} \)—as a minimizer—can ensure \( MP \) equal to zero but not lower, i.e., the \({ MP}\) game has value zero [21]. Indeed, one may use the \( TP \) to further discriminate between strategies that guarantee a \( MP \) of zero. In the same spirit, the average-energy can help distinguish strategies that yield identical total-payoffs. See Fig. 1. The \( AE \) values in both examples can be computed easily using the upcoming technical lemmas (Sect. 3.2).

Fig. 1

Both plays have identical mean-payoff and total-payoff: \( \overline{MP} (\pi _{1}) = \underline{MP} (\pi _{1}) = \overline{MP} (\pi _{2}) = \underline{MP} (\pi _{2}) = 0, \overline{TP} (\pi _{1}) = \overline{TP} (\pi _{2}) = 5\), and \( \underline{TP} (\pi _{1}) = \underline{TP} (\pi _{2}) = 1\). But play \(\pi _{1}\) has a lower average-energy: \( \overline{AE} (\pi _{1}) = \underline{AE} (\pi _{1}) = 3 < \overline{AE} (\pi _{2}) = \underline{AE} (\pi _{2}) = 11/3\). a Play \(\pi _1\) sees energy levels (1, 3, 5, 3)\(^{\omega }\). b Play \(\pi _2\) sees energy levels (1, 3, 5, 5, 5, 3)\(^{\omega }\)

In these examples, the average-energy clearly lies between the infimum and supremum total-payoffs. This remains true for any play.

Lemma 2

For any play \(\pi \in Plays (\textit{G})\), we have that

$$\begin{aligned} \underline{AE} (\pi ), \overline{AE} (\pi ) \in \left[ \underline{TP} (\pi ), \overline{TP} (\pi ) \right] \subseteq \mathbb {R} \cup \{-\infty , \infty \}. \end{aligned}$$

Proof

Consider a play \(\pi \in Plays (\textit{G})\). By definition of the total-payoff and thanks to weights taking integer values, we have that there exists some index \(m \in \mathbb {N}_{0}\) such that, for all \(n \ge m, EL (\pi (n)) \in \left[ \underline{TP} (\pi ), \overline{TP} (\pi ) \right] \). By definition, the average-energy \( \overline{AE} \) (resp. \( \underline{AE} \)) is the supremum (resp. infimum) limit of the averages of those energy levels; since the finitely many levels before position m do not affect these limits, it holds that \( \underline{AE} (\pi ), \overline{AE} (\pi ) \in \left[ \underline{TP} (\pi ), \overline{TP} (\pi ) \right] \).\(\square \)

In particular, if the mean-payoff value from a state is not zero, its total-payoff value is infinite and the following lemma holds.

Lemma 3

Let \(\textit{G}= (S_1, S_2, \textit{E}, \textit{w})\) be a game and \(s_{\mathsf{init}}\in S\) be the initial state.

  1.

    If there exists \(t < 0\) such that \(\mathcal {P}_{1} \) has a (memoryless) winning strategy for objective \( MeanPayoff (t)\), then \(\mathcal {P}_{1} \) has a memoryless strategy that is winning for \( AvgEnergy (t')\) for all \(t' \in \mathbb {Q}\), i.e., this strategy ensures that any consistent outcome \(\pi \) is such that \( \underline{AE} (\pi ) = \overline{AE} (\pi ) = -\infty \).

  2.

    If \(\mathcal {P}_{1} \) has no (memoryless) winning strategy for \( MeanPayoff (0)\), then, for any \(t' \in \mathbb {Q}, \mathcal {P}_{1} \) has no winning strategy for \( AvgEnergy (t')\). In particular, \(\mathcal {P}_{2} \) has a memoryless strategy ensuring that any consistent outcome \(\pi \) is such that \( \underline{AE} (\pi ) = \overline{AE} (\pi ) = \infty \).

Proof

Consider the first implication. Assume \(\mathcal {P}_{1} \) has a memoryless strategy \(\sigma _{1}\) ensuring that all outcomes \(\pi \in Outcomes (s_{\mathsf{init}}, \sigma _1)\) are such that \( \overline{MP} (\pi )~<~0\). For any such outcome, it is guaranteed that all simple cycles have a strictly negative energy level. Thus, we have that \( \overline{TP} (\pi ) = -\infty \), and by Lemma 2, it implies that \( \overline{AE} (\pi ) = -\infty \), as claimed. Since \( \underline{AE} (\pi ) \le \overline{AE} (\pi )\) by definition, the property holds.

Now consider the second implication. Assume there exists no winning strategy for \(\mathcal {P}_{1} \) for the mean-payoff objective. By equivalence B \(\Leftrightarrow \) D of Lemma 1, and memoryless determinacy of total-payoff games (see for example [22]), it follows that \(\mathcal {P}_{2} \) has a memoryless strategy \(\sigma _{2}\) ensuring that all consistent outcomes \(\pi \in Outcomes (s_{\mathsf{init}}, \sigma _2)\) are such that \( \underline{TP} (\pi ) = \infty \). By Lemma 2, this induces the claim. \(\square \)

3.2 Useful properties of the average-energy

In this subsection, we will first review some classical criteria that usually prove sufficient to deduce memoryless determinacy in quantitative games and discuss why they cannot be applied straight out of the box to the average-energy payoff. We will then prove two useful properties of this payoff that will later help us to prove the desired result.

Classical sufficient criteria

We briefly discuss traditional approaches to prove memoryless determinacy in quantitative games. The first one is to study a variant of the infinite-duration game where the game halts as soon as a cycle is closed and then to relate the properties of this variant to the infinite-duration game. This technique was used in the original proof of memoryless determinacy for mean-payoff games by Ehrenfeucht and Mycielski [17], and in a following simpler proof by Björklund et al. [2]. The connection between infinite-duration games and so-called first cycle games was recently streamlined by Aminof and Rubin [1], identifying sufficient conditions to prove that first cycle games and their infinite-duration counterparts admit optimal memoryless strategies for both players. Among those conditions is the need for winning objectives to be closed under cyclic permutation (intuitively, swapping cycles in a play should not induce a better payoff) and under concatenation (intuitively, concatenating two prefixes should not result in a payoff better than the best of the two prefixes). Without further assumptions, the average-energy objective satisfies neither. Indeed, consider individual cycles represented by sequences of weights \(\mathcal {C}_{1} = \{-1\}, \mathcal {C}_{2} = \{1\}\) and \(\mathcal {C}_{3} = \{1, -2\}\). We see that \( AE (\mathcal {C}_{1}\mathcal {C}_{2}) = (-1 + 0)/2 = -1/2 < AE (\mathcal {C}_{2}\mathcal {C}_{1}) = (1 -0)/2 = 1/2\), hence AE is not closed under cyclic permutations. Intuitively, the order in which the weights are seen does matter, in contrast to most classical payoffs. For concatenation, see that \( AE (\mathcal {C}_{3}) = 0\) while \( AE (\mathcal {C}_{3}\mathcal {C}_{3}) = -1/2 < 0\). Here the intuition is that the overall \( AE \) is impacted by the energy of the first cycle which is strictly negative (\(-1\)). In a sense, the \( AE \) of a cycle can only be maintained through repetition if this cycle is neutral with regard to the total energy level, i.e., if it has energy level zero: we will formalize this intuition in Lemma 5.
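
These two counterexamples are easy to check numerically; the short sketch below (ours, reusing the prefix average-energy of Sect. 2) recomputes them.

from itertools import accumulate

def AE(weights):
    # prefix average-energy: mean of the successive energy levels
    levels = list(accumulate(weights))
    return sum(levels) / len(levels)

C1, C2, C3 = [-1], [1], [1, -2]
print(AE(C1 + C2), AE(C2 + C1))   # -0.5 vs 0.5: not closed under cyclic permutation
print(AE(C3), AE(C3 + C3))        #  0.0 vs -0.5: concatenation may improve the payoff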

Other criteria for memoryless determinacy or half-memoryless determinacy (i.e., holding only for one of the two players) respectively appear in works by Gimbert and Zielonka [22] and by Kopczynski [29]. They involve checking that the payoff is fairly mixing, or concave. Again, both are false for arbitrary sequences of weights in the case of the average-energy, for essentially the same reasons as above. Nevertheless, we will be able to prove that memoryless strategies suffice for both players using similar ideas but first taking care of the problematic cases. Intuitively, when those cases are dealt with, we will regain a payoff that satisfies the above conditions. We also obtain monotonicity and selectivity of the payoff function as defined in [23].

Extraction of prefixes The following lemma describes the impact of adding a finite prefix to an infinite play. We prove that the average-energy over a play can be decomposed w.r.t. the energy level of any of its prefixes and the average-energy of the remaining suffix.

Lemma 4

(Average-energy prefix) Let \(\rho \in Prefs (\textit{G}), \pi \in Plays (\textit{G})\). Then,

$$\begin{aligned} \overline{AE} (\rho \cdot \pi ) = EL (\rho ) + \overline{AE} (\pi ). \end{aligned}$$

The same equality holds for \( \underline{AE} \).

Proof

Let \(\rho = s_{0}\ldots {}s_{k} \in Prefs (\textit{G})\) and \(\pi \in Plays (\textit{G})\) be a prefix and a play over a game \(\textit{G}\). We prove the property for \( \overline{AE} \). By definition and decomposition, we have that

$$\begin{aligned} \overline{AE} (\rho \cdot \pi )&= \limsup _{n \rightarrow \infty } \dfrac{1}{n} \sum _{i = 1}^{n} EL ((\rho \cdot \pi )(i))\\&= \limsup _{n \rightarrow \infty } \left[ \dfrac{1}{n} \cdot \sum _{i = 1}^{k} EL (\rho (i)) + \frac{1}{n} \cdot \sum _{i = k+1}^{n} EL (\rho ) + \frac{1}{n} \cdot \sum _{i = k+1}^{n} EL (\pi (i-k))\right] . \end{aligned}$$

For clarity, we rewrite this expression as \( \overline{AE} (\rho \cdot \pi ) = \limsup _{n \rightarrow \infty } \big [ X_{1}(n) + X_{2}(n) + X_{3}(n)\big ]\), maintaining the same order.

Since k is fixed and finite, and \( EL (\rho (i))\) is bounded for all \(i \le k\), we have that \(\limsup _{n \rightarrow \infty } X_{1}(n) = \lim _{n \rightarrow \infty } X_{1}(n) = 0\). Furthermore, for \(n \ge k + 1\), we rewrite the second term as \(X_{2}(n) = (n - k - 1)\cdot EL (\rho )/n\), and it follows that \(\limsup _{n \rightarrow \infty } X_{2}(n) = \lim _{n \rightarrow \infty } X_{2}(n) = EL (\rho )\). Since both sequences \(X_{1}(n)\) and \(X_{2}(n)\) converge, we can write

$$\begin{aligned}&\liminf _{n \rightarrow \infty } X_{1}(n) + \liminf _{n \rightarrow \infty } X_{2}(n) + \limsup _{n \rightarrow \infty } X_{3}(n) \le \overline{AE} (\rho \cdot \pi )\\&\quad \le \limsup _{n \rightarrow \infty } X_{1}(n) + \limsup _{n \rightarrow \infty } X_{2}(n) + \limsup _{n \rightarrow \infty } X_{3}(n). \end{aligned}$$

Hence, by a small change of variable,

$$\begin{aligned} \overline{AE} (\rho \cdot \pi ) = EL (\rho ) + \limsup _{n \rightarrow \infty } X_{3}(n)&= EL (\rho ) + \limsup _{n \rightarrow \infty } \left[ \frac{1}{n} \cdot \sum _{i = 1}^{n-k-1} EL (\pi (i))\right] \\ {}&= EL (\rho ) + \overline{AE} (\pi ), \end{aligned}$$

as, in the limit, the \((k+1)\) missing terms in the sum are negligible. The proof for \( \underline{AE} \) is similar.\(\square \)

Extraction of a best cycle The next lemma is crucial to prove that memoryless strategies suffice: under well-chosen conditions, one can always select a best cycle in a play—hence, there is no interest in mixing different cycles and no use for memory. It holds only for sequences of cycles that have energy level zero: since they do not change the energy, they do not modify the \( AE \) of the following suffix of play, and one can decompose the \( AE \) as a weighted average over zero cycles.

Lemma 5

(Repeated zero cycles of bounded length) Let \(\mathcal {C}_{1}, \mathcal {C}_{2}, \mathcal {C}_{3}, \ldots {}\) be an infinite sequence of cycles \(\mathcal {C}_{i} \in Prefs (\textit{G})\) such that (i) \(\pi = \mathcal {C}_{1} \cdot \mathcal {C}_{2} \cdot \mathcal {C}_{3} \cdots {} \in Plays (\textit{G})\), (ii) \(\forall \,i \ge 1, EL (\mathcal {C}_{i}) = 0\) and (iii) \(\exists \, \ell \in {\mathbb {N}}_{>0}\) such that \(\forall \,i \ge 1, \vert \mathcal {C}_{i}\vert \le \ell \). Then the following properties hold.

  1.

    The average-energy of \(\pi \) is the weighted average of the average-energies of the cycles:

    $$\begin{aligned} \overline{AE} (\pi ) = \limsup _{k \rightarrow \infty } \left[ \dfrac{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert }\right] . \end{aligned}$$
    (2)
  2.

    For any cycle \(\mathcal {C}\in Prefs (\textit{G})\) such that \( EL (\mathcal {C}) = 0\), we have that \( \overline{AE} (\mathcal {C}^{\omega }) = AE (\mathcal {C})\).

  3.

    Repeating the best cycle gives the lowest \( AE \):

    $$\begin{aligned} \inf _{i \in {\mathbb {N}}_{>0}} AE (\mathcal {C}_{i}) = \inf _{i \in {\mathbb {N}}_{>0}} \overline{AE} ((\mathcal {C}_{i})^{\omega }) \le \overline{AE} (\pi ). \end{aligned}$$

Similar properties hold for \( \underline{AE} \).

Observe that since we assume a bound \(\ell \in {\mathbb {N}}_{>0}\) on the length of the cycles, and since the game is played on a finite graph, Point 3 of Lemma 5 actually allows us to select a best cycle: the set of possible cycles of length at most \(\ell \) is finite, hence the infimum is attained and can be replaced by a minimum.

Proof

We prove the three points for \( \overline{AE} \), similar arguments can be applied for \( \underline{AE} \). Consider Point 1. Let \(\pi = s_{0}^{1}\ldots {}s_{\vert \mathcal {C}_{1}\vert }^{1} s_{1}^{2}\ldots {}s_{\vert \mathcal {C}_{2}\vert }^{2}s_{1}^{3} \ldots {}\) where \(s_{j}^{i}\) denotes the jth state of cycle \(\mathcal {C}_{i}\), with \(\mathcal {C}_{1} = s_{0}^{1}\ldots {}s_{\vert \mathcal {C}_{1}\vert }^{1}\) and for all \(i > 1, \mathcal {C}_{i} = s^{i-1}_{\vert \mathcal {C}_{i-1}\vert }s_{1}^{i}\ldots {}s^{i}_{\vert \mathcal {C}_{i}\vert }\). Essentially, \(s^{i-1}_{\vert \mathcal {C}_{i-1}\vert }\) is both the last state of \(\mathcal {C}_{i-1}\) and the first one of \(\mathcal {C}_{i}\): it can also be seen as \(s^{i}_{0}\) and we later use both notations depending on the role we consider for this state. Given index \(k \in {\mathbb {N}}\) of a state \(s_{k}\) in the classical formulation \(\pi = s_{0}s_{1}s_{2}\ldots {}\) such that \(s_{k}\) denotes state \(s_{j}^{i}\) in our new formulation \(\pi = s_{0}^{1}\ldots {}s_{\vert \mathcal {C}_{1}\vert }^{1} s_{1}^{2}\ldots {}s_{\vert \mathcal {C}_{2}\vert }^{2}s_{1}^{3} \ldots {}\), we define \(c(k) = i\) and \(p(k) = j\), respectively denoting the index of the corresponding cycle and the position of state \(s_{k}\) within this cycle. We can rewrite the definition of the average-energy of \(\pi \) as

$$\begin{aligned} \overline{AE} (\pi )&= \limsup _{n \rightarrow \infty } \left[ \dfrac{1}{n} \sum _{k = 1}^{n} EL (\pi (k))\right] \nonumber \\&= \limsup _{n \rightarrow \infty } \left[ \dfrac{1}{n} \left( \sum _{i = 1}^{c(n) - 1} \sum _{j = 1}^{\vert \mathcal {C}_{i}\vert } EL \left( s_{0}^{1}\ldots {}s_{j}^{i}\right) \,+\, \sum _{j = 1}^{p(n)} EL \left( s_{0}^{1}\ldots {}s_{j}^{c(n)}\right) \right) \right] . \end{aligned}$$
(3)

Observe that since all cycles are such that \( EL (\mathcal {C}_{i}) = 0\), we have that \( EL (s_{0}^{1}\ldots {}s_{j}^{i}) = EL (s_{0}^{i}\ldots {}s_{j}^{i})\) for all indices \(i \in {\mathbb {N}}_{>0}, j \in \{1, \ldots , \vert \mathcal {C}_{i}\vert \}\). In other words, the energy level in a given position only depends on the current cycle. Hence, for all \(i \in {\mathbb {N}}_{>0}\),

$$\begin{aligned} \sum _{j = 1}^{\vert \mathcal {C}_{i}\vert } EL \left( s_{0}^{1}\ldots {}s_{j}^{i}\right) = \sum _{j = 1}^{\vert \mathcal {C}_{i}\vert } EL \left( s_{0}^{i}\ldots {}s_{j}^{i}\right) = \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i}) \end{aligned}$$

where the second equality follows by definition of \( AE (\mathcal {C}_{i})\). Therefore, Eq. (3) becomes

$$\begin{aligned} \overline{AE} (\pi ) = \limsup _{n \rightarrow \infty } \left[ \dfrac{1}{n} \left( \sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})\,+\, \sum _{j = 1}^{p(n)} EL \left( s_{0}^{c(n)}\ldots {}s_{j}^{c(n)}\right) \right) \right] . \end{aligned}$$

Recall that, by hypothesis, there exists \(\ell \in {\mathbb {N}}_{>0}\) such that for all \(i \ge 1, \vert \mathcal {C}_{i}\vert ~\le ~\ell \). Observe that this bound on the cycle lengths implies that

  (a)

    \(p(n) \le \ell \),

  (b)

    \(\sum _{j = 1}^{p(n)} EL (s_{0}^{c(n)}\ldots {}s_{j}^{c(n)})\) is bounded,

  (c)

    \(\sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert \le n = \sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert + p(n) \le \sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert + \ell \).

Combining those three arguments, we obtain that

$$\begin{aligned} \limsup _{n \rightarrow \infty } \left[ \dfrac{\sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert + \ell }\right] \le \overline{AE} (\pi ) \le \limsup _{n \rightarrow \infty } \left[ \dfrac{\sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i = 1}^{c(n) - 1} \vert \mathcal {C}_{i}\vert }\right] \end{aligned}$$

Hence,

$$\begin{aligned} \overline{AE} (\pi ) = \limsup _{k \rightarrow \infty } \left[ \dfrac{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert }\right] \end{aligned}$$

as claimed by Point 1.

Now consider Point 2. For any cycle \(\mathcal {C}\in Prefs (\textit{G})\) such that \( EL (\mathcal {C}) = 0\), all three hypotheses (i), (ii), and (iii) are clearly satisfied, with \(\ell = \vert \mathcal {C}\vert \). Hence by Point 1, we have that

$$\begin{aligned} \overline{AE} (\mathcal {C}^{\omega }) = \limsup _{k \rightarrow \infty } \left[ \dfrac{k\cdot \vert \mathcal {C}\vert \cdot AE (\mathcal {C})}{k\cdot \vert \mathcal {C}\vert }\right] = AE (\mathcal {C}). \end{aligned}$$

Finally, we prove Point 3. The equality straightforwardly follows from Point 2. It remains to consider the inequality. By definition of the infimum, we have that, for all \(k \ge 1\),

$$\begin{aligned} \inf _{i \in {\mathbb {N}}_{>0}} AE (\mathcal {C}_{i}) = \dfrac{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert \cdot \inf _{i \in {\mathbb {N}}_{>0}} AE (\mathcal {C}_{i})}{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert } \le \dfrac{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert }. \end{aligned}$$

Hence by taking the limit, we obtain

$$\begin{aligned} \inf _{i \in {\mathbb {N}}_{>0}} AE (\mathcal {C}_{i}) = \limsup _{k \rightarrow \infty } \left[ \inf _{i \in {\mathbb {N}}_{>0}} AE (\mathcal {C}_{i})\right] \le \limsup _{k \rightarrow \infty } \left[ \dfrac{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert \cdot AE (\mathcal {C}_{i})}{\sum _{i=1}^{k} \vert \mathcal {C}_{i}\vert }\right] = \overline{AE} (\pi ). \end{aligned}$$

This concludes our proof.\(\square \)
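
Before moving to one-player games, the following small numerical experiment (our own sketch, not part of the paper) illustrates Lemmas 4 and 5 on a lasso play \(\rho \cdot \mathcal {C}^{\omega }\) where \(\mathcal {C}\) is a zero cycle: the average-energy of long finite prefixes converges to \( EL (\rho ) + AE (\mathcal {C})\).

from itertools import accumulate

def AE(weights):
    # prefix average-energy: mean of the successive energy levels
    levels = list(accumulate(weights))
    return sum(levels) / len(levels)

rho = [3, -1]      # prefix with EL(rho) = 3 - 1 = 2
C = [1, 1, -2]     # zero cycle with AE(C) = (1 + 2 + 0) / 3 = 1
print(AE(rho + C * 10000))   # ~3.0 = EL(rho) + AE(C), as predicted by Lemmas 4 and 5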

3.3 One-player games

We assume that the unique player is \(\mathcal {P}_{1} \), hence that \(S_{2} = \emptyset \). The proofs are similar for the case where all states belong to \(\mathcal {P}_{2} \) (i.e., \(S_{1} = \emptyset \)). Similarly, we present our results for the \( \overline{AE} \) variant, but they carry over to the \( \underline{AE} \) one. Actually, since we show that we can restrict ourselves to memoryless strategies, all consistent outcomes will be ultimately periodic, and both variants coincide over such outcomes.

Memoryless determinacy Intuitively, we use Lemmas 4 and 5 to transform any arbitrary path into a simple lasso path, repeating a unique simple cycle, yielding an \( AE \) at least as good, thus proving that any threshold achievable with memory can also be achieved without it.

Theorem 6

Memoryless strategies are sufficient to win one-player \({ AE}\) games.

Proof

As a preliminary step, we check whether the graph contains a reachable strictly negative cycle, e.g., using the Bellman–Ford algorithm in \(\mathcal {O}(\vert S\vert \cdot \vert \textit{E}\vert )\) time. If so, then \(\mathcal {P}_{1} \) can ensure a strictly negative mean-payoff, and by Point 1 of Lemma 3, a memoryless strategy exists that makes the average-energy equal to \(-\infty \): such a strategy consists in reaching a strictly negative simple cycle and repeating it forever.

Now, assume that the graph contains no (reachable) strictly negative cycle. If the graph also contains no zero cycle, then the energy level necessarily diverges to \(+\infty \), and the average-energy is \(+\infty \) along any run. Indeed, we are in the case of Point 2 of Lemma 3. Any strategy is optimal in that case: in particular, any memoryless strategy is.

For the rest of this proof, we consider the remaining case of graphs that contain no reachable strictly negative cycle, but that do contain zero cycles. We will prove that memoryless strategies suffice for \(\mathcal {P}_{1} \) in those games, by induction on the number of choices of \(\mathcal {P}_{1} \). Given a game \(\textit{G}= (S_1, S_2 = \emptyset , \textit{E}, \textit{w})\), we define \(d_\textit{G}= \left| \textit{E}\right| - \left| S\right| \). Since we assume graphs to be deadlock-free, we have that \(d_\textit{G}\ge 0\) for any game \(\textit{G}\). We consider induction on the value \(d_\textit{G}\). For every game \(\textit{G}\) such that \(d_\textit{G}= 0\) and initial state \(s_{\mathsf{init}}\in S, \mathcal {P}_{1} \) wins for the \( AE \) objective for threshold \(t \in \mathbb {Q}\) iff he wins with a memoryless strategy: indeed, \(\mathcal {P}_{1} \) actually has no choice at all in \(\textit{G}\), which is reduced to a unique outcome from \(s_{\mathsf{init}}\).

Now assume that memoryless strategies suffice for \(\mathcal {P}_{1} \) in every game \(\textit{G}\) such that \(d_\textit{G}\le m\) for some \(m \in \mathbb {N}\). We claim that they also suffice in every game \(\textit{G}\) such that \(d_\textit{G}= m+1\). Observe that if this holds, we are done as it proves that memoryless strategies suffice for \(\mathcal {P}_{1} \) in all one-player \( AE \) games. Let \(\textit{G}\) be such a game with \(d_\textit{G}= m+1\). Recall that in \(\textit{G}= (S_1, S_2 = \emptyset , \textit{E}, \textit{w})\) there is no strictly negative cycle by hypothesis. Let s be a state of \(\textit{G}\) such that s has at least two outgoing edges. Such a state necessarily exists since \(d_\textit{G}\ge 1\). Consider a partition of the outgoing edges of s into two non-empty sets A and B such that \(A \uplus B = \{(s_1, s_2) \in \textit{E}\mid s_1 = s\}\). According to this partition, we can define in the natural way two sub-games \(\textit{G}_A= (S_1, S_2 = \emptyset , \textit{E}{\setminus } B, \textit{w})\) and \(\textit{G}_B = (S_1, S_2 = \emptyset , \textit{E}{\setminus } A, \textit{w})\) such that \(d_{\textit{G}_A} \le m\) and \(d_{\textit{G}_B} \le m\). By induction hypothesis, we know that memoryless strategies suffice to play optimally for the \( AE \) objective in those two sub-games. First, observe that if \(\mathcal {P}_{1} \) has a memoryless winning strategy \(\sigma \) in either \(\textit{G}_A\) or \(\textit{G}_B\) for threshold \(t \in \mathbb {Q}\), then this strategy remains winning in \(\textit{G}\). What we need to show is that if \(\mathcal {P}_{1} \) can win in neither \(\textit{G}_A\) nor \(\textit{G}_B\), then he also cannot win in \(\textit{G}\), even using memory in s: in the following, we assume that \(\mathcal {P}_{1} \) is memoryless in any other state \(s' \ne s\) (following the induction hypothesis) and we show that mixing cycles in s does not help him.

By contradiction, assume that \(\mathcal {P}_{1} \) can win in neither \(\textit{G}_A\) nor \(\textit{G}_B\), but that he has a winning strategy \(\sigma \) in \(\textit{G}\), for the same threshold t. Let \(\pi \) be the outcome consistent with \(\sigma \). Two cases are possible.

First, state s is seen finitely often along \(\pi \). In this case, we apply Lemma 4 repeatedly on \(\pi \) to iteratively remove all cycles on s. Since there is no strictly negative cycle in \(\textit{G}\), we know that removing one cycle cannot increase the average-energy of the play (it either stays the same if the cycle is a zero cycle, or decreases if it is a strictly positive one). Since s is seen finitely often, we eventually obtain a play \(\pi '\) that sees s at most once. Therefore, this play either belongs to \(\textit{G}_A\) or \(\textit{G}_B\) (both if s is never visited). Furthermore, it has average-energy at most t by construction. This contradicts the claim that \(\mathcal {P}_{1} \) has no winning strategy in either sub-game and concludes the proof in this case.

Second, state s is seen infinitely often along \(\pi \). Since \(\mathcal {P}_{1} \) is memoryless outside \(s, \pi \) only contains simple cycles and can be written as \(\pi = \rho \cdot \mathcal {C}_{1} \cdot \mathcal {C}_{2} \cdot \mathcal {C}_{3} \cdots {}\) where \(\rho \) is an acyclic prefix ending in s and for all \(i \ge 1, \mathcal {C}_{i}\) is a simple cycle on s. Observe that every cycle \(\mathcal {C}_{i}\) belongs either to \(\textit{G}_A\) or to \(\textit{G}_B\). Furthermore, since \(\pi \) is winning and there is no strictly negative cycle in \(\textit{G}\), only finitely many indices \(i_1\), ..., \(i_k\) may correspond to a strictly positive cycle. With the same reasoning as above (repeated application of Lemma 4), we have that the play \(\pi ' = \rho \cdot \mathcal {C}_{i_k+1} \cdot \mathcal {C}_{i_k +2}\cdots {}\), obtained by removing the first cycles up to index \(i_k\), necessarily has a lower or equal average-energy: hence it is also winning. Now observe that the sequence of cycles \(\pi '' = \mathcal {C}_{i_k+1} \cdot \mathcal {C}_{i_k +2}\cdots {}\) may still involve simple cycles from both \(\textit{G}_A\) and \(\textit{G}_B\). Still, as all cycles are of length at most \(\left| S\right| \), and are zero cycles, we can apply Lemma 5 to extract one best cycle \(\mathcal {C}_{j}, j > i_k\). Putting all this together, we have that \(\pi ''' = \rho \cdot (\mathcal {C}_j)^{\omega }\) is such that \( \overline{AE} (\pi ''') \le \overline{AE} (\pi )\). Furthermore, \(\pi '''\) is a simple lasso path that belongs either to \(\textit{G}_A\) or to \(\textit{G}_B\) (as it now uses a unique outgoing edge from s). Consequently, \(\pi '''\) describes a winning strategy in one of the sub-games, which contradicts our hypothesis and concludes our proof in this case too.\(\square \)

Polynomial-time algorithm We now know the form of optimal memoryless strategies: a lasso path \(\pi = \rho \cdot \mathcal {C}^\omega \) that is optimal w.r.t. the \( AE \). We establish a polynomial-time algorithm to solve one-player \({ AE}\) games.

The crux of our algorithm consists in computing, for each state s, the best zero cycle \(\mathcal {C}_s\) (w.r.t. the \( AE \)) starting and ending in s (if any). This is achieved through linear programming (LP) over expanded graphs. For each state s and length \(k \in \{1, \ldots {}, \vert S\vert \}\), we compute the best cycle \(\mathcal {C}_{s, k}\) by considering a graph (Fig. 2) that models all cycles of length k from s and that uses \(k+1\) levels and two-dimensional weights on edges of the form \((c, l\cdot c)\), where c is the weight in the original game and \(l \in \{k, k-1, \ldots {}, 1\}\) is the level of the edge. In the LP, we look for cycles \(\mathcal {C}_{s,k}\) of length k on s such that (a) the sum of weights in the first dimension is zero (thus \(\mathcal {C}_{s,k}\) is a zero cycle), and (b) the sum in the second one is minimal. Fortunately, this sum is exactly equal to \( AE (\mathcal {C}_{s,k}) \cdot k\) thanks to the l factors used in the weights of the expanded graph. Hence, we obtain the optimal cycle \(\mathcal {C}_{s, k}\) (in polynomial time). Doing this for each state s and each length \(k \le \vert S\vert \), we obtain for each state the optimal cycle \(\mathcal {C}_{s}\) (if at least one zero cycle on s exists). Then, by Lemma 4, it remains to compute the least \( EL \) with which each state s can be reached using classical graph techniques (e.g., Bellman–Ford), and to pick the optimal combination to obtain an optimal memoryless strategy, in polynomial time.

Fig. 2

The best cycle \(\mathcal {C}_{s, 2}\) is computed by looking for a path from (s, 2) to (s, 0) with sum zero in the first dimension (zero cycle) and minimal sum in the second dimension (minimal \({ AE}\)). Here, the cycle via \(s'\) is clearly better, with \({ AE}\) equal to \(-1/2\) in contrast to \(1/2\) via \(s''\). a Original game. b Expanded graph for \(k=2\)

Theorem 7

The \({ AE}\) problem for one-player games is in P.

Proof

Let \(s_{\mathsf{init}}\) be the initial state and \(t \in \mathbb {Q}\) be the threshold. From Theorem 6, we can restrict our search to memoryless strategies achieving average-energy less than or equal to t. As noted in the proof of Theorem 6, if a strictly negative simple cycle exists and can be reached from \(s_{\mathsf{init}}\), then the answer to the \({ AE}\) problem is clearly \(\textsf {Yes}\), as average-energy \(-\infty \) is achievable. Checking if such a cycle exists and is reachable can be done in cubic time in the number of states (e.g., using Bellman–Ford to detect negative cycles).

Hence, we now assume that no negative cycle exists. The main part of our algorithm consists in computing, for each state s, the least average-energy that can be achieved along a simple zero cycle starting and ending in s (if any). Indeed, strictly positive cycles should be avoided as there is no negative cycle to counteract them. Applying Lemma 4, it then remains to compute the least energy level with which each state s can be reached (simple paths are sufficient as there are no negative cycles), and to pick the optimal combination. Again, this last part can be solved by using classical graph algorithms in cubic time in \(\vert S\vert \).

We now focus on computing the best zero cycle from a state s. This is achieved by enumerating the possible lengths, from 1 to \(\left| S\right| \) (simple cycles suffice). For a fixed length k, we consider a new graph \(\mathcal {G}_{s,k}\), made of \(k+1\) copies of the original game \(\textit{G}\). The states of \(\mathcal {G}_{s,k}\) are pairs \((u,l)\) with \(u\in S\) and \(0\le l\le k\). The new graph is arranged in levels, indexed from \(l = k\) for the top one to \(l = 0\) for the bottom one: l represents the number of steps remaining to close the cycle of length k. For each edge \((u,u')\) of \(\textit{G}\), with \(w(u,u') = c\), and for each \(1\le l\le k\), except if both \(u'=s\) and \(l>1\) (in order to rule out intermediate visits to s), there is an edge from \((u,l)\) to \((u',l-1)\). This edge carries a pair of weights \((c,l\cdot c)\). Our aim is to find a path in this graph from (s, k) to (s, 0) (hence a cycle of length k on s) such that the sum of the weights on the first dimension is zero (hence a zero cycle) and the sum on the second dimension is minimized (when divided by k, this sum is precisely the average-energy, if starting from energy level zero).

This problem can be expressed as a linear program, with variables \(x_{u,u',l}\) for each edge \(u\rightarrow u'\) and each \(1\le l\le k\). While they are not required to take integer values, these variables are intended to represent the number of times the edge from \((u, l)\) to \((u',l-1)\) is taken along a “path” in \(\mathcal {G}_{s,k}\). The linear program is as follows:

$$\begin{aligned}&\hbox {minimize }\sum x_{u,u',l}\cdot l\cdot w(u, u')\hbox { subject to}\\&1.\quad 0\le x_{u,u',l}\le 1\hbox { for all}~x_{u,u',l};\\&2.\quad \hbox {for all}~(u,l)\hbox { with }1\le l\le k-1, \sum _{u'} x_{u',u,l+1} = \sum _{u'} x_{u,u',l} ;\\&3.\quad \sum _{u'} x_{s,u',k} = \sum _{u'} x_{u',s,1} =1 ;\\&4.\quad \sum x_{u,u',l}\cdot w(u,u')=0;\\&5.\quad \sum x_{u,u',l} \ge 1. \end{aligned}$$

Condition (2) states that each state has the same amount of “incoming” and “outgoing” flow. Condition (3) expresses the fact that we start and end up in state s. Condition (4) encodes the fact that we are looking for zero cycles, and Condition (5) rules out the (possible) trivial solution where all variables are zero.
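
For illustration, the LP can be handed to an off-the-shelf solver. The sketch below is ours, not the paper's implementation: it assumes scipy is available and that the game has at most one edge between any two states, and it returns the optimal objective value, which equals \( AE (\mathcal {C}_{s,k}) \cdot k\), or None when no zero cycle of length k on s exists.

```python
# Illustrative LP encoding with scipy. One variable per edge (u, v, l) of the
# expanded graph; the constraints follow Conditions (1)-(5) above.

from scipy.optimize import linprog

def best_zero_cycle_value(edges, states, s, k):
    """edges: list of (u, v, w) with at most one edge per pair of states."""
    w = {(u, v): c for (u, v, c) in edges}
    var = [(u, v, l) for l in range(1, k + 1)
           for (u, v, _) in edges
           if not (v == s and l > 1)]       # no intermediary visit to s
    if not var:
        return None
    idx = {e: i for i, e in enumerate(var)}
    n = len(var)

    # Objective: sum of x * l * w(u, v).
    cost = [l * w[(u, v)] for (u, v, l) in var]

    A_eq, b_eq = [], []
    # (2) Flow conservation at (u, l) for 1 <= l <= k - 1.
    for u in states:
        for l in range(1, k):
            row = [0.0] * n
            for (a, b, ll) in var:
                if b == u and ll == l + 1:
                    row[idx[(a, b, ll)]] += 1.0   # incoming into (u, l)
                if a == u and ll == l:
                    row[idx[(a, b, ll)]] -= 1.0   # outgoing from (u, l)
            A_eq.append(row)
            b_eq.append(0.0)
    # (3) Leave s at level k and come back to s at level 1, exactly once each.
    A_eq.append([1.0 if (a == s and l == k) else 0.0 for (a, b, l) in var])
    b_eq.append(1.0)
    A_eq.append([1.0 if (b == s and l == 1) else 0.0 for (a, b, l) in var])
    b_eq.append(1.0)
    # (4) Zero cycle: total weight in the first dimension is zero.
    A_eq.append([float(w[(a, b)]) for (a, b, l) in var])
    b_eq.append(0.0)
    # (5) Rule out the all-zero solution: sum x >= 1, i.e. -sum x <= -1.
    A_ub, b_ub = [[-1.0] * n], [-1.0]

    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * n)      # (1) 0 <= x <= 1
    return res.fun if res.success else None
```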

First observe that if this LP has no solution, then there is no zero cycle of length k from s. Now, assume it has a solution \((x_{u,u',l}^0)\): this solution minimizes \(\sum x_{u,u',l}\cdot l\cdot w(u,u')\). Consider a sequence of edges \(s=u_k \rightarrow u_{k-1} \rightarrow \cdots \rightarrow u_1 \rightarrow u_0=s\) for which \(x_{u_l,u_{l-1},l}>0\) for all l. The existence of such a sequence easily follows from Conditions (2) and (3). Assume that this is not a zero cycle. As there are no negative cycles, it must be a strictly positive cycle. But in order to fulfill Condition (4), we would need a negative cycle to compensate for this positive cycle, hence a contradiction. We conclude that any sequence of consecutive edges as selected above is a zero cycle. Similarly, there cannot be a zero cycle of length k from s with better average-energy, as this would contradict the optimality of this solution. We thus have obtained an average-energy-optimal simple zero cycle of length k from s, in polynomial time. Indeed, the LP is polynomial in the size of \(\mathcal {G}_{s,k}\), itself polynomial in the size of the original game: the expanded graph has its size bounded by \(\vert S\vert \cdot (k+1)\) and all weights are bounded by \(k\cdot W\) with \(k \le \vert S\vert \) and W the largest absolute weight in the original game.

As discussed above, this process can be repeated for each state s and each length \(k, 1 \le k \le \vert S\vert \), hence at most \(\vert S\vert ^{2}\) times. For each state, we select the best cycle among the \(\vert S\vert \) possible ones (one for each length). Therefore, in polynomial time, we get a description of the best cycles w.r.t. the average-energy, for each \(s \in S\). Clearly if no such cycle exists, then the answer to the \({ AE}\) problem is No, as all cycles are strictly positive and the average-energy of any play will be \(+\infty \). If some exist, we can find an optimal strategy by picking the best combination between such a cycle from a state s and a corresponding prefix from \(s_{\mathsf{init}}\) to s of minimal energy level. As presented before, this is achieved in polynomial time. Then the answer to the \({ AE}\) problem is Yes if and only if this optimal combination yields average-energy at most equal to t. This concludes our proof. \(\square \)
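
Putting the pieces together, a possible driver for the case where no strictly negative cycle is reachable could look as follows. This is again our own sketch: it reuses the illustrative best_zero_cycle_value helper above, computes the least energy level reaching each state with Bellman–Ford, and then picks the best combination as described in the proof.

```python
def solve_one_player_ae(states, edges, s_init, t):
    """Answers the one-player AE problem, assuming no strictly negative cycle
    is reachable from s_init (that case is handled separately)."""
    INF = float("inf")
    # Least energy level with which each state can be reached (Bellman-Ford;
    # correct since there is no reachable negative cycle).
    least_el = {u: INF for u in states}
    least_el[s_init] = 0
    for _ in range(len(states) - 1):
        for (u, v, w) in edges:
            if least_el[u] + w < least_el[v]:
                least_el[v] = least_el[u] + w
    # Best zero cycle on each reachable state, over all lengths 1..|S|.
    best = INF
    for s in states:
        if least_el[s] == INF:
            continue
        for k in range(1, len(states) + 1):
            val = best_zero_cycle_value(edges, states, s, k)
            if val is not None:
                # Lemma 4: AE of the lasso = EL of the prefix + AE of the cycle.
                best = min(best, least_el[s] + val / k)
    return best <= t        # Yes iff some combination meets the threshold
```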

3.4 Two-player games

Memoryless determinacy We now prove that memoryless strategies still suffice in two-player games. As discussed in Sect. 3.2, most classical criteria do not apply. There is, however, one result that proves particularly useful. Consider any payoff function such that memoryless strategies suffice for both one-player versions (\(S_{1} = \emptyset \), resp. \(S_{2} = \emptyset \)). In [23, Cor. 7], Gimbert and Zielonka establish that memoryless strategies also suffice in two-player games with the same payoff. Thanks to Theorem 6, this entails the next theorem.

Theorem 8

Average-energy games are determined and both players have memoryless optimal strategies.

Observe that this result is true for both variants of the average-energy payoff function, namely \( \overline{AE} \) and \( \underline{AE} \). When both players play optimally, they can restrict themselves to memoryless strategies and both variants thus coincide as mentioned earlier.

Solving average-energy games Finally, consider the complexity of deciding the winner in a two-player \({ AE}\) game. By Theorem 8, one can guess an optimal memoryless strategy for \(\mathcal {P}_{2} \) and solve the remaining one-player game for \(\mathcal {P}_{1} \), in polynomial time (by Theorem 7). The converse is also true: one can guess the strategy of \(\mathcal {P}_{1} \) and solve the remaining game where \(S_{1} = \emptyset \) in polynomial time. Thus, we obtain the following result.

Theorem 9

The \({ AE}\) problem for two-player games is in \(\mathsf{NP}\cap \mathsf{coNP}\).

We complete our study by proving that \({ MP}\) games can be encoded into \({ AE}\) ones in polynomial time. The former are known to be in \(\mathsf{NP}\cap \mathsf{coNP}\) but whether they belong to \(\mathsf{P}\) is a long-standing open question (e.g., [9, 16, 26, 38]). Hence, w.r.t. current knowledge, the \(\mathsf{NP}\cap \mathsf{coNP}\)-membership of the \({ AE}\) problem can be considered optimal. The key of the construction is to double each edge of the original game and modify the weight function such that each pair of successive edges corresponding to such a doubled edge now has a total energy level of zero, and an average-energy that is exactly equal to the weight of the original edge. Then we apply decomposition techniques as in Lemma 5 to establish the equivalence.

Theorem 10

Mean-payoff games can be reduced to average-energy games in polynomial time.

Proof

Let \(G=(S_1,S_2,E,w)\) be a game, and \(t\in \mathbb {Q}\) be the threshold for the mean-payoff problem. From G, we build another game \(G'=(S'_1,S'_2,E',w')\) such that

  • \(S'_1=S_1\cup E\) and \(S'_2=S_2\);

  • \(E'\) contains two types of edges:

    • \((s,e)\in E'\) iff there exists \(s'\) such that \(e=(s,s')\in E\). Then \(w'(s,e) = 2\cdot w(e)\).

    • \((e,s') \in E'\) for any \(e=(s,s') \in E\). Then \(w'(e,s') = -2\cdot w(e)\).

We claim that \(\mathcal {P}_{1} \) has a strategy ensuring objective \( MeanPayoff (t)\) in G if and only if the answer for the \({ AE}\) problem in \(G'\) is Yes for the same threshold t. A similar construction is used in [5].
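
A minimal sketch of this construction (our own encoding): every edge of G becomes an intermediate state, entered with twice its weight and left with the opposite amount, so that each doubled edge sums to zero. By the definition above, the intermediate states belong to \(\mathcal {P}_{1} \).

```python
# Illustrative sketch of the construction of G': the intermediate state
# encoding an edge (s, s_prime) is represented by the pair itself.

def mp_to_ae_game(edges):
    """edges: list of (s, s_prime, w). Returns the edge list of G'."""
    new_edges = []
    for (s, s_prime, w) in edges:
        mid = (s, s_prime)                       # edge-state of G'
        new_edges.append((s, mid, 2 * w))        # w'(s, e) = 2 * w(e)
        new_edges.append((mid, s_prime, -2 * w)) # w'(e, s') = -2 * w(e)
    return new_edges
```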

With a prefix \(\rho =(s_i)_{i\le n}\) in G, we can associate a prefix \(\rho '=(s'_i)_{i\le 2n}\) in \(G'\) as follows: for all \(k \le n, s'_{2k}=s_k\), and for all \(k < n, s'_{2k+1}=(s_k,s_{k+1})\). The mean-payoff along \(\rho \) then equals the average energy along \(\rho '\) (assuming initial energy 0 for \(\rho '\)). Indeed, applying the same decomposition arguments as for Lemma 5 and by definition of the weight function \(w'\), we have that

$$\begin{aligned} AE (\rho ')&= \dfrac{1}{n}\sum _{i = 0}^{n-1} \dfrac{2\cdot w'(s_i, (s_i, s_{i+1})) + w'((s_i, s_{i+1}), s_{i+1})}{2}\\&= \dfrac{1}{n}\sum _{i = 0}^{n-1} \dfrac{4\cdot w(s_i, s_{i+1}) -2\cdot w(s_i, s_{i+1})}{2} = \dfrac{1}{n}\sum _{i = 0}^{n-1} w(s_i, s_{i+1}) = MP (\rho ). \end{aligned}$$

Conversely, with a prefix \(\rho '=(s'_i)_{i\le 2n}\) in \(G'\) starting and ending in a state in \(S_1~\cup ~S_2\), we can associate a prefix \(\rho =(s_i)_{i\le n}\) in G such that \(s_k=s'_{2k}\) for all \(k \le n\). Again, assuming the initial energy is zero in \(\rho '\), the average energy along \(\rho '\) equals the mean payoff along \(\rho \).

Now, assume that \(\mathcal {P}_{1} \) has a winning strategy \(\sigma \) in G from some state \(s\in S_1\cup S_2\), achieving mean-payoff less than or equal to t. Consider the strategy \(\sigma '\) for \(G'\) defined as follows: if \(\rho '\) ends in a state \(u \in S_1\), we let \(\sigma '(\rho ')\) be the edge-state \((u, \sigma (\rho ))\), where \(\rho \) is the prefix of G corresponding to \(\rho '\). If \(\rho '\) ends in an edge-state of the form \((s,s')\), then we let \(\sigma '(\rho ')=s'\), which is the only possible outgoing edge. We see that the outcomes of \(\sigma '\) correspond to the outcomes of \(\sigma \), so that, assuming that the initial energy level is zero, \(\sigma '\) enforces that the average-energy is at most t for any infinite outcome. Conversely, given a strategy \(\sigma '\) for \(G'\) whose outcomes have average-energy at most t, the strategy \(\sigma \) defined by letting \(\sigma (\rho ) = s'\) whenever \(\sigma '(\rho ') = (s, s')\), for all finite paths \(\rho \) in G, secures a mean-payoff of at most t. Observe that the equivalence holds both between \( \overline{AE} \) and \( \overline{MP} \), and between \( \underline{AE} \) and \( \underline{MP} \). Indeed, we have seen that for both MP and \({ AE}\) games, memoryless strategies suffice and decision problems for both variants coincide.\(\square \)

4 Average-energy with lower- and upper-bounded energy

We extend the \({ AE}\) framework with constraints on the running energy level of the system. Such constraints are natural in many applications where the energy capacity is bounded (e.g., fuel tank, battery charge). We first study the case where the energy is subject to both a lower bound (here, zero) and an upper bound (\(U \in \mathbb {N}\)). We study the problem for the fixed initial energy level \(c_{\mathsf{init}}\,{:=}\,0\). In this case, the range of acceptable energy levels is by definition constrained to the interval \([0,\, U]\). Our approach benefits from this: we solve the \({ AE}_{{ LU}}\) problem by considering an \({ AE}\) problem (and subsequently, an \({ MP}\) problem) over an expanded game that explicitly accounts for the lower and upper bounds on the energy.

Formally, we want to decide if \(\mathcal {P}_{1}\) can ensure a sufficiently low \( AE \) while keeping the \( EL \) within the allowed range.

Problem 2

(\({ AE}_{{ LU}}\)) Given a game \(\textit{G}\), an initial state \(s_{\mathsf{init}}\), an upper bound \(U \in \mathbb {N}\), and a threshold \(t \in \mathbb {Q}\), decide if \(\mathcal {P}_{1} \) has a winning strategy \(\sigma _1 \in \Sigma _{1}\) for the objective \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\,\cap \, AvgEnergy (t)\).

Again, we present results for the supremum variant \( \overline{AE} \) but they also hold for the infimum one \( \underline{AE} \).

Illustration Consider the one-player game in Fig. 3. The energy constraints force \(\mathcal {P}_{1} \) to keep the energy in \([0,\,3]\) at all times. Hence, only three strategies can be followed safely, respectively inducing plays \(\pi _{1}, \pi _{2}\) and \(\pi _{3}\). Due to the bounds on energy, it is natural that strategies need to alternate between both a positive and a negative cycle to satisfy objective \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\) (since no simple zero cycle exists). More interestingly, to play optimally (play \(\pi _{3}\)), \(\mathcal {P}_{1} \) actually has to use both positive cycles, and in the appropriate order (compare plays \(\pi _{2}\) and \(\pi _{3}\)).

Fig. 3

Example of a one-player \({ AE}_{{ LU}}\) game (\(U = 3\)) and the evolution of energy under different strategies that maintain it within \(\left[ 0,\, 3\right] \) at all times. The minimal average-energy is obtained with play \(\pi _{3}\): alternating in order between the \(+1, +2\) and \(-3\) cycles. (a) One-player \({ AE}_{{ LU}}\) game. (b) Play \(\pi _1=(acacacab)^{\omega }\). (c) Play \(\pi _2=(aacab)^{\omega }\). (d) Play \(\pi _3=(acaab)^{\omega }\)

This type of alternating behavior is more intricate than for other classical conjunctions of objectives. Consider for example energy parity [12] or multi-dimensional energy games [15, 37]. It is usually necessary to use different cycles in such games: intuitively, one needs one “good” cycle for each dimension and one for the parity objective, and a winning strategy needs to alternate between those cycles. However, there is no need to use two different cycles that are “good” w.r.t. the same part of the objective. In the case of \({ AE}_{{ LU}}\) games, we see that it is sometimes necessary to use two (or more) different cycles even though they impact the sum of weights in the same direction (e.g., several positive cycles). This gives a hint of the complexity of \({ AE}_{{ LU}}\) games.

4.1 Pseudo-polynomial algorithm and complexity bounds

We first reduce the \({ AE}_{{ LU}}\) problem to the \( AE \) problem over a pseudo-polynomial expanded game, i.e., polynomial in the size of the original \({ AE}_{{ LU}}\) game and in \(U \in \mathbb {N}\). By Theorem 9 and Theorem 7, this reduction induces \(\mathsf{NEXPTIME}~\cap ~\mathsf{coNEXPTIME}\)-membership of the two-player \({ AE}_{{ LU}}\) problem, and \(\mathsf{EXPTIME}\)-membership of the one-player one. We improve the complexity for two-player games by further reducing the \( AE \) game to an \( MP \) game: this yields \(\mathsf{EXPTIME}\)-membership, which is optimal (Theorem 13). We also improve the one-player case by observing that a witness lasso path in the \( MP \) game can be built on-the-fly, and the mean-payoff of this path can be computed using only polynomial space in the original game, hence we end up with PSPACE-membership which we also prove optimal in Theorem 13.

Observe that if U is encoded in unary or if U is polynomial in the size of the original game, the complexity of the \({ AE}_{{ LU}}\) problem collapses to \(\mathsf{NP}\cap \mathsf{coNP}\) for two-player games and to \(\mathsf{P}\) for one-player games thanks to our reduction to an \( AE \) problem and the results of Theorem 9 and Theorem 7.

The reductions Given a game \(G = (S_{1}, S_{2}, E, w)\), an initial state \(s_{\mathsf{init}}\), an upper bound \(U \in \mathbb {N}\), and a threshold \(t \in \mathbb {Q}\), we reduce the \({ AE}_{{ LU}}\) problem to an \({ AE}\) problem as follows. If at any point along a play, the energy drops below zero or exceeds U, the play will be losing for the \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\) objective, hence also for its conjunction with the \({ AE}\) one. So we build a new game \(G'\) over the state space \((S \times \{0, 1, \ldots {}, U\}) \cup \{\textsf {sink}\}\). The idea is to include the energy level within the state labels, with \(\textsf {sink}\) as an absorbing state reached only when the energy constraint is breached. We now consider the \({ AE}\) problem for threshold t on \(G'\). By putting a self-loop of weight 1 on \(\textsf {sink}\), we ensure that if the energy constraint is not guaranteed in G, the answer to the \({ AE}\) problem in \(G'\) will be No as the average-energy will be infinite due to reaching this positive loop and repeating it forever. Hence, we show that the \({ AE}_{{ LU}}\) objective can be won in G if and only if the \( AE \) one can be won in \(G'\) (thus avoiding the \(\textsf {sink}\) state). The result of the reduction for the game in Fig. 3a is presented in Fig. 4.

Fig. 4

Reduction from the \({ AE}_{{ LU}}\) game in Fig. 3a to an \( AE \) game and further reduction to an \( MP \) game over the same expanded graph. For the sake of succinctness, the weights are written as \(c \mid c'\) with c the weight used in the \( AE \) game and \(c'\) the one used in the \( MP \) game. We use the upper bound \(U = 3\) and the average-energy threshold \(t = 1\) (the optimal value in this case). The optimal play \(\pi _{3} = (acaab)^{\omega }\) of the original game corresponds to an optimal memoryless play in the expanded graph

Lemma 11

The \({ AE}_{{ LU}}\) problem over a game \(G = (S_{1}, S_{2}, E, w)\), with an initial state \(s_{\mathsf{init}}\), an upper bound \(U \in \mathbb {N}\), and a threshold \(t \in \mathbb {Q}\), is reducible to an \( AE \) problem for the same threshold \(t \in \mathbb {Q}\) over a game \(G' = (S'_{1}, S'_{2}, E', w')\) such that \(\vert S' \vert = (U + 1) \cdot \vert S\vert + 1\) and \(W' = \max \, \{\min \,\{W,\, U\},\, 1\}\), i.e., the largest absolute weight in \(G'\) is at most the same as in G, or equal to constant 1.

Proof

Consider the game \(G= (S_{1}, S_{2}, E, w)\), with initial state \(s_{\mathsf{init}}\), upper bound \(U \in \mathbb {N}\) and threshold \(t \in \mathbb {Q}\). We define the expanded game \(G' = (S'_{1}, S'_{2}, E', w')\) as follows.

  • \(S'_{1} = (S_{1} \times \{0, 1, \ldots {}, U\}) \cup \{\textsf {sink}\}\).

  • \(S'_{2} = S_{2} \times \{0, 1, \ldots {}, U\}\).

  • For all \((u, v) \in E\) and \((u, c) \in S'\), we have that:

    1. if \(d = c + w(u, v) \in [0, U]\), then \(e = \big ( (u,c), (v, d)\big ) \in E'\) and \(w'(e) = w(u, v)\),

    2. else \(e = \big ( (u,c),\, \textsf {sink}\big ) \in E'\) and \(w'(e) = 1\).

  • \((\textsf {sink}, \textsf {sink}) \in E'\) and \(w'(\textsf {sink}, \textsf {sink}) = 1\).

The game \(G'\) starts in state \((s_{\mathsf{init}}, 0)\) and edges are built naturally to reflect the changes in the energy level. Whenever the energy drops below zero or exceeds U, we redirect the edge to \(\textsf {sink}\), where a self-loop of weight 1 is repeated forever.
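
A minimal sketch of this expansion (our own encoding, not the paper's): states of \(G'\) are pairs (state, energy level) plus a sink, and every move that would leave the range \([0,\, U]\) is redirected to the sink.

```python
SINK = "sink"   # absorbing state reached when the energy constraint is breached

def ae_lu_to_ae_game(edges, U):
    """edges: list of (u, v, w) of the original game. Returns the edge list
    of G' over states (u, c) with 0 <= c <= U, plus the sink."""
    new_edges = []
    for c in range(U + 1):
        for (u, v, w) in edges:
            d = c + w
            if 0 <= d <= U:
                new_edges.append(((u, c), (v, d), w))   # weight unchanged
            else:
                new_edges.append(((u, c), SINK, 1))     # constraint breached
    new_edges.append((SINK, SINK, 1))                   # self-loop of weight 1
    return new_edges
```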

We claim that \(\mathcal {P}_{1} \) has a winning strategy \(\sigma _{1}\) for the \({ AE}_{{ LU}}\) objective in G if and only if he has a winning strategy \(\sigma '_{1}\) for the \( AE \) objective in \(G'\), for the very same average-energy threshold t.

First, consider the left-to-right implication. Assume \(\sigma _{1}\) is winning for objective \( Energy _ LU (U, c_{\mathsf{init}}~\,{:=}\,~0) \cap AvgEnergy (t)\) in G. The very same strategy can be followed in \(G'\), ignoring the additional information on the energy in the state labels. Precisely, for any prefix \(\rho ' = (s_0, c_0) (s_{1}, c_1) \ldots {} (s_n, c_n)\) in \(G'\), we define \(\sigma '_{1}(\rho ') = (s, c)\) where \(s = \sigma _{1}(\rho )\) for \(\rho = s_{0} s_{1} \ldots {} s_n\) and \(c = c_n + w(s_n, s)\). Obviously, playing this strategy ensures that the special state sink is never reached, as otherwise it would not be winning for \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\) in G, by construction of \(G'\). Since all weights are identical in both games except on edges entering the sink state, we have that any consistent outcome \(\pi '\) of \(\sigma '_{1}\) in \(G'\) corresponds to a consistent outcome \(\pi \) of \(\sigma _{1}\) in G such that \( \overline{AE} (\pi ') = \overline{AE} (\pi )\), and conversely. Therefore, \(\sigma '_{1}\) is clearly winning for objective \( AvgEnergy (t)\) in \(G'\).

Second, consider the right-to-left implication. Assume \(\sigma '_{1}\) is winning for objective \( AvgEnergy (t)\) in \(G'\). Then this strategy ensures that sink is avoided forever. Otherwise, there would exist a consistent outcome \(\pi '\) reaching sink, and such that \( \overline{AE} (\pi ') = \infty > t\) because of the strictly positive self-loop. Thus the strategy would not be winning. Hence by construction of \(G'\), this strategy trivially ensures \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\) in \(G'\). From \(\sigma '_{1}\), we build a strategy \(\sigma _{1}\) in G in the natural way, potentially integrating the information on the energy within the memory of \(\sigma _{1}\). Again, there is a bijection between plays avoiding sink in \(G'\) and plays in G, such that \(\sigma _{1}\) is winning for \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0) \cap AvgEnergy (t)\) in G.

Hence we have shown the claimed reduction. For the sake of completeness, observe that the reduction holds both for \( \overline{AE} \) and \( \underline{AE} \) variants of the average-energy. It remains to discuss the size of the expanded game. Observe that \(\vert S' \vert = (U + 1) \cdot \vert S \vert + 1\). Furthermore, if W is the largest absolute weight in G, then \(W' = \max \, \{\min \,\{W, U\},\, 1\}\) is the largest one in \(G'\). Indeed, \(W'\) is upper-bounded by U by construction (as all edges of absolute weight larger than U can be redirected directly to sink) and it is lower-bounded by 1 due to edges leading to sink. So the state space of \(G'\) is polynomial in the state space of G and in the value of the upper bound U, while its weights are bounded by either the largest weight W, the upper bound U or constant 1.\(\square \)

We now show that the \( AE \) game \(G'\) can be further reduced to an \( MP \) game \(G''\) by modifying the weight structure of the graph. Essentially, all edges leaving a state \((s, c)\) of \(G'\) are given weight c in \(G''\), i.e., the current energy level, and the self-loop on sink is given weight \((\lceil t\rceil + 1)\). This modification is depicted in Fig. 4. We claim that the \({ AE}\) problem for threshold \(t \in \mathbb {Q}\) in \(G'\) is equivalent to the \({ MP}\) problem for the same threshold in \(G''\). Indeed, we show that with our change of weight function, reaching sink implies losing, both in \(G'\) for \({ AE}\) and in \(G''\) for \({ MP}\), and all plays that do not reach sink have the same value for their average-energy in \(G'\) as for their mean-payoff in \(G''\).
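
The corresponding change of weight function, in the same illustrative encoding as above (the helper and the SINK marker are ours); the precise weights, including those on edges entering the sink, are spelled out in the proof of Lemma 12 below.

```python
import math

SINK = "sink"

def ae_to_mp_weights(expanded_edges, t):
    """expanded_edges: edge list of G' as produced above. Returns the edge
    list of G'' with the modified weight function w''."""
    new_edges = []
    for (src, dst, _) in expanded_edges:
        if src == SINK:
            # Self-loop on the sink: mean-payoff above the threshold.
            new_edges.append((SINK, SINK, math.ceil(t) + 1))
        else:
            u, c = src                       # src = (state, energy level)
            new_edges.append((src, dst, c))  # w'' = current energy level
    return new_edges
```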

Lemma 12

The \({ AE}\) problem over the game \(G' = (S'_{1}, S'_{2}, E', w')\) defined in Lemma 11 is reducible to an \({ MP}\) problem for the same threshold \(t \in \mathbb {Q}\) over a game \(G'' = (S'_{1}, S'_{2}, E', w'')\) sharing the same state space but with largest absolute weight \(W'' = \max \{U,\, \lceil t\rceil + 1\}\), where U is the energy upper bound of the original \({ AE}_{{ LU}}\) problem.

Proof

Let \(G' = (S'_{1}, S'_{2}, E', w')\) be the game defined in Lemma 11, as a reduction from the original game G for the \({ AE}_{{ LU}}\) problem with upper bound \(U \in \mathbb {N}\) and average-energy threshold \(t \in \mathbb {Q}\). We now build the game \(G'' = (S'_{1}, S'_{2}, E', w'')\) by simply modifying the weight function of \(G'\). The changes are as follows:

  • For each edge \(e = ((s,c),\,(s',c')) \in E'\), its weight in \(G'\) is \(w'(e) = c' - c\) and we now set it to \(w''(e) = c\) in \(G''\). Recall that by construction of \(G'\), the value c represents the current energy level for any prefix ending in \((s, c)\). This is the value we now use for the outgoing edge. Also, this value is constrained in \([0,\,U]\) by definition of \(G'\).

  • For each edge \(e = ((s,c),\,\textsf {sink}) \in E'\), its weight in \(G'\) is \(w'(e) = 1\) and we now set it to \(w''(e) = c\) in \(G''\) for the sake of consistency (the actual value on these edges will turn out not to matter).

  • For the self-loop \(e = (\textsf {sink},\, \textsf {sink}) \in E'\), its weight in \(G'\) is \(w'(e) = 1\) and we now set it to \(w''(e) = \lceil t\rceil + 1\) in \(G''\). That is, reaching sink will imply a mean-payoff higher than the threshold.

Before proving the claim, we show that for all plays \(\pi \in Plays (G') = Plays (G'')\) that do not reach sink, we have that \( \overline{AE} _{G'}(\pi ) = \overline{MP} _{G''}(\pi )\), where the subscript naturally refers to the change of weight function. Let

$$\begin{aligned} \pi = s'_0s'_1s'_2\ldots {} = (s_0, c_0) (s_1, c_1) (s_2, c_2)\ldots {} \end{aligned}$$

be such a play, where for all \(i \ge 0\), \(s'_{i} \in S'\) and \((s_{i}, c_{i}) \in S\times ([0, U] \cap \mathbb {N})\) is its corresponding label. By definition of \(G''\), we have that

$$\begin{aligned} \forall \, n \ge 0,\quad w''(s'_n, s'_{n+1}) = c_n = EL _{G'}(\pi (n)). \end{aligned}$$

Hence by definition of the mean-payoff and the average-energy,

$$\begin{aligned} \overline{MP} _{G''}(\pi )&= \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{i = 0}^{n-1} w''(s'_i,s'_{i+1})\nonumber \\&= \limsup _{n \rightarrow \infty } \frac{1}{n} \sum _{i = 0}^{n-1} EL _{G'}(\pi (i)) = \overline{AE} _{G'}(\pi ). \end{aligned}$$
(4)

For the sake of completeness, observe that this equality does not hold for plays reaching sink, as they have infinite average-energy in \(G'\) but finite mean-payoff in \(G''\).

We proceed by proving the claim that \(\mathcal {P}_{1} \) has a winning strategy \(\sigma '_{1}\) for the \( AE \) objective in \(G'\) if and only if he has a winning strategy \(\sigma ''_{1}\) for the \( MP \) objective in \(G''\), for the very same threshold t.

First, consider the left-to-right implication. Assume \(\sigma '_{1}\) is winning for objective \( AvgEnergy (t)\) in \(G'\). We apply the same strategy in \(G''\) straightforwardly as the underlying graph is not modified. Since this strategy is winning for the \( AE \) objective in \(G'\), it necessarily avoids sink both in \(G'\) and \(G''\) (as otherwise the \( AE \) would be infinite). Hence by Eq. (4), we have that \(\sigma '_{1}\) is also winning for \( MeanPayoff (t)\) in \(G''\).

Second, consider the right-to-left implication. Assume \(\sigma ''_{1}\) is winning for objective \( MeanPayoff (t)\) in \(G''\). Since the self-loop on sink has weight \(\lceil t\rceil + 1\), it is necessary that \(\sigma ''_{1}\) never reaches sink otherwise it would not be winning. Hence we apply the same strategy in \(G'\) and by Eq. (4), we have that \(\sigma ''_{1}\) is also winning for \( AvgEnergy (t)\) in \(G'\).

This proves correctness of the reduction. The same reasoning can be followed for \( \underline{AE} \) (thus using \( \underline{MP} \)) instead of \( \overline{AE} \). We end by discussing the size of \(G''\). Clearly, the state space \(S''\) is identical to \(S'\), hence \(\vert S'' \vert = (U + 1) \cdot \vert S \vert + 1\). However, the largest absolute weight in \(G''\) is \(W'' = \max \{U,\, \lceil t\rceil + 1\}\). Indeed, the self-loop on sink has weight \((\lceil t\rceil + 1)\) and all other edges have weight bounded by the energy upper bound U by construction.\(\square \)

Illustration Consider the \({ AE}_{{ LU}}\) game G depicted in Fig. 3a. We have seen that the optimal play is \(\pi _{3} = (acaab)^{\omega }\). Now consider the reduction to the \( AE \) game, and further to the \( MP \) game, depicted in Fig. 4. The optimal (memoryless) strategy in both the \( AE \) game \(G'\) and the \({ MP}\) game \(G''\) is to create the play \(\pi ' = ((a,0) (c,1) (a,1) (a,3) (b,0))^{\omega }\), which corresponds to the optimal play \(\pi _{3}\) in the original game. It can be checked that \( \overline{AE} _{G}(\pi _{3}) = \overline{AE} _{G'}(\pi ') = \overline{MP} _{G''}(\pi ')\): in \(G''\), the weights along the cycle are the successive energy levels \(0, 1, 1, 3, 0\), so the mean-payoff of \(\pi '\) is \(5/5 = 1 = t\).

Complexity The reduction from the \({ AE}_{{ LU}}\) game to the \( AE \) one induces a pseudo-polynomial blow-up in the number of states. Thanks to the second reduction and the use of a pseudo-polynomial algorithm for the \( MP \) game [9, 38], we get EXPTIME-membership, which is optimal for two-player games thanks to the lower bound proved for \({ EG}_{{ LU}}\) [6]. The complexity is reduced when the bound U is given in unary or is polynomial in the size of the game, matching the one obtained for \( AE \) games without energy constraints.

For the one-player case, we also use the reduction to an \( MP \) game. By [17], optimal memoryless strategies exist, hence it suffices to non-deterministically build a simple lasso path in \(G''\), and to check that it satisfies the mean-payoff constraint. It can be done using only polynomial space through on-the-fly computation.

Theorem 13

The \({ AE}_{{ LU}}\) problem is \(\mathsf{EXPTIME}\)-complete for two-player games and PSPACE-complete for one-player games. If the upper bound \(U \in \mathbb {N}\) is polynomial in the size of the game or encoded in unary, the \({ AE}_{{ LU}}\) problem collapses to \(\mathsf{NP}\cap \mathsf{coNP}\) and \(\mathsf{P}\) for two-player and one-player games, respectively.

Proof

Let \(G = (S_1, S_2, E, w)\) be the original \({ AE}_{{ LU}}\) game, \(W \in \mathbb {N}\) its largest absolute weight, \(U \in \mathbb {N}\) the upper bound for energy and \(t \in \mathbb {Q}\) the threshold for the \({ AE}_{{ LU}}\) problem. By Lemma 11, this \({ AE}_{{ LU}}\) problem is reducible to an \( AE \) problem for the same threshold t over a game \(G' = (S'_{1}, S'_{2}, E', w')\) such that \(\vert S' \vert = (U + 1) \cdot \vert S\vert + 1\) and \(W' = \max \, \{\min \,\{W,\, U\},\, 1\}\). By Lemma 12, the \({ AE}_{{ LU}}\) problem can be further reduced to an \({ MP}\) problem for the same threshold t over a game \(G'' = (S'_{1}, S'_{2}, E', w'')\) sharing the same state space as \(G'\) but with largest absolute weight \(W'' = \max \{U,\, \lceil t\rceil + 1\}\). We start by proving the complexity upper bounds.

First, consider the one-player case. Combining Theorem 7 and the reduction to an \( AE \) game, we obtain that one-player \({ AE}_{{ LU}}\) games can be solved in pseudo-polynomial time, i.e., polynomial in \(\vert S \vert \) but also in the value of U (hence exponential in the size of its binary encoding). This both gives \(\mathsf{EXPTIME}\)-membership of one-player \({ AE}_{{ LU}}\) games with arbitrary upper bounds, and \(\mathsf{P}\)-membership of the same games with polynomial or unary upper bounds. For arbitrary bounds, we improve the complexity from EXPTIME to PSPACE. To do so, we consider the further reduction to an \( MP \) game, but we do not completely build the \( MP \) game \(G''\) which is known to be of exponential size. Instead, we build non-deterministically a witness lasso path (thanks to memoryless determinacy [17], they are sufficient) and check on-the-fly that the path is winning or not, using only polynomial space. Recall that we consider a game \(G''\) such that \(S'_{2} = \emptyset \). Our non-deterministic algorithm answers Yes if \(\mathcal {P}_{1} \) has a winning strategy in \(G''\) (and hence in G thanks to Lemmas 11 and 12), No otherwise, and is as follows:

  1. Guess a state \(s'_{r} \in S'_{1} = (S_{1} \times \{0, 1, \ldots {}, U\}) \cup \{\textsf {sink}\}\) that will be the starting (and ending) state of the cycling part of the lasso path. For the following, we assume that \(s'_{r} \ne \textsf {sink}\), otherwise the lasso path that we are trying to build is clearly losing (see proof of Lemma 12) and the algorithm answers No. Thus, store state \(s'_{r} = (s_{r}, m)\) for some \(m \in \{0, \ldots {}, U\}\).

  2. Check that \(s'_{r}\) is reachable from the initial state \((s_{\mathsf{init}}, 0)\). This can be done in NLOGSPACE w.r.t. the size of \(G''\) (see e.g., [35]), hence NPSPACE w.r.t. the original problem. If it is not, then the answer is No.

  3. Build step by step a lasso path by constructing a simple cycle in \(G''\) starting in \(s'_{r}\). This construction is non-deterministic: if at any point the sink state is reached, the algorithm returns No. The construction stops as soon as \(s'_{r}\) is reached, or after \(\vert S'\vert +1\) steps if \(s'_{r}\) is not reached: in the latter case, the answer is also No (after \(\vert S'\vert +1\) steps, we know for certain that a cycle has been created, so there is no point in extending the path further). While constructing the cycle, we make on-the-fly computations: at each step, the next state is chosen non-deterministically, and the only information that is stored (apart from state \(s'_{r}\), used to determine the end of the cycle) is the number of steps since leaving \(s'_{r}\) and the sum of the weights seen along the cycle.

  4. Assume \(s'_{r}\) is reached (otherwise we have seen that the answer is No). Let \(s'_{0}s'_{1}\ldots {}s'_{l}\) be the sequence of states visited along the construction, with \(s'_{0} = s'_{l} = s'_{r}\). We have stored the length l and the sum of weights

    $$\begin{aligned} \gamma = \sum _{i = 0}^{l-1} w''\left( s'_{i}, s'_{i+1}\right) . \end{aligned}$$

    Now, we check if \(\dfrac{\gamma }{l} \le t\): this quantity is the mean-payoff of the lasso path we have constructed. If yes, then the answer is Yes, thanks to Lemmas 11 and 12: the lasso path describes a winning strategy. Otherwise, the answer is No as this lasso path represents a losing strategy, by the same lemmas.

The correctness of this algorithm is guaranteed by Lemmas 11 and 12. It remains to argue that it only uses polynomial space in the original \({ AE}_{{ LU}}\) problem. Observe that our on-the-fly computations only need to record the state \(s'_{r}\), the current state, the current length and the current sum. We have that both states belong to \(S_{1} \times \{0, 1, \ldots {}, U\}\), that \(l < \vert S'\vert = (U + 1) \cdot \vert S \vert + 1\) and that the sum is bounded by \(l \cdot W'' = l \cdot \max \{U,\, \lceil t\rceil + 1\}\). Hence, encoding those values only requires a polynomial number of bits w.r.t. the input of the \({ AE}_{{ LU}}\) problem (i.e., logarithmic in the upper bound U, the largest weight W and the threshold t). This proves that our algorithm lies in NPSPACE, and by Savitch’s theorem [35] we know that NPSPACE \(= \mathsf{PSPACE}\): hence we proved the upper bound for the one-player \({ AE}_{{ LU}}\) problem.
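
For illustration, here is a sketch of the verification part of this procedure: the deterministic check applied to a guessed anchor \((s_{r}, m)\) and a guessed sequence of moves, computing the successors of \(G''\) on the fly and storing only the anchor, the current configuration, the length and the running sum. The encoding and names are ours; reachability of the anchor from \((s_{\mathsf{init}}, 0)\) (step 2) is checked separately.

```python
from fractions import Fraction

def check_guessed_cycle(edges, U, t, anchor, moves):
    """edges: list of (u, v, w) of the original game G; anchor = (s_r, m);
    moves: guessed sequence of successor states in G. Returns True iff the
    guessed cycle of G'' closes on the anchor with mean-payoff <= t."""
    w = {(u, v): c for (u, v, c) in edges}
    s, c = anchor
    if not (0 <= c <= U):
        return False
    length, total = 0, 0
    for nxt in moves:
        if (s, nxt) not in w:
            return False                 # not an edge of the game
        total += c                       # w'' of the edge leaving (s, c)
        d = c + w[(s, nxt)]
        if not (0 <= d <= U):
            return False                 # this move would reach the sink
        s, c = nxt, d
        length += 1
        if (s, c) == anchor:             # cycle closed on the anchor
            return Fraction(total, length) <= t
    return False                         # the anchor was never re-reached
```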

Second, consider two-player \({ AE}_{{ LU}}\) games. In this case, we solve the \({ MP}\) problem over \(G''\) using a pseudo-polynomial algorithm such as the one presented in [9], whose complexity is \(\mathcal {O}(\vert S^{*}\vert ^{3} \cdot W^{*})\) for a game with \(\vert S^{*}\vert \) states and largest absolute weight \(W^{*} \in \mathbb {N}\). Therefore, the complexity of solving the original \({ AE}_{{ LU}}\) problem is

$$\begin{aligned} \mathcal {O}\left( \vert S'\vert ^{3} \cdot W''\right) = \mathcal {O}\left( \big ( (U + 1) \cdot \vert S\vert + 1\big ) ^{3} \cdot \max \{U,\, \lceil t\rceil + 1\}\right) , \end{aligned}$$

which is clearly pseudo-polynomial. Hence we obtain \(\mathsf{EXPTIME}\)-membership for two-player \({ AE}_{{ LU}}\) games. If the upper bound \(U \in \mathbb {N}\) is polynomial in the size of the game or encoded in unary, it is sufficient to solve the polynomially-larger \({ AE}\) game \(G'\) using Theorem 9 to obtain \(\mathsf{NP}\cap \mathsf{coNP}\)-membership.

Now consider lower bounds. The \({ AE}_{{ LU}}\) problem trivially encompasses the lower- and upper-bounded energy problem \({ EG}_{{ LU}}\), i.e., the \({ AE}_{{ LU}}\) without consideration of the average-energy. Indeed, consider a game G with an objective \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\), for some \(U \in \mathbb {N}\). Assume \(\mathcal {P}_{1} \) has a winning strategy for this objective. Then this strategy ensures that along any consistent outcome \(\pi \), the running energy at any point is at most equal to U. By definition, this implies that \( \underline{AE} (\pi ) \le \overline{AE} (\pi ) \le U\). Hence this strategy is also winning for the \({ AE}_{{ LU}}\) objective written as the conjunction \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0) \cap AvgEnergy (t \,{:=}\,U)\). The converse is also trivially true. Ergo, any lower bound on the complexity of the \({ EG}_{{ LU}}\) problem also holds for the \({ AE}_{{ LU}}\) one. The \(\mathsf{EXPTIME}\)-hardness of the two-player \({ EG}_{{ LU}}\) problem was proved in [6], the \(\mathsf{PSPACE}\)-hardness of the one-player version was proved in [18] (in the equivalent setting of reachability in bounded one-counter automata). Note that those results clearly rely on having an upper bound U larger than polynomial (w.r.t. the size of the game) and encoded in binary, as we have already shown that in the opposite case the complexity of the problem is reduced.

Finally, observe that the same reduction and complexities also hold if we use \( \underline{AE} \) instead of \( \overline{AE} \) to define the \({ AE}_{{ LU}}\) problem. This concludes our proof.\(\square \)

Remark 1

One could argue that the reduction from \( AE \) games to \({ MP}\) games presented in Lemma 12 could be used to solve \( AE \) games without resorting to the specific analysis of Sect. 3. Indeed, in the case where the mean-payoff value is zero, any memoryless strategy (which we know to suffice) that is winning should only create zero cycles: the energy can be constrained in the range \([-2\cdot \vert S\vert \cdot W,\, 2\cdot \vert S\vert \cdot W]\) along any winning play. However, applying a pseudo-polynomial \({ MP}\) algorithm on this new game would only grant \(\mathsf{EXPTIME}\)-membership for \( AE \) games (because of the polynomial dependency on W), in contrast to the \(\mathsf{NP}\cap \mathsf{coNP}\) and \(\mathsf{P}\) results obtained with the refined analysis for two-player and one-player \( AE \) games respectively.

4.2 Memory requirements

We prove pseudo-polynomial lower and upper bounds on memory for the two players in \({ AE}_{{ LU}}\) games. The upper bound follows from the reduction to an \( AE \) game of pseudo-polynomial size and the memoryless determinacy of \( AE \) games proved in Theorem 8. Observe that winning strategies obtained via our reductions have a natural form: they are memoryless w.r.t. configurations \((s, c)\) denoting the current state and the current energy level. As noted before, when the upper bound on energy \(U \in \mathbb {N}\) is polynomial or given in unary, the expanded game is only polynomial in size, and the memory needs are also reduced.

The lower bound is witnessed by two families of games that require strategies with memory polynomial in the energy upper bound \(U \in \mathbb {N}\) to be won by \(\mathcal {P}_{1} \) (Fig. 5a) or \(\mathcal {P}_{2} \) (Fig. 5b), respectively. It is interesting to observe that those families already require such memory for the simpler \({ EG}_{{ LU}}\) objective (i.e., bounded energy only). Sufficiency of pseudo-polynomial memory for \({ EG}_{{ LU}}\) games follows from [6], but to the best of our knowledge, it was not proved in the literature that such memory is also necessary.

Fig. 5

Families of games witnessing the need for pseudo-polynomial-memory strategies for \({ EG}_{{ LU}}\) (and \({ AE}_{{ LU}}\)) objectives. The goal of \(\mathcal {P}_{1} \) is to keep the energy in \([0,\,U]\) at all times, for \(U \in \mathbb {N}\). The left game is won by \(\mathcal {P}_{1} \) and the right one by \(\mathcal {P}_{2} \) but both require memory polynomial in the value U to be won. (a) \(\mathcal {P}_{1} \) needs to take \((s, s')\) U times before taking \((s, s)\) once and repeating. (b) \(\mathcal {P}_{2} \) needs to increase the energy up to U using \((a, c)\) to force \(\mathcal {P}_{1} \) to take \((g, d)\), then make him lose by taking \((a, b)\)

Theorem 14

Pseudo-polynomial-memory strategies are both sufficient and necessary to win in \({ EG}_{{ LU}}\) and \({ AE}_{{ LU}}\) games with arbitrary energy upper bound \(U \in \mathbb {N}\), for both players. Polynomial memory suffices when U is polynomial in the size of the game or encoded in unary.

Proof

We first prove the upper bound on memory. The expanded game \(G'\) built in the reduction from the \({ AE}_{{ LU}}\) to the \({ AE}\) problem (Lemma 11) has a state space of size \(\vert S' \vert = (U + 1) \cdot \vert S\vert + 1\), over which memoryless strategies suffice, by Theorem 8. Thus, winning for the \({ AE}_{{ LU}}\) objective only requires memory that is polynomial in the original number of states and the upper bound value \(U \in \mathbb {N}\). The same reduction holds for \({ EG}_{{ LU}}\) games with an even simpler safety objective (never reaching sink) instead of the \( AE \) one (or equivalently with the \( AE \) objective for threshold \(t = U\)). Thus, with regard to the binary encoding of U, strategies require exponential memory in general. For the special cases of unary encoding or polynomially bounded value U, polynomial memory suffices. Note that as usual, these arguments are true for both the \( \overline{AE} \) and the \( \underline{AE} \) versions of the objective.

We now discuss the two families of games witnessing that pseudo-polynomial memory is also a lower bound for both players.

First, consider the one-player game depicted in Fig. 5a and parametrized by the value \(U \in \mathbb {N}\). Assume the objective is \({ EG}_{{ LU}}\), asking for the energy to remain within \([0,\,U]\) at all times. Recall that the initial energy level is fixed to \(c_{\mathsf{init}}\,{:=}\,0\). It is easy to see that there is only one acceptable strategy for \(\mathcal {P}_{1} \): playing \((s, s')\) exactly U times, then playing the self-loop \((s, s)\) once, and repeating this forever. Indeed, any other strategy eventually leads the energy outside the allowed range. Hence, to win this game, \(\mathcal {P}_{1} \) needs a strategy described by a Moore machine whose memory contains at least \((U + 1)\) states. This proves that pseudo-polynomial memory is required for \(\mathcal {P}_{1} \) in \({ EG}_{{ LU}}\) games. Furthermore, the same argument can be applied to this game with objective \({ AE}_{{ LU}}\) by considering the average-energy threshold \(t \,{:=}\,U\), which is trivially ensured by strategies satisfying the \({ EG}_{{ LU}}\) objective.

Second, consider the two-player \({ EG}_{{ LU}}\) game depicted in Fig. 5b. Again this game is parametrized by the energy upper bound \(U \in \mathbb {N}\) and the initial energy level is fixed to \(c_{\mathsf{init}}\,{:=}\,0\). This game can be won by \(\mathcal {P}_{2} \) using the following strategy: if the energy level is in \([1,\, U]\), play \((a, c)\), otherwise play \((a, b)\). Note that this strategy again requires at least \((U + 1)\) states of memory in its Moore machine (to keep track of the energy level).

This strategy is indeed winning. Observe that \(\mathcal {P}_{1} \) can only decrease the energy by using edge \((g, d)\) of weight \(-U\), and this edge can only be used safely if the energy level is exactly U. In addition, the energy is bound to reach or exceed U eventually (as it will increase by 1 or 2 between each visit of a). If it exceeds U, then \(\mathcal {P}_{2} \) wins directly. Otherwise, assume that the energy is U when the game is in state g. If \(\mathcal {P}_{1} \) plays \((g, f)\), he loses (the energy reaches \(U+1\)). If he plays \((g, e)\), \(\mathcal {P}_{2} \) wins by playing \((a, c)\) (the energy also reaches \(U+1\)). And if \(\mathcal {P}_{1} \) plays \((g, d)\), \(\mathcal {P}_{2} \) wins by playing \((a, b)\) (the energy reaches \(-1\)). Hence, \(\mathcal {P}_{2} \) wins the game against all strategies of \(\mathcal {P}_{1} \).

Now, observe that \(\mathcal {P}_{2} \) cannot win if he uses a strategy with fewer memory states in its Moore machine. Indeed, any such strategy cannot keep track of all the energy levels between 0 and U and play \((a, c)\) a sufficient number of times in a row before switching to the appropriate choice (depending on the energy being 0 or U). Therefore, if \(\mathcal {P}_{2} \) uses such a strategy, \(\mathcal {P}_{1} \) can maintain the energy in the allowed range by simply reacting to edge \((a, b)\) with \((g, f)\) and to edge \((a, c)\) by choosing between \((g, d)\) (if the energy is U) and \((g, e)\) (otherwise). Such choices are safe for \(\mathcal {P}_{1} \) as the strategy of \(\mathcal {P}_{2} \) does not have enough memory to distinguish the resulting energy levels from the intermediate ones.

This proves that \(\mathcal {P}_{2} \) also needs pseudo-polynomial memory in \({ EG}_{{ LU}}\) games. Finally, we remark that this reasoning also holds for the \({ AE}_{{ LU}}\) objective with threshold \(t \,{:=}\,U\), as for the previous game.\(\square \)

5 Average-energy with lower-bounded energy

We conclude with the conjunction of an \( AE \) objective with a lower-bound constraint (again equal to zero) on the running energy, but no upper bound. This corresponds to a hypothetical unbounded energy storage. Hence, its applicability is limited, but it may prove interesting from a theoretical standpoint.

Problem 3

(\({ AE}_{L}\)) Given a game \(\textit{G}\), an initial state \(s_{\mathsf{init}}\) and a threshold \(t \in \mathbb {Q}\), decide if \(\mathcal {P}_{1} \) has a winning strategy \(\sigma _1 \in \Sigma _{1}\) for objective \( Energy _ L (c_{\mathsf{init}}\,{:=}\,0)\,\cap \, AvgEnergy (t)\).

This problem proves challenging to solve: we provide partial answers in the following, with a complete algorithm for one-player games but only a correct, yet incomplete, method for two-player games. As usual, we present our results for the supremum variant \( \overline{AE} \).

Illustration Consider the game in Fig. 3. Recall that for \({ AE}_{{ LU}}\) with \(U = 3\), the optimal play is \(\pi _{3}\), and it requires alternation between all three different simple cycles. Now consider \({ AE}_{L}\). One may think that relaxing the objective would allow for simpler winning strategies. This is not the case. Some new plays are now acceptable w.r.t. the energy constraint, such as \(\pi _{4} = (aabaaba)^{\omega }\), with \( \overline{AE} (\pi _{4}) = 11/7\), and \(\pi _{5} = (aaababa)^{\omega }\), with \( \overline{AE} (\pi _{5}) = 18/7\). Yet, the optimal play w.r.t. the \( AE \) (under the lower-bound energy constraint) is still \(\pi _{3}\), hence optimal play still requires using all the available cycles, in the appropriate order. This indicates that \({ AE}_{L}\) games also require complex solutions.

5.1 One-player games

We assume that the unique player is \(\mathcal {P}_{1} \). Indeed, the opposite case is easy as for \(\mathcal {P}_{2} \), the objective is a disjunction and \(\mathcal {P}_{2} \) can choose beforehand which sub-objective he will transgress, and do so with a simple memoryless strategy (both \({ AE}\) and \({ EG}_L\) games admit memoryless optimal strategies as seen before). We show that one-player \({ AE}_{L}\) problems lie in PSPACE by reduction to \({ AE}_{{ LU}}\) problems for a well-chosen upper bound \(U \in \mathbb {N}\) and then application of Theorem 13.

The reduction Given a game \(G = (S_{1}, S_{2} = \emptyset , E, w)\) with largest absolute weight \(W~\in ~\mathbb {N}\), an initial state \(s_{\mathsf{init}}\), and a threshold \(t \in \mathbb {Q}\), we reduce the \({ AE}_{L}\) problem to an \({ AE}_{{ LU}}\) problem with an upper bound \(U \in \mathbb {N}\) defined as \(U \,{:=}\,t + N^{2} + N^{3}\), with \(N = W \cdot (|S|+2)\). Observe that the length of the binary encoding of U is polynomial in the size of the game, the encoding of the largest weight W and the encoding of the threshold t. The intuition is that if \(\mathcal {P}_{1} \) can win a one-player \({ AE}_{L}\) game, he can win it without ever reaching energy levels higher than the chosen bound U, even if he is technically allowed to do so. Essentially, the interest of increasing the energy is making more cycles available (as they become safe to take w.r.t. the lower bound constraint), but increasing the energy further than necessary is not a good idea as it will negatively impact the average-energy. To prove this reduction, we start from an arbitrary winning path in the \({ AE}_{L}\) game, and build a witness path that is still winning for the \({ AE}_{L}\) objective, but also keeps the energy below U at all times. Our construction exploits a result of Lafourcade et al. that bounds the value of the counter along a path in a one-counter automaton (stated in [31] and proved in [30, Lemma 42]). We slightly adapt it to our framework in the next lemma. The technique is identical, but the statement is more precise. In the following, we call an expanded configuration of the game G a pair \((s, c)\) where \(s \in S\) is a state and \(c \in \mathbb {Z}\) a level of energy.
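
As a small illustration of the resulting procedure (our own sketch: solve_ae_lu stands for any procedure deciding the \({ AE}_{{ LU}}\) problem, e.g., the one behind Theorem 13), one simply computes the bound and hands over the instance. Taking \(\lceil t\rceil \) keeps the bound integral; any bound at least \(t + N^{2} + N^{3}\) is sound, since the \({ AE}_{{ LU}}\) objective is stronger than the \({ AE}_{L}\) one.

```python
import math

def ae_l_via_ae_lu(states, edges, s_init, t, solve_ae_lu):
    """Reduce the one-player AE_L problem to an AE_LU problem.
    solve_ae_lu is any decision procedure for the AE_LU problem."""
    W = max(abs(w) for (_, _, w) in edges)   # largest absolute weight
    N = W * (len(states) + 2)
    U = math.ceil(t) + N ** 2 + N ** 3       # integer upper bound >= t + N^2 + N^3
    return solve_ae_lu(states, edges, s_init, U, t)
```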

Lemma 15

Let \(g \in \mathbb {Z}\). Let \((s, c)\) and \((s',c')\) be two expanded configurations of the game G such that there exists an expanded path \(\rho _{\text {exp}} = (s_0,c_0) \dots (s_m,c_m)\) in G from \((s, c)\) to \((s',c')\) with \(c_i \ge g\) for every \(0 \le i \le m\). Then, there is a path \(\rho _{\text {exp}}'=(s'_0,c'_0) (s'_1,c'_1) \dots (s'_n,c'_n)\) in G from \((s, c)\) to \((s',c')\) such that:

  • for every \(0 \le i \le n, g \le c'_i \le \max \{ c,c', g\} +N^2+N^3\), where \(N = W \cdot (|S|+2)\), with W the maximal absolute weight in G;

  • there is an (injective) increasing mapping \(\iota :\{1,\dots ,n\} \rightarrow \{1,\dots ,m\}\) such that for every \(1 \le i \le n, s'_i = s_{\iota (i)}\) and \(c'_i \le c_{\iota (i)}\).

Furthermore,  for any two expanded paths \(\rho ^1\) and \(\rho ^2\), with \( last (\rho ^1) = (s,c)\) and \( first (\rho ^2) = (s',c')\), if \( AE (\rho ^1 \cdot \rho _{\text {exp}} \cdot \rho ^2) \le g\), then it also holds that \( AE (\rho ^1 \cdot \rho '_{\text {exp}} \cdot \rho ^2) \le AE (\rho ^1 \cdot \rho _{\text {exp}} \cdot \rho ^2) \le g\).

Proof

We write \(\alpha = W \cdot (|S|+1), \beta = (\alpha +W) \cdot (\alpha +W-1)-1\) and \(K = \max \bigl \lbrace c,c', g\bigr \rbrace +(\alpha +W)^2\). We apply inductively a transformation that removes similar ascending and descending segments of the path. The segments are selected such that their composition is neutral w.r.t. the energy.

Pick a subpath \(\rho _{\text {exp}}[k,k+h] = (s_k, c_k)\ldots {}(s_{k+h}, c_{k+h})\) of \(\rho _{\text {exp}}\), if it exists, such that:

(a) \(c_{k} \le K\) and \(c_{k+h} \le K\);

(b) for every \(0< \ell < h, c_{k+\ell } > K\);

(c) there is \(0<\ell <h\) such that \(c_{k+\ell } > K + W \cdot (|S|+1) \cdot \beta \).

If such a subpath does not exist, then this means that the cost along \(\rho _{\text {exp}}\) is overall bounded by \(K+W\cdot (|S|+1)\cdot \beta \) (since condition (a) is not restrictive – \(c,c' \le K\)), which then concludes the proof. Hence, assume such a subpath exists for the following steps.

Ascent part Let \(k \le \ell _0\le \dots \le \ell _{\beta } \le k+h\) be indices such that:

  • \(c_{\ell _i} > K + i \cdot W \cdot (|S|+1)\);

  • for every \(k \le \ell < \ell _i, c_\ell \le K + i \cdot W \cdot (|S|+1)\).

Fix \(0 \le i \le \beta \). Then it holds that \(c_{\ell _i} \le K + i \cdot W \cdot (|S|+1) + W\) and thus \(c_{\ell _{i+1}} - c_{\ell _i} > K + (i+1) \cdot W \cdot (|S|+1) -(K + i \cdot W \cdot (|S|+1) + W) = W \cdot (|S|+1) -W = W \cdot |S|\). Let \(J_i\) be a subset of \([\ell _i;\ell _{i+1}]\) defined by \(\ell _i \in J_i\), and if \(j \in J_i\), then let \(j' \le \ell _{i+1}\) be the smallest index larger than j (if it exists) such that \(c_{j'}>c_j\). Obviously we have \(c_j < c_{j'} \le c_j+W\). Hence the cardinal of \(J_i\) is at least \(1+ \frac{W \cdot |S|}{W} \ge |S|+1\). Hence there is a state \(\widetilde{s}^{(i)}\) and two indices \(j_{i,1}<j_{i,2} \in J_i\) with \((s_{j_{i,1}},c_{j_{i,1}}) =(\widetilde{s}^{(i)},\alpha _1)\) and \((s_{j_{i,2}},c_{j_{i,2}}) =(\widetilde{s}^{(i)},\alpha _2)\) with \(c_{\ell _i} \le \alpha _1 < \alpha _2 \le c_{\ell _{i+1}}\), hence using previous computed bounds, \(0< \alpha _2-\alpha _1 \le c_{\ell _{i+1}} - c_{\ell _i} < W \cdot (|S|+2) = \alpha +W\). We write \(\widetilde{d}^{(i)} = \alpha _2-\alpha _1\). The segment between indices \(j_{i,1}\) and \(j_{i,2}\) is a candidate for being removed. Due to the value of \(\beta \), there is \(d \in \{\widetilde{d}^{(i)} \mid 0 \le i \le \beta \}\) that appears \((\alpha +W)\) times in that set.

Descent part We do a similar reasoning for the “descent” part. There must exist indices \(k \le m_0 \le \dots \le m_\beta \le k+h\) such that:

  • \(c_{m_i} > K + (\beta -i) \cdot W \cdot (|S|+1)\);

  • for every \(m_i<m \le k+h, c_m \le K + (\beta -i) \cdot W \cdot (|S|+1)\).

Note that we obviously have \(\ell _\beta <m_0\).

Then we apply the same combinatorics as for the ascent part. There is some value \(0<d' <\alpha +W\) which appears at least \(\alpha +W\) times in potential cycles within the segment \(\rho _{\text {exp}}[k,k+h]\).

Transformation The algorithm then proceeds by removing \(d'\) segments that increase the cost by d within \(\rho _{\text {exp}}[\ell _0,\ell _\beta ]\) and d segments that decrease the cost by \(d'\) within \(\rho _{\text {exp}}[m_0,m_\beta ]\). This yields another path \(\rho _{\text {exp}}'\) and an obvious injection of \(\rho _{\text {exp}}'\) into \(\rho _{\text {exp}}\) which satisfies all the mentioned constraints. The sum of all energy levels along \(\rho _{\text {exp}}'\) is smaller than that along \(\rho _{\text {exp}}\), and any energy level along \(\rho _{\text {exp}}'\) is obtained from that along \(\rho _{\text {exp}}\) by decreasing it by at most \(d \cdot d'\), with \(0<d \cdot d' < (\alpha +W)^2\). By assumption on segment \(\rho _{\text {exp}}[k,k+h]\) and bound K, we get that the cost along \(\rho _{\text {exp}}'\) is always larger than or equal to g, c and \(c'\).

We iterate this transformation to get a uniform upper bound. We finally notice that the obtained upper bound \(K + W \cdot (|S|+1) \cdot \beta \) is itself bounded by \(\max \{c,c',g\}+N^2+N^3\), where \(N = W \cdot (|S|+2)\). This implies the expected result.\(\square \)

We build upon this lemma to define an appropriate transformation leading to the witness path and derive a sufficiently large upper bound \(U \in \mathbb {N}\) for the \({ AE}_{{ LU}}\) problem.

Lemma 16

The \({ AE}_{L}\) problem over a one-player game \(G = (S_{1}, S_{2} = \emptyset , E, w)\), with an initial state \(s_{\mathsf{init}}\) and a threshold \(t \in \mathbb {Q}\), is reducible to an \({ AE}_{{ LU}}\) problem over the same game G, for the same threshold t and upper bound \(U \,{:=}\,\, t + N^{2} + N^{3}\), with \(N = W \cdot (|S|+2)\).

Proof

We prove that we can bound the energy along a witness of the one-player \({ AE}_{L}\) problem. Let \(\sigma \) be a winning strategy of \(\mathcal {P}_{1} \) for objective \( Energy _ L (c_{\mathsf{init}}~\,{:=}\,~0) \cap AvgEnergy (t)\) and \(\pi = s_0 s_1 \dots s_n \dots \) be the corresponding outcome.

We build another strategy \(\widetilde{\sigma }\) with corresponding play \(\widetilde{\pi }\) such that for every \(n, 0 \le c_{\mathsf{init}}+ EL (\widetilde{\pi }(n)) \le c_{\mathsf{init}}+ t + N^2 + N^3\), where \(N = W \cdot (|S|+2)\) (W is the maximal absolute weight in G), and such that \( \overline{AE} (\widetilde{\pi }) \le \overline{AE} (\pi )\). We actually build the play \(\widetilde{\pi }\) directly, and infer strategy \(\widetilde{\sigma }\).

From \(\pi \), we build the expanded play \(\pi _{\text {exp}} = (s_0,c_0) (s_1,c_1) \dots (s_n,c_n) \dots \) such that \(c_i = EL (\pi (i))\) for every \(i \ge 0\). Since \(\pi \) is a witness satisfying the objective \( Energy _ L (c_{\mathsf{init}}) \cap AvgEnergy (t)\), it holds that \(c_i + c_{\mathsf{init}}\ge 0\) for every \(i \ge 0\). We now show that some pair \((s, c)\) is visited infinitely often along \(\pi _{\text {exp}}\). Toward a contradiction, assume that it is not the case. Then since energy levels are bounded from below along \(\pi \), this means that \(\liminf _{n\rightarrow \infty } c_n = \underline{TP} (\pi ) = +\infty \), and by Lemma 2, that \( \overline{AE} (\pi ) = +\infty \), which contradicts the play being winning for the \( AE \) objective with threshold \(t \in \mathbb {Q}\). Now select the smallest energy c and state s such that \((s, c)\) is visited infinitely often along \(\pi _{\text {exp}}\). Pick \(n_0\) such that (1) \((s_{n_0}, c_{n_0}) = (s,c)\), (2) \(\pi [\ge n_0] = s_{n_0}s_{n_0 + 1}\ldots {}\) only visits states that are visited infinitely often along \(\pi \), and (3) for every \((s',c')\) along \(\pi _{\text {exp}}[\ge n_0]\), it holds that \(c' \ge c\).

We can then write \(\pi _{\text {exp}}\) as \(\pi _{\text {exp}}[\le n_0] \cdot \mathcal {C}_1 \cdot \mathcal {C}_2 \dots \) where each \(\mathcal {C}_i\) ends at configuration \((s, c)\) (hence \(\mathcal {C}_i\) forms a cycle), and each configuration \((s',c')\) along some \(\mathcal {C}_i\) satisfies \(c' \ge c\). We write \(\gamma _i\) for the projection of \(\mathcal {C}_i\) on states (without energy level); it forms a cycle as well. We obviously have

$$\begin{aligned} \overline{AE} (\pi ) = EL (\pi (n_0)) + \overline{AE} (\pi [> n_0]) = c + \overline{AE} (\pi [> n_0]) \end{aligned}$$

by Lemma 4, and since \( \overline{AE} (\pi ) \le t\), there must be some cycle \(\mathcal {C}_i\) such that \( AE (\gamma _i) \le t - c\). We write \(\gamma \) for such a \(\gamma _i\), and we define \(\varpi = \pi (n_0) \cdot \gamma ^\omega \): it is a lasso-shaped play which also satisfies the objective \( Energy _ L (c_{\mathsf{init}}) \cap AvgEnergy (t)\).

We will now modify the play \(\varpi \), so that the energy does not grow too much along it. We write \(\varpi _{\text {exp}}\) for the expanded version of \(\varpi \): it is of the form

$$\begin{aligned}\varpi _{\text {exp}}[\le n_0] \cdot \bigl (\varpi _{\text {exp}}[n_0+1,n_0+p]\bigr )^\omega ,\end{aligned}$$

where \(\varpi _{\text {exp}}[n_0+1,n_0+p]\) projects onto \(\gamma \) when the energy information is removed (note that the last configurations of \(\varpi _{\text {exp}}[\le n_0]\) and of \(\varpi _{\text {exp}}[n_0+1,n_0+p]\) are \((s, c)\)). We will do two things: (i) first we will work on the cycle \(\gamma \); and (ii) then we will work on the prefix \(\varpi [\le n_0]\), to build a witness with a fixed upper bound on the energy. For the rest of the proof, we assume that \(\varpi _{\text {exp}} = (s_0,c_0) (s_1,c_1) \dots \) so that \((s_{n},c_{n}) = (s,c)\) for every \(n = n_0 + b \cdot p\) for some integer b.

First consider point (i). Notice that \(c \le t\), otherwise the average-energy along \(\varpi \) could not be at most t (remember that the energy level along the expanded version of \(\gamma \) starting at (s, c) is always larger than or equal to c by construction). We pick the first maximal subpath \(\varpi _{\text {exp}}[k,k+h]\) of \(\varpi _{\text {exp}}\) with \([k,k+h] \subseteq (n_0,n_0+p)\) such that \(c_{k+\ell } >t\) for every \(0 \le \ell \le h\). By maximality of \(\varpi _{\text {exp}}[k,k+h]\), it is the case that \(c_{k-1} \le t\) and \(c_{k+h+1} \le t\). We infer that \(t<c_{k} \le t+W\) and \(t<c_{k+h} \le t+W\), where W is the maximal absolute weight in the game G. We apply Lemma 15 to the path \(\varpi _{\text {exp}}[k,k+h]\) with \(g = t\), and obtain an expanded path \(\varpi _{\text {exp}}^{(k)}\) which is shorter than \(\varpi _{\text {exp}}[k,k+h]\) and such that:

  • at all positions of \(\varpi _{\text {exp}}^{(k)}\), the energy is in the interval \([t,t+N^2+N^3]\), where \(N = W \cdot (|S|+2)\);

  • there is an injective increasing mapping \(\iota :[0,|\varpi _{\text {exp}}^{(k)}|] \rightarrow [k,k+h]\) such that for every index \(1 \le i \le |\varpi _{\text {exp}}^{(k)}|\), the state of \(\varpi _{\text {exp}}^{(k)}[=i]\) coincides with that of \(\varpi _{\text {exp}}[=\iota (i)]\) and the energy at position i of \(\varpi _{\text {exp}}^{(k)}\) is smaller than or equal to \(c_{\iota (i)}\).

In particular, we have a new witness for the objective \( Energy _ L (c_{\mathsf{init}}) \cap AvgEnergy (t)\), which is the play \(\varpi [< n_0] \cdot \bigl (\varpi [n_0,k-1] \cdot \varpi ^{(k)} \cdot \varpi [k+h+1,n_0+|\gamma |-1]\bigr )^\omega \), where \(\varpi ^{(k)}\) is the projection of \(\varpi _{\text {exp}}^{(k)}\) over the states of the game G. We iterate this transformation over all relevant segments of \(\gamma \) (this will happen only a finite number of times), and we end up with a new lasso-play \(\varpi ' = \varpi [\le n_0] \cdot (\gamma ')^\omega \) such that:

  • \(\varpi '\) satisfies the objective \( Energy _ L (c_{\mathsf{init}}) \cap AvgEnergy (t)\);

  • for every \(1\le \ell \le |\gamma '|\), \(-c_{\mathsf{init}}\le EL (\varpi '(n_0+\ell )) \le t+N^2+N^3\).

Now, consider point (ii). It remains to work on the prefix \(\varpi [\le n_0]\) (which is still a prefix of \(\varpi '\)). We apply Lemma 15 to the prefix \(\varpi [\le n_0]\) with \(g=0\), and we get an appropriately bounded witness.

Summing up, our construction proves that if there exists a winning play for \( Energy _ L (c_{\mathsf{init}} \,{:=}\,0) \cap AvgEnergy (t)\) in the one-player game G, then there exists one for \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\, 0) \cap AvgEnergy (t)\), with \(U \,{:=}\,t + N^2 + N^3\). Since the converse implication is obvious (as the second objective is strictly stronger), this concludes the proof of the reduction to an \({ AE}_{{ LU}}\) game.\(\square \)

Complexity Plugging this bound U into the PSPACE algorithm for one-player \({ AE}_{{ LU}}\) games (Theorem 13) yields PSPACE-membership for one-player \({ AE}_{L}\) games as well. In terms of time complexity, we saw that this problem can thus be solved in pseudo-polynomial time. We prove that no truly-polynomial-time algorithm can be obtained unless \(\mathsf{P}= \mathsf{NP}\), as the one-player \({ AE}_{L}\) problem is \(\mathsf{NP}\)-hard. We show this by reduction from the subset-sum problem [20]: given a finite set of naturals \(A = \{a_{1}, \ldots {}, a_{n}\}\) and a target natural v, decide whether there exists a subset \(B \subseteq A\) such that \(\sum _{a_{i} \in B} a_{i} = v\). The reduction is sketched in Fig. 6: a play corresponds to a choice of subset. In order to keep a non-negative energy level, \(\mathcal {P}_{1} \) has to pick a subset whose sum is at least v, but in order to satisfy the \( AE \) threshold, this sum must be at most v: hence \(\mathcal {P}_{1} \) must be able to pick a subset whose sum is exactly the target v.

Fig. 6 Reduction from the subset-sum problem for target \(v \in \mathbb {N}\) to a one-player \({ AE}_{L}\) problem for average-energy threshold \(t \,{:=}\,v\)

Theorem 17

The \({ AE}_{L}\) problem is in \(\mathsf{PSPACE}\) and at least \(\mathsf{NP}\)-hard for one-player games.

Proof

First, consider the claim of \(\mathsf{PSPACE}\)-membership. Let \(G= (S_{1}, S_{2} = \emptyset , E, w)\) be a game with initial state \(s_{\mathsf{init}}\). Consider the \({ AE}_{L}\) problem for a given average-energy threshold \(t \in \mathbb {Q}\). By Lemma 16, this problem is reducible to the \({ AE}_{{ LU}}\) problem with upper bound \(U \,{:=}\,t + N^{2} + N^{3}\), with \(N = W \cdot (|S|+2)\). Hence, U is of order \(\mathcal {O}(t + W^3 \cdot \vert S\vert ^{3})\), and its encoding is polynomial in the encoding of the original \({ AE}_{L}\) problem (including thresholds and weights, not only in the number of states of the original game!). Following the complexity analysis presented in Theorem 13, we thus conclude that the one-player \({ AE}_{L}\) problem is indeed in \(\mathsf{PSPACE}\). In terms of time, by using the \( MP \) reduction and the pseudo-polynomial algorithm, we have an algorithm for the one-player \({ AE}_{L}\) problem that takes time of order

$$\begin{aligned} \mathcal {O}\left( \big ( (U + 1) \cdot \vert S\vert + 1\big ) ^{3} \cdot \max \{U,\, \lceil t\rceil + 1\}\right) = \mathcal {O}\left( \left( t + W^3 \cdot |S|^{3}\right) ^{4} \cdot |S|^{3}\right) , \end{aligned}$$

which is still pseudo-polynomial in the size of the original \({ AE}_{L}\) problem (i.e., polynomial in the number of states and in the values of the largest absolute weight and of the average-energy threshold).
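For the reader's convenience, the right-hand side is obtained by the following substitution (a back-of-the-envelope sketch; constants are absorbed in the \(\mathcal {O}\)-notation):

$$\begin{aligned} U = t + N^{2} + N^{3} = \mathcal {O}\left( t + W^{3} \cdot |S|^{3}\right) , \qquad \big ( (U + 1) \cdot \vert S\vert + 1\big )^{3} = \mathcal {O}\left( \left( t + W^3 \cdot |S|^{3}\right) ^{3} \cdot |S|^{3}\right) , \end{aligned}$$

and \(\max \{U,\, \lceil t\rceil + 1\} = \mathcal {O}\left( t + W^3 \cdot |S|^{3}\right) \), so the product is indeed \(\mathcal {O}\left( \left( t + W^3 \cdot |S|^{3}\right) ^{4} \cdot |S|^{3}\right) \).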

Second, we prove that the one-player \({ AE}_{L}\) problem is \(\mathsf{NP}\)-hard. Consider the subset-sum problem for a set \(A = \{a_1, \ldots {}, a_{n}\}\) with \(a_{i} \in \mathbb {N}\) for all \(i \in \{1, \ldots {}, n\}\), and a target \(v \in \mathbb {N}\). Deciding whether there exists a subset \(B \subseteq A\) such that \(\sum _{a_{i} \in B} a_{i} = v\) is well-known to be \(\mathsf{NP}\)-complete [20]. We reduce this problem to an \({ AE}_{L}\) problem over the game G depicted in Fig. 6. Observe that this game has a number of states polynomial in the size of A, and that its largest absolute weight is the maximum between the largest element of A and the target v. There is a natural bijection between choices of subsets of A and plays in G. Let us fix threshold \(t \,{:=}\,v\) for the average-energy. Recall that Lemma 4 implies that the average-energy of any play is exactly its energy level at the first visit of end (because afterwards the zero self-loop is repeated forever). Hence, we have that

  1. a play \(\pi \) in G is winning for \( Energy _ L (c_{\mathsf{init}}\,{:=}\,0)\) if and only if the corresponding subset B is such that \(\sum _{a_{i} \in B} a_{i} \ge v\);

  2. a play \(\pi \) in G is winning for \( AvgEnergy (t \,{:=}\,v)\) if and only if the corresponding subset B is such that \(\sum _{a_{i} \in B} a_{i} \le v\).

Therefore, \(\mathcal {P}_{1} \) has a winning strategy for the \({ AE}_{L}\) objective \( Energy _ L (c_{\mathsf{init}}\,{:=}\,0) \cap AvgEnergy (t\,{:=}\,v)\) in G if and only if there exists a subset B for which the sum of elements is exactly equal to the target v.
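To make this equivalence concrete, here is a purely illustrative brute-force check (exponential in the size of A, and not part of the reduction itself): it enumerates subsets and tests the two conditions above, which hold simultaneously exactly when the subset-sum instance is positive.

```python
from itertools import combinations

def p1_wins_ael_subsetsum(A, v):
    """Illustrative brute force: does some play of the game in Fig. 6 win
    Energy_L(0) and AvgEnergy(t := v)?  Via the bijection of the proof, this
    is the case iff some subset B of A satisfies both items 1 and 2."""
    for r in range(len(A) + 1):
        for B in combinations(A, r):
            energy_ok = sum(B) >= v    # item 1: lower bound on energy holds
            average_ok = sum(B) <= v   # item 2: average-energy threshold holds
            if energy_ok and average_ok:
                return True            # i.e. sum(B) == v
    return False

# Example: with A = {1, 3, 5, 7}, target 8 is reachable (3 + 5), target 2 is not.
assert p1_wins_ael_subsetsum({1, 3, 5, 7}, 8) and not p1_wins_ael_subsetsum({1, 3, 5, 7}, 2)
```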

This proves the reduction from the subset-sum problem and the \(\mathsf{NP}\)-hardness result. Observe two things. First, the hardness proof relies on set elements and a target value whose magnitudes are not polynomially bounded in the size of the input set A. Indeed, the subset-sum problem is solvable by a pseudo-polynomial algorithm, hence it is in \(\mathsf{P}\) when all values are polynomially bounded. Second, our reduction also holds for the \( \underline{AE} \) variant of the average-energy.\(\square \)
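For reference, the pseudo-polynomial algorithm for subset-sum alluded to above is the classical dynamic program over reachable sums (a standard textbook sketch, independent of our construction):

```python
def subset_sum(A, v):
    """Classical pseudo-polynomial dynamic program: O(|A| * v) time, O(v) space."""
    reachable = {0}   # sums realizable with a subset of the elements processed so far
    for a in A:
        reachable |= {s + a for s in reachable if s + a <= v}
    return v in reachable

assert subset_sum({1, 3, 5, 7}, 8) and not subset_sum({1, 3, 5, 7}, 2)
```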

Memory requirements Recall that for \(\mathcal {P}_{2} \), the situation is simpler and memoryless strategies suffice. By the reduction to \({ AE}_{{ LU}}\), we know that pseudo-polynomial memory suffices for \(\mathcal {P}_{1} \). This bound is tight, as witnessed by the family of games already presented in Fig. 5a. To ensure the lower bound on energy, \(\mathcal {P}_{1} \) has to play edge \((s, s')\) at least U times before taking the (s, s) self-loop. But to minimize the average-energy, edge \((s, s')\) should never be played more than necessary. The optimal strategy is the same as for the \({ AE}_{{ LU}}\) problem: playing \((s, s')\) exactly U times, then (s, s) once, and repeating forever. As shown in Theorem 14, this strategy requires pseudo-polynomial memory.
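As an illustration, the above optimal behavior can be encoded as a counter-based finite-memory strategy. The sketch below is ours (state and edge names follow the description of Fig. 5a; the weights of that game are not reproduced here, and we only model the choice made in state s): it uses U + 1 memory states, i.e., pseudo-polynomially many.

```python
class CounterStrategy:
    """Finite-memory strategy for the family of games of Fig. 5a:
    play (s, s') exactly U times, then the self-loop (s, s) once, and repeat."""

    def __init__(self, U):
        self.U = U
        self.count = 0   # number of (s, s') edges taken since the last self-loop

    def next_edge(self, state):
        assert state == "s", "only the choice in state s is modeled in this sketch"
        if self.count < self.U:
            self.count += 1
            return ("s", "s'")   # the edge that must be taken U times per round
        self.count = 0
        return ("s", "s")        # the self-loop, taken once, before starting over

# Usage sketch: sigma = CounterStrategy(U=3); sigma.next_edge("s") -> ("s", "s'"), ...
```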

Theorem 18

Pseudo-polynomial-memory strategies are both sufficient and necessary to win for \(\mathcal {P}_{1} \) in one-player \({ AE}_{L}\) games. Memoryless strategies suffice for \(\mathcal {P}_{2} \) in such games.

5.2 Two-player games

For the two-player \({ AE}_{L}\) problem, we only provide partial answers, as open questions remain. We first discuss decidability: we present an incremental algorithm that is correct but incomplete (Lemma 19), and we outline a potential approach to obtain completeness, hence decidability. Then, we prove that the two-player \({ AE}_{L}\) problem is \(\mathsf{EXPTIME}\)-hard (Lemma 20). Finally, we show that, in contrast to the one-player case, \(\mathcal {P}_{2} \) also requires memory in two-player \({ AE}_{L}\) games (Lemma 21).

Decidability Assume that there exists some \(U \in \mathbb {N}\) such that \(\mathcal {P}_{1} \) has a winning strategy for the \({ AE}_{{ LU}}\) problem with upper bound U and average-energy threshold t. Then, this strategy is trivially winning for the \({ AE}_{L}\) problem as well. This observation leads to an incremental algorithm that is correct (no false positives) but incomplete (it is not guaranteed to stop).

Lemma 19

There is an algorithm that takes as input an \({ AE}_{L}\) problem and iteratively solves corresponding \({ AE}_{{ LU}}\) problems for incremental values of \(U \in \mathbb {N}\). If a winning strategy is found for some \(U \in \mathbb {N}\), then it is also winning for the original \({ AE}_{L}\) problem. If no strategy is found up to value \(U \in \mathbb {N}\), then no strategy of \(\mathcal {P}_{1} \) can simultaneously win the \({ AE}_{L}\) problem and prevent the energy from exceeding U at all times.
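A minimal sketch of this scheme, assuming a hypothetical subroutine solve_AE_LU(game, U, t) that solves the corresponding \({ AE}_{{ LU}}\) problem and returns a winning strategy or None:

```python
def incremental_ael(game, t, max_bound=None):
    """Correct-but-incomplete procedure for the AE_L problem (Lemma 19):
    solve AE_LU problems for U = 0, 1, 2, ... and stop at the first success."""
    U = 0
    while max_bound is None or U <= max_bound:
        strategy = solve_AE_LU(game, U, t)   # hypothetical AE_LU solver
        if strategy is not None:
            return strategy   # winning for AE_LU, hence also for AE_L
        U += 1
    # No strategy of P1 wins AE_L while keeping the energy at most max_bound.
    return None
```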

While an incomplete algorithm clearly seems limiting from a theoretical standpoint, it is worth noting that such approaches are common and often necessary in practice, even for problems where a complete algorithm is known to exist. For example, the existence of an initial energy level sufficient to win in multi-dimensional energy games can be decided [15], but practical implementations resort to an incremental scheme that is incomplete in practice, because the theoretical bound granting completeness is too large to be handled efficiently by synthesis tools [4]. In our case, we have already seen that if such a bound exists for the two-player \({ AE}_{L}\) problem, it needs to be at least exponential in the encoding of the problem (cf. one-player \({ AE}_{L}\) games). Hence it seems likely that a prohibitive bound would be necessary, rendering the algorithm of Lemma 19 more appealing in practice.

Nevertheless, we conjecture that the \({ AE}_{L}\) problem is decidable for two-player games and that, similarly to the one-player case, an upper bound on the energy can be obtained. Unfortunately, this claim is much more challenging to prove for two-player games. Clearly, the approach of Lemma 16 has to be generalized: while in one-player games we could pick a witness winning play and transform it, we now have to deal with tree unfoldings—describing sets of plays—because of the uncontrollable choices made by \(\mathcal {P}_{2} \).

A potentially promising approach is to define a notion close to the self-covering trees used in [15] for energy games. Roughly, take any winning strategy of \(\mathcal {P}_{1} \) in a two-player \({ AE}_{L}\) game. Without further assumption, this strategy could be infinite-memory. It can be represented by its corresponding infinite tree unfolding where in nodes of \(\mathcal {P}_{1} \), a unique child is given by the strategy, and in nodes of \(\mathcal {P}_{2} \), all possible successors yield different branches. Every rooted branch of this tree is infinite and describes a winning play. Then, we would like to achieve the following steps.

  1. Prove that all branches of this unfolding can be cut in such a way that the resulting finite tree describes a finite-memory strategy that is still winning for the \({ AE}_{L}\) objective.

  2. Reduce the height of this finite tree by compressing parts of the branches: deleting embedded zero cycles seems to be a good candidate for the transformation to apply.

  3. Derive an upper bound on the height of the compressed tree and, consequently, on the maximal energy level reached along any play consistent with the corresponding strategy.

  4. Use this upper bound to reduce the \({ AE}_{L}\) problem to an \({ AE}_{{ LU}}\) problem.

Unfortunately, some technical challenges appear when trying to implement this approach, mainly for items 1 and 3. Intuitively, the additional difficulty (compared to the approach developed in [15] and similar works) arises from the fact that describing what a good cycle pattern is for the \({ AE}_{L}\) objective is much more intricate than for a simple \({ EG}_L\) objective (in which case we simply look for zero cycles). This makes the precise definition of an appropriate transformation of branches, and the resulting analysis of the tree height, more tedious to achieve.

We also mention that the \({ AE}_{L}\) problem could be reduced, following a construction similar to the one given in Sect. 4.1, to a mean-payoff threshold problem over an infinite arena, where states of the expanded graph are stratified according to their energy level, ranging from zero to infinity, and where weights would also take values in \(\mathbb {N} \cup \{\infty \}\) (as they reflect the possible energy levels). To the best of our knowledge, it is not known whether mean-payoff games over such particular structures are decidable. If they are, an algorithm would have to fully exploit the peculiar form of those arenas, since it is for example known that more general models such as pushdown games are undecidable for the mean-payoff [13].

Finally, one could envision filling the gap between one-player and two-player \({ AE}_{L}\) games by using a general result similar to [23, Cor. 7]. Recall that we used it to derive memoryless determinacy in the two-player case from memoryless determinacy of both one-player versions (\(S_{1} = \emptyset \) and \(S_{2} = \emptyset \)). However, here \(\mathcal {P}_{1} \) already requires pseudo-polynomial memory in one-player games. Therefore, it would be necessary to extend the result of Gimbert and Zielonka to finite-memory strategies: that is, to show that if we have a bound on memory valid in both one-player versions of a game, then this bound, or a derived one, is also valid in the two-player version. This is not known to be the case in general, and establishing it for a sufficiently general class of games seems challenging.

Complexity lower bound We now prove that solving the two-player \({ AE}_{L}\) problem requires at least exponential time. Our proof is by reduction from countdown games. A countdown game \(\mathcal {C}\) is a weighted graph \((\mathcal {V}, \mathcal {E})\), where \(\mathcal {V}\) is the finite set of states, and \(\mathcal {E} \subseteq \mathcal {V} \times \mathbb {N} {\setminus } \{0\} \times \mathcal {V}\) is the edge relation. Configurations are of the form \((v, c)\), with \(v \in \mathcal {V}\) and \(c \in \mathbb {N}\). The game starts in an initial configuration \((v_{\text {init}}, c_0)\), and transitions from a configuration (v, c) are performed as follows. First, \(\mathcal {P}_{1} \) chooses a duration \(d\), \(0 < d \le c\), such that there exists \(e = (v, d, v') \in \mathcal {E}\) for some \(v' \in \mathcal {V}\). Second, \(\mathcal {P}_{2} \) chooses a state \(v' \in \mathcal {V}\) such that \(e = (v, d, v') \in \mathcal {E}\). Then the game advances to \((v', c-d)\). Terminal configurations are reached whenever no legitimate move is available. If such a configuration is of the form \((v, 0)\), then \(\mathcal {P}_{1} \) wins the play; otherwise \(\mathcal {P}_{2} \) wins. Deciding the winner given an initial configuration \((v_{\text {init}}, c_0)\) is \(\mathsf{EXPTIME}\)-complete [27].
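For concreteness, one round of a countdown game can be simulated as follows (a sketch with names of our choosing; choose_duration and choose_successor stand for the two players' decisions):

```python
def countdown_round(E, v, c, choose_duration, choose_successor):
    """One round of a countdown game from configuration (v, c), where E is the
    edge relation given as a set of triples (v, d, v') with d > 0.
    Returns the next configuration, or None if (v, c) is terminal
    (in which case P1 wins the play iff c == 0)."""
    durations = {d for (u, d, _) in E if u == v and 0 < d <= c}
    if not durations:
        return None                                   # terminal configuration
    d = choose_duration(v, c, durations)              # P1 picks a duration
    successors = {vp for (u, dd, vp) in E if u == v and dd == d}
    vp = choose_successor(v, d, successors)           # P2 picks a successor
    return (vp, c - d)
```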

Our reduction is depicted in Fig. 7. The \( EL \) is initialized to \(c_0\) and then decreases along any play. Consider the \({ AE}_{L}\) objective for \( AE \) threshold \(t \,{:=}\,0\). To ensure that the energy always stays non-negative, \(\mathcal {P}_{1} \) has to switch to stop while the \( EL \) is still non-negative. In addition, to ensure an \( AE \) of at most \(t = 0\), \(\mathcal {P}_{1} \) has to obtain an \( EL \) of at most zero before switching to stop (as the \( AE \) will be equal to this \( EL \) thanks to Lemma 4 and the zero self-loop on stop). Hence, \(\mathcal {P}_{1} \) wins the \({ AE}_{L}\) objective only if he can ensure a total sum of chosen durations that is exactly equal to \(c_{0}\), i.e., only if he can reach a winning terminal configuration of the countdown game. The converse also holds.

Fig. 7 Reduction from a countdown game \(\mathcal {C} = (\mathcal {V}, \mathcal {E})\) with initial configuration \((v_{\text {init}}, c_{0})\) to a two-player \({ AE}_{L}\) problem for average-energy threshold \(t \,{:=}\,0\)

Lemma 20

The \({ AE}_{L}\) problem is \(\mathsf{EXPTIME}\)-hard for two-player games.

Proof

Given a countdown game \(\mathcal {C} = (\mathcal {V}, \mathcal {E})\) and an initial configuration \((v_{\text {init}}, c_0)\), we build a game \(G = (S_1, S_2, E, w)\) with initial state \(s_{\text {init}}\) such that \(\mathcal {P}_{1} \) has a winning strategy in G for the \({ AE}_{L}\) objective for threshold \(t \,{:=}\,0\) if and only if he has a winning strategy in \(\mathcal {C}\) to reach a terminal configuration with counter value zero. The construction is depicted in Fig. 7. Formally, the game G is built as follows.

  • \(S_1 = \mathcal {V} \cup \{\textsf {start}, \textsf {stop}\}\).

  • \(S_2 = \left\{ (v, d) \in \mathcal {V} \times \mathbb {N} {\setminus } \{0\} \mid \exists \, v' \in \mathcal {V},\, (v, d, v') \in \mathcal {E}\right\} \).

  • \(s_{\text {init}} = \textsf {start}\).

  • For each \((v, d, v') \in \mathcal {E}\), we have that \((v, (v, d)) \in E\) with \(w(v, (v, d)) = -d\) and \(((v, d), v') \in E\) with \(w((v, d), v') = 0\).

  • Additionally, \((\textsf {start}, v_{\text {init}}) \in E\) with \(w(\textsf {start}, v_{\text {init}}) = c_{0}\), \((\textsf {stop}, \textsf {stop}) \in E\) with \(w(\textsf {stop}, \textsf {stop}) = 0\), and, for all \(v \in \mathcal {V}\), \((v, \textsf {stop}) \in E\) with \(w(v, \textsf {stop}) = 0\).
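The construction just listed translates directly into the following sketch (plain Python data structures of our choosing; it returns the two sets of states and the weighted edge map):

```python
def countdown_to_ael(V, E, v_init, c0):
    """Build the game G of Fig. 7 from a countdown game (V, E) with initial
    configuration (v_init, c0), following the four items above."""
    S1 = set(V) | {"start", "stop"}
    S2 = {(v, d) for (v, d, _) in E}            # intermediate states of P2
    w = {}
    for (v, d, vp) in E:
        w[(v, (v, d))] = -d                     # P1 commits to duration d
        w[((v, d), vp)] = 0                     # P2 resolves the successor
    w[("start", v_init)] = c0                   # initialize the energy level to c0
    w[("stop", "stop")] = 0                     # zero self-loop on stop
    for v in V:
        w[(v, "stop")] = 0                      # P1 may switch to stop from any v
    return S1, S2, w
```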

First, consider the left-to-right direction of the claim. Assume \(\mathcal {P}_{1} \) has a winning strategy for the \({ AE}_{L}\) objective in G. As noted before, such a strategy necessarily reaches energy level zero and then switches to stop immediately. Hence, applying this strategy in the countdown game ensures that the sum of durations is exactly equal to \(c_0\) (recall that we start our \({ AE}_{L}\) game by initializing the energy to \(c_0\) and then decrease it at every step by the duration chosen by \(\mathcal {P}_{1} \)). Thus, this strategy is winning in the countdown game \(\mathcal {C}\).

Second, consider the right-to-left direction. Assume that \(\mathcal {P}_{1} \) has a winning strategy in the countdown game \(\mathcal {C}\). Playing this strategy in G ensures reaching a state \(v \in S_1\) with energy level exactly equal to zero. Thus a winning strategy for the \({ AE}_{L}\) objective is to play the countdown strategy up to this point and then to immediately take the edge \((v, \textsf {stop})\). Indeed, any consistent outcome will satisfy the lower bound on energy (as the energy never goes below zero), and it will have an average-energy equal to \(t = 0\) (because the energy level when reaching stop is zero).

This shows both directions of the claim and concludes our proof. Observe that this reduction is also true if we consider the \( \underline{AE} \) variant of the average-energy.\(\square \)

Memory requirements We close our study of two-player \({ AE}_{L}\) games by discussing the memory needs. First note that we cannot provide upper bounds: if we had such bounds, we could derive a bound on the energy along any consistent play and reduce the \({ AE}_{L}\) problem to an \({ AE}_{{ LU}}\) one as discussed before, hence proving its decidability. Second, we already know by Theorem 18 that pseudo-polynomial memory is necessary for \(\mathcal {P}_{1} \). Finally, we present a simple game (Fig. 8) where \(\mathcal {P}_{2} \) needs to use memory in order to prevent \(\mathcal {P}_{1} \) from winning.

Fig. 8 Simple two-player \({ AE}_{L}\) game witnessing the need for memory even for \(\mathcal {P}_{2} \)

Lemma 21

Pseudo-polynomial-memory strategies are necessary to win for \(\mathcal {P}_{1} \) in two-player \({ AE}_{L}\) games. Memory is also required for \(\mathcal {P}_{2} \) in such games.

Proof

We only have to prove that \(\mathcal {P}_{2} \) needs memory in the game of Fig. 8. Consider the \({ AE}_{L}\) objective for the average-energy threshold \(t \,{:=}\,1\) on this game. Assume that \(\mathcal {P}_{2} \) is restricted to memoryless strategies. Then, there are only two possible strategies for \(\mathcal {P}_{2} \). If \(\mathcal {P}_{2} \) always takes the self-loop \((s_{2}, s_{2})\), then the only consistent play is \(s_1(s_2)^\omega \): it has \( AE \) equal to 1 and satisfies the lower bound constraint on energy, thus \(\mathcal {P}_{1} \) wins. If \(\mathcal {P}_{2} \) always takes \((s_2, s_3)\), then \(\mathcal {P}_{1} \) can win by producing the following play: \(s_{1}s_2(s_3s_2s_3)^{\omega }\). It also has \( AE \) equal to 1 and satisfies the energy constraint. Hence \(\mathcal {P}_{2} \) cannot win this game with a memoryless strategy. Nonetheless, he has a winning strategy that uses memory: the strategy that plays \((s_{2}, s_{3})\) once and then chooses the self-loop \((s_2, s_2)\) forever. When \(\mathcal {P}_{2} \) uses this strategy, \(\mathcal {P}_{1} \) has to pick \((s_3, s_2)\) at the first visit of \(s_3\), otherwise he loses because the energy goes below zero. But if \(\mathcal {P}_{1} \) picks this edge, the unique outcome becomes \(s_1s_2s_3(s_2)^{\omega }\), whose average-energy is \(2 > t\), hence also losing for \(\mathcal {P}_{1} \). Thus, the defined strategy is winning for \(\mathcal {P}_{2} \).\(\square \)

6 Conclusion

We presented a thorough study of the average-energy payoff. We showed that average-energy games belong to the same intriguing complexity class as mean-payoff, total-payoff and energy games and that they are similarly memoryless determined. We then solved average-energy games with lower- and upper-bounded energy: such a conjunction is motivated by previous case studies in the literature [10]. Lastly, we provided preliminary results for the case of average-energy with a lower bound but no upper bound on the energy. Following the publication of [7], Larsen et al. addressed a different problem in [32]: they proved that deciding if there exists a threshold \(t \in \mathbb {Q}\) such that \(\mathcal {P}_{1} \) can win a two-player game for objective \( Energy _ L (c_{\mathsf{init}}\,{:=}\,0)\,\cap \, AvgEnergy (t)\) can be done in doubly-exponential time. This is indeed equivalent to deciding if there exists an upper bound \(U \in \mathbb {N}\) such that \(\mathcal {P}_{1} \) can win for the objective \( Energy _ LU (U, c_{\mathsf{init}}\,{:=}\,0)\), which is known to be in 2EXPTIME [25]. Unfortunately, this approach does not help in solving Problem 3, where the threshold \(t \in \mathbb {Q}\) for the average-energy is part of the input: solving two-player \({ AE}_{L}\) games is still an open question.

We believe that the average-energy objective and its variations model relevant aspects of systems in practical applications as hinted by the aforementioned case study. Hence, we would like to extend our knowledge of this objective to more general models such as stochastic games, or games with multi-dimensional weights. Of course, the open questions regarding the \({ AE}_{L}\) objective are intriguing. Finally, we would like to implement our techniques in synthesis tools and assess their applicability through proper case studies.