1 Introduction

Since it was introduced in Strotz (1956), and further developed in Phelps and Pollak (1968) and Peleg and Yaari (1973), the problem of dynamic consistency has played an important role in many fields of economics. In particular, this problem has appeared in recent papers on such diverse topics as the theory of optimal consumption/savings, the role of liquidity constraints in dynamic asset markets, the behavioral foundations of economic choice, the role of commitment devices in dynamic models of self-control, the design of dynamic time-consistent environmental policies, models of social discounting and cost-benefit analysis, and the welfare implications of public policy in dynamic models.Footnote 1 The classical toolkit for studying these problems has emphasized the language of recursive decision theoryFootnote 2, first introduced in Strotz (1956). As observed by many subsequent researchers [e.g., Peleg and Yaari (1973) and Bernheim and Ray (1986)], a key problem with this recursive decision theory is that optimal dynamically consistent (Markov) plans need not exist, let alone be simple to characterize or compute. One key reason for this failure of existence lies in the seemingly inherent discontinuities in intertemporal preferences that arise naturally in the dynamic structure of these problems when recursive decision theory approaches are attempted. The source of this lack of continuity is the lack of commitment between the current “version” of the decision maker and all her continuation “selves”.Footnote 3 Due to this discontinuity, the optimal level of “commitment” may be nonexistent, and the dynamic maximization problem can turn out to be poorly defined [see, for example, Caplin and Leahy (2006) for an excellent discussion of this fact].Footnote 4

As a way of circumventing these problems, Peleg and Yaari (1973) proposed a dynamic game interpretation of the time-consistency problem.Footnote 5 In this view, one envisions the decision maker playing a dynamic game between her current self and each of her future “selves”, with the appropriate solution concept in the game being a subgame-perfect Nash equilibrium (SPNE, henceforth). A SPNE of the appropriate game need not be an optimal time-consistent policy, however. This is due, in part, to the decision-theoretic approach itself, in which future ties are broken in favor of the current self, a property that need not hold for a SPNE of a dynamic game. Additionally, the set of SPNE may be very large and, most importantly, need not possess an element with the greatest value. Hence, an optimal SPNE (i.e., a SPNE that corresponds to some optimal time-consistent policy) may simply not exist. Moreover, even if the question of existence of SPNE is resolved, the existence of Stationary Markov Nash EquilibriaFootnote 6 (henceforth, SMNE) is still not guaranteed [see Bernheim and Ray (1986) and Leininger (1986)].

In this paper, we develop a new approach in the tradition of the classical Strotz (1956) recursive method for studying equilibrium in the Phelps and Pollak (1968) game-theoretic representation of the problem, with an emphasis on developing constructive methods for characterizing SMNE, as well as methods for computing them. The underlying game-theoretic structure is that of a stochastic game. In our setting, we seek conditions under which there exists a simple class of stable iterative algorithms that can (i) characterize the existence of SMNE from a theoretical perspective; (ii) provide explicit and accurate algorithms for computing particular elements of this set; (iii) characterize the optimal time-consistent policy among the set of SMNE solutions; and (iv) remain stable (in some well-defined sense) under perturbation of deep parameters. We provide conditions under which affirmative answers to all of these questions can be given.

More specifically, under standard assumptions on preferences and a certain geometric condition on the transition probability, we show the existence of the greatest and the least SMNE, and provide conditions under which they are (Lipschitz) continuous or monotone. Further, and equally important, we characterize the set of all values corresponding to time-consistent policies, showing that the set of SMNE is a countably chain complete posetFootnote 7 containing the least and the greatest elements. This characterization of the set of SMNE allows us to verify the existence of, and compute, the greatest value function associated with SMNE, and hence compute the optimal time-consistent policy. This fact, along with our constructive methods, allows us to link the game-theoretic analysis of our problem directly with that predicted by recursive decision theory.

We next turn to the question of computation of SMNE, as well as addressing the question of equilibrium comparative statics relative to ordered perturbations of the deep parameters of the game. This latter set of questions is also critical, as it allows us to develop a theory of computable equilibrium comparative statics. That is, we are able to construct a simple approximation scheme that is able to compute monotone comparative statics relative to extremal time-consistent SMNE policies with respect to the model parameters. These comparative statics and computation/approximation results are important for applied research in the field.Footnote 8

From a technical perspective, our methods complement the ideas found in the important papers of Bernheim and Ray (1986) and Harris and Laibson (2001), where the authors add noise with invariant support to the problem, which in turn allows them to develop conditions that guarantee the existence of a time-consistent policy in spaces of functions of locally bounded variation, or of Lipschitzian functions for a sufficiently small hyperbolic discount factor.Footnote 9 What is critical in understanding the differences between the approach in Harris and Laibson (2001) and the one developed in the present paper is that our methods do not rely on so-called “generalized Euler equation” (GEE) methods.Footnote 10 Rather, our methods emphasize value iteration, and hence are more in the spirit of “promised utility methods”, but defined in spaces of functions [as opposed to the spaces of correspondences used in the promised utility literature, as in the APS-type approaches found in Bernheim et al. (1999) and Chade et al. (2008)]. Equally important, our methods are therefore able to link the underlying stochastic game studied in Harris and Laibson (2001) with the recursive (value function) methods suggested by Strotz (1956) [and further developed by Caplin and Leahy (2006)], and so provide, in the context of our stochastic framework, a unification of the decision-theoretic and game-theoretic approaches taken in the existing literature.

The rest of the paper is organized as follows. We start in Sect. 2 by presenting a general fixed point result that extends the theorems of Tarski (1955) on existence and Veinott (1992) on fixed point comparative statics to countably chain complete posets. We need these results because neither Tarski's (1955) nor Veinott's (1992) results can be applied to the problem at hand. In Sect. 3, we specify our general model and state our assumptions, while in Sect. 4 we discuss our main theorems on the existence and computation of SMNE in such models. In Sect. 5, we present three examples showing how our general model and tools can be used in applications, as well as how they can be extended to more general problems. Finally, Sect. 6 concludes by discussing the related results in detail.

2 Preliminary result

We begin by stating a new fixed point result that is essential to all the subsequent analysis in this paper. The theorem is related to a well-known result characterizing the set of fixed points of monotone transformations of complete latticesFootnote 11 due to Tarski (1955).Footnote 12 Recall, Tarski's theorem says that an isotone transformation of a nonempty complete lattice has a nonempty complete lattice of fixed points. Tarski's theorem was later generalized by Markowsky (1976) to the case of isotone transformations of chain complete partially ordered sets (i.e., an isotone transformation of a nonempty chain complete partially ordered set has a nonempty chain complete set of fixed points). Unfortunately, we cannot work with either of these theorems in this paper, as our isotone maps transform domains that are neither complete lattices nor chain complete partially ordered sets. Rather, our mappings transform countably chain complete partially ordered sets.

Therefore, we need to begin by proving a new result that is an analog of the Tarski/Markowsky theorems for countably chain complete partially ordered sets.Footnote 13 We also need to extend the well-known fixed point comparative statics result due to Veinott (1992) to this new context.Footnote 14 We start with an important definitionFootnote 15.

Definition 1

A function \(F:X\rightarrow X\), where \(X\) is a poset, is monotonically-sup-preserving if for any monotone sequence \(\{x_{n}\}_{n=0}^{\infty }\) we have \(F\left( \bigvee x_{n}\right) =\bigvee F(x_{n})\). We define monotonically-inf-preserving functions analogously. \(F\) is said to be monotonically-sup-inf-preserving if and only if it is both monotonically-sup-preserving and monotonically-inf-preserving.

The property of being monotonically sup (resp., inf) preserving is a type of sequential “order continuity” of a mapping in the Scott topology. For example, a mapping that is monotonically sup-inf preserving is also referred to in the literature as a sigma-order continuous mapping [e.g., Dugundji and Granas (1982, p. 15)]. It bears mentioning that such order continuity properties play an essential role in the computation of fixed points of isotone maps in countably chain complete partially ordered sets (i.e., in obtaining convergence of successive approximation schemes whose iterations are indexed by the natural numbers).Footnote 16

We now state our new theorem, which characterizes the order structure of the set of fixed points of a parameterized monotone increasing self-map defined on a countably chain complete partially ordered set. We begin with some useful definitions. Let \((X,\ge )\) be a partially ordered set (i.e., \(X\) is equipped with an order relation \(\ge \,\subset X\times X\) that is reflexive, antisymmetric and transitive). If every pair of elements of a poset \(X\) is comparable in order, then \(X\) is a chain. If \(X\) is a chain and countable, \(X\) is a countable chain. In a poset \(X\), if every chain \(C\subset X\) is complete, then \(X\) is referred to as a chain complete partially ordered set. If every countable chain \(C\subset X\) is complete, then \(X\) is referred to as a countably chain complete poset.

Our result has three parts: (a) a characterization of the set of fixed points, (b) fixed point comparative statics, and (c) a result on the computation of fixed points via successive approximations. Our contributions are parts (a) and (b) of the theorem (not part (c), which is the Tarski–Kantorovich theorem). The proof is technical, and is found in the appendix.

Theorem 1

Let \(F:X\times T\rightarrow X\) be a parameterized monotone increasing operator, with \(T\) a poset, \(X\) a countably chain complete poset with greatest and least elements, and \(X\times T\) given the product order. Let the fixed point set of \(F(\cdot ,t)\) be denoted by \( \Phi (t)\). If for every \(t\in T\) the function \(F(\cdot ,t)\) is monotonically sup-inf preserving, then

  (a)

    \(\Phi (t)\) is a non-empty countably chain complete poset in induced order.

  (b)

    Moreover, the least and greatest fixed point selections \(t\rightarrow {\underline{\Phi }}(t):=\wedge \) \(\Phi (t)\) and \(t\rightarrow {\overline{\Phi }}(t):=\vee \) \(\Phi (t)\) are isotone.

  (c)

    Finally, for the greatest \({\overline{\theta }}\) (resp., least \({\underline{\theta }}\)) element of \(X\), we have:

    $$\begin{aligned} \inf _{n}F^{n}(\overline{\theta },t)=\overline{\Phi } (t) \quad \left( \mathrm{{resp.}} \sup _{n}F^{n}(\underline{\theta },t)=\underline{\Phi }(t)\right) . \end{aligned}$$

A few remarks on this result. As previously mentioned, part (a) generalizes Tarski (1955) and Markowsky (1976) to the context of countably chain complete posets. The key additional fact to notice is that this result requires a stronger property of the mapping \(F\) in \(x\), for every \(t\in T\) (namely, \(F\) needs to be sigma-order continuous in \(x\) to even obtain existence). Part (b) of the theorem is essentially Veinott's fixed point comparative statics result adapted to the context of a countably chain complete partially ordered set. Finally, part (c) is related to the computational results for \(\sigma \)-complete lattices (resp., countably chain complete partially ordered sets) found in Vulikh (1967), Lemma XII.2.1 [resp., Tarski–Kantorovich, see Theorem 4.2 in Dugundji and Granas (1982)].
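Part (c) is also the basis of a practical algorithm: iterate \(F\) from the greatest and least elements of \(X\) and take limits. A minimal numerical sketch in Python (the operator \(F\) below is an illustrative affine monotone self-map of \([0,1]^{2}\) with a unique fixed point, chosen only for demonstration; in general the two limits differ):

```python
import numpy as np

def extremal_fixed_points(F, bottom, top, tol=1e-12, max_iter=10_000):
    """Tarski-Kantorovich iteration: sup_n F^n(bottom) and inf_n F^n(top)
    approximate the least and greatest fixed points of an isotone,
    monotonically sup-inf preserving map F."""
    lo, hi = np.asarray(bottom, dtype=float), np.asarray(top, dtype=float)
    for _ in range(max_iter):
        lo_next, hi_next = F(lo), F(hi)
        if (np.max(np.abs(lo_next - lo)) < tol
                and np.max(np.abs(hi_next - hi)) < tol):
            break
        lo, hi = lo_next, hi_next
    return lo, hi

# Illustrative isotone self-map of [0,1]^2 (componentwise order):
F = lambda v: 0.25 + 0.5 * v
least, greatest = extremal_fixed_points(F, bottom=[0.0, 0.0], top=[1.0, 1.0])
# here both iterations converge to the unique fixed point (0.5, 0.5)
```

The same scheme, run at each parameter \(t\), traces out the extremal selections \(\underline{\Phi }(t)\) and \(\overline{\Phi }(t)\), which is how the fixed point comparative statics of part (b) can be computed in practice.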

Of course, the weakening of conditions in our results relative to previous work does come at a cost. For example, per part (a), under the additional assumption of order continuity, the converse (necessity) results proven for isotone maps in Tarski's theorem for complete lattices [see Davis (1955), Theorem 2] and in Markowsky's theorem for chain complete partially ordered sets [see Markowsky (1976), Theorem 11] will not hold in our case. That is, we cannot characterize a countably chain complete partially ordered set by the fixed point property of the space relative to particular classes of monotone increasing mappings. So our new answers relate to sufficiency, not necessity.

Finally, it is also worth mentioning that in part (b) of Theorem 1, we generalize the fixed point comparative statics result of Veinott (1992) in some important directions. First, as mentioned before, \(\Phi \) is now only countably chain complete valued, so we require less structure on the underlying domain of our operators. Further, in conjunction with part (c) of the theorem, we are able to compute these fixed point comparative statics (with convergence studied relative to the Scott topology). Of course, this also comes at the expense of added order continuity conditions. Also, it bears mentioning that, as both the top and bottom elements of \(\Phi \) are increasing selections, the correspondence \(\Phi \) is actually directed upward and directed downward (hence, ascending in the “weak induced set order”). This is also true, for example, in Veinott's theorem for the case of complete lattices. So we obtain his fixed point comparative statics in the weaker setting of countably chain complete partially ordered sets.

3 Benchmark model

With these results in mind, we can now describe the model we study in the paper. Our environment is a multidimensional version of the \(\beta -\delta \) quasi-hyperbolic discounting model that has been studied extensively in the literature. We envision an agent as a sequence of “selves” indexed in discrete time \(t\in T=\{0,1,\ldots \}\). A “current self” or “self \(t\)” enters the period in a given state \(x_{t}\in S\), whereFootnote 17 \(S=[0,\overline{S}]\subset \mathbb {R}^{n}\) or \(S=[0,\infty )\subset \mathbb {R}^{n}\), and chooses a vector of actions denoted by \(a_{t}\in A\subset \mathbb {R}^{m}\). These choices, together with the current state \(x_{t}\), determine a stochastic transition probability on the next period state \(x_{t+1}\), given by \( Q(dx_{t+1}|x_{t},a_{t})\).

The self \(t\) preferences are represented by a utility function given by:

$$\begin{aligned} u(a_{t})+\beta E_{t}\sum _{i=t+1}^{\infty }\delta ^{i-t}u(a_{i}), \end{aligned}$$
(1)

where \(1\ge \beta >0\) and \(1>\delta \ge 0\), \(u\) is an instantaneous payoff function, and expectations \(E_{t}\) are taken with respect to the realizations of the random variables \(x_{i}\), drawn each period from the transition distribution \(Q\), this expectation being well-defined by the Ionescu–Tulcea theorem.
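The present bias embodied in (1) is easy to see numerically: the weight on the immediate payoff is \(1\), the weight on period \(t+1\) is \(\beta \delta \), and thereafter weights decline geometrically at rate \(\delta \). A minimal sketch (the parameter values are illustrative assumptions):

```python
beta, delta = 0.7, 0.9   # illustrative values with 1 >= beta > 0, 1 > delta >= 0

def qh_weights(horizon):
    """Discount weights 1, beta*delta, beta*delta^2, ... on u(a_t), ..., u(a_{t+horizon}) from (1)."""
    return [1.0] + [beta * delta ** i for i in range(1, horizon + 1)]

w = qh_weights(3)   # approximately [1.0, 0.63, 0.567, 0.51]
# the short-run discount factor w[1]/w[0] = beta*delta lies below the
# long-run factor w[2]/w[1] = delta: the source of time inconsistency
```

The gap between the one-period-ahead factor \(\beta \delta \) and the stationary factor \(\delta \) is exactly what breaks the agreement between current and continuation selves.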

Under some continuity assumptions on \(u\) and \(Q\) (to be specified later), we can define a SMNE for the quasi-hyperbolic consumer to be an \(h\in \mathcal {H}\), where \(\mathcal {H}=\{h:S\rightarrow A \mid h\ \mathrm{is\ bounded\ and\ Borel\ measurable\ with}\ h(x)\in A(x)\}\), that satisfies the following functional equation:

$$\begin{aligned} h(x)\in \arg \max _{a\in A(x)}u(a)+\beta \delta \int \limits _{S}V_{h}(y)Q(dy|x,a), \end{aligned}$$
(2)

where \(V_{h}:S\rightarrow \mathbb {R}\) is a continuation value function for the household of “future” selves that are successors to the self \(t\), and the future selves follow a stationary policy \(h\) from tomorrow onward.

This implies that the continuation value function in (2), defined for the future selves in a Markovian equilibrium, must solve the following recursive functional equation:

$$\begin{aligned} V_{h}(x)=u(h(x))+\delta \int \limits _{S}V_{h}(y)Q(dy|x,h(x)). \end{aligned}$$
(3)

Therefore, if we define the value function for self \(t\) to be:

$$\begin{aligned} W_{h}(x):=u(h(x))+\beta \delta \int \limits _{S}V_{h}(y)Q(dy|x,h(x)), \end{aligned}$$

for the time-consistent policy \(h\), one obtains the relation

$$\begin{aligned} V_{h}(x)=\frac{1}{\beta }W_{h}(x)-\frac{1-\beta }{\beta }u(h(x)). \end{aligned}$$
(4)
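To see where (4) comes from, note that \(V_{h}\) and \(W_{h}\) share the term \(u(h(x))\) and differ only in the weight placed on the continuation integral. From the definition of \(W_{h}\),

$$\begin{aligned} \delta \int \limits _{S}V_{h}(y)Q(dy|x,h(x))=\frac{1}{\beta }\left[ W_{h}(x)-u(h(x))\right] , \end{aligned}$$

and substituting this into (3) yields

$$\begin{aligned} V_{h}(x)=u(h(x))+\frac{1}{\beta }\left[ W_{h}(x)-u(h(x))\right] =\frac{1}{\beta }W_{h}(x)-\frac{1-\beta }{\beta }u(h(x)), \end{aligned}$$

which is exactly (4).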

Based on equation (4), we can define an operator whose fixed point, say \(V^{*}\), corresponds to the value of some time-consistent Markov policy.

We need to make some assumptions on the primitive data of the game to use our parameterized fixed point results in Sect. 2. Along these lines, we make the following assumptions:

Assumption 1

Let us assume:

  • \(A(x)\subset A\subset \mathbb {R}^m\) is compact and complete lattice valued with \(A(0)=\{0\}\),

  • \(u:A\rightarrow \mathbb {R}_+\) is continuous, increasing and supermodular with \(u(0)=0\) and \(u(\cdot )\le \overline{u}\),

  • for any \(x\in S\) and \(a\in A\), let \(Q(\cdot |x,a)=g_{0}(x,a)\delta _{0}(\cdot )+\sum _{j=1}^{J}g_{j}(x,a)\lambda _{j}(\cdot |x)\),

  • \((\forall j=1,\ldots ,J)\,g_{j}:S\times A\rightarrow [0,1]\) is continuous, with \(g_{j}(0,a)=0\) and \(\sum _{j=0}^{J}g_{j}(x,a)=1\) for all \(a\) and all \(x\), with the function \(a\rightarrow g_{j}(x,a)\) supermodular and decreasing,

  • \(\delta _0\) is a delta Dirac measure concentrated at point \(0\), while \( (\forall j=1,\ldots ,J)\,\lambda _j(\cdot |x)\) is a Borel transition distribution on \(S\) for any \(x\in S\).

Our assumptions on preferences are fairly standard, but require a few remarks relative to the work of Harris and Laibson (2001). Before making them, let us stress that our aim is not to weaken the conditions of their model but rather to obtain new results [e.g., on computation and comparative statics]. Still, as their work is the most closely related to our paper, we owe the reader some specific discussion of our assumptions and theirs.

First, we assume bounded returns, which is not required in Harris and Laibson's work, but we also allow for unbounded risk aversion (which is actually needed in their approach). The reason we make the assumption of bounded returns is quite natural, as we are studying a stochastic game with a potentially unbounded state space and many sources of shocks. Although, in principle, this assumption might be relaxed, doing so would require potentially very strong joint restrictions on payoffs and noise (especially in the case of returns unbounded below).Footnote 18

Second, we allow for multidimensional choice spaces as well as state spaces. To do this, we impose a supermodularity structure on the payoffs, which we need in order to obtain monotone operators in the quasi-hyperbolic decision-maker's optimization problems (more on this in a moment). If we were solving a (single-dimensional) consumption-investment version of the model as in Harris and Laibson (2001), we obviously would not need this supermodularity condition. We also impose neither twice continuous differentiability nor strict monotonicity of the utility function.

Our assumptions on the transition probability also require a few remarks. First, we impose that the stochastic transition \(Q\) is a convex combination of \(J\) measures \(\lambda _{j}\) and one Dirac measure \(\delta _{0}\) concentrated at zero. Hence, with probability \(1-\sum _{j=1}^{J}g_{j}(x,a),\) the next period state is zero, and with probability \(g_{j}(x,a)\) it is drawn from \(\lambda _{j}\). Also, we separate action variables \(a\) and state variables \(x\) in \(Q\), i.e., the \(\lambda _{j}\) do not depend on the decision \(a\). Our mixing condition on the stochastic transitions in the game is quite common in the literature; it was first introduced in Amir (1996), and later developed extensively by Nowak (2003), Balbus and Nowak (2008) and Balbus et al. (2013). The condition has also been used to study Markovian equilibrium in a very general class of stochastic supermodular games in Balbus et al. (2014). We should mention that even this assumption can be weakened a great deal per questions of existence (e.g., per the application of the celebrated APS procedure), but this weakening of sufficient conditions comes at the cost of not being able to compute both equilibrium values and pure strategy Markovian equilibria.
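To fix ideas, the mixing structure of \(Q\) is easy to simulate. In the Python sketch below we take \(J=1\); the particular forms of \(g_{1}\) and \(\lambda _{1}\) are purely illustrative assumptions (chosen only to satisfy \(g_{1}(0,a)=0\), continuity, and monotonicity in \(a\)), not specifications from the paper:

```python
import numpy as np

def draw_next_state(x, a, rng):
    """One draw from Q(.|x,a) = g0(x,a)*delta_0 + g1(x,a)*lambda_1(.|x)."""
    g1 = (x / (1.0 + x)) * np.exp(-a)   # illustrative: continuous, g1(0,a)=0,
                                        # decreasing in a, with values in [0,1]
    if rng.random() >= g1:              # with probability g0 = 1 - g1 ...
        return 0.0                      # ... the next state collapses to 0
    # otherwise draw from lambda_1(.|x), which depends on x but not on a
    return x * rng.uniform(0.5, 1.5)    # illustrative lambda_1

rng = np.random.default_rng(0)
sample = [draw_next_state(1.0, 0.0, rng) for _ in range(5)]
```

Note that the only channel through which the action \(a\) affects the transition is the mixing weight \(g_{1}\), exactly as the separation of \(a\) and \(x\) in Assumption 1 requires.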

Finally, as far as a direct comparison with Harris and Laibson (2001) per stochastic transition probabilities, our model generates more sources of noise than theirs, as in our case not only is labor income random, but wealth (or capital) is also drawn from \(Q\). Also, we do not require that \(Q\) have a density, let alone impose conditions on its degree of smoothness.

4 Main results

4.1 Existence

We first consider the question of existence of Markovian equilibrium. Let \(\mathcal {V}\) be the space of bounded (by \(0\) and \(\frac{\overline{u}}{1-\delta }\)), Borel measurable, real valued functions on \(S\) with \(V(0)=0\), equipped with the pointwise partial order. For a given value \(V\in \mathcal {V}\), we construct a correspondence \(T\) by:

$$\begin{aligned} TV(x)=\frac{1}{\beta }{\textit{CV}}(x)-\frac{1-\beta }{\beta }u({\textit{BV}}(x)), \end{aligned}$$
(5)

where the pair of operators \(C\) and \(B\) defined on space \(\mathcal {V}\) are given by:

$$\begin{aligned} {\textit{CV}}(x)&= \max _{a\in A(x)}\left\{ u(a)+\beta \delta \int \limits _{S}V(y)Q(dy|x,a)\right\} , \end{aligned}$$
(6)
$$\begin{aligned} {\textit{BV}}(x)&= \arg \max _{a\in A(x)}\left\{ u(a)+\beta \delta \int \limits _{S}V(y)Q(dy|x,a)\right\} . \end{aligned}$$
(7)

Notice, in the above, we have defined the operator \(B\) to map from candidates for equilibrium values in \(\mathcal {V}\) to the space of pure strategy best replies \(\mathcal {H}\). So, in effect, we have a pair of operator equations we need to solve to construct equilibrium values \(V^{*}\in \mathcal {V}\).

Clearly, \(T\) maps \(\mathcal {V}\) into \(2^{\mathcal {V}}\). Further, any fixed point \(V^{*}\) of the operator \(T\) corresponds to a stationary, time-consistent Markov policy \(h^{*}\in \mathcal {H}\) with \(h^{*}(x)\in BV^{*}(x)\). Denote by \(\overline{T}\) the greatest and by \(\underline{T}\) the least selection from the correspondence \(T\). Equip the space of pure strategies \(\mathcal {H}\) with the pointwise partial order. In this case, we obtain:

Lemma 1

Let Assumption 1 hold. Then \(C:\mathcal {V}\rightarrow \mathcal {V}\) is increasing and \(\overline{B},\underline{B}:\mathcal {V}\rightarrow \mathcal {H}\) are decreasing. Moreover, \(\overline{T}\) (resp. \(\underline{T}\)) is increasing and monotonically-inf (resp. sup) preserving.

Proof

\(C\) is increasing by definition. To see the monotonicity of \(B,\) consider the function

$$\begin{aligned} G(a,x,V)=u(a)+\beta \delta \sum _{j=1}^{J}g_{j}(x,a)\int \limits _{S}V(y)\lambda _{j}(dy|x). \end{aligned}$$

Then for any \(V\in \mathcal {V}\) and \(x\in S,\) the function \(G(\cdot ,x,V)\) is supermodular.

Moreover, \((a,V)\rightarrow g_{j}(x,a)\int _{S}V(y)\lambda _{j}(dy|x)\) has decreasing differences. To see this fact, observe that we have the following inequality:

$$\begin{aligned}&[g_{j}(x,a_{2})-g_{j}(x,a_{1})]\int \limits _{S}V_{2}(y)\lambda _{j}(dy|x) \\&\quad \le [g_{j}(x,a_{2})-g_{j}(x,a_{1})]\int \limits _{S}V_{1}(y)\lambda _{j}(dy|x), \end{aligned}$$

where \(V_{2}\ge V_{1}\) and \(a_{2}\ge a_{1}\). Therefore, for any \(x\in S,\) the function \((a,V)\rightarrow G(a,x,V)\) has decreasing differences on \(A(x)\times \mathcal {V}\). Since \(A(x)\) is a lattice and \(\mathcal {V}\) a poset, we obtain by Topkis's (1978) theorem that the extremal selections \(\overline{B}\) and \(\underline{B}\) of the best reply \(BR(V)(x)=\arg \max _{a\in A(x)}G(a,x,V)\) are decreasing on \(\mathcal {V}\). Since \(C\) is increasing and \(\overline{B},\underline{B}\) are decreasing, by the definitions of \(\underline{T}\) and \(\overline{T}\) we conclude that both extremal selections of \(T\) are increasing.

We now show that \(\underline{T}\) is monotonically sup preserving. Let \( \{V_{n}\}_{n=1}^{\infty }\subset \mathcal {V}\) be an increasing sequence in the natural product order, and let \(V_{n}\rightarrow V\) pointwise. Clearly, \(V(x)=\sup \limits _{n\in \mathbf {N}}V_{n}(x)\). We need to show \(\lim \limits _{n\rightarrow \infty }\underline{T}(V_{n})=\underline{T}(V)\). By the Lebesgue Dominated Convergence Theorem, we immediately obtain:

$$\begin{aligned} \int \limits _{S}V_{n}(y)\lambda _{j}(dy|x)\rightarrow \int \limits _{S}V(y)\lambda _{j}(dy|x)\quad \mathrm{{as }}\,n\rightarrow \infty , \end{aligned}$$

for all \(j\) and \(x\). For fixed \(x,\) let \(a_{n}:=\overline{B}(V_{n})(x)\). Since \(a_{n}\) belongs to the compact set \(A(x),\) without loss of generality let us assume \(a_{n}\rightarrow a_{0}\). Then, by the definition of \(G\), we have:

$$\begin{aligned} G(a_{n},x,V_{n})\ge G(a,x,V_{n}), \end{aligned}$$

for all \(a\in A(x)\). Taking limits, we obtain:

$$\begin{aligned} G(a_{0},x,V)\ge G(a,x,V), \end{aligned}$$

for all \(a\in A(x)\). Hence, \(a_0=\lim \limits _{n\rightarrow \infty }\overline{B}(V_{n})(x)\in B(V)(x)\). Further,

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }C(V_{n})(x)&= \lim \limits _{n\rightarrow \infty }G(\overline{B}(V_{n})(x),x,V_{n}) \\&= G(a_0,x,V)=C(V)(x). \end{aligned}$$

Therefore, \(\lim \limits _{n\rightarrow \infty }\underline{T}(V_{n})(x)\ge \underline{T}(V)(x)\). Moreover, \(\underline{T}(V)(x)\ge \underline{T}(V_n)(x)\) since \(\underline{T}\) is isotone. As a result \(\lim \limits _{n\rightarrow \infty }\underline{T}(V_{n})(x)= \underline{T}(V)(x)\). By isotonicity of \(\underline{T}\), the iterations \(\underline{T}(V_{n})(x)\) form an increasing sequence. Therefore, we have:

$$\begin{aligned} \sup \limits _{n\in \mathbf {N}}\underline{T}(V_{n})(x)=\lim \limits _{n\rightarrow \infty }\underline{T}(V_{n})(x)=\underline{T}(V)(x)=\underline{T}\left( \sup \limits _{n\in \mathbf {N}}V_n\right) (x), \end{aligned}$$

i.e., \(\underline{T}\) is monotonically-sup-preserving. Analogously, we show that \(\overline{T}\) is monotonically-inf-preserving. \(\square \)

Having Lemma 1 in hand, we are now in a position to analyze the fixed points of the monotone operator \(T\).

Theorem 2

(Existence of extremal SMNE) Let Assumption 1 hold. Then the set of stationary, time-consistent Markov policies is nonempty and possesses greatest \(\overline{h}^{*}\) and least \(\underline{h}^{*}\) elements, which correspond to the least value \(v^{*}=\underline{T}v^{*}\) and the greatest value \( w^{*}=\overline{T}w^{*}\), respectively.

Proof

By Lemma 1, the operator \(\underline{T}:\mathcal {V}\rightarrow \mathcal {V}\) is increasing and monotonically-sup preserving. As \(\mathcal {V}\) is a countably chain complete poset, by Theorem 1 \(\underline{T}\) has a nonempty set of fixed points, with greatest and least elements. Similar conclusions hold for \(\overline{T}\). \(\square \)

Theorem 2 is our central result on existence, and it requires a few remarks. First, aside from asserting the existence of a time-consistent equilibrium Markov policy (in pure strategies), it also asserts that the set of equilibrium values has a particular poset structure; namely, it possesses greatest and least elements. This result, in turn, implies that the set of time-consistent value functions is bounded. Second, for any initial state \(x\in S\), the theorem indicates that there exists a greatest time-consistent value (with its least equilibrium policy) that is optimal among all the time-consistent values. So, in general, some equilibrium values are ranked. Moreover, if \(\underline{T}=\overline{T}\), the set of SMNE values is a countably chain complete poset.

We can relate the nature of our existence result to those obtained using other approaches found in the existing literature [as well as the GEE approach in Harris and Laibson (2001)]. First, notice that our Theorem 2 is based on a type of value-policy iteration procedure, and resembles in an abstract sense the APS-type procedures suggested by the work of Bernheim et al. (1999) and Chade et al. (2008) for sequential equilibrium strategies. Further, to deal with the complications associated with measurability, we work only in function spaces (as opposed to spaces of correspondences). In an APS-type method for our problem, one would construct a different operator that maps between spaces of value correspondences ordered under set inclusion, where the relevant topology for convergence issues would be the weak-star topology. A critical problem with such a method for our class of games concerns handling multidimensional state spaces. In particular, as the set of measurable selections from the Nash equilibrium value set need not be weak-star closed, it is very difficult to obtain sufficient conditions for even existence using APS-type methods unless the state space is either countable or the real line. So it is not clear how to use these methods for multistate models. In the case of a single-dimensional (or countable) state space, it is very easy to check the self-generation property of the APS value operator. Then, noting the natural monotone structure of the operator under the set inclusion order, one can show convergence to the greatest fixed point, which contains all the sequential equilibrium values. Unfortunately, as this is not a repeated game, it is difficult to say anything substantial about the set of sequential equilibrium strategies (mixed or pure) that generate this set of values.Footnote 19 See also Chade et al. (2008) for the extension of these methods to the multi-player case for a repeated game.

4.2 Computation

We next turn to the question of the computation of equilibrium. This question is particularly important in applied work, as researchers often want to simulate/calibrate/estimate SMNE. We first use our main existence result to prove our central theorem on the computation of extremal equilibrium values (and their supporting pure strategy SMNE). We then provide additional characterizations of equilibrium strategies that achieve these values.

Theorem 3

(Pointwise approximation of extremal values) Let Assumption 1 hold, and consider two sequences \(\{v_t\}^\infty _{t=0}\) and \(\{w_t\}^\infty _{t=0}\), whereFootnote 20 \(v_0(x)=0\), \(w_0(x)=\frac{\bar{u}}{1-\delta } \mathbf {1}_{(0,\infty ]}(x)\), \(v_t=\underline{T}v_{t-1}\) and \(w_t= \overline{T}w_{t-1}\). Then \((\forall x\in S)\,\lim _{t\rightarrow \infty } v_t(x)=v^*(x)\) and \(\lim _{t\rightarrow \infty }w_t(x)=w^*(x).\)

Proof

Clearly \(v_{1}\ge v_{0}\). Since \(\underline{T}\) is monotone, by induction \(v_{t}\ge v_{t-1}\) for all \(t\); hence the sequence \(\{v_{t}\}\) is increasing. As it is also bounded above, it converges pointwise, say to \(\bar{v}\). It is then straightforward to show, by the Lebesgue Dominated Convergence Theorem, Lemma 1, and Kall (1986), that \(v^{*}=\bar{v}\). Similarly, we show that \( \{w_{t}\}_{t=0}^{\infty }\) is decreasing and converges to \(w^{*}\). \(\square \)

A couple of remarks on Theorem 3 are in order. First, it provides a very simple constructive method for calculating (pointwise) the two extremal time-consistent values, as well as their supporting policies (including those that are optimal). The theorem, though, gives us much more. In particular, it allows us to calculate pointwise bounds for any time-consistent equilibrium strategy as well. Finally, of course, if the limits of the two sequences analyzed in the theorem coincide for every initial state \( x\in S\), then uniqueness of the time-consistent policy is guaranteed. Footnote 21

Second, the theorem (in conjunction with Theorem 2) also provides computable bounds on equilibrium behavior: iterations on our monotone operators converge to the least (resp., greatest) SMNE values, together with their corresponding greatest (resp., least) actions. This is particularly important in applied work when numerical implementations of our methods are constructed. If models have approximately the same extremal SMNE for sets of parameters that are “close” (say, extremal SMNE that are “close” in a sup-norm topology when the parameters of a given model are “close” in some metric), this gives one a chance of studying the “robustness” or “stability” of the predictions of the model at hand. Obtaining such bounds can be formalized using order. Further, in an ordered metric space (as we have in the present situation), pointwise order bounds translate into metric bounds (in our case, uniform metric bounds); alternative \(L_{p}\) metrics could also be developed. It is not clear how one could obtain a similar result using GEE methods à la Harris and Laibson (2001): their methods are inherently local, and obtaining a global sensitivity analysis for SMNE would require a globalization of their analysis. Perhaps most importantly, our methods also allow one to construct bounds comparing the optimal time-consistent policy with the other time-consistent SMNE. Using Harris and Laibson (2001), it is not clear how to establish whether any particular equilibrium constructed using GEE is optimal (hence, no such comparison is possible)Footnote 22.
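To make the iteration in Theorem 3 concrete, the following sketch implements the two monotone sequences for an assumed one-shock specialization that is ours, not the paper's: \(u(a)=a^{0.3}\), \(g(x,a)=\sqrt{x-a}\), \(S=[0,1]\), \(\lambda =\mathcal {U}(0,1)\) independent of \(x\), \(\beta =0.8\), \(\delta =0.96\). It uses the identity \(TV(x)=u(h(x))+\delta g(x,h(x))\int V\,d\lambda \) with \(h=BV\), which is algebraically equivalent to the operator form \(\frac{1}{\beta }CV-\frac{1-\beta }{\beta }u(BV)\) used in the text.

```python
import numpy as np

# Assumed specialization (illustrative only): u(a) = a^0.3, g(x,a) = sqrt(x-a),
# S = [0,1], lambda = Uniform[0,1] independent of x, beta-delta preferences.
beta, delta = 0.8, 0.96
xs = np.linspace(0.0, 1.0, 201)        # state grid; actions live on the same grid
u = lambda a: a**0.3
g = lambda x, a: np.sqrt(np.clip(x - a, 0.0, None))   # maps into [0,1] on S

X, A = xs[:, None], xs[None, :]
feas = A <= X + 1e-12                  # feasible actions: a in [0, x]

def T(V):
    """One step of the value-policy iteration: h = B V, then T V = u(h) + delta*g*E[V]."""
    m = V.mean()                       # integral of V against Uniform[0,1] (grid average)
    obj = np.where(feas, u(A) + beta * delta * g(X, A) * m, -np.inf)
    h = xs[np.argmax(obj, axis=1)]     # current self's best response to continuation value
    return u(h) + delta * g(xs, h) * m, h

# v_0 = 0 (iteration from below); w_0 = u_bar/(1-delta) on (0, S] (iteration from above)
v = np.zeros_like(xs)
w = np.full_like(xs, 1.0 / (1.0 - delta)); w[0] = 0.0
for _ in range(800):
    v, h_v = T(v)
    w, h_w = T(w)

resid_v = np.max(np.abs(T(v)[0] - v))  # sup-norm change at one further iteration
resid_w = np.max(np.abs(T(w)[0] - w))
```

The checks below only rely on what Theorems 3 and 4 predict for this specialization: the two iterations settle down, the lower limit lies below the upper one pointwise, and the computed policy is increasing in the state (here \(\lambda \) is independent of \(x\) and \(g\) has increasing differences).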

The next two results identify sufficient conditions that allow us to further characterize the continuity and monotonicity of any time-consistent equilibrium policy function \(h^{*}\).

Theorem 4

(Monotonicity of policies) Assume Assumption 1 holds, and that each \(g_j\) has increasing differences in \((x,a)\). If each \(\lambda _{j}(\cdot |x)\) is constant in \(x\) and \(x\rightarrow A(x)\) is strong set order increasing, then each time-consistent equilibrium policy \(h^{*}\) is increasing.

Proof

Let \(h^{*}={\textit{BV}}^{*}\) for some \(V^{*}\in {\textit{TV}}^{*}\). Consider the function

$$\begin{aligned} G(a,x,V^{*})=u(a)+\beta \delta \sum _{j=1}^{J}g_{j}(x,a)\int \limits _{S}V^{*}(y)\lambda _{j}(dy). \end{aligned}$$

Observe that \(G\) is supermodular in \(a\) on the lattice \(A(x)\), and the feasible action correspondence \(A(x)\) is increasing in Veinott's strong set order. Moreover, by the assumption on \(g_{j}\), we conclude that \(G\) has increasing differences in \((a,x)\). By Topkis's (1978) theorem, the maximizer \( h^{*}\) is increasing in \(x\) on \(S\). \(\square \)

To obtain such a strong characterization of equilibrium time-consistent policies, we require that the \(\lambda _{j}\) be independent of the state \(x\). Although such an assumption has been imposed in many related papers [see Nowak (2006) or Amir (2002)], a natural question is whether one can obtain similar monotonicity results while still allowing the measures \( \lambda _{j}\) to depend on \(x\). It therefore bears mentioning why such a characterization may be difficult to obtain when \(\lambda _{j}(\cdot |x)\) is, e.g., stochastically ordered in \(x\). Notice that if \(V^{*}\) is increasing and all \(\lambda _{j}(\cdot |x)\) are stochastically decreasing, this is sufficient for an increasing differences property between the control \( a\) and the state \(x\). But to assure for the Bellman operator \(C\) that \(V^{*}\) is increasing, one would like to assume that each \(\lambda _{j}(\cdot |x)\) is stochastically increasing in \(x\). Hence, to get monotonicity in this very general setting, we need each \(\lambda _{j}\) to be independent of \(x\). We also remark that when \(\lambda _{j}(\cdot |x)\) is independent of \(x\), our noise is similar to that in Harris and Laibson (2001), but for a multidimensional choice/state case.
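The Topkis mechanism behind Theorem 4 can also be isolated in a stand-alone numerical check. The objective below is a toy of our own choosing (not the model's): \(G(a,x)=\sqrt{a}+cM\sqrt{x-a}\) has increasing differences in \((x,a)\), the feasible set \(A(x)=[0,x]\) is strong set order increasing, and the maximizer even has the closed form \(x/(1+c^{2}M^{2})\), which is increasing in \(x\).

```python
import numpy as np

# Toy supermodular objective (illustrative, not the paper's model):
# G(a, x) = sqrt(a) + c*M*sqrt(x - a) has increasing differences in (x, a),
# and A(x) = [0, x] is strong set order increasing in x.
c, M = 0.5, 5.0
xs = np.linspace(0.0, 2.0, 81)
avals = np.linspace(0.0, 2.0, 1601)

def G(a, x):
    return np.sqrt(a) + c * M * np.sqrt(np.clip(x - a, 0.0, None))

h = np.empty_like(xs)
for i, x in enumerate(xs):
    vals = np.where(avals <= x, G(avals, x), -np.inf)
    h[i] = avals[np.argmax(vals)]      # least maximizer on the grid

a_star = xs / (1.0 + (c * M) ** 2)     # closed-form argmax: x / (1 + c^2 M^2)
```

By Topkis's theorem the least maximizer is nondecreasing in \(x\), and the grid argmax tracks the closed form up to the grid step; both facts are asserted below.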

We next turn to the question of continuous time consistent policies. For this, we impose the following Feller type property on the noise.

Assumption 2

\((\forall j=1,\ldots ,J)\,\lambda _{j}(\cdot |x)\) is strongly stochastically continuous (i.e., the function \(x\rightarrow \eta _{f}^{j}(x):=\int \limits _{S}f(y)\lambda _{j}(dy|x)\) is continuous for any \( f\in \mathcal {V}\)).

With this assumption in place, we now prove a theorem that studies the continuity structure of equilibrium time consistent policies.

Theorem 5

(Continuity of policies) Let Assumptions 1 and 2 hold with \(u\) strictly concave and \(g\) concave in \(a\). Then each time-consistent equilibrium policy \(h^{*}\) is continuous.

Proof

Let \(V_{h^{*}}\in \mathcal {V}\) be the equilibrium payoff under the time-consistent policy \(h^{*}\). Then, by Assumption 2, the mapping

$$\begin{aligned} x\rightarrow \zeta _{h^{*}}^{j}(x):=\int \limits _{S}V_{h^{*}}(y)\lambda _{j}(dy|x) \end{aligned}$$

is continuous. Notice, the function

$$\begin{aligned} F_{h^{*}}(a,x):=u(a)+\beta \delta \sum _{j=1}^{J}\zeta _{h^{*}}^{j}(x)g_{j}(x,a) \end{aligned}$$

is also continuous and strictly concave with respect to \(a\) for fixed \(x>0\). Let \(x_{n}\rightarrow x_{0}\). Since \(h^{*}(x)=\mathrm{{arg}}\, \max \limits _{a\in A(x)}F_{h^{*}}(a,x)\), we have

$$\begin{aligned} F_{h^{*}}(h^{*}(x_{n}),x_{n})\ge F_{h^{*}}(a,x_{n}). \end{aligned}$$

Without loss of generality (passing to a convergent subsequence if necessary), suppose \(h^{*}(x_{n})\rightarrow a_{0}\). By the continuity of \(F_{h^{*}}\), we have

$$\begin{aligned} F_{h^{*}}(a_{0},x_{0})\ge F_{h^{*}}(a,x_{0}). \end{aligned}$$

By the strict concavity of \(F_{h^{*}}(\cdot ,x)\) and definition of \( h^{*},\) we obtain \(a_{0}=h^{*}(x_{0})=\lim \limits _{n\rightarrow \infty }h^{*}(x_{n})\). \(\square \)

4.3 Monotone comparative statics

Finally, motivated by the indeterminacy result in Gong and Smith (2007), as well as concerns about the possible econometric estimation of our stochastic game, we now consider the nature of monotone comparative statics in a parameterized version of our problem. For a partially ordered set \(\Theta \), with \(\theta \in \Theta \) a typical element, define the greatest and least time-consistent policies as \(\overline{h} _{\theta }^{*}\) and \(\underline{h}_{\theta }^{*}\), respectively.

We make the following assumption.

Assumption 3

Let us assume:

  • \(u:A\times \Theta \rightarrow \mathbb {R}\), \(a\rightarrow u(a,\theta )\) is continuous, increasing and supermodular on \(A\) with \((\forall \theta \in \Theta )\,u(0,\theta )=0\). Also \(u\) has increasing differences with \( (a,\theta )\) and \(\theta \rightarrow u(a,\theta )\) is decreasing.

  • For any \((x,a)\in (S\times A)\) and \(\theta \in \Theta \) let \(Q(\cdot |x,a,\theta )=(1-\sum _{j=1}^{J}g_{j}(x,a,\theta ))\delta _{0}(\cdot )+\sum _{j=1}^{J}g_{j}(x,a,\theta )\lambda _{j}(\cdot |\theta )\).

  • \((\forall j=1,\ldots ,J)\,g_{j}:S\times A\times \Theta \rightarrow [0,1] \) and \(a\rightarrow g_{j}(x,a,\theta )\) is continuous, decreasing and supermodular with \((\forall \theta \in \Theta )\,g_{j}(0,a,\theta )=0\). Also \(g_{j}\) has increasing differences in \((a,\theta )\) and \((a,x)\). Moreover \( (x,\theta ) \rightarrow g_{j}(x,a,\theta )\) is decreasing on \(S\times \Theta \).

  • \(\delta _{0}\) is the Dirac delta measure concentrated at the point \(0\), while \((\forall j=1,\ldots ,J)\,\lambda _{j}(\cdot |\theta )\) is a Borel transition distribution on \(S\) for any \(\theta \in \Theta \), where \(\lambda _{j}(\cdot |\theta )\) is stochastically increasing in \(\theta \).

With Assumption 3 in place, we can now prove our main result on monotone comparative statics for extremal time consistent equilibrium policies.

Theorem 6

(Monotone comparative statics) Let Assumption 3 be satisfied. Then, the mappings \(\theta \rightarrow \overline{h}_{\theta }^{*}\) and \( \theta \rightarrow \underline{h}_{\theta }^{*}\) are both increasing on \( \Theta \).

Proof

By Theorem 2, for any \(\theta \in \Theta ,\) there exist greatest and least time-consistent policies \(\overline{h}_{\theta }^{*} \) and \(\underline{h}_{\theta }^{*}\). By Theorem 4, \( x\rightarrow \overline{h}_{\theta }^{*}(x)\) and \(x\rightarrow \underline{h}_{\theta }^{*}(x)\) are increasing functions of \(x\in S\). As a result, for each \(\theta \), both operators \(\overline{T}_{\theta }\) and \(\underline{T}_{\theta }\) map \(\mathcal {V}\) into decreasing functions; hence, their fixed points are decreasing functions of \(x\in S\).

Now, for decreasing \(V\in \mathcal {V}\), consider a function

$$\begin{aligned} G(a,x,\theta ,V)=u(a,\theta )+\beta \delta \sum _{j=1}^{J}g_{j}(x,a,\theta )\int \limits _{S}V(y)\lambda _{j}(dy|\theta ), \end{aligned}$$

and observe that \(G\) is decreasing in \(\theta \) and has increasing differences in \((a,\theta )\). Clearly, \(C_{\theta }V(x)=\max _{a\in A(x)}G(a,x,\theta ,V)\) is decreasing in \(\theta \). Similarly, by Topkis's (1978) theorem, \(\underline{B}_{\theta }V(x)\) is increasing in \(\theta \) (where \( \underline{B}_{\theta }V(x)=\arg \max _{a\in A(x)}G(a,x,\theta ,V)\)). Consequently, the mapping

$$\begin{aligned} \theta \rightarrow \overline{T}_{\theta }V(x)=\frac{1}{\beta }C_{\theta }V(x)-\frac{ 1-\beta }{\beta }u(\underline{B}_{\theta }V(x)), \end{aligned}$$

is decreasing on \(\Theta \). From Theorem 1, we therefore conclude that the greatest fixed point \(w_{\theta }^{*}\) and the least fixed point \(v_{\theta }^{*}\) are decreasing in \( \theta \). Consequently, \(\theta \rightarrow G(a,x,\theta ,w_{\theta }^{*})\) is decreasing and has increasing differences in \((a,\theta ) \). Then, by Topkis's (1978) theorem, \(\underline{h}_{\theta }^{*}\) is increasing in \(\theta \). The reasoning for \(v_{\theta }^{*}\) is similar.

\(\square \)
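Theorem 6 can likewise be illustrated numerically under assumed functional forms of our own (not pinned down by the text): \(u(a)=\sqrt{a}\) (trivially weakly decreasing in \(\theta \)), \(g(x,a,\theta )=(1-\theta )\sqrt{x-a}\) (decreasing in \(\theta \), with increasing differences in \((a,\theta )\) and \((a,x)\)), and \(\lambda =\mathcal {U}(0,1)\) independent of \(\theta \). Iterating the operator from below for several \(\theta \) produces values decreasing, and policies increasing, in \(\theta \), as the theorem predicts.

```python
import numpy as np

# Assumed parameterized specialization (ours, for illustration): u(a) = sqrt(a),
# g(x, a, theta) = (1 - theta)*sqrt(x - a), lambda = Uniform[0,1], theta in [0, 0.5].
# g is decreasing in theta and has increasing differences in (a, theta) and (a, x).
beta, delta = 0.8, 0.96
xs = np.linspace(0.0, 1.0, 201)
X, A = xs[:, None], xs[None, :]
feas = A <= X + 1e-12

def solve(theta, iters=800):
    """Iterate the operator from V = 0; return the limit value and its policy."""
    g = lambda x, a: (1.0 - theta) * np.sqrt(np.clip(x - a, 0.0, None))
    V = np.zeros_like(xs)
    for _ in range(iters):
        m = V.mean()                   # E[V] under Uniform[0,1] (grid average)
        obj = np.where(feas, np.sqrt(A) + beta * delta * g(X, A) * m, -np.inf)
        h = xs[np.argmax(obj, axis=1)]
        V = np.sqrt(h) + delta * g(xs, h) * m
    return V, h

thetas = [0.0, 0.2, 0.4]
sols = [solve(t) for t in thetas]
```

The assertions check the two monotone comparative statics the proof delivers: equilibrium values fall and policies rise as \(\theta \) increases (with one grid step of slack on policies).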

Finally, our strong comparative statics results cannot be obtained using APS-type approaches in the spirit of Chade et al. (2008) (adapted to a stochastic game). Chade et al. (2008) give conditions in a repeated game under which the whole equilibrium value set is monotone (under set inclusion) in a parameter; they also give an example where this is not the case. In our setting, we are able to provide conditions under which the greatest SMNE value increases with the parameter. That is, we can characterize the comparative statics of the optimal policy among the set of equilibrium time-consistent policies (but we cannot characterize the comparative statics of the whole equilibrium set).

5 Applications and extensions

In this section we discuss two applications and one extension that show how our results can be used in the study of optimal (among the set of time consistent) consumption policies under credit constraints, habit formation and environmental protection. In Sect. 5.4 we also present two specific examples of transition probabilities that generate non-trivial invariant distributions for any SMNE policy. We begin with the standard consumption-savings problem.

5.1 Consumption-savings with \(\beta -\delta \) preferences

We first apply our results to a version of the problem studied in Harris and Laibson (2001). Here, each self \(t\) has \(\beta -\delta \) preferences given as in the general model [e.g., Eq. (1)]. A typical self enters the period endowed with output \(x \in S=[0,\bar{S}]\), where \(\bar{S}\) is finite, or \(S=[0,\infty )\), and she decides on current consumption \(a\in [0,x]\). Investment equals \(x-a\). Then, the level of investment parameterizes the stochastic transition technology \(Q\) that generates next period's output. Preferences and technologies satisfy the assumptions of the previous section; i.e., \(u\) is increasing, continuous and strictly concave. For the stochastic transition structure we take a special case of \(g_j\), namely \( g_j(x,a):=\tilde{g}_j(x-a)\), and assume \(\tilde{g}_j\) is increasing, continuous and concave.

As Assumption 1 is satisfied, Theorem 2 holds. As the constraint set is strong set order increasing and \(g_j\) has increasing differences, we also have the conclusions of Theorem 4, as long as \(\lambda _j\) does not depend on \(x\). Finally, if we additionally impose Assumption 2, then Theorem 5 holds. For this model we can also easily show that any SMNE policy is Lipschitz continuous. In any case, time-consistent (and optimal) policies exist and form a nonempty countably chain complete poset. Further, as Assumption 1 holds, we can also compute optimal (among time-consistent) policies via Theorem 3 (see Fig. 1), i.e., pointwise approximate the extremal time-consistent equilibria, including the greatest value equilibrium, which is the optimal SMNE.

Fig. 1
figure 1

Convergence of iterations (policies) from above and below to SMNE (\(\alpha =.3,\gamma =.5,\beta =.8,\delta =.96\) )

We can also provide an explicit example of how simple it is to apply our methods to compute/approximate equilibrium time-consistent strategies in this setting (see Fig. 2). To see this, consider the following example from macroeconomic applications of hyperbolic discounting.

Fig. 2
figure 2

Consumption policy in a SMNE for \(\alpha =.8,\delta =.96, \gamma =.3\) and various \(\beta \)

Example 1

Consider a power utility, Cobb–Douglas class of examples. Let the state space \(S\) for the economy be given by \(S=[0,1]\), the period utility function be \(u(a)=a^{\alpha }\), \(g(x,a)=(x-a)^{\gamma }\), while \(\lambda (y|x)\) has a cdf given by: \(y^{2-x}\). Let \(1>\alpha >0,1>\gamma >0\).

For this economy, we can compute optimal SMNE via standard approximation methods (e.g., piecewise-constant approximation) by iterating on a simple Picard procedure based on the operator \(T\). The results of our calculations are presented in the following figures.Footnote 23 In the first figure, we show convergence to the SMNE iterating both from above and below.
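A discretized sketch of this Picard procedure for Example 1 is given below, with the figure's parameters \(\alpha =.3\), \(\gamma =.5\), \(\beta =.8\), \(\delta =.96\) and the density \((2-x)y^{1-x}\) implied by the cdf \(y^{2-x}\). Grid sizes and iteration counts here are our own choices, not those behind the paper's figures.

```python
import numpy as np

# Example 1 (discretized sketch): S = [0,1], u(a) = a^alpha, g(x,a) = (x-a)^gamma,
# lambda(dy|x) with cdf y^(2-x), i.e., density (2-x)*y^(1-x) on [0,1].
alpha, gam, beta, delta = 0.3, 0.5, 0.8, 0.96
n = 201
xs = np.linspace(0.0, 1.0, n)
dy = xs[1] - xs[0]
X, A = xs[:, None], xs[None, :]
feas = A <= X + 1e-12
dens = (2.0 - X) * xs[None, :] ** (1.0 - X)   # dens[i, k] = (2 - x_i) * y_k^(1 - x_i)

def T(V):
    m = dens @ V * dy                  # m(x_i): Riemann sum for integral of V against lambda(.|x_i)
    obj = np.where(feas,
                   A**alpha + beta * delta * np.clip(X - A, 0.0, None)**gam * m[:, None],
                   -np.inf)
    h = xs[np.argmax(obj, axis=1)]
    return h**alpha + delta * np.clip(xs - h, 0.0, None)**gam * m, h

v = np.zeros_like(xs)                  # iteration from below
w = np.full_like(xs, 1.0 / (1.0 - delta)); w[0] = 0.0   # from above: u_bar/(1-delta) on (0,1]
for _ in range(1000):
    v, h_v = T(v)
    w, h_w = T(w)

resid = max(np.max(np.abs(T(v)[0] - v)), np.max(np.abs(T(w)[0] - w)))
```

Note that because \(\lambda \) here depends on \(x\), Theorem 4 does not apply, so no monotonicity of the policy is asserted; the checks only use the convergence and ordering guaranteed by Theorem 3.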

In the second figure, we present a simple set of numerical comparative statics results. Sensitivity analysis exemplifies the monotone comparative statics result from Theorem 6.

5.2 Consumption-savings with habit formation and \(\beta - \delta \) preferences

In our second example, we consider an extension of our model to settings with endogenous preferences. One interesting example of such a preference structure is the rational addiction/habit formation model of Becker and Murphy (1988), so we now consider a consumption-savings problem with quasi-hyperbolic preferences and habit formation.

Along these lines, let us modify the primitive data of the model to accommodate habit formation. Let \(z=a_{-1}\) where \(a_{-1}\) denotes last period consumption, and \(z\) denotes the level of the habit.Footnote 24 Let \( u(a,z) \) denote the current utility from consumption of \(a\in \mathbb {R}_{+}\) , where the past consumption \(z=a_{-1}\in \mathbb {R}_{+}\) parameterizes the current period utility function. In a similar manner to the assumption on preferences in Assumption 1, assume current payoff \(u\) is continuous and strictly concave in its first argument, supermodular, and increasing. Also, assume that the stochastic production technology is exactly as in the previous application.

Then, under our conditions, Theorem 2 holds. If we additionally impose Assumption 2 on the noise, then Theorem 5 holds. In any case, time-consistent (and optimal) policies exist and form a nonempty countably chain complete poset. Finally, as Assumption 1 holds, we can also compute optimal among time-consistent policies via Theorem 3.

5.3 Environmental policy

In our final example, we apply our results to an environmental growth model with a pollution externality. This application is useful, as it is a case of the model where we have multidimensional choice spaces and multidimensional state spaces. In particular, the economy we study is based upon that studied in Jones and Manuelli (1995) and Acemoglu et al. (2012), but it also shares features found in a number of other papers including Jones and Manuelli (2001), Brock and Taylor (2005), Karp and Tsur (2011), and Lemoine and Traeger (2012). In this economy, there will be two sectors producing two types of consumption goods each period. One type of good will be referred to as a “clean” good, while the other type of good will be “dirty”. We again will have a stochastic capital accumulation technology that produces the goods from investment in the corresponding sectors.

More specifically, this economy has consumers deriving utility from consumption of both clean and dirty goods, with their preferences exhibiting hyperbolic discounting. A typical self \(t\) enters the period in state \(s=(x_{c},x_{d})\), where \(x_{c},x_{d}\in [0,S]\) denote the levels of clean and dirty capital, respectively. Each period, self \(t\) has lifetime utility given by equation (1), where preferences satisfy Assumption 1 and are defined over actions \( a=(c_{c},c_{d})\in A\subset [0,S]\times [0,S]=K\) (where \( c_{c}\) and \(c_{d}\) denote consumption of the clean and dirty consumption goods).

As for production, the transition probability between states \(s\in S\) given an action \(a=(c_{c},c_{d})\in A\) is given by \( Q(\cdot |f_{c}(x_{c},x_{d})-c_{c},f_{d}(x_{c},x_{d})-c_{d},s)\), where \(f_{i}\), for \(i=c,d\), is the production function for clean and dirty consumption/capital goods. That is, each self \(t\) leaves self \(t+1\) clean investment goods in the amount \(i_{c}=f_{c}(x_{c},x_{d})-c_{c}\) and dirty investment goods in the amount \(i_{d}=f_{d}(x_{c},x_{d})-c_{d}\). Then, we assume the stochastic production technology is given by:

$$\begin{aligned}&Q(\cdot |f_{c}(x_{c},x_{d})-c_{c},f_{d}(x_{c},x_{d})-c_{d},s)\\&\quad =\sum ^J_{j=1}g_j(f_{c}(x_{c},x_{d})-c_{c},f_{d}(x_{c},x_{d})-c_{d})\lambda _{j}(\cdot |s)\\&\qquad +g_0(f_{c}(x_{c},x_{d})-c_{c},f_{d}(x_{c},x_{d})-c_{d})\delta _{0}(\cdot ), \end{aligned}$$

where each \(g_j\) is continuous, increasing, concave and supermodular in investment, and each \(f_i\) is monotone with \(f_i(0,0)=0\).

To relate our results to this economy: first, if the primitive data of this model satisfy Assumption 1, then by Theorem 2 we have existence of a nonempty countably chain complete set of time-consistent equilibria (with the greatest value equilibrium optimal). If we additionally impose Assumption 2, Theorem 5 holds, and time-consistent (and optimal) policies are continuous. By Theorem 3, we can pointwise approximate the extremal time-consistent equilibria (including the optimal SMNE). Finally, if the production sectors are also separable (i.e., \(g_{j}(\cdot ,\cdot )\) is separable in both arguments) and \(\lambda _j\) does not depend on \(s\), then the conclusions of Theorem 4 hold.

5.4 Possibilities for stationary Markov equilibrium

In this final subsection we consider the question of the structure and computation of Stationary Markov equilibrium (SME) associated with SMNE time consistent policies. For this we return to the general model studied in the main section.

In particular, we consider the question of when, for a given SMNE, we can construct equilibrium invariant distributions generated by the stochastic transition \( Q(\cdot |x,h^{*}(x))\), for a time-consistent policy function \( h^{*}\), that are not trivial (i.e., do not collapse to a degenerate distribution). To prevent an SMNE from generating only a trivial SME, we need some additional assumptions. The two examples below provide sufficient conditions for such results.

Along these lines, first let \(\Delta (S)\) denote the family of probability measures on the state space \(S\). Further, for a policy \(h\), define a pair of operators \(G_{h}:\mathcal {V}\rightarrow \mathcal {V}\) and \(G_{h}^{*}:\Delta (S)\rightarrow \Delta (S)\) as follows:

$$\begin{aligned} G_{h}(f)(x)=\int \limits _{S}f(y)Q(dy|x,h(x)), \end{aligned}$$

and

$$\begin{aligned} G_{h}^{*}(\tau )(A)=\int \limits _{S}Q(A|x,h(x))\tau (dx). \end{aligned}$$

Notice that the fixed points of \(G_{h^{*}}^{*}\) are the equilibrium invariant distributions of our economy associated with the SMNE \(h^{*}\).

We now give conditions under which non-trivial invariant distributions exist.

Example 2

Assume the upper bound of the state space satisfies \(\bar{S}<\infty \), and that the SMNE \( h^{*}\) is continuous. Let \(\tau \) be a probability distribution on \( [0,\bar{S}]\), written as follows:

$$\begin{aligned} \tau (\cdot )=\xi \,\tau _{N}(\cdot )+(1-\xi )\delta _{0}(\cdot ), \end{aligned}$$
(8)

where \(\tau _{N}\) is a probability measure with no atom at \(0\), and \(\xi \in [0,1]\). If \(x_{t}\) has distribution \(\tau ,\) then the distribution of the next state \(x_{t+1}\) is given by

$$\begin{aligned} \tilde{\tau }(\cdot ):=G_{h^{*}}^{*}(\tau )(\cdot )=\sum \limits _{j=1}^{J}\int \limits _{S}g_{j}^{h^{*}}(x)\lambda _{j}(\cdot |x)\tau (dx)+\int \limits _{S}g_{0}^{h^{*}}(x)\tau (dx)\delta _{0}(\cdot ), \end{aligned}$$

where \(g_{j}^{h^{*}}(x):=g_{j}(x,h^{*}(x))\) for \(h^{*}\) an equilibrium time-consistent policy. Let \(S_{h^{*}}:=\left\{ x:g_{0}^{h^{*}}(x)=0\right\} \). Clearly, this is a compact set. We now construct an invariant distribution associated with \(h^{*}\). To do this, we impose some additional assumptions on the noise:

  • \(S_{h^*}\) is nonempty and \(0\notin S_{h^*}\),

  • for all \(j,\) \(\lambda _{j}\) has a Feller property, and we have

    $$\begin{aligned} (\forall x\in S_{h^{*}})\quad \sum \limits _{j=1}^{J}g_{j}^{h^{*}}(x)\lambda _{j}(S_{h^{*}}|x)=1. \end{aligned}$$

Given these assumptions, suppose \(\tau \) is invariant, i.e., \(\tilde{\tau } =G_{h^{*}}^{*}(\tau )=\tau \). We now characterize the invariant distribution under our added assumptions. From equation (8), we have

$$\begin{aligned} \tilde{\tau }(\cdot )&:= \xi \sum \limits _{j=1}^{J}\int \limits _{S}g_{j}^{h^{*}}(x)\lambda _{j}(\cdot |x)\tau _{N}(dx) \!+\!\xi \int \limits _{S}g_{0}^{h^{*}}(x)\tau _{N}(dx)\delta _{0}(\cdot )\!+\!(1\!-\!\xi )g_{0}^{h^{*}}(0)\delta _{0}(\cdot ), \\&= \xi \sum \limits _{j=1}^{J}\int \limits _{S}g_{j}^{h^{*}}(x)\lambda _{j}(\cdot |x)\tau _{N}(dx) +\left( \xi \int \limits _{S}g_{0}^{h^{*}}(x)\tau _{N}(dx)+(1-\xi )\right) \delta _{0}(\cdot ). \end{aligned}$$

As \(\tau \) is invariant and \(g_{0}(\cdot )\ge 0,\) by (8) we have \( \int _{S}g_{0}^{h^{*}}(x)\tau _{N}(dx)=0\) unless \(\xi =0\). But if \(\xi =0\), then \(\tau \) is trivial; hence, we may assume \(\xi \ne 0\). Since \(g_{0}^{h^{*}}\ge 0\), \(\tau _{N}\) must have support in the set \( S_{h^{*}}\). By (8), we have

$$\begin{aligned} \tau _{N}(\cdot )=\sum \limits _{j=1}^{J}\int \limits _{S_{h^{*}}}g_{j}^{h^{*}}(x)\lambda _{j}(\cdot |x)\tau _{N}(dx). \end{aligned}$$

The last equality follows from the fact that \(g_{0}^{h^{*}}\equiv 0\) on \(S_{h^{*}}\).

Now, consider the set of probability distributions with support on \( S_{h^{*}}\) (denote it \(\Delta (S_{h^{*}})\)). Since \(S_{h^{*}}\) is compact, by the Prohorov Theorem [e.g., see Sect. 5 in Billingsley (1999)], the space \(\Delta (S_{h^{*}})\) is compact in the weak topology. Define the following operator on \(\Delta (S_{h^{*}})\):

$$\begin{aligned} \mathcal {T}(\mu ):=\sum \limits _{j=1}^{J}\int \limits _{S_{h^{*}}}g_{j}^{h^{*}}(x)\lambda _{j}(\cdot |x)\mu (dx). \end{aligned}$$

We hence have:

$$\begin{aligned} \sum \limits _{j=1}^{J}\int \limits _{S_{h^{*}}}g_{j}^{h^{*}}(x)\lambda _{j}(S_{h^{*}}|x)\tau _{N}(dx)=1. \end{aligned}$$

Hence, \(\mathcal {T}:\Delta (S_{h^{*}})\rightarrow \Delta (S_{h^{*}})\). We now show \(\mathcal {T}\) has a fixed point. Notice that as \(\Delta (S_{h^{*}})\) is nonempty, convex and compact, to show the existence of a fixed point, it suffices to show \(\mathcal {T}\) is continuous in the weak topology. Let \(\mu _{n}\rightarrow \mu \) weakly, and let \(f:S_{h^{*}}\rightarrow \mathbb {R}\) be a continuous function. Then, we have

$$\begin{aligned} \int \limits _{S_{h^{*}}}f(x)\mathcal {T}(\mu _{n})(dx)=\sum \limits _{j=1}^{J}\int \limits _{S_{h^{*}}}g_{j}^{h^{*}}(x)\int \limits _{S_{h^{*}}}f(y)\lambda _{j}(dy|x)\mu _{n}(dx). \end{aligned}$$

By the Feller property of the \(\lambda _{j}\), the map \(x\rightarrow \int \limits _{S_{h^{*}}}f(y)\lambda _{j}(dy|x)\) is continuous; as \(h^{*}\) is continuous, so is each \(g_{j}^{h^{*}}\), and hence the integrand is a bounded continuous function of \(x\). We therefore have

$$\begin{aligned} \int \limits _{S_{h^{*}}}f(y)\mathcal {T}(\mu _{n})(dy)\rightarrow \int \limits _{S_{h^{*}}}f(y)\mathcal {T}(\mu )(dy), \end{aligned}$$

which implies \(\mathcal {T}\) is continuous in the weak topology. Then, by the Schauder–Tychonoff Theorem, \(\mathcal {T}\) has a fixed point \(\tau _{N}^{*} \), and the SME invariant distribution takes the form \(\tau (\cdot )=\xi \tau _{N}^{*}(\cdot )+(1-\xi )\delta _{0}(\cdot )\).
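Numerically, the fixed point of \(\mathcal {T}\) can be approximated by discretizing \(S\) and power-iterating the resulting stochastic matrix. The kernel below is a toy of our own construction satisfying the bullet-point conditions: \(J=1\), \(g^{h^{*}}(x)=\min (1,2x)\) (so \(S_{h^{*}}=[1/2,1]\) and \(0\notin S_{h^{*}}\)), and \(\lambda (\cdot |x)=\mathcal {U}(1/2+x/4,\,3/4+x/4)\), which has the Feller property and maps \(S_{h^{*}}\) into itself.

```python
import numpy as np

# Toy kernel satisfying the conditions of Example 2 (our own illustration):
# S = [0,1], J = 1, g^{h*}(x) = min(1, 2x), so S_{h*} = [1/2, 1] and 0 is not in it;
# lambda(.|x) = Uniform[1/2 + x/4, 3/4 + x/4], which maps S_{h*} into itself.
N = 200
edges = np.linspace(0.0, 1.0, N + 1)
mid = 0.5 * (edges[:-1] + edges[1:])

def lam_row(x):
    # cell probabilities of Uniform[lo, hi]; [lo, hi] stays inside [0, 1], so rows sum to 1
    lo, hi = 0.5 + x / 4.0, 0.75 + x / 4.0
    overlap = np.clip(np.minimum(edges[1:], hi) - np.maximum(edges[:-1], lo), 0.0, None)
    return overlap / (hi - lo)

P = np.array([lam_row(x) for x in mid])   # discretized operator T on Delta(S_{h*})

tau = np.where(mid >= 0.5, 1.0, 0.0)      # start from Uniform on S_{h*} (no atom at 0)
tau /= tau.sum()
for _ in range(200):
    tau = tau @ P                         # power iteration: tau_{t+1} = T(tau_t)

resid = np.abs(tau - tau @ P).sum()       # L1 invariance residual
```

The limit plays the role of \(\tau _{N}^{*}\); the SME then takes the form \(\tau =\xi \tau _{N}^{*}+(1-\xi )\delta _{0}\) exactly as in the example. The checks confirm invariance, conservation of mass, and that no mass escapes \(S_{h^{*}}\).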

In the next example, we construct another situation where a SMNE \(h^{*}\) has a non-trivial SME invariant distribution. In this case, the SME is given as a convex combination of uniform distributions on a fixed interval, and the Dirac delta centered at zero.

Example 3

Let \(J=1, \bar{S}=5, \lambda (\cdot ):=\mathcal {U}(2,5)\) (i.e., \(\lambda \) does not depend on either \(j\) or \(x\), and is the uniform distribution on the interval \([2,5]\)), \(u(a)=\sqrt{a}\) and \(g(x,a)=\min \left( \sqrt{x-a} ,1\right) \). Assume that \(\beta \) and \(\delta \) satisfy \(\delta +\delta \beta \frac{14}{9}\ge 1\). First, we show that for \(x\in [2,5]\), \(h^{*}(x)=x-1\). Let \(x>2\) be an initial state, and let \(v_{0}\) be the payoff under the strategy \(h^{*}(x)=x-1\) for \(x>2\). Then,

$$\begin{aligned} v_{0}(x)=\sqrt{x-1}+\delta \int \limits _{S}v_{0}(y)\lambda (dy). \end{aligned}$$
(9)

Since \(\mathrm{{supp}}\,(\lambda )=[2,5]\), from (9) we have

$$\begin{aligned} \int \limits _{S}v_{0}(y)\lambda (dy)=\frac{14}{9}+\delta \int \limits _{S}v_{0}(y)\lambda (dy), \end{aligned}$$

hence

$$\begin{aligned} \int \limits _{S}v_{0}(y)\lambda (dy)=\frac{\frac{14}{9}}{1-\delta }. \end{aligned}$$

Since \(h^{*}\) is an SMNE, it must solve the maximization problem:

$$\begin{aligned} a\in [0,x]\rightarrow \sqrt{a}+\delta \beta \int \limits _{S}v_{0}(y)\lambda (dy)\min \left( \sqrt{x-a},1\right) , \end{aligned}$$
$$\begin{aligned} =\sqrt{a}+\delta \beta \frac{\frac{14}{9}}{1-\delta }\min (\sqrt{x-a} ,1):=w(a). \end{aligned}$$

Notice that

$$\begin{aligned} w(a)&= \sqrt{a}+\delta \beta \frac{\frac{14}{9}}{1-\delta },\quad \text { for }a\le x-1,\\ w(a)&= \sqrt{a}+\beta \delta \frac{\frac{14}{9}}{1-\delta }\sqrt{x-a},\text { else}. \end{aligned}$$

Further, the right derivative of \(w\) at \(x-1\) is

$$\begin{aligned} \left. \frac{\partial w}{\partial a}\right| _{a=x-1}=\frac{1}{2\sqrt{x-1} }-\frac{1}{2}\beta \delta \frac{\frac{14}{9}}{1-\delta }\le \frac{1}{2} \left( 1-\beta \delta \frac{\frac{14}{9}}{1-\delta }\right) \le 0, \end{aligned}$$

whenever \(\delta +\delta \beta \frac{14}{9}\ge 1\). This implies that \(x-1\) is the optimal policy at any state \(x\in [2,5]\). We therefore have that \(\tau ^{*}(\cdot ):=\xi \mathcal {U}(2,5)+(1-\xi )\delta _{0}\) is an SME invariant distribution under the strategy \(h^{*}\) for arbitrary \(\xi \in [0,1]\). Indeed, if \(\tau _{t}=^{d}\tau \) thenFootnote 25:

$$\begin{aligned} \tau _{t+1}=^{d}\xi \lambda +(1-\xi )\delta _{0}=^{d}\tau _{t}. \end{aligned}$$
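The optimality claim in Example 3 is easy to verify by brute force for one admissible parameter pair (say \(\beta =0.8\), \(\delta =0.96\), for which \(\delta +\delta \beta \frac{14}{9}\approx 2.15\ge 1\)):

```python
import numpy as np

# Brute-force check of Example 3: with K = beta*delta*(14/9)/(1-delta), the map
# w(a) = sqrt(a) + K*min(sqrt(x-a), 1) is maximized on [0, x] at a = x - 1 for x in [2, 5].
beta, delta = 0.8, 0.96            # one admissible pair: delta + delta*beta*14/9 >= 1
K = beta * delta * (14.0 / 9.0) / (1.0 - delta)

def w(a, x):
    return np.sqrt(a) + K * np.minimum(np.sqrt(np.clip(x - a, 0.0, None)), 1.0)

h_star = {}
for x in (2.0, 3.0, 4.0, 5.0):
    a = np.linspace(0.0, x, 200001)
    h_star[x] = a[np.argmax(w(a, x))]
```

On \([0,x-1]\) the min-term is pinned at \(1\) and \(w\) is strictly increasing, while past \(x-1\) the right derivative is negative, so the grid argmax sits at \(x-1\) up to one grid step, as asserted below.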

6 Related results and conclusion

It is important to remember that equilibrium non-existence and multiplicity in the class of quasi-hyperbolic games we study have constituted a significant challenge for applied economists who seek to study models where such dynamic consistency failures play a key role. They have been equally challenging for researchers seeking tractable numerical approaches to computing SMNE in these (and related) dynamic games [e.g., see the discussion in Krusell and Smith (2003) or Judd (2004)]. On the one hand, Krusell et al. (2002a) propose a generalized Euler equation method for a version of a hyperbolic discounting consumer and obtain an explicit solution for logarithmic utility and Cobb–Douglas production, but this covers only a single example. On the other hand, Judd (2004) uses a generalized Euler equation approach to analyze smooth time-consistent policies and proposes a perturbation method for calculating them. The problem here is providing conditions under which, at every point in the state space, the generalized Euler equations constitute a sufficient first-order theory for the agent's value function in the equilibrium of the game.Footnote 26 Concentrating on non-smooth policies, Krusell and Smith (2003) define a step function equilibrium and show its existence, as well as the resulting indeterminacy of steady-state capital levels. Further, in a deterministic setting, general existence results for optimal policies under quasi-geometric discounting can be obtained using techniques proposed by Goldman (1980) for finite horizon economies, by Harris (1985) for the infinite horizon, or by Feinberg and Shwartz (1995) in a generalized discounting setting.

Summarizing, from a technical point of view, the tools used to show existence and characterize Markovian policies are varied and motivated by specific applications or problems under study. Still, a general framework for studying (analytically and numerically) possibly nonsmooth SMNE has been missing. To circumvent some of these predicaments in a unified setup, authors have also added noise to the decision problems or the relevant dynamic games. Specifically, in a (recursive) decision approach, by adding noise (making payoff discontinuities negligible), Caplin and Leahy (2006) prove the existence of a recursively optimal plan for a finite horizon decision problem and general utility functions. Similarly, Bernheim and Ray (1986) show that adding enough noise to the dynamic game (to smooth discontinuities away) guarantees the existence of SMNE. Such a stochastic game approach was later developed by Harris and Laibson (2001), who characterize the set of smooth SMNE by (generalized) first-order conditions. Finally, Balbus and Nowak (2008) give conditions for SMNE existence in an infinite horizon, hyperbolic discounting stochastic game with many players in each generationFootnote 27.

It is worth mentioning that authors have also analyzed optimal but not necessarily time-consistent policies. For infinite horizon decision problems, Kydland and Prescott (1980, henceforth KP) notice that, for the problem of finding optimal policies to be recursive, the state space of an appropriately defined value function must incorporate pseudo-state variables such as Lagrange multipliers. The KP method is linked to Abreu et al. (1990, henceforth APS) type arguments: by adding appropriate noise to the time-consistency game, a characterization of all sequential equilibria using APS methods can be offered. This approach is undertaken by Bernheim et al. (1999), who analyze our problem using APS-type arguments. Specifically, they consider a set of (bounded) values for (sequential) subgame perfect equilibria in a Phelps and Pollak (1968) self-game and analyze all subsets of such values. They then construct a monotone (under set inclusion) operator on this set and numerically analyze its largest fixed point. Using this method, they show the existence of a sequential time-consistent policy and use it to analyze self-control in the context of a low asset trap.

Finally, the literature on self-control is broader than the specific problem of time-consistency and includes papers specifying preferences over menus that allow for temptation. That is, instead of taking a preference change as a primitive of the model, economists introduce preferences over menus that are time-consistent (i.e., do not change over time) but still allow for the modeling of self-control (via the so-called set-betweenness axiom).

Specifically, Gul and Pesendorfer (2001, henceforth GP) and Dekel et al. (2001, 2009, henceforth DLR) consider a general model of preferences over menus (of lotteries), from which a choice is made at a later date, and show that preferences over menus can be used to identify an agent's subjective beliefs regarding her future tastes and behavior. They explicitly model the cost of tomorrow's temptation as the difference between tomorrow's optimal decision and the current tempted decision.Footnote 28 GP also introduce overwhelming temptation preferences, or a Strotz representation, in which future decisions are always made according to the tempted preferences, which is exactly the case in our quasi-hyperbolic discounting problem. For an application of GP, see Krusell et al. (2002b), who study an asset pricing puzzle.

In fact, there are further links between the (stochastic) game methods used in this paper and the preference approach discussed above. Here we refer the reader to the paper of Benabou and Pycia (2002), who represent GP preferences by the outcomes of a two-period game of control between a “planner” and a “doer”. Similarly, Fudenberg and Levine (2006) present a stochastic game between a planner and a sequence of myopic doers, in which the doers choose actions and the planner chooses their costs. They show that the strategies and outcomes of their game are equivalent to solutions of a “planner” maximization problem under incentive-compatibility constraints. Fudenberg and Levine (2006) also discuss the relation between their game and the GP preference representation. Hence, a natural question arises about the applicability of our constructive (stochastic game or stochastic decision problem) methods to the Fudenberg and Levine (2006) or Benabou and Pycia (2002) games, and hence to the GP or DLR representations. This becomes especially important in view of the Dekel and Lipman (2012) random Strotz representation,Footnote 29 where the decision from the menu is constrained to actions that are incentive compatible for the (tempted) doer, but where the preferences of the doer are drawn from some probability distribution.

Finally, let us note that the quasi-hyperbolic discounting problem is linked to the problem of altruism toward successive generations [see Saez-Marti and Weibull (2005) for formal results]. This link can also be seen from a technical perspective, as stochastic game methods [see Balbus et al. (2014)] can be applied to both quasi-hyperbolic discounting and intergenerational altruism models [see Balbus et al. (2013)].

All in all, we think that our approach offers an interesting alternative to all of the contributions mentioned above, as it uses a stochastic games framework and directly attacks questions of existence, computation, and comparative statics.