1 Introduction

Time inconsistency, also called dynamic inconsistency, refers to the phenomenon that the long-term optimal decision policy determined at time 0 is no longer optimal when the truncated decision problem is reconsidered at time t. Many real-life decision problems are time inconsistent, such as whether to abstain from smoking. Clearly, the long-term optimal decision is to persist in abstaining from smoking. However, when it comes to action, the smoker may prefer to smoke today and abstain tomorrow. The reason is that, compared with the present temptation (the pleasure of smoking), the decision maker heavily discounts the bad consequences received in the future (bad health), which is called the present bias (see Thaler 1981; Loewenstein and Prelec 1992). In the literature, the time inconsistent problem is also called the self-control problem, as the decision maker needs to exert proper self-control in order to achieve a good long-term performance.

Essentially, when a decision maker faces a dynamic decision-making problem, his/her preferences at different time instants for the corresponding tail parts of the time horizon could be time-varying and/or state dependent. We have all faced occasions on which we change our minds, but usually we do not take extraordinary steps to prevent ourselves from deviating from the original plan. The only circumstances in which we would want to commit ourselves to our planned course of action are those in which we have good reason to believe that a later change of preferences would be a mistake. In the smoking case, the smoker definitely believes that abstaining from smoking is good for health. Thus, the crucial question is this: if I know I am going to change my mind about my preferences, when and how should I take action to restrict my future behaviour?

In this article, we start by summarizing some progress in dealing with the time inconsistency and self-control issue, especially in the dynamic investment area. We then describe some major open questions in this area.

2 Progress

2.1 Separable Problem Versus Non-separable Problem

The time inconsistent problems can be classified into two categories: separable ones and non-separable ones, which are caused by the present bias and the non-separable preference, respectively. Dynamic investment and consumption problems with quasi-hyperbolic discounting and dynamic mean-variance portfolio selection problems are two salient time inconsistent decision problems in the literature. The past few years have seen substantial progress in our understanding of the time consistency issue. Much of the progress concerns these two problems.

For the dynamic investment and consumption problem with quasi-hyperbolic discounting, the decision maker’s time preference, which represents the value at time t of $1 received at future time s, is described by

$$\displaystyle{ D_{t}(s) = \left \{\begin{array}{ll} 1, &\mbox{ if }s = t, \\ \beta \delta ^{(s-t)},&\mbox{ if }s > t, \end{array} \right. }$$

where 0 < β < 1 is the quasi-hyperbolic discounting parameter and represents the short-run discounting, and δ represents the long-run discounting (see Laibson 1997; O’Donoghue and Rabin 1999). The quasi-hyperbolic discounting is a typical and well-documented time preference with present bias, under which the decision maker tends to underestimate the value of payoff in the future. The decision maker’s investment objective at time t is the expected sum of discounted utilities of the future consumptions and the terminal wealth,

$$\displaystyle{ \mathbb{E}_{t}\left [\sum _{s=t}^{T-1}D_{ t}(s)U(c(s))\right ] + D_{t}(T)\mathbb{E}_{t}[U(X(T))]. }$$

Due to the existence of the short-run discounting parameter β, the preferences at different time instants are inconsistent. The decision maker may switch his/her mind at a later time t and prefer to consume more than his/her original plan to maximize his/her short-term preference (see Thaler and Shefrin 1981). The smaller the parameter β, the larger the conflict between the long-term optimal consumption plan and the short-term optimal consumption plan.
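The preference reversal induced by β can be illustrated with a small numerical sketch. The rewards, dates and parameter values below (β = 0.7, δ = 0.95, a reward of 10 at time 1 versus 12 at time 2) are hypothetical and chosen only for illustration; the function names are likewise assumptions and do not come from the cited literature.

```python
def discount(t, s, beta, delta):
    """Quasi-hyperbolic weight D_t(s): value at time t of one unit received at time s >= t."""
    if s < t:
        raise ValueError("s must not be earlier than t")
    return 1.0 if s == t else beta * delta ** (s - t)


def preferred(t, options, beta, delta):
    """Return the (date, reward) pair with the highest discounted value as seen from time t."""
    return max(options, key=lambda opt: discount(t, opt[0], beta, delta) * opt[1])


# Hypothetical choice: a smaller reward at time 1 or a larger reward at time 2.
OPTIONS = [(1, 10.0), (2, 12.0)]

# Present-biased decision maker (beta < 1): the ranking flips between time 0 and time 1.
plan_at_0 = preferred(0, OPTIONS, beta=0.7, delta=0.95)
choice_at_1 = preferred(1, OPTIONS, beta=0.7, delta=0.95)

# Exponential discounter (beta = 1): the ranking is the same at both dates.
exp_plan_at_0 = preferred(0, OPTIONS, beta=1.0, delta=0.95)
exp_choice_at_1 = preferred(1, OPTIONS, beta=1.0, delta=0.95)

print(plan_at_0, choice_at_1, exp_plan_at_0, exp_choice_at_1)
```

Seen from time 0, both rewards lie in the future and are scaled by the same β, so the larger, later reward is planned. At time 1 the earlier reward is no longer scaled by β while the later one still is, and the plan is abandoned.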

For the dynamic mean-variance portfolio selection problem, the decision maker’s preference at time t is a weighted combination of the conditional expected value and the conditional variance of the terminal wealth,

$$\displaystyle{ \mathbb{E}_{t}[X(T)] -\lambda \mbox{ Var}_{t}(X(T)), }$$

where λ > 0 is the risk aversion parameter. As the variance term does not satisfy the smoothing property, i.e., \(\mbox{Var}_{t}(X(T))\neq \mbox{Var}_{t}(\mbox{Var}_{s}(X(T)))\) for s > t, the preferences at different time instants are inconsistent. After a mean-variance investor derives his/her long-term optimal investment strategy at time 0, he/she could be tempted to adopt a different strategy at a later time t in order to achieve short-term mean-variance efficiency (see Basak and Chabakauri 2010; Cui et al. 2012).
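For reference, the source of this non-separability can be read off the law of total variance, a standard identity that is stated here as background and is not taken from the cited papers. For t < s ≤ T,

$$\displaystyle{ \mbox{ Var}_{t}(X(T)) = \mathbb{E}_{t}\left [\mbox{ Var}_{s}(X(T))\right ] + \mbox{ Var}_{t}\left (\mathbb{E}_{s}[X(T)]\right ). }$$

The second term depends on how the conditional mean at time s varies across states, so the variance at time t cannot be expressed through the conditional variances at time s alone. By contrast, the expectation satisfies \(\mathbb{E}_{t}[X(T)] = \mathbb{E}_{t}[\mathbb{E}_{s}[X(T)]]\), which is what makes an expected sum of discounted utilities separable.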

One significant difference between the above two problems is that the objectives of the dynamic investment and consumption problem with quasi-hyperbolic discounting at different time instants only involve the separable expectation operator (which can be represented as the expected sum of the future discounted performance measures), while the objectives of the dynamic mean-variance portfolio selection problem at different time instants involve a non-separable operator: variance. In addition, almost all widely adopted risk measures in static portfolio selection, including variance, semi-variance, value-at-risk (VaR) and conditional value-at-risk (CVaR), become time inconsistent in a dynamic mean-risk framework (see Boda and Filar 2006). Moreover, all those risk measures are of a non-separable nature.

2.2 Approaches Dealing with Time Inconsistency

Before summarizing the approaches, we need to know the mathematical meaning of time inconsistency. When a decision maker faces a time inconsistent dynamic decision problem, the overall objective for the entire time horizon under consideration does not conform with the local objective for a tail part of the entire time horizon. In the language of dynamic programming, Bellman’s principle of optimality is not applicable in such situations, as the global and local interests derived from their respective objectives are not consistent (see Artzner et al. 2007).

Clearly, one direct approach to overcoming the time inconsistency issue is to construct a time consistent decision model. This approach is widely studied in the field of dynamic risk measures and dynamic risk management. As a basic requirement, a suitable dynamic risk measure should possess a certain functional structure, such as

$$\displaystyle{ \rho _{t}(X_{T}) =\rho _{t}(-\rho _{s}(X_{T})),\quad t < s < T, }$$

in order to satisfy time consistency (see Rosazza Gianin 2006; Artzner et al. 2007; Jobert and Rogers 2008). When a dynamic risk measure is time consistent, it not only justifies the mathematical formulation for risk management, but also facilitates the solution process in finding the optimal decision (see Cherny 2010). However, this approach cannot be applied to the time inconsistent decision problems caused by present bias or non-separable preferences.
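The recursive structure above can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two-period tree, its probabilities and payoffs are hypothetical, and the entropic risk measure \(\rho (X) = \frac{1}{\gamma }\log \mathbb{E}[e^{-\gamma X}]\) is used as one well-known measure that satisfies the recursion. Variance, treated as a risk measure, is included as a contrast.

```python
import math

# Hypothetical two-period tree: time-1 state -> (probability of the state,
# list of (conditional probability, terminal payoff) pairs).
TREE = {
    "up": (0.5, [(0.5, 4.0), (0.5, 1.0)]),
    "down": (0.5, [(0.5, 0.0), (0.5, -2.0)]),
}


def entropic(dist, gamma=1.0):
    """Entropic risk of a discrete payoff distribution given as (prob, payoff) pairs."""
    return math.log(sum(p * math.exp(-gamma * x) for p, x in dist)) / gamma


def variance(dist):
    """Variance of a discrete payoff distribution given as (prob, payoff) pairs."""
    mean = sum(p * x for p, x in dist)
    return sum(p * (x - mean) ** 2 for p, x in dist)


def direct(rho):
    """rho_0(X_T): apply rho to the terminal payoff as seen from time 0."""
    flat = [(p1 * p2, x) for p1, branch in TREE.values() for p2, x in branch]
    return rho(flat)


def composed(rho):
    """rho_0(-rho_1(X_T)): evaluate risk at time 1 in each state, then at time 0."""
    inner = [(p1, -rho(branch)) for p1, branch in TREE.values()]
    return rho(inner)


entropic_gap = abs(direct(entropic) - composed(entropic))
variance_gap = abs(direct(variance) - composed(variance))
print(entropic_gap, variance_gap)
```

On this tree the two evaluations of the entropic measure agree up to rounding, while the two evaluations of the variance differ, which is the non-separability discussed in Sect. 2.1.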

In the literature, there are mainly three different solution schemes for dealing with the time inconsistency issue in the general dynamic decision problem. The first solution scheme is the so-called pre-committed policy approach. In this approach, the decision maker strictly adheres to the global long-term optimal decision policy over the entire time horizon. In other words, the decision maker only cares about the global objective and fully ignores local objectives. Such a policy is called the pre-committed policy. To adopt the pre-committed policy, the decision maker can make it the only feasible policy or the only economically reasonable policy via a strict self-control commitment or an external contractual commitment, which is not easy in reality. The 401(k) investment plan is one such example in reality: it forces employees not to withdraw pensions before retirement through a contractual penalty scheme (see Madrian and Shea 2001).

The second solution scheme is the time consistent policy approach. In this approach, the decision maker is aware of the inconsistency between the global objective and local objectives, but is unable to adhere to the pre-committed policy. Thus, the decision maker totally bows to local objectives (i.e., local temptations). As the decision maker at the current time instant has a decision advantage over the ones at future time instants, the decision maker’s problem can be modelled as an intrapersonal sequential game. In the game, the decision maker at any time instant acts as a Stackelberg leader and makes his/her “best” decision by taking into account his/her decisions in future periods. The corresponding subgame perfect Nash equilibrium decision policy is called the time consistent policy. This approach is widely applied to separable and non-separable time inconsistent decision problems. Laibson (1997), O’Donoghue and Rabin (1999, 2001), and Grenadier and Wang (2007) studied the time consistent policies for different financial decision problems with quasi-hyperbolic discounting. Basak and Chabakauri (2010), Hu et al. (2012), Lioui (2013), Chen et al. (2014b), Björk et al. (2014) and Cui et al. (2016) studied time consistent policies for the mean-variance preference under different market settings.
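The difference between the pre-committed policy and the time consistent policy can be sketched on a deliberately simple problem. The setting below is an assumption made for illustration and is not one of the models in the cited papers: a deterministic "cake-eating" problem with logarithmic utility, zero return on savings, and quasi-hyperbolic discounting. Under log utility both policies have closed forms, so no numerical optimization is needed; the function names are hypothetical.

```python
def annuity(delta, n):
    """delta + delta**2 + ... + delta**n (equals 0 when n == 0)."""
    return sum(delta ** k for k in range(1, n + 1))


def precommitted_path(x0, beta, delta, T):
    """Consumption plan that is optimal for the time-0 self and then followed rigidly.

    Self 0 weights u(c_0) by 1 and u(c_t), t >= 1, by beta * delta**t; with log
    utility the optimal plan splits the cake in proportion to these weights.
    """
    weights = [1.0] + [beta * delta ** t for t in range(1, T + 1)]
    total = sum(weights)
    return [x0 * w / total for w in weights]


def equilibrium_path(x0, beta, delta, T):
    """Consumption path of the intrapersonal game, obtained by backward induction.

    With log utility, the self at time t consumes the share
    1 / (1 + beta * (delta + ... + delta**(T - t))) of the remaining cake.
    """
    remaining, path = x0, []
    for t in range(T + 1):
        share = 1.0 / (1.0 + beta * annuity(delta, T - t))
        consumption = share * remaining
        path.append(consumption)
        remaining -= consumption
    return path


pre = precommitted_path(1.0, beta=0.7, delta=0.95, T=3)
eq = equilibrium_path(1.0, beta=0.7, delta=0.95, T=3)
print(pre)
print(eq)
```

In this sketch both policies start with the same consumption at time 0, but from time 1 onward each later self in the game consumes a larger share of the remaining cake than the pre-committed plan prescribes, since that self again applies β to everything beyond the current period.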

The third solution scheme is the self-control policy approach developed recently in the literature. In this approach, the decision maker intends to resolve conflicts between the long-term and short-term interests by reconciling the global objective and local objectives. To achieve this goal, the decision maker is required to possess a degree of willpower to exert self-control and resist the local temptation at future time instants (see Rachlin 2004). Several theoretical models with a self-control feature have been developed to guide decision makers in achieving such a balance. For example, O’Donoghue and Rabin (1999, 2001) proposed the partially naive decision maker assumption, which assumes that the decision maker can exert self-control to have a larger quasi-hyperbolic discounting parameter, i.e.,

$$\displaystyle{ \hat{D}_{t}(s) = \left \{\begin{array}{ll} 1, &\mbox{ if }s = t, \\ \hat{\beta }\delta ^{(s-t)},&\mbox{ if }s > t, \end{array} \right. }$$

where \(\beta <\hat{\beta }\leq 1\). With the larger quasi-hyperbolic discounting parameter \(\hat{\beta }\), the decision maker decreases the conflict between the long-term and short-term preferences, and thus achieves a better balance between the long-term and short-term interests. Gul and Pesendorfer (2001, 2004) proposed the axiomatic theory of self-control, under which the decision maker integrates the opportunity costs of deviating from the local optimal policies into the global objective. By doing so, a policy taking into account both the long-term and short-term interests is obtained. Thaler and Shefrin (1981) and Bénabou and Pycia (2002) proposed the planner-doer model, and Fudenberg and Levine (2006, 2012) proposed the dual-self model. Both models assume that the global self can influence the myopic preferences of local selves through different self-control schemes, and then derive the equilibrium policy between the global self and local selves. We call this type of policy the self-control policy in this review. These models with self-control features have been successfully applied to decision problems whose time inconsistency is caused by present bias (see O’Donoghue and Rabin 1999, 2001; Grenadier and Wang 2007; Chen et al. 2014a; Tian 2016). However, these models are not applicable to non-separable time inconsistent decision problems.

Besides the above approaches, Cui et al. (2012) and Cui et al. (2015a) proposed a new angle for dealing with the time inconsistency of the dynamic mean-variance portfolio problem. As the mean-variance problem is a multi-objective optimization problem, a multi-objective version of the principle of optimality is applied. In other words, any tail part of an efficient policy is also efficient for any realizable state at an intermediate period (see Li and Haimes 1987; Li 1990). In the spirit of this logic, Cui et al. (2012) extended the concept of time consistency to a relaxed version that incorporates efficiency, namely, time consistency in efficiency. By showing that the dynamic mean-variance formulation is not time consistent in efficiency, they demonstrated that the investor may have irrational local preferences of minimizing the risk and the return at the same time. Cui et al. (2012) relaxed the self-financing restriction and allowed the withdrawal of positive dollar amounts out of the market during the investment process. Furthermore, they proposed a better revised policy, which achieves the original mean-variance pair but also obtains some extra (positive) dollar amounts with a strictly positive probability under certain probability distribution assumptions. Moreover, Cui et al. (2015a) studied the effect of portfolio constraints on the time consistency in efficiency of convex cone-constrained markets and established a general procedure for constructing dynamic mean-variance portfolio selection problems that are time consistent in efficiency by introducing suitable portfolio constraints.

3 Challenges

3.1 Dynamic Mean-Risk Portfolio Optimization Problems

Mean-risk portfolio selection models are widely used in portfolio management practice. Although most of these models suffer from the problem of time inconsistency, only the dynamic mean-variance model has attracted much attention in the literature. Moreover, the published works on the time inconsistency issue of dynamic mean-variance models mainly concentrate on the time consistent policy and the revised policy. Thus, there are several open questions in this field.

The first question is how to derive the time consistent policies for the mean-risk models beyond mean-variance. As these models are non-separable, no analytical or semi-analytical policy is available. Thus, suitable numerical methods should be developed. Furthermore, the properties of the time consistent policies under different risk aversion parameter settings are worth investigating. Cui and Shi (2014) made an attempt to analyse the time consistent policy for the multi-period mean-CVaR model with finite states.

The second question is whether the mean-risk models beyond mean-variance satisfy time consistency in efficiency. If not, the decision maker may devote himself/herself to constructing revised policies, which are better than the pre-committed policy.

The third question is how to construct the self-control policies for the mean-risk models. Although the existing theoretical models with a self-control feature are not directly applicable to the non-separable mean-risk models, the idea is very useful in constructing new theoretical models under a non-separable mean-risk framework. To the best of our knowledge, there is some preliminary work in this direction. By extending the planner-doer model of Thaler and Shefrin (1981) and Bénabou and Pycia (2002), Cui et al. (2017) developed a two-tier planner-doer game framework with self-coordination, which is theoretically applicable to discrete-time non-separable decision problems. They successfully applied the proposed framework to the dynamic mean-variance portfolio selection problem and a two-period mean-CVaR portfolio selection problem. Similarly, Cui et al. (2015b) extended the dual-self model of Fudenberg and Levine (2006, 2012) and proposed a two-tier dual-self game model, which is theoretically applicable to continuous-time non-separable decision problems. Although the above two new frameworks have important theoretical value, how to apply them to construct suitable self-coordination schemes and compute the corresponding self-control policies for the mean-risk models beyond mean-variance is still unclear.

3.2 Time Inconsistency Generated by Probability Weighting

The probability weighting function, proposed by Tversky and Kahneman (1992), transforms objective probabilities into decision weights. The original motivation for this transformation function was the simultaneous demand many people have for both lotteries and insurance. Typically, people prefer a 0.001 chance of $50,000 to a certain $50, but also prefer to pay $50 rather than face a 0.001 chance of a $50,000 loss. This combination of behaviours is difficult to explain under expected utility theory. However, under the probability weighting framework, the unlikely events (gaining or losing $50,000) are overweighted, thereby explaining these choices. Probability weighting is a key feature of many behavioural portfolio selection models, such as the rank-dependent utility model (see Schmeidler 1989; Abdellaoui 2002) and the cumulative prospect theory model (see Tversky and Kahneman 1992; He and Zhou 2011).
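The lottery and insurance example can be reproduced with a short sketch. It assumes the one-parameter weighting function \(w(p) = p^{\gamma }/(p^{\gamma } + (1 - p)^{\gamma })^{1/\gamma }\) and the power value function of Tversky and Kahneman (1992), together with their commonly quoted parameter estimates (γ = 0.61 for gains, γ = 0.69 for losses, curvature 0.88, loss aversion 2.25). These choices, and the function names, are assumptions made for illustration; other weighting functions would serve equally well.

```python
def tk_weight(p, gamma):
    """Inverse-S-shaped probability weighting function w(p) for 0 <= p <= 1."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def value(x, alpha=0.88, loss_aversion=2.25):
    """Power value function: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -loss_aversion * (-x) ** alpha


P, BIG, SMALL = 0.001, 50_000.0, 50.0

# Gains: a 0.001 chance of $50,000 versus a certain $50.
lottery = tk_weight(P, gamma=0.61) * value(BIG)
sure_gain = value(SMALL)

# Losses: a 0.001 chance of losing $50,000 versus paying $50 for certain.
risky_loss = tk_weight(P, gamma=0.69) * value(-BIG)
sure_loss = value(-SMALL)

print("prefers lottery:", lottery > sure_gain)
print("prefers insurance:", sure_loss > risky_loss)
```

With these parameters the decision weight attached to the 0.001 event is well above 0.001 in both domains, so the lottery is preferred to the sure gain and the sure payment is preferred to the risky loss.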

In a dynamic setting, probability weighting also generates time inconsistency (see Barberis 2012), which, once again, lies in the domain of non-separability. This inconsistency may be useful for understanding some real trading behaviours. For example, people sometimes hold on to losing investments longer than they were planning to, which is known as the disposition effect in the literature (see Odean 1998). However, there is relatively little research on it, especially when compared with the large literature on the inconsistency generated by present bias. Shi et al. (2015) suggested one possible approach to analysing the time inconsistency generated by probability weighting in a dynamic setting, but other approaches are surely also possible and deserve more study.

3.3 Data Challenge

As data become easier and easier to collect, decision makers have begun to formulate their decision problems based on rich data. On the one hand, they can use the rich data to describe the dynamics of the uncertainties, which makes the construction of dynamic decision problems possible. On the other hand, they may build data-driven decision problems by fully using the rich data (see Bertsimas and Thiele 2006; Delage and Ye 2010; Hou and Wang 2013; Huh et al. 2011 for data-driven decision-making examples). Following these two directions, there will be more and more data-driven dynamic decision problems in research and practice.

In general, these data-driven dynamic decision problems are time inconsistent. Compared with the dynamic decision problems with explicit assumptions on the uncertainties, the data-driven dynamic decision problems may introduce great computational challenges.