1 Introduction

From a scientific perspective, climate change is a problem in which the very long run is clearly important. A significant portion of carbon dioxide emissions remains in the atmosphere for “many thousands of years” (Solomon et al. 2008), and damages attributable to these emissions include irreversible changes, such as the potential collapse of the Greenland or West Antarctic ice sheets. But viewed through the lens of cost-benefit analysis, typical integrated assessment models suggest that climate outcomes in the long run have little or no bearing on current policy decisions. In particular, the optimal carbon tax, a summary measure of the economic value of current emission reductions, is largely insensitive to climate outcomes beyond a hundred years or so.

The source of this strong result is economic discounting (Footnote 1). Discounting converts costs and benefits at different times into a common unit of account that reflects foregone investment returns when a payoff stream is received in the future rather than today. In typical economic models, the rate at which future payoffs are discounted is governed by two parameters: the rate of time preference (the rate at which future welfare, or “utility”, is discounted) and the consumption elasticity, which reflects society’s aversion to consumption inequality across generations. In economics, these parameters are most commonly calibrated by (i) assuming that they are constant and (ii) choosing them so savings decisions are consistent with data on financial transactions in real economies. This implies a discount rate (on real consumption goods) of about 5.5 percent (Nordhaus 2008). Thus, a billion dollars in 200 years has the value today of an economy car ($ 17K), while a billion dollars in 400 years has the value today of a gumball (28 ¢).
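The arithmetic behind the car and gumball comparison can be checked in a few lines. The sketch below assumes continuous compounding at a constant annual rate, which is not stated in the text but reproduces the quoted figures; the function name `present_value` is illustrative.

```python
import math


def present_value(amount, rate, years):
    """Value today of `amount` received `years` from now, discounted at a
    constant annual rate with continuous compounding (an assumption)."""
    return amount * math.exp(-rate * years)


rate = 0.055  # 5.5 percent, the discount rate quoted in the text

pv_200 = present_value(1e9, rate, 200)
pv_400 = present_value(1e9, rate, 400)

print(f"$1bn received in 200 years is worth ${pv_200:,.0f} today")
print(f"$1bn received in 400 years is worth ${pv_400:.2f} today")
```

Because the discount factor is exponential in the horizon, doubling the horizon squares the factor, which is why the value falls from thousands of dollars to cents.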

While insensitivity to long run outcomes is potentially unsettling, the argument for calibrating the discounting parameters to match historical interest rates cannot be readily dismissed. Suppose, as in the widely publicized Stern Review on the Economics of Climate Change (Stern 2007), that we instead adopt a lower discount rate. Motivated by ethical considerations, Stern (2007) adopts (constant) discounting parameters that imply a discount rate (on real consumption goods) of about 1.4 percent. The trouble is that the assumed parameters imply savings rates twice as high as observed in real economies (Nordhaus 2008), so the high value attributed to climate policy reflects an unrealistically high willingness to transfer wealth to future generations.

The accompanying assumption, a constant rate of time preference, is not nearly as compelling. Since it was suggested by Samuelson (1937), a constant rate of time preference has been part of the dominant model in economics for evaluating welfare outcomes over time. But Samuelson himself stated that “any connection between utility as discussed here and any welfare concept is disavowed” (Samuelson 1937; Frederick et al. 2002). Meanwhile, the assumption stands in conflict with experimental evidence from psychology and economics (Thaler 1981; Loewenstein 1987; Cropper et al. 1994). For example, Thaler (1981) asks subjects to indicate the dollar amount, to be received in a month, a year, or ten years, that would make them indifferent to receiving $ 15 today. The mean responses imply discount rates of 345 %, 120 %, and 19 % over the respective horizons, consistent with declining rates of time preference (DRTP). At long horizons, DRTP are consistent with the observation that people can more readily imagine differences between the current generation and the generation in 20 years than they can differences between the generation in 300 years and that in 320 years (Rubinstein 2003; Karp and Tsur 2011; Layton and Brown 2000).
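The discount rates quoted from Thaler (1981) can be recovered from indifference amounts. The sketch below assumes delayed amounts of $ 20, $ 50, and $ 100 (recalled from Thaler's study and not given in the text above, so they should be checked against the original) and assumes continuous compounding.

```python
import math


def implied_annual_rate(today, later, years):
    """Continuously compounded annual rate r solving later = today * exp(r * years)."""
    return math.log(later / today) / years


# (delayed amount, delay in years); the amounts are assumptions, see the lead-in.
responses = [(20.0, 1 / 12), (50.0, 1.0), (100.0, 10.0)]

rates = [implied_annual_rate(15.0, later, years) for later, years in responses]

for (later, years), r in zip(responses, rates):
    print(f"${later:.0f} after {years:.2f} years implies {100 * r:.0f} % per year")
```

Under these assumptions the implied annual rate falls steeply as the delay lengthens, which is the pattern the text describes as declining rates of time preference.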

A variety of normative rationales for DRTP have also been suggested. These are arguably more important for motivating policy since they speak to how policy ideally should be set. One of the strongest arguments applies when agents in society disagree about the appropriate rate of time preference. In this case, a utilitarian social planner who seeks to aggregate preferences will act “as if” using DRTP. The argument is conceptually related to seminal work by Weitzman (1998), and it has been developed in a variety of settings by Li and Löfgren (2000), Gollier and Zeckhauser (2005), Heal and Millner (2013), and Heal and Millner (2014). We build on a version of this argument in the application.

Incorporating DRTP into policy analysis introduces complexities that do not arise when time preference rates are constant. Consider a decision maker (or a sequence of decision makers) with DRTP who plans a series of actions over time. Today, when evaluating outcomes between ten and fifteen years out, she uses a relatively low time preference rate. But when the time comes to implement the year ten action, she (or her successor) would reevaluate the decision by applying a relatively high short-term rate over the same, now proximate, dates. This shift in preferences can lead to “time inconsistent” investment decisions. In principle, the difficulty is overcome by requiring agents each period to correctly anticipate how future agents are expected to respond (Strotz 1955) (Footnote 2). But solving the problem in this way can be difficult in practice. Indeed, in most quantitatively interesting integrated assessment models, solving for the time-consistent solution is numerically infeasible.
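The preference shift described above can be illustrated numerically. The sketch below uses a hypothetical discount factor that depends only on the delay from the decision date and mixes a low and a high exponential rate; the functional form and all parameter values are assumptions chosen for illustration, not the paper's calibration.

```python
import math


def discount_factor(delay, p_low=0.5, r_low=0.001, r_high=0.05):
    """Weight on a payoff `delay` years after the decision date.
    A mixture of two exponentials; every parameter value is hypothetical."""
    return p_low * math.exp(-r_low * delay) + (1 - p_low) * math.exp(-r_high * delay)


def annualized_rate(start, end, now=0):
    """Average annual rate a planner standing at `now` applies between
    calendar years `start` and `end`."""
    d_start = discount_factor(start - now)
    d_end = discount_factor(end - now)
    return math.log(d_start / d_end) / (end - start)


planned = annualized_rate(10, 15, now=0)   # years 10 to 15 viewed from today
revised = annualized_rate(10, 15, now=10)  # the same dates viewed from year 10

print(f"rate planned today: {100 * planned:.2f} % per year")
print(f"rate applied later: {100 * revised:.2f} % per year")
```

With a constant rate the two numbers would coincide. Here the later planner applies a higher rate to the same interval, so a plan made today would be revised when year ten arrives.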

To make progress, the climate policy literature that considers DRTP has either employed relatively simple climate-economy models (Karp 2007; Fujii and Karp 2008) or restricted time preference rates to differ over two periods only (Karp 2005; Gerlagh and Liski 2013). The latter assumption is referred to as quasi-hyperbolic discounting (Phelps and Pollak 1968; Laibson 1997). It simplifies the analysis, but is not quantitatively plausible for a long term problem like climate change. Iverson (2013a), which builds on Gerlagh and Liski (2013), is the first paper to solve for (time consistent) optimal carbon taxes in a quantitatively plausible integrated assessment model while allowing for arbitrary non-constant time preference rates. It employs a recent integrated assessment model developed in Golosov et al. (2014).

With DRTP, it is possible for the model to match historical evidence on savings rates and for the current carbon tax to respond in a sensitive way to climate outcomes in the distant future. This point is emphasized in Gerlagh and Liski (2013). We evaluate the same question while building on the model in Iverson (2013a). The paper goes beyond prior work by providing a way to explicitly decompose the contribution to the optimal tax from consumption impacts at different times (Footnote 3). This makes it possible to study interactions between discounting assumptions and long run features of the carbon cycle in determining the importance of long run outcomes for current policy. Among other things, our results have implications for assessing the potential for scientific evidence regarding long run carbon decay to influence the current case for climate action.

2 Model

The paper employs the integrated assessment model of Golosov et al. (2014) (the “GHKT model”). The model structure emulates the widely-used DICE model (Nordhaus 2013). It combines a neoclassical economic growth model where carbon emissions arise as a byproduct of production activities with a carbon cycle that describes the manner in which carbon emissions accumulate in the atmosphere. Accumulated pollution stocks cause damages that feed back on economic growth. The model equations are presented in the Appendix.

The GHKT model differs from DICE in three important ways. First, the climate module abstracts from temperature inertia. As a result, the dynamic link between emissions and damages is somewhat less realistic than in DICE. Fortunately, Gerlagh and Liski (2013) show that temperature inertia can be recovered within the GHKT model by modifying the climate module in a tractable way. We build on Gerlagh and Liski (2013) to include temperature inertia in our analysis. Second, the model contains a full “general equilibrium” specification of clean and dirty energy sectors. Due to this, the model does a better job simulating the effect of a carbon tax in the equilibrium of a competitive market economy. Finally, the GHKT model allows for stochastic uncertainty about future damages.

Analytic tractability is obtained in Golosov et al. (2014), and in Gerlagh and Liski (2013) and Iverson (2013a), by imposing restrictive functional form assumptions. These are presented in the Appendix. As argued in Golosov et al. (2014), the assumptions are close to those in most of the prior climate economics literature. Nevertheless, they have some strong consequences. First, they effectively shut down strategic interactions among generations (Iverson 2013a; Karp 2014). This feature arises because the equilibrium of the model is linear in the stock of carbon. Second, they have restrictive implications when modeling the time path of fossil fuel extraction (Rezai and van der Ploeg 2014).

The employed model simplifies the “full” GHKT model by assuming that fossil fuel reserves are not used up along the optimal path. Provided coal extraction costs remain low for the foreseeable future, this assumption is unlikely to matter much for the analysis. This is the case because coal is the most important source of carbon emissions in long run analyses (van der Ploeg and Withagen 2012), and coal deposits are not used up along the optimal path under plausible parameter values in typical integrated assessment models. This simple interpretation breaks down, however, if the marginal extraction cost of coal increases significantly as more coal is extracted.

2.1 Optimal carbon taxes

As shown in Gerlagh and Liski (2013) and Iverson (2013a), the GHKT model provides a highly convenient setting in which to consider the implications of DRTP for climate policy. Gerlagh and Liski (2013) use the model to derive an explicit formula for the optimal carbon tax when time preference rates differ over two periods (so-called “quasi-hyperbolic” discounting), and Iverson (2013a) derives an analogous formula when time preference rates vary arbitrarily over time. We employ Iverson’s formula. It reduces to the optimal tax formula in Golosov et al. (2014) when discounting is constant, and it reduces to the tax formula in Gerlagh and Liski (2013) when discounting is quasi-hyperbolic (Footnote 4).

Without corrective policy, consumers and firms in a market economy largely ignore the environmental burden associated with emission-generating activities because these costs are external—they fall primarily on people and societies distant in space and time. The optimal carbon tax is the price on the carbon content of fossil fuels that would induce consumers and firms to “internalize” these external costs (Pigou 1924), thus guiding the market system to an abatement path that is socially optimal in the broad sense of maximizing the present value of benefits minus costs. The optimal tax equals the present value of all current and future social damages associated with an incremental ton of carbon emissions today, and it can be viewed as a summary measure of the current case for climate action.

The optimal tax depends on three sets of parameters: discounting parameters, carbon decay and climate damage parameters, and a parameter that determines the relationship between capital and output in final-goods production. Calibration assumptions for these parameters are discussed below.

2.2 Calibration

2.2.1 Discounting

Time preference rates (TPR) are calibrated to ensure the real return on capital in the model is consistent with historical interest rates. A similar approach is considered in Gerlagh and Liski (2013). To construct alternative TPR paths, we use a functional form for declining rates implied by the preference aggregation argument of Li and Löfgren (2000). Agents in society disagree about the correct rate of time preference, and a social planner aggregates preferences by taking a population-weighted average of welfare functions (Footnote 5).

To keep the analysis simple, we assume that agents adopt one of two discounting perspectives, and we loosely model these perspectives after the popular discounting debate between economists William Nordhaus and Nicholas Stern. Nordhaus (2008) calibrates a constant TPR to ensure the real return on capital is consistent with historical interest rates, while Stern (2007) adopts a rate near zero on ethical grounds. If \(r_{S}\) is the Stern rate, \(r_{N}\) the Nordhaus rate, and \(p_{S}\) the population weight on the Stern perspective, the preference aggregation argument implies an aggregate period \(t\) discount factor of

$$ D(t) = p_{S} \cdot e^{-r_{S} t} + (1-p_{S}) \cdot e^{-r_{N} t}. $$
(1)

It is easy to show that the corresponding discount rate, defined as the instantaneous decay rate of the discount factor, begins at time zero at the population-weighted mean rate, declines continuously over time, and converges asymptotically to the lower rate, \(r_{S}\) (Footnote 6).
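These three properties follow from differentiating Eq. 1: the instantaneous rate \(-D'(t)/D(t)\) is a weighted average of the two rates, with weights proportional to each term of \(D(t)\), and the weight on the lower rate grows over time. The sketch below checks this numerically. The Stern rate of 0.1 % is from the text; the values used for \(r_{N}\) and \(p_{S}\), and the choice to measure time in years, are illustrative assumptions.

```python
import math


def discount_factor(t, p_s, r_s, r_n):
    """Eq. 1: population-weighted mixture of two exponential discount factors."""
    return p_s * math.exp(-r_s * t) + (1 - p_s) * math.exp(-r_n * t)


def discount_rate(t, p_s, r_s, r_n):
    """Instantaneous rate -D'(t)/D(t), from the analytic derivative of Eq. 1.
    It is an average of r_s and r_n weighted by each term's share of D(t)."""
    w_s = p_s * math.exp(-r_s * t)
    w_n = (1 - p_s) * math.exp(-r_n * t)
    return (r_s * w_s + r_n * w_n) / (w_s + w_n)


# r_s = 0.1 % is from the text; p_s and r_n are illustrative assumptions.
p_s, r_s, r_n = 0.1, 0.001, 0.015

horizons = list(range(0, 1001, 50))
path = [discount_rate(t, p_s, r_s, r_n) for t in horizons]

for t, r in zip(horizons[::5], path[::5]):
    print(f"t = {t:4d} years: rate = {100 * r:.3f} % per year")
```

The first entry equals the mean rate \(p_{S} r_{S} + (1-p_{S}) r_{N}\), and the last is close to \(r_{S}\) because the high-rate term of Eq. 1 has almost vanished by then.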

As described, the approach would not satisfy the calibration commitment. It would with a Stern weight of zero; this is true by construction since the Nordhaus rate is calibrated to match observed returns. But with higher Stern weights, agents in the model are compelled to save more, which pushes the real return on capital downwards. To maintain the calibration commitment, we modify the procedure by allowing the higher of the two rates to adjust to ensure that the path of real returns remains fixed at what it is under the Nordhaus calibration (Footnote 7). Specifically, we fix the Stern rate at 0.1 % (as in Stern 2007), then for a given population weight on Stern (\(p_{S}\)), we consider an “adjusted Nordhaus rate” (\(r_{N}^{A}\)), which is adjusted endogenously so the implied declining path of time preference rates gives rise to an average real return on capital (over the first 5 simulated decades) of 5.25 percent. The adjusted rate \(r_{N}^{A}\) increases with the Stern weight. The corresponding path of discount factors is given by Eq. 1 with \(r_{N}\) replaced by \(r_{N}^{A}\). TPR paths obtained in this way are shown in Fig. 1.

Fig. 1 Comparison of alternative discounting paths

2.2.2 Carbon cycle and climate damages

Carbon dioxide is chemically inert in the atmosphere, so its removal is limited to net uptake by the land and oceans. The lifetime of anthropogenic CO2 has been widely and persistently misunderstood in the popular culture and even in some of the natural science literature. To first order, the decay of the anthropogenic carbon pulse in the atmosphere can be approximated by assuming that 80 percent of it has a half life of about 300 years, while the remaining 20 percent remains in the atmosphere permanently (Archer and Brovkin 2008). Over many centuries, a more accurate representation of CO2 removal is a sum of three exponential decay terms operating on the excess atmospheric CO2: fast decay of a small carbon pulse through CO2 fertilization (Hyvönen et al. 2007; Norby and Zak 2011) and dissolution into the surface ocean (Sabine et al. 2004); slow decay of a larger pulse by chemical reaction with calcium carbonate in the deep ocean (Tans 1998); and extremely slow decay of the remainder by reaction with rocks over geologic time (Archer and Brovkin 2008).

Larger pulses saturate the faster reservoirs so the long-lived fraction increases with total historical emissions. Archer et al. (2009) compare calculations of carbon pulse response in eight Earth System models capable of representing the millennial timescales of these geochemical processes. They found that 15 to 30 percent of a 1000 GtC pulse remained in the atmosphere after 1000 years. For a larger pulse of 5000 GtC, 25 % to 65 % remains airborne after 1000 years and 10 % to 30 % remains airborne after 10,000 years. All emissions scenarios summarized by the IPCC (Solomon et al. 2008; Van Vuuren et al. 2011) are intermediate between these extremes, so the approximation outlined above (80 % with a 300 year half life and 20 % permanent) is plausible.

To account for these features of the carbon cycle, Golosov et al. (2014) model decay of atmospheric carbon with the following three-parameter decay function:

$$ 1-d_{s} = \phi_{L} + (1-\phi_{L})\phi_{0}(1-\phi)^{s}. $$
(2)

Here \(1-d_{s}\) is the fraction of a unit of emissions that remains in the atmosphere \(s\) periods ahead, \(\phi_{L}\) is the long-run emission fraction, \(\phi\) is the geometric decay rate of the remaining fraction, and \(1-\phi_{0}\) is the fraction (of the non-permanent portion of emissions) that falls out of the atmosphere within the first decade. GHKT further link atmospheric carbon to temperature using a relationship adopted in Nordhaus (2008). This models steady-state mean temperature as a logarithmic function of the atmospheric carbon stock (Footnote 8).
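A minimal sketch of the decay function in Eq. 2 is given below. The function name is illustrative, periods are taken to be decades, and the parameter values are assumptions chosen to be close to the baseline in Golosov et al. (2014) rather than values taken from this section.

```python
def fraction_remaining(s, phi_l, phi_0, phi):
    """Eq. 2: share of a unit emitted today that is still airborne
    s periods (assumed to be decades) ahead."""
    return phi_l + (1 - phi_l) * phi_0 * (1 - phi) ** s


# Illustrative values (assumptions): permanent share, initial retention of
# the non-permanent share, and geometric decay rate per decade.
phi_l, phi_0, phi = 0.2, 0.393, 0.0228

periods = (0, 10, 30, 100)
path = [fraction_remaining(s, phi_l, phi_0, phi) for s in periods]

for s, share in zip(periods, path):
    print(f"after {10 * s:4d} years: {100 * share:.1f} % remains")
```

The path starts at \(\phi_{L} + (1-\phi_{L})\phi_{0}\) and decays geometrically toward the permanent share \(\phi_{L}\), never falling below it.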

A feature of the problem omitted from this formulation is temperature inertia. Temperature inertia increases the lag between emissions and damages, and has a potentially important impact on the optimal carbon tax. Fortunately, Gerlagh and Liski (2013) show that temperature inertia can be accommodated within the GHKT economic model without sacrificing analytic tractability. We build on Gerlagh and Liski (2013) to incorporate temperature inertia while also preserving the carbon cycle specification from GHKT. We do this because the GHKT specification allows us to specify and calibrate the long run emission fraction directly, which is important for our analysis.

To calibrate the carbon decay parameters in Eq. 2, we view \(\phi_{L}\) as a free parameter that falls somewhere between 0 and 0.3 (Footnote 9). For each possible value, we follow Golosov et al. (2014) in calibrating \(\phi_{0}\) and \(\phi\) to match two calibration targets from the literature. First, Archer (2005) argues that excess carbon that does not enter the permanent reservoir has a half-life of about 300 years. Second, according to Solomon et al. (2008), roughly half of emissions leave the atmosphere within thirty years. We further assume that “permanent carbon” remains in the atmosphere for 2000 years. To calibrate the rate of temperature adjustment, we target a six-decade delay between emissions and peak damages, as in Gerlagh and Liski (2013) and Nordhaus (2008).
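The two targets pin down \(\phi\) and \(\phi_{0}\) in closed form once \(\phi_{L}\) is fixed. The sketch below is one way to carry out the calculation. It assumes decadal periods, so that a 300-year half-life corresponds to 30 periods, and it assumes the "half gone within thirty years" target is imposed at \(s = 2\); this timing convention is an assumption about how the target maps to model periods, and the function name is illustrative.

```python
def calibrate_decay(phi_l, half_life_periods=30, target_period=2, target_share=0.5):
    """Solve Eq. 2 for (phi, phi_0) given the long-run fraction phi_l.

    Assumptions: periods are decades, and the near-term target is that
    `target_share` of emissions remains at s = `target_period`.
    """
    # Half-life target: (1 - phi) ** half_life_periods = 1/2
    phi = 1 - 0.5 ** (1 / half_life_periods)
    # Near-term target: phi_l + (1 - phi_l) * phi_0 * (1 - phi) ** s = target_share
    phi_0 = (target_share - phi_l) / ((1 - phi_l) * (1 - phi) ** target_period)
    return phi, phi_0


for phi_l in (0.0, 0.1, 0.2, 0.3):
    phi, phi_0 = calibrate_decay(phi_l)
    print(f"phi_L = {phi_l:.1f}: phi = {phi:.4f}, phi_0 = {phi_0:.3f}")
```

Under these assumptions the decay rate \(\phi\) does not depend on \(\phi_{L}\), while \(\phi_{0}\) falls as \(\phi_{L}\) rises, since less of the non-permanent share needs to be retained to hit the same near-term target.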

Damages are driven by a parameter that indicates the percent loss in Gross World Product (GWP) associated with an extra ton of atmospheric CO2. The parameter is stochastic, reflecting uncertainty about future harm. Only the expected value of the damage parameter in each future period enters the optimal tax formula. Our baseline calibration follows Golosov et al. (2014). They assume the (current) expected value of the damage elasticity parameter is the same for all future periods, and they calibrate it to match two data points from a meta analysis in Nordhaus and Boyer (2000): a 2.5 degree Celsius temperature increase leads to a 0.48 % loss of GWP, while a catastrophic 6 degree rise causes a 30 % loss in GWP. Combining these assumptions implies an expected damage elasticity in each future period of \(2.379 \times 10^{-5}\).
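The step from the two data points to the expected elasticity can be sketched as follows. The calculation relies on three inputs that are not stated in this section and are recalled from Golosov et al. (2014), so they should be read as assumptions to check against that paper: an exponential damage function in the carbon stock (measured in GtC), a pre-industrial stock of 581 GtC with 3 degrees of warming per doubling, and a probability of 0.068 on the catastrophic outcome.

```python
import math

S_PRE = 581.0          # pre-industrial carbon stock in GtC (assumed)
LAMBDA = 3.0           # degrees Celsius per doubling of the stock (assumed)
P_CATASTROPHE = 0.068  # probability of the high-damage outcome (assumed)


def stock_for_warming(delta_t):
    """Carbon stock consistent with warming delta_t, inverting
    T = LAMBDA * log2(S / S_PRE)."""
    return S_PRE * 2 ** (delta_t / LAMBDA)


def elasticity(loss_share, delta_t):
    """gamma solving 1 - loss_share = exp(-gamma * (S - S_PRE))."""
    return -math.log(1 - loss_share) / (stock_for_warming(delta_t) - S_PRE)


gamma_low = elasticity(0.0048, 2.5)  # 0.48 % loss at 2.5 degrees
gamma_high = elasticity(0.30, 6.0)   # 30 % loss at 6 degrees
expected_gamma = P_CATASTROPHE * gamma_high + (1 - P_CATASTROPHE) * gamma_low

print(f"moderate: {gamma_low:.3e}, catastrophic: {gamma_high:.3e}")
print(f"expected: {expected_gamma:.3e}")
```

With these inputs the probability-weighted average is close to the value reported in the text. More than half of it comes from the catastrophic outcome despite its low assumed probability.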

3 Quantifying long-run importance

3.1 Carbon taxes

The optimal carbon taxes are shown for each discounting path in Table 1 (Footnote 10). The scenario without temperature inertia follows GHKT. This is augmented to include temperature inertia using the analysis of Gerlagh and Liski (2013) (Footnote 11). The average real return is reported for the GHKT calibration, but it is essentially identical with temperature inertia included.

Table 1 Optimal taxes for discounting scenarios in Fig. 1

Under Nordhaus discounting, the optimal tax with temperature inertia is just over half what it is without (Footnote 12). In this case, temperature inertia has a substantial impact on the optimal tax. But as more weight is put on the distant future, the relative importance of temperature inertia declines, becoming almost insignificant when the Stern weight is high. The source of the latter result is intuitive: temperature inertia creates a delay in the onset of damages without changing the long run path, and this has the largest relative impact for high discounting scenarios where near term damages comprise a larger fraction of the total tax.

The results can be compared to a first period carbon tax in Nordhaus (2008) of about $ 30 per ton carbon and a first period carbon tax in Stern (2007) of about $ 250. Our significantly higher tax under Stern discounting is due to the model allowing for “permanent carbon”, something not considered in Stern’s original analysis.

The results clearly show that fixing the calibration procedure to match historical interest rates does not uniquely pin down climate policy when TPR decline, nor does it necessitate climate policy that is insensitive to long run outcomes. The same point is made in Gerlagh and Liski (2013).

3.2 Cumulative value

The optimal tax aggregates the present value of future consumption impacts from climate damages attributable to an extra ton of carbon emissions today. Only damages associated with the portion of current emissions that remain in the atmosphere in a given future period enter. These damages impact consumption and investment in the period in which they occur, and the drop in investment in turn impacts consumption in subsequent periods. The formula (presented in the Appendix) allows us to decompose the contribution to the current optimal tax from consumption impacts at different times.

To present this information, we construct “cumulative value functions” (CVFs) in the spirit of cumulative distribution functions from probability theory. These plot the cumulative fraction of the optimal tax attributable to consumption impacts at or before a given horizon. When presenting results in this section, we adopt the carbon cycle model without temperature inertia. This is the scenario in which the estimated effect of declining time preference on long run valuation is smallest, so the choice is conservative. Results generated in the model with temperature inertia are reported in the Appendix (Footnote 13).
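Mechanically, a CVF is a normalized running sum of the per-period contributions to the tax. The sketch below shows the construction for an arbitrary sequence of contributions. The function names are illustrative, and the geometric series used as input is a hypothetical stand-in, not the decomposition formula from the Appendix.

```python
from itertools import accumulate


def cumulative_value_function(contributions):
    """Cumulative share of the total attributable to impacts at or before
    each period, analogous to a cumulative distribution function."""
    total = sum(contributions)
    return [running / total for running in accumulate(contributions)]


def long_run_importance(contributions, n_periods):
    """Share of the total attributable to impacts after the first n_periods."""
    return 1.0 - cumulative_value_function(contributions)[n_periods - 1]


# Hypothetical per-decade contributions (a geometric series), for illustration only.
contributions = [0.9 ** s for s in range(100)]

cvf = cumulative_value_function(contributions)
beyond_200_years = long_run_importance(contributions, n_periods=20)

print(f"share within the first 50 years: {100 * cvf[4]:.1f} %")
print(f"share beyond 200 years:          {100 * beyond_200_years:.1f} %")
```

Reading one minus the CVF at the twentieth decade gives the share of value attributable to impacts beyond 200 years.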

Figure 2 superimposes four CVFs to demonstrate the earlier claim that conventional discounting overwhelms the long run emission fraction (\(\phi_{L}\)) in determining the importance of long run outcomes for current policy. For each CVF, discounting is fixed at the Nordhaus (2008) calibration, and the long run emission fraction varies from 0 to 0.3. The optimal tax, and thus the denominator used in computing cumulative value as a fraction of the total, varies from $ 38 per ton when \(\phi_{L} = 0\) to $ 30 per ton when \(\phi_{L} = 0.3\). That the carbon tax decreases in \(\phi_{L}\) under this discounting scenario is a consequence of the calibration procedure, which implies that near-term carbon decay is higher for lower \(\phi_{L}\). Thus, these movements in the carbon tax are due to changes in the path of carbon decay over the first hundred years or so and not to differences in the long run path. Despite these small differences, the temporal distribution of cumulative value is nearly identical across scenarios. Moreover, the importance of the long run, viewed as the fraction of total value attributable to consumption impacts beyond 200 years, is trivial in all cases: less than 0.2 %.

Fig. 2 Percent of total value accrued over time. Nordhaus discounting. \(\phi_{L}\) is the long run emission fraction

The story changes dramatically when time preference rates decline. Figure 3 fixes the long run emission fraction at zero (the assumption for which the long run matters least), then plots CVFs for the discounting scenarios in Fig. 1 (Footnote 14). The temporal distribution of cumulative value varies substantially across discounting scenarios. The effect of declining rates on the shape of the CVF is most dramatic at low Stern weights, and it attenuates with further increases. This is driven by Weitzman’s (1998) limiting result: even at small weights on the Stern view, the path of time preference rates still declines in the long run to the Stern rate. With just 10 percent weight on the Stern view, consumption impacts beyond 200 years account for almost 30 percent of the total tax today; the long run clearly matters.

Fig. 3 Percent of total value accrued over time. \(\phi_{L} = 0\)

Increasing the long run emission fraction above zero makes the differences in Fig. 3 even more pronounced. Figure 4 plots long-run importance (the fraction of value from impacts beyond 200 years) as a function of the discounting assumptions and the long run emission fraction. Reading from left to right, we see that while the long run does not matter for the Nordhaus calibration examined above, its importance increases immediately when rates decline. Indeed, with just 1 percent weight on Stern and \(\phi_{L} = 0.2\), long run impacts make up over 10 percent of the current tax. Moreover, as the Stern weight increases, the long run emission fraction becomes increasingly important. With 20 percent weight on Stern, long run importance increases from 36 percent to 65 percent as \(\phi_{L}\) increases from 0 to 0.3.

Fig. 4 Percent of current optimal carbon tax attributable to consumption impacts beyond 200 years. Horizontal axis depicts alternative discounting assumptions. Vertical axis depicts the fraction of emissions that remain in the very long run (\(\phi_{L}\))

4 Conclusions

The main critique of economic analyses of climate change that simply adopt a low discount rate is that the implied social cost of carbon does not accurately reflect society’s willingness to transfer wealth to future generations. A key source of information concerning people’s willingness to transfer resources over time is transactions in financial markets. Because of this, a standard consistency requirement imposed on climate-economy models is that the assumed preferences should give rise to equilibrium returns in the capital market of the model that are consistent with historical interest rates.

Accepting this restriction, a great deal hinges on how policy-makers view the temporal structure of time preference. When evaluating investment decisions that play out within a decade or so, the difference between constant and declining rates is small. But when the consequences of current actions span centuries, as for climate change, the difference becomes crucially important. If we accept the conventional assumption of a constant rate of time preference, long run outcomes are essentially irrelevant for current policy. In contrast, when time preference rates decline, this stark feature of typical integrated assessment models goes away. For example, in our analysis with the long run decay fraction set to the IPCC-suggested value of 0.2, consumption impacts beyond 200 years account for half the total carbon tax when the calibrated weight on the Stern rate of time preference is a mere 12 %.

Mirroring this result, the potential for long run features of the carbon cycle to affect the current case for action also hinges on the structure of time preference. With constant time preference, the effect of discounting overwhelms the effect of carbon decay, and there is no room for long run features of the carbon cycle to influence current policy. This probably explains why long-run features of the carbon cycle are rarely emphasized in economic analyses of climate change. But with DRTP, long run features of the carbon cycle have the potential to matter a great deal.

Finally, our findings have implications for interpreting a recent contribution to this journal. Roe and Bauman (2013) argue that fat-tailed uncertainty about climate sensitivity is unlikely to have a large impact on current climate policy due to the long transition time needed for the earth system to adjust to long run equilibrium if climate sensitivity turns out to be very high. They use a model to show that high temperature outcomes associated with high climate sensitivity scenarios do not come about for several centuries. Under conventional economic discounting, even very large impacts at this horizon do not significantly affect current decisions. Our paper does not explicitly consider this issue, but the general insights carry over. Because outcomes several hundred years and beyond have much greater weight in current decision-making under declining time preference, allowing for this presents a challenge to the Roe and Bauman conclusions.