Abstract
A possibly immortal agent tries to maximise its summed discounted rewards over time, where discounting is used to avoid infinite utilities and encourage the agent to value current rewards more than future ones. Some commonly used discount functions lead to time-inconsistent behavior where the agent changes its plan over time. These inconsistencies can lead to very poor behavior. We generalise the usual discounted utility model to one where the discount function changes with the age of the agent. We then give a simple characterisation of time-(in)consistent discount functions and show the existence of a rational policy for an agent that knows its discount function is time-inconsistent.
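To make the notion of a plan that changes over time concrete, here is a minimal sketch. It is not taken from the paper: the two discount functions (geometric and hyperbolic), their parameters `gamma` and `kappa`, the helper names and the reward values are all illustrative assumptions. The agent chooses between a smaller, earlier reward and a larger, later one, and re-evaluates that choice as it ages.

```python
# Illustrative sketch only: the discount functions, parameters and rewards
# below are assumptions chosen for the example, not taken from the paper.

def geometric(age, t, gamma=0.9):
    """Weight that an agent of the given age places on a reward at time t."""
    return gamma ** (t - age)

def hyperbolic(age, t, kappa=1.0):
    """Hyperbolic weight; depends on the delay t - age in a non-geometric way."""
    return 1.0 / (1.0 + kappa * (t - age))

# Each option is (time the reward arrives, reward size).
OPTIONS = [(10, 1.0), (12, 1.5)]

def preferred(discount, age, options=OPTIONS):
    """Return the option with the highest discounted value at the given age."""
    return max(options, key=lambda o: discount(age, o[0]) * o[1])

def plans(discount, ages=(0, 10)):
    """The option the agent prefers at each of the given ages."""
    return [preferred(discount, age) for age in ages]

if __name__ == "__main__":
    print("geometric: ", plans(geometric))
    print("hyperbolic:", plans(hyperbolic))
```

Under these assumed numbers the geometric agent prefers the later, larger reward at both ages, whereas the hyperbolic agent prefers it at age 0 but switches to the earlier, smaller reward at age 10. That switch is the kind of plan reversal the abstract calls time-inconsistent.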
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lattimore, T., Hutter, M. (2011). Time Consistent Discounting. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds) Algorithmic Learning Theory. ALT 2011. Lecture Notes in Computer Science, vol 6925. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24412-4_30
DOI: https://doi.org/10.1007/978-3-642-24412-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24411-7
Online ISBN: 978-3-642-24412-4
eBook Packages: Computer Science (R0)