Abstract
This paper surveys work that makes use of nonstandard Markov decision process criteria, i.e., criteria which do not seek simply to optimize the expected return per unit time or the expected discounted return. It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations stated solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to randomizations of deterministic Markov policies. For problems which cannot be stated in such terms on the primitive state set, formulations involving a redefinition of the states are examined.
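As a concrete illustration of the quantities the abstract refers to (this sketch is not from the paper; the transition probabilities, policy, and rewards below are all hypothetical), the following code computes the finite-horizon state-action probabilities P(X_t = s, A_t = a) induced by a randomized Markov policy in a small finite MDP. Any criterion expressible purely in these probabilities, such as the mean discounted return shown at the end, is of the kind covered by the policy equivalence results mentioned above.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[a, s, s'] is the probability of
# moving to state s' when action a is taken in state s.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
pi = np.array([[0.7, 0.3], [0.4, 0.6]])  # pi[s, a]: randomized Markov policy
mu0 = np.array([1.0, 0.0])               # initial state distribution

def state_action_probs(P, pi, mu0, T):
    """Return q[t, s, a] = P(X_t = s, A_t = a) for t = 0, ..., T-1."""
    n_a, n_s, _ = P.shape
    q = np.zeros((T, n_s, n_a))
    mu = mu0.copy()
    for t in range(T):
        q[t] = mu[:, None] * pi                 # joint state-action distribution at t
        mu = np.einsum('sa,ast->t', q[t], P)    # mu'[s'] = sum_{s,a} q[s,a] P[a,s,s']
    return q

T = 5
q = state_action_probs(P, pi, mu0, T)

# A criterion depending only on q: expected discounted return over the horizon.
r = np.array([[1.0, 0.0], [0.0, 2.0]])   # hypothetical rewards r[s, a]
beta = 0.9
mean_return = sum(beta**t * (q[t] * r).sum() for t in range(T))
```

Because `mean_return` is a function of the probabilities `q` alone, two policies inducing the same `q` are equivalent under this criterion, which is the sense in which attention may be restricted to (randomized) Markov policies.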
References
Markowitz, H., Portfolio Selection, Wiley, New York, New York, 1959.
Charnes, A., and Cooper, W. W., Chance Constrained Programming, Management Science, Vol. 6, pp. 73–79, 1959.
Hogan, A. J., Morris, J. G., and Thompson, H. E., Decision Problems under Risk and Chance Constrained Programming: Dilemmas in the Transition, Management Science, Vol. 27, pp. 698–716, 1981.
Jacquette, S. C., A Utility Criterion for Markov Decision Processes, Management Science, Vol. 23, pp. 43–49, 1979.
Jacquette, S. C., Markov Decision Processes with a New Optimality Criterion: Small Interest Rates, Annals of Mathematical Statistics, Vol. 1, pp. 1894–1901, 1973.
Porteus, E. L., On the Optimality of Structured Policies in Countable Stage Decision Processes, Management Science, Vol. 22, pp. 148–157, 1975.
White, C. C., The Optimality of Isotone Strategies for Markov Decision Problems with Utility Criterion, Recent Developments in Markov Decision Processes, Edited by R. Hartley, L. C. Thomas, and D. J. White, Academic Press, New York, New York, 1980.
Howard, R. A., and Matheson, J. E., Risk-Sensitive Markov Decision Processes, Management Science, Vol. 18, pp. 356–369, 1972.
Kreps, D. M., Decision Problems with Expected Utility Criteria, I: Upper and Lower Convergent Utility, Mathematics of Operations Research, Vol. 2, pp. 45–53, 1977.
Kreps, D. M., Decision Problems with Expected Utility Criteria, II: Stationarity, Mathematics of Operations Research, Vol. 2, pp. 266–274, 1977.
Rothblum, U. G., Multiplicative Markov Decision Chains, Mathematics of Operations Research, Vol. 9, pp. 6–24, 1984.
Sobel, M. J., Ordinal Dynamic Programming, Management Science, Vol. 21, pp. 967–975, 1975.
Kallenberg, L. C. M., Linear Programming and Finite Markovian Control Problems, Mathematisch Centrum, Amsterdam, Holland, 1983.
Sobel, M. J., The Variance of Discounted Markov Decision Processes, Journal of Applied Probability, Vol. 19, pp. 774–802, 1982.
Miller, B., On Dynamic Programming for a Stochastic Markovian Process with an Application to the Mean Variance Models, Management Science, Vol. 24, p. 1779, 1978.
White, D. J., Probabilistic Constraints and Variance in Markov Decision Processes, University of Manchester, Department of Decision Theory, Notes in Decision Theory, No. 149, 1984.
Derman, C., Finite State Markovian Decision Processes, Academic Press, New York, New York, 1970.
Van Der Wal, J., Stochastic Dynamic Programming, Mathematisch Centrum, Amsterdam, Holland, 1981.
Derman, C., On Sequential Control Procedures, Annals of Mathematical Statistics, Vol. 35, pp. 341–349, 1964.
Derman, C., and Strauch, R., A Note on Memoryless Rules for Controlling Sequential Control Processes, Annals of Mathematical Statistics, Vol. 37, pp. 276–278, 1966.
Hartley, R., Finite, Discounted, Vector Markov Decision Processes, University of Manchester, Department of Decision Theory, Notes in Decision Theory, No. 85, 1979.
Derman, C., Stable Sequential Control Rules and Markov Chains, Journal of Mathematical Analysis and Applications, Vol. 6, pp. 257–265, 1963.
Hordijk, A., and Kallenberg, L. C. M., Constrained Stochastic Dynamic Programming, Mathematics of Operations Research, Vol. 9, pp. 276–289, 1984.
Derman, C., and Veinott, A. F., Constrained Markov Decision Chains, Management Science, Vol. 19, pp. 389–390, 1972.
Strauch, R., and Veinott, A., A Property of Sequential Control Processes, The Rand Corporation, Santa Monica, California, Research Memorandum No. RM 14772, 1966.
White, D. J., Utility, Probabilistic Constraints, Mean, and Variance in Markov Decision Processes, University of Manchester, Notes in Decision Theory, No. 163, 1985.
Derman, C., and Klein, M., Some Remarks on Finite-Horizon Markovian Decision Models, Operations Research, Vol. 13, pp. 272–278, 1965.
White, D. J., Dynamic Programming with Probabilistic Constraints, Operations Research, Vol. 22, pp. 654–664, 1972.
Derman, C., Optimal Replacement under Markovian Deterioration with Probability Bounds on Failure, Management Science, Vol. 9, pp. 478–481, 1963.
Dantzig, G. B., and Wolfe, P., The Decomposition Algorithm for Linear Programming, Econometrica, Vol. 29, pp. 767–778, 1961.
Howard, R. A., Dynamic Programming and Markov Processes, Massachusetts Institute of Technology, PhD Thesis, 1960.
Filar, J. A., and Lee, H. M., Gain Variability Tradeoffs in Undiscounted Markov Decision Processes, Proceedings of the 24th IEEE Conference on Decision and Control, pp. 1106–1112, 1985.
White, D. J., Optimality and Efficiency, Wiley, New York, New York, 1982.
Mendelssohn, R., A Systematic Approach to Determining Mean Variance Tradeoffs when Managing Randomly Varying Populations, Mathematical Biosciences, Vol. 50, pp. 75–84, 1980.
Filar, J. A., Percentiles and Markovian Decision Processes, Operations Research Letters, Vol. 2, pp. 13–15, 1980.
White, D. J., Fundamentals of Decision Theory, North-Holland, New York, New York, 1976.
White, D. J., Minimizing Threshold Probabilities in Infinite-Horizon Discounted Markov Decision Processes, University of Manchester, Department of Decision Theory, Notes in Decision Theory, No. 165, 1985.
Henig, M., Optimality in Dynamic Programming with Deterministic Transitions and Stochastic Rewards, Tel Aviv University, Faculty of Management, Working Paper No. 721/82, 1982.
Henig, M., Target and Percentile Criteria in Dynamic Programming with Deterministic Transitions and Stochastic Rewards, University of Illinois at Urbana-Champaign, Department of Business Administration, 1984.
Charnes, A., and Cooper, W. W., Chance Constraints and Normal Deviates, Journal of the American Statistical Association, Vol. 57, pp. 134–148, 1962.
Goldwerger, J., Dynamic Programming of a Stochastic Markovian Process with an Application to the Mean Variance Models, Management Science, Vol. 23, pp. 612–620, 1977.
Parks, M. S., and Steinberg, E., A Preference Order Dynamic Program for a Knapsack Problem with Stochastic Rewards, Journal of the Operational Research Society, Vol. 30, pp. 141–147, 1979.
Sniedovich, M., Preference Order Stochastic Knapsack Problems: Methodological Issues, Journal of the Operational Research Society, Vol. 31, pp. 1025–1032, 1980.
Sniedovich, M., A Class of Variance Constrained Problems, Operations Research, Vol. 31, pp. 338–353, 1983.
Greenberg, H., Dynamic Programming with Linear Uncertainty, Operations Research, Vol. 16, pp. 675–678, 1968.
Beja, A., Probability Bounds in Replacement Policies for Markov Systems, Management Science, Vol. 16, pp. 253–264, 1969.
Bouakiz, M., Risk Sensitivity in Stochastic Optimization with Applications, Georgia Institute of Technology, PhD Thesis, 1985.
Chung, K. J., Some Topics in Risk-Sensitive Stochastic Dynamic Models, Georgia Institute of Technology, PhD Thesis, 1985.
Filar, J. A., and Lee, H. M., Gain Variability Tradeoffs in Discounted Markov Decision Processes, Johns Hopkins University, Department of Mathematical Sciences, Technical Report No. 408, 1985.
Lee, H. M., Gain Variability Tradeoffs in Markovian Decision Processes and Related Problems, Johns Hopkins University, Department of Mathematical Sciences, PhD Thesis, 1985.
Sobel, M. J., Mean-Variance Tradeoffs in an Undiscounted MDP, Georgia Institute of Technology, Research Memorandum, 1984.
Sobel, M. J., Maximal Mean/Variance Ratio in an Undiscounted MDP, Georgia Institute of Technology, Research Memorandum, 1985.
Additional information
Communicated by P. L. Yu
The author would like to thank two referees for their very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.
White, D.J. Mean, variance, and probabilistic criteria in finite Markov decision processes: A review. J Optim Theory Appl 56, 1–29 (1988). https://doi.org/10.1007/BF00938524