Abstract
Optimal policies in Markov decision problems may be highly sensitive to the transition probabilities, and in practice some of these probabilities are uncertain. The goals of the present study are to find the robust range of a given optimal policy and to obtain value intervals for the exact transition probabilities. The study contributes methods for Markov decision processes (MDPs) with uncertain transition probabilities. We first propose a method for estimating unknown transition probabilities based on maximum likelihood. Since the estimates may be far from accurate, and the maximal expected total reward of the MDP may be sensitive to these transition probabilities, we analyze the robustness of an optimal policy and propose an approach for robust analysis. After defining a robust optimal policy for uncertain transition probabilities represented as sets of numbers, we formulate a model to obtain this policy. Finally, we define value intervals for the exact transition probabilities and construct models to determine their lower and upper bounds. Numerical examples illustrate the practicability of the methods.
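The abstract does not state the authors' models, so the following is only a minimal sketch under stated assumptions (Python with NumPy, chosen for illustration). It uses standard textbook techniques, not the paper's formulation, to illustrate three ideas from the abstract. The first is the count-based maximum-likelihood estimate of transition probabilities. The second is max-min (robust) value iteration over a finite set of candidate transition models, assuming (s, a)-rectangularity. The third is a brute-force grid scan of one uncertain row that shows which values leave the nominal optimal policy unchanged. The function names (`mle_transitions`, `value_iteration`, `robust_value_iteration`, `stable_grid`), the uniform fallback for unobserved rows, and the example data are all hypothetical.

```python
import numpy as np


def mle_transitions(counts):
    """Maximum-likelihood estimate p_hat(t | s, a) = n(s, a, t) / n(s, a).

    counts[a, s, t] is the number of observed transitions s -> t under
    action a (shape (A, S, S)). Rows with no observations fall back to a
    uniform distribution; this fallback is an assumption of the sketch.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=2, keepdims=True)
    uniform = np.full_like(counts, 1.0 / counts.shape[2])
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    return np.where(totals > 0, counts / safe_totals, uniform)


def q_values(P, R, V, gamma):
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    return R + gamma * np.einsum("ast,t->sa", P, V)


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Optimal discounted values and a greedy deterministic policy."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        V_new = q_values(P, R, V, gamma).max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, q_values(P, R, V, gamma).argmax(axis=1)


def robust_value_iteration(P_set, R, gamma, tol=1e-10, max_iter=10_000):
    """Max-min value iteration over a finite set of candidate models.

    Assumes (s, a)-rectangularity: the worst candidate row is picked
    independently for every state-action pair.
    """
    P_set = np.asarray(P_set, dtype=float)  # shape (K, A, S, S)

    def robust_q(V):
        # Worst-case expected next-state value over the K candidates.
        worst_next = np.einsum("kast,t->ksa", P_set, V).min(axis=0)
        return R + gamma * worst_next

    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        V_new = robust_q(V).max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    return V, robust_q(V).argmax(axis=1)


def stable_grid(P, R, gamma, a, s, grid):
    """Grid values p for which replacing row P[a, s] by [p, 1 - p]
    (two-state case only) leaves the nominal optimal policy unchanged.
    A brute-force scan, not an exact interval computation."""
    _, ref_policy = value_iteration(P, R, gamma)
    stable = []
    for p in grid:
        P_mod = P.copy()
        P_mod[a, s] = [p, 1.0 - p]
        if np.array_equal(value_iteration(P_mod, R, gamma)[1], ref_policy):
            stable.append(float(p))
    return stable


# Hypothetical data: 2 actions, 2 states.
counts = [[[8, 2], [3, 7]],   # action 0
          [[1, 9], [0, 0]]]   # action 1; state 1 never observed
R = np.array([[1.0, 0.0],     # R[s, a]
              [2.0, 1.5]])
gamma = 0.9

P_hat = mle_transitions(counts)
P_alt = P_hat.copy()
P_alt[1, 0] = [0.4, 0.6]      # a plausible alternative for one uncertain row

V_hat, pi_hat = value_iteration(P_hat, R, gamma)
V_rob, pi_rob = robust_value_iteration([P_hat, P_alt], R, gamma)
stable = stable_grid(P_hat, R, gamma, a=1, s=0, grid=np.linspace(0, 1, 101))
print(pi_hat, pi_rob, min(stable), max(stable))
```

Because every candidate set here contains the nominal estimate, the robust values are never larger than the nominal ones. The grid scan only approximates a robust range: it reports the sampled values at which the policy is unchanged, and it need not coincide with the bounds produced by the paper's optimization models.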
Additional information
Supported by the National Natural Science Foundation of China (71571019).
About this article
Cite this article
Lou, Zk., Hou, Fj. & Lou, Xm. Robust analysis of discounted Markov decision processes with uncertain transition probabilities. Appl. Math. J. Chin. Univ. 35, 417–436 (2020). https://doi.org/10.1007/s11766-020-3664-1
DOI: https://doi.org/10.1007/s11766-020-3664-1
Keywords
- Markov decision processes
- uncertain transition probabilities
- robustness and sensitivity
- robust optimal policy
- value interval