50 entries in total
- [31] Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes. Machine Learning and Knowledge Discovery in Databases: Research Track, ECML PKDD 2023, Part IV, 2023, 14172: 506-523
- [34] Discounted Deterministic Markov Decision Processes and Discounted All-Pairs Shortest Paths. Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2009: 958-+
- [37] An Envelope Theorem and Some Applications to Discounted Markov Decision Processes. Mathematical Methods of Operations Research, 2008, 67: 299-321
- [40] Conditions for the Uniqueness of Optimal Policies of Discounted Markov Decision Processes. Mathematical Methods of Operations Research, 2004, 60: 415-436