50 entries in total
- [32] Batch Policy Learning in Average Reward Markov Decision Processes [J]. Annals of Statistics, 2022, 50 (06): 3364 - 3387
- [34] Maximal Average-Reward Policies for Semi-Markov Decision Processes with Arbitrary State and Action Space [J]. Annals of Mathematical Statistics, 1971, 42 (05): 1717 - &
- [37] Infinite-Horizon Gaussian Processes [J]. Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31
- [38] Adaptive Aggregation for Reinforcement Learning in Average Reward Markov Decision Processes [J]. Annals of Operations Research, 2013, 208: 321 - 336
- [39] Average Reward Reinforcement Learning for Semi-Markov Decision Processes [J]. Neural Information Processing, ICONIP 2017, Pt I, 2017, 10634: 768 - 777