Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0
Authors
Suttle, Wesley A. [1 ]
Zhang, Kaiqing [2 ]
Yang, Zhuoran [3 ]
Kraemer, David N. [1 ]
Liu, Ji [4 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA;
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet there has been very little exploration of RL algorithms for ratio maximization. This paper addresses this deficiency by introducing two new, model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning and the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case and almost sure convergence of the latter under linear function approximation for the critic. Unlike previous methods, the two algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results demonstrating promising performance and lending strong support to the theoretical results.
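The abstract's objective — maximizing the ratio of long-run average reward to long-run average cost — can be made concrete with a deliberately tiny sketch. Everything below is hypothetical and for illustration only: the two-state MDP, its rewards and costs, and the brute-force enumeration over policies are not the paper's model-free algorithms, which avoid exactly this kind of exhaustive search.

```python
import itertools

# A toy deterministic cost-aware MDP (hypothetical, for illustration):
# maximize rho(pi) = (long-run average reward) / (long-run average cost).
# Transitions are deterministic, so long-run averages are exact cycle averages.
states = [0, 1]
actions = [0, 1]

def step(s, a):
    """Return (next_state, reward, cost). Values are arbitrary illustrative numbers."""
    table = {
        (0, 0): (0, 1.0, 2.0),   # stay in 0: modest reward, high cost
        (0, 1): (1, 0.5, 1.0),   # move to 1
        (1, 0): (0, 2.0, 1.0),   # move back to 0: high reward, low cost
        (1, 1): (1, 0.2, 0.5),   # stay in 1: low reward, low cost
    }
    return table[(s, a)]

def ratio(policy, horizon=10_000):
    """Empirical long-run reward/cost ratio of a stationary deterministic policy."""
    s, total_reward, total_cost = 0, 0.0, 0.0
    for _ in range(horizon):
        s_next, r, c = step(s, policy[s])
        total_reward += r
        total_cost += c
        s = s_next
    return total_reward / total_cost

# Brute force over the four deterministic policies (feasible only in tiny MDPs).
best = max(itertools.product(actions, repeat=len(states)), key=ratio)
print(best, round(ratio(best), 3))  # the 0 -> 1 -> 0 cycle attains ratio 2.5/2.0 = 1.25
```

Here the optimal policy cycles between the two states, trading a cheap transition for an expensive-but-rewarding one; neither maximizing average reward alone nor minimizing average cost alone would select it, which is the point of the ratio objective.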
Pages: 11
Related Papers
50 items
  • [31] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [32] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, : 1289 - 1290
  • [33] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1256 - 1261
  • [34] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
    Ortner, Ronald
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) : 321 - 336
  • [35] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 5519 - 5524
  • [36] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
  • [37] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Novotny, Petr
    Vahala, Jiri
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9794 - 9801
  • [38] Bayesian Nonparametric Inverse Reinforcement Learning for Switched Markov Decision Processes
    Surana, Amit
    Srivastava, Kunal
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 47 - 54
  • [40] An immediate-return reinforcement learning for the atypical Markov decision processes
    Pan, Zebang
    Wen, Guilin
    Tan, Zhao
    Yin, Shan
    Hu, Xiaoyan
    FRONTIERS IN NEUROROBOTICS, 2022, 16