Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0
Authors
Suttle, Wesley A. [1 ]
Zhang, Kaiqing [2 ]
Yang, Zhuoran [3 ]
Kraemer, David N. [1 ]
Liu, Ji [4 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA;
DOI
None
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet there has been very little exploration of RL algorithms for ratio maximization. This paper addresses this deficiency by introducing two new, model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning and the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case and almost sure convergence of the latter under linear function approximation for the critic. Unlike previous methods, the two algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results demonstrating promising performance and lending strong support to the theoretical results.
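The abstract's objective — maximizing the ratio of long-run average reward to long-run average cost — can be made concrete with a deliberately tiny sketch. Everything below is hypothetical and for illustration only: the two-state MDP, its rewards and costs, and the brute-force enumeration over policies are not the paper's model-free algorithms, which avoid exactly this kind of exhaustive search.

```python
import itertools

# A toy deterministic cost-aware MDP (hypothetical, for illustration):
# maximize rho(pi) = (long-run average reward) / (long-run average cost).
# Transitions are deterministic, so long-run averages are exact cycle averages.
states = [0, 1]
actions = [0, 1]

def step(s, a):
    """Return (next_state, reward, cost). Values are arbitrary illustrative numbers."""
    table = {
        (0, 0): (0, 1.0, 2.0),   # stay in 0: modest reward, high cost
        (0, 1): (1, 0.5, 1.0),   # move to 1
        (1, 0): (0, 2.0, 1.0),   # move back to 0: high reward, low cost
        (1, 1): (1, 0.2, 0.5),   # stay in 1: low reward, low cost
    }
    return table[(s, a)]

def ratio(policy, horizon=10_000):
    """Empirical long-run reward/cost ratio of a stationary deterministic policy."""
    s, total_reward, total_cost = 0, 0.0, 0.0
    for _ in range(horizon):
        s_next, r, c = step(s, policy[s])
        total_reward += r
        total_cost += c
        s = s_next
    return total_reward / total_cost

# Brute force over the four deterministic policies (feasible only in tiny MDPs).
best = max(itertools.product(actions, repeat=len(states)), key=ratio)
print(best, round(ratio(best), 3))  # the 0 -> 1 -> 0 cycle attains ratio 2.5/2.0 = 1.25
```

Here the optimal policy cycles between the two states, trading a cheap transition for an expensive-but-rewarding one; neither maximizing average reward alone nor minimizing average cost alone would select it, which is the point of the ratio objective.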
Pages: 11
Related Papers
50 items
  • [31] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [32] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, : 1289 - 1290
  • [33] An Inverse Reinforcement Learning Algorithm for semi-Markov Decision Processes
    Tan, Chuanfang
    Li, Yanjie
    Cheng, Yuhu
    2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1256 - 1261
  • [34] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
    Ortner, Ronald
    ANNALS OF OPERATIONS RESEARCH, 2013, 208 (01) : 321 - 336
  • [35] A reinforcement learning based algorithm for finite horizon Markov decision processes
    Bhatnagar, Shalabh
    Abdulla, Mohammed Shahid
    PROCEEDINGS OF THE 45TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-14, 2006, : 5519 - 5524
  • [36] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729
  • [37] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Novotny, Petr
    Vahala, Jiri
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9794 - 9801
  • [38] Bayesian Nonparametric Inverse Reinforcement Learning for Switched Markov Decision Processes
    Surana, Amit
    Srivastava, Kunal
    2014 13TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2014, : 47 - 54
  • [40] An immediate-return reinforcement learning for the atypical Markov decision processes
    Pan, Zebang
    Wen, Guilin
    Tan, Zhao
    Yin, Shan
    Hu, Xiaoyan
    FRONTIERS IN NEUROROBOTICS, 2022, 16