Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0
Authors
Suttle, Wesley A. [1]
Zhang, Kaiqing [2]
Yang, Zhuoran [3]
Kraemer, David N. [1]
Liu, Ji [4]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet there has been very little exploration of RL algorithms for ratio maximization. This paper addresses this deficiency by introducing two new, model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning and the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case and almost sure convergence of the latter under linear function approximation for the critic. Unlike previous methods, the two algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results demonstrating promising performance and lending strong support to the theoretical results.
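For context, the cost-aware criterion described above can be stated formally. Writing r and c for the per-step reward and cost functions and (s_t, a_t) for the state-action trajectory under a stationary policy \pi (this notation is introduced here for illustration and is not taken from the record), the objective is the ratio of long-run averages

    \rho(\pi) = \frac{\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\pi\big[\sum_{t=0}^{T-1} r(s_t, a_t)\big]}{\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\pi\big[\sum_{t=0}^{T-1} c(s_t, a_t)\big]}

and both algorithms seek a policy maximizing \rho(\pi). A minimal tabular sketch of a two-timescale scheme in the spirit of the abstract, pairing an RVI-Q-learning-style update on the Dinkelbach surrogate reward r - \rho c with a slower update of the ratio estimate \rho, might look as follows. This is not the paper's algorithm; the environment interface env.reset()/env.step(a) -> (next state, reward, cost), the step-size schedules, and the reference pair (s_ref, a_ref) are all assumptions made for illustration.

    import numpy as np

    def cost_aware_rvi_q_learning(env, n_states, n_actions, steps=200_000, eps=0.1, seed=0):
        """Illustrative two-timescale sketch (assumed interface, not the paper's method)."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        rho = 1.0                      # current estimate of (avg reward) / (avg cost)
        s_ref, a_ref = 0, 0            # reference state-action pair for the RVI offset
        s = env.reset()
        for t in range(1, steps + 1):
            alpha = 1.0 / t ** 0.6     # fast timescale: Q-update
            beta = 1.0 / t             # slow timescale: ratio update
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, c = env.step(a)             # assumed to return reward AND cost
            surrogate = r - rho * c                # Dinkelbach-style surrogate reward
            td_target = surrogate + np.max(Q[s_next]) - Q[s_ref, a_ref]
            Q[s, a] += alpha * (td_target - Q[s, a])
            rho += beta * surrogate                # drives long-run avg of r - rho*c toward 0
            s = s_next
        return Q, rho

The step sizes keep the Q-update on the faster timescale (larger steps) and the ratio estimate on the slower one, matching the two-timescale structure mentioned in the abstract; at a fixed point the average surrogate reward is zero, i.e., \rho equals the reward-to-cost ratio.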
Pages: 11
Related Papers
50 results in total
  • [21] A Deep Reinforcement Learning-Based Preemptive Approach for Cost-Aware Cloud Job Scheduling
    Cheng, Long
    Wang, Yue
    Cheng, Feng
    Liu, Cheng
    Zhao, Zhiming
    Wang, Ying
    IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, 2024, 9 (3): 422-432
  • [22] A Structure-aware Online Learning Algorithm for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019: 71-78
  • [23] Risk-aware Q-Learning for Markov Decision Processes
    Huang, Wenjie
    Haskell, William B.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017
  • [25] Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes
    Bolshakov, V. E.
    Alfimtsev, A. N.
    DOKLADY MATHEMATICS, 2023, 108 (SUPPL 2): S382-S392
  • [27] Online Learning in Markov Decision Processes with Changing Cost Sequences
    Dick, Travis
    Gyorgy, Andras
    Szepesvari, Csaba
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014
  • [28] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [29] Model-Free Reinforcement Learning for Branching Markov Decision Processes
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 2021, 12760: 651-673
  • [30] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016: 1289-1290