Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0

Authors
Suttle, Wesley A. [1 ]
Zhang, Kaiqing [2 ]
Yang, Zhuoran [3 ]
Kraemer, David N. [1 ]
Liu, Ji [4 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet RL algorithms for ratio maximization remain largely unexplored. This paper addresses that gap by introducing two new model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning; the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case, and almost sure convergence of the latter when linear function approximation is used for the critic. Unlike previous methods, both algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results that demonstrate promising performance and strongly support the theoretical guarantees.
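To make the flavor of the first algorithm concrete, below is a minimal tabular sketch of a two-timescale, RVI-Q-learning-style update for the ratio objective rho(pi) = (long-run average reward) / (long-run average cost). The Dinkelbach-style surrogate reward r - rho*c, the step-size schedules, the reference state-action pair, and the toy MDP interface are all illustrative assumptions chosen for exposition, not the paper's exact update rules.

```python
import numpy as np

def epsilon_greedy(Q, s, rng, eps=0.1):
    """Explore uniformly with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def cost_aware_rvi_q(P, R, C, T=200_000, seed=0):
    """Two-timescale, RVI-Q-learning-style sketch for maximizing the ratio of
    long-run average reward to long-run average cost (details illustrative).

    P: (S, A, S) transition kernel; R, C: (S, A) reward and cost tables, C > 0.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    rho = 1.0                # running estimate of the optimal reward/cost ratio
    avg_r, avg_c = 0.0, 1.0  # slow-timescale running averages
    s_ref, a_ref = 0, 0      # reference pair used as the RVI offset
    s = 0
    for t in range(1, T + 1):
        a = epsilon_greedy(Q, s, rng)
        s_next = int(rng.choice(S, p=P[s, a]))
        r, c = R[s, a], C[s, a]
        alpha = 1.0 / t ** 0.6   # fast step size (Q-values)
        beta = 1.0 / t           # slow step size (ratio estimate)
        # Fast timescale: RVI Q-learning on the Dinkelbach-style surrogate
        # reward r - rho * c, with Q[s_ref, a_ref] subtracted to keep the
        # relative values bounded.
        target = (r - rho * c) - Q[s_ref, a_ref] + Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        # Slow timescale: update running averages, then refresh the ratio.
        avg_r += beta * (r - avg_r)
        avg_c += beta * (c - avg_c)
        rho = avg_r / max(avg_c, 1e-8)
        s = s_next
    return Q, rho

# Toy usage on a random 5-state, 2-action MDP with costs bounded away from zero.
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(5), size=(5, 2))   # (S, A, S) transition kernel
R = rng.uniform(0.0, 1.0, size=(5, 2))       # rewards in [0, 1]
C = rng.uniform(0.5, 1.5, size=(5, 2))       # strictly positive costs
Q, rho = cost_aware_rvi_q(P, R, C, T=50_000)
print(f"estimated optimal reward/cost ratio: {rho:.3f}")
```

The two timescales are the point of the construction: the Q-values move on the faster step size, so they effectively see a frozen ratio estimate, while the slowly updated rho tracks the empirical reward-to-cost ratio. This separation is the standard argument behind almost sure convergence proofs for schemes of this kind.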
Pages: 11
Related Papers
50 results in total
  • [1] Reinforcement learning based algorithms for average cost Markov decision processes
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, 17 (01): 23-52
  • [2] Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs
    Murti, Fahri Wisnu
    Ali, Samad
    Iosifidis, George
    Latva-aho, Matti
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, 21 (01): 200-216
  • [3] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [4] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04): 1325-1353
  • [5] Cost-aware job scheduling for cloud instances using deep reinforcement learning
    Cheng, Feng
    Huang, Yifeng
    Tanpure, Bhavana
    Sawalani, Pawan
    Cheng, Long
    Liu, Cong
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2022, 25 (01): 619-631
  • [6] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005: 199-204
  • [7] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14: 261-283
  • [8] NENYA: Cascade Reinforcement Learning for Cost-Aware Failure Mitigation at Microsoft 365
    Wang, Lu
    Zhao, Pu
    Du, Chao
    Luo, Chuan
    Su, Mengna
    Yang, Fangkai
    Liu, Yudong
    Lin, Qingwei
    Wang, Min
    Dang, Yingnong
    Zhang, Hongyu
    Rajmohan, Saravan
    Zhang, Dongmei
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4032-4040