Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0

Authors
Suttle, Wesley A. [1 ]
Zhang, Kaiqing [2 ]
Yang, Zhuoran [3 ]
Kraemer, David N. [1 ]
Liu, Ji [4 ]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet RL algorithms for ratio maximization remain largely unexplored. This paper addresses that gap by introducing two new model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning; the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case, and almost sure convergence of the latter when linear function approximation is used for the critic. Unlike previous methods, both algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results that demonstrate promising performance and strongly support the theoretical guarantees.
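To make the flavor of the first algorithm concrete, below is a minimal tabular sketch of a two-timescale, RVI-Q-learning-style update for the ratio objective rho(pi) = (long-run average reward) / (long-run average cost). The Dinkelbach-style surrogate reward r - rho*c, the step-size schedules, the reference state-action pair, and the toy MDP interface are all illustrative assumptions chosen for exposition, not the paper's exact update rules.

```python
import numpy as np

def epsilon_greedy(Q, s, rng, eps=0.1):
    """Explore uniformly with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def cost_aware_rvi_q(P, R, C, T=200_000, seed=0):
    """Two-timescale, RVI-Q-learning-style sketch for maximizing the ratio of
    long-run average reward to long-run average cost (details illustrative).

    P: (S, A, S) transition kernel; R, C: (S, A) reward and cost tables, C > 0.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    rho = 1.0                # running estimate of the optimal reward/cost ratio
    avg_r, avg_c = 0.0, 1.0  # slow-timescale running averages
    s_ref, a_ref = 0, 0      # reference pair used as the RVI offset
    s = 0
    for t in range(1, T + 1):
        a = epsilon_greedy(Q, s, rng)
        s_next = int(rng.choice(S, p=P[s, a]))
        r, c = R[s, a], C[s, a]
        alpha = 1.0 / t ** 0.6   # fast step size (Q-values)
        beta = 1.0 / t           # slow step size (ratio estimate)
        # Fast timescale: RVI Q-learning on the Dinkelbach-style surrogate
        # reward r - rho * c, with Q[s_ref, a_ref] subtracted to keep the
        # relative values bounded.
        target = (r - rho * c) - Q[s_ref, a_ref] + Q[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        # Slow timescale: update running averages, then refresh the ratio.
        avg_r += beta * (r - avg_r)
        avg_c += beta * (c - avg_c)
        rho = avg_r / max(avg_c, 1e-8)
        s = s_next
    return Q, rho

# Toy usage on a random 5-state, 2-action MDP with costs bounded away from zero.
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(5), size=(5, 2))   # (S, A, S) transition kernel
R = rng.uniform(0.0, 1.0, size=(5, 2))       # rewards in [0, 1]
C = rng.uniform(0.5, 1.5, size=(5, 2))       # strictly positive costs
Q, rho = cost_aware_rvi_q(P, R, C, T=50_000)
print(f"estimated optimal reward/cost ratio: {rho:.3f}")
```

The two timescales are the point of the construction: the Q-values move on the faster step size, so they effectively see a frozen ratio estimate, while the slowly updated rho tracks the empirical reward-to-cost ratio. This separation is the standard argument behind almost sure convergence proofs for schemes of this kind.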
Pages: 11
Related Papers
50 results in total
  • [1] Reinforcement learning based algorithms for average cost Markov decision processes
    Abdulla, Mohammed Shahid
    Bhatnagar, Shalabh
    DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2007, 17 (01): 23-52
  • [2] Deep Reinforcement Learning for Orchestrating Cost-Aware Reconfigurations of vRANs
    Murti, Fahri Wisnu
    Ali, Samad
    Iosifidis, George
    Latva-aho, Matti
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, 21 (01): 200-216
  • [3] Reinforcement Learning for Constrained Markov Decision Processes
    Gattami, Ather
    Bai, Qinbo
    Aggarwal, Vaneet
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [4] Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Xu, Huan
    Mannor, Shie
    MATHEMATICS OF OPERATIONS RESEARCH, 2016, 41 (04): 1325-1353
  • [5] Cost-aware job scheduling for cloud instances using deep reinforcement learning
    Cheng, Feng
    Huang, Yifeng
    Tanpure, Bhavana
    Sawalani, Pawan
    Cheng, Long
    Liu, Cong
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2022, 25 (01): 619-631
  • [6] A reinforcement learning based algorithm for Markov decision processes
    Bhatnagar, S
    Kumar, S
    2005 International Conference on Intelligent Sensing and Information Processing, Proceedings, 2005: 199-204
  • [7] A sensitivity view of Markov decision processes and reinforcement learning
    Cao, XR
    MODELING, CONTROL AND OPTIMIZATION OF COMPLEX SYSTEMS: IN HONOR OF PROFESSOR YU-CHI HO, 2003, 14: 261-283
  • [8] NENYA: Cascade Reinforcement Learning for Cost-Aware Failure Mitigation at Microsoft 365
    Wang, Lu
    Zhao, Pu
    Du, Chao
    Luo, Chuan
    Su, Mengna
    Yang, Fangkai
    Liu, Yudong
    Lin, Qingwei
    Wang, Min
    Dang, Yingnong
    Zhang, Hongyu
    Rajmohan, Saravan
    Zhang, Dongmei
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022: 4032-4040