Reinforcement Learning for Cost-Aware Markov Decision Processes

Cited by: 0
Authors
Suttle, Wesley A. [1]
Zhang, Kaiqing [2]
Yang, Zhuoran [3]
Kraemer, David N. [1]
Liu, Ji [4]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Coordinated Sci Lab, Urbana, IL USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ USA
[4] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Keywords
ACTOR-CRITIC ALGORITHM; CRITERIA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Ratio maximization has applications in areas as diverse as finance, reward shaping for reinforcement learning (RL), and the development of safe artificial intelligence, yet there has been very little exploration of RL algorithms for ratio maximization. This paper addresses this deficiency by introducing two new, model-free RL algorithms for solving cost-aware Markov decision processes, where the goal is to maximize the ratio of long-run average reward to long-run average cost. The first algorithm is a two-timescale scheme based on relative value iteration (RVI) Q-learning and the second is an actor-critic scheme. The paper proves almost sure convergence of the former to the globally optimal solution in the tabular case and almost sure convergence of the latter under linear function approximation for the critic. Unlike previous methods, the two algorithms provably converge for general reward and cost functions under suitable conditions. The paper also provides empirical results demonstrating promising performance and lending strong support to the theoretical results.
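For context, the cost-aware criterion described above can be stated formally. Writing r and c for the per-step reward and cost functions and (s_t, a_t) for the state-action trajectory under a stationary policy \pi (this notation is introduced here for illustration and is not taken from the record), the objective is the ratio of long-run averages

    \rho(\pi) = \frac{\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\pi\big[\sum_{t=0}^{T-1} r(s_t, a_t)\big]}{\lim_{T\to\infty} \frac{1}{T}\, \mathbb{E}_\pi\big[\sum_{t=0}^{T-1} c(s_t, a_t)\big]}

and both algorithms seek a policy maximizing \rho(\pi). A minimal tabular sketch of a two-timescale scheme in the spirit of the abstract, pairing an RVI-Q-learning-style update on the Dinkelbach surrogate reward r - \rho c with a slower update of the ratio estimate \rho, might look as follows. This is not the paper's algorithm; the environment interface env.reset()/env.step(a) -> (next state, reward, cost), the step-size schedules, and the reference pair (s_ref, a_ref) are all assumptions made for illustration.

    import numpy as np

    def cost_aware_rvi_q_learning(env, n_states, n_actions, steps=200_000, eps=0.1, seed=0):
        """Illustrative two-timescale sketch (assumed interface, not the paper's method)."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        rho = 1.0                      # current estimate of (avg reward) / (avg cost)
        s_ref, a_ref = 0, 0            # reference state-action pair for the RVI offset
        s = env.reset()
        for t in range(1, steps + 1):
            alpha = 1.0 / t ** 0.6     # fast timescale: Q-update
            beta = 1.0 / t             # slow timescale: ratio update
            a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, c = env.step(a)             # assumed to return reward AND cost
            surrogate = r - rho * c                # Dinkelbach-style surrogate reward
            td_target = surrogate + np.max(Q[s_next]) - Q[s_ref, a_ref]
            Q[s, a] += alpha * (td_target - Q[s, a])
            rho += beta * surrogate                # drives long-run avg of r - rho*c toward 0
            s = s_next
        return Q, rho

The step sizes keep the Q-update on the faster timescale (larger steps) and the ratio estimate on the slower one, matching the two-timescale structure mentioned in the abstract; at a fixed point the average surrogate reward is zero, i.e., \rho equals the reward-to-cost ratio.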
Pages: 11
Related Papers
50 results in total
  • [21] A Deep Reinforcement Learning-Based Preemptive Approach for Cost-Aware Cloud Job Scheduling
    Cheng, Long
    Wang, Yue
    Cheng, Feng
    Liu, Cheng
    Zhao, Zhiming
    Wang, Ying
    IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING, 2024, 9 (3): 422-432
  • [22] A Structure-aware Online Learning Algorithm for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    PROCEEDINGS OF THE 12TH EAI INTERNATIONAL CONFERENCE ON PERFORMANCE EVALUATION METHODOLOGIES AND TOOLS (VALUETOOLS 2019), 2019: 71-78
  • [23] Risk-aware Q-Learning for Markov Decision Processes
    Huang, Wenjie
    Haskell, William B.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017
  • [25] Hierarchical Method for Cooperative Multiagent Reinforcement Learning in Markov Decision Processes
    Bolshakov, V. E.
    Alfimtsev, A. N.
    DOKLADY MATHEMATICS, 2023, 108 (SUPPL 2): S382-S392
  • [27] Online Learning in Markov Decision Processes with Changing Cost Sequences
    Dick, Travis
    Gyorgy, Andras
    Szepesvari, Csaba
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014
  • [28] Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
    Lim, Shiau Hong
    Autef, Arnaud
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019
  • [29] Model-Free Reinforcement Learning for Branching Markov Decision Processes
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    COMPUTER AIDED VERIFICATION, PT II, CAV 2021, 2021, 12760: 651-673
  • [30] Reinforcement Learning Algorithms for Regret Minimization in Structured Markov Decision Processes
    Prabuchandran, K. J.
    Bodas, Tejas
    Tulabandhula, Theja
    AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016: 1289-1290