Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Cited by: 0

Authors:

Moreno, Bianca Marin [1]

Bregere, Margaux [2,3]

Gaillard, Pierre [1]

Oudjane, Nadia [3]

Affiliations:

[1] Inria THOTH, Paris, France

[2] Sorbonne Univ, LPSM, Paris, France

[3] EDF R&D, Paris, France

Source:

INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238 | 2024 / Vol. 238

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These tasks include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning (CURL) problem. Since CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite-horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method that adapts MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, Greedy MD-CURL benefits from low computational complexity while guaranteeing sub-linear or even logarithmic regret, depending on the level of information available on the underlying dynamics.
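
To make the phrase "a convex function of an occupancy measure" concrete, the following minimal sketch computes the state-action distributions induced by a policy in a small finite-horizon Markov decision process and evaluates two objectives on them: a linear cost (the standard reinforcement learning case) and a negative entropy of the state marginals (an exploration-style convex objective). It is not MD-CURL and is not taken from the paper. The function names, the time-homogeneous transition kernel, the two-state example, and the choice of objectives are all assumptions made for illustration.

```python
from math import log


def occupancy_measures(mu0, P, policy):
    """Forward recursion for the state-action distributions of a policy.

    Hypothetical layout (an assumption of this sketch):
      mu0[x]          initial state distribution
      P[x][a][y]      probability of moving from state x to y under action a
      policy[n][x][a] probability of playing a in state x at step n
    Returns mu with mu[n][x][a] = probability of being in x and playing a at step n.
    """
    n_states = len(mu0)
    n_actions = len(P[0])
    state_dist = list(mu0)
    mu = []
    for pi_n in policy:
        # Joint distribution at this step: state marginal times action probability.
        mu_n = [[state_dist[x] * pi_n[x][a] for a in range(n_actions)]
                for x in range(n_states)]
        mu.append(mu_n)
        # Push the joint distribution through the dynamics to get the next marginal.
        state_dist = [
            sum(mu_n[x][a] * P[x][a][y]
                for x in range(n_states) for a in range(n_actions))
            for y in range(n_states)
        ]
    return mu


def linear_cost(mu, cost):
    """Linear objective: expected cumulative cost, with cost[x][a] assumed given."""
    return sum(mu_n[x][a] * cost[x][a]
               for mu_n in mu
               for x in range(len(mu_n))
               for a in range(len(mu_n[x])))


def negative_state_entropy(mu):
    """Convex, non-linear objective: sum over steps of sum_x p(x) log p(x)."""
    total = 0.0
    for mu_n in mu:
        for row in mu_n:
            p = sum(row)  # state marginal at this step
            if p > 0.0:
                total += p * log(p)
    return total


# Toy example: two states, action 0 stays in place, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
mu = occupancy_measures([1.0, 0.0], P, [uniform, uniform])
print(linear_cost(mu, [[0.0, 0.0], [1.0, 1.0]]))
print(negative_state_entropy(mu))
```

The linear objective can be handled by classical dynamic programming, whereas the entropy objective depends on the whole distribution at each step and cannot be written as an expected sum of per-step rewards, which is the situation the abstract refers to when it says CURL invalidates the Bellman equations.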

Pages: 36
