Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Times cited: 0

Authors:

Moreno, Bianca Marin [1]

Bregere, Margaux [2, 3]

Gaillard, Pierre [1]

Oudjane, Nadia [3]

Affiliations:

[1] Inria THOTH, Paris, France

[2] Sorbonne Univ, LPSM, Paris, France

[3] EDF R&D, Paris, France

Source:

International Conference on Artificial Intelligence and Statistics, Vol. 238 | 2024

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Since CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite-horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method adapting MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, Greedy MD-CURL has low computational complexity, while guaranteeing sub-linear or even logarithmic regret, depending on how much information about the underlying dynamics is available.
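The abstract describes MD-CURL only at a high level. As a rough illustration of the ingredients it names (an occupancy measure, a convex objective F of that measure, and a mirror-descent step with a closed-form policy update instead of a projection), here is a minimal Python sketch on a hypothetical two-state, finite-horizon MDP. It assumes an exponential-weights policy update driven by Q-values computed by backward induction, with the gradient of F used as the per-step cost. This form of update, the toy MDP, the quadratic objective, and every name and parameter (`step`, `TARGET`, `tau`, `iters`, `run`, and so on) are illustrative assumptions, not the paper's exact algorithm, regularizer, or step size.

```python
import math

# Hypothetical toy problem (not from the paper): 2 states, 2 actions, horizon 3.
# Action 0 keeps the current state, action 1 switches to the other state.
N_STATES, N_ACTIONS, HORIZON = 2, 2, 3
TARGET = [0.0, 1.0]  # assumed target state distribution at every step


def step(x, a):
    """Deterministic transition used only for this illustration."""
    return x if a == 0 else 1 - x


def occupancy(policy, mu0):
    """State-action occupancy mu[n][x][a] induced by a time-dependent policy."""
    rho, mu = list(mu0), []
    for n in range(HORIZON):
        mu_n = [[rho[x] * policy[n][x][a] for a in range(N_ACTIONS)]
                for x in range(N_STATES)]
        mu.append(mu_n)
        rho = [0.0] * N_STATES
        for x in range(N_STATES):
            for a in range(N_ACTIONS):
                rho[step(x, a)] += mu_n[x][a]  # push mass to the next state
    return mu


def marginal(mu_n):
    """State marginal rho_n(x) = sum_a mu_n(x, a)."""
    return [sum(mu_n[x]) for x in range(N_STATES)]


def objective(mu):
    """Convex F(mu): squared distance of each step's state marginal to TARGET."""
    return sum((r - t) ** 2 for mu_n in mu for r, t in zip(marginal(mu_n), TARGET))


def gradient(mu):
    """dF/dmu_n(x, a) = 2 (rho_n(x) - TARGET[x]), the same for every action."""
    return [[[2.0 * (r - t)] * N_ACTIONS for r, t in zip(marginal(mu_n), TARGET)]
            for mu_n in mu]


def q_values(policy, cost):
    """Backward induction that treats the gradient of F as a per-step cost."""
    q, v_next = [None] * HORIZON, [0.0] * N_STATES
    for n in reversed(range(HORIZON)):
        q[n] = [[cost[n][x][a] + v_next[step(x, a)] for a in range(N_ACTIONS)]
                for x in range(N_STATES)]
        v_next = [sum(policy[n][x][a] * q[n][x][a] for a in range(N_ACTIONS))
                  for x in range(N_STATES)]
    return q


def md_update(policy, q, tau):
    """Assumed exponential-weights step: pi_n(a|x) proportional to pi_n(a|x) * exp(-tau * Q_n(x, a))."""
    new_policy = []
    for n in range(HORIZON):
        rows = []
        for x in range(N_STATES):
            w = [policy[n][x][a] * math.exp(-tau * q[n][x][a]) for a in range(N_ACTIONS)]
            z = sum(w)
            rows.append([wa / z for wa in w])  # renormalize; no separate projection step
        new_policy.append(rows)
    return new_policy


def run(tau=0.5, iters=50, mu0=(0.5, 0.5)):
    """Start from the uniform policy and apply `iters` mirror-descent steps."""
    policy = [[[1.0 / N_ACTIONS] * N_ACTIONS for _ in range(N_STATES)]
              for _ in range(HORIZON)]
    history = []
    for _ in range(iters):
        mu = occupancy(policy, mu0)
        history.append(objective(mu))
        policy = md_update(policy, q_values(policy, gradient(mu)), tau)
    history.append(objective(occupancy(policy, mu0)))
    return policy, history


if __name__ == "__main__":
    policy, history = run()
    print(f"F: {history[0]:.3f} -> {history[-1]:.4f}")
```

Starting from the uniform policy, F equals 1.5 in this toy problem and decreases toward 0.5, its minimum here, because the step-0 state distribution is fixed by `mu0`. Since the update multiplies each policy row by positive weights and renormalizes it, every row stays a probability distribution, which illustrates why an update of this shape needs no separate projection step.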


Pages: 36
