Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Authors
Moreno, Bianca Marin [1]
Bregere, Margaux [2,3]
Gaillard, Pierre [1]
Oudjane, Nadia [3]
Affiliations
[1] Inria THOTH, Paris, France
[2] Sorbonne Univ, LPSM, Paris, France
[3] EDF R&D, Paris, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate it. These include reinforcement learning and imitation learning, among others. This more general paradigm is known as the Concave Utility Reinforcement Learning (CURL) problem. Since CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in finite-horizon Markov decision processes. MD-CURL is inspired by mirror descent and uses a non-standard regularization to obtain both convergence guarantees and a simple closed-form update, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method that adapts MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, Greedy MD-CURL has low computational complexity, while guaranteeing sublinear or even logarithmic regret, depending on how much information about the underlying dynamics is available.
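The two building blocks named in the abstract, occupancy measures and projection-free mirror descent, can be illustrated with a small NumPy sketch. It assumes a tabular finite-horizon MDP with initial distribution p0, transition kernel P[x, a, y] and a time-dependent policy pi[n, x, a]; all function names are hypothetical. The objective neg_state_entropy (negative entropy of the state marginals, a common exploration-style example of a convex function of the occupancy measure) is an illustrative choice, not taken from the paper. The update md_simplex_step is the generic KL-regularized (exponentiated-gradient) mirror-descent step on each action simplex, whose minimizer has the closed form q ∝ pi · exp(-tau · g), so no projection is needed. It is not MD-CURL itself: the paper's specific regularizer and the quantity playing the role of g are not reproduced here.

```python
import numpy as np


def occupancy_measure(p0, P, pi):
    """State-action occupancy measures mu[n, x, a] of a tabular finite-horizon MDP.

    p0: initial state distribution, shape (S,)
    P:  transition kernel, P[x, a, y] = Pr(next state y | state x, action a), shape (S, A, S)
    pi: time-dependent policy, pi[n, x, a] = Pr(action a | state x at step n), shape (N, S, A)
    """
    N, S, A = pi.shape
    mu = np.zeros((N, S, A))
    rho = np.asarray(p0, dtype=float)  # state marginal at the current step
    for n in range(N):
        mu[n] = rho[:, None] * pi[n]            # mu_n(x, a) = rho_n(x) * pi_n(a | x)
        rho = np.einsum("xa,xay->y", mu[n], P)  # rho_{n+1}(y) = sum_{x,a} mu_n(x, a) P(y | x, a)
    return mu


def neg_state_entropy(mu, eps=1e-12):
    """Illustrative convex objective F(mu) = sum_n sum_x rho_n(x) log rho_n(x)."""
    rho = mu.sum(axis=2)  # state marginals rho_n(x), shape (N, S)
    return float(np.sum(rho * np.log(rho + eps)))


def md_simplex_step(pi, g, tau):
    """Generic KL-regularized mirror-descent step on every action simplex pi[n, x, :].

    Solves argmin_q <g, q> + KL(q || pi) / tau over the simplex; the minimizer is
    q proportional to pi * exp(-tau * g), so no projection step is needed.
    """
    w = np.exp(-tau * (g - g.min(axis=-1, keepdims=True)))  # shifted for stability, values in (0, 1]
    q = pi * w
    return q / q.sum(axis=-1, keepdims=True)


# Tiny example: 2 states, 2 actions, horizon 3, next state uniform regardless of (x, a).
p0 = np.array([1.0, 0.0])
P = np.full((2, 2, 2), 0.5)
pi = np.full((3, 2, 2), 0.5)
mu = occupancy_measure(p0, P, pi)
print(neg_state_entropy(mu))  # approximately 2 * log(0.5) = -1.386

g = np.zeros_like(pi)
g[..., 0] = 1.0  # hypothetical cost that penalizes action 0 everywhere
print(md_simplex_step(pi, g, tau=1.0)[0, 0])  # action 0 now has lower probability than action 1
```

Because the shifted factors exp(-tau (g - min g)) lie in (0, 1], the update cannot overflow, and any action with zero probability stays at zero, as is usual for multiplicative updates.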
Pages: 36