Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Authors
Moreno, Bianca Marin [1]
Bregere, Margaux [2, 3]
Gaillard, Pierre [1]
Oudjane, Nadia [3]
Affiliations
[1] Inria THOTH, Paris, France
[2] Sorbonne Univ, LPSM, Paris, France
[3] EDF R&D, Paris, France
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many machine learning tasks can be solved by minimizing a convex function of the occupancy measure over the policies that generate it. These tasks include reinforcement learning and imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning (CURL) problem. Because CURL invalidates the classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite-horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method that adapts MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, Greedy MD-CURL has low computational complexity while guaranteeing sub-linear or even logarithmic regret, depending on how much information is available about the underlying dynamics.
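To make the objects named in the abstract concrete, the following is a minimal sketch of how the occupancy measure of a policy can be computed in a small finite-horizon MDP, and how a convex function of it can be evaluated. It is not the MD-CURL algorithm and does not reproduce the paper's update rule. All names (occupancy_measure, linear_objective, negative_entropy_objective), the toy two-state MDP, and the choice of the two example objectives are assumptions made for illustration only.

```python
from math import log


def occupancy_measure(policy, transition, initial, horizon):
    """Return mu[n][x][a], the probability of being in state x and
    taking action a at step n when following `policy`.

    policy[n][x][a]     : probability of action a in state x at step n
    transition[x][a][y] : probability of moving from x to y under action a
    initial[x]          : initial state distribution
    """
    n_states = len(initial)
    n_actions = len(policy[0][0])
    state_dist = list(initial)
    mu = []
    for n in range(horizon):
        # Joint state-action distribution at step n.
        mu_n = [[state_dist[x] * policy[n][x][a] for a in range(n_actions)]
                for x in range(n_states)]
        mu.append(mu_n)
        # Push the distribution one step forward through the dynamics.
        state_dist = [
            sum(mu_n[x][a] * transition[x][a][y]
                for x in range(n_states) for a in range(n_actions))
            for y in range(n_states)
        ]
    return mu


def linear_objective(mu, reward):
    """Standard RL as a special case: F(mu) = -<mu, reward> (linear, hence convex)."""
    return -sum(mu_n[x][a] * reward[x][a]
                for mu_n in mu
                for x in range(len(reward))
                for a in range(len(reward[0])))


def negative_entropy_objective(mu):
    """An illustrative non-linear convex objective: sum of mu * log(mu)."""
    return sum(p * log(p) for mu_n in mu for row in mu_n for p in row if p > 0)


# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
transition = [[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]]
initial = [1.0, 0.0]
horizon = 2
uniform_policy = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(horizon)]
reward = [[0.0, 0.0], [1.0, 1.0]]  # reward 1 whenever the agent is in state 1

mu = occupancy_measure(uniform_policy, transition, initial, horizon)
rl_value = linear_objective(mu, reward)
exploration_value = negative_entropy_objective(mu)
```

With a linear objective the problem reduces to ordinary reinforcement learning; the negative-entropy objective is not linear in mu, which is the kind of case where, as the abstract notes, the classical Bellman equations no longer apply.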
Pages: 36