Concurrent Credit Assignment for Data-efficient Reinforcement Learning

Cited by: 1
Authors
Dauce, Emmanuel [1]
Affiliations
[1] CNRS, Cent Marseille, Inst Neurosci la Timone, Marseille, France
DOI
10.1109/IJCNN55064.2022.9892560
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The capacity to widely sample the state and action spaces is a key ingredient of effective reinforcement learning algorithms. The method presented in this paper relies on an occupancy model, that is, the empirical distribution of the states encountered by the agent under a given policy: its "domain of operation". Under a uniform occupancy prior, an evidence lower bound on the parameters of the policy expresses a balance between two concurrent tendencies, the widening of the occupancy space and the maximization of the rewards, reminiscent of the classical exploration/exploitation trade-off. During training, both the policy and the occupancy model are updated as exploration progresses and new states are disclosed. Implemented on an off-policy actor-critic algorithm on classic continuous-action benchmarks, this approach is shown to yield a significant increase in sampling efficiency, reflected in reduced training time and higher returns, in both the dense- and sparse-reward cases.
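The balance described in the abstract, between widening the occupancy space and maximizing reward, can be illustrated with a minimal toy sketch. The snippet below is not the paper's algorithm: it assumes a 1-D state in [0, 1), models empirical occupancy with a simple smoothed histogram, and shapes each reward with a novelty bonus proportional to -log p(s), weighted by a hypothetical coefficient `beta`. Rarely visited states receive a larger bonus, which loosely mirrors the exploration term of the trade-off.

```python
import numpy as np

def occupancy_bonus_returns(states, rewards, n_bins=10, beta=0.1):
    """Shape extrinsic rewards with an occupancy-based exploration bonus.

    Toy illustration only: the empirical occupancy model is a histogram of
    visited 1-D states in [0, 1); the bonus -log p(s) grows for rarely
    visited bins. `beta` (a made-up name) weights the trade-off between
    widening the occupancy space and maximizing the extrinsic reward.
    """
    counts = np.ones(n_bins)  # add-one smoothing keeps the log finite
    shaped = []
    for s, r in zip(states, rewards):
        b = min(int(s * n_bins), n_bins - 1)   # histogram bin of the state
        p = counts[b] / counts.sum()           # current occupancy estimate
        shaped.append(r + beta * (-np.log(p))) # reward + novelty bonus
        counts[b] += 1                         # update occupancy model online
    return shaped
```

Visiting the same bin repeatedly drives its bonus toward zero, while a first visit to an untouched bin yields the largest shaped reward, so the agent is nudged to spread its occupancy before pure exploitation takes over.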
Pages: 8