Concurrent Credit Assignment for Data-efficient Reinforcement Learning

Cited by: 1
Authors
Dauce, Emmanuel [1]
Affiliations
[1] CNRS, Cent Marseille, Inst Neurosci la Timone, Marseille, France
Keywords
DOI
10.1109/IJCNN55064.2022.9892560
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The capacity to widely sample the state and action spaces is a key ingredient toward building effective reinforcement learning algorithms. The method presented in this paper relies on an occupancy model, that is, the empirical distribution of the states encountered by the agent under a given policy, or its "domain of operation". Under a uniform occupancy prior assumption, an evidence lower bound on the parameters of the policy then provides a way to express a balance between two concurrent tendencies, namely the widening of the occupancy space and the maximization of the rewards, reminiscent of the classical exploration/exploitation trade-off. During training, both the policy and the occupancy model are updated as exploration progresses and new states are uncovered. Implemented on an off-policy actor-critic algorithm and evaluated on classic continuous-action benchmarks, this approach is shown to provide a significant increase in sampling efficacy, reflected in reduced training time and higher returns, in both the dense-reward and sparse-reward cases.
Pages: 8