Reinforcement Learning in Sparse-Reward Environments With Hindsight Policy Gradients

Cited by: 7
Authors
Rauber, Paulo [1 ]
Ummadisingu, Avinash [2 ]
Mutz, Filipe [3 ]
Schmidhuber, Juergen [4 ,5 ,6 ,7 ]
Affiliations
[1] Queen Mary Univ London, London E1 4FZ, England
[2] Preferred Networks, Tokyo 1000004, Japan
[3] Inst Fed Espirito Santo, BR-29056264 Vitoria, ES, Brazil
[4] Ist Dalle Molle Studi Intelligenza Artificiale, CH-6962 Viganello, Switzerland
[5] Univ Svizzera Italiana, CH-6900 Lugano, Switzerland
[6] Scuola Univ Profess Svizzera Italiana, CH-6928 Manno, Switzerland
[7] NNAISENSE, CH-6900 Lugano, Switzerland
Funding
Swiss National Science Foundation; European Research Council
Keywords
DOI
10.1162/neco_a_01387
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enabling sample-efficient learning. However, reinforcement learning agents have only recently been endowed with such a capacity for hindsight. In this letter, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
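The abstract's core idea, reusing an episode collected while pursuing one goal to learn about every goal achieved along the way, can be illustrated with a short sketch. The Python below is a hypothetical toy example rather than the paper's implementation: it assumes a tabular softmax policy on a small one-dimensional chain, and it corrects for the fact that actions were sampled under the original goal with a simple per-decision importance weight; the letter itself derives a more general family of hindsight estimators.

```python
import numpy as np

# Illustrative sketch only: the environment, names, and the particular
# importance-weighting scheme are assumptions, not the paper's setup.
rng = np.random.default_rng(0)
N_STATES, N_ACTIONS, HORIZON = 8, 2, 12   # chain length, {left, right}, episode cap

# theta[s, g, a]: logits of a tabular goal-conditional policy pi(a | s, g)
theta = np.zeros((N_STATES, N_STATES, N_ACTIONS))

def pi(s, g):
    """Softmax policy over actions, conditioned on state s and goal g."""
    z = np.exp(theta[s, g] - theta[s, g].max())
    return z / z.sum()

def rollout(goal):
    """Act under pi(. | s, goal); sparse reward: 1 only upon reaching goal."""
    s, traj = 0, []
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=pi(s, goal))
        s_next = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        traj.append((s, a, s_next))
        s = s_next
        if s == goal:
            break
    return traj

def hindsight_update(traj, goal, lr=0.1):
    """REINFORCE update for the original goal and for every goal achieved
    in hindsight. Actions were sampled under pi(. | s, goal), so for an
    alternative goal g, step t carries the likelihood-ratio weight
    prod_{k<=t} pi(a_k | s_k, g) / pi(a_k | s_k, goal)."""
    achieved = {s_next for (_, _, s_next) in traj}
    for g in {goal} | achieved:
        w, steps, ret = 1.0, [], 0.0
        for (s, a, s_next) in traj:
            w *= pi(s, g)[a] / pi(s, goal)[a]
            steps.append((s, a, w))
            if s_next == g:           # sparse reward under the relabeled goal
                ret = 1.0
                break
        for (s, a, w_t) in steps:
            grad_logp = -pi(s, g)     # gradient of log softmax w.r.t. logits
            grad_logp[a] += 1.0
            theta[s, g] += lr * w_t * ret * grad_logp

for episode in range(500):            # train on uniformly sampled goals
    g = int(rng.integers(1, N_STATES))
    hindsight_update(rollout(g), g)
```

Without the relabeled goals in `hindsight_update`, an episode that misses its intended goal yields a zero return and hence a zero gradient; relabeling turns almost every episode into a useful learning signal, which is the source of the sample-efficiency gains the paper reports.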
Pages: 1498 - 1553
Page count: 56
Related papers
50 items in total
  • [1] Reinforcement learning in sparse-reward environments with hindsight policy gradients
    Rauber, Paulo
    Ummadisingu, Avinash
    Mutz, Filipe
    Schmidhuber, Juergen
    [J]. NEURAL COMPUTATION, 2021, 33 (6): 1498 - 1553
  • [2] Planning-integrated Policy for Efficient Reinforcement Learning in Sparse-reward Environments
    Wulur, Christoper
    Weber, Cornelius
    Wermter, Stefan
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [3] Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks
    Guo, Yijie
    Wu, Qiucheng
    Lee, Honglak
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6792 - 6800
  • [4] Adaptive Variance for Changing Sparse-Reward Environments
    Lin, Xingyu
    Guo, Pengsheng
    Florensa, Carlos
    Held, David
    [J]. 2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 3210 - 3216
  • [5] A Novel State Space Exploration Method for the Sparse-Reward Reinforcement Learning Environment
    Liu, Xi
    Ma, Long
    Chen, Zhen
    Zheng, Changgang
    Chen, Ren
    Liao, Yong
    Yang, Shufan
    [J]. ARTIFICIAL INTELLIGENCE XL, AI 2023, 2023, 14381 : 216 - 221
  • [6] An Overview of Environmental Features that Impact Deep Reinforcement Learning in Sparse-Reward Domains
    Ocana, Jim Martin Catacora
    Capobianco, Roberto
    Nardi, Daniele
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2023, 76 : 1181 - 1218
  • [7] Hybrid Task Scheduling in Cloud Manufacturing With Sparse-Reward Deep Reinforcement Learning
    Wang, Xiaohan
    Laili, Yuanjun
    Zhang, Lin
    Liu, Yongkui
    [J]. IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, : 1 - 15
  • [8] Deep reinforcement learning applied to a sparse-reward trading environment with intraday data
    Takara, Lucas de Azevedo
    Santos, Andre Alves Portela
    Mariani, Viviana Cocco
    Coelho, Leandro dos Santos
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [9] Self-Supervised Online Reward Shaping in Sparse-Reward Environments
    Memarian, Farzan
    Goo, Wonjoon
    Lioutikov, Rudolf
    Niekum, Scott
    Topcu, Ufuk
    [J]. 2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 2369 - 2375