Population-based exploration in reinforcement learning through repulsive reward shaping using eligibility traces

Cited by: 1
Authors
Bal, Melis Ilayda [1 ,2 ]
Iyigun, Cem [2 ]
Polat, Faruk [3 ]
Aydin, Huseyin [3 ]
Affiliations
[1] Max Planck Inst Software Syst, Saarbrucken, Germany
[2] Middle East Tech Univ, Dept Ind Engn, Ankara, Turkiye
[3] Middle East Tech Univ, Dept Comp Engn, Ankara, Turkiye
Keywords
Reinforcement learning; Population-based exploration; Eligibility traces; Reward shaping; Coordinated agents
DOI
10.1007/s10479-023-05798-1
Chinese Library Classification (CLC) codes
C93 [Management]; O22 [Operations Research]
Discipline classification codes
070105; 12; 1201; 1202; 120202
Abstract
Efficient exploration plays a key role in accelerating learning and improving the sample efficiency of reinforcement learning tasks. In this paper, we propose a framework that serves as a population-based repulsive reward-shaping mechanism, using eligibility traces to improve the efficiency of state-space exploration under a tabular reinforcement learning representation. The framework consists of a hierarchical structure of RL agents in which a higher-level repulsive-reward-shaper agent (RRS-Agent) coordinates the exploration of its population of sub-agents through repulsion whenever the necessary conditions on their eligibility traces are met. Empirical results on well-known benchmark problem domains show that the framework achieves efficient exploration, with a significant improvement in learning performance and state-space coverage. Furthermore, the transparency of the proposed framework makes the agents' coordinated exploration decisions in the hierarchy explainable and supports the interpretability of the framework.
Pages: 689-725
Number of pages: 37
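
To make the mechanism described in the abstract concrete, below is a minimal, illustrative sketch of a population of tabular Sarsa(lambda) sub-agents whose rewards are shaped by a repulsive term derived from the eligibility traces of the other agents. The names SubAgent and repulsive_bonus and the constants BETA and TRACE_THRESHOLD are hypothetical choices for illustration; the exact repulsion condition and shaping magnitude applied by the paper's RRS-Agent are not reproduced here.

# Illustrative sketch only (assumptions as noted above): tabular Sarsa(lambda)
# sub-agents plus a repulsive shaping bonus computed from the eligibility
# traces of the other agents in the population.
import numpy as np

N_STATES, N_ACTIONS = 25, 4                  # toy grid-world sizes (assumed)
ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.9, 0.1
BETA = 0.5                                    # repulsion strength (hypothetical)
TRACE_THRESHOLD = 0.1                         # trace level treated as "recently visited" (hypothetical)

class SubAgent:
    """Tabular Sarsa(lambda) learner with accumulating eligibility traces."""

    def __init__(self):
        self.Q = np.zeros((N_STATES, N_ACTIONS))   # action-value table
        self.e = np.zeros((N_STATES, N_ACTIONS))   # eligibility traces

    def act(self, s):
        # Epsilon-greedy action selection.
        if np.random.rand() < EPSILON:
            return np.random.randint(N_ACTIONS)
        return int(np.argmax(self.Q[s]))

    def update(self, s, a, shaped_r, s_next, a_next, done):
        # Standard Sarsa(lambda) update driven by the shaped reward.
        target = shaped_r if done else shaped_r + GAMMA * self.Q[s_next, a_next]
        delta = target - self.Q[s, a]
        self.e[s, a] += 1.0
        self.Q += ALPHA * delta * self.e
        self.e *= GAMMA * LAMBDA
        if done:
            self.e[:] = 0.0

def repulsive_bonus(state, agent_idx, population):
    # Penalize a state in proportion to how strongly the other agents' traces
    # mark it as recently visited, steering the sub-agents apart.
    overlap = sum(other.e[state].max()
                  for j, other in enumerate(population) if j != agent_idx)
    return -BETA * overlap if overlap > TRACE_THRESHOLD else 0.0

In a training loop, each sub-agent i would add repulsive_bonus(s, i, population) to the environment reward before calling update, so that regions still fresh in the other agents' traces become temporarily less attractive and the population spreads over the state-space.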