Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

Cited: 0
Authors
Chen, Yifei [1 ]
Kasaei, Hamidreza [1 ]
Schomaker, Lambert [1 ]
Wiering, Marco [1 ]
Affiliation
[1] University of Groningen, Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, Groningen, Netherlands
DOI: 10.1109/IJCNN52387.2021.9533682
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. However, requiring prior knowledge to design shaping rewards is impractical and inconvenient. It can therefore be more effective to let the agent learn the shaping reward function itself during training. In this paper, building on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply this adaptive potential function (APF) while training an agent with Q-learning and develop two novel algorithms. The first is APF-QMLP, which combines the good/bad-state potential function with Q-learning and multi-layer perceptrons (MLPs) for estimating the Q-function. The second is APF-Dueling-DQN, which combines the novel potential function with Dueling DQN. In particular, APF-Dueling-DQN adopts an autoencoder to map image states from Atari games to hash codes. We evaluated the two algorithms empirically in four environments with low- or high-dimensional state spaces: a six-room maze, CartPole, Acrobot, and Ms-Pacman. The experimental results showed that the proposed adaptive potential function improved the performance of the selected reinforcement learning algorithms.
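To make the framework the abstract refers to concrete, the sketch below shows standard potential-based reward shaping combined with tabular Q-learning: the shaping term F(s, s') = γΦ(s') − Φ(s) is added to the environment reward, which is the form known to preserve the optimal policy. The potential function `phi` here is a hand-written placeholder over a hypothetical 6-state chain; in the paper, Φ is instead *learned* from states observed in good and bad episodes (the APF contribution), which is not reproduced in this sketch.

```python
import numpy as np

N_STATES, N_ACTIONS = 6, 2  # hypothetical small chain environment
GAMMA, ALPHA = 0.99, 0.1    # discount factor and learning rate

def phi(state):
    # Placeholder potential: grows toward the goal state (state N_STATES-1).
    # In the paper this function would be learned from good/bad episodes.
    return float(state) / (N_STATES - 1)

def shaped_q_update(Q, s, a, r, s_next, done):
    # Potential-based shaping term F(s, s') = gamma * phi(s') - phi(s),
    # with the terminal potential conventionally taken to be zero.
    F = (0.0 if done else GAMMA * phi(s_next)) - phi(s)
    # Standard Q-learning target, using the shaped reward r + F.
    target = r + F + (0.0 if done else GAMMA * np.max(Q[s_next]))
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

Q = np.zeros((N_STATES, N_ACTIONS))
# One transition from state 0 to state 1 with zero environment reward:
# the shaping term alone nudges Q toward states of higher potential.
Q = shaped_q_update(Q, s=0, a=1, r=0.0, s_next=1, done=False)
```

Because the shaping term is a difference of potentials, it telescopes along any trajectory, which is why policy invariance holds regardless of how Φ is obtained; the paper exploits this by training Φ online without risking a change in the optimal policy.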
Pages: 7