Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

Cited by: 0
Authors
Chen, Yifei [1 ]
Kasaei, Hamidreza [1 ]
Schomaker, Lambert [1 ]
Wiering, Marco [1 ]
Affiliation
[1] University of Groningen, Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, Groningen, Netherlands
DOI: 10.1109/IJCNN52387.2021.9533682
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405
Abstract
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. Nevertheless, it is often impractical and inconvenient to require prior knowledge for designing shaping rewards. Therefore, it could be more effective to let the agent learn the shaping reward function itself during training. In this paper, building on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply the proposed adaptive potential function (APF) while training an agent with Q-learning and develop two novel algorithms. The first, APF-QMLP, combines the good/bad-state potential function with Q-learning and multi-layer perceptrons (MLPs) for estimating the Q-function. The second, APF-Dueling-DQN, combines the novel potential function with Dueling DQN; in particular, it adopts an autoencoder to map image states from Atari games to hash codes. We evaluated the resulting algorithms empirically in four environments with low- or high-dimensional state spaces: a six-room maze, CartPole, Acrobot, and Ms-Pacman. The experimental results show that the proposed adaptive potential function improves the performance of the selected reinforcement learning algorithms.
Pages: 7