Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

Cited by: 0
Authors
Chen, Yifei [1 ]
Kasaei, Hamidreza [1 ]
Schomaker, Lambert [1 ]
Wiering, Marco [1 ]
Affiliations
[1] University of Groningen, Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, Groningen, Netherlands
DOI
10.1109/IJCNN52387.2021.9533682
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. However, requiring prior knowledge to design shaping rewards is impractical and inconvenient, so it can be more effective for the agent to learn the shaping reward function during training. In this paper, building on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply the proposed adaptive potential function (APF) while training an agent with Q-learning and develop two novel algorithms. The first, APF-QMLP, combines the good/bad-state potential function with Q-learning and multi-layer perceptrons (MLPs) to estimate the Q-function. The second, APF-Dueling-DQN, combines the novel potential function with Dueling DQN; in particular, an autoencoder is adopted in APF-Dueling-DQN to map image states from Atari games to hash codes. We evaluated the resulting algorithms empirically in four environments involving low- or high-dimensional state spaces: a six-room maze, CartPole, Acrobot, and Ms. Pac-Man. The experimental results showed that the proposed adaptive potential function improved the performance of the selected reinforcement learning algorithms.
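To make the mechanism concrete, the following is a minimal sketch of potential-based shaping with a potential learned from good and bad episodes. It assumes a simple logistic potential over state feature vectors; the names (AdaptivePotential, shaped_reward) and hyperparameters are illustrative, not the paper's implementation, which trains an MLP potential alongside APF-QMLP and APF-Dueling-DQN.

import numpy as np

class AdaptivePotential:
    """Illustrative logistic potential Phi(s), trained to score states
    from good episodes near 1 and states from bad episodes near 0."""

    def __init__(self, state_dim, lr=0.01):
        self.w = np.zeros(state_dim)  # weights of the logistic model
        self.b = 0.0                  # bias term
        self.lr = lr                  # learning rate

    def phi(self, s):
        # Potential in (0, 1): estimated probability that state s
        # was visited in a good episode.
        return 1.0 / (1.0 + np.exp(-(self.w @ s + self.b)))

    def update(self, states, label):
        # label = 1.0 for states from a good episode, 0.0 for a bad one;
        # one binary cross-entropy gradient step per state.
        for s in states:
            grad = self.phi(s) - label
            self.w -= self.lr * grad * s
            self.b -= self.lr * grad

def shaped_reward(r, s, s_next, potential, gamma=0.99):
    # Potential-based shaping (Ng et al., 1999):
    # F(s, s') = gamma * Phi(s') - Phi(s). Adding F to the environment
    # reward leaves the optimal policy unchanged (policy invariance).
    return r + gamma * potential.phi(s_next) - potential.phi(s)

In use, each finished episode would be labeled good or bad (for example, by comparing its return against a threshold), its visited states passed to update, and shaped_reward substituted for the raw reward in the Q-learning target.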
Pages: 7