Reinforcement Learning with Potential Functions Trained to Discriminate Good and Bad States

Cited by: 0
Authors
Chen, Yifei [1 ]
Kasaei, Hamidreza [1 ]
Schomaker, Lambert [1 ]
Wiering, Marco [1 ]
Affiliations
[1] Univ Groningen, Bernoulli Inst Math Comp Sci & Artificial Intelli, Groningen, Netherlands
DOI
10.1109/IJCNN52387.2021.9533682
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reward shaping is an efficient way to incorporate domain knowledge into a reinforcement learning agent. However, requiring prior knowledge to design shaping rewards is often impractical, so it can be more effective for the agent to learn the shaping reward function itself during training. In this paper, building on the potential-based reward shaping framework, which guarantees policy invariance, we propose to learn a potential function concurrently with training an agent using a reinforcement learning algorithm. In the proposed method, the potential function is trained by examining states that occur in good and in bad episodes. We apply this adaptive potential function while training an agent with Q-learning and develop two novel algorithms. The first, APF-QMLP, combines the good/bad-state potential function with Q-learning and multi-layer perceptrons (MLPs) for estimating the Q-function. The second, APF-Dueling-DQN, combines the potential function with Dueling DQN; in particular, an autoencoder is adopted in APF-Dueling-DQN to map image states from Atari games to hash codes. We evaluated both algorithms empirically in four environments: a six-room maze, CartPole, Acrobot, and Ms-Pacman, covering both low- and high-dimensional state spaces. The experimental results showed that the proposed adaptive potential function improved the performance of the selected reinforcement learning algorithms.
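The potential-based shaping rule at the core of the abstract, F(s, s') = γΦ(s') − Φ(s) added to the environment reward, can be sketched as follows. This is a minimal generic illustration on a toy chain MDP, not the paper's APF-QMLP or APF-Dueling-DQN: the environment, the hyperparameters, and the visit-count potential (a crude stand-in for the paper's learned good/bad-state discriminator) are all illustrative assumptions.

```python
import random

# Toy chain MDP: states 0..N-1; action 1 moves right, action 0 moves left
# (reflecting at 0). Reaching state N-1 gives reward 1 and ends the episode.
N = 5
GAMMA = 0.99

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    done = (s2 == N - 1)
    return s2, (1.0 if done else 0.0), done

# Illustrative "adaptive potential": the Laplace-smoothed fraction of a
# state's visits that occurred in good (successful) episodes -- a toy
# stand-in for the paper's learned good/bad-state discriminator.
good = [1.0] * N
total = [2.0] * N

def phi(s):
    return good[s] / total[s]

def run_episode(Q, alpha=0.5, max_steps=100):
    s, visited = 0, []
    for _ in range(max_steps):
        a = random.randrange(2)  # random behavior; Q-learning is off-policy
        s2, r, done = step(s, a)
        # Potential-based shaping: F(s, s') = gamma * phi(s') - phi(s),
        # with the potential fixed to 0 at the terminal state so that the
        # policy-invariance guarantee holds for episodic tasks.
        shaped = r + GAMMA * (0.0 if done else phi(s2)) - phi(s)
        target = shaped + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        visited.append(s)
        s = s2
        if done:
            return visited, True
    return visited, False

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]
for _ in range(500):
    visited, success = run_episode(Q)
    for s in visited:  # update the good/bad visit statistics
        total[s] += 1.0
        if success:
            good[s] += 1.0
```

Because the shaping term is potential-based, the shaped optimal values satisfy Q'*(s, a) = Q*(s, a) − Φ(s); the per-state offset cancels in the argmax, so after training the greedy policy still moves right at every non-terminal state, matching the unshaped optimum.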
Pages: 7