Enhancing Reinforcement Learning Performance in Delayed Reward System Using DQN and Heuristics

Cited by: 2
Author
Kim, Keecheon [1]
Affiliation
[1] Konkuk Univ, Dept Comp Informat & Commun Engn, Seoul 05029, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Games; Reinforcement learning; Shape; Q-learning; Licenses; Decision making; Visualization; Machine learning; reinforcement learning; heuristics; delayed reward system; Tetris;
DOI
10.1109/ACCESS.2022.3174361
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper proposes and implements a way to apply reinforcement learning to a delayed reward system, a setting in which machine learning techniques such as Q-learning are known to be difficult to apply. Games such as Tetris are known to be delayed reward systems because they generate sparse rewards during learning. Tetris demands quick judgment and fast responses from the player, because blocks must be stacked quickly in an optimal location while accounting for the random shape and rotation of each appearing block. In addition, because the variety of block types and orderings makes the number of cases very large, a human player who relies on memorization alone is limited in performance. We therefore applied the reinforcement learning implemented in this study to this delayed reward system. We find that the general legacy reinforcement learning method is limited in how far it can improve performance. Hence, we apply a heuristic as a reward weighting method to increase decision accuracy. As a result, we were able to obtain high scores in games, although it is not yet possible to say that this heuristic (rule-based) approach has completely conquered the game. In several experiments, this hybrid reinforcement learning shows better playability than humans in terms of learning speed as well as high scores. This paper shows that general Q-learning is not suitable for delayed reward systems, and it introduces a hybrid learning approach that adds prioritized experience replay, together with the related techniques and algorithms, to increase reinforcement learning performance.
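As an illustration only, the idea of using a heuristic to weight a sparse reward could be sketched as follows. This is not the paper's implementation: the board features (column heights, holes, bumpiness), the function names, and all weight values are assumptions chosen from commonly used Tetris heuristics, and the abstract does not say which features or weights the author used.

```python
from typing import List

Board = List[List[int]]  # rows listed top to bottom; 1 = filled cell, 0 = empty


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the floor to its topmost filled cell."""
    rows, cols = len(board), len(board[0])
    heights = []
    for c in range(cols):
        height = 0
        for r in range(rows):
            if board[r][c]:
                height = rows - r
                break
        heights.append(height)
    return heights


def count_holes(board: Board) -> int:
    """Number of empty cells that have at least one filled cell above them."""
    holes = 0
    for c in range(len(board[0])):
        seen_block = False
        for r in range(len(board)):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def shaped_reward(
    board: Board,
    lines_cleared: int,
    w_lines: float = 1.0,    # hypothetical weight, not from the paper
    w_height: float = 0.05,  # hypothetical weight, not from the paper
    w_holes: float = 0.35,   # hypothetical weight, not from the paper
    w_bump: float = 0.02,    # hypothetical weight, not from the paper
) -> float:
    """Dense reward: the sparse line-clear signal minus weighted heuristic penalties."""
    heights = column_heights(board)
    bumpiness = sum(abs(a - b) for a, b in zip(heights, heights[1:]))
    return (
        w_lines * lines_cleared
        - w_height * sum(heights)
        - w_holes * count_holes(board)
        - w_bump * bumpiness
    )
```

In a DQN-style training loop, a value like this would stand in for the raw game score at each step, so the agent receives feedback on every placement rather than only when a line is cleared.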
Pages: 50641 - 50650
Number of pages: 10
Related papers
50 in total
  • [31] Enhancing the Security of OLSR Protocol Using Reinforcement Learning
    Priyadarshani, Hasitha
    Jayasekara, Nipuna
    Chathuranga, Lahiru
    Kesavan, Krishnadeva
    Nawarathna, Chamira
    Sampath, Kalpa Kalhara
    Liyanapathirana, Cethana
    Rupasinghe, Lakmak
    2017 NATIONAL INFORMATION TECHNOLOGY CONFERENCE (NITC), 2017, : 49 - 54
  • [32] Enhancing Navigational Performance with Holistic Deep-Reinforcement-Learning
    Meusel, Marvin
    Kaestner, Linh
    Bhuiyan, Teham
    Lambrecht, Jens
    INTELLIGENT AUTONOMOUS SYSTEMS 18, VOL 1, IAS18-2023, 2024, 795 : 57 - 69
  • [33] Adaptive traffic signal control system using composite reward architecture based deep reinforcement learning
    Jamil, Abu Rafe Md
    Ganguly, Kishan Kumar
    Nower, Naushin
    IET INTELLIGENT TRANSPORT SYSTEMS, 2020, 14 (14) : 2030 - 2041
  • [34] An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm
    Yang, Steve Y.
    Yu, Yangyang
    Almandi, Saud
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 114 : 388 - 401
  • [35] Enhancing reinforcement learning-based ramp metering performance at freeway uncertain bottlenecks using curriculum learning
    Zheng, Si
    Li, Zhibin
    Li, Meng
    Ke, Zemian
    IET INTELLIGENT TRANSPORT SYSTEMS, 2024, 18 (10) : 1863 - 1878
  • [36] Robot Reinforcement Learning using EEG-based reward signals
    Iturrate, I.
    Montesano, L.
    Minguez, J.
    2010 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2010, : 4822 - 4829
  • [37] Human locomotion with reinforcement learning using bioinspired reward reshaping strategies
    Nowakowski, Katharine
    Carvalho, Philippe
    Six, Jean-Baptiste
    Maillet, Yann
    Nguyen, Anh Tu
    Seghiri, Ismail
    M'Pemba, Loick
    Marcille, Theo
    Ngo, Sy Toan
    Dao, Tien-Tuan
    MEDICAL & BIOLOGICAL ENGINEERING & COMPUTING, 2021, 59 (01) : 243 - 256
  • [38] Efficient Average Reward Reinforcement Learning Using Constant Shifting Values
    Yang, Shangdong
    Gao, Yang
    An, Bo
    Wang, Hao
    Chen, Xingguo
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2258 - 2264
  • [39] Human locomotion with reinforcement learning using bioinspired reward reshaping strategies
    Katharine Nowakowski
    Philippe Carvalho
    Jean-Baptiste Six
    Yann Maillet
    Anh Tu Nguyen
    Ismail Seghiri
    Loick M’Pemba
    Theo Marcille
    Sy Toan Ngo
    Tien-Tuan Dao
    Medical & Biological Engineering & Computing, 2021, 59 : 243 - 256
  • [40] Deep Reinforcement Learning by Parallelizing Reward and Punishment using the MaxPain Architecture
    Wang, Jiexin
    Elfwing, Stefan
    Uchibe, Eiji
    2018 JOINT IEEE 8TH INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL-EPIROB), 2018, : 175 - 180