Enhancing Reinforcement Learning Performance in Delayed Reward System Using DQN and Heuristics

Cited by: 2
Authors
Kim, Keecheon [1]
Affiliations
[1] Konkuk Univ, Dept Comp Informat & Commun Engn, Seoul 05029, South Korea
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Games; Reinforcement learning; Shape; Q-learning; Licenses; Decision making; Visualization; Machine learning; reinforcement learning; heuristics; delayed reward system; Tetris;
DOI
10.1109/ACCESS.2022.3174361
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes and implements a way to apply reinforcement learning to a delayed reward system, a setting in which machine learning techniques such as Q-learning are known to be difficult to apply. Tetris is regarded as a delayed reward system because it generates sparse rewards during learning. The game demands quick judgment and fast responses from the player, because blocks must be placed quickly in an optimal location while accounting for the random shape and rotation of each new block. In addition, because the variety of block types and orderings makes the number of possible cases very large, a human player who relies on memorization alone is limited in performance. We therefore applied reinforcement learning to this delayed reward system. We find that a conventional reinforcement learning method is limited in how far it can improve performance. Hence, we apply a heuristic as a reward-weighting method to increase decision accuracy. As a result, we were able to obtain high scores in games, although it is not yet possible to say that this heuristic (rule-based) approach has completely conquered the game. In several experiments, this hybrid reinforcement learning shows better playability than a human in terms of learning speed as well as high scores. The paper shows that general Q-learning is not suitable for a delayed reward system, and it introduces a hybrid learning approach that adds prioritized experience replay, together with related techniques and algorithms, to increase reinforcement learning performance.
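The abstract describes using a heuristic as a weighting on the reward but does not give the features or weights. As a rough illustration only, the sketch below shows one common way such shaping is done for Tetris: the sparse line-clear reward is combined with penalties on board features. The feature set (column heights, holes, bumpiness), the function names, and all weight values are assumptions chosen for illustration, not the paper's method.

```python
from typing import List

# Hypothetical board encoding: rows listed top to bottom, 1 = filled, 0 = empty.
Board = List[List[int]]


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the bottom of the board."""
    rows = len(board)
    heights = []
    for c in range(len(board[0])):
        height = 0
        for r in range(rows):
            if board[r][c]:
                height = rows - r  # topmost filled cell fixes the height
                break
        heights.append(height)
    return heights


def count_holes(board: Board) -> int:
    """Empty cells that have at least one filled cell above them."""
    holes = 0
    for c in range(len(board[0])):
        seen_block = False
        for r in range(len(board)):
            if board[r][c]:
                seen_block = True
            elif seen_block:
                holes += 1
    return holes


def bumpiness(heights: List[int]) -> int:
    """Sum of absolute height differences between adjacent columns."""
    return sum(abs(a - b) for a, b in zip(heights, heights[1:]))


def shaped_reward(
    lines_cleared: int,
    board: Board,
    w_lines: float = 1.0,   # assumed weight, not from the paper
    w_height: float = 0.05,  # assumed weight, not from the paper
    w_holes: float = 0.5,    # assumed weight, not from the paper
    w_bump: float = 0.1,     # assumed weight, not from the paper
) -> float:
    """Sparse line-clear reward plus dense heuristic penalties on the board."""
    heights = column_heights(board)
    return (
        w_lines * lines_cleared
        - w_height * sum(heights)
        - w_holes * count_holes(board)
        - w_bump * bumpiness(heights)
    )
```

A reward of this form gives the agent a signal after every placement instead of only when a line clears, which is one way to soften the sparse-reward problem the abstract mentions.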
Pages: 50641-50650
Page count: 10