Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games

Cited by: 0
Authors
Hou, Yueqi [1 ,2 ]
Liang, Xiaolong [1 ,2 ]
Zhang, Jiaqiang [1 ,2 ]
Yang, Qisong [3 ]
Yang, Aiwu [1 ,2 ]
Wang, Ning [1 ,2 ]
Affiliations
[1] Air Force Engn Univ, Air Traff Control & Nav Sch, Xian 710051, Peoples R China
[2] Air Force Engn Univ, Shaanxi Key Lab Meta Synth Elect & Informat Syst, Xian 710051, Peoples R China
[3] Xian Res Inst High Technol, Xian 710051, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 14
Funding
National Natural Science Foundation of China;
Keywords
invalid action masking; reinforcement learning; policy gradient; proximal policy optimization; real-time strategy game;
DOI
10.3390/app13148283
CLC Number
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
Invalid action masking is a practical technique in deep reinforcement learning that prevents agents from taking invalid actions. Existing approaches rely on action masking during both policy training and policy execution. This study focuses on developing reinforcement learning algorithms that incorporate action masking during training but can be deployed without action masking during policy execution. The study begins with a theoretical analysis that elucidates the distinction between the naive policy gradient and the invalid action policy gradient. Based on this analysis, we demonstrate that the naive policy gradient is a valid gradient and is equivalent to the proposed composite objective algorithm, which optimizes the masked policy and the original policy in parallel. Moreover, we propose an off-policy algorithm for invalid action masking that employs the masked policy for sampling while optimizing the original policy. To compare the effectiveness of these algorithms, experiments are conducted in Gym-μRTS, a simplified real-time strategy (RTS) game simulator. Based on the empirical findings, we recommend the off-policy algorithm for most tasks and the composite objective algorithm for more complex tasks.
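The paper's composite objective and off-policy variants are not detailed in this record, but a minimal sketch of the underlying masking mechanism, assuming a discrete action space and a logit-level mask (the function name masked_softmax and the -1e8 constant are illustrative choices, not taken from the paper), might look like the following:

```python
import numpy as np

def masked_softmax(logits, action_mask):
    """Apply an invalid action mask to raw policy logits.

    logits:      unnormalized action scores, shape (n_actions,)
    action_mask: boolean array, True where the action is valid

    Invalid actions are assigned a logit of -1e8, so their probability
    after the softmax is effectively zero.
    """
    masked_logits = np.where(action_mask, logits, -1e8)
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

if __name__ == "__main__":
    logits = np.array([1.2, -0.3, 0.8, 2.1])
    mask = np.array([True, False, True, False])  # actions 1 and 3 are invalid
    probs = masked_softmax(logits, mask)
    print(probs)        # probability mass only on the valid actions
    print(probs.sum())  # ~1.0
```

In the terms used in the abstract, the masked distribution produced above would correspond to the "masked policy", while an unmasked softmax over the same raw logits would correspond to the "original policy" that the proposed algorithms aim to optimize.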
Pages: 16