An Opponent-Aware Reinforcement Learning Method for Team-to-Team Multi-Vehicle Pursuit via Maximizing Mutual Information Indicator

Cited by: 1
Authors
Wang, Qinwen [1]
Li, Xinhang [1]
Yuan, Zheng [1]
Yang, Yiying [1]
Xu, Chen [1]
Zhang, Lin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
intelligent transportation; team-to-team multi-vehicle pursuit; multi-agent reinforcement pursuit;
DOI
10.1109/MSN57253.2022.00089
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
The pursuit-evasion game in smart cities has a profound impact on the multi-vehicle pursuit (MVP) problem, in which police cars cooperatively pursue suspect vehicles. Existing studies of the MVP problem tend to have the evading vehicles move randomly or along a fixed, prescribed route. Opponent modeling has shown considerable promise in tackling the non-stationarity caused by adversarial agents. However, most opponent modeling methods focus on two-player competitive games and simple scenarios without environmental interference. This paper considers a Team-to-Team Multi-vehicle Pursuit (T2TMVP) problem in a complicated urban traffic scene, where the evading vehicles adopt pre-trained dynamic strategies to make decisions intelligently. To solve this problem, we propose an opponent-aware reinforcement learning via maximizing mutual information indicator (OARLM²I²) method to improve pursuit efficiency in the complicated environment. First, a sequential encoding-based opponents joint strategy modeling (SEOJSM) mechanism is proposed to generate a joint strategy model of the evading vehicles, which assists the multi-agent decision-making process based on a deep Q-network (DQN). Then, we design a mutual information-united loss, which simultaneously considers the reward fed back from the environment and the effectiveness of the opponents' joint strategy model, to update the pursuing vehicles' decision-making process. Extensive experiments based on SUMO demonstrate that our method outperforms other baselines by 21.48% on average in reducing pursuit time. The code is available at https://github.com/ANT-ITS/OARLM(2)I(2).
Pages: 526-533
Number of pages: 8
Related papers
2 in total
  • [1] T3OMVP: A Transformer-Based Time and Team Reinforcement Learning Scheme for Observation-Constrained Multi-Vehicle Pursuit in Urban Area
    Yuan, Zheng
    Wu, Tianhao
    Wang, Qinwen
    Yang, Yiying
    Li, Lei
    Zhang, Lin
    [J]. ELECTRONICS, 2022, 11 (09)
  • [2] Graded-Q Reinforcement Learning with Information-Enhanced State Encoder for Hierarchical Collaborative Multi-Vehicle Pursuit
    Yang, Yiying
    Li, Xinhang
    Yuan, Zheng
    Wang, Qinwen
    Xu, Chen
    Zhang, Lin
    [J]. 2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022: 534-541