An Opponent-Aware Reinforcement Learning Method for Team-to-Team Multi-Vehicle Pursuit via Maximizing Mutual Information Indicator

Cited by: 1

Authors:

Wang, Qinwen [1]

Li, Xinhang [1]

Yuan, Zheng [1]

Yang, Yiying [1]

Xu, Chen [1]

Zhang, Lin [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China

Source:

2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN | 2022

Funding:

National Natural Science Foundation of China;

Keywords:

intelligent transportation; team-to-team multi-vehicle pursuit; multi-agent reinforcement pursuit;

DOI:

10.1109/MSN57253.2022.00089

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The pursuit-evasion game in smart cities has a profound impact on the Multi-vehicle Pursuit (MVP) problem, in which police cars cooperatively pursue suspected vehicles. Existing studies on the MVP problem tend to set evading vehicles to move randomly or along a fixed, prescribed route. Opponent modeling has shown considerable promise in tackling the non-stationarity caused by adversarial agents. However, most opponent modeling methods focus on two-player competitive games and simple scenarios without environmental interference. This paper considers a Team-to-Team Multi-vehicle Pursuit (T2TMVP) problem in a complicated urban traffic scene where the evading vehicles adopt pre-trained dynamic strategies to make decisions intelligently. To solve this problem, we propose an opponent-aware reinforcement learning via maximizing mutual information indicator (OARLM²I²) method to improve pursuit efficiency in the complicated environment. First, a sequential encoding-based opponents joint strategy modeling (SEOJSM) mechanism is proposed to generate the evading vehicles' joint strategy model, which assists the multi-agent decision-making process based on deep Q-network (DQN). Then, we design a mutual information-united loss, which simultaneously considers the reward fed back from the environment and the effectiveness of the opponents' joint strategy model, to update the pursuing vehicles' decision-making process. Extensive experiments based on SUMO demonstrate that our method outperforms other baselines by 21.48% on average in reducing pursuit time. The code is available at https://github.com/ANT-ITS/OARLM(2)I(2).

Citation

Pages: 526-533

Page count: 8

2 items in total

[1] T3OMVP: A Transformer-Based Time and Team Reinforcement Learning Scheme for Observation-Constrained Multi-Vehicle Pursuit in Urban Area
Yuan, Zheng
Wu, Tianhao
Wang, Qinwen
Yang, Yiying
Li, Lei
Zhang, Lin
[J]. ELECTRONICS, 2022, 11 (09)
[2] Graded-Q Reinforcement Learning with Information-Enhanced State Encoder for Hierarchical Collaborative Multi-Vehicle Pursuit
Yang, Yiying
Li, Xinhang
Yuan, Zheng
Wang, Qinwen
Xu, Chen
Zhang, Lin
[J]. 2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022, : 534 - 541
