Deconfounded Opponent Intention Inference for Football Multi-Player Policy Learning

Cited by: 1
Authors:
Wang, Shijie [1,2]
Pan, Yi [2]
Pu, Zhiqiang [1,2]
Liu, Boyin [1,2]
Yi, Jianqiang [1,2]
Affiliations:
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
DOI: 10.1109/IROS55552.2023.10341469
CLC Number: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract:
Due to the high complexity of a football match, opponents' strategies are variable and unknown. Accurately predicting opponents' future intentions from the current situation is therefore crucial for football players' decision-making. To better anticipate opponents and learn more effective strategies, this paper proposes a deconfounded opponent intention inference (DOII) method for football multi-player policy learning. Specifically, opponents' intentions are inferred by an opponent intention supervising module. Furthermore, since certain confounders distort the causal relationship between the players and the opponents, a deconfounded trajectory graph module is designed to mitigate their influence and improve the accuracy of the inferred intentions. In addition, an opponent-based incentive module is designed to increase the players' sensitivity to opponents' intentions and thereby train more reasonable player strategies. Representative results in the Google Research Football environment indicate that DOII effectively improves the performance of the players' strategies, validating the superiority of the proposed method.
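The abstract names three components (intention supervision, deconfounded trajectory graph, opponent-based incentive) but gives no implementation details. Below is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together. All class names, tensor shapes, the GRU encoder, and the uniform backdoor-style averaging over learned confounder embeddings are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of the three DOII modules described in the abstract.
# Every design choice here (shapes, encoders, the adjustment scheme) is an
# assumption for illustration; the paper's real architecture may differ.
import torch
import torch.nn as nn


class OpponentIntentionSupervisor(nn.Module):
    """Predicts an opponent's intention class from its observed trajectory.
    Assumed to be trained with supervised targets derived from the opponent's
    future behavior (e.g., which region it moves toward next)."""

    def __init__(self, obs_dim: int, hidden: int, n_intentions: int):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intentions)

    def forward(self, opp_traj: torch.Tensor) -> torch.Tensor:
        # opp_traj: (batch, time, obs_dim)
        _, h = self.encoder(opp_traj)          # h: (1, batch, hidden)
        return self.head(h[-1])                # (batch, n_intentions) logits


class DeconfoundedTrajectoryGraph(nn.Module):
    """Illustrative backdoor-style adjustment: instead of conditioning on the
    observed context, average the intention predictor over a small set of
    learned confounder embeddings (assumed uniform prior over strata)."""

    def __init__(self, supervisor: OpponentIntentionSupervisor,
                 confounder_embs: torch.Tensor):
        super().__init__()
        self.supervisor = supervisor
        # (K, obs_dim): one embedding per assumed confounder stratum
        self.confounder_embs = nn.Parameter(confounder_embs)

    def forward(self, opp_traj: torch.Tensor) -> torch.Tensor:
        # P(intent | do(traj)) ~ mean_c P(intent | traj, c), uniform P(c)
        logits = [self.supervisor(opp_traj + c) for c in self.confounder_embs]
        return torch.stack(logits).mean(dim=0)


def opponent_based_incentive(pred_logits: torch.Tensor,
                             realized_intention: torch.Tensor,
                             scale: float = 0.1) -> torch.Tensor:
    """Hypothetical intrinsic-reward shaping term: larger when the model
    assigned high probability to the opponent's realized intention."""
    logp = torch.log_softmax(pred_logits, dim=-1)
    return scale * logp.gather(-1, realized_intention.unsqueeze(-1)).squeeze(-1)


if __name__ == "__main__":
    sup = OpponentIntentionSupervisor(obs_dim=8, hidden=32, n_intentions=4)
    graph = DeconfoundedTrajectoryGraph(sup, torch.zeros(3, 8))
    traj = torch.randn(2, 10, 8)               # 2 trajectories, 10 timesteps
    logits = graph(traj)                        # (2, 4) deconfounded logits
    reward = opponent_based_incentive(logits, torch.tensor([1, 3]))
    print(logits.shape, reward.shape)           # [2, 4] and [2]
```

In this sketch the incentive term would simply be added to the environment reward during policy optimization; how the paper actually combines the two is not specified in the abstract.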
Pages: 8054-8061
Page count: 8