Iterative ADP learning algorithms for discrete-time multi-player games

Cited by: 53
Authors
Jiang, He [1 ]
Zhang, Huaguang [1 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive dynamic programming; Approximate dynamic programming; Reinforcement learning; Neural network; ZERO-SUM GAMES; UNCERTAIN NONLINEAR-SYSTEMS; H-INFINITY CONTROL; CONSTRAINED-INPUT; POLICY ITERATION; EQUATION; DESIGNS;
DOI
10.1007/s10462-017-9603-1
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller. Each controller can be viewed as a player, and the trade-off between cooperation and conflict among these players can be modeled as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve Hamilton-Jacobi-Isaacs equations for zero-sum games and a set of coupled Hamilton-Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, two ADP methods are proposed in this paper: a modified gradient-descent-based online algorithm and a novel iterative offline learning approach. Furthermore, to implement the proposed methods, we employ a single-network structure, which substantially reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
Pages: 75-91
Page count: 17
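
To give a rough feel for the iterative (offline) idea described in the abstract, the sketch below runs value iteration on a discrete-time linear-quadratic zero-sum game, where the value function stays quadratic and the Isaacs equation collapses to a Riccati-like matrix recursion. This is only an illustrative special case under assumed dynamics, not the paper's neural-network-based ADP algorithms or its single-critic implementation; the function name lq_zero_sum_value_iteration and the system matrices A, B, D and weights Q, R, gamma are placeholders chosen for the example.

```python
# Sketch: value iteration for a discrete-time LQ zero-sum game
#   x_{k+1} = A x_k + B u_k + D w_k,
#   cost    = sum_k ( x_k'Q x_k + u_k'R u_k - gamma^2 w_k'w_k ).
# With V_i(x) = x'P_i x, the min-max Bellman (Isaacs) recursion reduces to
#   P_{i+1} = Q + A'P_i A - N' M^{-1} N,
# where M = [B D]'P_i[B D] + blkdiag(R, -gamma^2 I) and N = [B D]'P_i A.
# Illustrative reduction only; the paper treats general nonlinear systems
# with neural-network value approximation.
import numpy as np

def lq_zero_sum_value_iteration(A, B, D, Q, R, gamma, n_iter=200, tol=1e-10):
    n, mu, mw = A.shape[0], B.shape[1], D.shape[1]
    BD = np.hstack([B, D])                       # joint input matrix [B D]
    S = np.block([[R, np.zeros((mu, mw))],
                  [np.zeros((mw, mu)), -gamma**2 * np.eye(mw)]])
    P = np.zeros((n, n))                         # V_0(x) = 0
    for _ in range(n_iter):
        M = BD.T @ P @ BD + S                    # saddle-point Hessian
        N = BD.T @ P @ A
        P_next = Q + A.T @ P @ A - N.T @ np.linalg.solve(M, N)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    # Saddle-point feedback gains from the converged P: [u; w] = -K x.
    M = BD.T @ P @ BD + S
    N = BD.T @ P @ A
    K = np.linalg.solve(M, N)
    return P, K[:mu, :], K[mu:, :]

if __name__ == "__main__":
    # Placeholder 2-state example (not taken from the paper).
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])                 # minimizing player (control)
    D = np.array([[1.0], [0.0]])                 # maximizing player (disturbance)
    Q, R, gamma = np.eye(2), np.eye(1), 5.0
    P, Ku, Kw = lq_zero_sum_value_iteration(A, B, D, Q, R, gamma)
    print("Game value matrix P:\n", P)
    print("u = -Ku x, Ku =", Ku)
    print("w = -Kw x, Kw =", Kw)
```

In the nonlinear setting addressed by the paper, the quadratic form x'Px is replaced by a critic approximation of the value function, and the same fixed-point iteration is carried out on the network weights; using a single critic (rather than separate actor, critic, and disturbance networks) is what reduces the computational burden noted in the abstract.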