Optimal Synchronization Control of Multiagent Systems With Input Saturation via Off-Policy Reinforcement Learning

Cited by: 102

Authors:

Qin, Jiahu [1]

Li, Man [1]

Shi, Yang [2]

Ma, Qichao [1]

Zheng, Wei Xing [3]

Affiliations:

[1] Univ Sci & Technol China, Dept Automat, Hefei 230027, Anhui, Peoples R China

[2] Univ Victoria, Dept Mech Engn, Victoria, BC V8W 2Y2, Canada

[3] Western Sydney Univ, Sch Comp Engn & Math, Sydney, NSW 2751, Australia

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2019, Vol. 30, No. 01

Funding:

Australian Research Council; National Natural Science Foundation of China;

Keywords:

Input saturation; multiagent systems; neural networks (NNs); off-policy reinforcement learning (RL); optimal synchronization control; LINEAR-SYSTEMS; NONLINEAR-SYSTEMS; NETWORKS; GAMES;

DOI:

10.1109/TNNLS.2018.2832025

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we investigate the optimal synchronization problem for a group of generic linear systems with input saturation. To seek the optimal controller, Hamilton-Jacobi-Bellman (HJB) equations involving nonquadratic input energy terms in coupled forms are established. The solutions to these coupled HJB equations are further proven to be optimal, and the induced controllers constitute an interactive Nash equilibrium. Because HJB equations are difficult to solve analytically, especially in coupled forms, and model information of the systems may be unavailable, we apply a data-based off-policy reinforcement learning algorithm to learn the optimal control policies. As a byproduct, this off-policy algorithm is shown to be insensitive to the probing noise that is applied to the system to maintain the persistence of excitation condition. To implement this off-policy algorithm, we employ actor and critic neural networks to approximate the controllers and the cost functions. Furthermore, the estimated control policies obtained by this implementation are proven to converge to the optimal ones under certain conditions. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.


Pages: 85-96

Page count: 12

50 items in total

[21] A MULTIAGENT REINFORCEMENT LEARNING FRAMEWORK FOR OFF-POLICY EVALUATION IN TWO-SIDED MARKETS
Shi, Chengchun
Wan, Runzhe
Song, Ge
Luo, Shikai
Zhu, Hongtu
Song, Rui
ANNALS OF APPLIED STATISTICS, 2023, 17 (04) : 2701 - 2722
[22] H∞ Tracking Control of Completely Unknown Continuous-Time Systems via Off-Policy Reinforcement Learning
Modares, Hamidreza
Lewis, Frank L.
Jiang, Zhong-Ping
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26 (10) : 2550 - 2562
[23] Off-policy learning for adaptive optimal output synchronization of heterogeneous multi-agent systems
Chen, Ci
Lewis, Frank L.
Xie, Kan
Xie, Shengli
Liu, Yilu
AUTOMATICA, 2020, 119
[24] H∞ control of linear discrete-time systems: Off-policy reinforcement learning
Kiumarsi, Bahare
Lewis, Frank L.
Jiang, Zhong-Ping
AUTOMATICA, 2017, 78 : 144 - 152
[25] Optimal tracking control for non-zero-sum games of linear discrete-time systems via off-policy reinforcement learning
Wen, Yinlei
Zhang, Huaguang
Su, Hanguang
Ren, He
OPTIMAL CONTROL APPLICATIONS & METHODS, 2020, 41 (04) : 1233 - 1250
[26] Off-Policy Reinforcement Learning for Optimal Preview Tracking Control of Linear Discrete-Time systems with unknown dynamics
Wang, Chao-Ran
Wu, Huai-Ning
2018 CHINESE AUTOMATION CONGRESS (CAC), 2018 : 1402 - 1407
[27] Safe and efficient off-policy reinforcement learning
Munos, Remi
Stepleton, Thomas
Harutyunyan, Anna
Bellemare, Marc G.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
[28] Off-Policy Reinforcement Learning with Gaussian Processes
Chowdhary, Girish
Liu, Miao
Grande, Robert
Walsh, Thomas
How, Jonathan
Carin, Lawrence
IEEE/CAA Journal of Automatica Sinica, 2014, 1 (03) : 227 - 238
[29] Off-Policy Reinforcement Learning with Delayed Rewards
Han, Beining
Ren, Zhizhou
Wu, Zuofan
Zhou, Yuan
Peng, Jian
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[30] Bounds for Off-policy Prediction in Reinforcement Learning
Joseph, Ajin George
Bhatnagar, Shalabh
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017 : 3991 - 3997
