A Prioritized objective actor-critic method for deep reinforcement learning

Times cited: 11
Authors
Nguyen, Ngoc Duy [1 ]
Nguyen, Thanh Thi [2 ]
Vamplew, Peter [3 ]
Dazeley, Richard [4 ]
Nahavandi, Saeid [1 ]
Affiliations
[1] Deakin Univ, Inst Intelligent Syst Res & Innovat, Waurn Ponds Campus, Geelong, Vic, Australia
[2] Deakin Univ, Sch Informat Technol, Burwood Campus, Melbourne, Vic, Australia
[3] Federat Univ Australia, Federat Learning Agents Grp, Sch Sci Engn & Informat Technol, Ballarat, Vic, Australia
[4] Deakin Univ, Sch Informat Technol, Waurn Ponds Campus, Geelong, Vic, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2021, Vol. 33, Issue 16
Keywords
Deep learning; Reinforcement learning; Learning systems; Multi-objective optimization; Actor-critic architecture;
DOI
10.1007/s00521-021-05795-0
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practice. These problems often involve multiple conflicting reward signals, which inherently hinder the agent's exploration toward a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behavior to efficiently achieve a designated goal by adopting a weighted-sum scalarization of the different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weighting of the objectives, whereas SCMP generates a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each corresponding to one priority weight) to dynamically prioritize objectives in real time. We implement our methods on top of the Asynchronous Advantage Actor-Critic (A3C) algorithm, exploiting its multithreading mechanism to dynamically balance the training intensity of the different policies within a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean total reward in two complex problems: Food Collector and Seaquest.
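The weighted-sum scalarization across multiple critics described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' implementation: a shared trunk feeds a single policy head and one value head per objective, and per-objective advantages are combined with assumed priority weights before the policy-gradient update. The class name MultiCriticActorCritic, the function mcsp_loss, the layer sizes, the toy data, and the weight vector are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): multi-critic, single-policy actor-critic
# with weighted-sum scalarization of per-objective advantages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCriticActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)     # single policy
        self.value_heads = nn.Linear(hidden, n_objectives)  # one critic per objective

    def forward(self, obs):
        h = self.trunk(obs)
        return F.log_softmax(self.policy_head(h), dim=-1), self.value_heads(h)

def mcsp_loss(model, obs, actions, returns, weights, value_coef=0.5):
    """obs: [B, obs_dim]; actions: [B]; returns: [B, n_objectives];
    weights: [n_objectives] assumed priority weights summing to 1."""
    log_probs, values = model(obs)
    advantages = returns - values.detach()       # per-objective advantages
    scalarized_adv = advantages @ weights        # weighted-sum scalarization
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * scalarized_adv).mean()
    value_loss = F.mse_loss(values, returns)     # train all critics jointly
    return policy_loss + value_coef * value_loss

# Illustrative usage with random data (hypothetical dimensions and weights).
model = MultiCriticActorCritic(obs_dim=8, n_actions=4, n_objectives=2)
obs = torch.randn(16, 8)
actions = torch.randint(0, 4, (16,))
returns = torch.randn(16, 2)
weights = torch.tensor([0.7, 0.3])
loss = mcsp_loss(model, obs, actions, returns, weights)
loss.backward()
```

In this sketch the priority weights are fixed, which loosely mirrors the MCSP setting; the SCMP variant described in the abstract would instead maintain several policies and mix them according to a set of weights at run time.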
Pages: 10335-10349
Number of pages: 15
Related papers
50 records in total
  • [1] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [2] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 505 - 518
  • [3] Visual Navigation with Actor-Critic Deep Reinforcement Learning
    Shao, Kun
    Zhao, Dongbin
    Zhu, Yuanheng
    Zhang, Qichao
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [4] Deep Actor-Critic Reinforcement Learning for Anomaly Detection
    Zhong, Chen
    Gursoy, M. Cenk
    Velipasalar, Senem
    2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019,
  • [5] Averaged Soft Actor-Critic for Deep Reinforcement Learning
    Ding, Feng
    Ma, Guanfeng
    Chen, Zhikui
    Gao, Jing
    Li, Peng
    COMPLEXITY, 2021, 2021
  • [6] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [7] ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR DYNAMIC MULTICHANNEL ACCESS
    Zhong, Chen
    Lu, Ziyang
    Gursoy, M. Cenk
    Velipasalar, Senem
    2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 599 - 603
  • [8] Lexicographic Actor-Critic Deep Reinforcement Learning for Urban Autonomous Driving
    Zhang, Hengrui
    Lin, Youfang
    Han, Sheng
    Lv, Kai
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (04) : 4308 - 4319
  • [9] Deep Reinforcement Learning in VizDoom via DQN and Actor-Critic Agents
    Bakhanova, Maria
    Makarov, Ilya
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2021, PT I, 2021, 12861 : 138 - 150
  • [10] A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
    Zhong, Chen
    Lu, Ziyang
    Gursoy, M. Cenk
    Velipasalar, Senem
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2019, 5 (04) : 1125 - 1139