A Prioritized objective actor-critic method for deep reinforcement learning

Times cited: 11
Authors
Nguyen, Ngoc Duy [1 ]
Nguyen, Thanh Thi [2 ]
Vamplew, Peter [3 ]
Dazeley, Richard [4 ]
Nahavandi, Saeid [1 ]
Affiliations
[1] Deakin Univ, Inst Intelligent Syst Res & Innovat, Waurn Ponds Campus, Geelong, Vic, Australia
[2] Deakin Univ, Sch Informat Technol, Burwood Campus, Melbourne, Vic, Australia
[3] Federat Univ Australia, Federat Learning Agents Grp, Sch Sci Engn & Informat Technol, Ballarat, Vic, Australia
[4] Deakin Univ, Sch Informat Technol, Waurn Ponds Campus, Geelong, Vic, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2021, Vol. 33, Issue 16
Keywords
Deep learning; Reinforcement learning; Learning systems; Multi-objective optimization; Actor-critic architecture;
DOI
10.1007/s00521-021-05795-0
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405
Abstract
An increasing number of complex problems have naturally posed significant challenges in decision-making theory and reinforcement learning practice. These problems often involve multiple conflicting reward signals, which inherently hinder the agent's exploration toward a specific goal. In extreme cases, the agent gets stuck in a sub-optimal solution and starts behaving harmfully. To overcome such obstacles, we introduce two actor-critic deep reinforcement learning methods, namely Multi-Critic Single Policy (MCSP) and Single Critic Multi-Policy (SCMP), which can adjust agent behavior to efficiently achieve a designated goal by adopting a weighted-sum scalarization of the different objective functions. In particular, MCSP creates a human-centric policy that corresponds to a predefined priority weighting of the objectives, whereas SCMP generates a mixed policy based on a set of priority weights, i.e., the generated policy uses the knowledge of different policies (each corresponding to one priority weight) to dynamically prioritize objectives in real time. We implement our methods on top of the Asynchronous Advantage Actor-Critic (A3C) algorithm, exploiting its multithreading mechanism to dynamically balance the training intensity of the different policies within a single network. Finally, simulation results show that MCSP and SCMP significantly outperform A3C with respect to the mean total reward in two complex problems: Food Collector and Seaquest.
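The weighted-sum scalarization across multiple critics described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' implementation: a shared trunk feeds a single policy head and one value head per objective, and per-objective advantages are combined with assumed priority weights before the policy-gradient update. The class name MultiCriticActorCritic, the function mcsp_loss, the layer sizes, the toy data, and the weight vector are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): multi-critic, single-policy actor-critic
# with weighted-sum scalarization of per-objective advantages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCriticActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)     # single policy
        self.value_heads = nn.Linear(hidden, n_objectives)  # one critic per objective

    def forward(self, obs):
        h = self.trunk(obs)
        return F.log_softmax(self.policy_head(h), dim=-1), self.value_heads(h)

def mcsp_loss(model, obs, actions, returns, weights, value_coef=0.5):
    """obs: [B, obs_dim]; actions: [B]; returns: [B, n_objectives];
    weights: [n_objectives] assumed priority weights summing to 1."""
    log_probs, values = model(obs)
    advantages = returns - values.detach()       # per-objective advantages
    scalarized_adv = advantages @ weights        # weighted-sum scalarization
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    policy_loss = -(chosen * scalarized_adv).mean()
    value_loss = F.mse_loss(values, returns)     # train all critics jointly
    return policy_loss + value_coef * value_loss

# Illustrative usage with random data (hypothetical dimensions and weights).
model = MultiCriticActorCritic(obs_dim=8, n_actions=4, n_objectives=2)
obs = torch.randn(16, 8)
actions = torch.randint(0, 4, (16,))
returns = torch.randn(16, 2)
weights = torch.tensor([0.7, 0.3])
loss = mcsp_loss(model, obs, actions, returns, weights)
loss.backward()
```

In this sketch the priority weights are fixed, which loosely mirrors the MCSP setting; the SCMP variant described in the abstract would instead maintain several policies and mix them according to a set of weights at run time.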
Pages: 10335-10349
Number of pages: 15
Related papers
50 records in total
  • [1] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [2] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 505 - 518
  • [3] Visual Navigation with Actor-Critic Deep Reinforcement Learning
    Shao, Kun
    Zhao, Dongbin
    Zhu, Yuanheng
    Zhang, Qichao
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [4] Deep Actor-Critic Reinforcement Learning for Anomaly Detection
    Zhong, Chen
    Gursoy, M. Cenk
    Velipasalar, Senem
    2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019,
  • [5] Averaged Soft Actor-Critic for Deep Reinforcement Learning
    Ding, Feng
    Ma, Guanfeng
    Chen, Zhikui
    Gao, Jing
    Li, Peng
    COMPLEXITY, 2021, 2021
  • [6] Swarm Reinforcement Learning Method Based on an Actor-Critic Method
    Iima, Hitoshi
    Kuroe, Yasuaki
    SIMULATED EVOLUTION AND LEARNING, 2010, 6457 : 279 - 288
  • [7] ACTOR-CRITIC DEEP REINFORCEMENT LEARNING FOR DYNAMIC MULTICHANNEL ACCESS
    Zhong, Chen
    Lu, Ziyang
    Gursoy, M. Cenk
    Velipasalar, Senem
    2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 599 - 603
  • [8] Lexicographic Actor-Critic Deep Reinforcement Learning for Urban Autonomous Driving
    Zhang, Hengrui
    Lin, Youfang
    Han, Sheng
    Lv, Kai
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (04) : 4308 - 4319
  • [9] Deep Reinforcement Learning in VizDoom via DQN and Actor-Critic Agents
    Bakhanova, Maria
    Makarov, Ilya
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2021, PT I, 2021, 12861 : 138 - 150
  • [10] A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access
    Zhong, Chen
    Lu, Ziyang
    Gursoy, M. Cenk
    Velipasalar, Senem
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2019, 5 (04) : 1125 - 1139