Behavior fusion for deep reinforcement learning

Cited by: 6
Authors
Shi, Haobin [1 ]
Xu, Meng [1 ]
Hwang, Kao-Shing [2 ,3 ]
Cai, Bo-Yin [2 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Actor-critic; Policy gradient; Behavior fusion; Complex task; DECISION-MAKING; ENVIRONMENT; NAVIGATION; GRADIENT; NETWORK;
DOI
10.1016/j.isatra.2019.08.054
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
For a deep reinforcement learning (DRL) system, it is difficult to design a reward function for a complex task, so this paper proposes a behavior-fusion framework for the actor-critic architecture that learns a policy based on an advantage function composed of two value functions. First, the proposed method decomposes a complex task into several sub-tasks and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a new policy. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function, and these pre-trained sub-tasks serve as building blocks from which a prototype of a complicated task can be rapidly assembled. Second, the proposed method integrates the modules in the calculation of the policy gradient by accumulating returns, which reduces variance. Third, two alternative methods for acquiring integrated returns for the complicated task are also proposed. The Atari 2600 Pong game and a wafer-probing task are used to validate the performance of the proposed methods by comparison with a method using a gate network. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.
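The fusion of sub-task value functions into one advantage signal, as described in the abstract, can be sketched as follows. This is an illustrative reconstruction only: the function names, the fixed linear weighting of the two critics, and the specific discounting scheme are assumptions, not the paper's exact formulation.

```python
def discounted_returns(rewards, gamma=0.99):
    """Accumulate discounted returns G_t backwards over one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    out.reverse()
    return out

def fused_advantages(rewards, v_sub1, v_sub2, w1=0.5, w2=0.5, gamma=0.99):
    """Advantage A_t = G_t - (w1*V1(s_t) + w2*V2(s_t)): the accumulated
    return minus a baseline fused from two pre-trained sub-task critics,
    usable in a policy-gradient update for the merged policy."""
    returns = discounted_returns(rewards, gamma)
    baseline = [w1 * a + w2 * b for a, b in zip(v_sub1, v_sub2)]
    return [g - v for g, v in zip(returns, baseline)]

# Example: a 3-step episode with constant sub-task value estimates.
advs = fused_advantages([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], [0.3, 0.3, 0.3])
```

Using the accumulated return against a fused baseline, rather than a separate reward function for the composite task, is the variance-reduction idea the abstract attributes to the integrated policy-gradient calculation.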
Pages: 434-444
Number of pages: 11
Related Papers
50 records in total
  • [41] A deep reinforcement learning hyper-heuristic with feature fusion for online packing problems
    Tu, Chaofan
    Bai, Ruibin
    Aickelin, Uwe
    Zhang, Yuchang
    Du, Heshan
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 230
  • [42] A Heterogeneous Information Fusion Deep Reinforcement Learning for Intelligent Frequency Selection of HF Communication
    Xin Liu
    Yuhua Xu
    Yunpeng Cheng
    Yangyang Li
    Lei Zhao
    Xiaobo Zhang
    China Communications, 2018, 15 (09) : 73 - 84
  • [43] Explainability in deep reinforcement learning
    Heuillet, Alexandre
    Couthouis, Fabien
    Diaz-Rodriguez, Natalia
    KNOWLEDGE-BASED SYSTEMS, 2021, 214 (214)
  • [44] Deep Reinforcement Learning with Adjustments
    Khorasgani, Hamed
    Wang, Haiyan
    Gupta, Chetan
    Serita, Susumu
    2021 IEEE 19TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2021,
  • [45] Deep Reinforcement Learning: An Overview
    Mousavi, Seyed Sajad
    Schukat, Michael
    Howley, Enda
    PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 2, 2018, 16 : 426 - 440
  • [46] Intelligent Emergency Generator Rejection Schemes Based on Knowledge Fusion and Deep Reinforcement Learning
    Li Z.
    Zeng L.
    Yao W.
    Hu Z.
    Shuai H.
    Tang Y.
    Wen J.
    Zhongguo Dianji Gongcheng Xuebao/Proceedings of the Chinese Society of Electrical Engineering, 2024, 44 (05) : 1675 - 1687
  • [47] Multi-feature Fusion for Deep Reinforcement Learning: Sequential Control of Mobile Robots
    Wang, Haotian
    Yang, Wenjing
    Huang, Wanrong
    Lin, Zhipeng
    Tang, Yuhua
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT VII, 2018, 11307 : 303 - 315
  • [48] Implementation of Deep Reinforcement Learning
    Li, Meng-Jhe
    Li, An-Hong
    Huang, Yu-Jung
    Chu, Shao-I
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND SYSTEMS (ICISS 2019), 2019, : 232 - 236
  • [49] A Survey on Deep Reinforcement Learning
    Liu Q.
    Zhai J.-W.
    Zhang Z.-Z.
    Zhong S.
    Zhou Q.
    Zhang P.
    Xu J.
    2018, Science Press (41) : 1 - 27
  • [50] Deep Reinforcement Learning and Games
    Zhao, Dongbin
    Lucas, Simon
    Togelius, Julian
    IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, 2019, 14 (03) : 7 - 7