Behavior fusion for deep reinforcement learning

Cited by: 6
Authors
Shi, Haobin [1 ]
Xu, Meng [1 ]
Hwang, Kao-Shing [2 ,3 ]
Cai, Bo-Yin [2 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Funding
National Natural Science Foundation of China
Keywords
Deep reinforcement learning; Actor-critic; Policy gradient; Behavior fusion; Complex task; Decision-making; Environment; Navigation; Gradient; Network
DOI
10.1016/j.isatra.2019.08.054
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Designing a reward function for a complex task is difficult in deep reinforcement learning (DRL) systems, so this paper proposes a behavior fusion framework for the actor-critic architecture, which learns the policy from an advantage function that consists of two value functions. First, the proposed method decomposes a complex task into several sub-tasks and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a new policy. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function. These pre-trained sub-task policies serve as building blocks for rapidly assembling a prototype of a complicated task. Second, the proposed method integrates the modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variation. Third, two alternative methods for acquiring integrated returns for the complicated task are proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the proposed methods against a method that uses a gate network. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.
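The abstract does not give the fusion formulas, so the following is only a minimal sketch of the general idea of combining accumulated returns from separately rewarded sub-tasks into one advantage signal for a policy gradient. The weighted-sum fusion rule, the function names `discounted_returns` and `fused_advantage`, and the `weights` and `baseline` parameters are all assumptions made for illustration, not the paper's method.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Accumulated discounted return G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fused_advantage(subtask_rewards, weights, baseline, gamma=0.99):
    """Fuse per-sub-task returns and subtract a baseline.

    ASSUMPTION: fusion is a fixed weighted sum of sub-task returns; the
    paper proposes its own (different, unspecified here) integration rules.
    `baseline` stands in for a learned state-value estimate.
    """
    per_task = np.stack([discounted_returns(r, gamma) for r in subtask_rewards])
    fused = np.asarray(weights, dtype=float) @ per_task
    return fused - np.asarray(baseline, dtype=float)


# Two hypothetical sub-tasks observed over the same 3-step episode.
adv = fused_advantage(
    subtask_rewards=[[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]],
    weights=[0.5, 0.5],
    baseline=[0.0, 0.0, 0.0],
    gamma=0.5,
)
print(adv)
```

In an actor-critic update, an advantage of this kind would scale the log-probability gradient of each action taken; a gate network, the baseline the abstract compares against, would instead learn the mixing weights.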
Pages: 434-444
Page count: 11