Behavior fusion for deep reinforcement learning

Cited by: 6
Authors
Shi, Haobin [1 ]
Xu, Meng [1 ]
Hwang, Kao-Shing [2 ,3 ]
Cai, Bo-Yin [2 ,3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[3] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Funding
National Natural Science Foundation of China;
Keywords
Deep reinforcement learning; Actor-critic; Policy gradient; Behavior fusion; Complex task; DECISION-MAKING; ENVIRONMENT; NAVIGATION; GRADIENT; NETWORK;
DOI
10.1016/j.isatra.2019.08.054
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Designing a reward function for a complex task is difficult in a deep reinforcement learning (DRL) system. This paper therefore proposes a behavior fusion framework for the actor-critic architecture, which learns the policy from an advantage function that consists of two value functions. First, the proposed method decomposes a complex task into several sub-tasks and merges the policies trained for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a new policy. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function. These pre-trained sub-tasks serve as building blocks for rapidly assembling a prototype of a complicated task. Second, the proposed method integrates the modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variance. Third, two alternative methods for acquiring integrated returns for the complicated task are proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the proposed methods by comparison with a method that uses a gate network. (C) 2019 ISA. Published by Elsevier Ltd. All rights reserved.
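The abstract does not spell out how sub-task returns are integrated, so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes accumulated discounted returns per sub-task, combines them with a weighted sum (a hypothetical fusion rule chosen for illustration), and subtracts a per-step baseline value to obtain an advantage estimate. The function names, the `weights` parameter, and the `baselines` input are all assumptions.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Accumulated discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fused_advantages(
    subtask_rewards: Sequence[Sequence[float]],
    weights: Sequence[float],
    baselines: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """Advantage estimate from a weighted sum of per-sub-task returns.

    Assumption: the weighted sum is a hypothetical fusion rule; the abstract
    only says that integrated returns are acquired, not how.
    """
    per_task = [discounted_returns(r, gamma) for r in subtask_rewards]
    fused = [
        sum(w * g[t] for w, g in zip(weights, per_task))
        for t in range(len(baselines))
    ]
    # Advantage = integrated return minus a baseline (critic) value per step.
    return [g - b for g, b in zip(fused, baselines)]
```

In an actor-critic update, each advantage value would scale the gradient of the log-probability of the action taken at that step; the baseline values would come from a learned critic rather than being passed in by hand.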
Pages: 434-444
Page count: 11