Regularized Soft Actor-Critic for Behavior Transfer Learning

Cited by: 0
Authors
Tan, Mingxi [1 ]
Tian, Andong [1 ]
Denoyer, Ludovic [1 ]
Affiliations
[1] Ubisoft, Ubisoft La Forge, Chengdu, People's Republic of China
Keywords
CMDP; behavior style; video game
DOI
10.1109/CoG51982.2022.9893655
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Code
081203; 0835
Abstract
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while completing the main objective of a task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
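CMDP formulations of this kind are commonly optimized via a Lagrangian relaxation: the imitation constraint is folded into the objective with a multiplier that is adapted by dual ascent. The following is a minimal sketch of that generic mechanism only, not the paper's implementation; all function and parameter names are illustrative.

```python
def lagrangian_step(task_return, imitation_divergence, lam, budget, dual_lr=0.1):
    """One dual-ascent step for a CMDP-style constrained objective.

    Illustrative only: maximize `task_return` (e.g. the SAC maximum-entropy
    objective) subject to `imitation_divergence <= budget` (the imitation
    constraint). `lam` is the Lagrange multiplier.
    """
    violation = imitation_divergence - budget
    # Penalized (Lagrangian) objective to maximize for the policy.
    objective = task_return - lam * violation
    # Dual ascent on lambda: grow when the constraint is violated,
    # shrink otherwise, clipped at zero to stay feasible.
    new_lam = max(0.0, lam + dual_lr * violation)
    return objective, new_lam
```

Intuitively, a larger budget lets the agent deviate more from the demonstrated style; a tight budget drives the multiplier up until imitation dominates the update.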
Pages: 516-519 (4 pages)
Related Papers
50 items in total (items 41-50 shown)
  • [41] Research on actor-critic reinforcement learning in RoboCup
    Guo, He
    Liu, Tianying
    Wang, Yuxin
    Chen, Feng
    Fan, Jianming
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 205 - 205
  • [42] Reinforcement actor-critic learning as a rehearsal in MicroRTS
    Manandhar, Shiron
    Banerjee, Bikramjit
    KNOWLEDGE ENGINEERING REVIEW, 2024, 39
  • [43] Actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1008 - 1014
  • [44] Robust Offline Actor-Critic with On-Policy Regularized Policy Evaluation
    Cao, Shuo
    Wang, Xuesong
    Cheng, Yuhu
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (12) : 2497 - 2511
  • [45] On actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04) : 1143 - 1166
  • [46] Natural Actor-Critic
    Peters, Jan
    Schaal, Stefan
    NEUROCOMPUTING, 2008, 71 (7-9) : 1180 - 1190
  • [47] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja, Tuomas
    Zhou, Aurick
    Abbeel, Pieter
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [48] Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning
    Tang Lun
    He Xiaoyu
    Wang Xiao
    Chen Qianbin
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (11) : 2671 - 2679
  • [49] Natural Actor-Critic
    Peters, J
    Vijayakumar, S
    Schaal, S
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 280 - 291