Regularized Soft Actor-Critic for Behavior Transfer Learning

Cited by: 0
Authors
Tan, Mingxi [1 ]
Tian, Andong [1 ]
Denoyer, Ludovic [1 ]
Affiliations
[1] Ubisoft, Ubisoft La Forge, Chengdu, Peoples R China
Keywords
CMDP; behavior style; video game;
DOI
10.1109/CoG51982.2022.9893655
CLC Classification Number
TP39 [Applications of Computers];
Subject Classification Number
081203; 0835;
Abstract
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while still completing the main objective of a task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
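Constrained formulations like the one the abstract describes (maximize the SAC objective subject to an imitation constraint) are commonly handled with a Lagrangian relaxation and dual ascent on the multiplier. The sketch below illustrates only that generic CMDP idea; the names (`j_sac`, `d_imit`, `threshold`, `lam`) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a Lagrangian relaxation for a CMDP-style objective:
#   maximize  J_SAC(pi)   subject to   D_imit(pi) <= d
# All symbols here are illustrative; the paper's exact update is not shown.

def lagrangian(j_sac, d_imit, threshold, lam):
    """Scalar objective mixing the main task and the imitation constraint."""
    return j_sac - lam * (d_imit - threshold)

def update_multiplier(lam, d_imit, threshold, lr=0.1):
    """Dual ascent: increase lambda while the constraint is violated,
    keep it non-negative otherwise."""
    return max(0.0, lam + lr * (d_imit - threshold))

# Toy illustration: a violated constraint (0.8 > 0.5) grows lambda,
# which in turn penalizes the combined objective more strongly.
lam = 0.0
for _ in range(5):
    lam = update_multiplier(lam, d_imit=0.8, threshold=0.5)
```

In practice the multiplier update would run alongside the SAC policy and critic updates, so the degree of imitation is tuned automatically by the constraint threshold rather than by a hand-picked fixed weight.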
Pages: 516 - 519
Number of Pages: 4
Related Papers
50 records
  • [1] Dual Behavior Regularized Offline Deterministic Actor-Critic
    Cao, Shuo
    Wang, Xuesong
    Cheng, Yuhu
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (08): : 4841 - 4852
  • [2] PAC-Bayesian Soft Actor-Critic Learning
    Tasdighi, Bahareh
    Akgul, Abdullah
    Haussmann, Manuel
    Brink, Kenny Kazimirzak
    Kandemir, Melih
    SYMPOSIUM ON ADVANCES IN APPROXIMATE BAYESIAN INFERENCE, 2024, 253 : 127 - 145
  • [3] Averaged Soft Actor-Critic for Deep Reinforcement Learning
    Ding, Feng
    Ma, Guanfeng
    Chen, Zhikui
    Gao, Jing
    Li, Peng
    COMPLEXITY, 2021, 2021
  • [4] Merging with Extraction Method for Transfer Learning in Actor-Critic
    Takano, Toshiaki
    Takase, Haruhiko
    Kawanaka, Hiroharu
    Tsuruoka, Shinji
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2011, 15 (07) : 814 - 821
  • [6] TD-regularized actor-critic methods
    Parisi, Simone
    Tangkaratt, Voot
    Peters, Jan
    Khan, Mohammad Emtiyaz
    MACHINE LEARNING, 2019, 108 (8-9) : 1467 - 1501
  • [7] Generative Adversarial Soft Actor-Critic
    Hwang, Hyo-Seok
    Kim, Yoojoong
    Seok, Junhee
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [8] Bayesian Strategy Networks Based Soft Actor-Critic Learning
    Yang, Qin
    Parasuraman, Ramviyas
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [9] Soft Actor-Critic With Integer Actions
    Fan, Ting-Han
    Wang, Yubo
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2611 - 2616
  • [10] Soft Actor-Critic for Navigation of Mobile Robots
    de Jesus, Junior Costa
    Kich, Victor Augusto
    Kolling, Alisson Henrique
    Grando, Ricardo Bedin
    Cuadros, Marco Antonio de Souza Leite
    Gamarra, Daniel Fernando Tello
JOURNAL OF INTELLIGENT AND ROBOTIC SYSTEMS, 2021, 102 (02)