Regularized Soft Actor-Critic for Behavior Transfer Learning

Cited by: 0
Authors
Tan, Mingxi [1 ]
Tian, Andong [1 ]
Denoyer, Ludovic [1 ]
Affiliations
[1] Ubisoft, Ubisoft La Forge, Chengdu, People's Republic of China
Keywords
CMDP; behavior style; video game
DOI
10.1109/CoG51982.2022.9893655
Chinese Library Classification
TP39 [Computer Applications]
Discipline Classification Code
081203; 0835
Abstract
Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while completing the main objective of a task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
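CMDP formulations of this kind are commonly optimized via a Lagrangian relaxation: the imitation constraint is folded into the objective with a multiplier that is adapted by dual ascent. The following is a minimal sketch of that generic mechanism only, not the paper's implementation; all function and parameter names are illustrative.

```python
def lagrangian_step(task_return, imitation_divergence, lam, budget, dual_lr=0.1):
    """One dual-ascent step for a CMDP-style constrained objective.

    Illustrative only: maximize `task_return` (e.g. the SAC maximum-entropy
    objective) subject to `imitation_divergence <= budget` (the imitation
    constraint). `lam` is the Lagrange multiplier.
    """
    violation = imitation_divergence - budget
    # Penalized (Lagrangian) objective to maximize for the policy.
    objective = task_return - lam * violation
    # Dual ascent on lambda: grow when the constraint is violated,
    # shrink otherwise, clipped at zero to stay feasible.
    new_lam = max(0.0, lam + dual_lr * violation)
    return objective, new_lam
```

Intuitively, a larger budget lets the agent deviate more from the demonstrated style; a tight budget drives the multiplier up until imitation dominates the update.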
Pages: 516-519 (4 pages)
Related Papers
50 items in total (items 41-50 shown)
  • [41] Research on actor-critic reinforcement learning in RoboCup
    Guo, He
    Liu, Tianying
    Wang, Yuxin
    Chen, Feng
    Fan, Jianming
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 205 - 205
  • [42] Reinforcement actor-critic learning as a rehearsal in MicroRTS
    Manandhar, Shiron
    Banerjee, Bikramjit
    KNOWLEDGE ENGINEERING REVIEW, 2024, 39
  • [43] Actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 12, 2000, 12 : 1008 - 1014
  • [44] Robust Offline Actor-Critic with On-Policy Regularized Policy Evaluation
    Cao, Shuo
    Wang, Xuesong
    Cheng, Yuhu
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (12) : 2497 - 2511
  • [45] On actor-critic algorithms
    Konda, VR
    Tsitsiklis, JN
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2003, 42 (04) : 1143 - 1166
  • [46] Natural Actor-Critic
    Peters, Jan
    Schaal, Stefan
    NEUROCOMPUTING, 2008, 71 (7-9) : 1180 - 1190
  • [47] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja, Tuomas
    Zhou, Aurick
    Abbeel, Pieter
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [48] Deployment Algorithm of Service Function Chain Based on Transfer Actor-Critic Learning
    Tang Lun
    He Xiaoyu
    Wang Xiao
    Chen Qianbin
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (11) : 2671 - 2679
  • [49] Natural Actor-Critic
    Peters, J
    Vijayakumar, S
    Schaal, S
    MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 : 280 - 291