Reinforcement learning with constraint based on mirror descent algorithm

Cited by: 1

Authors:

Miyashita, Megumi [1]

Kondo, Toshiyuki [2]

Yano, Shiro [2]

Affiliations:

[1] Tokyo Univ Agr & Technol, Grad Sch Engn, Dept Elect & Informat Engn, 2-24-16 Naka Cho, Koganei, Tokyo, Japan

[2] Tokyo Univ Agr & Technol, Inst Engn, Div Adv Informat Technol & Comp Sci, 2-24-16 Naka Cho, Koganei, Tokyo, Japan

Source:

RESULTS IN CONTROL AND OPTIMIZATION | 2021 / Vol. 4

Keywords:

Constrained optimization; Mirror descent algorithm;

DOI:

10.1016/j.rico.2021.100048

Chinese Library Classification (CLC) number:

O29 [Applied Mathematics];

Discipline classification code:

070104;

Abstract:

An important issue in reinforcement learning is keeping the agent away from dangers and risks during a task, such as physical collisions. We propose CoMDS, a reinforcement learning algorithm based on the CoMirror algorithm, for problems with a functional constraint. We also adapt CoMDS into Gaussian CoMDS for practical use. We evaluate both algorithms in simulation on a via-point task for a planar robotic arm, with a forbidden area serving as the constraint. We find that Gaussian CoMDS explores the policy while satisfying the constraint.
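
For orientation, the constrained mirror descent idea that the abstract builds on can be sketched on a toy problem. The sketch below is not the authors' CoMDS or Gaussian CoMDS; it only illustrates the switching rule as CoMirror is commonly described: take a mirror step along the objective gradient when the constraint holds, and along the constraint gradient otherwise. The entropic mirror map on the probability simplex, the linear objective and constraint, and the names `comirror_simplex`, `COST`, `step`, and `tol` are all assumptions made for illustration.

```python
import math

# Hypothetical toy problem: minimise COST . x over the probability simplex
# subject to g(x) = x[0] - 0.5 <= 0. The optimum is x = (0.5, 0.5, 0), f = 1.5.
COST = [1.0, 2.0, 3.0]


def f(x):
    return sum(c * xi for c, xi in zip(COST, x))


def grad_f(x):
    return COST


def g(x):
    return x[0] - 0.5


def grad_g(x):
    return [1.0, 0.0, 0.0]


def comirror_simplex(f, grad_f, g, grad_g, x0, n_iters=2000, step=0.5, tol=1e-3):
    """Switching mirror descent with an entropic mirror map (illustrative only)."""
    x = list(x0)
    best_x, best_f = None, math.inf
    for k in range(n_iters):
        feasible = g(x) <= tol
        # Remember the best iterate that satisfies the constraint (up to tol).
        if feasible and f(x) < best_f:
            best_x, best_f = list(x), f(x)
        # Switching rule: objective gradient if feasible, else constraint gradient.
        direction = grad_f(x) if feasible else grad_g(x)
        t = step / math.sqrt(k + 1)  # diminishing step size
        # Entropic mirror step: multiplicative update followed by renormalisation.
        w = [xi * math.exp(-t * di) for xi, di in zip(x, direction)]
        total = sum(w)
        x = [wi / total for wi in w]
    return best_x, best_f


best_x, best_f = comirror_simplex(f, grad_f, g, grad_g, [1 / 3, 1 / 3, 1 / 3])
print(best_x, best_f)
```

Only iterates that satisfy the constraint are candidates for the returned solution, so the result is feasible up to `tol` by construction. In the reinforcement learning setting of the paper, the objective and constraint would instead be defined over policy distributions, which this sketch does not attempt to model.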


Pages: 9

50 items in total

[1] Node Constraint Routing Algorithm based on Reinforcement Learning
Dong, Weihang
Zhang, Wei
Yang, Wei
[J]. PROCEEDINGS OF 2016 IEEE 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2016), 2016, : 1752 - 1756
[2] Robust Imitation via Mirror Descent Inverse Reinforcement Learning
Han, Dong-Sig
Kim, Hyunseo
Lee, Hyundo
Ryu, Je-Hwan
Zhang, Byoung-Tak
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[3] Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent
Moreno, Bianca Marin
Bregere, Margaux
Gaillard, Pierre
Oudjane, Nadia
[J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
[4] POLICY MIRROR DESCENT FOR REGULARIZED REINFORCEMENT LEARNING: A GENERALIZED FRAMEWORK WITH LINEAR CONVERGENCE
Zhan, Wenhao
Cen, Shicong
Huang, Baihe
Chen, Yuxin
Lee, Jason D.
Chi, Yuejie
[J]. SIAM JOURNAL ON OPTIMIZATION, 2023, 33 (02) : 1061 - 1091
[5] Mirror Descent Learning in Continuous Games
Zhou, Zhengyuan
Mertikopoulos, Panayotis
Moustakas, Aris L.
Bambos, Nicholas
Glynn, Peter
[J]. 2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
[6] Analysis of Online Composite Mirror Descent Algorithm
Lei, Yunwen
Zhou, Ding-Xuan
[J]. NEURAL COMPUTATION, 2017, 29 (03) : 825 - 860
[7] Gradient descent for general reinforcement learning
Baird, L
Moore, A
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 11, 1999, 11 : 968 - 974
[8] Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
Lan, Guanghui
[J]. MATHEMATICAL PROGRAMMING, 2023, 198 (01) : 1059 - 1106
[9] Policy mirror descent for reinforcement learning: linear convergence, new sampling complexity, and generalized problem classes
Guanghui Lan
[J]. Mathematical Programming, 2023, 198 : 1059 - 1106
[10] Energy-Based Policy Constraint for Offline Reinforcement Learning
Peng, Zhiyong
Han, Changlin
Liu, Yadong
Zhou, Zongtan
[J]. ARTIFICIAL INTELLIGENCE, CICAI 2023, PT II, 2024, 14474 : 335 - 346
