Multi-Objective Reinforcement Learning Method for Acquiring All Pareto Optimal Policies Simultaneously

Citations: 0
Authors
Mukai, Yusuke [1 ]
Kuroe, Yasuaki [2 ]
Iima, Hitoshi
Affiliations
[1] Kyoto Inst Technol, Dept Adv Fibro Sci, Kyoto 606, Japan
[2] Kyoto Inst Technol, Dept Comp Sci, Kyoto 606, Japan
Keywords
Reinforcement learning; Multi-objective problem; Pareto optimal policy
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper studies multi-objective reinforcement learning problems in which an agent receives multiple rewards. Ordinary multi-objective reinforcement learning methods acquire only a single Pareto optimal policy per run, by scalarizing the reward vector with a weighted sum; obtaining a different Pareto optimal policy therefore requires changing the weight vector and running the method again. For problems whose environment model is known, a method has been proposed that acquires all Pareto optimal policies simultaneously. Building on the idea of that method, we propose a method that acquires all Pareto optimal policies simultaneously for multi-objective reinforcement learning problems whose environment model is unknown. Furthermore, we show both theoretically and experimentally that the proposed method finds the Pareto optimal policies.
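To make the contrast in the abstract concrete, the following is a minimal sketch (not the paper's proposed method) of the ordinary weighted-sum scalarization approach it describes: each weight vector yields at most one Pareto optimal policy, so recovering different policies requires re-running the learner with different weights. The toy two-objective, single-state environment and all names here are illustrative assumptions.

```python
import random

# Toy single-state MDP with two actions and a 2-objective reward vector:
# action 0 favours objective 1, action 1 favours objective 2 (assumed values).
REWARDS = {0: (1.0, 0.0), 1: (0.0, 1.0)}
ACTIONS = [0, 1]

def scalarized_q_learning(w, episodes=500, alpha=0.1, eps=0.1, seed=0):
    """Q-learning on the scalar reward w . r; returns the greedy action,
    i.e. the single policy this weight vector converges to."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy action selection
        a = rng.choice(ACTIONS) if rng.random() < eps else max(q, key=q.get)
        # weighted-sum scalarization of the reward vector
        r = sum(wi * ri for wi, ri in zip(w, REWARDS[a]))
        # one-step update (single state, so no bootstrapped next-state term)
        q[a] += alpha * (r - q[a])
    return max(q, key=q.get)

# Each run with a fixed weight vector recovers only one Pareto optimal policy:
print(scalarized_q_learning((0.9, 0.1)))  # prefers the objective-1 action
print(scalarized_q_learning((0.1, 0.9)))  # prefers the objective-2 action
```

Changing the weight vector and rerunning is exactly the repetition the paper's method avoids by learning all Pareto optimal policies in a single run.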
Pages: 1917-1923 (7 pages)