Multi-Objective Reinforcement Learning Method for Acquiring All Pareto Optimal Policies Simultaneously

Citations: 0
Authors
Mukai, Yusuke [1 ]
Kuroe, Yasuaki [2 ]
Iima, Hitoshi
Affiliations
[1] Kyoto Inst Technol, Dept Adv Fibro Sci, Kyoto 606, Japan
[2] Kyoto Inst Technol, Dept Comp Sci, Kyoto 606, Japan
Keywords
Reinforcement learning; Multi-objective problem; Pareto optimal policy
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper studies multi-objective reinforcement learning problems in which an agent receives multiple rewards. Ordinary multi-objective reinforcement learning methods acquire only a single Pareto optimal policy per run, by scalarizing the reward vector with a weighted sum; obtaining a different Pareto optimal policy therefore requires changing the weight vector and running the method again. For problems whose environment model is known, a method has been proposed that acquires all Pareto optimal policies simultaneously. Building on the idea of that method, we propose a method that acquires all Pareto optimal policies simultaneously for multi-objective reinforcement learning problems whose environment model is unknown. Furthermore, we show both theoretically and experimentally that the proposed method finds the Pareto optimal policies.
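To make the contrast in the abstract concrete, the following is a minimal sketch (not the paper's proposed method) of the ordinary weighted-sum scalarization approach it describes: each weight vector yields at most one Pareto optimal policy, so recovering different policies requires re-running the learner with different weights. The toy two-objective, single-state environment and all names here are illustrative assumptions.

```python
import random

# Toy single-state MDP with two actions and a 2-objective reward vector:
# action 0 favours objective 1, action 1 favours objective 2 (assumed values).
REWARDS = {0: (1.0, 0.0), 1: (0.0, 1.0)}
ACTIONS = [0, 1]

def scalarized_q_learning(w, episodes=500, alpha=0.1, eps=0.1, seed=0):
    """Q-learning on the scalar reward w . r; returns the greedy action,
    i.e. the single policy this weight vector converges to."""
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy action selection
        a = rng.choice(ACTIONS) if rng.random() < eps else max(q, key=q.get)
        # weighted-sum scalarization of the reward vector
        r = sum(wi * ri for wi, ri in zip(w, REWARDS[a]))
        # one-step update (single state, so no bootstrapped next-state term)
        q[a] += alpha * (r - q[a])
    return max(q, key=q.get)

# Each run with a fixed weight vector recovers only one Pareto optimal policy:
print(scalarized_q_learning((0.9, 0.1)))  # prefers the objective-1 action
print(scalarized_q_learning((0.1, 0.9)))  # prefers the objective-2 action
```

Changing the weight vector and rerunning is exactly the repetition the paper's method avoids by learning all Pareto optimal policies in a single run.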
Pages: 1917-1923 (7 pages)