Multi-Objective Reinforcement Learning Method for Acquiring All Pareto Optimal Policies Simultaneously

Cited by: 0
Authors
Mukai, Yusuke [1 ]
Kuroe, Yasuaki [2 ]
Iima, Hitoshi
Affiliations
[1] Kyoto Inst Technol, Dept Adv Fibro Sci, Kyoto 606, Japan
[2] Kyoto Inst Technol, Dept Comp Sci, Kyoto 606, Japan
Keywords
Reinforcement learning; Multi-objective problem; Pareto optimal policy;
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
This paper studies multi-objective reinforcement learning problems in which an agent receives multiple rewards. Ordinary multi-objective reinforcement learning methods acquire only a single Pareto optimal policy, via scalarization with a weighted sum of the reward vector; obtaining different Pareto optimal policies therefore requires changing the weight vector and running the method again. In contrast, a method that acquires all Pareto optimal policies simultaneously has been proposed for problems whose environment model is known. Building on the idea of that method, we propose a method that acquires all Pareto optimal policies simultaneously for multi-objective reinforcement learning problems whose environment model is unknown. Furthermore, we show theoretically and experimentally that the proposed method finds the Pareto optimal policies.
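The weighted-sum scalarization that the abstract contrasts against can be sketched with ordinary tabular Q-learning on a toy two-objective chain MDP. This is a minimal illustration of the baseline approach only, not the paper's proposed method; the environment, function names, and hyperparameters are all hypothetical. It shows why a single run yields a single Pareto optimal policy: the weight vector `w` is fixed before learning, so covering the Pareto front requires rerunning with many different weights.

```python
import random

def scalarized_q_learning(w, n_states=4, n_actions=2, episodes=500,
                          alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Q-learning on the scalar reward w . r for a toy chain MDP.

    Action 0 moves left; action 1 moves right. Reaching the leftmost
    state gives reward vector (1, 0); the rightmost gives (0, 1).
    The two objectives therefore conflict, and the learned policy
    depends entirely on the fixed weight vector w.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states)  # random start so every state is visited
        for _ in range(20):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda a_: q[s][a_])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            if s2 == 0:
                r_vec = (1.0, 0.0)
            elif s2 == n_states - 1:
                r_vec = (0.0, 1.0)
            else:
                r_vec = (0.0, 0.0)
            # weighted-sum scalarization: collapse the reward vector to a scalar
            r = w[0] * r_vec[0] + w[1] * r_vec[1]
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    # Greedy policy: one action per state.
    return [max(range(n_actions), key=lambda a_: q[s][a_])
            for s in range(n_states)]

# Each weight vector yields one Pareto optimal policy;
# a different trade-off requires a fresh run with new weights.
policy_left = scalarized_q_learning((1.0, 0.0))   # favors objective 1: go left
policy_right = scalarized_q_learning((0.0, 1.0))  # favors objective 2: go right
```

One run, one policy: `policy_left` steers every state toward the left terminal and `policy_right` toward the right one. The paper's contribution is to avoid this rerun-per-weight loop by acquiring all Pareto optimal policies in a single learning process, even without an environment model.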
Pages: 1917 - 1923
Page count: 7
Related Papers
50 items in total
  • [1] Multi-Objective Reinforcement Learning for Acquiring All Pareto Optimal Policies Simultaneously - Method of Determining Scalarization Weights
    Iima, Hitoshi
    Kuroe, Yasuaki
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 876 - 881
  • [2] Distributional Pareto-Optimal Multi-Objective Reinforcement Learning
    Cai, Xin-Qiang
    Zhang, Pushi
    Zhao, Li
    Bian, Jiang
    Sugiyama, Masashi
    Llorens, Ashley J.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies
    Van Moffaert, Kristof
    Nowe, Ann
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 3483 - 3512
  • [4] Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation
    Pirotta, Matteo
    Parisi, Simone
    Restelli, Marcello
    [J]. PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2928 - 2934
  • [5] On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts
    Vamplew, Peter
    Yearwood, John
    Dazeley, Richard
    Berry, Adam
    [J]. AI 2008: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2008, 5360 : 372 - 378
  • [6] Pareto Optimal Solutions for Network Defense Strategy Selection Simulator in Multi-Objective Reinforcement Learning
    Sun, Yang
    Li, Yun
    Xiong, Wei
    Yao, Zhonghua
    Moniz, Krishna
    Zahir, Ahmed
    [J]. APPLIED SCIENCES-BASEL, 2018, 8 (01):
  • [7] Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning
    Reymond, Mathieu
    Hayes, Conor F.
    Willem, Lander
    Radulescu, Roxana
    Abrams, Steven
    Roijers, Diederik M.
    Howley, Enda
    Mannion, Patrick
    Hens, Niel
    Nowe, Ann
    Libin, Pieter
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [8] Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation
    Parisi, Simone
    Pirotta, Matteo
    Restelli, Marcello
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2016, 57 : 187 - 227
  • [9] Learning adversarial attack policies through multi-objective reinforcement learning
    Garcia, Javier
    Majadas, Ruben
    Fernandez, Fernando
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2020, 96
  • [10] Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning
    Horie, Naoto
    Matsui, Tohgoroh
    Moriyama, Koichi
    Mutoh, Atsuko
    Inuzuka, Nobuhiro
    [J]. ARTIFICIAL LIFE AND ROBOTICS, 2019, 24 (03) : 352 - 359