Multi-Objective Reinforcement Learning for Acquiring All Pareto Optimal Policies Simultaneously - Method of Determining Scalarization Weights

Cited: 0
Authors
Iima, Hitoshi [1 ]
Kuroe, Yasuaki [1 ]
Institutions
[1] Kyoto Inst Technol, Dept Informat Sci, Kyoto 606, Japan
Keywords
reinforcement learning; multi-objective problem; Pareto optimal policy;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We recently proposed a multi-objective reinforcement learning method that acquires all Pareto optimal policies simultaneously by introducing the concept of convex hulls into the Q-learning method. In this method, state-action value vectors are obtained through a single learning run, and each Pareto optimal policy is then derived by scalarizing the obtained state-action value vectors with a weight vector. The method therefore does not require learning more than once; it finds all the Pareto optimal policies provided that the weight vectors used in scalarizing the state-action value vectors are determined adequately. This paper proposes a method of determining these scalarization weight vectors. The performance of the proposed method is evaluated through numerical experiments.
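As a reading aid only, the following is a minimal sketch of the scalarization step described in the abstract, assuming learned state-action value vectors with one component per objective: one weight vector, combined with greedy action selection, yields one candidate Pareto optimal policy. The array names, sizes, and placeholder values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Sketch only: Q holds a learned state-action value VECTOR per (state, action),
# with one component per objective. Sizes and values are illustrative.
num_states, num_actions, num_objectives = 4, 3, 2
rng = np.random.default_rng(0)
Q = rng.random((num_states, num_actions, num_objectives))  # stand-in for learned values

def greedy_policy(Q, w):
    """Scalarize each value vector with weight vector w, then act greedily."""
    scalarized = Q @ w                  # shape (num_states, num_actions)
    return scalarized.argmax(axis=1)    # one greedy action per state

# One weight vector gives one candidate Pareto optimal policy; choosing the set
# of weight vectors to sweep over is the question the paper addresses.
w = np.array([0.7, 0.3])
print(greedy_policy(Q, w))
```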
Pages: 876-881
Number of pages: 6
Related Papers
50 entries in total
  • [1] Multi-Objective Reinforcement Learning Method for Acquiring All Pareto Optimal Policies Simultaneously
    Mukai, Yusuke
    Kuroe, Yasuaki
    Iima, Hitoshi
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012: 1917-1923
  • [2] Distributional Pareto-Optimal Multi-Objective Reinforcement Learning
    Cai, Xin-Qiang
    Zhang, Pushi
    Zhao, Li
    Bian, Jiang
    Sugiyama, Masashi
    Llorens, Ashley J.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [3] Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies
    Van Moffaert, Kristof
    Nowe, Ann
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15: 3483-3512
  • [4] Estimating Objective Weights of Pareto-Optimal Policies for Multi-Objective Sequential Decision-Making
    Ikenaga, Akiko
    Arai, Sachiyo
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2024, 28 (02): 393-402
  • [5] Dynamic Weights in Multi-Objective Deep Reinforcement Learning
    Abels, Axel
    Roijers, Diederik M.
    Lenaerts, Tom
    Nowe, Ann
    Steckelmacher, Denis
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [6] Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation
    Pirotta, Matteo
    Parisi, Simone
    Restelli, Marcello
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015: 2928-2934
  • [7] On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts
    Vamplew, Peter
    Yearwood, John
    Dazeley, Richard
    Berry, Adam
    AI 2008: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2008, 5360: 372-378
  • [8] An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning
    Mirzanejad, Mohammad
    Ebrahimi, Morteza
    Vamplew, Peter
    Veisi, Hadi
    KNOWLEDGE ENGINEERING REVIEW, 2022, 37 (04)
  • [9] Pareto Optimal Solutions for Network Defense Strategy Selection Simulator in Multi-Objective Reinforcement Learning
    Sun, Yang
    Li, Yun
    Xiong, Wei
    Yao, Zhonghua
    Moniz, Krishna
    Zahir, Ahmed
    APPLIED SCIENCES-BASEL, 2018, 8 (01)
  • [10] Determining All Pareto-Optimal Paths for Multi-category Multi-objective Path Optimization Problems
    Ma, Yiming
    Hu, Xiaobing
    Zhou, Hang
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153: 327-335