Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies

Cited by: 0
Authors
Van Moffaert, Kristof [1 ]
Nowe, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Dept Comp Sci, Brussels, Belgium
Keywords
multiple criteria analysis; multi-objective; reinforcement learning; Pareto sets; hypervolume
DOI
Not available
CLC Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning in which the scalar reward signal is extended to multiple feedback signals, in essence one per objective; MORL is the process of learning policies that optimize multiple criteria simultaneously. In this paper, we present a novel temporal difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This multi-policy algorithm, which we name Pareto Q-learning, learns a set of Pareto dominating policies in a single run and is applicable in episodic environments with deterministic as well as stochastic transition functions. A crucial aspect of Pareto Q-learning is its updating mechanism, which bootstraps sets of Q-vectors. One of our main contributions is a mechanism that separates the expected immediate reward vector from the set of expected future discounted reward vectors; this decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space. To balance exploration and exploitation during learning, we also propose three set evaluation mechanisms that evaluate the sets of vectors so that standard action selection strategies, such as epsilon-greedy, can be applied. More precisely, these mechanisms use multi-objective evaluation principles such as the hypervolume measure, the cardinality indicator and the Pareto dominance relation to select the most promising actions. We experimentally validate the algorithm on multiple environments with two and three objectives and demonstrate that Pareto Q-learning outperforms current state-of-the-art MORL algorithms with respect to the hypervolume of the obtained policies. We note that (1) Pareto Q-learning is able to learn the entire Pareto front under the usual assumption that each state-action pair is sufficiently sampled, while (2) not being biased by the shape of the Pareto front. Furthermore, (3) the set evaluation mechanisms provide indicative measures for local action selection, and (4) the learned policies can be retrieved throughout the state and action space.
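The set-based machinery the abstract describes is compact enough to sketch. Below is a minimal Python sketch of tabular Pareto Q-learning for a two-objective episodic task: Q-sets decomposed as Qset(s,a) = R(s,a) (+) gamma*ND(s,a), a bootstrap over the non-dominated union of the successor's Q-sets, and the hypervolume set-evaluation mechanism folded into epsilon-greedy selection. All identifiers here (GAMMA, REF, learn_step, select_action) and the zero initialization are illustrative assumptions, not the authors' implementation:

import random
from collections import defaultdict

import numpy as np

GAMMA = 0.9          # discount factor (illustrative choice)
REF = (-1.0, -1.0)   # hypervolume reference point, assumed dominated by all returns

def dominates(u, v):
    """True iff vector u Pareto-dominates v: >= in every objective, > in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def nondominated(vectors):
    """Prune a list of reward vectors down to its Pareto front."""
    return [u for u in vectors if not any(dominates(v, u) for v in vectors)]

def hypervolume_2d(vectors, ref=REF):
    """Area dominated by a 2-objective vector set w.r.t. `ref` (both objectives maximized)."""
    pts = sorted(nondominated(vectors), key=lambda p: p[0], reverse=True)
    hv, y_prev = 0.0, ref[1]
    for x, y in pts:           # sweep from largest x; y grows along the front
        hv += (x - ref[0]) * (y - y_prev)
        y_prev = y
    return hv

# Learner state: visit counts, mean immediate reward vectors R(s,a),
# and the sets ND(s,a) of non-dominated future discounted vectors.
count = defaultdict(int)
avg_r = defaultdict(lambda: np.zeros(2))   # zero init: an assumption of this sketch
nd_set = defaultdict(list)

def q_set(s, a):
    """Qset(s,a) = R(s,a) (+) gamma*ND(s,a): each Q-vector is the mean immediate
    reward vector plus one discounted future vector (the abstract's decomposition)."""
    future = nd_set[(s, a)] or [np.zeros(2)]   # empty future set: Qset = {R(s,a)}
    return [tuple(avg_r[(s, a)] + GAMMA * np.asarray(v)) for v in future]

def learn_step(s, a, r, s_next, actions, terminal=False):
    """One update after observing the transition (s, a, r, s_next)."""
    count[(s, a)] += 1
    avg_r[(s, a)] += (np.asarray(r, dtype=float) - avg_r[(s, a)]) / count[(s, a)]
    if terminal:
        nd_set[(s, a)] = []    # no rewards beyond a terminal state
    else:
        # Bootstrap with the non-dominated union of the successor's Q-sets.
        union = [q for a2 in actions for q in q_set(s_next, a2)]
        nd_set[(s, a)] = nondominated(union)

def select_action(s, actions, eps=0.1):
    """Hypervolume set evaluation folded into epsilon-greedy action selection."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: hypervolume_2d(q_set(s, a)))

The decomposition is what keeps this bootstrap well-behaved: the mean immediate reward vector R(s,a) converges by simple averaging even under stochastic rewards, while the future set ND(s,a) is re-pruned on every visit, which, per the abstract, is what allows the learned policies to be exploited consistently throughout the state space.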
Pages: 3483-3512 (30 pages)
Related Papers (showing 10 of 50)
  • [1] Reymond, Mathieu; Hayes, Conor F.; Willem, Lander; Radulescu, Roxana; Abrams, Steven; Roijers, Diederik M.; Howley, Enda; Mannion, Patrick; Hens, Niel; Nowe, Ann; Libin, Pieter. Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning. Expert Systems with Applications, 2024, 249.
  • [2] Mukai, Yusuke; Kuroe, Yasuaki; Iima, Hitoshi. Multi-Objective Reinforcement Learning Method for Acquiring All Pareto Optimal Policies Simultaneously. Proceedings of the 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012: 1917-1923.
  • [3] Cai, Xin-Qiang; Zhang, Pushi; Zhao, Li; Bian, Jiang; Sugiyama, Masashi; Llorens, Ashley J. Distributional Pareto-Optimal Multi-Objective Reinforcement Learning. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [4] Pirotta, Matteo; Parisi, Simone; Restelli, Marcello. Multi-Objective Reinforcement Learning with Continuous Pareto Frontier Approximation. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015: 2928-2934.
  • [5] Vamplew, Peter; Yearwood, John; Dazeley, Richard; Berry, Adam. On the Limitations of Scalarisation for Multi-objective Reinforcement Learning of Pareto Fronts. AI 2008: Advances in Artificial Intelligence, Proceedings, 2008, 5360: 372-378.
  • [6] Parisi, Simone; Pirotta, Matteo; Restelli, Marcello. Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation. Journal of Artificial Intelligence Research, 2016, 57: 187-227.
  • [7] Garcia, Javier; Majadas, Ruben; Fernandez, Fernando. Learning adversarial attack policies through multi-objective reinforcement learning. Engineering Applications of Artificial Intelligence, 2020, 96.
  • [8] Iima, Hitoshi; Kuroe, Yasuaki. Multi-Objective Reinforcement Learning for Acquiring All Pareto Optimal Policies Simultaneously - Method of Determining Scalarization Weights. 2014 IEEE International Conference on Systems, Man and Cybernetics (SMC), 2014: 876-881.
  • [9] Horie, Naoto; Matsui, Tohgoroh; Moriyama, Koichi; Mutoh, Atsuko; Inuzuka, Nobuhiro. Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning. Artificial Life and Robotics, 2019, 24(3): 352-359.