Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies

Cited by: 0
Authors
Van Moffaert, Kristof [1 ]
Nowe, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Dept Comp Sci, Brussels, Belgium
Keywords
multiple criteria analysis; multi-objective; reinforcement learning; Pareto sets; hypervolume
DOI
Not available
Chinese Library Classification (CLC)
TP [automation and computer technology]
Subject Classification Code
0812
Abstract
Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning where the scalar reward signal is extended to multiple feedback signals, in essence, one for each objective. MORL is the process of learning policies that optimize multiple criteria simultaneously. In this paper, we present a novel temporal difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This algorithm is a multi-policy algorithm that learns a set of Pareto dominating policies in a single run. We name this algorithm Pareto Q-learning; it is applicable in episodic environments with deterministic as well as stochastic transition functions. A crucial aspect of Pareto Q-learning is the updating mechanism that bootstraps sets of Q-vectors. One of our main contributions in this paper is a mechanism that separates the expected immediate reward vector from the set of expected future discounted reward vectors. This decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space. To balance exploration and exploitation during learning, we also propose three set evaluation mechanisms. These mechanisms evaluate the sets of vectors to accommodate standard action selection strategies, such as epsilon-greedy. More precisely, they use multi-objective evaluation principles such as the hypervolume measure, the cardinality indicator and the Pareto dominance relation to select the most promising actions. We experimentally validate the algorithm on multiple environments with two and three objectives and demonstrate that Pareto Q-learning outperforms current state-of-the-art MORL algorithms with respect to the hypervolume of the obtained policies. We note that (1) Pareto Q-learning is able to learn the entire Pareto front under the usual assumption that each state-action pair is sufficiently sampled, while (2) not being biased by the shape of the Pareto front. Furthermore, (3) the set evaluation mechanisms provide indicative measures for local action selection and (4) the learned policies can be retrieved throughout the state and action space.
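To make the abstract's two central ideas concrete, below is a minimal Python sketch of (a) the decomposition that bootstraps each set of Q-vectors as the learned average immediate reward vector plus the discounted set of non-dominated future reward vectors, and (b) the hypervolume set evaluation mechanism for local action selection. This is an illustrative reconstruction under assumed conventions (tabular episodic setting, two objectives, maximization); the names ParetoQ, hb_action, and the constant GAMMA are hypothetical and not the authors' reference implementation.

```python
import numpy as np
from collections import defaultdict

GAMMA = 0.95  # discount factor; an assumed value, not taken from the paper


def dominates(u, v):
    """True if vector u Pareto-dominates vector v (maximization)."""
    return bool(np.all(u >= v) and np.any(u > v))


def nd(vectors):
    """ND operator: keep only the non-dominated vectors of a set."""
    vs = [np.asarray(v, dtype=float) for v in vectors]
    return [v for v in vs if not any(dominates(w, v) for w in vs if w is not v)]


class ParetoQ:
    """Tabular sketch of the set-based bootstrapping in Pareto Q-learning."""

    def __init__(self, actions, n_objectives=2):
        self.actions = actions
        self.k = n_objectives
        self.r_avg = defaultdict(lambda: np.zeros(self.k))  # expected immediate reward vector
        self.visits = defaultdict(int)                      # visit counts per (s, a)
        self.nd_next = defaultdict(list)                    # non-dominated future vectors per (s, a)

    def q_set(self, s, a):
        """The decomposition: Q_set(s, a) = r_avg(s, a) (+) gamma * ND(s, a)."""
        future = self.nd_next[(s, a)] or [np.zeros(self.k)]
        return [self.r_avg[(s, a)] + GAMMA * q for q in future]

    def update(self, s, a, reward, s_next):
        # incrementally average the expected immediate reward vector ...
        self.visits[(s, a)] += 1
        self.r_avg[(s, a)] += (np.asarray(reward, dtype=float)
                               - self.r_avg[(s, a)]) / self.visits[(s, a)]
        # ... and bootstrap the non-dominated vectors over the next state's action sets
        union = [q for a2 in self.actions for q in self.q_set(s_next, a2)]
        self.nd_next[(s, a)] = nd(union)


def hypervolume_2d(points, ref):
    """2-objective hypervolume w.r.t. a reference point; assumes every point dominates ref."""
    hv, prev_y = 0.0, ref[1]
    for x, y in sorted(nd(points), key=lambda p: -p[0]):
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv


def hb_action(agent, s, ref=(-1.0, -1.0)):
    """Hypervolume set evaluation: pick the action whose Q-set covers the most volume."""
    return max(agent.actions, key=lambda a: hypervolume_2d(agent.q_set(s, a), ref))
```

An epsilon-greedy policy would follow hb_action with probability 1 - epsilon and explore otherwise; the cardinality and Pareto dominance set evaluation mechanisms mentioned in the abstract can be sketched analogously by swapping hypervolume_2d for a count of non-dominated vectors or a dominance test.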
Pages: 3483-3512
Page count: 30