Multi-Objective Reinforcement Learning using Sets of Pareto Dominating Policies

Cited by: 0
Authors
Van Moffaert, Kristof [1 ]
Nowe, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Dept Comp Sci, Brussels, Belgium
Keywords
multiple criteria analysis; multi-objective; reinforcement learning; Pareto sets; hypervolume;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Many real-world problems involve the optimization of multiple, possibly conflicting objectives. Multi-objective reinforcement learning (MORL) is a generalization of standard reinforcement learning in which the scalar reward signal is extended to multiple feedback signals, in essence one per objective. MORL is the process of learning policies that optimize multiple criteria simultaneously. In this paper, we present a novel temporal-difference learning algorithm that integrates the Pareto dominance relation into a reinforcement learning approach. This algorithm is a multi-policy algorithm that learns a set of Pareto dominating policies in a single run. We name this algorithm Pareto Q-learning; it is applicable in episodic environments with deterministic as well as stochastic transition functions. A crucial aspect of Pareto Q-learning is its updating mechanism, which bootstraps sets of Q-vectors. One of our main contributions is a mechanism that separates the expected immediate reward vector from the set of expected future discounted reward vectors. This decomposition allows us to update the sets and to exploit the learned policies consistently throughout the state space. To balance exploration and exploitation during learning, we also propose three set evaluation mechanisms. These mechanisms evaluate the sets of vectors to accommodate standard action selection strategies, such as epsilon-greedy. More precisely, they use multi-objective evaluation principles such as the hypervolume measure, the cardinality indicator and the Pareto dominance relation to select the most promising actions. We experimentally validate the algorithm on multiple environments with two and three objectives and demonstrate that Pareto Q-learning outperforms current state-of-the-art MORL algorithms with respect to the hypervolume of the obtained policies.
We note that (1) Pareto Q-learning is able to learn the entire Pareto front under the usual assumption that each state-action pair is sufficiently sampled, while (2) not being biased by the shape of the Pareto front. Furthermore, (3) the set evaluation mechanisms provide indicative measures for local action selection and (4) the learned policies can be retrieved throughout the state and action space.
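To make the abstract's key ideas concrete, the following is a minimal Python sketch (not the authors' implementation) of the Pareto dominance relation, the non-dominated (ND) operator, the set-based bootstrapping step that adds the average immediate reward vector to each discounted future reward vector, and a two-objective hypervolume evaluation of a Q-set. All function names and the 2-objective restriction are our own illustrative assumptions.

```python
def dominates(p, q):
    """p Pareto-dominates q (maximization): >= in every objective, > in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def nd(vectors):
    """Non-dominated subset of a collection of reward vectors (the ND operator)."""
    vecs = list(set(vectors))
    return {v for v in vecs if not any(dominates(u, v) for u in vecs if u != v)}

def update_q_set(r_avg, next_sets, gamma=0.9):
    """Set-based bootstrap: add the average immediate reward vector r_avg to each
    discounted vector in ND(union of the next state's Q-sets).  This mirrors the
    decomposition into immediate and future reward components described above."""
    future = nd(v for qset in next_sets for v in qset)
    if not future:  # terminal next state: only the immediate reward remains
        return {tuple(float(r) for r in r_avg)}
    return {tuple(r + gamma * f for r, f in zip(r_avg, fv)) for fv in future}

def hypervolume_2d(points, ref):
    """Hypervolume of a 2-objective vector set w.r.t. a reference point
    (maximization); one possible set evaluation measure for action selection."""
    pts = sorted(nd(points), key=lambda p: p[0])  # ascending in objective 1
    hv, prev_x = 0.0, ref[0]
    for x, y in pts:  # y decreases as x increases within an ND set
        hv += (x - prev_x) * (y - ref[1])
        prev_x = x
    return hv
```

With such a set evaluation in hand, epsilon-greedy action selection can greedily pick the action whose Q-set has the largest hypervolume (or cardinality), which is the role the three set evaluation mechanisms play in the paper.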
Pages: 3483-3512
Number of pages: 30
Related papers
50 in total
  • [41] Optimization of Fiber Radiation Processes Using Multi-Objective Reinforcement Learning
    Choi, Hye Kyung
    Lee, Whan
    Sajadieh, Seyed Mohammad Mehdi
    Do Noh, Sang
    Sim, Seung Bum
    Jung, Wu chang
    Jeong, Jeong Ho
    [J]. INTERNATIONAL JOURNAL OF PRECISION ENGINEERING AND MANUFACTURING-GREEN TECHNOLOGY, 2024,
  • [42] Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning
    Shresthamali, Shaswot
    Kondo, Masaaki
    Nakamura, Hiroshi
    [J]. JOURNAL OF LOW POWER ELECTRONICS AND APPLICATIONS, 2022, 12 (04)
  • [43] MULTI-OBJECTIVE SBRT TREATMENT PLANNING USING PARETO
    Potrebko, P.
    Fiege, J.
    McCurdy, B.
    Champion, H.
    Cull, A.
    [J]. RADIOTHERAPY AND ONCOLOGY, 2011, 99 : S161 - S161
  • [44] Scalable Pareto Front Approximation for Deep Multi-Objective Learning
    Ruchte, Michael
    Grabocka, Josif
    [J]. 2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1306 - 1311
  • [45] Distributed Pareto Reinforcement Learning for Multi-objective Smart Generation Control of Multi-area Interconnected Power Systems
    Yin, Linfei
    Cao, Xinghui
    Sun, Zhixiang
    [J]. JOURNAL OF ELECTRICAL ENGINEERING & TECHNOLOGY, 2022, 17 (05) : 3031 - 3044
  • [47] Multi-Objective Dynamic Dispatch Optimisation using Multi-Agent Reinforcement Learning
    Mannion, Patrick
    Mason, Karl
    Devlin, Sam
    Duggan, Jim
    Howley, Enda
    [J]. AAMAS'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS, 2016, : 1345 - 1346
  • [48] Self-Learning Multi-Objective Service Coordination Using Deep Reinforcement Learning
    Schneider, Stefan
    Khalili, Ramin
    Manzoor, Adnan
    Qarawlus, Haydar
    Schellenberg, Rafael
    Karl, Holger
    Hecker, Artur
    [J]. IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (03): : 3829 - 3842
  • [49] Additive Approximations of Pareto-Optimal Sets by Evolutionary Multi-Objective Algorithms
    Horoba, Christian
    Neumann, Frank
[J]. FOGA'09: PROCEEDINGS OF THE 10TH ACM SIGEVO CONFERENCE ON FOUNDATIONS OF GENETIC ALGORITHMS, 2009, : 79 - 86
  • [50] A Multi-objective Reinforcement Learning Algorithm for JSSP
    Mendez-Hernandez, Beatriz M.
    Rodriguez-Bazan, Erick D.
    Martinez-Jimenez, Yailen
    Libin, Pieter
    Nowe, Ann
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: THEORETICAL NEURAL COMPUTATION, PT I, 2019, 11727 : 567 - 584