Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 3
Authors
Reymond, Mathieu [1]
Hayes, Conor F. [2]
Steckelmacher, Denis [1]
Roijers, Diederik M. [1, 3]
Nowé, Ann [1]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS;
DOI
10.1007/s10458-023-09604-x
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and in terms of the solution concept. The key insight is that a critic that learns a multivariate distribution over the returns, combined with the rewards accumulated so far, lets us optimize the utility function directly, even if it is non-linear. This vastly increases the range of problems that can be solved compared with single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
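To illustrate why a distribution over returns matters for non-linear utilities, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical two-objective product utility and a hand-written categorical distribution over future returns (the names `utility`, `expected_utility`, `atoms`, and `probs` are illustrative, not taken from the paper). It compares the expected utility of the total return (accumulated rewards plus future return) with the utility of the expected return.

```python
import numpy as np


def utility(returns):
    """Hypothetical non-linear utility: product of the two objectives."""
    returns = np.asarray(returns, dtype=float)
    return returns[..., 0] * returns[..., 1]


def expected_utility(accumulated, atoms, probs):
    """E[u(accumulated + Z)] for a categorical distribution over the
    future vector return Z (assumed here, not learned by a critic)."""
    atoms = np.asarray(atoms, dtype=float)
    probs = np.asarray(probs, dtype=float)
    totals = np.asarray(accumulated, dtype=float) + atoms
    return float(np.sum(probs * utility(totals)))


# Rewards already collected in the episode (illustrative values).
accumulated = np.array([1.0, 1.0])

# Action A: future return is (4, 0) or (0, 4) with equal probability.
atoms_a, probs_a = [[4.0, 0.0], [0.0, 4.0]], [0.5, 0.5]
# Action B: future return is (1.5, 1.5) with certainty.
atoms_b, probs_b = [[1.5, 1.5]], [1.0]

# Expected utility of the total return, computed from the distribution.
esr_a = expected_utility(accumulated, atoms_a, probs_a)
esr_b = expected_utility(accumulated, atoms_b, probs_b)

# Utility of the expected return, which only needs the mean of Z.
mean_a = np.average(atoms_a, axis=0, weights=probs_a)
ser_a = float(utility(accumulated + mean_a))

print(esr_a, esr_b, ser_a)
```

Under these assumed numbers, the mean of action A's return looks better than action B's once passed through the utility, yet B has the higher expected utility. A critic that only predicts the mean return cannot expose this difference, which is the motivation for learning the distribution and adding the accumulated rewards before applying the utility.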
Pages: 30
Related papers
50 in total
  • [1] Actor-critic multi-objective reinforcement learning for non-linear utility functions
    Mathieu Reymond
    Conor F. Hayes
    Denis Steckelmacher
    Diederik M. Roijers
    Ann Nowé
    [J]. Autonomous Agents and Multi-Agent Systems, 2023, 37
  • [2] Multi-actor mechanism for actor-critic reinforcement learning
    Li, Lin
    Li, Yuze
    Wei, Wei
    Zhang, Yujia
    Liang, Jiye
    [J]. INFORMATION SCIENCES, 2023, 647
  • [3] A Prioritized objective actor-critic method for deep reinforcement learning
    Ngoc Duy Nguyen
    Thanh Thi Nguyen
    Peter Vamplew
    Richard Dazeley
    Saeid Nahavandi
    [J]. Neural Computing and Applications, 2021, 33 : 10335 - 10349
  • [4] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (16) : 10335 - 10349
  • [5] Bringing Fairness to Actor-Critic Reinforcement Learning for Network Utility Optimization
    Chen, Jingdi
    Wang, Yimeng
    Lan, Tian
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021), 2021,
  • [6] A heuristic multi-objective task scheduling framework for container-based clouds via actor-critic reinforcement learning
    Zhu, Lilu
    Wu, Feng
    Hu, Yanfeng
    Huang, Kai
    Tian, Xinmei
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (13) : 9687 - 9710
  • [7] A heuristic multi-objective task scheduling framework for container-based clouds via actor-critic reinforcement learning
    Lilu Zhu
    Feng Wu
    Yanfeng Hu
    Kai Huang
    Xinmei Tian
    [J]. Neural Computing and Applications, 2023, 35 : 9687 - 9710
  • [8] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [9] Actor-Critic based Improper Reinforcement Learning
    Zaki, Mohammadi
    Mohan, Avinash
    Gopalan, Aditya
    Mannor, Shie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [10] Curious Hierarchical Actor-Critic Reinforcement Learning
    Roeder, Frank
    Eppe, Manfred
    Nguyen, Phuong D. H.
    Wermter, Stefan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 408 - 419