Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 3

Authors:

Reymond, Mathieu [1]

Hayes, Conor F. [2]

Steckelmacher, Denis [1]

Roijers, Diederik M. [1,3]

Nowé, Ann [1]

Affiliations:

[1] Vrije Univ Brussel, Brussels, Belgium

[2] Univ Galway, Galway, Ireland

[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands

Source:

AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS | 2023 / Vol. 37 / Issue 02

Keywords:

Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS;

DOI:

10.1007/s10458-023-09604-x

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, in terms of both learning efficiency and the solution concept. Our key insight is that a critic which learns a multivariate distribution over the returns, combined with the accumulated rewards, lets us optimize the utility function directly, even if it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
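
To illustrate why a non-linear utility calls for a distribution over returns rather than only their mean, here is a minimal sketch. It is not the authors' implementation: the product utility, the two-atom return distribution, and all function names are hypothetical and chosen only for illustration. It compares the expected utility of the full return (accumulated rewards plus future returns) with the utility of the expected return.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def utility(v: Vector) -> float:
    # Hypothetical non-linear utility: product of two objectives.
    return v[0] * v[1]


def expected_utility(
    accumulated: Vector,
    atoms: Sequence[Vector],
    probs: Sequence[float],
    u: Callable[[Vector], float],
) -> float:
    # E[u(accumulated + Z)], where Z is a categorical distribution
    # over future vector returns (atoms with probabilities probs).
    total = 0.0
    for z, p in zip(atoms, probs):
        full_return = tuple(a + zi for a, zi in zip(accumulated, z))
        total += p * u(full_return)
    return total


def utility_of_expectation(
    accumulated: Vector,
    atoms: Sequence[Vector],
    probs: Sequence[float],
    u: Callable[[Vector], float],
) -> float:
    # u(accumulated + E[Z]): applies the utility to the mean return only.
    mean_return = tuple(
        a + sum(p * z[i] for z, p in zip(atoms, probs))
        for i, a in enumerate(accumulated)
    )
    return u(mean_return)


# Assumed toy values: rewards collected so far and two equally likely futures.
accumulated = (1.0, 1.0)
atoms = [(4.0, 0.0), (0.0, 4.0)]
probs = [0.5, 0.5]

esr = expected_utility(accumulated, atoms, probs, utility)
ser = utility_of_expectation(accumulated, atoms, probs, utility)
print(esr, ser)
```

With these toy values the two quantities differ, so a critic that stores only the mean return cannot recover the expected utility; the distribution and the accumulated rewards are both needed.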


Pages: 30
