Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited: 3
Authors
Reymond, Mathieu [1 ]
Hayes, Conor F. [2 ]
Steckelmacher, Denis [1 ]
Roijers, Diederik M. [1 ,3 ]
Nowé, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; Sets
DOI
10.1007/s10458-023-09604-x
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
We propose a novel multi-objective reinforcement learning algorithm that learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and of the solution concept. Our key insight is that a critic which learns a multivariate distribution over the returns, combined with the rewards accumulated so far, allows us to optimize the utility function directly, even when it is non-linear. This vastly increases the range of solvable problems compared to single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
Pages: 30
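
The mechanism described in the abstract can be illustrated concretely: under the expected scalarized return (ESR) criterion named in the keywords, the non-linear utility u is applied to the sum of the rewards accrued so far and a sample of future returns drawn from the critic's multivariate return distribution, and the expectation is taken afterwards. The sketch below is a minimal Monte-Carlo illustration of that evaluation step, not the authors' implementation; the utility u(r) = r1 * r2, the function name `expected_utility`, and the stand-in `return_samples` are illustrative assumptions.

```python
import numpy as np

def utility(returns: np.ndarray) -> np.ndarray:
    # Illustrative non-linear utility over a 2-objective return vector:
    # u(r) = r1 * r2. Non-linearity is what defeats linear scalarization.
    return returns[..., 0] * returns[..., 1]

def expected_utility(acc_reward: np.ndarray,
                     return_samples: np.ndarray,
                     discount: float) -> float:
    # Monte-Carlo estimate of E[u(R_acc + gamma^t * Z)]: the utility is
    # applied *inside* the expectation (the ESR criterion), to the accrued
    # reward plus each sampled future return from the critic's distribution.
    totals = acc_reward + discount * return_samples  # shape: (n_samples, 2)
    return float(utility(totals).mean())

# Toy usage: rewards accrued so far, plus samples standing in for the
# critic's learned multivariate return distribution.
rng = np.random.default_rng(0)
acc = np.array([1.0, 0.5])
samples = rng.normal(loc=[2.0, 1.0], scale=0.3, size=(1000, 2))
print(expected_utility(acc, samples, discount=0.9))
```

Because u is applied before the expectation, two policies with the same expected vector return can yield different expected utilities; this is why the abstract's distributional critic over returns is needed rather than an expected-value critic.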