Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks

Cited by: 0

Authors:

Vamplew, Peter [1]

Dazeley, Richard [1]

Barker, Ewan [1]

Kelarev, Andrei [1]

Affiliation:

[1] Univ Ballarat, Grad Sch Informat Technol & Math Sci, Ballarat, Vic 3353, Australia

Source:

AI 2009: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS | 2009 / Vol. 5866

Keywords:

multiobjective; reinforcement learning; scalarisation; Pareto fronts;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a better match to the user's preferences than can be achieved by methods based strictly on deterministic policies.
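
The general idea of a mixture policy can be illustrated with a minimal Python sketch. It is not a reproduction of the paper's two methods, which the abstract does not detail. It assumes a mixture policy selects one deterministic base policy at the start of each episode and follows it for the whole episode, so the expected return vector is the probability-weighted average of the base policies' returns. The function names and the return vectors for base policies A and B are hypothetical values chosen for illustration.

```python
import random


def mixture_return(p, return_a, return_b):
    """Expected return vector when base policy A is run with probability p, else B."""
    return tuple(p * a + (1 - p) * b for a, b in zip(return_a, return_b))


def mixing_probability(target, value_a, value_b):
    """Probability of choosing A so the expected value on one objective equals target."""
    if value_a == value_b:
        raise ValueError("base policies do not differ on this objective")
    p = (target - value_b) / (value_a - value_b)
    if not 0.0 <= p <= 1.0:
        raise ValueError("target lies outside the range spanned by the base policies")
    return p


def choose_base_policy(p, rng=random):
    """Pick the base policy to follow for the whole of the next episode."""
    return "A" if rng.random() < p else "B"


# Hypothetical expected returns on two objectives for two deterministic base policies.
return_a = (10.0, 2.0)
return_b = (4.0, 8.0)

# The user wants an average of 7.0 on the first objective (an assumed preference).
p = mixing_probability(7.0, return_a[0], return_b[0])
expected = mixture_return(p, return_a, return_b)
print(p, expected)
```

Under these assumed values, neither base policy alone averages 7.0 on the first objective, but mixing the two over many episodes can reach that target in expectation.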

Pages: 340-349

Page count: 10

