Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks

Citations: 0
Authors
Vamplew, Peter [1]
Dazeley, Richard [1]
Barker, Ewan [1]
Kelarev, Andrei [1]
Affiliations
[1] Univ Ballarat, Grad Sch Informat Technol & Math Sci, Ballarat, Vic 3353, Australia
Keywords
multiobjective; reinforcement learning; scalarisation; Pareto fronts
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobjective tasks from deterministic base policies found via scalarised reinforcement learning. It is shown that these approaches are an efficient means of identifying solutions which offer a better match to the user's preferences than can be achieved by methods based strictly on deterministic policies.
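The abstract does not spell out the two proposed methods, so the following is only a generic sketch of the mixture-policy idea it names, not the authors' algorithm. It assumes that one of two deterministic base policies is chosen at the start of each episode and followed until the episode ends, so the expected return vector is a convex combination of the two base policies' returns. All function names and the example return values are hypothetical.

```python
import random


def mixture_weight(target, return_a, return_b):
    """Probability of picking base policy A so that the expected value of
    one objective equals `target` (assumed to lie between the two returns)."""
    if return_a == return_b:
        raise ValueError("base policies do not differ on this objective")
    p = (target - return_b) / (return_a - return_b)
    if not 0.0 <= p <= 1.0:
        raise ValueError("target lies outside the range spanned by the base policies")
    return p


def expected_return(p, returns_a, returns_b):
    """Expected return vector when A is chosen with probability p, B otherwise."""
    return tuple(p * a + (1.0 - p) * b for a, b in zip(returns_a, returns_b))


def select_base_policy(p, rng=random):
    """Draw the base policy to follow for one whole episode."""
    return "A" if rng.random() < p else "B"


# Hypothetical two-objective returns for two deterministic base policies.
returns_a = (10.0, 2.0)
returns_b = (4.0, 8.0)

# Aim for an expected value of 7.0 on the first objective.
p = mixture_weight(7.0, returns_a[0], returns_b[0])
mixed = expected_return(p, returns_a, returns_b)
```

Under these assumptions the mixture reaches trade-offs on the line segment between the two base policies' return vectors, which neither deterministic policy attains alone.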
Pages: 340-349
Page count: 10
Related papers
50 in total
  • [1] Maximizing the average reward in episodic reinforcement learning tasks
    Reinke, Chris
    Uchibe, Eiji
    Doya, Kenji
    2015 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2015, : 420 - 421
  • [2] Reinforcement Learning of Pareto-Optimal Multiobjective Policies Using Steering
    Vamplew, Peter
    Issabekov, Rustam
    Dazeley, Richard
    Foale, Cameron
    AI 2015: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2015, 9457 : 596 - 608
  • [3] Developing Cooperative Policies for Multi-Stage Reinforcement Learning Tasks
    Erskine, Jordan
    Lehnert, Christopher
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7 (03) : 6590 - 6597
  • [4] Behaviour-Conditioned Policies for Cooperative Reinforcement Learning Tasks
    Keurulainen, Antti
    Westerlund, Isak
    Kwiatkowski, Ariel
    Kaski, Samuel
    Ilin, Alexander
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 493 - 504
  • [5] NEFRL: A new neuro-fuzzy system for episodic reinforcement learning tasks
    Behsaz, Babak
    Safabakhsh, Reza
    PROCEEDINGS OF THE FRONTIERS IN THE CONVERGENCE OF BIOSCIENCE AND INFORMATION TECHNOLOGIES, 2007, : 819 - 824
  • [6] Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks
    Crespi, Marco
    Custode, Leonardo Lucio
    Iacca, Giovanni
    BIOINSPIRED OPTIMIZATION METHODS AND THEIR APPLICATIONS, 2022, 13627 : 262 - 276
  • [7] Stochastic Abstract Policies: Generalizing Knowledge to Improve Reinforcement Learning
    Koga, Marcelo L.
    Freire, Valdinei
    Costa, Anna H. R.
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (01) : 77 - 88
  • [8] Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning
    Huttenrauch, Maximilian
    Neumann, Gerhard
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 44
  • [9] Learning Options in Multiobjective Reinforcement Learning
    Bonini, Rodrigo Cesar
    da Silva, Felipe Leno
    Reali Costa, Anna Helena
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4907 - 4908
  • [10] A Twin Agent Reinforcement Learning Framework by Integrating Deterministic and Stochastic Policies
    Gupta, Nikita
    Anand, Shikhar
    Kumar, Deepak
    Ramteke, Manojkumar
    Kandath, Harikumar
    Kodamana, Hariprasad
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2024, 63 (24) : 10692 - 10703