The impact of environmental stochasticity on value-based multiobjective reinforcement learning

Cited: 10
Authors
Vamplew, Peter [1 ]
Foale, Cameron [1 ]
Dazeley, Richard [2 ]
Affiliations
[1] Federat Univ, Ballarat, Vic, Australia
[2] Deakin Univ, Geelong, Vic, Australia
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Vol. 34, No. 3
Keywords
Multiobjective reinforcement learning; Multiobjective MDPs; Stochastic MDPs;
DOI
10.1007/s00521-021-05859-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A common approach to addressing multiobjective problems with reinforcement learning is to extend model-free, value-based algorithms such as Q-learning to use a vector of Q-values in combination with an appropriate action selection mechanism, often based on scalarisation. Most prior empirical evaluation of these approaches has focused on deterministic environments. This study examines the impact of stochasticity in rewards and state transitions on the behaviour of multiobjective Q-learning. It shows that the nature of the optimal solution depends on these environmental characteristics, and also on whether we aim to maximise the Expected Scalarised Return (ESR) or the Scalarised Expected Return (SER). We also identify a novel aim, which may arise in some applications, of maximising SER subject to constraints on the variation in return, and show that this may require different solutions than ESR or conventional SER. The analysis of the interaction between environmental stochasticity and multiobjective Q-learning is supported by empirical evaluations on several simple multiobjective Markov Decision Processes with varying characteristics. This includes a demonstration of a novel approach to learning deterministic SER-optimal policies for environments with stochastic rewards. In addition, we report a previously unidentified issue with model-free, value-based approaches to multiobjective reinforcement learning in the context of environments with stochastic state transitions. Having highlighted the limitations of value-based model-free MORL methods, we discuss several alternative methods that may be more suitable for maximising SER in MOMDPs with stochastic transitions.
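The two ideas central to the abstract can be illustrated in a minimal sketch (not the paper's implementation; all names and parameters here are illustrative assumptions): a tabular Q-learner that stores a vector of Q-values per state-action pair and selects actions by linear scalarisation, and a small numeric example of how, under a nonlinear scalarisation, the Expected Scalarised Return E[f(R)] and the Scalarised Expected Return f(E[R]) diverge in a stochastic environment.

```python
import numpy as np

# Vector-valued Q-table: one Q-vector (one entry per objective) for each
# state-action pair. Sizes and hyperparameters are illustrative.
n_states, n_actions, n_objectives = 2, 2, 2
Q = np.zeros((n_states, n_actions, n_objectives))
w = np.array([0.5, 0.5])          # linear scalarisation weights (assumed)
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def select_action(s):
    """Epsilon-greedy over the scalarised Q-values w . Q[s, a]."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s] @ w))

def update(s, a, r_vec, s_next, done):
    """Standard Q-learning update applied component-wise to the Q-vector,
    bootstrapping from the greedy next action under the scalarised values."""
    a_next = int(np.argmax(Q[s_next] @ w))
    target = r_vec + (0.0 if done else gamma) * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

# ESR vs SER: with a nonlinear scalarisation f, E[f(R)] != f(E[R]).
# Toy policy yielding return (3, 0) or (0, 3) with equal probability:
f = lambda v: min(v)                            # worst-objective utility
outcomes = np.array([[3.0, 0.0], [0.0, 3.0]])
esr = np.mean([f(v) for v in outcomes])         # E[f(R)]  -> 0.0
ser = f(outcomes.mean(axis=0))                  # f(E[R])  -> 1.5
```

The gap between `esr` and `ser` mirrors the abstract's point: which quantity an agent should maximise is a modelling choice, and a policy optimal for one need not be optimal for the other once rewards or transitions are stochastic.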
Pages: 1783-1799
Number of pages: 17
Related papers
50 records in total
  • [1] The impact of environmental stochasticity on value-based multiobjective reinforcement learning
    Peter Vamplew
    Cameron Foale
    Richard Dazeley
    [J]. Neural Computing and Applications, 2022, 34 : 1783 - 1799
  • [2] Reinforcement Learning for value-based Placement of Fog Services
    Poltronieri, Filippo
    Tortonesi, Mauro
    Stefanelli, Cesare
    Suri, Niranjan
    [J]. 2021 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM 2021), 2021, : 466 - 472
  • [3] A reinforcement learning diffusion decision model for value-based decisions
    Laura Fontanesi
    Sebastian Gluth
    Mikhail S. Spektor
    Jörg Rieskamp
    [J]. Psychonomic Bulletin & Review, 2019, 26 : 1099 - 1121
  • [4] A reinforcement learning diffusion decision model for value-based decisions
    Fontanesi, Laura
    Gluth, Sebastian
    Spektor, Mikhail S.
    Rieskamp, Joerg
    [J]. PSYCHONOMIC BULLETIN & REVIEW, 2019, 26 (04) : 1099 - 1121
  • [5] Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning
    Byeon, Haewon
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (08) : 348 - 354
  • [6] Sparse distributed memories for on-line value-based reinforcement learning
    Ratitch, B
    Precup, D
    [J]. MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 : 347 - 358
  • [7] MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control
    Zang, Xinshi
    Yao, Huaxiu
    Zheng, Guanjie
    Xu, Nan
    Xu, Kai
    Li, Zhenhui
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 1153 - 1160
  • [8] Value-based deep reinforcement learning for adaptive isolated intersection signal control
    Wan, Chia-Hao
    Hwang, Ming-Chorng
    [J]. IET INTELLIGENT TRANSPORT SYSTEMS, 2018, 12 (09) : 1005 - 1010
  • [9] Robust Multiobjective Reinforcement Learning Considering Environmental Uncertainties
    He, Xiangkun
    Hao, Jianye
    Chen, Xu
    Wang, Jun
    Ji, Xuewu
    Lv, Chen
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [10] Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms
    Xu, Tengyu
    Liang, Yingbin
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130