Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Times Cited: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;
DOI
10.1016/j.asoc.2024.111305
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments is a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated, yet the existing literature lacks a framework that proactively exploits such predictions to improve reward optimization. This paper introduces a methodology that preemptively leverages these predictions to maximize overall achieved performance. It does so by constructing a weighted mixture policy from the optimal policies of both the prevailing and the forthcoming models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy; this theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference tracking scenario with piecewise-constant references. On the cart-pole position control problem, the proposed algorithm is shown to surpass prior techniques such as context Q-learning and RL with context detection in non-stationary environments. It also outperforms applying the individual optimal policy of each observed environment model (i.e., policies that do not use predictions).
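The abstract describes blending the optimal policies of the prevailing and forthcoming environment models into a single weighted mixture policy. The following is a minimal illustrative sketch of that idea only: the per-state action-probability representation, the `mixing_weight` schedule, and the linear ramp toward the predicted switch time are assumptions made here for illustration and are not taken from the paper, which additionally derives an adaptive learning rate to guarantee monotonic improvement during training.

```python
# Minimal sketch of a weighted mixture policy ahead of a predicted model switch.
# Assumptions (not from the paper): discrete actions, policies given as
# action-probability vectors, and a simple linear ramp of the mixing weight
# over a fixed horizon before the predicted switch time.
import numpy as np

def mixture_policy(pi_current, pi_next, weight):
    """Blend two action-probability vectors; weight in [0, 1]."""
    pi_mix = (1.0 - weight) * pi_current + weight * pi_next
    return pi_mix / pi_mix.sum()  # renormalize against rounding error

def mixing_weight(t, t_switch, horizon):
    """Hypothetical schedule: ramp from 0 to 1 over `horizon` steps before t_switch."""
    return float(np.clip(1.0 - (t_switch - t) / horizon, 0.0, 1.0))

# Example: two policies over 3 actions, with a model change predicted at t = 100.
pi_cur = np.array([0.8, 0.1, 0.1])   # optimal for the prevailing model
pi_nxt = np.array([0.1, 0.1, 0.8])   # optimal for the forthcoming model
for t in (80, 90, 99):
    w = mixing_weight(t, t_switch=100, horizon=20)
    print(t, mixture_policy(pi_cur, pi_nxt, w))
```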
Pages: 16
Related Papers
50 records in total
  • [11] Context-Aware Safe Reinforcement Learning for Non-Stationary Environments
    Chen, Baiming
    Liu, Zuxin
    Zhu, Jiacheng
    Xu, Mengdi
    Ding, Wenhao
    Li, Liang
    Zhao, Ding
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 10689 - 10695
  • [12] Continual Reinforcement Learning in 3D Non-stationary Environments
    Lomonaco, Vincenzo
    Desai, Karan
    Culurciello, Eugenio
    Maltoni, Davide
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 999 - 1008
  • [13] Weighted Linear Bandits for Non-Stationary Environments
    Russac, Yoan
    Vernade, Claire
    Cappé, Olivier
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [14] The complexity of non-stationary reinforcement learning
    Peng, Binghui
    Papadimitriou, Christos
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [15] Structure Learning-Based Task Decomposition for Reinforcement Learning in Non-stationary Environments
    Woo, Honguk
    Yoo, Gwangpyo
    Yoo, Minjong
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8657 - 8665
  • [16] Social Learning in non-stationary environments
    Boursier, Etienne
    Perchet, Vianney
    Scarsini, Marco
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [17] Weighted Gaussian Process Bandits for Non-stationary Environments
    Deng, Yuntian
    Zhou, Xingyu
    Kim, Baekjin
    Tewari, Ambuj
    Gupta, Abhishek
    Shroff, Ness
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [18] Tempo Adaptation in Non-stationary Reinforcement Learning
    Lee, Hyunin
    Ding, Yuhao
    Lee, Jongmin
    Jin, Ming
    Lavaei, Javad
    Sojoudi, Somayeh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [19] Factored Adaptation for Non-stationary Reinforcement Learning
    Feng, Fan
    Huang, Biwei
    Zhang, Kun
    Magliacane, Sara
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [20] Learning User Preferences in Non-Stationary Environments
    Huleihel, Wasim
    Pal, Soumyabrata
    Shayevitz, Ofer
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130