Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

Cited: 0
Authors
Totaro, Simone [1 ]
Jonsson, Anders [1 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Keywords
Non-stationary MDPs; Reinforcement Learning; Tracking;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
As we move towards real-world applications, there is an increasing need for scalable, online optimization algorithms capable of dealing with the non-stationarity of the real world. We revisit the problem of online policy evaluation in non-stationary deterministic MDPs through the lens of Kalman filtering. We introduce a randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) that, combined with a low-rank update, generates a sequence of feasible iterates. SKGD is suitable for large-scale optimization of non-linear function approximators. We evaluate the performance of SKGD in two controlled experiments, and in one real-world application of microgrid control. In our experiments, SKGD is more robust to drift in the transition dynamics than state-of-the-art reinforcement learning algorithms, and the resulting policies are smoother.
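The core idea the abstract describes, scaling updates by a Kalman gain so the estimator keeps tracking a drifting target, can be illustrated with a minimal sketch. This is a generic Kalman-filter update for linear online value estimation, not the authors' SKGD algorithm (which adds randomized regularization and a low-rank update for non-linear approximators); the function name and noise parameters here are assumptions for illustration.

```python
import numpy as np

def kalman_gradient_step(theta, P, phi, target, obs_var=0.01, drift_var=1e-3):
    """One Kalman-filter update for a linear model y = phi @ theta.

    The gain K acts as a per-parameter, uncertainty-aware step size;
    the process-noise inflation (drift_var) keeps the filter responsive
    to non-stationary targets instead of freezing as data accumulates.
    Parameter names are hypothetical, not taken from the paper.
    """
    P = P + drift_var * np.eye(len(theta))  # inflate covariance: allow drift
    innovation = target - phi @ theta       # prediction error
    S = phi @ P @ phi + obs_var             # innovation variance (scalar)
    K = P @ phi / S                         # Kalman gain
    theta = theta + K * innovation          # gain-scaled update step
    P = P - np.outer(K, phi @ P)            # shrink uncertainty along phi
    return theta, P

# Usage: recover a fixed linear target from noisy observations.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
theta, P = np.zeros(3), np.eye(3)
for _ in range(500):
    phi = rng.normal(size=3)
    y = phi @ true_w + 0.1 * rng.normal()
    theta, P = kalman_gradient_step(theta, P, phi, y)
```

Unlike a fixed learning rate, the gain shrinks along directions the filter has already seen many times and stays large along uncertain ones, which is what makes Kalman-style updates attractive for tracking under drift.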
Pages: 12
Related Papers
50 items total
  • [41] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [42] Ensemble of fast learning stochastic gradient boosting
    Li, Bin
    Yu, Qingzhao
    Peng, Lu
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (01) : 40 - 52
  • [43] Total stochastic gradient algorithms and applications in reinforcement learning
    Parmas, Paavo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [44] A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning
    Pham, Nhan H.
    Nguyen, Lam M.
    Phan, Dzung T.
    Phuong Ha Nguyen
    van Dijk, Marten
    Tran-Dinh, Quoc
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 374 - 384
  • [45] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [46] MABSearch: The Bandit Way of Learning the Learning Rate—A Harmony Between Reinforcement Learning and Gradient Descent
    A. S. Syed Shahul Hameed
    Narendran Rajagopalan
    National Academy Science Letters, 2024, 47 : 29 - 34
  • [47] Policy gradient reinforcement learning for fast quadrupedal locomotion
    Kohl, N
    Stone, P
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1- 5, PROCEEDINGS, 2004, : 2619 - 2624
  • [48] Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling
    Lee, Sokbae
    Liao, Yuan
    Seo, Myung Hwan
    Shin, Youngki
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7381 - 7389
  • [49] Fast calculation of a cylindrical hologram by a preloaded stochastic gradient descent with skip connection
    Wu, Zhanghao
    Wang, Jun
    Cheng, Chuhang
    Wang, Jiabao
    Zhou, Jie
    Yan, Hua
    Chen, Chun
    OPTICS EXPRESS, 2024, 32 (18): : 30990 - 31005
  • [50] Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines
    Liu, Siting
    Jiang, Honglan
    Liu, Leibo
    Han, Jie
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2530 - 2541