Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

Cited: 0
Authors
Totaro, Simone [1 ]
Jonsson, Anders [1 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Keywords
Non-stationary MDPs; Reinforcement Learning; Tracking
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Classification Code
0812
Abstract
As we move towards real-world applications, there is an increasing need for scalable online optimization algorithms that can cope with the non-stationarity of the real world. We revisit the problem of online policy evaluation in non-stationary deterministic MDPs through the lens of Kalman filtering. We introduce a randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) that, combined with a low-rank update, generates a sequence of feasible iterates. SKGD is suitable for large-scale optimization of non-linear function approximators. We evaluate the performance of SKGD in two controlled experiments and in one real-world application of microgrid control. In our experiments, SKGD is more robust to drift in the transition dynamics than state-of-the-art reinforcement learning algorithms, and the resulting policies are smoother.
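The record itself contains no code, but the abstract's Kalman-filtering view of online policy evaluation can be illustrated concretely. Below is a minimal sketch, assuming a linear value function and a standard Kalman-style TD(0) recursion; it is not the paper's SKGD (the stochastic regularization and low-rank update that define SKGD are omitted), and all names, dimensions, and noise settings are illustrative assumptions.

import numpy as np

# Minimal sketch (assumed, not from the paper): Kalman-style online TD(0)
# for a linear value function V(s) = phi(s) @ theta. The process-noise
# term Q models drift in a non-stationary MDP.

class KalmanTD:
    def __init__(self, d, gamma=0.99, q=1e-4, r_var=1.0):
        self.theta = np.zeros(d)      # value-function weights
        self.P = np.eye(d)            # parameter covariance estimate
        self.Q = q * np.eye(d)        # process noise (drift model)
        self.gamma = gamma            # discount factor
        self.r_var = r_var            # observation-noise variance

    def update(self, phi, phi_next, reward):
        # Predict step: inflate covariance so the filter keeps tracking.
        self.P += self.Q
        # TD(0) treats reward + gamma * V(s') as a noisy observation of
        # the scalar phi @ theta (a semi-gradient target).
        target = reward + self.gamma * (phi_next @ self.theta)
        residual = target - phi @ self.theta
        # Kalman gain for a scalar observation.
        s = phi @ self.P @ phi + self.r_var
        k = self.P @ phi / s
        self.theta += k * residual
        self.P -= np.outer(k, phi @ self.P)

# Toy usage on a stream of random-feature transitions.
rng = np.random.default_rng(0)
agent = KalmanTD(d=4)
for _ in range(1000):
    phi, phi_next = rng.normal(size=4), rng.normal(size=4)
    reward = phi.sum() + 0.1 * rng.normal()
    agent.update(phi, phi_next, reward)
print("learned weights:", agent.theta)

The process-noise term Q is what lets the filter keep adapting under drift: with Q = 0 the covariance P shrinks monotonically and the recursion degenerates to recursive least-squares, which eventually stops tracking a moving target, whereas Q > 0 keeps the effective step size bounded away from zero, matching the abstract's emphasis on non-stationary dynamics.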
Pages: 12
Related Papers
50 records in total
  • [21] Learning to learn by gradient descent by gradient descent
    Andrychowicz, Marcin
    Denil, Misha
    Colmenarejo, Sergio Gomez
    Hoffman, Matthew W.
    Pfau, David
    Schaul, Tom
    Shillingford, Brendan
    de Freitas, Nando
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [22] FORBID: Fast Overlap Removal by Stochastic GradIent Descent for Graph Drawing
    Giovannangeli, Loann
    Lalanne, Frederic
    Giot, Romain
    Bourqui, Romain
    GRAPH DRAWING AND NETWORK VISUALIZATION, GD 2022, 2023, 13764 : 61 - 76
  • [23] A fast non-monotone line search for stochastic gradient descent
    Fathi Hafshejani, Sajad
    Gaur, Daya
    Hossain, Shahadat
    Benkoczi, Robert
    OPTIMIZATION AND ENGINEERING, 2024, 25 (02) : 1105 - 1124
  • [25] Nonlinear Optimization Method Based on Stochastic Gradient Descent for Fast Convergence
    Watanabe, Takahiro
    Iima, Hitoshi
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 4198 - 4203
  • [26] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [27] Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
    Yang, Zhenhuan
    Lei, Yunwen
    Wang, Puyu
    Yang, Tianbao
    Ying, Yiming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [28] Convergence diagnostics for stochastic gradient descent with constant learning rate
    Chee, Jerry
    Toulis, Panos
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [29] Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
    Liu, Kangqiao
    Liu Ziyin
    Ueda, Masahito
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [30] Learning curves for stochastic gradient descent in linear feedforward networks
    Werfel, J
    Xie, XH
    Seung, HS
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 1197 - 1204