Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

Cited by: 0
Authors
Totaro, Simone [1]
Jonsson, Anders [1]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Keywords
Non-stationary MDPs; Reinforcement Learning; Tracking;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
As we move towards real-world applications, there is an increasing need for scalable, online optimization algorithms capable of dealing with the non-stationarity of the real world. We revisit the problem of online policy evaluation in non-stationary deterministic MDPs through the lens of Kalman filtering. We introduce a randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) that, combined with a low-rank update, generates a sequence of feasible iterates. SKGD is suitable for large-scale optimization of non-linear function approximators. We evaluate the performance of SKGD in two controlled experiments and in one real-world application of microgrid control. In our experiments, SKGD is more robust to drift in the transition dynamics than state-of-the-art reinforcement learning algorithms, and the resulting policies are smoother.
Pages: 12
Related papers
50 in total
  • [31] Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
    Guo, Pengzhan
    Ye, Zeyang
    Xiao, Keli
    Zhu, Wei
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (10) : 5037 - 5050
  • [32] Learning curves for stochastic gradient descent in linear feedforward networks
    Werfel, J
    Xie, XH
    Seung, HS
    NEURAL COMPUTATION, 2005, 17 (12) : 2699 - 2718
  • [33] Stability and optimization error of stochastic gradient descent for pairwise learning
    Shen, Wei
    Yang, Zhenhuan
    Ying, Yiming
    Yuan, Xiaoming
    ANALYSIS AND APPLICATIONS, 2020, 18 (05) : 887 - 927
  • [34] Learning-to-Learn Stochastic Gradient Descent with Biased Regularization
    Denevi, Giulia
    Ciliberto, Carlo
    Grazzi, Riccardo
    Pontil, Massimiliano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [35] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    arXiv, 2022,
  • [36] A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces
    Le Lan, Charline
    Greaves, Joshua
    Farebrother, Jesse
    Rowland, Mark
    Pedregosa, Fabian
    Agarwal, Rishabh
    Bellemare, Marc
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [37] Unforgeability in Stochastic Gradient Descent
    Baluta, Teodora
    Nikolic, Ivica
    Jain, Racchit
    Aggarwal, Divesh
    Saxena, Prateek
    PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023, 2023, : 1138 - 1152
  • [38] Preconditioned Stochastic Gradient Descent
    Li, Xi-Lin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (05) : 1454 - 1466
  • [39] Stochastic Reweighted Gradient Descent
    El Hanchi, Ayoub
    Stephens, David A.
    Maddison, Chris J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [40] Stochastic gradient descent tricks
    Bottou, Léon
    Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2012, 7700 : 421 - 436