Fast Stochastic Kalman Gradient Descent for Reinforcement Learning

Cited: 0
Authors
Totaro, Simone [1 ]
Jonsson, Anders [1 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona, Spain
Keywords
Non-stationary MDPs; Reinforcement Learning; Tracking;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
As we move towards real-world applications, there is an increasing need for scalable, online optimization algorithms capable of dealing with the non-stationarity of the real world. We revisit the problem of online policy evaluation in non-stationary deterministic MDPs through the lens of Kalman filtering. We introduce a randomized regularization technique called Stochastic Kalman Gradient Descent (SKGD) that, combined with a low-rank update, generates a sequence of feasible iterates. SKGD is suitable for large-scale optimization of non-linear function approximators. We evaluate the performance of SKGD in two controlled experiments, and in one real-world application of microgrid control. In our experiments, SKGD is more robust to drift in the transition dynamics than state-of-the-art reinforcement learning algorithms, and the resulting policies are smoother.
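The core idea the abstract describes, scaling updates by a Kalman gain so the estimator keeps tracking a drifting target, can be illustrated with a minimal sketch. This is a generic Kalman-filter update for linear online value estimation, not the authors' SKGD algorithm (which adds randomized regularization and a low-rank update for non-linear approximators); the function name and noise parameters here are assumptions for illustration.

```python
import numpy as np

def kalman_gradient_step(theta, P, phi, target, obs_var=0.01, drift_var=1e-3):
    """One Kalman-filter update for a linear model y = phi @ theta.

    The gain K acts as a per-parameter, uncertainty-aware step size;
    the process-noise inflation (drift_var) keeps the filter responsive
    to non-stationary targets instead of freezing as data accumulates.
    Parameter names are hypothetical, not taken from the paper.
    """
    P = P + drift_var * np.eye(len(theta))  # inflate covariance: allow drift
    innovation = target - phi @ theta       # prediction error
    S = phi @ P @ phi + obs_var             # innovation variance (scalar)
    K = P @ phi / S                         # Kalman gain
    theta = theta + K * innovation          # gain-scaled update step
    P = P - np.outer(K, phi @ P)            # shrink uncertainty along phi
    return theta, P

# Usage: recover a fixed linear target from noisy observations.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
theta, P = np.zeros(3), np.eye(3)
for _ in range(500):
    phi = rng.normal(size=3)
    y = phi @ true_w + 0.1 * rng.normal()
    theta, P = kalman_gradient_step(theta, P, phi, y)
```

Unlike a fixed learning rate, the gain shrinks along directions the filter has already seen many times and stays large along uncertain ones, which is what makes Kalman-style updates attractive for tracking under drift.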
Pages: 12
Related Papers
50 items total
  • [41] Byzantine Stochastic Gradient Descent
    Alistarh, Dan
    Allen-Zhu, Zeyuan
    Li, Jerry
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [42] Ensemble of fast learning stochastic gradient boosting
    Li, Bin
    Yu, Qingzhao
    Peng, Lu
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (01) : 40 - 52
  • [43] Total stochastic gradient algorithms and applications in reinforcement learning
    Parmas, Paavo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [44] A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning
    Pham, Nhan H.
    Nguyen, Lam M.
    Phan, Dzung T.
    Phuong Ha Nguyen
    van Dijk, Marten
    Tran-Dinh, Quoc
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 374 - 384
  • [45] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [46] MABSearch: The Bandit Way of Learning the Learning Rate—A Harmony Between Reinforcement Learning and Gradient Descent
    A. S. Syed Shahul Hameed
    Narendran Rajagopalan
    National Academy Science Letters, 2024, 47 : 29 - 34
  • [47] Policy gradient reinforcement learning for fast quadrupedal locomotion
    Kohl, N
    Stone, P
    2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1- 5, PROCEEDINGS, 2004, : 2619 - 2624
  • [48] Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling
    Lee, Sokbae
    Liao, Yuan
    Seo, Myung Hwan
    Shin, Youngki
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7381 - 7389
  • [49] Fast calculation of a cylindrical hologram by a preloaded stochastic gradient descent with skip connection
    Wu, Zhanghao
    Wang, Jun
    Cheng, Chuhang
    Wang, Jiabao
    Zhou, Jie
    Yan, Hua
    Chen, Chun
    OPTICS EXPRESS, 2024, 32 (18): : 30990 - 31005
  • [50] Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines
    Liu, Siting
    Jiang, Honglan
    Liu, Leibo
    Han, Jie
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2018, 37 (11) : 2530 - 2541