Stochastic Temporal Difference Learning for Sequence Data

Cited by: 0
Authors
Chien, Jen-Tzung [1 ]
Chiu, Yi-Chung [1 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Sequential learning; recurrent neural network; reinforcement learning; variational auto-encoder; language model;
DOI
10.1109/IJCNN52387.2021.9534155
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Planning is crucial for training an agent via model-based reinforcement learning that can predict distant observations reflecting its past experience. Such a planning method is theoretically and computationally attractive compared with traditional learning, which relies on step-by-step prediction. However, it is more challenging to build a learning machine that can predict and plan stochastically across multiple time steps than one that acts step by step. To allow this flexibility in the learning process, future states need to be predicted directly, without traversing all intermediate states. Accordingly, this paper develops stochastic temporal difference learning, in which sequence data are represented with multiple jumpy states and the stochastic state-space model is learned by maximizing the evidence lower bound on the log likelihood of the training data. A general solution with a varying number of jumpy states is developed and formulated. Experiments demonstrate the merit of the proposed sequential machine in finding predictive states to roll forward with jumps as well as in predicting words.
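To make the "jumpy" idea in the abstract concrete, the following is a minimal, hypothetical sketch, not the authors' model: a 1-D linear-Gaussian state-space toy in which the k-step transition prior is composed in closed form, so the evidence lower bound (reconstruction minus KL) for a distant observation can be evaluated directly without rolling through the k-1 intermediate states. All names, distributions, and parameter values here are illustrative assumptions.

```python
import math

# Hypothetical 1-D linear-Gaussian state-space model (illustrative only):
#   jumpy k-step prior:  z_{t+k} ~ N(a**k * z_t, sigma_p**2)
#   approx. posterior:   z_{t+k} ~ N(mu_q, sigma_q**2)   (stands in for an encoder)
#   likelihood:          x_{t+k} ~ N(z_{t+k}, sigma_x**2)

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2))."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2) - 0.5)

def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def jumpy_elbo(x_future, z_now, k, a=0.9, sigma_p=1.0, sigma_x=0.5, sigma_q=0.8):
    """One-sample ELBO for predicting x_{t+k} directly from z_t,
    skipping the k-1 intermediate states (a crude one-point
    approximation of E_q[log p(x|z)] at the posterior mean)."""
    mu_p = (a ** k) * z_now      # jumpy k-step prior mean, composed in one shot
    mu_q = mu_p                  # toy "encoder": centre the posterior on the prior
    recon = gaussian_log_pdf(x_future, mu_q, sigma_x)
    kl = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    return recon - kl

elbo = jumpy_elbo(x_future=0.6, z_now=1.0, k=3)
```

In the actual paper the transition, inference, and observation models are parameterized by recurrent networks and the number of jumpy states is itself variable; the point of the sketch is only that a k-step prior lets the bound be evaluated for a distant target in a single step.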
Pages: 6