Stochastic Temporal Difference Learning for Sequence Data

Cited by: 0
Authors
Chien, Jen-Tzung [1 ]
Chiu, Yi-Chung [1 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
Keywords
Sequential learning; recurrent neural network; reinforcement learning; variational auto-encoder; language model;
DOI
10.1109/IJCNN52387.2021.9534155
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Planning is crucial for training an agent via model-based reinforcement learning that can predict distant observations reflecting its past experience. Such a planning method is theoretically and computationally attractive compared with traditional learning, which relies on step-by-step prediction. However, it is more challenging to build a learning machine that can predict and plan stochastically across multiple time steps than one that acts step by step. To allow this flexibility in the learning process, future states need to be predicted directly, without traversing all intermediate states. Accordingly, this paper develops stochastic temporal difference learning, in which sequence data are represented with multiple jumpy states and the stochastic state-space model is learned by maximizing the evidence lower bound on the log likelihood of the training data. A general solution with a varying number of jumpy states is developed and formulated. Experiments demonstrate the merit of the proposed sequential machine in finding predictive states to roll forward with jumps as well as in predicting words.
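To make the "jumpy" idea in the abstract concrete, the following is a minimal, hypothetical sketch, not the authors' model: a 1-D linear-Gaussian state-space toy in which the k-step transition prior is composed in closed form, so the evidence lower bound (reconstruction minus KL) for a distant observation can be evaluated directly without rolling through the k-1 intermediate states. All names, distributions, and parameter values here are illustrative assumptions.

```python
import math

# Hypothetical 1-D linear-Gaussian state-space model (illustrative only):
#   jumpy k-step prior:  z_{t+k} ~ N(a**k * z_t, sigma_p**2)
#   approx. posterior:   z_{t+k} ~ N(mu_q, sigma_q**2)   (stands in for an encoder)
#   likelihood:          x_{t+k} ~ N(z_{t+k}, sigma_x**2)

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2))."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2) - 0.5)

def gaussian_log_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def jumpy_elbo(x_future, z_now, k, a=0.9, sigma_p=1.0, sigma_x=0.5, sigma_q=0.8):
    """One-sample ELBO for predicting x_{t+k} directly from z_t,
    skipping the k-1 intermediate states (a crude one-point
    approximation of E_q[log p(x|z)] at the posterior mean)."""
    mu_p = (a ** k) * z_now      # jumpy k-step prior mean, composed in one shot
    mu_q = mu_p                  # toy "encoder": centre the posterior on the prior
    recon = gaussian_log_pdf(x_future, mu_q, sigma_x)
    kl = gaussian_kl(mu_q, sigma_q, mu_p, sigma_p)
    return recon - kl

elbo = jumpy_elbo(x_future=0.6, z_now=1.0, k=3)
```

In the actual paper the transition, inference, and observation models are parameterized by recurrent networks and the number of jumpy states is itself variable; the point of the sketch is only that a k-step prior lets the bound be evaluated for a distant target in a single step.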
Pages: 6