A generalization error for Q-learning

Cited by: 0
Authors
Murphy, SA [1 ]
Institution
[1] Univ Michigan, Dept Stat, Ann Arbor, MI 48109 USA
Keywords
multistage decisions; dynamic programming; reinforcement learning; batch data;
DOI
Not available
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Planning problems that involve learning a policy from a single training set of finite horizon trajectories arise in both social science and medical fields. We consider Q-learning with function approximation for this setting and derive an upper bound on the generalization error. This upper bound is in terms of quantities minimized by a Q-learning algorithm, the complexity of the approximation space and an approximation term due to the mismatch between Q-learning and the goal of learning a policy that maximizes the value function.
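The abstract describes fitting Q-functions by function approximation from a single batch of finite-horizon trajectories. The sketch below is a minimal, hypothetical illustration of that setting (not Murphy's exact estimator): one linear Q-function per decision stage is fit backward in time by regressing on the sampled Bellman targets, and the learned policy acts greedily with respect to the fitted Q-functions. All function and parameter names are invented for this example.

```python
import numpy as np

def batch_q_learning(trajectories, n_actions, dim, horizon, n_iters=200, lr=0.05):
    """Fit one linear Q-function per stage from a batch of finite-horizon
    trajectories (illustrative sketch only). Each trajectory is a list of
    (state, action, reward) tuples of length `horizon`, where states are
    feature vectors of size `dim`."""
    # One weight matrix per stage: Q_t(s, a) = w[t, a] @ s
    w = np.zeros((horizon, n_actions, dim))
    # Backward induction: the last stage regresses on the reward alone;
    # earlier stages regress on reward + max_a Q_{t+1}(s', a).
    for t in reversed(range(horizon)):
        for _ in range(n_iters):
            for traj in trajectories:
                s, a, r = traj[t]
                if t + 1 < horizon:
                    s_next = traj[t + 1][0]
                    target = r + np.max(w[t + 1] @ s_next)
                else:
                    target = r
                # Gradient step on the squared Bellman error of this sample
                pred = w[t, a] @ s
                w[t, a] += lr * (target - pred) * s
    return w

def greedy_policy(w, t, s):
    """Action maximizing the stage-t fitted Q-function."""
    return int(np.argmax(w[t] @ s))
```

The generalization bound in the paper relates the value of this greedy policy to the quantities such a procedure minimizes, plus the complexity of the approximation space and a mismatch (approximation) term.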
Pages: 1073-1097
Page count: 25
Related Papers
50 records
  • [1] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    [J]. FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996, : 2208 - 2214
  • [2] Internally Driven Q-Learning: Convergence and Generalization Results
    Alonso, Eduardo
    Mondragon, Esther
    Kjaell-Ohlsson, Niclas
    [J]. ICAART: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2012, : 491 - 494
  • [3] The Mean-Squared Error of Double Q-Learning
    Weng, Wentao
    Gupta, Harsh
    He, Niao
    Ying, Lei
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents
    Hatcho, Yasuyo
    Hattori, Kiyohiko
    Takadama, Keiki
    [J]. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2009, 13 (06) : 667 - 674
  • [5] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
    [J]. MACHINE LEARNING, 1992, 8 (3-4) : 279 - 292
  • [6] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [7] An Error-Sensitive Q-learning Approach for Robot Navigation
    Tang, Rongkuan
    Yuan, Hongliang
    [J]. 2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 5835 - 5840
  • [8] Error bounds for constant step-size Q-learning
    Beck, C. L.
    Srikant, R.
    [J]. SYSTEMS & CONTROL LETTERS, 2012, 61 (12) : 1203 - 1208
  • [9] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [10] Symmetric Q-learning: Reducing Skewness of Bellman Error in Online Reinforcement Learning
    Omura, Motoki
    Osa, Takayuki
    Mukuta, Yusuke
    Harada, Tatsuya
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14474 - 14481