Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents

Cited by: 1
Authors
Hatcho, Yasuyo [1 ]
Hattori, Kiyohiko [1 ]
Takadama, Keiki [1 ,2 ]
Affiliations
[1] Univ Electrocommun, 1-5-1 Chofugaoka, Chofu, Tokyo 1828585, Japan
[2] Japan Sci & Technol Agcy JST, PRESTO, Kawaguchi, Saitama 3320012, Japan
Keywords
generalization; time horizon; sequential interaction; reinforcement learning;
DOI
10.20965/jaciii.2009.p0667
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on generalization in reinforcement learning from the time-horizon viewpoint, exploring a method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge-timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations on the bargaining game, a sequential interaction game, revealed the following implications: (1) both the Q-table selection and merge-timing methods help replicate the subject experimental results without ad hoc parameter settings; and (2) agents using the proposed methods achieve such replication with smaller numbers of Q-tables.
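The abstract describes two mechanisms: selecting which Q-tables among many are candidates for generalization, and deciding when to merge the selected tables. The abstract does not specify the selection criterion or the merge operation, so the following is a minimal sketch under assumed choices: similarity-based selection (mean absolute difference below a threshold) and element-wise averaging as the merge. The names `q_table_distance`, `select_and_merge`, and the `threshold` parameter are hypothetical, not from the paper.

```python
import numpy as np

def q_table_distance(q1, q2):
    """Mean absolute difference between two Q-tables
    (an assumed similarity measure, not the paper's)."""
    return float(np.mean(np.abs(q1 - q2)))

def select_and_merge(q_tables, threshold=0.1):
    """Greedily merge Q-tables whose distance falls below `threshold`.

    Each unmerged table absorbs all remaining sufficiently similar
    tables by element-wise averaging, yielding a (possibly) smaller
    list of generalized Q-tables.
    """
    merged, used = [], set()
    for i in range(len(q_tables)):
        if i in used:
            continue
        current = q_tables[i]
        for j in range(i + 1, len(q_tables)):
            if j in used:
                continue
            if q_table_distance(current, q_tables[j]) < threshold:
                # Merge (generalize) the two tables into one.
                current = (current + q_tables[j]) / 2.0
                used.add(j)
        merged.append(current)
    return merged
```

Under this sketch, two near-identical tables collapse into one while a dissimilar table survives unchanged, which is the reduction in the number of Q-tables that the abstract reports.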
Pages: 667-674 (8 pages)