GENERAL PROOF OF CONVERGENCE OF THE NASH-Q-LEARNING ALGORITHM

Cited by: 0
Authors
Wang, Jun [1]
Cao, Lei [1]
Chen, Xiliang [1]
Lai, Jun [1]
Affiliations
[1] Army Engn Univ PLA, Command Control Engn Inst, Nanjing 211101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Nash-Q-Learning; Game Theory; Schauder; Fractals
DOI
10.1142/S0218348X2250027X
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
This paper studies the convergence of the Nash-Q-Learning algorithm. Previous convergence proofs require every stage game to have a global optimal point or a saddle point. This assumption is so strict that it leaves few application scenarios for the algorithm. Yet the algorithm also converges in the two Grid-World Games, which do not satisfy these assumptions, and previous researchers therefore proposed that the assumptions could be appropriately relaxed, although no rigorous theoretical proof was given. Viewing the convergence point as a fractal attractor, this paper gives a general mathematical proof of convergence of the Nash-Q-Learning algorithm. The efficiency and scalability of the algorithm are also discussed in detail.
Pages: 9
Related papers
50 in total
  • [41] The New Second Order Sliding Mode Algorithm and Convergence Proof
    Kochetkov, Sergey
    Krasnova, Svetlana
    Rassadin, Yuriy
    Utkin, Victor
    2015 INTERNATIONAL WORKSHOP ON RECENT ADVANCES IN SLIDING MODES (RASM), 2015
  • [42] A proof of the convergence theorem of maximum-entropy clustering algorithm
    REN ShiJun & WANG YaDong School of Computers
    Science China(Information Sciences), 2010, 53 (06) : 1151 - 1158
  • [43] Rigorous proof of cubic convergence for the dqds algorithm for singular values
    Aishima, Kensuke
    Matsuo, Takayasu
    Murota, Kazuo
    JAPAN JOURNAL OF INDUSTRIAL AND APPLIED MATHEMATICS, 2008, 25 (01) : 65 - 81
  • [44] Rigorous proof of cubic convergence for the dqds algorithm for singular values
    Kensuke Aishima
    Takayasu Matsuo
    Kazuo Murota
    Japan Journal of Industrial and Applied Mathematics, 2008, 25
  • [45] Simple Proof of Convergence of the SMO Algorithm for Different SVM Variants
    Lopez, Jorge
    Dorronsoro, Jose R.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2012, 23 (07) : 1142 - 1147
  • [46] ON THE AS CONVERGENCE OF THE KOHONEN ALGORITHM WITH A GENERAL NEIGHBORHOOD FUNCTION
    Fort, Jean-Claude
    Pages, Gilles
    ANNALS OF APPLIED PROBABILITY, 1995, 5 (04) : 1177 - 1216
  • [47] General discussion on convergence of immune genetic algorithm
    Luo, X.-P. (luo_xiao_ping@sina.com), 2006, Zhejiang University (39):
  • [48] Convergence of optimistic and incremental Q-learning
    Even-Dar, E
    Mansour, Y
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1499 - 1506
  • [49] Machine Learning in Proof General: Interfacing Interfaces
    Komendantskaya, Ekaterina
    Heras, Jonathan
    Grov, Gudmund
    ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (118) : 15 - 41
  • [50] Fuzzy Q(λ)-Learning Algorithm
    Zajdel, Roman
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT I, 2010, 6113 : 256 - 263