GENERAL PROOF OF CONVERGENCE OF THE NASH-Q-LEARNING ALGORITHM

Authors
Wang, Jun [1]
Cao, Lei [1]
Chen, Xiliang [1]
Lai, Jun [1]
Affiliations
[1] Army Engn Univ PLA, Command Control Engn Inst, Nanjing 211101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Nash-Q-Learning; Game Theory; Schauder; Fractals
DOI
10.1142/S0218348X2250027X
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
This paper studies the convergence of the Nash-Q-Learning algorithm. Previous convergence proofs require every stage game to have a global optimal point or a saddle point. This assumption is so strict that it leaves the algorithm few application scenarios. Yet the algorithm also converges in two Grid-World Games that do not satisfy the assumption, which led previous researchers to suggest that it could be appropriately relaxed, although no rigorous theoretical proof was given. Viewing the convergence point as a fractal attractor from the perspective of fractals, this paper gives a general mathematical proof of convergence of the Nash-Q-Learning algorithm. The efficiency and scalability of the algorithm are also discussed in detail.
Pages: 9