Policy Space Diversity for Non-Transitive Games

Citations: 0
Authors
Yao, Jian [1 ]
Liu, Weiming [1 ]
Fu, Haobo [1 ]
Yang, Yaodong [2 ]
McAleer, Stephen [3 ]
Fu, Qiang [1 ]
Yang, Wei [1 ]
Affiliations
[1] Tencent Lab, Shenzhen, People's Republic of China
[2] Peking University, Beijing, People's Republic of China
[3] Carnegie Mellon University, Pittsburgh, PA, USA
Keywords
LEVEL;
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Policy-Space Response Oracles (PSRO) is an influential algorithmic framework for approximating a Nash equilibrium (NE) in multi-agent non-transitive games. Many previous studies have sought to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a population that is more diverse according to those metrics does not necessarily yield (as we prove in the paper) a better approximation to an NE. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to an NE. We also develop a practical, well-justified method for optimizing our diversity metric using only state-action samples. By incorporating this diversity regularization into the best-response solving step of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO produces significantly less exploitable policies than state-of-the-art PSRO variants. The experiment code is available at https://github.com/nigelyaoj/policy-space-diversity-psro.
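The PSRO loop the abstract describes — solve the meta-game restricted to the current population, then add a best response regularized by a diversity term — can be illustrated on a toy matrix game. Everything below is a minimal sketch, not the paper's method: the rock-paper-scissors payoffs, the multiplicative-weights meta-solver, the pure-strategy oracle, and the minimum-distance `diversity_bonus` are all illustrative stand-ins for the paper's actual diversity metric and training procedure.

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (symmetric zero-sum, non-transitive).
RPS = np.array([[ 0., -1.,  1.],
                [ 1.,  0., -1.],
                [-1.,  1.,  0.]])

def meta_nash(payoffs, iters=2000, eta=0.05):
    """Approximate an NE of the restricted meta-game (antisymmetric payoffs)
    by multiplicative-weights self-play, returning the time-averaged strategy."""
    n = len(payoffs)
    logits = np.zeros(n)
    avg = np.zeros(n)
    for _ in range(iters):
        x = np.exp(logits - logits.max())
        x /= x.sum()
        avg += x
        logits += eta * (payoffs @ x)  # up-weight actions that do well vs. x
    return avg / iters

def diversity_bonus(candidate, population):
    """Stand-in diversity term: minimum L2 distance to the existing population."""
    return min(np.linalg.norm(candidate - p) for p in population)

def psd_psro(game, n_iters=5, lam=0.1):
    """PSRO sketch with a diversity-regularized best-response oracle."""
    n = game.shape[0]
    pures = list(np.eye(n))              # candidate oracle set: pure strategies
    population = [pures[0]]              # seed the population
    for _ in range(n_iters):
        # 1) Solve the meta-game restricted to the current population.
        restricted = np.array([[p @ game @ q for q in population]
                               for p in population])
        sigma = meta_nash(restricted)
        opponent = sum(w * p for w, p in zip(sigma, population))
        # 2) Oracle: payoff against the meta-strategy plus a diversity bonus.
        best = max(pures, key=lambda c: float(c @ game @ opponent)
                   + lam * diversity_bonus(c, population))
        population.append(best)
    # Final meta-solve; exploitability of the resulting population mixture.
    restricted = np.array([[p @ game @ q for q in population]
                           for p in population])
    sigma = meta_nash(restricted)
    mix = sum(w * p for w, p in zip(sigma, population))
    exploitability = float(np.max(game @ mix))
    return mix, exploitability
```

On this toy game the diversity bonus steers the oracle toward unseen pure strategies, so the population quickly spans all three actions and the final mixture approaches the uniform NE with low exploitability.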
Pages: 23