Improving Exploration in UCT Using Local Manifolds

Cited by: 0
Authors
Srinivasan, Sriram [1]
Talvitie, Erik [2]
Bowling, Michael [1]
Affiliations
[1] Univ Alberta, Edmonton, AB, Canada
[2] Franklin & Marshall Coll, Lancaster, PA 17604 USA
Keywords
SEARCH;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monte Carlo planning has proven successful in many sequential decision-making settings, but it suffers from poor exploration when rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. When the state space does not have a natural distance metric, we show how to learn a local manifold from the transition graph of states in the near future to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
Pages: 3386-3392
Page count: 7