A Partially Observable Monte Carlo Planning Algorithm Based on Path Modification

Cited by: 0
Authors
Wang, Qingya [1]
Liu, Feng [1]
Luo, Bin [1]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Software Inst, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
POMDP; POMCP-PM; Value Updating;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Balancing exploration and exploitation has long been recognized as an important theme in online planning algorithms for POMDP problems. On one hand, exploratory actions keep the planner from getting stuck in a suboptimal solution; on the other hand, they slow the convergence of the planning procedure. It is therefore worthwhile to maintain exploration while shifting a step further towards exploitation. The action selection criterion used in the planning procedure differs from the one used in the execution procedure, which motivates bridging the two criteria to accelerate convergence. This paper presents a Partially Observable Monte Carlo Planning algorithm based on Path Modification (POMCP-PM), which modifies the backtracing paths by considering the two criteria simultaneously when updating the values of parent nodes. The algorithm is general, since the Upper Confidence Bounds applied to Trees (UCT) algorithm used to select actions can easily be replaced by other criteria. Experimental results demonstrate that POMCP-PM outperforms POMCP across varying numbers of simulations on several scenarios of different scales.
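The deviation between the two action selection criteria mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the POMCP-PM update itself, which the abstract does not specify; it only contrasts a UCB1-style planning criterion, as used by UCT, with a greedy execution criterion. The function names, the `stats` layout, the exploration constant `c`, and the action labels are all assumptions made for illustration.

```python
import math


def ucb1_action(stats, c=1.0):
    """Planning criterion (UCB1-style): mean value plus an exploration bonus.

    `stats` is a hypothetical layout: action -> (visit_count, mean_value).
    """
    total = sum(n for n, _ in stats.values())
    best, best_score = None, -math.inf
    for action, (n, q) in stats.items():
        if n == 0:
            return action  # an untried action is selected first
        score = q + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = action, score
    return best


def greedy_action(stats):
    """Execution criterion: the action with the highest mean value."""
    return max(stats, key=lambda a: stats[a][1])


# Hypothetical statistics at one node of the search tree.
stats = {"listen": (50, 1.0), "open_left": (2, 0.8)}

planning_choice = ucb1_action(stats)     # favors the rarely tried action
execution_choice = greedy_action(stats)  # favors the best current estimate
```

With these made-up statistics the two criteria disagree: the exploration bonus makes the rarely visited action more attractive during planning, while execution prefers the action with the higher estimated value. Setting `c=0` removes the bonus, and both criteria then pick the same action.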
Pages: 14
Related papers
50 in total
  • [41] Partially pruned DNN coupled with parallel Monte-Carlo algorithm for path loss prediction in underwater wireless optical channels
    Du, Zihao
    Ge, Wenmin
    Song, Guangbin
    Dai, Yizhan
    Zhang, Yufan
    Xiong, Jianmin
    Jia, Bowen
    Hua, Yan
    Ma, Dongfang
    Zhang, Zejun
    Xu, Jing
    OPTICS EXPRESS, 2022, 30 (08) : 12835 - 12847
  • [42] Entropy-Based Adaptive Exploit-Explore Coefficient for Monte-Carlo Path Planning
    Carmo, Ana Raquel
    Delamer, Jean-Alexis
    Watanabe, Yoko
    Ventura, Rodrigo
    Chanel, Caroline P. C.
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2964 - 2971
  • [43] Path shadowing Monte Carlo
    Morel, Rudy
    Mallat, Stephane
    Bouchaud, Jean-Philippe
    QUANTITATIVE FINANCE, 2024, 24 (09) : 1199 - 1225
  • [44] FLEET OPTIMIZATION BASED ON THE MONTE CARLO ALGORITHM
    Lampa, Martin
    Samolejova, Andrea
    ACTA LOGISTICA, 2020, 7 (01) : 17 - 21
  • [45] Monte Carlo supported commissioning of non-Monte Carlo based treatment planning systems
    Jeraj, R
    Olivera, G
    Reckwerdt, P
    Smilowitz, J
    Mackie, T
    MEDICAL PHYSICS, 2002, 29 (06) : 1289 - 1289
  • [46] Monte Carlo hidden Markov models: Learning non-parametric models of partially observable stochastic processes
    Thrun, S
    Langford, JC
    Fox, D
    MACHINE LEARNING, PROCEEDINGS, 1999, : 415 - 424
  • [47] Worm algorithm and diagrammatic Monte Carlo: A new approach to continuous-space path integral Monte Carlo simulations
    Boninsegni, M.
    Prokof'ev, N. V.
    Svistunov, B. V.
    PHYSICAL REVIEW E, 2006, 74 (03)
  • [48] A Monte Carlo hyper-heuristic algorithm with low-level heuristics reward prediction for missile path planning
    Xu, Shuangfei
    Huang, Zhanjun
    Bi, Wenhao
    Zhang, An
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (02):
  • [49] A Fast Monte Carlo Dose Algorithm for Radiotherapy Treatment Planning Based On Hybrid Adaptive Meshes
    Yuan, J.
    Brindle, J.
    Zheng, Y.
    Sohn, J.
    Geis, P.
    Yao, M.
    Lo, S.
    Wessels, B.
    MEDICAL PHYSICS, 2012, 39 (06) : 3596 - 3597
  • [50] A Stream Field Based Partially Observable Moving Object Tracking Algorithm
    Tseng, Kuo-Shih
    2008 10TH INTERNATIONAL CONFERENCE ON CONTROL AUTOMATION ROBOTICS & VISION: ICARV 2008, VOLS 1-4, 2008, : 1850 - 1856