Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

Cited by: 0
Authors
Da, Longchao [1]
Jenkins, Porter [2]
Schwantes, Trevor [2]
Dotson, Jeffrey [2]
Wei, Hua [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
[2] Brigham Young Univ, Provo, UT 84602 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In practice, it is essential to compare and rank candidate policies offline before real-world deployment for safety and reliability. Prior work seeks to solve this offline policy ranking (OPR) problem through value-based methods, such as off-policy evaluation (OPE). However, these methods cannot analyze special-case performance (e.g., worst or best cases) because they do not characterize a policy's performance holistically. Estimating precise policy values is even more difficult when rewards are sparse and not fully accessible. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework that addresses OPR problems by leveraging expert data to characterize the probability that a candidate policy behaves like the experts, and by approximating the policy's entire performance posterior distribution to support ranking. POPR does not rely on value estimation, and the derived performance posterior can be used to distinguish candidates in the worst, best, and average cases. To estimate the posterior, we propose POPR-EABC, an Energy-based Approximate Bayesian Computation (ABC) method that performs likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC by using a smooth energy function, and improves sampling efficiency by using a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across various experimental environments, and that it facilitates probabilistic comparisons of candidate policies before deployment.
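The general idea in the abstract (a posterior over how often a candidate agrees with expert behavior, obtained without a likelihood and weighted by a smooth energy rather than a hard accept/reject threshold) can be illustrated with a toy sketch. This is not the POPR-EABC algorithm from the paper: the uniform prior, the absolute-difference energy, the importance-weighting scheme, and every name below (`soft_abc_posterior`, `weighted_mean`, `weighted_quantile`, `temperature`, the toy expert data) are assumptions made only for illustration.

```python
import math
import random

def soft_abc_posterior(policy, expert_data, n_draws=2000, batch_size=20,
                       temperature=0.05, seed=0):
    """Return (theta, weight) pairs approximating a posterior over theta,
    the probability that `policy` picks the expert's action.
    Hypothetical sketch: soft (energy-weighted) ABC with a uniform prior."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = rng.random()                           # draw from the uniform prior
        batch = rng.sample(expert_data, batch_size)    # random mini-batch of expert pairs
        observed = sum(policy(s) == a for s, a in batch) / batch_size
        energy = abs(theta - observed)                 # discrepancy used as an energy
        draws.append((theta, math.exp(-energy / temperature)))  # smooth pseudo-likelihood
    return draws

def weighted_mean(draws):
    """Average-case summary of the weighted posterior draws."""
    total = sum(w for _, w in draws)
    return sum(t * w for t, w in draws) / total

def weighted_quantile(draws, q):
    """Quantile of the weighted draws; a low q gives a worst-case summary."""
    ordered = sorted(draws)
    total = sum(w for _, w in ordered)
    acc = 0.0
    for theta, w in ordered:
        acc += w
        if acc >= q * total:
            return theta
    return ordered[-1][0]

# Toy expert data (assumed): the expert's action is the parity of the state.
expert_data = [(s, s % 2) for s in range(100)]
candidates = {
    "imitator": lambda s: s % 2,   # always agrees with the expert
    "constant": lambda s: 0,       # agrees on about half of the states
}
posteriors = {name: soft_abc_posterior(p, expert_data)
              for name, p in candidates.items()}
ranking = sorted(posteriors, key=lambda n: weighted_mean(posteriors[n]),
                 reverse=True)
```

Because each candidate is summarized by a set of weighted draws rather than a single value, the same output supports average-case ranking (`weighted_mean`) and worst-case comparison (for example `weighted_quantile(draws, 0.05)`), which is the kind of probabilistic comparison the abstract describes.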
Pages: 20370-20378
Number of pages: 9