Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

Cited by: 0
Authors
Da, Longchao [1]
Jenkins, Porter [2]
Schwantes, Trevor [2]
Dotson, Jeffrey [2]
Wei, Hua [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
[2] Brigham Young Univ, Provo, UT 84602 USA
Keywords
None listed
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In practice, candidate policies must be compared and ranked offline before real-world deployment to ensure safety and reliability. Prior work approaches this offline policy ranking (OPR) problem with value-based methods such as off-policy evaluation (OPE). However, these methods cannot analyze special-case performance (e.g., worst or best cases) because they lack a holistic characterization of a policy's performance. Estimating precise policy values is even harder in sparse-reward settings, where the reward is not fully accessible. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework that addresses OPR by using expert data to characterize the probability that a candidate policy behaves like the experts, and by approximating the entire posterior distribution of its performance to support ranking. POPR does not rely on value estimation, and the derived performance posterior can distinguish candidates in worst-, best-, and average-case terms. To estimate the posterior, we propose POPR-EABC, an Energy-based Approximate Bayesian Computation (ABC) method that performs likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC with a smooth energy function and improves sampling efficiency with a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across various experimental environments, and that it facilitates probabilistic comparisons of candidate policies before deployment.
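The abstract describes ABC only at a high level. The following Python sketch illustrates likelihood-free inference that uses a smooth weight instead of a hard acceptance threshold. It is a toy and not POPR-EABC itself. Several pieces are hypothetical assumptions rather than details from the paper:

- the parameter `theta`, defined as the probability that a candidate picks the expert's action;
- the Uniform(0, 1) prior;
- the Bernoulli simulator;
- the agreement-rate summary statistic;
- the weight `exp(-distance / temperature)`;
- the helper names `soft_abc_posterior`, `weighted_quantile`, and `summarize`.

The paper's actual energy function and pseudo-likelihood are not reproduced here. The weighted posterior is summarized by a 5% quantile (worst case), the mean (average case), and a 95% quantile (best case). These mirror the worst-, average-, and best-case comparison the abstract mentions.

```python
import numpy as np


def soft_abc_posterior(observed_rate, n_states, n_samples=20_000, temperature=0.01, seed=0):
    """Toy energy-weighted ABC for theta = P(candidate picks the expert's action).

    Hypothetical assumptions, not taken from the paper: theta ~ Uniform(0, 1);
    the simulator draws n_states Bernoulli(theta) agreement indicators; the
    summary statistic is the agreement rate. Each prior draw is weighted by
    exp(-distance / temperature) instead of being hard-accepted or rejected.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 1.0, size=n_samples)              # draws from the prior
    simulated_rate = rng.binomial(n_states, theta) / n_states  # simulated summary statistic
    distance = np.abs(simulated_rate - observed_rate)          # discrepancy to observed summary
    energy = distance / temperature
    weights = np.exp(-(energy - energy.min()))                 # shift by min for numerical stability
    weights /= weights.sum()                                   # normalize to a distribution
    return theta, weights


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches q."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    idx = min(int(np.searchsorted(cdf, q)), len(values) - 1)
    return float(values[order][idx])


def summarize(theta, weights):
    """Worst-, average-, and best-case summaries of the weighted posterior."""
    return {
        "worst": weighted_quantile(theta, weights, 0.05),
        "mean": float(np.sum(weights * theta)),
        "best": weighted_quantile(theta, weights, 0.95),
    }


# Hypothetical observed agreement rates with the expert on 100 expert states.
candidates = {"policy_A": 0.80, "policy_B": 0.60}
results = {
    name: summarize(*soft_abc_posterior(rate, n_states=100))
    for name, rate in candidates.items()
}
# Rank by the worst case (5% quantile), highest first.
ranking = sorted(results, key=lambda name: results[name]["worst"], reverse=True)
print(ranking)
```

Ranking by the 5% quantile favors candidates that stay expert-like in pessimistic cases. Sorting by `"mean"` or `"best"` instead gives average-case or best-case rankings. Lowering `temperature` moves the weighting toward hard rejection ABC, at the cost of fewer effectively weighted samples.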
Pages: 20370-20378
Number of pages: 9
Related papers
50 in total
  • [1] Bayesian Probabilistic Power Flow Analysis Using Jacobian Approximate Bayesian Computation
    Zuluaga, Carlos David
    Alvarez, Mauricio A.
    IEEE Transactions on Power Systems, 2018, 33 (05): 5217-5225
  • [2] Improved approximate Bayesian computation for probabilistic damage identification of structures
    Fang, S.-E.
    Chen, S.
    Dong, Z.-L.
    Zhendong Gongcheng Xuebao/Journal of Vibration Engineering, 2019, 32 (02): 224-233
  • [3] Automatic Sampler Discovery via Probabilistic Programming and Approximate Bayesian Computation
    Perov, Yura
    Wood, Frank
    Artificial General Intelligence (AGI 2016), 2016, 9782: 262-273
  • [4] Approximate Bayesian Computation
    Beaumont, Mark A.
    Annual Review of Statistics and Its Application, Vol 6, 2019, 6: 379-403
  • [5] Approximate Bayesian Computation
    Sunnaker, Mikael
    Busetto, Alberto Giovanni
    Numminen, Elina
    Corander, Jukka
    Foll, Matthieu
    Dessimoz, Christophe
    PLOS Computational Biology, 2013, 9 (01)
  • [6] Probabilistic Updating of Structural Models for Damage Assessment Using Approximate Bayesian Computation
    Feng, Zhouquan
    Lin, Yang
    Wang, Wenzan
    Hua, Xugang
    Chen, Zhengqing
    Sensors, 2020, 20 (11)
  • [7] Approximate Bayesian Computation for Probabilistic Decline-Curve Analysis in Unconventional Reservoirs
    Paryani, Mohit
    Awoleke, Obadare O.
    Ahmadi, Mohabbat
    Hanks, Catherine
    Barry, Ronald
    SPE Reservoir Evaluation & Engineering, 2017, 20 (02): 478-485
  • [8] Probabilistic damage identification incorporating approximate Bayesian computation with stochastic response surface
    Fang, Sheng-En
    Chen, Shan
    Lin, You-Qin
    Dong, Zhao-Liang
    Mechanical Systems and Signal Processing, 2019, 128: 229-243
  • [9] Writer Identification using a Probabilistic Model of Handwritten Digits and Approximate Bayesian Computation
    Ahmadian, Amirhosein
    Fouladi, Kazim
    Araabi, Babak Nadjar
    2016 2nd International Conference of Signal Processing and Intelligent Systems (ICSPIS), 2016: 40-45