Deep Reinforcement Learning at the Edge of the Statistical Precipice

Cited by: 0
Authors
Agarwal, Rishabh [1 ]
Schwarzer, Max [2 ]
Castro, Pablo Samuel [3 ]
Courville, Aaron [2 ,4 ]
Bellemare, Marc G. [3 ]
Affiliations
[1] Univ Montreal, MILA, Google Res, Brain Team, Montreal, PQ, Canada
[2] Univ Montreal, MILA, Montreal, PQ, Canada
[3] Google Res, Brain Team, Montreal, PQ, Canada
[4] Univ Montreal, MILA, Montreal, PQ, Canada
Keywords
LEVEL;
DOI
None available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library, rliable, to prevent unreliable results from stagnating the field.
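The interquartile mean (IQM) and interval estimates described in the abstract can be sketched in plain NumPy. The snippet below is a minimal illustration, not the paper's implementation: it computes the IQM (mean of the middle 50% of all run scores pooled across tasks) and a percentile confidence interval from a stratified bootstrap that resamples runs independently within each task. The paper's own library, rliable, provides the polished versions of these tools; function names and the example score matrix here are hypothetical.

```python
import numpy as np

def iqm(scores):
    """Interquartile mean: average of the middle 50% of all scores,
    pooled across runs and tasks (more robust than the mean, more
    statistically efficient than the median)."""
    flat = np.sort(scores, axis=None)
    n = flat.size
    return flat[n // 4 : n - n // 4].mean()

def stratified_bootstrap_ci(score_matrix, n_resamples=2000, alpha=0.05, seed=0):
    """score_matrix has shape (n_runs, n_tasks). Runs are resampled
    with replacement within each task (stratified bootstrap), and the
    IQM is recomputed per resample to form a percentile interval."""
    rng = np.random.default_rng(seed)
    n_runs, n_tasks = score_matrix.shape
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        # Independent resample of run indices for every task (stratum).
        idx = rng.integers(0, n_runs, size=(n_runs, n_tasks))
        resampled = np.take_along_axis(score_matrix, idx, axis=0)
        stats[b] = iqm(resampled)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return iqm(score_matrix), (lo, hi)

# Hypothetical example: 5 runs on 4 tasks of human-normalized scores.
scores = np.array([[0.8, 1.2, 0.3, 2.0],
                   [0.9, 1.1, 0.4, 1.8],
                   [0.7, 1.3, 0.2, 2.2],
                   [1.0, 1.0, 0.5, 1.9],
                   [0.6, 1.4, 0.1, 2.1]])
point, (lo, hi) = stratified_bootstrap_ci(scores)
```

Reporting `point` together with the interval `(lo, hi)`, rather than `point` alone, is precisely the shift the abstract advocates for the few-run regime.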
Pages: 17