Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation

Authors
Long, Alexander [1]
Blair, Alan [1]
van Hoof, Herke [2]
Affiliations
[1] Univ New South Wales, Sydney, NSW, Australia
[2] Univ Amsterdam, Amsterdam, Netherlands
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration and a proximity graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-clock time.
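The abstract names two ingredients that a short sketch can make concrete: an episodic Monte-Carlo update and a lazy, nearest-neighbor value lookup. The Python below is an illustration under stated assumptions, not the NAIT algorithm itself. The class `KNNValueMemory`, the brute-force search (standing in for the proximity-graph lookup), the default value of 0.0 for unseen actions, and the use of ready-made embedding vectors (standing in for the fixed representation) are all hypothetical choices. It also omits the in-episode reward incorporation and the exploration rule mentioned in the abstract.

```python
import math


def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return G_t for every step of a completed episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class KNNValueMemory:
    """Hypothetical per-action store of (embedding, return) pairs.

    Values are estimated lazily at query time by averaging the returns
    of the k nearest stored embeddings (brute-force search; an assumption,
    the paper reports using a proximity graph for speed).
    """

    def __init__(self, n_actions, k=3):
        self.k = k
        self.memory = [[] for _ in range(n_actions)]

    def add_episode(self, embeddings, actions, rewards, gamma=0.99):
        # Episodic Monte-Carlo update: store each step's observed return.
        returns = monte_carlo_returns(rewards, gamma)
        for z, a, g in zip(embeddings, actions, returns):
            self.memory[a].append((tuple(z), g))

    def value(self, z, action):
        entries = self.memory[action]
        if not entries:
            return 0.0  # assumed default for actions never tried
        nearest = sorted(entries, key=lambda e: math.dist(e[0], z))[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, z):
        return max(range(len(self.memory)), key=lambda a: self.value(z, a))
```

A usage pattern might be to call `add_episode` once per finished episode and `greedy_action` at each step. Nothing is trained, so all cost sits in the lookup, which is presumably why a fast neighbor search matters for wall-clock time.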
Pages: 7620-7627
Page count: 8