Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation

Cited by: 0
Authors
Long, Alexander [1]
Blair, Alan [1]
van Hoof, Herke [2]
Affiliations
[1] Univ New South Wales, Sydney, NSW, Australia
[2] Univ Amsterdam, Amsterdam, Netherlands
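Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes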
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a reinforcement learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and highly computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration, and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26- and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-clock time.
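The general idea of a lazy, non-parametric value estimate built from episodic Monte-Carlo returns can be illustrated with a minimal sketch. This is not the authors' NAIT implementation: the class name `KNNValueTable`, the function `monte_carlo_returns`, the parameters `k` and `gamma`, and the brute-force nearest-neighbour search (used here in place of a proximity-graph lookup) are all assumptions made for illustration. The sketch also omits the in-episode reward incorporation and the distance-based exploration mentioned in the abstract, and it takes feature vectors as given rather than computing them from pixels.

```python
import math
from collections import defaultdict


def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return G_t for every step of a finished episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


class KNNValueTable:
    """Hypothetical per-action memory of (features, return) pairs.

    Values are estimated lazily at query time by averaging the returns
    of the k nearest stored feature vectors (brute-force search).
    """

    def __init__(self, k=3):
        self.k = k
        self.memory = defaultdict(list)  # action -> [(features, return)]

    def add_episode(self, features, actions, rewards, gamma=0.99):
        """Store one completed episode using its Monte-Carlo returns."""
        returns = monte_carlo_returns(rewards, gamma)
        for f, a, g in zip(features, actions, returns):
            self.memory[a].append((tuple(f), g))

    def value(self, feature, action):
        """Mean return of the k nearest neighbours; 0.0 if none stored."""
        entries = self.memory.get(action, [])
        if not entries:
            return 0.0
        nearest = sorted(entries, key=lambda e: math.dist(e[0], feature))
        nearest = nearest[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, feature, actions):
        """Action with the highest estimated value at this feature."""
        return max(actions, key=lambda a: self.value(feature, a))


# Toy usage with made-up two-dimensional features.
table = KNNValueTable(k=1)
table.add_episode(
    features=[(0.0, 0.0), (1.0, 0.0)],
    actions=[0, 1],
    rewards=[0.0, 1.0],
    gamma=0.5,
)
best = table.greedy_action((0.1, 0.0), actions=[0, 1])
```

Because nothing is trained, adding an episode is only a memory append, and all the cost sits in the lookup. That is where an approximate nearest-neighbour structure such as a proximity graph would replace the sorted scan used in this sketch.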
Pages: 7620 - 7627
Page count: 8
Related papers
50 in total
  • [31] Combined data augmentation framework for generalizing deep reinforcement learning from pixels
    Xiong, Xi
    Shen, Chun
    Wu, Junhong
    Lu, Shuai
    Zhang, Xiaodan
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 264
  • [32] Non-parametric Source Reconstruction via Kernel Temporal Enhancement for EEG Data
    Torres-Valencia, C.
    Hernandez-Muriel, J.
    Gonzalez-Vanegas, W.
    Alvarez-Meza, A.
    Orozco, A.
    Alvarez, M.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 443 - 450
  • [33] Non-parametric Learning of Embeddings for Relational Data Using Gaifman Locality Theorem
    Dhami, Devendra Singh
    Yan, Siwen
    Kunapuli, Gautam
    Natarajan, Sriraam
    INDUCTIVE LOGIC PROGRAMMING (ILP 2021), 2022, 13191 : 95 - 110
  • [34] An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space
    Sharma, Hiteshi
    Jain, Rahul
    Gupta, Abhishek
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 1368 - 1373
  • [35] Mortgage Loan Data Exploration with Non-parametric Statistical and Machine Learning Perspectives
    Hernandez-Lopez, Eymard
    Cruz-Espinosa, Diana Jaqueline
    Herrera-Zuniga, Leonardo
    Wences, Giovanni
    COMPUTATIONAL ECONOMICS, 2024,
  • [36] Learning transcriptional networks from the integration of ChIP-chip and expression data in a non-parametric model
    Youn, Ahrim
    Reiss, David J.
    Stuetzle, Werner
    BIOINFORMATICS, 2010, 26 (15) : 1879 - 1886
  • [37] Non-parametric estimator of a multivariate madogram for missing-data and extreme value framework
    Boulin, Alexis
    Di Bernardino, Elena
    Laloe, Thomas
    Toulemonde, Gwladys
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 192
  • [38] NON-PARAMETRIC APPROXIMATION USED TO ANALYSIS OF PSINSAR™ DATA OF UPPER SILESIAN COAL BASIN, POLAND
    Mirek, Katarzyna
    Mirek, Janusz
    ACTA GEODYNAMICA ET GEOMATERIALIA, 2009, 6 (04) : 405 - 409
  • [39] Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning
    Yue, Yang
    Kang, Bingyi
    Xu, Zhongwen
    Huang, Gao
    Yan, Shuicheng
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 11069 - 11077