Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation

Cited by: 0

Authors:

Long, Alexander [1]

Blair, Alan [1]

van Hoof, Herke [2]

Affiliations:

[1] Univ New South Wales, Sydney, NSW, Australia

[2] Univ Amsterdam, Amsterdam, Netherlands

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration, and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26- and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-time.
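
The general idea of lazy, non-parametric value approximation with episodic Monte-Carlo returns can be illustrated with a minimal sketch. This is not the authors' NAIT implementation: the class `KNNValueMemory` and all of its method names are hypothetical, embeddings are assumed to be precomputed by some fixed encoder, and a brute-force nearest-neighbour search stands in for the proximity-graph lookup mentioned in the abstract. Distance-based exploration and the incorporation of rewards during an ongoing episode are left out.

```python
import math


class KNNValueMemory:
    """Hypothetical per-action store of (embedding, Monte-Carlo return) pairs."""

    def __init__(self, n_actions, k=3):
        self.k = k
        # One list of (embedding, return) tuples per discrete action.
        self.entries = [[] for _ in range(n_actions)]

    def add_episode(self, embeddings, actions, rewards, gamma=0.99):
        # Walk the finished episode backwards, accumulating the discounted return.
        g = 0.0
        for z, a, r in zip(reversed(embeddings), reversed(actions), reversed(rewards)):
            g = r + gamma * g
            self.entries[a].append((tuple(z), g))

    def value(self, z, action, default=0.0):
        # Lazy learning: nothing is fitted; the estimate is the mean return
        # of the k stored embeddings closest to the query (brute-force search).
        stored = self.entries[action]
        if not stored:
            return default
        nearest = sorted(stored, key=lambda e: math.dist(e[0], z))[: self.k]
        return sum(g for _, g in nearest) / len(nearest)

    def greedy_action(self, z):
        # Pick the action with the highest estimated value (first one on ties).
        values = [self.value(z, a) for a in range(len(self.entries))]
        return max(range(len(values)), key=values.__getitem__)


memory = KNNValueMemory(n_actions=2, k=1)
memory.add_episode([[0.0, 0.0], [1.0, 0.0]], [0, 1], [0.0, 1.0], gamma=0.5)
print(memory.value([0.0, 0.0], 0), memory.greedy_action([0.0, 0.0]))
```

The brute-force search costs time linear in the number of stored transitions per query; an approximate nearest-neighbour index would be the natural replacement when the memory grows large.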


Pages: 7620-7627

Page count: 8

