Fast and Data Efficient Reinforcement Learning from Pixels via Non-parametric Value Approximation

Cited by: 0

Authors:

Long, Alexander [1]

Blair, Alan [1]

van Hoof, Herke [2]

Affiliations:

[1] Univ New South Wales, Sydney, NSW, Australia

[2] Univ Amsterdam, Amsterdam, Netherlands

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

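Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract: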

We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample- and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration and a proximity-graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26- and 57-game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with a greater than 100x speedup in wall-time.
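
The general idea of lazy, non-parametric value estimation with an episodic Monte-Carlo update can be illustrated with a minimal sketch. This is not the authors' NAIT implementation: the class name `KNNValueMemory`, the random projection used as a stand-in for a fixed representation, the brute-force nearest-neighbour search (in place of a proximity-graph lookup), and all parameter values are assumptions made for illustration. Distance-based exploration and the within-episode reward incorporation described in the abstract are omitted.

```python
import numpy as np


class KNNValueMemory:
    """Hypothetical per-action memory of (embedding, Monte-Carlo return) pairs."""

    def __init__(self, n_actions, obs_dim, embed_dim=8, k=3, gamma=0.99, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection: an assumed stand-in for a fixed,
        # domain-agnostic representation. It is never trained.
        self.proj = rng.standard_normal((obs_dim, embed_dim)) / np.sqrt(embed_dim)
        self.k = k
        self.gamma = gamma
        self.keys = [[] for _ in range(n_actions)]
        self.returns = [[] for _ in range(n_actions)]

    def embed(self, obs):
        # Flatten the observation and project it into the embedding space.
        return np.asarray(obs, dtype=np.float64).ravel() @ self.proj

    def value(self, obs, action):
        # Lazy learning: the estimate is computed at query time as the mean
        # return of the k stored neighbours closest to the query embedding.
        if not self.keys[action]:
            return 0.0
        keys = np.stack(self.keys[action])
        dists = np.linalg.norm(keys - self.embed(obs), axis=1)  # brute force
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.returns[action])[nearest]))

    def act(self, obs):
        # Greedy action with respect to the non-parametric value estimates.
        values = [self.value(obs, a) for a in range(len(self.keys))]
        return int(np.argmax(values))

    def store_episode(self, episode):
        # Episodic Monte-Carlo update: walk the episode backwards,
        # accumulate the discounted return, and store it with the embedding.
        g = 0.0
        for obs, action, reward in reversed(episode):
            g = reward + self.gamma * g
            self.keys[action].append(self.embed(obs))
            self.returns[action].append(g)
```

In this sketch, "learning" amounts to appending entries to memory, so no gradient steps are needed; the cost shifts to the neighbour search at query time, which is where an approximate index such as a proximity graph would replace the brute-force scan.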


Pages: 7620-7627

Page count: 8
