The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach

Cited: 0
Authors
Serban, Iulian Vlad [1 ]
Sankar, Chinnadhurai [1 ]
Pieper, Michael [2 ]
Pineau, Joelle [3 ]
Bengio, Yoshua [1 ]
Affiliations
[1] Univ Montreal, Dept Comp Sci & Operat Res, Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[2] Polytech Montreal, Montreal, PQ, Canada
[3] McGill Univ, Sch Comp Sci, Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
AGGREGATION;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep reinforcement learning has recently shown many impressive successes. However, one major obstacle to applying such methods to real-world problems is their lack of data efficiency. To this end, we propose the Bottleneck Simulator: a model-based reinforcement learning method that combines a learned, factorized transition model of the environment with rollout simulations to learn an effective policy from few examples. The learned transition model employs an abstract, discrete (bottleneck) state, which increases sample efficiency by reducing the number of model parameters and by exploiting structural properties of the environment. We provide a mathematical analysis of the Bottleneck Simulator in terms of fixed points of the learned policy, which reveals how performance is affected by four distinct sources of error: an error related to the abstract space structure, an error related to the transition model estimation variance, an error related to the transition model estimation bias, and an error related to the transition model class bias. Finally, we evaluate the Bottleneck Simulator on two natural language processing tasks: a text adventure game and a real-world, complex dialogue response selection task. On both tasks, the Bottleneck Simulator yields excellent performance, outperforming competing approaches.
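The abstract outlines a two-step recipe: estimate a transition model over a small, discrete bottleneck state, then learn the policy from rollouts simulated inside that model. The Python sketch below is a minimal, hypothetical illustration of that recipe and not the authors' implementation: the abstraction function phi, the toy logged data, the tabular count-based model, and the Q-learning update are all assumptions made for the example.

```python
# Minimal, illustrative sketch of the Bottleneck Simulator idea: learn a
# factorized transition model over a small discrete abstract ("bottleneck")
# state, then improve a policy with rollouts simulated in that model.
import random
from collections import defaultdict

n_abstract, n_actions = 4, 2          # small discrete bottleneck state/action spaces
gamma, episodes, horizon = 0.95, 200, 20

def phi(raw_state):
    """Hypothetical abstraction: map a raw observation to a bottleneck state."""
    return raw_state % n_abstract

# 1. Estimate an abstract transition/reward model from logged experience.
counts = defaultdict(lambda: defaultdict(int))   # (z, a) -> {z': visit count}
rewards = defaultdict(list)                      # (z, a) -> observed rewards

def record(raw_s, a, r, raw_s_next):
    z, z_next = phi(raw_s), phi(raw_s_next)
    counts[(z, a)][z_next] += 1
    rewards[(z, a)].append(r)

rng = random.Random(0)
for _ in range(500):                  # toy logged transitions, purely illustrative
    s = rng.randrange(100)
    a = rng.randrange(n_actions)
    s_next = (s + a + rng.randrange(3)) % 100
    record(s, a, 1.0 if phi(s_next) == 0 else 0.0, s_next)

def simulate_step(z, a):
    """Sample (z', r) from the estimated abstract transition model."""
    dist = counts[(z, a)]
    if not dist:                      # unseen state-action pair: stay put, no reward
        return z, 0.0
    z_next = rng.choices(list(dist.keys()), weights=list(dist.values()))[0]
    r_hat = sum(rewards[(z, a)]) / len(rewards[(z, a)])
    return z_next, r_hat

# 2. Learn a policy from rollouts simulated inside the learned model.
Q = defaultdict(float)                # (z, a) -> action-value estimate
alpha, eps = 0.1, 0.1
for _ in range(episodes):
    z = rng.randrange(n_abstract)
    for _ in range(horizon):
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: Q[(z, x)])
        z_next, r = simulate_step(z, a)
        best_next = max(Q[(z_next, x)] for x in range(n_actions))
        Q[(z, a)] += alpha * (r + gamma * best_next - Q[(z, a)])
        z = z_next

# Greedy policy in the abstract space.
print({z: max(range(n_actions), key=lambda a: Q[(z, a)]) for z in range(n_abstract)})
```

Because the simulated rollouts operate in the small abstract space, the number of transition parameters to estimate grows with the number of bottleneck states rather than with the raw observation space, which is the sample-efficiency argument made in the abstract.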
Pages: 571-612
Number of pages: 42
Related Papers
50 records in total
  • [1] The bottleneck simulator: A model-based deep reinforcement learning approach
    Serban, Iulian Vlad
    Sankar, Chinnadhurai
    Pieper, Michael
    Pineau, Joelle
    Bengio, Yoshua
    [J]. Journal of Artificial Intelligence Research, 2020, 69 : 571 - 612
  • [2] Learning to Paint With Model-based Deep Reinforcement Learning
    Huang, Zhewei
    Heng, Wen
    Zhou, Shuchang
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 8708 - 8717
  • [3] Calibrated Model-Based Deep Reinforcement Learning
    Malik, Ali
    Kuleshov, Volodymyr
    Song, Jiaming
    Nemer, Danny
    Seymour, Harlan
    Ermon, Stefano
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [4] A Contraction Approach to Model-based Reinforcement Learning
    Fan, Ting-Han
    Ramadge, Peter J.
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 325 - +
  • [5] An Efficient Approach to Model-Based Hierarchical Reinforcement Learning
    Li, Zhuoru
    Narayan, Akshay
    Leong, Tze-Yun
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3583 - 3589
  • [6] A Model-based Factored Bayesian Reinforcement Learning Approach
    Wu, Bo
    Feng, Yanpeng
    Zheng, Hongyan
    [J]. APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 1092 - 1095
  • [7] Model-based deep reinforcement learning for wind energy bidding
    Sanayha, Manassakan
    Vateekul, Peerapon
    [J]. INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2022, 136
  • [8] Knowledge Transfer using Model-Based Deep Reinforcement Learning
    Boloka, Tlou
    Makondo, Ndivhuwo
    Rosman, Benjamin
    [J]. 2021 SOUTHERN AFRICAN UNIVERSITIES POWER ENGINEERING CONFERENCE/ROBOTICS AND MECHATRONICS/PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA (SAUPEC/ROBMECH/PRASA), 2021,
  • [9] Deep Reinforcement Learning with Model-based Acceleration for Hyperparameter Optimization
    Chen, SenPeng
    Wu, Jia
    Chen, XiuYun
    [J]. 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 170 - 177
  • [10] SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
    Zhang, Marvin
    Vikram, Sharad
    Smith, Laura
    Abbeel, Pieter
    Johnson, Matthew J.
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97