An Investigation of Model-Free Planning

Cited by: 0

Authors:

Guez, Arthur [1]

Mirza, Mehdi [1]

Gregor, Karol [1]

Kabra, Rishabh [1]

Racaniere, Sebastien [1]

Weber, Theophane [1]

Raposo, David [1]

Santoro, Adam [1]

Orseau, Laurent [1]

Eccles, Tom [1]

Wayne, Greg [1]

Silver, David [1]

Lillicrap, Timothy [1]

Affiliations:

[1] DeepMind, London, England

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97 | 2019 / Vol. 97

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learn how to plan, by providing the structure for planning via an inductive bias in the function approximator (such as a tree structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state-of-the-art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.

Pages: 10

50 items in total

[1] Model-Free Grasp Planning for Configurable Vacuum Grippers
You, Fang
Mende, Michael
Stogl, Denis
Hein, Bjoern
Kroeger, Torsten
2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 4554 - 4561
[2] Neural networks for model-free and scale-free automated planning
Michaela Urbanovská
Antonín Komenda
Knowledge and Information Systems, 2021, 63 : 3103 - 3138
[3] Neural networks for model-free and scale-free automated planning
Urbanovska, Michaela
Komenda, Antonin
KNOWLEDGE AND INFORMATION SYSTEMS, 2021, 63 (12) : 3103 - 3138
[4] Model-Free or Not?
Zumpfe, Kai
Smith, Albert A.
FRONTIERS IN MOLECULAR BIOSCIENCES, 2021, 8
[5] A Modular and Model-Free Trajectory Planning Strategy for Automated Driving
Vosswinkel, Rick
Mutlu, Ilhan
Alaa, Khaled
Schrodel, Frank
2020 EUROPEAN CONTROL CONFERENCE (ECC 2020), 2020, : 1186 - 1191
[6] Model-Free Motion Planning of Complex Tasks Subject to Ethical Constraints
Xiao, Shaoping
Li, Junchao
Wang, Zhaoan
ARTIFICIAL INTELLIGENCE IN HCI, PT II, AI-HCI 2024, 2024, 14735 : 116 - 129
[7] TD(0)-Replay: An Efficient Model-Free Planning with full Replay
Altahhan, Abdulrahman
2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
[8] Model-Free Model Reconciliation
Sreedharan, Sarath
Hernandez, Alberto Olmo
Mishra, Aditya Prasad
Kambhampati, Subbarao
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 587 - 594
[9] Model-free CPPI
Schied, Alexander
JOURNAL OF ECONOMIC DYNAMICS & CONTROL, 2014, 40 : 84 - 94
[10] Model-free sampling
Beer, Michael
STRUCTURAL SAFETY, 2007, 29 (01) : 49 - 65
