An Investigation of Model-Free Planning

Cited by: 0
Authors
Guez, Arthur [1]
Mirza, Mehdi [1]
Gregor, Karol [1]
Kabra, Rishabh [1]
Racaniere, Sebastien [1]
Weber, Theophane [1]
Raposo, David [1]
Santoro, Adam [1]
Orseau, Laurent [1]
Eccles, Tom [1]
Wayne, Greg [1]
Silver, David [1]
Lillicrap, Timothy [1]
Affiliations
[1] DeepMind, London, England
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically used an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods has been proposed that learn how to plan by providing the structure for planning via an inductive bias in the function approximator (such as a tree-structured neural network), trained end-to-end by a model-free RL algorithm. In this paper, we go even further, and demonstrate empirically that an entirely model-free approach, without special structure beyond standard neural network components such as convolutional networks and LSTMs, can learn to exhibit many of the characteristics typically associated with a model-based planner. We measure our agent's effectiveness at planning in terms of its ability to generalize across a combinatorial and irreversible state space, its data efficiency, and its ability to utilize additional thinking time. We find that our agent has many of the characteristics that one might expect to find in a planning algorithm. Furthermore, it exceeds the state of the art in challenging combinatorial domains such as Sokoban and outperforms other model-free approaches that utilize strong inductive biases toward planning.
Pages: 10