First return, then explore

Cited by: 143

Authors:

Ecoffet, Adrien [1,2]

Huizinga, Joost [1,2]

Lehman, Joel [1,2]

Stanley, Kenneth O. [1,2]

Clune, Jeff [1,2]

Affiliations:

[1] Uber AI Labs, San Francisco, CA 94107 USA

[2] OpenAI, San Francisco, CA 94110 USA

Source:

NATURE | 2021 / Volume 590 / Issue 7847

Keywords:

ARCADE LEARNING-ENVIRONMENT; LEVEL;

DOI:

10.1038/s41586-020-03157-9

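Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];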

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

A reinforcement learning algorithm that explicitly remembers promising states and returns to them as a basis for further exploration solves all as-yet-unsolved Atari games and outperforms previous algorithms on Montezuma's Revenge and Pitfall. Reinforcement learning promises to solve complex sequential-decision problems autonomously by specifying a high-level reward function only. However, reinforcement learning algorithms struggle when, as is often the case, simple and intuitive rewards provide sparse(1) and deceptive(2) feedback. Avoiding these pitfalls requires a thorough exploration of the environment, but creating algorithms that can do so remains one of the central challenges of the field. Here we hypothesize that the main impediment to effective exploration originates from algorithms forgetting how to reach previously visited states (detachment) and failing to first return to a state before exploring from it (derailment). We introduce Go-Explore, a family of algorithms that addresses these two challenges directly through the simple principles of explicitly 'remembering' promising states and returning to such states before intentionally exploring. Go-Explore solves all previously unsolved Atari games and surpasses the state of the art on all hard-exploration games(1), with orders-of-magnitude improvements on the grand challenges of Montezuma's Revenge and Pitfall. We also demonstrate the practical potential of Go-Explore on a sparse-reward pick-and-place robotics task. Additionally, we show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handle stochasticity throughout training. The substantial performance gains from Go-Explore suggest that the simple principles of remembering states, returning to them, and exploring from them are a powerful and general approach to exploration, an insight that may prove critical to the creation of truly intelligent learning agents.
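The remember, return, explore loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `ChainEnv` corridor, its `restore` method, the use of the position itself as the archive "cell", the `1 / sqrt(visits + 1)` selection weight, and all parameter values are assumptions chosen only to keep the example short and runnable.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: a 1-D corridor with reward only at the far end."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0

    def restore(self, pos):
        # "Return" step: reset the simulator directly to a remembered state.
        self.pos = pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.pos = max(0, min(self.length, self.pos + action))
        reward = 1.0 if self.pos == self.length else 0.0
        return self.pos, reward


def go_explore(env, iterations=3000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    # Archive of remembered cells: cell -> saved state, trajectory from the start, visit count.
    archive = {0: {"state": 0, "traj": [], "visits": 0}}
    for _ in range(iterations):
        # Select a remembered cell, favouring rarely selected ones (assumed weighting).
        cells = list(archive)
        weights = [1.0 / (archive[c]["visits"] + 1) ** 0.5 for c in cells]
        cell = rng.choices(cells, weights=weights, k=1)[0]
        entry = archive[cell]
        entry["visits"] += 1
        # Return to the selected cell before exploring (avoids derailment).
        env.restore(entry["state"])
        traj = list(entry["traj"])
        # Explore from it with random actions.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            pos, _reward = env.step(action)
            traj.append(action)
            known = archive.get(pos)
            # Remember new cells, or a shorter route to a known cell (avoids detachment).
            if known is None or len(traj) < len(known["traj"]):
                visits = known["visits"] if known else 0
                archive[pos] = {"state": pos, "traj": list(traj), "visits": visits}
    return archive


if __name__ == "__main__":
    found = go_explore(ChainEnv(length=20))
    print("furthest cell reached:", max(found))
    print("steps in stored route:", len(found[max(found)]["traj"]))
```

Because every exploration phase starts from a restored archive entry, progress toward the far end of the corridor is never lost, whereas a purely random walk from the start would have to rediscover it each time. Restoring simulator state is only one way to return; the abstract also mentions a goal-conditioned policy, which this sketch does not attempt.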


Pages: 580-586

Page count: 22
