Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning

Cited by: 0

Authors:

Carta, Thomas [1]

Romac, Clement [1,2]

Wolf, Thomas [2]

Lamprier, Sylvain [3]

Sigaud, Olivier [4]

Oudeyer, Pierre-Yves [1]

Affiliations:

[1] Univ Bordeaux, Inria Flowers, Bordeaux, France

[2] Hugging Face, Paris, France

[3] Univ Angers, LERIA, SFR MATHSTIC, F-49000 Angers, France

[4] Sorbonne Univ, ISIR, Paris, France

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202 | 2023 / Vol. 202

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about the world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can they boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.

Citations

Pages: 38

50 entries in total

[11] Explanatory Interactive Machine Learning with Counterexamples from Constrained Large Language Models
Slany, Emanuel
Scheele, Stephan
Schmid, Ute
KI 2024: ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2024, 2024, 14992 : 324 - 331
[12] LLM4RL: Enhancing Reinforcement Learning with Large Language Models
Zhou, Jiehan
Zhao, Yang
Liu, Jiahong
Dong, Peijun
Luo, Xiaoyu
Tao, Hang
Chang, Shi
Luo, Hanjiang
2024 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CCECE 2024, 2024, : 86 - 87
[13] Hypothesis, Verification, and Induction: Grounding Large Language Models with Self-Driven Skill Learning
Peng, Shaohui
Hu, Xing
Yi, Qi
Zhang, Rui
Guo, Jiaming
Huang, Di
Tian, Zikang
Chen, Ruizhi
Du, Zidong
Guo, Qi
Chen, Yunji
Li, Ling
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14599 - 14607
[14] AI Charades: Language Models as Interactive Game Environments
Frans, Kevin
2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 876 - 877
[15] Online Learning and Exploiting Relational Models in Reinforcement Learning
Croonenborghs, Tom
Ramon, Jan
Blockeel, Hendrik
Bruynooghe, Maurice
20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 726 - 731
[16] ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design
Lanzi, Pier Luca
Loiacono, Daniele
PROCEEDINGS OF THE 2023 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, GECCO 2023, 2023, : 1383 - 1390
[17] Interactive Visualization Tools to Improve Learning and Teaching in Online Learning Environments
Kuosa, Kirsi
Distante, Damiano
Tervakari, Anne
Cerulo, Luigi
Fernandez, Alejandro
Koro, Juho
Kailanto, Meri
INTERNATIONAL JOURNAL OF DISTANCE EDUCATION TECHNOLOGIES, 2016, 14 (01) : 1 - 21
[18] Reward Design Using Large Language Models for Natural Language Explanation of Reinforcement Learning Agent Actions
Masadome, Shinya
Harada, Taku
IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2025,
[19] Online Learning and Integration of Complex Action and Word Lexicons for Language Grounding
Niehaus, Logan
Levinson, Stephen E.
2012 IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL), 2012,
[20] Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning
Thil, Lucas-Andrei
Popa, Mirela
Spanakis, Gerasimos
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 866 - 874
