A Survey of Robot Intelligence with Large Language Models

Cited by: 3

Authors:

Jeong, Hyeongyo [1]

Lee, Haechan [1]

Kim, Changwon [2]

Shin, Sungtae [1]

Affiliations:

[1] Dong A Univ, Dept Mech Engn, Busan 49315, South Korea

[2] Pukyong Natl Univ, Sch Mech Engn, Busan 48513, South Korea

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 19

Funding:

National Research Foundation of Singapore;

Keywords:

embodied intelligence; foundation model; large language model (LLM); vision-language model (VLM); vision-language-action (VLA) model; robotics;

DOI:

10.3390/app14198868

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited exceptional abilities in understanding natural language and planning tasks. These abilities are promising for robotics. Traditional robot intelligence systems based on supervised learning generally adapt poorly to dynamically changing environments, whereas LLMs help such systems generalize better in dynamic and complex real-world environments. Indeed, findings from ongoing robotics studies indicate that LLMs can significantly improve robots' behavior planning and execution capabilities. Additionally, vision-language models (VLMs), trained on extensive visual and linguistic data for the visual question answering (VQA) problem, excel at integrating computer vision with natural language processing. VLMs can comprehend visual contexts and execute actions through natural language. They also provide descriptions of scenes in natural language. Several studies have explored the enhancement of robot intelligence using multimodal data, including object recognition and description by VLMs, along with the execution of language-driven commands integrated with visual information. This review paper thoroughly investigates how foundation models such as LLMs and VLMs have been employed to boost robot intelligence. For clarity, the research areas are categorized into five topics: reward design in reinforcement learning, low-level control, high-level planning, manipulation, and scene understanding. This review also summarizes studies that show how foundation models, such as the Eureka model for automating reward function design in reinforcement learning, RT-2 for integrating visual data, language, and robot actions in vision-language-action models, and AutoRT for generating feasible tasks and executing robot behavior policies via LLMs, have improved robot intelligence.


Pages: 39

