A Survey of Robot Intelligence with Large Language Models

Cited by: 3
Authors
Jeong, Hyeongyo [1]
Lee, Haechan [1]
Kim, Changwon [2]
Shin, Sungtae [1]
Affiliations
[1] Dong A Univ, Dept Mech Engn, Busan 49315, South Korea
[2] Pukyong Natl Univ, Sch Mech Engn, Busan 48513, South Korea
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 19
Funding
National Research Foundation, Singapore;
Keywords
embodied intelligence; foundation model; large language model (LLM); vision-language model (VLM); vision-language-action (VLA) model; robotics;
DOI
10.3390/app14198868
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
Since the emergence of ChatGPT, research on large language models (LLMs) has actively progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited exceptional abilities in understanding natural language and planning tasks. These abilities of LLMs are promising in robotics. In general, traditional supervised learning-based robot intelligence systems have a significant lack of adaptability to dynamically changing environments. However, LLMs help a robot intelligence system to improve its generalization ability in dynamic and complex real-world environments. Indeed, findings from ongoing robotics studies indicate that LLMs can significantly improve robots' behavior planning and execution capabilities. Additionally, vision-language models (VLMs), trained on extensive visual and linguistic data for the visual question answering (VQA) problem, excel at integrating computer vision with natural language processing. VLMs can comprehend visual contexts and execute actions through natural language. They also provide descriptions of scenes in natural language. Several studies have explored the enhancement of robot intelligence using multimodal data, including object recognition and description by VLMs, along with the execution of language-driven commands integrated with visual information. This review paper thoroughly investigates how foundation models such as LLMs and VLMs have been employed to boost robot intelligence. For clarity, the research areas are categorized into five topics: reward design in reinforcement learning, low-level control, high-level planning, manipulation, and scene understanding.
This review also summarizes studies that show how foundation models, such as the Eureka model for automating reward function design in reinforcement learning, RT-2 for integrating visual data, language, and robot actions in vision-language-action models, and AutoRT for generating feasible tasks and executing robot behavior policies via LLMs, have improved robot intelligence.
Pages: 39
Related papers
50 in total
  • [31] TidyBot: Personalized Robot Assistance with Large Language Models
    Wu, Jimmy
    Antonova, Rika
    Kan, Adam
    Lepert, Marion
    Zeng, Andy
    Song, Shuran
    Bohg, Jeannette
    Rusinkiewicz, Szymon
    Funkhouser, Thomas
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023, : 3546 - 3553
  • [32] TidyBot: personalized robot assistance with large language models
    Wu, Jimmy
    Antonova, Rika
    Kan, Adam
    Lepert, Marion
    Zeng, Andy
    Song, Shuran
    Bohg, Jeannette
    Rusinkiewicz, Szymon
    Funkhouser, Thomas
    AUTONOMOUS ROBOTS, 2023, 47 (08) : 1087 - 1102
  • [33] TidyBot: personalized robot assistance with large language models
    Jimmy Wu
    Rika Antonova
    Adam Kan
    Marion Lepert
    Andy Zeng
    Shuran Song
    Jeannette Bohg
    Szymon Rusinkiewicz
    Thomas Funkhouser
    Autonomous Robots, 2023, 47 : 1087 - 1102
  • [34] Embodied Intelligence Systems Based on Large Models: A Survey
    Wang, Wen-Sheng
    Tan, Ning
    Huang, Kai
    Zhang, Yu-Nong
    Zheng, Wei-Shi
    Sun, Fu-Chun
    Zidonghua Xuebao/Acta Automatica Sinica, 2025, 51 (01) : 1 - 19
  • [35] The cognitive age in medicine: Artificial intelligence, large language models, and iterative intelligence
    Nosta, John
    AMERICAN JOURNAL OF HEMATOLOGY, 2024, 99 (12) : 2256 - 2257
  • [37] Large language models and brain-inspired general intelligence
    Bo Xu
    Mu-ming Poo
    National Science Review, 2023, 10 (10) : 6 - 7
  • [38] Leveraging foundation and large language models in medical artificial intelligence
    Wong, Io Nam
    Monteiro, Olivia
    Baptista-Hon, Daniel T.
    Wang, Kai
    Lu, Wenyang
    Sun, Zhuo
    Nie, Sheng
    Yin, Yun
    CHINESE MEDICAL JOURNAL, 2024, 137 (21) : 2529 - 2539
  • [39] Large language models and artificial intelligence chatbots in vascular surgery
    Lareyre, Fabien
    Nasr, Bahaa
    Poggi, Elise
    Di Lorenzo, Gilles
    Ballaith, Ali
    Sliti, Imen
    Chaudhuri, Arindam
    Raffort, Juliette
    SEMINARS IN VASCULAR SURGERY, 2024, 7 (03) : 314 - 320
  • [40] Artificial Intelligence and Large Language Models for the Management of Tobacco Dependence
    Chow, Ryan
    Jama, Sadia
    Cowan, Aaron
    Pakhale, Smita
    ANNALS OF THE AMERICAN THORACIC SOCIETY, 2025, 22 (02) : 305 - 309