Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude

Cited by: 0
Authors
Sobo, Andrei [1 ]
Mubarak, Awes [1 ]
Baimagambetov, Almas [1 ]
Polatidis, Nikolaos [1 ]
Affiliations
[1] Univ Brighton, Sch Architecture Technol & Engn, Cockcroft 521, Moulsecoomb Campus, Brighton BN2 4GJ, England
Keywords
Personnel training; Robot applications; Robot programming
DOI
10.1080/08839514.2024.2439610
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in generating code for Human-Robot Interaction (HRI) applications. We present the first direct comparison of ChatGPT 3.5, Gemini 1.5 Pro, and Claude 3.5 Sonnet in this specific context. Through a series of 20 carefully designed prompts, ranging from simple movement commands to complex object-manipulation scenarios, we evaluate the models' ability to generate accurate, context-aware code. Our findings reveal significant variation in performance: Claude 3.5 Sonnet achieved a 95% success rate, Gemini 1.5 Pro 60%, and ChatGPT 3.5 20%. The study highlights the rapid advancement of LLM capabilities on specialized programming tasks while identifying persistent challenges in spatial reasoning and adherence to specific constraints. These results suggest promising applications for LLMs in robotics development and education, while emphasizing the continued need for human oversight and specialized training in AI-assisted programming for HRI.
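The evaluation protocol described in the abstract, a fixed prompt set scored pass/fail per model, can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the paper: the Prompt task set, the generate_code callable standing in for a model API, and the substring-based pass check are illustrative assumptions, not the authors' actual harness or any real robot API.

    # Minimal illustrative sketch (not the paper's harness): score a model's
    # generated code as pass/fail against a small set of HRI prompts.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Prompt:
        """One HRI code-generation task, from simple movement to object manipulation."""
        text: str
        required_calls: Tuple[str, ...]  # calls the generated code must contain (hypothetical criterion)

    # Hypothetical tasks; the study used 20 prompts of increasing difficulty.
    PROMPTS: List[Prompt] = [
        Prompt("Move the robot forward by 0.5 metres.", ("move_forward",)),
        Prompt("Pick up the red cube and place it on the shelf.", ("pick", "place")),
    ]

    def success_rate(generate_code: Callable[[str], str], prompts: List[Prompt]) -> float:
        """Fraction of prompts whose generated code contains every required call."""
        passed = 0
        for prompt in prompts:
            code = generate_code(prompt.text)  # stand-in for a call to the model under test
            if all(call in code for call in prompt.required_calls):
                passed += 1
        return passed / len(prompts)

    if __name__ == "__main__":
        # Stub "model" that always emits the same snippet, just to exercise the loop.
        stub = lambda _prompt: "robot.move_forward(0.5)"
        print(f"success rate: {success_rate(stub, PROMPTS):.0%}")  # -> 50%

The reported 95%, 60%, and 20% figures correspond to this kind of pass/fail aggregation over 20 prompts, although the paper's actual correctness criteria are presumably richer than a substring check.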
Pages: 22