Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude

Cited by: 0

Authors:

Sobo, Andrei [1]

Mubarak, Awes [1]

Baimagambetov, Almas [1]

Polatidis, Nikolaos [1]

Affiliation:

[1] Univ Brighton, Sch Architecture Technol & Engn, Cockcroft 521, Moulsecoomb Campus, Brighton BN2 4GJ, England

Source:

APPLIED ARTIFICIAL INTELLIGENCE | 2025 / Vol. 39 / Issue 01

Keywords:

Personnel training - Robot applications - Robot programming;

DOI:

10.1080/08839514.2024.2439610

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study investigates how effectively Large Language Models (LLMs) generate code for Human-Robot Interaction (HRI) applications. We present the first direct comparison of ChatGPT 3.5, Gemini 1.5 Pro, and Claude 3.5 Sonnet in this specific context. Through a series of 20 carefully designed prompts, ranging from simple movement commands to complex object manipulation scenarios, we evaluate the models' ability to generate accurate and context-aware code. Our findings reveal significant variations in performance: Claude 3.5 Sonnet achieved a 95% success rate, Gemini 1.5 Pro 60%, and ChatGPT 3.5 20%. The study highlights the rapid advancement in LLM capabilities for specialized programming tasks while also identifying persistent challenges in spatial reasoning and adherence to specific constraints. These results suggest promising applications for LLMs in robotics development and education while emphasizing the continued need for human oversight and specialized training in AI-assisted programming for HRI.

Pages: 22
