Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude

Cited by: 0
Authors
Sobo, Andrei [1 ]
Mubarak, Awes [1 ]
Baimagambetov, Almas [1 ]
Polatidis, Nikolaos [1 ]
Affiliations
[1] Univ Brighton, Sch Architecture Technol & Engn, Cockcroft 521, Moulsecoomb Campus, Brighton BN2 4GJ, England
Keywords
Personnel training; Robot applications; Robot programming
DOI
10.1080/08839514.2024.2439610
Chinese Library Classification (CLC) Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study investigates the effectiveness of Large Language Models (LLMs) in generating code for Human-Robot Interaction (HRI) applications. We present the first direct comparison of ChatGPT 3.5, Gemini 1.5 Pro, and Claude 3.5 Sonnet in this specific context. Through a series of 20 carefully designed prompts, ranging from simple movement commands to complex object-manipulation scenarios, we evaluate the models' ability to generate accurate, context-aware code. Our findings reveal significant variation in performance: Claude 3.5 Sonnet achieved a 95% success rate, Gemini 1.5 Pro 60%, and ChatGPT 3.5 20%. The study highlights the rapid advancement of LLM capabilities on specialized programming tasks while identifying persistent challenges in spatial reasoning and adherence to specific constraints. These results suggest promising applications for LLMs in robotics development and education, while emphasizing the continued need for human oversight and specialized training in AI-assisted programming for HRI.
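The evaluation protocol described in the abstract, a fixed prompt set scored pass/fail per model, can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the paper: the Prompt task set, the generate_code callable standing in for a model API, and the substring-based pass check are illustrative assumptions, not the authors' actual harness or any real robot API.

    # Minimal illustrative sketch (not the paper's harness): score a model's
    # generated code as pass/fail against a small set of HRI prompts.
    from dataclasses import dataclass
    from typing import Callable, List, Tuple

    @dataclass
    class Prompt:
        """One HRI code-generation task, from simple movement to object manipulation."""
        text: str
        required_calls: Tuple[str, ...]  # calls the generated code must contain (hypothetical criterion)

    # Hypothetical tasks; the study used 20 prompts of increasing difficulty.
    PROMPTS: List[Prompt] = [
        Prompt("Move the robot forward by 0.5 metres.", ("move_forward",)),
        Prompt("Pick up the red cube and place it on the shelf.", ("pick", "place")),
    ]

    def success_rate(generate_code: Callable[[str], str], prompts: List[Prompt]) -> float:
        """Fraction of prompts whose generated code contains every required call."""
        passed = 0
        for prompt in prompts:
            code = generate_code(prompt.text)  # stand-in for a call to the model under test
            if all(call in code for call in prompt.required_calls):
                passed += 1
        return passed / len(prompts)

    if __name__ == "__main__":
        # Stub "model" that always emits the same snippet, just to exercise the loop.
        stub = lambda _prompt: "robot.move_forward(0.5)"
        print(f"success rate: {success_rate(stub, PROMPTS):.0%}")  # -> 50%

The reported 95%, 60%, and 20% figures correspond to this kind of pass/fail aggregation over 20 prompts, although the paper's actual correctness criteria are presumably richer than a substring check.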
Pages: 22