On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

Cited by: 23
Authors
Mastropaolo, Antonio [1 ]
Pascarella, Luca [1 ]
Guglielmi, Emanuela [2 ]
Ciniselli, Matteo [1 ]
Scalabrino, Simone [2 ]
Oliveto, Rocco [2 ]
Bavota, Gabriele [1 ]
Affiliations
[1] Univ Svizzera Italiana USI, SEART Software Inst, Lugano, Switzerland
[2] Univ Molise, STAKE Lab, Campobasso, Italy
Funding
European Research Council
Keywords
Empirical Study; Recommender Systems; Usage
DOI
10.1109/ICSE48619.2023.00181
Chinese Library Classification
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Software engineering research has always been concerned with improving code completion approaches, which suggest the next tokens a developer will likely type while coding. The release of GitHub Copilot constitutes a big step forward, also because of its unprecedented ability to automatically generate even entire functions from their natural language description. While the usefulness of Copilot is evident, it is still unclear to what extent it is robust. Specifically, we do not know the extent to which semantic-preserving changes in the natural language description provided to the model affect the generated code. In this paper we present an empirical study aimed at understanding whether different but semantically equivalent natural language descriptions result in the same recommended function. A negative answer would raise questions about the robustness of deep learning (DL)-based code generators, since it would imply that developers using different wordings to describe the same code would obtain different recommendations. We asked Copilot to automatically generate 892 Java methods starting from their original Javadoc description. Then, we generated different semantically equivalent descriptions for each method, both manually and automatically, and analyzed the extent to which the predictions generated by Copilot changed. Our results show that modifying the description results in different code recommendations in ~46% of cases. Also, differences in the semantically equivalent descriptions might impact the correctness of the generated code (±28%).
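The core measurement in the abstract is whether two semantically equivalent descriptions yield the same recommended function. A minimal, hypothetical sketch of that comparison step (the `normalize` and `same_prediction` helpers are illustrative assumptions, not the paper's actual pipeline) might look like this:

```python
# Hypothetical sketch: decide whether Copilot returned the "same"
# prediction for two semantically equivalent descriptions, by comparing
# the generated Java methods after a simple normalization.
import re


def normalize(code: str) -> str:
    """Strip Java comments and collapse whitespace so that trivially
    different but otherwise identical predictions compare equal."""
    code = re.sub(r"//.*", "", code)                    # drop line comments
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.S)   # drop block comments
    return " ".join(code.split())                       # collapse whitespace


def same_prediction(code_a: str, code_b: str) -> bool:
    """True if two generated methods are identical after normalization."""
    return normalize(code_a) == normalize(code_b)


# Two predictions for a description like "returns the maximum of two ints":
a = "int max(int a, int b) { return a > b ? a : b; }  // ternary"
b = "int max(int a, int b) {\n    return a > b ? a : b;\n}"
print(same_prediction(a, b))  # → True
```

A stricter comparison (e.g., token- or AST-level equality, or running the paper's test suites to check correctness) would change the ~46% figure's granularity; this sketch only illustrates the idea of normalizing away superficial differences before comparing.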
Pages: 2149-2160
Page count: 12
Related Papers
50 records total
  • [1] An Empirical Evaluation of GitHub Copilot's Code Suggestions
    Nguyen, Nhan
    Nadi, Sarah
    2022 MINING SOFTWARE REPOSITORIES CONFERENCE (MSR 2022), 2022, : 1 - 5
  • [2] Assessing the Quality of GitHub Copilot's Code Generation
    Yetistiren, Burak
    Ozsoy, Isik
    Tuzun, Eray
    PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON PREDICTIVE MODELS AND DATA ANALYTICS IN SOFTWARE ENGINEERING, PROMISE 2022, 2022, : 62 - 71
  • [3] Using GitHub Copilot for Test Generation in Python: An Empirical Study
    El Haji, Khalid
    Brandt, Carolin
    Zaidman, Andy
    PROCEEDINGS OF THE 2024 IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATION OF SOFTWARE TEST, AST 2024, 2024, : 45 - 55
  • [4] CodexLeaks: Privacy Leaks from Code Generation Language Models in GitHub Copilot
    Niu, Liang
    Mirza, Shujaat
    Maradni, Zayd
    Popper, Christina
    PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM, 2023, : 2133 - 2150
  • [5] Is GitHub Copilot a Substitute for Human Pair-programming? An Empirical Study
    Imai, Saki
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2022), 2022, : 319 - 321
  • [6] Is GitHub's Copilot as bad as humans at introducing vulnerabilities in code?
    Asare, Owura
    Nagappan, Meiyappan
    Asokan, N.
    EMPIRICAL SOFTWARE ENGINEERING, 2023, 28 (06)
  • [7] Students' Use of GitHub Copilot for Working with Large Code Bases
    Shah, Anshul
    Chernova, Anya
    Tomson, Elena
    Porter, Leo
    Griswold, William G.
    Raj, Adalbert Gerald Soosai
    PROCEEDINGS OF THE 56TH ACM TECHNICAL SYMPOSIUM ON COMPUTER SCIENCE EDUCATION, SIGCSE TS 2025, VOL 1, 2025, : 1050 - 1056
  • [10] Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions
    Pearce, Hammond
    Ahmad, Baleegh
    Tan, Benjamin
    Dolan-Gavitt, Brendan
    Karri, Ramesh
    43RD IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2022), 2022, : 754 - 768