On the Robustness of Code Generation Techniques: An Empirical Study on GitHub Copilot

Cited by: 10
Authors
Mastropaolo, Antonio [1 ]
Pascarella, Luca [1 ]
Guglielmi, Emanuela [2 ]
Ciniselli, Matteo [1 ]
Scalabrino, Simone [2 ]
Oliveto, Rocco [2 ]
Bavota, Gabriele [1 ]
Affiliations
[1] Univ Svizzera Italiana USI, SEART Software Inst, Lugano, Switzerland
[2] Univ Molise, STAKE Lab, Campobasso, Italy
Funding
European Research Council
Keywords
Empirical Study; Recommender Systems; USAGE;
DOI
10.1109/ICSE48619.2023.00181
CLC Number
TP31 [Computer Software]
Discipline Codes
081202; 0835
Abstract
Software engineering research has always been concerned with improving code completion approaches, which suggest the next tokens a developer is likely to type while coding. The release of GitHub Copilot constitutes a big step forward, also because of its unprecedented ability to automatically generate even entire functions from their natural language description. While the usefulness of Copilot is evident, it is still unclear to what extent it is robust. Specifically, we do not know the extent to which semantic-preserving changes in the natural language description provided to the model affect the generated code. In this paper we present an empirical study aimed at understanding whether different but semantically equivalent natural language descriptions result in the same recommended function. A negative answer would raise questions about the robustness of deep learning (DL)-based code generators, since it would imply that developers describing the same code with different wordings would obtain different recommendations. We asked Copilot to automatically generate 892 Java methods starting from their original Javadoc description. Then, we generated different semantically equivalent descriptions for each method, both manually and automatically, and we analyzed the extent to which the predictions generated by Copilot changed. Our results show that modifying the description results in different code recommendations in ~46% of cases. Also, differences in the semantically equivalent descriptions might impact the correctness of the generated code (±28%).
Pages: 2149-2160
Page count: 12