An Empirical Comparison of Pre-Trained Models of Source Code

Cited by: 9
Authors
Niu, Changan [1 ]
Li, Chuanyi [1 ]
Ng, Vincent [2 ]
Chen, Dongxiao [1 ]
Ge, Jidong [1 ]
Luo, Bin [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Univ Texas Dallas, Human Language Technol Res Inst, Richardson, TX 75080 USA
Funding
National Natural Science Foundation of China
Keywords
Pre-training of Source Code; AI for SE;
DOI
10.1109/ICSE48619.2023.00180
Chinese Library Classification
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
While a large number of pre-trained models of source code have been successfully developed and applied to a variety of software engineering (SE) tasks in recent years, our understanding of these pre-trained models is arguably fairly limited. With the goal of advancing our understanding of these models, we perform the first systematic empirical comparison of 19 recently-developed pre-trained models of source code on 13 SE tasks. To gain additional insights into these models, we adopt a recently-developed 4-dimensional categorization of pre-trained models, and subsequently investigate whether there are correlations between different categories of pre-trained models and their performances on different SE tasks.
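To make the comparison setting concrete, the sketch below shows how a pre-trained model of source code is typically fine-tuned for a downstream SE task. This is a minimal illustration assuming the Hugging Face transformers API; the choice of CodeBERT (one of the compared model families) and the binary-classification framing (e.g., defect detection) are illustrative assumptions, not the paper's actual evaluation pipeline.

```python
# Illustrative sketch only: load a pre-trained model of source code and attach a
# task head for a binary SE classification task (e.g., defect detection).
# Model name and task setup are assumptions for illustration, not the paper's pipeline.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "microsoft/codebert-base"  # CodeBERT, one of the compared model families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

code_snippet = "def add(a, b):\n    return a + b"
inputs = tokenizer(code_snippet, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    # The classification head is randomly initialized; meaningful predictions
    # require fine-tuning on labeled data for the chosen SE task.
    logits = model(**inputs).logits

print(logits.softmax(dim=-1))
```

In the study's setup, each such model would be fine-tuned and evaluated per task, and the resulting scores compared across the 4-dimensional categorization of pre-trained models.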
Pages: 2136-2148
Number of pages: 13
Related Papers
50 in total (10 shown below)
• [1] Li, Jia; Li, Ge; Li, Zhuo; Jin, Zhi; Hu, Xing; Zhang, Kechi; Fu, Zhiyi. CODEEDITOR: Learning to Edit Source Code with Pre-trained Models. ACM Transactions on Software Engineering and Methodology, 2023, 32(6).
• [2] Wang, Deze; Jia, Zhouyang; Li, Shanshan; Yu, Yue; Xiong, Yun; Dong, Wei; Liao, Xiangke. Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding. 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2022: 287-298.
• [3] Casola, Silvia; Lauriola, Ivano; Lavelli, Alberto. Pre-trained transformers: an empirical comparison. Machine Learning with Applications, 2022, 9.
• [4] Yang, Zhou; Shi, Jieke; He, Junda; Lo, David. Natural Attack for Pre-trained Models of Code. 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2022: 1482-1493.
• [5] Al-Kaswan, Ali; Ahmed, Toufique; Izadi, Maliheh; Sawant, Anand Ashok; Devanbu, Premkumar; van Deursen, Arie. Extending Source Code Pre-Trained Language Models to Summarise Decompiled Binaries. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023: 260-271.
• [6] Shi, Jieke; Yang, Zhou; Xu, Bowen; Kang, Hong Jin; Lo, David. Compressing Pre-trained Models of Code into 3 MB. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE 2022), 2022.
• [7] Soliman, Ahmed; Shaheen, Samir; Hadhoud, Mayada. Leveraging pre-trained language models for code generation. Complex & Intelligent Systems, 2024, 10(3): 3955-3980.
• [8] Li, Zhihao; Li, Chuanyi; Tang, Ze; Huang, Wanhong; Ge, Jidong; Luo, Bin; Ng, Vincent; Wang, Ting; Hu, Yucheng; Zhang, Xiaopeng. PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation. ACM Transactions on Software Engineering and Methodology, 2024, 33(3).
• [9] Karmakar, Anjan; Robbes, Romain. What do pre-trained code models know about code? 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE 2021), 2021: 1332-1336.
• [10] Wan, Yao; Zhao, Wei; Zhang, Hongyu; Sui, Yulei; Xu, Guandong; Jin, Hai. What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code. 2022 ACM/IEEE 44th International Conference on Software Engineering (ICSE 2022), 2022: 2377-2388.