Generation-based Code Review Automation: How Far Are We?

Cited by: 6
Authors
Zhou, Xin [1 ]
Kim, Kisub [1 ]
Xu, Bowen [1 ]
Han, DongGyun [2 ]
He, Junda [1 ]
Lo, David [1 ]
Affiliations
[1] Singapore Management Univ, Singapore, Singapore
[2] Royal Holloway Univ London, London, England
Source
2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC | 2023
Funding
National Research Foundation, Singapore
DOI
10.1109/ICPC58990.2023.00036
CLC classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Code review is an effective software quality assurance activity, but it is labor-intensive and time-consuming. A number of generation-based automatic code review (ACR) approaches have therefore been proposed recently, which leverage deep learning techniques to automate various activities in the code review process (e.g., code revision generation and review comment generation). We find that prior work carries three main limitations. First, although each ACR approach has been shown to be beneficial in its own evaluation, these methods have not been comprehensively compared against one another to establish which performs best. Second, general-purpose pre-trained models such as CodeT5 have proven effective on a wide range of Software Engineering (SE) tasks, yet no prior work has investigated their effectiveness on ACR tasks. Third, prior work relies heavily on the Exact Match (EM) metric, which only credits perfect predictions and ignores the positive progress made by partially correct answers. To fill this research gap, we conduct a comprehensive study comparing the effectiveness of recent ACR tools and general-purpose pre-trained models. The results show that the general-purpose pre-trained model CodeT5 outperforms the other models in most cases; specifically, it outperforms the prior state-of-the-art by 13.4%-38.9% on two code revision generation tasks. In addition, we introduce a new metric, Edit Progress (EP), to quantify the partial progress made by ACR tools. The results show that the ranking of models on each task can change depending on whether EM or EP is used. Lastly, we derive several insightful lessons from the experimental results and identify future research directions for generation-based code review automation.
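To make the difference between the two metrics concrete, the sketch below contrasts Exact Match with one plausible formulation of Edit Progress, taken here as the relative reduction in token-level edit distance from the original code to the reference. The function names and the EP formula are illustrative assumptions for this sketch, not the paper's exact definitions.

```python
# Illustrative sketch (not the paper's implementation): Exact Match vs. an
# assumed Edit Progress formulation based on token-level edit distance.

def exact_match(prediction: str, reference: str) -> float:
    """EM: credits a prediction only when it equals the reference exactly."""
    return 1.0 if prediction == reference else 0.0

def edit_distance(a: str, b: str) -> int:
    """Token-level Levenshtein distance via single-row dynamic programming."""
    a_toks, b_toks = a.split(), b.split()
    dp = list(range(len(b_toks) + 1))
    for i, at in enumerate(a_toks, 1):
        prev, dp[0] = dp[0], i
        for j, bt in enumerate(b_toks, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete a token
                                     dp[j - 1] + 1,      # insert a token
                                     prev + (at != bt))  # substitute / keep
    return dp[-1]

def edit_progress(original: str, prediction: str, reference: str) -> float:
    """EP (assumed form): fraction of the distance between the original code
    and the reference that the prediction has covered. 1.0 is a perfect
    revision; 0.0 means no progress over returning the original unchanged."""
    baseline = edit_distance(original, reference)
    if baseline == 0:
        return 1.0
    return (baseline - edit_distance(prediction, reference)) / baseline

# An imperfect revision gets no credit under EM but partial credit under EP.
original = "return a + a ;"
reference = "return a + b + c ;"
prediction = "return a + b ;"  # applies one of the three required edits
print(exact_match(prediction, reference))                        # 0.0
print(round(edit_progress(original, prediction, reference), 2))  # 0.33
```

Under EM, a tool's score is determined solely by how often it reproduces the reference verbatim, whereas EP also rewards revisions that move part of the way there; this is why the two metrics can rank the same models differently.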
Pages: 215 - 226
Number of pages: 12
Related papers
50 records in total
  • [1] Natural Language to Code: How Far Are We?
    Wang, Shangwen
    Geng, Mingyang
    Lin, Bo
    Sun, Zhensu
    Wen, Ming
    Liu, Yepang
    Li, Li
    Bissyande, Tegawende F.
    Mao, Xiaoguang
    PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023, : 375 - 387
  • [2] Building automation: How far we've come
    Goldschmidt, Ira
    Ehrlich, Paul
ENGINEERED SYSTEMS, 2011, 28 (06)
  • [3] Automatically Assessing Code Understandability: How Far Are We?
    Scalabrino, Simone
    Bavota, Gabriele
    Vendome, Christopher
    Linares-Vasquez, Mario
    Poshyvanyk, Denys
    Oliveto, Rocco
    PROCEEDINGS OF THE 2017 32ND IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE'17), 2017, : 417 - 427
  • [4] How far are we from reproducible research on code smell detection? A systematic literature review
    Lewowski, Tomasz
    Madeyski, Lech
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 144
  • [5] Automation in code Generation: Tertiary and Systematic Mapping Review
    Ibn Batouta, Zouhair
    Dehbi, Rachid
    Talea, Mohammed
    Hajoui, Omar
    2016 4TH IEEE INTERNATIONAL COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST), 2016, : 200 - 205
  • [6] Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?
    Liu, Zhongxin
    Xia, Xin
    Hassan, Ahmed E.
    Lo, David
    Xing, Zhenchang
    Wang, Xinyu
    PROCEEDINGS OF THE 2018 33RD IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING (ASE'18), 2018, : 373 - 384
  • [7] HOW FAR IS AUTOMATION GOING
    [Anonymous]
    WELDING AND METAL FABRICATION, 1978, 46 (05): 359 - 359
  • [8] Model-based code generation works: But how far does it go?-on the role of the generator
    Combemale, Benoit
    Gray, Jeff
    Rumpe, Bernhard
    SOFTWARE AND SYSTEMS MODELING, 2024, 23 (02): 267 - 268
  • [9] Commit Message Generation via ChatGPT: How Far Are We?
    Wu, Yifan
    Li, Ying
    Yu, Siyu
    PROCEEDINGS 2024 IEEE/ACM FIRST INTERNATIONAL CONFERENCE ON AI FOUNDATION MODELS AND SOFTWARE ENGINEERING, FORGE 2024, 2024, : 124 - 129