On the "Naturalness" of Buggy Code

Times cited: 145

Authors:

Ray, Baishakhi [1]

Hellendoorn, Vincent [2]

Godhane, Saheel [2]

Tu, Zhaopeng [3]

Bacchelli, Alberto [4]

Devanbu, Premkumar [2]

Affiliations:

[1] Univ Virginia, Charlottesville, VA 22903 USA

[2] Univ Calif Davis, Davis, CA 95616 USA

[3] Huawei Technol Co Ltd, Shenzhen, Guangdong, Peoples R China

[4] Delft Univ Technol, Delft, Netherlands

Source:

2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) | 2016

Funding:

U.S. National Science Foundation;

Keywords:

PREDICTING FAULTS;

DOI:

10.1145/2884781.2884848

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202; 0835;

Abstract:

Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be "natural", like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is "unnatural" in some sense, and thus possibly suspicious. In this paper, we investigate this hypothesis. We consider a large corpus of bug fix commits (ca. 7,139) from 10 different Java projects, and focus on its language statistics, evaluating the naturalness of buggy code and the corresponding fixes. We find that code with bugs tends to be more entropic (i.e., unnatural), becoming less so as bugs are fixed. Ordering files for inspection by their average entropy yields cost-effectiveness scores comparable to popular defect prediction methods. At a finer granularity, focusing on highly entropic lines is similar in cost-effectiveness to some well-known static bug finders (PMD, FindBugs), and ordering warnings from these bug finders using an entropy measure improves the cost-effectiveness of inspecting code implicated in warnings. This suggests that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault localization and for searching for fixes.
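The entropy-based ordering described in the abstract can be illustrated with a small sketch. It assumes a trigram model with add-one (Laplace) smoothing, which is a simplified stand-in and not the language model used in the paper. The entropy of a line with tokens t_1, ..., t_N is taken as its average surprisal, H = -(1/N) * sum_i log2 p(t_i | t_{i-2}, t_{i-1}), and lines (or files, by averaging over their lines) are then ordered by descending entropy for inspection. The names `NGramModel`, `tokenize`, `rank_lines_by_entropy`, and `file_entropy` are hypothetical, and the regex tokenizer is a crude placeholder for a real Java lexer.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def tokenize(line: str) -> List[str]:
    """Hypothetical crude lexer: identifiers/numbers, or single punctuation chars."""
    return re.findall(r"\w+|[^\w\s]", line)


class NGramModel:
    """Hypothetical n-gram model with add-one (Laplace) smoothing.

    A simplified stand-in for illustration; not the paper's language model.
    """

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.ngram_counts: Counter = Counter()
        self.context_counts: Counter = Counter()
        self.vocab: set = set()

    def _ngrams(self, tokens: Sequence[str]):
        # Pad with start symbols so every token has an (n-1)-token context.
        padded = ["<s>"] * (self.n - 1) + list(tokens)
        for i in range(self.n - 1, len(padded)):
            yield tuple(padded[i - self.n + 1:i]), padded[i]

    def train(self, token_lines: Sequence[Sequence[str]]) -> None:
        for tokens in token_lines:
            self.vocab.update(tokens)
            for context, token in self._ngrams(tokens):
                self.ngram_counts[context + (token,)] += 1
                self.context_counts[context] += 1

    def prob(self, context: Tuple[str, ...], token: str) -> float:
        # One extra vocabulary slot reserves probability mass for unseen tokens.
        v = len(self.vocab) + 1
        return (self.ngram_counts[context + (token,)] + 1) / (
            self.context_counts[context] + v
        )

    def line_entropy(self, tokens: Sequence[str]) -> float:
        """Average negative log2 probability per token, in bits."""
        if not tokens:
            return 0.0
        surprisal = sum(-math.log2(self.prob(c, t)) for c, t in self._ngrams(tokens))
        return surprisal / len(tokens)


def rank_lines_by_entropy(
    model: NGramModel, lines: Dict[int, List[str]]
) -> List[Tuple[int, float]]:
    """Order (line number, entropy) pairs from most to least surprising."""
    scores = [(ln, model.line_entropy(toks)) for ln, toks in lines.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


def file_entropy(model: NGramModel, lines: Sequence[List[str]]) -> float:
    """Mean entropy over the non-empty lines of a file."""
    entropies = [model.line_entropy(toks) for toks in lines if toks]
    return sum(entropies) / len(entropies) if entropies else 0.0


if __name__ == "__main__":
    corpus = [
        "for (int i = 0; i < n; i++) {",
        "for (int j = 0; j < n; j++) {",
        "for (int i = 0; i < m; i++) {",
    ]
    model = NGramModel(n=3)
    model.train([tokenize(line) for line in corpus])
    candidates = {
        10: tokenize("for (int i = 0; i < n; i++) {"),
        11: tokenize("for (int i = 0; i <= n; i--) {"),
    }
    for line_no, h in rank_lines_by_entropy(model, candidates):
        print(line_no, round(h, 3))
```

In this toy run the unusual loop header on line 11 (`<=` and `i--`) is ranked above the familiar one on line 10, because several of its trigrams never occur in the training corpus. The same ordering could in principle be applied only to lines flagged by a static bug finder, or, through `file_entropy`, to whole files; how the paper itself scores and combines these is not reproduced here.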


Pages: 428-439

Number of pages: 12
