Dependency-Aware Code Naturalness

Authors
Yang, Chen [1]
Chen, Junjie [1]
Jiang, Jiajun [1]
Huang, Yuliang [1]
Affiliations
[1] College of Intelligence and Computing, Tianjin University, Tianjin, China
DOI
10.1145/3689794
Abstract
Code naturalness, which captures repetitiveness and predictability in programming languages, has proven valuable for various code-related tasks in software engineering. However, precisely measuring code naturalness remains a fundamental challenge. Existing methods measure code naturalness over individual lines of code while ignoring the deep semantic relations among different lines, e.g., program dependency, which may reduce the precision of the measure. Extending the code naturalness measure to the code dependency domain is intuitively appealing, since some prior work has already begun to use code dependency for diverse code-related tasks, but this assumption remains unexplored and warrants direct investigation. In this study, we perform the first empirical study of whether incorporating code dependency, instead of analyzing individual lines, can enhance the precision of measuring code naturalness. To that end, we first propose a new method named DAN for measuring code naturalness by incorporating the rich dependency information in the code. Specifically, DAN extracts multiple sequences of code lines by traversing the program dependency graph, where the code lines in each sequence are connected by dependencies, and then measures code naturalness by taking each sequence as a whole. In this way, the dependency information can be well captured. Finally, we conducted an extensive study to evaluate the influence of code dependency on measuring code naturalness with DAN, and compared it with the state-of-the-art methods under three emerging application scenarios of code naturalness. The results demonstrate that DAN can not only better distinguish natural and unnatural code, but also substantially boost two important downstream applications of code naturalness, i.e., distinguishing buggy and non-buggy code lines and data cleansing for training better code models, reflecting the significance of code dependency in measuring code naturalness. © 2024 Owner/Author.
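
The idea described in the abstract, extracting dependency-connected sequences of lines and scoring each sequence as a whole, can be illustrated with a minimal sketch. This is not the authors' implementation of DAN. Every name here (`dependency_sequences`, `train_unigram`, `cross_entropy`, `sequence_naturalness`) is hypothetical, the dependency graph is supplied by hand and assumed acyclic, and a smoothed unigram model with cross-entropy as the score is an assumed stand-in for whatever language model and metric the paper uses.

```python
import math
from collections import Counter


def dependency_sequences(lines, deps):
    """Enumerate root-to-leaf paths over line indices.

    `deps` maps a line index to the indices of lines that depend on it.
    Assumption: the graph is acyclic (real dependency graphs may not be).
    """
    has_parent = {child for children in deps.values() for child in children}
    roots = [i for i in range(len(lines)) if i not in has_parent]
    sequences = []

    def walk(node, path):
        path = path + [node]
        children = deps.get(node, [])
        if not children:
            sequences.append(path)  # reached a line nothing depends on
            return
        for child in children:
            walk(child, path)

    for root in roots:
        walk(root, [])
    return sequences


def train_unigram(corpus_lines):
    """Count whitespace-separated tokens (assumed stand-in for a real model)."""
    counts = Counter(tok for line in corpus_lines for tok in line.split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model):
    """Average negative log2 probability per token, with add-one smoothing."""
    if not tokens:
        raise ValueError("cannot score an empty token sequence")
    counts, total = model
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    log_prob = sum(math.log2((counts[t] + 1) / (total + vocab)) for t in tokens)
    return -log_prob / len(tokens)


def sequence_naturalness(lines, deps, model):
    """Score each dependency sequence as a whole rather than line by line."""
    scores = {}
    for seq in dependency_sequences(lines, deps):
        tokens = [tok for i in seq for tok in lines[i].split()]
        scores[tuple(seq)] = cross_entropy(tokens, model)
    return scores


# Toy usage: line 0 defines `a`; lines 1 and 2 use it; line 3 uses `b`.
lines = ["a = 1", "b = a + 1", "c = a * 2", "print ( b )"]
deps = {0: [1, 2], 1: [3]}
model = train_unigram(lines)
scores = sequence_naturalness(lines, deps, model)
```

In this sketch a lower cross-entropy is read as "more natural", following the usual convention in the code naturalness literature. The point of the design is only that the unit being scored is a chain of dependent lines, not a single line.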