Investigating the Generalizability of Deep Learning-based Clone Detectors

Authors
Choi, Eunjong [1]
Fuke, Norihiro [2]
Fujiwara, Yuji [2]
Yoshida, Norihiro [3]
Inoue, Katsuro [4]
Affiliations
[1] Kyoto Institute of Technology, Kyoto, Japan
[2] Osaka University, Osaka, Japan
[3] Ritsumeikan University, Kyoto, Japan
[4] Nanzan University, Nagoya, Aichi, Japan
Keywords
code clone; deep learning; generalizability; CODE;
DOI
10.1109/ICPC58990.2023.00032
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
The generalizability of Deep Learning (DL) models is a significant challenge: poor generalizability indicates that a model has overfitted to its training data and cannot generalize to new data. Although numerous DL-based clone detectors have emerged in recent years, their generalizability has not been thoroughly assessed. This study investigates the generalizability of three DL-based clone detectors (CCLearner, ASTNN, and CodeBERT) by comparing their detection accuracy across different training and testing clone benchmarks. The results show that none of the three clone detectors generalizes well to new data, and that for CCLearner and ASTNN there is a strong relationship between clone types and generalizability.
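To make the cross-benchmark comparison concrete, here is a minimal sketch, not the authors' evaluation code. It assumes binary clone/non-clone labels scored with F1 (the abstract only says "detection accuracy"), two hypothetical benchmarks named bench_A and bench_B filled with made-up toy pairs, and a token-Jaccard threshold detector standing in for CCLearner, ASTNN, or CodeBERT. Each detector is trained once per benchmark and tested on every benchmark. The generalization gap is the within-benchmark F1 minus the mean cross-benchmark F1.

```python
from statistics import mean

# Hypothetical toy benchmarks (NOT data from the paper). Each split is
# (list of code-fragment pairs, list of labels), label 1 = clone pair.
BENCHMARKS = {
    "bench_A": {  # clones are near-identical copies
        "train": ([("a b c d", "a b c d"), ("a b c d", "a b c e"),
                   ("a b c d", "w x y z"), ("a b c d", "a x y z")], [1, 1, 0, 0]),
        "test": ([("p q r s", "p q r s"), ("p q r s", "t u v w")], [1, 0]),
    },
    "bench_B": {  # clones share fewer tokens (e.g. renamed identifiers)
        "train": ([("a b c d", "a b x y"), ("a b c d", "a b c d"),
                   ("a b c d", "w x y z")], [1, 1, 0]),
        "test": ([("p q r s", "p q t u"), ("p q r s", "w x y z")], [1, 0]),
    },
}


def f1_score(y_true, y_pred):
    """F1 on binary clone labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def jaccard(a, b):
    """Token-set Jaccard similarity of two whitespace-tokenized fragments."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


# Stand-in "detector" (an assumption, not CCLearner/ASTNN/CodeBERT):
# training picks the similarity threshold with the best training-set F1.
def train_threshold(split):
    pairs, labels = split
    sims = [jaccard(a, b) for a, b in pairs]
    return max(sorted(set(sims)),
               key=lambda th: f1_score(labels, [int(s >= th) for s in sims]))


def predict_threshold(threshold, pairs):
    """Label a pair as a clone when its Jaccard similarity reaches the threshold."""
    return [int(jaccard(a, b) >= threshold) for a, b in pairs]


def cross_benchmark_matrix(benchmarks, train_fn, predict_fn):
    """Train once per benchmark, then score on every benchmark's test split."""
    scores = {}
    for train_name, train_data in benchmarks.items():
        model = train_fn(train_data["train"])
        for test_name, test_data in benchmarks.items():
            pairs, labels = test_data["test"]
            scores[(train_name, test_name)] = f1_score(labels, predict_fn(model, pairs))
    return scores


def generalization_gap(scores, benchmarks):
    """Within-benchmark F1 minus mean cross-benchmark F1, per training benchmark."""
    return {
        train: scores[(train, train)]
        - mean(scores[(train, test)] for test in benchmarks if test != train)
        for train in benchmarks
    }


if __name__ == "__main__":
    scores = cross_benchmark_matrix(BENCHMARKS, train_threshold, predict_threshold)
    for (train, test), f1 in scores.items():
        print(f"train={train} test={test} F1={f1:.2f}")
    print(generalization_gap(scores, BENCHMARKS))
```

Under these toy assumptions, the threshold tuned on bench_A's near-identical clones misses bench_B's clones that share fewer tokens. As a result, bench_A shows a large gap and bench_B does not. The sketch only illustrates how a train/test benchmark matrix is organized and says nothing about the paper's measured results.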
Pages: 181-185
Number of pages: 5