Investigating the Generalizability of Deep Learning-based Clone Detectors

Times cited: 1

Authors:

Choi, Eunjong [1]

Fuke, Norihiro [2]

Fujiwara, Yuji [2]

Yoshida, Norihiro [3]

Inoue, Katsuro [4]

Affiliations:

[1] Kyoto Inst Technol, Kyoto, Japan

[2] Osaka Univ, Osaka, Japan

[3] Ritsumeikan Univ, Kyoto, Japan

[4] Nanzan Univ, Nagoya, Aichi, Japan

Source:

2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC | 2023

Keywords:

code clone; deep learning; generalizability; CODE;

DOI:

10.1109/ICPC58990.2023.00032

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202; 0835;

Abstract:

The generalizability of Deep Learning (DL) models is a significant challenge: poor generalizability indicates that a model has overfitted to its training data and cannot generalize to unseen data. Although numerous DL-based clone detectors have emerged in recent years, their generalizability has not been thoroughly assessed. This study investigates the generalizability of three DL-based clone detectors (CCLearner, ASTNN, and CodeBERT) by comparing their detection accuracy across different combinations of training and testing clone benchmarks. The results show that none of the three clone detectors generalizes well to new data, and that for CCLearner and ASTNN there is a strong relationship between clone type and generalizability.
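The cross-benchmark protocol described in the abstract (train on one clone benchmark, test on another, and compare detection accuracy) can be illustrated with a minimal sketch. It assumes binary clone/non-clone labels per code pair, uses F1 as the accuracy measure, and defines a hypothetical "generalization gap" as in-domain F1 minus cross-benchmark F1. The benchmark names `bench_A`/`bench_B`, the toy labels, and the functions `f1_score` and `generalization_gaps` are illustrative assumptions, not the paper's benchmarks, data, or exact metrics.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (train_benchmark, test_benchmark) -> (gold, pred),
# where each list holds one binary label per code pair (1 = clone, 0 = not).
Results = Dict[Tuple[str, str], Tuple[List[int], List[int]]]


def f1_score(gold: List[int], pred: List[int]) -> float:
    """Binary F1 score for the clone class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is conventionally 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def generalization_gaps(results: Results) -> Dict[Tuple[str, str], float]:
    """In-domain F1 (train == test) minus cross-benchmark F1 (train != test),
    keyed by (train_benchmark, test_benchmark). Positive = accuracy drop."""
    scores = {pair: f1_score(gold, pred) for pair, (gold, pred) in results.items()}
    gaps: Dict[Tuple[str, str], float] = {}
    for (train, test), score in scores.items():
        # Only cross-benchmark pairs with a matching in-domain baseline count.
        if train != test and (train, train) in scores:
            gaps[(train, test)] = scores[(train, train)] - score
    return gaps


# Toy, made-up labels for illustration only (not results from the paper).
toy: Results = {
    ("bench_A", "bench_A"): ([1, 1, 0, 0], [1, 1, 0, 0]),  # in-domain, F1 = 1.0
    ("bench_A", "bench_B"): ([1, 1, 0, 0], [1, 0, 1, 0]),  # cross, F1 = 0.5
}
print(generalization_gaps(toy))  # {('bench_A', 'bench_B'): 0.5}
```

A per-clone-type variant (for example, computing recall separately for Type-1 through Type-4 clone pairs) would follow the same pattern and is one way to probe the clone-type relationship the abstract reports for CCLearner and ASTNN.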

Pages: 181-185

Number of pages: 5