Code Smell Detection Research Based on Pre-training and Stacking Models

Cited by: 0
Authors
Zhang, Dongwen [1 ,2 ]
Song, Shuai [1 ]
Zhang, Yang [1 ,2 ]
Liu, Haiyang [1 ]
Shen, Gaojie [1 ]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Hebei, Peoples R China
[2] Hebei Technol Innovat Ctr Intelligent IoT, Shijiazhuang 050018, Hebei, Peoples R China
Keywords
Codes; Feature extraction; Stacking; Measurement; Java; Training; Testing; Classifier
DOI
10.1109/TLA.2024.10375735
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Code smell detection primarily adopts heuristic-based, machine learning, and deep learning approaches. To improve accuracy, most studies employ deep learning methods, yet the value of traditional machine learning methods should not be underestimated. In addition, existing code smell detection methods do not pay sufficient attention to the textual features of the code. To address this issue, this paper proposes a code smell detection method, SCSmell, which uses static analysis tools to extract structural features, converts the code into txt format, and feeds it into the BERT pre-training model to extract textual features. The structural features are combined with the textual features to generate sample data, and code smell instances are labeled. The RFECV method is then used to filter the important structural features. To deal with data imbalance, the Borderline-SMOTE method is used to generate positive samples, and a three-layer Stacking model is finally employed to detect code smells. In our experiments, we select 44 large real-world projects as the training and testing sets and conduct detection for four types of code smells: brain class, data class, God class, and brain method. The experimental results indicate that SCSmell improves average accuracy by 10.38% compared with existing detection methods while maintaining high precision, recall, and F1 scores. SCSmell is an effective solution for code smell detection.
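The abstract describes a concrete pipeline: BERT-encoded textual features combined with static structural metrics, RFECV feature selection, Borderline-SMOTE oversampling, and a stacked ensemble classifier. The sketch below illustrates that flow, assuming scikit-learn, imbalanced-learn, and Hugging Face Transformers as the tooling; the choice of base learners, the two-level stacking layout, and all hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of an SCSmell-style pipeline (assumed tooling: scikit-learn,
# imbalanced-learn, transformers). Base learners and the stacking layout are
# illustrative stand-ins for the paper's three-layer Stacking model.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              GradientBoostingClassifier)
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import BorderlineSMOTE

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def textual_features(code_snippets):
    """Encode each code fragment with BERT and keep the [CLS] vector as its textual feature."""
    vecs = []
    with torch.no_grad():
        for code in code_snippets:
            toks = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
            out = bert(**toks)
            vecs.append(out.last_hidden_state[:, 0, :].squeeze(0).numpy())
    return np.vstack(vecs)

def build_dataset(structural_metrics, code_snippets, labels):
    # 1) keep the important structural metrics via recursive feature
    #    elimination with cross-validation (RFECV)
    selector = RFECV(RandomForestClassifier(n_estimators=100, random_state=0), cv=5)
    struct = selector.fit_transform(structural_metrics, labels)
    # 2) concatenate structural and textual features into one sample matrix
    X = np.hstack([struct, textual_features(code_snippets)])
    # 3) rebalance the minority (smelly) class with Borderline-SMOTE
    return BorderlineSMOTE(random_state=0).fit_resample(X, labels)

# Stacked detector: base learners feed a boosted meta-learner, which in turn
# feeds a logistic-regression top layer (an assumed approximation of the
# paper's three-layer Stacking model).
level1 = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("dt", DecisionTreeClassifier(max_depth=10))],
    final_estimator=GradientBoostingClassifier(),
    passthrough=True,
)
detector = StackingClassifier(
    estimators=[("level1", level1)],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,
)
# Usage: X_res, y_res = build_dataset(metrics, snippets, labels)
#        detector.fit(X_res, y_res); detector.predict(X_test)
```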
Pages: 22-30
Number of pages: 9