Code Smell Detection Research Based on Pre-training and Stacking Models

Cited by: 0
Authors
Zhang, Dongwen [1 ,2 ]
Song, Shuai [1 ]
Zhang, Yang [1 ,2 ]
Liu, Haiyang [1 ]
Shen, Gaojie [1 ]
Affiliations
[1] Hebei Univ Sci & Technol, Sch Informat Sci & Engn, Shijiazhuang 050018, Hebei, Peoples R China
[2] Hebei Technol Innovat Ctr Intelligent IoT, Shijiazhuang 050018, Hebei, Peoples R China
Keywords
Codes; Feature extraction; Stacking; Measurement; Java; Training; Testing; Classifier
DOI
10.1109/TLA.2024.10375735
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Code smell detection primarily adopts heuristic-based, machine learning, and deep learning approaches. To improve accuracy, most studies employ deep learning methods, yet the value of traditional machine learning methods should not be underestimated. In addition, existing code smell detection methods do not pay sufficient attention to the textual features of the code. To address this issue, this paper proposes a code smell detection method, SCSmell, which uses static analysis tools to extract structural features, converts the code into txt format, and feeds it into the BERT pre-training model to extract textual features. The structural features are combined with the textual features to generate sample data, and code smell instances are labeled. The RFECV method is then used to filter the important structural features. To deal with data imbalance, the Borderline-SMOTE method is used to generate positive samples, and a three-layer Stacking model is finally employed to detect code smells. In our experiments, we select 44 large real-world projects as the training and testing sets and conduct detection for four types of code smells: brain class, data class, God class, and brain method. The experimental results indicate that SCSmell improves average accuracy by 10.38% compared with existing detection methods while maintaining high precision, recall, and F1 scores. SCSmell is an effective solution for code smell detection.
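The abstract describes a concrete pipeline: BERT-encoded textual features combined with static structural metrics, RFECV feature selection, Borderline-SMOTE oversampling, and a stacked ensemble classifier. The sketch below illustrates that flow, assuming scikit-learn, imbalanced-learn, and Hugging Face Transformers as the tooling; the choice of base learners, the two-level stacking layout, and all hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of an SCSmell-style pipeline (assumed tooling: scikit-learn,
# imbalanced-learn, transformers). Base learners and the stacking layout are
# illustrative stand-ins for the paper's three-layer Stacking model.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.ensemble import (RandomForestClassifier, StackingClassifier,
                              GradientBoostingClassifier)
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from imblearn.over_sampling import BorderlineSMOTE

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def textual_features(code_snippets):
    """Encode each code fragment with BERT and keep the [CLS] vector as its textual feature."""
    vecs = []
    with torch.no_grad():
        for code in code_snippets:
            toks = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
            out = bert(**toks)
            vecs.append(out.last_hidden_state[:, 0, :].squeeze(0).numpy())
    return np.vstack(vecs)

def build_dataset(structural_metrics, code_snippets, labels):
    # 1) keep the important structural metrics via recursive feature
    #    elimination with cross-validation (RFECV)
    selector = RFECV(RandomForestClassifier(n_estimators=100, random_state=0), cv=5)
    struct = selector.fit_transform(structural_metrics, labels)
    # 2) concatenate structural and textual features into one sample matrix
    X = np.hstack([struct, textual_features(code_snippets)])
    # 3) rebalance the minority (smelly) class with Borderline-SMOTE
    return BorderlineSMOTE(random_state=0).fit_resample(X, labels)

# Stacked detector: base learners feed a boosted meta-learner, which in turn
# feeds a logistic-regression top layer (an assumed approximation of the
# paper's three-layer Stacking model).
level1 = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                ("dt", DecisionTreeClassifier(max_depth=10))],
    final_estimator=GradientBoostingClassifier(),
    passthrough=True,
)
detector = StackingClassifier(
    estimators=[("level1", level1)],
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,
)
# Usage: X_res, y_res = build_dataset(metrics, snippets, labels)
#        detector.fit(X_res, y_res); detector.predict(X_test)
```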
Pages: 22-30
Number of pages: 9