Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3, 4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which each dimension has no actual meaning, which weakens their interpretability. In addition, HDP always suffers from the class imbalance problem. Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity of source and target data and handles imbalanced data while retaining the actual meaning of each dimension of the constructed common metric space. Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data. Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, introducing BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score; compared with BMEL alone, introducing AMR yields statistically significant improvements on all indicators. Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps explain the prediction model.
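The abstract describes BMEL only at a high level: several balanced subsets are built from the source data, and the resulting classifiers are aggregated. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes random undersampling of the majority class, a nearest-centroid base classifier, and majority voting, none of which the abstract specifies. It also assumes source and target data already share a metric space (for example after an alignment step such as AMR, which is not implemented here). All function names are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets, seed=0):
    """Build n_subsets balanced training sets (assumed strategy: keep every
    minority-class sample, draw an equal number of majority-class samples)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        sampled = rng.sample(majority, len(minority))
        idx = minority + sampled
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Assumed base learner: one mean vector (centroid) per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def ensemble_predict(models, x):
    """Aggregate the base learners by majority vote (assumed aggregation rule)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy, made-up source data: 20 clean modules (label 0), 4 defective (label 1).
clean = [[i * 0.1, i * 0.05] for i in range(20)]
defective = [[5 + j * 0.1, 5 - j * 0.1] for j in range(4)]
X_source = clean + defective
y_source = [0] * len(clean) + [1] * len(defective)

# An odd number of subsets avoids tied votes in this two-class setting.
subsets = balanced_subsets(X_source, y_source, n_subsets=5)
models = [fit_centroids(Xs, ys) for Xs, ys in subsets]
```

Each subset here holds the four defective samples plus four randomly drawn clean samples, so every base learner trains on a 1:1 class ratio while the ensemble as a whole still sees a varied portion of the majority class.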
Pages: 14