Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1,2,3,4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which the individual dimensions have no actual meaning, which weakens interpretability. In addition, HDP always suffers from the class imbalance problem.
Objective: To address these deficiencies, we propose a novel HDP approach that reduces the heterogeneity between source and target data and handles imbalanced data, while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, introducing BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score, and compared with BMEL alone, introducing AMR yields statistically significant improvements on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps explain the prediction model.
Pages: 14
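
Illustrative sketch
The abstract describes BMEL only at a high level: build multiple balanced subsets of the source data, then aggregate the resulting classifiers. The Python sketch below shows one generic way such a scheme could look. It is not the authors' implementation. It assumes random undersampling of the majority class, a toy nearest-centroid base learner, and majority voting as the aggregation rule, and it omits AMR entirely. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets, rng):
    """Pair every minority-class row with an equal-sized random draw
    of majority-class rows, once per subset (assumed sampling scheme)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Toy base learner (a stand-in): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def ensemble_predict(models, x):
    """Aggregate the base learners by majority vote (assumed rule)."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical imbalanced source data: 20 clean modules (label 0)
# and 4 defective modules (label 1), two metrics per module.
X = [[(i % 3) * 0.1, (i % 2) * 0.1] for i in range(20)]
X += [[5 + 0.1 * j, 5 - 0.1 * j] for j in range(4)]
y = [0] * 20 + [1] * 4

rng = random.Random(0)
subsets = balanced_subsets(X, y, n_subsets=5, rng=rng)  # odd count avoids tied votes
models = [fit_centroids(Xs, ys) for Xs, ys in subsets]

print(ensemble_predict(models, [4.8, 5.1]))
```

Each subset here contains four rows of each class, so every base learner is trained on balanced data even though the full source set is skewed 5:1.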