Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1, 2, 3, 4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which the dimensions have no actual meaning, which weakens interpretability. In addition, HDP always suffers from the class imbalance problem.
Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity of source and target data and deals with imbalanced data while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and by reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, introducing BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score; compared with BMEL alone, introducing AMR yields statistically significant improvements on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps to explain the prediction model.
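The abstract describes BMEL only at a high level: build several balanced subsets of the source data, then aggregate the classifiers trained on them. The following is a minimal sketch of that general idea, not the authors' implementation. Everything here is an assumption made for illustration: the subsets are built by randomly undersampling the majority class, the base learner is a simple nearest-centroid classifier, aggregation is a majority vote, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def balanced_subsets(X, y, n_subsets=5, seed=0):
    """Build class-balanced subsets by undersampling the majority class (assumed strategy)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        # Keep every minority instance; draw an equal number of majority instances.
        idx = minority + rng.sample(majority, len(minority))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_one(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def ensemble_predict(models, x):
    """Aggregate the base learners by majority vote (assumed aggregation rule)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy imbalanced source data: six clean modules (0) and two defective ones (1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [0.0, 0.4], [0.4, 0.1], [2.0, 2.1], [2.2, 1.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

subsets = balanced_subsets(X, y, n_subsets=5)
models = [fit_centroids(Xs, ys) for Xs, ys in subsets]
prediction_defective = ensemble_predict(models, [2.1, 2.0])
prediction_clean = ensemble_predict(models, [0.1, 0.1])
```

In this sketch each subset reuses all minority-class instances and a different random draw of majority-class instances, so every base learner sees balanced data while the ensemble as a whole still covers much of the majority class. An odd `n_subsets` avoids tied votes in the binary case.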
Pages: 14
Related papers
  • [1] Heterogeneous Cross-Company Defect Prediction by Unified Metric Representation and CCA-Based Transfer Learning
    Jing, Xiaoyuan
    Wu, Fei
    Dong, Xiwei
    Qi, Fumin
    Xu, Baowen
    2015 10TH JOINT MEETING OF THE EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND THE ACM SIGSOFT SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (ESEC/FSE 2015) PROCEEDINGS, 2015, : 496 - 507
  • [2] Heterogeneous Defect Prediction Using Ensemble Learning Technique
    Ansari, Arsalan Ahmed
    Iqbal, Amaan
    Sahoo, Bibhudatta
    ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY COMPUTATIONS IN ENGINEERING SYSTEMS, 2020, 1056 : 283 - 293
  • [3] Heterogeneous Defect Prediction through Multiple Kernel Learning and Ensemble Learning
    Li, Zhiqiang
    Jing, Xiao-Yuan
    Zhu, Xiaoke
    Zhang, Hongyu
    2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME), 2017, : 91 - 102
  • [4] Heterogeneous defect prediction with two-stage ensemble learning
    Zhiqiang Li
    Xiao-Yuan Jing
    Xiaoke Zhu
    Hongyu Zhang
    Baowen Xu
    Shi Ying
    Automated Software Engineering, 2019, 26 : 599 - 651
  • [5] Heterogeneous defect prediction with two-stage ensemble learning
    Li, Zhiqiang
    Jing, Xiao-Yuan
    Zhu, Xiaoke
    Zhang, Hongyu
    Xu, Baowen
    Ying, Shi
    AUTOMATED SOFTWARE ENGINEERING, 2019, 26 (03) : 599 - 651
  • [6] Few-Shot Learning Based Balanced Distribution Adaptation for Heterogeneous Defect Prediction
    Wang, Aili
    Zhang, Yutong
    Wu, Haibin
    Jiang, Kaiyuan
    Wang, Minhui
    IEEE ACCESS, 2020, 8 : 32989 - 33001
  • [7] Ensemble learning based software defect prediction
    Dong, Xin
    Liang, Yan
    Miyamoto, Shoichiro
    Yamaguchi, Shingo
    JOURNAL OF ENGINEERING RESEARCH, 2023, 11 (04) : 377 - 391
  • [8] Heterogeneous Defect Prediction through Correlation-Based Selection of Multiple Source Projects and Ensemble Learning
    Kim, Eunseob
    Baik, Jongmoon
    Ryu, Duksan
    2021 IEEE 21ST INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2021), 2021, : 503 - 513
  • [9] Heterogeneous Defect Prediction Based on Federated Prototype Learning
    Wang, Aili
    Yang, Linlin
    Wu, Haibin
    Iwahori, Yuji
    IEEE ACCESS, 2023, 11 : 98618 - 98632
  • [10] Software defect prediction model based on distance metric learning
    Cong Jin
    Soft Computing, 2021, 25 : 447 - 461