Aligned metric representation based balanced multiset ensemble learning for heterogeneous defect prediction

Cited by: 13
Authors
Chen, Haowen [1]
Jing, Xiao-Yuan [1,2,3,4]
Zhou, Yuming [4]
Li, Bing [1]
Xu, Baowen [4]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan, Peoples R China
[2] Guangdong Univ Petrochem Technol, Sch Comp Sci, Maoming, Peoples R China
[3] Guangdong Univ Petrochem Technol, Guangdong Prov Key Lab Petrochem Equipment Fault, Maoming, Peoples R China
[4] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
Keywords
Heterogeneous defect prediction; Class imbalance learning; Aligned metric representation; Ensemble learning; Balanced multiset; CODE; MODELS; MACHINE; FAULTS
DOI
10.1016/j.infsof.2022.106892
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Heterogeneous defect prediction (HDP) refers to defect prediction across projects that use different metrics. Most existing HDP methods map source and target data into a common metric space in which the dimensions have no actual meaning, which weakens their interpretability. In addition, HDP always suffers from the class imbalance problem.
Objective: To address these deficiencies of current HDP methods, we propose a novel HDP approach that reduces the heterogeneity of source and target data and deals with imbalanced data while retaining the actual meaning of each dimension of the constructed common metric space.
Method: We propose an Aligned Metric Representation based Balanced Multiset Ensemble Learning (BMEL+AMR) approach for HDP. AMR consists of shared, source-specific, and target-specific metrics. It is built by learning the translation from shared to specific metrics and reducing the distribution difference. To deal with imbalanced data, we design BMEL, which constructs multiple balanced subsets of the source data and produces an aggregated classifier for predicting the labels of target data.
Result: Experimental results on 22 public projects indicate that (1) among all competing methods, BMEL+AMR achieves the best performance on all indicators except Popt, followed by AMR; (2) compared with AMR alone, adding BMEL yields statistically significant improvements on the non-effort-aware indicators except F1-score, and compared with BMEL alone, adding AMR yields statistically significant improvements on all indicators.
Conclusion: BMEL+AMR can effectively improve HDP performance by eliminating heterogeneity and dealing with imbalanced data, and AMR helps to explain the prediction model.
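The balanced-subset idea described in the Method part can be illustrated with a minimal sketch. This is not the authors' BMEL implementation: the abstract does not specify how subsets are drawn, which base learner is used, or how outputs are aggregated, so the sketch assumes random undersampling of the majority class, a toy nearest-centroid base learner, and majority voting. It also assumes source and target data already share one metric space (the role AMR plays in the paper). All function names and the sample data are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(X, y, n_subsets, seed=0):
    """Build n_subsets class-balanced training sets (assumed strategy).

    Each subset keeps every minority-class sample and adds an equally
    sized random sample of the majority class.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        idx = minority + rng.sample(majority, len(minority))
        subsets.append(([X[i] for i in idx], [y[i] for i in idx]))
    return subsets


def fit_centroids(X, y):
    """Toy base learner (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroids(centroids, x):
    """Return the label of the nearest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def ensemble_predict(models, x):
    """Aggregate the base learners by majority vote (assumed aggregation)."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Made-up imbalanced "source" data: 2 defective modules (label 1), 8 clean (label 0).
X_src = [[9.0, 8.0], [8.0, 9.5]] + [[float(i % 3), float(i % 2)] for i in range(8)]
y_src = [1, 1] + [0] * 8

models = [fit_centroids(Xs, ys)
          for Xs, ys in balanced_subsets(X_src, y_src, n_subsets=5)]
print(ensemble_predict(models, [8.5, 9.0]))  # module resembling the defective ones
print(ensemble_predict(models, [0.5, 0.5]))  # module resembling the clean ones
```

Using an odd number of subsets avoids tied votes in the binary case. A practical setup would replace the nearest-centroid learner with a stronger classifier and would tune the number of subsets.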
Pages: 14