A three-stage transfer learning framework for multi-source cross-project software defect prediction

Cited by: 19
Authors
Bai, Jiaojiao [1]
Jia, Jingdong [1]
Capretz, Luiz Fernando [2]
Affiliations
[1] Beihang Univ, Sch Software, 37 Xueyuan Rd, Beijing 100191, Peoples R China
[2] Western Univ, Elect & Comp Engn, London, ON, Canada
Keywords
Transfer learning; Cross-project defect prediction; Source selection; Multi-source utilization; 3SW-MSTL; SUPPORT VECTOR MACHINE; MODELS;
DOI
10.1016/j.infsof.2022.106985
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Transfer learning techniques have proven effective in cross-project defect prediction (CPDP). However, some questions remain. First, the conditional distribution difference between source and target projects has not been considered. Second, when multiple source projects are available, most studies rarely consider source selection or multi-source data utilization; instead, they use all available projects and merge the multi-source data into one final dataset.
Objective: To address these issues, we propose a three-stage weighting framework for multi-source transfer learning (3SW-MSTL) in CPDP. In stage 1, a source selection strategy selects a suitable number of source projects from all available projects. In stage 2, a transfer technique is applied to minimize marginal distribution differences. In stage 3, a multi-source data utilization scheme that uses conditional distribution information guides the use of the multi-source transferred data.
Method: First, we designed five source selection strategies and four multi-source utilization schemes, compared their influence on prediction performance, and chose the best of each for stages 1 and 3 of 3SW-MSTL. Second, to validate the performance of 3SW-MSTL, we compared it with four multi-source and six single-source CPDP methods, a baseline within-project defect prediction (WPDP) method, and two unsupervised methods on data from 30 widely used open-source projects.
Results: In the experiments, bellwether and weighted vote were chosen as the source selection strategy and the multi-source utilization scheme of 3SW-MSTL, respectively. The results indicate that 3SW-MSTL outperforms the four multi-source CPDP methods, the six single-source CPDP methods, and the two unsupervised methods, and that it is comparable to the WPDP method.
Conclusion: The proposed 3SW-MSTL model is more effective because it addresses the two issues mentioned above.
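The abstract names weighted vote as the stage 3 scheme but does not give the weighting formula. As a rough illustration only, the sketch below combines binary predictions from several per-source classifiers using fixed weights. The function name `weighted_vote`, the 0.5 threshold, and the example weights are assumptions made here for illustration; in the paper the weights are derived from conditional distribution information, which this sketch does not reproduce.

```python
from typing import List, Sequence


def weighted_vote(
    predictions: Sequence[Sequence[int]],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> List[int]:
    """Combine per-source binary predictions (1 = defective) by weighted vote.

    predictions[k][i] is the label that the classifier trained on source
    project k assigns to target module i. weights[k] is a hypothetical
    trust score for source k (not the paper's weighting formula).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per source classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_modules = len(predictions[0])
    combined = []
    for i in range(n_modules):
        # Weighted share of sources that flag module i as defective.
        score = sum(w * preds[i] for preds, w in zip(predictions, weights)) / total
        combined.append(1 if score >= threshold else 0)
    return combined


# Three source classifiers, four target modules (made-up labels).
per_source = [
    [1, 0, 1, 0],  # source A
    [0, 0, 1, 1],  # source B
    [1, 0, 0, 1],  # source C
]
weighted = weighted_vote(per_source, weights=[4, 2, 1])
majority = weighted_vote(per_source, weights=[1, 1, 1])
```

With these made-up inputs the two calls disagree on the last module: sources B and C outvote A under equal weights, while the heavier weight on A reverses that outcome. This is the kind of effect a weighting scheme is meant to produce when one source is judged closer to the target.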
Pages: 16
Related papers
50 in total
  • [1] A three-stage transfer learning framework for multi-source cross-project software defect prediction
    Bai, Jiaojiao
    Jia, Jingdong
    Capretz, Luiz Fernando
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 150
  • [3] A Three-Stage Defect Prediction Model for Cross-Project Defect Prediction
    Huang, Song
    Wu, Yaning
    Ji, Haijin
    Bai, Chengzu
    [J]. 2017 FOURTH INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND THEIR APPLICATIONS (DSA 2017), 2017, : 169 - 169
  • [5] MHCPDP: multi-source heterogeneous cross-project defect prediction via multi-source transfer learning and autoencoder
    Wu, Jie
    Wu, Yingbo
    Niu, Nan
    Zhou, Min
    [J]. SOFTWARE QUALITY JOURNAL, 2021, 29 (02) : 405 - 430
  • [6] Cross-project software defect prediction based on multi-source data sets
    Huang Junfu
    Wang Yawen
    Gong Yunzhan
    Jin Dahai
    [J]. The Journal of China Universities of Posts and Telecommunications, 2021, 28 (04) : 75 - 87
  • [8] MSCPDPLab: A MATLAB toolbox for transfer learning based multi-source cross-project defect prediction
    Zou, Jiaqi
    Li, Zonghao
    Liu, Xuanying
    Tong, Haonan
    [J]. SOFTWAREX, 2023, 21
  • [10] MASTER: Multi-Source Transfer Weighted Ensemble Learning for Multiple Sources Cross-Project Defect Prediction
    Tong, Haonan
    Zhang, Dalin
    Liu, Jiqiang
    Xing, Weiwei
    Lu, Lingyun
    Lu, Wei
    Wu, Yumei
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (05) : 1281 - 1305