Training data selection for imbalanced cross-project defect prediction

Cited by: 7
Authors
Zheng, Shang [1]
Gai, Jinjing [1]
Yu, Hualong [1]
Zou, Haitao [1]
Gao, Shang [1]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212100, Jiangsu, Peoples R China
Keywords
Cross-project software prediction; Data selection; Jensen-Shannon divergence; Relative density; SMOTE;
DOI
10.1016/j.compeleceng.2021.107370
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Machine learning methods have been applied in software engineering to effectively predict software defects. Researchers proposed cross-project defect prediction (CPDP) for cases in which few or no data are available. CPDP uses the labeled data of a source project to construct a prediction model for the target project. However, the prediction performance remains inferior because the training data selection for the source project is ineffective. In this paper, the Jensen-Shannon divergence is first applied to automatically select the source project most similar to the target project. Subsequently, a grouped synthetic minority oversampling technique (SMOTE) is applied to improve the class imbalance of the projects. Finally, relative density estimation is performed to select the data for the source project. The experimental results demonstrate that the proposed method improves the prediction performance and exhibits high adaptability to different classifiers.
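The first step in the abstract, choosing the source project closest to the target by Jensen-Shannon divergence, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the project names in the usage note, and the assumption that each project is summarized as a histogram of one metric over shared bins are all hypothetical. The record does not say which features or binning the paper uses.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions.

    With base-2 logarithms the value lies in [0, 1]:
    0 for identical distributions, 1 for disjoint ones.
    """
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def normalize(counts):
    """Turn raw bin counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def most_similar_source(target_counts, source_counts_by_project):
    """Return the name of the candidate source project whose histogram
    has the smallest JS divergence from the target histogram.

    Assumption (not from the paper): every project is summarized by
    bin counts over the same bins of a single metric.
    """
    target = normalize(target_counts)
    return min(
        source_counts_by_project,
        key=lambda name: js_divergence(
            target, normalize(source_counts_by_project[name])
        ),
    )
```

For example, `most_similar_source([10, 10], {"proj_a": [20, 0], "proj_b": [9, 11]})` picks `"proj_b"`, whose histogram is closer to the target's. The grouped SMOTE and relative density steps are not sketched here because the abstract gives too little detail to reproduce them.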
Pages: 11
Related papers
50 in total
  • [1] An Improved Method for Training Data Selection for Cross-Project Defect Prediction
    Nayeem Ahmad Bhat
    Sheikh Umar Farooq
    [J]. Arabian Journal for Science and Engineering, 2022, 47 : 1939 - 1954
  • [2] An Improved Method for Training Data Selection for Cross-Project Defect Prediction
    Bhat, Nayeem Ahmad
    Farooq, Sheikh Umar
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2022, 47 (02) : 1939 - 1954
  • [3] An Improved Method for Cross-Project Defect Prediction by Simplifying Training Data
    He, Peng
    He, Yao
    Yu, Lvjun
    Li, Bing
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [4] Data Transformation in Cross-project Defect Prediction
    Feng Zhang
    Iman Keivanloo
    Ying Zou
    [J]. Empirical Software Engineering, 2017, 22 : 3186 - 3218
  • [5] Data Transformation in Cross-project Defect Prediction
    Zhang, Feng
    Keivanloo, Iman
    Zou, Ying
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2017, 22 (06) : 3186 - 3218
  • [6] Assessing the Effect of Imbalanced Learning on Cross-project Software Defect Prediction
    Sohan, Md Fahimuzzman
    Jabiullah, Md Ismail
    Rahman, Sheikh Shah Mohammad Motiur
    Mahmud, S. M. Hasan
    [J]. 2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019
  • [7] An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
    Luo, Haoyu
    Dai, Heng
    Peng, Weiqiang
    Hu, Wenhua
    Li, Fuyang
    [J]. SENSORS, 2021, 21 (22)
  • [8] An Investigation of Imbalanced Ensemble Learning Methods for Cross-Project Defect Prediction
    Qiu, Shaojian
    Lu, Lu
    Jiang, Siyu
    Guo, Yang
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2019, 33 (12)
  • [9] Isolation Forest Filter to Simplify Training Data for Cross-Project Defect Prediction
    Cui, Can
    Liu, Bin
    Wang, Shihai
    [J]. 2019 PROGNOSTICS AND SYSTEM HEALTH MANAGEMENT CONFERENCE (PHM-QINGDAO), 2019
  • [10] Using Bandit Algorithms for Project Selection in Cross-Project Defect Prediction
    Asano, Takuya
    Tsunoda, Masateru
    Toda, Koji
    Tahir, Amjed
    Bennin, Kwabena Ebo
    Nakasai, Keitaro
    Monden, Akito
    Matsumoto, Kenichi
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021), 2021 : 649 - 653