Feature Selection with Imbalanced Data for Software Defect Prediction

Cited by: 34
Authors
Khoshgoftaar, Taghi M. [1]
Gao, Kehan [2]
Affiliations
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
[2] Eastern Connecticut State Univ, Willimantic, CT 06226 USA
DOI
10.1109/ICMLA.2009.18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the impact of data sampling followed by attribute selection on classification models built with binary class-imbalanced data in the context of software quality engineering. We use a wrapper-based attribute ranking technique to select a subset of attributes, and random undersampling (RUS) of the majority class to alleviate the negative effects of imbalanced data on the prediction models. The datasets used in the empirical study were collected from numerous software projects. Five data preprocessing scenarios were explored in these experiments: (1) training on the original, unaltered fit dataset, (2) training on a sampled version of the fit dataset, (3) training on an unsampled version of the fit dataset using only the attributes chosen by feature selection based on the unsampled fit dataset, (4) training on an unsampled version of the fit dataset using only the attributes chosen by feature selection based on a sampled version of the fit dataset, and (5) training on a sampled version of the fit dataset using only the attributes chosen by feature selection based on the sampled version of the fit dataset. We compared the performance of the classification models constructed under these five scenarios. The results demonstrate that the classification models constructed on the sampled fit data, with or without feature selection (case 2 and case 5), significantly outperformed the classification models built in the other cases (unsampled fit data). Moreover, the two scenarios using sampled data (case 2 and case 5) showed very similar performance, but the subset of attributes (case 5) is only around 15% or 30% of the complete set of attributes (case 2).
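The five preprocessing scenarios in the abstract can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names (`random_undersample`, `rank_attributes`, `build_scenarios`) are hypothetical, undersampling is assumed to target a 50:50 class ratio (the abstract does not state the ratio), and the ranker is a simplified stand-in that scores each attribute by the balanced accuracy of a one-attribute threshold rule rather than the wrapper-based technique used in the paper. Both classes are assumed to be present in the data.

```python
import random

def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size
    (the 50:50 target is an assumption, not taken from the abstract)."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]

def rank_attributes(X, y):
    """Stand-in ranker: score each attribute alone by the balanced accuracy
    of a threshold placed midway between the two class means."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        sign = 1 if mean_pos >= mean_neg else -1
        tpr = sum(sign * (v - threshold) > 0 for v in pos) / len(pos)
        tnr = sum(sign * (v - threshold) <= 0 for v in neg) / len(neg)
        scores.append((tpr + tnr) / 2)
    # Attribute indices, best score first.
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def project(X, cols):
    """Keep only the selected attribute columns."""
    return [[row[j] for j in cols] for row in X]

def build_scenarios(X, y, k, seed=0):
    """Return the training data for each of the five scenarios."""
    Xs, ys = random_undersample(X, y, seed)
    top_unsampled = rank_attributes(X, y)[:k]    # selection on unsampled data
    top_sampled = rank_attributes(Xs, ys)[:k]    # selection on sampled data
    return {
        1: (X, y),                               # original fit data
        2: (Xs, ys),                             # sampled, all attributes
        3: (project(X, top_unsampled), y),       # unsampled data, unsampled selection
        4: (project(X, top_sampled), y),         # unsampled data, sampled selection
        5: (project(Xs, top_sampled), ys),       # sampled data, sampled selection
    }
```

Each scenario's training set would then be passed to the same learner and evaluated on a common test set, so that differences in performance can be attributed to the preprocessing alone.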
Pages: 235 / +
Number of pages: 2
Related papers
50 in total
  • [1] Attribute Selection and Imbalanced Data: Problems in Software Defect Prediction
    Khoshgoftaar, Taghi M.
    Gao, Kehan
    Seliya, Naeem
    [J]. 22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010,
  • [2] Impact of Data Sampling on Feature Selection Techniques for Software Defect Prediction
    Gao, Kehan
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    [J]. PROCEEDINGS 18TH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY & QUALITY IN DESIGN, 2012, : 91 - +
  • [3] Genetic Feature Selection for Software Defect Prediction
    Wahono, Romi Satria
    Herman, Nanna Suryana
    [J]. ADVANCED SCIENCE LETTERS, 2014, 20 (01) : 239 - 244
  • [4] Imbalanced Data Processing Model for Software Defect Prediction
    Zhou, Lijuan
    Li, Ran
    Zhang, Shudong
    Wang, Hua
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2018, 102 (02) : 937 - 950
  • [5] Imbalanced Data Processing Model for Software Defect Prediction
    Lijuan Zhou
    Ran Li
    Shudong Zhang
    Hua Wang
    [J]. Wireless Personal Communications, 2018, 102 : 937 - 950
  • [6] RFC: a feature selection algorithm for software defect prediction
    Xu Xiaolong
    Chen Wen
    Wang Xinheng
    [J]. JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2021, 32 (02) : 389 - 398
  • [7] Software Defect Prediction Scheme Based on Feature Selection
    Wang, Pei
    Jin, Cong
    Jin, Shu-Wei
    [J]. 2012 INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING (ISISE), 2012, : 477 - 480
  • [8] RFC: a feature selection algorithm for software defect prediction
    XU Xiaolong
    CHEN Wen
    WANG Xinheng
    [J]. Journal of Systems Engineering and Electronics, 2021, 32 (02) : 389 - 398
  • [9] Feature Selection in Software Defect Prediction: A Comparative Study
    Kakkar, Misha
    Jain, Sarika
    [J]. 2016 6TH INTERNATIONAL CONFERENCE - CLOUD SYSTEM AND BIG DATA ENGINEERING (CONFLUENCE), 2016, : 658 - 663
  • [10] FECAR: A Feature Selection Framework for Software Defect Prediction
    Liu, Shulong
    Chen, Xiang
    Liu, Wangshu
    Chen, Jiaqiang
    Gu, Qing
    Chen, Daoxu
    [J]. 2014 IEEE 38TH ANNUAL INTERNATIONAL COMPUTERS, SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), 2014, : 426 - 435