A Novel Feature Selection Method for Software Fault Prediction Model

Cited by: 0
Authors
Cui, Can [1]
Liu, Bin [1]
Li, Guoqi [1]
Affiliations
[1] Beihang Univ, Sch Reliabil & Syst Engn, 37 Xueyuan Rd, Beijing 100191, Peoples R China
Keywords
classification; data preprocessing; feature selection (FS); machine learning; software fault prediction model; QUALITY; METRICS;
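DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract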
Software fault prediction (SFP) is an active research topic in software engineering (SE), and machine learning (ML) has been successfully applied to SFP classification problems. However, one challenge in building software fault prediction models (SFPMs) is processing high-dimensional datasets, which include many irrelevant and redundant features. Feature selection (FS) techniques, mainly wrapper methods and filter methods, are used to address this issue. In this paper, we report an empirical study aimed at providing a novel approach to selecting features for SFP. First, we propose a novel feature selection method based on correlation-based feature subset selection (CFS). In stage 1, we use classical CFS to select features. In stage 2, we propose a method that calculates the similarity of feature occurrence frequency to further remove useless features. Second, to validate the novel FS approach, we compare our method with three other FS techniques. For the comparison, we use 38 releases of 10 Java open-source projects collected from the PROMISE repository. In addition, 10 releases of the 10 projects, a total of 10 different software fault datasets, are randomly selected. The data subsets produced by each FS approach are then fed to five typical ML classifiers. The prediction performance results suggest that our proposed method outperforms the other three FS methods in most cases, indicating that the novel feature selection approach is feasible. To sum up, the method can be used to delete irrelevant and redundant features, yielding useful data subsets and well-performing SFPMs. The results of SFP can inform other SE activities, such as software testing and software quality assurance. Although the current method is effective, it still has some limitations. Our future work is to test the statistical significance of the classification results to further support the feasibility of the idea.
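As a rough illustration of the stage 1 step named in the abstract, the sketch below implements the standard CFS merit heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), with a greedy forward search. It is not the authors' implementation. The function names `cfs_merit` and `cfs_forward_select` are hypothetical, absolute Pearson correlation is assumed as the correlation measure (classical CFS typically uses symmetric uncertainty on discretized features), and every feature column is assumed to be non-constant. Stage 2 is not sketched, because the abstract does not define the similarity measure.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff).

    Assumption: absolute Pearson correlation stands in for the
    correlation measure; columns must be non-constant.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation over the subset.
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # Mean absolute feature-feature correlation over all pairs in the subset.
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                 for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean(pairs)
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

def cfs_forward_select(X, y):
    """Greedy forward search: repeatedly add the feature that gives the
    highest merit, and stop as soon as no candidate improves it."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        score, j = max((cfs_merit(X, y, selected + [j]), j) for j in remaining)
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected
```

The merit rises with feature-class correlation and falls with feature-feature correlation, so a feature that is highly correlated with one already selected adds little. The column indices returned by `cfs_forward_select(X, y)` can then be used to slice the dataset before training a classifier.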
Pages: 6
Related papers
50 in total
  • [21] Enhanced Binary Moth Flame Optimization as a Feature Selection Algorithm to Predict Software Fault Prediction
    Tumar, Iyad
    Hassouneh, Yousef
    Turabieh, Hamza
    Thaher, Thaer
    [J]. IEEE ACCESS, 2020, 8 (08) : 8041 - 8055
  • [22] Improved Dwarf Mongoose Optimization Algorithm for Feature Selection: Application in Software Fault Prediction Datasets
    Hammouri, Abdelaziz I.
    Awadallah, Mohammed A.
    Braik, Malik Sh.
    Al-Betar, Mohammed Azmi
    Beseiso, Majdi
    [J]. JOURNAL OF BIONIC ENGINEERING, 2024, 21 (04) : 2000 - 2033
  • [23] A novel software defect prediction model using two-phase grey wolf optimisation for feature selection
    Malhotra, Ruchika
    Khan, Kishwar
    [J]. CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2024, 27 (09) : 12185 - 12207
  • [24] Comprehensive Model for Software Fault Prediction
    Singh, Pradeep
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS (ICICI 2017), 2017 : 1103 - 1108
  • [25] Fault Prediction Method for Distribution Network Outage based on Feature Selection and Ensemble Learning
    Zhang, Wen
    Sheng, Wanxing
    Liu, Keyan
    Du, Songhuai
    Jia, Dongli
    Hu, Lijuan
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018 : 226 - 231
  • [26] A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction
    Ni, Chao
    Liu, Wang-Shu
    Chen, Xiang
    Gu, Qing
    Chen, Dao-Xu
    Huang, Qi-Guo
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2017, 32 (06) : 1090 - 1107
  • [27] Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem
    Catal, Cagatay
    Diri, Banu
    [J]. INFORMATION SCIENCES, 2009, 179 (08) : 1040 - 1058
  • [28] A software defect prediction method with metric compensation based on feature selection and transfer learning
    Chen, Jinfu
    Wang, Xiaoli
    Cai, Saihua
    Xu, Jiaping
    Chen, Jingyi
    Chen, Haibo
    [J]. FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (05) : 715 - 731
  • [29] Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach
    Balogun, Abdullateef Oluwagbemiga
    Basri, Shuib
    Abdulkadir, Said Jadid
    Hashim, Ahmad Sobri
    [J]. APPLIED SCIENCES-BASEL, 2019, 9 (13):
  • [30] A Cluster Based Feature Selection Method for Cross-Project Software Defect Prediction
    Chao Ni
    Wang-Shu Liu
    Xiang Chen
    Qing Gu
    Dao-Xu Chen
    Qi-Guo Huang
    [J]. Journal of Computer Science and Technology, 2017, 32 : 1090 - 1107