A Novel Dataset-Similarity-Aware Approach for Evaluating Stability of Software Metric Selection Techniques

Cited by: 0
Authors
Wang, Huanjing [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliations
[1] Western Kentucky Univ, Bowling Green, KY 42101 USA
Keywords
PREDICTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Software metric (feature) selection is an important preprocessing step before building software defect prediction models. Although much research has analyzed the classification performance of feature selection methods, fewer works have focused on their stability (robustness). Stability matters because feature selection methods that reliably produce the same results despite changes to the data are more trustworthy. Of the papers studying stability, most either compare the features chosen from different random subsamples of the dataset or compare each random subsample with the original dataset. The first approach leaves the degree of overlap between subsamples unknown, and the second compares datasets of different sizes. In this work, we propose a fixed-overlap partition algorithm that generates a pair of subsamples with the same number of instances and a specified degree of overlap. We empirically evaluate the stability of 19 feature selection methods in terms of degree of overlap and feature subset size using 16 real software metrics datasets. The consistency index is used as the stability measure, and we show that RF is the most stable filter. Results also show that both the degree of overlap and the feature subset size affect the stability of feature selection methods.
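The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the parameters, and the way shared and unique instances are drawn are all assumptions made for illustration. The stability measure is assumed to be Kuncheva's consistency index, (r·n − k²) / (k·(n − k)), where r is the number of features common to two subsets of size k chosen from n features.

```python
import random


def fixed_overlap_pair(n_instances, subsample_size, overlap, seed=None):
    """Draw two equal-size index lists sharing a fixed fraction of instances.

    Hypothetical helper: `overlap` is the fraction of each subsample that
    also appears in the other one (an assumed definition).
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must lie in [0, 1]")
    n_shared = round(overlap * subsample_size)
    n_unique = subsample_size - n_shared
    needed = n_shared + 2 * n_unique  # distinct instances used in total
    if needed > n_instances:
        raise ValueError("dataset too small for this size and overlap")

    rng = random.Random(seed)
    picked = rng.sample(range(n_instances), needed)  # no repeats
    shared = picked[:n_shared]
    first = shared + picked[n_shared:n_shared + n_unique]
    second = shared + picked[n_shared + n_unique:]
    return first, second


def consistency_index(subset_a, subset_b, n_features):
    """Kuncheva's consistency index for two feature subsets of equal size."""
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # features selected in both subsets
    return (r * n_features - k * k) / (k * (n_features - k))
```

In a stability experiment along these lines, a feature ranker would be run on each subsample, the top k features kept from each run, and the index averaged over many random pairs. A value of 1 means identical subsets, and a value near 0 means the agreement is no better than chance.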
Pages: 1-8
Number of pages: 8