A Novel Dataset-Similarity-Aware Approach for Evaluating Stability of Software Metric Selection Techniques

Authors
Wang, Huanjing [1]
Khoshgoftaar, Taghi M. [1]
Wald, Randall [1]
Napolitano, Amri [1]
Affiliations
[1] Western Kentucky Univ, Bowling Green, KY 42101 USA
Source
2012 IEEE 13TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI) | 2012
Keywords
PREDICTION;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Software metric (feature) selection is an important preprocessing step before building software defect prediction models. Although much research has analyzed the classification performance of feature selection methods, fewer works have focused on their stability (robustness). Stability is important because feature selection methods that reliably produce the same results despite changes to the data are more trustworthy. Of the papers studying stability, most either compare the features chosen from different random subsamples of the dataset or compare each random subsample with the original dataset. The first approach leaves the degree of overlap between the subsamples unknown, while the second compares datasets of different sizes. In this work, we propose a fixed-overlap partition algorithm which generates a pair of subsamples with the same number of instances and a specified degree of overlap. We empirically evaluate the stability of 19 feature selection methods in terms of degree of overlap and feature subset size using 16 real software metrics datasets. The consistency index is used as the stability measure, and we show that RF is the most stable filter. Results also show that both degree of overlap and feature subset size affect the stability of feature selection methods.
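The two ideas in the abstract, drawing a pair of equal-sized subsamples with a specified overlap and scoring the agreement of the selected feature subsets, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `fixed_overlap_pair` and `consistency_index` are hypothetical, the sampling ignores class labels (the abstract does not say whether the paper stratifies), and the stability measure is assumed to be the Kuncheva-style consistency index, (r*n - k^2) / (k*(n - k)), where r is the number of shared features, k the subset size, and n the total number of features.

```python
import random


def fixed_overlap_pair(n_instances, subsample_size, overlap, seed=None):
    """Draw two index lists of equal size that share a fixed fraction of instances.

    Assumption: `overlap` is the fraction of each subsample that also appears
    in the other one; class labels are not taken into account.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    shared = round(overlap * subsample_size)   # instances common to both
    unique = subsample_size - shared           # instances private to each
    needed = shared + 2 * unique               # distinct instances required
    if needed > n_instances:
        raise ValueError("dataset too small for this size and overlap")

    rng = random.Random(seed)
    picked = rng.sample(range(n_instances), needed)  # no repeats
    common = picked[:shared]
    first = common + picked[shared:shared + unique]
    second = common + picked[shared + unique:]
    return first, second


def consistency_index(subset_a, subset_b, n_features):
    """Kuncheva-style consistency index for two equal-sized feature subsets.

    Returns 1.0 for identical subsets and about 0 for the agreement
    expected by chance.
    """
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must be strictly between 0 and n_features")
    r = len(a & b)  # number of features chosen in both runs
    return (r * n_features - k * k) / (k * (n_features - k))


if __name__ == "__main__":
    first, second = fixed_overlap_pair(100, 40, 0.5, seed=0)
    print(len(first), len(second), len(set(first) & set(second)))
    print(consistency_index([1, 2, 3], [1, 2, 3], 10))
```

In a stability study along these lines, a feature selection method would be run on each subsample of the pair, and the index computed on the two resulting feature subsets; repeating this over many pairs and overlap levels gives an average stability per method.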
Pages: 1 / 8
Page count: 8
Related papers
50 in total
  • [21] An Interference-aware Approach for Co-located Container Orchestration with Novel Metric
    Li, Xiang
    Wen, Linfeng
    Xu, Minxian
    Ye, Kejiang
    2023 IEEE INTERNATIONAL CONFERENCES ON INTERNET OF THINGS, ITHINGS IEEE GREEN COMPUTING AND COMMUNICATIONS, GREENCOM IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING, CPSCOM IEEE SMART DATA, SMARTDATA AND IEEE CONGRESS ON CYBERMATICS,CYBERMATICS, 2024, : 600 - 607
  • [22] A Novel Approach to Gene Selection of Leukemia Dataset Using Different Clustering Methods
    Prasath, P.
    Perumal, K.
    Thangavel, K.
    Manavalan, R.
    COMPUTATIONAL INTELLIGENCE, CYBER SECURITY AND COMPUTATIONAL MODELS, 2014, 246 : 63 - 69
  • [23] A Novel Approach for Tweet Similarity in a Context-Aware Fake News Detection Model
    Bezerra, Jose Fabio Ribeiro
    Kozierkiewicz, Adrianna
    Pietranik, Marcin
    IEEE ACCESS, 2025, 13 : 57043 - 57061
  • [24] A Mixed Approach to Similarity Metric Selection in Affinity Propagation-Based WiFi Fingerprinting Indoor Positioning
    Caso, Giuseppe
    de Nardis, Luca
    di Benedetto, Maria-Gabriella
    SENSORS, 2015, 15 (11) : 27692 - 27720
  • [25] An Approach for the Prediction of Number of Software Faults Based on the Dynamic Selection of Learning Techniques
    Rathore, Santosh Singh
    Kumar, Sandeep
    IEEE TRANSACTIONS ON RELIABILITY, 2019, 68 (01) : 216 - 236
  • [26] Time-aware selection approach for service composition based on pruning and improvement techniques
    Ikbel Guidara
    Nawal Guermouche
    Tarak Chaari
    Mohamed Jmaiel
    Software Quality Journal, 2020, 28 : 1245 - 1277
  • [27] Time-aware selection approach for service composition based on pruning and improvement techniques
    Guidara, Ikbel
    Guermouche, Nawal
    Chaari, Tarak
    Jmaiel, Mohamed
    SOFTWARE QUALITY JOURNAL, 2020, 28 (03) : 1245 - 1277
  • [28] A Novel Approach for Feature Selection Support of a Software Product Line Development
    Yugopuspito, Pujianto
    Murwantara, I. Made
    Sutomo, Adrian Hartanto
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2012, 12 (06) : 107 - 115
  • [29] A Novel Trust-Aware Composite Semantic Web Service Selection Approach
    Wang, Denghui
    Huang, Hao
    Xie, Changsheng
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [30] A novel firefly algorithm approach for efficient feature selection with COVID-19 dataset
    Bacanin, Nebojsa
    Venkatachalam, K.
    Bezdan, Timea
    Zivkovic, Miodrag
    Abouhawwash, Mohamed
    MICROPROCESSORS AND MICROSYSTEMS, 2023, 98