Empirical studies on the impact of filter-based ranking feature selection on security vulnerability prediction

Cited by: 15
Authors
Chen, Xiang [1, 2]
Yuan, Zhidan [1]
Cui, Zhanqi [3]
Zhang, Dun [1]
Ju, Xiaolin [1]
Affiliations
[1] Nantong Univ, Sch Informat Sci & Technol, Nantong, Peoples R China
[2] Guilin Univ Elect Technol, Guangxi Key Lab Trusted Software, Guilin, Peoples R China
[3] Beijing Informat Sci & Technol Univ, Comp Sch, Beijing, Peoples R China
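Funding
National Natural Science Foundation of China;
Keywords
DEFECT PREDICTION; SOFTWARE; METRICS;
DOI
10.1049/sfw2.12006
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract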
Security vulnerability prediction (SVP) uses machine learning to construct models that identify potentially vulnerable program modules. Previous studies measure the extracted modules with two kinds of features: one kind uses traditional software metrics, and the other uses text mining to extract term vectors. As a result, gathered SVP data sets often have numerous features and suffer from the curse of dimensionality. In this article, we mainly investigate the impact of filter-based ranking feature selection (FRFS) methods on SVP, since other types of feature selection methods have too high a computational cost. In our empirical studies, we first consider three real-world large-scale web applications. We then consider seven methods from three FRFS categories and use a random forest classifier to construct SVP models. The final results show that, given a similar code inspection cost, using FRFS can improve the performance of SVP when compared with state-of-the-art baselines. Moreover, we use McNemar's test to perform diversity analysis on the vulnerable modules identified by different FRFS methods, and we are surprised to find that almost all the FRFS methods identify similar vulnerable modules.
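
The two techniques named in the abstract, ranking features with a filter score and comparing two models with McNemar's test, can be illustrated with a minimal sketch. This is not the authors' implementation. The scoring function (absolute Pearson correlation with the label), the toy data, and the names `pearson_abs`, `rank_features`, and `mcnemar` are all assumptions chosen for illustration; the abstract does not say which seven FRFS methods were used.

```python
import math


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels.

    Assumption: this score stands in for a filter-based ranking criterion.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return abs(cov) / math.sqrt(var_x * var_y)


def rank_features(rows, labels, k):
    """Score every feature independently of any classifier, keep the top k."""
    n_features = len(rows[0])
    scores = [pearson_abs([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


def mcnemar(correct_a, correct_b):
    """McNemar's test with continuity correction on paired correctness flags.

    Returns (statistic, p_value). Only modules on which the two models
    disagree contribute to the statistic.
    """
    b = sum(1 for a, o in zip(correct_a, correct_b) if a and not o)
    c = sum(1 for a, o in zip(correct_a, correct_b) if not a and o)
    if b + c == 0:
        return 0.0, 1.0  # the two models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical toy data: 6 modules, 3 features, label 1 = vulnerable.
rows = [[1, 5, 0], [0, 5, 1], [1, 5, 1], [0, 5, 0], [1, 5, 1], [0, 5, 0]]
labels = [1, 0, 1, 0, 1, 0]
selected = rank_features(rows, labels, k=2)

# Hypothetical correctness flags of two models on the same 10 modules.
model_a = [True] * 10
model_b = [True] * 4 + [False] * 6
stat, p_value = mcnemar(model_a, model_b)
```

In a full pipeline the selected columns would then be passed to a classifier such as a random forest. A large p-value from `mcnemar` would suggest that two feature selection methods lead to models that succeed and fail on largely the same modules.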
Pages: 75-89
Page count: 15