Predicting Vulnerable Software Components through N-gram Analysis and Statistical Feature Selection

Cited by: 48
Authors
Pang, Yulei [1]
Xue, Xiaozhen [2]
Namin, Akbar Siami [2]
Affiliations
[1] Southern Connecticut State Univ, Dept Math, New Haven, CT 06515 USA
[2] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
Keywords
Vulnerability prediction; N-gram; Feature selection; Wilcoxon test;
DOI
10.1109/ICMLA.2015.99
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Vulnerabilities need to be detected and removed from software. Although previous studies have demonstrated that prediction techniques are useful for deciding whether software components are vulnerable, improving the accuracy and effectiveness of these techniques remains a challenging research question. This paper proposes a hybrid technique that combines N-gram analysis with feature selection algorithms to predict vulnerable software components, where features are defined as continuous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrated that it could predict vulnerable classes, i.e., software components, with high precision, accuracy and recall.
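The pipeline the abstract describes (token N-grams as features, then a statistical filter to shrink the feature space) can be illustrated with a small sketch. This is not the authors' implementation. The tokenizer, the function names (`tokenize`, `ngram_counts`, `rank_sum_z`, `select_features`), and the choice to rank features by a Wilcoxon rank-sum z-score are all assumptions made for illustration, the last one prompted only by the "Wilcoxon test" keyword. The z-score uses the normal approximation without a tie correction, and both groups are assumed to be non-empty.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integers, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN_RE.findall(source)


def ngram_counts(source, n):
    """Count every contiguous run of n tokens in one source file."""
    tokens = tokenize(source)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rank_sum_z(xs, ys):
    """Wilcoxon rank-sum z-score of xs against ys.

    Uses average ranks for ties and the normal approximation,
    without a tie correction of the variance.
    """
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(xs), len(ys)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_features(vulnerable, clean, n, top_k):
    """Keep the top_k N-grams whose counts differ most between the groups."""
    v_counts = [ngram_counts(s, n) for s in vulnerable]
    c_counts = [ngram_counts(s, n) for s in clean]
    vocab = set().union(*v_counts, *c_counts)
    scored = []
    for gram in vocab:
        z = rank_sum_z([c[gram] for c in v_counts],
                       [c[gram] for c in c_counts])
        scored.append((abs(z), gram))
    # Highest |z| first; ties broken by the N-gram itself for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [gram for _, gram in scored[:top_k]]
```

In a fuller experiment, the selected N-grams would become the columns of a per-class count matrix that is passed to a classifier; which classifier the paper used is not stated in the abstract, so none is shown here.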
Pages: 543-548
Page count: 6