Predicting Vulnerable Software Components through N-gram Analysis and Statistical Feature Selection

Cited by: 48
Authors
Pang, Yulei [1]
Xue, Xiaozhen [2]
Namin, Akbar Siami [2]
Affiliations
[1] Southern Connecticut State Univ, Dept Math, New Haven, CT 06515 USA
[2] Texas Tech Univ, Dept Comp Sci, Lubbock, TX 79409 USA
Keywords
Vulnerability prediction; N-gram; Feature selection; Wilcoxon test
DOI
10.1109/ICMLA.2015.99
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Vulnerabilities need to be detected and removed from software. Although previous studies have shown that prediction techniques are useful for deciding whether software components are vulnerable, improving the accuracy and effectiveness of these techniques remains a challenging research question. This paper proposes a hybrid technique that combines N-gram analysis with feature selection algorithms to predict vulnerable software components. Features are defined as contiguous sequences of tokens in source code files, i.e., Java class files. Machine learning-based feature selection algorithms are then employed to reduce the feature and search space. We evaluated the proposed technique on several Java Android applications, and the results demonstrated that it could predict vulnerable classes, i.e., software components, with high precision, accuracy, and recall.
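The pipeline described in the abstract (token N-grams as features, then statistical feature selection) can be illustrated with a minimal sketch. The abstract does not give the paper's tokenizer, its value of N, or its exact selection procedure, so everything below is an assumption: the regex tokenizer, the function names (`tokenize`, `ngram_counts`, `rank_sum_z`, `select_features`), and the use of a Wilcoxon rank-sum z-score (suggested only by the "Wilcoxon test" keyword) are hypothetical choices, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into identifiers, numbers, and single symbols (assumed tokenizer)."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def average_ranks(values):
    """Rank values starting from 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Wilcoxon rank-sum z-score via the normal approximation (no tie correction).

    Assumes both groups are non-empty.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])  # rank sum of the first group
    mean = n_a * (n_a + n_b + 1) / 2
    std = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (w - mean) / std


def select_features(vulnerable, clean, n=2, top_k=3):
    """Keep the top_k n-grams whose counts differ most between the two groups."""
    vuln_counts = [ngram_counts(tokenize(s), n) for s in vulnerable]
    clean_counts = [ngram_counts(tokenize(s), n) for s in clean]
    vocabulary = set().union(*vuln_counts, *clean_counts)
    scores = {
        gram: abs(rank_sum_z([c[gram] for c in vuln_counts],
                             [c[gram] for c in clean_counts]))
        for gram in vocabulary
    }
    # Sort by descending score, breaking ties by the n-gram itself for determinism.
    return sorted(scores, key=lambda g: (-scores[g], g))[:top_k]
```

In this sketch each source file becomes a vector of N-gram counts, and an N-gram is retained only if its counts separate the vulnerable files from the clean ones. The retained N-grams would then feed whatever classifier is used; the abstract does not name one, so none is shown here.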
Pages: 543-548
Page count: 6