LTRWES: A new framework for security bug report detection

Times cited: 16
Authors
Jiang, Yuan [1]
Lu, Pengcheng [1]
Su, Xiaohong [1]
Wang, Tiantian [1]
Affiliation
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Security bug report; Content-based filtering; Word embedding; Machine learning; CLASSIFICATION; PREDICTION;
DOI
10.1016/j.infsof.2020.106314
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Security bug reports (SBRs) usually describe security-related vulnerabilities in software products, which could be exploited by malicious attackers. Hence, it is important to identify SBRs quickly and accurately among the bug reports (BRs) disclosed in bug tracking systems. Although a few methods have already been proposed for detecting SBRs, challenges remain because of noisy samples, class imbalance, and data scarcity.
Objective: This motivates us to reveal the challenges faced by state-of-the-art SBR prediction methods from the viewpoint of data filtering and representation. The paper also aims to provide a general framework and new solutions to these problems.
Method: We propose a novel approach, LTRWES, that incorporates learning to rank and word embedding into the identification of SBRs. Unlike previous keyword-based approaches, LTRWES is a content-based data filtering and representation framework with several desirable properties not shared by other methods. First, it exploits a ranking model to efficiently filter out non-security bug reports (NSBRs) that have higher content similarity to SBRs. Second, it applies word embedding to transform the remaining NSBRs, together with the SBRs, into low-dimensional real-valued vectors.
Result: Experimental results on benchmark and large real-world datasets show that the proposed method outperforms the state-of-the-art method.
Conclusion: Overall, LTRWES is valid and achieves high performance. It will help security engineers identify SBRs among thousands of NSBRs more accurately than existing algorithms, and should encourage further research and development of content-based methods for security bug report detection.
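The two steps named in the Method part of the abstract (filter NSBRs by content similarity to SBRs, then represent the remaining reports as dense vectors) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the ranking model or the embedding model, so the sketch substitutes plain cosine similarity over term counts for the learning-to-rank step and averages toy word vectors for the embedding step. The function names, the `drop_ratio` parameter, and the example reports are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count dictionaries."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_nsbrs(sbrs, nsbrs, drop_ratio=0.5):
    """Rank NSBRs by their highest similarity to any SBR and drop the
    top `drop_ratio` fraction (assumed stand-in for the ranking model)."""
    sbr_vecs = [Counter(tokenize(s)) for s in sbrs]
    scored = [
        (max(cosine(Counter(tokenize(n)), s) for s in sbr_vecs), n)
        for n in nsbrs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_drop = int(len(scored) * drop_ratio)
    return [report for _, report in scored[n_drop:]]


def embed(text, vectors, dim):
    """Average the vectors of known tokens (toy word-embedding lookup)."""
    known = [t for t in tokenize(text) if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(vectors[t][i] for t in known) / len(known) for i in range(dim)]


# Hypothetical example data, not taken from the paper's datasets.
sbrs = ["buffer overflow allows remote code execution"]
nsbrs = [
    "crash on buffer overflow in parser",
    "button label is misaligned",
    "typo in settings dialog",
    "remote code path throws exception",
]
kept = filter_nsbrs(sbrs, nsbrs, drop_ratio=0.5)
toy_vectors = {"button": [1.0, 0.0], "label": [0.0, 1.0]}
features = [embed(report, toy_vectors, dim=2) for report in kept]
```

In this toy run the two NSBRs that share vocabulary with the SBR are removed, and the remaining reports are turned into fixed-length vectors that a downstream classifier could consume. A real pipeline would replace both stand-ins with a trained ranking model and pretrained embeddings.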
Pages: 17