REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Cited by: 0
Authors
Wang, Chaozheng [1]
Li, Zongjie [2]
Peng, Yun [3]
Gao, Shuzheng [3]
Chen, Sirong [1]
Wang, Shuai [2]
Gao, Cuiyun [1]
Lyu, Michael R. [3]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Comp Sci & Engn Dept, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Vulnerability; Data collection; Bug fix; Generation
DOI
10.1109/ASE56229.2023.00199
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software plays a crucial role in our daily lives, so the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep learning methods have been applied to this area and have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited, as they tend to focus on a single programming language and have relatively small datasets. Moreover, many benchmarks are outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit messages as explanations. To address these issues, we propose REEF, an automated framework for collecting REal-world vulnErabilities and Fixes from open-source repositories. We focus on vulnerabilities since they are exploitable and have serious consequences. We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs. Furthermore, we propose a neural language model-based approach to generate high-quality vulnerability explanations, which is key to producing informative fix messages. Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations. The dataset we collect contains 4,466 CVEs with 30,987 patches (including 236 CWEs) across 7 programming languages with detailed related information, and is superior to existing benchmarks in scale, coverage, and quality. Evaluations by human experts further confirm that our framework produces high-quality vulnerability explanations.
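The abstract mentions metrics for filtering vulnerability-fix pairs but does not define them in this record. As a rough illustration only, the sketch below shows what such a filter could look like. The `FixPair` type, the extension-to-language table, the `max_changed_lines` threshold, and the placeholder CVE identifiers are all assumptions made for this example and are not taken from the REEF paper.

```python
import os
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical extension-to-language table. The abstract says the dataset
# spans 7 languages but does not list them in this record.
EXT_TO_LANG = {".c": "C", ".py": "Python", ".java": "Java"}


@dataclass
class FixPair:
    """One candidate vulnerability-fix pair (illustrative fields only)."""
    cve_id: str
    file_path: str
    changed_lines: int


def language_of(path: str) -> Optional[str]:
    """Return the assumed language of a file, or None if it is not in the table."""
    _, ext = os.path.splitext(path)
    return EXT_TO_LANG.get(ext.lower())


def keep_pair(pair: FixPair, max_changed_lines: int = 50) -> bool:
    """Assumed quality rule: known source language and a small, non-empty patch."""
    if language_of(pair.file_path) is None:
        return False
    return 0 < pair.changed_lines <= max_changed_lines


def filter_pairs(pairs: List[FixPair]) -> List[FixPair]:
    """Keep only the pairs that pass the illustrative quality rule."""
    return [p for p in pairs if keep_pair(p)]


if __name__ == "__main__":
    # Placeholder identifiers, not real CVE entries.
    candidates = [
        FixPair("CVE-0000-0001", "src/parse.c", 12),
        FixPair("CVE-0000-0002", "README.md", 3),
        FixPair("CVE-0000-0003", "lib/big.py", 900),
    ]
    for pair in filter_pairs(candidates):
        print(pair.cve_id, language_of(pair.file_path))
```

A rule of this kind trades recall for precision: documentation-only commits and very large refactoring patches are dropped, on the assumption that they are less likely to isolate a single fix.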
Pages: 1952-1962
Number of pages: 11