REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Authors
Wang, Chaozheng [1]
Li, Zongjie [2]
Peng, Yun [3]
Gao, Shuzheng [3]
Chen, Sirong [1]
Wang, Shuai [2]
Gao, Cuiyun [1]
Lyu, Michael R. [3]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Comp Sci & Engn Dept, Hong Kong, Peoples R China
Source
2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023
Funding
National Natural Science Foundation of China
Keywords
Vulnerability; Data collection; Bug fix; Generation
DOI
10.1109/ASE56229.2023.00199
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep learning methods have been applied to this area and have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited, as they tend to focus on a single programming language and have relatively small datasets. Moreover, many benchmarks tend to be outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit messages as explanations. To address these issues, we propose REEF, an automated framework for collecting REal-world vulnErabilities and Fixes from open-source repositories. We focus on vulnerabilities since they are exploitable and have serious consequences. We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs. Furthermore, we propose a neural language model-based approach to generate high-quality vulnerability explanations, which is key to producing informative fix messages. Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations. The dataset we collect contains 4,466 CVEs with 30,987 patches (including 236 CWEs) across 7 programming languages with detailed related information, which is superior to existing benchmarks in scale, coverage, and quality. Evaluations by human experts further confirm that our framework produces high-quality vulnerability explanations.
Pages: 1952-1962
Number of pages: 11