REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Authors
Wang, Chaozheng [1]
Li, Zongjie [2]
Peng, Yun [3]
Gao, Shuzheng [3]
Chen, Sirong [1]
Wang, Shuai [2]
Gao, Cuiyun [1]
Lyu, Michael R. [3]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Comp Sci & Engn Dept, Hong Kong, Peoples R China
Source
2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023
Funding
National Natural Science Foundation of China
Keywords
Vulnerability; Data collection; Bug fix; Generation
DOI
10.1109/ASE56229.2023.00199
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
Software plays a crucial role in daily life, so the quality and security of software systems have become increasingly important. Vulnerabilities nevertheless remain a significant threat because they can have serious consequences. Recent work on automated program repair seeks to detect and fix bugs automatically using data-driven techniques, and sophisticated deep learning methods applied to this area have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited: they tend to focus on a single programming language and are relatively small. Moreover, many benchmarks are outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit messages as explanations. To address these issues, we propose REEF, an automated framework for collecting REal-world vulnErabilities and Fixes from open-source repositories. We focus on vulnerabilities because they are exploitable and have serious consequences. We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs. Furthermore, we propose a neural language model-based approach to generate high-quality vulnerability explanations, which is key to producing informative fix messages. Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations. The dataset we collect contains 4,466 CVEs with 30,987 patches (including 236 CWEs) across 7 programming languages, together with detailed related information, and is superior to existing benchmarks in scale, coverage, and quality. Evaluations by human experts further confirm that our framework produces high-quality vulnerability explanations.
Pages: 1952-1962
Page count: 11