REEF: A Framework for Collecting Real-World Vulnerabilities and Fixes

Cited by: 0

Authors:

Wang, Chaozheng [1]

Li, Zongjie [2]

Peng, Yun [3]

Gao, Shuzheng [3]

Chen, Sirong [1]

Wang, Shuai [2]

Gao, Cuiyun [1]

Lyu, Michael R. [3]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China

[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[3] Chinese Univ Hong Kong, Comp Sci & Engn Dept, Hong Kong, Peoples R China

Source:

2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Vulnerability; Data collection; Bug fix; GENERATION;

DOI:

10.1109/ASE56229.2023.00199

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Software plays a crucial role in our daily lives, and therefore the quality and security of software systems have become increasingly important. However, vulnerabilities in software still pose a significant threat, as they can have serious consequences. Recent advances in automated program repair have sought to automatically detect and fix bugs using data-driven techniques. Sophisticated deep learning methods have been applied to this area and have achieved promising results. However, existing benchmarks for training and evaluating these techniques remain limited, as they tend to focus on a single programming language and contain relatively small datasets. Moreover, many benchmarks are outdated and lack diversity, focusing on a specific codebase. Worse still, the quality of bug explanations in existing datasets is low, as they typically use imprecise and uninformative commit messages as explanations. To address these issues, we propose REEF, an automated framework for collecting REal-world vulnErabilities and Fixes from open-source repositories. We focus on vulnerabilities since they are exploitable and have serious consequences. We develop a multi-language crawler to collect vulnerabilities and their fixes, and design metrics to filter for high-quality vulnerability-fix pairs. Furthermore, we propose a neural language model-based approach to generate high-quality vulnerability explanations, which is key to producing informative fix messages. Through extensive experiments, we demonstrate that our approach can collect high-quality vulnerability-fix pairs and generate strong explanations. The dataset we collect contains 4,466 CVEs with 30,987 patches (including 236 CWEs) across 7 programming languages with detailed related information, and is superior to existing benchmarks in scale, coverage, and quality. Evaluations by human experts further confirm that our framework produces high-quality vulnerability explanations.


Pages: 1952 - 1962

Page count: 11
