How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

Cited by: 0
Authors
Zhang, Cen [1 ]
Zheng, Yaowen [1 ]
Bai, Mingqiang [2 ,3 ]
Li, Yeting [2 ,3 ]
Ma, Wei [1 ]
Xie, Xiaofei [4 ]
Li, Yuekang [5 ]
Sun, Limin [2 ,3 ]
Liu, Yang [1 ]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Chinese Acad Sci, IIE, Beijing, Peoples R China
[3] UCAS, Sch Cyber Secur, Beijing, Peoples R China
[4] Singapore Management Univ, Singapore, Singapore
[5] Univ New South Wales, Sydney, NSW, Australia
Funding
National Research Foundation of Singapore;
Keywords
Fuzz Driver Generation; Fuzz Testing; Large Language Model;
DOI
10.1145/3650212.3680355
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fuzz drivers are essential for library API fuzzing. However, automatically generating fuzz drivers is a complex task, as it demands the creation of high-quality, correct, and robust API usage code. LLM-based (Large Language Model) fuzz driver generation is a promising area of research. Unlike traditional program-analysis-based generators, this text-based approach is more general, can harness a variety of API usage information, and produces human-readable code. However, the fundamental questions in this direction, such as its effectiveness and potential challenges, remain poorly understood. To bridge this gap, we conducted the first in-depth study of the key issues in using LLMs to generate effective fuzz drivers. Our study features a curated dataset of 86 fuzz driver generation questions from 30 widely used C projects. Six prompting strategies were designed and tested across five state-of-the-art LLMs with five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, at a cost of 0.85 billion tokens ($8,000+ in charged tokens). Additionally, we compared the LLM-generated drivers against those used in industry through extensive fuzzing experiments (3.75 CPU-years). Our study uncovered that: 1) while LLM-based fuzz driver generation is a promising direction, it still faces several obstacles to practical application; 2) LLMs have difficulty generating effective fuzz drivers for APIs with intricate specifics, and three design choices in prompting strategies prove beneficial: issuing repeated queries, querying with examples, and employing an iterative querying process; 3) while LLM-generated drivers can achieve fuzzing outcomes on par with those used in industry, there is substantial room for improvement, such as extending the API usage they contain or integrating semantic oracles to enable logical bug detection. Our insights have been implemented in the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
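For context on what the study's generation targets look like: a fuzz driver for a C library is a small harness that adapts fuzzer-generated bytes into calls on the library API. The sketch below shows the standard libFuzzer entry-point shape; `lib_parse` is a hypothetical stand-in for a real library function, not an API from the paper or its dataset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library API under test (stand-in for a real C library). */
static int lib_parse(const uint8_t *buf, size_t len) {
    return (len > 0 && buf[0] == 'A') ? 1 : 0;
}

/* libFuzzer entry point: a fuzz driver feeds fuzzer-generated bytes to the
   library API; a correct driver must not crash on malformed input and must
   return 0 (non-zero return values are reserved by libFuzzer). */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (size == 0) {
        return 0;              /* guard: nothing to parse */
    }
    lib_parse(data, size);     /* exercise the API under fuzzing */
    return 0;
}
```

Built with `clang -fsanitize=fuzzer`, libFuzzer supplies `main` and repeatedly invokes this function with mutated inputs; the quality criteria the paper evaluates (compilable, correct API usage, robust against arbitrary bytes) all concern the body of this one function.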
Pages: 1223-1235
Number of pages: 13