How Effective Are They? Exploring Large Language Model Based Fuzz Driver Generation

Authors
Zhang, Cen [1]
Zheng, Yaowen [1]
Bai, Mingqiang [2,3]
Li, Yeting [2,3]
Ma, Wei [1]
Xie, Xiaofei [4]
Li, Yuekang [5]
Sun, Limin [2,3]
Liu, Yang [1]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Chinese Acad Sci, IIE, Beijing, Peoples R China
[3] UCAS, Sch Cyber Secur, Beijing, Peoples R China
[4] Singapore Management Univ, Singapore, Singapore
[5] Univ New South Wales, Sydney, NSW, Australia
Funding
National Research Foundation, Singapore
Keywords
Fuzz Driver Generation; Fuzz Testing; Large Language Model
DOI
10.1145/3650212.3680355
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fuzz drivers are essential for library API fuzzing. However, generating them automatically is a complex task, because it requires high-quality, correct, and robust API usage code. Generating fuzz drivers with large language models (LLMs) is a promising area of research. Unlike traditional generators based on program analysis, this text-based approach generalizes better, can draw on a wider variety of API usage information, and produces code that is easy for humans to read. However, the fundamental issues in this direction, such as its effectiveness and potential challenges, are still poorly understood. To bridge this gap, we conducted the first in-depth study of the key issues in using LLMs to generate effective fuzz drivers. Our study features a curated dataset of 86 fuzz driver generation questions drawn from 30 widely used C projects. Six prompting strategies are designed and tested across five state-of-the-art LLMs at five different temperature settings. In total, our study evaluated 736,430 generated fuzz drivers, consuming 0.85 billion tokens (more than $8,000 in charges). Additionally, we compared the LLM-generated drivers against those used in industry through extensive fuzzing experiments (3.75 CPU-years). Our study found that: 1) while LLM-based fuzz driver generation is a promising direction, it still faces several obstacles to practical application; 2) LLMs have difficulty generating effective fuzz drivers for APIs with intricate specifics, and three prompt strategy design choices prove beneficial: issuing repeated queries, querying with examples, and querying iteratively; 3) while LLM-generated drivers can yield fuzzing outcomes on par with those of drivers used in industry, there is substantial room for improvement, such as extending the API usage the drivers contain or integrating semantic oracles to enable the detection of logic bugs. Our insights have been incorporated into the OSS-Fuzz-Gen project, facilitating practical fuzz driver generation in industry.
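For readers unfamiliar with the term, the following is a minimal, generic sketch of what a fuzz driver for a C library API can look like. It assumes libFuzzer's `LLVMFuzzerTestOneInput` entry point; `kv_get_value` is a hypothetical stand-in for a real library API and is not taken from the study's dataset or from OSS-Fuzz-Gen. The driver illustrates the kind of API-specific correctness the abstract refers to: the raw fuzz input must be turned into a NUL-terminated string the API expects, and returned memory must be freed so the driver itself does not leak or crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library API under test (an assumption for illustration):
 * parses "key=value" and returns a heap-allocated copy of the value,
 * or NULL if the input has no '=' or an empty key. */
static char *kv_get_value(const char *text) {
    const char *eq = strchr(text, '=');
    if (eq == NULL || eq == text) {
        return NULL; /* missing '=' or empty key */
    }
    size_t len = strlen(eq + 1);
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, eq + 1, len + 1); /* copies the terminating NUL too */
    return out;
}

/* libFuzzer entry point: invoked once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* The API expects a NUL-terminated string, but fuzz input is raw bytes,
     * so copy it into a terminated buffer instead of passing data directly. */
    char *text = malloc(size + 1);
    if (text == NULL) {
        return 0;
    }
    memcpy(text, data, size);
    text[size] = '\0';

    char *value = kv_get_value(text);
    free(value); /* free(NULL) is a no-op, so no NULL check is needed */
    free(text);
    return 0; /* libFuzzer expects 0 for normally processed inputs */
}
```

With clang, such a driver is typically built with `clang -g -fsanitize=fuzzer,address driver.c`, where the file name is only an example; libFuzzer then supplies `main` and generates inputs. A driver that skipped the copy and passed `data` straight to a string API would read past the buffer, which is a false positive caused by the driver rather than a bug in the library.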
Pages: 1223-1235
Number of pages: 13