Enhanced automated code vulnerability repair using large language models

Cited by: 2
Authors
de-Fitero-Dominguez, David [1 ]
Garcia-Lopez, Eva [1 ]
Garcia-Cabot, Antonio [1 ]
Martinez-Herraiz, Jose-Javier [1 ]
Affiliations
[1] Univ Alcala, Dept Ciencias Computac, Alcala De Henares 28805, Spain
Keywords
Automated code repair; Deep learning; Large language models; Vulnerability repair; Mistral; Code Llama
DOI
10.1016/j.engappai.2024.109291
Chinese Library Classification (CLC): TP [Automation and computer technology]
Discipline code: 0812
Abstract
This research addresses the challenge of automated repair of code vulnerabilities, which is vital for enhancing digital security in an increasingly technology-driven world. The study introduces a novel and efficient format for representing code modifications, using advanced Large Language Models (LLMs) such as Code Llama and Mistral. These models, fine-tuned on datasets of C/C++ code vulnerabilities, significantly improve the accuracy and adaptability of automated code repair techniques. A key finding is the improved repair accuracy of these models compared to previous methods such as VulRepair, which underscores their practical utility and efficiency. The research also offers a critical assessment of current evaluation metrics, such as "Perfect Predictions", and their limitations in reflecting the true capabilities of automated repair models in real-world scenarios. It further underscores the importance of using test datasets free of training samples, emphasizing the need for dataset integrity to ensure meaningful evaluation of LLMs on code repair tasks. The significance of this work lies in its contribution to digital security, setting new standards for automated code vulnerability repair and paving the way for future advances in cybersecurity and artificial intelligence. The study not only highlights the potential of LLMs in enhancing code security but also fosters further exploration and research in these crucial areas.
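The abstract's two evaluation concerns, the "Perfect Predictions" metric and train/test overlap, can be illustrated with a minimal sketch. Note this is a hypothetical illustration, not the paper's actual implementation: the helper names (`perfect_prediction_rate`, `deduplicate`) and the whitespace-normalized exact-match criterion are assumptions for the example.

```python
def normalize(code: str) -> str:
    """Collapse runs of whitespace so trivially different formatting still matches."""
    return " ".join(code.split())


def perfect_prediction_rate(predictions, references):
    """'Perfect Predictions': fraction of generated fixes that exactly
    match the ground-truth fix (after whitespace normalization)."""
    hits = sum(normalize(p) == normalize(r)
               for p, r in zip(predictions, references))
    return hits / len(references)


def deduplicate(test_set, train_set):
    """Drop test samples whose (vulnerable code, fix) pair also appears
    in training, so the metric cannot be inflated by memorization."""
    seen = {(normalize(src), normalize(fix)) for src, fix in train_set}
    return [(src, fix) for src, fix in test_set
            if (normalize(src), normalize(fix)) not in seen]
```

As the abstract argues, an exact-match metric like this is strict (semantically correct but differently written fixes score zero) and is only meaningful when `deduplicate`-style filtering guarantees the test set shares no samples with training.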
Pages: 13