Enhanced automated code vulnerability repair using large language models

Cited by: 2

Authors:

de-Fitero-Dominguez, David [1]

Garcia-Lopez, Eva [1]

Garcia-Cabot, Antonio [1]

Martinez-Herraiz, Jose-Javier [1]

Affiliations:

[1] Univ Alcala, Dept Ciencias Computac, Alcala De Henares 28805, Spain

Source:

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE | 2024 / Volume 138

Keywords:

Automated code repair; Deep learning; Large language models; Vulnerability repair; Mistral; Code Llama

DOI:

10.1016/j.engappai.2024.109291

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This research addresses the complex challenge of automated repair of code vulnerabilities, vital for enhancing digital security in an increasingly technology-driven world. The study introduces a novel and efficient format for the representation of code modification, using advanced Large Language Models (LLMs) such as Code Llama and Mistral. These models, fine-tuned on datasets featuring C/C++ code vulnerabilities, significantly improve the accuracy and adaptability of automated code repair techniques. A key finding is the enhanced repair accuracy of these models when compared to previous methods such as VulRepair, which underscores their practical utility and efficiency. The research also offers a critical assessment of current evaluation metrics, such as "Perfect Predictions", and their limitations in reflecting the true capabilities of automated repair models in real-world scenarios. Following this, it underscores the importance of using test datasets free of training samples, emphasizing the need for dataset integrity to enhance the effectiveness of LLMs in code repair tasks. The significance of this work is its contribution to digital security, setting new standards for automated code vulnerability repair and paving the way for future advancements in the fields of cybersecurity and artificial intelligence. The study not only highlights the potential of LLMs for enhancing code security but also encourages further exploration and research in these crucial areas.
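The two evaluation concerns named in the abstract, the "Perfect Predictions" metric and test sets free of training samples, can be illustrated with a minimal sketch. The abstract does not define either procedure, so everything here is an assumption: "Perfect Predictions" is taken to mean an exact match between the generated fix and the reference fix after whitespace normalization, and overlap is taken to mean an identical (vulnerable, fixed) pair appearing in both splits. The function names are hypothetical and are not taken from the paper.

```python
def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences do not affect comparison."""
    return " ".join(code.split())


def remove_train_overlap(train, test):
    """Drop test pairs whose (vulnerable, fixed) pair also appears in the training set.

    Assumption: a sample counts as leaked only if both sides match after normalization.
    """
    seen = {(normalize(v), normalize(f)) for v, f in train}
    return [(v, f) for v, f in test if (normalize(v), normalize(f)) not in seen]


def perfect_prediction_rate(predictions, references):
    """Fraction of predictions identical to the reference fix after normalization.

    Assumption: this exact-match reading of "Perfect Predictions" is illustrative only.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

Under this reading, a model that memorized leaked training pairs would score well without repairing anything new, and a semantically correct fix written differently from the reference would score zero, which is one way to see why an exact-match metric can misstate real-world repair ability.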


Pages: 13
