How secure is AI-generated code: a large-scale comparison of large language models

Cited by: 1
Authors
Tihanyi, Norbert [1 ,2 ]
Bisztray, Tamas [3 ]
Ferrag, Mohamed Amine [4 ]
Jain, Ridhi [2 ]
Cordeiro, Lucas C. [5 ,6 ]
Affiliations
[1] Eötvös Loránd University, Budapest, Hungary
[2] Technology Innovation Institute (TII), Abu Dhabi, United Arab Emirates
[3] University of Oslo, Oslo, Norway
[4] Guelma University, Guelma, Algeria
[5] University of Manchester, Manchester, England
[6] Federal University of Amazonas, Manaus, Brazil
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Large language models; Vulnerability classification; Formal verification; Software security; Artificial intelligence; Dataset; Checking;
DOI
10.1007/s10664-024-10590-1
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
This study compares state-of-the-art Large Language Models (LLMs) on their tendency to generate vulnerabilities when writing C programs from a neutral zero-shot prompt. Tihanyi et al. introduced the FormAI dataset at PROMISE '23, featuring 112,000 C programs generated by GPT-3.5-turbo, of which at least 51.24% were identified as vulnerable. We extend that research with a large-scale study involving nine state-of-the-art models, including OpenAI's GPT-4o-mini, Google's Gemini Pro 1.0, TII's 180-billion-parameter Falcon, Meta's 13-billion-parameter Code Llama, and several other compact models. In addition, we introduce the FormAI-v2 dataset, which comprises 331,000 compilable C programs generated by these LLMs. Each program in the dataset is labeled according to the vulnerabilities detected in its source code through formal verification with the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique minimizes false positives by producing a concrete counterexample for each reported vulnerability and reduces false negatives by running the verification to completion. Our study reveals that at least 62.07% of the generated programs are vulnerable. The differences between the models are minor; all exhibit similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires proper risk assessment and validation.
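For context on the kind of defect being counted, the sketch below is hypothetical and not taken from the FormAI-v2 dataset; buffer sizes and names are invented for illustration. It shows a classic out-of-bounds write (CWE-787) of the sort a bounded model checker such as ESBMC can flag with a concrete counterexample, here any sufficiently long input line:

/* Illustrative sketch only -- not a program from the FormAI-v2
 * dataset; buffer sizes and names are invented for this example.
 * It demonstrates an out-of-bounds write (CWE-787) that a bounded
 * model checker can report with a violating input trace. */
#include <stdio.h>
#include <string.h>

int main(void) {
    char name[8];
    char input[64];

    /* Read up to 63 characters plus a NUL into the larger buffer. */
    if (fgets(input, sizeof(input), stdin) == NULL)
        return 1;

    /* Vulnerable: strcpy performs no bounds check, so any line
     * longer than 7 characters overflows 'name'. */
    strcpy(name, input);

    printf("Hello, %s", name);
    return 0;
}

On a program like this, a bounded model checker symbolically explores executions up to a fixed bound and, when a safety property such as an array bounds check fails, emits the input trace that triggers the violation; this counterexample-backed reporting is what the abstract credits with keeping false positives low.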
Pages: 42