The increasing use of large language models (LLMs) such as OpenAI's ChatGPT and Google's Bard in the software development industry raises questions about the security of the code they generate. Our research evaluates Java, C, and Python code samples generated by these LLMs. In our investigation, we assessed the consistency of the code samples generated by each LLM, characterized the security of the generated code, and asked each LLM to evaluate and fix the weaknesses in its own generated code as well as in the code of the other LLM. Using 133 unique prompts from Google Code Jam competitions, we produced 3,854 code samples across the three programming languages. We found that the code produced by these LLMs is frequently insecure and prone to weaknesses and vulnerabilities. This is a concern for human developers, who must exercise caution when employing these LLMs.
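As a hypothetical illustration (not drawn from the paper's samples), the weaknesses that security characterizations of generated code typically flag include classic memory-safety issues in C, such as unbounded input reads (e.g., CWE-242 use of gets() and the resulting CWE-787 out-of-bounds write). The sketch below contrasts the unsafe pattern with a bounded alternative; the buffer name and size are invented for the example.

/*
 * Hypothetical illustration: the kind of weakness static analyzers
 * commonly flag in LLM-generated C code, and a bounded fix.
 */
#include <stdio.h>
#include <string.h>

#define BUF_LEN 64

int main(void) {
    char name[BUF_LEN];

    /* Weak pattern sometimes seen in generated code:
     *   gets(name);   -- no bounds check (removed in C11), can overflow name
     *
     * Safer replacement: fgets() limits the read to the buffer size. */
    if (fgets(name, sizeof name, stdin) == NULL) {
        return 1;  /* EOF or read error */
    }
    name[strcspn(name, "\n")] = '\0';  /* strip the trailing newline, if any */

    printf("Hello, %s\n", name);
    return 0;
}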
Authors:
Baraheem, Samah S.
Nguyen, Tam V.

Affiliations:
Umm Al Qura Univ, Dept Comp Sci, Prince Sultan Bin Abdulaz Rd, Mecca 21421, Makkah, Saudi Arabia
Univ Dayton, Dept Comp Sci, Dayton, OH 45469 USA