Effective test generation using pre-trained Large Language Models and mutation testing

Cited by: 2
Authors
Dakhel, Arghavan Moradi [1]
Nikanjam, Amin [1]
Majdinasab, Vahid [1]
Khomh, Foutse [1]
Desmarais, Michel C. [1]
Affiliation
[1] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Test generation; Large language model; Mutation testing
DOI
10.1016/j.infsof.2024.107468
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
Context: One of the critical phases in the software development life cycle is software testing. Testing helps with identifying potential bugs and reducing maintenance costs. The goal of automated test generation tools is to ease the development of tests by suggesting efficient bug-revealing tests. Recently, researchers have leveraged Large Language Models (LLMs) of code to generate unit tests. While the code coverage of generated tests is usually assessed, the literature acknowledges that coverage is only weakly correlated with the efficiency of tests in bug detection.
Objective: To address this limitation, in this paper we introduce MuTAP (Mutation Test case generation using Augmented Prompt), which leverages mutation testing to improve the effectiveness of LLM-generated test cases at revealing bugs.
Methods: Our goal is achieved by augmenting prompts with surviving mutants, as those mutants highlight the limitations of test cases in detecting bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Programs Under Test (PUTs). We employ different LLMs within MuTAP and evaluate their performance on different benchmarks.
Results: Our results show that our proposed method detects up to 28% more faulty human-written code snippets. Among these, 17% remained undetected by both the current state-of-the-art fully automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57% on synthetic buggy code, outperforming all other approaches in our evaluation.
Conclusion: Our findings suggest that although LLMs can serve as a useful tool for generating test cases, they require specific post-processing steps to enhance the effectiveness of the generated test cases, which may suffer from syntactic or functional errors and may be ineffective in detecting certain types of bugs and testing corner cases in PUTs.
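For reference, the Mutation Score (MS) reported in the results follows the standard mutation-testing definition (not restated in the abstract; some variants exclude equivalent mutants from the denominator):

MS = \frac{|\{\text{mutants killed by the test suite}\}|}{|\{\text{generated mutants}\}|} \times 100\%

As a rough illustration of the workflow the abstract describes, the Python sketch below shows a prompt-augmentation loop driven by surviving mutants. It is a minimal sketch under stated assumptions, not the authors' implementation: generate_tests, make_mutants, and kills are hypothetical placeholders standing in for an LLM call, a mutation tool (e.g., MutPy), and a test executor.

# Minimal sketch of a MuTAP-style loop; all helpers below are hypothetical
# placeholders, not the paper's actual code.
def generate_tests(prompt: str) -> str:
    """Placeholder for an LLM call returning a Python test suite as text."""
    raise NotImplementedError

def make_mutants(put_source: str) -> list[str]:
    """Placeholder for a mutation tool returning mutated variants of the PUT."""
    raise NotImplementedError

def kills(test_suite: str, mutant_source: str) -> bool:
    """Placeholder: run the suite against one mutant; True if any test fails."""
    raise NotImplementedError

def mutap_loop(put_source: str, max_rounds: int = 3) -> str:
    # The initial prompt contains only the PUT; no natural-language
    # description of its behavior is required.
    prompt = "Write pytest unit tests for this function:\n" + put_source
    tests = generate_tests(prompt)
    for _ in range(max_rounds):
        survivors = [m for m in make_mutants(put_source) if not kills(tests, m)]
        if not survivors:
            break  # every mutant killed: suite is adequate w.r.t. this mutant set
        # Augment the prompt with a surviving mutant, steering the next round
        # of generation toward behavior the current suite cannot distinguish.
        prompt += ("\nThe tests above do not detect this buggy variant; "
                   "add a test that fails on it:\n" + survivors[0])
        tests = generate_tests(prompt)
    return tests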
Pages: 17