CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models

Cited by: 32
Authors
Lemieux, Caroline [1]
Inala, Jeevana Priya [2]
Lahiri, Shuvendu K. [2]
Sen, Siddhartha [2]
Affiliations
[1] University of British Columbia, Vancouver, BC, Canada
[2] Microsoft Research, Redmond, WA, USA
Keywords
Algorithm
DOI
10.1109/ICSE48619.2023.00085
Chinese Library Classification (CLC)
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Search-based software testing (SBST) generates high-coverage test cases for programs under test with a combination of test case generation and mutation. SBST's performance relies on there being a reasonable probability of generating test cases that exercise the core logic of the program under test. Given such test cases, SBST can then explore the space around them to exercise various parts of the program. This paper explores whether Large Language Models (LLMs) of code, such as OpenAI's Codex, can be used to help SBST's exploration. Our proposed algorithm, CODAMOSA, conducts SBST until its coverage improvements stall, then asks Codex to provide example test cases for under-covered functions. These examples help SBST redirect its search to more useful areas of the search space. On an evaluation over 486 benchmarks, CODAMOSA achieves statistically significantly higher coverage on many more benchmarks (173 and 279) than it reduces coverage on (10 and 4), compared to SBST and LLM-only baselines.
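
The abstract sketches CODAMOSA's control loop: run standard SBST, detect when coverage improvements stall, then prompt the LLM for example test cases targeting under-covered functions and fold the resulting tests back into the search population. The Python sketch below illustrates that loop under stated assumptions: the helper callables (sbst_step, measure_coverage, llm_tests_for_undercovered), the time budget, and the stall threshold are hypothetical stand-ins, not the paper's actual API or parameter values.

    import time
    from typing import Callable, List

    def codamosa_loop(
        sbst_step: Callable[[List[str]], List[str]],          # one SBST round: mutate/select test cases (assumed helper)
        measure_coverage: Callable[[List[str]], float],       # coverage achieved by the population (assumed helper)
        llm_tests_for_undercovered: Callable[[], List[str]],  # prompt the LLM for example tests (assumed helper)
        budget_s: float = 600.0,   # illustrative time budget, not the paper's setting
        stall_limit: int = 25,     # illustrative plateau threshold, not the paper's setting
    ) -> List[str]:
        """Run SBST; when coverage plateaus, seed the population with LLM-generated tests."""
        population: List[str] = []
        best_cov, stalled = 0.0, 0
        deadline = time.monotonic() + budget_s
        while time.monotonic() < deadline:
            population = sbst_step(population)
            cov = measure_coverage(population)
            if cov > best_cov:
                best_cov, stalled = cov, 0
            else:
                stalled += 1
            if stalled >= stall_limit:
                # Plateau detected: redirect the search with LLM-provided examples
                # for under-covered functions, then resume SBST from the new population.
                population.extend(llm_tests_for_undercovered())
                stalled = 0
        return population

Passing the SBST step and the LLM query in as callables keeps the plateau-escape logic independent of any particular generator; the published tool integrates this idea into the Pynguin SBST framework for Python, deserializing Codex completions into Pynguin's internal test-case representation.
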
Pages: 919-931
Page count: 13