Large Language Models: The Next Frontier for Variable Discovery within Metamorphic Testing

Cited by: 2

Authors:

Tsigkanos, Christos [1]

Rani, Pooja [2]

Mueller, Sebastian [3]

Kehrer, Timo [1]

Affiliations:

[1] Univ Bern, Bern, Switzerland

[2] Univ Zurich, Zurich, Switzerland

[3] Humboldt Univ, Berlin, Germany

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER | 2023

Keywords:

Metamorphic Testing; Large Language Models; Natural Language Processing; Scientific Software;

DOI:

10.1109/SANER56733.2023.00070

Chinese Library Classification:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Metamorphic testing involves reasoning on necessary properties that a program under test should exhibit regarding multiple input and output variables. A general approach consists of extracting metamorphic relations from auxiliary artifacts such as user manuals or documentation, a strategy particularly well suited to testing scientific software. However, such software typically has large input-output spaces, and the fundamental prerequisite of extracting variables of interest is an arduous and non-scalable process when performed manually. To this end, we devise a workflow around an autoregressive transformer-based Large Language Model (LLM) for the extraction of variables from user manuals of scientific software. Our end-to-end approach, apart from a prompt specification consisting of few-shot examples provided by a human user, is fully automated, in contrast to current practice, which requires human intervention. We showcase our LLM workflow on a real case, and compare the extracted variables to ground truth manually labelled by experts. Our preliminary results show that our LLM-based workflow achieves an accuracy of 0.87, while successfully deriving 61.8% of variables as partial matches and 34.7% as exact matches.
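
The abstract reports exact and partial match rates against expert-labelled ground truth but does not define how matches are decided. The following is a minimal Python sketch of one way such a comparison could be computed. The matching rules (case-insensitive equality for "exact", any shared word for "partial"), the function names, and the sample variable names are all assumptions made for illustration; they are not taken from the paper.

```python
def normalize(name: str) -> str:
    """Lowercase a variable name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def classify_match(extracted: str, truth: set[str]) -> str:
    """Label one extracted name as 'exact', 'partial', or 'none'.

    Assumed rules (hypothetical, not from the paper):
    - exact: equal to a ground-truth name after normalization
    - partial: shares at least one word with some ground-truth name
    """
    candidate = normalize(extracted)
    normalized_truth = {normalize(t) for t in truth}
    if candidate in normalized_truth:
        return "exact"
    candidate_words = set(candidate.split())
    if any(candidate_words & set(t.split()) for t in normalized_truth):
        return "partial"
    return "none"


def match_rates(extracted: list[str], truth: set[str]) -> dict[str, float]:
    """Return the fraction of extracted names falling in each category."""
    counts = {"exact": 0, "partial": 0, "none": 0}
    for name in extracted:
        counts[classify_match(name, truth)] += 1
    total = len(extracted)
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical variable names, for illustration only.
    ground_truth = {"Soil Temperature", "wind speed"}
    llm_output = ["soil temperature", "wind direction", "albedo", "Wind  Speed"]
    print(match_rates(llm_output, ground_truth))
```

A word-overlap rule is deliberately loose; a stricter scheme (for example, requiring the ground-truth name to be a substring of the extracted one) would lower the partial-match rate, so any reported figure depends on which rule is chosen.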


Pages: 678-682

Number of pages: 5

50 items in total

[1] Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models
Li, Ningke
Li, Yuekang
Liu, Yi
Shi, Ling
Wang, Kailong
Wang, Haoyu
[J]. Proceedings of the ACM on Programming Languages, 2024, 8 (OOPSLA2)
[2] Evaluating Natural Language Inference Models: A Metamorphic Testing Approach
Jiang, Mingyue
Bao, Houzhen
Tu, Kaiyi
Zhang, Xiao-Yi
Ding, Zuohua
[J]. 2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021), 2021, : 220 - 230
[3] Causal Dataset Discovery with Large Language Models
Liu, Junfei
Sun, Shaotong
Nargesian, Fatemeh
[J]. WORKSHOP ON HUMAN-IN-THE-LOOP DATA ANALYTICS, HILDA 2024, 2024,
[4] Metamorphic Malware Evolution: The Potential and Peril of Large Language Models
Madani, Pooria
[J]. 2023 5TH IEEE INTERNATIONAL CONFERENCE ON TRUST, PRIVACY AND SECURITY IN INTELLIGENT SYSTEMS AND APPLICATIONS, TPS-ISA, 2023, : 74 - 81
[5] What’s the next word in large language models?
[J]. Nature Machine Intelligence, 2023, 5 : 331 - 332
[6] What's the next word in large language models?
Unknown
[J]. NATURE MACHINE INTELLIGENCE, 2023, 5 (04) : 331 - 332
[7] Applications of natural language processing and large language models in materials discovery
Xue Jiang
Weiren Wang
Shaohan Tian
Hao Wang
Turab Lookman
Yanjing Su
[J]. npj Computational Materials, 11 (1)
[8] Large language models: a new frontier in paediatric cataract patient education
Dihan, Qais
Chauhan, Muhammad Z.
Eleiwa, Taher K.
Brown, Andrew D.
Hassan, Amr K.
Khodeiry, Mohamed M.
Elsheikh, Reem H.
Oke, Isdin
Nihalani, Bharti R.
VanderVeen, Deborah K.
Sallam, Ahmed B.
Elhusseiny, Abdelrahman M.
[J]. BRITISH JOURNAL OF OPHTHALMOLOGY, 2024,
[9] Large Language Models for Next Point-of-Interest Recommendation
Li, Peibo
de Rijke, Maarten
Xue, Hao
Ao, Shuang
Song, Yang
Salim, Flora D.
[J]. PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 1463 - 1472
[10] What's Next in Affective Modeling? Large Language Models
Yongsatianchot, Nutchanon
Thejll-Madsen, Tobias
Marsella, Stacy
[J]. 2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023,
