Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases

Cited by: 2

Authors:

Shahandashti, Kimya Khakzad [1]

Sivakumar, Mithila [1]

Mohajer, Mohammad Mahdi [1]

Belle, Alvine B. [1]

Wang, Song [1]

Lethbridge, Timothy C. [2]

Affiliations:

[1] York Univ, Toronto, ON, Canada

[2] Univ Ottawa, Ottawa, ON, Canada

Source:

PROCEEDINGS 2024 IEEE/ACM FIRST INTERNATIONAL CONFERENCE ON AI FOUNDATION MODELS AND SOFTWARE ENGINEERING, FORGE 2024 | 2024

Keywords:

Large Language Models; assurance cases; assurance defeaters; system certification; FM for Requirement Engineering;

DOI:

10.1145/3650105.3652291

Chinese Library Classification:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Assurance cases (ACs) are structured arguments that allow verifying the correct implementation of the created systems' non-functional requirements (e.g., safety, security). This allows for preventing system failure. The latter may result in catastrophic outcomes (e.g., loss of lives). ACs support the certification of systems in compliance with industrial standards, e.g., DO-178C and ISO 26262. Identifying defeaters, i.e., arguments that challenge these ACs, is crucial for enhancing ACs' robustness and confidence. To automatically support that task, we propose a novel approach that explores the potential of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, in identifying defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our preliminary evaluation assesses the model's ability to comprehend and generate arguments in this context, and the results show that GPT-4 Turbo is very proficient in EA notation and can generate different types of defeaters.

Pages: 52-56

Page count: 5
