Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases

Cited by: 2
Authors
Shahandashti, Kimya Khakzad [1]
Sivakumar, Mithila [1]
Mohajer, Mohammad Mahdi [1]
Belle, Alvine B. [1]
Wang, Song [1]
Lethbridge, Timothy C. [2]
Affiliations
[1] York Univ, Toronto, ON, Canada
[2] Univ Ottawa, Ottawa, ON, Canada
Keywords
Large Language Models; assurance cases; assurance defeaters; system certification; FM for Requirement Engineering
DOI
10.1145/3650105.3652291
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Assurance cases (ACs) are structured arguments that allow verifying the correct implementation of a system's non-functional requirements (e.g., safety, security). This helps prevent system failures, which may result in catastrophic outcomes (e.g., loss of lives). ACs support the certification of systems in compliance with industrial standards, e.g., DO-178C and ISO 26262. Identifying defeaters (arguments that challenge these ACs) is crucial for enhancing ACs' robustness and confidence. To automatically support that task, we propose a novel approach that explores the potential of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, in identifying defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our preliminary evaluation assesses the model's ability to comprehend and generate arguments in this context, and the results show that GPT-4 Turbo is very proficient in EA notation and can generate different types of defeaters.
Pages: 52-56
Page count: 5