Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases

Cited by: 2
Authors
Shahandashti, Kimya Khakzad [1]
Sivakumar, Mithila [1]
Mohajer, Mohammad Mahdi [1]
Belle, Alvine B. [1]
Wang, Song [1]
Lethbridge, Timothy C. [2]
Affiliations
[1] York Univ, Toronto, ON, Canada
[2] Univ Ottawa, Ottawa, ON, Canada
Keywords
Large Language Models; assurance cases; assurance defeaters; system certification; FM for Requirement Engineering
DOI
10.1145/3650105.3652291
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Assurance cases (ACs) are structured arguments that allow verifying the correct implementation of the created systems' non-functional requirements (e.g., safety, security). This helps prevent system failures, which may result in catastrophic outcomes (e.g., loss of lives). ACs support the certification of systems in compliance with industrial standards, e.g., DO-178C and ISO 26262. Identifying defeaters (arguments that challenge these ACs) is crucial for enhancing ACs' robustness and confidence. To automatically support that task, we propose a novel approach that explores the potential of GPT-4 Turbo, an advanced Large Language Model (LLM) developed by OpenAI, in identifying defeaters within ACs formalized using the Eliminative Argumentation (EA) notation. Our preliminary evaluation assesses the model's ability to comprehend and generate arguments in this context, and the results show that GPT-4 Turbo is very proficient in EA notation and can generate different types of defeaters.
Pages: 52-56
Number of pages: 5