A Testing Framework for AI Linguistic Systems (testFAILS)

Cited by: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliation
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor
DOI
10.1109/AITest58265.2023.00017
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
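The abstract's mention of orthogonal array coverage points at a standard combinatorial-testing idea: instead of running every combination of test factors against a chatbot, select a reduced suite that still exercises every pairwise interaction between factors. The Python sketch below is a minimal, hedged illustration of that idea; the factor names (model, prompt_type, temperature), their levels, and the greedy_pairwise helper are assumptions made for the example and are not taken from the published testFAILS implementation.

# Hypothetical sketch: pairwise (orthogonal-array-style) selection of chatbot test cases.
# Factor names and levels are illustrative assumptions, not the testFAILS suite itself.
from itertools import combinations, product

factors = {
    "model":       ["ChatGPT-3.5", "ChatGPT-4", "Bard", "Bing Bot", "LLaMA"],
    "prompt_type": ["factual", "reasoning", "code", "multilingual"],
    "temperature": [0.0, 0.7, 1.0],
}
names = list(factors)

def uncovered_pairs(levels_by_factor):
    # Every (factor, level) pairing across two different factors that must be covered.
    pairs = set()
    for f1, f2 in combinations(names, 2):
        for l1 in levels_by_factor[f1]:
            for l2 in levels_by_factor[f2]:
                pairs.add(((f1, l1), (f2, l2)))
    return pairs

def pairs_of(case):
    # Pairs covered by one concrete test case (a dict mapping factor -> level).
    return {((f1, case[f1]), (f2, case[f2])) for f1, f2 in combinations(names, 2)}

def greedy_pairwise(levels_by_factor):
    # Greedily pick full combinations until every 2-way interaction is covered.
    remaining = uncovered_pairs(levels_by_factor)
    all_cases = [dict(zip(names, combo))
                 for combo in product(*(levels_by_factor[n] for n in names))]
    suite = []
    while remaining:
        best = max(all_cases, key=lambda c: len(pairs_of(c) & remaining))
        suite.append(best)
        remaining -= pairs_of(best)
    return suite

if __name__ == "__main__":
    suite = greedy_pairwise(factors)
    print(f"{len(suite)} test cases cover all pairwise interactions "
          f"(vs. {5 * 4 * 3} exhaustive combinations)")
    for case in suite:
        print(case)

For the assumed factors, such a greedy pairwise suite needs roughly 20 runs rather than the 60 exhaustive combinations, which is the kind of reduction orthogonal-array-style coverage is meant to provide; each selected case would then be sent to the corresponding chatbot and its response scored.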
Pages: 51 - 54
Page count: 4
Related Papers (50 in total)
  • [21] Brandt, H.; Worner, R. New Field Testing Systems for AI-Boars. Archiv fur Tierzucht - Archives of Animal Breeding, 1995, 38(3): 299-304.
  • [22] Lopez Rivera, Juan J. Applying the Framework of Complex Adaptive Systems to a Model of Linguistic Variation. Moenia - Revista Lucense de Linguistica & Literatura, 2013, 19: 5-24.
  • [23] Zong, Zefang; Yan, Huan; Sui, Hongjie; Li, Haoxiang; Jiang, Peiqi; Li, Yong. An AI-based Simulation and Optimization Framework for Logistic Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023), 2023: 5138-5142.
  • [24] Walter, Mathew J.; Barrett, Aaron; Tam, Kimberly. A Red Teaming Framework for Securing AI in Maritime Autonomous Systems. Applied Artificial Intelligence, 2024, 38(1).
  • [25] Reddy, Sandeep; Rogers, Wendy; Makinen, Ville-Petteri; Coiera, Enrico; Brown, Pieta; Wenzel, Markus; Weicken, Eva; Ansari, Saba; Mathur, Piyush; Casey, Aaron; Kelly, Blair. Evaluation Framework to Guide Implementation of AI Systems into Healthcare Settings. BMJ Health & Care Informatics, 2021, 28(1).
  • [26] Mohseni, Sina; Zarei, Niloofar; Ragan, Eric D. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ACM Transactions on Interactive Intelligent Systems, 2021, 11(3-4).
  • [27] De Angelis, Emanuele; De Angelis, Guglielmo; Proietti, Maurizio. A Classification Study on Testing and Verification of AI-based Systems. 2023 IEEE International Conference on Artificial Intelligence Testing (AITest), 2023: 1-8.
  • [28] Wotawa, Franz; Klampfl, Lorenz; Jahaj, Ledio. A Framework for the Automation of Testing Computer Vision Systems. 2021 IEEE/ACM International Conference on Automation of Software Test (AST 2021), 2021: 121-124.
  • [29] Yu, Tingting. An Observable and Controllable Testing Framework for Modern Systems. Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), 2013: 1377-1380.
  • [30] Satoh, Ichiro. A Software Testing Framework for Networked Industrial Systems. 39th Annual Conference of the IEEE Industrial Electronics Society (IECON 2013), 2013: 4340-4345.