A Testing Framework for AI Linguistic Systems (testFAILS)

Cited by: 0
Authors
Kumar, Y. [1]
Morreale, P. [1]
Sorial, P. [1]
Delgado, J. [1]
Li, J. Jenny [1]
Martins, P. [1]
Affiliation
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor
DOI
10.1109/AITest58265.2023.00017
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
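The abstract's reference to orthogonal array coverage can be illustrated with a minimal greedy pairwise (2-way) covering-array generator. This is a generic sketch of the combinatorial-testing idea, not the authors' implementation; the factor names and values (models, tasks, languages) are hypothetical examples chosen for illustration.

```python
from itertools import combinations, product

def pairwise_tests(factors):
    """Greedy pairwise (2-way) covering-array generator.

    `factors` maps parameter names to lists of possible values.
    Returns a list of test cases (dicts) such that every pair of
    values from any two parameters appears in at least one case.
    """
    names = list(factors)
    values = [factors[n] for n in names]
    # Every value pair that still needs to appear in some test case.
    uncovered = {
        (i, j): set(product(values[i], values[j]))
        for i, j in combinations(range(len(names)), 2)
    }
    suite = []
    while any(uncovered.values()):
        # Pick the full combination that covers the most missing pairs.
        best, best_gain = None, -1
        for cand in product(*values):
            gain = sum(
                (cand[i], cand[j]) in pairs
                for (i, j), pairs in uncovered.items()
            )
            if gain > best_gain:
                best, best_gain = cand, gain
        suite.append(dict(zip(names, best)))
        for (i, j), pairs in uncovered.items():
            pairs.discard((best[i], best[j]))
    return suite

# Hypothetical chatbot-evaluation dimensions (illustrative only).
factors = {
    "model": ["ChatGPT-4", "Bard", "LLaMA"],
    "task": ["question-answering", "summarization"],
    "language": ["English", "Spanish"],
}
suite = pairwise_tests(factors)
print(f"{len(suite)} cases cover all pairs vs. {3 * 2 * 2} exhaustive combinations")
```

With these factors, every pair of values across any two parameters is exercised by a suite noticeably smaller than the exhaustive cross product, which is the economy that orthogonal-array-style designs offer when each test case is an expensive chatbot interaction.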
Pages: 51-54
Number of pages: 4