A Testing Framework for AI Linguistic Systems (testFAILS)

Cited by: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliation
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor
DOI
10.1109/AITest58265.2023.00017
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
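The abstract's mention of orthogonal array coverage points at a standard combinatorial-testing idea: instead of running every combination of test factors against a chatbot, select a reduced suite that still exercises every pairwise interaction between factors. The Python sketch below is a minimal, hedged illustration of that idea; the factor names (model, prompt_type, temperature), their levels, and the greedy_pairwise helper are assumptions made for the example and are not taken from the published testFAILS implementation.

# Hypothetical sketch: pairwise (orthogonal-array-style) selection of chatbot test cases.
# Factor names and levels are illustrative assumptions, not the testFAILS suite itself.
from itertools import combinations, product

factors = {
    "model":       ["ChatGPT-3.5", "ChatGPT-4", "Bard", "Bing Bot", "LLaMA"],
    "prompt_type": ["factual", "reasoning", "code", "multilingual"],
    "temperature": [0.0, 0.7, 1.0],
}
names = list(factors)

def uncovered_pairs(levels_by_factor):
    # Every (factor, level) pairing across two different factors that must be covered.
    pairs = set()
    for f1, f2 in combinations(names, 2):
        for l1 in levels_by_factor[f1]:
            for l2 in levels_by_factor[f2]:
                pairs.add(((f1, l1), (f2, l2)))
    return pairs

def pairs_of(case):
    # Pairs covered by one concrete test case (a dict mapping factor -> level).
    return {((f1, case[f1]), (f2, case[f2])) for f1, f2 in combinations(names, 2)}

def greedy_pairwise(levels_by_factor):
    # Greedily pick full combinations until every 2-way interaction is covered.
    remaining = uncovered_pairs(levels_by_factor)
    all_cases = [dict(zip(names, combo))
                 for combo in product(*(levels_by_factor[n] for n in names))]
    suite = []
    while remaining:
        best = max(all_cases, key=lambda c: len(pairs_of(c) & remaining))
        suite.append(best)
        remaining -= pairs_of(best)
    return suite

if __name__ == "__main__":
    suite = greedy_pairwise(factors)
    print(f"{len(suite)} test cases cover all pairwise interactions "
          f"(vs. {5 * 4 * 3} exhaustive combinations)")
    for case in suite:
        print(case)

For the assumed factors, such a greedy pairwise suite needs roughly 20 runs rather than the 60 exhaustive combinations, which is the kind of reduction orthogonal-array-style coverage is meant to provide; each selected case would then be sent to the corresponding chatbot and its response scored.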
Pages: 51 - 54
Page count: 4
Related Papers (50 in total)
  • [21] Brandt, H.; Worner, R. New Field Testing Systems for AI-Boars. Archiv fur Tierzucht - Archives of Animal Breeding, 1995, 38(3): 299-304.
  • [22] Lopez Rivera, Juan J. Applying the Framework of Complex Adaptive Systems to a Model of Linguistic Variation. Moenia - Revista Lucense de Linguistica & Literatura, 2013, 19: 5-24.
  • [23] Zong, Zefang; Yan, Huan; Sui, Hongjie; Li, Haoxiang; Jiang, Peiqi; Li, Yong. An AI-based Simulation and Optimization Framework for Logistic Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023), 2023: 5138-5142.
  • [24] Walter, Mathew J.; Barrett, Aaron; Tam, Kimberly. A Red Teaming Framework for Securing AI in Maritime Autonomous Systems. Applied Artificial Intelligence, 2024, 38(1).
  • [25] Reddy, Sandeep; Rogers, Wendy; Makinen, Ville-Petteri; Coiera, Enrico; Brown, Pieta; Wenzel, Markus; Weicken, Eva; Ansari, Saba; Mathur, Piyush; Casey, Aaron; Kelly, Blair. Evaluation Framework to Guide Implementation of AI Systems into Healthcare Settings. BMJ Health & Care Informatics, 2021, 28(1).
  • [26] Mohseni, Sina; Zarei, Niloofar; Ragan, Eric D. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ACM Transactions on Interactive Intelligent Systems, 2021, 11(3-4).
  • [27] De Angelis, Emanuele; De Angelis, Guglielmo; Proietti, Maurizio. A Classification Study on Testing and Verification of AI-based Systems. 2023 IEEE International Conference on Artificial Intelligence Testing (AITest), 2023: 1-8.
  • [28] Wotawa, Franz; Klampfl, Lorenz; Jahaj, Ledio. A Framework for the Automation of Testing Computer Vision Systems. 2021 IEEE/ACM International Conference on Automation of Software Test (AST 2021), 2021: 121-124.
  • [29] Yu, Tingting. An Observable and Controllable Testing Framework for Modern Systems. Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), 2013: 1377-1380.
  • [30] Satoh, Ichiro. A Software Testing Framework for Networked Industrial Systems. 39th Annual Conference of the IEEE Industrial Electronics Society (IECON 2013), 2013: 4340-4345.