A Testing Framework for AI Linguistic Systems (testFAILS)

Cited by: 0
Authors
Kumar, Y. [1]
Morreale, P. [1]
Sorial, P. [1]
Delgado, J. [1]
Li, J. Jenny [1]
Martins, P. [1]
Affiliation
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor
DOI
10.1109/AITest58265.2023.00017
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
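The abstract's reference to orthogonal array coverage can be illustrated with a minimal greedy pairwise (2-way) covering-array generator. This is a generic sketch of the combinatorial-testing idea, not the authors' implementation; the factor names and values (models, tasks, languages) are hypothetical examples chosen for illustration.

```python
from itertools import combinations, product

def pairwise_tests(factors):
    """Greedy pairwise (2-way) covering-array generator.

    `factors` maps parameter names to lists of possible values.
    Returns a list of test cases (dicts) such that every pair of
    values from any two parameters appears in at least one case.
    """
    names = list(factors)
    values = [factors[n] for n in names]
    # Every value pair that still needs to appear in some test case.
    uncovered = {
        (i, j): set(product(values[i], values[j]))
        for i, j in combinations(range(len(names)), 2)
    }
    suite = []
    while any(uncovered.values()):
        # Pick the full combination that covers the most missing pairs.
        best, best_gain = None, -1
        for cand in product(*values):
            gain = sum(
                (cand[i], cand[j]) in pairs
                for (i, j), pairs in uncovered.items()
            )
            if gain > best_gain:
                best, best_gain = cand, gain
        suite.append(dict(zip(names, best)))
        for (i, j), pairs in uncovered.items():
            pairs.discard((best[i], best[j]))
    return suite

# Hypothetical chatbot-evaluation dimensions (illustrative only).
factors = {
    "model": ["ChatGPT-4", "Bard", "LLaMA"],
    "task": ["question-answering", "summarization"],
    "language": ["English", "Spanish"],
}
suite = pairwise_tests(factors)
print(f"{len(suite)} cases cover all pairs vs. {3 * 2 * 2} exhaustive combinations")
```

With these factors, every pair of values across any two parameters is exercised by a suite noticeably smaller than the exhaustive cross product, which is the economy that orthogonal-array-style designs offer when each test case is an expensive chatbot interaction.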
Pages: 51-54
Number of pages: 4