A Testing Framework for AI Linguistic Systems (testFAILS)

Cited: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor;
DOI
10.1109/AITest58265.2023.00017
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
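The abstract credits testFAILS's robustness to orthogonal array coverage, a combinatorial technique that exercises every pairwise combination of test-factor values with far fewer cases than the full cross product. The sketch below illustrates the general idea with a greedy pairwise covering-array generator; the factor names and values are hypothetical placeholders, not the actual dimensions used in the paper.

```python
from itertools import combinations, product

# Hypothetical chatbot-testing dimensions (illustrative only; the
# paper's real factors are not listed in this record).
factors = {
    "language": ["en", "es", "zh"],
    "prompt_style": ["question", "instruction", "dialogue"],
    "topic": ["medical", "coding", "general"],
}

def all_pairs(factors):
    """Every (factor, value) pair across every two distinct factors."""
    names = list(factors)
    pairs = set()
    for a, b in combinations(names, 2):
        for va, vb in product(factors[a], factors[b]):
            pairs.add(((a, va), (b, vb)))
    return pairs

def pairwise_suite(factors):
    """Greedy covering array: each pair of factor values appears in at
    least one test case, using far fewer cases than the cross product."""
    names = list(factors)
    uncovered = all_pairs(factors)
    suite = []
    while uncovered:
        best, best_gain = None, -1
        for combo in product(*factors.values()):
            case = dict(zip(names, combo))
            gain = sum(1 for p in uncovered
                       if all(case[f] == v for f, v in p))
            if gain > best_gain:
                best, best_gain = case, gain
        suite.append(best)
        uncovered = {p for p in uncovered
                     if not all(best[f] == v for f, v in p)}
    return suite

suite = pairwise_suite(factors)
print(len(suite), "pairwise cases vs", 3 ** 3, "exhaustive cases")
```

A greedy generator like this is not guaranteed minimal, but for three three-valued factors it covers all 27 factor-value pairs with roughly 9-12 cases instead of 27, which is the economy the framework's orthogonal-array approach exploits.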
Pages: 51-54
Page count: 4