A Testing Framework for AI Linguistic Systems (testFAILS)

Cited: 0
Authors
Kumar, Y. [1 ]
Morreale, P. [1 ]
Sorial, P. [1 ]
Delgado, J. [1 ]
Li, J. Jenny [1 ]
Martins, P. [1 ]
Affiliations
[1] Kean Univ, Dept Comp Sci & Technol, Union, NJ 07083 USA
Keywords
Chatbots; Validation of Chatbots; Bot Technologies; AI Linguistic Systems Testing Framework (testFAILS); AIDoctor;
DOI
10.1109/AITest58265.2023.00017
CLC Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
This paper introduces testFAILS, an innovative testing framework designed for the rigorous evaluation of AI Linguistic Systems, with a particular emphasis on various iterations of ChatGPT. Leveraging orthogonal array coverage, this framework provides a robust mechanism for assessing AI systems, addressing the critical question, "How should we evaluate AI?" While the Turing test has traditionally been the benchmark for AI evaluation, we argue that current publicly available chatbots, despite their rapid advancements, have yet to meet this standard. However, the pace of progress suggests that achieving Turing test-level performance may be imminent. In the interim, the need for effective AI evaluation and testing methodologies remains paramount. Our research, which is ongoing, has already validated several versions of ChatGPT, and we are currently conducting comprehensive testing on the latest models, including ChatGPT-4, Bard, Bing Bot, and the LLaMA model. The testFAILS framework is designed to be adaptable, ready to evaluate new bot versions as they are released. Additionally, we have tested available chatbot APIs and developed our own application, AIDoctor, utilizing the ChatGPT-4 model and Microsoft Azure AI technologies.
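The abstract credits testFAILS's robustness to orthogonal array coverage, a combinatorial technique that exercises every pairwise combination of test-factor values with far fewer cases than the full cross product. The sketch below illustrates the general idea with a greedy pairwise covering-array generator; the factor names and values are hypothetical placeholders, not the actual dimensions used in the paper.

```python
from itertools import combinations, product

# Hypothetical chatbot-testing dimensions (illustrative only; the
# paper's real factors are not listed in this record).
factors = {
    "language": ["en", "es", "zh"],
    "prompt_style": ["question", "instruction", "dialogue"],
    "topic": ["medical", "coding", "general"],
}

def all_pairs(factors):
    """Every (factor, value) pair across every two distinct factors."""
    names = list(factors)
    pairs = set()
    for a, b in combinations(names, 2):
        for va, vb in product(factors[a], factors[b]):
            pairs.add(((a, va), (b, vb)))
    return pairs

def pairwise_suite(factors):
    """Greedy covering array: each pair of factor values appears in at
    least one test case, using far fewer cases than the cross product."""
    names = list(factors)
    uncovered = all_pairs(factors)
    suite = []
    while uncovered:
        best, best_gain = None, -1
        for combo in product(*factors.values()):
            case = dict(zip(names, combo))
            gain = sum(1 for p in uncovered
                       if all(case[f] == v for f, v in p))
            if gain > best_gain:
                best, best_gain = case, gain
        suite.append(best)
        uncovered = {p for p in uncovered
                     if not all(best[f] == v for f, v in p)}
    return suite

suite = pairwise_suite(factors)
print(len(suite), "pairwise cases vs", 3 ** 3, "exhaustive cases")
```

A greedy generator like this is not guaranteed minimal, but for three three-valued factors it covers all 27 factor-value pairs with roughly 9-12 cases instead of 27, which is the economy the framework's orthogonal-array approach exploits.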
Pages: 51-54
Page count: 4