Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Cited by: 62
Authors
Hernandez-Orallo, Jose [1]
Affiliations
[1] Univ Politecn Valencia, DSIC, Valencia, Spain
Keywords
AI evaluation; AI competitions; Machine intelligence; Cognitive abilities; Universal psychometrics; Turing test; INTERNATIONAL PLANNING COMPETITION; UNIVERSAL INTELLIGENCE; COGNITIVE-ABILITIES; COMPUTER-SCIENCE; REINFORCEMENT; BENCHMARKING; ITEM; ENVIRONMENT; SIMPLICITY; COMPLEXITY;
DOI
10.1007/s10462-016-9505-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.
Pages: 397 - 447
Page count: 51
Related papers
50 in total
  • [31] On computing task-oriented grasps
    El-Khoury, Sahar
    de Souza, Ravin
    Billard, Aude
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2015, 66 : 145 - 158
  • [32] A survey of task-oriented crowdsourcing
    Luz, Nuno
    Silva, Nuno
    Novais, Paulo
    ARTIFICIAL INTELLIGENCE REVIEW, 2015, 44 (02) : 187 - 213
  • [33] TASK-ORIENTED APPROACH TO SPECTROPHOTOMETRY
    SCHLEIFER, A
    WILLIS, BG
    HEWLETT-PACKARD JOURNAL, 1980, 31 (02): : 11 - 17
  • [34] Task-Oriented and Semantic-Aware Heterogeneous Networks for Artificial Intelligence of Things: Performance Analysis and Optimization
    Xu, Xiaodong
    Xu, Bingxuan
    Han, Shujun
    Dong, Chen
    Xiong, Huachao
    Meng, Rui
    Zhang, Ping
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (01) : 228 - 242
  • [35] Construction of Ability-Oriented Marketing Practical Teaching System
    Li, Mingwu
    2015 4TH INTERNATIONAL CONFERENCE ON PHYSICAL EDUCATION AND SOCIETY MANAGEMENT (ICPESM 2015), PT 1, 2015, 47 : 101 - 105
  • [36] Task Modeling for Task-Oriented Robot Programming
    Trapani, Stefano
    Indri, Marina
    2017 22ND IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2017,
  • [37] Task-oriented keyphrase extraction from social media
    Min Yang
    Yuzhi Liang
    Wei Zhao
    Wei Xu
    Jia Zhu
    Qiang Qu
    Multimedia Tools and Applications, 2018, 77 : 3171 - 3187
  • [38] Artificial Neural Network with Bayes' Rule for Reasoning Task-Oriented Grasp
    Zuo, Guoyu
    Liu, Hongxing
    Tong, Jiayuan
    PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021, : 3316 - 3321
  • [39] Simulating User Satisfaction for the Evaluation of Task-oriented Dialogue Systems
    Sun, Weiwei
    Zhang, Shuo
    Balog, Krisztian
    Ren, Zhaochun
    Ren, Pengjie
    Chen, Zhumin
    de Rijke, Maarten
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2499 - 2506
  • [40] Task-oriented keyphrase extraction from social media
    Yang, Min
    Liang, Yuzhi
    Zhao, Wei
    Xu, Wei
    Zhu, Jia
    Qu, Qiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (03) : 3171 - 3187