Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Cited by: 62
Authors
Hernandez-Orallo, Jose [1 ]
Affiliations
[1] Univ Politecn Valencia, DSIC, Valencia, Spain
Keywords
AI evaluation; AI competitions; Machine intelligence; Cognitive abilities; Universal psychometrics; Turing test; INTERNATIONAL PLANNING COMPETITION; UNIVERSAL INTELLIGENCE; COGNITIVE-ABILITIES; COMPUTER-SCIENCE; REINFORCEMENT; BENCHMARKING; ITEM; ENVIRONMENT; SIMPLICITY; COMPLEXITY;
DOI
10.1007/s10462-016-9505-7
CLC classification (Chinese Library Classification)
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.
Pages: 397–447
Page count: 51