Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Cited: 62
Authors
Hernandez-Orallo, Jose [1]
Affiliations
[1] Univ Politecn Valencia, DSIC, Valencia, Spain
Keywords
AI evaluation; AI competitions; Machine intelligence; Cognitive abilities; Universal psychometrics; Turing test; INTERNATIONAL PLANNING COMPETITION; UNIVERSAL INTELLIGENCE; COGNITIVE-ABILITIES; COMPUTER-SCIENCE; REINFORCEMENT; BENCHMARKING; ITEM; ENVIRONMENT; SIMPLICITY; COMPLEXITY;
DOI
10.1007/s10462-016-9505-7
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.
Pages: 397 - 447
Page count: 51
Related papers
50 items in total
  • [41] FACILITATING SELF-EVALUATION IN TASK-ORIENTED GROUP LEARNING
    TILTON, JR
    JENSEN, BT
    HUMAN FACTORS, 1960, 2 (02) : 92 - 96
  • [42] Task-Oriented Evaluation of Dependency Parsing with Open Information Extraction
    Gamallo, Pablo
    Garcia, Marcos
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018, 2018, 11122 : 77 - 82
  • [43] Task-oriented Evaluation Algorithm of Node Importance for Operation Network
    Li, Eryu
    Gong, Jianxing
    Huang, Jian
    2018 INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SCIENCE AND APPLICATION TECHNOLOGY, 2019, 1168
  • [44] Perceptual and Task-Oriented Assessment of a Semantic Metric for ASR Evaluation
    Rugayan, Janine
    Salvi, Giampiero
    Svendsen, Torbjorn
    INTERSPEECH 2023, 2023, : 2158 - 2162
  • [45] A pilot task-oriented evaluation of evidence-based medicine
    Zacks, MP
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 1997, : 1019 - 1019
  • [46] Research on Task-Oriented Application Design
    Zhou, Chuan-Sheng
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 1482 - 1486
  • [47] Learning to Model Task-Oriented Attention
    Zou, Xiaochun
    Zhao, Xinbo
    Wang, Jian
    Yang, Yongjia
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2016, 2016 : 1 - 12
  • [48] Assessment criteria for task-oriented groups
    Witte, EH
    Lecher, S
    GRUPPENDYNAMIK-ZEITSCHRIFT FUR ANGEWANDTE SOZIALPSYCHOLOGIE, 1998, 29 (03): : 313 - 325
  • [49] Modeling task-oriented discussion groups
    Wilson, R
    USER MODELING 2003, PROCEEDINGS, 2003, 2702 : 248 - 257
  • [50] Landmark selection for task-oriented navigation
    Lerner, Ronen
    Rivlin, Ehud
    Shimshoni, Ilan
    2006 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-12, 2006, : 2785 - 2791