Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Cited by: 62
Author
Hernández-Orallo, José [1]
Affiliation
[1] Univ Politecn Valencia, DSIC, Valencia, Spain
Keywords
AI evaluation; AI competitions; Machine intelligence; Cognitive abilities; Universal psychometrics; Turing test; INTERNATIONAL PLANNING COMPETITION; UNIVERSAL INTELLIGENCE; COGNITIVE-ABILITIES; COMPUTER-SCIENCE; REINFORCEMENT; BENCHMARKING; ITEM; ENVIRONMENT; SIMPLICITY; COMPLEXITY
DOI
10.1007/s10462-016-9505-7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways in which AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, in which a system is characterised by its (cognitive) abilities rather than by the tasks it is designed to solve. We discuss several possibilities: adapting cognitive tests used for humans and animals, developing tests derived from algorithmic information theory, or following more integrated approaches from the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation, and discuss how their problems and limitations could be addressed with some of the tools and ideas presented in the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to apply when an AI evaluation scheme is under consideration.
Journal: Artificial Intelligence Review, 2017, vol. 48
Pages: 397-447
Page count: 51