Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement

Cited by: 62

Authors:

Hernandez-Orallo, Jose [1]

Affiliations:

[1] Univ Politecn Valencia, DSIC, Valencia, Spain

Source:

ARTIFICIAL INTELLIGENCE REVIEW | 2017 / Vol. 48 / Issue 03

Keywords:

AI evaluation; AI competitions; Machine intelligence; Cognitive abilities; Universal psychometrics; Turing test; INTERNATIONAL PLANNING COMPETITION; UNIVERSAL INTELLIGENCE; COGNITIVE-ABILITIES; COMPUTER-SCIENCE; REINFORCEMENT; BENCHMARKING; ITEM; ENVIRONMENT; SIMPLICITY; COMPLEXITY;

DOI:

10.1007/s10462-016-9505-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The evaluation of artificial intelligence systems and components is crucial for the progress of the discipline. In this paper we describe and critically assess the different ways AI systems are evaluated, and the role of components and techniques in these systems. We first focus on the traditional task-oriented evaluation approach. We identify three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation. We describe some of the limitations of the many evaluation schemes and competitions in these three categories, and follow the progression of some of these tests. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more integrated approaches under the perspective of universal psychometrics. We analyse some evaluation tests from AI that are better positioned for an ability-oriented evaluation and discuss how their problems and limitations can possibly be addressed with some of the tools and ideas that appear within the paper. Finally, we enumerate a series of lessons learnt and generic guidelines to be used when an AI evaluation scheme is under consideration.


Pages: 397-447

Page count: 51

50 items in total

[1] Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement
José Hernández-Orallo
Artificial Intelligence Review, 2017, 48: 397 - 447
[2] Intelligent task-oriented semantic communication method in artificial intelligence of things
Liu C.
Guo C.
Yang Y.
Feng C.
Sun Q.
Chen J.
Tongxin Xuebao/Journal on Communications, 2021, 42 (11): 97 - 108
[3] CREATIVE ABILITY OF TASK-ORIENTED VERSUS PERSON-ORIENTED LEADERS
JACOBY, J
JOURNAL OF CREATIVE BEHAVIOR, 1968, 2 (04): 249 - 253
[4] TASK-ORIENTED ARCHITECTURES
BISIANI, R
MAUERSBERG, H
REDDY, R
PROCEEDINGS OF THE IEEE, 1983, 71 (07): 885 - 898
[5] Task-oriented Dependency Parsing Evaluation Methodology
Volokh, Alexander
Neumann, Guenter
2012 IEEE 13TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI), 2012: 132 - 137
[6] Task-Oriented Evaluation of Indoor Positioning Systems
Jackermeier, Robert
Ludwig, Bernd
PROGRESS IN LOCATION BASED SERVICES 2018, 2018: 25 - 47
[7] TASK-ORIENTED ACCESS TO DATA FILES - AN EVALUATION
WATTERS, C
SHEPHERD, MA
QIU, LW
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1994, 45 (04): 251 - 262
[8] Task-oriented approach to information retrieval evaluation
Hersh, W
Pentecost, J
Hickam, D
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1996, 47 (01): 50 - 56
[9] COMPARISON OF TASK-ORIENTED AND RELATION-ORIENTED INDIVIDUALS ABILITY TO PERCEIVE VIEWPOINTS OF OTHERS
HARDY, RC
CAREY, J
EBERWEIN, B
ELIOT, J
PERCEPTUAL AND MOTOR SKILLS, 1976, 42 (03): 1028 - 1030
[10] Web document summarisation: a task-oriented evaluation
White, R
Ruthven, I
Jose, JM
12TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2001: 951 - 955
