Efficient and Robust Model Benchmarks with Item Response Theory and Adaptive Testing

Cited by: 2
Authors: Song, Hao [1]; Flach, Peter [2]
Affiliations:
[1] Univ Bristol, Bristol, Avon, England
[2] Univ Bristol, Artificial Intelligence, Bristol, Avon, England
Source: INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, Vol. 6, No. 05
Keywords: Item Response Theory; Adaptive Testing; Model Evaluation; Benchmarks
DOI: 10.9781/ijimai.2021.02.009
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Progress in predictive machine learning is typically measured through performance comparisons on benchmark datasets. Traditionally, such empirical evaluations are carried out on large numbers of datasets, but this is becoming increasingly hard due to the computational cost involved and the often large number of alternative methods to compare against. In this paper we investigate adaptive approaches that make model benchmarking more efficient. Given a large collection of datasets, rather than training and testing a given approach on every individual dataset, we seek methods that pick only a few representative datasets to quantify a model's quality, and extrapolate from these to its performance on the remaining datasets. To this end, we adapt two well-founded frameworks from psychometrics designed for educational tests: Item Response Theory and Adaptive Testing. We propose modifications that suit the requirements of machine learning experiments, and present experimental results validating the approach.
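The core idea maps datasets to test "items" and models to examinees, so a model's latent ability can be estimated from its results on a few informative datasets. As a rough illustration of that machinery, below is a minimal Python sketch using the classic two-parameter logistic (2PL) IRT model with maximum-information adaptive item selection; the authors modify IRT for the requirements of machine learning experiments, so the binarized outcomes, the `evaluate` oracle, and all function names here are hypothetical simplifications, not the paper's exact method.

```python
import numpy as np

def p_success(theta, a, b):
    """2PL IRT: probability that a model with ability theta performs
    above par on a dataset with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information a dataset carries about theta under the 2PL model."""
    p = p_success(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def estimate_theta(responses, a, b, grid=np.linspace(-4.0, 4.0, 401)):
    """Grid-based maximum-likelihood ability estimate from the binary
    outcomes observed on the datasets administered so far."""
    loglik = np.zeros_like(grid)
    for idx, y in responses:
        p = np.clip(p_success(grid, a[idx], b[idx]), 1e-9, 1 - 1e-9)
        loglik += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return grid[np.argmax(loglik)]

def adaptive_benchmark(evaluate, a, b, n_steps=5):
    """Repeatedly administer the not-yet-used dataset that is most
    informative about the current ability estimate."""
    responses, remaining, theta = [], set(range(len(a))), 0.0
    for _ in range(n_steps):
        nxt = max(remaining, key=lambda i: fisher_information(theta, a[i], b[i]))
        remaining.remove(nxt)
        responses.append((nxt, evaluate(nxt)))  # evaluate returns 0 or 1
        theta = estimate_theta(responses, a, b)
    return theta, responses
```

In practice, the dataset parameters `a` and `b` would be calibrated offline from the results of previously benchmarked models, and `evaluate(i)` would stand in for training and testing the candidate model on dataset i and binarizing its score against a reference level; a handful of adaptive steps then replaces a sweep over the entire collection.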
Pages: 110-118 (9 pages)
Related papers (50 in total)
  • [21] Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs
    Shin, Hyo Jeong
    Koenig, Christoph
    Robin, Frederic
    Frey, Andreas
    Yamamoto, Kentaro
JOURNAL OF EDUCATIONAL MEASUREMENT, 2024
  • [22] Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment
    Fayers, Peter M.
QUALITY OF LIFE RESEARCH, 2007, 16: 187-194
  • [23] COMPUTER ADAPTIVE TESTING: MULTIDIMENSIONAL ITEM RESPONSE THEORY AND THE DEVELOPMENT OF THE KIDDIE-CAT
    Gibbons, Robert
JOURNAL OF THE AMERICAN ACADEMY OF CHILD AND ADOLESCENT PSYCHIATRY, 2019, 58 (10): S117
  • [24] Firestar: Computerized Adaptive Testing Simulation Program for Polytomous Item Response Theory Models
    Choi, Seung W.
APPLIED PSYCHOLOGICAL MEASUREMENT, 2009, 33 (08): 644-645
  • [25] Robust Measurement via A Fused Latent and Graphical Item Response Theory Model
    Chen, Yunxiao
    Li, Xiaoou
    Liu, Jingchen
    Ying, Zhiliang
PSYCHOMETRIKA, 2018, 83 (03): 538-562
  • [27] Quantitative Penetration Testing with Item Response Theory
    Arnold, Florian
    Pieters, Wolter
    Stoelinga, Marielle
2013 9TH INTERNATIONAL CONFERENCE ON INFORMATION ASSURANCE AND SECURITY (IAS), 2013: 49+
  • [28] Quantitative Penetration Testing with Item Response Theory
    Arnold, Florian
    Pieters, Wolter
    Stoelinga, Marielle
JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2014, 9 (03): 118-127
  • [29] The Item Response Theory Model for an AI-based Adaptive Learning System
    Cui, Wei
    Xue, Zhen
    Shen, Jun
    Sun, Geng
    Li, Jianxin
2019 18TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING (ITHET 2019), 2019
  • [30] Clustering Examples in Multi-Dataset NLP Benchmarks with Item Response Theory
    Rodriguez, Pedro
    Htut, Phu Mon
    Lalor, John P.
    Sedoc, Joao
PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022: 100-112