Efficient and Robust Model Benchmarks with Item Response Theory and Adaptive Testing

Cited by: 2
Authors: Song, Hao [1]; Flach, Peter [2]
Affiliations:
[1] Univ Bristol, Bristol, Avon, England
[2] Univ Bristol, Artificial Intelligence, Bristol, Avon, England
Source: INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2021, Vol. 6, No. 05
Keywords: Item Response Theory; Adaptive Testing; Model Evaluation; Benchmarks
DOI: 10.9781/ijimai.2021.02.009
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Progress in predictive machine learning is typically measured through performance comparisons on benchmark datasets. Traditionally, such empirical evaluations are carried out on large numbers of datasets, but this is becoming increasingly hard due to the computational cost involved and the often large number of alternative methods to compare against. In this paper we investigate adaptive approaches that make model benchmarking more efficient. Given a large collection of datasets, rather than training and testing a given approach on every individual dataset, we seek methods that pick only a few representative datasets to quantify a model's quality, and extrapolate from these to its performance on the remaining datasets. To this end, we adapt two well-founded frameworks from psychometrics designed for educational tests: Item Response Theory and Adaptive Testing. We propose modifications that suit the requirements of machine learning experiments, and present experimental results validating the approach.
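The core idea maps datasets to test "items" and models to examinees, so a model's latent ability can be estimated from its results on a few informative datasets. As a rough illustration of that machinery, below is a minimal Python sketch using the classic two-parameter logistic (2PL) IRT model with maximum-information adaptive item selection; the authors modify IRT for the requirements of machine learning experiments, so the binarized outcomes, the `evaluate` oracle, and all function names here are hypothetical simplifications, not the paper's exact method.

```python
import numpy as np

def p_success(theta, a, b):
    """2PL IRT: probability that a model with ability theta performs
    above par on a dataset with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Information a dataset carries about theta under the 2PL model."""
    p = p_success(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def estimate_theta(responses, a, b, grid=np.linspace(-4.0, 4.0, 401)):
    """Grid-based maximum-likelihood ability estimate from the binary
    outcomes observed on the datasets administered so far."""
    loglik = np.zeros_like(grid)
    for idx, y in responses:
        p = np.clip(p_success(grid, a[idx], b[idx]), 1e-9, 1 - 1e-9)
        loglik += y * np.log(p) + (1 - y) * np.log(1.0 - p)
    return grid[np.argmax(loglik)]

def adaptive_benchmark(evaluate, a, b, n_steps=5):
    """Repeatedly administer the not-yet-used dataset that is most
    informative about the current ability estimate."""
    responses, remaining, theta = [], set(range(len(a))), 0.0
    for _ in range(n_steps):
        nxt = max(remaining, key=lambda i: fisher_information(theta, a[i], b[i]))
        remaining.remove(nxt)
        responses.append((nxt, evaluate(nxt)))  # evaluate returns 0 or 1
        theta = estimate_theta(responses, a, b)
    return theta, responses
```

In practice, the dataset parameters `a` and `b` would be calibrated offline from the results of previously benchmarked models, and `evaluate(i)` would stand in for training and testing the candidate model on dataset i and binarizing its score against a reference level; a handful of adaptive steps then replaces a sweep over the entire collection.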
Pages: 110-118 (9 pages)
Related papers (50 in total)
  • [21] Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs
    Shin, Hyo Jeong
    Koenig, Christoph
    Robin, Frederic
    Frey, Andreas
    Yamamoto, Kentaro
JOURNAL OF EDUCATIONAL MEASUREMENT, 2024
  • [22] Applying item response theory and computer adaptive testing: the challenges for health outcomes assessment
    Fayers, Peter M.
QUALITY OF LIFE RESEARCH, 2007, 16: 187-194
  • [23] COMPUTER ADAPTIVE TESTING: MULTIDIMENSIONAL ITEM RESPONSE THEORY AND THE DEVELOPMENT OF THE KIDDIE-CAT
    Gibbons, Robert
JOURNAL OF THE AMERICAN ACADEMY OF CHILD AND ADOLESCENT PSYCHIATRY, 2019, 58 (10): S117
  • [24] Firestar: Computerized Adaptive Testing Simulation Program for Polytomous Item Response Theory Models
    Choi, Seung W.
APPLIED PSYCHOLOGICAL MEASUREMENT, 2009, 33 (08): 644-645
  • [25] Robust Measurement via A Fused Latent and Graphical Item Response Theory Model
    Chen, Yunxiao
    Li, Xiaoou
    Liu, Jingchen
    Ying, Zhiliang
PSYCHOMETRIKA, 2018, 83 (03): 538-562
  • [27] Quantitative Penetration Testing with Item Response Theory
    Arnold, Florian
    Pieters, Wolter
    Stoelinga, Marielle
2013 9TH INTERNATIONAL CONFERENCE ON INFORMATION ASSURANCE AND SECURITY (IAS), 2013: 49+
  • [28] Quantitative Penetration Testing with Item Response Theory
    Arnold, Florian
    Pieters, Wolter
    Stoelinga, Marielle
JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2014, 9 (03): 118-127
  • [29] The Item Response Theory Model for an AI-based Adaptive Learning System
    Cui, Wei
    Xue, Zhen
    Shen, Jun
    Sun, Geng
    Li, Jianxin
2019 18TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY BASED HIGHER EDUCATION AND TRAINING (ITHET 2019), 2019
  • [30] Clustering Examples in Multi-Dataset NLP Benchmarks with Item Response Theory
    Rodriguez, Pedro
    Htut, Phu Mon
    Lalor, John P.
    Sedoc, Joao
PROCEEDINGS OF THE THIRD WORKSHOP ON INSIGHTS FROM NEGATIVE RESULTS IN NLP (INSIGHTS 2022), 2022: 100-112