Efficient and Robust Model Benchmarks with Item Response Theory and Adaptive Testing

Cited by: 2
Authors
Song, Hao [1 ]
Flach, Peter [2 ]
Affiliations
[1] Univ Bristol, Bristol, Avon, England
[2] Univ Bristol, Artificial Intelligence, Bristol, Avon, England
Source
INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE | 2021, Vol. 6, Issue 5
Keywords
Item Response Theory; Adaptive Testing; Model Evaluation; Benchmarks;
DOI
10.9781/ijimai.2021.02.009
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Progress in predictive machine learning is typically measured on the basis of performance comparisons on benchmark datasets. Traditionally these kinds of empirical evaluation are carried out on large numbers of datasets, but this is becoming increasingly hard due to computational requirements and the often large number of alternative methods to compare against. In this paper we investigate adaptive approaches to achieve better efficiency on model benchmarking. For a large collection of datasets, rather than training and testing a given approach on every individual dataset, we seek methods that allow us to pick only a few representative datasets to quantify the model's goodness, from which to extrapolate to performance on other datasets. To this end, we adapt existing approaches from psychometrics: specifically, Item Response Theory and Adaptive Testing. Both are well-founded frameworks designed for educational tests. We propose certain modifications following the requirements of machine learning experiments, and present experimental results to validate the approach.
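The abstract above describes treating benchmark datasets as test "items" and models as "examinees": an IRT model relates a latent model ability to the probability of success on each dataset, and adaptive testing picks the next dataset that is most informative about the current ability estimate. The following is a minimal illustrative sketch, assuming a two-parameter logistic (2PL) IRT model with item selection by maximum Fisher information and a crude gradient-ascent ability estimate; all function names, parameter values, and the simulated data are hypothetical and not taken from the paper.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL IRT: probability that a model with ability theta 'succeeds'
    on a dataset with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Fisher information of one item (dataset) at ability theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1.0 - p)

def select_next_dataset(theta, a, b, administered):
    """Adaptive-testing step: pick the not-yet-used dataset that is
    most informative about the current ability estimate."""
    info = fisher_information(theta, a, b).astype(float)
    info[list(administered)] = -np.inf
    return int(np.argmax(info))

def update_theta(theta, responses, a, b, lr=0.5, steps=50):
    """Crude MLE of ability via gradient ascent on the 2PL log-likelihood."""
    idx = np.array([i for i, _ in responses])
    y = np.array([r for _, r in responses], dtype=float)
    for _ in range(steps):
        p = p_correct(theta, a[idx], b[idx])
        grad = np.sum(a[idx] * (y - p))          # d logL / d theta
        theta = np.clip(theta + lr * grad / len(responses), -5.0, 5.0)
    return theta

# Simulated benchmark: 20 datasets, but only 8 are actually run.
rng = np.random.default_rng(0)
n_datasets = 20
a = rng.uniform(0.5, 2.0, n_datasets)   # discrimination per dataset
b = rng.normal(0.0, 1.0, n_datasets)    # difficulty per dataset
true_theta = 1.0                        # latent "ability" of the model

theta, administered, responses = 0.0, set(), []
for _ in range(8):
    j = select_next_dataset(theta, a, b, administered)
    administered.add(j)
    y = rng.random() < p_correct(true_theta, a[j], b[j])  # simulated outcome
    responses.append((j, int(y)))
    theta = update_theta(theta, responses, a, b)
```

In a real benchmark the item parameters would first be calibrated from existing model-vs-dataset results, and the simulated outcome would be replaced by actually training and evaluating the model on the selected dataset.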
Pages: 110-118 (9 pages)
Related Papers
(50 in total)
  • [31] Health status assessment for the twenty-first century: item response theory, item banking and computer adaptive testing
    Revicki, DA
    Cella, DF
    QUALITY OF LIFE RESEARCH, 1997, 6 (06) : 595 - 600
  • [33] Investigating item response times in computerized adaptive testing
    Hornke, LF
    DIAGNOSTICA, 1997, 43 (01) : 27 - 39
  • [34] Using response times for item selection in adaptive testing
    van der Linden, Wim J.
    JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2008, 33 (01) : 5 - 20
  • [35] Online Calibration of a Joint Model of Item Responses and Response Times in Computerized Adaptive Testing
    Kang, Hyeon-Ah
    Zheng, Yi
    Chang, Hua-Hua
    JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2020, 45 (02) : 175 - 208
  • [36] Computerized Adaptive Testing Using a Class of High-Order Item Response Theory Models
    Huang, Hung-Yu
    Chen, Po-Hsi
    Wang, Wen-Chung
    APPLIED PSYCHOLOGICAL MEASUREMENT, 2012, 36 (08) : 689 - 706
  • [37] The Automated Test Assembly and Routing Rule for Multistage Adaptive Testing with Multidimensional Item Response Theory
    Xu, Lingling
    Wang, Shiyu
    Cai, Yan
    Tu, Dongbo
    JOURNAL OF EDUCATIONAL MEASUREMENT, 2021, 58 (04) : 538 - 563
  • [38] The Accuracy of Computerized Adaptive Testing in Heterogeneous Populations: A Mixture Item-Response Theory Analysis
    Sawatzky, Richard
    Ratner, Pamela A.
    Kopec, Jacek A.
    Wu, Amery D.
    Zumbo, Bruno D.
    PLOS ONE, 2016, 11 (03)
  • [39] Beyond classical test theory-assessing function using item response theory/computer adaptive testing
    McDonough, C.
    Brandt, D.
    Chan, L.
    EUROPEAN JOURNAL OF PUBLIC HEALTH, 2016, 26
  • [40] Multidimensional Computerized Adaptive Testing Using Non-Compensatory Item Response Theory Models
    Hsu, Chia-Ling
    Wang, Wen-Chung
    APPLIED PSYCHOLOGICAL MEASUREMENT, 2019, 43 (06) : 464 - 480