Bayesian Inferential Risk Evaluation On Multiple IR Systems

Authors
Benham, Rodger [1]
Carterette, Ben [2]
Culpepper, J. Shane [1]
Moffat, Alistair [3]
Affiliations
[1] RMIT Univ, Melbourne, Vic, Australia
[2] Spotify, New York, NY USA
[3] Univ Melbourne, Melbourne, Vic, Australia
Funding
Australian Research Council
Keywords
Bayesian inference; risk-biased evaluation; multiple comparisons; effectiveness metric; credible intervals
DOI
10.1145/3397271.3401033
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Information retrieval (IR) ranking models in production systems continually evolve in response to user feedback, insights from research, and new developments. Rather than investing all engineering resources to produce a single challenger to the existing system, a commercial provider might choose to explore multiple new ranking models simultaneously. However, even small changes to a complex model can have unintended consequences. In particular, the per-topic effectiveness profile is likely to change, and even when an overall improvement is achieved, gains are rarely observed for every query, introducing the risk that some users or queries may be negatively impacted by the new model if deployed into production. Risk adjustments that re-weight losses relative to gains and mitigate such behavior are available when making one-to-one system comparisons, but not for one-to-many or many-to-one comparisons. Moreover, no IR evaluation methodology integrates priors from previous or alternative rankers in a homogeneous inferential framework. In this work, we propose a Bayesian approach where multiple challengers are compared to a single champion. We also show that risk can be incorporated, and demonstrate the benefits of doing so. Finally, the alternative scenario that is commonly encountered in academic research is also considered, when a single challenger is compared against several previous champions.
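The abstract refers to risk adjustments that re-weight losses relative to gains in a one-to-one system comparison. The sketch below illustrates that general idea only and is not the Bayesian method proposed in the paper. It assumes per-topic effectiveness scores for a champion and a challenger, and a hypothetical penalty parameter `alpha` that scales every per-topic loss by `1 + alpha`. The function name and the example scores are invented for illustration.

```python
from typing import Sequence


def risk_adjusted_mean(
    champion: Sequence[float],
    challenger: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Mean per-topic difference with losses weighted by (1 + alpha).

    Illustrative assumption: `alpha` = 0 gives the plain mean difference;
    larger values penalize topics where the challenger is worse.
    """
    if len(champion) != len(challenger) or len(champion) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    total = 0.0
    for base, new in zip(champion, challenger):
        delta = new - base
        # Gains count once; losses are scaled up by (1 + alpha).
        total += delta if delta >= 0 else (1.0 + alpha) * delta
    return total / len(champion)


# Hypothetical per-topic scores for four topics (not data from the paper).
champion_scores = [0.5, 0.4, 0.6, 0.3]
challenger_scores = [0.7, 0.3, 0.6, 0.5]

plain = risk_adjusted_mean(champion_scores, challenger_scores, alpha=0.0)
risky = risk_adjusted_mean(champion_scores, challenger_scores, alpha=2.0)
```

With these made-up scores the challenger wins on two topics and loses on one, so raising `alpha` lowers the adjusted mean even though the unadjusted mean difference is positive. Extending such a comparison to several challengers at once is the gap the abstract says the paper addresses.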
Pages: 339-348
Page count: 10