Bayesian Inferential Risk Evaluation On Multiple IR Systems

Cited by: 1
Authors
Benham, Rodger [1]
Carterette, Ben [2]
Culpepper, J. Shane [1]
Moffat, Alistair [3]
Affiliations
[1] RMIT Univ, Melbourne, Vic, Australia
[2] Spotify, New York, NY USA
[3] Univ Melbourne, Melbourne, Vic, Australia
Funding
Australian Research Council
Keywords
Bayesian inference; risk-biased evaluation; multiple comparisons; effectiveness metric; credible intervals
DOI
10.1145/3397271.3401033
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Information retrieval (IR) ranking models in production systems continually evolve in response to user feedback, insights from research, and new developments. Rather than investing all engineering resources to produce a single challenger to the existing system, a commercial provider might choose to explore multiple new ranking models simultaneously. However, even small changes to a complex model can have unintended consequences. In particular, the per-topic effectiveness profile is likely to change, and even when an overall improvement is achieved, gains are rarely observed for every query, introducing the risk that some users or queries may be negatively impacted by the new model if deployed into production. Risk adjustments that re-weight losses relative to gains and mitigate such behavior are available when making one-to-one system comparisons, but not for one-to-many or many-to-one comparisons. Moreover, no IR evaluation methodology integrates priors from previous or alternative rankers in a homogeneous inferential framework. In this work, we propose a Bayesian approach where multiple challengers are compared to a single champion. We also show that risk can be incorporated, and demonstrate the benefits of doing so. Finally, the alternative scenario that is commonly encountered in academic research is also considered, when a single challenger is compared against several previous champions.
Pages: 339-348
Number of pages: 10