Counterfactual Ranking Evaluation with Flexible Click Models

Cited by: 0
Authors
Buchholz, Alexander [1 ]
London, Ben [2 ]
Di Benedetto, Giuseppe [1 ]
Lichtenberg, Jan Malte [1 ]
Stein, Yannik [1 ]
Joachims, Thorsten [3 ]
Affiliations
[1] Amazon Music, Berlin, Germany
[2] Amazon Music, Seattle, WA, USA
[3] Amazon Music, Ithaca, NY, USA
Keywords
Off-policy evaluation; Learning-to-rank; Position bias; Position-based model; Item-position model
DOI
10.1145/3626772.3657810
CLC Classification
TP18 (Theory of Artificial Intelligence)
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Evaluating a new ranking policy using data logged by a previously deployed policy requires a counterfactual (off-policy) estimator that corrects for presentation and selection biases. Some estimators (e.g., the position-based model) perform this correction by making strong assumptions about user behavior, which can lead to high bias if those assumptions are not met. Other estimators (e.g., the item-position model) rely on randomization to avoid these assumptions, but they often suffer from high variance. In this paper, we develop a new counterfactual estimator, called Interpol, that provides a tunable trade-off in the assumptions it makes, thus offering a novel ability to optimize the bias-variance trade-off. We analyze the bias of our estimator, both theoretically and empirically, and show that it achieves lower error than both the position-based model and the item-position model on both synthetic and real datasets. This improvement in accuracy not only benefits offline evaluation of ranking policies; we also find that Interpol improves the learning of new ranking policies when used as the training objective for learning-to-rank.
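To make the bias-variance trade-off described in the abstract concrete, the sketch below implements the two standard baselines on synthetic click logs: a position-based-model (PBM) estimator, which reweights logged clicks by an assumed examination curve, and an item-position-model (IPM) estimator, which uses only impressions where the randomized logging policy happened to place the item at the target policy's position. The simple convex combination at the end is a hypothetical illustration of trading off the two, not the paper's actual Interpol estimator; all function names, the data layout, and the examination curve are assumptions.

import numpy as np

def pbm_value(clicks, logged_pos, target_pos, exam):
    # Position-based model: assumes P(click) = exam[position] * relevance(item).
    # Each logged click is reweighted by exam[target position] / exam[logged position].
    # Low variance, but biased if the examination/relevance factorization is violated.
    w = exam[target_pos] / exam[logged_pos]
    return float(np.mean(clicks * w))

def ipm_value(clicks, logged_pos, target_pos, pos_propensity):
    # Item-position model: use only impressions whose logged position equals the
    # target policy's position, reweighted by the inverse propensity of that
    # placement. Requires randomized logging; unbiased but often high variance.
    match = (logged_pos == target_pos).astype(float)
    return float(np.mean(clicks * match / pos_propensity))

def interpolated_value(clicks, logged_pos, target_pos, exam, pos_propensity, alpha=0.5):
    # Hypothetical convex combination: alpha = 1 recovers IPM, alpha = 0 recovers PBM.
    return (alpha * ipm_value(clicks, logged_pos, target_pos, pos_propensity)
            + (1.0 - alpha) * pbm_value(clicks, logged_pos, target_pos, exam))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, K = 10_000, 5                                   # impressions, ranking positions
    exam = np.array([1.0, 0.7, 0.5, 0.35, 0.25])       # assumed examination curve
    logged_pos = rng.integers(0, K, size=n)            # uniformly randomized logging
    pos_propensity = np.full(n, 1.0 / K)               # P(item shown at its logged position)
    target_pos = rng.integers(0, K, size=n)            # positions under the new policy
    relevance = rng.uniform(0.1, 0.9, size=n)
    clicks = rng.binomial(1, exam[logged_pos] * relevance)  # clicks simulated from a PBM
    print("PBM :", pbm_value(clicks, logged_pos, target_pos, exam))
    print("IPM :", ipm_value(clicks, logged_pos, target_pos, pos_propensity))
    print("Mix :", interpolated_value(clicks, logged_pos, target_pos, exam, pos_propensity))

In this toy setting the clicks are generated by a PBM, so both baselines are consistent; the interesting regimes studied in the paper are misspecified user behavior (where the PBM estimator becomes biased) and limited randomization (where the IPM estimator becomes noisy), which is what a tunable estimator such as Interpol is designed to navigate.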
Pages: 1200 - 1210
Page count: 11
Related Papers
50 records in total
  • [1] Offline Evaluation of Ranking Policies with Click Models
    Li, Shuai
    Abbasi-Yadkori, Yasin
    Kveton, Branislav
    Muthukrishnan, S.
    Vinay, Vishwa
    Wen, Zheng
    [J]. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 1685 - 1694
  • [2] Combining counterfactual outcomes and ARIMA models for policy evaluation
    Menchetti, Fiammetta
    Cipollini, Fabrizio
    Mealli, Fabrizia
    [J]. ECONOMETRICS JOURNAL, 2023, 26 (01): : 1 - 24
  • [3] RankEval: Evaluation and investigation of ranking models
    Lucchese, Claudio
    Muntean, Cristina Ioana
    Nardini, Franco Maria
    Perego, Raffaele
    Trani, Salvatore
    [J]. SOFTWAREX, 2020, 12
  • [4] Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
    Huang, Po-Sen
    Zhang, Huan
    Jiang, Ray
    Stanforth, Robert
    Welbl, Johannes
    Rae, Jack W.
    Maini, Vishal
    Yogatama, Dani
    Kohli, Pushmeet
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 65 - 83
  • [5] EVALUATION OF GRADE CROSSING HAZARD RANKING MODELS
    Sperry, Benjamin R.
    Naik, Bhaven
    Warner, Jeffery E.
    [J]. PROCEEDINGS OF THE ASME JOINT RAIL CONFERENCE, 2017, 2017,
  • [6] Ranking-based evaluation of regression models
    Saharon Rosset
    Claudia Perlich
    Bianca Zadrozny
    [J]. Knowledge and Information Systems, 2007, 12 : 331 - 353
  • [7] Ranking-based evaluation of regression models
    Rosset, Saharon
    Perlich, Claudia
    Zadrozny, Bianca
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2007, 12 (03) : 331 - 353
  • [8] Ranking-based evaluation of regression models
    Rosset, S
    Perlich, C
    Zadrozny, B
    [J]. Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 370 - 377
  • [9] Empirically Testing Deep and Shallow Ranking Models for Click-Through Rate (CTR) prediction
    Yang, Yi-Che
    Lai, Ping-Ching
    Chen, Hung-Hsuan
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI 2020), 2020, : 147 - 152
  • [10] CLASS RANKING MODELS FOR DEAN'S LETTERS AND THEIR PSYCHOMETRIC EVALUATION
    BLACKLOW, RS
    GOEPP, CE
    HOJAT, M
    [J]. ACADEMIC MEDICINE, 1991, 66 (09) : S10 - S12