Contrasting Offline and Online Results when Evaluating Recommendation Algorithms

Cited by: 43
Authors
Rossetti, Marco [1]
Stella, Fabio [1]
Zanker, Markus [2]
Affiliations
[1] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[2] Free Univ Bozen Bolzano, Fac Comp Sci, Bolzano, Italy
Keywords
User study; Evaluation methodology; Experimental within users design
DOI
10.1145/2959100.2959176
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Most evaluations of novel algorithmic contributions assess their accuracy in predicting what was withheld in an offline evaluation scenario. However, several doubts have been raised about whether standard offline evaluation practices are appropriate for selecting the best algorithm for field deployment. The goal of this work is therefore to compare the offline and the online evaluation methodology with the same study participants, i.e., a within-users experimental design. This paper presents empirical evidence that the ranking of algorithms based on offline accuracy measurements clearly contradicts the results from the online study with the same set of users. Thus, the external validity of the most commonly applied evaluation methodology is not guaranteed.
Pages: 31-34
Number of pages: 4