An in-depth investigation on the behavior of measures to quantify reproducibility

Cited by: 4
Authors
Maistro, Maria [1]
Breuer, Timo [2]
Schaer, Philipp [2]
Ferro, Nicola [3]
Affiliations
[1] Univ Copenhagen, Copenhagen, Denmark
[2] TH Koln Univ Appl Sci, Cologne, Germany
[3] Univ Padua, Padua, Italy
Funding
EU Horizon 2020
Keywords
Reproducibility; Information retrieval; Evaluation
DOI
10.1016/j.ipm.2023.103332
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Science is facing a so-called reproducibility crisis, where researchers struggle to repeat experiments and to get the same or comparable results. This represents a fundamental problem in any scientific discipline because reproducibility lies at the very basis of the scientific method. A central methodological question is how to measure reproducibility and how to interpret different measures. In Information Retrieval (IR), current practices to measure reproducibility rely mainly on comparing averaged scores. If the reproduced score is close enough to the original one, the reproducibility experiment is deemed successful, even though identical scores can still rely on entirely different result lists. Therefore, this paper focuses on measures to quantify reproducibility in IR and on their behavior. We present a critical analysis of IR reproducibility measures by synthetically generating runs in a controlled experimental setting, which allows us to control the amount of reproducibility error. These synthetic runs are generated by a deterioration algorithm based on swaps and replacements of documents in ranked lists. We investigate the behavior of different reproducibility measures with these synthetic runs in three different scenarios. Moreover, we propose a normalized version of Root Mean Square Error (RMSE) to better quantify reproducibility. Experimental results show that a single score is not enough to decide whether an experiment is successfully reproduced, because such a score depends on the type of effectiveness measure and the performance of the original run. This study highlights how challenging it can be to reproduce experimental results and to quantify the amount of reproducibility.
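The abstract's point that close averaged scores can hide different per-topic behavior can be illustrated with a minimal sketch. It uses the standard, unnormalized RMSE over per-topic scores; the normalized variant proposed in the paper is not specified in this record and is not reproduced here. The function names and the per-topic score values are hypothetical, chosen only for illustration.

```python
import math


def mean(scores):
    """Arithmetic mean of a list of per-topic scores."""
    return sum(scores) / len(scores)


def rmse(original, reproduced):
    """Standard (unnormalized) RMSE between two per-topic score lists."""
    if len(original) != len(reproduced):
        raise ValueError("both runs must cover the same topics")
    squared = [(o - r) ** 2 for o, r in zip(original, reproduced)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical per-topic effectiveness scores (not data from the paper).
original_scores = [0.2, 0.4, 0.6, 0.8]
reproduced_scores = [0.8, 0.6, 0.4, 0.2]

# The averaged scores agree, so a comparison of means reports no error.
mean_gap = abs(mean(original_scores) - mean(reproduced_scores))

# The per-topic comparison reports a clearly nonzero error.
topic_rmse = rmse(original_scores, reproduced_scores)

print(f"difference of means: {mean_gap:.3f}")
print(f"per-topic RMSE:      {topic_rmse:.3f}")
```

In this toy case the two runs have the same average score while every topic differs, which is the situation a comparison of averages cannot detect.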
Pages: 39