Biased accuracy in multisite machine-learning studies due to incomplete removal of the effects of the site

Cited by: 13
Authors
Solanes, Aleix [1, 2]
Palau, Pol [3, 4]
Fortea, Lydia [1, 5, 6]
Salvador, Raymond [3, 5]
Gonzalez-Navarro, Laura [7]
Daniel Llach, Cristian [1, 5, 6, 8]
Valenti, Marc [1, 5, 6, 8]
Vieta, Eduard [1, 5, 6, 8]
Radua, Joaquim [1, 5, 9, 10]
Affiliations
[1] Inst Invest Biomed August Pi i Sunyer IDIBAPS, Rossello 149, Barcelona 08036, Spain
[2] Autonomous Univ Barcelona, Dept Psychiat & Forens Med, Barcelona, Spain
[3] FIDMAG Res Fdn, Barcelona, Spain
[4] CASM Benito Menni Granollers Hosp Gen Granollers, Barcelona, Spain
[5] Inst Salud Carlos III, Biomed Network Res Ctr Mental Hlth CIBERSAM, Madrid, Spain
[6] Univ Barcelona, Inst Neurosci, Barcelona, Spain
[7] Univ Barcelona, Fac Biol, Barcelona, Spain
[8] Hosp Clin Barcelona, Inst Neurosci, Barcelona Bipolar Disorders & Depress Unit, Barcelona, Spain
[9] Kings Coll London, Inst Psychiat Psychol & Neurosci, Dept Psychosis Studies, London, England
[10] Karolinska Inst, Ctr Psychiat Res & Educ, Dept Clin Neurosci, Stockholm, Sweden
Keywords
Bias; Effects of the site; Machine learning; Magnetic resonance imaging; Classification
DOI
10.1016/j.pscychresns.2021.111313
Chinese Library Classification (CLC) number
R74 [Neurology and Psychiatry]
Abstract
Brain MRI researchers conducting multisite studies, such as those within the ENIGMA Consortium, are well aware of the importance of controlling the effects of the site (EoS) in the statistical analysis. In contrast, authors of recent machine-learning MRI studies may remove the EoS when training the machine-learning models but not control for them when estimating the models' accuracy, potentially leading to severely biased estimates. We show examples from a toy simulation study and from real MRI data in which we remove the EoS from both the "training set" and the "test set" during the training and application of the model; even so, the accuracy is still inflated (or occasionally shrunk) unless we further control the EoS during the estimation of the accuracy. We also provide several methods for controlling the EoS during the estimation of the accuracy, and a simple R package ("multisite.accuracy") that performs this task for several accuracy estimates (e.g., sensitivity/specificity, area under the curve, correlation, and hazard ratio).
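The bias described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation and does not use the "multisite.accuracy" package. It assumes two hypothetical sites, A and B, with different proportions of patients. The model scores carry no diagnostic information, only a residual site shift (the `offset` parameter, an assumed value). The function names `simulate`, `pooled_auc` and `site_stratified_auc` are invented for this illustration. Comparing patients only with controls from the same site is one possible way to control the EoS when estimating the area under the curve, and it may differ from the methods the paper provides.

```python
import random


def auc(scores, labels):
    """Probability that a random patient outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(seed=0, offset=1.5):
    """Return (site, label, score) rows; scores depend on the site only, never on the label."""
    rng = random.Random(seed)
    rows = []
    # Hypothetical design: site A is mostly patients, site B is mostly controls.
    design = [("A", 400, 100, offset), ("B", 100, 400, 0.0)]
    for site, n_patients, n_controls, shift in design:
        for label, n in [(1, n_patients), (0, n_controls)]:
            for _ in range(n):
                rows.append((site, label, rng.gauss(shift, 1.0)))
    return rows


def pooled_auc(rows):
    """AUC over all participants, ignoring the site."""
    return auc([r[2] for r in rows], [r[1] for r in rows])


def site_stratified_auc(rows):
    """AUC that only compares patients and controls from the same site.

    Each site's AUC is weighted by its number of patient-control pairs.
    """
    weighted_sum = 0.0
    total_pairs = 0
    for site in sorted({r[0] for r in rows}):
        sub = [r for r in rows if r[0] == site]
        n_pos = sum(r[1] for r in sub)
        n_neg = len(sub) - n_pos
        pairs = n_pos * n_neg
        weighted_sum += pairs * auc([r[2] for r in sub], [r[1] for r in sub])
        total_pairs += pairs
    return weighted_sum / total_pairs


rows = simulate()
print("pooled AUC:", round(pooled_auc(rows), 3))
print("site-stratified AUC:", round(site_stratified_auc(rows), 3))
```

Under these assumptions the pooled estimate is well above chance although the scores are uninformative, whereas the site-stratified estimate stays close to 0.5.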
Pages: 7