Recommendations for the development and use of imaging test sets to investigate the test performance of artificial intelligence in health screening

Cited by: 8
Authors
Chalkidou A. [1]
Shokraneh F. [1]
Kijauskaite G. [2]
Taylor-Phillips S. [3]
Halligan S. [4]
Wilkinson L. [5]
Glocker B. [6]
Garrett P. [7]
Denniston A.K. [8]
Mackie A. [2]
Seedat F. [2]
Affiliations
[1] King's Technology Evaluation Centre, King's College London, London
[2] UK National Screening Committee, Office for Health Improvement and Disparities, Department of Health and Social Care, London
[3] Warwick Medical School, University of Warwick, Coventry
[4] Centre for Medical Imaging, Division of Medicine, University College London, London
[5] Oxford Breast Imaging Centre, Oxford University, Oxford
[6] Department of Computing, Imperial College London, London
[7] Department of Chemical Engineering and Analytical Science, University of Manchester, Manchester
[8] Department of Ophthalmology, University Hospitals Birmingham NHS Foundation Trust, Birmingham
Source
The Lancet Digital Health | 2022 / Volume 4 / Issue 12
DOI
10.1016/S2589-7500(22)00186-8
Abstract
Rigorous evaluation of artificial intelligence (AI) systems for image classification is essential before deployment into health-care settings, such as screening programmes, so that adoption is effective and safe. A key step in the evaluation process is the external validation of diagnostic performance using a test set of images. We conducted a rapid literature review on methods to develop test sets, published from 2012 to 2020, in English. Using thematic analysis, we mapped themes and coded the principles using the Population, Intervention, and Comparator or Reference standard, Outcome, and Study design framework. A group of screening and AI experts assessed the evidence-based principles for completeness and provided further considerations. Of the final 15 principles recommended here, five affect population, one intervention, two comparator, one reference standard, and one both reference standard and comparator. Finally, four are applicable to outcome and one to study design. Principles from the literature were useful to address biases from AI; however, they did not account for screening-specific biases, which we now incorporate. The principles set out here should be used to support the development and use of test sets for studies that assess the accuracy of AI within screening programmes, to ensure they are fit for purpose and minimise bias. © 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license.
Pages: e899-e905
Number of pages: 6