Lack of Transparency and Potential Bias in Artificial Intelligence Data Sets and Algorithms: A Scoping Review

Times cited: 133
Authors
Daneshjou, Roxana [1,2]
Smith, Mary P. [3]
Sun, Mary D. [4]
Rotemberg, Veronica [5]
Zou, James [6,7,8]
Affiliations
[1] Stanford Sch Med, Stanford Dept Dermatol, 450 Broadway, Redwood City, CA 94061 USA
[2] Stanford Sch Med, Stanford Dept Biomed Data Sci, Stanford, CA 94305 USA
[3] Mem Sloan Kettering Canc Ctr, Dept Med, 1275 York Ave, New York, NY 10021 USA
[4] Icahn Sch Med Mt Sinai, New York, NY 10029 USA
[5] Mem Sloan Kettering Canc Ctr, Dermatol Serv, 1275 York Ave, New York, NY 10021 USA
[6] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
[7] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
[8] Chan Zuckerberg Biohub, San Francisco, CA USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
CONVOLUTIONAL NEURAL-NETWORK; SKIN-CANCER; IMAGE CLASSIFICATION; DERMATOLOGISTS; MELANOMA; PERFORMANCE; DIAGNOSIS; TIME; ACCURACY; SUPERIOR
DOI
10.1001/jamadermatol.2021.3129
Chinese Library Classification (CLC) number
R75 [Dermatology and Venereology]
Subject classification code
100206
Abstract
IMPORTANCE
Clinical artificial intelligence (AI) algorithms have the potential to improve clinical care, but fair, generalizable algorithms depend on the clinical data on which they are trained and tested.

OBJECTIVE
To assess whether data sets used for training diagnostic AI algorithms addressing skin disease are adequately described and to identify potential sources of bias in these data sets.

DATA SOURCES
In this scoping review, PubMed was used to search for peer-reviewed research articles published between January 1, 2015, and November 1, 2020, with the following paired search terms: deep learning and dermatology, artificial intelligence and dermatology, deep learning and dermatologist, and artificial intelligence and dermatologist.

STUDY SELECTION
Studies that developed or tested an existing deep learning algorithm for triage, diagnosis, or monitoring using clinical or dermoscopic images of skin disease were selected, and the articles were independently reviewed by 2 investigators to verify that they met selection criteria.

CONSENSUS PROCESS
Data set audit criteria were determined by consensus of all authors after reviewing existing literature to highlight data set transparency and sources of bias.

RESULTS
A total of 70 unique studies were included. Among these studies, 1 065 291 images were used to develop or test AI algorithms, of which only 257 372 (24.2%) were publicly available. Only 14 studies (20.0%) included descriptions of patient ethnicity or race in at least 1 data set used. Only 7 studies (10.0%) included any information about skin tone in at least 1 data set used. Thirty-six of the 56 studies developing new AI algorithms for cutaneous malignant neoplasms (64.3%) met the gold standard criteria for disease labeling. Public data sets were cited more often than private data sets, suggesting that public data sets contribute more to new development and benchmarks.

CONCLUSIONS AND RELEVANCE
This scoping review identified 3 issues in data sets that are used to develop and test clinical AI algorithms for skin disease that should be addressed before clinical translation: (1) sparsity of data set characterization and lack of transparency, (2) nonstandard and unverified disease labels, and (3) inability to fully assess patient diversity used for algorithm development and testing.
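
The percentages given in the Results can be recomputed directly from the counts the abstract reports. The following is a minimal Python sketch for that arithmetic check only; the `reported` mapping and the `percent` helper are hypothetical names introduced for illustration and are not part of the study's methods.

```python
# Counts as reported in the abstract, as (part, whole) pairs.
# The dictionary keys and layout are hypothetical, for illustration only.
reported = {
    "publicly available images": (257_372, 1_065_291),
    "studies describing ethnicity or race": (14, 70),
    "studies describing skin tone": (7, 70),
    "malignancy studies meeting gold-standard labeling": (36, 56),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    for label, (part, whole) in reported.items():
        print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

Each recomputed value matches the figure printed in the abstract (24.2%, 20.0%, 10.0%, and 64.3%). Note that the skin-tone and ethnicity proportions use all 70 included studies as the denominator, while the labeling proportion uses only the 56 studies that developed new algorithms for cutaneous malignant neoplasms.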
Pages: 1362-1369
Page count: 8