Health equity assessment of machine learning performance (HEAL): a framework and dermatology AI model case study

Cited by: 2
Authors
Schaekermann, Mike [1]
Spitz, Terry [1]
Pyles, Malcolm [2,3]
Cole-Lewis, Heather [1]
Wulczyn, Ellery [1]
Pfohl, Stephen R. [1]
Martin Jr, Donald [1]
Jaroensri, Ronnachai [1]
Keeling, Geoff [1]
Liu, Yuan [1]
Farquhar, Stephanie [1]
Xue, Qinghan [1]
Lester, Jenna [2,4]
Hughes, Cian [1]
Strachan, Patricia [1]
Tan, Fraser [1]
Bui, Peggy [1]
Mermel, Craig H. [1,5]
Peng, Lily H. [1,6]
Matias, Yossi [1]
Corrado, Greg S. [1]
Webster, Dale R. [1]
Virmani, Sunny [1]
Semturs, Christopher [1]
Liu, Yun [1]
Horn, Ivor [1]
Chen, Po-Hsuan Cameron [1]
Affiliations
[1] Google Health, 600 Amphitheatre Parkway, Mountain View, CA 94043, USA
[2] Association for the Advancement of Wound Care, Mundelein, IL, USA
[3] Cleveland Clinic, Department of Dermatology, Cleveland Heights, OH, USA
[4] Pediatrics, University of California, Department of Dermatology, San Francisco, CA, USA
[5] IETF, Mountain View, CA, USA
[6] Verily Life Sciences, South San Francisco, CA, USA
Keywords
Artificial intelligence; Machine learning; Health equity; Dermatology; Disparities; Melanoma; Life; Bias
DOI
10.1016/j.eclinm.2024.102479
Chinese Library Classification number
R5 [Internal medicine]
Subject classification codes
1002; 100201
Abstract
Background: Artificial intelligence (AI) has repeatedly been shown to encode historical inequities in healthcare. We aimed to develop a framework to quantitatively assess the performance equity of health AI technologies and to illustrate its utility via a case study.

Methods: Here, we propose a methodology, complementary to existing fairness metrics, to assess whether health AI technologies prioritise performance for patient populations experiencing worse outcomes. We developed the Health Equity Assessment of machine Learning performance (HEAL) framework and the resulting HEAL metric. The framework is designed to quantitatively assess the performance equity of health AI technologies via a four-step interdisciplinary process to understand and quantify domain-specific criteria. As an illustrative case study (analysis conducted between October 2022 and January 2023), we applied the HEAL framework to a dermatology AI model. We used a set of 5420 teledermatology cases to retrospectively evaluate the AI model's HEAL metric. These were store-and-forward cases from patients aged 20 years or older, submitted by primary care providers in the USA and skin cancer clinics in Australia, and the set was enriched for diversity in age, sex and race/ethnicity. The HEAL metric is defined as the likelihood that the AI model performs better for subpopulations with worse average health outcomes than for others. The likelihood that AI performance was anticorrelated with pre-existing health outcomes was estimated using bootstrap methods. It was computed as the probability that the negated Spearman's rank correlation coefficient (denoted R) was greater than zero. Positive values of R suggest that subpopulations with poorer health outcomes have better AI model performance. Thus, the HEAL metric, defined as p(R > 0), measures how likely the AI technology is to prioritise performance for subpopulations with worse average health outcomes than for others (reported below as a percentage). Health outcomes were quantified as disability-adjusted life years (DALYs) when grouping by sex and age, and as years of life lost (YLLs) when grouping by race/ethnicity. AI performance was measured as top-3 agreement with the reference diagnosis from a panel of 3 dermatologists per case.

Findings: Across all dermatologic conditions, the HEAL metric was 80.5% for prioritising AI performance of racial/ethnic subpopulations based on YLLs. It was 92.1% and 0.0%, respectively, for prioritising AI performance of sex and age subpopulations based on DALYs. Certain dermatologic conditions were significantly associated with greater AI model performance compared with a reference category of less common conditions. For skin cancer conditions, the HEAL metric was 73.8% for prioritising AI performance of age subpopulations based on DALYs.

Interpretation: Analysis using the proposed HEAL framework showed that the dermatology AI model prioritised performance with respect to pre-existing health disparities for race/ethnicity subpopulations, for sex subpopulations (all conditions) and for age subpopulations (cancer conditions). More work is needed to investigate ways of promoting equitable AI performance across age for non-cancer conditions. More work is also needed to better understand how AI models can contribute towards improving equity in health outcomes.

Funding: Google LLC.

Copyright (c) 2024 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
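The HEAL metric in the abstract can be sketched as follows. This is a minimal, hypothetical sketch and not the authors' implementation. It estimates p(R > 0) by bootstrap, where R is the negated Spearman's rank correlation between subgroup health outcomes and subgroup AI performance. Several details are assumptions not stated in the abstract:
- Per-case top-3 agreement is a 0/1 indicator.
- The bootstrap resamples cases with replacement within each subgroup, while the population DALY/YLL estimates are treated as fixed.
- The burden (higher = worse) is negated into a higher-is-better "health outcome" before the correlation is negated. This makes R > 0 mean that worse-off subgroups get better performance.
- A constant ranking counts as R = 0.

The function names `average_ranks`, `spearman` and `heal_metric` are illustrative.

```python
import random
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks.
    Returns 0.0 when either variable is constant (rho is undefined)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def heal_metric(
    hits: Dict[str, List[int]],
    burden: Dict[str, float],
    n_boot: int = 1000,
    seed: int = 0,
) -> float:
    """Bootstrap estimate of p(R > 0), returned as a percentage.

    hits[g]   -- per-case top-3 agreement for subgroup g (1 if the reference
                 diagnosis is among the AI's top 3 predictions, else 0).
    burden[g] -- population health burden for g (e.g. DALYs or YLLs;
                 higher = worse). Treated as fixed (assumption).
    """
    rng = random.Random(seed)
    groups = sorted(hits)
    # Higher-is-better health outcome, so that negating the correlation
    # yields R > 0 when worse-off subgroups have better AI performance.
    outcome = [-burden[g] for g in groups]
    positive = 0
    for _ in range(n_boot):
        perf = []
        for g in groups:
            # Resample cases with replacement within each subgroup (assumption).
            sample = rng.choices(hits[g], k=len(hits[g]))
            perf.append(sum(sample) / len(sample))  # top-3 agreement rate
        r = -spearman(outcome, perf)
        if r > 0:
            positive += 1
    return 100.0 * positive / n_boot


# Hypothetical toy data (not from the study): the higher-burden group "A"
# always agrees with the reference, the lower-burden group "B" never does.
print(heal_metric({"A": [1] * 20, "B": [0] * 20}, {"A": 10.0, "B": 5.0}))  # → 100.0
```

With only two subgroups, as when grouping by sex, R in each replicate is either +1 or -1, or 0 on a tie. The metric then reduces to the fraction of bootstrap replicates in which the higher-burden group has the higher agreement rate.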
Pages: 13