The impact of site-specific digital histology signatures on deep learning model accuracy and bias

Cited by: 121
Authors
Howard, Frederick M. [1]
Dolezal, James [1]
Kochanny, Sara [1]
Schulte, Jefree [2]
Chen, Heather [2]
Heij, Lara [3,4]
Huo, Dezheng [5,6]
Nanda, Rita [1,6]
Olopade, Olufunmilayo I. [1,6]
Kather, Jakob N. [7,8,9]
Cipriani, Nicole [2,6]
Grossman, Robert L. [1,6]
Pearson, Alexander T. [1,6]
Affiliations
[1] Univ Chicago, Dept Med, Sect Hematol Oncol, 5841 S Maryland Ave, Chicago, IL 60637 USA
[2] Univ Chicago, Dept Pathol, 5841 S Maryland Ave, Chicago, IL 60637 USA
[3] Univ Hosp RWTH Aachen, Dept Surg & Transplantat, Aachen, Germany
[4] Univ Hosp RWTH Aachen, Inst Pathol, Aachen, Germany
[5] Univ Chicago, Dept Publ Hlth Sci, Chicago, IL 60637 USA
[6] Univ Chicago Comprehens Canc Ctr, Chicago, IL USA
[7] Univ Hosp RWTH Aachen, Dept Med 3, Aachen, Germany
[8] Univ Leeds, Leeds Inst Med Res St Jamess, Pathol & Data Analyt, Leeds, W Yorkshire, England
[9] Univ Heidelberg Hosp, Natl Ctr Tumor Dis, Med Oncol, Heidelberg, Germany
Keywords
COMPREHENSIVE GENOMIC CHARACTERIZATION; OPERATING CHARACTERISTIC CURVES; BREAST-CANCER; MITOSIS DETECTION; HEALTH-CARE; HISTOPATHOLOGY; ANCESTRY; RESOURCE; BIOLOGY; AREAS;
DOI
10.1038/s41467-021-24698-1
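Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract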
The Cancer Genome Atlas (TCGA) is one of the largest biorepositories of digital histology. Deep learning (DL) models have been trained on TCGA to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. However, we demonstrate that these features vary substantially across tissue submitting sites in TCGA for over 3,000 patients with six cancer subtypes. Additionally, we show that histologic image differences between submitting sites can easily be identified with DL. Site detection remains possible despite commonly used color normalization and augmentation methods, and we quantify the image characteristics constituting this site-specific digital histology signature. We demonstrate that these site-specific signatures lead to biased accuracy for prediction of features including survival, genomic mutations, and tumor stage. Furthermore, ethnicity can also be inferred from site-specific signatures, which must be accounted for to ensure equitable application of DL. These site-specific signatures can lead to overoptimistic estimates of model performance, and we propose a quadratic programming method that abrogates this bias by ensuring models are not trained and validated on samples from the same site.

Deep learning models have been trained on The Cancer Genome Atlas to predict numerous features directly from histology, including survival, gene expression patterns, and driver mutations. Here, the authors demonstrate that site-specific histologic signatures can lead to biased estimates of accuracy for such models, and propose a method to minimize such bias.
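The idea of keeping each submitting site inside a single cross-validation fold can be illustrated with a minimal sketch. This is not the authors' quadratic programming method: it swaps in a simple greedy heuristic, and the function name `site_preserved_folds`, the toy site labels, and the binary outcome labels are all hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def site_preserved_folds(sites, labels, k=3):
    """Assign every sample to one of k folds so that no site spans two folds.

    Greedy heuristic (an assumption, not the paper's quadratic program):
    sites are placed largest first into the fold that currently holds the
    fewest samples, with ties broken by the fewest positive labels.
    """
    # Count labels per site.
    per_site = defaultdict(Counter)
    for site, label in zip(sites, labels):
        per_site[site][label] += 1

    fold_size = [0] * k  # samples per fold
    fold_pos = [0] * k   # positive-label samples per fold
    fold_of_site = {}

    # Deterministic order: descending site size, then site name.
    ordered = sorted(per_site, key=lambda s: (-sum(per_site[s].values()), s))
    for site in ordered:
        counts = per_site[site]
        target = min(range(k), key=lambda f: (fold_size[f], fold_pos[f], f))
        fold_of_site[site] = target
        fold_size[target] += sum(counts.values())
        fold_pos[target] += counts[1]

    return [fold_of_site[s] for s in sites]


# Toy data: four hypothetical submitting sites with a binary outcome.
sites = ["A"] * 4 + ["B"] * 3 + ["C"] * 3 + ["D"] * 2
labels = [1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0]
folds = site_preserved_folds(sites, labels, k=3)

# Each site should map to exactly one fold.
site_to_folds = defaultdict(set)
for site, fold in zip(sites, folds):
    site_to_folds[site].add(fold)
```

Because whole sites are held out together, a model validated on one fold never sees slides from that fold's sites during training, which is the property the abstract describes. A greedy assignment only approximates the balance of outcomes across folds that an optimization-based approach would target.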
Pages: 13
Related papers
50 in total
  • [1] The impact of site-specific digital histology signatures on deep learning model accuracy and bias
    Frederick M. Howard
    James Dolezal
    Sara Kochanny
    Jefree Schulte
    Heather Chen
    Lara Heij
    Dezheng Huo
    Rita Nanda
    Olufunmilayo I. Olopade
    Jakob N. Kather
    Nicole Cipriani
    Robert L. Grossman
    Alexander T. Pearson
    Nature Communications, 12
  • [2] Site-specific codon bias in bacteria
    Smith, JM
    Smith, NH
    GENETICS, 1996, 142 (03) : 1037 - 1043
  • [3] Site-Specific Beam Alignment in 6G via Deep Learning
    Heng, Yuqiang
    Zhang, Yu
    Alkhateeb, Ahmed
    Andrews, Jeffrey G.
    IEEE COMMUNICATIONS MAGAZINE, 2024, 62 (08) : 162 - 168
  • [4] Spatial Deep Learning for Site-Specific Movement Optimization of Aerial Base Stations
    Lyu, Jiangbin
    Chen, Xu
    Zhang, Jiefeng
    Fu, Liqun
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2024, 23 (07) : 7712 - 7727
  • [5] Artichoke deep learning detection network for site-specific agrochemicals UAS spraying
    Sassu, Alberto
    Motta, Jacopo
    Deidda, Alessandro
    Ghiani, Luca
    Carlevaro, Alberto
    Garibotto, Giovanni
    Gambella, Filippo
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 213
  • [6] Site-specific Deep Learning Path Loss Models based on the Method of Moments
    Brennan, Conor
    McGuinness, Kevin
    2023 17TH EUROPEAN CONFERENCE ON ANTENNAS AND PROPAGATION, EUCAP, 2023
  • [7] Enhancing site-specific weed detection using deep learning transformer architectures
    Garibaldi-Marquez, Francisco
    Martinez-Barba, Daniel A.
    Montanez-Franco, Luis E.
    Flores, Gerardo
    Valentin-Coronado, Luis M.
    CROP PROTECTION, 2025, 190
  • [8] Anatomic site-specific proteomic signatures of gastrointestinal stromal tumors
    Suehara, Yoshiyuki
    Kikuta, Kazutaka
    Nakayama, Robert
    Fujii, Kiyonaga
    Ichikawa, Hitoshi
    Shibata, Tatsuhiro
    Seki, Kunihiko
    Hasegawa, Tadashi
    Gotoh, Masahiro
    Tochigi, Naobumi
    Shimoda, Tadakazu
    Shimada, Yasuhiro
    Sano, Takeshi
    Beppu, Yasuo
    Kurosawa, Hisashi
    Hirohashi, Setsuo
    Kawai, Akira
    Kondo, Tadashi
    PROTEOMICS CLINICAL APPLICATIONS, 2009, 3 (05) : 584 - 596
  • [9] Site-specific molecular signatures predict aggressive disease in HNSCC
    Belbin T.J.
    Schlecht N.F.
    Smith R.V.
    Adrien L.R.
    Kawachi N.
    Brandwein-Gensler M.
    Bergman A.
    Chen Q.
    Childs G.
    Prystowsky M.B.
    Head and Neck Pathology, 2008, 2 (4) : 243 - 256
  • [10] A model of site-specific nutrient management
    Faere, Rolf
    Wang, Chenggang
    Seavert, Clark
    APPLIED ECONOMICS, 2012, 44 (33) : 4369 - 4380