A whole-slide foundation model for digital pathology from real-world data

Times cited: 6
Authors
Xu, Hanwen [1,2]
Usuyama, Naoto [1]
Bagga, Jaspreet [1]
Zhang, Sheng [1]
Rao, Rajesh [1]
Naumann, Tristan [1]
Wong, Cliff [1]
Gero, Zelalem [1]
Gonzalez, Javier [1]
Gu, Yu [1]
Xu, Yanbo [1]
Wei, Mu [1]
Wang, Wenhui [1]
Ma, Shuming [1]
Wei, Furu [1]
Yang, Jianwei [1]
Li, Chunyuan [1]
Gao, Jianfeng [1]
Rosemon, Jaylen [3]
Bower, Tucker [3]
Lee, Soohee [4]
Weerasinghe, Roshanthi [4]
Wright, Bill J. [4]
Robicsek, Ari [4]
Piening, Brian [3,5]
Bifulco, Carlo [3,5]
Wang, Sheng [2,6]
Poon, Hoifung [1]
Affiliations
[1] Microsoft Research, Redmond, WA 98052, USA
[2] University of Washington, Paul G. Allen School of Computer Science and Engineering, Seattle, WA 98195, USA
[3] Providence Genomics, Portland, OR 97225, USA
[4] Providence Research Network, Renton, WA, USA
[5] Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR 97213, USA
[6] University of Washington, Department of Surgery, Seattle, WA 98195, USA
DOI
10.1038/s41586-024-07441-w
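Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]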
Subject classification code
07; 0710; 09
Abstract
Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles [1–3]. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context [4]. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet method [5] to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data [6]. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath for vision-language pretraining for pathology [7,8] by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.

Prov-GigaPath, a whole-slide pathology foundation model pretrained on a large dataset containing around 1.3 billion pathology images, attains state-of-the-art performance in cancer classification and pathomics tasks.
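The scaling problem described in the abstract can be made concrete with a short calculation. The sketch below is illustrative and is not the authors' code. It assumes a hypothetical slide of 40,000 × 25,000 pixels (about one gigapixel), counts the 256 × 256 tiles needed to cover it, and compares the query-key pairs scored by dense self-attention over those tile tokens with a simplified LongNet-style dilated attention, in which the token sequence is cut into segments and only every r-th token in a segment attends to the others. The helper names, the slide size, and the (segment, dilation) values are all hypothetical; this record does not state the configuration Prov-GigaPath uses.

```python
TILE_PX = 256  # tile edge length in pixels, as stated in the abstract


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def tile_grid(width_px: int, height_px: int, tile_px: int = TILE_PX) -> tuple[int, int]:
    """Columns and rows of tiles needed to cover a slide; partial edge tiles count as full."""
    return ceil_div(width_px, tile_px), ceil_div(height_px, tile_px)


def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored when every tile token attends to every other one."""
    return n_tokens * n_tokens


def dilated_attention_pairs(n_tokens: int, segment: int, dilation: int) -> int:
    """Query-key pairs for one simplified LongNet-style (segment, dilation) branch.

    The token sequence is cut into consecutive segments of length `segment`
    (the last may be shorter). Within each segment only positions
    start, start + dilation, start + 2 * dilation, ... are kept, and the kept
    tokens attend densely to one another.
    """
    pairs = 0
    for start in range(0, n_tokens, segment):
        seg_len = min(segment, n_tokens - start)
        kept = ceil_div(seg_len, dilation)  # number of kept positions in this segment
        pairs += kept * kept
    return pairs


if __name__ == "__main__":
    # Hypothetical ~1-gigapixel slide: 40,000 x 25,000 pixels.
    cols, rows = tile_grid(40_000, 25_000)
    n = cols * rows
    # Hypothetical (segment length, dilation rate) branches; their costs add up.
    branches = [(1024, 1), (4096, 4), (16384, 16)]
    sparse = sum(dilated_attention_pairs(n, w, r) for w, r in branches)
    print(n, dense_attention_pairs(n), sparse)  # prints: 15386 236728996 20401113
```

With these assumed settings, the dilated branches together score roughly 11.6 times fewer pairs than dense attention over the same 15,386 tiles, and the gap widens on larger slides because the dense cost grows quadratically with the tile count while each fixed branch grows roughly linearly. LongNet additionally shifts the kept positions across attention heads and merges the branch outputs; this pair count ignores both.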
Pages: 181–188
Page count: 22