Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Cited by: 37
Authors
Lu, Ming Y. [1 ,2 ,3 ]
Chen, Bowen [2 ,3 ]
Zhang, Andrew [1 ,2 ,3 ]
Williamson, Drew F. K. [2 ,3 ]
Chen, Richard J. [2 ,3 ]
Ding, Tong [2 ,3 ]
Le, Long Phi [2 ,3 ]
Chuang, Yung-Sung [1 ]
Mahmood, Faisal [2 ,3 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Mass Gen Brigham, Boston, MA 02199 USA
Keywords
SYSTEM
DOI
10.1109/CVPR52729.2023.01893
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium-sized images, neither of which is applicable to the emerging field of computational pathology, where there are limited publicly available paired image-text datasets and each image can span up to 100,000 x 100,000 pixels. In this paper, we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models on gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model pretrained on over 33k histopathology image-caption pairs achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: https://github.com/mahmoodlab/MI-Zero.
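The abstract describes reformulating zero-shot transfer as multiple instance learning: the slide is tiled into patches, each patch embedding is scored against class prompt embeddings, and patch-level scores are pooled into one slide-level prediction. A minimal sketch of that idea, assuming cosine similarity scoring and top-K mean pooling as the aggregation function (the function and variable names here are illustrative, not from the paper's codebase):

```python
import numpy as np

def mi_zero_predict(patch_embs, class_text_embs, topk=5):
    """Slide-level zero-shot prediction by pooling patch-level scores.

    patch_embs:      (N, D) array of patch image embeddings from the image encoder
    class_text_embs: (C, D) array of class prompt embeddings from the text encoder
    Returns the index of the predicted class.
    """
    # L2-normalize so dot products are cosine similarities.
    patch_embs = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    class_text_embs = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)

    # (N, C): similarity of every patch to every class prompt.
    sims = patch_embs @ class_text_embs.T

    # Top-K pooling: average the K highest-scoring patches per class,
    # so a few strongly diagnostic regions can dominate a mostly benign slide.
    k = min(topk, sims.shape[0])
    topk_scores = np.sort(sims, axis=0)[-k:]   # (k, C)
    slide_scores = topk_scores.mean(axis=0)    # (C,)
    return int(np.argmax(slide_scores))
```

Top-K pooling is one of several plausible aggregation choices (mean pooling over all patches is another); the key point is that only fixed-size patch embeddings ever pass through the encoders, which is what makes inference on a gigapixel slide tractable.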
Pages: 19764-19775
Page count: 12
Related Papers (50 total)
  • [1] Zero-Shot AutoML with Pretrained Models
    Oeztuerk, Ekrem
    Ferreira, Fabio
    Jomaa, Hadi S.
    Schmidt-Thieme, Lars
    Grabocka, Josif
    Hutter, Frank
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [2] Zero-Shot Instance Segmentation
    Zheng, Ye
    Wu, Jiahong
    Qin, Yongqiang
    Zhang, Faen
    Cui, Li
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2593 - 2602
  • [3] Towards Cognition-Aligned Visual Language Models via Zero-Shot Instance Retrieval
    Ma, Teng
    Organisciak, Daniel
    Ma, Wenbao
    Long, Yang
    ELECTRONICS, 2024, 13 (09)
  • [4] Deep Multiple Instance Learning for Zero-Shot Image Tagging
    Rahman, Shafin
    Khan, Salman
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 530 - 546
  • [5] Prompt engineering for zero-shot and few-shot defect detection and classification using a visual-language pretrained model
    Yong, Gunwoo
    Jeon, Kahyun
    Gil, Daeyoung
    Lee, Ghang
    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2023, 38 (11) : 1536 - 1554
  • [6] Zero-Shot Adaptive Transfer for Conversational Language Understanding
    Lee, Sungjin
    Jha, Rahul
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 6642 - 6649
  • [7] Zero-shot urban function inference with street view images through prompting a pretrained vision-language model
    Huang, Weiming
    Wang, Jing
    Cong, Gao
    INTERNATIONAL JOURNAL OF GEOGRAPHICAL INFORMATION SCIENCE, 2024, 38 (07) : 1414 - 1442
  • [8] Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models
    Kang, Haoqiang
    Blevins, Terra
    Zettlemoyer, Luke
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 1562 - 1575
  • [9] Zero-shot Visual Question Answering with Language Model Feedback
    Du, Yifan
    Li, Junyi
    Tang, Tianyi
    Zhao, Wayne Xin
    Wen, Ji-Rong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 9268 - 9281
  • [10] Zero-Shot Visual Imitation
    Pathak, Deepak
    Mahmoudieh, Parsa
    Luo, Guanghao
    Agrawal, Pulkit
    Chen, Dian
    Shentu, Fred
    Shelhamer, Evan
    Malik, Jitendra
    Efros, Alexei A.
    Darrell, Trevor
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 2131 - 2134