Visual Language Pretrained Multiple Instance Zero-Shot Transfer for Histopathology Images

Cited by: 37
Authors
Lu, Ming Y. [1 ,2 ,3 ]
Chen, Bowen [2 ,3 ]
Zhang, Andrew [1 ,2 ,3 ]
Williamson, Drew F. K. [2 ,3 ]
Chen, Richard J. [2 ,3 ]
Ding, Tong [2 ,3 ]
Le, Long Phi [2 ,3 ]
Chuang, Yung-Sung [1 ]
Mahmood, Faisal [2 ,3 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Harvard Univ, Cambridge, MA 02138 USA
[3] Mass Gen Brigham, Boston, MA 02199 USA
Keywords
SYSTEM;
DOI
10.1109/CVPR52729.2023.01893
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contrastive visual language pretraining has emerged as a powerful method for either training new language-aware image encoders or augmenting existing pretrained models with zero-shot visual recognition capabilities. However, existing works typically train on large datasets of image-text pairs and have been designed to perform downstream tasks involving only small to medium-sized images, neither of which is applicable to the emerging field of computational pathology, where there are limited publicly available paired image-text datasets and each image can span up to 100,000 x 100,000 pixels. In this paper, we present MI-Zero, a simple and intuitive framework for unleashing the zero-shot transfer capabilities of contrastively aligned image and text models on gigapixel histopathology whole slide images, enabling multiple downstream diagnostic tasks to be carried out by pretrained encoders without requiring any additional labels. MI-Zero reformulates zero-shot transfer under the framework of multiple instance learning to overcome the computational challenge of inference on extremely large images. We used over 550k pathology reports and other available in-domain text corpora to pretrain our text encoder. By effectively leveraging strong pretrained encoders, our best model, pretrained on over 33k histopathology image-caption pairs, achieves an average median zero-shot accuracy of 70.2% across three different real-world cancer subtyping tasks. Our code is available at: https://github.com/mahmoodlab/MI-Zero.
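The abstract describes casting zero-shot transfer as a multiple instance learning problem: a gigapixel slide is tiled into patches (instances), each patch is scored against class text prompts in the shared embedding space, and the patch-level scores are pooled into a slide-level prediction. A minimal sketch of this idea, assuming precomputed L2-normalized patch and prompt embeddings from an aligned encoder pair and top-k mean pooling (one common MIL aggregation choice; function and parameter names here are illustrative, not the authors' API):

```python
import numpy as np

def mi_zero_predict(patch_embs, class_text_embs, top_k=5):
    """Zero-shot slide classification via multiple-instance pooling (sketch).

    patch_embs:      (N, d) L2-normalized embeddings of the N patches of one slide
    class_text_embs: (C, d) L2-normalized embeddings of C class prompts
    Returns the index of the predicted class.
    """
    # Patch-level cosine-similarity scores against every class prompt: (N, C)
    sims = patch_embs @ class_text_embs.T
    # Pool instance scores into one slide-level score per class by averaging
    # each class's top-k most similar patches.
    k = min(top_k, sims.shape[0])
    topk = np.sort(sims, axis=0)[-k:]   # (k, C) highest scores per class
    slide_scores = topk.mean(axis=0)    # (C,)
    return int(np.argmax(slide_scores))
```

Top-k pooling makes the slide prediction depend on the most confidently matched patches, which matters when only a small fraction of a gigapixel slide contains diagnostic tissue.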
Pages: 19764-19775
Page count: 12
Related Papers (50 records)
  • [41] Hypernetworks for Zero-Shot Transfer in Reinforcement Learning
    Rezaei-Shoshtari, Sahand
    Morissette, Charlotte
    Hogan, Francois R.
    Dudek, Gregory
    Meger, David
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9579 - 9587
  • [42] Zero-Shot Transfer Learning for Event Extraction
    Huang, Lifu
    Ji, Heng
    Cho, Kyunghyun
    Dagan, Ido
    Riedel, Sebastian
    Voss, Clare R.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2160 - 2170
  • [43] ZVQAF: Zero-shot visual question answering with feedback from large language models
    Liu, Cheng
    Wang, Chao
    Peng, Yan
    Li, Zhixu
    NEUROCOMPUTING, 2024, 580
  • [44] Combined scaling for zero-shot transfer learning
    Pham, Hieu
    Dai, Zihang
    Ghiasi, Golnaz
    Kawaguchi, Kenji
    Liu, Hanxiao
    Yu, Adams Wei
    Yu, Jiahui
    Chen, Yi-Ting
    Luong, Minh-Thang
    Wu, Yonghui
    Tan, Mingxing
    Le, Quoc V.
    NEUROCOMPUTING, 2023, 555
  • [45] Transfer Increment for Generalized Zero-Shot Learning
    Feng, Liangjun
    Zhao, Chunhui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (06) : 2506 - 2520
  • [46] Modularized Transfer Learning with Multiple Knowledge Graphs for Zero-shot Commonsense Reasoning
    Kim, Yu Jin
    Kwak, Beong-woo
    Kim, Youngwook
    Amplayo, Reinald Kim
    Hwang, Seung-won
    Yeo, Jinyoung
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2244 - 2257
  • [47] AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
    Ebrahimi, Abteen
    Mager, Manuel
    Oncevay, Arturo
    Chaudhary, Vishrav
    Chiruzzo, Luis
    Fan, Angela
    Ortega, John E.
    Ramos, Ricardo
    Rios, Annette
    Meza-Ruiz, Ivan
    Gimenez-Lugo, Gustavo A.
    Mager, Elisabeth
    Neubig, Graham
    Palmer, Alexis
    Coto-Solano, Rolando
    Ngoc Thang Vu
    Kann, Katharina
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6279 - 6299
  • [48] Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
    Li, Lin
    Xiao, Jun
    Chen, Guikun
    Shao, Jian
    Zhuang, Yueting
    Chen, Long
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Few-Shot and Zero-Shot Semantic Segmentation for Food Images
    Honbu, Yuma
    Yanai, Keiji
    PROCEEDINGS OF THE 13TH INTERNATIONAL WORKSHOP ON MULTIMEDIA FOR COOKING AND EATING ACTIVITIES (CEA '21), 2021, : 25 - 28
  • [50] Webly-supervised zero-shot learning for artwork instance recognition
    Del Chiaro, Riccardo
    Bagdanov, Andrew D.
    Del Bimbo, Alberto
    PATTERN RECOGNITION LETTERS, 2019, 128 : 420 - 426