MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

Cited by: 4

Authors:

Wu, Chaoyi [1,2]

Zhang, Xiaoman [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Xie, Weidi [1,2]

Affiliations:

[1] Shanghai Jiao Tong University, Cooperative Medianet Innovation Center, Shanghai, People's Republic of China

[2] Shanghai AI Laboratory, Shanghai, People's Republic of China

Source:

2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023) | 2023

Funding:

National Key Research and Development Program of China;

Keywords:

CONVOLUTIONAL NEURAL-NETWORK; CANCER;

DOI:

10.1109/ICCV51070.2023.01954

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and evaluate on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
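
The "entity translation by querying a knowledge base" step mentioned in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Triplet` fields, the tiny `KNOWLEDGE_BASE` dictionary with its hand-written descriptions, and the function names are all hypothetical. The sketch only shows the general idea of replacing an entity name with a richer description before it is passed to a text encoder.

```python
from dataclasses import dataclass

# Hypothetical miniature knowledge base (entity name -> short description).
# The descriptions are hand-written for illustration, not from the paper.
KNOWLEDGE_BASE = {
    "pneumothorax": "air in the pleural space between the lung and the chest wall",
    "edema": "excess fluid accumulated in tissue",
}


@dataclass(frozen=True)
class Triplet:
    """Assumed triplet layout: what was mentioned, where, and whether it is present."""
    entity: str
    position: str
    exists: bool


def translate_entity(triplet, kb=KNOWLEDGE_BASE):
    """Look up the entity description; fall back to the raw entity name if absent."""
    return kb.get(triplet.entity.lower(), triplet.entity)


def to_text_query(triplet, kb=KNOWLEDGE_BASE):
    """Build the string that a text encoder would receive in this sketch."""
    return f"{translate_entity(triplet, kb)} (location: {triplet.position})"


example = Triplet(entity="Pneumothorax", position="right apex", exists=True)
print(to_text_query(example))
```

In this sketch an entity missing from the dictionary is passed through unchanged, which is one possible fallback among several; the abstract does not say how unknown entities are handled.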


Pages: 21315-21326

Number of pages: 12
