MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

Cited by: 4
Authors
Wu, Chaoyi [1, 2]
Zhang, Xiaoman [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
CONVOLUTIONAL NEURAL-NETWORK; CANCER;
DOI
10.1109/ICCV51070.2023.01954
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract medically relevant information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field, and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, and evaluate on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
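The patch-level alignment described in the third contribution can be illustrated with a minimal sketch of single-head cross-attention, in which entity embeddings act as queries over image patch features. This is not the authors' implementation: the function names, the toy vectors, and the omission of learned projections and stacked Transformer layers are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(entity_queries, patch_features):
    """Single-head scaled dot-product cross-attention (illustrative only).

    entity_queries: one embedding vector per medical entity (hypothetical).
    patch_features: one feature vector per image patch (hypothetical).
    Returns (fused, weights), where weights[i][j] is how strongly entity i
    attends to patch j, and fused[i] is the attention-weighted patch average.
    """
    dim = len(patch_features[0])
    scale = 1.0 / math.sqrt(dim)
    fused, weights = [], []
    for q in entity_queries:
        # Similarity between this entity and every patch.
        scores = [scale * sum(a * b for a, b in zip(q, p)) for p in patch_features]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of patch features under the attention weights.
        fused.append(
            [sum(w[j] * patch_features[j][d] for j in range(len(patch_features)))
             for d in range(dim)]
        )
    return fused, weights


# Toy data: four patches and two entities in a 2-D feature space (made up).
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
entities = [[4.0, 0.0], [0.0, 4.0]]

fused, weights = cross_attention(entities, patches)
# Index of the most-attended patch per entity, a crude stand-in for grounding.
best_patch = [max(range(len(w)), key=w.__getitem__) for w in weights]
```

In this sketch the attention weights double as a coarse localization map over patches, which is the intuition behind using a fusion module for both classification and grounding; a real model would learn the query, key, and value projections rather than use raw vectors.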
Pages: 21315 - 21326
Number of pages: 12
Related papers
50 in total
  • [1] Contrastive Language-Image Pre-Training with Knowledge Graphs
    Pan, Xuran
    Ye, Tianzhu
    Han, Dongchen
    Song, Shiji
    Huang, Gao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Grounded Language-Image Pre-training
    Li, Liunian Harold
    Zhang, Pengchuan
    Zhang, Haotian
    Yang, Jianwei
    Li, Chunyuan
    Zhong, Yiwu
    Wang, Lijuan
    Yuan, Lu
    Zhang, Lei
    Hwang, Jenq-Neng
    Chang, Kai-Wei
    Gao, Jianfeng
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 10955 - 10965
  • [3] ARCHICLIP: Enhanced Contrastive Language-Image Pre-training Model With Architectural Prior Knowledge
    Xia, Shengtao
    Cheng, Yiming
    Tian, Runjia
    [J]. PROCEEDINGS OF THE 29TH INTERNATIONAL CONFERENCE OF THE ASSOCIATION FOR COMPUTER-AIDED ARCHITECTURAL DESIGN RESEARCH IN ASIA, CAADRIA 2024, VOL 1, 2024, : 69 - 78
  • [4] CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training
    You, Kihyun
    Gu, Jawook
    Ham, Jiyeon
    Park, Beomhee
    Kim, Jiho
    Hong, Eun K.
    Baek, Woonhyuk
    Roh, Byungseok
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT II, 2023, 14221 : 101 - 111
  • [5] Scaling Language-Image Pre-training via Masking
    Li, Yanghao
    Fan, Haoqi
    Hu, Ronghang
    Feichtenhofer, Christoph
    He, Kaiming
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23390 - 23400
  • [6] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
    Yang, Kaicheng
    Deng, Jiankang
    An, Xiang
    Li, Jiawei
    Feng, Ziyong
    Guo, Jia
    Yang, Jing
    Liu, Tongliang
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2910 - 2919
  • [7] NLIP: Noise-Robust Language-Image Pre-training
    Huang, Runhui
    Long, Yanxin
    Han, Jianhua
    Xu, Hang
    Liang, Xiwen
    Xu, Chunjing
    Liang, Xiaodan
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 926 - 934
  • [8] UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
    Lee, Janghyeon
    Kim, Jongsuk
    Shon, Hyounguk
    Kim, Bumsoo
    Kim, Seung Hwan
    Lee, Honglak
    Kim, Junmo
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [9] Non-Contrastive Learning Meets Language-Image Pre-Training
    Zhou, Jinghao
    Dong, Li
    Gan, Zhe
    Wang, Lijuan
    Wei, Furu
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 11028 - 11038
  • [10] iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
    Wei, Yixuan
    Cao, Yue
    Zhang, Zheng
    Peng, Houwen
    Yao, Zhuliang
    Xie, Zhenda
    Hu, Han
    Guo, Baining
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2776 - 2786