MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

Cited by: 4
Authors
Wu, Chaoyi [1, 2]
Zhang, Xiaoman [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
CONVOLUTIONAL NEURAL-NETWORK; CANCER;
DOI
10.1109/ICCV51070.2023.01954
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals. Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field and implicitly build relationships between medical entities in the language embedding space. Third, we propose a Transformer-based fusion model that spatially aligns the entity descriptions with visual signals at the image patch level, enabling medical diagnosis. Fourth, we conduct thorough experiments to validate the effectiveness of our architecture on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
Pages: 21315-21326
Number of pages: 12