MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

Cited by: 4
Authors
Wu, Chaoyi [1, 2]
Zhang, Xiaoman [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Xie, Weidi [1, 2]
Affiliations
[1] Shanghai Jiao Tong University, Cooperative Medianet Innovation Center, Shanghai, People's Republic of China
[2] Shanghai AI Laboratory, Shanghai, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
CONVOLUTIONAL NEURAL-NETWORK; CANCER
DOI
10.1109/ICCV51070.2023.01954
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medically relevant information, avoiding unnecessary complexity from language grammar and enhancing the supervision signals; Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field and implicitly build relationships between medical entities in the language embedding space; Third, we propose to use a Transformer-based fusion model for spatially aligning the entity descriptions with visual signals at the image patch level, which enables medical diagnosis; Fourth, we conduct thorough experiments to validate the effectiveness of our architecture, evaluating on numerous public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance compared with prior methods on disease classification and grounding.
Pages: 21315-21326
Number of pages: 12
Related papers
  • [31] Graph Structure Enhanced Pre-Training Language Model for Knowledge Graph Completion
    Zhu, Huashi
    Xu, Dexuan
    Huang, Yu
    Jin, Zhi
    Ding, Weiping
    Tong, Jiahui
    Chong, Guoshuang
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04): 2697-2708
  • [32] Knowledge Enhanced Pre-Training Model for Vision-Language-Navigation Task
    Huang, Jitao
    Zeng, Guohui
    Huang, Bo
    Gao, Yongbin
    Liu, Jin
    Shi, Zhicai
    Wuhan University Journal of Natural Sciences, 2021, 26 (02): 147-155
  • [33] RELATION ENHANCED VISION LANGUAGE PRE-TRAINING
    Lee, Ju-Hee
    Kang, Je-Won
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022: 2286-2290
  • [34] GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
    Deng, Xinchi
    Shi, Han
    Huang, Runhui
    Li, Changlin
    Xu, Hang
    Han, Jianhua
    Kwok, James
    Zhao, Shen
    Zhang, Wei
    Liang, Xiaodan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 22121-22132
  • [35] MTPret: Improving X-ray Image Analytics with Multi-Task Pre-training
    Liao, W.
    Wang, Q.
    Li, X.
    Liu, Y.
    Chen, Z.
    Huang, S.
    Dou, D.
    Xu, Y.
    Xiong, H.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (09): 1-14
  • [36] Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis
    Dobrzycki, Andrzej D.
    Bernardos, Ana M.
    Bergesio, Luca
    Pomirski, Andrzej
    Saez-Trigueros, Daniel
    MATHEMATICS, 2024, 12 (01)
  • [37] Contrastive Language-knowledge Graph Pre-training
    Yuan, Xiaowei
    Liu, Kang
    Wang, Yequan
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (04)
  • [38] Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
    Chen, Zhihong
    Li, Guanbin
    Wan, Xiang
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 5152-5161
  • [39] Knowledge-enhanced visual-language pre-training on chest radiology images
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhang, Ya
    Xie, Weidi
    Wang, Yanfeng
    NATURE COMMUNICATIONS, 2023, 14 (01)