Knowledge-enhanced visual-language pre-training on chest radiology images

Cited by: 13
Authors
Zhang, Xiaoman [1 ,2 ]
Wu, Chaoyi [1 ,2 ]
Zhang, Ya [1 ,2 ]
Xie, Weidi [1 ,2 ]
Wang, Yanfeng [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai 200240, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Key R&D Program of China;
Keywords
SYSTEM;
DOI
10.1038/s41467-023-40260-7
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07; 0710; 09;
Abstract
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose an approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully supervised models but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.

Despite the success of multi-modal foundation models in natural language and vision tasks, their use in medical domains is limited. Here, the authors propose to train a foundation model for chest X-ray diagnosis that combines medical domain knowledge with vision-language representation learning.
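As context for the abstract above, the sketch below illustrates the generic CLIP-style image-report contrastive objective that underlies this family of vision-language pre-training methods. It is a minimal, hypothetical example, not the authors' KAD implementation: KAD additionally encodes medical domain knowledge (e.g., entities extracted from reports and a knowledge base) and classifies pathologies with a query-based head, none of which appears here. The toy encoders, feature sizes, and module names are all illustrative assumptions.

```python
# Minimal sketch of CLIP-style image-report contrastive pre-training.
# NOT the KAD architecture; encoders and dimensions are illustrative stand-ins
# for a real image backbone (e.g., ResNet/ViT) and a BERT-style report encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageReportContrastive(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Toy image encoder: maps a 1-channel chest X-ray to an embedding.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        # Projects pooled report features (assumed 768-d) to the shared space.
        self.text_proj = nn.Linear(768, embed_dim)
        # Learnable temperature, stored in log space as in CLIP.
        self.logit_scale = nn.Parameter(torch.tensor(2.66))

    def forward(self, images, report_feats):
        img = F.normalize(self.image_encoder(images), dim=-1)
        txt = F.normalize(self.text_proj(report_feats), dim=-1)
        # Cosine-similarity logits for every image-report pair in the batch.
        logits = self.logit_scale.exp() * img @ txt.t()
        targets = torch.arange(images.size(0))  # matched pairs on the diagonal
        # Symmetric InfoNCE: align each X-ray with its own report and vice versa.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

model = ImageReportContrastive()
loss = model(torch.randn(8, 1, 224, 224), torch.randn(8, 768))
loss.backward()
```

At inference time, zero-shot classification in such models is typically done by encoding short pathology prompts (e.g., "pneumothorax") with the text branch and ranking their cosine similarity against the image embedding, so no task-specific fine-tuning is required.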
Pages: 12
Related papers
50 records in total
  • [1] Knowledge-enhanced visual-language pre-training on chest radiology images
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhang, Ya
    Xie, Weidi
    Wang, Yanfeng
    NATURE COMMUNICATIONS, 2023, 14
  • [2] Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training
    Agarwal, Oshin
    Ge, Heming
    Shakeri, Siamak
    Al-Rfou, Rami
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3554 - 3565
  • [3] Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages
Karoui, Yasmine
    Lebret, Remi
    Foroutan, Negar
    Aberer, Karl
61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 366 - 375
  • [4] Medication Recommendation Based on a Knowledge-enhanced Pre-training Model
    Wang, Mengzhen
    Chen, Jianhui
    Lin, Shaofu
    PROCEEDINGS OF 2021 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY WORKSHOPS AND SPECIAL SESSIONS: (WI-IAT WORKSHOP/SPECIAL SESSION 2021), 2021, : 290 - 294
  • [5] REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
    Hu, Ziniu
    Iscen, Ahmet
    Sun, Chen
    Wang, Zirui
    Chang, Kai-Wei
    Sun, Yizhou
    Schmid, Cordelia
    Ross, David A.
    Fathi, Alireza
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23369 - 23379
  • [6] MR-KPA: medication recommendation by combining knowledge-enhanced pre-training with a deep adversarial network
    Lin, Shaofu
    Wang, Mengzhen
    Shi, Chengyu
    Xu, Zhe
    Chen, Lihong
    Gao, Qingcai
    Chen, Jianhui
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [7] Contrastive Pre-training and Representation Distillation for Medical Visual Question Answering Based on Radiology Images
    Liu, Bo
    Zhan, Li-Ming
    Wu, Xiao-Ming
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT II, 2021, 12902 : 210 - 220
  • [8] Graph Structure Enhanced Pre-Training Language Model for Knowledge Graph Completion
    Zhu, Huashi
    Xu, Dexuan
    Huang, Yu
    Jin, Zhi
    Ding, Weiping
    Tong, Jiahui
    Chong, Guoshuang
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (04): : 2697 - 2708
  • [9] Knowledge Enhanced Pre-Training Model for Vision-Language-Navigation Task
    Huang, Jitao
    Zeng, Guohui
    Huang, Bo
    Gao, Yongbin
    Liu, Jin
    Shi, Zhicai
    WUHAN UNIVERSITY JOURNAL OF NATURAL SCIENCES, 2021, 26 (02): 147 - 155