Knowledge-enhanced visual-language pre-training on chest radiology images

Cited by: 13
Authors
Zhang, Xiaoman [1 ,2 ]
Wu, Chaoyi [1 ,2 ]
Zhang, Ya [1 ,2 ]
Xie, Weidi [1 ,2 ]
Wang, Yanfeng [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai 200240, Peoples R China
[2] Shanghai Artificial Intelligence Lab, Shanghai 200232, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
SYSTEM;
DOI
10.1038/s41467-023-40260-7
CLC number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose an approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully supervised models but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.

Despite the success of multi-modal foundation models in natural language and vision tasks, their use in medical domains is limited. Here, the authors propose to train a foundation model for chest X-ray diagnosis that combines medical domain knowledge with vision-language representation learning.
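The zero-shot diagnosis described in the abstract follows the general contrastive vision-language pattern: embed the X-ray and a text prompt per pathology into a shared space, then rank pathologies by similarity. The sketch below illustrates that pattern only; the stub encoders, the `EMB_DIM` size, and the `"there is {pathology}"` prompt template are illustrative assumptions, not KAD's actual architecture or prompts.

```python
# Sketch of contrastive-style zero-shot pathology scoring.
# The two encoders are random stubs standing in for real
# pre-trained image/text networks (hypothetical, for illustration).
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64  # assumed embedding size, not from the paper


def encode_image(image: np.ndarray) -> np.ndarray:
    """Hypothetical image encoder: a fixed random projection to the shared space."""
    w = rng.standard_normal((EMB_DIM, image.size))
    v = w @ image.ravel()
    return v / np.linalg.norm(v)


def encode_text(prompt: str) -> np.ndarray:
    """Hypothetical text encoder: maps a prompt to a deterministic unit vector."""
    local = np.random.default_rng(abs(hash(prompt)) % (2**32))
    v = local.standard_normal(EMB_DIM)
    return v / np.linalg.norm(v)


def zero_shot_scores(image: np.ndarray, pathologies: list[str]) -> dict[str, float]:
    """Cosine similarity between the image and one prompt per pathology."""
    img = encode_image(image)
    return {p: float(img @ encode_text(f"there is {p}")) for p in pathologies}


scores = zero_shot_scores(np.ones((8, 8)), ["pneumonia", "edema", "cardiomegaly"])
```

In a real system the highest-scoring pathologies (or scores above a calibrated threshold) would be reported as findings; here the scores are meaningless beyond demonstrating the ranking mechanism.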
Pages: 12
Related Papers
50 records
  • [21] Knowledge distilled pre-training model for vision-language-navigation
    Huang, Bo
    Zhang, Shuai
    Huang, Jitao
    Yu, Yijun
    Shi, Zhicai
    Xiong, Yujie
    APPLIED INTELLIGENCE, 2023, 53 (05) : 5607 - 5619
  • [22] Retrieval-based Knowledge Augmented Vision Language Pre-training
    Rao, Jiahua
    Shan, Zifei
    Liu, Longpo
    Zhou, Yao
    Yang, Yuedong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5399 - 5409
  • [23] Lightweight Model Pre-Training Via Language Guided Knowledge Distillation
    Li, M.
    Zhang, L.
    Zhu, M.
    Huang, Z.
    Yu, G.
    Fan, J.
    Chen, T.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 1 - 11
  • [25] DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding
    Zhang, Taolin
    Wang, Chengyu
    Hu, Nan
    Qiu, Minghui
    Tang, Chengguang
    He, Xiaofeng
    Huang, Jun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11703 - 11711
  • [26] Pre-Training Without Natural Images
    Kataoka, Hirokatsu
    Okayasu, Kazushige
    Matsumoto, Asato
    Yamagata, Eisuke
    Yamada, Ryosuke
    Inoue, Nakamasa
    Nakamura, Akio
    Satoh, Yutaka
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130 (04) : 990 - 1007
  • [28] Quality Diversity for Visual Pre-Training
    Chavhan, Ruchika
    Gouk, Henry
    Li, Da
    Hospedales, Timothy
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5361 - 5371
  • [29] Pre-training Universal Language Representation
    Li, Yian
    Zhao, Hai
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 5122 - 5133
  • [30] KEBLM: Knowledge-Enhanced Biomedical Language Models
    Lai, Tuan Manh
    Zhai, ChengXiang
    Ji, Heng
    JOURNAL OF BIOMEDICAL INFORMATICS, 2023, 143