Anatomical Structure-Guided Medical Vision-Language Pre-training

Cited by: 0
Authors
Li, Qingqiu [1 ]
Yan, Xiaohan [2 ]
Xu, Jilan [3 ]
Yuan, Runtian [3 ]
Zhang, Yuejie [3 ]
Feng, Rui [1 ,3 ]
Shen, Quanli [4 ]
Zhang, Xiaobo [4 ]
Wang, Shujun [5 ,6 ]
Affiliations
[1] Fudan Univ, Sch Acad Engn & Technol, Shanghai, Peoples R China
[2] Tongji Univ, CAD Res Ctr, Shanghai, Peoples R China
[3] Fudan Univ, Sch Comp Sci, Shanghai, Peoples R China
[4] Fudan Univ, Childrens Hosp, Natl Childrens Med Ctr, Shanghai, Peoples R China
[5] Hong Kong Polytech Univ, Dept Biomed Engn, Hong Kong, Peoples R China
[6] Hong Kong Polytech Univ, Res Inst Smart Ageing, Hong Kong, Peoples R China
Keywords
Representation Learning; Medical Vision-Language Pre-training; Contrastive Learning; Anatomical Structure;
DOI
10.1007/978-3-031-72120-5_8
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite the promising performance, two challenges remain: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence> and fully exploit each element as supervision to enhance representation learning. For anatomical regions, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating the regions as minimum semantic units to explore fine-grained local alignment. For findings and existence, we regard them as image tags, applying an image-tag recognition decoder to associate image features with their respective tags within each sample and constructing soft labels for contrastive learning to improve the semantic association across different image-report pairs. We evaluate the proposed ASG framework on two downstream tasks across five public benchmarks. Experimental results demonstrate that our method outperforms state-of-the-art methods. Our code is available at https://asgmvlp.github.io.
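As a rough illustration of the soft-label contrastive idea mentioned in the abstract, the sketch below blends the usual one-hot InfoNCE targets with a tag-overlap similarity between samples, so image-report pairs that share findings are not treated as pure negatives. The function names, the alpha mixing weight, and the multi-hot tag representation are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal sketch of soft-label image-report contrastive learning (illustrative only).
import torch
import torch.nn.functional as F

def tag_similarity(tags: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between multi-hot tag vectors (e.g., finding/existence tags)."""
    tags = F.normalize(tags.float(), dim=-1)
    return tags @ tags.t()

def soft_label_contrastive(img_emb, txt_emb, tags, temperature=0.07, alpha=0.5):
    """InfoNCE-style loss whose targets mix the identity matrix with tag similarity."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                  # (B, B) image-to-text logits
    eye = torch.eye(img.size(0), device=img.device)
    targets = alpha * eye + (1 - alpha) * tag_similarity(tags)
    targets = targets / targets.sum(dim=1, keepdim=True)  # row-normalised soft labels
    loss_i2t = -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(targets * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)
```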
Pages: 80-90
Page count: 11