Contrastive Cross-Modal Pre-Training: A General Strategy for Small Sample Medical Imaging

Cited by: 6
Authors
Liang, Gongbo [1 ,2 ]
Greenwell, Connor [1 ]
Zhang, Yu [1 ]
Xing, Xin [1 ]
Wang, Xiaoqin [3 ]
Kavuluru, Ramakanth [4 ]
Jacobs, Nathan [1 ]
Affiliations
[1] Univ Kentucky, Dept Comp Sci, Lexington, KY 40506 USA
[2] Eastern Kentucky Univ, Dept Comp Sci & Informat Technol, Richmond, KY 40475 USA
[3] Univ Kentucky, Dept Radiol, Lexington, KY 40506 USA
[4] Univ Kentucky, Div Biomed Informat, Lexington, KY 40506 USA
Funding
US National Science Foundation
Keywords
Task analysis; Biomedical imaging; Imaging; Training; Feature extraction; Image analysis; Text processing; Annotation-efficient modeling; pre-training; convolutional neural network; text-image matching; DEEP; CLASSIFICATION;
DOI
10.1109/JBHI.2021.3110805
CLC number
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
A key challenge in training neural networks for a given medical imaging task is the difficulty of obtaining a sufficient number of manually labeled examples. In contrast, textual imaging reports are often readily available in medical records and contain rich but unstructured interpretations written by experts as part of standard clinical practice. We propose using these textual reports as a form of weak supervision to improve the image interpretation performance of a neural network without requiring additional manually labeled examples. We use an image-text matching task to train a feature extractor and then fine-tune it in a transfer learning setting for a supervised task using a small labeled dataset. The end result is a neural network that automatically interprets imagery without requiring textual reports during inference. We evaluate our method on three classification tasks and find consistent performance improvements, reducing the need for labeled data by 67%-98%.
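The abstract outlines a two-stage recipe: pre-train an image feature extractor on an image-report matching objective, then fine-tune it on a small labeled dataset. The sketch below is a minimal illustration of that recipe and is not the authors' implementation; the record gives no code details, so the PyTorch modules, embedding dimension, the bag-of-words report encoder, and the symmetric InfoNCE-style matching loss are all illustrative assumptions.

# Minimal sketch (assumed setup, not the paper's code): stage 1 matches images to
# their report embeddings; stage 2 fine-tunes the image encoder on a few labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    """Toy CNN feature extractor; in practice a stronger backbone (e.g. a ResNet)."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):
        return self.proj(self.backbone(x))

class TextEncoder(nn.Module):
    """Placeholder report encoder: mean of token embeddings (bag-of-words)."""
    def __init__(self, vocab_size=5000, embed_dim=128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)  # default mode: mean

    def forward(self, token_ids):
        return self.embed(token_ids)

def matching_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss: row i of the batch matches report i, all others are negatives."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    # Stage 1: weakly supervised pre-training from (image, report) pairs.
    img_enc, txt_enc = ImageEncoder(), TextEncoder()
    opt = torch.optim.Adam(list(img_enc.parameters()) + list(txt_enc.parameters()), lr=1e-4)
    images = torch.randn(8, 1, 64, 64)          # stand-in for a batch of images
    reports = torch.randint(0, 5000, (8, 20))   # stand-in for tokenized reports
    opt.zero_grad()
    matching_loss(img_enc(images), txt_enc(reports)).backward()
    opt.step()

    # Stage 2: transfer the image encoder to a small labeled classification task.
    # The text encoder is discarded; inference needs only the image.
    classifier = nn.Sequential(img_enc, nn.Linear(128, 2))  # e.g. a binary label
    ft_opt = torch.optim.Adam(classifier.parameters(), lr=1e-5)
    small_x, small_y = torch.randn(4, 1, 64, 64), torch.randint(0, 2, (4,))
    ft_opt.zero_grad()
    F.cross_entropy(classifier(small_x), small_y).backward()
    ft_opt.step()

The key property the abstract emphasizes is preserved in this sketch: the reports act only as weak supervision during pre-training, and the fine-tuned classifier interprets images without any text at inference time.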
Pages: 1640-1649
Page count: 10
Related papers
50 records in total (first 10 shown)
  • [1] Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation. Jiang, Chaoya; Ye, Wei; Xu, Haiyang; Huang, Songfang; Huang, Fei; Zhang, Shikun. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Long Papers, Vol 1, 2023: 14660-14679.
  • [2] VLCDoC: Vision-Language contrastive pre-training model for cross-Modal document classification. Bakkali, Souhail; Ming, Zuheng; Coustaty, Mickael; Rusinol, Marcal; Ramos Terrades, Oriol. Pattern Recognition, 2023, 139.
  • [3] COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation. Wen, Keyu; Xia, Jin; Huang, Yuanyuan; Li, Linyang; Xu, Jiayan; Shao, Jie. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 2188-2197.
  • [4] UniXcoder: Unified Cross-Modal Pre-training for Code Representation. Guo, Daya; Lu, Shuai; Duan, Nan; Wang, Yanlin; Zhou, Ming; Yin, Jian. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol 1: Long Papers, 2022: 7212-7225.
  • [5] CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising. Luo, Jianjie; Li, Yehao; Pan, Yingwei; Yao, Ting; Chao, Hongyang; Mei, Tao. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 5600-5608.
  • [6] Multi-Modal Contrastive Pre-training for Recommendation. Liu, Zhuang; Ma, Yunpu; Schubert, Matthias; Ouyang, Yuanxin; Xiong, Zhang. Proceedings of the 2022 International Conference on Multimedia Retrieval (ICMR 2022), 2022: 99-108.
  • [7] CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations. Li, Hang; Ding, Wenbiao; Kang, Yu; Liu, Tianqiao; Wu, Zhongqin; Liu, Zitao. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 3966-3977.
  • [8] Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation. Wu, Siying; Fu, Xueyang; Wu, Feng; Zha, Zheng-Jun. Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022: 4233-4241.
  • [9] Multi-Lingual Acquisition on Multimodal Pre-training for Cross-modal Retrieval. Zhang, Liang; Hu, Anwen; Jin, Qin. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [10] Cross-Modal Self-Attention with Multi-Task Pre-Training for Medical Visual Question Answering. Gong, Haifan; Chen, Guanqi; Liu, Sishuo; Yu, Yizhou; Li, Guanbin. Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR '21), 2021: 456-460.