SAM-GUIDED ENHANCED FINE-GRAINED ENCODING WITH MIXED SEMANTIC LEARNING FOR MEDICAL IMAGE CAPTIONING

Cited by: 1
Authors
Zhang, Zhenyu [1]
Wang, Benlu [1]
Liang, Weijie [1]
Li, Yizhi [1]
Guo, Xuechen [1]
Wang, Guanhong [1]
Li, Shiyan [2]
Wang, Gaoang [1]
Affiliations
[1] Zhejiang Univ, Zhejiang Univ Illinois Urbana Champaign Inst, Hangzhou, Peoples R China
[2] Zhejiang Univ, Sch Med, Sir Run Run Shaw Hosp, Hangzhou, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Funding
National Natural Science Foundation of China;
Keywords
Medical Image; Multimodal; Image Captioning; Dual Image Encoders; Large Language Model;
DOI
10.1109/ICASSP48485.2024.10446878
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations. However, current generic text and image pre-trained models do not yield satisfactory results when it comes to describing intricate details within medical images. In this paper, we present a novel medical image captioning method guided by the segment anything model (SAM) to enable enhanced encoding with both general and detailed feature extraction. In addition, our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images. We demonstrate the effectiveness of this approach, as it outperforms the pre-trained BLIP2 model on various evaluation metrics for generating descriptions of medical images.
Pages: 1731-1735
Page count: 5