Retrieved Generative Captioning for Medical Images

Authors
Beddiar, Djamila Romaissa [1]
Oussalah, Mourad [2]
Seppanen, Tapio [1]
Affiliations
[1] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland
[2] Univ Oulu, Fac Med, Oulu, Finland
Funding
Academy of Finland
Keywords
Image Captioning; Medical Images; Neural Networks; Retrieval-based Captioning; Generative-based Captioning
DOI
10.1145/3617233.3617246
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Understanding the content of medical images and mapping it into text is an active research topic at the intersection of two domains: computer vision and natural language processing. This task, known as medical image captioning, plays a vital role in developing automatic diagnosis systems. Recent research in the medical field has reported promising results for both deep-learning-based and retrieval-based image captioning models. However, each approach has its own drawbacks, which can be overcome by combining the two. In addition, existing diagnosis systems still cannot explain their findings in the level of detail that a physician can provide. In this regard, we present a combination of a generative deep-learning-based method and a retrieval-based model for medical image captioning. First, we train an attention-based encoder-decoder model to generate new captions for given medical images. Then, we feed the caption produced by the generative model to the retrieval-based model, which retrieves the most similar caption from the training database. This multi-stage approach allows us to generate the most important words of the caption (with the generative model) and then search for the closest caption that includes those words (with the retrieval-based model). Another way of combining the two models is to select, each time, the caption with the highest score among the generated and retrieved captions. We evaluate the proposed model on the medical ROCO dataset, where the multi-stage model achieves a BLEU-4 score of 7.89 for the radiology class and 3.19 for the out-of-class data. The best results are achieved by the fused model (the predicted caption is the best among the generated and retrieved ones), with BLEU-4 values of 18.61 for the radiology class and 13.28 for the out-of-class data. Even though our results seem low, they outperform the state-of-the-art results on the same dataset and could be further improved.
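The two combination strategies described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the scoring function, so the token-overlap (Jaccard) similarity and the caller-supplied `score` callable are assumptions, and all function names are hypothetical. The generated caption is taken as a plain string that would come from the attention-based encoder-decoder.

```python
from __future__ import annotations

from typing import Callable, Sequence


def tokenize(caption: str) -> set[str]:
    """Lowercase whitespace tokenization (placeholder for real preprocessing)."""
    return set(caption.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]. ASSUMPTION: stands in for the unspecified
    similarity used by the retrieval-based model."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def multi_stage_caption(generated: str, training_captions: Sequence[str]) -> str:
    """Multi-stage idea: use the generated caption as a query and return the
    most similar caption from the training database."""
    if not training_captions:
        raise ValueError("training_captions must not be empty")
    return max(training_captions, key=lambda c: jaccard(generated, c))


def fused_caption(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Fused idea: keep whichever candidate (generated or retrieved) scores
    highest. ASSUMPTION: `score` is supplied by the caller."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return max(candidates, key=score)


# Toy data, invented purely for illustration.
training = [
    "chest x-ray showing right lower lobe pneumonia",
    "ct scan of the abdomen showing a liver lesion",
    "mri of the brain with no abnormality",
]
generated = "x-ray chest showing pneumonia"

retrieved = multi_stage_caption(generated, training)
best = fused_caption([generated, retrieved], score=lambda c: jaccard(c, training[0]))
print(retrieved)
print(best)
```

In this toy run the generated caption shares four tokens with the first training caption, so the multi-stage step returns that caption. The fused step then picks between the generated and retrieved captions using whatever score the caller provides.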
Pages: 48-54
Page count: 7