Retrieved Generative Captioning for Medical Images

Citations: 0

Authors:

Beddiar, Djamila Romaissa [1]

Oussalah, Mourad [2]

Seppanen, Tapio [1]

Affiliations:

[1] Univ Oulu, Ctr Machine Vis & Signal Anal, Oulu, Finland

[2] Univ Oulu, Fac Med, Oulu, Finland

Source:

20TH INTERNATIONAL CONFERENCE ON CONTENT-BASED MULTIMEDIA INDEXING, CBMI 2023 | 2023

Funding:

Academy of Finland

Keywords:

Image Captioning; Medical Images; Neural Networks; Retrieval-based Captioning; Generative-based Captioning

DOI:

10.1145/3617233.3617246

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Understanding the content of medical images and mapping it to text is an active research topic at the intersection of two domains: computer vision and natural language processing. This task, known as medical image captioning, plays a vital role in developing automatic diagnosis systems. Recent research in the medical field has reported promising results for both deep-learning-based and retrieval-based image captioning models. However, each approach has its own drawbacks, which can be overcome by combining them. In addition, existing diagnosis systems still cannot explain their findings in enough detail to match what a physician can deliver. In this paper, we therefore present a combination of a generative deep-learning-based method and a retrieval-based model for medical image captioning. First, we train an attention-based encoder-decoder model to generate new captions for given medical images. Then, we feed the caption produced by the generative model to the retrieval-based model, which retrieves the most similar caption from the training database. This multi-stage approach allows us to generate the most important words of the caption (with the generative model) and then search for the closest caption that includes those words (with the retrieval-based model). Another way of combining the two models is to select, in each case, the caption with the highest score among the generated and retrieved captions. We evaluate the proposed model on the medical ROCO dataset. The multi-stage model achieves a BLEU-4 score of 7.89 for the radiology class and 3.19 for the out-of-class data. The best results are achieved by the fused model (the predicted caption is the best among the generated and retrieved ones), with BLEU-4 values of 18.61 for the radiology class and 13.28 for the out-of-class data. Although these scores appear low, they outperform the state-of-the-art results on the same dataset and could be further improved.
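The two combination strategies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state which similarity measure the retrieval step uses or how candidate captions are scored in the fused model, so the sketch assumes a simple token-overlap (Jaccard) similarity for retrieval and a caller-supplied scoring function for fusion. All function names (`jaccard`, `retrieve_closest`, `multi_stage`, `fused`) and the example captions are hypothetical, and the attention-based encoder-decoder is represented only by its output string.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, not the paper's preprocessing)."""
    return text.lower().split()


def jaccard(a, b):
    """Token-set overlap between two captions, in [0, 1]."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def retrieve_closest(query_caption, training_captions):
    """Return the training caption most similar to the query caption."""
    return max(training_captions, key=lambda c: jaccard(query_caption, c))


def multi_stage(generated_caption, training_captions):
    """Multi-stage variant: the generated caption is used as the retrieval query."""
    return retrieve_closest(generated_caption, training_captions)


def fused(generated_caption, retrieved_caption, score):
    """Fused variant: keep whichever candidate the scoring function rates higher.

    `score` is a hypothetical callable mapping a caption to a number;
    the abstract does not specify the scoring criterion.
    """
    return max([generated_caption, retrieved_caption], key=score)


# Hypothetical training captions and a hypothetical generated caption.
training_captions = [
    "chest x-ray showing left lung opacity",
    "ct scan of the abdomen with contrast",
    "mri of the brain showing a lesion",
]
generated = "x-ray chest opacity lung"

retrieved = multi_stage(generated, training_captions)
best = fused(generated, retrieved, score=lambda c: len(tokenize(c)))
```

Here the fused step scores candidates by caption length purely as a placeholder. Any other criterion, such as a model confidence or a similarity to a held-out reference during evaluation, could be passed in as `score` instead.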

Pages: 48-54

Page count: 7
