Enhanced descriptive captioning model for histopathological patches

Times cited: 0
Authors
Elbedwehy, Samar [1, 2]
Medhat, T. [3]
Hamza, Taher [2]
Alrahmawy, Mohammed F. [2]
Affiliations
[1] Kafrelsheikh Univ, Fac Artificial Intelligence, Dept Data Sci, Kafr Al Sheikh, Egypt
[2] Mansoura Univ, Fac Comp & Informat Sci, Dept Comp Sci, Mansoura, Egypt
[3] Kafrelsheikh Univ, Fac Engn, Dept Elect Engn, Kafr Al Sheikh, Egypt
Keywords
Image captioning; Medical-images; Word-embedding; Concatenation; Transformer; IMAGE; NETWORK;
DOI
10.1007/s11042-023-15884-y
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The interpretation of medical images into natural language is a developing field of artificial intelligence (AI) called image captioning. It integrates two branches of AI: computer vision and natural language processing. The task is challenging and goes beyond object recognition, segmentation, and classification, since it demands an understanding of the relationships between the components of an image and how these objects function as visual representations. Content-based image retrieval (CBIR) uses an image captioning model to generate captions for the user's query image. The common architecture of medical image captioning systems consists mainly of an image feature extractor subsystem followed by a caption generation lingual subsystem. In this paper, we aim to build an optimized model for histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens. For the image feature extraction subsystem, we performed two evaluations. First, we tested five vision models (VGG, ResNet, PVT, SWIN-Large, and ConvNEXT-Large) using LSTM, RNN, and bidirectional RNN, and then compared the vision models under three settings (LSTM without augmentation, LSTM with augmentation, and BioLinkBERT-Large as an embedding layer with augmentation) to find the most accurate one. Second, we tested three concatenations of pairs of vision models (SWIN-Large, PVT_v2_b5, and ConvNEXT-Large) to obtain the most expressive extracted feature vector of the image. For the caption generation lingual subsystem, we compared a pre-trained language embedding model, BioLinkBERT-Large, with LSTM in both evaluations to select the more accurate of the two. Our experiments showed that a captioning system that uses a concatenation of ConvNEXT-Large and PVT_v2_b5 as the image feature extractor, combined with the BioLinkBERT-Large language embedding model, produces the best results among the tested combinations.
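The feature concatenation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two extractors are hypothetical stubs standing in for pre-trained backbones such as ConvNEXT-Large and PVT_v2_b5, and the feature sizes (6 and 4) and the toy patch are assumptions chosen only to keep the example small. The sketch shows the one idea the abstract names: run both extractors on the same patch and join their output vectors end to end.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Extractor = Callable[[Sequence[float]], Vector]


def make_stub_extractor(dim: int, scale: float) -> Extractor:
    """Hypothetical stand-in for a pre-trained vision backbone.

    Maps a patch to a vector of length `dim`. A real backbone would
    return learned features; this stub is deterministic filler.
    """
    def extract(patch: Sequence[float]) -> Vector:
        mean = sum(patch) / len(patch)
        return [scale * mean * (i + 1) / dim for i in range(dim)]
    return extract


def concat_features(patch: Sequence[float], extractors: Sequence[Extractor]) -> Vector:
    """Run every extractor on the same patch and join the outputs end to end."""
    fused: Vector = []
    for extract in extractors:
        fused.extend(extract(patch))
    return fused


# Feature sizes are illustrative assumptions, not values from the paper.
convnext_stub = make_stub_extractor(dim=6, scale=1.0)
pvt_stub = make_stub_extractor(dim=4, scale=0.5)

patch = [0.2, 0.4, 0.6, 0.8]  # toy "patch" as flat pixel intensities
fused = concat_features(patch, [convnext_stub, pvt_stub])
print(len(fused))  # length is the sum of the two feature sizes
```

The fused vector would then be passed to the caption generation subsystem; its length is simply the sum of the two extractors' output sizes, so the downstream input layer has to be sized accordingly.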
Pages: 36645-36664
Page count: 20
Source
Multimedia Tools and Applications, 2024, 83: 36645-36664