Bi-directional Image–Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges

Cited by: 0
Authors
Doaa B. Ebaid
Magda M. Madbouly
Adel A. El-Zoghabi
Affiliation
[1] Alexandria University, Department of Information Technology, Institute of Graduate Studies and Research
Keywords
Image–text matching; Multimodal retrieval; Cross-modal retrieval; Deep learning
DOI
Not available
CLC number
Subject classification number
Abstract
Image–text matching (retrieval) has attracted growing attention as the volume of multimodal data increases. The task returns the images relevant to a textual query or, in the other direction, the descriptions that describe a given visual scene. The core challenge is computing the similarity between text and image precisely, which requires understanding both modalities by accurately extracting the related information. Although many deep learning (DL) approaches have been proposed for matching textual data and visual content, few reviews of DL-based image–text matching are available. In this review, we present and clarify modern DL-based techniques for the image–text matching problem through an extensive study of existing matching models, current architectures, benchmark datasets, and evaluation methods. First, we explain the matching task and illustrate frequently used architectures. Second, we classify existing approaches according to two important concepts: the alignment between image and text, and the learning approach. Third, we report standard datasets and evaluation techniques. Finally, we present current challenges to serve as inspiration for new researchers in this field.
Related papers
47 in total
  • [1] Bi-directional Image-Text Matching Deep Learning-Based Approaches: Concepts, Methodologies, Benchmarks and Challenges
    Ebaid, Doaa B.
    Madbouly, Magda M.
    El-Zoghabi, Adel A.
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2023, 16 (01)
  • [2] Bi-Directional Spatial-Semantic Attention Networks for Image-Text Matching
    Huang, Feiran
    Zhang, Xiaoming
    Zhao, Zhonghua
    Li, Zhoujun
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (04) : 2008 - 2020
  • [3] Uniting Image and Text Deep Networks via Bi-directional Triplet Loss for Retreival
    Hua, Yan
    Du, Jianhe
    [J]. PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 297 - 300
  • [4] Region level Bi-directional Deep Learning Framework for EEG-based Image Classification
    Fares, Ahmed
    Zhong, Shenghua
    Jiang, Jianmin
    [J]. PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 368 - 373
  • [5] Deep Learning-based Image Text Processing Research
    Xiong, Huixuan
    Jin, Kai
    Liu, Jingnian
    Cai, Jiahong
    Xiao, Lijun
    [J]. 2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS, 2023, : 163 - 168
  • [6] Machine Learning and Deep Learning Based Computational Approaches in Automatic Microorganisms Image Recognition: Methodologies, Challenges, and Developments
    Priya Rani
    Shallu Kotwal
    Jatinder Manhas
    Vinod Sharma
    Sparsh Sharma
    [J]. Archives of Computational Methods in Engineering, 2022, 29 : 1801 - 1837
  • [7] Machine Learning and Deep Learning Based Computational Approaches in Automatic Microorganisms Image Recognition: Methodologies, Challenges, and Developments
    Rani, Priya
    Kotwal, Shallu
    Manhas, Jatinder
    Sharma, Vinod
    Sharma, Sparsh
    [J]. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING, 2022, 29 (03) : 1801 - 1837
  • [8] EEG-based image classification via a region-level stacked bi-directional deep learning framework
    Fares, Ahmed
    Zhong, Sheng-hua
    Jiang, Jianmin
    [J]. BMC MEDICAL INFORMATICS AND DECISION MAKING, 2019, 19 (Suppl 6)
  • [9] EEG-based image classification via a region-level stacked bi-directional deep learning framework
    Ahmed Fares
    Sheng-hua Zhong
    Jianmin Jiang
    [J]. BMC Medical Informatics and Decision Making, 19
  • [10] Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification
    Asim, Muhammad Nabeel
    Ghani, Muhammad Usman
    Ibrahim, Muhammad Ali
    Mahmood, Waqar
    Dengel, Andreas
    Ahmed, Sheraz
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (11) : 5437 - 5469