Memorize, Associate and Match: Embedding Enhancement via Fine-Grained Alignment for Image-Text Retrieval

Times cited: 29
Authors
Li, Jiangtong [1]
Liu, Liu [1]
Niu, Li [1]
Zhang, Liqing [1]
Affiliation
[1] Shanghai Jiao Tong University, Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, Shanghai 200240, People's Republic of China
Keywords
Learning systems; Feature extraction; Visualization; Semantics; Correlation; Benchmark testing; Transformers; Image-text retrieval; memory network; attention mechanism; transformer;
DOI
10.1109/TIP.2021.3123553
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Image-text retrieval aims to capture the semantic correlation between images and texts. Existing image-text retrieval methods can be roughly categorized into the embedding learning paradigm and the pair-wise learning paradigm. The former fails to capture the fine-grained correspondence between images and texts. The latter achieves fine-grained alignment between regions and words, but the high cost of pair-wise computation leads to slow retrieval. In this paper, we propose a novel method, Memory-based EMBedding Enhancement for image-text Retrieval (MEMBER), which introduces global memory banks to enable fine-grained alignment and fusion within the embedding learning paradigm. Specifically, we enrich image (resp., text) features with relevant text (resp., image) features stored in the text (resp., image) memory bank. In this way, our model not only accomplishes mutual embedding enhancement across the two modalities but also maintains retrieval efficiency. Extensive experiments demonstrate that MEMBER remarkably outperforms state-of-the-art approaches on two large-scale benchmark datasets.
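The abstract describes memory-based enhancement only at a high level. The following is a minimal, standard-library Python sketch of the general idea, not the authors' MEMBER architecture: each image region (resp. word) feature attends over a memory bank of the other modality, the read-out is added back to the feature, and the enhanced features are mean-pooled into one unit-length embedding. All names here (`enhance_with_memory`, `embed`, `rank`), the softmax attention, the residual fusion, the mean pooling, the fixed toy memory banks and the 3-dimensional features are assumptions chosen for illustration.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def enhance_with_memory(local_feats: List[Vector], memory: List[Vector],
                        temperature: float = 1.0) -> List[Vector]:
    """Enrich each local feature (an image region or a word) with an
    attention-weighted read from the OTHER modality's memory bank.
    Softmax attention and residual addition are assumptions of this sketch."""
    enhanced = []
    for f in local_feats:
        weights = softmax([dot(f, m) / temperature for m in memory])
        read = [sum(w * m[k] for w, m in zip(weights, memory)) for k in range(len(f))]
        enhanced.append([x + r for x, r in zip(f, read)])
    return enhanced


def embed(local_feats: List[Vector], memory: List[Vector]) -> Vector:
    """Mean-pool the enhanced local features into one unit-length embedding.
    The result depends only on this item and the shared memory, never on a query."""
    enhanced = enhance_with_memory(local_feats, memory)
    n, dim = len(enhanced), len(enhanced[0])
    return l2_normalize([sum(f[k] for f in enhanced) / n for k in range(dim)])


def rank(query: Vector, gallery: List[Vector]) -> List[int]:
    """Embedding-paradigm retrieval: one dot product per gallery item
    (equal to cosine similarity, since all embeddings are unit length)."""
    return sorted(range(len(gallery)), key=lambda i: dot(query, gallery[i]), reverse=True)


# Hypothetical toy memory banks; in a trained model they would be learned
# or filled from training data (assumption), not hand-written.
text_memory = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # read by image regions
image_memory = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # read by words

image_regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [
    [[0.0, 0.0, 1.0]],                    # caption 0: unrelated concept
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],   # caption 1: shares both concepts
]

# Gallery embeddings can be computed once offline and reused for every query.
caption_embs = [embed(words, image_memory) for words in captions]
image_emb = embed(image_regions, text_memory)
print(rank(image_emb, caption_embs))  # → [1, 0]
```

Because `embed` never sees the query, gallery embeddings can be precomputed and ranking reduces to one dot product per item. A pair-wise model would instead rerun cross-modal region-word attention for every (image, text) pair, which is the cost the abstract attributes to that paradigm.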
Pages: 9193-9207
Number of pages: 15
Related papers (50 in total)
  • [1] ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
    Messina, Nicola
    Stefanini, Matteo
    Cornia, Marcella
    Baraldi, Lorenzo
    Falchi, Fabrizio
    Amato, Giuseppe
    Cucchiara, Rita
    [J]. 19th International Conference on Content-Based Multimedia Indexing, CBMI 2022, 2022: 64-70
  • [2] Fine-Grained Image-Text Retrieval via Discriminative Latent Space Learning
    Zheng, Min
    Wang, Wen
    Li, Qingyong
    [J]. IEEE Signal Processing Letters, 2021, 28 (28): 643-647
  • [3] Towards Fast and Accurate Image-Text Retrieval With Self-Supervised Fine-Grained Alignment
    Zhuang, Jiamin
    Yu, Jing
    Ding, Yang
    Qu, Xiangyan
    Hu, Yue
    [J]. IEEE Transactions on Multimedia, 2024, 26: 1361-1372
  • [4] Towards Fast and Accurate Image-Text Retrieval with Self-Supervised Fine-Grained Alignment
    Zhuang, Jiamin
    Yu, Jing
    Ding, Yang
    Qu, Xiangyan
    Hu, Yue
    [J]. arXiv, 2023
  • [5] Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
    Qiu, Longtian
    Ning, Shan
    He, Xuming
    [J]. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol 38 No 5, 2024: 4605-4613
  • [6] Action-Aware Embedding Enhancement for Image-Text Retrieval
    Li, Jiangtong
    Niu, Li
    Zhang, Liqing
    [J]. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / The Twelfth Symposium on Educational Advances in Artificial Intelligence, 2022: 1323-1331
  • [7] Estimating the Semantics via Sector Embedding for Image-Text Retrieval
    Wang, Zheng
    Gao, Zhenwei
    Han, Mengqun
    Yang, Yang
    Shen, Heng Tao
    [J]. IEEE Transactions on Multimedia, 2024, 26: 10342-10353
  • [8] Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval
    Zhou, Zihui
    Feng, Yong
    Qiu, Agen
    Duan, Guofan
    Zhou, Mingliang
    [J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, 17: 19194-19210
  • [9] TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval
    Li, Qiqi
    Ma, Longfei
    Jiang, Zheng
    Li, Mingyong
    Jin, Bo
    [J]. CMC-Computers Materials & Continua, 2023, 75 (02): 3713-3728
  • [10] Collaborative fine-grained interaction learning for image-text sentiment analysis
    Xiao, Xingwang
    Pu, Yuanyuan
    Zhou, Dongming
    Cao, Jinde
    Gu, Jinjing
    Zhao, Zhengpeng
    Xu, Dan
    [J]. Knowledge-Based Systems, 2023, 279