EENet: embedding enhancement network for compositional image-text retrieval using generated text

Authors
Hur, Chan [1]
Park, Hyeyoung [1]
Affiliations
[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Keywords
Compositional Image-Text Retrieval; Image-Captioning; Joint embedding; Visual Feature Enhancement; Textual Feature Generation
DOI
10.1007/s11042-023-17531-y
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we consider the compositional image-text retrieval task, which searches for appropriate target images given a query consisting of a reference image and feedback text. For instance, when a user finds a dress on an e-commerce site that meets all their needs except for the length and decoration, the user can give sentence-form feedback to the system, e.g., "I like this dress, but I wish it was a little shorter and had no ribbon." This is a practical scenario for advanced retrieval systems and is applicable to user-interactive search systems or e-commerce systems. To tackle this task, we propose a model, the Embedding Enhancement Network (EENet), which includes a text generation module and an image feature enhancement module that uses the generated text. While conventional works mainly focus on developing an efficient module for composing a given image and text query, EENet actively generates an additional textual description to enhance the image feature vector in the embedding space, an approach inspired by the human ability to recognize an object using both a visual sensor and prior textual information. In addition, a new training loss is introduced to ensure that images and the additional generated texts are well combined. The experimental results show that EENet achieves considerable improvement in retrieval performance; for the Recall@1 metric, it improved by 3.4% on Fashion200k and 1.4% on MIT-States over the baseline model.
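
The abstract reports results with the Recall@1 metric. As a rough illustration of how Recall@K is commonly computed for embedding-based retrieval, the sketch below ranks gallery images by cosine similarity to each query embedding and checks whether the target appears in the top K. This is not the authors' code: the function names `cosine` and `recall_at_k`, the toy vectors, and the assumption that each query (reference image plus feedback text) has already been composed into a single vector are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_at_k(query_vecs, gallery_vecs, target_ids, k=1):
    """Fraction of queries whose target index is among the k most similar gallery items."""
    hits = 0
    for q, target in zip(query_vecs, target_ids):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_vecs)),
            key=lambda i: cosine(q, gallery_vecs[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy data (assumed): three composed query embeddings and three gallery embeddings.
queries = [[1.0, 0.1], [0.1, 1.0], [1.0, 1.0]]
gallery = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
targets = [0, 1, 2]  # index of the correct gallery item for each query

r1 = recall_at_k(queries, gallery, targets, k=1)
r3 = recall_at_k(queries, gallery, targets, k=3)
print(f"Recall@1 = {r1:.3f}, Recall@3 = {r3:.3f}")
```

In this toy setup the third query is deliberately closest to the wrong gallery items, so it counts as a miss at K=1 and as a hit only once K covers the whole gallery.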
Pages: 49689-49705
Page count: 17
Source
Multimedia Tools and Applications, 2024, 83: 49689-49705