EENet: embedding enhancement network for compositional image-text retrieval using generated text

Citations: 0

Authors:

Hur, Chan [1]

Park, Hyeyoung [1]

Affiliations:

[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 83 / No. 16

Keywords:

Compositional Image-Text Retrieval; Image-Captioning; Joint embedding; Visual Feature Enhancement; Textual Feature Generation;

DOI:

10.1007/s11042-023-17531-y

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

In this paper, we consider the compositional image-text retrieval task, which searches for appropriate target images given a reference image with feedback text as a query. For instance, when a user finds a dress on an E-commerce site that meets all their needs except for the length and decoration, the user can give sentence-form feedback, e.g., "I like this dress, but I wish it was a little shorter and had no ribbon," to the system. This is a practical scenario for advanced retrieval systems and is applicable to user interactive search systems or E-commerce systems. To tackle this task, we propose a model, the Embedding Enhancement Network (EENet), which includes a text generation module and an image feature enhancement module using the generated text. While the conventional works mainly focus on developing an efficient composition module of a given image and text query, EENet actively generates an additional textual description to enhance the image feature vector in the embedding space, which is inspired by the human ability to recognize an object using a visual sensor and prior textual information. Also, a new training loss is introduced to ensure that images and additional generated texts are well combined. The experimental results show that the EENet achieves considerable improvement on retrieval performance evaluations; for the Recall@1 metric, it improved by 3.4% in Fashion200k and 1.4% in MIT-States over the baseline model.

Pages: 49689-49705

Page count: 17
