EENet: embedding enhancement network for compositional image-text retrieval using generated text

Cited by: 0
Authors:
Hur, Chan [1 ]
Park, Hyeyoung [1 ]
Affiliations:
[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Keywords:
Compositional Image-Text Retrieval; Image-Captioning; Joint embedding; Visual Feature Enhancement; Textual Feature Generation;
DOI:
10.1007/s11042-023-17531-y
CLC number:
TP [Automation technology, computer technology]
Subject classification code:
0812
Abstract:
In this paper, we consider the compositional image-text retrieval task, which searches for appropriate target images given a query consisting of a reference image and feedback text. For instance, when a user finds a dress on an e-commerce site that meets all their needs except for its length and decoration, the user can give the system sentence-form feedback, e.g., "I like this dress, but I wish it were a little shorter and had no ribbon." This is a practical scenario for advanced retrieval systems and is applicable to interactive search and e-commerce systems. To tackle this task, we propose the Embedding Enhancement Network (EENet), which includes a text generation module and an image feature enhancement module that uses the generated text. While conventional works focus mainly on developing an efficient module for composing the given image and text query, EENet actively generates an additional textual description to enhance the image feature vector in the embedding space, inspired by the human ability to recognize an object using both visual perception and prior textual information. In addition, a new training loss is introduced to ensure that images and the additionally generated texts are well combined. Experimental results show that EENet achieves considerable improvements in retrieval performance, improving Recall@1 by 3.4% on Fashion200k and 1.4% on MIT-States over the baseline model.
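To make the idea in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch of an embedding-enhancement module and a batch-wise contrastive matching loss. It only illustrates the general approach described above (enhancing a target image embedding with the embedding of a caption generated for that image, then matching it against the composed query); it is not the authors' implementation, and all class names, dimensions, the gating scheme, and the loss form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmbeddingEnhancement(nn.Module):
    """Hypothetical sketch: fuse a target image's visual feature with the
    embedding of a caption generated for that image (by a separate,
    pretrained captioner), yielding an enhanced target embedding in the
    joint space. Names and dimensions are illustrative assumptions."""

    def __init__(self, visual_dim=2048, text_dim=512, embed_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, embed_dim)  # project image feature
        self.text_proj = nn.Linear(text_dim, embed_dim)      # project caption feature
        self.gate = nn.Sequential(                           # how much caption info to mix in
            nn.Linear(2 * embed_dim, embed_dim),
            nn.Sigmoid(),
        )

    def forward(self, visual_feat, caption_feat):
        v = self.visual_proj(visual_feat)
        t = self.text_proj(caption_feat)
        g = self.gate(torch.cat([v, t], dim=-1))
        enhanced = v + g * t                                  # gated additive enhancement
        return F.normalize(enhanced, dim=-1)


def batch_contrastive_loss(query_emb, target_emb, temperature=0.07):
    """Batch-wise softmax cross-entropy over scaled similarities between
    composed (reference image + feedback text) query embeddings and the
    enhanced target embeddings; matching pairs share the same batch index."""
    logits = query_emb @ target_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```

In a full pipeline, caption_feat would come from encoding the output of a pretrained image captioner, and query_emb from a separate module that composes the reference image with the feedback text; the specific combining loss proposed in the paper is not reproduced here.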
Pages: 49689-49705
Number of pages: 17
Related papers (50 in total):
  • [41] Image-Text Alignment and Retrieval Using Light-Weight Transformer. Li, Wenrui; Fan, Xiaopeng. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 4758-4762.
  • [42] Text-Guided Knowledge Transfer for Remote Sensing Image-Text Retrieval. Liu, An-An; Yang, Bo; Li, Wenhui; Song, Dan; Sun, Zhengya; Ren, Tongwei; Wei, Zhiqiang. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 1-5.
  • [43] Heterogeneous Graph Fusion Network for Cross-Modal Image-Text Retrieval. Qin, Xueyang; Li, Lishuang; Pang, Guangyao; Hao, Fei. Expert Systems with Applications, 2024, 249.
  • [44] Image-Text Bidirectional Learning Network Based Cross-Modal Retrieval. Li, Zhuoyi; Lu, Huibin; Fu, Hao; Gu, Guanghua. Neurocomputing, 2022, 483: 148-159.
  • [45] Multi-Layer Probabilistic Association Reasoning Network for Image-Text Retrieval. Li, W.; Xiong, R.; Fan, X. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(10): 1-1.
  • [46] External Knowledge Dynamic Modeling for Image-Text Retrieval. Yang, Song; Li, Qiang; Li, Wenhui; Liu, Min; Li, Xuanya; Liu, Anan. Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 5330-5338.
  • [47] Asymmetric Bi-Encoder for Image-Text Retrieval. Xiong, Wei; Liu, Haoliang; Mi, Siya; Zhang, Yu. Multimedia Systems, 2023, 29(6): 3805-3818.
  • [48] Multiview Adaptive Attention Pooling for Image-Text Retrieval. Ding, Yunlai; Yu, Jiaao; Lv, Qingxuan; Zhao, Haoran; Dong, Junyu; Li, Yuezun. Knowledge-Based Systems, 2024, 291.
  • [49] Entity Semantic Feature Fusion Network for Remote Sensing Image-Text Retrieval. Shui, Jianan; Ding, Shuaipeng; Li, Mingyong; Ma, Yan. Web and Big Data (APWeb-WAIM 2024), Part V, 2024, 14965: 130-145.
  • [50] Causal Image-Text Retrieval Embedded with Consensus Knowledge. Liang, Y.; Liu, X.; Ma, Z.; Li, Z. Chinese Journal of Engineering (Gongcheng Kexue Xuebao), 2024, 46(2): 317-328.