Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach

Cited by: 0

Authors:

Lee, Saehyung [1]

Yu, Sangwon [1]

Park, Junsung [1]

Yi, Jihun [1]

Yoon, Sungroh [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea

[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea

Source:

PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS | 2024

Funding:

National Research Foundation of Singapore;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling the use of any arbitrary black-box model. Second, we construct the LLM questioner to generate non-redundant questions about the attributes of the target image, based on the information of retrieval candidate images in the current context. This approach mitigates the issues of noisiness and redundancy in the generated questions. Beyond our methodology, we propose a novel evaluation metric, Best log Rank Integral (BRI), for a comprehensive assessment of the interactive retrieval system. PlugIR demonstrates superior performance compared to both zero-shot and fine-tuned baselines in various benchmarks. Additionally, the two methodologies comprising PlugIR can be flexibly applied together or separately in various situations. Our code is available at https://github.com/Saehyung-Lee/PlugIR.


Pages: 791-809

Page count: 19

