Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach

Cited by: 0
Authors
Lee, Saehyung [1 ]
Yu, Sangwon [1 ]
Park, Junsung [1 ]
Yi, Jihun [1 ]
Yoon, Sungroh [1 ,2 ]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
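Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract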
In this paper, we primarily address the problem of dialogue-form context queries in the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the need to fine-tune a retrieval model on existing visual dialogue data, thereby enabling the use of an arbitrary black-box model. Second, we construct the LLM questioner to generate non-redundant questions about the attributes of the target image, based on the information of retrieval candidate images in the current context. This approach mitigates the noisiness and redundancy of the generated questions. Beyond our methodology, we propose a novel evaluation metric, Best log Rank Integral (BRI), for a comprehensive assessment of the interactive retrieval system. PlugIR demonstrates superior performance compared to both zero-shot and fine-tuned baselines on various benchmarks. Additionally, the two methodologies comprising PlugIR can be flexibly applied together or separately in various situations. Our code is available at https://github.com/Saehyung-Lee/PlugIR.
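The two components described in the abstract can be pictured with a minimal sketch. It is not taken from the PlugIR repository: every name here (`reformulate`, `ask`, `run_round`, the prompt wording, the toy stubs) is a hypothetical placeholder, and the exact-match redundancy check is a simplifying assumption standing in for whatever filtering the authors use. The LLM and the retriever are passed in as plain callables to reflect the black-box, plug-and-play setting.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not the PlugIR API).
LLM = Callable[[str], str]                   # prompt -> completion
Retriever = Callable[[str, int], List[str]]  # (text query, k) -> top-k image ids
Dialogue = List[Tuple[str, str]]             # (question, answer) pairs so far


def reformulate(llm: LLM, description: str, dialogue: Dialogue) -> str:
    """Rewrite the dialogue-form context as one caption-style query."""
    turns = "\n".join(f"Q: {q}\nA: {a}" for q, a in dialogue)
    prompt = (
        "Rewrite the description and dialogue as one image caption.\n"
        f"Description: {description}\n{turns}\nCaption:"
    )
    return llm(prompt).strip()


def ask(llm: LLM, query: str, candidates: List[str],
        dialogue: Dialogue) -> Optional[str]:
    """Generate a question conditioned on the current retrieval candidates."""
    prompt = (
        "Ask one new question about the target image.\n"
        f"Query: {query}\nCandidates: {'; '.join(candidates)}\nQuestion:"
    )
    question = llm(prompt).strip()
    # Assumption: a crude exact-match filter stands in for redundancy checking.
    asked = {q.lower() for q, _ in dialogue}
    return None if question.lower() in asked else question


def run_round(llm: LLM, retriever: Retriever, description: str,
              dialogue: Dialogue, k: int = 3):
    """One interaction round: reformulate, retrieve, then ask."""
    query = reformulate(llm, description, dialogue)
    candidates = retriever(query, k)
    return query, candidates, ask(llm, query, candidates, dialogue)


# Toy stand-ins so the sketch runs without any model or image index.
def toy_llm(prompt: str) -> str:
    if prompt.rstrip().endswith("Caption:"):
        return "a brown dog on a beach"
    return "Is the dog wearing a collar?"


def toy_retriever(query: str, k: int) -> List[str]:
    gallery = ["dog_beach.jpg", "dog_park.jpg", "cat_sofa.jpg", "boat.jpg"]
    return gallery[:k]


history = [("What color is the dog?", "brown")]
query, candidates, question = run_round(toy_llm, toy_retriever, "a dog", history, k=2)
```

Because the retriever only ever receives a plain text query, any text-to-image retrieval model could be substituted for `toy_retriever` without fine-tuning, which is the property the abstract emphasizes.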
Pages: 791-809
Page count: 19