Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

Cited by: 26
Authors
Saito, Kuniaki [1, 2]
Sohn, Kihyuk [3 ]
Zhang, Xiang [2 ]
Li, Chun-Liang [2 ]
Lee, Chen-Yu [2 ]
Saenko, Kate [1, 4]
Pfister, Tomas [2 ]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Google Cloud AI Res, Mountain View, CA 94043 USA
[3] Google Res, Mountain View, CA USA
[4] MIT IBM Watson AI Lab, Cambridge, MA USA
DOI
10.1109/CVPR52729.2023.01850
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, the text specification, and the target image. Labeling such triplets is expensive and hinders the broad applicability of CIR. In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. To this end, we propose a novel method, called Pic2Word, that requires only weakly labeled image-caption pairs and unlabeled image datasets to train. Unlike existing supervised CIR models, our model trained on weakly labeled or unlabeled datasets shows strong generalization across diverse ZS-CIR tasks, e.g., attribute editing, object composition, and domain conversion. Our approach outperforms several supervised CIR methods on the common CIR benchmarks CIRR and Fashion-IQ. Code will be made publicly available at https://github.com/google-research/composed_image_retrieval
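To make the CIR setting in the abstract concrete, the following is a minimal sketch of the retrieval step only: a query image embedding and a modifier text embedding are combined into one query vector, and gallery images are ranked by cosine similarity. Everything here is an assumption made for illustration: the toy 3-dimensional embeddings, the names compose_query and retrieve, and the weighted-average composition. It is not the Pic2Word mapping, which the abstract does not specify.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def compose_query(image_emb, text_emb, alpha=0.5):
    """Hypothetical composition: weighted average of unit-length embeddings.

    This stands in for whatever composition a real CIR model learns.
    """
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return [alpha * i + (1 - alpha) * t for i, t in zip(img, txt)]


def retrieve(image_emb, text_emb, gallery, top_k=1):
    """Rank gallery entries by similarity to the composed query."""
    query = compose_query(image_emb, text_emb)
    ranked = sorted(
        gallery, key=lambda name: cosine(query, gallery[name]), reverse=True
    )
    return ranked[:top_k]


# Toy embeddings (made up): axis 0 ~ "dog", axis 1 ~ "cartoon", axis 2 ~ "cat".
query_image = [1.0, 0.0, 0.0]    # a photo of a dog
modifier_text = [0.0, 1.0, 0.0]  # "as a cartoon"
gallery = {
    "photo_dog": [1.0, 0.0, 0.0],
    "cartoon_dog": [1.0, 1.0, 0.0],
    "cartoon_cat": [0.0, 1.0, 1.0],
}

result = retrieve(query_image, modifier_text, gallery, top_k=1)
print(result)
```

In this toy domain-conversion example, the composed query sits between the "dog" and "cartoon" directions, so the gallery entry that shares both ranks first.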
Pages: 19305-19314
Page count: 10