Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

Cited by: 26
Authors
Saito, Kuniaki [1, 2]
Sohn, Kihyuk [3]
Zhang, Xiang [2]
Li, Chun-Liang [2]
Lee, Chen-Yu [2]
Saenko, Kate [1, 4]
Pfister, Tomas [2]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Google Cloud AI Res, Mountain View, CA 94043 USA
[3] Google Res, Mountain View, CA USA
[4] MIT IBM Watson AI Lab, Cambridge, MA USA
DOI
10.1109/CVPR52729.2023.01850
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image. Labeling such triplets is expensive and hinders broad applicability of CIR. In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. To this end, we propose a novel method, called Pic2Word, that requires only weakly labeled image-caption pairs and unlabeled image datasets to train. Unlike existing supervised CIR models, our model trained on weakly labeled or unlabeled datasets shows strong generalization across diverse ZS-CIR tasks, e.g., attribute editing, object composition, and domain conversion. Our approach outperforms several supervised CIR methods on the common CIR benchmarks CIRR and Fashion-IQ. Code will be made publicly available at https://github.com/google-research/composed_image_retrieval
Pages: 19305-19314
Page count: 10