Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

Cited by: 26
Authors:
Saito, Kuniaki [1 ,2 ]
Sohn, Kihyuk [3 ]
Zhang, Xiang [2 ]
Li, Chun-Liang [2 ]
Lee, Chen-Yu [2 ]
Saenko, Kate [1 ,4 ]
Pfister, Tomas [2 ]
Affiliations:
[1] Boston Univ, Boston, MA 02215 USA
[2] Google Cloud AI Res, Mountain View, CA 94043 USA
[3] Google Res, Mountain View, CA USA
[4] MIT IBM Watson AI Lab, Cambridge, MA USA
DOI: 10.1109/CVPR52729.2023.01850
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image. Labeling such triplets is expensive and hinders the broad applicability of CIR. In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. To this end, we propose a novel method, called Pic2Word, that requires only weakly labeled image-caption pairs and unlabeled image datasets for training. Unlike existing supervised CIR models, our model trained on weakly labeled or unlabeled datasets shows strong generalization across diverse ZS-CIR tasks, e.g., attribute editing, object composition, and domain conversion. Our approach outperforms several supervised CIR methods on the common CIR benchmarks, CIRR and Fashion-IQ. Code will be made publicly available at https://github.com/google-research/composed_image_retrieval.
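The abstract's core idea — map the query image's embedding to a single pseudo word token, splice that token into an ordinary text prompt, and retrieve by text-to-image similarity — can be sketched with toy stand-ins. This is a minimal illustration, not the paper's implementation: the encoders below are deterministic random stubs standing in for frozen CLIP towers, and the mapping network is an untrained linear layer (in the paper it is trained with a contrastive objective on image-caption pairs). All function names here are hypothetical.

```python
import numpy as np

DIM = 8  # toy embedding dimension

def image_encoder(image_id):
    # Stand-in for a frozen vision encoder (e.g. CLIP's image tower):
    # returns a deterministic pseudo-random embedding per image id.
    return np.random.default_rng(image_id).standard_normal(DIM)

def word_embedding(token_id):
    # Stand-in for the frozen text tower's token-embedding table.
    return np.random.default_rng(10_000 + token_id).standard_normal(DIM)

def text_encoder(token_embs):
    # Stand-in for a frozen text encoder: mean-pool the token embeddings.
    return np.mean(token_embs, axis=0)

# The Pic2Word component: a lightweight mapping network that turns an
# image embedding into one pseudo word-token embedding.  Untrained here.
rng = np.random.default_rng(0)
W = rng.standard_normal((DIM, DIM)) * 0.1

def pic2word(img_emb):
    return W @ img_emb

def compose_query(query_image_id, modifier_token_ids):
    # Build "a photo of [S*] <modifier text>": the pseudo token [S*]
    # stands in for the query image inside an ordinary text prompt.
    pseudo_token = pic2word(image_encoder(query_image_id))
    tokens = [pseudo_token] + [word_embedding(t) for t in modifier_token_ids]
    return text_encoder(tokens)

def retrieve(query_emb, gallery_image_ids):
    # Rank gallery images by cosine similarity to the composed query.
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return max(gallery_image_ids, key=lambda g: cos(query_emb, image_encoder(g)))
```

Because the composed query lives in the same embedding space as the gallery images, no labeled (query image, text, target image) triplets are needed at retrieval time — which is what makes the zero-shot setting possible.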
Pages: 19305-19314
Page count: 10