Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

Cited by: 26
Authors
Saito, Kuniaki [1, 2]
Sohn, Kihyuk [3 ]
Zhang, Xiang [2 ]
Li, Chun-Liang [2 ]
Lee, Chen-Yu [2 ]
Saenko, Kate [1, 4]
Pfister, Tomas [2 ]
Affiliations
[1] Boston Univ, Boston, MA 02215 USA
[2] Google Cloud AI Res, Mountain View, CA 94043 USA
[3] Google Res, Mountain View, CA USA
[4] MIT IBM Watson AI Lab, Cambridge, MA USA
DOI
10.1109/CVPR52729.2023.01850
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image. Labeling such triplets is expensive and hinders broad applicability of CIR. In this work, we propose to study an important task, Zero-Shot Composed Image Retrieval (ZS-CIR), whose goal is to build a CIR model without requiring labeled triplets for training. To this end, we propose a novel method, called Pic2Word, that requires only weakly labeled image-caption pairs and unlabeled image datasets to train. Unlike existing supervised CIR models, our model trained on weakly labeled or unlabeled datasets shows strong generalization across diverse ZS-CIR tasks, e.g., attribute editing, object composition, and domain conversion. Our approach outperforms several supervised CIR methods on the common CIR benchmarks, CIRR and Fashion-IQ. Code will be made publicly available at https://github.com/google-research/composed_image_retrieval
Pages: 19305-19314
Number of pages: 10