Cross-Modal Visual Correspondences Learning Without External Semantic Information for Zero-Shot Sketch-Based Image Retrieval

Authors
Gao, Zhijie [1]
Wang, Kai [1]
Affiliations
[1] Univ Elect Sci & Technol China, Chengdu 611731, Sichuan, Peoples R China
Keywords
Sketch-based Image Retrieval; Zero-shot Learning; Knowledge Distillation;
DOI
10.1007/978-981-99-9109-9_34
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR), which is challenging because of the modality gap between sketches and images and the semantic inconsistency between seen and unseen categories. Most previous ZS-SBIR methods need external semantic information, i.e., texts and class labels, to minimize the modality gap or the semantic inconsistency. To tackle ZS-SBIR without external semantic information, which is labor-intensive to obtain, we propose a novel method that learns the visual correspondences between the two modalities, i.e., sketch and image, to transfer knowledge from seen data to unseen data. The method uses a transformer-based dual-pathway structure to learn these visual correspondences. To eliminate the modality gap between sketch and image, a triplet loss and a Gaussian-distribution-based domain alignment mechanism are introduced and applied to the tokens obtained from the proposed structure. In addition, knowledge distillation is introduced to maintain the generalization capability of the vision transformer (ViT) used as the backbone of the model. Comprehensive experiments on three benchmark datasets, i.e., Sketchy, TU-Berlin and QuickDraw, demonstrate that our method achieves superior results compared to the baselines on all three datasets without external semantic information.
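The abstract names a triplet loss as one of the mechanisms for closing the gap between sketch and image tokens, but does not give its formulation. As a rough illustration only, the sketch below shows the generic triplet margin loss, max(0, d(a, p) - d(a, n) + margin), on toy 2-D vectors. The function names, the Euclidean distance, the margin value, and the toy embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (assumed form, not the paper's exact loss).

    Pulls the anchor toward the positive and pushes it away from the
    negative until the two distances differ by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: a sketch token as anchor, two image tokens.
sketch = [0.0, 0.0]
image_same_class = [3.0, 4.0]    # distance 5 from the sketch
image_other_class = [6.0, 8.0]   # distance 10 from the sketch

# Matching image is already closer by more than the margin: no penalty.
loss_easy = triplet_loss(sketch, image_same_class, image_other_class)

# Roles swapped, so the "positive" is farther away: the loss is positive.
loss_hard = triplet_loss(sketch, image_other_class, image_same_class)
```

In a retrieval setting, the anchor would typically be a sketch embedding and the positive and negative would be image embeddings of the same and of a different category; how the paper selects triplets and which tokens it uses is not stated in the abstract.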
Pages: 342-353
Page count: 12