DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

Cited by: 0
Authors
Chen, Zhuo [1 ,2 ,6 ]
Huang, Yufeng [3 ,6 ]
Chen, Jiaoyan [4 ]
Geng, Yuxia [1 ,6 ]
Zhang, Wen [3 ,6 ]
Fang, Yin [1 ,6 ]
Pan, Jeff Z. [5 ]
Chen, Huajun [1 ,2 ,6 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Donghai Lab, Zhoushan 316021, Peoples R China
[3] Zhejiang Univ, Sch Software Technol, Hangzhou, Peoples R China
[4] Univ Manchester, Dept Comp Sci, Manchester, England
[5] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
[6] Alibaba Zhejiang Univ, Joint Inst Frontier Technol, Hangzhou, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. As annotations of class-level visual characteristics, attributes are widely used semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the lack of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) propose a multi-task learning policy for considering multi-modal objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, that its components are effective, and that its predictions are interpretable.
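The attribute-level contrastive strategy described in the abstract can be illustrated with a minimal sketch. The Python code below is a hypothetical InfoNCE-style formulation in PyTorch, not the authors' released implementation: the function name attribute_contrastive_loss, the temperature value, and the pairing convention (row i of each tensor is the positive pair) are illustrative assumptions.

import torch
import torch.nn.functional as F

def attribute_contrastive_loss(img_attr_emb, txt_attr_emb, temperature=0.07):
    # Hypothetical attribute-level InfoNCE loss (illustrative sketch only).
    # img_attr_emb: (N, d) image-side embeddings, one per attribute.
    # txt_attr_emb: (N, d) matching attribute embeddings from a PLM.
    # Row i of each tensor is treated as the positive pair; every other
    # row acts as a negative, which pushes apart the representations of
    # attributes that frequently co-occur in the same images.
    img = F.normalize(img_attr_emb, dim=-1)
    txt = F.normalize(txt_attr_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (N, N) temperature-scaled cosine similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric cross-entropy over both matching directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings for 8 attributes of dimension 256.
img = torch.randn(8, 256)
txt = torch.randn(8, 256)
print(attribute_contrastive_loss(img, txt).item())

Normalizing the embeddings before the dot product makes the logits cosine similarities, a common design choice in contrastive objectives; the symmetric loss matches image-to-text and text-to-image directions equally.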
Pages: 405-413 (9 pages)