DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

Cited by: 0
Authors
Chen, Zhuo [1 ,2 ,6 ]
Huang, Yufeng [3 ,6 ]
Chen, Jiaoyan [4 ]
Geng, Yuxia [1 ,6 ]
Zhang, Wen [3 ,6 ]
Fang, Yin [1 ,6 ]
Pan, Jeff Z. [5 ]
Chen, Huajun [1 ,2 ,6 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Donghai Lab, Zhoushan 316021, Peoples R China
[3] Zhejiang Univ, Sch Software Technol, Hangzhou, Peoples R China
[4] Univ Manchester, Dept Comp Sci, Manchester, England
[5] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
[6] Alibaba Zhejiang Univ, Joint Inst Frontier Technol, Hangzhou, Peoples R China
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training. As annotations of class-level visual characteristics, attributes are widely used semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the lack of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; (3) proposed a multi-task learning policy for considering multi-modal objectives. We find that DUET can achieve state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, that its components are effective, and that its predictions are interpretable.
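The paper's exact loss formulation is not reproduced in this record. As an illustration only, an attribute-level contrastive objective of the kind the abstract describes is commonly implemented as an InfoNCE-style loss: an image-derived embedding is pulled toward its ground-truth attribute embedding and pushed away from co-occurring distractor attributes. The function below is a minimal hedged sketch of that pattern, not DUET's actual implementation; all names are hypothetical.

```python
import math

def attribute_contrastive_loss(similarities, positive_idx, temperature=0.1):
    """InfoNCE-style loss for one image against a set of candidate attributes.

    similarities: cosine similarities between the image embedding and each
                  candidate attribute embedding (positives and distractors).
    positive_idx: index of the ground-truth attribute for this image.
    temperature:  scaling factor; lower values sharpen the distribution.

    Returns -log softmax(similarities / temperature)[positive_idx], computed
    with the max-subtraction trick for numerical stability.
    """
    logits = [s / temperature for s in similarities]
    m = max(logits)
    log_denom = math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[positive_idx] - m - log_denom)

# The loss is small when the positive attribute is the most similar one,
# and grows when a co-occurring distractor dominates instead.
easy = attribute_contrastive_loss([0.9, 0.2, 0.1], positive_idx=0)
hard = attribute_contrastive_loss([0.9, 0.2, 0.1], positive_idx=2)
```

Minimizing such a loss over many (image, attribute) pairs encourages the encoder to separate attributes that frequently co-occur, which is the discrimination problem the abstract highlights.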
Pages: 405-413 (9 pages)