Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Times cited: 19
Authors
Ye, Zihan [1 ]
Hu, Fuyuan [1 ]
Lyu, Fan [2 ]
Li, Linyan [3 ]
Huang, Kaizhu [4 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Suzhou 215009, Peoples R China
[2] Tianjin Univ, Tianjin 300000, Peoples R China
[3] Suzhou Inst Trade & Commerce, Suzhou 215009, Jiangsu, Peoples R China
[4] Xian Jiaotong Liverpool Univ, Dept Elect & Elect Engn, Suzhou 215123, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Semantics; Training; Manganese; Extraterrestrial measurements; Generative adversarial networks; Search problems; Zero-shot learning; generative adversarial network; representation learning; deep learning; CLASSIFICATION;
DOI
10.1109/TMM.2021.3089017
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Using generative models to synthesize visual features from semantic distributions has been one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot find reliable disentangled representations for unseen classes, because unseen classes are unavailable during training in ZSL. To alleviate this drawback, in this work we propose a multi-modal triplet loss (MMTL), which uses multi-modal information to search for a disentangled representation space. In this way, all classes can interact, which benefits the learning of disentangled class representations in the searched space. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN can fit a more realistic distribution over both seen and unseen features. Extensive experiments show that the proposed model outperforms state-of-the-art methods on four benchmark datasets.
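The abstract names the triplet loss (TL) but gives neither its form nor the MMTL formulation. As a point of reference only, here is a minimal sketch of the conventional hinge-form triplet loss on plain Python lists, assuming Euclidean distance and a margin of 1.0. It is not the paper's MMTL, whose multi-modal construction the abstract does not specify; the helper names `euclidean`, `triplet_loss`, and `batch_triplet_loss` are hypothetical and chosen for illustration.

```python
import math
from typing import Sequence, Tuple, List

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Conventional hinge-form triplet loss (not the paper's MMTL):
    max(0, d(anchor, positive) - d(anchor, negative) + margin).
    The loss is zero once the negative lies at least `margin` farther
    from the anchor than the positive does."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def batch_triplet_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over a list of (anchor, positive, negative) triplets."""
    if not triplets:
        raise ValueError("need at least one triplet")
    total = sum(triplet_loss(a, p, n, margin) for a, p, n in triplets)
    return total / len(triplets)


if __name__ == "__main__":
    anchor, positive = [0.0, 0.0], [0.0, 1.0]
    print(triplet_loss(anchor, positive, [3.0, 4.0]))  # easy negative
    print(triplet_loss(anchor, positive, [1.0, 1.0]))  # hard negative
```

For example, with anchor `[0, 0]` and positive `[0, 1]` (distance 1), a negative at `[3, 4]` (distance 5) already satisfies the margin and contributes zero loss, while a negative at `[1, 1]` (distance sqrt(2)) yields 2 - sqrt(2), about 0.586, so training would push that negative farther from the anchor. Because a loss of this form only compares samples that are present in a batch, it illustrates why the abstract argues that classes absent at training time cannot shape the learned space.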
Pages: 2828-2840
Page count: 13
Related papers (50 in total)
  • [1] Joint Visual and Semantic Optimization for zero-shot learning
    Wu, Hanrui
    Yan, Yuguang
    Chen, Sentao
    Huang, Xiangkang
    Wu, Qingyao
    Ng, Michael K.
    KNOWLEDGE-BASED SYSTEMS, 2021, 215 (215)
  • [2] Learning discriminative visual semantic embedding for zero-shot recognition
    Xie, Yurui
    Song, Tiecheng
    Yuan, Jianying
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 115
  • [3] Learning semantic consistency for audio-visual zero-shot learning
    Li, Xiaoyong
    Yang, Jing
    Chen, Yuling
    Zhang, Wei
    Ruan, Xiaoli
    Li, Chengjiang
    Su, Zhidong
    ARTIFICIAL INTELLIGENCE REVIEW, 58 (7)
  • [4] Transductive Visual-Semantic Embedding for Zero-shot Learning
    Xu, Xing
    Shen, Fumin
    Yang, Yang
    Shao, Jie
    Huang, Zi
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, : 41 - 49
  • [5] Semantics Disentangling for Generalized Zero-Shot Learning
    Chen, Zhi
    Luo, Yadan
    Qiu, Ruihong
    Wang, Sen
    Huang, Zi
    Li, Jingjing
    Zhang, Zheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8692 - 8700
  • [6] Semantic Autoencoder for Zero-Shot Learning
    Kodirov, Elyor
    Xiang, Tao
    Gong, Shaogang
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4447 - 4456
  • [7] Learning semantic ambiguities for zero-shot learning
    Hanouti, Celina
    Le Borgne, Herve
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40745 - 40759
  • [8] Superclass-aware visual feature disentangling for generalized zero-shot learning
    Niu, Chang
    Shang, Junyuan
    Zhou, Zhiheng
    Yang, Junmei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258
  • [9] SVDML: Semantic and Visual Space Deep Mutual Learning for Zero-Shot Learning
    Lu, Nannan
    Luo, Yi
    Qiu, Mingkai
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433 : 383 - 395