Semi-supervised classification-aware cross-modal deep adversarial data augmentation

Citations: 8
Authors
Wang, Shaoqiang [1 ]
Wu, Zhenzhen [2 ]
He, Gewen [3 ]
Wang, Shudong [1 ]
Sun, Hongwei [2 ]
Fan, Fangfang [4 ]
Affiliations
[1] China Univ Petr, Sch Comp & Commun Engn, Qingdao 266000, Peoples R China
[2] Weifang Univ Sci & Technol, Shandong Prov Univ Lab Protected Hort, Weifang 262700, Peoples R China
[3] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[4] Harvard Univ, Harvard Med Sch, Cambridge, MA 02215 USA
Keywords
Adversarial network; Data augmentation; Density estimation; Graph representation; Semi-supervised learning
DOI
10.1016/j.future.2021.05.029
Chinese Library Classification: TP301 [Theory and methods]
Discipline code: 081202
Abstract
Deep neural networks are often data-starved in real-world applications, and manual annotation can be costly; audio emotion recognition is one such case. In contrast, continued research in image-based facial expression recognition (IFER) provides a rich supply of publicly available labeled IFER datasets. Exploiting the inherent correlations between the two modalities, using images to support audio emotion recognition with limited labeled data is a meaningful and challenging task. This paper proposes a system that facilitates knowledge transfer from the labeled visual domain to the heterogeneous, sparsely labeled audio domain by learning a joint distribution over examples in the two modalities, so that the system can map an IFER example to a corresponding audio spectrogram. Next, our work reformulates audio emotion classification as the (K+1)-class discriminator of a GAN-based semi-supervised learning framework. Good semi-supervised learning requires that the generator does NOT sample from a distribution that closely matches the true data distribution. Therefore, we require that generated examples come from low-density areas of the marginal distribution in the audio spectrogram modality. Concretely, the proposed model translates image samples, class by class, into audio spectrograms. To steer decoded samples toward sparsely populated regions and construct a tighter decision boundary, we propose a method that estimates density in feature space and incorporates low-density samples via an annealing scheme. Our method requires the network to discriminate low-density data points from high-density data points throughout classification, and we show that this technique effectively improves task performance. (C) 2021 Published by Elsevier B.V.
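The (K+1)-class reformulation described above can be sketched as three discriminator loss terms: labeled real spectrograms receive standard cross-entropy on their emotion class (0..K-1), unlabeled real spectrograms should fall into any of the K real classes, and generated (low-density) samples should be assigned to the extra class K. This is a minimal numpy sketch under those assumptions; the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def k_plus_one_losses(logits_labeled, labels, logits_unlabeled, logits_fake, K):
    """Loss terms of a (K+1)-class semi-supervised GAN discriminator.

    Classes 0..K-1 are the real emotion classes; class K marks generated samples.
    All logits have shape (batch, K+1).
    """
    # Labeled real data: cross-entropy on the true emotion class.
    lp = log_softmax(logits_labeled)
    loss_labeled = -lp[np.arange(len(labels)), labels].mean()

    # Unlabeled real data: maximize log sum_{k<K} p(k | x),
    # i.e. the sample belongs to SOME real class.
    lpu = log_softmax(logits_unlabeled)
    loss_unlabeled = -np.log(np.exp(lpu[:, :K]).sum(axis=1)).mean()

    # Generated samples (drawn from low-density regions): class K.
    lpf = log_softmax(logits_fake)
    loss_fake = -lpf[:, K].mean()

    return loss_labeled, loss_unlabeled, loss_fake
```

In training, the discriminator would minimize the sum of the three terms; the low-density annealing scheme from the paper would then weight `loss_fake` over time, a detail omitted here.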
Pages: 194-205
Page count: 12
Related papers
50 records
  • [21] Semi-supervised cross-modal hashing via modality-specific and cross-modal graph convolutional networks
    Wu, Fei
    Li, Shuaishuai
    Gao, Guangwei
    Ji, Yimu
    Jing, Xiao-Yuan
    Wan, Zhiguo
    PATTERN RECOGNITION, 2023, 136
  • [22] Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
    Liao, Lei
    Yang, Meng
    Zhang, Bob
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 920 - 934
  • [23] Semi-supervised semantic factorization hashing for fast cross-modal retrieval
    Wang, Jiale
    Li, Guohui
    Pan, Peng
    Zhao, Xiaosong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (19) : 20197 - 20215
  • [24] Semantic Consistency Cross-Modal Retrieval With Semi-Supervised Graph Regularization
    Xu, Gongwen
    Li, Xiaomei
    Zhang, Zhijun
    IEEE ACCESS, 2020, 8 : 14278 - 14288
  • [25] Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval
    Zhang, Liang
    Ma, Bingpeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (01) : 128 - 141
  • [26] Semi-supervised constrained graph convolutional network for cross-modal retrieval
    Zhang, Lei
    Chen, Leiting
    Ou, Weihua
    Zhou, Chuan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [28] Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
    Liang, Jingjun
    Li, Ruichen
    Jin, Qin
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2852 - 2861
  • [29] Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval
    Zhang, Xi
    Lai, Hanjiang
    Feng, Jiashi
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 614 - 629
  • [30] Augmentation Learning for Semi-Supervised Classification
    Frommknecht, Tim
    Zipf, Pedro Alves
    Fan, Quanfu
    Shvetsova, Nina
    Kuehne, Hilde
    PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 85 - 98