Semi-supervised classification-aware cross-modal deep adversarial data augmentation

Cited by: 8
Authors
Wang, Shaoqiang [1]
Wu, Zhenzhen [2]
He, Gewen [3]
Wang, Shudong [1]
Sun, Hongwei [2]
Fan, Fangfang [4]
Affiliations
[1] China Univ Petr, Sch Comp & Commun Engn, Qingdao 266000, Peoples R China
[2] Weifang Univ Sci & Technol, Shandong Prov Univ Lab Protected Hort, Weifang 262700, Peoples R China
[3] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[4] Harvard Univ, Harvard Med Sch, Cambridge, MA 02215 USA
Keywords
Adversarial network; Data augmentation; Density estimation; Graph representation; Semi-supervised learning
DOI
10.1016/j.future.2021.05.029
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Deep neural networks are usually data-starved in real-world applications, while manual annotation can be costly, for example in audio emotion recognition. In contrast, continued research in image-based facial expression recognition (IFER) has produced a rich source of publicly available labeled IFER datasets. Using images to support audio emotion recognition with limited labeled data, by exploiting the inherent correlations between the two modalities, is a meaningful and challenging task. This paper proposes a system that facilitates knowledge transfer from the labeled visual domain to the heterogeneous labeled audio domain by learning a joint distribution of examples in different modalities; the system can then map an IFER example to a corresponding audio spectrogram. Next, our work reformulates audio emotion classification as the K+1 class discriminator of GAN-based semi-supervised learning. Good semi-supervised learning requires that the generator does NOT sample from a distribution that closely matches the true data distribution. Therefore, we require that the generated examples come from the low-density areas of the marginal distribution in the audio spectrogram modality. Concretely, the proposed model translates image samples to audio class-wise, in the form of spectrograms. To harness the decoded samples in sparsely distributed areas and construct a tighter decision boundary, we give a solution that precisely estimates the density in feature space and incorporates low-density samples with an annealing scheme. Our method requires the network to discriminate low-density data points from high-density data points throughout classification, and we show that this technique effectively improves task performance. © 2021 Published by Elsevier B.V.
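The K+1 class reformulation mentioned in the abstract can be illustrated with the discriminator objective commonly used in GAN-based semi-supervised learning. The following is a minimal NumPy sketch under stated assumptions, not the paper's implementation: this record does not give the exact loss, so the three-term form below (supervised, unlabeled-real, generated) is the standard formulation, and the function names `logsumexp` and `k_plus_one_loss` are hypothetical. The cross-modal translation, density estimation, and annealing scheme are not modeled here.

```python
import numpy as np


def logsumexp(x):
    """Numerically stable log(sum(exp(x))) along the class axis."""
    m = x.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True)))[:, 0]


def k_plus_one_loss(labeled_logits, labels, unlabeled_logits, generated_logits):
    """Discriminator loss for a (K+1)-class semi-supervised GAN (assumed form).

    Every logits array has shape (n, K+1); the last column is the extra
    "generated" class, the first K columns are the real emotion classes.
    """
    fake = labeled_logits.shape[1] - 1

    # Supervised term: cross-entropy of labeled real examples over K+1 classes.
    log_probs = labeled_logits - logsumexp(labeled_logits)[:, None]
    supervised = -log_probs[np.arange(len(labels)), labels].mean()

    # Unlabeled real examples should land in any of the K real classes:
    # log(1 - p(fake)) = logsumexp(real logits) - logsumexp(all logits).
    log_p_real = logsumexp(unlabeled_logits[:, :fake]) - logsumexp(unlabeled_logits)
    unlabeled = -log_p_real.mean()

    # Generated examples should be assigned to the extra class K.
    log_p_fake = generated_logits[:, fake] - logsumexp(generated_logits)
    generated = -log_p_fake.mean()

    return supervised + unlabeled + generated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 4  # number of real classes in this toy example
    loss = k_plus_one_loss(
        rng.normal(size=(8, k + 1)),
        rng.integers(0, k, size=8),
        rng.normal(size=(16, k + 1)),
        rng.normal(size=(16, k + 1)),
    )
    print(f"toy discriminator loss: {loss:.4f}")
```

In this formulation the generated class acts as a reject option, which is why generated samples lying in low-density regions of the real data can help tighten the decision boundary between the K real classes.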
Pages: 194-205
Page count: 12