Semi-supervised classification-aware cross-modal deep adversarial data augmentation

Citations: 8
Authors
Wang, Shaoqiang [1 ]
Wu, Zhenzhen [2 ]
He, Gewen [3 ]
Wang, Shudong [1 ]
Sun, Hongwei [2 ]
Fan, Fangfang [4 ]
Affiliations
[1] China Univ Petr, Sch Comp & Commun Engn, Qingdao 266000, Peoples R China
[2] Weifang Univ Sci & Technol, Shandong Prov Univ Lab Protected Hort, Weifang 262700, Peoples R China
[3] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA
[4] Harvard Univ, Harvard Med Sch, Cambridge, MA 02215 USA
Keywords
Adversarial network; Data augmentation; Density estimation; Graph representation; Semi-supervised learning
DOI
10.1016/j.future.2021.05.029
Chinese Library Classification: TP301 [Theory and methods]
Discipline code: 081202
Abstract
Deep neural networks are often data-starved in real-world applications, and manual annotation can be costly; audio emotion recognition is one such case. In contrast, continued research in image-based facial expression recognition (IFER) provides a rich supply of publicly available labeled IFER datasets. Exploiting the inherent correlations between the two modalities, using images to support audio emotion recognition with limited labeled data is a meaningful and challenging task. This paper proposes a system that facilitates knowledge transfer from the labeled visual domain to the heterogeneous, sparsely labeled audio domain by learning a joint distribution over examples in the two modalities, so that the system can map an IFER example to a corresponding audio spectrogram. Next, our work reformulates audio emotion classification as the (K+1)-class discriminator of a GAN-based semi-supervised learning framework. Good semi-supervised learning requires that the generator does NOT sample from a distribution that closely matches the true data distribution. Therefore, we require that generated examples come from low-density areas of the marginal distribution in the audio spectrogram modality. Concretely, the proposed model translates image samples, class by class, into audio spectrograms. To steer decoded samples toward sparsely populated regions and construct a tighter decision boundary, we propose a method that estimates density in feature space and incorporates low-density samples via an annealing scheme. Our method requires the network to discriminate low-density data points from high-density data points throughout classification, and we show that this technique effectively improves task performance. (C) 2021 Published by Elsevier B.V.
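The (K+1)-class reformulation described above can be sketched as three discriminator loss terms: labeled real spectrograms receive standard cross-entropy on their emotion class (0..K-1), unlabeled real spectrograms should fall into any of the K real classes, and generated (low-density) samples should be assigned to the extra class K. This is a minimal numpy sketch under those assumptions; the function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def k_plus_one_losses(logits_labeled, labels, logits_unlabeled, logits_fake, K):
    """Loss terms of a (K+1)-class semi-supervised GAN discriminator.

    Classes 0..K-1 are the real emotion classes; class K marks generated samples.
    All logits have shape (batch, K+1).
    """
    # Labeled real data: cross-entropy on the true emotion class.
    lp = log_softmax(logits_labeled)
    loss_labeled = -lp[np.arange(len(labels)), labels].mean()

    # Unlabeled real data: maximize log sum_{k<K} p(k | x),
    # i.e. the sample belongs to SOME real class.
    lpu = log_softmax(logits_unlabeled)
    loss_unlabeled = -np.log(np.exp(lpu[:, :K]).sum(axis=1)).mean()

    # Generated samples (drawn from low-density regions): class K.
    lpf = log_softmax(logits_fake)
    loss_fake = -lpf[:, K].mean()

    return loss_labeled, loss_unlabeled, loss_fake
```

In training, the discriminator would minimize the sum of the three terms; the low-density annealing scheme from the paper would then weight `loss_fake` over time, a detail omitted here.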
Pages: 194-205
Page count: 12
Related papers
50 records
  • [21] Semi-supervised cross-modal hashing via modality-specific and cross-modal graph convolutional networks
    Wu, Fei
    Li, Shuaishuai
    Gao, Guangwei
    Ji, Yimu
    Jing, Xiao-Yuan
    Wan, Zhiguo
    PATTERN RECOGNITION, 2023, 136
  • [22] Deep Supervised Dual Cycle Adversarial Network for Cross-Modal Retrieval
    Liao, Lei
    Yang, Meng
    Zhang, Bob
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 920 - 934
  • [23] Semi-supervised semantic factorization hashing for fast cross-modal retrieval
    Wang, Jiale
    Li, Guohui
    Pan, Peng
    Zhao, Xiaosong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (19) : 20197 - 20215
  • [24] Semantic Consistency Cross-Modal Retrieval With Semi-Supervised Graph Regularization
    Xu, Gongwen
    Li, Xiaomei
    Zhang, Zhijun
    IEEE ACCESS, 2020, 8 : 14278 - 14288
  • [25] Generalized Semi-supervised and Structured Subspace Learning for Cross-Modal Retrieval
    Zhang, Liang
    Ma, Bingpeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (01) : 128 - 141
  • [26] Semi-supervised constrained graph convolutional network for cross-modal retrieval
    Zhang, Lei
    Chen, Leiting
    Ou, Weihua
    Zhou, Chuan
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101
  • [28] Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
    Liang, Jingjun
    Li, Ruichen
    Jin, Qin
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2852 - 2861
  • [29] Attention-Aware Deep Adversarial Hashing for Cross-Modal Retrieval
    Zhang, Xi
    Lai, Hanjiang
    Feng, Jiashi
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 614 - 629
  • [30] Augmentation Learning for Semi-Supervised Classification
    Frommknecht, Tim
    Zipf, Pedro Alves
    Fan, Quanfu
    Shvetsova, Nina
    Kuehne, Hilde
    PATTERN RECOGNITION, DAGM GCPR 2022, 2022, 13485 : 85 - 98