Semi-supervised classification-aware cross-modal deep adversarial data augmentation

Times cited: 8

Authors:

Wang, Shaoqiang [1]

Wu, Zhenzhen [2]

He, Gewen [3]

Wang, Shudong [1]

Sun, Hongwei [2]

Fan, Fangfang [4]

Affiliations:

[1] China Univ Petr, Sch Comp & Commun Engn, Qingdao 266000, Peoples R China

[2] Weifang Univ Sci & Technol, Shandong Prov Univ Lab Protected Hort, Weifang 262700, Peoples R China

[3] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA

[4] Harvard Univ, Harvard Med Sch, Cambridge, MA 02215 USA

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2021 / Vol. 125

Keywords:

Adversarial network; Data augmentation; Density estimation; Graph representation; Semi-supervised learning

DOI:

10.1016/j.future.2021.05.029

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline code:

081202

Abstract:

Deep neural networks are usually data-starved in real-world applications, where manual annotation can be costly, as in audio emotion recognition. In contrast, continued research on image-based facial expression recognition (IFER) provides a rich source of publicly available labeled IFER datasets. Using images to support audio emotion recognition with limited labeled data, based on the inherent correlations between the two modalities, is a meaningful and challenging task. This paper proposes a system that facilitates knowledge transfer from the labeled visual domain to the heterogeneous labeled audio domain by learning a joint distribution of examples across modalities, so that the system can map an IFER example to a corresponding audio spectrogram. Next, our work reformulates audio emotion classification as the (K+1)-class discriminator of GAN-based semi-supervised learning. Good semi-supervised learning requires a generator that does NOT sample from a distribution closely matching the true data distribution. Therefore, we require the generated examples to come from the low-density regions of the marginal distribution in the audio-spectrogram modality. Concretely, the proposed model translates image samples into audio, class by class, in the form of spectrograms. To exploit decoded samples in sparsely populated regions and construct a tighter decision boundary, we propose a method to estimate density in feature space precisely and to incorporate low-density samples through an annealing scheme. Our method requires the network to discriminate low-density data points from high-density ones throughout classification, and we show that this technique effectively improves task performance. (C) 2021 Published by Elsevier B.V.
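The abstract names two mechanisms without giving their formulas: a (K+1)-class discriminator for GAN-based semi-supervised learning, and the selection of low-density generated samples under an annealing schedule. The sketch below illustrates both under stated assumptions. It uses the generic (K+1)-class semi-supervised GAN discriminator loss, a Gaussian kernel density estimate as a stand-in for the paper's density estimator, and a linear schedule as a stand-in for its annealing scheme. All function names and parameters (`bandwidth`, `final_frac`, `total_steps`) are hypothetical and are not the authors' implementation.

```python
import math

# Hypothetical sketch; not the authors' code. Pure standard library.

def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - lse for v in row]

def k_plus_one_loss(logits_lab, labels, logits_unl, logits_fake):
    """Generic discriminator loss for a (K+1)-class semi-supervised GAN.

    Each row holds K+1 logits: indices 0..K-1 are the real (emotion) classes,
    and the last index is the extra 'generated' class.
    """
    # Supervised term: cross-entropy on labeled real examples.
    sup = -sum(log_softmax(r)[y] for r, y in zip(logits_lab, labels)) / len(labels)
    # Unlabeled real examples should NOT be assigned to the generated class.
    unl = -sum(math.log(max(1.0 - math.exp(log_softmax(r)[-1]), 1e-12))
               for r in logits_unl) / len(logits_unl)
    # Generated examples SHOULD be assigned to the generated class.
    fake = -sum(log_softmax(r)[-1] for r in logits_fake) / len(logits_fake)
    return sup + unl + fake

def kde_log_density(point, reference, bandwidth=1.0):
    """Gaussian KDE log-density of `point` w.r.t. `reference` features.

    The normalizing constant is omitted, which is fine for ranking samples.
    """
    a = [-sum((p - q) ** 2 for p, q in zip(point, ref)) / (2 * bandwidth ** 2)
         for ref in reference]
    m = max(a)
    return m + math.log(sum(math.exp(v - m) for v in a) / len(a))

def keep_fraction(step, total_steps, final_frac=0.3):
    """Assumed linear annealing: keep every generated sample at step 0,
    then shrink to the `final_frac` lowest-density ones by `total_steps`."""
    t = min(step / total_steps, 1.0)
    return 1.0 - t * (1.0 - final_frac)

def select_low_density(fake_feats, real_feats, step, total_steps,
                       bandwidth=1.0, final_frac=0.3):
    """Indices of generated samples kept at this step, lowest density first."""
    dens = [kde_log_density(f, real_feats, bandwidth) for f in fake_feats]
    frac = keep_fraction(step, total_steps, final_frac)
    k = max(1, round(frac * len(fake_feats)))
    return sorted(range(len(fake_feats)), key=lambda i: dens[i])[:k]

if __name__ == "__main__":
    real = [[0.0, 0.0]] * 5                      # toy real features at the origin
    fake = [[float(i), 0.0] for i in range(10)]  # generated features moving away
    print(select_low_density(fake, real, step=100, total_steps=100))
```

In a training loop, one would presumably pass only the selected generated samples to the fake term of `k_plus_one_loss`. The abstract does not say whether the paper does exactly this, nor which density estimator or schedule it uses. A kernel density estimate is chosen here only because it is simple and self-contained.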


Pages: 194 - 205

Page count: 12

Related records (50 in total):

[1] Classification-aware Semi-supervised Domain Adaptation
He, Gewen
Liu, Xiaofeng
Fan, Fangfang
You, Jane
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4147 - 4156
[2] Semi-supervised Deep Quantization for Cross-modal Search
Wang, Xin
Zhu, Wenwu
Liu, Chenghao
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1730 - 1739
[3] Semi-supervised cross-modal image generation with generative adversarial networks
Li, Dan
Du, Changde
He, Huiguang
PATTERN RECOGNITION, 2020, 100
[4] X-ModalNet: A semi-supervised deep cross-modal network for classification of remote sensing data
Hong, Danfeng
Yokoya, Naoto
Xia, Gui-Song
Chanussot, Jocelyn
Zhu, Xiao Xiang
ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2020, 167 : 12 - 23
[5] A semi-supervised cross-modal memory bank for cross-modal retrieval
Huang, Yingying
Hu, Bingliang
Zhang, Yipeng
Gao, Chi
Wang, Quan
NEUROCOMPUTING, 2024, 579
[6] SCH-GAN: Semi-Supervised Cross-Modal Hashing by Generative Adversarial Network
Zhang, Jian
Peng, Yuxin
Yuan, Mingkuan
IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (02) : 489 - 502
[7] Semi-Supervised Semi-Paired Cross-Modal Hashing
Zhang, Xuening
Liu, Xingbo
Nie, Xiushan
Kang, Xiao
Yin, Yilong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 6517 - 6529
[8] Semi-Supervised Cross-Modal Retrieval With Label Prediction
Mandal, Devraj
Rao, Pramod
Biswas, Soma
IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09) : 2345 - 2353
[9] Semi-Supervised Knowledge Distillation for Cross-Modal Hashing
Su, Mingyue
Gu, Guanghua
Ren, Xianlong
Fu, Hao
Zhao, Yao
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 662 - 675
[10] Enhancing Semi-Supervised Learning with Cross-Modal Knowledge
Zhu, Hui
Lu, Yongchun
Wang, Hongbin
Zhou, Xunyi
Ma, Qin
Liu, Yanhong
Jiang, Ning
Wei, Xin
Zeng, Linchengxi
Zhao, Xiaofang
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4456 - 4465
