Semi-supervised classification-aware cross-modal deep adversarial data augmentation

Cited by: 8

Authors:

Wang, Shaoqiang [1]

Wu, Zhenzhen [2]

He, Gewen [3]

Wang, Shudong [1]

Sun, Hongwei [2]

Fan, Fangfang [4]

Affiliations:

[1] China Univ Petr, Sch Comp & Commun Engn, Qingdao 266000, Peoples R China

[2] Weifang Univ Sci & Technol, Shandong Prov Univ Lab Protected Hort, Weifang 262700, Peoples R China

[3] Florida State Univ, Dept Comp Sci, Tallahassee, FL 32306 USA

[4] Harvard Univ, Harvard Med Sch, Cambridge, MA 02215 USA

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2021 / Vol. 125

Keywords:

Adversarial network; Data augmentation; Density estimation; Graph representation; Semi-supervised learning

DOI:

10.1016/j.future.2021.05.029

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Deep neural networks are usually data-starved in real-world applications, and manual annotation can be costly, as in emotion recognition from audio. In contrast, continued research in image-based facial expression recognition (IFER) has produced a rich supply of publicly available labeled IFER datasets. Using images to support audio emotion recognition with limited labeled data, by exploiting the inherent correlations between the two modalities, is a meaningful and challenging task. This paper proposes a system that facilitates knowledge transfer from the labeled visual domain to the heterogeneous labeled audio domain by learning a joint distribution of examples in different modalities, so that the system can map an IFER example to a corresponding audio spectrogram. Next, our work reformulates audio emotion classification as the K+1-class discriminator of GAN-based semi-supervised learning. Good semi-supervised learning requires that the generator does not sample from a distribution that closely matches the true data distribution. Therefore, we require that the generated examples come from the low-density areas of the marginal distribution in the audio spectrogram modality. Concretely, the proposed model translates image samples into audio, class by class, in the form of spectrograms. To exploit the decoded samples in sparsely populated areas and construct a tighter decision boundary, we give a method to precisely estimate the density in feature space and incorporate low-density samples with an annealing scheme. Our method requires the network to discriminate low-density data points from high-density data points throughout classification, and we show that this technique effectively improves task performance. © 2021 Published by Elsevier B.V.
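
Illustrative note: the abstract does not give the loss itself, so the following is only a minimal sketch of the commonly used K+1-class discriminator objective for GAN-based semi-supervised learning, not the authors' implementation. It assumes each example is scored with K+1 logits, where the last index stands for the "generated" class; the function names `logsumexp` and `discriminator_loss` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def discriminator_loss(labeled, unlabeled, generated):
    """Sketch of a K+1-class semi-supervised discriminator loss.

    labeled:   list of (logits, label) pairs, label in 0..K-1
    unlabeled: list of logits for real but unlabeled examples
    generated: list of logits for generator outputs
    Every logits list has K+1 entries; index K is the 'generated' class.
    """
    # Supervised term: cross-entropy on the true class.
    sup = -sum(z[y] - logsumexp(z) for z, y in labeled) / len(labeled)
    # Real unlabeled data should fall in one of the K real classes:
    # log(1 - p_generated) = logsumexp(real logits) - logsumexp(all logits).
    real = -sum(logsumexp(z[:-1]) - logsumexp(z) for z in unlabeled) / len(unlabeled)
    # Generated data should be assigned to the extra class K.
    fake = -sum(z[-1] - logsumexp(z) for z in generated) / len(generated)
    return sup + real + fake


# Toy usage with K = 2 emotion classes (values are made up for illustration).
uniform = discriminator_loss(
    labeled=[([0.0, 0.0, 0.0], 0)],
    unlabeled=[[0.0, 0.0, 0.0]],
    generated=[[0.0, 0.0, 0.0]],
)
confident = discriminator_loss(
    labeled=[([5.0, 0.0, 0.0], 0)],
    unlabeled=[[0.0, 5.0, 0.0]],
    generated=[[0.0, 0.0, 5.0]],
)
```

In this toy setup the loss drops when the discriminator is confident and correct on all three groups, which is the behavior the K+1 formulation relies on. The low-density sampling and annealing scheme described in the abstract are not modeled here.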

Pages: 194-205

Number of pages: 12

50 items in total

[31] Supervised cross-modal factor analysis for multiple modal data classification
Wang, Jingbin
Zhou, Yihua
Duan, Kanghong
Wang, Jim Jing-Yan
Bensmail, Halima
2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS, 2015, : 1882 - 1888
[32] Data Augmentation for Graph Convolutional Network on Semi-supervised Classification
Tang, Zhengzheng
Qiao, Ziyue
Hong, Xuehai
Wang, Yang
Dharejo, Fayaz Ali
Zhou, Yuanchun
Du, Yi
WEB AND BIG DATA, APWEB-WAIM 2021, PT II, 2021, 12859 : 33 - 48
[33] GSDA: Generative adversarial network-based semi-supervised data augmentation for ultrasound image classification
Liu, Zhaoshan
Lv, Qiujie
Lee, Chau Hung
Shen, Lei
HELIYON, 2023, 9 (09)
[34] Deep Supervised Cross-modal Retrieval
Zhen, Liangli
Hu, Peng
Wang, Xu
Peng, Dezhong
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10386 - 10395
[35] SEMI-SUPERVISED SEMANTIC-PRESERVING HASHING FOR EFFICIENT CROSS-MODAL RETRIEVAL
Wang, Xingzhi
Liu, Xin
Hu, Zhikai
Wang, Nannan
Fan, Wentao
Du, Ji-Xiang
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1006 - 1011
[36] Semi-supervised cross-modal hashing with multi-view graph representation
Shen, Xiao
Zhang, Haofeng
Li, Lunbo
Yang, Wankou
Liu, Li
INFORMATION SCIENCES, 2022, 604 : 45 - 60
[37] Semi-supervised Cross-Modal Hashing Based on Label Prediction and Distance Preserving
Zhang, Xu
Tian, Xin
Yang, Bing
Zhang, Zuyu
Li, Yan
2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 324 - 330
[38] Semi-supervised Prototype Semantic Association Learning for Robust Cross-modal Retrieval
Wang, Junsheng
Gong, Tiantian
Yan, Yan
PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 872 - 881
[39] Semi-Supervised Deep Adversarial Forest for Cross-Environment Localization
Cui, Wei
Zhang, Le
Li, Bing
Chen, Zhenghua
Wu, Min
Li, Xiaoli
Kang, Jiawen
IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2022, 71 (09) : 10215 - 10219
[40] Multi-Level Cross-Modal Interactive-Network-Based Semi-Supervised Multi-Modal Ship Classification
The School of Software Technology, Dalian University of Technology, Dalian 116621, China
Sensors, 2024, 22