Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data

Cited by: 3
Authors
Haque, Kazi Nazmul [1]
Rana, Rajib [1]
Liu, Jiajun [2]
Hansen, John H. L. [3]
Cummins, Nicholas [4]
Busso, Carlos [3]
Schuller, Bjorn W. [5, 6]
Affiliations
[1] Univ So Queensland, Toowoomba, Qld 4350, Australia
[2] CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[3] Univ Texas Dallas, Richardson, TX 75080 USA
[4] Kings Coll London, London WC2R 2LS, England
[5] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England
[6] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Generators; Generative adversarial networks; Spectrogram; Data models; Training; Task analysis; Speech processing; Audio Generation; Disentangled Representation Learning; Guided Representation Learning; Generative Adversarial Neural Network; SPEECH
DOI
10.1109/TASLP.2021.3098764
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification code
070206; 082403
Abstract
The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features, which limits their use for other audio, such as instruments, and even for speech where transcripts are limited. This paper proposes a novel GAN-based model, named Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Command Dataset (S09)] and a non-speech dataset [Musical Instrument Sound dataset (NSynth)] demonstrate that, using only 5% of labelled data as guidance, GGAN learns significantly better representations than the state-of-the-art models.
Pages: 2575-2590
Page count: 16
Related papers
50 in total
  • [31] Joint Representation Learning with Generative Adversarial Imputation Network for Improved Classification of Longitudinal Data
    Sharon Torao Pingi
    Duoyi Zhang
    Md Abul Bashar
    Richi Nayak
    Data Science and Engineering, 2024, 9 : 5 - 25
  • [32] Stochastic Restoration of Heavily Compressed Musical Audio Using Generative Adversarial Networks
    Lattner, Stefan
    Nistal, Javier
    ELECTRONICS, 2021, 10 (11)
  • [33] Compressed Domain Invariant Adversarial Representation Learning for Robust Audio Deepfake Detection
    Yuan, Chengsheng
    Chen, Yifei
    Zhou, Zhili
    Xia, Zhihua
    Huang, Yongfeng
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 1111 - 1115
  • [34] Towards a Perceptual Loss: Using a Neural Network Codec Approximation as a Loss for Generative Audio Models
    Ananthabhotla, Ishwarya
    Ewert, Sebastian
    Paradiso, Joseph A.
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1518 - 1525
  • [35] Phase-Aware Audio Super-resolution for Music Signals Using Wasserstein Generative Adversarial Network
    Yan, Yanqiao
    Binh Thien Nguyen
    Geng, Yuting
    Iwai, Kenta
    Nishiura, Takanobu
    PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2022, : 1673 - 1677
  • [36] Recognition of Audio Depression Based on Convolutional Neural Network and Generative Antagonism Network Model
    Wang, Zhiyong
    Chen, Longxi
    Wang, Lifeng
    Diao, Guangqiang
    IEEE ACCESS, 2020, 8 : 101181 - 101191
  • [37] TAXOGAN: Hierarchical Network Representation Learning via Taxonomy Guided Generative Adversarial Networks (Extended Abstract)
    Yang, Carl
    Zhang, Jieyu
    Han, Jiawei
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4859 - 4863
  • [38] Massive Data Generation for Deep Learning-Aided Wireless Systems Using Meta Learning and Generative Adversarial Network
    Kim, Jinhong
    Ahn, Yongjun
    Shim, Byonghyo
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (01) : 1302 - 1306
  • [39] Synthetic lung ultrasound data generation using autoencoder with generative adversarial network
    Fatima, Noreen
    Inchingolo, Riccardo
    Smargiassi, Andrea
    Soldati, Gino
    Torri, Elena
    Perrone, Tiziano
    Demi, Libertario
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
  • [40] Anomalous Sound Detection Using Deep Audio Representation and a BLSTM Network for Audio Surveillance of Roads
    Li, Yanxiong
    Li, Xianku
    Zhang, Yuhan
    Liu, Mingle
    Wang, Wucheng
    IEEE ACCESS, 2018, 6 : 58043 - 58055