Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data

Cited by: 3
Authors
Haque, Kazi Nazmul [1]
Rana, Rajib [1]
Liu, Jiajun [2]
Hansen, John H. L. [3]
Cummins, Nicholas [4]
Busso, Carlos [3]
Schuller, Bjorn W. [5, 6]
Affiliations
[1] Univ So Queensland, Toowoomba, Qld 4350, Australia
[2] CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[3] Univ Texas Dallas, Richardson, TX 75080 USA
[4] Kings Coll London, London WC2R 2LS, England
[5] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England
[6] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Generators; Generative adversarial networks; Spectrogram; Data models; Training; Task analysis; Speech processing; Audio generation; Disentangled representation learning; Guided representation learning; Generative adversarial neural network; Speech
DOI
10.1109/TASLP.2021.3098764
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features, which limits their use for other audio, such as instruments, and even for speech where transcripts are limited. This paper proposes a novel GAN-based model, named Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Command Dataset (S09)] and a non-speech dataset [Musical Instrument Sound dataset (NSynth)] demonstrate that, using only 5% of labelled data as guidance, GGAN learns significantly better representations than the state-of-the-art models.
Pages: 2575-2590
Page count: 16
Related papers
50 records in total
  • [1] High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
    Haque, Kazi Nazmul
    Rana, Rajib
    Schuller, Bjorn W.
    Haque, Kazi Nazmul (shezan.huq@gmail.com), 1600, Institute of Electrical and Electronics Engineers Inc. (08): 223509 - 223528
  • [2] High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder
    Haque, Kazi Nazmul
    Rana, Rajib
    Schuller, Bjorn W.
    IEEE ACCESS, 2020, 8 : 223509 - 223528
  • [3] NEURAL AUDIO DECORRELATION USING GENERATIVE ADVERSARIAL NETWORKS
    Anemuller, Carlotta
    Thiergart, Oliver
    Habets, Emanuel A. P.
    2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA, 2023
  • [4] Anomaly Detection of Deepfake Audio Based on Real Audio Using Generative Adversarial Network Model
    Song, Daeun
    Lee, Nayoung
    Kim, Jiwon
    Choi, Eunjung
    IEEE ACCESS, 2024, 12 : 184311 - 184326
  • [5] Deep neural network representation and Generative Adversarial Learning
    Ruiz-Garcia, Ariel
    Schmidhuber, Jurgen
    Palade, Vasile
    Took, Clive Cheong
    Mandic, Danilo
    NEURAL NETWORKS, 2021, 139 : 199 - 200
  • [6] TOWARDS AUDIO TO SCENE IMAGE SYNTHESIS USING GENERATIVE ADVERSARIAL NETWORK
    Wan, Chia-Hung
    Chuang, Shun-Po
    Lee, Hung-Yi
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 496 - 500
  • [7] Multi-channel neural audio decorrelation using generative adversarial networks
    Anemueller, Carlotta
    Thiergart, Oliver
    Habets, Emanuel A. P.
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2024, 2024 (01)
  • [8] Generative Adversarial Network-Based Neural Audio Caption Model for Oral Evaluation
    Zhang, Liu
    Shu, Chao
    Guo, Jin
    Zhang, Hanyi
    Xie, Cheng
    Liu, Qing
    ELECTRONICS, 2020, 9 (03)
  • [9] Anti-forensics of fake stereo audio using generative adversarial network
    Tianyun Liu
    Diqun Yan
    Nan Yan
    Gang Chen
    Multimedia Tools and Applications, 2022, 81 : 17155 - 17167
  • [10] Anti-forensics of fake stereo audio using generative adversarial network
    Liu, Tianyun
    Yan, Diqun
    Yan, Nan
    Chen, Gang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (12) : 17155 - 17167