Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data

Cited by: 3
Authors
Haque, Kazi Nazmul [1]
Rana, Rajib [1]
Liu, Jiajun [2]
Hansen, John H. L. [3]
Cummins, Nicholas [4]
Busso, Carlos [3]
Schuller, Björn W. [5,6]
Affiliations
[1] University of Southern Queensland, Toowoomba, QLD 4350, Australia
[2] CSIRO, Distributed Sensing Systems Group, Pullenvale, QLD 4069, Australia
[3] University of Texas at Dallas, Richardson, TX 75080, USA
[4] King's College London, London WC2R 2LS, England
[5] Imperial College London, Group on Language, Audio & Music, London SW7 2BX, England
[6] University of Augsburg, Chair of Embedded Intelligence for Health Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Generators; Generative adversarial networks; Spectrogram; Data models; Training; Task analysis; Speech processing; Audio generation; Disentangled representation learning; Guided representation learning; Generative adversarial neural network; Speech
DOI
10.1109/TASLP.2021.3098764
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The generative power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data when guided by a small amount of labelled data. We aim to use this generative power to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but these are conditioned on text or acoustic features, which limits their applicability to other audio, such as musical instruments, and even to speech for which transcripts are scarce. This paper proposes a novel GAN-based model, the Guided Generative Adversarial Neural Network (GGAN), which learns powerful representations and generates good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset (the SC09 subset of the Speech Commands dataset) and a non-speech dataset (the NSynth musical instrument sound dataset) demonstrate that, using only 5% of the labelled data as guidance, GGAN learns significantly better representations than state-of-the-art models.
Pages: 2575-2590
Number of pages: 16