Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data

Cited by: 3
Authors
Haque, Kazi Nazmul [1]
Rana, Rajib [1]
Liu, Jiajun [2]
Hansen, John H. L. [3]
Cummins, Nicholas [4]
Busso, Carlos [3]
Schuller, Bjorn W. [5, 6]
Affiliations
[1] Univ So Queensland, Toowoomba, Qld 4350, Australia
[2] CSIRO, Distributed Sensing Syst Grp, Pullenvale, Qld 4069, Australia
[3] Univ Texas Dallas, Richardson, TX 75080 USA
[4] Kings Coll London, London WC2R 2LS, England
[5] Imperial Coll London, Grp Language Audio & Mus, London SW7 2BX, England
[6] Univ Augsburg, Chair Embedded Intelligence Hlth Care & Wellbeing, D-86159 Augsburg, Germany
Keywords
Generators; Generative adversarial networks; Spectrogram; Data models; Training; Task analysis; Speech processing; Audio generation; Disentangled representation learning; Guided representation learning; Generative adversarial neural network; Speech
DOI
10.1109/TASLP.2021.3098764
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The generation power of Generative Adversarial Neural Networks (GANs) has shown great promise for learning representations from unlabelled data while guided by a small amount of labelled data. We aim to utilise the generation power of GANs to learn audio representations. Most existing studies, however, focus on images. Some studies use GANs for speech generation, but they are conditioned on text or acoustic features, which limits their use for other audio, such as instruments, and even for speech where transcripts are limited. This paper proposes a novel GAN-based model, named Guided Generative Adversarial Neural Network (GGAN), which can learn powerful representations and generate good-quality samples using a small amount of labelled data as guidance. Experimental results on a speech dataset [Speech Command Dataset (S09)] and a non-speech dataset [Musical Instrument Sound dataset (NSynth)] demonstrate that, using only 5% of labelled data as guidance, GGAN learns significantly better representations than the state-of-the-art models.
Pages: 2575-2590
Number of pages: 16