WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Cited by: 5
Authors
Hsu, Po-chun [1, 2]
Lee, Hung-yi [1, 2]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
[2] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Source
Keywords
neural vocoder; raw waveform synthesis; text-to-speech
DOI
10.21437/Interspeech.2020-1736
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions on the frequency domains. As we design a flow-based model that is heavily compressed, the proposed model requires much less computational resources compared to other waveform generation models during both training and inference time; even though the model is highly compressed, the post-filter maintains the quality of generated waveform. Our PyTorch implementation can be trained using less than 8 GB GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even if synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveform 1.2 times faster than real-time. Experiments also show that the quality of generated audio is comparable to those of other methods. Audio samples are publicly available online.
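The throughput figures quoted in the abstract can be related to real-time playback with simple arithmetic. The sketch below is illustrative only and is not taken from the authors' implementation: the function names are hypothetical, and it assumes that the 960 kHz GPU generation rate is compared against 44.1 kHz audio, which the abstract does not state explicitly.

```python
def realtime_factor(samples_per_second: float, sample_rate_hz: float) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return samples_per_second / sample_rate_hz


def samples_per_second(rt_factor: float, sample_rate_hz: float) -> float:
    """Generation rate implied by a given real-time factor."""
    return rt_factor * sample_rate_hz


# Figures quoted in the abstract.
GPU_RATE_HZ = 960_000      # "more than 960 kHz" on an NVIDIA 1080Ti
CPU_RT_FACTOR = 1.2        # "1.2 times faster than real-time" on a CPU
AUDIO_RATE_HZ = 44_100     # 44.1 kHz speech (assumed for the GPU case too)

gpu_rtf = realtime_factor(GPU_RATE_HZ, AUDIO_RATE_HZ)
cpu_samples_per_second = samples_per_second(CPU_RT_FACTOR, AUDIO_RATE_HZ)

print(f"GPU real-time factor (assumed 44.1 kHz audio): {gpu_rtf:.1f}x")
print(f"CPU generation rate: {cpu_samples_per_second:.0f} samples/s")
```

Under these assumptions, the GPU figure corresponds to roughly 21.8 times real time, and the CPU figure to about 52,920 samples per second.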
Pages: 210-214
Page count: 5
Related papers
50 in total
  • [31] Minimally Invasive Live Tissue High-Fidelity Thermophysical Modeling Using Real-Time Thermography
    El-Kebir, Hamza
    Ran, Junren
    Lee, Yongseok
    Chamorro, Leonardo P.
    Ostoja-Starzewski, Martin
    Berlin, Richard
    Cornejo, Gabriela M. Aguiluz
    Benedetti, Enrico
    Giulianotti, Pier C.
    Bentsman, Joseph
    IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2023, 70 (06) : 1849 - 1857
  • [32] MULTI-USER REAL-TIME SPEECH RECOGNITION WITH A GPU
    Kim, Jungsuk
    Sung, Wonyong
    2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 1617 - 1620
  • [33] REAL-TIME SPEECH SYNTHESIS SYSTEM
    AINSWORTH, WA
    IEEE TRANSACTIONS ON AUDIO AND ELECTROACOUSTICS, 1972, AU20 (05) : 397 - +
  • [34] Fourier-inspired neural module for real-time and high-fidelity computer-generated holography
    Dong, Zhenxing
    Xu, Chao
    Ling, Yuye
    Li, Yan
    Su, Yikai
    OPTICS LETTERS, 2023, 48 (03) : 759 - 762
  • [35] Detection of structural pulmonary changes with real-time high-fidelity analysis of expiratory CO2
    Sassmann, Teresa
    Pienn, Michael
    Kovacs, Gabor
    Douschan, Philipp
    Foris, Vasile
    John, Nikolaus
    Zeder, Katarina
    Zirlik, Andreas
    Olschewski, Horst
    WIENER KLINISCHE WOCHENSCHRIFT, 2022, 134 (19-20) : 731 - 732
  • [36] Hairpin Structure Facilitates Multiplex High-Fidelity DNA Amplification in Real-Time Polymerase Chain Reaction
    Zhang, Kerou
    Pinto, Alessandro
    Cheng, Lauren Yuxuan
    Song, Ping
    Dai, Peng
    Wang, Michael
    Rodriguez, Luis
    Weller, Cailin
    Zhang, David Yu
    ANALYTICAL CHEMISTRY, 2022, 94 (27) : 9586 - 9594
  • [37] Real-time implementation of the high-fidelity NBI code RABBIT into the discharge control system of ASDEX Upgrade
    Weiland, M.
    Bilato, R.
    Sieglin, B.
    Felici, F.
    Giannone, L.
    Kudlacek, O.
    Rampp, M.
    Scheffer, M.
    Treutterer, W.
    Zehetbauer, T.
    NUCLEAR FUSION, 2023, 63 (06)
  • [38] High-fidelity Database-free Deep Learning Reconstruction for Real-time Cine Cardiac MRI
    Demirel, Omer Burak
    Zhang, Chi
    Yaman, Burhaneddin
    Gulle, Merve
    Shenoy, Chetan
    Leiner, Tim
    Kellman, Peter
    Akcakaya, Mehmet
    2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023,
  • [39] GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
    Yang, Jinhyeok
    Bae, Jae-Sung
    Bak, Taejun
    Kim, Young-Ik
    Cho, Hoon-Young
    INTERSPEECH 2021, 2021, : 2202 - 2206
  • [40] Building Real-Time Speech Recognition Without CMVN
    Nguyen, Thai Son
    Sperber, Matthias
    Stueker, Sebastian
    Waibel, Alex
    SPEECH AND COMPUTER (SPECOM 2018), 2018, 11096 : 451 - 460