WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Cited by: 5
Authors
Hsu, Po-chun [1, 2]
Lee, Hung-yi [1, 2]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
[2] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Source
Keywords
neural vocoder; raw waveform synthesis; text-to-speech
DOI
10.21437/Interspeech.2020-1736
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions in the frequency domain. Because the flow-based model is heavily compressed, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; despite this compression, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB of GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even when synthesizing on a CPU, the proposed method can generate a 44.1 kHz speech waveform 1.2 times faster than real time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online.
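The throughput figures in the abstract can be related to real-time speed with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the GPU figure is measured against 44.1 kHz audio, which the abstract does not state.

```python
def realtime_factor(samples_per_second: float, sample_rate_hz: float) -> float:
    """Seconds of audio generated per second of wall-clock time."""
    return samples_per_second / sample_rate_hz


def required_throughput(speedup: float, sample_rate_hz: float) -> float:
    """Samples per second needed to run `speedup` times faster than real time."""
    return speedup * sample_rate_hz


# GPU figure from the abstract: more than 960 kHz, i.e. 960,000 samples/s.
# ASSUMPTION: the target audio is 44.1 kHz (not stated for the GPU case).
gpu_factor = realtime_factor(960_000, 44_100)

# CPU figure from the abstract: 44.1 kHz speech at 1.2x real time.
cpu_throughput = required_throughput(1.2, 44_100)

print(f"GPU: about {gpu_factor:.1f}x real time at 44.1 kHz")
print(f"CPU: about {cpu_throughput:.0f} samples/s")
```

Under that assumption, the GPU rate corresponds to roughly 22 times real time, while the CPU claim implies a throughput of about 53 kHz.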
Pages: 210 - 214
Page count: 5
Related papers
50 in total
  • [1] Parallel WaveNet: Fast High-Fidelity Speech Synthesis
    van den Oord, Aaron
    Li, Yazhe
    Babuschkin, Igor
    Simonyan, Karen
    Vinyals, Oriol
    Kavukcuoglu, Koray
    van den Driessche, George
    Lockhart, Edward
    Cobo, Luis C.
    Stimberg, Florian
    Casagrande, Norman
    Grewe, Dominik
    Noury, Seb
    Dieleman, Sander
    Elsen, Erich
    Kalchbrenner, Nal
    Zen, Heiga
    Graves, Alex
    King, Helen
    Walters, Tom
    Belov, Dan
    Hassabis, Demis
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [2] High-fidelity Real-time Antiship Cruise Missile Modeling on the GPU
    Scannell, Christopher
    Decker, Jonathan
    Collins, Joseph
    Smith, William
    APPLICATIONS, TOOLS AND TECHNIQUES ON THE ROAD TO EXASCALE COMPUTING, 2012, 22 : 175 - 182
  • [3] High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes
    Lin, Haotong
    Peng, Sida
    Xu, Zhen
    Xie, Tao
    He, Xingyi
    Bao, Hujun
    Zhou, Xiaowei
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [4] High-fidelity real-time simulation on deployed platforms
    Huynh, D. B. P.
    Knezevic, D. J.
    Peterson, J. W.
    Patera, A. T.
    COMPUTERS & FLUIDS, 2011, 43 (01) : 74 - 81
  • [5] Real-Time High-Fidelity Facial Performance Capture
    Cao, Chen
    Bradley, Derek
    Zhou, Kun
    Beeler, Thabo
    ACM TRANSACTIONS ON GRAPHICS, 2015, 34 (04):
  • [6] Real-Time High-Fidelity Surface Flow Simulation
    Ren, Bo
    Yuan, Tailing
    Li, Chenfeng
    Xu, Kun
    Hu, Shi-Min
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2018, 24 (08) : 2411 - 2423
  • [7] High-Fidelity, Faster than Real-Time Dynamics Simulation
    Flueck, Alexander J.
    2014 IEEE PES GENERAL MEETING - CONFERENCE & EXPOSITION, 2014,
  • [8] Real-Time Optimization for the High-Fidelity of Human Motion Imitation
    Han, Hyejin
    Kwon, Jounghuem
    Lee, Jiyong
    Destenay, Romain
    You, Bum-Jae
    2014 11TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI), 2014, : 692 - 695
  • [9] Real-Time High-Fidelity Compression for Extremely High Frame Rate Video Cameras
    Shu, Xiao
    Wu, Xiaolin
    IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2018, 4 (01) : 172 - 180
  • [10] High-fidelity PWM inverter for audio amplification based on real-time DSP
    Pascual, C
    Krein, PT
    Midya, P
    Roeckner, B
    COMPEL 2000: 7TH WORKSHOP ON COMPUTERS IN POWER ELECTRONICS, PROCEEDINGS, 2000, : 227 - 232