WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Cited by: 5
Authors
Hsu, Po-chun [1, 2]
Lee, Hung-yi [1, 2]
Affiliations
[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan
[2] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan
Source
Keywords
neural vocoder; raw waveform synthesis; text-to-speech
DOI
10.21437/Interspeech.2020-1736
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing loss functions in the frequency domain. Because the flow-based model is heavily compressed, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; even though the model is highly compressed, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB of GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even when synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveforms 1.2 times faster than real-time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online.
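The throughput figures in the abstract can be related to each other with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the GPU rate of 960 kHz is measured against the same 44.1 kHz output sample rate quoted for the CPU case, which the abstract does not state explicitly. Since the abstract says "more than 960 kHz", the GPU value is a lower bound.

```python
def real_time_factor(samples_per_second: float, sample_rate_hz: float) -> float:
    """Seconds of audio generated per second of wall-clock time."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return samples_per_second / sample_rate_hz


def samples_per_second(rtf: float, sample_rate_hz: float) -> float:
    """Generation rate implied by a given real-time factor."""
    return rtf * sample_rate_hz


# Assumption: both figures refer to 44.1 kHz output audio.
SAMPLE_RATE_HZ = 44_100

# GPU: more than 960,000 samples per second (lower bound from the abstract).
gpu_rtf = real_time_factor(960_000, SAMPLE_RATE_HZ)

# CPU: 1.2 times faster than real-time at 44.1 kHz.
cpu_rate = samples_per_second(1.2, SAMPLE_RATE_HZ)

print(f"GPU real-time factor (lower bound): {gpu_rtf:.1f}x")
print(f"CPU generation rate: about {cpu_rate:.0f} samples/s")
```

Under that assumption, the GPU figure corresponds to roughly 21.8 times real-time, and the CPU figure to roughly 52,920 samples per second.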
Pages: 210 - 214
Number of pages: 5
Related papers
50 in total
  • [21] Real-time high-fidelity reliability updating with equality information using adaptive Kriging
    Wang, Zeyu
    Shafieezadeh, Abdollah
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2020, 195 (195)
  • [22] Padding-enabled real-time high-fidelity temporal single pixel imaging
    Keyaki, Ryota
    Matsuno, Jin
    Fukatsu, Susumu
    APPLIED PHYSICS EXPRESS, 2025, 18 (01)
  • [23] VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
    Yang, Jinhyeok
    Lee, Junmo
    Kim, Youngik
    Cho, Hoon-Young
    Kim, Injung
    INTERSPEECH 2020, 2020 : 200 - 204
  • [24] Poster: Enabling High-Fidelity and Real-Time Mobility Digital Twin with Edge Computing
    Liu, Yueyang
    Wang, Haoxin
    Cai, Zhipeng
    Chen, Dawei
    Han, Kyungtae
    2022 IEEE/ACM 7TH SYMPOSIUM ON EDGE COMPUTING (SEC 2022), 2022 : 281 - 283
  • [25] REAL-TIME SPEECH SYNTHESIS
    COHEN, MM
    MASSARO, DW
    BEHAVIOR RESEARCH METHODS & INSTRUMENTATION, 1976, 8 (02) : 189 - 196
  • [26] Demo: BuildTwin: Towards Real-time High-fidelity Digital Twin for Smart Building Management
    Liang, Zhizhao
    Jin, Yichao
    Singh, Jagdeep
    Khan, Aftab
    2023 IEEE 31ST INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS, ICNP, 2023
  • [27] Real-Time and High-Fidelity Tracking of Lysosomal Dynamics with a Dicyanoisophorone-Based Fluorescent Probe
    Hong, Jiaxin
    Li, Qianhua
    Xia, Qingfeng
    Feng, Guoqiang
    ANALYTICAL CHEMISTRY, 2021, 93 (50) : 16956 - 16964
  • [28] Using machine learning and real-time workload assessment in a high-fidelity UAV simulation environment
    Monfort, Samuel S.
    Sibley, Ciara M.
    Coyne, Joseph T.
    NEXT-GENERATION ANALYST IV, 2016, 9851
  • [29] Surrogate Modeling of High-Fidelity Fracture Simulations for Real-Time Residual-Strength Predictions
    Spear, Ashley D.
    Priest, Amanda R.
    Veilleux, Michael G.
    Ingraffea, Anthony R.
    Hochhalter, Jacob D.
    AIAA JOURNAL, 2011, 49 (12) : 2770 - 2782
  • [30] High-fidelity pose estimation for real-time extended reality (XR) visualization for cardiac catheterization
    Annabestani, Mohsen
    Sriram, Sandhya
    Caprio, Alexandre
    Janghorbani, Sepehr
    Wong, S. Chiu
    Sigaras, Alexandros
    Mosadegh, Bobak
    SCIENTIFIC REPORTS, 2024, 14 (01)