WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Cited by: 5

Authors:

Hsu, Po-chun [1,2]

Lee, Hung-yi [1,2]

Affiliations:

[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan

[2] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan

Source:

INTERSPEECH 2020 | 2020

Keywords:

neural vocoder; raw waveform synthesis; text-to-speech

DOI:

10.21437/Interspeech.2020-1736

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are jointly trained by maximizing the likelihood of the training data and optimizing frequency-domain loss functions. Because the flow-based model is heavily compressed, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; despite this compression, the post-filter maintains the quality of the generated waveform. Our PyTorch implementation can be trained using less than 8 GB of GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even when synthesizing on a CPU, the proposed method can generate 44.1 kHz speech waveforms 1.2 times faster than real time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online.
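The throughput figures in the abstract can be related to real-time speed with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and applying the 960 kHz GPU rate to a 44.1 kHz output is an assumption, since the abstract does not state the sample rate used for that measurement.

```python
def realtime_speedup(samples_per_second: float, sample_rate_hz: float) -> float:
    """Return how many times faster than real time a vocoder runs.

    A value above 1.0 means audio is generated faster than it plays back.
    """
    return samples_per_second / sample_rate_hz


def implied_throughput(speedup: float, sample_rate_hz: float) -> float:
    """Return the generation rate (samples per second) implied by a speedup."""
    return speedup * sample_rate_hz


# Figures quoted in the abstract.
GPU_RATE_HZ = 960_000       # "more than 960 kHz" on an NVIDIA 1080Ti
CPU_SPEEDUP = 1.2           # "1.2 times faster than real-time" on a CPU
SAMPLE_RATE_HZ = 44_100     # 44.1 kHz speech

# Assumption: the GPU rate is compared against 44.1 kHz output.
gpu_speedup = realtime_speedup(GPU_RATE_HZ, SAMPLE_RATE_HZ)
cpu_rate = implied_throughput(CPU_SPEEDUP, SAMPLE_RATE_HZ)

print(f"GPU speedup over real time (assumed 44.1 kHz): {gpu_speedup:.1f}x")
print(f"CPU generation rate implied by 1.2x: {cpu_rate:.0f} samples/s")
```

Under that assumption, the GPU figure corresponds to roughly 22 times real time, and the CPU figure to about 53,000 samples per second.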

Pages: 210-214

Page count: 5
