WG-WaveNet: Real-Time High-Fidelity Speech Synthesis without GPU

Times cited: 5

Authors:

Hsu, Po-chun [1,2]

Lee, Hung-yi [1,2]

Affiliations:

[1] Natl Taiwan Univ, Coll Elect Engn & Comp Sci, Taipei, Taiwan

[2] Natl Taiwan Univ, Grad Inst Commun Engn, Taipei, Taiwan

Source:

INTERSPEECH 2020 | 2020

Keywords:

neural vocoder; raw waveform synthesis; text-to-speech

DOI:

10.21437/Interspeech.2020-1736

Chinese Library Classification (CLC) numbers:

R36 [Pathology]; R76 [Otorhinolaryngology]

Subject classification codes:

100104; 100213

Abstract:

In this paper, we propose WG-WaveNet, a fast, lightweight, and high-quality waveform generation model. WG-WaveNet is composed of a compact flow-based model and a post-filter. The two components are trained jointly by maximizing the likelihood of the training data and optimizing frequency-domain loss functions. Because the flow-based model is heavily compressed, the proposed model requires far fewer computational resources than other waveform generation models during both training and inference; despite this compression, the post-filter maintains the quality of the generated waveforms. Our PyTorch implementation can be trained using less than 8 GB of GPU memory and generates audio samples at a rate of more than 960 kHz on an NVIDIA 1080Ti GPU. Furthermore, even when synthesizing on a CPU, we show that the proposed method is capable of generating 44.1 kHz speech waveforms 1.2 times faster than real time. Experiments also show that the quality of the generated audio is comparable to that of other methods. Audio samples are publicly available online.
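
The abstract describes the training objective only at a high level: a likelihood term for the flow-based model plus frequency-domain losses. The NumPy sketch below makes those two terms concrete under loudly stated assumptions and is not the authors' PyTorch implementation. The single affine coupling layer (a WaveGlow-style design, assumed here), the fixed random matrices W_s and W_t standing in for learned networks, the helper names (coupling_forward, coupling_inverse, flow_nll, stft_mag, spectral_loss, joint_loss), the STFT settings, and the spectral-convergence plus log-magnitude form of the frequency-domain loss are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical toy parameters. A real flow stacks many coupling layers whose
# scale/shift networks are learned; fixed random matrices stand in here.
rng = np.random.default_rng(0)
HALF = 4  # each input vector of length 2*HALF is split into two halves
W_s = 0.1 * rng.standard_normal((HALF, HALF))  # stand-in "log-scale network"
W_t = 0.1 * rng.standard_normal((HALF, HALF))  # stand-in "shift network"


def coupling_forward(x):
    """Affine coupling x -> z; returns z and log|det(dz/dx)| per vector."""
    xa, xb = x[..., :HALF], x[..., HALF:]
    log_s = np.tanh(xa @ W_s)  # bounded log-scale keeps exp() well behaved
    t = xa @ W_t
    zb = xb * np.exp(log_s) + t  # only the second half is transformed
    z = np.concatenate([xa, zb], axis=-1)
    return z, log_s.sum(axis=-1)  # triangular Jacobian: log-det = sum of log-scales


def coupling_inverse(z):
    """Exact inverse of coupling_forward (the direction used for sampling)."""
    za, zb = z[..., :HALF], z[..., HALF:]
    log_s = np.tanh(za @ W_s)
    t = za @ W_t
    xb = (zb - t) * np.exp(-log_s)
    return np.concatenate([za, xb], axis=-1)


def flow_nll(x, sigma=1.0):
    """Mean negative log-likelihood of x under an isotropic Gaussian prior on z."""
    z, log_det = coupling_forward(x)
    d = z.shape[-1]
    log_pz = (-0.5 * np.sum(z**2, axis=-1) / sigma**2
              - d * np.log(sigma) - 0.5 * d * np.log(2 * np.pi))
    # Change of variables: log p(x) = log p(z) + log|det(dz/dx)|
    return -np.mean(log_pz + log_det)


def stft_mag(y, n_fft=64, hop=16):
    """Magnitude spectrogram of a 1-D signal using a Hann window."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window for i in range(0, len(y) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))


def spectral_loss(y_hat, y, eps=1e-7):
    """Spectral convergence plus L1 log-magnitude distance (an assumed form)."""
    S_hat, S = stft_mag(y_hat), stft_mag(y)
    sc = np.linalg.norm(S - S_hat) / (np.linalg.norm(S) + eps)
    log_mag = np.mean(np.abs(np.log(S + eps) - np.log(S_hat + eps)))
    return sc + log_mag


def joint_loss(x_frames, y_hat, y, lam=1.0):
    """Hypothetical joint objective: flow NLL + lam * spectral loss on post-filter output."""
    return flow_nll(x_frames) + lam * spectral_loss(y_hat, y)


if __name__ == "__main__":
    x = rng.standard_normal((8, 2 * HALF))  # toy "waveform" vectors for the flow
    y = np.sin(np.linspace(0, 20 * np.pi, 256))  # toy target signal
    y_hat = y + 0.05 * rng.standard_normal(y.shape)  # stand-in post-filter output
    print("joint loss:", joint_loss(x, y_hat, y))
```

The likelihood term follows the change-of-variables rule, log p(x) = log p(z) + log|det(∂z/∂x)|; because an affine coupling layer has a triangular Jacobian, its log-determinant reduces to the sum of the predicted log-scales, which keeps the likelihood cheap to evaluate. In an actual training loop, both terms would be computed in an autodiff framework such as PyTorch so that gradients reach the flow and the post-filter together.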


Pages: 210-214

Page count: 5
