Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Cited by: 42

Authors:

Ling, Zhen-Hua [1]

Ai, Yang [1]

Gu, Yu [1, 2]

Dai, Li-Rong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Anhui, Peoples R China

[2] Baidu Speech Dept, Baidu Technol Pk, Beijing 100193, Peoples R China

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018, Vol. 26, No. 5

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

Keywords:

Speech bandwidth extension; recurrent neural networks; dilated convolutional neural networks; bottleneck features; voice conversion; enhancement

DOI:

10.1109/TASLP.2018.2798811

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples, using a neural network composed of long short-term memory (LSTM) layers and feed-forward layers. The LSTM layers form a hierarchical structure, and each layer operates at a specific temporal resolution to efficiently capture long-span dependencies between temporal sequences. Furthermore, additional conditions, such as bottleneck features derived from narrowband speech using a deep-neural-network-based state classifier, are employed as auxiliary input to further improve the quality of the generated wideband speech. Experimental comparisons of several waveform modeling methods show that the HRNN-based method achieves better speech quality and run-time efficiency than the dilated-convolutional-neural-network-based method and the plain sample-level recurrent-neural-network-based method. The proposed method also outperforms the conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
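The idea that each LSTM layer "operates at a specific temporal resolution" can be illustrated with a small framing sketch. This is not the authors' implementation: the function names and the frame sizes (16 and 4 samples) are assumptions chosen for illustration, and the abstract does not state the tier sizes actually used. The sketch only shows how one narrowband waveform could be viewed at two resolutions, so that a coarse tier takes few steps over long frames while a finer tier takes more steps over short frames.

```python
from typing import Dict, List, Sequence


def frame_signal(samples: Sequence[float], frame_size: int) -> List[List[float]]:
    """Split a waveform into non-overlapping frames of `frame_size` samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    usable = len(samples) - len(samples) % frame_size
    return [list(samples[i:i + frame_size]) for i in range(0, usable, frame_size)]


def hierarchical_views(
    samples: Sequence[float],
    frame_sizes: Sequence[int] = (16, 4),  # assumed tier sizes, coarsest first
) -> Dict[int, List[List[float]]]:
    """Return one framed view of the waveform per tier.

    The signal is first trimmed to a multiple of the coarsest frame size so
    that every tier covers exactly the same samples. Each finer frame size is
    assumed to divide the coarsest one.
    """
    coarsest = max(frame_sizes)
    usable = len(samples) - len(samples) % coarsest
    trimmed = samples[:usable]
    return {size: frame_signal(trimmed, size) for size in frame_sizes}


# Toy "narrowband" input: 70 samples of a repeating ramp (hypothetical data).
narrowband = [float(n % 8) for n in range(70)]
views = hierarchical_views(narrowband)

# Number of recurrent steps each tier would take over the same signal.
steps_per_tier = {size: len(frames) for size, frames in views.items()}
print(steps_per_tier)
```

In this toy setting the coarse tier covers the trimmed signal in a quarter of the steps the fine tier needs, which is the intuition behind using the slower tiers to carry long-span context.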

Pages: 883-894

Number of pages: 12
