Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Cited by: 42
Authors
Ling, Zhen-Hua [1]
Ai, Yang [1]
Gu, Yu [1, 2]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Anhui, Peoples R China
[2] Baidu Speech Dept, Baidu Technol Pk, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Speech bandwidth extension; recurrent neural networks; dilated convolutional neural networks; bottleneck features; VOICE CONVERSION; ENHANCEMENT;
DOI
10.1109/TASLP.2018.2798811
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Different from conventional BWE methods that predict spectral parameters for reconstructing wideband speech waveforms, this BWE method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, which is an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples using a neural network composed of long short-term memory (LSTM) layers and feed-forward layers. The LSTM layers form a hierarchical structure and each layer operates at a specific temporal resolution to efficiently capture long-span dependencies between temporal sequences. Furthermore, additional conditions, such as the bottleneck features derived from narrowband speech using a deep neural network based state classifier, are employed as auxiliary input to further improve the quality of generated wideband speech. The experimental results of comparing several waveform modeling methods show that the HRNN-based method can achieve better speech quality and run-time efficiency than the dilated convolutional neural network based method and the plain sample-level recurrent neural network based method. Our proposed method also outperforms the conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
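The idea that each layer "operates at a specific temporal resolution" can be illustrated with a minimal sketch. It only shows how one input waveform would be framed for tiers that update at different rates; the recurrent and feed-forward computations are omitted. The tier names, the frame sizes (16, 4, 1) and the helper `tier_frames` are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Dict, List


def tier_frames(samples: List[float],
                frame_sizes: Dict[str, int]) -> Dict[str, List[List[float]]]:
    """Split one waveform into non-overlapping frames, once per tier.

    A tier with frame size F would update its state once every F samples,
    so a larger F means a coarser temporal resolution.
    """
    framed = {}
    for tier, size in frame_sizes.items():
        usable = len(samples) - len(samples) % size  # drop the ragged tail
        framed[tier] = [samples[i:i + size] for i in range(0, usable, size)]
    return framed


# Hypothetical input: 64 narrowband samples (all zeros, shape is what matters).
narrowband = [0.0] * 64

# Hypothetical frame sizes; the values used in the paper are not given here.
FRAME_SIZES = {"slow_tier": 16, "fast_tier": 4, "sample_tier": 1}

frames = tier_frames(narrowband, FRAME_SIZES)

# Number of update steps each tier would take over the same 64 samples.
steps = {tier: len(tier_frame_list) for tier, tier_frame_list in frames.items()}
print(steps)
```

Under these assumed sizes, the slowest tier takes far fewer steps over the same signal than the sample-level tier, which is one way to see why a hierarchy can cover long spans at lower cost than a plain sample-level recurrent network.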
Pages: 883-894
Number of pages: 12