Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Cited by: 42
Authors
Ling, Zhen-Hua [1]
Ai, Yang [1]
Gu, Yu [1, 2]
Dai, Li-Rong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230027, Anhui, Peoples R China
[2] Baidu Speech Dept, Baidu Technol Pk, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Speech bandwidth extension; recurrent neural networks; dilated convolutional neural networks; bottleneck features; VOICE CONVERSION; ENHANCEMENT;
DOI
10.1109/TASLP.2018.2798811
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this BWE method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, which is an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples using a neural network composed of long short-term memory (LSTM) layers and feed-forward layers. The LSTM layers form a hierarchical structure, and each layer operates at a specific temporal resolution to efficiently capture long-span dependencies between temporal sequences. Furthermore, additional conditions, such as the bottleneck features derived from narrowband speech using a deep-neural-network-based state classifier, are employed as auxiliary input to further improve the quality of the generated wideband speech. The experimental results of comparing several waveform modeling methods show that the HRNN-based method can achieve better speech quality and run-time efficiency than the dilated-convolutional-neural-network-based method and the plain sample-level recurrent-neural-network-based method. The proposed method also outperforms the conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
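To make the idea of layers operating at different temporal resolutions more concrete, the following is a minimal sketch, not the authors' implementation. It only shows how one sample sequence could be viewed as frames of different sizes, one view per tier. The frame sizes (16, 4, 1) and the helper names `frame_signal` and `tier_views` are assumptions made for illustration; the abstract does not give these values.

```python
# Hypothetical illustration only: frame sizes and function names are
# assumptions, not values or identifiers taken from the paper.

def frame_signal(samples, frame_size):
    """Split samples into non-overlapping frames, dropping any incomplete tail."""
    n_frames = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_frames)]


def tier_views(samples, frame_sizes=(16, 4, 1)):
    """Return one framed view of the same signal per tier, coarsest first."""
    return {size: frame_signal(samples, size) for size in frame_sizes}


# Stand-in for a short run of narrowband waveform samples.
narrowband = list(range(32))
views = tier_views(narrowband)

for size, frames in views.items():
    # A tier with a larger frame size takes fewer steps over the same signal.
    print(f"frame size {size}: {len(frames)} steps for {len(narrowband)} samples")
```

In this sketch a coarser tier takes fewer recurrent steps over the same span of signal, which is one plausible reading of how a hierarchy can cover long-span dependencies more cheaply than a single sample-level layer.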
Pages: 883-894
Number of pages: 12