Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension

Cited by: 17
Authors
Gu, Yu [1]
Ling, Zhen-Hua [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speech bandwidth extension; stacked dilated convolutional neural networks; causal convolution; non-causal convolution; WaveNet;
DOI
10.21437/Interspeech.2017-336
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a waveform modeling and generation method for speech bandwidth extension (BWE) using stacked dilated convolutional neural networks (CNNs) with causal or non-causal convolutional layers. Such dilated CNNs describe the predictive distribution for each wideband or high-frequency speech sample conditioned on the input narrowband speech samples. Unlike conventional frame-based BWE approaches, the proposed methods model the speech waveforms directly and therefore avoid the spectral conversion and phase estimation problems. Experimental results show that, in subjective preference tests, the BWE methods proposed in this paper achieve better performance than the state-of-the-art frame-based approach, which uses recurrent neural networks (RNNs) with long short-term memory (LSTM) cells.
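As a rough illustration of the terms used in the abstract, the sketch below shows how a stack of dilated 1-D convolutions differs in its causal and non-causal forms, and how the receptive field grows with the dilation schedule. It is not the authors' implementation: the function names, the kernel sizes (2 for causal, 3 for non-causal), the dilation schedule [1, 2, 4], and the all-ones weights are assumptions chosen for readability, and the nonlinearities, conditioning, and output distribution of the actual model are omitted.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   dilation: int, causal: bool) -> List[float]:
    """Zero-padded dilated 1-D convolution; output has the same length as x."""
    span = (len(weights) - 1) * dilation
    # Causal: every tap lies at or before the current sample.
    # Non-causal: taps are centred on the current sample (odd kernel assumed).
    offset = span if causal else span // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - offset + j * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def stack(x: Sequence[float], weights: Sequence[float],
          dilations: Sequence[int], causal: bool) -> List[float]:
    """Apply one dilated layer per entry in `dilations` (linear layers only)."""
    y = list(x)
    for d in dilations:
        y = dilated_conv1d(y, weights, d, causal)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    impulse = [0.0] * 17
    impulse[8] = 1.0  # a single narrowband input sample
    causal_out = stack(impulse, [1.0, 1.0], [1, 2, 4], causal=True)
    noncausal_out = stack(impulse, [1.0, 1.0, 1.0], [1, 2, 4], causal=False)
    print("causal receptive field:    ", receptive_field(2, [1, 2, 4]))
    print("non-causal receptive field:", receptive_field(3, [1, 2, 4]))
    print("causal response:    ", causal_out)
    print("non-causal response:", noncausal_out)
```

Feeding an impulse through the stack makes the difference visible: in the causal case the impulse influences only the current and later output samples, while in the non-causal case it spreads symmetrically in both directions.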
Pages: 1123-1127
Page count: 5