On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

Cited by: 8
Authors
Sulun, Serkan [1]
Davies, Matthew E. P. [2]
Affiliations
[1] Inst Syst & Comp Engn Technol & Sci INESC TEC, P-4200465 Porto, Portugal
[2] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, P-3030790 Coimbra, Portugal
Keywords
Training; Testing; Wideband; Signal to noise ratio; Training data; Noise reduction; Neural networks; Audio bandwidth extension; audio enhancement; deep neural networks; generalization; regularization; overfitting; SPEECH;
DOI
10.1109/JSTSP.2020.3037485
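Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809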
Abstract
In this paper, we address a subtopic of the broad domain of audio enhancement, namely musical audio bandwidth extension. We formulate the bandwidth extension problem using deep neural networks, where a band-limited signal is provided as input to the network, with the goal of reconstructing a full-bandwidth output. Our main contribution centers on the impact of the choice of low-pass filter when training and subsequently testing the network. For two different state-of-the-art deep architectures, ResNet and U-Net, we demonstrate that when the training and testing filters are matched, improvements in signal-to-noise ratio (SNR) of up to 7 dB can be obtained. However, when these filters differ, the improvement falls considerably and under some training conditions results in a lower SNR than the band-limited input. To circumvent this apparent overfitting to filter shape, we propose a data augmentation strategy which utilizes multiple low-pass filters during training and leads to improved generalization to unseen filtering conditions at test time.
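The two quantities the abstract relies on, SNR against the full-bandwidth reference and training with more than one low-pass filter, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the Hamming-windowed sinc FIR design, the cutoff of 0.125 of the sample rate, the three filter lengths, and the helper names `snr_db`, `windowed_sinc_lowpass`, `apply_fir`, and `make_training_pair` are hypothetical and are not taken from the paper, which may use different filter families and settings.

```python
import math
import random


def snr_db(reference, estimate):
    """SNR in dB of `estimate`, measured against the full-band `reference`."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def windowed_sinc_lowpass(cutoff, num_taps=101):
    """Hamming-windowed sinc FIR low-pass (assumed design, odd num_taps).

    `cutoff` is a fraction of the sample rate, with 0 < cutoff < 0.5.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at DC


def apply_fir(signal, taps):
    """Zero-padded convolution that keeps the input length."""
    half = len(taps) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += t * signal[j]
        out.append(acc)
    return out


def make_training_pair(full_band, filter_bank, rng=random):
    """Pick one low-pass filter at random and return (input, target)."""
    taps = rng.choice(filter_bank)
    return apply_fir(full_band, taps), full_band


# Hypothetical bank: same cutoff, different lengths, hence different shapes.
filter_bank = [windowed_sinc_lowpass(0.125, n) for n in (31, 61, 101)]

# Toy full-band signal: one tone below the cutoff, one above it.
full_band = [
    math.sin(2.0 * math.pi * 0.02 * n) + 0.5 * math.sin(2.0 * math.pi * 0.4 * n)
    for n in range(400)
]
band_limited, target = make_training_pair(full_band, filter_bank, random.Random(0))
input_snr = snr_db(target, band_limited)  # baseline a network would need to beat
```

In this sketch a different filter may be drawn for each training example, so a model trained on such pairs would not see a single fixed filter shape; `input_snr` plays the role of the band-limited baseline against which an improvement would be measured.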
Pages: 132-142
Page count: 11