Multi-Band Multi-Resolution Fully Convolutional Neural Networks for Singing Voice Separation

Cited by: 0
Authors
Grais, Emad M. [1]
Zhao, Fei [1]
Plumbley, Mark D. [2]
Affiliations
[1] Cardiff Metropolitan Univ, Ctr Speech & Language Therapy & Hearing Sci, Cardiff, Wales
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Deep learning; convolutional neural networks; singing voice separation; single channel audio source separation; feature extraction; MUSIC;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Deep neural networks with convolutional layers usually process the entire spectrogram of an audio signal with the same time-frequency resolutions, number of filters, and dimensionality reduction scale. According to the constant-Q transform, good features can be extracted from audio signals if the low frequency bands are processed with high frequency resolution filters and the high frequency bands with high time resolution filters. In the spectrogram of a mixture of singing voices and music signals, there is usually more information about the voice in the low frequency bands than in the high frequency bands. These observations raise the need for processing each part of the spectrogram differently. In this paper, we propose a multi-band multi-resolution fully convolutional neural network (MBR-FCN) for singing voice separation. The MBR-FCN processes the frequency bands that have more information about the target signals with more filters and a smaller dimensionality reduction scale than the bands with less information. Furthermore, the MBR-FCN processes the low frequency bands with high frequency resolution filters and the high frequency bands with high time resolution filters. Our experimental results show that the proposed MBR-FCN with very few parameters achieves better singing voice separation performance than other deep neural networks.
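The constant-Q trade-off that the abstract appeals to can be illustrated with a small sketch. This is not the MBR-FCN itself and is not taken from the paper. It only computes, for the standard constant-Q definition (Q = f_k / bandwidth_k), how the bandwidth and analysis window length change with the center frequency. The function name and the parameter values (55 Hz lowest bin, 12 bins per octave, 48 bins) are illustrative assumptions.

```python
def constant_q_resolution(f_min, bins_per_octave, n_bins):
    """Return (center_hz, bandwidth_hz, window_seconds) for each constant-Q bin.

    Illustrative helper (hypothetical name), using the usual constant-Q
    relations: geometrically spaced centers, Q = f_k / bandwidth_k, and a
    window of Q / f_k seconds for bin k.
    """
    # Q is the same for every bin; it depends only on the bin spacing.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    rows = []
    for k in range(n_bins):
        center_hz = f_min * 2.0 ** (k / bins_per_octave)
        bandwidth_hz = center_hz / q      # narrow at low frequencies
        window_seconds = q / center_hz    # short at high frequencies
        rows.append((center_hz, bandwidth_hz, window_seconds))
    return rows


if __name__ == "__main__":
    # Assumed example values: 55 Hz lowest bin, 12 bins per octave, 4 octaves.
    for center, bandwidth, window in constant_q_resolution(55.0, 12, 48)[::12]:
        print(f"{center:8.1f} Hz  bandwidth {bandwidth:6.2f} Hz  window {window * 1000:7.1f} ms")
```

Under these assumptions, the low bins have narrow bandwidths (fine frequency resolution, long windows) and the high bins have short windows (fine time resolution, wide bandwidths), which is the behaviour the abstract uses to motivate treating low and high bands differently.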
Pages: 261-265
Page count: 5
Related papers
50 in total
  • [1] Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
    Grais, Emad M.
    Wierstorf, Hagen
    Ward, Dominic
    Plumbley, Mark D.
    LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2018), 2018, 10891 : 340 - 350
  • [2] A MULTI-DILATION AND MULTI-RESOLUTION FULLY CONVOLUTIONAL NETWORK FOR SINGING MELODY EXTRACTION
    Gao, Ping
    You, Cheng-You
    Chi, Tai-Shih
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 551 - 555
  • [3] Multi-band Masking for Waveform-based Singing Voice Separation
    Papantonakis, Panagiotis
    Garoufis, Christos
    Maragos, Petros
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 249 - 253
  • [4] Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation
    Yuan, Weitao
    Dong, Bofei
    Wang, Shengbei
    Unoki, Masashi
    Wang, Wenwu
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 (29) : 807 - 822
  • [5] Multi-resolution convolutional neural networks for inverse problems
    Wang, Feng
    Eljarrat, Alberto
    Mueller, Johannes
    Henninen, Trond R.
    Erni, Rolf
    Koch, Christoph T.
    SCIENTIFIC REPORTS, 2020, 10 (01)
  • [6] Multi-resolution convolutional neural networks for inverse problems
    Feng Wang
    Alberto Eljarrat
    Johannes Müller
    Trond R. Henninen
    Rolf Erni
    Christoph T. Koch
    Scientific Reports, 10
  • [7] Multi-Resolution for Disparity Estimation with Convolutional Neural Networks
    Jammal, Samer
    Tillo, Tammam
    Xiao, Jimin
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1756 - 1761
  • [8] Multi-Resolution Convolutional Recurrent Networks
    Chien, Jen-Tzung
    Huang, Yu-Min
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 2043 - 2048
  • [9] Multi-resolution convolutional neural networks for fully automated segmentation of acutely injured lungs in multiple species
    Gerard, Sarah E.
    Herrmann, Jacob
    Kaczka, David W.
    Musch, Guido
    Fernandez-Bustamante, Ana
    Reinhardt, Joseph M.
    MEDICAL IMAGE ANALYSIS, 2020, 60 (60)
  • [10] Multi-Resolution Convolutional Residual Neural Networks for Monaural Speech Dereverberation
    Zhao, Lei
    Zhu, Wenbo
    Li, Shengqiang
    Luo, Hong
    Zhang, Xiao-Lei
    Rahardja, Susanto
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2338 - 2351