Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation

Cited by: 0
Authors
Pyykkonen, Pyry [1]
Mimilakis, Stylianos I. [2]
Drossos, Konstantinos [3]
Virtanen, Tuomas [3]
Affiliations
[1] Tampere Univ, 3D Media Res Grp, Tampere, Finland
[2] Fraunhofer IDMT, Semant Mus Technol Grp, Ilmenau, Germany
[3] Tampere Univ, Audio Res Grp, Tampere, Finland
Keywords
Depthwise separable convolutions; recurrent neural networks; MaD; MaD TwinNet; monaural singing voice separation
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Recent approaches for music source separation are almost exclusively based on deep neural networks, mostly employing recurrent neural networks (RNNs). Although RNNs are in many cases superior to other types of deep neural networks for sequence processing, they are known to have specific difficulties in training and parallelization, especially for the typically long sequences encountered in music source separation. In this paper we present a use case of replacing RNNs with depthwise separable (DWS) convolutions, which are a lightweight and faster variant of typical convolutions. We focus on singing voice separation: we take an existing RNN-based architecture and replace its RNNs with DWS convolutions (DWS-CNNs). We conduct an ablation study and examine the effect of the number of channels and layers of the DWS-CNNs on source separation performance, using the standard metrics of signal-to-artifacts, signal-to-interference, and signal-to-distortion ratio. Our results show that replacing RNNs with DWS-CNNs yields an improvement of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the number of parameters of the RNN architecture.
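
To illustrate why DWS convolutions are described as lightweight, the sketch below compares the parameter count of a standard 2D convolution with that of a depthwise separable one (a per-channel depthwise filter followed by a 1 x 1 pointwise convolution). The function names and the layer sizes are hypothetical, chosen only for illustration; they are not taken from the paper, and the sketch does not reproduce the 20.57% figure, which refers to the complete architecture.

```python
def standard_conv_params(c_in, c_out, k_h, k_w, bias=True):
    """Number of parameters in a standard 2D convolution layer."""
    # Every output channel has its own k_h x k_w filter for every input channel.
    weights = c_in * c_out * k_h * k_w
    return weights + (c_out if bias else 0)


def dws_conv_params(c_in, c_out, k_h, k_w, bias=True):
    """Number of parameters in a depthwise separable 2D convolution.

    Depthwise step: one k_h x k_w filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = c_in * k_h * k_w + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes (assumption, not from the paper).
    c_in, c_out, k = 64, 64, 5
    std = standard_conv_params(c_in, c_out, k, k)
    dws = dws_conv_params(c_in, c_out, k, k)
    print(f"standard conv:            {std} parameters")
    print(f"depthwise separable conv: {dws} parameters")
    print(f"ratio:                    {dws / std:.2%}")
```

Ignoring biases, the ratio of the two counts is 1/c_out + 1/(k_h * k_w), so the saving grows with both the number of output channels and the kernel size.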
Pages: 6