Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CMNMF

Cited by: 0
Authors
Munoz-Montoro, Antonio J. [1]
Politis, Archontis [2]
Drossos, Konstantinos [2]
Carabias-Orti, Julio J. [1]
Affiliations
[1] Univ Jaen, Telecommun Engn Dept, Jaen, Spain
[2] Tampere Univ, Audio Res Grp, Tampere, Finland
Funding
European Research Council
Keywords
Multichannel Source Separation; Singing Voice; Deep Learning; CMNMF; Spatial Audio; Spatial Covariance Model; Audio Source Separation; Nonnegative Matrix
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
This work addresses the problem of multichannel source separation by combining two powerful approaches: multichannel spectral factorization and recent monophonic deep learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser twin network, which is able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are then used within a spatial covariance mixing model based on complex-valued multichannel non-negative matrix factorization (CMNMF), which predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CMNMF method outperforms both the individual monophonic DL-based separation and the multichannel CMNMF baseline methods.
Pages: 6