Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

Times cited: 11
Authors
Yuan, Weitao [1]
Dong, Bofei [1]
Wang, Shengbei [1]
Unoki, Masashi [2]
Wang, Wenwu [3]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin 300387, Peoples R China
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Periodic structures; Genetic algorithms; Music; Convolution; Phonocardiography; Speech processing; Evolving multi-resolution pooling CNN; genetic algorithm; monaural singing voice separation; neural architecture search; ACCOMPANIMENT;
DOI
10.1109/TASLP.2021.3051331
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Monaural singing voice separation (MSVS) is a challenging task and has been extensively studied. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, they are often designed manually, which is time-consuming and error-prone. They are also pre-defined and thus cannot adapt their structures to the training data. To address these issues, we first designed a multi-resolution convolutional neural network (CNN) for MSVS, called the multi-resolution pooling CNN (MRP-CNN), which uses pooling operators of various sizes to extract multi-resolution features. We then introduced Neural Architecture Search (NAS) to extend the MRP-CNN to the evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms optimized either for a single objective, considering only separation performance, or for multiple objectives, considering both separation performance and model complexity. The E-MRP-CNN using the multi-objective algorithm yields a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Evaluations on the MIR-1K, DSD100, and MUSDB18 datasets demonstrate the advantages of the E-MRP-CNN over several recent baselines.
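The abstract says MRP-CNN uses pooling operators of various sizes to extract multi-resolution features. A minimal sketch of that idea in plain Python follows, under stated assumptions: the input is a single time-frequency map (for example a magnitude spectrogram) given as a list of rows, and each branch applies non-overlapping average pooling of size k followed by nearest-neighbour upsampling back to the input size, so the branches can be stacked like channels. The pooling type, the sizes (1, 2, 4), the upsampling step, and the function names are hypothetical choices for illustration, not the paper's exact MRP-CNN design.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def avg_pool(x: Matrix, k: int) -> Matrix:
    """Non-overlapping k x k average pooling; edge windows average only the cells they cover."""
    rows, cols = len(x), len(x[0])
    out: Matrix = []
    for i in range(0, rows, k):
        row = []
        for j in range(0, cols, k):
            block = [x[r][c]
                     for r in range(i, min(i + k, rows))
                     for c in range(j, min(j + k, cols))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample_nearest(y: Matrix, k: int, rows: int, cols: int) -> Matrix:
    """Nearest-neighbour upsampling of a pooled map back to rows x cols."""
    return [[y[r // k][c // k] for c in range(cols)] for r in range(rows)]


def multi_resolution_pool(x: Matrix, sizes: Sequence[int] = (1, 2, 4)) -> List[Matrix]:
    """Hypothetical multi-resolution branch: one same-sized map per pooling size."""
    rows, cols = len(x), len(x[0])
    return [upsample_nearest(avg_pool(x, k), k, rows, cols) for k in sizes]


# Toy 4 x 4 "spectrogram" with values 0..15 (illustrative input only).
x = [[float(4 * r + c) for c in range(4)] for r in range(4)]
maps = multi_resolution_pool(x)
print(maps[1][0])  # -> [2.5, 2.5, 4.5, 4.5]
```

Small windows preserve fine time-frequency detail, while larger windows summarize broader context; in a CNN, each such map would typically pass through further convolutions before the branches are merged.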
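For the multi-objective variant, the abstract describes a set of Pareto-optimal solutions that trade separation performance against model complexity. The sketch below shows only the non-dominated filtering that defines such a set. It assumes SDR in dB as the performance objective (maximized) and parameter count in millions as the complexity objective (minimized); the class and function names and the toy population values are hypothetical and are not results from the paper, and the genetic operators used by E-MRP-CNN are not shown.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    sdr_db: float    # assumed performance metric: higher is better
    params_m: float  # assumed complexity metric (millions of parameters): lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    no_worse = a.sdr_db >= b.sdr_db and a.params_m <= b.params_m
    strictly_better = a.sdr_db > b.sdr_db or a.params_m < b.params_m
    return no_worse and strictly_better


def pareto_front(pop: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates (input order preserved)."""
    return [c for c in pop if not any(dominates(o, c) for o in pop)]


# Toy population with made-up values, for illustration only.
population = [
    Candidate("A", 10.0, 5.0),
    Candidate("B", 9.0, 1.0),
    Candidate("C", 8.0, 2.0),   # dominated by B
    Candidate("D", 10.5, 8.0),
    Candidate("E", 9.5, 6.0),   # dominated by A
]
front = pareto_front(population)
print([c.name for c in front])  # -> ['A', 'B', 'D']
```

In this toy case D separates best but is largest, B is smallest but separates worst, and A sits between them; each point on the front is a different trade-off rather than a single "best" model.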
Pages: 807-822
Page count: 16