Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

Times cited: 11
Authors
Yuan, Weitao [1]
Dong, Bofei [1]
Wang, Shengbei [1]
Unoki, Masashi [2]
Wang, Wenwu [3]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin 300387, Peoples R China
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Periodic structures; Genetic algorithms; Music; Convolution; Phonocardiography; Speech processing; Evolving multi-resolution pooling CNN; genetic algorithm; monaural singing voice separation; neural architecture search; ACCOMPANIMENT
DOI
10.1109/TASLP.2021.3051331
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Monaural singing voice separation (MSVS) is a challenging task that has been extensively studied. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, they are usually designed manually, which is time-consuming and error-prone. Their structures are also pre-defined and thus cannot adapt to the training data. To address these issues, we first designed a multi-resolution convolutional neural network (CNN) for MSVS, called the multi-resolution pooling CNN (MRP-CNN), which uses pooling operators of various sizes to extract multi-resolution features. We then introduced neural architecture search (NAS) to extend the MRP-CNN into the evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms. The search is optimized either for a single objective, which considers only separation performance, or for multiple objectives, which consider both separation performance and model complexity. The multi-objective E-MRP-CNN yields a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Evaluations on the MIR-1K, DSD100, and MUSDB18 datasets demonstrate the advantages of the E-MRP-CNN over several recent baselines.
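The multi-objective search in the abstract returns a Pareto-optimal set rather than a single best network. As a minimal illustration of what "Pareto-optimal" means here, the Python sketch below filters candidate architectures down to the non-dominated ones. It assumes each candidate is summarized by two hypothetical numbers, a separation score to maximize and a parameter count to minimize. The names Candidate, dominates, and pareto_front and all values are illustrative assumptions, not the authors' implementation, and the genetic search itself is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A hypothetical searched architecture with its two objective values."""
    name: str
    separation_score: float  # higher is better (assumed SDR-like score)
    num_params: int          # lower is better (model complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    no_worse = (a.separation_score >= b.separation_score
                and a.num_params <= b.num_params)
    strictly_better = (a.separation_score > b.separation_score
                       or a.num_params < b.num_params)
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by increasing complexity."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.num_params)


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    population = [
        Candidate("A", 9.0, 2_000_000),
        Candidate("B", 10.5, 8_000_000),
        Candidate("C", 8.5, 3_000_000),   # dominated by A
        Candidate("D", 7.0, 500_000),
        Candidate("E", 10.0, 9_000_000),  # dominated by B
    ]
    for c in pareto_front(population):
        print(c.name, c.separation_score, c.num_params)
```

With these toy values the front is D, A, B: each is the best available separation score for its level of complexity, so choosing among them is exactly the performance-versus-complexity trade-off the abstract describes.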
Pages: 807-822
Number of pages: 16
Related papers (50 in total)
  • [31] HTMD-Net: A Hybrid Masking-Denoising Approach to Time-Domain Monaural Singing Voice Separation
    Garoufis, Christos
    Zlatintsi, Athanasia
    Maragos, Petros
    [J]. 29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 341 - 345
  • [32] Singing Voice Separation and Pitch Extraction from Monaural Polyphonic Audio Music Via DNN and Adaptive Pitch Tracking
    Fan, Zhe-Cheng
    Jang, Jyh-Shing Roger
    Lu, Chung-Li
    [J]. 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON MULTIMEDIA BIG DATA (BIGMM), 2016, : 178 - 185
  • [33] Multi-resolution Stacking for Speech Separation Based on Boosted DNN
    Zhang, Xiao-Lei
    Wang, DeLiang
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1745 - 1749
  • [34] Advanced Feature Learning on Point Clouds Using Multi-Resolution Features and Learnable Pooling
    Wijaya, Kevin Tirta
    Paek, Dong-Hee
    Kong, Seung-Hyun
    [J]. REMOTE SENSING, 2024, 16 (11)
  • [35] Multi-Resolution CNN and Knowledge Transfer for Candidate Classification in Lung Nodule Detection
    Zuo, Wangxia
    Zhou, Fuqiang
    Li, Zuoxin
    Wang, Lin
    [J]. IEEE ACCESS, 2019, 7 : 32510 - 32521
  • [36] Detect Face in the Wild Using CNN Cascade with Feature Aggregation at Multi-Resolution
    Deng, Jingjing
    Xie, Xianghua
    [J]. 2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 4167 - 4171
  • [37] Multi-resolution Path CNN with Deep Supervision for Intervertebral Disc Localization and Segmentation
    Gao, Yunhe
    Liu, Chang
    Zhao, Liang
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT II, 2019, 11765 : 309 - 317
  • [38] CNN-based Pansharpening of Multi-Resolution Remote-Sensing Images
    Masi, Giuseppe
    Cozzolino, Davide
    Verdoliva, Luisa
    Scarpa, Giuseppe
    [J]. 2017 JOINT URBAN REMOTE SENSING EVENT (JURSE), 2017
  • [39] Multi-band Masking for Waveform-based Singing Voice Separation
    Papantonakis, Panagiotis
    Garoufis, Christos
    Maragos, Petros
    [J]. 2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 249 - 253
  • [40] High-Resolution Representation Learning and Recurrent Neural Network for Singing Voice Separation
    Bhuwan Bhattarai
    Yagya Raj Pandeya
    You Jie
    Arjun Kumar Lamichhane
    Joonwhoan Lee
    [J]. Circuits, Systems, and Signal Processing, 2023, 42 : 1083 - 1104