Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

Cited by: 11

Authors:

Yuan, Weitao [1]

Dong, Bofei [1]

Wang, Shengbei [1]

Unoki, Masashi [2]

Wang, Wenwu [3]

Affiliations:

[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin 300387, Peoples R China

[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan

[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Periodic structures; Genetic algorithms; Music; Convolution; Phonocardiography; Speech processing; Evolving multi-resolution pooling CNN; genetic algorithm; monaural singing voice separation; neural architecture search; ACCOMPANIMENT;

DOI:

10.1109/TASLP.2021.3051331

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Monaural singing voice separation (MSVS) is a challenging task and has been extensively studied. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, they are often designed manually, which is time-consuming and error-prone. They are also pre-defined and thus cannot adapt their structures to the training data. To address these issues, we first designed a multi-resolution convolutional neural network (CNN) for MSVS called multi-resolution pooling CNN (MRP-CNN), which uses various-sized pooling operators to extract multi-resolution features. We then introduced Neural Architecture Search (NAS) to extend the MRP-CNN to the evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms. The search is optimized either for a single objective, taking into account only separation performance, or for multiple objectives, taking into account both separation performance and model complexity. The E-MRP-CNN using the multi-objective algorithm gives a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Evaluations on the MIR-1K, DSD100, and MUSDB18 datasets demonstrate the advantages of the E-MRP-CNN over several recent baselines.
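
The idea of a Pareto-optimal set mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's genetic algorithm: the `Candidate` type, the field names `sdr_db` and `params_m`, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes two objectives, a separation score where higher is better and a parameter count where lower is better, and keeps the candidates that no other candidate dominates.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical network structure with two objective values."""
    name: str
    sdr_db: float    # separation performance (assumed: higher is better)
    params_m: float  # model complexity in millions of parameters (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.sdr_db >= b.sdr_db and a.params_m <= b.params_m
    strictly_better = a.sdr_db > b.sdr_db or a.params_m < b.params_m
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


# Made-up population, for illustration only.
population = [
    Candidate("A", 10.0, 5.0),
    Candidate("B", 9.0, 2.0),
    Candidate("C", 8.5, 3.0),   # dominated by B
    Candidate("D", 10.5, 9.0),
    Candidate("E", 9.5, 9.5),   # dominated by A and D
]

front = pareto_front(population)
print([c.name for c in front])  # ['A', 'B', 'D']
```

Each member of the resulting front trades one objective against the other: under these made-up numbers, B is the smallest model, D has the highest separation score, and A sits between them.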


Pages: 807-822

Page count: 16
