Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation

Cited by: 11
Authors
Yuan, Weitao [1]
Dong, Bofei [1]
Wang, Shengbei [1]
Unoki, Masashi [2]
Wang, Wenwu [3]
Affiliations
[1] Tiangong Univ, Sch Comp Sci & Technol, Tianjin Key Lab Autonomous Intelligence Technol &, Tianjin 300387, Peoples R China
[2] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi 9231292, Japan
[3] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Periodic structures; Genetic algorithms; Music; Convolution; Phonocardiography; Speech processing; Evolving multi-resolution pooling CNN; genetic algorithm; monaural singing voice separation; neural architecture search; ACCOMPANIMENT;
DOI
10.1109/TASLP.2021.3051331
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Monaural singing voice separation (MSVS) is a challenging task that has been extensively studied. Deep neural networks (DNNs) are the current state-of-the-art methods for MSVS. However, they are often designed manually, which is time-consuming and error-prone. Their structures are also pre-defined and thus cannot adapt to the training data. To address these issues, we first designed a multi-resolution convolutional neural network (CNN) for MSVS, called the multi-resolution pooling CNN (MRP-CNN), which uses pooling operators of various sizes to extract multi-resolution features. We then introduced Neural Architecture Search (NAS) to extend the MRP-CNN to the evolving MRP-CNN (E-MRP-CNN), which automatically searches for effective MRP-CNN structures using genetic algorithms. The search is optimized either for a single objective that considers only separation performance, or for multiple objectives that consider both separation performance and model complexity. The E-MRP-CNN using the multi-objective algorithm gives a set of Pareto-optimal solutions, each providing a trade-off between separation performance and model complexity. Evaluations on the MIR-1K, DSD100, and MUSDB18 datasets demonstrate the advantages of the E-MRP-CNN over several recent baselines.
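As an illustration of what a set of Pareto-optimal solutions means in the multi-objective setting described above, the following minimal Python sketch filters a list of candidate architectures down to the non-dominated ones. It is not the search procedure of the paper: the candidate names, the error values, and the model sizes are hypothetical numbers invented for this example, and both objectives are assumed to be minimized (a separation error standing in for separation performance, a parameter count standing in for model complexity).

```python
from typing import List, Tuple

# (name, separation_error, model_size_in_millions); all values are hypothetical
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one (both objectives are minimized)."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidate architectures, for illustration only
candidates = [
    ("A", 0.30, 5.0),
    ("B", 0.25, 9.0),
    ("C", 0.35, 6.0),   # worse than A in both objectives
    ("D", 0.25, 12.0),  # same error as B but larger
    ("E", 0.40, 2.0),
]

front = pareto_front(candidates)
front_names = [name for name, _, _ in front]
print(front_names)
```

Each surviving candidate trades one objective against the other: in this toy data, E is the smallest model with the highest error, B has the lowest error at a larger size, and A sits in between.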
Pages: 807 - 822
Page count: 16
Related papers
50 in total
  • [21] SEMI-SUPERVISED MONAURAL SINGING VOICE SEPARATION WITH A MASKING NETWORK TRAINED ON SYNTHETIC MIXTURES
    Michelashvili, Michael
    Benaim, Sagie
    Wolf, Lior
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 291 - 295
  • [22] Monaural Singing Voice and Accompaniment Separation Based on Gated Nested U-Net Architecture
    Geng, Haibo
    Hu, Ying
    Huang, Hao
    [J]. SYMMETRY-BASEL, 2020, 12 (06):
  • [23] Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms
    Tachibana, Hideyuki
    Ono, Nobutaka
    Sagayama, Shigeki
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (01) : 228 - 237
  • [24] DATA AUGMENTATION FOR MONAURAL SINGING VOICE SEPARATION BASED ON VARIATIONAL AUTOENCODER-GENERATIVE ADVERSARIAL NETWORK
    He, Boxin
    Wang, Shengbei
    Yuan, Weitao
    Wang, Jianming
    Unoki, Masashi
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1354 - 1359
  • [25] A RECURRENT ENCODER-DECODER APPROACH WITH SKIP-FILTERING CONNECTIONS FOR MONAURAL SINGING VOICE SEPARATION
    Mimilakis, Stylianos Ioannis
    Drossos, Konstantinos
    Virtanen, Tuomas
    Schuller, Gerald
    [J]. 2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [26] Super-resolution method for MR images based on multi-resolution CNN
    Kang, Li
    Liu, Guojuan
    Huang, Jianjun
    Li, Jianping
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2022, 72
  • [27] A MULTI-DILATION AND MULTI-RESOLUTION FULLY CONVOLUTIONAL NETWORK FOR SINGING MELODY EXTRACTION
    Gao, Ping
    You, Cheng-You
    Chi, Tai-Shih
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 551 - 555
  • [28] MONAURAL SINGING VOICE SEPARATION WITH SKIP-FILTERING CONNECTIONS AND RECURRENT INFERENCE OF TIME-FREQUENCY MASK
    Mimilakis, Stylianos Ioannis
    Drossos, Konstantinos
    Santos, Joao F.
    Schuller, Gerald
    Virtanen, Tuomas
    Bengio, Yoshua
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 721 - 725
  • [29] Radar HRRP Recognition using Attentional CNN with Multi-resolution Spectrograms
    Wan, Jinwei
    Chen, Bo
    Yuan, Yijun
    Liu, Hongwei
    Jin, Lin
    [J]. 2019 INTERNATIONAL RADAR CONFERENCE (RADAR2019), 2019, : 326 - 329
  • [30] MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation
    Detsikas, Nikolaos
    Mitianoudis, Nikolaos
    Pratikakis, Ioannis
    [J]. JOURNAL OF IMAGING, 2024, 10 (06)