MULTI-TASK AUDIO SOURCE SEPARATION

Times Cited: 0
Authors
Zhang, Lu [1 ,2 ]
Li, Chenxing [1 ]
Deng, Feng [1 ]
Wang, Xiaorui [1 ]
Affiliations
[1] Kuai Shou Technol Co, Beijing, Peoples R China
[2] Harbin Inst Technol, Shenzhen, Peoples R China
Keywords
multi-task audio source separation; two-stage model; complex ratio mask; speech separation; networks
DOI
10.1109/ASRU51503.2021.9687922
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Audio source separation tasks such as speech enhancement, speech separation, and music source separation have achieved impressive performance in recent studies. The powerful modeling capability of deep neural networks makes even more challenging tasks feasible. This paper launches a new multi-task audio source separation (MTASS) challenge: separating the speech, music, and noise signals from a monaural mixture. First, we introduce the details of this task and generate a dataset of mixtures containing speech, music, and background noise. Then, we propose an MTASS model in the complex domain that fully exploits the differences in the spectral characteristics of the three audio signals. Specifically, the proposed model follows a two-stage pipeline that first separates the three types of audio signals and then performs residual signal compensation for each. After comparing different training targets, the complex ratio mask is selected as the most suitable target for MTASS. The experimental results also indicate that the residual signal compensation module helps to recover the signals further. The proposed model shows significant advantages in separation performance over several well-known separation models.
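For readers unfamiliar with the training target named in the abstract, the Python sketch below illustrates, under the standard definition of the complex ratio mask, how an oracle mask for one of the three sources could be computed from the mixture and reference STFTs. This is a minimal NumPy example, not code from the paper; the helper names (complex_ratio_mask, apply_mask) and the usage variables are purely illustrative.

    import numpy as np

    def complex_ratio_mask(source_stft: np.ndarray,
                           mixture_stft: np.ndarray,
                           eps: float = 1e-8) -> np.ndarray:
        """Oracle complex ratio mask M such that source ~= M * mixture per TF bin.

        From S = M * Y (complex multiplication) it follows that
        M = S * conj(Y) / |Y|^2; a small eps avoids division by zero.
        """
        return source_stft * np.conj(mixture_stft) / (np.abs(mixture_stft) ** 2 + eps)

    def apply_mask(mask: np.ndarray, mixture_stft: np.ndarray) -> np.ndarray:
        """Recover one source spectrogram by complex-multiplying mask and mixture."""
        return mask * mixture_stft

    # Hypothetical usage with complex STFTs of shape (freq_bins, frames):
    # Y = stft(mixture); S_speech, S_music, S_noise = stft(speech), stft(music), stft(noise)
    # masks = {name: complex_ratio_mask(S, Y)
    #          for name, S in [("speech", S_speech), ("music", S_music), ("noise", S_noise)]}
    # est_speech = apply_mask(masks["speech"], Y)  # inverse STFT yields the waveform

Note that in the paper the three masks are estimated by the two-stage network rather than computed from clean references, and a residual signal compensation module then refines each separated signal; the sketch only shows how the mask target itself relates the mixture and source spectrograms.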
Pages: 671-678
Number of Pages: 8
Related Papers
50 records in total
  • [1] Multi-Task Learning for Blind Source Separation
    Du, Bo
    Wang, Shaodong
    Xu, Chang
    Wang, Nan
    Zhang, Liangpei
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (09) : 4219 - 4231
  • [2] Conformer Space Neural Architecture Search for Multi-Task Audio Separation
    Lu, Shun
    Wang, Yang
    Yao, Peng
    Li, Chenxing
    Tan, Jianchao
    Deng, Feng
    Wang, Xiaorui
    Song, Chengru
    INTERSPEECH 2022, 2022, : 5358 - 5362
  • [3] Spectrogram based multi-task audio classification
    Zeng, Yuni
    Mao, Hua
    Peng, Dezhong
    Yi, Zhang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3705 - 3722
  • [4] EAD-Conformer: A Conformer-Based Encoder-Attention-Decoder-Network for Multi-Task Audio Source Separation
    Li, Chenxing
    Wang, Yang
    Deng, Feng
    Zhang, Zhuo
    Wang, Xiaorui
    Wang, Zhongyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 521 - 525
  • [5] Multi-Task Adapters for On-Device Audio Inference
    Tagliasacchi, Marco
    Quitry, Felix de Chaumont
    Roblek, Dominik
    IEEE SIGNAL PROCESSING LETTERS, 2020, 27 : 630 - 634
  • [6] Multi-Task Audio-Driven Facial Animation
    Kim, Youngsoo
    An, Shounan
    Jo, Youngbak
    Park, Seungje
    Kang, Shindong
    Oh, Insoo
    Kim, Duke Donghyun
    SIGGRAPH '19 - ACM SIGGRAPH 2019 POSTERS, 2019,
  • [7] Binaural Audio Generation via Multi-task Learning
    Li, Sijia
    Liu, Shiguang
    Manocha, Dinesh
    ACM TRANSACTIONS ON GRAPHICS, 2021, 40 (06):
  • [8] WA-Transformer: Window Attention-based Transformer with Two-stage Strategy for Multi-task Audio Source Separation
    Wang, Yang
    Li, Chenxing
    Deng, Feng
    Lu, Shun
    Yao, Peng
    Tan, Jianchao
    Song, Chengru
    Wang, Xiaorui
    INTERSPEECH 2022, 2022, : 5373 - 5377
  • [9] Weighted and Multi-Task Loss for Rare Audio Event Detection
    Phan, Huy
    Krawczyk-Becker, Martin
    Gerkmann, Timo
    Mertins, Alfred
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 336 - 340