Combining Mask Estimates for Single Channel Audio Source Separation using Deep Neural Networks

Cited by: 15
Authors
Grais, Emad M. [1]
Roma, Gerard [1]
Simpson, Andrew J. R. [1]
Plumbley, Mark D. [1]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Combining estimates; deep neural networks; single channel source separation; neural network ensembles; deep learning;
DOI
10.21437/Interspeech.2016-216
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Deep neural networks (DNNs) are commonly used for single channel source separation to predict either soft or binary time-frequency masks, which are then applied to the mixed signal to separate the sources. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using a DNN achieves lower distortion than using each estimate individually and achieves interference as low as that of the binary mask.
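To make the distinction between the two mask types concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: it builds oracle masks from known source magnitudes rather than DNN predictions, assumes the mixture magnitude is the sum of the source magnitudes, and uses hypothetical function names (`soft_mask`, `binary_mask`, `apply_mask`). The combining DNN itself is not sketched, since this record does not describe its architecture.

```python
import numpy as np

def soft_mask(s1_mag, s2_mag, eps=1e-12):
    """Ratio (soft) mask for source 1: values lie in [0, 1]."""
    return s1_mag / (s1_mag + s2_mag + eps)

def binary_mask(s1_mag, s2_mag):
    """Binary mask for source 1: 1 where source 1 dominates, else 0."""
    return (s1_mag > s2_mag).astype(float)

def apply_mask(mask, mix_mag):
    """Element-wise masking of the mixture magnitude spectrogram."""
    return mask * mix_mag

# Toy magnitude "spectrograms" (frequency bins x time frames); random data
# stands in for real short-time Fourier transform magnitudes.
rng = np.random.default_rng(0)
s1 = rng.random((4, 3))
s2 = rng.random((4, 3))
mix = s1 + s2  # assumption: source magnitudes add in the mixture

soft_est = apply_mask(soft_mask(s1, s2), mix)
bin_est = apply_mask(binary_mask(s1, s2), mix)
```

Under the additive assumption the soft mask recovers the source magnitude almost exactly, while the binary mask assigns each time-frequency bin entirely to one source. The binary estimate therefore either keeps the whole mixture bin or discards it, which matches the abstract's description of binary masks trading more distortion for less interference.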
Pages: 3339-3343
Number of pages: 5
Related papers
50 in total
  • [11] PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation
    Takahashi, Naoya
    Agrawal, Purvi
    Goswami, Nabarun
    Mitsufuji, Yuki
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2713 - 2717
  • [12] SINGLE CHANNEL AUDIO SOURCE SEPARATION USING CONVOLUTIONAL DENOISING AUTOENCODERS
    Grais, Emad M.
    Plumbley, Mark D.
    [J]. 2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 1265 - 1269
  • [13] Single Channel Audio Source Separation by Clustered NMF
    Kirbiz, Serap
    Gunsel, Bilge
    [J]. 2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 469 - 472
  • [14] Fully Quantized Neural Networks for Audio Source Separation
    Cohen, Elad
    Habi, Hai Victor
    Peretz, Reuven
    Netzer, Arnon
    [J]. IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 926 - 933
  • [15] BITWISE NEURAL NETWORKS FOR EFFICIENT SINGLE-CHANNEL SOURCE SEPARATION
    Kim, Minje
    Smaragdis, Paris
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 701 - 705
  • [16] On Discriminative Framework for Single Channel Audio Source Separation
    Gang, Arpita
    Biyani, Pravesh
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 565 - 569
  • [17] Single Channel Speech Separation Using Deep Neural Network
    Chen, Linlin
    Ma, Xiaohong
    Ding, Shuxue
    [J]. ADVANCES IN NEURAL NETWORKS, PT I, 2017, 10261 : 285 - 292
  • [18] Deep neural networks for emotion recognition combining audio and transcripts
    Cho, Jaejin
    Pappagari, Raghavendra
    Kulkarni, Purva
    Villalba, Jesus
    Carmiel, Yishay
    Dehak, Najim
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 247 - 251
  • [19] Single Channel Blind Source Separation Under Deep Recurrent Neural Network
    Jiai He
    Wei Chen
    Yuxiao Song
    [J]. Wireless Personal Communications, 2020, 115 : 1277 - 1289
  • [20] Single Channel Blind Source Separation Under Deep Recurrent Neural Network
    He, Jiai
    Chen, Wei
    Song, Yuxiao
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2020, 115 (02) : 1277 - 1289