Single Channel Speech Source Separation Using Hierarchical Deep Neural Networks

Cited: 0
Authors
Noorani, Seyed Majid [1]
Seyedin, Sanaz [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
Keywords
Speech source separation; Deep neural networks; Time-frequency masks; Blind separation
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
Single-channel speech source separation is a well-known task for preparing speech signals for applications such as speech recognition and speech enhancement. In this paper, we introduce a novel design for separating sources with the help of hierarchical deep neural networks and time-frequency masks. In the first hierarchy, the proposed method classifies the mixture signals into three categories according to the genders of the mixed speakers. Three further networks, one for each mixture type, then use these categorized data for speech separation. Finally, an enhancement stage improves the quality of the separated voices using an improved cost function that reduces the interference from the sources estimated in the previous stage. The required data are drawn from the TSP corpus, and the outputs of the systems are evaluated with several metrics: signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), and perceptual evaluation of speech quality (PESQ). Compared with other methods, the proposed architecture performs considerably better.
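To make the hierarchy described in the abstract concrete, the sketch below shows one possible structure: a gender-mixture classifier routes each mixture to one of three mask-estimating separation networks, and an enhancement stage refines both estimates jointly. The use of PyTorch, all class and function names, layer sizes, and the STFT dimension are illustrative assumptions, not the authors' implementation; only the three-way gender classification, the per-type time-frequency-mask separators, and the enhancement stage come from the abstract.

    # Hedged sketch of the hierarchical separation pipeline (assumptions noted above).
    import torch
    import torch.nn as nn

    N_FREQ = 513        # assumed magnitude-spectrum size (1024-point STFT)
    MIX_CLASSES = 3     # male-male, female-female, male-female mixtures

    class MixtureClassifier(nn.Module):
        """First hierarchy: classify the mixture by the genders of the speakers."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_FREQ, 512), nn.ReLU(),
                nn.Linear(512, MIX_CLASSES))

        def forward(self, mag):                 # mag: (batch, N_FREQ)
            return self.net(mag)                # logits over the three mixture types

    class MaskSeparator(nn.Module):
        """Second hierarchy: a per-mixture-type network that predicts
        time-frequency masks for the two sources and applies them."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(N_FREQ, 1024), nn.ReLU(),
                nn.Linear(1024, 2 * N_FREQ), nn.Sigmoid())

        def forward(self, mag):
            masks = self.net(mag).view(-1, 2, N_FREQ)
            return masks * mag.unsqueeze(1)     # masked magnitude estimates

    class Enhancer(nn.Module):
        """Final stage: refine both estimates jointly so the network can
        suppress residual interference left by the separation stage
        (a stand-in for the paper's improved cost function)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2 * N_FREQ, 1024), nn.ReLU(),
                nn.Linear(1024, 2 * N_FREQ), nn.ReLU())

        def forward(self, est):                 # est: (batch, 2, N_FREQ)
            return self.net(est.flatten(1)).view(-1, 2, N_FREQ)

    def separate(mag, classifier, separators, enhancer):
        """Route each frame to the separator matching its predicted mixture
        type, then pass both source estimates through the enhancer."""
        mix_type = classifier(mag).argmax(dim=-1)
        est = torch.stack([separators[int(t)](m.unsqueeze(0)).squeeze(0)
                           for m, t in zip(mag, mix_type)])
        return enhancer(est)

In a typical time-frequency masking setup of this kind, the three separators and the enhancer would be trained on the gender-labelled subsets produced by the classifier, and the estimated magnitudes would be combined with the mixture phase before inverse-STFT resynthesis.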
Pages: 466-470
Number of pages: 5
Related Papers
50 records in total (first 10 shown)
  • [1] DEEP NEURAL NETWORKS FOR SINGLE CHANNEL SOURCE SEPARATION
    Grais, Emad M.
    Sen, Mehmet Umut
    Erdogan, Hakan
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [2] Single Channel Speech Separation Using Deep Neural Network
    Chen, Linlin
    Ma, Xiaohong
    Ding, Shuxue
    [J]. ADVANCES IN NEURAL NETWORKS, PT I, 2017, 10261 : 285 - 292
  • [3] Discriminative Enhancement for Single Channel Audio Source Separation Using Deep Neural Networks
    Grais, Emad M.
    Roma, Gerard
    Simpson, Andrew J. R.
    Plumbley, Mark D.
    [J]. LATENT VARIABLE ANALYSIS AND SIGNAL SEPARATION (LVA/ICA 2017), 2017, 10169 : 236 - 246
  • [4] Combining Mask Estimates for Single Channel Audio Source Separation using Deep Neural Networks
    Grais, Emad M.
    Roma, Gerard
    Simpson, Andrew J. R.
    Plumbley, Mark D.
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3339 - 3343
  • [5] Towards Automated Single Channel Source Separation using Neural Networks
    Gang, Arpita
    Biyani, Pravesh
    Soni, Akshay
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3494 - 3498
  • [6] Two-Stage Single-Channel Audio Source Separation Using Deep Neural Networks
    Grais, Emad M.
    Roma, Gerard
    Simpson, Andrew J. R.
    Plumbley, Mark D.
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (09) : 1469 - 1479
  • [7] JOINT TRAINING OF DEEP NEURAL NETWORKS FOR MULTI-CHANNEL DEREVERBERATION AND SPEECH SOURCE SEPARATION
    Togami, Masahito
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3032 - 3036
  • [8] SINGLE-CHANNEL MIXED SPEECH RECOGNITION USING DEEP NEURAL NETWORKS
    Weng, Chao
    Yu, Dong
    Seltzer, Michael L.
    Droppo, Jasha
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] BITWISE NEURAL NETWORKS FOR EFFICIENT SINGLE-CHANNEL SOURCE SEPARATION
    Kim, Minje
    Smaragdis, Paris
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 701 - 705
  • [10] A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks
    Wang, Yannan
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (07) : 1535 - 1546