CASA BASED SUPERVISED SINGLE CHANNEL SPEAKER INDEPENDENT SPEECH SEPARATION

Cited by: 0
Authors
Rehman, M. Fazal Ur [1 ]
Saleem, Nasir [1 ]
Nawaz, Asif [2 ]
Jan, Sadeeq [3 ]
Najam, Zeeshan [4 ]
Khattak, M. Irfan [5 ]
Ahmed, Sheeraz [6 ]
Affiliations
[1] Gomal Univ, Dept Elect Engn, Dera Ismail Khan, Pakistan
[2] HCT, Dubai Women Coll, Fac Engn ETS, Dubai, U Arab Emirates
[3] Univ Engn & Tech, Dept CSIT, Peshawar, Pakistan
[4] UET, MNS, Dept Elect Engn, Multan, Pakistan
[5] Univ Engn & Tech, Dept Elect Engn, Kohat Campus, Kohat, Pakistan
[6] Iqra Natl Univ, Dept Comp Sci, Peshawar, Pakistan
Keywords
CASA; IBM; intelligibility; time-frequency masking; supervised speech separation; quality; quality assessment
DOI
10.26782/jmcms.2019.12.00074
Chinese Library Classification number
O3 [Mechanics]
Discipline classification code
08; 0801
Abstract
Computational auditory scene analysis (CASA) based speech separation is widely used in a number of speech processing applications to separate a target speech signal from target-interference mixtures, and the task is usually treated as a signal processing problem. However, target speech separation can also be formulated as a supervised learning problem, in which discriminative patterns of speech, speakers, and background noise are learned from training data. In this paper, we present a single-channel supervised speech separation approach based on ideal binary mask (IBM) estimation. In the proposed approach, a speaker-independent speech separation system is trained on sets of clean speech magnitudes. During separation, the SNR in each time-frequency (TF) channel is estimated using the clean magnitudes and compared to a predefined threshold. The TF channels that satisfy the threshold are retained, while those that violate it are discarded, to construct an IBM. The estimated mask is then applied to the mixture to reconstruct the target speech, using the phase of the mixture. The experiments are conducted in three speaker-independent mixture scenarios, termed 2-talker, 3-talker, and 4-talker mixtures, at four input SNRs: -5 dB, 0 dB, 5 dB, and 10 dB. The experimental results show that the proposed CASA-based supervised speaker-independent mask estimation outperforms the competing approaches, namely nonnegative matrix factorization (NMF), nonnegative dynamical system (NNDS), and log minimum mean square error (LMMSE) estimation, in terms of the PESQ, SegSNR, LLR, WSS, SIG, BAK, and STOI objective measures.
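The masking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it builds an oracle mask from known target and interference magnitudes instead of estimating one with a trained system. The function names, the 0 dB threshold, the `eps` guard, and the toy 2x2 TF grid are all assumptions made for illustration.

```python
import numpy as np


def ideal_binary_mask(target_mag, interference_mag, threshold_db=0.0, eps=1e-12):
    """Return 1.0 where the local SNR exceeds threshold_db, else 0.0.

    threshold_db (the local criterion) and eps are illustrative assumptions.
    """
    # Local SNR per TF unit, in dB, from magnitude spectra.
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (interference_mag + eps))
    return (local_snr_db > threshold_db).astype(np.float64)


def apply_mask(mixture_stft, mask):
    """Mask the mixture magnitude and reuse the mixture phase."""
    magnitude = np.abs(mixture_stft) * mask
    phase = np.angle(mixture_stft)
    return magnitude * np.exp(1j * phase)


# Toy TF grid (rows: frequency channels, columns: time frames); hypothetical values.
target_mag = np.array([[1.0, 0.1], [0.5, 2.0]])
interference_mag = np.array([[0.1, 1.0], [0.5, 0.2]])
mixture_stft = np.array([[1.0 + 1.0j, 0.5 - 0.5j], [0.3 + 0.4j, -2.0 + 0.0j]])

mask = ideal_binary_mask(target_mag, interference_mag)
separated = apply_mask(mixture_stft, mask)
```

In a full pipeline the masked spectrogram would be passed through an inverse STFT to obtain a waveform. That step is omitted here to keep the sketch dependent on NumPy only.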
Pages: 973-984
Page count: 12
Related papers
50 in total
  • [1] A CASA APPROACH TO DEEP LEARNING BASED SPEAKER-INDEPENDENT CO-CHANNEL SPEECH SEPARATION
    Liu, Yuzhou
    Wang, DeLiang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5399 - 5403
  • [2] Speaker-independent model-based single channel speech separation
    Radfar, M. H.
    Dansereau, R. M.
    Sayadiyan, A.
    [J]. NEUROCOMPUTING, 2008, 72 (1-3) : 71 - 78
  • [3] Speaker Verification Based on Single Channel Speech Separation
    Jin, Rong
    Ablimit, Mijit
    Hamdulla, Askar
    [J]. IEEE ACCESS, 2023, 11 : 112631 - 112638
  • [4] UNIVERSAL SPEECH MODELS FOR SPEAKER INDEPENDENT SINGLE CHANNEL SOURCE SEPARATION
    Sun, Dennis L.
    Mysore, Gautham J.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 141 - 145
  • [5] Soft-CASA system for Single Channel Speech Separation
    Wiem, Belhedi
    Anouar, Ben Messaoud Mohamed
    Aicha, Bouzid
    [J]. 2016 4TH INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING & INFORMATION TECHNOLOGY (CEIT), 2016,
  • [6] Speaker Verification-Based Evaluation of Single-Channel Speech Separation
    Maciejewski, Matthew
    Watanabe, Shinji
    Khudanpur, Sanjeev
    [J]. INTERSPEECH 2021, 2021, : 3520 - 3524
  • [7] Deep neural networks based binary classification for single channel speaker independent multi-talker speech separation
    Saleem, Nasir
    Khattak, Muhammad Irfan
    [J]. APPLIED ACOUSTICS, 2020, 167
  • [8] JOINT SINGLE-CHANNEL SPEECH SEPARATION AND SPEAKER IDENTIFICATION
    Mowlaee, P.
    Saeidi, R.
    Tan, Z. -H.
    Christensen, M. G.
    Franti, P.
    Jensen, S. H.
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4430 - 4433
  • [9] SPEAKER AND NOISE INDEPENDENT ONLINE SINGLE-CHANNEL SPEECH ENHANCEMENT
    Germain, Francois G.
    Mysore, Gautham J.
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 71 - 75
  • [10] A Joint Approach for Single-Channel Speaker Identification and Speech Separation
    Mowlaee, Pejman
    Saeidi, Rahim
    Christensen, Mads Græsbøll
    Tan, Zheng-Hua
    Kinnunen, Tomi
    Franti, Pasi
    Jensen, Soren Holdt
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (09): : 2586 - 2601