An iterative model-based approach to cochannel speech separation

Cited by: 18
Authors
Hu, Ke [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit & Brain Sci, Columbus, OH 43210 USA
Keywords
AUDITORY SCENE ANALYSIS; SEQUENTIAL ORGANIZATION; CHANNEL; RECOGNITION; SEGREGATION; SYSTEM
DOI
10.1186/1687-4722-2013-14
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Cochannel speech separation aims to separate two speech signals from a single mixture. In a supervised scenario, the identities of two speakers are given, and current methods use pre-trained speaker models for separation. One issue in model-based methods is the mismatch between training and test signal levels. We propose an iterative algorithm to adapt speaker models to match the signal levels in testing. Our algorithm first obtains initial estimates of source signals using unadapted speaker models and then detects the input signal-to-noise ratio (SNR) of the mixture. The input SNR is then used to adapt the speaker models for more accurate estimation. The two steps iterate until convergence. Compared to search-based SNR detection methods, our method is not limited to given SNR levels. Evaluations demonstrate that the iterative procedure converges quickly in a considerable range of SNRs and improves separation results significantly. Comparisons show that the proposed system performs significantly better than related model-based systems.
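The estimate-then-adapt loop described in the abstract can be illustrated with a toy fixed-point iteration. This is only a sketch under strong assumptions, not the authors' system: each speaker model is reduced to a single hypothetical power-spectrum template (`MODEL_A`, `MODEL_B`), separation is a simple soft mask, and the function name `estimate_input_snr` and all numeric values are invented for illustration. It shows the general idea that the SNR estimate is refined continuously rather than chosen from a fixed set of candidate levels.

```python
import math


def estimate_input_snr(mixture, model_a, model_b, tol_db=1e-6, max_iter=100):
    """Toy iterative input-SNR estimation (illustrative assumption only).

    mixture, model_a, model_b: per-bin power values; both models are
    assumed to have been trained at the same overall level.
    Returns (estimated SNR in dB, number of iterations used).
    """
    snr_db = 0.0  # start from unadapted models, i.e. equal levels
    for iteration in range(1, max_iter + 1):
        # Adapt the models: rescale speaker A relative to speaker B.
        ratio = 10.0 ** (snr_db / 10.0)
        energy_a = 0.0
        energy_b = 0.0
        for y, pa, pb in zip(mixture, model_a, model_b):
            # Estimate the sources with a soft mask from the adapted models.
            scaled_a = ratio * pa
            mask = scaled_a / (scaled_a + pb)
            energy_a += mask * y
            energy_b += (1.0 - mask) * y
        # Re-detect the input SNR from the estimated source energies.
        new_snr_db = 10.0 * math.log10(energy_a / energy_b)
        converged = abs(new_snr_db - snr_db) < tol_db
        snr_db = new_snr_db
        if converged:
            return snr_db, iteration
    return snr_db, max_iter


# Hypothetical templates with equal total energy and partial overlap.
MODEL_A = [4.0, 3.0, 2.0, 1.0, 0.5, 0.5]
MODEL_B = [0.5, 0.5, 1.0, 2.0, 3.0, 4.0]

# Build a synthetic mixture at a known input SNR of 6 dB.
TRUE_SNR_DB = 6.0
gain = 10.0 ** (TRUE_SNR_DB / 10.0)
MIXTURE = [gain * pa + pb for pa, pb in zip(MODEL_A, MODEL_B)]

estimated_snr_db, iterations_used = estimate_input_snr(MIXTURE, MODEL_A, MODEL_B)
print(f"estimated input SNR: {estimated_snr_db:.3f} dB "
      f"after {iterations_used} iterations")
```

In this toy setting the true SNR is a fixed point of the update because the two templates have equal total energy; with identical templates the mask carries no speaker information and the iteration would not move from its starting value.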
Pages: 11
Related papers
50 in total
  • [1] An iterative model-based approach to cochannel speech separation
    Ke Hu
    DeLiang Wang
    [J]. EURASIP Journal on Audio, Speech, and Music Processing, 2013
  • [2] Model-based sequential organization in cochannel speech
    Shao, Y
    Wang, DL
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): 289-298
  • [3] An Unsupervised Approach to Cochannel Speech Separation
    Hu, Ke
    Wang, DeLiang
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (01): 120-129
  • [4] COCHANNEL SPEECH SEPARATION
    LEE, CK
    CHILDERS, DG
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1988, 83 (01): 274-280
  • [5] An Iterative Speech Model-Based A Priori SNR Estimator
    Elshamy, Samy
    Madhu, Nilesh
    Tirry, Wouter
    Fingscheidt, Tim
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 1740-1744
  • [6] PITCH TRACKING FOR MODEL-BASED SPEECH SEPARATION
    Lee, S. W.
    Soong, Frank K.
    Ching, P. C.
    Lee, Tan
    [J]. 2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008: 145-148
  • [7] A generalized approach for model-based speaker-dependent single channel speech separation
    Radfar, M. H.
    Sayadiyan, A.
    Dansereau, R. M.
    [J]. IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY TRANSACTION B-ENGINEERING, 2007, 31 (B3): 361-375
  • [8] Model-based Speech Separation with Single-microphone Input
    Lee, S. W.
    Soong, Frank K.
    Ching, P. C.
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007: 2648-2651
  • [9] GAIN ESTIMATION IN MODEL-BASED SINGLE CHANNEL SPEECH SEPARATION
    Radfar, M. H.
    Wong, W.
    Chan, W-Y.
    Dansereau, R. M.
    [J]. 2009 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2009: 423-+
  • [10] Model-based Speech Separation: Identifying Transcription using Orthogonality
    Lee, S. W.
    Soong, Frank K.
    Lee, Tan
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 1363-1366