A Joint Approach for Single-Channel Speaker Identification and Speech Separation

Cited by: 34
Authors
Mowlaee, Pejman [1]
Saeidi, Rahim [2]
Christensen, Mads Græsbøll [3]
Tan, Zheng-Hua [4]
Kinnunen, Tomi [5]
Fränti, Pasi [5]
Jensen, Søren Holdt [4]
Affiliations
[1] Ruhr Univ Bochum, Inst Commun Acoust IKA, D-44801 Bochum, Germany
[2] Radboud Univ Nijmegen, Ctr Language & Speech Technol, NL-6500 HD Nijmegen, Netherlands
[3] Aalborg Univ, Dept Architecture Design & Media Technol, DK-9220 Aalborg, Denmark
[4] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
[5] Univ Eastern Finland, Sch Comp, FI-70211 Kuopio, Finland
Funding
Academy of Finland
Keywords
BSS EVAL; single-channel speech separation; sinusoidal modeling; speaker identification; speech recognition; ENHANCEMENT; MODEL; PERFORMANCE;
DOI
10.1109/TASL.2012.2208627
CLC Number
O42 [Acoustics]
Subject Classification
070206; 082403
Abstract
In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, a single-channel speaker identification algorithm is proposed which provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm. The speech separation algorithm consists of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters that finds optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from a situation where we have prior information on the codebook indices, speaker identities, and SSR level, and then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies that mostly focus on automatic speech recognition (ASR) accuracy, here we report objective and subjective results as well. The results show that the proposed system performs as well as the best of the state-of-the-art in terms of perceived quality, while its speaker identification and automatic speech recognition results are generally lower. It outperforms the state-of-the-art in terms of intelligibility, showing that the ASR results are not conclusive. The proposed method achieves, on average, 52.3% ASR accuracy, 41.2 points in MUSHRA, and 85.9% speech intelligibility.
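To make the codebook-matching idea in the abstract concrete, below is a minimal sketch, assuming a simple magnitude-spectrum frame representation: an MMSE-style search over two pre-trained speaker codebooks that picks the codevector pair (and gain) best explaining a single-channel mixture frame, with the chosen gain serving as a rough signal-to-signal ratio (SSR) estimate. This is not the authors' implementation; the codebooks, gain grid, and frame dimensions are hypothetical placeholders.

```python
# Illustrative MMSE-style codebook search for single-channel separation.
# NOT the paper's implementation: codebooks, gain grid (candidate SSR levels),
# and the magnitude-spectrum frame representation are hypothetical.
import numpy as np

def separate_frame(mix_mag, codebook_a, codebook_b, gains_db=np.arange(-9.0, 10.0, 3.0)):
    """Find the codevector pair and gain minimizing the MSE between the
    mixture magnitude spectrum and the sum of the scaled codevectors.
    Returns the two source estimates and the selected gain (rough SSR, dB)."""
    best_mse, best_a, best_b, best_ssr = np.inf, None, None, 0.0
    for g_db in gains_db:
        g = 10.0 ** (g_db / 20.0)            # amplitude ratio of speaker A to speaker B
        for ca in codebook_a:                # candidate codevector for speaker A
            for cb in codebook_b:            # candidate codevector for speaker B
                est = g * ca + cb            # additive mixture model for this pair
                mse = np.mean((mix_mag - est) ** 2)
                if mse < best_mse:
                    best_mse, best_a, best_b, best_ssr = mse, g * ca, cb, g_db
    return best_a, best_b, best_ssr

# Toy usage with random "codebooks" of non-negative spectral vectors.
rng = np.random.default_rng(0)
cb_a = np.abs(rng.normal(size=(64, 129)))    # 64 codevectors, 129 frequency bins
cb_b = np.abs(rng.normal(size=(64, 129)))
mixture = 0.8 * cb_a[3] + cb_b[17]           # synthetic mixture frame
a_hat, b_hat, ssr_db = separate_frame(mixture, cb_a, cb_b)
print(f"estimated SSR: {ssr_db:.1f} dB")
```

In the paper's system, the estimator operates on sinusoidal parameters rather than raw magnitude spectra and is preceded by a double-talk/single-talk detector; the exhaustive pairwise search above is only meant to convey the codebook-matching and SSR-estimation idea.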
Pages: 2586-2601
Page count: 16