Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization

Cited by: 105
Authors
Yoshioka, Takuya [1 ]
Nakatani, Tomohiro [1 ]
Miyoshi, Masato [2 ]
Okuno, Hiroshi G. [3 ]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[2] Kanazawa Univ, Grad Sch Nat Sci & Technol, Kanazawa, Ishikawa 9201192, Japan
[3] Kyoto Univ, Grad Sch Informat, Dept Intelligence Sci & Technol, Kyoto 6068501, Japan
Keywords
Blind source separation (BSS); blind dereverberation (BD); conditional separation and dereverberation (CSD); convolutive mixtures; identification; deconvolution; suppression; algorithms; signals
DOI
10.1109/TASL.2010.2045183
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.
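The tandem structure described in the abstract (a dereverberation network followed by a separation network) can be illustrated with a minimal sketch of the forward pass in a single frequency bin. The abstract gives no equations, so the form used here is an assumption: dereverberation is modeled as delayed linear prediction, subtracting from each frame a prediction built from past frames with prediction matrices `G`, and separation is a multiplication by a separation matrix `W`. The names `tandem_forward`, `matvec`, `G`, `W`, and `delay` are hypothetical, and the joint (alternating) optimization of `G` and `W` is not shown.

```python
def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def tandem_forward(x, G, W, delay):
    """Apply an assumed dereverberation-then-separation network.

    x     : list of frames, each a list of M (possibly complex) channel values
    G     : list of K prediction matrices (each M x M), an assumed parameterization
    W     : separation matrix (M x M)
    delay : prediction delay in frames, so only late reflections are predicted
    """
    out = []
    for t in range(len(x)):
        d = list(x[t])
        # Dereverberation stage: subtract the part of x[t] predicted
        # from frames at least `delay` steps in the past.
        for k, Gk in enumerate(G):
            past = t - delay - k
            if past >= 0:
                pred = matvec(Gk, x[past])
                d = [di - pi for di, pi in zip(d, pred)]
        # Separation stage: mix the dereverberated channels with W.
        out.append(matvec(W, d))
    return out


# Toy single-channel example: an impulse followed by an echo
# of gain 0.5 arriving two frames later.
x = [[1.0], [0.0], [0.5], [0.0]]
G = [[[0.5]]]   # one 1x1 prediction matrix matching the echo gain
W = [[1.0]]     # identity "separation" for a single channel
y = tandem_forward(x, G, W, delay=2)
```

In this toy case the prediction matrix matches the echo exactly, so the echo is cancelled and only the impulse remains in `y`. In the method the abstract describes, both sets of matrices are unknown and are estimated alternately, each conditioned on the other.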
Pages: 69-84
Page count: 16