Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization

Cited: 105
Authors
Yoshioka, Takuya [1 ]
Nakatani, Tomohiro [1 ]
Miyoshi, Masato [2 ]
Okuno, Hiroshi G. [3 ]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[2] Kanazawa Univ, Grad Sch Nat Sci & Technol, Kanazawa, Ishikawa 9201192, Japan
[3] Kyoto Univ, Grad Sch Informat, Dept Intelligence Sci & Technol, Kyoto 6068501, Japan
Keywords
Blind source separation (BSS); blind dereverberation (BD); conditional separation and dereverberation (CSD); CONVOLUTIVE MIXTURES; IDENTIFICATION; DECONVOLUTION; SUPPRESSION; ALGORITHMS; SIGNALS;
DOI
10.1109/TASL.2010.2045183
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
This paper proposes a method for performing blind source separation (BSS) and blind dereverberation (BD) at the same time for speech mixtures. In most previous studies, BSS and BD have been investigated separately. The separation performance of conventional BSS methods deteriorates as the reverberation time increases, while many existing BD methods rely on the assumption that there is only one sound source in a room. Therefore, it has been difficult to perform both BSS and BD when the reverberation time is long. The proposed method uses a network, in which dereverberation and separation networks are connected in tandem, to estimate source signals. The parameters for the dereverberation network (prediction matrices) and those for the separation network (separation matrices) are jointly optimized. This enables a BD process to take a BSS process into account. The prediction and separation matrices are alternately optimized, with each depending on the other; hence, we call the proposed method the conditional separation and dereverberation (CSD) method. Comprehensive evaluation results are reported, where all the speech materials contained in the complete test set of the TIMIT corpus are used. The CSD method improves the signal-to-interference ratio by an average of about 4 dB over the conventional frequency-domain BSS approach for reverberation times of 0.3 and 0.5 s. The direct-to-reverberation ratio is also improved by about 10 dB.
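The abstract describes a tandem structure whose prediction (dereverberation) and separation matrices are optimized alternately. The sketch below is a rough illustration of that alternating idea only, not the authors' implementation: within a single STFT frequency bin it interleaves a weighted-prediction-error (WPE-style) update of a delayed linear-prediction filter with a natural-gradient ICA update of the separation matrix. The function name, the shared source-power weighting, the step size, and the demo data are all assumptions made for brevity.

```python
# Illustrative sketch of alternating dereverberation/separation updates in one
# STFT frequency bin. Assumed, simplified update rules; not the CSD algorithm
# as published.
import numpy as np

def csd_like_single_bin(X, taps=5, delay=2, n_iter=20, eps=1e-6):
    """X: (n_mics, n_frames) complex STFT coefficients of one frequency bin.
    Returns (Y, W, G): output signals, separation matrix, prediction filter."""
    M, T = X.shape
    # Stack delayed past observations used by the linear-prediction filter.
    Xbar = np.zeros((M * taps, T), dtype=complex)
    for k in range(taps):
        shift = delay + k
        Xbar[k * M:(k + 1) * M, shift:] = X[:, :T - shift]

    G = np.zeros((M * taps, M), dtype=complex)  # prediction (dereverberation) filter
    W = np.eye(M, dtype=complex)                # separation matrix

    for _ in range(n_iter):
        # 1) Dereverberate with the current prediction filter.
        D = X - G.conj().T @ Xbar
        # 2) Separate and form a crude time-varying source-power estimate
        #    (shared across sources here purely for brevity).
        Y = W @ D
        lam = np.maximum(np.mean(np.abs(Y) ** 2, axis=0), eps)
        # 3) Weighted-prediction-error update of the prediction filter,
        #    weighting frames by the current power estimate.
        Xw = Xbar / lam
        R = Xw @ Xbar.conj().T + eps * np.eye(M * taps)
        P = Xw @ X.conj().T
        G = np.linalg.solve(R, P)
        # 4) One natural-gradient ICA step on the dereverberated signal,
        #    using a Laplacian-like score function phi(y) = y / |y|.
        D = X - G.conj().T @ Xbar
        Y = W @ D
        phi = Y / np.maximum(np.abs(Y), eps)
        W += 0.1 * (np.eye(M) - (phi @ Y.conj().T) / T) @ W

    return W @ (X - G.conj().T @ Xbar), W, G

if __name__ == "__main__":
    # Toy demo on an instantaneous (non-reverberant) mixture, just to show usage.
    rng = np.random.default_rng(0)
    S = rng.laplace(size=(2, 400)) + 1j * rng.laplace(size=(2, 400))
    A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    Y, W, G = csd_like_single_bin(A @ S)
    print("separated shape:", Y.shape)
```

In the paper's formulation the two parameter sets share one joint objective and are updated conditioned on each other; the sketch mimics that coupling by letting the separated-signal powers weight the prediction-filter update and by re-running separation on the freshly dereverberated signal.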
Pages: 69-84
Number of pages: 16
Related Papers (50 records in total)
  • [11] A Semi-blind Source Separation Approach for Speech Dereverberation. Wang, Ziteng; Na, Yueyue; Liu, Zhang; Li, Yun; Tian, Biao; Fu, Qiang. INTERSPEECH 2020, 2020: 3925-3929.
  • [12] Joint Separation and Dereverberation of Reverberant Mixtures with Multichannel Variational Autoencoder. Inoue, Shota; Kameoka, Hirokazu; Li, Li; Seki, Shogo; Makino, Shoji. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 96-100.
  • [13] Blind and Spatially-Regularized Online Joint Optimization of Source Separation, Dereverberation, and Noise Reduction. Ueda, Tetsuya; Nakatani, Tomohiro; Ikeshita, Rintaro; Kinoshita, Keisuke; Araki, Shoko; Makino, Shoji. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32: 1157-1172.
  • [14] Blind dereverberation of a speech signal. Zverev, V. A. Acoustical Physics, 2008, 54: 261-268.
  • [15] Blind dereverberation of a speech signal. Zverev, V. A. ACOUSTICAL PHYSICS, 2008, 54 (02): 261-268.
  • [16] A Hybrid Reverberation Model and Its Application to Joint Speech Dereverberation and Separation. Liu, Tongzheng; Lu, Zhihua; da Costa, Joao Paulo J.; Fei, Tai. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 3000-3014.
  • [17] Relaxed Disjointness Based Clustering for Joint Blind Source Separation and Dereverberation. Ito, Nobutaka; Araki, Shoko; Yoshioka, Takuya; Nakatani, Tomohiro. 2014 14TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2014: 268-272.
  • [18] Independent Vector Extraction for Fast Joint Blind Source Separation and Dereverberation. Ikeshita, Rintaro; Nakatani, Tomohiro. IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 972-976.
  • [19] Speech Recognition Using Blind Source Separation and Dereverberation Method for Mixed Sound of Speech and Music. Wang, Longbiao; Odani, Kyohei; Kai, Atsuhiko; Li, Weifeng. 2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013.
  • [20] Blind separation of speech mixtures based on nonstationarity. Pham, DT; Servière, C; Boumaraf, H. SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 2, PROCEEDINGS, 2003: 73-76.