Variational probabilistic speech separation using microphone arrays

Cited by: 7
Authors
Rennie, Steven J. [1 ]
Aarabi, Parham [1 ]
Frey, Brendan J. [1 ]
Affiliations
[1] Univ Toronto, Dept Comp Engn, Toronto, ON M5S 3G4, Canada
Keywords
approximate inference; microphone arrays; phase-based speech processing; probabilistic graphical models; robust speech recognition; speech separation; variational methods
DOI
10.1109/TASL.2006.876865
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Separating multiple speech sources using a limited number of noisy sensor measurements is a difficult problem of great practical interest. Although previously introduced source separation methods [such as independent component analysis (ICA)] can be made to work in many situations, most of these methods fail when the sensors are very noisy or when the number of sources exceeds the number of sensors. Our approach to this problem is to combine the multiple sensor likelihoods [obtained using time-delay-of-arrival (TDOA) information] with a generative probability model of the sources. This model accounts for the power spectrum of each source using a mixture model, and accounts for the phase of each source using one discretized hidden phase variable for each frequency. Source separation is achieved by identifying the source vector configuration of maximum a posteriori (MAP) probability, given all available information. An exhaustive search for the MAP configuration is computationally intractable, but we present an efficient variational technique that performs approximate probabilistic inference. For the problem of separating delayed speech mixtures corrupted by additive noise, the algorithm improves upon the signal-to-noise ratio (SNR) gain of existing state-of-the-art probabilistic and TDOA-based speech separation algorithms by over 10 dB. This significant performance improvement is obtained by intelligently combining the information used by these approaches under a representative probabilistic description of the speech production and mixing process. The method is capable of recovering high-fidelity estimates of the underlying speech sources even when there are more sources than microphone observations.
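The abstract reports its result as an SNR gain in dB. As a rough illustration of what such a figure measures, the sketch below computes an SNR gain as the output SNR minus the input SNR, each taken against a known clean reference. This is a common convention and is assumed here; the record does not state the exact definition used in the paper. The function names `snr_db` and `snr_gain_db` and the synthetic test signal are hypothetical, chosen only for this example.

```python
import numpy as np


def snr_db(reference, estimate):
    """SNR in dB of `estimate` measured against the clean `reference`."""
    reference = np.asarray(reference, dtype=float)
    residual = np.asarray(estimate, dtype=float) - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))


def snr_gain_db(reference, observed, separated):
    """SNR gain (assumed definition): output SNR minus input SNR, in dB."""
    return snr_db(reference, separated) - snr_db(reference, observed)


# Synthetic stand-in for a speech signal: a 440 Hz tone sampled at 16 kHz.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(clean.shape)

observed = clean + noise          # noisy microphone observation
separated = clean + 0.1 * noise   # pretend output with 10x smaller residual

gain = snr_gain_db(clean, observed, separated)
print(f"SNR gain: {gain:.2f} dB")
```

Shrinking the residual amplitude by a factor of 10 reduces its power by a factor of 100, which corresponds to a 20 dB gain in this toy setup. The numbers here are synthetic and say nothing about the paper's actual experiments.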
Pages: 135-149
Number of pages: 15