Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain

Cited by: 6
Authors
SongGong, Kunkun [1]
Chen, Huawei [1]
Wang, Wenwu [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China;
Keywords
Direction-of-arrival estimation; Location awareness; Estimation; Harmonic analysis; Reverberation; Array signal processing; Sensor arrays; Multi-speaker localization; Bayesian nonparametrics (BNP); circular harmonics; direction of arrival (DOA) estimation; microphone array signal processing; SOUND SOURCE LOCALIZATION; OF-ARRIVAL ESTIMATION; MICROPHONE ARRAY; DECOMPOSITION; HOLOGRAPHY; SEPARATION; SPEAKERS; NOISE;
DOI
10.1109/TASLP.2021.3079809
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
Circular microphone arrays have been used for multi-speaker localization in computational auditory scene analysis because of their flexibility in sound field analysis, including the generation of frequency-invariant eigenbeams for wideband acoustic sources. However, the localization performance of existing circular harmonic approaches, such as the circular harmonics beamformer (CHB), depends strongly on the physical characteristics of the sensor array (such as its shape) and on the level of uncertainty in the acoustic environment (such as background noise, room reverberation, and the number of sources). These uncertainties may limit the performance or practical application of speaker localization algorithms. To address these issues, we present a new indoor multi-speaker localization method in the circular harmonic domain based on the acoustic holography beamforming (AHB) technique and the Bayesian nonparametrics (BNP) method. More specifically, we use the AHB technique, which combines delay-and-sum beamforming with acoustic-holography-based virtual sensing, to generate direction of arrival (DOA) measurements in the time-frequency (TF) domain, and then design a BNP algorithm based on the infinite Gaussian mixture model (IGMM) to estimate the DOAs of the individual sources without prior knowledge of the number of sources. These estimates may degrade in the presence of room reverberation and background noise. To address this issue, we develop a robust TF bin selection and permutation method on the basis of mixture weights, using the power, power ratio, and local variance estimated at each TF bin. Experiments performed on both simulated and real data show that our method performs significantly better than four recent baseline methods across a variety of noise and reverberation levels, in terms of the root-mean-square error (RMSE) of the DOA estimation and the source detection success rate.
Pages: 1864-1880
Number of pages: 17
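
Illustrative note: the abstract describes clustering per-bin DOA measurements without fixing the number of sources in advance and scoring the result by RMSE. The sketch below is not the paper's IGMM inference. It swaps in DP-means, a much simpler hard-assignment relative of Dirichlet-process mixtures, applied to angles on the circle, purely to show how a cluster count can emerge from the data. All function names, the 30-degree threshold, the first-measurement initialisation, and the synthetic measurements are assumptions made for this example.

```python
import math


def ang_diff(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def circ_mean(angles):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def dp_means_circular(angles, threshold_deg, n_iter=20):
    """DP-means-style clustering of DOA measurements (hypothetical helper).

    A measurement farther than threshold_deg from every current centre
    opens a new cluster, so the number of sources is not given up front.
    """
    centers = [angles[0]]  # assumption: start from the first measurement
    for _ in range(n_iter):
        labels = []
        for a in angles:
            dists = [abs(ang_diff(a, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > threshold_deg:
                centers.append(a)  # open a new cluster at this measurement
                k = len(centers) - 1
            labels.append(k)
        # Re-estimate each centre as the circular mean of its members,
        # dropping clusters that ended up empty.
        new_centers = []
        for k in range(len(centers)):
            members = [a for a, lab in zip(angles, labels) if lab == k]
            if members:
                new_centers.append(circ_mean(members))
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def doa_rmse(estimates, truths):
    """RMSE in degrees between index-paired DOA estimates and true DOAs."""
    sq = [ang_diff(e, t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(sq) / len(sq))


# Synthetic measurements: three sources, one of them straddling 0 degrees.
true_doas = [90.0, 200.0, 358.0]
offsets = (-6.0, -3.0, 0.0, 3.0, 6.0)
measurements = [(src + off) % 360.0 for src in (358.0, 90.0, 200.0) for off in offsets]

centers = dp_means_circular(measurements, threshold_deg=30.0)
print(len(centers), "sources found; RMSE =", doa_rmse(centers, true_doas))
```

The threshold plays the role that the concentration parameter plays in a Dirichlet-process mixture: a smaller value opens clusters more readily. A full IGMM would additionally infer per-cluster variances and soft mixture weights, which the abstract says the method uses for TF bin selection.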