Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain

Cited by: 6

Authors:

SongGong, Kunkun [1]

Chen, Huawei [1]

Wang, Wenwu [2]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China

[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29

Funding:

National Natural Science Foundation of China;

Keywords:

Direction-of-arrival estimation; Location awareness; Estimation; Harmonic analysis; Reverberation; Array signal processing; Sensor arrays; Multi-speaker localization; Bayesian nonparametrics (BNP); circular harmonics; direction of arrival (DOA) estimation; microphone array signal processing; SOUND SOURCE LOCALIZATION; OF-ARRIVAL ESTIMATION; MICROPHONE ARRAY; DECOMPOSITION; HOLOGRAPHY; SEPARATION; SPEAKERS; NOISE;

DOI:

10.1109/TASLP.2021.3079809

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Circular microphone arrays have been used for multi-speaker localization in computational auditory scene analysis because of their high flexibility in sound field analysis, including the generation of frequency-invariant eigenbeams for wideband acoustic sources. However, the localization performance of existing circular harmonic approaches, such as the circular harmonics beamformer (CHB), depends strongly on the physical characteristics (such as shape) of the sensor array and on the level of uncertainty present in the acoustic environment (such as background noise, room reverberation, and the number of sources). These uncertainties may limit the performance or practical application of speaker localization algorithms. To address these issues, we present a new indoor multi-speaker localization method in the circular harmonic domain based on the acoustic holography beamforming (AHB) technique and the Bayesian nonparametrics (BNP) method. More specifically, we use the AHB technique, which combines delay-and-sum beamforming with acoustic-holography-based virtual sensing, to generate direction of arrival (DOA) measurements in the time-frequency (TF) domain, and then design a BNP algorithm based on the infinite Gaussian mixture model (IGMM) to estimate the DOAs of the individual sources without prior knowledge of the number of sources. These estimates may degrade in the presence of room reverberation and background noise. To address this issue, we develop a robust TF bin selection and permutation method on the basis of mixture weights, using power, power ratio and local variance estimated at each TF bin. Experiments performed on both simulated and real data show that our method gives significantly better performance than four recent baseline methods, in a variety of noise and reverberation levels, in terms of the root-mean-square error (RMSE) of the DOA estimation and the source detection success rate.
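
The idea of estimating DOAs without knowing the number of sources can be illustrated with a small sketch. This is not the IGMM algorithm of the paper: it uses DP-means, a hard-assignment clustering rule that is related to Dirichlet-process mixtures, applied to per-bin DOA measurements in degrees. The function names, the `penalty_deg` threshold, and the sample measurements are all hypothetical and chosen only for illustration.

```python
import math


def circ_dist(a, b):
    """Smallest absolute angular difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def circ_mean(angles):
    """Circular mean of angles in degrees, mapped to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def dp_means_doa(measurements, penalty_deg=20.0, max_iter=50):
    """Cluster DOA measurements; the number of clusters is not fixed in advance.

    A new cluster is opened whenever a measurement lies farther than
    penalty_deg (an assumed tuning parameter) from every current centre.
    """
    centres = [circ_mean(measurements)]
    for _ in range(max_iter):
        labels = []
        for x in measurements:
            dists = [circ_dist(x, c) for c in centres]
            k = min(range(len(centres)), key=dists.__getitem__)
            if dists[k] > penalty_deg:
                centres.append(x)  # open a new cluster at this measurement
                k = len(centres) - 1
            labels.append(k)
        # Recompute each centre as the circular mean of its members,
        # dropping clusters that ended up empty.
        new_centres = []
        for k in range(len(centres)):
            members = [x for x, lab in zip(measurements, labels) if lab == k]
            if members:
                new_centres.append(circ_mean(members))
        converged = len(new_centres) == len(centres) and all(
            circ_dist(a, b) < 1e-9 for a, b in zip(new_centres, centres)
        )
        centres = new_centres
        if converged:
            break
    return sorted(centres)


# Made-up DOA measurements (degrees) from two hypothetical speakers.
measurements = [58, 61, 60, 62, 59, 198, 201, 200, 202, 199]
estimates = dp_means_doa(measurements)
print(len(estimates), [round(e, 1) for e in estimates])
```

The threshold plays a role loosely analogous to the concentration parameter in a Dirichlet-process mixture: a smaller value opens clusters more readily. A full IGMM would additionally infer per-cluster variances and mixture weights, which this sketch does not attempt.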


Pages: 1864-1880

Page count: 17

