Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain

Cited by: 6
Authors
SongGong, Kunkun [1]
Chen, Huawei [1]
Wang, Wenwu [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China;
Keywords
Direction-of-arrival estimation; Location awareness; Estimation; Harmonic analysis; Reverberation; Array signal processing; Sensor arrays; Multi-speaker localization; Bayesian nonparametrics (BNP); circular harmonics; direction of arrival (DOA) estimation; microphone array signal processing; SOUND SOURCE LOCALIZATION; OF-ARRIVAL ESTIMATION; MICROPHONE ARRAY; DECOMPOSITION; HOLOGRAPHY; SEPARATION; SPEAKERS; NOISE;
DOI
10.1109/TASLP.2021.3079809
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Circular microphone arrays have been used for multi-speaker localization in computational auditory scene analysis because of their high flexibility in sound field analysis, including the generation of frequency-invariant eigenbeams for wideband acoustic sources. However, the localization performance of existing circular harmonic approaches, such as the circular harmonics beamformer (CHB), depends strongly on the physical characteristics (such as shape) of the sensor array and on the level of uncertainty present in the acoustic environment (such as background noise, room reverberation, and the number of sources). These uncertainties may limit the performance or practical application of speaker localization algorithms. To address these issues, we present a new indoor multi-speaker localization method in the circular harmonic domain based on the acoustic holography beamforming (AHB) technique and the Bayesian nonparametrics (BNP) method. More specifically, we use the AHB technique, which combines delay-and-sum beamforming with acoustic-holography-based virtual sensing, to generate direction of arrival (DOA) measurements in the time-frequency (TF) domain. We then design a BNP algorithm based on the infinite Gaussian mixture model (IGMM) to estimate the DOAs of the individual sources without prior knowledge of the number of sources. These estimates may degrade in the presence of room reverberation and background noise. To address this issue, we develop a robust TF bin selection and permutation method on the basis of mixture weights, using the power, power ratio, and local variance estimated at each TF bin. Experiments performed on both simulated and real data show that our method performs significantly better than four recent baseline methods over a variety of noise and reverberation levels, in terms of the root-mean-square error (RMSE) of the DOA estimation and the source detection success rate.
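The idea of estimating DOAs without fixing the number of sources in advance can be illustrated with a much simpler stand-in. The sketch below is not the IGMM algorithm from the paper: it uses DP-means, a hard-assignment clustering rule that opens a new cluster whenever a measurement lies farther than a threshold from every existing center. The function names, the threshold `lam_deg`, and the synthetic DOA measurements are all assumptions made for illustration. It also shows one plausible way to compute a wrap-around-aware RMSE in degrees.

```python
import math

def wrap_deg(a):
    """Wrap an angle difference to the interval [-180, 180) degrees."""
    return (a + 180.0) % 360.0 - 180.0

def circ_mean_deg(angles):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0

def dp_means_doa(doas, lam_deg=30.0, n_iter=20):
    """Cluster DOA measurements (degrees) with DP-means.

    lam_deg is a hypothetical threshold: a measurement farther than
    lam_deg from every center starts a new cluster, so the number of
    clusters is not fixed beforehand.
    """
    centers = [doas[0]]
    prev_labels = None
    for _ in range(n_iter):
        labels = []
        for a in doas:
            dists = [abs(wrap_deg(a - c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam_deg:
                centers.append(a)          # open a new cluster
                k = len(centers) - 1
            labels.append(k)
        # Recompute centers as circular means and drop empty clusters.
        new_centers, remap = [], {}
        for k in range(len(centers)):
            members = [a for a, l in zip(doas, labels) if l == k]
            if members:
                remap[k] = len(new_centers)
                new_centers.append(circ_mean_deg(members))
        labels = [remap[l] for l in labels]
        centers = new_centers
        if labels == prev_labels:          # assignments stopped changing
            break
        prev_labels = labels
    return centers, labels

def doa_rmse(estimates, references):
    """RMSE in degrees, pairing each reference with its nearest estimate."""
    errs = [min(abs(wrap_deg(e - r)) for e in estimates) for r in references]
    return math.sqrt(sum(d * d for d in errs) / len(errs))

# Synthetic, noise-free-on-average measurements around three assumed sources;
# the third source sits near the 0/360 degree wrap-around.
true_doas = [40.0, 170.0, 355.0]
offsets = [-6.0, -3.0, 0.0, 3.0, 6.0]
doas = [(c + o) % 360.0 for o in offsets for c in true_doas]

centers, labels = dp_means_doa(doas)
print("estimated number of sources:", len(centers))
print("estimated DOAs (deg):", [round(c, 2) for c in sorted(centers)])
print("RMSE (deg):", doa_rmse(centers, true_doas))
```

A full Bayesian nonparametric treatment would instead place a Dirichlet process prior over mixture components and infer soft assignments and mixture weights; the threshold rule here only mimics the "unknown number of sources" behavior.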
Pages: 1864-1880
Number of pages: 17
Related papers
50 in total
  • [31] DIRECTIONAL ASR: A NEW PARADIGM FOR E2E MULTI-SPEAKER SPEECH RECOGNITION WITH SOURCE LOCALIZATION
    Subramanian, Aswin Shanmugam
    Weng, Chao
    Watanabe, Shinji
    Yu, Meng
    Xu, Yong
    Zhang, Shi-Xiong
    Yu, Dong
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8433 - 8437
  • [32] Single Channel multi-speaker speech Separation based on quantized ratio mask and residual network
    Ke, Shanfa
    Hu, Ruimin
    Wang, Xiaochen
    Wu, Tingzhao
    Li, Gang
    Wang, Zhongyuan
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (43-44) : 32225 - 32241
  • [33] GRAPH CONVOLUTIONAL NETWORK BASED SEMI-SUPERVISED LEARNING ON MULTI-SPEAKER MEETING DATA
    Tong, Fuchuan
    Zheng, Siqi
    Zhang, Min
    Chen, Yafeng
    Suo, Hongbin
    Hong, Qingyang
    Li, Lin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6622 - 6626
  • [34] ENERGY-BASED MULTI-SPEAKER VOICE ACTIVITY DETECTION WITH AN AD HOC MICROPHONE ARRAY
    Bertrand, Alexander
    Moonen, Marc
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 85 - 88
  • [35] A New Indoor Localization System Based on Bayesian Graphical Model
    Alhammadi, Abdulraqeb
    Hashim, Fazirulhiysam
    Rasid, Mohd Fadlee A.
    Alraih, Saddam
    2017 2ND IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2017, : 1960 - 1964
  • [36] Robust Indoor Localization based on Hybrid Bayesian Graphical Models
    Kim, Ryangsoo
    Lim, Hyuk
    Hwang, Sun-Nyoung
    Obele, Brownson O.
    2014 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2014), 2014, : 423 - 429
  • [37] Characterization of inter-speaker articulatory variability: A two-level multi-speaker modelling approach based on MRI data
    Serrurier, Antoine
    Badin, Pierre
    Lamalle, Laurent
    Neuschaefer-Rube, Christiane
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2019, 145 (04): : 2149 - 2170
  • [38] A Method Of Indoor Multi-path IR-UWB Localization Based On Bayesian Compressed Sensing
    Wang Ping
    Ruan Huailin
    Fan Fuhua
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 56 - 59
  • [39] Acoustic Source Localization in the Circular Harmonic Domain Using Deep Learning Architecture
    SongGong, Kunkun
    Wang, Wenwu
    Chen, Huawei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2475 - 2491