Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain

Cited by: 6
Authors
SongGong, Kunkun [1]
Chen, Huawei [1]
Wang, Wenwu [2]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Elect & Informat Engn, Nanjing 210016, Peoples R China
[2] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford GU2 7XH, Surrey, England
Funding
National Natural Science Foundation of China
Keywords
Direction-of-arrival estimation; Location awareness; Estimation; Harmonic analysis; Reverberation; Array signal processing; Sensor arrays; Multi-speaker localization; Bayesian nonparametrics (BNP); circular harmonics; direction of arrival (DOA) estimation; microphone array signal processing; SOUND SOURCE LOCALIZATION; OF-ARRIVAL ESTIMATION; MICROPHONE ARRAY; DECOMPOSITION; HOLOGRAPHY; SEPARATION; SPEAKERS; NOISE;
DOI
10.1109/TASLP.2021.3079809
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Circular microphone arrays have been used for multi-speaker localization in computational auditory scene analysis because of their high flexibility in sound field analysis, including the generation of frequency-invariant eigenbeams for wideband acoustic sources. However, the localization performance of existing circular harmonic approaches, such as the circular harmonics beamformer (CHB), depends strongly on the physical characteristics (such as shape) of the sensor array and on the level of uncertainty present in the acoustic environment (such as background noise, room reverberation, and the number of sources). These uncertainties may limit the performance or practical application of speaker localization algorithms. To address these issues, we present a new indoor multi-speaker localization method in the circular harmonic domain based on the acoustic holography beamforming (AHB) technique and the Bayesian nonparametrics (BNP) method. More specifically, we use the AHB technique, which combines delay-and-sum beamforming with acoustic-holography-based virtual sensing, to generate direction of arrival (DOA) measurements in the time-frequency (TF) domain, and then design a BNP algorithm based on the infinite Gaussian mixture model (IGMM) to estimate the DOAs of the individual sources without prior knowledge of the number of sources. These estimates may degrade in the presence of room reverberation and background noise. To address this issue, we develop a robust TF bin selection and permutation method on the basis of mixture weights, using the power, power ratio, and local variance estimated at each TF bin. Experiments performed on both simulated and real data show that our method gives significantly better performance than four recent baseline methods across a variety of noise and reverberation levels, in terms of the root-mean-square error (RMSE) of the DOA estimation and the source detection success rate.
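
The idea of estimating DOAs without fixing the number of sources in advance can be illustrated with a small sketch. This is not the paper's IGMM algorithm: it uses DP-means, a hard-assignment simplification of a Dirichlet process Gaussian mixture, applied to synthetic per-TF-bin DOA measurements. The function names, the 30-degree penalty, the 0.1 weight threshold, and the simulated data are all assumptions made for illustration; the AHB measurement stage and the TF bin selection step are not modeled.

```python
import math
import random


def circ_diff(a, b):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def circ_mean(angles):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def dp_means_doa(doas_deg, penalty_deg=30.0, max_iter=50):
    """Cluster DOA measurements with DP-means (hypothetical helper).

    A new cluster is opened whenever a measurement lies farther than
    penalty_deg from every existing center, so the number of clusters
    is inferred from the data. Returns a list of (center, weight).
    """
    centers = [circ_mean(doas_deg)]
    groups = [list(doas_deg)]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for a in doas_deg:
            dists = [abs(circ_diff(a, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > penalty_deg:
                # Too far from every center: open a new cluster here.
                centers.append(a)
                groups.append([])
                k = len(centers) - 1
            groups[k].append(a)
        groups = [g for g in groups if g]  # drop empty clusters
        updated = [circ_mean(g) for g in groups]
        converged = len(updated) == len(centers) and all(
            abs(circ_diff(u, c)) < 1e-6 for u, c in zip(updated, centers)
        )
        centers = updated
        if converged:
            break
    n = len(doas_deg)
    return [(c, len(g) / n) for c, g in zip(centers, groups)]


# Synthetic stand-in for TF-domain DOA measurements (assumed values):
# two speakers at 60 and 200 degrees plus a few spurious estimates.
rng = random.Random(0)
true_doas = [60.0, 200.0]
measurements = [rng.gauss(t, 5.0) % 360.0 for t in true_doas for _ in range(200)]
measurements += [rng.uniform(0.0, 360.0) for _ in range(20)]

clusters = dp_means_doa(measurements)
# Keep only clusters with a substantial share of the measurements.
estimates = sorted(c for c, w in clusters if w >= 0.1)
print([round(e, 1) for e in estimates])
```

Thresholding on the cluster weight loosely mirrors the abstract's use of mixture weights to separate genuine sources from spurious measurements, although the paper's actual selection rule also draws on power, power ratio, and local variance.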
Pages: 1864-1880
Number of pages: 17