Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios

Cited by: 2
Authors
Gerlach, Stephan [1, 4]
Bitzer, Jörg [1, 2]
Goetze, Stefan [1, 4]
Doclo, Simon [3, 4]
Affiliations
[1] Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology (HSA), D-26129 Oldenburg, Germany
[2] Jade University of Applied Sciences, D-26121 Oldenburg, Germany
[3] Carl von Ossietzky University of Oldenburg, Department of Medical Physics and Acoustics, D-26111 Oldenburg, Germany
[4] Cluster of Excellence Hearing4all, D-26129 Oldenburg, Germany
Keywords
Joint DOA and pitch estimation; Spectral comb; GCC-PHAT; Multi-channel cross-correlation; Particle filter; Acoustic source; Localization; Algorithm; DOA
DOI
10.1186/s13636-014-0031-8
Chinese Library Classification number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed over the last decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method that increase its robustness to multiple speakers, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, using multiple microphone pairs, the multi-channel cross-correlation approach is incorporated to improve the robustness against noise and reverberation. To improve tracking of moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time RT60 of 560 ms.
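The GCC-PHAT weighting named in the abstract can be illustrated with the textbook two-microphone estimator. The sketch below is not the authors' joint DOA and pitch method: it only whitens the cross-power spectrum so that its inverse transform peaks at the time difference of arrival (TDOA), then maps that lag to a far-field angle. The function names `gcc_phat` and `doa_from_lag`, the search limit `max_lag`, the speed of sound of 343 m/s, and the single-plane-wave (far-field) geometry are assumptions made for illustration; a small radix-2 FFT is written out so the example needs only the Python standard library.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(spec):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def gcc_phat(x1, x2, max_lag):
    """Hypothetical helper: return (lag, peak) of the GCC-PHAT function.

    A positive lag means x2 lags x1, i.e. x2[n] ~ x1[n - lag].
    """
    n = 1
    while n < len(x1) + len(x2):  # zero-pad to avoid circular wrap-around
        n *= 2
    spec1 = fft(list(x1) + [0.0] * (n - len(x1)))
    spec2 = fft(list(x2) + [0.0] * (n - len(x2)))
    whitened = []
    for a, b in zip(spec1, spec2):
        g = b * a.conjugate()  # cross-power spectrum of the pair
        mag = abs(g)
        # PHAT weighting: keep only the phase of each frequency bin.
        whitened.append(g / mag if mag > 1e-12 else 0j)
    r = [v.real for v in ifft(whitened)]
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        val = r[lag % n]  # negative lags wrap to the end of r
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def doa_from_lag(lag, fs, mic_distance, c=343.0):
    """Hypothetical helper: far-field DOA in degrees (0 = broadside).

    Assumes a single plane wave and c = 343 m/s (not taken from the paper).
    """
    s = c * (lag / fs) / mic_distance
    s = max(-1.0, min(1.0, s))  # clamp |s| > 1 caused by noise or large lags
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    random.seed(0)
    src = [random.gauss(0.0, 1.0) for _ in range(256)]
    mic1 = src + [0.0] * 3  # reference microphone
    mic2 = [0.0] * 3 + src  # same signal, arriving 3 samples later
    lag, _ = gcc_phat(mic1, mic2, max_lag=10)
    print(lag)  # prints 3
    print(doa_from_lag(lag, 16000, 0.2))  # about 18.8 degrees
```

Because PHAT discards magnitude, every bin contributes only its phase, which sharpens the correlation peak in reverberant rooms; the trade-off is that weak, noise-dominated bins count as much as strong speech harmonics, one plausible reason to use it as an additional weighting rather than on its own.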
Pages: 17
Journal: EURASIP Journal on Audio, Speech, and Music Processing, 2014 (1)