Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios

Times cited: 2

Authors:

Gerlach, Stephan [1,4]

Bitzer, Joerg [1,2]

Goetze, Stefan [1,4]

Doclo, Simon [3,4]

Affiliations:

[1] Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology (HSA), D-26129 Oldenburg, Germany

[2] Jade University of Applied Sciences, D-26121 Oldenburg, Germany

[3] Carl von Ossietzky University of Oldenburg, Department of Medical Physics and Acoustics, D-26111 Oldenburg, Germany

[4] Cluster of Excellence Hearing4all, D-26129 Oldenburg, Germany

Source:

EURASIP Journal on Audio, Speech, and Music Processing | 2014

Keywords:

Joint DOA and pitch estimation; Spectral comb; GCC-PHAT; Multi-channel cross-correlation; Particle filter; ACOUSTIC SOURCE; LOCALIZATION; ALGORITHM; DOA;

DOI:

10.1186/s13636-014-0031-8

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In many speech communication applications, robust localization and tracking of multiple speakers in noisy and reverberant environments are of major importance. Several algorithms to tackle this problem have been proposed over the past decades. In this paper, we propose several extensions to a recently presented joint direction of arrival (DOA) and pitch estimation method that increase its robustness to concurrent speakers, noise, and reverberation. First, a spectral comb filter is added to the original algorithm to better cope with concurrent speakers. Second, the well-known generalized cross-correlation with phase transform (GCC-PHAT) is used as an additional weighting function to improve the DOA estimation accuracy in terms of correct hits. Third, the multi-channel cross-correlation approach, which uses multiple microphone pairs, is incorporated to improve robustness against noise and reverberation. To improve tracking of moving and even intersecting speakers, a particle filter is used. Experiments with real-world recordings in realistic acoustic conditions show that the proposed extensions increase the DOA hit rate by about 33% compared to the original algorithm for two step-wise moving sources at a signal-to-noise ratio (SNR) of 15 dB and a reverberation time (RT60) of 560 ms.
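GCC-PHAT, named above as an additional weighting function, is a standard building block for DOA estimation. The following Python/NumPy sketch illustrates only that building block, not the authors' joint DOA and pitch estimator, spectral comb filter, multi-channel cross-correlation, or particle filter. It estimates the time difference of arrival (TDOA) between two microphone signals and converts it to a far-field angle. The function names `gcc_phat` and `doa_from_tdoa`, the speed of sound of 343 m/s, the 0.2 m microphone spacing, the lag interpolation factor, and the two-microphone far-field geometry are all illustrative assumptions.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None, interp=4):
    """Hypothetical helper: delay of y relative to x (seconds) via GCC-PHAT.

    A positive result means y lags x, i.e. the wavefront reached x first.
    """
    n = len(x) + len(y)                      # zero-pad so the correlation is linear, not circular
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=interp * n)   # zero-padding in frequency gives a finer lag grid
    max_shift = interp * n // 2
    if max_tau is not None:                  # restrict the search to physically possible lags
        max_shift = max(1, min(int(interp * fs * max_tau), max_shift))
    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs), cc


def doa_from_tdoa(tau, mic_distance, c=343.0):
    """Hypothetical helper: far-field angle in degrees from broadside for one mic pair."""
    s = np.clip(c * tau / mic_distance, -1.0, 1.0)  # guard against |s| > 1 caused by noise
    return float(np.degrees(np.arcsin(s)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, delay = 16000, 5                     # illustrative sampling rate and delay in samples
    s = rng.standard_normal(4096)
    x = s
    y = np.concatenate((np.zeros(delay), s[:-delay]))  # y lags x by `delay` samples
    d = 0.2                                  # assumed microphone spacing in metres
    tau, _ = gcc_phat(x, y, fs, max_tau=d / 343.0)
    print(tau * fs, doa_from_tdoa(tau, d))
```

Limiting the lag search to ±d/c discards correlation peaks that no physical source direction could produce, and the PHAT normalization whitens the cross-spectrum so that the peak stays sharp in reverberant rooms. The paper combines cross-correlation cues of this kind with pitch information, which this sketch does not model.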


Pages: 17
