Acoustic beamforming for speaker diarization of meetings

Cited by: 274

Authors:

Anguera, Xavier [1]

Wooters, Chuck

Hernando, Javier

Affiliations:

[1] Telefon ID, Madrid 28043, Spain

[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 07

Keywords:

acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering

DOI:

10.1109/TASL.2007.902460

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.

Pages: 2011-2022

Page count: 12
