Acoustic beamforming for speaker diarization of meetings

Cited by: 274
|
Authors
Anguera, Xavier [1]
Wooters, Chuck
Hernando, Javier
Affiliations
[1] Telefon ID, Madrid 28043, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
Keywords
acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering
DOI
10.1109/TASL.2007.902460
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
Pages: 2011-2022
Page count: 12
Related papers
50 results in total
  • [31] The IBM RT07 evaluation systems for speaker diarization on lecture meetings
    Huang, Jing
    Marcheret, Etienne
    Visweswariah, Karthik
    Potamianos, Gerasimos
    MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2008, 4625 : 497 - 508
  • [32] MULTISTREAM SPEAKER DIARIZATION BEYOND TWO ACOUSTIC FEATURE STREAMS
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4950 - 4953
  • [33] Speaker diarization for multiple-distant-microphone meetings using several sources of information
    Pardo, Jose M.
    Anguera, Xavier
    Wooters, Charles
    IEEE TRANSACTIONS ON COMPUTERS, 2007, 56 (09) : 1212 - 1224
  • [34] Robust Speaker Diarization for Meetings: ICSI RT06s evaluation system
    Anguera, Xavier
    Wooters, Chuck
    Pardo, Jose M.
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1674 - 1677
  • [35] Robust speaker segmentation for meetings: The ICSI-SRI Spring 2005 Diarization System
    Anguera, X
    Wooters, C
    Peskin, B
    Aguiló, M
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 402 - 414
  • [36] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
  • [37] SPEAKER DIARIZATION WITH LSTM
    Wang, Quan
    Downey, Carlton
    Wan, Li
    Mansfield, Philip Andrew
    Moreno, Ignacio Lopez
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5239 - 5243
  • [38] Multimodal Speaker Diarization
    Noulas, Athanasios
    Englebienne, Gwenn
    Krose, Ben J. A.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (01) : 79 - 93
  • [39] Trainable Speaker Diarization
    Aronowitz, Hagai
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2021 - 2024
  • [40] Using Direction of Arrival Estimate and Acoustic Feature Information in Speaker Diarization
    Koh, Eugene Chin Wei
    Sun, Hanwu
    Nwe, Tin Lay
    Nguyen, Trung Hieu
    Ma, Bin
    Chng, Eng-Siong
    Li, Haizhou
    Rahardja, Susanto
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2181 - +