Acoustic beamforming for speaker diarization of meetings

被引:274
|
作者
Anguera, Xavier [1 ]
Wooters, Chuck
Hernando, Javier
机构
[1] Telefon ID, Madrid 28043, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
关键词
acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering;
D O I
10.1109/TASL.2007.902460
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
引用
收藏
页码:2011 / 2022
页数:12
相关论文
共 50 条
  • [1] Speaker diarization for multi-party meetings using acoustic fusion
    Anguera, X
    Wooters, C
    Hernando, J
    [J]. 2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005, : 426 - 431
  • [2] Automatic weighting for the combination of TDOA and acoustic features in speaker diarization for meetings
    Anguera, Xavier
    Wooters, Chuck
    Pardo, Jose M.
    Hernando, Javier
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 241 - +
  • [3] IMPROVED SPEAKER DIARIZATION SYSTEM FOR MEETINGS
    El-Khoury, Elie
    Senac, Christine
    Pinquier, Julien
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 4097 - 4100
  • [4] Purity algorithms for speaker diarization of meetings data
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 1025 - 1028
  • [5] Improving Speaker Diarization for CHIL Lecture Meetings
    Huang, Jing
    Marcheret, Etienne
    Visweswariah, Karthik
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2628 - 2631
  • [6] The SAIL Speaker Diarization System for Analysis of Spontaneous Meetings
    Han, Kyu J.
    Georgiou, Panayiotis G.
    Narayanan, Shrikanth S.
    [J]. 2008 IEEE 10TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, VOLS 1 AND 2, 2008, : 970 - 975
  • [7] A DOA based speaker diarization system for real meetings
    Araki, Shoko
    Fujimoto, Masakiyo
    Ishizuka, Kentaro
    Sawada, Hiroshi
    Makino, Shoji
    [J]. 2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 30 - 33
  • [8] Agglomerative Information Bottleneck for speaker diarization of meetings data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 250 - 255
  • [9] SPEAKER DIARIZATION OF MEETINGS BASED ON SPEAKER ROLE N-GRAM MODELS
    Valente, Fabio
    Vijayasenan, Deepu
    Motlicek, Petr
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4416 - 4419
  • [10] SPEAKER DIARIZATION OF MEETINGS BASED ON LARGE TDOA FEATURE VECTORS
    Vijayasenan, Deepu
    Valente, Fabio
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4173 - 4176