Acoustic beamforming for speaker diarization of meetings

Cited by: 274
Authors
Anguera, Xavier [1]
Wooters, Chuck
Hernando, Javier
Affiliations
[1] Telefon ID, Madrid 28043, Spain
[2] Univ Politecn Cataluna, E-08028 Barcelona, Spain
Keywords
acoustic beamforming; meeting processing; speaker diarization; speaker segmentation and clustering
DOI
10.1109/TASL.2007.902460
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this paper, the use of classic acoustic beamforming techniques is proposed together with several novel algorithms to create a complete frontend for speaker diarization in the meeting room domain. New techniques we are presenting include blind reference-channel selection, two-step time delay of arrival (TDOA) Viterbi postprocessing, and a dynamic output signal weighting algorithm, together with using such TDOA values in the diarization to complement the acoustic information. Tests on speaker diarization show a 25% relative improvement on the test set compared to using a single most centrally located microphone. Additional experimental results show improvements using these techniques in a speech recognition task.
Pages: 2011-2022
Page count: 12
Related papers
50 in total
  • [41] Trainable Speaker Diarization
    Aronowitz, Hagai
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2021 - 2024
  • [42] DiarTk : An Open Source Toolkit for Research in Multistream Speaker Diarization and its Application to Meetings Recordings
    Vijayasenan, Deepu
    Valente, Fabio
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2167 - 2170
  • [43] INFORMATION BOTTLENECK BASED SPEAKER DIARIZATION OF MEETINGS USING NON-SPEECH AS SIDE INFORMATION
    Yella, Sree Harsha
    Bourlard, Herve
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [44] MODELING AUDIO DIRECTIONAL STATISTICS USING A PROBABILISTIC SPATIAL DICTIONARY FOR SPEAKER DIARIZATION IN REAL MEETINGS
    Fakhry, Mahmoud
    Ito, Nobutaka
    Araki, Shoko
    Nakatani, Tomohiro
    [J]. 2016 IEEE INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2016,
  • [45] SIMULTANEOUS SPEECH RECOGNITION AND SPEAKER DIARIZATION FOR MONAURAL DIALOGUE RECORDINGS WITH TARGET-SPEAKER ACOUSTIC MODELS
    Kanda, Naoyuki
    Horiguchi, Shota
    Fujita, Yusuke
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    [J]. 2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 31 - 38
  • [46] Speaker diarization for multi-microphone meetings using only between-channel differences
    Pardo, Jose M.
    Anguera, Xavier
    Wooters, Chuck
    [J]. MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2006, 4299 : 257 - +
  • [47] Factor analysis-based approaches applied to the speaker diarization task of meetings: a preliminary study
    Tomasek, Pavel
    Fredouille, Corinne
    Matrouf, Driss
    [J]. ODYSSEY 2010: THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP, 2010, : 131 - 137
  • [48] TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge
    Pang, Bowen
    Zhao, Huan
    Zhang, Gaosheng
    Yang, Xiaoyue
    Sun, Yang
    Zhang, Li
    Wang, Qing
    Xie, Lei
    [J]. 2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 502 - 506
  • [49] Acoustic beamforming of a parametric speaker comprising ultrasonic transducers
    Yang, J
    Gan, WS
    Tan, KS
    Er, MH
    [J]. SENSORS AND ACTUATORS A-PHYSICAL, 2005, 125 (01) : 91 - 99
  • [50] New Advances in Speaker Diarization
    Aronowitz, Hagai
    Zhu, Weizhong
    Suzuki, Masayuki
    Kurata, Gakuto
    Hoory, Ron
    [J]. INTERSPEECH 2020, 2020, : 279 - 283