Speaker diarization for multi-party meetings using acoustic fusion

Authors
Anguera, X [1]
Wooters, C [1]
Hernando, J [1]
Affiliation
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One of the sub-tasks of the Spring 2004 and Spring 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-and-sum beamforming techniques to fuse the signals from each of the multiple distant microphones into a single enhanced signal. This approach can be thought of as a many-to-one pre-processing approach. In the pre-processing approach we propose, the time delay of arrival (TDOA) between each of the multiple distant channels and a reference channel is computed incrementally using a window that steps through the signals from each of the multiple microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test our approach on the 2004 and 2005 NIST meetings evaluation databases and show that the technique performs very well.
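The windowed align-then-sum idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the TDOA is estimated, so GCC-PHAT cross-correlation is assumed here, and the function names (`gcc_phat_delay`, `delay_and_sum`), the window and hop sizes, the lag limit, and the Hann-weighted overlap-add are all hypothetical choices made for illustration.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    Assumption: GCC-PHAT is used as the TDOA estimator; the abstract
    does not name a specific method. A positive result means `sig`
    lags behind `ref`.
    """
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap
    S = np.fft.rfft(sig, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = S * np.conj(R)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the array covers lags -max_lag .. +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def delay_and_sum(channels, ref_index=0, win=8000, hop=4000, max_lag=400):
    """Fuse several channels into one signal, window by window.

    For each window, every channel's delay to the reference channel is
    estimated, the channel is shifted to compensate, and the aligned
    channels are averaged. Windows are recombined with Hann-weighted
    overlap-add (a hypothetical choice). Samples past the last full
    window are left at zero.
    """
    channels = np.asarray(channels, dtype=float)
    n_ch, n_samples = channels.shape
    ref = channels[ref_index]
    out = np.zeros(n_samples)
    weight = np.zeros(n_samples)
    taper = np.hanning(win)

    for start in range(0, n_samples - win + 1, hop):
        stop = start + win
        seg_sum = np.zeros(win)
        for ch in channels:
            d = gcc_phat_delay(ch[start:stop], ref[start:stop], max_lag)
            # Read the channel d samples later to undo its delay.
            idx = np.clip(np.arange(start, stop) + d, 0, n_samples - 1)
            seg_sum += ch[idx]
        out[start:stop] += taper * seg_sum / n_ch
        weight[start:stop] += taper

    return out / np.maximum(weight, 1e-12)
```

Called as `delay_and_sum([mic0, mic1, mic2])` on equal-length arrays, the function returns one fused signal that could then be passed to a diarization system. As in the abstract, no microphone positions are needed: only the relative delays estimated from the signals themselves are used.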
Pages: 426-431
Page count: 6