Speaker diarization for multi-party meetings using acoustic fusion

Cited by: 0

Authors:

Anguera, X [1]

Wooters, C [1]

Hernando, J [1]

Affiliations:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) | 2005

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

One of the sub-tasks of the Spring 2004 and Spring 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately, and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-and-sum beamforming techniques to fuse the signals from each of the multiple distant microphones into a single enhanced signal. This approach can be thought of as a many-to-one pre-processing approach. In the pre-processing approach we propose, the time delay of arrival (TDOA) between each of the multiple distant channels and a reference channel is computed incrementally using a window that steps through the signals from each of the multiple microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test our approach on the 2004 and 2005 NIST meetings evaluation databases and show that the technique performs very well.
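
The windowed TDOA estimation and delay-and-sum step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which delay estimator is used, so plain cross-correlation is assumed here, and the function names (`estimate_delay`, `delay_and_sum`), the window and step sizes, the lag limit, and the equal channel weighting are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples by which `sig` trails `ref`.

    Assumption: plain cross-correlation, searched over [-max_lag, max_lag].
    """
    max_lag = min(max_lag, len(ref) - 1)
    if max_lag < 1:
        return 0
    n = len(ref) + len(sig)
    # Cross-correlation via FFT, zero-padded to avoid circular wrap-around.
    spec = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(spec, n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def delay_and_sum(channels, ref_index=0, win=8000, step=4000, max_lag=400):
    """Fuse several channels into one signal by aligning each to a reference.

    `channels` has shape (n_channels, n_samples). The analysis window of
    `win` samples advances by `step` samples; every parameter value here is
    a hypothetical default, not a value taken from the paper.
    """
    channels = np.asarray(channels, dtype=float)
    n_ch, n_samp = channels.shape
    ref = channels[ref_index]
    out = np.zeros(n_samp)
    for start in range(0, n_samp, step):
        stop = min(start + step, n_samp)      # samples written in this step
        w_stop = min(start + win, n_samp)     # samples used to estimate TDOA
        for ch in channels:
            d = estimate_delay(ref[start:w_stop], ch[start:w_stop], max_lag)
            # Read the channel d samples later so it lines up with the reference.
            idx = np.clip(np.arange(start, stop) + d, 0, n_samp - 1)
            out[start:stop] += ch[idx]
    # Equal weighting of all channels (an assumption of this sketch).
    return out / n_ch
```

As in the abstract, the sketch needs no information about microphone positions: each channel's delay is estimated from the signals alone, relative to the chosen reference channel.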

Pages: 426-431

Page count: 6
