Speaker diarization for multi-party meetings using acoustic fusion

Cited by: 0

Authors:

Anguera, X [1]

Wooters, C [1]

Hernando, J [1]

Affiliations:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

2005 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) | 2005

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

One of the sub-tasks of the Spring 2004 and Spring 2005 NIST Meetings evaluations requires segmenting multi-party meetings into speaker-homogeneous regions using data from multiple distant microphones (the "MDM" sub-task). One approach to this task is to run a speaker segmentation system on each of the microphone channels separately and then merge the results. This can be thought of as a many-to-one post-processing approach. In this paper we propose an alternative approach in which we use delay-and-sum beamforming techniques to fuse the signals from the multiple distant microphones into a single enhanced signal. This can be thought of as a many-to-one pre-processing approach. In the pre-processing approach we propose, the time delay of arrival (TDOA) between each of the distant channels and a reference channel is computed incrementally using a window that steps through the signals from each of the microphones. No information about the locations or setup of the microphones is required. Using the TDOA information, the channels are first aligned and then summed, and the resulting "enhanced" signal is clustered using our standard speaker diarization system. We test our approach on the 2004 and 2005 NIST meetings evaluation databases and show that the technique performs very well.
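
The windowed align-then-sum step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `estimate_delay` and `delay_and_sum`, the parameters `window`, `step`, and `max_lag`, the use of plain time-domain cross-correlation to estimate the TDOA, and the averaging of channels (a normalized sum) are all assumptions made here for illustration.

```python
from typing import List, Sequence


def estimate_delay(ref: Sequence[float], sig: Sequence[float],
                   start: int, stop: int, max_lag: int) -> int:
    """Return the lag d (in samples) that maximises the sum of
    ref[n] * sig[n + d] over n in [start, stop).

    A positive d means `sig` lags behind `ref`. Plain cross-correlation
    is an assumption of this sketch, not a detail taken from the abstract.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(start, stop):
            m = n + lag
            if 0 <= m < len(sig):
                score += ref[n] * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels: Sequence[Sequence[float]], ref_index: int = 0,
                  window: int = 256, step: int = 128,
                  max_lag: int = 16) -> List[float]:
    """Fuse several channels into one signal.

    For every hop of `step` samples, a delay relative to the reference
    channel is estimated on an analysis window of `window` samples; each
    channel is then shifted by its delay and the channels are averaged.
    No microphone positions are used.
    """
    ref = channels[ref_index]
    length = min(len(c) for c in channels)
    fused = [0.0] * length
    for start in range(0, length, step):
        stop = min(start + window, length)     # end of the analysis window
        seg_stop = min(start + step, length)   # end of the output segment
        for idx, ch in enumerate(channels):
            # The reference channel is aligned with itself by definition.
            delay = 0 if idx == ref_index else estimate_delay(
                ref, ch, start, stop, max_lag)
            for n in range(start, seg_stop):
                m = n + delay
                if 0 <= m < len(ch):
                    fused[n] += ch[m] / len(channels)
    return fused
```

Calling `delay_and_sum([mic0, mic1, mic2])` on equal-rate sample lists returns one fused list as long as the shortest input. Re-estimating the delay per window lets the alignment follow changes of the active speaker, which is the reason for the stepping window.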

Pages: 426-431

Page count: 6
