Robust Speaker Diarization for Short Speech Recordings

Cited by: 11

Authors:

Imseng, David [1, 2]

Friedland, Gerald [3]

Affiliations:

[1] Idiap Res Inst, POB 592, CH-1920 Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, CH-1015 Lausanne, Switzerland

[3] Int Comp Sci Inst, Berkeley, CA 94704 USA

Source:

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009) | 2009

Keywords:

DOI:

10.1109/ASRU.2009.5373254

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We investigate how a state-of-the-art speaker diarization system behaves on meetings that are much shorter (from 500 seconds down to 100 seconds) than those typically analyzed in speaker diarization benchmarks. First, the problems inherent to this task are analyzed. Then, we propose an approach that consists of a novel initialization parameter estimation method for typical state-of-the-art diarization approaches. The estimation method balances the relationship between the optimal value of the duration of speech data per Gaussian and the duration of the speech data, which is verified experimentally for the first time in this article. As a result, the Diarization Error Rate for short meetings extracted from the 2006, 2007, and 2009 NIST RT evaluation data is reduced by up to 50% relative.
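The two quantities the abstract relies on, speech data per Gaussian and relative error reduction, can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' estimation method: the function names `total_gaussians` and `relative_reduction` are hypothetical, and every numeric value is an illustrative assumption rather than a figure from the paper.

```python
import math


def total_gaussians(speech_seconds: float, secs_per_gaussian: float) -> int:
    """Gaussians implied by a fixed amount of speech per Gaussian.

    Hypothetical helper: it only shows how the model size shrinks
    with the recording when seconds-per-Gaussian is held constant.
    """
    if speech_seconds <= 0 or secs_per_gaussian <= 0:
        raise ValueError("durations must be positive")
    # Always keep at least one Gaussian, even for very short recordings.
    return max(1, math.floor(speech_seconds / secs_per_gaussian))


def relative_reduction(baseline_der: float, new_der: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_der - new_der) / baseline_der


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the paper).
    for seconds in (500.0, 100.0):
        print(seconds, total_gaussians(seconds, secs_per_gaussian=5.0))
    # A DER going from 30% to 15% is a 50% relative reduction.
    print(relative_reduction(30.0, 15.0))
```

With a fixed seconds-per-Gaussian value, a 100-second recording gets one fifth of the Gaussians of a 500-second one, which is why the abstract treats that value as something to be estimated from the amount of speech rather than held constant.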


Pages: 432 / +

Page count: 2


