Robust Speaker Diarization for Short Speech Recordings

Times cited: 11

Authors:

Imseng, David [1,2]

Friedland, Gerald [3]

Affiliations:

[1] Idiap Research Institute, P.O. Box 592, CH-1920 Martigny, Switzerland

[2] Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

[3] International Computer Science Institute (ICSI), Berkeley, CA 94704, USA

Source:

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009) | 2009

DOI:

10.1109/ASRU.2009.5373254

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We investigate how a state-of-the-art speaker diarization system behaves on meetings that are much shorter (from 500 seconds down to 100 seconds) than those typically analyzed in speaker diarization benchmarks. First, we analyze the problems inherent to this task. Then, we propose a novel method for estimating the initialization parameters of typical state-of-the-art diarization approaches. The estimation method accounts for the relationship between the optimal duration of speech data per Gaussian and the total duration of the speech data, a relationship verified experimentally for the first time in this article. As a result, the Diarization Error Rate (DER) for short meetings extracted from the 2006, 2007, and 2009 NIST RT evaluation data is reduced by up to 50% relative.
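The abstract's central quantity, the duration of speech data per Gaussian, follows from how an agglomerative diarization system is initialized. The Python sketch below assumes a system that starts from K clusters of g Gaussians each, so that D seconds of speech leave roughly D/(K·g) seconds per Gaussian. It illustrates why a fixed initialization gives short recordings far less data per Gaussian, and one simple way to let K grow with the recording length. The helper names (`seconds_per_gaussian`, `initial_cluster_count`, `relative_reduction`), the fixed setting of 16 clusters with 5 Gaussians, and the 2.0 s target are hypothetical illustration values, not the parameters or the estimation formula reported in the paper.

```python
def seconds_per_gaussian(speech_seconds: float, clusters: int,
                         gaussians_per_cluster: int) -> float:
    """Average speech per Gaussian when the speech is spread over
    clusters * gaussians_per_cluster Gaussians."""
    return speech_seconds / (clusters * gaussians_per_cluster)


def initial_cluster_count(speech_seconds: float, gaussians_per_cluster: int,
                          target_seconds_per_gaussian: float,
                          min_clusters: int = 1) -> int:
    """Hypothetical rule (not the paper's estimator): choose K so that the
    speech per Gaussian is close to a caller-supplied target value."""
    if speech_seconds <= 0 or gaussians_per_cluster <= 0 or target_seconds_per_gaussian <= 0:
        raise ValueError("all inputs must be positive")
    total_gaussians = speech_seconds / target_seconds_per_gaussian
    # round() uses round-half-to-even; clamp so at least min_clusters remain
    return max(min_clusters, round(total_gaussians / gaussians_per_cluster))


def relative_reduction(baseline_der: float, new_der: float) -> float:
    """Relative DER reduction; 0.5 means the error rate was halved."""
    return (baseline_der - new_der) / baseline_der


if __name__ == "__main__":
    # Hypothetical fixed initialization: 16 clusters x 5 Gaussians
    for duration in (100.0, 500.0):
        print(duration, seconds_per_gaussian(duration, 16, 5))
    # Duration-dependent K with a hypothetical 2.0 s-per-Gaussian target
    for duration in (100.0, 500.0):
        print(duration, initial_cluster_count(duration, 5, 2.0))
    # Illustrative DER values, not results from the paper
    print(relative_reduction(40.0, 20.0))
```

With the hypothetical fixed setting, a 100 s recording leaves 1.25 s of speech per Gaussian, compared with 6.25 s for a 500 s recording; choosing K from the duration keeps the per-Gaussian amount near the target instead. The last helper shows what "50% relative" means: a DER falling from 40% to 20% (illustrative numbers) is a relative reduction of 0.5.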

Pages: 432 / +

Page count: 2