Incorporation of the ASR Output in Speaker Segmentation and Clustering within the Task of Speaker Diarization of Broadcast Streams

Cited by: 0

Authors:

Silovsky, Jan [1]

Zdansky, Jindrich [1]

Nouza, Jan [1]

Cerva, Petr [1]

Prazak, Jan [1]

Affiliations:

[1] Tech Univ Liberec, Fac Mechatron, Inst Informat Technol & Elect, Liberec, Czech Republic

Source:

2012 IEEE 14TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP) | 2012

DOI:

Not available

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this paper we study the effect of incorporating automatic transcriptions into the speaker diarization process. We aim to improve both the diarization accuracy, as evaluated by standard objective measures, and the quality of the diarization output from the user's perspective. Although the presented approach relies on the output of an automatic speech recognizer, it makes no use of lexical information. Instead, we use information about word boundaries and the classification of non-speech events occurring in the processed stream. The former is used as a constraint on speaker change-point candidates, and the latter makes it possible to disregard various vocal noise sounds that carry no speaker-specific information (given the representation of the signal by cepstral features) and thus harm the speaker's representation. The experimental evaluation of the presented approach was carried out using the COST278 multilingual broadcast news database. We demonstrate that the approach improves both speaker diarization and segmentation performance measures. Furthermore, we show that the number of change-points detected within words (rather than at their boundaries) is significantly reduced.
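The two uses of recognizer output named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Token` type, the function names, the midpoint rule for placing a candidate, and the 10 ms frame shift are all assumptions made for illustration only. The sketch restricts change-point candidates to boundaries between recognized tokens and masks out frames covered by non-speech events before speaker modeling.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical ASR output unit: a word or a non-speech event."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    label: str      # word text or a noise tag such as "[breath]"
    is_noise: bool  # True for non-speech events flagged by the recognizer


def change_point_candidates(tokens):
    """Return candidate change-point times that lie only between tokens.

    Assumption: a candidate is placed at the midpoint of the gap between
    two consecutive tokens, so no candidate can fall inside a word.
    """
    return [(a.end + b.start) / 2.0 for a, b in zip(tokens, tokens[1:])]


def speech_frame_mask(tokens, n_frames, frame_shift=0.01):
    """Return a per-frame mask; False marks frames inside non-speech events.

    Frames marked False would be left out when estimating speaker models.
    The 10 ms frame shift is an assumed, commonly used value.
    """
    mask = [True] * n_frames
    for t in tokens:
        if t.is_noise:
            first = max(0, int(round(t.start / frame_shift)))
            last = min(n_frames, int(round(t.end / frame_shift)))
            for i in range(first, last):
                mask[i] = False
    return mask


tokens = [
    Token(0.0, 0.5, "hello", False),
    Token(0.5, 0.75, "[breath]", True),
    Token(1.0, 1.5, "world", False),
]
candidates = change_point_candidates(tokens)
mask = speech_frame_mask(tokens, n_frames=150)
```

In this toy input, the candidates fall at the two token boundaries, and the frames covering the breath event are excluded from the mask. A real system would score each candidate with a segmentation criterion, which this sketch leaves out.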


Pages: 118-123

Page count: 6
