Investigation of Cross-show Speaker Diarization

Cited by: 0

Authors:

Yang, Qian [1]

Jin, Qin

Schultz, Tanja [1]

Affiliations:

[1] Karlsruhe Inst Technol, Cognit Syst Lab, D-76021 Karlsruhe, Germany

Source:

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011

Keywords:

speaker diarization; cross-show diarization; conversational podcast shows;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The goal of cross-show diarization is to index speech segments of speakers from a set of shows, with the particular challenge that speakers who reappear across shows have to be labeled with the same speaker identity. In this paper, we introduce three cross-show diarization systems, namely Global-BIC-Seg, Global-BIC-Cluster, and Incremental. We compare the three systems on a set of 46 English scientific podcast shows. Among the three systems, Global-BIC-Cluster achieves the best performance, with 15.53% and 13.21% cross-show diarization error rate (DER) on the dev and test sets, respectively. However, an incremental approach is more practical, since data and shows are typically collected over time. By applying T-Norm to our incremental system, we obtain 13.18% and 10.97% relative improvements in cross-show DER on the dev and test sets, respectively. We also investigate the impact of the show processing order on cross-show diarization for the incremental system.
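
The abstract names T-Norm without describing it. As a rough illustration only, the sketch below shows the standard test-normalization formula (a raw score is shifted and scaled by the mean and standard deviation of the same test segment's scores against a cohort of other speaker models), followed by a hypothetical incremental decision step. The function names, the cohort, and the threshold are assumptions made for this example; they are not taken from the paper and do not reproduce the authors' system.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Standard T-Norm: (score - cohort mean) / cohort standard deviation.

    cohort_scores are the scores of the SAME test segment against a set
    of cohort (impostor) speaker models.
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def assign_speaker(scores_by_speaker, cohort_scores, threshold):
    """Hypothetical incremental step (an assumption, not the paper's rule).

    Returns the label of the best-matching known speaker if its
    normalized score reaches the threshold, otherwise None, meaning
    the cluster would be treated as a new speaker.
    """
    best_label, best_score = None, float("-inf")
    for label, raw in scores_by_speaker.items():
        normalized = t_norm(raw, cohort_scores)
        if normalized > best_score:
            best_label, best_score = label, normalized
    return best_label if best_score >= threshold else None


if __name__ == "__main__":
    cohort = [1.0, 3.0]  # illustrative cohort scores, not real data
    print(assign_speaker({"spk_a": 5.0, "spk_b": 2.0}, cohort, threshold=2.0))
    print(assign_speaker({"spk_a": 2.5}, cohort, threshold=2.0))
```

Because the normalized score is expressed in cohort standard deviations, a single threshold can in principle be shared across shows recorded under different conditions, which is the usual motivation for score normalization in an incremental setting.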


Pages: 2936 / +

Page count: 2

50 items in total

[1] INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION
Rouvier, Mickael
Favre, Benoit
[J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5585 - 5589
[2] Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
Viet-Anh Tran
Viet Bac Le
Barras, Claude
Lamel, Lori
[J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1060 - +
[3] I-vectors and ILP clustering adapted to cross-show speaker diarization
Dupuy, Gregor
Rouvier, Mickael
Meignier, Sylvain
Esteve, Yannick
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 2171 - 2174
[4] Fast Single- and Cross-Show Speaker Diarization Using Binary Key Speaker Modeling
Delgado, Hector
Anguera, Xavier
Fredouille, Corinne
Serrano, Javier
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) : 2286 - 2297
[5] Is Incremental Cross-Show Speaker Diarization Efficient For Processing Large Volumes of Data?
Dupuy, Gregor
Meignier, Sylvain
Esteve, Yannick
[J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 587 - 591
[6] Improving speaker diarization by cross EM refinement
Ning, Huazhong
Xu, Wei
Gong, Yihong
Huang, Thomas
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1901 - 1904
[7] Speaker Diarization using Normalized Cross Likelihood Ratio
Le, Viet-Bac
Mella, Odile
Fohr, Dominique
[J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 873 - 876
[8] An Adaptive Method for Cross-Recording Speaker Diarization
Le Lan, Gael
Charlet, Delphine
Larcher, Anthony
Meignier, Sylvain
[J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (10) : 1821 - 1832
[9] MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION
Sell, Gregory
McCree, Alan
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5425 - 5429
[10] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
Rouvier, Mickael
Bousquet, Pierre-Michel
Favre, Benoit
[J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
