An Adaptive Method for Cross-Recording Speaker Diarization

Cited by: 3

Authors:

Le Lan, Gael [1]

Charlet, Delphine [1]

Larcher, Anthony [2]

Meignier, Sylvain [2]

Affiliations:

[1] Orange Labs, F-22300 Lannion, France

[2] Univ Le Mans, F-72085 Le Mans, France

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018 / Vol. 26 / No. 10

Keywords:

Speaker diarization; speaker linking; domain adaptation; ADAPTATION; LINKING; PLDA;

DOI:

10.1109/TASLP.2018.2844025

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

State-of-the-art speaker diarization systems rely heavily on between-recording variability compensation methods to accurately process large collections of recordings. Variability estimation is performed on sizable training datasets, which must be labeled by speaker. One major problem of such systems is the acoustic mismatch between training and target data, which degrades performance. Most collections contain many speakers speaking in various acoustic conditions. In this paper, we investigate how unlabeled speakers can help improve between-recording variability estimation to overcome the mismatch issue. We propose a scalable unsupervised adaptation framework for two types of variability compensation. The proposed framework consists of adapting a state-of-the-art diarization and linking system, trained on out-of-domain data, using the data of the collection itself. Experiments under mismatched conditions are run on two French television shows, while the initial training dataset is composed of radio recordings. Results indicate that the proposed adaptation framework reduces the cross-recording DER by 13% on average across variable collection sizes.


Pages: 1821-1832

Page count: 12

