An Adaptive Method for Cross-Recording Speaker Diarization

Cited by: 3

Authors:

Le Lan, Gael [1]

Charlet, Delphine [1]

Larcher, Anthony [2]

Meignier, Sylvain [2]

Affiliations:

[1] Orange Labs, F-22300 Lannion, France

[2] Univ Le Mans, F-72085 Le Mans, France

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2018 / Vol. 26 / Issue 10

Keywords:

Speaker diarization; speaker linking; domain adaptation; ADAPTATION; LINKING; PLDA;

DOI:

10.1109/TASLP.2018.2844025

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

State-of-the-art speaker diarization systems rely heavily on between-recording variability compensation methods to accurately process large collections of recordings. Variability estimation is performed on sizable training datasets, which must be labeled by speaker. One major problem of such systems is the acoustic mismatch between training and target data, which degrades performance. Most collections contain many speakers speaking in various acoustic conditions. In this paper, we investigate how unlabeled speakers can help improve between-recording variability estimation in order to overcome the mismatch issue. We propose a scalable unsupervised adaptation framework for two types of variability compensation. The proposed framework consists of adapting a state-of-the-art diarization and linking system, trained on out-of-domain data, using the data of the collection itself. Experiments under mismatched conditions are run on two French television shows, while the initial training dataset is composed of radio recordings. Results indicate that the proposed adaptation framework reduces the cross-recording DER by 13% on average for variable collection sizes.


Pages: 1821-1832

Page count: 12

