An Adaptive Method for Cross-Recording Speaker Diarization

被引:3
|
作者
Le Lan, Gael [1 ]
Charlet, Delphine [1 ]
Larcher, Anthony [2 ]
Meignier, Sylvain [2 ]
机构
[1] Orange Labs, F-22300 Lannion, France
[2] Univ Le Mans, F-72085 Le Mans, France
关键词
Speaker diarization; speaker linking; domain adaptation; ADAPTATION; LINKING; PLDA;
D O I
10.1109/TASLP.2018.2844025
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Nowadays, state-of-the-art speaker diarization systems heavily rely on between-recording variability compensation methods to accurately process large collections of recordings. Variability estimation is performed on consequent training datasets, which must be labeled by speaker. One major problem of such systems is the acoustic mismatch between training and target data that degrades performances. Most of the collections contain lots of speakers speaking in various acoustic conditions. In this paper, we investigate how unlabeled speakers can help improve between-recording variability estimation, to overcome the mismatch issue. We propose a scalable unsupervised adaptation framework for two types of variability compensation. The proposed framework consists in adapting a state-of-the-art diarization and linking system, trained on out-of-domain data, using the data of the collection itself. Experiments in mismatch condition are run on two French Television shows, while the initial training dataset is composed of Radio recordings. Results indicate that the proposed adaptation framework reduces the cross-recording DER of 13% in average for variable collection sizes.
引用
收藏
页码:1821 / 1832
页数:12
相关论文
共 50 条
  • [1] AN ADAPTIVE INITIALIZATION METHOD FOR SPEAKER DIARIZATION BASED ON PROSODIC FEATURES
    Imseng, David
    Friedland, Gerald
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4946 - 4949
  • [2] Phone Adaptive Training for Speaker Diarization
    Bozonnet, Simon
    Vipperla, Ravichander
    Evans, Nicholas
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 494 - 497
  • [3] INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION
    Rouvier, Mickael
    Favre, Benoit
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5585 - 5589
  • [4] Cross-recording reduction by trial recording on a magneto-optical disk
    Fuji, H
    Okumura, T
    Maeda, S
    Murakami, Y
    Akiyama, J
    Sato, H
    [J]. IEEE TRANSACTIONS ON MAGNETICS, 2000, 36 (03) : 591 - 596
  • [5] ADAPTIVE AND ONLINE SPEAKER DIARIZATION FOR MEETING DATA
    Soldi, Giovanni
    Beaugeant, Christophe
    Evans, Nicholas
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2112 - 2116
  • [6] Investigation of Cross-show Speaker Diarization
    Yang, Qian
    Jin, Qin
    Schultz, Tanja
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2936 - +
  • [7] Improving speaker diarization by cross EM refinement
    Ning, Huazhong
    Xu, Wei
    Gong, Yihong
    Huang, Thomas
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 1901 - 1904
  • [8] Study on Integration of Speaker Diarization with Speaker Adaptive Speech Recognition for Broadcast Transcription
    Silovsky, Jan
    Cerva, Petr
    Zdansky, Jindrich
    Nouza, Jan
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 478 - 481
  • [9] Speaker Diarization using Normalized Cross Likelihood Ratio
    Le, Viet-Bac
    Mella, Odile
    Fohr, Dominique
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 873 - 876
  • [10] MULTI-SPEAKER CONVERSATIONS, CROSS-TALK, AND DIARIZATION FOR SPEAKER RECOGNITION
    Sell, Gregory
    McCree, Alan
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5425 - 5429