An Adaptive Method for Cross-Recording Speaker Diarization

Cited by: 3
Authors
Le Lan, Gael [1]
Charlet, Delphine [1]
Larcher, Anthony [2]
Meignier, Sylvain [2]
Affiliations
[1] Orange Labs, F-22300 Lannion, France
[2] Univ Le Mans, F-72085 Le Mans, France
Keywords
Speaker diarization; speaker linking; domain adaptation; ADAPTATION; LINKING; PLDA;
DOI
10.1109/TASLP.2018.2844025
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
State-of-the-art speaker diarization systems rely heavily on between-recording variability compensation methods to accurately process large collections of recordings. Variability estimation is performed on substantial training datasets, which must be labeled by speaker. One major problem of such systems is the acoustic mismatch between training and target data, which degrades performance. Most collections contain many speakers speaking in various acoustic conditions. In this paper, we investigate how unlabeled speakers can help improve between-recording variability estimation and thereby overcome the mismatch issue. We propose a scalable unsupervised adaptation framework for two types of variability compensation. The proposed framework consists of adapting a state-of-the-art diarization and linking system, trained on out-of-domain data, using the data of the collection itself. Experiments under mismatched conditions are run on two French television shows, while the initial training dataset is composed of radio recordings. Results indicate that the proposed adaptation framework reduces the cross-recording diarization error rate (DER) by 13% on average for variable collection sizes.
Pages: 1821-1832
Page count: 12