Investigation of Cross-show Speaker Diarization

Authors
Yang, Qian [1]
Jin, Qin
Schultz, Tanja [1]
Affiliations
[1] Karlsruhe Inst Technol, Cognit Syst Lab, D-76021 Karlsruhe, Germany
Keywords
speaker diarization; cross-show diarization; conversational podcast shows
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The goal of cross-show diarization is to index speech segments of speakers from a set of shows, with the particular challenge that speakers who reappear across shows have to be labeled with the same speaker identity. In this paper, we introduce three cross-show diarization systems: Global-BIC-Seg, Global-BIC-Cluster, and Incremental. We compare the three systems on a set of 46 English scientific podcast shows. Among them, Global-BIC-Cluster achieves the best performance, with cross-show diarization error rates (DER) of 15.53% and 13.21% on the dev and test sets, respectively. However, an incremental approach is more practical, since data and shows are typically collected over time. By applying T-Norm to our incremental system, we obtain relative improvements of 13.18% and 10.97% in cross-show DER on the dev and test sets, respectively. We also investigate the impact of the show processing order on cross-show diarization for the incremental system.
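As a rough illustration of two terms used in the abstract, the sketch below shows the textbook form of T-Norm (test normalization), in which a raw score is shifted and scaled by the mean and standard deviation of scores against a cohort of other speaker models, and the usual definition of a relative improvement in an error rate. This is not the authors' implementation: the function names, the use of the population standard deviation, and all numeric values are assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def t_norm(raw_score, cohort_scores):
    """Textbook T-Norm: (raw - cohort mean) / cohort standard deviation.

    cohort_scores are scores of the same test segment against a set of
    other (cohort) speaker models. Hypothetical helper, not from the paper.
    """
    if len(cohort_scores) < 2:
        raise ValueError("need at least two cohort scores")
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population std; an assumption of this sketch
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def relative_improvement(baseline_der, new_der):
    """Relative reduction of an error rate, in percent."""
    if baseline_der <= 0:
        raise ValueError("baseline error rate must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


if __name__ == "__main__":
    # Made-up scores and error rates, not results from the paper.
    print(t_norm(3.0, [0.0, 2.0]))
    print(relative_improvement(20.0, 18.0))
```

Normalizing against a cohort puts scores from different segments on a comparable scale, which is why a single decision threshold becomes easier to apply when new shows arrive over time.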
Pages: 2936 / +
Number of pages: 2