Chronological Self-Training for Real-Time Speaker Diarization

Authors
Padfield, Dirk [1]
Liebling, Daniel J. [1]
Affiliation
[1] Google Research, Mountain View, CA 94043, USA
Source
Interspeech 2021
Keywords
Diarization; real-time; d-vector; self-training; classification; clustering
DOI
10.21437/Interspeech.2021-822
Chinese Library Classification (CLC) numbers
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes
100104; 100213
Abstract
Diarization partitions an audio stream into segments according to which speaker is talking. Real-time diarization systems that include an enrollment step should limit the number of enrollment training samples to keep user interaction time short. Although training on only a few samples yields poor performance, we show that accuracy can be improved dramatically with a chronological self-training approach. We studied the tradeoff between training time and classification performance and found that 1 second of training is sufficient to reach over 95% accuracy. We evaluated the approach on 700 conversational audio files, each about 10 minutes long, in 6 different languages, and obtained average diarization error rates as low as 10%.
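The abstract does not specify the classifier or the exact self-training rule, so the following is only a hedged illustration of the general idea, not the authors' method. It assumes that (a) each enrolled speaker contributes a few d-vector embeddings, (b) a nearest-centroid cosine classifier stands in for whatever classifier the paper uses, and (c) a new embedding is fed back into training only when its best score beats the runner-up by a hypothetical `margin`. The names `chronological_self_train`, `_normalize`, and `margin` are illustrative assumptions.

```python
import numpy as np


def _normalize(x):
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def chronological_self_train(enroll, enroll_labels, stream, margin=0.1):
    """Label a chronological stream of speaker embeddings (hypothetical sketch).

    enroll:        (n, d) enrollment embeddings, e.g. d-vectors from a short enrollment
    enroll_labels: (n,) integer speaker ids for the enrollment embeddings
    stream:        (t, d) embeddings in the order they arrive
    margin:        assumed confidence threshold for self-training (not from the paper)
    """
    enroll = _normalize(np.asarray(enroll, dtype=float))
    enroll_labels = np.asarray(enroll_labels)
    speakers = np.unique(enroll_labels)

    # Running per-speaker sums and counts; their ratio is the speaker centroid.
    sums = np.stack([enroll[enroll_labels == s].sum(axis=0) for s in speakers])
    counts = np.array([np.sum(enroll_labels == s) for s in speakers], dtype=float)

    preds = []
    for x in _normalize(np.asarray(stream, dtype=float)):
        # Classify using only enrollment data plus earlier confident predictions.
        centroids = _normalize(sums / counts[:, None])
        scores = centroids @ x
        best = int(np.argmax(scores))
        preds.append(speakers[best])

        # Self-training step: absorb the sample only if it clearly beats the runner-up.
        runner_up = np.max(np.delete(scores, best)) if len(scores) > 1 else -np.inf
        if scores[best] - runner_up >= margin:
            sums[best] += x
            counts[best] += 1
    return np.array(preds)


if __name__ == "__main__":
    # Toy 2-D "embeddings": speaker 0 points along x, speaker 1 along y.
    enroll = np.array([[1.0, 0.0], [0.0, 1.0]])
    labels = np.array([0, 1])
    stream = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]])
    print(chronological_self_train(enroll, labels, stream))  # prints [0 1 0]
```

Processing the stream strictly in time order means each decision uses only past audio, which is all a real-time system has available. In this sketch the margin guards against absorbing likely mislabeled segments, at the cost of adapting more slowly to each speaker.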
Pages: 4613–4617
Number of pages: 5