Chronological Self-Training for Real-Time Speaker Diarization

Cited by: 0

Authors:

Padfield, Dirk [1]

Liebling, Daniel J. [1]

Affiliation:

[1] Google Research, Mountain View, CA 94043 USA

Source:

INTERSPEECH 2021 | 2021

Keywords:

Diarization; real-time; d-vector; self-training; classification; clustering

DOI:

10.21437/Interspeech.2021-822

Chinese Library Classification:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Diarization partitions an audio stream into segments based on the voices of the speakers. Real-time diarization systems that include an enrollment step should limit enrollment training samples to reduce user interaction time. Although training on a small number of samples yields poor performance, we show that the accuracy can be improved dramatically using a chronological self-training approach. We studied the tradeoff between training time and classification performance and found that 1 second is sufficient to reach over 95% accuracy. We evaluated on 700 audio conversation files of about 10 minutes each from 6 different languages and demonstrated average diarization error rates as low as 10%.

Pages: 4613-4617

Page count: 5
