Neural speech turn segmentation and affinity propagation for speaker diarization

Cited by: 14

Authors:

Yin, Ruiqing [1]

Bredin, Herve [1]

Barras, Claude [1]

Affiliation:

[1] Univ Paris Saclay, Univ Paris Sud, CNRS, LIMSI, F-91405 Orsay, France

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

speaker diarization; re-segmentation; LSTM; affinity propagation

DOI:

10.21437/Interspeech.2018-1750

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Speaker diarization is the task of determining "who speaks when" in an audio stream. Most diarization systems rely on statistical models to address four sub-tasks: speech activity detection (SAD), speaker change detection (SCD), speech turn clustering, and re-segmentation. First, following the recent success of recurrent neural networks (RNN) for SAD and SCD, we propose to address re-segmentation with Long Short-Term Memory (LSTM) networks. Then, we propose to use affinity propagation on top of neural speaker embeddings for speech turn clustering, outperforming regular Hierarchical Agglomerative Clustering (HAC). Finally, all these modules are combined and jointly optimized to form a speaker diarization pipeline in which all but the clustering step are based on RNNs. We provide experimental results on the French broadcast dataset ETAPE, where we reach state-of-the-art performance.
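
The clustering step mentioned in the abstract can be illustrated with a minimal, dependency-free sketch of affinity propagation. This is not the authors' implementation. The 2-D points stand in for neural speaker embeddings of speech turns, and the negative squared Euclidean similarity, the preference value of -20, the damping factor, and the helper name `similarity_matrix` are all assumptions chosen for illustration only.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster n items from an n x n similarity matrix S (list of lists).

    S[k][k] holds the "preference" of item k for becoming an exemplar.
    Returns, for each item, the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i as exemplar, relative to
        # the strongest competing candidate for i.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: accumulated evidence from other items that k
        # is a good exemplar.
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(pos) - pos[k]  # positive responsibilities from i != k
            for i in range(n):
                if i == k:
                    new = support
                else:
                    new = min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplar emerged; try another preference")
    # Exemplars label themselves; other items join the most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


def similarity_matrix(points, preference):
    """Negative squared Euclidean similarity, with `preference` on the diagonal."""
    n = len(points)
    return [[preference if i == k else
             -sum((x - y) ** 2 for x, y in zip(points[i], points[k]))
             for k in range(n)] for i in range(n)]


# Toy stand-ins for speech-turn embeddings: two well-separated "speakers".
embeddings = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = affinity_propagation(similarity_matrix(embeddings, preference=-20.0))
print(labels)
```

Unlike hierarchical agglomerative clustering, affinity propagation does not need a stopping threshold or a target number of clusters. The preference on the diagonal controls how many exemplars, and therefore how many speakers, emerge.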

Pages: 1393-1397

Page count: 5

50 records in total

[1] Speech Segmentation and Speaker Diarization using Time-Delay Neural Network
Toruk, Mesut
Serbes, Ahmet
Bilgin, Gokhan
[J]. 2019 INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS CONFERENCE (ASYU), 2019: 335-339
[2] Neural Network Speaker Descriptor in Speaker Diarization of Telephone Speech
Zajic, Zbynek
Zelinka, Jan
Mueller, Ludek
[J]. SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458: 555-563
[3] SEGMENTATION OF TV SHOWS INTO SCENES USING SPEAKER DIARIZATION AND SPEECH RECOGNITION
Bredin, Herve
[J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012: 2377-2380
[4] Speaker-turn aware diarization for speech-based cognitive assessments
Xu, Sean Shensheng
Ke, Xiaoquan
Mak, Man-Wai
Wong, Ka Ho
Meng, Helen
Kwok, Timothy C. Y.
Gu, Jason
Zhang, Jian
Tao, Wei
Chang, Chunqi
[J]. FRONTIERS IN NEUROSCIENCE, 2024, 17
[5] I-vector similarity based speech segmentation for interested speaker to speaker diarization system
Bae, Ara
Yoon, Ki-mu
Jung, Jaehee
Chung, Bokyung
Kim, Wooil
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39(05): 461-467
[6] MULTI-SCALE SPEAKER DIARIZATION WITH NEURAL AFFINITY SCORE FUSION
Park, Tae Jin
Kumar, Manoj
Narayanan, Shrikanth
[J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 7173-7177
[7] Speaker Diarization Using Gesture and Speech
Gebre, Binyam Gebrekidan
Wittenburg, Peter
Drude, Sebastian
Huijbregts, Marijn
Heskes, Tom
[J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014: 582-586
[8] Investigation of Segmentation in i-Vector Based Speaker Diarization of Telephone Speech
Zajic, Zbynek
Kunesova, Marie
Radova, Vlasta
[J]. SPEECH AND COMPUTER, 2016, 9811: 411-418
[9] Bayes Factor Based Speaker Segmentation for Speaker Diarization
Wang, D.
Vogt, R.
Sridharan, S.
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010: 1405-1408
[10] Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments
Xu, Sean Shensheng
Mak, Man-Wai
Wong, Ka Ho
Meng, Helen
Kwok, Timothy C. Y.
[J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021: 1299-1304