Speech refinement using Bi-LSTM and improved spectral clustering in speaker diarization

Cited by: 0

Authors:

Gupta, Aishwarya [1]

Purwar, Archana [1]

Affiliations:

[1] Jaypee Inst Informat Technol, Comp Sci & Engn & Informat Technol, Noida, Uttar Pradesh, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Volume 83 / Issue 18

Keywords:

Speaker Diarization; Speech Refinement; Bi-directional Long Short-Term Memory (Bi-LSTM); Skip U-Net Connections; Singular Value Decomposition; Spectral clustering; MEAN SHIFT; ENHANCEMENT

DOI:

10.1007/s11042-023-17017-x

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In today's digitally driven culture, the demand for diarizing online meetings, classes, conferences, and medical diagnoses has increased considerably. Speaker diarization, a sub-domain of speaker recognition, has grown with the advent of neural networks over the last decade. Diarization generally refers to obtaining the speaking durations of individual speakers in an event. Researchers have suggested various approaches for multi-speaker diarization. However, it still suffers from environmental noise and non-speech sounds in the datasets, such as laughter, murmuring, and clapping. Hence, this paper proposes an improved speaker diarization pipeline to deal with the noise present in a dataset with multiple speakers. The pipeline uses a Bi-directional Long Short-Term Memory (Bi-LSTM)-based speech refinement pre-processing module and Modified Spectral Clustering with Symmetrized Singular Value Decomposition (MSC-SSVD). MSC-SSVD is used to address the problem of spectral clustering on large datasets. The proposed diarization pipeline is evaluated on the publicly available VoxConverse dataset. The Diarization Error Rates (DER) obtained are 37.2%, 37.1%, and 43.3%, respectively, for the three batches of the dataset under study. The results are also compared with the baseline system, and a significant change in DER of 6.1%, 4.7%, and 7%, respectively, is observed for the three batches.
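
The abstract reports its results as Diarization Error Rate. As a rough illustration of what that metric measures, and not the paper's evaluation code, the sketch below computes DER as (missed speech + false alarm + speaker confusion) divided by total reference speech. It makes several simplifying assumptions that are not taken from the paper: labels are given per frame, `None` marks non-speech, there is no overlapping speech, and no forgiveness collar is applied. The function name `frame_der` and the example labels are hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER for non-overlapping speech; None marks non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total_speech = sum(1 for r in reference if r is not None)
    if total_speech == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(reference, hypothesis))
    # Missed speech: reference has a speaker, hypothesis says non-speech.
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    # False alarm: hypothesis has a speaker, reference says non-speech.
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)

    # Frames where both sides claim speech; these can only be correct or confused.
    both = [(r, h) for r, h in pairs if r is not None and h is not None]
    ref_ids = sorted({r for r, _ in both} | {r for r in reference if r is not None})
    hyp_ids = sorted({h for h in hypothesis if h is not None})

    # Hypothesis labels are arbitrary, so search for the one-to-one mapping onto
    # reference speakers that maximises agreement. Padding with None lets a
    # hypothesis label stay unmatched. Brute force: fine for a handful of speakers.
    targets = ref_ids + [None] * len(hyp_ids)
    best_correct = 0
    for perm in permutations(targets, len(hyp_ids)):
        mapping = dict(zip(hyp_ids, perm))
        correct = sum(1 for r, h in both if mapping[h] == r)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_speech


# Hypothetical 8-frame example: 1 missed, 1 false alarm, 1 confused, 6 speech frames.
reference = ["A", "A", "A", "B", "B", None, None, "B"]
hypothesis = ["x", "x", "y", "y", "y", None, "y", None]
print(frame_der(reference, hypothesis))
```

Evaluation toolkits typically score time segments rather than frames and handle overlapping speech, so figures from a sketch like this are not directly comparable with the DER values quoted in the abstract.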


Pages: 54433-54448

Number of pages: 16
