Speech refinement using Bi-LSTM and improved spectral clustering in speaker diarization

Authors
Gupta, Aishwarya [1]
Purwar, Archana [1]
Affiliation
[1] Jaypee Institute of Information Technology, Computer Science & Engineering and Information Technology, Noida, Uttar Pradesh, India
Keywords
Speaker Diarization; Speech Refinement; Bi-directional Long Short-Term Memory (Bi-LSTM); Skip U-Net Connections; Singular Value Decomposition; Spectral Clustering; Mean Shift; Enhancement
DOI
10.1007/s11042-023-17017-x
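Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812 (Computer Science and Technology)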
Abstract
In today's digitally driven culture, the need and demand for diarizing online meetings, classes, conferences, and medical diagnosis sessions have grown considerably. Speaker diarization, a sub-domain of speaker recognition, has advanced with the adoption of neural networks over the last decade. Diarization generally refers to determining when, and for how long, each individual speaker talks in a recording. Researchers have proposed various approaches to multi-speaker diarization, but these still struggle with environmental noise and with non-speech sounds in the datasets, such as laughter, murmuring, and clapping. This paper therefore proposes an improved speaker diarization pipeline designed to handle the noise present in datasets with multiple speakers. The pipeline combines a speech refinement pre-processing module based on Bi-directional Long Short-Term Memory (Bi-LSTM) with Modified Spectral Clustering with Symmetrized Singular Value Decomposition (MSC-SSVD); MSC-SSVD is used to address the difficulty of applying spectral clustering to large datasets. The proposed pipeline is evaluated on the publicly available VoxConverse dataset. The Diarization Error Rates (DER) obtained are 37.2%, 37.1%, and 43.3% for the three batches of the dataset under study. Compared with the baseline system, a significant change in DER of 6.1%, 4.7%, and 7%, respectively, is observed for the three batches.
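The abstract does not describe how MSC-SSVD works internally. As a rough orientation only, the sketch below shows the generic idea its name points to: spectral clustering of speaker-segment embeddings, where the leading singular vectors of a symmetrized, degree-normalized affinity matrix serve as the spectral embedding that is then grouped with k-means. This is not the authors' algorithm. The cosine affinity, the normalization, the assumption that the number of speakers `k` is known, and the helper names `cosine_affinity`, `kmeans`, and `spectral_cluster_svd` are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_affinity(emb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise cosine similarity of segment embeddings, clipped to [0, 1]."""
    norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12)
    unit = emb / norms
    return np.clip(unit @ unit.T, 0.0, 1.0)


def kmeans(x: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def spectral_cluster_svd(emb: np.ndarray, k: int) -> np.ndarray:
    """Generic SVD-based spectral clustering sketch (NOT the paper's MSC-SSVD)."""
    a = cosine_affinity(emb)
    a = 0.5 * (a + a.T)  # symmetrize; matters only if the affinity was refined asymmetrically
    deg = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    # For a symmetric matrix, the top-k left singular vectors are the eigenvectors of the
    # k largest-magnitude eigenvalues; these match the usual top-k eigenvectors when
    # those eigenvalues are positive, as they are for a well-clustered affinity.
    u, _, _ = np.linalg.svd(a_norm)
    v = u[:, :k]
    v = v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-12)  # row-normalize
    return kmeans(v, k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    # Two synthetic "speakers": embeddings scattered around two orthogonal directions.
    spk_a = np.eye(dim)[0] * 5 + 0.3 * rng.standard_normal((10, dim))
    spk_b = np.eye(dim)[1] * 5 + 0.3 * rng.standard_normal((10, dim))
    print(spectral_cluster_svd(np.vstack([spk_a, spk_b]), k=2))
```

Using an SVD here is a design choice for illustration: because the normalized affinity is symmetric, its singular values are the absolute values of its eigenvalues, so the singular vectors recover the same spectral embedding as an eigendecomposition in the well-clustered case. A real diarization system would typically also have to estimate `k` rather than assume it.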
Pages: 54433-54448
Number of pages: 16
Journal: Multimedia Tools and Applications, 2024, Vol. 83