Speech Enhancement for Multimodal Speaker Diarization System

Cited by: 6
Authors
Ahmad, Rehan [1 ]
Zubair, Syed [2 ,3 ]
Alquhayz, Hani [4 ]
Affiliations
[1] Int Islamic Univ, Dept Elect Engn, Islamabad 44000, Pakistan
[2] Analyt Camp, Islamabad 44000, Pakistan
[3] Univ Sialkot, Dept Comp Sci, Sialkot 51310, Pakistan
[4] Majmaah Univ, Dept Comp Sci & Informat, Coll Sci Zulfi, Al Majmaah 11952, Saudi Arabia
Source
IEEE Access | 2020 / Vol. 8
Keywords
Multimodal speaker diarization; LSTM; audio-visual synchronization; additive white Gaussian noise; Gaussian mixture model; diarization error rate; SEGMENTATION; SEPARATION; MEETINGS
DOI
10.1109/ACCESS.2020.3007312
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A speaker diarization system identifies speaker-homogeneous regions in recordings in which multiple speakers are present. It answers the question 'who spoke when?'. Datasets for speaker diarization usually consist of telephone conversations, meetings, TV/talk shows, broadcast news, and other multi-speaker recordings. In this paper, we present the performance of our proposed multimodal speaker diarization system under noisy conditions. Two types of noise, additive white Gaussian noise (AWGN) and realistic environmental noise, are used to evaluate the system. To mitigate the effect of noise, we propose adding an LSTM-based speech enhancement block to our diarization pipeline. This block is trained on a synthesized dataset containing more than 100 noise types to enhance the noisy speech. The enhanced speech is passed to the multimodal speaker diarization system, which uses a pre-trained audio-visual synchronization model to find the active speaker. High-confidence active-speaker segments are then used to train speaker-specific clusters on the enhanced speech. A subset of the AMI corpus consisting of 5.4 h of recordings is used in this analysis. For AWGN, the performance improvement of the LSTM model is comparable to that of the Wiener filter, while for realistic environmental noise, the LSTM model improves significantly over the Wiener filter in terms of diarization error rate (DER).
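The evaluation described above corrupts speech with AWGN and scores the output with DER. As an illustration only, the sketch below shows the textbook forms of both: white Gaussian noise scaled so the mixture reaches a target SNR in dB, and a frame-level DER computed as (missed speech + false alarm + speaker confusion) / total reference speech. The function names `add_awgn` and `frame_der`, the per-frame label representation, and the assumption that hypothesis labels are already mapped to reference speaker names are hypothetical simplifications, not the authors' evaluation setup. Standard scoring tools also search for the optimal speaker mapping, handle overlapping speech, and may apply a forgiveness collar around segment boundaries.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise at the target SNR (in dB)."""
    if not signal:
        raise ValueError("signal must be non-empty")
    rng = random.Random(seed)
    # Mean signal power, then the noise power that yields the requested SNR:
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def frame_der(reference, hypothesis):
    """Frame-level diarization error rate.

    Each list holds one speaker label per frame, or None for non-speech.
    Hypothetical simplification: hypothesis labels are assumed to already use
    the reference speaker names, one speaker per frame, and no collar is applied.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    missed = false_alarm = confusion = scored = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            scored += 1  # every reference speech frame counts toward the denominator
            if hyp is None:
                missed += 1  # speech labelled as non-speech
            elif hyp != ref:
                confusion += 1  # speech attributed to the wrong speaker
        elif hyp is not None:
            false_alarm += 1  # non-speech labelled as speech
    if scored == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / scored


if __name__ == "__main__":
    ref = ["A", "A", "B", "B", None, None, "A", "A"]
    hyp = ["A", "A", "A", "B", None, "B", "A", None]
    # 1 confusion + 1 false alarm + 1 miss over 6 reference speech frames
    print(frame_der(ref, hyp))  # 0.5
```

In this toy example, frame 3 is a speaker confusion, frame 6 is a false alarm, and frame 8 is missed speech, so the DER is 3/6. Because false alarms are added to the numerator but not the denominator, DER can exceed 1.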
Pages: 126671-126680
Page count: 10