Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

Cited by: 0
Authors
Sun, Hanwu [1]
Nwe, Tin Lay [1]
Chin, Eugene [1]
Koh, Wei [1]
Ma, Bin [1]
Li, Haizhou [1]
Affiliation
[1] Inst Infocomm Res, Singapore 119613, Singapore
Source
Keywords
speaker clustering; microphone array; diarization; TDOA; time delay estimate; speaker segmentation
DOI
10.1117/12.740116
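Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809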
Abstract
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approach to speaker diarization under the Multiple Distant Microphones (MDM) condition in the conference room scenario. The proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on GMM modeling; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the Spring 2006 (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the RT-07 MDM evaluation task.
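As an illustration of the GCC-PHAT time delay estimate named in the abstract, the following is a minimal textbook-style sketch in Python with NumPy. It is not the authors' implementation: the function name `gcc_phat`, its parameters, and the small regularization constant are assumptions made for this example only.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref`.

    Hypothetical helper for illustration; a positive result means
    `sig` arrives later than `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, drop magnitude.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # assumed small constant to avoid division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(R, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)
```

Calling `gcc_phat(channel_k, reference_channel, fs)` for each distant microphone channel yields one delay per channel pair; how the system described here uses those delays for alignment is not specified in the abstract.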
Pages: 10
Related papers
50 in total
  • [31] The ELISA consortium approaches in broadcast news speaker segmentation during the NIST 2003 Rich Transcription evaluation
    Moraru, D
    Meignier, S
    Fredouille, C
    Besacier, L
    Bonastre, JF
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 373 - 376
  • [32] Politecnico di Torino System for the 2007 NIST Language Recognition Evaluation
    Castaldo, Fabio
    Dalmasso, Emanuele
    Laface, Pietro
    Colibro, Daniele
    Vair, Claudio
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 297 - +
  • [33] Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives
    Cerva, Petr
    Silovsky, Jan
    Zdansky, Jindrich
    Nouza, Jan
    Seps, Ladislav
    [J]. SPEECH COMMUNICATION, 2013, 55 (10) : 1033 - 1046
  • [34] The NIST SRE Summed Channel Speaker Recognition System
    Sun, Hanwu
    Ma, Bin
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1111 - 1114
  • [35] Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006
    Bruemmer, Niko
    Burget, Lukas
    Cernocky, Jan 'Honza'
    Glembek, Ondrej
    Grezl, Frantisek
    Karafiat, Martin
    van Leeuwen, David A.
    Matejka, Pavel
    Schwarz, Petr
    Strasheim, Albert
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): : 2072 - 2084
  • [36] Rapid channel compensation for speaker verification in the NIST 2000 speaker recognition evaluation
    Pelecanos, J.
    Sridharan, S.
    [J]. Acoustics Australia, 2001, 29 (01) : 17 - 20
  • [37] Nuance - Politecnico di Torino's 2016 NIST Speaker Recognition Evaluation System
    Colibro, Daniele
    Vair, Claudio
    Dalmasso, Emanuele
    Farrell, Kevin
    Karvitsky, Gennady
    Cumani, Sandro
    Laface, Pietro
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1338 - 1342
  • [38] The MIT-LL, JHU and LRDE NIST 2016 Speaker Recognition Evaluation System
    Torres-Carrasquillo, Pedro A.
    Richardson, Fred
    Nercessian, Shahan
    Sturim, Douglas
    Campbell, William
    Gwon, Youngjune
    Vattam, Swaroop
    Dehak, Najim
    Mallidi, Harish
    Nidadavolu, Phani Sankar
    Li, Ruizhi
    Dehak, Reda
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1333 - 1337
  • [39] LOQUENDO - POLITECNICO DI TORINO'S 2010 NIST SPEAKER RECOGNITION EVALUATION SYSTEM
    Castaldo, Fabio
    Colibro, Daniele
    Vair, Claudio
    Cumani, Sandro
    Laface, Pietro
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5464 - 5467
  • [40] Nuance - Politecnico di Torino's 2012 NIST Speaker Recognition Evaluation System
    Colibro, Daniele
    Vair, Claudio
    Farrell, Kevin
    Krause, Nir
    Karvitsky, Gennady
    Cumani, Sandro
    Laface, Pietro
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1995 - 1999