Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

Authors
Sun, Hanwu [1]
Nwe, Tin Lay [1]
Chin, Eugene [1]
Koh, Wei [1]
Bin, Ma [1]
Li, Haizhou [1]
Affiliation
[1] Inst Infocomm Res, Singapore 119613, Singapore
Keywords
speaker clustering; microphone array; diarization; TDOA; time delay estimate; speaker segmentation
DOI
10.1117/12.740116
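Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract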
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I²R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approach to speaker diarization under the Multiple Distant Microphones (MDM) condition in the conference room scenario. The proposed system consists of six modules: (1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); (2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; (3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimate (TDE) among all the distant microphone channel signals; (4) a speaker clustering algorithm based on GMM modeling; (5) non-speech removal via a speech/non-speech verification mechanism; and (6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
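To make the GCC-PHAT time delay estimate named in module (3) concrete, here is a minimal sketch of the textbook technique in Python with NumPy. It is not the authors' implementation: the function name `gcc_phat`, its parameters, the zero-padding choice, and the small regularization constant are all assumptions made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds (hypothetical helper).

    A positive result means `sig` lags `ref`.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, discard magnitude.
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # assumed regularizer to avoid division by zero

    # Back to the lag domain.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to plausible lags (at least one sample either side).
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs

# Usage: delay a white-noise reference by 5 samples and recover the delay.
rng = np.random.default_rng(0)
fs = 16000
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref))
tau = gcc_phat(sig, ref, fs)
print(round(tau * fs))
```

The PHAT weighting normalizes every frequency bin to unit magnitude, which sharpens the correlation peak and is one reason the method is commonly used with distant microphones in reverberant rooms. In a multi-channel setting, one channel would typically serve as `ref` and the estimated delays would be used to align the remaining channels.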
Pages: 10