Analysis of Language Dependent Front-End for Speaker Recognition

Authors
Madikeri, Srikanth [1]
Dey, Subhadeep [1, 2]
Motlicek, Petr [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Keywords
i-vector; speaker recognition; deep neural networks
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In Deep Neural Network (DNN) i-vector based speaker recognition systems, acoustic models trained for Automatic Speech Recognition (ASR) are employed to estimate sufficient statistics for i-vector modeling. The DNN-based acoustic model is typically trained on a well-resourced language such as English. In evaluation conditions where enrollment and test data are not in English, as in the NIST SRE 2016 dataset, a DNN acoustic model generalizes poorly. In such conditions, a conventional Universal Background Model/Gaussian Mixture Model (UBM/GMM) based i-vector extractor performs better than the DNN-based i-vector system. In this paper, we address the scenario in which one can develop an Automatic Speech Recognizer with limited resources for a language present in the evaluation condition, thus enabling the use of a DNN acoustic model instead of UBM/GMM. Experiments are performed on the Tagalog subset of the NIST SRE 2016 dataset assuming an open training condition. With a DNN i-vector system trained for Tagalog, a relative improvement of 12.1% is obtained over a baseline system trained for English.
Pages: 1101-1105
Page count: 5
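
The abstract refers to estimating sufficient statistics for i-vector modeling from an acoustic model. As a rough illustration only, and not the authors' implementation, the sketch below accumulates the usual zeroth- and first-order statistics from per-frame class posteriors. In a UBM/GMM system the posteriors would come from the Gaussian components; in a DNN-based system they would come from the output classes of the ASR acoustic model. The function name `baum_welch_stats`, the argument layout, and the toy numbers are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    features: Sequence[Sequence[float]],
    posteriors: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics per class.

    features:   T frames, each a D-dimensional feature vector.
    posteriors: T frames, each holding C class posteriors
                (hypothetically from a UBM/GMM or a DNN acoustic model).
    Returns (zeroth, first): zeroth[c] is the sum of posteriors of class c,
    first[c] is the posterior-weighted sum of feature vectors for class c.
    """
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors must have the same number of frames")
    if not features:
        raise ValueError("at least one frame is required")

    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]

    for x_t, gamma_t in zip(features, posteriors):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma                 # N_c = sum_t gamma_t(c)
            for d, value in enumerate(x_t):
                first[c][d] += gamma * value   # F_c = sum_t gamma_t(c) * x_t
    return zeroth, first


# Toy input (made up for illustration): 3 frames, 2 dimensions, 2 classes.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
zeroth, first = baum_welch_stats(features, posteriors)
print(zeroth)  # [1.5, 1.5]
print(first)   # [[2.5, 4.0], [6.5, 8.0]]
```

The accumulation does not depend on where the posteriors come from, which is why swapping the UBM/GMM for a DNN acoustic model changes only the front-end and leaves the i-vector extractor itself untouched.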