Robust speaker recognition in cross-channel condition based on Gaussian mixture model

Cited by: 2
Authors
Shan, Yuxiang [1]
Liu, Jia [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Keywords
Cross-channel; Echo cancellation; Latent factor analysis; Noise reduction; Score normalization; Speaker verification; VARIABILITY; INFORMATION; FEATURES;
DOI
10.1007/s11042-009-0456-8
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
One of the most difficult challenges in speaker recognition is dealing with channel variability. In this paper, several new cross-channel compensation techniques are introduced for a Gaussian mixture model-universal background model (GMM-UBM) speaker verification system. These techniques include wideband noise reduction, echo cancellation, a simplified feature-domain latent factor analysis (LFA), and data-driven score normalization. A novel dynamic Gaussian selection algorithm is developed to reduce the feature compensation time by more than 60% without any performance loss. The performance of the different techniques across varying train/test channel conditions is presented and discussed. The results show that speech enhancement, which used to be neglected for telephone speech, is essential for cross-channel tasks, and that the channel compensation techniques developed for telephone channel speech also perform effectively. The per-microphone performance analysis further shows that speech enhancement can greatly boost the effect of the other techniques, especially on channels with larger signal-to-noise ratio (SNR) variance. All results are reported on NIST SRE 2006 and 2008 data and show a promising performance gain over the baseline. The developed system is also compared with other state-of-the-art speaker verification systems. The comparison shows that it achieves comparable or even better performance while consuming much less CPU time, making it more suitable for practical use.
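For readers unfamiliar with the GMM-UBM setting named in the abstract, the following is a minimal sketch of how a verification trial is typically scored, not the authors' implementation. It assumes diagonal-covariance mixtures and a speaker model that shares the UBM weights and variances (mean-only adaptation). It substitutes a fixed top-C component selection for the paper's dynamic Gaussian selection and plain Z-norm for its data-driven score normalization. All function and parameter names (llr_score, top_c, z_norm, and so on) are hypothetical.

```python
import numpy as np

def component_log_probs(frames, weights, means, variances):
    """Per-frame, per-component log(w_k * N(x | mu_k, diag(var_k)))."""
    diff = frames[:, None, :] - means[None, :, :]                       # (T, K, D)
    exponent = -0.5 * np.sum(diff ** 2 / variances, axis=2)             # (T, K)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    return np.log(weights) + log_norm + exponent

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def llr_score(frames, ubm, speaker_means, top_c=None):
    """Average per-frame log-likelihood ratio of speaker model vs. UBM.

    ubm is a (weights, means, variances) tuple. The speaker model is
    assumed to reuse the UBM weights and variances (an assumption of
    this sketch, not something stated in the abstract).
    """
    weights, means, variances = ubm
    ubm_lp = component_log_probs(frames, weights, means, variances)
    spk_lp = component_log_probs(frames, weights, speaker_means, variances)
    if top_c is not None:
        # Fixed top-C selection: keep only the C best-scoring UBM
        # components per frame and evaluate both models on those.
        idx = np.argsort(ubm_lp, axis=1)[:, -top_c:]
        ubm_lp = np.take_along_axis(ubm_lp, idx, axis=1)
        spk_lp = np.take_along_axis(spk_lp, idx, axis=1)
    return float(np.mean(logsumexp(spk_lp, 1) - logsumexp(ubm_lp, 1)))

def z_norm(raw_score, impostor_scores):
    """Z-norm: standardize a raw score with impostor-score statistics."""
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    return (raw_score - impostor_scores.mean()) / impostor_scores.std()
```

Restricting both models to the components that score best under the UBM is a common way to cut scoring cost, since an adapted speaker model usually stays close to the UBM component by component. Whether the paper's dynamic selection works along similar lines is not stated in the abstract.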
Pages: 159-173
Page count: 15
Related papers
50 in total
  • [1] Robust speaker recognition in cross-channel condition based on Gaussian mixture model
    Yuxiang Shan
    Jia Liu
    [J]. Multimedia Tools and Applications, 2011, 52 : 159 - 173
  • [2] Robust Speaker Recognition in Cross-channel Condition
    Shan, Yuxiang
    Liu, Jia
    [J]. PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 4344 - 4348
  • [3] Supervized Mixture of PLDA Models for Cross-Channel Speaker Verification
    Simonchik, Konstantin
    Pekhovsky, Timur
    Shulipa, Andrey
    Afanasyev, Anton
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1682 - 1685
  • [4] ACCURATE SPEAKER RECOGNITION BASED ON ADAPTIVE GAUSSIAN MIXTURE MODEL
    Wang Yunqi
    Yu Yibiao
    [J]. 2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, : 527 - 531
  • [5] Duration Weighted Gaussian Mixture Model Supervector Modeling for Robust Speaker Recognition
    Ji, Zhe
    Hou, Wei
    Jin, Xin
    Li, Zhi-Yi
    [J]. 2013 NINTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION (ICNC), 2013, : 238 - 241
  • [6] Cohort based speaker model synthesis for channel robust speaker recognition
    Wu, Wei
    Zheng, Thomas Fang
    Xu, Mingxing
    [J]. 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13, 2006, : 893 - 896
  • [7] CHANNEL ADVERSARIAL TRAINING FOR CROSS-CHANNEL TEXT-INDEPENDENT SPEAKER RECOGNITION
    Fang, Xin
    Zou, Liang
    Li, Jin
    Sun, Lei
    Ling, Zhen-Hua
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6221 - 6225
  • [8] Speaker recognition based on dynamic time warping and Gaussian mixture model
    Zhang, Nannan
    Yao, Yanru
    [J]. PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 1174 - 1177
  • [9] Speaker Recognition Based on SOINN and Incremental Learning Gaussian Mixture Model
    Tang, Zelin
    Shen, Furao
    Zhao, Jinxi
    [J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [10] Improved Gaussian Mixture Model and Application in Speaker Recognition
    Bao Lingling
    Shen Xizhong
    [J]. PROCEEDINGS OF 2016 THE 2ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS, 2016, : 387 - 390