Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition

Cited by: 8
Authors
Hanilci, Cemal [1 ]
Ertas, Figen [1 ]
Affiliation
[1] Uludag Univ, Dept Elect Engn, Bursa, Turkey
Keywords
IDENTIFICATION; ALGORITHM;
DOI
10.1016/j.compeleceng.2010.08.001
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper evaluates the impact of three special forms of the Minkowski metric (Euclidean, City Block, and Chebychev distances) on the performance of conventional vector quantization (VQ) and Gaussian mixture model (GMM) based closed-set, text-independent speaker recognition systems, in terms of recognition rate and confidence in decisions. For the VQ based system, evaluations are carried out using the two most common clustering algorithms, LBG and K-means, and the paper shows which pairing of clustering algorithm and distance achieves the best recognition rate for a given codebook size. For the GMM based system, the metrics are introduced into the GMM by using a concatenation of the LBG and K-means algorithms to estimate the initial mean vectors, to which system performance is sensitive, and their impact on system performance is explored. Results obtained on clean speech (TIMIT) and telephone speech databases (NTIMIT and NIST2001) are also compared with those of the modern classifiers VQ-UBM and GMM-UBM. There are cases where the conventional VQ based system outperforms the modern systems. Moreover, the impact of distance metrics on the performance of the conventional and modern systems depends on the recognition task (verification or identification). Crown Copyright (C) 2010 Published by Elsevier Ltd. All rights reserved.
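For orientation, the three distances named in the abstract are the Minkowski metric of order p = 2 (Euclidean), p = 1 (City Block), and p → ∞ (Chebychev). The sketch below is illustrative only and is not taken from the paper: it assumes a common VQ identification rule in which each frame is matched to its nearest codeword and the speaker with the lowest average distortion is chosen. The function names, speaker labels, and toy vectors are all hypothetical.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 gives City Block, p = 2 gives Euclidean,
    p = math.inf gives Chebychev (largest coordinate difference).
    """
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def average_distortion(frames, codebook, p):
    """Mean distance from each frame to its nearest codeword (assumed scoring rule)."""
    return sum(min(minkowski(f, c, p) for c in codebook) for f in frames) / len(frames)

def identify(frames, codebooks, p):
    """Closed-set identification: pick the speaker whose codebook fits best."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name], p))

# Toy data (hypothetical): two-dimensional "feature vectors".
frames = [[0.0, 0.0], [3.0, 4.0]]
codebooks = {
    "speaker_a": [[0.0, 0.0], [3.0, 4.0]],
    "speaker_b": [[10.0, 10.0]],
}
best = identify(frames, codebooks, p=2)
```

Changing `p` in the call to `identify` swaps the metric without touching the rest of the scoring, which is the kind of substitution the abstract describes evaluating.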
Pages: 41-56
Number of pages: 16