Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition

Cited by: 8

Authors:

Hanilci, Cemal [1]

Ertas, Figen [1]

Affiliations:

[1] Uludag Univ, Dept Elect Engn, Bursa, Turkey

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2011 / Vol. 37 / Issue 01

Keywords:

IDENTIFICATION; ALGORITHM;

DOI:

10.1016/j.compeleceng.2010.08.001

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper evaluates the impact of three special forms of the Minkowski metric (Euclidean, City Block, and Chebychev distances) on the performance of conventional vector quantization (VQ) and Gaussian mixture model (GMM) based closed-set, text-independent speaker recognition systems, in terms of recognition rate and confidence in decisions. For the VQ based system, evaluations are carried out using the two most common clustering algorithms, LBG and K-means, and the results show which pairing of clustering algorithm and distance achieves the best recognition rate for a given codebook size. For the GMM based system, we introduce the metrics into the GMM by using a concatenation of the LBG and K-means algorithms to estimate the initial mean vectors, to which system performance is sensitive, and explore their impact on system performance. We also compare results obtained on clean speech (TIMIT) and telephone speech databases (NTIMIT and NIST2001) with those of the modern classifiers VQ-UBM and GMM-UBM. There are cases where the conventional VQ based system outperforms the modern systems. Moreover, the impact of distance metrics on the performance of the conventional and modern systems depends on the recognition task imposed (verification or identification). Crown Copyright (C) 2010 Published by Elsevier Ltd. All rights reserved.
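As a rough illustration of the three distances named in the abstract, the sketch below computes the Minkowski distance for p = 1 (City Block), p = 2 (Euclidean), and p = infinity (Chebychev), and uses it for a nearest-codeword, minimum-average-distortion identification decision. This is not the authors' implementation: the function names (`minkowski`, `avg_distortion`, `identify`), the toy codebooks, and the decision rule are all assumptions made for illustration only.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        return max(diffs)  # Chebychev: the limit as p goes to infinity
    return sum(d ** p for d in diffs) ** (1.0 / p)

def avg_distortion(frames, codebook, p):
    """Average distance from each feature frame to its nearest codeword."""
    total = sum(min(minkowski(f, c, p) for c in codebook) for f in frames)
    return total / len(frames)

def identify(frames, codebooks, p):
    """Pick the speaker whose codebook gives the lowest average distortion.

    `codebooks` maps a (hypothetical) speaker label to a list of codewords.
    """
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk], p))

# Toy data, invented for this example (not taken from the paper).
toy_codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[10.0, 10.0], [11.0, 11.0]],
}
toy_frames = [[0.1, 0.2], [0.9, 1.0]]

for name, p in [("City Block", 1), ("Euclidean", 2), ("Chebychev", math.inf)]:
    print(name, minkowski([0.0, 0.0], [3.0, 4.0], p), identify(toy_frames, toy_codebooks, p))
```

In a real system the frames would be acoustic feature vectors and the codebooks would come from a clustering algorithm such as LBG or K-means; here both are hand-written so that the effect of changing `p` is easy to follow.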


Pages: 41-56

Page count: 16

50 items in total

[21] Score Regulation based on GMM Token Ratio Similarity for Speaker Recognition
Yang, Yingchun
Deng, Licai
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 424 - 424
[22] A new common component GMM-based speaker recognition method
Wang, YR
Chiang, CY
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 645 - 648
[23] A GMM SUPERVECTOR KERNEL WITH THE BHATTACHARYYA DISTANCE FOR SVM BASED SPEAKER RECOGNITION
You, Chang Huai
Lee, Kong Aik
Li, Haizhou
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4221 - 4224
[24] An SVM Kernel With GMM-Supervector Based on the Bhattacharyya Distance for Speaker Recognition
You, Chang Huai
Lee, Kong Aik
Li, Haizhou
IEEE SIGNAL PROCESSING LETTERS, 2009, 16 (1-3) : 49 - 52
[25] GMM-based Bhattacharyya kernel Fisher Discriminant Analysis for speaker recognition
Chao, YH
Wang, HM
Chang, RC
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 649 - 652
[26] A speaker recognition method based on GMM using non-negative matrix factorization
Huang, Liming
Liu, Dongbo
Fang, Yu
Wang, Weibo
2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022, : 870 - 875
[27] GMM-SVM Kernel With a Bhattacharyya-Based Distance for Speaker Recognition
You, Chang Huai
Lee, Kong Aik
Li, Haizhou
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (06) : 1300 - 1312
[28] Text-independent speaker recognition by combining speaker-specific GMM with speaker adapted syllable-based HMM
Nakagawa, S
Zhang, W
Takahashi, M
2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 81 - 84
[29] Performance Comparison of DCT and VQ Based Techniques for Iris Recognition
H.B.Kekre
Tanuja K. Sarode
Vinayak Ashok Bharadi
Abhishek A. Agrawal
Rohan J. Arora
Mahesh C. Nair
Journal of Electronic Science and Technology, 2010, 8 (03) : 223 - 229
[30] New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition
Zergat, Kawthar
Amrouche, Abderrahmane
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2014, 17 (04) : 373 - 381
