COMPARISON OF TEXT-INDEPENDENT SPEAKER RECOGNITION METHODS USING VECTOR-QUANTIZATION DISTORTION AND DISCRETE AND CONTINUOUS HMMS

Cited by: 0
Authors
MATSUI, T
FURUI, S
Affiliations
[1] NTT Human Interface Laboratories, Musashino
Keywords
SPEAKER RECOGNITION; TEXT-INDEPENDENT; VECTOR QUANTIZATION; ERGODIC HMM; UTTERANCE VARIATION;
DOI
10.1002/ecjc.4430771207
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
The results of speaker recognition methods using vector quantization (VQ) distortion and discrete or continuous ergodic hidden Markov models (HMMs) are compared. The effectiveness of these methods is examined from the viewpoint of robustness against utterance variation such as differences in content, temporal variation, and changes in utterance speed. It is shown that the continuous HMM performs much better than the discrete HMM and that its performance is close to that of the VQ distortion method. When the amount of training data is limited, however, the VQ distortion method achieves a better recognition rate than the continuous HMM. The transition information between the states is shown to contribute little to identifying the individual characteristics of a voice. Increasing the number of states and increasing the number of mixture components per state have an equal effect, and recognition performance is almost completely determined by the product of these two numbers.
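As a rough illustration of the VQ distortion approach named in the abstract, the following minimal sketch scores a test utterance against per-speaker codebooks and picks the speaker with the lowest average distortion. The abstract does not specify the features, the distance measure, or how codebooks are trained, so everything here is an assumption: the squared Euclidean distance, the hand-written two-dimensional codebooks, and the helper names `vq_distortion` and `identify_speaker` are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed distortion measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_distortion(frames: List[Vector], codebook: List[Vector]) -> float:
    """Average distance from each frame to its nearest codeword."""
    if not frames or not codebook:
        raise ValueError("frames and codebook must be non-empty")
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames: List[Vector],
                     codebooks: Dict[str, List[Vector]]) -> str:
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: vq_distortion(frames, codebooks[name]))


# Toy codebooks; a real system would train them on each speaker's feature
# vectors (for example with k-means), which is outside this sketch.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
best = identify_speaker(test_frames, codebooks)
```

Because each frame is scored independently of its neighbours, this kind of model carries no state-transition information, which fits the abstract's remark that transitions contribute little to identifying a speaker.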
Pages: 63 - 70
Number of pages: 8
Related papers
50 items in total
  • [41] Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning
    El-Moneim S.A.
    El-Mordy E.A.
    Nassar M.A.
    Dessouky M.I.
    Ismail N.A.
    El-Fishawy A.S.
    El-Dolil S.
    El-Dokany I.M.
    El-Samie F.E.A.
    International Journal of Speech Technology, 2022, 25 (03) : 679 - 687
  • [42] Text-Independent Speaker Recognition System Using Feature-Level Fusion for Audio Databases of Various Sizes
    Chauhan N.
    Isshiki T.
    Li D.
    SN Computer Science, 4 (5)
  • [43] Toward Text-independent Cross-lingual Speaker Recognition Using English-Mandarin-Taiwanese Dataset
    Wu, Yi-Chieh
    Liao, Wen-Hung
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 8515 - 8522
  • [44] Vector quantization in text dependent automatic speaker recognition using Mel-Frequency Cepstrum Coefficient
    Kabir, Ahsanul
    Ahsan, Sheikh Mohammad Masudul
    PROCEEDINGS OF THE WSEAS INTERNATIONAL CONFERENCE ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING: SELECTED TOPICS ON CIRCUITS, SYSTEMS, ELECTRONICS, CONTROL & SIGNAL PROCESSING, 2007, : 352 - 355
  • [45] Text and Language-Independent Speaker Recognition Using Suprasegmental Features and Support Vector Machines
    Bajpai, Anvita
    Pathangay, Vinod
    CONTEMPORARY COMPUTING, PROCEEDINGS, 2009, 40 : 307 - +
  • [46] SPEAKER-DEPENDENT ISOLATED WORD RECOGNITION USING SPEAKER-INDEPENDENT VECTOR QUANTIZATION CODEBOOKS AUGMENTED WITH SPEAKER-SPECIFIC DATA
    BURTON, DK
    SHORE, JE
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02): : 440 - 443
  • [47] Enhanced text-independent speaker recognition using MFCC, Bi-LSTM, and CNN-based noise removal techniques
    Tiwari, Manish
    Verma, Deepak Kumar
    International Journal of Speech Technology, 2024, 27 (04) : 1013 - 1026
  • [48] Comparison of Indonesian Speaker Recognition Using Vector Quantization and Hidden Markov Model for Unclear Pronunciation Problem
    Handaya, Devi
    Fakhruroja, Hanif
    Hidayat, Egi Muhammad Idris
    Machbub, Carmadi
    PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON SYSTEM ENGINEERING AND TECHNOLOGY (ICSET), 2016, : 39 - 45
  • [49] Single-sided Approach to Discriminative PLDA Training for Text-Independent Speaker Verification without Using Expanded I-vector
    Hirano, Ikuya
    Lee, Kong Aik
    Zhang, Zhaofeng
    Wang, Longbiao
    Kai, Atsuhiko
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 59 - +
  • [50] Gender recognition in text-independent speaker identification using MFCC, spectrogram, Bi-LSTM, and rat swarm evolutionary algorithm optimization
    Manish Tiwari
    Deepak Kumar Verma
    International Journal of Speech Technology, 2025, 28 (1) : 245 - 260