Speaker verification using adapted Gaussian mixture models

Cited by: 2852
Authors
Reynolds, DA [1]
Quatieri, TF [1]
Dunn, RB [1]
Affiliations
[1] MIT, Lincoln Lab, Speech Syst Technol Grp, Lexington, MA 02420 USA
Keywords
speaker recognition; Gaussian mixture models; likelihood ratio detector; universal background model; handset normalization; NIST evaluation
DOI
10.1006/dspr.1999.0361
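Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and telecommunication technology]
Discipline classification codes
0808; 0809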
Abstract
In this paper we describe the major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system, which has been used successfully in several NIST Speaker Recognition Evaluations (SREs). The system is built around the likelihood ratio test for verification, using simple but effective GMMs for likelihood functions, a universal background model (UBM) for alternative speaker representation, and a form of Bayesian adaptation to derive speaker models from the UBM. The development and use of a handset detector and score normalization to greatly improve verification performance are also described and discussed. Finally, representative performance benchmarks and system behavior experiments on NIST SRE corpora are presented. © 2000 Academic Press.
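The pipeline summarized in the abstract (a UBM, speaker models derived from it by Bayesian adaptation, and a likelihood ratio score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the diagonal covariances, the mean-only adaptation, the relevance value of 16, the synthetic two-component UBM, and all function names are assumptions chosen for illustration, and handset detection and score normalization are omitted.

```python
import numpy as np

def component_log_probs(x, weights, means, variances):
    """log(w_i * N(x_t | mu_i, diag(var_i))) for frames x of shape (T, D)."""
    diff = x[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.log(2.0 * np.pi * variances).sum(axis=1)  # (M,)
    log_exp = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    return np.log(weights)[None, :] + log_norm[None, :] + log_exp  # (T, M)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def avg_log_likelihood(x, weights, means, variances):
    """Average per-frame log-likelihood of x under the GMM."""
    return logsumexp(component_log_probs(x, weights, means, variances), axis=1).mean()

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM (assumed variant; relevance is illustrative)."""
    comp = component_log_probs(x, weights, means, variances)
    post = np.exp(comp - logsumexp(comp, axis=1)[:, None])  # responsibilities (T, M)
    n = post.sum(axis=0)                                    # soft counts (M,)
    ex = post.T @ x / np.maximum(n, 1e-10)[:, None]         # data means (M, D)
    alpha = (n / (n + relevance))[:, None]                  # data-dependent mixing
    return alpha * ex + (1.0 - alpha) * means

def llr_score(x, ubm, speaker_means):
    """Average log-likelihood ratio: adapted speaker model versus UBM."""
    w, mu, var = ubm
    return avg_log_likelihood(x, w, speaker_means, var) - avg_log_likelihood(x, w, mu, var)

# Synthetic example: a hypothetical 2-component, 2-dimensional UBM.
rng = np.random.default_rng(0)
ubm = (np.array([0.5, 0.5]),
       np.array([[-2.0, 0.0], [2.0, 0.0]]),
       np.ones((2, 2)))
enroll = rng.normal(loc=[2.5, 1.0], scale=1.0, size=(500, 2))
spk_means = map_adapt_means(enroll, *ubm)
target_test = rng.normal(loc=[2.5, 1.0], scale=1.0, size=(200, 2))
print(llr_score(target_test, ubm, spk_means))
```

In this sketch a test utterance would be accepted when the score exceeds a chosen threshold; the threshold and any normalization of the score are left out because the abstract does not specify them.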
Pages: 19-41
Page count: 23
Related papers
50 in total
  • [41] Speaker verification using mixture decomposition discrimination
    Sukkar, RA
    Gandhi, MB
    Setlur, AR
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (03): : 292 - 299
  • [42] Privacy Preserving Speaker Verification using Adapted GMMs
    Pathak, Manas A.
    Raj, Bhiksha
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2416 - 2419
  • [43] Combining Gaussian mixture models and segmental feature models for speaker recognition
    Milosevic, Milana
    Glavitsch, Ulrike
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2042 - 2043
  • [44] Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models
    Kaminski, Kamil
    Majda, Ewelina
    Dobrowolski, Andrzej P.
    2013 SIGNAL PROCESSING: ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS (SPA), 2013, : 220 - 225
  • [45] Use of Gaussian Mixture Models in Macedonian Forensic Speaker Identification
    Gerazov, Branislav
    Pop-Dimitrijoska, Vesna
    Ivanovski, Zoran
    Apostolovska, Gordana
    2012 20TH TELECOMMUNICATIONS FORUM (TELFOR), 2012, : 724 - 727
  • [46] Telephone based speaker recognition using multiple binary classifier and Gaussian Mixture Models
    Castellano, PJ
    Slomka, S
    Sridharan, S
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1075 - 1078
  • [47] A Quality Measure Method Using Gaussian Mixture Models and Divergence Measure for Speaker Identification
    Zheng, Rong
    Zhang, Shuwu
    Xu, Bo
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2094 - 2097
  • [48] Improved Approach for Calculating Model Parameters in Speaker Recognition using Gaussian Mixture Models
    Metkar, Prashant
    Cohen, Aaron
    Parhi, Keshab
    2010 CONFERENCE RECORD OF THE FORTY FOURTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2010, : 567 - 570
  • [49] A session-GMM generative model using test utterance Gaussian mixture modeling for speaker verification
    Aronowitz, H
    Burshtein, D
    Amir, A
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 733 - 736
  • [50] Particle Swarm Optimization for Sorted Adapted Gaussian Mixture Models
    Saeidi, Rahim
    Mohammadi, Hamid Reza Sadegh
    Ganchev, Todor
    Rodman, Robert David
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (02): : 344 - 353