Relevance factor of maximum a posteriori adaptation for GMM-NAP-SVM in speaker and language recognition

Authors
You, Chang Huai [1]
Li, Haizhou [1]
Lee, Kong Aik [1]
Affiliations
[1] A*STAR, Institute for Infocomm Research, Singapore, Singapore
Source
COMPUTER SPEECH AND LANGUAGE | 2015, Vol. 30, Issue 1
Keywords
Maximum a posteriori; Supervector; Gaussian mixture model; Support vector machine; DISTANCE; KERNEL
DOI
10.1016/j.csl.2014.09.003
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper studies the relevance factor in maximum a posteriori (MAP) adaptation of the Gaussian mixture model (GMM) for speaker and language recognition. Because the relevance factor determines how much the observed training data influence the model adaptation, and thus the resulting GMM, more effective modeling should be achievable if the relevance factor adapts to the corresponding data. We therefore provide a mathematical derivation for the estimation of the relevance factor. The GMM supervector support vector machine (SVM) with nuisance attribute projection (NAP), denoted GMM-NAP-SVM, has been reported to be effective and reliable for speaker and language recognition. Being a discriminative classifier by nature, a GMM-NAP-SVM system is sensitive to the magnitude and direction of a supervector in the high-dimensional space. However, when characterizing a speech utterance with a GMM supervector estimated through MAP, we observe that the resulting supervector is undesirably affected by the varying duration of the utterance. We propose an adaptive relevance factor that adapts to the duration to mitigate the variability caused by utterance length. We systematically investigate different types of MAP relevance factor on different application platforms. We show the efficacy of the data-dependent and adaptive relevance factors on the National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2008 and language recognition evaluation (LRE) 2009 and 2011 tasks, respectively. © 2014 Elsevier Ltd. All rights reserved.
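To illustrate the duration effect the abstract describes, the following is a minimal sketch of classical mean-only MAP adaptation with a fixed relevance factor, where each component's adapted mean is alpha_k * E_k[x] + (1 - alpha_k) * mu_k with alpha_k = n_k / (n_k + r). It is not the estimator derived in the paper, whose formula this record does not give. The function name `map_adapt_means`, the toy universal background model, the synthetic frames, and the default r = 16 are all assumptions made for illustration.

```python
import numpy as np

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (K,); means, variances: (K, D).
    relevance: fixed relevance factor r (16 is an assumed, commonly used value).
    Returns the adapted means (K, D) and the adaptation coefficients alpha (K,).
    """
    # Log-density of every frame under every Gaussian component.
    diff = frames[:, None, :] - means[None, :, :]                  # (T, K, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Component posteriors per frame, normalized in a numerically stable way.
    log_post = np.log(weights) + log_gauss                         # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                                           # (K,)
    data_mean = (post.T @ frames) / np.maximum(n, 1e-10)[:, None]  # (K, D)
    # alpha_k = n_k / (n_k + r): more frames pull the mean further from the prior.
    alpha = n / (n + relevance)
    adapted = alpha[:, None] * data_mean + (1.0 - alpha)[:, None] * means
    return adapted, alpha

# Hypothetical toy background model and synthetic "utterances".
rng = np.random.default_rng(0)
K, D = 4, 3
ubm_weights = np.full(K, 1.0 / K)
ubm_means = rng.normal(size=(K, D))
ubm_vars = np.ones((K, D))

short_utt = rng.normal(loc=0.5, size=(50, D))
long_utt = np.tile(short_utt, (10, 1))   # same content, ten times the duration

means_short, alpha_short = map_adapt_means(short_utt, ubm_weights, ubm_means, ubm_vars)
means_long, alpha_long = map_adapt_means(long_utt, ubm_weights, ubm_means, ubm_vars)

# Supervector = stacked adapted means; compare how far each one moves from the prior.
shift_short = np.linalg.norm(means_short.ravel() - ubm_means.ravel())
shift_long = np.linalg.norm(means_long.ravel() - ubm_means.ravel())
print("supervector shift (short, long):", shift_short, shift_long)
```

With a fixed r, the two utterances carry identical content yet yield supervectors at different distances from the prior means, which is the kind of duration dependence a magnitude-sensitive SVM would pick up.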
Pages: 116-134
Number of pages: 19