Supervised i-vector Modeling - Theory and Applications

Citations: 0
Authors
Ramoji, Shreyas [1]
Ganapathy, Sriram [1]
Affiliations
[1] Indian Inst Sci, Learning & Extract Acoust Patterns LEAP Lab, Elect Engn, Bengaluru, India
Keywords
Supervised Expectation Maximization; Total Variability Matrix; i-vector Modeling; Gaussian Back-end; SPEAKER; IDENTIFICATION; ROBUST;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Over the last decade, the factor analysis based modeling of a variable length speech utterance into a fixed dimensional vector (termed an i-vector) has been prominently used for many tasks like speaker recognition, language recognition and even speech recognition. The i-vector model is an unsupervised learning paradigm where the data is initially clustered using a Gaussian Mixture Universal Background Model (GMM-UBM). The adapted means of the Gaussian mixture components are reduced in dimensionality using the Total Variability Matrix (TVM), where the latent variables are modeled with a single Gaussian distribution. In this paper, we propose to rework the theory of i-vector modeling using a supervised framework where the speech utterances are associated with a label. Class labels are introduced in the i-vector model using a mixture Gaussian prior. We show that the proposed model is a generalized i-vector model and the conventional i-vector model turns out to be a special case of this model. This model is applied to a language recognition task using the NIST Language Recognition Evaluation (LRE) 2017 dataset. In these experiments, the supervised i-vector model provides significant improvements over the conventional i-vector model (average relative improvements of 5% in terms of C-avg).
Pages: 1091-1095
Page count: 5
Related papers
50 in total
  • [1] Supervised I-vector modeling for language and accent recognition
    Ramoji, Shreyas
    Ganapathy, Sriram
    [J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
  • [2] SIMPLIFIED AND SUPERVISED I-VECTOR MODELING FOR SPEAKER AGE REGRESSION
    Shivakumar, Prashanth Gurunath
    Li, Ming
    Dhandhania, Vedant
    Narayanan, Shrikanth S.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] SPEAKER VERIFICATION USING SIMPLIFIED AND SUPERVISED I-VECTOR MODELING
    Li, Ming
    Tsiartas, Andreas
    Van Segbroeck, Maarten
    Narayanan, Shrikanth S.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7199 - 7203
  • [4] Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification
    Li, Ming
    Narayanan, Shrikanth
    [J]. COMPUTER SPEECH AND LANGUAGE, 2014, 28 (04): : 940 - 958
  • [5] SUPERVISED DOMAIN ADAPTATION FOR I-VECTOR BASED SPEAKER RECOGNITION
    Garcia-Romero, Daniel
    McCree, Alan
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] I-VECTOR BASED LANGUAGE MODELING FOR QUERY REPRESENTATION
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Chen, Berlin
    Chen, Hsin-Hsi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5211 - 5215
  • [7] Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
    You, Lanhua
    Guo, Wu
    Song, Yan
    Zhang, Sheng
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 62 - 66
  • [8] I-VECTOR BASED LANGUAGE MODELING FOR SPOKEN DOCUMENT RETRIEVAL
    Chen, Kuan-Yu
    Lee, Hung-Shin
    Wang, Hsin-Min
    Chen, Berlin
    Chen, Hsin-Hsi
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] PLDA Modeling in I-Vector and Supervector Space for Speaker Verification
    Jiang, Ye
    Lee, Kong Aik
    Tang, Zhenmin
    Ma, Bin
    Larcher, Anthony
    Li, Haizhou
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1678 - 1681
  • [10] Multilingual I-Vector based Statistical Modeling for Music Genre Classification
    Dai, Jia
    Xue, Wei
    Liu, Wenju
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 459 - 463