Supervised i-vector Modeling - Theory and Applications

Cited by: 0

Authors:

Ramoji, Shreyas [1]

Ganapathy, Sriram [1]

Affiliations:

[1] Indian Inst Sci, Learning & Extract Acoust Patterns LEAP Lab, Elect Engn, Bengaluru, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Supervised Expectation Maximization; Total Variability Matrix; i-vector Modeling; Gaussian Back-end; SPEAKER; IDENTIFICATION; ROBUST;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Over the last decade, factor analysis based modeling of a variable-length speech utterance as a fixed-dimensional vector (termed an i-vector) has been prominently used for many tasks such as speaker recognition, language recognition, and even speech recognition. The i-vector model is an unsupervised learning paradigm in which the data is initially clustered using a Gaussian Mixture Universal Background Model (GMM-UBM). The adapted means of the Gaussian mixture components are reduced in dimensionality using the Total Variability Matrix (TVM), where the latent variables are modeled with a single Gaussian distribution. In this paper, we propose to rework the theory of i-vector modeling using a supervised framework in which the speech utterances are associated with a label. Class labels are introduced in the i-vector model using a mixture Gaussian prior. We show that the proposed model is a generalized i-vector model and that the conventional i-vector model turns out to be a special case of this model. This model is applied to a language recognition task using the NIST Language Recognition Evaluation (LRE) 2017 dataset. In these experiments, the supervised i-vector model provides significant improvements over the conventional i-vector model (average relative improvements of 5% in terms of C-avg).


Pages: 1091-1095

Page count: 5

Related records (50 in total):

[1] Supervised I-vector modeling for language and accent recognition
Ramoji, Shreyas
Ganapathy, Sriram
[J]. COMPUTER SPEECH AND LANGUAGE, 2020, 60
[2] SIMPLIFIED AND SUPERVISED I-VECTOR MODELING FOR SPEAKER AGE REGRESSION
Shivakumar, Prashanth Gurunath
Li, Ming
Dhandhania, Vedant
Narayanan, Shrikanth S.
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[3] SPEAKER VERIFICATION USING SIMPLIFIED AND SUPERVISED I-VECTOR MODELING
Li, Ming
Tsiartas, Andreas
Van Segbroeck, Maarten
Narayanan, Shrikanth S.
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7199 - 7203
[4] Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification
Li, Ming
Narayanan, Shrikanth
[J]. COMPUTER SPEECH AND LANGUAGE, 2014, 28 (04): : 940 - 958
[5] SUPERVISED DOMAIN ADAPTATION FOR I-VECTOR BASED SPEAKER RECOGNITION
Garcia-Romero, Daniel
McCree, Alan
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[6] I-VECTOR BASED LANGUAGE MODELING FOR QUERY REPRESENTATION
Chen, Kuan-Yu
Wang, Hsin-Min
Chen, Berlin
Chen, Hsin-Hsi
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5211 - 5215
[7] Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
You, Lanhua
Guo, Wu
Song, Yan
Zhang, Sheng
[J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 62 - 66
[8] I-VECTOR BASED LANGUAGE MODELING FOR SPOKEN DOCUMENT RETRIEVAL
Chen, Kuan-Yu
Lee, Hung-Shin
Wang, Hsin-Min
Chen, Berlin
Chen, Hsin-Hsi
[J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[9] PLDA Modeling in I-Vector and Supervector Space for Speaker Verification
Jiang, Ye
Lee, Kong Aik
Tang, Zhenmin
Ma, Bin
Larcher, Anthony
Li, Haizhou
[J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1678 - 1681
[10] Multilingual I-Vector based Statistical Modeling for Music Genre Classification
Dai, Jia
Xue, Wei
Liu, Wenju
[J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 459 - 463
