Supervised I-vector modeling for language and accent recognition

Cited by: 5
Authors
Ramoji, Shreyas [1]
Ganapathy, Sriram [1]
Affiliations
[1] Indian Inst Sci, Dept Elect Engn, Learning & Extract Acoust Patterns LEAP Lab, Bengaluru, India
Source
Keywords
Unsupervised i-vector; S-vector; Minimum-mean square error (MMSE) estimate; Language recognition; SPEAKER; IDENTIFICATION; ROBUST;
DOI
10.1016/j.csl.2019.101030
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The conventional i-vector approach to speaker and language recognition constitutes an unsupervised learning paradigm in which a variable-length speech utterance is converted into a fixed-dimensional feature vector (termed an i-vector). The i-vector approach belongs to the broader family of factor analysis models in which the utterance-level adapted means of a Gaussian Mixture Model - Universal Background Model (GMM-UBM) are assumed to lie in a low-rank subspace. The latent variables in the low-rank model are assumed to have a standard Gaussian prior distribution. In this paper, we rework the theory of i-vector modeling in a supervised framework where the class labels (such as language or accent) of the speech recordings are introduced directly into the i-vector model using a Gaussian mixture prior in which each mixture component is associated with a class label. We provide the mathematical formulation for the minimum mean squared error (MMSE) estimate of the supervised i-vector (s-vector) model. A detailed analysis of the s-vector model is given and contrasted with the traditional i-vector framework. The proposed model is used for language recognition tasks using the NIST Language Recognition Evaluation (LRE) 2017 dataset as well as an accent recognition task using the Mozilla common voices dataset. In these experiments, the s-vector model provides significant improvements over the conventional i-vector model (relative improvements of up to 24% for the LRE task in terms of the primary detection cost metric). (C) 2019 Elsevier Ltd. All rights reserved.
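The effect of replacing a standard Gaussian prior with a class-labelled Gaussian mixture prior can be illustrated with a toy scalar model. The sketch below is not the paper's formulation, which works with GMM-UBM statistics. It assumes a one-dimensional latent variable y, a scalar loading t, a known noise variance, and unit-variance prior components, and the function names are hypothetical. Under those assumptions the MMSE estimate is the responsibility-weighted average of the per-class posterior means.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mmse_latent(x, t, noise_var, weights, means):
    """Toy MMSE estimate E[y | x] (illustrative assumption, not the paper's model).

    Observation model: x = t * y + e, with e ~ N(0, noise_var).
    Prior:             y ~ sum_c weights[c] * N(means[c], 1).
    Returns the estimate and the posterior class responsibilities.
    """
    # Posterior precision of y given x is the same for every component.
    precision = 1.0 + t * t / noise_var
    # Marginally, x ~ N(t * mean_c, t^2 + noise_var) under component c.
    marginal_var = t * t + noise_var
    resp = [w * gaussian_pdf(x, t * m, marginal_var) for w, m in zip(weights, means)]
    total = sum(resp)
    resp = [r / total for r in resp]
    # Per-component posterior mean of y given x.
    comp_means = [(t * x / noise_var + m) / precision for m in means]
    estimate = sum(r * cm for r, cm in zip(resp, comp_means))
    return estimate, resp


# A single zero-mean component recovers the standard-Gaussian-prior estimate.
unsupervised, _ = mmse_latent(5.0, 2.0, 1.0, [1.0], [0.0])
# Two class components: the observation pulls the estimate toward one class.
supervised, class_post = mmse_latent(4.0, 1.0, 1.0, [0.5, 0.5], [-2.0, 2.0])
```

With a single zero-mean component the estimate reduces to t * x / (t^2 + noise_var), which is the toy counterpart of the standard Gaussian prior case. With several components, the responsibilities act as a soft class posterior that shifts the estimate toward the most likely class mean.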
Pages: 19
Related papers
50 in total
  • [1] i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition
    Behravan, Hamid
    Hautamaki, Ville
    Siniscalchi, Sabato Marco
    Kinnunen, Tomi
    Lee, Chin-Hui
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (01) : 29 - 41
  • [2] Supervised i-vector Modeling - Theory and Applications
    Ramoji, Shreyas
    Ganapathy, Sriram
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1091 - 1095
  • [3] I-vector features and deep neural network modeling for language recognition
    Wang, Wei
    Song, Wenjie
    Chen, Chen
    Zhang, Zhaoxin
    Xin, Yi
    [J]. 2018 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS, 2019, 147 : 36 - 43
  • [4] I-Vector Speaker and Language Recognition System on Android
    Vazquez-Machado, Christian
    Colon-Hernandez, Pedro
    Torres-Carrasquillo, Pedro A.
    [J]. 2016 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2016,
  • [5] SUPERVISED DOMAIN ADAPTATION FOR I-VECTOR BASED SPEAKER RECOGNITION
    Garcia-Romero, Daniel
    McCree, Alan
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Simplified supervised i-vector modeling with application to robust and efficient language identification and speaker verification
    Li, Ming
    Narayanan, Shrikanth
    [J]. COMPUTER SPEECH AND LANGUAGE, 2014, 28 (04) : 940 - 958
  • [7] I-VECTOR BASED LANGUAGE MODELING FOR QUERY REPRESENTATION
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Chen, Berlin
    Chen, Hsin-Hsi
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5211 - 5215
  • [8] SIMPLIFIED AND SUPERVISED I-VECTOR MODELING FOR SPEAKER AGE REGRESSION
    Shivakumar, Prashanth Gurunath
    Li, Ming
    Dhandhania, Vedant
    Narayanan, Shrikanth S.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [9] SPEAKER VERIFICATION USING SIMPLIFIED AND SUPERVISED I-VECTOR MODELING
    Li, Ming
    Tsiartas, Andreas
    Van Segbroeck, Maarten
    Narayanan, Shrikanth S.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7199 - 7203
  • [10] I-VECTOR BASED LANGUAGE MODELING FOR SPOKEN DOCUMENT RETRIEVAL
    Chen, Kuan-Yu
    Lee, Hung-Shin
    Wang, Hsin-Min
    Chen, Berlin
    Chen, Hsin-Hsi
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,