Maximum Gaussianality training for deep speaker vector normalization

被引:2
|
作者
Cai, Yunqi [1 ,2 ,3 ]
Li, Lantian [4 ]
Abel, Andrew [3 ,5 ]
Zhu, Xiaoyan [3 ]
Wang, Dong [2 ]
机构
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] BNRist Tsinghua Univ, Ctr Speech & Language Technol CSLT, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Artificial Intelligence Beijing Univ Posts & Telec, Beijing, Peoples R China
[5] Univ Strathclyde, Dept Comp & Informat Sci, Glasgow, Scotland
关键词
Speaker embedding Normalization flow Gaussianality training; RECOGNITION;
D O I
10.1016/j.patcog.2023.109977
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Automatic Speaker Verification (ASV) is a critical task in pattern recognition and has been applied to various security-sensitive scenarios. The current state-of-the-art technique for ASV is based on deep embedding. However, a significant challenge with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. To address this issue, this paper proposes a novel training method called Maximum Gaussianality (MG), which regulates the distribution of the speaker vectors. Compared to the conventional normalization approach based on maximum likelihood (ML), the new approach directly maximizes the Gaussianality of the latent codes, and therefore can both normalize the between-class and within-class distributions in a controlled and reliable way and eliminate the unbound likelihood problem associated with the conventional ML approach. Our experiments on several datasets demonstrate that our MG-based normalization can deliver much better performance than the baseline systems without normalization and outperform discriminative normalization flow (DNF), an ML-based normalization method, particularly when the training data is limited. In theory, the MG criterion can be applied to any task in any research domain where Gaussian distributions are needed, making the MG training a versatile tool.
引用
下载
收藏
页数:12
相关论文
共 50 条
  • [1] Maximum Gaussianality training for deep speaker vector normalization
    Cai, Yunqi
    Li, Lantian
    Abel, Andrew
    Zhu, Xiaoyan
    Wang, Dong
    Pattern Recognition, 2024, 145
  • [2] Speaker adaptive training: A maximum likelihood approach to speaker normalization
    Anastasakos, T
    McDonough, J
    Makhoul, J
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1043 - 1046
  • [3] Deep Normalization for Speaker Vectors
    Cai, Yunqi
    Li, Lantian
    Abel, Andrew
    Zhu, Xiaoyan
    Wang, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 733 - 744
  • [4] A study on speaker normalization using vocal tract normalization and speaker adaptive training
    Welling, L
    Haeb-Umbach, R
    Aubert, X
    Haberland, N
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 797 - 800
  • [5] DEEP NEURAL NETWORK TRAINED WITH SPEAKER REPRESENTATION FOR SPEAKER NORMALIZATION
    Tang, Yun
    Mohan, Aanchan
    Rose, Richard C.
    Ma, Chengyuan
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [6] Speaker retrieval based on deep speaker vector
    Li, Wei
    Yang, Jichen
    He, Qianhua
    Li, Yanxiong
    Huazhong Keji Daxue Xuebao (Ziran Kexue Ban)/Journal of Huazhong University of Science and Technology (Natural Science Edition), 2015, 43 (07): : 62 - 65
  • [7] A vector-quantizer based method of speaker normalization
    Shin, OK
    Fourth Annual ACIS International Conference on Computer and Information Science, Proceedings, 2005, : 402 - 407
  • [8] Speaker-Independent Silent Speech Recognition with Across-Speaker Articulatory Normalization and Speaker Adaptive Training
    Wang, Jun
    Hahm, Seongjun
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2415 - 2419
  • [9] Comparison of vector normalization methods in multi-level speaker verification
    Drgas, Szymon
    Dabrowski, Adam
    2012 INTERNATIONAL CONFERENCE ON SIGNALS AND ELECTRONIC SYSTEMS (ICSES), 2012,
  • [10] Analysis of I-vector Length Normalization in Speaker Recognition Systems
    Garcia-Romero, Daniel
    Espy-Wilson, Carol Y.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 256 - 259