Maximum Gaussianality training for deep speaker vector normalization

Cited by: 2
Authors
Cai, Yunqi [1, 2, 3]
Li, Lantian [4]
Abel, Andrew [3, 5]
Zhu, Xiaoyan [3]
Wang, Dong [2]
Affiliations
[1] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming 650504, Peoples R China
[2] BNRist Tsinghua Univ, Ctr Speech & Language Technol CSLT, Beijing 100084, Peoples R China
[3] Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China
[4] Artificial Intelligence Beijing Univ Posts & Telec, Beijing, Peoples R China
[5] Univ Strathclyde, Dept Comp & Informat Sci, Glasgow, Scotland
Keywords
Speaker embedding; Normalization flow; Gaussianality training; Recognition
DOI
10.1016/j.patcog.2023.109977
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic Speaker Verification (ASV) is a critical task in pattern recognition and has been applied to various security-sensitive scenarios. The current state-of-the-art technique for ASV is based on deep embedding. However, a significant challenge with this approach is that the resulting deep speaker vectors tend to be irregularly distributed. To address this issue, this paper proposes a novel training method called Maximum Gaussianality (MG), which regulates the distribution of the speaker vectors. Compared to the conventional normalization approach based on maximum likelihood (ML), the new approach directly maximizes the Gaussianality of the latent codes. It can therefore normalize both the between-class and within-class distributions in a controlled and reliable way, and it eliminates the unbounded likelihood problem associated with the conventional ML approach. Our experiments on several datasets demonstrate that MG-based normalization delivers much better performance than the baseline systems without normalization and outperforms discriminative normalization flow (DNF), an ML-based normalization method, particularly when the training data is limited. In theory, the MG criterion can be applied to any task in any research domain where Gaussian distributions are needed, making MG training a versatile tool.
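The abstract does not state the MG criterion itself, so the following is only an illustrative sketch of what "measuring Gaussianality" of latent codes can look like. It uses a simple moment-based stand-in (squared skewness plus squared excess kurtosis, averaged over dimensions), which is an assumption made here and not the paper's objective. The function name `non_gaussianity` and the synthetic data are hypothetical.

```python
import numpy as np


def non_gaussianity(z):
    """Moment-based non-Gaussianity score (illustrative stand-in, NOT the paper's MG criterion).

    For a Gaussian, skewness and excess kurtosis are both 0, so a score near 0
    suggests the codes look Gaussian along each dimension; larger is less Gaussian.
    """
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D input as a single-dimension batch
    # Standardize each dimension to zero mean and unit variance.
    zc = (z - z.mean(axis=0)) / z.std(axis=0)
    skew = (zc ** 3).mean(axis=0)                # third standardized moment
    excess_kurt = (zc ** 4).mean(axis=0) - 3.0   # fourth standardized moment minus 3
    return float(np.mean(skew ** 2 + excess_kurt ** 2))


# Synthetic "latent codes": one Gaussian batch, one clearly non-Gaussian batch.
rng = np.random.default_rng(0)
gauss_codes = rng.standard_normal((20000, 4))
uniform_codes = rng.uniform(-1.0, 1.0, size=(20000, 4))

score_gauss = non_gaussianity(gauss_codes)
score_uniform = non_gaussianity(uniform_codes)
print(f"Gaussian codes: {score_gauss:.4f}")
print(f"Uniform codes:  {score_uniform:.4f}")
```

A training procedure in the spirit of the abstract would minimize some differentiable non-Gaussianity measure of the latent codes instead of maximizing their likelihood; which measure the authors use, and how it is applied to between-class and within-class distributions, is described in the paper and not reproduced here.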
Pages: 12