STATISTICS POOLING TIME DELAY NEURAL NETWORK BASED ON X-VECTOR FOR SPEAKER VERIFICATION

Times cited: 0
Authors
Hong, Qian-Bei [1 ,2 ]
Wu, Chung-Hsien [1 ,2 ,3 ]
Wang, Hsin-Min [1 ,2 ]
Huang, Chien-Lin [4 ]
Affiliations
[1] Natl Cheng Kung Univ, Grad Program Multimedia Syst & Intelligent Comp, Tainan, Taiwan
[2] Acad Sinica, Tainan, Taiwan
[3] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
[4] PingAn AI Lab, Palo Alto, CA 94306 USA
Keywords
Speaker verification; time delay neural network; statistics pooling; RECOGNITION; EMBEDDINGS;
DOI
10.1109/icassp40776.2020.9054350
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper aims to improve the x-vector-based speaker embedding representation so that it captures more detailed information for speaker verification. We propose a statistics pooling time delay neural network (TDNN), in which the TDNN structure integrates statistics pooling at each layer to account for the variation of temporal context in the frame-level transformation. The proposed feature vector, named the stats-vector, is compared with the baseline x-vector features on the VoxCeleb dataset and the Speakers in the Wild (SITW) dataset for speaker verification. The experimental results show that the proposed stats-vector with score fusion achieved the best performance on the VoxCeleb1 dataset. Furthermore, considering the interference from other speakers in the recordings, we found that the proposed stats-vector efficiently reduced the interference and improved speaker verification performance on the SITW dataset.
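As a rough illustration of the idea in the abstract, the sketch below shows statistics pooling in its common form (per-dimension mean and standard deviation over the time axis) and a hypothetical helper, `pool_every_layer`, that applies it to the output of every layer and concatenates the results. The abstract does not give the layer sizes, the exact statistics used, or how the pooled vectors are combined, so those details are assumptions made for illustration and not the authors' implementation.

```python
import math
from typing import List, Sequence

# Frame-level features: one inner sequence per frame, shape (num_frames, feature_dim).
Frames = Sequence[Sequence[float]]


def stats_pool(frames: Frames) -> List[float]:
    """Collapse frame-level features into per-dimension mean and standard deviation."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation over the time axis (an assumption;
    # the abstract does not say which estimator is used).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def pool_every_layer(layer_outputs: Sequence[Frames]) -> List[float]:
    """Hypothetical helper: pool each layer's output and concatenate the results."""
    pooled: List[float] = []
    for frames in layer_outputs:
        pooled.extend(stats_pool(frames))
    return pooled


# Toy example: two "layers" with different frame counts and feature sizes.
layer1 = [[1.0, 2.0], [3.0, 6.0]]
layer2 = [[0.0], [0.0], [0.0]]
embedding_input = pool_every_layer([layer1, layer2])
```

Because pooling removes the time axis, each layer contributes a fixed-length vector (twice its feature dimension) regardless of how many frames the utterance has.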
Pages: 6849-6853
Number of pages: 5
Related papers
  • [1] Review of different robust x-vector extractors for speaker verification
    Rouvier, Mickael
    Dufour, Richard
    Bousquet, Pierre-Michel
    [J]. 28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 366 - 370
  • [2] Linear transformation on x-vector for text-independent speaker verification
    Xu, Longting
    Ren, Bo
    Zhang, Guanglin
    Yang, Jichen
    [J]. ELECTRONICS LETTERS, 2019, 55 (15) : 864 - 865
  • [3] Design Choices for X-vector Based Speaker Anonymization
    Srivastava, Brij Mohan Lal
    Tomashenko, N.
    Wang, Xin
    Vincent, Emmanuel
    Yamagishi, Junichi
    Maouche, Mohamed
    Bellet, Aurelien
    Tommasi, Marc
    [J]. INTERSPEECH 2020, 2020, : 1713 - 1717
  • [4] An Adaptive X-vector Model for Text-independent Speaker Verification
    Gu, Bin
    Guo, Wu
    Ding, Penguin
    Ling, Zhenhua
    Du, Jun
    [J]. INTERSPEECH 2020, 2020, : 1506 - 1510
  • [5] Privacy and Utility of X-Vector Based Speaker Anonymization
    Srivastava, Brij Mohan Lal
    Maouche, Mohamed
    Sahidullah, Md
    Vincent, Emmanuel
    Bellet, Aurelien
    Tommasi, Marc
    Tomashenko, Natalia
    Wang, Xin
    Yamagishi, Junichi
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 2383 - 2395
  • [6] Improving X-vector and PLDA for Text-dependent Speaker Verification
    Chen, Zhuxin
    Lin, Yue
    [J]. INTERSPEECH 2020, 2020, : 726 - 730
  • [7] Densely Connected Time Delay Neural Network for Speaker Verification
    Yu, Ya-Qi
    Li, Wu-Jun
    [J]. INTERSPEECH 2020, 2020, : 921 - 925
  • [8] Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification
    You, Lanhua
    Guo, Wu
    Dai, Li-Rong
    Du, Jun
    [J]. INTERSPEECH 2019, 2019, : 1158 - 1162
  • [9] Bayesian HMM based x-vector clustering for Speaker Diarization
    Diez, Mireia
    Burget, Lukas
    Wang, Shuai
    Rohdin, Johan
    Cernocky, Jan
    [J]. INTERSPEECH 2019, 2019, : 346 - 350
  • [10] STATISTICAL PYRAMID DENSE TIME DELAY NEURAL NETWORK FOR SPEAKER VERIFICATION
    Wan, Zi-Kai
    Ren, Qing-Hua
    Qin, You-Cai
    Mao, Qi-Rong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7532 - 7536