An Improved Deep Embedding Learning Method for Short Duration Speaker Verification

Cited by: 19
Authors
Gao, Zhifu [1]
Song, Yan [1]
McLoughlin, Ian [2]
Guo, Wu [1]
Dai, Lirong [1]
Affiliations
[1] University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, Anhui, People's Republic of China
[2] University of Kent, School of Computing, Medway, England
Funding
National Natural Science Foundation of China
Keywords
speaker verification; convolutional neural network; dilated convolution; cross-convolutional-layer pooling
DOI
10.21437/Interspeech.2018-1515
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for short-duration speaker verification (SV). Existing deep-learning-based SV methods generally extract front-end embeddings from a feed-forward deep neural network, in which long-term speaker characteristics are captured by a pooling operation over the input speech. The extracted embeddings are then scored by a back-end model such as Probabilistic Linear Discriminant Analysis (PLDA). Two improvements to CNN-based front-end embedding learning are proposed: (1) motivated by WaveNet for speech synthesis, dilated filters are designed to trade off computational efficiency against receptive-field size; and (2) a novel cross-convolutional-layer pooling method is used to capture first-order statistics for modelling long-term speaker characteristics. Specifically, the activations of one convolutional layer are aggregated under the guidance of the feature maps of the following layer. To evaluate the proposed methods, extensive experiments are conducted on a modified version of the female portion of the NIST SRE 2010 evaluation, with conditions ranging from 10s-10s to 5s-4s. The proposed system performs strongly under every evaluation condition and significantly outperforms existing SV systems based on i-vector and d-vector embeddings.
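The two front-end ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The abstract gives no layer sizes, kernel widths, activation functions, or normalization, so the following are all hypothetical choices made for illustration: 1-D convolutions over time on frame-level features, kernel width 3, the channel counts, ReLU, and the helper names `receptive_field`, `dilated_conv1d`, and `cross_layer_pool`. Dilation is shown with centered ("same") padding rather than WaveNet's causal padding. The pooling follows the general cross-convolutional-layer pooling idea: each feature map of layer t+1 weights a sum over the time-aligned activation vectors of layer t, which yields first-order statistics.

```python
import numpy as np


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the span by (K-1)*d frames
    return rf


def dilated_conv1d(x, w, dilation):
    """'Same'-padded dilated 1-D convolution (cross-correlation, as in DNN toolkits).

    Hypothetical helper. x: (T, C_in) frames; w: (K, C_in, C_out); returns (T, C_out).
    """
    k = w.shape[0]
    total_pad = (k - 1) * dilation
    left = total_pad // 2
    xp = np.pad(x, ((left, total_pad - left), (0, 0)))  # zero-pad along time only
    T = x.shape[0]
    out = np.zeros((T, w.shape[2]))
    for j in range(k):
        # tap j reads input frame t + j*dilation - left for output frame t
        out += xp[j * dilation : j * dilation + T] @ w[j]
    return out


def cross_layer_pool(x_t, a_next):
    """Cross-convolutional-layer pooling (first-order statistics), hypothetical helper.

    x_t: (T, D) activations of layer t; a_next: (T, K) feature maps of layer t+1,
    time-aligned with x_t. Row k of the result is sum_i a_next[i, k] * x_t[i].
    Returns a fixed-size K*D vector (no length normalization is applied here).
    """
    pooled = a_next.T @ x_t  # (K, D)
    return pooled.reshape(-1)


# Illustrative run on random data (hypothetical: 200 frames of 40-dim features).
rng = np.random.default_rng(0)
T, F = 200, 40
x = rng.standard_normal((T, F))
h1 = np.maximum(dilated_conv1d(x, 0.1 * rng.standard_normal((3, F, 64)), dilation=1), 0)
h2 = np.maximum(dilated_conv1d(h1, 0.1 * rng.standard_normal((3, 64, 16)), dilation=2), 0)
emb = cross_layer_pool(h1, h2)

print(receptive_field(3, [1, 2, 4, 8]))  # 31
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(emb.shape)                         # (1024,)
```

With kernel width 3, four stacked layers with dilations 1, 2, 4 and 8 cover 31 frames with the same per-layer weight count as four undilated layers, which cover only 9. This is the kind of efficiency versus receptive-field tradeoff the abstract refers to. The pooled vector has K x D entries regardless of utterance length, so it can act as a fixed-size embedding for a back-end scorer such as PLDA.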
Pages: 3578-3582
Page count: 5