An Improved Deep Embedding Learning Method for Short Duration Speaker Verification

Cited by: 19
Authors
Gao, Zhifu [1]
Song, Yan [1]
McLoughlin, Ian [2]
Guo, Wu [1]
Dai, Lirong [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China
[2] Univ Kent, Sch Comp, Medway, England
Funding
National Natural Science Foundation of China;
Keywords
speaker verification; convolutional neural network; dilated convolution; cross-convolutional-layer pooling;
DOI
10.21437/Interspeech.2018-1515
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for short-duration speaker verification (SV). Existing deep-learning-based SV methods generally extract frontend embeddings from a feed-forward deep neural network, in which long-term speaker characteristics are captured by a pooling operation over the input speech. The extracted embeddings are then scored by a backend model such as Probabilistic Linear Discriminant Analysis (PLDA). Two improvements to CNN-based frontend embedding learning are proposed: (1) motivated by WaveNet for speech synthesis, dilated filters are designed to trade off computational efficiency against receptive field size; and (2) a novel cross-convolutional-layer pooling method is used to capture first-order statistics for modelling long-term speaker characteristics. Specifically, the activations of one convolutional layer are aggregated under the guidance of the feature maps of the next layer. To evaluate the effectiveness of the proposed methods, extensive experiments are conducted on the modified female portion of the NIST SRE 2010 evaluation, with conditions ranging from 10s-10s to 5s-4s. Excellent performance is achieved under every evaluation condition, significantly outperforming existing SV systems based on i-vector and d-vector embeddings.
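The two frontend ideas in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the kernel size and dilation schedule (3; 1, 2, 4, 8), the assumption that both layers produce time-aligned frames, and the division by the number of frames are all illustrative choices. `dilated_receptive_field` shows why dilation helps: with stride 1, each layer of kernel size k and dilation d adds (k - 1) * d frames of context, so doubling dilations widen the receptive field rapidly with depth while the number of weights per layer stays fixed. `cross_layer_pooling` follows the general cross-convolutional-layer pooling idea: the lower layer's frame-level activations are summed with weights taken from each feature map of the next layer, and the resulting first-order sums are concatenated into one utterance-level vector.

```python
from typing import List, Sequence


def dilated_receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in frames) of stacked stride-1 1-D convolutions.

    Each layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """'Valid' single-channel dilated convolution (cross-correlation form).

    Output frame t combines inputs x[t], x[t + d], x[t + 2d], ...
    """
    span = (len(weights) - 1) * dilation
    return [
        sum(w * x[t + j * dilation] for j, w in enumerate(weights))
        for t in range(len(x) - span)
    ]


def cross_layer_pooling(
    lower: Sequence[Sequence[float]],  # T frames x C1 channels (layer l)
    upper: Sequence[Sequence[float]],  # T frames x C2 channels (layer l + 1)
) -> List[float]:
    """First-order pooling of layer-l activations guided by layer-(l+1) maps.

    For each upper-layer channel k, lower-layer frame vectors are summed with
    weights upper[t][k]; the C2 pooled vectors are concatenated (length C1 * C2).
    Assumptions (illustrative only): both layers are time-aligned, e.g. via
    'same' padding, and sums are divided by T so lengths are comparable.
    """
    if not lower or len(lower) != len(upper):
        raise ValueError("layers must be non-empty with the same number of frames")
    num_frames = len(lower)
    c1, c2 = len(lower[0]), len(upper[0])
    pooled: List[float] = []
    for k in range(c2):
        acc = [0.0] * c1
        for x_t, a_t in zip(lower, upper):
            for i in range(c1):
                acc[i] += a_t[k] * x_t[i]
        pooled.extend(v / num_frames for v in acc)
    return pooled


# Four dilated layers vs four ordinary layers with the same number of weights.
print(dilated_receptive_field(3, [1, 2, 4, 8]))  # 31
print(dilated_receptive_field(3, [1, 1, 1, 1]))  # 9

# Two frames, two lower channels, two upper channels.
print(cross_layer_pooling([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]))
# [0.5, 1.0, 1.5, 2.0]
```

In this toy case each upper-layer map "selects" one frame, so each pooled block is simply that frame's lower-layer vector divided by T. In a real network the upper-layer maps are non-negative ReLU outputs spread over many frames, and the embedding size C1 * C2 grows quickly, which is why practical systems typically reduce channel counts before pooling.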
Pages: 3578-3582
Number of pages: 5