An Improved Deep Embedding Learning Method for Short Duration Speaker Verification

Cited by: 19

Authors:

Gao, Zhifu [1]

Song, Yan [1]

McLoughlin, Ian [2]

Guo, Wu [1]

Dai, Lirong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] Univ Kent, Sch Comp, Medway, England

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

speaker verification; convolution neural network; dilated convolution; cross-convolutional-layer pooling;

DOI:

10.21437/Interspeech.2018-1515

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for short duration speaker verification (SV). Existing deep learning based SV methods generally extract frontend embeddings from a feed-forward deep neural network, in which the long-term speaker characteristics are captured via a pooling operation over the input speech. The extracted embeddings are then scored via a backend model, such as Probabilistic Linear Discriminant Analysis (PLDA). Two improvements are proposed for frontend embedding learning based on the CNN structure: (1) Motivated by WaveNet for speech synthesis, dilated filters are designed to achieve a tradeoff between computational efficiency and receptive-field size; and (2) A novel cross-convolutional-layer pooling method is exploited to capture 1st-order statistics for modelling long-term speaker characteristics. Specifically, the activations of one convolutional layer are aggregated with the guidance of the feature maps from the successive layer. To evaluate the effectiveness of the proposed methods, extensive experiments are conducted on the modified female portion of the NIST SRE 2010 evaluations, with conditions ranging from 10s-10s to 5s-4s. Excellent performance has been achieved on each evaluation condition, significantly outperforming existing SV systems using i-vector and d-vector embeddings.
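
The two ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `receptive_field` and `cross_layer_pool`, the kernel size, the dilation rates, and the use of an unnormalized weighted sum are all assumptions made for illustration. The first function shows how dilation enlarges the receptive field of stacked stride-1 convolutions without adding layers or weights; the second aggregates the activations of one layer over time, weighted by each feature map of the following layer, which is one plausible reading of "cross-convolutional-layer pooling".

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def cross_layer_pool(lower, upper):
    """Pool lower-layer activations weighted by each upper-layer feature map.

    lower: list of T frames, each a list of D_lower activations
    upper: list of T frames, each a list of D_upper activations
           (assumption: both layers share the same time axis)
    Returns a flat list of length D_upper * D_lower, where block k is
    sum over frames t of upper[t][k] * lower[t].
    """
    if len(lower) != len(upper):
        raise ValueError("layers must have the same number of frames")
    d_lower, d_upper = len(lower[0]), len(upper[0])
    pooled = [0.0] * (d_upper * d_lower)
    for x_t, a_t in zip(lower, upper):
        for k in range(d_upper):
            for j in range(d_lower):
                pooled[k * d_lower + j] += a_t[k] * x_t[j]
    return pooled


# Hypothetical settings: four layers, kernel size 3.
plain_rf = receptive_field(3, [1, 1, 1, 1])
dilated_rf = receptive_field(3, [1, 2, 4, 8])

# Toy activations: 2 frames, 2 channels in each layer.
lower = [[1.0, 2.0], [3.0, 4.0]]
upper = [[1.0, 0.0], [0.5, 1.0]]
embedding = cross_layer_pool(lower, upper)
```

With these hypothetical settings the same four layers cover 9 frames without dilation and 31 frames with it. The pooled vector has a fixed length of D_upper * D_lower regardless of the number of frames, which is what allows a variable-length utterance to be mapped to a fixed-size embedding.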


Pages: 3578-3582

Number of pages: 5

50 items in total

[21] DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION
Yi, Lu
Mak, Man-Wai
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7662 - 7666
[22] PHONE ADAPTIVE TRAINING FOR SHORT-DURATION SPEAKER VERIFICATION
Soldi, Giovanni
Bozonnet, Simon
Beaugeant, Christophe
Evans, Nicholas
2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2107 - 2111
[23] Dual Path Embedding Learning for Speaker Verification with Triplet Attention
Liu, Bei
Chen, Zhengyang
Qian, Yanmin
INTERSPEECH 2022, 2022, : 291 - 295
[24] Learning Discriminative Speaker Embedding by Improving Aggregation Strategy and Loss Function for Speaker Verification
Luo, Chengfang
Guo, Xin
Deng, Aiwen
Xu, Wei
Zhao, Junhong
Kang, Wenxiong
2021 INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB 2021), 2021,
[25] DOMAIN ROBUST DEEP EMBEDDING LEARNING FOR SPEAKER RECOGNITION
Hu, Hang-Rui
Song, Yan
Liu, Ying
Dai, Li-Rong
McLoughlin, Ian
Liu, Lin
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7182 - 7186
[26] ECAPA++: Fine-grained Deep Embedding Learning for TDNN Based Speaker Verification
Liu, Bei
Qian, Yanmin
INTERSPEECH 2023, 2023, : 3132 - 3136
[27] Deep Speaker Feature Learning for Text-independent Speaker Verification
Li, Lantian
Chen, Yixiang
Shi, Ying
Tang, Zhiyuan
Wang, Dong
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1542 - 1546
[28] Introducing phonetic information to speaker embedding for speaker verification
Liu, Yi
He, Liang
Liu, Jia
Johnson, Michael T.
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
[29] Consideration of Varying Training Lengths for Short-Duration Speaker Verification
Ko, WooSeok
Um, Seyun
Piao, Zhenyu
Kang, Hong-goo
2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 139 - 144
[30] The TalTech Systems for the Short-duration Speaker Verification Challenge 2020
Alumae, Tanel
Valk, Jorgen
INTERSPEECH 2020, 2020, : 746 - 750
