IMPROVING SPEAKER IDENTIFICATION FOR SHARED DEVICES BY ADAPTING EMBEDDINGS TO SPEAKER SUBSETS

Cited by: 1

Authors:

Tan, Zhenning [1]

Yang, Yuguang [1]

Han, Eunjung [1]

Stolcke, Andreas [1]

Affiliations:

[1] Amazon Alexa AI, Sunnyvale, CA 94089 USA

Source:

2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2021

Keywords:

speaker identification; adaptation network; household scoring model; personalization

DOI:

10.1109/ASRU51503.2021.9687975

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterances and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker profile. Finally, the speaker is identified by nearest-neighbor search under the scoring metric. To better distinguish speakers sharing a device within the same household, we propose a household-adapted nonlinear mapping to a low-dimensional space to complement the global scoring metric. The combined scoring function is optimized on labeled or pseudo-labeled speaker utterances. With input dropout, the proposed scoring model reduces EER by 45-71% in simulated households with 2 to 7 hard-to-discriminate speakers per household. On real-world internal data, the EER reduction is 49.2%. Using t-SNE visualization, we also show that clusters formed by household-adapted speaker embeddings are more compact and more uniformly distributed than clusters formed by global embeddings before adaptation.
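The second and third stages described in the abstract (scoring a runtime utterance against each speaker profile, then picking the nearest neighbor) can be illustrated with a minimal sketch. The abstract does not name the global scoring metric, so cosine similarity is used here purely as an assumption. The function names, the 3-dimensional toy embeddings, and the speaker labels are all hypothetical, and the household-adapted nonlinear mapping proposed in the paper is not implemented.

```python
import numpy as np

def cosine_scores(utterance, profiles):
    """Cosine similarity between one utterance embedding and each profile row.

    Assumption: cosine similarity stands in for the global scoring metric,
    which the abstract does not specify.
    """
    u = utterance / np.linalg.norm(utterance)
    p = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return p @ u

def identify(utterance, profiles, names):
    """Return the name of the highest-scoring profile and all scores."""
    scores = cosine_scores(utterance, profiles)
    best = int(np.argmax(scores))
    return names[best], scores

# Toy household with two enrolled speakers (hypothetical 3-dim embeddings).
profiles = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
names = ["speaker_a", "speaker_b"]

# Runtime utterance embedding that lies closer to the first profile.
utterance = np.array([0.9, 0.1, 0.0])
speaker, scores = identify(utterance, profiles, names)
print(speaker, scores)
```

In this sketch the decision is a plain argmax over per-profile scores. A household-adapted model as described in the abstract would add a second, learned score computed in a low-dimensional space and combine it with the global one before taking the argmax.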

Citation

Pages: 1124-1131

Page count: 8
