IMPROVING SPEAKER IDENTIFICATION FOR SHARED DEVICES BY ADAPTING EMBEDDINGS TO SPEAKER SUBSETS

Cited by: 1
Authors
Tan, Zhenning [1]
Yang, Yuguang [1]
Han, Eunjung [1]
Stolcke, Andreas [1]
Affiliation
[1] Amazon Alexa AI, Sunnyvale, CA 94089 USA
Keywords
speaker identification; adaptation network; household scoring model; personalization
DOI
10.1109/ASRU51503.2021.9687975
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterances and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker profile. Finally, the speaker is identified by nearest-neighbor search under the scoring metric. To better distinguish speakers who share a device within the same household, we propose a household-adapted nonlinear mapping to a low-dimensional space that complements the global scoring metric. The combined scoring function is optimized on labeled or pseudo-labeled speaker utterances. With input dropout, the proposed scoring model reduces the equal error rate (EER) by 45-71% in simulated households with 2 to 7 hard-to-discriminate speakers per household. On real-world internal data, the EER reduction is 49.2%. Using t-SNE visualization, we also show that clusters formed by household-adapted speaker embeddings are more compact and more uniformly distributed than clusters formed by the global embeddings before adaptation.
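The three-stage pipeline and the combined scoring function described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Cosine similarity is assumed as the global scoring metric. The household-adapted nonlinear mapping is stood in for by a hypothetical one-hidden-layer tanh network (`household_map`) with random weights, whereas the paper optimizes its mapping on labeled or pseudo-labeled household utterances (training and input dropout are not shown). The two scores are mixed with an assumed weight `alpha`. The helper names `household_map`, `combined_score`, `identify`, and `equal_error_rate` are hypothetical. The last helper computes the metric in which the reported reductions are expressed, using a simple threshold sweep.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Global scoring metric (assumed here to be cosine similarity)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def household_map(x, W1, b1, W2):
    """Hypothetical household-adapted nonlinear mapping to a low-dimensional space.

    One tanh hidden layer followed by a linear projection. Assumption: in practice
    these weights would be trained per household; here they are simply given.
    """
    return W2 @ np.tanh(W1 @ x + b1)


def combined_score(utt, profile, params, alpha=0.5):
    """Mix the global score with the household-adapted score (alpha is an assumption)."""
    global_score = cosine(utt, profile)
    household_score = cosine(household_map(utt, *params), household_map(profile, *params))
    return (1.0 - alpha) * global_score + alpha * household_score


def identify(utt, profiles, params, alpha=0.5):
    """Nearest-neighbor identification: pick the enrolled profile with the highest score."""
    scores = {spk: combined_score(utt, prof, params, alpha) for spk, prof in profiles.items()}
    return max(scores, key=scores.get), scores


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER via a threshold sweep: where false-accept and false-reject rates meet."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.concatenate([target, nontarget])):
        frr = np.mean(target < t)       # target trials rejected at threshold t
        far = np.mean(nontarget >= t)   # non-target trials accepted at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return float(eer)


# Usage: a toy household of three enrolled speakers with random 16-dim embeddings.
rng = np.random.default_rng(0)
dim, hidden, low = 16, 8, 4
params = (rng.normal(size=(hidden, dim)), np.zeros(hidden), rng.normal(size=(low, hidden)))
profiles = {name: rng.normal(size=dim) for name in ["alice", "bob", "carol"]}
utt = profiles["bob"] + 0.05 * rng.normal(size=dim)  # noisy runtime utterance from "bob"
speaker, scores = identify(utt, profiles, params)
print(speaker, {k: round(v, 3) for k, v in scores.items()})
```

In this sketch the household term only has to separate the few speakers enrolled on one device, while the global cosine term keeps the general-purpose embedding geometry; how strongly to weight each term is left as the free parameter `alpha`.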
Pages: 1124-1131
Number of pages: 8