IMPROVING SPEAKER IDENTIFICATION FOR SHARED DEVICES BY ADAPTING EMBEDDINGS TO SPEAKER SUBSETS

被引:1
|
作者
Tan, Zhenning [1 ]
Yang, Yuguang [1 ]
Han, Eunjung [1 ]
Stolcke, Andreas [1 ]
机构
[1] Amazon Alexa AI, Sunnyvale, CA 94089 USA
关键词
speaker identification; adaptation network; household scoring model; personalization;
D O I
10.1109/ASRU51503.2021.9687975
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speaker identification typically involves three stages. First, a front-end speaker embedding model is trained to embed utterance and speaker profiles. Second, a scoring function is applied between a runtime utterance and each speaker profile. Finally, the speaker is identified using nearest neighbor according to the scoring metric. To better distinguish speakers sharing a device within the same household, we propose a household-adapted nonlinear mapping to a low dimensional space to complement the global scoring metric. The combined scoring function is optimized on labeled or pseudo-labeled speaker utterances. With input dropout, the proposed scoring model reduces EER by 45-71% in simulated households with 2 to 7 hard-to-discriminate speakers per household. On real-world internal data, the EER reduction is 49.2%. From t-SNE visualization, we also show that clusters formed by household-adapted speaker embeddings are more compact and uniformly distributed, compared to clusters formed by global embeddings before adaptation.
引用
收藏
页码:1124 / 1131
页数:8
相关论文
共 50 条
  • [1] Adapting Speaker Embeddings for Speaker Diarisation
    Kwon, Youngki
    Jung, Jee-weon
    Heo, Hee-Soo
    Kim, You Jin
    Lee, Bong-Jin
    Chung, Joon Son
    [J]. INTERSPEECH 2021, 2021, : 3101 - 3105
  • [2] Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech
    Sarma, Biswajit Dev
    Das, Rohan Kumar
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 610 - 615
  • [3] On Deep Speaker Embeddings for Speaker Verification
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Chmulik, Michal
    [J]. 2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, : 162 - 166
  • [4] SPEAKER DIARIZATION THROUGH SPEAKER EMBEDDINGS
    Rouvier, Mickael
    Bousquet, Pierre-Michel
    Favre, Benoit
    [J]. 2015 23RD EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2015, : 2082 - 2086
  • [5] Deep Speaker Embeddings for Speaker Verification of Children
    Abed, Mohammed Hamzah
    Sztaho, David
    [J]. TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II, 2024, 15049 : 58 - 69
  • [6] Improving Speaker Segmentation via Speaker Identification and Text Segmentation
    Li, Runxin
    Schultz, Tanja
    Jin, Qin
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 928 - 931
  • [7] CONTENT-AWARE SPEAKER EMBEDDINGS FOR SPEAKER DIARISATION
    Sun, G.
    Liu, D.
    Zhang, C.
    Woodland, P. C.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7168 - 7172
  • [8] Speaker-Corrupted Embeddings for Online Speaker Diarization
    Ghahabi, Omid
    Fischer, Volker
    [J]. INTERSPEECH 2019, 2019, : 386 - 390
  • [9] INVESTIGATION OF SPEAKER EMBEDDINGS FOR CROSS-SHOW SPEAKER DIARIZATION
    Rouvier, Mickael
    Favre, Benoit
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5585 - 5589
  • [10] Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II
    Novoselov, Sergey
    Gusev, Aleksei
    Ivanov, Artem
    Pekhovsky, Timur
    Shulipa, Andrey
    Avdeeva, Anastasia
    Gorlanov, Artem
    Kozlov, Alexandr
    [J]. INTERSPEECH 2019, 2019, : 1003 - 1007