Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 11
Authors
Kang, Woo Hyun [2]
Mun, Sung Hwan [2]
Han, Min Hyun [2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Source
IEEE ACCESS | 2020, Vol. 8
Keywords
Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION;
DOI
10.1109/ACCESS.2020.3012893
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, like most classical embedding techniques, the deep learning-based methods are known to suffer from severe performance degradation when dealing with speech samples recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector disentangled from the variability caused by the nuisance attributes. The proposed framework was compared with the conventional deep learning-based embedding methods using the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings robust to channel and emotional variability.
Pages: 141838-141849
Number of pages: 12
Related papers
50 in total
  • [21] A Robust Speaker-Adaptive and Text-Prompted Speaker Verification System
    Hong, Qingyang
    Wang, Sheng
    Liu, Zhijian
    BIOMETRIC RECOGNITION (CCBR 2014), 2014, 8833 : 385 - 393
  • [22] A robust speaker-adaptive and text-prompted speaker verification system
    Hong, Qingyang, 1600, Springer Verlag (8833):
  • [23] Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
    Liu, Xuechen
    Sahidullah, Md
    Lee, Kong Aik
    Kinnunen, Tomi
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1261 - 1273
  • [24] An Effective Deep Embedding Learning Architecture for Speaker Verification
    Jiang, Yiheng
    Song, Yan
    McLoughlin, Ian
    Gao, Zhifu
    Dai, Lirong
    INTERSPEECH 2019, 2019, : 4040 - 4044
  • [25] Enhancing acoustic models for robust speaker verification
    Nolazco-Flores, Juan A.
    Garcia-Perera, L. Paola
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4837 - 4840
  • [26] Robust speaker verification with state duration modeling
    Yoma, NB
    Pegoraro, TF
    SPEECH COMMUNICATION, 2002, 38 (1-2) : 77 - 88
  • [27] Acoustic Factor Analysis for Robust Speaker Verification
    Hasan, Taufiq
    Hansen, John H. L.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (04): : 842 - 853
  • [28] Cosine Distance Features for Robust Speaker Verification
    George, Kuruvachan K.
    Kumar, C. Santhosh
    Ramachandran, K. I.
    Panda, Ashish
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 234 - 238
  • [29] Improved Jacobian adaptation for robust speaker verification
    Anguita, J
    Hernando, J
    Abad, A
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (07): : 1767 - 1770
  • [30] Speaker-discriminative Embedding Learning via Affinity Matrix for Short Utterance Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Zou, Yuexian
    Wang, Wenwu
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 314 - 319