Disentangled Speaker and Nuisance Attribute Embedding for Robust Speaker Verification

Cited by: 11
Authors
Kang, Woo Hyun [2]
Mun, Sung Hwan [2]
Han, Min Hyun [2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Seoul, South Korea
Source
IEEE Access | 2020 / Vol. 8
Keywords
Training; Robustness; Performance evaluation; Law enforcement; Machine learning; Task analysis; Licenses; Speech embedding; speaker verification; domain disentanglement; deep learning; RECOGNITION;
DOI
10.1109/ACCESS.2020.3012893
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, various deep learning-based embedding methods have been proposed and have shown impressive performance in speaker verification. However, like most classical embedding techniques, deep learning-based methods are known to suffer severe performance degradation when the speech samples are recorded under different conditions (e.g., recording devices, emotional states). In this paper, we propose a novel fully supervised training method for extracting a speaker embedding vector that is disentangled from the variability caused by nuisance attributes. The proposed framework was compared with conventional deep learning-based embedding methods on the RSR2015 and VoxCeleb1 datasets. Experimental results show that the proposed approach can extract speaker embeddings that are robust to channel and emotional variability.
Pages: 141838-141849
Page count: 12