Deep Speaker Embedding with Frame-Constrained Training Strategy for Speaker Verification

Cited: 0
Authors
Gu, Bin [1 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speaker verification; loss function; local variation; frame-level features;
DOI
10.21437/Interspeech.2022-867
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Speech signals carry a great deal of side information (content, stress, etc.) beyond speaker characteristics, and this session variability poses a major challenge for modeling speaker identity. To alleviate this problem, we propose a novel frame-constrained training (FCT) strategy that enhances speaker information in the frame-level layers for better embedding extraction. More precisely, a similarity matrix is calculated over the frame-level features within each training batch, and an FCT loss is derived from this similarity matrix. The speaker embedding network is then trained with a combination of the FCT loss and the speaker classification loss. Experiments are performed on the VoxCeleb1 and VOiCES databases, and the results demonstrate that the proposed training strategy boosts system performance.
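The abstract's core mechanism (a within-batch similarity matrix over frame-level features, turned into an auxiliary loss) can be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the choice of cosine similarity, the same-speaker mask, and the "pull same-speaker frames together" objective are assumptions for demonstration.

```python
import numpy as np

def fct_loss(frames, labels, eps=1e-8):
    """Hypothetical sketch of a frame-constrained loss.

    Encourages frame-level features from the same speaker to be
    similar within a batch; the actual FCT loss in the paper may
    be defined differently.

    frames: (N, D) frame-level feature vectors from one batch
    labels: (N,)   speaker label of each frame
    """
    # L2-normalize so dot products become cosine similarities
    normed = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + eps)
    sim = normed @ normed.T                      # (N, N) similarity matrix

    # Mask selecting same-speaker frame pairs, excluding self-pairs
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)

    # Loss is low when same-speaker frames are already similar
    return 1.0 - sim[same].mean()

# Usage: per the abstract, this would be combined with a speaker
# classification loss, e.g. total = ce_loss + lam * fct_loss(...)
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                 # 8 frames, 16-dim features
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])       # two speakers in the batch
loss = fct_loss(x, y)
```

In practice this would be computed on the frame-level layer outputs inside the embedding network and backpropagated jointly with the classification loss; the weighting between the two terms is a tunable hyperparameter.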
Pages: 1451-1455
Page count: 5
Related Papers
(50 items total)
  • [41] Speaker verification using minimum verification error training
    Rosenberg, AE
    Siohan, O
    Parthasarathy, S
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998, : 105 - 108
  • [42] Mutual Information Enhanced Training for Speaker Embedding
    Tu, Youzhi
    Mak, Man-Wai
    INTERSPEECH 2021, 2021, : 91 - 95
  • [43] Triplet-Center Loss Based Deep Embedding Learning Method for Speaker Verification
    Jiang, Yiheng
    Song, Yan
    Yan, Jie
    Dai, Lirong
    McLoughlin, Ian
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1625 - 1629
  • [44] SPEAKER ADAPTIVE TRAINING FOR DEEP NEURAL NETWORKS EMBEDDING LINEAR TRANSFORMATION NETWORKS
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Watanabe, Hideyuki
    Lu, Xugang
    Hori, Chiori
    Katagiri, Shigeru
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4605 - 4609
  • [45] Residual Information in Deep Speaker Embedding Architectures
    Stan, Adriana
    MATHEMATICS, 2022, 10 (21)
  • [46] On Metric-based Deep Embedding Learning for Text-Independent Speaker Verification
    Kashani, Hamidreza Baradaran
    Reza, Shaghayegh
    Rezaei, Iman Sarraf
    2020 6TH IRANIAN CONFERENCE ON SIGNAL PROCESSING AND INTELLIGENT SYSTEMS (ICSPIS), 2020,
  • [47] Attentive Statistics Pooling for Deep Speaker Embedding
    Okabe, Koji
    Koshinaka, Takafumi
    Shinoda, Koichi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2252 - 2256
  • [48] Mixture Representation Learning for Deep Speaker Embedding
    Lin, Weiwei
    Mak, Man-Wai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 968 - 978
  • [49] INVESTIGATION OF SPECAUGMENT FOR DEEP SPEAKER EMBEDDING LEARNING
    Wang, Shuai
    Rohdin, Johan
    Plchot, Oldrich
    Burget, Lukas
    Yu, Kai
    Cernocky, Jan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7139 - 7143
  • [50] Neural Discriminant Analysis for Deep Speaker Embedding
    Li, Lantian
    Wang, Dong
    Zheng, Thomas Fang
    INTERSPEECH 2020, 2020, : 3251 - 3255