Deep Speaker Embedding with Frame-Constrained Training Strategy for Speaker Verification

Authors
Gu, Bin [1]
Affiliation
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; loss function; local variation; frame-level features;
DOI
10.21437/Interspeech.2022-867
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Besides voiceprint statistics, speech signals carry a great deal of side information (content, stress, etc.). This session variability poses a major challenge for modeling speaker characteristics. To alleviate the problem, we propose a novel frame-constrained training (FCT) strategy that enhances the speaker information in the frame-level layers for better embedding extraction. More precisely, a similarity matrix is calculated from the frame-level features within each batch of training samples, and an FCT loss is derived from this similarity matrix. The speaker embedding network is then trained with a combination of the FCT loss and the speaker classification loss. Experiments are performed on the VoxCeleb1 and VOiCES databases. The results demonstrate that the proposed training strategy improves system performance.
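
The abstract does not give the exact form of the FCT loss, so the following is only an illustrative sketch of the general idea (a batch-level similarity matrix over frame-level features, turned into an auxiliary loss and added to a classification loss). Everything specific here is an assumption rather than the paper's method: mean-pooling the frames of each utterance, cosine similarity, a squared-error penalty against a same-speaker indicator matrix, and the hypothetical names `frame_similarity_matrix`, `fct_loss_sketch`, `combined_loss`, and `weight`.

```python
import numpy as np


def frame_similarity_matrix(frames):
    """Cosine similarity between utterances in a batch.

    frames: array of shape (B, T, D) holding frame-level features.
    Assumption: each utterance is summarized by mean-pooling its frames.
    """
    pooled = frames.mean(axis=1)                                   # (B, D)
    norms = np.linalg.norm(pooled, axis=1, keepdims=True) + 1e-8   # avoid /0
    normed = pooled / norms
    return normed @ normed.T                                       # (B, B)


def fct_loss_sketch(frames, labels):
    """Hypothetical stand-in for an FCT-style loss (not the paper's formula).

    Pushes same-speaker pairs toward similarity 1 and different-speaker
    pairs toward 0. Requires a batch of at least two utterances.
    """
    sim = frame_similarity_matrix(frames)
    target = (labels[:, None] == labels[None, :]).astype(sim.dtype)
    off_diag = ~np.eye(len(labels), dtype=bool)    # ignore self-similarity
    return float(np.mean((sim[off_diag] - target[off_diag]) ** 2))


def softmax_cross_entropy(logits, labels):
    """Standard speaker classification loss over (B, num_speakers) logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def combined_loss(frames, logits, labels, weight=0.5):
    """Classification loss plus a weighted auxiliary term (weight is assumed)."""
    return softmax_cross_entropy(logits, labels) + weight * fct_loss_sketch(frames, labels)


# Toy batch: 4 utterances, 10 frames each, 8-dim features, 2 speakers.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 10, 8))
logits = rng.normal(size=(4, 2))
labels = np.array([0, 0, 1, 1])
total = combined_loss(frames, logits, labels)
```

In a real system the same computation would be written in an autodiff framework so that gradients of the auxiliary term flow back into the frame-level layers; the NumPy version above only illustrates the shape of the computation.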
Pages: 1451-1455
Page count: 5