Deep Speaker Embedding with Frame-Constrained Training Strategy for Speaker Verification

Cited by: 0
Authors
Gu, Bin [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Speaker verification; loss function; local variation; frame-level features;
DOI
10.21437/Interspeech.2022-867
Chinese Library Classification
O42 [Acoustics];
Discipline Classification Code
070206 ; 082403 ;
Abstract
Speech signals contain a lot of side information (content, stress, etc.) besides the voiceprint statistics. This session variability poses a major challenge for modeling speaker characteristics. To alleviate this problem, we propose a novel frame-constrained training (FCT) strategy in this paper. It enhances the speaker information in frame-level layers for better embedding extraction. More precisely, a similarity matrix is calculated from the frame-level features within each batch of training samples, and an FCT loss is obtained from this similarity matrix. Finally, the speaker embedding network is trained with a combination of the FCT loss and the speaker classification loss. Experiments are performed on the VoxCeleb1 and VOiCES databases. The results demonstrate that the proposed training strategy boosts system performance.
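The abstract does not give the exact form of the FCT loss, so the following is only a rough sketch of the general idea (a batch-level similarity matrix built from frame-level features, turned into a loss, then added to a classification loss). Everything specific here is an assumption made for illustration and not the paper's method: mean pooling of frames, cosine similarity, the contrastive-style softmax loss, and the hypothetical names `mean_pool`, `similarity_matrix`, `fct_loss`, `total_loss`, `scale`, and `weight`.

```python
import math

def mean_pool(frames):
    """Average a list of frame-level feature vectors into one vector (assumed pooling)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)

def similarity_matrix(batch_frames):
    """Pairwise similarity between all utterances in a batch."""
    pooled = [mean_pool(frames) for frames in batch_frames]
    return [[cosine(u, v) for v in pooled] for u in pooled]

def fct_loss(sim, labels, scale=10.0):
    """Hypothetical contrastive-style loss over the similarity matrix.

    For each anchor utterance, take a softmax over its similarities to the
    other utterances and penalise low probability on same-speaker ones.
    """
    n = len(labels)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # no same-speaker partner in this batch
        logits = [scale * sim[i][j] for j in others]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        for j in positives:
            total += log_denom - scale * sim[i][j]
            count += 1
    return total / count if count else 0.0

def total_loss(classification_loss, fct, weight=0.1):
    """Weighted sum of the two losses; the weight is an assumed hyperparameter."""
    return classification_loss + weight * fct

# Toy batch: four utterances, two frames each, two-dimensional features.
batch = [
    [[1.0, 0.0], [1.0, 0.1]],
    [[0.9, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.1, 1.0]],
    [[0.0, 0.9], [0.0, 1.0]],
]
sim = similarity_matrix(batch)
good = fct_loss(sim, [0, 0, 1, 1])  # labels agree with the feature geometry
bad = fct_loss(sim, [0, 1, 0, 1])   # labels disagree with the feature geometry
```

In this toy batch the loss is small when same-speaker utterances have similar frame-level features and large when they do not, which is the kind of pressure the abstract describes applying to the frame-level layers.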
Pages: 1451-1455
Number of pages: 5
Related papers
50 in total
  • [21] Speaker Verification with Deep Features
    Liu, Yuan
    Fu, Tianfan
    Fan, Yuchen
    Qian, Yanmin
    Yu, Kai
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 747 - 753
  • [22] MEConformer: Highly representative embedding extractor for speaker verification via incorporating selective convolution into deep speaker encoder
    Zheng, Qiuyu
    Chen, Zengzhao
    Wang, Zhifeng
    Liu, Hai
    Lin, Mengting
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 244
  • [23] Speaker adaptations in sparse training data for improved speaker verification
    Ahn, S
    Ko, H
    ELECTRONICS LETTERS, 2000, 36 (04) : 371 - 373
  • [24] Deep Speaker Embeddings for Short-Duration Speaker Verification
    Bhattacharya, Gautam
    Alam, Jahangir
    Kenny, Patrick
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1517 - 1521
  • [25] Deep speaker embeddings for Speaker Verification: Review and experimental comparison
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Kasak, Peter
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 127
  • [26] Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries
    Stafylakis, Themos
    Mosner, Ladislav
    Plchot, Oldrich
    Rohdin, Johan
    Silnova, Anna
    Burget, Lukas
    Cernocky, Jan Honza
    INTERSPEECH 2022, 2022, : 605 - 609
  • [27] Speaker Verification Using Neighborhood Preserving Embedding
    Liang, Chunyan
    Yang, Jinchao
    Yang, Lin
    Yan, Yonghong
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1558 - 1561
  • [28] Deep Speaker Embedding for Speaker-Targeted Automatic Speech Recognition
    Chao, Guan-Lin
    Shen, John Paul
    Lane, Ian
    NLPIR 2019: 2019 3RD INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, 2019, : 39 - 43
  • [29] Embedding-Based Speaker Adaptive Training of Deep Neural Networks
    Cui, Xiaodong
    Goel, Vaibhava
    Saon, George
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 122 - 126
  • [30] Investigation of Different Calibration Methods for Deep Speaker Embedding Based Verification Systems
    Novoselov, Sergey
    Lavrentyeva, Galina
    Volokhov, Vladimir
    Volkova, Marina
    Khmelev, Nikita
    Akulov, Artem
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 159 - 168