On Parameter Adaptation in Softmax-based Cross-Entropy Loss for Improved Convergence Speed and Accuracy in DNN-based Speaker Recognition

Cited by: 6
Authors
Rybicka, Magdalena [1]
Kowalczyk, Konrad [1]
Affiliations
[1] AGH Univ Sci & Technol, Dept Elect, PL-30059 Krakow, Poland
Keywords
speaker recognition; deep neural networks; softmax activation functions; speaker embedding; ResNet; margin softmax; VOICES
DOI
10.21437/Interspeech.2020-2264
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
In many classification tasks, a major challenge is generating discriminative representations of the classes. With a properly chosen loss function, a deep neural network (DNN) can be encouraged to produce embeddings with greater inter-class separation and smaller intra-class distances. In this paper, we develop a softmax-based cross-entropy loss function that adapts its parameters to the current training phase. The proposed solution improves accuracy by up to 24% in terms of Equal Error Rate (EER) and minimum Detection Cost Function (minDCF). It also accelerates network convergence compared with other state-of-the-art softmax-based losses. As an additional contribution, we adopt and subsequently modify the ResNet DNN structure for the speaker recognition task. The proposed ResNet network achieves relative gains of up to 32% in EER and 15% in minDCF compared with the well-established Time Delay Neural Network (TDNN) architecture for x-vector extraction.
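The abstract does not say which loss parameters are adapted or how, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an additive-margin softmax over cosine logits whose margin grows linearly during training; the function names, the linear schedule, and the values `scale=30.0` and `m_max=0.2` are all hypothetical choices made for illustration.

```python
import math


def margin_schedule(step, total_steps, m_max=0.2):
    """Hypothetical linear ramp: the margin grows from 0 to m_max over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return m_max * frac


def am_softmax_loss(cosines, target, scale=30.0, margin=0.0):
    """Cross-entropy over scaled cosine logits, with an additive margin
    subtracted from the target-class cosine (assumed AM-softmax form)."""
    logits = [
        scale * (c - margin if i == target else c)
        for i, c in enumerate(cosines)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]


# Cosine similarities between one embedding and three class weight vectors.
cosines = [0.7, 0.3, -0.1]
for step in (0, 500, 1000):
    m = margin_schedule(step, total_steps=1000)
    loss = am_softmax_loss(cosines, target=0, margin=m)
    print(f"step={step} margin={m:.2f} loss={loss:.6f}")
```

With a schedule of this kind, the same embedding incurs a larger loss later in training, so the penalty on the target class tightens only after the network has had a chance to learn a coarse separation.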
Pages: 3805-3809
Page count: 5
Related papers
4 in total
  • [1] Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting
    Panchapagesan, Sankaran
    Sun, Ming
    Khare, Aparna
    Matsoukas, Spyros
    Mandal, Arindam
    Hoffmeister, Bjorn
    Vitaladevuni, Shiv
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016 : 760 - 764
  • [2] Neutral Cross-Entropy Loss Based Unsupervised Domain Adaptation for Semantic Segmentation
    Xu, Hanqing
    Yang, Ming
    Deng, Liuyuan
    Qian, Yeqiang
    Wang, Chunxiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 4516 - 4525
  • [3] A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement
    Chai, Li
    Du, Jun
    Liu, Qing-Feng
    Lee, Chin-Hui
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 106 - 117
  • [4] Remote Sensing Image Classification via Improved Cross-Entropy Loss and Transfer Learning Strategy Based on Deep Convolutional Neural Networks
    Bahri, Ali
    Majelan, Sina Ghofrani
    Mohammadi, Sina
    Noori, Mehrdad
    Mohammadi, Karim
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2020, 17 (06) : 1087 - 1091