Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 7
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / No. 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding;
DOI
10.3390/app10217522
CLC classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. Deep-convolutional-network-based approaches such as SincNet are used for speaker identification as an alternative to i-vectors. Convolution with parameterized sinc functions, as performed in SincNet, has demonstrated superior results in this area. SincNet optimizes softmax loss, which is integrated into the classification layer responsible for making predictions. This loss only increases interclass distance and does not explicitly reduce intraclass variation. It is therefore not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this issue, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to the successful SincNet model. They are compared on a number of speaker-recognition datasets that each pose their own unique challenges, such as TIMIT and LibriSpeech. The models show performance improvements over competitive baselines. In interdataset evaluation, the best reported model consistently outperformed the baselines and prior models. It also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
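Angular margin losses target the limitation named in the abstract: plain softmax loss does not explicitly pull same-speaker embeddings together. The sketch below implements a generic additive angular margin (ArcFace-style) loss in plain Python to illustrate the idea. It is not the authors' implementation. The function names, the scale `s = 30`, and the margin `m = 0.5` are illustrative assumptions, not values reported in this paper, and the exact losses used by AF-SincNet and ALL-SincNet may differ.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable cross-entropy of softmax(logits) against a class index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def additive_angular_margin_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    target: int,
    scale: float = 30.0,  # hypothetical feature scale s (illustrative, not from the paper)
    margin: float = 0.5,  # hypothetical angular margin m in radians (illustrative)
) -> float:
    """ArcFace-style loss: s*cos(theta_y + m) for the true class, s*cos(theta_j) otherwise.

    With margin=0.0 this reduces to a normalized (cosine) softmax loss.
    Simplification: the usual fallback for theta_y + m > pi is omitted.
    """
    x = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(x, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain despite rounding
        if j == target:
            # Widen the angle to the true speaker's weight vector before scoring.
            logits.append(scale * math.cos(math.acos(cos_theta) + margin))
        else:
            logits.append(scale * cos_theta)
    return softmax_cross_entropy(logits, target)


if __name__ == "__main__":
    # Toy 3-D "speaker embedding" and one weight vector per speaker (hypothetical data).
    emb = [0.9, 0.1, 0.2]
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    plain = additive_angular_margin_loss(emb, weights, target=0, margin=0.0)
    with_margin = additive_angular_margin_loss(emb, weights, target=0, margin=0.5)
    print(plain, with_margin)
    print(with_margin > plain)  # True: the margin penalizes the same embedding more
```

Setting `margin=0.0` recovers a cosine softmax. Increasing the margin means an embedding must lie at a smaller angle to its own speaker's weight vector to reach the same loss. This encourages the intraclass compactness that plain softmax does not explicitly enforce.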
Pages: 1-17
Page count: 17
Related papers
50 in total
  • [41] Deep Metric Learning with Triplet-Margin-Center Loss for Sketch Face Recognition
    Feng, Yujian
    Wu, Fei
    Ji, Yimu
    Jing, Xiao-Yuan
    Yu, Jian
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (11): 2394-2397
  • [43] Auditory Sparse Representation for Robust Speaker Recognition Based on Tensor Structure
    Wu, Qiang
    Zhang, Liqing
    EURASIP Journal on Audio, Speech, and Music Processing, 2008
  • [44] Noise-robust feature based on sparse representation for speaker recognition
    Qi, Hongzhuo
    Metallurgical and Mining Industry, 2015, 7 (04): 64-69
  • [45] Deep Margin-Sensitive Representation Learning for Cross-Domain Facial Expression Recognition
    Li, Yingjian
    Zhang, Zheng
    Chen, Bingzhi
    Lu, Guangming
    Zhang, David
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 1359-1373
  • [46] Auditory Sparse Representation for Robust Speaker Recognition Based on Tensor Structure
    Wu, Qiang
    Zhang, Liqing
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2008, 2008 (1)
  • [47] Deep Metric Learning with Tuplet Margin Loss
    Yu, Baosheng
    Tao, Dacheng
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6499-6508
  • [48] PRECISE ADJACENT MARGIN LOSS FOR DEEP FACE RECOGNITION
    Wei, Xin
    Wang, Hui
    Scotney, Bryan
    Wan, Huan
    2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019: 3641-3645
  • [49] ElasticFace: Elastic Margin Loss for Deep Face Recognition
    Boutros, Fadi
    Damer, Naser
    Kirchbuchner, Florian
    Kuijper, Arjan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022: 1577-1586
  • [50] Multiview Clustering by Joint Latent Representation and Similarity Learning
    Xie, Deyan
    Zhang, Xiangdong
    Gao, Quanxue
    Han, Jiale
    Xiao, Song
    Gao, Xinbo
    IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (11): 4848-4854