Robust Deep Speaker Recognition: Learning Latent Representation with Joint Angular Margin Loss

Cited by: 7
Authors
Chowdhury, Labib [1]
Zunair, Hasib [2]
Mohammed, Nabeel [1]
Affiliations
[1] North South Univ, Dept Elect & Comp Engn, Dhaka 1229, Bangladesh
[2] Concordia Univ, Gina Cody Sch Engn & Comp Sci, Montreal, PQ H3G, Canada
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 21
Keywords
speaker recognition; speaker identification; margin loss; SincNet; inter dataset testing; biometric authentication; feature embedding
DOI
10.3390/app10217522
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Speaker identification is gaining popularity, with notable applications in security, automation, and authentication. Deep-convolutional-network-based approaches such as SincNet are used for this task as an alternative to i-vectors. SincNet performs convolution with parameterized sinc functions and has demonstrated superior results in this area. It optimizes a softmax loss, integrated in the classification layer that is responsible for making predictions. Because this loss only increases interclass distance, it is not always an optimal design choice for biometric-authentication tasks such as face and speaker recognition. To overcome this limitation, this study proposes a family of models that improve upon the state-of-the-art SincNet model. The proposed models, AF-SincNet, Ensemble-SincNet, and ALL-SincNet, serve as potential successors to SincNet. They are compared on several speaker-recognition datasets, such as TIMIT and LibriSpeech, each with its own challenges. Performance improvements are demonstrated over competitive baselines. In interdataset evaluation, the best reported model not only consistently outperformed the baselines and prior models, but also generalized well to unseen and diverse tasks such as Bengali speaker recognition.
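
The contrast the abstract draws between a plain softmax loss and a margin-based loss can be illustrated with a minimal sketch. It assumes an additive angular margin of the kind used in ArcFace-style losses; the abstract does not spell out the exact formulation behind AF-SincNet, Ensemble-SincNet, or ALL-SincNet, so the function names, the `scale` and `margin` values, and the toy vectors below are all hypothetical and are not taken from the paper.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of a softmax over `logits` for class index `target`."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def angular_margin_loss(embedding, class_weights, target, scale=30.0, margin=0.5):
    """Additive angular margin loss for one sample (illustrative sketch).

    `scale` and `margin` are assumed values, not settings from the paper.
    The case where angle + margin exceeds pi is not handled here.
    """
    logits = []
    for idx, weight in enumerate(class_weights):
        cos_theta = max(-1.0, min(1.0, cosine(embedding, weight)))
        if idx == target:
            # Add the margin to the target-class angle, so the embedding must
            # sit closer to its own class weight to reach the same score.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return softmax_cross_entropy(logits, target)

# Two hypothetical speaker prototypes and an embedding closer to speaker 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
sample = [0.9, 0.3]
plain = angular_margin_loss(sample, weights, target=0, margin=0.0)
with_margin = angular_margin_loss(sample, weights, target=0, margin=0.5)
```

With `margin=0.0` the function reduces to a softmax over scaled cosine similarities. A positive margin lowers the target-class score for the same embedding, so `with_margin` is larger than `plain`, and training pressure then pulls same-speaker embeddings closer together in angle rather than only pushing different speakers apart.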
Pages: 1 / 17
Page count: 17