Self-attention based speaker recognition using Cluster-Range Loss

Times cited: 17
Authors
Bian, Tengyue [1]
Chen, Fangzhou [1]
Xu, Li [1]
Affiliation
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; Verification; Machines
DOI
10.1016/j.neucom.2019.08.046
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based speaker recognition models usually use very deep or wide layers, resulting in many parameters and high computational cost. In addition, the triplet loss, which is widely used in speaker recognition, is difficult and inefficient to train. In this work, we propose combining a residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. We also propose the Cluster-Range Loss, based on a well-designed online exemplar mining strategy, to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset are conducted to verify the effectiveness of the proposed scheme. By jointly training the network with the Cluster-Range Loss and the softmax cross-entropy loss, the proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification. For speaker verification, it achieves a competitive equal error rate (EER) of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector and x-vector systems. (C) 2019 Elsevier B.V. All rights reserved.
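The abstract reports speaker verification performance as an equal error rate (EER): the error rate at the decision threshold where the false acceptance rate (FAR) equals the false rejection rate (FRR). As a rough illustration only, the following sketch estimates the EER from similarity scores for same-speaker (target) and different-speaker (non-target) trials. This record does not describe the paper's scoring or evaluation code; the function name `equal_error_rate` and the simple threshold-sweep approximation are assumptions.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate the EER by sweeping candidate thresholds (illustrative sketch).

    A trial is accepted when its score is >= the threshold.
    FAR: fraction of non-target (different-speaker) trials that are accepted.
    FRR: fraction of target (same-speaker) trials that are rejected.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    best_gap = float("inf")
    best_eer = 1.0
    # Candidate thresholds: every observed score, plus +inf (which rejects all trials).
    candidates = sorted(set(target_scores) | set(nontarget_scores)) + [float("inf")]
    for t in candidates:
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    targets = [0.9, 0.7, 0.4]     # hypothetical same-speaker similarity scores
    nontargets = [0.5, 0.2, 0.1]  # hypothetical different-speaker similarity scores
    print(equal_error_rate(targets, nontargets))
```

With the hypothetical scores above, the sweep settles on a threshold of 0.5, where one non-target trial is accepted and one target trial is rejected, so FAR and FRR are both 1/3. Using only the observed scores as thresholds keeps the sketch short; a finer estimate would interpolate between neighboring operating points on the FAR/FRR curves.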
Pages: 59-68
Number of pages: 10
Related papers (50 in total)
  • [31] Speaker identification for household scenarios with self-attention and adversarial training
    Li, Ruirui
    Jiang, Jyun-Yu
    Wu, Xian
    Hsieh, Chu-Cheng
    Stolcke, Andreas
    Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2020, 2020-October: 2272-2276
  • [32] Speaker Identification for Household Scenarios with Self-attention and Adversarial Training
    Li, Ruirui
    Jiang, Jyun-Yu
    Wu, Xian
    Hsieh, Chu-Cheng
    Stolcke, Andreas
    INTERSPEECH 2020, 2020: 2272-2276
  • [33] A framework for facial expression recognition using deep self-attention network
    Indolia, S.
    Nigam, S.
    Singh, R.
    Journal of Ambient Intelligence and Humanized Computing, 2023, 14 (07): 9543-9562
  • [34] Voice gender recognition under unconstrained environments using self-attention
    Nasef, Mohammed M.
    Sauber, Amr M.
    Nabil, Mohammed M.
    Applied Acoustics, 2021, 175 (175)
  • [35] Cyclic Self-attention for Point Cloud Recognition
    Zhu, Guanyu
    Zhou, Yong
    Yao, Rui
    Zhu, Hancheng
    Zhao, Jiaqi
    ACM Transactions on Multimedia Computing, Communications, and Applications, 2023, 19 (01)
  • [36] Self-Attention Networks for Human Activity Recognition Using Wearable Devices
    Betancourt, Carlos
    Chen, Wen-Hui
    Kuan, Chi-Wei
    2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2020: 1194-1199
  • [37] Facial Action Unit Recognition Based on Self-Attention Spatiotemporal Fusion
    Liang, Chaolei
    Zou, Wei
    Hu, Danfeng
    Wang, JiaJun
    2024 5th International Conference on Computing, Networks and Internet of Things, CNIOT 2024, 2024: 600-605
  • [38] An efficient self-attention network for skeleton-based action recognition
    Qin, Xiaofei
    Cai, Rui
    Yu, Jiabin
    He, Changxiang
    Zhang, Xuedian
    Scientific Reports, 12 (1)
  • [39] Self-Attention Based Darknet Named Entity Recognition with BERT Methods
    Chen, Yuxuan
    Guo, Yubin
    Jiang, Hong
    Ding, Jianwei
    Chen, Zhouguo
    International Journal of Innovative Computing, Information and Control, 2021, 17 (06): 1973-1988
  • [40] Self-Attention Network for Skeleton-based Human Action Recognition
    Cho, Sangwoo
    Maqbool, Muhammad Hasan
    Liu, Fei
    Foroosh, Hassan
    2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020: 624-633