Self-attention based speaker recognition using Cluster-Range Loss

Cited by: 17
Authors
Bian, Tengyue [1 ]
Chen, Fangzhou [1 ]
Xu, Li [1 ]
Affiliations
[1] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
Keywords
Self-attention; Speaker recognition; Triplet loss; Verification; Machines
DOI
10.1016/j.neucom.2019.08.046
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker recognition with short utterances is a challenging research topic in the natural language processing (NLP) field. Previous convolutional neural network (CNN) based models for speaker recognition usually rely on very deep or wide layers, resulting in many parameters and high computational cost. Moreover, the triplet loss, which is widely used in speaker recognition, suffers from training difficulty and inefficiency. In this work, we propose to combine the residual network (ResNet) with the self-attention mechanism to achieve better performance in text-independent speaker recognition with fewer parameters and lower computational cost. In addition, we propose the Cluster-Range Loss, based on a well-designed online exemplar mining, to directly shrink the intra-class variation and enlarge the inter-class distance. Experiments on the VoxCeleb dataset verify the effectiveness of the proposed scheme. The proposed approach achieves a Top-1 accuracy of 89.1% for speaker identification by jointly training the network with the Cluster-Range Loss and softmax cross-entropy loss. For speaker verification, we achieve a competitive EER of 5.5% without any heavy-tailed backend, compared with the state-of-the-art i-vector system as well as the x-vector system. (C) 2019 Elsevier B.V. All rights reserved.
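The abstract describes a loss that directly shrinks intra-class variation while enlarging inter-class distance. The following is a hypothetical sketch of a cluster-range-style objective, not the paper's exact formulation: it penalizes the worst-case distance from any embedding to its class centroid (the "range" of each cluster) and rewards the smallest centroid-to-centroid gap, hinged at a margin. The function name, margin value, and use of Euclidean distance are assumptions for illustration.

```python
import numpy as np

def cluster_range_loss(embeddings, labels, margin=1.0):
    """Hypothetical cluster-range-style loss (illustrative, not the
    paper's exact definition): hinge on the gap between the largest
    intra-class range and the smallest inter-class centroid distance."""
    classes = np.unique(labels)
    # Per-class centroids of the embedding vectors.
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # Intra-class range: worst-case distance from a sample to its own centroid.
    intra = max(
        np.linalg.norm(embeddings[labels == c] - centroids[i], axis=1).max()
        for i, c in enumerate(classes)
    )
    # Inter-class distance: closest pair of class centroids.
    inter = min(
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes)) for j in range(i + 1, len(classes))
    )
    # Loss is zero once clusters are tight and well separated by `margin`.
    return max(0.0, intra - inter + margin)
```

With tight, well-separated clusters the hinge is inactive and the loss is zero; overlapping clusters yield a positive penalty, mirroring the abstract's stated goal of small intra-class variation and large inter-class distance.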
Pages: 59-68
Page count: 10
Related papers
50 records in total
  • [1] Weighted Cluster-Range Loss and Criticality-Enhancement Loss for Speaker Recognition
    Mo, Jianye
    Xu, Li
    APPLIED SCIENCES-BASEL, 2020, 10 (24): 1-20
  • [2] Self-Attention Encoding and Pooling for Speaker Recognition
    Safari, Pooyan
    India, Miquel
    Hernando, Javier
    INTERSPEECH 2020, 2020, : 941 - 945
  • [3] MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION
    Wang, Rui
    Ao, Junyi
    Zhou, Long
    Liu, Shujie
    Wei, Zhihua
    Ko, Tom
    Li, Qing
    Zhang, Yu
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6732 - 6736
  • [4] Emotion embedding framework with emotional self-attention mechanism for speaker recognition
    Li, Dongdong
    Yang, Zhuo
    Liu, Jinlin
    Yang, Hai
    Wang, Zhe
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [5] Self-attention is What You Need to Fool a Speaker Recognition System
    Wang, Fangwei
    Song, Ruixin
    Tan, Zhiyuan
    Li, Qingru
    Wang, Changguang
    Yang, Yong
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 929 - 936
  • [6] Deep CNNs With Self-Attention for Speaker Identification
    Nguyen Nang An
    Nguyen Quang Thanh
    Liu, Yanbing
    IEEE ACCESS, 2019, 7 : 85327 - 85337
  • [7] Oral mucosal disease recognition based on dynamic self-attention and feature discriminant loss
    Xie, Fei
    Xu, Pengfei
    Xi, Xinyi
    Gu, Xiaokang
    Zhang, Panpan
    Wang, Hexu
    Shen, Xuemin
    ORAL DISEASES, 2024, 30 (05) : 3094 - 3107
  • [8] NEPALI SPEECH RECOGNITION USING SELF-ATTENTION NETWORKS
    Joshi, Basanta
    Shrestha, Rupesh
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2023, 19 (06): 1769-1784
  • [9] Polarimetric HRRP Recognition Based on ConvLSTM With Self-Attention
    Zhang, Liang
    Li, Yang
    Wang, Yanhua
    Wang, Junfu
    Long, Teng
    IEEE SENSORS JOURNAL, 2021, 21 (06) : 7884 - 7898
  • [10] Global-Local Self-Attention Based Transformer for Speaker Verification
    Xie, Fei
    Zhang, Dalong
    Liu, Chengming
    APPLIED SCIENCES-BASEL, 2022, 12 (19)