Deep CNNs With Self-Attention for Speaker Identification

Cited by: 53
Authors
Nguyen Nang An [1 ]
Nguyen Quang Thanh [1 ]
Liu, Yanbing [2 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Dept Comp Sci & Technol, Chongqing 400065, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing Engn Lab Internet & Informat Secur, Chongqing 400065, Peoples R China
Keywords
Speaker identification; deep neural networks; self-attention; embedding learning; SUPPORT VECTOR MACHINES; RECOGNITION; QUANTIZATION; ROBUSTNESS;
DOI
10.1109/ACCESS.2019.2917470
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Most current work on speaker identification is based on i-vector methods; however, there is a marked shift from traditional i-vectors toward deep learning methods, especially convolutional neural networks (CNNs). Rather than designing features and a separate classification model, we address the problem by learning both the features and the recognition system with deep neural networks. Building on deep CNNs, this paper presents a novel text-independent speaker identification method for speaker separation. Specifically, it adopts two representative CNN architectures, the visual geometry group (VGG) nets and residual neural networks (ResNets). Unlike prior deep neural network-based speaker identification methods, which usually rely on temporal maximum or average pooling across all time steps to map variable-length utterances to a fixed-dimension vector, this paper equips these two CNNs with a structured self-attention mechanism that learns a weighted average across all time steps. Using the structured self-attention layer with multiple attention hops, the proposed deep CNN is not only capable of handling variable-length segments but also able to learn speaker characteristics from different aspects of the input sequence. Experimental results on the speaker identification benchmark database VoxCeleb demonstrate the superiority of the proposed method over traditional i-vector-based methods and other strong CNN baselines. In addition, the results suggest that it is possible to cluster unknown speakers using the activations of an upper layer of a pre-trained identification CNN as a speaker embedding vector.
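As a rough illustration of the pooling idea described in the abstract, the following is a minimal PyTorch sketch of a structured self-attentive pooling layer with multiple attention hops (in the spirit of Lin et al.'s structured self-attention), which replaces temporal max/average pooling over frame-level CNN features. The module name, feature dimensions, and hop count are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructuredSelfAttentionPooling(nn.Module):
    """Maps variable-length frame-level features (B, T, D) to a
    fixed-size utterance embedding using r attention hops."""
    def __init__(self, feat_dim: int, att_dim: int = 128, num_hops: int = 4):
        super().__init__()
        self.W1 = nn.Linear(feat_dim, att_dim, bias=False)  # projects frames to attention space
        self.W2 = nn.Linear(att_dim, num_hops, bias=False)  # one score per attention hop

    def forward(self, H: torch.Tensor) -> torch.Tensor:
        # H: (B, T, D) frame-level features from a VGG/ResNet front end
        A = F.softmax(self.W2(torch.tanh(self.W1(H))), dim=1)  # (B, T, r), softmax over time
        M = torch.bmm(A.transpose(1, 2), H)                    # (B, r, D) weighted averages
        return M.flatten(1)                                    # (B, r*D) fixed-size embedding

# Usage: utterances of different lengths map to the same embedding size.
pool = StructuredSelfAttentionPooling(feat_dim=256, num_hops=4)
for T in (210, 345):
    x = torch.randn(2, T, 256)   # batch of 2 utterances, T frames each
    print(pool(x).shape)         # torch.Size([2, 1024]) regardless of T
```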
Pages: 85327-85337
Page count: 11
Related Papers
50 records in total
  • [41] Mineral Prospectivity Mapping Using Deep Self-Attention Model
    Yin, Bojun
    Zuo, Renguang
    Sun, Siquan
    NATURAL RESOURCES RESEARCH, 2023, 32 (01) : 37 - 56
  • [42] Aggregating Frame-Level Information in the Spectral Domain With Self-Attention for Speaker Embedding
    Tu, Youzhi
    Mak, Man-Wai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 944 - 957
  • [43] Self-attention mechanism in person re-identification models
    Chen, Wenbai
    Lu, Yue
    Ma, Hang
    Chen, Qili
    Wu, Xibao
    Wu, Peiliang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 4649 - 4667
  • [44] Fast Global Self-Attention for Seismic Image Fault Identification
    Wang, Shenghou
    Si, Xu
    Cai, Zhongxian
    Sun, Leiming
    Wang, Wei
    Jiang, Zirun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [45] Combined Self-attention Mechanism For Biomedical Event Trigger Identification
    Mang, Zhichang
    Zhang, Ruifang
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 1009 - 1012
  • [46] Decoupled Self-attention Module for Person Re-identification
    Zhao, Chao
    Zhang, Zhenyu
    Yan, Jian
    Yan, Yan
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7617 - 7624
  • [47] Attention and self-attention in random forests
    Utkin, Lev V.
    Konstantinov, Andrei V.
    Kirpichenko, Stanislav R.
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2023, 12 (03) : 257 - 273
  • [49] A Dual Self-Attention mechanism for vehicle re-Identification
    Zhu, Wenqian
    Wang, Zhongyuan
    Wang, Xiaochen
    Hu, Ruimin
    Liu, Huikai
    Liu, Cheng
    Wang, Chao
    Li, Dengshi
    PATTERN RECOGNITION, 2023, 137