Deep CNNs With Self-Attention for Speaker Identification

Cited by: 53
Authors
Nguyen Nang An [1 ]
Nguyen Quang Thanh [1 ]
Liu, Yanbing [2 ]
Affiliations
[1] Chongqing Univ Posts & Telecommun, Dept Comp Sci & Technol, Chongqing 400065, Peoples R China
[2] Chongqing Univ Posts & Telecommun, Chongqing Engn Lab Internet & Informat Secur, Chongqing 400065, Peoples R China
Keywords
Speaker identification; deep neural networks; self-attention; embedding learning; SUPPORT VECTOR MACHINES; RECOGNITION; QUANTIZATION; ROBUSTNESS;
DOI
10.1109/ACCESS.2019.2917470
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Subject classification code
0812 ;
Abstract
Most current works on speaker identification are based on i-vector methods; however, there is a marked shift from the traditional i-vector to deep learning methods, especially in the form of convolutional neural networks (CNNs). Rather than designing features and a subsequent individual classification model, we address the problem by learning features and the recognition system jointly with deep neural networks. Building on deep CNNs, this paper presents a novel text-independent speaker identification method. Specifically, it builds on two representative CNN architectures, the visual geometry group (VGG) networks and residual neural networks (ResNets). Unlike prior deep neural network-based speaker identification methods, which usually rely on temporal maximum or average pooling across all time steps to map variable-length utterances to a fixed-dimension vector, this paper equips these two CNNs with a structured self-attention mechanism that learns a weighted average across all time steps. Using the structured self-attention layer with multiple attention hops, the proposed deep CNN is not only capable of handling variable-length segments but also able to learn speaker characteristics from different aspects of the input sequence. The experimental results on the speaker identification benchmark database, VoxCeleb, demonstrate the superiority of the proposed method over traditional i-vector-based methods and other strong CNN baselines. In addition, the results suggest that it is possible to cluster unknown speakers using the activation of an upper layer of a pre-trained identification CNN as a speaker embedding vector.
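The structured self-attention pooling described in the abstract replaces max/average pooling with a learned weighted average over time steps, using multiple attention "hops". A minimal NumPy sketch of that mechanism is shown below; it follows the standard structured self-attention formulation (A = softmax(W2 tanh(W1 Hᵀ)), pooled = A H), and the weight names W1, W2 and all dimensions are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def structured_self_attention_pool(H, W1, W2):
    """Pool variable-length frame features into a fixed-size embedding.

    H:  (T, d)  frame-level features from the CNN (T time steps).
    W1: (da, d) first attention projection.
    W2: (r, da) one row per attention hop.
    Returns (r, d): r differently weighted averages of the T frames.
    """
    # Unnormalized attention logits, one row of length T per hop: (r, T)
    scores = W2 @ np.tanh(W1 @ H.T)
    # Softmax over the time axis so each hop's weights sum to 1
    scores -= scores.max(axis=1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)
    # Weighted average of frames per hop: (r, T) @ (T, d) -> (r, d)
    return A @ H

rng = np.random.default_rng(0)
T, d, da, r = 100, 64, 32, 4              # any T works: input is variable-length
H  = rng.standard_normal((T, d))
W1 = rng.standard_normal((da, d)) * 0.1
W2 = rng.standard_normal((r, da)) * 0.1
E = structured_self_attention_pool(H, W1, W2)
print(E.shape)  # (4, 64) -- fixed-size output regardless of T
```

Because the output shape depends only on (r, d), utterances of any duration map to the same fixed-dimension representation, which is what lets the upper-layer activations serve as speaker embeddings for unseen speakers.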
Pages: 85327 - 85337
Page count: 11
Related papers
50 records total
  • [31] Sentence Matching with Deep Self-attention and Co-attention Features
    Wang, Zhipeng
    Yan, Danfeng
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2021, PT II, 2021, 12816 : 550 - 561
  • [32] Homogeneous Learning: Self-Attention Decentralized Deep Learning
    Sun, Yuwei
    Ochiai, Hideya
    IEEE ACCESS, 2022, 10 : 7695 - 7703
  • [33] Pest Identification Based on Fusion of Self-Attention With ResNet
    Hassan, Sk Mahmudul
    Maji, Arnab Kumar
    IEEE ACCESS, 2024, 12 : 6036 - 6050
  • [34] Self-attention based speaker recognition using Cluster-Range Loss
    Bian, Tengyue
    Chen, Fangzhou
    Xu, Li
    NEUROCOMPUTING, 2019, 368 : 59 - 68
  • [35] Contextualized dynamic meta embeddings based on Gated CNNs and self-attention for Arabic machine translation
    Bensalah, Nouhaila
    Ayad, Habib
    Adib, Abdellah
    El Farouk, Abdelhamid Ibn
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2024, 17 (03) : 605 - 631
  • [36] Mineral Prospectivity Mapping Using Deep Self-Attention Model
    Yin, Bojun
    Zuo, Renguang
    Sun, Siquan
    NATURAL RESOURCES RESEARCH, 2023, 32 : 37 - 56
  • [37] Lightweight Smoke Recognition Based on Deep Convolution and Self-Attention
    Zhao, Yang
    Wang, Yigang
    Jung, Hoi-Kyung
    Jin, Yongqiang
    Hua, Dan
    Xu, Sen
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [38] SHYNESS AND SELF-ATTENTION
    CROZIER, WR
    BULLETIN OF THE BRITISH PSYCHOLOGICAL SOCIETY, 1983, 36 (FEB): : A5 - A5
  • [39] Deep relational self-Attention networks for scene graph generation
    Li, Ping
    Yu, Zhou
    Zhan, Yibing
    PATTERN RECOGNITION LETTERS, 2022, 153 : 200 - 206