Self-Attention Encoding and Pooling for Speaker Recognition

Cited by: 35
Authors
Safari, Pooyan [1 ]
India, Miquel [1 ]
Hernando, Javier [1 ]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
Source
INTERSPEECH 2020
Keywords
Self-Attention Encoding; Self-Attention Pooling; Speaker Verification; Speaker Embedding;
DOI
10.21437/Interspeech.2020-1446
CLC numbers
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213
Abstract
The computing power of mobile devices limits end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers to design more efficient deep models. On the other hand, self-attention networks based on the Transformer architecture have attracted remarkable interest due to their high parallelization capability and strong performance on a variety of Natural Language Processing (NLP) tasks. Inspired by the Transformer, we propose a tandem Self-Attention Encoding and Pooling (SAEP) mechanism to obtain a discriminative speaker embedding from variable-length speech utterances. SAEP is a stack of identical blocks that rely solely on self-attention and position-wise feed-forward networks to create a vector representation of speakers. This approach encodes short-term speaker spectral features into speaker embeddings to be used in text-independent speaker verification. We have evaluated this approach on both the VoxCeleb1 and VoxCeleb2 datasets. The proposed architecture outperforms the baseline x-vector and shows competitive performance against other convolution-based benchmarks, with a significant reduction in model size. It employs 94%, 95%, and 73% fewer parameters than ResNet-34, ResNet-50, and x-vector, respectively. This indicates that the proposed fully attention-based architecture is more efficient at extracting time-invariant features from speaker utterances.
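A minimal PyTorch sketch of the tandem encoding-and-pooling idea described in the abstract is given below: a stack of identical self-attention plus position-wise feed-forward blocks, followed by attention-weighted pooling over frames to produce a single speaker embedding from a variable-length utterance. All hyper-parameters (feature dimension, model width, number of heads and blocks) and module names are illustrative assumptions, not the configuration reported in the paper.

import torch
import torch.nn as nn

class SAEPBlock(nn.Module):
    # One encoder block: multi-head self-attention + position-wise feed-forward,
    # each with a residual connection and layer normalization.
    # (Hyper-parameters here are illustrative, not the paper's.)
    def __init__(self, d_model=256, n_heads=4, d_ff=512, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                          # x: (batch, frames, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(a))
        x = self.norm2(x + self.drop(self.ff(x)))
        return x

class SelfAttentionPooling(nn.Module):
    # Collapse a variable number of frames into one vector using learned
    # attention weights over time.
    def __init__(self, d_model=256):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, x):                          # x: (batch, frames, d_model)
        w = torch.softmax(self.score(x), dim=1)    # (batch, frames, 1)
        return (w * x).sum(dim=1)                  # (batch, d_model) speaker embedding

class SAEP(nn.Module):
    # Tandem encoder + pooling: project short-term spectral features,
    # run them through the self-attention blocks, then pool over frames.
    def __init__(self, feat_dim=40, d_model=256, n_blocks=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        self.blocks = nn.Sequential(*[SAEPBlock(d_model) for _ in range(n_blocks)])
        self.pool = SelfAttentionPooling(d_model)

    def forward(self, feats):                      # feats: (batch, frames, feat_dim)
        return self.pool(self.blocks(self.proj(feats)))

# Usage: embedding = SAEP()(torch.randn(8, 300, 40))  -> shape (8, 256)

Because the pooling step is a weighted sum over however many frames the utterance contains, the resulting embedding has a fixed size regardless of utterance length, which is what allows text-independent verification on variable-length speech.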
Pages: 941-945
Number of pages: 5
Related Papers
50 items in total
  • [41] UniFormer: Unifying Convolution and Self-Attention for Visual Recognition
    Li, Kunchang
    Wang, Yali
    Zhang, Junhao
    Gao, Peng
    Song, Guanglu
    Liu, Yu
    Li, Hongsheng
    Qiao, Yu
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12581 - 12600
  • [42] Self Multi-Head Attention for Speaker Recognition
    India, Miquel
    Safari, Pooyan
    Hernando, Javier
    INTERSPEECH 2019, 2019, : 4305 - 4309
  • [43] GCNSA: DNA storage encoding with a graph convolutional network and self-attention
    Cao, Ben
    Wang, Bin
    Zhang, Qiang
    ISCIENCE, 2023, 26 (03)
  • [44] Progressive Self-Attention Network with Unsymmetrical Positional Encoding for Sequential Recommendation
    Zhu, Yuehua
    Huang, Bo
    Jiang, Shaohua
    Yang, Muli
    Yang, Yanhua
    Zhong, Wenliang
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2029 - 2033
  • [45] Phrase-level Self-Attention Networks for Universal Sentence Encoding
    Wu, Wei
    Wang, Houfeng
    Liu, Tianyu
    Ma, Shuming
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3729 - 3738
  • [46] Video person re-identification with global statistic pooling and self-attention distillation
    Lin, Gaojie
    Zhao, Sanyuan
    Shen, Jianbing
    NEUROCOMPUTING, 2021, 453 : 777 - 789
  • [47] An Aerial Target Recognition Algorithm Based on Self-Attention and LSTM
    Liang, Futai
    Chen, Xin
    He, Song
    Song, Zihao
    Lu, Hao
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 81 (01): : 1101 - 1121
  • [48] Pedestrian Attribute Recognition Based on Dual Self-attention Mechanism
    Fan, Zhongkui
    Guan, Ye-peng
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2023, 20 (02) : 793 - 812
  • [49] Using Self-Attention LSTMs to Enhance Observations in Goal Recognition
    Amado, Leonardo
    Licks, Gabriel Paludo
    Marcon, Matheus
    Pereira, Ramon Fraga
    Meneguzzi, Felipe
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [50] Neural Named Entity Recognition Using a Self-Attention Mechanism
    Zukov-Gregoric, Andrej
    Bachrach, Yoram
    Minkovsky, Pasha
    Coope, Sam
    Maksak, Bogdan
    2017 IEEE 29TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2017), 2017, : 652 - 656