Self Attention Networks in Speaker Recognition

Cited by: 2
Authors
Safari, Pooyan [1]
India, Miquel [1]
Hernando, Javier [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona 08034, Spain
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 11
Keywords
speaker recognition; self-attention networks; transformer; speaker embeddings; SPEECH; REPRESENTATION;
DOI
10.3390/app13116410
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703 ;
Abstract
Recently, there has been a significant surge of interest in Self-Attention Networks (SANs) based on the Transformer architecture. This can be attributed to their notable ability for parallelization and their impressive performance across various Natural Language Processing applications. On the other hand, the utilization of large-scale, multi-purpose language models trained through self-supervision is progressively more prevalent, for tasks like speech recognition. In this context, the pre-trained model, which has been trained on extensive speech data, can be fine-tuned for particular downstream tasks like speaker verification. These massive models typically rely on SANs as their foundational architecture. Therefore, studying the potential capabilities and training challenges of such models is of utmost importance for the future generation of speaker verification systems. In this direction, we propose a speaker embedding extractor based on SANs to obtain a discriminative speaker representation given non-fixed length speech utterances. With the advancements suggested in this work, we could achieve up to 41% relative performance improvement in terms of EER compared to the naive SAN which was proposed in our previous work. Moreover, we empirically show the training instability in such architectures in terms of rank collapse and further investigate the potential solutions to alleviate this shortcoming.
Pages: 14
Related papers
50 in total
  • [11] NEPALI SPEECH RECOGNITION USING SELF-ATTENTION NETWORKS
    Joshi, Basanta
    Shrestha, Rupesh
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2023, 19 (06): : 1769 - 1784
  • [12] Self-attention transfer networks for speech emotion recognition
    Ziping ZHAO
    Keru Wang
    Zhongtian BAO
    Zixing ZHANG
    Nicholas CUMMINS
    Shihuang SUN
    Haishuai WANG
    Jianhua TAO
    Björn W. SCHULLER
    Virtual Reality & Intelligent Hardware, 2021, 3 (01) : 43 - 54
  • [13] Probabilistic mapping networks for speaker recognition
    Li, HZ
    Gong, YF
    Haton, JP
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 3374 - 3377
  • [14] Speaker Recognition with ResNet and VGG Networks
    Jakubec, Maros
    Lieskovska, Eva
    Jarina, Roman
    2021 31ST INTERNATIONAL CONFERENCE RADIOELEKTRONIKA (RADIOELEKTRONIKA), 2021,
  • [15] Phonetically-aware embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention models for the 2018 NIST Speaker Recognition Evaluation
    Vinals, Ignacio
    Ribas, Dayana
    Mingote, Victoria
    Llombart, Jorge
    Gimeno, Pablo
    Miguel, Antonio
    Ortega, Alfonso
    Lleida, Eduardo
    INTERSPEECH 2019, 2019, : 4310 - 4314
  • [16] ATTENTION MECHANISM IN SPEAKER RECOGNITION: WHAT DOES IT LEARN IN DEEP SPEAKER EMBEDDING?
    Wang, Qiongqiong
    Okabe, Koji
    Lee, Kong Aik
    Yamamoto, Hitoshi
    Koshinaka, Takafumi
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 1052 - 1059
  • [17] A BAYESIAN ATTENTION NEURAL NETWORK LAYER FOR SPEAKER RECOGNITION
    Zhu, Weizhong
    Pelecanos, Jason
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6241 - 6245
  • [18] SELF-ATTENTION NETWORKS FOR CONNECTIONIST TEMPORAL CLASSIFICATION IN SPEECH RECOGNITION
    Salazar, Julian
    Kirchhoff, Katrin
    Huang, Zhiheng
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7115 - 7119
  • [19] Speaker recognition using isomorphic graph attention network based pooling on self-supervised representation
    Ge, Zirui
    Xu, Xinzhou
    Guo, Haiyan
    Wang, Tingting
    Yang, Zhen
    APPLIED ACOUSTICS, 2024, 219
  • [20] Speaker diarization with variants of self-attention and joint speaker embedding extractor
    Fu, Pengbin
    Ma, Yuchen
    Yang, Huirong
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 45 (05) : 9169 - 9180