Speaker Verification Employing Combinations of Self-Attention Mechanisms

Cited by: 5
Authors
Bae, Ara [1]
Kim, Wooil [1]
Affiliations
[1] Incheon Natl Univ, Dept Comp Sci & Engn, Incheon 22012, South Korea
Keywords
speaker verification; self-attention; attention combinations;
DOI
10.3390/electronics9122201
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments extracts the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention approach, speaker recognition performance is further improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods that extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination of single-head and split-based multi-head attention achieves a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, the EER improves to 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and that the speaker verification system can be further improved by employing combinations of different attention techniques.
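To make the pooling strategies described in the abstract concrete, the following is a minimal PyTorch sketch of single-head attentive pooling, split-based multi-head attentive pooling, and a simple combination of the two by concatenation. All layer sizes, the number of heads, and the concatenation-based combination are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of attention-based pooling for speaker embeddings (PyTorch).
# Layer sizes, head count, and the concatenation-based combination are assumptions
# for illustration; they are not the configuration reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttentionPooling(nn.Module):
    """Pool frame-level features (B, T, D) into one utterance-level vector (B, D)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (B, T, D)
        w = F.softmax(self.score(x), dim=1)    # (B, T, 1), attention weights over frames
        return (w * x).sum(dim=1)              # (B, D) attention-weighted mean

class SplitMultiHeadAttentionPooling(nn.Module):
    """Split the feature dimension into H heads and pool each head separately."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.score = nn.Linear(dim // heads, 1)

    def forward(self, x):                                    # x: (B, T, D)
        B, T, D = x.shape
        xh = x.view(B, T, self.heads, D // self.heads)       # (B, T, H, D/H)
        w = F.softmax(self.score(xh), dim=1)                 # per-head weights over frames
        pooled = (w * xh).sum(dim=1)                         # (B, H, D/H)
        return pooled.reshape(B, D)                          # concatenate heads

class CombinedPooling(nn.Module):
    """Combine single-head and multi-head pooled embeddings (combination by concatenation is assumed)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.single = SingleHeadAttentionPooling(dim)
        self.multi = SplitMultiHeadAttentionPooling(dim, heads)

    def forward(self, x):
        return torch.cat([self.single(x), self.multi(x)], dim=-1)   # (B, 2*D)

if __name__ == "__main__":
    frames = torch.randn(8, 200, 256)          # 8 utterances, 200 frames, 256-dim features
    emb = CombinedPooling(256, heads=4)(frames)
    print(emb.shape)                           # torch.Size([8, 512])

In a typical speaker verification pipeline, the combined embedding would be trained with a speaker classification loss and scored at test time with cosine similarity or PLDA; those downstream choices are likewise outside what the abstract specifies.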
Pages: 1-11
Page count: 11