Speaker Verification Employing Combinations of Self-Attention Mechanisms

Cited by: 5
Authors
Bae, Ara [1]
Kim, Wooil [1]
Affiliations
[1] Incheon Natl Univ, Dept Comp Sci & Engn, Incheon 22012, South Korea
Keywords
speaker verification; self-attention; attention combinations;
DOI
10.3390/electronics9122201
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments extracts the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention approach, speaker recognition performance is further improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods that extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination of single-head and split-based multi-head attention achieves a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, the EER improves to 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and that the speaker verification system can be further improved by employing combinations of different attention techniques.
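To make the pooling strategies described in the abstract concrete, the following is a minimal PyTorch sketch of single-head attentive pooling, split-based multi-head attentive pooling, and a simple combination of the two by concatenation. All layer sizes, the number of heads, and the concatenation-based combination are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch of attention-based pooling for speaker embeddings (PyTorch).
# Layer sizes, head count, and the concatenation-based combination are assumptions
# for illustration; they are not the configuration reported in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadAttentionPooling(nn.Module):
    """Pool frame-level features (B, T, D) into one utterance-level vector (B, D)."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                      # x: (B, T, D)
        w = F.softmax(self.score(x), dim=1)    # (B, T, 1), attention weights over frames
        return (w * x).sum(dim=1)              # (B, D) attention-weighted mean

class SplitMultiHeadAttentionPooling(nn.Module):
    """Split the feature dimension into H heads and pool each head separately."""
    def __init__(self, dim, heads=4):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.score = nn.Linear(dim // heads, 1)

    def forward(self, x):                                    # x: (B, T, D)
        B, T, D = x.shape
        xh = x.view(B, T, self.heads, D // self.heads)       # (B, T, H, D/H)
        w = F.softmax(self.score(xh), dim=1)                 # per-head weights over frames
        pooled = (w * xh).sum(dim=1)                         # (B, H, D/H)
        return pooled.reshape(B, D)                          # concatenate heads

class CombinedPooling(nn.Module):
    """Combine single-head and multi-head pooled embeddings (combination by concatenation is assumed)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.single = SingleHeadAttentionPooling(dim)
        self.multi = SplitMultiHeadAttentionPooling(dim, heads)

    def forward(self, x):
        return torch.cat([self.single(x), self.multi(x)], dim=-1)   # (B, 2*D)

if __name__ == "__main__":
    frames = torch.randn(8, 200, 256)          # 8 utterances, 200 frames, 256-dim features
    emb = CombinedPooling(256, heads=4)(frames)
    print(emb.shape)                           # torch.Size([8, 512])

In a typical speaker verification pipeline, the combined embedding would be trained with a speaker classification loss and scored at test time with cosine similarity or PLDA; those downstream choices are likewise outside what the abstract specifies.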
Pages: 1-11
Page count: 11