Discriminative speaker embedding with serialized multi-layer multi-head attention

Cited by: 6
Authors
Zhu, Hongning [1 ,2 ]
Lee, Kong Aik [3 ]
Li, Haizhou [2 ,4 ]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] ASTAR, Inst Infocomm Res, Singapore, Singapore
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen, Guangdong, Peoples R China
Keywords
Speaker embeddings; Speaker verification; Attention mechanism; Serialized attention; Recognition
DOI
10.1016/j.specom.2022.09.003
CLC number (Chinese Library Classification)
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, a serialized multi-layer multi-head attention is proposed for extracting neural speaker embeddings in the text-independent speaker verification task. Most recent approaches apply a single attention layer to aggregate frame-level features. Inspired by the Transformer network, the proposed serialized attention contains a stack of self-attention layers. Unlike parallel multi-head attention, the attentive statistics are aggregated in a serialized manner to generate the utterance-level embedding, which is propagated to the next layer through a residual connection. We further propose an input-aware query for each utterance, obtained with statistics pooling. To evaluate the quality of the learned speaker embeddings, the proposed serialized attention mechanism is applied to two widely used neural speaker embedding architectures and validated on several benchmark datasets covering various languages and acoustic conditions, including VoxCeleb1, SITW, and CN-Celeb. Experimental results demonstrate that serialized attention achieves better speaker verification performance.
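The abstract describes the mechanism only at a high level. The following is a minimal PyTorch sketch of the serialized idea under stated assumptions: illustrative layer sizes and head counts, a summation rule for aggregating the per-layer attentive statistics, and hypothetical names such as SerializedAttentionLayer and statistics_pooling. It is a reading of the abstract, not the authors' published implementation.

# Minimal sketch of serialized multi-layer multi-head attention for speaker
# embedding: a stack of self-attention layers, each producing attentive
# statistics that are accumulated serially into the utterance-level embedding,
# with frame-level features passed on through residual connections and an
# input-aware query obtained by statistics pooling. Sizes and the summation
# rule are illustrative assumptions, not the paper's configuration.
import torch
import torch.nn as nn


def statistics_pooling(x):
    """Concatenate mean and std over the time axis. x: (B, T, D) -> (B, 2D)."""
    return torch.cat([x.mean(dim=1), x.std(dim=1)], dim=-1)


class SerializedAttentionLayer(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_embed=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Input-aware query: one query vector per utterance from statistics pooling.
        self.query_proj = nn.Linear(2 * d_model, d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # Maps this layer's attentive statistics into the shared embedding space.
        self.embed_proj = nn.Linear(d_model, d_embed)

    def forward(self, x):                                        # x: (B, T, d_model)
        q = self.query_proj(statistics_pooling(x)).unsqueeze(1)  # (B, 1, d_model)
        pooled, _ = self.attn(q, x, x)                           # (B, 1, d_model)
        layer_embed = self.embed_proj(pooled.squeeze(1))         # (B, d_embed)
        # Residual connections keep frame-level features flowing to the next layer;
        # the pooled vector is broadcast over the time axis.
        x = self.norm1(x + pooled)
        x = self.norm2(x + self.ffn(x))
        return x, layer_embed


class SerializedSpeakerEmbedding(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_layers=3, d_embed=256):
        super().__init__()
        self.frontend = nn.Linear(feat_dim, d_model)
        self.layers = nn.ModuleList(
            SerializedAttentionLayer(d_model, 4, d_embed) for _ in range(n_layers)
        )

    def forward(self, feats):                  # feats: (B, T, feat_dim)
        x = self.frontend(feats)
        embedding = 0
        # Serialized aggregation: each layer adds its attentive statistics to the
        # running utterance-level embedding.
        for layer in self.layers:
            x, layer_embed = layer(x)
            embedding = embedding + layer_embed
        return embedding


if __name__ == "__main__":
    model = SerializedSpeakerEmbedding()
    dummy = torch.randn(2, 200, 80)            # two utterances, 200 frames of 80-dim features
    print(model(dummy).shape)                  # torch.Size([2, 256])

In this sketch each layer contributes its pooled statistics to a running utterance-level embedding while the frame-level features continue through residual connections, which is one way to realize the serialized aggregation described above.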
Pages: 89-100
Page count: 12