Discriminative speaker embedding with serialized multi-layer multi-head attention

Cited by: 6
Authors
Zhu, Hongning [1 ,2 ]
Lee, Kong Aik [3 ]
Li, Haizhou [2 ,4 ]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[3] ASTAR, Inst Infocomm Res, Singapore, Singapore
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen, Guangdong, Peoples R China
Keywords
Speaker embeddings; Speaker verification; Attention mechanism; Serialized attention; Recognition
DOI
10.1016/j.specom.2022.09.003
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In this paper, a serialized multi-layer multi-head attention is proposed for extracting neural speaker embeddings for the text-independent speaker verification task. Most recent approaches apply a single attention layer to aggregate frame-level features. Inspired by the Transformer network, the proposed serialized attention contains a stack of self-attention layers. Unlike parallel multi-head attention, the attentive statistics are aggregated in a serialized manner to generate the utterance-level embedding, which is propagated to the next layer via a residual connection. We further propose an input-aware query for each utterance, derived from statistics pooling. To evaluate the quality of the learned speaker embeddings, the proposed serialized attention mechanism is applied to two widely used neural speaker embedding architectures and validated on several benchmark datasets covering various languages and acoustic conditions, including VoxCeleb1, SITW, and CN-Celeb. Experimental results demonstrate that serialized attention achieves better speaker verification performance.
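The serialized aggregation described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the projection weights, layer count, and residual form are placeholder assumptions. Each layer forms an input-aware query from statistics pooling over the current frames, attends over the frames, accumulates the attentive statistics (weighted mean and standard deviation) into the utterance-level embedding, and passes the frames to the next layer through a residual connection.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def serialized_attention_pool(frames, n_layers=3, rng=None):
    """Hypothetical sketch of serialized attention pooling.

    frames: (T, d) array of frame-level features.
    Returns a (2d,) utterance-level embedding (mean ++ std),
    accumulated across layers in a serialized manner.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, d = frames.shape
    utt = np.zeros(2 * d)  # serialized utterance-level embedding
    for _ in range(n_layers):
        # Input-aware query from statistics pooling of the current frames
        stats = np.concatenate([frames.mean(0), frames.std(0)])  # (2d,)
        W_q = rng.standard_normal((d, 2 * d)) * 0.01  # placeholder projection
        query = W_q @ stats                           # (d,)
        # Scaled dot-product attention over frames
        alpha = softmax(frames @ query / np.sqrt(d))  # (T,) attention weights
        # Attentive statistics: weighted mean and standard deviation
        mean = alpha @ frames                         # (d,)
        var = alpha @ (frames - mean) ** 2
        std = np.sqrt(np.maximum(var, 1e-9))
        utt += np.concatenate([mean, std])            # serialized aggregation
        # Residual connection propagating to the next layer
        frames = frames + mean
    return utt
```

In a trained system the query projection would be a learned layer (one per attention head) rather than a random matrix, and the accumulated statistics would feed a final embedding layer; the sketch only shows the serialized flow of statistics across layers.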
Pages: 89-100 (12 pages)