Exploring a Unified Attention-Based Pooling Framework for Speaker Verification

Cited by: 0
Authors
Liu, Yi [1]
He, Liang [1]
Liu, Weiwei [2]
Liu, Jia [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Chinese Peoples Liberat Army, 62315 Unit, Beijing 100842, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
speaker verification; speaker embedding; attention mechanism; multi-head attention;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The pooling layer is an essential component of neural-network-based speaker verification. Most current speaker verification networks use average pooling to derive utterance-level speaker representations. Average pooling treats every frame as equally important, which is suboptimal because speaker-discriminative power varies across speech segments. In this paper, we present a unified attention-based pooling framework and combine it with multi-head attention. Experiments on the Fisher and NIST SRE 2010 datasets show that using outputs from lower layers to compute the attention weights outperforms average pooling and achieves better results than the vanilla attention method. Multi-head attention further improves performance.
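The contrast drawn in the abstract between average pooling and attention-based pooling can be illustrated with a minimal pure-Python sketch. It is not the paper's formulation: the function names, the dot-product scoring with a single query vector per head, and the way each head pools its own slice of the value dimensions are all assumptions made for illustration. The `keys` argument stands in for whichever layer output is used to compute the attention weights, which may differ from the layer being pooled.

```python
import math


def average_pooling(frames):
    """Mean over time: every frame receives the same weight 1/T."""
    n_frames = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n_frames for d in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pooling(values, keys, query):
    """Weighted mean of `values` with weights softmax(keys[t] . query).

    Assumption: `keys` may be the same frames as `values` or frames taken
    from another (for example, lower) layer, one key per time step.
    """
    scores = [sum(k * q for k, q in zip(key, query)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return pooled, weights


def multi_head_attention_pooling(values, keys, queries):
    """Hypothetical multi-head variant: one query per head, each head pools
    its own contiguous slice of the value dimensions, outputs concatenated."""
    n_heads = len(queries)
    dim = len(values[0])
    if dim % n_heads != 0:
        raise ValueError("value dimension must be divisible by number of heads")
    step = dim // n_heads
    pooled_all = []
    for h, query in enumerate(queries):
        head_values = [v[h * step:(h + 1) * step] for v in values]
        pooled, _ = attention_pooling(head_values, keys, query)
        pooled_all.extend(pooled)
    return pooled_all
```

With an all-zero query every score is equal, so the weights are uniform and the attention pooling reduces to average pooling. A query that scores one frame much higher than the others pulls the pooled vector toward that frame, which is the behaviour the abstract motivates.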
Pages: 200-204
Number of pages: 5
Related papers
50 in total
  • [1] ATTENTION-BASED MODELS FOR TEXT-DEPENDENT SPEAKER VERIFICATION
    Chowdhury, F. A. Rezaur Rahman
    Wang, Quan
    Moreno, Ignacio Lopez
    Wan, Li
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5359 - 5363
  • [2] Attention-Based Temporal-Frequency Aggregation for Speaker Verification
    Wang, Meng
    Feng, Dazheng
    Su, Tingting
    Chen, Mohan
    SENSORS, 2022, 22 (06)
  • [3] A Unified Framework for Speaker and Utterance Verification
    Liu, Tianchi
    Madhavi, Maulik
    Das, Rohan Kumar
    Li, Haizhou
    INTERSPEECH 2019, 2019, : 4320 - 4324
  • [4] Neighbour feature attention-based pooling
    Li, Xiaosong
    Wu, Yanxia
    Fu, Yan
    Tang, Chuheng
    Zhang, Lidan
    NEUROCOMPUTING, 2022, 501 : 285 - 293
  • [5] ABDPool: Attention-based Differentiable Pooling
    Liu, Yue
    Cui, Lixin
    Wang, Yue
    Bai, Lu
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3021 - 3026
  • [6] On the Use of Cross-module Attention Statistics Pooling for Speaker Verification
    Alam, Jahangir
    Fathan, Abderrahim
    2023 11TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS, IWBF, 2023,
  • [7] SAGP: A spectral attention-based global pooling
    Yu, Xiang
    Xu, Longzheng
    Han, Yi
    Geng, Zhe
    Zhu, Daiyin
    ELECTRONICS LETTERS, 2024, 60 (11)
  • [8] Attention-based multi-channel speaker verification with ad-hoc microphone arrays
    Liang, Chengdong
    Chen, Junqi
    Guan, Shanzheng
    Zhang, Xiao-Lei
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1111 - 1115
  • [9] Multistructure Graph Classification Method With Attention-Based Pooling
    Xu, Yuhua
    Wang, Junli
    Guang, Mingjian
    Yan, Chungang
    Jiang, Changjun
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (02) : 602 - 613
  • [10] AN ATTENTION-BASED BACKEND ALLOWING EFFICIENT FINE-TUNING OF TRANSFORMER MODELS FOR SPEAKER VERIFICATION
    Peng, Junyi
    Plchot, Oldrich
    Stafylakis, Themos
    Mosner, Ladislav
    Burget, Lukas
    Cernocky, Jan
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 555 - 562