Exploring a Unified Attention-Based Pooling Framework for Speaker Verification

Authors
Liu, Yi [1]
He, Liang [1]
Liu, Weiwei [2]
Liu, Jia [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
[2] Chinese Peoples Liberat Army, 62315 Unit, Beijing 100842, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speaker verification; speaker embedding; attention mechanism; multi-head attention
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The pooling layer is an essential component of neural-network-based speaker verification. Most current speaker verification networks use average pooling to derive utterance-level speaker representations. Average pooling treats every frame as equally important, which is suboptimal because speaker-discriminative power varies across speech segments. In this paper, we present a unified attention-based pooling framework and combine it with multi-head attention. Experiments on the Fisher and NIST SRE 2010 datasets show that using outputs from lower layers to compute the attention weights outperforms average pooling and achieves better results than the vanilla attention method. Multi-head attention further improves performance.
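
The contrast the abstract draws between average pooling and attention-based pooling can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not give. The function names, the linear scoring parameters `w`, and the choice to implement multi-head attention by splitting the feature vector into equal sub-vectors are all assumptions made for illustration. Passing `keys` separately from `values` mirrors the idea of computing attention weights from a different (for example, lower) layer than the one being pooled.

```python
import numpy as np


def average_pooling(frames):
    """Mean over the time axis: every frame gets the same weight 1/T."""
    return frames.mean(axis=0)


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_pooling(values, keys, w, num_heads=1):
    """Attention-weighted mean over frames (illustrative sketch only).

    values: (T, D) frame-level features to be pooled.
    keys:   (T, K) features used to score each frame; these may come
            from a different layer than `values`.
    w:      (K, num_heads) hypothetical linear scoring parameters,
            one column per head.
    """
    num_frames, dim = values.shape
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")

    scores = keys @ w                   # (T, H): one score per frame and head
    weights = softmax(scores, axis=0)   # each head's weights sum to 1 over time

    # Assumption: each head pools its own slice of the feature vector.
    heads = values.reshape(num_frames, num_heads, dim // num_heads)
    pooled = (weights[:, :, None] * heads).sum(axis=0)  # (H, D/H)
    return pooled.reshape(dim), weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    values = rng.normal(size=(50, 8))   # 50 frames, 8-dim features
    keys = rng.normal(size=(50, 4))     # 4-dim scoring features
    w = rng.normal(size=(4, 2))         # 2 heads
    embedding, weights = attention_pooling(values, keys, w, num_heads=2)
    print(embedding.shape, weights.shape)
```

With all-zero scoring parameters every frame receives weight 1/T, so this sketch reduces to average pooling; non-uniform weights let some frames contribute more to the utterance-level representation than others.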
Pages: 200-204
Number of pages: 5