Exploring a Unified Attention-Based Pooling Framework for Speaker Verification

Cited by: 0

Authors:

Liu, Yi [1]

He, Liang [1]

Liu, Weiwei [2]

Liu, Jia [1]

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China

[2] Chinese Peoples Liberat Army, 62315 Unit, Beijing 100842, Peoples R China

Source:

2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2018

Funding:

National Natural Science Foundation of China;

Keywords:

speaker verification; speaker embedding; attention mechanism; multi-head attention;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The pooling layer is an essential component of neural-network-based speaker verification. Most current speaker verification networks use average pooling to derive utterance-level speaker representations. Average pooling treats every frame as equally important, which is suboptimal because speaker-discriminative power varies across speech segments. In this paper, we present a unified attention-based pooling framework and combine it with multi-head attention. Experiments on the Fisher and NIST SRE 2010 datasets show that using outputs from lower layers to compute the attention weights outperforms average pooling and achieves better results than the vanilla attention method. Multi-head attention further improves performance.
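The contrast the abstract draws between average pooling and attention-based pooling can be illustrated with a minimal sketch. This is not the paper's formulation: the dot-product scoring, the learned `query` vectors, the choice to concatenate heads, and every function name below are assumptions made for illustration. The `keys` argument stands in for whatever frame-level features are used to compute the weights (possibly outputs of a lower layer), while `values` are the frame-level features that actually get pooled.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pooling(values):
    """Plain mean over frames: every frame gets weight 1/T."""
    num_frames = len(values)
    dim = len(values[0])
    return [sum(frame[d] for frame in values) / num_frames for d in range(dim)]


def attention_pooling(keys, values, query):
    """Weighted mean over frames.

    keys   : per-frame vectors used only to score each frame
             (assumption: may come from a lower layer than `values`)
    values : per-frame vectors that are pooled
    query  : hypothetical learned vector; score = dot(key, query)
    """
    scores = [sum(k * q for k, q in zip(key, query)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, values))
        for d in range(dim)
    ]
    return pooled, weights


def multi_head_attention_pooling(keys, values, queries):
    """Assumed multi-head variant: one query per head, outputs concatenated."""
    output = []
    for query in queries:
        pooled, _ = attention_pooling(keys, values, query)
        output.extend(pooled)
    return output


# Toy utterance with three frames of 2-dimensional features.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[0.1, 0.0], [0.0, 0.2], [2.0, 2.0]]  # third frame scores highest

mean_vec = average_pooling(values)
att_vec, att_weights = attention_pooling(keys, values, query=[1.0, 1.0])
multi_vec = multi_head_attention_pooling(
    keys, values, queries=[[1.0, 0.0], [0.0, 1.0]]
)
```

With an all-zero query every frame receives the same score, so the attention weights become uniform and the result reduces to average pooling. That makes average pooling a special case of the weighted form in this sketch.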

Pages: 200-204

Page count: 5
