MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION

Cited by: 20
Authors
Wang, Rui [1 ,4 ]
Ao, Junyi [2 ,3 ,4 ]
Zhou, Long [4 ]
Liu, Shujie [4 ]
Wei, Zhihua [1 ]
Ko, Tom [2 ]
Li, Qing [3 ]
Zhang, Yu [2 ]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, Shanghai, Peoples R China
[2] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[4] Microsoft Res Asia, Beijing, Peoples R China
Keywords
speaker recognition; Transformer; speaker identification; speaker verification;
DOI
10.1109/ICASSP43922.2022.9746639
CLC Number
O42 [Acoustics];
Discipline Code
070206; 082403;
Abstract
Initially developed for natural language processing (NLP), the Transformer model is now widely used for speech processing tasks such as speaker recognition, thanks to its powerful sequence modeling capabilities. However, conventional self-attention mechanisms were originally designed for modeling textual sequences, without considering the characteristics of speech and speaker modeling. Moreover, different Transformer variants for speaker recognition have not been well studied. In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants, with and without the proposed attention mechanism, for speaker recognition. Specifically, to balance the capability of capturing global dependencies with the modeling of locality, we propose a multi-view self-attention mechanism for the speaker Transformer, in which different attention heads can attend to different ranges of the receptive field. Furthermore, we introduce and compare five Transformer variants with different network architectures, embedding locations, and pooling methods to learn speaker embeddings. Experimental results on the VoxCeleb1 and VoxCeleb2 datasets show that the proposed multi-view self-attention mechanism improves speaker recognition performance, and that the proposed speaker Transformer network attains excellent results compared with state-of-the-art models.
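The core idea in the abstract, different attention heads attending to different ranges of the receptive field, can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not the paper's implementation: the per-head window sizes, the band-mask form of locality, and the identity Q/K/V projections are all illustrative choices.

```python
import numpy as np

def multi_view_attention(x, num_heads=4, windows=(2, 4, 8, None)):
    """Toy multi-view self-attention over a (T, d) sequence.

    Each head restricts its receptive field to a different local
    window (None = a fully global head). The window sizes and the
    identity Q/K/V projections are illustrative assumptions only.
    """
    T, d = x.shape
    assert d % num_heads == 0 and len(windows) == num_heads
    d_h = d // num_heads
    outputs = []
    for h, w in enumerate(windows):
        # A per-head feature slice stands in for learned Q/K/V projections.
        q = k = v = x[:, h * d_h:(h + 1) * d_h]
        scores = q @ k.T / np.sqrt(d_h)
        if w is not None:
            # Band mask: position i may only attend to j with |i - j| <= w.
            idx = np.arange(T)
            mask = np.abs(idx[:, None] - idx[None, :]) > w
            scores = np.where(mask, -np.inf, scores)
        # Numerically stable softmax; masked entries get zero weight.
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        outputs.append(attn @ v)
    # Concatenating heads restores the model dimension: (T, d).
    return np.concatenate(outputs, axis=-1)
```

Mixing banded and global heads in this way lets one layer capture local acoustic detail and utterance-level speaker context simultaneously, which is the balance the abstract describes.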
Pages: 6732-6736
Page count: 5