Application of Channel Attention for Speaker Recognition in the Wild

Cited by: 0
Authors
Chen, Zhi [1]
Wang, Lei [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Speaker recognition; speaker verification; channel attention; NetVLAD; prototypical networks loss;
DOI
10.1145/3469213.3470331
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The objective of this paper is to build a speaker recognition system that works "in the wild", that is, on utterances of varying length that contain irrelevant signals. The key elements in designing a deep neural network for this task are the type of backbone (frame-level) network, the temporal aggregation (utterance-level) method, and the loss function (optimisation). We propose an effective speaker recognition system based on a deep neural network that uses SE-ResNet to extract frame-level speaker features and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features along the time axis. We also point out that the strength of combining NetVLAD with the SE block is that both are based on channel attention. Additionally, we use the prototypical networks loss, which learns a metric space in which open-set classification can be performed by computing the distance to the prototype representation of each class, so that the training process is consistent with the test scenario. We also study the influence of utterance length on the network and conclude that longer utterances are beneficial for "in the wild" data. Furthermore, we present results suggesting that a model trained on an English dataset can be adapted to Mandarin speaker recognition; that is, the representations learned by our system transfer well across languages.
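The prototype-distance idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common prototypical-networks formulation (class prototype = mean of support embeddings, squared Euclidean distance, softmax over negative distances), and the function names and the toy 2-D "embeddings" in the usage note are hypothetical.

```python
import math


def prototype(embeddings):
    """Mean vector of one speaker's support embeddings (assumed prototype definition)."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support, queries):
    """Average negative log-probability of the true speaker, plus predictions.

    support: dict mapping speaker id -> list of embedding vectors
    queries: list of (embedding vector, true speaker id) pairs
    """
    speakers = sorted(support)
    protos = {s: prototype(support[s]) for s in speakers}
    total = 0.0
    predictions = []
    for emb, true_speaker in queries:
        # Logit for each speaker is the negative distance to that prototype.
        logits = {s: -sq_dist(emb, protos[s]) for s in speakers}
        # Numerically stable log-sum-exp over all speakers.
        m = max(logits.values())
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[true_speaker]
        # The predicted speaker is the one with the nearest prototype.
        predictions.append(max(logits, key=logits.get))
    return total / len(queries), predictions
```

Calling `prototypical_loss({"spk_a": [[0.0, 0.0], [0.0, 2.0]], "spk_b": [[4.0, 4.0], [6.0, 4.0]]}, [([0.0, 1.0], "spk_a")])` assigns the query to the nearest prototype. At test time the same nearest-prototype rule can be applied to speakers not seen in training, which is the sense in which training matches the test scenario.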
Pages: 5
Related papers
50 in total
  • [31] Attention-based Text Recognition in the Wild
    Yan, Zhi-Chen
    Yu, Stephanie A.
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA), 2020, : 42 - 49
  • [32] Speaker Verification Based on Channel Attention and Adaptive Joint Loss
    Fan, Houbin
    Li, Jun
    Ge, Fengpei
    Liang, Chunyan
    ELECTRONICS, 2025, 14 (03):
  • [33] Speaker Adaptive Training for Speech Recognition Based on Attention-over-Attention Mechanism
    Wan, Genshun
    Pan, Jia
    Wang, Qingran
    Gao, Jianqing
    Ye, Zhongfu
    INTERSPEECH 2020, 2020, : 1251 - 1255
  • [34] Estimation of handset nonlinearity with application to speaker recognition
    Quatieri, TF
    Reynolds, DA
    O'Leary, GC
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (05): : 567 - 584
  • [35] A sequence kernel and its application to speaker recognition
    Campbell, WM
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 14, VOLS 1 AND 2, 2002, 14 : 1157 - 1163
  • [36] SIAMESE CAPSULE NETWORK FOR END-TO-END SPEAKER RECOGNITION IN THE WILD
    Hajavi, Amirhossein
    Etemad, Ali
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7203 - 7207
  • [37] A Speaker Recognition Method Based on Dynamic Convolution with Dual Attention Mechanism
    Luo, Yuan
    Zhu, Kuilin
    Wang, Wenhao
    Lin, Ziyao
    ENGINEERING LETTERS, 2023, 31 (02) : 825 - 832
  • [38] Fine-grained Early Frequency Attention for Deep Speaker Recognition
    Hajavi, Amirhossein
    Etemad, Ali
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [39] From Speaker Recognition to Forensic Speaker Recognition
    Drygajlo, Andrzej
    BIOMETRIC AUTHENTICATION (BIOMET 2014), 2014, 8897 : 93 - 104
  • [40] Single-Channel Target Speaker Extraction System with Attention Enhancement
    Lai, Yen-Ting
    Lin, Yi-En
    Chang, Pao-Chi
    Wang, Jia-Ching
    2022 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN, IEEE ICCE-TW 2022, 2022, : 433 - 434