Application of Channel Attention for Speaker Recognition in the Wild

Cited by: 0
Authors
Chen, Zhi [1]
Wang, Lei [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Source
PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21) | 2021
Keywords
Speaker recognition; speaker verification; channel attention; NetVLAD; prototypical networks loss
DOI
10.1145/3469213.3470331
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of this paper is to build a speaker recognition system that works "in the wild", i.e. on utterances of different lengths that contain irrelevant signals. The key elements in designing a deep neural network for this task are the type of backbone (frame-level) network, the temporal aggregation (utterance-level) method, and the loss function (optimisation). We propose an effective speaker recognition system based on a deep neural network, using SE-ResNet to extract frame-level speaker features and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features along the time axis. We also point out that the advantage of combining NetVLAD with the SE-Block is that both are based on channel attention. Additionally, we use the prototypical networks loss, which learns a metric space in which the open-set classification task can be performed by computing the distance to the prototype representation of each class, so that the training process is consistent with the test scenario. We also study the influence of utterance length on the network and conclude that longer utterances are beneficial for "in the wild" data. Furthermore, we present results suggesting that a model trained on an English dataset can be adapted to Mandarin speaker recognition; that is, the representations learned by our system transfer well across languages.
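The prototypical networks loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squared Euclidean distance as the metric, plain Python lists as embeddings, and a hypothetical function name (`prototypical_loss`) and speaker labels (`spk_a`, `spk_b`, `spk_c`) chosen only for illustration. Each speaker's prototype is the mean of its support embeddings, and each query embedding is scored by a softmax over negative distances to all prototypes.

```python
import math


def prototypical_loss(support, query):
    """Return (mean loss, accuracy) for one training episode.

    support, query: dicts mapping a speaker label to a list of
    embedding vectors (lists of floats of equal length).
    """
    labels = sorted(support)
    # Prototype = element-wise mean of a speaker's support embeddings.
    prototypes = {
        s: [sum(col) / len(support[s]) for col in zip(*support[s])]
        for s in labels
    }
    total_loss, correct, count = 0.0, 0, 0
    for true_label, vectors in query.items():
        for x in vectors:
            # Logit = negative squared Euclidean distance to each prototype
            # (the distance choice is an assumption of this sketch).
            logits = {
                s: -sum((a - b) ** 2 for a, b in zip(x, prototypes[s]))
                for s in labels
            }
            # Numerically stable log-sum-exp for the softmax normaliser.
            top = max(logits.values())
            log_norm = top + math.log(
                sum(math.exp(v - top) for v in logits.values())
            )
            # Cross-entropy of the true speaker under the softmax.
            total_loss += log_norm - logits[true_label]
            correct += max(logits, key=logits.get) == true_label
            count += 1
    return total_loss / count, correct / count


# Toy episode with three hypothetical speakers and 2-D embeddings.
support = {
    "spk_a": [[5.0, 0.1], [5.2, -0.1]],
    "spk_b": [[0.0, 5.0], [0.2, 5.1]],
    "spk_c": [[-5.0, -5.0], [-5.1, -4.9]],
}
query = {
    "spk_a": [[4.9, 0.0]],
    "spk_b": [[0.1, 4.8]],
    "spk_c": [[-4.8, -5.2]],
}
loss, accuracy = prototypical_loss(support, query)
```

Because classification at training time is done by distance to class prototypes rather than by a fixed output layer, the same procedure can be applied to speakers unseen during training, which is the sense in which training matches the open-set test scenario.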
Pages: 5
Related papers
50 in total
  • [41] FREQUENCY AND TEMPORAL CONVOLUTIONAL ATTENTION FOR TEXT-INDEPENDENT SPEAKER RECOGNITION
    Yadav, Sarthak
    Rai, Atul
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6794 - 6798
  • [42] MPCSAN: multi-head parallel channel-spatial attention network for facial expression recognition in the wild
    Gong, Weijun
    Qian, Yurong
    Fan, Yingying
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (09): : 6529 - 6543
  • [44] Improved Gaussian Mixture Model and Application in Speaker Recognition
    Bao Lingling
    Shen Xizhong
    PROCEEDINGS OF 2016 THE 2ND INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND ROBOTICS, 2016, : 387 - 390
  • [45] Android Voice Recognition Application with Multi Speaker Feature
    Frewat, George
    Baroud, Charbel
    Sammour, Roy
    Kassem, Abdallah
    Hamad, Mustapha
    PROCEEDINGS OF THE 18TH MEDITERRANEAN ELECTROTECHNICAL CONFERENCE MELECON 2016, 2016,
  • [46] Application of an annular/sphere search algorithm for speaker recognition
    de León, GM
    Vargas, JIDL
    Domínguez, EG
    15TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMMUNICATIONS AND COMPUTERS, PROCEEDINGS, 2005, : 190 - 194
  • [47] CHANNEL ADVERSARIAL TRAINING FOR CROSS-CHANNEL TEXT-INDEPENDENT SPEAKER RECOGNITION
    Fang, Xin
    Zou, Liang
    Li, Jin
    Sun, Lei
    Ling, Zhen-Hua
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6221 - 6225
  • [48] AN APPLICATION OF SPEAKER RECOGNITION USING ARTIFICIAL NEURAL NETWORKS
    Caner, Murat
    Ustun, Seydi Vakkas
    PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, 2006, 12 (02): : 279 - 284
  • [49] A general framework of feature extraction: Application to speaker recognition
    Liu, CS
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 669 - 672
  • [50] Experiments on the MFCC application in Speaker Recognition using Matlab
    Bhattarai, Kritagya
    Prasad, P. W. C.
    Alsadoon, Abeer
    Pham, L.
    Elchouemi, A.
    2017 SEVENTH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND TECHNOLOGY (ICIST2017), 2017, : 32 - 37