Self Attentive Context dependent Speaker Embedding for Speaker Verification

Cited by: 1
Authors
Sankala, Sreekanth [1]
Rafi, B. Shaik Mohammad [2]
Kodukula, Sri Rama Murty [2]
Affiliations
[1] APIIIT RGUKT RK Valley, Dept ECE, Cuddapah, India
[2] Indian Inst Technol Hyderabad, Speech Informat Proc Lab, Hyderabad, India
Keywords
speaker recognition; multi-head self-attention; time-delay neural networks; x-vector; phonetic vector
DOI
10.1109/ncc48643.2020.9056043
Chinese Library Classification number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract
In the recent past, deep neural networks have become the most successful approach to extracting speaker embeddings. Among existing methods, the x-vector system, which extracts a fixed-dimensional representation from a variable-length speech signal, has been the most successful. Its performance was later improved by explicitly modeling phonological variations, yielding the c-vector. Although the c-vector framework uses phonological variations in the speaker embedding extraction process, its stats pooling layer gives equal attention to all frames. Motivated by subjective analyses of the importance of nasals, vowels, and semivowels for speaker recognition, we extend the c-vector system with a multi-head self-attention mechanism. We also analyze the attention weights learned by the network on TIMIT data and compare them with the earlier subjective analysis of the importance of different phonetic units for speaker recognition. To examine the effectiveness of the proposed approach, we evaluate it on the NIST SRE10 database and obtain a relative improvement of 18.19% over the c-vector system in the short-duration condition.
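The contrast the abstract draws between stats pooling and multi-head self-attention can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common structured self-attention form, in which each head computes frame weights as a softmax over `v_h . tanh(W f_t)` and contributes one weighted mean. The names `W`, `v`, `stats_pooling`, `attention_weights`, and `multi_head_attentive_pooling`, and all dimensions, are hypothetical, and the parameters are random rather than trained.

```python
import math
import random


def stats_pooling(frames):
    """Mean and standard deviation over time; every frame counts equally."""
    T, D = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / T for d in range(D)]
    std = [math.sqrt(sum((f[d] - mean[d]) ** 2 for f in frames) / T)
           for d in range(D)]
    return mean + std


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(frames, W, v):
    """One weight vector over frames per head (assumed form, see lead-in)."""
    hidden = [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in W]
              for f in frames]
    return [softmax([sum(a * h for a, h in zip(head, h_t)) for h_t in hidden])
            for head in v]


def multi_head_attentive_pooling(frames, W, v):
    """Concatenate one attention-weighted mean per head."""
    D = len(frames[0])
    pooled = []
    for alpha in attention_weights(frames, W, v):
        pooled += [sum(a * f[d] for a, f in zip(alpha, frames))
                   for d in range(D)]
    return pooled


# Toy usage with random (untrained) parameters; sizes are illustrative only.
random.seed(0)
T, D, H, K = 6, 4, 8, 2  # frames, feature dim, hidden units, heads
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
W = [[random.gauss(0, 1) for _ in range(D)] for _ in range(H)]
v = [[random.gauss(0, 1) for _ in range(H)] for _ in range(K)]
embedding = multi_head_attentive_pooling(frames, W, v)
print(len(stats_pooling(frames)), len(embedding))
```

When every entry of `v` is zero, the weights are uniform and each head reduces to the plain mean used by stats pooling, so in this sketch equal attention is a special case of attentive pooling.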
Pages: 5
Related papers
50 in total
  • [41] DEEP SPEAKER EMBEDDING LEARNING WITH MULTI-LEVEL POOLING FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Tang, Yun
    Ding, Guohong
    Huang, Jing
    He, Xiaodong
    Zhou, Bowen
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 6116 - 6120
  • [42] MULTI-USER VOICEFILTER-LITE VIA ATTENTIVE SPEAKER EMBEDDING
    Rikhye, Rajeev
    Wang, Quan
    Liang, Qiao
    He, Yanzhang
    McGraw, Ian
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 275 - 282
  • [43] Self-Attentive Similarity Measurement Strategies in Speaker Diarization
    Lin, Qingjian
    Hou, Yu
    Li, Ming
    INTERSPEECH 2020, 2020, : 284 - 288
  • [44] Domain Adaptation for Text Dependent Speaker Verification
    Aronowitz, Hagai
    Rendel, Asaf
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 1337 - 1341
  • [45] Neural Embedding Extractors for Text-Independent Speaker Verification
    Alam, Jahangir
    Kang, Woohyun
    Fathan, Abderrahim
    SPEECH AND COMPUTER, SPECOM 2022, 2022, 13721 : 10 - 23
  • [46] Dual Path Embedding Learning for Speaker Verification with Triplet Attention
    Liu, Bei
    Chen, Zhengyang
    Qian, Yanmin
    INTERSPEECH 2022, 2022, : 291 - 295
  • [47] Speaker-dependent Dictionary-based Speech Enhancement for Text-Dependent Speaker Verification
    Thomsen, Nicolai Baek
    Thomsen, Dennis Alexander Lehmann
    Tan, Zheng-Hua
    Lindberg, Borge
    Jensen, Soren Holdt
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1839 - 1843
  • [48] Speaker verification using speaker- and test-dependent fast score normalization
    Ramos-Castro, Daniel
    Fierrez-Aguilar, Julian
    Gonzalez-Rodriguez, Joaquin
    Ortega-Garcia, Javier
    PATTERN RECOGNITION LETTERS, 2007, 28 (01) : 90 - 98
  • [49] Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification
    Liang, Chunyan
    Yang, Lin
    Zhao, Qingwei
    Yan, Yonghong
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (10) : 2572 - 2576
  • [50] Weighting scores to improve speaker-dependent threshold estimation in text-dependent speaker verification
    Saeta, JR
    Hernando, J
    NONLINEAR ANALYSES AND ALGORITHMS FOR SPEECH PROCESSING, 2005, 3817 : 81 - 91