Self Attentive Context dependent Speaker Embedding for Speaker Verification

Cited by: 1
Authors
Sankala, Sreekanth [1]
Rafi, B. Shaik Mohammad [2]
Kodukula, Sri Rama Murty [2]
Affiliations
[1] APIIIT RGUKT RK Valley, Dept ECE, Cuddapah, India
[2] Indian Inst Technol Hyderabad, Speech Informat Proc Lab, Hyderabad, India
Keywords
speaker recognition; multi-head self-attention; time-delay neural networks; x-vector; phonetic vector
DOI
10.1109/ncc48643.2020.9056043
Chinese Library Classification (CLC)
TN [Electronic technology, communication technology]
Subject classification code
0809
Abstract
In recent years, deep neural networks have become the most successful approach for extracting speaker embeddings. Among existing methods, the x-vector system, which extracts a fixed-dimensional representation from a variable-length speech signal, has been the most successful. Its performance was later improved by explicitly modeling phonological variations, resulting in the c-vector system. Although the c-vector framework uses phonological variations in the speaker embedding extraction process, its statistics pooling layer gives equal attention to all frames. Motivated by subjective analyses of the importance of nasals, vowels, and semivowels for speaker recognition, we extend the c-vector system with a multi-head self-attention mechanism. To compare with earlier subjective analyses of the importance of different phonetic units for speaker recognition, we also analyze the attention weights learned by the network on TIMIT data. To examine the effectiveness of the proposed approach, we evaluate it on the NIST SRE10 database and obtain a relative improvement of 18.19% over the c-vector system in the short-duration condition.
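The abstract contrasts the statistics pooling layer of the x-vector and c-vector systems, which weights every frame equally, with multi-head self-attention. The NumPy sketch below illustrates that difference under stated assumptions: the attention form (a tanh projection followed by one softmax over time per head, in the style of structured self-attention) and the names `multihead_attentive_pooling`, `w1` and `w2` are hypothetical and not taken from the paper; the weights are random rather than trained, the sizes are illustrative, and pooling a weighted standard deviation alongside the weighted mean is likewise an assumption.

```python
import numpy as np


def stats_pooling(h):
    """x-vector-style statistics pooling: every frame gets equal weight.

    h: (T, D) frame-level features. Returns a (2*D,) vector [mean, std].
    """
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])  # population std (ddof=0)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_attentive_pooling(h, w1, w2, eps=1e-8):
    """Hypothetical multi-head self-attentive pooling (structured self-attention form).

    h:  (T, D) frame-level features
    w1: (D_a, D) shared projection (hypothetical name)
    w2: (K, D_a) one scoring vector per head (hypothetical name)
    Returns a (K * 2 * D,) vector: attention-weighted mean and std for each head.
    """
    scores = w2 @ np.tanh(w1 @ h.T)           # (K, T): one score per head per frame
    a = softmax(scores, axis=1)               # each head's weights sum to 1 over time
    mu = a @ h                                # (K, D) weighted means
    var = a @ (h ** 2) - mu ** 2              # (K, D) weighted variances
    sigma = np.sqrt(np.clip(var, eps, None))  # clip guards against tiny negative values
    return np.concatenate([mu, sigma], axis=1).reshape(-1)


# Illustrative sizes and random (untrained) weights, not the paper's configuration.
rng = np.random.default_rng(0)
T, D, D_a, K = 200, 512, 128, 4
h = rng.standard_normal((T, D))
w1 = rng.standard_normal((D_a, D)) * 0.01
w2 = rng.standard_normal((K, D_a))

print(stats_pooling(h).shape)                        # (1024,)
print(multihead_attentive_pooling(h, w1, w2).shape)  # (4096,)

# With all-zero scores, each head's weights are uniform (1/T) and a single head
# reproduces plain statistics pooling.
uniform = multihead_attentive_pooling(h, w1, np.zeros((1, D_a)))
print(np.allclose(uniform, stats_pooling(h)))        # True
```

The last check shows why attention can be read as a generalization of statistics pooling: when every frame receives the same weight 1/T, the weighted mean and standard deviation collapse to the ordinary ones, while learned, non-uniform weights let each head emphasize different frames.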
Pages: 5
Related papers (50 in total)
  • [31] GRAPH ATTENTIVE FEATURE AGGREGATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION
    Shim, Hye-Jin
    Heo, Jungwoo
    Park, Jae-Han
    Lee, Ga-Hui
    Yu, Ha-Jin
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, pp. 7972-7976
  • [32] Data Augmentation Enhanced Speaker Enrollment for Text-dependent Speaker Verification
    Sarkar, Achintya Kumar
    Sarma, Himangshu
    Dwivedi, Priyanka
    Tan, Zheng-Hua
    2020 3RD INTERNATIONAL CONFERENCE ON ENERGY, POWER AND ENVIRONMENT: TOWARDS CLEAN ENERGY TECHNOLOGIES (ICEPE 2020), 2021
  • [33] On Deep Speaker Embeddings for Speaker Verification
    Jakubec, Maros
    Jarina, Roman
    Lieskovska, Eva
    Chmulik, Michal
    2021 44TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2021, pp. 162-166
  • [34] Effective speaker adaptations for speaker verification
    Ahn, S
    Kang, S
    Ko, H
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, pp. 1081-1084
  • [35] Enroll-Aware Attentive Statistics Pooling for Target Speaker Verification
    Zhang, Leying
    Chen, Zhengyang
    Qian, Yanmin
    INTERSPEECH 2022, 2022, pp. 311-315
  • [36] SPEAKER VERIFICATION
    CHAPMAN, WD
    LI, KP
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1966, 40(05), pp. 1282 ff.
  • [37] Speaker verification
    Atkins, Wendy
    Biometric Technology Today, 2001, 9(03), pp. 8-11
  • [38] Implementation of Text Dependent Speaker Verification on MATLAB
    Kaur, Gurpreet
    Kumar, Naresh
    Khanna, Ravinder
    Kumar, Amod
    2015 2ND INTERNATIONAL CONFERENCE ON RECENT ADVANCES IN ENGINEERING & COMPUTATIONAL SCIENCES (RAECS), 2015
  • [39] Text-dependent speaker verification system
    Qin, Bing
    Chen, Huipeng
    Li, Guangqi
    Liu, Songbo
    Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology, 2000, 32(04), pp. 16-18
  • [40] Deep Speaker Embedding with Long Short Term Centroid Learning for Text-independent Speaker Verification
    Peng, Junyi
    Gu, Rongzhi
    Zou, Yuexian
    INTERSPEECH 2020, 2020, pp. 3246-3250