Self Attentive Context dependent Speaker Embedding for Speaker Verification

Cited by: 1
Authors
Sankala, Sreekanth [1 ]
Rafi, B. Shaik Mohammad [2 ]
Kodukula, Sri Rama Murty [2 ]
Affiliations
[1] APIIIT RGUKT RK Valley, Dept ECE, Cuddapah, India
[2] Indian Inst Technol Hyderabad, Speech Informat Proc Lab, Hyderabad, India
Keywords
speaker recognition; multi-head self-attention; time-delay neural networks; x-vector; phonetic vector
DOI
10.1109/ncc48643.2020.9056043
Chinese Library Classification (CLC)
TN [Electronic technology; communication technology]
Discipline Code
0809
Abstract
In the recent past, deep neural networks have become the most successful approach for extracting speaker embeddings. Among the existing methods, the x-vector system, which extracts a fixed-dimensional representation from a variable-length speech signal, has been the most successful. Its performance was later improved by explicitly modeling the phonological variations in the signal, yielding the c-vector system. Although the c-vector framework exploits phonological variations during speaker-embedding extraction, its statistics-pooling layer gives equal attention to all frames. Motivated by subjective analyses of the importance of nasals, vowels, and semivowels for speaker recognition, we extend the c-vector system with a multi-head self-attention mechanism. We also analyze the attention weights learnt by the network on TIMIT data and compare them with those earlier subjective analyses of the importance of different phonetic units for speaker recognition. To examine the effectiveness of the proposed approach, we evaluate the proposed system on the NIST SRE10 database and obtain a relative improvement of 18.19% over the c-vector system in the short-duration case.
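The record does not give the system's internals, but the pooling change the abstract describes can be illustrated. The following is a minimal NumPy sketch of multi-head self-attentive pooling over frame-level features, contrasted with the uniform weighting of stats pooling; the tanh scorer, the weight names `W1`/`W2`, and the head count are assumptions loosely following common attentive-pooling formulations, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attentive_pooling(H, W1, W2):
    """Pool frame-level features H (T x d) into one fixed-length vector.

    Each of the h heads assigns a scalar score to every frame,
    normalises the scores over time with a softmax, and takes a
    weighted mean of the frames; the per-head means are concatenated.
    Unlike stats pooling, frames are weighted unequally.
    W1: (d x da) hidden projection, W2: (da x h) per-head scorer
    (both hypothetical parameter names for this sketch).
    """
    scores = np.tanh(H @ W1) @ W2        # (T, h): one score per frame, per head
    alphas = softmax(scores, axis=0)     # attention weights over time, per head
    pooled = alphas.T @ H                # (h, d): attention-weighted mean per head
    return pooled.reshape(-1), alphas    # concatenated (h*d,) embedding
```

Setting all attention weights to 1/T recovers the plain mean of stats pooling, which is exactly the equal-attention behaviour the abstract argues against.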
Pages: 5
Related Papers (50 in total)
  • [1] Masked cross self-attentive encoding based speaker embedding for speaker verification
    Seo, Soonshin
    Kim, Ji-Hwan
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2020, 39 (05): 497-504
  • [2] Deep Segment Attentive Embedding for Duration Robust Speaker Verification
    Liu, Bin
    Nie, Shuai
    Liu, Wenju
    Zhang, Hui
    Li, Xiangang
    Li, Changliang
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 822-826
  • [3] Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Ko, Tom
    Snyder, David
    Mak, Brian
    Povey, Daniel
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3573-3577
  • [4] Bayesian Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
    Zhu, Yingke
    Mak, Brian
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31: 1000-1012
  • [5] DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION
    Yi, Lu
    Mak, Man-Wai
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 7662-7666
  • [6] CROSS ATTENTIVE POOLING FOR SPEAKER VERIFICATION
    Kye, Seong Min
    Kwon, Yoohwan
    Chung, Joon Son
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 294-300
  • [7] Attentive Deep CNN for Speaker Verification
    Yu, Yong-bin
    Qi, Min-hui
    Tang, Yi-fan
    Deng, Quan-xin
    Peng, Chen-hui
    Mai, Feng
    Nyima, Tashi
    TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719
  • [8] Introducing phonetic information to speaker embedding for speaker verification
    Liu, Yi
    He, Liang
    Liu, Jia
    Johnson, Michael T.
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2019, 2019 (01)
  • [10] Attentive Statistics Pooling for Deep Speaker Embedding
    Okabe, Koji
    Koshinaka, Takafumi
    Shinoda, Koichi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 2252-2256