Self Attentive Context dependent Speaker Embedding for Speaker Verification

Cited by: 1
Authors
Sankala, Sreekanth [1]
Rafi, B. Shaik Mohammad [2]
Kodukula, Sri Rama Murty [2]
Affiliations
[1] APIIIT RGUKT RK Valley, Dept ECE, Cuddapah, India
[2] Indian Inst Technol Hyderabad, Speech Informat Proc Lab, Hyderabad, India
Keywords
speaker recognition; multi-head self-attention; time-delay neural networks; x-vector; phonetic vector
DOI
10.1109/ncc48643.2020.9056043
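Chinese Library Classification (CLC) number
TN [Electronic technology, communication technology]
Discipline classification code
0809
Abstract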
In the recent past, deep neural networks have become the most successful approach for extracting speaker embeddings. Among the existing methods, the x-vector system, which extracts a fixed-dimensional representation from a variable-length speech signal, has been the most successful. The performance of the x-vector system was later improved by explicitly modeling phonological variations in it, yielding the c-vector. Although the c-vector framework uses phonological variations in the speaker embedding extraction process, its statistics pooling layer gives equal attention to all frames. Motivated by the subjective analysis of the importance of nasals, vowels, and semivowels for speaker recognition, we extend the c-vector system by including a multi-head self-attention mechanism. We also analyze the attention weights learned by the network on TIMIT data and compare them with the earlier subjective analysis of the importance of different phonetic units for speaker recognition. To examine the effectiveness of the proposed approach, we evaluate the proposed system on the NIST SRE10 database and obtain a relative improvement of 18.19% over the c-vector system in the short-duration case.
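The contrast the abstract draws between equal-weight statistics pooling and attention-weighted pooling can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the additive (tanh) scoring form, the parameter names `W` and `V`, and the choice to concatenate a weighted mean and standard deviation per head are all assumptions made for illustration, and the abstract does not specify the exact attention formulation used in the paper.

```python
import numpy as np

def stats_pooling(frames):
    """Equal-weight statistics pooling: per-dimension mean and std over frames.

    frames: array of shape (T, D), T frames of D-dimensional features.
    """
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_stats_pooling(frames, W, V):
    """Multi-head attentive pooling (assumed additive scoring form).

    frames: (T, D) frame-level features
    W:      (D, A) hypothetical projection to an attention space
    V:      (A, H) hypothetical per-head scoring vectors, H heads
    Returns the pooled vector of shape (H * 2 * D,) and the weights (T, H).
    """
    scores = np.tanh(frames @ W) @ V      # (T, H): one score per frame and head
    alpha = softmax(scores, axis=0)       # weights sum to 1 over frames, per head
    mean = alpha.T @ frames               # (H, D) weighted mean per head
    second = alpha.T @ (frames ** 2)      # (H, D) weighted second moment
    std = np.sqrt(np.clip(second - mean ** 2, 1e-8, None))
    pooled = np.concatenate([mean, std], axis=1).reshape(-1)
    return pooled, alpha

# Example with random data: 50 frames, 8 feature dimensions, 3 heads.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))
W = rng.normal(size=(8, 4))
V = rng.normal(size=(4, 3))
embedding, weights = attentive_stats_pooling(frames, W, V)
```

When all scores are equal (for example, `V` set to zeros), every frame receives weight 1/T and a single head reduces to ordinary statistics pooling, which is the equal-attention behaviour the abstract attributes to the c-vector system. The returned `weights` array is the kind of per-frame quantity one could inspect against phonetic labels.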
Pages: 5