Self Attentive Context dependent Speaker Embedding for Speaker Verification

Cited by: 1

Authors:

Sankala, Sreekanth [1]

Rafi, B. Shaik Mohammad [2]

Kodukula, Sri Rama Murty [2]

Affiliations:

[1] APIIIT RGUKT RK Valley, Dept ECE, Cuddapah, India

[2] Indian Inst Technol Hyderabad, Speech Informat Proc Lab, Hyderabad, India

Source:

2020 TWENTY SIXTH NATIONAL CONFERENCE ON COMMUNICATIONS (NCC 2020) | 2020

Keywords:

speaker recognition; Multi-head Self attention; time-delay neural networks; x-vector; phonetic vector;

DOI:

10.1109/ncc48643.2020.9056043

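Chinese Library Classification (CLC) number:

TN [Electronic technology, communication technology];

Discipline classification code:

0809;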

Abstract:

In the recent past, deep neural networks have become the most successful approach for extracting speaker embeddings. Among existing methods, the x-vector system, which extracts a fixed-dimensional representation from a variable-length speech signal, has been the most successful. The performance of the x-vector system was later improved by explicitly modeling phonological variations in it, giving the c-vector. Although the c-vector framework uses phonological variations in the speaker embedding extraction process, its stats pooling layer gives equal attention to all frames. Motivated by subjective analyses of the importance of nasals, vowels, and semivowels for speaker recognition, we extend the c-vector system with a multi-head self-attention mechanism. We also analyze the attention weights learnt by the network on TIMIT data and compare them with the earlier subjective analysis of the importance of different phonetic units for speaker recognition. To examine the effectiveness of the proposed approach, we evaluate the proposed system on the NIST SRE10 database and obtain a relative improvement of 18.19% over the c-vector system in the short-duration case.
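The contrast the abstract draws between stats pooling and attention-based pooling can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring scheme (one hypothetical learned vector per head, dotted with each frame and passed through a softmax) and all function names are assumptions chosen for illustration, and the paper's actual attention layer may be structured differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_stats(frames, weights):
    """Weighted mean and standard deviation per feature dimension.

    `weights` must be non-negative and sum to 1 across frames.
    Returns the mean vector followed by the std vector.
    """
    dim = len(frames[0])
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(max(v, 0.0)) for v in var]
    return mean + std


def stats_pooling(frames):
    """Plain stats pooling: every frame gets the same weight 1/T."""
    n = len(frames)
    return weighted_stats(frames, [1.0 / n] * n)


def multi_head_attentive_pooling(frames, head_vectors):
    """Attention-weighted stats pooling with one weight set per head.

    `head_vectors` stands in for learned parameters (hypothetical here):
    each head scores a frame by a dot product, and a softmax over time
    turns the scores into frame weights. Head outputs are concatenated.
    """
    pooled = []
    for w in head_vectors:
        scores = [sum(wi * xi for wi, xi in zip(w, f)) for f in frames]
        alphas = softmax(scores)
        pooled.extend(weighted_stats(frames, alphas))
    return pooled


# Three frames of 2-dimensional features (toy values, not real data).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Head 0 scores every frame equally; head 1 favours frames with a large first feature.
heads = [[0.0, 0.0], [2.0, -1.0]]
embedding = multi_head_attentive_pooling(frames, heads)
```

With an all-zero scoring vector a head assigns uniform weights, so its output coincides with plain stats pooling; a non-zero vector lets the head emphasize some frames over others, which is the behaviour the abstract motivates for nasals, vowels, and semivowels.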


Pages: 5
