Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask

Cited by: 7

Authors:

Liu, Tianchi [1,2]

Das, Rohan Kumar [3]

Lee, Kong Aik [1]

Li, Haizhou [2]

Affiliations:

[1] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore

[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore

[3] Fortemedia, Singapore 138589, Singapore

Source:

IEEE Signal Processing Letters | 2022 / Vol. 29

Funding:

National Research Foundation, Singapore

Keywords:

Phonetics; Training; Mel frequency cepstral coefficient; Generators; Speech recognition; Task analysis; Databases; Speaker verification; text-dependent; attention; masking; phonetic information; prompted digit recognition; RECOGNITION;

DOI:

10.1109/LSP.2022.3143036

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

The traditional acoustic-phonetic approach uses both spectral and phonetic information when comparing speakers' voices. Although phonetic units are not equally informative, the phonetic context of speech plays an important role in speaker verification (SV). In this paper, we propose a neural acoustic-phonetic approach that learns to dynamically assign differentiated weights to spectral features for SV. These differentiated weights form a phonetic attention mask (PAM). The neural acoustic-phonetic framework consists of two training pipelines, one for SV and another for speech recognition. Through the PAM, we leverage phonetic information for SV. We evaluate the proposed framework on Part III of the RSR2015 database, which consists of random digit strings. We show that the proposed framework with PAM consistently outperforms the baseline, with equal error rate reductions of 13.45% and 10.20% for female and male data, respectively.
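The abstract does not specify how the PAM is computed, so the following is only a minimal sketch of the general idea of weighting spectral features with a mask derived from phonetic information. Everything in it is an assumption made for illustration: the function name `apply_phonetic_attention_mask`, the single linear projection followed by a sigmoid, the array shapes, and the random inputs standing in for real spectral features and frame-level phonetic representations. It should not be read as the authors' architecture.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def apply_phonetic_attention_mask(features, phonetic_embeddings, weight, bias):
    """Weight spectral features with a mask derived from phonetic information.

    Hypothetical illustration only, not the method from the paper.

    features:            (T, F) spectral features, e.g. MFCC frames
    phonetic_embeddings: (T, D) frame-level phonetic representations
    weight:              (D, F) projection from phonetic space to feature space
    bias:                (F,)   projection bias
    """
    # One weight in (0, 1) per time frame and feature dimension.
    mask = sigmoid(phonetic_embeddings @ weight + bias)
    # Element-wise weighting of the spectral features by the mask.
    return features * mask, mask


# Random placeholders standing in for real inputs (assumed shapes).
rng = np.random.default_rng(0)
T, F, D = 100, 40, 16
features = rng.standard_normal((T, F))
phonetic_embeddings = rng.standard_normal((T, D))
weight = rng.standard_normal((D, F)) * 0.1
bias = np.zeros(F)

masked, mask = apply_phonetic_attention_mask(
    features, phonetic_embeddings, weight, bias
)
```

In a trained system the projection would be learned jointly with the rest of the network rather than drawn at random; here it only demonstrates that the mask has the same shape as the features and scales each entry independently.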


Pages: 782-786

Number of pages: 5
