Neural Acoustic-Phonetic Approach for Speaker Verification With Phonetic Attention Mask

Cited by: 7
Authors
Liu, Tianchi [1, 2]
Das, Rohan Kumar [3]
Lee, Kong Aik [1]
Li, Haizhou [2]
Affiliations
[1] ASTAR, Inst Infocomm Res, Singapore 138632, Singapore
[2] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
[3] Fortemedia, Singapore 138589, Singapore
Funding
National Research Foundation, Singapore
Keywords
Phonetics; Training; Mel frequency cepstral coefficient; Generators; Speech recognition; Task analysis; Databases; Speaker verification; text-dependent; attention; masking; phonetic information; prompted digit recognition; RECOGNITION;
DOI
10.1109/LSP.2022.3143036
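Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract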
The traditional acoustic-phonetic approach makes use of both spectral and phonetic information when comparing the voices of speakers. While phonetic units are not equally informative, the phonetic context of speech plays an important role in speaker verification (SV). In this paper, we propose a neural acoustic-phonetic approach that learns to dynamically assign differentiated weights to spectral features for SV. These differentiated weights form a phonetic attention mask (PAM). The neural acoustic-phonetic framework consists of two training pipelines, one for SV and another for speech recognition. Through the PAM, we leverage the phonetic information for SV. We evaluate the proposed neural acoustic-phonetic framework on Part III of the RSR2015 database, which consists of random digit strings. We show that the proposed framework with PAM consistently outperforms the baseline, with equal error rate reductions of 13.45% and 10.20% for female and male data, respectively.
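The idea of a mask that assigns differentiated weights to spectral features can be illustrated with a minimal sketch. The abstract does not specify the mask generator, so everything below is an assumption made for illustration: the function names, the array shapes, and the choice of a single linear projection followed by a sigmoid are hypothetical and are not the authors' architecture.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def phonetic_attention_mask(phonetic_embeddings, weight, bias):
    """Hypothetical mask generator (an assumption, not the paper's design).

    phonetic_embeddings: (T, P) per-frame phonetic representation
    weight:              (P, F) projection to the spectral feature dimension
    bias:                (F,)
    Returns a (T, F) mask with one weight per frame and feature bin.
    """
    return sigmoid(phonetic_embeddings @ weight + bias)


def apply_mask(spectral_features, mask):
    """Weight each spectral feature element by its mask value."""
    return spectral_features * mask


# Toy data: T frames, F spectral coefficients, P phonetic dimensions.
rng = np.random.default_rng(0)
T, F, P = 200, 40, 16
spectral = rng.standard_normal((T, F))    # stand-in for MFCC-like features
phonetic = rng.standard_normal((T, P))    # stand-in for phonetic embeddings
W = rng.standard_normal((P, F)) * 0.1     # untrained, random projection
b = np.zeros(F)

mask = phonetic_attention_mask(phonetic, W, b)
masked = apply_mask(spectral, mask)
```

In a trained system the projection would be learned jointly with the SV objective, and the masked features would feed the speaker embedding network; here the weights are random and serve only to show the shapes involved.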
Pages: 782-786
Number of pages: 5