Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data

Authors
Do, Cong-Thanh [1]
Barras, Claude [1]
Le, Viet-Bac [2]
Sarkar, Achintya K. [1]
Affiliations
[1] Univ Paris Sud, CNRS, LIMSI, F-91403 Orsay, France
[2] Parc Orsay Univ, Vocapia Res, F-91400 Orsay, France
Keywords
Speaker verification; multi-layer perceptron (MLP); principal component analysis (PCA); NIST SRE 2008; GMM-UBM; RECOGNITION
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Short-term cepstral features have long been the standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, computed by a multi-layer perceptron (MLP) from much longer stretches of time, have gradually been adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with long-term MLP features significantly improves ASR performance. In this work, we investigate whether augmenting short-term cepstral features with MLP features can improve the performance of text-independent speaker verification. We show that, even though augmenting cepstral features with MLP features does not directly improve speaker verification performance, reducing the dimension of the augmented features with principal component analysis (PCA) yields a relative reduction of around 12% in the equal error rate (EER). Experiments are performed on telephone data from the 2008 NIST SRE (speaker recognition evaluation) database.
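The augment-then-reduce step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `fit_pca` and `apply_pca`, the feature dimensions (39 cepstral, 30 MLP), the target dimension (40), and the random stand-in data are all assumptions, since the record does not state them.

```python
import numpy as np


def fit_pca(features, n_components):
    """Estimate a PCA projection from a (frames x dims) feature matrix."""
    mean = features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(features, mean, components):
    """Project features onto previously estimated principal directions."""
    return (features - mean) @ components.T


# Random stand-in data; all dimensions below are hypothetical.
rng = np.random.default_rng(0)
cepstral = rng.normal(size=(200, 39))  # short-term cepstral features per frame
mlp = rng.normal(size=(200, 30))       # long-term MLP features per frame

# Augmentation: frame-by-frame concatenation of the two feature streams.
augmented = np.hstack([cepstral, mlp])

# Dimension reduction of the augmented features with PCA.
mean, components = fit_pca(augmented, n_components=40)
reduced = apply_pca(augmented, mean, components)
```

In a verification system the projection would presumably be estimated once on training data and then applied unchanged to enrollment and test features, which is why fitting and applying are kept as separate functions here.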
Pages: 2483-2487
Page count: 5