Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data

Cited by: 0

Authors:

Do, Cong-Thanh [1]

Barras, Claude [1]

Le, Viet-Bac [2]

Sarkar, Achintya K. [1]

Affiliations:

[1] Univ Paris Sud, CNRS, LIMSI, F-91403 Orsay, France

[2] Parc Orsay Univ, Vocapia Res, F-91400 Orsay, France

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

Speaker verification; multi-layer perceptron (MLP); principal component analysis (PCA); NIST SRE 2008; GMM-UBM; RECOGNITION

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Short-term cepstral features have long been the standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, computed by a multi-layer perceptron (MLP) from much longer stretches of time, have gradually been adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with long-term MLP features significantly improves ASR performance. In this work, we investigate whether augmenting short-term cepstral features with MLP features can improve the performance of text-independent speaker verification. We show that, even though augmenting cepstral features with MLP features does not directly improve speaker verification performance, reducing the dimension of the augmented features using principal component analysis (PCA) yields a relative reduction of around 12% in the equal error rate (EER). Experiments are performed on telephone data of the 2008 NIST SRE (speaker recognition evaluation) database.
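
The feature pipeline described in the abstract (frame-wise concatenation of cepstral and MLP features, followed by PCA) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature dimensions (39 and 30), and the random data are assumptions chosen only for illustration, and PCA is computed here through a plain SVD in NumPy. In a real system the PCA projection would presumably be estimated on training data and then applied unchanged to test data.

```python
import numpy as np


def augment_and_reduce(cepstral, mlp, n_components):
    """Concatenate two per-frame feature streams, then keep the top PCA directions.

    cepstral, mlp: arrays of shape (n_frames, dim); names are hypothetical.
    Returns the reduced features plus the mean and projection used.
    """
    if cepstral.shape[0] != mlp.shape[0]:
        raise ValueError("both feature streams need one row per frame")
    augmented = np.hstack([cepstral, mlp])      # (n_frames, dim_c + dim_m)
    mean = augmented.mean(axis=0)
    centered = augmented - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    reduced = centered @ components.T           # (n_frames, n_components)
    return reduced, mean, components


def relative_eer_reduction(baseline_eer, new_eer):
    """Relative reduction of the equal error rate, as a fraction of the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Illustrative data only: 200 frames, assumed dimensions 39 (cepstral) and 30 (MLP).
rng = np.random.default_rng(0)
cepstral = rng.normal(size=(200, 39))
mlp = rng.normal(size=(200, 30))
reduced, mean, components = augment_and_reduce(cepstral, mlp, n_components=39)
```

A relative reduction of around 12% means that, for a hypothetical baseline EER of 10.0%, the new EER would be about 8.8%; `relative_eer_reduction(10.0, 8.8)` expresses that ratio.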


Pages: 2483-2487

Page count: 5
