Eliminating inter-speaker variability prior to discriminant transforms

Cited by: 0

Authors:

Saon, G [1]

Padmanabhan, M [1]

Gopinath, R [1]

Affiliations:

[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

ASRU 2001: IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, CONFERENCE PROCEEDINGS | 2001

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper shows the impact of applying speaker normalization techniques, such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT), prior to discriminant feature-space transforms such as LDA. We demonstrate that removing inter-speaker variability with speaker compensation methods results in improved discrimination, as measured by the LDA eigenvalues, and in improved classification accuracy, as measured by the word error rate. Experimental results on the SPINE (speech in noisy environments) database indicate an improvement of up to 5% relative over the standard case, in which speaker adaptation (during testing and training) is applied after an LDA transform that is trained in a speaker-independent manner. We conjecture that performing linear discriminant analysis in a canonical (speaker-normalized) feature space is more effective than LDA in a speaker-independent space because the eigenvectors will carve a subspace of maximum intra-speaker phonetic separability, whereas in the speaker-independent case this subspace is also defined by inter-speaker variability. Indeed, we show that the more normalization is performed (first VTLN, then SAT), the higher the LDA eigenvalues become.
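
The conjecture in the abstract can be illustrated with a toy sketch. This is not the authors' experiment: the speakers, phone labels, and feature values below are hypothetical, and per-speaker mean subtraction is used as a crude stand-in for VTLN or SAT. For a single scalar feature, the only LDA eigenvalue is the ratio of between-class to within-class scatter, so the effect of speaker normalization can be read off directly.

```python
from statistics import fmean

# Hypothetical toy data: one scalar feature per frame, built as
# phone mean + speaker offset + small residual.
SPEAKER_OFFSET = {"spk_a": -2.0, "spk_b": 0.0, "spk_c": 2.0}
PHONE_MEAN = {"aa": -1.0, "iy": 1.0}
RESIDUALS = [-0.5, 0.5]


def make_frames():
    """Return a list of (speaker, phone, feature) tuples."""
    return [
        (spk, ph, PHONE_MEAN[ph] + SPEAKER_OFFSET[spk] + r)
        for spk in SPEAKER_OFFSET
        for ph in PHONE_MEAN
        for r in RESIDUALS
    ]


def remove_speaker_mean(frames):
    """Subtract each speaker's mean feature value.

    Assumption: a simple stand-in for speaker normalization,
    not an implementation of VTLN or SAT.
    """
    by_speaker = {}
    for spk, _, x in frames:
        by_speaker.setdefault(spk, []).append(x)
    spk_mean = {spk: fmean(xs) for spk, xs in by_speaker.items()}
    return [(spk, ph, x - spk_mean[spk]) for spk, ph, x in frames]


def lda_eigenvalue_1d(frames):
    """Between-class over within-class scatter, with phones as classes.

    For a 1-D feature this ratio is the single eigenvalue of S_w^{-1} S_b.
    """
    by_phone = {}
    for _, ph, x in frames:
        by_phone.setdefault(ph, []).append(x)
    grand_mean = fmean([x for _, _, x in frames])
    s_b = sum(len(xs) * (fmean(xs) - grand_mean) ** 2 for xs in by_phone.values())
    s_w = sum((x - fmean(xs)) ** 2 for xs in by_phone.values() for x in xs)
    return s_b / s_w


frames = make_frames()
raw_ratio = lda_eigenvalue_1d(frames)
norm_ratio = lda_eigenvalue_1d(remove_speaker_mean(frames))
print(f"speaker-independent space: {raw_ratio:.3f}")
print(f"speaker-normalized space:  {norm_ratio:.3f}")
```

In this toy setup the speaker offsets inflate the within-class scatter while leaving the between-class scatter unchanged, so removing them raises the eigenvalue. Real features are multidimensional, and the corresponding quantities are the eigenvalues of the matrix product of the inverse within-class scatter and the between-class scatter.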

Pages: 73-76

Page count: 4
