Eliminating inter-speaker variability prior to discriminant transforms

Cited by: 0
Authors
Saon, G [1]
Padmanabhan, M [1]
Gopinath, R [1]
Affiliations
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
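Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes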
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper shows the impact of applying speaker normalization techniques, such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT), prior to discriminant feature-space transforms such as linear discriminant analysis (LDA). We demonstrate that removing the inter-speaker variability with speaker compensation methods improves discrimination, as measured by the LDA eigenvalues, and also improves classification accuracy, as measured by the word error rate. Experimental results on the SPINE (speech in noisy environments) database indicate an improvement of up to 5% relative over the standard case, in which speaker adaptation (during testing and training) is applied after an LDA transform that is trained in a speaker-independent manner. We conjecture that performing LDA in a canonical (speaker-normalized) feature space is more effective than performing it in a speaker-independent space: in the canonical space the eigenvectors carve out a subspace of maximum intra-speaker phonetic separability, whereas in the speaker-independent space this subspace is also shaped by the inter-speaker variability. Indeed, we show that the more normalization is performed (first VTLN, then SAT), the higher the LDA eigenvalues become.
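The link the abstract draws between speaker normalization and larger LDA eigenvalues can be illustrated with a small synthetic sketch. This is not the authors' experiment: the data are random, the names (`lda_eigenvalues`, `class_means`, `speaker_offsets`) are hypothetical, and per-speaker mean subtraction is used only as a simple stand-in for VTLN or SAT. The eigenvalues are taken here to be those of W⁻¹B, where W is the within-class scatter and B the between-class scatter, which is one common definition and an assumption about what the record means by "LDA eigenvalues".

```python
import numpy as np


def lda_eigenvalues(features, labels):
    """Return the eigenvalues of W^{-1} B, sorted in descending order.

    W is the within-class scatter matrix and B the between-class scatter
    matrix; larger eigenvalues mean better class separation.
    """
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    within = np.zeros((dim, dim))
    between = np.zeros((dim, dim))
    for c in np.unique(labels):
        class_frames = features[labels == c]
        class_mean = class_frames.mean(axis=0)
        centered = class_frames - class_mean
        within += centered.T @ centered
        diff = (class_mean - global_mean)[:, None]
        between += len(class_frames) * (diff @ diff.T)
    eigs = np.linalg.eigvals(np.linalg.solve(within, between))
    return np.sort(eigs.real)[::-1]


# Synthetic data (assumption): each frame is a class mean plus a
# speaker-specific offset plus Gaussian noise.
rng = np.random.default_rng(0)
n_speakers, n_classes, n_frames, dim = 10, 3, 50, 4
class_means = rng.normal(0.0, 1.0, size=(n_classes, dim))
speaker_offsets = rng.normal(0.0, 2.0, size=(n_speakers, dim))

frames, labels, speakers = [], [], []
for s in range(n_speakers):
    for c in range(n_classes):
        noise = rng.normal(0.0, 1.0, size=(n_frames, dim))
        frames.append(class_means[c] + speaker_offsets[s] + noise)
        labels.append(np.full(n_frames, c))
        speakers.append(np.full(n_frames, s))
X = np.vstack(frames)
y = np.concatenate(labels)
spk = np.concatenate(speakers)

# Stand-in for speaker normalization (NOT VTLN or SAT): subtract each
# speaker's mean feature vector from that speaker's frames.
X_norm = X.copy()
for s in range(n_speakers):
    X_norm[spk == s] -= X[spk == s].mean(axis=0)

raw_eigs = lda_eigenvalues(X, y)
norm_eigs = lda_eigenvalues(X_norm, y)
print("speaker-independent space:", raw_eigs)
print("speaker-normalized space: ", norm_eigs)
```

In this toy setup the speaker offsets inflate the within-class scatter without changing the between-class scatter, so removing them raises the leading eigenvalues. Real VTLN and SAT act on the features in more structured ways than a mean shift.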
Pages: 73-76
Page count: 4