Eliminating inter-speaker variability prior to discriminant transforms

Authors
Saon, G [1]
Padmanabhan, M [1]
Gopinath, R [1]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper examines the impact of applying speaker normalization techniques, such as vocal tract length normalization (VTLN) and speaker-adaptive training (SAT), prior to discriminant feature-space transforms such as linear discriminant analysis (LDA). We demonstrate that removing inter-speaker variability with speaker compensation methods improves discrimination, as measured by the LDA eigenvalues, and improves classification accuracy, as measured by the word error rate. Experimental results on the SPINE (speech in noisy environments) database indicate an improvement of up to 5% relative over the standard case, in which speaker adaptation (during both training and testing) is applied after an LDA transform trained in a speaker-independent manner. We conjecture that performing LDA in a canonical (speaker-normalized) feature space is more effective than performing it in a speaker-independent space: in the former case the eigenvectors carve out a subspace of maximum intra-speaker phonetic separability, whereas in the latter case this subspace is also shaped by the inter-speaker variability. Indeed, we show that the more normalization is performed (first VTLN, then SAT), the higher the LDA eigenvalues become.
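The link between speaker normalization and larger LDA eigenvalues can be illustrated with a small synthetic sketch. It is not the authors' experiment: the data are random Gaussians, the helper names (`lda_eigenvalues`, `make_data`, `remove_speaker_mean`) are hypothetical, and per-speaker mean subtraction is used only as a crude stand-in for VTLN or SAT. The eigenvalues are those of W^{-1}B, where W and B are the within-class and between-class scatter matrices.

```python
import numpy as np

def lda_eigenvalues(X, y):
    """Eigenvalues of W^{-1} B, sorted in descending order."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    W = np.zeros((d, d))  # within-class scatter
    B = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        B += len(Xc) * (diff @ diff.T)
    evals = np.linalg.eigvals(np.linalg.solve(W, B))
    return np.sort(evals.real)[::-1]

def make_data(rng, n_speakers=20, n_classes=5, n_per=40, dim=4):
    """Synthetic frames: class mean + speaker offset + noise (assumed model)."""
    class_means = rng.normal(0.0, 1.0, (n_classes, dim))
    X, y, s = [], [], []
    for spk in range(n_speakers):
        offset = rng.normal(0.0, 2.0, dim)  # inter-speaker variability
        for c in range(n_classes):
            X.append(class_means[c] + offset + rng.normal(0.0, 0.5, (n_per, dim)))
            y.append(np.full(n_per, c))
            s.append(np.full(n_per, spk))
    return np.vstack(X), np.concatenate(y), np.concatenate(s)

def remove_speaker_mean(X, s):
    """Crude speaker compensation: subtract each speaker's mean vector."""
    Xn = X.copy()
    for spk in np.unique(s):
        Xn[s == spk] -= X[s == spk].mean(axis=0)
    return Xn

rng = np.random.default_rng(0)
X, y, s = make_data(rng)
ev_raw = lda_eigenvalues(X, y)                          # speaker-independent space
ev_norm = lda_eigenvalues(remove_speaker_mean(X, s), y)  # speaker-normalized space
print("raw:       ", ev_raw)
print("normalized:", ev_norm)
```

In this balanced toy setup, subtracting speaker means leaves B unchanged and removes the between-speaker component from W, so every eigenvalue of W^{-1}B can only grow. Real VTLN and SAT are nonlinear or model-based, so this shows the direction of the effect, not its size.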
Pages: 73-76
Number of pages: 4