Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion

Cited by: 21
Authors
Sivaraman, Ganesh [1]
Mitra, Vikramjit [1]
Nam, Hosung [2]
Tiede, Mark [3]
Espy-Wilson, Carol [1]
Affiliations
[1] University of Maryland, Electrical & Computer Engineering, College Park, MD 20740 USA
[2] Korea University, Seoul, South Korea
[3] Haskins Laboratories Inc, New Haven, CT 06511 USA
Source
Funding
National Science Foundation (USA);
Keywords
VOCAL-TRACT; MOVEMENTS;
DOI
10.1121/1.5116130
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech inversion is a well-known ill-posed problem, and speaker differences typically make it even harder. Normalizing speaker differences is essential for effectively using multi-speaker articulatory data to train a speaker-independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique that transforms the acoustic features of different speakers to a target speaker's acoustic space so that speaker-specific details are minimized. The speaker-normalized features are then used to train a deep feed-forward neural network based speech inversion system. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients. The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh-point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Results show that the proposed speaker normalization approach provides an 8.15% relative improvement in correlation between actual and estimated TVs compared to a system without speaker normalization. To determine the efficacy of the method across datasets, cross-speaker evaluations were performed with speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results show that the VTLN approach improves performance even across datasets. © 2019 Acoustical Society of America.
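The abstract reports its headline result as a relative improvement in the correlation between actual and estimated TVs. The following is a minimal sketch of how such a figure could be computed, assuming Pearson correlation is the metric (the abstract says only "correlation"). The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical trajectories for one tract variable (illustrative values only).
actual_tv = [0.10, 0.35, 0.60, 0.40, 0.15]
estimated_without_norm = [0.30, 0.20, 0.50, 0.45, 0.35]
estimated_with_norm = [0.12, 0.30, 0.55, 0.45, 0.20]

r_base = pearson(actual_tv, estimated_without_norm)
r_norm = pearson(actual_tv, estimated_with_norm)
print(f"baseline r = {r_base:.3f}, normalized r = {r_norm:.3f}")
print(f"relative improvement = {relative_improvement(r_base, r_norm):.2f}%")
```

A relative improvement is measured against the baseline correlation, so it is not the same as the absolute difference between the two correlation values.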
Pages: 316-329
Number of pages: 14