Unsupervised speaker adaptation for speaker independent acoustic to articulatory speech inversion

Cited by: 21
Authors
Sivaraman, Ganesh [1]
Mitra, Vikramjit [1]
Nam, Hosung [2]
Tiede, Mark [3]
Espy-Wilson, Carol [1]
Affiliations
[1] Univ Maryland, Elect & Comp Engn, College Pk, MD 20740 USA
[2] Korea Univ, Seoul, South Korea
[3] Haskins Labs Inc, New Haven, CT 06511 USA
Source
Funding
National Science Foundation (USA);
Keywords
VOCAL-TRACT; MOVEMENTS;
DOI
10.1121/1.5116130
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speech inversion is a well-known ill-posed problem, and speaker differences typically make it even harder. Normalizing these differences is essential for effectively using multi-speaker articulatory data to train a speaker-independent speech inversion system. This paper explores a vocal tract length normalization (VTLN) technique that transforms the acoustic features of different speakers to a target speaker's acoustic space so that speaker-specific details are minimized. The speaker-normalized features are then used to train a speech inversion system based on a deep feed-forward neural network. The acoustic features are parameterized as time-contextualized mel-frequency cepstral coefficients. The articulatory features are represented by six tract-variable (TV) trajectories, which are relatively speaker invariant compared to flesh-point data. Experiments are performed with ten speakers from the University of Wisconsin X-ray microbeam database. Results show that the proposed speaker normalization approach provides an 8.15% relative improvement in correlation between actual and estimated TVs compared to a system without speaker normalization. To determine the efficacy of the method across datasets, cross-speaker evaluations were performed with speakers from the Multichannel Articulatory-TIMIT and EMA-IEEE datasets. Results show that the VTLN approach improves performance even across datasets. © 2019 Acoustical Society of America.
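As a rough illustration of two steps the abstract names, the sketch below shows one common way to time-contextualize frame-level features (splicing each frame with its neighbours) and how a relative improvement in Pearson correlation can be computed. The function names, the edge-clamping choice, and the context widths are assumptions made for this example; the abstract does not state the paper's actual settings, and the VTLN warping itself is not sketched here.

```python
from math import sqrt
from typing import List, Sequence


def stack_context(frames: Sequence[Sequence[float]], left: int, right: int) -> List[List[float]]:
    """Splice each frame with `left` previous and `right` following frames.

    Hypothetical helper: utterance edges are handled by repeating the
    first or last frame, which is one common convention, not necessarily
    the one used in the paper.
    """
    n = len(frames)
    stacked: List[List[float]] = []
    for t in range(n):
        row: List[float] = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp at utterance edges
            row.extend(frames[idx])
        stacked.append(row)
    return stacked


def pearson(actual: Sequence[float], estimated: Sequence[float]) -> float:
    """Pearson correlation between an actual and an estimated trajectory."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_e = sum(estimated) / n
    cov = sum((a - mean_a) * (e - mean_e) for a, e in zip(actual, estimated))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / sqrt(var_a * var_e)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline
```

With this definition, a reported relative improvement such as 8.15% means the normalized system's correlation is 1.0815 times the baseline correlation, not that the correlation rose by 0.0815 in absolute terms.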
Pages: 316-329
Number of pages: 14